How can I correctly change the upos of words in a sentence using Stanza? #1352

edump72 · 2024-02-24T17:54:22Z


I am using Stanza to parse a sentence and then change its UPOS tags so that I get a more customized constituency tree. This is my code snippet:

import stanza
import nltk
from nltk import Tree
from nltk.draw.util import CanvasFrame
from nltk.draw import TreeWidget

sentence = "Juan camina al parque con su madre."
pipeline = stanza.Pipeline('es', processors='tokenize,mwt,pos,lemma,depparse,constituency')
parser = pipeline(sentence)

upos_mapping = {
    "NOUN": "Sustantivo",
    "ADP": "Preposición",
    "DET": "Determinante"
}

for sentence in parser.sentences:
    for word in sentence.words:
        if word.upos in upos_mapping:
            word.upos = upos_mapping[word.upos]

parsed_sentence = parser.sentences[0].constituency.children[0].children

print(parsed_sentence, parser)

The problem I am facing is that when I print the "parser" variable, which contains the processed sentence, all the UPOS tags are modified. However, when I print the constituency tree, the UPOS tags aren't changed. This is what I get:

((sn (grup.nom (PROPN Juan))), (grup.verb (VERB camina)), (sn (spec (ADP a) (DET el)) (grup.nom (NOUN parque))), (sp (prep (ADP con)) (sn (spec (DET su)) (grup.nom (NOUN madre)))), (PUNCT .))
[
  [
    { "id": 1, "text": "Juan", "lemma": "Juan", "upos": "PROPN", "xpos": "np00000", "head": 2, "deprel": "nsubj", "start_char": 0, "end_char": 4 },
    { "id": 2, "text": "camina", "lemma": "caminar", "upos": "VERB", "xpos": "vmip3s0", "feats": "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin", "head": 0, "deprel": "root", "start_char": 5, "end_char": 11 },
    { "id": [ 3, 4 ], "text": "al", "start_char": 12, "end_char": 14 },
    { "id": 3, "text": "a", "lemma": "a", "upos": "Preposición", "xpos": "spcms", "head": 5, "deprel": "case" },
    { "id": 4, "text": "el", "lemma": "el", "upos": "Determinante", "feats": "Definite=Def|Gender=Masc|Number=Sing|PronType=Art", "head": 5, "deprel": "det" },
    { "id": 5, "text": "parque", "lemma": "parque", "upos": "Sustantivo", "xpos": "ncms000", "feats": "Gender=Masc|Number=Sing", "head": 2, "deprel": "obl", "start_char": 15, "end_char": 21 },
    { "id": 6, "text": "con", "lemma": "con", "upos": "Preposición", "xpos": "sps00", "head": 8, "deprel": "case", "start_char": 22, "end_char": 25 },
    { "id": 7, "text": "su", "lemma": "su", "upos": "Determinante", "xpos": "dp3cs0", "feats": "Number=Sing|Person=3|Poss=Yes|PronType=Prs", "head": 8, "deprel": "det", "start_char": 26, "end_char": 28 },
    { "id": 8, "text": "madre", "lemma": "madre", "upos": "Sustantivo", "xpos": "ncfs000", "feats": "Gender=Fem|Number=Sing", "head": 2, "deprel": "obl", "start_char": 29, "end_char": 34 },
    { "id": 9, "text": ".", "lemma": ".", "xpos": "fp", "feats": "PunctType=Peri", "head": 2, "deprel": "punct", "start_char": 34, "end_char": 35 }
  ]
]

I have not found any resources on the Internet. I would be grateful if someone could help me. Thank you!


AngledLuffa · 2024-02-24T19:42:26Z

Maintainer

The tree is built using the POS tags at the time of parsing, so changing them afterwards won't affect the tree. You could always edit the tree as well after running the pipeline. There is a function in stanza/models/constituency/utils.py, "replace_tags", which replaces the tags of a tree with a new list of tags. Alternatively, you could use a specialized version of the POS annotator which replaces tags at the time of running the POS tagger, but I do not recommend this, as each of the other annotators you use (lemma, depparse, constituency) expects the Stanza tags.
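
To illustrate the "edit the tree after running the pipeline" option, here is a minimal sketch of relabeling the POS nodes of a tree with the mapping from the question. It does not call Stanza: `Node`, `relabel_preterminals` and `to_brackets` are hypothetical names made up for this example, and the only assumption about the tree is that each node exposes a `label` and a list of `children`.

```python
from dataclasses import dataclass, field

UPOS_MAPPING = {
    "NOUN": "Sustantivo",
    "ADP": "Preposición",
    "DET": "Determinante",
}


@dataclass
class Node:
    """Hypothetical stand-in for a parse-tree node: a label plus child nodes."""
    label: str
    children: list = field(default_factory=list)


def relabel_preterminals(node, mapping):
    """Rename POS labels in place.

    A preterminal is a node whose only child is a leaf (the word itself).
    Labels that are not in the mapping are left unchanged.
    """
    if len(node.children) == 1 and not node.children[0].children:
        node.label = mapping.get(node.label, node.label)
        return
    for child in node.children:
        relabel_preterminals(child, mapping)


def to_brackets(node):
    """Render the tree in bracketed form, e.g. (NOUN madre)."""
    if not node.children:
        return node.label
    inner = " ".join(to_brackets(child) for child in node.children)
    return "(" + node.label + " " + inner + ")"


# Hand-built copy of one subtree from the question's output:
# (sp (prep (ADP con)) (sn (spec (DET su)) (grup.nom (NOUN madre))))
tree = Node("sp", [
    Node("prep", [Node("ADP", [Node("con")])]),
    Node("sn", [
        Node("spec", [Node("DET", [Node("su")])]),
        Node("grup.nom", [Node("NOUN", [Node("madre")])]),
    ]),
])

relabel_preterminals(tree, UPOS_MAPPING)
print(to_brackets(tree))
```

With a real document, the same traversal could presumably be applied to `parser.sentences[0].constituency`, assuming its nodes allow `label` to be reassigned; this has not been checked against a specific Stanza version, and the `replace_tags` helper mentioned above is the library's own route. Since the relabeling happens after all annotators have run, lemma, depparse and constituency still see the original Stanza tags.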

