How can I correctly change the upos of words in a sentence using Stanza? #1352
The tree is built using the POS tags available at the time of parsing, so
changing them afterwards won't affect the tree. You could also edit the tree
itself after running the pipeline. There is a function, "replace_tags", in
stanza/models/constituency/utils.py which replaces the tags of a tree with
a new list of tags.
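A minimal sketch of that idea follows. It is not Stanza's "replace_tags" (its exact signature is not reproduced here). Instead, it re-implements the same idea on a hypothetical stand-in Node class so it runs without downloading any models. Node is assumed to mirror the label/children shape of Stanza's tree nodes. Preterminal (POS) labels are replaced left to right from a list of tags, a length mismatch is treated as an error, and the input tree is left unchanged.

```python
# Hypothetical stand-in for a constituency tree node. Stanza's own tree
# nodes are assumed to have the same `label` / `children` shape.
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

    def is_leaf(self):
        return not self.children

    def is_preterminal(self):
        # A preterminal (POS tag node) has exactly one child, which is a leaf (the word).
        return len(self.children) == 1 and self.children[0].is_leaf()

    def __str__(self):
        if self.is_leaf():
            return self.label
        return "(%s %s)" % (self.label, " ".join(str(c) for c in self.children))


_MISSING = object()


def retag(tree, tags):
    """Return a new tree whose preterminal labels are replaced, left to right,
    by the entries of `tags`. The input tree is not modified."""
    tag_iter = iter(tags)

    def rebuild(node):
        if node.is_leaf():
            return Node(node.label)
        if node.is_preterminal():
            new_label = next(tag_iter, _MISSING)
            if new_label is _MISSING:
                raise ValueError("fewer tags than preterminals")
            return Node(new_label, [Node(node.children[0].label)])
        return Node(node.label, [rebuild(child) for child in node.children])

    new_tree = rebuild(tree)
    if next(tag_iter, _MISSING) is not _MISSING:
        raise ValueError("more tags than preterminals")
    return new_tree


# Usage on a fragment of the tree from the question
fragment = Node("sn", [
    Node("spec", [Node("ADP", [Node("a")]), Node("DET", [Node("el")])]),
    Node("grup.nom", [Node("NOUN", [Node("parque")])]),
])
print(retag(fragment, ["Preposición", "Determinante", "Sustantivo"]))
# (sn (spec (Preposición a) (Determinante el)) (grup.nom (Sustantivo parque)))
```

With a real parse, the new list could presumably be built from the remapped words, e.g. [word.upos for word in sentence.words]. This assumes the preterminals line up one-to-one with the words after multi-word-token expansion, as they do in the output in the question ("a" and "el" are separate leaves). The "." word appears to have no upos in that output, so its original tree tag would need to be kept as a fallback.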
Alternatively, you could use a specialized version of the POS annotator
which replaces the tags at the time the POS tagger runs, but I do not
recommend this, as each of the other annotators you use (lemma, depparse,
constituency) expects the standard Stanza tags.
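Because the downstream annotators rely on the standard tags, one option is to leave the pipeline's annotations untouched and translate the tags only when printing the tree. This sketch assumes the custom Spanish names are only needed for display. The Node class, the render function, and the UPOS_NAMES dictionary below are illustrative, not part of Stanza. Stanza's tree nodes are assumed to expose label and children attributes in the same way.

```python
# Hypothetical stand-in for a constituency tree node; Stanza's tree nodes
# are assumed to expose the same `label` and `children` attributes.
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def render(node, tag_names):
    """Bracketed string of `node` with POS (preterminal) labels translated
    through `tag_names`. Unmapped labels are kept; `node` is not modified."""
    if not node.children:  # leaf: the word itself
        return node.label
    label = node.label
    # Only translate preterminals, so phrase labels are never renamed by accident
    if len(node.children) == 1 and not node.children[0].children:
        label = tag_names.get(label, label)
    inner = " ".join(render(child, tag_names) for child in node.children)
    return "(%s %s)" % (label, inner)


UPOS_NAMES = {"NOUN": "Sustantivo", "ADP": "Preposición", "DET": "Determinante"}

fragment = Node("sp", [
    Node("prep", [Node("ADP", [Node("con")])]),
    Node("sn", [
        Node("spec", [Node("DET", [Node("su")])]),
        Node("grup.nom", [Node("NOUN", [Node("madre")])]),
    ]),
])
print(render(fragment, UPOS_NAMES))
# (sp (prep (Preposición con)) (sn (spec (Determinante su)) (grup.nom (Sustantivo madre))))
```

With a real parse, the same kind of function could presumably be applied to parser.sentences[0].constituency, with the question's upos_mapping as the dictionary, while the word annotations that lemma, depparse and constituency depend on keep their original tags.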
On Sat, Feb 24, 2024 at 9:54 AM edump72 wrote:
I am using Stanza to process a sentence and change its UPOS tags
so that I can get a more personalized constituency tree. This is my code
snippet:
import stanza
import nltk
from nltk import Tree
from nltk.draw.util import CanvasFrame
from nltk.draw import TreeWidget

sentence = "Juan camina al parque con su madre."
pipeline = stanza.Pipeline('es', processors='tokenize,mwt,pos,lemma,depparse,constituency')
parser = pipeline(sentence)

# Map standard UPOS tags to custom Spanish names
upos_mapping = {
    "NOUN": "Sustantivo",
    "ADP": "Preposición",
    "DET": "Determinante"
}

# Overwrite the UPOS annotation of every word whose tag is in the mapping
for sentence in parser.sentences:
    for word in sentence.words:
        if word.upos in upos_mapping:
            word.upos = upos_mapping[word.upos]

# Children of the first node below the constituency tree's root
parsed_sentence = parser.sentences[0].constituency.children[0].children
print(parsed_sentence, parser)
The problem I am facing is that when I print the "parser" variable, which
contains the processed document, all the UPOS tags are modified. However,
when I print the constituency tree, the tags aren't changed. This is
what I get:
((sn (grup.nom (PROPN Juan))), (grup.verb (VERB camina)), (sn (spec (ADP a) (DET el)) (grup.nom (NOUN parque))), (sp (prep (ADP con)) (sn (spec (DET su)) (grup.nom (NOUN madre)))), (PUNCT .))

[
  [
    { "id": 1, "text": "Juan", "lemma": "Juan", "upos": "PROPN", "xpos": "np00000", "head": 2, "deprel": "nsubj", "start_char": 0, "end_char": 4 },
    { "id": 2, "text": "camina", "lemma": "caminar", "upos": "VERB", "xpos": "vmip3s0", "feats": "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin", "head": 0, "deprel": "root", "start_char": 5, "end_char": 11 },
    { "id": [ 3, 4 ], "text": "al", "start_char": 12, "end_char": 14 },
    { "id": 3, "text": "a", "lemma": "a", "upos": "Preposición", "xpos": "spcms", "head": 5, "deprel": "case" },
    { "id": 4, "text": "el", "lemma": "el", "upos": "Determinante", "feats": "Definite=Def|Gender=Masc|Number=Sing|PronType=Art", "head": 5, "deprel": "det" },
    { "id": 5, "text": "parque", "lemma": "parque", "upos": "Sustantivo", "xpos": "ncms000", "feats": "Gender=Masc|Number=Sing", "head": 2, "deprel": "obl", "start_char": 15, "end_char": 21 },
    { "id": 6, "text": "con", "lemma": "con", "upos": "Preposición", "xpos": "sps00", "head": 8, "deprel": "case", "start_char": 22, "end_char": 25 },
    { "id": 7, "text": "su", "lemma": "su", "upos": "Determinante", "xpos": "dp3cs0", "feats": "Number=Sing|Person=3|Poss=Yes|PronType=Prs", "head": 8, "deprel": "det", "start_char": 26, "end_char": 28 },
    { "id": 8, "text": "madre", "lemma": "madre", "upos": "Sustantivo", "xpos": "ncfs000", "feats": "Gender=Fem|Number=Sing", "head": 2, "deprel": "obl", "start_char": 29, "end_char": 34 },
    { "id": 9, "text": ".", "lemma": ".", "xpos": "fp", "feats": "PunctType=Peri", "head": 2, "deprel": "punct", "start_char": 34, "end_char": 35 }
  ]
]
I have not found any resources on the Internet. I would be grateful if
someone could help me. Thank you!
Answer selected by edump72