Non-projective annotation error in en_lines-ud-train-doc8-3070 #16

Open
nikitakit opened this issue Feb 3, 2024 · 3 comments

@nikitakit

The sentence en_lines-ud-train-doc8-3070 was flagged by my scripts as having a degree of non-projectivity that is otherwise unattested in any of the English UD treebanks.

Three pairs of two: Jove and Stella, Jove and Alice, Alice and Stella and under the surface of each the head of the other.


I believe the determiner/"of" attachments are incorrect and should be fixed as in the diff below.

# sent_id = en_lines-ud-train-doc8-3070
# text = Three pairs of two: Jove and Stella, Jove and Alice, Alice and Stella and under the surface of each the head of the other.
1       Three   three   NUM     CARD-PL NumType=Card    2       nummod  _       _
2       pairs   pair    NOUN    PL-NOM  Number=Plur     0       root    _       _
3       of      of      ADP     _       _       4       case    _       _
4       two     two     NUM     CARD-PL NumType=Card    2       nummod  _       SpaceAfter=No
5       :       :       PUNCT   Colon   _       6       punct   _       _
6       Jove    Jove    PROPN   SG-NOM  Number=Sing     2       appos   _       _
7       and     and     CCONJ   _       _       8       cc      _       _
8       Stella  Stella  PROPN   SG-NOM  Number=Sing     6       conj    _       SpaceAfter=No
9       ,       ,       PUNCT   Comma   _       10      punct   _       _
10      Jove    Jove    PROPN   SG-NOM  Number=Sing     6       conj    _       _
11      and     and     CCONJ   _       _       12      cc      _       _
12      Alice   Alice   PROPN   SG-NOM  Number=Sing     10      conj    _       SpaceAfter=No
13      ,       ,       PUNCT   Comma   _       10      punct   _       _
14      Alice   Alice   PROPN   SG-NOM  Number=Sing     6       conj    _       _
15      and     and     CCONJ   _       _       16      cc      _       _
16      Stella  Stella  PROPN   SG-NOM  Number=Sing     14      conj    _       _
17      and     and     CCONJ   _       _       20      cc      _       _
18      under   under   ADP     _       _       20      case    _       _
19      the     the     DET     DEF     Definite=Def|PronType=Art       20      det     _       _
20      surface surface NOUN    SG-NOM  Number=Sing     6       conj    _       _
21      of      of      ADP     _       _       22      case    _       _
22      each    each    PRON    TOT-SG  PronType=Tot    20      nmod    _       _
-23      the     the     DET     DEF     Definite=Def|PronType=Art       19      det     _       _
+23      the     the     DET     DEF     Definite=Def|PronType=Art       24      det     _       _
24      head    head    NOUN    SG-NOM  Number=Sing     20      nsubj   _       _
-25      of      of      ADP     _       _       22      case    _       _
+25      of      of      ADP     _       _       27      case    _       _
-26      the     the     DET     DEF     Definite=Def|PronType=Art       22      det     _       _
+26      the     the     DET     DEF     Definite=Def|PronType=Art       27      det     _       _
27      other   other   ADJ     POS     Degree=Pos      24      amod    _       SpaceAfter=No
28      .       .       PUNCT   Period  _       6       punct   _       SpacesAfter=\n\n
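To sanity-check the diff above, here is a minimal sketch that counts pairs of crossing arcs before and after the change. It is not the script that originally flagged the sentence; the function name `count_crossing_arcs` and the two head lists (transcribed by hand from the HEAD column above) are assumptions made for illustration only.

```python
def count_crossing_arcs(heads):
    """Count pairs of dependency arcs that cross each other.

    heads[i - 1] is the HEAD of token i (1-based ids, 0 = root).
    Arcs from the artificial root are skipped in this sketch.
    """
    arcs = [(min(h, d), max(h, d))
            for d, h in enumerate(heads, start=1) if h != 0]
    crossings = 0
    for i, (a, b) in enumerate(arcs):
        for c, d in arcs[i + 1:]:
            # Two arcs cross when exactly one endpoint of one arc
            # lies strictly inside the span of the other.
            if a < c < b < d or c < a < d < b:
                crossings += 1
    return crossings


# HEAD column for tokens 1..28, transcribed from the listing above.
HEADS_BEFORE = [2, 0, 4, 2, 6, 2, 8, 6, 10, 6, 12, 10, 10, 6,
                16, 14, 20, 20, 20, 6, 22, 20, 19, 20, 22, 22, 24, 6]

# Same list with the three "+" lines applied (tokens 23, 25, 26).
HEADS_AFTER = list(HEADS_BEFORE)
HEADS_AFTER[23 - 1] = 24
HEADS_AFTER[25 - 1] = 27
HEADS_AFTER[26 - 1] = 27

if __name__ == "__main__":
    print("crossing pairs before:", count_crossing_arcs(HEADS_BEFORE))
    print("crossing pairs after: ", count_crossing_arcs(HEADS_AFTER))
```

Counting crossing pairs is only one way to measure non-projectivity; other definitions (for example, gap degree or edge degree) would give different numbers but should agree that the corrected attachments leave no crossing arcs.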

I'm also not sure about the analysis of "and under the surface of each the head of the other". I'm unfamiliar with the correct UD convention here, but might something like this be a better analysis?

2       pairs   pair    NOUN    PL-NOM  Number=Plur     0       root    _       _
-17      and     and     CCONJ   _       _       20      cc      _       _
+17      and     and     CCONJ   _       _       24      cc      _       _
-20      surface surface NOUN    SG-NOM  Number=Sing     6       conj    _       _
+20      surface surface NOUN    SG-NOM  Number=Sing     24      ???     _       _
-24      head    head    NOUN    SG-NOM  Number=Sing     20      nsubj   _       _
+24      head    head    NOUN    SG-NOM  Number=Sing     2       conj    _       _

@dan-zeman
Member

That is clearly an error. Perhaps a conversion script got misled somehow.

@LarsAhrenberg
Contributor

Thanks for pointing this out. The annotation will be changed for the next version. I agree that token 24, head, could be taken as the head of the last conjunct but I believe token 6 should be its head, not 2.

@nikitakit
Author

Thank you! You are right, after taking another look I see that having token 6 be the head of the last conjunct is good.
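For reference, a small sketch of a structural check on the analysis discussed in this thread. It assumes one reading of the outcome: tokens 17 and 20 attach to 24, and 24 attaches to 6 as `conj`. The released annotation may differ, and the relation for token 20 was left open ("???") above, so only HEAD values are checked. The helper `is_valid_tree` and the head list are hypothetical names introduced for illustration.

```python
def is_valid_tree(heads):
    """Return True if heads encodes a single-rooted tree without cycles.

    heads[i - 1] is the HEAD of token i (1-based ids, 0 = root).
    """
    n = len(heads)
    # Exactly one token may attach to the artificial root.
    if sum(1 for h in heads if h == 0) != 1:
        return False
    # Follow the HEAD chain from every token; it must reach 0
    # without leaving the sentence or revisiting a token.
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:
            if not (1 <= node <= n) or node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


# Assumed final HEAD column (tokens 1..28): the determiner/"of" fixes
# plus 17 -> 24, 20 -> 24 and 24 -> 6. This is an illustration only.
HEADS_FINAL = [2, 0, 4, 2, 6, 2, 8, 6, 10, 6, 12, 10, 10, 6,
               16, 14, 24, 20, 20, 24, 22, 20, 24, 6, 27, 27, 24, 6]

# Re-attaching token 20 to 24 while leaving token 24 attached to 20
# would create a cycle, so the two changes must be made together.
HEADS_PARTIAL = list(HEADS_FINAL)
HEADS_PARTIAL[24 - 1] = 20

if __name__ == "__main__":
    print("final analysis is a tree:", is_valid_tree(HEADS_FINAL))
    print("partial edit is a tree:  ", is_valid_tree(HEADS_PARTIAL))
```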
