Pass empty nodes to Dependency Graphs #1389

Merged
merged 6 commits into from
Oct 15, 2023

AngledLuffa
Contributor

Allow passing "empty nodes" to Dependency Graphs (SemanticGraph). This makes it possible to run Semgrex queries over documents that contain such nodes.

One example where this is relevant is the following Estonian sentence, in which token 5.1 is an empty node:

# sent_id = ewtb2_000035_15
# text = Ja paari aasta pärast rôômalt maasikatele ...
1       Ja      ja      CCONJ   J       _       3       cc      5.1:cc  _
2       paari   paar    NUM     N       Case=Gen|Number=Sing|NumForm=Word|NumType=Card  3       nummod  3:nummod        _
3       aasta   aasta   NOUN    S       Case=Gen|Number=Sing    0       root    5.1:obl _
4       pärast  pärast  ADP     K       AdpType=Post    3       case    3:case  _
5       rôômalt rõõmsalt        ADV     D       Typo=Yes        3       advmod  5.1:advmod      Orphan=Yes|CorrectForm=rõõmsalt
5.1     panna   panema  VERB    V       VerbForm=Inf    _       _       0:root  Empty=5.1
6       maasikatele     maasikas        NOUN    S       Case=All|Number=Plur    3       obl     5.1:obl Orphan=Yes
7       ...     ...     PUNCT   Z       _       3       punct   5.1:punct       _
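
To make the ID convention in the example concrete, here is a minimal Python sketch of reading such rows. It is not the CoreNLP implementation: `NodeId`, `parse_node_id`, `parse_enhanced_deps` and `read_conllu_rows` are hypothetical names chosen for illustration, and the sketch assumes tab-separated CoNLL-U columns with no multiword token ranges (IDs such as `1-2`). An ID like `5.1` is split into a regular index and an empty index, and the enhanced dependencies in the ninth column are split into (head, relation) pairs.

```python
from typing import List, NamedTuple, Tuple


class NodeId(NamedTuple):
    """Hypothetical node key: empty_index is 0 for ordinary tokens."""
    index: int
    empty_index: int


def parse_node_id(text: str) -> NodeId:
    """Turn '5' into NodeId(5, 0) and '5.1' into NodeId(5, 1)."""
    if "." in text:
        major, minor = text.split(".", 1)
        return NodeId(int(major), int(minor))
    return NodeId(int(text), 0)


def parse_enhanced_deps(deps: str) -> List[Tuple[NodeId, str]]:
    """Parse a DEPS value such as '5.1:obl' into (head, relation) pairs."""
    if deps == "_":
        return []
    edges = []
    for item in deps.split("|"):
        # Split on the first colon only, so relations like 'obl:arg' survive.
        head, relation = item.split(":", 1)
        edges.append((parse_node_id(head), relation))
    return edges


def read_conllu_rows(lines):
    """Collect id, form and enhanced edges for each token row."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        rows.append({
            "id": parse_node_id(fields[0]),
            "form": fields[1],
            "enhanced": parse_enhanced_deps(fields[8]),
        })
    return rows


# Two rows from the sentence above, rebuilt with explicit tabs.
SAMPLE = [
    "# sent_id = ewtb2_000035_15",
    "\t".join(["5", "rôômalt", "rõõmsalt", "ADV", "D", "Typo=Yes",
               "3", "advmod", "5.1:advmod",
               "Orphan=Yes|CorrectForm=rõõmsalt"]),
    "\t".join(["5.1", "panna", "panema", "VERB", "V", "VerbForm=Inf",
               "_", "_", "0:root", "Empty=5.1"]),
]
rows = read_conllu_rows(SAMPLE)
```

In this sketch the empty node is the only row with a nonzero `empty_index`, and the enhanced edge of token 5 points at it rather than at token 3.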

We also need a mechanism for passing multiple graphs that share the same nodes and for searching over multiple graphs at once. This will allow Semgrex queries over the enhanced graphs. Similarly, Ssurgeon will eventually need to be upgraded to process multiple graphs at once.
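
As a rough illustration of what "multiple graphs using the same nodes" could look like, the Python sketch below keeps one shared node table and two edge lists, basic and enhanced, taken from the Estonian example. It is only an assumption about the shape of the data, not the mechanism this PR or Semgrex uses: `Graph` and `find_relation` are hypothetical names, and the lookup is a plain relation match rather than a Semgrex query.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple

NodeKey = Tuple[int, int]           # (index, empty index); hypothetical key
Edge = Tuple[NodeKey, str, NodeKey]  # (governor, relation, dependent)


@dataclass
class Graph:
    """A named edge list over an externally shared node table."""
    name: str
    edges: List[Edge] = field(default_factory=list)


# Shared node table: both graphs refer to these keys.
nodes: Dict[NodeKey, str] = {
    (3, 0): "aasta",
    (5, 0): "rôômalt",
    (5, 1): "panna",        # the empty node 5.1
    (6, 0): "maasikatele",
}

# Basic dependencies (HEAD/DEPREL columns) for tokens 5 and 6.
basic = Graph("basic", [
    ((3, 0), "advmod", (5, 0)),
    ((3, 0), "obl", (6, 0)),
])

# Enhanced dependencies (DEPS column) for tokens 3, 5 and 6.
enhanced = Graph("enhanced", [
    ((5, 1), "obl", (3, 0)),
    ((5, 1), "advmod", (5, 0)),
    ((5, 1), "obl", (6, 0)),
])


def find_relation(node_table: Dict[NodeKey, str],
                  graphs: List[Graph],
                  relation: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (graph name, governor word, dependent word) for matching edges."""
    for graph in graphs:
        for governor, rel, dependent in graph.edges:
            if rel == relation:
                yield graph.name, node_table[governor], node_table[dependent]


matches = list(find_relation(nodes, [basic, enhanced], "obl"))
```

Because both edge lists index into the same node table, a single search can report which graph each match came from without duplicating the words.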

…is will be especially useful when passing around dependency graphs with emptyIndex

Don't pass around IndexAnnotation if the implicit index is sufficient (save some space... possibly not necessary)
…dencyGraph proto. Will allow for the passing of UD graphs with fake words
Throws a descriptive exception instead of NPE if a token comes back null, as that will eventually crash anyway

Will need to incorporate emptyIndex as well in order to pass around graphs with EmptyIndex
…anticGraph with the fake nodes used in UD

Switch to ThreeDimensionalMap for the SemanticGraph nodes and a TwoDimensionalMap for the incoming words
…ld version still works, in case there are legacy systems out there or old serialized graphs
@AngledLuffa AngledLuffa merged commit da47715 into dev Oct 15, 2023
2 checks passed
@AngledLuffa AngledLuffa deleted the semgrex_empty branch October 15, 2023 15:58