Missing Style=Expr for expressive word forms #493

Open
rhdunn opened this issue Dec 3, 2023 · 0 comments
rhdunn commented Dec 3, 2023

The following forms are expressive spellings, so they should be annotated Style=Expr rather than Abbr=Yes.

See https://universaldependencies.org/u/feat/Style.html#expr-expressive-emotional:

Kinds of expressive spelling variation include: expressive lengthening (niiiiice), dialectal or colloquial pronunciation (Hahvahd), censored characters (sh*t), symbolic characters (CA$H), etc. As CA$H defies typographical convention it should also be labeled Typo=Yes.

These are also missing CorrectForm annotations:

dialectal or colloquial pronunciation

thru -> through

ERROR: Sentence email-enronsent42_01-0075 token 22 -- IN/Abbr=Yes lemma 'through' does not match lowercase-form applied to form 'thru', expected 'thru'
ERROR: Sentence answers-20111106215236AAycANO_ans-0024 token 2 -- IN/Abbr=Yes lemma 'through' does not match lowercase-form applied to form 'thru', expected 'thru'
ERROR: Sentence answers-20111106215236AAycANO_ans-0029 token 2 -- IN/Abbr=Yes lemma 'through' does not match lowercase-form applied to form 'thru', expected 'thru'
ERROR: Sentence reviews-348369-0006 token 15 -- IN/Abbr=Yes lemma 'through' does not match lowercase-form applied to form 'thru', expected 'thru'
ERROR: Sentence reviews-360937-0002 token 28 -- IN/Abbr=Yes lemma 'through' does not match lowercase-form applied to form 'thru', expected 'thru'

luv -> love

ERROR: Sentence reviews-042012-0005 token 19 -- NN/Abbr=Yes lemma 'love' does not match uppercase-form applied to form 'luv', expected 'LUV'
ERROR: Sentence reviews-042012-0006 token 11 -- NN/Abbr=Yes lemma 'love' does not match uppercase-form applied to form 'luv', expected 'LUV'
ERROR: Sentence reviews-042012-0007 token 1 -- NN/Abbr=Yes lemma 'love' does not match uppercase-form applied to form 'Luv', expected 'LUV'

Others:

WARN: Sentence answers-20111108075853AAUIKRQ_ans-0004 token 4 -- JJ/Abbr=Yes lemma 'good' does not have a validation rule for form 'gud'
WARN: Sentence answers-20111108102621AA3hPqj_ans-0007 token 15 -- JJ/Abbr=Yes lemma 'little' does not have a validation rule for form 'lil'
WARN: Sentence answers-20111108071852AAxbh5F_ans-0015 token 1 -- DT/Abbr=Yes lemma 'that' does not have a validation rule for form 'dat'
ERROR: Sentence reviews-038358-0003 token 20 -- IN/Abbr=Yes lemma 'for' does not match lowercase-form applied to form 'fo', expected 'fo'
ERROR: Sentence reviews-159371-0005 token 12 -- IN/Abbr=Yes lemma 'though' does not match lowercase-form applied to form 'tho', expected 'tho'

The following should have "because" as both the lemma and the CorrectForm value:

ERROR: Sentence answers-20111107224336AAxQbzk_ans-0002 token 1 -- IN/Abbr=Yes lemma 'cause' does not match lowercase-form applied to form 'cos', expected 'cos'
ERROR: Sentence answers-20111108084355AAvLpRa_ans-0009 token 40 -- IN/Abbr=Yes lemma 'because' does not match lowercase-form applied to form 'coz', expected 'coz'
ERROR: Sentence reviews-018548-0006 token 9 -- IN/Abbr=Yes lemma 'cause' does not match lowercase-form applied to form 'cus', expected 'cus'
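
As a rough illustration of the change requested for the forms above, here is a minimal sketch that rewrites one CoNLL-U token line: it drops Abbr=Yes, adds Style=Expr, optionally adds Typo=Yes, optionally replaces the lemma, and records CorrectForm. The function name `reannotate` is hypothetical and not part of any UD tooling. Placing CorrectForm in the MISC column is my understanding of the UD convention and should be treated as an assumption. The HEAD and DEPREL values in the sample line are invented for the example and are not copied from the treebank.

```python
from typing import Optional


def reannotate(line: str, correct_form: str,
               lemma: Optional[str] = None, typo: bool = False) -> str:
    """Rewrite one CoNLL-U token line (hypothetical helper, assumptions as noted above)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError("expected 10 tab-separated CoNLL-U columns")

    # FEATS (column 6): replace Abbr=Yes with Style=Expr, optionally add Typo=Yes.
    feats = {} if cols[5] == "_" else dict(f.split("=", 1) for f in cols[5].split("|"))
    feats.pop("Abbr", None)
    feats["Style"] = "Expr"
    if typo:
        feats["Typo"] = "Yes"
    # Keep features in case-insensitive alphabetical order.
    cols[5] = "|".join(f"{k}={v}" for k, v in sorted(feats.items(), key=lambda kv: kv[0].lower()))

    # LEMMA (column 3): only changed when a new lemma is supplied.
    if lemma is not None:
        cols[2] = lemma

    # MISC (column 10): assumed location of CorrectForm; replace any existing value.
    misc = [] if cols[9] == "_" else [m for m in cols[9].split("|")
                                      if not m.startswith("CorrectForm=")]
    misc.append(f"CorrectForm={correct_form}")
    cols[9] = "|".join(misc)
    return "\t".join(cols)


# Illustrative token line for 'cos' (HEAD and DEPREL are made up).
sample = "1\tcos\tcause\tSCONJ\tIN\tAbbr=Yes\t5\tmark\t_\t_"
print(reannotate(sample, "because", lemma="because"))
```

Passing `typo=True` covers forms that need Typo=Yes in addition to Style=Expr.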

symbolic characters, etc.

These also need Typo=Yes:

ERROR: Sentence answers-20111106230959AAuYQ5Q_ans-0005 token 2 -- IN/Abbr=Yes lemma 'to' does not match lowercase-form applied to form '2', expected '2'
ERROR: Sentence reviews-351950-0002 token 5 -- IN/Abbr=Yes lemma 'for' does not match lowercase-form applied to form '4', expected '4'
ERROR: Sentence reviews-100592-0005 token 8 -- IN/Abbr=Yes lemma 'for' does not match lowercase-form applied to form '4', expected '4'
ERROR: Sentence answers-20111108075853AAUIKRQ_ans-0002 token 21 -- TO/Abbr=Yes lemma 'to' does not match lowercase-form applied to form '2', expected '2'
ERROR: Sentence answers-20111108075853AAUIKRQ_ans-0002 token 40 -- NN/Abbr=Yes lemma 'anyone' does not match uppercase-form applied to form 'any1', expected 'ANY1'
ERROR: Sentence weblog-blogspot.com_healingiraq_20050121235804_ENG_20050121_235804-0030 token 10 -- VBN lemma 'fuck' does not match past-participle-verb applied to form 'f*ed', expected 'f*'
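
To see every affected sentence per word form at a glance, the log lines quoted in this issue can be grouped by form. The sketch below assumes only the message layout visible in the lines above. The names `LOG_RE`, `parse_line` and `group_by_form` are hypothetical, and the regular expression is a guess at the format, not something taken from the validator itself.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Matches the two message shapes quoted in this issue:
#   "... lemma 'X' does not match RULE applied to form 'F', expected 'E'"
#   "... lemma 'X' does not have a validation rule for form 'F'"
LOG_RE = re.compile(
    r"^(?P<level>ERROR|WARN): Sentence (?P<sent>\S+) token (?P<token>\d+) -- "
    r"(?P<tag>\S+) lemma '(?P<lemma>[^']*)' .* form '(?P<form>[^']*)'"
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of one log line, or None if the line does not match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def group_by_form(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each lowercased form to the sentence IDs it was reported in."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            groups[rec["form"].lower()].append(rec["sent"])
    return dict(groups)


log = [
    "ERROR: Sentence reviews-042012-0006 token 11 -- NN/Abbr=Yes lemma 'love' "
    "does not match uppercase-form applied to form 'luv', expected 'LUV'",
    "ERROR: Sentence reviews-042012-0007 token 1 -- NN/Abbr=Yes lemma 'love' "
    "does not match uppercase-form applied to form 'Luv', expected 'LUV'",
    "WARN: Sentence answers-20111108075853AAUIKRQ_ans-0004 token 4 -- JJ/Abbr=Yes "
    "lemma 'good' does not have a validation rule for form 'gud'",
]
for form, sents in group_by_form(log).items():
    print(form, sents)
```

Forms are lowercased before grouping so that 'luv' and 'Luv' land in the same bucket.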