
fix incorrect similar words #220

Closed. KarlieZhao wanted to merge 2 commits.


KarlieZhao
Collaborator

dhowe/rita#178
The present participle of 'toe' should be 'toeing', not 'toing'.

dhowe/rita#177
Updated the dictionary to fix the incorrect similar words.

@dhowe
Owner

dhowe commented Jun 4, 2022

Are there other cases you can think of for this? 'toe' is very uncommon as a verb; I think it's better to remove the verb form from the dictionary.

But also, why do I see "20,849 additions, 22,007 deletions" for the dict file?

@KarlieZhao
Collaborator Author

True, I'll remove the verb form of 'toe'. But I also just found that the present participle of 'hoe' is returned as 'hoing', and there may be more cases like it. I'll run more tests and modify the regex.
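A minimal sketch of a suffix rule that would avoid both 'toing' and 'hoing'. The `present_participle` helper and its pattern are hypothetical, not RiTa's actual conjugation code or regex: keep the final 'e' when it follows 'o', 'e' or 'y', turn a final 'ie' into 'y', and otherwise drop a silent 'e'. Consonant doubling ('run' → 'running') and irregular verbs are deliberately out of scope.

```python
import re

# Assumed rule: keep the final 'e' after 'o', 'e' or 'y' (toe, hoe, see, dye).
KEEP_E = re.compile(r"[oey]e$")

def present_participle(verb: str) -> str:
    """Regular '-ing' form (hypothetical helper; ignores consonant doubling)."""
    if KEEP_E.search(verb):
        return verb + "ing"          # toe -> toeing, hoe -> hoeing, see -> seeing
    if verb.endswith("ie"):
        return verb[:-2] + "ying"    # tie -> tying, die -> dying
    if verb.endswith("e") and len(verb) > 2:
        return verb[:-1] + "ing"     # make -> making, argue -> arguing
    return verb + "ing"              # walk -> walking, be -> being

print([present_participle(v) for v in ("toe", "hoe", "make", "tie")])
# ['toeing', 'hoeing', 'making', 'tying']
```

Running a rule like this over every 'vb' entry and diffing against a list of known participles would be one way to find further 'hoing'-style cases.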

@KarlieZhao
Collaborator Author

> But also, why do I see "20,849 additions, 22,007 deletions" for the dict file?

There are a lot of changes in the dictionary because I was trying to remove the 'vbn' words that can be computed from their base forms (there were ~1,000 of them, so I didn't do it manually). Most lines in the new dict file are unchanged, but since I replaced the entire dictionary with a regenerated one, I believe they were counted as additions and deletions. Not sure if there's a better way to fix this?
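One way to keep the dict diff small, sketched under assumptions: filter the existing entries in place instead of writing out a freshly generated file, so unchanged lines keep their order and formatting and a line-based diff reports only deletions. The `word<TAB>tags` line format, the regular `+d`/`+ed` rule, and all helper names are hypothetical simplifications, not RiTa's real dictionary format; Python's `difflib` stands in for git's line diff.

```python
import difflib
from typing import List, Tuple

def regular_past_participle(verb: str) -> str:
    # Simplified regular rule (assumption): discriminate -> discriminated,
    # walk -> walked. Ignores doubling ('stop') and y -> ied ('carry').
    return verb + "d" if verb.endswith("e") else verb + "ed"

def drop_derivable_vbn(lines: List[str]) -> List[str]:
    """Drop entries tagged only vbd/vbn that the regular rule can rebuild
    from a 'vb' entry; every other line is kept untouched and in place."""
    entries = [line.split("\t") for line in lines]
    derived = {regular_past_participle(word)
               for word, tags in entries if "vb" in tags.split()}
    return [line for line, (word, tags) in zip(lines, entries)
            if not (set(tags.split()) <= {"vbd", "vbn"} and word in derived)]

def diff_stats(old: List[str], new: List[str]) -> Tuple[int, int]:
    """Count (added, removed) lines as a line-based diff reports them."""
    added = removed = 0
    for line in difflib.unified_diff(old, new, lineterm=""):
        if line.startswith("+") and not line.startswith("+++"):
            added += 1
        elif line.startswith("-") and not line.startswith("---"):
            removed += 1
    return added, removed

# Hypothetical sample entries in the assumed 'word<TAB>tags' format.
SAMPLE = [
    "discriminate\tvb",
    "discriminated\tvbn",
    "toe\tnn",
    "walk\tvb nn",
    "walked\tvbn vbd",
    "written\tvbn",
]

filtered = drop_derivable_vbn(SAMPLE)
print(diff_stats(SAMPLE, filtered))   # (0, 2): only the two deletions
rewritten = [line.replace("\t", " ") for line in filtered]
print(diff_stats(SAMPLE, rewritten))  # (4, 6): a reformatted file churns every line
```

The second call shows why a regenerated file can look like a full rewrite: if the generator changes even whitespace or ordering, no line matches its old version.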

@dhowe
Owner

dhowe commented Jun 4, 2022


This is a very big decision/task; we should discuss it more before trying it.
For example, we still need to be able to find those words when that POS is specified in a search, soundsLike, etc.

@KarlieZhao
Collaborator Author

> for example we still need to be able to find those words when that pos is specified in a search or soundsLike, etc...

I can think of two ways to solve this:
a. When looking for similar words (with soundsLike, etc.), iterate through the dictionary and also generate all other forms of each word, checking whether each one rhymes with, sounds like, or spells like the target word. Then we could keep only the base form of each word in the dict and remove the other forms.
b. Or, for words like 'discriminated', simply remove the 'vb' tag and keep 'vbn'.
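Option (a) could look roughly like this sketch, which also covers the earlier concern about searching with a POS: the lexicon stores only base forms, and regular 'vbn' forms are generated on the fly during the scan. The lexicon layout, the helper names, and the edit-distance test (a stand-in for RiTa's real soundsLike/spellsLike/rhymes checks) are all assumptions.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Assumed lexicon layout: base form -> POS tags, with no stored 'vbn' forms.
LEXICON: Dict[str, List[str]] = {
    "discriminate": ["vb"],
    "incriminate": ["vb"],
    "walk": ["vb", "nn"],
    "talk": ["vb", "nn"],
    "toe": ["nn"],
}

def regular_past_participle(verb: str) -> str:
    # Simplified regular rule (assumption); irregular forms would still need entries.
    return verb + "d" if verb.endswith("e") else verb + "ed"

def expand(lexicon: Dict[str, List[str]]) -> Iterator[Tuple[str, str]]:
    """Yield (word, tag) for each stored entry plus its generated 'vbn' form."""
    for word, tags in lexicon.items():
        for tag in tags:
            yield word, tag
        if "vb" in tags:
            yield regular_past_participle(word), "vbn"

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (stand-in similarity measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def similar_words(target: str, lexicon: Dict[str, List[str]],
                  pos: Optional[str] = None, max_dist: int = 1) -> List[str]:
    """Scan base forms and generated forms, optionally filtered by POS tag."""
    results: List[str] = []
    for word, tag in expand(lexicon):
        if pos is not None and tag != pos:
            continue
        if word == target or word in results:
            continue
        if edit_distance(word, target) <= max_dist:
            results.append(word)
    return results

print(similar_words("walk", LEXICON))                                 # ['talk']
print(similar_words("discriminated", LEXICON, pos="vbn", max_dist=2)) # ['incriminated']
```

Generating forms during every scan trades some search speed for a smaller dictionary file; expanding the lexicon once at load time would be a possible middle ground.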

dhowe closed this on Jun 4, 2022.
KarlieZhao deleted the master branch on June 15, 2022 13:47.