Add flag to permit synset lookup without stemming #17
Comments
This is actually a little complicated. The WordNet access has been heavily dependent on the
>>> from wn import WordNet
>>> wn = WordNet()
>>> wn.synsets('geese')
[Synset('goose.n.01'), Synset('fathead.n.01'), Synset('goose.n.03')]
>>> wn.synsets('mice')
[Synset('mouse.n.01'), Synset('shiner.n.01'), Synset('mouse.n.03'), Synset('mouse.n.04')]
|
Also I think the first argument input to |
Okay, now this is awkward. Actually, without modification, the "eyeglasses" example has been "resolved". Unlike the cyclic nature of the old WordNet API, the new wordnet interface doesn't look at the
Existing behavior of
>>> from wn import WordNet
>>> wn = WordNet()
>>> wn.synsets('eyeglasses')
[Synset('spectacles.n.01')]
Still, I think exposing an argument for users to disable morphy when needed is helpful. Thus #18 |
This is nice, but
Also, I'm with @stevenbird that it's best to make behavior consistent for all languages instead of special-casing English. Since you're replacing the default WordNet module in the NLTK, this seems like a good time to introduce such a change. However, if NLTK follows semantic versioning and you're not ready to make a 4.0 release (because of the backward-compatibility breakage), you could make the default
Finally, it would be even better if users could supply their own lemmatizer. E.g., |
Thanks for these suggestions @goodmami. So default behaviour would be to use a lemmatizer if available else proceed without (issuing the warning). The only change required will be for users of wordnets other than English who have to tweak their code to avoid the warning. And if a function is passed, we use it. |
@goodmami @stevenbird got some free time to look at this again. Let me try to confirm the requirements before I reimplement stuff =)
Do the requirements sound about right? |
That's close to what I was thinking. But more specifically:
import warnings

DEFAULT_LEMMATIZERS = {
    'eng': morphy,
    ...
}

def synsets(word, pos=None, lang='eng', check_exceptions=True, lemmatize=True):
    # lemmatize=True means "use the default lemmatizer for this language, if any"
    if lemmatize is True:
        if lang not in DEFAULT_LEMMATIZERS:
            # warnings.warn takes the message first, then the category
            warnings.warn(
                "No default lemmatizer for language '{}'".format(lang),
                WordNetWarning)
            lemmatize = False
        else:
            lemmatize = DEFAULT_LEMMATIZERS[lang]
    # at this point lemmatize is either falsy or a callable
    if lemmatize:
        word = lemmatize(word, pos=pos, check_exceptions=check_exceptions)
    ...
This way we keep the default behavior, but users can easily disable English lemmatization with
Finally, I now wonder if "lemmatize" is even the right word, because I can imagine users only wanting simple normalization, like downcasing. Maybe |
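For illustration, here is a self-contained toy version of the idea discussed in this comment, so the three modes (default lemmatizer, disabled, user-supplied callable) can be tried without WordNet installed. Everything in it is an assumption made for the sketch: `TOY_INDEX`, `TOY_EXCEPTIONS`, `toy_morphy`, `downcase` and the `WordNetWarning` class are hypothetical stand-ins, not the package's actual data or API, and the toy index ignores `lang` and `pos`.

```python
import warnings


class WordNetWarning(UserWarning):
    """Hypothetical warning category for this sketch."""


# Toy stand-ins, NOT real WordNet data and NOT the real morphy.
TOY_INDEX = {"goose": ["goose.n.01"], "mouse": ["mouse.n.01"]}
TOY_EXCEPTIONS = {"geese": "goose", "mice": "mouse"}


def toy_morphy(word, pos=None, check_exceptions=True):
    """Map a few irregular plurals to their lemma via an exception list."""
    if check_exceptions:
        return TOY_EXCEPTIONS.get(word, word)
    return word


DEFAULT_LEMMATIZERS = {"eng": toy_morphy}


def synsets(word, pos=None, lang="eng", check_exceptions=True, lemmatize=True):
    """Look up `word` in the toy index, optionally normalizing it first."""
    if lemmatize is True:
        # True means "use the language default"; warn if there is none.
        lemmatize = DEFAULT_LEMMATIZERS.get(lang)
        if lemmatize is None:
            warnings.warn(
                "No default lemmatizer for language '{}'".format(lang),
                WordNetWarning,
            )
    if lemmatize:
        # Any callable with this signature works, not only a lemmatizer.
        word = lemmatize(word, pos=pos, check_exceptions=check_exceptions)
    return TOY_INDEX.get(word, [])


def downcase(word, pos=None, check_exceptions=True):
    """A user-supplied normalizer that only lowercases the input."""
    return word.lower()
```

With this sketch, `synsets("geese")` goes through the default, `synsets("geese", lemmatize=False)` looks up the surface form only, and `synsets("Goose", lemmatize=downcase)` applies plain downcasing, which is the kind of simple normalization mentioned above.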
Cf nltk/nltk#2421
I propose that we add a stem=False flag to wn.synsets().
It means that default behaviour for English will change, but I see no other option, given that stemming only happens for English wordnet. This would make behaviour consistent across languages.
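As a rough sketch of what such a flag would mean for callers, the toy lookup below takes a `stem` argument that defaults to `False`. The names `lookup_synsets`, `TOY_LEMMAS` and `TOY_STEMS` are hypothetical, and the stem table is invented for illustration; it does not reproduce what morphy actually returns.

```python
# Toy data for illustration only; not taken from WordNet or morphy.
TOY_LEMMAS = {
    "goose": ["goose.n.01"],
    "eyeglasses": ["spectacles.n.01"],
}
TOY_STEMS = {"geese": "goose", "eyeglasses": "eyeglass"}


def lookup_synsets(word, stem=False):
    """Return toy synset names for `word`.

    With stem=False (the proposed default) the surface form is looked up
    exactly as given. With stem=True the word is first reduced using the
    toy stem table.
    """
    if stem:
        word = TOY_STEMS.get(word, word)
    return TOY_LEMMAS.get(word, [])
```

In this toy setup, `lookup_synsets("geese")` finds nothing while `lookup_synsets("geese", stem=True)` finds the goose entry, and `lookup_synsets("eyeglasses")` finds an entry that the stemmed lookup misses. That is the trade-off the flag exposes: stemming helps with inflected forms but can hide entries stored under the exact surface form.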