
get_top_sumgrams function not working #34

Open
willmhowes opened this issue Nov 2, 2023 · 3 comments

Comments

@willmhowes

Running the code from the README, specifically `sumgrams = get_top_sumgrams(doc_lst, ngram, params=params)`, raises the following exception:

sumgram.py, 1178, (<class 'sklearn.utils._param_validation.InvalidParameterError'>, InvalidParameterError('The \'stop_words\' parameter of CountVectorizer must be a str among {\'english\'}, an instance of \'list\' or None. Got {\'hasnt\', \'be\', \'seeming\', \'doing\', \'e.g\', \'enough\', \'elsewhere\', \'themselves\', \'in\', \'inc.\', \'so\', \'else\', \'a\', \'do\', \'beside\', \'myself\', \'i\', \'became\', \'never\', \'whereupon\', \'seem\', \'detail\', \'several\', \'from\', \'theirs\', \'due\', \'can\', \'e.g.,\', \'yet\', \'none\', \'either\', \'our\', \'alone\', \'he\', \'therein\', \'thereafter\', \'each\', \'were\', \'sometime\', \'besides\', \'move\', \'image\', \'we\', \'everywhere\', \'upon\', \'whole\', \'thus\', \'amongst\', \'ours\', \'this\', \'behind\', \'why\', "hasn\'t", \'her\', \'nowhere\', \'e.g.\', \'once\', \'other\', \'amount\', \'where\', \'thereby\', \'somewhere\', \'inc\', \'becomes\', \'any\', \'ourselves\', \'whither\', \'between\', \'seems\', \'even\', \'an\', \'your\', \'did\', \'could\', \'describe\', \'take\', \'hereafter\', \'whose\', \'eg\', \'its\', \'made\', \'however\', \'sometimes\', \'except\', \'often\', \'along\', \'cannot\', \'one\', \'get\', \'himself\', \'into\', \'part\', \'such\', \'every\', \'his\', \'therefore\', \'latter\', \'re\', \'nobody\', \'above\', \'meanwhile\', \'otherwise\', \'with\', \'see\', \'more\', \'anything\', \'some\', \'moreover\', \'him\', \'cant\', \'formerly\', \'same\', \'seemed\', \'beforehand\', \'somehow\', \'noone\', \'couldnt\', \'well\', \'becoming\', \'whenever\', \'because\', \'there\', \'show\', \'anywhere\', \'during\', \'rather\', \'over\', \'whom\', \'indeed\', \'both\', \'herself\', \'another\', \'would\', \'does\', \'below\', \'the\', \'without\', \'you\', \'anyway\', \'have\', \'thru\', \'who\', \'own\', \'nevertheless\', \'about\', \'via\', \'was\', \'nothing\', \'amoungst\', \'make\', \'being\', \'are\', \'has not\', \'someone\', \'hereupon\', \'former\', "couldn\'t", 
\'nor\', \'us\', \'them\', \'everyone\', \'no\', \'namely\', \'of\', \'until\', \'before\', \'within\', \'is\', \'whereas\', \'to\', \'how\', \'again\', \'around\', \'than\', \'become\', \'whence\', \'thereupon\', \'their\', \'hence\', \'also\', \'although\', \'i.e\', \'or\', \'whoever\', \'mine\', \'others\', \'wherein\', \'go\', \'afterwards\', \'neither\', \'on\', \'yourself\', \'me\', \'hereby\', \'something\', "it\'s", \'may\', \'further\', \'latterly\', \'though\', \'already\', \'too\', \'but\', \'back\', \'herein\', \'yourselves\', \'had\', \'beyond\', \'if\', \'done\', "can\'t", \'which\', \'when\', \'next\', \'keep\', \'has\', \'hers\', \'toward\', \'un\', \'wherever\', \'co\', \'etc.\', \'per\', \'and\', \'whereby\', \'will\', \'might\', \'yours\', \'they\', \'by\', \'anyhow\', \'these\', \'please\', \'through\', \'what\', \'those\', \'put\', \'perhaps\', \'am\', \'only\', \'here\', \'all\', \'against\', \'for\', \'she\', \'then\', \'ever\', \'it\', \'most\', \'ie\', \'thence\', \'as\', \'almost\', \'after\', \'just\', \'itself\', \'could not\', \'de\', \'everything\', \'across\', \'throughout\', \'whereafter\', \'having\', \'towards\', \'off\', \'since\', \'whether\', \'while\', \'still\', \'etc\', \'sincere\', \'less\', \'side\', \'not\', \'anyone\', \'always\', \'among\', \'mostly\', \'that\', \'should\', \'together\', \'i.e.\', \'out\', \'whatever\', \'been\', \'at\', \'must\', \'very\', \'now\', \'my\', \'onto\'} instead.'), <traceback object at 0x11ff3b0c0>)

get_top_sumgrams appears to be broken.
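The message says that scikit-learn's `CountVectorizer` only accepts the string `'english'`, a `list`, or `None` for `stop_words`, but it received a Python `set`. Assuming that is the cause (sumgram handing its stopword set straight to the vectorizer), one workaround is to convert the set to a list first. A minimal sketch, using a hypothetical helper `normalize_stop_words` that is not part of sumgram or scikit-learn:

```python
from typing import Iterable, List, Union


def normalize_stop_words(
    stop_words: Union[str, Iterable[str], None]
) -> Union[str, List[str], None]:
    """Hypothetical helper: coerce stopwords into a type that
    CountVectorizer's `stop_words` parameter accepts ('english', list, None)."""
    if stop_words is None or stop_words == "english":
        return stop_words
    if isinstance(stop_words, str):
        # Any other bare string is rejected by CountVectorizer as well.
        raise ValueError(f"unsupported stop_words string: {stop_words!r}")
    # Sets (and other iterables) become a sorted list, so the result is deterministic.
    return sorted(stop_words)
```

With scikit-learn installed, the result can then be passed as `CountVectorizer(stop_words=normalize_stop_words(stopwords))`. Sorting is only there to make the output reproducible; a plain `list(stopwords)` would satisfy the validator just as well.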

@anwala
Member

anwala commented Nov 2, 2023

Hello @willmhowes,

Thanks for reporting this. Can you please add the code you ran to generate this error? I suspect you might have supplied the wrong type for the stopwords. Please make sure you replicate the Python script example:

```python
import json
from sumgram.sumgram import get_top_sumgrams

doc_lst = [
    {'id': 0, 'text': 'The eye of Category 4 Hurricane Harvey is now over Aransas Bay. A station at Aransas Pass run by the Texas Coastal Observing Network recently reported a sustained wind of 102 mph with a gust to 132 mph. A station at Aransas Wildlife Refuge run by the Texas Coastal Observing Network recently reported a sustained wind of 75 mph with a gust to 99 mph. A station at Rockport reported a pressure of 945 mb on the western side of the eye.'},
    {'id': 1, 'text': 'Eye of Category 4 Hurricane Harvey is almost onshore. A station at Aransas Pass run by the Texas Coastal Observing Network recently reported a sustained wind of 102 mph with a gust to 120 mph.'},
    {'id': 2, 'text': 'Hurricane Harvey has become a Category 4 storm with maximum sustained winds of 130 mph. Sustained hurricane-force winds are spreading onto the middle Texas coast.'}
]

# Use 'add_stopwords' to include a list of additional stopwords not included in the stopwords list
# (https://github.com/oduwsdl/sumgram/blob/0224fc9d54034a25e296dd1c43c09c76244fc3c2/sumgram/util.py#L31)
params = {
    'top_sumgram_count': 10,
    'add_stopwords': ['image'],  # <--- add stopwords here
    'no_rank_sentences': True,
    'title': 'Top sumgrams for Hurricane Harvey text collection'
}

ngram = 2
sumgrams = get_top_sumgrams(doc_lst, ngram, params=params)
with open('sumgrams.json', 'w') as outfile:
    json.dump(sumgrams, outfile)
```

@willmhowes
Author

Thanks for the response, @anwala. To clarify, I ran the example Python script from the README exactly as written and received the error posted above. Are you able to reproduce it?

@anwala
Member

anwala commented Nov 3, 2023

@willmhowes,

I was able to run sumgram successfully. I did hit a different error, which I have fixed in the main branch of sumgram, but I wasn't able to reproduce yours, so I hope this fix works for you too.

Kindly uninstall sumgram (`pip uninstall sumgram`), reinstall it from the main branch (`git clone https://github.com/oduwsdl/sumgram.git; pip install sumgram/`), and then test the code again. If it still fails, please report your versions of sklearn, numpy, and requests, along with the error message.
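One way to collect the requested version numbers is the standard library's `importlib.metadata` (Python 3.8+). This is a small sketch, not part of sumgram; `report_versions` is a hypothetical helper name, and note that sklearn is published under the distribution name `scikit-learn`:

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI (sklearn is distributed as "scikit-learn").
PACKAGES = ("scikit-learn", "numpy", "requests")


def report_versions(packages=PACKAGES):
    """Hypothetical helper: map each distribution name to its installed version."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, ver in report_versions().items():
        print(f"{name}: {ver}")
```

Running this in the same environment where `get_top_sumgrams` fails produces lines that can be pasted straight into the issue alongside the error message.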

Good luck!
