get_top_sumgrams function not working #34

willmhowes · 2023-11-02T21:59:58Z

Running the code from the README, specifically sumgrams = get_top_sumgrams(doc_lst, ngram, params=params), raises the following exception:

sumgram.py, 1178, (<class 'sklearn.utils._param_validation.InvalidParameterError'>, InvalidParameterError('The \'stop_words\' parameter of CountVectorizer must be a str among {\'english\'}, an instance of \'list\' or None. Got {\'hasnt\', \'be\', \'seeming\', \'doing\', \'e.g\', \'enough\', \'elsewhere\', \'themselves\', \'in\', \'inc.\', \'so\', \'else\', \'a\', \'do\', \'beside\', \'myself\', \'i\', \'became\', \'never\', \'whereupon\', \'seem\', \'detail\', \'several\', \'from\', \'theirs\', \'due\', \'can\', \'e.g.,\', \'yet\', \'none\', \'either\', \'our\', \'alone\', \'he\', \'therein\', \'thereafter\', \'each\', \'were\', \'sometime\', \'besides\', \'move\', \'image\', \'we\', \'everywhere\', \'upon\', \'whole\', \'thus\', \'amongst\', \'ours\', \'this\', \'behind\', \'why\', "hasn\'t", \'her\', \'nowhere\', \'e.g.\', \'once\', \'other\', \'amount\', \'where\', \'thereby\', \'somewhere\', \'inc\', \'becomes\', \'any\', \'ourselves\', \'whither\', \'between\', \'seems\', \'even\', \'an\', \'your\', \'did\', \'could\', \'describe\', \'take\', \'hereafter\', \'whose\', \'eg\', \'its\', \'made\', \'however\', \'sometimes\', \'except\', \'often\', \'along\', \'cannot\', \'one\', \'get\', \'himself\', \'into\', \'part\', \'such\', \'every\', \'his\', \'therefore\', \'latter\', \'re\', \'nobody\', \'above\', \'meanwhile\', \'otherwise\', \'with\', \'see\', \'more\', \'anything\', \'some\', \'moreover\', \'him\', \'cant\', \'formerly\', \'same\', \'seemed\', \'beforehand\', \'somehow\', \'noone\', \'couldnt\', \'well\', \'becoming\', \'whenever\', \'because\', \'there\', \'show\', \'anywhere\', \'during\', \'rather\', \'over\', \'whom\', \'indeed\', \'both\', \'herself\', \'another\', \'would\', \'does\', \'below\', \'the\', \'without\', \'you\', \'anyway\', \'have\', \'thru\', \'who\', \'own\', \'nevertheless\', \'about\', \'via\', \'was\', \'nothing\', \'amoungst\', \'make\', \'being\', \'are\', \'has not\', \'someone\', \'hereupon\', \'former\', "couldn\'t", 
\'nor\', \'us\', \'them\', \'everyone\', \'no\', \'namely\', \'of\', \'until\', \'before\', \'within\', \'is\', \'whereas\', \'to\', \'how\', \'again\', \'around\', \'than\', \'become\', \'whence\', \'thereupon\', \'their\', \'hence\', \'also\', \'although\', \'i.e\', \'or\', \'whoever\', \'mine\', \'others\', \'wherein\', \'go\', \'afterwards\', \'neither\', \'on\', \'yourself\', \'me\', \'hereby\', \'something\', "it\'s", \'may\', \'further\', \'latterly\', \'though\', \'already\', \'too\', \'but\', \'back\', \'herein\', \'yourselves\', \'had\', \'beyond\', \'if\', \'done\', "can\'t", \'which\', \'when\', \'next\', \'keep\', \'has\', \'hers\', \'toward\', \'un\', \'wherever\', \'co\', \'etc.\', \'per\', \'and\', \'whereby\', \'will\', \'might\', \'yours\', \'they\', \'by\', \'anyhow\', \'these\', \'please\', \'through\', \'what\', \'those\', \'put\', \'perhaps\', \'am\', \'only\', \'here\', \'all\', \'against\', \'for\', \'she\', \'then\', \'ever\', \'it\', \'most\', \'ie\', \'thence\', \'as\', \'almost\', \'after\', \'just\', \'itself\', \'could not\', \'de\', \'everything\', \'across\', \'throughout\', \'whereafter\', \'having\', \'towards\', \'off\', \'since\', \'whether\', \'while\', \'still\', \'etc\', \'sincere\', \'less\', \'side\', \'not\', \'anyone\', \'always\', \'among\', \'mostly\', \'that\', \'should\', \'together\', \'i.e.\', \'out\', \'whatever\', \'been\', \'at\', \'must\', \'very\', \'now\', \'my\', \'onto\'} instead.'), <traceback object at 0x11ff3b0c0>)

get_top_sumgrams appears to be broken.
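For context, the message above says CountVectorizer's stop_words must be the string 'english', a list, or None, and that it received a set instead. The sketch below shows one way to convert a stop word collection into an accepted form before it reaches CountVectorizer. The helper name as_stop_word_list is hypothetical and is not part of sumgram or scikit-learn; whether sumgram's own fix takes this form is an assumption.

```python
def as_stop_word_list(stop_words):
    """Return stop_words as 'english', None, or a list (hypothetical helper)."""
    # None and a string such as 'english' are already accepted per the error message.
    if stop_words is None or isinstance(stop_words, str):
        return stop_words
    # Sets, frozensets, and tuples are converted to a list; sorting makes the order stable.
    return sorted(stop_words)


# A small excerpt of the set shown in the error message above.
stop_words = {'hasnt', 'be', 'seeming', "hasn't"}
fixed = as_stop_word_list(stop_words)
print(type(fixed).__name__)

# Assumed usage, left as a comment because it needs scikit-learn installed:
# CountVectorizer(stop_words=as_stop_word_list(stop_words))
```

Converting at the call site keeps the set semantics (fast membership tests) elsewhere in the code while still passing a type the validator allows.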


anwala · 2023-11-02T22:33:06Z

Hello @willmhowes,

Thanks for reporting this. Could you please share the code you ran to produce this error? I suspect the wrong type may have been supplied for the stopwords. Please make sure your code matches the Python script example:

import json
from sumgram.sumgram import get_top_sumgrams

doc_lst = [
    {'id': 0, 'text': 'The eye of Category 4 Hurricane Harvey is now over Aransas Bay. A station at Aransas Pass run by the Texas Coastal Observing Network recently reported a sustained wind of 102 mph with a gust to 132 mph. A station at Aransas Wildlife Refuge run by the Texas Coastal Observing Network recently reported a sustained wind of 75 mph with a gust to 99 mph. A station at Rockport reported a pressure of 945 mb on the western side of the eye.'},
    {'id': 1, 'text': 'Eye of Category 4 Hurricane Harvey is almost onshore. A station at Aransas Pass run by the Texas Coastal Observing Network recently reported a sustained wind of 102 mph with a gust to 120 mph.'},
    {'id': 2, 'text': 'Hurricane Harvey has become a Category 4 storm with maximum sustained winds of 130 mph. Sustained hurricane-force winds are spreading onto the middle Texas coast.'}
  ]

'''
  Use 'add_stopwords' to include list of additional stopwords not included in stopwords list (https://github.com/oduwsdl/sumgram/blob/0224fc9d54034a25e296dd1c43c09c76244fc3c2/sumgram/util.py#L31)
'''
params = {
    'top_sumgram_count': 10,
    'add_stopwords': ['image'],#<--- add stopwords here.
    'no_rank_sentences': True,
    'title': 'Top sumgrams for Hurricane Harvey text collection'
}

ngram = 2
sumgrams = get_top_sumgrams(doc_lst, ngram, params=params)
with open('sumgrams.json', 'w') as outfile:
  json.dump(sumgrams, outfile)

willmhowes · 2023-11-02T22:40:00Z

Thanks for the response, @anwala. To clarify, I ran the example python script from the README exactly as written and received the error posted above. Are you able to replicate?

anwala · 2023-11-03T14:46:46Z

@willmhowes,

I wasn't able to reproduce your error, but I did hit a different one, which I've fixed in the main branch of sumgram. With that fix I can run sumgram successfully, so I hope it works for you too.

Kindly uninstall sumgram ($ pip uninstall sumgram), reinstall it from the main branch ($ git clone https://github.com/oduwsdl/sumgram.git; pip install sumgram/), and then test the code again. If it still fails, please report your versions of sklearn, numpy, and requests along with the error message.

Good luck!
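One way to collect the versions requested above is sketched below, using only the standard library (importlib.metadata, available from Python 3.8). The helper name installed_versions is hypothetical. Note that sklearn is published under the distribution name scikit-learn, and including sumgram in the list is an addition for convenience.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or 'not installed'."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = 'not installed'
    return found


report = installed_versions(['scikit-learn', 'numpy', 'requests', 'sumgram'])
print('python', platform.python_version())
for name, ver in report.items():
    print(name, ver)
```

Pasting this output alongside the full error message should make it easier to tell whether the failure depends on the installed scikit-learn version.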
