Clustering News With Artificial Intelligence And Lots of Love

This is an experimental project to cluster news articles. The technologies used include:

Text modelling:

word2vec (gensim)
doc2vec (gensim)
fastText
LDA (gensim)

Database:

Redis
MongoDB

Web back-end:

Flask

Nearest-neighbour Approximation:

Annoy
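
For context, Annoy approximates a nearest-neighbour search that would otherwise compare a query against every stored vector. The sketch below shows that exact, brute-force search in plain Python. It does not use Annoy and is not taken from this repository: the function names, the toy vectors, and the choice of cosine similarity are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbours(query, vectors, k=2):
    """Return the ids of the k vectors most similar to `query`, best first.

    `vectors` maps a (hypothetical) article id to its embedding. Every
    vector is scored, so the cost grows linearly with the number of articles.
    """
    ranked = sorted(
        vectors,
        key=lambda doc_id: cosine_similarity(query, vectors[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-dimensional "article embeddings" (made up for this example).
docs = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
}
print(nearest_neighbours([1.0, 0.0], docs, k=2))
```

An approximate index trades a little accuracy for speed by avoiding the full scan, which matters as the number of articles grows.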

Notes

Currently, document processing is a bit slow (about 10 minutes for about 3,000 articles, or roughly 0.2 seconds per article).

Installation

git clone https://github.com/amirothman/news_aggregator

Install any missing dependencies with:

pip install <package-name>

Create the following empty directories if they do not exist:

model
corpus
similarity_index
textfiles
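
The directories listed above can also be created from Python. This is a minimal sketch, assuming they belong in the repository root. The helper name `create_empty_dirs` and the `base` parameter are hypothetical and are not part of the project's code.

```python
from pathlib import Path

# Directory names taken from the list above.
REQUIRED_DIRS = ["model", "corpus", "similarity_index", "textfiles"]


def create_empty_dirs(base="."):
    """Create each required directory under `base`; return the names newly created."""
    created = []
    for name in REQUIRED_DIRS:
        path = Path(base) / name
        if not path.exists():
            created.append(name)
        # exist_ok=True makes the call a no-op for directories already present.
        path.mkdir(parents=True, exist_ok=True)
    return created


if __name__ == "__main__":
    print(create_empty_dirs())
```

Running it a second time creates nothing and returns an empty list, so it is safe to call before every launch.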

Launching the web server

With Flask:

FLASK_APP=webapp.py flask run

With Gunicorn:

gunicorn -b 0.0.0.0:5000 webapp:app

