This is a repository of web crawling work undertaken while working with Dr. Shoaib Jameel and Senior Data Scientist Mozhgan Talebpour.
The main aim of the work is to use web crawling to collect images and label them. To achieve this, several different approaches were tested, all of which are explained below.
The work was undertaken over a 7-week period in collaboration with the Frontrunners program at the University of Essex.
All work is my own; any code or concepts taken or adapted from elsewhere are acknowledged.
- beautifulSoup_test.py
- getImages.py
- iterativeGetImages.py
- imageLinks.txt
- singleOutput.txt
- output.txt
- README.md
- sportPictureLabeling.py
- result.csv
- README.md
- link_extraction_old.py
- link_extraction.py
- image_labeler.py
- independent_crawler.py
- links.csv
- result.csv
- science_and_tech_result.csv
- README.md
- text_filtering.py
- image_filtering.py
- operation_speed.py
- improved_independent_crawler.py
- uk_and_americas_news_result.cvs
- tech_and_science_result.csv
- README.md