This is a repository of web crawling work undertaken while working with Dr. Shoaib Jameel and Senior Data Scientist Mozhgan Talebpour.

KnutSander/web-crawling-work

Web Crawling Work

This is a repository of web crawling work undertaken while working with Dr. Shoaib Jameel and Senior Data Scientist Mozhgan Talebpour. The main aim of the work is to use web crawling to collect images and label them. To achieve this, several different approaches were tested, all of which are explained below. The work was undertaken over a seven-week period in collaboration with the Frontrunners program at the University of Essex. All work is my own creation; any code or concepts taken or adapted from elsewhere are acknowledged.

Image Crawling

Folder Contents

  • beautifulSoup_test.py
  • getImages.py
  • iterativeGetImages.py
  • imageLinks.txt
  • singleOutput.txt
  • output.txt
  • README.md
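
As a rough illustration of the image-crawling step, the sketch below pulls image links and their alt text out of an HTML page. It is not the code in this folder: the file names suggest the scripts use BeautifulSoup, whereas this example uses only Python's standard library so it runs without extra dependencies. The names `ImageLinkParser`, `extract_image_links`, and `SAMPLE_HTML` are hypothetical, and the page is a hard-coded string rather than a fetched URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collect the absolute URL and alt text of every <img> tag (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []  # list of (absolute_url, alt_text) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src")
        if src:  # skip <img> tags that have no usable source
            alt = attr_map.get("alt") or ""
            # Resolve relative paths against the page URL.
            self.images.append((urljoin(self.base_url, src), alt))


def extract_image_links(html, base_url):
    """Return (url, alt) pairs for all images found in the given HTML string."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.images


# Illustrative input only; a real crawler would download the page first.
SAMPLE_HTML = (
    '<html><body>'
    '<img src="/a.png" alt="A">'
    '<img src="https://example.com/b.jpg"/>'
    '<img alt="no src">'
    '</body></html>'
)

if __name__ == "__main__":
    for url, alt in extract_image_links(SAMPLE_HTML, "https://example.org/page"):
        print(url, alt)
```

The alt text is kept alongside each link because it is one plausible source of a label for the image; whether the scripts in this repository use it that way is an assumption.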

Topic Searching

Folder Contents

  • sportPictureLabeling.py
  • result.csv
  • README.md

Image Labeling

Folder Contents

  • link_extraction_old.py
  • link_extraction.py
  • image_labeler.py
  • independent_crawler.py
  • links.csv
  • result.csv
  • science_and_tech_result.csv
  • README.md

Improved Image Labeling

Folder Contents

  • text_filtering.py
  • image_filtering.py
  • operation_speed.py
  • improved_independent_crawler.py
  • uk_and_americas_news_result.cvs
  • tech_and_science_result.csv
  • README.md

Data Analysis

Folder Contents


Folder Name

Folder Contents

