Web Crawling Work

This is a repository of web crawling work undertaken while working with Dr. Shoaib Jameel and Senior Data Scientist Mozhgan Talebpour. The main aim of the work is to use web crawling to collect images and label them. To achieve this, several different approaches were tested, all of which are explained below. The work was undertaken over a 7-week period in collaboration with the Frontrunners program at the University of Essex. All work is my own; any code or concepts taken or adapted from elsewhere are acknowledged.

Image Crawling

Folder Contents

beautifulSoup_test.py
getImages.py
iterativeGetImages.py
imageLinks.txt
singleOutput.txt
output.txt
README.md
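At its simplest, crawling for images means pulling image URLs out of a page and saving them, for example to a text file such as `imageLinks.txt`. The sketch below illustrates that idea under stated assumptions: it uses Python's standard-library `html.parser` instead of Beautiful Soup, takes the HTML as a string rather than fetching it, and the names `ImageLinkParser`, `extract_image_links` and `save_links` are hypothetical. It is not the code in `getImages.py`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collects the src URL of every <img> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img ... /> via the default handle_startendtag
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Turn relative paths such as "/media/a.jpg" into absolute URLs
            self.links.append(urljoin(self.base_url, src))


def extract_image_links(html, base_url):
    """Hypothetical helper: return unique image URLs in first-seen order."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.links))


def save_links(links, path):
    """Hypothetical helper: write one URL per line, e.g. to imageLinks.txt."""
    with open(path, "w", encoding="utf-8") as f:
        for link in links:
            f.write(link + "\n")


if __name__ == "__main__":
    page = (
        "<html><body>"
        '<img src="/media/a.jpg">'
        '<img src="https://cdn.example.com/b.png" />'
        '<img alt="no source">'
        '<img src="/media/a.jpg">'
        "</body></html>"
    )
    print(extract_image_links(page, "https://news.example.com/story"))
    # ['https://news.example.com/media/a.jpg', 'https://cdn.example.com/b.png']
```

Resolving each `src` with `urljoin` matters because pages often use relative paths, and de-duplicating keeps an image that appears several times on a page from being saved twice.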

Topic Searching

Folder Contents

sportPictureLabeling.py
result.csv
README.md

Image Labeling

Folder Contents

link_extraction_old.py
link_extraction.py
image_labeler.py
independent_crawler.py
links.csv
result.csv
science_and_tech_result.csv
README.md
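One very simple way to attach a label to a crawled image is to reuse the page's own description of it. The sketch below assumes the `alt` attribute of each `<img>` tag is a usable provisional label and produces `image_url,label` rows as CSV text, similar in spirit to the `result.csv` files listed above. The alt-text heuristic, the column names and the helpers `label_images` and `rows_to_csv` are illustrative assumptions, not the approach used in `image_labeler.py`.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class AltTextLabeler(HTMLParser):
    """Pairs each <img> src with its alt text (assumed here to be a usable label)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        # Valueless attributes parse as None, so fall back to an empty string
        alt = (attributes.get("alt") or "").strip()
        # Hypothetical filter: keep only images with both a source and a description
        if src and alt:
            self.rows.append((urljoin(self.base_url, src), alt))


def label_images(html, base_url):
    """Hypothetical helper: return (image_url, label) pairs found in one page."""
    labeler = AltTextLabeler(base_url)
    labeler.feed(html)
    labeler.close()
    return labeler.rows


def rows_to_csv(rows):
    """Hypothetical helper: serialise the pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["image_url", "label"])
    writer.writerows(rows)  # labels containing commas are quoted automatically
    return buffer.getvalue()


if __name__ == "__main__":
    page = (
        '<img src="/sport/match.jpg" alt="Football match at Wembley">'
        '<img src="/sport/logo.png">'
        '<img src="/tech/chip.jpg" alt="New chip, close-up">'
    )
    rows = label_images(page, "https://news.example.com/")
    print(rows_to_csv(rows), end="")
```

Alt text is frequently missing or generic on real pages, so this heuristic would at best give noisy labels; it mainly illustrates producing one labeled row per image.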

Improved Image Labeling

Folder Contents

text_filtering.py
image_filtering.py
operation_speed.py
improved_independent_crawler.py
uk_and_americas_news_result.cvs
tech_and_science_result.csv
README.md

Top-Level Files and Folders

Week 1 & 2: Image Crawling
Week 3: Topic Searching
Week 4 & 5: Image Labeling
Week 6: Improved Image Labeling
Week 7: Data Analysis
.gitignore
README.md

Repository: KnutSander/web-crawling-work
