This Python script scans a website for pages and downloads them. Note: it will not download media files such as photos, videos, and so on.
Download directory: script execution directory.
It indexes all pages and creates a file "num2name.txt", where you can look up the original page name for each numbered XML document (see below). It then downloads every page in XML format with its full history. The files are named "0.xml", "1.xml", and so on.
Tested on Linux, where it works. It should work on Mac and Windows as well.
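The download step relies on MediaWiki's Special:Export page. The sketch below is a minimal illustration of fetching a single page's full history as XML, not the script itself; the base URL, the /index.php path, and the page title are placeholder assumptions, and whether full-history export is allowed over a plain GET request depends on the wiki's configuration.

    # Minimal sketch: fetch one page's revision history as XML via
    # MediaWiki's Special:Export. Base URL, path, and page title are
    # placeholder assumptions; some wikis restrict history export over GET.
    import urllib.parse
    import urllib.request

    base_url = "http://wiki.example.com"   # same form as the script's URL argument
    page_title = "Main_Page"               # hypothetical page name

    params = urllib.parse.urlencode({
        "title": "Special:Export",
        "pages": page_title,
        "history": "1",                    # all revisions, not just the latest
    })
    with urllib.request.urlopen(base_url + "/index.php?" + params) as resp:
        xml_data = resp.read()

    with open("0.xml", "wb") as f:         # the script names files 0.xml, 1.xml, ...
        f.write(xml_data)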
- Obviously, Python
- BeautifulSoup4 (pip install bs4)
python mediawiki_downloader.py "url"
The URL should look like the example below (with 'http://' or 'https://' at the beginning).
Example:
python mediawiki_downloader.py "http://wiki.example.com"
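Once the script finishes, num2name.txt lets you map each numbered XML file back to its original page title. The exact line format of the file is not documented here; the sketch below assumes, purely for illustration, that each line pairs a file number with a page name (e.g. "0 Main_Page").

    # Hedged sketch: build a number -> page-name lookup from num2name.txt.
    # ASSUMPTION: each line is "<number> <page name>"; adjust the parsing
    # if the real file uses a different separator or layout.
    num2name = {}
    with open("num2name.txt", encoding="utf-8") as f:
        for line in f:
            num, _, name = line.rstrip("\n").partition(" ")
            if num and name:
                num2name[num] = name

    print(num2name.get("0"))  # original page title behind 0.xml, if present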