Archive Grand Sumo tournament highlights, as they are removed 😭 before each new tournament.
Update basho.json with the latest source and metadata, e.g.
"201609": {
  "en": "http://www.sumo.or.jp/EnHonbashoTopicsKoTorikumi15/wrap",
  "ja": "http://www.sumo.or.jp/ResultDataKoTorikumi15/wrap",
  "date": "7 Oct 2016",
  "title": "Aki 2016 (September) Grand Sumo Highlights",
  "archive": "honbasho-201609-aki",
  "description": "<b>Aki 2016</b>\n\nTokyo, Ryogoku Kokugikan\n\nSeptember 11, 2016 - September 25, 2016\n\n"
}
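Before crawling, it can help to confirm that a new entry is complete. The following is a minimal sketch, not part of this repo: it assumes basho.json is a single JSON object keyed by selector, and it treats every key in the example entry as required. The names `REQUIRED_KEYS`, `missing_keys`, and `check_basho` are hypothetical.

```python
import json

# Keys copied from the example entry; treating all of them as
# required is an assumption, not something the scripts enforce.
REQUIRED_KEYS = ("en", "ja", "date", "title", "archive", "description")


def missing_keys(entry):
    """Return the required keys that are absent or empty in one entry."""
    return [key for key in REQUIRED_KEYS if not entry.get(key)]


def check_basho(text):
    """Map each incomplete selector (e.g. "201609") to its missing keys.

    Assumes the file is one JSON object of {selector: entry} pairs.
    Complete entries are left out of the result.
    """
    problems = {}
    for selector, entry in json.loads(text).items():
        missing = missing_keys(entry)
        if missing:
            problems[selector] = missing
    return problems
```

Calling `check_basho(open("basho.json").read())` returns an empty dict when every entry has all six keys.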
Get highlights metadata:
$ mkdir {dest}
$ crawl.py {selector} > {dest}/data.json
Download movies and text:
$ download.py {dest} {dest}/data.json
Make highlights HTML index:
$ index.py {dest}/data.json {selector} > {dest}/highlights.html
- Add a description for the archive page in basho.json
- Move crawl HTML out of {dest}/
- Make sure {selector} and {dest} have the same name (e.g. 201607)
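The naming check in the last item can be done mechanically. This is a small sketch under the assumption that a selector is always six digits (YYYYMM, as in 201607); `names_match` is a hypothetical helper, not one of the scripts above.

```python
import re
from pathlib import Path


def names_match(selector, dest):
    """True when selector is six digits and dest's last path component equals it.

    The six-digit (YYYYMM) form is an assumption based on the
    examples 201607 and 201609.
    """
    if not re.fullmatch(r"\d{6}", selector):
        return False
    # Path.name ignores a trailing slash, so "201607/" still matches.
    return Path(dest).name == selector
```

For example, `names_match("201607", "./201607/")` is true, while `names_match("201607", "aki")` is false.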
Review metadata changes to be made:
$ upload.py {selector}
Upload files and modify metadata:
$ upload.py {selector} -u # upload files
$ upload.py {selector} -m # modify metadata
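To preview what the metadata step might send, one possible mapping from a basho.json entry to an archive identifier and metadata dict is sketched below. This is an assumption about the shape of the data, not a description of what upload.py actually does: the helper `archive_metadata` is hypothetical, and normalizing the date to ISO form is an illustrative choice.

```python
from datetime import datetime


def archive_metadata(entry):
    """Build (identifier, metadata) from one basho.json entry.

    Assumptions: the "archive" value is used as the item identifier,
    and "date" is written like "7 Oct 2016" (day, abbreviated English
    month, year), which is parsed and re-emitted as YYYY-MM-DD.
    """
    identifier = entry["archive"]
    date = datetime.strptime(entry["date"], "%d %b %Y").date().isoformat()
    metadata = {
        "title": entry["title"],
        "description": entry["description"],
        "date": date,
    }
    return identifier, metadata
```

The resulting pair is in the form that the internetarchive library's `modify_metadata(identifier, metadata=...)` accepts, though whether upload.py calls it this way is not confirmed here.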
- Check out the gh-pages branch and update index.html
- See https://siznax.github.io/honbasho
Thanks to the Internet Archive for hosting, and @jjjake for the excellent internetarchive Python library.
@siznax