The Wikidata Toolkit

Python version License: MIT

A Python project with WikiBot implementations to fix consistency issues on Wikidata.

Table of Contents

  1. Introduction
  2. Design
  3. Usage
    1. Pre-Requisites
    2. Sample Commands
    3. Canned Scripts
  4. Tests
  5. Contributing

Introduction

This repo contains a few utility scripts that fix consistency issues and missing data on Wikidata, focusing on TV series.

It is used by my Wikidata bot, TheFireBenderBot. Check out its contributions to get an idea of what it specializes in. Here are some stats.

Design

Architecture Diagram

constraint.py contains the abstract definition for the concept of a Constraint. This is similar to how Wikidata defines constraints, except that the implementation may contain a way to fix them.
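As a rough illustration of this idea, a constraint can be modelled as an object that knows how to validate an item and, optionally, how to repair it. The sketch below is not the code from constraint.py: the names `SketchConstraint` and `HasEnglishLabel`, and the plain-dict item, are hypothetical and chosen only to show the shape of the abstraction.

```python
from abc import ABC, abstractmethod


class SketchConstraint(ABC):
    """Hypothetical stand-in for a constraint: a check plus an optional fix."""

    @abstractmethod
    def validate(self, item: dict) -> bool:
        """Return True if the item satisfies the constraint."""

    def fix(self, item: dict) -> bool:
        """Try to repair the item; by default no automatic fix is available."""
        return False


class HasEnglishLabel(SketchConstraint):
    """Hypothetical constraint: the item must have a non-empty English label."""

    def validate(self, item: dict) -> bool:
        return bool(item.get("labels", {}).get("en"))

    def fix(self, item: dict) -> bool:
        # Assumption for this sketch: the item's title can be copied into its label.
        title = item.get("title")
        if not title:
            return False
        item.setdefault("labels", {})["en"] = title
        return True
```

Keeping `fix` optional means a constraint that can only be reported, not repaired, simply inherits the default.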

general.py and tv.py contain a few concrete implementations for constraints.

bots contains various Bot implementations that can be used to iterate through Wikidata pages using a generator, and treat (process) them.
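The generator-and-treat loop can be sketched in plain Python. This follows the general run/treat pattern used by Pywikibot bots, but it is only an illustration: `SketchBot`, `sketch_generator`, and the dict-based pages are hypothetical and do not come from the bots package.

```python
from typing import Iterable, Iterator, List


def sketch_generator(item_ids: Iterable[str]) -> Iterator[dict]:
    """Hypothetical generator: lazily yields one fake page per item ID."""
    for item_id in item_ids:
        yield {"id": item_id}


class SketchBot:
    """Hypothetical bot: pulls pages from a generator and treats each one."""

    def __init__(self, generator: Iterator[dict]):
        self.generator = generator
        self.treated: List[str] = []

    def treat(self, page: dict) -> None:
        # A real bot would check or edit the page here; the sketch only records it.
        self.treated.append(page["id"])

    def run(self) -> None:
        for page in self.generator:
            self.treat(page)
```

Because the bot only depends on an iterator of pages, the same `treat` logic can be reused with a different source of items.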

television.py contains abstract models for the concepts of Episode, Season, Series and more. Each model has some semantic knowledge of the item it encapsulates, as well as the constraints it should be checked for.

wikidata_properties.py defines constants for Wikidata property IDs and a few commonly used item IDs. A list of all properties can be found here.

Usage

Pre-Requisites

Account Setup

In order to run the scripts, you need to create a Bot account on Wikidata. Bot names usually end with the suffix "Bot". Once you have the appropriate credentials, create the following files:

user-config.py

family = 'wikidata'
mylang = 'wikidata'

usernames['wikidata']['wikidata'] = u'YourBotName'
password_file = "user-password.py"

user-password.py

(u'YourBotName', BotPassword(u'YourBotName', u'YourBotPassword'))

OR

(u'YourUserName@YourBotName', u'YourBotPassword')

Also see the Wikidata page on Bots.

Requirements

Next, you need to install dependencies using the requirements.txt file. This is best done using a virtualenv and pip3:

virtualenv pywiki
source pywiki/bin/activate
pip3 install -r requirements.txt

Sample Commands

Checking and Fixing Constraints

  1. Checking individual items for constraint failures:
    # Q65604139 = Season 1 of "Dark"
    # Q65640227 Q65640226 Q65640224 = Episodes of "Dark"
    python3 check_constraints.py Q65640227 Q65640226 Q65640224 Q65604139
  2. Checking the episodes of a series (Jessica Jones) for constraint failures:
    # Q18605540 = Jessica Jones
    python3 check_tv_show.py Q18605540 \
        --child_type=episode
  3. Checking and fixing the seasons of a series for constraint failures:
    # Q18605540 = Jessica Jones
    python3 check_tv_show.py Q18605540 \
        --child_type=season \
        --autofix
  4. Checking and fixing the episodes of a series for constraint failures, but deferring the fixes until all failures have been reported:
    # Q18605540 = Jessica Jones
    python3 check_tv_show.py Q18605540 \
        --child_type=episode \
        --autofix \
        --accumulate
  5. Fixing only the titles of episodes of a series:
    # Q18605540 = Jessica Jones
    python3 check_tv_show.py Q18605540 \
        --child_type=episode \
        --autofix \
        --accumulate \
        --filter title
    An equivalent command, using the property ID instead of its name, is:
    # Q18605540 = Jessica Jones
    python3 check_tv_show.py Q18605540 \
        --child_type=episode \
        --autofix \
        --accumulate \
        --filter P1476
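The last two commands in the list above are equivalent because P1476 is the Wikidata property ID for "title". One way such a filter could accept either form is sketched below. This is an assumption about the behaviour, not the actual logic in check_tv_show.py, and `PROPERTY_ALIASES` and `resolve_filter` are hypothetical names.

```python
import re

# Hypothetical alias table; P1476 is the Wikidata "title" property.
PROPERTY_ALIASES = {"title": "P1476"}


def resolve_filter(value: str) -> str:
    """Return a property ID for either an alias ('title') or a raw ID ('P1476')."""
    normalized = value.strip()
    # Raw property IDs look like "P" followed by digits.
    if re.fullmatch(r"[Pp]\d+", normalized):
        return normalized.upper()
    try:
        return PROPERTY_ALIASES[normalized.lower()]
    except KeyError:
        raise ValueError(f"Unknown property filter: {value!r}") from None
```

With this sketch, `resolve_filter("title")` and `resolve_filter("P1476")` return the same ID, so both spellings would select the same constraints.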

Fetching/Updating Data from Wikipedia

  1. Get the list of episodes for The Neighborhood:

    # This will write out two files
    # the-neighborhood-tv-series_S01.csv and
    # the-neighborhood-tv-series_S02.csv
    python3 -m cli.list_episodes "https://en.wikipedia.org/wiki/The_Neighborhood_(TV_series)" --episode-counts=21,22
  2. Create seasons in Wikidata

    # Create two seasons for Q7753382 (The Neighborhood)
    python3 -m cli.create_seasons Q7753382 2
  3. Create the episodes in Wikidata:

    python3 -m cli.create_episodes Q7753382 Q99419240 the-neighborhood-tv-series_S01.csv --quickstatements
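The first step above produces one CSV file per season. Assuming that `--episode-counts=21,22` gives the number of episodes in each season, in order, the split and the file naming could look like the sketch below. The helper names are hypothetical, and this does not show how cli.list_episodes actually reads the Wikipedia page.

```python
from itertools import islice
from typing import Iterable, List


def parse_episode_counts(raw: str) -> List[int]:
    """Turn a value such as '21,22' into [21, 22]."""
    return [int(part) for part in raw.split(",")]


def split_by_season(episodes: Iterable, counts: List[int]) -> List[list]:
    """Cut a flat, ordered episode list into consecutive per-season chunks."""
    iterator = iter(episodes)
    seasons = []
    for count in counts:
        chunk = list(islice(iterator, count))
        if len(chunk) < count:
            raise ValueError("Fewer episodes than the counts require")
        seasons.append(chunk)
    return seasons


def season_filename(slug: str, season_number: int) -> str:
    """Build a name such as 'the-neighborhood-tv-series_S01.csv'."""
    return f"{slug}_S{season_number:02d}.csv"
```

For example, 43 episodes with counts `[21, 22]` would yield two chunks, destined for the `_S01.csv` and `_S02.csv` files.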

Canned Scripts

A few fixes are straightforward and should not require supervision. The canned folder exposes these fixes as scripts that can be run directly without any arguments. To preview the changes a script would make, run it with the --dry flag.

Example:

# Dry run mode, won't update labels
python3 -m canned.fix_missing_labels --dry

# Run after confirming that the changes look correct
python3 -m canned.fix_missing_labels
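A dry-run flag of this kind is commonly wired up with argparse. The sketch below shows the general pattern only. It is not the code of canned.fix_missing_labels, and `build_parser`, `apply_fixes`, and the `(item_id, label)` tuples are hypothetical.

```python
import argparse
from typing import List, Tuple


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of a canned fix script")
    parser.add_argument(
        "--dry",
        action="store_true",
        help="report the planned changes without saving them",
    )
    return parser


def apply_fixes(pending: List[Tuple[str, str]], dry: bool) -> List[Tuple[str, str]]:
    """Report every planned change; only 'save' it when not in dry-run mode."""
    applied = []
    for item_id, label in pending:
        print(f"{item_id}: set label to {label!r}")
        if not dry:
            # A real script would write the change to Wikidata at this point.
            applied.append((item_id, label))
    return applied
```

Both modes print the same report, so the dry run shows exactly what the real run would attempt.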

Tests

Run pytest at the root of the repository. You should see something similar to:

================================== test session starts ==================================
platform darwin -- Python 3.7.6, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
rootdir: /foo/bar/baz/wikidata-toolkit
plugins: mock-3.3.1
collected 4 items

cli/test_cli.py ....                                                              [100%]

=================================== 4 passed in 3.40s ===================================

Contributing

Hacktoberfest

Hello there! If you are a Hacktoberfest 🎃 participant and wish to contribute to this repository, you can:

  1. Pick an issue with the hacktoberfest label
  2. Fork this repository
  3. Clone your fork to your local machine
  4. Create a new branch
  5. Work on the issue on this new branch
  6. Push your branch to your fork
  7. Send a PR!