A little auk goes a long way

BotWikiAwk is a framework and set of libraries for creating and running bots on Wikipedia.

Features

Overview

BotWikiAwk contains two elements:

Why awk? Awk is a small, elegant language whose interpreter is a single binary file. It is a POSIX tool installed on most Unix computers. The language syntax is simple and forgiving. It is usually associated with one-line scripts, but since about 2012 the GNU version has become more powerful. Although not a general-purpose language, awk is primarily a text-processing language, which is exactly the kind of work bots do. Tasks that awk cannot support directly (e.g. networking) are executed through external programs.

BotWikiAwk is batch oriented. After creating a master list of articles, it carves out batches, each of which is assigned a unique name called a project ID. Each utility takes as input the project ID and the action to take for that project. Projects can be any size, up to the full size of the master list, i.e. a single project.

Requirements

Setup

If installing on Toolforge see special instructions.

export AWKPATH=.:/home/adminuser/BotWikiAwk/lib:/usr/local/share/awk
If on Toolforge see special instructions
PATH=$PATH:/home/adminuser/BotWikiAwk/bin
Change #1) StopButton URL
Change #2) UserPage URL

New bot

To create a new bot:

makebot ~/botname

The path should point to a new directory that has not been created yet, with "botname" being the name of your bot (no spaces recommended). The path can be anywhere, but if it differs from the default ~/BotWikiAwk/bots directory, also update section #3 of ~/BotWikiAwk/lib/botwiki.awk, following the "mybot" example.

I find locating the bot outside the ~/BotWikiAwk directories makes it easier to upgrade BotWikiAwk later. One can simply delete everything and re-clone it (saving only the original botwiki.awk file).

makebot will prompt for the type of bot skeleton. If the bot will operate on CS1|2 templates, choose #2.

Writing bot

See ~/BotWikiAwk/example-bots

<to be expanded>

Running bot

In summary, the process works by running four utilities:

The utility programs (wikiget, project, runbot and bug) have many options, which are listed with -h.

Example bot

The easiest way to demonstrate BotWikiAwk is by running a real bot.

0. Create the bot using an existing example, accdate, a bot for removing |access-date= in CS1|2 templates.

Make the bot:
makebot ~/BotWikiAwk/bots/accdate
Copy in the pre-written example bot:
cp ~/BotWikiAwk/example-bots/accdate.awk ~/BotWikiAwk/bots/accdate
cd to the bot directory:
cd ~/BotWikiAwk/bots/accdate
All utilities work only from within the bot's home directory, with the exception of wikiget, which can run anywhere.

A. Make a master list of pages to process, called an "auth" file. Here the list comes from a category, using the "-c" option.

wikiget -c "Category:Pages using citations with accessdate and no URL" > meta/accdate20181102.auth
The file ends in .auth (required) and is located in the bot's meta subdirectory.
In this case '20181102' is today's date but it can be any identifying string of numbers or letters.
The "accdate" portion of the filename can also be anything, though it's helpful to use the bot name.
Manually edit meta/accdate20181102.auth to remove unwanted pages, e.g. those in the "Template:" or "Wikipedia:" namespaces.
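
The cleanup in the last step can also be scripted. The following is a minimal sketch, assuming the auth file holds one page title per line and that unwanted pages can be recognized by their namespace prefix. The function name filter_auth is hypothetical and not part of BotWikiAwk.

```shell
# filter_auth FILE: print FILE without titles in the Template: or Wikipedia: namespaces.
# Assumption: the auth file contains one page title per line.
filter_auth() {
  grep -vE '^(Template|Wikipedia):' "$1"
}

# Example usage (write to a temporary file first, then replace the original):
#   filter_auth meta/accdate20181102.auth > meta/accdate20181102.tmp &&
#     mv meta/accdate20181102.tmp meta/accdate20181102.auth
```

Other namespaces can be excluded by adding them to the alternation in the pattern.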

B. Create (-c) a batch (called a 'project') of 50 articles to process

project -c -p accdate20181102.00001-00050
The project ID (-p) is composed of the name created in step A (accdate20181102), followed by a ".", followed by a line range (00001-00050) meaning line 1 through line 50 of the file meta/accdate20181102.auth, i.e. the first 50 articles to process.
The project ID is referenced by every utility to identify which project is being worked on.
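
To make the meaning of a project ID concrete, here is a small sketch that splits an ID into its name and line range and prints the matching lines of the auth file. It only illustrates what the ID encodes; it is not how the project utility is implemented, and the function name show_batch is hypothetical.

```shell
# show_batch PROJECT_ID [META_DIR]: print the auth-file lines that a project ID refers to.
# Illustration only; the real "project" utility may work differently.
show_batch() {
  pid=$1
  meta=${2:-meta}
  name=${pid%.*}       # part before the last ".", e.g. accdate20181102
  range=${pid##*.}     # part after the last ".", e.g. 00001-00050
  first=${range%-*}    # 00001
  last=${range#*-}     # 00050
  # Adding 0 makes awk treat the zero-padded strings as decimal numbers.
  awk -v a="$first" -v b="$last" 'NR >= a + 0 && NR <= b + 0' "$meta/$name.auth"
}

# Example usage, run from the bot's home directory:
#   show_batch accdate20181102.00001-00050
```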

C. Run the bot in dry-run mode

runbot accdate20181102.00001-00050 auth dryrun

D. Look at resulting local diffs

Find which pages the bot modified as recorded in the "discovered" file in the meta directory
cat meta/accdate20181102.00001-00050/discovered
For each, visually check the diff with bug -dc
bug -p accdate20181102.00001-00050 -n "Theory of relativity" -dc
The bot can be re-run for individual pages
bug -p accdate20181102.00001-00050 -n "Theory of relativity" -r
Further information, including the location of the data directory, is available with -v
bug -p accdate20181102.00001-00050 -n "Theory of relativity" -v
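
Checking every modified page by hand gets tedious for larger batches. The loop below is a sketch that runs "bug -dc" for each entry in the discovered file. It assumes the file lists one page title per line, and the function name review_diffs is hypothetical.

```shell
# review_diffs PROJECT_ID: show the diff for every page listed in the discovered file.
# Assumption: meta/PROJECT_ID/discovered contains one page title per line.
# Must be run from the bot's home directory, like the utilities themselves.
review_diffs() {
  pid=$1
  # Read titles on file descriptor 3 so that bug cannot consume the list from stdin.
  while IFS= read -r title <&3; do
    [ -n "$title" ] || continue    # skip blank lines
    bug -p "$pid" -n "$title" -dc
  done 3< "meta/$pid/discovered"
}

# Example usage:
#   review_diffs accdate20181102.00001-00050
```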

E. Push changes to Wikipedia

If the project was previously run in dry-run mode, first delete it and recreate it
project -x -p accdate20181102.00001-00050
project -c -p accdate20181102.00001-00050
Then run in live mode (CAUTION: don't do this for the demonstration)
runbot accdate20181102.00001-00050 auth
If the project has never been created before, just create it and run
project -c -p accdate20181102.00001-00050
runbot accdate20181102.00001-00050 auth

F. Repeat

Repeat steps B through E, increasing the size of the batch and using "bug -dc" to spot-check diffs until confidence is high. Once confidence is high, only the last part of step E is required. As can be seen, each project run is a two-step process: create the project, defining its size, then run the bot on the project.
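
When working through a long master list, it can help to generate the project IDs rather than type them. The sketch below prints consecutive IDs for a fixed batch size. It assumes that later batches follow the same five-digit line-range pattern shown in step B, and the function name batch_ids is hypothetical.

```shell
# batch_ids NAME TOTAL SIZE: print one project ID per batch of SIZE lines,
# covering lines 1..TOTAL of meta/NAME.auth.
# Assumption: IDs use zero-padded five-digit line numbers, as in step B.
batch_ids() {
  name=$1
  total=$2
  size=$3
  start=1
  while [ "$start" -le "$total" ]; do
    end=$((start + size - 1))
    if [ "$end" -gt "$total" ]; then
      end=$total                 # the last batch may be shorter
    fi
    printf '%s.%05d-%05d\n' "$name" "$start" "$end"
    start=$((end + 1))
  done
}

# Example usage, run from the bot's home directory (dry-run mode only):
#   total=$(wc -l < meta/accdate20181102.auth)
#   batch_ids accdate20181102 "$total" 50 | while IFS= read -r pid; do
#     project -c -p "$pid"
#     runbot "$pid" auth dryrun
#   done
```

A fixed size is used here for simplicity. To grow the batches as confidence rises, call batch_ids with a larger SIZE for the remaining portion of the list.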