06:31, 15 June 2008 (UTC): this page contains my notes about the possibility of building an efficient human-bot hybrid for identifying and allocating repetitive tasks on Wikipedia that are too difficult for current bot programs to do alone. I invite comments on the talk page.

Inspiration

I happened to read about the Amazon Mechanical Turk[1] after a user on the Richard Dawkins Forum called my attention to it.

While I was answering this question on the Help desk: Wikipedia:Help_desk/Archives/2008 June 14#Really long references/URLs (permanent link), I got an idea about how to fix the enormous number of bare-URL footnotes on Wikipedia.

One of the featured article criteria is:

WP:FAIL says the number of featured articles on Wikipedia is increasing far too slowly compared to the number of articles (6,882,453 currently). This is not surprising, considering the outlandish difficulty of working with footnotes and citation templates on Wikipedia. The fraction of users who understand how to edit citations up to featured article standards is very small, and not growing very fast. The overwhelming majority of Wikipedia's 47,965,892 registered users (and probably a comparable number of unregistered users) are unlikely to invest many hours learning how to edit citations with the current system. The intellectual overhead in the current system is very high, but the actual work is usually no more than tedious once the user finally identifies where it needs doing.

In other words, fixing citations doesn't require much creativity, but the amount of knowledge an editor needs in order to do it is out of all proportion to the creativity it requires.

It would be wonderful to write a bot program that could analyze the citations in a Wikipedia article and bring them all up to a consistent standard. However, this would probably require a bot program capable of passing the Turing test. The Amazon Mechanical Turk system, on the other hand, does pass the Turing test, quite easily in fact.

Perhaps it is possible to construct a similar system for use with Wikipedia: a human-bot hybrid, or hum-bot (pronounced: hyoom-bot). (To-do: figure out how to say that in IPA.)

The bot program component of the hum-bot could identify and collect citations needing repair, and present them to human editors with as much supporting material as the bot can supply: citation template text filled out as far as is automatically possible, plus instructions with links to examples explaining exactly what the human needs to do.
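To make "partially filled out" concrete, here is a minimal Python sketch, assuming the bot has already pulled the text of a single <ref> tag out of the article. The function name stub_cite_web and the bare-URL test are hypothetical illustrations, not part of any existing bot; only the parameter names come from the real {{cite web}} template. The bot fills in what it knows for certain (the URL and the access date) and leaves everything else for the human.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical heuristic: a ref body is "bare" if it is a single URL,
# optionally in single square brackets, with no link label.
BARE_URL = re.compile(r"^\s*\[?\s*(https?://[^\s\]]+)\s*\]?\s*$")


def stub_cite_web(ref_body: str, today: date) -> Optional[str]:
    """Return a partially filled {{cite web}} for a bare-URL ref, else None."""
    match = BARE_URL.match(ref_body)
    if match is None:
        return None  # labeled links, templates, free text: not handled here
    url = match.group(1)
    # The bot can fill in only url and access-date with confidence; title,
    # author, work and date are left empty for the human, who must open the page.
    return (
        "{{cite web |url=" + url
        + " |title= |author= |work= |date="
        + " |access-date=" + today.isoformat() + "}}"
    )
```

For example, stub_cite_web("http://example.com/story.html", date(2008, 6, 15)) returns "{{cite web |url=http://example.com/story.html |title= |author= |work= |date= |access-date=2008-06-15}}", which the human editor would then complete by reading the linked page.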

Candidate articles include recent newsworthy events that have lots of sources and lots of edits from subject enthusiasts who don't have much Wikipedia experience. Examples: Peak oil, Oil price increases since 2003, and Treaty of Lisbon. For the bot to get a foothold, the article needs to contain <ref ...>...</ref> tags. Articles with completely unwikified references will most likely be too hard for a bot to analyze.
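The foothold test can be sketched the same way. This Python sketch assumes the article's raw wikitext is already in hand; the regex and the function names extract_refs and has_foothold are hypothetical. It collects the body of every <ref>...</ref> pair, skips self-closing reuses such as <ref name="x" /> (they carry no citation text) as well as the <references /> tag, and treats an article with no ref bodies as one the bot should pass over.

```python
import re
from typing import List

# Hypothetical pattern: <ref> or <ref name="...">, then the body, then </ref>.
# The (?<!/) lookbehind rejects self-closing reuses like <ref name="x" />,
# and <references /> never matches because "ref" must be followed by
# whitespace or ">".
REF_TAG = re.compile(r"<ref(?:\s[^>]*?)?(?<!/)>(.*?)</ref\s*>",
                     re.IGNORECASE | re.DOTALL)


def extract_refs(wikitext: str) -> List[str]:
    """Return the stripped body of every <ref>...</ref> pair, in page order."""
    return [body.strip() for body in REF_TAG.findall(wikitext)]


def has_foothold(wikitext: str) -> bool:
    """True if the article has at least one <ref> with content to analyze."""
    return len(extract_refs(wikitext)) > 0
```

Given the wikitext 'Oil rose.<ref>http://example.com/a</ref> Again.<ref name="b">{{cite news |title=X}}</ref> Reuse.<ref name="b" />' followed by <references />, extract_refs returns ['http://example.com/a', '{{cite news |title=X}}']. A regex is not a wikitext parser: refs inside <nowiki> blocks or HTML comments would be picked up by mistake, which is acceptable for a first pass but not for a bot that saves edits.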

Outline of a citation repair hum-bot

A hum-bot that repairs inconsistent citations in an article could work like this:

Justification, prior art

Before actually attempting to build such a system, or persuading someone else to try building it, we might first try to measure the scale of the problem: for example, use a bot to scan a large number of articles and count how many have inconsistent reference styles. See what others have done or are doing:
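As a rough starting point for the measurement proposed above, here is a Python sketch assuming the wikitext of many articles has already been downloaded (for instance from a database dump) into a dict keyed by article title. Each ref body is put into one of three coarse, hypothetical buckets (bare URL, cite-family template, anything else), and an article is flagged when its refs fall into more than one bucket. All names and patterns here are illustrative assumptions.

```python
import re
from collections import Counter
from typing import Dict

# Same hypothetical ref and bare-URL patterns as a standalone sketch would need.
REF_TAG = re.compile(r"<ref(?:\s[^>]*?)?(?<!/)>(.*?)</ref\s*>",
                     re.IGNORECASE | re.DOTALL)
BARE_URL = re.compile(r"^\s*\[?\s*(https?://[^\s\]]+)\s*\]?\s*$")
# Matches {{cite web}}, {{Cite book}}, {{citation}} and similar at the start.
CITE_TEMPLATE = re.compile(r"^\s*\{\{\s*(?:cite|citation)\b", re.IGNORECASE)


def ref_style(body: str) -> str:
    """Classify one <ref> body into a coarse style bucket."""
    if BARE_URL.match(body):
        return "bare-url"
    if CITE_TEMPLATE.match(body):
        return "template"
    return "free-text"


def style_counts(wikitext: str) -> Counter:
    """Count how many refs in one article fall into each style bucket."""
    return Counter(ref_style(body) for body in REF_TAG.findall(wikitext))


def is_inconsistent(wikitext: str) -> bool:
    """Flag an article whose refs use more than one style bucket."""
    return len(style_counts(wikitext)) > 1


def survey(articles: Dict[str, str]) -> Dict[str, Counter]:
    """Return per-article style counts for every flagged article."""
    return {title: style_counts(text)
            for title, text in articles.items() if is_inconsistent(text)}
```

With articles = {"A": '<ref>http://example.com</ref><ref>{{cite web |url=http://example.com |title=X}}</ref>', "B": '<ref>{{cite book |title=Y}}</ref>'}, survey(articles) flags only "A", with one bare-URL ref and one template ref. The "free-text" bucket also catches shortened footnotes such as "Smith 2007, p. 1.", which are a legitimate style when paired with full templates elsewhere, so this count overestimates the problem; a real survey would need finer buckets.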

Query

22:36, 15 June 2008 (UTC): I requested comments from the members of Wikipedia:WikiProject Citation cleanup:

I also asked User:Smith609 (author of the Universal reference formatter, which {{Google scholar cite}} wraps around) to review this page and comment:

See also

References

  1. Mieszkowski, Katharine (2006-07-24). "I make $1.45 a week and I love it". Salon.com. Retrieved 2008-06-15.
  2. Smith 2007, p. 1.