The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.
2

Operator: Beetstra

Automatic or Manually assisted: Automatic

Programming language(s): Perl / PerlWikipedia

Source code available: Yes

Function overview: Keeping an eye on verifiable numerical data in infoboxes, e.g. 'boilingPtC = 100' for water (molecule) in the ((chembox)), or 'birth_date = (1917-05-29)May 29, 1917' for John F. Kennedy in the ((Infobox President)).

Edit period(s): Continuous

Estimated number of pages affected: 10-20 mainspace pages / hour (with the three infoboxes currently followed, the set of followed pages is about 9,000)

Exclusion compliant (Y/N): No(t yet?)

Already has a bot flag (Y/N): Y

Function details:

As explained in Wikipedia:Bots/Requests_for_approval/CheMoBot, CheMoBot follows verifiable data in infoboxes. For example, we all know that the boiling point of water is 100 degrees Centigrade, so there is hardly any reason to change the value 'boilingpoint = 100' in the infobox on the wikipage for water to anything else (except perhaps during a rewrite of the infobox, which would have a larger effect, not only for water). If that value is changed, the change should be noted. For water the case is clear: practically every editor will recognise that a change of that value needs a good and proper reason. For the melting point of Sodium bisulfate, however, one would have to dig into the literature to see whether a change from 35.2 to 58.5 corrects the value or not.

CheMoBot works with an index of revids of pages where certain values are known to be correct. E.g. the CAS registry number ('CASNo' field in the ((chembox))) of Acetone was checked by Physchim62 in revid 266420980 (see this revid) and found correct. This revid was recorded in the index for the ((chembox)) (stored in Wikipedia:WikiProject Chemicals/Index). If the CASNo is later changed, CheMoBot can detect that by comparing the value in the verified revid with the value in the current revid.
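The comparison described above can be sketched as follows. This is only an illustration in Python, not CheMoBot's actual Perl code: the helper names `get_field` and `changed_fields` are hypothetical, the parser assumes one parameter per line with no nested templates, and the field values in the example are placeholders rather than real data.

```python
import re

def get_field(wikitext, field):
    """Return the value of '| field = value' in the wikitext, or None if absent.

    Simplification (assumption): one parameter per line, no nested templates.
    """
    pattern = r"^[ \t]*\|[ \t]*" + re.escape(field) + r"[ \t]*=[ \t]*(.*?)[ \t]*$"
    match = re.search(pattern, wikitext, flags=re.MULTILINE)
    return match.group(1) if match else None

def changed_fields(verified_text, current_text, fields):
    """List the fields whose value differs between the verified and current text."""
    return [
        f for f in fields
        if get_field(verified_text, f) != get_field(current_text, f)
    ]

# Placeholder wikitext standing in for the verified revid and the current revid.
verified = "{{Chembox\n| CASNo = 0000-00-0\n| boilingPtC = 100\n}}"
current = "{{Chembox\n| CASNo = 0000-00-0\n| boilingPtC = 90\n}}"

print(changed_fields(verified, current, ["CASNo", "boilingPtC"]))
```

In a real run the two texts would be fetched by revid from the wiki; that step is left out here because the request does not describe how it is done.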

Every time a page transcluding one of the followed boxes is edited, CheMoBot checks whether verified values have changed, and the changes are documented in its logs. However, the logs are difficult to read, and one would have to go through all of them to check whether data has changed. The same could be accomplished with proper categorisation of boxes in which these verified values have changed.

The proposal here is that CheMoBot adds one or more parameters to infoboxes in mainspace whose verified or watched values differ from those in the indexed, verified version. These parameters would enable the template code to categorise such boxes, noting that (critical) values in the box were changed. When the values are correct again, the parameters are removed.

The bot is not going to correct the values, it only tags the pages by putting another parameter in the infobox!

I mention 'verified' and 'watched' fields. The verification project under the WikiProject Chemicals is currently only verifying the 'CASNo', and the index is (at the moment) set up only for that field. However, the infoboxes also contain other numerical fields whose changes are of interest, but which may not be correct in the indexed version. In short: CheMoBot follows both the 'watched' and 'verified' parameters in the box (as changes to them are of interest), but treats the 'verified' fields specially (as they are correct in the indexed version).

Regarding being 'Exclusion compliant', I don't think that is necessary here: the added parameter does not strictly change what is displayed, it only helps in the categorisation of boxes containing verified data. Although I understand that the settings could be used to 'enforce' certain data in boxes, I would say that the problem would then be with either having wrong settings or having to adapt the index of verified versions, not with having deliberately wrong data in the infobox.

Settings

For the 'variables' in the settings, see User:CheMoBot/Settings. When I mention a setting here, I will put the variable between < and > (e.g., <boxes> is the variable 'boxes', set in the settings by the line 'boxes=Chembox|Drugbox|Reactionbox|Chembox_new').
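As a rough illustration of how a settings line such as the one quoted above could be read, here is a minimal Python sketch. It assumes one 'name=value' pair per line and treats every value as a '|'-separated list. Both are assumptions about the settings page, and `parse_settings` is a hypothetical helper, not part of the bot.

```python
def parse_settings(text):
    """Parse 'name=value1|value2' lines into a dict of lists.

    Assumption: one setting per line; lines without '=' are ignored.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = [item.strip() for item in value.split("|")]
    return settings

# The example line is taken from the request text.
settings = parse_settings("boxes=Chembox|Drugbox|Reactionbox|Chembox_new")
print(settings["boxes"])
# prints ['Chembox', 'Drugbox', 'Reactionbox', 'Chembox_new']
```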

Pages transcluding watched infoboxes (((chembox)), ((drugbox)) and ((reactionbox))) can either be 'verified' or not 'verified' (i.e. no revid in the index). Verified here means that someone has checked the correctness of the verified fields in the infobox on a page and, if they are verifiably correct, has recorded the revid of the page in the index (for ((chembox)) that is in Wikipedia:WikiProject Chemicals/Index).

From here on I will talk about a page with a verified ((chembox)). For chembox, the verified fields are <chembox_verifiedfields> and the watched fields are <chembox_watchedfields>. Variables for the chembox all have the prefix 'chembox_'; the others are either for other boxes or systemwide. The system works the same for every box in <boxes>, each infobox having its own set of settings.
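The per-box prefix convention could be looked up along these lines. This is a sketch under stated assumptions: the settings are already available as a dict of lists, the prefix is the lower-cased box name followed by '_', and `fields_for_box` is a hypothetical name. The watched field in the example is illustrative only; the request states only that 'CASNo' is currently verified.

```python
def fields_for_box(settings, box):
    """Return the verified and watched field lists for one box.

    Assumption: per-box variables are named '<box in lower case>_<variable>'.
    """
    prefix = box.lower() + "_"
    return {
        "verified": settings.get(prefix + "verifiedfields", []),
        "watched": settings.get(prefix + "watchedfields", []),
    }

settings = {
    "boxes": ["Chembox", "Drugbox"],
    "chembox_verifiedfields": ["CASNo"],
    "chembox_watchedfields": ["boilingPtC"],  # illustrative value, not from the settings page
}

for box in settings["boxes"]:
    print(box, fields_for_box(settings, box))
```

A box without its own entries simply yields empty lists, so nothing would be followed for it.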

So what happens when an editor changes a value in a verified, followed infobox

This functionality does not do anything with a not-verified box, i.e. a box on a page which does not have a revid with verified values in the index!

1) The editor is changing a verified or watched field (normal)

Functionality is the same for the verified and for the watched fields; the actual infobox code should be able to handle the two things differently.

The added fields (here 'Verifiedfields=changed' and 'Watchedfields=changed') can be used to trigger categorisation, box colouration, etc. for easy recognition. For the ((chembox)), the effect of changing the value of a verified field can be seen at the bottom of the box, where the box disclaimer is also displayed.
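The tagging step can be sketched as below. The parameter name 'Verifiedfields' and the value 'changed' come from the request. The helper `set_flag`, the one-parameter-per-line layout, and the assumption that the text passed in is a single infobox ending in its closing braces are illustrative choices, not the bot's implementation. Note that the sketch only adds or removes the flag; it never touches the data fields themselves.

```python
import re

def set_flag(box_text, name, changed):
    """Add '| name = changed' to the box when changed is True, remove it otherwise.

    Assumption: box_text is one template whose last '}}' closes the infobox.
    """
    # Drop any existing flag line first, so the operation is idempotent.
    flag_line = re.compile(
        r"^[ \t]*\|[ \t]*" + re.escape(name) + r"[ \t]*=.*\n?", re.MULTILINE
    )
    cleaned = flag_line.sub("", box_text)
    if not changed:
        return cleaned
    end = cleaned.rfind("}}")
    if end == -1:
        return cleaned  # no closing braces found; leave the text alone
    head = cleaned[:end]
    if not head.endswith("\n"):
        head += "\n"
    return head + "| " + name + " = changed\n" + cleaned[end:]

box = "{{Chembox\n| CASNo = 0000-00-0\n}}"  # placeholder value
tagged = set_flag(box, "Verifiedfields", True)
print(tagged)
print(set_flag(tagged, "Verifiedfields", False) == box)
```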

2) The editor is changing a verified or watched field (special, optional)
Verified revid
some notes

Discussion

For people who want to test and play with this functionality: the bot operates this functionality in my userspace (it is strictly turned off for mainspace; it only operates in specified userspace).

Feel free to edit the page User:Beetstra/Propane to see how changes will operate. The handle-delay (i.e., the time between a change to the page and the moment that the bot adapts the parameters) is now set to 15 seconds in userspace, to allow reasonably rapid testing. In mainspace I am thinking of forcing it to a minimum of 5 minutes, even if the setting for <handledelay> is shorter. If the setting is higher, that number will be used (I expect that something in the order of 5-10 minutes would be reasonable); the actual delay can be set in the settings page, parameter <handledelay>. I would appreciate it if people would try to fool the bot on my userspace page before we apply this to mainspace (though I don't think that it can be done). --Dirk Beetstra T C 15:13, 3 August 2009 (UTC)
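The proposed delay rule amounts to a floor on the configured value in mainspace. A minimal sketch, assuming <handledelay> is expressed in seconds (the unit is not stated in the request) and using the hypothetical names `effective_delay` and `MAINSPACE_MIN_DELAY`:

```python
MAINSPACE_MIN_DELAY = 5 * 60  # proposed 5-minute floor, in seconds (unit is an assumption)

def effective_delay(handledelay, in_mainspace):
    """Return the delay to use: the setting itself, but never below the floor in mainspace."""
    if in_mainspace:
        return max(handledelay, MAINSPACE_MIN_DELAY)
    return handledelay

print(effective_delay(15, in_mainspace=False))   # userspace testing: 15
print(effective_delay(15, in_mainspace=True))    # raised to the floor: 300
print(effective_delay(600, in_mainspace=True))   # higher setting is kept: 600
```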

I have to say this is a superbly useful piece of functionality; it would also be useful for watching a lot of other carefully entered and verifiable data, such as co-ordinates, death dates etc. Obviously there will be errors that need correcting, and we know that IP corrections often get slapped down by vandal fighters, especially if they have no edit summary, but that is a red herring: this is a useful tool, not just for Chem articles. Rich Farmbrough, 19:04, 20 August 2009 (UTC).
I have just added ((Infobox Person)), ((Infobox Royalty)), and ((Infobox Officeholder)) to the watching of boxes, and made the bot output to #wikipedia-en-blp for these boxes (there may be more boxes, but those can come later; let's not take too big steps, the bot still has to be able to munch every edit, though there is space for more). For these boxes no index is defined yet (and hence, there is no verified data). The bot does seem to do well there. --Dirk Beetstra T C 19:52, 20 August 2009 (UTC)
I think the point to be made is that, in chemistry, we have an almost ideal dataset against which to test the bot. We know that our CAS numbers are correct, because we have obtained them from the primary source, but they can also be verified and challenged by any user (see our Signpost article from last May). At a rough estimate, our error rate in CAS numbers has dropped from 1–2 percent to 1–2 permille (an error rate of zero doesn't exist) thanks to our verification efforts: WP:CHEM would like to keep it that low! The principle can certainly be applied to other items of data which we have gone to some (often considerable) effort to verify – such as birth and death dates and places, or geographical coordinates – but we need to check how things will work in practice first. Physchim62 (talk) 23:03, 29 August 2009 (UTC)

I can't see any reason to oppose approval as requested. The bot would be running as a cleanup bot, i.e. tagging articles that might be in need of cleanup, but leaving intellectual decisions to human editors. The addition of a single parameter to an infobox, one which would have minimal visibility for users – in fact no visibility at all, given recent changes to the chembox since the request for approval was posted – seems to me to be quite within the established bounds of a cleanup bot. It is less intrusive than, say, adding ((unreferenced)) tags (even when such tags are justified). Physchim62 (talk) 22:38, 29 August 2009 (UTC)

((BAGAssistanceNeeded)) Thanks, but ... err ... the testing for this is already working in the specified userspace (specifically ONLY in my userspace, see e.g. User:Beetstra/Propane), so we can actually test if it works, and it actually works .. --Dirk Beetstra T C 11:35, 2 September 2009 (UTC)

The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.