The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at Wikipedia:Bots/Noticeboard. The result of the discussion was Approved.

Operator: MichaelMaggs

Time filed: 13:52, Monday, December 14, 2020 (UTC)

Function overview: Add short description to pages in Moth categories that are currently lacking one

Automatic, Supervised, or Manual: Automatic, after pre-review

Programming language(s): Python (Pywikibot)

Source code available: GitHub

Links to relevant discussions (where appropriate): WikiProject. Also noted on the WP short description page

Edit period(s): One time

Estimated number of pages affected: 26,000

Namespace(s): Mainspace

Exclusion compliant (Yes/No): Yes

Function details: This is the first of a series of proposed bot tasks intended to make headway in adding short descriptions to the 3.5 million articles that still don't have one. This matters to mobile users, because a large number of articles still show no descriptive or disambiguating text under the title when a search is carried out. I have some experience with short descriptions, having added some 10,000 so far, mostly semi-manually with JWB and the short description helper gadget.

The moths seem a good place to start since suitably precise short descriptions can't trivially be generated from existing infoboxes (even for articles where one exists), at least without expensive Lua calls. This task will skip over all pages that already have an existing short description. The bot deals with Wikipedia short descriptions only, and doesn't make use of Wikidata short descriptions in any way. I could add Bots exclusion-compliance if needed, but that doesn't seem appropriate here.
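
As an illustration of the "skip pages that already have a short description" rule, here is a minimal sketch in plain Python. It is not the bot's actual code: the function names are hypothetical, and it only looks for an explicit {{Short description|...}} template in the wikitext, so a description generated by an infobox would not be detected.

```python
import re

# Matches the opening of {{Short description|...}} in any capitalisation.
# Assumption: only the explicit template is checked, nothing else.
SHORT_DESC_RE = re.compile(r"\{\{\s*short[ _]description\s*\|", re.IGNORECASE)


def has_short_description(wikitext: str) -> bool:
    """Return True if the page text already carries a short description template."""
    return SHORT_DESC_RE.search(wikitext) is not None


def add_short_description(wikitext: str, description: str) -> str:
    """Prepend a short description, leaving pages that already have one untouched."""
    if has_short_description(wikitext):
        return wikitext
    return "{{Short description|" + description + "}}\n" + wikitext
```

Placing the template on the first line follows the usual convention for short descriptions; any real run would still need the page save and edit summary handled by the bot framework.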

The aim is to keep the new descriptions simple so that they can be added to many articles quickly, while still maintaining a low error rate. The procedure is, on a category-by-category basis:

  1. Run the bot in trial mode, exporting all of the bot-proposed changes to a local spreadsheet
  2. Review for obvious errors, adjust the code, and repeat 1 until the automated error rate is sufficiently low
  3. Manually remove any remaining evident errors from the list
  4. Run the bot in edit mode, making changes only to the pages in the final corrected list.
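
The four steps above can be sketched roughly as follows. This is an illustrative outline only, assuming a CSV file stands in for the spreadsheet; the function names and the `describe` and `save` callbacks are hypothetical placeholders for whatever the bot really uses to generate descriptions and save pages.

```python
import csv
from typing import Callable, Dict, Iterable, List, Optional, TextIO, Tuple

Proposal = Tuple[str, str]  # (page title, proposed short description)


def propose(pages: Dict[str, str],
            describe: Callable[[str], Optional[str]]) -> List[Proposal]:
    """Trial mode (step 1): collect proposals without editing anything."""
    proposals = []
    for title, wikitext in pages.items():
        description = describe(wikitext)
        if description is not None:  # None means "skip this page"
            proposals.append((title, description))
    return proposals


def export_proposals(proposals: Iterable[Proposal], handle: TextIO) -> None:
    """Write the proposals out for manual review (steps 2 and 3)."""
    writer = csv.writer(handle)
    writer.writerow(["title", "description"])
    writer.writerows(proposals)


def load_reviewed(handle: TextIO) -> List[Proposal]:
    """Read back the list after a human has removed or corrected rows."""
    reader = csv.DictReader(handle)
    return [(row["title"], row["description"]) for row in reader]


def apply_reviewed(reviewed: Iterable[Proposal],
                   save: Callable[[str, str], None]) -> int:
    """Edit mode (step 4): touch only the pages left in the reviewed list."""
    count = 0
    for title, description in reviewed:
        save(title, description)
        count += 1
    return count
```

Keeping the edit step driven purely by the reviewed file means a row deleted during review can never be edited by accident.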

The moth articles are well structured, and it’s possible to identify “Species of moth” and “Genus of moths” with near 100% accuracy. You can see a sample of 200 or so proposed edits from Category:Moths of the United States at User:MichaelMaggs/Moths; note that the bot correctly identifies several articles as genus which Wikidata wrongly has as species. Of the 837 target articles in that category, the bot is able to fix over 98%, with just a few being skipped where it wasn't quite able to extract the first sentence of the lead.
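
To give a feel for how the first sentence of the lead could drive the species/genus decision, here is a deliberately simple sketch. The patterns are assumptions made for illustration and are not taken from the bot's source; a real lead with abbreviations, references or templates in the first sentence would need more careful handling, which is presumably why a few pages get skipped.

```python
import re
from typing import Optional


def first_sentence(lead: str) -> Optional[str]:
    """Strip basic wiki markup and return the first sentence, or None."""
    text = re.sub(r"'{2,}", "", lead)  # drop bold/italic quote marks
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)  # unlink
    match = re.match(r"\s*(.+?\.)(?:\s|$)", text, re.DOTALL)
    return match.group(1) if match else None


def classify(lead: str) -> Optional[str]:
    """Return a short description, or None when the page should be skipped."""
    sentence = first_sentence(lead)
    if sentence is None:
        return None
    lowered = sentence.lower()
    if "moth" not in lowered:
        return None
    # Check for genus first, since "moth genus" also contains "moth".
    if re.search(r"\bis a (?:monotypic )?(?:moth )?genus\b", lowered):
        return "Genus of moths"
    if re.search(r"\bis a (?:species of )?moth\b", lowered):
        return "Species of moth"
    return None
```

Returning None rather than guessing keeps the error rate low at the cost of skipping some pages, which matches the trade-off described above.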

Discussion

My initial reaction was that this should be possible with taxobox directly, but as noted in the discussions and function details it's difficult to do cheaply, so the bot makes sense. There is clear consensus for this specific task as well as for prior bots like this, so no concerns there.

Reviewing the list from User:MichaelMaggs/Moths, the cases where the bot and Wikidata differ (Apreta, Apocrisias, Abrenthia) are all monotypic genera. Our convention is for the article to be titled after the genus, but Wikidata doesn't seem to share this as far as I can tell; for example, it has separate items for the species and the genus (i.e. it may be that we are associating these articles with the wrong Wikidata items and not that the Wikidata items are wrong). I'm not sure it would be incorrect for such a short description to say "species" instead of "genus" (they are, in a sense, the same thing); in fact, the example from the guideline of a monotypic order, Amphionides, actually has "monotypic species" in its short description. I don't think what you are doing is wrong or the bot should be changed, but I'm wondering if it points to deeper issues with our categorization that might need to be noted and addressed later.

Exclusion compliance indeed seems unlikely to be an issue, but it is cheap to add and serves as an extra safety check. As you'll be editing the mainspace and lots of pages, I recommend you add it.
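
For reference, a simplified sketch of what an exclusion-compliance check involves is given below. It covers only the common {{nobots}} and {{bots|allow=...}} / {{bots|deny=...}} forms, the function name is hypothetical, and the bot name in the usage line is a placeholder. Pywikibot is understood to perform a check of this kind itself (via `Page.botMayEdit()`), so in practice a bot built on it would normally rely on that rather than on hand-written code like this.

```python
import re

NOBOTS_RE = re.compile(r"\{\{\s*nobots\s*\}\}", re.IGNORECASE)
BOTS_RE = re.compile(
    r"\{\{\s*bots\s*\|\s*(allow|deny)\s*=\s*([^}]*)\}\}", re.IGNORECASE
)


def bot_may_edit(wikitext: str, bot_name: str) -> bool:
    """Simplified check of {{nobots}} and {{bots|allow=/deny=}} on a page."""
    if NOBOTS_RE.search(wikitext):
        return False
    match = BOTS_RE.search(wikitext)
    if match is None:
        return True  # no exclusion template: editing is allowed
    mode = match.group(1).lower()
    names = {name.strip().lower() for name in match.group(2).split(",")}
    listed = bot_name.lower() in names or "all" in names
    if mode == "deny":
        return not listed
    # mode == "allow": only listed bots may edit; allow=none lists nobody
    return listed
```

A bot would call something like `bot_may_edit(page_text, "ExampleBot")` before saving and skip the page when it returns False.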

I also did a quick code review. I didn't find any major issues, but here are some suggestions:

I also wanted to point out a few Python conventions to encourage cleaner code, unrelated to functionality. Feel free to ignore these:

Thanks! (Please ping me if responding.) — Earwig talk 07:06, 18 December 2020 (UTC)[reply]

The Earwig, Mike Peel: thanks for your comments, and for the most helpful suggested improvements to the code. I'll make those changes.
Interesting question about the |bot= parameter. I've never once come across that on any page I've looked at, though PearBOT 5 seems to have used it, and to be honest I can't see that it's of much use. All it does is to clutter the wikicode permanently with the bot/username that made the change - information which is easily available in the history, and which isn't so far as I know permanently recorded in connection with other bot edits. The parameter is 'optional' according to the template, and I'd prefer not to use it unless BAG recommends that I should. MichaelMaggs (talk) 18:00, 18 December 2020 (UTC)[reply]
I agree that it clutters wikicode and requires a separate edit to clean up later. The main benefit seems to be categorization, but that can be similarly achieved with a query over the user contributions if we end up in a situation where we need to mass-revert or something. Previous BRFAs (1, 2) also did not add it. If no one else objects, I'm fine leaving it out. — Earwig talk 18:35, 18 December 2020 (UTC)[reply]
The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at Wikipedia:Bots/Noticeboard.