Operator: MichaelMaggs
Time filed: 13:52, Monday, December 14, 2020 (UTC)
Function overview: Add short description to pages in Moth categories that are currently lacking one
Automatic, Supervised, or Manual: Automatic, after pre-review
Programming language(s): Python (Pywikibot)
Source code available: GitHub
Links to relevant discussions (where appropriate): WikiProject. Also noted on the WP short description page
Edit period(s): One time
Estimated number of pages affected: 26,000
Namespace(s): Mainspace
Exclusion compliant (Yes/No): Yes
Function details: This is the first of a series of proposed bot tasks intended to make headway in adding short descriptions to the 3.5 million articles that still don’t have one. This is of some importance to mobile users as it means that a large number of articles still don't have any descriptive/disambiguating text appearing under the title when a search is carried out. I've some experience in working with short descriptions having added some 10,000 so far, most semi-manually with JWB and the short description helper gadget.
The moths seem a good place to start since suitably precise short descriptions can't trivially be generated from existing infoboxes (even for articles where one exists), at least without expensive Lua calls. This task will skip over all pages that already have an existing short description. The bot deals with Wikipedia short descriptions only, and doesn't make use of Wikidata short descriptions in any way. I could add Bots exclusion compliance if needed, but that doesn't seem appropriate here.
The aim is to keep the new descriptions simple so that they can be added to many articles quickly, while still maintaining a low error rate. The procedure is, on a category-by-category basis:
The moth articles are well structured, and it’s possible to identify “Species of moth” and “Genus of moths” with near 100% accuracy. You can see a sample of 200 or so proposed edits from Category:Moths of the United States at User:MichaelMaggs/Moths; note that the bot correctly identifies several articles as genus which Wikidata wrongly has as species. Of the 837 target articles in that category, the bot is able to fix over 98%, with just a few being skipped where it wasn't quite able to extract the first sentence of the lead.
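As a rough illustration of how the genus/species decision could be made from the first sentence of the lead, here is a minimal sketch. It is not the bot's actual code: the function names `first_sentence` and `propose_description` are hypothetical, and the keyword rules are an assumption about typical moth leads ("X is a genus of moths ...", "X is a moth of the family ..."). Anything ambiguous is skipped rather than guessed.

```python
import re


def first_sentence(lead: str) -> str:
    """Return the text up to the first full stop followed by whitespace or end of text."""
    match = re.search(r"(.+?\.)(?:\s|$)", lead.strip(), flags=re.DOTALL)
    return match.group(1) if match else ""


def propose_description(lead: str):
    """Return a proposed short description, or None when the page should be skipped."""
    sentence = first_sentence(lead).lower()
    if "moth" not in sentence:
        return None  # cannot confirm the subject is a moth: skip
    is_genus = re.search(r"\bgenus\b", sentence) is not None
    is_species = re.search(r"\bspecies\b", sentence) is not None
    if is_genus and is_species:
        return None  # e.g. "a species of moth in the genus Y": leave for manual review
    if is_genus:
        return "Genus of moths"
    return "Species of moth"  # assumption: a moth lead that does not mention "genus" is a species
```

For example, `propose_description("Apreta is a genus of moths in the family Erebidae.")` yields "Genus of moths", while a lead with no recognisable first sentence is skipped. Skipping on ambiguity trades coverage for a lower error rate, in line with the stated aim.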
My initial reaction was that this should be possible with taxobox directly, but as noted in the discussions and function details it's difficult to do cheaply, so the bot makes sense. There is clear consensus for this specific task as well as for prior bots like this, so no concerns there.
Reviewing the list from User:MichaelMaggs/Moths, the cases where the bot and Wikidata differ (Apreta, Apocrisias, Abrenthia) are all monotypic genera. Our convention is for the article to be titled after the genus, but Wikidata doesn't seem to share this as far as I can tell; for example, it has separate items for the species and the genus (i.e. it may be that we are associating these articles with the wrong Wikidata items and not that the Wikidata items are wrong). I'm not sure it would be incorrect for such a short description to say "species" instead of "genus" (they are, in a sense, the same thing); in fact, the example from the guideline of a monotypic order, Amphionides, actually has "monotypic species" in its short description. I don't think what you are doing is wrong or the bot should be changed, but I'm wondering if it points to deeper issues with our categorization that might need to be noted and addressed later.
Exclusion compliance indeed seems unlikely to be an issue, but it is cheap to add and serves as an extra safety check. As you'll be editing the mainspace and lots of pages, I recommend you add it.
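To show how cheap such a check can be, here is a minimal sketch of an exclusion test on raw wikitext. It is a simplified reading of the {{bots}}/{{nobots}} convention, not the full template specification, and `bot_may_edit` is a hypothetical helper name. The bot name is taken from the discussion of the bot parameter; Pywikibot's own `Page.botMayEdit()` would, as far as I know, be the usual route in a real bot.

```python
import re

BOT_NAME = "ShortDescBot"  # assumption: the bot account name mentioned in this discussion


def _listed_bots(wikitext: str, key: str):
    """Return the lower-cased names in {{bots|<key>=...}}, or None if the parameter is absent."""
    pattern = r"\{\{\s*bots\s*\|\s*" + key + r"\s*=\s*([^}|]*)"
    match = re.search(pattern, wikitext, flags=re.IGNORECASE)
    if match is None:
        return None
    return {name.strip().lower() for name in match.group(1).split(",")}


def bot_may_edit(wikitext: str, bot_name: str = BOT_NAME) -> bool:
    """Simplified {{bots}}/{{nobots}} check on the page's wikitext."""
    # A bare {{nobots}} excludes every bot.
    if re.search(r"\{\{\s*nobots\s*\}\}", wikitext, flags=re.IGNORECASE):
        return False
    name = bot_name.lower()
    denied = _listed_bots(wikitext, "deny")
    if denied is not None and ("all" in denied or name in denied):
        return False
    allowed = _listed_bots(wikitext, "allow")
    if allowed is not None:
        # An allow list permits only the bots it names (or every bot with allow=all).
        return "all" in allowed or name in allowed
    return True
```

The check runs once per page before saving, so on a run of this size the cost is negligible compared with fetching the page text.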
I also did a quick code review. I didn't find any major issues, but here are some suggestions:
Should the bot set the |bot=ShortDescBot parameter?
shortdesc_exists() only checks for {{Short description}} in the lead section. While it would be against the MOS, to be safe I think we should check for it anywhere in the page.
I also wanted to point out a few Python conventions to encourage cleaner code, unrelated to functionality. Feel free to ignore these:
The code defines globals (e.g. required_words) but then passes them as function parameters with the same name. This variable shadowing isn't necessary and can be confusing; you can use the globals in the function body directly. This can greatly simplify your function signatures.
You don't need the global keyword (e.g. global wikipedia) in a function unless you are assigning a value to that name in that function. Python knows to access names globally if they aren't defined in the function body.
Thanks! (Please ping me if responding.) Earwig (talk) 07:06, 18 December 2020 (UTC)
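The two code-review points above, checking the whole page for an existing short description and using `global` only when assigning, can be sketched together as follows. This is an illustration under assumptions rather than the bot's real code: only the name `shortdesc_exists` comes from the review, the counter `pages_checked` and the helper `record_check` are hypothetical, and the regex does not cover template redirects or other ways a description might be set.

```python
import re

# Module-level constant: functions can read it without a `global` statement.
SHORTDESC_RE = re.compile(r"\{\{\s*short description\s*\|", flags=re.IGNORECASE)

pages_checked = 0  # hypothetical module-level counter that one function reassigns


def shortdesc_exists(wikitext: str) -> bool:
    """Return True if {{Short description|...}} appears anywhere in the page text."""
    # Reading SHORTDESC_RE needs no `global`: the name is only looked up here.
    return SHORTDESC_RE.search(wikitext) is not None


def record_check() -> int:
    """Increment the counter; `global` is needed because the name is assigned here."""
    global pages_checked
    pages_checked += 1
    return pages_checked
```

Searching the full wikitext rather than only the lead means a misplaced template further down the page still causes the bot to skip the article.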
Regarding the |bot= parameter: I've never once come across that on any page I've looked at, though PearBOT 5 seems to have used it, and to be honest I can't see that it's of much use. All it does is clutter the wikicode permanently with the bot/username that made the change, information which is easily available in the history and which, so far as I know, isn't permanently recorded in connection with other bot edits. The parameter is 'optional' according to the template, and I'd prefer not to use it unless BAG recommends that I should. MichaelMaggs (talk) 18:00, 18 December 2020 (UTC)