Yobot 54

Operator: Magioladitis

Time filed: 09:40, Saturday, March 25, 2017 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): AWB

Source code available:

Function overview: Replace ISBN / PMID / RFC magic links with ISBN / PMID / RFC templates

Find: \b(ISBN)((?:[^\S\n]|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;)+)((?:97[89](?:-|(?:[^\S\n]|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;))?)?(?:[0-9](?:-|(?:[^\S\n]|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;))?){9}[0-9Xx])\b

Replace: ((ISBN|$3))

Links to relevant discussions (where appropriate):
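For trying the Find/Replace pair outside AWB, here is a minimal Python sketch. It reads the space alternatives as any whitespace except newline, `&nbsp;`, `&#160;` or `&#xA0;`, and it writes the group-3 backreference as Python's `\g<3>`, because `$3` is .NET (AWB) syntax. The function name `replace_isbn_magic_links` is hypothetical; this is an illustration, not the bot's code.

```python
import re

# Space alternatives from the Find field: whitespace other than newline, or the
# entities &nbsp;, &#160; or &#xA0;.
SPACE = r"(?:[^\S\n]|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;)"
SPDASH = r"(?:-|" + SPACE + r")"  # a hyphen or one of the spaces above

# Group 1: "ISBN", group 2: the separating spaces, group 3: the number.
ISBN_MAGIC = re.compile(
    r"\b(ISBN)(" + SPACE + r"+)"
    r"((?:97[89]" + SPDASH + r"?)?(?:[0-9]" + SPDASH + r"?){9}[0-9Xx])\b"
)


def replace_isbn_magic_links(text):
    """Return (new_text, count); a count of 0 means there is nothing to do."""
    # Python spells the group-3 backreference \g<3> (or \3); AWB/.NET uses $3.
    return ISBN_MAGIC.subn(r"{{ISBN|\g<3>}}", text)


if __name__ == "__main__":
    print(replace_isbn_magic_links("See ISBN 978-0-306-40615-7 and ISBN&nbsp;0306406152."))
```

Returning the substitution count as well as the text makes it easy to skip a page when nothing was replaced.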

Edit period(s): One off and revisit

Estimated number of pages affected: Roughly 400,000 + 7,000 + a few more

Exclusion compliant (Yes/No):

Already has a bot flag (Yes/No): Yes

Function details: AWB's Find and Replace. It will skip the page if no replacement is made. General fixes will be done in addition. Example: ISBN 123456789-0 → ((ISBN|123456789-0)), which the template renders with "Parameter error in ((ISBN)): checksum".
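The checksum error in the example is expected: 123456789-0 fails the standard ISBN-10 check-digit test (the valid check digit for 123456789 would be X). A minimal Python sketch of the ISBN-10 check (weights 10 down to 1, sum divisible by 11, X = 10) and the ISBN-13 check (alternating weights 1 and 3, sum divisible by 10) follows. `isbn_checksum_ok` is a hypothetical name, and this is not the ((ISBN)) template's own code.

```python
DIGITS = "0123456789"


def isbn_checksum_ok(isbn):
    """Validate an ISBN-10 or ISBN-13 check digit; hyphens and spaces are ignored."""
    chars = [c for c in isbn.upper() if c not in "- "]
    if len(chars) == 10:
        # ISBN-10: weights 10, 9, ..., 1; the last character may be X (= 10).
        if not all(c in DIGITS for c in chars[:9]) or chars[9] not in DIGITS + "X":
            return False
        values = [10 if c == "X" else int(c) for c in chars]
        return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0
    if len(chars) == 13 and all(c in DIGITS for c in chars):
        # ISBN-13: weights alternate 1, 3, 1, 3, ...; the total must be a multiple of 10.
        return sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(chars)) % 10 == 0
    return False


if __name__ == "__main__":
    for candidate in ["123456789-0", "123456789X", "978-0-306-40615-7"]:
        print(candidate, isbn_checksum_ok(candidate))
```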

Discussion

@Legoktm, Bgwhite, and MZMcBride: -- Magioladitis (talk) 09:47, 25 March 2017 (UTC)

See also previous discussion: Wikipedia:Bots/Requests for approval/Yobot 27. Now there is also clear consensus to proceed. -- Magioladitis (talk) 09:47, 25 March 2017 (UTC)

What regex are you using to detect the ISBN magic links? See also Wikipedia:Bots/Requests for approval/PrimeBOT 13 where this same task has been proposed (it may be ok to have two bots working on this, as long as they both actually do the same replacements). Anomie 12:51, 25 March 2017 (UTC)

Anomie I'll be using pretty much the same regex as PrimeBOT. I had not noticed the other bot. I proposed this a long time ago and have reopened it.

ISBN[\s\t]*(([0-9]|-|–|X)+)→((ISBN|$1))

I'll be catching a bit more since I also check for tabs between ISBN and the number. -- Magioladitis (talk) 12:56, 25 March 2017 (UTC)

As I mentioned on the other task, the regular expression used by MediaWiki for this purpose is different. To what extent does the community care about bots catching additional ISBNs that aren't currently magic links? Anomie 13:00, 25 March 2017 (UTC)

Anomie Since PrimeBOT will be using AWB, we can cooperate to use the same regex if two bots are doing this. Whatever suits best. There are very few cases that are not magic links; these are detected in Wikipedia:WikiProject Check Wikipedia/ISBN errors. Other editors and I are fixing these. Notice that the list has dropped drastically over the last few months. -- Magioladitis (talk) 13:02, 25 March 2017 (UTC)

This task does not require general fixes, and they should be disabled. Yobot in particular has had longstanding issues with unnecessary fixes; it's better to stick just to the specific task at hand rather than adding unrelated edits to the task. This is particularly the case for a job of this size. — Carl (CBM · talk) 13:04, 25 March 2017 (UTC)

There seems to be a consensus that there are tasks that should be done as secondary tasks. No task actually requires other tasks. A job of exactly this size is a good opportunity to perform hundreds of other minor tasks in a single run. Moreover, an F&R is ideal for skipping when the primary task has not been done. -- Magioladitis (talk) 13:07, 25 March 2017 (UTC)
I don't think there is any such consensus. Yobot has a track record of problems with general fixes, so it makes particular sense for them to be disabled until a better track record is established. This is an area where BAG can show more oversight going forward. — Carl (CBM · talk) 13:11, 25 March 2017 (UTC)
Cc: @Headbomb: -- Magioladitis (talk) 13:21, 25 March 2017 (UTC)
While there is a consensus that general fixes can be done alongside a task, and that this would cover so many pages that we might as well apply genfixes, there is a history of Yobot not following WP:COSMETICBOT. Since many such genfixes are cosmetic in nature, and you've had a full/recently closed ARBCOM case on this specific issue, the community's concerns are valid. To be approved to run with genfixes, you would, at the very least, need to both skip if only minor genfixes are made, and skip if only cosmetic genfixes are made. But because of Yobot's history, it might be best if the bot is only allowed to make an edit iff the task's main purpose would have triggered an edit. However, I'd rather let someone else from the BAG make the final decision, because I have advised you too many times on this. Headbomb {talk / contribs / physics / books} 13:54, 25 March 2017 (UTC)
IMO "the bot is only allowed to make an edit iff the task's main purpose would have triggered an edit" would be satisfactory and should be followed by all bots making genfixes. Also, if we wind up with multiple bots doing the task, ideally they should try to avoid a situation where they make different general fixes such that one edit conflicting with the other results in MediaWiki producing a merged edit that appears to be fixes-only. Anomie 14:45, 26 March 2017 (UTC)
It's good guidance in general, but there are exceptions. For instance, bots operating on data dumps will often find that an instance of their task has already been done. I don't see the benefit of restricting an otherwise non-contentious bot from doing non-cosmetic genfixes (like properly closing a ref template/malformatted wikilink). Headbomb {talk / contribs / physics / books} 15:13, 26 March 2017 (UTC)
Each bot edit is supposed to be backed up by a BRFA. If the goal is to fix malformed wikilinks, that is a separate task from this one, which should be run with a separate edit summary. One of the longstanding problems with Yobot was that it did not use clear edit summaries that identified the specific goal of each edit. If a particular bot task no longer applies to a particular page, it's common sense that a bot running that task should skip the page, rather than claiming to fix a problem that no longer exists. — Carl (CBM · talk) 15:47, 26 March 2017 (UTC)
That's why such bots should have "and AWB genfixes" in their summary somewhere. I also highly encourage the AWB team to develop a list similar to WP:CWERRORS where they could deliver a blow-by-blow list of genfixes done, but that is beyond the scope of this BRFA. Headbomb {talk / contribs / physics / books} 15:59, 26 March 2017 (UTC)
But that edit summary does not permit them to go ahead and make the genfixes if the approved task does not apply. Otherwise, they could just run the "approved" task on every article to make genfixes.... The genfixes are only valid when the approved task is also performed. As I said, one of the longstanding problems with Yobot was bad edit summaries, such as vague references to CHECKWIKI, which made it impossible to tell what approved task was actually running. Going forward, BAG needs to rein in the edits so that they more precisely match the approved tasks. — Carl (CBM · talk) 16:02, 26 March 2017 (UTC)
Edit summaries permit nothing. BRFAs and consensus do. The edit summary addresses the issue of WP:BOTCOMM per our requirements. The community has always accepted a small false positive rate on editing conditions, as long as the edits made were not contentious. If a bot runs against say CW Error #32, and 3 out of the listed 80 instances of the error were already fixed by the time the bot got to it, and the bot happens to fix something else instead (e.g. fixing categories), that is fine even if the main scope of its task isn't to fix categories. This of course, assumes the bot was approved to do AWB genfixes alongside its main task. Headbomb {talk / contribs / physics / books} 17:25, 26 March 2017 (UTC)
We are off-topic. This was already done by Yobot for CHECKWIKI, with very few exceptions due to other problems. For example, error #104 was supposedly being fixed with an error rate below 1% until an error in the list generation, not in the code, added more pages to the list, which resulted in a disaster. This fact was overlooked during the ArbCom case. Anyway, this bot task is clear-cut because AWB provides a very specific skip condition in case the ISBN is not fixed. The error ratio will be 0%. -- Magioladitis (talk) 17:41, 26 March 2017 (UTC)

Headbomb We, as the AWB team, already have WP:GENFIXES. -- Magioladitis (talk) 17:21, 26 March 2017 (UTC)

And that list is non-exhaustive. Compare with WP:CWERRORS where each error is identified and categorized. In theory, it is possible for a bot to have a blow-by-blow edit summary such as "Fixing CW Error #01, CW Error #02, CW Error #04, CW Error #41 and CW Error #88". With AWB, we are not told much except "general fixes". The WP:GENFIXES page is also not linked by default. I'll actually file a Phabricator ticket for this. Headbomb {talk / contribs / physics / books} 17:29, 26 March 2017 (UTC)
Yes, I agree the edit summary should list the actual changes being made - Magioladitis has been asked to do that for years, but has declined. It would be good for BAG to require it for CHECKWIKI fixes going forward. On the other hand, an edit summary of "Fix problem ZZZ and/or do general fixes" is so vague that the operator could just run it on every page, regardless whether the page ever had problem ZZZ. That is not what BRFAs are meant to approve. — Carl (CBM · talk) 17:33, 26 March 2017 (UTC)
And if a bot starts to primarily do genfixes, it can be blocked as running outside its approval. Theoretical bad faith concerns are outside the scope of this BRFA. I'll also remind you that like ARBCOM, BAG cannot dictate software development. While we could certainly ban AWB-based bots until certain software features are implemented, it would be gigantically detrimental to do so, especially when in the vast majority of cases this is not an actual issue. Perfect is the enemy of good. Headbomb {talk / contribs / physics / books} 17:52, 26 March 2017 (UTC)
The point of a BRFA is to vet both the task and the bot operator - it is not at all outside the scope of a BRFA to look at all aspects of the request, and even to say that a task is theoretically OK but should be run by a different operator. One of the things that led to the arbitration case was previous BAG members who did not do sufficient vetting of tasks, leading to vague approvals. We know from experience that, unless everything is laid out in very specific detail, the scope of Yobot bot runs has a tendency to grow well beyond what was originally approved. We know that the jobs have often been run on inaccurate lists. One job of BAG is to find a way, within the BRFA process, to obtain better outcomes. — Carl (CBM · talk) 18:37, 26 March 2017 (UTC)

@Headbomb: Perfect. I will ask PrimeBOT if they can do the same. -- Magioladitis (talk) 14:09, 25 March 2017 (UTC)

PrimeBOT and Primefac are under no ArbCom restrictions, and have no history of WP:COSMETICBOT violations. There is little reason to withhold genfixes from that bot's task, or to be concerned that it would not have those skip conditions enabled. Headbomb {talk / contribs / physics / books} 14:12, 25 March 2017 (UTC)
Headbomb OK. Happy to hear that. If Primefac/PrimeBOT is approved to do the task with general fixes on, and the problem is only mistrust of my bot, I'll leave the entire task to them and I will help with bug fixing. -- Magioladitis (talk) 14:17, 25 March 2017 (UTC)
As one of the harshest critics of Magioladitis, I have to say, I think he's being treated extremely unfairly here. If he has "skip if no replacement" checked, there is a 0% chance of error on cosmetic edits. The only potential concern is that he would lie to us about checking that option, but I certainly don't think that's at all likely. If a cosmetic edit is made, we'd be able to clearly see that the option wasn't checked and react accordingly. The remedies allowing blocking of his bot as an AE action are intended to allow him to get back to normal bot operation, not be used as an excuse to refuse tasks that would be approved for other bot operators. If anything, he should be encouraged for coming to the table with a very reasonable proposal to use genfixes and a clear idea of how to avoid cosmetic editing. ~ Rob13Talk 15:46, 25 March 2017 (UTC)
The issue is not only cosmetic edits, although it would be unwise to ignore that issue. Look at the example of BenderBot, which replaces http with https in some URLs. One reason that task draws few complaints is that the bot sticks to the desired task without making additional, unrelated edits. This makes it much easier for editors to understand what the bot was doing. Diffs full of other changes can only cause confusion. — Carl (CBM · talk) 17:22, 25 March 2017 (UTC)
In which scenario should a bot make minor edits in addition, then? -- Magioladitis (talk) 16:16, 26 March 2017 (UTC)
Some bot authorizations explicitly allow bots to perform other changes, like general fixes, as long as the main change is also made. That's when minor changes can be done. I don't think anyone is concerned about that - the issue is when the main change is not made, in which case the bot authorization doesn't apply, and the bot should not make any edit. — Carl (CBM · talk) 16:30, 26 March 2017 (UTC)
@CBM: Yes, but here the bot operator has provided a very clear method to avoid that issue – checking "skip if no replacement made". That option is 100% accurate; it skips if one of your main changes isn't implemented. So long as the main changes are coded properly (discussed below), that won't be an issue unless the bot operator is lying about checking that particular option. As I said before, I'm extremely confident he's not intentionally misleading us during the bot approvals process, but in the highly unlikely event he is, a single cosmetic-only edit related to this task would prove that. In that hypothetical situation (again, highly unlikely), proven intentional deceit at the bot approvals process would likely result in a long-term block based on the history. But let's give him some credit, both in the sense of his good-faith efforts to comply with the bot policy and his intelligence, and acknowledge that Magioladitis would not lie to us here, especially in a manner so easily caught. ~ Rob13Talk 16:40, 26 March 2017 (UTC)
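The "skip if no replacement made" rule discussed above can be sketched as follows: general fixes are applied only when the main find-and-replace has already changed the page. This is a hypothetical Python illustration of the logic, not AWB's implementation; `process_page` and the `general_fixes` callback are placeholder names.

```python
import re
from typing import Callable, Optional


def process_page(text: str, pattern: "re.Pattern[str]", replacement: str,
                 general_fixes: Callable[[str], str]) -> Optional[str]:
    """Return the text to save, or None if the page should be skipped."""
    new_text, count = pattern.subn(replacement, text)
    if count == 0 or new_text == text:
        # The main task did not apply: skip the page, even if general fixes
        # would have changed something.
        return None
    # The main task applied, so general fixes may ride along in the same edit.
    return general_fixes(new_text)


if __name__ == "__main__":
    isbn = re.compile(r"\bISBN[^\S\n]+([0-9][0-9Xx-]{8,16}[0-9Xx])\b")  # simplified pattern

    def collapse_spaces(t):  # toy stand-in for a general fix
        return re.sub(r"  +", " ", t)

    print(process_page("Text  with ISBN 0306406152.", isbn, r"{{ISBN|\1}}", collapse_spaces))
    print(process_page("Text  with nothing to do.", isbn, r"{{ISBN|\1}}", collapse_spaces))
```

With this structure, a page whose only change would come from general fixes returns None and is never saved, which is the "iff" condition proposed above.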

The regex above is too broad. It would presumably catch "ISBN0123456789", which is not currently a magic link, and "http://www.example.com/booklink?ISBN0123456789", which is also not a magic link and should definitely not be converted. The bot needs to be limited to operating on articles in Category:Pages using ISBN magic links, and it should ideally use the same regex that MediaWiki uses to detect ISBNs to turn into magic links. – Jonesey95 (talk) 17:48, 25 March 2017 (UTC)
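The difference is easy to check. The sketch below transcribes the first regex into Python and compares it with a stricter pattern in the style of the request's Find field (a word boundary before ISBN, then at least one space or non-breaking-space entity). It is an illustration under those assumptions, not MediaWiki's own code. Incidentally, `\s` already includes the tab character in Python (and in .NET), so `[\s\t]` matches exactly what `\s` matches.

```python
import re

# The first regex from this request, transcribed to Python.
BROAD = re.compile(r"ISBN[\s\t]*(([0-9]|-|–|X)+)")

# A stricter pattern in the style of the Find field: "ISBN" at a word boundary,
# then at least one space (or non-breaking-space entity), then a full number.
SPACE = r"(?:[^\S\n]|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;)"
SPDASH = r"(?:-|" + SPACE + r")"
STRICT = re.compile(
    r"\bISBN" + SPACE + r"+((?:97[89]" + SPDASH + r"?)?(?:[0-9]" + SPDASH + r"?){9}[0-9Xx])\b"
)

EXAMPLES = [
    "ISBN0123456789",                                  # no space: not a magic link
    "http://www.example.com/booklink?ISBN0123456789",  # part of a URL
    "ISBN 0123456789",                                 # an actual magic link
]

if __name__ == "__main__":
    for text in EXAMPLES:
        print(f"{text!r}: broad={bool(BROAD.search(text))}, strict={bool(STRICT.search(text))}")
```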

Jonesey95 True. I'll do some tests. Moreover, I'll ignore all URLs, filenames, etc. using AWB's hide feature. -- Magioladitis (talk) 19:09, 25 March 2017 (UTC)
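The idea behind hiding (mask protected regions, run the replacement, then restore them) can be sketched in Python as below. This is a hypothetical illustration of the technique, not AWB's implementation; which regions to protect (here: URLs, `[[File:...]]`/`[[Image:...]]` targets and `<nowiki>` spans) and the NUL-delimited placeholder format are assumptions.

```python
import re

# Regions treated as protected in this sketch (an assumption, not AWB's list):
# external URLs, the target of [[File:...]]/[[Image:...]] links, <nowiki> spans.
PROTECTED = re.compile(
    r"https?://[^\s\]|]+"
    r"|\[\[(?:File|Image):[^|\]]*"
    r"|<nowiki>.*?</nowiki>",
    re.IGNORECASE | re.DOTALL,
)
PLACEHOLDER = re.compile(r"\x00(\d+)\x00")


def replace_outside_protected(text, pattern, replacement):
    """Hide protected regions, run the replacement, then restore them."""
    hidden = []

    def hide(match):
        hidden.append(match.group(0))
        return f"\x00{len(hidden) - 1}\x00"  # NUL-delimited index, assumed absent from wikitext

    masked = PROTECTED.sub(hide, text)
    replaced = pattern.sub(replacement, masked)
    return PLACEHOLDER.sub(lambda m: hidden[int(m.group(1))], replaced)


if __name__ == "__main__":
    isbn = re.compile(r"\bISBN[^\S\n]+((?:97[89][- ]?)?(?:[0-9][- ]?){9}[0-9Xx])\b")
    page = "[[File:ISBN 0306406152 cover.jpg|thumb]] Cited as ISBN 0306406152."
    print(replace_outside_protected(page, isbn, r"{{ISBN|\1}}"))
```

Here the file name keeps its text while the bare ISBN in the prose is converted.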

Jonesey95 In fact I plan to start by helping in manually fixing those in Wikipedia:WikiProject Check Wikipedia/ISBN errors to avoid unnecessary duplicated runs. -- Magioladitis (talk) 19:11, 25 March 2017 (UTC)

Thanks for picking this up again. Is there a reason you're only doing ISBN in this run and not PMID and RFC too? Legoktm (talk) 20:18, 25 March 2017 (UTC)

OK. I can do them too. -- Magioladitis (talk) 20:30, 25 March 2017 (UTC)
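Extending the run to PMID and RFC needs a pattern of its own. A minimal Python sketch, assuming these magic links consist of the keyword, one or more spaces (or non-breaking-space entities), then a run of digits, and that the targets are the PMID and RFC templates (wikitext such as `{{PMID|12345678}}` and `{{RFC|2616}}`). The pattern and the function name `replace_pmid_rfc` are illustrative and should be checked against MediaWiki's actual behaviour before use.

```python
import re

# Assumed shape of PMID/RFC magic links: keyword, spaces, digits.
SPACE = r"(?:[^\S\n]|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;)"
PMID_RFC = re.compile(r"\b(PMID|RFC)" + SPACE + r"+([0-9]+)\b")


def replace_pmid_rfc(text):
    """Return (new_text, count) with PMID/RFC magic links turned into templates."""
    # \1 is the keyword (also the template name), \2 the number.
    return PMID_RFC.subn(r"{{\1|\2}}", text)


if __name__ == "__main__":
    print(replace_pmid_rfc("See PMID 12345678 and RFC 2616."))
```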

It should not need to be said, but edit summaries for this task should include the phrase "Task 54" and link back to this BRFA or to an up-to-date user page that links to this BRFA. – Jonesey95 (talk) 16:29, 26 March 2017 (UTC)
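A summary meeting that requirement could be assembled per edit from the replacement counts. The Python sketch below is hypothetical: the wording is an assumption, and it links to the BRFA page title (Wikipedia:Bots/Requests for approval/Yobot 54) with a plain wikilink. It refuses to build a summary when nothing was replaced, since such a page should be skipped.

```python
BRFA_PAGE = "Wikipedia:Bots/Requests for approval/Yobot 54"


def edit_summary(isbn=0, pmid=0, rfc=0, genfixes=False):
    """Build a per-edit summary naming Task 54 and linking to its BRFA."""
    counts = [("ISBN", isbn), ("PMID", pmid), ("RFC", rfc)]
    parts = [f"{n} {kind} magic link{'' if n == 1 else 's'}" for kind, n in counts if n]
    if not parts:
        # Nothing was replaced, so the page should be skipped, not saved.
        raise ValueError("no replacements made; skip this page")
    summary = f"[[{BRFA_PAGE}|Task 54]]: replace {', '.join(parts)} with templates"
    if genfixes:
        summary += " (and AWB genfixes)"
    return summary


if __name__ == "__main__":
    print(edit_summary(isbn=2, genfixes=True))
```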

A user has requested the attention of a member of the Bot Approvals Group. Once assistance has been rendered, please deactivate this tag by replacing it with ((t|BAG assistance needed)). I've already seen a bot that started the task! -- Magioladitis (talk) 18:01, 21 June 2017 (UTC)

Two bots have started. I'll also start manually to help. -- Magioladitis (talk) 18:12, 21 June 2017 (UTC)

Magioladitis, you may want to check your edit summaries, they're... not really working right. You might want to use Special:PermaLink/... instead of the full URL. Primefac (talk) 18:24, 21 June 2017 (UTC)
Primefac Thanks. Fixed. You may also want to note that your bot is running outside mainspace. Maybe leave those pages to the very end? -- Magioladitis (talk) 18:25, 21 June 2017 (UTC)
They all have to be done eventually. Who cares in which order they're fixed? Primefac (talk) 18:26, 21 June 2017 (UTC)
Primefac In one case you "fixed" a bug report on FrescoBot's page related to ISBN syntax. Maybe some should not be fixed. -- Magioladitis (talk) 18:28, 21 June 2017 (UTC)

If Yobot is using the same regex that Magioladitis was using (1, 2, 3, 4, 5, 6, 7, 8, 9), then the regex needs to be corrected. — JJMC89(T·C) 22:07, 21 June 2017 (UTC)

JJMC89 Already noted and fixed. I have also already fixed all the pages you noted. Thanks, Magioladitis (talk) 22:54, 21 June 2017 (UTC)
This was also a problematic edit. Not sure if it actually got fixed in the code, given that the solution was simply to remove one of the ISBNs before running it again. Primefac (talk) 01:13, 22 June 2017 (UTC)
When an ISBN is in the actual title of a journal article, the best way I know how to fix it is to wrap the ISBN in nowiki tags. There were too many editors trying to fix this article (gnome edit conflicts, hooray!), but here's a diff that gets to the basic idea. – Jonesey95 (talk) 03:41, 22 June 2017 (UTC)
The issue wasn't ((ISBN|2503049117)), 9782503049113 → ((ISBN|9782503049113)), which was done manually; it was (1) ISBN 2 503 04911 7 → ((ISBN|2)) 503 04911 7 (AWB) and then (2) ISBN 2 503 04911 7 → ((ISBN|2-503-04911-7)) (manual). (1) is from a bad regex. (2) and ISBN 2 503 04911 7 → ISBN 2-503-04911-7 shouldn't have been done, since they altered the title of the work being cited by adding hyphens. — JJMC89(T·C) 03:58, 22 June 2017 (UTC)
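Failure (1) can be reproduced directly. In the Python sketch below, the first regex from this request stops capturing at the first space, because its character class contains no space; a pattern in the style of the Find field, which allows a hyphen or space between digits, captures the whole number and leaves it exactly as written (no hyphens added). Both are transcriptions for illustration, not the bots' actual code.

```python
import re

# First regex from this request, in Python syntax. Its character class has no
# space, so the captured number stops at the first space.
BAD = re.compile(r"ISBN[\s\t]*(([0-9]|-|–|X)+)")

# Find-field style: a hyphen or space may separate digits, so a spaced ISBN is
# captured whole.
SPACE = r"(?:[^\S\n]|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;)"
SPDASH = r"(?:-|" + SPACE + r")"
GOOD = re.compile(
    r"\bISBN" + SPACE + r"+((?:97[89]" + SPDASH + r"?)?(?:[0-9]" + SPDASH + r"?){9}[0-9Xx])\b"
)

TEXT = "ISBN 2 503 04911 7"
bad_result = BAD.sub(r"{{ISBN|\1}}", TEXT)    # "{{ISBN|2}} 503 04911 7"
good_result = GOOD.sub(r"{{ISBN|\1}}", TEXT)  # "{{ISBN|2 503 04911 7}}"

if __name__ == "__main__":
    print(bad_result)
    print(good_result)
```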
Magioladitis, why aren't you using the same regex that the bots are using? It is working very well. – Jonesey95 (talk) 04:49, 22 June 2017 (UTC)

Primefac I did a lot of these manually. These two ISBNs are identical. This has nothing to do with the bot run. I have been fixing these manually for years. -- Magioladitis (talk) 05:30, 22 June 2017 (UTC)
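That the two ISBNs in the example above (2503049117 and 9782503049113) identify the same book can be checked with the standard conversion: add the prefix 978, drop the ISBN-10 check digit, and recompute the ISBN-13 check digit. A Python sketch with hypothetical function names; it does not validate the ISBN-10 check digit.

```python
DIGITS = "0123456789"


def isbn10_to_isbn13(isbn10):
    """Convert an ISBN-10 to ISBN-13 (the ISBN-10 check digit is not validated)."""
    chars = "".join(c for c in isbn10 if c not in "- ")
    if len(chars) != 10 or not all(c in DIGITS for c in chars[:9]):
        raise ValueError(f"not an ISBN-10: {isbn10!r}")
    core = "978" + chars[:9]  # drop the old check digit and add the 978 prefix
    total = sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(core))
    return core + str((10 - total % 10) % 10)


def same_book(a, b):
    """True if two ISBNs (10 or 13 digits, hyphens/spaces allowed) are equivalent."""
    def normalize(isbn):
        chars = "".join(c for c in isbn if c not in "- ").upper()
        return isbn10_to_isbn13(chars) if len(chars) == 10 else chars
    return normalize(a) == normalize(b)


if __name__ == "__main__":
    print(isbn10_to_isbn13("2503049117"))            # 9782503049113
    print(same_book("2503049117", "9782503049113"))  # True
```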

Primefac I am also fixing these. AWB does a lot of conversions, but not all. The id parameter is very often abused to contain a second, identical ISBN. -- Magioladitis (talk) 05:39, 22 June 2017 (UTC)

Jonesey95 Where is the regex the bots use? I see only individual approvals that do not contain the approved regex. Here, for instance, at Wikipedia:Bots/Requests for approval/PrimeBOT 13, the final regex is not given. The regex used by Magic links bot is broken too: ((ISBN|\3)) is bad code. -- Magioladitis (talk) 05:32, 22 June 2017 (UTC)

Most recently at User talk:JJMC89.
I had a thought, though. Rather than trying to do the work that the bots are already doing, which is (clearly, as illustrated above) a waste of your time and that of others, follow behind the bots in article space (Magic Links Bot is up to articles starting with "H" at this writing) and fix the magic links that the bots were unable to fix. You will find many interesting and satisfying problems to work on. For example, I found articles that did not have any magic links left, but were transcluding Template:WdH or Template:Infobox comics character and title or Template:Black and Bolton 2001. By fixing those templates, I was able to fix a few hundred pages with just three edits. There are also magic links embedded in ((quote)) templates and [File] calls that need to be fixed using semi-automated editing with careful previews. – Jonesey95 (talk) 05:40, 22 June 2017 (UTC)

Jonesey95 I use pretty much the same. I use ((ISBN|$3)) instead of ((ISBN|\3)). -- Magioladitis (talk) 05:42, 22 June 2017 (UTC)
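Whether `$3` or `\3` is correct depends on the regex engine running the replacement. AWB is a .NET application, and .NET replacement strings refer to group 3 as `$3`; Python's `re.sub` uses `\3` (or `\g<3>`) and inserts `$3` literally. Which language Magic links bot uses is not stated on this page, so the Python case below is only an illustration of the engine difference.

```python
import re

pattern = re.compile(r"\b(ISBN)([^\S\n]+)([0-9Xx-]+)")
text = "ISBN 0306406152"

# Python: \3 and \g<3> are backreferences to group 3.
print(pattern.sub(r"{{ISBN|\3}}", text))     # {{ISBN|0306406152}}
print(pattern.sub(r"{{ISBN|\g<3>}}", text))  # {{ISBN|0306406152}}
# Python: $3 has no special meaning, so it is inserted literally.
print(pattern.sub(r"{{ISBN|$3}}", text))     # {{ISBN|$3}}
```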

If you were using "pretty much the same" regex, I don't see how you could have made (and saved without catching in preview) the edits linked above. I urge you to listen to good-faith, constructive advice. – Jonesey95 (talk) 05:46, 22 June 2017 (UTC)
Jonesey95 I adjusted my regex to match the one you gave me, with the correction I just wrote. From my normal account I also do a lot of manual fixes. Yobot won't do these. -- Magioladitis (talk) 05:49, 22 June 2017 (UTC)

Jonesey95 I updated the regex of my bot and the function details. Thanks for the link. -- Magioladitis (talk) 05:52, 22 June 2017 (UTC)

Anomie I adjusted the regex to catch the same ISBNs as the other bots. I also suggest that all bots share the same regex to avoid problems. -- Magioladitis (talk) 07:53, 23 June 2017 (UTC)

Xaosflux Please review this request. -- Magioladitis (talk) 06:18, 27 June 2017 (UTC)

@Anomie: would you mind taking this one? — xaosflux Talk 11:10, 27 June 2017 (UTC)

Running the regex in manual mode works fine. -- Magioladitis (talk) 08:24, 30 June 2017 (UTC)