This is a page for requesting one-off database queries for certain criteria. Users who are interested and able to perform SQL queries on the projects can provide results from the Quarry website.

You may also be interested in the following:

Quarry does not have access to page content, so queries which require checking wikitext cannot be answered with Quarry. However, someone may be able to assist by using Quarry in another way (e.g. checking the table of category links rather than the "Category:" text) or suggest an alternative tool.

AfC review's 25th review[edit]

I would like to find all new AfC reviewers who recently hit some review threshold (see this discussion). I'd operationalize this as something like "All users with their first AfC review under 6 months ago and their 25th review within the last week", but if something with a similar effect is technically easier I'm not attached to the specific formulation. @Novem Linguae directed me here to see if someone could help write a query; they say the typical way to grab AfC reviews, which isn't a formally logged action, is to look at edit summaries, e.g. here. Rusalkii (talk) 00:20, 24 February 2024 (UTC)[reply]

To start with, this can't possibly see afc reviews where the underlying page has been deleted (including where the redirect left from draft or, more likely userspace, has been deleted) - edit summaries of deleted revisions aren't exposed to the toolforge replicas. And searching revision, which is necessary if you're looking more than 30 days back, without some way to drastically narrow the search space isn't feasible. "Only users who've had at least one matching edit summary in the last week" may or may not be sufficient, we'll see. These actions should really, really be tagged; go ask at WP:EFR, if leaving such edit summaries without using the actual tool are to be counted, or change the tool if not. —Cryptic 02:20, 24 February 2024 (UTC)[reply]
The AFC helper script that accepts/declines/rejects drafts makes a variety of edits, including to: drafts, the log page at Wikipedia:Articles for creation/recent, sometimes a userspace log depending on the user's preferences (example: User:Novem Linguae/AfC log), and user talk pages. Which edits would be most helpful to tag? Or do you recommend tagging all of them? Also, tags are different than edit filters right? I could just make a tag at Special:Tags and update AFCH's code, and not need to use an edit filter at all, correct? –Novem Linguae (talk) 02:44, 24 February 2024 (UTC)[reply]
I'd tag accepts and declines but not comments, log pages, or user talk pages, probably with a separate tag for accepts and declines/rejects. Yes, they're very different; I mention edit filters because they can add tags even if someone doesn't use the tool. —Cryptic 02:59, 24 February 2024 (UTC)[reply]
To give you some idea of the scope of the problem, quarry:query/80638. There's 118 users with at least one vaguely-likely-to-be-an-AFC-review edit in the last week; among them, they have 10.8 million total edits. Each of those edits has to be looked at to see if the edit summary matches. (Well, not quite all - I can exclude edits from before 11 December 2009, the first revision of what became the AFCH script. But most of them.) Even limiting it to one of the sets of users with lower edit count is going to take a long time.
Are the writes to Wikipedia:Articles for creation/recent optional, or do they always happen? Is there always exactly one edit there per afc review? If so, when did that start? Analyzing its history would be much, much faster even than if there'd been tags from the beginning, and would be able to deal with now-deleted reviews too. —Cryptic 04:23, 24 February 2024 (UTC)[reply]
And, as tends to happen, my query on toolforge finished just after I wrote that. The public version at quarry:query/80639 should finish in about an hour unless the results happened to cache favorably (I don't think that's likely for this sort of query), but as of a few minutes ago the only such user was Broc. —Cryptic 04:41, 24 February 2024 (UTC)[reply]
The /recent page is always written to for accepts, but never for declines and rejects. Filtering by user talk namespsace may be helpful for counting afc actions, because everyone always gets a notification, and user talks are never deleted by g13. –Novem Linguae (talk) 04:43, 24 February 2024 (UTC)[reply]
Yeah, just saw that from analyzing Broc's reviews. Looking at user talk notifications would help some with now-deleted reviews, but still be unusably slow with nothing to narrow them except edit summary. And many, many, many IP talk pages have been badly speedied. —Cryptic 04:56, 24 February 2024 (UTC)[reply]
I think it's a bit rare to have more than one tag per piece of software. At this point I am leaning towards having AFCH tag all its edits as AFCH. That could hopefully then be used to speed up quarry queries. Feature request.Novem Linguae (talk) 11:12, 24 February 2024 (UTC)[reply]

Users who meet certain thresholds[edit]

Hi all, over at WP:RFA2024/I someone proposed we pull administrators from those who meet the following criteria:

  1. At least 10,000 total edits, including at least 5,000 in main space
  2. At least 1,000 edits in the past year, including at least 500 in main space
  3. Account registered at least three years ago
  4. No sanctions within the past five years
  5. At least one featured article or three good articles
  6. Have never lost adminship under a cloud

Is it possible to generate a list of editors who meet at least 1-4? #6 is rare enough that I can check manually. #5 is a bit tricky. I believe each editor with number of GAs is listed at User:GA_bot/Stats. I believe all users who have brought a featured article through FAC are at Wikipedia:List of Wikipedians by featured article nominations. I think generating this list would help inform the discussion. My guess is that it's a small list. Thanks a million to anyone up to the challenge! Ajpolino (talk) 02:42, 27 February 2024 (UTC)[reply]

The short answer is "no".
User:GA bot/Stats is users by the number of reviews performed, not good articles written; the last entry on the list, ヒストリ案, is illustrative. Similarly, the FA page isn't the users who primarily wrote the articles, it's the ones who took them through the FAC bureaucracy (though there's a high amount of correspondence). And even if they were both good data, there isn't a convenient way to extract the number of articles from the GA page and the FA page includes articles that are no longer featured; those, at least, could be addressed by manually creating similar pages (WP:List of users with three or more Good Articles and WP:List of users with at least one still-Featured Article).
#4 isn't doable even in principle. Sanctions, in general, aren't logged (in the Special:Log sense) unless they happen to be blocks.
I don't think there's a good list anywhere for #6. We have a couple applicable subpages of Wikipedia:Former administrators/reason, but lots are mixed into /resigned.
#2 and the second half of #1 are absurdly slow. There's no remotely efficient way to count edits by time or namespace other than to look at all of them. That's barely doable when we're already looking at individual users, or a small list of users with relatively few edits - say, no more than a million total edits, in either case - but when I poked at this briefly when it was first proposed, I ran the numbers just for 10k total edits and 3 years' registration; there's 10,318 users with 441.2 million total edits. #5 will cut that down, but not nearly by enough. —Cryptic 04:33, 27 February 2024 (UTC)[reply]
Understood, thank you for looking into it. Would it be possible to get the list of just the first halves of #s 1 and 2, plus #3. I.e. Users whose accounts are at least 3 years old, with at least 10,000 total edits including 1,000 in the past year? I'm guessing even a filter of "have edited in the past year" would cut down your 10,318 users quite a bit, but I'm not sure it's possible or worthwhile to attempt. Thanks again Cryptic. Ajpolino (talk) 13:56, 27 February 2024 (UTC)[reply]
Here's the first half of #1 combined with #3: https://quarry.wmcloud.org/query/80710Novem Linguae (talk) 14:51, 27 February 2024 (UTC)[reply]
That shows, as of this writing, users who've registered less than three years ago. You'll also want to exclude users in groups sysop or bot. —Cryptic 15:18, 27 February 2024 (UTC)[reply]
"1000 in the last year" still means having to look at every edit each of those editors made in the last year, and there's too many. "Any edits in the last year" does help - it cuts the number down from 10319 (congrats, whoever you are) to 6176 (quarry:query/80711)- but the other counts still aren't going to be feasible until we're down to a couple dozen. —Cryptic 15:18, 27 February 2024 (UTC)[reply]

Top 500 most redlinked articles?[edit]

I would love to see the most red-linked articles. There's a wikispace table for this somewhere, but it has serious problems. Zanahary (talk) 07:41, 28 February 2024 (UTC)[reply]

Related:
  • Special:WantedPages - all namespaces, so not very useful. phab:T37758 to add filtering by namespace to this page exists and is open and awaiting a patch.
  • Wikipedia:Most-wanted articles - looks like an attempt to list this for mainspace. but the list is outdated by over a year and also appears to be incomplete.
Will let someone more experienced speak about if Quarry/SQL can help here. –Novem Linguae (talk) 07:49, 28 February 2024 (UTC)[reply]
Thanks for this! Yeah, Most-wanted is what I was referring to. It's confusing and clearly just not the most-redlinked articles. Zanahary (talk) 07:56, 28 February 2024 (UTC)[reply]
To be more specific: Most redlinked articlespace articles, whose redlinks are in articlespace (if that's possible). Zanahary (talk) 07:58, 28 February 2024 (UTC)[reply]
If this would exclude transclusions via template! I just want a meaningful result that has red-linked articles by the number of times they were linked to by an editor. Zanahary (talk) 08:06, 28 February 2024 (UTC)[reply]
You can't get that. —Cryptic 08:10, 28 February 2024 (UTC)[reply]
Why not? Zanahary (talk) 08:11, 28 February 2024 (UTC)[reply]
Bad decisions made 22 years ago. phab:T14396 from 2007 is a reasonable place to start looking; it contains a couple indices of the many, many discussions (even then!) begging for this to be made possible. —Cryptic 08:19, 28 February 2024 (UTC)[reply]
So it's not possible for the query to exclude template links? Zanahary (talk) 08:23, 28 February 2024 (UTC)[reply]
I believe the link documents that pretty thoroughly? I mean, the workarounds to get a single page's incoming nontranscluded links clearly can't be scaled to multiple pages, let alone all articles. —Cryptic 09:39, 28 February 2024 (UTC)[reply]
It would still be useful and has been requested repeatedly, most recently in m:Community Wishlist Survey 2023/Larger suggestions/A way to see WhatLinksHere directly, excluding transcluded links. Certes (talk) 16:05, 28 February 2024 (UTC)[reply]
Leaving out the transcluded-link question, there's no reason someone can't just run Certes' query, as linked from WP:MWA, again. The hard part's already done, it just takes someone willing to deal with the tedium of running it for each partition of titles and collating the results - AND pl_title < "B", AND pl_title >= "B" AND pl_title < "C", ... AND pl_title >= "Y" AND pl_title < "Z", AND pl_title >="Z". —Cryptic 08:21, 28 February 2024 (UTC)[reply]
If it's going to be used regularly then we can copy the Quarry query to Wikipedia as a ((database report)). That has the advantage that anyone can edit it on Wikipedia and re-run it without logging in to Quarry. (Only I can edit the Quarry query, though anyone can copy and paste it.) pl_title LIKE "B%" seems as fast as the >= AND < syntax, and may be more readable. Beware that many titles which appear to be commonly redlinked are actually linked once from a widely used template which contain dozens of similar redlinks, e.g. ((Areas of Chennai)). Certes (talk) 09:40, 28 February 2024 (UTC)[reply]
LIKE 'B%' is never going to work for non-alphabetically-starting titles. Granted they're not likely to be widely-linked, but it's not impossible. —Cryptic 09:51, 28 February 2024 (UTC)[reply]
True. There are plenty of initials before A (notably 1 and 2 for "2024 in Tiddlywinks" etc.) and after Z. We need to use <= and > for the first and last partitions. We may also be able to get away with something like >="W" for adjacent uncommon initials. Certes (talk) 16:07, 28 February 2024 (UTC)[reply]

List of old issues (per year) for articles in a WikiProject[edit]

I've been able to create some queries at Quarry but my SQL is a bit too rusty for some things. Using my basic query for listing all mainspace pages in a WikiProject, I'd like to take a category like Category:Clean-up categories from 2013 and two levels of subcategories' worth of pages and do an intersection with the initial query's results. I realize I could probably use PetScan but I'd like to list these and results for other years on a page that a bot fills in with Database Reports. Any assistance appreciated! Stefen Towers among the rest! GabGruntwerk 03:14, 8 March 2024 (UTC)[reply]

@StefenTower quarry:query/81027. Change the two WHERE statements in the CTEs as needed and let me know if you encounter any issues. Uhai (talk) 06:42, 8 March 2024 (UTC)[reply]
Uhai Wow, it works perfectly - thanks! I only needed to make a couple minor adjustments for my use. Check out the results here. I'm excited I won't need to manually keep up this data. Now if I could wrap my head around the SQL (I never had to use RECURSE before). :) Best, Stefen Towers among the rest! GabGruntwerk 08:28, 8 March 2024 (UTC)[reply]
No problem, glad to help! Uhai (talk) 11:16, 8 March 2024 (UTC)[reply]