The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

Operator: Cyde Weys

Automatic or Manually Assisted: Automatic, runs every xx minutes as a cron job.

Programming Language(s): pyWikipediaBot and crontab

Function Summary: Automatically listifies certain maintenance categories at regular intervals.

Edit period(s) (e.g. Continuous, daily, one time run): Continuous

Edit rate requested: Varies; see below for the full description.

Already has a bot flag (Y/N): Yes

Function Details:

A test run has already been completed without malfunction or incident. Basically, the bot automatically updates the following four pages at given intervals (each page is a listification of a category):

The purpose of this is to keep track of the members of these categories over time. This is a useful analytical tool because categories, as implemented on-wiki, display only what is in the category at that exact moment, not what was in it minutes or hours ago. Doing this on-wiki and keeping the record in a page history makes it easily accessible and understandable to all regular Wikipedians. The lists will prove useful for, among other things, identifying pages that have been removed from proposed deletion categories without actually being deleted, analyzing the churn rate of maintenance categories, providing a semi-static list view of these categories, which some editors prefer to use, and allowing long-term auditing of requests for unblock without requiring database access to search all old contributions for {{unblock}}.

As the bot is set up right now, it edits the pages at intervals of 20, 5, 20, and 10 minutes, respectively. This can easily be changed as necessary. I haven't yet settled on the final values, as I've only just started experimenting with the trade-off between keeping the lists up to date and the resources used in committing an edit.
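
For illustration, a listified page could be generated roughly as follows. This is a minimal sketch, not Cydebot's actual code: the function name `listify`, the header line, and the numbered-list layout are all assumptions, and fetching the category members and saving the page (which pyWikipediaBot would handle) are left out.

```python
def listify(category, titles):
    """Render category members as a numbered wikitext list.

    `category` is the category name without the "Category:" prefix and
    `titles` is an iterable of full page titles. Both the name of this
    function and the page layout are hypothetical.
    """
    titles = sorted(set(titles))  # stable order keeps diffs between revisions small
    lines = [f"Members of [[:Category:{category}]] ({len(titles)} entries):"]
    # The leading colon makes each entry a plain link, so category or file
    # titles are linked rather than categorizing or embedding on the list page.
    lines += [f"# [[:{title}]]" for title in titles]
    return "\n".join(lines)


if __name__ == "__main__":
    print(listify("Proposed deletion", ["Example B", "Example A"]))
```

Sorting the titles means that consecutive revisions differ only where members were actually added or removed, which is what makes the page history usable as a record of the category over time.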

Discussion

Bot trial run approved for the duration of one week. Betacommand (talk • contribs • Bot) 03:47, 19 January 2007 (UTC)
The lists look exceptional (particularly the PROD ones) -- Samir धर्म 12:38, 19 January 2007 (UTC)
Will these pages reflect the current entries as of a point in time, or be ever appending? — xaosflux Talk 15:13, 21 January 2007 (UTC)
Looks like they are replacing; in that case, care to add some functionality to them, perhaps (delete) links next to the CSDs or old prods? — xaosflux Talk 05:27, 22 January 2007 (UTC)
Admins are expected to check the articles before deleting, even if they are expired prods, to make sure it should actually be deleted. As such, they'll open the article, review it, and then they might as well click the "delete" tab. I may have missed something, but it would seem that the "delete" button in the list would be redundant and possibly even foster non-sighted deletion, which is a Very Bad Thing. Just my thoughts, and if I've gone off on a totally wrong tangent, please tell me. Cheers, Daniel.Bryant 06:54, 23 January 2007 (UTC)
Unless you have a tabbed browser, in which case you might open the article and the delete page at the same time. I do that on Wikipedia:AIV all the time: if the user does not need to be blocked I close the page, if he does the page is ready. HighInBC (Need help? Ask me) 22:05, 23 January 2007 (UTC)

I don't particularly see the value of delete links, although I will look into putting the current number of items in the list into the edit summary. That's something that can easily be done, and should add some value. Unfortunately, because Cydebot is a bot, his edits won't by default show up in watchlists, so someone has to specifically choose to view bot edits before this edit summary stuff will really be all that useful. --Cyde Weys 05:10, 24 January 2007 (UTC)

There are situations where CSD gets loaded with pages that you can delete simply on title alone, but it is rare enough that it would more likely cause the bad practice mentioned above for those not familiar enough with the situation. (Discussion withdrawn) — xaosflux Talk 03:27, 25 January 2007 (UTC)

I have modified the edit summary in pyWikipediaBot to include the number of entries in the listified category. This addresses the suggestion that was made above. --Cyde Weys 17:40, 25 January 2007 (UTC)

Excellent, I've been waiting for that. I find these useful to use, but if someone is going to use this for analysis wouldn't it make sense to divide the list into 3 sections (new items added since the last update, items carried over from last update, and items removed since last update (which would be red links in the case of the deletion lists)). I guess it depends on how you intend to use the output (other than as an alternate way to see what's in the category). NoSeptember 18:03, 25 January 2007 (UTC)
Well, the list of what has been added and removed is already in the diffs between revisions, so it isn't really that necessary to also display it explicitly within each revision. I was, however, thinking of breaking out the number of entries per diff, which doesn't really tell the whole story. For instance, nine new entries could be added to CAT:CSD while ten are deleted, yet someone just tracking the # of entries in the edit summaries would see it as going down by one, and wouldn't think to view the diff to see that some new stuff has actually been added. Thus I'm looking into putting numbers into the edit summary like (100 current, 10 removed, 9 added) or some such. --Cyde Weys 21:30, 25 January 2007 (UTC)
Sounds good to me. The intervals sound quite short though: can anyone comment on the implications of frequent category page-gets as far as server impact is concerned? I'd assume these have more implications for db queries than the same number of article (etc) page fetches due to different caching, but I'm entirely hazy on the details of how that's implemented. Anyhoo, that's just a matter of tuning, as Cyde says. Alai 23:57, 26 January 2007 (UTC)
Getting a category page takes just as much server "effort" (as it were) as getting an article page. What requires a little bit of extra effort with categories is updating them when the text of another page changes to add or remove the category from it. But the category page itself is entirely cached. Once every five minutes is going to have a completely negligible effect on the servers. Hell, look at AntiVandalBot: it compares the diffs of every edit made to Wikipedia, live, and it's not really a problem either. --Cyde Weys 16:59, 28 January 2007 (UTC)
From what I know of the Mediawiki software, I think Cyde is right about category pages being cached, and having a server load on par with loading a cached article. HighInBC (Need help? Ask me) 17:05, 28 January 2007 (UTC)
I know zilch about same, but I should have inferred as much from the structure of the job queue handling of large category changes. Fair enough then, aforementioned 'tuning' on the basis of the write-rate seems perfectly sufficient. I'll note that the "not a problem by analogy" argument isn't entirely sound, since replication of numerous reasonable loads isn't necessarily itself reasonable. (More of a problem if there were umpteen AVBs, though, rather than in this case.) Alai 07:31, 31 January 2007 (UTC)
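
The edit summary format Cyde Weys proposes above, such as (100 current, 10 removed, 9 added), can be derived by comparing the previous and current member lists as sets. The following is a minimal sketch under that assumption; the function name `summarize_change` is hypothetical, and this is not the code that was actually added to pyWikipediaBot.

```python
def summarize_change(previous, current):
    """Build an edit summary from two snapshots of a category's members.

    `previous` and `current` are iterables of page titles. The summary
    wording follows the example given in the discussion; the function
    itself is an illustration only.
    """
    previous, current = set(previous), set(current)
    added = current - previous      # in the category now, absent last time
    removed = previous - current    # present last time, gone now
    return f"({len(current)} current, {len(removed)} removed, {len(added)} added)"


if __name__ == "__main__":
    # The scenario from the discussion: ten entries leave, nine new ones arrive.
    before = [f"Page {i}" for i in range(101)]
    after = before[10:] + [f"New page {i}" for i in range(9)]
    print(summarize_change(before, after))
```

Reporting both counts separately covers the case raised in the discussion, where a net change of one entry would otherwise hide nine additions and ten removals.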

Approved. Mets501 (talk) 19:51, 7 February 2007 (UTC)

The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.