The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

Operator: Dispenser

Automatic or Manually Assisted: Automatic

Programming Language(s): Python (pywikipedia framework)

Function Summary: Updates filesize for external links tagged with ((PDFlink)).

Edit period(s) (e.g. Continuous, daily, one time run): Monthly

Edit rate requested: 1 EPM (dependent on the responsiveness of the queried servers)

Already has a bot flag (Y/N): N

Function Details: It examines pages that transclude ((PDFlink)) and applies any needed fixes to the formatting of ((PDFlink)). It then queries the HTTP server named in each contained URL for the content-length and content-type and records these values. Next it inserts (or replaces) the second parameter with a binary-prefixed size of three significant figures, derived from the content-length, followed by an HTML comment that contains the content-type and content-length from the query. This is repeated for every instance of ((PDFlink)) found in the wikitext. If the content-length value is not found or the URL returns a 404, that instance is left unchanged. Where ((PDFlink)) is embedded as the format parameter of a citation template, it is removed. The page list is created from the "What links here" page, with all non-article pages filtered out.
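The header query described above could be sketched roughly as follows. This is an illustration only, not the code from pdfbot.py: the use of a HEAD request, the function name `probe_pdf`, the `opener` parameter (there so the function can be exercised without network access) and the 30-second timeout are all assumptions.

```python
import urllib.request
from typing import Callable, Optional, Tuple


def probe_pdf(url: str,
              opener: Callable = urllib.request.urlopen,
              timeout: float = 30.0) -> Optional[Tuple[str, int]]:
    """Return (content_type, content_length) for url, or None to skip it.

    Hypothetical helper: None means "leave this ((PDFlink)) unchanged",
    which covers a 404, an unreachable server, or a missing Content-Length.
    """
    try:
        # Assumption: a HEAD request is enough to obtain both headers.
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            length = response.headers.get("Content-Length")
    except (OSError, ValueError):
        # URLError and HTTPError (e.g. a 404) derive from OSError;
        # ValueError covers a malformed URL.
        return None
    if length is None or not length.isdigit():
        return None
    return content_type, int(length)
```

A caller would skip the template instance whenever `probe_pdf` returns `None`, matching the "leave that instance unchanged" rule in the function details.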

The source code has been posted at User:PDFbot/pdfbot.py
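Independently of the posted source, the size formatting from the function details can be sketched as below. It is a minimal sketch under stated assumptions: the names `binary_size` and `size_parameter` are hypothetical, the exact layout of the HTML comment is a guess, and sizes under 1024 are shown as plain bytes.

```python
import math

UNITS = ("KiB", "MiB", "GiB")


def binary_size(num_bytes: int) -> str:
    """Format a byte count with a binary prefix and three significant figures."""
    if num_bytes < 1024:
        return f"{num_bytes} bytes"
    value = float(num_bytes)
    for unit in UNITS:
        value /= 1024  # base 1024, not the SI base of 1000
        if value < 1000 or unit == UNITS[-1]:
            break
    # Round to three significant figures, then pick the number of decimals
    # that displays exactly those figures (1.23, 12.3, 123).
    rounded = float(f"{value:.3g}")
    decimals = max(0, 2 - math.floor(math.log10(rounded)))
    return f"{rounded:.{decimals}f} {unit}"


def size_parameter(content_type: str, content_length: int) -> str:
    """Build the second ((PDFlink)) parameter: size, then an HTML comment.

    The comment layout is an assumption; there is no space between the
    unit and the comment.
    """
    return (f"{binary_size(content_length)}"
            f"<!-- {content_type}, {content_length} bytes -->")
```

For example, `size_parameter("application/pdf", 1536)` yields `1.50 KiB<!-- application/pdf, 1536 bytes -->`.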

Discussion


This seems like a good idea to me. The only concern I have is whether there is a consensus that PDFs should be linked in this manner. HighInBC (Need help? Ask me) 21:02, 14 February 2007 (UTC)

((PDFlink)) provides a CSS wrapper around a regular external link which changes the external link icon for IE5-6 users. It also appends "PDF" to the link, and the file size if a second parameter is given.
I've also generated a sample edit and I hope to be cleaning up the code for posting. --Dispenser 21:47, 14 February 2007 (UTC)

Neat, I like how the details are in comments, they stay out of the way but are available. Sort of like how you can get the second that an edit occurred by using the Special:Export page. HighInBC (Need help? Ask me) 22:02, 14 February 2007 (UTC)

Second trial run: [1] [2] [3] [4]

I'm not sure if I like the unit linked as suggested by MOS:NUM#Binary prefixes. I'll remove it as it's not required. You'll also note that there's a tiny parsing bug where it doesn't parse indirect character references. --Dispenser 23:26, 14 February 2007 (UTC)

I like this idea a lot, but I worry that it will require too many edits. I counted 54,916 articles with PDF links as of the November 2006 DB dump. I guess that's only 38 days at one/minute but still, it's a lot of edits. If this were a vote I'd support. -Selket Talk 23:48, 14 February 2007 (UTC)

I think adding the file sizes is a good idea, but you should take out the space between the unit and the comment. — Omegatron 00:14, 15 February 2007 (UTC)
Changed. --Dispenser 00:33, 15 February 2007 (UTC)
I've been using AWB to generate the list so far. It has only reported 1,583 articles (excluding talk pages) which would take a little over a day to run. --Dispenser 00:33, 15 February 2007 (UTC)
I just realized that you were counting all PDFs linked from Wikipedia. No, the bot will only work on those that specifically transclude the PDFlink template. --Dispenser 03:09, 15 February 2007 (UTC)
Well, cheers then. I like it. -Selket Talk 05:26, 15 February 2007 (UTC)
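The run-time estimates in this thread follow directly from the requested edit rate. A small sketch of the arithmetic, assuming a steady 1 edit per minute with no pauses (the function name `run_minutes` is hypothetical):

```python
def run_minutes(pages: int, edits_per_minute: float = 1.0) -> float:
    """Minutes needed to edit every page once at a steady edit rate."""
    return pages / edits_per_minute


# All articles with PDF links (November 2006 dump count quoted above).
all_pdf_days = run_minutes(54_916) / (60 * 24)

# Only articles transcluding the PDFlink template (AWB count quoted above).
pdflink_hours = run_minutes(1_583) / 60

print(f"{all_pdf_days:.1f} days")    # about 38 days
print(f"{pdflink_hours:.1f} hours")  # a little over a day
```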

Third trial run: [5] [6] (These will probably be the last) --Dispenser 23:27, 16 February 2007 (UTC)

I like it, but perhaps using more standard size abbreviations would be appropriate? MB as opposed to MiB and KB as opposed to KiB? Locriani 09:18, 18 February 2007 (UTC)
I tend to agree about the units. HighInBC (Need help? Ask me) 13:19, 18 February 2007 (UTC)
According to the Manual of Style, using binary prefixes is the preferred (and proper) method since the base I'm using is 1024 and not SI's 1,000. The MOS also suggests linking the unit for those unfamiliar with them. However, I had wished to avoid creating unneeded links. I will link the unit if there is support for it. —Dispenser 01:13, 19 February 2007 (UTC)
I'd support linking it as [[Mebibyte|MiB]]. --ais523 13:29, 19 February 2007 (UTC)
I agree. Please add the link. —METS501 (talk) 04:48, 24 February 2007 (UTC)
I've linked KiB, MiB, and GiB, but not bytes; is that alright? —Dispenser 20:19, 4 March 2007 (UTC)
That seems sensible to me. --ais523 13:26, 5 March 2007 (UTC)
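The outcome of this thread (binary prefixes, with KiB, MiB and GiB linked but plain bytes left unlinked) can be illustrated with a short sketch. Only the `[[Mebibyte|MiB]]` form appears in the discussion; the other two article titles, and the names `link_unit` and `in_mebibytes`, are assumptions made by analogy.

```python
# Unit abbreviation -> article title. Only "Mebibyte" is taken from the
# discussion; the other two titles are assumed by analogy.
UNIT_ARTICLES = {"KiB": "Kibibyte", "MiB": "Mebibyte", "GiB": "Gibibyte"}


def link_unit(size_text: str) -> str:
    """Wikilink the binary-prefixed unit in e.g. '1.50 MiB'; leave bytes alone."""
    value, _, unit = size_text.rpartition(" ")
    article = UNIT_ARTICLES.get(unit)
    if article is None:
        return size_text  # e.g. '500 bytes' stays unlinked
    return f"{value} [[{article}|{unit}]]"


def in_mebibytes(num_bytes: int) -> float:
    """Size in MiB (base 1024), as opposed to SI megabytes (base 1000)."""
    return num_bytes / 1024 ** 2


print(link_unit("1.50 MiB"))
print(link_unit("500 bytes"))
# A file of 5,000,000 bytes is 5.00 MB in SI terms but roughly 4.77 MiB.
print(f"{in_mebibytes(5_000_000):.2f} MiB")
```

The gap between the two bases is why the unit shown matters: the same file reads about 5% smaller in MiB than in MB.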

Approved. This bot shall run with a flag. —METS501 (talk) 20:35, 5 March 2007 (UTC)

The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.