
Talk:Donald Trump and using WP:LOCALCON to disallow citation archives

The fine folks over at Talk:Donald Trump currently have a "Current consensus" item on their talk page that disallows including archive URLs for citations that aren't dead (25. Do not add web archives to cited sources which are not dead. (Dec 2017, March 2018)). This runs counter to this guideline, specifically WP:DEADREF, which seems to suggest that it's better to preventatively archive pages than to wait for them to be dead and hope that an archived copy is available (this guideline also notes that even if a link doesn't necessarily die, the content of the link can change and make the source unsuitable for statements it is used to support). My gut says to simply strike that item as a clear WP:LOCALCON and direct those editors here to make their case for an exception, but I wanted to see what the feeling was here before proceeding. Also relevant is this closed discussion: Special:Permalink/1197984238#Reversion_of_archives. —Locke Cole • tc 07:07, 23 January 2024 (UTC)

You know that it is entirely possible to "preventatively archive pages" without pushing the archive link into Wikipedia, right? Just tell archive.org to archive the page. Then, if you ever need it, there it is on archive.org waiting for you. If you don't yet need it, what is the point of keeping a prematurely frozen archive link here, when archive.org will keep track of all the archived versions that it has and let you choose which one you want when you want it?
I would suggest that, to the extent that WP:DEADREF suggests copying the archive link here rather than merely making an archived copy, that language should be changed. But I note that the actual language of DEADREF is merely to consider making an archived copy; the actual language suggesting copying it here is in WP:ARCHIVEEARLY which does not even have the status of a Wikipedia guideline. Therefore, there is nothing for LOCALCON to be violating.
As for why it can be a bad idea to copy the links here: because sources may still be in flux and the editors may prefer readers to see the current version than an old frozen version. This may be especially true for topics in current politics. —David Eppstein (talk) 07:52, 23 January 2024 (UTC)
(edit conflict × 2) I was about to suggest something similar, i.e. making sure archives exist without actually adding them (if that's possible). Primefac (talk) 07:57, 23 January 2024 (UTC)
WP:DEADREF links to a section titled Preventing and repairing dead links, which is kind of where I got the impression it was more than simply a suggestion (and as to WP:ARCHIVEEARLY, it is literally tagged as a how-to guide). I agree it's possible to create an archive and not link it, but this still places the burden on future editors/readers to find a revision of the page that supports the statement being cited which can be problematic if a source changes (as you note for political content, this can happen frequently). I've also always viewed citations as a point-in-time thing when it comes to people/events, so the idea that an archive link might point to an "old" version is a feature, not a bug. The reasons given at Talk:Donald Trump all seemed to revolve around bloating of the page size which seems like a technical concern that shouldn't be getting used as a means to stifle page development. —Locke Cole • tc 08:08, 23 January 2024 (UTC)
So your position is that this guideline forces editors to use frozen versions of sources rather than allowing sources to be dynamic? Instead, that seems to me to be the kind of content-based editorial decision that a local consensus is entirely appropriate for. —David Eppstein (talk) 17:30, 23 January 2024 (UTC)
*sigh* If the live source changes after a statement is written, the frozen archive can be used to verify the source as it was originally seen... Nothing is being "forced"; I'm just stating plainly that it's better behavior for editors to preserve their sources as they write rather than have to go through archives for potentially years to find the source that originally said something if the source ended up being dynamic/changing. Regardless of that, I'm concerned that we're recommending preventing dead links here in this guideline and a page has taken it upon itself to wholly disallow this good and desirable behavior. I'll again point to Special:Permalink/1197984238#Reversion_of_archives, where an editor was basically hit with a hammer over this and their response was about as good as you'd expect ("I am never touching this article again"). Do we really want individual pages to unilaterally decide these guidelines are irrelevant and drive off productive editors doing what we're suggesting? —Locke Cole • tc 18:44, 23 January 2024 (UTC)
"disallows including archive URLs for citations that aren't dead" – Good. It's not "good and desirable behavior". It's code bloat that we don't need, and additional cite-by-cite verbiage and link confusion that the reader doesn't need. Removing that cruft does nothing whatsoever to "stifle page development". It's entirely sufficient to have IA archive something while you cite it, and just not add to Wikipedia the archive-url that we do not presently need. If linkrot happens for a particular citation, then add it. — SMcCandlish ¢ 😼 22:01, 2 February 2024 (UTC)
I think the theory is that if the archive links are added now, then they will be less likely to be archive links to 404 pages (thus requiring manual intervention to find the correct one, rather than just using the most recent). WhatamIdoing (talk) 04:33, 3 February 2024 (UTC)
I don't understand how you think that changing Wikipedia to point to an archive link now, rather than merely telling the archive to make a copy but then only using that copy later when it is needed, would have any effect on what one finds at the archive link. If the archive link works, it works, and linking to it will not change that. If the archive link 404s, it 404s, and linking to it will not change that either. —David Eppstein (talk) 06:13, 3 February 2024 (UTC)
  • If you make the archive link today, and you record the archive link today, then you know the content is good, and you know which archive link you need to use.
  • If you make the archive link today, and sometime during the next several years, the page becomes a 404, then at some future, post-breakage date, you will have to go through multiple archived links, some of which have the desired content and some of which don't, to figure out which one actually verifies the contents (see "requiring manual intervention" in my comment above).
This is due to the structure and goals of the Internet Archive. They don't archive a URL just once. They make multiple copies at different points in time. WhatamIdoing (talk) 06:54, 15 February 2024 (UTC)
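To illustrate the point about recording one specific capture, here is a minimal Python sketch that builds a Wayback Machine address pinned to a capture time. It assumes the commonly seen `https://web.archive.org/web/<14-digit UTC timestamp>/<original URL>` address layout; the function name, the example URL, and the capture time are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone

# Assumed address layout for pinned captures: prefix/<YYYYMMDDhhmmss>/<original URL>
WAYBACK_PREFIX = "https://web.archive.org/web"


def pinned_snapshot_url(original_url: str, captured_at: datetime) -> str:
    """Return an archive address that points at one capture time (hypothetical helper)."""
    # Normalise to UTC, then format as the 14-digit timestamp used in the address.
    stamp = captured_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{original_url}"


if __name__ == "__main__":
    # Hypothetical capture time and page, for illustration only.
    captured = datetime(2019, 10, 4, 12, 0, 0, tzinfo=timezone.utc)
    print(pinned_snapshot_url("https://example.org/page", captured))
```

Recording an address like this at citation time is what removes the later search through multiple captures described in the second bullet.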
Does WP:DEADREF not reflect the current consensus here? I honestly don't care if people here want to shoot themselves in the foot anymore, so if the thought process from @David Eppstein and @SMcCandlish is that early archiving is "code bloat that we don't need" or that "it is entirely possible to 'preventatively archive pages' without pushing the archive link into Wikipedia" (sic, emphasis added), then perhaps it's time to strike DEADREF or shuffle it off to a different (non-guideline) page. Sources, especially online sources, can be brittle and subject to the whims of website designers and complete site overhauls where old links die completely (and current "archives" are just "not found" pages). I don't think "code bloat" should be a concern used to undermine preventative measures to preserve sources/citations. —Locke Cole • tc 04:45, 3 February 2024 (UTC)
DEADREF is not broken in any way, and is quite clear: When permanent links [DOIs, etc.] aren't available, consider making an archived copy of the cited document when writing the article; on-demand web archiving services such as the Wayback Machine (https://web.archive.org/save) or archive.today (https://archive.today) are fairly easy to use (see pre-emptive archiving). That does not say "and put the archived copy into the article before it is actually needed". All of the other material in that section, as in every single word of it, is about repairing citations with dead links.

What is broken is WP:ARCHIVEEARLY (which is part of a supplementary how-to essay, not a guideline), which someone added as their opinion and which clearly does not represent an actual consensus. It says: "To ensure link accessibility and stability, please consider pre-emptively adding an archive URL from an archive source such as the Internet Archive or WebCite." This practice is actually and clearly disputed, and that material should be changed, unless/until there is a firm consensus that not only is it good advice but that we actually need it despite WP:CREEP. It should instead re-state in a how-to manner what is said about this at DEADREF: create the archive on-demand today, but do not put it into the article if it is not already needed. — SMcCandlish ¢ 😼 05:07, 3 February 2024 (UTC)

"create the archive on-demand today, but do not put it into the article if it is not already needed" [citation needed] —Locke Cole • tc 17:44, 3 February 2024 (UTC)
I really wish the consensus at Donald Trump were exported site-wide. I had a discussion about a month ago on the same topic at Talk:Augustus. Basically, people are still wasting their time WP:MEATBOT-ing and the results of it are extremely disruptive to editors seeking to actually improve articles rather than "maintaining" them. Ifly6 (talk) 03:42, 18 March 2024 (UTC)
I'm struggling to see how having an archive of a citation used in our article is somehow a negative thing. I still haven't come across a convincing reason other than WP:IJUSTDONTLIKEIT. Which.. cool. I like an encyclopedia where I can verify the information it contains through its sources, today and in the future. It kind of stuns me that anyone can defend not having archive links ready that capture sources in the state they were when they were used for a statement. —Locke Cole • tc 04:41, 18 March 2024 (UTC)
Can you explain what these drive-by archivers are doing that isn't already done automatically? Ifly6 (talk) 05:12, 18 March 2024 (UTC)
"automatically": I'm assuming you mean a bot that finds dead references and attempts to produce an archive after the link has died? That's easy, see WP:DEADREF, but basically it's better to create an archive before a page goes missing (or changes substantially) than to wait until the worst has happened. If an archiving system like archive.org hasn't produced a backup, then there's no getting that source back (because it's already gone). —Locke Cole • tc 05:45, 18 March 2024 (UTC)
Bots create the archives automatically too... doing so around 24 hours after the site is added. And if you use |access-date= the bot will also choose the version closest to or before the access date if the link 404s. What is being done that isn't just drive-by archivers duplicating bot work? Ifly6 (talk) 09:45, 18 March 2024 (UTC)
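The selection rule mentioned in the comment above ("closest to or before the access date") can be sketched in Python as follows. This is one possible reading and not the bot's actual algorithm: it assumes the latest capture on or before the access date is preferred, with the earliest later capture as a fallback. The function name and all dates are hypothetical.

```python
from datetime import date
from typing import Optional, Sequence


def pick_snapshot(snapshots: Sequence[date], access_date: date) -> Optional[date]:
    """Choose a capture date for a citation (assumed rule, not the real bot logic)."""
    if not snapshots:
        # Nothing was ever archived, so there is nothing to fall back on.
        return None
    on_or_before = [s for s in snapshots if s <= access_date]
    if on_or_before:
        # Prefer the latest capture made on or before the access date.
        return max(on_or_before)
    # Otherwise fall back to the earliest capture made after the access date.
    return min(snapshots)


if __name__ == "__main__":
    captures = [date(2018, 1, 1), date(2018, 7, 1), date(2019, 10, 4)]
    print(pick_snapshot(captures, date(2018, 7, 31)))
```

Under this reading, a citation with no |access-date= gives the selection nothing to anchor on, which is one reason the parameter matters for later repair.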
What bot is doing this? —Locke Cole • tc 15:10, 18 March 2024 (UTC)
It's all documented at Wikipedia:Link rot#Automatic archiving. There is a bot called No more 404 that archives added links. There is a bot, WP:IABOT, which monitors whether those links become dead and inserts |archive-url= when that occurs. Ifly6 (talk) 16:52, 18 March 2024 (UTC)
So... not a BOT in the WP:BOT sense but an opaque, off-wiki process that has no way of being verified? I'm still not entirely sure why people are so aggressively against pre-emptive archiving. Do you want your work to be unverifiable if a link goes stale, dead or changes? —Locke Cole • tc 19:52, 18 March 2024 (UTC)
If the information being cited at the link source changes, then our articles need to reflect that change. Linking to an “archived” (i.e. out-of-date) version of the source isn’t what we want. Indeed, an out-of-date source may be considered “no longer reliable.” Blueboar (talk) 20:04, 18 March 2024 (UTC)
That's wonderful. It sounds like something that should be addressed on an article talk page when a changed link occurs. It sounds secondary to wanting to preserve our sources so they can be verified even if they change or disappear. —Locke Cole • tc 20:56, 18 March 2024 (UTC)
That is the opposite of how I see it. We cite a source to verify content in an article. If the information on a website changes, then it may no longer support that content. It is then necessary to either change what the article says to match the source, or find a new source to support what the article says. We need to be able to verify that the website in its previous state did indeed support the content in the article. If the original content of that website is no longer valid, then it doesn't matter whether the website is unchanged, has been updated, or is dead. We then need to assess available reliable sources to determine what the article should say. If we know that a website is likely to be updated, we should be citing an archived version of the website that supports the content of the article, rather than linking to something that is likely to stop supporting the contents. I think that in the overwhelming majority of cases, any changes to a website are likely to reduce its usefulness as a source for the contents of the article. Donald Albury 21:27, 18 March 2024 (UTC)
Exactly. Suppose we write that 25 people were killed in a deadly accident, based on a source that originally reports “25 people were killed”… ok, our content is verifiable. HOWEVER, let’s say that subsequently that source amends its reporting to say “25 people were seriously injured, and 3 died”… now our content is outdated, and is no longer verified by the source. We need to update our content. If we prematurely archive the source, we might never catch that the source corrected its information and no longer supports the “25 dead” number. Blueboar (talk) 22:06, 18 March 2024 (UTC)
There are more kinds of articles than "current events"-type articles, you understand that right? There are other reasons to have archives prepared in advance as well, not least of which is being able to confirm if a statement was ever true (for behavioral issues where an editor makes a statement, provides a source, then claims it "changed"). —Locke Cole • tc 03:21, 19 March 2024 (UTC)
Blueboar, in that unusual circumstance, both the article content and the archived link need to be updated.
The far more common circumstance is: the article gets cited, the bot adds an archive link, the original site (or at least that article) dies, and we can still see what the original article said when it existed.
On a side note, I wonder if people are really understanding each other. We're talking about the difference between these two versions:
  • Regina Milanov. "Istorija ribarskog gazdinstva Ečka". Retrieved 2018-07-31.
  • Regina Milanov. "Istorija ribarskog gazdinstva Ečka". Archived from the original on 2019-10-04. Retrieved 2018-07-31.
If you've got the first, and the website dies (this particular website now throws an HTTP 403 error), then you can't tell whether the website used to say something relevant without someone digging through the Internet Archive to see whether they happened to archive that page before it died. WhatamIdoing (talk) 05:35, 26 March 2024 (UTC)
That article will have a thousand citations by the election. Will 1000 extra parameters and links slow the page loading? Rjjiii (talk) 05:18, 18 March 2024 (UTC)
"Will 1000 extra parameters and links slow the page loading?" Even if it does, it shouldn't be the basis for how we edit the project. See WP:AUM for a time when page loading was used as an excuse to try and prevent editors from creating a better encyclopedia. It's on the devs to look at things that are causing site problems and address them using technical means. —Locke Cole • tc 05:48, 18 March 2024 (UTC)
I can't see anything in policy that dictates the point either way, so editors seem to be allowed to make article by article decisions on the matter. Personally I would be pro-inclusion for the reason outlined by WhatamIdoing above, but I don't see anything that says it must be done one way or the other. -- LCU ActivelyDisinterested «@» °∆t° 10:12, 18 March 2024 (UTC)

Why would we WANT to “preemptively” archive?

Perhaps I am missing something, but I don’t really understand why anyone would want to archive a citation “preemptively”. Could someone who supports doing so enlighten me? Blueboar (talk) 16:51, 18 March 2024 (UTC)

Is there something at WP:DEADREF and WP:ARCHIVEEARLY you don't understand? —Locke Cole • tc 19:45, 18 March 2024 (UTC)
Yes… I understand using archives for dead links… but when we expect a webpage to change its content (say because it is out-of-date or incorrect) why would we want to cite an archived version? I would think we would want to cite the most up-to date version (and if necessary change OUR article content to match the up-to-date, corrected website). Blueboar (talk) 23:01, 18 March 2024 (UTC)
There's one case I know of where it's worth doing: Galactic Central hosts bibliographic details such as this which are autogenerated from a database that is updated once a quarter. When the quarterly update happens, all the URLs change, so if you were citing that page to show that Keith Laumer's The Planet Wreckers appeared in the February 1967 issue of Worlds of Tomorrow, the page will no longer contain that information. When I cite this website I usually preemptively archive it so that I don't have to go hunting for the right archive page a year later. Mike Christie (talk - contribs - library) 23:11, 18 March 2024 (UTC)
Blueboar, I see on a very regular basis articles with dead references that were never archived. Would it be nice if those references had been archived shortly after they were originally added to the article? Yes. Do I think it must occur? Not really. So I guess I'm not really in either of the camps discussing the issue in the main thread, but I guess from a maintenance standpoint I am pro-archive. Primefac (talk) 16:49, 19 March 2024 (UTC)
That is a good argument for triggering off-site archival of pages that you use as references. It is not an argument for using the archived copy to replace the source on Wikipedia. —David Eppstein (talk) 17:51, 19 March 2024 (UTC)
Who's talking about "replacing"? The idea, I gather, is to have both live URL and archive URL listed before it's too late. Which sounds reasonable enough. Frankly, I can understand if people say "I'm too lazy for that", but I don't have the slightest idea why anyone would want to prevent others from doing it. Gawaon (talk) 18:04, 19 March 2024 (UTC)
@David Eppstein, @Gawaon Yeah, is this the reason for the attitude thus far? At no point has anyone suggested we replace functional URLs with their archives. This is why {{cite web}} has both a |url= and |archive-url=, and only when |url-status=dead (or if |url-status= is not set) does the |archive-url= get used if it is present. If |url= is still live, simply using |url-status=live will keep |archive-url= from being shown. This is all explained in the docs at {{cite web}}. The only argument at Talk:Donald Trump appears to center around "bloat" of the page, which again, is not a reason to avoid good maintenance of one of our more popular articles. —Locke Cole • tc 22:32, 19 March 2024 (UTC)
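As a rough illustration of the parameter interaction described in the comment above, here is a small Python sketch. It is a simplified model, not the template's real implementation: it covers only the parameters named in this discussion (|url=, |access-date=, |archive-url=, |archive-date=, |url-status=) plus |title=, and it assumes an unset |url-status= is treated like "dead" whenever an archive URL is present. Both function names and all example values are hypothetical.

```python
from typing import Optional


def cite_web(url: str, title: str, access_date: str,
             archive_url: Optional[str] = None,
             archive_date: Optional[str] = None,
             url_status: Optional[str] = None) -> str:
    """Build {{cite web}} wikitext, omitting any parameter that is not set."""
    params = [
        ("url", url),
        ("title", title),
        ("access-date", access_date),
        ("archive-url", archive_url),
        ("archive-date", archive_date),
        ("url-status", url_status),
    ]
    body = " ".join(f"|{name}={value}" for name, value in params if value is not None)
    return "{{cite web " + body + "}}"


def primary_link(url: str, archive_url: Optional[str] = None,
                 url_status: Optional[str] = None) -> str:
    """Return the link a reader is sent to first, under the assumed rule."""
    # The archive takes over only when one exists and the original is not marked live.
    if archive_url is not None and url_status != "live":
        return archive_url
    return url


if __name__ == "__main__":
    live = "https://example.org/a"
    archived = "https://web.archive.org/web/20191004000000/https://example.org/a"
    print(cite_web(live, "Example", "2018-07-31", archived, "2019-10-04", "live"))
    print(primary_link(live, archived, "live"))
```

In this model, pre-emptive archiving amounts to calling `cite_web` with the archive parameters filled and `url_status="live"`, so the original URL stays the primary link until someone flips the status.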

A modest proposal

The following discussion is an archived record of a request for comment. Please do not modify it. No further edits should be made to this discussion. A summary of the conclusions reached follows.
Withdrawing to see if proposed method is acceptable —Locke Cole • tc 15:49, 28 March 2024 (UTC)

There appear to have been some misunderstandings above about what archiving a citation means (see WP:DEADREF for the current language and WP:ARCHIVEEARLY for the process, see WP:LINKROT for some reasons why archiving is a good idea). Just to reiterate, it does not mean replacing existing |url= with a link to Archive.org/Wayback Machine. It means filling the |archive-url= and |archive-date= parameters and setting |url-status=live for links that are not presently dead (see {{cite web}} for more details on the parameters and how they interact). While my reading was that creating such archive URLs was strongly encouraged, there appears to be a consensus that the current language does not even say that. However, what I would propose is not explicitly requiring archive links, but perhaps language here that effectively disallows individual pages from banning the practice altogether. I can't really see where having them causes any harm to our editors or our readers, and the benefits of having them far outweigh the arguments against including them.

All that being said, please indicate whether you Support including language that would forbid individual pages from creating a WP:LOCALCON to disallow archive links, or whether you would Oppose such language. —Locke Cole • tc 04:33, 27 March 2024 (UTC)

A proposed version appears below with the addition highlighted.

To help prevent dead links, persistent identifiers are available for some sources. Some journal articles have a digital object identifier (DOI); some online newspapers and blogs, and also Wikipedia, have permalinks that are stable. When permanent links aren't available, consider making an archived copy of the cited document when writing the article; on-demand web archiving services such as the Wayback Machine (https://web.archive.org/save) or archive.today (https://archive.today) are fairly easy to use (see pre-emptive archiving). No page covered by this guideline may forbid including archive links in citations as described here using a local consensus.

—Locke Cole • tc 04:33, 27 March 2024 (UTC)

!Votes

The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Comments

In-line citations and spaces

Hi, I have a general question. Why is it required by the MOS to always put a citation immediately after the final character, instead of leaving a space in some cases? For the body of an article, I understand leaving no space. But for some areas, like an infobox, my humble opinion is that a space looks far better. Please see the infobox on this page. By the time you click the link, hopefully nobody has edited it, but currently some of the lines have spaces before citations and some don't. I may be in the minority, but I think when there is no space it looks dreadful, cluttered, and sometimes difficult to read if the word ends with a certain character, such as lowercase "i". If there's plenty of room for a space without messing up text or formatting, is there any flexibility for using spaces? Sorry, but this is just my pet peeve. I hate seeing those citations slammed up against the words when there is apparently no practical reason for it, other than adhering to a rigid policy. Wafflewombat (talk) 16:48, 20 April 2024 (UTC)

There's a note[3] in WP:Manual of Style#Punctuation and footnotes that suggests using a hair space for this purpose. -- LCU ActivelyDisinterested «@» °∆t° 20:29, 20 April 2024 (UTC)
Thanks for the tip! Very helpful to know. Wafflewombat (talk) 22:38, 20 April 2024 (UTC)