The Signpost

Tips and tricks

Citation tools for dummies!

In the last edition of The Signpost, I covered three bots highly useful to WikiProjects. In this edition of Tips and Tricks, I'm going to focus on smaller, more personal tools, that let you focus on more specialized tasks – some user scripts, some gadgets, and some external tools. In particular, I'm going to try to do a brief summary of the main scripts/gadgets/tools related to citations.

Some of the text was taken from these tools' description pages, which I highly encourage you to read if any of them interest you.

How to install user scripts

A quick note on how to install user scripts, using my own WP:UPSD (which is hosted at User:Headbomb/unreliable.js) as an example. The other scripts can be installed in exactly the same way, replacing User:Headbomb/unreliable.js with User:Example/source.js accordingly. Some scripts may have additional customization options, detailed on their documentation page.

Method 1 – Automatic
  1. Go to the 'Gadgets' tab of your preferences and select the 'Install scripts without having to manually edit JavaScript files' option at the bottom of the 'Advanced' section. Refresh this page after enabling that.
  2. Click on the 'Install' button in the infobox on the right of the documentation page (if it exists), or at the top of the source page.
Method 2 – Manual
  1. Go to Special:MyPage/common.js. (Alternatively, you can go to Special:MyPage/skin.js to make the script apply only to your current skin.)
  2. Add importScript( 'User:Headbomb/unreliable.js' ); // Backlink: [[User:Headbomb/unreliable.js]] to the page (you may need to create it).
  3. Save the page and bypass your cache to make sure the changes take effect.
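
The line added in Method 2 follows a fixed pattern, so it can be generated for any script page. Here is a minimal sketch, assuming a hypothetical helper named buildImportLine (it is not part of MediaWiki or of any script mentioned here). It only produces the text to paste into common.js and installs nothing by itself.

```javascript
// Hypothetical helper: builds the line to paste into Special:MyPage/common.js.
// The title check is a simplification and only accepts "User:Name/file.js" pages.
function buildImportLine(scriptPage) {
  if (!/^User:[^/]+\/.+\.js$/.test(scriptPage)) {
    throw new Error('Expected a page title like User:Example/source.js');
  }
  return "importScript( '" + scriptPage + "' ); // Backlink: [[" + scriptPage + "]]";
}

console.log(buildImportLine('User:Headbomb/unreliable.js'));
// importScript( 'User:Headbomb/unreliable.js' ); // Backlink: [[User:Headbomb/unreliable.js]]
```

The trailing backlink comment is kept because the manual method shown above includes it.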

As a general caveat, for cybersecurity reasons, you should only install user scripts if you trust their author to not be secretly nefarious. Likewise for external sites from third parties. Gadgets can be directly enabled in your preferences, and their code has been community vetted, thus they represent a lesser security risk.

Citation Expander

The Citation Expander is a gadget that lets you invoke Citation bot. I wrote an in-depth guide a year ago, so I'll summarize the main points here and you can read that article if you want to know more. If you're new to tool-assisted editing and only install one tool today, this is very likely the one you want.

The key idea is that you can have citations like

  • ((cite journal |jstor=20107388 ))
  • ((cite journal |doi=10.1038/351624a0 ))
  • ((cite book |isbn=978-0-9920012-2-3 ))

and have the bot automatically expand them to

  • Keen, Suzanne (2006). "A Theory of Narrative Empathy". Narrative. 14 (3): 207–236. doi:10.1353/nar.2006.0015. JSTOR 20107388. S2CID 52228354.
  • Wigley, Dale B.; Davies, Gideon J.; Dodson, Eleanor J.; Maxwell, Anthony; Dodson, Guy (1991). "Crystal structure of an N-terminal fragment of the DNA gyrase B protein". Nature. 351 (6328): 624–629. Bibcode:1991Natur.351..624W. doi:10.1038/351624a0. PMID 1646964. S2CID 4373125.
  • Robichaud, Marc; Basque, Maurice (September 2013). Histoire de l'Université de Moncton. ISBN 978-0-9920012-2-3.

This can save you a huge amount of time and headaches, not having to format things yourself, not having to manually enter authors, etc. All you need to do is supply an identifier (URLs will often work too) and let the bot take over. Then all you have to do is review what the bot did (e.g. it missed the publisher of the book, which you could add yourself with |publisher=Institut d'Études Acadiennes).

You can also unleash the bot on existing citations so it can perform some cleanup and find other relevant bibliographic information.
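
To illustrate the "all you need is an identifier" idea, here is a rough sketch that guesses which citation parameter a bare identifier belongs to. The function name identifierParam and its patterns are assumptions made for this example; they are deliberately simplified and are not how Citation bot itself works.

```javascript
// Hypothetical helper: guesses the citation parameter for an identifier.
// Simplifying assumptions: DOIs start with "10.", ISBNs are written with
// hyphens, and any other all-digit string is treated as a JSTOR number.
function identifierParam(id) {
  const value = id.trim();
  if (/^10\.\d{4,9}\/\S+$/.test(value)) return '|doi=' + value;
  const digits = value.replace(/-/g, '');
  if (value.includes('-') && /^(\d{9}[\dXx]|\d{13})$/.test(digits)) return '|isbn=' + value;
  if (/^\d+$/.test(value)) return '|jstor=' + value;
  return null; // unknown identifier type
}

console.log(identifierParam('10.1038/351624a0'));  // |doi=10.1038/351624a0
console.log(identifierParam('978-0-9920012-2-3')); // |isbn=978-0-9920012-2-3
console.log(identifierParam('20107388'));          // |jstor=20107388
```

The returned text is only the parameter; it still has to be placed inside a citation template and expanded by the bot.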

OAbot

OAbot is a tool designed to find and add links to open access publications and find suitable links to free versions of paywalled articles by searching several databases, author websites, and so on. In the case of open-access DOIs, it will append |doi-access=free to the citation to flag that the publication is indeed open access.

The bot will make edits on its own, but you can ask the bot to make edits on your behalf via ToolForge. Keep in mind that some databases or websites, like CiteSeerX or ResearchGate, might host papers in violation of copyright, even if most are not, so you ought to check that the uploader has permission to upload the paper in the first place. If they aren't one of the authors of the paper, they likely do not have such permission.
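
The |doi-access=free flag can be pictured with a small sketch. The helper flagOpenAccess is hypothetical and is not OAbot's code; deciding whether a DOI really is open access is the hard part, and this example simply takes that decision as an input. The DOI used is a placeholder.

```javascript
// Hypothetical helper: appends |doi-access=free to the parameter text of a
// citation that has a DOI and is not already flagged.
function flagOpenAccess(params, isOpenAccess) {
  const hasDoi = /\|\s*doi\s*=/.test(params);
  const alreadyFlagged = /\|\s*doi-access\s*=/.test(params);
  if (!isOpenAccess || !hasDoi || alreadyFlagged) return params;
  return params.trimEnd() + ' |doi-access=free';
}

// "10.1234/example" is a made-up DOI used only for illustration.
console.log(flagOpenAccess('|doi=10.1234/example', true));
// |doi=10.1234/example |doi-access=free
```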

Citation Style Markers

Note: You'll need to use the manual install method (method 2) with the following code to use its custom options.

importScript('User:BrandonXLF/CitationStyleMarker.js'); // Backlink: [[User:BrandonXLF/CitationStyleMarker.js]]
window.CSMarkerMode = 'both';

BrandonXLF's Citation Style Markers is a very simple script that lets you know if there are clashes between Citation Style 1 (e.g. ((cite book))), Citation Style 2 (i.e. ((citation))), and others like ((vcite book)) or ((cite LSA)) in an article.

If you have two different citation styles, it will append a small CS1, CS2, CSVAN, or CSLSA at the end of the citation.

  • ((cite book |title=Albatrosses, Butlers, and Communists |publisher=Fake Publisher))
  • ((citation |title=Albatrosses, Butlers, and Communists |publisher=Fake Publisher))
  • Albatrosses, Butlers, and Communists. Fake Publisher. CS1
  • Albatrosses, Butlers, and Communists, Fake Publisher CS2

I personally choose to enable those warnings only when there's a clash. I can then search for 'CS1' and 'CS2' to see which is the dominant style and which citations are compromising consistency. It's often only a matter of changing one or two citations from a ((citation)) to a ((cite book)) or vice versa. Sometimes it's a matter of appending |mode=cs1 or |mode=cs2 to premade citations (like ((McCorduck 2004|mode=cs1))) or specialized templates (like ((cite arXiv|...|mode=cs2))), which will change the template style from CS1 to CS2 or vice versa.

Note that plain text citations, like <ref>Smith, J. (2010) "Random Book". Random Publisher. pp. 32–38 ((ISBN|978-0-123-45678-9))</ref> will be completely ignored by the script, so you still have to keep an eye out for those.

You can choose if you want the markers always present, present by default, off by default, or only present when there is a clash by changing OPTION in window.CSMarkerMode = 'OPTION'; above. See the documentation for details.
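
The clash check itself is easy to picture. The sketch below is not BrandonXLF's code; the function names are made up for this example, and it assumes that every template whose name starts with "cite " (other than cite LSA) counts as CS1, which is a simplification.

```javascript
// Maps a template name to the marker described above.
// Only the four families named in the text are covered.
function styleOf(templateName) {
  const name = templateName.trim().toLowerCase();
  if (name === 'citation') return 'CS2';
  if (name === 'cite lsa') return 'CSLSA';
  if (name.startsWith('vcite ')) return 'CSVAN';
  if (name.startsWith('cite ')) return 'CS1';
  return null; // not a citation template this sketch knows about
}

// Counts how many templates of each style appear in the list.
function countStyles(templateNames) {
  const counts = {};
  for (const t of templateNames) {
    const style = styleOf(t);
    if (style) counts[style] = (counts[style] || 0) + 1;
  }
  return counts;
}

// A clash exists when more than one style is present.
function hasClash(templateNames) {
  return Object.keys(countStyles(templateNames)).length > 1;
}

const names = ['cite book', 'cite journal', 'citation', 'cite web'];
console.log(countStyles(names)); // { CS1: 3, CS2: 1 }
console.log(hasClash(names));    // true
```

With counts like these, CS1 is the dominant style and the single CS2 citation is the one to convert.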

HarvErrors

Trappist the monk's HarvErrors is an evolution of the now-outdated Ucucha's HarvErrors. This script deals specifically with all sorts of issues unique to Harvard citation templates like ((harv)), ((harvnb)), ((sfn)), etc. These Harvard templates are prone to problems with their automatically generated links (see this old version of the industrial espionage article for an example).

HarvErrors checks these links for validity and displays an error message for incorrect links. In addition, it checks for citations that are likely set up to receive links, but do not have any pointing to them.
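
Both checks amount to comparing two sets. The following is a minimal sketch of that comparison, not Trappist the monk's implementation; the function name and the anchor names in the example (written in the "CITEREF" plus author plus year form) are assumptions for illustration.

```javascript
// links:   anchor names that short citations point to.
// anchors: anchor names that full citations actually provide.
function checkHarvLinks(links, anchors) {
  const linkSet = new Set(links);
  const anchorSet = new Set(anchors);
  return {
    // Short citations that point at nothing.
    errors: [...linkSet].filter((l) => !anchorSet.has(l)),
    // Full citations that nothing points to.
    warnings: [...anchorSet].filter((a) => !linkSet.has(a)),
  };
}

const report = checkHarvLinks(
  ['CITEREFSmith2006', 'CITEREFJones2010'],
  ['CITEREFSmith2006', 'CITEREFBrown1999']
);
console.log(report.errors);   // [ 'CITEREFJones2010' ]
console.log(report.warnings); // [ 'CITEREFBrown1999' ]
```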

If you don't want to deal with warnings, and only with confirmed errors, use Svick's original HarvErrors instead.

Reference Tooltips

Reference Tooltips – Hovering a footnote link shows what the footnote says

Reference Tooltips is a small gadget that simply shows you the citation upon hovering the reference link. You no longer need to click and go down to the reference section to see what the reference is. This is particularly helpful with articles that make use of ((rp)).

Sadly, it will not work if the Navigation Popups gadget is enabled.

Unreliable/Predatory Source Detector

My own (i.e. Headbomb's) Unreliable/Predatory Source Detector, or UPSD for short, is a relatively famous script. The core idea is that the script looks for URLs and DOIs, and colour codes them according to reliability, summarized in the table below.

Severity | Appearance | Explanation
Blacklisted | example.com | The source is blacklisted on Wikipedia and can only be used with explicit permission.
Deprecated/predatory | example.com | There is community consensus to deprecate the source. The source is considered generally unreliable, and use of the source is generally prohibited.
Generally unreliable | example.com | The source has a poor reputation for fact-checking, fails to correct errors, is self-published, is sponsored content, presents user-generated content, violates copyrights, or is otherwise of low-quality.
Marginally reliable | example.com | Sources which may or may not be appropriate for Wikipedia. For instance Forbes.com is generally reliable, but its contributors generally are not.

In general, the script is kept in sync with WP:CITEWATCH, WP:DEPRECATE, WP:NPPSG, WP:RSN discussions, WP:RSPSOURCES, WP:SPSLIST (not fully synced), WP:VSAFE/PSOURCES, ((Predatory open access source list)), and common sense "duh" cases I come across (like a parody website) with some minor differences. Obvious issues should be reported on the script's talk page, but since I do not want my opinion to be king, I maintain a general policy that everything is appealable at WP:RSN.
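
The core lookup can be sketched in a few lines. This is only an illustration of the idea, not UPSD's actual code: the rule list, the function name, and the domains below are all placeholders invented for the example.

```javascript
// Hypothetical rule list: the domains are placeholders, not entries from UPSD.
const RULES = [
  { domain: 'blacklisted.example', severity: 'Blacklisted' },
  { domain: 'deprecated.example', severity: 'Deprecated/predatory' },
  { domain: 'unreliable.example', severity: 'Generally unreliable' },
  { domain: 'marginal.example', severity: 'Marginally reliable' },
];

// Returns the severity for a URL, or null when no rule matches.
function classifyUrl(url) {
  let host;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch (e) {
    return null; // not a parseable URL
  }
  // Match the domain itself or any of its subdomains.
  const rule = RULES.find((r) => host === r.domain || host.endsWith('.' + r.domain));
  return rule ? rule.severity : null;
}

console.log(classifyUrl('https://www.deprecated.example/article/1')); // Deprecated/predatory
console.log(classifyUrl('https://en.wikipedia.org/'));                // null
```

A null result only means that no rule matched; it says nothing about whether the source is reliable.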

The documentation contains several warnings and caveats, and I would highly recommend that you at the very least read the big warning box at the top of the documentation and the full summary table before making use of this script so you understand its limitations.

The script can be customized to an extent, and can even support supplemental lists for specialized tasks, like User:GeneralNotability/unreliable-rules.js which helps find many black hat SEO efforts.

CiteUnseen

CiteUnseen on Citizens United v. FEC:

SuperHamster's CiteUnseen analyses citations much like UPSD above, but focuses more on the origin and nature of sources. For example, it will mark citations as coming from advocacy groups, government-controlled outlets, opinion pieces, tabloids, etc. It will also add icons to reflect reliability based on WP:RSP, but you can configure which are displayed.

Like with any scripts dealing in citation analysis, it comes with heavy caveats, so you should read the documentation in detail to understand what it does.

Both CiteUnseen and UPSD will work together without issue, to provide fairly comprehensive analysis of both the reliability and nature of sources – and if one script misses a source, maybe the other will pick it up.

CiteHighlighter

Novem Linguae's CiteHighlighter is another citation analyzer script. CiteHighlighter colour-codes things in the same way they are found on WP:RSP#Legend. Sources of data include WP:RSP, WP:NPPSG (which is based on WP:RSN discussions), and the source reliability pages of various WikiProjects. It recognizes around 1,800 sources.

CiteHighlighter colour-codes things in the same way they are found on WP:RSP

Like with any scripts dealing in citation analysis, you should read the documentation in detail to understand what it does. In particular, it makes certain assumptions like The New York Times = reliable, without consideration to the type of article being published, or a reference with a PMID = reliable, despite the PubMed database including sources of various reliability.

CiteHighlighter works with either or both of UPSD and CiteUnseen, so feel free to mix and match as your heart desires.

Copyvios

The Earwig's Copyvios invokes Earwig's Copyvio Detector, which, as you might suspect, searches the web for potential copyright violations. Like reFill below, it runs on ToolForge.

This tool is normally more useful to reviewers than to regular editors; if you don't know that copy-pasting/closely paraphrasing things from sources is bad, the intervention you need is education on the topic, not more tools. New Page Patrollers and AFC Reviewers in particular might want to install this, but anyone that is interested in copyright cleanup will be well served by this tool.

Checklinks

Dispenser's Checklinks is typically used to make sure external links are working (i.e. not dead). If they are not, you can use it to search for archived versions of these links. It runs on Dispenser's personal site.

reFill/Reflinks/CiteGen

reFill is a tool that specializes in dealing with bare URLs. Like Copyvios above, it runs on ToolForge, but you can use Zhaofeng Li's Reflinks script to invoke it directly from Wikipedia, or CiteGen to run it from your web browser. You can also run it directly from a Linux or Windows PC (see reFill's FAQ for details).

It adds information (page title, work/website, author and publication date, if metadata is included) to bare URL references, and does additional fixes as well (e.g. combining duplicated references). The tool is an open-source replacement of Dispenser's Reflinks.

It is not perfect, and you will often need to clean up its output, like |last=Welle |first=Deutsche for Deutsche Welle links. But it gets you at least 90% of the way there!
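
To see which references a tool like this would target, here is a small sketch that lists references consisting of nothing but a URL. The function name and the regular expression are assumptions for this example; the pattern is a simplification and is not reFill's own detection logic.

```javascript
// Finds <ref> tags whose whole content is a bare URL, optionally wrapped
// in single square brackets. Self-closing <ref name="x"/> tags are skipped.
function findBareUrlRefs(wikitext) {
  const pattern = /<ref(?:\s[^>\/]*)?>\s*\[?(https?:\/\/[^\s\]<]+)\]?\s*<\/ref>/gi;
  return [...wikitext.matchAll(pattern)].map((m) => m[1]);
}

const sample =
  'Fact one.<ref>https://example.org/a</ref> ' +
  'Fact two.<ref>Smith, J. (2010) "Random Book".</ref> ' +
  'Fact three.<ref name="b">[https://example.org/b]</ref>';

console.log(findBareUrlRefs(sample));
// [ 'https://example.org/a', 'https://example.org/b' ]
```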

Reference Organizer

Reference Organizer lets you easily manage citations. It is particularly useful to name/manage references that are used in multiple places.

Kaniivel's Reference Organizer displays all of an article's references in a graphical user interface, where you can choose whether the references should be defined in the body of the article or in the reference list template (see WP:LDRHOW). You can also use it to sort the references in various ways, and rename the references.

RefRenamer

Nardog's RefRenamer is similar to Reference Organizer, but focuses specifically on renaming Visual Editor reference names, like <ref name=":0"/> or <ref name=":1"/>, to something more editor-friendly, like <ref name="Smith-2006"/>. It will automatically make suggestions, but you can always choose a different name in case it picks something silly like <ref name="Rindfleischetikettierungsueberwachungsaufgabenuebertragungsgesetz"/>.
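
The renaming idea can be sketched as follows. This is not Nardog's code: the function names and the "Author-Year" rule are assumptions made for this example, chosen to match the Smith-2006 name shown above.

```javascript
// Visual Editor's automatic names look like ":0", ":1", and so on.
function isAutoName(name) {
  return /^:\d+$/.test(name);
}

// Hypothetical suggestion rule: "Author-Year", falling back to the old name
// when the source details are missing.
function suggestName(oldName, source) {
  if (!isAutoName(oldName)) return oldName; // keep names a human chose
  if (source.last && source.year) return source.last + '-' + source.year;
  return oldName;
}

console.log(suggestName(':0', { last: 'Smith', year: 2006 }));     // Smith-2006
console.log(suggestName('Keen-2006', { last: 'Keen', year: 2006 })); // Keen-2006
```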

Sources

Ohconfucius's Sources is a script that deals with common newspapers, magazines, and websites to ensure that their names are accurate per the frontpage of these publications, from including/omitting the leading The, to making use of and or &, to making sure they are properly italicized per MOS:ITALICTITLE, to making sure that magazines aren't in the |publisher= field of citation templates, etc.

Closing remarks

Phew! That was a lot, wasn't it? That's ok, you don't have to install all these scripts, or memorize all those details. Just pick the ones that seem useful to you.

That said, there are important caveats to using UPSD, CiteUnseen and CiteHighlighter. I know I've mentioned those before, but it bears repeating that these are not scripts to use mindlessly. They are, at least in part, based on the interpretation of discussions, many with limited participation. It's perfectly possible, and even likely, that some of these discussions did not reflect the entirety of the source, and that a closer look would change its classification from generally unreliable to marginally reliable (or vice versa), or a source would be deemed unreliable in context X, but reliable in context Y.

Also remember that just because a source is considered generally unreliable, it doesn't mean that it cannot or shouldn't be used. Scripts cannot appreciate the full context in which a source is used. But you can. So don't be a meat popsicle, and use your brain.

Feel free to post your experiences (new or old) with any of these scripts in the comment section! Also feel free to suggest other scripts that you feel might benefit your fellow Wikipedians!

Note: This article was updated on 5 August 2023 to mention Nardog's RefRenamer. The omission was due to a bug in WP:TOPSCRIPTS listings. The author apologizes for the oversight. Thanks to PamD for bringing this script to light in the comment section.


Tips and Tricks is a general editing advice column written by experienced editors. If you have suggestions for a topic, or want to submit your own advice, follow these links and let us know (or comment below)!


Discuss this story

  • Thank you so much for including the caveats that script-generated citations must be reviewed and frequently cleaned up. It's an important step that often goes overlooked. The citation generation scripts are powerful tools that can save a lot of time – and the stable identifiers tend to work much more accurately than the URLs – but not verifying their output afterwards is like following a recipe you've never made and bringing it to service without tasting it. Folly Mox (talk) 06:17, 1 August 2023 (UTC)
  • Another useful tool, for sorting out existing citations which have been created using Visual Editor, is User:Nardog/RefRenamer. It looks for the references with "names" such as ":0" and offers editable suggestions for human-friendly names for the refs. The guidance at Wikipedia:Citing sources#Repeated citations says "To help with page maintenance, it is recommended that the text of the name have a connection to the inline citation or footnote, for example "author year page", but VE ignores this guidance. This tool makes it easy to improve on VE's work. PamD 07:30, 1 August 2023 (UTC)
    Now mentioned. With apologies to @Nardog: for the initial omission. Headbomb {t · c · p · b} 07:47, 5 August 2023 (UTC)
  • Thanks @Headbomb: scope_creepTalk 08:51, 1 August 2023 (UTC)
  • Great article! However, I find that the OAbot Toolforge link always returns a 502 error on my device. Have anyone else had gotten it to work lately? Ca talk to me! 14:57, 1 August 2023 (UTC)
    @Ca: Should be fixed now. Headbomb {t · c · p · b} 07:02, 5 August 2023 (UTC)
  • I've noticed an increase in references with |author=August 2 and other clearly misplaced information due to these tools—or rather, due to incorrect usage of these excellent tools. No doubt the tools can get more sophisticated (e.g. never putting "[month] [number]" in an author parameter) but ultimately human oversight is always needed. — Bilorv (talk) 19:52, 2 August 2023 (UTC)
    I've asked at Citoid talk for this extremely basic error checking, but the will doesn't seem to be there, possibly because, as I understand it, the WMF has a single contractor in charge of the codebase. The kind people who maintain reFill also don't have the time to implement it. The citation templates do populate a maintenance category for this genre of problem data, Category:CS1 maint: numeric names: authors list (59,342), which I've asked to be elevated to error status, but also to no action. Folly Mox (talk) 20:17, 2 August 2023 (UTC)
  • Regarding "... if you don't know that copy-pasting/closely paraphrasing things from sources is bad ...", I think a qualification is needed - when done for larger amounts of text. I believe that if the information from a single sentence in a source is being added to a Wikipedia article, not only is closely paraphrasing acceptable - it's really the only option. You can (and should) change a word or two, and/or rearrange a few words, focusing on what's factual within the sentence, since facts can't be copyrighted. -- John Broughton (♫♫) 18:16, 1 August 2023 (UTC)
    Ah, like "The leaves of the plant are short and spiky." Very limited ways to rephrase that and keep the same meaning, moreso if the leaves are described with a technical term like "rugose". Adam Cuerden (talk)Has about 8.5% of all FPs. 19:22, 1 August 2023 (UTC)
      "The plant has short, spiky leaves." It's exceedingly rare that I've written a sentence that exactly matches a source (I'm thinking of one case that was "[subject] attended [long university name]"). Anyway, maybe the point is that close paraphrasing and copyright violations are only properties of a whole text, not a property of a single clause or few words. — Bilorv (talk) 19:52, 2 August 2023 (UTC)
        Point, though think it's harder to invent these kind of things. But think the idea matters more than the example: there are some very basic phrasings that are likely to be maintained because that's just how you say that kind of fact. "He died in 1897 in London." or something like "He was born in 1850. His father was an electrician.". Simple declarative statements. Adam Cuerden (talk)Has about 8.5% of all FPs. 07:41, 4 August 2023 (UTC)
  • Thanks Headbomb for the very useful compilation of citation tools. I was looking for good way of identify duplicate references. It looks like reFill could fill the bill. I will try it out. Cheers. Boghog (talk) 20:11, 1 August 2023 (UTC)
    You can also run WP:AWB on an article, and it will combine duplicate references if other named references are used in an article. Headbomb {t · c · p · b} 20:32, 1 August 2023 (UTC)
  • BTW, for those who tried OABot and it was giving a 503 error, the issue is now solved. Headbomb {t · c · p · b} 07:01, 5 August 2023 (UTC)
  • I've joined in very late, but still, thank you so much for the precious advice! I've already started using CiteHighlighter, but I think RefRenamer will definitely come in handy for me, too! @Headbomb and Novem Linguae: If you don't mind, I've got just one more question about the former plug-in: how can I contribute to the expansion of the pools of sources recognized by the script, and especially the one provided by WP:NPPSG? As a non-native English speaker, I'd like to help rate more international sources, focusing on Italian and European media. I'm also very interested in adding entries by topics such as pop culture, sports (mostly association football), religion and science! Oltrepier (talk) 14:17, 12 August 2023 (UTC)
    Hey @Oltrepier. Thanks for the ping. CiteHighlighter gets its sources from pages such as RSP, NPPSG, and WikiProject reliable sources lists. And NPPSG itself is a summary of RSN discussions. I'd prefer to keep the sources for CiteHighlighter some kind of consensus process that has at least 2 people involved, rather than 1 person. Hope that makes sense. Oh and I sync CiteHighlighter to NPPSG randomly every couple months, so if new stuff gets added there (following the RSN with at least 2 participants criteria mentioned above), it'll eventually make its way into CiteHighlighter. Hope this helps. –Novem Linguae (talk) 22:18, 13 August 2023 (UTC)
    @Novem Linguae: Yes, it definitely does! I know how crucial it is to build consensus around the reliability (or unreliability) of a source, so I'll definitely respect that process whenever I'll propose or ask for advice on certain newspapers/magazines. Thank you for reaching out, by the way! Oltrepier (talk) 10:22, 14 August 2023 (UTC)
  • Can any of these tools also automatically include the archived version of the source URL into the formatted reference? rootsmusic (talk) 04:39, 5 September 2023 (UTC)
    You may be interested in WP:IABOT, which is a website you visit, feed it the article you want it to add archive links to, then that triggers a bot to go add archive links for you. –Novem Linguae (talk) 05:07, 5 September 2023 (UTC)
  • @Headbomb: Please add WP:ProveIt. rootsmusic (talk) 17:17, 5 September 2023 (UTC)
    Signpost articles do not usually undergo major additions after publication. –Novem Linguae (talk) 18:18, 5 September 2023 (UTC)