The Signpost

Tips and tricks

Cleaning up awful citations with Citation bot

In this first edition of the Tips and Tricks column, I will outline some of the possible ways to deal with awful citations on Wikipedia by making use of Citation bot. By 'awful' citations, I don't mean 'awful' in the sense that they are dealing with unreliable sources, but rather in the copy editor's sense, where the information is presented in a reader-unfriendly way.

Using the bot

The citation expander gadget adds a button (the exact appearance may differ) to trigger the bot from the edit window!

While the world of bots can be intimidating, making use of Citation bot is actually very simple, even if you don't know the first thing about programming. A full guide is available for readers who want more information, but the two easiest ways of using the bot are:

  • Using the citation expander. This gadget will add an "expand citations" option to your sidebar, and a Citations button to your edit window. You can enable this gadget in the gadgets tab of your preferences panel. Go to the "Editing" section, and check the box labelled Citation expander.
  • Using the web interface. There will be a few options, the important ones being your username and the page you want to run the bot on.

After that, all you need to do is find a page in need of some love and run the bot. It will automatically try to improve existing citations as best it can.

For the sake of this tutorial, I will assume that you use the Citations button of the citation expander. This will let you review the changes made to the article and make sure everything is in order before saving. If you use the web interface or click "expand citations" in the sidebar, you will need to save the article after the "cleanup step" before running the bot. The bot will then make its changes automatically, and you will need to review them afterwards. While the first method should always work, the second will only work if the bot is unblocked.

Case 1: Accurate citations with limited usefulness

Let's say you come across a citation that's generally accurate, but is missing some information. Something like

Before cleanup
wikitext ((cite journal |last1=West |first1=Jevin D. |last2=Jacquet |first2=Jennifer |last3=King |first3=Molly M. |last4=Correll |first4=Shelley J. |last5=Bergstrom |first5=Carl T. |year=2013 |title=The Role of Gender in Scholarly Authorship |url=https://doi.org/10.1371%2Fjournal.pone.0066212 |journal=PLOS ONE |volume=8 |issue=7 |pages=e66212))
output West, Jevin D.; Jacquet, Jennifer; King, Molly M.; Correll, Shelley J.; Bergstrom, Carl T. (2013). "The Role of Gender in Scholarly Authorship". PLOS ONE. 8 (7): e66212.

Nothing in this citation is wrong strictly speaking, but it could be made a lot more useful to readers if it contained standard identifiers. Since we already have a doi url in |url=, we only need to click Citations to run the bot and get

After the bot
wikitext ((cite journal |last1=West |first1=Jevin D. |last2=Jacquet |first2=Jennifer |last3=King |first3=Molly M. |last4=Correll |first4=Shelley J. |last5=Bergstrom |first5=Carl T. |year=2013 |title=The Role of Gender in Scholarly Authorship |url=https://doi.org/10.1371%2Fjournal.pone.0066212 |journal=PLOS ONE |volume=8 |issue=7 |pages=e66212 |arxiv=1211.1759 |bibcode=2013PLoSO...866212W |doi=10.1371/journal.pone.0066212 |pmc=3718784 |pmid=23894278))
output West, Jevin D.; Jacquet, Jennifer; King, Molly M.; Correll, Shelley J.; Bergstrom, Carl T. (2013). "The Role of Gender in Scholarly Authorship". PLOS ONE. 8 (7): e66212. arXiv:1211.1759. Bibcode:2013PLoSO...866212W. doi:10.1371/journal.pone.0066212. PMC 3718784. PMID 23894278.((cite journal)): CS1 maint: unflagged free DOI (link)

And we have a nice, well-formatted citation, with many ways to find the article. Those who want to go the extra mile may notice that the DOI links to a free full version of the article, and can add |doi-access=free to get

Final text
output West, Jevin D.; Jacquet, Jennifer; King, Molly M.; Correll, Shelley J.; Bergstrom, Carl T. (2013). "The Role of Gender in Scholarly Authorship". PLOS ONE. 8 (7): e66212. arXiv:1211.1759. Bibcode:2013PLoSO...866212W. doi:10.1371/journal.pone.0066212. PMC 3718784. PMID 23894278.

This method will work with most URLs that point to a standard identifier, or with citations already using standard identifiers: arXiv, Bibcode, DOI, ISBN, JSTOR, PMCID, PMID. It will also work with URLs to major repositories, like Academia.edu, ResearchGate, ScienceDirect, and Wiley Online Library, although it will not necessarily be as reliable as the DOI method.
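To make the idea of "a URL that points to a standard identifier" concrete, here is a minimal Python sketch of how an identifier can be read back out of such a URL. It is only an illustration, not the bot's own code: the function name `identifier_from_url` is hypothetical, and it assumes just two URL shapes, a doi.org link and an ADS abstract link of the form http://adsabs.harvard.edu/abs/ followed by the bibcode.

```python
from urllib.parse import unquote, urlparse


def identifier_from_url(url):
    """Return (kind, value) for a recognized identifier URL, else None.

    Hypothetical helper for illustration; only two URL shapes are handled.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    # Decode percent-escapes such as %2F back to "/" and drop the leading slash.
    path = unquote(parsed.path).lstrip("/")

    # DOI links: everything after the host is the DOI, which starts with "10."
    if host in ("doi.org", "dx.doi.org") and path.startswith("10."):
        return ("doi", path)

    # ADS abstract links (assumed shape): /abs/<bibcode>
    if host == "adsabs.harvard.edu" and path.startswith("abs/"):
        return ("bibcode", path[len("abs/"):])

    return None


print(identifier_from_url("https://doi.org/10.1371%2Fjournal.pone.0066212"))
# prints ('doi', '10.1371/journal.pone.0066212')
```

The decoded value is exactly what ends up in |doi= in the example above, which is why a DOI URL in |url= is enough for the bot to work with.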

While the bot is not guaranteed to find every DOI and PMID out there, running it will often save a lot of work. So before doing it all yourself, it's a good idea to run the bot first. Not only will it add missing information, it will also clean up several common mistakes, like |last=Smith, → |last=Smith; |volume=12(3) → |volume=12 |issue=3; or |journal=PLOS GENETICS → |journal=PLOS Genetics. You can then focus on cleaning up the things the bot was not able to figure out, like fixing poorly formatted or incomplete citations, or hunting down missing DOIs and JSTOR ids.
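The three kinds of tidying mentioned above (a stray comma after a last name, a volume and issue fused together, and a journal name in all caps) can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not how Citation bot is implemented: `tidy_params` and `KNOWN_ACRONYMS` are hypothetical names, and the citation is assumed to be already parsed into a dictionary of parameter names and values.

```python
import re

# Hypothetical list of words that should stay fully capitalized.
KNOWN_ACRONYMS = {"PLOS"}


def tidy_params(params):
    """Return a cleaned copy of a dict of citation parameters (sketch only)."""
    cleaned = dict(params)

    # |last=Smith,  ->  |last=Smith   (also last1, last2, ...)
    for key, value in params.items():
        if re.fullmatch(r"last\d*", key):
            cleaned[key] = value.rstrip(", ")

    # |volume=12(3)  ->  |volume=12 |issue=3   (only if no issue is set yet)
    match = re.fullmatch(r"(\d+)\s*\((\d+)\)", cleaned.get("volume", ""))
    if match and "issue" not in cleaned:
        cleaned["volume"] = match.group(1)
        cleaned["issue"] = match.group(2)

    # |journal=PLOS GENETICS  ->  |journal=PLOS Genetics
    journal = cleaned.get("journal", "")
    if journal.isupper():
        words = journal.split()
        cleaned["journal"] = " ".join(
            word if word in KNOWN_ACRONYMS else word.capitalize()
            for word in words
        )

    return cleaned


print(tidy_params({"last": "Smith,", "volume": "12(3)", "journal": "PLOS GENETICS"}))
# prints {'last': 'Smith', 'volume': '12', 'journal': 'PLOS Genetics', 'issue': '3'}
```

The acronym list is the weak point of any approach like this: a script cannot know by itself that PLOS is an acronym and GENETICS is not, which is one reason a human review of the result remains useful.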

Case 2: Poor plain text citations

While WP:CITEVAR is worth keeping in mind, plain text citations are often poorly presented, riddled with typos, and of limited usefulness. In those cases, WP:CITEVAR does not apply, as no consistent style has been used. Consider for example the following

Before cleanup
wikitext G. Coppola + coauthors (2009). "Sérsic galaxy with Sérsic halo models of early-type galaxies: A TOOL FOR N-BODY SIMUILATION". Publications of the ASP. volume 121-879 pp. 437. ((doi|10.1086/599288))((bibcode|2009PASP..121..437C))
output G. Coppola + coauthors (2009). "Sérsic galaxy with Sérsic halo models of early-type galaxies: A TOOL FOR N-BODY SIMUILATION". Publications of the ASP. volume 121-879 pp. 437. doi:10.1086/599288Bibcode:2009PASP..121..437C

There are several things wrong with this one. Coauthors are not listed (or at least the standard et al. is not used). We have an inconsistently capitalized title, with a typo (SIMUILATION). The journal's name is abbreviated in a non-standard manner, and there are presentation problems with the volume/issue/pages. The identifiers are also poorly presented. While it is possible to clean this up by hand, this would take a lot of time.

A much more efficient way of doing the cleanup is to use the bot. We already have identifiers, so the hard part has been done for us; we simply need to feed them to the bot. We can do this in many ways.

Cleanup step
wikitext
method 1
Any of
  • ((cite journal |bibcode=2009PASP..121..437C))
  • ((cite journal |doi=10.1086/599288))
  • ((cite journal |bibcode=2009PASP..121..437C |doi=10.1086/599288))
  • ((cite journal |url=http://adsabs.harvard.edu/abs/2009PASP..121..437C))
  • ((cite journal |url=https://doi.org/10.1086%2F599288))
wikitext
method 2
Any of
  • <ref>http://adsabs.harvard.edu/abs/2009PASP..121..437C</ref>
  • <ref>https://doi.org/10.1086%2F599288</ref>

The second method in particular is very easy to use, since you can just right-click on the identifier and copy-paste the URL between <ref></ref> tags. However, it only works within <ref></ref> tags. The first method will also work in <ref></ref> tags, as long as you don't mind typing a few more characters and copy-pasting the identifiers.
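If you only have the bare identifier rather than a link to copy, the method 2 stub is easy to build yourself. The Python sketch below is one way to do it; `ref_stub` is a hypothetical helper, and the two URL patterns are simply the ones used in the examples above.

```python
from urllib.parse import quote


def ref_stub(doi=None, bibcode=None):
    """Build a bare-URL <ref> stub from a DOI or a bibcode (sketch only)."""
    if doi:
        # Percent-encode the DOI so that "/" becomes %2F, as in the examples.
        url = "https://doi.org/" + quote(doi, safe="")
    elif bibcode:
        url = "http://adsabs.harvard.edu/abs/" + quote(bibcode, safe="")
    else:
        raise ValueError("need a DOI or a bibcode")
    return "<ref>" + url + "</ref>"


print(ref_stub(doi="10.1086/599288"))
# prints <ref>https://doi.org/10.1086%2F599288</ref>
print(ref_stub(bibcode="2009PASP..121..437C"))
# prints <ref>http://adsabs.harvard.edu/abs/2009PASP..121..437C</ref>
```

Either line of output can be pasted into the article in place of the plain text citation.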

With whichever method we prefer, we only need to click Citations to run the bot and get

After the bot
wikitext ((cite journal |last1=Coppola |first1=G. |last2=La Barbera |first2=F. |last3=Capaccioli |first3=M. |title=Sérsic Galaxy with Sérsic Halo Models of Early-type Galaxies: A Tool forN-body Simulations |journal=Publications of the Astronomical Society of the Pacific |volume=121 |issue=879 |pages=437–449 |year=2009 |arxiv=0903.4758 |bibcode=2009PASP..121..437C |doi=10.1086/599288))
output Coppola, G.; La Barbera, F.; Capaccioli, M. (2009). "Sérsic Galaxy with Sérsic Halo Models of Early-type Galaxies: A Tool forN-body Simulations". Publications of the Astronomical Society of the Pacific. 121 (879): 437–449. arXiv:0903.4758. Bibcode:2009PASP..121..437C. doi:10.1086/599288.

This is not perfect, but it is very close to the final desired version. We only need to do a bit of retouching (A Tool forN-body Simulations → A Tool for ''N''-body Simulations) to get

Final text
output Coppola, G.; La Barbera, F.; Capaccioli, M. (2009). "Sérsic Galaxy with Sérsic Halo Models of Early-type Galaxies: A Tool for N-body Simulations". Publications of the Astronomical Society of the Pacific. 121 (879): 437–449. arXiv:0903.4758. Bibcode:2009PASP..121..437C. doi:10.1086/599288.

If the citation was as poorly formatted as the initial version, it is very likely that WP:CITEVAR does not apply. However, if this was found in a featured article, and was the only poorly formatted citation in an otherwise excellent reference section, returning to a plain text citation is very straightforward. Simply copy-paste the output you get when previewing, with minor modifications

Final wikitext, if WP:CITEVAR applies
wikitext Coppola, G.; La Barbera, F.; Capaccioli, M. (2009). "Sérsic Galaxy with Sérsic Halo Models of Early-type Galaxies: A Tool for N-body Simulations". '''Publications of the Astronomical Society of the Pacific'''. '''121''' (879): 437–449. ((arXiv|0903.4758)). ((Bibcode|2009PASP..121..437C)). ((doi|10.1086/599288)).

Case 3: Inaccurate citations

What if you come across a citation you know is just wildly inaccurate, or just so outrageously formatted that things barely make any sense?

Before cleanup
wikitext ((cite journal |last1=Ar≥≥on;Jacobin |first1=New Scientist |year=2018 |title=Deflector Selector says nuke asteroids |journal=Elsevier ScienceDirect |volume=pages3165|issue=3165 |pages=6 |bibcode=2014PhT....67d..48W |doi=10.1016/S0262-4079(18)30281-1))
output Ar≥≥on;Jacobin, New Scientist (2018). "Deflector Selector says nuke asteroids". Elsevier ScienceDirect. pages3165 (3165): 6. Bibcode:2014PhT....67d..48W. doi:10.1016/S0262-4079(18)30281-1.((cite journal)): CS1 maint: multiple names: authors list (link)

Fixing this manually is possible. But you would need to look up the original and pretty much reformat the whole thing. When something is this badly broken, it is good to make sure that whatever is intended to be cited is actually the thing being cited. This could easily be the result of vandalism that resulted in two citations being merged together inadequately. Here, Bibcode:2014PhT....67d..48W and doi:10.1016/S0262-4079(18)30281-1 don't even point to the same citation! However, once you determine which is the correct citation, we can follow the WP:TNT principle as applied to this particular citation: Blow it up and start over!
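One quick sanity check for a mismatch like this relies on the fact that a bibcode begins with the four-digit year of publication. The Python sketch below compares that year with |year=; `bibcode_year_matches` is a hypothetical helper, and the citation is assumed to be already parsed into a dictionary of parameters.

```python
def bibcode_year_matches(params):
    """Compare |year= with the year that starts the bibcode (sketch only).

    Returns True or False, or None when there is not enough information.
    """
    bibcode = params.get("bibcode", "")
    year = params.get("year", "").strip()
    if len(bibcode) < 4 or not bibcode[:4].isdigit() or not year:
        return None  # cannot tell
    return bibcode[:4] == year


suspect = {"year": "2018", "bibcode": "2014PhT....67d..48W"}
print(bibcode_year_matches(suspect))
# prints False
```

A False result does not say which of the two values is wrong, only that the citation deserves a closer look before you decide which identifier to keep.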

Let's assume that doi:10.1016/S0262-4079(18)30281-1 is the correct citation. We can then TNT it back to a state that the bot can make sense of

Cleanup step
wikitext ((cite journal |doi=10.1016/S0262-4079(18)30281-1))

and then we only need to click Citations to run the bot and get

After the bot
wikitext ((cite journal |last1=Aron |first1=Jacob |year=2018 |title=Deflector Selector says nuke asteroids |journal=New Scientist |volume=237 |issue=3165 |pages=6 |bibcode=2018NewSc.237....6A |doi=10.1016/S0262-4079(18)30281-1))
output Aron, Jacob (2018). "Deflector Selector says nuke asteroids". New Scientist. 237 (3165): 6. Bibcode:2018NewSc.237....6A. doi:10.1016/S0262-4079(18)30281-1.

Dealing with the bot's imperfections

Sometimes the metadata available to the bot isn't perfect, and the bot will mess up something that it can't know is wrong. If the bot keeps messing something up after you've fixed it, for example if it keeps adding a |series= to a ((cite journal)) template, you can bypass the bot by putting a comment in the problematic parameter

  • ((cite journal ... |series=<!--Deny citation bot, Journal of Physics is not a book series!-->))

This will let the bot know it shouldn't try to touch that parameter. Likewise, if it incorrectly converts a ((cite journal)) to a ((cite book)), you can put a comment in the template's name

  • ((cite journal <!--Deny citation bot, Journal of Physics is not a book--> |last=... ))
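To picture why a comment is enough to keep a parameter safe, here is a small Python sketch of how an automated tool could honor such comments. It is a guess at the general idea only, not Citation bot's actual logic: `is_protected` and `fill_if_allowed` are hypothetical names, and the citation is assumed to be parsed into a dictionary of parameters.

```python
import re

# Matches any HTML comment, including ones that span several lines.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def is_protected(value):
    """True when a parameter value carries an HTML comment."""
    return bool(COMMENT.search(value))


def fill_if_allowed(params, key, new_value):
    """Set params[key] unless an editor left a comment there (sketch only)."""
    if is_protected(params.get(key, "")):
        return params  # leave the editor's choice alone
    updated = dict(params)
    updated[key] = new_value
    return updated


cite = {"series": "<!--Deny citation bot, Journal of Physics is not a book series!-->"}
print(fill_if_allowed(cite, "series", "Journal of Physics") == cite)
# prints True
```

Whatever the bot does internally, the comment also tells the next human editor why the parameter was left the way it is.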

You can report bugs and issues at the bot's talk page. You can also suggest improvements to the bot if you have some ideas.

Final remarks

While Citation bot is not perfect, correctly used it is a very powerful tool that can save you a ton of headaches and make your editing experience that much easier. I gave examples above using ((cite journal)), but the bot will also work with ((cite book)), ((cite web)) and many others (including ((citation))). I focused on cleanup in this Tips and Tricks column, but you can easily use these methods to add citations to articles. Simply find a good identifier and put it in a citation template (or a plain URL in <ref></ref> tags), and unleash the bot!

Happy editing!




Tips and Tricks is a general editing advice column written by experienced editors. If you have suggestions for a topic, or want to submit your own advice, follow these links and let us know (or comment below)!


Discuss this story

  • @Le Marteau: Glad you've found it useful! It truly is a time saver and a wonderful tool. It's not perfect, but it gets you 95-98% of the way, and saves you so many headaches. You can focus on content/accuracy instead of manually entering citations and making silly little mistakes that you won't catch because your mind is tired of looking at half a zillion citations. Does J. Phys. Chem. refer to the Journal of Physics and Chemistry or the Journal of Physical Chemistry? Let the bot figure it out! Headbomb {t · c · p · b} 08:27, 7 August 2022 (UTC)[reply]
  • Thanks very much for this, Headbomb! Graham (talk) 07:18, 1 September 2022 (UTC)[reply]
  • The “case 1” example is better before the “improvements” which only add illegible strings of numbers and letters that anyone who cares could find with a few seconds of effort, but fill the article bibliography with a massive amount of distracting visual clutter. Anyone who cares about "bibcode", "s2cid", "pmc", "pmid", "mr", "isbn", etc. etc. already knows how to look them up, and people who don’t care about them are poorly served by having to hunt past them looking for the actual content of the citation. For an open access paper like this, just one link is already entirely sufficient; for a non-open-access paper a single preprint link can be a big help. But adding every conceivable citation index identifier to every citation is ridiculous. –jacobolus (t) 14:27, 26 September 2022 (UTC)[reply]