The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


This proposal covers a plan to incorporate a large number of VIAF authority control identifiers into English Wikipedia biography articles, using the ((Authority control)) template. After an initial period of data-gathering and testing utilising multiple sources, the template and VIAF parameter will be added or augmented by bot. This plan is being coordinated by Max Klein, the Wikipedian in Residence at OCLC, and Andrew Gray, the Wikipedian in Residence at the British Library.

Video Summary of the proposal

On YouTube.

Summary of the proposal

The proposal was initially discussed on the Village Pump here and has been updated to include the feedback and commentary received during the discussions. While the Village Pump discussion was broadly favourable, it is being formally listed as an RFC in order to ensure clear support from the community before implementation later in 2012.

Authority control is the term used in librarianship, archival practice and related fields for unique identifiers to disambiguate objects (people, places, academic subjects, etc.). On Wikipedia, this is handled with the ((authority control)) template, which places the identifiers at the end of the article and links out to library catalogues and central authority databases.

Besides providing links for readers, the template embeds information which can be used to help build tools linking back into Wikipedia, or to maintain its content.

It is widely used on the German Wikipedia (220,000 articles) and on Commons, but only lightly used on the English Wikipedia (4,000 articles). We plan to add a large number of identifiers to the English Wikipedia using data drawn from the German Wikipedia and, predominantly, from the Virtual International Authority File (VIAF), an international project to merge multiple national authority files; depending on the level of overlap, this will probably be between 250,000 and 300,000 records. VIAF identifiers correspond to identifiers in other systems, and can be used to populate those other identifiers in the future.

Using data already embedded within VIAF, as well as on the German Wikipedia, we will identify pairs of corresponding VIAF numbers and articles. After data validation, a bot will add the VIAF number to the article using a reworked version of the ((Authority control)) template.
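The pairing and validation step described above could be sketched roughly as follows. This is only an illustration, not the bot's actual code: the function name `pair_articles`, the input format (lists of title and identifier pairs) and the article titles are assumptions made for the example. The identifiers are ones mentioned in the comments further down this page. An article is accepted only when every source that mentions it agrees on a single VIAF number; anything else is set aside for human review.

```python
from collections import defaultdict


def pair_articles(*sources):
    """Combine (article_title, viaf_id) pairs from several sources.

    Returns (accepted, conflicts):
      accepted  maps a title to the single VIAF id all sources agree on;
      conflicts maps a title to the sorted list of disagreeing ids.
    """
    candidates = defaultdict(set)
    for source in sources:
        for title, viaf_id in source:
            candidates[title].add(viaf_id)

    accepted, conflicts = {}, {}
    for title, ids in candidates.items():
        if len(ids) == 1:
            accepted[title] = next(iter(ids))
        else:
            # Disagreement between sources: leave for manual checking.
            conflicts[title] = sorted(ids)
    return accepted, conflicts


# Hypothetical input: links held by VIAF itself and links found on deWP.
from_viaf = [
    ("Stephan Kekule von Stradonitz", "141474549"),
    ("Edmond Duponchel", "79197757"),
]
from_dewp = [
    ("Stephan Kekule von Stradonitz", "57425893"),
    ("Edmond Duponchel", "79197757"),
]

accepted, conflicts = pair_articles(from_viaf, from_dewp)
```

In this sketch a conflict says nothing about which source is wrong; it only keeps the bot from adding an identifier that the sources do not agree on.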

Frequently asked questions

  1. How do I add a subject's VIAF to the article about them (or mine to my user page)?
    Use ((Authority control)).
  2. Why use VIAF and not another identifier?
    VIAF is a composite of several existing authority control databases, and so includes all the content from many of the other systems. Any entity with, for example, an LCCN should have a corresponding VIAF number as well, but not every entity with a VIAF number will have an LCCN. Adding VIAF does not preclude the inclusion of other identifiers (and may indeed make it easier); this isn't aiming to impose a sole standard.
  3. Why only people?
    The authority control system does cover other things, but for the moment (written 2013) we are only planning to cover people—this is to simplify the initial program, as well as target the articles where the template is most likely to be useful.
  4. What about errors in VIAF?
    You can report apparent errors in VIAF (or its constituent catalogues) at Wikipedia:VIAF/errors. These are then available to the relevant managing body, and for linkage repair on-Wiki. For the German equivalent noticeboard, see de:WP:PND/F.
  5. What about licensing?
    VIAF is licensed as ODC-BY, which is compatible with Wikipedia licensing; the use of a VIAF URI is sufficient attribution for the terms of the license.
  6. Will this give any control over Wikipedia content to third parties?
    No. While we will be including VIAF identifiers, the content of Wikipedia and VIAF will remain entirely separate. No metadata will be imported automatically from VIAF, nor will Wikipedia need to follow VIAF naming conventions.
  7. What if editors object to the template or the identifier?
    Editors of specific pages will in all cases be free to remove the metadata where it is inaccurate or felt to be editorially inappropriate. For the purposes of Wikipedia:Sanctions, the first revert of an automated or semi-automated addition of authority control information shall not count as a revert.
  8. What about pages covering two people?
    There are many cases where a single article deals with two individuals. If two VIAF identifiers refer to the same article, this will be logged but not added to the article; if it currently contains one but not the other, or a mixture of identifiers referring to both, this will also be flagged.
  9. What about Wikidata?
    Wikidata includes authority identifiers. However, adding the template now allows us to gain the benefit of having this information available before Wikipedia transcludes it from Wikidata; it will also simplify any future work to add these identifiers to Wikidata.
  10. What about cases where several people have the same name?
    The primary purpose of authority control records is to help distinguish between people with the same (or similar) names. As such, identifiers are usually not matched on the name alone; the software is able to take account of other information such as birth and death dates.
  11. I wrote a new biographical article, how do I find the VIAF identifier?
    Thank you for contributing to Wikipedia! You can look up a subject's VIAF at http://viaf.org/. Enter their name as the "Search Terms:", and leave the other parameters at their default values. If there are two or more entries with the same name, check the listed works for a match. If you're not sure which to use, you can ask for advice at Wikipedia talk:Authority control.
  12. I have another question
    Any comments, criticisms, etc. will be gratefully received, again at Wikipedia talk:Authority control.
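The handling of multi-person pages in question 8 can be read as a small decision rule. The sketch below is one hypothetical way to write it down, not the bot's real logic: the function name `plan_edit` and the action labels are invented for illustration, and the treatment of a single candidate that disagrees with an existing identifier (flagging it) is an assumption the FAQ does not spell out.

```python
def plan_edit(existing_ids, candidate_ids):
    """Decide what to do with one article.

    existing_ids:  VIAF ids already present in the article.
    candidate_ids: VIAF ids that the matching step links to the article.
    Returns one of "skip", "add", "unchanged", "log", "flag".
    """
    existing = set(existing_ids)
    candidates = set(candidate_ids)

    if not candidates:
        return "skip"  # nothing to add

    if len(candidates) > 1:
        # Several identifiers point at one article, e.g. a page on two people.
        if existing and existing != candidates:
            return "flag"  # article holds one but not the other, or a mixture
        return "log"  # logged, but never added automatically

    if not existing:
        return "add"
    if existing == candidates:
        return "unchanged"
    return "flag"  # assumption: a disagreement goes to human review


# A page about two people with no identifier yet is only logged.
action = plan_edit([], ["id-a", "id-b"])
```

Whatever the rule decides, editors of the page remain free to remove the result, as question 7 states.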

Responses

Please leave feedback or comments below. More general queries can also be left at Wikipedia talk:Authority control integration proposal.

Support

  1. Tagishsimon (talk) 22:28, 28 June 2012 (UTC)
  2. DGG ( talk ) 00:45, 29 June 2012 (UTC)
  3. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:30, 29 June 2012 (UTC)
  4. Ironholds (talk) 10:46, 29 June 2012 (UTC)
  5. Nyttend (talk) 13:28, 29 June 2012 (UTC)
  6. --AndreasPraefcke (talk) 13:42, 29 June 2012 (UTC) Not only helpful for linking out to, but especially for getting linked to from catalogues, scholarly databases and the like.
  7. Wer900 talk coordination consensus defined 16:41, 29 June 2012 (UTC)
  8. SarekOfVulcan (talk) 19:44, 29 June 2012 (UTC)
  9. --j⚛e decker talk 22:31, 29 June 2012 (UTC)
  10. Imzadi 1979 23:02, 29 June 2012 (UTC)
  11. Sandstein 06:16, 30 June 2012 (UTC)
  12. --Jarekt (talk) 11:45, 2 July 2012 (UTC)
  13. Filminfo 15:50, 2 July 2012 (UTC)
  14. the wub "?!" 16:17, 2 July 2012 (UTC)
  15. I support this project for having a large benefit, a low risk of harm, for being able to be undone if it is unwanted, and for the attention its coordinators give to addressing the concerns people have for it. This is a great experiment both in terms of incorporating data into Wikipedia and in terms of transparency in doing something new. I appreciate the commitment which project coordinators and participants have shown to making forthright replies to community questions. I have seen no one make a comment or share an idea that makes me think anything other than that this project deserves to proceed. Blue Rasberry (talk) 20:51, 2 July 2012 (UTC)
  16. Bgwhite (talk) 06:57, 3 July 2012 (UTC)
  17. Mr impossible (talk) 12:07, 3 July 2012 (UTC) - this already seems to be appearing on Commons and the potential of this improved, linked data is very great.
  18. sunhai76 (talk) 14:20, 3 July 2012 (UTC)
  19. Yes please. Specs112 t c 12:36, 3 July 2012 (UTC)
  20. Some concerns, but outweighed by the benefit. Comments below. LeadSongDog come howl! 13:26, 3 July 2012 (UTC)
  21. Ruud 14:06, 3 July 2012 (UTC)
  22. kosboot (talk) 14:07, 3 July 2012 (UTC)
  23. Whouk (talk) 14:42, 3 July 2012 (UTC) There might (or might not) be issues down the line with generating Wikipedia content from the established links but we can discuss that as and when. Sounds like there's a lot of thought gone in and real potential for this to be useful.
  24. Night of the Big Wind talk 15:43, 3 July 2012 (UTC)
  25. Gobōnobo + c 20:46, 3 July 2012 (UTC)
  26. Helps to uniquely identify a person and to link to works by and about him. Edison (talk) 15:25, 4 July 2012 (UTC)
  27. Easy to support. Whether the template renders or not can be a separate discussion, but that's not enough of a drawback to outweigh the obvious benefits. In general, though, and this is off-topic, when will Wikipedia move from a file-based system to a database system? Embedding this sort of metadata (along with dozens of other pieces of metadata like it) directly into the content page is ridiculous and will have to be resolved with a fundamental change to the way Wikipedia stores, allows changes to and presents content and metadata. Is anybody working on that? I'd like to help but I don't know where to ask about it. Zad68 17:33, 5 July 2012 (UTC)
  28. I completely understand where opposers who see this as too much metadata on WP articles are coming from. However, the solution needs to be technical (via Wikidata, better rendering, or semantic web enabled browsers), not by omitting this valuable information. -- Gaurav (talk) 22:53, 7 July 2012 (UTC)
  29. It Is Me Here t / c 17:56, 8 July 2012 (UTC)
  30. Unlike some meta-data schemes this one seems to have real value and can be implemented fairly easily for a start. Eluchil404 (talk) 08:12, 9 July 2012 (UTC)
  31. BDD (talk) 15:23, 10 July 2012 (UTC)
  32. James F. (talk) 18:15, 11 July 2012 (UTC)
  33. I only see benefits, and can't see any harm. - Jorgath (talk) (contribs) 23:12, 11 July 2012 (UTC)
  34. Dsp13 (talk) 00:49, 12 July 2012 (UTC)
  35. This would be profoundly helpful. Wholeheartedly support. eldamorie (talk) 20:58, 13 July 2012 (UTC)
  36. Ijon (talk) 18:00, 17 July 2012 (UTC)
  37. Useful, far easier to maintain as a template than EL's, and really of higher value than most EL's are now. Courcelles 00:28, 18 July 2012 (UTC)
  38. Support, as this would be incredibly useful in identifying sources in languages other than English. G. C. Hood (talk) 17:33, 22 July 2012 (UTC)

Oppose

  1. I am shocked, yet not surprised, yet perplexed, that no one thinks of the readers any more. I haven't seen any mention in discussions to date of why VIAF information needs to be visible in yet another stupid footer box. If you don't think these useless links belong in my comment, perhaps you ought to ask what possible use the common reader of an encyclopedia would have for them. Now even if you had been reading Sigmund Freud instead of this comment, those links in that ugly rectangle at the bottom of his article would be just as uninteresting as they are here. There's a reason that "PERSONDATA" does not render to the article; the same reason applies to "authority control codes". We've already got navboxes, article feedback tools, external links, other navboxes, categories, and more, and this proposal would see another extremely low-value GUI element added to hundreds of thousands of articles. I am not opposed to markup that does not render in the article, although I don't understand why the metadata lovers (which, believe it or not, includes me in other contexts) don't come up with an article subpage proposal for all metadata. There are other groups taking usability quite seriously these days (specifically, ease of editing), and at some point, they and the template formalists will have quite a battle. Riggr Mortis (talk) 09:51, 4 July 2012 (UTC)
    VIAF allows me to get to a number of catalogues of any author's work in a couple of clicks. That is useful to me as a reader, and I presume other readers will find it useful. It follows, in my head, that the central theses of your rant are invalid; that 1. no one has thought of the readers. Not so. 2. That the links are uninteresting and low value. Not so. Let me turn the question around. Why do you think it would be useful and what value would be added to Wikipedia by preventing users from being able easily to traverse to a catalogue of the works of an author? --Tagishsimon (talk) 09:59, 4 July 2012 (UTC)
  2. We have far, far too much on the bottom of pages now, to the point that such templates make some large pages almost impossible to open on slow connections, and very expensive to open on mobile ones. The information is indeed of low value compared to the ever-increasing usability issues we are encountering. (It's all well and good that Westerners with high-speed, relatively inexpensive internet access might perhaps get a tiny bit of value from this; but we're trying to grow outside of our traditional reader base.) As external links, perhaps this is okay, but not as a template. And users should *never* be sanctioned for removing this kind of dross from articles. Risker (talk) 16:15, 4 July 2012 (UTC)
    wp:BuildTheWeb is what it's about. Visual presentation of this info in the page is (nearly) useless, it might just as well be in a subpage except we don't allow them in the mainspace. But where did you see even a suggestion of user sanctions? It is obviously intended to leave it up to the editors of each article to decide on whether these insertions should persist. LeadSongDog come howl! 16:36, 4 July 2012 (UTC)
    One point of the FAQ refers to them to clarify that any revert of the bot which included this explicitly doesn't count as a revert for the purpose of sanctions. (AIUI, it's an added protection from being sanctioned). I'm not sure this point is needed, but it's not an issue I do much work with so I left it in - please feel free to remove it if it's causing ambiguity. Andrew Gray (talk) 19:41, 4 July 2012 (UTC)
    No, it says the *first* revert will not be counted toward sanctions. It says nothing about subsequent reverts, and there have certainly been cases where users have been sanctioned for edit-warring with a bot. Risker (talk) 23:18, 17 July 2012 (UTC)
    I strongly doubt this template will add any significant amount of overhead to the loading time of pages (<<0.1%). The amount of space a particular interface element occupies on your screen is in general completely unrelated to the number of bytes needed to encode that element. -- Ruud 19:19, 4 July 2012 (UTC)
    The concern about non-high-speed users is a valid one, but the solution shouldn't be not including valuable information. A low-fi skin of some sort, coupled with some classification of article content types as within or without the scope of the low-fi skin, would solve this problem more generally. Ijon (talk) 18:02, 17 July 2012 (UTC)
    You see what just happened there, Risker? You had us going with the concern about bandwidth thing. But you couldn't help but describe a link giving access to a plethora of third-party author bibliographies as "dross"; because, you know, honesty always wins out. And you know that author bibliographies, such as are curated by, for instance, national libraries, are not dross. Not of interest to you, maybe. But unambiguously and objectively not dross. So, we come away with the impression that the bandwidth thing was just a proxy for your general dislike for this sort of info. It really would be easier for all concerned if you'd step up to the plate, and, like Riggr Mortis, above, tell us: Why do you think it would be useful and what value would be added to Wikipedia by preventing users from being able easily to traverse to a catalogue of the works of an author? --Tagishsimon (talk) 22:59, 17 July 2012 (UTC)
    Tagishsimon, what's the problem with an external link? Seriously, adding it as a template of any kind *is* dross, when we have other equally effective solutions that are more respectful to our users. The information may be useful, but the process by which we provide access is punitive to the audience that has the strongest need for it. And exactly how is this going to fit with Wikidata? Why are we adding this separately? Why is this not part of the Wikidata collaboration? Risker (talk) 23:08, 17 July 2012 (UTC)
    So let me get this straight. You're okay with the content, but you're arguing the toss over whether the content should be inserted as a template or a plain-text EL? (Let's get Wikidata out of the way first: it's not IMO a good reason to halt everything whilst we wait for Wikidata to catch up. I anticipate AC will integrate with Wikidata exactly as any other structured data within articles.)
    As to ELs, seems to me that there's quite a lot of info being given by the template - eight links, in fact. I don't think that's entirely consistent with users' expectations of an EL, viz, a single link, not a set of eight links. Even cutting down to the three key numbers and arranging those one on each line seems to me not to be so great an idea. Neither the additional data rendered on the screen, nor marginal page load overhead seem to be anything other than trivial. I'm just not seeing cause for outrage. You'll tell me what I'm missing. --Tagishsimon (talk) 23:33, 17 July 2012 (UTC)
    What outrage are you talking about, Tagishsimon? I'm not outraged, I'm just seeing another little project that someone thinks is a good idea adding on to other little projects that someone else thought was a good idea on top of even more little projects... Our article pages are full of all these little projects: special templates that link all kinds of articles (instead of creating a logical category); infoboxes that are ever-expanding and containing more and more trivial information; links to half a dozen other places; templates nested within templates that take increasingly long to call forward. The fact that pretty well everyone on this page has high-speed, relatively inexpensive access to the internet means that we don't know what the real "call time" is for a page, when at the end of a dial-up in Africa or a mobile in India. The German Wikipedia gets very few "hits" outside of the Western "high-speed-connected" parts of the world, so they do not have to have the same level of concern for accessibility. We, on the other hand, have become the standard reference for the world, and accessibility is becoming an ever-more significant factor for us. We are failing our audience by continually adding layer on layer of resource-intensive metadata. It's not that this one is *the* problem, it's that it is just *one more* problem. Risker (talk) 00:09, 18 July 2012 (UTC)
    There's some ground for concern here, but the target is misplaced. Seriously, look at all the daft navboxen on [1] and ask yourself where the problem lies. LeadSongDog come howl! 05:50, 18 July 2012 (UTC)

Comments

In 2009 VIAF thought that "Stradonitz, Stephan Kekule von" (Portuguese entry) and "Kekule von Stradonitz, Stephan" (German entry) were the same person. I think this is just a Portuguese error in not doing German last names properly, but VIAF recovered from it and matched them and created a cluster number 57425893. Later deWP linked by hand to that VIAF cluster 57425893. Then in 2012 the Norwegian database was added, which has this person cataloged correctly (or at least the same as the Germans) as "Kekule von Stradonitz, Stephan". At this point VIAF identified the exact match of the German and Norwegian names, and deemed the difference of the Portuguese one to mean that it was probably a different person since at least two other countries corroborated on the right name. So the German/Norwegian cluster became cluster number 141474549, while cluster 57425893 had its German part removed. This left deWP pointing to the cluster of the wrongly cataloged name. It's not their fault. But what it does mean is that if my bot went to add cluster 141474549 to the enWP article and checked against deWP, it would not match deWP and would classify the mismatch as a VIAF clustering error, when in fact it is a Wikipedia linking error. That is one reason not to check the deWP (or treat it as law). The bot that is being proposed here for enWP is going to have a maintenance schedule that will update enWP (and down the road Wikidata) based on diffs, so this sort of thing wouldn't happen. Maximiliankleinoclc (talk) 19:04, 3 July 2012 (UTC)
The concern, of course, is to avoid wp:CIRCULAR referencing. It is policy on enWP (and I believe most others) that open wikis are not to be treated as wp:RS. This certainly does not permit using deWP as a RS for enWP, even if sanitized through VIAF or other external process. Some method of identifying similar clusters without necessarily asserting whether or not they refer to the same person seems inevitable. Equating and merging the clusters should be based on an identifiable basis document. Consider two living people of the same name and birth year as the most worrisome case: when one dies (or does something discreditable), the other may be inadvertently mourned (or libelled, as the case may be). Our wp:BLP treatment is necessarily cautious. LeadSongDog come howl! 22:16, 3 July 2012 (UTC)
All this is not so much a point for not checking de.wikipedia, but for treating VIAF as what it is (the poorest of identifiers since it changes so much) and always including LCCN and/or GND in this bot run wherever these numbers are included in current VIAF matchings. --AndreasPraefcke (talk) 05:15, 4 July 2012 (UTC)
Example: VIAF entries for Frank Herbert expose some inconsistencies.
VIAF has a very clear place where people can make corrections to VIAF records. VIAF does not tamper with individual libraries' (or countries') authorities records - that is their responsibility. Additionally, it is not the responsibility of an authority file to list all the works of a creator - it is just to establish the version of the name to be used in bibliographic records. -- kosboot (talk) 14:28, 9 July 2012 (UTC)
With "clear place" you mean the email feedback form? Anyway, the point I want to make is another: I am just curious about how the proposed »method for reporting apparent errors in VIAF (or its constituent catalogues) back to the relevant managing body« (FAQ #3) would look. My concern is that an opaque one-way channel, where one just dumps error reports, might lead to frustration on the side of Wikipedians. Ad hoc I can think of two important questions that arise when errors are spotted: "Has this already been reported?" and "What is done about it?" – In addition, error resolution times of currently 10 months, like we are experiencing on the otherwise superb de:Wikipedia:PND/F noticeboard, are highly discouraging. -- Make (talk) 15:35, 9 July 2012 (UTC)
I didn't see your response previously. A good example is http://viaf.org/viaf/79197757 - Edmond Duponchel. He was listed on VIAF as two people - the French had him under Henri, and other countries used Edmond as the first name. As it happens, he was the subject of an ample Wikipedia discussion on his talk page. I reported it to VIAF in February. Thom Pease got back to me a few days later questioning my source, so I showed him that Wikipedia talk page and he was convinced - all within about 1 hour's worth of emails. The correction was implemented by March (I believe VIAF is updated once a month). If someone is going to send in feedback, they should know what they're doing (i.e. have experience working with libraries' authority files). But the system does work. -- kosboot (talk) 02:06, 12 July 2012 (UTC)
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.