On Wikipedia, provenance is the origin of material that appears in articles. Provenance has been a controversial issue on Wikipedia from its inception. As Wikipedia has grown larger, published concerns about provenance have also grown. (See the external links below.) One of the more prominent critics has been Robert McHenry, former editor-in-chief of Encyclopædia Britannica, who suggested that Wikipedia is like a public restroom where what you find in a Wikipedia article is whatever the last user deposited.[need quotation to verify][This quote needs a citation]

Intellectual provenance has been a traditional concern of academia. Academics study the intellectual heritage of ideas, concepts, methods, theories, etc. Specifically, they are concerned with properly attributing work in their own field of study. So they question how this can be done for Wikipedia. For example, should academic credit be given for contributions to Wikipedia?

Provenance has at least two aspects: source and time. (See below.)

Provenance controversies[edit]

Providing provenance has proven to be controversial. Objections have included the following, e.g., from Wikipedia:Village pump (proposals):

Proposals for provenance[edit]

Proposals for providing provenance have been made and discussed on Wikipedia:Village pump (proposals).

Source provenance

On Wikipedia:Village pump (proposals) (see Wikipedia talk:Provenance), Pseudo Socrates made a proposal to provide source provenance by placing a Source Provenance button on the history page of each article that would produce a dynamic page that is a version of the current article modified as follows: Each interval of text would be preceded by a source link that would link to an article in the history where the text following the link first appeared in the editing history of the article. The name of the source link would be source for that version (log in name or IP address). At the bottom of the dynamic page the following notice would appear:

  The name of each provenance link above was derived from the second
  column, i.e., source (login name or IP address), of the
  history page of the article for which this page was produced.
  Clicking on a provenance link will produce a dynamic page that shows
  a (previous) version of the article in which the text following
  the link first appears in the editing history.  Of course the source
  may not be the real author of any of the text in an article that results
  from their edit.

Alternatively, source provenance could use a color-coding scheme: each of the top ten contributors to an article would be assigned a different color. Users could point the mouse at text of any color to see which contributor that color represented. Or, they could look at a legend at the bottom specifying which color represented which contributor. Users who have set up forced custom text colors in their web browser and blind users using screen readers would not be able to take advantage of this feature, though. This would also require that a standard be established to determine how contribution to an article is measured, e.g. the number of edits made, the number of characters added/changed, etc.

Temporal provenance

On Wikipedia:Village pump (proposals) (see Wikipedia talk:Provenance), Pseudo Socrates made a proposal to provide temporal provenance by placing a Temporal Provenance button on each article that would produce a dynamic page that was a version of the current article modified as follows: Each interval of text would be colored according to the following algorithm: Text of vintage less than 24 hours would be colored red, vintage more than 24 hours but less than one week would be colored green, remaining text would remain black.

Tom Cross' article "Puppy smoothies: Improving the reliability of open, collaborative Wikis" (First Monday) proposes a temporal provenance by coloring based on the number of edits a piece of text has survived. E.G., new edits would be colored red, text surviving 50 edits would be yellow, text surviving 100 edits would be green, and text surviving 150 edits would be black. The exact values here could be tempered by various factors (e.g., perhaps surviving many reads, or many days, would could for something too). By itself, this probably isn't enough; an attacker could automate "editing" to "promote" some other text. But counting only named edits, by multiple people, and adding a minimum time value (say, 7 days to get a new level) would be simple to do, and might make it workable.

Of course, temporal provenance can be combined with source provenance.

Examples[edit]

See also[edit]

External links[edit]