The Signpost

Recent research

STEM articles judged unsuitable for undergraduates below the first paragraph

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

"The First Paragraph Is As Good As It Gets": study discourages STEM students from using the rest of Wikipedia articles

A study published last month in the journal College Teaching[1] evaluated the suitability of English Wikipedia articles on STEM topics for undergraduate students' "opportunistic learning", defined as "informal, self-regulated study to learn, relearn, or be introduced to a concept".

The 28 articles were chosen from "six disciplines for which willing academics familiar with introductory STEM topics were available to participate in this study: Biology, Chemistry, Environmental Science, Mathematics, Physics, and Statistics", plus a "General" STEM category. Within each of these, the authors selected "four diverse introductory topics commonly encountered in STEM programs [... covering] topics commonly misunderstood or important in the discipline". The four "Statistics" articles had already been examined in a previous paper by three of the authors (see our review: "Evaluating Wikipedia as a self-learning resource for statistics: You know they'll use it").

Each article was evaluated in three components, based on a revision from 14 November 2019: "the entire article, the preamble (before the Table of Contents) [what Wikipedia's manual of style refers to as the lead section], and the preamble first paragraph". The focus on the latter two was motivated by the observation that they "are easily accessed on mobile devices with small screens, and [...] may be all that is read" (quoted from the earlier paper), an assumption supported by several data points and research results.

The articles were evaluated using what the authors call the "ACPD framework" (developed in their earlier paper), assigning a score from 1 ("Not suitable for opportunistic learning") to 3 ("Recommended for opportunistic learning") in each of four criteria:

  • Article accuracy (A), including definitions; interpretation; notation; usage; examples. Accuracy focuses on errors, ambiguities, omissions, and inconsistencies, but also correct spelling and grammar.
  • Effectiveness of the conceptual explanations (C): logical explanations that lead to procedures; explanation beyond definitions; explanation of what is behind the procedure.
  • Effectiveness of the procedural explanations (P): accuracy of procedures explained; examples used to explain procedure; explanation of procedure.
  • Effectiveness of the display or visual components (D): clear; accessible; coherent and well-paced; organized; logical; interesting; context; readability; density of formulae; use of diagrams, videos, animations etc. for illustration; complexity, use and suitability of images.
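
The scoring scheme above can be pictured as a small data structure. The following is only a minimal sketch, not the authors' instrument: the names `Criterion`, `ComponentRating` and `weak_criteria` are hypothetical, the sample scores are invented for illustration, and treating a score of 1 as the "weak" threshold is an assumption made here rather than something the paper states.

```python
from dataclasses import dataclass
from enum import Enum


class Criterion(Enum):
    """The four ACPD criteria named in the study."""
    ACCURACY = "A"
    CONCEPTUAL = "C"
    PROCEDURAL = "P"
    DISPLAY = "D"


# The three components evaluated for each article.
COMPONENTS = ("first paragraph", "preamble", "entire article")


@dataclass
class ComponentRating:
    """Scores (1 to 3) for one component of one article (hypothetical structure)."""
    article: str
    component: str
    scores: dict  # maps Criterion -> int in {1, 2, 3}

    def __post_init__(self):
        if self.component not in COMPONENTS:
            raise ValueError(f"unknown component: {self.component!r}")
        if set(self.scores) != set(Criterion):
            raise ValueError("a score is required for each of A, C, P and D")
        if any(score not in (1, 2, 3) for score in self.scores.values()):
            raise ValueError("scores must be 1, 2 or 3")


def weak_criteria(rating, threshold=1):
    """Return the letters of criteria scored at or below `threshold` (assumed cut-off)."""
    return [c.value for c in Criterion if rating.scores[c] <= threshold]


# Invented scores for a placeholder article, purely for illustration.
example = ComponentRating(
    article="Example article",
    component="preamble",
    scores={
        Criterion.ACCURACY: 1,
        Criterion.CONCEPTUAL: 2,
        Criterion.PROCEDURAL: 3,
        Criterion.DISPLAY: 1,
    },
)
print(weak_criteria(example))

# 28 articles, each with three components, give the 84 components the study reports.
print(28 * len(COMPONENTS))
```

Keeping one record per component, rather than per article, mirrors how the study reports its results: as counts over the 84 rated components.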

The authors summarize the resulting ratings as follows:

"Physics was the only discipline to receive zero 3-Ratings. In contrast, Chemistry received four 3-Ratings. [...] Accuracy (A-qualifier) was a barrier to opportunistic learning in nine of the 84 components (all in Environmental Science and Statistics) [...]. The number of A-qualifiers alone suggests not recommending Wikipedia as a learning resource in STEM disciplines.

Conceptual barriers were common (all components within every discipline, except the first paragraphs of Statistics articles), and procedural barriers reasonably common (except for Chemistry). [...] Statistics has (ignominiously) the most barriers regarding displays. Statistics and Environmental Science have the most identified barriers overall.

The number of C-, P- and D-qualifiers noticeably increased while moving from the first paragraph, to the preamble, to the article (Table 3), suggesting first paragraphs are the most useful component."

In the Statistics category, the authors judged the first paragraphs "excellent" with the exception of histogram. But "the preambles and the entire articles were generally poor, with many A-qualifiers (errors). Some errors were basic..."

In "Environmental Science" (the other discipline where the evaluation had flagged accuracy concerns, in the articles extinction and greenhouse effect), the study criticized "uneven, vague, overly simplistic and/or imprecise" writing, highlighting an example from the article species which said "evolutionary processes cause species to change continually, and to grade into one another". The study also took issue with "paragraphs only tangentially related to the topic [...] For example, the 'Biodiversity' (Environmental Science) article states 'Biodiversity inspires musicians, painters, sculptors, writers and other artists', which is not useful for a learner seeking to understand the concept of biodiversity".

In "Mathematics", "articles were generally instructive from an encyclopedic viewpoint, but the fluid narrative was less useful for learners unless supplemented". While there were no accuracy concerns, "C-qualifiers were frequently applied because the development was less helpful for opportunistic learning".

For "Chemistry", the study found that "eight of the 12 article components lacked conceptual development (C). The articles introduced concepts at a level substantially above that expected of undergraduates or assumed knowledge that most would not have".

The authors emphasize that these evaluations were specific to the suitability of the Wikipedia articles for opportunistic learning, and that "a technically correct article may be a poor opportunistic learning resource. Of course, some criticisms (e.g., accuracy) may apply more generally".

Briefly

  • See the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.
  • A podcast interview with Heather Ford (author of various research publications about Wikipedia and of an upcoming book titled "Writing the Revolution: Wikipedia and the Survival of Facts in the Digital Age") covers "the power struggles and community governance that makes the site one of the most trusted information sources on the web".
  • The Wikimedia Foundation's research team published the fourth and fifth in a series of biannual reports about its work.

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

"Quality change: norm or exception? Measurement, Analysis and Detection of Quality Change in Wikipedia"

From the abstract:[2]

"... we study evolution of a Wikipedia article with respect to [Wikipedia's internal] quality scales. Our results show novel non-intuitive patterns emerging from this exploration. As a second objective we attempt to develop an automated data driven approach for the detection of the early signals influencing the quality change of articles. We posit this as a change point detection problem whereby we represent an article as a time series of consecutive revisions and encode every revision by a set of intuitive features. Finally, various change point detection algorithms are used to efficiently and accurately detect the future change points."
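
To make the "change point detection" framing in this abstract concrete, here is a minimal sketch of the general idea. It is not the method used in the paper: it tracks a single made-up numeric feature per revision (the paper encodes each revision with a set of features), and it finds just one change point by choosing the split that minimises the squared deviation from the mean on each side. The function names and the sample series are hypothetical.

```python
from statistics import fmean


def sum_squared_error(values):
    """Sum of squared deviations of `values` from their own mean."""
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def best_change_point(series):
    """Return the index k that best splits `series` into series[:k] and series[k:].

    "Best" here means the split with the lowest total within-segment squared
    error, i.e. a single shift in the mean. This is an illustrative stand-in,
    not one of the detection algorithms evaluated in the paper.
    """
    if len(series) < 2:
        raise ValueError("need at least two revisions to look for a change point")
    return min(
        range(1, len(series)),
        key=lambda k: sum_squared_error(series[:k]) + sum_squared_error(series[k:]),
    )


# Invented per-revision feature values (for example, a count of references).
feature_per_revision = [10, 10, 11, 10, 30, 31, 30, 29]
print(best_change_point(feature_per_revision))  # index of the first revision after the shift
```

A real pipeline would need several features per revision and a way to detect multiple change points; the single-split search above only shows what "a revision series with a detectable shift" means.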


"Digital Communication and Interactive Storytelling in Wikipedia : A Study of Greek Users' Interaction and Experience"

This master's thesis[3] presents results of a survey asking readers of Greek Wikipedia how useful they found its "interactive storytelling tools (hyperlinks to other articles, navigation tables, page previews, photos, external sources of information, etc.)", and about improvements they would suggest.


"A Map of Science in Wikipedia"

From the abstract:[4]

"We rely on an open dataset of citations from Wikipedia, and use network analysis to map the relationship between Wikipedia articles and scientific journal articles. We find that most journal articles cited from Wikipedia belong to STEM fields, in particular biology and medicine (47.6% of citations; 46.1% of cited articles). Furthermore, Wikipedia's biographies play an important role in connecting STEM fields with the humanities, in particular history."

"Analyzing Race and Country of Citizenship Bias in Wikidata"

From the abstract:[5]

"By comparing Wikidata queries to real-world datasets [listed here, ...] we discovered that there is an overrepresentation of white individuals and those with citizenship in Europe and North America; the rest of the groups are generally underrepresented. Based on these findings, we have found and linked to Wikidata additional data about STEM scientists from the minorities. This data is ready to be inserted into Wikidata with a bot."

References

  1. ^ Dunn, Peter K.; Brunton, Elizabeth; Marshman, Margaret; McDougall, Robert; Kent, Damon; Masters, Nicole; McKay, David (2021-11-13). "The First Paragraph Is As Good As It Gets: STEM Articles in Wikipedia and Opportunistic Learning". College Teaching: 1–10. doi:10.1080/87567555.2021.2004387. ISSN 8756-7555. S2CID 244109849.
  2. ^ Das, Paramita; Guda, Bhanu Prakash Reddy; Seelaboyina, Sasi Bhusan; Sarkar, Soumya; Mukherjee, Animesh (2021-11-02). "Quality change: norm or exception? Measurement, Analysis and Detection of Quality Change in Wikipedia". arXiv:2111.01496 [cs.SI].
  3. ^ Mavridis, George (2021). Digital Communication and Interactive Storytelling in Wikipedia : A Study of Greek Users' Interaction and Experience.
  4. ^ Yang, Puyu; Colavizza, Giovanni (2021). "A Map of Science in Wikipedia". arXiv:2110.13790 [cs.DL].
  5. ^ Shaik, Zaina; Ilievski, Filip; Morstatter, Fred (2021-08-11). "Analyzing Race and Country of Citizenship Bias in Wikidata". arXiv:2108.05412 [cs.AI].



Discuss this story

  • Unsurprising. I mentioned this on EEng's talk before ([1]). Ultimately STEM isn't learned by reading and our articles are written very technically and as an encyclopaedic reference rather than as a learning resource. ProcrastinatingReader (talk) 21:22, 28 December 2021 (UTC)[reply]
  • "Quality change: norm or exception? Measurement, Analysis and Detection of Quality Change in Wikipedia" – It's hard to imagine a research project more doomed from the start than one which (AND I AM NOT MAKING THIS UP) mistakes our absurd Stub-Start-C-B-GA-A-FA tags for actual indicators of article quality. EEng 22:26, 28 December 2021 (UTC)[reply]
  • "The First Paragraph Is As Good As It Gets" – here's the list of articles reviewed, along with Wikipedia's quality assessment. Mostly B/C, but there's one GA (Species) and one FA (Enzyme, kept at FAR in 2015). In those two cases they liked the lead, but Species got 1/3 overall and Enzyme got 2/3. Sdkb talk 22:38, 28 December 2021 (UTC)[reply]
  • Our math and physics articles are a mix of garbled verbiage and verbal garbage, the apparent need to write in an impenetrable style instead of maybe using the English language is painful. It comes off to me as an intentional effort to show off instead of inform. To cite an obvious example, there are plenty of YouTube videos that explain Graham's number in a reasonably approachable manner (the man himself does it here, for instance), yet our article is an incomprehensible orgy of hypertechnical terms that quickly make me want to go full Oedipus on my eyes when I even attempt to read it. Linguistics articles can get like that too, for sure, but while a little clicking around is usually good enough to get through those I would never read Wikipedia to learn something about mathematics that I didn't already thoroughly understand; from that, I rather think the problem is self-evident. The Blade of the Northern Lights (話して下さい) 23:24, 28 December 2021 (UTC)[reply]
    Well, BotNL (hey -- you should have a bot called BotNLBot), as Andrew Gleason put it:
    It is notoriously difficult to convey the proper impression of the frontiers of mathematics to nonspecialists. Ultimately the difficulty stems from the fact that mathematics is an easier subject than the other sciences. Consequently, many of the important primary problems of the subject—that is, problems which can be understood by an intelligent outsider—have either been solved or carried to a point where an indirect approach is clearly required. The great bulk of pure mathematical research is concerned with secondary, tertiary, or higher-order problems, the very statement of which can hardly be understood until one has mastered a great deal of technical mathematics.
    I'm sorry, but Graham's YouTube chat may give you a warm feeling that you've learned something, but in fact you haven't. The purpose of our articles is to lend real understanding to those who have something like the background, not to let readers pretend they've learned something. That's what YouTube videos are for. EEng 06:42, 29 December 2021 (UTC)[reply]
    Heh. At least on YouTube that's about all you should expect, hence my lack of disappointment from it. The Blade of the Northern Lights (話して下さい) 15:36, 29 December 2021 (UTC)[reply]
    Just dropping this here: Talk:Falsifiability. — Bilorv (talk) 11:56, 30 December 2021 (UTC)[reply]
    I have to say, one of the very few reasonably well-written articles on this is 0.999.... Instead of going out of its way to instantly assault readers with every possible esoteric term imaginable, and a few that aren't, it presents the subject in a very straightforward manner and uses easily approachable examples to demonstrate its point. This is, as the article itself notes, a very counterintuitive bit of math, so clearly it can be done. I get that most other subjects aren't the abject crank magnets that article is, and I suspect that external pressure is why, almost alone, that article is written in such a way that it eschews navel-gazing gobbledygook in favor of language that actively encourages understanding. The Blade of the Northern Lights (話して下さい) 06:19, 31 December 2021 (UTC)[reply]
    To the contrary, 0.999... is mostly unsourced, and much of it looks like the sort of sophomoric filler one gets from writers who don't have a firm enough grasp of the subject to make any clear point but still need to reach a long enough page count. —David Eppstein (talk) 08:27, 31 December 2021 (UTC)[reply]
    I thought the lead was at least worth a read, I came away understanding more than when I went in. Also, since that article is under perpetual siege, I can see where its development is stunted. The Blade of the Northern Lights (話して下さい) 15:56, 31 December 2021 (UTC)[reply]
  • Concurring with The Blade of the Northern Lights, I usually give up after the first paragraph. The rest of such articles seem to be preaching to the choir. Fortunately in linguistics I'm one of the singers but I'm hopelessly confused when I want to learn something in STEM. Kudpung กุดผึ้ง (talk) 00:08, 29 December 2021 (UTC)[reply]
  • A sometimes applied doctrine says it's okay to be an insider to an academic discipline, writing an article, but we must remember our audience is outsiders. Alas, it seems many articles are written for insiders by insiders, or at least they try to teach the things every insider must know. If we are failing to address the audience properly, maybe we should include prominent links to tutorial videos, such as those of Khan Academy, when good ones exist. My efforts at trying to understand why stars pulsate, for example, have been more advanced by YouTube than by Wikipedia. WP assumes I know a lot about transparency of ionized gases, which actually fits me somewhat well, but also about enthalpy and other gas thermodynamic questions, which are a deep mystery to me. Such things tend to be addressed to grad students of the appropriate major and, umm, I never came close to graduating, nor did a course in optics or gas dynamics. If it were just astrophysics, no big deal, but most of our STEM areas are like that. Jim.henderson (talk) 00:15, 29 December 2021 (UTC)[reply]

  • I use Wikipedia a lot for my STEM-related professional field, and expect it to be as useful as Pocket Ref i.e. some formulas and figures, a reminder or a key to further reading, not a learning resource per se. And I'm not disappointed. ☆ Bri (talk) 01:17, 29 December 2021 (UTC)[reply]
    I probably wouldn't rely on Wikipedia for formulas. It's not uncommon I come across a maths article through edit filter logs or Huggle and notice some sneaky vandalism to a formula has passed through patrollers and managed to stay in the article. ProcrastinatingReader (talk) 04:20, 29 December 2021 (UTC)[reply]
    I do this sort of thing too, and it's usually a recognition-vs-recall situation; I can't remember what I'm looking for offhand but will probably notice if it's wrong. Opabinia regalis (talk) 06:08, 29 December 2021 (UTC)[reply]
  • The audience for Wikipedia isn't just "outsiders". Sometimes, the same article has to serve multiple purposes, because anyone might read it — from schoolchildren to professional mathematicians. This is why we have the "write one level down" and "put the easiest part up front" guidelines. And if we tried to write a hand-holding introduction to every subject that presumed half of a high-school education, we'd need more than one of each. No one introduction works for all readers. The same textbook can come across as friendly to one reader and gimmicky to another. Please the former, and you'll lose the latter. We'd also face WP:NPOV and WP:NOR problems. To write a textbook-style introduction, you need to pick how to begin and what to include, devise a path through the ideas, probably invent examples that haven't been used anywhere else before... Wikipedia, as a platform, simply isn't equipped to do any of that. As a physicist who has written expository material for students, research papers for colleagues, and Wikipedia articles, I can tell you that the process and the mindset are necessarily different for each one. XOR'easter (talk) 05:07, 29 December 2021 (UTC)[reply]
    Moreover, the editors who are subject-matter experts and who have managed to adapt themselves to the different requirements of writing here are either working away in obscure corners because that's where their enthusiasm leads them, or they're run ragged trying to clean up fringe nonsense, vanity-fueled autobiographies that try to use Wikipedia as LinkedIn, schlock added by well-meaning editors suckered by sensationalist pop-science and churnalism... There just aren't enough volunteers to go around. For years, I've seen complaints about the difficulty of our technical articles, and sooner or later that thread of anti-intellectualism always works its way in. "Ha ha, I was never good at math, lolz, I stopped reading at the first word I didn't get, let's pivot to video." There's no consideration for the intrinsic difficulty of the subject, or the challenge of writing about it, or the human element of finding people qualified to do so when all the incentives are for them to be doing work that advances their careers. I've been trying to make it work for years now, and I'm tired. And I'm done. XOR'easter (talk) 07:25, 29 December 2021 (UTC)[reply]
    @XOR'easter: - I've resigned myself to mostly focusing on minor WikiGnome cleanup work for mostly this reason. I do use Wikipedia to hoover up articles in my area of interest that cover concepts I didn't know existed, and I'm fortunate that most of my watchlist is low-interest and low-traffic work, but if I attempted to conduct the Big Big Rewrites that Definitely Need To Happen, I'd wear myself thin. I still find value in the addition of accessibility templates - language templates, using {{ubl}} and {{pb}} in place of line breaks - but it's definitely not the heavy-hitting stuff I used to do. I feel sorry for people working in more active areas of interest - I can't begin to imagine the nightmare of attempting to wrangle more than five high-importance, high-traffic articles at once.--Ineffablebookkeeper (talk) ({{ping}} me!) 16:02, 18 January 2022 (UTC)[reply]
  • Of course we should always take what we can learn from this kind of exercise. In particular, we could make better use of the "Introduction to..." format to serve a wider range of audiences. But if you're going to publish an article criticizing a free resource from behind a paywall, well, pooh on you. Some "opportunistic learning" there. (Yes, I know, the OA fees are probably an arm and a leg.) Opabinia regalis (talk) 06:08, 29 December 2021 (UTC)[reply]
  • not useful for a learner seeking to understand the concept of biodiversity - and yet incredibly useful for someone wanting to understand the place of biodiversity in the world and culture at large. The study is inherently flawed by operating on the premise that any article is of primary or even exclusive use to people engaged in higher learning of its subject. Kingsif (talk) 06:20, 29 December 2021 (UTC)[reply]
  • I sometimes think whether I should write something about notions in my research field (which is condensed matter physics) and see whether anything useful would come out, but there is so much red tape (from colleagues potentially unhappy they are not cited or not cited the way they would like to, to users with 100-edit contribution on talk pages who know the subject way better than I potentially would ever learn it - in fact, any subject - and are not ready to accept that what they say may deviate from the divine truth) that I am not sure I would ever actually write anything related to my research.--Ymblanter (talk) 14:12, 29 December 2021 (UTC)[reply]
  • The Oxford English Dictionary tells us that the original meaning of encyclopedia (ἐγκύκλιος παιδεία) is ‘encyclical education’, meaning "the circle of arts and sciences considered by the Greeks as essential to a liberal education." The first definition that follows this is "The circle of learning; a general course of instruction."[1] If a credible source judges the bulk of Wikipedia unsuitable for opportunistic learning by college undergraduates, then we have a serious problem. I have more to say on this than can reasonably be posted here: User:Kent G. Budge/Response to College Teaching critique --Kent G. Budge (talk) 17:20, 29 December 2021 (UTC)[reply]
  • One of the standards of WP is that each article is to be focused on its topic, and make major use of bluelinks with only brief statements about what they contain, rather than have an article be anything like a self-contained exposition on the whole topic. It's not a school essay, or a single page to learn an advanced concept if one doesn't have the foggiest idea of the field of study as a basis. I think the intro section, besides defining the topic in an accessible fashion, lets one know whether one has any sort of background or context to be able to understand much of the rest. We do have a ton of articles on introductory topics and introductory articles on some advanced topics, but as the study rightly points out, we also have lots of non-introductory articles on non-introductory topics. And, as others have noted, some topics really are fairly impenetrable by their nature, due to niche article scope. The problem might not be that "any given article isn't fully accessible for self-learning to anyone with no background", but rather that we don't have at least some article on each major or lay-accessible topic that is. Depending on one's starting point (article, and personal background), that might be a few clicks down a rabbit-hole, and require a major investment to get up to speed. DMacks (talk) 04:24, 30 December 2021 (UTC)[reply]
  • In "Mathematics", "articles were generally instructive from an encyclopedic viewpoint, but the fluid narrative was less useful for learners unless supplemented". While there were no accuracy concerns, "C-qualifiers were frequently applied because the development was less helpful for opportunistic learning". Accords with my experience, not just of Wikipedia, but of university textbooks, lecture notes and tutorials. Higher education maths pedagogy is horrific. I have had a small number of excellent teachers, lecture notes and truly fantastic textbooks (Information Theory, Inference, and Learning Algorithms, if anyone's looking for some light bedtime reading). However, for the most part, mathematics writing is done by a lot of thinking and scribbling and playing with toy examples, coming to some abstract conclusions, and then writing down the abstract conclusions and erasing all evidence of the toy examples, forcing the reader to rediscover those for themselves if they want to actually understand why something is true. During university, I regularly had the experience that a correct answer I had given to a problem sheet question was impenetrable to me just three days later.
    The problem for Wikipedia, then, is that making articles understandable necessarily makes them unverifiable, because mathematical papers and even textbooks and lecture notes do not give any or enough illustrative examples to learn the concept, or spell out the finer points that would be immediately obvious to anyone with the education to be reading the source. — Bilorv (talk) 11:56, 30 December 2021 (UTC)[reply]
    • "The proof is left as an exercise for the reader." Or end-of-chapter question #1 was to prove a theorem not discussed in the chapter, and then question #2 was to use that new theorem to prove yet some other thing. *cringe* DMacks (talk) 16:02, 30 December 2021 (UTC)[reply]
    For real. And at any rate, this "it's not useful for learners unless supplemented" conclusion is... describing exactly what an encyclopedia should be, anyway. An encyclopedia is a reference work! Its use as a pedagogical tool is secondary. I know students are reading it, and that's a very good reason (among many others) to care about whether or not our articles are accurate. But Wikipedia is a free encyclopedia, not a free university course. If it doesn't rate well as a teaching device, well, so what? -- asilvering (talk) 21:29, 30 December 2021 (UTC)[reply]
  • @The Blade of the Northern Lights: - I'd second the comment about linguistics articles - and I'd also point out that densely-written articles that take on an essay-like tone cause problems for editors who aren't even attempting to rewrite the article's content itself. It can be difficult to know what to fix or even if the addition of certain templates will cause problems if the article is too dense to work on. This repels both readers and also editors attempting to improve whatever they can on an article outside of their area of interest.--Ineffablebookkeeper (talk) ({{ping}} me!) 16:02, 18 January 2022 (UTC)[reply]
  • Histogram really does suck though. That's not a proper lede at all. Way too long and what looks like body material that was moved up and shoved into the lede for some reason. SilverserenC 19:21, 30 December 2021 (UTC)[reply]
    For my background at least, the first three sentences explain it all perfectly. The rest probably can be moved down.--Ymblanter (talk) 20:26, 30 December 2021 (UTC)[reply]
  • I have to say, I wasn't particularly impressed by the lead publication here. Four articles per area is barely enough to generate anecdotes, much less data or understanding. And the authors' idea that someone without any background knowledge should be able to learn a topic from a quick read of its Wikipedia article, and that any deeper content in the article that impedes that quick understanding is superfluous, seems...misguided at best. Encyclopedia articles should be encyclopedic. —David Eppstein (talk) 08:35, 31 December 2021 (UTC)[reply]
  • I'd be more interested in a study of how STEM students (and others) actually use Wikipedia, rather than some educators' speculation on how useful it ought to be. I certainly would have found it very helpful when I was in school. Back then we were taught to write terms and concepts we didn't understand on index cards and spend time in the library looking them up. How quaint that seems today. Of course search engines could be used if Wikipedia did not exist, but Wikipedia provides more organized, hyperlinked and sourced presentations. One thing that would be interesting to know is how often students start their inquiries on Wikipedia rather than a general search. Wikipedia is often one of the top search results in any case. And yes, our articles leave room for improvement. They always will.--agr (talk) 17:58, 4 January 2022 (UTC)[reply]
  • Pouncing on brief summary statements in the lead without looking at the cited paragraphs (and sources) that they summarize is always going to throw up ambiguities – all natural language is ambiguous and metaphoric, basta. If "opportunistic learning" means never going beyond the lead section – because the rest of the article is – what? Too long ("Too many notes, Herr Mozart")? Too difficult? Too picky with defined terms, careful citations, attributions to scientists? – then opportunistic readers are self-condemned to read only the lead, and maybe flick through the pictures, glance at the captions. Coffee-tablePedia? TwitterFeediaPedia? Oh dear. Chiswick Chap (talk) 10:04, 21 January 2022 (UTC)[reply]