Scots Wikipedia language quality problems ripple around the Internet, make the news, and trigger Meta-Wiki response
The Scots Wikipedia is a quiet, sleepy, low-activity edition of Wikipedia written in the Scots language, the Anglic language traditionally spoken in the lowlands of Scotland. Nobody paid it much mind... until August 2020, when a Reddit thread entitled "I've discovered that almost every single article on the Scots version of Wikipedia is written by the same person – an American teenager who can’t speak Scots" spread across the Internet. This young volunteer, who dedicated a large amount of time over seven years to translating segments of the English Wikipedia into Scots, was apparently never told that keeping English sentence structure and translating words 1:1 from a dictionary is no way to translate at all. Further investigation showed the quality problems ran deep: articles untouched by the prolific user in question were also written in poor, ungrammatical Scots, meaning that many more articles on Scots Wikipedia may be essentially worthless. The author of the Reddit post called the incident "cultural vandalism on an unprecedented scale" and wrote that "This is going to sound incredibly hyperbolic and hysterical but I think this person has possibly done more damage to the Scots language than anyone else in history."
The story hit the news media, for both high and low reasons. For the high road, this was a massive and notable failure of Wikipedia, one that has likely poisoned training data sets for the Scots language used by translation algorithms, and led any curious human readers to think that Scots is simply English in an accent with a few funky words thrown in. For the low road, the hobbies and naivety of the prolific user were mocked. Some of the notable coverage includes:
- Alleged Teen Brony Has Filled the Scots Wiki With Thousands of Fake Translations (Gizmodo)
- Shock an aw: US teenager wrote huge slice of Scots Wikipedia (The Guardian)
- Scots Wikipedia taken over by American teenager who wrote thousands of 'very odd' articles without learning language (i)
- How a Scots Wikipedia scandal highlighted AI’s data problem (Quartz)
Several of the tabloid-style sources omitted from this list got the story essentially wrong, confusing Scots with the Scottish Gaelic language, suggesting that the user might have just been writing in silly Groundskeeper Willie-ese, or that the user's admin status was relevant (a status much-misunderstood by the media). The problem was the user's edits: there has been no allegation of misuse of admin tools.
Within the Wikipedia community, several actions were kicked off. User:MJL, the only other active admin on Scots Wikipedia at the time, boldly set up their own "AMA" (short for 'Ask Me Anything') on the Scotland Subreddit to explain the situation as well as solicit interest in potential fixes for Scots Wikipedia. The prolific user apologized for his mistakes after being informed of his lack of proficiency in Scots and has withdrawn from editing for now. Various split discussions eventually coalesced into an RFC on Meta-Wiki: meta:Requests for comment/Large scale language inaccuracies on the Scots Wikipedia. The current short-term course of action with the most support seems to be having a bot perform some sort of mass rollback of affected articles if they meet criteria (which are still being determined), enlisting new admins, and some proposals for other new bots.
The long-term solution requires understanding how this disaster happened in the first place. On Wikipedia user page language templates, the prolific contributor only ever rated his own Scots proficiency at 2/5 or 3/5 (the rating changed over time). If he was really that bad at Scots – more like a 1/5 – how did nobody notice? The answer: there simply wasn't anyone to notice. To the extent there ever was an authentic Scots-speaking Scots Wikipedia community, it had departed by 2012. The contributor's edits were "Scots-y" enough that non-native speakers paying mild attention to the wiki did not realize the extent of the problems, and the user himself was a young kid when this started, clearly without the best self-awareness. If even one or two native Scots speakers had been active, they could have sounded the alarm long before seven years of wasted, counterproductive effort had passed. The fundamental problem at Scots Wikipedia is the lack of a Scots-speaking community of editors. Perhaps not only bad things have emerged from the incident: the burst of publicity has drawn the attention of Scots language groups. If the end result is to expand the Scots Wikipedia community, then perhaps something good will have come of this. –Sn
Interim Trust & Safety Case Review Committee
In early July, the Wikimedia Foundation announced the creation of the Interim Trust & Safety Case Review Committee (CRC), designed to allow appeal of certain less clear-cut cases decided by the WMF (both on-wiki and event bans), including appealing against a decision by T&S not to act on a complaint. A charter, a public call for applicants, and a Q&A with WMF Vice President of Community Resilience & Sustainability Maggie Dennis were also created. The CRC charter sets out the scope, objectives, and minimum candidate requirements.
The CRC is specifically temporary, designed to terminate with the creation of a permanent process as part of the Universal Code of Conduct. If those discussions have not concluded by July 1, 2021, then either a new call for candidates can be made for a new term, or a single extension of up to six months can be granted if there is a clear indication the process will wrap up by then (such as if an implementation date has been agreed).
Process: Maggie Dennis responded to a question: "Let's say user FooBar is blocked as a T&S office action and requests case review [...] What does the appeal process look like, both from FooBar's perspective and the review committee's perspective?"
Subject to process changing by the CRC, a rough outline was offered as follows:
- User emails inbox asking for a review
- WMF attorney confirms case is not within remit of "statutory, regulatory, employment, or legal policies", and so is subject to review
- User is notified it is under review and given likely timeline
- CRC Chair appoints 5 members who review the case for "appropriate handling; appropriate collection of evidence; appropriate outcomes"
- Members vote on whether to support, overturn (partially or fully), or return to the WMF for additional investigation
- WMF enacts that decision
- All involved users will be notified of decision
Overturning could occur on two main grounds: the sanction was inappropriately reached (the evidence didn't warrant the sanction) or the case did not fall within the T&S remit. The latter would indicate that the complaint could then be resubmitted at local community level (Arbitration Committee, Administrators' Noticeboard/Incidents (ANI) or equivalents). The publicly available documentation doesn't make it clear whether a case could be overturned on both grounds simultaneously, or whether that would still allow for a "double jeopardy" situation. Individuals may only make a single appeal per prohibition.
Candidates: the WMF imposes a number of eligibility requirements, including holding a current or prior advanced permissions role or being an experienced contributor as part of a Wikimedia affiliate. Candidates also need to be members in full good standing with no current sanctions and be fluent in English. Several roles were viewed as incompatible with membership, including current/former WMF staff. The en-wiki community has decided to disallow currently serving arbitrators from acting as CRC members, which Maggie Dennis said would be accepted. Gender and linguistic diversity were also sought, the latter most likely also driving project diversity.
CRC members are intended to be able to spend up to five hours a week on the role, though there were repeated statements that it was anticipated to be less.
One particular requirement was part of a major theme: anonymity. As well as keeping all case information to themselves under a currently non-published reinforced non-disclosure agreement (NDA) – above and beyond the standard non-public information agreement – candidates made anonymous applications and are to keep both others' and their own membership secret. A number of changes were made after applications closed due to "negotiation between committee finalists and Deputy GC", including further limiting CRC membership knowledge to only three Board members but giving retired CRC members the right to self-disclose after 6 months.
The initial filter of applications was made by non-applying Stewards, with members chosen from that group by the WMF General Counsel Amanda Keton. The WMF is also hiring a contractor to support the committee.
Reporting: the CRC is to provide quarterly generalised reports (number of cases ratified, number of cases overturned). It's not clear whether additional information will also be provided, such as the number of cases T&S prohibits from going to appeal. –Nbb
- IRS form 990: The WMF has released its Form 990, the major financial filing required of US non-profit organizations, for the year ending June 2019. Links to other WMF financial documents and to a FAQ on Form 990 can be found here.
- New administrator: The Signpost welcomes the English Wikipedia's newest administrator, , who has the additional distinction of being a Signpost staffer.
- Wiki Loves Monuments during the pandemic: WLM will be held this year despite all the difficulties posed by the pandemic. About half of the 40 participating countries will be holding the contest during September, according to the usual schedule. Other countries, including Brazil, Russia, and the United States, will hold the event during October. Bangladesh is scheduled for November and Israel for the month of Tishrei.
- Milestones: is expecting to mark his 500th Featured picture today. Congratulations.