Wikipedia:Wikipedia Signpost/2020-09-27/News and notes

More large-scale errors at a "small" wiki: With inline parenthetical citations!


Large-scale errors at Malagasy Wiktionary

Growth of Malagasy Wiktionary, 99.23% due to bot edits

A small wiki audit of the Malagasy Wiktionary found that a large number of its pages had been automatically translated. The Malagasy Wiktionary has the second-largest number of entries of any Wiktionary (6,103,961). Bot-Jagwar is a bot account run by Jagwar, the sole admin who has made edits there. The bot has made more than 22 million edits on the project, and counting. Jagwar also runs a secondary bot account, Bot-Jagwar II, which has made a further 6,976 edits. Another major bot on mg.wikt, making exactly the same type of edit, is Ikotobaity, run by Lohataona. It made 2,456,748 edits and has been inactive since 20 October 2017. These three bots have created 6,076,769 new mainspace pages, which is 99.23% of all mainspace pages on mg.wikt. (Jagwar also made bot edits from his main account, so the true number of bot-created entries is likely about 50,000 higher.)

In a blog post, Jagwar detailed the history of his bot and of mg.wikt. The bot began editing in 2010 at a rate of 50,000 edits per day, initially just importing foreign words from other Wiktionaries. After the wiki reached 200,000 pages in 2011, he wrote a script that "upload[ed] the word forms of that language", which propelled the Malagasy Wiktionary to become the third-largest Wiktionary. In 2012, Jagwar developed a more refined script. He uses natural language processing (NLP) and machine translation to generate new entries, with no human intervention or oversight. In the blog post, he wrote that the translation error rate was estimated at under 5%, though he had "no precise idea" of the actual figure.

There is no active editing community, and Jagwar is the sole active admin on the site. Jagwar himself has made only six edits in the last 90 days, and only three of them were in mainspace. The audit found numerous mistakes in the entries. In a random sample of 100 non-Malagasy entries, the auditor judged 49 "unusable", 29 "partially usable", and only 22 "fully correct and usable" (though even these may contain minor errors). On the Malagasy entries, the report noted:

There are 41,902 entries categorised as lacking any definition, most of which seem to be Malagasy entries, and around 30,000 of which are the result of the definitions being removed due to copyright violation many years ago. Although there are 1,150,182 Malagasy entries in total, most of these are inflected forms, which can generally be safely created by bots. These definitionless entries are not strictly speaking incorrect, but a definition is the most central function of a dictionary, so these entries fail to be a useful part of the dictionary as a whole.

The bots also made 218,156 edits at chr.wikt from 2012 to 2014 and 127,389 edits at ku.wikt from 2012 to 2013. The audit concluded that "Even an editing community of the size of the biggest Wiktionary, en.wikt, would not be able to clean up after these bots by hand". It strongly recommended deleting all non-Malagasy entries, removing translation sections, and telling the bot owners to stop creating entries automatically. It also weakly recommended deleting all definitionless entries. – adapted by Eddie891 from Large-scale errors at Malagasy Wiktionary, written by Metaknowledge, with help from Surjection, AryamanA, Erutuon, and Smashhoof, along with input from a fluent speaker of Malagasy who wishes to remain anonymous.

Inline parenthetical citations deprecated

A Request for Comment (RfC) to deprecate the inline parenthetical citation style was closed by Seraphimblade on 5 September as having reached consensus "that inline parenthetical referencing should be deprecated". The RfC, begun by CaptainEek on 5 August, drew a great deal of attention and discussion. A watchlist notice for the RfC was posted on 29 August, after a discussion determined that it was high-profile enough to warrant one.

In closing the discussion, Seraphimblade noted that roughly 71% of participants had supported the proposal. The consensus was only to deprecate "parenthetical style citations directly inlined into articles", not {{harv}}-style references placed inside <ref></ref> tags. The RfC means that the WP:PAREN and WP:CITEVAR guidelines need to be updated, though as of The Signpost's publication deadline, the form of that update was still under discussion. Before the RfC, CITEVAR specifically stated that "editors should not attempt to change an article's established citation style merely on the grounds of personal preference". It also cited a 2006 Arbitration Committee decision that "Wikipedia does not mandate styles in many different areas", including citation style. – E

More news

  • Total edits across all wikis reached a new monthly high of 54.8 million in August. Monthly edits on the English Wikipedia, however, are unlikely to return soon to their 2007 peak of 6.7 million. Although they have trended generally upward since 2014, the August figure was 5.4 million.
  • Total pageviews across all wikis in August were 21.2 billion, down from a recent high of 25.8 billion in April. The English Wikipedia recorded 9.5 billion pageviews in August, down from 11.3 billion in April.
  • Kiev moves to Kyiv. Wugapodes closed a long-running requested move (RM) discussion by moving the Kiev article to Kyiv, the Ukrainian name transliterated from Ukrainian Cyrillic. The traditional English name of Ukraine's capital, Kiev, resembles the city's Russian name. Most participants in the RM cited Wikipedia's common name policy. The discussion was nonetheless likely influenced by Russia's 2014 invasion of Ukraine, its annexation of Crimea, and the Russo-Ukrainian War that has been fought since then.

Brief notes

  • Milestones: The Tropical cyclone WikiProject celebrates its 15th birthday on October 5. Hurricanehink reminds us that we're "in the midst of another historically active season. The project has produced more than 1,000 good articles and 234 featured articles/lists, meaning 48% of the project's articles are rated 'good' or better."
  • New user-groups: The Affiliations Committee announced the approval of the newest Wikimedia movement affiliate, the Wikimedians of Western Armenian Language User Group.
  • New administrators: The Signpost welcomes the English Wikipedia's newest administrators, Ajpolino, Jackmcbarn, and LuK3. Five aspirants ran for adminship in a near-simultaneous mid-September flight, which was forecast in previous Signpost coverage.
  • The new requirements for custom user signatures took effect on 6 July 2020. If you try to create a custom signature that does not meet them, you will get an error message.
  • Registration for editing? The Portuguese Wikipedia is currently voting on a proposal to require registration to edit. Voting opened on September 4 and will close on October 4. The vote currently leans towards requiring registration to create and edit articles, while still allowing IP editing in the help and talk namespaces. Between 2008 and 2014, editors had to complete a CAPTCHA to save edits.
  • The Coolest Tool Award is entering its second year and is accepting nominations until the October 14 deadline. JHernandez (WMF) says that "Tools play an essential role for the Wikimedia projects, and so do the many volunteer developers who experiment with new ideas and develop and maintain local and global solutions to support the Wikimedia communities. The Coolest Tool Award aims to recognize and celebrate the coolest tools in a variety of categories." The winning tools and developers will be named in November.
  • Russian Wikinews has begun running English-language news stories. More details are available in an English-language discussion thread.