Partial Automation for Human Tasks in a Collaborative System: The Case of Deletion in Wikipedia

This presentation is part of the WikiSym + OpenSym 2013 program.

Bluma S. Gelley, Torsten Suel

Wikipedia’s low barriers to participation have the unintended effect of attracting a large amount of inappropriate content. One form of inappropriate content is articles whose topics do not meet Wikipedia’s inclusion standards. Deleting these articles consumes a large amount of time and effort that could be better spent improving Wikipedia’s quality. We propose to partially automate the task of detecting unencyclopedic pages using machine learning. We examine three main deletion methods in Wikipedia and collect a previously inaccessible dataset of articles deleted using each method. We use the data to train classifiers to detect articles that should be deleted. We report precision of 0.986 and recall of 0.975 in the best case, and high precision with lower, but still useful, recall in the most difficult case. Our results show that it is possible to use an automated software system to assist humans in finding articles for deletion.
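For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how precision and recall are computed for a binary "should be deleted" classifier. The function name and the toy labels are illustrative assumptions, not taken from the paper, and the numbers do not correspond to the paper's data.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for boolean labels.

    predicted, actual: equal-length lists of booleans, where True means
    "flagged for deletion" (predicted) or "was deleted" (actual).
    Hypothetical helper for illustration only.
    """
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if a and not p)

    # Precision: of the articles flagged, what fraction were truly deletable?
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    # Recall: of the truly deletable articles, what fraction were flagged?
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Toy labels (made up for illustration): 2 true positives,
# 1 false positive, 1 false negative.
example_predicted = [True, True, True, False, False]
example_actual = [True, True, False, True, False]
example_precision, example_recall = precision_recall(example_predicted, example_actual)
```

High precision matters in this setting because a falsely flagged article costs human reviewers time, while recall measures how much of the deletion workload the system could actually surface.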

A PDF file will be made available on August 5, 2013, through the WikiSym + OpenSym 2013 conference proceedings.
