Category Archives: Wikipedia Track

Monitoring the Gender Gap with Wikidata Human Gender Indicators

Title: Monitoring the Gender Gap with Wikidata Human Gender Indicators

Authors: Maximilian Klein (GroupLens Research), Harsh Gupta, Vivek Rai (Indian Institute of Technology, Kharagpur), Piotr Konieczny (Hanyang University) and Haiyi Zhu (GroupLens Research)

Abstract: The gender gap in Wikipedia’s content, specifically in the representation of women in biographies, is well-known but has been difficult to measure. Furthermore, the impacts of efforts to address this gender gap have received little attention. To investigate, we utilise Wikidata, the database that feeds Wikipedia, and introduce the “Wikidata Human Gender Indicators” (WHGI), a free and open source, longitudinal, biographical dataset monitoring gender disparities across time, space, culture, occupation and language. Through these lenses we show how the representation of women is changing along 11 dimensions. Validations of WHGI are presented against three exogenous datasets: the world’s historical population, “traditional” gender-disparity indices (GDI, GEI, GGGI and SIGI), and occupational gender according to the US Bureau of Labor Statistics. Furthermore, to demonstrate its general use in research, we revisit previously published findings on Wikipedia’s gender bias that can be strengthened by WHGI.

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

Mining team characteristics to predict Wikipedia article quality

Title: Mining team characteristics to predict Wikipedia article quality

Authors: Grace Gimon Betancourt, Armando Segnini, Carlos Trabuco, Amira Rezgui and Nicolas Jullien (Télécom Bretagne)

Abstract: In this study, we investigate which characteristics of virtual teams are good predictors of the quality of their production. The experiment involved obtaining the Spanish Wikipedia database dump and applying different data mining techniques suitable for large data sets to label the whole set of articles according to their quality (comparing them with the Featured/Good Articles, or FA/GA). We then created the attributes that describe the characteristics of the teams that produced the articles and, using decision tree methods, obtained the most relevant characteristics of the teams that produced FA/GA. The team’s maximum efficiency and the total length of contribution are the most important predictors. This article contributes to the literature on virtual team organization.

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

An Empirical Evaluation of Property Recommender Systems for Wikidata and Collaborative Knowledge Bases

Title: An Empirical Evaluation of Property Recommender Systems for Wikidata and Collaborative Knowledge Bases

Authors: Eva Zangerle, Wolfgang Gassler, Martin Pichl, Stefan Steinhauser, Günther Specht (University of Innsbruck)

Abstract: The Wikidata platform is a crowdsourced, structured knowledge base aiming to provide integrated, free and language-agnostic facts which are used by, among others, the Wikipedias. Users who actively enter, review and revise data on Wikidata are assisted by a property suggesting system which provides users with properties that might also be applicable to a given item. We argue that evaluating and subsequently improving this recommendation mechanism, and hence assisting users, can directly contribute to an even more integrated, consistent and extensive knowledge base serving a huge variety of applications. However, the quality and usefulness of such recommendations have not been evaluated yet. In this work, we provide the first evaluation of different approaches aiming to provide users with property recommendations in the process of curating information on Wikidata. We compare the approach currently facilitated on Wikidata with two state-of-the-art recommendation approaches stemming from the field of RDF recommender systems and collaborative information systems. Further, we also evaluate hybrid recommender systems combining these approaches. Our evaluations show that the current recommendation algorithm works well with regard to recall and precision, reaching a recall@7 of 79.71% and a precision@7 of 27.97%. We also find that generally, incorporating contextual as well as classifying information into the computation of property recommendations can further improve its performance significantly.
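For readers unfamiliar with the recall@7 and precision@7 figures quoted in the abstract, the following is a minimal Python sketch of how such top-k metrics are commonly computed for a single item. It is not the authors' evaluation code: the function names, the choice to divide precision by k, and the example property lists are assumptions made purely for illustration, and the paper's actual procedure (for example, how scores are averaged over items) may differ.

```python
from typing import Sequence, Set


def precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k recommendations that are relevant (divides by k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = len(set(recommended[:k]) & relevant)
    return hits / k


def recall_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of all relevant properties that appear in the top-k recommendations."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0  # nothing to recall; a convention chosen for this sketch
    hits = len(set(recommended[:k]) & relevant)
    return hits / len(relevant)


# Made-up example: a ranked list of suggested property IDs for one item,
# and the properties that were actually held out for that item.
recommended = ["P31", "P21", "P569", "P19", "P106", "P27", "P735", "P18"]
relevant = {"P21", "P569", "P27", "P570"}

p7 = precision_at_k(recommended, relevant, 7)  # 3 hits among 7 suggestions
r7 = recall_at_k(recommended, relevant, 7)     # 3 of 4 relevant properties found
print(f"precision@7 = {p7:.4f}, recall@7 = {r7:.4f}")
```

In this toy case three of the seven suggestions are relevant, so precision@7 is 3/7 and recall@7 is 3/4. A high recall paired with a much lower precision, as in the abstract, is typical when each item has only a few relevant properties relative to k.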

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

Evaluating and Improving Navigability of Wikipedia: A Comparative Study of Eight Language Editions

Title: Evaluating and Improving Navigability of Wikipedia: A Comparative Study of Eight Language Editions

Authors: Daniel Lamprecht (KTI, Graz University of Technology), Dimitar Dimitrov (GESIS – Leibniz Institute for the Social Sciences), Denis Helic (KTI, Graz University of Technology) and Markus Strohmaier (GESIS – Leibniz Institute for the Social Sciences and University of Koblenz-Landau)

Abstract: Wikipedia supports its users in reaching a wide variety of goals: looking up facts, researching a topic, making an edit or simply browsing to pass time. Some of these goals, such as the lookup of facts, can be effectively supported by search functions. However, for other use cases such as researching an unfamiliar topic, users need to rely on the links to connect articles. In this paper, we investigate the state of navigability in the article networks of eight language versions of Wikipedia. We find that, when taking all links of articles into account, all language versions enable mutual reachability for almost all articles. However, previous research has shown that visitors of Wikipedia focus most of their attention on the areas located close to the top. We therefore investigate different restricted navigational views that users could have when looking at articles. We find that restricting the view of articles strongly limits the navigability of the resulting networks and impedes navigation. Based on this analysis we then propose a link recommendation method to augment the link network to improve navigability in the network. Our approach selects links from a less restricted view of the article and proposes to move these links into more visible sections. The recommended links are therefore relevant for the article. Our results are relevant for researchers interested in the navigability of Wikipedia and open up new avenues for link recommendations in Wikipedia editing.
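The notion of mutual reachability under a restricted view can be illustrated with a small Python sketch. It is not the authors' method: the toy link lists, the modelling of a "restricted view" as the first top_n links of each article, and the use of the largest strongly connected component as a navigability measure are all assumptions made here for illustration.

```python
from collections import deque
from typing import Dict, List, Optional, Set

Links = Dict[str, List[str]]  # article -> outgoing links in page order


def restrict(links: Links, top_n: Optional[int]) -> Links:
    """Keep only the first top_n links of each article (None keeps all links)."""
    return {a: (out if top_n is None else out[:top_n]) for a, out in links.items()}


def reverse(links: Links) -> Links:
    """Build the graph with every link direction flipped."""
    rev: Links = {a: [] for a in links}
    for a, out in links.items():
        for b in out:
            rev.setdefault(b, []).append(a)
    return rev


def reachable(start: str, links: Links) -> Set[str]:
    """Breadth-first search: all articles reachable from start, including start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def largest_scc_fraction(links: Links) -> float:
    """Fraction of articles in the largest set of mutually reachable articles."""
    nodes = set(links) | {b for out in links.values() for b in out}
    if not nodes:
        return 0.0
    rev = reverse(links)
    unassigned = set(nodes)
    best = 0
    while unassigned:
        node = next(iter(unassigned))
        # Articles that node can reach AND that can reach node form its component.
        component = reachable(node, links) & reachable(node, rev)
        best = max(best, len(component))
        unassigned -= component
    return best / len(nodes)


# Toy network of four articles (hypothetical data).
toy = {"A": ["B", "D"], "B": ["A", "C"], "C": ["D", "A"], "D": ["C", "B"]}

full_view = largest_scc_fraction(restrict(toy, None))  # all links visible
top_view = largest_scc_fraction(restrict(toy, 1))      # only the first link visible
print(full_view, top_view)
```

With all links, every article in the toy network can reach every other. Keeping only the first link per article splits the network into two isolated pairs, which mirrors, on a very small scale, the abstract's finding that restricted views limit navigability. The forward/backward search used here is simple rather than fast; a real article network would call for a linear-time algorithm such as Tarjan's.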

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

Accept, Decline, Postpone: How Newcomer Productivity is Reduced in English Wikipedia by Pre-publication Review

Title: Accept, Decline, Postpone: How Newcomer Productivity is Reduced in English Wikipedia by Pre-publication Review

Authors: Jodi Schneider, Bluma S. Gelley, Aaron Halfaker

Abstract: Wikipedia needs to attract and retain newcomers while also increasing the quality of its content. Yet new Wikipedia users are disproportionately affected by the quality assurance mechanisms designed to thwart spammers and promoters. English Wikipedia’s Articles for Creation provides a protected space for drafting new articles, which are reviewed against minimum quality guidelines before they are published. In this study we explore how this drafting process has affected the productivity of newcomers in Wikipedia. Using a mixed qualitative and quantitative approach, we show how the process’s pre-publication review, which is intended to improve the success of newcomers, in fact decreases newcomer productivity in English Wikipedia and offer recommendations for system designers.

This contribution to OpenSym 2014 will be made available as part of the OpenSym 2014 proceedings on or after August 27, 2014.

WikiBrain: Democratizing Computation on Wikipedia

Title: WikiBrain: Democratizing Computation on Wikipedia

Authors: Shilad Sen, Matt Lesicko, Ari Weiland, Rebecca Gold, Yulun Li, Benjamin Hillmann, Toby Jia-Jun Li, and Brent Hecht

Abstract: Wikipedia is known for serving humans’ informational needs. Over the past decade, the encyclopedic knowledge encoded in Wikipedia has also powerfully served computer systems. Leading algorithms in artificial intelligence, natural language processing, data mining, geographic information science, and many other fields analyze the text and structure of articles to build computational models of the world. Many software packages extract knowledge from Wikipedia. However, existing tools either (1) provide Wikipedia data, but not well-known Wikipedia-based algorithms or (2) narrowly focus on one such algorithm. This paper presents the WikiBrain software framework, an extensible Java-based platform that democratizes access to a range of Wikipedia-based algorithms and technologies. WikiBrain provides simple access to the diverse Wikipedia data needed for semantic algorithms and technologies, ranging from page views to Wikidata. In a few lines of code, a developer can use WikiBrain to access Wikipedia data and state-of-the-art algorithms. WikiBrain also enables researchers to extend Wikipedia-based algorithms and evaluate their extensions. WikiBrain promotes a new vision of the Wikipedia software ecosystem: every researcher and developer should have access to state-of-the-art Wikipedia-based technologies.

This contribution to OpenSym 2014 will be made available as part of the OpenSym 2014 proceedings on or after August 27, 2014.

Consider the Redirect: A Missing Dimension of Wikipedia Research

Title: Consider the Redirect: A Missing Dimension of Wikipedia Research

Authors: Benjamin Mako Hill, Aaron Shaw

Abstract: Redirects are special pages in wikis that silently transport visitors to other pages. Although redirects make up a majority of all article pages in English Wikipedia, they have attracted very little attention and are rarely taken into account by researchers. This note describes redirects and illustrates why they play an important role in shaping activity in Wikipedia. We also present a novel longitudinal dataset of redirects for English Wikipedia and the software used to produce it. Using this dataset, we revisit several important published findings about Wikipedia to show that accounting for redirects can have important effects on research.

This contribution to OpenSym 2014 will be made available as part of the OpenSym 2014 proceedings on or after August 27, 2014.

Chinese-language Literature About Wikipedia: A Meta-Analysis of Academic Search Engine Result Pages

Title: Chinese-language Literature About Wikipedia: A Meta-Analysis of Academic Search Engine Result Pages

Authors: Han-Teng Liao, Bin Zhang

Abstract: This paper presents a webometric analysis of the academic search engine result pages (SERPs) of the Chinese-language term of “Wikipedia” across major Chinese-speaking regions of mainland China, Hong Kong and Taiwan. Because of the academic outcome, the findings can also be interpreted for further meta-analysis, or “research about research”, of the Wikipedia research in Chinese-language literatures. The findings cover the results from four major search platforms: CNKI Scholar, Google Scholar China, Google Scholar Hong Kong and Google Scholar Taiwan. Cross tabulation of the results shows the major institutions (journals and academic departments) and scholarly archives for Chinese-language Wikipedia research. The findings suggest that there exists a divide between mainland Chinese academic sources/search results on the one hand, and Hong Kong/Taiwanese ones on the other. Meta-analysis based on academic SERPs has implications for identifying the gaps and potentials in internationalization of Wikipedia research.

This contribution to OpenSym 2014 will be made available as part of the OpenSym 2014 proceedings on or after August 27, 2014.

Geographic and Linguistic Normalization: Towards a Better Understanding of the Geo-linguistic Dynamics of Knowledge

Title: Geographic and Linguistic Normalization: Towards a Better Understanding of the Geo-linguistic Dynamics of Knowledge

Authors: Han-Teng Liao, Thomas Petzold

Abstract: This paper proposes a method of geo-linguistic normalization to advance the existing comparative analysis of open collaborative communities, with multilingual Wikipedia projects as the example. Such normalization requires data regarding the potential users and/or resources of a geolinguistic unit.

This contribution to OpenSym 2014 will be made available as part of the OpenSym 2014 proceedings on or after August 27, 2014.

Contropedia – The Analysis and Visualization of Controversies in Wikipedia Articles

Title: Contropedia – The Analysis and Visualization of Controversies in Wikipedia Articles

Authors: Erik Borra, Esther Weltevrede, Paolo Ciuccarelli, Andreas Kaltenbrunner, David Laniado, Giovanni Magni, Michele Mauri, Richard Rogers, Tommaso Venturini

Abstract: Collaborative content creation inevitably reaches situations where different points of view lead to conflict. In Wikipedia, one of the most prominent examples of collaboration online, conflict is mediated by both policy and software, and conflicts often reflect larger societal debates. Contropedia is a platform for the analysis and visualization of such controversies in Wikipedia. Controversy metrics are extracted from activity streams generated by edits to, and discussions about, individual articles and groups of related articles. An article’s revision history and its corresponding discussion pages constitute two parallel streams of user interactions that, taken together, fully describe the process of the collaborative creation of an article. Our proposed platform, Contropedia, builds on state-of-the-art techniques and extends current metrics for the analysis of both edit and discussion activity, and visualizes these both as a layer on top of Wikipedia articles and as a dashboard view presenting additional analytics. Furthermore, the combination of these two approaches allows for a deeper understanding of the substance, composition, actor alignment, trajectory and liveliness of controversies on Wikipedia. Our research aims to provide a better understanding of sociotechnical phenomena that take place on the web and to equip citizens with tools to fully deploy the complexity of controversies. Contropedia is useful for the general public as well as user groups with specific interests such as scientists, students, data journalists, decision makers and media communicators.

This contribution to OpenSym 2014 will be made available as part of the OpenSym 2014 proceedings on or after August 27, 2014.