Category Archives: Full Research Papers

What We Talk About When We Talk About Wikidata Quality: A Literature Survey

Title: What We Talk About When We Talk About Wikidata Quality: A Literature Survey

Authors: Alessandro Piscopo (University of Southampton), Elena Simperl (University of Southampton)

Abstract: Launched in 2012, Wikidata has already become a success story. It is a collaborative knowledge graph whose large community has so far produced data about more than 55 million entities. Understanding the quality of the data in Wikidata is key to its widespread adoption and future development. No study has yet investigated to what extent, and which aspects of, this topic have been addressed. To fill this gap, we surveyed prior literature about data quality in Wikidata. Our analysis includes 28 papers, which we categorise by the quality dimensions they address. We show that a number of quality dimensions have not yet been adequately covered, e.g. accuracy and trustworthiness. Future work should focus on these.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

When Humans and Machines Collaborate: Cross-lingual Label Editing in Wikidata

Title: When Humans and Machines Collaborate: Cross-lingual Label Editing in Wikidata

Authors: Lucie-Aimée Kaffee (University of Southampton and TIB Leibniz Information Centre for Science and Technology), Kemele M Endris (TIB Leibniz Information Centre for Science and Technology), Elena Simperl (University of Southampton)

Abstract: The quality and maintainability of a knowledge graph are determined by the process in which it is created. There are different approaches to such processes: extraction or conversion of data available on the web (automated extraction of knowledge, such as DBpedia from Wikipedia), community-created knowledge graphs, often built by a group of experts, and hybrid approaches where humans maintain the knowledge graph alongside bots. In this work, we focus on the hybrid approach of human-edited knowledge graphs supported by automated tools. In particular, we analyse the editing of natural language data, i.e. labels. Labels are the entry point for humans to understand the information, and therefore need to be carefully maintained. We take a step toward understanding the collaborative editing of humans and automated tools across languages in a knowledge graph. We use Wikidata, as it has a large and active community of humans and bots working together, covering over 300 languages. In this work, we analyse the different editor groups and how they interact with the different language data to understand the provenance of the current label data.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Approving Automation: Analyzing Requests for Permissions of Bots in Wikidata

Title: Approving Automation: Analyzing Requests for Permissions of Bots in Wikidata

Authors: Mariam Farda-Sarbas (Freie Universität Berlin), Hong Zhu (Freie Universität Berlin), Marisa Nest (Freie Universität Berlin), Claudia Müller-Birn (Freie Universität Berlin)

Abstract: Wikidata, initially developed to serve as a central structured knowledge base for Wikipedia, is now a melting pot of structured data for companies, research projects, and other peer production communities. Wikidata’s community consists of humans and bots, and most edits in Wikidata come from these bots. Prior research has raised concerns regarding the challenges for editors in ensuring the quality of bot-generated data, such as the lack of quality control and knowledge diversity. In this research work, we provide one way of tackling these challenges by taking a closer look at the approval process for bot activity on Wikidata. We collected all bot requests, i.e. requests for permissions (RfP), from October 2012 to July 2018. We analyzed these 683 bot requests by classifying them regarding activity focus, activity type, and source mentioned. Our results show that the majority of task requests deal with data additions to Wikidata from internal sources, especially from Wikipedia. However, we can also show the diversity of external sources used so far. Furthermore, we examined the reasons which caused the unsuccessful closing of RfPs. In some cases, the Wikidata community is reluctant to approve specific bots, even if they are urgently needed, because there is still no agreement in the community regarding the technical implementation. This study can serve as a foundation for studies that connect the approved tasks with the editing behavior of bots on Wikidata, to better understand the role of bots in quality control and knowledge diversity.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Dwelling on Wikipedia: Investigating Time Spent by Global Encyclopedia Readers

Title: Dwelling on Wikipedia: Investigating Time Spent by Global Encyclopedia Readers

Authors: Nathan TeBlunthuis (Wikimedia Foundation), Tilman Bayer (Wikimedia Foundation), Olga Vasileva (Wikimedia Foundation)

Abstract: Much existing knowledge about global consumption of peer-produced information goods is supported by data on Wikipedia page view counts and surveys. In 2017, the Wikimedia Foundation began measuring the time readers spend on a given page view (dwell time), enabling a more detailed understanding of such reading patterns. In this paper, we validate and model this new data source and, building on existing findings, use regression analysis to test hypotheses about how patterns in reading time vary between global contexts. Consistent with prior findings from self-report data, our complementary analysis of behavioral data provides evidence that Global South readers are more likely to use Wikipedia to gain in-depth understanding of a topic. We find that Global South readers spend more time per page view and that this difference is amplified on desktop devices, which are thought to be better suited for in-depth information seeking tasks.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Article Quality Classification on Wikipedia: Introducing Document Embeddings and Content Features

Title: Article Quality Classification on Wikipedia: Introducing Document Embeddings and Content Features

Authors: Manuel Schmidt (University of Innsbruck), Eva Zangerle (University of Innsbruck)

Abstract: The quality of articles on the Wikipedia platform is vital for its success. Currently, the assessment of quality is performed manually by the Wikipedia community, where editors classify articles into pre-defined quality classes. However, this approach is hardly scalable, and hence approaches for automatic classification have been investigated. In this paper, we build on this previous line of research on article quality classification by extending the set of features with novel content and edit features (e.g., document embeddings of articles). We propose a classification approach utilizing gradient boosted trees based on this novel, extended set of features extracted from Wikipedia articles. Based on an established dataset containing Wikipedia articles and quality classes, we show that our approach is able to substantially outperform previous approaches (including recent deep learning methods). Furthermore, we shed light on the contribution of individual features and show that the proposed features indeed capture the quality of an article well.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Do You Have a Source for That? Understanding the Challenges of Collaborative Evidence-based Journalism

Title: Do You Have a Source for That? Understanding the Challenges of Collaborative Evidence-based Journalism

Authors: Sheila O’Riordan (University College Cork), Gaye Kiely (University College Cork), Bill Emerson (University College Cork), Joseph Feller (University College Cork)

Abstract: WikiTribune is a pilot news service, where evidence-based articles are co-created by professional journalists and a community of volunteers using an open and collaborative digital platform. The WikiTribune project is set within an evolving and dynamic media landscape, operating under principles of openness and transparency. It combines a commercial for-profit business model with an open collaborative mode of production with contributions from both paid professionals and unpaid volunteers. This descriptive case study captures the first 12-months of WikiTribune’s operations to understand the challenges and opportunities within this hybrid model of production. We use the rich literature on Wikipedia to understand the WikiTribune case and to identify areas of convergence and divergence, as well as avenues for future research. Data was collected on news articles with a focus on the time it takes for an article to reach published status, the number and type of contributors typically involved, article activity and engagement levels, and the types of topics covered.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Visualization of the Evolution of Collaboration and Communication Networks in Wikis

Title: Visualization of the Evolution of Collaboration and Communication Networks in Wikis

Authors: Youssef El Faqir (Universidad Complutense de Madrid), Javier Arroyo (Universidad Complutense de Madrid), Abel Serrano (Universidad Complutense de Madrid)

Abstract: Commons-based peer production communities can be analyzed with the help of social network analysis. However, since they are fluid organizations that change over time, the time dimension needs to be taken into account.

In this work we present a web application, WikiChron networks, to facilitate the study of the evolution of wiki communities over time. The tool displays three different community networks depending on the pages considered for the interactions: articles, talk pages of articles, or talk pages of users. These three networks offer complementary views of the same community, while the time dimension makes it possible to observe how the network structures change over time and how the network roles of some editors evolve. We illustrate the usefulness of our tool by analyzing the evolution of a wiki community at different moments and showing network structures that can be seen in other wiki communities.

WikiChron networks is open source and is publicly available. We hope that it will stimulate research on the evolution of collaboration and communication in wiki communities.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Reducing Procrastination While Improving Performance: A Wiki-powered Experiment With Students

Title: Reducing Procrastination While Improving Performance: A Wiki-powered Experiment With Students

Authors: Antonio Balderas (University of Cadiz), Andrea Capiluppi (Brunel University), Manuel Palomo-Duarte (University of Cadiz), Alessio Malizia (University of Hertfordshire), Juan Manuel Dodero (University of Cadiz)

Abstract: Students in higher education are traditionally requested to produce various pieces of written work during the courses they undertake. When students’ work is submitted online as a whole, both the ethically questionable act of procrastinating and late submissions affect performance. The objective of this paper is to compare the performance of students in a control group with that of students in an experimental group. The control group produced their work as a single deliverable submitted at the end of the course. The experimental group, on the other hand, worked on each part for a week, with their work managed in a wiki environment and monitored by specifically developed software. Positive effects were noticed in the experimental group, as both students’ time management skills and performance improved. Replications of this experiment can and should be performed in order to compare results in coursework submission.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Bringing Open Data into Danish Schools and its Potential Impact on School Pupils

Title: Bringing Open Data into Danish Schools and its Potential Impact on School Pupils

Authors: Mubashrah Saddiqa (Aalborg University), Lise Rasmussen (Aalborg University), Rikke Magnussen (Aalborg University), Birger Larsen (Aalborg University), Jens Myrup Pedersen (Aalborg University)

Abstract: Private and public institutions are using open and public data to provide better services, which increases the impact of open data on daily life. As technology advances, it also becomes important to equip the younger generation with the essential skills for future challenges. To bring up a generation equipped with 21st-century skills, open data could facilitate educational processes at school level as an educational resource. Open data can act as a key resource for enhancing the understanding of data, critical thinking, and ethical awareness among young people and school pupils. To bring open data into schools, it is important to know teachers’ perspectives on open data literacy and its possible impact on pupils. As a research contribution, we answered these questions through a survey of Danish public school teachers, in which we interviewed 10 teachers of grades 5-7 and analyzed their views about the impact of open data on pupils’ learning development. After analyzing the City of Copenhagen’s open data, we identified four open data educational themes that could support different subjects, e.g. geography, mathematics, basic science and social science. The survey comprised interviews, open discussions, questionnaires, and an experiment with grade 7 pupils in which we tested the pupils’ understanding of open data. We conclude that open data can not only empower pupils to understand real facts about their local areas, improve civic awareness, and develop digital and data skills, but also enable them to come up with ideas to improve their communities.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Open Data Policy Development: How Can Municipalities Take Account of Residents’ Perspectives?

Title: Open Data Policy Development: How Can Municipalities Take Account of Residents’ Perspectives?

Authors: Anneke Zuiderwijk (Delft University of Technology), Martine Romer (Delft University of Technology), Maarten Kroesen (Delft University of Technology)

Abstract: In many countries, governments encourage municipalities to develop open data policies and subsequently open up data. Municipal open data policies are often supply-driven and not based on residents’ wishes. Municipalities lack insight into residents’ perspectives on opening up municipal data and often do not know how to take them into account when developing their open data policies. This paper aims to reveal residents’ perspectives on municipal open data policies and to provide recommendations for municipalities on how to account for them when developing future open data policies. Using Q-methodology and applying it to the municipality of Delft in the Netherlands, we elicited the perspectives of four main groups of residents on the development of the municipal open data policy: 1) ‘the oblivious residents’, 2) ‘the distrustful residents’, 3) ‘the trusting, passive residents’, and 4) ‘the open data advocates’. We found that all residents considered transparency important for the quality of public administration, and that municipal transparency is currently lacking. We then provide recommendations for policy makers responsible for municipal open data policies and suggest directions for open data theory development concerning municipal open data policy.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.