Approving Automation: Analyzing Requests for Permissions of Bots in Wikidata

Title: Approving Automation: Analyzing Requests for Permissions of Bots in Wikidata

Authors: Mariam Farda-Sarbas (Freie Universität Berlin), Hong Zhu (Freie Universität Berlin), Marisa Nest (Freie Universität Berlin), Claudia Müller-Birn (Freie Universität Berlin)

Abstract: Wikidata, initially developed to serve as a central structured knowledge base for Wikipedia, is now a melting pot for structured data for companies, research projects and other peer production communities. Wikidata’s community consists of humans and bots, and most edits in Wikidata come from these bots. Prior research has raised concerns regarding the challenges for editors to ensure the quality of bot-generated data, such as the lack of quality control and knowledge diversity. In this research work, we provide one way of tackling these challenges by taking a closer look at the approval process of bot activity on Wikidata. We collected all bot requests, i.e., requests for permissions (RfP), from October 2012 to July 2018. We analyzed these 683 bot requests by classifying them regarding activity focus, activity type, and source mentioned. Our results show that the majority of task requests deal with data additions to Wikidata from internal sources, especially from Wikipedia. However, we can also show the existing diversity of external sources used so far. Furthermore, we examined the reasons that caused the unsuccessful closing of RfPs. In some cases, the Wikidata community is reluctant to implement specific bots, even if they are urgently needed, because there is still no agreement in the community regarding the technical implementation. This study can serve as a foundation for studies that connect the approved tasks with the editing behavior of bots on Wikidata to better understand the role of bots in quality control and knowledge diversity.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Dwelling on Wikipedia: Investigating Time Spent by Global Encyclopedia Readers

Title: Dwelling on Wikipedia: Investigating Time Spent by Global Encyclopedia Readers

Authors: Nathan TeBlunthuis (Wikimedia Foundation), Tilman Bayer (Wikimedia Foundation), Olga Vasileva (Wikimedia Foundation)

Abstract: Much existing knowledge about global consumption of peer-produced information goods is supported by data on Wikipedia page view counts and surveys. In 2017, the Wikimedia Foundation began measuring the time readers spend on a given page view (dwell time), enabling a more detailed understanding of such reading patterns. In this paper, we validate and model this new data source and, building on existing findings, use regression analysis to test hypotheses about how patterns in reading time vary between global contexts. Consistent with prior findings from self-report data, our complementary analysis of behavioral data provides evidence that Global South readers are more likely to use Wikipedia to gain in-depth understanding of a topic. We find that Global South readers spend more time per page view and that this difference is amplified on desktop devices, which are thought to be better suited for in-depth information seeking tasks.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Article Quality Classification on Wikipedia: Introducing Document Embeddings and Content Features

Title: Article Quality Classification on Wikipedia: Introducing Document Embeddings and Content Features

Authors: Manuel Schmidt (University of Innsbruck), Eva Zangerle (University of Innsbruck)

Abstract: The quality of articles on the Wikipedia platform is vital for its success. Currently, the assessment of quality is performed manually by the Wikipedia community, where editors classify articles into pre-defined quality classes. However, this approach scales poorly, and hence approaches for automatic classification have been investigated. In this paper, we build on this previous line of research on article quality classification by extending the set of features with novel content and edit features (e.g., document embeddings of articles). We propose a classification approach utilizing gradient boosted trees based on this novel, extended set of features extracted from Wikipedia articles. Based on an established dataset containing Wikipedia articles and quality classes, we show that our approach is able to substantially outperform previous approaches (also including recent deep learning methods). Furthermore, we shed light on the contribution of individual features and show that the proposed features indeed capture the quality of an article well.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Do You Have a Source for That? Understanding the Challenges of Collaborative Evidence-based Journalism

Title: Do You Have a Source for That? Understanding the Challenges of Collaborative Evidence-based Journalism

Authors: Sheila O’Riordan (University College Cork), Gaye Kiely (University College Cork), Bill Emerson (University College Cork), Joseph Feller (University College Cork)

Abstract: WikiTribune is a pilot news service, where evidence-based articles are co-created by professional journalists and a community of volunteers using an open and collaborative digital platform. The WikiTribune project is set within an evolving and dynamic media landscape, operating under principles of openness and transparency. It combines a commercial for-profit business model with an open collaborative mode of production with contributions from both paid professionals and unpaid volunteers. This descriptive case study captures the first 12 months of WikiTribune’s operations to understand the challenges and opportunities within this hybrid model of production. We use the rich literature on Wikipedia to understand the WikiTribune case and to identify areas of convergence and divergence, as well as avenues for future research. Data was collected on news articles with a focus on the time it takes for an article to reach published status, the number and type of contributors typically involved, article activity and engagement levels, and the types of topics covered.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Visualization of the Evolution of Collaboration and Communication Networks in Wikis

Title: Visualization of the Evolution of Collaboration and Communication Networks in Wikis

Authors: Youssef El Faqir (Universidad Complutense de Madrid), Javier Arroyo (Universidad Complutense de Madrid), Abel Serrano (Universidad Complutense de Madrid)

Abstract: Commons-based peer production communities can be analyzed with the help of social network analysis. However, since they are fluid organizations that change over time, the time dimension needs to be taken into account.

In this work we present a web application, WikiChron networks, to facilitate the study of the evolution of wiki communities over time. The tool displays three different community networks depending on the pages considered for the interactions: articles, talk pages of articles or talk pages of users. Considering these three networks offers complementary views of the same community, while the time dimension makes it possible to observe how the network structure changes over time and how the network roles of some editors change. We illustrate the usefulness of our tool by analyzing the evolution of a wiki community at different moments and showing network structures that can be seen in other wiki communities.

WikiChron networks is open source and is publicly available. We hope that it will stimulate research on the evolution of collaboration and communication in wiki communities.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Reducing Procrastination While Improving Performance: A Wiki-powered Experiment With Students

Title: Reducing Procrastination While Improving Performance: A Wiki-powered Experiment With Students

Authors: Antonio Balderas (University of Cadiz), Andrea Capiluppi (Brunel University), Manuel Palomo-Duarte (University of Cadiz), Alessio Malizia (University of Hertfordshire), Juan Manuel Dodero (University of Cadiz)

Abstract: Students in higher education are traditionally requested to produce various pieces of written work during the courses they undertake. When students’ work is submitted online as a whole, both the ethically questionable act of procrastinating and late submissions affect performance. The objective of this paper is to compare the performance of students from a control group with that of students from an experimental group. The control group produced work as a single deliverable to be submitted at the end of the course. On the other hand, the experimental group worked on each part for a week, and their work was managed by a wiki environment and monitored by specifically developed software. Positive effects were noticed in the experimental group, as both students’ time management skills and performance increased. Replications of this experiment can and should be performed, in order to compare results in coursework submission.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Bringing Open Data into Danish Schools and its Potential Impact on School Pupils

Title: Bringing Open Data into Danish Schools and its Potential Impact on School Pupils

Authors: Mubashrah Saddiqa (Aalborg University), Lise Rasmussen (Aalborg University), Rikke Magnussen (Aalborg University), Birger Larsen (Aalborg University), Jens Myrup Pedersen (Aalborg University)

Abstract: Private and public institutions are using open and public data to provide better services, which increases the impact of open data on daily life. With the advancement of technology, it also becomes important to equip our younger generation with the essential skills for future challenges. In order to bring up a generation equipped with 21st century skills, open data could facilitate educational processes at school level as an educational resource. Open data could act as a key resource to enhance the understanding of data through critical thinking and ethical vision among the youth and school pupils. To bring open data into schools, it is important to know the teachers’ perspective on open data literacy and its possible impact on pupils. As a research contribution, we answered these questions through a survey of Danish public school teachers, in which we interviewed 10 teachers of grades 5 to 7 and analyzed their views about the impact of open data on pupils’ learning development. After analyzing Copenhagen city’s open data, we identified four open data educational themes that could facilitate different subjects, e.g. geography, mathematics, basic science and social science. The survey includes interviews, open discussions, questionnaires and an experiment with grade 7 pupils, where we test the pupils’ understanding with open data. The survey concluded that open data can not only empower pupils to understand real facts about their local areas, improve civic awareness and develop digital and data skills, but also enable them to come up with ideas to improve their communities.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Open Data Policy Development: How Can Municipalities Take Account of Residents’ Perspectives?

Title: Open Data Policy Development: How Can Municipalities Take Account of Residents’ Perspectives?

Authors: Anneke Zuiderwijk (Delft University of Technology), Martine Romer (Delft University of Technology), Maarten Kroesen (Delft University of Technology)

Abstract: In many countries, governments encourage municipalities to develop open data policies and subsequently open up data. Municipal open data policies are often supply-driven and not based on residents’ wishes. Municipalities lack insight into residents’ perspectives on opening up municipal data and often do not know how to take them into account when developing their open data policies. This paper aims to reveal residents’ perspectives on municipal open data policies and provide recommendations for municipalities on how to account for them when developing future open data policies. Using Q-methodology and applying it to the municipality of Delft in the Netherlands, we elicited the perspective of four main groups of residents on the development of the municipal open data policy as follows: 1) ‘the oblivious residents’, 2) ‘the distrustful residents’, 3) ‘the trusting, passive residents’, and 4) ‘the open data advocates’. We found that all residents considered transparency important for the quality of public administration, and that municipal transparency is currently lacking. We then provide recommendations for policy makers responsible for municipal open data policies and suggest directions for open data theory development concerning municipal open data policy.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Using Context Based MicroTraining to Develop OER for the Benefit of All

Title: Using Context Based MicroTraining to Develop OER for the Benefit of All

Authors: Joakim Kävrestad (University of Skövde), Marcus Nohlberg (University of Skövde)

Abstract: This paper demonstrates how Context Based MicroTraining (CBMT) can be used to develop open educational resources in a way that benefits students enrolled in university courses as well as anyone who wants to participate in open-learning activities. CBMT is a framework that provides guidelines for how educational resources should be structured. CBMT stipulates that information should be presented in short sequences and be relevant to the learner’s current situation. In this paper, CBMT is implemented in a practical ICT course using video lectures that are delivered as open educational resources using YouTube. The experiences of enrolled students and YouTube users are evaluated, along with the actual results of the enrolled students. The results of the study suggest that users of the video lectures appreciate the learning approach. The actual results, i.e. learning outcomes, of the enrolled students are maintained. The study also demonstrates how using CBMT as open educational resources can free up time for teachers and increase the quality of teaching by benefitting from community feedback.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Analyzing Rich-Club Behavior in Open Source Projects

Title: Analyzing Rich-Club Behavior in Open Source Projects

Authors: Mattia Gasparini (Politecnico di Milano), Javier Luis Cánovas Izquierdo (Universitat Oberta de Catalunya), Robert Clarisó (Universitat Oberta de Catalunya), Marco Brambilla (Politecnico di Milano), Jordi Cabot (ICREA-UOC)

Abstract: The network of collaborations in an open source project can reveal relevant emergent properties that influence its prospects of success. In this work, we analyze open source projects to determine whether they exhibit rich-club behavior, i.e., a phenomenon where contributors with a high number of collaborations (strongly connected within the collaboration network) are likely to cooperate with other well-connected individuals. The presence or absence of a rich club has an impact on the sustainability and robustness of the project. For this analysis, we build and study a dataset with the 100 most popular projects in GitHub, exploiting connectivity patterns in the graph structure of collaborations that arise from commits, issues and pull requests. Results show that rich-club behavior is present in all the projects, but only a few of them have an evident club structure. We compute coefficients both for single source graphs and the overall interaction graph, showing that rich-club behavior varies across different layers of software development. We provide possible explanations of our results, as well as implications for further analysis.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.