Sentiment Analysis of Open Source Software Community Mailing List: A Preliminary Analysis

Title: Sentiment Analysis of Open Source Software Community Mailing List: A Preliminary Analysis

Authors: Jumoke Abass Alesinloye (National University of Ireland, Galway), Eoin Groarke (National University of Ireland, Galway), Jaganath Babu (National University of Ireland, Galway), Subathra Srinivasan (National University of Ireland, Galway), Greg Curran (Intel Shannon), Denis Dennehy (National University of Ireland, Galway)

Abstract: Open source software has become increasingly popular with companies looking to create business value through collaboration with distributed communities of organizations and software developers who rely on mailing lists to review code and share their feedback. This preliminary study reports on the sentiment analysis of the Data Plane Development Kit (DPDK.org) mailing list to identify and interpret patterns of sentiment during a release cycle in 2018.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Sentiment Analysis of Open Source Communities: An Exploratory Study

Title: Sentiment Analysis of Open Source Communities: An Exploratory Study

Authors: Jennifer Ferreira (National University of Ireland, Galway), Michael Glynn (Intel Shannon), David Hunt (Intel Shannon), Jaganath Babu (National University of Ireland, Galway), Denis Dennehy (National University of Ireland, Galway), Kieran Conboy (National University of Ireland, Galway)

Abstract: Open Source Software (OSS) mailing lists have become popular targets for mining sentiment and emotions, as they provide a centralized communication hub for the distributed OSS community. Sentiment and emotions within communities can provide insights into how a community responds to certain events, who the key members are, and how their behaviours impact the rest of the community. Such insights can inform initiatives aimed at fostering positive interactions between OSS community members, strengthening social ties, and helping the community accomplish its tasks. This poster presents our initial results from sentiment analysis of an OSS mailing list, and answers two key questions: (1) Given that the mailing list is used for peer-review of code, is the community sentiment negative overall? (2) Is community sentiment related to the month of the release cycle?

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Open Data Collaborations – A Snapshot of an Emerging Practice

Title: Open Data Collaborations – A Snapshot of an Emerging Practice

Authors: Thomas Olsson (RISE Research Institutes of Sweden), Per Runeson (Lund University)

Abstract: Data-defined software is becoming more and more prevalent, especially with the advent of machine learning and artificial intelligence. With data-defined systems come both challenges – to continue to collect and maintain quality data – and opportunities – open innovation by sharing with others. We propose Open Data Collaboration (ODC) to describe pecuniary and non-pecuniary sharing of open data, similar to Open Source Software. To understand challenges and opportunities with ODC, we ran focus groups with 22 companies and organizations. We observed an interest in the subject, but we conclude that the overall maturity is low and ODC is rare.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

The Classification and Potential of Business Archetypes by Using Open Data

Title: The Classification and Potential of Business Archetypes by Using Open Data

Authors: Run Duan (Guangdong University of Technology), Tetsuo Noda (Shimane University)

Abstract: Public data collected or possessed by administrative agencies and subsequently released as Open Data is expected to bring about positive economic effects. The purpose of this paper is to summarize the business archetypes of using Open Data to establish whether this expectation holds true, and to classify Open Data business archetypes into 7 types to predict their commercial potential.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

What We Talk About When We Talk About Wikidata Quality: A Literature Survey

Title: What We Talk About When We Talk About Wikidata Quality: A Literature Survey

Authors: Alessandro Piscopo (University of Southampton), Elena Simperl (University of Southampton)

Abstract: Launched in 2012, Wikidata has already become a success story. It is a collaborative knowledge graph whose large community has so far produced data about more than 55 million entities. Understanding the quality of the data in Wikidata is key to its widespread adoption and future development. No study has so far investigated to what extent this topic has been addressed, or which of its aspects. To fill this gap, we surveyed prior literature about data quality in Wikidata. Our analysis includes 28 papers, which we categorise by the quality dimensions they address. We show that a number of quality dimensions, e.g. accuracy and trustworthiness, have not yet been adequately covered. Future work should focus on these.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

When Humans and Machines Collaborate: Cross-lingual Label Editing in Wikidata

Title: When Humans and Machines Collaborate: Cross-lingual Label Editing in Wikidata

Authors: Lucie-Aimée Kaffee (University of Southampton and TIB Leibniz Information Centre for Science and Technology), Kemele M. Endris (TIB Leibniz Information Centre for Science and Technology), Elena Simperl (University of Southampton)

Abstract: The quality and maintainability of a knowledge graph are determined by the process in which it is created. There are different approaches to such processes: extraction or conversion of available data on the web (automated extraction of knowledge, such as DBpedia from Wikipedia), community-created knowledge graphs, often by a group of experts, and hybrid approaches where humans maintain the knowledge graph alongside bots. We focus in this work on the hybrid approach of human-edited knowledge graphs supported by automated tools. In particular, we analyse the editing of natural language data, i.e. labels. Labels are the entry point for humans to understand the information, and therefore need to be carefully maintained. We take a step toward understanding the collaborative editing by humans and automated tools across languages in a knowledge graph. We use Wikidata as it has a large and active community of humans and bots working together, covering over 300 languages. In this work, we analyse the different editor groups and how they interact with the different language data to understand the provenance of the current label data.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Approving Automation: Analyzing Requests for Permissions of Bots in Wikidata

Title: Approving Automation: Analyzing Requests for Permissions of Bots in Wikidata

Authors: Mariam Farda-Sarbas (Freie Universität Berlin), Hong Zhu (Freie Universität Berlin), Marisa Nest (Freie Universität Berlin), Claudia Müller-Birn (Freie Universität Berlin)

Abstract: Wikidata, initially developed to serve as a central structured knowledge base for Wikipedia, is now a melting pot for structured data for companies, research projects and other peer production communities. Wikidata’s community consists of humans and bots, and most edits in Wikidata come from these bots. Prior research has raised concerns regarding the challenges for editors to ensure the quality of bot-generated data, such as the lack of quality control and knowledge diversity. In this research work, we provide one way of tackling these challenges by taking a closer look at the approval process of bot activity on Wikidata. We collected all bot requests, i.e. requests for permissions (RfP), from October 2012 to July 2018. We analyzed these 683 bot requests by classifying them regarding activity focus, activity type, and source mentioned. Our results show that the majority of task requests deal with data additions to Wikidata from internal sources, especially from Wikipedia. However, we can also show the existing diversity of external sources used so far. Furthermore, we examined the reasons which caused the unsuccessful closing of RfPs. In some cases, the Wikidata community is reluctant to implement specific bots, even if they are urgently needed, because there is still no agreement in the community regarding the technical implementation. This study can serve as a foundation for studies that connect the approved tasks with the editing behavior of bots on Wikidata to better understand the role of bots in quality control and knowledge diversity.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Dwelling on Wikipedia: Investigating Time Spent by Global Encyclopedia Readers

Title: Dwelling on Wikipedia: Investigating Time Spent by Global Encyclopedia Readers

Authors: Nathan TeBlunthuis (Wikimedia Foundation), Tilman Bayer (Wikimedia Foundation), Olga Vasileva (Wikimedia Foundation)

Abstract: Much existing knowledge about global consumption of peer-produced information goods is supported by data on Wikipedia page view counts and surveys. In 2017, the Wikimedia Foundation began measuring the time readers spend on a given page view (dwell time), enabling a more detailed understanding of such reading patterns. In this paper, we validate and model this new data source and, building on existing findings, use regression analysis to test hypotheses about how patterns in reading time vary between global contexts. Consistent with prior findings from self-report data, our complementary analysis of behavioral data provides evidence that Global South readers are more likely to use Wikipedia to gain in-depth understanding of a topic. We find that Global South readers spend more time per page view and that this difference is amplified on desktop devices, which are thought to be better suited for in-depth information seeking tasks.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Article Quality Classification on Wikipedia: Introducing Document Embeddings and Content Features

Title: Article Quality Classification on Wikipedia: Introducing Document Embeddings and Content Features

Authors: Manuel Schmidt (University of Innsbruck), Eva Zangerle (University of Innsbruck)

Abstract: The quality of articles on the Wikipedia platform is vital for its success. Currently, the assessment of quality is performed manually by the Wikipedia community, where editors classify articles into pre-defined quality classes. However, this approach is hardly scalable, and hence approaches for automatic classification have been investigated. In this paper, we build on this previous line of research on article quality classification by extending the set of features with novel content and edit features (e.g., document embeddings of articles). We propose a classification approach utilizing gradient boosted trees based on this novel, extended set of features extracted from Wikipedia articles. Based on an established dataset containing Wikipedia articles and quality classes, we show that our approach is able to substantially outperform previous approaches (also including recent deep learning methods). Furthermore, we shed light on the contribution of individual features and show that the proposed features indeed capture the quality of an article well.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Do You Have a Source for That? Understanding the Challenges of Collaborative Evidence-based Journalism

Title: Do You Have a Source for That? Understanding the Challenges of Collaborative Evidence-based Journalism

Authors: Sheila O’Riordan (University College Cork), Gaye Kiely (University College Cork), Bill Emerson (University College Cork), Joseph Feller (University College Cork)

Abstract: WikiTribune is a pilot news service, where evidence-based articles are co-created by professional journalists and a community of volunteers using an open and collaborative digital platform. The WikiTribune project is set within an evolving and dynamic media landscape, operating under principles of openness and transparency. It combines a commercial for-profit business model with an open collaborative mode of production with contributions from both paid professionals and unpaid volunteers. This descriptive case study captures the first 12 months of WikiTribune’s operations to understand the challenges and opportunities within this hybrid model of production. We use the rich literature on Wikipedia to understand the WikiTribune case and to identify areas of convergence and divergence, as well as avenues for future research. Data was collected on news articles with a focus on the time it takes for an article to reach published status, the number and type of contributors typically involved, article activity and engagement levels, and the types of topics covered.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.