OpenSym 2020 will take place in Madrid in August 2020. Given the travel uncertainties around the current coronavirus crisis, we want to reassure everyone that any paper resulting from a submission to OpenSym is safe: we guarantee publication of the proceedings, including all accepted papers, in the ACM Digital Library. Our strategy is straightforward: if travel is not possible, we will let you participate online. If online participation is not possible, we will simply create the proceedings and provide accepted authors with their publication. So nothing should stop you from submitting to OpenSym 2020!
Title: Sentiment Analysis of Open Source Software Community Mailing List: A Preliminary Analysis
Authors: Jumoke Abass Alesinloye (National University of Ireland, Galway), Eoin Groarke (National University of Ireland, Galway), Jaganath Babu (National University of Ireland, Galway), Subathra Srinivasan (National University of Ireland, Galway), Greg Curran (Intel Shannon), Denis Dennehy (National University of Ireland, Galway)
Abstract: Open source software has become increasingly popular with companies looking to create business value through collaboration with distributed communities of organizations and software developers who rely on mailing lists to review code and share their feedback. This preliminary study reports on the sentiment analysis of the Data Plane Development Kit (DPDK.org) mailing list to identify and interpret patterns of sentiment during a release-cycle in 2018.
Title: Sentiment Analysis of Open Source Communities: An Exploratory Study
Authors: Jennifer Ferreira (National University of Ireland, Galway), Michael Glynn (Intel Shannon), David Hunt (Intel Shannon), Jaganath Babu (National University of Ireland, Galway), Denis Dennehy (National University of Ireland, Galway), Kieran Conboy (National University of Ireland, Galway)
Abstract: Open Source Software (OSS) mailing lists have become popular targets for mining sentiment and emotions, as they provide a centralized communication hub for the distributed OSS community. Sentiment and emotions within communities can provide insights into how a community responds to certain events, who the key members are, and how their behaviours impact the rest of the community. Such insights can inform initiatives aimed at fostering positive interactions between OSS community members, strengthening social ties, and helping the community accomplish its tasks. This poster presents our initial results from sentiment analysis of an OSS mailing list, and answers two key questions: (1) Given that the mailing list is used for peer review of code, is the community sentiment negative overall? (2) Is community sentiment related to the month of the release cycle?
Title: Open Data Collaborations – A Snapshot of an Emerging Practice
Authors: Thomas Olsson (RISE Research Institutes of Sweden), Per Runeson (Lund University)
Abstract: Data-defined software is becoming more and more prevalent, especially with the advent of machine learning and artificial intelligence. With data-defined systems come both challenges (continuing to collect and maintain quality data) and opportunities (open innovation by sharing data with others). We propose Open Data Collaboration (ODC) to describe pecuniary and non-pecuniary sharing of open data, similar to Open Source Software. To understand the challenges and opportunities of ODC, we ran focus groups with 22 companies and organizations. We observed an interest in the subject, but we conclude that the overall maturity is low and ODC is rare.
Title: The Classification and Potential of Business Archetypes by Using Open Data
Authors: Run Duan (Guangdong University of Technology), Tetsuo Noda (Shimane University)
Abstract: Public data collected or possessed by administrative agencies and subsequently released as Open Data is expected to bring about positive economic effects. The purpose of this paper is to summarize the business archetypes that use Open Data to establish whether this expectation holds true, and to classify Open Data business archetypes into seven types to predict their commercial potential.
Title: What We Talk About When We Talk About Wikidata Quality: A Literature Survey
Authors: Alessandro Piscopo (University of Southampton), Elena Simperl (University of Southampton)
Abstract: Launched in 2012, Wikidata has already become a success story. It is a collaborative knowledge graph whose large community has so far produced data about more than 55 million entities. Understanding the quality of the data in Wikidata is key to its widespread adoption and future development. No study has yet investigated to what extent and which aspects of this topic have been addressed. To fill this gap, we surveyed prior literature about data quality in Wikidata. Our analysis includes 28 papers, which we categorise by the quality dimensions they address. We show that a number of quality dimensions have not yet been adequately covered, e.g. accuracy and trustworthiness. Future work should focus on these.
Title: When Humans and Machines Collaborate: Cross-lingual Label Editing in Wikidata
Authors: Lucie-Aimée Kaffee (University of Southampton and TIB Leibniz Information Centre for Science and Technology), Kemele M Endris (TIB Leibniz Information Centre for Science and Technology), Elena Simperl (University of Southampton)
Abstract: The quality and maintainability of a knowledge graph are determined by the process in which it is created. There are different approaches to such processes: extraction or conversion of data available on the web (automated extraction of knowledge, such as DBpedia from Wikipedia); community-created knowledge graphs, often built by a group of experts; and hybrid approaches where humans maintain the knowledge graph alongside bots. In this work, we focus on the hybrid approach of human-edited knowledge graphs supported by automated tools. In particular, we analyse the editing of natural language data, i.e. labels. Labels are the entry point for humans to understand the information, and therefore need to be carefully maintained. We take a step toward understanding the collaborative editing of humans and automated tools across languages in a knowledge graph. We use Wikidata as it has a large and active community of humans and bots working together, covering over 300 languages. In this work, we analyse the different editor groups and how they interact with the different language data to understand the provenance of the current label data.
Title: Approving Automation: Analyzing Requests for Permissions of Bots in Wikidata
Authors: Mariam Farda-Sarbas (Freie Universität Berlin), Hong Zhu (Freie Universität Berlin), Marisa Nest (Freie Universität Berlin), Claudia Müller-Birn (Freie Universität Berlin)
Abstract: Wikidata, initially developed to serve as a central structured knowledge base for Wikipedia, is now a melting pot of structured data for companies, research projects, and other peer production communities. Wikidata’s community consists of humans and bots, and most edits in Wikidata come from these bots. Prior research has raised concerns regarding the challenges editors face in ensuring the quality of bot-generated data, such as the lack of quality control and knowledge diversity. In this research work, we provide one way of tackling these challenges by taking a closer look at the approval process for bot activity on Wikidata. We collected all bot requests, i.e. requests for permissions (RfP), from October 2012 to July 2018. We analyzed these 683 bot requests by classifying them regarding activity focus, activity type, and source mentioned. Our results show that the majority of task requests deal with data additions to Wikidata from internal sources, especially from Wikipedia. However, we can also show the existing diversity of external sources used so far. Furthermore, we examined the reasons for the unsuccessful closing of RfPs. In some cases, the Wikidata community is reluctant to implement specific bots, even if they are urgently needed, because there is still no agreement in the community regarding the technical implementation. This study can serve as a foundation for studies that connect the approved tasks with the editing behavior of bots on Wikidata to better understand the role of bots in quality control and knowledge diversity.
Title: Dwelling on Wikipedia: Investigating Time Spent by Global Encyclopedia Readers
Authors: Nathan TeBlunthuis (Wikimedia Foundation), Tilman Bayer (Wikimedia Foundation), Olga Vasileva (Wikimedia Foundation)
Abstract: Much existing knowledge about global consumption of peer-produced information goods is supported by data on Wikipedia page view counts and surveys. In 2017, the Wikimedia Foundation began measuring the time readers spend on a given page view (dwell time), enabling a more detailed understanding of such reading patterns. In this paper, we validate and model this new data source and, building on existing findings, use regression analysis to test hypotheses about how patterns in reading time vary between global contexts. Consistent with prior findings from self-report data, our complementary analysis of behavioral data provides evidence that Global South readers are more likely to use Wikipedia to gain in-depth understanding of a topic. We find that Global South readers spend more time per page view and that this difference is amplified on desktop devices, which are thought to be better suited for in-depth information seeking tasks.
Title: Article Quality Classification on Wikipedia: Introducing Document Embeddings and Content Features
Authors: Manuel Schmidt (University of Innsbruck), Eva Zangerle (University of Innsbruck)
Abstract: The quality of articles on the Wikipedia platform is vital for its success. Currently, the assessment of quality is performed manually by the Wikipedia community, where editors classify articles into pre-defined quality classes. However, this approach is hardly scalable; hence, approaches for automatic classification have been investigated. In this paper, we build on this previous line of research on article quality classification by extending the set of features with novel content and edit features (e.g., document embeddings of articles). We propose a classification approach utilizing gradient boosted trees based on this novel, extended set of features extracted from Wikipedia articles. Based on an established dataset containing Wikipedia articles and quality classes, we show that our approach is able to substantially outperform previous approaches (including recent deep learning methods). Furthermore, we shed light on the contribution of individual features and show that the proposed features indeed capture the quality of an article well.