Category Archives: Full Research Papers

Using Context Based MicroTraining to Develop OER for the Benefit of All

Title: Using Context Based MicroTraining to Develop OER for the Benefit of All

Authors: Joakim Kävrestad (University of Skövde), Marcus Nohlberg (University of Skövde)

Abstract: This paper demonstrates how Context Based MicroTraining (CBMT) can be used to develop open educational resources in a way that benefits students enrolled in university courses as well as anyone who wants to participate in open-learning activities. CBMT is a framework that provides guidelines for how educational resources should be structured. CBMT stipulates that information should be presented in short sequences and should be relevant to the learner’s current situation. In this paper, CBMT is implemented in a practical ICT course using video lectures that are delivered as open educational resources on YouTube. The study evaluates the experiences of enrolled students and YouTube users, along with the actual results of the enrolled students. The results of the study suggest that users of the video lectures appreciate the learning approach. The actual results, i.e. learning outcomes, of the enrolled students are maintained. The study also demonstrates how using CBMT as open educational resources can free up time for teachers and increase the quality of teaching by benefitting from community feedback.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Analyzing Rich-Club Behavior in Open Source Projects

Title: Analyzing Rich-Club Behavior in Open Source Projects

Authors: Mattia Gasparini (Politecnico di Milano), Javier Luis Canovas Izquierdo (Universitat Oberta de Catalunya), Robert Clariso (Universitat Oberta de Catalunya), Marco Brambilla (Politecnico di Milano), Jordi Cabot (ICREA-UOC)

Abstract: The network of collaborations in an open source project can reveal relevant emergent properties that influence its prospects of success. In this work, we analyze open source projects to determine whether they exhibit rich-club behavior, i.e., a phenomenon where contributors with a high number of collaborations (i.e., strongly connected within the collaboration network) are likely to cooperate with other well-connected individuals. The presence or absence of a rich club has an impact on the sustainability and robustness of the project. For this analysis, we build and study a dataset with the 100 most popular projects in GitHub, exploiting connectivity patterns in the graph structure of collaborations that arise from commits, issues and pull requests. Results show that rich-club behavior is present in all the projects, but only a few of them have an evident club structure. We compute coefficients both for single-source graphs and the overall interaction graph, showing that rich-club behavior varies across different layers of software development. We provide possible explanations of our results, as well as implications for further analysis.
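The rich-club idea in the abstract above can be illustrated with the standard unnormalized rich-club coefficient, phi(k) = 2·E_k / (N_k·(N_k − 1)), where N_k is the number of nodes with degree greater than k and E_k is the number of edges among them. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `rich_club_coefficient` and the toy edge list are hypothetical, and the paper may normalize its coefficients against randomized graphs, which this sketch does not do.

```python
from collections import defaultdict


def rich_club_coefficient(edges, k):
    """Unnormalized rich-club coefficient phi(k) of an undirected simple graph.

    Returns None when fewer than two nodes have degree greater than k,
    because the coefficient is undefined in that case.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)

    # The "club" is every node whose degree exceeds k.
    rich = {node for node, nbrs in neighbors.items() if len(nbrs) > k}
    if len(rich) < 2:
        return None

    # Each edge inside the club is seen from both endpoints, so halve the count.
    links = sum(1 for u in rich for v in neighbors[u] if v in rich) // 2
    return 2 * links / (len(rich) * (len(rich) - 1))


# Hypothetical collaboration graph: a, b, c collaborate heavily with each
# other, and each also has one peripheral collaborator.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"),
             ("a", "d"), ("b", "e"), ("c", "f")]

for k in range(3):
    print(k, rich_club_coefficient(toy_edges, k))
```

A value close to 1 for large k means that the best-connected contributors form a nearly complete subgraph among themselves.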

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Ranking Warnings From Multiple Source Code Static Analyzers via Ensemble Learning

Title: Ranking Warnings From Multiple Source Code Static Analyzers via Ensemble Learning

Authors: Athos Ribeiro (University of São Paulo), Paulo Meirelles (Federal University of São Paulo), Nelson Lago (University of São Paulo), Fabio Kon (University of São Paulo)

Abstract: While there is a wide variety of both open source and proprietary source code static analyzers available in the market, each of them usually performs better in a small set of problems, making it hard to choose one single tool to rely on when examining a program looking for bugs in the source code. Combining the analysis of different tools may reduce the number of false negatives, but yields a corresponding increase in the absolute number of false positives (which is already high for many tools). A possible solution, then, is to filter these results to identify the issues least likely to be false positives. In this study, we post-analyze the reports generated by three tools on synthetic test cases provided by the US National Institute of Standards and Technology. In order to make our technique as general as possible, we limit our data to the reports themselves, excluding other information such as change histories or code metrics. The features extracted from these reports are used to train a set of decision trees using AdaBoost to create a stronger classifier, achieving 0.8 classification accuracy (the combined false positive rate from the used tools was 0.61). Finally, we use this classifier to rank static analyzer alarms based on the probability of a given alarm being an actual bug in the source code.
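The boosting-and-ranking idea described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: it uses depth-one decision trees (stumps) as weak learners, ranks by the raw ensemble score rather than a calibrated probability, and the two features (number of tools reporting the warning, a severity flag) as well as the toy data are made up for the example.

```python
import math


def stump_predict(stump, x):
    """A stump predicts `polarity` when x[feature] > threshold, else -polarity."""
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity


def best_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost(X, y, rounds=5):
    """Train AdaBoost on labels in {-1, +1}; returns a list of (alpha, stump)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def score(ensemble, x):
    """Weighted vote of all stumps; higher means more likely a real bug."""
    return sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)


def rank_warnings(ensemble, warnings):
    """Order warnings from most to least likely to be a real bug."""
    return sorted(warnings, key=lambda x: score(ensemble, x), reverse=True)


# Made-up training data: (tools_reporting, severity_flag); +1 = real bug.
toy_X = [(3, 1), (3, 0), (2, 1), (1, 1), (1, 0), (0, 0)]
toy_y = [1, 1, 1, -1, -1, -1]
model = adaboost(toy_X, toy_y, rounds=3)
print(rank_warnings(model, [(0, 1), (3, 0), (1, 1)]))
```

Ranking by score lets a reviewer inspect the most trustworthy alarms first instead of discarding everything below a fixed threshold.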

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Continuous Assessment in Software Engineering Project Course Using Publicly Available Data From GitHub

Title: Continuous Assessment in Software Engineering Project Course Using Publicly Available Data From GitHub

Authors: Henrik Gustavsson (University of Skövde), Marcus Brohede (University of Skövde)

Abstract: This paper describes an approach for assessment in a large software engineering project course. We propose an approach for continuously collecting information from a source code repository and collaboration tool, and using this information for assessing student contributions and also for assessing the course as a whole from the teacher’s standpoint. We present how we display metrics for how the students perform in relation to some of the requirements of the course. We argue that continuous summative assessment feedback to the students on how they are performing in the project is a suitable strategy for ensuring active participation from the students for the duration of the project course.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

FLOSS FAQ Chatbot Project Reuse – How to Allow Nonexperts to Develop a Chatbot

Title: FLOSS FAQ Chatbot Project Reuse – How to Allow Nonexperts to Develop a Chatbot

Authors: Arthur R. T. de Lacerda (University of Brasilia), Carla Silva Rocha Aguiar (University of Brasilia)

Abstract: FAQ chatbots can answer frequently asked questions about a particular service, platform, or system. Currently, FAQ chatbots are the most popular application domain for dialog assistants. However, developing a chatbot project requires a full-stack team formed by numerous specialists, such as dialog designers, data scientists, software engineers, DevOps engineers, business strategists and domain experts, which can be both time- and resource-consuming. Language processing can be particularly challenging in languages other than English due to the scarcity of training datasets.

Most of the requirements of FAQ chatbots are similar within a domain, so projects could profit from Open Source Software (OSS) reuse. In this paper, we examine how OSS FAQ chatbot projects can benefit from reuse at the project level (black-box reuse). We present an experience report of a FLOSS FAQ chatbot project developed in Portuguese for an e-government service in Brazil. It comprises the chatbot distribution service as well as an analytics tool, integrated and deployed on-premises. We identified assets that could be reused as a black box and the assets that should be customized for a particular application. We categorized these assets into architecture, corpus, dialog flows, machine learning models, and documentation. This paper discusses how automation, pre-configuration, and templates can help newcomers develop chatbots in Portuguese without the specialized skills that chatbot architecture tools normally require. Our main contribution is to highlight the issues non-English FAQ chatbot projects will likely face and the assets that can be reused. It allows non-chatbot experts to develop a quality-assured OSS FAQ chatbot in a shorter project cycle.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Predicting Open Source Programming Language Repository File Survivability From Forking Data

Title: Predicting Open Source Programming Language Repository File Survivability From Forking Data

Authors: Bee Bee Chua (University of Technology Sydney), Ying Zhang (University of Technology Sydney)

Abstract: Very few studies have looked at repositories’ programming language survivability in response to forking conditions. A high number of repository programming languages does not alone ensure good forking performance. To address this issue and assist project owners in adopting the right programming language, it is necessary to predict programming language survivability from forking in repositories. This paper therefore addresses two related questions: are there statistically meaningful patterns within repository data and, if so, can these patterns be used to predict programming language survival? To answer these questions, we analysed 47,000 forking instances in 1,000 GitHub projects. We used Euclidean distance applied in the K-Nearest Neighbour algorithm to predict the distance between repository file longevity and forking conditions. We found three pattern types (‘once-only’, intermittent or steady) and propose reasons for short-lived programming languages.
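As a rough illustration of K-Nearest Neighbour classification with Euclidean distance, the technique named in the abstract above, here is a minimal sketch. The feature choice (fork count and a normalized file lifetime), the toy training data, and the use of the three pattern names as class labels are all assumptions made for the example; the abstract does not specify how the authors set up their features or targets.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical samples: (fork_count, normalized_file_lifetime) -> pattern type.
toy_train = [
    ((1, 0.1), "once-only"),
    ((1, 0.2), "once-only"),
    ((5, 0.5), "intermittent"),
    ((6, 0.4), "intermittent"),
    ((20, 0.9), "steady"),
    ((22, 1.0), "steady"),
]

print(knn_predict(toy_train, (21, 0.8)))
print(knn_predict(toy_train, (1, 0.15)))
```

Because Euclidean distance is sensitive to scale, features measured in different units would normally be standardized before use; the sketch skips that step for brevity.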

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Getting Started With Open Source Governance and Compliance in Companies

Title: Getting Started With Open Source Governance and Compliance in Companies

Authors: Nikolay Harutyunyan (Friedrich-Alexander University Erlangen-Nürnberg), Dirk Riehle (Friedrich-Alexander University Erlangen-Nürnberg)

Abstract: Commercial use of open source software is on the rise as more companies realize the benefits of using FLOSS components in their products. At the same time, the ungoverned use of such components can result in legal, financial, intellectual property, and other risks. To mitigate these risks, companies must govern their use of open source through appropriate processes. This paper presents an initial theory of industry best practices on getting started with open source governance and compliance. Through a qualitative survey, we conducted and analyzed 15 expert interviews in companies with advanced capabilities in open source governance. We also studied practitioner reports on existing practices for introducing FLOSS governance processes. We cast our resulting initial theory in the actionable format of best practice patterns that, when combined, form a practical handbook of getting started with FLOSS governance in companies.

Download: This contribution is part of the OpenSym 2019 proceedings and is available as a PDF file.

Evaluating the Impact of the Wikipedia Teahouse on newcomer socialization and retention

Title: Evaluating the Impact of the Wikipedia Teahouse on newcomer socialization and retention

Authors: Jonathan T Morgan and Aaron Halfaker, Wikimedia Foundation

Abstract: Effective socialization of new contributors is vital for the long-term sustainability of open collaboration projects. Previous research has identified many common barriers to participation. However, few interventions employed to increase newcomer retention over the long term by improving aspects of the onboarding experience have demonstrated success. This study presents an evaluation of the impact of one such intervention, the Wikipedia Teahouse, on new editor survival. In a controlled experiment, we find that new editors invited to the Teahouse are retained at a higher rate than editors who do not receive an invite. The effect is observed for both low- and high-activity newcomers, and for both short- and long-term survival.

Download: This contribution is part of the OpenSym 2018 proceedings, received the best paper award, and is available as a PDF file.

Stigmergic Coordination in Wikipedia

Title: Stigmergic Coordination in Wikipedia

Authors: Amira Rezgui (IMT Atlantique) and Kevin Crowston (Syracuse University)

Abstract: We look for evidence of stigmergic coordination (i.e., coordination mediated by changes to a shared work product) in the context of Wikipedia. Using a novel approach to identifying edits to the same part of a Wikipedia article, we show that a majority of edits to two example articles are not associated with discussion on the article Talk page, suggesting the possibility of stigmergic coordination. However, discussion does seem to be related to article quality, suggesting the limits to this approach to coordination.

Download: This contribution is part of the OpenSym 2018 proceedings and is available as a PDF file.

Do We All Talk Before We Type?: Understanding Collaboration in Wikipedia Language Editions

Title: Do We All Talk Before We Type?: Understanding Collaboration in Wikipedia Language Editions

Authors: Taryn Bipat, David W. McDonald and Mark Zachry, University of Washington

Abstract: The English language Wikipedia is notable for its large number of articles and for the intricate collaborative interactions that create and sustain it. However, 288 other active language editions of Wikipedia have also developed through the coordination of contributing editors. While collaboration in the English Wikipedia has been researched extensively, these other language editions remain understudied. Our study leverages an influential collaboration model based on behaviors in the English Wikipedia as a lens to consider collaborative activity in the Spanish and French language editions. Through an analysis of collaborative interactions across article talk pages, we demonstrate that talk pages, the locus of most collaboration on the English Wikipedia, are used differently in these different language editions. Our study raises broader questions about how results from studies of the English Wikipedia generalize to other language editions, demonstrates the need to account for variations in collaborative behaviors in all language editions of Wikipedia, and presents evidence that collaborative practices on the English Wikipedia have changed over time.

Download: This contribution is part of the OpenSym 2018 proceedings and is available as a PDF file.