
Predicting the quality of user contributions via LSTMs

Title: Predicting the quality of user contributions via LSTMs

Authors: Rakshit Agrawal and Luca de Alfaro (University of California, Santa Cruz)

Abstract: In many collaborative systems it is useful to automatically estimate the quality of new contributions; the estimates can be used, for instance, to flag contributions for review. To predict the quality of a contribution by a user, it is useful to take into account both the characteristics of the revision itself and the past history of contributions by that user. In several approaches, the user’s history is first summarized into a number of features, such as number of contributions, user reputation, time from previous revision, and so forth. These features are then passed along with features of the current revision to a machine-learning classifier, which outputs a prediction for the user contribution. The summarization step is used because the usual machine-learning models, such as neural nets, SVMs, etc., rely on a fixed number of input features. We show in this paper that this manual selection of summarization features can be avoided by adopting machine-learning approaches that are able to cope with temporal sequences of input.

In particular, we show that Long Short-Term Memory (LSTM) neural nets are able to process directly the variable-length history of a user’s activity in the system, and produce an output that is highly predictive of the quality of the next contribution by the user. Our approach does not eliminate the process of feature selection, which is present in all machine learning. Rather, it eliminates the need for deciding which features from a user’s past are most useful for predicting the future: we can simply pass to the machine-learning apparatus all the past, and let it come up with an estimate for the quality of the next contribution.

We present models combining LSTM and NN for predicting revision quality and show that the prediction accuracy attained is far superior to the one obtained using the NN alone. More interestingly, we also show that the prediction attained is superior to the one obtained using user reputation as a feature summarizing the quality of a user’s past work. This can be explained by noting that the primary function of user reputation is to provide an incentive towards performing useful contributions, rather than to be a feature optimized for prediction of future contribution quality.

We also show that the LSTM output changes in a natural way in response to user behavior, increasing when the user performs a sequence of good-quality contributions, and decreasing when the user performs a sequence of low-quality work. The LSTM output for a user could thus be usefully shown to other users, alongside the user’s reputation and other information.
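
The central idea in the abstract above, feeding a user's whole variable-length history to an LSTM instead of a hand-picked fixed-size summary, can be illustrated with a minimal sketch. This is not the authors' model: the class name `TinyLSTM`, the three per-revision features, the hidden size, and the random untrained weights are all assumptions made for illustration, so the scores it produces carry no meaning. The point is only that the same network accepts histories of any length.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM with random, untrained weights (illustration only)."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        n_in = n_features + n_hidden
        # One weight matrix and bias vector per gate:
        # i = input gate, f = forget gate, o = output gate, c = candidate state.
        self.W = {
            g: [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                for _ in range(n_hidden)]
            for g in "ifoc"
        }
        self.b = {g: [0.0] * n_hidden for g in "ifoc"}
        # Weights of the final read-out that maps the hidden state to a score.
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def _gate(self, g, x, activation):
        return [
            activation(sum(w * v for w, v in zip(row, x)) + bias)
            for row, bias in zip(self.W[g], self.b[g])
        ]

    def predict(self, history):
        """Consume a history of any length and return a score in (0, 1)."""
        h = [0.0] * self.n_hidden  # hidden state
        c = [0.0] * self.n_hidden  # cell state
        for features in history:
            x = list(features) + h
            i = self._gate("i", x, sigmoid)
            f = self._gate("f", x, sigmoid)
            o = self._gate("o", x, sigmoid)
            g = self._gate("c", x, math.tanh)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return sigmoid(sum(w * v for w, v in zip(self.w_out, h)))


# Hypothetical per-revision features, scaled to small values:
# (size of change, time since previous revision, 1.0 if later reverted).
short_history = [(0.2, 0.5, 0.0)]
long_history = [(0.2, 0.5, 0.0), (0.9, 0.1, 1.0), (0.4, 0.3, 0.0), (0.1, 0.8, 0.0)]

model = TinyLSTM(n_features=3, n_hidden=4)
score_short = model.predict(short_history)
score_long = model.predict(long_history)
print(score_short, score_long)
```

No summarization step appears anywhere in the sketch: a one-revision history and a four-revision history go through the same `predict` call. In a real system the weights would be learned from labeled revisions, and a library implementation of the LSTM would normally replace this hand-written cell.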

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

Differentiating Communication Styles of Leaders on the Linux Kernel Mailing List

Title: Differentiating Communication Styles of Leaders on the Linux Kernel Mailing List

Authors: Daniel Schneider, Scott Spurlock and Megan Squire (Elon University)

Abstract: Much communication between developers of free, libre, and open source software (FLOSS) projects happens on email mailing lists. Geographically and temporally dispersed development teams use email as an asynchronous, centralized, persistently stored institutional memory for sharing code samples, discussing bugs, and making decisions. Email is especially important to large, mature projects, such as the Linux kernel, which has thousands of developers and a multilayered leadership structure. In this paper, we collect and analyze data to understand the communication patterns in such a community. How do the leaders of the Linux Kernel project write in email? What are the salient features of their writing, and can we discern one leader from another? We find that there are clear written markers for two leaders who have been particularly important to recent discussions of leadership style on the Linux Kernel Mailing List (LKML): Linus Torvalds and Greg Kroah-Hartman. Furthermore, we show that it is straightforward to use a machine learning strategy to automatically differentiate these two leaders based on their writing. Our findings will help researchers understand how this community works, and why there is occasional controversy regarding differences in communication styles on the LKML.

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

Motivation of Newcomers to FLOSS Projects

Title: Motivation of Newcomers to FLOSS Projects

Authors: Christoph Hannebauer and Volker Gruhn (paluno – The Ruhr Institute for Software Technology, University of Duisburg-Essen)

Abstract: While the motivations of Free/Libre and Open Source Software (FLOSS) developers have been the subject of extensive research, the motivations for their initial contribution to a FLOSS project have received little attention. This survey of 94 newcomers to the FLOSS projects Mozilla and GNOME identifies the motivations for the modification of the FLOSS components and for the submission of these modifications back to the FLOSS project. With the responses, we test a hypothesis based on previous qualitative research on newcomer motivations: Most newcomers modify a component because they need the modification for themselves. Surprisingly, this is not the case for our respondents, who have a variety of primary modification motivations. Newcomer occupation is discussed as a reason for this difference from previous results.

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

Observing Custom Software Modifications: A Quantitative Approach of Tracking the Evolution of Patch Stacks

Title: Observing Custom Software Modifications: A Quantitative Approach of Tracking the Evolution of Patch Stacks

Authors: Ralf Ramsauer (Technical University of Applied Sciences Regensburg); Daniel Lohmann (Friedrich-Alexander University Erlangen-Nuremberg); Wolfgang Mauerer (Technical University of Applied Sciences Regensburg and Siemens AG, Munich)

Abstract: Modifications to open-source software (OSS) are often provided in the form of “patch stacks”: sets of changes (patches) that modify a given body of source code. Maintaining patch stacks over extended periods of time is problematic when the underlying base project changes frequently. This necessitates a continuous and engineering-intensive adaptation of the stack. Nonetheless, long-term maintenance is an important problem for changes that are not integrated into projects, for instance when they are controversial or only of value to a limited group of users. We present and implement a methodology to systematically examine the temporal evolution of patch stacks, track non-functional properties like integrability and maintainability, and estimate the eventual economic and engineering effort required to successfully develop and maintain patch stacks. Our results provide a basis for quantitative research on patch stacks, including statistical analyses and other methods that lead to actionable advice on the construction and long-term maintenance of custom extensions to OSS.

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

Proceedings of OpenSym 2015 Made Available

Update 2015-09-18: The proceedings and the companion to the proceedings have been posted in the ACM Digital Library.

The proceedings of OpenSym 2015 can now be found on the proceedings page of the conference website. The papers will also appear in the ACM Digital Library (but have not been posted yet). We will update the Archives page (which includes the proceedings from all years) once we get the information from the ACM.

The Evolution Of Knowledge Creation Online: Wikipedia and Knowledge Processes

Title: The Evolution Of Knowledge Creation Online: Wikipedia and Knowledge Processes

Authors: Ruqin Ren (Annenberg School for Communication, University of Southern California)

Abstract: Using the evolutionary theory framework of the variation, retention, and selection process, this paper explains self-organized knowledge production behaviors online, with Wikipedia as an example. Evolution is presented as a trial-and-error process that produces a progressive accumulation of knowledge. The underlying theoretical assumption is that even though online communities have very different characteristics from traditional organizations, the basic processes of trial-and-error learning in evolutionary theory still apply to the new forms of organization. Based on the theory of self-organizing systems and evolutionary theory, the processes of variation and selection are explained in depth with examples observed on Wikipedia. The study presents a nested hierarchy of vicarious selectors that plays an important role in online knowledge creation.

This contribution to OpenSym 2015 will be made available as part of the OpenSym 2015 proceedings (or companion) on or after August 19, 2015.

Use of GitHub as a Platform for Open Collaboration on Text Documents

Title: Use of GitHub as a Platform for Open Collaboration on Text Documents

Authors: Justin Longo (University of Regina, Johnson-Shoyama Graduate School of Public Policy, Canada), Tanya M. Kelley (Arizona State University, U.S.A.)

Abstract: Recently, researchers are paying attention to the use of the software development and code-hosting web service GitHub for other collaborative purposes, including a class of activity referred to as document, text, or prose collaboration. These alternative uses of GitHub as a platform for sharing non-code artifacts represent an important modification in the practice of open collaboration. We survey cases where GitHub has been used to facilitate collaboration on non-code outputs, identify its strengths and weaknesses when used in this mode, and propose conditions for successful collaborations on co-created text documents.

This contribution to OpenSym 2015 will be made available as part of the OpenSym 2015 proceedings (or companion) on or after August 19, 2015.

Toward efficient source code sharing on the Web

Title: Toward efficient source code sharing on the Web

Author: Hiroaki Fukuda (Shibaura Institute of Technology, Japan)

Abstract: The Web is a useful reference for developers looking for pieces of code that do what they need. Many websites contain not only source code but also detailed explanations of that code. On these websites, explanations are usually located above or below the code, so users who refer to them sometimes need to scroll the browser window to read a piece of code alongside the corresponding explanation. As a consequence, users have to temporarily memorize the code or the corresponding explanation, which wastes time. It is also common to use a wiki to edit code and its explanations. Most wiki systems provide only one window for editing both, so editors usually need to scroll the window to complete an edit, which also consumes extra time. This paper proposes CodeWiki, a wiki system for reading and editing source code together with its explanations, which provides multiple windows for editing code and explanations. In addition, CodeWiki lets readers click a link that leads to a window containing the corresponding explanation. As a consequence, readers and editors do not need to scroll a window, so CodeWiki spares them that extra time. We present a prototype implementation of CodeWiki and show its usage.

This contribution to OpenSym 2015 will be made available as part of the OpenSym 2015 proceedings (or companion) on or after August 19, 2015.

Social Collaboration Metrics

Title: Social Collaboration Metrics

Author: Manfred Langen (Siemens AG)

Abstract: Social media has been widely introduced in the enterprise, and its general benefit is not in doubt. But the arguments of better communication and improved networking of employees will not be sufficient in the long term. Today’s metrics on registered users, number of visits, or user-generated content have to prove a relation to real business impact. Therefore, we at Siemens Corporate Technology developed the ICUP model (Impact, Connectedness, User engagement, Platform adoption) to close the gap between counting registered users and measuring business value.

This contribution to OpenSym 2015 will be made available as part of the OpenSym 2015 proceedings (or companion) on or after August 19, 2015.

Govwiki.US: An Open Directory of US Local Governments

Title: Govwiki.US: An Open Directory of US Local Governments

Authors: Marc D. Joffe (Public Sector Credit Solutions, USA), Vadim Ivlev (Electronic Archive, Russian Federation)

Abstract: This demonstration describes a new open source and open data website we are planning to interface with Wikipedia.

This contribution to OpenSym 2015 will be made available as part of the OpenSym 2015 proceedings (or companion) on or after August 19, 2015.