All posts by Agnes Low

Predicting the quality of user contributions via LSTMs

Title: Predicting the quality of user contributions via LSTMs

Authors: Rakshit Agrawal and Luca de Alfaro (University of California, Santa Cruz)

Abstract: In many collaborative systems it is useful to automatically estimate the quality of new contributions; the estimates can be used, for instance, to flag contributions for review. To predict the quality of a contribution by a user, it is useful to take into account both the characteristics of the revision itself and the past history of contributions by that user. In several approaches, the user’s history is first summarized into a number of features, such as number of contributions, user reputation, time from previous revision, and so forth. These features are then passed, along with features of the current revision, to a machine-learning classifier, which outputs a prediction for the user contribution. The summarization step is used because the usual machine-learning models, such as neural nets and SVMs, rely on a fixed number of input features. We show in this paper that this manual selection of summarization features can be avoided by adopting machine-learning approaches that are able to cope with temporal sequences of input.

In particular, we show that Long Short-Term Memory (LSTM) neural nets are able to directly process the variable-length history of a user’s activity in the system, and produce an output that is highly predictive of the quality of the next contribution by the user. Our approach does not eliminate the process of feature selection, which is present in all machine learning. Rather, it eliminates the need for deciding which features from a user’s past are most useful for predicting the future: we can simply pass the entire past to the machine-learning apparatus and let it come up with an estimate for the quality of the next contribution.
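To illustrate why a recurrent model removes the need for a fixed-size summary, here is a toy sketch in plain Python. It is not the authors' model: the single-unit LSTM, its hand-picked weights in `WEIGHTS`, the per-contribution scalar feature (assumed to be a past quality score in [-1, 1]), and the `predict_quality` output layer are all hypothetical. In a real system the weights would be learned and the cell would have many units.

```python
import math
from typing import Sequence

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical hand-picked weights (input weight, recurrent weight, bias)
# for each gate of a one-unit LSTM. A trained model would learn these.
WEIGHTS = {
    "i": (0.5, 0.1, 0.0),  # input gate
    "f": (0.5, 0.1, 1.0),  # forget gate (positive bias: keep memory by default)
    "o": (0.5, 0.1, 0.0),  # output gate
    "g": (1.0, 0.1, 0.0),  # candidate cell value
}

def lstm_summary(history: Sequence[float]) -> float:
    """Run the one-unit LSTM over a history of any length and return
    its final hidden state; an empty history yields 0.0."""
    h, c = 0.0, 0.0
    for x in history:
        pre = {k: wx * x + wh * h + b for k, (wx, wh, b) in WEIGHTS.items()}
        i, f, o = sigmoid(pre["i"]), sigmoid(pre["f"]), sigmoid(pre["o"])
        g = math.tanh(pre["g"])
        c = f * c + i * g          # update the cell memory
        h = o * math.tanh(c)       # expose part of it as the hidden state
    return h

def predict_quality(history: Sequence[float], current_feature: float) -> float:
    """Hypothetical output layer: combine the LSTM summary of the past with
    one feature of the current revision into a score in (0, 1)."""
    return sigmoid(2.0 * lstm_summary(history) + current_feature)

# Histories of different lengths go in unchanged; no manual summarization.
good_user = [1.0, 1.0, 0.8, 1.0]
bad_user = [-1.0, -0.9, -1.0]
print(predict_quality(good_user, 0.0), predict_quality(bad_user, 0.0))
```

The design point is that `lstm_summary` accepts a history of any length, so the choice of which aggregate statistics to compute from a user's past is left to the (learned) recurrent weights rather than made by hand.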

We present models combining LSTMs and neural networks (NN) for predicting revision quality, and show that the prediction accuracy attained is far superior to that obtained using the NN alone. More interestingly, we also show that the prediction attained is superior to that obtained using user reputation as a feature summarizing the quality of a user’s past work. This can be explained by noting that the primary function of user reputation is to provide an incentive towards performing useful contributions, rather than to be a feature optimized for prediction of future contribution quality.

We also show that the LSTM output changes in a natural way in response to user behavior, increasing when the user performs a sequence of good-quality contributions, and decreasing when the user performs a sequence of low-quality work. The LSTM output for a user could thus be usefully shown to other users, alongside the user’s reputation and other information.

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

Differentiating Communication Styles of Leaders on the Linux Kernel Mailing List

Title: Differentiating Communication Styles of Leaders on the Linux Kernel Mailing List

Authors: Daniel Schneider, Scott Spurlock and Megan Squire (Elon University)

Abstract: Much communication between developers of free, libre, and open source software (FLOSS) projects happens on email mailing lists. Geographically and temporally dispersed development teams use email as an asynchronous, centralized, persistently stored institutional memory for sharing code samples, discussing bugs, and making decisions. Email is especially important to large, mature projects, such as the Linux kernel, which has thousands of developers and a multilayered leadership structure. In this paper, we collect and analyze data to understand the communication patterns in such a community. How do the leaders of the Linux Kernel project write in email? What are the salient features of their writing, and can we distinguish one leader from another? We find that there are clear written markers for two leaders who have been particularly important to recent discussions of leadership style on the Linux Kernel Mailing List (LKML): Linus Torvalds and Greg Kroah-Hartman. Furthermore, we show that it is straightforward to use a machine learning strategy to automatically differentiate these two leaders based on their writing. Our findings will help researchers understand how this community works, and why there is occasional controversy regarding differences in communication styles on the LKML.

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

Motivation of Newcomers to FLOSS Projects

Title: Motivation of Newcomers to FLOSS Projects

Authors: Christoph Hannebauer and Volker Gruhn (paluno – The Ruhr Institute for Software Technology, University of Duisburg-Essen)

Abstract: While the motivations of Free/Libre and Open Source Software (FLOSS) developers have been the subject of extensive research, the motivations for their initial contribution to a FLOSS project have received little attention. This survey of 94 newcomers to the FLOSS projects Mozilla and GNOME identifies the motivations for modifying FLOSS components and for submitting these modifications back to the FLOSS project. With the responses, we test a hypothesis based on previous qualitative research on newcomer motivations: most newcomers modify a component because they need the modification for themselves. Surprisingly, this is not the case for our respondents, who have a variety of primary modification motivations. Newcomer occupation is discussed as a reason for this difference from previous results.

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

Observing Custom Software Modifications: A Quantitative Approach of Tracking the Evolution of Patch Stacks

Title: Observing Custom Software Modifications: A Quantitative Approach of Tracking the Evolution of Patch Stacks

Authors: Ralf Ramsauer (Technical University of Applied Sciences Regensburg); Daniel Lohmann (Friedrich-Alexander University Erlangen-Nuremberg); Wolfgang Mauerer (Technical University of Applied Sciences Regensburg; Siemens AG, Munich)

Abstract: Modifications to open-source software (OSS) are often provided in the form of “patch stacks”: sets of changes (patches) that modify a given body of source code. Maintaining patch stacks over extended periods of time is problematic when the underlying base project changes frequently, because this necessitates a continuous and engineering-intensive adaptation of the stack. Nonetheless, long-term maintenance is an important problem for changes that are not integrated into projects, for instance when they are controversial or only of value to a limited group of users. We present and implement a methodology to systematically examine the temporal evolution of patch stacks, track non-functional properties like integrability and maintainability, and estimate the eventual economic and engineering effort required to successfully develop and maintain patch stacks. Our results provide a basis for quantitative research on patch stacks, including statistical analyses and other methods that lead to actionable advice on the construction and long-term maintenance of custom extensions to OSS.

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

GNU Health: A Free/Libre Community-based Health Information System

Luis Falcón Martín of GNU Solidario will be presenting the following keynote at OpenSym 2016:

Title: GNU Health: A Free/Libre Community-based Health Information System

Abstract: GNU Health is a community-based, Free/Libre Health and Hospital Information System, deployed in many countries around the globe. It merges Social Medicine with state-of-the-art advances in bioinformatics, providing a framework for integrative medicine for governments and Public Health institutions as well as research organizations. In this presentation we will talk about case studies in Public Health, integration with other Free Software community projects such as OpenStreetMap, and the upcoming GNU Health Federation model to interconnect large, heterogeneous health networks. We will present some of the upcoming features of GNU Health, including topics on interoperability and standards (HL7 FHIR) and MyGnuHealth, a mobile application for Personal Health. Finally, we will dedicate a section to the GNU Health functionality on bioinformatics, personalized medicine, clinical genetics, big data, and cooperation with academia, research institutions and multilateral organizations.

Speaker’s Biography: Luis Falcón, M.D., B.Sc., holds a degree in Computer Science and Mathematics from California State University, Northridge (USA) and in Medicine from IUCS, Buenos Aires (Argentina). Luis is a social, animal rights and Free Software activist. He is the founder of GNU Solidario, a nonprofit organization that delivers Health and Education with Free Software. Luis is the author of GNU Health (http://health.gnu.org), the award-winning Free/Libre Health and Hospital Information System. He is a guest speaker at national and international conferences about Free Software, eHealth and Social Medicine. He currently lives in the Canary Islands.

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

Good Citizenship is Good Business: Open Source, Sustainable Development and the Corporate Bottom Line

Leslie Hawthorn will be presenting the following keynote at OpenSym 2016:

Title: Good Citizenship is Good Business: Open Source, Sustainable Development and the Corporate Bottom Line

Abstract: This talk examines the current landscape of interplay between open source projects and enterprises, including the tensions between them. Leslie will demonstrate how models have developed to ease these problematic areas for corporations, but also how these new models do not necessarily meet the needs of individual developers. She will conclude with a discussion of how adhering to well-worn approaches to open source software development is not only best practice for corporate players, but also provides them with long-term benefits from the perspective of sustainability, employee retention and community goodwill.

Speaker’s Biography: As an internationally known Developer Relations strategist and Community Management expert, Leslie Hawthorn has spent the past decade creating, cultivating, and enabling open source communities. She’s best known for creating Google Code-in, the world’s first global initiative to involve pre-university students in open source software development, launching the second-most trafficked of Google’s developer blogs, and receiving an O’Reilly Open Source Award in 2010 for her work to grow the Google Summer of Code program and her contributions to humanitarian open source projects. During her 15 years working in the technology industry, Leslie has developed, honed and shared open source expertise spanning enterprises to NGOs, including senior roles at Google, Red Hat, the Open Source Initiative, the OSU Open Source Lab and several startups, including Elastic. Born and raised in Silicon Valley, she and her family now call Amsterdam home, though she travels worldwide to keynote about open source, and building products and teams that are built to last. You can follow her adventures on Twitter @lhawthorn.

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

Truly Open OER: What the Open Education Movement Can Learn from Open Source’s Success

Adam Blum of OpenEd will be presenting the following keynote at OpenSym 2016:

Title: Truly Open OER: What the Open Education Movement Can Learn from Open Source’s Success

Abstract: Most OER repositories have been around for more than a decade, but their growth rates have been marginal. By contrast, open source has become the dominant platform for web development. We believe the primary reason is that OER has not become truly open. A new definition of “open” in OER could be: open source the catalog itself, provide an open API for searching and contributing resources, open universal access to all partners, and openness to paid and free content. We’ll describe how each of these principles will accelerate the adoption and impact of OER.

Speaker’s Biography: CEO/CTO/VP Engineering of several successful startups. Formerly adjunct professor at UC Berkeley and Carnegie Mellon. Author of three computer science texts, including the first book on web server development. Continually active open source contributor. Now building OpenEd, the largest K-12 resource library and “operating system for personalized learning”, used by many other ed tech companies to provide just the right resource for each student. OpenEd was acquired by ACT in May of this year.

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

Second round of Industry and Community Track Submissions open until June 2nd

In order to accommodate the timeline of industry participants of OpenSym 2016, we have two deadlines for industry and community track contributions. The first one has already passed, but the second one for late-comers is still open. Submit your paper or proposal by June 2nd! Learn more about the OpenSym 2016 industry track.

The Evolution Of Knowledge Creation Online: Wikipedia and Knowledge Processes

Title: The Evolution Of Knowledge Creation Online: Wikipedia and Knowledge Processes

Authors: Ruqin Ren (Annenberg School for Communication, University of Southern California)

Abstract: Using the evolutionary theory framework of the variation, selection, and retention process, this paper explains self-organized online knowledge production behaviors, with Wikipedia as an example. Evolution is presented as a trial-and-error process that produces a progressive accumulation of knowledge. The underlying theoretical assumption is that even though online communities feature very different characteristics than traditional organizations, the basic processes of trial-and-error learning in evolutionary theory still apply to these new forms of organization. Based on the theory of self-organizing systems and evolutionary theory, the processes of variation and selection are explained in depth with examples observed on Wikipedia. The study presents a nested hierarchy of vicarious selectors that plays an important role in online knowledge creation.

This contribution to OpenSym 2015 will be made available as part of the OpenSym 2015 proceedings (or companion) on or after August 19, 2015.

Use of GitHub as a Platform for Open Collaboration on Text Documents

Title: Use of GitHub as a Platform for Open Collaboration on Text Documents

Authors: Justin Longo (University of Regina Johnson-Shoyama Graduate School of Public Policy, Canada), Tanya M. Kelley (Arizona State University, U.S.A.)

Abstract: Researchers have recently begun paying attention to the use of the software development and code-hosting web service GitHub for other collaborative purposes, including a class of activity referred to as document, text, or prose collaboration. These alternative uses of GitHub as a platform for sharing non-code artifacts represent an important modification in the practice of open collaboration. We survey cases where GitHub has been used to facilitate collaboration on non-code outputs, identify its strengths and weaknesses when used in this mode, and propose conditions for successful collaborations on co-created text documents.

This contribution to OpenSym 2015 will be made available as part of the OpenSym 2015 proceedings (or companion) on or after August 19, 2015.