Title: Predicting Open Source Programming Language Repository File Survivability From Forking Data
Authors: Bee Bee Chua (University of Technology Sydney), Ying Zhang (University of Technology Sydney)
Abstract: Few studies have examined how the programming languages used in repositories survive under forking conditions. A large number of programming languages in a repository does not by itself ensure good forking performance. To help project owners adopt the right programming language, it is necessary to predict programming language survivability from repository forking data. This paper therefore addresses two related questions: are there statistically meaningful patterns within repository data and, if so, can these patterns be used to predict programming language survival? To answer these questions, we analysed 47,000 forking instances in 1,000 GitHub projects. We applied the K-Nearest Neighbour algorithm with Euclidean distance to predict the distance between repository file longevity and forking conditions. We found three pattern types (‘once-only’, ‘intermittent’ and ‘steady’) and propose reasons why some programming languages are short-lived.
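
As a rough illustration of the technique named in the abstract, the following minimal sketch shows K-Nearest Neighbour classification with Euclidean distance. It is not the authors' implementation: the two features (fork count and active span in months), the toy training points, and the function name `knn_predict` are all assumptions made for this example. Only the three pattern labels come from the abstract.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (feature_tuple, label) pairs; distances are Euclidean.
    """
    # Rank training points by Euclidean distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote; on a tie, Counter keeps the label seen first (the nearer one).
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, made-up data: (number of forks, active span in months) -> pattern.
TRAIN = [
    ((1.0, 0.0), "once-only"),
    ((1.0, 1.0), "once-only"),
    ((2.0, 0.0), "once-only"),
    ((5.0, 12.0), "intermittent"),
    ((6.0, 10.0), "intermittent"),
    ((4.0, 14.0), "intermittent"),
    ((40.0, 24.0), "steady"),
    ((35.0, 30.0), "steady"),
    ((50.0, 28.0), "steady"),
]

if __name__ == "__main__":
    for query in [(1.5, 0.5), (5.0, 11.0), (45.0, 26.0)]:
        print(query, knn_predict(TRAIN, query))
```

In this sketch the features are left unscaled, so the one with the larger numeric range dominates the distance; in practice the features would usually be normalised before the distances are computed.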