Category Archives: WikiSym 2011

Session Preview: Wiki Tools and Interfaces

The technical session Wiki Tools and Interfaces will feature three presentations. See the schedule for details on when and where to go.

Vandalism Detection in Wikipedia: A High-Performing, Feature-Rich Model and its Reduction Through Lasso

Sara Javanmardi, David W. McDonald, Cristina V. Lopes

User generated content (UGC) constitutes a significant fraction of the Web. However, some wiki-based sites, such as Wikipedia, are so popular that they have become a favorite target of spammers and other vandals. In such popular sites, human vigilance is not enough to combat vandalism, and tools that detect possible vandalism and poor-quality contributions become a necessity. The application of machine learning techniques holds promise for developing efficient on-line algorithms for better tools to assist users in vandalism detection. We describe an efficient and accurate classifier that performs vandalism detection in UGC sites. We show the results of our classifier on the PAN Wikipedia dataset. We explore the effectiveness of a combination of 66 individual features that produce an AUC of 0.9553 on a test dataset – the best result to our knowledge. Using Lasso optimization we then reduce our feature-rich model to a much smaller and more efficient model of 28 features that performs almost as well – the drop in AUC being only 0.005. We describe how this approach can be generalized to other user generated content systems and describe several applications of this classifier to help users identify potential vandalism.
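
For readers unfamiliar with how a Lasso penalty prunes features, here is a minimal sketch. It is not the authors' classifier: it runs plain cyclic coordinate descent for the squared-error Lasso on a tiny made-up dataset. The function names, the toy data, and the penalty value are all assumptions chosen for illustration.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return exactly 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(rows, targets, penalty, sweeps=50):
    """Minimise (1/2n) * ||y - Xw||^2 + penalty * ||w||_1 by cyclic coordinate descent."""
    n = len(rows)
    d = len(rows[0])
    weights = [0.0] * d
    residual = list(targets)  # equals y - Xw while all weights are zero
    # Mean squared value of each feature column.
    col_sq = [sum(row[j] ** 2 for row in rows) / n for j in range(d)]
    for _ in range(sweeps):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                rows[i][j] * (residual[i] + weights[j] * rows[i][j]) for i in range(n)
            ) / n
            new_weight = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_weight - weights[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * rows[i][j]
                weights[j] = new_weight
    return weights


# Hypothetical toy data: three mutually orthogonal features, four samples.
rows = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
# Target depends strongly on feature 0, very weakly on feature 2, not at all on feature 1.
targets = [2.05, 1.95, -2.05, -1.95]

weights = lasso_coordinate_descent(rows, targets, penalty=0.1)
kept = [j for j, w in enumerate(weights) if w != 0.0]
print("weights:", weights)
print("features kept:", kept)  # only feature 0 survives the penalty
```

The penalty sets weak coefficients to exactly zero rather than merely making them small, which is why Lasso can serve as a feature-selection step as well as a regularizer.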

Autonomous Link Spam Detection in Purely Collaborative Environments

Andrew G. West, Avantika Agrawal, Phillip Baker, Brittney Exline, Insup Lee

Collaborative models (e.g., wikis) are an increasingly prevalent Web technology. However, the open-access that defines such systems can also be utilized for nefarious purposes. In particular, this paper examines the use of collaborative functionality to add inappropriate hyperlinks to destinations outside the host environment (i.e., link spam). The collaborative encyclopedia, Wikipedia, is the basis for our analysis.

Recent research has exposed vulnerabilities in Wikipedia’s link spam mitigation, finding that human editors are latent and dwindling in quantity. To this end, we propose and develop an autonomous classifier for link additions. Such a system presents unique challenges. For example, low barriers-to-entry invite a diversity of spam types, not just those with economic motivations. Moreover, issues can arise with how a link is presented (regardless of the destination).

In this work, a spam corpus is extracted from over 235,000 link additions to English Wikipedia. From this, 40+ features are codified and analyzed. These indicators are computed using wiki metadata, landing site analysis, and external data sources. The resulting classifier attains 64% recall at 0.5% false-positives (ROC-AUC = 0.97). Such performance could enable egregious link additions to be blocked automatically with low false-positive rates, while prioritizing the remainder for human inspection. Finally, a live Wikipedia implementation of the technique has been developed.
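
The two figures quoted above (recall at a fixed false-positive rate, and ROC-AUC) can be computed from classifier scores as in the following sketch. The scores and labels are invented for illustration and have nothing to do with the paper's corpus. The function names are hypothetical.

```python
def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def recall_at_fpr(scores, labels, max_fpr):
    """Best recall over score thresholds whose false-positive rate stays <= max_fpr."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    # Lowering the threshold can only add false positives, so stop at the first violation.
    for threshold in sorted(set(scores), reverse=True):
        false_positives = sum(1 for q in negatives if q >= threshold)
        if false_positives / len(negatives) > max_fpr:
            break
        true_positives = sum(1 for p in positives if p >= threshold)
        best = max(best, true_positives / len(positives))
    return best


# Made-up scores: label 1 marks spam, label 0 marks a legitimate link addition.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

print("ROC-AUC:", roc_auc(scores, labels))
print("recall at 0% false positives:", recall_at_fpr(scores, labels, 0.0))
print("recall at 20% false positives:", recall_at_fpr(scores, labels, 0.2))
```

Reporting recall at a very low false-positive rate, rather than AUC alone, matches the blocking use case: an automatic filter should only act where it almost never hits a legitimate edit.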

NICE: Social translucence through UI intervention

Aaron Halfaker, Bryan Song, D. Alex Stuart, Aniket Kittur, John Riedl

Social production systems such as Wikipedia rely on attracting and motivating volunteer contributions to be successful. One strong demotivating factor can be when an editor’s work is discarded, or “reverted”, by others. In this paper we demonstrate evidence of this effect and design a novel interface aimed at improving communication between the reverting and reverted editors. We deployed the interface in a controlled experiment on the live Wikipedia site, and report on changes in the behavior of 487 contributors who were reverted by editors using our interface. Our results suggest that simple interface modifications (such as informing Wikipedians that the editor they are reverting is a newcomer) can have substantial positive effects in protecting against contribution loss in newcomers and improving the quality of work done by more experienced contributors.

Session Preview: Designing for Open Collaboration

The technical session Designing for Open Collaboration will feature three presentations. See the schedule for details on when and where to go.

A Meta-reflective Wiki for Collaborative Design

Lu Zhu, Ivan Vaghi, Barbara Rita Barricelli

This paper presents MikiWiki, a meta-reflective wiki developed to prototype key aspects of the Hive-Mind Space model. MikiWiki is aimed at supporting End-User Development activities and exploring the opportunities to enable software tailoring at use time. Such an open-ended collaborative design process is realized by providing basic boundary object prototypes, allowing end users to remix, modify, and create their own boundary objects. Moreover, MikiWiki minimizes essential services at the server-side, while putting the main functionalities on the client-side, opening the whole system to its users for further tailoring. In addition to traditional wikis, MikiWiki allows different Communities of Practice to collaboratively design and to continuously evolve the whole system. This approach illustrates the meta-design concept, where some software collaboration between professional developers and end users is made possible through communication channels properly associated with the environment. As such, the MikiWiki environment is presented as a ‘concept demonstrator’ for meta-design and end-user tailoring.

Wiki Grows Up: Arbitrary Data Models, Access Control, and Beyond

Reid Priedhorsky, Loren Terveen

Ward Cunningham’s vision for the wiki was that it would be “the simplest online database that could possibly work”. We consider here a common manifestation of simplicity: the assumption that the objects in a wiki that can be edited (e.g., Wikipedia articles) are relatively independent. As wiki applications in new domains emerge, however, this assumption is no longer tenable. In wikis where the objects of interest are highly interdependent (e.g., geographic wikis), fundamental concepts like the revision and undoing must be refined. This is particularly so when fine-grained access control is required (as in enterprise wikis or wikis to support collaboration between citizens and government officials). We explore these issues in the context of the Cyclopath geowiki and present solutions that we have designed and have implemented or are implementing.

Design and Implementation of the Sweble Wikitext Parser: Unlocking the Structured Data of Wikipedia

Hannes Dohrn, Dirk Riehle

The heart of each wiki, including Wikipedia, is its content. Most machine processing starts and ends with this content. At present, such processing is limited, because most wiki engines today cannot provide a complete and precise representation of the wiki’s content. They can only generate HTML. The main reason is the lack of well-defined parsers that can handle the complexity of modern wiki markup. This applies to MediaWiki, the software running Wikipedia, and most other wiki engines.

This paper shows why it has been so difficult to develop comprehensive parsers for wiki markup. It presents the design and implementation of a parser for Wikitext, the wiki markup language of MediaWiki. We use parsing expression grammars where most parsers used no grammars or grammars poorly suited to the task. Using this parser it is possible to directly and precisely query the structured data within wikis, including Wikipedia.

The parser is available as open source from http://sweble.org.

Session Preview: Collaboration in Diverse Contexts

The technical session Collaboration in Diverse Contexts will feature three presentations. See the schedule for details on when and where to go.

Quality is a Verb: The Operationalization of Data Quality in a Citizen Science Community

S. Andrew Sheppard, Loren Terveen

Citizen science is becoming more valuable as a potential source of environmental data. Involving citizens in data collection has the added educational benefits of increased scientific awareness and local ownership of environmental concerns. However, a common concern among domain experts is the presumed lower quality of data submitted by volunteers. In this paper, we explore data quality assurance practices in River Watch, a community-based monitoring program in the Red River basin. We investigate how the participants in River Watch understand and prioritize data quality concerns. We found that data quality in River Watch is primarily maintained through universal adherence to standard operating procedures, but there remain areas where technological intervention may help. We also found that rigorous data quality assurance practices appear to enhance rather than hinder the educational goals of the program. We draw implications for the design of quality assurance mechanisms for River Watch and other citizen science projects.

Online and Offline Interactions in Online Communities

Wyl McCully, Cliff Lampe, Chandan Sarkar, Alcides Velasquez, Akshaya Sreevinasan

Online communities, while primarily enacted through technology-mediated environments, can also include offline meetings between members, promoting interactivity and community building. This study explores the offline interactions of online community members and their subsequent impact on online participation. We argue that offline interactions have a counterintuitive impact on online participation. Although these offline interactions strengthen relationships, these relationships undermine the community’s sustainability in terms of site participation. Participation has been defined as contribution of content to the online community. A multi-method analysis technique using content analysis, qualitative interviews, and server level quantitative data of users in Everything2.com supports our claim.

Don’t Leave Me Alone: Effectiveness of a Framed Wiki-Based Learning Activity

Nikolaos Tselios, Panagiota Altanopoulou, Vassilis Komis

In this paper, the effectiveness of a framed wiki-based learning activity is examined. A one-group pretest–posttest design was conducted towards this aim. The study involved 146 first year university students of a Greek Education Department using wikis to learn basic aspects and implications of search engines in the context of a first year course entitled “Introduction to ICT”. Data analysis showed significant improvement in learning outcomes, in particular for students with low initial performance. The average students’ questionnaire score jumped from 38.6% to 55%. In addition, a positive attitude towards using wikis in their project was expressed by the students. The design of the activity, the context of the study and the results obtained are discussed in detail.

Session Preview: Understanding Wikipedia

The technical session Understanding Wikipedia will feature four presentations. See the schedule for details on when and where to go.

WP:Clubhouse? An Exploration of Wikipedia’s Gender Imbalance

Shyong (Tony) K. Lam, Anuradha Uduwage, Zhenhua Dong, Shilad Sen, David R. Musicant, Loren Terveen, John Riedl

Wikipedia has rapidly become an invaluable destination for millions of information-seeking users. However, media reports suggest an important challenge: only a small fraction of Wikipedia’s legion of volunteer editors are female. In the current work, we present a scientific exploration of the gender imbalance in the English Wikipedia’s population of editors. We look at the nature of the imbalance itself, its effects on the quality of the encyclopedia, and several conflict-related factors that may be contributing to the gender gap. Our findings confirm the presence of a large gender gap among editors and a corresponding gender-oriented disparity in the content of Wikipedia’s articles. Further, we find evidence hinting at a culture that may be resistant to female participation.

Gender Differences in Wikipedia Editing

Judd Antin, Raymond Yee, Coye Cheshire, Oded Nov

As Wikipedia has become an indispensable source of online information, concerns about who writes, edits, and maintains it have come to the forefront. In particular, the 2010 UNU-MERIT survey found evidence of a significant gender skew: fewer than 13% of Wikipedia contributors are women. However, the number of contributors is just one way to examine gender differences in contribution. In this paper we take a more fine-grained perspective by examining how much and what types of Wiki-work men and women tend to do. First, we find that the so-called “Gender Gap” in number of editors may not be as wide as prior studies have suggested. Second, although more than 80% of editors in our sample were men, among the bottom 75% of editors by activity-level, we find that men and women made similar numbers of revisions. However, among the most active Wikipedians men tended to make many more revisions than women. Finally, we find that the most active women in our sample tended to make larger revisions than the most active men. We conclude by discussing directions for future research.

Finding Patterns in Behavioral Observations by Automatically Labeling Forms of Wikiwork in Barnstars

David W. McDonald, Sara Javanmardi, Mark Zachry

Our everyday observations about the behaviors of others around us shape how we decide to act or interact. In social media the ability to observe and interpret others’ behavior is limited. This work describes one approach to leverage everyday behavioral observations to develop tools that could improve understanding and sense making capabilities of contributors, managers and researchers of social media systems. One example of behavioral observation is Wikipedia Barnstars. Barnstars are a type of award recognizing the activities of Wikipedia editors. We mine the entire English Wikipedia to extract barnstar observations. We develop a multi-label classifier based on a random forest technique to recognize and label distinct forms of observed and acknowledged activity. We evaluate the classifier through several means including use of separate training and testing datasets and by the application of the classifier to previously unlabeled data. We use the classifier to identify Wikipedia editors who have been observed with some predominant types of behavior and explore whether those patterns of behavior are evident and how observers seem to be making the observations. We discuss how these types of activity observations can be used to develop tools and potentially improve understanding and analysis in wikis and other online communities.

What Wikipedia Deletes: Characterizing Dangerous Collaborative Content

Andrew G. West, Insup Lee

Collaborative environments, such as Wikipedia, often have low barriers-to-entry in order to encourage participation. This accessibility is frequently abused (e.g., vandalism and spam). However, certain inappropriate behaviors are more threatening than others. In this work, we study contributions which are not simply “undone” – but deleted from revision histories and public view. Such treatment is generally reserved for edits which: (1) present a legal liability to the host (e.g., copyright issues, defamation), or (2) present privacy threats to individuals (i.e., contact information).

Herein, we analyze one year of Wikipedia’s public deletion log and use brute-force strategies to learn about privately handled redactions. This permits insight about the prevalence of deletion, the reasons that induce it, and the extent of end-user exposure to dangerous content. While Wikipedia’s approach is generally quite reactive, we find that copyright issues prove most problematic of those behaviors studied.

Panel Preview: Apples to Oranges?

On Wednesday right before lunch, we have a great panel:

Apples to Oranges? Comparing across studies of open collaboration/peer production

Panelists: Judd Antin, Ed H. Chi, James Howison, Sharoda Paul, Aaron Shaw, Jude Yew

This panel seeks to begin a discussion of how we can meaningfully compare and contrast the diverse instances of open collaboration and peer production employed on the Internet today. Current research on the topic has tended to be too platform- (e.g. Wikipedia) or domain- (e.g. open source) specific. The panelists will be tasked with addressing this problem by bringing their own expertise and research projects to bear on the issue. Ultimately, the panel will seek to lay the foundations for the development of theoretical frameworks and principles for the design and application of systems based on open collaboration and commons-based peer production (CBPP).

WikiSym is seeking student volunteers

We are searching for enthusiastic students (undergraduate, graduate or PhD level) who want to help us run WikiSym 2011. We are celebrating our 7th edition on October 3-5, 2011 at the Microsoft Research Campus in Silicon Valley (Mountain View, California).

The only mandatory requirement is that you must have had student status for the past academic year (2010-2011). All students, regardless of discipline, are encouraged to apply, and no previous experience is required. This opportunity is particularly well suited for Bay Area students.

WikiSym volunteers commit to the following tasks:

  • Undertake between 5 and 10 hours of work during the conference days (October 3-5).
  • Collaborate with the conference organizing team to set up and run on-site activities.
  • Attend a short guidance session in the afternoon before the conference (Sunday, October 2), at the conference hotel (also in Mountain View).
  • Typical tasks will include staffing the registration desk, helping to set up sessions, supporting the Open Space track, and other similar duties.

No other technical or logistics support is foreseen, since permanent staff from the conference venue (MSR) will undertake those tasks.

In exchange, students will get free access to the conference (including meals, reception and dinner) for all three days.

Applicants must send a short motivational letter by September 28 to chair__at__wikisym–dot–org, explaining why they are interested in volunteering at WikiSym. On September 29 we will publish the list of selected volunteers. We have a limited number of volunteer slots, so please contact us early if you are interested.

Looking forward to meeting you at WikiSym!!

WikiSym 2011 Workshops Preview

Over the next 26 days, we’ll be publishing a series of posts that highlight the awesome content that forms the WikiSym 2011 program. We’ll include titles, authors, and in some cases abstracts.

Today’s installment: workshops! There will be three workshops, all on Monday, October 3. An important thing to note about WikiSym workshops is that attendance is open; one needn’t submit a position paper or be accepted, as is often the case in other conferences. Just show up! However, it never hurts to introduce yourself to the organizers ahead of time. Contact information for workshop organizers is available under the “further details” links.

(Aside: there is still time to make plans to attend WikiSym 2011!)

Lessons from the classroom: Successful techniques for teaching wikis using Wikipedia

LiAnna Davis and Timothy Senate

In the fall of 2010, the Wikimedia Foundation partnered with faculty from several top universities to introduce wiki technology and Wikipedia into class assignments of public policy related subjects. Through assignments based in Wikipedia, students improved skills in collaboration, critical thinking, expository writing, media literacy, and technology fluency. In video interviews, students describe their experience and the learning objectives that emerged through the Wikipedia assignment. Many students also commented on the satisfaction in producing a research document that had value beyond a grade. Professor Max Klein explains the success of his classroom use of a WikiProject page as a springboard for class discussion and homework assignments. Workshop participants experience some of the Wikipedia training modules through activities. This interactive workshop discloses some successes and failures of the Initiative and details specifically what makes a successful Wikipedia-editing assignment.

Hashtags: #wikisym #wsteach
Further details on this workshop (PDF)

WikiLit: Collecting the Wiki and Wikipedia Literature

Phoebe Ayers and Reid Priedhorsky

This workshop has three key goals. First, we will examine existing and proposed systems for collecting and analyzing the research literature about wikis. Second, we will discuss the challenges in building such a system and will engage participants to design a sustainable collaborative system to achieve this goal. Finally, we will provide a forum to build upon ongoing wiki community discussions about problems and opportunities in finding and sharing the wiki research literature.

Hashtags: #wikisym #wslit
Further details on this workshop

5th Workshop on Wikis for Software Engineering

Ademar Aguiar and Paulo Merson

Using a wiki in software engineering settings dates back to its first usage in 1995. In fact, that was the motivation for Ward Cunningham to create the first wiki. Due to their simplicity, attractiveness and effectiveness for collaborative authoring and knowledge management, wikis are now massively disseminated and used in different domains. This workshop focuses on wikis for the specific domain of software engineering. It aims at bringing together researchers, practitioners, and enthusiasts interested in researching, exploring and learning how wikis can be improved, customized and used to better support software projects. Based on lessons learned and obstacles identified, a research agenda will be defined with key opportunities and challenges.

Hashtags: #wikisym #wikis4se (Updated)
Further details on this workshop (PDF)

WikiSym 2011 social media analysis by EventBurn

We are pleased to announce social media analysis for WikiSym 2011 by EventBurn, a Minneapolis-based startup. To see a live summary of what folks have to say about WikiSym on Twitter, Facebook, and Flickr, browse to:

http://www.eventburn.com/wikisym2011

This dashboard is available now and will remain up until a few weeks after the conference. (And, of course, only public postings are included in the analysis.)

As a reminder, please tag your photos and posts with “wikisym” in order to share them with the rest of the WikiSym community and ensure they’re included in the summary.

Please let us know if you have any questions or feedback!

WikiViz 2011: Data Visualization Challenge

Data visualization is an emerging field of interest in many areas such as journalism, consulting or research. The abundance of digital information, and especially of open and publicly available datasets, is inspiring InfoViz practitioners and enthusiasts to surprise us with creative and beautiful visualizations.

At WikiSym, we have been planning the best way to promote and disseminate interest in this area, considering the advantages of open content, open datasets and open web technologies. Therefore, together with the Wikimedia Foundation, we decided to co-organize a challenge asking data/information visualization experts, computational journalists, data artists and data scientists to create the most insightful visualization of open collaboration data. We will also have several partners from design, innovation, and media collaborating with us in this contest.

The rules, schedule and topic of this year’s challenge will be published very soon. A committee of recognized InfoViz experts will review all submissions to select a winner and 2 finalists, who will be able to attend WikiSym 2011 next October at Microsoft Research Silicon Valley to present their creations and receive their awards.

To learn more about this challenge, you can visit the WikiViz page on the WikiSym 2011 website, or follow us on Twitter.

Deadline extended for posters and demos

You asked, we listened.

Again, we have received many requests for extra time to complete submissions. Thus, please note that the deadline for posters and demos has been extended to Friday, May 20. As usual, this deadline is in Apia time (that is, as long as it is May 20 somewhere on Earth, you will be able to submit your work).

Please follow the submission instructions on the conference website.