7th International Symposium on Wikis and Open Collaboration

WikiSym 2011

Mountain View, California
October 3-5, 2011


Design and Implementation of the Sweble Parser: Unlocking the Structure within Wikipedia

Authors: Hannes Dohrn and Dirk Riehle.

Abstract: The heart of each wiki, including Wikipedia, is its content. Most machine processing starts and ends with this content. At present, such processing is limited, because most wiki engines today cannot provide a complete and precise representation of the wiki's content. They can only generate HTML. The main reason is the lack of well-defined parsers that can handle the complexity of modern wiki markup. This applies to MediaWiki, the software running Wikipedia, and most other wiki engines. This paper shows why it has been so difficult to develop comprehensive parsers for wiki markup. It presents the design and implementation of a parser for Wikitext, the wiki markup language of MediaWiki. We use parsing expression grammars where most parsers used no grammars or grammars poorly suited to the task. Using this parser it is possible to directly and precisely query the structured data within wikis, including Wikipedia. The parser is available as open source from http://sweble.org.
