LanguageTool participated in Google Summer of Code 2011 (GSoC 2011). What is it? Quoting the GSoC homepage, "Google Summer of Code is a global program that offers student developers stipends to write code for various open source software projects."
Both LanguageTool projects have been successfully completed this year:
Michael Bryant's project was to add language identification and to enable us to reuse linguistic resources from other projects. The reuse work had two parts: importing the rules of the After the Deadline grammar checker (some of them will be included in version 1.5 after some additional checking), and converting Constraint Grammar (CG) rules into the format of our disambiguation rules. CG is widely used for Scandinavian languages, and we hope that an easy way to convert CG rules will enable further steps towards deeper linguistic analysis or parsing in LanguageTool without making it too heavy on resources. As far as we know, Michael's conversion of CG rules is the first open-source Java implementation of CG. It is also practical proof that our disambiguation rules have expressive power similar to that of CG.
Tao Lin's project was twofold. The first part was to develop a Lucene-based indexing tool that makes it possible to run a rule against a large amount of text. Checking large texts usually takes a long time, but with this tool a rule can be tested within seconds. The other part of Tao's project was to add support for Chinese to LanguageTool. The upcoming version 1.5 of LanguageTool will thus contain more than 200 rules for Chinese text.
Documentation by the GSoC participants
- How to Use Indexer and Searcher for Fast Rule Evaluation (Tao Lin)
- Developing Chinese rules (Tao Lin)
- Adding A New Language To Automatic Language Detection (Michael Bryant)