GSoC Application

[This page is our internal preparation for GSoC application]

Organization Name:
LanguageTool

Description:

  • LanguageTool is an Open Source proofreading tool that supports traditional checks associated with grammar checkers but goes far beyond that to check style and frequent language misuse.
  • It supports 29 languages to a varying degree (see http://www.languagetool.org/languages/)
  • LanguageTool is integrated as a plug-in into multiple environments, including LibreOffice, Apache OpenOffice, Firefox, Thunderbird, vim, CheckMate, OmegaT, LyX, and Mac OS X (experimentally).
  • It is a cutting-edge project whose features attract attention from the Natural Language Processing research community; over a dozen papers have been published covering new developments and ideas in the project.

Home page:
http://www.languagetool.org

Main Organization License:
LGPL 2.1

Why is your organization applying to participate in GSoC 2013? What do you hope to gain by participating?

  • LanguageTool needs more people working on more languages, especially native speakers who are competent enough to proofread texts and evaluate rules. We hope to attract students who are interested in Natural Language Processing to help develop more, and more robust, error-detection rules for different purposes, not only standard word processing.
  • We hope to find students who are willing to spend a bit more time working on code that is not strictly related to linguistic processing but affects the usability of our software. We need better UIs and integration with more systems. Our core developers are not UI experts and have never really had time to work on the UI. In particular, we need to integrate LT more closely with web-based environments and with other open source / libre projects.
  • We are a small project but already support many languages. One more is planned (Norwegian, in its two versions). We hope to add even more, especially minority languages.

If accepted, would this be your first year participating in GSoC?

No

Did your organization participate in past GSoCs? If so, please summarize your involvement and the successes and challenges of your participation.

We participated in 2011 with two students. One student added support for Chinese and developed a feature that allows large amounts of text to be checked very quickly by indexing the input text. The other student developed a way to port the rules of another Open Source grammar checker automatically and prepared the ground for the future inclusion of more Scandinavian languages that use a specific grammar formalism called 'Constraint Grammar'. (More documentation can be found at http://www.languagetool.org/gsoc2011/)

Both students passed, and their code proved useful and became part of the official LanguageTool releases. The main challenge was getting the students started. Once that had happened, they worked effectively on their tasks. We will try to avoid a slow start-up this time with even more detailed planning (see below).

If your organization participated in past GSoCs, please let us know the ratio of students passing to students allocated, e.g. 2006: 3/6 for 3 out of 6 students passed in 2006.

2011: 2/2

What is the URL for your ideas page?
http://languagetool.wikidot.com/missing-features

What is the main development mailing list for your organization?

What is the main IRC channel for your organization?

  • We don't use IRC for our project. All communication is on the mailing list, which helps us to keep track of the discussions.

Does your organization have an application template you would like to see students use?

We expect students to contact us using e-mail; we will make sure we get the following information from all applicants:

  • Name, e-mail address, and other information that may be useful for contact
  • Why are you interested in grammar checking or proofreading?
  • Why are you interested in the LanguageTool project?
  • Which of the published tasks at http://languagetool.wikidot.com/missing-features are you interested in? What do you plan to do?
  • Do you have your own ideas that are not listed on our page?
  • Include a one- or two-page proposal, including a title, reasons why Google and LanguageTool should sponsor it, a description of how it will benefit society and whom, and a detailed work plan including, if possible, a brief schedule with milestones and deliverables. Include the time needed to think, to program, to debug, to document and to disseminate. The detailed plan should cover the whole duration of the program, and for the first two weeks it should be so detailed that you have one or more specific tasks for each day. We know it is difficult to plan ahead like that, but it will help you later when you are actually working on your tasks.
  • List your skills and give evidence of your qualifications. Tell us your current field of study, major, etc. Convince us that you can do the work. In particular, we would like to know whether you have worked on Open Source projects before.
  • Please list any non-Summer-of-Code plans you have for the Summer, especially employment and class-taking. Be specific about schedules and time commitments. List the number of hours you can work on your GSoC projects per week.

What criteria did you use to select the individuals who will act as mentors for your organization? Please be as specific as possible.

The mentors are long-term members of the project, and they speak different languages, which is important for mentoring projects that involve multiple languages.

  • Daniel Naber started the LanguageTool project in 2003. He created the architecture, wrote the first versions and contributes to other language-related Open Source projects as well. He works as a software developer in Berlin, Germany. Today, he's co-maintainer together with Marcin.
  • Marcin Miłkowski extended LanguageTool to cover more languages, developed the current XML encoding of the rules, and contributes to many other Open Source projects. He has published research papers in natural language engineering covering the cutting-edge features of LanguageTool. He works as a researcher at the Polish Academy of Sciences. He's a co-maintainer of the project.
  • Yakov Reztsov contributes to our Russian module and has prepared various data sets for his language.
  • Andriy Rysin has created our Ukrainian module and is currently working on a tag dictionary for the Ukrainian language. He works as a software developer in Cary, NC, USA.

What is your plan for dealing with disappearing students?

  • Students will be encouraged to let us know how they want to break up their time, and to plan for holidays and absences (see application template). This way neither mentors nor students will waste their time. If a mentor reports an unscheduled disappearance of a student (72-hour silence), the student will be contacted by the administrators. If the silence persists, the student's task will be frozen and we will report to Google.

What is your plan for dealing with disappearing mentors?

  • This is quite unlikely, since the mentors are active developers with a long-term commitment to the project. Students will be instructed to contact the administrators if a mentor fails to respond adequately. The administrators will examine the situation; if the disappearance (48-hour silence) is confirmed, the student will be assigned a different mentor and Google will be informed.

What steps will you take to encourage students to interact with your project's community before and during the program?

  • Developers who have been chosen as mentors will be available on the mailing list so that students can receive guidance on any problem they encounter during development and before deciding which task to select.
  • We will try to get students involved in the project as early as possible by asking them to submit patches for simple tasks first and by reviewing the code they submit.

What will you do to encourage that your accepted students stick with the project after Google Summer of Code concludes?

  • We are very liberal with our SVN commit rights, so any successful student will have full commit rights on the master branch right from the beginning and can thus feel part of the team. We also try to avoid using branches, where feasible, to make sure all results end up in the next release. As releases happen every three months, developers get fast feedback on their work. We will also give students due credit on our website.

If you are a small or new organization applying to GSoC, please list a larger, established GSoC organization or a Googler that can vouch for you here.

n/a

If you are a large organization who is vouching for a small organization applying to GSoC for their first time this year, please list their name and why you think they'd be good candidates for GSoC here:

n/a
