This wiki has been moved to - this page is archived and is not updated anymore

After the Deadline
After the Deadline is open-source language-checking software. Refer to
An API (application programming interface) is a specification that describes how different software components interact. Refer to
AtD is an abbreviation of After the Deadline.
In an XML document, an attribute is a part of an element. An attribute gives additional information about an element or the content of an element.
Refer to finite-state automaton.
AWK is a programming language that is designed for text processing. Refer to
base form
The term base form is a synonym of the term lemma.
bilingual text
A bilingual text is a text that contains 2 languages. Segments (usually sentences) are 'aligned'. For example, in a spreadsheet, one column contains source sentences and another column contains translated sentences. Refer to
Refer to bilingual text.
Brill tagger
A Brill tagger is a type of POS tagger. A Brill tagger uses a small set of language rules to assign a part of speech to a word. Refer to and
In the grammar.xml file, a category is an element that is used to put rules into groups. Refer to
A chunk (or phrase) is a group of one or more words that has a particular part of speech. Refer to Refer also to noun chunk; verb chunk.
chunk tag
A chunk tag is a tag that specifies the part of speech that a chunk has. A chunk tag also gives other information such as whether a word in a chunk is at the start of the chunk. Refer to
A chunker is software that partitions plain text into sequences of semantically related words. While a POS tagger only works on single words, the chunker uses the results of the POS tagger and covers sequences longer than one word.
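As a rough sketch of how a chunker can build on POS tagger output, the following marks noun chunks in a list of already tagged words. The POS tag names, the chunk tag names (B-NP, I-NP, O), and the grouping rule are illustrative assumptions, not LanguageTool's own tag set or algorithm.

```python
# POS tags that this toy chunker treats as part of a noun chunk.
# The tag names are assumptions made for illustration.
NP_TAGS = {"DET", "ADJ", "NOUN"}

def chunk_tags(tagged_words):
    """Assign a chunk tag to each (word, pos) pair.

    B-NP marks the first word of a noun chunk, I-NP marks a later
    word in the same chunk, and O marks a word outside any chunk.
    """
    result = []
    inside = False
    for _word, pos in tagged_words:
        if pos in NP_TAGS:
            result.append("I-NP" if inside else "B-NP")
            inside = True
        else:
            result.append("O")
            inside = False
    return result

tagged = [("the", "DET"), ("old", "ADJ"), ("dog", "NOUN"), ("barks", "VERB")]
tags = chunk_tags(tagged)
```

The rule is deliberately crude: two adjacent noun chunks would be merged into one, which a real chunker has to avoid.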
A committer is a person who can put changes to LanguageTool on the LanguageTool repository in GitHub. The changes can be to the software, the rules, or to both. Refer also to language maintainer.
configuration file
A configuration file is a file that stores a user's settings or preferences. In LanguageTool, the configuration file is .languagetool.cfg in the user's home directory.
Constraint Grammar
Constraint Grammar is a language-independent method of parsing text. Refer to
corpus query language
A corpus query language is a method for extracting text samples from a corpus of text.
curl is command-line software that uses URL syntax to get or to send files. Refer to
disambiguation.xml is the LanguageTool file that contains the rules for a disambiguator. Not all languages have a disambiguation.xml file. Refer to
A disambiguator is software that tries to determine the part of speech that a word has in a particular context. (In many languages, a word can have more than one part of speech. For example, in English, the word help is both a noun and a verb. In the sentence, "Give me help," the word help is a noun.) In LanguageTool, if a language has a rule-based disambiguator, then the rules are in the disambiguation.xml file.
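A minimal sketch of rule-based disambiguation, assuming a toy lexicon and a single hand-written rule. The tag names and the rule are hypothetical; in LanguageTool the actual rules are written in XML in disambiguation.xml, not in Python.

```python
# Toy lexicon: each word maps to the parts of speech it can have.
# The tag names are illustrative and are not LanguageTool's tag set.
LEXICON = {
    "thanks": {"NOUN"},
    "for": {"PREP"},
    "the": {"DET"},
    "help": {"NOUN", "VERB"},
}

def disambiguate(words):
    """Return the remaining candidate tags for each word."""
    result = []
    previous_tags = set()
    for word in words:
        tags = set(LEXICON.get(word.lower(), {"UNKNOWN"}))
        # Toy rule: directly after a determiner, a word that can be
        # a noun or a verb is read as a noun.
        if previous_tags == {"DET"} and {"NOUN", "VERB"} <= tags:
            tags = {"NOUN"}
        result.append(tags)
        previous_tags = tags
    return result

readings = disambiguate(["Thanks", "for", "the", "help"])
```

Without context, help keeps both readings; after the determiner the, only the noun reading remains.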
Eclipse is software that helps software developers to write programs in the Java programming language. Refer to
An element is part of an XML document. Each XML document contains a hierarchy of elements. Each element represents the structure of some information. For example, in the grammar.xml file, each rule element contains information about a particular grammar rule. An element can have one or more attributes.
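To illustrate elements and attributes, the sketch below parses a small XML fragment with Python's standard library. The fragment is a simplified, hypothetical stand-in for a rule in grammar.xml: the id, name, and message values are invented, and the real file has more structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment; not copied from a real grammar.xml.
FRAGMENT = """
<category name="Example category">
  <rule id="EXAMPLE_RULE" name="Example rule">
    <message>Example message.</message>
  </rule>
</category>
"""

root = ET.fromstring(FRAGMENT)   # root element: <category>
rule = root.find("rule")         # child element: <rule>

# Attributes give additional information about the <rule> element.
rule_attributes = dict(rule.attrib)

# The text content of the nested <message> element.
message_text = rule.find("message").text
```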
false friend
False friends are a pair of words or phrases in 2 different languages that look or sound almost the same, but which have different meanings. For example, the English word gift looks the same as the German word Gift, which means poison.
filter (verb)
In the context of disambiguation, to filter means to remove all but one specified POS tag. Refer to
finite-state automaton (FSA)
A finite-state automaton is an abstract machine that is in exactly one of a finite number of states at any time. In certain conditions, the FSA can change to a different state. Refer to and to
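A small sketch of a finite-state automaton as a transition table. The state names and the accepted language (strings of a and b that end in ab) are chosen only for illustration and have nothing to do with LanguageTool's own use of automata.

```python
# Transition table: (current state, input symbol) -> next state.
# This automaton accepts strings of "a" and "b" that end in "ab".
TRANSITIONS = {
    ("start", "a"): "seen_a",
    ("start", "b"): "start",
    ("seen_a", "a"): "seen_a",
    ("seen_a", "b"): "seen_ab",
    ("seen_ab", "a"): "seen_a",
    ("seen_ab", "b"): "start",
}
ACCEPTING = {"seen_ab"}

def accepts(text):
    """Run the automaton over text and report whether it accepts."""
    state = "start"
    for symbol in text:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return False  # no transition is defined for this symbol
        state = TRANSITIONS[key]
    return state in ACCEPTING
```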
finite-state machine (FSM)
Refer to finite-state automaton.
Refer to finite-state automaton.
GitHub is a website that helps software developers to collaborate, to review software code, and to manage software code. The software code for LanguageTool is on
grammar.xml is the LanguageTool file that contains the error detection rules that LanguageTool uses in its evaluation of text. Refer to
A GUI is a 'graphical user interface'. Typically, a screen shows small images, boxes in which to enter text, and buttons to click. Refer to
Hunspell is a spell checker and a morphological analyzer. Hunspell is designed for languages that have rich morphology, complex compounds, and complex character encoding. Refer to and to
To immunize means to prevent LanguageTool from matching one or more words. Refer to
inflected (adj)
Refer to inflection.
An inflection is the form of a particular lemma. For example, in English, the lemma break has the inflections break, breaks, broke, broken, and breaking.
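The relation between a lemma and its inflections can be sketched as a pair of lookup tables. The data is the example from the entry above; the function name is hypothetical.

```python
# Each lemma maps to its inflected forms.
INFLECTIONS = {
    "break": ["break", "breaks", "broke", "broken", "breaking"],
}

# Reverse table: each inflected form maps back to its lemma.
LEMMA_OF = {
    form: lemma
    for lemma, forms in INFLECTIONS.items()
    for form in forms
}

def lemma_of(word):
    """Return the lemma of word, or None if the word is unknown."""
    return LEMMA_OF.get(word.lower())
```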
Internationalization Tag Set (ITS)
The Internationalization Tag Set (ITS) is a technology that helps people to create XML that is internationalized and that can be localized effectively. Refer to and to
Refer to Internationalization Tag Set (ITS).
Java is the programming language that is used to implement LanguageTool. Refer to
Javadoc is software from Oracle that uses comments in the source code to create API documentation in HTML format. Refer to
language code
A language code is a code that represents a language, and possibly, a variant of a language. For example, the language code for German is de. Different standards for language codes exist. LanguageTool uses IETF language tags.
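A minimal sketch of splitting a language code into a language and an optional region, for codes such as de or de-DE. This is not a full parser for IETF language tags: script, variant, extension, and private-use subtags are not handled, and the function name is an assumption.

```python
def split_language_code(code):
    """Split a code such as 'de-DE' into ('de', 'DE').

    The region is None when the code has no region subtag.
    """
    subtags = code.split("-")
    language = subtags[0].lower()
    region = None
    for subtag in subtags[1:]:
        # A region is 2 letters (for example DE) or 3 digits.
        is_alpha_region = len(subtag) == 2 and subtag.isalpha()
        is_digit_region = len(subtag) == 3 and subtag.isdigit()
        if is_alpha_region or is_digit_region:
            region = subtag.upper()
            break
    return language, region
```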
language maintainer
A language maintainer is a committer who maintains the rules in LanguageTool for one or more languages and who translates the LanguageTool GUI.
A lemma is the 'base form' of a word or of a group of words that is one lexical unit (lexeme). English examples: help, laser printer, put up with.
LGPL (GNU Lesser General Public License) is a licence to use software. Refer to
A search engine library that LanguageTool uses internally for some of the Wikipedia-related features. See
markup language
A markup language is a computer language that uses special text to specify the structure or the style of the content of a document. Refer to element; XML.
Maven is a 'build automation tool' for software developers. Maven compiles and packages software, and manages dependencies. Refer to
The Morfologik stemming library is used by LT for dictionary lookups. Morfologik allows compressing large text dictionaries into small binary files with fast word lookup.
noun chunk (noun phrase)
A noun chunk is one or more words that act as a noun. Refer to LanguageTool identifies noun chunks using a chunker. Refer to
noun phrase
Refer to noun chunk.
part-of-speech tag
Refer to POS tag.
part-of-speech tagger
Refer to POS tagger.
POS is an abbreviation for part of speech.
POS tag (part-of-speech tag)
A POS tag is a tag that identifies the possible parts of speech that a word has. Usually, in LanguageTool, a part-of-speech tag also includes other information, such as whether a noun is singular or plural.
POS tagger
A POS tagger is software that annotates (tags) words with part-of-speech information. A POS tagger gives each word one or more POS tags. Refer to
property file
reading
Many English words can be, for example, a verb or a noun, depending on their context, so they have two readings. Example: "walk" can be a verb ("I walk home") or a noun ("I took a walk").
regular expression
A regular expression is text that specifies a search pattern. Refer to
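For example, the following sketch uses Python's re module to find a word that is immediately repeated. The pattern and the function name are chosen for illustration only.

```python
import re

# \b(\w+) captures one whole word, \s+ matches the whitespace after
# it, and \1 requires the same word again.
REPEATED_WORD = re.compile(r"\b(\w+)\s+\1\b")

def find_repeated_words(text):
    """Return every directly repeated word pair found in text."""
    return [match.group(0) for match in REPEATED_WORD.finditer(text)]
```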
A rulegroup is an element in the grammar.xml file. If more than one rule is necessary to find an error, you can put all the applicable rules into a rulegroup. Refer to
shallow parser
The term shallow parser is a synonym for the term chunker.
skip (verb)
To skip means to optionally ignore one or more tokens in a sequence of tokens. For example, you may want to ignore a punctuation mark if one exists in a sequence of text. Refer to
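The idea of skipping can be sketched as a match that tolerates a limited number of tokens between two expected tokens. The function and its max_skip parameter are hypothetical and only mimic the concept; they are not LanguageTool's rule syntax.

```python
def matches_with_skip(tokens, first, second, max_skip):
    """True if second follows first with at most max_skip tokens between."""
    for i, token in enumerate(tokens):
        if token != first:
            continue
        # Tokens that may hold `second`: the next one, plus up to
        # max_skip further tokens.
        window = tokens[i + 1 : i + 2 + max_skip]
        if second in window:
            return True
    return False

tokens = ["however", ",", "it", "works"]
```

With max_skip set to 1, the comma between however and it is ignored; with 0, the two tokens must be adjacent.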
SRX (Segmentation Rules eXchange) is a method of specifying the segmentation rules that software uses to split text into segments. (Typically, a segment is equivalent to a sentence.) Refer to
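A rough sketch of the idea behind such segmentation rules: each rule pairs a pattern for the text before a candidate position with a pattern for the text after it, and says whether to break there. The first rule that matches at a position decides. The rules below are invented examples written in Python, not taken from a real SRX file.

```python
import re

# (is_break, pattern before the position, pattern after the position).
# Exception rules come first so that they win over the general rule.
RULES = [
    (False, r"\b(?:Mr|Dr)\.", r"\s"),  # no break after these abbreviations
    (True, r"[.?!]", r"\s"),           # break after final punctuation
]

def segment(text):
    """Split text into segments using the first matching rule per position."""
    segments = []
    start = 0
    for pos in range(1, len(text)):
        for is_break, before, after in RULES:
            before_ok = re.search("(?:" + before + ")$", text[:pos])
            after_ok = re.match(after, text[pos:])
            if before_ok and after_ok:
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # the first matching rule decides
    segments.append(text[start:])
    return segments

parts = segment("Dr. Smith came. He left.")
```

In this sketch the whitespace after a break stays at the start of the next segment.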
Refer to POS tagger.
testrules is software that you can use to make sure that the rules that you write in LanguageTool are correct. Refer to
Tika™ is software that detects and extracts metadata (data about data) and structured text content from documents. Tika is supplied by Apache. Refer to
A token is an instance of a sequence of characters in some particular document that are grouped together as a useful semantic unit for processing. Typically, in LanguageTool, a token is equivalent to a word. However, punctuation marks are also tokens.
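A minimal sketch of splitting text into word tokens and punctuation tokens with a regular expression. This is an assumption-laden toy, not LanguageTool's tokenizer, and it simply drops whitespace.

```python
import re

def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of word characters; [^\w\s] matches one
    # character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("Give me help, please.")
```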
Transifex is an online translation-management system. The translations for LanguageTool are on
Unification is the matching of sequences of tokens that have some specified features. Refer to
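One way to picture unification is as an agreement check: a sequence of tokens matches only if all tokens share the same value for each specified feature. The sketch below assumes one reading per token and a hypothetical number feature; real tokens can have several readings, which this toy ignores.

```python
def unify(tokens, features):
    """True if all tokens have the same value for every named feature."""
    for feature in features:
        values = {token[feature] for token in tokens}
        if len(values) != 1:
            return False
    return True

# Hypothetical tokens with a single "number" feature each.
these_books = [
    {"word": "these", "number": "plural"},
    {"word": "books", "number": "plural"},
]
this_books = [
    {"word": "this", "number": "singular"},
    {"word": "books", "number": "plural"},
]
```

Here these books unifies on number, while this books does not, which is the kind of mismatch an agreement rule would report.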
Refer to unification.
verb chunk
[In LT, what is a verb chunk?]
verb phrase
Refer to verb chunk.
XML is a markup language. Refer to In LanguageTool, the language rules are specified using XML. Refer to disambiguation.xml; grammar.xml;
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License