Checking Translations or Bilingual Texts

LanguageTool can check bilingual texts (bitexts). The bitexts should be aligned at the sentence level, not the paragraph level, because paragraph-level alignment makes the rules generate more false alarms.

Rules in bitext mode

In bitext mode, the following rules are used:

  • false friend rules: these match only when both the source and the target contain the false friend terms
  • rules for the target language
  • generic bitext rules (written in Java)
  • bilingual rules for the target language, if any are defined in the /rules/[language]/bitext.xml file, where [language] is the two-letter code of the language.
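To make the false-friend condition concrete, here is a hypothetical sketch (not LanguageTool's implementation, whose false-friend rules are defined in XML and match on analyzed tokens): a pair fires only when the source contains the source term and the target contains its misleading look-alike. The class, record, and the English/German example pair are illustrative assumptions.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch, not LanguageTool code: a false-friend pair fires only if
// the source contains sourceTerm AND the target contains targetTerm.
final class FalseFriendSketch {

    record FalseFriend(String sourceTerm, String targetTerm, String suggestion) {}

    // Splits on runs of non-word characters; Unicode-aware, so "ä" stays inside a word.
    private static final Pattern NON_WORD =
            Pattern.compile("\\W+", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean containsWord(String text, String word) {
        String wanted = word.toLowerCase(Locale.ROOT);
        for (String token : NON_WORD.split(text.toLowerCase(Locale.ROOT))) {
            if (token.equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    // Both sides must contain their term; a match on one side alone is ignored.
    static boolean matches(FalseFriend ff, String source, String target) {
        return containsWord(source, ff.sourceTerm()) && containsWord(target, ff.targetTerm());
    }

    public static void main(String[] args) {
        FalseFriend ff = new FalseFriend("actually", "aktuell", "tatsächlich");
        String source = "The figures are actually correct.";
        String target = "Die Zahlen sind aktuell korrekt.";
        if (matches(ff, source, target)) {
            System.out.println("Possible false friend '" + ff.targetTerm()
                    + "'; did you mean '" + ff.suggestion() + "'?");
        }
    }
}
```

Requiring both sides to match is what keeps a correct use of "aktuell" (for example, translating "current") from being flagged.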

The generic bitext rules include the rules that check:

  • whether the length of the target is radically different from the length of the source (TRANSLATION_LENGTH)
  • whether the target is identical to the source in sentences of more than one word, i.e. the segment was left untranslated (SAME_TRANSLATION)
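The two generic checks can be approximated as follows. This is a simplified sketch, not the actual rule classes: the length bounds (target between 30% and 250% of the source length) and the whitespace-based word count are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

// Simplified sketch of the two generic bitext checks; NOT LanguageTool's code.
final class GenericBitextChecks {

    // Assumed, illustrative bounds: target length as a percentage of source length.
    static final int MIN_LENGTH_PERCENT = 30;
    static final int MAX_LENGTH_PERCENT = 250;

    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // TRANSLATION_LENGTH-style check: true if the target length is radically different.
    static boolean lengthRadicallyDifferent(String source, String target) {
        int srcLen = source.strip().length();
        int trgLen = target.strip().length();
        if (srcLen == 0) {
            return false; // nothing to compare against
        }
        int percent = trgLen * 100 / srcLen;
        return percent < MIN_LENGTH_PERCENT || percent > MAX_LENGTH_PERCENT;
    }

    // SAME_TRANSLATION-style check: true if a multi-word source was copied unchanged.
    static boolean sameAsSource(String source, String target) {
        String src = source.strip();
        int words = src.isEmpty() ? 0 : WHITESPACE.split(src).length;
        return words > 1 && src.equals(target.strip());
    }
}
```

Single-word segments are exempt from the second check because names, numbers, and short labels are often legitimately identical in both languages.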

In the future, once a general mechanism for adding custom rules is implemented, custom rules will be applied in bitext mode as well.

Checking from the command line

To check a bitext from the command line, use -m to specify the source language, -l to specify the target language, and -b2 to switch on bitext mode. All other command-line options work as in monolingual mode.

NOTE: in version 1.1, the only input format supported on the command line is a tab-separated text file, where the first field of each line contains the source and the second field contains the target. Programmatically, you can also use WordFast Translation Memory files, but this format is not yet available on the command line.
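If your aligned sentences live in two lists, a sketch like the following can produce the tab-separated input file described in the note. The class and method names are hypothetical; UTF-8 output is an assumption (use whatever encoding your LanguageTool setup expects), and tabs or line breaks inside a segment are flattened to spaces because they would otherwise break the one-pair-per-line format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: writes sentence-aligned pairs as "source<TAB>target", one pair per line.
final class TabbedBitextFile {

    // Formats one aligned pair; the source always comes first.
    static String toLine(String source, String target) {
        return clean(source) + "\t" + clean(target);
    }

    // Tabs and line breaks inside a segment would corrupt the format, so replace them with a space.
    private static String clean(String segment) {
        return segment.replaceAll("[\\t\\r\\n]+", " ").strip();
    }

    static void write(Path file, List<String> sources, List<String> targets) throws IOException {
        if (sources.size() != targets.size()) {
            throw new IllegalArgumentException(
                    "Source and target must contain the same number of sentences");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < sources.size(); i++) {
            lines.add(toLine(sources.get(i), targets.get(i)));
        }
        Files.write(file, lines, StandardCharsets.UTF_8); // UTF-8 is an assumption
    }
}
```

Refusing lists of different lengths is deliberate: the bitext rules assume each line is one aligned sentence pair.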

OmegaT

OmegaT has a plugin that can be installed in the OmegaT main directory (this requires the downloaded version of OmegaT, not the web-start version). See the README of the plugin for details. The plugin lets you check translations as you work.

CheckMate

With CheckMate, LanguageTool can check bilingual data in many more formats, and CheckMate also provides a convenient user interface:

[Screenshot: LanguageTool in CheckMate (lt-checkmate.png)]

API

HTTP Server

Use the standard query format, and pass the source segment in the srctext parameter, the target segment in text, the source language in mothertongue, and the target language in lang.
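A minimal sketch of building such a query in Java, using the four parameter names from the paragraph above. The host, port (localhost:8081), and root path in main() are assumptions about a local server; adjust them to your setup. The sketch only builds the URL and does not send the request.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that builds a bitext query string for the LanguageTool HTTP server.
final class BitextQuery {

    static String build(String srcText, String targetText, String motherTongue, String lang) {
        // LinkedHashMap keeps the parameters in a predictable order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("lang", lang);                 // target language
        params.put("mothertongue", motherTongue); // source language
        params.put("srctext", srcText);           // source segment
        params.put("text", targetText);           // target segment
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // Assumption: a LanguageTool server listening on localhost:8081.
        String url = "http://localhost:8081/?" + build("I like it.", "Ich mag es.", "en", "de");
        System.out.println(url);
    }
}
```

Every value is URL-encoded because segments routinely contain spaces, ampersands, and non-ASCII characters.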

Java

To use bitext checking, create two JLanguageTool instances (one for the source language and one for the target language), and then use the utility methods of the Tools class. For example, if sourceText is the source string and text is the target string, you can call LanguageTool like this:

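// getLanguageToolInstance() is not defined on this page: it stands for any helper
// that returns a ready-to-use JLanguageTool for the given language
final JLanguageTool sourceLt = getLanguageToolInstance(motherTongue, null);
final JLanguageTool targetLt = getLanguageToolInstance(lang, null);
// the bitext rules for this language pair (generic Java rules plus bitext.xml rules)
final List<BitextRule> bRules = Tools.getBitextRules(motherTongue, lang);
// check the aligned source/target pair and collect every error found
final List<RuleMatch> matches = Tools.checkBitext(sourceText, text, sourceLt, targetLt, bRules);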

The Tools class also has several other overloaded versions of checkBitext, including some that print results directly to standard output. The version called above is the one to use from other software, because it returns a list of RuleMatch objects describing all errors found.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License