Checking (La)TeX With LanguageTool

This wiki has moved to https://dev.languagetool.org; this page is archived and no longer updated.

Unlike ispell, aspell and hunspell, LanguageTool does not strip TeX formatting markup, because doing so properly would require a full-blown TeX parser (TeX markup is Turing-complete and hence hard to parse). However, there are several workarounds.

Using TeXtidote

TeXtidote is a command-line tool that checks LaTeX files with LanguageTool: https://sylvainhalle.github.io/textidote/

Using Tex2txt

Use the Tex2txt Python script.

Using TeXstudio

Note: You need TeXstudio 2.12.2 or later to work with LanguageTool 3.6 or later.

TeXstudio (TXS) supports LanguageTool as an inline grammar checker. You need to configure TXS first under Options > Configure TeXstudio… > Language Checking, like this:

[Screenshot: texstudio.png, the Language Checking settings in TeXstudio]

Start the stand-alone version of LanguageTool (languagetool.jar), open Options, and enable the option to run it as an HTTP server. By default, the server uses port 8081, which is also the default for TXS:

[Screenshot: lt-http.png, the LanguageTool option to run as an HTTP server]
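Other programs can use the same server. The Python sketch below (standard library only) sends a piece of text to it and summarizes the reply. It assumes a LanguageTool version whose HTTP server provides the JSON endpoint /v2/check, taking the form parameters text and language and answering with a list of matches; the function names check_text and summarize are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: LanguageTool runs as an HTTP server on the default port 8081
# and provides the JSON endpoint /v2/check.
LT_URL = "http://localhost:8081/v2/check"


def check_text(text, language="en-US", url=LT_URL):
    """POST `text` to the LanguageTool server and return the decoded JSON reply."""
    data = urllib.parse.urlencode({"text": text, "language": language}).encode("utf-8")
    with urllib.request.urlopen(url, data=data, timeout=30) as response:
        return json.load(response)


def summarize(reply, text):
    """Turn the `matches` of a /v2/check reply into (error, message, suggestions).

    `offset`/`length` are used directly as Python string indices. This is exact
    for text without characters outside the Basic Multilingual Plane
    (LanguageTool presumably counts Java chars, i.e. UTF-16 code units)."""
    result = []
    for match in reply.get("matches", []):
        start, length = match["offset"], match["length"]
        # Keep at most three suggested replacements per error.
        suggestions = [r["value"] for r in match.get("replacements", [])][:3]
        result.append((text[start:start + length], match["message"], suggestions))
    return result
```

With the server running, summarize(check_text(s), s) lists each flagged span of the string s together with the rule message and up to three suggestions.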

If TeXstudio is configured to use inline grammar checking (the default; see Options > Configure TeXstudio… > Editor > Inline Checking > Grammar), you should see grammar errors highlighted immediately:

[Screenshot: txs1.png, grammar errors highlighted in TeXstudio]

As you can see, there are still some quirks (\cite is handled incorrectly in the example above), but this approach is still much easier to use than the other options.

Using OmegaT with the LanguageTool plugin

You can correct the .tex source directly with LanguageTool by using OmegaT, a computer-assisted translation (CAT) tool. What I describe below is a simple setup that creates an artificial "translation" of your .tex file, in which the corrected .tex file is the target "translation". Note that OmegaT also includes spell-checking; you only need to install the dictionaries.

  1. Get OmegaT and install it locally. Note down the installation directory.
  2. Get the OmegaT LanguageTool plugin and install it into the plugin directory of your OmegaT installation.
  3. Get the script files.

Run the following Windows batch script from the command line in the folder where your .tex file is located:

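    @echo off
    rem Expects the .tex file as the only argument.
    if "%~1"=="" (
        echo Usage: %~nx0 yourfile.tex
        exit /b 1
    )
    rem Create the OmegaT project skeleton.
    mkdir latexcheck
    mkdir latexcheck\source
    mkdir latexcheck\target
    mkdir latexcheck\glossary
    mkdir latexcheck\tm
    mkdir latexcheck\tm\auto
    mkdir latexcheck\dictionary
    mkdir latexcheck\omegat
    rem Copy the project settings, the document and the UTF-8 filter settings.
    copy omegat.project latexcheck
    copy "%~1" "latexcheck\source\%~nx1"
    copy filters.xml latexcheck\omegat
    rem Pre-fill the translation memory with target = source (pseudo-translation).
    java -jar "C:\Program Files (x86)\OmegaT\OmegaT.jar" latexcheck --mode=console-createpseudotranslatetmx --pseudotranslatetmx=latexcheck\tm\tm.tmx --pseudotranslatetype=equal
    rem Open the project in OmegaT for checking.
    java -jar "C:\Program Files (x86)\OmegaT\OmegaT.jar" latexcheck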

The script takes the .tex file as its argument, creates a pseudo-translation project for OmegaT, and then opens the project for grammar checking. On a Unix-like operating system you need to write an equivalent shell script, which should be straightforward. Feel free to send it to me if you port my script this way.

When OmegaT is open, move from one sentence (or "segment") to the next with the Next Segment command (usually Ctrl+N). When you have finished correcting, create the target files (usually Ctrl+D); the corrected file is written to the latexcheck\target directory. The OmegaT install path in the script is the one on my computer, so adjust it if you installed OmegaT elsewhere. The script also copies two other files. The first, omegat.project, specifies the language of the .tex file:

        <source_lang>PL-PL</source_lang>
        <target_lang>PL-PL</target_lang>

Replace "PL-PL" with the language of your document, for example "EN-US".
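If you switch languages often, you can set both tags with a small script instead of editing the file by hand. This is a minimal Python sketch, assuming omegat.project is plain XML containing the source_lang and target_lang elements shown above; set_project_language is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def set_project_language(path, lang):
    """Set <source_lang> and <target_lang> in an omegat.project file to `lang`.

    Both get the same value because the "translation" is the corrected text
    in the original language. Note: ElementTree drops XML comments on rewrite."""
    tree = ET.parse(path)
    for tag in ("source_lang", "target_lang"):
        # iter() finds the element wherever it is nested in the project file.
        for elem in tree.getroot().iter(tag):
            elem.text = lang
    tree.write(path, encoding="UTF-8", xml_declaration=True)
```

For example, set_project_language("omegat.project", "EN-US") prepares the file for an American English document before running the script.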

The second file, filters.xml, is needed only if your .tex files are encoded in UTF-8. Here is the relevant snippet:

    <filter enabled="true" className="org.omegat.filters2.latex.LatexFilter">
        <files targetEncoding="UTF-8" sourceEncoding="UTF-8" targetFilenamePattern="${filename}" sourceFilenameMask="*.tex"/>
        <files targetEncoding="UTF-8" sourceEncoding="UTF-8" targetFilenamePattern="${filename}" sourceFilenameMask="*.latex"/>
    </filter>

If your files use your system's default encoding, you can omit filters.xml and remove this line from the script above:

    copy filters.xml latexcheck\omegat

OmegaT does not include a foolproof TeX parser, but it does a pretty good job. If you see any problems with how it handles your .tex files, file an issue with OmegaT support.
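On Unix-like systems the batch script does not run as is. A cross-platform port in Python might look like the sketch below. It mirrors the batch script's directories and OmegaT command lines, but the default path /opt/omegat/OmegaT.jar is a hypothetical placeholder, and the names create_project and omegat_commands are made up for this example.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# Subdirectories of the OmegaT project, as created by the batch script.
PROJECT_DIRS = ["source", "target", "glossary", "tm", "tm/auto", "dictionary", "omegat"]


def create_project(tex_file, project="latexcheck", use_filters=True):
    """Create the project skeleton and copy the input files into it.

    Like the batch script, this expects omegat.project (and filters.xml,
    unless use_filters is False) in the current directory."""
    root = Path(project)
    for sub in PROJECT_DIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    shutil.copy("omegat.project", root)
    shutil.copy(tex_file, root / "source" / Path(tex_file).name)
    if use_filters:
        shutil.copy("filters.xml", root / "omegat")
    return root


def omegat_commands(omegat_jar, project="latexcheck"):
    """Return the two java command lines used by the batch script."""
    tmx = Path(project) / "tm" / "tm.tmx"
    pseudo = ["java", "-jar", omegat_jar, project,
              "--mode=console-createpseudotranslatetmx",
              f"--pseudotranslatetmx={tmx}",
              "--pseudotranslatetype=equal"]
    gui = ["java", "-jar", omegat_jar, project]
    return pseudo, gui


def main(argv):
    # Hypothetical default; pass the real path to OmegaT.jar as the 2nd argument.
    omegat_jar = argv[2] if len(argv) > 2 else "/opt/omegat/OmegaT.jar"
    create_project(argv[1])
    for cmd in omegat_commands(omegat_jar):
        # First the pseudo-translation step, then the OmegaT GUI.
        subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Saved as, say, latexcheck.py, it would be run as python latexcheck.py paper.tex /path/to/OmegaT.jar in the folder that holds omegat.project and filters.xml. If your files use the system encoding, call create_project with use_filters=False instead.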

Using TeX4ht

You can convert your (La)TeX source to HTML with TeX4ht and check the HTML file. LanguageTool strips all HTML and XML markup if you specify --xmlfilter (although this option has its bugs), so this should be an easy solution. You can also convert .tex files to OpenDocument and then use our LibreOffice extension, which lets you check text both on the fly and in a dialog box (if you don't like our simple GUI).
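If you prefer to extract the text yourself rather than rely on --xmlfilter, Python's standard html.parser module is enough for a rough version. The sketch below (TextExtractor and html_to_text are hypothetical names) keeps the visible text of a TeX4ht-generated page, drops script and style content, and puts block elements on separate lines. Unlike a real filter, it does not track positions in the original .tex file.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page (rough stand-in for --xmlfilter)."""

    SKIP = {"script", "style"}
    # Elements after which a line break is inserted so blocks do not run together.
    BLOCK = {"title", "p", "div", "h1", "h2", "h3", "h4", "h5", "h6",
             "li", "dt", "dd", "tr", "td", "th", "pre", "blockquote"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; etc. arrive as plain characters
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of `html`, one block element per line, whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

The resulting plain text can then be checked with LanguageTool like any other text file.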

Using opendetex

Some people use detex to convert TeX files to plain text, but your mileage may vary: detex may handle many kinds of input incorrectly. opendetex, however, seems to do the job quite satisfactorily. As with TeX4ht, you cannot edit the .tex source directly.
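To illustrate what a detex-like converter does, and why it is fragile, here is a deliberately naive regex-based sketch in Python. It is not how detex or opendetex work internally; the rule lists and the function name naive_detex are assumptions made up for this example. It drops comments, math and a few environments, keeps the text of commands such as \emph, replaces \ref with a placeholder number, and removes \cite together with a preceding tie.

```python
import re

# Deliberately small, made-up rule sets; real converters know far more.
DROP_ENVS = ("equation", "equation*", "align", "align*", "figure", "table", "verbatim")
KEEP_ARG = ("emph", "textbf", "textit", "section", "subsection", "footnote")


def naive_detex(tex):
    """Strip common LaTeX markup with regular expressions (no real parsing).

    Nested braces in arguments, user-defined macros and \\input files are
    not handled; this only illustrates the idea."""
    text = tex.replace("\\\\", " ")                        # \\ line breaks
    text = re.sub(r"(?<!\\)%.*", "", text)                 # comments (unescaped %)
    for env in DROP_ENVS:                                  # non-prose environments
        e = re.escape(env)
        text = re.sub(r"\\begin\{" + e + r"\}.*?\\end\{" + e + r"\}", "", text, flags=re.S)
    text = re.sub(r"\\\[.*?\\\]", "", text, flags=re.S)    # display math \[ ... \]
    text = re.sub(r"(?<!\\)\$[^$]*\$", "X", text)          # inline math -> placeholder
    # Citations vanish together with a preceding tie (~).
    text = re.sub(r"~?\\cite[a-zA-Z]*(\[[^\]]*\])*\{[^{}]*\}", "", text)
    text = re.sub(r"\\(?:eq)?ref\{[^{}]*\}", "1", text)    # references -> placeholder
    for cmd in KEEP_ARG:                                   # keep the argument text
        text = re.sub(r"\\" + cmd + r"\*?\{([^{}]*)\}", r"\1", text)
    # Drop every other command together with its [optional] and {braced} arguments.
    text = re.sub(r"\\[a-zA-Z]+\*?(\[[^\]]*\])*(\{[^{}]*\})*", "", text)
    for ch in "%&$#_":                                     # escaped special characters
        text = text.replace("\\" + ch, ch)
    text = text.replace("~", " ").replace("{", "").replace("}", "")
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

Anything beyond these patterns, such as nested braces or user-defined macros, defeats it, which is exactly the problem a real TeX parser would have to solve.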

Using LyX and the LanguageTool plugin

If your document is in English and your document class is fairly simple, you might try converting your document to LyX format and then using the LanguageTool plugin for LyX. Note, however, that the plugin supports only English, and it does not seem to work under Windows either. Also, the conversion from .tex to .lyx is particularly tricky if you use a non-Windows encoding under Windows, and it will not work for really complex documents. The plugin also needs updating.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License