Xml Pattern Rule Extensions

This wiki has been moved to https://dev.languagetool.org - this page is archived and is not updated anymore

Error rules

Disambiguator

Grouping tokens

Add a new action: chunk. It should specify:

a chunk name (obligatory)
a syntactical group head (optional)
a semantical group head (optional)

This would allow more powerful grouping than the trivial MultiWordChunker: in effect, rule-based chunking.

If a chunk has two tokens, the first should be flagged as the beginning of the chunk and the second as its end. For chunks of more than two tokens, the first and last tokens are flagged the same way, and every token in between is flagged as internal to the chunk.
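The flagging scheme described above can be sketched in a few lines of Python. This is only an illustration of the proposal, not LanguageTool code: the function name `flag_chunk` and the `B-`/`I-`/`E-` tag prefixes are assumptions, and single-token chunks are rejected because the proposal does not say how they should be handled.

```python
def flag_chunk(tokens, chunk_name):
    """Pair each token with a begin/inside/end flag for the given chunk name.

    The "B-", "I-" and "E-" prefixes are hypothetical; the proposal only says
    that the beginning, internal part and end of a chunk should be flagged.
    """
    if len(tokens) < 2:
        # The proposal does not cover single-token chunks, so this sketch refuses them.
        raise ValueError("this sketch only covers chunks of two or more tokens")

    last = len(tokens) - 1
    flagged = []
    for index, token in enumerate(tokens):
        if index == 0:
            position = "B"   # first token: beginning of the chunk
        elif index == last:
            position = "E"   # last token: end of the chunk
        else:
            position = "I"   # everything in between: internal part
        flagged.append((token, f"{position}-{chunk_name}"))
    return flagged
```

For a two-token chunk this yields one begin flag and one end flag; internal flags only appear from three tokens onward.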

All patterns

Use antipattern for disambiguation rules?

Add linguistic attributes and values as token attributes

They would operate on parsed POS tag values. For example, <token pos_value="pos='noun',gender='f',number='pl'"/>.
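To make the idea concrete, here is a minimal Python sketch of how such a `pos_value` string might be split into attribute/value pairs and compared with a parsed reading. The helper names `parse_pos_value` and `token_matches`, and the representation of a reading as a plain dictionary, are assumptions made for illustration; the proposal itself does not define a parser. The sketch also assumes that values never contain commas or single quotes.

```python
import re

# One "name='value'" pair, with optional surrounding whitespace.
PAIR = re.compile(r"\s*(\w+)\s*=\s*'([^']*)'\s*")


def parse_pos_value(spec):
    """Turn "pos='noun',gender='f'" into {"pos": "noun", "gender": "f"}."""
    pairs = {}
    for part in spec.split(","):
        match = PAIR.fullmatch(part)
        if match is None:
            raise ValueError(f"malformed attribute pair: {part!r}")
        pairs[match.group(1)] = match.group(2)
    return pairs


def token_matches(spec, reading):
    """True if every attribute required by spec has the same value in reading.

    `reading` is a hypothetical dictionary of already parsed POS tag values.
    Attributes not mentioned in spec are ignored.
    """
    wanted = parse_pos_value(spec)
    return all(reading.get(name) == value for name, value in wanted.items())
```

With this reading of the proposal, a token would match when all listed attributes agree, regardless of any further attributes the tag carries.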

Add `match` support inside `exception`s

There is a possible workaround. Instead of:

   <token regexp="yes">vue?s?<exception>\2</exception></token>

one can write:

   <and>
        <token regexp="yes">vue?s?</token>
        <token negate="yes"><match no="1"/></token>
   </and>

Add `and` support inside `exception`s

Using <and> with exceptions seems quite natural for the LT XML syntax. It also keeps related conditions together: an <antipattern> can be located rather far from the token (e.g. in the rulegroup description, or the previous tokens can have extensive exception lists), whereas exceptions are usually right in place.

Exceptions could be easier to write and read if <and> were allowed to group them, so that the group triggers only if all exceptions listed in it trigger. A possible usage example after implementation is shown below. This boiled-down example already compiles; however, <and> seems to have no grouping power there, as I can remove <and> and get the same result.

<token postag="NN:.*|NNN:.*" postag_regexp="yes">
    <and>
        <exception scope="previous" regexp="yes">&human;</exception>
        <exception postag="(NN|NNN):.*:Nom" postag_regexp="yes"/>
    </and>
</token>
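The difference between plain exceptions and the proposed grouped ones can be sketched in Python. This is a model of the intended semantics only, not of how LanguageTool evaluates rules: `token_accepted`, the predicate helpers, the dictionary-based token and the `HUMAN_WORDS` set (standing in for the `&human;` entity, whose contents are not given here) are all hypothetical.

```python
def token_accepted(token, base, exceptions=(), and_groups=()):
    """Decide whether a token matches once its exceptions are applied.

    exceptions: predicates; any single one that fires rejects the token.
    and_groups: lists of predicates; a group rejects the token only if
                every predicate in it fires (the proposed <and> behaviour).
    """
    if not base(token):
        return False
    if any(exception(token) for exception in exceptions):
        return False
    for group in and_groups:
        if group and all(exception(token) for exception in group):
            return False
    return True


# Hypothetical stand-in for the &human; entity used in the XML example.
HUMAN_WORDS = {"man", "woman"}


def is_noun(token):
    return token["postag"].startswith(("NN:", "NNN:"))


def previous_is_human(token):
    return token["previous"] in HUMAN_WORDS


def is_nominative(token):
    return token["postag"].endswith(":Nom")


# A nominative noun that does not follow a "human" word.
only_nominative = {"postag": "NN:Sg:Nom", "previous": "the"}
# A nominative noun that does follow a "human" word.
both = {"postag": "NN:Sg:Nom", "previous": "man"}
```

Under these assumptions, `only_nominative` is rejected when the two exceptions are listed separately but accepted when they are grouped, which is the grouping power the example above is missing.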

page revision: 17, last edited: 24 Sep 2020 07:48

LanguageTool Wiki

Open Source proof-reading tool
