XML Pattern Rule Extensions

Error rules

Disambiguator

Grouping tokens

Add a new action: chunk. It should specify:

  • a chunk name (obligatory)
  • a syntactic group head (optional)
  • a semantic group head (optional)
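To make these three parameters concrete, here is a minimal Java sketch of what the action could carry. The record name `ChunkAction`, its fields, and the choice to represent each head as a token index are assumptions for illustration only; nothing here exists in LanguageTool. Records require Java 16 or later.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical model of the proposed "chunk" action. Names are illustrative;
// heads are assumed to be indices of tokens within the matched pattern.
record ChunkAction(String name, Optional<Integer> syntacticHead, Optional<Integer> semanticHead) {
    ChunkAction {
        // The chunk name is obligatory.
        Objects.requireNonNull(name, "chunk name is required");
        if (name.isBlank()) {
            throw new IllegalArgumentException("chunk name must not be blank");
        }
        // Both heads are optional; a missing head is stored as Optional.empty().
        syntacticHead = syntacticHead == null ? Optional.empty() : syntacticHead;
        semanticHead = semanticHead == null ? Optional.empty() : semanticHead;
    }

    // Convenience factory for a chunk that only has a name.
    static ChunkAction named(String name) {
        return new ChunkAction(name, Optional.empty(), Optional.empty());
    }
}

final class ChunkActionDemo {
    public static void main(String[] args) {
        // A noun-phrase chunk whose syntactic head is the second matched token.
        ChunkAction np = new ChunkAction("NP", Optional.of(1), Optional.empty());
        System.out.println(np);
    }
}
```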

This would allow more powerful grouping than the trivial MultiWordChunker offers: in effect, rule-based chunking.

If the chunk has two tokens, the first token should be flagged as the beginning of the chunk and the second as its end. For more than two tokens, the first and last tokens are still flagged as the beginning and the end, and all tokens from the second to the second-to-last are flagged as the internal part of the chunk.
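The flagging scheme can be sketched in Java as follows. The `B-`, `I-` and `E-` prefixes and the `ChunkFlagger` class are hypothetical labels, not LanguageTool's actual chunk tags. The text does not say how a single-token chunk should be tagged, so this sketch simply rejects it.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed flagging: first token = beginning, last token = end,
// everything in between = internal part. Tag prefixes are hypothetical.
final class ChunkFlagger {
    private ChunkFlagger() {}

    static List<String> flag(List<String> tokens, String chunkName) {
        int n = tokens.size();
        if (n < 2) {
            // The proposal only describes chunks of two or more tokens.
            throw new IllegalArgumentException("a chunk needs at least two tokens");
        }
        List<String> tags = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            if (i == 0) {
                tags.add("B-" + chunkName); // first token: beginning of the chunk
            } else if (i == n - 1) {
                tags.add("E-" + chunkName); // last token: end of the chunk
            } else {
                tags.add("I-" + chunkName); // second .. second-to-last: internal part
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(flag(List.of("the", "big", "house"), "NP")); // [B-NP, I-NP, E-NP]
    }
}
```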

All patterns

Use antipattern for disambiguation rules?

Add linguistic attributes and values as token attributes

They would operate on parsed POS tag values, for example: <token pos_value="pos='noun',gender='f',number='pl'"/>.
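A rough Java sketch of how such an attribute might be evaluated: the attribute value is split into attribute/value pairs and compared with the token's POS tag, which is assumed to have already been parsed into a map by language-specific code (not shown). `PosValueMatcher` and its methods are hypothetical, and the naive split on commas does not handle quoted values that themselves contain commas.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical evaluator for the proposed pos_value attribute.
final class PosValueMatcher {
    private PosValueMatcher() {}

    // Parses "pos='noun',gender='f',number='pl'" into {pos=noun, gender=f, number=pl}.
    static Map<String, String> parse(String spec) {
        Map<String, String> result = new HashMap<>();
        for (String pair : spec.split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("malformed pair: " + pair);
            }
            String value = kv[1].trim();
            // Strip the surrounding single quotes, if present.
            if (value.length() >= 2 && value.startsWith("'") && value.endsWith("'")) {
                value = value.substring(1, value.length() - 1);
            }
            result.put(kv[0].trim(), value);
        }
        return result;
    }

    // A token matches if every attribute required by the rule has the same value
    // in the token's parsed POS tag; extra token attributes are ignored.
    static boolean matches(Map<String, String> required, Map<String, String> tokenAttributes) {
        for (Map.Entry<String, String> e : required.entrySet()) {
            if (!e.getValue().equals(tokenAttributes.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> rule = parse("pos='noun',gender='f',number='pl'");
        Map<String, String> token = Map.of("pos", "noun", "gender", "f", "number", "pl", "case", "nom");
        System.out.println(matches(rule, token)); // true
    }
}
```

Matching on attribute names rather than on positions within a tag string is what would let one rule work across tagsets that order the features differently.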

Add match support inside exceptions

There is a possible workaround. Instead of:

   <token regexp="yes">vue?s?<exception>\2</exception></token>

one can write:

   <and>
     <token regexp="yes">vue?s?</token>
     <token negate="yes"><match no="1"/></token>
   </and>
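The workaround relies on `<and>` requiring both conditions to hold for the same token. The following Java sketch (hypothetical `AndWorkaround` class) spells out that logic: the token must match `vue?s?` and must differ from the referenced earlier token. Case-insensitive regex matching and case-sensitive comparison with the referenced token are assumptions of this sketch; `refIndex` is a plain 0-based list index, chosen here to point at the second token, as `\2` does.

```java
import java.util.List;
import java.util.regex.Pattern;

// Illustrates the <and> workaround: regex match AND "not equal to an earlier token".
// Names are hypothetical; case handling is an assumption.
final class AndWorkaround {
    private static final Pattern VUE = Pattern.compile("vue?s?", Pattern.CASE_INSENSITIVE);

    private AndWorkaround() {}

    // earlierTokens holds the texts of previously matched tokens;
    // refIndex selects the one that the match element refers to.
    static boolean matches(String candidate, List<String> earlierTokens, int refIndex) {
        // <token regexp="yes">vue?s?</token>
        boolean regexMatches = VUE.matcher(candidate).matches();
        // <token negate="yes"><match .../></token>
        boolean differsFromRef = !candidate.equals(earlierTokens.get(refIndex));
        return regexMatches && differsFromRef;
    }

    public static void main(String[] args) {
        List<String> earlier = List.of("j'ai", "vu");
        System.out.println(matches("vue", earlier, 1)); // true
        System.out.println(matches("vu", earlier, 1));  // false: same as the referenced token
    }
}
```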

Add <and> support inside exceptions

Using <and> with exceptions seems natural in LT's XML syntax. It also keeps the context together: an <antipattern> can be located rather far from the token it affects (e.g. at the top of the rulegroup, or after previous tokens with long exception lists), whereas exceptions usually sit right in place.

Exceptions would be easier to write and read if <and> could be used to group them, so that the group triggers only when all exceptions in it are triggered. The stripped-down example below already compiles; however, <and> currently seems to have no grouping effect, since removing it gives the same result. Once implemented, it could be used like this:

<token postag="NN:.*|NNN:.*" postag_regexp="yes">
    <and>
        <exception scope="previous" regexp="yes">&human;</exception>
        <exception postag="(NN|NNN):.*:Nom" postag_regexp="yes" />
    </and>
</token>
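The difference between today's behaviour and the proposed grouping can be sketched in Java. Today any single exception vetoes the token; with the proposal, an `<and>` group vetoes it only when all of its members apply, while separate groups are still combined with OR. The `ExceptionGroups` class, its `Token` record, and the small list standing in for the `&human;` entity are hypothetical stand-ins, not LanguageTool internals.

```java
import java.util.List;
import java.util.function.Predicate;

// Contrasts OR-ed exceptions (current) with AND-ed <and> groups (proposed).
final class ExceptionGroups {
    record Token(String text, String posTag, String previousText) {}

    private ExceptionGroups() {}

    // Current behaviour: exceptions are OR-ed.
    static boolean vetoedAny(Token t, List<Predicate<Token>> exceptions) {
        return exceptions.stream().anyMatch(e -> e.test(t));
    }

    // Proposed behaviour: each inner list is an <and> group whose members are
    // AND-ed; the groups themselves are still OR-ed.
    static boolean vetoedGrouped(Token t, List<List<Predicate<Token>>> groups) {
        return groups.stream().anyMatch(g -> g.stream().allMatch(e -> e.test(t)));
    }

    public static void main(String[] args) {
        // Stand-ins for the two exceptions in the example above.
        List<String> human = List.of("man", "woman");
        Predicate<Token> prevIsHuman = t -> human.contains(t.previousText());
        Predicate<Token> isNominative = t -> t.posTag().matches("(NN|NNN):.*:Nom");

        Token t = new Token("house", "NN:Sg:Nom", "the");
        System.out.println(vetoedAny(t, List.of(prevIsHuman, isNominative)));               // true
        System.out.println(vetoedGrouped(t, List.of(List.of(prevIsHuman, isNominative)))); // false
    }
}
```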
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License