Add a new action: chunk. It should specify:
- a chunk name (required)
- a syntactic group head (optional)
- a semantic group head (optional)
This way, we could have more powerful grouping than in the trivial MultiWordChunker – basically, rule-based chunking.
If there are two tokens, the first token should be flagged as the beginning of the chunk, and the second as the end of the chunk. For more than two tokens, flag all tokens from the second until the last but one as the internal part of the chunk.
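The flagging scheme described above can be sketched in a few lines of Python. This is only an illustration, not LanguageTool code: the function name flag_chunk and the B/I/E flag prefixes are hypothetical, and the treatment of a single-token chunk (begin flag only) is an assumption, since that case is not specified here.

```python
def flag_chunk(tokens, chunk_name):
    """Return (token, flag) pairs for one matched chunk.

    Hypothetical flag format: "B-<name>" (begin), "I-<name>" (internal),
    "E-<name>" (end). A single-token chunk gets only the begin flag,
    which is an assumption made for this sketch.
    """
    flags = []
    last = len(tokens) - 1
    for i, token in enumerate(tokens):
        if i == 0:
            position = "B"      # first token: beginning of the chunk
        elif i == last:
            position = "E"      # last token: end of the chunk
        else:
            position = "I"      # second through last-but-one: internal
        flags.append((token, f"{position}-{chunk_name}"))
    return flags
```

For example, flag_chunk(["New", "York"], "NP") flags the first token as the beginning and the second as the end of the chunk.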
Use antipattern for disambiguation rules?
Add linguistic attributes and values as token attributes
They would operate on parsed POS tag values. For example, <token pos_value="pos='noun',gender='f',number='pl'"/>.
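As a rough sketch of how such an attribute value might be interpreted, the Python snippet below parses the comma-separated key='value' pairs and checks them against a token's parsed attributes. The helper names parse_pos_value and token_matches, and the representation of a token as a plain dictionary, are assumptions made for illustration; the actual syntax and matching rules are not defined here.

```python
import re

def parse_pos_value(spec):
    """Parse "pos='noun',gender='f'" into {"pos": "noun", "gender": "f"}."""
    return dict(re.findall(r"(\w+)='([^']*)'", spec))

def token_matches(token_attrs, spec):
    """True if every attribute required by spec has the same value on the token.

    token_attrs is a hypothetical dict of already parsed POS tag values.
    """
    wanted = parse_pos_value(spec)
    return all(token_attrs.get(key) == value for key, value in wanted.items())
```

A token carrying extra attributes (for instance a case value) would still match, because only the attributes listed in the specification are compared.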
Add <match> support inside exceptions
There is a possible workaround. Instead:
one can write:
<and> <token regexp="yes">vue?s?</token> <token negate="yes"><match no="1"/></token> </and>
Add <and> support inside exceptions
Using <and> with exceptions looks quite natural for the LT XML syntax. It also keeps the context together: an <antipattern> can be located rather far from the token (e.g. in the rulegroup description, or the preceding tokens can have an extensive list of exceptions), whereas exceptions are usually right in place.
Exceptions would be easier to write and read if <and> could be used to group them, so that the group triggers only when all exceptions listed in it are triggered. The boiled-down example below compiles, but <and> currently seems to have no grouping power: removing <and> gives the same result. A possible usage example once this is implemented:
<token postag="NN:.*|NNN:.*" postag_regexp="yes"> <and> <exception scope="previous" regexp="yes">&human;</exception> <exception postag="(NN|NNN):.*:Nom" postag_regexp="yes" /> </and> </token>
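The difference between the current behaviour and the requested grouping can be illustrated with a small Python sketch. It models each exception as a predicate over a hypothetical context dictionary; the function names and the context keys are assumptions for illustration only and do not reflect the LanguageTool implementation.

```python
def ungrouped_blocks(exceptions, ctx):
    """Behaviour without grouping: any single triggered exception blocks the token."""
    return any(exc(ctx) for exc in exceptions)

def grouped_blocks(exceptions, ctx):
    """Requested grouping: the group blocks the token only if every
    exception in it is triggered (an empty group never triggers)."""
    return bool(exceptions) and all(exc(ctx) for exc in exceptions)

# Hypothetical predicates mirroring the two exceptions in the example above.
def previous_is_human(ctx):
    return ctx.get("previous_is_human", False)

def is_nominative(ctx):
    return ctx.get("postag", "").endswith(":Nom")
```

With a nominative noun whose previous token is not a human word, ungrouped_blocks reports a block while grouped_blocks does not, which is exactly the case the grouping is meant to distinguish.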