- All Implemented Interfaces:
public class SimpleRegexLexer
- extends java.lang.Object
- implements Lexer
This is a "dynamic" Lexer that uses regex patterns to parse any document.
It is NOT as fast as JFlex-generated lexers.
The current implementation is about 20x slower than a JFlex lexer
(5000 lines in 100ms, vs 5ms for a JFlex lexer).
This is still usable for a few hundred lines: 500 lines parse in about 10ms.
Performance also depends on how complex the regexes are and how many of them
actually produce a match.
Since KEYWORD TokenType is by order less than IDENTIFIER, the higher
precedence of KEYWORD token will be used, even if the same regex matches
an IDENTIFIER. This is a neat side-effect of the ordering of the TokenTypes.
We now just need to add any non-overlapping matches. And since longer matches
are found first, we will properly match the longer identifiers which start with
This behaviour can easily be modified by overriding the
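
The precedence and non-overlap rules described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the actual SimpleRegexLexer source: the names RegexLexerSketch, Kind, Tok, lex and demoPatterns are hypothetical stand-ins for the real TokenType and Token classes, and the sort order (start offset, then longer match, then lower type order) is inferred from the description.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLexerSketch {
    /** Hypothetical stand-in for TokenType: earlier constants take precedence. */
    public enum Kind { KEYWORD, IDENTIFIER, NUMBER }

    /** Hypothetical stand-in for Token. */
    public static final class Tok implements Comparable<Tok> {
        public final Kind kind;
        public final int start;
        public final int length;

        public Tok(Kind kind, int start, int length) {
            this.kind = kind;
            this.start = start;
            this.length = length;
        }

        @Override
        public int compareTo(Tok other) {
            // Earlier start first, then longer match first, then lower type order.
            if (start != other.start) return Integer.compare(start, other.start);
            if (length != other.length) return Integer.compare(other.length, length);
            return kind.compareTo(other.kind);
        }

        @Override
        public String toString() {
            return kind + "@" + start + "+" + length;
        }
    }

    /** Collects every match of every pattern, then keeps the non-overlapping ones. */
    public static List<Tok> lex(CharSequence text, Map<Kind, Pattern> patterns) {
        List<Tok> candidates = new ArrayList<>();
        for (Map.Entry<Kind, Pattern> entry : patterns.entrySet()) {
            Matcher m = entry.getValue().matcher(text);
            while (m.find()) {
                if (m.end() > m.start()) { // ignore empty matches
                    candidates.add(new Tok(entry.getKey(), m.start(), m.end() - m.start()));
                }
            }
        }
        Collections.sort(candidates);

        List<Tok> result = new ArrayList<>();
        int end = 0; // end offset of the last accepted token
        for (Tok t : candidates) {
            if (t.start >= end) {
                result.add(t);
                end = t.start + t.length;
            }
        }
        return result;
    }

    /** Example patterns, chosen for this sketch only. */
    public static Map<Kind, Pattern> demoPatterns() {
        Map<Kind, Pattern> patterns = new EnumMap<>(Kind.class);
        patterns.put(Kind.KEYWORD, Pattern.compile("if|else"));
        patterns.put(Kind.IDENTIFIER, Pattern.compile("[A-Za-z_]\\w*"));
        patterns.put(Kind.NUMBER, Pattern.compile("\\d+"));
        return patterns;
    }

    public static void main(String[] args) {
        // Prints [KEYWORD@0+2, IDENTIFIER@3+4, NUMBER@8+2]
        System.out.println(lex("if iffy 42", demoPatterns()));
    }
}
```

In this sketch "if" matches both the KEYWORD and the IDENTIFIER pattern at offset 0 with the same length, so the lower-ordered KEYWORD is kept. At offset 3 the longer IDENTIFIER match "iffy" sorts ahead of the KEYWORD match "if", and the shorter match is then dropped as overlapping.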
- Ayman Al-Sairafi
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
public SimpleRegexLexer(java.util.Map<?,?> props)
public SimpleRegexLexer(java.lang.String propsLocation)
public void parse(javax.swing.text.Segment segment,
- Description copied from interface:
- This is the only method a Lexer needs to implement. It is passed
the text to parse as a Segment, and it should add non-overlapping Tokens
for each recognized token in that text to the supplied list.
- Specified by:
parse in interface
segment - Text to parse.
ofst - offset to add to start of each token (useful for nesting)
tokens - List to which the Tokens are added. The caller creates the list so that
it can choose the appropriate List implementation and size; the parse method just
adds to the list.
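
As a rough illustration of the parameters above, the following sketch shows a method with the same shape (a Segment, an offset, and a caller-supplied list) and how the offset shifts token positions when only part of a document is lexed. It is not the SimpleRegexLexer implementation: ParseContractSketch and Span are hypothetical names, Span stands in for the real Token class, and the word-splitting logic is a placeholder for real lexing.

```java
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.Segment;

public class ParseContractSketch {
    /** Hypothetical stand-in for Token: a start offset and a length. */
    public static final class Span {
        public final int start;
        public final int length;

        public Span(int start, int length) {
            this.start = start;
            this.length = length;
        }
    }

    /**
     * Adds one Span per run of letters or digits in the segment.
     * Positions are relative to the segment, shifted by ofst.
     */
    public static void parse(Segment segment, int ofst, List<Span> tokens) {
        int i = 0;
        int n = segment.length(); // Segment implements CharSequence
        while (i < n) {
            if (Character.isLetterOrDigit(segment.charAt(i))) {
                int begin = i;
                while (i < n && Character.isLetterOrDigit(segment.charAt(i))) {
                    i++;
                }
                tokens.add(new Span(begin + ofst, i - begin));
            } else {
                i++;
            }
        }
    }

    public static void main(String[] args) {
        char[] doc = "int x = 10;".toCharArray();
        // Lex only "x = 10;" (document offsets 4..10), passing 4 as the offset
        // so that the resulting positions refer to the whole document.
        List<Span> tokens = new ArrayList<>();
        parse(new Segment(doc, 4, 7), 4, tokens);
        for (Span s : tokens) {
            System.out.println(s.start + "+" + s.length);
        }
    }
}
```

Because the caller owns the list, it can pass any List implementation, pre-sized or not, and the method only appends to it.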
public SimpleRegexLexer putPattern(TokenType type,
public SimpleRegexLexer putPatterns(java.util.Map<?,?> props)