public abstract class Parser<T> extends Object
A Parser takes as input a CharSequence source and parses it when the parse(CharSequence) method is called. A value of type T will be returned if parsing succeeds, or a ParserException is thrown to indicate parsing error. For example:
Parser<String> scanner = Scanners.IDENTIFIER;
assertEquals("foo", scanner.parse("foo"));
Parsers run either on character level to scan the source, or on token level to parse a list of Token objects returned from another parser. This other parser that returns the list of tokens for token level parsing is hooked up via the from(Parser, Parser) or from(Parser) method.
The following are important naming conventions used throughout the library:
- A character-level parser that returns a list of Token is called a lexer.
- All index parameters are 0-based indexes in the original source.
To debug a parser, pass Parser.Mode.DEBUG mode to parse(CharSequence, Mode) and inspect the result in ParserException.getParseTree(). All labeled parsers will generate a node in the exception's parse tree, with matched indices in the source.
Modifier and Type | Class | Description |
---|---|---|
static class | Parser.Mode | Defines the mode that a parser should be run in. |
static class | Parser.Reference<T> | An atomic mutable reference to Parser used in recursive grammars. |
Modifier and Type | Method | Description |
---|---|---|
Parser<Optional<T>> | asOptional() | p.asOptional() is equivalent to p? in EBNF. |
Parser<List<T>> | atLeast(int min) | |
Parser<T> | atomic() | A Parser that undoes any partial match if this fails. |
Parser<T> | between(Parser<?> before, Parser<?> after) | |
<R> Parser<R> | cast() | |
Parser<List<T>> | endBy(Parser<?> delim) | |
Parser<List<T>> | endBy1(Parser<?> delim) | |
Parser<Boolean> | fails() | |
Parser<T> | followedBy(Parser<?> parser) | |
Parser<T> | from(Parser<?> tokenizer, Parser<Void> delim) | A Parser that takes as input the tokens returned by tokenizer delimited by delim, and runs this to parse the tokens. |
Parser<T> | from(Parser<? extends Collection<Token>> lexer) | |
<R> Parser<R> | ifelse(java.util.function.Function<? super T,? extends Parser<? extends R>> consequence, Parser<? extends R> alternative) | |
<R> Parser<R> | ifelse(Parser<? extends R> consequence, Parser<? extends R> alternative) | |
Parser<T> | infixl(Parser<? extends java.util.function.BiFunction<? super T,? super T,? extends T>> operator) | A Parser for left-associative infix operator. |
Parser<T> | infixn(Parser<? extends java.util.function.BiFunction<? super T,? super T,? extends T>> op) | A Parser that parses non-associative infix operator. |
Parser<T> | infixr(Parser<? extends java.util.function.BiFunction<? super T,? super T,? extends T>> op) | A Parser for right-associative infix operator. |
Parser<T> | label(String name) | |
Parser<List<Token>> | lexer(Parser<?> delim) | A Parser that greedily runs this repeatedly, and ignores the pattern recognized by delim before and after each occurrence. |
Parser<List<T>> | many() | p.many() is equivalent to p* in EBNF. |
Parser<List<T>> | many1() | p.many1() is equivalent to p+ in EBNF. |
<R> Parser<R> | map(java.util.function.Function<? super T,? extends R> map) | |
static <T> Parser.Reference<T> | newReference() | Creates a new instance of Parser.Reference. |
<To> Parser<To> | next(java.util.function.Function<? super T,? extends Parser<? extends To>> map) | A Parser that executes this, maps the result using map to another Parser object to be executed as the next step. |
<R> Parser<R> | next(Parser<R> parser) | |
Parser<?> | not() | A Parser that fails if this succeeds. |
Parser<?> | not(String unexpected) | A Parser that fails if this succeeds. |
Parser<T> | notFollowedBy(Parser<?> parser) | |
Parser<T> | optional() | Deprecated since 3.0. Use optional(null) or asOptional() instead. |
Parser<T> | optional(T defaultValue) | |
Parser<T> | or(Parser<? extends T> alternative) | p1.or(p2) is equivalent to p1 \| p2 in EBNF. |
Parser<T> | otherwise(Parser<? extends T> fallback) | a.otherwise(fallback) runs fallback when a matches zero input. |
T | parse(CharSequence source) | Parses source. |
T | parse(CharSequence source, Parser.Mode mode) | Parses source under the given mode. |
T | parse(CharSequence source, String moduleName) | Deprecated. Please use parse(CharSequence) instead. |
T | parse(Readable readable) | Parses source read from readable. |
T | parse(Readable readable, String moduleName) | Deprecated. Please use parse(Readable) instead. |
ParseTree | parseTree(CharSequence source) | Parses source and returns a ParseTree corresponding to the syntactical structure of the input. |
Parser<T> | peek() | A Parser that runs this and undoes any input consumption if it succeeds. |
Parser<T> | postfix(Parser<? extends java.util.function.Function<? super T,? extends T>> op) | |
Parser<T> | prefix(Parser<? extends java.util.function.Function<? super T,? extends T>> op) | |
Parser<T> | reluctantBetween(Parser<?> before, Parser<?> after) | Deprecated. This method probably only works in the simplest cases. And it's a character-level parser only. Use it at your own risk. It may be deleted later when we find a better way. |
<R> Parser<R> | retn(R value) | |
Parser<List<T>> | sepBy(Parser<?> delim) | |
Parser<List<T>> | sepBy1(Parser<?> delim) | |
Parser<List<T>> | sepEndBy(Parser<?> delim) | |
Parser<List<T>> | sepEndBy1(Parser<?> delim) | |
Parser<Void> | skipAtLeast(int min) | |
Parser<Void> | skipMany() | p.skipMany() is equivalent to p* in EBNF. |
Parser<Void> | skipMany1() | p.skipMany1() is equivalent to p+ in EBNF. |
Parser<Void> | skipTimes(int n) | |
Parser<Void> | skipTimes(int min, int max) | A Parser that runs this parser for at least min times and up to max times, with all the return values ignored. |
Parser<String> | source() | A Parser that returns the matched string in the original source. |
Parser<Boolean> | succeeds() | |
Parser<List<T>> | times(int n) | |
Parser<List<T>> | times(int min, int max) | |
Parser<Token> | token() | |
Parser<List<T>> | until(Parser<?> parser) | A Parser that matches this parser zero or many times until the given parser succeeds. |
Parser<WithSource<T>> | withSource() | A Parser that returns both parsed object and matched string. |
public static <T> Parser.Reference<T> newReference()
Creates a new instance of Parser.Reference. Used when your grammar is recursive (many grammars are).
public final <R> Parser<R> retn(R value)
public final <To> Parser<To> next(java.util.function.Function<? super T,? extends Parser<? extends To>> map)
A Parser that executes this, maps the result using map to another Parser object to be executed as the next step.
public final Parser<List<T>> until(Parser<?> parser)
A Parser that matches this parser zero or more times until the given parser succeeds. The input that matches the given parser will not be consumed. The input that matches this parser will be collected in a list that will be returned by this function.
public final Parser<Void> skipMany()
p.skipMany() is equivalent to p* in EBNF. The return values are discarded.
public final Parser<Void> skipMany1()
p.skipMany1() is equivalent to p+ in EBNF. The return values are discarded.
public final Parser<Void> skipTimes(int min, int max)
A Parser that runs this parser for at least min times and up to max times, with all the return values ignored.
public final <R> Parser<R> map(java.util.function.Function<? super T,? extends R> map)
public final Parser<T> or(Parser<? extends T> alternative)
p1.or(p2) is equivalent to p1 | p2 in EBNF.
alternative - the alternative parser to run if this fails.
public final Parser<T> otherwise(Parser<? extends T> fallback)
a.otherwise(fallback) runs fallback when a matches zero input. This is different from a.or(alternative), where alternative is run whenever a fails to match.
One should usually use or(Parser).
fallback - the parser to run if this matches no input.
@Deprecated public final Parser<T> optional()
Deprecated since 3.0. Use optional(null) or asOptional() instead.
p.optional() is equivalent to p? in EBNF. null is the result when this fails with no partial match.
public final Parser<Optional<T>> asOptional()
p.asOptional() is equivalent to p? in EBNF. Optional.empty() is the result when this fails with no partial match. Note that Optional prohibits nulls, so make sure this does not result in null.
public final Parser<?> not()
A Parser that fails if this succeeds. Any input consumption is undone.
public final Parser<?> not(String unexpected)
A Parser that fails if this succeeds. Any input consumption is undone.
unexpected - the name of what we don't expect.
public final Parser<T> peek()
A Parser that runs this and undoes any input consumption if it succeeds.
public final Parser<T> atomic()
A Parser that undoes any partial match if this fails. In other words, the parser either fully matches, or matches none.
public final <R> Parser<R> ifelse(Parser<? extends R> consequence, Parser<? extends R> alternative)
public final <R> Parser<R> ifelse(java.util.function.Function<? super T,? extends Parser<? extends R>> consequence, Parser<? extends R> alternative)
public final <R> Parser<R> cast()
Casts this to a Parser of type R. Use it only if you know the parser actually returns a value of type R.
public final Parser<T> between(Parser<?> before, Parser<?> after)
A Parser that runs this between before and after. The return value of this is preserved.
Equivalent to Parsers.between(Parser, Parser, Parser), which preserves the natural order of the parsers in the argument list, but is a bit more verbose.
@Deprecated public final Parser<T> reluctantBetween(Parser<?> before, Parser<?> after)
A Parser that first runs before from the input start, then runs after from the input's end, and only then runs this on what's left from the input. In effect, this behaves reluctantly, giving after a chance to grab input that would have been consumed by this otherwise.
public final Parser<T> prefix(Parser<? extends java.util.function.Function<? super T,? extends T>> op)
public final Parser<T> postfix(Parser<? extends java.util.function.Function<? super T,? extends T>> op)
A Parser that runs this and then runs op for 0 or more times greedily. The Function objects returned from op are applied from left to right to the return value of this.
This is the preferred API to avoid StackOverflowError in left-recursive parsers.
For example, to parse array types in the form of "T[]" or "T[][]", the following left-recursive grammar will fail:
Terminals terms = Terminals.operators("[", "]");
Parser.Reference<Type> ref = Parser.newReference();
ref.set(Parsers.or(leafTypeParser,
    Parsers.sequence(ref.lazy(), terms.phrase("[", "]"), new Unary<Type>() {...})));
return ref.get();
A correct implementation is:
Terminals terms = Terminals.operators("[", "]");
return leafTypeParser.postfix(terms.phrase("[", "]").retn(new Unary<Type>() {...}));
A not-so-obvious example is parsing the expr ? a : b ternary operator. It too is a left-recursive grammar, and, unintuitively, it can also be thought of as a postfix operator. Basically, we can parse "? a : b" as a whole into a unary operator that accepts the condition expression as input and outputs the full ternary expression:
Parser<Expr> ternary(Parser<Expr> expr) {
  return expr.postfix(
      Parsers.sequence(
          terms.token("?"), expr, terms.token(":"), expr,
          // The two ignored tokens need distinct names: Java rejects duplicate lambda parameters.
          (unused1, then, unused2, orelse) -> cond ->
              new TernaryExpr(cond, then, orelse)));
}
OperatorTable also handles left recursion transparently.
p.postfix(op) is equivalent to p op* in EBNF.
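The left-to-right application of the postfix operators can be modeled in plain Java without jparsec. This is only a sketch of the semantics described above, not the library's implementation; the class PostfixSketch and the helper applyPostfix are hypothetical names introduced for illustration.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical model of p.postfix(op): `operand` stands for the value returned by p,
// and `ops` for the Function objects returned by each successive match of op.
final class PostfixSketch {
  static <T> T applyPostfix(T operand, List<Function<T, T>> ops) {
    T result = operand;
    for (Function<T, T> op : ops) {
      result = op.apply(result);  // the leftmost operator is applied first
    }
    return result;
  }

  public static void main(String[] args) {
    Function<String, String> arrayOf = type -> type + "[]";
    // Models "int[][]": the leaf type "int" followed by two "[]" operators.
    System.out.println(applyPostfix("int", List.of(arrayOf, arrayOf)));
  }
}
```

Because the operators are folded in a loop rather than by recursing into the grammar, no left recursion is involved in this model.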
public final Parser<T> infixn(Parser<? extends java.util.function.BiFunction<? super T,? super T,? extends T>> op)
A Parser that parses non-associative infix operator. Runs this for the left operand, and then runs op and this for the operator and the right operand optionally. The BiFunction objects returned from op are applied to the return values of the two operands, if any.
p.infixn(op) is equivalent to p (op p)? in EBNF.
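The "at most once" behavior of p (op p)? can be illustrated with a small plain-Java model over a token list. It is a sketch under simplifying assumptions (operands are single tokens and "<" plays the role of op), not jparsec code; InfixnSketch and Result are hypothetical names.

```java
import java.util.List;

// Hypothetical model of p.infixn(op), i.e. p (op p)? in EBNF.
final class InfixnSketch {
  static final class Result {
    final String expr;   // parenthesized form of what was matched
    final int consumed;  // number of tokens consumed
    Result(String expr, int consumed) {
      this.expr = expr;
      this.consumed = consumed;
    }
  }

  static Result parse(List<String> tokens) {
    String left = tokens.get(0);
    // The operator and the right operand are optional and matched at most once.
    if (tokens.size() >= 3 && tokens.get(1).equals("<")) {
      return new Result("(" + left + " < " + tokens.get(2) + ")", 3);
    }
    return new Result(left, 1);
  }

  public static void main(String[] args) {
    Result r = parse(List.of("a", "<", "b", "<", "c"));
    System.out.println(r.expr + " consumed " + r.consumed + " of 5 tokens");
  }
}
```

In this model, "a < b < c" matches only "a < b"; the trailing "< c" is left unconsumed, which is what makes the operator non-associative.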
public final Parser<T> infixl(Parser<? extends java.util.function.BiFunction<? super T,? super T,? extends T>> operator)
A Parser for left-associative infix operator. Runs this for the left operand, and then runs operator and this for the operator and the right operand for 0 or more times greedily. The BiFunction objects returned from operator are applied from left to right to the return values of this, if any. For example: a + b + c + d is evaluated as ((a + b) + c) + d.
p.infixl(op) is equivalent to p (op p)* in EBNF.
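The left-to-right grouping can be modeled as a left fold in plain Java. This is a sketch of the evaluation order only, not the library's implementation; InfixlSketch and foldLeft are hypothetical names, and the model assumes there is exactly one more operand than there are operators.

```java
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical model of p.infixl(op): operands are the values returned by p,
// ops are the BiFunction objects returned by op, in source order.
final class InfixlSketch {
  static <T> T foldLeft(List<T> operands, List<BiFunction<T, T, T>> ops) {
    T result = operands.get(0);
    for (int i = 0; i < ops.size(); i++) {
      // The running result is always the left argument.
      result = ops.get(i).apply(result, operands.get(i + 1));
    }
    return result;
  }

  public static void main(String[] args) {
    BiFunction<Integer, Integer, Integer> minus = (a, b) -> a - b;
    // 10 - 3 - 2 is grouped as (10 - 3) - 2.
    System.out.println(foldLeft(List.of(10, 3, 2), List.of(minus, minus)));
  }
}
```

Subtraction is used instead of addition because the grouping then changes the result, which makes the associativity visible.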
public final Parser<T> infixr(Parser<? extends java.util.function.BiFunction<? super T,? super T,? extends T>> op)
A Parser for right-associative infix operator. Runs this for the left operand, and then runs op and this for the operator and the right operand for 0 or more times greedily. The BiFunction objects returned from op are applied from right to left to the return values of this, if any. For example: a + b + c + d is evaluated as a + (b + (c + d)).
p.infixr(op) is equivalent to p (op p)* in EBNF.
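The right-to-left grouping can be modeled as a right fold in plain Java. As with any such model, this sketches the evaluation order only and is not the library's implementation; InfixrSketch and foldRight are hypothetical names, and the model assumes exactly one more operand than operators.

```java
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical model of p.infixr(op): operands are the values returned by p,
// ops are the BiFunction objects returned by op, in source order.
final class InfixrSketch {
  static <T> T foldRight(List<T> operands, List<BiFunction<T, T, T>> ops) {
    T result = operands.get(operands.size() - 1);
    for (int i = ops.size() - 1; i >= 0; i--) {
      // The running result is always the right argument.
      result = ops.get(i).apply(operands.get(i), result);
    }
    return result;
  }

  public static void main(String[] args) {
    BiFunction<Integer, Integer, Integer> minus = (a, b) -> a - b;
    // 10 - 3 - 2 is grouped as 10 - (3 - 2).
    System.out.println(foldRight(List.of(10, 3, 2), List.of(minus, minus)));
  }
}
```

The grammar p (op p)* is the same as for infixl; only the order in which the collected BiFunction objects are applied differs.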
public final Parser<Token> token()
A Parser that runs this and wraps the return value in a Token.
It is normally not necessary to call this method explicitly. lexer(Parser) and from(Parser, Parser) both do the conversion automatically.
public final Parser<String> source()
A Parser that returns the matched string in the original source.
public final Parser<WithSource<T>> withSource()
A Parser that returns both parsed object and matched string.
public final Parser<T> from(Parser<? extends Collection<Token>> lexer)
A Parser that takes as input the Token collection returned by lexer, and runs this to parse the tokens. Most parsers should use the simpler from(Parser, Parser) instead.
this must be a token level parser.
public final Parser<T> from(Parser<?> tokenizer, Parser<Void> delim)
A Parser that takes as input the tokens returned by tokenizer delimited by delim, and runs this to parse the tokens. A common misunderstanding is that tokenizer has to be a parser of Token. It doesn't need to be because Terminals already takes care of wrapping your logical token objects into physical Token with correct source location information tacked on for free. Your token object can literally be anything, as long as your token level parser can recognize it later.
The following example uses Terminals.tokenizer():
Terminals terminals = ...;
return parser.from(terminals.tokenizer(), Scanners.WHITESPACES.optional()).parse(str);
Here, tokens are optionally delimited by whitespace.
You can also skip comments by using a different delimiter scanner than WHITESPACES:
Terminals terminals = ...;
Parser<?> delim = Parsers.or(
Scanners.WHITESPACES,
Scanners.JAVA_LINE_COMMENT,
Scanners.JAVA_BLOCK_COMMENT).skipMany();
return parser.from(terminals.tokenizer(), delim).parse(str);
In both examples, it's important to make sure the delimiter scanner can accept an empty string (either through optional() or skipMany()), unless adjacent operator characters shouldn't be parsed as separate operators, e.g. "((" as two left parenthesis operators.
this must be a token level parser.
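Why the delimiter should accept the empty string can be seen in a small regex-based model of a lexer loop. This is a plain-Java sketch, not jparsec's implementation: DelimiterSketch, lex, and the TOKEN pattern are hypothetical, and the model simply assumes that a delimiter is optional before the first token and required between adjacent tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model: tokens are parentheses or integers, separated by `delim`.
final class DelimiterSketch {
  private static final Pattern TOKEN = Pattern.compile("[()]|\\d+");

  static List<String> lex(String source, Pattern delim) {
    List<String> tokens = new ArrayList<>();
    Matcher delimMatcher = delim.matcher(source);
    Matcher tokenMatcher = TOKEN.matcher(source);
    int pos = 0;
    // A leading delimiter is optional in this model.
    if (delimMatcher.region(pos, source.length()).lookingAt()) {
      pos = delimMatcher.end();
    }
    while (pos < source.length()) {
      if (!tokenMatcher.region(pos, source.length()).lookingAt()) {
        throw new IllegalArgumentException("token expected at index " + pos);
      }
      tokens.add(tokenMatcher.group());
      pos = tokenMatcher.end();
      if (pos == source.length()) {
        break;
      }
      // Between tokens the delimiter must match, possibly the empty string.
      if (!delimMatcher.region(pos, source.length()).lookingAt()) {
        throw new IllegalArgumentException("delimiter expected at index " + pos);
      }
      pos = delimMatcher.end();
    }
    return tokens;
  }
}
```

With a delimiter pattern such as \s* the input "((1)" yields four tokens, while a delimiter that requires at least one whitespace character, such as \s+, rejects the same input at index 1 because nothing separates the two parentheses.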
public Parser<List<Token>> lexer(Parser<?> delim)
A Parser that greedily runs this repeatedly, and ignores the pattern recognized by delim before and after each occurrence. The result tokens are wrapped in Token and are collected and returned in a List.
It is normally not necessary to call this method explicitly. from(Parser, Parser) is more convenient for simple uses that just need to connect a token level parser with a lexer that produces the tokens. When more flexible control over the token list is needed, for example, to parse an indentation-sensitive language, a pre-processor of the token list may be needed.
this must be a tokenizer that returns a token value.
public final T parse(CharSequence source)
Parses source.
public final T parse(Readable readable) throws IOException
Parses source read from readable.
Throws: IOException
public final T parse(CharSequence source, Parser.Mode mode)
Parses source under the given mode. For example:
try {
  parser.parse(text, Mode.DEBUG);
} catch (ParserException e) {
  ParseTree parseTree = e.getParseTree();
  ...
}
public final ParseTree parseTree(CharSequence source)
Parses source and returns a ParseTree corresponding to the syntactical structure of the input. Only labeled parser nodes are represented in the parse tree.
If parsing failed, ParserException.getParseTree() can be inspected for the parse tree at error location.
@Deprecated public final T parse(CharSequence source, String moduleName)
Deprecated. Please use parse(CharSequence) instead.
Parses source.
source - the source string
moduleName - the name of the module; this name appears in error messages
@Deprecated public final T parse(Readable readable, String moduleName) throws IOException
Deprecated. Please use parse(Readable) instead.
Parses source read from readable.
readable - where the source is read from
moduleName - the name of the module; this name appears in error messages
Throws: IOException
Copyright © 2013–2018 jparsec. All rights reserved.