NAME
Text::TokenStream - lexer to break text up into user-defined tokens
SYNOPSIS
    use Text::TokenStream;
    use Text::TokenStream::Lexer;

    my $lexer = Text::TokenStream::Lexer->new(
        whitespace => [qr/\s+/],
        rules => [
            word => qr/\w+/,
            sym => qr/[^\w\s]+/,
        ],
    );
    my $stream = Text::TokenStream->new(
        lexer => $lexer,
        input => "foo *",
    );
    my $tok1 = $stream->next; # --> "word" token containing "foo"
    my $tok2 = $stream->next; # --> "sym" token containing "*"
DESCRIPTION
This class is part of a collection of classes that act together to lex
(aka scan) an input text into a stream of tokens.
This token stream class provides the stream interface, along with a
notion of the "current position" in the input text, and position-aware
error reporting. It composes Text::TokenStream::Role::Stream; that role
lists the methods this class provides, so that a parser class holding a
token stream can easily delegate the tokenizer methods to it (for
example, through a Moo attribute's handles option).
The basic lexer machinery is found in Text::TokenStream::Lexer; it is
separated out from the token stream so that it can be reused across
many inputs.
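As a minimal sketch of that reuse, assuming Text::TokenStream is installed, the following builds one lexer with the SYNOPSIS rules and feeds it to a fresh stream per input. The %count hash and the sample inputs are illustrative only.

```perl
use strict;
use warnings;
use Text::TokenStream;
use Text::TokenStream::Lexer;

# One lexer, built once, using the rules from the SYNOPSIS.
my $lexer = Text::TokenStream::Lexer->new(
    whitespace => [qr/\s+/],
    rules => [
        word => qr/\w+/,
        sym  => qr/[^\w\s]+/,
    ],
);

# Illustrative bookkeeping: number of tokens found in each input.
my %count;

# Each input gets its own stream, and therefore its own current position.
for my $input ("foo *", "bar + baz") {
    my $stream = Text::TokenStream->new(lexer => $lexer, input => $input);
    $count{$input} = 0;
    $count{$input}++ while defined $stream->next;
    print "'$input' has $count{$input} tokens\n";
}
```

The lexer holds only the rules, so nothing about one input carries over to the next; all per-input state lives in the stream.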
Tokens are instances of a class, Text::TokenStream::Token by default.
CONSTRUCTOR
This class uses Moo, and inherits the standard new constructor.
ATTRIBUTES
lexer
An instance of Text::TokenStream::Lexer; required; read-only. Will be
used to find tokens in the input.
input
Str; required; read-only. The text that will be lexed into a stream of
tokens.
input_name
A Maybe[Path]; read-only. Can be coerced from a string. If a defined
value is present, it should contain the name of the file that the input
was read from, and that name will be used in any error messages.
token_class
The name of a class that inherits from Text::TokenStream::Token;
defaults to Text::TokenStream::Token itself; read-only. Tokens found in
the input will be constructed as instances of this class.
OTHER METHODS
collect_all
Takes no arguments. Returns a list of all remaining tokens found in the
input.
In the current implementation, this method is provided by
Text::TokenStream::Role::Stream.
collect_upto
Takes a single argument indicating a token to match, as with
Text::TokenStream::Token#matches. Scans through the input until it
finds a token that matches the argument, and returns a list of all
tokens before the matching one. If no remaining token in the input
matches the argument, behaves as "collect_all".
In the current implementation, this method is provided by
Text::TokenStream::Role::Stream.
create_token
Takes a flattened hash (a list of key/value pairs) of token attributes,
and creates a token instance. The token object is created by calling:
    $self->token_class->new(%data);
If you have particularly complex needs, you may wish to override this
method in a subclass.
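One way such an override might look is sketched below, assuming Text::TokenStream is installed. The subclass name My::CountingStream and its tokens_created attribute are hypothetical; the sketch wraps create_token with a Moo method modifier and leaves the actual construction to the inherited method.

```perl
package My::CountingStream;    # hypothetical subclass name
use Moo;
extends 'Text::TokenStream';

# Hypothetical attribute: how many tokens this stream has created so far.
has tokens_created => (is => 'rwp', default => 0);

# Wrap create_token: count the call, then defer to the inherited method.
around create_token => sub {
    my ($orig, $self, %data) = @_;
    $self->_set_tokens_created($self->tokens_created + 1);
    return $self->$orig(%data);
};

package main;
use strict;
use warnings;
use Text::TokenStream::Lexer;

my $lexer = Text::TokenStream::Lexer->new(
    whitespace => [qr/\s+/],
    rules => [
        word => qr/\w+/,
        sym  => qr/[^\w\s]+/,
    ],
);
my $stream = My::CountingStream->new(lexer => $lexer, input => "foo *");
my @tokens = $stream->collect_all;
printf "%d tokens created\n", $stream->tokens_created;
```

If all you need is a different token class, setting token_class is likely simpler than overriding create_token.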
current_position
Takes no arguments. Returns the 0-based position of the first input
character that hasn't yet been returned by "next".
err
Takes multiple arguments, which are concatenated into an error message.
(If no arguments are supplied, acts as if you'd supplied the string
"Something's wrong".) Throws an exception, reporting the locus of the
error as the current input position (using 1-based line and column
numbers).
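To illustrate how a 0-based position relates to the 1-based line and column numbers used in error messages, here is a self-contained sketch. The helper line_and_column is hypothetical and is not claimed to be how this class computes the locus.

```perl
use strict;
use warnings;

# Hypothetical helper: map a 0-based character offset in $input to a
# 1-based (line, column) pair.
sub line_and_column {
    my ($input, $offset) = @_;
    my $before = substr $input, 0, $offset;
    my $line   = 1 + ($before =~ tr/\n//);    # newlines before the offset
    my $bol    = rindex($before, "\n") + 1;   # offset where this line begins
    my $column = 1 + $offset - $bol;
    return ($line, $column);
}

my ($line, $column) = line_and_column("foo\nbar baz", 8);
print "line $line, column $column\n";    # prints "line 2, column 5"
```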
fill
Takes a single positive-integer argument. Attempts to fill an internal
buffer of already-lexed tokens so that it contains that many tokens.
Returns a boolean that is true iff there were enough tokens to do that.
looking_at
Takes zero or more arguments, each of which indicates a token to match,
as with Text::TokenStream::Token#matches. Returns a boolean that is
true iff there's at least one more token in the input, and it matches
the argument.
next
Takes no arguments. Returns the next token found in the input, and
advances the current position past it; if no tokens remain, returns
undef. The token instance is created by "create_token".
next_of
Takes a single argument indicating a token to match, as with
Text::TokenStream::Token#matches, and an optional string argument
describing the current position (for example, "in expression", or
"after keyword"). If there are no more tokens in the input, reports an
error at the current position, using "err". Otherwise, if the next
token doesn't match the argument, reports an error at the position of
that token, using "token_err". Otherwise, the next token matches what
is being looked for, so that token is returned.
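The sketch below shows how next_of and collect_upto might be combined to read a parenthesised list, assuming Text::TokenStream is installed. It rests on two assumptions that this document does not confirm: that a plain string is an acceptable matcher and is compared against the token's text (see Text::TokenStream::Token#matches for the accepted forms), and that tokens have a text accessor.

```perl
use strict;
use warnings;
use Text::TokenStream;
use Text::TokenStream::Lexer;

my $lexer = Text::TokenStream::Lexer->new(
    whitespace => [qr/\s+/],
    rules => [
        word => qr/\w+/,
        sym  => qr/[^\w\s]+/,
    ],
);
my $stream = Text::TokenStream->new(
    lexer => $lexer,
    input => "( foo bar ) baz",
);

# Assumption: a string matcher is compared against the token's text.
$stream->next_of('(', 'at start of list');   # throws unless "(" is next
my @items = $stream->collect_upto(')');      # tokens before the ")"
$stream->next_of(')', 'at end of list');     # throws unless ")" is next

# Assumption: tokens provide a "text" accessor.
print "list items: ", join(', ', map { $_->text } @items), "\n";
```

Because next_of throws on a mismatch, the happy path needs no explicit error checks; wrap the calls in eval if the caller should recover from a malformed list.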
peek
Takes no arguments. Returns the next token that would be returned by
"next", without advancing the current input position; a subsequent
"next" call will return the same token.
An internal buffer is used to ensure that every token is lexed only
once.
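A short sketch of lookahead, assuming Text::TokenStream is installed: it fills the buffer, peeks, and then consumes a token, printing the current position at each step. The text accessor on tokens is an assumption not confirmed by this document.

```perl
use strict;
use warnings;
use Text::TokenStream;
use Text::TokenStream::Lexer;

my $lexer = Text::TokenStream::Lexer->new(
    whitespace => [qr/\s+/],
    rules => [
        word => qr/\w+/,
        sym  => qr/[^\w\s]+/,
    ],
);
my $stream = Text::TokenStream->new(lexer => $lexer, input => "foo *");

# The input holds two tokens, so asking for three cannot succeed.
my $have_two   = $stream->fill(2);
my $have_three = $stream->fill(3);
print "two tokens available: ",   ($have_two   ? "yes" : "no"), "\n";
print "three tokens available: ", ($have_three ? "yes" : "no"), "\n";

# Peeking leaves the position alone; next moves past the token.
my $ahead      = $stream->peek;
my $pos_peeked = $stream->current_position;
my $token      = $stream->next;
my $pos_taken  = $stream->current_position;

# Assumption: tokens provide a "text" accessor.
print "peeked '", $ahead->text, "' at position $pos_peeked\n";
print "took '",   $token->text, "', position is now $pos_taken\n";
```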
skip_optional
Takes a single argument indicating a token to match, as with
Text::TokenStream::Token#matches. If there are no more tokens in the
input, or the next token doesn't match the argument, returns false;
otherwise, advances past the next token, and returns true.
token_err
Takes a token as an argument, followed by multiple arguments that are
concatenated into an error message. (If no non-token arguments are
supplied, acts as if you'd supplied the string "Something's wrong".)
Throws an exception, reporting the locus of the error as the position
of the token (using 1-based line and column numbers).
AUTHOR
Aaron Crane
COPYRIGHT
Copyright 2021 Aaron Crane.
LICENCE
This library is free software and may be distributed under the same
terms as perl itself. See http://dev.perl.org/licenses/.