\"      $NetBSD: yacc.1,v 1.10 2024/09/14 22:13:34 christos Exp $
\"
\" Copyright (c) 1989, 1990 The Regents of the University of California.
\" All rights reserved.
\"
\" This code is derived from software contributed to Berkeley by
\" Robert Paul Corbett.
\"
\" Redistribution and use in source and binary forms, with or without
\" modification, are permitted provided that the following conditions
\" are met:
\" 1. Redistributions of source code must retain the above copyright
\"    notice, this list of conditions and the following disclaimer.
\" 2. Redistributions in binary form must reproduce the above copyright
\"    notice, this list of conditions and the following disclaimer in the
\"    documentation and/or other materials provided with the distribution.
\" 3. Neither the name of the University nor the names of its contributors
\"    may be used to endorse or promote products derived from this software
\"    without specific prior written permission.
\"
\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
\" SUCH DAMAGE.
\"
\"      from: @(#)yacc.1        5.7 (Berkeley) 7/30/91
\"      from: Id: yacc.1,v 1.24 2014/10/06 00:03:48 tom Exp
\"      $NetBSD: yacc.1,v 1.10 2024/09/14 22:13:34 christos Exp $
\"
Dd September 14, 2024
Dt YACC 1
Os
Sh NAME
Nm yacc
Nd an
Tn LALR Ns (1)
parser generator
Sh SYNOPSIS
Nm
Op Fl BdhgilLPrtvVy
Op Fl b Ar file_prefix
Op Fl H Ar defines_file
Op Fl o Ar output_file
Op Fl p Ar symbol_prefix
Ar filename
Sh DESCRIPTION
Nm
reads the grammar specification in the file
Ar filename
and generates an
Tn LALR Ns (1)
parser for it.
The parsers consist of a set of
Tn LALR Ns (1)
parsing tables and a driver routine
written in the C programming language.
Nm
normally writes the parse tables and the driver routine to the file
Pa y.tab.c .
Pp
The following options are available:
Bl -tag -width Fl
It Fl b Ar file_prefix
The
Fl b
option changes the prefix prepended to the output file names to
the string denoted by
Ar file_prefix .
The default prefix is the character
Ql y .
It Fl B
Create a backtracking parser (compile-type configuration for
Nm ) .
It Fl d
causes the header file
Pa y.tab.h
to be written.
It contains
No #define Ns 's
for the token identifiers.
It Fl h
print a usage message to the standard error.
It Fl H Ar defines_file
causes
No #define Ns 's
for the token identifiers
to be written to the given
Ar defines_file
rather
than the
Pa y.tab.h
file used by the
Fl d
option.
It Fl g
The
Fl g
option causes a graphical description of the generated
Tn LALR Ns (1)
parser to be written to the file
Pa y.dot
in graphviz format, ready to be processed by
Xr dot 1 .
It Fl i
The
Fl i
option causes a supplementary header file
Pa y.tab.i
to be written.
It contains extern declarations
and supplementary
No #define Ns 's
as needed to map the conventional
Nm
Va yy Ns \&-prefixed
names to whatever the
Fl p
option may specify.
The code file, e.g.,
Pa y.tab.c
is modified to
No #include
this file as well as the
Pa y.tab.h
file, enforcing consistent usage of the symbols defined in those files.
The supplementary header file makes it simpler to separate compilation
of lex- and yacc-files.
It Fl l
If the
Fl l
option is not specified,
Nm
will insert
No #line
directives in the generated code.
The
No #line
directives let the C compiler relate errors in the
generated code to the user's original code.
If the
Fl l
option is specified,
Nm
will not insert the
No #line
directives.
No #line
directives specified by the user will be retained.
It Fl L
Enable position processing, e.g.,
Ql %locations
(compile-type configuration for
Nm ) .
It Fl o Ar output_file
specify the filename for the parser file.
If this option is not given, the output filename is
the file prefix concatenated with the file suffix, e.g.
Pa y.tab.c .
This overrides the
Fl b
option.
It Fl p Ar symbol_prefix
The
Fl p
option changes the prefix prepended to yacc-generated symbols to
the string denoted by
Ar symbol_prefix .
The default prefix is the string
Ql yy .
It Fl P
create a reentrant parser, e.g.,
Ql %pure-parser .
It Fl r
The
Fl r
option causes
Nm
to produce separate files for code and tables.
The code file is named
Pa y.code.c ,
and the tables file is named
Pa y.tab.c .
The prefix
Ql y
can be overridden using the
Fl b
option.
It Fl s
Suppress
No #define
statements generated for string literals in a
Ql %token
statement, to more closely match original
Nm
behavior.
Pp
Normally when
Nm
sees a line such as
Pp
Dl %token OP_ADD \*qADD\*q
Pp
it notices that the quoted
Dq ADD
is a valid C identifier, and generates a
No #define
not only for
Dv OP_ADD ,
but for
Dv ADD
as well,
e.g.,
Bd -literal -offset indent
#define OP_ADD 257
#define ADD 258
Ed
Pp
The original
Nm
does not generate the second
No #define .
The
Fl s
option suppresses this
No #define .
Pp
St -p1003.1
documents only names and numbers for
Ql %token ,
though the original
Nm
and
Xr bison 1
also accept string literals.
It Fl t
The
Fl t
option changes the preprocessor directives generated by
Nm
so that debugging statements will be incorporated in the compiled code.
It Fl v
The
Fl v
option causes a human-readable description of the generated parser to
be written to the file
Pa y.output .
It Fl V
The
Fl V
print the version number to the standard output.
It Fl y
Nm
ignores this option,
which
Xr bison 1
supports for ostensible POSIX compatibility.
El
Pp
The filename parameter is not optional.
However,
Nm
accepts a single
Dq \&-
to read the grammar from the standard input.
A double
Dq \&--
marker denotes the end of options.
A single filename  parameter  is  expected after a
Dq \&--
marker.
Sh EXTENSIONS
Nm
provides some extensions for
compatibility with
Xr bison 1
and other implementations of yacc.
It accepts several
Ql long options
which have equivalents in
Nm .
The
Ql %destructor
and
Ql %locations
features are available only if
Nm yacc
has been configured and compiled to support the back-tracking
Aq ( btyacc )
functionality.
The remaining features are always available:
Bl -tag -width Fl
It Ic %code Ar keyword { Ar code Ic }
Adds the indicated source code at a given point in the output
file.
The optional
Ar keyword
tells yacc where to insert the
Ar code :
Bl -tag -width Fl
It Ic top
just after the version-definition in  the  generated  code-file.
It Ic requires
just after the declaration of public parser variables.
If the
Fl d
option is given, the code is inserted at the beginning of the
Ar defines_file .
It Ic provides
just after the declaration of private parser variables.
If the
Fl d
option is given, the code is inserted at the end  of the
Ar defines_file .
El
Pp
If no
Ar keyword
is given, the code is inserted at the beginning of
the section of code copied verbatim from the source file.
Multiple
Ar %code
directives may be given;
Nm
inserts those into the corresponding code- or defines_file in the order that
they appear in the source file.
It Ic %debug
This has the same effect as the
Fl t
command-line option.
It Ic %destructor { Ar code Ic } Ar symbol Ns +
defines code that is invoked when a symbol is automatically
discarded during error recovery.
This code can be used to
reclaim dynamically allocated memory associated with the corresponding
semantic value for cases where user actions cannot manage the memory
explicitly.
Pp
On encountering a parse error, the generated parser
discards symbols on the stack and input tokens until it reaches a state
that will allow parsing to continue.
This error recovery approach results in a memory leak
if the
Vt YYSTYPE
value is, or contains, pointers to dynamically allocated memory.
Pp
The bracketed
Ar code
is invoked whenever the parser discards one of the symbols.
Within it
Sq Li $$
or
Sq Li $\*[Lt] Ns Ar tag Ns Li \*[Gt]$
designates the semantic value associated with the discarded symbol, and
Sq Li @$
designates its location (see
Ql %locations
directive).
Pp
A per-symbol destructor is defined by listing a grammar symbol
in
Ar symbol Ns + .
A per-type destructor is defined  by listing a semantic type tag (e.g.,
Sq Li \*[Lt] Ns Ar some_tag Ns Li \*[Gt] )
in
Ar symbol Ns + ;
in this case, the parser will invoke
Ar code
whenever it discards any grammar symbol that has that semantic type tag,
unless that symbol has its own per-symbol destructor.
Pp
Two categories of default destructor are supported that are
invoked when discarding any grammar symbol that has no per-symbol and no
per-type destructor:
Bl -bullet
It
The code for
Sq Li \*[Lt]*\*[Gt]
is used
for grammar symbols that have an explicitly declared semantic type tag
(via
Ql %type ) ;
It
The code for
Sq Li \*[Lt]\*[Gt]
is used for grammar symbols that have no declared semantic type tag.
El
It Ic %empty
ignored by
Nm .
It Ic %expect Ar number
tells
Nm
the expected number of shift/reduce conflicts.
That makes it only report the number if it differs.
It Ic %expect-rr Ar number
tell
Nm
the expected number of reduce/reduce conflicts.
That makes it only report the number if it differs.
This is, unlike
Xr bison 1 ,
allowable in
Tn LALR Ns (1)
parsers.
It Ic %locations
Tell
Nm
to enable  management of position information associated with each token,
provided by the lexer in the global variable
Va yylloc ,
similar to management of semantic value information provided in
Va yylval .
Pp
As for semantic values, locations can be referenced within actions using
Sq Li @$
to refer to the location of the left hand side symbol, and
Sq Li @ Ns Ar N\|
Ar ( N
an integer) to refer to the location of one of the right hand side
symbols.
Also as for semantic values, when a rule is matched, a default
action is used the compute the location represented by
Sq Li @$
as the beginning of the first symbol and the end of the last symbol
in the right hand side of the rule.
This default computation can be overridden by
explicit assignment to
Sq Li @$
in a rule action.
Pp
The type of
Va yylloc
is
Vt YYLTYPE ,
which is defined by default as:
Bd -literal -offset indent
typedef struct YYLTYPE {
   int first_line;
   int first_column;
   int last_line;
   int last_column;
} YYLTYPE;
Ed
Pp
Vt YYLTYPE
can be redefined by the user
Dv ( YYLTYPE_IS_DEFINED
must be defined, to inhibit the default)
in the declarations section of the specification file.
As in
Xr bison 1 ,
the macro
Dv YYLLOC_DEFAULT
is invoked each time a rule is matched to calculate a position for the
left hand side of the rule, before the associated action is executed;
this macro can be redefined by the user.
Pp
This directive adds a
Vt YYLTYPE
parameter to
Fn yyerror .
If the
Ql %pure-parser
directive is present,
a
Vt YYLTYPE
parameter is added to
Fn yylex
calls.
It Ic %lex-param { Ar argument-declaration Ic }
By default, the lexer accepts no parameters, e.g.,
Fn yylex .
Use this directive to add parameter declarations for your customized lexer.
It Ic %parse-param { Ar argument-declaration Ic }
By default, the parser accepts no parameters, e.g.,
Fn yyparse .
Use this directive to add parameter declarations for your customized parser.
It Ic %pure-parser
Most variables (other than
Va yydebug
and
Va yynerrs )
are allocated on the stack within
Fn yyparse ,
making the parser reasonably reentrant.
It Ic %token-table
Make the parser's names for tokens available in the
Va yytname
array.
However,
Nm
yacc
does not predefine
Dq $end ,
Dq $error
or
Dq $undefined
in this array.
El
Sh PORTABILITY
According to Robert Corbett:
Bd -filled -offset indent
Berkeley Yacc is an
Tn LALR Ns (1)
parser generator.
Berkeley Yacc has been made as compatible as possible with
Tn AT\*[Am]T
Yacc.
Berkeley Yacc can accept any input specification that conforms to the
Tn AT\*[Am]T
Yacc documentation.
Specifications that take advantage of undocumented features of
Tn AT\*[Am]T
Yacc will probably be rejected.
Ed
Pp
The rationale in
%U http://pubs.opengroup.org/onlinepubs/9699919799/utilities/yacc.html
documents some features of
Tn AT\*[Am]T
yacc which are no longer required for POSIX compliance.
Pp
That said, you may be interested in reusing grammar files with some
other implementation which is not strictly compatible with
Tn AT\*[Am]T
yacc.
For instance, there is
Xr bison 1 .
Here are a few differences:
Bl -bullet
It
Nm
accepts an equals mark preceding the left curly brace
of an action (as in the original grammar file
Dv ftp.y ) :
Bd -literal -offset indent
   |   STAT CRLF
       = {
               statcmd();
       }
Ed
It
Nm
and
Xr bison 1
emit code in different order, and in particular
Xr bison 1
makes forward reference to common functions such as
Fn yylex ,
Fn yyparse
and
Fn yyerror
without providing prototypes.
It
Xr bison 1
support for
Ql %expect
is broken in more than one release.
For best results using
Xr bison 1 ,
delete that directive.
It
Xr bison 1
has no equivalent for some of
Nm Ns 's
command-line options, relying on directives embedded in the grammar file.
It
Xr bison 1
Fl y
option does not affect bison's lack of support for
features of AT\*[Am]T yacc which were deemed obsolescent.
It
Nm
accepts multiple parameters with
Ql %lex-param
and
Ql %parse-param
in two forms
Bd -literal -offset indent
{type1 name1} {type2 name2} ...
{type1 name1,  type2 name2 ...}
Ed
Pp
Xr bison 1
accepts the latter (though undocumented), but depending on the
release may generate bad code.
It
Like
Xr bison 1 ,
Nm
will add parameters specified via
Ql %parse-param
to
Fn yyparse ,
Fn yyerror
and (if configured for back-tracking)
to the destructor declared using
Ql %destructor .
Pp
Xr bison 1
puts the additional parameters
Em first
for
Fn yyparse
and
Fn yyerror
but
Em last
for destructors.
Nm
matches this behavior.
El
Sh ENVIRONMENT
The following environment variable is referenced by
Nm :
Bl -tag -width TMPDIR
It Ev TMPDIR
If the environment variable
Ev TMPDIR
is set, the string denoted by
Ev TMPDIR
will be used as the name of the directory where the temporary
files are created.
El
Sh TABLES
The names of the tables generated by this version of
Nm
are
Va yylhs ,
Va yylen ,
Va yydefred ,
Va yydgoto ,
Va yysindex ,
Va yyrindex ,
Va yygindex ,
Va yytable ,
and
Va yycheck .
Two additional tables,
Va yyname
and
Va yyrule ,
are created if
Dv YYDEBUG
is defined and non-zero.
Sh FILES
Bl -tag -compact
It Pa y.code.c
It Pa y.tab.c
It Pa y.tab.h
It Pa y.output
It Pa /tmp/yacc.aXXXXXX
It Pa /tmp/yacc.tXXXXXX
It Pa /tmp/yacc.uXXXXXX
El
Sh DIAGNOSTICS
If there are rules that are never reduced, the number of such rules is
written to the standard error.
If there are any
Tn LALR Ns (1)
conflicts, the number of conflicts is also written
to the standard error.
\" .Sh SEE ALSO
Xr flex 1 ,
Xr lex 1
\" .Xr yyfix 1
Sh STANDARDS
The
Nm
utility conforms to
St -p1003.2 .