bisonc++input(7)

bisonc++ grammar file organization
(bisonc++.6.08.00)

2005-2024

NAME

bisonc++input - Organization of bisonc++'s grammar file(s)

DESCRIPTION

Bisonc++ derives from bison++(1), which was itself derived from bison(1). Like these programs, bisonc++ generates a parser for an LALR(1) grammar. Bisonc++ generates C++ code: an expandable C++ class.

Refer to bisonc++(1) for a general overview. This manual page covers the structure and organization of bisonc++'s grammar file(s).

Bisonc++'s grammar file has the following generic outline:


    directives (see the next section)
    %%
    grammar rules
        

Grammar rules have the following generic form:


    nonterminal:
        production-rules
    ;
        

Production rules consist of zero or more sequences of terminal tokens, nonterminal tokens and/or action blocks. When multiple production rules are used they must be separated from each other by vertical bars. Action blocks are C++ compound statements.

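This manual page contains the following sections: UNDERSCORES, DIRECTIVES, POLYMORPHIC SEMANTIC VALUES, DOLLAR NOTATIONS, RESTRICTIONS ON TOKEN NAMES, OBSOLETE SYMBOLS, USING SYMBOLIC TOKENS IN CLASSES OTHER THAN THE PARSER CLASS, and EXAMPLE.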

UNDERSCORES

Starting with version 6.02.00, bisonc++'s reserved identifiers no longer end in two underscore characters, but in one. This modification was necessary because the C++ standard reserves identifiers containing two or more consecutive underscore characters for the implementation. In practice this may require some minor modifications of existing source files using bisonc++'s facilities, most likely limited to changing Tokens__ into Tokens_ and Meta__ into Meta_.

The complete list of affected names is:

Enums:
DebugMode_, ErrorRecovery_, Return_, Tag_, Tokens_
Enum values:
PARSE_ABORT_, PARSE_ACCEPT_, UNEXPECTED_TOKEN_, sizeofTag_
Type / namespace designators:
Meta_, PI_, STYPE_
Member functions:
clearin_, errorRecovery_, errorVerbose_, executeAction_, lex_, lookup_, nextCycle_, nextToken_, popToken_, pop_, print_, pushToken_, push_, recovery_, redoToken_, reduce_, savedToken_, shift_, stackSize_, startRecovery_, state_, token_, top_, vs_
Protected data members:
d_acceptedTokens_, d_actionCases_, d_debug_, d_nErrors_, d_requiredTokens_, d_val_, idOfTag_, s_nErrors_
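
When migrating existing sources, the rule quoted above can be checked mechanically. The following is a minimal sketch using a hypothetical helper, hasDoubleUnderscore (not part of bisonc++), that flags the pre-6.02.00 names. Note that the standard also reserves other names (e.g., names starting with an underscore followed by an uppercase letter), which this sketch does not check.

```cpp
#include <string_view>

// Hypothetical helper, not part of bisonc++: returns true if `name`
// contains two consecutive underscores anywhere. The C++ standard
// reserves such identifiers for the implementation, which is why
// bisonc++ 6.02.00 renamed, e.g., Tokens__ to Tokens_.
bool hasDoubleUnderscore(std::string_view name)
{
    return name.find("__") != std::string_view::npos;
}
```

For example, hasDoubleUnderscore("Meta__") returns true, while hasDoubleUnderscore("Meta_") returns false.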

DIRECTIVES

Quite a few directives can be specified in the initial section of the grammar specification file. If command-line options for directives are available, then their specifications take precedence over the corresponding directives in the grammar file. Once the class header or implementation header files exist, directives affecting those files are ignored.

Directives accepting a 'filename' do not accept path names, i.e., their arguments cannot contain directory separators (/); directives accepting a 'pathname' may contain directory separators. A 'pathname' containing blank characters should be surrounded by double quotes.

Some directives may generate errors. This happens when their specifications conflict with the contents of files bisonc++ cannot modify (e.g., a parser class header file exists that does not define a namespace, but a %namespace directive is provided in a later run).

To resolve such errors the offending directive could be omitted, the existing file could be removed, or the existing file could be hand-edited according to the directive's specification.

POLYMORPHIC SEMANTIC VALUES

Like bison(1), bisonc++ uses int semantic values by default. It also supports the %stype and %union directives, which use, respectively, a single (user-specified) type or a traditional C-style union as semantic value type. These types of semantic values are covered in bisonc++'s manual.

In addition, the %polymorphic directive can be specified to generate a parser using `polymorphic' semantic values. In this case semantic values are specified as pairs, consisting of tags (which are C++ identifiers), and C++ (pointer or value) type names. Tags and type names are separated by colons. Multiple tag and type name combinations are separated by semicolons, and an optional semicolon ends the final tag/type pair.

Here is an example, defining three semantic values: an int, a std::string and a std::vector<double>:


    %polymorphic INT: int; STRING: std::string; 
                 VECT: std::vector<double>
        
The identifier to the left of the colon is called the tag-identifier (or simply tag), and the type name to the right of the colon is called the type-name. Starting with bisonc++ version 4.12.00 the types no longer have to provide default constructors.
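
The pairing of tags with type-names can be pictured in plain C++. The sketch below is hypothetical and is not how bisonc++ implements Meta_::SType. It pairs the tags of the %polymorphic example above with the alternatives of a std::variant, so that a tag's position in the Tag enum equals the index of its type in the variant. The names Tag, Value, tagOf and valueOf are invented for this illustration (C++17 is assumed).

```cpp
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical illustration only: NOT bisonc++'s Meta_::SType.
// Mirrors   %polymorphic INT: int; STRING: std::string;
//                        VECT: std::vector<double>
// Each enumerator's position matches the index of its variant alternative.
enum class Tag { INT, STRING, VECT };

using Value = std::variant<int, std::string, std::vector<double>>;

// Returns the tag of the type currently stored in `value`.
Tag tagOf(Value const &value)
{
    return static_cast<Tag>(value.index());
}

// Accesses the stored value through its tag; std::get throws
// std::bad_variant_access if `value` currently holds another type.
template <Tag tag>
auto &valueOf(Value &value)
{
    return std::get<static_cast<std::size_t>(tag)>(value);
}
```

For example, after Value value = 42; the call valueOf<Tag::INT>(value) returns a reference to the stored int, whereas valueOf<Tag::STRING>(value) throws std::bad_variant_access.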

When polymorphic type-names refer to types that have not yet been declared by the parser's base class header, these types must be (directly or indirectly) declared in a header file whose location is specified using the %baseclass-preinclude directive.

%type directives are used to associate (non-)terminals with semantic value types. E.g., after:


    %polymorphic INT: int; TEXT: std::string
    %type <INT> expr
        
the expr nonterminal returns int semantic values. In a rule like:

    expr:
        expr '+' expr
        {
            $$ = $1 + $3;   // action block: C++ statements here
        }
    ;
symbols $$, $1, and $3 represent int values, and can be used that way in the C++ action block.
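
The relation between $$, $1 and $3 and the elements of a production rule can be sketched with a value stack. This is a conceptual model of LR parsing, not bisonc++'s generated code (which uses its own members). The function reducePlus is hypothetical and assumes that the rule's action is $$ = $1 + $3;.

```cpp
#include <cstddef>
#include <vector>

// Conceptual sketch only, not bisonc++'s generated code: an LR parser keeps
// one semantic value per grammar symbol on a stack. When the rule
//     expr: expr '+' expr
// is reduced, $1, $2 and $3 are the top three elements (oldest first) and
// the value assigned to $$ replaces them. Assumes at least three elements.
void reducePlus(std::vector<int> &valueStack)
{
    std::size_t const size = valueStack.size();
    int const dollar1 = valueStack[size - 3];   // $1: the left expr
    int const dollar3 = valueStack[size - 1];   // $3: the right expr
                                                // $2 ('+') is not used
    int const dollarDollar = dollar1 + dollar3; // the action: $$ = $1 + $3;

    valueStack.resize(size - 3);                // pop the rule's 3 elements
    valueStack.push_back(dollarDollar);         // push $$ for the nonterminal
}
```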

Definitions and declarations

The %polymorphic directive adds the following definitions and declarations to the generated base class header and parser source file (if the %namespace directive was used then all declared/defined elements are placed inside the namespace that is specified by the %namespace directive):

The namespace Meta_ contains, among other classes, the class SType. The parser's semantic value type STYPE_ is equal to Meta_::SType.

Meta_::SType provides the standard user interface for using polymorphic semantic data types. It declares the following public interface:

DOLLAR NOTATIONS

Inside action blocks, dollar-notations can be used to retrieve and assign values from/to the elements of production rules. Type directives are used to associate dollar-notations with semantic types.

When %stype is specified (and with the default int semantic value type) the following dollar-notations are available:

When %union is specified these dollar-notations are available:

When %polymorphic is specified these dollar-notations can be used:

RESTRICTIONS ON TOKEN NAMES

To avoid collisions with names defined by the parser's (base) class, the following identifiers should not be used as token names:

OBSOLETE SYMBOLS

All DECLARATIONS and DEFINE symbols not listed above but defined in bison++ are obsolete with bisonc++. In particular, there is no %header{ ... %} section anymore. Also, all DEFINE symbols related to member functions are now obsolete. There is no need for these symbols anymore, as member functions can simply be declared in the class header file and defined elsewhere.

USING SYMBOLIC TOKENS IN CLASSES OTHER THAN THE PARSER CLASS

The tokens defined in the grammar files processed by bisonc++ usually must also be available to the lexical scanner, which returns those tokens when certain regular expressions are matched. E.g., a NUMBER token may be used in the grammar, and the lexical scanner may be expected to return that token when the input matches the [0-9]+ regular expression. To avoid circular dependencies among classes, the tokens can be written to a separate file using the token-path directive or option. The location and name of this file are specified by the token-path specification, and the file is generated from scratch at every run of bisonc++. By default the grammar's symbolic tokens are made available in the class Tokens, and classes may refer to its tokens using the Tokens class scope (e.g., Tokens::NUMBER).

Before bisonc++ version 6.04.00 tokens were made available by including the file parserbase.h, using a simple #define suggesting that the tokens were in fact defined by the parser class itself. Using this scheme lexical scanner specifications returned, e.g., Parser::NUMBER when [0-9]+ was matched. Unless the token-path directive or option is used this approach is still available, but its use is deprecated.
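
The following sketch suggests what a file written by the token-path directive might contain, together with a scanner-side function using it. Both parts are hypothetical approximations. The exact contents of the generated file (e.g., the enum's name and its first value, assumed to be 257 here) may differ, and tokenFor merely mimics the flexc++ rules of the EXAMPLE section. Starting the symbolic tokens above the range of character values lets a scanner return both Tokens::NUMBER and single-character tokens like '+' from the same function.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for the file written by the token-path directive;
// the file bisonc++ actually generates may differ in detail. Symbolic
// tokens start above the range of single-character tokens such as '+'.
struct Tokens
{
    enum Tokens_
    {
        NUMBER = 257,   // assumed first value
        EOLN
    };
};

// Hypothetical scanner helper mimicking the EXAMPLE's flexc++ rules.
// Assumes `lexeme` is one complete match of one of those rules.
int tokenFor(std::string const &lexeme)
{
    if (lexeme.empty())
        return 0;                       // assumed end-of-input value
    if (lexeme == "\n")
        return Tokens::EOLN;

    for (unsigned char ch: lexeme)
        if (!std::isdigit(ch))          // not [0-9]+: a character token
            return static_cast<unsigned char>(lexeme[0]);

    return Tokens::NUMBER;              // matched [0-9]+
}
```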

EXAMPLE

Using a fairly traditional example, we construct a simple calculator below. The basic operators as well as parentheses can be used to specify expressions, and each expression should be terminated by a newline. The program terminates when a q is entered. Empty lines result in a mere prompt.

First an associated grammar is constructed. When a syntactic error is encountered, all tokens are skipped until the next newline and a simple message is printed using the default error function. It is assumed that no semantic errors occur (in particular, no divisions by zero). The grammar is decorated with actions performed when the corresponding grammatical production rule is recognized. The grammar itself is rather standard and straightforward, but note the first part of the specification file, which contains various other directives. Among them are the %scanner directive, resulting in a composed d_scanner object as well as an implementation of the member function int lex(), and the %token-path directive, defining the class Tokens in the file ../tokens/tokens.h. In this example, the Scanner class is generated by flexc++(1). The details of constructing a class using flexc++ are beyond the scope of this man-page, but flexc++'s specification file is shown below.

Here is bisonc++'s input file:

%filenames parser
%scanner    ../scanner/scanner.h
%token-path ../tokens/tokens.h

                                // lowest precedence
%token  NUMBER                  // integral numbers
        EOLN                    // newline

%left   '+' '-' 
%left   '*' '/' 
%right  UNARY
                                // highest precedence 

%%

expressions:
    expressions  evaluate
|
    prompt
;

evaluate:
    alternative prompt
;

prompt:
    {
        prompt();
    }
;

alternative:
    expression EOLN
    {
        cout << $1 << endl;
    }
|
    'q' done
|
    EOLN
|
    error EOLN
;

done:
    {
        cout << "Done.\n";
        ACCEPT();
    }
;

expression:
    expression '+' expression
    {
        $$ = $1 + $3;
    }
|
    expression '-' expression
    {
        $$ = $1 - $3;
    }
|
    expression '*' expression
    {
        $$ = $1 * $3;
    }
|
    expression '/' expression
    {
        $$ = $1 / $3;
    }
|
    '-' expression      %prec UNARY
    {
        $$ = -$2;
    }
|
    '+' expression      %prec UNARY
    {
        $$ = $2;
    }
|
    '(' expression ')'
    {
        $$ = $2;
    }
|
    NUMBER
    {
        $$ = stoul(d_scanner.matched());
    }
;

Bisonc++ processes this file, generating the following files:

For this program no additional members had to be defined in the class Parser. Bisonc++ defines the member function parse in the source file parse.cc, which includes parser.ih.

As cout, endl and stoul are used in the grammar's actions without std:: prefixes, a using namespace std directive (or comparable using-declarations) is required. It is specified in parser.ih. Here is the implementation header making the standard namespace available:

// Generated by Bisonc++ V5.00.00 on Sun, 03 Apr 2016 17:51:26 +0200

    // Include this file in the sources of the class Parser.

// $insert class.h
#include "parser.h"


inline void Parser::error()
{
    std::cerr << "Syntax error\n";
}

// $insert lex
inline int Parser::lex()
{
    return d_scanner.lex();
}

inline void Parser::print()         
{
    print_();           // displays tokens if --print was specified
}

inline void Parser::exceptionHandler(std::exception const &exc)         
{
    throw;              // re-implement to handle exceptions thrown by actions
}


    // Add here includes that are only required for the compilation 
    // of Parser's sources.



    // UN-comment the next using-directive if you want to use
    // in Parser's sources symbols from the namespace std without
    // specifying std::

using namespace std;

In the current context the implementation of the member function parse is not very relevant (it should not be modified by the programmer anyway). It is not shown here, but is available as calculator/parser/parse.cc in the distribution's demos/ directory after building the calculator using the build script provided there.

The lexical scanner is generated by flexc++(1) from the following specification file, using the command flexc++ lexer:

// see also regression/calculator/scanner

%interactive
%filenames scanner

%%

[ \t]+                          // skip white space

\n                              return Tokens::EOLN;

[0-9]+                          return Tokens::NUMBER;

.                               return matched()[0];


%%


Finally, here is the program's main function:

#include "parser/parser.h"

int main()
{
    Parser calculator;
    return calculator.parse();
}
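
The %left and %right declarations in the grammar determine how inputs like 8 - 3 - 2 or -2 * 3 are grouped. The sketch below makes that grouping concrete without bisonc++. It is a hand-written recursive-descent evaluator, a different technique from the table-driven LALR(1) parser bisonc++ generates, whose call structure mirrors the grammar's precedence levels. The class name Evaluator is hypothetical, and it handles a single expression without the calculator's newline, q, and error-recovery handling.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hand-written recursive-descent evaluator: a swapped-in technique, NOT what
// bisonc++ generates. Each member handles one precedence level of the
// example grammar:
//   sum:     '+' '-'         lowest precedence, left-associative (%left)
//   product: '*' '/'         left-associative (%left)
//   unary:   unary '-' '+'   highest precedence (%prec UNARY)
class Evaluator
{
    std::string d_text;
    std::size_t d_pos = 0;

public:
    explicit Evaluator(std::string text)
    :
        d_text(std::move(text))
    {}

    long evaluate()                     // evaluates the complete text
    {
        long value = sum();
        skipBlanks();
        if (d_pos != d_text.size())     // unexpected trailing input
            throw std::runtime_error("Syntax error");
        return value;
    }

private:
    void skipBlanks()
    {
        while (d_pos < d_text.size()
               && (d_text[d_pos] == ' ' || d_text[d_pos] == '\t'))
            ++d_pos;
    }

    bool accept(char ch)                // consumes ch if it comes next
    {
        skipBlanks();
        if (d_pos == d_text.size() || d_text[d_pos] != ch)
            return false;
        ++d_pos;
        return true;
    }

    long sum()                          // loop: a - b - c is (a - b) - c
    {
        long value = product();
        for (;;)
            if (accept('+'))
                value += product();
            else if (accept('-'))
                value -= product();
            else
                return value;
    }

    long product()                      // loop: a / b / c is (a / b) / c
    {
        long value = unary();
        for (;;)
            if (accept('*'))
                value *= unary();
            else if (accept('/'))
                value /= unary();       // like the grammar: no zero check
            else
                return value;
    }

    long unary()                        // recursion: - - 3 equals 3
    {
        if (accept('-'))
            return -unary();
        if (accept('+'))
            return unary();
        return primary();
    }

    long primary()                      // '(' expression ')' or NUMBER
    {
        if (accept('('))
        {
            long value = sum();
            if (!accept(')'))
                throw std::runtime_error("Syntax error");
            return value;
        }
        skipBlanks();
        std::size_t const begin = d_pos;
        while (d_pos < d_text.size()
               && std::isdigit(static_cast<unsigned char>(d_text[d_pos])))
            ++d_pos;
        if (begin == d_pos)
            throw std::runtime_error("Syntax error");
        return std::stol(d_text.substr(begin, d_pos - begin));
    }
};
```

For example, Evaluator("8 - 3 - 2").evaluate() groups the input as (8 - 3) - 2 and yields 3, and Evaluator("1 + 2 * 3").evaluate() yields 7, matching the values the calculator prints for those lines.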

SEE ALSO

bison(1), bison++(1), bisonc++(1), bisonc++api(3), bison.info (using texinfo), flexc++(1), https://fbb-git.gitlab.io/bisoncpp/

Lakos, J. (2001) Large Scale C++ Software Design, Addison Wesley.
Aho, A.V., Sethi, R., Ullman, J.D. (1986) Compilers, Addison Wesley.

AUTHOR

Frank B. Brokken (f.b.brokken@rug.nl).