Chapter 1: Introduction

This manual describes flexc++, a tool for generating lexical scanners: programs recognizing patterns in text. Usually, scanners are used in combination with parsers, which can be generated by, e.g., bisonc++.

Flexc++ reads one or more input files (called `lexer' in this manual), containing rules: regular expressions, optionally associated with C++ code. From these rules flexc++ generates several files, containing the declaration and implementation of a class (Scanner by default). The class's member function lex is used to analyze input: it looks for text matching the regular expressions. Whenever it finds a match, it executes the associated C++ code.

Flexc++ is highly comparable to the programs flex and flex++, written by Vern Paxson. Our goal was to create a similar program, implemented completely in C++ and generating only C++ code. Most flex / flex++ grammars should be usable with flexc++, with minor adjustments (see also `differences with flex/flex++', chapter 2).

This edition of the manual documents version 2.15.00 and provides detailed information on flexc++'s use and inner workings. Some texts are adapted from the flex manual. The manual page flexc++(1) provides an overview of the command line options and option directives, flexc++api(3) provides an overview of the application programmer's interface, and flexc++input(7) describes the organization of flexc++'s input files.

The most recent version of both this manual and flexc++ itself can be found at https://fbb-git.gitlab.io/flexcpp/. If you find bugs in flexc++ or mistakes in the documentation, please report them to the authors.

Flexc++ was designed and written by Frank B. Brokken, Jean-Paul van Oosten, and (up to version 0.5.3) Richard Berendsen.

1.1: Running Flexc++

Flexc++(1) was designed after flex(1) and flex++(1). Like these two programs, flexc++ generates code performing pattern-matching on text, possibly executing actions when the input matches its regular expressions.

Contrary to flex and flex++, flexc++ generates code that is explicitly intended for use by C++ programs. The well-known flex(1) program generates C source code, whereas flex++(1) merely offers a C++-like shell around the yylex function generated by flex(1) and hardly supports present-day ideas about C++ software design.

Flexc++ creates a C++ class offering a predefined member function lex which matches input against regular expressions and possibly executes C++ code once regular expressions are matched. The code generated by flexc++ is pure C++, allowing its users to apply all of the features offered by that language.

Flexc++'s synopsis is:

    flexc++ [OPTIONS] rules-file

Its options are covered in section 1.1.1; the format of its rules-file is discussed in chapter 3.

1.1.1: Flexc++ options

Where available, single letter options are listed between parentheses following their associated long-option variants. Single letter options require arguments if their associated long options require arguments as well. Options affecting the class header or implementation header file are ignored if these files already exist. Options accepting a `filename' do not accept path names, i.e., they cannot contain directory separators (/); options accepting a `pathname' may contain directory separators.

Some options may generate errors. This happens when an option conflicts with the contents of an existing file which flexc++ cannot modify (e.g., a scanner class header file exists that does not define a namespace, while a --namespace option was provided). To solve the error the offending option could be omitted, the existing file could be removed, or the existing file could be hand-edited according to the option's specification. Note that flexc++ currently does not handle the opposite error condition: if a previously used option is omitted, then flexc++ does not detect the inconsistency. In those cases you may encounter compilation errors.

1.2: Some simple examples

1.2.1: A simple lexer file and main function

The following lexer file detects identifiers:

%%
[_a-zA-Z][_a-zA-Z0-9]* return 1;

The main() function below defines a Scanner object, and calls lex() as long as it does not return 0. lex() returns 0 once the end of the input stream is reached. By default, input is read from std::cin.

#include <iostream>
#include "Scanner.h"

using namespace std;

int main()
{
	Scanner scanner;
	while (scanner.lex())
		cout << "[Identifier: " << scanner.matched() << "]";
}

Each identifier on the input stream is replaced by itself and some surrounding text. By default, flexc++ echoes all characters it cannot match to cout. If you do not want this, simply use the following pattern:

%%
[_a-zA-Z][_a-zA-Z0-9]*		return 1;
.|\n						// ignore

The second pattern causes the scanner to ignore every character on the input stream that is not part of an identifier. The first pattern still matches all identifiers, even those consisting of only one letter: when both patterns match a single character, the rule listed first is used. The second pattern has no associated action, so lex does nothing when it matches and simply continues scanning the stream for more characters.
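The effect of these two rules can be imitated in plain C++. The sketch below is not code generated by flexc++: it uses std::regex and a hypothetical helper, identifiers(), merely to show which parts of the input are picked up by the first pattern and which parts fall through to the `.|\n' rule.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of flexc++): returns the identifiers found
// in `text'. Characters that are not part of an identifier are skipped,
// which is what the action-less `.|\n' rule does in the lexer above.
std::vector<std::string> identifiers(std::string const &text)
{
    // The same regular expression as used in the first rule of the lexer.
    static std::regex const identifier("[_a-zA-Z][_a-zA-Z0-9]*");

    std::vector<std::string> result;

    for
    (
        std::sregex_iterator it(text.begin(), text.end(), identifier), end;
            it != end;
                ++it
    )
        result.push_back(it->str());

    return result;
}
```

Called with the text `x = foo_1 + 42;' this function returns x and foo_1: the digits, operators, blanks and the semicolon are silently dropped.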

It is also possible to let the generated lexer do all the work. The simple lexer below shows all encountered identifiers.

%%
[_a-zA-Z][_a-zA-Z0-9]*      {
            std::cout << "[Identifier: " << matched() << "]\n";
        }
.|\n                        // ignore

Note how a compound statement may be used instead of a one line statement at the end of the line. The opening brace must appear on the same line as the pattern, however. Also note that inside an action, we can use Scanner's members. E.g., matched() returns the text of the token that was last matched. The following main function can be used to activate the generated scanner.

#include "Scanner.h"

int main()
{
	Scanner scanner;
	scanner.lex();
}

Note how simple this function is. Scanner::lex() does not return until the entire input stream has been processed, because none of the patterns has an associated action using a return statement.

1.2.2: An interactive scanner supporting command-line editing

The flexc++(1) manual page contains an example of an interactive scanner. Let's add command-line editing and command-line history to that scanner.

Command-line editing and history are provided by the GNU readline library. The bobcat library offers a class FBB::ReadLineStream encapsulating the facilities of GNU's readline library. This class is used by the following example to implement the required features.

The lexical scanner is a simple one. It recognizes C++ identifiers and \n characters, and ignores all other characters. Here is its specification:


%class-name Scanner
%interactive

%%

[[:alpha:]_][[:alnum:]_]*   return 1;
\n                          return '\n';
.
    
Create the lexical scanner from this specification file:

    flexc++ lexer
        

Assuming that the directory containing the specification file also contains the file main.cc (whose implementation is shown below), execute the following command to create the interactive scanner program:


    g++ *.cc -lbobcat
        
This completes the construction of the interactive scanner. Here is the file main.cc:

#include <iostream>
#include <bobcat/readlinestream>

#include "Scanner.h"

using namespace std;
using namespace FBB;

int main()
{
    ReadLineStream rls("? ");       // create the ReadLineStream, using "? "
                                    // as a prompt before each line
                                    
    Scanner scanner(rls);           // pass `rls' to the interactive scanner

                                    // process all the line's tokens
                                    // (the prompt is provided by `rls')
    while (int token = scanner.lex())
    {                                   
        if (token == '\n')          // end of line: new prompt
            continue;
                                    // process other tokens
        cout << scanner.matched() << '\n';
        if (scanner.matched()[0] == 'q')
            return 0;
    }
}
    
An interactive session with the above program might look like this (the end-of-line comments are not entered, but were added by us for documentary purposes):
   
    $ a.out
    ? hello world               // enter some words
    hello 
    world                       // echoed after pressing Enter
    ? hello world               // this is shown after pressing up-arrow
    ? hello world^H^H^Hman      // do some editing and press Enter
    hello                       // the tokens as edited are returned 
    woman
    ? q                         // end the program
    $
        
The interactive scanner only supports one constructor, which by default reads from std::cin and by default writes to std::cout:

    explicit Scanner(std::istream &in = std::cin,
                     std::ostream &out = std::cout);
        
Interactive scanners only support switching output streams (through switchOstream members).
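Because the constructor accepts any std::istream, the token loop of main.cc can also be tried without a terminal. The sketch below is a hand-written stand-in and not the class generated by flexc++: MiniScanner and tokensOf are hypothetical names, the output stream parameter is omitted because none of the rules writes anything, and only the three rules of the specification shown earlier are imitated (identifier: return 1, newline: return '\n', any other character: ignored).

```cpp
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the generated Scanner class (not flexc++ output).
class MiniScanner
{
    std::istream &d_in;
    std::string d_matched;

    public:
        explicit MiniScanner(std::istream &in = std::cin)
        :
            d_in(in)
        {}

        std::string const &matched() const
        {
            return d_matched;
        }

        // Returns 1 for an identifier, '\n' for a newline, 0 at end of input.
        int lex()
        {
            d_matched.clear();

            char ch;
            while (d_in.get(ch))
            {
                if (ch == '\n')
                {
                    d_matched = ch;
                    return '\n';
                }

                if (ch != '_' && !std::isalpha(static_cast<unsigned char>(ch)))
                    continue;               // imitates the `.' rule: ignore

                d_matched = ch;             // first character of an identifier
                while (isIdChar(d_in.peek()))
                    d_matched += static_cast<char>(d_in.get());
                return 1;
            }
            return 0;
        }

    private:
        static bool isIdChar(int ch)        // ch: character value or EOF
        {
            return ch == '_' || std::isalnum(ch);
        }
};

// Runs the loop of main.cc over `text', collecting the identifiers instead
// of printing them; stops after an identifier starting with 'q'.
std::vector<std::string> tokensOf(std::string const &text)
{
    std::istringstream in(text);
    MiniScanner scanner(in);

    std::vector<std::string> tokens;
    while (int token = scanner.lex())
    {
        if (token == '\n')                  // end of line: nothing to store
            continue;

        tokens.push_back(scanner.matched());
        if (scanner.matched()[0] == 'q')
            break;
    }
    return tokens;
}
```

Replacing the std::istringstream by a ReadLineStream, and MiniScanner by the generated Scanner, gives back the structure of main.cc shown earlier.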