LEX(1)                               SysV                               LEX(1)



NAME
     lex - generate programs for simple lexical tasks

SYNOPSIS
     lex [ -rctvn ] [ file ] ...

DESCRIPTION
     lex generates programs to be used in simple lexical analysis of text.

     The input files (standard input default) contain strings and expressions
     to be searched for, and C text to be executed when strings are found.

     The file lex.yy.c is generated.  When compiled and linked with the lex
     library, this file copies the input to the output except when a string
     specified in the file is found; then the corresponding program text is
     executed.  The actual string matched is left in yytext, an external
     character array.  Matching is done in order of the strings in the file.

     Strings can contain square brackets to indicate character classes, as in
     [abx-z] to indicate a, b, x, y, and z.  The operators *, +, and ? mean,
     respectively, any non-negative number of, any positive number of, and
     either zero or one occurrence of the previous character or character
     class.  Thus [a-zA-Z]+ matches a string of letters.  The character . is
     the class of all ASCII characters except new-line.  Parentheses for
     grouping and vertical bar for alternation are also supported.  The
     notation r{d,e} in a rule indicates between d and e instances of regular
     expression r.  It has higher precedence than |, but lower than *, ?, +,
     and concatenation.

     The character ^ at the beginning of an expression permits a successful
     match only immediately after a new-line, and the character $ at the end
     of an expression requires a trailing new-line.  The character / in an
     expression indicates trailing context; only the part of the expression up
     to the slash is returned in yytext, but the remainder of the expression
     must follow in the input stream.  An operator character can be used as an
     ordinary symbol if it is enclosed in double quotes (") or preceded by a
     backslash (\).
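
     The repetition operators can be read as bounds on how many members of a
     character class are consumed.  The following is a minimal sketch in
     plain C, not code that lex generates; the helper names in_class,
     match_repeat, match_star, match_plus, match_optional, and repeat_demo
     are hypothetical, and the class [abx-z] is spelled out as the string
     "abxyz".

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical helper: nonzero if c is one of the characters in set.
   The class [abx-z] is written out here as "abxyz". */
static int in_class(char c, const char *set)
{
    return c != '\0' && strchr(set, c) != NULL;
}

/* Hypothetical helper: match between lo and hi characters of the class at
   the start of s, taking as many as possible.  Returns the length matched,
   or -1 if fewer than lo are present. */
static int match_repeat(const char *s, const char *set, int lo, int hi)
{
    int n = 0;

    while (n < hi && in_class(s[n], set))
        n++;
    return n >= lo ? n : -1;
}

/* The three operators expressed as repetition bounds. */
static int match_star(const char *s, const char *set)      /* zero or more */
{
    return match_repeat(s, set, 0, INT_MAX);
}

static int match_plus(const char *s, const char *set)      /* one or more */
{
    return match_repeat(s, set, 1, INT_MAX);
}

static int match_optional(const char *s, const char *set)  /* zero or one */
{
    return match_repeat(s, set, 0, 1);
}

/* Usage: the expected lengths follow from the definitions above. */
static void repeat_demo(void)
{
    assert(match_plus("xyzq", "abxyz") == 3);
    assert(match_plus("q", "abxyz") == -1);
    assert(match_star("q", "abxyz") == 0);
    assert(match_optional("aab", "abxyz") == 1);
    assert(match_repeat("aaaa", "a", 2, 3) == 3);
}
```

     The sketch covers only a single character class; a real lex scanner
     compiles all rules into one finite state machine rather than testing
     each pattern in turn.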

     Three subroutines, defined as macros, are expected:  input() to read a
     character; unput(c) to push back a character that was read; and
     output(c) to write an output character.  By default input and output
     read from and write to the files yyin and yyout, which default to stdin
     and stdout, respectively; you can override the macros.  The generated
     function is named yylex(), and the library contains a main() that calls
     it.  The action REJECT on the right side of a rule causes the current
     match to be rejected and the next suitable match executed; the function
     yymore() accumulates additional characters into the same yytext; and the
     function yyless(p) pushes back the portion of the matched string
     beginning at p, which should be between yytext and yytext+yyleng.
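
     The relationship between reading, pushing back, and yyless(p) can be
     sketched with a small pushback stack.  This is an illustration under
     stated assumptions, not the macros that lex emits:  the names sk_open,
     sk_input, sk_unput, sk_less, and sk_demo are hypothetical, the input is
     a string in memory rather than yyin, and the pushback limit of 64 is
     arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_PUSHBACK_MAX 64                  /* arbitrary limit for the sketch */

static const char *sk_src = "";             /* stands in for yyin */
static char sk_pushback[SK_PUSHBACK_MAX];   /* pushed-back characters, LIFO */
static size_t sk_npush = 0;

/* Start reading from text and forget any pushed-back characters. */
static void sk_open(const char *text)
{
    sk_src = text;
    sk_npush = 0;
}

/* Read one character, pushed-back characters first; 0 at end of input. */
static int sk_input(void)
{
    if (sk_npush > 0)
        return (unsigned char)sk_pushback[--sk_npush];
    if (*sk_src == '\0')
        return 0;
    return (unsigned char)*sk_src++;
}

/* Push c back so that the next sk_input() returns it. */
static void sk_unput(int c)
{
    if (sk_npush < SK_PUSHBACK_MAX)
        sk_pushback[sk_npush++] = (char)c;
}

/* Like yyless(p): push back the tail of the matched text starting at p,
   last character first so it is reread in its original order, then
   truncate the text at p. */
static void sk_less(char *text, size_t *len, char *p)
{
    char *end = text + *len;

    while (end > p)
        sk_unput((unsigned char)*--end);
    *p = '\0';
    *len = (size_t)(p - text);
}

/* Usage: keep "he" of a matched "hello" and reread "llo" before "!". */
static void sk_demo(void)
{
    char text[] = "hello";
    size_t len = 5;

    sk_open("!");
    sk_less(text, &len, text + 2);
    assert(len == 2 && strcmp(text, "he") == 0);
    assert(sk_input() == 'l');
    assert(sk_input() == 'l');
    assert(sk_input() == 'o');
    assert(sk_input() == '!');
    assert(sk_input() == 0);
}
```

     Pushing the tail back in reverse is what lets a last-in, first-out
     stack return the characters in their original order.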

     Any line beginning with a blank is assumed to contain only C text and is
     copied; if it precedes the double percent characters (%%), it is copied
     into the external definition area of the lex.yy.c file.  All rules should
     follow a %%, as in YACC.  A line preceding %% that begins with a non-blank
     character defines the name on the left to stand for the remainder of the
     line; the name can be referenced later by enclosing it in {}.  Note that
     curly brackets do not imply parentheses; only string substitution is done.
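
     Substitution without implied parentheses behaves much like an
     unparenthesized C preprocessor macro.  The following analogy is a
     sketch in plain C, not lex syntax; the names SUM, SUM_GROUPED, scaled,
     scaled_grouped, and substitution_demo are hypothetical.

```c
#include <assert.h>

/* Illustrative names only; these are C macros, not lex definitions. */
#define SUM          1 + 2      /* replaced as text; no parentheses added */
#define SUM_GROUPED  (1 + 2)    /* grouping written out in the definition */

static int scaled(void)         { return SUM * 3; }          /* 1 + 2 * 3 */
static int scaled_grouped(void) { return SUM_GROUPED * 3; }  /* (1 + 2) * 3 */

/* Usage: the two results differ only because of the parentheses. */
static void substitution_demo(void)
{
    assert(scaled() == 7);
    assert(scaled_grouped() == 9);
}
```

     By the same reasoning, a lex definition that contains an alternation
     presumably needs its own parentheses if it is to be treated as a unit
     where it is referenced.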

     External names generated by lex all begin with the prefix yy or YY.

     Certain table sizes for the resulting finite state machine can be set in
     the definitions section:

          %p n   number of positions is n (default 2500)

          %n n   number of states is n (default 500)

          %e n   number of parse tree nodes is n (default 1000)

          %a n   number of transitions is n (default 2000)

          %k n   number of packed character classes is n (default 1000)

          %o n   size of output array is n (default 3000)

     The use of one or more of the above automatically implies the -v option,
     unless the -n option is used.

OPTIONS
     -r   Indicates RATFOR actions.

     -c   Indicates C actions.  This is the default.

     -t   Writes the lex.yy.c program to standard output instead of to a
          file.

     -v   Provides a one-line summary of statistics.

     -n   Does not print out the -v summary.

     Multiple files are treated as a single file.  If no files are specified,
     standard input is used.

EXAMPLE
             D       [0-9]
             %%
             if      printf("IF statement\n");
             [a-z]+  printf("tag, value %s\n",yytext);
             0{D}+   printf("octal number %s\n",yytext);
             {D}+    printf("decimal number %s\n",yytext);
             "++"    printf("unary op\n");
             "+"     printf("binary op\n");
             "/*"    skipcommnts();
             %%
              /* Discard input through the closing star-slash of a comment. */
              skipcommnts()
              {
                     int c;

                     for (;;)
                     {
                             /* input() returns 0 at end of input */
                             while ((c = input()) != '*' && c != 0)
                                     ;
                             if (c == 0 || (c = input()) == '/')
                                     return;
                             /* not the end; reread c, which may be a '*' */
                             unput(c);
                     }
              }


BUGS
     The -r option is not yet fully operational.

SEE ALSO
     yacc(1).
     Domain/OS Programming Environment Reference.
