LEX(1) SysV LEX(1)
NAME
lex - generate programs for simple lexical tasks
SYNOPSIS
lex [ -rctvn ] [ file ] ...
DESCRIPTION
lex generates programs to be used in simple lexical analysis of text.
The input files (standard input default) contain strings and expressions
to be searched for, and C text to be executed when strings are found.
The file lex.yy.c is generated. When loaded with the library, this file
copies the input to the output except when a string specified in the file
is found; then the corresponding program text is executed. The actual
string matched is left in yytext, an external character array. Matching
is done in order of the strings in the file. Strings can contain square
brackets to indicate character classes, as in [abx-z] to indicate a, b,
x, y, and z. The operators *, +, and ? mean, respectively, any
non-negative number of, any positive number of, and either zero or one
occurrence of the previous character or character class; thus [a-zA-Z]+
matches a string of letters. The character . is the class of all ASCII
characters except new-line. Parentheses for grouping and vertical bar
for alternation are also supported. The notation r{d,e} in a rule
indicates between d and e instances of regular expression r. It has
higher precedence than |, but lower than *, ?, +, and concatenation. The
character ^ at the beginning of an expression permits a successful match
only immediately after a new-line, and the character $ at the end of an
expression requires a trailing new-line. The character / in an
expression indicates trailing context; only the part of the expression up
to the slash is returned in yytext, but the remainder of the expression
must follow in the input stream. An operator character can be used as an
ordinary symbol if it is used within double quotes ("), or preceded by a
backslash (\).
Three subroutines defined as macros are expected: input() to read a
character; unput(c) to replace a character read; and output(c) to place
an output character. They are defined in terms of the standard streams,
but you can override them. The program generated is named yylex(), and
the library contains a main function (main()) that calls it. The action
REJECT on the right side of the rule causes this match to be rejected and
the next suitable match executed; the function yymore() accumulates
additional characters into the same yytext; and the function yyless(p)
pushes back the portion of the string matched beginning at p, which
should be between yytext and yytext+yyleng. The macros input and output
read from the file yyin and write to the file yyout, which default to
stdin and stdout, respectively.
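As an illustration of overriding these macros, the following sketch makes
the scanner read from a string in memory rather than from yyin. The names
str_src, str_pushed, str_npushed, str_set_input, str_input, and str_unput
are hypothetical and are not part of lex. The sketch assumes that input()
is expected to return 0 at end of input, and that the fragment is copied
into lex.yy.c after the default macro definitions (for example, from the
definitions section).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper state; none of these names are defined by lex. */
static const char *str_src = "";   /* text not yet scanned */
static char str_pushed[64];        /* characters handed back by unput */
static size_t str_npushed = 0;     /* how many of them are pending */

/* Point the scanner at a new string and discard any pushed-back input. */
void str_set_input(const char *s)
{
    str_src = s;
    str_npushed = 0;
}

/* Next character, most recently pushed back first; 0 means end of input. */
int str_input(void)
{
    if (str_npushed > 0)
        return (unsigned char)str_pushed[--str_npushed];
    if (*str_src == '\0')
        return 0;
    return (unsigned char)*str_src++;
}

/* Hand one character back so that the next str_input() returns it. */
void str_unput(int c)
{
    assert(str_npushed < sizeof str_pushed);
    str_pushed[str_npushed++] = (char)c;
}

/* Replace the default macros (assumed to be defined earlier in lex.yy.c). */
#undef input
#undef unput
#define input()  str_input()
#define unput(c) str_unput(c)
```

Under these assumptions, calling str_set_input() with the text to be
scanned before calling yylex() would make the generated scanner read that
text instead of standard input.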
Any line beginning with a blank is assumed to contain only C text and is
copied; if it precedes double percent characters (%%) it is copied into
the external definition area of the lex.yy.c file. All rules should
follow a %%, as in YACC. A line preceding %% that begins with a non-blank
character defines the string on the left to stand for the remainder of
the line; the definition can be called out later by surrounding its name
with {}. Note that curly brackets do not imply parentheses; only string
substitution is done.
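The effect of plain string substitution can be sketched in C. The
function below is a hypothetical stand-in for the substitution step, not
the code lex itself uses: it copies a pattern and pastes the definition
text in place of each {name}, adding nothing else.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy pattern to out, replacing each "{name}" with text and nothing
 * more. At most outsize-1 characters are stored, followed by a NUL.
 * expand_definition is a hypothetical name, not a lex routine.
 */
void expand_definition(const char *pattern, const char *name,
                       const char *text, char *out, size_t outsize)
{
    size_t namelen = strlen(name);
    size_t textlen = strlen(text);
    size_t used = 0;

    assert(outsize > 0);
    while (*pattern != '\0' && used + 1 < outsize) {
        if (pattern[0] == '{' &&
            strncmp(pattern + 1, name, namelen) == 0 &&
            pattern[1 + namelen] == '}') {
            /* Paste the definition text verbatim; no parentheses added. */
            size_t room = outsize - 1 - used;
            size_t n = textlen < room ? textlen : room;
            memcpy(out + used, text, n);
            used += n;
            pattern += namelen + 2;
        } else {
            out[used++] = *pattern++;
        }
    }
    out[used] = '\0';
}
```

With a definition of AB as a|b, the pattern {AB}c expands to a|bc, which
matches a or bc rather than ac or bc. Writing the definition as (a|b), or
the rule as ({AB})c, gives the grouped meaning.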
External names generated by lex all begin with the prefix yy or YY.
Certain table sizes for the resulting finite state machine can be set in
the definitions section:
     %p n    number of positions is n (default 2500)
     %n n    number of states is n (500)
     %e n    number of parse tree nodes is n (1000)
     %a n    number of transitions is n (2000)
     %k n    number of packed character classes is n (1000)
     %o n    size of output array is n (3000)
The use of one or more of the above automatically implies the -v option,
unless the -n option is used.
OPTIONS
-r   Indicates RATFOR actions.
-c   Indicates C actions. This is the default.
-t   Causes the lex.yy.c program to be written instead to standard
     output.
-v   Provides a one-line summary of statistics.
-n   Does not print out the -v summary.
Multiple files are treated as a single file. If no files are specified,
standard input is used.
EXAMPLE
D       [0-9]
%%
if      printf("IF statement\n");
[a-z]+  printf("tag, value %s\n",yytext);
0{D}+   printf("octal number %s\n",yytext);
{D}+    printf("decimal number %s\n",yytext);
"++"    printf("unary op\n");
"+"     printf("binary op\n");
"/*"    skipcommnts();
%%
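skipcommnts()
{
    int c;

    for (;;)
    {
        /* skip to the next '*'; input() is taken to return 0 at end of file */
        while ((c = input()) != '*')
            if (c == 0)
                return;
        /* a following '/' ends the comment; otherwise rescan that character */
        if ((c = input()) == '/' || c == 0)
            return;
        unput(c);
    }
}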
BUGS
The -r option is not yet fully operational.
SEE ALSO
yacc(1).
Domain/OS Programming Environment Reference.