regcmp(3) CLIX regcmp(3)
NAME
regcmp, regex - Compiles and executes a regular expression
LIBRARY
Programmer's Workbench (libPW.a)
SYNOPSIS
char *regcmp(
char *string1, *string2, ... , (char *)0 );
char *regex(
char *re, *subject, *ret0, ... );
extern char *__loc1;
PARAMETERS
re A pointer to the compiled regular expression (pattern).
ret0, ...
Pointers to the character arrays that receive the matched values.
string1, string2, ... ,
Each parameter is a pointer to a regular expression. These
arguments will be concatenated and the resulting regular expression
will be compiled. The last argument must be the null pointer (char *)0.
subject
A pointer to the string against which the regular expression is executed.
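The calling convention for string1, string2, ... can be illustrated with a
short sketch. The join_patterns() function below is hypothetical and is not
part of libPW; it only shows how a list terminated by (char *)0 can be walked
and concatenated, which is the step regcmp() is described as performing before
compilation. The terminator is written with its cast because a bare 0 passed
through a variable argument list is not guaranteed to have pointer size.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/*
 * join_patterns() is a hypothetical helper, not a libPW function. It walks a
 * (char *)0-terminated argument list and returns the concatenation of the
 * strings in malloc()ed storage, or NULL if memory cannot be allocated.
 * The caller frees the result.
 */
char *join_patterns(const char *first, ...)
{
    va_list ap;
    size_t len = 0;
    const char *s;
    char *out;

    /* First pass: add up the lengths of all pieces. */
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *))
        len += strlen(s);
    va_end(ap);

    out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    out[0] = '\0';

    /* Second pass: append each piece in argument order. */
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *))
        strcat(out, s);
    va_end(ap);

    return out;
}
```

For instance, join_patterns("[A-Za-z]", "[A-Za-z0-9]", "{0,7}", (char *)0)
yields the single string [A-Za-z][A-Za-z0-9]{0,7}.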
DESCRIPTION
The regcmp() function compiles a regular expression (consisting of the
concatenated arguments) and returns a pointer to the compiled form. The
malloc() function is used to create space for the compiled form. It is
the user's responsibility to free this space when it is no longer needed. A
NULL return from regcmp() indicates an incorrect argument. The regcmp(1)
command is written to generally preclude the need for this function at
execution time.
The regex() function executes a compiled pattern against the subject
string. Additional arguments are passed to receive the matched values. The
regex() function returns NULL on failure or a pointer to the next unmatched
character on success. A global character pointer, __loc1, points to where the
match began. The regcmp() and regex() functions were mostly borrowed from the
editor, ed(1); however, the syntax and semantics have been changed
slightly.
The following is a list of the valid symbols along with their meanings:
[]*.^ These symbols retain the meaning they have in ed(1).
$ Matches the end of the string; \n matches a newline.
- Within brackets, the minus sign means through. For example, [a-z]
is equivalent to [abcd ... xyz]. The - can appear as itself
only if used as the first or last character. For example, the
character class expression []-] matches the characters ] and
-.
+ A regular expression followed by + means one or more times.
For example, [0-9]+ is equivalent to [0-9][0-9]*.
{m}
{m,}
{m,u} Integer values enclosed in {} indicate the number of times the
preceding regular expression is to be applied. The value m is
the minimum number and u is a number, less than 256, which is
the maximum. If only m is present (for example: {m}), it
indicates the exact number of times the regular expression is
to be applied. The value {m,} is analogous to {m,infinity}.
The plus + and the star * operations are equivalent to {1,}
and {0,}, respectively.
( ... )$n The value of the enclosed regular expression is to be
returned. The value is stored in the (n+1)th argument
following the subject argument. At most, ten enclosed regular
expressions are allowed. The regex() function makes its
assignments unconditionally.
( ... ) Parentheses are used for grouping. An operator (for example,
*, +, or {}) can work on a single character or on a regular
expression enclosed in parentheses. For example:
(a*(cb+)*)$0.
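The counting rule for {m,u} can be shown on the simplest case, a single
character class. The sketch below is an illustration only and does not
reproduce the libPW implementation: match_repeat() and is_digit() are
hypothetical names, the repetition is applied greedily with no backtracking,
and the return convention (next unmatched character, or NULL) is borrowed from
regex().

```c
#include <ctype.h>
#include <stddef.h>

/*
 * match_repeat() is a hypothetical illustration, not libPW code. It applies
 * a one-character test at least m and at most u times, as {m,u} does for the
 * preceding expression. Returns a pointer to the next unmatched character,
 * or NULL when fewer than m characters pass the test.
 */
const char *match_repeat(const char *s, int (*test)(int), int m, int u)
{
    int n = 0;

    /* Consume matching characters, but never more than u of them. */
    while (n < u && s[n] != '\0' && test((unsigned char)s[n]))
        n++;

    return (n >= m) ? s + n : NULL;
}

/* Character test standing in for the class [0-9]. */
int is_digit(int c)
{
    return isdigit(c) != 0;
}
```

With these definitions, match_repeat("12345x", is_digit, 2, 4) behaves like
[0-9]{2,4}: it consumes 1234 and returns a pointer to the 5. Calling it with
m equal to u corresponds to the exact form {m}.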
All the symbols defined above are special; therefore, to be matched as
ordinary characters, they must be escaped with a backslash (\).
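When a pattern is built from text that should match as ordinary characters,
the escaping can be done mechanically. The sketch below is one possible
approach: escape_literal() is a hypothetical helper, not part of libPW, and it
assumes that the special characters are exactly the symbols listed above plus
the backslash itself. It is meant for text outside brackets; the rules for -
and ] inside a character class are different.

```c
#include <stddef.h>
#include <string.h>

/*
 * escape_literal() is a hypothetical helper, not a libPW function. It copies
 * src to dst and puts a backslash before every character assumed to be
 * special (the symbols listed above plus the backslash itself).
 * Returns 0 on success, or -1 if dst (dstsize bytes) is too small.
 */
int escape_literal(const char *src, char *dst, size_t dstsize)
{
    static const char special[] = "[]*.^$-+{}()\\";
    size_t n = 0;

    for (; *src != '\0'; src++) {
        if (strchr(special, *src) != NULL) {
            if (n + 1 >= dstsize)       /* keep room for the final NUL */
                return -1;
            dst[n++] = '\\';
        }
        if (n + 1 >= dstsize)
            return -1;
        dst[n++] = *src;
    }
    if (n >= dstsize)
        return -1;
    dst[n] = '\0';
    return 0;
}
```

Given the input a.b(c)+ the helper produces a\.b\(c\)\+. Note that in C source
each backslash of a pattern written as a string literal must itself be
doubled.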
EXAMPLES
1. The following matches a leading newline in the subject string pointed
at by cursor:
char *cursor, *newcursor, *ptr;
newcursor = regex((ptr = regcmp("^\n", (char *)0)),
cursor);
free(ptr);
2. The following matches through the string Testing3 and returns the
address of the character after the last matched character (the 4):
char ret0[9];
char *newcursor, *name;
name = regcmp("([A-Za-z][A-Za-z0-9]{0,7})$0", (char *)0);
newcursor = regex(name, "012Testing345", ret0);
The string Testing3 is copied to the character array ret0.
3. The following applies a precompiled regular expression, defined in file.i
(see regcmp(1)), against string:
#include "file.i"
char *string, *newcursor;
newcursor = regex(name, string);
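For comparison, on a system where libPW is not available, the result of
Example 2 can be approximated with the POSIX regcomp() and regexec()
functions. This is a different interface, not the one documented here, and
find_name() is a hypothetical wrapper. The sketch assumes POSIX extended
syntax, which accepts the same bracket and {m,u} notation but has no $n
suffix, so the matched text is copied out by hand from the first
parenthesized group.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * find_name() is a hypothetical helper built on POSIX <regex.h>, not on
 * libPW. It searches subject for a letter followed by up to seven letters or
 * digits, copies the matched text into out (outsize bytes), and returns a
 * pointer to the next unmatched character. It returns NULL if nothing
 * matches, the pattern does not compile, or out is too small.
 */
const char *find_name(const char *subject, char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m[2];
    const char *next = NULL;

    if (regcomp(&re, "([A-Za-z][A-Za-z0-9]{0,7})", REG_EXTENDED) != 0)
        return NULL;

    if (regexec(&re, subject, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);

        if (len < outsize) {
            memcpy(out, subject + m[1].rm_so, len);
            out[len] = '\0';
            next = subject + m[1].rm_eo;
        }
    }

    regfree(&re);   /* release the compiled pattern */
    return next;
}
```

Called with the subject 012Testing345 and a nine-byte array, find_name()
copies Testing3 into the array and returns a pointer to the 4. The offset
m[0].rm_so plays roughly the role of __loc1, and regfree() takes the place of
calling free() on a regcmp() result.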
CAUTIONS
The user program may run out of memory if regcmp() is called repeatedly
without freeing the compiled expressions that are no longer required.
RETURN VALUES
regcmp() A NULL indicates an incorrect argument was received by
regcmp(); otherwise, a pointer to the compiled form of the
input regular expression is returned.
regex() A NULL indicates failure to match; otherwise, a pointer to the
next unmatched character is returned.
RELATED INFORMATION
Commands: regcmp(1), ed(1)
Functions: malloc(3)
2/94 - Intergraph Corporation 3