PCRE(3)                                                   PCRE(3)



NAME
       PCRE - Perl-compatible regular expressions.

SYNOPSIS OF POSIX API
       #include <pcreposix.h>

       int regcomp(regex_t *preg, const char *pattern,
            int cflags);

       int regexec(regex_t *preg, const char *string,
            size_t nmatch, regmatch_t pmatch[], int eflags);

       size_t regerror(int errcode, const regex_t *preg,
            char *errbuf, size_t errbuf_size);

       void regfree(regex_t *preg);


DESCRIPTION

       This  set  of  functions provides a POSIX-style API to the
       PCRE regular expression package. See the pcreapi  documen-
       tation for a description of the native API, which contains
       additional functionality.

       The functions described here are just wrapper functions
       that ultimately call the PCRE native API. Their prototypes
       are defined in the pcreposix.h header file, and on Unix
       systems the library itself is called libpcreposix.a, so it
       can be accessed by adding -lpcreposix to the command for
       linking an application that uses them. Because the POSIX
       functions call the native ones, it is also necessary to
       add -lpcre.

       I have implemented only those option bits that can be rea-
       sonably mapped to PCRE native options.  In  addition,  the
       options  REG_EXTENDED  and  REG_NOSUB are defined with the
       value zero. They have no effect, but since  programs  that
       are  written  to  the POSIX interface often use them, this
       makes it easier to slot in PCRE as a replacement  library.
       Other POSIX options are not even defined.

       When  PCRE  is  called via these functions, it is only the
       API that is POSIX-like in style. The syntax and  semantics
       of  the  regular expressions themselves are still those of
       Perl, subject to the setting of various PCRE  options,  as
       described  below. "POSIX-like in style" means that the API
       approximates to the POSIX  definition;  it  is  not  fully
       POSIX-compatible, and in multi-byte encoding domains it is
       probably even less compatible.

       The header for these functions is supplied as  pcreposix.h
       to  avoid  any potential clash with other POSIX libraries.
       It can, of course, be renamed or aliased as regex.h, which
       is  the  "correct"  name. It provides two structure types,
       regex_t for compiled internal forms,  and  regmatch_t  for
       returning  captured  substrings. It also defines some con-
       stants whose names start with "REG_"; these are  used  for
       setting options and identifying error codes.


COMPILING A PATTERN

       The function regcomp() is called to compile a pattern into
       an internal form. The pattern is a C string terminated  by
       a  binary zero, and is passed in the argument pattern. The
       preg argument is a pointer to a regex_t structure which is
       used  as a base for storing information about the compiled
       expression.

       The argument cflags is either zero,  or  contains  one  or
       more of the bits defined by the following macros:

         REG_ICASE

       The  PCRE_CASELESS  option  is  set when the expression is
       passed for compilation to the native function.

         REG_NEWLINE

       The PCRE_MULTILINE option is set when  the  expression  is
       passed  for  compilation to the native function. Note that
       this does  not  mimic  the  defined  POSIX  behaviour  for
       REG_NEWLINE (see the following section).

       In the absence of these flags, no options are passed to
       the native function. This means that the regex is compiled
       with PCRE default semantics. In particular, the way it
       handles newline characters in the subject string is the
       Perl way, not the POSIX way. Note that setting
       PCRE_MULTILINE has only some of the effects specified for
       REG_NEWLINE. It does not affect the way newlines are
       matched by . (they aren't) or by a negative class such as
       [^a] (they are).

       The  yield  of  regcomp() is zero on success, and non-zero
       otherwise. The preg structure is filled in on success, and
       one  member  of  the structure is public: re_nsub contains
       the number of capturing subpatterns in the regular expres-
       sion.  Various error codes are defined in the header file.
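
       As an illustration of the paragraphs above, the following
       is a minimal sketch of a regcomp() call that reads re_nsub.
       The helper name count_groups and the USE_PCREPOSIX build
       switch are assumptions made for this example; neither is
       part of PCRE. Without the switch the sketch falls back to
       the system's own regex.h, which has the same API shape.

```c
#include <stddef.h>
/* Assumption: USE_PCREPOSIX is a hypothetical build switch, not part of PCRE. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>   /* link with -lpcreposix -lpcre */
#else
#include <regex.h>       /* system POSIX header with the same API shape */
#endif

/* Hypothetical helper: compile `pattern` with `cflags` and report the
   number of capturing subpatterns, or -1 if regcomp() fails. */
long count_groups(const char *pattern, int cflags)
{
    regex_t re;
    long n;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;                /* compilation failed */

    n = (long)re.re_nsub;         /* the one public member */
    regfree(&re);                 /* release the compiled form */
    return n;
}
```

       For example, count_groups("(a)(b(c))", REG_EXTENDED |
       REG_ICASE) yields 3, one for each pair of capturing
       parentheses.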


MATCHING NEWLINE CHARACTERS

       This area is not simple, because POSIX and Perl take  dif-
       ferent views of things.  It is not possible to get PCRE to
       obey POSIX semantics, but then PCRE was never intended  to
       be a POSIX engine. The following table lists the different
       possibilities for matching newline characters in PCRE:

                                 Default   Change with

         . matches newline          no     PCRE_DOTALL
         newline matches [^a]       yes    not changeable
         $ matches \n at end        yes    PCRE_DOLLARENDONLY
         $ matches \n in middle     no     PCRE_MULTILINE
         ^ matches \n in middle     no     PCRE_MULTILINE

       This is the equivalent table for POSIX:

                                 Default   Change with

         . matches newline          yes      REG_NEWLINE
         newline matches [^a]       yes      REG_NEWLINE
         $ matches \n at end        no       REG_NEWLINE
         $ matches \n in middle     no       REG_NEWLINE
         ^ matches \n in middle     no       REG_NEWLINE

       PCRE's behaviour is the same as Perl's, except that  there
       is  no  equivalent for PCRE_DOLLARENDONLY in Perl. In both
       PCRE and Perl, there is no way to stop newline from match-
       ing [^a].

       The default POSIX newline handling can be obtained by set-
       ting PCRE_DOTALL and PCRE_DOLLARENDONLY, but there  is  no
       way  to  make  PCRE  behave exactly as for the REG_NEWLINE
       action.
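
       One way to see which set of rules applies to a given build
       is to probe it. The sketch below runs one small pattern per
       table row, once with default flags and once with
       REG_NEWLINE. The helper names, the probe patterns and the
       USE_PCREPOSIX build switch are assumptions made for this
       example. The values printed depend on the library that is
       linked: with the PCRE wrapper they would be expected to
       follow the first table (with REG_NEWLINE acting as
       PCRE_MULTILINE), and with a native POSIX library the second.

```c
#include <stdio.h>
/* Assumption: USE_PCREPOSIX is a hypothetical build switch, not part of PCRE. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>   /* link with -lpcreposix -lpcre */
#else
#include <regex.h>       /* system POSIX header with the same API shape */
#endif

/* Hypothetical helper: 1 if `pattern` matches `subject`, 0 if it does
   not, -1 if the pattern fails to compile. */
int matches(const char *pattern, const char *subject, int cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Print one line per table row: label, result with default flags,
   result with REG_NEWLINE. */
void report_newline_handling(void)
{
    static const char *const probe[][3] = {
        /* label                      pattern   subject */
        { ". matches newline",        "a.b",    "a\nb" },
        { "newline matches [^a]",     "x[^a]y", "x\ny" },
        { "$ matches \\n at end",     "b$",     "ab\n" },
        { "$ matches \\n in middle",  "a$",     "a\nb" },
        { "^ matches \\n in middle",  "^b",     "a\nb" },
    };
    size_t i;

    for (i = 0; i < sizeof probe / sizeof probe[0]; i++)
        printf("%-26s default=%d REG_NEWLINE=%d\n", probe[i][0],
               matches(probe[i][1], probe[i][2], REG_EXTENDED),
               matches(probe[i][1], probe[i][2],
                       REG_EXTENDED | REG_NEWLINE));
}
```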


MATCHING A PATTERN

       The function regexec() is called to match  a  pre-compiled
       pattern  preg  against a given string, which is terminated
       by a zero byte, subject to the options  in  eflags.  These
       can be:

         REG_NOTBOL

       The  PCRE_NOTBOL option is set when calling the underlying
       PCRE matching function.

         REG_NOTEOL

       The PCRE_NOTEOL option is set when calling the  underlying
       PCRE matching function.

       The  portion  of the string that was matched, and also any
       captured substrings, are returned via the pmatch argument,
       which points to an array of nmatch structures of type reg-
       match_t, containing the members  rm_so  and  rm_eo.  These
       contain  the  offset  to  the first character of each sub-
       string and the offset to the first character after the end
       of  each  substring,  respectively. The 0th element of the
       vector relates to the entire portion of  string  that  was
       matched;  subsequent elements relate to the capturing sub-
       patterns of the regular expression. Unused entries in  the
       array have both structure members set to -1.
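
       The use of rm_so and rm_eo can be sketched as follows. The
       helper name capture, the limit of ten pmatch entries and
       the USE_PCREPOSIX build switch are assumptions made for
       this example, not part of PCRE.

```c
#include <string.h>
/* Assumption: USE_PCREPOSIX is a hypothetical build switch, not part of PCRE. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>   /* link with -lpcreposix -lpcre */
#else
#include <regex.h>       /* system POSIX header with the same API shape */
#endif

/* Hypothetical helper: copy capture group `group` (0 = whole match) of
   the first match of `pattern` in `subject` into `out`. Returns 0 on
   success, otherwise a non-zero code from regcomp() or regexec(). */
int capture(const char *pattern, const char *subject, size_t group,
            char *out, size_t out_size)
{
    enum { MAX_GROUPS = 10 };          /* arbitrary limit for this sketch */
    regex_t re;
    regmatch_t pm[MAX_GROUPS];
    size_t len;
    int rc;

    if (group >= MAX_GROUPS || out_size == 0)
        return REG_NOMATCH;
    if ((rc = regcomp(&re, pattern, REG_EXTENDED)) != 0)
        return rc;

    rc = regexec(&re, subject, MAX_GROUPS, pm, 0);
    if (rc == 0 && pm[group].rm_so < 0)
        rc = REG_NOMATCH;              /* unused entry: offsets are -1 */
    if (rc == 0) {
        /* rm_eo is the offset just past the end of the substring */
        len = (size_t)(pm[group].rm_eo - pm[group].rm_so);
        if (len >= out_size)
            len = out_size - 1;        /* truncate to fit */
        memcpy(out, subject + pm[group].rm_so, len);
        out[len] = '\0';
    }
    regfree(&re);
    return rc;
}
```

       With the pattern "([a-z]+)=([0-9]+)" and the string
       "x width=42;", group 0 is "width=42", group 1 is "width"
       and group 2 is "42".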

       A  successful  match  yields  a zero return; various error
       codes are defined in the header file, of which REG_NOMATCH
       is the "expected" failure code.
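
       A common use of REG_NOTBOL is scanning a string for
       successive matches: after the first match the search
       resumes part-way through the string, and that position
       should not be treated as the start of a line. The sketch
       below counts matches this way. The helper name
       count_matches and the USE_PCREPOSIX build switch are
       assumptions made for this example, and its handling of
       empty matches is deliberately simple.

```c
#include <stddef.h>
/* Assumption: USE_PCREPOSIX is a hypothetical build switch, not part of PCRE. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>   /* link with -lpcreposix -lpcre */
#else
#include <regex.h>       /* system POSIX header with the same API shape */
#endif

/* Hypothetical helper: count successive matches of `pattern` in
   `subject`, or return -1 if the pattern fails to compile. */
int count_matches(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t pm[1];
    const char *p = subject;
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* After the first call, p no longer points at the true start of
       the string, so REG_NOTBOL stops ^ from matching there. A return
       of REG_NOMATCH (or any other non-zero code) ends the scan. */
    while (regexec(&re, p, 1, pm, p == subject ? 0 : REG_NOTBOL) == 0) {
        count++;
        p += pm[0].rm_eo;                  /* continue after this match */
        if (pm[0].rm_eo == pm[0].rm_so) {  /* empty match: step forward */
            if (*p == '\0')
                break;
            p++;
        }
    }
    regfree(&re);
    return count;
}
```

       For example, count_matches("^a", "aaa") returns 1 rather
       than 3, because only the first call is allowed to match ^
       at its starting position.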


ERROR MESSAGES

       The regerror() function maps a non-zero error code from
       either regcomp() or regexec() to a printable  message.  If
       preg  is  not  NULL, the error should have arisen from the
       use of that structure. A message terminated  by  a  binary
       zero  is  placed  in  errbuf.  The  length of the message,
       including the zero, is limited to errbuf_size.  The  yield
       of  the  function is the size of buffer needed to hold the
       whole message.
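
       Because the yield is the size needed for the whole message,
       a caller can detect truncation by comparing it with the
       size of the buffer supplied. A minimal sketch, in which
       the helper name describe_error and the USE_PCREPOSIX build
       switch are assumptions made for this example:

```c
#include <stddef.h>
/* Assumption: USE_PCREPOSIX is a hypothetical build switch, not part of PCRE. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>   /* link with -lpcreposix -lpcre */
#else
#include <regex.h>       /* system POSIX header with the same API shape */
#endif

/* Hypothetical helper: try to compile `pattern`. On success return 0.
   On failure place the message in `buf` (buf_size must be at least 1)
   and return the buffer size needed to hold the whole message. */
size_t describe_error(const char *pattern, char *buf, size_t buf_size)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);

    if (rc == 0) {
        regfree(&re);
        return 0;                  /* nothing to report */
    }
    /* If the value returned exceeds buf_size, the text in buf has
       been cut short but is still zero-terminated. */
    return regerror(rc, &re, buf, buf_size);
}
```

       A caller that needs the full text can allocate a buffer of
       the returned size and repeat the call.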


STORAGE

       Compiling a regular expression causes memory to  be  allo-
       cated and associated with the preg structure. The function
       regfree() frees all such memory, after which preg  may  no
       longer be used as a compiled expression.
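
       In practice this means one regfree() for each successful
       regcomp(), issued after the last regexec() that uses the
       structure. The sketch below compiles once, matches several
       strings, and frees once. The helper name
       count_matching_lines and the USE_PCREPOSIX build switch
       are assumptions made for this example.

```c
#include <stddef.h>
/* Assumption: USE_PCREPOSIX is a hypothetical build switch, not part of PCRE. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>   /* link with -lpcreposix -lpcre */
#else
#include <regex.h>       /* system POSIX header with the same API shape */
#endif

/* Hypothetical helper: count how many of the `n` strings in `lines`
   contain a match for `pattern`. Returns 0 if compilation fails. */
size_t count_matching_lines(const char *pattern,
                            const char *const *lines, size_t n)
{
    regex_t re;
    size_t i, hits = 0;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;                  /* no compiled form to free */

    for (i = 0; i < n; i++)        /* reuse the same compiled form */
        if (regexec(&re, lines[i], 0, NULL, 0) == 0)
            hits++;

    regfree(&re);                  /* re must not be matched against now */
    return hits;
}
```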


AUTHOR

       Philip Hazel 
       University Computing Service,
       Cambridge CB2 3QG, England.

Last updated: 03 February 2003
Copyright (c) 1997-2003 University of Cambridge.



                                                          PCRE(3)