Overview

Frequently when writing server modules one must parse input data at the protocol level. Take a POP3 server for example, it must parse lines starting with:

{"USER", "PASS", "CAPA", "UIDL", "STAT", "RETR", "TOP", "RSET", "DELE", "LIST", "QUIT"}

or SMTP looking for:

{"EHLO", "HELO", "MAIL", "RCPT", "DATA", "QUIT", "NOOP", "HELP", "EXPN", "VRFY", "RSET"}

Often times when one finds themself implementing one of these programs they will write a code block like so:

if(!strcasecmp(buf, "USER")) {
        /* handle USER line */

} else if(!strcasecmp(buf, "PASS")) {
        /* handle PASS line */

} else if(!strcasecmp(buf, "CAPA")) {
        /* handle CAPA line */

} else if(!strcasecmp(buf, "UIDL")) {
        /* handle UIDL line */

} else if(!strcasecmp(buf, "STAT")) {
        /* handle STAT line */

} else if(!strcasecmp(buf, "RETR")) {
        /* handle RETR line */

} else if(!strcasecmp(buf, "TOP")) {
        /* handle TOP line */

} else if(!strcasecmp(buf, "RSET")) {
        /* handle RSET line */

} else if(!strcasecmp(buf, "DELE")) {
        /* handle DELE line */

} else if(!strcasecmp(buf, "LIST")) {
        /* handle LIST line */

} else if(!strcasecmp(buf, "QUIT")) {
        /* handle QUIT line */

} else {
        /* handle parse error */
}

This code functionally is fine but it is only efficient for the case where the input in buf was "USER".

When the input is "QUIT", or a parse error, 10+ strcasecmp()'s must fail before reaching those branches.

The code is also a bit annoying to deal with, and it requires that your read() or recv() have enough data in buf to successfully do the full strcmp().

This requirement generally means you will have a input loop looking for a newline before even attempting to do any parsing in your big if-else block of strcmp()'s. To do this you must be iterating through the bytes you recieve from recv() or read() testing them looking for that newline before even doing the strcmp()'s. It's that or have a minimum bytes count and just compare the length of what you've read against that, only entering the parse loop once enough bytes are there to satisfy all of the strcmp()'s.

One simple optimization to this technique is to replace the big if-else block with a switch() on the first character of the line.

In a simple situation like POP3, this is very effective in making the parsing of commands extremely efficient, with the only collision being RETR and RSET which will require a small if-else-if block of strcasecmp()'s within the 'R' case.

        Heres an example of the above code block changed to use a switch to
        improve efficiency:

                /* begin */

                switch(*buf) {

                case 'U':
                case 'u':
                        if(!strcasecmp(&buf[1], "SER")) {
                                /* handle USER line */

                        } else goto _parse_error;

                case 'P':
                case 'p':
                        if(!strcasecmp(&buf[1], "ASS")) {
                                /* handle PASS line */

                        } else goto _parse_error;

                case 'C':
                case 'c':
                        if(!strcasecmp(&buf[1], "APA")) {
                                /* handle CAPA line */

                        } else goto _parse_error;

                case 'U':
                case 'u':
                        if(!strcasecmp(&buf[1], "UIDL")) {
                                /* handle UIDL line */

                        } else goto _parse_error;

                case 'S':
                case 's':
                        if(!strcasecmp(&buf[1], "STAT")) {
                                /* handle STAT line */

                        } else goto _parse_error;

                case 'R':
                case 'r':
                        if(!strcasecmp(&buf[1], "ETR")) {
                                /* handle RETR line */

                        } else if(!strcasecmp(&buf[1], "SET")) {
                                /* handle RSET line */

                        } else goto _parse_error;

                case 'T':
                case 't':
                        if(!strcasecmp(&buf[1], "OP")) {
                                /* handle TOP line */

                        } else goto _parse_error;

                case 'D':
                case 'd':
                        if(!strcasecmp(&buf[1], "ELE")) {
                                /* handle DELE line */

                        } else goto _parse_error;

                case 'L':
                case 'l':
                        if(!strcasecmp(&buf[1], "IST")) {
                                /* handle LIST line */

                        } else goto _parse_error;

                case 'Q':
                case 'q':
                        if(!strcasecmp(&buf[1], "UIT")) {
                                /* handle QUIT line */

                        } else goto _parse_error;

                default:
                        goto _parse_error;

                }

Now in the worst-case "RSET" you have the switch and strcmp() against "ETR" cost. The rest have the fast switch and call to the single strcmp().

Invalid input from byte 0 invokes zero strcmp()'s and goes straight to the default: in the switch. The rest of the parse errors incur the switch and strcmp() which fails, worst case being a parse error starting with 'R' because "SET" and "ETR" both must be strcmp()'d against before it's deemed a parse error.

Clearly this is an improvement in efficiency, but it still suffers from the requirement of reading enough data before even attempting to do the parsing. This method is even more annoying to deal with for the programmer as well.

Parse trees (or ptrees for short) are a solution I created for this relatively minor problem. The way a parse tree works is you construct a list of "targets" which define unique tree paths as byte-strings and associate them with something more computationally-practical. The something can be an integer key into a small range suitable for lookup or jump table usage, or it could be a simple pointer to data or a target-specific function.

Once you have assembled this list, you supply it to a function which processes the list and constructs a tree representation of the list in memory. Then when input is read you use a walk function against the input data and the constructed tree - regardless what quantity of bytes you have read in, and the function walks the tree using the input bytes as directions.

When the walk function indicates that it is at the end of a leaf in the tree, it has successfully matched one of your targets against the input stream.

When a parse error is encountered the walk function returns an error code.

Here is the POP3 example as a ptree target-list:

server_ptree_target_t   POP3_targets[] = {
        {"USER ", 5, do_user},
        {"PASS ", 5, do_pass},
        {"CAPA\r\n", 6, do_capa},
        {"UIDL\r\n", 6, do_uidl_all},
        {"UIDL ", 5, do_uidl_one},
        {"STAT\r\n", 6, do_stat},
        {"RETR ", 5, do_retr},
        {"TOP ", 4, do_top},
        {"RSET\r\n", 6, do_rset},
        {"DELE ", 5, do_dele},
        {"LIST\r\n", 6, do_list},
        {"QUIT\r\n", 6, do_quit}
};

In this example the do_* assignments are presumed to be functions for handling those aspects of the protocol. Note that I've included the POP3 line terminators as part of the targets. In order to make it RFC compliant I would also include the case permutations for all of the arguments to make it a case-insensitive list.

After you have the list put together, you simply pass it to the function server_ptree_new(). From this function you will get a server_ptree_t * which represents the root of the ptree. This pointer is used as the starting point for walking, and is also what you would supply to server_ptree_free() to deallocate the resources associated with the ptree instance.

Here's the ptree creation using the above target list:

server_ptree_t  *POP3_ptree;

POP3_ptree = server_ptree_new(POP3_targets, (sizeof(POP3_targets) / sizeof(server_ptree_target_t)));

if(POP3_ptree == NULL) {
        goto _failed;
}

You would only do the ptree creation as part of your program startup or initialization. Even if your program needed to do ptree walking in parallel it would only need one instance per target list. The ptree is never written to after creation, only read from, making it safe to share.

Also note that the server_ptree_new() function just constructs a tree of data structures which reference into the strings in your target list. It does not allocate copies of the strings in any way, so your targets must persist for the lifetime of the ptree.

How you integrate server_ptree_walk() into your read/recv loop is going to depend greatly on how your program is designed. The important thing is to make sure you handle the corner cases. The walk function returns the number of bytes it processed when there were no errors. If the number it returns is smaller than the length of input bytes you provided it, the walk has terminated at the end of a leaf node (successful match).

If the number it returned were equal to the length of input bytes you provided, there would be an ambiguity. It may have terminated at the end of a leaf node, but the only way for you to know would be to call the walk function again with a zero length input. If zero is returned then yes you have reached a target. If -1 is returned then more data must be read. This is how early versions of ServerKit were implemented.

In newer versions the walk function only returns a value equal to your input bytes when there is more walking to be done. When the input length is equal to the remaining length of the path, the input length is returned plus 1. So when ptree_walk() returns $>$ input length, you have reached the target with your last call in an aligned fashion. No more calls to ptree_walk() are needed.

It is important to understand this detail. Otherwise you may write a program that works most of the time and you may not realize until it is too late that there is a subtle flaw in your use of the walking.

Other than what's mentioned above, walking is simple. Once you have server_ptree_walk() properly integrated you will not have to tinker with that part of your code. Provided you do it right, you should be able to extend your program by modifying the targets list and writing the target-specific code the target associates with.

Depending on the CPU architecture, the ptree walking has been found to be generally between just 2 If you have a very very simple situation with only a few possible branches using a ptree is probably overkill and may even be slower than even the switchless if-else example above. However, in more common parsing situations where you have many branch possibilities some of which partially collide ptree performance is quite good. A good ptree candidate for example would be a fast /proc/meminfo parser:

MemTotal:       127148 kB
MemFree:          6152 kB
Buffers:          9856 kB
Cached:          50684 kB
SwapCached:          0 kB
Active:          74236 kB
Inactive:        34620 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       127148 kB
LowFree:          6152 kB
SwapTotal:     1001448 kB
SwapFree:      1001448 kB
Dirty:             192 kB
Writeback:           0 kB
Mapped:          59560 kB
Slab:             8032 kB
Committed_AS:   188872 kB
PageTables:        816 kB
VmallocTotal:   909232 kB
VmallocUsed:      2644 kB
VmallocChunk:   906204 kB

There is a small benchmark at the bottom of the server_ptree.c that is disabled via the preprocessor. You can experiment with it by changing the #if 0 and simply compiling server_ptree.c alone into an executable. The benchmark also provides a sample of ptree integration.

2007-12-06