This API reference documents version 2.0.0. It is complete, but very rough.

I have split this into sections that follow the functional separation of the library implementation, basically mirroring the listings. The relevant listings are referenced by name. In many cases there are valuable comments in the listing, and you probably want to check the listing, especially if this document is for a different version of ServerKit than the one you are using.

Note: Since this is currently the primary ServerKit documentation, it has a mix of tutorial and reference content. I will fork off the tutorial parts at some point in the future, making it a pure API reference.


- Modules {server_module.h}

ServerKit modules are shared objects which get dynamically loaded at runtime by the server program. When the server program opens a module using dlopen(), it attempts to find a specific symbol named "ServerModule". All ServerKit modules must have the ServerModule symbol, or they will not load.

The ServerModule symbol resolves to a server_module_t type, which is a data structure defined in server_module.h from libserver:

/* begin */
typedef struct _server_module_t {
	char		*name;
	char		*description;
	int		ver_major, ver_minor, ver_micro;
	int		build_ver_major, build_ver_minor, build_ver_micro;
	char		**authors;
	cfg_opt_t	*configuration_options;
	int		(*construct)(void *);
	int		(*prestart)(cfg_t *);
	void *		(*operator)(void *);
	void		(*report)(FILE *);
} server_module_t;
/* end */

name: name of the module
description: brief description of the module's purpose
ver_*: version of the module
build_ver_*: version of ServerKit used to compile the module
authors: NULL-terminated array of module author strings
configuration_options: libconfuse type which describes the configuration options for the module
construct: function for very early module initialization
prestart: function for post-configuration, pre-userswitch activities, like binding to privileged ports
operator: function passed to pthread_create(), basically the module's main entry point
report: function for reporting statistics; usually this involves sending some verbose statistics to the supplied stdio FILE *

You don't generally interact directly with this structure. A macro is provided to simplify the population of the server_module_t structure. This also abstracts the interaction, so we may alter the structure in future releases as long as the macro usage is unchanged.

Here is the macro:

/* begin */
#define SERVER_MODULE(nam, va, vb, vc, desc) \
server_module_t SERVER_MODULE_ENTRY_SYMBOL = {\
	#nam,\
	desc,\
	va, vb, vc,\
	SERVER_VERSION_MAJOR, SERVER_VERSION_MINOR, SERVER_VERSION_MICRO,\
	nam ## _authors,\
	nam ## _config,\
	nam ## _construct,\
	nam ## _prestart,\
	nam ## _operator,\
	nam ## _report\
};
/* some stuff snipped here for brevity */
/* end */

Example usage of the macro:

	SERVER_MODULE(pop, 0, 0, 1, "High performance POP3 (rfc1939) server designed for virtual hosting environments")

Note that the macro assumes that you have ${nam}_authors, ${nam}_config, ${nam}_construct, ${nam}_prestart, ${nam}_operator, and ${nam}_report all defined properly, with types matching the corresponding server_module_t members.

So, given these requirements, a pretty minimal FOO ServerKit module would be something like:

/* begin */
#include <stdio.h>		/* fprintf(), FILE */
#include <unistd.h>		/* sleep() */
#include <server_module.h>	/* SERVER_MODULE, cfg_opt_t, cfg_t */

static long int sleep_duration;

/* very early initialization; the argument is unused here */
static int FOO_construct(void *unused) {
	fprintf(stderr, "FOO_construct\n");
	return 1;
}

/* runs after configuration parsing, before the user switch */
static int FOO_prestart(cfg_t *configuration) {
	fprintf(stderr, "FOO_prestart\n");
	return 1;
}

/* main entry point, run in its own thread */
static void * FOO_operator(void *foobar) {
	fprintf(stderr, "FOO_operator\n");
	for(;;) sleep(sleep_duration);
	return NULL;
}

/* statistics go to the FILE * supplied by the server */
static void FOO_report(FILE *out) {
	fprintf(out, "FOO_report\n");
}

static cfg_opt_t FOO_config[] = {
	CFG_SIMPLE_INT("sleep_duration", &sleep_duration),
	CFG_END()
};

static char *FOO_authors[] = {"Vito Caputo", NULL};

SERVER_MODULE(FOO,0,0,1,"Example module that does nothing but sleep")
/* end */

Building a module requires linking with libserver if you use any of the libserver API (recommended).
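To make the naming requirement concrete, here is a small self-contained sketch of the stringizing (#nam) and token-pasting (nam ## _suffix) technique the macro relies on. The demo_module_t type, the DEMO_MODULE macro, and demo_module_start() are hypothetical, cut-down stand-ins invented for this illustration; they are not the definitions from server_module.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down stand-in for server_module_t (illustration only). */
typedef struct demo_module {
    const char  *name;
    const char  *description;
    const char **authors;
    int        (*construct)(void *);
} demo_module_t;

/* #nam turns the first argument into a string literal, while
 * nam ## _authors and nam ## _construct paste it onto fixed suffixes.
 * The module source must therefore define both of those symbols
 * before the macro is used. */
#define DEMO_MODULE(nam, desc) \
demo_module_t demo_module_entry = {\
    #nam,\
    desc,\
    nam ## _authors,\
    nam ## _construct\
};

static int FOO_construct(void *unused)
{
    (void)unused;
    return 1;
}

static const char *FOO_authors[] = { "Example Author", NULL };

/* Expands to a demo_module_t named demo_module_entry whose name is "FOO". */
DEMO_MODULE(FOO, "Example module that does nothing")

/* Mimics what a loader might do once it has located the entry symbol. */
int demo_module_start(const demo_module_t *m)
{
    assert(m != NULL && m->name != NULL && m->construct != NULL);
    return m->construct(NULL);
}
```

If FOO_authors or FOO_construct were missing or misspelled, the DEMO_MODULE line would fail to compile with an undeclared identifier error, which is the same failure you should expect from SERVER_MODULE when one of the ${nam}_* symbols is absent.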
If you limit yourself to only the module interface mentioned in this section, all you need is the server_module.h header.

An additional feature of ServerKit is a "bundled module inspector". When building your module, you should set the entry point to "__server_module_main", a symbol defined by the SERVER_MODULE macro. One way to do this is to add the following when invoking the linker "ld":

	-e __server_module_main

This causes the produced shared object to be directly executable. When run, it reports vital information about itself, like version, description, and all configuration options with their default values.


- Database pools {server_db.[ch]}

When the server program starts and successfully parses a c11n file containing database sections, a database connection pool is created for every section found. Modules which need access to the configured databases must look up the database by an identifier string. The identifier is assigned to each database pool in the c11n file as a section title. Database sections may occur many times to create many database pools, but the identifier strings must be kept unique to facilitate lookups.

The following is the database portion of the ServerKit API:

	server_db_pool_t * server_db_pool_lookup(char *id)

Returns a pool when successful, or NULL on failure.

The module must get the identifier somehow. This is usually done by adding a database identifier string to the module's configuration option list. It is NOT recommended to hard-code database identifiers.

Also note that this is not a fast path. The list of configured pools is not indexed or hashed or anything special. It is simply a list, and a lookup consists of walking it and comparing the id to the pool titles. This means you should not call this function frequently. It doesn't make sense to call it frequently anyway, because the configuration cannot change during module operation.
So if your module uses databases, perform your pool lookups in the prestart or at the beginning of the operator, not in a worker or main event loop.

	server_db_t * server_db_pool_take(server_db_pool_t *pool, int wait_if_exhausted)

Returns a database instance / connection when successful, or NULL on failure.

If you specify a value of 1 for wait_if_exhausted, the function blocks when the pool is empty and cannot be expanded (pool_max has been reached), waiting until a connection is available. If you specify 0 for wait_if_exhausted, the function returns NULL immediately when the pool is empty and cannot be expanded. You'll generally want it to wait.

The pool you provide is obtained via the server_db_pool_lookup() function explained above. When you are finished using a connection (generally, you use them for a very short period to perform a snappy little query), you must return it to the pool so it may be used again ASAP.

	int server_db_pool_give(server_db_t *db)

Returns 1 on success, 0 on failure (unlikely, unless you pass NULL for db or make some other programming error).

For every server_db_pool_take() there must be a server_db_pool_give(), otherwise you are leaking instances. This function notifies blocked takers that an instance has become available. It is strongly recommended that you hold onto a database instance only as long as required to finish your queries, and give it back as soon as you can. There is one exception: if you do no IO or CPU bound work between queries, the taking and giving itself can become a point of contention, so in that case you may prefer to keep the instance across those queries.

	server_db_result_t * server_db_query(server_db_t *db, char *query)

Returns a result on success, or NULL on failure.

The db you provide is obtained with server_db_pool_take(). The query is a SQL query (MySQL currently). I've wrapped the MySQL functions with a ServerKit API so we can possibly add support for other SQL implementations in the future, e.g. postgres, or maybe db2.
So if you have programmed using libmysqlclient, this should be very familiar to you.

	server_db_row_t server_db_fetch_row(server_db_result_t *result)

Returns a row from the result. NULL is returned when there was an error or there is no row.

The result you provide is obtained with server_db_query(). You interact with this row type just like with libmysqlclient, so if you are confused you might want to look at the libmysqlclient documentation. See the general example included at the bottom of this section for a model usage.

	int server_db_free_result(server_db_result_t *result)

This function frees a result returned by a successful call to server_db_query(). For every successful query you must call server_db_free_result(), or you are leaking memory.

	int server_db_escape_string(server_db_t *db, char *dest, char *src, int src_len)

Returns the number of bytes written to dest, not including the terminating null byte.

This is a helper function for escaping strings you wish to put in your SQL query, so they don't get parsed into strange things when they contain special characters like quotes. There is a requirement on the size of the destination buffer, and a helper macro is provided in server_db.h for allocating one of appropriate size:

	#define SERVER_DB_ESCAPED_STRING_LEN(len) ((len*2)+1)

If more database support is added, what this macro actually does may change, but the name and usage should stay constant. Note that the macro does not parenthesize its argument, so pass it a simple expression such as a variable or a function call, not something like a+b.

This means that when you allocate a buffer for storing the escaped version of a string, perhaps an email address you have stored in the char * addr, you would do it something like this:

	esc_addr = (char *)malloc(SERVER_DB_ESCAPED_STRING_LEN(strlen(addr)));

or, preferably, if you have the length of the string on hand you can drop the strlen().
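The sizing rule can be tried without a database. The following is a minimal sketch, not ServerKit code: demo_escape_string() is a hypothetical stand-in that only escapes quotes and backslashes (the real server_db_escape_string() may escape more characters and takes a db argument), and DEMO_ESCAPED_STRING_LEN is a local copy of the sizing formula with its argument parenthesized.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same sizing rule as the macro quoted from server_db.h: every source
 * byte may expand to two bytes, plus one for the terminating null. */
#define DEMO_ESCAPED_STRING_LEN(len) (((len) * 2) + 1)

/* Hypothetical stand-in for server_db_escape_string(): prefixes quotes
 * and backslashes with a backslash. Returns the number of bytes written
 * to dest, not counting the terminating null. dest must hold at least
 * DEMO_ESCAPED_STRING_LEN(src_len) bytes. */
int demo_escape_string(char *dest, const char *src, int src_len)
{
    int i, n = 0;

    assert(dest != NULL && src != NULL && src_len >= 0);
    for (i = 0; i < src_len; i++) {
        if (src[i] == '\'' || src[i] == '"' || src[i] == '\\')
            dest[n++] = '\\';
        dest[n++] = src[i];
    }
    dest[n] = '\0';
    return n;
}

/* Allocates a worst-case buffer and escapes addr into it.
 * The caller must free() the result. Returns NULL if malloc() fails. */
char *demo_escape_dup(const char *addr)
{
    size_t len = strlen(addr);
    char *esc = malloc(DEMO_ESCAPED_STRING_LEN(len));

    if (esc != NULL)
        demo_escape_string(esc, addr, (int)len);
    return esc;
}
```

The worst case is a string made entirely of characters that need escaping, which doubles in size; that is where the "times two, plus one" comes from.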
Also, if you are dealing with something standardized, like a protocol, and the protocol defines maximum lengths for things like this, you could probably just allocate the escaped buffer once early in execution and keep reusing it, by simply passing the maximum length to the macro. This generally makes the program faster, since it avoids an allocation per query.

General db sample:

/* begin */
homedir = NULL;
query = "select home from users where name=\"booty\"";
result = server_db_query(db, query);
if(result != NULL) {
	if((row = server_db_fetch_row(result)) && row[0] != NULL) {
		homedir = strdup(row[0]);
	}
	server_db_free_result(result);
}
server_db_pool_give(db);

/* homedir stays NULL when the query failed or returned no row */
if(homedir != NULL) {
	puts(homedir);
	free(homedir);
}
/* simple, yes? */
/* end */


- Thread pools & (delayed) work management {server_thread.[ch]}

ServerKit has been built with multi-threading using POSIX threads from the ground up. You are not required to use the thread pools interface to make use of ServerKit. But since at this time the operator of the module is run in a thread created via pthread_create(), you can't get away from the pthreads dependency. This may change in a future version, if there is sufficient demand.

I will briefly run off on a tangent here before describing the API, to hopefully help you better understand the purpose & goals of the thread pool implementation.

ServerKit provides a thread pool interface somewhat similar to the database pool interface described above (lacking the SQL stuff, of course). It also adds some unique features for dealing with "delayed work" that you can employ in your modules if you like. Unlike the database pools, you don't explicitly give and take threads from the pool; rather, you add work to the thread pool.

Delayed work is something most system administrators and programmers should be aware of and understand pretty well. Delayed work is what those D state processes on your unix/linux box are experiencing.
When it comes to creating large scale applications, there are a number of ways to achieve the concurrency required to provide service to many from a single program. However, the problem of delayed work is often neglected, or dealt with in a way that makes the code delicate, confusing, or just severely limiting.

These are commonly used techniques for achieving concurrency in a server program, in no particular order:

- Single process, usually using non-blocking IO and often a fragile state machine

This technique can produce the most efficient programs, but tends to create programs that are difficult to understand, extend, modify, debug, and write. It also doesn't utilize SMP, unless you can arrange things so you simply run a few instances of the process (see djbdns for example).

Sometimes the task is well-fitted to this model, and there's no reason to write it off completely. The computer is after all just a state machine. However, programmers are humans... and requirements change, so clever code is usually a bad idea. I generally classify these programs as clever code. Very cool when you're first learning how to program, but in the long run the cause of many broken keyboards.

This type of program basically considers everything that isn't immediately ready IO-wise to be delayed work. Usually it will only block on a select(), poll(), or epoll_wait() call, and upon return it has to scan the results for file descriptors that are ready. Depending on how this is implemented, that can be a very inefficient exhaustive search through the set. This is especially true when using select() and the ready FD is near the end of the set; epoll entirely fixes this particular problem, however.

The program then attempts IO on the ready file descriptors, with the system calls returning immediately. This results in high system call counts, especially when work is progressing very slowly. You won't be moving much data for every system call, and system calls are not free.
Programs built this way using select() also suffer from a very low scale limitation, due to how select() is implemented. select() cannot watch file descriptors numbered FD_SETSIZE or higher, which in practice caps the number of open file descriptors at FD_SETSIZE. Remember, sockets take up file descriptors too.

- Multi-process, blocking IO

This technique is simple, and is often referred to as the accept & fork model. It suffers from relatively low scale limitations. The forked processes eat memory, context switching overhead is high, and if the processes need to share information you end up using either expensive mechanisms or clumsy SysV IPC shared memory.

Sometimes this model is a necessary evil, if your execution contexts require the OS-provided memory protection among processes, or need to run as different users, or with different working directories. Make sure you actually care about these things before you commit to this model, because you are severely limiting yourself by doing so.

When this model is done simply, with a connection per process, delayed work quickly eats processes, and most implementations rely on a timeout to exit processes and disconnect clients as their way of dealing with extensively delayed work.

Some examples of programs that are like this are:

	Apache 1.3 (MaxClients is easy to reach, apache stops accepting work)

There are many more, I will add more when I get a chance. TODO

You'll probably be surprised how common it is.

- Multi-threaded, blocking IO

This technique is very similar to the previous one, but instead of accept & fork, it's accept & pthread_create. Or, since memory is naturally shared among all threads, which makes pooling trivial & efficient, the accept & get_thread_from_pool model. The threads do not consume additional memory beyond a unique stack space and some kernel space in linux to track the new task.

Many programs are built this way. You simply implement a worker function for your threads using blocking IO, and the thread either exits or, preferably, is returned to a pool when the work is finished.
There are problems with this simple design though. There are still limits to how many threads you can create (pretty high though, depending on how big your thread stack size is). There is also still context switching overhead, which gets pretty silly when you have many thousands of threads on just a few processors.

Dealing with running out of threads is simple enough. You can insert a queue between the accept & get_thread_from_pool, so when there are no more threads the work gets queued up (which takes some memory) and waits for a thread to become available, which requires one of the occupied threads to finish what it's already doing. This solves one problem, and it's not hard to add. However, with this method there is potentially high latency between work being added to the queue and any work being finished, possibly unacceptably high.

The real problem here is delayed work. If you use the simple design and just have a 'thread per connection', delayed work can easily consume all your threads. And depending on your timeouts, you might find yourself with no threads coming back for more work for a long time (the pop3 rfc wants a timeout of 10 minutes, for example).

Sometimes a program will implement connection counters that limit how many connections can come from a single IP address. This is often used to mitigate the problem of delayed work eating all the threads. The idea is that it's an attack: a client is just connecting a lot and not doing much, eating up threads or processes... and limiting how many times it can connect from that IP address prevents it from using up all the available slots.

It's not the correct solution though. It is very plausible, especially with today's very fast systems, that you are servicing a huge number of clients spread all over the world.
All that needs to happen is for a particular region to have some router problems, and it can easily result in hundreds or thousands (depending on your scale) of threads that are lost waiting until those victims of the outage have timed out.

Some may say to just shorten the timeout, but that also is not a reasonable solution. You still need to provide a good quality of service to your users. Not all users are on fast reliable connections, and it's ridiculous for a service provider with such fast machines to not be able to satisfy the needs of a *SLOW LAGGY USER*. The problem is the software design; a timeout of hours should be possible without breaking things.

These problems plague both multi-threaded and multi-process designs that adhere to a simple 1:1 relationship of work:execution context.

- Hybrid approach - the ServerKit way (But certainly not the only way!)

Another solution, which I think is reasonably elegant, simple to use, and abstracted by ServerKit, is to break the strictly bound 1:1 relationship.

You can still implement the basic 1:1 model by writing thread functions that never detect work as being delayed, never get bored and return work as unfinished, and just continue running exclusively with a work unit until completion. So if you don't like this hybrid model, you are not forced to use it. Supporting it does add some complexity to your modules.

The way this is achieved is simple: it employs a mix of the concepts from the single process state machine model and the simple 1:1 multi-threaded model.

In your worker threads you will likely do some form of IO. Some of the IO may be under your control, like accessing a storage device that is local and relatively deterministic in how latent it will be. With these forms of IO you probably don't have to worry much or do anything in particular. When you perform IO on a network socket, or some other less-deterministic device, you need to be careful.
As I stated before, people generally rely on timeouts to simply disconnect, and those timeouts are quite lengthy, otherwise you're just rude to your latent users.

In ServerKit, use system calls like poll() to time out quickly on IO that can potentially be very slow. You might even want to employ some form of non-blocking IO. When it comes to sockets, there are two options, SO_RCVTIMEO and SO_SNDTIMEO, that are properly supported on Linux (though apparently not all UNIX-like systems support them). These are helpful for efficiently sensing delays without simply placing a poll() in front of your socket IO.

When you detect a delay, perhaps half a second long when you expected progress, your worker thread function can return to ServerKit a status indicating that it has been delayed. You inform ServerKit of the file descriptor that is delayed and how long the timeout should be before ServerKit hands the work back to you (presumably for you to disconnect and clean up, so the timeout would probably be pretty long).

It's quite simple, and it solves the problem of delayed work hogging threads without making YOUR code a big ugly state machine. Libserver implements this feature using epoll, and your thread function is not run directly from pthread_create(). Instead your thread function is wrapped by a function in libserver which handles delayed work on your behalf. You just have to return what things are delayed, where to look for their arrival, and how long to wait for them.

To avoid introducing a lot of context switching, the delayed work timeout is expressed in *seconds*. It does not attempt to implement any high resolution timing in this context. If you want to time out the work in under a second, don't bother with the delayed work management, just wait the sub-second in poll() or use SO_*TIMEO. If this limitation becomes a problem it would not be difficult to change it to milliseconds or something. But increasing the resolution would introduce more overhead in the delayed work timer thread.
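As a rough illustration of sensing a delay with poll() before a read, consider the sketch below. It uses only POSIX calls; the function name demo_read_or_delayed() and the DEMO_IO_* codes are hypothetical names made up for this example, and the step of reporting the delay to ServerKit is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch only. */
enum demo_io_result { DEMO_IO_ERROR = -1, DEMO_IO_DELAYED = -2 };

/* Waits at most timeout_ms for fd to become readable, then reads.
 * Returns the number of bytes read (0 on end of file), DEMO_IO_DELAYED
 * if nothing arrived in time, or DEMO_IO_ERROR on failure. */
ssize_t demo_read_or_delayed(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    int ready;

    assert(buf != NULL);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* A signal may interrupt poll(); for simplicity this retries with
     * the full timeout instead of tracking the time already spent. */
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);

    if (ready < 0)
        return DEMO_IO_ERROR;
    if (ready == 0)
        return DEMO_IO_DELAYED;  /* a worker would report this fd as delayed */

    /* Readable, hung up, or in error: read() reports which it is. */
    return read(fd, buf, len);
}
```

A worker built on something like this would pass a short timeout, such as the half second mentioned above, and turn DEMO_IO_DELAYED into the delayed work status described later in this section.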
In addition to handling delayed work, there is a concept of unfinished work as well. Depending on what your module implements, your worker function may not often block on IO, or maybe never at all. But it may still spend a lot of time 'working' on the work unit it has been given. To prevent queued work from being starved, the API also provides a work status return value that indicates unfinished work.

If some part of your worker is time-consuming even when nothing gets delayed per se, you should go ahead and return from your function with the unfinished status. This causes the wrapper function to examine the work queue. If the work queue is empty, your worker function simply gets called again with the same work unit you just gave back as unfinished. If the work queue is non-empty, the unfinished work you returned gets placed at the back of the queue and the work at the front of the queue gets supplied to the worker function.

The thread never goes back onto the thread pool as available when you return an unfinished status, and it's relatively cheap, so sprinkling these returns around your worker function is not a big deal.

Some prime spots for returning work as unfinished are after completing some subset of the work at hand. For example, let's say the 'work' is actually a pop3 session. Returning unfinished after transmitting a message without delays is not a bad idea. If there is no backlog of queued work, the session continues without any noticeable impact. If there is some backlogged work in the queue, it will get worked on first, adding some latency to the progress of the unfinished work.

It's similar to the single-process state machine, but coarser grained. The problem with the non-blocking IO single-process state machine is that it is too much of a busybody, far too aggressive.
Rather than jumping from work to work a page at a time, we jump from work to work either due to significant delays (should be hundreds of milliseconds) or after significant progress.

An important thing to keep in mind is that work which you have returned as delayed does not go back onto the work queue. The work queue only contains either new work units or unfinished work units. So when you return work as unfinished and it gets placed at the back of the queue, the unit you get back is either a previously unfinished but not delayed unit which has made its way back to the front, or a fresh new unit. That unit may in turn get delayed, at which point it goes to the delayed work manager, not the work queue.

Ok, now for the actual documentation on this part of the API, which is in fact surprisingly small:

	server_thread_pool_t * server_thread_pool_new(
		const char *label,
		int stack_size,
		int min_threads,
		int max_threads,
		server_work_status_t (*func)(server_thread_t *, void *, server_thread_delayed_work_t **),
		int options,
		const char **wchans,
		int n_wchans)

Returns a new thread pool on success, or NULL on failure.

label: a string used when ServerKit prints statistics
stack_size: size in bytes of the per-thread stack to use
min_threads: the minimum number of threads in the pool
max_threads: the maximum number of threads in the pool
func: a function pointer to the worker function
options: Currently there is only one thread pool option: SERVER_THREAD_POOL_QUEUE_ENTRY_SUPPLIED. Use of this option is preferred as it makes the queueing of work more efficient. This option results in the passing of SERVER_QUEUE_ENTRY_SUPPLIED to the internal ServerKit queue used by the thread pool. As a result, when you submit work to the thread pool you must do so in the form of a server_queue_entry_t *, and place your opaque data type, which describes the work to your application, in the foo member of the server_queue_entry_t structure.
wchans: An array of strings to be used as wait channel identifiers for notable procedures which may be delayed in the routines executed under the thread function. May be NULL if wchans are not to be used.
n_wchans: The number of wchan strings in the wchans array.

The function you provide must return a server_work_status_t. Your function is called in response to work being available on the work queue for the thread pool. Its first argument is the server_thread_t object for the thread it is executing on. Its second argument is a void * which is the work unit you are working on, taken from the work queue. The type of this object is known to you, because you added the work to the thread pool by casting something known to you to a void *, and this same pointer is now being given back to you at the worker end.

When you use the SERVER_THREAD_POOL_QUEUE_ENTRY_SUPPLIED option, the second argument should instead be treated as a server_queue_entry_t *, the same one you added to the thread pool. You then find your work-specific type in the foo member of the server_queue_entry_t.

The third argument of the worker function is a pointer to a pointer to a ServerKit type used for describing delayed work, server_thread_delayed_work_t. When your worker function has work delayed, it populates a server_thread_delayed_work_t structure and stores the pointer to this structure at the address provided in the third argument, by dereferencing it. After storing the description of the delayed work where the ServerKit worker function wrapper can find it, the worker must return a value from the server_work_status_t enum indicating delayed work.
Here are the members of the server_thread_delayed_work_t structure you need to set to let ServerKit manage your delayed work for you:

/* begin */
typedef struct _server_thread_delayed_work_t {
	/* file descriptor responsible for the delay */
	int	fd;
	/* how long delay can last in seconds */
	int	idle_time_remaining;
	/* how many times this work has been delayed */
	int	count;
	/* the work that is being delayed */
	void	*work;
	...snip
} server_thread_delayed_work_t;
/* end */

There are other members in the structure, but you don't have to worry about them right now. Working with this structure will probably be abstracted in the near future, either with macros or a function.

Note that the space this structure occupies must persist after the worker has returned the delayed status, so it must be malloc'd or otherwise long-lived. It can't be on the worker's stack.

This structure must be populated and assigned to the dereferenced pointer when you return either of these, which may be OR'd together to indicate both IN and OUT:

	SERVER_WORK_DELAYED_WAIT_IN
	SERVER_WORK_DELAYED_WAIT_OUT

The other possible return values do not involve the delayed work structure.

When you wish to return unfinished work, you return:

	SERVER_WORK_UNFINISHED

When the work is completed, you return:

	SERVER_WORK_FINISHED

Upon completion, it is your responsibility to free the resources allocated for the work unit. Once you return SERVER_WORK_FINISHED, ServerKit will not do anything with the related work unit again.

	int server_thread_pool_add_work(server_thread_pool_t *pool, void *work_unit)

Returns 1 on success, 0 on failure.

The pool you provide is the pool returned by the server_thread_pool_new() function. The work you provide is simply a pointer ServerKit moves around for you; whatever it points at is up to you. Usually it is some structure that persists for the lifetime of a connection or session, including members for tracking the state of the session.
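The worker contract described in this section can be sketched as follows. This is an illustration under stated assumptions, not ServerKit code: every demo_* type and DEMO_* constant is a hypothetical stand-in reduced to the members this document lists, the session structure and its input_ready flag are invented for the example, and the 600 second idle time simply echoes the 10 minute pop3 timeout mentioned earlier.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for the ServerKit types named in this section. The real
 * definitions live in server_thread.h and differ from these. */
typedef struct demo_thread demo_thread_t;   /* opaque, unused here */

typedef enum {
    DEMO_WORK_FINISHED,         /* stands in for SERVER_WORK_FINISHED */
    DEMO_WORK_UNFINISHED,       /* stands in for SERVER_WORK_UNFINISHED */
    DEMO_WORK_DELAYED_WAIT_IN   /* stands in for SERVER_WORK_DELAYED_WAIT_IN */
} demo_work_status_t;

typedef struct {
    int   fd;                   /* file descriptor responsible for the delay */
    int   idle_time_remaining;  /* how long the delay may last, in seconds */
    void *work;                 /* the work that is being delayed */
} demo_delayed_work_t;

/* Hypothetical per-session work unit; it outlives any single worker call. */
typedef struct {
    int fd;
    int input_ready;            /* set by the caller in this sketch */
    int messages_left;
    demo_delayed_work_t delay;  /* stored in the heap-allocated session */
} demo_session_t;

/* Worker: handles one message per call so queued work gets a turn. */
demo_work_status_t demo_worker(demo_thread_t *thread, void *work,
                               demo_delayed_work_t **delayed)
{
    demo_session_t *s = work;

    (void)thread;
    assert(s != NULL && delayed != NULL);

    if (!s->input_ready) {
        /* The description must outlive this call, so it is kept in the
         * session rather than in a local variable. */
        s->delay.fd = s->fd;
        s->delay.idle_time_remaining = 600;
        s->delay.work = s;
        *delayed = &s->delay;
        return DEMO_WORK_DELAYED_WAIT_IN;
    }

    if (--s->messages_left > 0)
        return DEMO_WORK_UNFINISHED;

    free(s);                    /* finished: the worker releases the unit */
    return DEMO_WORK_FINISHED;
}
```

Keeping the delayed work description inside the session is one way to satisfy the "must persist" rule without a separate malloc() per delay. Whether that suits the real structure, with its additional members, is something to check against server_thread.h.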
Note that the work_unit needs to persist until the work is finished and it will never be referenced again.

If you used SERVER_THREAD_POOL_QUEUE_ENTRY_SUPPLIED when creating the thread pool, the void *work_unit may not simply point at any data structure you wish. The API expects you to supply a server_queue_entry_t * in this case, and to place your opaque "whatever you want" pointer in the foo member of the supplied server_queue_entry_t.

	void server_thread_set_wchan(server_thread_t *thread, int wchan)

This is a simple function for worker threads to set a string describing the context of what they are currently doing. It is wise to surround potentially time-consuming operations with calls to this function: set the "wait channel" to a useful string via an integer index into the wchans array provided to server_thread_pool_new(), then unset the wchan by calling the function again with -1 when the section ends.

The thread argument is the thread object that gets passed as the first argument to the worker function.

This function is intended to be used in your worker functions to assist in profiling, debugging, and troubleshooting by both the programmer and the administrator. These wchans get printed by ServerKit when reporting statistics, for every thread in every pool. This function is also the primary reason the thread object is passed to the worker function.

For example, say you have a pop3 module, and upon successful login you have to scan the contents of the maildrop with a function like this:

	maildir_get_cur(maildrop);

You might want to do this in your code:

/* begin */
#define POP_WCHAN_CLEANTMP	0
#define POP_WCHAN_GETCUR	1

/* indexed by the POP_WCHAN_* values above */
const char *pop_wchans[] = {"clean_tmp", "get_cur"};

/* after providing pop_wchans to server_thread_pool_new().. */
server_thread_set_wchan(thread, POP_WCHAN_GETCUR);
maildir_get_cur(maildrop);
server_thread_set_wchan(thread, -1);
/* end */

Then, let's say your pop3 is being really slow, and you told ServerKit to report statistics on the pop3 personality. You might find that all the threads doing work are showing "get_cur" for the wchan. That immediately shows you that it's not slow because the public network is congested or failing; it is probably a filesystem/storage problem causing maildir_get_cur() to block.

This is similar to the wchan feature of ps, which shows you the wait channel of processes & threads at the system level. The problem with relying exclusively on the ps wchan is that a program often enters the same wait channel at the system level in various contexts, and the ps wchan doesn't carry the application-specific context. When you set the wchans yourself in the program, you can surround the same system call in all the places it occurs in the module with unique ServerKit wchans that include much more context.

Note that this is not exactly free, so you want to make sure you don't call it excessively. If there is a tight loop of calls that block, it's probably better to set and unset the wchan outside the loop in the interest of efficiency. The wchan set function acquires a per-thread-pool mutex every time it is called, so overdoing it can create contention and unwanted serializing of worker threads.


- Queues {server_queue.[ch]}

ServerKit provides basic queues that are intended for use among threads. A queue is basically a simple linked list, but with POSIX threads support added in the form of some synchronization primitives. If you need a linked list based queue for use in a single execution context, do not use these. They will add more overhead than is necessary because they perform synchronization. The upside is that if you do have threads that need to queue data between one another, you can use these functions without worrying about synchronization.
When you want to send data through a queue, you use the server_queue_push() function. You provide it the queue you want to use and a void * pointing at the data you wish to queue. You must ensure that the memory occupied by what you are queueing persists; it can't be on the local stack. Generally you will want to use something with reference counting and also some synchronization primitives so it can be placed on multiple queues without making new copies of itself. The queue implementation doesn't touch the object you pass through it. It simply moves the pointer through, and it is up to the receiver to make use of it. If you need to queue something simple like a 32 bit integer (fd?) you can generally cast it to a void * and push it through without having to worry about allocating additional memory. But when your application gets more complex you will need more than the 32 or 64 bits the void * provides (depending on the architecture); you may want to look at the ServerKit heaps if they fit your needs.

On the recipient end of the queue, that is, in any thread that wants to wait for something to come through the queue, you call server_queue_pull(). This function takes 3 arguments: the first is the queue you are interested in, and the second is an integer currently used as a boolean (0 or 1), which tells ServerKit whether you want to wait for something to arrive or return immediately either way. The third argument is a pointer to where you want to store the void * when you receive something. Usually, you will want to set 1 for the boolean. This causes the call to wait in pthread_cond_wait() behind the scenes until something is added to the queue and the pusher calls pthread_cond_signal() on the condition variable within the queue. This approach works pretty well, but note that it moves individual units at a time, which can cause extraneous context switching.
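To make the push/pull handshake concrete, here is a minimal sketch of a mutex plus condition variable queue in plain Pthreads. This is not the libserver source; every sketch_* name is made up for illustration, and the entries come from malloc() rather than a ServerKit heap. It only shows the shape of the wait described above: push signals the condition, pull loops in pthread_cond_wait() while the list is empty.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the libserver implementation. */
typedef struct sketch_entry {
    struct sketch_entry *next;
    void *payload;
} sketch_entry_t;

typedef struct {
    sketch_entry_t *head, *tail;
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
} sketch_queue_t;

static void sketch_queue_init(sketch_queue_t *q)
{
    q->head = q->tail = NULL;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
}

/* Append payload to the back; returns 1 on success, 0 on failure. */
static int sketch_queue_push(sketch_queue_t *q, void *payload)
{
    sketch_entry_t *e = malloc(sizeof(*e));

    if (e == NULL)
        return 0;
    e->next = NULL;
    e->payload = payload;

    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL)
        q->tail->next = e;
    else
        q->head = e;
    q->tail = e;
    pthread_cond_signal(&q->nonempty);  /* wake one waiting consumer */
    pthread_mutex_unlock(&q->lock);
    return 1;
}

/* Remove from the front. With block nonzero, wait until data arrives.
 * Returns 1 and stores the payload in *out, or 0 if empty and !block. */
static int sketch_queue_pull(sketch_queue_t *q, int block, void **out)
{
    sketch_entry_t *e;

    pthread_mutex_lock(&q->lock);
    while (q->head == NULL) {
        if (!block) {
            pthread_mutex_unlock(&q->lock);
            return 0;
        }
        /* the loop re-checks the list, guarding against spurious wakeups */
        pthread_cond_wait(&q->nonempty, &q->lock);
    }
    e = q->head;
    q->head = e->next;
    if (q->head == NULL)
        q->tail = NULL;
    pthread_mutex_unlock(&q->lock);

    *out = e->payload;
    free(e);
    return 1;
}
```

Every pull takes the lock once per entry, which is where the extra context switching on busy queues comes from.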
If you have a queue with high throughput there is another way to accept data from the queue, in blocks rather than individually: the function is server_queue_acquire(). Using this method is slightly more complex because you must iterate over possibly many queue entries rather than just having a single void * stored for you. This also exposes you to the ServerKit heap part of the API, because the queues are built on top of heaps by default. When you use server_queue_acquire() and walk the server_queue_entry_t based linked list, you have to explicitly free any entries you are finished with by calling server_heap_unit_unref(). If you don't do this in your consumer, you will leak from the queue's internal heap and eventually it's going to break, probably in the form of server_queue_push() blocking indefinitely waiting for a heap unit. Delaying the free also hurts performance: you should return the queue entries as quickly as possible and avoid doing anything that can block while you have queue entries in hand, because it will force the queue heap to grow, which is costly. Note I didn't say never to block while you have queue entries; sometimes that is the whole point of putting a queue in front of what you are doing. Just keep in mind to only do what is required before returning queue entries to the queue, and remember to implement timeouts.

There is one last function for working with the queues, server_queue_cycle(). This function is used when you want to trade an object with something on the queue, if there is anything. This is used to implement the unfinished work feature mentioned above in the thread pools section. You call this function with the queue you are interested in, and a pointer to the void * object you would like to give back to the queue.
What it will do is this: if the queue has anything in it, it will take the first entry off, store the payload at the pointer you supplied, and take the void * your supplied pointer pointed at, storing it in the entry as the new payload. The entry is then placed at the back of the queue. If the queue is empty when you call server_queue_cycle(), nothing is done and your supplied void ** is left alone. The server_queue_cycle() function is intended to be a form of "I'm sick of this one, maybe there's something better in there, I can use a break from this." So when there is something else for you to try out you get it, but if there's nothing else you are stuck with it and carry on. Of course, depending on how your logic works and the task at hand, you may wind up with a queue of objects that you just keep cycling, but you should never write code that cycles the queue without making any progress towards rendering the payload obsolete. Be sure to do some real work first, unless you're certain it makes sense.

The push, pull, acquire, and cycle functions are all atomic with regard to the queue. You must not free the queue while calling any of these functions, so your code must take care of that detail. You also must not call any of these functions after a queue has been freed. A free attempt on a non-empty queue is considered a programming error and will fail loudly with an error.

Enough overview of the queue implementation, here are the dry details:

server_queue_t * server_queue_new(const char *label, int max, int options)

Creates a new queue. max is the maximum length of the queue, and options is used to alter the behavior of the queue. Currently two options are supported:

SERVER_QUEUE_UNLIMITED
Causes the max value to be ignored.

SERVER_QUEUE_ENTRY_SUPPLIED
Instructs the queue to NOT create its own internal heap for allocating queue entry space. Using this option requires that the caller pass server_queue_entry_t *'s in the void * payload spaces.
This option is intended for improved queue efficiency when you are working with payloads that will never exist on more than one queue at a time. In these situations you can simply nest the server_queue_entry_t structure within the payload data structure. Assign the payload pointer to the foo member of the server_queue_entry_t, then use the server_queue_entry_t * when interfacing with the queue. The queue will give back the server_queue_entry_t on pull/cycle calls, and will make no attempt to free or otherwise manage the queue entry space. You access the payload by casting the void * to a server_queue_entry_t * and dereferencing the foo member. The label is a name you should place some context in so when ServerKit reports statistics about its primitives you can see which statistics are from what parts of your program, there will likely be many heaps/queues in use at once. This function returns the new queue on success or NULL on failure. int server_queue_free(server_queue_t *queue) Frees the queue, this must be called with an empty queue. That is, you must have pulled or acquired all the entries before trying to free it. Of course it would be possible to have server_queue_free() walk the resident queue entries and unref them, but the assumption is that something needs to be done with the payload resources. Since the queue has no knowledge of payload-specific semantics, it's difficult to support this cleanly, and would probably just result in buggy programs. Returns 1 on success, and 0 on error. Error means either you passed a NULL value as the queue, or you passed a queue that was non-empty. When 0 is returned, the queue is left as-is, possibly keeping it functional and the free can be tried again. It is not recommended that you design your program around this behavior as if it is normal, and ServerKit will complain via stderr every time you cause these errors. After successfully freeing a queue you may not use it again. 
If you happen to have server_queue_entry_t instances acquired before the server_queue_free() call, they are unaffected by the free. Once you have called server_heap_unit_unref() on all acquired queue entries the originating heap will get freed automatically. This happens because when you free the queue, the internal transport heap is also freed with server_heap_free(). If you look at the semantics of server_heap_free() explained in this document, you'll see that server_heap_free() defers the actual heap freeing if there are any outstanding unit references.

int server_queue_push(server_queue_t *queue, void *foo)

This function adds an entry to the back of the queue with the payload specified in foo. If you created the queue with the SERVER_QUEUE_ENTRY_SUPPLIED option, you are expected to supply a server_queue_entry_t * in place of the void * payload, with your payload of interest assigned to the foo member of the server_queue_entry_t. It returns 1 on success, 0 on failure. When not using a SERVER_QUEUE_ENTRY_SUPPLIED queue, this call can block, perhaps indefinitely, if the queue is maxed out and nobody ever consumes anything from it. Without SERVER_QUEUE_ENTRY_SUPPLIED this function has to allocate an entry to place on the linked list; it does this using a ServerKit heap since the entries are of fixed size. Generally, this should work quite well: over time the heap will grow to a high water mark and expensive allocations will cease. The problem is when no threads pull from the queue but you continue to push. The heap will continue to grow, to unlimited size if SERVER_QUEUE_UNLIMITED is set, and eventually exhaust memory and fail if nothing ever consumes from the queue. So you must be careful, especially on the consumer side, and *especially* when using unlimited size.

int server_queue_pull(server_queue_t *queue, int block, void **foo)

This function removes an entry from the front of the queue and stores the payload at the memory pointed at by foo.
If you specify 1 for block, and the queue is empty, the function will wait until something is available, perhaps forever if nothing ever gets pushed onto the queue. When not using a SERVER_QUEUE_ENTRY_SUPPLIED queue, after storing the payload at the memory pointed at by foo, the entry is freed for you using server_heap_unit_unref(), so that it may be reused by a server_queue_push() call ASAP. When using SERVER_QUEUE_ENTRY_SUPPLIED, instead of storing the payload at the void **foo space provided the server_queue_entry_t * you supplied to server_queue_push() is stored, and ServerKit does no freeing on your behalf. You must cast the void * pointer to server_queue_entry_t * and access the payload in the foo member of the server_queue_entry_t. It returns 0 on error or empty queue w/0 specified for block, and 1 on success (success means the memory foo points at now has the payload, should be safe to use it, whatever that means). int server_queue_acquire(server_queue_t *queue, int block, server_queue_entry_t **_head) This is similar to server_queue_pull, but is provided to improve the throughput of the queue. This is not always useful depending on your needs, the general rule is, if you have a queue from which only one thread ever consumes, this is the function to use. When you have many consumers, this function is more difficult to use because it bites off as much as it can from the queue (all of it), what about the other threads? Perhaps I will add a maximum argument where you can limit how much you will consume at a time, if there is interest. 
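Why taking the whole queue at once helps throughput is easier to see in code. The following is only a sketch under assumptions, not libserver's implementation: the batch_* names are hypothetical and the entries are caller-owned. The point is that the consumer pays for one lock/unlock and detaches the entire list in constant time, however many entries are waiting.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names; entries are owned by the caller in this sketch. */
typedef struct batch_entry {
    struct batch_entry *next;
    void *payload;
} batch_entry_t;

typedef struct {
    batch_entry_t *head, *tail;
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
} batch_queue_t;

#define BATCH_QUEUE_INITIALIZER \
    { NULL, NULL, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER }

/* Link an entry onto the back of the queue. */
static void batch_queue_push(batch_queue_t *q, batch_entry_t *e)
{
    e->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL)
        q->tail->next = e;
    else
        q->head = e;
    q->tail = e;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

/* Detach everything currently queued in a single critical section.
 * Returns 1 with the list head in *head_out, or 0 if empty and !block. */
static int batch_queue_acquire(batch_queue_t *q, int block,
                               batch_entry_t **head_out)
{
    pthread_mutex_lock(&q->lock);
    while (q->head == NULL) {
        if (!block) {
            pthread_mutex_unlock(&q->lock);
            return 0;
        }
        pthread_cond_wait(&q->nonempty, &q->lock);
    }
    *head_out = q->head;        /* the caller now owns the whole list */
    q->head = q->tail = NULL;   /* the queue is left empty */
    pthread_mutex_unlock(&q->lock);
    return 1;
}
```

It also shows why this is awkward with several consumers: whichever thread gets there first walks away with everything.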
When you use this instead of server_queue_pull(), the difference is in the third argument: rather than storing the payload in the memory your pointer points at, it stores a pointer to the ServerKit type server_queue_entry_t, which is defined like this in server_queue.h:

/* begin */
typedef struct _server_queue_entry_t {
	struct _server_queue_entry_t *next;
	void *foo;
} server_queue_entry_t;
/* end */

This should look familiar: it's a singly linked list, and the function placed the head of it in the space you provided. When not using a SERVER_QUEUE_ENTRY_SUPPLIED queue, the server_queue_entry_t is allocated using the ServerKit heaps, which provide reference counting for us. While you are walking the linked list you have acquired, you should free the entries so they may get reused sooner rather than later. At the very least, never forget to free them, or you create a memory leak. Here is a sample use of server_queue_acquire():

/* begin */
/* assuming queue is already initialized, and work_on_payload()
 * does something with the object consumed */
server_queue_entry_t *head, *next;

if(server_queue_acquire(queue, 1, &head)) {
	for(; head != NULL; head = next) {
		/* save next first, head may be recycled after the unref */
		next = head->next;
		work_on_payload(head->foo);
		server_heap_unit_unref(head);
	}
}
/* end */

If you were using a SERVER_QUEUE_ENTRY_SUPPLIED queue, you would probably omit the server_heap_unit_unref(head); call. How you self-manage your queue entry memory is beyond the scope of this reference, but if that's what you want to do, use SERVER_QUEUE_ENTRY_SUPPLIED.

int server_queue_cycle(server_queue_t *queue, void **foo)

Returns 0 on error or when the queue is empty; if 0 was returned, nothing was done, foo is left alone, and your object is still intact. Returns 1 on success, in which case what foo points at has been replaced with the new object pointer. You supply the queue you want to work with, and foo points at the pointer to the object you wish to put on the queue.
Like every other call in the server_queue API, if you used a SERVER_QUEUE_ENTRY_SUPPLIED queue, then instead of foo pointing at some opaque object it must point at a server_queue_entry_t, and the foo member of the server_queue_entry_t would point at the opaque payload object. The function will also store the new payload at the supplied location in foo. This means that when it returns 1, what foo pointed at has been overwritten with a new pointer ready for your use. As mentioned in the overview at the beginning of the queue section, this function takes an entry from the front of the queue, gives you the payload, takes the payload you provided, assigns it to the entry it just took from the front, and places the entry at the back of the queue.

- Heaps {server_heap.[ch]}

The heaps in ServerKit are very simple thread-safe fixed-size allocators with built-in reference counting. The purpose of these heaps is to provide a fast heap allocator that keeps the memory around even after being freed (garbage collection / heap shrinking is currently accepted as an option but ignored in operation; shrinking will probably be added soon). When you create a new heap you define the size of the individual units to be allocated on the heap, how many units the heap code should allocate via libc malloc() at a time, the maximum number of units the heap may grow to, and some options tweaking the behavior.

Some common uses for this style of heap are:

Session containers that are created for each new connection and released when the connection is disconnected. It is important to make this as efficient as possible, especially when you need to support very high connection rates.

Message containers in message passing systems. When rates level off, new libc allocations should cease to occur, and the units in the heap will simply be cycled through continuously.

May add more examples later.
An important limitation of these heaps is their fixed unit size. This is one major specialization that makes them efficient: allocations do not involve any complex searches for contiguous memory of a specific size, except when the heap has to grow, which invokes malloc(). In multi-threaded programming it is also important to have heaps local to specific allocation domains. If your application simply uses malloc() and free() everywhere for all memory needs, the process-wide synchronization required in libc to make malloc() and free() thread-safe can serialize all the threads concurrently using these functions, hurting parallelization. An example of this would be an application with many threads that consume messages from various sources via queues. You would probably want to have a heap per queue (if you use a ServerKit queue primitive it comes with a local heap built in). This allows a thread to allocate a queue entry destined for one thread while another thread does the same on a different thread's queue (if SMP is available). If they were all using malloc/free directly the allocations & frees would get serialized. When you have provided local heaps for the local allocation domains, serialization is a worst case that only happens when producers collide on the same consumer; there is potential for parallel execution, and serialization is not forced to be the rule.

A large part of writing good multi-threaded programs is carrying knowledge of locality, domains, and contexts as far down to the copper as is practical. The moment you give up control in your code and hand it off to some global library or system call, it's very likely awareness of your program's needs has disappeared and something is being done sub-optimally due to the interface being very generic. POSIX threads gives us an interface to the kernel to convey these details through, and it's reasonably effective at this.
However, you will probably find that libc and other middleware usually are not so cooperative.

The currently supported options for ServerKit heaps are:

SERVER_HEAP_GROW_EXPONENTIALLY
Grow the heap exponentially. This basically makes the malloc() used for growing double in size every time, starting at the units_per_chunk value you supplied.

SERVER_HEAP_UNLIMITED
Ignore maximum_units, just keep growing. Use this with care or not at all.

SERVER_HEAP_SHRINK
Shrink the heap when it makes sense (this is currently ignored, but if you are using a heap where it would make sense to shrink it when possible, specify it; one day libserver will just start doing it).

These options are specified when you create the new heap. You may OR them together, or just specify 0 if you don't want to set any options. Since we've already started to touch on it, here's the API:

server_heap_t * server_heap_new(const char *label, int unit_size, int units_per_chunk, int maximum_units, server_heap_options_t options)

Returns a heap pointer for use with the rest of the heap API, or NULL on failure. unit_size specifies how large the individual allocations are to be. units_per_chunk specifies how many units to grow by whenever heap growth occurs. This is the starting value if SERVER_HEAP_GROW_EXPONENTIALLY is set. maximum_units is the maximum number of units the heap is allowed to grow to. The label is a simple name you should use to give this instance some context. Most of the ServerKit primitives allow you to name them, so that when ServerKit reports statistics it can give some meaningful identities to the numbers. Most of the time you will probably want to set the exponential growth option; it makes the heap growth more efficient at the cost of some memory, but it really depends on what you need.

void server_heap_free(server_heap_t *heap)

This function frees the heap. Note that this is a tricky one because there might still be references to units on the heap.
In the event that units are still referenced the free is simply queued and you can consider the heap freed; when the final unit is unref'd, it will sense the queued free and do the actual freeing of the remaining memory.

void * server_heap_unit_allocate(server_heap_t *heap, int wait_if_exhausted)

This function returns a unit of memory whose size in bytes is the unit_size specified at heap creation time. The wait_if_exhausted flag specifies whether it should block waiting for a unit to become available when the heap is empty and cannot be grown. Set it to 1 to enable this behavior; if you set it to 0 it will return immediately, with NULL when the heap is exhausted. Generally you will want to set it to 1.

void server_heap_unit_unref(void *_unit)

This function decrements the reference count on the unit. Notice that there is no server_heap_unit_free() function; that is because server_heap_unit_unref() is essentially that function. When you allocate a unit from the heap, its reference count is set to 1. The unit will not be freed until the count gets back to 0, and calling this function is how it gets there. Note that these functions are all thread safe at the moment: a mutex is stuffed away in the unit where you can't see it, next to the reference count, so ref and unref can be called freely from many threads.

void server_heap_unit_ref(void *_unit)

This function increments the reference count on the unit. You must do this when you add another reference to the unit. For example, let's say you have a unit you allocated from a ServerKit heap with server_heap_unit_allocate(). You have populated this unit with data representing a message, and the message is to be sent to a number of threads via separate ServerKit queues. For every queue you place the unit on, you must increase the reference count. When you are finished placing the message on queues, you must decrement the reference count, because when you allocated it, it was set to 1.
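If the idea of a mutex and reference count "stuffed away in the unit where you can't see it" seems mysterious, here is one way it could be done. This is a sketch under assumptions, not the libserver source: the unit_* names are invented, the memory comes straight from malloc() instead of a recycling heap, and unit_unref() returns the remaining count purely so the behavior can be observed (the real server_heap_unit_unref() returns void).

```c
#include <pthread.h>
#include <stdlib.h>

/* Hidden header placed in front of the memory handed to the caller.
 * The long double member only forces generous alignment so the caller's
 * memory, which starts right after the header, is suitably aligned. */
typedef union {
    struct {
        pthread_mutex_t lock;
        int refs;
    } s;
    long double force_alignment;
} unit_header_t;

/* Allocate a unit; the caller only ever sees the memory past the header. */
static void *unit_allocate(size_t unit_size)
{
    unit_header_t *h = malloc(sizeof(*h) + unit_size);

    if (h == NULL)
        return NULL;
    pthread_mutex_init(&h->s.lock, NULL);
    h->s.refs = 1;  /* the allocation itself counts as the first reference */
    return h + 1;
}

/* Step back from the caller's pointer to find the hidden header. */
static void unit_ref(void *unit)
{
    unit_header_t *h = (unit_header_t *)unit - 1;

    pthread_mutex_lock(&h->s.lock);
    h->s.refs++;
    pthread_mutex_unlock(&h->s.lock);
}

/* Drop one reference; returns how many remain (0 means it was released). */
static int unit_unref(void *unit)
{
    unit_header_t *h = (unit_header_t *)unit - 1;
    int remaining;

    pthread_mutex_lock(&h->s.lock);
    remaining = --h->s.refs;
    pthread_mutex_unlock(&h->s.lock);

    if (remaining == 0) {
        pthread_mutex_destroy(&h->s.lock);
        free(h);  /* a real heap would recycle the unit instead */
    }
    return remaining;
}
```

Only the thread that takes the count to zero releases the unit, which is why every ref must be paired with exactly one unref.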
These details are important to understand; if you don't get them you will have some serious problems with multithreaded programming. Here is some code with explanation:

/* begin */
typedef struct _foo_t {
	/* here's our message structure */
	char msg[128];
	int len;
} foo_t;

server_heap_t *heap;
foo_t *unit;
struct consumer *c;

heap = server_heap_new("messages", sizeof(foo_t), 20, 2000, SERVER_HEAP_GROW_EXPONENTIALLY);
if(heap == NULL) {
	fprintf(stderr, "Failed to create messages heap\n");
	return 0;
}

unit = server_heap_unit_allocate(heap, 1);
/* note at this point our unit would have a ref count of 1 */

/* imaginary function that gets a message from something and
 * populates the foo_t instance with it */
recv_message(unit);

/* now, we distribute the message, let's say we have a bunch
 * of consumers in a consumers linked list called c_list, each
 * one has a ServerKit queue attached to it.
 *
 * So the consumer type looks like:
 * struct consumer {
 *	struct consumer *next, *previous;
 *	server_queue_t *queue;
 * }
 */

/* note that if the c_list is modified by other threads it
 * would have to be protected while we walk it. */
for(c = c_list; c != NULL; c = c->next) {
	server_heap_unit_ref(unit);
	if(!server_queue_push(c->queue, unit)) {
		/* the push failed, give back the reference we just took */
		server_heap_unit_unref(unit);
	}
}

/* it's as easy as that, since we increased the reference
 * count for every queue we pushed it on, there is no risk
 * of the unit getting unref'd down to 0 out from under us by
 * any of the consumers, because we had a reference count of 1
 * ever since the unit was allocated.
 *
 * now that we are finished adding references (queueing), we
 * have to do our unref because we're finished with it too.
 */
server_heap_unit_unref(unit);

/* now the unit is potentially free, so we can't touch it
 * again in this context! I say potentially because whether it
 * really is free'd by now is dependent on the consumers, if
 * they have scheduled to run in the mean time and finished
 * their work, decrementing the ref count...
 * it's possible but
 * it's also possible that they have yet to run and the ref
 * count is positive still, regardless, we don't know this
 * in this context and we must forget the unit existed.
 *
 * We would probably just loop back to the top here and get
 * a new unit, receive a new message etc. Hopefully the next
 * unit is a recycled one, and the heap didn't have to grow,
 * that would be very nice.
 */
/* end */

int server_heap_is_busy(server_heap_t *heap);

Returns 1 if the heap is 'busy', meaning there are still units originating from this heap with a positive reference count. 0 is returned when the heap is not busy. This can be used to synchronize threads that share a heap for communication use. A producer thread may be used to allocate units on the heap, and as a result would be the only source of new allocations. Consumer threads would receive the allocated units and unref each when finished "consuming". By ceasing all new allocations in your producer and then polling the heap periodically, you can wait for all consumption to finish. This is typical of shutdown procedures, but in general it is not appropriate to wait for threads to finish things, so use in practice is expected to be rare. This function is not intended to be used where efficiency is important, otherwise it would be designed to wait rather than poll.

- Logging {server_log.[ch]}

ServerKit supplies a logging interface similar to the syslog() libc interface. In ServerKit the logged messages are placed on a queue and there is a single logging thread which consumes the messages from the queue and either calls the libc syslog() or sends them directly over a UDP socket in a syslog-compatible format. The latter is more efficient, especially when you are in a large scale environment where you log everything remotely anyway; there's no point in adding IPC overhead just so your syslogd can send the message over the UDP socket.
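For a rough idea of what "syslog-compatible" means on the wire, the sketch below formats a message with the traditional BSD syslog priority prefix: the facility and severity OR'd together, printed as a decimal number between angle brackets. The function name format_syslog_line() and the exact "tag: message" layout are my assumptions for illustration; the layout libserver actually emits is not shown in this document.

```c
#include <stdarg.h>
#include <stdio.h>
#include <syslog.h>

/* Hypothetical helper: writes "<PRI>who: message" into buf.
 * Returns the length written, or -1 if it did not fit. */
static int format_syslog_line(char *buf, size_t len, int priority,
                              const char *who, const char *fmt, ...)
{
    va_list ap;
    int used, n;

    /* the syslog.h facility constants are already shifted, so
     * facility | severity is the number that goes in the brackets */
    used = snprintf(buf, len, "<%d>%s: ", priority, who);
    if (used < 0 || (size_t)used >= len)
        return -1;

    va_start(ap, fmt);
    n = vsnprintf(buf + used, len - (size_t)used, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= len - (size_t)used)
        return -1;

    return used + n;
}
```

The resulting buffer is what a logging thread could hand to sendto() on a UDP socket pointed at the remote syslog collector.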
Another reason for adding the UDP option is syslog() in some Linux distributions manipulates the process' signal mask. When you do this with large numbers of threads, it is very expensive (at least with the current kernels...). Some distributions have fixed this problem in libc by adding the MSG_NOSIGNAL to the flags on the send to /dev/log, rather than setting SIGPIPE to ignore upon syslog() entry and restoring it on exit, which is what other distributions' libc do. You are not required to use this logging interface, it is here for your convenience and it's recommended that you use it. int server_log(int priority, char *who, char *fmt, ...) The way you call this is just like syslog() (see syslog(3) man page), with one twist, there is a "who" 2nd argument, which is similar to what you would pass to openlog() when using syslog as ident. The addition of who is to allow you to add additional context to the logged message for identification purposes, due to how ServerKit is designed to operate, there may be many modules loaded for a particular personality. Any logging these modules perform should pass their name as the who argument, so any messages they log show useful origin. E.g. a pop3 module logging a successful login would do something like: server_log(LOG_MAIL | LOG_NOTICE, "pop", "LOGIN user: \"%s\"", session->user); Like I said, this is important because say there were also an SMTP module loaded in the same personality, they are both executing under the same process but with different threads, you need to distinguish the logs somehow. /* tangent */ You may be thinking, "why on earth would I want to combine POP and SMTP under the same ServerKit personality?" but you should probably bite your tongue, can you think of a more efficient way for the pop server to share data with the smtp server? What if you had mostly pop users logging in who don't leave mail on the server at high frequencies, sometimes immediately after accepting mail destined for these users... 
Wouldn't it be better to just defer the delivery to the maildrop until moments after the next expected pop login from that user? If they manage to log in before the timer expires you could avoid ever writing the message to the back-end storage; it just hung around in user space for a while before getting sent over pop and freed. You just cut out a big pile of user <-> kernel space copies and some IOPS on what is usually expensive storage. Obviously you still have to store the message somewhere non-volatile, but that is usually the spool on fast local disk, which differs from the big disk getting beat up with random IO from all the mail activity. We're talking big systems here, not the 100 user shell box. You would probably also have to perform this caching selectively, focusing on smaller messages so you don't just eat all your memory with pdf attachments; most mail is small though, in my experience. So small, in fact, that it usually doesn't even fill the smallest block size of big storage (8K in my experience). Boy would I love to not push those messages to the maildrop if I don't have to. /* end tangent, making mail fast is fun isn't it ;) */

- Reader-writer locks {server_rwlock.[ch]}

When implementing parallel programs, it is very common to have data structures which have a severe imbalance in how they are concurrently accessed (read from vs. written to). Most of the time your shared data structures can be efficiently protected with a simple mutual-exclusion lock, "mutex" for short. Pthreads provides this primitive with the pthread_mutex_*() portion of the API. When your design results in shared data that is often concurrently read, and occasionally written to, simple mutexes will serialize your potentially parallel readers. This is especially true if the critical section is expected to have a long duration.
Keep in mind the scheduler can preempt your thread with the lock held no matter how 'fast' your critical section appears in code; this depends on the scheduler / operating system being used.

A good example of proper reader-writer lock use would be a large shared hash that occasionally has entries added or removed, but often has multiple threads reading from it to search for entries of interest. Before searching the hash the code would acquire a "read lock" on a server_rwlock_t instance protecting the hash; this is done with the function server_rwlock_rlock(). For the duration of the search the rwlock will have an internal reader count set positive. Once the search has been completed the reader must release the "read lock" using server_rwlock_unlock(). Since the search code is only acquiring a "read lock" it has the potential to execute in parallel. The critical section is reduced to only the serialization needed to protect the internal server_rwlock_t counters. The only time this search code cannot potentially run is when the "write lock" is held. This will cause the acquisition of the "read lock" to block until the "write lock" has been released.

Before modifying the hash, the code would acquire a "write lock" on the same server_rwlock_t instance protecting the shared hash. This is done using the server_rwlock_wlock() function. For the duration of the modification (writing) the rwlock will have an internal writer count set positive. Once the modification is completed the writer must release the "write lock" using server_rwlock_unlock(), the same way a reader releases the "read lock". When either the "read lock" or the "write lock" is released, the library decrements the respective internal reader or writer counter. When the counter has been decremented down to zero, a "free lock" condition variable is signaled internally. This will wake blocked acquisitions on the "read lock" or the "write lock", if any exist.
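The counter and condition variable mechanics just described fit in a few lines of Pthreads. The sketch below is a simplified illustration, not the libserver implementation: the sketch_* names are made up, there is no error reporting, no lock conversion, and it broadcasts rather than signals on release so that several waiting readers can all proceed.

```c
#include <pthread.h>

/* Hypothetical stand-in for server_rwlock_t, counters start at zero. */
typedef struct {
    int readers, writers;
    pthread_cond_t free_lock;
    pthread_mutex_t mutex;
} sketch_rwlock_t;

#define SKETCH_RWLOCK_INITIALIZER \
    { 0, 0, PTHREAD_COND_INITIALIZER, PTHREAD_MUTEX_INITIALIZER }

/* Readers only have to wait for writers to go away. */
static void sketch_rlock(sketch_rwlock_t *l)
{
    pthread_mutex_lock(&l->mutex);
    while (l->writers > 0)
        pthread_cond_wait(&l->free_lock, &l->mutex);
    l->readers++;
    pthread_mutex_unlock(&l->mutex);
}

/* Writers need exclusivity: no readers and no other writer. */
static void sketch_wlock(sketch_rwlock_t *l)
{
    pthread_mutex_lock(&l->mutex);
    while (l->readers > 0 || l->writers > 0)
        pthread_cond_wait(&l->free_lock, &l->mutex);
    l->writers++;
    pthread_mutex_unlock(&l->mutex);
}

/* One unlock for both cases; a positive reader count means the caller
 * must be a reader, since a writer never coexists with readers. */
static void sketch_unlock(sketch_rwlock_t *l)
{
    pthread_mutex_lock(&l->mutex);
    if (l->readers > 0)
        l->readers--;
    else
        l->writers--;
    if (l->readers == 0 && l->writers == 0)
        pthread_cond_broadcast(&l->free_lock);
    pthread_mutex_unlock(&l->mutex);
}
```

Notice the mutex is held only while the counters are adjusted, never for the duration of the caller's search or modification.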
A new addition to ServerKit is the ability to convert a held lock to either a "read lock" or a "write lock". In the example given above, where a "read lock" was acquired to search a hash, let's say that after searching the hash you failed to locate the entry. Your routine is required to insert the entry after confirming it doesn't already exist in the hash. This poses a problem: you acquired a "read lock" so that others could search the hash while you searched it in parallel, but now you are in the interesting position of having to modify the hash, which you are not immediately permitted to do. Previously, you would be required to unlock the reader-writer lock, and acquire a "write lock" using server_rwlock_wlock(). This would work just fine, except that between your unlock and wlock there is a window where another thread could acquire the "write lock" and modify the hash. A classic race condition. As a result of this possibility, you would have to do something to ensure your insert is still the result of a true condition (the searched-for entry not being present). There are a few ways you could achieve this, one of which is to simply redo the search after acquiring the "write lock". Another would be to simply acquire a "write lock" on all searches that need to insert an entry when it doesn't exist.

Instead of forcing you to deal with this complex situation, the reader-writer locks have been made more flexible. Two functions, server_rwlock_r2wlock() and server_rwlock_w2rlock(), have been added which atomically convert a held "read lock" to a "write lock" and vice versa. In the above example, instead of unlocking the "read lock" and then acquiring a "write lock" with a race, you simply call server_rwlock_r2wlock() on the reader-writer lock you already hold a "read lock" on. If the function returns 1, you will hold a "write lock" as if you called server_rwlock_wlock(). Whatever data condition was true before, requiring the "write lock", _will_ still be true.
No "write lock" could possibly have been acquired since your search by anyone else. If the function returns 2 however, you will still hold the "write lock", but the different return value indicates that you were unable to atomically move from "read lock" to "write lock" and will have to assume the data changed (see IMPORTANT below).

Making this possible does add some overhead to the reader-writer locks in general, but properly used the increased flexibility is welcome.

Here is a summary of the rules by which reader-writer locks operate. Counters x_readers, writers, and readers all start life with a zero value.

server_rwlock_rlock()
    Blocks on condition variable until: Zero writers
    Increments: Reader count

server_rwlock_wlock()
    Blocks on condition variable until: Zero readers AND writers
    Increments: Writer count

server_rwlock_r2wlock()
    Increments: X_readers count
    Blocks on condition variable until: X_readers equals readers
    Decrements: Readers and x_readers
    Increments: Writers

server_rwlock_w2rlock()
    Decrements: Writers
    Increments: Readers
    Broadcasts condition variable: Always

server_rwlock_unlock()
    Decrements: Readers if positive OR writers
    Signals condition variable if: Decrement resulted in zero value
    Or broadcasts condition variable if: Readers equals x_readers

You must match EVERY server_rwlock_[rw]lock() call with exactly one call to server_rwlock_unlock(), protecting your critical section just like you would using Pthreads mutexes.

Note that GNU libc implements reader-writer locks as part of the Pthreads library. They are conditionally available based on the __USE_UNIX98 define, and you are free to use them instead of the ServerKit reader-writer locks. Note that the ServerKit implementation features held-lock conversions, which UNIX98 rwlocks don't have.
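For comparison, here is a minimal sketch of the older workaround described earlier: drop the "read lock", take the "write lock", and redo the search before inserting. It uses POSIX pthread_rwlock_t, which has no held-lock conversion, so the second search is always required. The entries array and find_or_insert() are hypothetical names chosen for this sketch only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 128

/* Hypothetical shared data: a small unsorted list of C strings. */
static const char *entries[MAX_ENTRIES];
static int n_entries;
static pthread_rwlock_t entries_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Caller must hold entries_lock in either mode. Returns index or -1. */
static int search(const char *key)
{
    for (int i = 0; i < n_entries; i++)
        if (strcmp(entries[i], key) == 0)
            return i;
    return -1;
}

/* Returns 1 if key was inserted, 0 if already present, -1 if the list is full. */
int find_or_insert(const char *key)
{
    int result;
    assert(key != NULL);

    /* Fast path: search in parallel with other readers. */
    pthread_rwlock_rdlock(&entries_lock);
    if (search(key) >= 0) {
        pthread_rwlock_unlock(&entries_lock);
        return 0;
    }

    /* No conversion available: between this unlock and the wrlock below,
     * another thread may take the write lock and insert the same key. */
    pthread_rwlock_unlock(&entries_lock);
    pthread_rwlock_wrlock(&entries_lock);

    /* Redo the search, because the earlier "not found" may be stale. */
    if (search(key) >= 0)
        result = 0;
    else if (n_entries == MAX_ENTRIES)
        result = -1;
    else {
        entries[n_entries++] = key;
        result = 1;
    }

    pthread_rwlock_unlock(&entries_lock);
    return result;
}
```

With the ServerKit locks, the unlock/wrlock pair would become one server_rwlock_r2wlock() call, and the second search would only be needed when that call returns 2.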
And now for the strict reference portion:

#define SERVER_RWLOCK_INITIALIZER \
    {0, 0, PTHREAD_COND_INITIALIZER, PTHREAD_MUTEX_INITIALIZER}

This macro is defined in server_rwlock.h. It is provided to assist in static initialization of server_rwlock_t instances. Use is identical to how you use PTHREAD_MUTEX_INITIALIZER with Pthreads.

int server_rwlock_init(server_rwlock_t *rwlock);

Returns 0 on error, 1 on success.

This function is for dynamic initialization of a server_rwlock_t instance. This is the alternative to using the SERVER_RWLOCK_INITIALIZER macro. You should destroy initialized server_rwlock_t instances using server_rwlock_destroy() when finished.

int server_rwlock_rlock(server_rwlock_t *rwlock);

Returns 0 on error, 1 on success.

This function acquires a "read lock" on the supplied initialized rwlock. Upon successful return there may be other readers but never a writer.

int server_rwlock_wlock(server_rwlock_t *rwlock);

Returns 0 on error, 1 on success.

This function acquires a "write lock" on the supplied initialized rwlock. Upon successful return there are never any other writers or readers; exclusivity is achieved.

int server_rwlock_unlock(server_rwlock_t *rwlock);

Returns 0 on error, 1 on success.

This function unlocks the supplied initialized rwlock. Use of this is unchanged when dealing with either a held "read lock" or "write lock". Internal logic discerns which case is being unlocked, which is possible due to the exclusivity of the "write lock".

int server_rwlock_r2wlock(server_rwlock_t *rwlock);

Returns 0 on error, 1 on success, 2 on success but race lost.

This function atomically converts a held "read lock" to a "write lock". The caller _must_ hold the "read lock" before calling this, otherwise bad things will happen. This function will block until "write lock" status can be acquired (all other readers have gone away), and for the duration of the conversion all other "write lock" attempts will be blocked, making this atomic.
Use this function for elevating your lock level after you have done your reading and determined you must modify. If you must ensure that the state of what is protected cannot be modified between your lengthy reading process and the now needed write process, this function is how to do it.

** IMPORTANT **** IMPORTANT **** IMPORTANT **

There is still an opportunity for modifications to occur between your server_rwlock_rlock() and the successful return of server_rwlock_r2wlock(), however. This is due to the potential for multiple threads holding the "read lock" to enter server_rwlock_r2wlock() simultaneously. One will return without contention for the transcending state, the others will be blocked until the winner gives up the "write lock". This means the others cannot assume the protected data has not changed.

As a result, when you use server_rwlock_r2wlock() you must test the return value to see if it is 1 or 2. When it returns 2 the "race was lost", and though you now have acquired the "write lock", someone else had it somewhere during your transition from "read lock" to "write lock". This means that in the worst-case scenario you must deal with it as if you were using POSIX reader-writer locks and, as in the example above, redo your search. However, when the returned value is 1, you were uncontended and enjoy improved efficiency... which is presumably the common case.

int server_rwlock_w2rlock(server_rwlock_t *rwlock);

Returns 0 on error, 1 on success.

This function atomically converts a held "write lock" to a "read lock". Similar to server_rwlock_r2wlock(), this function must be called with the "write lock" held. Since the "write lock" is already held, this function should not be expected to block for a significant period, unlike server_rwlock_r2wlock().

int server_rwlock_destroy(server_rwlock_t *rwlock);

Returns 0 on error, 1 on success.

This function destroys the supplied initialized rwlock. Do not use this rwlock instance again without re-initializing it.
Deadlock may otherwise occur.

- Sequences {server_sequence.[ch]}

Sequences are something I came up with to make SMTP and POP logs more meaningful. The basic idea is that most services you provide have a session, a lifetime of sorts during which a number of various things happen. The variety of these things is finite and defined by the protocol, and usually the range is actually small. It's just that they can potentially be repeated a lot, so generally simple counts are kept, if anything at all. The small range makes it easy to just pull single ASCII character keys out of thin air to represent particular requests within a protocol.

A lot of programs out there, when they log the end of a session, be it SMTP, POP, or IMAP, will print some counts, like how many times the user issued DELE, RETR, TOP in POP3 for example. This can be useful but it loses too much information. The time dimension is lost completely, there is no preservation of the order of these events, the _sequence_.

The way ServerKit sequences solve this problem is almost comically simple: they efficiently store the sequence of events in the form of a character string, which you would probably log at the end of sessions.

When a session begins, you would create a new sequence with server_sequence_new(). Or if you already have something you allocate when a session begins, like a session structure, just place a server_sequence_t within it, and be sure to call server_sequence_reset() before trying to use it, which will initialize it properly.

Then, for every event / request / command etc. you support, you would pick a character. For the purpose of this example I will use pop3.

USER = u
PASS = p
CAPA = c
LIST = l
RETR = r
STAT = s
TOP = t
UIDL = U
DELE = d
QUIT = q
RSET = R

Given the above assignments, when USER occurred, I would call:

server_sequence_append(&session->sequence, 'u');

PASS:

server_sequence_append(&session->sequence, 'p');

QUIT:

server_sequence_append(&session->sequence, 'q');

etc, it's simple.
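If you prefer to keep the assignments in one place rather than scattering character literals through your command handlers, a small lookup table works. This is only a sketch: struct pop3_key, pop3_keys, and pop3_sequence_key() are hypothetical names, not part of ServerKit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pairing of a POP3 command with its sequence character. */
struct pop3_key {
    const char *command;
    char key;
};

/* The same assignments as listed in the text. */
static const struct pop3_key pop3_keys[] = {
    {"USER", 'u'}, {"PASS", 'p'}, {"CAPA", 'c'}, {"LIST", 'l'},
    {"RETR", 'r'}, {"STAT", 's'}, {"TOP",  't'}, {"UIDL", 'U'},
    {"DELE", 'd'}, {"QUIT", 'q'}, {"RSET", 'R'},
};

/* Returns the sequence character for command, or '\0' if it has none. */
char pop3_sequence_key(const char *command)
{
    assert(command != NULL);
    for (size_t i = 0; i < sizeof(pop3_keys) / sizeof(pop3_keys[0]); i++)
        if (strcmp(pop3_keys[i].command, command) == 0)
            return pop3_keys[i].key;
    return '\0';
}
```

A handler could then pass pop3_sequence_key("USER") to server_sequence_append(), after checking that the result is not '\0'.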
What ServerKit is doing with these characters is assembling a sequence stored in the server_sequence_t. It doesn't just append the letters onto the end of a char * until it runs out of space, though that is basically what it is doing. It also compresses repeated characters by counting the repetitions: any time a character is appended multiple times consecutively, its count at that slot in the sequence accumulates rather than taking another slot. The sequences are limited to 512 slots in the timeline, and every slot can have up to 256 repetitions.

When your session is finished and you wish to log the sequence in string form somewhere, you must print it. You do this simply with the server_sequence_print() function. This function will print into a buffer you provide with a specified length. If the print would have exceeded the length of the buffer you provided, the end of the printed sequence gets replaced with a '$' to indicate it has been truncated. Also, if a slot overflows the repetition count, its character will get followed by a '+' to indicate overflow, meaning > 256 repetitions.

Since these characters are utilized by ServerKit in the sequence implementation, they are not permitted for use by the append function. You will get an error if you try to use '\0', '+', or '$'. This means those three characters are not available for you to assign to commands.

Here are some sample sequence prints using the pop3 example:

"cupsq"
    The most common pop3 sequence: capa user pass stat quit.

"cupsUrdrdrdrdrdrdrdrdrdrdrdrdrdrdrdrdrdrdrdrdrdq"
    Another common sequence pattern, same as above followed with uidl and a series of retr dele retr dele... then quit.

"cupsr2d2q"
    An example demonstrating the repetition counts: retr retr dele dele.

"cupsr+d+q"
    Same as above, but instead of 2 we overflowed the repetition counter in retr and dele.

"cupsr$"
    Illustrating a tiny buffer supplied to the print function. The sequence was truncated and the $ indicates it.
"cupq"
    A bad sign. If this is going on a lot, MySQL replication may be broken and users can't authenticate. Clients don't log in without even checking the stat or uidl, so this is a failed login.

"cupupupupupupupupup"
    Who's fishing for logins?

"cu+"
    Another fishing example. Some pop3 servers say something different when you enter user for a valid vs. invalid user.

Another interesting thing about having the sequence information is that it gives you some very useful insight into usage patterns. After you collect megabytes of these logs you can really see what the most common patterns are, to help direct your optimization efforts. I can tell you "cupsq" is probably the most common pop3 sequence. Knowing this, perhaps the parser should place higher priority on the capa, user, pass, stat, and quit keywords, if your parser is designed such that the patterns are tested in some arbitrary order (ptree use can make this a non-issue).

Another thing this shows is that users usually authenticate, execute STAT, then quit. They are not retrieving the sorted list of messages or uidls, so a good optimization would be to change things so a user could log in and have the output of STAT available without pounding the Maildir, sorting the list of all the messages, and retrieving the uidl list. Defer the Maildir & uidl retrieval until a command which requires that data is run. Since cupsq is the most frequent sequence, it would be a huge gain in mail system efficiency. Just cache the STAT data somewhere that can be quickly accessed, and update it on Maildir changes.

Anyways, here's the boring API details:

server_sequence_t * server_sequence_new(void)

Returns a new sequence instance on success, or NULL on failure.

void server_sequence_reset(server_sequence_t *sequence)

Resets a sequence instance, which may have been created with server_sequence_new() or may just be a server_sequence_t you allocated another way.
void server_sequence_append(server_sequence_t *sequence, char e)

Appends e to the provided sequence. It doesn't return errors. If you use reserved characters it will complain to stderr but not fail, it just drops your broken append.

int server_sequence_print(server_sequence_t *sequence, char *buf, int len)

Prints the string representation of the provided sequence in buf, where len is the length of the buffer buf. Returns the number of bytes printed into buf, including the null terminator. Note you can print a sequence as many times as you want.

void server_sequence_free(server_sequence_t *sequence)

Frees a sequence that has been instantiated with server_sequence_new(). Do not use this for sequences that were created differently and initialized with server_sequence_reset().

Now, for a quick little note regarding the implementation of the sequence support, an attempt to explain why repetitions are 8 bits. ;)

The repetition is limited to 8 bit fields because I didn't want to make printing the sequences any more expensive than necessary. Large scale server programs do a lot of logging, and you don't want to waste cpu cycles here, so it's a compromise, and I think it's pretty well balanced. Knowing that the process of converting an integer to a string is not particularly cheap, I limited it to 8 bits per repetition accumulator and simply created a lookup table of the 256 strings. So, when you are printing the sequence, that whole modulo divide remainder base 10 store string in reverse thing isn't happening. It's a simple string copy from a lookup table indexed by the repetition count.

- Parse trees (ptrees) {server_ptree.[ch]}

Frequently when writing server modules one must parse input data at the protocol level.
Take a POP3 server for example, it must parse lines starting with:

{"USER", "PASS", "CAPA", "UIDL", "STAT", "RETR", "TOP", "RSET", "DELE", "LIST", "QUIT"}

or SMTP looking for:

{"EHLO", "HELO", "MAIL", "RCPT", "DATA", "QUIT", "NOOP", "HELP", "EXPN", "VRFY", "RSET"}

Often when one finds oneself implementing one of these programs, one will write a code block like so:

/* begin */
if(!strcmp(buf, "USER")) {
    /* handle USER line */
} else if(!strcmp(buf, "PASS")) {
    /* handle PASS line */
} else if(!strcmp(buf, "CAPA")) {
    /* handle CAPA line */
} else if(!strcmp(buf, "UIDL")) {
    /* handle UIDL line */
} else if(!strcmp(buf, "STAT")) {
    /* handle STAT line */
} else if(!strcmp(buf, "RETR")) {
    /* handle RETR line */
} else if(!strcmp(buf, "TOP")) {
    /* handle TOP line */
} else if(!strcmp(buf, "RSET")) {
    /* handle RSET line */
} else if(!strcmp(buf, "DELE")) {
    /* handle DELE line */
} else if(!strcmp(buf, "LIST")) {
    /* handle LIST line */
} else if(!strcmp(buf, "QUIT")) {
    /* handle QUIT line */
} else {
    /* handle parse error */
}
/* end */

This code is functionally fine (except for pop3 being case insensitive), but it is only very efficient for the case where the input in buf was "USER". When the input is "QUIT", or a parse error, 10+ strcmp()'s must fail before reaching those branches. The code is also a bit annoying to deal with, and it requires that your read() or recv() have enough data in buf to successfully do the full strcmp(). This requirement generally means you will have an input loop looking for a newline before even attempting to do any parsing in your big if-else block of strcmp()'s. To do this you must iterate through the bytes you receive from recv() or read(), testing them for that newline, before even doing the strcmp()'s. It's that or have a minimum byte count and just compare the length of what you've read against that, only entering the parse loop once enough bytes are there to satisfy all of the strcmp()'s.
One simple optimization to this technique is to replace the big if-else block with a switch() on the first character of the line. In a simple situation like POP3, this is very effective in making the parsing of commands extremely efficient. The only collisions are RETR with RSET and USER with UIDL, each of which requires a small if-else-if block of strcmp()'s within its case. Here's an example of the above code block changed to use a switch to improve efficiency:

/* begin */
switch(*buf) {
    case 'U':
        if(!strcmp(&buf[1], "SER")) {
            /* handle USER line */
        } else if(!strcmp(&buf[1], "IDL")) {
            /* handle UIDL line */
        } else goto _parse_error;
        break;
    case 'P':
        if(!strcmp(&buf[1], "ASS")) {
            /* handle PASS line */
        } else goto _parse_error;
        break;
    case 'C':
        if(!strcmp(&buf[1], "APA")) {
            /* handle CAPA line */
        } else goto _parse_error;
        break;
    case 'S':
        if(!strcmp(&buf[1], "TAT")) {
            /* handle STAT line */
        } else goto _parse_error;
        break;
    case 'R':
        if(!strcmp(&buf[1], "ETR")) {
            /* handle RETR line */
        } else if(!strcmp(&buf[1], "SET")) {
            /* handle RSET line */
        } else goto _parse_error;
        break;
    case 'T':
        if(!strcmp(&buf[1], "OP")) {
            /* handle TOP line */
        } else goto _parse_error;
        break;
    case 'D':
        if(!strcmp(&buf[1], "ELE")) {
            /* handle DELE line */
        } else goto _parse_error;
        break;
    case 'L':
        if(!strcmp(&buf[1], "IST")) {
            /* handle LIST line */
        } else goto _parse_error;
        break;
    case 'Q':
        if(!strcmp(&buf[1], "UIT")) {
            /* handle QUIT line */
        } else goto _parse_error;
        break;
    default:
        goto _parse_error;
}
/* end */

Now in the worst successful cases, "RSET" and "UIDL", you pay for the switch plus one failed strcmp() (against "ETR" or "SER") before the one that matches. The rest have the fast switch and a call to a single strcmp(). Invalid input from byte 0 invokes zero strcmp()'s and goes straight to the default: in the switch. The rest of the parse errors incur the switch and a strcmp() which fails, the worst case being a parse error starting with 'R' or 'U', because both candidates in that case must be strcmp()'d against before it's deemed a parse error.
Clearly this is an improvement in efficiency, but it still suffers from the requirement of reading enough data before even attempting to do the parsing. This method is also even more annoying for the programmer to deal with.

Parse trees (or ptrees for short) are a solution I created for this relatively minor problem. The way a parse tree works is you construct a list of "targets" which define unique tree paths as strings and associate them with something more useful. The something can be an integer key into a small range suitable for lookup or jump table usage, or it could be a simple pointer to data or a target-specific function. Once you have assembled this list, you supply it to a function which processes the list and constructs a tree representation of the list in memory. Then when input is read you use a walk function against the input data and the constructed tree, regardless of what quantity of bytes you have read in, and the function walks the tree using the input bytes as directions. When the walk function indicates that it is at the end of a leaf in the tree, it has successfully matched one of your targets against the input stream. When a parse error is encountered the walk function returns an error code.

Here is the POP3 example as a ptree target-list:

/* begin */
server_ptree_target_t pop3_targets[] = {
    {"USER ", 5, do_user},
    {"PASS ", 5, do_pass},
    {"CAPA\r\n", 6, do_capa},
    {"UIDL\r\n", 6, do_uidl_all},
    {"UIDL ", 5, do_uidl_one},
    {"STAT\r\n", 6, do_stat},
    {"RETR ", 5, do_retr},
    {"TOP ", 4, do_top},
    {"RSET\r\n", 6, do_rset},
    {"DELE ", 5, do_dele},
    {"LIST\r\n", 6, do_list},
    {"QUIT\r\n", 6, do_quit}
};
/* end */

In this example the do_* assignments are presumed to be functions for handling those aspects of the protocol. Note that I've included the POP3 line terminators as part of the targets. In order to make it RFC compliant I would also include the case permutations for all of the command names to make it a case-insensitive list.
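To make the idea concrete, here is a toy model of "build a tree from targets, then walk it with whatever bytes have arrived". It is not ServerKit's implementation: the toy_* types and functions are invented for this sketch, targets carry an integer id instead of a hook, and the return codes (-1 parse error, -2 need more input, otherwise the matched id) are the sketch's own rather than those of server_ptree_walk().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical target: a path string and a non-negative id to report. */
struct toy_target {
    const char *path;
    int id;
};

/* One tree node: a child per possible input byte, and the id if a path ends here. */
struct toy_node {
    struct toy_node *next[256];
    int id; /* -1 when no target ends at this node */
};

static struct toy_node *toy_node_new(void)
{
    /* calloc zeroes the child array, which reads as NULL pointers on common platforms */
    struct toy_node *n = calloc(1, sizeof *n);
    if (n)
        n->id = -1;
    return n;
}

/* Builds the tree. Returns the root, or NULL if allocation fails
 * (a partially built tree is leaked in that case, for brevity). */
struct toy_node *toy_tree_new(const struct toy_target *targets, int n_targets)
{
    struct toy_node *root = toy_node_new();
    if (!root)
        return NULL;
    for (int i = 0; i < n_targets; i++) {
        struct toy_node *cur = root;
        const unsigned char *p = (const unsigned char *)targets[i].path;
        for (; *p; p++) {
            if (!cur->next[*p]) {
                cur->next[*p] = toy_node_new();
                if (!cur->next[*p])
                    return NULL;
            }
            cur = cur->next[*p];
        }
        cur->id = targets[i].id;
    }
    return root;
}

/* Advances *pos through the tree using the input bytes as directions.
 * Returns the target id once a target is reached, -1 on a parse error,
 * or -2 when the input ran out and more bytes are needed. */
int toy_walk(struct toy_node **pos, const unsigned char *input, int len)
{
    for (int i = 0; i < len; i++) {
        struct toy_node *n = (*pos)->next[input[i]];
        if (!n)
            return -1;
        *pos = n;
        if (n->id >= 0)
            return n->id;
    }
    return -2;
}
```

Because the position is kept in *pos between calls, the input can arrive in pieces of any size, which is the property the if-else and switch approaches lack.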
After you have the list put together, you simply pass it to the function server_ptree_new(). From this function you will get a server_ptree_t * which represents the root of the ptree. This pointer is used as the starting point for walking, and is also what you would supply to server_ptree_free() to deallocate the resources associated with the ptree instance.

Here's the ptree creation using the above target list:

/* begin */
server_ptree_t *pop3_ptree;

pop3_ptree = server_ptree_new(pop3_targets,
                              (sizeof(pop3_targets) / sizeof(server_ptree_target_t)));
if(pop3_ptree == NULL) {
    goto _failed;
}
/* end */

You would only do the ptree creation as part of your program startup or initialization. Even if your program needed to do ptree walking in parallel it would only need one instance per target list. The ptree is never written to after creation, only read from, making it safe to share. Also note that the server_ptree_new() function just constructs a tree of data structures which reference into the strings in your target list. It does not allocate copies of the strings in any way, so your targets must persist for the lifetime of the ptree.

How you integrate server_ptree_walk() into your read/recv loop is going to depend greatly on how your program is designed. The important thing is to make sure you handle the corner cases. The walk function returns the number of bytes it processed when there were no errors. If the number it returns is smaller than the length of input bytes you provided it, the walk has terminated at the end of a leaf node (successful match). If the number it returned were equal to the length of input bytes you provided, there would be an ambiguity. It may have terminated at the end of a leaf node, but the only way for you to know would be to call the walk function again with a zero length input. If zero is returned then yes you have reached a target. If -1 is returned then more data must be read.
This is how early versions of ServerKit were implemented. In newer versions the walk function only returns a value equal to your input bytes when there is more walking to be done. When the input length is equal to the remaining length of the path, the input length plus 1 is returned. So when server_ptree_walk() returns a value greater than the input length, you have reached the target with your last call in an aligned fashion. No more calls to server_ptree_walk() are needed.

It is important to understand this detail. Otherwise you may write a program that works most of the time, and you may not realize until it is too late that there is a subtle flaw in your use of the walking.

Other than what's mentioned above, walking is simple. Once you have server_ptree_walk() properly integrated you will not have to tinker with that part of your code. Provided you do it right, you should be able to extend your program by modifying the targets list and writing the target-specific code the target associates with.

Depending on the CPU architecture, the ptree walking has been found to be generally between 2% and 70% faster than the first if-else strcmp() method, and up to 300% faster if strcasecmp() is being used and a target list with all case permutations is used. It is tough to beat the second method which uses a switch on the first character with a strcmp(), but this method is limited to very simple situations. If you have a very simple situation with only a few possible branches, using a ptree is probably overkill and may even be slower than the switchless if-else example above. However, in more common parsing situations where you have many branch possibilities, some of which partially collide, ptree performance is quite good.
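Since the return-value convention is the part that is easy to get subtly wrong, here is a small sketch that turns a server_ptree_walk() return value into an explicit status, following the newer-version rules described in this document. The enum and classify_walk() are hypothetical helpers written for this sketch, not part of ServerKit, and the sketch takes the return value as a plain int so it does not depend on the ServerKit headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status names for the three outcomes of a walk call. */
enum walk_status {
    WALK_PARSE_ERROR,    /* input does not match any target */
    WALK_NEED_MORE,      /* all input consumed, path not finished */
    WALK_TARGET_REACHED  /* a target was matched */
};

/* ret is the value a walk call returned for an input of input_len bytes.
 * On return *consumed holds how many of those input bytes were used. */
enum walk_status classify_walk(int ret, int input_len, int *consumed)
{
    assert(input_len >= 0 && consumed != NULL);

    if (ret < 0) {               /* -1: parse error */
        *consumed = 0;
        return WALK_PARSE_ERROR;
    }
    if (ret == 0) {              /* 0: destination reached */
        *consumed = 0;
        return WALK_TARGET_REACHED;
    }
    if (ret < input_len) {       /* target reached with input left over */
        *consumed = ret;
        return WALK_TARGET_REACHED;
    }
    if (ret == input_len) {      /* everything consumed, keep walking */
        *consumed = ret;
        return WALK_NEED_MORE;
    }
    /* ret > input_len: target reached exactly at the end of the input */
    *consumed = input_len;
    return WALK_TARGET_REACHED;
}
```

Leftover bytes after a WALK_TARGET_REACHED (input_len minus *consumed) belong to whatever follows the matched target, such as a command argument.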
A good ptree candidate for example would be a /proc/meminfo parser:

MemTotal: 127148 kB
MemFree: 6152 kB
Buffers: 9856 kB
Cached: 50684 kB
SwapCached: 0 kB
Active: 74236 kB
Inactive: 34620 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 127148 kB
LowFree: 6152 kB
SwapTotal: 1001448 kB
SwapFree: 1001448 kB
Dirty: 192 kB
Writeback: 0 kB
Mapped: 59560 kB
Slab: 8032 kB
Committed_AS: 188872 kB
PageTables: 816 kB
VmallocTotal: 909232 kB
VmallocUsed: 2644 kB
VmallocChunk: 906204 kB

Anyways, here's the ptree API, I'll tidy up the above later. There is a small benchmark at the bottom of server_ptree.c that is disabled via the preprocessor. You can experiment with it by changing the #if 0 and simply compiling server_ptree.c alone into an executable. The benchmark also provides a sample of ptree integration.

server_ptree_t * server_ptree_new(server_ptree_target_t *targets, int n_targets);

Creates a tree, returns the root.

int server_ptree_walk(server_ptree_t **branch, int *temp, unsigned char *input, int input_len);

Walks a tree, branch changes as walking progresses. When you first initiate a walk you start with the root returned from server_ptree_new(). Make sure you keep the root pointer around so you can reuse it, because the walk function overwrites the space you pass in branch when it steps to new branches. The temp integer is used as scratch space that must persist across walk calls. You don't have to initialize the temp variable ever, it is fully managed by the walk function. input and input_len are the input stream of bytes that you want the walk function to try to use as directions through the tree.

The function returns -1, 0, or +N. When 0 is returned a destination has been reached in the tree, whose associated object is available in branch->hook. When -1 is returned a parse error was experienced. When +N is returned and N is less than or equal to input_len, N bytes have been consumed from the input.
If N is less than input_len a destination was reached and it should be handled as if 0 was returned. If N equals input_len, all of the input was consumed and there is more walking to be done. If N is greater than input_len a target was reached and N-1 bytes (all input bytes) were consumed. This last case exists to avoid returning N equal to input_len when a destination has been reached, a situation that technically shouldn't require another server_ptree_walk() call. This differs from pre-1.0.0 ServerKit releases, which did require another server_ptree_walk() call.

int server_ptree_walk_masked(server_ptree_t **branch, int *temp, unsigned char *input, int input_len, unsigned char mask);

Identical to server_ptree_walk() except it integrates a bitwise AND of the input bytes with the supplied mask value before comparison. This can be used to facilitate case-insensitivity by using the mask 0x5f. Just keep in mind that if your ptree also incorporates non-alphabetic characters like spaces or newlines, 0x5f may collide. This is true for spaces, which are particularly common in ptree paths. The trick is to use '\0' instead of ' ' in the ptree path, because ' ' & 0x5f becomes '\0'. This is often a reasonable solution when case permutations are unacceptable due to wasted memory.

void server_ptree_print(server_ptree_t *branch);

A convenience function for printing the contents of a ptree. In the future this will probably get a FILE * added to its argument list, right now it prints to stderr.

void server_ptree_free(server_ptree_t *branch);

This frees the tree from memory,