Overview

ServerKit has been built from the ground up with multi-threading based on POSIX threads. You are not required to use the thread pool interface to make use of ServerKit, but since the operator of the module currently runs in a thread created via pthread_create(), you can't get away from the pthreads dependency. This may change in a future version if there is sufficient demand.

I will briefly run off on a tangent here before describing the API, to hopefully help you better understand the purpose and goals of the thread pool implementation.

ServerKit provides a thread pool interface kind of similar to the database pool interface described above (lacking the SQL stuff, of course). It also adds some unique features for dealing with "delayed work" that you can employ in your modules if you like. Also, you don't explicitly take threads from the pool and give them back; rather, you add work to the thread pool.

Delayed work is something most system administrators and programmers should be aware of and understand pretty well. Delayed work is what those D state processes on your Unix/Linux box are experiencing. When it comes to creating large-scale applications, there are a number of ways to achieve the concurrency required to serve many clients from a single program. However, the problem of delayed work is often neglected, or dealt with in a way that makes the code delicate, confusing, or just severely limiting.

In your worker thread function you will likely do some form of IO. Some of that IO may be under your control, like accessing a local storage device whose latency and reliability are relatively deterministic. With these forms of IO you probably don't have to worry much or do anything in particular.

When you perform IO on a network socket, or some other less deterministic device, you need to be careful. Generally people rely on timeouts to simply disconnect when communications fail, but these timeouts must be quite lengthy, otherwise you're just being rude to your users on high-latency links.

In ServerKit, use system calls like poll() to time out quickly on IO that can potentially be very slow; you might even want to employ some form of non-blocking IO. For sockets, there are two options, SO_RCVTIMEO and SO_SNDTIMEO, that are properly supported on Linux (apparently not all UNIX-like systems support them). These are helpful for sensing delays efficiently without placing a poll() in front of every socket IO system call.
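
As a rough illustration of the two approaches just mentioned, here is a minimal sketch using only standard socket calls. The helper names (wait_readable, set_recv_timeout, recv_or_delay) and the -2 "delayed" return value are made up for this example and are not part of ServerKit.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for fd to report input.
 * Returns 1 if there is something to read (or a hangup), 0 if the wait
 * timed out, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);   /* a signal restarts the full wait */

    return rc > 0 ? 1 : rc;
}

/* Hypothetical helper: make recv()/read() on this socket fail with
 * EAGAIN or EWOULDBLOCK once timeout_ms pass without any data. */
int set_recv_timeout(int sock, int timeout_ms)
{
    struct timeval tv;

    assert(timeout_ms >= 0);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* Hypothetical helper: recv() that reports a delay separately from an error.
 * Returns bytes read (0 means the peer closed), -1 on error, and -2 when
 * the SO_RCVTIMEO timeout ran out with nothing received. */
ssize_t recv_or_delay(int sock, void *buf, size_t len)
{
    ssize_t n = recv(sock, buf, len, 0);

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return n;
}
```

With SO_RCVTIMEO set once per socket, every later recv() senses the delay by itself, which saves the extra poll() system call in front of each read.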

When you detect a delay, perhaps half a second long when you expected progress, your worker thread function can return to ServerKit a status indicating that it has been delayed.

You inform ServerKit of the file descriptor that is delayed and how long the timeout should be before ServerKit hands the work back to you as expired, presumably for you to disconnect and clean up, so the timeout would probably be pretty long.
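
To make the idea concrete, here is a sketch of what such a worker function could look like for a trivial echo session. Every name in it (work_status, delay_info, echo_session, echo_worker, the two timeout constants) is invented for illustration; the real ServerKit types and status values are described with the API and will differ.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* All names below are hypothetical, not ServerKit's real identifiers. */
#define PROGRESS_TIMEOUT_MS 500   /* how long we wait before calling it a delay */
#define EXPIRE_TIMEOUT_SECS 300   /* how long the pool may hold the delayed work */

enum work_status { WORK_DONE, WORK_DELAYED, WORK_UNFINISHED };

struct delay_info {
    int fd;             /* descriptor the work is waiting on */
    int timeout_secs;   /* wait this long before handing the work back expired */
};

struct echo_session {
    int sock;
    int expired;        /* assumed to be set by the pool when the delay expired */
    size_t bytes_echoed;
};

enum work_status echo_worker(struct echo_session *s, struct delay_info *delay)
{
    char buf[512];
    struct pollfd pfd = { .fd = s->sock, .events = POLLIN, .revents = 0 };

    assert(s != NULL && delay != NULL);

    if (!s->expired) {
        for (;;) {
            int rc = poll(&pfd, 1, PROGRESS_TIMEOUT_MS);
            ssize_t n;

            if (rc == 0) {
                /* No progress for half a second: give the thread back and
                 * say where and for how long to watch for more input. */
                delay->fd = s->sock;
                delay->timeout_secs = EXPIRE_TIMEOUT_SECS;
                return WORK_DELAYED;
            }
            if (rc < 0) {
                if (errno == EINTR)
                    continue;
                break;
            }
            n = recv(s->sock, buf, sizeof buf, 0);
            if (n <= 0)
                break;                      /* peer closed or error */
            if (send(s->sock, buf, (size_t)n, 0) != n)
                break;                      /* keep the sketch simple */
            s->bytes_echoed += (size_t)n;
        }
    }

    /* Expired, closed, or failed: disconnect and clean up. */
    close(s->sock);
    s->sock = -1;
    return WORK_DONE;
}
```

Note that the function keeps no explicit state machine; it just starts over with the same session when the pool calls it again.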

It's quite simple, and it solves the problem of delayed work hogging threads without turning your code into a big ugly state machine.

Libserver implements this feature using epoll(), and your thread function is not run directly from pthread_create(). Instead your thread function is wrapped by a function in libserver which handles delayed work on your behalf. You just have to return what things are delayed, where to look for their arrival, and how long to wait for them.
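
For readers who have not used epoll, the following sketch shows the general shape of watching delayed descriptors with a whole-second timeout. It is only an assumption about how such a watcher might be written, not the library's actual code; the function names are made up, and a real implementation would also have to track a separate deadline per work unit.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: create the set of delayed descriptors.
 * Returns an epoll descriptor, or -1 on error. */
int delayed_set_create(void)
{
    return epoll_create1(0);
}

/* Hypothetical helper: start watching fd for input. EPOLLONESHOT disarms
 * the descriptor after it fires once, so a work unit is woken only once. */
int delayed_set_add(int ep, int fd)
{
    struct epoll_event ev = { 0 };

    assert(fd >= 0);
    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = fd;
    return epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: wait up to timeout_secs for any watched descriptor.
 * Returns the ready descriptor, -1 if the timeout ran out, -2 on error. */
int delayed_set_wait(int ep, int timeout_secs)
{
    struct epoll_event fired;
    int rc;

    do {
        rc = epoll_wait(ep, &fired, 1, timeout_secs * 1000);
    } while (rc < 0 && errno == EINTR);

    if (rc == 0)
        return -1;
    if (rc < 0)
        return -2;
    return fired.data.fd;
}
```

When delayed_set_wait() returns a descriptor, the wrapper would put the matching work unit back on the queue; when it returns -1, the work would be handed back as expired.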

To avoid introducing a lot of context switching, the delayed work timeout is expressed in seconds. It does not attempt to implement any high-resolution timing in this context; if you want to time out the work in under a second, don't bother with the delayed work management, just wait the sub-second interval in poll() or use SO_*TIMEO.

If this limitation becomes a problem it would not be difficult to change it to milliseconds or something. But increasing the resolution would introduce more overhead in the delayed work timer thread.

In addition to handling delayed work, there is a concept of unfinished work as well. Depending on what your module implements, your worker function may not often block on IO, or maybe never at all. But it may still spend a lot of time 'working' on the work unit it has been given.

To prevent queued work from starving, the API also provides a work status return value that indicates unfinished work. If some part of your worker takes a long time to process even when nothing is delayed per se, you should go ahead and return from your function with the unfinished status.

This causes the wrapper function to examine the work queue. If the work queue is empty, your worker function simply gets called again with the same work unit you just gave back as unfinished.

If the work queue is non-empty, the unfinished work you returned gets placed at the back of the queue, and the work at the front of the queue is passed to the thread pool function in place of the work unit you just returned as unfinished. The thread does not go back into the thread pool as available when you return an unfinished status, and doing so is relatively cheap, so sprinkling such returns around your worker function is not a big deal. You can consider returning the unfinished status as a form of voluntary yielding to other potential work.
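
The requeue rule described in the last two paragraphs can be sketched as a small single-threaded simulation. All names here are hypothetical, and the mutex a real pool would need around the queue is left out to keep the logic visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; this only simulates one pool thread's wrapper loop. */
enum work_status { WORK_DONE, WORK_UNFINISHED };

struct work {
    int id;
    int steps_left;          /* pretend each call completes one milestone */
    struct work *next;
};

struct work_queue { struct work *head, *tail; };

typedef enum work_status (*worker_fn)(struct work *);

void queue_push(struct work_queue *q, struct work *w)
{
    w->next = NULL;
    if (q->tail)
        q->tail->next = w;
    else
        q->head = w;
    q->tail = w;
}

struct work *queue_pop(struct work_queue *q)
{
    struct work *w = q->head;

    if (w) {
        q->head = w->next;
        if (q->head == NULL)
            q->tail = NULL;
        w->next = NULL;
    }
    return w;
}

/* Wrapper loop: runs until there is nothing left, at which point a real
 * thread would go back into the pool as available. */
void run_worker(struct work_queue *q, struct work *current, worker_fn fn)
{
    while (current) {
        if (fn(current) == WORK_UNFINISHED) {
            if (q->head == NULL)
                continue;              /* empty queue: same unit, called again */
            queue_push(q, current);    /* otherwise go to the back ... */
        }
        current = queue_pop(q);        /* ... and take the unit at the front */
    }
}

/* Demo worker: records which unit ran, then yields after every step. */
int trace[16];
int trace_len;

enum work_status step_worker(struct work *w)
{
    assert(w->steps_left > 0);
    if (trace_len < 16)
        trace[trace_len++] = w->id;
    return --w->steps_left > 0 ? WORK_UNFINISHED : WORK_DONE;
}
```

If unit 1 needs three steps and unit 2 (already queued) needs one, the units run in the order 1, 2, 1, 1: unit 1 yields once to the backlog and then runs undisturbed because the queue is empty.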

Locations well suited for returning work as unfinished are after completing some subset of the work at hand, after reaching a milestone of sorts. For example, let's say the work is actually a POP3 session. Returning unfinished after completing the transmission of a message is not a bad idea: if there is no backlog of queued work, the session continues without any significant impact. If there is some backlogged work in the queue, it will get worked on first, adding some latency to the progress of the unfinished work.

It's similar to the traditional single-process, non-blocking, select()-driven state machine, but coarser grained and SMP-utilizing. The problem with the non-blocking IO single-process state machine is that it is too much of a busybody, far too aggressive. Rather than jumping from work to work a page at a time, we jump from work to work either because of significant delays or voluntarily because of significant progress.

2007-12-06