CPSC 441, Fall 2002
Lab 7: Web Server, Part 3


THE FINAL STEP in writing your Web server is to add multi-threading capabilities. Your server will use a separate thread to process each connection request. This will make it possible to service several requests concurrently.

The thread API that you will use is the pthread library, a standardized UNIX thread API. A sample program that uses this library for networking can be found in the file threaded_chat.cc in the directory /home/cs441/Sockets. (But this application is quite a bit different from what you will be doing with your Web server.) In order to use the pthread library, you must include the line

            #include <pthread.h>

at the beginning of your program. And when you compile your program, you must add "-lpthread" to the command line. For example:

            g++ -o httpd httpd.cc Sockets.o -lpthread

Creating Threads

A pthread is represented by a variable of type pthread_t. The exact representation of this type is up to the system, so treat it as an opaque handle: it is filled in by the function that creates the thread, and it is used to refer to the thread in other functions. To create a thread, you must declare a variable of type pthread_t. You also need a function for the thread to run, since the job of a thread is to run a function. The thread will die when the function returns. (The thread can also die if it is killed by another thread or if the program that creates the thread exits.) The function should take one parameter of type void* and it should return a value of type void*. For example, a function with prototype

            void *processConnection(void *sockptr)

could be executed by a thread. The type void* is a general pointer that can point to a value of any type. The pthread_create function is used to create threads. It takes four parameters: a pointer to a pthread_t variable that will be filled with information about the thread; a pointer -- usually 0 -- which can point to some option settings to be applied to the thread; the name of the function that the thread will execute; and a pointer that will be passed to the function as its parameter. For example:

            pthread_create(&myThread, 0, processConnection, socket);

This statement has a similar effect to simply calling processConnection(socket), except that the function will be executed in parallel with the program that calls it and in parallel with other threads created by the program.

The pthread_create function returns an int. If the thread is successfully created, the return value is 0. A non-zero return value indicates an error.
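As a minimal sketch of creating a thread and checking the return value, the following uses a hypothetical function doubleValue as a stand-in for processConnection, and a hypothetical helper runInThread that is not part of the lab. It waits for the thread with pthread_join so that the result can be inspected; a server would normally not wait for each connection thread this way.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical stand-in for processConnection: doubles the int it points to.
void *doubleValue(void *arg) {
    int *value = static_cast<int *>(arg);
    *value *= 2;
    return 0;
}

// Runs doubleValue in a new thread and waits for it to finish.
// Returns false if the thread could not be created.
bool runInThread(int *value) {
    assert(value != 0);
    pthread_t worker;
    int err = pthread_create(&worker, 0, doubleValue, value);
    if (err != 0) {
        // pthread functions return the error code directly.
        fprintf(stderr, "pthread_create failed: %s\n", strerror(err));
        return false;
    }
    pthread_join(worker, 0);  // wait here only so the caller can see the result
    return true;
}
```

Calling runInThread(&n) with n equal to 21 should leave n equal to 42 once the function returns.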

With this information, it should be easy to convert your web server program to use a separate thread for each connection. After you have this working, there are some other details to take care of.


Synchronization

All the threads created by a program have access to the same global variables. Any time threads are sharing a resource, there is the problem of synchronization: There has to be some way to make sure that a thread can get exclusive access to a resource while it is using that resource. In the pthread library, a mutex is used to ensure "mutually exclusive" access to a shared resource. A mutex is a value belonging to the type pthread_mutex_t. When it is declared, it should be initialized with the value PTHREAD_MUTEX_INITIALIZER. For example:

            pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

If a resource is shared by several threads, access to that resource should be controlled by a mutex. A mutex can be either locked or unlocked. The function pthread_mutex_lock(&mtx) is used to lock a mutex. If a second thread comes along and tries to lock a mutex that is already locked, it will be forced to wait in line. Its call to pthread_mutex_lock(&mtx) will not finish until the first thread unlocks the mutex by calling pthread_mutex_unlock(&mtx). If you write a segment of code that looks like:

            pthread_mutex_lock(&mtx);
              .
              . // protected code
              .
            pthread_mutex_unlock(&mtx);

then it will be impossible for two threads to execute the protected code at the same time. If several segments of code access the shared resource, then each segment should be protected by the same mutex.

In your program, the log file is a shared resource. Create a mutex and use it to protect the code that writes log entries to the log file.
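One way this might look is sketched below. The names logMutex, logFile, entriesWritten, writeLogEntry, logWorker and runLogDemo are all hypothetical, and the sketch writes to a temporary file rather than to your server's real log. Each entry is written in two steps to show why the lock matters: without it, output from another thread could land between the two writes.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstdio>

pthread_mutex_t logMutex = PTHREAD_MUTEX_INITIALIZER;
FILE *logFile = 0;        // assumed to be opened before any thread logs
long entriesWritten = 0;  // shared counter, also protected by logMutex

// Writes one complete log entry while holding the mutex.
void writeLogEntry(const char *client, const char *request) {
    pthread_mutex_lock(&logMutex);
    fprintf(logFile, "%s: ", client);
    fprintf(logFile, "%s\n", request);
    entriesWritten++;
    pthread_mutex_unlock(&logMutex);
}

// Thread function for the demonstration: writes 1000 entries.
void *logWorker(void *) {
    for (int i = 0; i < 1000; i++)
        writeLogEntry("127.0.0.1", "GET /index.html");
    return 0;
}

// Runs four logging threads at once and returns the number of entries written.
long runLogDemo() {
    logFile = tmpfile();
    assert(logFile != 0);
    entriesWritten = 0;
    pthread_t workers[4];
    for (int i = 0; i < 4; i++) {
        int err = pthread_create(&workers[i], 0, logWorker, 0);
        assert(err == 0);
    }
    for (int i = 0; i < 4; i++)
        pthread_join(workers[i], 0);
    fclose(logFile);
    return entriesWritten;
}
```

Because every update to entriesWritten happens inside the locked region, runLogDemo() should return exactly 4000.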


Using a Thread Pool

So far, this might look easy, but there are a lot of subtleties to using threads. In the case of your Web server, it should not be possible to have an unlimited number of threads running at the same time. In fact, the system limits the number of threads that a program is allowed to create. You should therefore cap the number of threads at some fixed value, say 10. To keep track of the threads that are currently running, use an array of structs such as:

            struct ThreadInfo {
               bool inUse;
               pthread_t info;
               Socket *connection;
               unsigned int timeout; // Used in last part of lab.
            };
            
            ThreadInfo thread_pool[10];

This array will be a shared resource, and any access to it should be protected by a mutex. Each spot in the array represents a potential thread. The inUse field of a slot tells whether that slot currently contains a thread. When the server starts, it should set the inUse field of every slot to false. When the server receives a connection request, it should look for an empty slot in the array, and use that slot for the thread. The thread should be responsible for removing itself from the slot (by setting its inUse field to false just before it terminates). I suggest that you pass a pointer to the ThreadInfo struct to the thread function, instead of a pointer to the socket:

      pthread_create(&thread_pool[i].info, 0, processConnection, &thread_pool[i]);

Add a thread pool to your program. You will have to decide what to do if the server receives a connection request and the thread pool is already full.
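The slot bookkeeping described above could be sketched as follows. The helpers initPool, claimSlot and releaseSlot, the constant POOL_SIZE and the mutex poolMutex are hypothetical names, and the connection field is an int here only because the lab's Socket class is not available in a standalone example. What to do when claimSlot reports a full pool is still your decision.

```cpp
#include <pthread.h>
#include <cassert>

const int POOL_SIZE = 10;

struct ThreadInfo {
    bool inUse;
    pthread_t info;
    int connection;        // stand-in for the Socket* used in the lab
    unsigned int timeout;
};

ThreadInfo thread_pool[POOL_SIZE];
pthread_mutex_t poolMutex = PTHREAD_MUTEX_INITIALIZER;

// Marks every slot as free. Call once when the server starts.
void initPool() {
    pthread_mutex_lock(&poolMutex);
    for (int i = 0; i < POOL_SIZE; i++)
        thread_pool[i].inUse = false;
    pthread_mutex_unlock(&poolMutex);
}

// Claims the first free slot and returns its index, or -1 if the pool is full.
int claimSlot() {
    int found = -1;
    pthread_mutex_lock(&poolMutex);
    for (int i = 0; i < POOL_SIZE && found < 0; i++) {
        if (!thread_pool[i].inUse) {
            thread_pool[i].inUse = true;  // claim it before releasing the lock
            found = i;
        }
    }
    pthread_mutex_unlock(&poolMutex);
    return found;
}

// Called by a thread just before it terminates.
void releaseSlot(ThreadInfo *slot) {
    assert(slot != 0);
    pthread_mutex_lock(&poolMutex);
    slot->inUse = false;
    pthread_mutex_unlock(&poolMutex);
}
```

Searching for a free slot and marking it as used happen under a single lock, so two connection requests can never be handed the same slot.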


Testing Your Program

To test your program, you should run it with a page that contains a lot of images. After a web browser loads the page, it will very quickly open connections for all the images on the page. You probably have such a web page in your own www account, so set up your server to use /home/username/www as document directory (replacing username with your own user name). You should then be able to request pages from your own web site. If you don't have a suitable page, you can use the directory /home/cs441/testwww as your document directory and request the file index.html. This page is a lab from my computer graphics class that contains a lot of images.

To check that you are handling connections simultaneously, you might want to put a line such as cout << "Starting\n" at the beginning of your processConnection function and a line such as cout << "Finishing\n" at the end. That way, you can check whether one thread starts before another finishes.


Cleanup and Killing Threads

For a perfect program, there is one more thing you could do. It's possible for a connection to stay open almost indefinitely, for example if the client machine crashes. A malicious hacker might keep a connection open indefinitely as an attempt at a denial of service attack. This might not happen often, but you don't want your thread pool to fill with useless connections that will block new requests from being processed.

To handle something like this, you can use timeouts. If a connection has been inactive for too long, it should be killed and its spot in the thread pool should be reclaimed. To make this work, the thread should store its starting time in the timeout field of its ThreadInfo struct. Periodically, you can check the timeout value of each thread that is currently running. If too much time has passed since the thread started, you can kill the thread and clean up after it. (Actually, a better approach would be to have the thread update the timeout value each time it does something, such as send a block of data to its client. That way, you will only kill threads that have been inactive for a while.)

The function time(0) returns the current time expressed as the number of seconds that have passed since January 1, 1970. To use this function, your program should #include <time.h>. You can kill a thread with the pthread_cancel function, which takes one parameter of type pthread_t. For example:

            pthread_cancel(thread_pool[i].info);

You still have to decide who will be responsible for killing off threads. One possibility is to let the main program do it. Each time it gets a connection request, it can check the thread pool and kill off threads that have timed out. Another possibility is to create a separate thread to do it. This thread would run a function that checks the thread pool periodically and kills off threads that have been inactive for too long. This "reaper" thread should only run periodically (every few seconds, for example), so you should use a function call such as

            sleep(5);

to insert a delay into the reaper thread. The sleep function suspends the thread for a specified number of seconds. To use this function, your program should #include <unistd.h>.
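A reaper along these lines might look like the sketch below. It is only one possible arrangement: the names markActive, reapIdleThreads, reaper, stalledConnection, poolMutex and POOL_SIZE, and the 30 second limit, are assumptions made for illustration, and the Socket* field is left out of the struct so that the example stands alone. Closing the connection of a cancelled thread is not shown.

```cpp
#include <pthread.h>
#include <time.h>
#include <unistd.h>
#include <cassert>

const int POOL_SIZE = 10;
const unsigned int MAX_IDLE_SECONDS = 30;  // hypothetical limit, not from the lab

struct ThreadInfo {          // Socket* field omitted in this sketch
    bool inUse;
    pthread_t info;
    unsigned int timeout;    // time of last activity, as returned by time(0)
};

ThreadInfo thread_pool[POOL_SIZE];
pthread_mutex_t poolMutex = PTHREAD_MUTEX_INITIALIZER;

// Called by a connection thread each time it makes progress.
void markActive(int i) {
    assert(i >= 0 && i < POOL_SIZE);
    pthread_mutex_lock(&poolMutex);
    thread_pool[i].timeout = (unsigned int)time(0);
    pthread_mutex_unlock(&poolMutex);
}

// Cancels every in-use thread that has been idle for more than
// MAX_IDLE_SECONDS as of `now` and frees its slot. Returns the number cancelled.
int reapIdleThreads(unsigned int now) {
    int reaped = 0;
    pthread_mutex_lock(&poolMutex);
    for (int i = 0; i < POOL_SIZE; i++) {
        ThreadInfo &slot = thread_pool[i];
        if (slot.inUse && now > slot.timeout &&
            now - slot.timeout > MAX_IDLE_SECONDS) {
            pthread_cancel(slot.info);  // a request; acted on at a cancellation point
            slot.inUse = false;
            reaped++;
        }
    }
    pthread_mutex_unlock(&poolMutex);
    return reaped;
}

// Function run by the reaper thread: wake up every 5 seconds and check the pool.
void *reaper(void *) {
    for (;;) {
        sleep(5);
        reapIdleThreads((unsigned int)time(0));
    }
    return 0;
}

// Hypothetical stand-in for a connection whose client has gone silent.
void *stalledConnection(void *) {
    sleep(60);
    return 0;
}
```

Note that pthread_cancel only requests cancellation. By default the target thread stops when it next reaches a cancellation point, such as a call to sleep, read or write, so a thread that is stuck waiting on its client should be stopped promptly.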


David Eck, November 2002