CS441, Fall 2014, Lab 3

CPSC 441, Fall 2018
Lab 3: Writing a Web Server

The assignment for this lab is to write a basic web server. Although you will implement only part of the HTTP protocol, your server should be able to serve pages to a web browser such as Firefox or Chrome, and those pages should be displayed correctly in the browser. You should start by writing a single-threaded web server. When that is working, you can add multithreading.

For this lab, you are encouraged but not required to work with a partner. (Working with a partner will require some extra features in the server.) If you work with a partner, make sure that both names are listed in a comment in the Java file that contains the main() routine. Only one person should turn in the work.

The program will be due in two weeks, and you can continue to work on it in next week's lab. However, there will also be a few new small tasks to do next week. When turning in the program, place your work in a folder named lab3 in your homework folder. I will print out all .java files in the lab3 folder or its subfolders, except for NetUtil.java and TextIO.java, so please don't include unnecessary files.

Requirements

A basic web server, which is sufficient for a one-person project, must do the following. It must allow a directory for its files to be specified on the command line (as is done in FileSrv.java). It must be able to respond to GET requests for files, either by sending the file in the response or by sending an appropriate error message. It should respond to unimplemented request types (POST, HEAD, etc.) with a "501 Not Implemented" error. If a GET request is for a directory, and if that directory contains a file named index.html, then you should return that file; if no such file exists, a basic web server can simply return a "403 Forbidden" error. You can ignore any request headers, and you can always use the "Connection: close" response header and close the connection immediately after sending the response. When sending a file in the response, you must include "Content-Type:" and "Content-Length:" headers. The server should log (that is, output) some information about connections to standard output. The final web server must use a thread pool to handle connections (but you should probably start with a version that does not use threads). You can find more details below.

There are many ways that the server could be improved, of course. Here are some possibilities. If you are working with a partner, you should do at least one or two of these. If you are working alone, you could still implement some of them for extra credit. Many of the improvements will require reading the request headers. Some will most likely require some discussion with me. You might also have other ideas, which you can discuss with me.

Implement the HEAD method. The response should be the same as for a GET response, but leaving out the actual data. That is, only the response status line and headers are returned. (This is too easy to get full credit for a two-person project.)
Implement the If-Modified-Since request header, and return a "304 Not Modified" response if the requested file has not been modified since the specified time. You will have to deal with the very specific date format that is used in the header.
Implement the "Connection: Keep-Alive" request header, to allow a client to get several files using the same connection. This means reading and responding to multiple requests instead of just one.
Implement the "Accept-Encoding:" request header for the gzip encoding type. It looks like it's easy to work with gzip in Java; see the API documentation and look for GZIPInputStream and GZIPOutputStream. (A gzipped response must include a "Content-Encoding: gzip" header. You won't be able to use "Content-Length" for it, since you won't know in advance how big the compressed data will be. I suggest using "Connection: close".)
Implement directory listings. When the client sends a GET request for a directory that does not contain an index.html file, send back a list of (readable) files in that directory. Ideally, the response will be in html format and the file names will be hyperlinks to the files, so that the person using the web browser can just click on a link to access the file.
Allow requests using HTTP/1.0 as well as HTTP/1.1. Unless you are doing something very fancy, it won't make any difference to how you handle the request, except that the response should begin with "HTTP/1.0" instead of "HTTP/1.1". You might also consider allowing HTTP 0.9 requests, which simply have the form "GET <filepath>". For those requests, you should just return the file (or some error text) without any status line or headers.
Make it possible for the server to read basic configuration data from a file when it starts up. The configurable options could include the port number on which the server will listen, the directory that contains the files that will be served, and the number of threads in the thread pool. (You need to think about how the program will locate the configuration file.)
Protect your server against the kind of denial-of-service attack where the attacker opens large numbers of connections without ever sending any request and without closing the connection. If the thread that is handling a connection waits forever for input, that thread will never be able to do anything else. Soon, all the threads in the thread pool will be blocked and no other clients will be able to connect. To prevent a read on a socket from waiting forever, you can set the "soTimeout" property of the socket. See the API documentation for the Socket class.
Implement cookies to track users in some way.
Implement GET requests with data and/or implement POST requests. For such requests, the server should call some subroutine to handle the request. The data from the request is sent as a parameter to the subroutine. The subroutine is responsible for writing the response. (So, pass the socket output stream to the subroutine as a parameter.) The response can be of type text/plain or, if you know HTML, text/html. You can have several subroutines performing different tasks, indicated by the path in the request. You should probably not attempt this without discussing it with me first.
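For the If-Modified-Since option, the header carries a date such as "Sun, 06 Nov 1994 08:49:37 GMT". The following is a minimal sketch, assuming Java 8 or later, of how the java.time classes could parse that format and compare it with the value returned by File.lastModified(). The class HttpDates and its method are hypothetical names (not part of NetUtil), and the sketch only handles this standard date format, not the older formats that HTTP also permits.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper for the If-Modified-Since option (not part of NetUtil).
class HttpDates {

    /**
     * Returns true if a file whose File.lastModified() value is lastModifiedMillis
     * has not changed since the date given in an If-Modified-Since header, so that
     * a "304 Not Modified" response can be sent instead of the file. A missing or
     * unparseable date returns false, and the server simply sends the file.
     */
    static boolean notModifiedSince(long lastModifiedMillis, String headerValue) {
        if (headerValue == null) {
            return false;
        }
        try {
            // RFC_1123_DATE_TIME parses dates such as "Sun, 06 Nov 1994 08:49:37 GMT".
            ZonedDateTime since = ZonedDateTime.parse(headerValue.trim(),
                    DateTimeFormatter.RFC_1123_DATE_TIME);
            // HTTP dates have one-second resolution, so drop the file's milliseconds.
            long fileSeconds = lastModifiedMillis / 1000;
            return fileSeconds <= since.toEpochSecond();
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}
```

When notModifiedSince() returns true, the server would send a status line of "HTTP/1.1 304 Not Modified" and its headers, with no file data.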

FileSrv

I handed out the sample programs FileSrv.java and FileClient.java in class, along with the file NetUtil.java that both programs use. All of these classes are in the package netsrv, and you can find copies in the folder /classes/cs441/netsrv. There is also a .jar file containing all three classes, /classes/cs441/FileTransfer.jar.

You will want to use NetUtil.java as part of your web server. You can use FileSrv.java as a model for the web server. FileSrv reads a simple command, either LIST or GET <filename>, from the client and sends back a response. You might want to try running the file server and connecting to it with telnet. To run the server, you can use this command in a Terminal:

java -cp /classes/cs441/FileTransfer.jar netsrv.FileSrv

The directory that contains the files for the server will be the directory that you are in when you give this command. (Make sure that the current directory contains some non-directory files.) Then, in another window, use the following command to connect to the server:

telnet localhost 33987

You won't see any response because the server waits for a command before doing anything. Type the command LIST and press return. You will see a list of available files. The server closes the connection after sending the list. Try again with a command of the form GET <filename> to retrieve a file from the server, where <filename> is one of the file names that were listed. The contents of the file will simply be displayed in the Terminal window. You might also try giving some bad commands to see the error messages that you get.

Getting Started

Start your program with the basic outline for a network server, similar to the structure of FileSrv.java or DateServer.java from Section 11.4.4 of Javanotes. You can choose any port number. 8000 and 8080 are common choices for web servers that can't run on port 80.

The program needs a main routine to create the server socket and listen for connections. Write a separate "handleConnection" method to do the communication with a connected client. Make sure that any exceptions that occur during the communication are caught so that they do not crash the server. Log some information about each connection, including the address of the client. Both of the sample programs already do logging.
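The outline described above might look something like the following minimal sketch. It is single-threaded, uses only standard Java classes rather than NetUtil (whose methods are not listed here), assumes port 8080, and omits the command-line directory argument; the class name WebServer is hypothetical, and the body of handleConnection() is still a placeholder.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical single-threaded outline; the HTTP work is still a placeholder.
class WebServer {

    static final int PORT = 8080; // assumed port; any free port will do

    public static void main(String[] args) {
        try (ServerSocket listener = new ServerSocket(PORT)) {
            System.out.println("Listening on port " + PORT);
            while (true) {
                Socket connection = listener.accept();
                handleConnection(connection);
            }
        } catch (IOException e) {
            // Only a problem with the server socket itself ends the program.
            System.out.println("Server shut down with error: " + e);
        }
    }

    /** Talks to one client; no exception is allowed to escape to main(). */
    static void handleConnection(Socket connection) {
        String client = connection.getInetAddress().toString();
        try {
            System.out.println("Connection from " + client);
            // Placeholder: read the request and send a response here.
        } catch (Exception e) {
            System.out.println("Error while handling " + client + ": " + e);
        } finally {
            try {
                connection.close();
            } catch (IOException e) {
                // Nothing useful can be done if close() fails.
            }
        }
    }
}
```

Even this version can be tested with telnet: the server logs the connection and closes it immediately.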

As soon as you get the basic outline done, you might want to test it by telnetting to your server, even if all it does is immediately close the connection. You can try again at various points in the development to test features as you implement them. Don't forget to test how your server deals with errors in the request, since I will certainly do that!

When you have the usual basic outline, you are ready to start programming HTTP. You should use some of the methods from the NetUtil class to help with the communication. You can find a copy in /classes/cs441/netsrv.

Don't just write everything in one long handleConnection method! You should try to design a well-written program in which responsibilities are divided up appropriately among several methods and possibly even additional classes.

Response Codes and Mime Types

When you send a response to a client, the first line of the response includes the string "HTTP/1.1" followed by a status code and a status message. For example, if you are returning a file that was requested by the client, the first line will be "HTTP/1.1 200 OK".

You should also be prepared to send back some error messages. You are not required to implement a lot of different status codes, but you will need these:

HTTP/1.1 200 OK -- the request was successful and the file is included in the response.
HTTP/1.1 400 Bad Request -- the request was not a legal request for the HTTP protocol.
HTTP/1.1 404 Not Found -- indicates that the requested file does not exist.
HTTP/1.1 403 Forbidden -- indicates that the file exists but the server can't send it. This might be because the server doesn't have permission to read the file, or it might be because the requested file is a directory, and your server does not allow access to directory listings.
HTTP/1.1 501 Not Implemented -- indicates that the server does not support the requested method; you are only required to implement the GET method in requests, so you could return this error if the request starts with any other method name.
HTTP/1.1 500 Internal Server Error -- indicates some unknown error, such as a bug in the server.
HTTP/1.1 505 HTTP Version Not Supported -- you might send this error if you happen to get an HTTP/2 request instead of HTTP/1.1. (You should not try to handle HTTP/2 requests.)

An error response should still contain some data. The content of the message is shown to the user in the browser, even for an error response. The content can be a plain text message describing the error in a human-readable way. (You could also send an HTML response if you know some HTML.)

You can just close the connection after sending a response. To be nice, when you do that, you should include a "Connection: close" header. You should always have a Content-Type header, and it is nice to have a Content-Length header. That is, the response headers take the form

Connection: close
Content-Type:  <mime-type>
Content-Length:  <file-size>
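Keep in mind that HTTP requires the status line and each header line to end with a carriage return and line feed ("\r\n"), and that a blank line separates the headers from the data. Here is a sketch of a hypothetical helper (not part of NetUtil) that writes the status line and the three headers listed above, plus a complete plain-text error response; it assumes the caller passes a negative length when no Content-Length header should be sent.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for writing the start of an HTTP/1.1 response.
class ResponseHeaders {

    /**
     * Writes the status line and headers, followed by the blank line that ends
     * the headers. Pass contentLength < 0 to omit the Content-Length header.
     */
    static void writeHeaders(OutputStream out, int status, String reason,
                             String mimeType, long contentLength) throws IOException {
        StringBuilder head = new StringBuilder();
        head.append("HTTP/1.1 ").append(status).append(' ').append(reason).append("\r\n");
        head.append("Connection: close\r\n");
        head.append("Content-Type: ").append(mimeType).append("\r\n");
        if (contentLength >= 0) {
            head.append("Content-Length: ").append(contentLength).append("\r\n");
        }
        head.append("\r\n"); // the blank line marks the end of the headers
        out.write(head.toString().getBytes(StandardCharsets.US_ASCII));
    }

    /** Sends a complete plain-text error response, such as 404 Not Found. */
    static void sendError(OutputStream out, int status, String reason, String message)
            throws IOException {
        byte[] body = (status + " " + reason + "\n" + message + "\n")
                .getBytes(StandardCharsets.UTF_8);
        writeHeaders(out, status, reason, "text/plain", body.length);
        out.write(body);
        out.flush();
    }
}
```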

You can omit the Content-Length when you send an error response. The mime-type for an error response will be text/plain (or text/html). For a file, you can determine the mime-type from the file name extension (converted to lower case). This list of mime types should be sufficient for our purposes:

    EXTENSIONS        MIME TYPE
   --------------    ------------
     .html             text/html
     .txt              text/plain
     .css              text/css
     .js               text/javascript
     .java             text/x-java
     .c                text/x-csrc
     .jpg  .jpeg       image/jpeg
     .png              image/png
     .gif              image/gif
     (anything else)   x-application/x-unknown

The last mime type, x-application/x-unknown, is made up. It should make the browser prompt the user to save the file. If you are interested, you can find a long list of mime types in the file /etc/mime.types.
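The table above translates directly into a lookup map. This is a minimal sketch using a hypothetical class name; it assumes the caller passes just the file name (for example, File.getName()), so that a dot in a directory name is not mistaken for an extension.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper implementing the mime-type table above.
class MimeTypes {

    private static final String UNKNOWN = "x-application/x-unknown";
    private static final Map<String, String> TYPES = new HashMap<>();

    static {
        TYPES.put("html", "text/html");
        TYPES.put("txt", "text/plain");
        TYPES.put("css", "text/css");
        TYPES.put("js", "text/javascript");
        TYPES.put("java", "text/x-java");
        TYPES.put("c", "text/x-csrc");
        TYPES.put("jpg", "image/jpeg");
        TYPES.put("jpeg", "image/jpeg");
        TYPES.put("png", "image/png");
        TYPES.put("gif", "image/gif");
    }

    /** Returns the mime type for a file name, based on its lower-cased extension. */
    static String mimeType(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return UNKNOWN; // no extension at all
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return TYPES.getOrDefault(ext, UNKNOWN);
    }
}
```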

Reading the Request

To keep things simple, you only really need to look at the first line that a client sends to a server. For a basic server, you will not need to read and parse all the headers in the request, and you can use "Connection: close" in the response so that you can simply close the connection after sending the response. Recall that the first line of a legal HTTP/1.1 request has the form

<method>  <filepath>  HTTP/1.1

The method can be GET, POST, HEAD, and some other things. You are only required to implement the GET method. You can ignore the rest of the headers and any data in the request. Nevertheless, you should send a legal response in all cases, even when the command is not GET and even when the first line does not have the correct form. If the line has the correct form but the first word is not GET, then you can send a 501 error response. If the line is not of the correct form, you can send a 400 error response.
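One way to classify the first line is sketched below. It assumes the line has already been read (NetUtil may provide a method for that), and that a basic server accepts only HTTP/1.1; RequestLine is a hypothetical class. The path is returned still URL-encoded, because decoding is a separate step.

```java
// Hypothetical classification of the first line of an HTTP request.
class RequestLine {

    final int status;     // 200 means "go ahead"; otherwise the error code to send
    final String method;  // e.g. "GET", or null if the line was malformed
    final String path;    // still URL-encoded, or null if the line was malformed

    private RequestLine(int status, String method, String path) {
        this.status = status;
        this.method = method;
        this.path = path;
    }

    static RequestLine parse(String line) {
        if (line == null) {
            return new RequestLine(400, null, null); // client sent nothing
        }
        String[] parts = line.trim().split("\\s+");
        // A legal line has exactly three words: <method> <filepath> HTTP/<version>
        if (parts.length != 3 || !parts[1].startsWith("/") || !parts[2].startsWith("HTTP/")) {
            return new RequestLine(400, null, null);
        }
        if (!parts[2].equals("HTTP/1.1")) {
            return new RequestLine(505, parts[0], parts[1]); // e.g. HTTP/2.0
        }
        if (!parts[0].equals("GET")) {
            return new RequestLine(501, parts[0], parts[1]); // POST, HEAD, ...
        }
        return new RequestLine(200, parts[0], parts[1]);
    }
}
```

A partner project that also accepts HTTP/1.0 would relax the version check.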

The <filepath> in the request will be a path to a file, or possibly a directory. The protocol allows the path to be followed by a "?" character and some additional characters. For this lab, you can strip them off and discard them (or just ignore the possibility). If the <filepath> string contains special characters, they will be URL encoded, so you should decode the string before using it. (There is a URL decoder method in class NetUtil.)
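You should use the decoder in NetUtil, but to show what decoding involves, here is a hypothetical stand-in. It strips any "?" part and then turns each %XX escape back into a byte, assuming the bytes form UTF-8 text. (Java's own URLDecoder is meant for form data and turns "+" into a space, which is not correct for paths.)

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for NetUtil's URL decoder, shown only to illustrate the idea.
class PathDecoder {

    /**
     * Removes any "?..." part, then decodes %XX escapes as UTF-8 bytes.
     * Returns null if an escape is malformed, so the caller can send a 400 error.
     */
    static String decodePath(String rawPath) {
        int q = rawPath.indexOf('?');
        if (q >= 0) {
            rawPath = rawPath.substring(0, q);
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < rawPath.length(); i++) {
            char ch = rawPath.charAt(i);
            if (ch == '%') {
                if (i + 2 >= rawPath.length()) {
                    return null; // "%" without two following characters
                }
                int hi = Character.digit(rawPath.charAt(i + 1), 16);
                int lo = Character.digit(rawPath.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    return null; // not a hex digit
                }
                bytes.write(hi * 16 + lo);
                i += 2;
            } else {
                // Request lines are ASCII, so every other char is a single byte.
                bytes.write(ch);
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }
}
```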

Finding and Sending the File

A web server is designed to serve files from a certain directory. The directory that is used is part of the web server configuration. Your program should accept the directory name as a command-line argument, just like FileSrv.java. You can use /classes/cs441/graphicsbook as a default directory if you like.

The <filepath> from the request starts in the server directory. That is, the path has to be appended to the directory to get the path to the actual file. You might check out FileSrv.java to see how I handle it there, using the File class.

Once you have a File object representing the location of the file, you can use the File API to check whether the file exists and whether it can be read, and send an appropriate error response if the file can't be sent. If the requested file is a directory and that directory contains a file named index.html, send index.html as the response. If no index.html file exists in the directory, a basic web server can simply send a 403 Forbidden response.
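The checks above can be collected into one method that returns either the file to send or the error code. This is a sketch with hypothetical names, taking the already-decoded path. It also includes one extra precaution that the lab does not require: comparing canonical paths so that a request such as "/../secret" cannot reach files outside the server directory.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper: maps a decoded request path to a file under the server directory.
class FileResolver {

    static final class Result {
        final int status; // 200, 403 or 404
        final File file;  // the file to send when status == 200, otherwise null

        Result(int status, File file) {
            this.status = status;
            this.file = file;
        }
    }

    static Result resolve(File rootDir, String decodedPath) throws IOException {
        File target = new File(rootDir, decodedPath);
        // Extra precaution: refuse any path that escapes the server directory.
        String root = rootDir.getCanonicalPath();
        String prefix = root.endsWith(File.separator) ? root : root + File.separator;
        String canonical = target.getCanonicalPath();
        if (!canonical.equals(root) && !canonical.startsWith(prefix)) {
            return new Result(403, null);
        }
        if (!target.exists()) {
            return new Result(404, null);
        }
        if (target.isDirectory()) {
            File index = new File(target, "index.html");
            if (!index.isFile()) {
                return new Result(403, null); // no index.html: a basic server refuses
            }
            target = index;
        }
        if (!target.isFile() || !target.canRead()) {
            return new Result(403, null);
        }
        return new Result(200, target);
    }
}
```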

Finally, when you are ready to send a file, you need to create a legal response, with status 200 OK. Remember to include the Content-Type and Content-Length headers. You can use the File API to get the file size.
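A minimal sketch of sending the file, using a hypothetical method name. It writes the headers as ASCII text and then copies the file's bytes unchanged with Files.copy(), which matters for binary files such as images. It assumes the mime type has already been chosen from the file name.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper: sends a "200 OK" response containing the given file.
class FileSender {

    static void sendFile(OutputStream out, File file, String mimeType) throws IOException {
        String head = "HTTP/1.1 200 OK\r\n"
                + "Connection: close\r\n"
                + "Content-Type: " + mimeType + "\r\n"
                + "Content-Length: " + file.length() + "\r\n" // size in bytes
                + "\r\n";
        out.write(head.getBytes(StandardCharsets.US_ASCII));
        // Copy the file's bytes unchanged; it may be binary (an image, for example).
        Files.copy(file.toPath(), out);
        out.flush();
    }
}
```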

Adding the Thread Pool

The program DateServerWithThreadPool.java from Section 12.4.4 in Javanotes is an example of using a thread pool in a network server. Thread pools themselves are discussed in Section 12.3.2. Another example can be found in the solution to Exercise 12.4.

You will need a subclass of Thread to represent the threads in the thread pool, and you will need an ArrayBlockingQueue to carry connected sockets from the main routine to the worker threads. The run method in the thread class should run in an infinite loop in which it takes a Socket from the queue and handles the connection.

The main() routine needs to create the ArrayBlockingQueue that will be used to pipe connections to the thread pool. Then, main() needs to create the threads that make up the thread pool and start them running. This must be done before any connections can be accepted by the server. As the main() routine accepts connections, it simply puts the connected sockets into the queue. Ordinarily, this will take essentially no time.

That's really all there is to it. The ArrayBlockingQueue is designed to take care of all of the "synchronization" that is needed when several threads all use a shared resource (the queue in this case), and the put() and take() methods in the ArrayBlockingQueue take care of any blocking that is needed when a thread wants to enqueue or dequeue an item.
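Here is a sketch of that structure, with hypothetical class names, an assumed pool of 10 threads, a queue capacity of 50, and port 8080. The handleConnection() shown is only a placeholder for the one you wrote for the single-threaded server. The worker threads are marked as daemon threads so that idle workers do not by themselves keep the program running.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical thread-pool version of the server outline.
class PooledServer {

    /** A worker thread: takes connected sockets from the queue forever. */
    static class ConnectionHandler extends Thread {
        private final ArrayBlockingQueue<Socket> queue;

        ConnectionHandler(ArrayBlockingQueue<Socket> queue) {
            this.queue = queue;
            setDaemon(true);
        }

        @Override
        public void run() {
            while (true) {
                try {
                    Socket connection = queue.take(); // blocks until a socket arrives
                    handleConnection(connection);
                } catch (InterruptedException e) {
                    return; // stop if the thread is interrupted
                }
            }
        }
    }

    /** Placeholder for the single-threaded handleConnection() written earlier. */
    static void handleConnection(Socket connection) {
        try {
            // ... read the request and send the response here ...
        } catch (Exception e) {
            // Catch everything, so that an error never kills a worker thread.
            System.out.println("Error handling connection: " + e);
        } finally {
            try {
                connection.close();
            } catch (IOException e) {
                // Nothing useful can be done if close() fails.
            }
        }
    }

    /** Creates the queue and starts the workers; call before accepting connections. */
    static ArrayBlockingQueue<Socket> startPool(int threadCount, int queueCapacity) {
        ArrayBlockingQueue<Socket> queue = new ArrayBlockingQueue<>(queueCapacity);
        for (int i = 0; i < threadCount; i++) {
            new ConnectionHandler(queue).start();
        }
        return queue;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        ArrayBlockingQueue<Socket> queue = startPool(10, 50);
        try (ServerSocket listener = new ServerSocket(8080)) {
            while (true) {
                queue.put(listener.accept()); // blocks only if the queue is full
            }
        }
    }
}
```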

There is still, however, one synchronization issue: the "log" to which the server writes information about connections is a shared resource, since all the threads write their messages to the same destination, so messages from different threads can be mixed together in the output. To keep all the log messages for a given connection in one place, it is advisable to build a string that contains all the log messages for that connection and write that string with a single call to System.out.println() after handling the connection. (I think that System.out.println() is synchronized to prevent the output from two calls to System.out.println() from being intermingled, but I am not completely sure of that. To be safe, you could write the log message in a synchronized method.)
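A sketch of that approach, using a hypothetical ConnectionLog class: each connection collects its messages in a StringBuilder, and flush() prints them through a static synchronized method, so only one thread at a time can be printing. This only helps if all log output goes through that method.

```java
// Hypothetical per-connection log for the thread-pool server.
class ConnectionLog {

    private final StringBuilder messages = new StringBuilder();

    /** Records one message for this connection; nothing is printed yet. */
    void add(String message) {
        if (messages.length() > 0) {
            messages.append(System.lineSeparator()).append("   "); // indent later lines
        }
        messages.append(message);
    }

    /** Prints all messages for this connection with a single synchronized call. */
    void flush() {
        print(messages.toString());
        messages.setLength(0);
    }

    // static synchronized: the lock is the ConnectionLog class, shared by all threads.
    private static synchronized void print(String text) {
        System.out.println(text);
    }
}
```

In handleConnection(), you would create a ConnectionLog, call add() as the request is processed, and call flush() in a finally block.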
