CPSC 225, Spring 2011 -- Lab 11

CPSC 225, Spring 2011
Lab 11: Web Server

As discussed in class, the goal of this week's lab is to write a basic multi-threaded web server that can make a directory full of files available to real web browsers such as Firefox and Internet Explorer.

You should start a new project for Lab 11 in your Eclipse workspace. You will want to have a copy of this page open in a web browser so that you can copy-and-paste some of the code from this page into your program.

The lab is due in two weeks. For next Friday, April 15, Phase 4 of the final project is due. For Phase 4, you should turn in "a significant amount of code". In the lab next week, you can work on your final project.

Testing Your Server

Once your server is running, you should be able to contact it from any web browser. You just need to enter an appropriate URL in the location box of the browser. For the URL, you need to know the host where the server is running and the port number where it is listening. When you are running the browser on the same computer as the server, you can use localhost as the host name, so you get a URL something like this:

      localhost:8080/index.html

This is a request for the file named index.html on the top level of the server's directory. It assumes that the port number is 8080.

The server can also be contacted from a browser running on another computer. In that case, the URL can use the IP address of the computer where the server is running. For example:

     172.30.10.43:8080/index.html

There is another way to test your server. You can contact it directly from the command line, using the telnet program. Telnet is a simple program that is often used for testing text-based network programs. For example, to connect to the web server that is running on port 80 on math.hws.edu, you can use this command in a terminal window:

      telnet math.hws.edu 80

Once connected, you can type in a request by hand to be sent to the server. Try this:

      GET /index.html HTTP/1.1
      Host: math.hws.edu

with an extra blank line at the end. You can see the exact text that the server sends in response. You can try this with your own server as well, using a command such as

     telnet localhost 8080

In that case, you just have to enter the first line shown above. This is a good way to tell exactly how your server is responding to the request, which can be helpful in debugging.
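
If telnet is not installed on your computer, the same test can be done from a few lines of Java. The following is only a sketch: the class name RawHttpClientSketch and its methods are made up for illustration, and the "Connection: close" request header is an addition, included on the assumption that it makes the server close the connection so that reading to end-of-stream terminates.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of the lab's required code.
class RawHttpClientSketch {

    // Builds the text that would be typed into telnet, with CRLF line
    // endings and the blank line that marks the end of the header.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"   // assumption: ask the server to close when done
             + "\r\n";
    }

    // Sends the request and returns everything the server sends back,
    // headers included, exactly as telnet would show it.
    static String fetch(String host, int port, String path) throws IOException {
        Socket socket = new Socket(host, port);
        try {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int count;
            while ((count = in.read(buffer)) >= 0)
                response.write(buffer, 0, count);
            return response.toString("ISO-8859-1");
        }
        finally {
            socket.close();
        }
    }
}
```

With your server running, a call such as System.out.print(RawHttpClientSketch.fetch("localhost", 8080, "/index.html")) should print the status line, the headers, the blank line, and then the file.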

About the HTTP Protocol

Web browsers and web servers communicate using the HTTP protocol. This is a "request/response" protocol: The browser sends a request to the server and the server sends back a response. Both the request and the response start with a "header," which consists of one or more lines of text followed by a blank line. The blank line is essential since it marks the end of the header. Lines should be terminated with a CRLF (carriage-return / line-feed, or "\r\n" in Java). The blank line consists of just a CRLF.

After the blank line that ends the headers, a response can contain data, which can be of any type (text, picture, music, etc.). The data that is sent in the response is what appears in the web browser's window; the headers are not displayed to the user of the browser.

Request and response headers can get complicated, but your server will only have to deal with a few possibilities. For the request, you only need to read the first line, which should contain three tokens and should be of the form:

        GET <path-to-file> HTTP/1.1

The first token is the "method." HTTP supports several methods, but you will only implement GET. The second token tells you which file you should send back to the browser. You should probably check that the third token is there, but you don't have to do anything with it. (Some very old browsers might send HTTP/1.0 instead of HTTP/1.1, but you probably don't need to deal with that.)
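
As a rough sketch of reading that first line with a Scanner, something like the following could work. The class and method names are hypothetical, and returning null for a malformed line is just one possible convention; checking that the third token starts with "HTTP/" is a slightly looser test than requiring exactly "HTTP/1.1".

```java
import java.util.Scanner;

// Hypothetical helper class; the names are for illustration only.
class RequestLineSketch {

    // Reads the first line of a request and splits it into its three tokens:
    // method, path-to-file, and protocol version.  Returns null if the line
    // is missing or does not have the expected form.
    static String[] readRequestLine(Scanner in) {
        if (!in.hasNextLine())
            return null;                       // the client sent nothing at all
        String[] tokens = in.nextLine().trim().split("\\s+");
        if (tokens.length != 3)
            return null;                       // wrong number of tokens
        if (!tokens[2].startsWith("HTTP/"))
            return null;                       // third token is not a protocol version
        return tokens;
    }
}
```

In the server, the Scanner would be created from the socket's input stream; tokens[0] is then the method and tokens[1] is the <path-to-file>.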

A web server has a directory that contains the files that it can send to clients. The <path-to-file> tells how to find the requested file, starting in that directory. For example, if the directory is /classes/s11/cs225/javanotes6 and if <path-to-file> is /index.html, then the actual file is /classes/s11/cs225/javanotes6/index.html.

(A technicality that you can probably ignore: If a file name contains special characters such as spaces, they will be encoded in the HTTP request. To get the real file name from the <path-to-file>, you should really call URLDecoder.decode(<path-to-file>,"UTF-8"). Another note for Windows users: File paths on the web always use the forward slash "/" as a separator. Windows uses a backslash "\". If you try to run the web server on a Windows computer, you might need to globally replace "/" with "\" in the <path-to-file> for the server to work.)
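
Here is a small sketch of the decoding step. The class and method names are made up for illustration. It builds a File from the directory and the decoded path instead of concatenating strings; java.io.File generally accepts forward slashes on Windows as well, so the replacement of "/" with "\" may turn out to be unnecessary, but treat that as something to check on your own machine. Note also that URLDecoder was designed for form data and turns a "+" into a space.

```java
import java.io.File;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper class; the names are for illustration only.
class PathSketch {

    // Converts the <path-to-file> from a request into a File inside
    // the server's directory, undoing encodings such as %20 for a space.
    static File resolve(String directory, String pathToFile)
                                      throws UnsupportedEncodingException {
        String decoded = URLDecoder.decode(pathToFile, "UTF-8");
        return new File(directory, decoded);
    }
}
```

For example, resolve("/classes/s11/cs225/javanotes6", "/my%20file.html") refers to a file named "my file.html" in that directory.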

Once you know what file is being requested, you have to send a response. Assuming that there has been no error and that you are in fact sending a file, the response should look like this, where <mime-type> is the mime type that describes the type of data in the file, and <file-size> is the number of bytes in the file:

         HTTP/1.1 200 OK
         Connection: close
         Content-Type: <mime-type>
         Content-Length: <file-size>

followed by a blank line and then the contents of the file.

If an error occurred, the server should instead send a response that describes the error. For example, if the requested file does not exist, you can send a "Not Found" error response:

         HTTP/1.1 404 Not Found
         Connection: close
         Content-Type: text/plain
         
         Sorry, the file that you requested
         could not be found.

In this case, the last two lines are the content of the response, which will be displayed to the user. Remember that the browser only displays the part that comes after the blank line. (A real server would use Content-Type text/html and send HTML-formatted text as the response.)

You don't have to implement all possible error responses, but here are some possible errors that you might want to take into account:

HTTP/1.1 404 Not Found -- indicates that the requested file does not exist.
HTTP/1.1 403 Forbidden -- indicates that the file exists but the server can't send it. This might be because the server does not have permission to read the file, or it might be because the requested file is a directory.
HTTP/1.1 400 Bad Request -- indicates that there was an error in the request. For example, no "HTTP/1.1" on the first line.
HTTP/1.1 501 Not Implemented -- indicates that the server does not support the requested method; for us, this just means that the first word in the request was not "GET".
HTTP/1.1 500 Internal Server Error -- indicates some unknown error, such as a bug in the server.
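
One way to organize these checks is a single method that looks at the request and the file and picks the first line of the response. This is only a sketch with made-up names, and the order of the tests is a design choice: problems with the request itself (400, 501) are reported before problems with the file (404, 403). The 500 response does not appear here, since it would be sent from the code that catches an unexpected exception.

```java
import java.io.File;

// Hypothetical helper class; the names are for illustration only.
class StatusSketch {

    // Chooses the status line for a request, given its method and
    // protocol-version tokens and the file that the path refers to.
    static String statusFor(String method, String version, File file) {
        if (method == null || version == null || !version.startsWith("HTTP/"))
            return "HTTP/1.1 400 Bad Request";       // malformed first line
        if (!method.equals("GET"))
            return "HTTP/1.1 501 Not Implemented";   // only GET is supported
        if (!file.exists())
            return "HTTP/1.1 404 Not Found";
        if (file.isDirectory() || !file.canRead())
            return "HTTP/1.1 403 Forbidden";         // exists, but can't be sent
        return "HTTP/1.1 200 OK";
    }
}
```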

Server Setup and Connection Handling

Create a new class for your server. Add constants to represent the PORT on which the server will listen and the DIRECTORY that contains the files that are made available on the server. For the DIRECTORY, you can use "/classes/s11/cs225/javanotes6" if you want; that directory is a copy of the textbook for this course.

Your program needs a main() routine and a method for handling a connection. The main routine for a server is pretty standard, and we have looked at an example in class. You can use this code:

    public static void main(String[] args) {
        try {
            ServerSocket server = new ServerSocket(PORT);
            System.out.println("LISTENING ON PORT NUMBER " + PORT);
            while (true) {
                Socket socket = server.accept();
                handleConnection(socket);
            }
        }
        catch (IOException e) {
            System.out.println("Some Error Occurred.  Shutting down server.");
            System.out.println("Error: " + e);
        }
    }

You need to write the handleConnection method. For that method, you can follow the general outline that we went over in class. You need to get the input and output streams from the socket. You need to read the first three tokens from the input stream, using a Scanner. Once you know that you have a legal request, you can take the fileName from the request and use it to look for the file in the DIRECTORY. To get the full name of the file that you want, do

       fileName = DIRECTORY + fileName;

Check that (1) the file exists, (2) you can read the file, and (3) the file is not a directory. (Directories can't just be sent like normal files. You should consider a request for a directory to be an error of type 403.) Once you know that you have a good file you can send out the response on the output stream.

(You should not wrap the output stream in any stream wrapper class (except possibly a BufferedOutputStream for efficiency). Some of the data files that you have to send will be images, and for that you need a plain output stream; I have found that it won't work to, for example, write the headers with a PrintWriter and then to try to send binary data to the same stream.)

The format of the response must be as discussed above:

         HTTP/1.1 200 OK
         Connection: close
         Content-Type: (INSERT VALUE)
         Content-Length: (INSERT VALUE)

followed by a blank line and the content of the file. The Content-Length can be determined using the method file.length() from the File class. For the Content-Type, you can use the following method to determine the content type based on the extension in the file name:

    private static String getMimeTypeFromFileExtension(String fileName) {
         int pos = fileName.lastIndexOf('.');
         if (pos < 0)  // no file extension in name
             return "x-application/x-unknown";
         String ext = fileName.substring(pos+1).toLowerCase();
         if (ext.equals("txt")) return "text/plain";
         else if (ext.equals("html")) return "text/html";
         else if (ext.equals("htm")) return "text/html";
         else if (ext.equals("css")) return "text/css";
         else if (ext.equals("js")) return "text/javascript";
         else if (ext.equals("java")) return "text/x-java";
         else if (ext.equals("jpeg")) return "image/jpeg";
         else if (ext.equals("jpg")) return "image/jpeg";
         else if (ext.equals("png")) return "image/png";
         else if (ext.equals("gif")) return "image/gif"; 
         else if (ext.equals("ico")) return "image/x-icon";
         else if (ext.equals("class")) return "application/java-vm";
         else if (ext.equals("jar")) return "application/java-archive";
         else if (ext.equals("zip")) return "application/zip";
         else if (ext.equals("xml")) return "application/xml";
         else if (ext.equals("xhtml")) return "application/xhtml+xml";
         else return "x-application/x-unknown";
            // Note:  x-application/x-unknown  is something made up;
            // it will probably make the browser offer to save the file.
    }

Lines in the header should be sent in a very exact format, consisting of ASCII characters only, followed by a carriage return ('\r') and line feed ('\n'). To make things easier for you, here is a method that will send one line:

    /**
     * Sends one line of text to an OutputStream, in proper format for HTTP.
     * A carriage return and line feed are added to serve as end-of-line.
     * @param out  The stream where the text will be written.
     * @param text  The text that will be written, which should consist
     *    of ASCII characters only.  If text is null, no characters are
     *    transmitted, but the end-of-line is still sent.
     * @throws IOException
     */
    private static void sendLineOfAsciiText(OutputStream out, String text) 
                                                           throws IOException {
        if (text != null) {
            for (int i = 0; i < text.length(); i++)
                out.write(text.charAt(i));
        }
        out.write('\r');
        out.write('\n');
    }

Note that you can use this method to send a blank line by passing null as the text. For sending the contents of the file itself, you need to create a FileInputStream for reading from the file. You can then copy the data from that stream into the socket's output stream by calling the following copy method, similar to one that we looked at in class:

    /**
     * Copies bytes from an input stream to an output stream until end-of-stream is detected.
     * @throws IOException if an IOException occurs during copying
     */
    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        while (true) {
            int count = in.read(buffer);
            if (count < 0)
                break;
            out.write(buffer,0,count);
        }
        out.flush();
    }
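
To show how these pieces might fit together, here is a sketch of a method that sends a complete successful response. The names SendFileSketch, sendFile, and sendLine are made up; sendLine is a compact stand-in for sendLineOfAsciiText so that the example is self-contained, and the copying loop does the same job as copy. The file is opened before any header is written, so that a failure to open it can still be reported with an error response.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper class; the names are for illustration only.
class SendFileSketch {

    // Stand-in for sendLineOfAsciiText: writes the text followed by CRLF.
    static void sendLine(OutputStream out, String text) throws IOException {
        for (int i = 0; i < text.length(); i++)
            out.write(text.charAt(i));
        out.write('\r');
        out.write('\n');
    }

    // Sends the headers, the blank line, and then the bytes of the file.
    static void sendFile(File file, String mimeType, OutputStream out)
                                                        throws IOException {
        InputStream in = new FileInputStream(file);  // open first; may throw
        try {
            sendLine(out, "HTTP/1.1 200 OK");
            sendLine(out, "Connection: close");
            sendLine(out, "Content-Type: " + mimeType);
            sendLine(out, "Content-Length: " + file.length());
            sendLine(out, "");                       // blank line ends the header
            byte[] buffer = new byte[8192];
            int count;
            while ((count = in.read(buffer)) >= 0)   // copy the file's bytes
                out.write(buffer, 0, count);
            out.flush();
        }
        finally {
            in.close();
        }
    }
}
```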

Your server has to check for possible errors along the way. If it finds an error, it should send an error response back to the browser. The format of error responses is described above. It is probably a good idea to write a method for sending error responses, such as:

       static void sendError(String error, String message) ...

where error is the first line of the response and message is the message that will be displayed to the user.
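
A minimal sketch of such a method is shown below. It assumes that the output stream is passed in as an extra first parameter, which is one choice among several; the class name is made up. The response follows the "Not Found" example shown earlier: status line, two headers, a blank line, and then the message as plain text.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are for illustration only.
class SendErrorSketch {

    // Sends a complete error response.  The parameter error is the first
    // line of the response, such as "HTTP/1.1 404 Not Found", and message
    // is the text that the browser will display to the user.
    static void sendError(OutputStream out, String error, String message)
                                                        throws IOException {
        String response = error + "\r\n"
                        + "Connection: close\r\n"
                        + "Content-Type: text/plain\r\n"
                        + "\r\n"                  // blank line ends the header
                        + message + "\r\n";
        out.write(response.getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }
}
```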

You should make sure that your handleConnection doesn't crash. That is, it should catch any exceptions that occur. Furthermore, it should be sure to close the socket before it returns. A good way to make sure that happens is to do it in the finally clause of a try..catch statement. Since socket.close() can throw an IOException, you should call it in a try..catch:

           try {
              socket.close();
           }
           catch (IOException e) {
           }
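
Putting the try, catch, and finally parts together, the overall shape of handleConnection might look like the following sketch. The class name is made up, and the body of the try is left as a comment, since that is the part you have to write.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.util.Scanner;

// Hypothetical class; only the shape of the method matters here.
class HandleConnectionSketch {

    static void handleConnection(Socket socket) {
        try {
            Scanner in = new Scanner(socket.getInputStream());
            OutputStream out = new BufferedOutputStream(socket.getOutputStream());
            // ... read the request from in, then write either the file
            //     or an error response to out ...
            out.flush();
        }
        catch (Exception e) {
            // Report the problem, but don't let it escape and stop the server.
            System.out.println("Error while handling connection: " + e);
        }
        finally {
            // This runs whether or not an exception occurred.
            try {
                socket.close();
            }
            catch (IOException e) {
                // Nothing useful can be done if close() itself fails.
            }
        }
    }
}
```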

Adding Threads

The server that you have written is single-threaded. It can only handle one request at a time. If a second request comes in while you are working on another request, the second request will have to wait until you are finished with the first request, even if that takes a long time because you are sending a large file over a slow network. This is not acceptable for a real server. A real server should be multi-threaded, with several threads to handle connections.

One way to write a multi-threaded server is to start a new thread to handle each connection request. (Note that this solution is still not acceptable for real servers, because starting a new thread is a relatively time-consuming thing, and because you don't want to have the possibility of having too many threads running at the same time.)

To use this technique, you will need a subclass of Thread. The class needs a run() method to specify the task that the thread will perform. In this case, it should handle one connection request. We can pass the socket for that connection to the constructor of the class. Here's the class:

      private static class ConnectionThread extends Thread {
         Socket socket;
         ConnectionThread(Socket socket) {
            this.socket = socket;
         }
         public void run() {
            handleConnection(socket);
         }
      }

Now, in the main routine, instead of calling handleConnection directly, you will create and start a thread of type ConnectionThread. That's all there is to it! With this change, you should have a minimal but functional multi-threaded web server.
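
As a sketch, the accept loop with this change could look like the code below. The wrapper class, the serve method, and the PORT value are assumptions for illustration, and handleConnection here is a stand-in that only closes the socket; in your program it would be the real method.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical class showing the thread-per-connection accept loop.
class ThreadPerConnectionSketch {

    static final int PORT = 8080;   // assumed port number

    // Stand-in for the real handleConnection: it only closes the socket.
    static void handleConnection(Socket socket) {
        try {
            socket.close();
        }
        catch (IOException e) {
        }
    }

    static class ConnectionThread extends Thread {
        Socket socket;
        ConnectionThread(Socket socket) {
            this.socket = socket;
        }
        public void run() {
            handleConnection(socket);
        }
    }

    // The loop from main(), with the direct call replaced by a new thread.
    static void serve() throws IOException {
        ServerSocket server = new ServerSocket(PORT);
        while (true) {
            Socket socket = server.accept();
            new ConnectionThread(socket).start();  // returns immediately
        }
    }
}
```

Because start() returns right away, the loop goes straight back to accept() while the new thread deals with the connection.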

An alternative technique is to use a thread pool, like we did in the web collage project. Use threads that run in an infinite loop, removing sockets from a LinkedBlockingQueue<Socket>, and calling handleConnection for each socket.

In this case, the main program should create the queue and several threads before it starts listening. When the server accepts a connection, it should simply add the socket to the queue.
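
A sketch of the thread pool version follows. The class and method names and the number of threads are assumptions, and handleConnection is again a stand-in that only closes the socket. The worker threads are marked as daemon threads here so that they do not keep the program alive by themselves; in the server, the main thread is always waiting in accept(), so that choice makes little difference.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical class showing a thread pool fed by a blocking queue.
class ThreadPoolSketch {

    static final int THREAD_COUNT = 10;   // assumed pool size

    static final LinkedBlockingQueue<Socket> queue =
                                   new LinkedBlockingQueue<Socket>();

    // Stand-in for the real handleConnection: it only closes the socket.
    static void handleConnection(Socket socket) {
        try {
            socket.close();
        }
        catch (IOException e) {
        }
    }

    static class WorkerThread extends Thread {
        WorkerThread() {
            setDaemon(true);
        }
        public void run() {
            while (true) {
                try {
                    Socket socket = queue.take();  // blocks until a socket is available
                    handleConnection(socket);
                }
                catch (InterruptedException e) {
                    // Ignore the interruption and go back to waiting.
                }
            }
        }
    }

    static void startWorkers() {
        for (int i = 0; i < THREAD_COUNT; i++)
            new WorkerThread().start();
    }

    // Create the threads first, then listen; accepted sockets go into the queue.
    static void serve(int port) throws IOException {
        startWorkers();
        ServerSocket server = new ServerSocket(port);
        while (true)
            queue.add(server.accept());
    }
}
```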

Improvements

There are a lot of improvements that could be made to the server, ranging from fairly simple to very complex. These improvements are not a required part of the lab, but you might want to add some improvements for extra credit. Here are a few possibilities:

Logging -- Although I'm not requiring it, it's nice for your program to print out logging information about each connection, including where the connection is from, which file was requested, and what the response was.
Requests for Directories -- When a browser asks for a directory, your server treats that as an error. Most real servers will look for a file named index.html in the directory and will send that file as a response. For example, this allows you to use the URL http://math.hws.edu/eck/ instead of http://math.hws.edu/eck/index.html. You can consider doing the same thing.
Configuration file -- Instead of hard-coding the port number and directory into your program, you could read them from a configuration file. The file would contain the configuration data (port number, directory, and possibly other options, such as the number of threads in a thread pool or the name of a file where logging information can be output). The server would read the file when it starts up. This would allow the server program to be used in different environments without being recompiled.
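
For the configuration file, one possibility is the standard java.util.Properties class, which reads lines of the form "name = value". The sketch below is only a suggestion: the class name, the property names "port" and "directory", and the default values are all made up.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

// Hypothetical class holding the server's configuration.
class ServerConfigSketch {

    final int port;
    final String directory;

    ServerConfigSketch(int port, String directory) {
        this.port = port;
        this.directory = directory;
    }

    // Reads "port" and "directory" from properties-style text, using
    // 8080 and the current directory as (assumed) defaults.
    static ServerConfigSketch load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        int port = Integer.parseInt(props.getProperty("port", "8080").trim());
        String directory = props.getProperty("directory", ".").trim();
        return new ServerConfigSketch(port, directory);
    }
}
```

The server could call ServerConfigSketch.load(new FileReader("server.properties")) when it starts, where the file name is again just an example, and use the resulting values in place of the PORT and DIRECTORY constants.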

David Eck, for CPSC 225