CPSC 225, Spring 2016 -- Lab 11

Lab 11: Web Server

In this week's lab, you will write a simple web server program. This means working with the socket API for networking, as well as working with files and streams. A web server should be multi-threaded, but the one you write for this lab is not. It would not be difficult to add threads, once we learn about them.

You should create a new Lab11 project. You can start with a copy of the sample program NotAWebServer.java, which we looked at in class. You can find it in /classes/cs225/first-socket-examples. This web page contains some additional code that you can use.

This lab is due next week, as usual, by 9:00 AM on Saturday morning, November 19. This is an individual assignment.

About the HTTP Protocol

Web browsers and web servers communicate using the HTTP protocol. This is a "request/response" protocol: The browser sends a request to the server and the server sends back a response. Both the request and the response start with a "header," which consists of one or more lines of text followed by a blank line. The blank line is essential since it marks the end of the header. Lines should be terminated with a CRLF (carriage-return / line-feed, or "\r\n" in Java). The blank line consists of just a CRLF. (At least, that's the standard; most browsers will also accept a plain "\n" to mark an end-of-line. Still, it's better to follow the standard.)
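To make the role of the blank line concrete, here is a small sketch that reads header lines until it reaches the blank line and then reads the data that follows. The class name HeaderDemo and the method name readHeader are made up for this illustration, and the message is read from a string rather than from a socket.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class HeaderDemo {

    /** Reads header lines up to, but not including, the blank line. */
    static List<String> readHeader(BufferedReader in) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        // readLine() strips the end-of-line, so the blank line comes back as "".
        while ((line = in.readLine()) != null && !line.isEmpty())
            lines.add(line);
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String message = "HTTP/1.1 200 OK\r\n"
                + "Content-Type: text/plain\r\n"
                + "\r\n"
                + "Hello";
        BufferedReader in = new BufferedReader(new StringReader(message));
        List<String> header = readHeader(in);
        System.out.println(header.size() + " header lines");  // 2 header lines
        System.out.println("data: " + in.readLine());         // data: Hello
    }
}
```

BufferedReader.readLine() accepts "\r\n" as well as a plain "\n" as an end-of-line, which matches the lenient behavior described above.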

After the blank line that ends the headers, a response can contain data, which can be of any type (text, picture, music, etc.). The data that is sent in the response is what appears in the web browser's window; the headers are not displayed to the user of the browser.

Request and response headers can get complicated, but your server will only have to deal with a few possibilities. For the request, you only need to read the first line, which should contain three tokens and should be of the form:

              GET <path-to-file> HTTP/1.1

In fact, you really only need the first two tokens from this line. The first token is called the method. HTTP supports several different methods, but you will only implement GET, and you should send back an error message if the method is anything besides GET.

The second token in the request tells you which file you should send back to the browser. This is the essential piece of data that you need, in order to decide what to transmit to the client.

(The third token in the request must be "HTTP/1.1" or, in some very old browsers, "HTTP/1.0". A real web server should check that this token is correct, but for our purposes, it can be ignored. There will be more lines of data in the request, but again, you can ignore them.)
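One possible way to pull the method and the path out of the first line of the request is sketched here. The class name RequestLineDemo and the method name readMethodAndPath are made up for this illustration; returning null for a malformed line is just one choice, and your server would answer such a request with an error response.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class RequestLineDemo {

    /**
     * Reads the first line of a request and returns its first two tokens
     * (the method and the path), or null if the line is missing or does
     * not contain three tokens.
     */
    static String[] readMethodAndPath(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1));
        String line = reader.readLine();
        if (line == null)
            return null;   // the client closed the connection without sending
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 3)
            return null;   // not of the form: GET <path-to-file> HTTP/1.1
        return new String[] { tokens[0], tokens[1] };
    }

    public static void main(String[] args) throws IOException {
        byte[] request = "GET /index.html HTTP/1.1\r\nHost: localhost\r\n\r\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        String[] parts = readMethodAndPath(new ByteArrayInputStream(request));
        System.out.println("method = " + parts[0]);   // method = GET
        System.out.println("path   = " + parts[1]);   // path   = /index.html
    }
}
```

In a server, the InputStream would come from the socket, and a first token other than "GET" would lead to an error response instead of a file.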

A web server has a directory that contains the files that it can send to clients. The <path-to-file> tells how to find the requested file, starting in that directory. For example, if the directory is /classes/cs225/graphicsbook and if <path-to-file> is /index.html, then the actual file is /classes/cs225/graphicsbook/index.html. You obtain the actual file path by adding the <path-to-file> onto the server directory.

(A technicality that you can probably ignore: If a file name contains special characters such as spaces, they will be encoded in the HTTP request. To get the real file name from the <path-to-file>, you should really call URLDecoder.decode(<path-to-file>,"UTF-8") to convert the encoded path from the request into the path that you need to find the file. Another note, for Windows users: File paths on the web always use the forward slash "/" as a separator. Windows uses a backslash "\". If you try to run the web server on a Windows computer, you may need to replace every "/" with "\" in the <path-to-file> for the server to work. There is a replace method in the String class that you can use for this. The replaceAll method also works, but it treats a backslash in the replacement string as an escape character, so the backslash must itself be escaped there.)
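As a rough sketch of the path computation, the following decodes the <path-to-file> and adds it onto the server directory. The class name PathDemo and the method name toFilePath are made up for this illustration. One caveat: URLDecoder is designed for form data, so it also turns a "+" in the path into a space, which is probably acceptable for this lab.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class PathDemo {

    /**
     * Decodes the path from the request and appends it to the name of
     * the server directory, giving the path of the actual file.
     */
    static String toFilePath(String serverDirectory, String pathFromRequest)
            throws UnsupportedEncodingException {
        String decoded = URLDecoder.decode(pathFromRequest, "UTF-8");
        return serverDirectory + decoded;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String dir = "/classes/cs225/graphicsbook";
        System.out.println(toFilePath(dir, "/index.html"));
        // prints /classes/cs225/graphicsbook/index.html
        System.out.println(toFilePath(dir, "/my%20notes.txt"));
        // prints /classes/cs225/graphicsbook/my notes.txt
    }
}
```

The resulting string can be passed to the File constructor, as in new File(path).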

Once the web server knows what file is being requested, it has to send a response. Assuming that there has been no error and that you are in fact sending a file, the response should look like this, where <mime-type> is the mime type that describes the type of data in the file, and <file-size> is the number of bytes in the file:

                 HTTP/1.1 200 OK
                 Connection: close
                 Content-Type:  <mime-type>
                 Content-Length:  <file-size>

These lines must be followed by a blank line and then by the contents of the file.

If any error occurs, then instead of just failing or sending nonsense, the server should send a response that describes the error. For example, if the requested file does not exist, you might send:

                 HTTP/1.1 404 Not Found
                 Connection: close
                 Content-Type: text/plain
                 
                 Sorry, the file that you requested
                 could not be found.

In this case, the first three lines are the response headers, followed by a blank line as usual. The last two lines in the response are the content of the response, which will be displayed in the web browser window to the user. Remember that the browser only displays the part that comes after the blank line. (A real server would use Content-Type text/html and send HTML-formatted text as the response, but it will be easier for you to send plain text.)

You don't have to implement all possible error responses, but here are some possible errors that you will want to take into account:

HTTP/1.1 404 Not Found -- indicates that the requested file does not exist.
HTTP/1.1 403 Forbidden -- indicates that the file exists but the server can't send it. This might be because the server doesn't have permission to read the file, or it might be because the requested file is a directory.
HTTP/1.1 501 Not Implemented -- indicates that the server does not support the requested method; for us, this just means that the first word in the request was not "GET".
HTTP/1.1 500 Internal Server Error -- indicates some unknown error, such as a bug in the server.
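Since every error response has the same shape, it may be convenient to write one helper method and call it with different status codes. The following is only a sketch: the class name ErrorResponseDemo and the method name sendError are made up, and the response is captured in a ByteArrayOutputStream so that it can be printed instead of being sent over a socket.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class ErrorResponseDemo {

    /**
     * Writes a complete plain-text error response: the status line, the
     * headers, the blank line, and then the message as the content.
     * @param status  the code and reason, such as "404 Not Found"
     * @param message  ASCII text to be shown to the user of the browser
     */
    static void sendError(OutputStream out, String status, String message)
            throws IOException {
        String response = "HTTP/1.1 " + status + "\r\n"
                + "Connection: close\r\n"
                + "Content-Type: text/plain\r\n"
                + "\r\n"
                + message + "\r\n";
        out.write(response.getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        sendError(captured, "404 Not Found",
                "Sorry, the file that you requested could not be found.");
        System.out.print(captured.toString("US-ASCII"));
    }
}
```

With a helper like this, a request whose method is not GET could be answered with sendError(out, "501 Not Implemented", ...), and an unexpected exception with "500 Internal Server Error".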

Testing A Web Server

A web server program receives "requests" over the network for "URLs" that are available on the server computer. It sends back a "response." The response can be an error message or the file or other data that the requested URL refers to.

The web server does not care whether the request comes from a web browser program. Telnet is a simple program that allows you to communicate with any server that works with plain text data. You can use telnet to contact a web server and send it a request. For example, enter

telnet www.hws.edu 80

on the command line to connect to the web server running on host www.hws.edu, on port number 80. Once you are connected, enter:

GET /index.html HTTP/1.1
Host: www.hws.edu
Connection: close

followed by an extra blank line. The server should respond by sending you the main page of the web site.

When you have written your own web server, you can use telnet to test it. This is a good way to see exactly what response your server will send. Let's say that your server listens on port 8080. Run the server and use a command such as

telnet localhost 8080

on the command line. Then type a request such as

GET /index.html HTTP/1.1

The server should send back a correct header followed by a blank line and an error message or the content of the requested file. (Your server won't need the extra lines that are needed when you are sending a request to www.hws.edu.)

I will certainly use telnet to test your program, especially for error handling, and you will likely find it useful to contact your server with telnet during debugging.

Your server should also work with a web browser such as Firefox, Chrome, Safari, Edge, or Internet Explorer. You just need to enter an appropriate URL in the location box of the browser. For the URL, you need to know the host where the server is running and the port number where it is listening. When you are running the browser on the same computer as the server, you can use localhost as the host name, and you would enter a URL something like this:

                http://localhost:8080/index.html

This is a request for the file named index.html on the top level of the server's directory. It assumes that the port number is 8080.

The server can also be contacted from a browser running on another computer. In that case, the URL can use the IP address of the computer where the server is running. For example:

                http://172.21.7.101:8080/index.html

You should make sure to test that both files and error messages sent by your server appear correctly in a web browser window.

Writing your Server

Start with a copy of NotAWebServer.java. The main() routine in that program won't have to be changed (except maybe for the port number). You will have to remove some code from handleConnection(), and add a lot more, including a lot of error detection and handling.

The handleConnection method must read the request from the socket's input stream, then write a response to the socket's output stream. As mentioned above, you really only need to read the first two tokens from the input stream. The output stream is a more difficult matter because some of the data that you have to send can be binary data (for example, if the response is an image file). This means that you shouldn't use a PrintWriter. Instead, you should use the output stream directly. To do that, you should copy the following method into your code, and use it for each line of text that you want to send:

/**
 * Sends one line of text to an OutputStream, in proper format for HTTP.
 * A carriage return and line feed are added to serve as end-of-line.
 * @param out  The stream where the text will be written.
 * @param text  The text that will be written, which should consist
 *    of ASCII characters only.  If text is null, no characters are
 *    transmitted, but the end-of-line is still sent.
 * @throws IOException if an error occurs while transmitting the data
 */
private static void sendAscii(OutputStream out, String text) 
                                                       throws IOException {
    if (text != null) {
        for (int i = 0; i < text.length(); i++)
            out.write(text.charAt(i));
    }
    out.write('\r');
    out.write('\n');
}

For copying the contents of a file to the output stream, you can use the following method, which is similar to the binary copying program that we covered in class. It simply copies all the bytes from an input stream to an output stream. In this case, the input stream should be one that reads from the file, and the output stream is the one from the socket.

/**
 * Copies bytes from an input stream to an output stream until 
 * end-of-stream is detected.
 * @throws IOException if an IOException occurs during copying
 */
private static void copy(InputStream in, OutputStream out) throws IOException {
    byte[] buffer = new byte[8192];
    int count = in.read(buffer);
    while (count != -1) {
        out.write(buffer,0,count);
        count = in.read(buffer);
    }
    out.flush();
}

Your interaction with the client must follow the HTTP protocol. Read the first two tokens from the request. If the first token is not GET, send an error. If the first token is GET, you have to decide which file to send. You do this by adding the second token onto the name of the server directory. You can use "/classes/cs225/graphicsbook" as the name of the server directory, if you want; that directory holds a copy of my online computer graphics textbook.

Once you have the full file path, you can create a File object from it. You have to check: first, that the file exists; second, that the file is readable; and third, that the file is not a directory. If any of these fail, you should send an error response.
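The three checks can be written with methods from the File class. The sketch below maps each outcome to a status; the class name FileCheckDemo and the method name statusFor are made up for this illustration, and it tries the checks on a temporary directory instead of the real server directory.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

class FileCheckDemo {

    /** Returns the status that the response for this file should carry. */
    static String statusFor(File file) {
        if (!file.exists())
            return "404 Not Found";
        if (file.isDirectory() || !file.canRead())
            return "403 Forbidden";
        return "200 OK";
    }

    public static void main(String[] args) throws IOException {
        // Set up a temporary directory that contains one small file.
        File dir = Files.createTempDirectory("lab11").toFile();
        File page = new File(dir, "index.html");
        Files.write(page.toPath(), "<p>hello</p>".getBytes());

        System.out.println(statusFor(page));                           // 200 OK
        System.out.println(statusFor(dir));                            // 403 Forbidden
        System.out.println(statusFor(new File(dir, "missing.html")));  // 404 Not Found

        page.delete();
        dir.delete();
    }
}
```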

After you've gotten past the tests that the file exists, is readable, and is not a directory, you can send a response that includes the contents of the file, with the response code "200 OK". You will need to know the length of the file for the response header, which you can also get from the File object. And you will need a FileInputStream to get the contents of the file.
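Putting these pieces together, sending a file might look roughly like the following sketch. The class name SendFileDemo and the method name sendFile are made up, the copying loop is written inline so that the example is self-contained, and the mime type is passed in as a parameter rather than computed.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class SendFileDemo {

    /**
     * Writes the "200 OK" response headers, the blank line, and then
     * the bytes of the file. Assumes that the file has already passed
     * the checks for existing, being readable, and not being a directory.
     */
    static void sendFile(OutputStream out, File file, String mimeType)
            throws IOException {
        String header = "HTTP/1.1 200 OK\r\n"
                + "Connection: close\r\n"
                + "Content-Type: " + mimeType + "\r\n"
                + "Content-Length: " + file.length() + "\r\n"
                + "\r\n";
        out.write(header.getBytes(StandardCharsets.US_ASCII));
        // try-with-resources closes the file even if copying fails.
        try (InputStream in = new FileInputStream(file)) {
            byte[] buffer = new byte[8192];
            int count;
            while ((count = in.read(buffer)) != -1)
                out.write(buffer, 0, count);
        }
        out.flush();
    }
}
```

If an IOException is thrown after the headers have been written, it is too late to send an error response, so a server would typically just log the problem and close the connection.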

You will also need to know the "mime type" of the file for the response header. The mime type can be determined from the file extension, the last part of the file name. You can use the following method, which will cover almost all of the files that a server is likely to encounter:

private static String getMimeTypeFromFileName(String fileName) {
     int pos = fileName.lastIndexOf('.');
     if (pos < 0)  // no file extension in name
         return "x-application/x-unknown";
     String ext = fileName.substring(pos+1).toLowerCase();
     if (ext.equals("txt")) return "text/plain";
     else if (ext.equals("html")) return "text/html";
     else if (ext.equals("htm")) return "text/html";
     else if (ext.equals("css")) return "text/css";
     else if (ext.equals("js")) return "text/javascript";
     else if (ext.equals("java")) return "text/x-java";
     else if (ext.equals("jpeg")) return "image/jpeg";
     else if (ext.equals("jpg")) return "image/jpeg";
     else if (ext.equals("png")) return "image/png";
     else if (ext.equals("gif")) return "image/gif"; 
     else if (ext.equals("ico")) return "image/x-icon";
     else if (ext.equals("class")) return "application/java-vm";
     else if (ext.equals("jar")) return "application/java-archive";
     else if (ext.equals("zip")) return "application/zip";
     else if (ext.equals("xml")) return "application/xml";
     else if (ext.equals("xhtml")) return "application/xhtml+xml";
     else if (ext.equals("pdf")) return "application/pdf";
     else if (ext.equals("c")) return "text/x-csrc";
     else return "x-application/x-unknown";
        // Note:  x-application/x-unknown  is something made up;
        // it will probably make the browser offer to save the file.
}

Possible Enhancements

There are many possible enhancements. One of them is to make the server multi-threaded. We might do that in a future lab as an easy exercise. Here are some more ideas that would make the web server a more realistic server. Someone who is considering an enhanced web server as their final project would want to do some or all of these. A lot of them will require more information and help to implement.

Thread pool -- instead of making a new thread for each connection, use a set of threads where each thread handles multiple connections.
Log file -- instead of writing messages from the server to System.out, write them to a file so that a permanent record of connections will be kept.
Configuration file -- make the server read a file at startup that contains things like the server directory, the port number, the size of the thread pool, and the location of the log file. This lets the server be customized without changing the source code or re-compiling.
More mime types -- there is a file on Linux named /etc/mime.types. Instead of using a fixed list of mime types in the function getMimeTypeFromFileName, read that file and build a hash map from file extensions to mime types.
Directory listing -- instead of sending an error when the client requests a directory, send a list of files in the directory. The file names in the list can be links to the files.
More request headers -- learn more about the HTTP protocol and implement some additional request headers; for example, the If-Modified-Since header requests that a file be sent only if it has been modified since a specified time.
HEAD method -- implement the HEAD method, which requests that the server send just the response headers for the request, leaving off the file content. (This is too easy.)
POST method -- implement the POST method for some resources. With POST, the client sends some data along with the request and that data is used as input for a program. The response contains the output of the program. (This is the hardest enhancement.)
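As a starting point for the mime types enhancement, here is a sketch of reading a table in the style of /etc/mime.types. It assumes that each line that is not a comment has the form of a mime type followed by zero or more file extensions, separated by white space. The class name MimeTableDemo and the method name readMimeTypes are made up, and the example reads from a string so that it does not depend on the file being present.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

class MimeTableDemo {

    /**
     * Builds a map from lower-case file extension to mime type.
     * Blank lines and lines that start with '#' are skipped.
     */
    static Map<String, String> readMimeTypes(Reader source) throws IOException {
        Map<String, String> table = new HashMap<>();
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#"))
                continue;
            String[] tokens = line.split("\\s+");
            // tokens[0] is the mime type; any remaining tokens are extensions.
            for (int i = 1; i < tokens.length; i++)
                table.put(tokens[i].toLowerCase(), tokens[0]);
        }
        return table;
    }

    public static void main(String[] args) throws IOException {
        String sample = "# sample in the style of /etc/mime.types\n"
                + "text/html\t\thtml htm\n"
                + "image/jpeg\t\tjpeg jpg jpe\n"
                + "application/andrew-inset\n";   // a type with no extensions
        Map<String, String> table = readMimeTypes(new StringReader(sample));
        System.out.println(table.get("htm"));   // text/html
        System.out.println(table.get("jpg"));   // image/jpeg
    }
}
```

To read the real file, the server could pass new FileReader("/etc/mime.types") to the method once at startup and look up each extension in the resulting map, falling back to a default type when the extension is not found.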