CPSC 225, Fall 2007 -- Lab 10

CPSC 225, Fall 2007
Lab 10: Networking II

In this lab, you will write a small web server program. It will be able to respond to requests for web pages, just like a real server, but it will lack most of the features that make real web servers complicated.

You will need the files from the directory /classes/f07/cs225/files_for_lab_10. You can begin by creating an Eclipse project and adding the files from that directory to your project.

This lab will be due next Wednesday, November 14. And don't forget that last week's lab is due on November 7; be sure that it is available in your regular CVS repository.

Serving Up Files on the Web

The basic idea of a Web server program is not all that complicated: A web server sets up a listening socket to listen for connections on a specified port number (which is port 80, by default, in real Web servers). It has a directory full of files that it makes available to clients. When a client (that is, a web browser) makes a connection to the server, the client sends a request for one of the server's files, and the server responds by sending the contents of the file back to the client. In a real web server, lots of other things can happen. There can be other types of requests that require different types of responses. And even simple requests and responses can have lots of parameters (such as "cookies") that are exchanged between the client and the server. In your basic web server, however, you will stick to the basic case.

A file on the server is specified by a URL such as

            http://foo.bar.com:12345/path/to/file.html

where foo.bar.com is the name of the computer where the server is running; 12345 is the port number on which the server is listening, which can be omitted if the port number is 80; and /path/to/file.html is the path to the file. Here, "path" and "to" would be directories and "file.html" would be the actual name of the file. Note that the path always starts with "/". The path starts in the directory that contains the server's files. That is, if the server's files are in the directory /www/root, then the complete absolute path to the file would be /www/root/path/to/file.html. This is the file name that the server would use to look up the file. Note that it is formed simply by adding the path from the URL onto the server's root directory.
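
The mapping from URL path to file name can be sketched in Java as follows. This is only an illustration: the class name PathMappingSketch and the method resolve() are made up for this example, and /www/root is the example root directory used above.

```java
import java.io.File;

class PathMappingSketch {

    /**
     * Forms the server-side file name by adding the path from the URL
     * onto the server's root directory. (Hypothetical helper, for illustration.)
     */
    static File resolve(String serverRoot, String urlPath) {
        // The URL path always starts with "/", so plain concatenation is enough.
        return new File(serverRoot + urlPath);
    }

    public static void main(String[] args) {
        File file = resolve("/www/root", "/path/to/file.html");
        System.out.println(file.getPath());
    }
}
```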

Communication between a Web server and a Web browser is based on a protocol known as HTTP (HyperText Transfer Protocol). The current and most commonly used version of this protocol is known as HTTP/1.1. According to this protocol, a browser that wants to retrieve a file from the server sends a request to the server that starts with a line of text of the following form:

            GET  /path/to/file.html  HTTP/1.1

The tokens "GET" and "HTTP/1.1" have to be there literally. The second token, /path/to/file.html, would be replaced by the path to the file that the client actually wants to retrieve. As noted above, the path is the part of the URL that comes after the host name and optional port number. There can be other stuff after this line, but your server will ignore any extra parameters that come after the first line of the request.
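
As a rough sketch of how the first line of a request could be picked apart, the following Java fragment reads the first two tokens with a Scanner. The class and method names are invented for this example, and the request text in main() (including the Host line) is just sample input, not something captured from a real browser.

```java
import java.util.Scanner;

class RequestLineSketch {

    /** Returns the first two whitespace-separated tokens: the method and the path. */
    static String[] readMethodAndPath(Scanner in) {
        String method = in.next();  // e.g. "GET"
        String path = in.next();    // e.g. "/path/to/file.html"
        return new String[] { method, path };
    }

    public static void main(String[] args) {
        // Sample request text; everything after the first line is ignored here.
        Scanner in = new Scanner(
                "GET /path/to/file.html HTTP/1.1\r\nHost: foo.bar.com:12345\r\n\r\n");
        String[] tokens = readMethodAndPath(in);
        System.out.println(tokens[0] + " " + tokens[1]);  // prints "GET /path/to/file.html"
    }
}
```

Note that next() throws a NoSuchElementException if the client closes the connection before sending two tokens, so a real handler would need to catch that case.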

If you want to see a complete request from a web browser, run the sample program ReadRequest.java. This program listens for connection requests on port 50505. You can get a web browser to contact the program by entering a URL that begins with "http://localhost:50505/". ("localhost" is a name for the computer that you are working on.) The ReadRequest program will read the request sent by the browser and will output it to standard output. It will not send anything back to the browser; it will simply close the connection. The program will run until you terminate it (or an error occurs). (Note: If another program, or another copy of the same program, is already using port 50505, then the attempt to listen on port 50505 will fail. Make sure that you terminate your programs in this lab before you try to run them again.)

When the server sends the response to the client, the first thing that it sends is a status line, which tells the client whether the requested file will be sent or whether some problem has occurred. Your server will use one of the following three status lines:

            HTTP/1.1 200 OK

            HTTP/1.1 404 Not Found

            HTTP/1.1 501 Not Implemented

The first status line means that the requested file was found and is going to be sent to the client. The second line means that the requested file was not found. The third line indicates that the server did not understand the request; you should send this error message if the first token received from the client is not "GET".

Immediately after the status line, the server sends several other lines of information related to its response. These lines are called "headers". There is one header per line, and each header has the syntax: <header-name>: <header-value>. The headers that you will need are named Content-Type, which specifies the MIME type of the file that will be sent; Content-Length, which specifies how many bytes there are in the file; and Connection, which you will use to tell the client that you are going to close the connection after sending the file rather than wait for additional requests. The headers are terminated by an empty line. The file itself is sent immediately after the empty line. A typical response from your server, up to the first empty line, might look like:

            HTTP/1.1 200 OK
            Content-Type: text/html
            Content-Length: 12376
            Connection: close

After the empty line, you would send the contents of the file itself. The content length is not strictly required in practice. For an error reply, you can omit it. In an error reply, instead of sending a file after the headers, you can send some plain text or HTML text. The browser will display this text in the browser window as an error message to the user. For example, you could send the following complete response to indicate that a file can't be found:

            HTTP/1.1 404 Not Found
            Content-Type: text/plain
            Connection: close

            Sorry, the file that you requested
            could not be found on this server.
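
One possible way to write these responses in Java is sketched below. The class and method names are made up for this example. The sketch ends each line with an explicit "\r\n" instead of calling println(), because HTTP specifies a carriage return plus line feed as the line terminator, while println() uses whatever line separator the local platform prefers.

```java
import java.io.PrintWriter;

class ResponseHeadersSketch {

    static final String CRLF = "\r\n";  // line terminator used by HTTP

    /** Sends the status line and headers for a successful response. */
    static void sendOkHeaders(PrintWriter out, String mimeType, long length) {
        out.print("HTTP/1.1 200 OK" + CRLF);
        out.print("Content-Type: " + mimeType + CRLF);
        out.print("Content-Length: " + length + CRLF);
        out.print("Connection: close" + CRLF);
        out.print(CRLF);  // the empty line that ends the headers
        out.flush();      // make sure the headers go out before the file data
    }

    /** Sends a complete plain-text error response, such as a 404. */
    static void sendError(PrintWriter out, String statusLine, String message) {
        out.print(statusLine + CRLF);
        out.print("Content-Type: text/plain" + CRLF);
        out.print("Connection: close" + CRLF);
        out.print(CRLF);
        out.print(message + CRLF);
        out.flush();
    }

    public static void main(String[] args) {
        // Writes a sample error response to standard output instead of a socket.
        PrintWriter out = new PrintWriter(System.out);
        sendError(out, "HTTP/1.1 404 Not Found",
                "Sorry, the file that you requested could not be found on this server.");
    }
}
```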

To begin the lab, you can start a new program named, for example, MyWebServer. The main() method for your program can be copied, with one small change, from ReadRequest.java. You can use the same LISTENING_PORT as that program, or use a different one. (Keep in mind that the port must be between 1024 and 65535.) Also, you can add a constant representing a path to the root directory of your server. You could use your own www directory, if you have one, but you might want to use one of the faculty web sites that are publicly available on our server. For example, if you want to serve up Professor Scotty Orr's web pages, use:

            public final static String SERVER_ROOT = "/home/scottyorr/www";

For your web server program, instead of calling a method to handle a connection, you should start a thread to handle the connection. By using threads, it will be able to handle several connections simultaneously. Write a subclass of Thread to handle connection requests. It can either be in a separate class or can be a static nested class inside your main class. In the main program, instead of calling the handleConnection() method, create an object belonging to your thread class and start it running. The run() method of the thread should handle all communication with the client, and it should close down the connection after it has sent its response to the client. After closing the connection, the thread can terminate. You can use some of the code from the handleConnection() method in ReadRequest.java, but the task that your thread has to perform is more complicated.
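
The overall shape of a thread-per-connection server might look something like the following sketch. It is not taken from ReadRequest.java: the class names are invented, and the run() method sends only a placeholder reply where your real request handling would go.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

class ThreadPerConnectionSketch {

    static final int LISTENING_PORT = 50505;

    /** Handles one client connection, then terminates. */
    static class ConnectionThread extends Thread {
        private final Socket connection;

        ConnectionThread(Socket connection) {
            this.connection = connection;
        }

        @Override
        public void run() {
            try {
                PrintWriter out = new PrintWriter(connection.getOutputStream());
                // Placeholder reply; the real server would read the request here.
                out.print("HTTP/1.1 501 Not Implemented\r\n");
                out.print("Connection: close\r\n");
                out.print("\r\n");
                out.flush();
            } catch (IOException e) {
                System.out.println("Error while communicating with client: " + e);
            } finally {
                // Close the connection whether or not an error occurred.
                try {
                    connection.close();
                } catch (IOException e) {
                    // Nothing useful can be done if close() itself fails.
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ServerSocket listener = new ServerSocket(LISTENING_PORT);
        System.out.println("Listening on port " + LISTENING_PORT);
        while (true) {
            Socket connection = listener.accept();     // wait for a client
            new ConnectionThread(connection).start();  // handle it in its own thread
        }
    }
}
```

Because each connection gets its own thread, the main loop can go straight back to accept() without waiting for the previous client to be served.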

Your run() method should read the first two tokens from the client's request. (It can use a Scanner.) If the first token is not GET, you should send a "501 Not Implemented" error to the client, as described above, and terminate. Otherwise, the second token is the path to the file that the browser wants. To get the full path name for the file, add the file name from the client's request to the SERVER_ROOT, as noted above. Check whether the file exists and is readable and is not a directory. (Use an object of type File to represent the file and perform these tests.) Also, determine the MIME type of the file, if possible. (There is some code in code_snippets.txt for finding the MIME type of a file.) If any of the tests fail, or if you cannot determine a MIME type, send a "404 Not Found" error to the client, and terminate. Finally, if everything works out, send a "200 OK" response, as described above, including the Content-Type, Content-Length, and Connection headers. (The length of the file can be determined by calling the length() method in the File object.)
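
The sequence of checks described above can be summarized in a small decision method. This is a sketch under assumptions: the class and method names are hypothetical, and the MIME type is passed in as a parameter (null meaning "could not be determined") because the code for finding it is in the provided snippets file, not reproduced here.

```java
import java.io.File;

class RequestCheckSketch {

    /**
     * Decides which status code to send: 501 if the method is not GET,
     * 404 if the file is unusable or has no known MIME type, otherwise 200.
     */
    static int statusFor(String method, File file, String mimeType) {
        if (!method.equals("GET")) {
            return 501;  // the server only understands GET
        }
        if (!file.exists() || file.isDirectory() || !file.canRead()) {
            return 404;  // missing, a directory, or not readable
        }
        if (mimeType == null) {
            return 404;  // MIME type could not be determined
        }
        return 200;
    }

    public static void main(String[] args) {
        File file = new File(args.length > 0 ? args[0] : "index.html");
        // "text/html" is just a stand-in for whatever MIME type lookup you use.
        System.out.println(statusFor("GET", file, "text/html"));
    }
}
```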

After the headers, send an empty line. (Presumably, you are using a PrintWriter to do this. Don't forget to flush it.) After the headers and empty line, send the contents of the file. Since the file is not necessarily a text file, you should send it byte-by-byte, in binary form, using an InputStream to read from the file and the socket's OutputStream to write to the socket. Some code for doing this can be found in the file code_snippets.java. Don't forget to close the connection in the end; to make sure it is closed, you can use a finally clause, as is done in the handleConnection() method in ReadRequest.java.
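
For reference, copying a file to an output stream in binary form might look like the sketch below. This is not the code from the provided snippets file; the names are invented, and it copies through a small byte array rather than one byte at a time, which produces the same output with fewer read and write calls.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class FileSendSketch {

    /** Copies the raw bytes of the file to the given stream, then flushes it. */
    static void sendFile(File file, OutputStream socketOut) throws IOException {
        try (InputStream in = new FileInputStream(file)) {
            byte[] buffer = new byte[4096];
            int count;
            // read() returns -1 once the end of the file has been reached.
            while ((count = in.read(buffer)) != -1) {
                socketOut.write(buffer, 0, count);
            }
        }
        socketOut.flush();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("Usage: java FileSendSketch <file>");
            return;
        }
        // Standard output stands in for the socket's output stream here.
        sendFile(new File(args[0]), System.out);
    }
}
```

The try-with-resources statement closes the file even if an error occurs partway through. The socket's stream is left open so that the caller can decide when to close the connection.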

A few comments: (1) If the client requests a directory rather than a file, it is customary to return a file named index.html in that directory, if such a file exists. So, if the request is for a directory, you could try adding /index.html onto the file name, and try to return that file instead of sending an error message. (2) You might want to read the third token in the client's request and make sure that it is, in fact, HTTP/1.1, as it is supposed to be, and you could return an error message if not. However, this error is not very likely to occur. (You could also accept HTTP/1.0.) (3) A real web server would not create a new thread for each connection. That would be too inefficient. Instead, it might use a thread pool, like the one that was used in our web-crawler program. The main method of the server would drop connections into an ArrayBlockingQueue, and each thread would run in an infinite loop in which it repeatedly retrieves a connection from the queue and services it.
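
The thread pool idea in comment (3) could be sketched roughly as follows. This is an illustration only, not the pool from the web-crawler program: the class name, the submit() method, and the handler parameter are all invented for the example, the pool and queue sizes are arbitrary, and the code relies on lambdas from Java 8 or later.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.function.Consumer;

class ConnectionPoolSketch {

    private final ArrayBlockingQueue<Socket> queue;

    /** Starts threadCount worker threads that service queued connections. */
    ConnectionPoolSketch(int threadCount, int queueCapacity, Consumer<Socket> handler) {
        queue = new ArrayBlockingQueue<>(queueCapacity);
        for (int i = 0; i < threadCount; i++) {
            Thread worker = new Thread(() -> {
                while (true) {
                    try {
                        Socket connection = queue.take();  // blocks until one is available
                        handler.accept(connection);
                    } catch (InterruptedException e) {
                        return;  // treat interruption as a request to stop
                    } catch (RuntimeException e) {
                        System.out.println("Error while handling connection: " + e);
                    }
                }
            });
            worker.setDaemon(true);  // workers should not keep the program alive
            worker.start();
        }
    }

    /** Called by the main thread; blocks while the queue is full. */
    void submit(Socket connection) throws InterruptedException {
        queue.put(connection);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Placeholder handler that just closes the connection.
        ConnectionPoolSketch pool = new ConnectionPoolSketch(10, 50, connection -> {
            try {
                connection.close();
            } catch (IOException e) {
                // Ignore errors on close.
            }
        });
        ServerSocket listener = new ServerSocket(50505);
        while (true) {
            pool.submit(listener.accept());
        }
    }
}
```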

David Eck, for CPSC 225
