CPSC 441, Fall 2014
Lab 3: Writing a Web Server
The assignment for this lab is to write a basic web server. Although you will implement only a part of the HTTP protocol, your server should be capable of serving pages correctly to a web browser such as Firefox or Chrome, and those pages should be displayed correctly in the browser. For now, the server will be single-threaded. You will add threads later, which is pretty easy to do.
We will discuss the assignment due date.
Telnet
Before you start, you should experiment with using telnet to communicate with a server. Telnet can be useful for debugging your server, since it can show you exactly how the server is responding to requests. Try it with my FileSrv example. Start the server running in one Terminal window with the command
java -jar /classes/cs441/FileSrv.jar
Then, in another window, use the following command to connect to the server:
telnet localhost 33987
You won't see any response because the server waits for a command before doing anything. Type the command LIST and press return. You will see a list of available files. The server closes the connection after sending the list. Try again with a command of the form GET <filename> to retrieve a file from the server.
You might also want to try connecting to a web server. For example, to request the file /eck/cs441/index.html from math.hws.edu, use
telnet math.hws.edu 80
then type the following two lines, pressing return twice after the second line:
GET /eck/cs441/index.html HTTP/1.1
Host: math.hws.edu
(Or maybe add a third line, Connection: close, to the input.)
Getting Started
Start your program with the basic outline for a network server, similar to the structure of DateServer.java or FileSrv.java. You can find copies of both files in /classes/cs441; FileSrv.java is in /classes/cs441/netsrv since it is in package netsrv.
The program needs a main routine to create the server socket and listen for connections. Write a separate "handleConnection" method to do the communication with a connected client. Make sure that any exceptions that occur during the communication are caught so that they do not crash the server. Log some information about each connection. (My programs just print logging information to System.out, but it might be better to have a log method or even a Logger object to add more flexibility later, such as being able to configure where the output messages should be recorded.)
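The outline described above might look something like the following minimal sketch. The class name WebServerSketch and the port number 8080 are placeholders chosen for illustration, not part of the assignment, and the sketch uses only standard library classes rather than anything from DateServer.java or FileSrv.java.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Date;

class WebServerSketch {

    // Hypothetical port number; use whatever port your server is configured for.
    static final int PORT = 8080;

    public static void main(String[] args) throws IOException {
        try (ServerSocket listener = new ServerSocket(PORT)) {
            log("Listening on port " + PORT);
            while (true) {
                // Single-threaded: one client is handled completely
                // before the next connection is accepted.
                Socket connection = listener.accept();
                handleConnection(connection);
            }
        }
    }

    // Talks to one client. Every exception is caught here so that a
    // misbehaving client cannot crash the server.
    static void handleConnection(Socket connection) {
        try (Socket client = connection) {
            log("Connection from " + client.getInetAddress());
            // Reading the request and sending the response goes here.
        } catch (Exception e) {
            log("Error while communicating with client: " + e);
        }
    }

    // All logging goes through one method, so the destination of the
    // messages can be changed later in a single place.
    static void log(String message) {
        System.out.println(new Date() + ": " + message);
    }
}
```

Because the socket is opened in a try-with-resources statement, it is closed whether or not an exception occurs, so a telnet test against this outline should show the connection closing immediately.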
As soon as you get the basic outline done, you might want to test it by telnetting to the server, even if all it does is immediately close the connection. You can try again at various points in the development to test features as you implement them. Don't forget to test how your server deals with errors in the request, since I will certainly do that!
Once the basic outline is working, you are ready to start programming HTTP. You should use some of the methods from the NetUtil class to help with the communication. You can find a copy in /classes/cs441/netsrv.
Don't just write everything in one long handleConnection method! You should try to design a well-written program in which responsibilities are divided up appropriately among several methods and possibly even additional classes.
Reading the Request
For this lab, you are only required to read the first line of a request from a client, and you are only required to implement GET commands. You can ignore the rest of the headers and any data in the request. (But you are welcome to read all the request headers and store them in a hash table for later use, if you would like to do that. You are even welcome to implement some of them, if you are very ambitious.) Nevertheless, you should send a legal response in all cases, even when the command is not GET. Recall that the GET command has the form
GET <target> HTTP/1.1
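If you do decide to read the request headers into a hash table, a sketch along the following lines is one way to go about it. It assumes that the first line of the request has already been read from the same BufferedReader; the class name RequestHeaders and the choice to store header names in lower case are illustrative assumptions, and the code uses only standard library classes, not NetUtil.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class RequestHeaders {

    // Reads "Name: value" lines up to the empty line that ends the
    // header section. Header names are not case-sensitive, so they
    // are stored in lower case.
    static Map<String, String> read(BufferedReader in) throws IOException {
        Map<String, String> headers = new HashMap<>();
        String line;
        while ((line = in.readLine()) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                String value = line.substring(colon + 1).trim();
                headers.put(name, value);
            }
            // Lines without a colon are silently skipped in this sketch.
        }
        return headers;
    }
}
```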
Sometimes, your server will have to send an error response back to the browser. For example, if the line has the correct form but the first word is not GET, then you can send a 501 Not Implemented response. If the line is not of the correct form, you can send 400 Bad Request.
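The checks just described can be sketched as a small parsing method. The class name RequestLine and its fields are hypothetical, and the sketch assumes that a well-formed line has exactly three parts, with a target that starts with "/" and a version that starts with "HTTP/".

```java
class RequestLine {

    final int status;     // 200 if the line is a usable GET, otherwise 400 or 501
    final String target;  // the raw, still URL-encoded target, or null on error

    private RequestLine(int status, String target) {
        this.status = status;
        this.target = target;
    }

    static RequestLine parse(String line) {
        if (line == null) {
            return new RequestLine(400, null);  // client sent nothing at all
        }
        String[] parts = line.trim().split("\\s+");
        boolean wellFormed = parts.length == 3
                && parts[1].startsWith("/")
                && parts[2].startsWith("HTTP/");
        if (!wellFormed) {
            return new RequestLine(400, null);  // Bad Request
        }
        if (!parts[0].equals("GET")) {
            return new RequestLine(501, null);  // Not Implemented
        }
        return new RequestLine(200, parts[1]);
    }
}
```

For example, RequestLine.parse("POST /form HTTP/1.1") gives a status of 501, while RequestLine.parse("hello") gives 400.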
The <target> will be a path to a file, possibly a directory. The protocol allows the path to be followed by a "?" character and some additional characters. For this lab, you can strip them off and discard them (or just ignore the possibility). Remember that the target will be URL encoded, so you should decode it before using it.
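Stripping the query part and decoding the target might be done as in the following sketch. The class and method names are hypothetical. It uses java.net.URLDecoder, which was designed for form data and would turn "+" into a space, so the sketch protects "+" characters before decoding. Returning null for a badly encoded target is an assumption of this sketch; the caller could answer that case with 400 Bad Request.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class TargetDecoder {

    // Returns the decoded path part of a request target, or null if
    // the target contains a malformed %-escape.
    static String decodeTarget(String target) {
        // Discard "?" and everything after it.
        int question = target.indexOf('?');
        String path = (question >= 0) ? target.substring(0, question) : target;
        // URLDecoder treats "+" as an encoded space, which is wrong for
        // the path part of a URL, so encode it first to keep it literal.
        path = path.replace("+", "%2B");
        try {
            return URLDecoder.decode(path, "UTF-8");
        } catch (IllegalArgumentException e) {
            return null;  // for example, "%zz" or a trailing "%"
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```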
Finding and Sending the File
A web server is designed to serve files from a certain directory. The directory that is used is part of the web server configuration. You can use your own www directory or some other public directory of your choice. (It should be public so the server can use it when I run the server.) I have put a copy of my JavaNotes web site in /classes/cs441/javanotes7 and you are welcome to use that directory if you want.
The <target> path starts in the server directory. That is, the target has to be appended to the directory to get the path to the actual file. You might check out NetSrv.java to see how I handle it there.
Once you have the actual File, you need to check whether the file exists and whether it can be read, and send an appropriate error response (404 Not Found or 403 Forbidden). If the requested file is a directory, you have a decision to make. Sometimes a web server considers a request for a directory to be Forbidden, but sometimes it responds to such a request by sending a list of files in the directory. If you want to do the latter, you need to create an HTML document on the fly, with the file names formatted as hyperlinks to the files. Another option, which is used by many servers, is to consider index.html to be the default file in a directory. That is, if the request is a request for a directory and if the directory contains a file named index.html, then send index.html as the response.
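Here is a minimal sketch of one way to map a decoded target onto a file and pick a status code. It takes the index.html option and treats any other directory request as Forbidden. The class and method names are hypothetical, and this is not how NetSrv.java does it. The check that the file really lies inside the server directory is an extra precaution that this handout does not ask for; it guards against targets containing "..".

```java
import java.io.File;
import java.io.IOException;

class FileResolver {

    // Appends the decoded path to the server directory. Returns null
    // if the result would lie outside that directory.
    static File resolve(File root, String decodedPath) throws IOException {
        String rootPath = root.getCanonicalPath();
        File file = new File(root, decodedPath).getCanonicalFile();
        boolean inside = file.getPath().equals(rootPath)
                || file.getPath().startsWith(rootPath + File.separator);
        if (!inside) {
            return null;
        }
        if (file.isDirectory()) {
            // Use index.html as the default file in a directory.
            File index = new File(file, "index.html");
            if (index.isFile()) {
                return index;
            }
        }
        return file;
    }

    // Chooses the status code for the file returned by resolve().
    static int statusFor(File file) {
        if (file == null) return 403;        // outside the server directory
        if (!file.exists()) return 404;      // Not Found
        if (file.isDirectory()) return 403;  // directory without index.html
        if (!file.canRead()) return 403;     // Forbidden
        return 200;
    }
}
```

A server that prefers to send a directory listing would replace the directory case in statusFor with code that builds the HTML list.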
Finally, when you are ready to send a file, you need to create a legal response. The first line of a response to a successful request should be
HTTP/1.1 200 OK
followed, don't forget, by CRLF. You should send at least a "Connection: close" header and a "Content-Type: <type>" header. It is desirable to send a "Content-Length: <bytes>" header, but the server will work without it as long as you close the connection after sending the file. Don't forget that the last header line is followed by two CRLFs, one to end that line and one for the empty line that ends the header section, and then by the contents of the file.
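Putting the pieces of a successful response together might look like the following sketch. The class and method names are hypothetical, and it writes directly to an OutputStream with standard library calls instead of using NetUtil. The file is copied as bytes, not as text, so that images arrive undamaged.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class FileResponse {

    // Sends the status line, the headers, an empty line, and then the
    // bytes of the file.
    static void sendFile(OutputStream out, File file, String contentType)
            throws IOException {
        String headers = "HTTP/1.1 200 OK\r\n"
                + "Connection: close\r\n"
                + "Content-Type: " + contentType + "\r\n"
                + "Content-Length: " + file.length() + "\r\n"
                + "\r\n";  // the empty line that ends the header section
        out.write(headers.getBytes(StandardCharsets.US_ASCII));
        try (InputStream in = new FileInputStream(file)) {
            byte[] buffer = new byte[8192];
            int count;
            while ((count = in.read(buffer)) != -1) {
                out.write(buffer, 0, count);
            }
        }
        out.flush();
    }
}
```

In a server, out would be the output stream of the connected socket, obtained with getOutputStream().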
About Content-Length, Content-Type, and Error Responses
The File class has a length() method that returns the number of bytes in the file, as a value of type long, so getting the Content-Length for a file is easy.
For the Content-Type, you can determine the type by looking at the file extension in the file name. One simple way to handle this is to write a method that looks at the file extension and returns the appropriate type. To handle the most common situations, you will need at least the following:
Extension       Content-Type
---------       ------------
.html, .htm     text/html
.css            text/css
.js             application/javascript
.txt            text/plain
.gif            image/gif
.jpeg, .jpg     image/jpeg
.png            image/png
You might want to add more. (For example, if you are using javanotes7 for your web site, you can send a content type of text/plain for .java files.) You can use application/octet-stream as the default in cases where you can't determine the content type. Remember that extensions are not case-sensitive.
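A method of this kind could be sketched as follows. The class and method names are hypothetical, and the entry for .java files is included only because of the javanotes7 example. The argument should be a plain file name, such as the result of calling getName() on a File.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class ContentTypes {

    private static final String DEFAULT_TYPE = "application/octet-stream";
    private static final Map<String, String> TYPES = new HashMap<>();

    static {
        TYPES.put("html", "text/html");
        TYPES.put("htm", "text/html");
        TYPES.put("css", "text/css");
        TYPES.put("js", "application/javascript");
        TYPES.put("txt", "text/plain");
        TYPES.put("java", "text/plain");
        TYPES.put("gif", "image/gif");
        TYPES.put("jpeg", "image/jpeg");
        TYPES.put("jpg", "image/jpeg");
        TYPES.put("png", "image/png");
    }

    // Looks up the content type for a file name, ignoring the case of
    // the extension. Unknown or missing extensions get the default.
    static String forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return DEFAULT_TYPE;
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String type = TYPES.get(extension);
        return (type != null) ? type : DEFAULT_TYPE;
    }
}
```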
If you want to do something fancy, you could read and parse the contents of the file /etc/mime.types to make a hash table mapping extensions to mime types.
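A rough sketch of such a parser is shown below. It assumes the layout commonly used for this file: each line holds a MIME type followed by zero or more extensions separated by whitespace, and "#" starts a comment. Check the file on your own system before relying on that. The class name is hypothetical, and the method reads from any Reader so that it can be tried on a short string as well as on the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class MimeTypesFile {

    // Builds a table that maps lower-case extensions (without the dot)
    // to MIME types.
    static Map<String, String> parse(Reader source) throws IOException {
        Map<String, String> types = new HashMap<>();
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);  // drop the comment
            }
            String[] tokens = line.trim().split("\\s+");
            // tokens[0] is the type; the rest, if any, are extensions.
            for (int i = 1; i < tokens.length; i++) {
                types.put(tokens[i].toLowerCase(Locale.ROOT), tokens[0]);
            }
        }
        return types;
    }
}
```

To read the real file, the method could be called with new FileReader("/etc/mime.types").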
When sending an error message as a response, you should include some data in the response. Remember that the user of the browser will only see the stuff that comes after the header section of the response. You can create an error message in either plain text or HTML format, and include a Content-Type header to match.
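An error response with a small HTML body might be built as in the following sketch. The class and method names are hypothetical, and the wording of the page is only an example. The Content-Length is computed from the encoded bytes of the body, not from the length of the string.

```java
import java.nio.charset.StandardCharsets;

class ErrorResponse {

    // Builds a complete response, headers plus HTML body, for an error
    // such as build(404, "Not Found").
    static byte[] build(int code, String reason) {
        String title = code + " " + reason;
        String body = "<html><head><title>" + title + "</title></head>"
                + "<body><h1>" + title + "</h1></body></html>\r\n";
        byte[] bodyBytes = body.getBytes(StandardCharsets.UTF_8);
        String head = "HTTP/1.1 " + title + "\r\n"
                + "Connection: close\r\n"
                + "Content-Type: text/html; charset=UTF-8\r\n"
                + "Content-Length: " + bodyBytes.length + "\r\n"
                + "\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        // Join the header section and the body into one array.
        byte[] response = new byte[headBytes.length + bodyBytes.length];
        System.arraycopy(headBytes, 0, response, 0, headBytes.length);
        System.arraycopy(bodyBytes, 0, response, headBytes.length, bodyBytes.length);
        return response;
    }
}
```

The resulting array can be written to the socket's output stream in one call to write().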