CPSC 431, Fall 2016 -- Lab 3

CPSC 431, Fall 2016
Lab 3: Fork, Exec, Pipe, Files

For this lab you will write three short (but not necessarily easy) program that use some common Unix functions for working with processes: fork, exec, and pipe. These functions are covered in the reading, and we have begun discussing them in class, and some basic information is given below. You also have the sample files in /classes/cs431/fork-and-pipe-examples. You can copy code from the sample files, or you could even do some of the exercises by copying and editing an entire sample file.

You will need the files from /classes/cs431/lab3. You should get a copy of that directory. The programs that you will write are:

parallel.c — Run several command-line commands in parallel.
primes.c — Count prime numbers using several parallel processes.
wordcheck.c — Search a binary sort tree for a word, reading the necessary data using random access in a binary-formatted data file.

The lab is due next Monday. Please make sure that the files are named exactly as indicated above, and make sure that they are in a folder named lab3 inside your homework folder in /classes/cs431/homework.

Parallel Processes with Fork and Exec

The first exercise is to write a program named parallel.c that can execute several commands in parallel. That is, the command-line arguments to parallel are themselves commands that could be given on the command line, and those commands will be run in parallel. To do this, parallel will fork a child process for each command, and the child process will use exec to execute the command. Note that a command that itself has arguments must be enclosed in quotes. For example, assuming that the name of the compiled program is parallel, the command

./parallel "ls /home" "./count_some_primes 1000000"

will run the two commands ls /home and ./count_some_primes 1000000 in parallel.

The sample program exec.c is an example of using exec. (Note that the function that is used is actually named execvp.) And fork.c is an example of forking several child processes. Note that fork.c waits for its child processed to end using waitpid. The simpler function wait() is used for the same purpose in multi-write-pipe.c. The functions fork and execvp are declared in the standard header file unistd.h, while wait is declared in sys/wait.h.

The function execvp requires an array of strings that contains the command and its arguments as separate strings, followed by a NULL pointer. Your program will need to parse a command such as ls /home into such an array. Since the point of this lab is not string manipulation, here is a function that you can copy-and-paste into your program to do the parsing:

/* command is a string possibly containing spaces that divide it into
 * "tokens".  The array, args, is an array of strings that will be filled 
 * with pointers to those tokens.  The command string is modified by
 * changing the space after a token to a zero character to mark the end
 * of the token, and the pointers in args will actually point into the 
 * command.  The next space in args after the tokens is set to be a NULL
 * pointer.  At most array_length-1 tokens are stored in the array;
 * any additional tokens in the string are ignored.
 */
void tokenize(char *command, char *args[], int array_length) {
    int word_ct = 0;
    char *p = command;
    char *word = NULL; // Value is NULL when looking for a start-of-token.
    while (1) {
        if (*p == 0) // This is the 0 at the end of command
           break;
        if (*p == ' ') { 
            *p = 0;
            if (word) { // (Otherwise, it was a space after a previous space)
               args[word_ct++] = word;
               word = NULL;  // To start looking for next start-of-token.
               if (word_ct == array_length-1)
                  break;
            }
        }
        else if (word == NULL)  // non-blank char is the start of a token
           word = p;
        p++;
    }
    if (word && word_ct < array_length-1) {
       args[word_ct++] = word;
    }
    args[word_ct] = NULL; // A NULL to mark end of list of tokens
}

Multiprocessing with Fork and Pipe

The second exercise is to write a program named primes.c that will count the prime numbers in the range from 2 to one billion. The program will divide the computation among several processes, running in parallel. The number of processes should be specified on the command line. The program will fork the specified number of processes. It will divide the range of numbers into subranges, and assign one subrange to each process. Furthermore, the child processes should use a pipe to send their counts back to the parent process. The parent should read the count from each child and should output the total number of primes. (For a range from 2 to one billion, the answer is 50847534. See https://primes.utm.edu/howmany.html for a table of number of primes less than N for other powers of ten.)

Your program can be very similar to multi-write-pipe.c. (In fact, the major reason for this exercise is to force you to understand that example.)

The program count_some_primes.c is an example of counting primes in a given range. You can use some code from that program.

Note that both count_some_primes.c and your program, primes.c, need the square root function, which is declared in math.h. To use this function, the program must be linked to the math library. See the comment at the top of count_some_primes.c.

(Note that the point of multiprocessing in this case is to complete a long task more quickly by working on subtasks in parallel. It would be interesting to run the program with different numbers of child processes to see what kind of speed-up you get. You might also want to try compiling the program with optimization turned on, using the -O option on the gcc compiler, to see if that speeds up the program.)

Random Access File

The third exercise is to write a program named wordcheck.c that checks whether a given word is in the set of words represented by the data file tree_in_file.dat. The word that is to be checked can be given as command-line argument. (It will be argv[1] in the program.) To check the word, you should use the file as a random access file, and you should only read the parts of the file that are needed for checking the work. You will be doing a binary sort tree search of the tree represented by the file.

Your tools for working with the file are functions from the Unix file API, which are declared in stdio.h. You will need these functions:

fopen(filename,mode) -- Example: FILE *file = fopen("data.dat", "r"); fopen opens a file for reading and/or writing. The mode tells how the file will be used. For reading only, the mode is the string "r".
fseek(file,offset,whence) -- Example: fseek(file,pos,SEEK_SET); fseek with third parameter SEEK_SET moves the file read/write position to byte number pos in the file. The position is where fread will start reading. This allows you to read data from any point in the file.
fread(buffer,itemsize,itemcount,file) -- Reads from the file, starting at the specified position, into a buffer. The buffer can be an array or the address of a regular variable. Itemsize is the number of bytes per item; for example for integers, it would be 4 and for characters, it would be 1. Itemcount is the number of items to be read. fread returns an int that is the number of items (not the number of bytes) that were actually read. If the return value is less than itemcount, then an error occurred.

The format of the file is as follows: The file contains one "record" for each word. (The file doesn't know about "records", but this is how I think of the format.) A record consists of three int values followed by the characters in the word. The zero at the end of a string is not stored in the file. The record represents a node in a conceptual binary sort tree. The first integer is the position in the file of the left child of that node, or is 0 if the node has no left child. The second integer in the position in the file of the right child of that node, or is 0 if the node has no right child. The third integer is the number of characters in the word. The root of the binary sort tree is at position zero in the file. (Thus, the first two integers replace the left and right pointers that would be used for a binary tree in memory.)

The integers are stored in binary format. That is, the file doesn't contain the ASCII characters that would represent the integer in a printout. It contains the actual 32 bit binary value for the integer. The value can be read directly into an int variable using fread.

Your program should do a typical binary sort tree search: Start with the root record, and move down the tree until you reach the bottom or find the word. At each step, compare the word to the word in the record to decide whether to move left or right.

You can certainly assume that each of the words in the file has length less than 100.

(If you want to impress me, you could write a program to create the file tree_in_file.dat. The source of the words is the same unsorted_words.dat that you used in Lab 1.)