CPSC 331

Operating Systems

Spring 2026

Project 1
Unix Utilities

Due: Wed 2/4 at the start of class

This warmup project has several goals:

to get your programming environment set up for use in this course
to gain some familiarity with the C programming language and basic technical details of writing programs in C (compiling, running, Makefiles, etc.)
to gain some familiarity with the command line in Linux
to learn a bit about several common (and useful) Unix utilities

Your task is to implement versions of four command line utilities (cat, grep, zip, and unzip) largely as described in the project README, along with learning a bit about Makefiles. Feel free to go look over the README (along with this C lab tutorial) now, but don't start trying to figure out the code yet! Look through the rest of this handout first, then do the red-boxed steps in the Preliminaries and Specifications and Details sections (in that order). The combination of the README and this handout will provide a tutorial-style introduction to programming in C.

Collaboration and the Use of AI

Review the course policy on academic integrity.

Certain uses of AI are permitted on this assignment. AI use is not required.

The basic rule: you may use the code completion features of Copilot but may not use features (such as the coding agent) where code is generated from English language prompts. It is also essential that you understand and think critically about any code suggested by Copilot, both to help develop your C programming skills and because while the code suggestions are often uncannily on target, they are not always exactly what you want.

Using the code explanation features of Copilot Chat is permitted, though be careful that this doesn't spill over into code generation.

Other Policies

Review the policy on late work.

Revise and resubmit applies for this assignment. Review the details of how revise and resubmit works.

Handin

To hand in your project:

Make sure that you have included a comment at the beginning of each of your programs containing your name and a description of the program, and that you have auto-formatted everything.
Copy your entire initial-utilities directory to your handin directory (/classes/cs331/handin/username).

Check that the handin was successful and that the directory structure is correct: your handin folder /classes/cs331/handin/username should contain a folder initial-utilities which in turn contains a Makefile and subdirectories wcat, wgrep, wzip, wunzip. Each subdirectory should have a .c file of the same name, a Makefile, and, if you added any test cases, a subdirectory tests containing the appropriate files.

Preliminaries

Do the steps outlined in reddish boxes below before you start writing code.

Directory Structure

Create a directory ~/cs331 to hold your files for this course. (That's a directory named cs331 in your home directory.)
Create a directory ~/cs331/workspace for coding assignments. (That's a directory named workspace inside your new cs331 folder.)

Provided Code

Copy the initial-utilities and tester directories (and all of their contents) from /classes/cs331 to your workspace folder (~/cs331/workspace). Make sure that you end up with initial-utilities and tester directories inside your workspace folder; don't just copy their contents.

The initial-utilities directory contains test cases for each of the programs you will write. Run them from the command line: for the wcat tests, for example, change to the wcat directory (which should be ~/cs331/workspace/initial-utilities/wcat if you set things up as specified above), and run

    ./test-wcat.sh

Treat the provided tests as a backup check: practice your testing skills by first doing your own testing, then run the provided tests to see if you missed anything.

The tester directory contains scripts for running the provided tests along with a README that describes the setup for each test case. There's no need to do anything with the contents of this directory unless you want to define your own test cases.

Extra credit is possible if you find cases missed by the provided tests — see the README in the tester directory for information on how to define a test case and hand in the files for your new case(s) along with your code. Be sure to number your test cases starting one higher than the provided ones — don't modify or replace the provided test cases — and include a description of what the test case covers in the appropriate file.

VSCode

Systems programming usually doesn't require a full IDE, just a code editor with features like syntax highlighting and code completion. Editors like vi/vim and emacs are common in Unix/Linux environments and are useful to know, but they have steep learning curves. In this course, we'll use VSCode, which is much easier to learn but is still powerful and flexible enough for a broad range of programming tasks and languages.

Start VSCode from the Applications menu.
Open your ~/cs331/workspace folder (File→Open Folder...) rather than the specific folder for a particular project. This lets you easily access other folders within your workspace should you need to.

To get auto-formatting support for C in VSCode, set up and configure the Microsoft C/C++ extension:

Click on the Extensions icon in the toolbar on the left side of the VSCode window or select View→Extensions from the menus to show the Extensions panel.
Type "C/C++" in the search box — you are looking for the extension titled just "C/C++" from Microsoft. (It will likely be at the top of the list of matches.)
Click the Install button to install the extension.
Once the extension is installed, click on the Settings icon at the bottom of the toolbar on the left side of the VSCode window and choose "Settings" or select File→Preferences→Settings to bring up the Settings tab.
Type "clang_format" in the search box; you are looking for the setting called "C_Cpp: Clang_format_fallback Style". (It will likely be at the top of the list of matches.)
Replace the setting's value (probably "Visual Studio") with the following:
```
      { BasedOnStyle: Google, IndentWidth: 4, ColumnLimit: 0 } 
```
(The effect of this change is to put open curly brackets on the same line as what comes before rather than starting a new line. If open curly brackets are instead put on new lines when you start working with and auto-formatting C code, this setting didn't get set correctly.)

Additional recommended settings: (locate each by starting to type the setting name in the search box in the Settings tab)

Editor: Format On Save → checked
Files: Auto Save → afterDelay
Editor: Format On Type → checked

The following settings are a matter of personal preference and are optional to set:

Extensions: Ignore Recommendations → checked
Telemetry: Telemetry Level → off

If you want to use Copilot (optional), install the GitHub Copilot extension:

Click on the Extensions icon in the toolbar on the left side of the VSCode window or select View→Extensions from the menus to show the Extensions panel.
Type "copilot" in the search box — you are looking for the extension titled "GitHub Copilot" from GitHub. (It will likely be at the top of the list of matches.)
Click the Install button to install the extension.
Once the Copilot extension is installed, click on it in the list of extensions to bring up a tab with documentation about the extension.
- You'll need to sign up for GitHub Copilot Free. Look for the section "Getting access to GitHub Copilot".
  Note that the free tier has usage limits. As a student, you can get free access to Copilot Pro if you want greater access. Look for the "Education benefits" tab under "Billing and licensing" on the Settings page for your GitHub account once you create it.
- You may use the "Code suggestions in the editor" feature. Look for that section to read a little about it.
- It is recommended that you avoid the chat. Asking Copilot to explain your code (or code that you are working with) is permitted, but you may not have Copilot write code from natural language prompts, and it can be easy for chat usage to cross this line. Look for the section "Ask and learn about your code with chat" to read a little more about chat.
- You may not use other features, such as the Copilot coding agent (available through Copilot Pro) or anything only available through a paid level of access.

Reference

System Calls, C, and Unix

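For system calls and library functions, first look them up in the man pages. The C library functions used in this project (printf, fgets, getline, and so on) are in section 3; true system calls are in section 2. In many cases man whatever will give you the right version, but if not, use man 3 whatever.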

For additional reference on system calls as well as C and Unix:

Dive Into Systems, in particular
- Appendix 1: Chapter 1 for Java Programmers
- Chapter 2: A Deeper Dive Into C
- Appendix 2: Using Unix (largely sections 17.1, 17.2, 17.5, but feel free to explore more)
C for Java Programmers

Both Dive Into Systems and C for Java Programmers have a table of contents which can help you locate the right section for something you are looking for. In addition, Dive Into Systems has a search box that you can use to locate specific terms throughout the whole book and you can use your web browser's search function (hamburger menu→Find in Page... or ctrl-F in Firefox) to search what's currently displayed.

Failed Tests / About `diff`

The provided tests are largely self-documenting. For example, you might see the following:

test 1: passed
test 2: passed
test 3: 3.out incorrect
  what results should be found in file: tests/3.out
  what results produced by your program: tests-out/3.out
  compare the two using diff, cmp, or related tools to debug, e.g.:
  prompt> diff tests/3.out tests-out/3.out
  See tests/3.run for what is being run

To see what went wrong, you should compare the two files tests/3.out (the correct result) and tests-out/3.out (what your program actually output). You can view the files by opening them in VSCode or by using cat in the terminal:

  cat tests/3.out
  cat tests-out/3.out

You can also directly compare them line-by-line using diff as indicated:

  diff tests/3.out tests-out/3.out

Normal diff output shows only the lines that differ and indicates which file they came from. For example:

  1c1
  < Usage: wgrep <searchterm> [<file> ...]
  ---
  > Usage: wgrep <searchterm> <file> ...

< indicates lines present only in the first file and > indicates lines present only in the second file. (In this case we can see that the problem is that the usage message isn't formatted correctly — the [] is missing.) If the files match exactly, diff produces no output.

For longer files, a context diff or side-by-side diff can be useful. There are also more flexible comparisons, such as comparisons which ignore whitespace. As with any command, look up diff in the man pages for more information.

Specifications and Details

Your task is to implement versions of four command line utilities (cat, grep, zip, and unzip) largely as described in the project README, along with learning a bit about Makefiles.

To do this:

First read the introduction of the README and sections F.1-F.4 and F.8 in the lab tutorial referenced in the "before beginning" instructions. The lab tutorial addresses the basics of compiling and running C programs. (The other sections of the lab tutorial are useful reading as well, but less important for this project. Debug by printing messages rather than using the debugger.)
Then tackle the four programs (wcat, wgrep, wzip, wunzip) one at a time and in the order listed. For each, read through that section in the README (including the details) for the main specifications and instructions, followed by the corresponding section below for changes in the specifications, additional hints, and a suggested plan of attack. Then write and test the program. When testing, don't forget to run the provided tests as well — they should all pass.
Finally, explore some basic generic Makefiles as described in the last section below.

wcat

Additional or modified specifications:

Name the program file wcat.c. Put it in the wcat subdirectory.
If there is an error opening a file, print
```
    wcat: cannot open file filename: message
```
message should give more information about why the error occurred — use perror() and/or strerror() to get this information.
Assume a maximum line length of 4096 characters. Ensure your code correctly handles lines of that length — watch out for off-by-one errors.
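The error message format above can be sketched with strerror() as follows. This is a minimal illustration, not the required implementation: format_open_error is a hypothetical helper name, and whether the message ends with a newline or where it is printed is left to the README and the provided tests.

```c
#include <errno.h>  /* errno values such as ENOENT, passed in as err */
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "prog: cannot open file filename: message".
 * err is the errno value saved right after the failed fopen.
 * No trailing newline is added; the caller decides how to print the result.
 */
void format_open_error(char *buf, size_t bufsize, const char *prog,
                       const char *filename, int err) {
    snprintf(buf, bufsize, "%s: cannot open file %s: %s",
             prog, filename, strerror(err));
}
```

A caller would typically copy errno into a local variable immediately after fopen returns NULL and pass that copy as err, since later library calls may overwrite errno.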

Hints/suggestions:

Build the program incrementally: a suggestion is to first handle the case of no files specified, then loop through the filenames — start with just printing the filenames in order to check your loop, then open each file and handle any errors, then print the lines read.
In C, argv[0] is the name of the command that was run, so argc will always be at least 1 and what we think of as the command line arguments proper start with argv[1].
Read about each of the system calls in the man pages even if there's enough information in the README to use them — it's good to get some practice with deciphering man pages. You can also consult Dive Into Systems and C for Java Programmers for additional information and examples.
Use printf for output. See section 2.8 in Dive Into Systems and section 4.3 in C for Java Programmers. It is also worth looking it up in the man pages — make sure you get the version in section 3!
fgets needs a buffer to store the characters read. C for Java Programmers has an example. (Also read about it in the man pages.)
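As an illustration of the fgets buffer hint, here is a small sketch that counts lines rather than printing them. It assumes the 4096-character limit does not include the newline, so the buffer holds 4096 characters plus the newline plus the terminating '\0'. The names MAX_LINE and count_lines are made up for this example.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 4096 /* longest line, not counting the newline (an assumption) */

/*
 * Count the lines in a stream using fgets. The buffer has room for
 * MAX_LINE characters, the newline, and the terminating '\0'.
 */
long count_lines(FILE *in) {
    char buf[MAX_LINE + 2];
    long lines = 0;
    while (fgets(buf, sizeof buf, in) != NULL) {
        size_t len = strlen(buf);
        if (len > 0 && buf[len - 1] == '\n') {
            lines++; /* a chunk ending in '\n' completes a line */
        } else if (feof(in)) {
            lines++; /* final line with no trailing newline */
        }
    }
    return lines;
}
```

Declaring the buffer as MAX_LINE bytes exactly is the classic off-by-one here: fgets stores at most one less than the size it is given, so a full-length line would be split into two reads.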

wgrep

Additional or modified specifications:

Name the program file wgrep.c. Put it in the wgrep subdirectory.
If wgrep is passed no command line arguments, it should print
```
    wgrep <searchterm> [<file> ...]
```
If there is an error opening a file, print
```
    wgrep: cannot open file filename: message
```
message should give more information about why the error occurred — use perror() and/or strerror() to get this information.
Use getline to read arbitrarily-long lines and have getline allocate the buffer used to store the line rather than allocating it yourself using malloc or an array. You will need to deallocate the buffer using free when you are done with it.
Avoid repeated code: define a function grep which takes a search term and a file pointer as parameters and prints out the lines of the file matching the search term.
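The getline requirement can be sketched on a different task: finding the longest line in a stream. This is only an illustration of having getline allocate the buffer and of freeing it afterwards. The function name longest_line is hypothetical, and the _POSIX_C_SOURCE define is there only so the declaration is visible under strict compiler modes.

```c
#define _POSIX_C_SOURCE 200809L /* expose getline in strict compiler modes */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Return the length in bytes (including any newline) of the longest line
 * in the stream, letting getline allocate and grow the buffer.
 */
size_t longest_line(FILE *in) {
    char *line = NULL; /* NULL together with size 0 asks getline to allocate */
    size_t cap = 0;
    ssize_t len;
    size_t longest = 0;
    while ((len = getline(&line, &cap, in)) != -1) {
        if ((size_t)len > longest) {
            longest = (size_t)len;
        }
    }
    free(line); /* the caller frees the buffer, even after getline returns -1 */
    return longest;
}
```

The same buffer is reused across iterations, so one free after the loop is enough.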

Hints/suggestions:

Build the program incrementally. Since wcat.c also reads a collection of files line-by-line, a suggestion is to start with that: make a copy of wcat.c, then modify the copy to loop through just the filenames (remember that the first argument will be the search term), then print only the lines containing the search term, then switch from fgets to getline in order to handle lines of any length. Finally, handle the case of reading from stdin when no files are specified. Since reading from stdin is the same as reading from an opened file, this is also a good time to define the grep function called for in the specifications above.
See the first sample program here for an example of how to call getline, then read the man page to figure out how to initialize buffer and bufsize (from the example) to have getline allocate the buffer itself. You may also find it useful to read section 2.4.2 in Dive Into Systems and section 8.5 in C for Java Programmers to learn more about dynamic memory allocation in C. You won't need malloc in this case (because getline allocates the buffer) but you will need free.
Use the library function strstr to determine if a line contains a search term. Look it up in the man pages.
C functions must be defined before they can be used — order matters! main typically comes last in the file for this reason.
When specifying the search term on the command line, enclose it in quotes if it contains special characters like spaces. For example:
```
	./wgrep "foo foo" bar.txt
```
Also use quotes to pass an empty string as the search term:
```
	./wgrep "" bar.txt
```
When running the read-from-stdin version of the program, use ctrl-D instead of ctrl-C to end the program. (ctrl-D closes stdin, causing the next read from stdin to result in EOF (end-of-file). This allows the program to continue on and exit normally; by contrast, ctrl-C ends the program immediately.)
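To make the strstr hint concrete, here is a minimal sketch of a containment check. The wrapper name line_contains is made up for illustration; strstr itself returns a pointer to the first match or NULL if there is none.

```c
#include <stdbool.h>
#include <string.h>

/* True if term occurs anywhere in line. Matching is case-sensitive. */
bool line_contains(const char *line, const char *term) {
    return strstr(line, term) != NULL;
}
```

Note that strstr treats an empty search term as matching at the start of any string, so with this approach ./wgrep "" bar.txt would match every line. Check the README for what the expected behavior is in that case.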

wzip

Additional or modified specifications:

Name the program file wzip.c. Put it in the wzip subdirectory.
If wzip is passed no command line arguments, it should print
```
    wzip <file> [<file> ...]
```
If there is an error opening a file, print
```
    wzip: cannot open file filename: message
```
message should give more information about why the error occurred — use perror() and/or strerror() to get this information.
Use fgetc to read uncompressed characters.
Treat multiple files as one big input chunk — carry over same-character runs from one file to the next. For example, if file1.txt contains aaaaaaaaaabbbb and file2.txt contains bbaa, the result of
```
    wzip file1.txt file2.txt
```
should be 10a6b2a (not 10a4b2b2a, which is what encoding each file separately would produce).
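The carry-over rule can be checked with a small in-memory sketch that works on strings instead of files and produces the plain-text form used in the example above. Everything here (struct run_state, rle_feed, rle_finish) is hypothetical scaffolding for illustration; the real program reads with fgetc and eventually writes binary output.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical state for a run that may continue across inputs. */
struct run_state {
    int count; /* length of the current run; 0 means no run yet */
    char ch;   /* character of the current run */
};

/* Append the pending run to out as text such as "10a". */
static void emit_run(const struct run_state *st, char *out, size_t outsize) {
    size_t used = strlen(out);
    if (st->count > 0 && used < outsize) {
        snprintf(out + used, outsize - used, "%d%c", st->count, st->ch);
    }
}

/* Process one input; the last run stays pending so the next input can extend it. */
void rle_feed(struct run_state *st, const char *input, char *out, size_t outsize) {
    for (const char *p = input; *p != '\0'; p++) {
        if (st->count > 0 && *p == st->ch) {
            st->count++;
        } else {
            emit_run(st, out, outsize);
            st->ch = *p;
            st->count = 1;
        }
    }
}

/* Call once after the last input to flush the final run. */
void rle_finish(struct run_state *st, char *out, size_t outsize) {
    emit_run(st, out, outsize);
    st->count = 0;
}
```

Feeding "aaaaaaaaaabbbb" and then "bbaa" through the same run_state, followed by rle_finish, yields 10a6b2a. The key design point is that the run state lives outside the per-input loop.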

Hints/suggestions:

Build the program incrementally. You can again start with a copy of wcat.c, then modify the copy to read each file character-by-character (using fgetc) instead of line-by-line. Next, compute the run-length encoding but use printf to write each count and character as plain text for testing purposes. Once you have confirmed that the correct encoding is being generated, replace printf with fwrite to write the binary form.
See tutorialspoint for examples of using fwrite. (After reading the man page, of course.)
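As a sketch of the final step, writing binary output with fwrite might look like the following. The record layout used here (the count as a raw int followed by a single character byte) is an assumption made for illustration; the README defines the actual format. The function name write_run is hypothetical.

```c
#include <stdio.h>

/*
 * Write one run as a binary record: the count as a raw int, then the
 * character as a single byte. Returns 0 on success, -1 on a short write.
 * (The record layout is an assumption; the README is authoritative.)
 */
int write_run(FILE *out, int count, char ch) {
    if (fwrite(&count, sizeof count, 1, out) != 1) {
        return -1;
    }
    if (fwrite(&ch, sizeof ch, 1, out) != 1) {
        return -1;
    }
    return 0;
}
```

Unlike printf("%d", count), which writes the digits of the number as text, fwrite copies the bytes of the int as they are stored in memory, so the output is not human-readable.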

wunzip

Additional or modified specifications:

Name the program file wunzip.c. Put it in the wunzip subdirectory.
If wunzip is passed no command line arguments, it should print
```
    wunzip <file> [<file> ...]
```
If there is an error opening a file, print
```
    wunzip: cannot open file filename: message
```
message should give more information about why the error occurred — use perror() and/or strerror() to get this information.

Hints/suggestions:

Build the program incrementally. You can again start with a copy of wcat.c, then modify the copy to correctly read the compressed file(s).
See tutorialspoint for examples of using fread. (After reading the man page, of course.)
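Reading binary input with fread can be sketched in the same spirit. This assumes each record is a raw int count followed by a single character byte; that layout is an assumption made for illustration and the README is the authority on the real format. The function name read_run is hypothetical.

```c
#include <stdio.h>

/*
 * Read one binary record (raw int count, then one character byte).
 * Returns 1 if a full record was read, 0 at end of input or on a
 * truncated record.
 */
int read_run(FILE *in, int *count, char *ch) {
    if (fread(count, sizeof *count, 1, in) != 1) {
        return 0;
    }
    if (fread(ch, sizeof *ch, 1, in) != 1) {
        return 0;
    }
    return 1;
}
```

fread returns the number of items it read, so comparing the return value against the number requested is how end of input is detected; there is no separate EOF value as with fgetc.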

Makefiles

Makefiles can look cryptic, but they are powerful tools for automating and simplifying the build process — you specify how a program is compiled once and can then rebuild only what's necessary with a single command, saving time and reducing errors. In addition to building programs, Makefiles often define other useful targets like make clean to remove generated files and make test to run test suites, helping to streamline common development tasks.

In practice, Makefiles are rarely written from scratch. Instead, a generic Makefile is adapted for a new project. As a result, the focus here is on developing a reading knowledge of a basic Makefile pattern and learning how to adapt it rather than on building Makefiles from the ground up.

A Makefile for wcat has been provided in the initial-utilities/wcat directory.

Take a look at the wcat Makefile, then read section F.6 of the lab tutorial and section 17.5 of Dive Into Systems to understand what is going on.
Try out each of the targets: make (or make all), make test, make clean. Also try them in various combinations to observe what is rebuilt when: start each sequence with make clean, then do make twice in a row, or make test twice in a row, or make and then make test, or vice versa.
Create Makefiles for wgrep, wzip, and wunzip — copy wcat's and make the necessary changes to the Makefile variables. (You should not need to change any of the rules.)

It can be convenient to apply the same operation to all four of wcat, wgrep, wzip, and wunzip with one command. This can also be done with a Makefile.

Take a look at the top-level Makefile (in the initial-utilities directory) to see how it works, then try it out: from the initial-utilities directory, run make, make test, and make clean. Also experiment with different combinations, including doing things like make clean in just one of the subdirectories before running make at the top level. (What is rebuilt?)
