CPSC 431, Fall 2016
Lab 2: Pintos Lists  /
      Assignment: Memory Management

We will soon be working with the Pintos operating system. Pintos has an implementation of doubly linked lists that makes extensive use of C pointer manipulation. In the lab today, you will work with Pintos lists. The program that you work on will be a good exercise in using pointers, and it will help to prepare you for Pintos.

The lab will also get you started on the next programming assignment, which asks you to do some basic memory management in C. You should take some time before the end of the lab period to look at the files for the assignment and ask any questions that you have about them. However, you should expect to do all or most of the programming for the assignment later.

You will need copies of the directories /classes/cs431/pintos-list and /classes/cs431/my_malloc. Copy the two folders into your own account. Next week, you can turn in your work by copying your completed versions of the two directories into your homework folder in /classes/cs431/homework.

A Strange Way to Do Lists

(This section is informational; the lab exercise is in the next section.)

The files pintos-list/list.c and pintos-list/list.h implement the Pintos version of doubly linked lists. Read the files, especially the long comments at the beginning, for more information. A list is made up of nodes defined by the type

struct list_elem 
  {
    struct list_elem *prev;     /* Previous list element. */
    struct list_elem *next;     /* Next list element. */
  };

A list has a head node and a tail node. The head and tail nodes are sentinels; that is, they are not themselves considered to be part of the list. An empty list consists of just the head and tail with nothing in between. A list is represented by the type

struct list 
  {
    struct list_elem head;      /* List head. */
    struct list_elem tail;      /* List tail. */
  };

So, a list takes the form:

A list, with no data items

List operations are meant to be done using functions such as:

void list_init( struct list *l);  
      // Initialize a list to be the empty list.

struct list_elem *list_begin (struct list *l);  
      // First node in the list, or tail if empty
      
struct list_elem *list_next (struct list_elem *elem); 
      // Next node after elem

struct list_elem *list_end (struct list *);  
     // Pointer to tail node

void list_insert (struct list_elem *e1, struct list_elem *e2); 
     // Insert e2 into the list containing e1, in front of e1
     
void list_remove( struct list_elem *e) 
     // Remove e from the list that contains it
     
void list_push_front (struct list *l, struct list_elem *e); 
     // Add node e at the head of list l
     
struct list_elem *list_pop_front (struct list *l); 
     // Remove and return first node of list l

However, note that there is nothing here about any data items in the list; all we have is pointers to other nodes in the list. Data is added to the list in a strange, C-like way. The list_elems are embedded into structs, so that the data items are near the list_elems but not inside them. Instead, the list_elem nodes are inside structs that also contain the data:

A list, with nodes inside structs that also contain data.

With this structure, given a list_elem node, you can get a pointer to the struct that contains the node by doing some pointer arithmetic. A macro is defined in list.h for doing the necessary calculation:

list_entry( ptr_to_list_elem, struct_type, member_name )
    // Here, ptr_to_list_elem is of type struct list_elem *, and 
    // struct_type is the name of the type of struct that contains 
    // the list_elem, and member_name is the name used inside the 
    // struct for the list_elem.  This macro represents a pointer 
    // to the struct that contains the list_elem.

For example, in this lab you will use the struct type

struct namelist_node {
   char *name;
   struct list_elem link;
};

If e is of type struct list_elem *, and e points to the list_elem in a struct of type struct namelist_node, then you can get a pointer to that struct by saying

struct namelist *node = list_entry( e, struct namelist_node, link );

Note that the second and third parameters are not variables; one is the name of a struct type and the other is the name of a member of that struct. The purpose for doing this would be so that you can refer to node->name.

Lab Exercise: Using a Pintos List

The file names_original.c is is a rather boring demo program that uses a linked list to store an alphabetical list of names. The linked list is implemented using pointers in the usual way that will be familiar to you from Java. You can compile and run the program to see what it does. You will not work on this file.

For the lab exercise, you will make a version of the program that uses a Pintos list. You will do the work in the file names_pintos.c. Open it in an editor. The file is a copy of names_original.c in which the bodies of all functions except main have been stripped out, and the list declarations have been replaced by

#include "list.h"

struct list namelist;   // A list of names.

struct namelist_node {  // a node in the list
   char *name;
   struct list_elem link;  // provides linking to adjacent nodes
};

You will fill in the bodies of the functions. The lab will lead you through the programming and hopefully show you how to use a Pintos list.

First, before using a list, the list must be initialized using list_init. Note that the parameter to list_init is a pointer, but the variable namelist that represents the list in this program is not a pointer. That just means that we have to pass the address of namelist to the list_init function:

list_init( &namelist );

Add this line to the beginning of main.

Second, you need to get items into the list of names. The insert function is supposed to keep the list alphabetical with no duplicates, but that will be the hardest part of the lab, so save it for later. For now, just create a struct namelist and push it onto the front of the list. Add the following code to insert:

struct namelist_node *node = malloc( sizeof(struct namelist_node) );
node->name = name;
list_push_front( &namelist,  &node->link );
return true;

Note in particular the call to list_push_front. The first parameter is a pointer to the list, so we pass the address of the list, &namelist. The second parameter is a pointer to the list_elem that we want to add to the list. The list_elem is a member of the node that we have just created. That member is referred to as node->link. Since we want to pass the address of the member, we add an ampersand: &node->link.

Third, you should complete the print_names function so that you can see the contents of the list. You can use a for loop, as suggested in list.h, to traverse the list:

for ( struct list_elem *e = list_begin(&namelist); 
              e != list_end(&namelist); e = list_next(e) ) { ...

Within the loop, you need to use list_entry to get access to the namelist_node that contains the list_elem. Once you have a pointer to that namelist_node, you can use it to get the name that you want to print. The complete for loop, which you can copy into the print_names function, looks like this:

for ( struct list_elem *e = list_begin(&namelist); 
              e != list_end(&namelist); e = list_next(e) ) {
    struct namelist_node *node = list_entry( e, struct namelist_node, link );
    printf("   %s\n", node->name);
}

At this point, you can compile and run the program, although the second half of the output will not be meaningful. When you compile the program, don't forget to include list.c in the gcc command!

Fourth and Fifth, you can write contains and delete using for loops very similar to the one in print_list. (Compile and run to make sure they work!)

Sixth, you should revise insert to make it work correctly for a sorted list. It should not insert duplicates, and it should insert a new name in its proper place to keep the list in sorted order. You will probably want to use a while(1) loop and the list_insert function. You can check how it's done in the original program, names_original.c, but things are actually a bit simpler when using a doubly linked list with head and tail nodes.

Later, you can go back and improve the code for contains and delete. The for loop that I suggested for these two functions makes no use of the fact that the list is sorted. You can save some processing time if you stop looking through the list when you get to an item that does not alphabetically precede the name that you are looking for. The original program does that. Update contains and delete to do the same.

Programming Assignment: Memory Management

This assignment is separate from the lab exercise. You will almost certainly not want to use Pintos lists for the assignment (although it would be possible). In fact, you don't necessarily need to use linked lists at all (although, again, it is possible).

In C, the standard function malloc is used to allocate blocks of memory, and free is used to free blocks that are no longer in use. There is also a function realloc for resizing a block that was allocated with malloc. These standard functions manage a "heap" of memory that is available to the program. For this assignment, you will write your own versions of these functions. You will manage a large pool of memory. A function my_malloc will allocate a block of memory from that pool, returning a pointer to the start of the block. A function my_free will return a block to the pool so that it can be reused. And my_realloc will resize a previously allocated block, possibly moving it if the size is to increase. You will also write mymem_init, which must be called to initialize the memory pool before using any of the other functions. And, for testing purposes, you will write mymem_max_bytes, which returns the size of the largest available block.

In addition to writing the implementation, you should write a short essay explaining your design. How does it work? Why did you decide to use that design? Write your essay in a plain text file and include the file in the my_malloc folder that you turn in.

You will be working in a copy of the directory /classes/cs431/my_malloc. The functions that you have to write are declared in mymem.h. You should read that file, since it has comments that specify exactly how the functions should behave. There are three main programs to test the code that you write: memtest1.c, memtest2.c, and memtest3.c. You will have to create a new C file to implement the memory functions. (You might want to write several versions, and you might decide to write a simpler main program for testing.)

In your implementation program, you should create a single large block of memory to serve as the memory pool, referenced by a global variable. The size must be 224 bytes (0x1000000 in hexadecimal). You could create it as an array of chars, or as an array of 222 (or 0x400000) ints, or using the built-in malloc function. (Note that according to the requirements in mymem.h, the actual size of any block is a multiple of 4, that is a multiple of sizeof(int). So it might be easiest to use an array of ints as your memory pool, depending on how you implement the functions.)

Your my_malloc function should get a block of memory from the pool that is at least as large as the size requested. The pointer that is returned by my_malloc should point somewhere within the memory pool. When my_free is called, the parameter will be a pointer that was previously returned by my_malloc. However, my_free will need to determine how big the block is. That information is not in the block itself, so it has to be stored elsewhere. It is probably easiest to store the size, and any other extra data that you need, in memory that comes just before the allocated block. Given the pointer from my_malloc, you can use pointer arithmetic to get a pointer to the extra data. (From the point of view of my_malloc, it should allocate memory to hold the extra data plus the space requested by the parameter to my_malloc; it should put whatever extra data it needs at the start of that memory, then return a pointer to the byte that follows the extra data.)

Your memory management system needs to keep track of what parts of the memory pool are in use and what parts are still available. There are many ways to do that. You will probably want to consider the pool to be divided into blocks, some in use and some not in use. To allocate a block for my_malloc, you need to find an unused block of memory that is at least big enough to hold a block of the requested size. If the unused block is (much) bigger than the block that you want to allocate, you should split off as much as you need for the allocated block. The remainder goes back into the unused memory pool.

At the beginning, the entire memory pool is in one unused block. At the first call to my_malloc, that unused block is divided into two blocks, one used and one unused.

As soon as there is more than one unused block, you are faced with the problem of deciding which unused block to use. You can use either a "firstfit" or a "bestfit" strategy. (You might want to try both.)

When a block is freed, it goes back into the unused pool. It's easiest just to add it to the list of unused blocks, and you should probably write a first version that does just that. However, there is a problem: It will be possible that no unused block of memory is large enough to fulfill a my_malloc request, but there are two unused blocks next to each other in memory that could be combined into a single block that is large enough. To prevent that, when my_free frees a block, it can check whether the freed block is immediately preceded and/or followed by another unused block. If so, neighboring unused blocks should be combined into one bigger block. (Alternatively, you might only combine blocks when my_alloc can't find a big enough unused block.) For full credit, you should implement the idea of combining unused blocks. Note that to do this efficiently, you might need to store some more extra data in your blocks.

This is not an easy assignment! Think before you program, and maybe discuss your ideas with me before you try to implement this. Drawing pictures can be helpful. Your code should work with the three test programs that I provided, but you might need additional testing. When I was writing my solutions, I actually wrote an extra function named mem_map() that would print out a table showing all the blocks in the memory pool, both used and unused.

http://xkcd.org/371