Section 12.2
List and Set Classes


IN THE PREVIOUS SECTION, we looked at the general properties of collection classes in Java. In this section, we look at some specific collection classes and how to use them. These classes can be divided into two categories: lists and sets. A list consists of a sequence of items arranged in a linear order. A list has a definite order, but is not necessarily sorted into ascending order. A set is a collection that has no duplicate entries. The elements of a set might or might not be arranged into some definite order. As with all of Java's collection classes, the items in a list or set are of type Object.


The ArrayList and LinkedList Classes

There are two obvious ways to represent a list: as a dynamic array and as a linked list. We've encountered these already in Sections 8.3 and 11.2. Both of these options are available in generic form as the collection classes java.util.ArrayList and java.util.LinkedList. That is, a ArrayList represents an ordered sequence of objects stored in an array that will grow in size as new items are added, and a LinkedList represents an ordered sequence of objects stored in nodes that are linked together with pointers. Both of these classes implement an interface java.util.List, which specifies operations that are available for all lists.

Both list classes support the basic list operations, and an abstract data type is defined by its operations, not by its representation. So why two classes? Why not a single List class with a single representation? The problem is that there is no single representation of lists for which all list operations are efficient. For some operations, linked lists are more efficient than arrays. For others, arrays are more efficient. In a particular application of lists, it's likely that only a few operations will be used frequently. You want to choose the representation for which the frequently used operations will be as efficient as possible.

Broadly speaking, the LinkedList class is more efficient in applications where items will often be added or removed at the beginning of the list or in the middle of the list. In an array, these operations require moving a large number of items up or down one position in the array, to make a space for a new item or to fill in the hole left by the removal of an item. In a linked list, nodes can be added or removed at any position by changing a few pointer values. The ArrayList class is more efficient when random access to items is required. Random access means accessing the n-th item in the list, for any integer n. This is trivial for an array, but for a linked list it means starting at the beginning of the list and moving from node to node along the list for n steps. Operations that can be done efficiently for both types of lists include sorting and adding an item at the end of the list.

All lists implement the Collection methods discussed in the previous section, including size(), isEmpty(), add(Object), remove(Object), and clear(). The add(Object) method adds the object at the end of the list. The remove(Object) method involves first finding the object, which is not very efficient for any list since it involves going through the items in the list from beginning to end until the object is found. The List interface adds some methods for accessing list items according to their numerical positions in the list. For an object, list, of type List, these methods include:

These methods are defined in both the ArrayList class and in the LinkedList class, although they are only efficient for ArrayLists. The LinkedList class adds a few additional methods, which are not defined for an ArrayList. If linkedlist is an object of type LinkedList, then

These methods are apparently defined to make it easy to use a LinkedList as if it were a stack or a queue. (See Section 11.3.) For example, we can use a LinkedList as a queue by adding items onto one end of the list (using the addLast() method) and removing them from the other end (using the removeFirst() method).

If list is an object of type List, then the method list.iterator(), defined in the Collection interface, returns an Iterator that can be used to traverse the list from beginning to end. However, for Lists, there is a special type of Iterator, called a ListIterator, which offers additional capabilities. The method list.listIterator() returns a ListIterator for list.

A ListIterator has the usual Iterator methods hasNext() and next(), but it also has methods hasPrevious() and previous() that make it possible to move backwards in the list. To understand how these work, its best to think of an iterator as pointing to a position between two list elements, or at the beginning or end of the list. In this diagram, the items in a list are represented by squares, and arrows indicate the possible positions of an iterator:

Iterator Positions

If iter is a ListIterator(), iter.next() moves the iterator one space to the right along the list and returns the item that the iterator passes as it moves. The method iter.previous() moves the iterator one space to the left along the list and returns the item that it passes. The method iter.remove() removes an item from the list; the item that is removed is the item that the iterator passed most recently in a call to either iter.next() or iter.previous(). There is also a method iter.add(Object) that adds the specified object to the list at the current position of the iterator. This can be between two existing items or at the beginning of the list or at the end of the list.

(By the way, the lists that are used in the LinkedList class are doubly linked lists. That is, each node in the list contains two pointers -- one to the next node in the list and one to the previous node. This makes it possible to implement effeciently both the next() and previous() methods of a ListIterator. Also, to make the addLast() and getLast() methods of a LinkedList efficient, the LinkedList class includes an instance variable that points to the last node in the list.)

As an example of using a ListIterator, suppose that we want to maintain a list of items that is always sorted into increasing order. When adding an item to the list, we can use a ListIterator to find the position in the list where the item should be added. The idea is to start at the beginning of the list and to move the iterator forward past all the items that are bigger than the item that is being inserted. At that point, the iterator's add() method can be used to insert the item at its correct position in the list. In order to say what it means for one item to be "bigger" than another, we assume that the items in the list implement the Comparable interface and define the compareTo() method. (This interface was discussed in the previous section.) Here is a method that will do this:

     static void orderedInsert(List list, Comparable newItem) {

            // Precondition:  The items in list are sorted into ascending
            //                order, according to the compareTo method.
            //                newitem.compareTo(item) must be defined for
            //                each item in the list.
            //
            // Postcondition: newItem has been added to the list in its
            //                correct position, so that the list is still
            //                sorted into ascending order.
            
        ListIterator iter = list.listIterator();
        
        // Move the iterator so that it points to the position where
        // newItem should be inserted into the list.  If newItem is
        // bigger than all the items in the list, then the while loop
        // will end when iter.hasNext() becomes false, that is, when
        // the iterator has reached the end of the list.
        
        while (iter.hasNext()) {
           Object item = iter.next();
           if (newItem.compareTo(item) <= 0) {
                 // newItem should come BEFORE item in the list.
                 // Move the iterator back one space so that
                 // it points to the correct insertion point,
                 // and end the loop.
              iter.previous();
              break;
           } 
        }

        iter.add(newItem);

     } // end orderedInsert()

Since the parameter in this method is of type List, it can be applied to both ArrayLists and LinkedLists, and it will be about equally efficient for both types of lists. You would probably find it easier to write an orderedInsert method using array-like indexing with the methods get(index) and add(index,obj). However, that method would be horribly inefficient for LinkedLists because get(index) is so inefficient for such lists. You can find a program that tests this method in the file ListInsert.java.


Sorting

Sorting a list is a fairly common operation, and there should really be a sorting method in the List interface. For some reason, there is not, but methods for sorting Lists are available as static methods in the class java.util.Collections. This class contains a variety of static utility methods for working with collections. The command

            Collections.sort(list);

can be used to sort a list into ascending order. The items in the list must implement the Comparable interface. This method will work, for example, for lists of Strings. If a Comparator is provided as a second argument:

            Collections.sort(list,comparator);

then the comparator will be used to compare the items in the list. As mentioned in the previous section, a Comparator is an object that defines a compare() method that can be used to compare two objects. We'll see an example of using a Comparator in Section 4.

The Collections class has at least two other useful methods for modifying lists. Collections.shuffle(list) will rearrange the elements of the list into a random order. Collections.reverse(list) will reverse the order of the elements, so that the last element is moved to the beginning of the list, the next-to-last element to the second position, and so on.

Since an efficient sorting method is provided for Lists, there is no need to write one yourself. You might be wondering whether there is an equally convenient method for standard arrays. The answer is yes. Array-sorting methods are available as static methods in the class java.util.Arrays. The command:

            Arrays.sort(A);

will sort an array, A, provided either that the base type of A is one of the primitive types (except boolean) or that A is an array of Objects that implement the Comparable interface. You can also sort part of an array. This is important since arrays are often only "partially filled." The command:

            Arrays.sort(A,fromIndex,toIndex);

sorts the elements A[fromIndex], A[fromIndex+1], ..., A[toIndex-1] into ascending order. You can use Arrays.sort(A,0,N) to sort a partially filled array which has elements in the first N positions.

Java does not support generic programming for primitive types. In order to implement the command Arrays.sort(A), the Arrays class contains eight methods: one method for arrays of Objects and one method for each of the primitive types byte, short, int, long, float, double, and char.


The TreeSet and HashSet Classes

A set is a collection of Objects in which no object occurs more than once. Objects obj1 and obj2 are considered to be the same if obj1.equals(obj2) is true, as discussed in the previous section. Sets implement all the general Collection methods, but do so in a way that ensures that no element occurs twice in the set. For example, if set is an object of type Set, then set.add(obj) will have no effect on the set if obj is already an element of the set. Java has two classes that implement the Set interface: java.util.TreeSet and java.util.HashSet.

In addition to being a Set, a TreeSet has the property that the elements of the set are arranged into ascending sorted order. An Iterator for a TreeSet will always visit the elements of the set in ascending order.

A TreeSet cannot hold arbitrary objects, since there must be a way to determine the sorted order of the objects it contains. Ordinarily, this means that the objects in a TreeSet should implement the Comparable interface and that obj1.compareTo(obj2) should be defined in a reasonable way for any two objects obj1 and obj2 in the set. Alternatively, a Comparator can be provided as a parameter to the constructor when the TreeSet is created. In that case, the Comparator will be used to compare objects that are added to the set.

In the implementation of a TreeSet, the elements are stored in something like a binary sort tree. (See Section 11.4.) The actually type of tree that is used is balanced in the sense that all the leaves of the tree are at about the same distance from the root of the tree. The number of operations required to find an item in a sorted tree is the same as the distance from the root of the tree to the item. Using a balanced tree ensures that all items are as close to the root as possible. This makes finding an item very efficient. Adding and removing elements are equally efficient.

The fact that a TreeSet sorts its elements and removes duplicates makes it very useful in some applications. In Section 10.3, I presented a program, WordList.java, that reads all the words in a file and outputs a list of the words it found. The list is sorted and duplicates have been removed. In that program, I used a linked list to store the words and had to write a subroutine to make sure that the list was sorted and contained no duplicates. By using a TreeSet instead of a list, that part of the programming is taken care of automatically. An algorithm for the program, using a TreeSet, would be:

         TreeSet words = new TreeSet();
         
         while there is more data in the input file:
            Let word = the next word from the file.
            words.add(word);
         
         Iterator iter = words.iterator();
         while (iter.hasNext()):
            Write iter.next() to the output file.

If you would like to see a complete, working program, you can find it in the file WordListWithTreeSet.java.

As another example, suppose that coll is any Collection of Strings (or any other type for which compareTo() is properly defined). We can use a TreeSet to sort the items of coll and remove the duplicates simply by saying:

         TreeSet set = new TreeSet();
         set.addAll(coll);

The second statement adds all the elements of the collection to the set. Since it's a Set, duplicates are ignored. Since it's a TreeSet, the elements of the set are sorted. If you would like to have the data in some other type of data structure, it's easy to copy the data from the set. For example, to place the answer in an ArrayList, you could say:

         TreeSet set = new TreeSet();
         set.addAll(coll);
         ArrayList list = new ArrayList();
         list.addAll(set);

Now, in fact, every one of Java's collection classes has a constructor that takes a Collection as an argument. All the items in that Collection are added to the new collection when it is created. So, new TreeSet(coll) creates a TreeSet that contains the same elements as the Collection, coll. This means that we can abbreviate the four lines in the above example to the single command:

         ArrayList list = new ArrayList( new TreeSet(coll) );

This makes a sorted list of the elements of coll with no duplicates. A nice example of the power of generic programming. (It seems, by the way, there is no equally easy way to get the list with duplicates. To do this, we would need something like a TreeSet that allows duplicates. The C++ programming language has such a thing and refers to it as a multiset. The Smalltalk language has something similar and calls it a bag. Java, for the time being at least, lacks this data type.)


A HashSet stores its elements in a hash table, a type of data structure that I will discuss in the next section. The operations of finding, adding, and removing elements are implemented very efficiently in hash tables, even more so than for TreeSets. The elements of a HashSet are not stored in any particular order. An Iterator for a HashSet will visit its elements in what seems to be a completely arbitrary order, and it's possible for the order to change if a new element is added. Because the elements of a HashSet are not ordered, they do not have to implement the Comparable interface. Use a HashSet instead of a TreeSet when the elements it contains are not comparable, or when the order is not important, or when the small advantage in efficiency is important.


[ Next Section | Previous Section | Chapter Index | Main Index ]