Javanotes 6.0, Section 7.2 -- Programming With Arrays

Section 7.2

Programming With Arrays

Arrays are the most basic and the most important type of data structure, and techniques for processing arrays are among the most important programming techniques you can learn. Two fundamental array processing techniques -- searching and sorting -- will be covered in Section 7.4. This section introduces some of the basic ideas of array processing in general.

7.2.1 Arrays and for Loops

In many cases, processing an array means applying the same operation to each item in the array. This is commonly done with a for loop. A loop for processing all the elements of an array A has the form:

// do any necessary initialization
for (int i = 0; i < A.length; i++) {
   . . . // process A[i]
}

Suppose, for example, that A is an array of type double[]. Suppose that the goal is to add up all the numbers in the array. An informal algorithm for doing this would be:

Start with sum = 0;
Add A[0] to sum;   (process the first item in A)
Add A[1] to sum;   (process the second item in A)
  .
  .
  .
Add A[ A.length - 1 ] to sum;   (process the last item in A)

Putting the obvious repetition into a loop, this becomes:

double sum;  // The sum of the numbers in A.
sum = 0;     // Start with 0.
for (int i = 0; i < A.length; i++)
   sum += A[i];  // add A[i] to the sum, for
                 //     i = 0, 1, ..., A.length - 1

Note that the continuation condition, "i < A.length", implies that the last value of i that is actually processed is A.length-1, which is the index of the final item in the array. It's important to use "<" here, not "<=", since "<=" would give an array index out of bounds error. There is no element at position A.length in A.

Eventually, you should just about be able to write loops similar to this one in your sleep. I will give a few more simple examples. Here is a loop that will count the number of items in the array A which are less than zero:

int count;  // For counting the items.
count = 0;  // Start with 0 items counted.
for (int i = 0; i < A.length; i++) {
   if (A[i] < 0.0)   // if this item is less than zero...
      count++;       //     ...then count it
}
// At this point, the value of count is the number
// of items that have passed the test of being < 0

Replace the test "A[i] < 0.0", if you want to count the number of items in an array that satisfy some other property. Here is a variation on the same theme. Suppose you want to count the number of times that an item in the array A is equal to the item that follows it. The item that follows A[i] in the array is A[i+1], so the test in this case is "if (A[i] == A[i+1])". But there is a catch: This test cannot be applied when A[i] is the last item in the array, since then there is no such item as A[i+1]. The result of trying to apply the test in this case would be an ArrayIndexOutOfBoundsException. This just means that we have to stop one item short of the final item:

int count = 0;
for (int i = 0; i < A.length - 1; i++) {
   if (A[i] == A[i+1])
      count++;
}

Another typical problem is to find the largest number in A. The strategy is to go through the array, keeping track of the largest number found so far. We'll store the largest number found so far in a variable called max. As we look through the array, whenever we find a number larger than the current value of max, we change the value of max to that larger value. After the whole array has been processed, max is the largest item in the array overall. The only question is, what should the original value of max be? One possibility is to start with max equal to A[0], and then to look through the rest of the array, starting from A[1], for larger items:

double max = A[0];
for (int i = 1; i < A.length; i++) {
   if (A[i] > max)
      max = A[i];
}
// at this point, max is the largest item in A

(There is one subtle problem here. It's possible in Java for an array to have length zero. In that case, A[0] doesn't exist, and the reference to A[0] in the first line gives an array index out of bounds error. However, zero-length arrays are normally something that you want to avoid in real problems. Anyway, what would it mean to ask for the largest item in an array that contains no items at all?)

As a final example of basic array operations, consider the problem of copying an array. To make a copy of our sample array A, it is not sufficient to say

double[] B = A;

since this does not create a new array object. All it does is declare a new array variable and make it refer to the same object to which A refers. (So that, for example, a change to A[i] will automatically change B[i] as well.) Remember that arrays are objects, and array variables hold pointers to objects; the assignment B = A just copies a pointer. To make a new array that is a copy of A, it is necessary to make a new array object and to copy each of the individual items from A into the new array:

double[] B = new double[A.length]; // Make a new array object,
                                   //   the same size as A.
for (int i = 0; i < A.length; i++)
   B[i] = A[i];   // Copy each item from A to B.

Copying values from one array to another is such a common operation that Java has a predefined subroutine to do it. The subroutine, System.arraycopy(), is a static method in the standard System class. Its declaration has the form

public static void arraycopy(Object sourceArray, int sourceStartIndex,
        Object destArray, int destStartIndex, int count)

where sourceArray and destArray can be arrays with any base type. Values are copied from sourceArray to destArray. The count tells how many elements to copy. Values are taken from sourceArray starting at position sourceStartIndex and are stored in destArray starting at position destStartIndex. For example, to make a copy of the array, A, using this subroutine, you would say:

double B = new double[A.length];
System.arraycopy( A, 0, B, 0, A.length );

7.2.2 Arrays and for-each Loops

Java 5.0 introduced a new form of the for loop, the "for-each loop" that was discussed in Subsection 3.4.4. The for-each loop is meant specifically for processing all the values in a data structure. When used to process an array, a for-each loop can be used to perform the same operation on each value that is stored in the array. If anArray is an array of type BaseType[], then a for-each loop for anArray has the form:

for ( BaseType item : anArray ) {
   .
   .  // process the item
   .
}

In this loop, item is the loop control variable. It is being declared as a variable of type BaseType, where BaseType is the base type of the array. (In a for-each loop, the loop control variable must be declared in the loop.) When this loop is executed, each value from the array is assigned to item in turn and the body of the loop is executed for each value. Thus, the above loop is exactly equivalent to:

for ( int index = 0; index < anArray.length; index++ ) {
   BaseType item;
   item = anArray[index];  // Get one of the values from the array
     .
     .  // process the item
     .  
}

For example, if A is an array of type int[], then we could print all the values from A with the for-each loop:

for ( int item : A )
   System.out.println( item );

and we could add up all the positive integers in A with:

int sum = 0;   // This will be the sum of all the positive numbers in A
for ( int item : A ) {
   if (item > 0)
      sum = sum + item;
}

The for-each loop is not always appropriate. For example, there is no simple way to use it to process the items in just a part of an array. However, it does make it a little easier to process all the values in an array, since it eliminates any need to use array indices.

It's important to note that a for-each loop processes the values in the array, not the elements (where an element means the actual memory location that is part of the array). For example, consider the following incorrect attempt to fill an array of integers with 17's:

int[] intList = new int[10];
for ( int item : intList ) {   // INCORRECT! DOES NOT MODIFY THE ARRAY!
   item = 17;
}

The assignment statement item = 17 assigns the value 17 to the loop control variable, item. However, this has nothing to do with the array. When the body of the loop is executed, the value from one of the elements of the array is copied into item. The statement item = 17 replaces that copied value but has no effect on the array element from which it was copied; the value in the array is not changed.

7.2.3 Array Types in Subroutines

Any array type, such as double[], is a full-fledged Java type, so it can be used in all the ways that any other Java type can be used. In particular, it can be used as the type of a formal parameter in a subroutine. It can even be the return type of a function. For example, it might be useful to have a function that makes a copy of an array of double:

/**
 *  Create a new array of doubles that is a copy of a given array.
 *  @param source the array that is to be copied; the value can be null
 *  @return a copy of source; if source is null, then the return value is also null
 */
public static double[]  copy( double[] source ) {
    if ( source == null )
       return null;
    double[]  cpy;  // A copy of the source array.
    cpy = new double[source.length];
    System.arraycopy( source, 0, cpy, 0, source.length );
    return cpy;
}

The main() routine of a program has a parameter of type String[]. You've seen this used since all the way back in Section 2.1, but I haven't really been able to explain it until now. The parameter to the main() routine is an array of Strings. When the system calls the main() routine, it passes an actual array of strings, which becomes the value of this parameter. Where do the strings come from? The strings in the array are the command-line arguments from the command that was used to run the program. When using a command-line interface, the user types a command to tell the system to execute a program. The user can include extra input in this command, beyond the name of the program. This extra input becomes the command-line arguments. For example, if the name of the class that contains the main() routine is myProg, then the user can type "java myProg" to execute the program. In this case, there are no command-line arguments. But if the user types the command

java myProg one two three

then the command-line arguments are the strings "one", "two", and "three". The system puts these strings into an array of Strings and passes that array as a parameter to the main() routine. Here, for example, is a short program that simply prints out any command line arguments entered by the user:

public class CLDemo {
   
   public static void main(String[] args) {
      System.out.println("You entered " + args.length
                                  + " command-line arguments");
      if (args.length > 0) {
         System.out.println("They were:");
         for (int i = 0; i < args.length; i++)
            System.out.println("   " + args[i]);
      }
   } // end main()
   
} // end class CLDemo

Note that the parameter, args, is never null when main() is called by the system, but it might be an array of length zero.

In practice, command-line arguments are often the names of files to be processed by the program. I will give some examples of this in Chapter 11, when I discuss file processing.

7.2.4 Random Access

So far, all my examples of array processing have used sequential access. That is, the elements of the array were processed one after the other in the sequence in which they occur in the array. But one of the big advantages of arrays is that they allow random access. That is, every element of the array is equally accessible at any given time.

As an example, let's look at a well-known problem called the birthday problem: Suppose that there are N people in a room. What's the chance that there are two people in the room who have the same birthday? (That is, they were born on the same day in the same month, but not necessarily in the same year.) Most people severely underestimate the probability. We will actually look at a different version of the question: Suppose you choose people at random and check their birthdays. How many people will you check before you find one who has the same birthday as someone you've already checked? Of course, the answer in a particular case depends on random factors, but we can simulate the experiment with a computer program and run the program several times to get an idea of how many people need to be checked on average.

To simulate the experiment, we need to keep track of each birthday that we find. There are 365 different possible birthdays. (We'll ignore leap years.) For each possible birthday, we need to keep track of whether or not we have already found a person who has that birthday. The answer to this question is a boolean value, true or false. To hold the data for all 365 possible birthdays, we can use an array of 365 boolean values:

boolean[] used; 
used = new boolean[365];

The days of the year are numbered from 0 to 364. The value of used[i] is true if someone has been selected whose birthday is day number i. Initially, all the values in the array, used, are false. When we select someone whose birthday is day number i, we first check whether used[i] is true. If it is true, then this is the second person with that birthday. We are done. If used[i] is false, we set used[i] to be true to record the fact that we've encountered someone with that birthday, and we go on to the next person. Here is a subroutine that carries out the simulated experiment (of course, in the subroutine, there are no simulated people, only simulated birthdays):

/**
 * Simulate choosing people at random and checking the day of the year they 
 * were born on.  If the birthday is the same as one that was seen previously, 
 * stop, and output the number of people who were checked.
 */
private static void birthdayProblem() {

   boolean[] used;  // For recording the possible birthdays
                    //   that have been seen so far.  A value
                    //   of true in used[i] means that a person
                    //   whose birthday is the i-th day of the
                    //   year has been found.

   int count;       // The number of people who have been checked.

   used = new boolean[365];  // Initially, all entries are false.
   
   count = 0;

   while (true) {
          // Select a birthday at random, from 0 to 364.
          // If the birthday has already been used, quit.
          // Otherwise, record the birthday as used.
      int birthday;  // The selected birthday.
      birthday = (int)(Math.random()*365);
      count++;
      if ( used[birthday] )  // This day was found before; It's a duplicate.
         break;
      used[birthday] = true;
   }

   System.out.println("A duplicate birthday was found after " 
                                             + count + " tries.");

} // end birthdayProblem()

This subroutine makes essential use of the fact that every element in a newly created array of boolean is set to be false. If we wanted to reuse the same array in a second simulation, we would have to reset all the elements in it to be false with a for loop:

for (int i = 0; i < 365; i++)
    used[i] = false;

Here is an applet that will run the simulation as many times as you like. Are you surprised at how few people have to be chosen, in general?

7.2.5 Arrays of Objects

One of the examples in Subsection 6.4.2 was an applet that shows multiple copies of a message in random positions, colors, and fonts. When the user clicks on the applet, the positions, colors, and fonts are changed to new random values. Like several other examples from that chapter, the applet had a flaw: It didn't have any way of storing the data that would be necessary to redraw itself. Arrays provide us with one possible solution to this problem. We can write a new version of the RandomStrings applet that uses an array to store the position, font, and color of each string. When the content pane of the applet is painted, this information is used to draw the strings, so the applet will paint itself correctly whenever it has to be redrawn. When the user clicks on the applet, the array is filled with new random values and the applet is repainted using the new data. So, the only time that the picture will change is in response to a mouse click. Here is the new version of the applet:

In this applet, the number of copies of the message is given by a named constant, MESSAGE_COUNT. One way to store the position, color, and font of MESSAGE_COUNT strings would be to use four arrays:

int[] x = new int[MESSAGE_COUNT];  
int[] y = new int[MESSAGE_COUNT];
Color[] color = new Color[MESSAGE_COUNT];
Font[] font = new Font[MESSAGE_COUNT];

These arrays would be filled with random values. In the paintComponent() method, the i-th copy of the string would be drawn at the point (x[i],y[i]). Its color would be given by color[i]. And it would be drawn in the font font[i]. This would be accomplished by the paintComponent() method

public void paintComponent(Graphics g) {
   super.paintComponent(); // (Fill with background color.)
   for (int i = 0; i < MESSAGE_COUNT; i++) {
      g.setColor( color[i] );
      g.setFont( font[i] );
      g.drawString( message, x[i], y[i] );
   }
}

This approach is said to use parallel arrays. The data for a given copy of the message is spread out across several arrays. If you think of the arrays as laid out in parallel columns -- array x in the first column, array y in the second, array color in the third, and array font in the fourth -- then the data for the i-th string can be found along the i-th row. There is nothing wrong with using parallel arrays in this simple example, but it does go against the object-oriented philosophy of keeping related data in one object. If we follow this rule, then we don't have to imagine the relationship among the data, because all the data for one copy of the message is physically in one place. So, when I wrote the applet, I made a simple class to represent all the data that is needed for one copy of the message:

/**
 * An object of this type holds the position, color, and font
 * of one copy of the string.
 */
private static class StringData {
   int x, y;     // The coordinates of the left end of baseline of string.
   Color color;  // The color in which the string is drawn.
   Font font;    // The font that is used to draw the string.
}

(This class is actually defined as a static nested class in the main applet class.) To store the data for multiple copies of the message, I use an array of type StringData[]. The array is declared as an instance variable, with the name stringData:

StringData[] stringData;

Of course, the value of stringData is null until an actual array is created and assigned to it. This is done in the init() method of the applet with the statement

stringData = new StringData[MESSAGE_COUNT];

The base type of this array is StringData, which is a class. We say that stringData is an array of objects. This means that the elements of the array are variables of type StringData. Like any object variable, each element of the array can either be null or can hold a reference to an object. (Note that the term "array of objects" is a little misleading, since the objects are not in the array; the array can only contain references to objects.) When the stringData array is first created, the value of each element in the array is null.

The data needed by the RandomStrings program will be stored in objects of type StringData, but no such objects exist yet. All we have so far is an array of variables that are capable of referring to such objects. I decided to create the StringData objects in the applet's init method. (It could be done in other places -- just so long as we avoid trying to use an object that doesn't exist. This is important: Remember that a newly created array whose base type is an object type is always filled with null elements. There are no objects in the array until you put them there.) The objects are created with the for loop

for (int i = 0; i < MESSAGE_COUNT; i++)
    stringData[i] = new StringData();

For the RandomStrings applet, the idea is to store data for the i-th copy of the message in the variables stringData[i].x, stringData[i].y, stringData[i].color, and stringData[i].font. Make sure that you understand the notation here: stringData[i] refers to an object. That object contains instance variables. The notation stringData[i].x tells the computer: "Find your way to the object that is referred to by stringData[i]. Then go to the instance variable named x in that object." Variable names can get even more complicated than this, so it is important to learn how to read them. Using the array, stringData, the paintComponent() method for the applet could be written

public void paintComponent(Graphics g) {
   super.paintComponent(g); // (Fill with background color.)
   for (int i = 0; i < MESSAGE_COUNT; i++) {
      g.setColor( stringData[i].color );
      g.setFont( stringData[i].font );
      g.drawString( message, stringData[i].x, stringData[i].y );
   }
}

However, since the for loop is processing every value in the array, an alternative would be to use a for-each loop:

public void paintComponent(Graphics g) {
   super.paintComponent(g);
   for ( StringData data : stringData) {
          // Draw a copy of the message in the position, color, 
          // and font stored in data.
      g.setColor( data.color );
      g.setFont( data.font );
      g.drawString( message, data.x, data.y );
   }
}

In this loop, the loop control variable, data, holds a copy of one of the values from the array. That value is a reference to an object of type StringData, which has instance variables named color, font, x, and y. Once again, the use of a for-each loop has eliminated the need to work with array indices.

There is still the matter of filling the array, data, with random values. If you are interested, you can look at the source code for the applet, RandomStringsWithArray.java.

The RandomStrings applet uses one other array of objects. The font for a given copy of the message is chosen at random from a set of five possible fonts. In the original version of the applet, there were five variables of type Font to represent the fonts. The variables were named font1, font2, font3, font4, and font5. To select one of these fonts at random, a switch statement could be used:

Font randomFont;  // One of the 5 fonts, chosen at random.
int rand;         // A random integer in the range 0 to 4.

rand = (int)(Math.random() * 5);
switch (rand) {
   case 0:
      randomFont = font1;
      break;
   case 1:
      randomFont = font2;
      break;
   case 2:
      randomFont = font3;
      break;
   case 3:
      randomFont = font4;
      break;
   case 4:
      randomFont = font5;
      break;
}

In the new version of the applet, the five fonts are stored in an array, which is named fonts. This array is declared as an instance variable of type Font[]

Font[] fonts;

The array is created in the init() method of the applet, and each element of the array is set to refer to a new Font object:

fonts = new Font[5];  // Create the array to hold the five fonts.
 
fonts[0] = new Font("Serif", Font.BOLD, 14);
fonts[1] = new Font("SansSerif", Font.BOLD + Font.ITALIC, 24);
fonts[2] = new Font("Monospaced", Font.PLAIN, 20);
fonts[3] = new Font("Dialog", Font.PLAIN, 30);
fonts[4] = new Font("Serif", Font.ITALIC, 36);

This makes it much easier to select one of the fonts at random. It can be done with the statements

Font randomFont;  // One of the 5 fonts, chosen at random.
int fontIndex;    // A random number in the range 0 to 4.
fontIndex = (int)(Math.random() * 5);
randomFont = fonts[ fontIndex ];

The switch statement has been replaced by a single line of code. In fact, the preceding four lines could be replaced by the single line:

Font randomFont = fonts[ (int)(Math.random() * 5) ];

This is a very typical application of arrays. Note that this example uses the random access property of arrays: We can pick an array index at random and go directly to the array element at that index.

Here is another example of the same sort of thing. Months are often stored as numbers 1, 2, 3, ..., 12. Sometimes, however, these numbers have to be translated into the names January, February, ..., December. The translation can be done with an array. The array can be declared and initialized as

static String[] monthName = { "January", "February", "March",
                              "April",   "May",      "June",
                              "July",    "August",   "September",
                              "October", "November", "December" };

If mnth is a variable that holds one of the integers 1 through 12, then monthName[mnth-1] is the name of the corresponding month. We need the "-1" because months are numbered starting from 1, while array elements are numbered starting from 0. Simple array indexing does the translation for us!

7.2.6 Variable Arity Methods

Arrays are used in the implementation of a feature that was introduced in Java 5.0. Before version 5.0, every method in Java had a fixed arity. (The arity of a subroutine is defined as the number of parameters in a call to the method.) In a fixed arity method, the number of parameters must be the same in every call to the method. Java 5.0 introduced variable arity methods. In a variable arity method, different calls to the method can have different numbers of parameters. For example, the formatted output method System.out.printf, which was introduced in Subsection 2.4.4, is a variable arity method. The first parameter of System.out.printf must be a String, but it can have any number of additional parameters, of any types.

Calling a variable arity method is no different from calling any other sort of method, but writing one requires some new syntax. As an example, consider a method that can compute the average of any number of values of type double. The definition of such a method could begin with:

public static double average( double...  numbers ) {

Here, the ... after the type name, double, indicates that any number of values of type double can be provided when the subroutine is called, so that for example average(1,4,9,16), average(3.14,2.17), average(0.375), and even average() are all legal calls to this method. Note that actual parameters of type int can be passed to average. The integers will, as usual, be automatically converted to real numbers.

When the method is called, the values of all the actual parameters that correspond to the variable arity parameter are placed into an array, and it is this array that is actually passed to the method. That is, in the body of a method, a variable arity parameter of type T actually looks like an ordinary parameter of type T[]. The length of the array tells you how many actual parameters were provided in the method call. In the average example, the body of the method would see an array named numbers of type double[]. The number of actual parameters in the method call would be numbers.length, and the values of the actual parameters would be numbers[0], numbers[1], and so on. A complete definition of the method would be:

public static double average( double... numbers ) {
   double sum;      // The sum of all the actual parameters.
   double average;  // The average of all the actual parameters.
   sum = 0;
   for (int i = 0; i < numbers.length; i++) {
      sum = sum + numbers[i];  // Add one of the actual parameters to the sum.
   }
   average = sum / numbers.length;
   return average;
}

Note that the "..." can be applied only to the last formal parameter in a method definition. Note also that it is possible to pass an actual array to the method, instead of a list of individual values. For example, if salesData is a variable of type double[], then it would be legal to call average(salesData), and this would compute the average of all the numbers in the array.

As another example, consider a method that can draw a polygon through any number of points. The points are given as values of type Point, where an object of type Point has two instance variables, x and y, of type int. In this case, the method has one ordinary parameter -- the graphics context that will be used to draw the polygon -- in addition to the variable arity parameter:

public static void drawPolygon(Graphics g, Point... points) {
    if (points.length > 1) {  // (Need at least 2 points to draw anything.)
       for (int i = 0; i < points.length - 1; i++) {
           // Draw a line from i-th point to (i+1)-th point
           g.drawLine( points[i].x, points[i].y, points[i+1].x, points[i+1].y );
       }
       // Now, draw a line back to the starting point.
       g.drawLine( points[points.length-1].x, points[points.length-1].y,
                       points[0].x, points[0].y );
    }
}

Because of automatic type conversion, a variable arity parameter of type "Object..." can take actual parameters of any type whatsoever. Even primitive type values are allowed, because of autoboxing. (A primitive type value belonging to a type such as int is converted to an object belonging to a "wrapper" class such as Integer. See Subsection 5.3.2.) For example, the method definition for System.out.printf could begin:

public void printf(String format, Object... values) {

This allows the printf method to output values of any type. Similarly, we could write a method that strings together the string representations of all its parameters into one long string:

public static String concat( Object... values ) {
   StringBuffer buffer;  // Use a StringBuffer for more efficient concatenation.
   buffer = new StringBuffer();  // Start with an empty buffer.
   for ( Object obj : values ) { // A "for each" loop for processing the values.
       buffer.append(obj); // Add string representation of obj to the buffer.
   }
   return buffer.toString(); // return the contents of the buffer
}