Solution for
Programming Exercise 10.1


THIS PAGE DISCUSSES ONE POSSIBLE SOLUTION to the following exercise from this on-line Java textbook.

Exercise 10.1: The WordList program from Section 10.3 reads a text file and makes an alphabetical list of all the words in that file. The list of words is output to another file. Improve the program so that it also keeps track of the number of times that each word occurs in the file. Write two lists to the output file. The first list contains the words in alphabetical order. The number of times that the work occurred in the file should be listed along with the word. Then write a second list to the output file in which the words are sorted according to the number of times that they occurred in the files. The word that occurred most often should be listed first.


Discussion

In the original WordList program, the words from the file are stored in an array of Strings. For this exercise, we also need to keep track of the number of times a given string has been encountered in the file. One possibility would be to add an array of ints to store the counts. This would be using "parallel arrays" to store the data. Another approach is to create a small class to hold the word and its associated counter. Then the data can be stored in an array of objects belonging to that class. In my program, I used the latter approach. The class that I defined to hold the data for a word is:

      class WordData {
            // A little class to hold the data for one word.
         String word;   // The word (in lower case letters).
         int count;     // The number of times the word has been found.
         WordData(String w) {
              // Constructor creates an object with the specified word
              // and with the counter initialized to 1.
            word = w;
            count = 1;
         }
      }

Note that I've included a constructor to make it easy to create objects of type WordData. This constructor will be used whenever a new word is encountered in the file for the first time. At that time, the count should be one. The constructor takes care of this detail. After that, each time we encounter the word again, the count for that word is increased by 1. Aside from this, the process of reading the file and storing the data in the array is essentially the same as it was in the WordList program. You can check the differences -- shown in red -- in the main() routine and the insertWord() subroutine in the solution that is given below. In insertWord, note in particular that the word in position pos in the array is words[pos].word, while the frequency counter for that word is words[pos].count. For example, the command for adding 1 to the count is "words[pos].count++;".

Since the list of words is output twice, I've written a subroutine, putWordList(), that writes the words to an output stream. I've tried to make the output reasonably neat by outputting the words in a column that is 15 spaces wide. The other subroutine that I've added is sortByFrequency(), which will rearrange the words in the array so that they are in order of decreasing frequency. This routine is called by the main program before the word list is printed for the second time. The sorting routine uses a standard Insertion Sort algorithm, as given in Section 8.4, except that it sorts into decreasing order instead of increasing.


The Solution

Significant changes from WordList.java are shown in red.


   /*
      This program lets the user specify a text file for input and a file
      for output.  All the words are read from the input file.  Words are
      converted to lower case.  The program makes a list of all the words
      that occur in the file, along with the number of times that each
      word occurs.  The words and word frequencies are printed to an
      output file.  The list is printed twice:  once in alphabetical order
      and once in order of decreasing frequency.  A word in this program is 
      defined to be any sequence of letters.
      
      This class depends on the non-standard classes TextIO and TextReader.
   */
   
   import java.io.*;
   
   class WordData {
         // A little class to hold the data for one word.
      String word;   // The word (in lower case letters).
      int count;     // The number of times the word has been found.
      WordData(String w) {
           // Constructor creates an object with the specified word
           // and with the counter initialized to 1.
         word = w;
         count = 1;
      }
   }
   
   public class WordFrequencies {
   
      static WordData[] words;  // An array to hold the words from the file.
                                //   Note that the array will be expanded as 
                                //   necessary, in the insertWord() subroutine.
      
      static int wordCount;   // The number of words currently stored in
                              //   the array.
   
   
      public static void main(String[] args) {
      
         TextReader in;    // A stream for reading from the input file.
         PrintWriter out;  // A stream for writing to the output file.
         
         String inputFileName;   // Input file name, specified by the user.
         String outputFileName;  // Output file name, specified by the user.
         
         words = new WordData[10];  // Start with space for 10 words.
         wordCount = 0;             // Currently, there are no words in the array.
         
         /* Get the input file name from the user and try to create the
            input stream.  If there is a FileNotFoundException, print
            a message and terminate the program. */
         
         TextIO.put("Input file name?  ");
         inputFileName = TextIO.getln().trim();
         try {
            in = new TextReader(new FileReader(inputFileName));
         }
         catch (FileNotFoundException e) {
             TextIO.putln("Can't find file \"" + inputFileName + "\".");
             return;
         }
         
         /* Get the output file name from the user and try to create the
            output stream.  If there is an IOException, print a message
            and terminate the program. */
   
         TextIO.put("Output file name? ");
         outputFileName = TextIO.getln().trim();
         try {
            out = new PrintWriter(new FileWriter(outputFileName));
         }
         catch (IOException e) {
             TextIO.putln("Can't open file \"" + outputFileName + "\" for output.");
             TextIO.putln(e.toString());
             return;
         }
         
         /* Read all the words from the input stream and insert them into
            the array of words.  Reading from a TextReader can result in
            an error of type TextReader.Error.  If one occurs, print an
            error message and terminate the program. */
         
         try {
            while (true) {
                  // Skip past and non-letters in the input stream.  If an
                  //   end-of-stream has been reached, end the loop.  Otherwise,
                  //   read a word and insert it into the array of words.
               while ( ! in.eof() && ! Character.isLetter(in.peek()) )
                  in.getAnyChar();
               if (in.eof())
                  break;
               insertWord(in.getAlpha());
            }
         }
         catch (TextReader.Error e) {
            TextIO.putln("An error occurred while reading from the input file.");
            TextIO.putln(e.toString());
            return;
         }
         
         /* Output the words in alphabetical order. */
         
         out.println("Words in alphabetical order, with the number of occurrences:");
         out.println("-----------------------------------------------------------");
         out.println();
         
         putWordList(out);
         
         /* Sort the list of words according the frequency with which
            they were found in the file, and then output the list again. */
         
         sortByFrequency();
         
         out.println();
         out.println();
         out.println("Words in order of frequency:");
         out.println("---------------------------");
         out.println();
         
         putWordList(out);
         
         if (out.checkError()) {
            TextIO.putln("Some error occurred while writing to the output file.");
            TextIO.putln("The output might be missing, incomplete, or corrupted.");
         }
         else {
            TextIO.putln("Done.");
         }
      
      } // end main()
      
   
      static void insertWord(String w) {
              // Insert the word w into the array of words, unless it already
              // appears there.  If the word already appears in the list,
              // add 1 to the counter for that word.  The words in the array are in
              // lower case,  and w is converted to lower case before it is processed.
              // Note that the words in the array are kept in alphabetical order.
              // If the array has no more space to hold w, then it is doubled
              // in size.
   
         int pos = 0;  // This will be the position in the array where w belongs.
   
         w = w.toLowerCase();
         
         /* Find the position in the array where w belongs, after all the
            words that precede w alphabetically.  If a copy of w already
            occupies that position, then it is not necessary to insert
            w, so just add 1 to the counter associated with the word
            and return. */
   
         while (pos < wordCount && words[pos].word.compareTo(w) < 0)
            pos++;
         if (pos < wordCount && words[pos].word.equals(w)) {
            words[pos].count++;
            return;
         }
            
         /* If the array is full, make a new array that is twice as 
             big, copy all the words from the old array to the new,
             and set the variable, words, to refer to the new array. */
   
         if (wordCount == words.length) {
            WordData[] newWords = new WordData[words.length*2];
            System.arraycopy(words,0,newWords,0,wordCount);
            words = newWords;
         }
         
         /* Put w into its correct position in the array.  Move any
            words that come after w up one space in the array to
            make room for w.  Create a new WordData object to hold
            the new word.  */
   
         for (int i = wordCount; i > pos; i--)
            words[i] = words[i-1];
         words[pos] = new WordData(w);
         wordCount++;
   
      }  // end insertWord()
      
      
      static void putWordList(PrintWriter output) {
             // Write the list of words from the words array, along with their
             // associated frequencies, to the output stream specified in the
             // parameter to this subroutine.  Words are output in a column
             // that is 15 spaces wide.  If an individual word is longer than
             // 15 spaces, the output won't be quite so neat.
          for (int i = 0; i < wordCount; i++) {
             output.print("   ");
             output.print(words[i].word);
             for (int space = words[i].word.length(); space < 15; space++)
                output.print(" ");
             output.print(" ");
             output.println(words[i].count);
          }
      }  // end putWordList()
      
      
      static void sortByFrequency() {
             // Use insertion sort to sort the words in the list so that they
             // are in order of decreasing frequency.  (Note that insertion sort
             // has the following neat property:  In a group of words that all
             // have the same frequency, the words will remain in alphabetical
             // order after the words have been sorted by frequency.)
          for (int i = 1; i < wordCount; i++) {
             WordData temp = words[i];  // Save the word in position i.
             int location;              // An index in the array.
             location = i - 1;
             while (location >= 0 && words[location].count < temp.count) {
                   // Bump words that have frequencies less than the frequency
                   // of temp up one space in the array.
                words[location + 1] = words[location]; 
                location--;
             }
             words[location + 1] = temp;  // Put temp in last vacated space.
          
          }
      }  // end sortByFrequency()
   
      
   }  // end class WordFrequencies


[ Exercises | Chapter Index | Main Index ]