[ Previous Section | Next Section | Chapter Index | Main Index ]

Section 11.3

Programming With Files


In this section, we look at several programming examples that work with files, using the techniques that were introduced in Section 11.1 and Section 11.2.


11.3.1  Copying a File

As a first example, we look at a simple command-line program that can make a copy of a file. Copying a file is a pretty common operation, and every operating system already has a command for doing it. However, it is still instructive to look at a Java program that does the same thing. Many file operations are similar to copying a file, except that the data from the input file is processed in some way before it is written to the output file. All such operations can be done by programs with the same general form. Subsection 4.3.6 included a program for copying text files using TextIO. The example in this section will work for any file.

Since the program should be able to copy any file, we can't assume that the data in the file is in human-readable form. So, we have to use the byte streams, InputStream and OutputStream, to operate on the file. The program simply copies all the data from the InputStream to the OutputStream, one byte at a time. If source is the variable that refers to the InputStream, then the function source.read() can be used to read one byte. This function returns the value -1 when all the bytes in the input file have been read. Similarly, if copy refers to the OutputStream, then copy.write(b) writes one byte to the output file. So, the heart of the program is a simple while loop. As usual, the I/O operations can throw exceptions, so this must be done in a try..catch statement:

while(true) {
   int data = source.read();
   if (data < 0)
      break;
   copy.write(data);
}

The file-copy command in an operating system such as UNIX uses command line arguments to specify the names of the files. For example, the user might say "copy original.dat backup.dat" to copy an existing file, original.dat, to a file named backup.dat. Command-line arguments can also be used in Java programs. The command line arguments are stored in the array of strings, args, which is a parameter to the main() routine. The program can retrieve the command-line arguments from this array. (See Subsection 4.3.6.) For example, if the program is named CopyFile and if the user runs the program with the command

java CopyFile work.dat oldwork.dat

then in the program, args[0] will be the string "work.dat" and args[1] will be the string "oldwork.dat". The value of args.length tells the program how many command-line arguments were specified by the user.

The program CopyFile.java gets the names of the files from the command-line arguments. It prints an error message and exits if the file names are not specified. To add a little interest, there are two ways to use the program. The command line can simply specify the two file names. In that case, if the output file already exists, the program will print an error message and end. This is to make sure that the user won't accidently overwrite an important file. However, if the command line has three arguments, then the first argument must be "-f" while the second and third arguments are file names. The -f is a command-line option, which is meant to modify the behavior of the program. The program interprets the -f to mean that it's OK to overwrite an existing program. (The "f" stands for "force," since it forces the file to be copied in spite of what would otherwise have been considered an error.) You can see in the source code how the command line arguments are interpreted by the program:

import java.io.*;

/**
 *  Makes a copy of a file.  The original file and the name of the
 *  copy must be given as command-line arguments.  In addition, the
 *  first command-line argument can be "-f"; if present, the program
 *  will overwrite an existing file; if not, the program will report
 *  an error and end if the output file already exists.  The number
 *  of bytes that are copied is reported.
 */
public class CopyFile {

   public static void main(String[] args) {
      
      String sourceName;   // Name of the source file, 
                           //    as specified on the command line.
      String copyName;     // Name of the copy, 
                           //    as specified on the command line.
      InputStream source;  // Stream for reading from the source file.
      OutputStream copy;   // Stream for writing the copy.
      boolean force;  // This is set to true if the "-f" option
                      //    is specified on the command line.
      int byteCount;  // Number of bytes copied from the source file.
      
      /* Get file names from the command line and check for the 
         presence of the -f option.  If the command line is not one
         of the two possible legal forms, print an error message and 
         end this program. */
   
      if (args.length == 3 && args[0].equalsIgnoreCase("-f")) {
         sourceName = args[1];
         copyName = args[2];
         force = true;
      }
      else if (args.length == 2) {
         sourceName = args[0];
         copyName = args[1];
         force = false;
      }
      else {
         System.out.println(
                 "Usage:  java CopyFile <source-file> <copy-name>");
         System.out.println(
                 "    or  java CopyFile -f <source-file> <copy-name>");
         return;
      }
      
      /* Create the input stream.  If an error occurs, end the program. */
      
      try {
         source = new FileInputStream(sourceName);
      }
      catch (FileNotFoundException e) {
         System.out.println("Can't find file \"" + sourceName + "\".");
         return;
      }
      
      /* If the output file already exists and the -f option was not
         specified, print an error message and end the program. */
   
      File file = new File(copyName);
      if (file.exists() && force == false) {
          System.out.println(
               "Output file exists.  Use the -f option to replace it.");
          return;  
      }
      
      /* Create the output stream.  If an error occurs, end the program. */

      try {
         copy = new FileOutputStream(copyName);
      }
      catch (IOException e) {
         System.out.println("Can't open output file \"" + copyName + "\".");
         return;
      }
      
      /* Copy one byte at a time from the input stream to the output
         stream, ending when the read() method returns -1 (which is 
         the signal that the end of the stream has been reached).  If any 
         error occurs, print an error message.  Also print a message if 
         the file has been copied successfully.  */
      
      byteCount = 0;
      
      try {
         while (true) {
            int data = source.read();
            if (data < 0)
               break;
            copy.write(data);
            byteCount++;
         }
         source.close();
         copy.close();
         System.out.println("Successfully copied " + byteCount + " bytes.");
      }
      catch (Exception e) {
         System.out.println("Error occurred while copying.  "
                                   + byteCount + " bytes copied.");
         System.out.println("Error: " + e);
      }
      
   }  // end main()
   
   
} // end class CopyFile

It is actually quite inefficient to copy one byte at a time. Efficiency could be improved by using alternative versions of the read() and write() methods that read and write multiple bytes (see the API for details). Alternatively, the input and output streams could be wrapped in objects of type BufferedInputStream and BufferedOutputStream which automatically read data from and write data to files in larger blocks. This would require changing only the two lines in the program that create the streams. For example, the input stream could be created using

source = new BufferedInputStream(new FileInputStream(sourceName));

The buffered stream would then be used in exactly the same way as the unbuffered stream.

There is also a sample program CopyFileAsResources.java that does the same thing as CopyFile but uses the resource pattern in a try..catch statement to make sure that the streams are closed in all cases. See the discussion at the end of Subsection 8.3.2)


11.3.2  Persistent Data

Once a program ends, any data that was stored in variables and objects in the program is gone. In many cases, it would be useful to have some of that data stick around so that it will be available when the program is run again. The problem is, how to make the data persistent between runs of the program? The answer, of course, is to store the data in a file (or, for some applications, in a database—but the data in a database is itself stored in files).

Consider a "phone book" program that allows the user to keep track of a list of names and associated phone numbers. The program would make no sense at all if the user had to create the whole list from scratch each time the program is run. It would make more sense to think of the phone book as a persistent collection of data, and to think of the program as an interface to that collection of data. The program would allow the user to look up names in the phone book and to add new entries. Any changes that are made should be preserved after the program ends.

The sample program PhoneDirectoryFileDemo.java is a very simple implementation of this idea. It is meant only as an example of file use; the phone book that it implements is a "toy" version that is not meant to be taken seriously. This program stores the phone book data in a file named ".phone_book_demo" in the user's home directory. To find the user's home directory, it uses the System.getProperty() method that was mentioned in Subsection 11.2.2. When the program starts, it checks whether the file already exists. If the file exists, it should contain the user's phone book, which was saved in a previous run of the program; in that case, the data from the file is read and entered into a TreeMap named phoneBook that represents the phone book while the program is running. (See Subsection 10.3.1.) In order to store the phone book in a file, some decision must be made about how the data in the phone book will be represented. For this example, I chose a simple representation in which each line of the file contains one entry consisting of a name and the associated phone number. A percent sign ('%') separates the name from the number. The following code at the beginning of the program will read the phone book data file, if it exists and has the correct format:

File userHomeDirectory = new File( System.getProperty("user.home") );
File dataFile = new File( userHomeDirectory, ".phone_book_data" );
        // A file named .phone_book_data in the user's home directory.

if ( ! dataFile.exists() ) {
   System.out.println("No phone book data file found.  A new one");
   System.out.println("will be created, if you add any entries.");
   System.out.println("File name:  " + dataFile.getAbsolutePath());
}
else {
   System.out.println("Reading phone book data...");
   try( Scanner scanner = new Scanner(dataFile) ) {
      while (scanner.hasNextLine()) {
             // Read one line from the file, containing one name/number pair.
         String phoneEntry = scanner.nextLine();
         int separatorPosition = phoneEntry.indexOf('%');
         if (separatorPosition == -1)
            throw new IOException("File is not a phonebook data file.");
         name = phoneEntry.substring(0, separatorPosition);
         number = phoneEntry.substring(separatorPosition+1);
         phoneBook.put(name,number);
      }
   }
   catch (IOException e) {
      System.out.println("Error in phone book data file.");
      System.out.println("File name:  " + dataFile.getAbsolutePath());
      System.out.println("This program cannot continue.");
      System.exit(1);
   }
}

The program then lets the user do various things with the phone book, including making modifications. Any changes that are made are made only to the TreeMap that holds the data. When the program ends, the phone book data is written to the file (if any changes have been made while the program was running), using the following code:

if (changed) {
   System.out.println("Saving phone directory changes to file " + 
         dataFile.getAbsolutePath() + " ...");
   PrintWriter out;
   try {
      out = new PrintWriter( new FileWriter(dataFile) );
   }
   catch (IOException e) {
      System.out.println("ERROR: Can't open data file for output.");
      return;
   }
   for ( Map.Entry<String,String> entry : phoneBook.entrySet() )
      out.println(entry.getKey() + "%" + entry.getValue() );
   out.flush();
   out.close();
   if (out.checkError())
      System.out.println("ERROR: Some error occurred while writing data file.");
   else
      System.out.println("Done.");
}

The net effect of this is that all the data, including the changes, will be there the next time the program is run. I've shown you all the file-handling code from the program. If you would like to see the rest of the program, see the source code.


11.3.3  Storing Objects in Files

Whenever data is stored in files, some definite format must be adopted for representing the data. As long as the output routine that writes the data and the input routine that reads the data use the same format, the files will be usable. However, as usual, correctness is not the end of the story. The representation that is used for data in files should also be robust. (See Section 8.1.) To see what this means, we will look at several different ways of representing the same data. This example builds on the example SimplePaint2.java from Subsection 7.3.3. (You might want to run it now to remind yourself of what it can do.) In that program, the user can use the mouse to draw simple sketches. Now, we will add file input/output capabilities to that program. This will allow the user to save a sketch to a file and later read the sketch back from the file into the program so that the user can continue to work on the sketch. The basic requirement is that all relevant data about the sketch must be saved in the file, so that the sketch can be exactly restored when the file is read by the program.

The new version of the program can be found in the source code file SimplePaintWithFiles.java. A "File" menu has been added to the new version. It implements "Save" and "Open" commands for writing program data to a file and reading saved data back into the program.

The data for a sketch consists of the background color of the picture and a list of the curves that were drawn by the user. A curve consists of a list of Points. A Point, pt, has instance variables x and y of type int that represent the coordinates of the point in the xy-plane. Each curve can be a different color. Furthermore, a curve can be "symmetric," which means that in addition to the curve itself, the horizontal and vertical reflections of the curve are also drawn. The data for each curve are stored in an object of type CurveData, which is defined in the program as:

/**
 * An object of type CurveData represents the data required to redraw one
 * of the curves that have been sketched by the user.
 */
private static class CurveData {
   Color color;  // The color of the curve.
   boolean symmetric;  // Are horizontal and vertical reflections also drawn?
   ArrayList<Point> points;  // The points on the curve.
}

Then, a list of type ArrayList<CurveData> is used to hold data for all of the curves that the user has drawn.

Let's think about how the data for a sketch could be saved to a text file. The basic idea is that all data necessary to reconstitute a sketch must be saved to the output file in some definite format. The method that reads the file must follow exactly the same format as it reads the data, and it must use the data to rebuild the data structures that represent the sketch while the program is running.

When writing character data, all of the data has to be expressed, ultimately, in terms of simple data values such as strings and primitive type values. A color, for example, can be expressed in terms of three numbers giving the red, green, and blue components of the color. The first (not very good) idea that comes to mind might be to just dump all the necessary data, in some definite order, into the file. Suppose that out is a PrintWriter that is used to write to the file. We could then say:

Color backgroundColor = getBackground();
out.println( backgroundColor.getRed() ); // Write background color to file.
out.println( backgroundColor.getGreen() );
out.println( backgroundColor.getBlue() );

out.println( curves.size() );       // Write the number of curves.
   
for ( CurveData curve : curves ) {  // For each curve, write...
   out.println( curve.color.getRed() );      // the color of the curve
   out.println( curve.color.getGreen() );   
   out.println( curve.color.getBlue() );
   out.println( curve.symmetric ? 0 : 1 );   // the curve's symmetry property
   out.println( curve.points.size() );       // the number of points on curve
   for ( Point pt : curve.points ) {       // the coordinates of each point
      out.println( pt.x );
      out.println( pt.y );
   }
}

This works in the sense that the file-reading method can read the data and rebuild the data structures. Suppose that the input method uses a Scanner named scanner to read the data file. Then it could say:

Color newBackgroundColor;                // Read the background Color.
int red = scanner.nextInt();
int green = scanner.nextInt();
int blue = scanner.nextInt();
newBackgroundColor = new Color(red,green,blue);

ArrayList<CurveData> newCurves = new ArrayList<>();
   
int curveCount = scanner.nextInt();      // The number of curves to be read.
for (int i = 0; i < curveCount; i++) {
   CurveData curve = new CurveData();
   int r = scanner.nextInt();            // Read the curve's color.
   int g = scanner.nextInt();
   int b = scanner.nextInt();
   curve.color = new Color(r,g,b);
   int symmetryCode = scanner.nextInt(); // Read the curve's symmetry property.
   curve.symmetric = (symmetryCode == 1);
   curveData.points = new ArrayList<>();
   int pointCount = scanner.nextInt();  // The number of points on this curve.
   for (int j = 0; j < pointCount; j++) {
      int x = scanner.nextDouble();        // Read the coordinates of the point.
      int y = scanner.nextDouble();
      curveData.points.add(new Point(x,y));
   }
   newCurves.add(curve);
}

curves = newCurves;                     // Install the new data structures.
setBackground(newBackgroundColor);

Note how every piece of data that was written by the output method is read, in the same order, by the input method. While this does work, the data file is just a long string of numbers. It doesn't make much more sense to a human reader than a binary-format file would. Furthermore, it is still fragile in the sense that any small change made to the data representation in the program, such as adding a new property to curves, will render the data file useless (unless you happen to remember exactly which version of the program created the file).

So, I decided to use a more complex, more meaningful data format for the text files created by my program. Instead of just writing numbers, I add words to say what the numbers mean. Here is a short but complete data file for the program; just by looking at it, you can probably tell what is going on:

SimplePaintWithFiles 1.0
background 110 110 180

startcurve
  color 255 255 255
  symmetry true
  coords 10 10
  coords 200 250
  coords 300 10
endcurve

startcurve
  color 0 255 255
  symmetry false
  coords 10 400
  coords 590 400
endcurve

The first line of the file identifies the program that created the data file; when the user selects a file to be opened, the program can check the first word in the file as a simple test to make sure the file is of the correct type. The first line also contains a version number, 1.0. If the file format changes in a later version of the program, a higher version number would be used; if the program sees a version number of 1.2 in a file, but the program only understands version 1.0, the program can explain to the user that a newer version of the program is needed to read the data file.

The second line of the file specifies the background color of the picture. The three numbers specify the red, green, and blue components of the color. The word "background" at the beginning of the line makes the meaning clear. The remainder of the file consists of data for the curves that appear in the picture. The data for each curve is clearly marked with "startcurve" and "endcurve." The data consists of the color and symmetry properties of the curve and the xy-coordinates of each point on the curve. Again, the meaning is clear. Files in this format can easily be created or edited by hand. In fact, the data file shown above was actually created in a text editor rather than by the program. Furthermore, it's easy to extend the format to allow for additional options. Future versions of the program could add a "thickness" property to the curves to make it possible to have curves with differing line widths. Shapes such as rectangles and ovals could easily be added.

Outputting data in this format is easy. Suppose that out is a PrintWriter that is being used to write the sketch data to a file. Then the output is be done with:

out.println("SimplePaintWithFiles 1.0"); // Version number.
out.println( "background " + backgroundColor.getRed() + " " +
        backgroundColor.getGreen() + " " + backgroundColor.getBlue() );
for ( CurveData curve : curves ) {
    out.println();
    out.println("startcurve");
    out.println("  color " + curve.color.getRed() + " " +
            curve.color.getGreen() + " " + curve.color.getBlue() );
    out.println( "  symmetry " + curve.symmetric );
    for ( Point pt : curve.points )
        out.println( "  coords " + pt.x + " " + pt.y );
    out.println("endcurve");
}

In the program, this code is used in a doSave() method that is similar to the writeFile() method that is presented in Subsection 11.2.3. The method uses a file dialog box to allow the user to select the output file.

Reading the data is somewhat harder, since the input routine has to deal with all the extra words in the data. In my input routine, I decided to allow some variation in the order in which the data occurs in the file. For example, the background color can be specified at the end of the file, instead of at the beginning. It can even be left out altogether, in which case white will be used as the default background color. This is possible because each item of data is labeled with a word that describes its meaning; the labels can be used to drive the processing of the input. Here is the complete method from SimplePaintWithFiles.java that reads data files created by the doSave() method. It uses a Scanner to read items from the file:

private void doOpen() {
    if (fileDialog == null)
        fileDialog = new JFileChooser();
    fileDialog.setDialogTitle("Select File to be Opened");
    fileDialog.setSelectedFile(null);  // No file is initially selected.
    int option = fileDialog.showOpenDialog(this);
    if (option != JFileChooser.APPROVE_OPTION)
        return;  // User canceled or clicked the dialog's close box.
    File selectedFile = fileDialog.getSelectedFile();
    Scanner scanner;
    try {
        Reader stream = new BufferedReader(new FileReader(selectedFile));
        scanner = new Scanner( stream );
    }
    catch (Exception e) {
        JOptionPane.showMessageDialog(this,
                "Sorry, but an error occurred while trying to open the file:\n" + e);
        return;
    }
    try {
        String programName = scanner.next();
        if ( ! programName.equals("SimplePaintWithFiles") )
            throw new IOException("File is not a SimplePaintWithFiles data file.");
        double version = scanner.nextDouble();
        if (version > 1.0)
            throw new IOException("File requires a newer version of SimplePaintWithFiles.");
        Color newBackgroundColor = Color.WHITE;
        ArrayList<CurveData> newCurves = new ArrayList<CurveData>();
        while (scanner.hasNext()) {
            String itemName = scanner.next();
            if (itemName.equalsIgnoreCase("background")) {
                int red = scanner.nextInt();
                int green = scanner.nextInt();
                int blue = scanner.nextInt();
                newBackgroundColor = new Color(red,green,blue);
            }
            else if (itemName.equalsIgnoreCase("startcurve")) {
                CurveData curve = new CurveData();
                curve.color = Color.BLACK;
                curve.symmetric = false;
                curve.points = new ArrayList<Point>();
                itemName = scanner.next();
                while ( ! itemName.equalsIgnoreCase("endcurve") ) {
                    if (itemName.equalsIgnoreCase("color")) {
                        int r = scanner.nextInt();
                        int g = scanner.nextInt();
                        int b = scanner.nextInt();
                        curve.color = new Color(r,g,b);
                    }
                    else if (itemName.equalsIgnoreCase("symmetry")) {
                        curve.symmetric = scanner.nextBoolean();
                    }
                    else if (itemName.equalsIgnoreCase("coords")) {
                        int x = scanner.nextInt();
                        int y = scanner.nextInt();
                        curve.points.add( new Point(x,y) );
                    }
                    else {
                        throw new Exception("Unknown term in input.");
                    }
                    itemName = scanner.next();
                }
                newCurves.add(curve);
            }
            else {
                throw new Exception("Unknown term in input.");
            }
        }
        scanner.close();
        setBackground(newBackgroundColor);
        curves = newCurves;
        repaint();
        editFile = selectedFile;
        setTitle("SimplePaint: " + editFile.getName());
    }
    catch (Exception e) {
        JOptionPane.showMessageDialog(this,
                "Sorry, but an error occurred while trying to read the data:\n" + e);
    }    
}

The main reason for this long discussion of file formats has been to get you to think about the problem of representing complex data in a form suitable for storing the data in a file. The same problem arises when data must be transmitted over a network. There is no one correct solution to the problem, but some solutions are certainly better than others. In Section 11.5, we will look at one solution to the data representation problem that has become increasingly common.


In addition to being able to save sketch data in text form, SimplePaintWithFiles can also save the picture itself as an image file that could be, for example, printed or put on a web page. This is a preview of image-handling techniques that will be covered in Subsection 13.1.5, and it uses techniques that I have not yet covered.


[ Previous Section | Next Section | Chapter Index | Main Index ]