CS 3291 (Java with Internet Applications):
Homework #3

Assigned:
October 5, 1999.

Due:
October 19, 1999.

Credit:
20 points.

Objective:
Write a program that (1) exploits inheritance and polymorphism and (2) performs Java file I/O.

Description:
For this program, you are to "fill in the blanks" of a skeleton program that behaves as follows. (If you find this description hard to follow, have a look at the sample output first and then come back to it.)
  1. It requires two or more command line arguments, the first a file name for its output and the rest a list (call it P) of other path or file names.
  2. It builds a list of "file info objects", one for each path or file name in P that exists and is readable. It classifies each path or file name as one of the following:
    • A directory.
    • A text file. Text files are assumed to contain ASCII text; their names end in .txt.
    • A C source file. C source files are assumed to contain ASCII text that looks like C source code; their names end in .c.
    • A Fortran source file. Fortran source files are assumed to contain ASCII text that looks like Fortran source code; their names end in .f.
    • A file of unknown type -- anything that doesn't fit one of the above categories.
    In the description below, text files and the two kinds of program source files are collectively referred to as ASCII-text files.
  3. It repeatedly prompts the user to choose one of the following options:
    • 'S' for file sizes
    • 'L' for file line counts
    • 'X' for file line counts excluding whitespace and comments
    • 'C' for file contents
    • 'Q' to exit
    (The program should reject any input that doesn't correspond to one of these options and prompt again. Nothing should be written to the output file in that case.) For option 'Q', the program prints an "end of input" message and exits. For every other option, the program cycles through its list of files (or "file info objects") and for each one writes the appropriate information to the output file. What constitutes "appropriate information" is determined by the file type:
    • File size is always the size of the file or directory in bytes.
    • Line count is computed only for ASCII-text files and is simply the number of lines in the file, where lines are separated by the system line separator character(s). For files of other types, the 'L' option prints a message to the effect that it can't compute a line count. (See sample output below.)
    • Line count excluding whitespace and comments is also computed only for ASCII-text files:
      • For text files, it's the number of lines in the file (lines separated as above) that contain something other than whitespace (blanks and tabs).
      • For the two types of program-source files, it's the number of lines in the file that contain something other than whitespace and program comments.
        • For C, comments begin with /* and end with */ and are not nested.
        • For Fortran, comments begin with C or c in the first position of the line and continue to the end of the line.
      For files of other types, the 'X' option behaves like the 'L' option above.
    • File contents are printed only for directories and ASCII-text files; for other files, the 'C' option prints a message to the effect that file contents can't be displayed. (See sample output.) For directories, the 'C' option prints a list of files in the directory (or an empty-directory message if it contains no files). For ASCII-text files, the 'C' option prints the contents of the file, one line at a time (or an empty-file message if the file contains no lines).
See the Detailed description section below for the desired format for the output file.
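The 'L' and 'X' counts for a plain text file can be sketched as follows. This is only a minimal illustration and is not part of the FilesInfo.java framework: the class name LineCounts and its count() method are hypothetical, and the sketch assumes a line counts as whitespace-only when trim() leaves it empty (trim() strips blanks, tabs, and any other leading or trailing characters at or below the space character).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, not part of the supplied framework.
class LineCounts {

    // Returns {total lines, lines containing something other than whitespace}.
    static int[] count(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        int total = 0;
        int nonBlank = 0;
        String line;
        // readLine() returns null at end of file; it does not throw.
        while ((line = in.readLine()) != null) {
            total++;
            if (line.trim().length() > 0) {
                nonBlank++;
            }
        }
        return new int[] { total, nonBlank };
    }

    public static void main(String[] args) throws IOException {
        String text = "some text\n\t\t\n   \nmore text\n";
        int[] counts = count(new StringReader(text));
        // prints "lines: 4, non-blank: 2"
        System.out.println("lines: " + counts[0] + ", non-blank: " + counts[1]);
    }
}
```

Passing a java.io.FileReader in place of the StringReader would apply the same loop to a real file.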

Instructions:
  1. Review the Detailed description and Helpful hints sections below.
  2. Download the sample program framework FilesInfo.java. Also download Input.java (or you may use an existing copy of the Input class, if you used it for Homework #2). This framework provides most of the code for the main() method, including user-interface code. It also defines a framework for a FileInfoObj class; the idea is that you are to build an instance of FileInfoObj (or of some subclass of FileInfoObj you define) for each of the files in the program's list, and use the class's methods to print the various kinds of information.
  3. Modify the framework as needed. Comments of the form // PUT SOMETHING HERE: flag the spots where you need or may want to add or change something.
  4. Test the program until you are satisfied that it works.
  5. Submit your Java source file, as described in How to submit homework.

Helpful hints:
  1. The program framework, in its current form, should compile without error and execute without generating exceptions. It doesn't attempt to correctly process the list of path or file names or generate an output file, but it does show how the standard-input / standard-output user interface works.
  2. The program framework uses methods of our Input class to obtain input from standard input. For this assignment, you should not use the Input class to read from any files other than standard input.
  3. You may find the following classes helpful in filling in the details of the program:
    • java.io.File.
    • java.io.BufferedReader. While this class appears to have problems when its source is keyboard input, it should work fine when its source is a file. Note that unlike the methods of the Input class, the methods of the BufferedReader class do not signal end of file by throwing an exception.
    • java.io.PrintWriter.
    • String. Note the trim() method.
    • java.io.StreamTokenizer.
    See pages 286-289 of Exploring Java for brief examples of the use of some of these classes. See also sample programs TestFile.java, TestFilters.java, and TestStreamTokenizer.java.
  4. You may also find it helpful to look at a couple of sample programs showing inheritance and polymorphism: the contrived example NumObjs.java from February 19's class, or the earlier example RegisteredVehicleTest.java.
  5. Note that you are not required to decide whether the contents of a supposed Fortran or C source file represent valid Fortran or C code; all you need to know about these languages is what comments look like. The Detailed description section below discusses that, and the sample output contains examples.
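As a small illustration of the inheritance and polymorphism mentioned in hint 4, here is one way a base class can supply a default "can't compute" message while a subclass overrides it. All names here (Entry, DirEntry, TextEntry, UnknownEntry, lineCountReport) are hypothetical; the FileInfoObj framework you download may be organized differently, and the line count is passed in as a number rather than read from a file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; the real FileInfoObj framework may differ.
class PolymorphismSketch {

    abstract static class Entry {
        protected final String name;

        Entry(String name) {
            this.name = name;
        }

        abstract String kind();

        // Default behavior for types that have no line count.
        String lineCountReport() {
            return "for " + name + ":  ?? (" + kind() + ")";
        }
    }

    static class DirEntry extends Entry {
        DirEntry(String name) { super(name); }
        String kind() { return "directory"; }
    }

    static class UnknownEntry extends Entry {
        UnknownEntry(String name) { super(name); }
        String kind() { return "unknown file type"; }
    }

    static class TextEntry extends Entry {
        private final int lines; // stands in for a count read from the file

        TextEntry(String name, int lines) {
            super(name);
            this.lines = lines;
        }

        String kind() { return "text"; }

        @Override
        String lineCountReport() {
            return "for " + name + ":  " + lines;
        }
    }

    // The caller never checks the type; each object picks its own behavior.
    static List<String> report(List<Entry> entries) {
        List<String> out = new ArrayList<String>();
        for (Entry e : entries) {
            out.add(e.lineCountReport());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<Entry>();
        entries.add(new DirEntry("Sub"));
        entries.add(new TextEntry("tfile.txt", 11));
        entries.add(new UnknownEntry("otherfile"));
        for (String line : report(entries)) {
            System.out.println(line);
        }
    }
}
```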

Detailed description:

Path and file names

The exact format of path and file names is somewhat system-dependent. Your program should be prepared to accept anything that's accepted by the java.io.File class. (If in doubt about a particular pathname, try it with the TestFile sample program.) For this program, there is no need to worry about differences between systems, since you don't have to parse path or file names in any way that could be system-dependent.
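A minimal sketch of handing command-line names straight to java.io.File and keeping only those that exist and are readable is shown below. The class name ReadablePaths and the select() method are hypothetical and not part of the framework; the sketch assumes the first argument is the output file name and is therefore skipped.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the supplied framework.
class ReadablePaths {

    // Keeps the names from index 'from' onward that exist and are readable,
    // in the order given. No parsing of the names is done here.
    static List<File> select(String[] names, int from) {
        List<File> kept = new ArrayList<File>();
        for (int i = from; i < names.length; i++) {
            File f = new File(names[i]);
            if (f.exists() && f.canRead()) {
                kept.add(f);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // args[0] is assumed to be the output file name; the rest form the list P.
        for (File f : select(args, 1)) {
            System.out.println("Adding " + f.getPath() + " to list of files");
        }
    }
}
```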

Classifying files

Earlier I said that the program classifies each file as either a directory, a text file, a C source file, a Fortran source file, or "other". The rules for deciding how to classify each file are simple:
  • A pathname that represents a directory is a directory. (The File class gives you a way of checking this.)
  • A pathname that ends in .txt is a text file. You can assume that any such file contains ASCII text, with lines separated by the system line separator (as recognized by methods of Java's I/O classes).
  • A pathname that ends in .c is a C source file. You can assume that any such file contains ASCII text as for a text file, with the text looking like C source, at least with respect to comments.
  • A pathname that ends in .f is a Fortran source file. You can assume that any such file contains ASCII text as for a text file, with the text looking like Fortran source, at least with respect to comments.
  • Anything else is a file of unknown type ("other").
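These rules can be sketched with java.io.File as follows. The class name Classify, the method kindOf(), and the returned labels are hypothetical. The sketch also makes two assumptions the rules above do not spell out: the directory test is applied first (so a directory whose name happens to end in .txt is still a directory), and the suffix comparison is case-sensitive.

```java
import java.io.File;

// Hypothetical helper, not part of the supplied framework.
class Classify {

    static String kindOf(File f) {
        // Assumption: check for a directory before looking at the name.
        if (f.isDirectory()) {
            return "directory";
        }
        String name = f.getName();
        if (name.endsWith(".txt")) {
            return "text";
        }
        if (name.endsWith(".c")) {
            return "C source";
        }
        if (name.endsWith(".f")) {
            return "Fortran source";
        }
        // Anything that matches none of the rules above.
        return "unknown";
    }

    public static void main(String[] args) {
        for (int i = 0; i < args.length; i++) {
            System.out.println(args[i] + ": " + kindOf(new File(args[i])));
        }
    }
}
```

In a design based on inheritance, a method like this would more naturally choose which subclass to instantiate than return a string.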

Identifying comments in "C source files"

For option 'X', you are to produce a count of lines that excludes whitespace and comments. For C source files, comments begin with /* and end with */ and cannot be nested. (Ideally, you should ignore /* and */ if they occur within a quoted string. To make coding easier, you can assume that this never happens.) Your count should exclude these lines:
/* a comment here */
/* a multi-line comment
that continues here */
since they consist of comments only, but include these lines:
printf("hello\n"); /* a comment */
/* a comment */ printf("hello\n");
since they contain code in addition to comments.

Hint: You can do your own parsing for this task, or see CountCLines.java (with sample input file CountCLines.IN) for an example of how to set up a StreamTokenizer to accomplish the desired parsing.
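If you choose to do your own parsing, one possible approach is a character scan that remembers whether it is inside a comment from one line to the next. The sketch below is not taken from CountCLines.java and does not use StreamTokenizer; the class name CCodeLines and its count() method are hypothetical. It relies on the simplifying assumption stated above that /* and */ never occur inside a quoted string, and it takes the lines as an array for brevity (in your program they would come from readLine()).

```java
// Hypothetical helper, not part of the supplied framework.
class CCodeLines {

    // Counts lines containing something other than whitespace and comments.
    static int count(String[] lines) {
        boolean inComment = false; // carried across lines for multi-line comments
        int codeLines = 0;
        for (String line : lines) {
            boolean sawCode = false;
            int i = 0;
            while (i < line.length()) {
                if (inComment) {
                    if (line.startsWith("*/", i)) {
                        inComment = false;
                        i += 2;
                    } else {
                        i++;
                    }
                } else if (line.startsWith("/*", i)) {
                    inComment = true;
                    i += 2;
                } else {
                    if (!Character.isWhitespace(line.charAt(i))) {
                        sawCode = true;
                    }
                    i++;
                }
            }
            if (sawCode) {
                codeLines++;
            }
        }
        return codeLines;
    }

    public static void main(String[] args) {
        String[] lines = {
            "/* a comment here */",
            "/* a multi-line comment",
            "that continues here */",
            "printf(\"hello\\n\"); /* a comment */",
            "/* a comment */ printf(\"hello\\n\");"
        };
        System.out.println(count(lines)); // prints 2
    }
}
```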

Identifying comments in "Fortran source files"

Comments in Fortran source are much easier to recognize: A line that has a C or a c as its first character is a comment line (that is, the whole line is a comment). (For those of you familiar with newer versions of Fortran: Yes, this means we're assuming our Fortran source is in the older fixed format rather than the newer free format that allows other forms of comments.)
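A minimal sketch of this rule, combined with the whitespace check used for text files, might look like the following. The class name FortranCodeLines and its methods are hypothetical, and the lines are again supplied as an array rather than read from a file.

```java
// Hypothetical helper, not part of the supplied framework.
class FortranCodeLines {

    // A comment line has C or c as its very first character.
    static boolean isCommentLine(String line) {
        return line.length() > 0
            && (line.charAt(0) == 'C' || line.charAt(0) == 'c');
    }

    // Counts lines that are neither comment lines nor whitespace-only.
    static int count(String[] lines) {
        int codeLines = 0;
        for (String line : lines) {
            if (!isCommentLine(line) && line.trim().length() > 0) {
                codeLines++;
            }
        }
        return codeLines;
    }

    public static void main(String[] args) {
        String[] lines = {
            "c my program",
            "      program fpgm",
            "C another comment",
            "      print*, \"hello, world\"",
            "",
            "      end"
        };
        System.out.println(count(lines)); // prints 3
    }
}
```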

Example input/output

Suppose we have a directory TestDir containing:

  • files cpgm.c, empty.txt, otherfile, fpgm.f, and tfile.txt.
  • empty subdirectory EmptySub.
  • subdirectory Sub containing files f1 and f2.

Then the following is a possible execution of our program (user input is shown in boldface), in directory TestDir:

java FilesInfo Test.OUT EmptySub Sub cpgm.c empty.txt otherfile fpgm.f tfile.txt qwertyuiop
Adding EmptySub to list of files
Adding Sub to list of files
Adding cpgm.c to list of files
Adding empty.txt to list of files
Adding fpgm.f to list of files
Adding otherfile to list of files
Adding tfile.txt to list of files
qwertyuiop not found or not readable
Options are:
S to get file sizes
L to count lines in files
X to count lines in files excluding whitespace, comments
C to get file contents
Q to end
Enter option: (S, L, X, C, or Q)
S
printing sizes
Enter option: (S, L, X, C, or Q)
C
printing contents
Enter option: (S, L, X, C, or Q)
xx
Invalid option
Options are:
S to get file sizes
L to count lines in files
X to count lines in files excluding whitespace, comments
C to get file contents
Q to end
Enter option: (S, L, X, C, or Q)
S
printing sizes
Enter option: (S, L, X, C, or Q)
L
printing line counts
Enter option: (S, L, X, C, or Q)
X
printing line counts excluding whitespace, comments
Enter option: (S, L, X, C, or Q)
Q
which writes the following to the specified output file Test.OUT:
**Sizes** 
for EmptySub:  512 
for Sub:  512 
for cpgm.c:  236 
for empty.txt:  0 
for fpgm.f:  75 
for otherfile:  7 
for tfile.txt:  141 
**done** 
 
**Contents** 
for EmptySub:  list of files in directory 
  (empty directory) 
for Sub:  list of files in directory 
  f1 
  f2 
for cpgm.c:  text 
  /* here's a comment that starts a line 
  */ 
   
        /* here's another */ 
   
  #include "stdio.h"    /* here's another comment */ 
   
  void                  /* and here's one 
                        */ 
  main(int argc, int argv[]) { 
   
        /* here's one more */ (void) printf("hello, world\n") ; 
   
  } 
for empty.txt:  text 
  (empty file) 
for fpgm.f:  text 
  c my program 
        program fpgm 
  C another comment 
        print*, "hello, world" 
   
        end 
for otherfile:   
  ?? unable to display contents (unknown file type) ?? 
for tfile.txt:  text 
  some text  
  next line is just tabs 
                 
  next line is just blanks 
    
  next line is mixed tabs and blanks 
                  
  next line is empty 
   
  more text 
  the end 
**done** 
 
**Sizes** 
for EmptySub:  512 
for Sub:  512 
for cpgm.c:  236 
for empty.txt:  0 
for fpgm.f:  75 
for otherfile:  7 
for tfile.txt:  141 
**done** 
 
**Line counts** 
for EmptySub:  ?? (directory) 
for Sub:  ?? (directory) 
for cpgm.c:  14 
for empty.txt:  0 
for fpgm.f:  6 
for otherfile:  ?? (unknown file type) 
for tfile.txt:  11 
**done** 
 
**Line counts excluding whitespace, comments** 
for EmptySub:  ?? (directory) 
for Sub:  ?? (directory) 
for cpgm.c:  5 
for empty.txt:  0 
for fpgm.f:  3 
for otherfile:  ?? (unknown file type) 
for tfile.txt:  7 
**done** 
 
**End of user input** 
Note that the above is for execution on a Unix system. Results on a DOS-based system should be identical, except for file and directory sizes. Directory sizes are non-zero multiples of 512 under Unix, 0 (I believe) under Windows; file sizes may depend on what the system uses as a line separator. What you are to print is what's returned by the length() method of the File class. Output may thus look a bit different on different systems, but this shouldn't be a concern -- when I test your program, I'll test it on the same system I use for preparing the expected output, so all should be well. Also, conventions for specifying pathnames may be different on different systems, but again this should not be a concern, for the same reason.
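As a rough sketch of the sizes section, the code below prints whatever File.length() returns for each entry through a PrintWriter. The class name SizeReport and the method writeSizes() are hypothetical. The layout is only modeled on the sample above and does not try to reproduce any trailing spaces, so check the exact spacing against Test.OUT yourself.

```java
import java.io.File;
import java.io.PrintWriter;

// Hypothetical helper, not part of the supplied framework.
class SizeReport {

    // Writes one "for <name>:  <size>" line per file, framed as in the sample.
    static void writeSizes(PrintWriter out, File[] files) {
        out.println("**Sizes**");
        for (File f : files) {
            // length() is reported as-is, for directories as well as files.
            out.println("for " + f.getPath() + ":  " + f.length());
        }
        out.println("**done**");
        out.println();
    }

    public static void main(String[] args) {
        File[] files = new File[args.length];
        for (int i = 0; i < args.length; i++) {
            files[i] = new File(args[i]);
        }
        PrintWriter out = new PrintWriter(System.out);
        writeSizes(out, files);
        out.flush();
    }
}
```

To write to the output file instead of standard output, the PrintWriter could be wrapped around a java.io.FileWriter opened on the output file name.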

NOTE: For this program, I'm concerned only with what your program writes to its output file. So given the inputs above, your program should produce an output file identical to Test.OUT above. What it writes to standard output, however, is up to you; you can consider the standard-output part of the above dialog a guideline only.

Grading:
This homework is worth 20 points: