Introduction to Programming (CPSC 124)
—Hobart & William Smith Colleges, Fall 2014
Project #6

Due by 11:00 am on Saturday, 11/15/2014 (extended from the start of class on Friday, 11/14/2014)

READ ME

This project is a departure from what we've done so far in the class. It is much more open-ended, and it is much more substantial. You have 10 days to complete it, but do not assume you can begin it the day before the due date and expect to finish on time.

This project must be done in groups of two or three people, preferably three. This is not optional, as the experience of teamwork is one of its principal goals. All teams must be formed by Wednesday evening, November 5th (in fact, we'll probably have it done by the end of class that day).

Text Formatting Via Markup Language Processing

Automated typesetting is one of the oldest and most pervasive consumer applications for computers. The earliest systems, such as RUNOFF (http://en.wikipedia.org/wiki/TYPSET_and_RUNOFF), were developed to support specialized markup languages (http://en.wikipedia.org/wiki/Markup_language). Such languages consist of commands that are used to annotate a document's displayable content, controlling various aspects of that display. This has proven to be a highly effective approach to the design of typesetting and text-formatting systems. Even today, it remains the leading approach, with such widely used markup languages as HTML/CSS, RTF, Open Document Format, Office Open XML, and TeX (http://en.wikipedia.org/wiki/Comparison_of_document_markup_languages).

In this assignment you and your team will:

The program you build only needs to support plain-text formatting, not full typesetting or page rendering. The difference is that typesetting and page rendering involve many advanced features that cannot be displayed in plain text alone, such as control of font shape and size, colors, inclusion of graphics, support for scripts, and so on. Your program only needs to produce a plain text file. This could include a few interesting escape characters, for example page breaks (http://en.wikipedia.org/wiki/Page_break) or colors (http://en.wikipedia.org/wiki/ANSI_escape_code), but neither is required.
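If you do choose to emit escape characters, they are just ordinary characters written into the output. The Java sketch below shows the ASCII form feed (character code 12, which many printers and pagers treat as a page break) and the ANSI "select graphic rendition" sequences for red text and for resetting attributes. The class and method names are hypothetical, and whether the colors actually appear depends on the terminal used to view the file.

```java
/**
 * Hypothetical helper holding two optional escape sequences.
 * Whether they take effect depends on the program used to view the output.
 */
class EscapeCodes {
    static final char FORM_FEED = '\f';            // ASCII 12: "start a new page"
    static final String ANSI_RED = "\u001B[31m";   // ESC [ 31 m: red foreground
    static final String ANSI_RESET = "\u001B[0m";  // ESC [ 0 m: reset all attributes

    /** Pre: text != null. Post: returns text wrapped in red/reset codes. */
    static String red(String text) {
        return ANSI_RED + text + ANSI_RESET;
    }

    /** Post: returns the one-character string that separates two pages. */
    static String pageBreak() {
        return String.valueOf(FORM_FEED);
    }
}
```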

The choice of features, how they are specified, and how many your system offers is purposefully open-ended. Grade-wise, there are several levels of accomplishment possible (see "Standards", below), but I want you all to use your creativity amongst yourselves for this!

Some Ideas

The basic idea behind all markup-based text processing is the separation of content from presentation. The actual content to display is stored in a plain text file, perhaps with special commands that annotate parts of the document, according to some standard file format. A text processing or typesetting system would read the markup commands and content from this file and produce an output file formatted according to these commands.
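The read, format, write cycle just described might be organized as in this minimal Java sketch. The class name MarkupProcessor and its format method are hypothetical placeholders: format simply copies its input here, and it is where your own formatting rules would go.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical skeleton: read the input file, format it, write the output file. */
class MarkupProcessor {

    /**
     * Placeholder formatting step: copies the lines unchanged.
     * Pre: input != null. Post: returns a new list of output lines.
     */
    static List<String> format(List<String> input) {
        return new ArrayList<>(input);
    }

    /** Reads every line of inFile, formats them, and writes the result to outFile. */
    static void process(Path inFile, Path outFile) throws IOException {
        List<String> lines = Files.readAllLines(inFile);
        Files.write(outFile, format(lines));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java MarkupProcessor input.txt output.txt");
            return;
        }
        process(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Keeping file handling separate from format means the formatting logic can be tested on in-memory lists without touching the file system.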

A Basic Form Idea

For example, we could adopt a simple convention that allowed us to distinguish between paragraphs: separate them by a blank line (this is what TeX does, for example).
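Recognizing this convention amounts to grouping the lines that fall between blank lines. A minimal Java sketch, assuming a hypothetical ParagraphSplitter class that treats whitespace-only lines as blank and joins each paragraph's lines with single spaces:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups input lines into paragraphs separated by blank lines. */
class ParagraphSplitter {

    /**
     * Pre: lines != null.
     * Post: returns the paragraphs in order, each as one string; none is empty.
     */
    static List<String> split(List<String> lines) {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder(); // paragraph being built
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                flush(current, paragraphs);          // a blank line ends a paragraph
            } else {
                if (current.length() > 0) {
                    current.append(' ');             // join input lines with one space
                }
                current.append(trimmed);
            }
        }
        flush(current, paragraphs);                  // the last paragraph may not be followed by a blank line
        return paragraphs;
    }

    /** Moves a non-empty paragraph from buf into out, then clears buf. */
    private static void flush(StringBuilder buf, List<String> out) {
        if (buf.length() > 0) {
            out.add(buf.toString());
            buf.setLength(0);
        }
    }
}
```

Several consecutive blank lines produce no empty paragraphs, which is usually what a user expects.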

Files given in this format would then be used to produce an output file "typeset" in plain text according to rules like these:

A program that supported this set of commands and formatting rules would, for example, read this text

Messiaen's music has been described as outside the western musical tradition,
although growing out of that tradition and being influenced by it. Much of his
output denies the western conventions of forward motion, development and
diatonic harmonic resolution. This is partly due to the symmetries of his
technique-for instance the modes of limited transposition do not admit the
conventional cadences found in western classical music. 

His youthful love for the fairy-tale element in Shakespeare prefigured his later
expressions of Catholic liturgy. Messiaen was not interested in depicting
aspects of theology such as sin; rather he concentrated on the theology of joy,
divine love and redemption.

and produce this:

          Messiaen's music has been described as outside the western 

     musical tradition, although growing out of that tradition and being 

     influenced by it. Much of his output denies the western conventions 

     of forward motion, development and diatonic harmonic resolution. This 

     is partly due to the symmetries of his technique-for instance the 

     modes of limited transposition do not admit the conventional cadences 

     found in western classical music. 

          His youthful love for the fairy-tale element in Shakespeare 

     prefigured his later expressions of Catholic liturgy. Messiaen was 

     not interested in depicting aspects of theology such as sin; rather 

     he concentrated on the theology of joy, divine love and redemption. 
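The following Java sketch approximates output like the above. It assumes values read off the sample: a five-space left margin, five additional spaces at the start of each paragraph, a maximum line length of 75 characters, and a blank line after every output line. It uses a greedy strategy: put as many words on a line as fit, then start a new line. All names and constants are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical greedy word wrapper for one paragraph. */
class ParagraphWrapper {
    static final int LEFT_MARGIN = 5;   // spaces before every line (assumed)
    static final int FIRST_INDENT = 5;  // extra spaces on a paragraph's first line (assumed)
    static final int WIDTH = 75;        // maximum line length, margins included (assumed)

    /**
     * Pre: paragraph contains at least one word.
     * Post: returns the output lines, each followed by a blank line (double spacing).
     * A word too long for the width is placed on a line by itself, not split.
     */
    static List<String> wrap(String paragraph) {
        List<String> out = new ArrayList<>();
        String[] words = paragraph.trim().split("\\s+");
        StringBuilder line = new StringBuilder(spaces(LEFT_MARGIN + FIRST_INDENT));
        boolean lineHasWord = false; // false until the current line holds a word
        for (String word : words) {
            int needed = lineHasWord ? word.length() + 1 : word.length();
            if (lineHasWord && line.length() + needed > WIDTH) {
                out.add(line.toString());  // current line is full
                out.add("");               // double spacing
                line = new StringBuilder(spaces(LEFT_MARGIN));
                lineHasWord = false;
            }
            if (lineHasWord) {
                line.append(' ');
            }
            line.append(word);
            lineHasWord = true;
        }
        out.add(line.toString());
        out.add("");
        return out;
    }

    /** Pre: n >= 0. Post: returns a string of n spaces. */
    static String spaces(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(' ');
        }
        return sb.toString();
    }
}
```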

Some Markup Tag Ideas

Perhaps you want to add support for distinguishing between text used for headings and text used for paragraphs. To do this, you could introduce "tags" for each, i.e., simple commands that are placed at the beginning and end of their associated content.

For example, the beginning of a heading could be specified with the command ".h" on a line by itself, with the text of the heading starting on the next line. If the heading text is given on multiple lines, those line breaks are preserved, and each line is centered horizontally. On the line after the end of the heading text, we'd have an "end of heading" command such as "./h", again on a line by itself. At the end of a heading, two blank lines are added to the output. Similarly, the beginning of each paragraph could be specified with the command ".p", given on a line by itself, with the paragraph itself beginning on the line after. We could use a command like "./p" to mark the end of that paragraph, and like the heading commands, it would be given on a line by itself. (By the way, this is a simplification of the way HTML tags work.)
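Centering a heading line means padding it on the left with half of the unused width. A minimal Java sketch, assuming a hypothetical page width of 80 characters and a hypothetical HeadingFormatter class:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: centers heading lines and adds the trailing blank lines. */
class HeadingFormatter {
    static final int WIDTH = 80; // assumed page width in characters

    /**
     * Pre: headingLines != null.
     * Post: each line is trimmed and left-padded so it is centered within WIDTH
     * (lines longer than WIDTH get no padding); two blank lines follow the heading.
     */
    static List<String> format(List<String> headingLines) {
        List<String> out = new ArrayList<>();
        for (String raw : headingLines) {
            String text = raw.trim();
            int pad = Math.max(0, (WIDTH - text.length()) / 2); // half the unused width
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < pad; i++) {
                line.append(' ');
            }
            out.add(line.append(text).toString());
        }
        out.add("");
        out.add("");
        return out;
    }
}
```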

For example,

.h
Harmony, counterpoint and form
./h
.p
Sorabji's counterpoint stems from Busoni's and Reger's; writers have described
it as more successful than that of the latter. The influence of these
composers led Sorabji to employ various baroque contrapuntal forms (chorale
prelude, passacaglia, fugue and others), but he rejected the symmetry and forms
that characterise the music of composers such as Mozart and Brahms. Sorabji
was dismissive of the Classical style, mainly because he saw it as restricting
the musical material to conform to a "ready-made mould", and his musical
thinking is closer to that of the Baroque era than to the Classical.
./p
.p
Ornamentation assumes a preeminent role in much of Sorabji's music. His harmonic
language, which frequently combines tonal and atonal elements, is thus freer
than in the music of many other composers and less amenable to
analysis. Like many other 20th-century composers, Sorabji displays a
fondness for tritone and semitone relationships. The opening gesture of his
Fourth Piano Sonata, for example, emphasises these two intervals, and the two
long pedal points in its third movement are a tritone apart. However, some
people have remarked that his music rarely contains the tension that is commonly
associated with very dissonant music.
./p

might be formatted as

                         Harmony, counterpoint and form


         Sorabji's counterpoint stems from Busoni's and Reger's; writers  

     have described it as more successful than that of the latter. The 

     influence of these composers led Sorabji to employ various baroque 

     contrapuntal forms (chorale prelude, passacaglia, fugue and others), 

     but he rejected the symmetry and forms that characterise the music of 

     composers such as Mozart and Brahms. Sorabji was dismissive of the 

     Classical style, mainly because he saw it as restricting the musical 

     material to conform to a "ready-made mould", and his musical thinking 

     is closer to that of the Baroque era than to the Classical.

         Ornamentation assumes a preeminent role in much of Sorabji's 

     music. His harmonic language, which frequently combines tonal and 

     atonal elements, is thus freer than in the music of many other 

     composers and less amenable to analysis. Like many other 20th-century 

     composers, Sorabji displays a fondness for tritone and semitone 

     relationships. The opening gesture of his Fourth Piano Sonata, for 

     example, emphasises these two intervals, and the two long pedal points 

     in its third movement are a tritone apart. However, some people have 

     remarked that his music rarely contains the tension that is commonly

     associated with very dissonant music.
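Before any wrapping or centering, a program supporting these tags has to group the input lines into tagged blocks. One possible Java sketch follows. The TagParser and Block names are hypothetical, and its policy of rejecting text outside any tag (rather than silently dropping it) is a design choice, not a requirement.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for the ".h"/"./h" and ".p"/"./p" tags, each on a line by itself. */
class TagParser {

    /** One tagged block: its kind ("h" or "p") and its content lines. */
    static class Block {
        final String kind;
        final List<String> lines = new ArrayList<>();
        Block(String kind) { this.kind = kind; }
    }

    /**
     * Pre: lines != null.
     * Post: returns the blocks in input order; blank lines between blocks are skipped.
     * Throws IllegalArgumentException on a nested, mismatched, or missing tag,
     * or on non-blank text outside any block.
     */
    static List<Block> parse(List<String> lines) {
        List<Block> blocks = new ArrayList<>();
        Block open = null; // block currently being filled, or null
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            int lineNo = i + 1; // 1-based, for error messages
            if (line.equals(".h") || line.equals(".p")) {
                if (open != null) {
                    throw new IllegalArgumentException(
                        "line " + lineNo + ": " + line + " inside an unclosed ." + open.kind);
                }
                open = new Block(line.substring(1));        // ".h" -> "h"
            } else if (line.equals("./h") || line.equals("./p")) {
                if (open == null || !open.kind.equals(line.substring(2))) { // "./h" -> "h"
                    throw new IllegalArgumentException("line " + lineNo + ": unexpected " + line);
                }
                blocks.add(open);
                open = null;
            } else if (open != null) {
                open.lines.add(lines.get(i));               // keep the original text
            } else if (!line.isEmpty()) {
                throw new IllegalArgumentException("line " + lineNo + ": text outside any tag");
            }
        }
        if (open != null) {
            throw new IllegalArgumentException("missing ./" + open.kind + " at end of input");
        }
        return blocks;
    }
}
```

Reporting the offending line number makes markup mistakes much easier for a user to find.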

Some other choices include tags for left, right, and full justification of lines, inserting form feeds after a certain number of lines, inserting page numbers, setting of tab stops, left and right indentation blocks, and so on. It is even possible to produce multi-column output on a page, though this is a very challenging task.
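Of these, full justification is a good example of a small algorithm: once the words for a line are chosen, the leftover space is divided among the gaps between words. A Java sketch of one common strategy, in which the leftmost gaps receive any remainder, with hypothetical names:

```java
/** Hypothetical helper for fully justifying a single line of words. */
class Justifier {

    /**
     * Pre: words.length >= 1, and width >= (total letters) + (words.length - 1).
     * Post: with two or more words, returns a line of exactly width characters;
     * a single word is returned unpadded.
     */
    static String justify(String[] words, int width) {
        if (words.length == 1) {
            return words[0];
        }
        int letters = 0;
        for (String w : words) {
            letters += w.length();
        }
        int gaps = words.length - 1;
        int totalSpaces = width - letters;
        int base = totalSpaces / gaps;   // spaces every gap receives
        int extra = totalSpaces % gaps;  // the first `extra` gaps get one more
        StringBuilder sb = new StringBuilder(words[0]);
        for (int g = 0; g < gaps; g++) {
            int n = base + (g < extra ? 1 : 0);
            for (int k = 0; k < n; k++) {
                sb.append(' ');
            }
            sb.append(words[g + 1]);
        }
        return sb.toString();
    }
}
```

For example, justify(new String[] {"a", "b", "c"}, 6) yields "a  b c". The last line of a paragraph is conventionally left unjustified.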

Turn In

You will turn in two kinds of documents for this assignment:

  1. Your source code files, both paper copies and electronically. For the electronic version, please send me a message telling me where I will find your group's work.
  2. A document that describes the various features of your markup file format: essentially a tutorial on how to use it. It should be clear enough that a user of your program (in particular, me) can easily format her or his own files with your system.

Standards

Build your code, early and often!

As with all assignments in this course, your code must be syntactically and semantically correct, which means that it has to compile successfully. Because of the open-ended nature of this assignment, partial credit can still be earned for code efforts that are not yet complete, working programs. However, any program that cannot be compiled successfully due to syntax or semantic errors will earn a severe grade penalty, with no more than 50% credit possible.

C-/C/C+ level work

Satisfactory efforts are any programs that can successfully read an input file and write the resulting formatted text, without losing any of its content. At this level, you have designed a few features along the lines of the "Basic Form" or "Markup Tag Ideas" above. However, there are bugs in the implementation of most of these features, and the resulting output files, while retaining their intended content, have frequent unplanned, incorrect formatting results.
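One way to test the "without losing any of its content" requirement is to compare the sequence of words in the input, ignoring markup commands, with the sequence of words in the output. A minimal Java sketch, assuming the hypothetical .h/./h/.p/./p tags from "Some Markup Tag Ideas" and output that contains no escape codes or hyphenated line breaks:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical self-check: input and output should contain the same words in order. */
class ContentCheck {

    /** True if the trimmed line is one of the assumed markup commands. */
    static boolean isTag(String line) {
        String t = line.trim();
        return t.equals(".h") || t.equals("./h") || t.equals(".p") || t.equals("./p");
    }

    /** Collects the whitespace-separated words of all non-tag lines, in order. */
    static List<String> words(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            if (isTag(line)) {
                continue; // markup commands are not content
            }
            for (String w : line.trim().split("\\s+")) {
                if (!w.isEmpty()) { // a blank line splits into one empty string
                    result.add(w);
                }
            }
        }
        return result;
    }

    /** Post: true iff input and output contain the same sequence of words. */
    static boolean sameContent(List<String> input, List<String> output) {
        return words(input).equals(words(output));
    }
}
```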

B-/B/B+ level work

At this level, your group has chosen to implement very few formatting features, perhaps even as simple as the "Basic Form" or the heading/paragraph markup tags above, but what is there works correctly. Other features may have been attempted, but with only buggy, partially correct implementations. However, these bugs are clearly documented in your "tutorial", and they do not distract from the overall usability of the program.

A-/A level work

Your system implements several markup features, at least as rich as the "Markup Tag Ideas" above. All or nearly all features are implemented correctly, and few bugs, if any, exist in the program. Exceptional efforts can even rise above A-level work.

Elegance and Clarity of Source Code

Your work will be assessed on its elegance and clarity of exposition. Significant subtasks should be implemented as method definitions, and the pre- and postconditions of these methods should be clearly documented. The purpose of each variable and method parameter should be clearly described with comments. The standard rules of indentation, vertical whitespace, descriptive name choices, etc., continue to apply. As always, see the Style Guide, available from the General Notes section of our course web site, for general expectations of beautiful, clear code style.



John H. E. Lasseter