CPSC 371: Visualization and Data Mining
Spring 2007


Stina Bridgeman
Lansing 312, x3614

Course Description

"...because a picture is worth a thousand words"

Information visualization is an area of computer science concerned with helping users understand data through visual representations. Visualization is a powerful technique: it is possible to pack large quantities of information into a manageable size, and people are generally quite good at detecting patterns, relationships, and anomalies visually.

Data mining involves computational techniques for discovering previously unknown patterns and relationships in large data sets. This is of great interest for companies wishing to analyze customer buying habits in order to increase the company's profit or market share and for governments trying to catch terrorists, to name two examples.

This course will be organized around the theme of discovery and communication of interesting patterns and relationships in data. Topics include the principles and practices of effective visual communication, techniques for exploration and discovery, how visualization and data mining complement each other, and a consideration of the societal impacts and ethics of employing this technology.

Office Hours

M 12:30-1:30pm, W 3-4:30pm, R 10:30am-noon, F 10:30-11:30am
or by appointment (schedule)

Class Hours and Meeting Place

Lecture MWF 1:55pm-2:50pm, Lansing 301


[5/4] Practicum question 4: The risk rating is denoted by the "symboling" column in the data set.
Practicum question 7: Build a model for predicting the thermal distress, and use that to determine the answer for the specific conditions given in the problem.

[5/3] A Weka quirk: it won't discretize an attribute that is currently selected as the class attribute in the Preprocess tab. To discretize that attribute, change the class to something else, discretize, then change the class back to what you want.
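The workaround can be pictured with a small toy model. This is a sketch in Python, not Weka code: the `discretize` and `equal_width_bins` functions, the attribute names `x` and `y`, and the data values are all made up for illustration. It only imitates a filter that skips whichever attribute is currently marked as the class.

```python
# Toy model only: this is NOT Weka code. It imitates a filter that refuses to
# discretize whichever attribute is currently marked as the class.

def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant columns
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def discretize(table, attribute, class_attribute, n_bins=2):
    """Return a copy of `table` with `attribute` binned, unless it is the class."""
    result = {name: list(column) for name, column in table.items()}
    if attribute != class_attribute:
        result[attribute] = equal_width_bins(table[attribute], n_bins)
    return result


# Made-up values, for illustration only.
table = {"x": [1.0, 2.0, 3.0], "y": [0.0, 5.0, 10.0]}

# With "y" selected as the class, the filter leaves it untouched.
unchanged = discretize(table, "y", class_attribute="y")

# Workaround: make another attribute the class, discretize, then switch back.
class_attribute = "x"
binned = discretize(table, "y", class_attribute)
class_attribute = "y"
```

In this toy model, `unchanged["y"]` keeps its original numeric values, while `binned["y"]` holds bin indices. The same three steps (change the class, discretize, change it back) are what you do by hand in the Preprocess tab.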

[3/26] To clarify something on the prefuse question (#4):
In the section on interaction controls, it says that there should be controls to select counties based on the number of under- and overvotes. This means "number" in the same way that the number of under- and overvotes was shown on the axis in the visualizations, i.e., as a proportion of the total votes in the county. You do not have to make the values on the control different from the values shown on the corresponding axis.

It is also OK if the value labels on any of the controls don't contain sufficient precision for the actual data values. The default is to show only one decimal place (though this can be changed). As long as the controls are set up properly, you don't have to figure out how to set the precision.
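As a rough illustration of both points, here is a sketch in Python (not prefuse code). The county names, the vote counts, and the `select_counties` and `control_label` helpers are all hypothetical. It shows a control that selects on the proportion of undervotes rather than the raw count, and a label formatted to one decimal place.

```python
def undervote_proportion(county):
    """Undervotes as a proportion of the total votes in the county."""
    return county["undervotes"] / county["total"]


def select_counties(counties, lo, hi):
    """Names of counties whose undervote proportion lies in [lo, hi]."""
    return [c["name"] for c in counties if lo <= undervote_proportion(c) <= hi]


def control_label(value):
    """Format a control value with one decimal place."""
    return f"{value:.1f}"


# Made-up counts, for illustration only.
counties = [
    {"name": "A", "undervotes": 40, "total": 1000},   # proportion 0.04
    {"name": "B", "undervotes": 300, "total": 1000},  # proportion 0.3
    {"name": "C", "undervotes": 5, "total": 100},     # proportion 0.05
]

selected = select_counties(counties, 0.0, 0.1)             # ["A", "C"]
label = control_label(undervote_proportion(counties[0]))   # "0.0"
```

County A is selected correctly even though its label reads "0.0": the selection uses the full-precision value, and only the displayed label is rounded. That is why label precision does not matter as long as the control itself is set up on the right values.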

[3/20] Check out the new InterestingLinks page on the wiki!


[1/16] Note change in the reading for Friday. The Tufte reading is optional for the moment - we'll come back to it a bit later. (It is appropriate now, however, as it is a powerful illustration of the consequences of not convincingly communicating a point. It also (briefly) addresses the issue of matching the display to the type of data being displayed.)

[1/16] Homework #1 has been updated with information about how to post your examples to the wiki. A demonstration will be done in class on 1/17.

[1/15] The course wiki is live! Information about how to create an account will be posted soon, but you do not need to have an account to view the content there.

[1/15] Welcome to CPSC 371! This web page is your source for a great deal of important and useful material, so you should take a few minutes to familiarize yourself with the website. Check back often for announcements and new information.
