CPSC 371: Visualization and Data Mining
Spring 2007


Instructor

Stina Bridgeman
bridgeman@hws.edu
Lansing 312, x3614

Course Description

"...because a picture is worth a thousand words"

Information visualization is an area of computer science concerned with helping users understand data through visual representations. Visualization is a powerful technique: it is possible to pack large quantities of information into a manageable size, and people are generally quite good at detecting patterns, relationships, and anomalies visually.

Data mining involves computational techniques for discovering previously unknown patterns and relationships in large data sets. To name two examples, it is of great interest to companies that analyze customer buying habits to increase profit or market share, and to governments trying to catch terrorists.

This course will be organized around the theme of discovery and communication of interesting patterns and relationships in data. Topics include the principles and practices of effective visual communication, techniques for exploration and discovery, how visualization and data mining complement each other, and a consideration of the societal impacts and ethics of employing this technology.

Office Hours

M 12:30-1:30pm, W 3-4:30pm, R 10:30am-noon, F 10:30-11:30am
or by appointment (schedule)

Class Hours and Meeting Place

Lecture MWF 1:55pm-2:50pm, Lansing 301

Course Links

Course Information

(general information about the course, including assessment)
Course Policies

(course policies on attendance, collaboration, late/makeup work, and other things)
Syllabus

(syllabus, including links to handouts, assignments, and readings; much of what you need on a daily basis is here)
Course Wiki

(lecture slides, readings, and discussion) - on-campus access only
Prefuse Documentation

(prefuse documentation) - on-campus access only

Unix Links

Using Linux at HWS

(lots of useful information about the Linux systems at HWS)
CS124 Lab 1 (spring 2005)

(basic Linux usage information - logging in, working with files and directories, printing)

Announcements

[5/4] Practicum question 4: The risk rating is denoted by the "symboling" column in the data set.
Practicum question 7: Build a model for predicting the thermal distress, and use that to determine the answer for the specific conditions given in the problem.

[5/3] A Weka quirk: Weka won't discretize an attribute that is currently selected as the class attribute in the Preprocess tab. If you want to discretize that attribute, change the class to a different attribute, discretize, then change the class back to the one you want.
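For anyone unsure what "discretize" does to a numeric attribute, here is a minimal sketch of equal-width binning, one common way of discretizing. This is an illustration only, not Weka's code: the function name `equal_width_bins` and the sample values are made up for the example, and Weka's Discretize filter has its own options (number of bins, equal-frequency binning, and so on).

```python
def equal_width_bins(values, num_bins):
    """Map each numeric value to a bin index 0..num_bins-1 using equal-width intervals.

    Hypothetical helper for illustration; not part of Weka.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls in a single bin.
        return [0] * len(values)
    width = (hi - lo) / num_bins
    # min() clamps the maximum value, which would otherwise land in bin num_bins.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


# Made-up sample values, split into three equal-width intervals.
sample = [61.0, 64.5, 70.0, 75.5, 80.0]
print(equal_width_bins(sample, 3))
```

Each value is replaced by the interval it falls into, so the attribute becomes nominal (here with three possible values) instead of numeric.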

[3/26] To clarify something on the prefuse question (#4):
In the section on interaction controls, it says that there should be controls to select counties based on the number of under- and overvotes. Here "number" means the same thing it meant on the axes in the visualizations, i.e., the under- and overvotes as a proportion of the total votes in the county. The values on a control do not have to differ from the values shown on the corresponding axis.

It is also OK if the value labels on any of the controls don't contain sufficient precision for the actual data values. The default is to show only one decimal place (though this can be changed). As long as the controls are set up properly, you don't have to figure out how to set the precision.
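To make the "proportion, not raw count" point concrete, here is a small sketch of the selection logic on its own, outside of prefuse. Everything in it is hypothetical: the `County` class, the `select_counties` function, and the county data are invented for illustration and are not the assignment's data or prefuse's API. The point is only that each control's range is compared against a proportion between 0 and 1, the same quantity shown on the axis.

```python
from dataclasses import dataclass


@dataclass
class County:
    """Hypothetical county record; field names are illustrative only."""
    name: str
    total_votes: int
    undervotes: int
    overvotes: int

    @property
    def undervote_rate(self) -> float:
        # Undervotes as a proportion of all votes cast in the county.
        return self.undervotes / self.total_votes

    @property
    def overvote_rate(self) -> float:
        # Overvotes as a proportion of all votes cast in the county.
        return self.overvotes / self.total_votes


def select_counties(counties, under_range, over_range):
    """Keep counties whose under- and overvote proportions fall in the given ranges."""
    lo_u, hi_u = under_range
    lo_o, hi_o = over_range
    return [
        c for c in counties
        if lo_u <= c.undervote_rate <= hi_u and lo_o <= c.overvote_rate <= hi_o
    ]


# Made-up data: the ranges are proportions, just like the axis values.
counties = [
    County("A", 1000, 20, 5),
    County("B", 2000, 10, 40),
    County("C", 500, 25, 1),
]
print([c.name for c in select_counties(counties, (0.0, 0.03), (0.0, 0.01))])
```

This also shows why one decimal place on a label can be coarse: proportions like 0.02 and 0.005 both display as "0.0", even though the filtering itself uses the full values.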

[3/20] Check out the new InterestingLinks page on the wiki!

[3/6]

The prefuse documentation has been updated to include information about hover and tooltips. The listing for Wednesday's reading on the syllabus page points you to what sections this information is contained in.
I have started gathering data for the project in /classes/s07/cs371/project/ - there are subdirectories for each project I have data for. So far I only have data for two projects (Student Interests and Seneca Lake) - if you've gotten data directly and would like me to store a copy in this directory, please forward it to me. Also, please let me know if you haven't gotten the data by spring break, so I can make sure that it is available after the break.

[1/16] Note change in the reading for Friday. The Tufte reading is optional for the moment - we'll come back to it a bit later. (It is appropriate now, however, as it is a powerful illustration of the consequences of not convincingly communicating a point. It also (briefly) addresses the issue of matching the display to the type of data being displayed.)

[1/16] Homework #1 has been updated with information about how to post your examples to the wiki. A demonstration will be done in class on 1/17.

[1/15] The course wiki is live! Information about how to create an account will be posted soon, but you do not need to have an account to view the content there.

[1/15] Welcome to CPSC 371! This web page is your source for a great deal of important and useful material, so you should take a few minutes to familiarize yourself with the website. Check back often for announcements and new information.
