CPSC 371 | Visualization and Data Mining | Spring 2007 |
"...because a picture is worth a thousand words" |
Information visualization is an area of computer science concerned with helping users understand data through visual representations. Visualization is a powerful technique: it is possible to pack large quantities of information into a manageable size, and people are generally quite good at detecting patterns, relationships, and anomalies visually.
Data mining involves computational techniques for discovering previously unknown patterns and relationships in large data sets. This is of great interest for companies wishing to analyze customer buying habits in order to increase the company's profit or market share and for governments trying to catch terrorists, to name two examples.
This course will be organized around the theme of discovery and communication of interesting patterns and relationships in data. Topics include the principles and practices of effective visual communication, techniques for exploration and discovery, how visualization and data mining complement each other, and a consideration of the societal impacts and ethics of employing this technology.
Instructor |
Stina Bridgeman |
---|---|
Office Hours |
M 12:30-1:30pm, W 3-4:30pm, R 10:30am-noon, F 10:30-11:30am |
Class Hours and Meeting Place |
Lecture MWF 1:55pm-2:50pm, Lansing 301 |
Course Web Page |
http://math.hws.edu/~bridgeman/courses/371/s07/ |
Texts |
Data Mining, 2nd edition
The following books are on reserve at the library:
Additional material will be handed out or posted on the course webpage. |
Prerequisites |
C- in CPSC 225, or instructor permission |
Rationale & Aims |
This course, like the other 300- and 400-level computer science courses, explores a particular topic in computer science. The roots of visualization and data mining, however, are far-flung - including computer graphics, human-computer interaction, cognitive psychology, semiotics, graphic design, cartography, art, algorithms, statistics, artificial intelligence, machine learning, information retrieval, and pattern recognition.
The rationale for this course is straightforward: data is all around us. It is being produced in ever-larger quantities - scientific data sets, medical data sets, customer buying habits, websurfing habits, credit histories, census data, ... the list goes on. For all this data to be useful, it needs to be examined - for patterns, relationships, anomalies, or anything else that might be interesting. Visualization and data mining provide methods for doing this.
This course has three primary aims:
|
Course Content Overview |
There will be three main components to the course: visual communication, exploration, and the social and ethical considerations of the technology.
Visual Communication: The first part of the course will focus on the principles and practices of effective visual communication, including aspects of human perception and cognition, the basic building blocks of visual representations, graphical integrity, and graphical excellence. These principles will be applied as we build a toolbox of visualization techniques and critique the effectiveness of these techniques for particular tasks.
Exploration: The second part of the course will consider visual and computational methods for the exploration of data sets and the discovery of things of interest. Topics include the basic principles and techniques of building interactive visualizations, applications of those principles and techniques to particular tasks, fundamental data mining tasks and algorithms, applications of those algorithms to particular tasks, and how data mining and visualization complement each other.
Social and Ethical Considerations: Data mining, in particular, raises many concerns about privacy, legality, and ethics - while it also offers many potential benefits. The final part of the course will examine these issues. |
Assignments and Evaluation |
Online Discussion: Discussion and reflection on the material is an important part of this course. As part of this, you are expected to contribute to the online discussion in the course wiki. This involves both keeping up with what others have posted and posting your own comments. Details can be found on the DiscussionRequirements page in the wiki.
Reading Response: Reading response questions for each reading assignment will be posted on the course wiki, and are due at the start of class on the day for which the reading is assigned. These questions are intended to help you focus on the key points of the reading, to reflect on what you've read, and to prepare you for class discussions. Details can be found on the ReadingResponseRequirements page of the wiki.
Homework: Homework assignments are designed to reinforce topics covered in class and to allow you to explore concepts on your own. There will be a variety of types of homework assignments; some will be writing-based (such as critiques of visualizations), while others will be more technical and hands-on (such as using particular tools to explore data sets or construct visualizations).
Project: A major component of the course will be a project in which you will apply the skills you've gained to a real problem. Your role will be that of a consultant working for a client who has a data set and some questions to investigate - you'll interview the client to discover their needs, design and implement a way to answer their questions, and produce a report.
Practicums: In lieu of more typical exams, there will be three practicums (two involving visualization and one involving data mining). In these, you'll be given a data set and a few tasks to address; you will need to determine how to address those tasks and to produce a report presenting and critiquing your solution. The purpose is to demonstrate that you can apply the theoretical principles studied in the course to practical situations.
Final Grades: Final grades in this course will be computed as follows:
Attendance and Participation: On-time attendance and class participation (see the course policies) are expected, though they are not formally factored into your final grade. Missing class - for any reason - often results in lower grades because important material was missed. Similarly, not participating in class even if you are physically present may mean that you aren't actively following the material and thus may be missing more sophisticated or subtle points.
Late Policy and Collaboration: See the course policies for the late policy and collaboration policy. |