CSE6242 / CX4242, Spring 2015
Data and Visual Analytics

Georgia Tech, College of Computing

4:30 - 6pm, Klaus 1456, Mon & Wed
Prof. Duen Horng (Polo) Chau

This course will introduce you to broad classes of techniques and tools for analyzing and visualizing data at scale. It emphasizes how to combine computation and visualization to perform effective analysis. We will cover methods from each side, as well as hybrid ones that combine the best of both worlds. Students will work in small teams to complete a research project exploring novel approaches for interactive data & visual analytics.

Office Hours

Polo Chau: Mon 3-4pm, Klaus 1324
Meera Manohar Kamath: Tue 3:30-4:30pm, Klaus 2126
Chris Berlind: Wed 3-4pm, common area outside Klaus 2140
Yichen Wang: Thu 3-4pm, Skiles 140
Amir H. Afsharinejad: Fri 2-3pm, Klaus 3402

Schedule (tentative)

Video recordings of the lectures are available at http://gtcourses.gatech.edu.

Date | Topic | Mon / Wed | Events
Jan 5, 7 | * Course introduction * Big data analytics building blocks, data collection, and simple storage (SQLite) | Slides / Slides |
12, 14 | * Data cleaning & integration * Visualization fundamentals by Chad Stolper | Slides / Slides | HW1 out (Thu)
19, 21 | * Data visualization for the web (D3) by Chad Stolper | MLK day / Slides |
26, 28 | * Analytics in Practice #1: Mike Chekal, Senior Manager in the Customer Product Area of Information Technologies, Union Pacific * Dimensionality Reduction: techniques, visualization, practitioner's guide | Union Pacific guest lecture / Slides | HW1 due (Fri)
Feb 2, 4 | * Data Mining Concepts & Tasks * Visualization DOs and DON'Ts; Heilmeier Questions | Slides / Slides | HW2 out
9, 11 | * Graph analytics: how to build and store graphs; basics, power laws, centrality; graph statistics and how to compute them (algorithms) | Slides / Slides | Form project teams by Friday
16, 18 | * Scaling up: Hadoop, Pig | Canceled / Slides | HW2 due (Fri)
23, 25 | * Scaling up: HBase, Hive | Slides / Snow |
Mar 2, 4 | * Scaling up: Spark, Spark SQL * Interactive graph applications | Slides / Slides | HW3 out (Mon); Proj proposal due (Fri, 11:55pm EST)
9, 11 | Project proposal presentations | Students present proposals / Students present proposals |
16, 18 | Spring break | X / X |
23, 25 | * Analytics in Practice #2: Josh Patterson * Classification (techniques) | Vectorization, Deep Learning, Slides | HW3 due
Mar 30, Apr 1 | * Analytics in Practice #3: Ed Chi, Google | Google guest lecture / Canceled | Project progress report due (Fri, 11:55pm EST); HW4 out
6, 8 | * Ensemble Methods * Text analytics: concepts * Text analytics: algorithms (LSI=SVD) | Slides / Slides, Slides |
13, 15 | * Time series: algorithms * Time series: algorithms, visualization, & applications | Slides / Slides | HW4 due (Fri)
20, 22 | * Closing words and course overview * Project poster presentations | Poster presentation. Klaus 1116. Pizza + drinks served! | Proj final report due (Fri, 11:55pm EST)

Homework (50% of grade)

The fastest way to get help with homework assignments is to post your questions on Piazza. If you prefer that your question be addressed only to our TAs and the instructor, you can use the private post feature (i.e., select the "Individual Student(s) / Instructor(s)" radio button).
While collaboration is allowed for homework assignments, each student must write up their own answers. All GT students must observe the honor code.
We plan to have 4 assignments in total.

Project (50% of grade)

See project description. See the schedule table above for deliverable due dates.

Use Piazza!

We use Piazza for discussion and all announcements.

Post your questions there. Our teaching staff and your fellow classmates will help answer them quickly. You can also use Piazza to find project teammates.

T-square will only be used for submission of assignments and projects.

We welcome everyone to share their experiences in tackling issues and to help each other out, but please do not post your answers, as that may affect the learning experience of your fellow classmates.

Late Submissions Policy

Distance Learning Sections (Q & Q3)

A standard 3-day lag applies to all homework and project deliverables. For the project presentation, a group that has a DL student member can choose to:
  1. Present in class without the 3-day lag; or
  2. Submit a video presentation (e.g., a screen capture) with the 3-day lag

Dataset Ideas (may need API, or scraping)

Reading materials & Resources

Data Science

Visualization

SQL

Prerequisites & Expectation

For both CSE 6242 (grad) and CX 4242 (undergrad)

Students are expected to complete significant programming assignments (homework, project) that may involve higher-level languages or scripting (e.g., Java, R, Matlab, Python, C++).

Some assignments may involve web programming and D3 (e.g., JavaScript, CSS).

You are expected to quickly learn many new things. For example, an assignment on Hadoop programming may require you to learn some basic Java and Scala quickly, which should not be too challenging if you already know another high-level language like Python or C++. Please make sure you are comfortable with this.

Please take a look at the assignments (homework and project) of the previous offerings of this course, which will give you some idea about the difficulty level of the assignments.

Basic knowledge of linear algebra and probability is expected.

Additional formal prerequisites for CSE 6242

None, but you should have taken courses similar to those listed in the next section, at Georgia Tech or at another school.

Additional formal prerequisites for CX 4242

(Undergraduate Semester level MATH 2605 Minimum Grade of D or
Undergraduate Semester level MATH 2401 Minimum Grade of D or
Undergraduate Semester level MATH 24X1 Minimum Grade of D)
and
(Undergraduate Semester level MATH 3215 Minimum Grade of D or
Undergraduate Semester level MATH 3225 Minimum Grade of D or
Undergraduate Semester level ECE 3077 Minimum Grade of D or
Undergraduate Semester level ISYE 2027 Minimum Grade of D)
and
(Undergraduate Semester level CS 1371 Minimum Grade of C or
Undergraduate Semester level CS 1372 Minimum Grade of C or
Undergraduate Semester level CX 4010 Minimum Grade of C or
Undergraduate Semester level CX 4240 Minimum Grade of C)

If you want to audit this course...

You must first obtain the instructor's permission, then enroll in the course. Auditors must attend all lectures and may optionally complete the assignments.

Previous offerings

Fall 2014 - CSE 6242 / CX 4242 - Polo Chau
Spring 2014 - CSE 6242 / CX 4242 - Polo Chau
Spring 2013 - CSE 6242 / CS 4803-DVA - Polo Chau
Spring 2011 - CSE 8803-DVA / CS 4803-DVA - Guy Lebanon
Spring 2010 - CSE 8803-DVA - Guy Lebanon

Acknowledgements & Related Classes

We thank Amazon's AWS in Education grant program for providing support for Amazon Web Services.
Tableau's data visualization software is provided through the Tableau for Teaching program.

Many thanks to my colleagues for sharing their course materials:
Prof. John Stasko - Information Visualization - Fall 2012
Prof. Jeff Heer - Research Topics in Interactive Data Analysis - Spring 2011
Prof. Christos Faloutsos - Multimedia Databases and Data Mining - Fall 2012