Polo Club of Data Science
This course has concluded. See https://poloclub.github.io/#cse6242 for all past course offerings.

This course will introduce you to broad classes of techniques and tools for analyzing and visualizing data at scale. It emphasizes how to combine computation and visualization to perform effective analysis. We will cover methods from each side, and hybrid ones that combine the best of both worlds. Students will work in small teams to complete a research project exploring novel approaches for interactive data & visual analytics.

Piazza Discussion Forum

We will use Piazza for discussion (e.g., homework, project). Post your questions there, and the teaching staff and your fellow classmates will be able to help answer them quickly. You can also use Piazza to find project teammates.

T-square will only be used for submission of assignments and projects.

Office Hours

Instructor: Polo Chau (Thu, 3-4pm, Klaus 1324)
TA: Robert Pienta (Wed, 4-5pm, common area next to Klaus 1324)
TA: Long Tran (Mon, 4-5pm, Klaus 1305)
Grader: Alan Zhang

Schedule (tentative)

Video recordings of the lectures are available at http://gtcourses.gatech.edu.
Each entry lists the week's dates (Tue, Thu), topics, lecture materials for each day, and events.

Jan 7, 9
* Course introduction
* Big data analytics building blocks, data collection, and simple storage (SQLite)
Tue: Slides
Thu: Slides

14, 16
* Data cleaning & integration
* Data Mining Concepts & Tasks
Tue: Slides
Thu: Slides
Events: HW1 out (Tue)

21, 23
* Visualization fundamentals
* Data visualization for the web (D3)
Tue: Slides
Thu: Slides by Chad Stolper

28, 30
Snow days!
Tue: X
Thu: X
Events: HW1 due (Mon)

Feb 4, 6
* Visualization DOs and DON'Ts; Heilmeier Questions
* Graph analytics
  • how to build and store graphs
  • basics; power laws; centrality
  • graph statistics and how to compute them
Tue: Slides
Thu: Slides
Events: HW2 out (Sat)

11, 13
Snow days again!
Tue: X
Thu: X

18, 20
* Graph analytics
  • graph algorithms
  • interactive tools
  • applications
* Scaling up (Hadoop, Pig, HBase, Hive, Pegasus)
Tue: Slides
Thu: Slides
Events:
* Form proj teams by 2/21
* HW2 due 2/21

25, 27
* Scaling up (cont'd)
* Classification (techniques, visualization & interaction)
Tue: Slides
Thu: Slides

Mar 4, 6
* Clustering
* Dimensionality Reduction: techniques, visualization, practitioner's guide
Tue: Slides
Thu: Slides. Guest lecture by Dr. Jaegul Choo
Events:
* Proj proposals due 3/8 (DL: due 3/15)
* HW3 out (Sun)

11, 13
Project proposal presentations

18, 20
Spring Break
Tue: X
Thu: X

25, 27
Time series: algorithms, visualization, & applications
Tue: Slides
Thu: Slides

Apr 1, 3
Text analytics: concepts, algorithms (LSI=SVD), visualization
Tue: Slides
Thu: Slides
Events: HW3 due (Mon, 3/31)

Apr 8, 10
* Ensemble Methods
* Human Computation
Tue: Slides
Thu: Slides
Events: Progress report due Wed, 4/9, 5pm; HW4 out (Mon)

15, 17
* Analytics in the real world
* Closing words and course overview
Tue: Guest lecture by Flavio Villanustre, VP at LexisNexis, HPCC Systems. Slides
Thu: Slides

22, 24
Project presentations
Events: Final report due Fri, 4/25, 5pm (DL: 5/2); HW4 due 4/30 (DL: 5/2)

Grading

Late Submissions Policy

Homework (tentative)

Please note that while collaboration is allowed, individual collaborators *must* write up their own answers. All GT students must observe the honor code.

Project

Team project: 3-4 people. Description and grading policy (proposal + presentation, progress report, final report + presentation).

Dataset Ideas (may need API, or scraping)

Auditors

Auditors must first obtain the instructor's permission, then enroll in the course. Auditors must attend all lectures and may optionally complete the assignments.

Textbooks and reading materials

Prerequisites

For both CSE 6242 (grad) and CX 4242 (undergrad)
Students are expected to complete significant programming assignments (homework, project) that may involve higher-level languages or scripting (e.g., Java, R, Matlab, Python, C++). Some assignments may involve web programming and D3 (e.g., JavaScript, CSS). Basic knowledge of algebra and probability is expected.
Additional formal prerequisites for CSE 6242
None.
Additional formal prerequisites for CX 4242
(Undergraduate Semester level MATH 2605 Minimum Grade of D or
Undergraduate Semester level MATH 2401 Minimum Grade of D or
Undergraduate Semester level MATH 24X1 Minimum Grade of D)
and
(Undergraduate Semester level MATH 3215 Minimum Grade of D or
Undergraduate Semester level MATH 3225 Minimum Grade of D or
Undergraduate Semester level ECE 3077 Minimum Grade of D or
Undergraduate Semester level ISYE 2027 Minimum Grade of D)
and
(Undergraduate Semester level CS 1371 Minimum Grade of C or
Undergraduate Semester level CS 1372 Minimum Grade of C or
Undergraduate Semester level CX 4010 Minimum Grade of C or
Undergraduate Semester level CX 4240 Minimum Grade of C)

Previous offerings

See https://poloclub.github.io/#cse6242 for all past course offerings.

Acknowledgements & Related Classes

We thank Amazon's AWS in Education grant program for providing support for Amazon Web Services.

Many thanks to my colleagues for sharing their course materials:
Prof. John Stasko - Information Visualization - Fall 2012
Prof. Jeff Heer - Research Topics in Interactive Data Analysis - Spring 2011
Prof. Christos Faloutsos - Multimedia Databases and Data Mining - Fall 2012