This course will introduce you to broad classes of techniques and tools for analyzing and visualizing data at scale.
It emphasizes how to combine computation and visualization to perform effective analysis.
We will cover methods from each side, and hybrid ones that combine the best of both worlds.
Students will work in small teams to complete a research project exploring novel approaches for interactive data & visual analytics.
Course Information
 Instructor
 Polo Chau, Thu 3-4pm, Klaus 1324
 TAs
 Parikshit Ram, Mon 4-5pm, Klaus 1315
Sooraj Bhat
 Class meets
 Tue, Thu 1:35-2:55, Klaus 2456
 Q&A and discussion at
 Piazza
Schedule (tentative)
Wk | Date | Topic | Notes
1 | Jan 8 | Course Introduction |
1 | Jan 10 | Big data analytics process & building blocks |
2 | Jan 15 | Data Collection, Simple Storage (SQLite) & Cleaning |
2 | Jan 17 | Data Integration | HW1 out
3 | Jan 22 | Visualization fundamentals |
3 | Jan 24 | How to present your analysis (to your boss, or for research) |
4 | Jan 29 | Classification (techniques) | HW1 due
4 | Jan 31 | Classification (visualization & interaction) |
5 | Feb 5 | Canceled |
5 | Feb 7 | Clustering |
6 | Feb 12 | Dimensionality Reduction (techniques) |
6 | Feb 14 | Dimensionality Reduction (more techniques, visualization, practitioner's guide) |
7 | Feb 19 | Graphs I (basics, how to build and store graphs, laws, etc.) | HW1 grades out
7 | Feb 21 | Graphs II (centrality, algorithms) |
8 | Feb 26 | Graphs III (interactive tools, applications) | HW2 out
8 | Feb 28 | Scaling up (Hadoop) |
9 | Mar 5 | Scaling up (Pig, HBase) |
9 | Mar 7 | Scaling up (HBase, Hive, Pegasus) | Proposal due, 1:30pm
10 | Mar 12 | Project proposal presentations |
10 | Mar 14 | Project proposal presentations |
11 | Mar 19 | Spring break |
11 | Mar 21 | Spring break |
12 | Mar 26 | Human Computation 1 |
12 | Mar 28 | Human Computation 2 | HW2 due
13 | Apr 2 | Time series (algorithms) |
13 | Apr 4 | Time series (visualization & applications) | AWS Setup Guide, HW3 out; Progress report due 4/5, 5pm
14 | Apr 9 | Text analytics (algorithms and concepts) |
14 | Apr 11 | Text analytics (LSI=SVD, visualization) |
15 | Apr 16 | Canceled |
15 | Apr 18 | Review |
16 | Apr 23 | Project presentations |
16 | Apr 25 | Project presentations | Final report due 4/26, 5pm; HW3 due 4/26, 1:30pm; bonus HW due 4/28, 11:59pm
Grading
 40% Homework
 50% Project
 10% Class Participation
Homework
Late policy for deliverables
 No penalties for medical reasons (bring a doctor's note) or emergencies
 For homework, every person has 4 slip days for this course; no questions asked
 For project, each team has 3 slip days for this course; no questions asked
 After all slip days are used up, 5% deduction for each day of delay (e.g., if a homework has 100 points total, each day of delay will incur a 5-point deduction)
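The late-policy arithmetic above can be sketched in a few lines of Python. This is only an illustration, not an official grading script: the function name `late_deduction` is hypothetical, and it assumes that slip days are consumed first (one per day of delay) and that the deduction is capped at the deliverable's total points.

```python
def late_deduction(total_points, days_late, slip_days_left):
    """Return (points_deducted, slip_days_remaining) for one late deliverable."""
    # Assumption: remaining slip days are used up before any penalty applies.
    slip_used = min(days_late, slip_days_left)
    penalized_days = days_late - slip_used
    # 5% of the deliverable's total points for each penalized day.
    # Assumption: the deduction never exceeds the total points.
    deduction = min(total_points, total_points * 5 * penalized_days / 100)
    return deduction, slip_days_left - slip_used


# A 100-point homework turned in 6 days late with 4 slip days remaining:
# 4 days are covered by slip days and 2 days are penalized.
print(late_deduction(100, 6, 4))
```

Under these assumptions, a 100-point homework that is one day late with no slip days left loses 5 points, matching the example in the policy.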
Project
Team project: 2-4 people.
Description and grading policy (proposal + presentation, progress report, final report + presentation).
Textbooks and reading materials
 None required.
 Good reads (not required):
Prerequisites
No formal prerequisites.
However, students are expected to complete significant programming assignments (homework, project) that may involve higher-level languages or scripting (e.g., Java, R, Matlab, etc.).
Basic knowledge of algebra and probability is expected.
Acknowledgements & Related Classes
 We thank Amazon's AWS in Education grant program for providing support for Amazon Web Services
 Data and Visual Analytics, by Prof. Guy Lebanon
 Information Visualization, by Prof. John Stasko
 Research Topics in Interactive Data Analysis, by Prof. Jeffrey Heer
 Multimedia Databases and Data Mining, by Prof. Christos Faloutsos