This course will introduce you to broad classes of techniques and tools for analyzing and visualizing data at scale.
It emphasizes how to combine computation and visualization to perform effective analysis.
We will cover methods from each side, and hybrid ones that combine the best of both worlds.
Students will work in small teams to complete a research project exploring novel approaches for interactive data & visual analytics.
Course Information
- Instructor
  - Polo Chau, Thu 3-4pm, Klaus 1324
- TAs
  - Parikshit Ram, Mon 4-5pm, Klaus 1315
  - Sooraj Bhat
- Class meets
  - Tue, Thu 1:35 - 2:55, Klaus 2456
- Q&A and discussion at
  - Piazza
Schedule (tentative)
| Wk | Date | Topic | |
| 1 | Jan 8 | Course Introduction | |
| | Jan 10 | Big data analytics process & building blocks | |
| 2 | Jan 15 | Data Collection, Simple Storage (SQLite) & Cleaning | |
| | Jan 17 | Data Integration | HW1 out |
| 3 | Jan 22 | Visualization fundamentals | |
| | Jan 24 | How to present your analysis (to your boss, or for research) | |
| 4 | Jan 29 | Classification (techniques) | HW1 due |
| | Jan 31 | Classification (visualization & interaction) | |
| 5 | Feb 5 | Canceled | |
| | Feb 7 | Clustering | |
| 6 | Feb 12 | Dimensionality Reduction (techniques) | |
| | Feb 14 | Dimensionality Reduction (more techniques, visualization, practitioner's guide) | |
| 7 | Feb 19 | Graphs I (basics, how to build and store graphs, laws, etc.) | HW1 grades out |
| | Feb 21 | Graphs II (centrality, algorithms) | |
| 8 | Feb 26 | Graphs III (interactive tools, applications) | HW2 out |
| | Feb 28 | Scaling up (Hadoop) | |
| 9 | Mar 5 | Scaling up (Pig, HBase) | |
| | Mar 7 | Scaling up (HBase, Hive, Pegasus) | Proposal due, 1:30pm |
| 10 | Mar 12 | Project proposal presentations | |
| | Mar 14 | Project proposal presentations | |
| 11 | Mar 19 | Spring break | |
| | Mar 21 | Spring break | |
| 12 | Mar 26 | Human Computation 1 | |
| | Mar 28 | Human Computation 2 | HW2 due |
| 13 | Apr 2 | Time series (algorithms) | |
| | Apr 4 | Time series (visualization & applications) | AWS Setup Guide, HW3 out; Progress report due 4/5, 5pm |
| 14 | Apr 9 | Text analytics (algorithms and concepts) | |
| | Apr 11 | Text analytics (LSI=SVD, visualization) | |
| 15 | Apr 16 | Canceled | |
| | Apr 18 | Review | |
| 16 | Apr 23 | Project presentations | |
| | Apr 25 | Project presentations | Final report due 4/26, 5pm; HW3 due 4/26, 1:30pm; bonus HW due 4/28, 11:59pm |
Grading
- 40% Homework
- 50% Project
- 10% Class Participation
Homework
Late policy for deliverables
- No penalty for medical reasons (bring a doctor's note) or emergencies
- For homework, every person has 4 slip days for this course; no questions asked
- For project, each team has 3 slip days for this course; no questions asked
- After all slip days are used up, 5% deduction for each day of delay (e.g., if a homework has 100 points total, each day of delay will incur a 5-point deduction)
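The arithmetic of the late policy can be sketched in a few lines of Python. This is only an illustration, not an official grading script: the function name `late_penalty` and its parameters are hypothetical, and it assumes that delays are counted in whole days, that remaining slip days are applied before any deduction, and that the deduction is capped at the deliverable's total points.

```python
def late_penalty(days_late, slip_days_left, total_points=100, percent_per_day=5):
    """Return (points_deducted, slip_days_remaining) for one late deliverable.

    Assumptions (not stated in the policy): whole-day delays, slip days are
    consumed first, and the deduction never exceeds total_points.
    """
    if days_late < 0 or slip_days_left < 0:
        raise ValueError("days_late and slip_days_left must be non-negative")

    # Slip days absorb as much of the delay as they can.
    slip_used = min(days_late, slip_days_left)
    penalized_days = days_late - slip_used

    # 5% of the total points for each remaining day of delay.
    deduction = penalized_days * percent_per_day * total_points / 100
    deduction = min(total_points, deduction)

    return deduction, slip_days_left - slip_used


# A 100-point homework submitted 6 days late with 4 slip days left:
# 4 days are covered by slip days, the other 2 cost 5 points each.
points_off, remaining = late_penalty(days_late=6, slip_days_left=4)
print(points_off, remaining)
```

For a project, the same calculation would start from the team's 3 slip days instead of the individual's 4.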
Project
Team project: 2-4 people.
Description and grading policy (proposal + presentation, progress report, final report + presentation).
Textbooks and reading materials
- None required.
- Good reads (not required):
Prerequisites
No formal prerequisites.
However, students are expected to complete significant programming assignments (homework, project) that
may involve higher-level languages or scripting (e.g., Java, R, Matlab).
Basic knowledge of algebra and probability is expected.
Acknowledgements & Related Classes
- We thank Amazon's AWS in Education grant program for providing support for Amazon Web Services
- Data and Visual Analytics, by Prof. Guy Lebanon
- Information Visualization, by Prof. John Stasko
- Research Topics in Interactive Data Analysis, by Prof. Jeffrey Heer
- Multimedia Databases and Data Mining, by Prof. Christos Faloutsos