| Name | Office hours | Location |
|---|---|---|
| Polo Chau | Tue, 3:30-4:00pm + FREE after-class coffee, at Clough Starbucks | Klaus 1324 |
| Meghna Natraj (Head TA) | Mon, 1-2PM | Klaus, open area next to Polo’s office |
| Fred Hohman | Mon, 1-2PM | Klaus, open area next to Polo’s office |
| Bhanu Verma | Wed, 1-2PM | CULC (3rd floor), common area near 325: turn right from the stairs and walk a few steps; the common area is on the left |
| Chirag Tailor | Wed, 1-2PM | CULC (3rd floor), common area near 325: turn right from the stairs and walk a few steps; the common area is on the left |
| Kiran Sudhir | Thu, 12-1PM | Klaus, open area next to Polo’s office |
| Varun Bezzam | Thu, 12-1PM | Klaus, open area next to Polo’s office |

Everyone must join this class's Piazza, at https://piazza.com/class/ixpgu1xccuo47d.
Double check that you are joining the right Piazza! When you have questions about the class, homework, project, etc., post them there. Our teaching staff and your fellow classmates will help answer them quickly. You can also use Piazza to find project teammates.
T-square will only be used for submission of assignments and projects.
We welcome everyone to share their experiences in tackling issues and to help each other out, but please do not post your answers, as that may affect the learning experience of your fellow classmates.

| Date | | Topic | Tue | Thu | Events |
|---|---|---|---|---|---|
| Jan | 10, 12 | * Course introduction * Big data analytics building blocks | intro | analytics building blocks | |
| | 17, 19 | * Data collection and simple storage (SQLite) * Data cleaning | data collection | cleaning, GT Github | HW1 out |
| | 24, 26 | * Class project overview; Heilmeier questions * Example projects: (1) Firebird: Predicting Fire Risks in Atlanta, presented by Wenwen Chang (2) PASSAGE: A Travel Safety Assistant, presented by Nilaksh Das, Meghna Natraj * Data integration: knowledge graph; data reconciliation/de-duplication; similarity functions | Firebird, PASSAGE, project | integration | |
| Feb | 31, 2 | * Visualization 101 * Fixing common visualization issues * (Fixing presentation issues) | vis101 | fix vis | HW1 due (Fri, 11:55pm) |
| | 7, 9 | * Data visualization for the web (D3) * Data analytics concepts & tasks | d3 | concepts | Form project teams by Friday; HW2 out |
| | 14, 16 | * Scaling up: Hadoop, Pig, Hive | hadoop | pig, hive, spark | |
| | 21, 23 | * Industry Talk: Kristin Ottofy on Microsoft Azure * Scaling up: Spark, Spark SQL | Microsoft Azure talk | spark, project proposal and presentation | |
| Mar | 28, 2 | * Classification key concepts, k-NN, decision tree, cross validation | classification key concepts | continue | HW2 due (Wed, 11:55pm); HW3 out |
| | 7, 9 | Project proposal presentations | Show time! | Show time! | Project proposal & slides due (Mon, 11:55pm) |
| | 14, 16 | * Classification vis (ROC, AUC, confusion matrix) * Scaling up: HBase * Intro to clustering; DBSCAN | Classification Vis, hbase | clustering | |
| | 21, 23 | Spring Break | X | X | |
| | 28, 30 | * Graph analytics | graph basics & laws | graph centrality | HW3 due (Fri, 11:55pm) |
| Apr | 4, 6 | * Ensemble method, bagging, random forests * Memory-mapping/virtual memory to scale up algorithms | pagerank, apolo, user eval | random forests, MMap | HW4 out; Project progress report due (Fri, 11:55pm EST) |
| | 11, 13 | * Text analytics: concepts * Text analytics: algorithms (LSI=SVD) | text, LSI/SVD | cont'd | |
| | 18, 20 | * Time series: algorithms, visualization, & applications * (Dimension reduction: PCA, MDS, LDA, IsoMap) * Project poster presentations | time series, non-linear forecasting | Poster presentation. 4:30pm to 6pm-ish. Klaus Atrium. Pizza + drinks served! | HW4 due (Sun, 11:55pm) |
| | 25 | * Closing words and course overview | Review / lessons learned | X | Proj final report due (Tue, 11:55pm EST) |

Some assignments may involve web programming and D3 (e.g., JavaScript, CSS).
You are expected to quickly learn many new things. For example, an assignment on Hadoop programming may require you to learn some basic Java and Scala quickly, which should not be too challenging if you already know another high-level language like Python or C++. Please make sure you are comfortable with this.
Please take a look at the assignments (homework and project) of the previous offerings of this course, which will give you some idea about the difficulty level of the assignments.
Basic knowledge of linear algebra and probability is expected.
Prof. John Stasko - Information Visualization - Fall 2012
Prof. Jeff Heer - Research Topics in Interactive Data Analysis - Spring 2011
Prof. Christos Faloutsos - Multimedia Databases and Data Mining - Fall 2012