Polo Chau | Tue, 3:30-4:00pm (+ 30min after Tue's class at Clough Starbucks) | Klaus 1324 | |
Nilaksh Das | Fri, 1-2PM | CCB common area (1st floor) | |
Pradeep Vairamani Rajendran | Wed, 2-3PM | Outside Klaus 3201 | |
Yanwei Zhang | Mon, 1-2PM | Klaus 3205 | |
Bhanu Verma | Fri, 2-3PM | CULC (3rd floor), common area near room 325: turn right from the stairs and walk a few steps; the common area is on the left | |
Meghna Natraj | Tue, 2-3PM | CCB common area (1st floor) | |
Vishakha Singh | Wed, 11AM-12PM | Outside Klaus 3100 |
Month | Date | Topic | Tue | Thu | Events |
---|---|---|---|---|---|
Aug | 23, 25 | * Course introduction * Big data analytics building blocks | intro | building blocks | |
Aug | 30, 1 | * Data collection and simple storage (SQLite) * Data cleaning | collection, cleaning | cancelled | HW1 out |
Sept | 6, 8 | * Data integration: knowledge graph/database; feldspar; data reconciliation/de-duplication; similarity functions * Heilmeier questions; group project core requirements * Example projects: (1) Firebird: Predicting Fire Risks in Atlanta (2) PASSAGE: A Travel Safety Assistant | integration | project, Firebird, PASSAGE | |
Sept | 13, 15 | * Visualization 101 * Fixing common visualization issues | vis101 | fix-vis | HW1 due (Fri, 11:55pm) |
Sept | 20, 22 | * Industry Talk * Data visualization for the web (D3) | Yahoo Tech Talk + info session | d3 | Form project teams by Friday; HW2 out |
Sept | 27, 29 | * Data Mining Concepts & Tasks * Fixing visualization and presentation | hw2 walkthrough, javascript demo, data mining concepts | concepts, teamwork tips, fix-vis | |
Oct | 4, 6 | * Intro to classification: k nearest neighbor (KNN), decision trees, cross validation * Scaling up: Hadoop, Pig, Hive | classification-intro | hadoop, pig, hive | |
Oct | 11, 13 | * Scaling up: Spark, Spark SQL | X (fall break) | spark, backup code with git/github | HW2 due (Mon, 11:55pm); HW3 out |
Oct | 18, 20 | Project proposal presentations | Show time! | Show time! | Project proposal & slides due (Mon, 11:55pm) |
Oct | 25, 27 | * Scaling up: HBase * MMap * Graph analytics | hbase | graphs basics | |
Nov | 1, 3 | More graphs | graphs centrality and algorithms | graphs centrality and algorithms | HW3 due (Fri, 11:55pm) |
Nov | 8, 10 | * Ensemble method, bagging, random forests * Classification (visualization & interaction) * Memory-mapping/virtual memory to scale up algorithms | bagging, random forests | roc, auc, confusion matrix, mmap | HW4 out; Project progress report due (Sun, 11:55pm EST) |
Nov | 15, 17 | * Text analytics: concepts * Text analytics: algorithms (LSI=SVD) | text analytics, clustering | text analytics | |
Nov | 22, 24 | Thanksgiving | X | X | |
Nov | 29, 1 | * Time series: algorithms, visualization, & applications (* Dimension reduction (PCA, MDS, LDA, IsoMap)) | time series linear and non-linear forecasting | review | HW4 due (Sun, 11:55pm) |
Dec | 6 | Project poster presentations | Poster presentation. 4:30pm to 6pm-ish. Klaus Atrium. Pizza + drinks served! | X | Proj final report due (Tue, 11:55pm EST) |
We use Piazza for discussion and all announcements.
Post your questions there. Our teaching staff and your fellow classmates will help answer them quickly. You can also use Piazza to find project teammates.
T-square will only be used for submission of assignments and projects.
We welcome everyone to share their experiences in tackling issues and to help each other out, but please do not post your answers, as that may affect the learning experience of your fellow classmates.
Some assignments may involve web programming (e.g., JavaScript, CSS) and D3.
You are expected to quickly learn many new things. For example, an assignment on Hadoop programming may require you to learn some basic Java and Scala quickly, which should not be too challenging if you already know another high-level language like Python or C++. Please make sure you are comfortable with this.
Please take a look at the assignments (homework and project) of the previous offerings of this course, which will give you some idea about the difficulty level of the assignments.
Basic knowledge of linear algebra and probability is expected.
Prof. John Stasko - Information Visualization - Fall 2012
Prof. Jeff Heer - Research Topics in Interactive Data Analysis - Spring 2011
Prof. Christos Faloutsos - Multimedia Databases and Data Mining - Fall 2012