|Jan||5, 7||* Course introduction
* Big data analytics building blocks, data collection, and simple storage (SQLite)
|12, 14||* Data cleaning & integration
* Visualization fundamentals by Chad Stolper
|Slides||Slides||HW1 out (Thu)|
|19, 21||* Data visualization for the web (D3) by Chad Stolper||MLK day||Slides|
* Analytics in Practice #1: Mike Chekal, Senior Manager in the Customer Product Area of Information Technologies, Union Pacific
* Dimensionality Reduction: techniques, visualization, practitioner's guide
|Union Pacific guest lecture||Slides||HW1 due (Fri)|
* Data Mining Concepts & Tasks
* Visualization DOs and DON'Ts; Heilmeier Questions
|9, 11||* Graph analytics
||Slides||Slides||Form project teams by Friday|
|16, 18||* Scaling up: Hadoop, Pig
||Canceled||Slides||HW2 due (Fri)|
|23, 25||* Scaling up: HBase, Hive
* Scaling up: Spark, Spark SQL
* Interactive graph applications
|Slides||Slides||HW3 out (Mon);
Proj proposal due (Fri, 11:55pm EST)
|9, 11||Project proposal presentations||Students present proposals||Students present proposals|
|16, 18||Spring break||X||X|
* Analytics in Practice #2: Josh Patterson
* Classification (techniques)
* Analytics in Practice #3: Ed Chi, Google
||Google guest lecture||Canceled||Project progress report due (Fri, 11:55pm EST)
* Ensemble Methods
* Text analytics: concepts
* Text analytics: algorithms (LSI=SVD)
* Time series: algorithms
* Time series: algorithms, visualization, & applications
|Slides||Slides||HW4 due (Fri)|
|20, 22||* Closing words and course overview
* Project poster presentations
|Poster presentation. Klaus 1116. Pizza + drinks served!||Proj final report due (Fri, 11:55pm EST)|
We use Piazza for discussion and all announcements.
Post your questions there. Our teaching staff and your fellow classmates will help answer them quickly. You can also use Piazza to find project teammates.
T-square will only be used for submission of assignments and projects.
We welcome everyone to share their experiences in tackling issues and to help each other out, but please do not post your answers, as that may affect the learning experience of your fellow classmates.
You are expected to quickly learn many new things. For example, an assignment on Hadoop programming may require you to learn some basic Java and Scala quickly, which should not be too challenging if you already know another high-level language like Python or C++. Please make sure you are comfortable with this.
Please take a look at the assignments (homework and project) from previous offerings of this course to get a sense of their difficulty level.
Basic knowledge of linear algebra and probability is expected.
Prof. John Stasko - Information Visualization - Fall 2012
Prof. Jeff Heer - Research Topics in Interactive Data Analysis - Spring 2011
Prof. Christos Faloutsos - Multimedia Databases and Data Mining - Fall 2012