IDEA 2014 was the biggest IDEA ever, with almost 200 registrations! It featured two keynotes,
16 papers and presentations, and a Bloomberg-supported networking + poster + demo session!
Join us at IDEA 2015 at KDD in Sydney, Australia!

Download our poster! (1.1MB)

The Interactive Data Exploration and Analytics (IDEA) workshop addresses the development of data mining techniques that allow users to interactively explore their data. We focus on interactivity and the effective integration of techniques from data mining, visualization, and human-computer interaction (HCI). In other words, we explore how the best of these different but related domains can be combined so that the whole is greater than the sum of its parts. Last year's IDEA at KDD 2013 in Chicago was a great success.


Program & Attending IDEA

IDEA will be a full-day workshop on Sunday, Aug 24, at KDD 2014 at Sheraton New York Times Square Hotel (map). Register and book hotel rooms through KDD's registration site.

We are very proud to have Bloomberg as the Headline supporter of IDEA 2014!

You are cordially invited to attend the Bloomberg presentation at 10:30 A.M., and to join the Bloomberg-supported Poster + Interactive Demo + Networking session from 3 to 4 P.M., with snacks and drinks!

In total, 16 papers were accepted for presentation at IDEA 2014: 9 for oral presentation, and 7 for presentation during the Poster, Demo & Networking session.

Download IDEA'14 proceedings (25MB)

9:00 Welcome + Bloomberg Remarks
9:10 Keynote 1
Prof. Ben Shneiderman
University of Maryland, College Park
Information Visualization for Knowledge Discovery: Big Insights from Big Data

Slides
BEN SHNEIDERMAN is a Distinguished University Professor in the Department of Computer Science and Founding Director (1983-2000) of the Human-Computer Interaction Laboratory at the University of Maryland. He is a Fellow of the AAAS, ACM, and IEEE, and a Member of the National Academy of Engineering, in recognition of his pioneering contributions to human-computer interaction and information visualization. His contributions include the direct manipulation concept, clickable web-link, touchscreen keyboards, dynamic query sliders for Spotfire, development of treemaps, innovative network visualization strategies for NodeXL, and temporal event sequence analysis for electronic health records.

Ben is the co-author with Catherine Plaisant of Designing the User Interface: Strategies for Effective Human-Computer Interaction (5th ed., 2010). With Stu Card and Jock Mackinlay, he co-authored Readings in Information Visualization: Using Vision to Think (1999). His book Leonardo’s Laptop appeared in October 2002 (MIT Press) and won the IEEE book award for Distinguished Literary Contribution. His latest book, with Derek Hansen and Marc Smith, is Analyzing Social Media Networks with NodeXL (2010).

Abstract
Interactive information visualization tools provide researchers with remarkable capabilities to support discovery from Big Data resources. Users can begin with an overview, zoom in on areas of interest, filter out unwanted items, and then click for details-on-demand. The Big Data initiatives and commercial success stories such as Spotfire and Tableau, plus widespread use by prominent sites such as the New York Times have made visualization a key technology.

The central theme is the integration of statistics with visualization to support user discovery. Our work focuses on temporal event sequences such as those found in electronic health records (www.cs.umd.edu/hcil/eventflow), and social network data such as Twitter discussion patterns (www.codeplex.com/nodexl). The talk closes with 8 Golden Rules for Big Data.
10:00 Coffee
10:30 Research Talks (time allocation: 15+5 each)
Visualizing Uncertainty in Spatio-temporal Data
Ayush Shrestha, Ying Zhu and Ben Miller
Skim-reading thousands of documents in one minute:
Data indexing and visualization for multifarious search
Alessandro Perina, Dongwoo Kim, Andrzej Turski, and Nebojsa Jojic
Formalising the subjective interestingness of a linear projection of a data set: two examples
Tijl De Bie
VizLinc: Integrating information extraction, search, graph analysis, and geo-location for the visual exploration of large data sets
Joel Acevedo-Aviles, William Campbell, Daniel Halbert and Kara Greenfield
Interactive Data Mining Considered Harmful (If Done Wrong)
Pauli Miettinen
12:30 Lunch
2:00 Re-welcome
2:10 Keynote 2
Prof. Aditya Parameswaran
University of Illinois (UIUC)
Human-Powered and Visual Data Management

Slides
Aditya Parameswaran is an Assistant Professor in Computer Science at the University of Illinois (UIUC). He is currently spending the year visiting MIT CSAIL, after completing his Ph.D. at Stanford University in Sept. 2013, advised by Prof. Hector Garcia-Molina. He is broadly interested in data analytics, with research results in human computation, visual analytics, information extraction and integration, and recommender systems. Aditya is a recipient of the Arthur Samuel award for the best dissertation in Computer Science at Stanford (2013), the SIGMOD Jim Gray dissertation award (2014), the SIGKDD dissertation award runner-up (2014), the Key Scientific Challenges Award from Yahoo! Research (2010), two best-of-conference citations (VLDB 2010 and KDD 2012), the Terry Groswith graduate fellowship at Stanford (2007), and the Gold Medal in Computer Science at IIT Bombay (2007).

Abstract
This talk will consist of two parts. The first part will be on an ongoing project:

Fully automated algorithms are inadequate for many data analysis tasks, especially those involving images, video, or text. Thus, we need to combine crowdsourcing with traditional computation to improve the process of understanding, extracting, and managing data. In this part, I will present a broad perspective of our research on this topic. I will then present details of one of the problems we have addressed: filtering large data sets with the aid of humans. For more details, see: i.stanford.edu/~adityagp/scoop.html

The second part will be on a project that is just starting off:

Data scientists rely on visualizations to interpret the data returned by queries, but finding the right visualization remains a manual task that is often laborious. We propose a system that partially automates the task of finding the right visualizations for a query. The output will comprise a recommendation of potentially "interesting" or "useful" visualizations, where each visualization is coupled with a suitable query execution plan. I will discuss the technical challenges in building this system and preliminary results, and outline an agenda for future research. For more details, see http://goo.gl/FHZY61 (to appear at VLDB '14)
3:00
Posters + Interactive Demo + Networking Session
— with cupcakes and drinks
EigenSense: Saving User Effort with Active Metric Learning
Eli T. Brown and Remco Chang
CrowdMGR: Interactive Visual Analytics to Interpret Crowdsourced Data
Abon Chaudhuri and Mahashweta Das
Rapid Data Exploration and Visual Data Mining on Relational Data
Gartheeban Ganeshapillai, Joel Brooks and John Guttag
Decomposing a Sequence into Independent Subsequences Using Compression Algorithms
Thanh Lam Hoang, Julia Kiseleva, Mykola Pechenizkiy and Toon Calders
Interactive Visualization Applications for Maritime Anomaly Detection and Analysis
Valérie Lavigne
Interactive Exploration of Comparative Dependency Network Learning
Diane Oyen and Terran Lane
NIA: System for News Impact Analytics
Mikalai Tsytsarau and Themis Palpanas
4:00 Talks (time allocation: 15+5 each)
Interactive Exploration of Larger Pattern Collections:
A Case Study on a Cocktail Dataset
Daniel Paurat, Roman Garnett and Thomas Gärtner
Better Logging to Improve Interactive Data Analysis Tools
Sara Alspaugh, Archana Ganapathi, Marti Hearst and Randy Katz
Explorable Visual Analytics: Knowledge Discovery in Large and High-Dimensional Data
Saman Amirpour Amraii, Michael Lewis, Randy Sargent and Illah Nourbakhsh
Toward Usable Interactive Analytics: Coupling Cognition and Computation
Alex Endert, Chris North, Remco Chang and Michelle Zhou
5:20 Closing

Important Dates

Submission Fri, June 20, 2014, 23:59 Eastern time (EDT)
Notification Mon, July 7, 2014
Camera-ready Fri, July 18, 2014
Workshop Sun, August 24, 2014

Keynotes

Prof. Ben Shneiderman
University of Maryland, College Park
Information Visualization for Knowledge Discovery: Big Insights from Big Data

Prof. Aditya Parameswaran
University of Illinois (UIUC)
Human-Powered and Visual Data Management

Organizers

Polo Chau
Georgia Tech
Jilles Vreeken
Max Planck Institute for Informatics,
and Saarland University
Matthijs van Leeuwen
KU Leuven
Christos Faloutsos
Carnegie Mellon
Contact us at:
idea14kdd (at) gmail.com

Sponsors & Supporters


Program Committee

Adam Perer (IBM, USA)
Andreas Holzinger (Medical University Graz, Austria)
Antti Oulasvirta (Aalto University, Finland)
Antti Ukkonen (Aalto University, Finland)
Arno Knobbe (Universiteit Leiden, Netherlands)
Arno Siebes (Universiteit Utrecht, Netherlands)
Cody Dunne (IBM Watson, USA)
Dafna Shahaf (Stanford, USA)
Esther Galbrun (Boston University, USA)
Fei Sha (University of Southern California, USA)
Geoff Webb (Monash University, Australia)
George Forman (HP Labs, USA)
Hanghang Tong (CUNY and Arizona State University, USA)
Jaakko Hollmén (Aalto University, Finland)
Jaegul Choo (Georgia Tech, USA)
Jefrey Lijffijt (Aalto University, Finland)
Kai Puolamäki (Aalto University, Finland)
Klaus Mueller (Stony Brook University, USA)
Leman Akoglu (Stony Brook University, USA)
Lisa Singh (Georgetown University, USA)
Michael Berthold (University of Konstanz, Germany)
Nan Cao (IBM, USA)
Nikolaj Tatti (Aalto University, Finland)
Olivier Thonnard (Symantec)
Parikshit Ram (Georgia Tech, USA)
Pauli Miettinen (Max-Planck Institute for Informatics, Germany)
Saleema Amershi (Microsoft Research, USA)
Stefan Kramer (University of Mainz, Germany)
Thomas Gärtner (University of Bonn, Germany)
Thomas Seidl (RWTH Aachen University, Germany)
Tijl De Bie (University of Bristol, UK)
Tina Eliassi-Rad (Rutgers, USA)
U Kang (KAIST, South Korea)
Zhicheng 'Leo' Liu (Stanford, USA)

What's the IDEA?

We have entered the era of big data. Massive datasets, surpassing terabytes and petabytes, are now commonplace. They arise in numerous settings in science, government, and enterprises. Today, technology exists to collect and store such massive amounts of information. Yet, making sense of these data remains a fundamental challenge: we lack the means for exploratory analysis of databases at this scale. Few current technologies allow us to freely "wander" around the data and make discoveries by following our intuition, or serendipity. While standard data mining aims at finding highly interesting results, it is typically computationally demanding and time consuming, and thus may not be well suited for interactive exploration of large datasets.

Interactive data mining techniques that aptly integrate human intuition, by means of visualization and intuitive human-computer interaction (HCI) techniques, with machine computation have been shown to help people gain significant insights into a wide range of problems. However, as datasets are generated in larger volumes, at higher velocity, and in greater variety, creating effective interactive data mining techniques becomes a much harder task.

Our focus is on interactivity and the effective integration of techniques from data mining, visualization, and human-computer interaction. In other words, we intend to explore how the best of these different but related domains can be combined so that the whole is greater than the sum of its parts.


Call for Papers

Topics of interests for the workshop include, but are not limited to:
  • interactive data mining algorithms
  • visualizations for interactive data mining
  • demonstrations of interactive data mining
  • quick, high-level data analysis methods
  • any-time data mining algorithms
  • visual analytics
  • methods that allow meaningful intermediate results
  • data surrogates
  • on-line algorithms
  • adaptive stream mining algorithms
  • theoretical/complexity analysis of instant data mining
  • learning from user input for action replication/prediction
  • active learning / mining

Submission Information

All papers will be peer-reviewed (single-blind). We welcome many kinds of papers, including but not limited to:

  • Novel research papers
  • Demo papers
  • Work-in-progress papers
  • Visionary papers (white papers)

Authors should clearly indicate in their abstracts which kind of submission their paper belongs to, to help reviewers better assess their contributions. Submissions must be in PDF, written in English, no more than 10 pages long (shorter papers are welcome), and formatted according to the standard double-column ACM Proceedings Style (tighter alternate style).

For accepted papers, at least one author must attend the workshop to present the work.

For paper submission, proceed to the IDEA 2014 submission website.