Interactive Data Exploration and Analytics (IDEA 2014) - Workshop at ACM SIGKDD Aug 24 2014

IDEA 2014 was the biggest IDEA ever, with almost 200 registrations! It featured two keynotes, 16 papers and presentations, and a Bloomberg-supported networking + poster + demo session!
Join us at IDEA 2015 at KDD in Sydney, Australia!

Download our poster! (1.1MB)

The Interactive Data Exploration and Analytics (IDEA) workshop addresses the development of data mining techniques that allow users to interactively explore their data. We emphasize interactivity and the effective integration of techniques from data mining, visualization and human-computer interaction (HCI). In other words, we explore how the best of these different but related domains can be combined such that the sum is greater than the parts. Last year's IDEA at KDD 2013 in Chicago was a great success.

Program & Attending IDEA

IDEA will be a full-day workshop on Sunday, Aug 24, at KDD 2014 at the Sheraton New York Times Square Hotel. Register and book hotel rooms through KDD's registration site.

We are very proud to have Bloomberg as the Headline supporter of IDEA 2014!

You are cordially invited to attend the Bloomberg presentation at 10:30 A.M., and please join the Bloomberg-supported Poster + Interactive Demo + Networking session at 3 P.M. - 4 P.M., with snacks and drinks!

In total, 16 papers were accepted for presentation at IDEA 2014. We selected 9 for oral presentation, and 7 for presentation during the Poster, Demo & Networking session.

Download IDEA'14 proceedings (25MB)

9:00	Welcome + Bloomberg Remarks
9:10	Keynote 1: Prof. Ben Shneiderman, University of Maryland, College Park. Information Visualization for Knowledge Discovery: Big Insights from Big Data (Slides)
10:00	Coffee
10:30	Research Talks (time allocation: 15+5 each)
	Visualizing Uncertainty in Spatio-temporal data. Ayush Shrestha, Ying Zhu and Ben Miller
	Skim-reading thousands of documents in one minute: Data indexing and visualization for multifarious search. Alessandro Perina, Dongwoo Kim, Andrzej Turski, and Nebojsa Jojic
	Formalising the subjective interestingness of a linear projection of a data set: two examples. Tijl De Bie
	VizLinc: Integrating information extraction, search, graph analysis, and geo-location for the visual exploration of large data sets. Joel Acevedo-Aviles, William Campbell, Daniel Halbert and Kara Greenfield
	Interactive Data Mining Considered Harmful (If Done Wrong). Pauli Miettinen
12:30	Lunch
2:00	Re-welcome
2:10	Keynote 2: Prof. Aditya Parameswaran, University of Illinois (UIUC). Human-Powered and Visual Data Management (Slides)
3:00	Posters + Interactive Demo + Networking Session (with cupcakes and drinks)
	EigenSense: Saving User Effort with Active Metric Learning. Eli T. Brown and Remco Chang
	CrowdMGR: Interactive Visual Analytics to Interpret Crowdsourced Data. Abon Chaudhuri and Mahashweta Das
	Rapid Data Exploration and Visual Data Mining on Relational Data. Gartheeban Ganeshapillai, Joel Brooks and John Guttag
	Decomposing a Sequence into Independent Subsequences Using Compression Algorithms. Thanh Lam Hoang, Julia Kiseleva, Mykola Pechenizkiy and Toon Calders
	Interactive Visualization Applications for Maritime Anomaly Detection and Analysis. Valérie Lavigne
	Interactive Exploration of Comparative Dependency Network Learning. Diane Oyen and Terran Lane
	NIA: System for News Impact Analytics. Mikalai Tsytsarau and Themis Palpanas
4:00	Talks (time allocation: 15+5 each)
	Interactive Exploration of Larger Pattern Collections: A Case Study on a Cocktail Dataset. Daniel Paurat, Roman Garnett and Thomas Gärtner
	Better Logging to Improve Interactive Data Analysis Tools. Sara Alspaugh, Archana Ganapathi, Marti Hearst and Randy Katz
	Explorable Visual Analytics, Knowledge Discovery in Large and High-Dimensional Data. Saman Amirpour Amraii, Michael Lewis, Randy Sargent and Illah Nourbakhsh
	Toward Usable Interactive Analytics: Coupling Cognition and Computation. Alex Endert, Chris North, Remco Chang and Michelle Zhou
5:20	Closing

Important Dates

Submission	Fri, June 20, 2014, 23:59 Eastern time (EDT)
Notification	~~Mon, July 7, 2014~~
Camera-ready	~~Fri, July 18, 2014~~
Workshop	Sun, August 24, 2014

Keynotes

Prof. Ben Shneiderman
University of Maryland, College Park
Information Visualization for Knowledge Discovery: Big Insights from Big Data

BEN SHNEIDERMAN is a Distinguished University Professor in the Department of Computer Science and Founding Director (1983-2000) of the Human-Computer Interaction Laboratory at the University of Maryland. He is a Fellow of the AAAS, ACM, and IEEE, and a Member of the National Academy of Engineering, in recognition of his pioneering contributions to human-computer interaction and information visualization. His contributions include the direct manipulation concept, clickable web-link, touchscreen keyboards, dynamic query sliders for Spotfire, development of treemaps, innovative network visualization strategies for NodeXL, and temporal event sequence analysis for electronic health records.

Ben is the co-author with Catherine Plaisant of Designing the User Interface: Strategies for Effective Human-Computer Interaction (5th ed., 2010). With Stu Card and Jock Mackinlay, he co-authored Readings in Information Visualization: Using Vision to Think (1999). His book Leonardo’s Laptop appeared in October 2002 (MIT Press) and won the IEEE book award for Distinguished Literary Contribution. His latest book, with Derek Hansen and Marc Smith, is Analyzing Social Media Networks with NodeXL (2010).

Abstract
Interactive information visualization tools provide researchers with remarkable capabilities to support discovery from Big Data resources. Users can begin with an overview, zoom in on areas of interest, filter out unwanted items, and then click for details-on-demand. The Big Data initiatives and commercial success stories such as Spotfire and Tableau, plus widespread use by prominent sites such as the New York Times, have made visualization a key technology.

The central theme is the integration of statistics with visualization to support user discovery. Our work focuses on temporal event sequences such as those found in electronic health records (www.cs.umd.edu/hcil/eventflow), and social network data such as Twitter discussion patterns (www.codeplex.com/nodexl). The talk closes with 8 Golden Rules for Big Data.

Prof. Aditya Parameswaran
University of Illinois (UIUC)
Human-Powered and Visual Data Management

Aditya Parameswaran is an Assistant Professor in Computer Science at the University of Illinois (UIUC). He is currently spending the year visiting MIT CSAIL, after completing his Ph.D. from Stanford University in Sept. 2013, advised by Prof. Hector Garcia-Molina. He is broadly interested in data analytics, with research results in human computation, visual analytics, information extraction and integration, and recommender systems. Aditya is a recipient of the Arthur Samuel award for the best dissertation in Computer Science at Stanford (2013), the SIGMOD Jim Gray dissertation award (2014), the SIGKDD dissertation award runner-up (2014), the Key Scientific Challenges Award from Yahoo! Research (2010), two best-of-conference citations (VLDB 2010 and KDD 2012), the Terry Groswith graduate fellowship at Stanford (2007), and the Gold Medal in Computer Science at IIT Bombay (2007).

Abstract
This talk will consist of two parts. The first part will be on an ongoing project:

Fully automated algorithms are inadequate for many data analysis tasks, especially those involving images, video, or text. Thus, we need to combine crowdsourcing with traditional computation, to improve the process of understanding, extracting and managing data. In this part, I will present a broad perspective of our research on this topic. I will then present details of one of the problems we have addressed: filtering large data sets with the aid of humans. For more details, see: i.stanford.edu/~adityagp/scoop.html

The second part will be on a project that is just starting off:

Data scientists rely on visualizations to interpret the data returned by queries, but finding the right visualization remains a manual task that is often laborious. We propose a system that partially automates the task of finding the right visualizations for a query. The output will comprise a recommendation of potentially "interesting" or "useful" visualizations, where each visualization is coupled with a suitable query execution plan. I will discuss the technical challenges in building this system and preliminary results, and outline an agenda for future research. For more details, see http://goo.gl/FHZY61 (to appear at VLDB '14)

Organizers

Polo Chau
Georgia Tech

Jilles Vreeken
Max Planck Institute for Informatics and Saarland University

Matthijs van Leeuwen
KU Leuven

Christos Faloutsos
Carnegie Mellon

Sponsors & Supporters

Program Committee

Adam Perer (IBM, USA)
Andreas Holzinger (Medical University Graz, Austria)
Antti Oulasvirta (Aalto University, Finland)
Antti Ukkonen (Aalto University, Finland)
Arno Knobbe (Universiteit Leiden, Netherlands)
Arno Siebes (Universiteit Utrecht, Netherlands)
Cody Dunne (IBM Watson, USA)
Dafna Shahaf (Stanford, USA)
Esther Galbrun (Boston University, USA)
Fei Sha (University of Southern California, USA)
Geoff Webb (Monash University, Australia)
George Forman (HP Labs, USA)
Hanghang Tong (CUNY and Arizona State University, USA)
Jaakko Hollmén (Aalto University, Finland)
Jaegul Choo (Georgia Tech, USA)
Jefrey Lijffijt (Aalto University, Finland)
Kai Puolamäki (Aalto University, Finland)
Klaus Mueller (Stony Brook University, USA)
Leman Akoglu (Stony Brook University, USA)
Lisa Singh (Georgetown University, USA)
Michael Berthold (University of Konstanz, Germany)
Nan Cao (IBM, USA)
Nikolaj Tatti (Aalto University, Finland)
Olivier Thonnard (Symantec)
Parikshit Ram (Georgia Tech, USA)
Pauli Miettinen (Max Planck Institute for Informatics, Germany)
Saleema Amershi (Microsoft Research, USA)
Stefan Kramer (University of Mainz, Germany)
Thomas Gärtner (University of Bonn, Germany)
Thomas Seidl (Aachen University, Germany)
Tijl De Bie (University of Bristol, UK)
Tina Eliassi-Rad (Rutgers, USA)
U Kang (KAIST)
Zhicheng 'Leo' Liu (Stanford, USA)

What's the IDEA?

We have entered the era of big data. Massive datasets, surpassing terabytes and petabytes, are now commonplace. They arise in numerous settings in science, government, and enterprises. Today, technology exists by which we can collect and store such massive amounts of information. Yet, making sense of these data remains a fundamental challenge. We lack the means to analyze databases of this scale in an exploratory way. Currently, few technologies allow us to freely "wander" around the data and make discoveries by following our intuition or by serendipity. While standard data mining aims at finding highly interesting results, it is typically computationally demanding and time consuming, and thus may not be well suited for interactive exploration of large datasets.

Interactive data mining techniques that aptly integrate human intuition, by means of visualization and intuitive human-computer interaction (HCI) techniques, and machine computation support have been shown to help people gain significant insights into a wide range of problems. However, as datasets are being generated in larger volumes, higher velocity, and greater variety, creating effective interactive data mining techniques becomes a much harder task.

Our focus and emphasis is on interactivity and effective integration of techniques from data mining, visualization and human-computer interaction. In other words, we intend to explore how the best of these different but related domains can be combined such that the sum is greater than the parts.

Call for Papers

Topics of interest for the workshop include, but are not limited to:

interactive data mining algorithms
visualizations for interactive data mining
demonstrations of interactive data mining
quick, high-level data analysis methods
any-time data mining algorithms
visual analytics
methods that allow meaningful intermediate results
data surrogates
on-line algorithms
adaptive stream mining algorithms
theoretical/complexity analysis of instant data mining
learning from user input for action replication/prediction
active learning / mining

Submission Information

All papers will be peer reviewed (single-blind). We welcome many kinds of papers, including but not limited to:

Novel research papers
Demo papers
Work-in-progress papers
Visionary papers (white papers)

Authors should clearly indicate in their abstracts which kind of submission the paper is, to help reviewers better understand its contributions. Submissions must be in PDF, written in English, no more than 10 pages long (shorter papers are welcome), and formatted according to the standard double-column ACM Proceedings Style (Tighter Alternate style).

For accepted papers, at least one author must attend the workshop to present the work.

For paper submission, proceed to the IDEA 2014 submission website.