Come join our sponsored poster + demo + networking session at 3pm (Mon, Aug 14, in Suite 301)! Food and drinks will be served!
Program & proceedings now online! See you in Halifax!

The Interactive Data Exploration and Analytics (IDEA) workshop addresses the development of data mining techniques that allow users to interactively explore their data. We focus on and emphasize interactivity and the effective integration of techniques from data mining, visualization, and human-computer interaction (HCI). In other words, we explore how the best of these different but related domains can be combined such that the sum is greater than the parts. The IDEAs at KDD in Chicago 2013, in New York City 2014, in Sydney 2015, and in San Francisco 2016 were all a great success.

Program & Attending IDEA

IDEA will be a full-day workshop on Monday, Aug 14, at KDD 2017, in Suite 301 of the World Trade and Convention Centre. You may register and book hotel rooms through KDD.

In total, 12 papers have been accepted: 6 for oral presentation over the day, and all 12 for interactive discussion at the poster + demo + networking session sponsored by Microsoft Research.

Download IDEA'17 proceedings (20MB)
8:15 Welcome to IDEA'17
8:30 Keynote 1
Dr. Rich Caruana
Microsoft Research
Interactive Machine Learning via Transparent Modeling: Putting Experts in the Driver’s Seat

Rich Caruana is a Senior Researcher at Microsoft Research. Before joining Microsoft, Rich was on the faculty in the Computer Science Department at Cornell University, at UCLA's Medical School, and at CMU's Center for Learning and Discovery. Rich's Ph.D. is from Carnegie Mellon University, where he worked with Tom Mitchell and Herb Simon. His thesis on Multi-Task Learning helped create interest in a new subfield of machine learning called Transfer Learning. Rich received an NSF CAREER Award in 2004 (for Meta Clustering), best paper awards in 2005 (with Alex Niculescu-Mizil), 2007 (with Daria Sorokina), and 2014 (with Todd Kulesza, Saleema Amershi, Danyel Fisher, and Denis Charles), co-chaired KDD in 2007 (with Xindong Wu), and serves as area chair for NIPS, ICML, and KDD. His current research focus is on learning for medical decision making, transparent modeling, deep learning, and computational ecology.

In machine learning, a tradeoff must often be made between accuracy and intelligibility: the most accurate models usually are not very intelligible (e.g., deep nets, boosted trees, and random forests), and the most intelligible models usually are less accurate (e.g., linear or logistic regression). This tradeoff often limits the accuracy of models that can be used in mission-critical applications such as healthcare, where being able to understand, validate, edit, and ultimately trust a learned model is important. We have developed a learning method based on generalized additive models called GA2Ms that is often as accurate as full complexity models, but as intelligible as linear/logistic regression models. GA2Ms not only make it easy to understand what a model learned and how it makes predictions, but also make it easier to edit the model when it learns “bad” things. These bad things typically arise not because the learning algorithm is wrong, but because the data has unexpected “landmines” hidden in it. Making it possible for experts to understand a model and interactively repair it is critical for safe deployment because most data has such landmines. In the talk I’ll present case studies where these transparent, high-performance GAMs are applied to problems in healthcare and recidivism prediction, and explain what we’re doing to make the models easier for experts to understand and edit.
9:20 Research Talks (time allocation: 15+5 each)
Learning Strategies in Game-theoretic Data Interaction
Ben McCamish, Arash Termehchy, Behrouz Touri, and Liang Huang
Exploring Dimensionality Reductions with Forward and Backward Projections
Marco Cavallo and Cagatay Demiralp
10:00 Coffee
10:30 Research Talks (time allocation: 15+5 each)
Foresight: Recommending Visual Insights
Cagatay Demiralp, Peter J. Haas, Srinivasan Parthasarathy, and Tejaswini Pedapati
10:50 Keynote 2
Prof. Leman Akoglu
Carnegie Mellon University
Understanding Node-Attributed Networks: Interactive Exploration and Summarization

Leman Akoglu is an assistant professor of Information Systems at the Heinz College of Carnegie Mellon University, with courtesy appointments in the Computer Science and Machine Learning Departments of the School of Computer Science. She received her PhD from the Computer Science Department at Carnegie Mellon University. Her research interests are algorithmic problems in graph mining, focusing on patterns and anomalies, with applications to fraud and event detection. Dr. Akoglu's research has won 5 publication awards: Best Paper Runner-up at SIAM SDM 2016, Best Paper at SIAM SDM 2015, Best Paper at ADC 2014, Best Paper at PAKDD 2010, and Best Knowledge Discovery Paper at ECML/PKDD 2009. She is also a recipient of the NSF CAREER award (2015) and the Army Research Office Young Investigator award (2013).

Given a large network with node attributes, such as a social network, how can we make sense of it? How can we characterize, describe, and summarize the network in a succinct way? Visually exploring networks is a challenge when the network size exceeds several hundred nodes. It is even more challenging to visualize attributed networks with tens or hundreds of node attributes.

In this talk, I will introduce an end-to-end approach to sense-making of node-attributed networks. The key idea is “description-by-parts”, where the emphasis is on the main building blocks: the communities that the network contains. In order, I will introduce i) a quality measure for node-attributed communities called ‘normality’, ii) a community extraction technique based on ‘normality’, iii) a summarization task of identifying a few representative communities, where users get to adjust the aspect of the summarization to focus on: network coverage, quality, or attribute diversity; and finally iv) an interactive visualization interface that enables users to explore the communities and devise their own summaries or build on algorithm-generated summaries to devise alternative summaries. At the end, I will give a demo of our system (implemented in Tableau and Java) on a real-world Facebook college network.
11:40 Lunch
1:00 Re-welcome
1:10 Research Talks (time allocation: 10+5 each)
Interactive Unsupervised Clustering with Clustervision
Bum Chul Kwon, Ben Eysenbach, Janu Verma, Kenney Ng, and Adam Perer
Incorporating Feedback into Tree-based Anomaly Detection
Shubhomoy Das, Weng-Keen Wong, Alan Fern, Thomas Dietterich, and Md. Amran Siddiqui
Portable In-Browser Data Cube Exploration
Kareem El Gebaly, Lukasz Golab, and Jimmy Lin
2:10 Keynote 3
Prof. Samuel Kaski
Aalto University & Helsinki Institute for Information Technology (HIIT)
Interactive intent modeling

Samuel Kaski is an Academy (research) Professor of the Academy of Finland, Professor of Computer Science at Aalto University, and Director of the Finnish Center of Excellence in Computational Inference Research COIN. His field is probabilistic machine learning, with applications involving multiple data sources in interactive information retrieval, data visualization, health and biology.

I will discuss our recent work on interactive machine learning in two closely related setups: (i) interactive intent modeling for information discovery and (ii) knowledge elicitation on features for improving predictive modelling given limited high-dimensional data. Both setups require balancing exploration and exploitation in the interactions, as well as interactive modelling of the user, which can be formulated as experimental design or multi-armed bandit problems. I will also discuss extensions to multimodal interfaces, including mind reading, and to inferring more advanced cognitive models from data with Approximate Bayesian Computation.
Posters + Interactive Demo + Networking Session (with food & drinks!)
12 posters total, including those for oral presentations
Visualizing Wikipedia for Interactive Exploration
Ron Bekkerman and Olga Donin
DycomDetector: Discover topics using automatic community detections in dynamic networks
Tommy Dang, Vinh Nguyen, and Md. Yasin Kabir
ECOviz: Comparative Visualization of Time-Evolving Network Summaries
Lisa Jin and Danai Koutra
Clipped Projections for More Informative Visualizations [A Work-in-Progress Report]
Bo Kang, Junning Deng, Jefrey Lijffijt, and Tijl De Bie
Towards an Interactive Learning-to-Rank System for Economic Competitiveness Understanding
Caitlin Kuhlman and Elke Rundensteiner
Data Sketches for Disaggregated Subset Sum Estimation
Daniel Ting

What's the IDEA?

We have entered the era of big data. Massive datasets, surpassing terabytes and petabytes, are now commonplace. They arise in numerous settings in science, government, and enterprises. Today, technology exists by which we can collect and store such massive amounts of information. Yet, making sense of these data remains a fundamental challenge. We lack the means to exploratively analyze databases of this scale. Currently, few technologies allow us to freely "wander" around the data and make discoveries by following our intuition or serendipity. While standard data mining aims at finding highly interesting results, it is typically computationally demanding and time consuming, and thus may not be well suited for interactive exploration of large datasets.

Interactive data mining techniques that aptly integrate human intuition, by means of visualization and intuitive human-computer interaction (HCI) techniques, with machine computation support have been shown to help people gain significant insights into a wide range of problems. However, as datasets are being generated in larger volumes, at higher velocity, and in greater variety, creating effective interactive data mining techniques becomes a much harder task.

Our focus and emphasis is on interactivity and effective integration of techniques from data mining, visualization and human-computer interaction. In other words, we intend to explore how the best of these different but related domains can be combined such that the sum is greater than the parts.

Important Dates

Submission Fri, June 2, 2017, 23:59 Hawaii Time
Notification Fri, June 23, 2017
Camera-ready Fri, July 7, 2017
Workshop Mon, August 14, 2017

Call for Papers

Topics of interest for the workshop include, but are not limited to:
  • Interactive data mining algorithms
  • Visualizations for interactive data mining
  • Demonstrations of interactive data mining
  • Quick, high-level data analysis methods
  • Any-time data mining algorithms
  • Visual analytics
  • Methods that allow meaningful intermediate results
  • Data surrogates
  • On-line algorithms
  • Adaptive stream mining algorithms
  • Theoretical/complexity analysis of instant data mining
  • Learning from user input for action replication/prediction
  • Active learning / mining

Submission Information

All papers will be peer reviewed (single-blind). We welcome many kinds of papers, including (but not limited to):

  • Novel research papers
  • Demo papers
  • Work-in-progress papers
  • Visionary papers (white papers)
  • Appraisal papers of existing methods and tools (e.g., lessons learned)
  • Relevant work that has been previously published
  • Work that will be presented at the main conference of KDD

Authors should clearly indicate in their abstracts the kind of submission their paper belongs to, to help reviewers better understand its contributions. Submissions must be in PDF, written in English, no more than 10 pages long (shorter papers are welcome), and formatted according to the standard double-column ACM Sigconf Proceedings Style.

The accepted papers will be posted on the workshop website and will not appear in the KDD proceedings.

For accepted papers, at least one author must attend the workshop to present the work.

For paper submission, proceed to the IDEA 2017 submission website.

IDEA 2017 Keynotes

Dr. Rich Caruana
Microsoft Research
Prof. Leman Akoglu
Carnegie Mellon University
Prof. Samuel Kaski
Aalto University, HIIT

IDEA 2016 Keynotes

Prof. Jerome H. Friedman
Stanford University
Prof. Jeffrey Heer
University of Washington, Trifacta
Prof. Eamonn Keogh
UC Riverside
Dr. Saleema Amershi
Microsoft Research

IDEA 2015 Keynotes

Prof. Geoff Webb
Monash University
Prof. Jure Leskovec
Stanford University

IDEA 2014 Keynotes

Prof. Ben Shneiderman
University of Maryland, College Park
Prof. Aditya Parameswaran
University of Illinois (UIUC)

IDEA 2013 Keynotes

Prof. Haesun Park
Georgia Tech
Prof. Marti Hearst
UC Berkeley


Program Committee

Acar Tamersoy (Symantec, USA)
Adam Perer (IBM, USA)
Aristides Gionis (Aalto U, Finland)
Bahador Saket (Georgia Tech, USA)
Bo Kang (Ghent U, Belgium)
Danai Koutra (UMich, USA)
Emilia Oikarinen (FIOH, Finland)
Esther Galbrun (INRIA Nancy, France)
Geoff Webb (Monash U, Australia)
George Forman (Amazon)
Hannah Kim (Georgia Tech, USA)
Jaegul Choo (Korea U, South Korea)
James Abello (Rutgers U, USA)
Jia-Yu (Tim) Pan (Google, USA)
Kai Puolamäki (FIOH, Finland)
Kashyap Popat (MPI-INF, Germany)
Kevin Roundy (Symantec, USA)
Marti Hearst (UC Berkeley, USA)
Mario Boley (MPI-INF, Germany)
Minsuk (Brian) Kahng (Georgia Tech, USA)
Nan Cao (NYU Shanghai, China)
Parikshit Ram (Skytree, USA)
Pauli Miettinen (MPI-INF, Germany)
Robert Pienta (Georgia Tech, USA)
Saleema Amershi (Microsoft Research, USA)
Stefan Kramer (U Mainz, Germany)
Steffen Koch (U Stuttgart, Germany)
Stephan Günnemann (TU Munich, Germany)
Subhabrata Mukherjee (MPI-INF, Germany)
Sucheta Soundarajan (Syracuse U, USA)
Thomas Gärtner (U Nottingham, UK)
Thomas Seidl (LMU Munich, Germany)
Tijl De Bie (Ghent U, Belgium)
Wouter Duivesteijn (Eindhoven Tech, Netherlands)