The Interactive Data Exploration and Analytics (IDEA) workshop addresses the development of data mining techniques that allow users to interactively explore their data. We emphasize interactivity and the effective integration of techniques from data mining, visualization and human-computer interaction (HCI). In other words, we explore how the best of these different but related domains can be combined such that the sum is greater than the parts. The IDEA workshops at KDD in Chicago (2013), New York City (2014), Sydney (2015), and San Francisco (2016) were all a great success.
|8:15||Welcome to IDEA'17|
Interactive Machine Learning via Transparent Modeling: Putting Experts in the Driver’s Seat
Rich Caruana is a Senior Researcher at Microsoft Research. Before joining Microsoft, Rich was on the faculty in the Computer Science Department at Cornell University, at UCLA's Medical School, and at CMU's Center for Learning and Discovery. Rich's Ph.D. is from Carnegie Mellon University, where he worked with Tom Mitchell and Herb Simon. His thesis on Multi-Task Learning helped create interest in a new subfield of machine learning called Transfer Learning. Rich received an NSF CAREER Award in 2004 (for Meta Clustering), best paper awards in 2005 (with Alex Niculescu-Mizil), 2007 (with Daria Sorokina), and 2014 (with Todd Kulesza, Saleema Amershi, Danyel Fisher, and Denis Charles), co-chaired KDD in 2007 (with Xindong Wu), and serves as area chair for NIPS, ICML, and KDD. His current research focus is on learning for medical decision making, transparent modeling, deep learning, and computational ecology.
In machine learning, a tradeoff must often be made between accuracy and intelligibility: the most accurate models usually are not very intelligible (e.g., deep nets, boosted trees, and random forests), and the most intelligible models usually are less accurate (e.g., linear or logistic regression). This tradeoff often limits the accuracy of models that can be used in mission-critical applications such as healthcare, where being able to understand, validate, edit, and ultimately trust a learned model is important. We have developed a learning method based on generalized additive models, called GA2Ms, that is often as accurate as full-complexity models but as intelligible as linear/logistic regression models. GA2Ms not only make it easy to understand what a model learned and how it makes predictions, but also make it easier to edit the model when it learns “bad” things. These bad things typically arise not because the learning algorithm is wrong, but because the data has unexpected “landmines” hidden in it. Making it possible for experts to understand a model and interactively repair it is critical for safe deployment because most data has such landmines. In the talk I’ll present case studies where these transparent, high-performance GAMs are applied to problems in healthcare and recidivism prediction, and explain what we’re doing to make the models easier for experts to understand and edit.
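The following is a minimal sketch of the idea in the abstract above, not the speaker's implementation. It assumes the commonly described form of a GA2M: an intercept plus one shape function per feature plus a small number of pairwise interaction terms, passed through a logistic link. The feature names (`x1`, `x2`), the intercept, and all term values are hypothetical and hand-written for illustration; a real GA2M would learn the shapes from data.

```python
import math

# Hypothetical intercept and hand-written terms (illustrative only;
# a real GA2M learns these shape functions from data).
INTERCEPT = -2.0


def make_terms():
    """Return the additive terms of a toy model over features x1 (numeric) and x2 (boolean)."""
    return {
        "x1": lambda x1, x2: 0.8 if x1 >= 65 else 0.0,                 # main effect of x1
        "x2": lambda x1, x2: -0.5 if x2 else 0.0,                      # main effect of x2
        "x1*x2": lambda x1, x2: 0.3 if (x1 >= 65 and x2) else 0.0,     # pairwise interaction
    }


def contributions(terms, x1, x2):
    """Per-term contributions to the score; this is what an expert would inspect."""
    return {name: f(x1, x2) for name, f in terms.items()}


def predict_proba(terms, x1, x2):
    """Sum the intercept and all term contributions, then apply the logistic link."""
    score = INTERCEPT + sum(contributions(terms, x1, x2).values())
    return 1.0 / (1.0 + math.exp(-score))


terms = make_terms()
before = predict_proba(terms, 70, True)

# "Editing" the model: an expert who distrusts the x2 term replaces it
# with a flat zero contribution, leaving every other term untouched.
terms["x2"] = lambda x1, x2: 0.0
after = predict_proba(terms, 70, True)

print(contributions(make_terms(), 70, True))
print(before, after)
```

Because the prediction is a plain sum of named terms, each term can be inspected or replaced on its own, which is the property that makes this kind of interactive repair possible in the sketch.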
Research Talks (time allocation: 15+5 each)
Carnegie Mellon University
Understanding Node-Attributed Networks: Interactive Exploration and Summarization
Leman Akoglu is an assistant professor of Information Systems at the Heinz College of Carnegie Mellon University, with courtesy appointments in the Computer Science and Machine Learning Departments of the School of Computer Science. She received her PhD from the Computer Science Department at Carnegie Mellon University. Her research interests are algorithmic problems in graph mining, focusing on patterns and anomalies, with applications to fraud and event detection. Dr. Akoglu's research has won five publication awards: Best Paper Runner-up at SIAM SDM 2016, Best Paper at SIAM SDM 2015, Best Paper at ADC 2014, Best Paper at PAKDD 2010, and Best Knowledge Discovery Paper at ECML/PKDD 2009. She is also a recipient of the NSF CAREER award (2015) and the Army Research Office Young Investigator award (2013).
Given a large network with node attributes, like social networks, how can we make sense of it? How can we characterize, describe, and summarize the network in a succinct way? Visually exploring networks is a challenge when the network size exceeds several hundreds of nodes. It is even more challenging to visualize attributed networks with tens or hundreds of node attributes.
In this talk, I will introduce an end-to-end approach to sense-making of node-attributed networks. The key idea is “description-by-parts”, where the emphasis is on the main building blocks: the communities that the network contains. In order, I will introduce i) a quality measure for node-attributed communities called ‘normality’, ii) a community extraction technique based on ‘normality’, iii) a summarization task of identifying a few representative communities, where users get to adjust the aspect of the summarization to focus on: network coverage, quality, or attribute diversity; and finally iv) an interactive visualization interface that enables users to explore the communities and devise their own summaries or build on algorithm-generated summaries to devise alternative summaries. At the end, I will give a demo of our system (implemented in Tableau and Java) on a real-world Facebook college network.
Research Talks (time allocation: 10+5 each)
Aalto University & Helsinki Institute for Information Technology (HIIT)
Interactive intent modeling
Samuel Kaski is an Academy (research) Professor of the Academy of Finland, Professor of Computer Science at Aalto University, and Director of the Finnish Center of Excellence in Computational Inference Research COIN. His field is probabilistic machine learning, with applications involving multiple data sources in interactive information retrieval, data visualization, health and biology.
I will discuss our recent work on interactive machine learning in two closely related setups: (i) interactive intent modeling for information discovery and (ii) knowledge elicitation on features for improving predictive modelling given limited high-dimensional data. Both setups require balancing exploration and exploitation in the interactions, as well as interactive modelling of the user, which can be formulated as experimental design or multi-armed bandit problems. I will also discuss extensions to multimodal interfaces, including mind reading, and to inferring more advanced cognitive models from data with Approximate Bayesian Computation.
Posters + Interactive Demo + Networking Session (with food & drinks!) 12 posters total, including those for oral presentations
We have entered the era of big data. Massive datasets, surpassing terabytes and petabytes, are now commonplace. They arise in numerous settings in science, government, and enterprises. Today, technology exists by which we can collect and store such massive amounts of information. Yet, making sense of these data remains a fundamental challenge. We lack the means to analyze databases of this scale in an exploratory way. Currently, few technologies allow us to freely "wander" around the data and make discoveries by following our intuition or by serendipity. While standard data mining aims at finding highly interesting results, it is typically computationally demanding and time consuming, and thus may not be well suited for interactive exploration of large datasets.
Interactive data mining techniques that aptly integrate human intuition, by means of visualization and intuitive human-computer interaction (HCI) techniques, with machine computation have been shown to help people gain significant insights into a wide range of problems. However, as datasets are being generated in larger volumes, at higher velocity, and in greater variety, creating effective interactive data mining techniques becomes a much harder task.
Our focus and emphasis is on interactivity and effective integration of techniques from data mining, visualization and human-computer interaction. In other words, we intend to explore how the best of these different but related domains can be combined such that the sum is greater than the parts.
|Workshop||Mon, August 14, 2017|
All papers will undergo single-blind peer review. We welcome many kinds of papers, including but not limited to:
Authors should clearly indicate in their abstracts the kind of submission their paper belongs to, to help reviewers better understand their contributions. Submissions must be in PDF, written in English, and no more than 10 pages long (shorter papers are welcome), and must be formatted according to the standard double-column ACM Sigconf Proceedings Style.
The accepted papers will be posted on the workshop website and will not appear in the KDD proceedings.
For accepted papers, at least one author must attend the workshop to present the work.
For paper submission, proceed to the IDEA 2017 submission website.