Interactive Data Exploration and Analytics (IDEA 2015) - Workshop at ACM SIGKDD Aug 10 2015

IDEA 2015 was another great success! Join us at IDEA 2016 at KDD in San Francisco on August 14, 2016! Submissions due May 27.

Come join our Microsoft Research-sponsored networking + poster + demo session at 4:10pm (Mon, Aug 10, at the Hilton Sydney, Level 4, Room 1)! Teacakes, croissants, appetizers and drinks will be served.

DOUBLE THE EXCITEMENT! KDD15 program chair Geoff Webb and KDD14 program chair Jure Leskovec will give keynotes!

Download our poster!
Tweet #idea2015

The Interactive Data Exploration and Analytics (IDEA) workshop addresses the development of data mining techniques that allow users to interactively explore their data. We focus on interactivity and the effective integration of techniques from data mining, visualization and human-computer interaction (HCI). In other words, we explore how the best of these different but related domains can be combined such that the sum is greater than the parts. The previous IDEA workshops, at KDD 2013 in Chicago and KDD 2014 in New York City, were great successes.


Impression of IDEA 2014 in New York City

Program & Attending IDEA

IDEA will be a full-day workshop on Monday, Aug 10, at KDD 2015 at the Hilton Sydney, on Level 4, Room 1. Register and book hotel rooms through KDD's registration site.

We are proud to have Microsoft Research as the Headline supporter of IDEA 2015!

You are cordially invited to join the Microsoft Research-supported Poster + Interactive Demo + Networking session at 4:10 P.M.!

In total, 9 papers were accepted at IDEA 2015, both for oral presentation during the day and for interactive discussion at the poster + demo + networking session.

Download IDEA'15 proceedings (27MB)

8:50	Welcome
9:00	Keynote 1
	Prof. Geoff Webb, Monash University
	The Knowledge Factory: A Retrospective
9:50	Research Talks (time allocation: 15+5 min each)
	Visual Interactive Neighborhood Mining on High Dimensional Data
	Emin Aksehirli, Bart Goethals and Emmanuel Müller
	Creedo: Scalable and Repeatable Extrinsic Evaluation for Pattern Discovery Systems by Online User Studies
	Mario Boley, Maike Krause-Traudes, Bo Kang and Björn Jacobs
	The free, open-source implementation is available at www.realkd.org.
10:30	Coffee
11:00	Research Talks (time allocation: 15+5 min each)
	ISPARK: Interactive Visual Analytics for Fire Incidents and Station Placement
	Subhajit Das, Andrea McCarter, Joe Minieri, Nandita Damaraju, Sriram Padmanabhan and Duen Horng Chau
	Interactive Clustering with a High-Performance ML Toolkit
	Biye Jiang and John Canny
	Opinion Marks: A Human-Based Computation Approach to Instill Structure into Unstructured Text on the Web
	Bum Chul Kwon, Jaegul Choo, Sung-Hee Kim, Daniel Keim, Haesun Park and Ji Soo Yi
	In Search of User Features for Identifying Different Inspection Behaviors on Recommended Items
	Kibeom Lee, Sangmin Lee and Kyogu Lee
12:20	Lunch
1:45	Re-welcome
1:50	Keynote 2
	Prof. Jure Leskovec, Stanford University
	Machine Learning for Human Decision Making
2:40	Research Talks (time allocation: 15+5 min each)
	Empirical Comparison of Active Learning Strategies for Handling Temporal Drift
	Mohit Kumar, Mohak Shah, Rayid Ghani and Zubin Abraham
3:00	Coffee
3:30	Talks (time allocation: 15+5 min each)
	RedRock: Interactive Analysis and Visualization of Very Large Data Sets
	Hao Wang and Albith Joel Colon Figueroa
	Data Mining meets HCI: Making Sense of Big Data
	Polo Chau
	Interactive Data Repositories: From Data Sharing to Interactive Data Exploration & Visualization
	Ryan Rossi and Nesreen Ahmed
	On a Subject I Like to Talk About
	Great Unknown
4:10	Posters + Interactive Demo + Networking Session (with teacakes, croissants, appetizers and drinks)
5:20	Closing

Important Dates

Submission	~~Fri, June 5, 2015, 23:59 Hawaii Time~~
Notification	~~Tue, June 30, 2015~~
Camera-ready	~~Wed, July 15, 2015~~
Workshop	Mon, August 10, 2015

Call for Papers

Topics of interest for the workshop include, but are not limited to:

interactive data mining algorithms
visualizations for interactive data mining
demonstrations of interactive data mining
quick, high-level data analysis methods
any-time data mining algorithms
visual analytics
methods that allow meaningful intermediate results
data surrogates
on-line algorithms
adaptive stream mining algorithms
theoretical/complexity analysis of instant data mining
learning from user input for action replication/prediction
active learning / mining

Submission Information

The IDEA chairs will co-edit a TKDD special issue on Interactive Data Exploration and Analytics. Researchers, including this year’s IDEA authors, will be invited to submit their best work. Stay tuned for more information.

All papers will be peer-reviewed (single-blind). We welcome many kinds of papers, including but not limited to:

Novel research papers
Demo papers
Work-in-progress papers
Visionary papers (white papers)

Authors should clearly indicate in their abstracts which kind of submission their paper is, to help reviewers better understand its contributions. Submissions must be in PDF, written in English, no more than 10 pages long (shorter papers are welcome), and formatted according to the standard double-column ACM Proceedings Style (Tighter Alternate style).

For accepted papers, at least one author must attend the workshop to present the work.

For paper submission, proceed to the IDEA 2015 submission website.

Keynotes

Prof. Geoff Webb
Monash University
The Knowledge Factory: A Retrospective

Geoff Webb is a Professor of Information Technology Research in the Faculty of Information Technology at Monash University, where he heads the Centre for Data Science. His primary research areas are machine learning, data mining, user modeling and computational structural biology. Many of his learning algorithms are included in the widely used Weka machine learning workbench. A commercial implementation of his association discovery techniques, Magnum Opus, has been acquired by BigML, Inc. for inclusion in their cloud-based data mining solution. From 2005 to 2014 he was editor-in-chief of Data Mining and Knowledge Discovery, the highest-impact data mining journal. He is co-editor of the Springer Encyclopedia of Machine Learning, a member of the advisory board of Statistical Analysis and Data Mining and a member of the editorial board of Machine Learning, and he was a foundation member of the editorial board of ACM Transactions on Knowledge Discovery from Data. He has been Program Committee Co-Chair of the two top data mining conferences, the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2015) and the IEEE International Conference on Data Mining (2010), and General Co-Chair of the 2012 IEEE International Conference on Data Mining. He is a technical advisor to BigML, Inc. He is an IEEE Fellow and has received the 2013 IEEE ICDM Service Award and a 2014 Australian Research Council Discovery Outstanding Researcher Award.

Abstract
This talk revisits the first major program of research into interactive rule discovery. The Knowledge Factory is an interactive rule learning system developed in the 1990s, and many of its novel features remain relevant today. The talk will cover the key techniques this research developed, interpreting them in light of subsequent developments in the field.

Prof. Jure Leskovec
Stanford University
Machine Learning for Human Decision Making

Jure Leskovec is an assistant professor of Computer Science at Stanford University and chief scientist at Pinterest. His research focuses on mining large social and information networks. The problems he investigates are motivated by large-scale data, the Web and online media. This research has won several awards, including a Microsoft Research Faculty Fellowship, the Alfred P. Sloan Fellowship and numerous best paper awards. Leskovec received his bachelor's degree in computer science from the University of Ljubljana, Slovenia, and his PhD in machine learning from Carnegie Mellon University, and he completed postdoctoral training at Cornell University. Jure also co-founded the machine learning startup Kosei, which was recently acquired by Pinterest. You can follow him on Twitter @jure

Abstract
In many real-life settings, human judges make decisions and choose among many alternatives in order to label or classify items: a medical doctor diagnosing a patient, a criminal court judge making a decision, a crowd-worker labeling an image, or a student answering a multiple-choice question. Gaining insight into human decision making is important for determining the quality of individual decisions as well as for identifying mistakes and biases. In this talk we discuss how to develop machine learning methodology for estimating the quality of individual judges and for obtaining diagnostic insights into how various judges decide on different kinds of items. We develop a series of increasingly powerful hierarchical Bayesian models that infer latent groups of judges and items, with the goal of obtaining insights into the underlying decision process. We apply our framework to a wide range of real-world domains and demonstrate that our approach can accurately predict judges' decisions, diagnose the types of mistakes judges tend to make, and infer the true labels of items.

IDEA 2014 Keynotes

Prof. Ben Shneiderman
University of Maryland, College Park

Prof. Aditya Parameswaran
University of Illinois (UIUC)

IDEA 2013 Keynotes

Prof. Haesun Park
Georgia Tech

Prof. Marti Hearst
UC Berkeley

Organizers

Polo Chau
Georgia Tech

Jilles Vreeken
Max Planck Institute for Informatics,
and Saarland University

Matthijs van Leeuwen
KU Leuven

Dafna Shahaf
Microsoft Research

Christos Faloutsos
Carnegie Mellon

Sponsors & Supporters

Program Committee

Adam Perer (IBM, USA)
Aditya Parameswaran (UIUC, USA)
Antti Ukkonen (Aalto University, Finland)
Arno Knobbe (Amsterdam U. of Applied Sciences, Netherlands)
Arno Siebes (Universiteit Utrecht, Netherlands)
B. Aditya Prakash (Virginia Tech, USA)
Cody Dunne (IBM Watson, USA)
Danai Koutra (CMU, USA)
Esther Galbrun (Boston University, USA)
Geoff Webb (Monash University, Australia)
George Forman (HP Labs, USA)
Hanghang Tong (Arizona State University, USA)
Hoang-Vu Nguyen (Cluster of Excellence MMCI, Germany)
Jacob Eisenstein (Georgia Tech, USA)
Jaegul Choo (Georgia Tech, USA)
Jefrey Lijffijt (University of Bristol, UK)
Kai Puolamäki (Institute for Occupational Health, Finland)
Leman Akoglu (Stony Brook, USA)
Lisa Singh (Georgetown, USA)
Marti Hearst (UC Berkeley, USA)
Nan Cao (IBM, USA)
Parikshit Ram (Georgia Tech, USA)
Pauli Miettinen (Max-Planck Institute for Informatics, Germany)
Saleema Amershi (Microsoft Research, USA)
Steffen Koch (University of Stuttgart, Germany)
Tijl De Bie (U. Bristol, UK)
Tina Eliassi-Rad (Rutgers, USA)
Tim (Jia-Yu) Pan (Google, USA)
Thomas Gärtner (U. Nottingham, UK)
U Kang (Seoul National University, Korea)
Wouter Duivesteijn (TU Dortmund, Germany)
Zhicheng 'Leo' Liu (Stanford, USA)

What's the IDEA?

We have entered the era of big data. Massive datasets, surpassing terabytes and petabytes, are now commonplace. They arise in numerous settings in science, government, and enterprises. Today, technology exists by which we can collect and store such massive amounts of information. Yet making sense of these data remains a fundamental challenge: we lack the means to explore databases of this scale. Currently, few technologies allow us to freely "wander" around the data and make discoveries by following our intuition or serendipity. While standard data mining aims at finding highly interesting results, it is typically computationally demanding and time-consuming, and thus may not be well suited to interactive exploration of large datasets.

Interactive data mining techniques that aptly combine human intuition, supported by visualization and intuitive human-computer interaction (HCI) techniques, with machine computation have been shown to help people gain significant insights into a wide range of problems. However, as datasets are generated in larger volumes, at higher velocity, and with greater variety, creating effective interactive data mining techniques becomes a much harder task.

Our focus is on interactivity and the effective integration of techniques from data mining, visualization and human-computer interaction. In other words, we intend to explore how the best of these different but related domains can be combined such that the sum is greater than the parts.