Interactive Data Exploration and Analytics (IDEA)

IDEA 2013 was a great success! Join us at IDEA 2014 at KDD in New York City on August 24, 2014! Submissions due June 20, 2014.

The Interactive Data Exploration and Analytics (IDEA) workshop addresses the development of data mining techniques that allow users to interactively explore their data. We focus and emphasize on interactivity and effective integration of techniques from data mining, visualization and human-computer interaction (HCI). In other words, we explore how the best of these different but related domains can be combined such that the sum is greater than the parts.

Program & Attending IDEA

IDEA will be a full-day workshop on Sunday, Aug 11, at ACM SIGKDD 2013 at Sheraton Chicago Hotel and Towers (map), in "Michigan A" room.

9:00	Welcome
9:10	Keynote 1 Prof. Haesun Park Georgia Tech School of Computational Science & Engineering Interactive Visual Analytics for High Dimensional Data Prof. Haesun Park received her B.S. degree in Mathematics from Seoul National University, Seoul Korea, in 1981 with summa cum laude and the University President's Medal for the top graduate, and her M.S. and Ph.D. degrees in Computer Science from Cornell University, Ithaca, NY, in 1985 and 1987, respectively. She has been a professor in the School of Computational Science and Engineering at the Georgia Institute of Technology, Atlanta, Georgia since 2005. Before joining Georgia Tech, she was on faculty at University of Minnesota, Twin Cities, and program director at the National Science Foundation, Arlington, VA. She has published extensively in the areas including numerical algorithms, data analysis, visual analytics, text mining, and parallel computing. She has been the director of the NSF/DHS FODAVA-Lead (Foundations of Data and Visual Analytics) center and executive director of Center for Data Analytics at Georgia Tech. She has served on numerous editorial boards including IEEE Transactions on Pattern Analysis and Machine Intelligence, SIAM Journal on Matrix Analysis and Applications, SIAM Journal on Scientific Computing, and has served as a conference co-chair for SIAM International Conference on Data Mining in 2008 and 2009. In 2013, she was elected as a SIAM Fellow. Many modern data sets can be represented in high dimensional vector spaces and have benefited from computational methods that utilize advanced techniques from numerical linear algebra and optimization. Visual analytics approaches have contributed greatly to data understanding and analysis due to utilization of both automated algorithms and human’s quick visual perception and interaction. However, visual analytics targeting high dimensional large-scale data has been challenging due to low dimensional screen space with limited pixels to represent data. Among various computational techniques supporting visual analytics, dimension reduction and clustering have played essential roles by reducing the dimension and volume to visually manageable scales. In this talk, we present some of the key foundational methods for supervised dimension reduction such as linear discriminant analysis (LDA), dimension reduction and clustering/topic discovery by nonnegative matrix factorization (NMF), and visual spatial alignment for effective fusion and comparisons by Orthogonal Procrustes. We demonstrate how these methods can effectively support interactive visual analytic tasks that involve large-scale document and image data sets.
10:00	Coffee
10:30	Talks (time allocation: 20, 20, 20, 15, 15) Zips: Mining Compressing Sequential Patterns in Streams Hoang Thanh Lam, Toon Calders, Jie Yang, Fabian Moerchen and Dmitriy Fradkin Methods for Exploring and Mining Tables on Wikipedia Chandra Sekhar Bhagavatula, Thanapon Noraset and Doug Downey One Click Mining—Interactive Local Pattern Discovery through Implicit Preference and Performance Learning Mario Boley, Bo Kang, Pavel Tokmakov, Michael Mampaey and Stefan Wrobel Online Spatial Data Analysis and Visualization System Yun Lu, Mingjin Zhang, Tao Li, Yudong Guang and Naphtali Rishe Building Blocks for Exploratory Data Analysis Tools Sara Alspaugh, Archana Ganapathi, Marti Hearst and Randy Katz
12:00	Lunch
2:00	Re-welcome
2:10	Keynote 2 Prof. Marti Hearst UC Berkeley School of Information Exploratory Text Analysis and The Middle Distance Dr. Marti Hearst is a professor in the UC Berkeley School of Information. She received BA, MS, and PhD degrees in Computer Science from UC Berkeley and was a Member of the Research Staff at Xerox PARC from 1994 to 1997. A primary focus of Dr. Hearst's research is user interfaces for search, and she is the author of the 2009 book Search User Interfaces. She has invented or participated in several well-known search interface projects including the Flamenco project that investigated and the promoted the use of faceted metadata for collection navigation, TileBars query term visualization, BioText search over the bioscience literature, and Scatter/Gather clustering of search results. She has also researched extensively in computational linguistics and text mining with a focus on detecting semantic relations, and text segmentation including discourse boundaries and abbreviation recognition. Her more recent research interests include user interfaces for the exploratory text analysis in the digital humanities and peer learning in MOOCS. In this talk I will describe a project whose goal is to help scholars and analysts discover patterns and formulate and test hypotheses about the contents of text collections, midway between what humanities scholars call a traditional "close read" and the new "distant read" or "culturomics" approach. To this end, we describe a text analysis and discovery tool called WordSeer that allows for highly flexible "slicing and dicing" (hence "sliding") across a text collection. We illustrate the text sliding capabilities of the tool with two real-world case studies from the humanities and social sciences – the practice of literacy education, and U.S. perceptions of China and Japan over the last 30 years – showing how the tool has enabled scholars with no technical background to make new discoveries in these text collections. (Joint work with Aditi Muralidharan. Sponsored by NEH HK-50011.)
3:00	Talks (time allocation: 15, 15) A Process-Centric Data Mining and Visual Analytic Tool for Exploring Complex Social Networks Denis Dimitrov, Lisa Singh and Janet Mann Lytic: Synthesizing High-Dimensional Algorithmic Analysis with Domain-Agnostic, Faceted Visual Analytics Edward Clarkson, Jaegul Choo, John Turgeson, Ray Decuir and Haesun Park
3:30	Coffee
4:00	Talks (time allocation: 15, 15, 15, 15) Towards Anytime Active Learning: Interrupting Experts to Reduce Annotation Costs Maria Ramirez, Aron Culotta and Mustafa Bilgic Randomly Sampling Maximal Itemsets Sandy Moens and Bart Goethals Augmenting MATLAB with Semantic Objects for an Interactive Visual Environment Changhyun Lee, Jaegul Choo, Haesun Park and Duen Horng Chau Storygraphs: Extracting patterns from spatio-temporal data Ayush Shrestha, Ying Zhu, Ben Miller and Yi Zhao
5:00	Closing

Two useful links:

Keynotes

Prof. Marti Hearst
UC Berkeley
School of Information
Exploratory Text Analysis and The Middle Distance

Dr. Marti Hearst is a professor in the UC Berkeley School of Information. She received BA, MS, and PhD degrees in Computer Science from UC Berkeley and was a Member of the Research Staff at Xerox PARC from 1994 to 1997.

A primary focus of Dr. Hearst's research is user interfaces for search, and she is the author of the 2009 book Search User Interfaces. She has invented or participated in several well-known search interface projects including the Flamenco project that investigated and the promoted the use of faceted metadata for collection navigation, TileBars query term visualization, BioText search over the bioscience literature, and Scatter/Gather clustering of search results. She has also researched extensively in computational linguistics and text mining with a focus on detecting semantic relations, and text segmentation including discourse boundaries and abbreviation recognition. Her more recent research interests include user interfaces for the exploratory text analysis in the digital humanities and peer learning in MOOCS.

In this talk I will describe a project whose goal is to help scholars and analysts discover patterns and formulate and test hypotheses about the contents of text collections, midway between what humanities scholars call a traditional "close read" and the new "distant read" or "culturomics" approach. To this end, we describe a text analysis and discovery tool called WordSeer that allows for highly flexible "slicing and dicing" (hence "sliding") across a text collection. We illustrate the text sliding capabilities of the tool with two real-world case studies from the humanities and social sciences – the practice of literacy education, and U.S. perceptions of China and Japan over the last 30 years – showing how the tool has enabled scholars with no technical background to make new discoveries in these text collections. (Joint work with Aditi Muralidharan. Sponsored by NEH HK-50011.)

Prof. Haesun Park
Georgia Tech
School of Computational Science & Engineering
Interactive Visual Analytics for High Dimensional Data

Prof. Haesun Park received her B.S. degree in Mathematics from Seoul National University, Seoul Korea, in 1981 with summa cum laude and the University President's Medal for the top graduate, and her M.S. and Ph.D. degrees in Computer Science from Cornell University, Ithaca, NY, in 1985 and 1987, respectively. She has been a professor in the School of Computational Science and Engineering at the Georgia Institute of Technology, Atlanta, Georgia since 2005. Before joining Georgia Tech, she was on faculty at University of Minnesota, Twin Cities, and program director at the National Science Foundation, Arlington, VA. She has published extensively in the areas including numerical algorithms, data analysis, visual analytics, text mining, and parallel computing. She has been the director of the NSF/DHS FODAVA-Lead (Foundations of Data and Visual Analytics) center and executive director of Center for Data Analytics at Georgia Tech. She has served on numerous editorial boards including IEEE Transactions on Pattern Analysis and Machine Intelligence, SIAM Journal on Matrix Analysis and Applications, SIAM Journal on Scientific Computing, and has served as a conference co-chair for SIAM International Conference on Data Mining in 2008 and 2009. In 2013, she was elected as a SIAM Fellow.

Many modern data sets can be represented in high dimensional vector spaces and have benefited from computational methods that utilize advanced techniques from numerical linear algebra and optimization. Visual analytics approaches have contributed greatly to data understanding and analysis due to utilization of both automated algorithms and human’s quick visual perception and interaction. However, visual analytics targeting high dimensional large-scale data has been challenging due to low dimensional screen space with limited pixels to represent data. Among various computational techniques supporting visual analytics, dimension reduction and clustering have played essential roles by reducing the dimension and volume to visually manageable scales.

In this talk, we present some of the key foundational methods for supervised dimension reduction such as linear discriminant analysis (LDA), dimension reduction and clustering/topic discovery by nonnegative matrix factorization (NMF), and visual spatial alignment for effective fusion and comparisons by Orthogonal Procrustes. We demonstrate how these methods can effectively support interactive visual analytic tasks that involve large-scale document and image data sets.

Organizers

Polo Chau
Georgia Tech

Jilles Vreeken
Universiteit Antwerpen

Matthijs van Leeuwen
KU Leuven

Christos Faloutsos
Carnegie Mellon

Poster

Program Committee

Adam Perer (IBM, USA)
Albert Bifet (Yahoo! Labs, Barcelona, Spain)
Aris Gionis (Aalto University, Finland)
Arno Knobbe (U Leiden)
Chris Johnson (University of Utah, USA)
Cody Dunne (UMD, USA)
David Gotz (IBM, USA)
Geoff Webb (Monash University, Australia)
George Forman (HP Labs)
Hanghang Tong (City University of New York)
Jaakko Hollmen (Aalto University, Finland)
Jacob Eisenstein (Georgia Tech)
Jaegul Choo (Georgia Tech)
Jiawei Han (University of Illinois at Urbana-Champaign)
Jimeng Sun (IBM, USA)
John Stasko (Georgia Tech)
Kai Puolamäki (Finnish Institute of Occupational Health, Finland)
Katharina Morik (TU.Dortmund)

Kayur Patel (Google)
Leman Akoglu (Stony Brook University)
Mario Boley (Fraunhofer IAIS, University of Bonn)
Marti Hearst (UC Berkeley, USA)
Martin Theobald (University of Antwerp, Belgium)
Nan Cao (IBM, USA)
Naren Ramakrishnan (Virginia Tech, USA)
Nikolaj Tatti (Aalto University, Finland)
Parikshit Ram (Georgia Tech, USA)
Pauli Mietinnen (Max Planck Institute for Informatics, Germany)
Saleema Amershi (Microsoft Research)
Tijl De Bie (University of Bristol, UK)
Tim (Jia-Yu) Pan (Google)
Tina Eliassi-Rad (Rutgers)
Tino Weinkauf (Max Planck Institute for Informatics, Germany)
Toon Calders (Université Libre de Bruxelles, Belgium)
Zhicheng 'Leo' Liu (Stanford)

What's the IDEA?

We have entered the era of big data. Massive datasets, surpassing terabytes and petabytes, are now commonplace. They arise in numerous settings in science, government, and enterprises. Today, technology exists by which we can collect and store such massive amounts of information. Yet, making sense of these data remains a fundamental challenge. We lack the means to exploratively analyze databases of this scale. Currently, few technologies allow us to freely "wander" around the data, and make discoveries by following our intuition, or serendipity. While standard data mining aims at finding highly interesting results, it is typically computationally demanding and time consuming, thus may not be well-suited for interactive exploration of large datasets.

Interactive data mining techniques that aptly integrate human intuition, by means of visualization and intuitive human-computer interaction (HCI) techniques, and machine computation support have been shown to help people gain significant insights into a wide range of problems. However, as datasets are being generated in larger volumes, higher velocity, and greater variety, creating effective interactive data mining techniques becomes a much harder task.

Our focus and emphasis is on interactivity and effective integration of techniques from data mining, visualization and human-computer interaction. In other words, we intend to explore how the best of these different but related domains can be combined such that the sum is greater than the parts.

Call for Papers

Topics of interests for the workshop include, but are not limited to:

interactive data mining algorithms
visualizations for interactive data mining
demonstrations of interactive data mining
quick, high-level data analysis methods
any-time data mining algorithms
visual analytics
methods that allow meaningful intermediate results
data surrogates
on-line algorithms
adaptive stream mining algorithms
theoretical/complexity analysis of instant data mining
learning from user input for action replication/prediction
active learning / mining

Important Dates

Submission	~~Fri, June 7, 23:59 Hawaii time (HST)~~
Notification	~~Mon, June 24~~
Camera-ready	~~Wed, July 3~~
Workshop	~~Sun, August 11~~

Submission Information

All papers will be peer reviewed, single-blinded. We welcome many kinds of papers, such as (and not limited to):

Novel research papers
Demo papers
Work-in-progress papers
Visionary papers (white papers)

Authors should clearly indicate in their abstracts the kinds of submissions that the papers belong to, to help reviewers better understand their contributions. Submissions must be in PDF, written in English, no more than 9 pages long — shorter papers are welcome — and formatted according to the standard double-column ACM Proceedings Style (Tighter Alternate style).

For accepted papers, at least one author must attend the workshop to present the work. Accepted papers will be included in the ACM SIGKDD 2013 Digital Proceedings, as well as made available in the ACM Digital Library.

For paper submission, proceed to the IDEA 2013 submission website.