The Interactive Data Exploration and Analytics (IDEA) workshop addresses the development of data mining techniques that allow users to interactively explore their data. We focus and emphasize on interactivity and effective integration of techniques from data mining, visualization and human-computer interaction (HCI). In other words, we explore how the best of these different but related domains can be combined such that the sum is greater than the parts.
IDEA will be a full-day workshop on Sunday, Aug 11, at ACM SIGKDD 2013 at Sheraton Chicago Hotel and Towers (map), in "Michigan A" room.
Prof. Haesun Park
School of Computational Science & Engineering
Interactive Visual Analytics for High Dimensional Data
Prof. Haesun Park received her B.S. degree in Mathematics from Seoul National University, Seoul Korea, in 1981 with summa cum laude and the University President's Medal for the top graduate, and her M.S. and Ph.D. degrees in Computer Science from Cornell University, Ithaca, NY, in 1985 and 1987, respectively. She has been a professor in the School of Computational Science and Engineering at the Georgia Institute of Technology, Atlanta, Georgia since 2005. Before joining Georgia Tech, she was on faculty at University of Minnesota, Twin Cities, and program director at the National Science Foundation, Arlington, VA. She has published extensively in the areas including numerical algorithms, data analysis, visual analytics, text mining, and parallel computing. She has been the director of the NSF/DHS FODAVA-Lead (Foundations of Data and Visual Analytics) center and executive director of Center for Data Analytics at Georgia Tech. She has served on numerous editorial boards including IEEE Transactions on Pattern Analysis and Machine Intelligence, SIAM Journal on Matrix Analysis and Applications, SIAM Journal on Scientific Computing, and has served as a conference co-chair for SIAM International Conference on Data Mining in 2008 and 2009. In 2013, she was elected as a SIAM Fellow.
Many modern data sets can be represented in high dimensional vector spaces and have benefited from computational methods that utilize advanced techniques from numerical linear algebra and optimization. Visual analytics approaches have contributed greatly to data understanding and analysis due to utilization of both automated algorithms and human’s quick visual perception and interaction. However, visual analytics targeting high dimensional large-scale data has been challenging due to low dimensional screen space with limited pixels to represent data. Among various computational techniques supporting visual analytics, dimension reduction and clustering have played essential roles by reducing the dimension and volume to visually manageable scales.
In this talk, we present some of the key foundational methods for supervised dimension reduction such as linear discriminant analysis (LDA), dimension reduction and clustering/topic discovery by nonnegative matrix factorization (NMF), and visual spatial alignment for effective fusion and comparisons by Orthogonal Procrustes. We demonstrate how these methods can effectively support interactive visual analytic tasks that involve large-scale document and image data sets.
Talks (time allocation: 20, 20, 20, 15, 15)
Prof. Marti Hearst
School of Information
Exploratory Text Analysis and The Middle Distance
Dr. Marti Hearst is a professor in the UC Berkeley School of Information. She received BA, MS, and PhD degrees in Computer Science from UC Berkeley and was a Member of the Research Staff at Xerox PARC from 1994 to 1997.
A primary focus of Dr. Hearst's research is user interfaces for search, and she is the author of the 2009 book Search User Interfaces. She has invented or participated in several well-known search interface projects including the Flamenco project that investigated and the promoted the use of faceted metadata for collection navigation, TileBars query term visualization, BioText search over the bioscience literature, and Scatter/Gather clustering of search results. She has also researched extensively in computational linguistics and text mining with a focus on detecting semantic relations, and text segmentation including discourse boundaries and abbreviation recognition. Her more recent research interests include user interfaces for the exploratory text analysis in the digital humanities and peer learning in MOOCS.
In this talk I will describe a project whose goal is to help scholars and analysts discover patterns and formulate and test hypotheses about the contents of text collections, midway between what humanities scholars call a traditional "close read" and the new "distant read" or "culturomics" approach. To this end, we describe a text analysis and discovery tool called WordSeer that allows for highly flexible "slicing and dicing" (hence "sliding") across a text collection. We illustrate the text sliding capabilities of the tool with two real-world case studies from the humanities and social sciences – the practice of literacy education, and U.S. perceptions of China and Japan over the last 30 years – showing how the tool has enabled scholars with no technical background to make new discoveries in these text collections. (Joint work with Aditi Muralidharan. Sponsored by NEH HK-50011.)
|3:00||Talks (time allocation: 15, 15)
|4:00||Talks (time allocation: 15, 15, 15, 15)
We have entered the era of big data. Massive datasets, surpassing terabytes and petabytes, are now commonplace. They arise in numerous settings in science, government, and enterprises. Today, technology exists by which we can collect and store such massive amounts of information. Yet, making sense of these data remains a fundamental challenge. We lack the means to exploratively analyze databases of this scale. Currently, few technologies allow us to freely "wander" around the data, and make discoveries by following our intuition, or serendipity. While standard data mining aims at finding highly interesting results, it is typically computationally demanding and time consuming, thus may not be well-suited for interactive exploration of large datasets.
Interactive data mining techniques that aptly integrate human intuition, by means of visualization and intuitive human-computer interaction (HCI) techniques, and machine computation support have been shown to help people gain significant insights into a wide range of problems. However, as datasets are being generated in larger volumes, higher velocity, and greater variety, creating effective interactive data mining techniques becomes a much harder task.
Our focus and emphasis is on interactivity and effective integration of techniques from data mining, visualization and human-computer interaction. In other words, we intend to explore how the best of these different but related domains can be combined such that the sum is greater than the parts.
All papers will be peer reviewed, single-blinded. We welcome many kinds of papers, such as (and not limited to):
Authors should clearly indicate in their abstracts the kinds of submissions that the papers belong to, to help reviewers better understand their contributions. Submissions must be in PDF, written in English, no more than 9 pages long — shorter papers are welcome — and formatted according to the standard double-column ACM Proceedings Style (Tighter Alternate style).
For accepted papers, at least one author must attend the workshop to present the work. Accepted papers will be included in the ACM SIGKDD 2013 Digital Proceedings, as well as made available in the ACM Digital Library.
For paper submission, proceed to the IDEA 2013 submission website.