Bibliography | Lin, Tsung-Han: Development of a two-dimensional visual representation for interactive adjustment of classifiers for multivariate data. University of Stuttgart, Faculty of Computer Science, Electrical Engineering, and Information Technology, Diploma Thesis No. 2836 (2009). 74 pages, English.
|
Abstract | Although automatic algorithms perform well quantitatively when processing huge amounts of data in pattern classification tasks, in practice they still cannot make every decision as correctly and efficiently as humans do. Cooperation between humans and computers can be achieved either by involving human intervention in the execution of automatic programs or by supporting human activities with computer programs. Motivated by this, we propose an interactive visualization system with scalable projection strategies and user interactions that actively involves a user in solving pattern classification problems, with a particular focus on massive multivariate (high-dimensional) data. After reviewing related work, we summarize patterns for integrating visualization and classification processes. Instead of regarding visualization merely as pre-processing or post-processing for classification, we treat the two as the same process, differing only in data abstraction. To obtain suitable visual representations, we introduce techniques that project high-dimensional data into a perceivable 2D or 3D space, for example MDS, FDP, FastMap, ISOMAP, LLE, and LSP. We propose a scalable projection strategy that mainly projects data as scatter points. The strategy can be implemented by integrating different distance-preserving techniques, in particular techniques capable of preserving both global and local structure, such as LSP. To avoid high complexity and to simplify projection layouts, we use triangulation techniques in the 2D space. With this projection strategy, a user can visualize data in a 2D space incrementally and hierarchically, so the whole massive data set need not be projected at once.
This scalable projection strategy relies on data clustering that results from user decisions or from computer decisions, that is, automatic algorithms. User decisions are made by searching: a user can query the data set and subsequently obtain different data clusterings and projection layouts. The strategy is therefore very user-oriented and closely related to information retrieval, especially when the data set is a textual corpus. Based on the projection strategy, we also address user interactions that let a user not only explore data visually but also directly rearrange data in visual form; such user intervention can especially improve a supervised classification task. Moreover, we address a design pattern for interactions in which a user can interactively modify data, features (of the vector space model), dissimilarity measurements, and layout algorithms. Our work is applied to patent analysis. To process textual patent documents, we employ basic information retrieval techniques, e.g. VSM, tf-idf, and stemming. In addition, we use WordNet to improve lexical analysis and use extracted concept terms to represent semantic meaning.
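As an illustration of the vector space model and tf-idf weighting mentioned in the abstract, the following is a minimal sketch, not the thesis implementation. The function names, the sample documents, the choice of raw term frequency with a logarithmic inverse document frequency, and the use of cosine dissimilarity are all assumptions made for this example; stemming and WordNet lookup are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens only; stemming and WordNet lookup are omitted.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Raw term frequency times log inverse document frequency (assumed variant).
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine_dissimilarity(u, v):
    """1 - cosine similarity of two sparse vectors; 1.0 if either is all zero."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical miniature "patent" corpus, for illustration only.
docs = [
    "rotor blade for a wind turbine",
    "wind turbine blade with a heated rotor edge",
    "battery charging circuit for an electric vehicle",
]
vecs = tfidf_vectors(docs)
d01 = cosine_dissimilarity(vecs[0], vecs[1])  # two wind turbine documents
d02 = cosine_dissimilarity(vecs[0], vecs[2])  # wind turbine vs. battery document
print(d01 < d02)
```

In this sketch the two wind turbine documents end up closer to each other than to the battery document. A pairwise dissimilarity of this kind is the sort of input a distance-preserving projection expects.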
|
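The idea of a distance-preserving projection into 2D, as named in the abstract, can be sketched as follows. This is a generic stress-minimizing layout computed by plain gradient descent, chosen here for brevity; it is not LSP and not the scalable strategy of the thesis, and all function names, parameters, and sample points are assumptions made for this example.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix of the input points (any dimension)."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def stress(layout, target):
    """Sum of squared differences between layout and target distances."""
    n = len(layout)
    return sum(
        (math.dist(layout[i], layout[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def project_2d(target, iterations=500, seed=0):
    """Gradient descent on raw stress, starting from a random 2D layout."""
    rng = random.Random(seed)
    n = len(target)
    layout = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    # Step size shrinks with the number of points, since each point
    # accumulates n - 1 pair terms in its gradient.
    step = 1.0 / (2 * n)
    for _ in range(iterations):
        grads = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = layout[i][0] - layout[j][0]
                dy = layout[i][1] - layout[j][1]
                dist = math.hypot(dx, dy) or 1e-9  # guard against division by zero
                # Positive when the pair is farther apart in 2D than in the target.
                coeff = 2.0 * (dist - target[i][j]) / dist
                grads[i][0] += coeff * dx
                grads[i][1] += coeff * dy
        for i in range(n):
            layout[i][0] -= step * grads[i][0]
            layout[i][1] -= step * grads[i][1]
    return layout


# Hypothetical 4D data: two well-separated groups of points.
points_4d = [
    (0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0), (5, 5, 5, 5), (6, 5, 5, 5),
]
target = pairwise_distances(points_4d)
layout = project_2d(target)
print(stress(layout, target) < stress(project_2d(target, iterations=0), target))
```

The nested loop costs quadratic time per iteration, which hints at why the abstract stresses projecting only part of a massive data set at a time rather than the whole set at once.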