Prologue

As a result of my latest research in Information Visualization and Internet Information Retrieval, I designed Cocovas, an infovis system that shows the similarity and the relevance of a web search result set.

This page contains information, videos, screenshots and other material related to the Cocovas system.

— Ernesto, 2007.

Cocovas

Cocovas is an Information Visualization system for Internet IR systems that uses a novel Visual Metaphor combining the representation of the similarity and the relevance of the results.

A snapshot of Cocovas.

This system was developed for my Master's Thesis at the University of Buenos Aires (UBA). You can download my MS Thesis (in Spanish):

I developed a functional prototype in HTML + SVG that implements most of the design ideas. An offline demo [zip - 126k] is available; it was tested on IE6 with Adobe SVG Viewer 3.0. The source code is available upon request.

You can also download the video [swf - 5.06 MB] and the slides [ppt - 4.47 MB] I used in my MS Thesis defense in December, 2006.

Master's Thesis Abstract

Information retrieval is, structurally, a vague process. The users of an IR system often lack a plan for acquiring the information they need. Besides, the process may involve making several queries, acquiring contextual knowledge about the information domain and even reconsidering the goal itself before obtaining the desired documents. Moreover, the objectives vary widely in specificity: from finding a specialist doctor, to following news about the competition, to doing exhaustive research on a subject.

Regarding Internet IR systems, the opportunities to improve the retrieval mechanisms are promising, making it possible to support several tasks of the information acquisition process. Including Information Visualization techniques makes it possible to explore information more effectively and to gain novel knowledge and insight into the data.

As a proposal, we designed Cocovas: an Information Visualization system for Internet IR systems that uses a novel Visual Metaphor unifying in a single view the representation of the similarity and the relevance of document results.

Keywords: Information Visualization, Document Visualization, Multi-dimensional Visualizations, infovis, User Interfaces, Information Retrieval.

In the beginning

In February 2002, my friend Sebastián Wain told me about the Google Programming Contest and we decided to enter. We did not have much time, so we first ruled out IR, caching and performance improvements and focused on UI design and Information Visualization.

In the end we did not meet the deadline, and all we had was this drawing...

Cocovas in the beginning.

Implementation notes

Using the web browser as the presentation layer turned out to be very useful. The HTML + SVG + DOM manipulation team is great for prototyping.

On the server side we used GoogleAPI (now Google SOAP Search API), with PyGoogle as a wrapper, to run Google searches and retrieve pages from the Google cache. Then we performed the hierarchical clustering with CLUTO.

Finally, we tied everything together with a set of Perl and Python scripts.

Visual Metaphor

Putting the similarity between documents and the ranking function together in the same view is a great design challenge. Some projects, like WEBSOM, LightHouse and KartOO, had faced this problem before.

The main goal is to find a position for each point and then cluster them. We tried several dimension reduction techniques and graph drawing methods, such as force-directed and spring-embedder layouts, without the desired result. At that point we decided to perform the clustering first and then draw the clustering result. That was a great decision.

A dendrogram is a diagram showing the relationships of items arranged like the branches of a tree. It is common to use it to show the process of a hierarchical clustering method.

A dendrogram example.

The height of each union represents the distance between the two objects being connected. Now we can map these heights to a circle like this:

Cocovas Visual Metaphor: Dendrogram to circle mapping.

The angle difference between a pair of points (e.g., A and B) is the height of the first cluster that contains both of them (e.g., 0.3); the angle difference between E and A is always 1. (All these values are normalized to the circumference.) This mapping can also use the difference of the clusters' diameters, the linkage effort or other distance functions.
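
The mapping can be sketched in a few lines of Python. This is only an illustration, not the prototype's code: the nested-tuple tree format, the function names and every height except the 0.3 of {A, B} are assumptions made up for the example. It also assumes that the rule applies to points that are consecutive in the dendrogram's leaf order, and that the gap closing the circle (from E back to A) is fixed at 1.

```python
import math

# Hypothetical dendrogram: a leaf is a string, an inner node is
# (height, left, right). Only the 0.3 of {A, B} comes from the text above;
# the other heights are invented for this example.
tree = (1.0, (0.3, "A", "B"), (0.6, "C", (0.4, "D", "E")))


def flatten(node):
    """Return (leaves, gaps) in dendrogram order.

    gaps[i] is the height of the first cluster that contains both
    leaves[i] and leaves[i + 1].
    """
    if isinstance(node, str):
        return [node], []
    height, left, right = node
    left_leaves, left_gaps = flatten(left)
    right_leaves, right_gaps = flatten(right)
    return left_leaves + right_leaves, left_gaps + [height] + right_gaps


def circle_angles(node, wrap_gap=1.0):
    """Map each leaf to an angle in radians, normalizing all gaps to 2*pi."""
    leaves, gaps = flatten(node)
    total = sum(gaps) + wrap_gap  # wrap_gap closes the circle (last -> first)
    angles = {}
    accumulated = 0.0
    for i, leaf in enumerate(leaves):
        angles[leaf] = 2 * math.pi * accumulated / total
        if i < len(gaps):
            accumulated += gaps[i]
    return angles


angles = circle_angles(tree)
for leaf, angle in angles.items():
    print(leaf, round(math.degrees(angle), 1))
```

With these hypothetical heights the gaps are 0.3, 1.0, 0.6 and 0.4, plus the closing gap of 1, so A and B end up close together while B and C are separated by a full root-height gap.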

Now we add the radius. In this example we use the lexicographic rank:

Cocovas Visual Metaphor: Adding the radius.

Hierarchical clustering techniques let us choose how many clusters to show. Here we use {A, B} and {C, D, E}. Finally, we add closed regions, colors and labels to the sets:

Cocovas Visual Metaphor: Closed regions, colors and labels.
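
Choosing how many clusters to show amounts to cutting the dendrogram. A minimal sketch follows, assuming a hypothetical nested-tuple tree (height, left, right) with made-up heights. The name `cut` is illustrative, and always splitting the tallest remaining cluster is one common way to cut a dendrogram, not necessarily what the prototype does.

```python
# Hypothetical dendrogram: a leaf is a string, an inner node is
# (height, left, right). The heights are invented for this example.
tree = (1.0, (0.3, "A", "B"), (0.6, "C", (0.4, "D", "E")))


def leaves(node):
    """List the leaves under a node, in dendrogram order."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def height(node):
    """Merge height of a node; a single leaf counts as height 0."""
    return 0.0 if isinstance(node, str) else node[0]


def cut(node, k):
    """Split the tallest cluster repeatedly until k clusters remain."""
    clusters = [node]
    while len(clusters) < k:
        i = max(range(len(clusters)), key=lambda j: height(clusters[j]))
        if isinstance(clusters[i], str):
            break  # only single leaves are left, nothing more to split
        _, left, right = clusters[i]
        clusters[i:i + 1] = [left, right]  # keep dendrogram order
    return [leaves(c) for c in clusters]


print(cut(tree, 2))  # the two sets used in the figures
print(cut(tree, 3))
```

Raising or lowering k one step at a time gives the kind of split & join behaviour that a slider could drive.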

To give each cluster a different color, each closed region's hue is the mean of its cluster's angles on the color circle.
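
A minimal sketch of this rule, using Python's standard colorsys module. The saturation and value defaults and the function names are assumptions for illustration; the text above only fixes the hue. Angles are in radians, and the plain arithmetic mean assumes that the cluster's points form one arc that does not cross the 0 angle.

```python
import colorsys
import math


def cluster_hue(angles):
    """Mean of the cluster's angles (radians), rescaled to a hue in [0, 1)."""
    mean_angle = sum(angles) / len(angles)
    return (mean_angle / (2 * math.pi)) % 1.0


def cluster_color(angles, saturation=0.5, value=0.9):
    """Return a '#rrggbb' fill color for the cluster's closed region.

    saturation and value are hypothetical defaults chosen for this sketch.
    """
    r, g, b = colorsys.hsv_to_rgb(cluster_hue(angles), saturation, value)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )


# Two hypothetical clusters occupying different arcs of the circle.
print(cluster_color([0.0, 0.6]))
print(cluster_color([2.5, 3.6, 4.4]))
```

Because clusters occupy different arcs, their mean angles differ, and so do their hues.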

Information Visualization Pipeline Summary

In this section we describe the most important elements of the Cocovas visualization system, following the Data State Reference Model (Ed H. Chi, 2000).

D: Value
We used GoogleAPI (now Google SOAP Search API), with PyGoogle as a wrapper, to get a result set of 50 documents.
T: Data Transformation
We performed a hierarchical clustering with CLUTO.
D: Analytical abstraction
We obtained the cluster tree, with the height and descriptive keywords of every node and leaf.
T: Visualization Transformation
We transformed the hierarchical clustering dendrogram into the Cocovas Visual Metaphor.
D: Visualization abstraction
The colored points represent the documents. Their radii are related to the ranking.
The closed regions are the clusters. The angle difference between regions is related to the cluster height.
T: Visual Mapping Transformation
We had to add extra points to make the convex hull polygons and to solve some other presentation problems.
D: View
The view follows the master-detail design style. In this case there are two detail regions: the result's details and the cluster's details. The Cocovas area has a slider control to split & join clusters in a direct-manipulation interaction style.

D: Data stage
T: Transformation stage
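
The convex hull polygons mentioned in the Visual Mapping Transformation step can be sketched as follows. The sketch uses Andrew's monotone chain algorithm, chosen here for the example and not necessarily what the prototype's scripts do; the function names and the sample coordinates are hypothetical.

```python
import math


def polar_to_xy(angle, radius):
    """Convert a document's (angle, radius) on the circle to x, y."""
    return (radius * math.cos(angle), radius * math.sin(angle))


def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts  # degenerate: a point or a segment, not a polygon
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


# Hypothetical cluster of four documents given as (angle, radius).
cluster = [(0.0, 1.0), (0.5, 2.0), (1.0, 1.0), (0.5, 1.5)]
points = [polar_to_xy(a, r) for a, r in cluster]
hull = convex_hull(points)
print(hull)
```

A cluster with only one or two documents has no polygon as its hull, which is presumably one reason extra points were needed.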

Interactions

The interactions are symmetric between the master and detail areas. They are divided into two groups:

  • On documents, you can focus, show & hide the document's details, and go to its URL.
  • On clusters, you can focus, show & hide the cluster's details, show & hide clusters, and split & join clusters.

You can also refine your query based on the descriptive keywords of a cluster or a document.

Discussion

To build this diagram we did not use all the pairwise distances, only those that appear in the dendrogram. The diagram does not show the similarity between points, but it does show the similarity between sets. This relaxed constraint system has far fewer conditions to satisfy (a binary dendrogram over n documents has only n - 1 merge heights, against n(n - 1)/2 pairwise distances), which lets us find a feasible position for every point and produce nice drawings.

People who saw the prototype focused primarily on element membership and said nothing about the similarity between sets. They commented much more on the ability to discard results from the list. We noticed that and steered the visualization design in that direction. We reinforced the perception of membership by adding closed regions, colors and labels to the sets, and we included more operations on clusters, such as splitting and hiding.

Screenshots

Cocovas big picture.

Focus on a document.

Focus on a cluster.

Hide a cluster.

Split a cluster.

Future Work

We have noted that the Visual Metaphor can be used for purposes other than showing web search results. We think it can be used to show other structured collections, such as programming language documentation, a recipe book or legal documents.

In addition, Cocovas could be used without the two detail areas, which would make it easy to turn into a desktop gadget or a mobile application. However, the interaction set would have to be redesigned.

Cocovas on the web

Cocovas Search Visualization

A novel information visualization system of online search results that represents both the similarity between documents & their relevance (ranking) relative to the search query. The color points represent the resulting documents. The closed regions surrounding the point are similarity-based clusters. The angle difference between regions are related to cluster height. The application was implemented based on GoogleAPI (now Google SOAP Search API).

Information Aesthetics, May 2008.

Cocovas - Multi-Domain Representation

Ernesto Mislej is a Teacher Assistant of Information Visualization at the CS Department of University of Buenos Aires, Argentina. For his Master's thesis he developed Cocovas: an Information Visualization system for Internet Information Retrieval (IR) systems. Cocovas uses a novel Visual Metaphor unifying in a single view the representation of the similarity and the relevance of document search results.

Visual Complexity, 2007.

Lic. Ernesto M. Mislej [vCard]
Av. Olazabal 4422 3A
CA Buenos Aires (C1431CGN)
Argentina
(+54-11) 4175-5892
emislej [at] gmail [dot] com
