Discussion Notes SQ 2009 Week 6

Paper 1:

Marchionini, G. 2006. Exploratory search: from finding to understanding. Commun. ACM 49, 4 (Apr. 2006), 41-46.

Contributions:

It makes three important contributions.

- The first contribution is the identification and the definition of the activities involved in exploratory search, namely learning and investigating.

- The second contribution of the paper is that it describes some of the requirements for a tool to support exploratory search.

- The third contribution of the paper is to propose joint work between HCI and Information Retrieval, and to show two examples of systems that offer support for exploratory search.

Discussion:

- The author emphasizes that users are not always looking for a precise piece of information on the Internet, as they typically are when they query a database. Instead, users sometimes do not know exactly what they are looking for. In some cases, users are engaged in a learning activity: they iterate over the search process, evaluating the results presented to them to understand what is available and using this information to refine their search. In other cases, users are investigating: they also iterate over the search process, but over a longer period of time, and they evaluate and compare results more exhaustively to decide whether one or more of them should be integrated into their knowledge or work.

- A tool that properly supports exploratory search needs to give users a good overview of the results to guide their exploration. The tool should be highly interactive, so that the tool and the user collaboratively steer the continuous exploration and refinement of searches. The user should feel in control of the search process.

- We discussed what search is and whether we are constantly searching for information in our daily lives. For example, can looking for milk in the refrigerator be considered an information search activity?

- We discussed the query-by-example technique. It has been used for image search: the user gives a picture as an example, and the search engine looks for pictures with similar characteristics.

- Similarities between exploratory search and code search:

o Lookup: when developers look for source code to remember code structures or syntax rules

o Learning: when developers are looking for a reference example

o Investigate: when developers are looking for software to reuse
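The query-by-example technique mentioned in the discussion can be sketched as a nearest-neighbour lookup over feature vectors. This is only an illustration: the file names, the three-number feature vectors, and the function names are hypothetical, and a real image search engine would extract far richer features.

```python
import math

def euclidean(u, v):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def query_by_example(example, collection, k=2):
    """Return the names of the k items whose features are closest to the example."""
    ranked = sorted(collection, key=lambda name: euclidean(example, collection[name]))
    return ranked[:k]

# Hypothetical collection: each picture is reduced to (red, green, blue) averages.
IMAGES = {
    "sunset.jpg": (0.9, 0.5, 0.2),
    "beach.jpg": (0.8, 0.7, 0.5),
    "forest.jpg": (0.1, 0.7, 0.2),
    "ocean.jpg": (0.1, 0.3, 0.9),
}

# The user supplies an example picture instead of keywords.
example_features = (0.85, 0.55, 0.25)
matches = query_by_example(example_features, IMAGES)
print(matches)  # ['sunset.jpg', 'beach.jpg']
```

The same shape would apply to code search if a source file could be reduced to a feature vector, which is exactly the open question of how to define those features.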

Paper 2:

Sahami, M. et al., "The Happy Searcher: Challenges in Web Information Retrieval", Proc. of the 8th Pacific Rim Intl. Conf. on Artificial Intelligence, 2004.

Contributions:

- Identifies issues in web information retrieval and proposes approaches to tackle them, such as web graph analysis, statistical methods for inferring meaning in text, and the retrieval and analysis of newsgroup postings, images, and sounds.

Discussion:

Information Retrieval on the Web

- Issue: Identify pages of high quality and relevance to a user’s query

- Current approach: PageRank, HITS algorithm

- Suggested Approach: use terms from anchors and surrounding text
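As a rough illustration of the link-analysis idea behind PageRank, the sketch below runs a simplified power iteration on a toy link graph. The graph, the damping factor of 0.85, the iteration count, and the function name are assumptions made for the example; it is not the production algorithm and does not cover HITS or anchor text.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical four-page web: "c" is linked to by every other page.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, "c" ends up ranked highest because every other page links to it, and "d" ends up lowest because nothing links to it.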

Adversarial Classification: Dealing with spam on the Web

- Issue: Use of spam to increase the ranking of web pages. One method is to include invisible keywords on the web page so that it matches many different queries, related or unrelated.

- Suggested Approach: use AI techniques such as Natural Language Processing and Machine Learning

Evaluating Search Results

- Current approach: interleaving the results of two different ranking schemes and using statistical tests based on the results users clicked on to determine which ranking scheme is “better”

- Suggested Approach: automated means for large-scale evaluation of ranking results
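The interleaving idea described above can be sketched as follows. This is a deliberately simplified alternating scheme, not necessarily the exact method the paper refers to: the function names, document ids, and clicks are hypothetical, and ranking A always picks first here, whereas a real experiment would randomize the starting side.

```python
def interleave(ranking_a, ranking_b):
    """Alternate between two rankings, skipping documents already shown.

    Returns the merged list and a dict recording which ranking
    contributed each document.
    """
    merged, credit = [], {}
    queues = {"A": list(ranking_a), "B": list(ranking_b)}
    turn = "A"
    while queues["A"] or queues["B"]:
        queue = queues[turn]
        # Drop documents that the other ranking has already contributed.
        while queue and queue[0] in credit:
            queue.pop(0)
        if queue:
            doc = queue.pop(0)
            merged.append(doc)
            credit[doc] = turn
        turn = "B" if turn == "A" else "A"
    return merged, credit

def count_click_credit(clicked_docs, credit):
    """Count how many clicked documents each ranking contributed."""
    wins = {"A": 0, "B": 0}
    for doc in clicked_docs:
        if doc in credit:
            wins[credit[doc]] += 1
    return wins

merged, credit = interleave(["d1", "d2", "d3"], ["d2", "d4", "d1"])
print(merged)                                    # ['d1', 'd2', 'd3', 'd4']
print(count_click_credit(["d2", "d4"], credit))  # {'A': 0, 'B': 2}
```

Click counts from a single query say little on their own; the statistical test would be applied to such counts aggregated over many queries and users.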

Using the Web to Create “Kernels” of Meaning

- Issue: determining the relatedness of fragments of text, even when the fragments may contain few or no terms in common.

- Suggested Approach: future research could help identify more effective text expansion algorithms that are particularly well suited to certain tasks. Also, various methods such as statistical dispersion measures or clustering could be used to identify poor expansions and cases where a text fragment may have an expansion that encompasses multiple meanings.
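A minimal sketch of why expansion helps: two fragments with no terms in common score zero under a direct comparison, but can score well once each is replaced by related text. The expansions below are hand-written stand-ins for what a search engine might return, and the raw term counts and cosine measure are simplifying assumptions; the paper's actual weighting may differ.

```python
import math
from collections import Counter

def term_vector(texts):
    """Bag-of-words term counts over a list of text snippets."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts

def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(u[term] * v[term] for term in u if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical expansions standing in for retrieved search results.
EXPANSIONS = {
    "AI": ["artificial intelligence research", "machine learning intelligence systems"],
    "artificial intelligence": ["artificial intelligence research", "intelligent machines learning"],
    "pizza": ["italian food cheese tomato"],
}

def expanded_similarity(a, b, expansions=EXPANSIONS):
    """Compare two fragments through their expansions instead of their own words."""
    return cosine(term_vector(expansions[a]), term_vector(expansions[b]))

direct = cosine(term_vector(["AI"]), term_vector(["artificial intelligence"]))
related = expanded_similarity("AI", "artificial intelligence")
unrelated = expanded_similarity("AI", "pizza")
print(direct, round(related, 2), unrelated)
```

With these stand-in expansions, the direct comparison of "AI" and "artificial intelligence" is zero, the expanded comparison is clearly positive, and "AI" versus "pizza" stays at zero.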

Retrieval of UseNet Articles

- UseNet existed before the web. It allows users to subscribe to newsgroups and read and post articles. Google Groups is an example of a web interface to UseNet.

- Challenge: improving ranking methods for UseNet or bulletin board postings

Images and Sound

- Applied to source code, content detection depends on how you define the features in the source code.