Discussion Notes SQ 2009 Week 3

Paper:

Hoffmann, R., Fogarty, J., and Weld, D. S. 2007.

Assieme: finding and leveraging implicit references in a web search interface for programmers.

In Proceedings of the 20th Annual ACM Symposium

on User Interface Software and Technology

(Newport, Rhode Island, USA, October 07 - 10, 2007).

UIST '07. ACM, New York, NY, 13-22.

Discussion:

We discussed the results shown on the search results page.

- The page shows "57 examples", but it is unclear where they come from: the web or Javadoc.

- They stated that example relevance is used for ranking, but the results did not appear to be sorted.

- They used Javadoc to associate source code with the repository.

Implementation part:

- The paper was published in a UI venue rather than an algorithms venue.

- Searching the MSN search engine logs for the word "Java" might not give good results.

- They used MSN sessions to analyze what users actually search for. Sourcerer also keeps records of search sessions.

- Using logs to analyze data, e.g., Scott Klemmer's paper.

- Open source repositories do not contain all the source code in the world; there are also snippets, examples, tutorials, etc.

- "Scanner-less parsing", Software Engineering Radio podcast, by Eelco Visser.

It does not first break the document into words/tokens, yet it can search inside documents that combine natural language and code.

- To distinguish Java from C++, they used the Eclipse parser, including its compilation unit.
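
A minimal sketch of the log-analysis point above, in Python. The log entries, `SAMPLE_LOG`, and the `queries_mentioning` helper are all hypothetical; the real MSN logs and the paper's actual filtering are not reproduced here. It only illustrates one possible reason (an assumption, not something stated in the discussion) why matching the word "Java" could give poor results: the keyword also appears in queries unrelated to programming.

```python
import re

# Hypothetical log entries as (session_id, query); not real MSN data.
SAMPLE_LOG = [
    ("s1", "java arraylist sort example"),
    ("s1", "java.util.Collections sort"),
    ("s2", "java island travel"),
    ("s3", "javascript date format"),
    ("s4", "best coffee java beans"),
]


def queries_mentioning(word, log):
    """Return the log entries whose query contains `word` as a whole token."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [(sid, query) for sid, query in log if pattern.search(query)]


hits = queries_mentioning("java", SAMPLE_LOG)
for sid, query in hits:
    print(sid, query)
```

In this made-up sample, four queries match, but only the two from session s1 are about programming, so keyword matching alone would likely need a further filtering step.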

Misc:

- There are no interesting words in source code.

Comments are very noisy, but identifiers are better.

Why use identifiers, not comments?

- Identifiers use a constrained vocabulary; comments use loose wording.

- Comments can be anything, e.g., "Judy wrote this", "Fixed", "Problem from meeting1", "section 29".

- Words in comments are usually synonyms.
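
The contrast between identifiers and comments can be sketched roughly as follows. This is an illustration under stated assumptions, not the method of the paper or of any tool mentioned in these notes: the Java snippet is made up, comments are stripped with a crude regular expression rather than a real parser (it would misfire on `//` inside string literals), and `JAVA_KEYWORDS` is a tiny hypothetical subset.

```python
import re

# Hypothetical Java snippet used only for illustration.
JAVA_SOURCE = '''
// Judy wrote this. Fixed problem from meeting1.
public class ReportWriter {
    /* see section 29 */
    public void writePdfReport(String fileName) { }
}
'''

# Crude comment matcher: line comments and block comments.
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)
JAVA_KEYWORDS = {"public", "class", "void"}  # tiny subset, enough for this snippet


def split_identifier(name):
    """Split a camelCase or PascalCase name into lowercase words."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return [p.lower() for p in parts]


def comment_terms(source):
    """Collect lowercase words that appear inside comments."""
    words = []
    for comment in COMMENT_RE.findall(source):
        words += re.findall(r"[a-z0-9]+", comment.lower())
    return words


def identifier_terms(source):
    """Collect words from identifiers after removing comments and keywords."""
    code = COMMENT_RE.sub(" ", source)
    terms = []
    for name in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code):
        if name not in JAVA_KEYWORDS:
            terms += split_identifier(name)
    return terms


print("identifiers:", identifier_terms(JAVA_SOURCE))
print("comments:   ", comment_terms(JAVA_SOURCE))
```

On this snippet the identifier terms stay close to what the code does (report, writer, pdf, file), while the comment terms are mostly the kind of noise listed above.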

If not ...

- For example, when searching for "output acrobat",

there is no Java name (package, class, method) containing "output acrobat", but the phrase does appear in comments.

- If users see that their search terms appear in a result,

they trust that result more than one in which none of the search terms appear.

Developers can also use this to justify why the result is there.

- Google Code Search is not better than Google at finding code.

Most users still prefer to use Google to search for source code.

Moreover, Google uses synonyms while searching to obtain more possible matches.
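
The earlier points about the "output acrobat" query and about showing users which search terms matched can be sketched together. Everything here is hypothetical: the `RESULTS` entries, their `name` and `comment` fields, and the `explain_matches` helper are invented for illustration and do not describe how any of the tools discussed actually work.

```python
import re

# Hypothetical indexed entries; names and comments are kept as separate fields.
RESULTS = [
    {"name": "PdfWriter.write", "comment": "Output the document to Acrobat format"},
    {"name": "OutputStream.flush", "comment": "Flushes the stream"},
    {"name": "ImageLoader.load", "comment": "Reads an image from disk"},
]


def tokens(text):
    """Lowercase word set, splitting camelCase boundaries first."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    return set(re.findall(r"[a-z0-9]+", spaced.lower()))


def explain_matches(query, results):
    """For each matching result, list which query terms matched in which field."""
    query_terms = tokens(query)
    explained = []
    for result in results:
        in_name = sorted(query_terms & tokens(result["name"]))
        in_comment = sorted(query_terms & tokens(result["comment"]))
        if in_name or in_comment:
            explained.append((result["name"], in_name, in_comment))
    return explained


for name, in_name, in_comment in explain_matches("output acrobat", RESULTS):
    print(f"{name}: name matches {in_name}, comment matches {in_comment}")
```

In this made-up data, no name contains both query terms; the only entry matching both does so through its comment, and printing the matched terms per field gives the user a visible reason for each result.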