Version 1 Q and A:
How did you use it?
How did it work?
The bookmark button that the user clicked was JavaScript code that walked through all the iframes recursively and collected all the HTML currently on the page (XML requests can change it after the page loads). The browser then sent that HTML (via an XML request / Ajax) to a Sale.com server. The server applied auto-generated regular expressions that cut out the description, picture, and price and constructed a Sale.com product. The product was then inserted into a database and linked to the person's favorites list.
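The server-side step could look roughly like the sketch below. It is only an illustration, not the original Sale.com code: the site name, the three patterns, and the names SITE_PATTERNS, Product, and extract_product are all made up for the example, and the real patterns were generated automatically rather than written by hand.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical hand-written patterns for one made-up site.
# In the real system these were auto-generated per site.
SITE_PATTERNS = {
    "example-shop.com": {
        "description": re.compile(r'<h1 class="title">(.*?)</h1>', re.S),
        "picture": re.compile(r'<img id="main-image" src="(.*?)"'),
        "price": re.compile(r'<span class="price">\$([\d.,]+)</span>'),
    }
}


@dataclass
class Product:
    description: str
    picture: str
    price: str


def extract_product(site: str, html: str) -> Optional[Product]:
    """Cut the description, picture, and price out of a saved page."""
    patterns = SITE_PATTERNS.get(site)
    if patterns is None:
        return None  # no patterns known for this site
    fields = {}
    for name, pattern in patterns.items():
        match = pattern.search(html)
        if match is None:
            return None  # the page does not fit the site's template
        fields[name] = match.group(1).strip()
    return Product(**fields)
```

A product that comes back as None would simply not be saved; inserting the result into the database and linking it to a favorites list is left out here.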
How did you generate regular expressions automatically?
We signed up for affiliate marketing services like LinkShare to gather product information on a fraction of the products for various sites. So for a site like Newegg, we might have products for 5% of the site. We then used Selenium to mimic a user saving a product we already knew about, and used the known values to construct regular expressions. This proved to be tricky with sites like Amazon, whose product template changed slightly for several types of products. I was still working on this part when Sale.com changed direction from tech-based to style-based.
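One simple way to turn a known product into a regular expression is sketched below. This is an assumption about the general idea, not the generator we actually built: the function name build_pattern and the context parameter are hypothetical. It finds the known value in the saved HTML, keeps a little of the markup on each side as an anchor, and puts a capture group where the value was.

```python
import re
from typing import Optional


def build_pattern(html: str, known_value: str, context: int = 20) -> Optional[re.Pattern]:
    """Build a regex that captures whatever sits where known_value sits."""
    start = html.find(known_value)
    if start == -1:
        return None  # the known value is not on this page
    end = start + len(known_value)
    # Use the markup just before and just after the value as anchors.
    prefix = html[max(0, start - context):start]
    suffix = html[end:end + context]
    return re.compile(re.escape(prefix) + r"(.*?)" + re.escape(suffix), re.S)
```

A pattern built this way only works on pages that share the exact surrounding markup, which is the same weakness described above for sites whose template varies between product types. It also needs some markup after the value, since an empty suffix would let the lazy group match nothing.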
How would version 2 have worked?
Once we had the regular expressions generated for each site, crawling the sites with Selenium and bookmarking every product found would have been a one- or two-day task, and that would have completed the product crawler.
How would I do it now that I'm an adult?
I'd probably take out Selenium altogether. It was necessary for an alpha version, but in reality it's not needed and takes up too many resources for an enterprise utility. Instead, I'd parse each file I got back for URLs and recursively visit those URLs, fetching their HTML and crawling that way. I'd filter out .gif files and other media and request only HTML and other valid extensions.
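A minimal sketch of that Selenium-free crawler, using only the Python standard library, might look like this. The names LinkParser, is_crawlable, and crawl, the list of skipped extensions, and the page limit are all assumptions for the example. It also uses a queue instead of literal recursion, which visits the same pages without risking a deep call stack. The fetch argument is any function that takes a URL and returns HTML (or None on failure), so the crawl logic can be tried without touching the network.

```python
import posixpath
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed list of extensions that are not worth requesting.
SKIPPED_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".css", ".js",
                      ".ico", ".pdf", ".zip", ".mp4"}


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def is_crawlable(url, host):
    """Keep same-site http(s) URLs that do not point at media files."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or parts.netloc != host:
        return False
    extension = posixpath.splitext(parts.path)[1].lower()
    return extension not in SKIPPED_EXTENSIONS


def crawl(start_url, fetch, limit=100):
    """Breadth-first crawl of one site; returns {url: html}."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < limit:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # fetch failed; move on
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href).split("#")[0]  # drop fragments
            if link not in seen and is_crawlable(link, host):
                seen.add(link)
                queue.append(link)
    return pages
```

For a real run, fetch could be a small wrapper around urllib.request.urlopen that decodes the response body and returns None on errors. Each page in the result would then go through the same regular-expression step that the bookmark button used.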
Go here for the proof of concept
http://www.arjaywaran.com/proofOfConcept.html
Credit:
Credit for the bookmark-a-product idea goes to Viven Rico
Credit for designing the user interface goes to the CTO, Jon Choi