A picture is worth a thousand words
Learn about image recognition and its value for our marketplaces
What is image recognition?
Until recently, computers could at best see our world. Advances in artificial intelligence have made it possible for machines to both see and understand it. Images are data, and that data is processed by algorithms trained to recognize patterns in it. Different models can return different results depending on the desired use case.
Why use it in our marketplaces?
Marketplaces are about matchmaking (a.k.a. liquidity); that is, bringing buyers and sellers together and facilitating a transaction between them. While we are good at this, we can improve! Image recognition can help us in several ways, for example by improving the quality of our data and the user experience. We have selected some of the most promising use cases and present them below.
While companies like Google, Amazon, eBay and (in particular) Pinterest have been working on their own solutions for image recognition, and all of them have lately scaled up their investments in the technology, Schibsted has not been standing still. Some features are already live and more are in the pipeline. Stay tuned!
Visual search
IN PRODUCTION (BETA) for mobile browsers. On iPhone, only Safari is supported.
Our culture is already dominated by visual stimuli. Half of the human brain is (directly or indirectly) devoted to the processing of visual information (MIT, 1996). It seems only natural that search and discovery starts with an image.
Visual search enables quicker searches and more accurate results, making for a better user experience in the search and discovery phases. This drives better matchmaking, which is the fuel of our marketplaces' success and one of Schibsted's strategic pillars. More relevant use cases are presented below.
Category suggestions
IN PRODUCTION for all categories on Blocket.
Sometimes sellers on our marketplaces place their ad in the first category that comes to mind, and that category is not always the correct one. While one can always improve the category taxonomy, the problem can also be addressed with smart suggestions based on image recognition.
The category suggestion service of the Cognition team helps people select the right category for their ads during classified ad submission. The user snaps a picture and the service suggests which categories the ad could belong to. This removes friction, shortens ad insertion time, and ensures that ads are categorized correctly so they are easy to find.
Recommend visually similar ads
IN PRODUCTION for all categories on FINN Torget.
Image recognition can capture far more facets of an image than a human is able to articulate in a text search, or than collaborative filtering (a common recommendation method) is able to account for. Recommendations based on visual similarity can therefore be more relevant, which makes it easier for a buyer to find exactly what she is looking for.
Another useful distinction between recommendations based on collaborative filtering versus visual similarity is that while the former tends to favor recently published ads (because of users' behavior), the latter is often indifferent to the age of the ads (as similarity trumps age). This means recommendations based on visual similarity can give new life to older ads and hopefully provide them with a new owner.
Meta data enrichment
PROTOTYPE and winner of the jury prize at FINN Hackdays May 2019.
From FINN Insights we know that some sellers «forget» to include significant information about their object in the ad. Meanwhile, some buyers find it hard to find exactly what they are looking for even though it might exist on the marketplace.
We cut humans out of the equation and jump straight to automation. Image recognition allows us to populate an ad with additional meta data, which makes it more «searchable» and consequently easier to find.
That way we boost liquidity - and both seller and buyer are happy!
What's more, it could even help estimate an object's value and autogenerate relevant alt text for images, which is great for accessibility and SEO.
Cropped visual search
CONCEPT inspired by Pinterest and subject to further exploration.
Much like Google, our platforms started out as places to search for and find stuff. We have learned that users might need help, and we have put significant resources into discovery features like personalized feeds and similar results.
Over the past few years, though, we have seen a rise of inspirational platforms: Instagram and Pinterest are incredibly good at monetizing the search for inspiration. So why are we not doing the same?
We know that a large number of marketplace users browse the real-estate vertical for inspiration, or just to dream. Why not help them find the things they like on our generalist marketplace or at Prisjakt?
An engineer on the Data Intelligence team at FINN explains...
The algorithm is designed to extract as much information as possible from each image. Leveraging FINN.no's large database of classified ads, the model generates attributes such as category, sub-category, object color and a title. Although natural language generation is noisy, we realized that generating realistic titles was important, since categories (like "kjole", Norwegian for dress) are too general.
Specifically, the algorithm is a deep neural network that first "encodes" the image pixels, followed by multiple "decoders" that each predict one attribute. The encoder is a deep convolutional network (ResNet-152), pre-trained on the ImageNet dataset and further trained on FINN.no data. The decoders are fully connected layers for the attributes, and a recurrent neural network with pre-trained Word2Vec word embeddings for title generation. After training, we utilize the feature layer between the encoder and the decoders. This is a very high-dimensional space in which we can calculate distances between all images. The images closest to the image the user took are our recommendations. For more details, the work was published at the RecSys 2018 conference.
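As a rough illustration of that architecture, here is a minimal sketch in plain NumPy. Everything in it is a stand-in: a random linear map plays the role of the ResNet-152 encoder, and two tiny softmax heads replace the real decoders (category, sub-category, color, title). The point is only the shape of the computation: one shared feature layer feeding several decoder heads, with that feature layer being what we keep for similarity.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Stand-ins for the real components: a random linear map instead of a
# ResNet-152 encoder, and two heads instead of the full set of decoders.
W_enc = rng.normal(size=(64, 16))   # "encoder": pixels -> feature layer
W_cat = rng.normal(size=(16, 5))    # decoder head: 5 toy categories
W_col = rng.normal(size=(16, 3))    # decoder head: 3 toy colors

pixels = rng.normal(size=64)              # a flattened toy "image"
features = np.tanh(pixels @ W_enc)        # shared feature layer
category_probs = softmax(features @ W_cat)
color_probs = softmax(features @ W_col)
print(category_probs.argmax(), color_probs.argmax())
```

In the real model each head is trained against labelled ads, but the shared `features` vector in the middle is what powers the similarity search described below.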
Using this algorithm for recommendations is actually just a by-product of the model! We never actually use the predicted category, color or generated title. However, there are many opportunities to use these elsewhere, e.g. pre-population of ads in ad insertion (see «meta data enrichment»).
When you do a visual search, your picture is first vectorized into a high-dimensional space. We then compare your picture's vector to the vector of the main image in every other ad and present you with the most visually similar ads. That's at least the simple version.
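The ranking step can be sketched in a few lines of NumPy. The vectors here are random toy embeddings standing in for real image vectors, and `nearest_ads` is a hypothetical helper name; the principle, distance in the shared feature space, is the one described above.

```python
import numpy as np

def nearest_ads(query_vec, ad_vecs, k=3):
    """Indices of the k ads whose main-image vectors lie closest
    (Euclidean distance) to the query picture's vector."""
    dists = np.linalg.norm(ad_vecs - query_vec, axis=1)
    return np.argsort(dists)[:k]

rng = np.random.default_rng(0)
ad_vecs = rng.normal(size=(100, 8))   # toy vectors for 100 ads' main images
query = ad_vecs[42] + 0.01            # a picture nearly identical to ad 42
print(nearest_ads(query, ad_vecs))    # ad 42 ranks first
```

In production, a brute-force scan like this would be replaced by an approximate nearest-neighbor index, but the distance computation is the same idea.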
Now you might wonder: what is a neural network?
This is the simplest explanation of a basic neural network that we could find:
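One of the smallest runnable examples is a single artificial neuron: a weighted sum passed through a non-linearity, with the weights nudged repeatedly to reduce the prediction error. The sketch below (a toy, not FINN's model) trains one neuron to learn the logical AND function; every network in this article is built from this same ingredient, just stacked in many layers.

```python
import numpy as np

# Inputs and targets for logical AND: output 1 only for (1, 1).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 0.0, 0.0, 1.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w, b = rng.normal(size=2), 0.0        # the neuron's weights and bias

for _ in range(10000):                # gradient descent on squared error
    pred = sigmoid(X @ w + b)
    err = pred - y
    grad = err * pred * (1 - pred)    # chain rule through the sigmoid
    w -= 1.0 * (X.T @ grad)
    b -= 1.0 * grad.sum()

print(np.round(sigmoid(X @ w + b)))   # -> [0. 0. 0. 1.]
```

Deep networks like the ResNet-152 encoder above are, conceptually, millions of such neurons arranged in layers and trained the same way.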