A picture is worth a thousand words

Learn about image recognition and its value for our marketplaces

by Schibsted Tech Experiments and FINN Data Intelligence

What is image recognition?

Until recently, computers could at best see our world. Advances in artificial intelligence have made it possible for machines to both see and understand it. Images are data, and that data is processed by algorithms trained to recognize patterns in it. Different models return different results depending on the use case.

Why use it in our marketplaces?

Marketplaces are about matchmaking (a.k.a. liquidity); that is, bringing buyers and sellers together and facilitating a transaction between them. While we are good at this, we can improve! Image recognition can help us in several ways, for example by improving the quality of our data and the user experience. We have selected some of the most promising use cases and present them below.

62% of millennials want visual search over any other new technology.

A group of millennials holding up their mobile phones and smiling.

Use cases

While companies like Google, Amazon, eBay and (in particular) Pinterest have been building their own image recognition solutions, and all of them have lately scaled up their investments in the technology, Schibsted has not been standing still. Some features are already live and more are in the pipeline. Stay tuned!

Placeholder for visual search demo. This one you should try for yourself.

Visual search

IN PRODUCTION (BETA) for mobile browsers. On iPhone, only Safari is supported.

Our culture is already dominated by visual stimuli. Half of the human brain is (directly or indirectly) devoted to the processing of visual information (MIT, 1996). It seems only natural that search and discovery start with an image.

Visual search enables quicker searches and more accurate results, making for a better user experience in the search and discovery phases. This drives better matchmaking, which fuels the success of our marketplaces and supports one of Schibsted's strategic pillars. More relevant use cases are presented below.

TRY FOR YOURSELF! (Click the link on a smartphone and open in browser)

Category suggestions

IN PRODUCTION for all categories on Blocket.

Sometimes the sellers on our marketplaces place their ad in the first category that comes to mind, and that category is not always the correct one. While one can always improve the category taxonomy, the problem can also be addressed with smart suggestions based on image recognition.

The Cognition team's category suggestion service helps people select the right category for their ads during classified ad submission. The user snaps a picture and the service suggests categories the ad could belong to. This removes friction, shortens ad insertion time, and ensures that ads are categorized correctly so they are easy to find.
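To make this concrete, here is a minimal sketch of how image-based category suggestion could work, assuming a classifier fine-tuned on ad images. The placeholder taxonomy, the resnet50 backbone and the function names are illustrative only; the actual Cognition service is not shown here.

```python
# A hedged sketch of category suggestion, NOT the Cognition team's code.
# CATEGORIES and the resnet50 backbone are placeholder assumptions.
import torch
import torchvision.transforms as T
from torchvision.models import resnet50
from PIL import Image

CATEGORIES = ["Furniture", "Electronics", "Clothing", "Sports", "Toys"]

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# In practice this model would be fine-tuned on the marketplace's own ads.
model = resnet50(num_classes=len(CATEGORIES))
model.eval()

def suggest_categories(image_path: str, top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top-k category suggestions with their probabilities."""
    image = Image.open(image_path).convert("RGB")
    batch = preprocess(image).unsqueeze(0)            # shape: (1, 3, 224, 224)
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1)[0]
    top = torch.topk(probs, k=top_k)
    return [(CATEGORIES[int(i)], float(p)) for p, i in zip(top.values, top.indices)]
```

The ad-insertion flow would then render the suggestions as one-tap choices, with the manual category picker as a fallback.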

Do you have further questions? Reach out to Erik Ingman or Eivind Throndsen.

Demonstration of feature "recommend visually similar ads."

Recommend visually similar ads

IN PRODUCTION for all categories on FINN Torget.

Image recognition can capture far more facets of an image than a human can articulate in a text search, or than collaborative filtering (a common recommendation method) can account for. Recommendations based on visual similarity can therefore be more relevant, making it easier for a buyer to find exactly what she is looking for.

Another useful distinction between recommendations based on collaborative filtering versus visual similarity is that while the former tends to favor recently published ads (because of users' behavior), the latter is often indifferent to the age of the ads (as similarity trumps age). This means recommendations based on visual similarity can give new life to older ads and hopefully provide them with a new owner.
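A minimal sketch of the ranking step, under the assumption that every ad's main image has already been embedded into an L2-normalized vector (the model producing those vectors is described further down). Note that the ad's age appears nowhere in the scoring:

```python
# Rank ads purely by visual similarity; publication date plays no role.
import numpy as np

def recommend_similar(query_vec: np.ndarray,
                      ad_vecs: np.ndarray,
                      ad_ids: list[str],
                      top_k: int = 10) -> list[str]:
    """ad_vecs: (n_ads, dim) matrix of L2-normalized image embeddings."""
    query = query_vec / np.linalg.norm(query_vec)
    scores = ad_vecs @ query                 # cosine similarity for normalized rows
    best = np.argsort(scores)[::-1][:top_k]  # highest similarity first
    return [ad_ids[i] for i in best]
```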

Metadata enrichment

PROTOTYPE and winner of the jury prize at FINN Hackdays May 2019.

From FINN Insights we know that some sellers "forget" to include significant information about their object in the ad. Meanwhile, some buyers struggle to find exactly what they are looking for, even though it might exist on the marketplace.

We cut humans out of the equation and jump straight to automation. Image recognition allows us to populate an ad with additional metadata, which makes it more "searchable" and consequently easier to find.

That way we boost liquidity - and both seller and buyer are happy!

What's more, it could even help estimate the value of an object and autogenerate relevant alt text for images, which is great for accessibility and SEO.
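As a sketch of what the enrichment step could look like, assume the model hands us a dictionary of predicted attributes (how those predictions are produced is covered in the engineer's explanation below). The field names are illustrative, and we only fill in fields the seller left empty:

```python
# A hedged sketch of metadata enrichment; field names are illustrative.

def enrich_ad(ad: dict, predicted: dict) -> dict:
    """Merge model-predicted attributes into an ad without overwriting the seller."""
    enriched = dict(ad)
    for field, value in predicted.items():
        enriched.setdefault(field, value)     # seller input always wins
    # Autogenerate simple alt text from the predictions, for accessibility and SEO.
    enriched.setdefault(
        "alt_text",
        f"{predicted.get('color', '')} {predicted.get('sub_category', '')}".strip(),
    )
    return enriched

ad = {"title": "Selling dress", "category": "Clothing"}
predicted = {"category": "Clothing", "sub_category": "dress", "color": "red"}
print(enrich_ad(ad, predicted))
# {'title': 'Selling dress', 'category': 'Clothing', 'sub_category': 'dress',
#  'color': 'red', 'alt_text': 'red dress'}
```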

Demonstration of metadata enrichment.
Demonstration of cropped visual search.

Cropped visual search

CONCEPT inspired by Pinterest and subject to further exploration.

Much like Google, our platforms started out as places to search for and find stuff. We have learned that users might need help, and we have put significant resources into discovery features like personalized feeds and similar results.

Over the past few years, though, we have seen the rise of inspirational platforms: Instagram and Pinterest are incredibly good at monetizing the search for inspiration. So why are we not doing the same?

We know that a large share of marketplace users browse the real-estate vertical for inspiration or just to dream. Why not help them find the things they like on our generalist marketplace or at Prisjakt?
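Since this is still a concept, here is only a conceptual sketch: crop the region the user selected, embed the crop, and rank ads by similarity. The `embed` callable is a hypothetical stand-in for the encoder described in the next section.

```python
# Conceptual sketch of cropped visual search; `embed` is a hypothetical
# stand-in for the image encoder described below.
from typing import Callable
import numpy as np
from PIL import Image

def cropped_search(image: Image.Image,
                   box: tuple[int, int, int, int],
                   embed: Callable[[Image.Image], np.ndarray],
                   ad_vecs: np.ndarray,
                   ad_ids: list[str],
                   top_k: int = 10) -> list[str]:
    """box = (left, upper, right, lower): e.g. the lamp a user circled in a photo."""
    crop = image.crop(box)                   # keep only the object of interest
    q = embed(crop)
    q = q / np.linalg.norm(q)
    scores = ad_vecs @ q                     # cosine similarity on normalized rows
    return [ad_ids[i] for i in np.argsort(scores)[::-1][:top_k]]
```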

An engineer on the Data Intelligence team at FINN explains...

The algorithm is designed to extract as much information as possible from each image. Leveraging FINN.no's large database of classified ads, the model generates attributes such as the category, sub-category, and color of the object, as well as the title. Although generating natural language is noisy, we realized that generating realistic titles was important, since categories (like "kjole", Norwegian for "dress") are too general.

Specifically, the algorithm is a deep neural network: an "encoder" that processes the image pixels, followed by multiple "decoders" that each predict one attribute. The encoder is a deep convolutional network (resnet-152), pre-trained on the Imagenet dataset and further trained on FINN.no data. The decoders are fully connected layers for the attributes, plus a Recurrent Neural Network with pre-trained Word2Vec word embeddings for title generation. After training, we use the feature layer between the encoder and the decoders. This is a very high-dimensional space in which we can calculate distances between all images; the images closest to the one the user took become our recommendations. For more details, see the work published at the RecSys 2018 conference.
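A hedged PyTorch sketch of that architecture follows. The head sizes, the vocabulary size, and the choice of a GRU for the recurrent decoder are our assumptions; the RecSys 2018 paper is the authoritative description.

```python
# Sketch of one encoder with multiple decoders, after the description above.
# Head sizes, vocab size and the GRU are assumptions, not FINN's actual code.
import torch
import torch.nn as nn
from torchvision.models import resnet152

class AdImageModel(nn.Module):
    def __init__(self, n_categories=50, n_subcategories=300, n_colors=20,
                 vocab_size=30_000, embed_dim=300, feature_dim=2048):
        super().__init__()
        backbone = resnet152(weights="IMAGENET1K_V1")  # Imagenet pre-training
        backbone.fc = nn.Identity()                    # expose the 2048-dim feature layer
        self.encoder = backbone
        # One fully connected decoder per attribute, all sharing the feature layer.
        self.category_head = nn.Linear(feature_dim, n_categories)
        self.subcategory_head = nn.Linear(feature_dim, n_subcategories)
        self.color_head = nn.Linear(feature_dim, n_colors)
        # Title decoder: an RNN over Word2Vec-style embeddings, conditioned on
        # the image by using its feature vector as the initial hidden state.
        self.word_embed = nn.Embedding(vocab_size, embed_dim)  # would load Word2Vec
        self.title_rnn = nn.GRU(embed_dim, feature_dim, batch_first=True)
        self.title_out = nn.Linear(feature_dim, vocab_size)

    def forward(self, images, title_tokens):
        features = self.encoder(images)                # (batch, 2048): used for search
        h0 = features.unsqueeze(0)                     # RNN initial hidden state
        rnn_out, _ = self.title_rnn(self.word_embed(title_tokens), h0)
        return {
            "features": features,
            "category": self.category_head(features),
            "sub_category": self.subcategory_head(features),
            "color": self.color_head(features),
            "title_logits": self.title_out(rnn_out),
        }
```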

Using this algorithm for recommendations is actually just a by-product of the model! We never actually use the predicted category, color or generated title. However, there are many opportunities to use these elsewhere, e.g. pre-populating ads during ad insertion (see "metadata enrichment" above).

When you do a visual search, your picture is first vectorized into a high-dimensional space. Then we compare your picture's vector to the vector of the main image in every other ad, and present you with the top visually similar ads. That's at least the simple version.
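Continuing the sketch above, the vectorization step could look like this: run the picture through the encoder's feature layer and normalize the result, so that a dot product against the precomputed ad vectors is exactly a cosine similarity.

```python
# Vectorize a user's picture with the feature layer of the sketched model.
import torch
import torchvision.transforms as T
from PIL import Image

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),  # Imagenet stats
])

def vectorize(model: AdImageModel, image: Image.Image) -> torch.Tensor:
    with torch.no_grad():
        vec = model.encoder(preprocess(image).unsqueeze(0))[0]  # (2048,)
    return vec / vec.norm()  # normalized: dot product == cosine similarity
```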

Now you might wonder: what is a neural network?

In the simplest terms we could find: a neural network is a stack of layers, each computing weighted sums of its inputs and passing them through a nonlinearity; training repeatedly nudges the weights so the network's outputs get closer to the desired answers.
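For the code-inclined, here is a complete basic neural network in plain numpy: one hidden layer learning XOR by gradient descent. The numbers are arbitrary; the point is the forward pass, the backward pass, and the weight updates.

```python
# A tiny neural network from scratch: 2 inputs -> 8 hidden units -> 1 output.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)        # XOR targets

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)          # input -> hidden weights
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)          # hidden -> output weights
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(5000):
    h = sigmoid(X @ W1 + b1)                           # forward pass
    out = sigmoid(h @ W2 + b2)
    grad_out = (out - y) * out * (1 - out)             # backward pass (chain rule)
    grad_h = grad_out @ W2.T * h * (1 - h)
    W2 -= 0.5 * h.T @ grad_out                         # gradient-descent updates
    b2 -= 0.5 * grad_out.sum(axis=0)
    W1 -= 0.5 * X.T @ grad_h
    b1 -= 0.5 * grad_h.sum(axis=0)

print(out.round(2).ravel())                            # typically approaches [0, 1, 1, 0]
```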

Still curious? Reach out to us!

Image of contact person Arber Zagragja.

Arber Zagragja

Schibsted Tech Experiments

Image of contact person Eivind Throndsen.

Eivind Throndsen

Schibsted Cognition team

Image of contact person Per Eirik Marton.

Per Eirik Marton

FINN Data Intelligence

Image of contact person Benjamin Weima Lager.

Benjamin Weima Lager

FINN Data Intelligence

#computer-vision