Abstract.- In this work we address the Yelp Restaurant Photo Classification Challenge, which consists of predicting the attributes of a restaurant given its corresponding variable-length set of images. The restaurant images were provided by Yelp, and the labels for the 9 available attributes were annotated by the Yelp community.
The multi-instance, multi-label nature of the problem allows us to explore a variety of ideas in the field of representation learning. First, we tackle the multi-instance aspect of the problem by aggregating the high-level CNN features of the images belonging to a restaurant into a single restaurant feature vector prototype. We then use the restaurant features to train a system of binary classifiers, a convenient approach for dealing with the multiple possible restaurant attributes.
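The two steps above can be illustrated with a minimal sketch. It assumes mean pooling as the aggregation function and toy two-dimensional vectors in place of real CNN features; the abstract does not state which aggregation is used, and the names `aggregate_by_restaurant` and `binary_targets` are hypothetical.

```python
from collections import defaultdict


def aggregate_by_restaurant(photo_features, photo_to_restaurant):
    """Mean-pool photo feature vectors into one vector per restaurant.

    Mean pooling is an assumption made for this sketch.
    """
    sums = {}
    counts = defaultdict(int)
    for photo_id, vec in photo_features.items():
        rid = photo_to_restaurant[photo_id]
        if rid not in sums:
            sums[rid] = [0.0] * len(vec)
        sums[rid] = [s + v for s, v in zip(sums[rid], vec)]
        counts[rid] += 1
    return {rid: [s / counts[rid] for s in total] for rid, total in sums.items()}


def binary_targets(restaurant_labels, num_attributes=9):
    """Split each restaurant's label set into one 0/1 target per attribute."""
    return {
        attr: {rid: int(attr in labels) for rid, labels in restaurant_labels.items()}
        for attr in range(num_attributes)
    }


# Toy data: three photos, two restaurants, attribute ids 0..8 (all hypothetical).
photo_features = {"p1": [1.0, 0.0], "p2": [3.0, 2.0], "p3": [0.5, 0.5]}
photo_to_restaurant = {"p1": "r1", "p2": "r1", "p3": "r2"}
restaurant_labels = {"r1": {0, 3}, "r2": {3, 8}}

restaurant_vectors = aggregate_by_restaurant(photo_features, photo_to_restaurant)
targets = binary_targets(restaurant_labels)
```

Each entry of `targets` can then serve as the training target of one independent binary classifier fitted on `restaurant_vectors`, which is one common way to handle a multi-label problem.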
To improve model performance, we induce a robust representation, which consists of extracting the restaurant features with VGG-16 network weights trained on 3 different datasets: ImageNet, Food-101, and MIT Places 2. Because not every representation is equally important for predicting a particular attribute, we add a final classifier that learns the prediction weight of each representation for a given attribute.
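As an illustration of the final weighting step, the sketch below fits a small logistic regression on the per-representation probabilities for a single attribute. The choice of logistic regression, the toy probabilities, and the names `fit_combiner` and `combine` are assumptions made for this example; the abstract only states that a final classifier learns a prediction weight per representation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_combiner(inputs, targets, lr=0.5, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent.

    Each input row holds the probabilities predicted for ONE attribute by the
    classifiers built on the three representations (hypothetical column order:
    ImageNet, Food-101, Places).
    """
    n_features = len(inputs[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(inputs, targets):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / len(inputs) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(inputs)
    return weights, bias


def combine(weights, bias, x):
    """Final probability for the attribute given the three base predictions."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Toy data: the first column tracks the label, the third is close to uninformative.
inputs = [
    [0.9, 0.6, 0.5],
    [0.8, 0.4, 0.5],
    [0.2, 0.5, 0.5],
    [0.1, 0.5, 0.5],
    [0.7, 0.7, 0.4],
    [0.3, 0.3, 0.6],
]
targets = [1, 1, 0, 0, 1, 0]

weights, bias = fit_combiner(inputs, targets)
```

In this toy setting the learned weight of the first column ends up larger than that of the third, which mirrors the idea that some representations matter more than others for a given attribute. One such combiner would be fitted per attribute.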
Our proposal is summarized as a simple end-to-end system that achieves a test F1-score of 0.8177, placing our model in the top 10% of entries for the challenge. Finally, some recommendations for improvement and future work are discussed.
Student: Javier Roberto Veloz Centeno
Master of Science, Degree awarded: dd/mm/2018
- Conference paper, MCPR
- Conference paper, COMIA