Our project crafts adversarial examples against Instagram's OCR algorithm. Since we do not know the exact model, we took a black-box approach to the problem.
The implementation was done in phases: we first performed targeted attacks with antonyms and then moved on to untargeted attacks covering content flagged on Instagram, such as vaccinations and COVID-19. More details on our implementation can be found here.
Both a target image and an input image are given; the goal is to perturb the input image so that the OCR model outputs the text present in the target image.
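As a rough illustration of this setting (not our actual pipeline), the sketch below runs a simple random-search attack against a local OCR model; pytesseract is used purely as a stand-in for Instagram's unknown model, and a perturbation is kept whenever it reduces the edit distance between the OCR output and the target text.

```python
# A minimal sketch of the targeted setting, not our actual pipeline: pytesseract
# stands in for Instagram's unknown OCR model, and random search replaces the
# gradient access a white-box attack would have.
import numpy as np
from PIL import Image
import pytesseract

def ocr(arr: np.ndarray) -> str:
    """Run the stand-in OCR model on a grayscale uint8 image array."""
    return pytesseract.image_to_string(Image.fromarray(arr.astype(np.uint8))).strip()

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard one-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def targeted_attack(input_path: str, target_text: str,
                    steps: int = 500, eps: float = 8.0) -> np.ndarray:
    """Perturb the input image until the OCR output matches target_text."""
    best = np.asarray(Image.open(input_path).convert("L"), dtype=np.float32)
    best_loss = edit_distance(ocr(best), target_text)
    for _ in range(steps):
        # Propose a small random perturbation, clipped to the valid pixel range.
        candidate = np.clip(best + np.random.uniform(-eps, eps, best.shape), 0, 255)
        loss = edit_distance(ocr(candidate), target_text)
        if loss < best_loss:   # keep changes that move the output toward the target
            best, best_loss = candidate, loss
        if best_loss == 0:     # OCR now reads exactly the target text
            break
    return best.astype(np.uint8)
```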
For a given input image containing a single word, the corresponding target image contains its antonym.
Images containing terms related to "Vaccination" are taken, and we manually modify them to hide the word "Vaccination", creating the target images.
No target image is given; the goal is simply to alter the OCR model's output for the given input image.
No target image is given; the goal is simply to alter the model's output.
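Under the same assumptions as the earlier sketch (pytesseract as the stand-in model, random search instead of gradients), an untargeted variant can be sketched as follows: perturbations are kept when they push the OCR output away from the original prediction.

```python
# Hypothetical sketch of the untargeted setting: keep perturbations that make
# the OCR output drift away from the original prediction.
import difflib
import numpy as np
from PIL import Image
import pytesseract

def ocr(arr: np.ndarray) -> str:
    return pytesseract.image_to_string(Image.fromarray(arr.astype(np.uint8))).strip()

def untargeted_attack(input_path: str, steps: int = 500, eps: float = 8.0) -> np.ndarray:
    """Perturb the input image so the OCR output no longer matches the original text."""
    best = np.asarray(Image.open(input_path).convert("L"), dtype=np.float32)
    original = ocr(best)
    best_sim = 1.0
    for _ in range(steps):
        candidate = np.clip(best + np.random.uniform(-eps, eps, best.shape), 0, 255)
        # Similarity between the new prediction and the original; lower is better here.
        sim = difflib.SequenceMatcher(None, ocr(candidate), original).ratio()
        if sim < best_sim:
            best, best_sim = candidate, sim
    return best.astype(np.uint8)
```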
Success Rate = (Number of perturbed images that fool the model) / (Total number of images considered)
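A hypothetical helper for computing this metric under the untargeted criterion (an image counts as fooling the model when its OCR output no longer matches the original text; for the targeted attacks the comparison would be against the target text instead):

```python
def success_rate(original_texts: list[str], perturbed_texts: list[str]) -> float:
    """Fraction of perturbed images whose OCR output no longer matches the original text."""
    fooled = sum(o.strip() != p.strip() for o, p in zip(original_texts, perturbed_texts))
    return fooled / len(original_texts)
```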
Upon manually uploading statuses to Instagram, these are the success rates we were able to achieve on the platform: