In our WebInteraction system, the classes form a three-level hierarchy.
Level 1
PhishIntentionWrapper: A wrapper for PhishIntention
SubmissionButtonLocator: An object detector for submission buttons
OpenMMOCR: Calls the OCR library from OpenMMLab
Level 2
StateClass: checks the status of the website, e.g. empty page, error page, etc.
StateAction: performs the CRP transition proposed by PhishIntention
Form: implements the core logic, which covers input detection, input type matching, button detection, form filling, and form submission
Input detection: Since there are only a limited number of ways to implement a fillable input field in HTML, we locate all <input>, <textarea>, and <search> tags in the HTML document.
Input type matching: To decide which type of credential a particular input is asking for, we use 2 layers of matching:
1) The simplest way is to look for keywords in the HTML attributes.
2) If layer 1 is bypassed by HTML code obfuscation, we use OCR to read the text surrounding the input area.
In total we keep 29 matching rules for 29 input types (e.g. email, first name, last name, username, userid, name prefix, password, phone area, phone, month, day, year, birthday, age, file upload, zipcode, city, country, state, street, building number, address, ssn, company name, credit card number, credit card ccv, credit card expiration date). We keep such a comprehensive list to avoid interaction failures caused by inputs not being filled in the required format.
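The detection and two-layer matching steps above can be sketched roughly as follows. This is a minimal illustration and not the actual Form implementation: the names RULES, ATTRS, InputCollector, detect_inputs, match_input_type, and the ocr_text_fn callback are all hypothetical, only three of the 29 rules are shown, and the regular expressions are assumed examples. The OCR layer is represented by a plain callable so that the sketch runs without an OCR model.

```python
import re
from html.parser import HTMLParser

# Tags treated as fillable inputs, as listed in the text above.
FILLABLE_TAGS = {"input", "textarea", "search"}

# Hypothetical subset of keyword rules, for illustration only.
RULES = {
    "email": re.compile(r"e-?mail", re.I),
    "password": re.compile(r"pass|pwd", re.I),
    "phone": re.compile(r"phone|mobile|\btel\b", re.I),
}

# HTML attributes inspected in layer 1 (an assumed selection).
ATTRS = ("name", "id", "type", "placeholder", "aria-label")


class InputCollector(HTMLParser):
    """Collects (tag, attributes) pairs for every fillable tag."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag in FILLABLE_TAGS:
            self.inputs.append((tag, dict(attrs)))


def detect_inputs(html):
    """Return all fillable inputs found in an HTML string."""
    collector = InputCollector()
    collector.feed(html)
    collector.close()
    return collector.inputs


def match_text(text):
    """Return the first input type whose rule matches the text, else None."""
    for input_type, pattern in RULES.items():
        if pattern.search(text):
            return input_type
    return None


def match_input_type(attrs, ocr_text_fn=None):
    """Layer 1: keywords in HTML attributes. Layer 2: OCR text fallback."""
    joined = " ".join(attrs.get(a) or "" for a in ATTRS)
    hit = match_text(joined)
    if hit is not None or ocr_text_fn is None:
        return hit
    # Layer 2 runs only when the attributes gave no match,
    # e.g. because the attribute names are obfuscated.
    return match_text(ocr_text_fn())
```

In this sketch, an input such as <input name="user_email"> is resolved by layer 1, while an input with an obfuscated id falls through to the text returned by ocr_text_fn.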
Button detection: The submission button detector is an object detector trained on 1495 images.
Button cleansing: We discard "registration" buttons, because clicking a registration button would also proceed to the next page.
Form filling: Fill all inputs with values in the required formats.
Form submission: Click the most probable submission button.
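Button cleansing and submission-button selection might look like the sketch below. It assumes that each detected button carries a visible label and a detector confidence score. The ButtonCandidate class, the REGISTRATION_KEYWORDS list, and both function names are hypothetical and are not taken from the actual code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ButtonCandidate:
    text: str          # visible label of the button (assumed available)
    confidence: float  # detector score, higher means more probable


# Hypothetical keywords that mark a registration button.
REGISTRATION_KEYWORDS = ("register", "sign up", "signup", "create account")


def is_registration(button: ButtonCandidate) -> bool:
    """True if the button label looks like a registration button."""
    label = button.text.lower()
    return any(keyword in label for keyword in REGISTRATION_KEYWORDS)


def pick_submission_button(
    candidates: List[ButtonCandidate],
) -> Optional[ButtonCandidate]:
    """Drop registration buttons, then return the most probable one left."""
    kept = [b for b in candidates if not is_registration(b)]
    return max(kept, key=lambda b: b.confidence, default=None)
```

With this ordering, a high-scoring "Sign up" button is removed before the ranking step, so it cannot be chosen over a lower-scoring "Log in" button.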
Level 3
Web Interaction model: This is a more detailed illustration of Algorithm 3