WebNorm: Detecting and Explaining Anomalies Caused by Web Tamper Attacks via Building Consistency-based Normality

Background: 

Web applications play an important role in modern infrastructure, including government, banking, hospital, and even military systems. However, Web tampering attacks, in which an attacker maliciously manipulates client-side parameters to produce inconsistent outcomes such as obtaining membership status without payment, have become a long-term threat to website security. For example, recent news reported that Vistara Airlines suffered from payment bypass attacks, in which an attacker manipulates input parameters to bypass the payment stage and obtain free goods or services. Tamper attacks are often associated with entries in the Common Vulnerabilities and Exposures (CVE) list.

Tamper attacks typically cause inconsistencies among the server-side logs. These exploratory code-tampering behaviors, even failed ones, are valuable for service operators to (1) understand the potential vulnerabilities of the application and (2) be aware of whether and how the application is being exploited. Since the frontend code can be tampered with in a variety of ways, detection falls to runtime log analysis in the backend, which is an anomaly detection problem in DevOps.

Introduction

Why do we detect tamper attacks with logs?

Because web applications generate many logs in real time, we can use these logs to detect tamper attacks while the system is running, enabling real-time responses to potential security risks. Unlike existing works that identify whether a website has a tamper vulnerability, we aim to detect the tamper attacks themselves based on logs, which provide a comprehensive record of the attack activities.

What are the limitations of existing deep-learning-based detection methods?

Subtle change of abnormality: Unprecedented attacks can change the generated logs only in very subtle ways, which traditional solutions typically overlook.

Explainability: Deep learning models output scores but require extensive post-analysis for root cause identification.

Distribution shift: Tampering behaviors evolve over time, which makes it difficult to keep training datasets up to date and to collect the logs of attacks that were missed (false negatives).


A Motivating Example:

As shown in the example below, the captured back-end logs display "price=0", while the normal logs show "price=27.5". This is inconsistent with our extracted rule that the price should remain the same between two events. The inconsistency indicates that a sophisticated attacker has bypassed the payment requirement for a service. The attack evades the existing front-end logic, but it can be identified by comparing the run-time back-end logs against the constraints that normal logs satisfy.
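
The consistency rule in this example can be illustrated with a minimal Python sketch. It is not WebNorm's implementation: the log record layout, the event names "create_order" and "pay_order", and the function find_inconsistency are assumptions made up for illustration. It only shows the idea of checking that a field keeps the same value across two related events.

```python
from typing import Any, Dict, List, Optional

# Hypothetical log format: one dict per backend log entry.
LogRecord = Dict[str, Any]


def find_inconsistency(
    logs: List[LogRecord], src_event: str, dst_event: str, field: str
) -> Optional[str]:
    """Return a description of the rule violation, or None if none is found.

    The rule checked: `field` must have the same value in the first
    `src_event` entry and the first `dst_event` entry.
    """
    src = next((r for r in logs if r.get("event") == src_event), None)
    dst = next((r for r in logs if r.get("event") == dst_event), None)
    if src is None or dst is None:
        return None  # the rule does not apply to this log sequence
    if src.get(field) != dst.get(field):
        return (
            f"{field} changed from {src.get(field)} in {src_event} "
            f"to {dst.get(field)} in {dst_event}"
        )
    return None


# Illustrative log sequences (event names are assumptions, values follow the example).
normal_logs = [
    {"event": "create_order", "price": 27.5},
    {"event": "pay_order", "price": 27.5},
]
tampered_logs = [
    {"event": "create_order", "price": 27.5},
    {"event": "pay_order", "price": 0},
]

print(find_inconsistency(normal_logs, "create_order", "pay_order", "price"))
print(find_inconsistency(tampered_logs, "create_order", "pay_order", "price"))
```

The first call reports no violation, while the second returns a message naming the field and the two events, which is the kind of explanation an operator can act on directly.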

Overview

Input: Frontend and backend code of the web application

Output: Run-time anomaly in the deployment phase

Scalability

For practical deployment, we suggest users consider the following ways to convert the backend's log sequences into individual logs for each user: