WebNorm: Detecting and Explaining Anomalies Caused by Web Tamper Attacks via Building Consistency-based Normality
Background:
Web applications play an important role in modern infrastructure such as government, banking, hospital, and even military systems. However, web tampering attacks, in which an attacker maliciously manipulates client-side parameters to produce inconsistent outcomes such as obtaining membership status without payment, have become a long-term threat to website security. For example, Vistara Airlines was recently reported to have suffered payment bypass attacks in which an attacker manipulated input parameters to bypass the payment stage and obtain free goods or services. Such tamper attacks are related to several Common Vulnerabilities and Exposures (CVE) entries.
Tamper attacks typically cause inconsistencies among the server-side logs. Even failed exploratory tampering attempts are valuable to service operators, who can use them to (1) understand the potential vulnerabilities of the application and (2) learn whether and how the application is being exploited. Since the frontend code can be tampered with in a variety of ways, detection falls to runtime log analysis in the backend, which makes it an anomaly detection problem in DevOps.
Introduction
Why do we detect tamper attacks with logs?
Because web applications generate many logs in real time, we can use those logs to detect tamper attacks while the system is running, enabling real-time responses to potential security risks. Unlike existing work that identifies whether a website has a tamper vulnerability, we aim to detect the tamper attacks themselves from logs, which provide a comprehensive record of attack activity.
What are the limitations of existing deep-learning-based detection methods?
Subtle change of abnormality: Unprecedented attacks can cause the generated logs to change in very subtle ways, which traditional solutions typically ignore.
Explainability: Deep learning models output scores but require extensive post-analysis for root cause identification.
Distribution shift: Tampering behaviors keep evolving, which makes it hard to keep training datasets up to date and to collect the logs of attacks that were missed (false negatives).
A Motivating Example:
In this example, the captured back-end logs display "price=0," while the normal logs show "price=27.5," which is inconsistent with our extracted rule that the price must remain the same between the two events. This indicates that a sophisticated attacker has bypassed the payment requirement for a service. The attack evades the existing front-end logic, but it can be identified by comparing the run-time back-end logs against the constraints learned from normal logs.
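The consistency rule in this example can be sketched in a few lines of Python. The log format, the event names (create_order, confirm_payment), and the field names are assumptions made for illustration; they are not taken from WebNorm's implementation.

```python
import re

# Hypothetical backend log lines; the key=value format is assumed for illustration.
NORMAL_LOGS = [
    "event=create_order order_id=42 price=27.5",
    "event=confirm_payment order_id=42 price=27.5",
]
TAMPERED_LOGS = [
    "event=create_order order_id=43 price=27.5",
    "event=confirm_payment order_id=43 price=0",
]


def parse(line):
    """Turn 'key=value key=value' text into a dict of strings."""
    return dict(re.findall(r"(\w+)=(\S+)", line))


def price_is_consistent(lines):
    """Rule: the price must be identical across the events of one order."""
    prices = {float(parse(line)["price"]) for line in lines}
    return len(prices) == 1


print(price_is_consistent(NORMAL_LOGS))    # True
print(price_is_consistent(TAMPERED_LOGS))  # False
```

A hand-written rule like this only covers one field; the point of the approach described here is that such rules are extracted from normal logs rather than written manually.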
Overview
Input: Frontend and Backend code of the web applications
Step 1. Testing scenario selection and Log instrumentation: Run pre-deployment functional test scripts and collect corresponding logs.
Step 2. Event Graph Construction: Link events into a graph based on control and data flow.
Step 3. Constraint Learning: Learn log constraints and convert them into scripts.
Output: Run-time anomalies detected in the deployment phase
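Steps 2 and 3 can be illustrated with a deliberately simplified sketch. It assumes events are already parsed into dictionaries, links each pair of consecutive events (a crude stand-in for control and data flow), and keeps as a constraint every field equality that held in all normal traces. All function names and the learning rule itself are hypothetical; this is not WebNorm's actual graph construction or constraint learning.

```python
from itertools import product


def ev(name, **fields):
    """Build a parsed log event (hypothetical representation)."""
    return {"event": name, "fields": fields}


def link_events(trace):
    """Step 2 (simplified): yield an edge for each pair of consecutive events."""
    for src, dst in zip(trace, trace[1:]):
        yield (src["event"], dst["event"]), src, dst


def shared_fields(src, dst):
    """Field pairs (src_field, dst_field) that carry the same value."""
    return {
        (a, b)
        for (a, va), (b, vb) in product(src["fields"].items(), dst["fields"].items())
        if va == vb
    }


def learn_constraints(normal_traces):
    """Step 3 (simplified): keep equalities that hold in every normal trace."""
    constraints = {}
    for trace in normal_traces:
        for edge, src, dst in link_events(trace):
            pairs = shared_fields(src, dst)
            constraints[edge] = constraints[edge] & pairs if edge in constraints else pairs
    # Edges with no surviving equality carry no constraint.
    return {edge: pairs for edge, pairs in constraints.items() if pairs}


def find_violations(trace, constraints):
    """Run-time check: report every learned equality that the trace breaks."""
    violations = []
    for edge, src, dst in link_events(trace):
        for a, b in sorted(constraints.get(edge, ())):
            if src["fields"].get(a) != dst["fields"].get(b):
                violations.append((edge, a, b))
    return violations


normal_traces = [
    [ev("create_order", order_id="1", price="27.5"), ev("pay", order_id="1", price="27.5")],
    [ev("create_order", order_id="2", price="9.0"), ev("pay", order_id="2", price="9.0")],
]
constraints = learn_constraints(normal_traces)

tampered = [ev("create_order", order_id="3", price="27.5"), ev("pay", order_id="3", price="0")]
print(find_violations(tampered, constraints))
# [(('create_order', 'pay'), 'price', 'price')]
```

Intersecting over several normal traces is what removes accidental equalities, such as an order ID that happens to match a quantity in one run. Because each violation names the edge and the fields involved, the output is directly readable as an explanation of the anomaly.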
Scalability
For practical deployment, the backend's interleaved log stream must be split into a separate log sequence for each user. We suggest users consider the following ways to do so:
Cookies
Persistent Cookies: Persistent cookies are stored on the user's device and remain there even after the browser is closed, unless they reach their set expiration date. They are used to store user preferences, login information, and more.
Server-side Sessions: After a user logs in, the server creates a unique session ID and sends it to the user's browser (usually through a cookie). The browser sends this session ID with each request, and the server uses this ID to identify the user.
IP Address
IP Tracking: Although multiple users may share an IP address, IP addresses can be used for basic user identification and tracking in certain situations.
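One way to apply these suggestions is to key each log entry by its session ID when one is present and to fall back to the client IP address otherwise. The sketch below assumes each log entry is a dictionary with optional session_id and ip fields and a msg field; these names are hypothetical and depend on how the application is instrumented.

```python
from collections import defaultdict


def user_key(entry):
    """Prefer the session ID; fall back to the IP address, which is coarser
    and may merge several users behind one address."""
    if entry.get("session_id"):
        return ("session", entry["session_id"])
    return ("ip", entry.get("ip", "unknown"))


def split_by_user(entries):
    """Split an interleaved log stream into per-user sequences, preserving order."""
    sequences = defaultdict(list)
    for entry in entries:
        sequences[user_key(entry)].append(entry["msg"])
    return dict(sequences)


stream = [
    {"session_id": "s1", "ip": "10.0.0.1", "msg": "create_order"},
    {"session_id": "s2", "ip": "10.0.0.1", "msg": "create_order"},
    {"session_id": "s1", "ip": "10.0.0.1", "msg": "pay"},
    {"ip": "10.0.0.9", "msg": "view_item"},
]
per_user = split_by_user(stream)
print(per_user[("session", "s1")])  # ['create_order', 'pay']
```

In this example the two sessions share one IP address, so grouping by IP alone would have merged their events into a single sequence; the session ID keeps them apart.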