Airspace Anomaly Detection

Project Information

  • NASA Big Data Analytics

  • In cooperation with Mosaic ATM and Honeywell

  • Period

    • Year 1: October 2016 - September 2017

    • Year 2: October 2017 - September 2018

    • Year 3: October 2018 - present

  • Participants: Raj Deshmukh, Kwangyeon Kim

Introduction

The National Airspace System (NAS) is vast and complex and comprises many subsystems. A substantial amount of data is being recorded, and its volume will grow as the NAS evolves with additional sensing and data collection capabilities and newly deployed systems. It has therefore become more critical to effectively identify a priori unknown safety threats and emerging risks. Aviation data are typically unlabeled, which motivates the use of unsupervised learning approaches for anomaly detection in the NAS.

  • Objective: to develop an unsupervised learning approach to effectively identify a priori unknown safety threats or emerging risks

  • Challenges

Anomaly detection is the problem of identifying events or observations in a dataset that do not conform to expected behavior. For sequential (time-series) data such as flight data, several anomaly detection techniques have been proposed, such as Gaussian Processes (GPs) and Hidden Markov Models (HMMs). The anomalies detected by such methods are statistically significant, and human feedback can then be incorporated to retain only the operationally significant ones. In general, these methods infer a surface in a high-dimensional feature space that separates normal from anomalous data. However, the meaning of such a surface is generally hard to interpret, especially in online monitoring.

In this sense, we propose a Temporal Logic Learning based Anomaly Detection (TempAD) algorithm, which provides formulas that are easy to interpret in natural language. The learned temporal logic formulas can express system properties, such as bounds on time and on physical parameters, that have physical meaning. For example, the following formula

φ = F[2,27] ((x,y) ∈ R1) ∧ G[30,31] ((x,y) ∈ R2),

where R1 = {(x,y) | f1(x,y) > 0} and R2 = {(x,y) | f2(x,y) > 0}, consists of

  • its structure, which determines the temporal operators "Finally (or Eventually)" (F) or "Globally (or Always)" (G), the "and" or "or" relations, and the form of the predicates (e.g., f1(x,y) can be of the form x > c or ax + by > c, where a, b, and c are constants); and

  • its parameters that specify the bounds on time and physical variables.

This temporal logic formula can be interpreted in natural language as

"for the normal behavior, the system’s variables x and y should reach area R1 (e.g., reach x+3y > 27) at some time between 2 and 27, and should also reside in area R2 (e.g., maintain y^2+3x+y < 15) during the time steps between 30 and 31."

  • Proposed framework: Temporal logic based Anomaly Detection (TempAD)
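
To make the example formula from the introduction concrete, the following is a minimal sketch of how such a formula could be checked against a trajectory. It assumes integer time steps and uses the illustrative regions quoted above (x+3y > 27 for R1 and y^2+3x+y < 15 for R2). The function names and the synthetic trajectories are hypothetical and are not part of TempAD.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # (x, y) sampled at integer time steps 0, 1, 2, ...
Predicate = Callable[[float, float], bool]


def eventually(traj: List[Point], lo: int, hi: int, pred: Predicate) -> bool:
    """F_[lo,hi]: the predicate holds at some time step in [lo, hi]."""
    return any(pred(*traj[t]) for t in range(lo, hi + 1))


def always(traj: List[Point], lo: int, hi: int, pred: Predicate) -> bool:
    """G_[lo,hi]: the predicate holds at every time step in [lo, hi]."""
    return all(pred(*traj[t]) for t in range(lo, hi + 1))


def in_r1(x: float, y: float) -> bool:
    # Illustrative region R1 from the text: x + 3y > 27
    return x + 3 * y > 27


def in_r2(x: float, y: float) -> bool:
    # Illustrative region R2 from the text: y^2 + 3x + y < 15
    return y ** 2 + 3 * x + y < 15


def is_normal(traj: List[Point]) -> bool:
    """Reach R1 at some step in [2, 27] and stay in R2 during steps [30, 31]."""
    return eventually(traj, 2, 27, in_r1) and always(traj, 30, 31, in_r2)


# Synthetic trajectories with 32 samples each (made up for illustration only).
base = [(0.0, 0.0)] * 32
normal_traj = list(base)
normal_traj[10] = (0.0, 10.0)      # enters R1 at step 10
never_reaches_r1 = list(base)      # stays outside R1 for all steps
leaves_r2 = list(normal_traj)
leaves_r2[31] = (10.0, 0.0)        # outside R2 at step 31

for name, traj in [("normal", normal_traj),
                   ("never reaches R1", never_reaches_r1),
                   ("leaves R2", leaves_r2)]:
    print(name, "->", "normal" if is_normal(traj) else "anomalous")
```

The Boolean check only reports whether the formula is satisfied. The monitoring stage described later uses a signed score instead, which also indicates how far a trajectory is from violating the formula.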

Data Preprocessing via Clustering

  • Flight Data: Airport Surface Detection Equipment - Model X (ASDE-X) data recorded at LaGuardia (LGA) Airport during the period of April 6 - 24, 2016 (19 days)

    • Available Information: flight ID, position (latitude, longitude, altitude) and speed (ground speed)

  • Since a single airport has multiple arrival patterns, examining groups of flights with similar properties individually, rather than treating all flights as a whole, makes data analysis and anomaly detection more effective and efficient. The flight data is therefore first divided into clusters (groups with similar properties) using clustering techniques.

  • Since our data is spatial in nature and contains abnormal flights (noise), we propose to use a density-based method called DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
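
The clustering step can be illustrated with a small, from-scratch sketch of the standard DBSCAN procedure. It assumes that each flight has already been summarized as a short feature vector (here a hypothetical 2-D point). The parameter names eps and min_pts and the sample data are assumptions made for illustration, and the source does not say which implementation or settings were used. In practice a library implementation, such as the DBSCAN class in scikit-learn, would be the more usual choice.

```python
import math
from typing import List, Optional, Sequence, Tuple

NOISE = -1  # label for points that belong to no cluster


def region_query(points: Sequence[Tuple[float, ...]], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Tuple[float, ...]], eps: float, min_pts: int) -> List[int]:
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(j_neighbors)
    return [NOISE if label is None else label for label in labels]


# Hypothetical feature vectors: two dense groups and one isolated flight.
features = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (5, 50)]
labels = dbscan(features, eps=1.5, min_pts=3)
print(labels)
```

A density-based method suits this setting because the number of arrival patterns does not have to be fixed in advance, and isolated flights are labeled as noise rather than forced into a cluster.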

Temporal Logic Learning

The overall architecture of the proposed unsupervised anomaly detection algorithm, TempAD, consists of two stages, Training and Monitoring:

  • In the Training stage, a set of time-series data is fed into TempAD to generate a model that separates normal and anomalous time-series data; and

  • In the Monitoring stage, the learned model is used to compute an anomaly score for unseen time-series data.

As shown in the figure below, the training process (TempAD) is performed in two steps:

  • Model Selection (Discrete Search), based on a piecewise regression technique, to identify the structure of a temporal logic formula; and

  • Parameter Learning (Continuous Search), based on a One-Class Support Vector Machine (OCSVM)-like optimization, to compute the parameters of the temporal logic formula found in Model Selection that best describe the data.

Illustrative Example: Go-around Flight

To demonstrate the training and monitoring steps, we present a go-around example.

  • Training (learning): the models learned by TempAD (blue lines) describe the normal behaviors so that anomalous trajectories (red lines) can be detected.

    • For example, the model in the horizontal dimension is computed as:

where t0 is the time remaining to touchdown at the beginning, t*=178.08 (around three minutes before touchdown), and R1 and R2 are the areas defined by linear and quadratic polynomials as:

R1 = {(x,y) | -x + 0.8065y > 106.7516 and -x + 1.4286y < 132.1086}

R2 = {(x,y) | x + 57.2788y^2 - 4668.0y > 95178.9 and x + 68.3416y^2 - 5571.1y < 113612.2}

With some visualization aids such as the video below, the model can be interpreted in natural language as “an aircraft flying normally should reside in area R1 up to 3 minutes before touchdown and then R2 until touchdown.”

  • Monitoring: For real-time monitoring, the anomaly score is computed using the concept of robustness degree, which represents the signed distance of time-series data to the learned model: if it is positive, the data is normal; if it is negative, the data is anomalous.

    • As an example, the video below shows the models learned by TempAD (blue) and its use for monitoring for a go-around flight (red). Note that for the vertical and speed dimensions, the models are learned for both upper and lower bounds.

      • All the anomaly scores in the three dimensions remain positive until the vertical trajectory violates the lower bound (82 seconds before touchdown).

      • After 68 seconds, the aircraft performs a go-around maneuver, which is identified by negative anomaly scores in all three dimensions.
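
The sign convention of the anomaly score can be sketched with the region R1 of the horizontal model. The sketch assumes the usual robustness semantics for temporal logic, in which a region's margin is the smallest slack among its inequalities and "always" takes the minimum over time. It covers only the part of the trajectory before t* and leaves out R2. The function names and the sample coordinates are made up for illustration, and the raw inequality slack stands in for whatever distance measure TempAD actually uses.

```python
from typing import List, Tuple

T_STAR = 178.08  # seconds remaining to touchdown at which the model leaves R1

Sample = Tuple[float, float, float]  # (time remaining to touchdown, x, y)


def margin_r1(x: float, y: float) -> float:
    """Signed margin of (x, y) with respect to R1: positive inside, negative outside."""
    lower = -x + 0.8065 * y - 106.7516    # slack of  -x + 0.8065y > 106.7516
    upper = 132.1086 - (-x + 1.4286 * y)  # slack of  -x + 1.4286y < 132.1086
    return min(lower, upper)              # both inequalities must hold


def score_before_t_star(samples: List[Sample]) -> float:
    """Robustness of 'always in R1' over the samples taken before t*."""
    margins = [margin_r1(x, y) for ttd, x, y in samples if ttd > T_STAR]
    return min(margins)  # 'always' = worst case over time


# Synthetic samples, chosen only to satisfy or violate the R1 inequalities.
normal = [(300.0, -73.95, 40.70), (250.0, -73.94, 40.71), (200.0, -73.93, 40.72)]
deviating = [(300.0, -73.95, 40.70), (250.0, -73.94, 40.74), (200.0, -73.93, 40.72)]

for name, samples in [("normal", normal), ("deviating", deviating)]:
    score = score_before_t_star(samples)
    print(name, "->", "normal" if score > 0 else "anomalous")
```

Because the score is a minimum over time, it can be updated sample by sample during monitoring, and it turns negative as soon as one sample leaves the region.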

Related Publications

  • K. Kim and I. Hwang, "Terminal Airspace Anomaly Detection Using Temporal Logic Learning," 8th International Conference on Research in Air Transportation (ICRAT), June 2018, Catalonia, Spain (accepted).