I propose to develop a Neural Architecture Search (NAS) Machine Learning (ML) model to emulate chemical variables using output from the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem). The problem with running WRF-Chem is that it is computationally expensive and time-consuming.
Using historical output from the WRF-Chem model, I will develop a Machine Learning model that predicts future values.
By doing this, I aim to reduce the computational cost and the processing time needed to generate chemical variables.
Neural Architecture Search (NAS): A technique that automates neural network architecture engineering.
Artificial Neural Network (ANN): An information processing model inspired by biological nervous systems.
Recurrent Neural Network (RNN): A class of neural networks that allow previous outputs to be used as inputs while maintaining hidden states.
NAS is the process of automating neural network architecture engineering.
We provide a NAS system with a dataset and a task (classification, regression, etc.), and it returns an architecture.
This architecture will perform best among all candidate architectures for the given task when trained on the provided dataset.
Generally, NAS can be categorized along three dimensions: a search space, a search strategy, and a performance estimation strategy.
Figure: The fundamentals of neural architecture search [7]
Search space: The search space determines which neural architectures are assessed. It contains every architecture design (often an infinite number) that can be generated by the NAS approach. It may cover all sets of layer configurations stacked on each other, or more complicated architectures that include skip connections. To reduce the dimension of the search space, it may also be built from sub-module designs. [7]
Performance estimation strategy: This provides a number that reflects the efficiency of each architecture in the search space. It is usually the accuracy of a model architecture after training on a reference dataset for a predefined number of epochs, followed by testing. [7]
Search strategy: NAS relies heavily on search strategies, including random and grid search, gradient-based strategies, evolutionary algorithms, and reinforcement learning. A grid search explores the space systematically, whereas random search randomly picks architectures from the search space and then evaluates each one through the performance estimation strategy. [7]
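As a concrete illustration of random search, the sketch below samples small architectures from a toy search space (hidden-layer width and learning rate), scores each with a simple performance estimation step (training a tiny NumPy MLP on synthetic data), and keeps the best. The search space, the toy task, and all names here are illustrative assumptions, not the strategy AutoKeras uses internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task standing in for the real data (illustrative only).
X = rng.normal(size=(200, 4))
y = np.sin(X).sum(axis=1, keepdims=True)
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

# Search space: hidden-layer width and learning rate (assumed, for the sketch).
SEARCH_SPACE = {"units": [4, 8, 16, 32], "lr": [0.001, 0.01, 0.1]}

def train_and_score(units, lr, epochs=200):
    """Performance estimation: train a one-hidden-layer MLP
    and return its validation MSE (lower is better)."""
    W1 = rng.normal(scale=0.5, size=(4, units)); b1 = np.zeros(units)
    W2 = rng.normal(scale=0.5, size=(units, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(X_train @ W1 + b1)               # forward pass
        pred = h @ W2 + b2
        grad = 2 * (pred - y_train) / len(X_train)   # dMSE/dpred
        gW2 = h.T @ grad                             # backprop, layer 2
        gh = grad @ W2.T * (1 - h ** 2)              # backprop, layer 1
        W2 -= lr * gW2; b2 -= lr * grad.sum(axis=0)
        W1 -= lr * X_train.T @ gh; b1 -= lr * gh.sum(axis=0)
    val_pred = np.tanh(X_val @ W1 + b1) @ W2 + b2
    return float(np.mean((val_pred - y_val) ** 2))

# Search strategy: random search over the space.
best = None
for _ in range(10):                                  # 10 random trials
    cand = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
    score = train_and_score(int(cand["units"]), float(cand["lr"]))
    if np.isfinite(score) and (best is None or score < best[1]):
        best = (cand, score)

print("best config:", best[0], "val MSE:", round(best[1], 4))
```

A grid search would replace the random sampling loop with an exhaustive sweep over every (units, lr) pair; the performance estimation step stays the same.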
Figure: Dimensions of NAS methods [8]
AutoKeras: An AutoML system based on Keras, developed by the DATA Lab at Texas A&M University. The goal of AutoKeras is to make machine learning accessible to everyone; it provides libraries that implement Neural Architecture Search. [5]
The Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system designed for both atmospheric research and operational forecasting applications.[3]
It features two dynamical cores, a data assimilation system, and a software architecture supporting parallel computation and system extensibility.
For researchers, WRF can produce simulations based on actual atmospheric conditions (i.e., from observations and analyses) or idealized conditions.
In our study, the input data comes from the WRF model. For this study, we focus on two major variables from the WRF model: temperature and pressure.
We will attempt to emulate the exact temperature and pressure data from the historical data.
The initial data is in the NetCDF file format.
A major part of data loading and preprocessing involves extracting the useful data from the NetCDF files, pickling it as a dictionary, and storing it on our server.
NetCDF (network Common Data Form) is a file format for storing multidimensional scientific data (variables) such as temperature, humidity, pressure, wind speed, and direction.
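A minimal sketch of the pickling step described above, assuming the NetCDF variables have already been read into NumPy arrays (e.g. with the netCDF4 package, not shown); the variable names, file name, and zero-filled arrays are illustrative.

```python
import pickle
import numpy as np

# Stand-ins for arrays extracted from a NetCDF file (shapes as reported below).
data = {
    "latitude": np.zeros((29, 29)),
    "longitude": np.zeros((29, 29)),
    "temperature": np.zeros((29, 29, 29)),
    "pressure": np.zeros((29, 29, 29)),
}

# Pickle the dictionary so later runs can skip NetCDF parsing entirely.
with open("may2019.pkl", "wb") as f:
    pickle.dump(data, f)

# Reload for training.
with open("may2019.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored["temperature"].shape)   # (29, 29, 29)
```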
For the initial experiment, we extracted one month (May 2019) of data from the overall year and a half of data.
Latitude shape: (29, 29)
Longitude shape: (29, 29)
Temperature shape: (29, 29, 29)
Pressure shape: (29, 29, 29)
So at a single point in time and at a single latitude/longitude grid point, we have 29 values of temperature, one per vertical level.
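To make the indexing concrete, here is a synthetic array with the temperature shape above; the dimension order (level, latitude index, longitude index) and the chosen grid point are assumptions for illustration.

```python
import numpy as np

# Synthetic temperature field with the shape reported above.
temperature = np.arange(29 * 29 * 29, dtype=float).reshape(29, 29, 29)

# Fix one horizontal grid point (one latitude/longitude index pair):
i, j = 10, 12
column = temperature[:, i, j]   # all 29 vertical levels at that point

print(column.shape)   # (29,)
```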
Planetary Boundary Layer (PBL): The lowest part of the atmosphere, whose behavior is directly influenced by its contact with the planetary surface.
Potential Temperature: Unlike regular temperature, potential temperature is not affected by the physical lifting or sinking associated with flow over obstacles or large-scale atmospheric turbulence.
Perturbation Potential Temperature: In WRF output, the variable T is the perturbation potential temperature, i.e. the difference between the total potential temperature and a constant reference value of 300 K. We will emulate the perturbation potential temperature.
According to the WRF users manual, the formula to convert perturbation potential temperature to total potential temperature in K is:
Total potential temperature (K) = T + 300, where T is the perturbation potential temperature. [4]
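The conversion above is a constant offset, so a small helper suffices; the function name is illustrative, and the formula is the one given above.

```python
import numpy as np

def total_potential_temperature(perturbation_theta):
    """Convert WRF perturbation potential temperature (variable T)
    to total potential temperature in kelvin: theta = T + 300."""
    return np.asarray(perturbation_theta) + 300.0

print(total_potential_temperature(12.5))   # 312.5
```

It accepts scalars or whole NumPy arrays, so an entire (29, 29, 29) field can be converted in one call.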
The WRF-Chem experiment used the YSU scheme and ran from 00:00 UTC 1 Jan 2018 to 31 May 2019 over the United States at 2.5 x 2.5 degree resolution with 29 vertical levels, on a 5N-70N, 160W-32W domain, with hourly output.
We used WRF-Chem output from multiple locations in the United States, trained using data from May 2019.
Locations included in this study are Lamont, OK; Florida; and the University of Maryland, Baltimore County.
The models trained are all univariate RNN models, so the input and output variable is the same.
Model Architecture:
Layers: 2 bidirectional LSTM layers, 1 dense layer
Number of Neurons: 29
Activation Function: Tanh
Epochs: 100
Max Trials: 50
Evaluation Metrics:
R squared: 0.96
RMSE: 0.38
Training Time: 15 Min
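The architecture above can be written out in plain Keras as a sketch; the input window length (24 steps) is an assumption, while the layer types, 29 units per layer, and tanh activation follow the configuration listed.

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    # 24-step input window is assumed; 29 features = one per vertical level.
    layers.Input(shape=(24, 29)),
    layers.Bidirectional(layers.LSTM(29, activation="tanh",
                                     return_sequences=True)),
    layers.Bidirectional(layers.LSTM(29, activation="tanh")),
    layers.Dense(29),   # one output per vertical level
])
model.compile(optimizer="adam", loss="mse")
print(model.output_shape)   # (None, 29)
```

In the experiments, AutoKeras discovered this layout via its search; the sketch simply reproduces the reported result.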
Model Architecture:
Layers: 2 bidirectional GRU layers, 1 flatten layer, 1 dense layer
Number of Neurons: 29
Activation Function: Tanh
Epochs: 1000
Max Trials: 100
Evaluation Metrics:
R squared: 0.96
RMSE: 18.5
Training Time: 30 Min
Model Architecture:
Layers: 2 bidirectional LSTM layers, 1 dense layer
Number of Neurons: 29
Activation Function: Tanh
Epochs: 100
Max Trials: 50
Evaluation Metrics:
R squared: 0.98
RMSE: 0.58
Training Time: 4 Min
Model Architecture:
Layers: 2 bidirectional GRU layers, 1 dense layer
Number of Neurons: 29
Activation Function: Tanh
Epochs: 1000
Max Trials: 100
Evaluation Metrics:
R squared: 0.98
RMSE: 10.55
Training Time: 50 Min
Model Architecture:
Layers: 2 bidirectional LSTM layers, 1 dense layer
Number of Neurons: 29
Activation Function: Tanh
Epochs: 100
Max Trials: 50
Evaluation Metrics:
R squared: 0.98
RMSE: 0.42
Training Time: 5 Min
Model Architecture:
Layers: 2 bidirectional GRU layers, 1 dense layer
Number of Neurons: 29
Activation Function: Tanh
Epochs: 100
Max Trials: 100
Evaluation Metrics:
R squared: 0.98
RMSE: 57.3
Training Time: 12 Min
References:
[1] Lilian Weng: Neural Architecture Search, Lil'Log, Aug 6, 2020. https://lilianweng.github.io/lil-log/2020/08/06/neural-architecture-search.html
[2] Afshine Amidi and Shervine Amidi: Recurrent Neural Networks cheatsheet, Stanford CS 230. https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks
[3] About the WRF-Chem model, National Center for Atmospheric Research. https://www2.acom.ucar.edu/wrf-chem
[4] Agnes Mika: Perturbation Potential Temperature, wrf-users mailing list. https://mailman.ucar.edu/pipermail/wrf-users/2010/001896.html; https://en.wikipedia.org/wiki/Potential_temperature#Potential_temperature_perturbations
[5] Haifeng Jin, Qingquan Song, and Xia Hu: "Auto-Keras: An Efficient Neural Architecture Search System." Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, 2019.
[6] George Seif: Everything You Need to Know About AutoML and Neural Architecture Search, Towards Data Science. https://towardsdatascience.com/everything-you-need-to-know-about-automl-and-neural-architecture-search-8db1863682bf
[7] Arjun Ghosh: The Fundamentals of Neural Architecture Search (NAS), Towards AI. https://towardsai.net/p/machine-learning/the-fundamentals-of-neural-architecture-search-nas
[8] Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter: Neural Architecture Search: A Survey. https://arxiv.org/pdf/1808.05377.pdf