Nathan Dong, PhD, CFA
Department of Finance, Carroll School of Management, Boston College
Academic Research Guide to Artificial Intelligence in Finance
Guide to Literature in Financial Economics
Machine Learning / Deep Learning
Barbaglia, L., Manzan, S., Tosetti, E., 2023. Forecasting Loan Default in Europe with Machine Learning. Journal of Financial Econometrics 21, 569–596.
Cao, S., Jiang, W., Wang, J.L., Yang, B., Forthcoming. From Man vs. Machine to Man + Machine: The Art and AI of Stock Analyses. Journal of Financial Economics.
Erel, I., Stern, L.H., Tan, C., Weisbach, M.S., 2021. Selecting Directors Using Machine Learning. Review of Financial Studies 34, 3226–3264.
Gu, S., Kelly, B., Xiu, D., 2020. Empirical Asset Pricing via Machine Learning. Review of Financial Studies 33, 2223–2273.
Maliar, L., Maliar, S., Winant, P., 2021. Deep learning for solving dynamic economic models. Journal of Monetary Economics 122, 76–101.
Sadhwani, A., Giesecke, K., Sirignano, J., 2021. Deep Learning for Mortgage Risk. Journal of Financial Econometrics 19, 313–368.
Natural Language Processing (NLP)
Gorodnichenko, Y., Pham, T., Talavera, O., 2023. The Voice of Monetary Policy. American Economic Review 113, 548–584.
Computer Vision
Khachiyan, A., Thomas, A., Zhou, H., Hanson, G., Cloninger, A., Rosing, T., Khandelwal, A.K., 2022. Using Neural Networks to Predict Microspatial Economic Growth. American Economic Review: Insights 4, 491–506.
Natural Language Generation
Guide to Software and Coding
1. Operating System
Linux Ubuntu 22.04
Microsoft Windows should be avoided at all costs for neural-network training.
2. Programming Language
Python 3.10
Install Python PIP, IDLE, and GIT on Ubuntu:
sudo apt update
sudo apt install python3-pip
sudo apt install idle3
sudo apt install git
Run Python IDLE on Ubuntu:
python3 -m idlelib
Check whether two RTX 2080 Ti cards are connected via NVLink:
nvidia-smi topo -m
Install Nvidia driver for Tesla K80 or M40 GPU:
ubuntu-drivers devices
sudo apt install nvidia-driver-470
Overclock Tesla K80 or M40 GPU to boost performance:
nvidia-smi -q -i 0 -d CLOCK
sudo nvidia-smi -pm ENABLED -i 0
sudo nvidia-smi -rac -i 0
nvidia-smi -q -i 0 -d SUPPORTED_CLOCKS
sudo nvidia-smi -ac 3004,875 -i 0
# For K80, repeat these steps for the 2nd GPU with "-i 1" because K80 has two GPU units in one card.
Install Nvidia driver for Tesla P100 GPU:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.2.1/local_installers/cuda-repo-ubuntu2204-12-2-local_12.2.1-535.86.10-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-2-local_12.2.1-535.86.10-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
3. Library and Coding
Pytorch 2.2
Install Pytorch for Nvidia Tesla K80 or M40 on Ubuntu:
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 torchtext==0.13.1 torchdata==0.4.1 --extra-index-url https://download.pytorch.org/whl/cu113
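After the install, it may be worth confirming that PyTorch can actually see the GPU. The helper below is a minimal sketch rather than part of the setup steps above; the name `cuda_report` is hypothetical, and the function only uses standard `torch.cuda` queries. It imports PyTorch lazily so that it also runs, and says so, on a machine where PyTorch is missing.

```python
def cuda_report():
    """Return a short, human-readable summary of the PyTorch/CUDA setup."""
    try:
        import torch  # imported lazily so the check also runs where PyTorch is absent
    except ImportError:
        return "PyTorch is not installed"

    lines = [f"PyTorch {torch.__version__}, built for CUDA {torch.version.cuda}"]
    if not torch.cuda.is_available():
        lines.append("No usable CUDA device found (check the Nvidia driver)")
        return "\n".join(lines)

    # List every visible GPU; a Tesla K80 should appear as two devices
    for i in range(torch.cuda.device_count()):
        lines.append(f"GPU {i}: {torch.cuda.get_device_name(i)}")
    return "\n".join(lines)
```

Calling `print(cuda_report())` in IDLE or a terminal session shows the result.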
How to choose optimizer: ADAM or SGD?
ADAM (adaptive moment estimation) can converge faster and is more robust to poorly chosen initial hyperparameters
But it may converge to a less optimal local minimum
ADAM is more suitable for training transformers in natural language processing (NLP)
SGD (stochastic gradient descent) can produce more accurate models, but may converge slowly and be less stable
SGD applies one learning rate to all parameters together, whereas ADAM adapts the step size for each parameter separately
SGD is more suitable for CNNs (convolutional neural networks) in image recognition
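The difference in how the two optimizers treat learning rates can be seen in a plain-Python sketch of a single update step. This is an illustration of the textbook update rules, not PyTorch's implementation; the function names, the two-parameter example, and the gradient values are all made up for the demonstration.

```python
import math


def sgd_step(params, grads, lr=0.01):
    """Plain SGD: every parameter moves by lr times its own gradient."""
    return [p - lr * g for p, g in zip(params, grads)]


def adam_step(params, grads, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; `state` holds the step count and running moments."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Exponential moving averages of the gradient and squared gradient
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # Bias correction compensates for the zero initialization
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        # Dividing by sqrt(v_hat) gives each parameter its own step size
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


params = [1.0, 1.0]
grads = [0.1, 10.0]  # two gradients that differ by a factor of 100

sgd_params = sgd_step(params, grads)
adam_params = adam_step(params, grads, {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]})

# How far each parameter moved under each optimizer
sgd_moves = [p - q for p, q in zip(params, sgd_params)]
adam_moves = [p - q for p, q in zip(params, adam_params)]
```

On this first step the SGD moves differ by the same factor of 100 as the gradients, while both ADAM moves are close to the learning rate itself, because each gradient is normalized by its own running magnitude.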
How to choose activation function: ReLU, Sigmoid, Softmax, or Tanh?
The choice depends on the type of application and the range of output values
ReLU: the output values are non-negative and unbounded above, e.g., using a neural network to predict values that can be greater than 1
Sigmoid or Tanh: the output values are in the range of (0, 1) or (-1, 1), respectively
Softmax in the last layer: classification when predicting a probability distribution over mutually exclusive class labels
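The output ranges listed above can be checked with a small plain-Python sketch. The functions are the standard textbook definitions written out by hand for illustration (in practice PyTorch's `nn.ReLU`, `nn.Sigmoid`, `nn.Tanh`, and `nn.Softmax` would be used), and the input values are arbitrary.

```python
import math


def relu(x):
    """Pass positive values through and clip negative values to zero."""
    return max(0.0, x)


def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Turn a list of scores into probabilities that sum to one."""
    # Subtract the maximum first for numerical stability
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


inputs = [-3.0, -0.5, 0.0, 0.5, 3.0]
relu_out = [relu(x) for x in inputs]        # non-negative, unbounded above
sigmoid_out = [sigmoid(x) for x in inputs]  # strictly between 0 and 1
tanh_out = [math.tanh(x) for x in inputs]   # strictly between -1 and 1
probs = softmax([2.0, 1.0, 0.1])            # non-negative and sums to 1
```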
How to use regularization technique: L1, L2, or Dropout?
Regularization helps prevent a NN model from becoming too complex or having large parameter values, and hence avoids over-fitting
L1 (Lasso) adds a penalty (the absolute value of the weights) to the model’s objective function, leading to a sparse model, where some weights are exactly equal to zero
l1 = sum(p.abs().sum() for p in model.parameters())
loss = loss + 0.1*l1
L2 (Ridge) adds a penalty (the square of the weights) to the model's objective function, causing a model to have small, non-zero weights
optimizer = Adam(model.parameters(), lr=0.01, weight_decay=0.1)
Dropout randomly zeroes the outputs of a fraction of neurons during training, forcing the remaining neurons to learn more robust features. For example, a dropout rate of 0.1 means that each neuron is dropped with probability 0.1, so about one tenth of neurons are dropped out in each forward pass
def __init__(self):
    self.dropout = nn.Dropout(p=0.1)
def forward(self, x):
    return self.sigmoid(self.layer(self.dropout(...)))
In my experience, it works best to start with Dropout, and then try L1, L2, or both.
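The three techniques above can be sketched in plain Python to show what each one computes. This is an illustration under simple assumptions, not PyTorch's implementation: the helper names, the weight values, and the penalty strength `lam` are hypothetical, and the dropout function follows the common "inverted dropout" convention of rescaling the surviving values during training.

```python
import random


def l1_penalty(weights):
    """Sum of absolute values (Lasso penalty)."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Sum of squares (Ridge penalty)."""
    return sum(w * w for w in weights)


def dropout(activations, p=0.1, training=True, rng=random):
    """Inverted dropout on a list of activations.

    During training each activation is zeroed with probability p and the
    survivors are scaled by 1 / (1 - p); at evaluation time nothing changes.
    """
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


weights = [0.5, -1.5, 0.0, 2.0]
lam = 0.1  # hypothetical penalty strength
l1_term = lam * l1_penalty(weights)  # added to the loss for L1
l2_term = lam * l2_penalty(weights)  # added to the loss for L2

rng = random.Random(0)  # fixed seed so the example is repeatable
hidden = [1.0] * 10000
dropped = dropout(hidden, p=0.1, rng=rng)
zero_share = dropped.count(0.0) / len(dropped)  # close to the dropout rate
mean_after = sum(dropped) / len(dropped)        # close to the original mean
```

The rescaling keeps the average activation roughly unchanged, which is why no adjustment is needed when dropout is switched off for evaluation.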
How to use mixed-precision to speed up training?
Using half-precision floating-point numbers (FP16) rather than full-precision floating-point numbers (FP32) can help reduce computing power and memory usage. Mixed precision allows for FP16-based training while still preserving much of the FP32-based network accuracy:
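from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler(enabled=True)
for epoch in range(epochs):
    # Clear gradients left over from the previous step
    optimizer.zero_grad()
    # Run the forward pass and the loss in mixed precision
    with autocast(enabled=True):
        predicted = model(input)
        loss = loss_fn(predicted, realized)
    # Scale the loss so that small FP16 gradients do not underflow
    scaler.scale(loss).backward()
    # Unscale the gradients, update the weights, then adjust the scale factor
    scaler.step(optimizer)
    scaler.update()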
However, the gain in efficiency does come at the cost of lower precision. In my experience, training runs about twice as fast while losing less than 5% accuracy, which is not bad at all.
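One reason a gradient scaler is used with FP16 is that very small gradients underflow to zero in half precision. The sketch below illustrates this with Python's standard `struct` module, whose `"e"` format stores IEEE 754 half-precision values. The gradient value and the scale factor are made-up numbers chosen for the demonstration, and the helper name `to_fp16` is hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


tiny_grad = 1e-8   # hypothetical gradient, below FP16's smallest positive value
scale = 65536.0    # 2**16, an illustrative loss-scale factor

# Stored directly in FP16, the gradient underflows to zero
unscaled = to_fp16(tiny_grad)

# Scaled before the FP16 round trip and unscaled in full precision, it survives
recovered = to_fp16(tiny_grad * scale) / scale
```

Here `unscaled` is exactly zero, while `recovered` stays within FP16 rounding error of the original value; this is the idea behind scaling the loss before `backward()` and unscaling before the optimizer step.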
How to choose learning-rate scheduler?
The learning rate sets the size of each update to the model weights during backpropagation training, i.e., how big a step the optimizer takes toward a minimum of the loss function. A learning-rate scheduler adjusts the learning rate between epochs as training progresses
The exponential decay performs the best:
scheduler = ExponentialLR(optimizer, gamma=0.99)
Time-based decay is not too bad:
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
The linear and step-based decay often lead to model overfitting:
scheduler = LinearLR(optimizer, start_factor=0.5, total_iters=100)
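To compare the three schedules, it can help to compute the learning rate each one produces at a given epoch. The functions below are a plain-Python sketch of the closed-form schedules described in the PyTorch documentation, assuming `scheduler.step()` is called once per epoch; the function names and the base learning rate of 0.01 are assumptions for illustration.

```python
def exponential_lr(base_lr, epoch, gamma=0.99):
    """Closed form of exponential decay: multiply by gamma every epoch."""
    return base_lr * gamma ** epoch


def step_lr(base_lr, epoch, step_size=30, gamma=0.1):
    """Closed form of step decay: multiply by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def linear_lr(base_lr, epoch, start_factor=0.5, end_factor=1.0, total_iters=100):
    """Closed form of a linear ramp from start_factor to end_factor."""
    progress = min(epoch, total_iters) / total_iters
    return base_lr * (start_factor + (end_factor - start_factor) * progress)


base_lr = 0.01  # hypothetical initial learning rate
for epoch in (0, 30, 60, 100):
    print(epoch,
          round(exponential_lr(base_lr, epoch), 6),
          round(step_lr(base_lr, epoch), 6),
          round(linear_lr(base_lr, epoch), 6))
```

With `end_factor` left at its default of 1.0, the `LinearLR` settings shown above raise the rate from half of its base value to the full value over 100 epochs; a falling linear schedule would need `start_factor` above `end_factor`.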
Sklearn 1.4
Install sklearn on Ubuntu:
pip install -U scikit-learn
Guide to Hardware Setup for Neural-Network Training
My Favorite Build
Gigabyte Z590 Aorus Ultra + Intel Core i5-11400T + Samsung DDR4 2666MHz 64GB
Western Digital SN580 NVMe 1TB + Zotac Gaming GeForce RTX 3080 Trinity
Ubuntu 22.04 + Python 3.10 + Pytorch 2.2 + Scikit-learn 1.4
The Worst Hardware that You Should Avoid
Dell RTX 2080 Ti OEM
Dell Alienware RTX 3080
Ubuntu Log Overflow Causing HD/SSD Out-of-Space
Disable PCIe power management (ASPM) for all GPU devices
sudo gedit /etc/default/grub
change to: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pcie_aspm=off"
or try: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=noaer"
also try: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=nomsi"
sudo update-grub
Clear log files to save space
sudo su
echo "" > /var/log/kern.log
echo "" > /var/log/syslog
service syslog restart
journalctl --vacuum-size=50M
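Before clearing anything, it may be useful to see which log files are actually taking up the space. The helper below is a minimal sketch using only the Python standard library; the name `largest_files` is hypothetical, and reading everything under `/var/log` may require root.

```python
import os


def largest_files(root, top_n=5):
    """Return the top_n largest regular files under root as (size_bytes, path)."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Skip symlinks so the same file is not counted twice
                if not os.path.islink(path):
                    sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # unreadable or vanished file; ignore it
    return sorted(sizes, reverse=True)[:top_n]


if __name__ == "__main__":
    for size, path in largest_files("/var/log"):
        print(f"{size / 1e6:10.1f} MB  {path}")
```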
Nvidia GPU Overheating Problem
Dried-out thermal paste
Inefficient thermal pad
Poor ventilation
Bad GPU x16 extension cable
GPU is connected to PCIe at a lower generation, such as gen3 -> gen2
nvidia-smi --query-gpu=pcie.link.gen.max,pcie.link.gen.current --format=csv
GPU is connected to PCIe at a lower width/lane, such as x16 -> x4
nvidia-smi --query-gpu=pcie.link.width.max,pcie.link.width.current --format=csv
Lower the GPU power limit to 100 W to prevent overheating damage
nvidia-smi -q -d power
sudo nvidia-smi -i 0 -pm enabled
sudo nvidia-smi -i 0 -pl 100
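The two PCIe link checks above can be combined and automated. The sketch below assumes the output comes from a single query, `nvidia-smi --query-gpu=pcie.link.gen.max,pcie.link.gen.current,pcie.link.width.max,pcie.link.width.current --format=csv,noheader`, with one line of four integers per GPU. The function name `find_downgraded_links` is hypothetical and the sample text is made up rather than captured from a real machine.

```python
def find_downgraded_links(csv_text):
    """Flag GPUs whose current PCIe generation or width is below the maximum.

    Expects one line per GPU with four comma-separated integers:
    gen.max, gen.current, width.max, width.current.
    """
    problems = []
    for index, line in enumerate(csv_text.strip().splitlines()):
        gen_max, gen_cur, width_max, width_cur = (int(f.strip()) for f in line.split(","))
        if gen_cur < gen_max:
            problems.append(f"GPU {index}: PCIe gen{gen_max} -> gen{gen_cur}")
        if width_cur < width_max:
            problems.append(f"GPU {index}: PCIe x{width_max} -> x{width_cur}")
    return problems


# Made-up sample output for two GPUs; the second runs at gen2 x4
sample = "3, 3, 16, 16\n3, 2, 16, 4\n"
issues = find_downgraded_links(sample)
```

The current generation can also drop while a GPU is idle to save power, so it is probably best to run the query while a training job is active before blaming the cable or slot.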