Nathan Dong, PhD, CFA
Seidner Department of Finance, Carroll School of Management, Boston College
Academic Research Guide to Artificial Intelligence in Finance
Guide to Literature in Financial Economics
Machine Learning
Bali, T.G., Beckmeyer, H., Moerke, M., Weigert, F., 2023. Option Return Predictability with Machine Learning and Big Data. Review of Financial Studies 36(9), 3548–3602.
Bianchi, D., Büchner, M., Tamoni, A., 2021. Bond Risk Premiums with Machine Learning. Review of Financial Studies 34(2), 1046-1089.
Barbaglia, L., Manzan, S., Tosetti, E., 2023. Forecasting Loan Default in Europe with Machine Learning. Journal of Financial Econometrics 21, 569–596.
Bogousslavsky, V., Fos, V., Muravyev, D., 2024. Informed Trading Intensity. Journal of Finance 79(2), 903-948.
Chen, A.Y., McCoy, J., 2024. Missing Values Handling for Machine Learning Portfolios. Journal of Financial Economics 155, 103815.
DeMiguel, V., Gil-Bazo, J., Nogales, F.J., Santos, A.A., 2023. Machine Learning and Fund Characteristics Help to Select Mutual Funds with Positive Alpha. Journal of Financial Economics 150(3), 103737.
De Silva, T., Thesmar, D., 2024. Noise in Expectations: Evidence from Analyst Forecasts. Review of Financial Studies 37(5), 1494-1537.
Duarte, V., Duarte, D., Silva, D.H., 2024. Machine Learning for Continuous-Time Finance. Review of Financial Studies 37(11), 3217–3271.
Erel, I., Stern, L.H., Tan, C., Weisbach, M.S., 2021. Selecting Directors Using Machine Learning. Review of Financial Studies 34, 3226–3264.
Easley, D., López de Prado, M., O’Hara, M., Zhang, Z., 2021. Microstructure in the Machine Age. Review of Financial Studies 34(7), 3316-3363.
Feng, G., Giglio, S., Xiu, D., 2020. Taming the Factor Zoo: A Test of New Factors. Journal of Finance 75(3), 1327-1370.
Fuster, A., Goldsmith‐Pinkham, P., Ramadorai, T., Walther, A., 2022. Predictably Unequal? The Effects of Machine Learning on Credit Markets. Journal of Finance 77(1), 5-47.
Gu, S., Kelly, B., Xiu, D., 2020. Empirical Asset Pricing via Machine Learning. Review of Financial Studies 33(5), 2223-2273.
Iskhakov, F., Rust, J., Schjerning, B., 2020. Machine Learning and Structural Econometrics: Contrasts and Synergies. The Econometrics Journal 23, S81–S124.
Kaniel, R., Lin, Z., Pelger, M., Van Nieuwerburgh, S., 2023. Machine-Learning the Skill of Mutual Fund Managers. Journal of Financial Economics 150(1), 94-138.
Leippold, M., Wang, Q., Zhou, W., 2022. Machine Learning in the Chinese Stock Market. Journal of Financial Economics 145(2), 64–82.
Li, K., Mai, F., Shen, R., Yan, X., 2021. Measuring Corporate Culture Using Machine Learning. Review of Financial Studies 34(7), 3265-3315.
Martin, I.W., Nagel, S., 2022. Market Efficiency in the Age of Big Data. Journal of Financial Economics 145(1), 154-177.
Mullainathan, S., Spiess, J., 2017. Machine Learning: An Applied Econometric Approach. Journal of Economic Perspectives 31(2), 87-106.
Murray, S., Xia, Y., Xiao, H., 2024. Charting by Machines. Journal of Financial Economics 153, 103791.
Sautner, Z., Van Lent, L., Vilkov, G., Zhang, R., 2023. Firm-Level Climate Change Exposure. Journal of Finance 78(3), 1449-1498.
Van Binsbergen, J.H., Han, X., Lopez-Lira, A., 2023. Man versus Machine Learning: The Term Structure of Earnings Expectations and Conditional Biases. Review of Financial Studies 36(6), 2361–2396.
AI Deep Learning
Cao, S., Jiang, W., Wang, J.L., Yang, B., 2024. From Man vs. Machine to Man + Machine: The Art and AI of Stock Analyses. Journal of Financial Economics 160, 103910.
Maliar, L., Maliar, S., Winant, P., 2021. Deep Learning for Solving Dynamic Economic Models. Journal of Monetary Economics 122, 76–101.
Sadhwani, A., Giesecke, K., Sirignano, J., 2021. Deep Learning for Mortgage Risk. Journal of Financial Econometrics 19, 313–368.
Dong, G.N., 2024. Can AI Replace Stock Analysts? Evidence from Deep Learning Financial Statements. Boston College Working Paper.
Natural Language Processing (NLP)
Adams, R.B., Ragunathan, V., Tumarkin, R., 2021. Death by Committee? An Analysis of Corporate Board (Sub-) Committees. Journal of Financial Economics 141(3), 1119-1146.
Aleti, S., Bollerslev, T., 2024. News and Asset Pricing: A High-Frequency Anatomy of the SDF. Review of Financial Studies.
Bybee, L., Kelly, B., Manela, A., Xiu, D., 2024. Business News and Business Cycles. Journal of Finance 79(5), 3105-3147.
Cookson, J.A., Lu, R., Mullins, W., Niessner, M., 2024. The Social Signal. Journal of Financial Economics 158, 103870.
Garcia, D., Hu, X., Rohrer, M., 2023. The Colour of Finance Words. Journal of Financial Economics 147(3), 525-549.
Gorodnichenko, Y., Pham, T., Talavera, O., 2023. The Voice of Monetary Policy. American Economic Review 113, 548–584.
Graham, J.R., Grennan, J., Harvey, C.R., Rajgopal, S., 2022. Corporate Culture: Evidence from the Field. Journal of Financial Economics, 146(2), 552-593.
Hassan, T.A., Hollander, S., van Lent, L., Schwedeler, M., Tahoun, A., 2023. Firm-Level Exposure to Epidemic Diseases: COVID-19, SARS, and H1N1. Review of Financial Studies 36(12), 4919–4964.
Computer Vision
Khachiyan, A., Thomas, A., Zhou, H., Hanson, G., Cloninger, A., Rosing, T., Khandelwal, A.K., 2022. Using Neural Networks to Predict Microspatial Economic Growth. American Economic Review: Insights 4, 491–506.
Natural Language Generation
TBA
Guide to Software Installation for Neural-Network Training
Operating System: Linux Ubuntu 22.04
The Microsoft Windows operating system should be avoided for neural-network training at all costs.
Install Python PIP, IDLE, and GIT on Ubuntu
sudo apt update
sudo apt install python3-pip
sudo apt install idle3
sudo apt install git
Run Python IDLE on Ubuntu
python3 -m idlelib
Check NVLink if two RTX 2080 Ti cards are connected
nvidia-smi topo -m
Install Nvidia driver for Tesla K80 or M40 GPU
ubuntu-drivers devices
sudo apt install nvidia-driver-470
Install Nvidia driver for Tesla P100 GPU
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.2.1/local_installers/cuda-repo-ubuntu2204-12-2-local_12.2.1-535.86.10-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-2-local_12.2.1-535.86.10-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
Install Pytorch for Nvidia GeForce RTX 2080 Ti or RTX 3080 on Ubuntu:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Install Pytorch for Nvidia Tesla K80 or M40 on Ubuntu:
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 torchtext==0.13.1 torchdata==0.4.1 --index-url https://download.pytorch.org/whl/cu113
Install TensorRT in Pytorch with CUDA 11.8:
Install CUDA Toolkit 11.8: https://developer.nvidia.com/cuda-11-8-0-download-archive
pip install torch torch-tensorrt tensorrt --extra-index-url https://download.pytorch.org/whl/cu118
Guide to Pytorch Coding in Python
How to choose an optimizer: Adam or SGD?
Adam (adaptive moment estimation) can achieve faster convergence and is more robust to poor hyperparameter initialization
However, it may converge to a less optimal local minimum
Adam is more suitable for training transformers in natural language processing (NLP)
SGD (stochastic gradient descent) can produce more accurate models, but may converge more slowly and less stably
SGD uses a single learning rate for all parameters, whereas Adam adapts the learning rate of each parameter separately
SGD is more suitable for CNNs (convolutional neural networks) in image recognition
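A minimal sketch of how either optimizer is set up in Pytorch (the model name and learning-rate values below are illustrative, not recommendations):
import torch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # adaptive, per-parameter learning rates
# optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # one global learning rate for all parameters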
How to choose an activation function: ReLU, Sigmoid, Softmax, or Tanh?
The choice depends on the type of application and the range of output values
ReLU: when the neural network needs to predict unbounded values above zero, e.g., values greater than 1
Sigmoid or Tanh: when the output values should lie in the range [0, 1] or [-1, 1], respectively
Softmax in the last layer: classification when predicting a probability distribution over mutually exclusive class labels
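A minimal sketch of where these activations typically sit in a Pytorch model (the layer sizes and the 10-class output are illustrative assumptions):
import torch.nn as nn
model = nn.Sequential(
    nn.Linear(32, 64),
    nn.ReLU(),          # hidden layer: ReLU
    nn.Linear(64, 10),  # 10 mutually exclusive classes
    nn.Softmax(dim=1),  # last layer: probability distribution over the classes
)
# For bounded regression outputs, end with nn.Linear(64, 1) followed by nn.Sigmoid() for [0, 1] or nn.Tanh() for [-1, 1]
Note that nn.CrossEntropyLoss expects raw logits, so the Softmax layer is usually dropped during training and applied only at inference.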
How to choose a regularization technique: L1, L2, or Dropout?
Regularization prevents an NN model from becoming too complex or developing large parameter values, and hence helps avoid overfitting
L1 (Lasso) adds a penalty (the absolute value of the weights) to the model’s objective function, leading to a sparse model, where some weights are exactly equal to zero
l1 = sum(p.abs().sum() for p in model.parameters())  # L1 norm of all model weights
loss = loss + 0.1*l1  # 0.1 is the strength of the L1 penalty (lambda)
L2 (Ridge) adds a penalty (the square of the weights) to the model's objective function, causing a model to have small, non-zero weights
from torch.optim import Adam
optimizer = Adam(model.parameters(), lr=0.01, weight_decay=0.1)  # weight_decay applies the L2 penalty
Dropout randomly zeroes the outputs of a fraction of neurons during training, forcing the remaining neurons to learn more robust features. For example, a dropout rate of 0.1 means that one tenth of the neurons are dropped in each forward pass
class Net(nn.Module):  # minimal example; the layer size is illustrative
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(16, 1)
        self.dropout = nn.Dropout(p=0.1)
        self.sigmoid = nn.Sigmoid()
    def forward(self, x):
        return self.sigmoid(self.layer(self.dropout(x)))
My experience is to start with Dropout, and then try L1, L2, or both.
How to use mixed-precision to speed up training?
Using half-precision floating-point numbers (FP16) rather than full-precision floating-point numbers (FP32) can help reduce computing power and memory usage. Mixed precision allows for FP16-based training while still preserving much of the FP32-based network accuracy:
from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler(enabled=True)
for epoch in range(epochs):
    optimizer.zero_grad()
    with autocast(enabled=True):  # run the forward pass in FP16 where safe
        predicted = model(input)
        loss = loss_fn(predicted, realized)
    scaler.scale(loss).backward()  # scale the loss to avoid FP16 gradient underflow
    scaler.step(optimizer)  # unscale the gradients and update the weights
    scaler.update()  # adjust the scale factor for the next iteration
However, the efficiency gain does come at the cost of lower precision. In my experience, training speed roughly doubles while accuracy drops by less than 5%, which is not bad at all.
How to choose a learning-rate scheduler?
The learning rate is the magnitude of the change/update to model weights during backpropagation. It controls how large a step the optimizer takes toward the minimum of the loss function. A learning-rate scheduler adjusts the learning rate between epochs as training progresses
The exponential decay performs the best:
scheduler = ExponentialLR(optimizer, gamma=0.99)
Step-based decay is not too bad:
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
Linear decay often leads to model overfitting:
scheduler = LinearLR(optimizer, start_factor=0.5, total_iters=100)
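A minimal sketch of how a scheduler plugs into a training loop (the scheduler classes live in torch.optim.lr_scheduler; the model, optimizer, loss_fn, input, and realized names are reused from the mixed-precision example above):
from torch.optim.lr_scheduler import ExponentialLR

scheduler = ExponentialLR(optimizer, gamma=0.99)
for epoch in range(epochs):
    optimizer.zero_grad()
    loss = loss_fn(model(input), realized)
    loss.backward()
    optimizer.step()
    scheduler.step()  # decay the learning rate once per epoch, after the optimizer step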
Install sklearn on Ubuntu:
pip install -U scikit-learn
Guide to Hardware Setup for Training NN Model
DIY Setup vs. cloud GPU
You would choose to build your own deep-learning computer over renting cloud GPUs if you prioritize cost-effectiveness, control, privacy, and performance needs that cloud solutions may not fully meet. A DIY setup gives you complete control over the hardware, software, and entire system, so you can tailor the configuration to your workload; this is especially valuable for specialized tasks or specific hardware-software combinations. It can also deliver higher performance for large datasets or intensive workloads: cloud GPU instances may hit bottlenecks from network latency and I/O limits, whereas you can choose components that minimize those bottlenecks. While cloud GPUs offer scalability, a DIY system can be more flexible to scale, since you can add GPUs or upgrade components incrementally without the constraints of cloud instance types or resource availability. You also keep full control over the security of your data and computing environment, which is crucial when handling sensitive information. Finally, if you are passionate about GPU programming and hardware, building your own system is a great opportunity to experiment with different configurations and fine-tune them for your projects.
My favorite build
Gigabyte Z590 Aorus Ultra + Intel Core i5-11400T + Samsung DDR4 2666MHz 64GB
Western Digital SN580 NVMe 1TB + Zotac Gaming GeForce RTX 3080 Trinity
Ubuntu 22.04 + Python 3.10 + Pytorch 2.2 + Scikit-learn 1.4
The worst hardware that you should avoid
Dell RTX 2080 Ti OEM
Dell Alienware RTX 3080
Overclock Tesla K80 or M40 GPU to boost performance
nvidia-smi -q -i 0 -d CLOCK
sudo nvidia-smi -pm ENABLED -i 0
sudo nvidia-smi -rac -i 0
nvidia-smi -q -i 0 -d SUPPORTED_CLOCKS
sudo nvidia-smi -ac 3004,875 -i 0
# For K80, repeat these steps for the 2nd GPU with "-i 1" because K80 has two GPU units in one card.
Ubuntu log overflow causing HD/SSD out-of-space
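To see how much disk space the logs are currently taking before applying the fixes below (an optional check; du and journalctl are standard Ubuntu tools):
du -sh /var/log/* | sort -h | tail -n 5
journalctl --disk-usage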
Disable PCIe power management (ASPM) for all GPU devices
sudo gedit /etc/default/grub
change to: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pcie_aspm=off"
or try: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=noaer"
also try: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=nomsi"
sudo update-grub
Clear log files to save space
sudo su
echo "" > /var/log/kern.log
echo "" > /var/log/syslog
service syslog restart
journalctl --vacuum-size=50M
Nvidia GPU overheating problem
Dried-out thermal paste
Inefficient thermal pad
Poor ventilation
Bad GPU x16 extension cable
GPU is connected to PCIe at a lower generation, such as gen3 -> gen2
nvidia-smi --query-gpu=pcie.link.gen.max,pcie.link.gen.current --format=csv
GPU is connected to PCIe at a lower width/lane, such as x16 -> x4
nvidia-smi --query-gpu=pcie.link.width.max,pcie.link.width.current --format=csv
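Before lowering the power limit, it can help to watch the temperature and power draw directly (standard nvidia-smi query fields):
nvidia-smi --query-gpu=temperature.gpu,fan.speed,power.draw --format=csv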
Lower the GPU power limit to 100 W to prevent overheating damage
nvidia-smi -q -d POWER
sudo nvidia-smi -i 0 -pm enabled
sudo nvidia-smi -i 0 -pl 100