This seminar gives students a practical understanding of how deep learning works and how to implement neural networks. We introduce students to the core concepts of deep neural networks and survey the techniques used to model complex processes in computer vision and natural language processing.
Organizers: Wenjian Liu, Vincent Martinez, Fei Ye
Time: Thursday 5:30 p.m. - 6:30 p.m.
Location (in-person and hybrid): GC: 4214-03
Zoom: link (Meeting ID: 5977829609)
This workshop presents deep learning through two core mathematical lenses: gradient-based optimization and probabilistic modeling. We connect training dynamics and robustness (including adversarial perturbations) with modern generative methods such as VAEs and diffusion models. Along the way, we highlight how inductive biases (convolutions, graph message passing, and attention) shape representation learning in vision and language. Sessions combine concise derivations with light exercises to build paper-reading and experiment-design intuition.
Schedule (Spring 2026)
April 9
Yanqiu Guo (Brown University)
Title: Cross-Entropy, Softmax, and Automatic Differentiation
Abstract: This lecture develops the foundations of classification in deep learning by focusing on differentiable loss functions and the tools used to optimize them. It explains why accuracy is not suitable as a training objective, since it is a hard function with gradients that are zero in most places, and introduces cross-entropy as a more effective alternative for classification tasks. The lecture then presents probability-based outputs, including one-hot labels, binary cross-entropy, and softmax, highlighting how softmax converts network outputs into probabilities and why it is preferred over simple normalization in many settings. Building on these ideas, the lecture examines the derivatives needed for classification networks and shows how backpropagation extends to this setting. It concludes with an introduction to automatic differentiation, computation graphs, and modern deep learning frameworks.
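To make the ideas in this abstract concrete, here is a minimal NumPy sketch of softmax and cross-entropy, including the well-known identity that the gradient of the cross-entropy loss with respect to the logits is simply the predicted probabilities minus the one-hot label. The logits and label below are illustrative, not taken from the lecture.

```python
import numpy as np

def softmax(z):
    # Subtract the max logit for numerical stability before exponentiating.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(p, y_onehot):
    # Negative log-likelihood of the true class under the predicted distribution.
    return -np.sum(y_onehot * np.log(p + 1e-12))

# Hypothetical logits from a network for a 3-class problem.
logits = np.array([2.0, 1.0, 0.1])
y = np.array([1.0, 0.0, 0.0])  # one-hot label: true class is class 0

p = softmax(logits)
loss = cross_entropy(p, y)

# Key identity used in backpropagation: dL/dlogits = p - y.
grad = p - y
```

Note how the gradient pushes the true-class logit up (negative gradient component) and all other logits down, which is exactly why softmax plus cross-entropy trains well where raw accuracy, with its zero gradients, cannot.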
April 2 No meeting (Spring Break)
March 26
Yanqiu Guo (Brown University)
Title: Backpropagation and Stochastic Gradient Descent
Abstract: This lecture introduces the two core tools that make neural network training possible: backpropagation and stochastic gradient descent (SGD). It begins by revisiting gradient descent, explaining the role of the learning rate and why optimization in deep learning is difficult, especially for non-convex objectives with local minima and saddle points. The lecture then develops the mathematical machinery needed to compute gradients in multilayer networks, including weight matrices, Jacobians, and the chain rule, and shows how backpropagation efficiently computes derivatives of the loss with respect to network parameters. Building on these gradients, the lecture presents SGD as a practical alternative to full gradient descent, using minibatches to train faster and often generalize better. It also discusses how batch size affects training dynamics, contrasting small, noisy updates with larger, more stable ones, and concludes with an introduction to numeric, symbolic, and automatic differentiation as computational tools for working with derivatives.
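As a companion to this abstract, the sketch below trains a one-hidden-layer network with hand-written backpropagation and minibatch SGD on a toy regression task. The architecture, learning rate, and batch size are arbitrary choices for illustration, not the lecture's; the point is to show the chain rule applied layer by layer and the minibatch update loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: learn y = sin(x) on [-2, 2] with a one-hidden-layer tanh network.
X = rng.uniform(-2, 2, size=(256, 1))
Y = np.sin(X)

# Parameters (sizes chosen for illustration).
W1 = rng.normal(0, 0.5, size=(1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, size=(16, 1)); b2 = np.zeros(1)

lr, batch = 0.05, 32
for step in range(2000):
    idx = rng.choice(len(X), batch, replace=False)  # sample a minibatch
    x, y = X[idx], Y[idx]

    # Forward pass.
    h = np.tanh(x @ W1 + b1)
    yhat = h @ W2 + b2

    # Backward pass: chain rule applied layer by layer.
    d_yhat = 2 * (yhat - y) / batch        # d(MSE)/d(yhat)
    dW2 = h.T @ d_yhat; db2 = d_yhat.sum(0)
    dh = d_yhat @ W2.T
    dz = dh * (1 - h**2)                   # tanh'(z) = 1 - tanh(z)^2
    dW1 = x.T @ dz; db1 = dz.sum(0)

    # SGD update: step against the gradient.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

mse = np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - Y) ** 2)
```

Each update uses only 32 of the 256 examples, so individual steps are noisy, but the average direction still decreases the loss; increasing `batch` makes the steps smoother at higher per-step cost, which is the trade-off the lecture discusses.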
March 19
Yanqiu Guo (Brown University)
Title: Multi-Layer Perceptrons and Optimization
Abstract: This lecture introduces multi-layer perceptrons (MLPs) as a natural extension of perceptrons for learning more complex functions and handling multi-class classification. It explains why non-linear activation functions are essential: without them, a multi-layer network collapses to a linear model. The lecture surveys common activations, including ReLU, Leaky ReLU, tanh, sigmoid, and softmax, and discusses their roles in hidden layers and output layers. It also presents the idea of neural networks as universal function approximators, while clarifying that this result does not explain how to find good parameters or why models generalize well. The second half of the lecture focuses on optimization, contrasting closed-form solutions for convex problems like linear regression with the non-convex setting of neural networks. Students are introduced to gradients, gradient descent, learning rates, and the practical challenges of local minima and non-differentiable objectives, motivating the need for suitable loss functions and backpropagation.
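The claim that a multi-layer network without non-linearities collapses to a linear model can be checked directly. The following sketch (with arbitrary random matrices) verifies the collapse numerically and shows a few of the activations the abstract surveys.

```python
import numpy as np

def relu(z): return np.maximum(0.0, z)
def leaky_relu(z, a=0.01): return np.where(z > 0, z, a * z)
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 4))
W1 = rng.normal(size=(4, 8)); W2 = rng.normal(size=(8, 3))

# Without a non-linearity, two linear layers are exactly one linear map:
two_linear = (x @ W1) @ W2
one_linear = x @ (W1 @ W2)
assert np.allclose(two_linear, one_linear)

# Inserting a non-linearity breaks the collapse, enabling richer functions:
with_relu = relu(x @ W1) @ W2
```

Matrix multiplication is associative, so stacking any number of purely linear layers never increases expressive power; it is the activation between layers that makes depth matter.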
March 12
Yanqiu Guo (Brown University)
Title: Perceptrons, MNIST, and Multi-Layer Networks
Abstract: This lecture introduces perceptrons through the practical task of handwritten digit recognition on MNIST. Starting from how images are represented as pixel values, you will learn how digit classification is framed as a supervised learning problem with training, validation, and test stages. The lecture also explains how perceptrons work as linear classifiers, how their weights relate to input features, and how the perceptron algorithm updates parameters by correcting misclassified examples over multiple epochs. Building on the binary case, it extends perceptrons to multi-class classification by using one output for each class, as in MNIST digit recognition. It also examines the strengths and limits of perceptrons: they are simple, fast, and surprisingly effective, but they cannot solve problems that are not linearly separable. This naturally motivates the transition to multi-layer perceptrons, which overcome these limitations and form the foundation of neural networks.
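The perceptron update rule described above fits in a few lines. This sketch uses synthetic 2-D data (not MNIST) that is linearly separable with a margin, so the classic convergence guarantee applies: the algorithm stops after an epoch with no misclassifications.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: label +1 if x1 + x2 > 0, else -1, keeping only points
# a safe distance from the boundary so the data has a margin.
X = rng.normal(size=(300, 2))
X = X[np.abs(X.sum(axis=1)) > 0.5]
y = np.where(X.sum(axis=1) > 0, 1, -1)

w = np.zeros(2); b = 0.0
for epoch in range(200):
    mistakes = 0
    for xi, yi in zip(X, y):
        # Misclassified if the score's sign disagrees with the label.
        if yi * (xi @ w + b) <= 0:
            w += yi * xi   # perceptron update: move weights toward the example
            b += yi
            mistakes += 1
    if mistakes == 0:      # converged: every point correctly classified
        break

acc = np.mean(np.sign(X @ w + b) == y)
```

For a multi-class version as in MNIST, one would keep a weight vector per class and predict the class with the highest score, updating the winning and true classes on mistakes.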
March 5
Yanqiu Guo (Brown University)
Title: Learning from Data: Linear Models and the Perceptron
Abstract: This lecture connects classical supervised learning to one of the earliest neural network models: the perceptron. It begins by reviewing the standard workflow of training, validation, and testing, and clarifies the difference between regression and classification. Using linear regression as a starting point, we express predictions as a dot product between input features and a weight vector, then formulate learning as minimizing mean squared error. We introduce gradient-based reasoning and matrix notation, discuss the closed-form solution, and explain practical limitations such as computational cost and issues with matrix invertibility. Building on this foundation, we present the perceptron as a linear classifier inspired by biological neurons: it computes a weighted sum plus bias and applies a threshold decision rule.
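The closed-form least-squares solution mentioned in this abstract can be demonstrated on synthetic data. The true coefficients below are invented for illustration; `np.linalg.lstsq` is used rather than explicitly forming and inverting the normal equations, which is exactly the numerical-stability concern the lecture raises.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3*x1 - 2*x2 + 0.5 + small noise (coefficients illustrative).
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 0.5 + 0.01 * rng.normal(size=200)

# Append a column of ones so the bias is absorbed into the weight vector.
Xb = np.hstack([X, np.ones((200, 1))])

# Least-squares solution of min_w ||Xb w - y||^2. Mathematically this equals
# w = (X^T X)^{-1} X^T y, but lstsq avoids explicitly inverting X^T X,
# which fails when the matrix is singular or ill-conditioned.
w = np.linalg.lstsq(Xb, y, rcond=None)[0]
```

The recovered weights closely match the generating coefficients, and the same dot-product prediction `Xb @ w` is the template the perceptron reuses with a threshold instead of a real-valued output.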
February 26
Yanqiu Guo (Brown University)
Title: Deep Learning: Foundations, Implementation, and Responsible Use
Abstract: This seminar introduces deep learning in relation to artificial intelligence and machine learning, emphasizing neural networks as learned functions that map inputs to outputs. We cover supervised learning fundamentals, including how to represent inputs and labels, learn a function f, and evaluate whether a model is “good.” Students are introduced to model hypotheses, loss functions (including mean squared error), and standard evaluation splits (training/validation/test), along with key ideas about model complexity and overfitting. We also address practical questions about deep learning architectures and implementation.
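The workflow sketched in this abstract, fitting models of varying complexity on a training split and selecting by validation error, can be shown with a small synthetic example. The quadratic ground truth, split sizes, and candidate polynomial degrees below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical supervised dataset: scalar inputs x, noisy labels y = x^2 + noise.
x = rng.uniform(-1, 1, 60)
y = x**2 + 0.1 * rng.normal(size=60)

# Standard evaluation splits: train / validation / test.
idx = rng.permutation(60)
train, val, test = idx[:40], idx[40:50], idx[50:]

def mse(pred, target):
    # Mean squared error, the loss discussed in the seminar.
    return np.mean((pred - target) ** 2)

# Fit polynomials of increasing complexity on the training split only,
# then pick the degree with the lowest validation MSE (model selection).
best_deg, best_val = None, np.inf
for deg in (1, 2, 8):
    coeffs = np.polyfit(x[train], y[train], deg)
    v = mse(np.polyval(coeffs, x[val]), y[val])
    if v < best_val:
        best_deg, best_val = deg, v
```

The degree-1 model underfits the quadratic data, while high degrees risk overfitting the noise; the validation split arbitrates between them, and the held-out test split would give the final unbiased estimate.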
February 19
Ning Ning (Texas A&M University)
Title: Temporal Interference Stimulation: Mechanisms and Translation
Abstract: Temporal interference stimulation (TIS) is a non-invasive neuromodulation approach that aims to reach deep brain targets with improved focality. By applying multiple high-frequency (kHz) electric fields with a small frequency offset (Δf), TIS generates a low-frequency amplitude-modulated envelope at the interference focus, potentially modulating neural activity while limiting stimulation of superficial tissue. This talk synthesizes advances in (i) computational optimization for individualized targeting and improved focality, (ii) biophysical and dynamical modeling of candidate neuronal mechanisms such as nonlinear ion-channel rectification and subthreshold modulation, and (iii) experimental validation in preclinical and human studies assessing functional outcomes and safety. It also positions TIS relative to conventional transcranial electrical stimulation and invasive deep brain stimulation, highlighting both its promise for depth access without surgery and current challenges in intensity requirements, modeling accuracy, reproducibility, and translation to clinical protocols.
February 5
Wenjian Liu (Queensborough Community College, CUNY)
Title: Convolutional Architectures in Deep Learning
Abstract: This session introduces convolutional neural network (CNN) architectures and the practical design choices that make them effective for image-like data. It reviews how convolution layers use learnable filters and stride to transform an input into feature maps, and shows how these operations are implemented in practice. Key engineering decisions such as padding are discussed, including the common zero-padding convention and the difference between “VALID” and “SAME” padding. The lecture explains how convolution-layer output size is determined by four hyperparameters (number of filters, filter size, stride, and padding) and illustrates how these choices affect spatial dimensions. Multi-channel inputs are covered, emphasizing how combining information across channels can produce richer representations. Finally, the session highlights the necessity of non-linear activations and introduces pooling operations, discussing their role in invariance and why data augmentation is often needed for robustness.
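The output-size arithmetic and the convolution operation itself can be sketched in a few lines. The input grid and edge filter below are illustrative; the helper implements the standard single-channel, stride-1 case (as in most frameworks, this is technically cross-correlation).

```python
import numpy as np

def conv_output_size(n, f, stride, pad):
    # Spatial output size: floor((n + 2*pad - f) / stride) + 1.
    return (n + 2 * pad - f) // stride + 1

# "VALID" adds no zeros; "SAME" (at stride 1) pads so output size == input size.
assert conv_output_size(28, 3, 1, 0) == 26   # VALID: 28x28 input, 3x3 filter
assert conv_output_size(28, 3, 1, 1) == 28   # SAME: pad 1 for a 3x3 filter

def conv2d(x, k):
    # Naive single-channel 2D convolution (cross-correlation), stride 1, no padding.
    H, W = x.shape
    f = k.shape[0]
    out = np.zeros((H - f + 1, W - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + f, j:j + f] * k)
    return out

x = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
edge = np.array([[1.0, 0.0, -1.0]] * 3)        # simple vertical-edge filter
fmap = conv2d(x, edge)
```

For multi-channel inputs, each filter has one kernel per input channel and the per-channel responses are summed into a single feature map; the number of filters then sets the output channel count.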