Reference for Deep Learning & Finite Elements

Numerical Analysis

Zeros or Roots

Physical problems can be represented through mathematical equations, and by solving these equations, i.e. finding their zeros or roots, we obtain the solution of the problem. The zero(s) or root(s) of a function f(x), e.g. a polynomial, are the values x = a at which f(x) vanishes, i.e. f(a) = 0.

Case: Linear System of Equations - Single variable problem

Let's suppose you can run at 0.2 miles per minute and your friend can bike at 0.4 miles per minute. It takes you a minute to put on your shoes, but your friend needs 6 minutes to put on shoes and inflate the tires. When will your friend catch you?

Then, using the simple formula "distance = speed * time", you can write the following equations:

d_walk(t) = 0.2 (t - 1)

d_bike(t) = 0.4 (t - 6)

Here, the distance is a function of time and depends linearly on time. The problem is therefore formulated as a linear equation in a single variable t and belongs to the class of linear equations.

If d is the distance in miles when the two of you meet, then d = d_walk(t) = d_bike(t)

0.2 (t - 1) = 0.4 (t - 6)

f(t) = 0.2t - 2.2 = 0

The solution of the equation f(t) = 0 gives t = 11 minutes and d = 2 miles. This is the solution of the problem. So you will meet after 11 minutes at a distance of 2 miles, i.e.

f(11) = 0.2*11 - 2.2 = 0

If we plot this in the two-dimensional Euclidean plane of real numbers (R^2), the two lines meet at the point (d,t) = (2,11) in the dt-plane as shown:

Fig. The two lines representing two equations intersect at the point (d,t)=(2,11) which is the solution of the given problem.

In the previous example, we assumed that the speed is constant. Your friend applies force on the pedals to overcome friction. What if he applies a constant force to speed up? Then the introduction of an acceleration "a" makes the equation non-linear. Using the kinematic distance equation,

d_bike(t) = 0.4 (t - 6) + (1/2) a (t - 6)^2

Consider a quadratic function f(x) = x^2 - 5x + 6.

It follows that the solutions of such an algebraic (polynomial) equation f(x) = 0 are exactly the zeros of the function f (x).

f(x) = x^2 - 5x + 6 = 0

Or, x^2 - 3x - 2x + 6 = 0

Or, x(x - 3) - 2(x - 3) = 0

Or, (x - 3)(x - 2) = 0

x = 2 or 3 are the roots or solutions of the polynomial equation f(x) = 0; in other words, f(2) = f(3) = 0. So this second-degree polynomial has two zeros or roots.

A polynomial of degree "n" has "n" roots (counted with multiplicity), some of which may be repeated. The larger the degree, the longer and more complicated the analytical procedure becomes. Numerical methods such as the Newton-Raphson and Secant methods described here provide approximate solutions regardless of the degree of the polynomial.

Iterative Methods of Finding Roots

Newton Raphson Method

It is an iterative method for finding successively better approximations to the roots (or zeros) of a real-valued function. The method starts with an initial guess x_0 for a root of the function, which is generally not a solution, i.e. f(x_0) != 0. The next approximation x_1 is the point where the tangent to f at x_0 crosses the x-axis. Geometrically, (x_1, 0) is the intersection with the x-axis of the tangent to the function f at (x_0, f(x_0)).

Fig. 1: Newton Raphson Method [1]

The equation of the tangent line to the curve y = f(x) at the point (x_n, y_n) is:

y - y_n = m (x - x_n)

or, y = m (x - x_n) + y_n .... (1)

where m is the slope of the tangent at (x_n, y_n), i.e. the derivative f' of f evaluated at x_n:

m = f'(x_n)

Substituting m and y_n = f(x_n) into equation (1), the tangent line becomes

y = f'(x_n) (x - x_n) + f(x_n)

The x-intercept of this line (the value of x such that y = 0) is then used as the next approximation to the root, x_{n+1}. In other words, setting y to zero and x to x_{n+1} gives

0 = f'(x_n) (x_{n+1} - x_n) + f(x_n)

Solving for x_{n+1} gives the Newton-Raphson iteration:

x_{n+1} = x_n - f(x_n)/f'(x_n)

In the given example function f(x) = x^2 - 5x + 6, let's guess x_0 = -5.

f(x_0) = (-5)^2 - 5*(-5) + 6 = 25 + 25 + 6 = 56

f'(x_0) = 2*(-5) - 5 = -10 - 5 = -15

x_1 = -5 - 56/(-15) = -1.267

If you keep iterating (about 4 iterations), you will see it converging towards the root 2. A large error in the initial estimate can contribute to non-convergence of the algorithm. Also, when two or more roots are close together, it may take many iterations before the iterates get close enough to one of them.
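
As a minimal sketch of this iteration in Python (separate from the newtonRaphson.m script used in the "Using Software" section), with the same function and starting guess as above:

# Newton-Raphson iteration for f(x) = x^2 - 5x + 6, starting from x0 = -5.
def f(x):
    return x**2 - 5*x + 6

def fprime(x):
    return 2*x - 5

x = -5.0                       # initial guess x0
for n in range(20):            # cap the number of iterations
    step = f(x) / fprime(x)
    x = x - step               # x_{n+1} = x_n - f(x_n)/f'(x_n)
    print(n + 1, x)
    if abs(step) < 1e-10:      # stop when the update is negligible
        break
# converges to the nearby root x = 2; the other root, x = 3, needs a different start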

Quasi-Newton Methods

Secant Method:

The secant method replaces the derivative in Newton's method with the finite difference approximation and is thus a Quasi-Newton method.

Fig. 2: Secant Method [2]

In Fig. 2, the secant line passes through the curve at the two points (x_{i-1}, f(x_{i-1})) and (x_i, f(x_i)), where x_{i-1} and x_i are the two initial guesses for the root in the first iteration. The next value x_{i+1} is the intersection of the secant with the x-axis, where the function value is taken to be zero; it lies closer to the root of the function. The slope of the secant line connecting these points can be written as:

slope (m) = (f(x_i) - f(x_{i-1})) / (x_i - x_{i-1}) = (f(x_{i+1}) - f(x_i)) / (x_{i+1} - x_i)

Since f(x_{i+1}) = 0, solving for x_{i+1}, we get

x_{i+1} = x_i - f(x_i) (x_i - x_{i-1}) / (f(x_i) - f(x_{i-1}))

Newton's method converges faster than the secant method, as shown in the "Using Software" section. However, Newton's method requires evaluating both the function (f) and its derivative (f') at every step, which can be costly for complicated expressions.
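
A similar minimal Python sketch of the secant update above (not the secantMethod.m script referenced later), using two assumed starting guesses for the same quadratic:

# Secant iteration for f(x) = x^2 - 5x + 6 using two initial guesses.
def f(x):
    return x**2 - 5*x + 6

x_prev, x_curr = -5.0, -4.0          # x_{i-1} and x_i
for n in range(30):
    f_prev, f_curr = f(x_prev), f(x_curr)
    # x_{i+1} = x_i - f(x_i) * (x_i - x_{i-1}) / (f(x_i) - f(x_{i-1}))
    x_next = x_curr - f_curr * (x_curr - x_prev) / (f_curr - f_prev)
    x_prev, x_curr = x_curr, x_next
    print(n + 1, x_curr)
    if abs(f(x_curr)) < 1e-12:       # stop once the residual is tiny
        break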

System of Linear Equations

Consider the following problem that can be represented by the system of linear equations. 

Problem 1: Suppose that a farmer has a piece of farm land planted with either wheat or barley or some combination of the two. The farmer has a limited amount of fertilizer, 6 kilograms, and pesticide, 6 kilograms. Every square kilometer of wheat requires 2 kilograms of fertilizer and 3 kilograms of pesticide, while every square kilometer of barley requires 3 kilograms of fertilizer and 2 kilograms of pesticide. 

If we denote the area of land planted with wheat and barley by x1 and x2 respectively,

2x1 + 3x2 = 6

3x1 + 2x2 = 6

They can be represented with functions f1 and f2 with two unknown variables, x1 and x2:

f1(x1,x2) = 2x1 + 3x2 - 6 = 0 ......... (1)

f2(x1,x2) = 3x1 + 2x2 - 6 = 0 ......... (2)

These equations are called multi-variable linear equations. We can plot the two equations (lines) in the two-dimensional (x1,x2) space by finding two points for each line. For example, for equation (1), if x1 = 0 then x2 = 2, and if x2 = 0 then x1 = 3. The coordinates of the two points are (0,2) and (3,0). Similarly, for equation (2), they are (0,3) and (2,0).

The solution of the problem gives the area of land planted with wheat and barley, i.e. it determines x1 and x2. If the graph is plotted, the coordinates of the intersection of the two lines give the solution: x1 = 6/5 and x2 = 6/5 satisfy both equations (1) and (2). Another method of solving, the elimination method, multiplies the first equation by 3 and the second by 2 and subtracts the second from the first. The substitution method solves it by substituting the value of one variable from one equation into the second equation.

Matrix Representation

For larger problems, systems of linear equations are handled through matrix operations. Let's represent the problem in matrix form, Ax = b, where A is the matrix with column vectors a1 and a2, and x and b are also vectors, i.e.

[a1 a2]x = b

Here, a matrix is used to represent the coefficients of a system of linear equations. If the matrix is invertible (non-singular or non-degenerate), i.e. the determinant |A| is not zero, the solution can be found using matrix operations, in principle by computing the inverse of the matrix,

x = A^(-1) b
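
As an illustrative sketch (not one of the referenced course scripts), the 2x2 farm system can be solved with NumPy; numpy.linalg.solve factors A rather than forming the inverse explicitly:

import numpy as np

A = np.array([[2.0, 3.0],    # fertilizer coefficients
              [3.0, 2.0]])   # pesticide coefficients
b = np.array([6.0, 6.0])

x = np.linalg.solve(A, b)    # preferred over np.linalg.inv(A) @ b
print(x)                     # [1.2 1.2], i.e. x1 = x2 = 6/5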

Direct Methods

The Gauss elimination method is a direct method that gives an exact solution, but its accuracy is not very good for large systems. The main source of inaccuracy in Gauss elimination is round-off error, due to the large number of arithmetic operations involved. Partial pivoting mitigates this issue, but the method remains time consuming for large systems. Like Gauss elimination, the LU decomposition method is a direct method for obtaining the exact solution of a system of linear algebraic equations; it decomposes the coefficient matrix into lower and upper triangular matrices. Many of the systems of linear algebraic equations that must be solved in engineering problems are large and have many zeros in their coefficient matrix (sparse matrices). For these, iterative methods are often used in numerical analysis.
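
For illustration, a short SciPy sketch of LU decomposition applied to the same 2x2 system (assuming SciPy is available; this is not one of the referenced scripts):

import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[2.0, 3.0],
              [3.0, 2.0]])
b = np.array([6.0, 6.0])

lu, piv = lu_factor(A)       # PA = LU with partial pivoting
x = lu_solve((lu, piv), b)   # forward and back substitution
print(x)                     # [1.2 1.2]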

Iterative Methods

Gauss-Seidel, one of the iterative techniques, is well known for its good performance on engineering problems. The Jacobi method is very similar to the Gauss-Seidel method. Relaxation techniques are employed to improve the convergence of the Gauss-Seidel method: using lambda as a simple weighting constant, the new guess is formed as a weighted average of the results of the old and the new iteration.
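
A compact Python sketch of the Gauss-Seidel iteration with a relaxation (weighting) factor lam, applied to the same 3x3 system solved by gauss_seidel.py in the "Using Software" section; lam = 1 reduces to plain Gauss-Seidel:

import numpy as np

# Diagonally dominant system taken from the gauss_seidel.py example below.
A = np.array([[4.0, 1.0, -1.0],
              [2.0, 7.0,  1.0],
              [1.0, -3.0, 12.0]])
b = np.array([3.0, 19.0, 31.0])

x = np.zeros(3)              # initial guess
lam = 1.0                    # relaxation factor; lam = 1 is plain Gauss-Seidel
for it in range(50):
    x_old = x.copy()
    for i in range(3):
        s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
        x_gs = (b[i] - s) / A[i, i]
        x[i] = lam * x_gs + (1 - lam) * x[i]   # weighted average of new and old
    if np.linalg.norm(x - x_old, np.inf) < 1e-8:
        break
print(it + 1, x)             # converges to [1, 2, 3]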

Example: Gauss-Seidel Method [8]

For other iterations and the solution, see the section "Using Software" below.

Differential Equations

Transient problems in all fields of engineering and the physical sciences that deal with rates of change are usually expressed in terms of differential equations. Differential equations are at the core of Newton's and Lagrange's equations of classical mechanics, Maxwell's equations of classical electromagnetism, Schrödinger's equation of quantum mechanics, and Einstein's equation of the general theory of gravitation.

Example I: First Order Ordinary Differential Equation - Barometric Pressure Variation [5]

Example II: Second Order Differential Equations - Hooke's Law [5]

For the vast majority of complex problems, these equations cannot be solved with analytical methods. Such problems are therefore tackled by numerical integration using standard techniques such as Euler's method or the Runge-Kutta method.

Euler's Method & Runge-Kutta Method

Euler's method is used to find numerical approximations to the solutions of ordinary differential equations (ODEs). Higher-order systems can be converted into a larger system of first-order equations by introducing extra variables, without loss of generality. The classical Runge-Kutta method (RK4) builds on Euler's method and uses a weighted average of four increments, based on the slope at the beginning, the middle (twice), and the end of the interval.

Solving an ODE, i.e. computing u after 100 iterations:

f(t,u): u' = u/(1 + t^2), u(0) = 1

At t = 0, u = 1; find u after 100 steps, i.e. at t = 10.

Step size (h) = dt = 10/100 = 0.1 (it should be smaller for better accuracy)

Applying Euler's method, we get

u_{n+1} = u_n + h f(t_n, u_n)

u_1 = u_0 + 0.1 * (1/(1 + 0^2)) = 1 + 0.1 = 1.1

Here, f(t_0, u_0) gives the slope of the line tangent to the solution curve at the point (0,1). See the "Using Software" section for the 100th iteration computed with Matlab code.
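
A short Python sketch of this Euler iteration (the Matlab version, euler1.m, is referenced in "Using Software"), with an RK4 loop added to illustrate the weighted-average idea:

# Euler and classical RK4 for u' = u / (1 + t^2), u(0) = 1, step h = 0.1.
def f(t, u):
    return u / (1 + t**2)

h, t, u = 0.1, 0.0, 1.0
for n in range(100):                 # 100 steps reach t = 10
    u = u + h * f(t, u)              # Euler: u_{n+1} = u_n + h f(t_n, u_n)
    t = t + h
print("Euler:", u)

t, u = 0.0, 1.0
for n in range(100):
    k1 = f(t, u)
    k2 = f(t + h/2, u + h/2 * k1)
    k3 = f(t + h/2, u + h/2 * k2)
    k4 = f(t + h, u + h * k3)
    u = u + h/6 * (k1 + 2*k2 + 2*k3 + k4)   # weighted average of four slopes
    t = t + h
print("RK4:  ", u)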

Lagrange or Euler-Lagrange's Equation

Let's define the spring system (Hooke's Law) using Lagrange's equation. Here, the generalized coordinate is q = x (displacement) and its time derivative is q̇ = v (velocity).

Now, this can be solved using Euler's method or the Runge-Kutta method.

Green's Function

The Green's function provides a complete solution to boundary value problems. It can be used to solve differential equations from a large number of families, including ordinary differential equations with initial or boundary value conditions, as well as more difficult examples such as non-homogeneous partial differential equations (PDEs) with boundary conditions.

Refer to Green's function for Non-Homogeneous problem for details.

Galerkin Method

The Galerkin method is an approximation method for converting a continuous operator problem, such as a differential equation, into a discrete problem. FEM is commonly introduced as a special case of the Galerkin method. The method constructs an integral of the inner product of the residual and the weight functions and sets the integral to zero. In simple terms, it is a procedure that minimizes the error of the approximation (the residual) by fitting trial or test functions into the PDE.

In the Galerkin method, it is assumed that the solution belongs to the same Hilbert space (examples include Euclidean spaces, spaces of square-integrable functions, spaces of sequences, Sobolev spaces of generalized functions, and Hardy spaces of holomorphic functions) as the test functions. Details on the Galerkin method are available at [10].

Extrema - Minima/Maxima or Optimization

Linear Optimization

Problem 2: In Problem 1 above, suppose S1 is the selling price of wheat per square kilometer and S2 is the selling price of barley. Then the profit can be maximized by choosing optimal values for x1 and x2 that give a high Return on Investment (ROI). This can be expressed as the following linear programming problem in standard form:

maximize: f(x1,x2) =  S1x1 + S2x2

S.T.

            x1,x2 >= 0           (area of land for cultivation)

            2x1 + 3x2 <= 6       (amount of fertilizers)

            3x1 + 2x2 <= 6       (amount of pesticides)

The constraints indicate that a non-negative area of land is cultivated and that the amounts of fertilizer and pesticide each cannot exceed 6 kg. This sets up the feasible region for the solution, bounded by the constraints. Though there are many feasible solutions, x1 = 6/5 and x2 = 6/5 maximizes the profit (for the prices used in the Matlab example) and is the global maximum over the convex region. See the "Using Software" section for a Matlab implementation of LP (Linear Programming).
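
Alongside the Matlab LP.m run shown in "Using Software", here is a hedged SciPy sketch of the same problem with the assumed prices S1 = 5 and S2 = 6; linprog minimizes, so the objective is negated:

import numpy as np
from scipy.optimize import linprog

# maximize 5*x1 + 6*x2  ->  minimize -(5*x1 + 6*x2)
c = [-5.0, -6.0]
A_ub = [[2.0, 3.0],          # fertilizer constraint
        [3.0, 2.0]]          # pesticide constraint
b_ub = [6.0, 6.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x)                 # [1.2 1.2], i.e. x1 = x2 = 6/5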

This can be re-written in matrix form as:

maximize: f(x) = C^T x

S.T.

          Ax = b

The unconstrained linear least-squares problem can be written in quadratic form:

Minimize f(x) = ||Ax - b||^2 = x^T A^T A x - 2 b^T A x + b^T b

Also, a General Optimization Problem (GP) can be approximated locally by a quadratic form at a given point x = x_0 using the Taylor expansion up to the quadratic term, as shown:

f(x) ≈ f(x_0) + ∇f(x_0)^T (x - x_0) + (1/2) (x - x_0)^T H(x_0) (x - x_0)

It can be re-written in exact quadratic form:

f(x) = (1/2) x^T Q x + c^T x, where Q = A^T A is a symmetric matrix

Now, 

∇f(x) = Qx + c, where ∇f(x) is the gradient vector of the smooth function f(x)

H(x) = Q, where H is the Hessian matrix of the smooth function f(x)

If the gradient vanishes and the Hessian is Positive Definite (PD) at x_0, then f(x) attains a local minimum at x_0. Besides the Hessian test, there are eigenvalue, determinant, and pivot tests, as explained in [9], for classifying a matrix as positive/negative definite, semi-definite, or indefinite, and hence for identifying minima, maxima, or saddle points. See the "Using Software" section for a Matlab implementation of the Positive Definite test.
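
A small NumPy sketch of the eigenvalue and Cholesky tests for positive definiteness, applied to the same symmetric matrix passed to Spd_Mat.m in the "Using Software" section (a simplified analogue, not the script itself):

import numpy as np

Q = np.array([[14.0, 6.0, 9.0],
              [6.0, 17.0, -4.0],
              [9.0, -4.0, 13.0]])

# Eigenvalue test: all eigenvalues of a symmetric matrix must be positive.
eigvals = np.linalg.eigvalsh(Q)
print("eigenvalues:", eigvals, "PD:", bool(np.all(eigvals > 0)))

# Cholesky test: the factorization succeeds only for positive definite matrices.
try:
    np.linalg.cholesky(Q)
    print("Cholesky succeeded: PD")
except np.linalg.LinAlgError:
    print("Cholesky failed: not PD")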

Euler-Lagrange for Optimization

Euler-Lagrange is a calculus-of-variations method that finds the function f(x) which optimizes a functional of f over the interval [a,b].

Example: Finding the shortest distance between two points (Euler-Lagrange method)

Gradient Methods

The search for zeros or roots of the gradient of a function is the search for a minimum or maximum of that function. Newton's method assumes that the function can be locally approximated as a quadratic in the region around the extremum and uses the first and second derivatives to find the stationary point. For functions of several variables, the role of the second derivative is played by the Hessian (the Jacobian of the gradient). In Newton's method the Hessian must be inverted, which is typically implemented by solving a system of linear equations and is often quite costly. Quasi-Newton methods approximate the Hessian: the inverse of the Hessian is approximated by a matrix that makes the computation fast and easy, and this approximation is updated by analyzing successive gradient vectors. They can be used even when the Jacobian or Hessian is unavailable. The quasi-Newton methods used for optimization are a generalization of the secant method for finding the root of the first derivative in multidimensional problems.

Quadratic or Faster Convergence

Suppose that the function f has a zero at a, i.e. f(a) = 0, and f is differentiable in a neighborhood of a. If f is continuously differentiable and its derivative is nonzero at a, then there exists a neighborhood of "a" such that for all starting values x_0 in that neighborhood, the sequence {x_n} converges to "a". If, in addition, f has a second derivative at "a", then the convergence is quadratic or faster.

Taylor Series

The Taylor series represents an arbitrary function f(x) by a power series about the initial guess x_0, as shown, where the a_i are constants.

f(x) = f(x_0 + (x - x_0)) = a_0 + a_1 (x - x_0) + a_2 (x - x_0)^2 + ... + a_k (x - x_0)^k

Assuming the series can be differentiated term by term,

f'(x) = a_1 + 2 a_2 (x - x_0) + ... + k a_k (x - x_0)^(k-1) and f''(x) = 2 a_2 + 6 a_3 (x - x_0) + ... + k(k-1) a_k (x - x_0)^(k-2)

At x = x_0,

f(x_0) = a_0

f'(x_0) = a_1

f''(x_0) = 2 a_2, i.e. a_2 = f''(x_0)/2, and so on

The series can therefore be written as:

f(x) = f(x_0) + f'(x_0)(x - x_0) + (1/2!) f''(x_0)(x - x_0)^2 + (1/3!) f'''(x_0)(x - x_0)^3 + ...

If x_0 = 0, then it becomes the Maclaurin series:

f(x) = f(0) + f'(0) x + (1/2!) f''(0) x^2 + (1/3!) f'''(0) x^3 + ...

Let's approximate the single-variable function f(x) by the quadratic function q(x) consisting of the first three terms:

f(x) ≈ q(x) = f(x_0) + f'(x_0)(x - x_0) + (1/2!) f''(x_0)(x - x_0)^2

Convergence

Taking the derivative and setting the result to zero, we get

q'(x) = f'(x_0) + f''(x_0)(x - x_0) = 0

Solving for x, it becomes

x = x_0 - f'(x_0)/f''(x_0)

From the initial guess x0, we can carry out the following iteration,

x_{n+1} = x_n - f'(x_n)/f''(x_n)
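
As a quick sketch, applying this iteration to the earlier quadratic f(x) = x^2 - 5x + 6 locates its stationary point (a minimum at x = 2.5) in a single step, since the quadratic model is exact for a quadratic function:

# Newton's method for optimization: x_{n+1} = x_n - f'(x_n)/f''(x_n).
def fprime(x):
    return 2*x - 5        # first derivative of f(x) = x^2 - 5x + 6

def fsecond(x):
    return 2.0            # second derivative is constant

x = -5.0                  # initial guess
for n in range(5):
    x = x - fprime(x) / fsecond(x)
    print(n + 1, x)       # jumps to 2.5 (the minimizer) immediately and stays there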

Gradient Descent or the Steepest Descent Algorithm

With a simple modification of this quadratically convergent scheme, Newton's method becomes the gradient descent method, which seeks out minima instead of general critical points. The cost functions [2] defined by the quadratic functions below are plotted in Fig. 1.

C(x) = (1/2)(x - 1)^2 & C1(x) = -C(x) = -(1/2)(x - 1)^2

Fig. 1: Cost Functions

At x = -10, the cost function C(x) = 60.5 (blue) and C1(x) = -60.5 (orange). C(x) is minimum at x = 1, where C1(x) is maximum. So, to get to the bottom of C(x), i.e. the minimum cost, we need to descend towards lower values of the cost function. The gradient descent algorithm gets to the minimum by finding better values of x (learning by adjusting weights, in the case of a neural network) in each iteration.

For this one-dimensional problem we can easily find the minima or maxima, but in multidimensional complex problems such as neural networks, an iterative algorithm is necessary for the network to learn, and care must be taken to avoid getting stuck in local minima.

Determining Maxima and Minima for a Quadratic Function

Calculate the first and second derivatives of the cost functions:

C'(x) = (x-1) & C1'(x) = -(x-1)

C''(x) = 1 > 0 (minima) & C1''(x) = -1 < 0 (maxima)

So the non-negative cost function has a minimum at x = 1, where C(x) = 0. The cost function decreases downhill in the negative direction of its gradient; the slope becomes less steep and finally reaches 0 at x = 1.

The two variable cost function looks like the one in Fig. 2.

Figure 2: Two variable cost function [1]

With an initial guess x_0, the gradient descent algorithm computes the nth iterate for the cost function C(x) as:

x_n = x_{n-1} - α C'(x_{n-1}), where α is the learning rate.

Initial guess x0 = 2, then

x1 = x0 - C'(x0) = 2 - (2 - 1) = 1

x2 = 1 - (1-1) = 1

Here, the learning rate (α) is set to 1. 

Let's set α to 0.5 and redo the calculation

x1 = 2 - 0.5(1) = 1.5

x2 = 1.5 - 0.5(1.5 - 1) =  1.25

x3 = 1.125, x4 = 1.0625 ....

Hence, with a small value of α, gradient descent is slow and requires more iterations to converge to the minimizer x = 1.

If α = 2, then

x1 = 2 - 2 (2-1) = 0

x2 = 0 - 2(0 - 1) = 2

So, with a larger value of α, the algorithm oscillates and fails to converge.
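
The behaviour of the three learning rates above can be reproduced with a few lines of Python (an illustrative sketch, not part of the referenced scripts):

# Gradient descent x_n = x_{n-1} - alpha * C'(x_{n-1}) for C(x) = 0.5*(x-1)**2.
def grad(x):
    return x - 1.0

for alpha in (1.0, 0.5, 2.0):
    x = 2.0                       # initial guess x0
    history = []
    for n in range(6):
        x = x - alpha * grad(x)
        history.append(round(x, 4))
    print("alpha =", alpha, history)
# alpha = 1.0 converges in one step, alpha = 0.5 converges slowly,
# and alpha = 2.0 oscillates between 0 and 2 without converging.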

System of Non-Linear Equations

Consider the equations with two variables x1 and x2.

f1(x1,x2) = x1^2 + x2^2 - 50 = 0

f2(x1,x2) = x1*x2 - 25 = 0

The Jacobian of the system is:

J = [∂f1/∂x1  ∂f1/∂x2; ∂f2/∂x1  ∂f2/∂x2] = [2x1  2x2; x2  x1]

If x = x_0 (a vector) represents the first guess for the solution, successive approximations to the solution are obtained from

x_{n+1} = x_n + ∆x_n, where J(x_n) ∆x_n = -f(x_n), i.e. x_{n+1} = x_n - J^{-1} f(x_n)

After 16 iterations, the solution is (x1,x2) = (5,5). See the section "Using Software".
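
A NumPy sketch of this Newton iteration with the Jacobian written out explicitly (the Matlab scripts f2.m, newtonm.m, and jacob2x2.m referenced below implement the same idea); note that the Jacobian is singular at the solution (5,5), which is why convergence is only linear and takes many iterations:

import numpy as np

def F(x):
    x1, x2 = x
    return np.array([x1**2 + x2**2 - 50.0,   # f1
                     x1*x2 - 25.0])          # f2

def J(x):
    x1, x2 = x
    return np.array([[2*x1, 2*x2],
                     [x2,   x1]])

x = np.array([2.0, 1.0])                     # same initial guess as inputnewtonm below
for it in range(100):
    dx = np.linalg.solve(J(x), -F(x))        # solve J * dx = -F(x)
    x = x + dx
    if np.linalg.norm(dx) < 1e-6:
        break
print(it + 1, x)                             # approaches (5, 5)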

The secant method approximates the Jacobian through finite differences.

Using Software

Copy the directory FinteElementMethod from /usr/local/doc and cd to it. The scripts are obtained from [3].

cp -r /usr/local/doc/FinteElementMethod .

cd FinteElementMethod

Request a compute node:

srun --x11 --pty bash

Load the matlab module & open Matlab

module load matlab

matlab &

Single Non-Linear Equation

This uses the function scripts - newtonRaphson.m and secantMethod.m

Run:

>>inputForNewtonRaphson

>>inputForSecantMethod

Fig. Output plot for Secant Method

Fig. Output plots for Newton-Raphson Method

System of Non-Linear Equations

Newton-Raphson

This Newton-Raphson method uses the functions f2.m, newtonm.m, and jacob2x2.m [3]

>> inputnewtonm

x0 =

     2

     1

x =

    5.0000

    5.0000

iter =

    16

Secant:

The Secant method uses the functions f2.m and secantm.m [3].  

>> inputsecantm

x0 =

     2

     3

dx =

    0.1000

x =

    5.0000

    5.0000

iter =

    18

Euler's Method

The Matlab script euler1.m implements Euler's method. The inputs are defined in inputEuler.m [6]

>>inputEuler

Fig. Implementation of Euler's Method and comparison with the exact solution

Gauss-Seidel Method

Load the python module

module load python

Run:

python gauss_seidel.py

output:

System:

('4.0*x1 + 1.0*x2 + -1.0*x3', '=', 3.0)

('2.0*x1 + 7.0*x2 + 1.0*x3', '=', 19.0)

('1.0*x1 + -3.0*x2 + 12.0*x3', '=', 31.0)

()

('Current solution:', array([ 0.,  0.,  0.]))

('Current solution:', array([ 0.75      ,  2.5       ,  3.14583333]))

('Current solution:', array([ 0.91145833,  2.00446429,  3.00849454]))

('Current solution:', array([ 1.00100756,  1.99849862,  2.99954069]))

('Current solution:', array([ 1.00026052,  1.99999118,  2.99997609]))

('Current solution:', array([ 0.99999623,  2.00000449,  3.00000144]))

('Current solution:', array([ 0.99999924,  2.00000001,  3.00000007]))

('Current solution:', array([ 1.00000001,  1.99999999,  3.        ]))

Solution:

[ 1.00000001  1.99999999  3.        ]

Error:

[  4.50900237e-08  -7.13728632e-08   0.00000000e+00]

Linear Programming (LP)

Check LP.m file. Here, S1 = 5 and S2 = 6. The negative sign for f is to maximize the function.

>> LP

output:

f =

    -5    -6

A =

     2     3

     3     2

b =

     6     6

Optimization terminated.

x =

    1.2000

    1.2000

Positive Definite (PD) Test

Check Spd_Mat.m file.

>> Spd_Mat([14, 6, 9; 6, 17, -4;9, -4, 13])

output:

EigenPositiveDefiniteUsingCHOL =

logical

1

EigenPositiveDefiniteUsingEig =

logical

1

DeterminantPositiveDefinite =

logical

1

z =

     1

     2

     3

HessianPositiveDefinite =

  logical

   1

References:

[1] Newton Raphson - http://www.bragitoff.com

[2] Numerical Method - http://www.numericmethod.com

[3] Solutions of Non-Linear Equations

[3] Neural Networks and Deep Learning - http://neuralnetworksanddeeplearning.com/chap1.html

[4] Artificial Neural Network

[5] HyperPhysics

[6] Euler Implementation

[7] System of Linear Equations

[8] Gauss-Seidel Method

[9] Positive Definite Test

[10] Galerkin Method