The Research Area :
Title : Parallel Optimization in
Author : Henry Kar Ming Chan
For the parallel optimization in
computational science and engineering, it is required to solve the large scales
non-linear optimization problems in parallel processing mode. OpenMP supports
multi-platform shared memory multi-processing architecture and is an industrial
standard Application Programming Interface (API) for forking and joining
program and sub-program tasks. From the product of optimizers, Knitro is an
advanced solver for non-linear optimization problems, handling bound
constraints, non-linear equalities and inequalities (both convex and
non-convex), and complementarity constraints. This is because Knitro mainly has
the state-of-the-art features using the algorithmic options, such as interior
point methods and active set method, which can be switched between one another
automatically or manually by changing the parameters settings. Knitro can
handle the inequality constraints by an interior point algorithm and direct
solution of the barrier sub-problems. In large scales non-linear problems, Knitro
can handle the inequality constraints by an interior point algorithm and
solution of the barrier sub-problems by conjugate gradient iterations. For
detecting infeasibility, Knitro can handle the inequality constraints by an
active set algorithm, which is especially beneficial when a good initial point
is available and when solving a sequence of related problems. In some cases,
the initial guess provided by the user and all subsequent iterates also satisfy
these inequality constraints, then a feasible mode can be set in the parameters
list. A simple test scenario is used as test cases. Different active set
methods are tested in parallel processing using OpenMP section or task
parallelism. Within each active set methods, multi-start features are used to randomly
select the initial points so as to find the global optimization points if
possible. Parallel Knitro for nonlinear programming solver and Intel MKL
library for vector and matrix processing are used with OpenMP directives.
Gradient and Hessian vectors for directional and partial derivatives are
generated with OpenMP features through Automatic differentiation.
Finally, vectorization of data, pre-fetching of data and inter-procedural
optimization for parallel OpenMP programs are necessary to be considered during
program compilation to achieve the best performance.
In Parallel Branch and
Bound : strong-branching and pseudo-costs; depth-first search and best-first
search. Potential Problems with OpenMP : Load Balancing, Thread Cancellation,
Error Model, Data Affinity, Data Sharing, Data Dependency, Task Dependency.
Potential Parallelism in strong branching includes generating lists of
candidates using job cancellation, memory aware load balancing features for
parallelizing “While Loop” subprogram within maximum iteration limit, and
applying within-subprogram near search tree nodes by controlling threads
placement on processors (spreading [instead of closing] the threads to idle
processors for strong-branching and depth-first searching at root nodes and at nodes of roots especially for large scale integer problems). Please also note
the "FREE" ramp-ups at the beginning and the "FREE"
processors at the end.
Desirable Features for Parallel
Branch and Bound
Parallel Multi-start (Lower/Upper bounds getting from
Priority for variables with Strong Branching features
Fork-Join hierarchy (Workshare with Transactional
Memory and Speculative Execution)
Vectorization/SIMD features (Common Array Shaping)
Loop, Section and Task parallelism (Looping constructs
and Task Model refinements)
Memory-aware load balancing (CPU Affinity)
Cancellation features (Full Error Model)
Footnote : Some developers find that
OpenMP in gfortran with Intel MKL BLAS and Sparse
Linear Solver reaches linear speedup in the number of cores. If your
application does not exploit these SIMD capabilities, you can easily lose a
factor of 16x or 8x compared to the peak performance of the CPU.
Parallel Branch and
Order Approaches :
Parallel Active Set
Parallel Primal Dual
Interior Point Methods :
F-8 aircraft using Knitro version 8.1 (AMPL)
NEW CASE for F-8 aircraft (In Parallel). A solution obtained by Knitro version 9.0 under NEOS server environment was tested by Henry Kar Ming Chan for the reduced model. The reduced model is defined by using less number of period of time (i.e. ntperu := 10). The standard defaults were used except mainly the strong branching strategies, maximum iterations and maximum nodes with parallel feature in MKL Blas only and without external optimizers. Final objective value is 0.020929. Number of nodes processed is 519 and number of sub-problems solved is 526. Total program CPU time is 30.6946 seconds with time spent in evaluations for 12.14195 seconds. The intervals on the equidistant grid on which w(t) = 1 holds, counting from 0 to 59, are 0, 1, 31, 32, 44, 52, and 53.
The future release of Knitro may have the features such as the tuner, new pre-solver, new parallel features and improvements in mixed-integer programming and the active-set algorithm.