Knitro

"Solving convex optimization problems, whether generated directly or as sub-problems when solving non-convex problems, is the principal requirement in computational optimization. Ever-increasing problem size, particularly as a result of the growing importance and scale of data analysis, is driving the development of novel techniques for solving convex optimization problems using high performance computing systems." Convex Optimization and Beyond workshop, 27 June 2014, ICMS Edinburgh.
 
Parallel Optimization Services --- Parallel Computation in Mathematical Algorithmic Optimization (PCMAO)
 
The Research Area:

Title: Parallel Optimization in Engineering
Author: Henry Kar Ming Chan

For parallel optimization in computational science and engineering, large-scale nonlinear optimization problems must be solved in parallel processing mode. OpenMP supports shared-memory multiprocessing across platforms and is an industry-standard Application Programming Interface (API) for forking and joining program and sub-program tasks. Among available optimizers, Knitro is an advanced solver for nonlinear optimization problems, handling bound constraints, nonlinear equalities and inequalities (both convex and non-convex), and complementarity constraints. Knitro offers state-of-the-art algorithmic options, such as interior-point methods and an active-set method, which can be switched between one another automatically or manually by changing the parameter settings. Knitro can handle inequality constraints with an interior-point algorithm that solves the barrier sub-problems directly; for large-scale nonlinear problems, it can instead solve the barrier sub-problems by conjugate-gradient iterations. For detecting infeasibility, Knitro offers an active-set algorithm, which is especially beneficial when a good initial point is available and when solving a sequence of related problems. If the initial guess provided by the user and all subsequent iterates satisfy the inequality constraints, a feasible mode can be set in the parameter list.

A simple test scenario is used as the test case. Different active-set methods are tested in parallel using OpenMP section, loop, or task parallelism. Within each active-set method, the multi-start feature randomly selects initial points so as to find globally optimal points if possible; a minimal sketch of this idea follows this paragraph. The parallel Knitro nonlinear programming solver and the Intel MKL library for vector and matrix processing are used with OpenMP directives. Gradient and Hessian vectors for directional and partial derivatives are generated through automatic differentiation with OpenMP features. Finally, vectorization of data, pre-fetching of data, and inter-procedural optimization of parallel OpenMP programs should be considered during compilation to achieve the best performance.
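A minimal multi-start sketch along the lines of the paragraph above, assuming a toy one-dimensional objective in place of an actual Knitro solve (the objective, the number of starts, and all names are purely illustrative):

program multistart
  implicit none
  integer, parameter :: nstarts = 8
  double precision :: x0, f, fbest
  integer :: k
  fbest = huge(1.0d0)
  ! Each iteration is an independent start; the min reduction keeps
  ! the best objective value found over all starts.
!$omp parallel do private(x0, f) reduction(min:fbest)
  do k = 1, nstarts
     call random_number(x0)       ! random initial point in [0,1)
     f = (x0 - 0.3d0)**2          ! toy objective; a solver call in practice
     fbest = min(fbest, f)
  end do
!$omp end parallel do
  print *, 'best objective over', nstarts, 'starts:', fbest
end program multistart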

In parallel branch and bound: strong branching and pseudo-costs; depth-first search and best-first search. Potential problems with OpenMP: load balancing, thread cancellation, the error model, data affinity, data sharing, data dependency, and task dependency. Potential parallelism in strong branching includes generating lists of candidates with job cancellation, memory-aware load balancing for parallelizing the (most likely nested) while-loop subprogram within a maximum iteration limit, and working near search-tree nodes within a subprogram by controlling thread placement on processors (spreading the threads to idle processors, instead of keeping them close, for strong branching and depth-first search at the root node and the nodes near the root, especially for large-scale integer problems). Please also note the "free" ramp-up processors at the beginning and the "free" processors at the end of the search. A small cancellation sketch follows.
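A sketch of the job-cancellation idea for the strong-branching candidate loop, assuming random scores as stand-ins for real candidate evaluations (the threshold and candidate count are illustrative; cancellation only takes effect when the program is run with OMP_CANCELLATION=true):

program strong_branch
  implicit none
  integer, parameter :: ncand = 64
  double precision, parameter :: threshold = 0.99d0
  double precision :: score
  integer :: j
!$omp parallel do private(score)
  do j = 1, ncand
     call random_number(score)    ! stand-in for a candidate evaluation
     if (score > threshold) then
        print *, 'candidate', j, 'is good enough; cancelling the rest'
!$omp cancel do
     end if
     ! Threads poll here and abandon their remaining iterations
     ! once some thread has requested cancellation.
!$omp cancellation point do
  end do
!$omp end parallel do
end program strong_branch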

Desirable Features for Parallel Branch and Bound

  • Parallel Multi-start (Lower/Upper bounds getting from Multi-starts)
  • Priority for variables with Strong Branching features (Automatic Tuner)
  • Fork-Join hierarchy (Workshare with Transactional Memory and Speculative Execution)
  • Vectorization/SIMD features (Common Array Shaping)
  • Loop, Section and Task parallelism (Looping constructs and Task Model refinements)
  • Memory-aware load balancing (CPU Affinity)
  • Cancellation features (Full Error Model)
Footnote: Some developers find that OpenMP in gfortran with the Intel MKL BLAS and sparse linear solvers reaches linear speedup in the number of cores. If an application does not exploit the CPU's SIMD capabilities, it can easily lose a factor of 8x or 16x relative to the peak performance of the CPU; a minimal SIMD sketch is given below.
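As a minimal illustration of the SIMD point in the footnote, the following loop lets the compiler vectorize a per-node bound update with a max reduction (the array size and the bound formula are illustrative):

program simd_bounds
  implicit none
  integer, parameter :: n = 1024
  double precision :: w(n), bound(n), best
  integer :: i
  call random_number(w)
  best = 0.0d0
  ! The simd construct asks the compiler to evaluate several
  ! iterations at once using the CPU's vector lanes.
!$omp simd reduction(max:best)
  do i = 1, n
     bound(i) = 0.5d0 * w(i) + 0.5d0   ! per-node bound estimate
     best = max(best, bound(i))
  end do
  print *, 'best bound', best
end program simd_bounds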
 
External Links:

Parallel Branch and Bound:

Parallel Second/Higher Order Approaches:
 
Benchmarks:
 

 

NEW CASE for the F-8 aircraft (in parallel). A solution obtained with Knitro version 9.0 under the NEOS server environment was tested by Henry Kar Ming Chan for the reduced model. The reduced model is defined using a smaller number of time periods (i.e. ntperu := 10). The standard defaults were used except mainly for the strong-branching strategies, maximum iterations, and maximum nodes, with the parallel feature in MKL BLAS only and without external optimizers. The final objective value is 0.020929. The number of nodes processed is 519 and the number of sub-problems solved is 526. Total program CPU time is 30.6946 seconds, of which 12.14195 seconds were spent in evaluations. The intervals on the equidistant grid on which w(t) = 1 holds, counting from 0 to 59, are 0, 1, 31, 32, 44, 52, and 53.

A future release of Knitro may include features such as a tuner, a new pre-solver, new parallel features, and improvements in mixed-integer programming and the active-set algorithm.

 

A sample gfortran program (compiler version 4.10, development stage) using OpenMP 4.0 features for parallel branch and bound is attached below for testing and reference:

! Toy parallel branch-and-bound skeleton: an outer task repeatedly
! launches a nested team that evaluates node bounds, and taskgroup
! and parallel cancellation stop the search once node 1 wins.
program parBandB
  use OMP_LIB
  implicit none
  integer :: shared_root_node, private_itr, max_itr
  double precision :: shared_max, private_temp, private_temp_prod
  double precision, allocatable, dimension(:) :: myarray

  max_itr = 100
  allocate ( myarray(max_itr) )
  ! Node weights descending from 1.0; node 1 is then seeded just
  ! above the initial incumbent so that the search can terminate.
  do private_itr = 1, max_itr
     myarray(private_itr) = (max_itr + 1.0d0 - private_itr) / max_itr
  end do
  myarray(1) = 1.001d0

  ! Initialise the shared incumbent bound and candidate node index
  ! before the parallel region so every thread sees the same values.
  shared_root_node = 100
  shared_max = 0.999d0

! Spread the outer team across the machine; run with
! OMP_CANCELLATION=true and nested parallelism enabled.
!$OMP PARALLEL shared(shared_max, shared_root_node, myarray) &
!$OMP & proc_bind(spread) num_threads(2)
!$OMP SINGLE
!$OMP TASKGROUP
!$OMP TASK shared(shared_max, shared_root_node, myarray) &
!$OMP & default(firstprivate) untied mergeable
  do while (shared_root_node > 1)
! Keep the nested team's threads close to the parent thread.
!$OMP PARALLEL shared(shared_max, shared_root_node, myarray) &
!$OMP & proc_bind(close) num_threads(2)
!$OMP SINGLE
!$OMP TASKGROUP
!$OMP TASK shared(shared_max, shared_root_node, myarray) &
!$OMP & private(private_temp, private_temp_prod, private_itr) &
!$OMP & default(firstprivate) untied mergeable
    ! Evaluate the candidate nodes; updates of the shared incumbent
    ! are serialised in a critical section to avoid a data race.
    do private_itr = 1, max_itr
      call random_number(private_temp)
      private_temp_prod = private_temp * myarray(private_itr)
      if ((private_temp_prod > shared_max) .or. (private_temp > shared_max)) then
!$OMP CRITICAL
        shared_max = max(shared_max, private_temp, private_temp_prod)
        shared_root_node = min(shared_root_node, private_itr)
!$OMP END CRITICAL
      end if
    end do
! Abandon the remaining work in this taskgroup once node 1 is best.
!$OMP CANCEL TASKGROUP IF(shared_root_node == 1)
!$OMP END TASK
!$OMP END TASKGROUP
!$OMP END SINGLE
!$OMP SINGLE
    print *, 'Maximum bound', shared_max
!$OMP END SINGLE
!$OMP CANCEL PARALLEL IF(shared_root_node == 1)
!$OMP END PARALLEL
  end do
!$OMP END TASK
!$OMP END TASKGROUP
!$OMP END SINGLE NOWAIT
!$OMP CANCEL PARALLEL IF(shared_root_node == 1)
!$OMP END PARALLEL
  deallocate( myarray )
end program parBandB
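Assuming the source is saved as parBandB.f90, it can be compiled with gfortran -fopenmp parBandB.f90. Note that the CANCEL directives only take effect when the program is run with the environment variable OMP_CANCELLATION set to true, and the inner parallel regions only get their own threads when nested parallelism is enabled (see the remark and sketch below).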

REMARK: It is good to enable nested parallelism and to set the maximum number of active levels. Sometimes it is also good to set a thread limit to prevent runaway oversubscription, and to enable dynamic adjustment of the number of threads.
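A minimal sketch of these settings through the OpenMP runtime API (the specific limits are illustrative; the thread limit itself can only be set through the OMP_THREAD_LIMIT environment variable):

program set_levels
  use omp_lib
  implicit none
  call omp_set_nested(.true.)        ! allow inner parallel regions
  call omp_set_max_active_levels(2)  ! outer team plus one nested level
  call omp_set_dynamic(.true.)       ! let the runtime adjust team sizes
  print *, 'max active levels:', omp_get_max_active_levels()
  print *, 'thread limit     :', omp_get_thread_limit()
end program set_levels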

Performance Analysis (E denotes the end time and S the start time of each interval; the xx:yy:zz entries are placeholders):

Outer Taskloop SIMD thread 0 running on cpu 0
Outer Taskloop SIMD thread 1 running on cpu 4

                                                                         Current Thread             Previous Threads

Inner Task thread 0 of Outer Task 0 on cpu 0 (E-HH:HH:HH S-xx:yy:zz) (E-xx:yy:zz S-xx:yy:zz) (E-xx:yy:zz S-xx:yy:zz)...
Inner Task thread 1 of Outer Task 0 on cpu 1 (E-GG:GG:GG S-xx:yy:zz) (E-xx:yy:zz S-xx:yy:zz) (E-xx:yy:zz S-xx:yy:zz)...
Inner Task thread 2 of Outer Task 0 on cpu 2 (E-FF:FF:FF S-xx:yy:zz) (E-xx:yy:zz S-xx:yy:zz) (E-xx:yy:zz S-xx:yy:zz)...
Inner Task thread 3 of Outer Task 0 on cpu 3 (E-EE:EE:EE S-xx:yy:zz) (E-xx:yy:zz S-xx:yy:zz) (E-xx:yy:zz S-xx:yy:zz)...

Inner Task thread 0 of Outer Task 1 on cpu 4 (E-DD:DD:DD S-xx:yy:zz) (E-xx:yy:zz S-xx:yy:zz) (E-xx:yy:zz S-xx:yy:zz)...
Inner Task thread 1 of Outer Task 1 on cpu 5 (E-CC:CC:CC S-xx:yy:zz) (E-xx:yy:zz S-xx:yy:zz) (E-xx:yy:zz S-xx:yy:zz)...
Inner Task thread 2 of Outer Task 1 on cpu 6 (E-BB:BB:BB S-xx:yy:zz) (E-xx:yy:zz S-xx:yy:zz) (E-xx:yy:zz S-xx:yy:zz)...
Inner Task thread 3 of Outer Task 1 on cpu 7 (E-AA:AA:AA S-xx:yy:zz) (E-xx:yy:zz S-xx:yy:zz) (E-xx:yy:zz S-xx:yy:zz)...

REMARK: It is good to try CPU affinity if possible. In addition, note that the start times on the other CPUs should be earlier than the earliest end time; otherwise the threads are effectively running one after another instead of overlapping.
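A small affinity sketch assuming the OpenMP 4.0 places mechanism (run, for example, with OMP_PLACES=cores and OMP_PROC_BIND=spread,close so that outer threads are spread across the machine while nested threads stay close to their parent):

program show_binding
  use omp_lib
  implicit none
!$omp parallel num_threads(2) proc_bind(spread)
  print *, 'thread', omp_get_thread_num(), &
           'binding policy for nested regions:', omp_get_proc_bind()
!$omp end parallel
end program show_binding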