|
Title : Automatic Differentiation of Parallel OpenMP Programs
Area : Parallel Design Patterns
Tool : Tapenade, OpenMP
Summary : In recent years, most engineering and scientific applications have been written in Fortran or C, with OpenMP directives for parallelization. Scientists and engineers can use Automatic Differentiation tools such as Tapenade to perform sensitivity analysis and design optimization for critical processes and algorithms. Calling the differentiated functions and subroutines is an efficient way to obtain both the original results and the derivative values in Tangent or Adjoint mode. This paper discusses only the Loop Structure, a basic design pattern that requires parallel operation. The Tangent, Tangent vector, Adjoint and Adjoint vector modes have been examined for best performance.
Original Loop Structure for Tangent and Adjoint Transformation:
C$OMP parallel do private(i,tmp) shared(L,x,y,z)
      do i=1,L
        tmp=sin(z(i))
        x(i)=y(i)*tmp
      enddo
C$OMP end parallel do
After Tangent or Adjoint differentiation with Tapenade, the transformed programs are expected to look like Figure 1 and Figure 2.
______________________________________________________________________________
C$OMP parallel do private(i,tmp,tmp_omp) shared(L,x,x_omp,y,y_omp,z,z_omp)
      do i=1,L
        tmp = sin(z(i))
        x(i) = y(i)*tmp

!Tangent Transformation                         !Adjoint Transformation
        tmp_omp = z_omp(i)*cos(z(i))              y_omp(i) = y_omp(i) + tmp*x_omp(i)
        x_omp(i) = y_omp(i)*tmp + y(i)*tmp_omp    tmp_omp = y(i)*x_omp(i)
      end do                                      z_omp(i) = z_omp(i) + cos(z(i))*tmp_omp
                                                end do
C$OMP end parallel do
_______________________________________________________________________________
Figure 1. Tangent and Adjoint Transformation
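The two loop bodies in Figure 1 can be checked against each other with the usual dot-product test: for a tangent seed (yd, zd) and an adjoint seed xb, the tangent output xd and the adjoint outputs (yb, zb) should satisfy xb.xd = yb.yd + zb.zd. The following is a minimal sketch of that check, not Tapenade output. It is written in free-form Fortran with `!$omp` sentinels instead of the fixed-form `C$OMP` used in the figures; the program name, the subroutine names `loop_tangent` and `loop_adjoint`, the size `n`, the seed names and all test values are assumptions made for illustration.

```fortran
program adjoint_dot_product_test
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 8              ! hypothetical problem size
  real(dp) :: x(n), y(n), z(n)
  real(dp) :: yd(n), zd(n), xd(n)          ! tangent seeds (yd, zd) and result xd
  real(dp) :: xb(n), yb(n), zb(n)          ! adjoint seed xb and results (yb, zb)
  real(dp) :: lhs, rhs
  integer  :: i

  ! Arbitrary test data (illustrative values only).
  do i = 1, n
    y(i)  = 1.0_dp + 0.1_dp*i
    z(i)  = 0.3_dp*i
    yd(i) = 0.5_dp*i
    zd(i) = 1.0_dp - 0.2_dp*i
    xb(i) = 2.0_dp + 0.05_dp*i
  end do
  yb = 0.0_dp                              ! adjoint accumulators start at zero
  zb = 0.0_dp

  call loop_tangent(n, x, xd, y, yd, z, zd)
  call loop_adjoint(n, x, xb, y, yb, z, zb)

  lhs = dot_product(xb, xd)
  rhs = dot_product(yb, yd) + dot_product(zb, zd)
  print *, 'xb.xd         =', lhs
  print *, 'yb.yd + zb.zd =', rhs
  if (abs(lhs - rhs) > 1.0e-10_dp*(abs(lhs) + 1.0_dp)) then
    error stop 'dot-product test failed'
  end if

contains

  ! Tangent loop body as in Figure 1 (left column).
  subroutine loop_tangent(L, x, x_omp, y, y_omp, z, z_omp)
    integer,  intent(in)  :: L
    real(dp), intent(in)  :: y(L), y_omp(L), z(L), z_omp(L)
    real(dp), intent(out) :: x(L), x_omp(L)
    real(dp) :: tmp, tmp_omp
    integer  :: i
    !$omp parallel do private(i,tmp,tmp_omp) shared(L,x,x_omp,y,y_omp,z,z_omp)
    do i = 1, L
      tmp      = sin(z(i))
      x(i)     = y(i)*tmp
      tmp_omp  = z_omp(i)*cos(z(i))
      x_omp(i) = y_omp(i)*tmp + y(i)*tmp_omp
    end do
    !$omp end parallel do
  end subroutine loop_tangent

  ! Adjoint loop body as in Figure 1 (right column).
  subroutine loop_adjoint(L, x, x_omp, y, y_omp, z, z_omp)
    integer,  intent(in)    :: L
    real(dp), intent(in)    :: x_omp(L), y(L), z(L)
    real(dp), intent(inout) :: y_omp(L), z_omp(L)
    real(dp), intent(out)   :: x(L)
    real(dp) :: tmp, tmp_omp
    integer  :: i
    !$omp parallel do private(i,tmp,tmp_omp) shared(L,x,x_omp,y,y_omp,z,z_omp)
    do i = 1, L
      tmp      = sin(z(i))
      x(i)     = y(i)*tmp
      y_omp(i) = y_omp(i) + tmp*x_omp(i)
      tmp_omp  = y(i)*x_omp(i)
      z_omp(i) = z_omp(i) + cos(z(i))*tmp_omp
    end do
    !$omp end parallel do
  end subroutine loop_adjoint

end program adjoint_dot_product_test
```

The sketch can be built, for example, with `gfortran -fopenmp`. Without the OpenMP flag the directives are treated as comments and the same check runs serially.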
_______________________________________________________________________________
      call omp_set_nested(.TRUE.)
C$OMP parallel private(i,tmp,tmp_omp,nd) shared(L,x,x_omp,y,y_omp,z,z_omp,nbdirs)
C$OMP do
      do i=1,L
        tmp = sin(z(i))
        x(i) = y(i)*tmp
C$OMP parallel private(i,tmp,tmp_omp,nd) shared(x_omp,y,y_omp,z,z_omp,nbdirs)
C$OMP do
        do nd=1,nbdirs

!Tangent Vector Transformation                  !Adjoint Vector Transformation
          tmp_omp(nd) = z_omp(nd, i)*cos(z(i))      y_omp(nd, i) = y_omp(nd, i) &
          x_omp(nd, i) = y_omp(nd, i)*tmp &             + tmp*x_omp(nd, i)
              + y(i)*tmp_omp(nd)                    tmp_omp(nd) = y(i)*x_omp(nd, i)
        end do                                      z_omp(nd, i) = z_omp(nd, i) &
      end do                                            + cos(z(i))*tmp_omp(nd)
                                                  end do
                                                end do
C$OMP end parallel do
______________________________________________________________________________
Figure 2. Tangent Vector and Adjoint Vector Transformation
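As one possible runnable reading of the tangent vector column of Figure 2, the sketch below nests a parallel loop over the directions `nd` inside the parallel loop over `i`. It is an illustration under several assumptions, not Tapenade output: it is free-form Fortran, the inner region relies on default sharing so that `i`, `tmp` and `tmp_omp` of the enclosing thread stay visible to the inner team, and it calls `omp_set_max_active_levels` in place of `omp_set_nested`, which is deprecated since OpenMP 5.0. The program name, the sizes and the test values are hypothetical.

```fortran
program tangent_vector_sketch
  !$ use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: L = 6, nbdirs = 3   ! hypothetical sizes
  real(dp) :: x(L), y(L), z(L)
  real(dp) :: x_omp(nbdirs,L), y_omp(nbdirs,L), z_omp(nbdirs,L)
  real(dp) :: tmp, tmp_omp(nbdirs), expected, err
  integer  :: i, nd

  ! Arbitrary inputs and derivative seeds (illustrative values only).
  do i = 1, L
    y(i) = 1.0_dp + 0.1_dp*i
    z(i) = 0.3_dp*i
    do nd = 1, nbdirs
      y_omp(nd,i) = 0.1_dp*nd
      z_omp(nd,i) = 1.0_dp/(nd + i)
    end do
  end do

  ! Allow two active levels of parallelism (only compiled with OpenMP enabled).
  !$ call omp_set_max_active_levels(2)

  !$omp parallel do private(i,tmp,tmp_omp)
  do i = 1, L
    tmp  = sin(z(i))
    x(i) = y(i)*tmp
    ! Inner team shares i, tmp and tmp_omp of the enclosing thread;
    ! each inner iteration touches only its own element nd.
    !$omp parallel do private(nd)
    do nd = 1, nbdirs
      tmp_omp(nd) = z_omp(nd,i)*cos(z(i))
      x_omp(nd,i) = y_omp(nd,i)*tmp + y(i)*tmp_omp(nd)
    end do
    !$omp end parallel do
  end do
  !$omp end parallel do

  ! Compare every direction with the closed-form derivative of x = y*sin(z).
  err = 0.0_dp
  do i = 1, L
    do nd = 1, nbdirs
      expected = y_omp(nd,i)*sin(z(i)) + y(i)*z_omp(nd,i)*cos(z(i))
      err = max(err, abs(x_omp(nd,i) - expected))
    end do
  end do
  print *, 'max abs error over all directions =', err
  if (err > 1.0e-12_dp) error stop 'tangent vector check failed'
end program tangent_vector_sketch
```

Whether the inner team actually gets more than one thread depends on the runtime and on the number of available cores; the numerical result does not depend on it.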
Preliminary results have been obtained for asynchronous work-sharing. The Tangent and Tangent vector modes do not need additional work-sharing, because the derivative statements have the same structure as the original program statements, with gradient terms added. For the Adjoint and Adjoint vector programs, parallel sections can be used to further improve performance. Figure 3 shows an example of the Adjoint transformation optimized with parallel sections.
___________________________________________________________________________________
C$OMP parallel sections private(i,tmp,tmp_omp) &
      shared(L,x,x_omp,y,y_omp,z,z_omp)
C$OMP section
C$OMP parallel do private(i,tmp) shared(L,x,x_omp,y,y_omp,z)
      do i=1,L
        tmp = sin(z(i))
        x(i)=y(i)*tmp
        y_omp(i) = y_omp(i) + tmp*x_omp(i)
      end do
C$OMP end parallel do
C$OMP section
C$OMP parallel do private(i,tmp,tmp_omp) shared(L,x_omp,y,z,z_omp)
      do i=1,L
        tmp_omp = y(i)*x_omp(i)
        z_omp(i) = z_omp(i) + cos(z(i))*tmp_omp
      end do
C$OMP end parallel do
C$OMP end parallel sections
___________________________________________________________________________________
Figure 3. Adjoint Transformation with program optimization using parallel sections
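The split in Figure 3 is valid because the two sections write disjoint arrays: the first writes `x` and `y_omp`, the second writes only `z_omp`, and both only read `y`, `z` and `x_omp`. The sketch below, in free-form Fortran, compares the sectioned version with a serial fused loop of the Figure 1 adjoint form. The program name, the reference arrays `y_ref` and `z_ref`, the size and the test values are assumptions for illustration, and `omp_set_max_active_levels` is used here so that the inner `parallel do` regions may become active.

```fortran
program adjoint_sections_sketch
  !$ use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: L = 8               ! hypothetical problem size
  real(dp) :: x(L), y(L), z(L), x_omp(L)
  real(dp) :: y_omp(L), z_omp(L)            ! results of the sectioned version
  real(dp) :: y_ref(L), z_ref(L)            ! results of the fused serial loop
  real(dp) :: tmp, tmp_omp, err
  integer  :: i

  ! Arbitrary inputs and adjoint seed (illustrative values only).
  do i = 1, L
    y(i)     = 1.0_dp + 0.1_dp*i
    z(i)     = 0.3_dp*i
    x_omp(i) = 2.0_dp + 0.05_dp*i
  end do
  y_omp = 0.0_dp
  z_omp = 0.0_dp
  y_ref = 0.0_dp
  z_ref = 0.0_dp

  ! Reference: fused adjoint loop, run serially.
  do i = 1, L
    tmp      = sin(z(i))
    y_ref(i) = y_ref(i) + tmp*x_omp(i)
    tmp_omp  = y(i)*x_omp(i)
    z_ref(i) = z_ref(i) + cos(z(i))*tmp_omp
  end do

  !$ call omp_set_max_active_levels(2)

  !$omp parallel sections private(i,tmp,tmp_omp)
  !$omp section
  ! Section 1: original computation and the y adjoint.
  !$omp parallel do private(i,tmp)
  do i = 1, L
    tmp      = sin(z(i))
    x(i)     = y(i)*tmp
    y_omp(i) = y_omp(i) + tmp*x_omp(i)
  end do
  !$omp end parallel do
  !$omp section
  ! Section 2: the z adjoint, independent of section 1.
  !$omp parallel do private(i,tmp_omp)
  do i = 1, L
    tmp_omp  = y(i)*x_omp(i)
    z_omp(i) = z_omp(i) + cos(z(i))*tmp_omp
  end do
  !$omp end parallel do
  !$omp end parallel sections

  err = max(maxval(abs(y_omp - y_ref)), maxval(abs(z_omp - z_ref)))
  print *, 'max abs difference from fused loop =', err
  if (err > 1.0e-12_dp) error stop 'sections check failed'
end program adjoint_sections_sketch
```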
In addition, vectorization and data prefetching at compile time are important for obtaining the best performance from parallel OpenMP programs.
From 2010 onward, second and higher order derivatives of OpenMP and Coarray Fortran programs will be tested and compared using Tapenade. Comments are welcome; please contact the author at karminghenry@sinaman.com.
Company : Cluster Technology Centre (Tentative Company) |