Segmentation Fault in Your Program
Segmentation Fault in Your Program
It is possible to run the calculation on high performance computer (HPC) using a tons of CPU cores, says 1200 cores. For a large number of CPU cores, Message Passing Interface (MPI) can help you for running your program in parallel over multiple CPU processors and/or multiple compute nodes. Nowadays, there is many MPI candidates which is powerful, user-friendly, and free, such as OpenMPI, Intel MPI, MPICH, and MVAPICH. Calculation can be done successfully, without any error. However, most of people frequently confront the error when using pure MPI or Hybrid (MPI+OpenMP) for running the program or application in parallel.
For example, if you have found the standard error message that similar to this
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 182825 RUNNING AT COMPUTE-100
= EXIT CODE: 6
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 182825 RUNNING AT COMPUTE-100
= EXIT CODE: 6
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
Intel(R) MPI Library troubleshooting guide:
https://software.intel.com/node/561764
===================================================================================
This error is called segmentation fault. Typically, the segmentation fault is error in the program or application that you use, not in MPI. Therefore, before running any parallel calculation using MPI, you should carefully check that your program/application is designed for this task of calculation. To overcome the segmentation fault, you can use the debugger, such as gdb, ddd, totalview, and padb for debugging segmentation fault in your program. In Linux, you can try first with "gdb" or "ddd". But, these tools are serial debuggers. If you want to use parallel debugger, please use "totalview" as describes below.
For gdb, you can use:
mpiexec -np 4 xterm -e gdb ./your_program
For ddd, you can use
mpiexec -np 4 ddd ./foo/your_program
Moreover, the "totalview" debugger allows multiple levels of debugging for MPI programs. If you need to debug your application without any information from the MPICH stack, you just need to compile your program with
mpicc -g
and run your application
totalview mpiexec -a -f machinefile ./your_program
Rangsiman Ketkaew