This is a place to share experiences with CFOUR. Feel free to edit and/or add more to this page.
I have compiled two versions of CFOUR 1.0 on steno - with and without OpenMPI. Both compilations are linked to threaded MKL (which works exactly like sequential MKL with the correct environment variables).
I have placed submit scripts here (on steno):
Parallel (with OpenMPI): /kemi/andersx/scripts/submit_cfour_openmpi_threads.sh
The script names indicate whether CFOUR uses OpenMPI and that it is linked to parallel (threaded) MKL.
In the parallel script you can set the environment variables CFOUR_NUM_CORES and MKL_NUM_THREADS. CFOUR_NUM_CORES sets the number of CFOUR processes (launched via MPI), and the BLAS routines called by each MPI process will use up to MKL_NUM_THREADS cores. Each CFOUR process launches its own MKL threads, so the total you request with ConsumableCpus(x) must be the two multiplied. For instance, CFOUR_NUM_CORES=2 and MKL_NUM_THREADS=4 will require 8 cores in total.
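The core count above can be sketched as a small shell calculation. This is only an illustration of the arithmetic, not an excerpt from the submit scripts; the helper name total_cores and the variable TOTAL_CORES are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of the submit scripts):
# total cores = number of CFOUR MPI processes x MKL threads per process.
total_cores() {
    # $1 = CFOUR_NUM_CORES, $2 = MKL_NUM_THREADS
    echo $(( $1 * $2 ))
}

CFOUR_NUM_CORES=2
MKL_NUM_THREADS=4

# TOTAL_CORES is an illustrative name for the value that goes into ConsumableCpus(x).
TOTAL_CORES=$(total_cores "$CFOUR_NUM_CORES" "$MKL_NUM_THREADS")
echo "ConsumableCpus($TOTAL_CORES)"
```

With the values from the example above (2 and 4) this prints ConsumableCpus(8).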
In the serial script you can set MKL_NUM_THREADS; leave it at 1 if you want the behavior of sequential MKL.
As for memory, the total you request with ConsumableMemory(xgb) must be CFOUR_NUM_CORES multiplied by the MEMORY specified in the input file. For instance, CFOUR_NUM_CORES=4 and *CFOUR(MEM_UNIT=GB,MEMORY=8) implies ConsumableMemory(32gb), since each CFOUR process will allocate 8 GB. Increasing MKL_NUM_THREADS does not require extra RAM.
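The memory request follows the same pattern. A minimal sketch, assuming MEM_UNIT=GB in the input file; the helper name total_memory_gb and the variables MEMORY_GB and TOTAL_MEMORY_GB are invented for this example and do not appear in the submit scripts.

```shell
#!/bin/sh
# Hypothetical helper (not part of the submit scripts):
# total memory = number of CFOUR MPI processes x MEMORY per process (in GB).
# MKL_NUM_THREADS is deliberately absent: more MKL threads need no extra RAM.
total_memory_gb() {
    # $1 = CFOUR_NUM_CORES, $2 = MEMORY from the input file (assuming MEM_UNIT=GB)
    echo $(( $1 * $2 ))
}

CFOUR_NUM_CORES=4
MEMORY_GB=8   # illustrative name for MEMORY=8 in *CFOUR(MEM_UNIT=GB,MEMORY=8)

TOTAL_MEMORY_GB=$(total_memory_gb "$CFOUR_NUM_CORES" "$MEMORY_GB")
echo "ConsumableMemory(${TOTAL_MEMORY_GB}gb)"
```

With the values from the example above (4 and 8) this prints ConsumableMemory(32gb).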
In my experience, heavy coupled cluster calculations do not scale very well beyond CFOUR_NUM_CORES of 2-4, but this of course depends on what kind of jobs you are running. Generally, the more I/O a job requires, the fewer CFOUR_NUM_CORES you should use, since the processes will often do I/O to the disks heavily and simultaneously. For the non-parallelized parts of CFOUR (e.g. MP2), use CFOUR without MPI but add as many MKL_NUM_THREADS as you like.
For parallel coupled cluster calculations you need to have ABCDTYPE=AOBASIS in your input file and use either CC_PROGRAM=VCC or CC_PROGRAM=ECC. I think the ECC option is the faster and more efficient of the two.
You need to have a GENBAS file located in the directory from which you submit the job. The output from CFOUR will be streamed continuously into a .log file in the directory where you submitted your job.
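Since a missing GENBAS file is easy to overlook, a quick check before submitting can save a failed job. The following is only a sketch; the function name check_genbas is made up here and is not something the submit scripts provide.

```shell
#!/bin/sh
# Hypothetical pre-submit check (not part of the submit scripts):
# succeeds only if a GENBAS file exists in the given directory
# (defaults to the current directory).
check_genbas() {
    dir=${1:-.}
    if [ -f "$dir/GENBAS" ]; then
        return 0
    fi
    echo "No GENBAS file found in $dir" >&2
    return 1
}

# Possible usage from the directory you submit from:
#   check_genbas && ~/scripts/submit_cfour_threads.sh my_input_file.inp
```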
Both scripts are used like this (or something like it):
[andersx@fend03 ZPVC]$ ~/scripts/submit_cfour_threads.sh my_input_file.inp
Let me know if there are any problems!
Ph.D. student at Computational Chemistry