Multi-threading and MPI support
JutulDarcy uses threads by default, but advanced options can improve performance significantly for larger models.
Overview of parallel support
There are two main ways of exploiting multiple cores in Jutul/JutulDarcy: threads and MPI. Threads are automatically used for assembly and can be used for parts of the linear solve. If you require the best performance, you have to go to MPI, where the linear solvers can use a parallel BoomerAMG preconditioner via HYPRE.jl.
MPI parallelization
MPI parallelizes all aspects of the solver using domain decomposition and allows a simulation to be divided between multiple nodes in e.g. a supercomputer. It is significantly more cumbersome to use than standard simulations as the program must be launched in MPI mode. This is typically a non-interactive process where you launch your MPI processes and once they complete the simulation the result is available on disk. The MPI parallel option uses a combination of MPI.jl, PartitionedArrays.jl and HYPRE.jl.
Thread parallelization
JutulDarcy also supports threads. By default, this only parallelizes property evaluations and assembly of the linear system. For many problems, the linear solve is the limiting factor for performance. Using threads is automatic if you start Julia with multiple threads.
An experimental thread-parallel backend for matrices and linear algebra can be enabled by setting `backend = :csr` in the call to `setup_reservoir_model`. This backend provides additional features such as a parallel zero-overlap ILU(0) implementation and parallel apply for AMG, but these features are still work in progress.
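For example, the backend can be selected during model setup. A minimal sketch, where `reservoir`, `sys` and `wells` are placeholders for your usual setup code:

```julia
using JutulDarcy

# reservoir, sys and wells are assumed to come from your usual model setup
model, parameters = setup_reservoir_model(reservoir, sys,
    wells = wells,
    backend = :csr) # enable the experimental thread-parallel CSR backend
```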
- Starting Julia with multiple threads (for example `julia --project=. --threads=4`) will allow JutulDarcy to make use of threads to speed up calculations.
- The default behavior is to only speed up assembly of equations.
- The linear solver is often the most expensive part. As mentioned above, parts of it can be parallelized by choosing the `:csr` backend when setting up the model.
- Running with a parallel preconditioner can lead to higher iteration counts, since the ILU(0) preconditioner changes in parallel.
- Heavy compositional models benefit a lot from using threads.
- Threads are easy to use and can give a bit of benefit for large models.
Mixed-mode parallelism
You can mix the two approaches: adding multiple threads to each MPI process lets threads speed up assembly and property evaluations within each process.
Tips for parallel runs
A few hints when you are looking at performance:
- Reservoir simulations are memory bound: you cannot expect that 10 threads give 10x performance.
- CPUs can often boost single-core performance when resources are available.
- MPI in JutulDarcy is less tested than single-process simulations, but is natural for larger models.
- There is always some cost to parallelism: if you are running a large ensemble with limited compute, many serial runs handled by Julia's task system is usually a better option (see the sketch after this list).
- Adding the maximum number of processes does not always give the best performance. Typically you want at least 10 000 cells per process, but this can be case dependent.
- Example: a 200k cell model on a laptop went from 235 s with 1 process to 145 s with 4 processes.
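A minimal sketch of the ensemble alternative, assuming `cases` is a vector of fully set up simulation cases and that `info_level = -1` is used to silence per-simulation output:

```julia
using JutulDarcy

# cases is assumed to be a Vector of already set up simulation cases
tasks = [Threads.@spawn simulate_reservoir(case, info_level = -1) for case in cases]
# Each serial simulation runs as its own task; collect the results
results = fetch.(tasks)
```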
Solving with MPI in practice
There are a few adjustments needed before a script can be run in MPI.
Setting up the environment
You will have to set up an environment with the following packages under Julia 1.9+: PartitionedArrays, MPI, JutulDarcy and HYPRE. This is generally the best performing solver setup available, even if you are working in a shared memory environment.
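For example, the packages can be added to the active project like this:

```julia
using Pkg
Pkg.add(["PartitionedArrays", "MPI", "JutulDarcy", "HYPRE"])
```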
Writing the script
Write your script as usual. The following options must then be set:

- `setup_reservoir_model` should have the extra keyword argument `split_wells = true`. We also recommend `backend = :csr` for the best performance.
- `simulate_reservoir` or `setup_reservoir_simulator` should get the optional argument `mode = :mpi`.

These changes are shown in the sketch below.
You must then run the file using the appropriate `mpiexec` as described in the MPI.jl documentation. Specialized functions will be called by `simulate_reservoir` when this is the case. We document them here, even if we recommend using the high-level version of this interface:
JutulDarcy.simulate_reservoir_parray Function

```julia
simulate_reservoir_parray(case, mode = :mpi; kwarg...)
```

Run simulation with parray. This function is primarily for testing. `simulate_reservoir` can do the same job by passing the correct mode.
Checklist for running in MPI
- Install and load the following packages at the top of your script: PartitionedArrays, MPI, HYPRE.
- (Recommended): Put `MPI.Init()` at the top of your script.
- Make sure that `split_wells = true` is set in your model setup.
- Set `output_path` when running the simulation (otherwise the results will not be stored anywhere).
- Set `mode = :mpi` when running the simulation.
A few useful functions:

- `MPI.install_mpiexecjl()` installs an `mpiexec` wrapper that works "out of the box" with your current Julia setup.
- `MPI.mpiexec()` gives you the path to the executable and all environment variables.
A typical command to launch an MPI script from within Julia:

```julia
using MPI
n = 5 # number of processes
script_to_run = "my_script.jl"
run(`$(mpiexec()) -n $n $(Base.julia_cmd()) --project=$(Base.active_project()) $script_to_run`)
```
Adding threads to the command will make JutulDarcy use both threads and processes, as in the sketch below.
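For example, reusing the variables from the command above, each process can be given two threads:

```julia
# Same launch command, but every MPI process gets 2 threads for assembly
run(`$(mpiexec()) -n $n $(Base.julia_cmd()) --threads=2 --project=$(Base.active_project()) $script_to_run`)
```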
⚠️ Running a script in MPI means that all parts of the script will run on each process! If you want to do data analysis, you will have to either wrap your code in `if MPI.Comm_rank(MPI.COMM_WORLD) == 0` or do the data analysis in serial (recommended).
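A minimal sketch of the rank-zero guard:

```julia
using MPI

if MPI.Comm_rank(MPI.COMM_WORLD) == 0
    # Only the root process runs this block, e.g. post-processing of
    # the results written to output_path
end
```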
Limitations of running in MPI
MPI can be cumbersome to use when compared to a standard Julia script, and the current implementation relies on the model being set up on each processor before subdivision. This can be quite memory intensive during startup.
You should be familiar with the MPI programming model to use this feature. See MPI.jl for more details on how MPI is handled in Julia specifically.
For larger models, compiling the Standalone reservoir simulator is highly recommended.
Note
MPI consolidates results by writing files to disk. Unless you plan to work with the distributed in-memory states returned by the `simulate!` call, it is best to specify the `output_path` optional argument to `setup_reservoir_simulator`. After the simulation, that folder will contain output just as if you had run the case in serial.