Sensitivities, adjoints and optimization
Jutul.jl is built from the ground up to support gradient-based optimization problems, including both data assimilation/parameter calibration and control problems.
An example application from JutulDarcy.jl demonstrates many of these functions: A fully differentiable geothermal doublet: History matching and control optimization
Objective functions
There are two main types of objective functions supported in Jutul: those that are evaluated globally over all time-steps, and those that are evaluated locally (typically as a sum over all time-steps).
Jutul.AbstractJutulObjective — Type
Abstract type for Jutul objectives.
For either type of objective, you must implement the correct interface. This can either be done by passing a function directly, with Jutul guessing from the number of arguments what kind of objective it is, or by explicitly making a subtype that is a Julia callable struct.
These functions take in the step_info Dict, which is worth having a look at if you plan on writing objective functions:
Jutul.optimization_step_info — Function
optimization_step_info(step::Int, time::Real, dt; kwarg...)

Optimization step information is normally not set up manually, but the output from this function will be passed to objective functions as step_info. The output is a Dict with the following fields:
Fields
- :time - The time at the end of the step. To get the time at the start of the step, use step_info[:time] - step_info[:dt].
- :dt - The time step size for this step.
- :step - The step number, starting at 1. Note that this is the report step, and multiple step_info entries could have the same step if substeps are used.
- :Nstep - The total number of steps in the simulation.
- :substep - The substep number, starting at 1. This is used to indicate that multiple steps are taken within a single step.
- :substep_global - The global substep number, starting at 1; this counter does not reset between global steps.
- :Nsubstep_global - The total number of substeps in the simulation.
- :total_time - The total time of the simulation (i.e. the time at the final step and substep).
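As a small hypothetical illustration (not part of Jutul's API), the :time and :dt fields can be combined to weight a per-step contribution by how much of the step falls after some start time t0:

```julia
# Hypothetical helper (not part of Jutul's API): fraction of a step that
# lies after a start time t0, using only the documented step_info fields.
function weight_after(step_info, t0)
    t_end = step_info[:time]            # time at the end of the step
    t_start = t_end - step_info[:dt]    # documented way to get the step start
    # Length of the part of [t_start, t_end] that lies after t0:
    overlap = max(t_end - max(t_start, t0), 0.0)
    return overlap / step_info[:dt]
end

info = Dict{Symbol, Any}(:time => 10.0, :dt => 4.0, :step => 3)
weight_after(info, 8.0)   # the step covers [6, 10], so half of it lies after t0 = 8
```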
Global objectives
These are objectives that are defined over the entire simulation time. This means that you define the objective function as a single function that takes in the solution for all time-steps together with forces, time-step information, initial state and input data used to set up the model (if any).
Jutul.AbstractGlobalObjective — Type
Abstract type for objectives given as a global objective function of the form:

F(model, state0, states, step_infos, forces, input_data)

Note that if substeps are enabled by setting output_substates=true in the simulator setup, the length of forces and states will be dynamic and equal to the length of step_infos. To recover the state at a specific step, use the :step field of the corresponding entry in the step_infos array.
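A minimal sketch of a global objective with this six-argument signature, using plain Dicts as stand-ins for the simulation output (the :rate field is an illustrative assumption, not a Jutul state field):

```julia
# Sketch of a global objective: integrate a per-state rate over all
# (sub)steps. In real use, states/step_infos come from the simulator;
# here they are mocked as plain Dicts and the :rate field is illustrative.
function total_produced(model, state0, states, step_infos, forces, input_data)
    total = 0.0
    for (state, info) in zip(states, step_infos)
        total += state[:rate] * info[:dt]
    end
    return total
end

states = [Dict(:rate => 2.0), Dict(:rate => 3.0)]
infos = [Dict(:dt => 1.0), Dict(:dt => 2.0)]
total_produced(nothing, nothing, states, infos, nothing, nothing)   # 2.0*1.0 + 3.0*2.0 = 8.0
```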
Jutul.WrappedGlobalObjective — Type
WrappedGlobalObjective(objective)

A global (in time) objective that wraps a function/callable. This type automatically wraps functions/callable structs that are passed to the optimization interface if they have the global signature (six arguments).
Local/sum objectives
Jutul.AbstractSumObjective — Type
Abstract type for objectives given as a sum of function values of the form:

F(model, state, dt, step_info, forces)

evaluated for each step, so that the objective is the sum of all of these values. If you want the objective to depend on a single step only, you can check the :step field of step_info and return zero for the other steps.
Jutul.WrappedSumObjective — Type
WrappedSumObjective(objective)

An objective that is a sum of function values, evaluated for each step, defined by a function. This type automatically wraps functions/callable structs that are passed to the optimization interface if they have the sum signature (five arguments).
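A minimal sketch of a sum objective with the five-argument signature; the :pressure field and the choice of report step 5 are illustrative assumptions:

```julia
# Sketch of a sum objective that only contributes at report step 5, by
# checking step_info[:step]. The state field :pressure is illustrative.
function pressure_at_step_5(model, state, dt, step_info, forces)
    step_info[:step] == 5 || return 0.0
    return dt * state[:pressure]
end

# Summing over mocked steps, only step 5 contributes:
steps = [(Dict(:pressure => 10.0 + i), 0.5, Dict{Symbol, Any}(:step => i)) for i in 1:6]
sum(pressure_at_step_5(nothing, s, dt, info, nothing) for (s, dt, info) in steps)   # 0.5 * 15.0 = 7.5
```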
Generic optimization interface
The generic optimization interface is very general, handling gradients with respect to any parameter used in a function that sets up a complete simulation case from an AbstractDict. It makes use of AbstractDifferentiation.jl, and the default configuration assumes that your setup function can be differentiated with the ForwardDiff backend. In practice, this means that you must take care when initializing arrays and other types so that they can hold the AD type (e.g. avoid use of zeros without a type). In addition, there may be a large number of calls to the setup function, which can sometimes be slow. Alternatively, the numerical parameter optimization interface can be used, which only differentiates with respect to numerical parameters inside the model. A hybrid approach is also supported for the generic optimization interface by setting the deps and deps_ad arguments to optimize. This can be much faster, but assumes that the optimization variables only affect the numerical parameters/variables of the model (values stored in the Dicts from setup_parameters and setup_state0) and not any values that exist e.g. inside the model itself.
Defining the parameter object
Jutul.DictOptimization.DictParameters — Type
DictParameters(parameters)
DictParameters(parameters::AbstractDict, setup_function = missing;
strict = true,
verbose = true,
active_type = Float64
)

Set up a DictParameters object for optimization. Optionally, a setup function can be provided; it takes an instance with the same keys as parameters together with a step_info dictionary, and should return a JutulCase set up from the parameters in the Dict.
Optional keyword arguments:
- strict: If true, the optimization will throw an error if any of the parameters is not set with at least one of the upper or lower bounds.
- verbose: If true, the optimization will print information about the optimization process.
- active_type: The type of the parameters that are considered active in the optimization. Defaults to Float64. This is used to determine which parameters are active and should be optimized; all entries (including entries in nested dictionaries) of the parameters dictionary must be of this type or an array with this type as element type.
Defining constraints and free parameters
Jutul.DictOptimization.free_optimization_parameter! — Function
free_optimization_parameter!(dopt, "parameter_name", rel_min = 0.01, rel_max = 100.0)
free_optimization_parameter!(dopt, ["dict_name", "parameter_name"], abs_min = -8.0, abs_max = 7.0)

Free an existing parameter for optimization in the DictParameters object. This will allow the parameter to be optimized through a call to optimize.
Nesting structures
If your DictParameters has a nesting structure, you can use a vector of strings or symbols to specify the parameter name, e.g. ["dict_name", "parameter_name"] to access the parameter located at ["dict_name"]["parameter_name"].
Setting limits
The limits can be set using the following keyword arguments:
- abs_min: Absolute minimum value for the parameter. If not set, no absolute minimum will be applied.
- abs_max: Absolute maximum value for the parameter. If not set, no absolute maximum will be applied.
- rel_min: Relative minimum value for the parameter. If not set, no relative minimum will be applied.
- rel_max: Relative maximum value for the parameter. If not set, no relative maximum will be applied.
For either of these entries it is possible to pass either a scalar, or an array. If an array is passed, it must have the same size as the parameter being set.
Note that if dopt.strict is set to true, at least one of the upper or lower bounds must be set for free parameters. If dopt.strict is set to false, the bounds are optional and the DictParameters object can be used to compute sensitivities, but the built-in optimization routine assumes that finite limits are set for all parameters.
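The interplay between absolute and relative limits can be pictured with a short plain-Julia sketch. This assumes one plausible reading of the limits (relative limits scale the initial value, and the tightest bound wins); it is not Jutul's internal implementation:

```julia
# Sketch (assumption, not Jutul internals): rel_min/rel_max scale the
# initial value x0, abs_min/abs_max apply directly, tightest bound wins.
function effective_bounds(x0; abs_min = -Inf, abs_max = Inf, rel_min = nothing, rel_max = nothing)
    lo, hi = abs_min, abs_max
    rel_min === nothing || (lo = max(lo, rel_min * x0))
    rel_max === nothing || (hi = min(hi, rel_max * x0))
    return (lo, hi)
end

effective_bounds(2.0, rel_min = 0.01, rel_max = 100.0)                  # (0.02, 200.0)
effective_bounds(2.0, abs_min = 0.1, rel_min = 0.01, rel_max = 100.0)   # (0.1, 200.0)
```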
Other keyword arguments
- initial: Initial value for the parameter. If not set, the current value in dopt.parameters will be used.
- scaler=missing: Optional scaler for the parameter. If not set, no scaling will be applied. Available scalers are:
  - :log: Logarithmic scaling. This scaler uses shifts to avoid issues with zero values.
  - :exp: Exponential scaling.
  - :linear: Linear scaling (scaling to the bounds of the values, guaranteeing values between 0 and 1 for the initial values).
  - linear_limits: Linear scaling with limits (scaling to the bounds of the values, guaranteeing values between 0 and 1 for all values within the limits).
  - reciprocal: Reciprocal scaling.
  - log10: Base-10 logarithmic scaling.
  - log: Base-e logarithmic scaling without shifts.
  - A custom scaler object implementing the DictOptimizationScaler interface.
lumping=missing: Optional lumping array for the parameter. If not set, no lumping will be applied. The lumping array should have the same size as the parameter and contain positive integers. The lumping array defines groups of indices that should be lumped together, i.e. the same value will be used for all indices in the same group. The lumping array should contain all integers from 1 to the maximum value in the array, and all indices in the same group should have the same value in the initial parameter, otherwise an error will be thrown.
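The stated rules for a lumping array can be expressed as a short plain-Julia validation sketch. This implements the documented contract only, not Jutul's internal code:

```julia
# Sketch of the documented lumping contract: the array must match the
# parameter size, contain all integers 1:maximum, and all indices in the
# same group must share the same initial value.
function check_lumping(lumping::AbstractArray{<:Integer}, values::AbstractArray)
    size(lumping) == size(values) || error("lumping must match parameter size")
    m = maximum(lumping)
    all(in(lumping), 1:m) || error("lumping must contain all integers 1:$m")
    group_value = Dict{Int, Any}()
    for (g, v) in zip(lumping, values)
        g >= 1 || error("lumping entries must be positive")
        if haskey(group_value, g)
            group_value[g] == v || error("group $g has inconsistent initial values")
        else
            group_value[g] = v
        end
    end
    return true
end

check_lumping([1, 1, 2, 2], [0.5, 0.5, 3.0, 3.0])   # true: two consistent groups
```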
Jutul.DictOptimization.freeze_optimization_parameter! — Function
freeze_optimization_parameter!(dopt, "parameter_name")
freeze_optimization_parameter!(dopt, ["dict_name", "parameter_name"])
freeze_optimization_parameter!(dopt::DictParameters, parameter_name, val = missing)

Freeze an optimization parameter in the DictParameters object. This will remove the parameter from the optimization targets and set its value to val if provided. Any limits/lumping/scaling settings for this parameter will be removed.
Jutul.DictOptimization.set_optimization_parameter! — Function
set_optimization_parameter!(dopt::DictParameters, parameter_name, value)

Set a specific optimization parameter in the DictParameters object. This function will update the value of the parameter in the dopt.parameters dictionary.
Jutul.DictOptimization.add_optimization_multiplier! — Function
add_optimization_multiplier!(dprm::DictParameters, name_of_target; abs_min = 0.2, abs_max = 5.0)
add_optimization_multiplier!(dprm, target1, target2, target3; abs_min = 0.2, abs_max = 5.0, initial = 2.0)

Add an optimization multiplier that acts on one or more targets to the DictParameters object. The multiplier will be optimized during the optimization process. All parameters with the same multiplier must have the same dimensions.
Optimizing and computing gradients
Jutul.DictOptimization.optimize — Function
optimized_dict = optimize(dopt, objective)
optimize(dopt::DictParameters, objective, setup_fn = dopt.setup_function;
grad_tol = 1e-6,
obj_change_tol = 1e-6,
max_it = 25,
opt_fun = missing,
maximize = false,
simulator = missing,
config = missing,
solution_history = false,
backend_arg = (
use_sparsity = false,
di_sparse = true,
single_step_sparsity = false,
do_prep = true,
),
kwarg...
)

Optimize parameters defined in a DictParameters object using the provided objective function. At least one variable has to be declared free using free_optimization_parameter! prior to calling the optimizer.
Arguments
- dopt::DictParameters: Container with parameters to optimize
- objective: The objective function to minimize (or maximize)
- setup_fn: Function to set up the optimization problem. Defaults to dopt.setup_function
Keyword Arguments
- grad_tol: Gradient tolerance for the stopping criterion
- obj_change_tol: Objective function change tolerance for the stopping criterion
- max_it: Maximum number of iterations
- optimizer: Symbol defining the optimization algorithm to use. Available options are :lbfgs (default) and :lbfgsb (requires LBFGSB.jl to be imported)
- opt_fun: Optional custom optimization function. If missing, L-BFGS will be used. Takes in a NamedTuple containing fields f, g, x0, min, max. Here, f(x) returns the objective function value at x, g(dFdx, x) fills dFdx with the gradient at x, x0 is the initial guess, and min and max are the lower and upper bounds, respectively. The functions u = F.scale(x) and x = F.descale(u) can be used to convert between scaled and unscaled variables. Nominally, the initial values are scaled to the unit cube, and the solution must thus be unscaled before usage. Gradients and internal scaling/descaling are handled automatically.
- maximize: Set to true to maximize the objective instead of minimizing
- simulator: Optional simulator object used in forward simulations
- config: Optional configuration for the setup
- solution_history: If true, stores all intermediate solutions
- deps: One of :case, :parameters, :parameters_and_state0. Defines the dependencies for the adjoint computation. See notes for more details.
- backend_arg: Options for the autodiff backend:
  - use_sparsity: Enable sparsity detection for the objective function
  - di_sparse: Use sparse differentiation
  - single_step_sparsity: Enable single step sparsity detection (if sparsity does not change during timesteps)
  - do_prep: Perform preparation step
Returns
The optimized parameters as a dictionary.
Notes
- The function stores the optimization history and optimized parameters in the input dopt object.
- If solution_history is true, intermediate solutions are stored in dopt.history.solutions.
- The default optimization algorithm is L-BFGS with box constraints.
Type of dependencies in deps
The deps argument is used to set the type of dependency the case setup function has on the active optimization parameters. The default, :case, is fully general and allows dependencies on everything contained within the case instance. This can be slow, however, as the setup function must be called for every time-step. If you know that the model instance and forces are independent of the active parameters, you can use deps = :parameters_and_state0. If there is no dependence on state0, you can set deps = :parameters. This can substantially speed up the optimization process, but as there is no programmatic verification that this assumption is true, it should be used with care.
This interface is dependent on the model supporting use of vectorize_variables! and devectorize_variables! for state0/parameters, which should be the case for most Jutul models.
Jutul.DictOptimization.parameters_gradient — Function
parameters_gradient(dopt::DictParameters, objective, setup_fn = dopt.setup_function)

Compute the gradient of the objective function with respect to the parameters defined in the DictParameters object. The gradient is returned as a dictionary with the same structure as the input parameters, where each entry is a vector of gradients for each parameter. Only gradients with respect to free parameters will be computed.
Numerical parameter optimization interface
Jutul.solve_adjoint_sensitivities — Function
solve_adjoint_sensitivities(model, states, reports_or_timesteps, G; extra_timing = false, state0 = setup_state(model), forces = setup_forces(model), raw_output = false, kwarg...)

Compute sensitivities of the model parameters for the objective function G.
The objective function is at the moment assumed to be a sum over all states of the form: obj = Σₙ G(model, stateₙ, dtₙ, n, forces_for_step_n)
Solves the adjoint equations: for model equations F, the gradient with respect to the parameters p is ∇ₚG = Σₙ (∂Fₙ / ∂p)ᵀ λₙ, where n ∈ [1, N]. The Lagrange multipliers λₙ are obtained from the adjoint equations (∂Fₙ / ∂xₙ)ᵀ λₙ = - (∂G / ∂xₙ)ᵀ - (∂Fₙ₊₁ / ∂xₙ)ᵀ λₙ₊₁, where the last term is omitted for the final step n = N and G is the objective function.
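To make the formulas concrete, here is a tiny self-contained instance for the scalar model equation Fₙ = xₙ - a·xₙ₋₁ - p = 0 with objective G = Σₙ xₙ², verified against a finite difference. This illustrates the math only and is not Jutul's implementation:

```julia
# Forward model: solve Fₙ(xₙ, xₙ₋₁, p) = xₙ - a*xₙ₋₁ - p = 0 step by step.
function forward(a, p, x0, N)
    x = zeros(N)
    prev = x0
    for n in 1:N
        x[n] = a * prev + p
        prev = x[n]
    end
    return x
end

# Adjoint gradient of G = Σₙ xₙ² with respect to p, using the equations above:
# (∂Fₙ/∂xₙ)ᵀ λₙ = -(∂G/∂xₙ)ᵀ - (∂Fₙ₊₁/∂xₙ)ᵀ λₙ₊₁, with ∂Fₙ/∂xₙ = 1,
# ∂G/∂xₙ = 2xₙ, ∂Fₙ₊₁/∂xₙ = -a (last term dropped at n = N), ∂Fₙ/∂p = -1.
function adjoint_gradient(a, p, x0, N)
    x = forward(a, p, x0, N)
    λ = zeros(N)
    for n in N:-1:1
        λ[n] = -2 * x[n] + (n < N ? a * λ[n + 1] : 0.0)
    end
    return -sum(λ)   # ∇ₚG = Σₙ (∂Fₙ/∂p)ᵀ λₙ = Σₙ (-1) λₙ
end

# Cross-check against a central finite difference of G(p):
G_of_p(p) = sum(abs2, forward(0.5, p, 1.0, 4))
g_adj = adjoint_gradient(0.5, 0.3, 1.0, 4)
g_fd = (G_of_p(0.3 + 1e-6) - G_of_p(0.3 - 1e-6)) / 2e-6
```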
Jutul.solve_adjoint_sensitivities! — Function
solve_adjoint_sensitivities!(∇G, storage, states, state0, timesteps, G; forces)

Non-allocating version of solve_adjoint_sensitivities.
Jutul.solve_numerical_sensitivities — Function
solve_numerical_sensitivities(model, states, reports, G, target;
forces = setup_forces(model),
state0 = setup_state(model),
parameters = setup_parameters(model),
epsilon = 1e-8)

Compute sensitivities of the model parameter with name target for the objective function G.
This method uses numerical perturbation and is primarily intended for testing of solve_adjoint_sensitivities.
Jutul.setup_parameter_optimization — Function
setup_parameter_optimization(model, state0, param, dt, forces, G, opt_cfg = optimization_config(model, param);
grad_type = :adjoint,
config = nothing,
print = 1,
copy_case = true,
param_obj = false,
kwarg...
)

Set up function handles for optimizing the case defined by the inputs to simulate, together with a per-timestep objective function G. The objective function should fit the format described in AbstractGlobalObjective or AbstractSumObjective.
Generally, calling any of these functions will mutate the data Dict. The options are:

- F_o(x): evaluate the objective
- dF_o(dFdx, x): evaluate the gradient of the objective, mutating dFdx (may trigger evaluation of F_o)
- F_and_dF(F, dFdx, x): evaluate F and/or dF. If nothing is passed for an entry, the corresponding evaluation is skipped.
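These calling conventions can be mimicked with a plain quadratic objective in place of a simulation. This sketch only mirrors the handle pattern described above (shared mutable cache, gradient possibly triggering an objective evaluation); it is not the Jutul implementation:

```julia
# Mock of the three handle styles, sharing a cache the way the handles
# from setup_parameter_optimization share their data Dict.
mutable struct HandleCache
    last_x::Vector{Float64}
    last_F::Float64
end

function make_handles()
    h = HandleCache(Float64[], NaN)
    F_o = function (x)
        h.last_x = copy(x)
        h.last_F = sum(abs2, x .- 1.0)   # toy objective: ||x - 1||²
        return h.last_F
    end
    dF_o = function (dFdx, x)
        x == h.last_x || F_o(x)          # may trigger evaluation of F_o
        dFdx .= 2.0 .* (x .- 1.0)
        return dFdx
    end
    F_and_dF = function (F, dFdx, x)
        F === nothing || F_o(x)
        dFdx === nothing || dF_o(dFdx, x)
        return h.last_F
    end
    return F_o, dF_o, F_and_dF
end

F_o, dF_o, F_and_dF = make_handles()
g = zeros(2)
F_and_dF(0.0, g, [2.0, 0.0])   # objective 2.0, gradient [2.0, -2.0]
```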