# Debugging the MOZART chemical solver¶

(Martin Schultz, 12/03/2014)

As we have just gone through this extensively, it may be a good time to document a bit how the MOZART solver works and what one can do to debug it (or rather: debug the chemical mechanism and/or initial concentrations).

## Calling sequence¶

`moz_chemistry`

(in mo_moz_chemistry.f90) is the wrapper routine for the MOZART chemistry driver `chemdr`

(in mo_moz_chemdr.f90). The first operates in the "ECHAM world" (kproma, krow, etc.), the latter translates things into the "MOZART world" (plonl, lat, etc.). MOZART is not "nproma-aware", which is why all arrays that are passed to `chemdr`

must be filled from `kproma+1:kbdim`

.

After calculation of photolysis and reaction rates, `chemdr`

calls the actual solver for the chemical equation systems. The *MOZART preprocessor (mozpp)* can generate code for three different solver types: explicit, implicit, and Rhodas. All three of them can be executed in parallel since every compound must be associated with one (and only one) solver type ("solution classes" in MOZART language). In ECHAM-HAMMOZ, we assume that all species are solved with the implicit solver `imp_sol`

(in mo_moz_imp_sol.f90).

## Description of imp_sol¶

The implicit solver loops over all grid boxes of the (plev, plonl) vector of volume mixing ratios(\!) it is given, so in effect it does a box model calculation for each grid box. The solver time step is initially set to the model time step length (`dt = delt`

). If the implicit solution does not converge, it will revert to the starting conditions and try again with a time step that is cut in half. This happens until convergence is reached **or** the `cut_limit`

(default value is 5) is reached. If a solution is reached with using a shorter timestep, the internal time (`interval_done`

) will be increased, and the next chunk of the model time step is treated the same way. If, after two steps, the solver still converges, it will try to increase the time step length again by a factor of two. This procedure is repeated until the `interval_done`

reaches or exceeds the model time step length.

In order to expediate the convergence, the species in the `base_solution`

vector are reordered (more or less by lifetime). `lsol`

is the "local" solution vector containing the mixing ratios of all species at the current iteration, `solution`

is the reordered vector which is actually modified. The reordering is defined in the `implicit%permute`

index vector. The `implicit%clsmap`

index vector points to those species (i.e. ECHAM tracers) which shall be subjected to the implicit solver. If you only run MOZ, `do jt = 1, implicit%clsmap`

is the same as `do jt = 1, ntrac`

. Would you have for example the explicit solver running in parallel (e.g. to solve for CH4, CO, etc.), the corresponding species indices would be missing in the `implicit%clsmap`

vector.

For each time step first the invariant parts of the Jacobian are computed (this is basically y(t)/dt), then the linear equation terms (photo- or decomposition reactions) are stored in the Jacobian matrix (subroutine `imp_linmat`

is a preprocessor generated routine that can be found in mo_moz_mat.f90). Then, the solver enters the iteration loop for the Newton Raphson iteration to solve for f(y) = 0. This loop is executed up to `implicit%iter_max`

times (default value is 11). The actual Jacobian matrix is stored in `sys_jac`

. `sys_jac`

is filled with the non-linear equation terms in `imp_nlnmat`

(also in mo_moz_mat.f90). At the end of this routine, the previously calculated linear terms are added to `sys_jac`

(this also happens within mo_moz_mat.90, subroutine `imp_nlnmat_finit`

). Note that the Jacobian is saved as a sparse diagonal matrix, i.e. only terms != 0 are actually included.

Once the Jacobian is formed, a LU factorization takes place (`imp_lu_fac`

), the production and loss rates are computed, and the tendency ("forcing") is evaluated as `forcing(m) = solution(m)*dti - (iter_invariant(m) + prod(m) - loss(m))`

. Then, the equations are solved in `imp_lu_slv`

(`imp_lu_fac`

, `imp_lu_slv`

, and `imp_prod_loss`

are all preprocessor generated and can be found in mo_moz_mat.f90). Finally, the `solution`

vector is updated and the convergence is tested by a relative criterion. Note, that MOZART applies stricter convergence criteria for some key compounds (presently these are O3, NO, NO2, NO3, HNO3, HO2NO2, N2O5, OH, and HO2) than for others. Th criteria applied are 1.e-3 for "normal" compounds, and 1.e-4 for the key species (`rel_err`

and `high_rel_err`

).

If all species converged, the iteration loop is exited, and the current solution is written back to the `base_sol`

vector.

## Debugging techniques¶

Unless you change your chemical mechanism (and forget to update some reaction rate constants), or you mess up your initial tracer concentrations file, you should rarely encounter a situation where you see a lot of `imp_sol: failed to converge `

(lon,lat,lev,indx,dt)@ messages. If there are a few of those when you do an initial run, don't bother. However, if they persist over many time steps, you will have to act.

The `imp_sol`

code will write some additional information about the reason for non-convergence. In particular, it will list the processor, lon, lev, lat and "indx" value where it happened and show you the species for which convergence was not reached. Again, under normal circumstances you will see one or two species here, and usually the same species in all locations. Thus you can first double-check your initial concentrations for this species (or these species). If this doesn't help, make use of the extended debugging code in mo_moz_imp_sol.f90, which is normally commented out. For this, you should pick one specific grid box (look at the "index" value from the output) and set `index_debug`

to that value. If you then run the code with all the extra debugging information, you should get a lot of detail about what happens at each Newton Raphson time step. Look at the `base_sol`

output and the production and loss rates to see if they make sense. Note that species are given in mixing ratios. This means that `prod`

and `loss`

values are not given in the canonical units of "cm^3 molecules^-1 s^-1", but rather in "vmr^-1 s^-1", which means they are multiplied by the Loschmid number (2.7e19 under normal surface conditions).

If you cannot figure out what is going on, try to build a reduced mechanism with the preprocessor so that you have a chance to carefully check each individual reaction equation. It may also help to print out all `rates`

in `imp_sol`

and make sure you find what you expect. To analyze this, it is very helpful to look at the "doc" file that is produced by the MOZART preprocessor. A sample "doc" file for the current HAMMOZ mechanism (219 species) is attached. Note that reactions are numbered in two different ways: in the equations in mo_moz_mat.f90, you will find "rNNN" and "jNNN" for gas-phase and photolysis reactions, respectively. The index numbers for these are listed on the left of the "doc" file's equations. In contrast, the `rates`

array uses indices from 1 to nphoto and adds the gas-phase reactions on top. Use the right-hand side indices in the "doc" file for debugging the `rates`

array.

Good luck!

## kproma testing of the MOZ code¶

**Due to the fact that imp_sol will always recompute an entire block if it fails to converge somewhere, you cannot guarantee bit-identical results for different kproma values unless you set cut_limit and iter_max to zero!**