A black box approach to fitting smooth models of mortality

Actuaries have long been interested in the forecasting of mortality for the purpose of the pricing and reserving of pensions and annuities. Most models of mortality in age and year of death, and often year of birth, are not identifiable so actuaries worried about what constraints should be used to give sensible estimates of the age and year of death parameters, and, if required, the year of birth parameters. These parameters were then forecast with an ARIMA model to give the required forecasts of mortality. A recent article showed that, while the fitted parameters were not identifiable, both the fitted and forecast mortalities were. This result holds if the age term is smoothed with P-splines. The present article deals with generalized linear models with a rank deficient regression matrix. We have two aims. First, we investigate the effect that different constraints have on the estimated regression coefficients. We show that it is possible to fit the model under different constraints in R without imposing any explicit constraints. R does all the necessary booking-keeping ‘under the bonnet’. The estimated regression coefficients under a particular set of constraints can then be recovered from the invariant fitted values. We have a black box approach to fitting the model subject to any set of constraints.


Introduction
Actuaries have long been interested in the forecasting of human mortality for the purpose of the pricing and reserving of pensions and annuities.Gompertz (1825), in a landmark paper, gave the first mathematical model of mortality.He only had death rates by age at ten year intervals but he observed that these rates were linear on a log scale.This enabled him to interpolate for all ages for the range of ages he had available.This was the celebrated Gompertz law of mortality.
At the present time extensive mortality data are available not only by single ages but also by year of death.The Human Mortality Database is a comprehensive resource of mortality data and gives mortality data (a) by age and year of death in a calendar year, and (b) an estimate of the mid-year number of lives exposed to the risk of death in that year.These data are available on over 40 countries for the male and female populations of these countries.
These richer datasets gave rise to the possibility not only of modelling mortality by age and year but also the forecasting of mortality.Such models were in terms of age and year of death, and often year of birth.The idea was to estimate the parameters in the models and then forecast the year of death parameters and, if required, the year of birth parameters; see Cairns et al. (2009) for a comprehensive treatment of such models.However, there was a problem-most models were not identifiable so it was not obvious how to obtain parameter estimates that would give sensible forecasts.Various sets of linear constraints on the parameters were proposed that would solve this problem.
Most models were generalized linear models or GLMs.Currie (2020) showed that, while the fitted parameters were not identifiable, the fitted mortalities were identifiable (a fundamental result in regression modelling) but crucially the forecast values of mortality were also identifiable when forecasting was done with an ARIMA model.This article illustrated this result by considering two sets of constraints, one a standard set of constraints found in the literature (see Cairns et al., 2009, for example) and the other a set of random constraints.A general proof of the result was also provided.
The purpose of the present article is twofold: first, to show that we can ignore the explicit use of constraints completely and use R's lm and glm commands to fit the models, and second, to use the invariance of the fitted mortalities to explore the relationship between the estimates of the parameters under different sets of constraints.R does all the necessary booking-keeping 'under the bonnet'.We have a black box approach to the fitting and forecasting of mortality.
We do not present any general theory but rather present our method through a single model, the Age-Period-Cohort model.We consider this model in a basic form and also in a smooth form where we use P-splines (Eilers an Marx , 1996) to smooth some terms in the model.It should be clear that our method applies more generally.

Data and notation
We illustrate our ideas with data from the Human Mortality Database on UK males (data downloaded February 2022).We adopt the convention that matrices are denoted by upper case bold font as in M and column vectors by lower case bold font as in m.The data comprise two matrices: D obs = (d x,y ), the matrix of the number of deaths at age x in year y, and E obs = (e x,y ), the total time lived or central exposure at age x in year y.(We will use D later to denote a difference matrix.)In general we suppose we have n a ages and n y years.We denote the age indices by x a = (1, . . ., n a ) and the year indices by x y = (1, . . ., n y ) where the indicates the transpose of a vector (or matrix).We index the rows and columns of D obs and E obs by x a and x y respectively.Thus D obs and E obs are both n a × n y .Further we index the cohorts or years of birth with x c = (1, . . ., n c ) where n c = n a + n y − 1 is the number of distinct cohorts.We adopt the convention that the oldest cohort, ie, for age n a in year 1, is indexed 1; hence, the cohort index for cell (x, y) is c(x, y) = n a − x + y.In our case we have data on ages 50 to 90 and years 1970 to 2018.Hence n a = 41, n y = 49 and n c = 89.
In addition, the following notation is useful: 1 n denotes a column vector of 1s of length n and I n denotes the identity matrix of size n.Further we let 0 n stand for either a column vector of 0s of length n or an n × n zero matrix; the context should make clear which applies.We may omit the suffix n if no confusion results.We will make frequent use of the Kronecker product which we denote ⊗.The function vec(M) stacks the columns of a matrix M on top of each other in column order.

The Age-Period-Cohort model
Let D x,y be the random variable corresponding to the observed deaths d x,y .We suppose that D x,y has the Poisson distribution with mean e x,y λ x,y where λ x,y is the force of mortality or instantaneous death rate at age x in year y, ie, D x,y ∼ P(e x,y λ x,y ).There is a slight abuse of notation here; strictly we should write λ x+1/2,y+1/2 since e x,y is the central exposure; we will use the simpler notation throughout for clarity.Let d = vec( D obs ) and e = vec( E obs ) be the stacked vectors of deaths and exposures; let λ be the corresponding vector of forces of mortality.
The Age-Period-Cohort or APC model is the simplest model which involves the age and year of death, and the year of birth.We define where c(x, y) = n a − x + y is the cohort index for age x in year y.Let θ = (α , κ , γ ) .The model matrix is where X a = 1 n y ⊗ I n a , X y = I n y ⊗ 1 n a and the row of X c corresponding to age x and year y contains a one in column n a − x + y and zeros elsewhere.With this setup we have log λ = Xθ .We have defined a generalized linear model or GLM with dependent variable d, model matrix X, Poisson errors and log link; the exposures e enter into the model as an offset.Now X is n a n y × (2n a + 2n y − 1) but its rank is 2n a + 2n y − 4.
The usual actuarial approach is to choose three suitable constraints and then forecast the resulting estimates of κ and γ .A common choice is see Cairns et al. (2009).We refer to (3.3) as the standard constraints.We can fit the APC model in R with the glm command.Here Dth and Exp contain the R variates holding the deaths d and the exposures e respectively.We have Inspection of Fit.APC$coef shows that the estimates of κ n y , γ n a +n y −2 and γ n a +n y −1 are returned as NA.
What is the connection between the approach to identifiability used by R and that often used by actuaries?We can mimic R's method with the constraints ie, we constrain the final period parameter and the final two cohort parameters to be zero.We refer to these constraints, implicit in R's approach, as the R constraints.We refit the APC model with the constraints (3.7).Currie (2013) gave the following generalization of the GLM scoring algorithm to fit a GLM with constraints Hθ = 0 but note that this algorithm plays no part in the treatment presented in this article Here W is the diagonal matrix of weights, z is the working variable, ˜indicates a current estimate and ˆdenotes an updated estimate.The fitted coefficients are precisely those in Fit.APC$coef but with the NA values replaced by zeros.Equation (3.8) is used simply to verify this equality.
We have fitted the model without the need to specify constraints.If we wish for some reason to find a fit with specified constraints we proceed as follows.We seek a solution of the GLM scoring algorithm (X WX) θ = X Wz. (3.9) Let θ1 and θ2 be two solutions of (3.9) where θ1 satisfies the known constraint H θ1 = 0 and θ2 satisfies some unspecified constraint.We know (X WX) θ1 = (X WX) θ2 ⇒ (X WX + H H) θ1 = (X WX) θ2 .Now X WX + H H is positive definite so if we can find a θ2 we have the solution which satisfies Hθ = 0 as θ1 = (X WX + H H) −1 X WX θ2 . (3.10) This expression exists in another form which is particularly illuminating.We observe that X θ2 = log λ and so we can write (3.10) as In other words, we can compute the estimated coefficients subject to any particular set of constraints directly from the invariant fitted values.Invariance is a fundamental and powerful idea and (11) demonstrates just one result from applying it.Figure 1 emphasizes the effect that constraints can have on parameter estimates.The effect on the estimates of the time parameters is particularly striking.Under the standard constraints one may be tempted to interpret the parameter estimates as indicating improving mortality.The estimates under the R constraints indicate exactly the opposite.What can be said is that mortality does improve over time but that can only be concluded securely from the invariant fitted mortality shown in the lower right panel.We examine the differences in the parameter estimates.We define α = αs − αr where the suffixes indicate the estimates under the standard and R constraints.We have corresponding definitions for κ and γ .Currie (2020) showed that these diferences obeyed a very simple relation.We have Gompertz smoothed the age parameters with a simple linear function.Here we use the method of P-splines to smooth the age parameters, α, in the APC model.As well as the general consideration that we expect the age effect to be a smooth monotonic function, there are good actuarial reasons for smoothing the age parameters.The resulting estimates in time by age will be more regular and hence the pricing of insurance products will also be more regular.
Let B a , n a × c a , be a regression matrix of B-splines along age.The model matrix is as in (2) but with X a = 1 n y ⊗ I n a replaced by 1 n y ⊗ B a .We define the penalty matrix Here D 2 is a second order difference matrix and τ is a tuning constant.See Eilers and Marx (2021) for a thorough introduction to the method of P-splines.We seek solutions of the penalized scoring algorithm where for a GLM with Poisson errors and a log link.We could use a generalization of (3.8) where the upper left block becomes X WX + P + H H. However, our purpose here is to use R to fit the model and then use invariance to calculate the estimates of the regression coefficients under the standard and R constraints.We will see that many of the results from the unsmoothed model carry over in a natural way.
We can avoid the use of algorithms like (3.8) with data augmentation; see Eilers and Marx (2021), p35.Data augmentation enables us to fit the smooth APC model with code similar to (3.4), (3.5) and (3.6) for the original APC model discussed in section 3. We define the augmented model matrix and augmented working variable as follows: Figure 2 Observed and fitted log λ for ages 50 to 90 at ten year intervals for the smooth APC model.
We choose τ by minimizing AIC.The AIC curve is quite flat in the region of the optimal value and a short grid search gives τ = 100 as a suitable value; this value is used in what follows.
We use the following extension of (3.11) to investigate the relation between the two sets of estimates obtained under the standard and R constraints.We put H equal to H s and H r in turn where H s and H r are the constraints matrices for the standard and R constraints.
As in the unsmoothed case, we define a , κ and γ as the differences in the two sets of estimates but note here that a refers to the coefficients of the B-spline basis.We do not have the general theory that led to equations (3.12), (3.13) and (3.14) when τ > 0; Currie (2020) gives the theory when τ = 0. We proceed numerically.Plots of a , κ and γ suggest that each is linear and this is confirmed by simple regression.We find in our example that the slopes are 0.17915, −0.03583 and 0.03583 respectively.We note that 0.17915/0.03583= 5.We have used a knot spacing of five which suggests that the slopes are C, −C and C where is the knot spacing.Support for this conjecture comes from Currie (2020) where it is shown that the slopes do indeed satisfy C, −C and C in the case τ = 0.
Figure 2 shows the observed and fitted log mortalities for age 50 to 90 at ten year intervals.Evidently, there has been a steady improvement in mortality over the observation period.This Statistical Modelling 2023; 23(5-6): 456-464 improvement corresponds to an approximate increase in life expectancy of two years every ten years.For example, life expectancy in the UK in 1970 was 72.3 years and this has risen to 81.2 years in 2019.This has major consequences for the provision of both state and private pensions.

Discussion
This short article has highlighted the effect that different constraint systems can have on parameter estimates.Figure 1 is a dramatic example of such effects.We have also highlighted the role of the invariant fitted values.Equation (3.11) shows that knowledge of these invariant values yields parameter estimates under any particular constraint system.We have used the basic lm function in R as a black box to estimate the invariant fitted values and from these values diagrams like Figure 1 can be obtained and equation (3.11) can be applied.

Figure 1
Figure1Estimates of α, κ and γ under standard constraints (solid line), R constraints (dashed line) and the observed and invariant fitted log mortality for age 70.
14) for some constants A, B and C. We conclude that the three sets of differences are linearly related with slopes C, −C and C. With our data and the standard and R constraints we found A = −1.163,B = 1.465 and C = 0.03199.Despite appearances to the contrary in Figure 1 the two sets of estimates are intimately related.Statistical Modelling 2023; 23(5-6): 456-464 4 The smooth Age-Period-Cohort model ) is weighted least squares so we can fit the model with R's lm function.We note that we must iterate with (4.2) until convergence.Good initial values are the observed deaths for d and the observed log mortality rates for z.Convergence is very fast since we have essentially implemented the GLM scoring algorithm via the lm function.