Imputation

!IMPUTE and !SM

Imputation is only partly implemented and has not been tested. It is invoked by placing !IMPUTE on the data line and splitting the model into submodels with the !SM i qualifier. ASReml then cycles between the submodels.

The concept behind imputation is that a large complex model may be fitted by cycling between two or more simpler models using ideas from Gibbs sampling. Gilmour and Thompson (2003) outlined the strategy as follows:

A more complex strategy was proposed by Clayton and Rasbash (1999) for imputation in mixed models with large crossed (random) factors. It is based on ideas from the EM algorithm and from Gibbs sampling (Sorensen and Gianola 2002). In our context, their idea suggests iterating between two models: y - Zu = XT + e and y - XT = Zu + e.

We initiate the process with model 1, assuming u is zero. We fit the model, solving for T and adding noise. We then solve the second model, estimate the variances and add noise to u. The process is then repeated. After 'burn in', averages are obtained in the spirit of Gibbs sampling but avoiding some of the noise.

We can think of Gibbs sampling methods as adding noise at every step of a simplified exact analysis: for instance, estimate T and add noise, estimate u and add noise, form sums of squares for u and add noise to give an estimate of its variance. It is necessary to resample (add noise to) the effects used for adjusting the data to get the correct convergence. The variance used for resampling is based on the prediction error variance.

The computational cost of exact methods increases in proportion to the cube of the number of effects, whereas the cost of sampling methods increases linearly with the number of effects. Therefore, as models increase in size, there will be a point where sampling methods are more efficient. In a sense, the difficulty of calculating prediction error variances is replaced by sampling them.

A proposed extension which may speed up the process is to do the calculations for several working response variables in parallel. This is aimed at reducing the sampling variance of the variance parameters and is based on arguments in Thompson (1994) and Garcia-Cortes and Sorensen (2001). They have pointed out that the sampling error can be reduced when updating var(u) by taking account of the variance of the noise added to u, although this is simpler to do for uncorrelated effects. One can also get nearer to exact methods by using block updating, but this leads to more complicated variance correction formulae. It is not clear which computational scheme (exact, Gibbs sampling or intermediate) will minimize computational effort.
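
The alternation between the two models can be illustrated with a small Python sketch. This is a textbook-style Gibbs sampler written for illustration only and is not ASReml's implementation: the function names, the use of NumPy, the starting values of 1.0 for both variances and the scaled inverse chi-square draws for the variances are all assumptions made here. Only the overall pattern (solve model 1 for T and add noise, solve model 2 for u and add noise, update the variances, average after burn-in) comes from the description above.

```python
import numpy as np

def resample_solution(C, rhs, sigma2_e, rng):
    """Solve C s = rhs, then add noise with covariance sigma2_e * inv(C),
    i.e. noise scaled by the prediction error variance of the solution."""
    solution = np.linalg.solve(C, rhs)
    L = np.linalg.cholesky(C)                 # C = L L'
    z = rng.standard_normal(len(rhs))
    noise = np.linalg.solve(L.T, z)           # cov(noise) = inv(C)
    return solution + np.sqrt(sigma2_e) * noise

def alternate_models(y, X, Z, burn_in=10, max_it=110, seed=0):
    """Cycle between y - Zu = XT + e and y - XT = Zu + e (illustrative only).

    Returns the post-burn-in averages of (var(u), var(e))."""
    rng = np.random.default_rng(seed)
    n, q = Z.shape
    u = np.zeros(q)                           # start with u assumed zero
    sigma2_e, sigma2_u = 1.0, 1.0             # assumed starting values
    XtX, ZtZ = X.T @ X, Z.T @ Z
    kept = []
    for it in range(max_it):
        # Model 1: adjust the data for u, solve for T and add noise.
        T = resample_solution(XtX, X.T @ (y - Z @ u), sigma2_e, rng)
        # Model 2: adjust the data for T, solve for u and add noise.
        C = ZtZ + np.eye(q) * (sigma2_e / sigma2_u)
        u = resample_solution(C, Z.T @ (y - X @ T), sigma2_e, rng)
        # Variances: sums of squares with noise (scaled inverse chi-square).
        resid = y - X @ T - Z @ u
        sigma2_u = (u @ u) / rng.chisquare(q)
        sigma2_e = (resid @ resid) / rng.chisquare(n)
        if it >= burn_in:
            kept.append((sigma2_u, sigma2_e))
    return np.mean(kept, axis=0)
```

Here X is expected to be an n by p array of full column rank and Z an n by q incidence matrix. Each pass costs two linear solves on the small submodels rather than one solve on the combined system, which is the trade-off the passage describes.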

Specification

The data filename line qualifier !IMPUTE n, together with the model sectioning qualifier !SM i, invokes imputation. The argument n specifies the number of BURNIN iterations and has a default value of 10. MAXIT is set to 11*BURNIN if not otherwise set (110 with the default n). Noise is added to the solutions for a set of the equations as they are solved, and the noise affects later solutions, generating a covariance among the solutions. Refer to the printed user guide for algebraic details.

The qualifier !SM i in the model line splits the model into submodels, for example y ~ mu !r !SM 1 animal !f !SM 2 hys. The whole design matrix is formed and each sub-iteration uses the appropriate parts of it.

There is also a data filename line qualifier !SM i, which selects a submodel to process if !IMPUTE is not invoked. This mechanism is used behind the scenes to ensure all factors appear as main effects in the design matrix even though they may not be in the model to be fitted. This ensures PREDICT can form a PRESENT table if required. Consequently, PREDICT cannot be performed if !IMPUTE or !SM is active.

At the end, the mean and variance of all solutions are calculated and reported in the .sln file. A summary of the estimated variance parameters is reported in the .res file.

Example

A set of data was simulated with a mean of 10 and two cross-classified factors, each with 100 levels, randomly allocated to 5000 observations. Factor A was simulated with a variance of 0.05, Factor B with a variance of 0.10 and the residual with a variance of 1.00. The job to analyse it is as follows.

 Imputation - fixed model
  row A 100 B 100 Y
 impute.asd !skip 1   !DOPART $1

 !PART 1      A and B fixed
  Y ~ mu  A  B

 !PART 3      A fixed and B random
  Y ~ mu  A !r B

 !PART 5      A and B random
  Y ~ mu !r A  B

 #   Models with Imputation
 !PART 11      A and B fixed
  !IMPUTE
  Y ~ mu !SM 1 A !SM 2 B

 !PART 13      A fixed and B random
  !IMPUTE
  Y ~ mu !SM 1 A !r !SM 2 B

 !PART 15   A and B random
  !IMPUTE
  Y ~ mu !r !SM 1 A !SM 2 B

A summary of variance estimates from these models is in the User Guide (Table 5.7). It shows the variance components estimated normally and as the average over the imputation iterations. They agree to the third significant figure.
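
For readers who want a data set with the same structure as the one described at the start of this example, the following Python sketch generates one. It is not the script used to produce the original data: the function names, the random seed, the use of normal distributions and the four-decimal output format are assumptions made here. The mean, numbers of levels, number of observations, variances, column order and single header line (skipped by !skip 1) follow the example.

```python
import numpy as np

def simulate_impute_data(n_obs=5000, n_levels=100, mean=10.0,
                         var_a=0.05, var_b=0.10, var_e=1.00, seed=1):
    """Simulate Y = mean + A effect + B effect + residual (illustrative)."""
    rng = np.random.default_rng(seed)
    # One random effect per level of each factor.
    a_eff = rng.normal(0.0, np.sqrt(var_a), n_levels)
    b_eff = rng.normal(0.0, np.sqrt(var_b), n_levels)
    # Randomly allocate levels 1..n_levels of A and B to the observations.
    a = rng.integers(1, n_levels + 1, n_obs)
    b = rng.integers(1, n_levels + 1, n_obs)
    resid = rng.normal(0.0, np.sqrt(var_e), n_obs)
    y = mean + a_eff[a - 1] + b_eff[b - 1] + resid
    row = np.arange(1, n_obs + 1)
    return row, a, b, y

def write_asd(path, row, a, b, y):
    """Write the columns row, A, B, Y with one header line."""
    with open(path, "w") as fh:
        fh.write("row A B Y\n")
        for r, i, j, v in zip(row, a, b, y):
            fh.write(f"{r} {i} {j} {v:.4f}\n")
```

Calling write_asd("impute.asd", *simulate_impute_data()) would produce a file with the layout the job above expects. The estimates obtained from it will differ from those in the User Guide because the random draws differ.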
