Imputation
!IMPUTE and !SM
Imputation is only partly implemented and has not been tested.
It is invoked by using !IMPUTE on the data filename line
and splitting the model into submodels with the !SM i
qualifier. ASReml will then cycle between the submodels.
The concept behind imputation is that a large complex model
may be fitted by cycling between two or more simpler models using
ideas from Gibbs sampling. Gilmour and Thompson (2003) outlined the strategy as follows:
A more complex strategy was proposed by Clayton and Rasbash (1999) for
imputation in mixed models with large crossed (random)
factors. It is
based on ideas from the EM algorithm and from Gibbs
sampling (Sorensen and Gianola 2002). In our context,
their idea suggests iterating between two models:
    y - Zu = XT + e    (model 1), and
    y - XT = Zu + e    (model 2).
We initiate the process with model 1
assuming u is zero.
We fit the model solving for T and adding noise.
We then solve the second model, estimate the variances and
add noise to u. The process is then repeated.
After burn-in, averages are obtained in the
spirit of Gibbs sampling but avoiding some of the noise.
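The alternation between the two models can be illustrated with a small Python sketch. It is not ASReml code and makes several simplifying assumptions: the fixed part XT is reduced to a single mean, there is one random factor, and the variances are held at known values rather than re-estimated. The function names (simulate, alternate) and all default values are hypothetical, apart from the 10 burn-in iterations and the 110 total iterations, which follow the defaults quoted under Specification.

```python
import random


def simulate(n_levels=20, n_per_level=10, mu=10.0, var_u=0.5, var_e=1.0, seed=1):
    """Simulate y = mu + u[level] + e for one random factor (hypothetical data)."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, var_u ** 0.5) for _ in range(n_levels)]
    level = [j for j in range(n_levels) for _ in range(n_per_level)]
    y = [mu + u[j] + rng.gauss(0.0, var_e ** 0.5) for j in level]
    return y, level


def alternate(y, level, n_levels, var_u, var_e, burn_in=10, max_it=110, seed=2):
    """Cycle between the two submodels, adding noise to each set of solutions.

    Assumption: var_u and var_e are treated as known, so only the effects
    are resampled. Returns the post-burn-in averages of mu and u.
    """
    rng = random.Random(seed)
    n = len(y)
    counts = [0] * n_levels
    for j in level:
        counts[j] += 1
    ratio = var_e / var_u
    u = [0.0] * n_levels              # start by assuming u is zero
    mu_sum, u_sum, kept = 0.0, [0.0] * n_levels, 0
    for it in range(max_it):
        # Model 1: y - Zu = mu + e. Solve for mu, then add noise.
        mu_hat = sum(yi - u[j] for yi, j in zip(y, level)) / n
        mu = mu_hat + rng.gauss(0.0, (var_e / n) ** 0.5)
        # Model 2: y - mu = Zu + e. Solve for each u[j], then add noise.
        totals = [0.0] * n_levels
        for yi, j in zip(y, level):
            totals[j] += yi - mu
        for j in range(n_levels):
            denom = counts[j] + ratio
            u[j] = totals[j] / denom + rng.gauss(0.0, (var_e / denom) ** 0.5)
        if it >= burn_in:             # average only after burn-in
            kept += 1
            mu_sum += mu
            for j in range(n_levels):
                u_sum[j] += u[j]
    return mu_sum / kept, [s / kept for s in u_sum]
```

Calling alternate(y, level, 20, var_u=0.5, var_e=1.0) on data from simulate() returns the averaged mean and the averaged level effects. The noise added in each step uses the variance of that solution within its own submodel.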
We can think of Gibbs sampling methods as adding noise
at every step of a simplified exact analysis: for
instance, estimate T and add noise, estimate u and add
noise, then form sums of squares for u and add noise to give
an estimate of its variance. It is necessary to resample (add
noise to) the effects used for adjusting the data to get
the correct convergence. The variance used for
resampling is based on the prediction error variance.
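The step "form sums of squares for u and add noise to give an estimate of its variance" can be sketched as follows. This is an assumption about one common way of doing it in Gibbs samplers (dividing the sum of squares by a chi-square draw with as many degrees of freedom as there are effects), not a description of what ASReml does. The function name draw_variance is hypothetical.

```python
import random


def draw_variance(effects, rng):
    """Return a noisy variance estimate: sum of squares / chi-square(q).

    Assumption: the effects are uncorrelated with mean zero, and the
    chi-square draw has q = len(effects) degrees of freedom.
    """
    q = len(effects)
    sum_sq = sum(u * u for u in effects)
    # A chi-square variate with q degrees of freedom is Gamma(q/2, scale 2).
    chi_sq = rng.gammavariate(q / 2.0, 2.0)
    return sum_sq / chi_sq
```

Each call gives a different value scattered around the simple estimate sum_sq / q, and the scatter shrinks as the number of effects grows.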
The computational cost for exact methods increases
proportional to the cube of the number of effects
whereas the cost of sampling methods increases linearly
with the number of effects. Therefore, as models
increase in size, there will be a point where sampling
methods are more efficient. In a sense the difficulty
of calculating prediction error variances is replaced by
sampling them.
A proposed extension which may speed up the process is to do the
calculations for several working response variables in
parallel. This is aimed at reducing the sampling
variance of the variance parameters and is based on
arguments in Thompson (1994) and Garcia-Cortes and
Sorensen (2001). They have pointed out that the sampling
error can be reduced when updating var(u) by taking account of
the variance of the noise added to u, although this is
simpler to do for uncorrelated effects. One can also
get nearer to exact methods by using block updating, but
this leads to more complicated variance correction
formulae. It is not clear which computational
scheme (exact, Gibbs sampling or intermediate) will
minimize computational effort.
Specification
The data filename line qualifier !IMPUTE n
together with the model sectioning qualifier !SM i
invokes imputation.
The argument n specifies the number of BURNIN iterations and
has a default value of 10.
MAXIT is set to 11*BURNIN if not otherwise set, so the default
BURNIN of 10 gives a MAXIT of 110.
Noise is added to the solutions for a set of the equations
as they are solved, and the noise affects later solutions
generating a covariance among the solutions.
Refer to the printed user guide for algebraic details.
The qualifier !SM i
in the model line splits the model into
submodels, for example:
    y ~ mu !r !SM 1 animal !f !SM 2 hys
Thus the whole design matrix is formed and each sub-iteration uses
appropriate parts of it.
There is also a data filename line qualifier !SM i which
will select a submodel to process if !IMPUTE
is not invoked.
This mechanism is used behind the scenes to ensure all factors
appear as main effects in the design matrix even though
they may not be in the model to be fitted. This ensures PREDICT
can form a PRESENT table if required. Consequently, PREDICT
cannot be performed if !IMPUTE
or !SM is active.
At the end, the mean and variance of all solutions are calculated and reported in the .sln
file. A summary of the estimated variance parameters is reported in the .res
file.
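As an illustration of how a mean and variance can be accumulated over iterations without storing every solution, here is a minimal Python sketch using a running update. It assumes that iterations before burn-in are discarded and that the sample variance (divisor n - 1) is wanted. Neither point is confirmed by this page, and the function name summarise is hypothetical.

```python
def summarise(samples, burn_in):
    """Running mean and sample variance of the values after burn-in."""
    count, mean, m2 = 0, 0.0, 0.0
    for i, x in enumerate(samples):
        if i < burn_in:
            continue                  # ignore burn-in iterations
        count += 1
        delta = x - mean
        mean += delta / count         # update the running mean
        m2 += delta * (x - mean)      # update the sum of squared deviations
    variance = m2 / (count - 1) if count > 1 else 0.0
    return mean, variance
```

For example, summarise([9.0, 11.0, 3.0, 4.0, 5.0, 6.0], 2) skips the first two values and returns a mean of 4.5 and a variance of 5/3.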
Example
A set of data was simulated with a mean of 10 and two cross-classified factors,
each of 100 levels, randomly allocated to 5000 observations.
Factor A was simulated with a variance of 0.05, Factor B was simulated with a variance of 0.10 and the residual with a variance of 1.00.
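Data of this kind could be generated with a short Python script such as the sketch below. It is not the program used to produce the original data. The seed, the function names (simulate_rows, write_asd) and the file layout (one heading line followed by whitespace-separated row, A, B and Y fields) are assumptions chosen to match the field definitions and the !skip 1 qualifier in the job.

```python
import random


def simulate_rows(n_obs=5000, n_levels=100, mean=10.0,
                  var_a=0.05, var_b=0.10, var_e=1.00, seed=2003):
    """Return (row, A, B, Y) tuples for two cross-classified random factors."""
    rng = random.Random(seed)         # seed is arbitrary (assumption)
    a_eff = [rng.gauss(0.0, var_a ** 0.5) for _ in range(n_levels)]
    b_eff = [rng.gauss(0.0, var_b ** 0.5) for _ in range(n_levels)]
    rows = []
    for row in range(1, n_obs + 1):
        a = rng.randrange(n_levels)   # levels allocated at random
        b = rng.randrange(n_levels)
        y = mean + a_eff[a] + b_eff[b] + rng.gauss(0.0, var_e ** 0.5)
        rows.append((row, a + 1, b + 1, y))   # levels coded 1..n_levels
    return rows


def write_asd(path, rows):
    """Write the rows with one heading line (assumed to be skipped by !skip 1)."""
    with open(path, "w") as handle:
        handle.write("row A B Y\n")
        for row, a, b, y in rows:
            handle.write(f"{row} {a} {b} {y:.4f}\n")
```

Calling write_asd("impute.asd", simulate_rows()) would then create a file with the name used in the job.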
The job to analyse it is as follows.
Imputation - fixed model
row A 100 B 100 Y
impute.asd !skip 1 !DOPART $1
!PART 1 A and B fixed
Y ~ mu A B
!PART 3 A fixed and B random
Y ~ mu A !r B
!PART 5 A and B random
Y ~ mu !r A B
# Models with Imputation
!PART 11 A and B fixed
!IMPUTE
Y ~ mu !SM 1 A !SM 2 B
!PART 13 A fixed and B random
!IMPUTE
Y ~ mu !SM 1 A !r !SM 2 B
!PART 15 A and B random
!IMPUTE
Y ~ mu !r !SM 1 A !SM 2 B
A summary of variance estimates from these models is in the User Guide
(Table 5.7). It shows the variance components estimated normally and as the average
over the imputation iterations. They agree to the third significant figure.