Predict Directive

Prediction process

The first step is to specify the classify set of explanatory variables after the predict directive. The predict statement(s) may appear immediately after the model line (before or after any tabulate statements) or after the R and G structure lines. The syntax is
predict factors [qualifiers]
where
  • predict must be the first element of the predict statement, commencing in column 1 in upper or lower case,
  • factors is a list of the variables defining a multiway table to be predicted; each variable may be followed by a list of specific levels/values to be predicted,
  • the qualifiers modify the predictions in some way,
  • a predict statement may be continued on subsequent lines by terminating the current line with a comma,
  • several predict statements may be specified.
     NIN  Alliance trial 1989
      variety  !A
      ...
     nin89.asd !skip 1
     yield ~ mu variety !r repl
     predict variety
    

    ASReml parses the predict statement before fitting the model. If any syntax problems are encountered, these are reported in the .pvs file after which the statement is ignored: the job is completed as if the erroneous prediction statement did not exist. The predictions are formed as an extra process in the final iteration and are reported to the .pvs file. Consequently, aborting a run by creating the ABORTASR.NOW file will cause any predict statements to be ignored but using FINALASR.NOW will allow any predict statements to be honoured.

    By default, factors are predicted at each level, simple covariates are predicted at their overall mean and covariates used as a basis for splines or orthogonal polynomials are predicted at their design points. Model terms mv and units are always ignored.

    Prediction at particular values of a covariate or particular levels of a factor is achieved by listing the values after the variate/factor name. Where there is a sequence of values, use the notation a b ... n to represent the sequence of values from a to n with step size b-a. The default step size is 1 (in which case b may be omitted). A colon (:) may replace the ellipsis (...). An increasing sequence is assumed. When giving particular values for factors, the default is to use the coded level (1:n) rather than the label (alphabetical or integer). To use the label, precede it with a quote (").
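
    The sequence notation can be illustrated with a short Python sketch. It is not ASReml's parser: the function name expand_values and the simplified grammar (a plain list, or a single sequence a ... n or a b ... n, with a colon accepted in place of the ellipsis) are assumptions made for illustration only.

```python
def expand_values(spec):
    """Expand a value list such as '1 3 ... 9' into explicit numbers.

    Assumed grammar (a simplification, not ASReml's parser): either a plain
    list of numbers, or one sequence 'a ... n' / 'a b ... n', where a colon
    may stand in for the ellipsis.
    """
    # Treat '...' and ':' as the same separator token.
    tokens = spec.replace("...", " : ").replace(":", " : ").split()
    if ":" not in tokens:
        return [float(t) for t in tokens]

    i = tokens.index(":")
    head = [float(t) for t in tokens[:i]]
    tail = [float(t) for t in tokens[i + 1:]]
    if len(head) not in (1, 2) or len(tail) != 1:
        raise ValueError("expected 'a ... n' or 'a b ... n'")

    start, end = head[0], tail[0]
    # The step is b - a when b is given, otherwise the default of 1.
    step = head[1] - head[0] if len(head) == 2 else 1.0
    if step <= 0 or end < start:
        raise ValueError("an increasing sequence is assumed")

    # A small tolerance lets 0 0.2 ... 0.6 reach its end point despite rounding.
    count = int((end - start) / step + 1e-9) + 1
    return [round(start + k * step, 10) for k in range(count)]


print(expand_values("1 3 ... 9"))  # [1.0, 3.0, 5.0, 7.0, 9.0]
print(expand_values("1 : 4"))      # [1.0, 2.0, 3.0, 4.0]
```

    Under these assumptions, 1 3 ... 9 requests predictions at five values and 1 : 4 at four.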

    The second step is to specify the averaging set. The default averaging set is those explanatory variables involved in fixed effect model terms that are not in the classifying set. By default, variables that are not in any 'associated' list and that only define random model terms are ignored. Use the !AVERAGE, !ASSOCIATE or !PRESENT qualifiers to force variables into the averaging set.

    The third step is to select the linear model terms to use in prediction. The default is that all model terms based entirely on variables in the classifying and averaging sets are used. Two qualifiers allow this default to be modified by adding (!USE) or removing (!IGNORE) model terms. The qualifier !ONLYUSE explicitly specifies the model terms to use, ignoring all others. The qualifier !EXCEPT explicitly specifies the model terms not to use, including all others. These qualifiers will not override the definition of the averaging set.
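
    The default rules of the second and third steps can be sketched in Python as set operations. This is only an illustration of the rules as stated above, not ASReml's internal logic; the function names, the dictionary representation of model terms, and the use and ignore arguments (standing in for the roles of !USE and !IGNORE) are hypothetical.

```python
def default_averaging_set(fixed_terms, classify):
    """Variables appearing in fixed terms that are not in the classify set."""
    in_fixed = set().union(*fixed_terms.values())
    return in_fixed - set(classify)


def select_terms(terms, classify, averaging, use=(), ignore=()):
    """Return the names of the model terms used in prediction.

    Default rule: keep a term when all of its variables lie in the union of
    the classify and averaging sets. `use` adds terms and `ignore` removes
    them (hypothetical stand-ins for the roles of !USE and !IGNORE).
    """
    allowed = set(classify) | set(averaging)
    chosen = {name for name, variables in terms.items() if set(variables) <= allowed}
    chosen |= set(use)
    chosen -= set(ignore)
    return [name for name in terms if name in chosen]  # keep model order


# Model terms mapped to the variables they are built from.
fixed = {"site": {"site"}, "variety": {"variety"}}
random_terms = {
    "site.variety": {"site", "variety"},
    "at(site).block": {"site", "block"},
}
terms = {**fixed, **random_terms}

classify = {"variety"}
averaging = default_averaging_set(fixed, classify)
print(sorted(averaging))                        # ['site']
print(select_terms(terms, classify, averaging))  # ['site', 'variety', 'site.variety']
```

    Here block defines only a random term and is not in the classify set, so it stays out of the averaging set and any term involving it is dropped by default.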

    The fourth step is to choose the weights to use when averaging over dimensions in the hyper-table. The default is to simply average over the specified levels but the qualifier !AVERAGE factor weights allows other weights to be specified. !PRESENT and !ASSOCIATE allow for more complicated averaging processes.
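
    The difference between simple and weighted averaging over one dimension of the hyper-table can be shown with a small Python sketch. The cell values are invented, the function name is hypothetical, and rescaling the weights to sum to one is an assumption of this sketch rather than a statement about how ASReml treats the weights.

```python
def average_over_rows(table, weights=None):
    """Average a rows-by-columns table over its rows.

    table:   list of rows, e.g. one row per site and one column per variety.
    weights: one non-negative weight per row; equal weights if omitted.
             Weights are rescaled to sum to one (an assumption of this sketch).
    """
    n_rows = len(table)
    if weights is None:
        weights = [1.0] * n_rows
    if len(weights) != n_rows:
        raise ValueError("need exactly one weight per row")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    n_cols = len(table[0])
    return [
        sum(weights[i] * table[i][j] for i in range(n_rows)) / total
        for j in range(n_cols)
    ]


# Invented site-by-variety cell predictions: 3 sites (rows), 2 varieties (columns).
cells = [
    [4.0, 5.0],
    [6.0, 9.0],
    [8.0, 7.0],
]
print(average_over_rows(cells))                     # simple average over sites
print(average_over_rows(cells, weights=[2, 1, 1]))  # first site counts double
```

    With equal weights each site contributes one third to every variety mean; with weights 2, 1, 1 the first site contributes one half.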

    For example,

     yield ~  site variety  !r site.variety at(site).block
     predict variety
    
    puts variety in the classify set, site in the averaging set and block in the ignore set. Consequently, ASReml forms the site-variety hyper-table from the model terms site, variety and site.variety, ignoring all terms in at(site).block, and then averages across sites to produce variety predictions. This prediction will work even if some varieties were not grown at some sites because the site.variety term was fitted as random. If site.variety was fitted as fixed, variety predictions would be non-estimable for those varieties which were not grown at each site.
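
    The arithmetic behind this example can be sketched in Python. The effect values are invented and the function name is hypothetical. The sketch assumes independent random site.variety effects, so that a combination with no data has a predicted effect of zero; block effects are simply left out, mirroring the ignore set.

```python
def variety_predictions(site_eff, variety_eff, site_variety_eff):
    """Form the site-by-variety hyper-table and average it over sites.

    A site.variety combination missing from `site_variety_eff` is given a
    predicted effect of 0.0 (assumption: independent random effects with no
    data for that combination).
    """
    table = {
        (s, v): site_eff[s] + variety_eff[v] + site_variety_eff.get((s, v), 0.0)
        for s in site_eff
        for v in variety_eff
    }
    n_sites = len(site_eff)
    return {
        v: sum(table[(s, v)] for s in site_eff) / n_sites
        for v in variety_eff
    }


# Invented solutions for the fixed and random effects.
site_eff = {"s1": 0.0, "s2": 1.0}
variety_eff = {"v1": 10.0, "v2": 12.0}
site_variety_eff = {
    ("s1", "v1"): 0.5,
    ("s2", "v1"): -0.5,
    ("s1", "v2"): 0.25,
    # ("s2", "v2") was not grown, so it has no entry.
}

print(variety_predictions(site_eff, variety_eff, site_variety_eff))
# {'v1': 10.5, 'v2': 12.625}
```

    Variety v2 still receives a prediction although it was not grown at site s2, because every cell of the table can be filled.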

    Predict failure

    It is not uncommon for users to get the message
    Warning: non-estimable [aliased] cell(s) may be omitted from the table.
    Immediate things to check include whether every level of every fixed factor in the averaging set is present, and whether all cells in every fixed interaction are filled. For example, in the previous example, no variety predictions would be obtained if site was declared as having 4 levels but only three were present in the data. The message is also likely if any fixed model terms are !IGNOREd.

    More formally, there are often situations in which the fixed effects design matrix X is not of full column rank. These can be classified according to the cause of aliasing:
    1. linear dependencies among the model terms due to over-parameterisation of the model,
    2. no data present for some factor combinations so that the corresponding effects cannot be estimated,
    3. linear dependencies due to other, usually unexpected, structure in the data.

    The first type of aliasing is imposed by the parameterisation chosen and can be determined from the model. The second type of aliasing can be detected when setting up the design matrix for parameter estimation (which may require revision of imposed constraints). The third type can then be detected during the absorption of the mixed model equations. Dependencies (aliasing) can be dealt with in several ways and ASReml checks that predictions are of estimable functions in the sense defined by Searle (1971, p160) and are invariant to the constraint method used.
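
    The first two causes of rank deficiency can be demonstrated on tiny design matrices. The Python sketch below computes the rank by exact Gaussian elimination; the matrices are invented examples and the function name is hypothetical. It does not reproduce how ASReml detects aliasing.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix by Gauss-Jordan elimination in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Type 1: over-parameterisation. Columns are mu, site1, site2,
# and mu = site1 + site2 for every observation.
x_overparam = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
]

# Type 2: an empty cell. Columns are the four site.variety cells of a
# 2 x 2 layout, but no observation falls in the last cell.
x_missing_cell = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]

print(matrix_rank(x_overparam), "of", len(x_overparam[0]), "columns")        # 2 of 3 columns
print(matrix_rank(x_missing_cell), "of", len(x_missing_cell[0]), "columns")  # 3 of 4 columns
```

    In both cases the rank is one less than the number of columns, so one effect is aliased.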

    ASReml doesn't print predictions of non-estimable functions unless the !PRINTALL qualifier is specified. However, using !PRINTALL is rarely a satisfactory solution. Failure to report predicted values normally means that the predict statement is averaging over some cells of the hyper-table that have no information and therefore cannot be averaged in a meaningful way. Appropriate use of the !AVERAGE and/or !PRESENT qualifiers will usually resolve the problem. The !PRESENT qualifier enables the construction of means by averaging only the estimable cells of the hyper-table, where this is appropriate.
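
    The contrast between averaging over all cells and averaging only the estimable cells can be sketched in Python. The cell values are invented, None marks a non-estimable cell, and the present_only flag is a hypothetical stand-in for the behaviour described for !PRESENT, not an implementation of it.

```python
def average_cells(cells, present_only=False):
    """Average a list of cell predictions; None marks a non-estimable cell.

    With present_only=False, any non-estimable cell makes the mean
    non-estimable (returned as None). With present_only=True, the mean is
    taken over the estimable cells only.
    """
    if present_only:
        cells = [c for c in cells if c is not None]
    if not cells or any(c is None for c in cells):
        return None
    return sum(cells) / len(cells)


# Invented cell predictions for two varieties at three sites.
table = {
    "v1": [10.0, 11.0, 12.0],
    "v2": [9.0, None, 11.0],  # v2 has no information at the second site
}

for variety, cells in table.items():
    print(variety, average_cells(cells), average_cells(cells, present_only=True))
# v1 11.0 11.0
# v2 None 10.0
```

    Whether the mean over present cells is a sensible quantity depends on the design, which is why the text restricts this to cases where it is appropriate.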

    See Also