Multivariate Analysis

Introduction

Multivariate analysis is used here in the narrow sense of a multivariate mixed model. There are many other multivariate analysis techniques which are not covered by ASReml. Multivariate analysis is used when we are interested in estimating the correlations between distinct traits (for example, fleece weight and fibre diameter in sheep) and for repeated measures of a single trait.

Repeated measures (rats)

There are two basic forms of analysis of repeated measures data: random regression type models and multivariate models. The latter, described here, apply when there are a limited number of repeated measures and they are taken on each subject at the same times, so that the data have a multivariate structure.

Wolfinger (1996) summarises a range of variance structures that can be fitted to repeated measures data and demonstrates the models using five weights taken weekly on 27 rats subjected to 3 treatments.

Multiple traits: Wether trial data

Three key traits for the Australian wool industry are the weight of wool grown per year, the cleanness and the diameter of that wool. Much of the wool is produced from wethers and most major producers have traditionally used a particular strain or 'bloodline'. The file wether.as specifies a bivariate analysis.

Model specification

The syntax for specifying a multivariate linear model in ASReml is
     Y-variates ~ fixed [ !r random ] [ !f sparse_fixed ]
where
  • Y-variates is a list of traits,
  • fixed, random and sparse_fixed are as in the univariate case but involve the special term Trait and interactions with Trait.

    The design matrix for Trait has a level (column) for each trait.
  • Trait by itself fits the mean for each variate,
  • in an interaction,
         Trait.Fac fits the factor Fac for each variate and
         Trait.Cov fits the covariate Cov for each variate.

    ASReml internally rearranges the data so that n data records, each containing t traits, become n sets of t analysis records indexed by the internal factor Trait, that is, nt analysis records ordered Trait within data record. If the data is already in this long form, use the !ASMV t qualifier to indicate that a multivariate analysis is required.
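
The rearrangement described above can be pictured with a small Python sketch. This is only an illustration of the record ordering, not ASReml's internal code; the function name expand_to_long and the data values are hypothetical.

```python
def expand_to_long(records, traits):
    """Expand wide records (one dict per data record) into long form.

    Returns a list of (record_index, trait, value) tuples ordered
    Trait within data record, so n records with t traits give n*t rows.
    """
    long_rows = []
    for record_index, record in enumerate(records, start=1):
        for trait in traits:
            # A trait that is absent or None is kept as a missing value.
            long_rows.append((record_index, trait, record.get(trait)))
    return long_rows


# Hypothetical values, not taken from wether.dat.
wide = [
    {"GFW": 7.1, "FDIAM": 21.0},
    {"GFW": 6.4, "FDIAM": None},
]
long_rows = expand_to_long(wide, ["GFW", "FDIAM"])
for row in long_rows:
    print(row)
```

Each data record contributes one row per trait, so two records with two traits give four analysis records, with a missing observation keeping its place in the sequence.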

    Variance structures

    A more sophisticated error structure is required for multivariate analysis. Consider a multivariate analysis with t traits and n units in which the data are ordered traits within units. A typical variance structure is to assume units are independent and traits are correlated. This is described as the direct product of an IDENTITY matrix and an unstructured ( US ) variance matrix.
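
As a rough illustration of this direct product, the following Python sketch builds the residual variance matrix for a few units from a 2 x 2 trait variance matrix. The helper names (kronecker, identity) and the numbers in sigma are assumptions made for the example; ASReml does not form this matrix explicitly in this way.

```python
def kronecker(a, b):
    """Return the Kronecker (direct) product of matrices a and b (nested lists)."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


def identity(n):
    """Return the n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


# Hypothetical trait variance matrix (illustrative values only).
sigma = [[0.20, 0.13],
         [0.13, 0.44]]
n_units = 3

# Data ordered traits within units: identity for units, US for traits.
residual_variance = kronecker(identity(n_units), sigma)
for row in residual_variance:
    print(row)
```

The result is block diagonal: each unit has its own copy of sigma on the diagonal and zeros elsewhere, which is what "units independent, traits correlated" means for data ordered traits within units.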

    We discuss the syntax with reference to the following bivariate example
     Orange Wether Trial 1984-8
      SheepID !I
      TRIAL
      BloodLine !I
      TEAM *
      YEAR *
      GFW YLD FDIAM
     wether.dat !skip 1
    
     GFW FDIAM ~ Trait Trait.YEAR,        # Fixed model
              !r Trait.TEAM Trait.SheepID # Random model
    
     predict YEAR Trait
    
     1 2 2                                # Variance header
     1485 0 ID                            # units structure
     Trait 0 US                           # traits structure
      3*0
    
     Trait.TEAM 2                         # First G header
     Trait 0 US !GP
      3*0
     TEAM 0 ID
    
     Trait.SheepID 2                      # Second G header
     Trait 0 US !GP
      3*0
     SheepID 0 ID
    

    R-structure

    For a standard multivariate analysis
  • the error (R) structure for the residual must be specified as two-dimensional with
        independent records and
        an unstructured variance matrix across traits;
  • records may have observations missing in different patterns, and these are handled internally during analysis,
  • the R structure must be ordered traits within units, that is, the R structure definition line for units must be specified before the line for Trait,
  • variance parameters are variances, not variance ratios,
  • the R structure definition line for units, that is,
         1485 0 ID, could be replaced by
         0 or
         0 0 ID; this tells ASReml to fill in the number of units and is a useful option when the exact number of units in the data is not known to the user,
  • the error variance matrix for traits is specified by the model
         Trait 0 US
         3*0
    Three initial values for the matrix are required, being the lower triangle of the (symmetric) matrix specified row-wise.
    Finding reasonable initial values can be a problem. If initial values are written on the next line in the form q*0, where q is t(t + 1)/2 and t is the number of traits, as in the example, ASReml will take half of the phenotypic variance matrix of the data as an initial value.
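
The count and ordering of the initial values can be checked with a short Python sketch. The function names and the 3 x 3 matrix are hypothetical and serve only to show the arithmetic t(t + 1)/2 and the row-wise lower-triangle order.

```python
def n_us_parameters(t):
    """Number of parameters in an unstructured t x t variance matrix."""
    return t * (t + 1) // 2


def lower_triangle_rowwise(matrix):
    """List the lower triangle (including the diagonal) of a matrix row by row."""
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i + 1)]


# Two traits, as in the wether example: three values are needed (hence 3*0).
print(n_us_parameters(2))

# Hypothetical symmetric 3 x 3 matrix; entry "ij" is written as the number ij.
example = [
    [11, 21, 31],
    [21, 22, 32],
    [31, 32, 33],
]
print(lower_triangle_rowwise(example))
```

For t traits the values are therefore given in the order (1,1), (2,1), (2,2), (3,1), (3,2), (3,3), and so on.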

    !ASUV and !ASMV

    These special qualifiers relating to multivariate analysis allow for the situation when
  • !ASUV: the data is in a multivariate layout but some residual variance structure other than IDENTITY cross US is required,
  • !ASMV t: the data (file) is already in an expanded form (n sets of t records) and the multivariate residual variance structure IDENTITY cross US is required.
  • To use an error structure other than US for the residual stratum you must (also) specify !ASUV on the datafile line and include mv in the model if there are missing values.
  • To perform a multivariate analysis (including the automatic handling of missing values) when the data have already been expanded, use !ASMV t on the datafile line, where
         t is the number of traits that ASReml should expect;
         the data file must have t records for each multivariate record, although some may be coded missing.

    G-structure

    For a standard multivariate analysis, a US structure is also used for the between-trait variance matrix of the random terms (as in the example). However, other structured models may be used, and may be necessary when there are more traits, as it is not unusual for US matrices to have no positive definite solution. Note
        the use of !GP to request that the estimated matrix be constrained to be positive definite, and
        the use of 3*0 in lieu of estimates of initial values; ASReml again substitutes a proportion of the observed variance covariance matrix of the data.

    Example

    Below is the output returned in the .asr file for this analysis, except that the !GO qualifiers were omitted.
      ASReml 1.63o [01 Jun 2005]  Orange Wether Trial  1984-88
          Build: j [01 Jul 2005]  32 bit
      13 Jul 2005 09:38:00.928   32.00 Mbyte Windows   wether
      Licensed to: Arthur Gilmour
    
      Folder: C:\data\asr\UG2\manex
       TAG  !I
       BloodLine !I
      QUALIFIERS: !SKIP 1
      Reading wether.dat  FREE FORMAT skipping     1 lines
    
      Bivariate analysis of GFW and FDIAM
      Using     1485 records of    1485 read
       Model term                  Size #miss #zero   MinNon0    Mean      MaxNon0
        1 TAG                       521     0     0      1   261.0956        521
        2 TRIAL                             0     0  3.000      3.000      3.000
        3 BloodLine                  27     0     0      1    13.4323         27
        4 TEAM                       35     0     0      1    18.0067         35
        5 YEAR                        3     0     0      1     2.0391          3
        6 GFW                  Variate      0     0  4.100      7.478      11.20
        7 YLD                               0     0  60.30      75.11      88.60
        8 FDIAM                Variate      0     0  15.90      22.29      30.60
        9 Trait                       2
       10 Trait.YEAR                  6  9 Trait     :   2   5 YEAR           :    3
       11 Trait.TEAM                 70  9 Trait     :   2   4 TEAM           :   35
       12 Trait.TAG                1042  9 Trait     :   2   1 TAG            :  521
        1485  identity
           2  UnStructure    0.2000    0.2000    0.4000
         2970 records assumed sorted    2 within    1485
           2  UnStructure    0.4000    0.3000    1.3000
          35  identity
      Structure for Trait.TEAM         has      70 levels defined
           2  UnStructure    0.2000    0.2000    2.0000
         521  identity
      Structure for Trait.TAG          has    1042 levels defined
      Forming    1120 equations:   8 dense.
      Initial updates will be shrunk by factor    0.316
      Notice: Algebraic ANOVA Denominator DF calculation is not available
              Empirical derivatives will be used.
      NOTICE:      2 singularities detected in design matrix.
        1 LogL=-886.521     S2=  1.0000       2964 df
        2 LogL=-818.508     S2=  1.0000       2964 df
        3 LogL=-755.911     S2=  1.0000       2964 df
        4 LogL=-725.374     S2=  1.0000       2964 df
        5 LogL=-723.475     S2=  1.0000       2964 df
        6 LogL=-723.462     S2=  1.0000       2964 df
        7 LogL=-723.462     S2=  1.0000       2964 df
        8 LogL=-723.462     S2=  1.0000       2964 df
    
      Source                Model  terms     Gamma     Component    Comp/SE
      Residual            UnStru   2   1  0.128890      0.128890      12.40   0 U
      Residual            UnStru   2   2  0.440601      0.440601      21.93   0 U
      Trait.TEAM          UnStru   1   1  0.374493      0.374493       3.89   0 U
      Trait.TEAM          UnStru   2   1  0.388740      0.388740       2.60   0 U
      Trait.TEAM          UnStru   2   2   1.36533       1.36533       3.74   0 U
      Trait.TAG           UnStru   1   1  0.257159      0.257159      12.09   0 U
      Trait.TAG           UnStru   2   1  0.219557      0.219557       5.55   0 U
      Trait.TAG           UnStru   2   2   1.92082       1.92082      14.35   0 U
      Covariance/Variance/Correlation Matrix UnStructured
     0.4360 is the correlation Trait.TEAM
      0.1984     0.4360
      0.1289     0.4406
      Covariance/Variance/Correlation Matrix UnStructured
      0.3745     0.5436
      0.3887      1.365
      Covariance/Variance/Correlation Matrix UnStructured
      0.2572     0.3124
      0.2196      1.921
    
                                        Wald F statistics
          Source of Variation           NumDF     DenDF    F-inc             Prob
        9 Trait                             2      33.0  5761.58            <.001
       10 Trait.YEAR                        4    1162.2  1094.90            <.001
      Notice: The DenDF values are calculated ignoring fixed/boundary/singular
                  variance parameters using empirical derivatives.
    
                          Solution       Standard Error    T-value     T-prev
       10 Trait.YEAR
                         2  -0.102262       0.290190E-01     -3.52
                         3    1.06636       0.290831E-01     36.67     42.07
                         5    1.17407       0.433905E-01     27.06
                         6    2.53439       0.434880E-01     58.28     32.85
        9 Trait
                         1    7.13717       0.107933         66.13
                         2    21.0569       0.209095        100.71     78.16
       11 Trait.TEAM                           70 effects fitted
       12 Trait.TAG                          1042 effects fitted
      SLOPES FOR LOG(ABS(RES)) on LOG(PV) for Section   1
        1.00   1.54
               10  possible outliers: see .res file
      Finished: 13 Jul 2005 09:38:05.725   LogL Converged
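
The Covariance/Variance/Correlation matrices in the listing appear to show variances on the diagonal, the covariance below it and the correlation above it. The Python sketch below recomputes the correlations from the Component column as a check; the function name correlation is an assumption for the example, and the inputs are the Trait.TEAM and Trait.TAG components copied from the listing.

```python
import math


def correlation(var_1, cov_21, var_2):
    """Correlation implied by two variances and their covariance."""
    return cov_21 / math.sqrt(var_1 * var_2)


# Components (1,1), (2,1) and (2,2) for each term, from the listing above.
team_corr = correlation(0.374493, 0.388740, 1.36533)
tag_corr = correlation(0.257159, 0.219557, 1.92082)

print(f"Trait.TEAM correlation: {team_corr:.4f}")
print(f"Trait.TAG correlation:  {tag_corr:.4f}")
```

The printed values can be compared with the upper-triangle entries 0.5436 and 0.3124 reported for the second and third matrices.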
    
