Factor Definition

Introduction

The data fields are defined immediately after the job title. They tell ASReml how many fields to expect in the data file and what they are. No more than 10,000 variables may be read or formed. Data field definitions
  • should be given for all fields in the data file; data fields on the end of a data line that do not have a corresponding field definition will be ignored,
  • must be presented in the order in which they appear in the data file,
  • must be indented one or more spaces,
  • can appear with other definitions on the same line,
  • can include transformations, so that data fields are transformed as they are defined (see below),
  • can create additional data fields by transformation; these should be listed after the data fields read from the data file.

    Syntax

    Usually there will be a field definition for every data field. For example, field definitions typical of a simple randomised block might be
     Randomised Block Experiment       #  Title Line
      Blocks *                         #  coded 1...
      Treatments !A                    #  alphabetic names
      yield                            #  response variable
     rcb.dat                           #  data file
     yield ~ mu Treatments !r Blocks   #  model line
    

    Field definitions appear in the ASReml command file in the form
  • a leading SPACE is required on every line
  • a LABEL for the data field
  • [ FieldType ]
  • [ transformations ]

    LABEL
  • is an alphanumeric string to identify the field,
  • has a maximum of 31 characters, of which only 20 are printed; the remaining characters are not displayed,
  • must begin with a letter,
  • must not contain the special characters ., *, :, /, !, #, | or (,
  • must not be the name of a predefined model term or variance structure.
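
    The LABEL rules above can be checked mechanically before a job is run. The following is a minimal Python sketch, not part of ASReml: the function names are hypothetical, the check for predefined model term and variance structure names is left out because that list is not given here, and taking the printed characters to be the first 20 is an assumption.

```python
# Illustrative check of the LABEL rules listed above; not part of ASReml.
FORBIDDEN_CHARS = set(".*:/!#|(")   # special characters named in the rules
MAX_LABEL_LEN = 31                  # maximum label length
PRINTED_LEN = 20                    # number of characters that are printed


def label_problems(label: str) -> list[str]:
    """Return the stated rules that `label` breaks (empty list if none)."""
    if not label:
        return ["label is empty"]
    problems = []
    if len(label) > MAX_LABEL_LEN:
        problems.append(f"longer than {MAX_LABEL_LEN} characters")
    if not (label[0].isascii() and label[0].isalpha()):
        problems.append("does not begin with a letter")
    bad = sorted(set(label) & FORBIDDEN_CHARS)
    if bad:
        problems.append("contains special characters: " + " ".join(bad))
    return problems


def printed_form(label: str) -> str:
    """Assumption: the 20 printed characters are the first 20."""
    return label[:PRINTED_LEN]
```

    For instance, label_problems("yield") returns an empty list, while a label such as plot.no is reported because it contains a full stop.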

    FieldType defines how a variable is interpreted as it is read and whether it is treated as a factor or a variate if specified in the linear model,
  • for a simple variate, leave FieldType blank or specify 1,
  • for a model factor, various qualifiers are required depending on the form of the factor coding, where n is the number of levels of the factor and s is a list of labels to be assigned to the levels:
  • * or n is used when the data field has values 1... n directly coding for the factor, unless the levels are to be labelled (see !L), for example Row *,
  • !A [n] is required if the data field is alphanumeric; n must be specified if more than 2000 level names are present, for example Location !A,
  • !I [n] is required if the data is numeric but not 1... n; n must be specified if more than 1000 codes are present, for example Year !I,
  • !AS p is required if the data field is similar to a previous !A or !I factor p and is to be coded identically, for example in a plant diallel experiment
    Male !A 22 Female !AS Male # integrated coding
  • !L s is used when the data field is numeric with values 1... n and labels are to be assigned to the n levels, for example
    Sex !L Male Female
    If there are many labels, they may be written over several lines by using a trailing comma to indicate continuation of the list.
  • !P indicates the special case of a pedigree factor; ASReml will determine the levels from the pedigree file.
    In all these, a warning is printed if the nominated value for n does not agree with the actual number of levels found in the data; if the nominated value is too small, the correct value is used.
  • !G m [n] is used when m contiguous data fields are to be treated as a set or group of variates (n omitted or 1) or factor variables (n>1). For example
    :
    X1 X2 X3 X4 X5 y
    data.dat
    y ~ mu X1 X2 X3 X4 X5
    can be expressed as
    :
    X !G 5 y
    data.dat
    y ~ mu X
    so that the 5 variates can be referred to in the model as X by using X !G 5

    Date and Time fields

  • !DATE specifies the field has one of the date formats dd/mm/yy, dd/mm/ccyy, dd-Mon-yy or dd-Mon-ccyy and is to be converted into a Julian day, where dd is a 1 or 2 digit day of the month, mm is a 1 or 2 digit month of the year, Mon is a three letter month name (Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec), yy is the year within the century (00 to 99) and cc is the century (19 or 20). The separators '/' and '-' must be present as indicated. The dates are converted to days since 1899. When the century is not specified, yy of 0-32 is taken as 2000-2032 and yy of 33-99 is taken as 1933-1999.
  • !DMY specifies the field has one of the date formats dd/mm/yy or dd/mm/ccyy and is to be converted into a Julian day.
  • !MDY specifies the field has one of the date formats mm/dd/yy or mm/dd/ccyy and is to be converted into a Julian day.
  • !TIME specifies the field has the format hh:mm:ss and is to be converted into seconds past midnight, where hh is hours (0 to 23), mm is minutes (0 to 59) and ss is seconds (0 to 59). The separator ':' must be present as indicated.
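
    Two of the conversions described above are simple arithmetic and can be sketched in Python. This is an illustration of the stated rules only, with hypothetical function names; it is not ASReml's own code, and the day-number conversion itself is deliberately not reproduced.

```python
# Illustrative sketch of two rules stated above; not ASReml's implementation.

def resolve_year(yy: int) -> int:
    """Expand a two-digit year: 0-32 -> 2000-2032, 33-99 -> 1933-1999."""
    if not 0 <= yy <= 99:
        raise ValueError("yy must be between 0 and 99")
    return 2000 + yy if yy <= 32 else 1900 + yy


def time_to_seconds(text: str) -> int:
    """Convert hh:mm:ss into seconds past midnight."""
    parts = text.split(":")
    if len(parts) != 3:
        raise ValueError("expected hh:mm:ss with ':' separators")
    hh, mm, ss = (int(p) for p in parts)
    if not (0 <= hh <= 23 and 0 <= mm <= 59 and 0 <= ss <= 59):
        raise ValueError("hours, minutes or seconds out of range")
    return hh * 3600 + mm * 60 + ss
```

    Under these rules a year written as 32 resolves to 2032 and 33 resolves to 1933, and a time of 01:02:03 becomes 3723 seconds past midnight.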

    Storage of alphabetic factor labels

    Space is allocated dynamically for the storage of alphabetic factor labels, with a default allocation of 2000 labels, each 16 characters long. If there are large !A factors (so that the total across all factors will exceed 2000), you must specify the anticipated size (within say 5%).
  • If some labels are longer than 16 characters and the extra characters are significant, you must lengthen the space for each label by specifying !LL c, e.g.
    cross !A 2300 !LL 48
    indicates the factor cross will have about 2300 levels and needs 48 characters to hold the level names. Note that only the first 20 characters of the labels are ever printed.
  • !PRUNE on a field definition line means that if fewer levels are actually present in the factor than were declared, ASReml will reduce the factor size to the actual number of levels. Use !PRUNALL for this action to be taken on the current and subsequent factors up to (but not including) a factor with the !PRUNEOFF qualifier.
  • The user may overestimate the size for large ALPHA and INTEGER coded factors so that ASReml reserves enough space for the list. Using !PRUNE will mean the extra (undefined) levels will not appear in the .sln file. Since it is sometimes necessary that factors not be pruned in this way, for example in pedigree/GIV factors, pruning is only done if requested.
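
    The scope of the pruning qualifiers can be pictured with a short Python sketch. It is an illustration only, with a hypothetical function name and input layout (one set of qualifiers per factor, in definition order). It assumes that pruning stays switched off after a !PRUNEOFF factor until another qualifier requests it, which the description above does not state.

```python
# Illustrative sketch of which factors are pruned; not ASReml's implementation.

def pruned_flags(qualifiers_per_factor: list[set[str]]) -> list[bool]:
    """Return, for each factor in definition order, whether it is pruned."""
    run_active = False          # True while a !PRUNALL run is in force
    flags = []
    for quals in qualifiers_per_factor:
        if "!PRUNEOFF" in quals:
            run_active = False  # the run stops before this factor
        elif "!PRUNALL" in quals:
            run_active = True   # the run starts at this factor
        flags.append(run_active or "!PRUNE" in quals)
    return flags


example = [set(), {"!PRUNALL"}, set(), {"!PRUNEOFF"}, {"!PRUNE"}, set()]
print(pruned_flags(example))
```

    In this example the second and third factors are pruned by the !PRUNALL run, the fourth ends the run and is not pruned, and the fifth is pruned on its own by !PRUNE.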

    Reordering the factor levels

    !SORT declared after !A or !I on a field definition line will cause ASReml to sort the levels so that labels occur in alphabetic/numeric order for the analysis. As ASReml reads the data file, it encodes !I and !A factor levels in the order they appear in the data, so the user cannot tell, for example, whether SEX will be coded 1=Male, 2=Female or 1=Female, 2=Male without looking at the data file to see whether Male or Female appears first in the SEX field. If !SORT is specified, ASReml creates a lookup table after reading the data to select levels in sorted order and uses this sorted order when forming the design matrices. Consequently, with the !SORT qualifier, the order of fitted effects will be 1=Female, 2=Male in the analysis regardless of which appears first in the file. This can lead to some confusion because some other operations are applied to the unsorted order. In particular, any transformations are performed as the data is read in, before the sorting occurs.
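
    The difference between the two codings can be sketched in Python. This is an illustration of the behaviour described above, not ASReml's implementation; the function names are hypothetical and Python's default string ordering stands in for the alphabetic/numeric order.

```python
# Illustrative sketch of order-of-appearance coding and a sorted lookup table.

def code_in_order_of_appearance(values: list[str]) -> dict[str, int]:
    """Assign codes 1, 2, ... to labels in the order they are first seen."""
    codes: dict[str, int] = {}
    for value in values:
        if value not in codes:
            codes[value] = len(codes) + 1
    return codes


def sorted_lookup(codes: dict[str, int]) -> dict[int, int]:
    """Map each appearance code to the label's position in sorted order."""
    ordered = sorted(codes)
    return {codes[label]: rank for rank, label in enumerate(ordered, start=1)}


sex_column = ["Male", "Female", "Male", "Female"]
codes = code_in_order_of_appearance(sex_column)   # Male seen first
lookup = sorted_lookup(codes)                     # Female sorts first
```

    Here Male is coded 1 as the data is read because it appears first, and the lookup table then places Female first and Male second. Anything applied while reading, such as a transformation, would see the unsorted codes.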

    !SORTALL means that the levels for the current and subsequent factors are to be sorted.

    Skipping input fields

    !SKIP f will skip f data fields BEFORE reading this field. This is particularly useful in large files with alphabetic fields that are not needed, as it saves ASReml the time required to process the alphabetic labels. For example, Sire !I !skip 1 skips the field before the field which is read as 'Sire'.
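
    The effect of !SKIP on how a data line is consumed can be sketched in Python. This is an illustration only: the function name, the (label, skip) layout and the sample data line are all hypothetical, and the sketch splits on white space without reproducing any of ASReml's own parsing.

```python
# Illustrative sketch of skipping fields before reading one; not ASReml's parser.

def read_fields(tokens: list[str], definitions: list[tuple[str, int]]) -> dict[str, str]:
    """Read one data line; each definition is (label, fields to skip first)."""
    record = {}
    position = 0
    for label, skip in definitions:
        position += skip                 # skip f fields BEFORE this field
        record[label] = tokens[position]
        position += 1
    return record


line = "Bessie 1042 3.7"                 # hypothetical data line
record = read_fields(line.split(), [("Sire", 1), ("weight", 0)])
```

    With these definitions the alphabetic first field is never stored, Sire is read from the second field and weight from the third.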
