Pedigree
Introduction
In an 'animal model' or
'sire model' genetic analysis we have data on a
set of animals that are genetically linked via a pedigree. The
genetic effects are therefore correlated and, assuming normal
modes of inheritance, the correlation expected from additive
genetic effects can be derived from the pedigree
provided all the genetic links are in the
pedigree. The additive genetic relationship matrix (sometimes
called the numerator relationship matrix) can be calculated from
the pedigree. It is actually the inverse relationship
matrix that is formed by ASReml for analysis.
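To make the link between a pedigree and the additive relationship matrix concrete, the following is a minimal Python sketch of the standard tabular method. It is not ASReml's implementation (ASReml forms the inverse directly); the function name `relationship_matrix` and the five-animal pedigree are hypothetical. The sketch assumes every parent has its own line, parents are listed before their progeny, and 0 or * marks an unknown parent.

```python
def relationship_matrix(pedigree, unknown=("0", "*")):
    """Additive relationship matrix A by the tabular method.

    pedigree: list of (individual, sire, dam) tuples with parents listed
    before progeny; identities in `unknown` mark an unknown parent.
    """
    ids = [rec[0] for rec in pedigree]
    pos = {ident: k for k, ident in enumerate(ids)}
    n = len(ids)
    A = [[0.0] * n for _ in range(n)]
    for i, (_, sire, dam) in enumerate(pedigree):
        parents = [pos[p] for p in (sire, dam) if p not in unknown]
        # The relationship with an earlier individual j is half the sum of
        # j's relationships with the known parents of i.
        for j in range(i):
            A[i][j] = A[j][i] = sum(0.5 * A[j][p] for p in parents)
        # The diagonal is 1 + F_i; F_i is half the relationship between
        # the parents when both are known, otherwise zero.
        f = 0.5 * A[parents[0]][parents[1]] if len(parents) == 2 else 0.0
        A[i][i] = 1.0 + f
    return ids, A


# Hypothetical pedigree: X and Y are full sibs, Z is their offspring.
ped = [("S", "0", "0"), ("D", "0", "0"),
       ("X", "S", "D"), ("Y", "S", "D"), ("Z", "X", "Y")]
ids, A = relationship_matrix(ped)
```

In this toy pedigree the diagonal element for Z exceeds 1 because its parents are related; the excess over 1 is the inbreeding coefficient of Z.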
Users new to this subject might find notes on Mixed Models for Genetic Analysis by Julius van der Werf helpful.
This document can be downloaded from the ASReml user area of the VSN website at:
http://www.vsni.co.uk/software/asreml/user-area/.
For the more general situation where the pedigree-based inverse
relationship matrix is not the appropriate or required matrix, the user can
provide a particular general inverse variance (GIV) matrix explicitly in a .giv file.
In this chapter we consider data presented in Harvey (1977) using
the command file harvey.as.
Pedigree file example
animal !P
sire !A
dam
lines 2
damage
adailygain
harvey.ped !ALPHA
harvey.dat
adailygain ~ mu lines, !r animal 0.25
Pedigree factor type
In ASReml the !P data field qualifier indicates
that the corresponding data field has an associated pedigree. The file
containing the pedigree (harvey.ped in the example) for animal
is specified after all field definitions and before
the data file definition.
See below for the first 20 lines of harvey.ped
together with the corresponding lines of the data file harvey.dat.
All individuals appearing in the data file must
appear in the pedigree file.
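A quick check before running the analysis can catch data records whose identity is absent from the pedigree. The sketch below is a hypothetical helper, not part of ASReml; it assumes both files have already been split into fields and treats an identity as present if it occurs in any of the first three pedigree fields.

```python
def missing_from_pedigree(data_ids, pedigree, unknown=("0", "*")):
    """Return the data identities that occur nowhere in the pedigree."""
    known = set()
    for record in pedigree:
        # Collect individual, sire and dam identities, skipping the
        # codes used for unknown parents.
        known.update(x for x in record[:3] if x not in unknown)
    return sorted(set(data_ids) - known)


# Hypothetical records: animal 103 is in the data but not in the pedigree.
pedigree = [("101", "SIRE_1", "0"), ("102", "SIRE_1", "0")]
missing = missing_from_pedigree(["101", "102", "103"], pedigree)
```

An empty result means every animal in the data can be matched to the pedigree.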
When all the pedigree information (individual, male_parent, female_parent)
appears as the first three fields of the data file, the data
file can double as the pedigree file. In this example the line
harvey.ped !ALPHA
could be replaced with
harvey.dat !ALPHA.
Typically the pedigree file also contains further individuals that provide additional genetic links.
The pedigree file
The pedigree file is used to define the genetic
relationships for fitting a genetic animal model and is required if
the !P qualifier is associated with a data field.
The pedigree file
has three fields: the identities of an individual, its sire and its dam (or maternal grandsire if the !MGS qualifier is specified), in that order; use identity 0 or * for unknown parents,
may have an optional fourth field supplying inbreeding/selfing information, which is used if the !FGEN qualifier is specified,
must have a fourth field specifying the sex of the individual if the !XLINK qualifier is specified,
is sorted so that the line giving the pedigree of an individual appears before any line where that individual appears as a parent,
is read free format; it may be the same file as the data file if the data file is free format and has the necessary identities in the first three fields, see below,
is specified on the line immediately preceding the data file line in the command file.
harvey.ped      harvey.dat
101 SIRE_1 0    101 SIRE_1 0 1 3 192 390 2241
102 SIRE_1 0    102 SIRE_1 0 1 3 154 403 2651
103 SIRE_1 0    103 SIRE_1 0 1 4 185 432 2411
104 SIRE_1 0    104 SIRE_1 0 1 4 183 457 2251
105 SIRE_1 0    105 SIRE_1 0 1 5 186 483 2581
106 SIRE_1 0    106 SIRE_1 0 1 5 177 469 2671
107 SIRE_1 0    107 SIRE_1 0 1 5 177 428 2711
108 SIRE_1 0    108 SIRE_1 0 1 5 163 439 2471
109 SIRE_2 0    109 SIRE_2 0 1 4 188 439 2292
110 SIRE_2 0    110 SIRE_2 0 1 4 178 407 2262
111 SIRE_2 0    111 SIRE_2 0 1 5 198 498 1972
112 SIRE_2 0    112 SIRE_2 0 1 5 193 459 2142
113 SIRE_2 0    113 SIRE_2 0 1 5 186 459 2442
114 SIRE_2 0    114 SIRE_2 0 1 5 175 375 2522
115 SIRE_2 0    115 SIRE_2 0 1 5 171 382 1722
116 SIRE_2 0    116 SIRE_2 0 1 5 168 417 2752
117 SIRE_3 0    117 SIRE_3 0 1 3 154 389 2383
118 SIRE_3 0    118 SIRE_3 0 1 4 184 414 2463
119 SIRE_3 0    119 SIRE_3 0 1 5 174 483 2293
120 SIRE_3 0    120 SIRE_3 0 1 5 170 430 2303
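The ordering requirement (an individual's own line must come before any line that names it as a parent) is easy to verify outside ASReml. The following Python sketch is a hypothetical helper that assumes the pedigree has already been read into (individual, sire, dam) tuples.

```python
def order_violations(pedigree, unknown=("0", "*")):
    """Return individuals whose own line follows a line naming them as a parent."""
    used_as_parent = set()
    late = []
    for individual, sire, dam in pedigree:
        # If this identity has already been seen as a parent, its own
        # pedigree line has come too late.
        if individual in used_as_parent:
            late.append(individual)
        used_as_parent.update(p for p in (sire, dam) if p not in unknown)
    return late


# Hypothetical pedigree in which the parents are listed after their progeny.
bad_order = [("3", "1", "2"), ("1", "0", "0"), ("2", "0", "0")]
violations = order_violations(bad_order)
```

Parents that never appear in the first field are not reported, since such individuals are simply treated as having unknown parents.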
Reading in the pedigree file
The syntax for specifying a pedigree file in the ASReml command file is
pedigree_file [qualifiers]
the qualifiers are listed below,
the identities (individual, male_parent, female_parent) are merged into a single list and the inverse relationship matrix is formed before the data file is read,
when the data file is read, data fields with the !P qualifier are recoded according to the combined identity list,
the inverse relationship matrix is automatically associated with factors coded from the pedigree file unless some other covariance structure is specified; the inverse relationship matrix is specified with the variance model name AINV,
the inverse relationship matrix is written to ainverse.bin,
if ainverse.bin already exists, ASReml assumes it was formed in a previous run and holds the correct inverse; ainverse.bin is read rather than the inverse being re-formed (unless !MAKE is specified); this saves time when performing repeated analyses based on a particular pedigree; delete ainverse.bin or specify !MAKE if the pedigree is changed between runs,
identities are printed in the .sln file,
identities should be whole numbers less than 200,000,000 unless !ALPHA is specified,
pedigree lines for parents must precede their progeny,
unknown parents should be given the identity number 0,
if an individual appearing as a parent does not appear in the first column, it is assumed to have unknown parents; that is, parents with unknown parentage do not need their own line in the file,
identities may appear as both male and female parents, for example, in forestry.
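To illustrate how an inverse relationship matrix can be built directly from the pedigree without first forming the relationship matrix, here is a minimal Python sketch using Henderson's rules. It is a simplification that ignores inbreeding and is not the algorithm ASReml uses; the function name is hypothetical, and the sketch assumes every parent has its own pedigree line.

```python
def a_inverse_no_inbreeding(pedigree, unknown=("0", "*")):
    """Inverse relationship matrix by Henderson's rules, ignoring inbreeding."""
    pos = {rec[0]: k for k, rec in enumerate(pedigree)}
    n = len(pedigree)
    ainv = [[0.0] * n for _ in range(n)]
    # Weight depends on how many parents are known: 2, 4/3 or 1.
    weight = {2: 2.0, 1: 4.0 / 3.0, 0: 1.0}
    for i, (_, sire, dam) in enumerate(pedigree):
        parents = [pos[p] for p in (sire, dam) if p not in unknown]
        alpha = weight[len(parents)]
        ainv[i][i] += alpha
        for p in parents:
            # Individual-by-parent contributions.
            ainv[i][p] -= alpha / 2.0
            ainv[p][i] -= alpha / 2.0
            # Parent-by-parent contributions.
            for q in parents:
                ainv[p][q] += alpha / 4.0
    return ainv


# Hypothetical trio: two unrelated base parents and one offspring.
trio = [("S", "0", "0"), ("D", "0", "0"), ("X", "S", "D")]
ainv = a_inverse_no_inbreeding(trio)
```

Each pedigree line touches at most nine elements of the matrix, so the inverse is sparse.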
Pedigree file qualifiers
!ALPHA
indicates that the identities are alphanumeric with up to 20 characters; otherwise by default they are numeric whole numbers less than 200,000,000.
!DIAG
causes the pedigree identifiers, the
diagonal elements of the inverse of the relationship matrix
and the inbreeding coefficients
for the individuals (calculated as the diagonal of A - I)
to be written to basename.aif.
!FGEN [f]
indicates the individuals in the pedigree are inbred to some degree.
The pedigree file
contains a fourth field indicating the level of selfing
or the level of inbreeding in a base individual.
In the fourth field,
0 indicates a simple cross, 1 indicates selfed once,
2 indicates selfed twice, etc. A value between 0 and 1 for a base
individual is taken as its inbreeding value. If the pedigree
has implicit individuals (they appear as parents but not in
the first field of the pedigree file), they will be assumed to be base non-inbred
individuals unless their inbreeding level is set with !FGEN f,
where 0 < f < 1 is the inbreeding level of such individuals.
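As a rough guide to what the selfing levels in the fourth field imply, the sketch below uses the textbook result that each generation of selfing halves the remaining heterozygosity. Whether ASReml applies exactly this formula internally is an assumption here, and the function name is hypothetical.

```python
def inbreeding_after_selfing(generations, base_f=0.0):
    """Inbreeding coefficient after repeated selfing of one individual.

    generations: number of selfing generations (0 means a simple cross).
    base_f: inbreeding coefficient before selfing starts.
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    # Each selfing generation halves the non-inbred fraction (1 - F).
    return 1.0 - (1.0 - base_f) * 0.5 ** generations


levels = [inbreeding_after_selfing(n) for n in range(4)]
```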
!GIV
instructs ASReml to write out the A-inverse in the format of
.giv
files.
!Goffset o
An alternative to group constraints
(see !GROUP below) is to shrink the group effects by
adding the constant o (> 0.0)
to the diagonal elements of A inverse pertaining to groups.
When a constant is added, no adjustment of the degrees of freedom is made for
genetic groups.
Use !Goffset -1 to add no offset but to suppress insertion of constraints
where empty groups appear. The empty groups are then
not counted in the DF adjustment.
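The effect of a positive offset can be pictured with a small Python sketch. It assumes, purely for illustration, that the groups occupy the first rows of a dense A inverse held as a list of lists; the function name is hypothetical and ASReml's internal storage will differ.

```python
def add_group_offset(a_inverse, n_groups, offset):
    """Return a copy of A inverse with `offset` added to the group diagonals."""
    if offset <= 0.0:
        raise ValueError("offset must be greater than 0.0")
    shifted = [row[:] for row in a_inverse]
    # Only the diagonal elements belonging to the groups are changed.
    for g in range(n_groups):
        shifted[g][g] += offset
    return shifted


# Hypothetical 3 x 3 inverse in which the first row and column are a group.
example = [[0.5, 0.0, -1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 2.0]]
shrunk = add_group_offset(example, n_groups=1, offset=1.0)
```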
!GROUPS g
includes genetic groups in the pedigree. The first g lines of the pedigree identify genetic groups (with zero in both the sire and dam fields). All other lines must specify one of the genetic groups as sire or dam if the actual parent is unknown.
You may insert Groups with no members to define
constraints on groups, that is to associate groups into supergroups
where the supergroup fixed effect is formally fitted separately in the model.
A constraint is added to the inverse which
causes the preceding set of groups which have members to have effects
which sum to zero. The issue is to get the degrees of freedom correct
and to get the correct calculation of the Likelihood, especially in
bivariate cases where DF associated with groups may differ between traits.
The !LAST qualifier
is designed to help because, without it, reordering
may associate singularities in the A matrix with random effects,
which at the very least is confusing. When the A matrix incorporates
fixed effects, the number of DF involved may not be obvious, especially
if there is also a sparsely fitted fixed HYS factor. The number of Fixed
effects (degrees of freedom) associated with GROUPS is taken as
the declared number less twice the number of constraints applied.
This assumes all groups are represented in the data, and that
degrees of freedom associated with group constraints will
be fitted elsewhere in the model.
!INBRED
generates pedigree for inbred lines.
Each cross is assumed to
be selfed several times to stabilize as an inbred line as is usual for
cereals, before being evaluated or crossed with another line.
Since inbreeding is usually associated with strong selection,
it is not obvious that a pedigree assumption of covariance of 0.5 between parent and offspring actually holds.
Do not use the !INBRED qualifier with the !MGS or !SELF qualifiers.
!LONGINTEGER
indicates the identifiers are numeric integers with fewer than 16 digits. The default is
integer values with fewer than 9 digits. The alternative is alphanumeric identifiers with
up to 20 characters, indicated by !ALPHA.
!MAKE
tells ASReml to make the A-inverse
(rather than trying to retrieve it from the ainverse.bin file).
!MGS
indicates that the third identity is the sire of the dam rather than the dam.
!MEUWISSEN
The default method for forming A inverse is based on the algorithm of
Meuwissen and Luo (1992).
!QUASS
The original routine for calculating A inverse in ASReml was based on Quass ()
!REPEAT
tells ASReml to ignore
repeat occurrences of lines in the pedigree file.
Use of this option will avoid the check that animals occur in chronological order, but chronological order is still required.
!SARGOLZAEI
invokes an alternative procedure for computing A inverse developed by
Sargolzaei et al. (2005).
!SELF s
allows partial selfing when the third field is unknown.
It indicates that progeny from a cross where the second parent (male_parent)
is unknown are assumed to be from selfing with probability s and
from outcrossing with probability (1-s).
This is appropriate in some forestry tree breeding studies where seed collected
from a tree may have been pollinated by the mother tree or pollinated
by some other tree.
Do not use the !SELF qualifier with the !INBRED or !MGS qualifiers.
!SKIP n
allows you to skip n header lines at the top of the file.
!SORT
causes ASReml to sort the pedigree into an acceptable order,
that is, parents before offspring,
before forming the A-inverse. The sorted pedigree is written to
a file whose name has .srt appended.
A PDF file, pedigree.pdf, contains details of these options.
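The kind of reordering involved can be sketched in Python as a depth-first traversal that emits each individual only after its parents. This is an illustrative, hypothetical helper and not ASReml's sorting routine; it assumes one record per individual and uses recursion, so a very deep pedigree would need an iterative version.

```python
def sort_pedigree(pedigree, unknown=("0", "*")):
    """Reorder pedigree records so that parents precede their offspring."""
    records = {rec[0]: rec for rec in pedigree}
    state = {}
    ordered = []

    def visit(ident):
        # Unknown parents and parents without their own line need no record.
        if ident in unknown or ident not in records:
            return
        if state.get(ident) == "done":
            return
        if state.get(ident) == "active":
            raise ValueError("individual %s is its own ancestor" % ident)
        state[ident] = "active"
        sire, dam = records[ident][1:3]
        visit(sire)
        visit(dam)
        state[ident] = "done"
        ordered.append(records[ident])

    for rec in pedigree:
        visit(rec[0])
    return ordered


# Hypothetical pedigree listed with the offspring first.
unsorted_ped = [("3", "1", "2"), ("1", "0", "0"), ("2", "0", "0")]
sorted_ped = sort_pedigree(unsorted_ped)
```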
!XLINK
requests the formation of the (inverse)
relationship matrix for the X chromosome as described by
Fernando and Grossman (1990) for species where the
male is XY and the female is XX. This NRM inverse matrix is formed
in addition to the usual A inverse and can be accessed as
GIV1 or as specified in the output.
The pedigree must include a fourth field which codes the SEX of the
individual. The actual code used is up to the user and deduced from the
first line which is assumed to be a male. Thus, whatever string
is found in the fourth field on the first line of the pedigree is
taken to mean MALE and any other code found on other records is
taken to mean FEMALE.
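The sex-coding rule described for !XLINK can be mimicked in a few lines of Python. This is a hypothetical helper for checking a pedigree before analysis, assuming each record is a four-field tuple.

```python
def decode_sex(pedigree):
    """Map each individual to 'MALE' or 'FEMALE' from the fourth field.

    The code found on the first line is taken to mean male; any other
    code is taken to mean female.
    """
    male_code = pedigree[0][3]
    return {rec[0]: "MALE" if rec[3] == male_code else "FEMALE"
            for rec in pedigree}


# Hypothetical pedigree in which the first individual is male.
sexes = decode_sex([("1", "0", "0", "M"), ("2", "0", "0", "F"),
                    ("3", "1", "2", "M")])
```

Because the meaning of the code is inferred from the first line, a pedigree that starts with a female would have every sex reversed.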
Genetic groups
If all individuals belong to one genetic group, then use 0 as the
identity of the parents of base individuals. However, if base
individuals belong to various genetic groups, this is indicated by the
!GROUP qualifier and the pedigree file must
begin by identifying these groups. All base individuals should have
group identifiers as parents. In this case the identity 0 will only
appear on the group identity lines, as in the following
example where three sire lines are fitted as genetic groups.
Genetic group example
animal !P
sire 9 !A
dam
lines 2
damage
adailygain
harveyg.ped !ALPHA !MAKE !GROUP 3
harvey.dat
adailygain ~ mu !r animal 0.25 !GU
G1 0 0
G2 0 0
G3 0 0
SIRE_1 G1 G1
SIRE_2 G1 G1
SIRE_3 G1 G1
SIRE_4 G2 G2
SIRE_5 G2 G2
SIRE_6 G3 G3
SIRE_7 G3 G3
SIRE_8 G3 G3
SIRE_9 G3 G3
101 SIRE_1 G1
102 SIRE_1 G1
103 SIRE_1 G1
...
163 SIRE_9 G3
164 SIRE_9 G3
165 SIRE_9 G3
It is usually
appropriate to allocate a genetic group identifier where the parent is unknown.
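A pedigree with genetic groups can be checked against these rules before it is passed to ASReml. The Python sketch below is a hypothetical helper; it assumes the first g records are the group lines and that 0 is the only code for an unknown parent.

```python
def group_rule_violations(pedigree, n_groups, unknown="0"):
    """Return identities whose pedigree line breaks the genetic group rules."""
    problems = []
    for k, (ident, sire, dam) in enumerate(pedigree):
        if k < n_groups:
            # Group lines must have unknown (zero) sire and dam.
            if sire != unknown or dam != unknown:
                problems.append(ident)
        elif sire == unknown or dam == unknown:
            # Other lines should name a group where a parent is unknown.
            problems.append(ident)
    return problems


# Hypothetical records: animal 102 still uses 0 instead of a group.
ped = [("G1", "0", "0"), ("G2", "0", "0"),
       ("S1", "G1", "G1"), ("101", "S1", "G1"), ("102", "S1", "0")]
bad = group_rule_violations(ped, n_groups=2)
```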