Overlaying design matrices for a parental model in ASReml-R

Most breeding programs plan several controlled crosses between outstanding parents to detect favorable alleles in their offspring. The progeny are later evaluated in a field experiment, and this information is used to assess the genetic worth of the parents by fitting parental linear mixed models (LMMs) and obtaining best linear unbiased predictions (BLUPs). These BLUPs, which are the general combining ability (GCA), or 1/2 of the breeding value (BV, with BV = 2 × GCA) of each parent, are then used to select the best parents for future crosses or operational deployment.

In many plant breeding programs, a parent is considered in several crosses. In most cases it is easy to assign the sex of a given individual. However, several commercial plant species are monoecious, which means that a given genotype will bear both male and female flowers. In contrast, dioecious species have distinctive male and female plants. Some examples of monoecious species are corn, squash, banana, and many conifers, particularly those of the genus Pinus.

In quantitative genetic analyses, monoecious species present a particular challenge, as a given parent can contribute to the estimation of its breeding value (or GCA) as both a male and a female, something that needs to be taken into consideration when a statistical model is fitted.

This is done by overlaying design matrices of the factors associated with male and female parents. We will illustrate this here using an example of loblolly pine (Pinus taeda) in which some individuals, depending on the availability of pollen or flowers, were used in several artificial crosses as both male and female in the breeding program.

The data used here originate from a loblolly pine clonal study published by Resende et al. (2012). Parents were crossed in a circular mating design, constituting several full-sib families. Individuals from these families were vegetatively propagated (cloned) and established in a series of field trials. A subset of the full dataset, corresponding to diameter at breast height (DBH, inches) measured 6 years after planting at the Nassau (Florida, USA) site, is used here. We will be using the adjusted mean values for this trait. In this dataset we have a total of 71 families that originated from 43 parents; of those parents, 20 were used as both males and females in different crosses. There were no reciprocal or self-pollinated crosses planned, but these can occur in other crops and they might need to be identified and modelled properly. In this dataset, each family is formed by between 1 and 16 individuals (with an average of 12.2). We also have pedigree information for the 43 parents.
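The number of parents used in both roles can be checked directly from the mother and father columns. The sketch below is only an illustration: the crosses are made up (they are not the loblolly pine data) and the helper name shared_parents is hypothetical. Applied to the mother and father columns of the phenotypic data, the same call would be expected to return the 20 parents mentioned above.

```r
# Hypothetical helper: parents that appear both as mother and as father
shared_parents <- function(mother, father) {
  sort(intersect(as.character(mother), as.character(father)))
}

# Made-up crosses for illustration only (not the loblolly pine data)
mother <- c("P1", "P1", "P2", "P3", "P4")
father <- c("P2", "P3", "P3", "P4", "P5")

both  <- shared_parents(mother, father)             # parents used in both roles
total <- length(union(mother, father))              # distinct parents overall
n_fam <- nrow(unique(data.frame(mother, father)))   # distinct mother x father pairs
```

Note that n_fam counts ordered mother by father pairs, so a reciprocal cross would be counted as a separate family.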

The first few lines of the phenotypic dataset are presented below:

  tree mother father family    DBH
280200  20012  44112   1017  9.410
282200  20022  44090   1020 11.953
284400  44126 142170   1049 10.785
285210  50152  58040   1056  9.950
285600 142024  44112   1006 10.325
286600  44012  17766   1036 11.122

And the first and last few lines of parental pedigree file are:

   genotype mother father
1     14006      0      0
2     14046      0      0
3     14060      0      0
4     14070      0      0
5     14104      0      0
6     14106      0      0
...
48   142076  14114  14104
49   142170  14106  14046
50   142194  14216  14060
51   202060  20030  20052
52   202096  20034  20052
53   222056  22034  14006
54   222060  22020  22034

In the phenotypic data, each row contains the id of the individual tree, together with the ids of its mother, father and family, followed by the adjusted phenotypic response DBH. In the pedigree file there is information about the mother and father of each of the parents used in the crosses, with unknown parents coded as 0. For example, note that tree 284400 has father 142170, and this genotype has as parents genotypes 14106 and 14046.

We are interested in fitting a parental model, where our objective is to estimate GCA values for each of the parents. (Note that this is different from an animal model, where we estimate a BV for each of the individuals, whether parents or offspring.) Our parental base model is:

$$y_{ijk} = \mu + f_i + m_j + fm_{ij} + e_{ijk}$$

where $y_{ijk}$ is the response for the kth individual originating from the cross between the ith female and the jth male; $\mu$ is the overall mean; $f_i$ is the random effect of the ith female, with $f_i \sim N(0, \sigma^2_f)$; $m_j$ is the random effect of the jth male, with $m_j \sim N(0, \sigma^2_m)$; $fm_{ij}$ is the interaction or family effect of the cross between the ith female and the jth male, with $fm_{ij} \sim N(0, \sigma^2_{fm})$; and $e_{ijk}$ is the random residual, with $e_{ijk} \sim N(0, \sigma^2_e)$.

Note that for the parental effects $f_i$ and $m_j$ there is a pedigree available, which is used to obtain the numerator relationship matrix $\mathbf{A}$ that is incorporated in our analyses. Hence, the vectors of parental effects have the assumptions $\mathbf{f} \sim MVN(\mathbf{0}, \sigma^2_f \mathbf{A})$ and $\mathbf{m} \sim MVN(\mathbf{0}, \sigma^2_m \mathbf{A})$.

In the above model, we are assuming that there are distinct variances for females and males, $\sigma^2_f$ and $\sigma^2_m$, and that these can take any positive value.
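To make the role of the pedigree concrete, the following base-R sketch builds a numerator relationship matrix with the tabular method for a tiny made-up pedigree. It is an illustration under stated assumptions: the function name build_A and the pedigree are hypothetical, every parent must be listed before its offspring, and unknown parents are coded as 0 as in the pedigree file above. ASReml-R itself does not need this step.

```r
# Hypothetical helper: numerator relationship matrix by the tabular method.
# ped has columns genotype, mother, father; unknown parents are coded "0",
# and every parent must appear in an earlier row than its offspring.
build_A <- function(ped) {
  n    <- nrow(ped)
  id   <- as.character(ped$genotype)
  dam  <- match(as.character(ped$mother), id)   # NA when the parent is unknown
  sire <- match(as.character(ped$father), id)
  A <- matrix(0, n, n, dimnames = list(id, id))
  for (i in seq_len(n)) {
    d <- dam[i]
    s <- sire[i]
    # Relationship with each earlier individual: mean of the parents' values
    for (j in seq_len(i - 1)) {
      a_dj <- if (is.na(d)) 0 else A[j, d]
      a_sj <- if (is.na(s)) 0 else A[j, s]
      A[i, j] <- A[j, i] <- 0.5 * (a_dj + a_sj)
    }
    # Diagonal: 1 plus inbreeding (half the relationship between the parents)
    f_i <- if (is.na(d) || is.na(s)) 0 else 0.5 * A[d, s]
    A[i, i] <- 1 + f_i
  }
  A
}

# Made-up pedigree: two founders, two full-sibs, and one inbred offspring
ped <- data.frame(
  genotype = c("P1", "P2", "F1", "F2", "G1"),
  mother   = c("0",  "0",  "P1", "P1", "F1"),
  father   = c("0",  "0",  "P2", "P2", "F2"),
  stringsAsFactors = FALSE
)
A <- build_A(ped)
```

In this toy pedigree the full-sibs F1 and F2 have a relationship of 0.5, and G1, the offspring of a full-sib mating, has a diagonal element of 1.25.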

We can fit the above model with ASReml-R using the following code:

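library(asreml)

# Inverse of the numerator relationship matrix from the parental pedigree
ainv <- ainverse(PED)

# Parental model: separate female and male effects plus a family term
modP <- asreml(fixed = DBH ~ 1,
               random = ~ vm(mother, ainv) + vm(father, ainv) + family,
               residual = ~ idv(units),
               data = PINE)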

In this code we generate the inverse of the numerator relationship matrix using the command ainverse(). This matrix is incorporated into the model using the function vm(). The log-likelihood of this model fitted to our data is used later for comparison, and the estimated variance components are:

                 component std.error   z.ratio bound %ch
vm(mother, ainv) 0.2774623 0.1121623  2.473759     P   0
vm(father, ainv) 0.3815927 0.1348447  2.829868     P   0
family           0.0000004        NA        NA     B   0
units!units      1.8702273 0.0930707 20.094696     P   0
units!R          1.0000000        NA        NA     F   0

Note the difference in variance components between males (fathers, 0.382) and females (mothers, 0.277). Finding a difference in variance components by sex is common for plant species. It might be genuine, depending on the origin of the parents, but other biological reasons might also explain these differences. Also note that the family variance is at the boundary (B), with an estimate of almost zero.

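In our case, we can calculate the narrow-sense heritability for females and males ($h^2_f$ and $h^2_m$) and the dominance ratio ($d^2$) by using the following expressions, where $\sigma^2_f$, $\sigma^2_m$, $\sigma^2_{fm}$ and $\sigma^2_e$ are the female, male, family and residual variances:

$$h^2_f = \frac{4\sigma^2_f}{\sigma^2_f + \sigma^2_m + \sigma^2_{fm} + \sigma^2_e}, \qquad h^2_m = \frac{4\sigma^2_m}{\sigma^2_f + \sigma^2_m + \sigma^2_{fm} + \sigma^2_e}, \qquad d^2 = \frac{4\sigma^2_{fm}}{\sigma^2_f + \sigma^2_m + \sigma^2_{fm} + \sigma^2_e}$$

In some cases, particularly when the heritability estimates for the two sexes are not very different, these can be combined into a single estimate, as:

$$h^2 = \frac{2(\sigma^2_f + \sigma^2_m)}{\sigma^2_f + \sigma^2_m + \sigma^2_{fm} + \sigma^2_e}$$

All of the above expressions were obtained in ASReml-R using the command vpredict(), as shown below. Here V1 to V4 follow the order of the variance components table (mother, father, family and units), so the labels h2m and h2f in the code refer to mothers and fathers, respectively: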

vpredict(modP,h2m~4*V1/(V1+V2+V3+V4))
vpredict(modP,h2f~4*V2/(V1+V2+V3+V4))
vpredict(modP,d2~4*V3/(V1+V2+V3+V4))
vpredict(modP,h2~2*(V1+V2)/(V1+V2+V3+V4))

with the results:

     Estimate        SE
h2m 0.4388000 0.1616208
h2f 0.6034797 0.1858860
d2  0.0000007 0.0000001
h2  0.5211399 0.1043225

As expected, our dominance ratio is very small for this particular trait; hence, it could be ignored. Also, the narrow-sense heritabilities are reasonable for this type of study, with a combined estimate of 0.521 but with a moderate approximated standard error (0.104).
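The point estimates returned by vpredict() can be reproduced by hand from the variance components table, which is a useful check that V1 to V4 are in the intended order. The sketch below does this in base R with the values printed above; the object names are illustrative. The standard errors are not reproduced, since they depend on the sampling covariances of the variance components, which are not shown in this article.

```r
# Variance components copied from the table above (same order as in the model)
V <- c(mother = 0.2774623, father = 0.3815927,
       family = 0.0000004, units = 1.8702273)
total <- sum(V)   # phenotypic variance under the parental model

ratios <- c(
  h2m = 4 * V[["mother"]] / total,                     # labelled h2m in the code
  h2f = 4 * V[["father"]] / total,                     # labelled h2f in the code
  d2  = 4 * V[["family"]] / total,                     # dominance ratio
  h2  = 2 * (V[["mother"]] + V[["father"]]) / total    # combined heritability
)
round(ratios, 4)
```

The rounded values agree with the estimates reported by vpredict() above.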

Also, we can obtain the BLUPs for the model terms vm(mother, ainv) and vm(father, ainv), which in this case correspond to the GCA values. If these are multiplied by two, we obtain the BVs. The GCAs are obtained with the code:

summary(modP,coef=TRUE)$coef.random

In the output below, we can observe a few of the estimated GCA values for the mothers (top) and fathers (bottom).

                         solution std.error    z.ratio
vm(mother, ainv)_14006  0.3600002 0.2841548  1.2669160
vm(mother, ainv)_14046  0.0000000 0.5267472  0.0000000
vm(mother, ainv)_14060 -0.2286267 0.4864881 -0.4699533
vm(mother, ainv)_14070  0.0000000 0.5267472  0.0000000
vm(mother, ainv)_14104 -0.4620406 0.4208766 -1.0978056
vm(mother, ainv)_14106  0.0000000 0.5267472  0.0000000

                         solution std.error    z.ratio
vm(father, ainv)_14006  0.0013612 0.2827143  0.0048148
vm(father, ainv)_14046 -0.5495441 0.5510751 -0.9972219
vm(father, ainv)_14060  0.0000000 0.6177321  0.0000000
vm(father, ainv)_14070 -0.3792472 0.3257064 -1.1643837
vm(father, ainv)_14104 -0.2087355 0.5024186 -0.4154613
vm(father, ainv)_14106 -0.5495441 0.5510751 -0.9972219
vm(father, ainv)_14114  0.1476378 0.3462045  0.4264466

Note that for parent 14006 the estimates are different as a mother and as a father (0.360 against 0.001), and their precision depends on the amount of information a parent has (i.e., number of offspring), but also on its relationship with other genotypes in the dataset.
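As a small illustration of the step from GCA to BV, the sketch below doubles a few of the mother GCA predictions listed above. The data frame is typed in by hand to mimic the layout of the output (its name and column names are illustrative); with a fitted model the same arithmetic would be applied to the solution column.

```r
# GCA predictions for three mothers, copied from the output above
gca <- data.frame(
  parent    = c("14006", "14060", "14104"),
  solution  = c(0.3600002, -0.2286267, -0.4620406),
  std.error = c(0.2841548, 0.4864881, 0.4208766),
  stringsAsFactors = FALSE
)

# BV = 2 x GCA; the standard error scales by the same factor
gca$BV    <- 2 * gca$solution
gca$BV.se <- 2 * gca$std.error

# Rank parents from highest to lowest predicted breeding value
ranked <- gca[order(gca$BV, decreasing = TRUE), ]
ranked
```

Because the BV is a constant multiple of the GCA, the ranking of parents and the z-ratios are unchanged by this transformation.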

So far, we have not combined the information from a given parent when it is used as a male and as a female. Now, it is of interest to overlay the factors mother and father. To do this in ASReml-R we require two elements: the use of the and() statement in the model and the specification of the argument equate.levels. This is illustrated in the following code for the same data as above:

modO <- asreml(fixed=DBH~1,
               random=~vm(mother,ainv)+and(vm(father,ainv)),
               equate.levels=c('mother','father'),
               residual=~idv(units),
               data=PINE)

First, let's focus on the and() function. This is used to request that the design matrix associated with vm(father, ainv) is overlaid on the design matrix associated with the immediately previous term (here, vm(mother, ainv)). Second, the expression equate.levels=c('mother','father') requests that the levels of the factors mother and father are treated as identical. That is, levels that have the same name in the factors mother and father will be treated as the same, and levels that are found in only one of the factors will be added to the new design matrix. Both of the above elements are required in ASReml-R to ensure that consistent BV (or GCA) estimates are obtained.
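The need to equate levels can be mimicked in base R. When mother and father are coded as separate factors, the same parent may receive a different internal code in each factor, so the columns of their design matrices would not line up. Recoding both factors on the union of their levels fixes this. The sketch uses made-up parent labels and is only an analogy for what equate.levels requests, not a description of ASReml-R internals.

```r
# Made-up parents for illustration only
mother <- factor(c("P1", "P3", "P2"))
father <- factor(c("P3", "P4", "P3"))

# Coded separately, P3 is level 3 of mother but level 1 of father
code_m <- as.integer(mother)
code_f <- as.integer(father)

# Recode both factors on the union of their levels
all_levels <- sort(union(levels(mother), levels(father)))
mother_eq  <- factor(mother, levels = all_levels)
father_eq  <- factor(father, levels = all_levels)

# After recoding, P3 has the same code in both factors
code_m_eq <- as.integer(mother_eq)
code_f_eq <- as.integer(father_eq)
```

Levels that occur in only one factor (P1 and P2 for mothers, P4 for fathers) are kept, so the combined factor has one level per parent.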

In order to illustrate the above overlay of matrices, we will use a simple example, in which we have eight individuals originating from crosses of four parents (P1, P2, P3 and P4). These parents were used as female or male, and in some cases we have reciprocal crosses and selfing. The crossing and design matrices for our base model (without overlay) are shown below:

Once the LMM is fitted based on the above matrices (as we did with our previous object modP), we will have a total of four GCAs for females and three for males. If we count the number of ones for each effect, we find that they have between one and three pieces of information to be used in their estimation.

Now, if we proceed to overlay the female and male incidence matrices, we obtain the following results:

You can observe that there is a single design matrix with only four columns: one for each parent. In this case, we have more pieces of information to estimate each effect, ranging from three to six. Of special interest is the effect of parent P3: it now uses four pieces of information to estimate its single GCA, whereas before it was being estimated as two effects, each with two pieces of information.

There are other interesting things in the above matrix, such as the value 2 associated with individual H, which originated from the cross P4 × P4, that is, a selfing of parent P4. Also, individuals D and F are full-sibs originating from reciprocal crosses, but in this case there is no distinction between their parental effects. Both selfing and reciprocal crosses can be considered in the fitted linear mixed model as extensions, with the aim of evaluating their importance or significance.
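The overlay itself can be sketched in a few lines of base R. The crosses below are made up so that they show the features just described (a selfing of P4 for individual H, reciprocal crosses for D and F, and two records each as female and male for P3); they are not claimed to match the original example cross by cross, and the object names are illustrative.

```r
# Made-up crosses (mother x father) for eight individuals A to H
toy <- data.frame(
  id     = LETTERS[1:8],
  mother = c("P1", "P1", "P1", "P3", "P2", "P4", "P3", "P4"),
  father = c("P2", "P3", "P4", "P4", "P4", "P3", "P2", "P4"),
  stringsAsFactors = FALSE
)
parents <- sort(union(toy$mother, toy$father))   # one column per parent

# Incidence (design) matrices on a common set of columns
Zf <- 1 * outer(toy$mother, parents, "==")
Zm <- 1 * outer(toy$father, parents, "==")
dimnames(Zf) <- dimnames(Zm) <- list(toy$id, parents)

# Overlay: add the male incidence matrix onto the female one
Z <- Zf + Zm

Z["H", ]      # the selfing P4 x P4 gives a 2 in the P4 column
colSums(Z)    # pieces of information per parent after the overlay
```

Every row of the overlaid matrix sums to 2, since each individual has exactly one mother and one father, and the rows for D and F are identical because the overlay does not distinguish the direction of a reciprocal cross.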

Finally, we can observe our results from fitting the above overlay model to our loblolly pine data. The estimated variance components are:

                 component std.error   z.ratio bound %ch
vm(mother, ainv) 0.4000453 0.1162417  3.441494     P   0
units!units      1.8599439 0.0917484 20.272218     P   0
units!R          1.0000000        NA        NA     F   0

In this particular case, the variance associated with the factor father is missing from the output. However, this factor is still part of the model. As the matrices were overlaid, the output shows a single parental effect, now called vm(mother, ainv), which represents both male and female parents. Also, note that we dropped the term associated with family, as in the previous analysis its variance was bound at zero. The log-likelihood of this model is greater than the one obtained from the original model, indicating a better fit and probably a significant improvement.
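Because the two models share the same fixed effects, their REML log-likelihoods can be compared, and one common way to weigh fit against the number of variance parameters is an information criterion. The sketch below computes AIC from a log-likelihood and a parameter count. The helper name and the numeric values are placeholders chosen for illustration, not the values obtained for the loblolly pine data, and AIC is offered here as one possible option rather than the comparison used in this article.

```r
# Hypothetical helper: AIC from a REML log-likelihood and the number of
# estimated variance parameters
reml_aic <- function(loglik, n_var) {
  -2 * loglik + 2 * n_var
}

# Placeholder log-likelihoods (NOT the values from the pine analysis)
aic_P <- reml_aic(loglik = -500.0, n_var = 4)   # mother, father, family, units
aic_O <- reml_aic(loglik = -498.5, n_var = 2)   # overlaid parent, units

aic_O < aic_P   # with these placeholders, the overlay model has the smaller AIC
```

A model with both a greater log-likelihood and fewer variance parameters, as described above for the overlay model, will always have the smaller AIC.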

We can now proceed to obtain our estimate of heritability using the expression:

vpredict(modO,h2~4*V1/(2*V1+V2))

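which is the traditional formula, $h^2 = 4\sigma^2_p / (2\sigma^2_p + \sigma^2_e)$, where $\sigma^2_p$ is the single parental variance (V1) and $\sigma^2_e$ is the residual variance (V2). Note that in the denominator we count the parental variance twice, as the single component stands for both the female and the male contribution. The results from the above code are presented below: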

   Estimate        SE
h2 0.601564 0.1248922
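The same estimate can be checked by hand from the variance components of the overlay model. This is a minimal base-R sketch with the values copied from the table above; the object names are illustrative.

```r
# Variance components of the overlay model, copied from the table above
V1 <- 0.4000453   # single parental (GCA) variance, vm(mother, ainv)
V2 <- 1.8599439   # residual variance, units!units

# Each tree receives one GCA from its mother and one from its father,
# so the parental variance enters the phenotypic variance twice
h2_overlay <- 4 * V1 / (2 * V1 + V2)
round(h2_overlay, 4)
```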

As expected, these results differ from those of our previously fitted model because of the way this model combines the available information. There is an increase in the heritability when we overlay the factors mother and father compared to when we do not (0.602 against 0.521). This increase is expected as we now have more information to estimate each of the effects. Also, we expect a reduction in the standard errors of the estimated GCA values, which are presented below:

                         solution std.error    z.ratio
vm(mother, ainv)_14006  0.5174180 0.2116542  2.4446388
vm(mother, ainv)_14046 -0.5055163 0.5609615 -0.9011606
vm(mother, ainv)_14060 -0.2843599 0.5668751 -0.5016271
vm(mother, ainv)_14070 -0.3430721 0.3366609 -1.0190434
vm(mother, ainv)_14104 -0.4849910 0.4781210 -1.0143688
vm(mother, ainv)_14106 -0.5055163 0.5609615 -0.9011606

In this case we have a single set of effects, one for each level of the parent. In contrast with our previous model, parent 14006 has a single GCA value (0.517), and its standard error is smaller than before (0.212 compared to 0.284 as a mother and 0.283 as a father). Some GCA estimates will increase or decrease, and their precision might get better or worse depending on how information is combined.

In this example, we have explored a parental model where the effects of male and female were overlaid. This resulted, according to the log-likelihood value, in a statistically superior model that combines the data better. But more importantly, this model is more biologically accurate given our knowledge of monoecy of the species under study.

Author

Salvador A. Gezan

References

Resende, MFR, Munoz, P, Resende, MDV, Garrick, DJ, Fernando, RL, Davis, JM, Jokela, EJ, Martin, TA, Peter, GF and Kirst, M. (2012). Accuracy of genomic selection methods in a standard data set of loblolly pine (Pinus taeda L.). Genetics 190:1503-1510.

File to download

EQ_LOBLOLLY

Notes: SAG August-2020
