AnGenMap

Archived Post

From listmasteranimalgenome.org  Fri Jan  7 15:30:49 2022
Return-Path: <listmasteranimalgenome.org>
From: "Reecy, James M [VPR]" <jreecyiastate.edu>
Postmaster: submission approved by list moderator
To: Members of AnGenMap <angenmapanimalgenome.org>
Subject: NRSP-8 Bioinformatics Annual Report
Date: Fri, 07 Jan 2022 15:30:49 -0600

NRSP-8 BIOINFORMATICS COORDINATION PROGRAM 2021 ACTIVITIES
Supported by Regional Research Funds, Hatch Act

James Reecy, James Koltes, and Fiona McCarthy, Joint Coordinators

OVERVIEW: Coordination of the NIFA National Animal Genome Research Program's
(NAGRP) Bioinformatics is primarily based at, and led from, Iowa State
University (ISU), with additional activities at the University of Arizona
(UA), and is supported by NRSP-8. The NAGRP is made up of the membership of
the Animal Genome Technical Committee, including the Bioinformatic
Subcommittee.

FACILITIES AND PERSONNEL: James Reecy (ISU), James Koltes (ISU), and Fiona
McCarthy (UA) serve as Co-Coordinators. Iowa State University and the
University of Arizona provide facilities and support.

OBJECTIVES: The NRSP-8 project was renewed as of 10/01/18, with the following
objectives: 1. Advance the quality of reference genomes for all agri-animal
species by providing high contiguity assemblies, deep functional annotations
of these assemblies, and comparison across species to understand structure
and function of animal genomes; 2. Advance genome-to-phenome prediction by
implementing strategies and tools to identify and validate genes and allelic
variants predictive of biologically and economically important phenotypes and
traits; and 3. Advance analysis, curation, storage, application, and reuse of
heterogeneous big data to facilitate genome-to-phenome research in animal
species of agricultural interest.

PROGRESS TOWARD OBJECTIVE 1: Advance the quality of reference genomes for all
agri-animal species by providing high contiguity assemblies, deep functional
annotations of these assemblies, and comparison across species to understand
structure and function of animal genomes.

PROGRESS TOWARD OBJECTIVE 2: Advance genome-to-phenome prediction by
implementing strategies and tools to identify and validate genes and allelic
variants predictive of biologically and economically important phenotypes and
traits.

PROGRESS TOWARD OBJECTIVE 3: Advance analysis, curation, storage,
application, and reuse of heterogeneous big data to facilitate
genome-to-phenome research in animal species of agricultural interest. The
following describes the project's activities over this past year.

Multi-species support

The Animal QTLdb, CorrDB, NAGRP Bioinformatics Tools, and the NAGRP data
repository have been actively supporting the research activities for multiple
species. The QTLdb has been accommodating active curation of QTL/association
data for seven species (cattle, catfish, chicken, horse, pig, rainbow trout,
and sheep). In 2021, 24,178 new QTL/association records were curated into
the database, bringing the total to 235,970 curated QTL/associations.
Currently, there are 34,342 curated porcine QTL, 177,199
curated bovine QTL, 16,217 curated chicken QTL, 2,605 curated horse QTL,
6,072 curated sheep QTL, and 1,413 curated rainbow trout QTL in the database
(https://www.animalgenome.org/QTLdb/). An additional 3,068 correlations
(increase by species: cattle: 1,554; chicken: 208; goat: 311; pig: 846;
sheep: 149) and 1,237 heritability data (increase by species: cattle: 315;
chicken: 32; goat: 2; pig: 843; sheep: 45) were curated into the Animal
CorrDB in 2021. The database currently holds 24,104 correlation data on 874
traits and 4,319 heritability data on 1,075 traits in 6 livestock species
(these totals reflect the removal of 1,241 data that were recalled as part
of data quality control efforts).
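
As a quick consistency check on the CorrDB figures above, the per-species
increases can be summed and compared with the stated 2021 totals. This is
only a minimal sketch: the numbers are copied from the paragraph above, and
the variable names are illustrative assumptions, not part of any NAGRP tool.

```python
# Per-species 2021 increases, copied from the report text above.
corr_increase = {"cattle": 1554, "chicken": 208, "goat": 311, "pig": 846, "sheep": 149}
herit_increase = {"cattle": 315, "chicken": 32, "goat": 2, "pig": 843, "sheep": 45}

# Totals as stated in the report.
reported_corr = 3068
reported_herit = 1237

# Sum the per-species counts and compare with the stated totals.
corr_total = sum(corr_increase.values())
herit_total = sum(herit_increase.values())

print("correlations:", corr_total, "matches report:", corr_total == reported_corr)
print("heritabilities:", herit_total, "matches report:", herit_total == reported_herit)
```

Both sums agree with the totals given in the text.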

A new livestock SNP ID/name matching data repository and search tool has
been added to the NAGRP Bioinformatics Tools. This data collection includes
7,856,530 known SNP IDs/names and 5,055,768 SNP 'rs' to 'ss' ID matches
contributed by 10 labs/research groups, and these data are not found in any
other public SNP data resources. We continue to welcome such SNP data
contributions to this repository.

Ontology development

We have developed a hierarchy display tool to facilitate expanding and
exploring the Vertebrate Trait (VT) Ontology, Livestock Product Trait (LPT)
Ontology, Clinical Measurement Ontology (CMO), and other ontology
hierarchies. This tool has been implemented as part of the web portals for
Animal QTLdb, VT, LPT, and CMO project websites.
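
To illustrate what a hierarchy display involves, the following minimal
sketch renders parent/child term pairs as an indented tree. It is not the
tool described above: the function name render_hierarchy is an assumption,
the term names are hypothetical placeholders rather than actual VT, LPT, or
CMO entries, and the sketch assumes the hierarchy contains no cycles.

```python
from collections import defaultdict


def render_hierarchy(edges, root, indent="  "):
    """Return the hierarchy under `root` as a list of indented lines.

    `edges` is an iterable of (parent, child) term-name pairs.
    A term with several parents is listed under each of them.
    Assumes the hierarchy is acyclic.
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    lines = []

    def walk(term, depth):
        # Emit the term, then its children in alphabetical order.
        lines.append(indent * depth + term)
        for child in sorted(children[term]):
            walk(child, depth + 1)

    walk(root, 0)
    return lines


# Hypothetical example terms, for illustration only.
example_edges = [
    ("trait", "growth trait"),
    ("trait", "body trait"),
    ("growth trait", "body weight gain"),
]
print("\n".join(render_hierarchy(example_edges, "trait")))
```

An interactive web display would add expand/collapse behavior on top of a
traversal like this one.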

This past year we continued to focus on the integration of the Animal Trait
Ontology into the Vertebrate Trait Ontology
(http://bioportal.bioontology.org/ontologies/VT). Fifteen (15) dataset
updates were released to the public throughout 2021. We have continued
working with the Rat Genome Database to integrate ATO terms that are not
applicable to the Vertebrate Trait Ontology into the Clinical Measurement
Ontology (http://bioportal.bioontology.org/ontologies/CMO). Traits specific
to livestock products continue to be incorporated into a Livestock Product
Trait Ontology (LPT), which is available on NCBO's BioPortal
(http://bioportal.bioontology.org/ontologies/LPT). Three (3) LPT updates
were released during 2021. Seven (7) updates of the Livestock Breed Ontology
(LBO; https://www.animalgenome.org/...ioinfo/projects/lbo/) were made. We
have also continued mapping the cattle, pig, chicken, sheep, and horse QTL
traits to the Vertebrate Trait Ontology (VT), LPT, and Clinical Measurement
Ontology (CMO) to help standardize the trait nomenclature used in the QTLdb.
A semi-automated data release pipeline was developed to minimize the manual
steps involved in new data upload and version release to BioPortal.ORG and
GitHub, with AnimalGenome.ORG as a new data sync hub. The VT data download is
available through the GitHub portal
(https://github.com/...t-ontology), where users can
automate their data updates. Anyone interested in helping to improve the
ATO/VT is encouraged to contact James Reecy (jreecyiastate.edu), Cari Park
(cariparkiastate.edu), or Zhiliang Hu (zhuiastate.edu). The VT/LPT/CMO
cross-mapping has been put to good use by the Animal QTLdb, CorrDB, and VCMap
tools. Annotation to the VT is also available for rat QTL data in the Rat
Genome Database and for mouse strain measurements in the Mouse Phenome
Database. We have also continued to integrate information from multiple
resources, e.g. FAO - International Domestic Livestock Resources Information,
Oklahoma State University - Breeds of Livestock web site, and Wikipedia, as
well as requests from community members.

Expanded Animal QTLdb functionality

We made efforts to enable the support of multiple genome builds for all
livestock species by creating a pipeline lifting SNPs between different
assemblies using SAMtools, BEDtools, BWA, and locally developed Perl scripts.
All curated QTL/association data continue to be automatically ported to NCBI,
Ensembl, UCSC genome browser, and Reuters Data Citation Index in a timely
fashion. Users can fully utilize the browser and data mining tools at NCBI,
Ensembl, and UCSC to explore animal QTL/association data. Efforts were
continually made, working with our counterparts at these institutions, to
eliminate any glitches that arose during the automated or semi-automated data
porting process. In addition, we have continued to improve existing and add
new QTLdb curation tools and user portal tools. Other improvements included
the standardization of data links across species for external databases
(db_xref) for both QTLdb and CorrDB; improved editor/curator tools to aid SNP
name/ID look up and batch annotation for QTL/association data curation; and
further improvements to eQTL data display and batch annotation. Further
improvements and development are ongoing.
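
The SNP lifting between assemblies mentioned at the start of this section
can be illustrated with a toy example. One common approach, and only an
assumption here since the report does not describe the pipeline's
internals, is to take the sequence flanking a SNP on the old assembly,
locate it on the new assembly, and derive the new coordinate from the match.
The sketch below substitutes exact string matching for a real aligner such
as BWA and does not reproduce the actual SAMtools/BEDtools/BWA/Perl
pipeline; the function name lift_snp and the sequences are hypothetical.

```python
def lift_snp(old_seq, new_seq, pos, flank=5):
    """Lift a 0-based SNP position from old_seq to new_seq.

    Uses exact matching of the flanking sequence as a stand-in for
    alignment. Returns the new 0-based position, or None when the
    flank is missing or occurs more than once in new_seq.
    """
    start = max(0, pos - flank)
    end = min(len(old_seq), pos + flank + 1)
    probe = old_seq[start:end]   # SNP base plus its flanks
    offset = pos - start         # SNP position within the probe

    first = new_seq.find(probe)
    if first == -1:
        return None              # flank not found on the new assembly
    if new_seq.find(probe, first + 1) != -1:
        return None              # ambiguous: flank maps to several places
    return first + offset


# Hypothetical sequences: the "new assembly" gains three bases up front.
old_assembly = "ACGTTGCAAGGCTTAC"
new_assembly = "TTT" + old_assembly
print(lift_snp(old_assembly, new_assembly, 8))
```

Returning None for missing or ambiguous matches mirrors the usual practice
of discarding SNPs that cannot be placed uniquely on the new build.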

Further developments of Animal Trait Correlation Database (CorrDB)

Our efforts to overhaul and re-develop functionality within CorrDB are
ongoing. We continued to strengthen the data quality control procedures to
help improve data quality. One new outcome is a redesigned front page that
lets users access data by species more easily. Internally,
standardization of program configurations for parameters and functions will
help to streamline future tool development and debugging efforts. The CorrDB
efforts continue to feature co-development with the QTLdb for shared use of
resources and tools, such as trait ontology development and management,
literature management, breed ontology management, and bug reporting tools for
improved data quality control. The improved CorrDB curator tools are
publicly available: any user can register for an account and curate
correlation data. As reported in earlier sections, in 2021, correlation data
and heritability data continued to be curated. The public web portals
continue to undergo improvement.

Facilitating research

The Data Repository for the aquaculture, cattle, chicken, horse, pig, and
sheep communities to share their genome analysis data has proven to be very
useful and has been actively used (https://www.animalgenome.org/repository).
While new data is continually being curated, we have gradually scaled down
the support for hosting supplementary files for publications for more
sensible use of the NRSP8 bioinformatics funds. We have redirected the
community to a better data repository resource (Open Science Framework, OSF,
https://osf.io/) for better long-term data security. In 2021, the data
currently in the supplementary data repository was prepared for transfer to
OSF in the coming months. Appropriate web visit forwarding will be set up on
the current site to redirect to the new URL.

The data downloads from the repository generated over 2.05 TB of data traffic
in 2021. Throughout the year, over 62 cases were handled through our helpdesk
at AnimalGenome.ORG to help users with inquiries/requests for services
affecting community research activities and the use of our services. The
assistance provided ranged from data transfer and hosting, data deposition,
data curation, web presentation, and data analysis to software applications,
code development, and advice on tool development.

Community support and user services at AnimalGenome.ORG

We have been maintaining and actively updating the NRSP-8 species web pages
for each of the six NRSP-8 species. We continue to host mailing
lists/websites for various research groups in the NAGRP community
(https://www.animalgenome.org/community/). These include groups such as
AnGenMap, the FAANG international consortium working groups, and CRI-MAP
users, as well as new meetings and user bulletin boards to facilitate those
meetings, among other user forums.

The Functional Annotation of ANimal Genomes (FAANG, https://www.faang.org/)
website has been continually developed and maintained to actively support the
FAANG activities. The FAANG site serves not only as a FAANG-related
information hub, but also as a platform for this international consortium's
communication, collaboration, organization, and interaction. It serves over
760 members and 12 working groups and sub-groups, with 14 listserv mailing
lists, a bulletin board, a database, and tools for membership and working
group management. The actively hosted materials include meeting minutes,
tools/protocols for FAANG activities, incorporation and use of the data
portal hosted at EBI, presentation slides, and video recordings of scientific
meetings and related events, all interactively available to members through
the web
portal.

Site maintenance

We have further consolidated services and developmental platforms to the
current Dual Quad Core Xeon Linux server. Efforts were made to improve data
backup, security, and availability. This was accomplished by better use of
the resources for shared workloads, better data security and network
security, and improved protocols for data backup, management, and
inventories.

Reaching out

We have been sending periodic updates to more than 3,000 users worldwide
(https://www.animalgenome.org/community/angenmap/) to inform the animal
genomics research community of the news and updates regarding
AnimalGenome.org. "What's New on AnimalGenome.ORG web site" emails were
sent out 3 times in 2021, consistent with the pace/pattern of the past 17
years (https://www.animalgenome.org/bioinfo/updates/).

PLANS FOR THE FUTURE

OBJECTIVE 1. Advance the quality of reference genomes for all agri-animal
species by providing high contiguity assemblies, deep functional
annotations of these assemblies, and comparison across species to
understand structure and function of animal genomes.

We will continue to analyze "omics" data to help better annotate livestock
genomes.

OBJECTIVE 2. Advance genome-to-phenome prediction by implementing
strategies and tools to identify and validate genes and allelic variants
predictive of biologically and economically important phenotypes and
traits.

OBJECTIVE 3. Advance analysis, curation, storage, application, and reuse of
heterogeneous big data to facilitate genome-to-phenome research in animal
species of agricultural interest.

We will continue to work with bovine, mouse, rat, and human QTL database
curators to develop minimal information for publication standards. We will
also work with these same database groups to improve phenotype and
measurement ontologies, which will facilitate transfer of QTL information
across species. We will continue working with U.S. and European colleagues
to develop a Bioinformatics Blueprint, similar to the Animal Genomics
Blueprint recently published by USDA-NIFA, to help direct future
livestock-oriented bioinformatic/database efforts.

Publications:

Hu, Zhi-Liang, Carissa A. Park, and James M. Reecy (2022). Bringing the
Animal QTLdb and CorrDB into the future: meeting new challenges and
providing updated services. Nucleic Acids Research, Volume 50, Issue D1,
Pages D956-D961. DOI: 10.1093/nar/gkab1116


 

 
