
Comparative genomics

Article Id: WHEBN0000917868

Title: Comparative genomics  
Author: World Heritage Encyclopedia
Language: English
Subject: Model organism, Cognitive genomics, Genomics, Pathogenomics, MicrobesOnline
Collection: Evolutionary Biology, Genomics
Publisher: World Heritage Encyclopedia

Comparative genomics

Whole genome alignment is a typical method in comparative genomics. This alignment of eight Yersinia bacteria genomes reveals 78 locally collinear blocks conserved among all eight taxa. Each chromosome has been laid out horizontally and homologous blocks in each genome are shown as identically colored regions linked across genomes. Regions that are inverted relative to Y. pestis KIM are shifted below a genome's center axis.[1]

Comparative genomics is a field of biological research in which the genomic features of different organisms are compared. Its major principle is that features shared by two organisms are often encoded within DNA that is evolutionarily conserved between them.[6] Therefore, comparative genomic approaches start by making some form of alignment of genome sequences, looking for orthologous sequences (sequences that share a common ancestry) in the aligned genomes, and checking to what extent those sequences are conserved. From these results, genome and molecular evolution are inferred, and this may in turn be put in the context of, for example, phenotypic evolution or population genetics.[7]
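As a rough illustration of "checking to what extent those sequences are conserved", the sketch below computes percent identity over two sequences that are assumed to be already aligned (gaps written as "-"). The function name and the toy sequences are hypothetical and do not come from any particular tool.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free alignment columns in which both sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy alignment: 9 gap-free columns, 8 of them identical.
seq_a = "ATGC-TACGT"
seq_b = "ATGCATTCGT"
print(round(percent_identity(seq_a, seq_b), 1))  # prints 88.9
```

Real pipelines compute conservation over whole-genome alignments and usually weight substitutions; plain identity is only the simplest possible measure.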

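The field effectively began as soon as the whole genomes of two organisms became available (that is, the genomes of the bacteria Haemophilus influenzae and Mycoplasma genitalium, both published in 1995). Comparisons have since revealed similarity even between seemingly distantly related organisms, such as humans and the yeast Saccharomyces cerevisiae.[4] They have also shown the extreme diversity of gene composition in different evolutionary lineages.[8]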


  • History
  • Evolutionary principles
  • Methods
  • Tools
  • Applications
    • Agriculture
    • Medicine
    • Research
  • See also
  • References
    • Further reading
  • External links


History

See also: History of genomics

Comparative genomics has its roots in the comparison of virus genomes in the early 1980s.[8] For example, small RNA viruses infecting animals (picornaviruses) and those infecting plants (cowpea mosaic virus) were compared and turned out to share significant sequence similarity and, in part, the order of their genes.[10] In 1986, the first comparative genomic study at a larger scale was published, comparing the genomes of varicella-zoster virus and Epstein-Barr virus, which contain more than 100 genes each.[11]

The first complete genome sequence of a cellular organism, that of Haemophilus influenzae Rd, was published in 1995.[12] The second genome sequencing paper was of the small parasitic bacterium Mycoplasma genitalium published in the same year.[13] Starting from this paper, reports on new genomes inevitably became comparative-genomic studies.[8]

The first high-resolution whole-genome comparison system was developed in 1998 by Art Delcher, Simon Kasif and Steven Salzberg and, with their collaborators at the Institute for Genomic Research (TIGR), applied to the comparison of entire genomes of closely related microbial organisms. The system, called MUMmer, was described in a publication in Nucleic Acids Research in 1999. It helps researchers identify large rearrangements, single-base mutations, reversals, tandem repeat expansions and other polymorphisms. In bacteria, MUMmer enables the identification of polymorphisms that are responsible for virulence, pathogenicity, and antibiotic resistance. The system was also applied to the Minimal Organism Project at TIGR and subsequently to many other comparative genomics projects.

Saccharomyces cerevisiae, the baker's yeast, was the first eukaryote to have its complete genome sequence published, in 1996.[14] After the publication of the genomes of the roundworm Caenorhabditis elegans in 1998[15] and the fruit fly Drosophila melanogaster in 2000,[16] Gerald M. Rubin and his team published a paper titled "Comparative Genomics of the Eukaryotes", in which they compared the genomes of the eukaryotes D. melanogaster, C. elegans, and S. cerevisiae, as well as the prokaryote H. influenzae.[17] At the same time, Bonnie Berger, Eric Lander, and their team published a paper on whole-genome comparison of human and mouse.[18]

With the publication of the large genomes of vertebrates in the 2000s, including human, the Japanese pufferfish Takifugu rubripes, and mouse, precomputed results of large genome comparisons have been released for downloading or for visualization in a genome browser. Instead of undertaking their own analyses, most biologists can access these large cross-species comparisons and avoid the impracticality caused by the size of the genomes.[19]

Next-generation sequencing methods, which were first introduced in 2007, have produced an enormous amount of genomic data and have allowed researchers to generate multiple (prokaryotic) draft genome sequences at once. These methods can also quickly uncover single-nucleotide polymorphisms, insertions and deletions by mapping unassembled reads against a well annotated reference genome, and thus provide a list of possible gene differences that may be the basis for any functional variation among strains.[9]
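The read-mapping idea can be sketched in a few lines. The example below assumes the reads have already been mapped without gaps to known start positions on the reference, which real aligners do not guarantee; the function name, thresholds and toy data are illustrative assumptions, not part of any published pipeline.

```python
from collections import Counter


def call_snps(reference, mapped_reads, min_depth=2, min_fraction=0.8):
    """Return {position: (ref_base, alt_base)} for well-supported mismatches.

    mapped_reads is an iterable of (start, read_sequence) pairs with
    0-based start positions and ungapped placement (a simplifying assumption).
    """
    # Pile up the bases observed at every reference position.
    pileup = [Counter() for _ in reference]
    for start, read in mapped_reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = {}
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        base, count = counts.most_common(1)[0]
        if base != reference[pos] and count / depth >= min_fraction:
            snps[pos] = (reference[pos], base)
    return snps


# Hypothetical toy data: three reads all support T instead of A at position 4.
reference = "ACGTACGTAC"
reads = [(0, "ACGTTCGT"), (2, "GTTCGTAC"), (3, "TTCGTA")]
print(call_snps(reference, reads))  # prints {4: ('A', 'T')}
```

Insertions and deletions need gapped alignments and are not handled by this sketch.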

Evolutionary principles

Evolution is a defining feature of biology, and evolutionary theory is the theoretical foundation of comparative genomics; in turn, the results of comparative genomics have greatly enriched and developed the theory of evolution. Comparing two or more genome sequences amounts, in essence, to inferring the evolutionary relationships of those sequences in a phylogenetic tree. The growing volume of genome data makes it possible to study molecular evolution and gene function at the genome level. Drawing on genome data from a variety of organisms and on the study of vertical and horizontal evolutionary processes, researchers can identify the parts of gene structure and regulation that are vital for life. However, about 1.5% to 14.5% of the genes in a genome are affected by lateral (horizontal) gene transfer, that is, the transfer of genes between populations that exist at the same time, so some sequence differences do not reflect descent. Phylogenetic analysis therefore needs a relatively complete evolutionary model in order to limit the influence of gene transfer and of the lack of suitable species for identifying conserved sequences.

Similarity of related genomes is the basis of comparative genomics. If two organisms share a recent common ancestor, the differences between their genomes evolved from the ancestral genome, and the closer the two organisms are in evolutionary terms, the more similar their genomes are. Closely related genomes exhibit collinearity (synteny): some or all of their genes are conserved in the same order. Researchers can therefore use homology in sequence and coding structure between the genomes of model organisms, together with known genome mapping information, to locate genes in other genomes, reveal the potential functions of those genes, and clarify evolutionary relationships and the internal structure of the genome.
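Conserved gene order can be illustrated with a small sketch. Assuming each genome is given as a list of gene identifiers that are already matched one-to-one between the two genomes (a strong simplification), the hypothetical function below reports runs of genes that are adjacent in both genomes, in the same or in inverted order.

```python
def collinear_blocks(order_a, order_b, min_genes=2):
    """Find runs of genes that are adjacent in both genomes.

    Returns a list of (genes, orientation) tuples, where orientation is
    "+" for the same order and "-" for an inverted run.
    Assumes each gene identifier occurs at most once per genome.
    """
    index_b = {gene: i for i, gene in enumerate(order_b)}
    blocks = []
    i = 0
    n = len(order_a)
    while i < n:
        if order_a[i] not in index_b:
            i += 1  # gene absent from genome B: cannot start a block
            continue
        j = i
        step = 0  # +1 or -1 once the block's direction is known
        while j + 1 < n and order_a[j + 1] in index_b:
            delta = index_b[order_a[j + 1]] - index_b[order_a[j]]
            if delta not in (1, -1) or (step and delta != step):
                break  # not adjacent in B, or direction changed
            step = delta
            j += 1
        if j - i + 1 >= min_genes:
            orientation = "-" if step == -1 else "+"
            blocks.append((order_a[i:j + 1], orientation))
        i = j + 1
    return blocks


# Hypothetical gene orders: genes g4..g6 are inverted in genome B.
genome_a = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
genome_b = ["g1", "g2", "g3", "g6", "g5", "g4", "g7"]
for genes, orientation in collinear_blocks(genome_a, genome_b):
    print(orientation, genes)
```

Production tools work on sequence anchors rather than clean gene lists and tolerate gaps inside a block; this sketch requires strict adjacency.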

Orthologous sequences are sequences separated by speciation: a gene exists in an ancestral species, the species splits into two, and the copies of the gene in the two new species are orthologous. Paralogous sequences are separated by gene duplication: if a gene is duplicated within a genome, the two copies are paralogous. A pair of orthologous sequences is called orthologs, and a pair of paralogous sequences is called paralogs. Orthologs usually have the same or similar function, but this is not necessarily true of paralogs: because selective pressure on one copy of a duplicated gene is relaxed, that copy is free to accumulate mutations and acquire new functions.
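One widely used heuristic for proposing ortholog pairs, not described in this article, is the reciprocal best hit: two genes are paired when each is the other's highest-scoring match in the other genome. The sketch below assumes similarity scores have already been computed by some search tool; the function name, gene names and scores are made up for illustration.

```python
def reciprocal_best_hits(scores):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit.

    scores maps (gene_in_genome_a, gene_in_genome_b) to a similarity score,
    where a higher score means more similar.
    """
    best_for_a = {}  # gene in A -> (best gene in B, score)
    best_for_b = {}  # gene in B -> (best gene in A, score)
    for (gene_a, gene_b), score in scores.items():
        if gene_a not in best_for_a or score > best_for_a[gene_a][1]:
            best_for_a[gene_a] = (gene_b, score)
        if gene_b not in best_for_b or score > best_for_b[gene_b][1]:
            best_for_b[gene_b] = (gene_a, score)
    return sorted(
        (gene_a, gene_b)
        for gene_a, (gene_b, _) in best_for_a.items()
        if best_for_b[gene_b][0] == gene_a
    )


# Hypothetical scores: a1 and a2 both hit b1, but only a1 is b1's best hit,
# so a2 is left unpaired (it could be a paralog of a1).
scores = {
    ("a1", "b1"): 95,
    ("a2", "b1"): 80,
    ("a3", "b2"): 90,
    ("a3", "b1"): 30,
}
print(reciprocal_best_hits(scores))  # prints [('a1', 'b1'), ('a3', 'b2')]
```

Reciprocal best hits can miss orthologs after lineage-specific duplications, so the result is a candidate list rather than a definitive assignment.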

The human FOXP2 gene and its evolutionary conservation, shown with a multiple alignment (at bottom of figure) in an image from the UCSC Genome Browser. Note that conservation tends to cluster around coding regions (exons).

Comparative genomics exploits both similarities and differences in the proteins, RNA, and regulatory regions of different organisms to infer how selection has acted upon these elements. Those elements that are responsible for similarities between different species should be conserved through time (stabilizing selection), while those elements responsible for differences among species should be divergent (positive selection). Finally, those elements that are unimportant to the evolutionary success of the organism will be unconserved (selection is neutral).

One of the important goals of the field is the identification of the mechanisms of eukaryotic genome evolution. This is, however, often complicated by the multiplicity of events that have taken place throughout the history of individual lineages, leaving only distorted and superimposed traces in the genome of each living organism. For this reason, comparative genomics studies of small model organisms (for example, Caenorhabditis elegans and the closely related Caenorhabditis briggsae) are of great importance to advance our understanding of general mechanisms of evolution.[20][21]


Methods

Computational approaches to genome comparison have recently become a common research topic in computer science. A public collection of case studies and demonstrations is growing, ranging from whole genome comparisons to gene expression analysis.[22] This has encouraged the introduction of ideas from other areas, including concepts from systems and control, information theory, string analysis and data mining.[23] It is anticipated that computational approaches will become and remain a standard topic for research and teaching, while multiple courses will begin training students to be fluent in both topics.[24]
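As one simple example of a string-analysis approach, two sequences can be compared without any alignment by looking at the short substrings (k-mers) they share. The sketch below uses the Jaccard index over k-mer sets; it is a generic illustration with hypothetical function names and is not the method of the work cited above.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_similarity(seq_a, seq_b, k=3):
    """Shared k-mers divided by all distinct k-mers seen in either sequence."""
    a = kmers(seq_a, k)
    b = kmers(seq_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy sequences: with k=2 they share CG and GT out of
# four distinct 2-mers (AC, CG, GT, TA).
print(jaccard_similarity("ACGT", "CGTA", k=2))  # prints 0.5
```

For genome-scale data, k is typically much larger and the k-mer sets are usually subsampled or sketched rather than stored in full.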


Tools

Computational tools for analyzing sequences and complete genomes are being developed quickly, driven by the availability of large amounts of genomic data. Comparative analysis tools are advancing at the same time. Among the challenges these analyses pose, visualizing the comparative results is particularly important.[25]

Visualizing sequence conservation is a difficult task in comparative sequence analysis, because examining alignments of long genomic regions manually is highly inefficient. Internet-based genome browsers provide many useful tools for investigating genomic sequences by integrating the sequence-based biological information available for a genomic region. They make extracting large amounts of relevant biological data easy and less time-consuming.[25]

  • UCSC Browser: This site contains the reference sequence and working draft assemblies for a large collection of genomes.[26]
  • Ensembl: The Ensembl project produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online.[27]
  • MapView: The Map Viewer provides a wide variety of genome mapping and sequencing data.[28]
  • VISTA: A comprehensive suite of programs and databases for comparative analysis of genomic sequences. It was built to visualize the results of comparative analysis based on DNA alignments. The presentation of comparative data generated by VISTA suits both small-scale and large-scale data.[29]

An advantage of online tools is that these websites are developed and updated constantly. New settings and content become available online and can be used to improve efficiency.[25]



Applications

Agriculture

Agriculture is a field that reaps the benefits of comparative genomics. Identifying the loci of advantageous genes is a key step in breeding crops that are optimized for greater yield, cost-efficiency, quality, and disease resistance. For example, one genome-wide association study conducted on 517 rice landraces revealed 80 loci associated with several categories of agronomic performance, such as grain weight, amylose content, and drought tolerance. Many of the loci were previously uncharacterized.[30] Not only is this methodology powerful, it is also quick. Previous methods of identifying loci associated with agronomic performance required several generations of carefully monitored breeding of parent strains, a time-consuming effort that is unnecessary for comparative genomic studies.[31]


Medicine

The medical field also benefits from the study of comparative genomics. Vaccinology in particular has experienced useful advances in technology due to genomic approaches to problems. In an approach known as

External links

  • Genomes OnLine Database (GOLD)
  • Genome News Network
  • JCVI Comprehensive Microbial Resource
  • Pathema: A Clade Specific Bioinformatics Resource Center
  • CBS Genome Atlas Database
  • The UCSC Genome Browser
  • The U.S. National Human Genome Research Institute
  • Ensembl: The Ensembl Genome Browser
  • Genolevures, comparative genomics of the Hemiascomycetous yeasts
  • Phylogenetically Inferred Groups (PhIGs), a method that incorporates phylogenetic signals in building gene clusters for use in comparative genomics.
  • Metazome, a resource for the phylogenomic exploration and analysis of Metazoan gene families.
  • IMG: The Integrated Microbial Genomes system, for comparative genome analysis by the DOE-JGI.
  • Comparative Genomics Center.
  • SUPERFAMILY: Protein annotations for all completely sequenced organisms
  • Comparative Genomics
  • Blastology and Open Source: Needs and Deeds
  • Alignment-free comparative Genomics tool

Further reading

  • Bergman NH, ed. (2007). Comparative Genomics: Volumes 1 and 2. Totowa (NJ): Humana Press.  
  • Kellis M, Patterson N, Endrizzi M, Birren B, Lander E (2003-05-15). "Sequencing and comparison of yeast species to identify genes and regulatory elements". Nature 423 (6937): 241–254.  
  • Cliften P, Sudarsanam P, Desikan A (2003-07-04). "Finding functional features in Saccharomyces genomes by phylogenetic footprinting". Science 301 (5629): 71–76.  
  • Boffeli D, McAuliffe J, Ovcharenko D, Lewis KD, Ovcharenko I,  
  • Dujon B; et al. (2004-07-01). "Genome evolution in yeasts". Nature 430 (6995): 35–44.  
  • Filipski A, Kumar S (2005). "Comparative genomics in eukaryotes". In T.R. Gregory.  
  • Gregory TR, DeSalle R (2005). "Comparative genomics in prokaryotes". In T.R. Gregory. The Evolution of the Genome. San Diego: Elsevier. pp. 585–675. 
  • Xie X, Lu J, Kulbokas EJ, Golub T, Mootha V, Lindblad-Toh K, Lander E, Kellis M (2005). "Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals". Nature 434 (7031).  
  • Champ PC, Binnewies TT, Nielsen N, Zinman G, Kiil K, Wu H, Bohlin J, Ussery DW (2006). "Genome update: purine strand bias in 280 bacterial chromosomes". Microbiology 152 (3): 579–583.  
  • Kumar L, Breakspear A, Kistler A, Ma L-J, Xie X (2010). "Systematic discovery of regulatory motifs in Fusarium graminearum by comparing four Fusarium genomes". BMC Genomics 11: 208.  
  • Serafim Batzoglou,  

References

  1. ^ Darling AE, Miklós I, Ragan MA (2008). "Dynamics of Genome Rearrangement in Bacterial Populations". PLOS Genetics 4 (7): e1000128.  
  2. ^ a b c Touchman, J. (2010). "Comparative Genomics". Nature Education Knowledge 3 (10): 13. Retrieved 2014-01-02. 
  3. ^ a b Xia, X. (2013). Comparative Genomics. Heidelberg: Springer.  
  4. ^ a b Russell, P.J.; Hertz, P.E.; McMillan, B. (2011). Biology: The Dynamic Science (2nd ed.). Belmont, CA: Brooks/Cole. pp. 409–410. 
  5. ^ Primrose, S.B.; Twyman, R.M. (2003). Principles of Genome Analysis and Genomics (3rd ed.). Malden, MA: Blackwell Publishing. 
  6. ^ Hardison, R.C. (2003). "Comparative genomics". PLoS Biology 1 (2): e58.  
  7. ^ Ellegren, H. (2008). "Comparative genomics and the study of evolution by natural selection". Molecular Ecology 17 (21): 4586–4596.  
  8. ^ a b c d Koonin, E.V.; Galperin, M.Y. (2003). Sequence - Evolution - Function: Computational approaches in comparative genomics. Dordrecht: Springer Science+Business Media. 
  9. ^ a b Hu, B.; Xie, G.; Lo, C.-C.; Starkenburg, S. R.; Chain, P. S. G. (2011). "Pathogen comparative genomics in the next-generation sequencing era: genome alignments, pangenomics and metagenomics". Briefings in Functional Genomics 10 (6): 322–333.  
  10. ^ Argos, P.; Kamer, G.; Nicklin, M.J.; Wimmer, E. (1984). "Similarity in gene organization and homology between proteins of animal picornaviruses and a plant comovirus suggest common ancestry of these virus families". Nucleic Acids Research 12 (18): 7251–7267.  
  11. ^ McGeoch, D.J.; Davison, A.J. (1986). "DNA sequence of the herpes simplex virus type 1 gene encoding glycoprotein gH, and identification of homologues in the genomes of varicella-zoster virus and Epstein-Barr virus". Nucleic Acids Research 14 (10): 4281–4292.  
  12. ^ Fleischmann R, Adams M, White O, Clayton R, Kirkness E, Kerlavage A, Bult C, Tomb J, Dougherty B, Merrick J (1995). "Whole-genome random sequencing and assembly of Haemophilus influenzae Rd". Science 269 (5223): 496–512.  
  13. ^ Fraser, Claire M.; et al. (1995). "The Minimal Gene Complement of Mycoplasma genitalium". Science 270 (5235): 397–404.  
  14. ^ A. Goffeau, B. G. Barrell, H. Bussey, R. W. Davis, B. Dujon, H. Feldmann, F. Galibert, J. D. Hoheisel, C. Jacq, M. Johnston, E. J. Louis, H. W. Mewes, Y. Murakami, P. Philippsen, H. Tettelin & S. G. Oliver (1996). "Life with 6000 genes". Science 274 (5287): 546, 563–567.  
  15. ^ The C. elegans Sequencing Consortium (1998). "Genome sequence of the nematode C. elegans: A platform for investigating biology". Science 282 (5396): 2012–2018.  
  16. ^ Adams MD, Celniker SE, Holt RA; et al. (2000). "The genome sequence of Drosophila melanogaster". Science 287 (5461): 2185–95.  
  17. ^  
  18. ^ Serafim Batzoglou,  
  19. ^ Ureta-Vidal, A.; Ettwiller, L.; Birney, E. (2003). "Comparative genomics: Genome-wide analysis in metazoan eukaryotes". Nature Reviews Genetics 4 (4): 251–262.  
  20. ^ Stein LD; et al. (2003). "The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics". PLoS Biology 1 (2): E45.  
  21. ^ "Newly Sequenced Worm a Boon for Worm Biologists". PLoS Biology 1 (2): e4–e4. 2003.  
  22. ^ Cristianini N and Hahn M (2006). Introduction to Computational Genomics. Cambridge University Press.  
  23. ^ Pratas, D; Silva, R; Pinho, A; Ferreira, P (May 18, 2015). "An alignment-free method to find and visualise rearrangements between pairs of DNA sequences.". Scientific Reports (Nature Publishing Group) 5.  
  24. ^ Via, Allegra; Javier De Las Rivas; Teresa K. Attwood; David Landsman; Michelle D. Brazas; Jack A. M. Leunissen; Anna Tramontano; Maria Victoria Schneider (2011-10-27). "Ten Simple Rules for Developing a Short Bioinformatics Training Course". PLoS Comput Biol 7 (10): e1002245.  
  25. ^ a b c Bergman NH, ed. (2007). Comparative Genomics: Volumes 1 and 2. Totowa (NJ): Humana Press.  
  26. ^ "UCSC Browser". 
  27. ^ "Ensembl Genome Browser". 
  28. ^ "Map Viewer". 
  29. ^ "VISTA tools". 
  30. ^ Huang XH; et al. (2010). "Genome-wide association studies of 14 agronomic traits in rice landraces". Nature Genetics 42 (11): 961-U76.  
  31. ^ Morrell PL, Buckler ES, Ross-Ibarra J (2012). "Crop genomics: advances and applications". Nature Reviews Genetics 13 (2): 85–96.  
  32. ^ Seib KL, Zhao X, Rappuoli R (2012). "Developing vaccines in the era of genomics: a decade of reverse vaccinology". Clinical Microbiology and Infection 18 (SI): 109–116.  
  33. ^ Maione D; et al. (2005). "Identification of a Universal Group B Streptococcus Vaccine by Multiple Genome Screen". Science 309 (5731): 148–150.  
  34. ^ Rasko DA; et al. (2008). "The pangenome structure of Escherichia coli: Comparative genomic analysis of E. coli commensal and pathogenic isolates". Journal of Bacteriology 190 (20): 6881–6893.  
  35. ^ Rodgers J, Gibbs RA (2014). "APPLICATIONS OF NEXT-GENERATION SEQUENCING Comparative primate genomics: emerging patterns of genome content and dynamics". Nature Reviews Genetics 15 (5): 347–359.  
  36. ^ Prado-Martinez J; et al. (2013). "Great ape genetic diversity and population history". Nature 499 (7459): 471–475.  
  37. ^ Zeng J, Konopka G, Hunt BG, Preuss TM, Geschwind D, Yi SV (2012). "Divergent Whole-Genome Methylation Maps of Human and Chimpanzee Brains Reveal Epigenetic Basis of Human Regulatory Evolution". The American Journal of Human Genetics 91 (3): 455–465.  


Research

Comparative genomics also opens up new avenues in other areas of research. As DNA sequencing technology has become more accessible, the number of sequenced genomes has grown. With the increasing reservoir of available genomic data, the potency of comparative genomic inference has grown as well. A notable case of this increased potency is found in recent primate research. Comparative genomic methods have allowed researchers to gather information about genetic variation, differential gene expression, and evolutionary dynamics in primates that was indiscernible using previous data and methods.[35] The Great Ape Genome Project used comparative genomic methods to investigate genetic variation across the six great ape species, finding healthy levels of variation in their gene pool despite shrinking population size.[36] Another study showed that patterns of DNA methylation, a known regulation mechanism for gene expression, differ in the prefrontal cortex of humans versus chimpanzees, and implicated this difference in the evolutionary divergence of the two species.[37]



This article was sourced from Creative Commons Attribution-ShareAlike License; additional terms may apply.