
Chapter 5 Summary

THE COMPLEXITY OF EUKARYOTIC GENOMES

Introns and Exons: Most eukaryotic genes have a split structure in which segments of coding sequence (exons) are interrupted by noncoding sequences (introns). In complex eukaryotes, introns account for more than ten times as much DNA as exons.

Repetitive DNA Sequences: Over 50% of mammalian DNA consists of highly repetitive DNA sequences, some of which are present in 10⁵ to 10⁶ copies per genome. These sequences include simple-sequence repeats as well as repetitive elements that have moved throughout the genome by either RNA or DNA intermediates.
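To see why elements present in 10⁵ to 10⁶ copies make up such a large share of the genome, multiply copy number by element length. The sketch below is a back-of-the-envelope calculation, not data from the chapter: it assumes a haploid genome of about 3.2 billion base pairs and a hypothetical repeat element 300 bp long.

```python
# Assumptions (not from the summary): haploid genome of ~3.2 billion bp
# and a hypothetical repeat element 300 bp long.
GENOME_BP = 3.2e9
ELEMENT_BP = 300


def genome_fraction(copies: float, element_bp: float, genome_bp: float = GENOME_BP) -> float:
    """Fraction of the genome occupied by `copies` of an element `element_bp` long."""
    return copies * element_bp / genome_bp


# Copy-number range quoted in the summary
for copies in (1e5, 1e6):
    pct = 100 * genome_fraction(copies, ELEMENT_BP)
    print(f"{copies:.0e} copies x {ELEMENT_BP} bp = {pct:.1f}% of the genome")
```

Under these assumptions a single element family at 10⁶ copies would occupy roughly 9% of the genome, so a handful of such families can plausibly account for a large part of the repetitive DNA.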

Gene Duplications and Pseudogenes: Many eukaryotic genes belong to gene families, groups of related genes that have arisen by duplication of ancestral genes. Some members of a gene family function in different tissues or at different stages of development. Other members (pseudogenes) have been inactivated by mutations and are no longer functional genes. Gene duplication can occur either by duplication of a segment of DNA or by reverse transcription of an mRNA and integration of the resulting DNA copy into the genome, which gives rise to a processed pseudogene lacking introns. Approximately 5% of the human genome consists of duplicated DNA segments. In addition, there are more than 10,000 processed pseudogenes in the human genome.

The Composition of Higher Eukaryotic Genomes: Only a small fraction of the genome in complex eukaryotes corresponds to protein-coding sequences. The human genome is estimated to contain 20,000 to 25,000 genes, with protein-coding sequence corresponding to only about 1.2% of the DNA. Approximately 20% of the human genome consists of introns, and more than 60% is composed of repetitive and duplicated DNA sequences.
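The percentages above can be turned into absolute amounts of DNA. The sketch below assumes a haploid human genome of about 3.2 billion base pairs (a figure not given in this summary) and uses the 1.2% coding, 20% intron, and 20,000 to 25,000 gene estimates quoted in the text.

```python
# Assumption (not stated in the summary): haploid human genome of ~3.2 billion bp.
GENOME_BP = 3.2e9

CODING_FRACTION = 0.012        # protein-coding sequence, ~1.2% (from the text)
INTRON_FRACTION = 0.20         # introns, ~20% (from the text)
GENE_RANGE = (20_000, 25_000)  # estimated number of genes (from the text)

coding_bp = CODING_FRACTION * GENOME_BP
intron_bp = INTRON_FRACTION * GENOME_BP

print(f"coding: {coding_bp / 1e6:.1f} Mb, introns: {intron_bp / 1e6:.0f} Mb")
print(f"intron : coding ratio = {intron_bp / coding_bp:.1f}")

# Average protein-coding sequence per gene at each end of the estimated range
for genes in GENE_RANGE:
    per_gene = coding_bp / genes
    print(f"{genes} genes -> ~{per_gene:.0f} bp coding per gene (~{per_gene / 3:.0f} codons)")
```

With these inputs, only about 38 Mb of the genome encodes protein, averaging roughly 1500 to 1900 bp (about 500 to 650 codons) per gene. The intron-to-coding ratio of about 17 is higher than the intron-to-exon ratio, because exons also include untranslated regions that are not counted as coding sequence.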

CHROMOSOMES AND CHROMATIN

Chromatin: The DNA of eukaryotic cells is wrapped around histones to form nucleosomes. Chromatin can be further compacted by the folding of nucleosomes into higher-order structures, including the highly condensed metaphase chromosomes of cells undergoing mitosis.


Centromeres: Centromeres are specialized regions of eukaryotic chromosomes that serve as the sites where sister chromatids are joined and the sites of spindle fiber attachment during mitosis. Centromere function is determined by a variant of histone H3 (CENP-A in humans), whose presence at the centromere is maintained epigenetically through cell division.

Telomeres: Telomeres are specialized sequences required to maintain the ends of eukaryotic chromosomes.

THE SEQUENCES OF COMPLETE GENOMES

Prokaryotic Genomes: The genomes of more than 500 different bacteria, including E. coli, have been completely sequenced. The E. coli genome contains 4288 genes, with protein-coding sequences accounting for nearly 90% of the DNA.

The Yeast Genome: The first eukaryotic genome to be sequenced was that of the yeast S. cerevisiae. The S. cerevisiae genome contains about 6000 genes, and protein-coding sequences account for approximately 70% of the genome. The genome of the fission yeast S. pombe contains fewer genes (about 5000) and more introns than S. cerevisiae, with protein-coding sequence corresponding to about 60% of the S. pombe genome.

The Genomes of Caenorhabditis elegans, Drosophila melanogaster and Other Invertebrates: The genome of C. elegans was the first sequenced genome of a multicellular organism. The C. elegans genome contains about 19,000 genes, whose protein-coding sequences account for only about 25% of the genome. The genome of Drosophila contains approximately 14,000 genes, with protein-coding sequences accounting for about 13% of the genome. Additional invertebrate genomes that have been sequenced include other species of nematodes and Drosophila, additional insects, the sea urchin, and the sea anemone. The numbers of genes in these species indicate that gene number is not simply related to the biological complexity of an organism.

Plant Genomes: The genome of the small flowering plant Arabidopsis thaliana contains approximately 26,000 genes, surprisingly more than were found in either Drosophila or C. elegans. Many of these genes are unique to plants, including genes involved in plant physiology, development, and defense. The genomes of rice and the black cottonwood tree have also been sequenced, and each appears to contain more than 40,000 genes. These large gene numbers partly reflect the duplication of large regions of their genomes.

The Human Genome: The human genome appears to contain 20,000 to 25,000 genes—not much more than the number of genes found in simpler animals like Drosophila and C. elegans. Over 40% of the predicted human proteins are related to proteins found in other sequenced organisms, including Drosophila and C. elegans. In addition, the human genome contains expanded numbers of genes involved in the nervous system, the immune system, blood clotting, development, cell signaling, and the regulation of gene expression.
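The gene counts and coding fractions quoted in this section can be collected side by side to show that gene number does not track organismal complexity, while the coding fraction falls steadily as genomes grow. The sketch below uses only the approximate values from this summary; the human entry uses the midpoint of the 20,000 to 25,000 estimate, which is an assumption made purely for sorting.

```python
# Approximate values quoted in this chapter summary.
# Human gene number uses the midpoint of 20,000-25,000 (assumption for sorting).
genomes = {
    "E. coli":         {"genes": 4_288,  "coding_pct": 90},
    "S. cerevisiae":   {"genes": 6_000,  "coding_pct": 70},
    "S. pombe":        {"genes": 5_000,  "coding_pct": 60},
    "C. elegans":      {"genes": 19_000, "coding_pct": 25},
    "D. melanogaster": {"genes": 14_000, "coding_pct": 13},
    "H. sapiens":      {"genes": 22_500, "coding_pct": 1.2},
    "A. thaliana":     {"genes": 26_000, "coding_pct": None},  # not given in the summary
}

# Rank all organisms by estimated gene number (fewest first)
by_genes = sorted(genomes, key=lambda name: genomes[name]["genes"])
print("Ranked by gene number:", ", ".join(by_genes))

# Rank organisms with a quoted coding fraction (highest first)
with_pct = [name for name in genomes if genomes[name]["coding_pct"] is not None]
by_coding = sorted(with_pct, key=lambda name: genomes[name]["coding_pct"], reverse=True)
print("Ranked by coding fraction:", ", ".join(by_coding))
```

The gene-number ranking places C. elegans above Drosophila and Arabidopsis above humans, whereas the coding-fraction ranking runs from E. coli down to humans.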

The Genomes of Other Vertebrates: The genomes of fish, chickens, mice, rats, dogs, rhesus macaques, and chimpanzees provide important comparisons to the human genome. All of these vertebrates contain similar numbers of genes but in some cases differ substantially in their content of repetitive sequences.

BIOINFORMATICS AND SYSTEMS BIOLOGY

Systematic Screens of Gene Function: The genome sequencing projects have introduced large-scale experimental and computational approaches to research in cell and molecular biology. Genome-wide screens using RNA interference can systematically identify the genes in an organism that are involved in any biological process that can be assayed in a high-throughput format.

Regulation of Gene Expression: The identification of gene regulatory sequences and elucidation of the signaling networks that control gene expression are major challenges in bioinformatics and systems biology. These problems are being approached by genome-wide studies of gene expression combined with the development of computational approaches to identify functional regulatory elements.

Variation among Individuals and Genomic Medicine: Variations in our genomes contribute to the characteristics of individual people, including susceptibility to many diseases. Genome-wide association scans are being used to identify genes associated with susceptibility to a variety of common diseases. Identifying these genes should ultimately allow the development of new strategies for disease prevention and treatment that match the genetic makeup of different individuals.
