Chapter 5: The Organization and Sequences of Cellular Genomes

Chapter Summary

THE COMPLEXITY OF EUKARYOTIC GENOMES

Introns and Exons: Most eukaryotic genes have a split structure in which segments of coding sequence (exons) are interrupted by noncoding sequences (introns). In complex eukaryotes, introns account for more than ten times as much DNA as exons.

Repetitive DNA Sequences: Over 50% of mammalian DNA consists of highly repetitive DNA sequences, some of which are present in 10^5 to 10^6 copies per genome. These sequences include simple-sequence repeats as well as repetitive elements that have moved throughout the genome by either RNA or DNA intermediates.

Gene Duplications and Pseudogenes: Many eukaryotic genes are present in multiple copies, called gene families, which have arisen by duplication of ancestral genes. Some members of gene families function in different tissues or at different stages of development. Other members of gene families (pseudogenes) have been inactivated by mutations and no longer represent functional genes. Gene duplications can occur either by duplication of a segment of DNA or by reverse transcription of an mRNA, giving rise to a processed pseudogene. Approximately 5% of the human genome consists of duplicated DNA segments. In addition, there are more than 10,000 processed pseudogenes in the human genome.

The Composition of Higher Eukaryotic Genomes: Only a small fraction of the genome in complex eukaryotes corresponds to protein-coding sequences. The human genome is estimated to contain 20,000–25,000 genes, with protein-coding sequence corresponding to only about 1.2% of the DNA. Approximately 20% of the human genome consists of introns, and more than 60% is composed of repetitive and duplicated DNA sequences.
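
As a rough consistency check, the human figures quoted above can be compared with the earlier statement that introns account for more than ten times as much DNA as exons. The short Python sketch below simply divides one percentage by the other. It assumes that protein-coding sequence is an acceptable stand-in for exons (exons also include untranslated regions, so the true ratio would be somewhat lower); the constant and function names are illustrative only.

```python
# Fractions of the human genome as quoted in the summary (percent of total DNA).
CODING_PERCENT = 1.2   # protein-coding sequence
INTRON_PERCENT = 20.0  # introns


def intron_to_coding_ratio(intron_percent: float, coding_percent: float) -> float:
    """Return how many times more DNA lies in introns than in coding sequence."""
    if coding_percent <= 0:
        raise ValueError("coding_percent must be positive")
    return intron_percent / coding_percent


# Assumption: coding sequence is used as a proxy for exons.
ratio = intron_to_coding_ratio(INTRON_PERCENT, CODING_PERCENT)
print(f"Introns span about {ratio:.1f} times as much DNA as coding sequence")
```

Under these assumptions the ratio comes out at roughly 17, which is in line with the "more than ten times" figure.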

CHROMOSOMES AND CHROMATIN

Chromatin: The DNA of eukaryotic cells is wrapped around histones to form nucleosomes. Chromatin can be further compacted by the folding of nucleosomes into higher-order structures, including the highly condensed metaphase chromosomes of cells undergoing mitosis.

Centromeres: Centromeres are specialized regions of eukaryotic chromosomes that serve as the sites of association between sister chromatids and the sites of spindle fiber attachment during mitosis.

Telomeres: Telomeres are specialized sequences required to maintain the ends of eukaryotic chromosomes.

THE SEQUENCES OF COMPLETE GENOMES

Prokaryotic Genomes: The genomes of more than 100 different bacteria, including E. coli, have been completely sequenced. The E. coli genome contains 4288 genes, with protein-coding sequences accounting for nearly 90% of the DNA.

The Yeast Genome: The first eukaryotic genome to be sequenced was that of the yeast S. cerevisiae. The S. cerevisiae genome contains about 6000 genes, and protein-coding sequences account for approximately 70% of the genome. The genome of the fission yeast S. pombe contains fewer genes (about 5000) and more introns than S. cerevisiae, with protein-coding sequence corresponding to about 60% of the S. pombe genome.

The Genomes of Caenorhabditis elegans and Drosophila melanogaster: The genome of C. elegans was the first sequenced genome of a multicellular organism. The C. elegans genome contains about 19,000 protein-coding sequences, which account for only about 25% of the genome. The genome of Drosophila contains approximately 14,000 genes, with protein-coding sequences accounting for about 13% of the genome. Although Drosophila contains fewer genes than C. elegans, many genes in both species are duplicated, and it appears that both species contain 10,000–15,000 unique genes. Some of these genes are shared between Drosophila, C. elegans, and yeast—these genes may encode proteins with common functions in all eukaryotic cells. However, the majority of Drosophila and C. elegans genes are not found in yeast and are likely to function in the regulation and development of multicellular animals.

Plant Genomes: The genome of the small flowering plant Arabidopsis thaliana contains approximately 26,000 genes—surprisingly more genes than were found in either Drosophila or C. elegans. However, many of these genes are the result of duplications of large segments of the Arabidopsis genome, so the number of unique genes in Arabidopsis is about 15,000. Many of these genes are unique to plants, including genes involved in plant physiology, development, and defense. The sequence of the rice genome is of particular agricultural interest because rice is the staple food for more than half the world's population. The draft sequence of the rice genome is estimated to contain approximately 37,000 genes, many of which are duplicated and may have arisen by duplication of large genome segments.

The Human Genome: The human genome appears to contain 20,000–25,000 genes—not much more than the number of genes found in simpler animals like Drosophila and C. elegans. Over 40% of the predicted human proteins are related to proteins found in other sequenced organisms, including Drosophila and C. elegans. In addition, the human genome contains expanded numbers of genes involved in the nervous system, the immune system, blood clotting, development, cell signaling, and the regulation of gene expression.

The Genomes of Other Vertebrates: The genomes of fish, chickens, mice, rats, dogs, and chimpanzees provide important comparisons to the human genome. All of these vertebrates contain similar numbers of genes but in some cases differ substantially in their content of repetitive sequences.
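
The gene counts and coding fractions quoted in this section can be lined up to show how coding density falls from bacteria to humans while gene number changes much less. The Python sketch below is only a tabulation of the approximate figures given above. The human gene count uses the midpoint of the stated 20,000–25,000 range, which is an assumption made here for illustration. Arabidopsis and rice are omitted because no coding fraction is given for them, and the names `GENOMES` and `rank_by_coding_percent` are hypothetical.

```python
# (organism, approximate gene count, protein-coding percent of genome),
# taken from the figures quoted in this summary.
GENOMES = [
    ("E. coli", 4288, 90.0),        # "nearly 90%"
    ("S. cerevisiae", 6000, 70.0),
    ("S. pombe", 5000, 60.0),
    ("C. elegans", 19000, 25.0),
    ("Drosophila", 14000, 13.0),
    ("Human", 22500, 1.2),          # assumed midpoint of 20,000-25,000
]


def rank_by_coding_percent(genomes):
    """Return the records sorted from highest to lowest coding percentage."""
    return sorted(genomes, key=lambda record: record[2], reverse=True)


ranked = rank_by_coding_percent(GENOMES)
for organism, gene_count, coding_percent in ranked:
    print(f"{organism:<14} {gene_count:>6} genes  {coding_percent:>5.1f}% coding")
```

Read this way, the human genome has roughly five times as many genes as E. coli but devotes a far smaller share of its DNA to protein-coding sequence.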

BIOINFORMATICS AND SYSTEMS BIOLOGY

Systematic Screens of Gene Function: The genome sequencing projects have introduced large-scale experimental and computational approaches to research in cell and molecular biology. Genome-wide screens using RNA interference can systematically identify all of the genes in an organism that are involved in any biological process that can be assayed in a high-throughput format.

Regulation of Gene Expression: The identification of gene regulatory sequences and elucidation of the signaling networks that control gene expression are major challenges in bioinformatics and systems biology. These problems are being approached by genome-wide studies of gene expression combined with the development of computational approaches to identify functional regulatory elements.

Variation among Individuals and Genomic Medicine: Variations in our genomes are responsible for the characteristics of individual people, including susceptibility to many diseases. Analysis of these variations will allow the identification of genes responsible for disease susceptibility and enable the development of new strategies for disease prevention and treatment that match the genetic makeup of different individuals.

American Society for Microbiology   Sinauer Associates