Analysis of Genomes by Reassociation Experiments

If a DNA molecule is melted and allowed to reassociate, the complexity of the genome dictates the rate at which duplex DNA will form. Consider a simple molecule that consists of alternating GCs: it will form a duplex more quickly than a molecule that consists of repeating blocks of AGCT. As the number of different combinations of bases increases, the time required for complete duplex formation increases.

Renaturation, or duplex formation, requires random collisions between two complementary single-stranded molecules. This process follows second-order kinetics and is concentration dependent. We will not go through the derivation of the formula, but the important parameter used to define a certain DNA is Cot½. This value is the product of the initial DNA concentration (Co) and the time (t½) required for one-half of the DNA to reanneal, or form duplex DNA. The units for this parameter are moles of nucleotides x seconds per liter. The more complex the genome of interest, the longer it will take for like sequences to reanneal. Consequently, the Cot½ will be larger.

Thus, in terms of reassociation kinetics, complexity has a specific definition.

Complexity - the total length of different sequences

For example, E. coli is considered to have a complexity of 4.2 x 10^6 base pairs.

What is the experimental procedure used to derive these values? In general the procedure is:
A comparison of the Cot value of each of these components with an E. coli standard allows us to derive the complexity of each component. The complexity of the slow component of the genome is greater than that of the other two components, and the slow component is considered to represent the single copy portion of the genome.

The complexity of the slow component can be used as a good estimate of the genome size. The genome size will be the sum of the lengths of all the unique sequences. Using the example from Genes V - Lewin, p.664, the complexity of the slow component is 3 x 10^8 bp and the complexity of the intermediate component is 6 x 10^5 bp. If we divide the complexity of the intermediate component by that of the slow component, we get 2 x 10^-3. This demonstrates that the intermediate component contributes very little to the complexity of the genome. Therefore, the complexity of the slow or single copy portion of the genome can be considered equal to the genome size.

To derive the complexity of each component, it is necessary to run a standard, such as E. coli DNA, with each experiment. As stated above, E. coli is considered to consist of only single-copy sequences. Let's say that in the experiment from which the Cot for each component was derived, the Cot½ value for E. coli was 4. Experimentally it was determined that the slow component comprised 45% of the total DNA. Therefore, if only that component were annealed, the Cot½ value would be 283 (630 x 0.45, where 630 is the Cot½ observed for the slow component in the total DNA). That value is 71 (283/4) times larger than the value for E. coli; that is, the component reanneals 71 times more slowly. Therefore the complexity of the slow component is 71 times that of E. coli, or 3.0 x 10^8 bp (71 x 4.2 x 10^6). The complexity of the other components is derived similarly.

Genome size is quite variable throughout the biological world, and the genome size in plants shows the greatest variation of any kingdom in the biological world.

Variation in Genome Size among Plants
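The comparison against the E. coli standard worked through above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the text. The function and variable names are hypothetical, and the value 630 is read here as the Cot½ observed for the slow component in the total DNA.

```python
# Complexity of the E. coli standard in base pairs (value quoted in the text).
ECOLI_COMPLEXITY_BP = 4.2e6


def pure_component_cot_half(observed_cot_half, fraction_of_genome):
    """Cot1/2 the component would show if it were annealed on its own.

    The observed value is scaled by the fraction of total DNA that the
    component represents, as in the text (630 x 0.45).
    """
    return observed_cot_half * fraction_of_genome


def component_complexity(observed_cot_half, fraction_of_genome,
                         standard_cot_half,
                         standard_complexity_bp=ECOLI_COMPLEXITY_BP):
    """Estimate a component's complexity relative to a single-copy standard."""
    pure = pure_component_cot_half(observed_cot_half, fraction_of_genome)
    # Complexity scales in proportion to Cot1/2 when the standard is run
    # under the same conditions.
    return pure / standard_cot_half * standard_complexity_bp


# Numbers from the worked example: observed Cot1/2 of 630 (assumed meaning,
# see above), 45% of total DNA, and an E. coli Cot1/2 of 4.
slow_complexity = component_complexity(630, 0.45, 4)
print(f"Slow component complexity: {slow_complexity:.2e} bp")
```

With the numbers from the example, the estimate comes out near 3.0 x 10^8 bp, in agreement with the rounded figure in the text. The same function could be applied to the other components, given their observed Cot½ values and fractions.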
¹Conversion factor: 1 pg = 0.965 x 10^9 bp = 6.1 x 10^11 daltons

One manner in which a genome can be described is by determining the distribution of fast, intermediate and slow components in the genome. For comparison purposes, what does the human genome look like?
Distribution Sizes Among Components of the Human Genome
The table Sequence Distribution of Selected Plant Species lists the different components from different plant species. As you can see, plant species exhibit a wide range of values for each of the components. The genome of Arabidopsis consists almost entirely of single copy sequences (the repetitive sequences have been determined to be essentially all chloroplast DNA). At the other extreme, pea and wheat genomes have only 10-20% single copy sequences.

Reassociation kinetic experiments with polyploid species, such as bread wheat (Triticum aestivum), were unable to resolve a component that displayed true single copy kinetics. Instead, the slowest component behaved as if it consisted of sequences represented three times. This result is consistent with the current hypothesis that bread wheat arose by combining the genomes of three diploid wheat species, and genomic analysis suggests that this hypothesis is correct.

Copyright © 1998. Phillip McClean