Plant Genome Organization and Structure : Analysis of Genomes by Reassociation Experiments



Analysis of Genomes by Reassociation Experiments
If a DNA molecule is melted and allowed to reassociate, the complexity of the genome dictates the rate at which duplex DNA forms. Consider a simple molecule that consists of alternating GCs: it will form a duplex more quickly than a molecule that consists of repeating blocks of AGCT. As the number of different combinations of bases increases, the time required for complete duplex formation increases. Renaturation, or duplex formation, requires random collisions between two complementary single-stranded molecules. This process follows second-order kinetics and is concentration dependent. We will not go through the derivation of the formula, but the important parameter used to characterize a given DNA is C₀t½.
This value is the product of the initial DNA concentration (C₀) and the time (t) required for one-half of the DNA to reanneal, or form duplex DNA. Its units are moles of nucleotides per liter multiplied by seconds (mol·s/L). The more complex the genome of interest, the longer it takes for like sequences to reanneal, and consequently the larger the C₀t½. Thus, in terms of reassociation kinetics, complexity has a specific definition.
Complexity - the total length of different sequences
For example, E. coli is considered to have a complexity of 4.2 x 10⁶ base pairs. What is the experimental procedure used to derive these values? In general the procedure is:

1. Shear the DNA to be analyzed to a length of about 300 bp.
2. Melt the DNA (usually in 0.12 M phosphate buffer) by boiling for 5 min.
3. Quickly place at 60°C.
4. Take aliquots at different time points. Separate single-stranded DNA from double-stranded DNA on hydroxyapatite. Measure the amount of DNA that is double-stranded by absorbance at 260 nm.
5. Plot the fraction that remains single-stranded versus the C₀t value, with C₀t on a logarithmic scale. This plot is the C₀t curve.
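The curve produced in the last step can be sketched numerically. The following is a minimal Python sketch, assuming the ideal second-order rate law C/C₀ = 1/(1 + C₀t/C₀t½) for a genome with a single kinetic component. The function names and the C₀t½ value of 4 are illustrative assumptions, not part of the procedure above.

```python
def cot_value(c0_mol_per_l, seconds):
    """Cot: initial nucleotide concentration (mol/L) times elapsed time (s)."""
    return c0_mol_per_l * seconds


def fraction_single_stranded(cot, cot_half):
    """Ideal second-order reassociation of one component:
    C/C0 = 1 / (1 + Cot / Cot_half)."""
    return 1.0 / (1.0 + cot / cot_half)


COT_HALF = 4.0  # assumed, illustrative Cot1/2 for a single-component genome

# Sample the curve at whole-number steps of log10(Cot), as on a Cot plot.
for log_cot in range(-2, 5):
    frac = fraction_single_stranded(10.0 ** log_cot, COT_HALF)
    print(f"log10(Cot) = {log_cot:+d}  fraction single-stranded = {frac:.3f}")
```

By construction, half of the DNA is still single-stranded when the C₀t value equals C₀t½, which is why that single number is enough to describe an ideal one-component curve.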
When this type of experiment is performed with eukaryotic DNA, three components are usually seen. Each component reanneals with its own C₀t½ value. The three components are termed the fast, intermediate, and slow components. Why do we see these three components? Eukaryotic genomes are characterized by sequences that are present in different copy numbers. If a sequence is found many times in the genome, it will reanneal much more quickly than sequences that are found only once in the same genome. Thus the C₀t curve for a eukaryotic genome will differ from that of a genome, such as E. coli, that contains only single-copy sequences.
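The effect of copy number on the curve can be illustrated by summing the ideal second-order curves of the individual components, each weighted by its share of the genome. A minimal Python sketch follows. The function name is hypothetical, and the component fractions and C₀t½ values are assumed purely for illustration; they are not measurements of any particular genome.

```python
def mixture_fraction_single_stranded(cot, components):
    """Fraction of total DNA still single-stranded at a given Cot.

    components: list of (fraction_of_genome, observed_cot_half) pairs,
    each treated as an independent ideal second-order component.
    """
    return sum(frac / (1.0 + cot / cot_half) for frac, cot_half in components)


# Assumed, illustrative values only: (fraction of genome, observed Cot1/2).
COMPONENTS = [
    (0.25, 1e-3),  # fast: highly repeated sequences
    (0.30, 1.0),   # intermediate: moderately repeated sequences
    (0.45, 1e3),   # slow: single-copy sequences
]

# Each component reanneals over its own range of Cot, giving a stepped curve.
for log_cot in range(-5, 6):
    remaining = mixture_fraction_single_stranded(10.0 ** log_cot, COMPONENTS)
    print(f"log10(Cot) = {log_cot:+d}  fraction single-stranded = {remaining:.3f}")
```

Because the assumed C₀t½ values are several orders of magnitude apart, the printed values fall in three separate steps rather than in the single smooth transition expected for a genome with only single-copy sequences.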
A comparison of the C₀t½ value of each of these components with an E. coli standard allows us to derive the complexity of each component. The complexity of the slow component is greater than that of the other two components, and the slow component is considered to represent the single-copy portion of the genome. The complexity of the slow component can be used as a good estimate of the genome size, taken here as the sum of the lengths of all the unique sequences. Using the example from Genes V - Lewin, p. 664, the complexity of the slow component is 3 x 10⁸ bp and the complexity of the intermediate component is 6 x 10⁵ bp. Dividing the complexity of the intermediate component by that of the slow component gives 2 x 10⁻³. This demonstrates that the intermediate component contributes very little to the complexity of the genome. Therefore, the complexity of the slow, or single-copy, portion of the genome can be considered equal to the genome size.
To derive the complexity of each component, it is necessary to run a standard, such as E. coli DNA, with each experiment. As stated above, E. coli is considered to consist of only single-copy sequences. Suppose that, in the experiment from which the C₀t½ of each component was derived, the C₀t½ value for E. coli was 4, and that the slow component reannealed with an observed C₀t½ of 630 and comprised 45% of the total DNA. If only that component were annealed, its C₀t½ value would be 283 (630 x 0.45). That value is 71 (283/4) times greater than the value for E. coli. Therefore the complexity of the slow component is 71 times that of E. coli, or 3.0 x 10⁸ bp (71 x 4.2 x 10⁶). The complexity of the other components is derived similarly.
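The arithmetic in this worked example can be written out as a short calculation. The Python sketch below uses only the numbers quoted above (630, 0.45, 4 and 4.2 x 10⁶ bp); the function and variable names are hypothetical.

```python
ECOLI_COMPLEXITY_BP = 4.2e6  # complexity of the E. coli standard, from the text


def component_complexity(observed_cot_half, fraction_of_dna,
                         standard_cot_half,
                         standard_complexity_bp=ECOLI_COMPLEXITY_BP):
    """Estimate the complexity (bp) of one kinetic component.

    The observed Cot1/2 is first scaled by the component's share of the
    total DNA to give the Cot1/2 it would show if annealed on its own,
    then compared with the single-copy standard run in the same experiment.
    """
    pure_cot_half = observed_cot_half * fraction_of_dna  # 630 x 0.45
    ratio_to_standard = pure_cot_half / standard_cot_half  # about 71
    return ratio_to_standard * standard_complexity_bp


slow_bp = component_complexity(observed_cot_half=630.0,
                               fraction_of_dna=0.45,
                               standard_cot_half=4.0)
print(f"slow component complexity = {slow_bp:.1e} bp")  # 3.0e+08 bp
```

Carrying the unrounded intermediate values through gives about 2.98 x 10⁸ bp, which rounds to the 3.0 x 10⁸ bp quoted in the text.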
Genome size varies widely among organisms, and plants show the greatest variation in genome size of any kingdom.
Variation in Genome Size among Plants

Species        kb/haploid genome    pg/haploid genome¹
Arabidopsis    7 x 10⁴              0.15
Lily           1 x 10⁸              100.00

¹Conversion factor: 1 pg = 0.965 x 10⁹ bp = 6.1 x 10¹¹ daltons
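The conversion factor in the footnote can be applied directly. A minimal Python sketch, assuming only the footnote's value of 0.965 x 10⁹ bp per picogram; the constant and function names are hypothetical.

```python
BP_PER_PG = 0.965e9  # base pairs per picogram of DNA, from the table footnote


def pg_to_kb(picograms):
    """Convert a DNA mass in picograms to kilobase pairs."""
    return picograms * BP_PER_PG / 1e3


def kb_to_pg(kilobases):
    """Convert a length in kilobase pairs to a DNA mass in picograms."""
    return kilobases * 1e3 / BP_PER_PG


lily_kb = pg_to_kb(100.0)
print(f"100 pg = {lily_kb:.2e} kb")  # 9.65e+07 kb
```

The result for 100 pg, 9.65 x 10⁷ kb, rounds to the 1 x 10⁸ kb listed for lily.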
One manner in which a genome can be described is by determining the distribution of fast, intermediate and slow components in the genome. For comparison purposes, what does the human genome look like?
Distribution of Components in the Human Genome

Component       % of Genome
Fast            6
Intermediate    38
Slow            50


The table Sequence Distribution of Selected Plant Species lists the components for different plant species. As you can see, plant species exhibit a wide range of values for each component. The genome of Arabidopsis consists almost entirely of single-copy sequences (the repetitive sequences have been determined to be essentially all chloroplast DNA). At the other extreme, pea and wheat genomes have only 10-20% single-copy sequences. Reassociation kinetic experiments with polyploid species, such as bread wheat (Triticum aestivum), were unable to detect a component that displayed true single-copy kinetics. Instead, the slowest component behaved as if it consisted of sequences represented three times. This result is consistent with the current hypothesis that bread wheat was developed from the introgression of three diploid wheat species.
Copyright © 1998. Phillip McClean