Disclaimer: This is Totally Untrue.


2.3.6 Genetic Polymorphism

2.3.6.1 General

Against this background, haplogroup information is analyzed in what follows. Some knowledge of DNA and of genetic polymorphism is required to understand it accurately.

2.3.6.2 Categories of Genetic Polymorphism

A human cell contains some 6,000,000,000 base pairs (bp) of DNA. One cell has 46 DNA molecules (chromosomes), so one DNA molecule has some 130,000,000 bp on average. Between any two humans, base sequences are mostly the same (some 99%-99.9% of the sequence). However, apart from identical twins, they are never exactly the same: some 0.1%-1% of the base sequence differs from person to person. This difference, or diversity, in DNA base sequences is called genetic polymorphism. "Poly" means "many," "morph" means "form," and "genetic polymorphism" roughly means "diversity, variation, or variety of DNA base sequences." There are several types of genetic polymorphism.
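
The arithmetic behind these figures can be checked with a short sketch. The function names are made up for illustration, and the inputs are simply the approximate numbers quoted above, not measured values.

```python
# Approximate figures quoted in the text, per human cell.
TOTAL_BP = 6_000_000_000   # base pairs in one cell
DNA_MOLECULES = 46         # DNA molecules (chromosomes) in one cell


def average_bp_per_molecule(total_bp: int, molecules: int) -> float:
    """Average number of base pairs per DNA molecule."""
    return total_bp / molecules


def differing_bp(total_bp: int, fraction: float) -> int:
    """Number of base pairs that differ if `fraction` of the sequence varies."""
    return round(total_bp * fraction)


if __name__ == "__main__":
    average = average_bp_per_molecule(TOTAL_BP, DNA_MOLECULES)
    print(f"{average:,.0f} bp per DNA molecule on average")
    for fraction in (0.001, 0.01):  # 0.1% and 1%
        print(f"{fraction:.1%} of the sequence = {differing_bp(TOTAL_BP, fraction):,} bp")
```

Under these rounded inputs, a 0.1%-1% difference corresponds to millions of base pairs, which is why the "mostly the same" sequences still leave plenty of room for variation.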

DNA base sequences sometimes vary. For example, if the original sequence is (1), variants could look like (2), (3), and (4).

(1) AACATCAGCAGCAGCAGCAGCGCTTAG
(2) AACATCAGCAGCAGCAGCAGCAGCAGCGCTTAG
(3) AACATCAGCAGCAGCAGCGCTTAG
(4) AACGTCAGCAGCAGCAGCAGCGCTTAG

(1) has 5 repeats of the CAG sequence.
(2) has 7 repeats of the CAG sequence (2 more CAG units are added).
(3) has 4 repeats of the CAG sequence (1 CAG unit is removed).
(4) has 5 repeats of the CAG sequence, but the 4th base, A in (1), is replaced by G.
These ((2), (3), and (4)) are examples of genetic polymorphism.
The diversity among such variant sequences is called "polymorphism."
* "Polymorphism in Wikipedia" http://en.wikipedia.org/wiki/Polymorphism_(biology)
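
The comparison of sequences (1)-(4) can be reproduced with a minimal sketch. The helper names `longest_tandem_run` and `substitutions` are hypothetical, chosen only for this illustration; the sequences are the four examples listed above.

```python
import re

# The four example sequences (1)-(4) from the text.
SEQUENCES = {
    1: "AACATCAGCAGCAGCAGCAGCGCTTAG",
    2: "AACATCAGCAGCAGCAGCAGCAGCAGCGCTTAG",
    3: "AACATCAGCAGCAGCAGCGCTTAG",
    4: "AACGTCAGCAGCAGCAGCAGCGCTTAG",
}


def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


def substitutions(reference: str, other: str) -> list[tuple[int, str, str]]:
    """List (1-based position, reference base, other base) for each changed base."""
    if len(reference) != len(other):
        raise ValueError("sequences differ in length; not a pure substitution")
    return [
        (position, ref_base, new_base)
        for position, (ref_base, new_base) in enumerate(zip(reference, other), start=1)
        if ref_base != new_base
    ]


if __name__ == "__main__":
    for label, sequence in SEQUENCES.items():
        print(f"({label}) has {longest_tandem_run(sequence, 'CAG')} repeats of CAG")
    print("(1) vs (4):", substitutions(SEQUENCES[1], SEQUENCES[4]))
```

The repeat counts come out as 5, 7, 4, and 5, and the only difference between (1) and (4) is the 4th base, A replaced by G. Position-by-position comparison only works here because (1) and (4) have the same length; sequences (2) and (3) differ from (1) in length, so they are compared by repeat count instead.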

Genetic polymorphism is commonly divided into the following 3 categories. The two repeat-based categories are distinguished by the length of the repeated unit, although the exact definitions are somewhat disputed.

SNP
The type of genetic polymorphism in (4) is called "Single Nucleotide Polymorphism (SNP)," since just one base (a single nucleotide) is changed in (4) compared with (1).

STRP
The type of genetic polymorphism in (2) and (3) is called "Short Tandem Repeat Polymorphism (STRP)," since a short base sequence (the 3-base unit CAG in this case) repeats in tandem and the number of repeats varies. How short the unit must be to count as "short" depends on the definition: the quoted range runs from 2-5 bases to 2-9 bases. STRP is closely associated with the concept of the Microsatellite.
According to this account, the reason why repeats such as STRP and VNTR (mentioned below) arise is explained in connection with the "Retrotransposon," also mentioned below.
* "Short Tandem Repeat in Wikipedia" http://en.wikipedia.org/wiki/Short_tandem_repeat
* "Microsatellite in Wikipedia" http://en.wikipedia.org/wiki/Microsatellite_(genetics)

VNTR
The third type of genetic polymorphism is "Variable Number Tandem Repeat (VNTR)." VNTR is defined as variation in the number of repeats of a middle-length base sequence.
An example of VNTR is as follows. In this case, the length of the unit is 13 bp and 12 repeats are shown. The number of repeats varies here as well; it may range from 12 to 17 in this case.
ACAGGGTGTGGGG ACAGGGTGTGGGG ACAGGGTGTGGGG ACAGGGTGTGGGG ACAGGGTGTGGGG ACAGGGTGTGGGG ACAGGGTGTGGGG ACAGGGTGTGGGG ACAGGGTGTGGGG ACAGGGTGTGGGG ACAGGGTGTGGGG ACAGGGTGTGGGG
The unit length that qualifies as VNTR is some 10-80 bp, although the details are disputed. VNTR is commonly associated with the Minisatellite.
* "Variable Number Tandem Repeat in Wikipedia" http://en.wikipedia.org/wiki/Variable_number_tandem_repeat
* "Minisatellite in Wikipedia" http://en.wikipedia.org/wiki/Minisatellite
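
As a rough illustration of the VNTR example above, the following sketch builds alleles with 12 to 17 copies of the 13 bp unit and recovers the repeat count from an allele. The function names are hypothetical, and real alleles are embedded in flanking sequence, which this toy model leaves out.

```python
UNIT = "ACAGGGTGTGGGG"  # the 13 bp unit from the example above


def make_allele(unit: str, repeats: int) -> str:
    """Build an allele made of `repeats` tandem copies of `unit`."""
    return unit * repeats


def count_repeats(allele: str, unit: str) -> int:
    """Return how many whole tandem copies of `unit` make up `allele`."""
    copies, remainder = divmod(len(allele), len(unit))
    if remainder or unit * copies != allele:
        raise ValueError("allele is not a pure tandem repeat of the unit")
    return copies


if __name__ == "__main__":
    for repeats in range(12, 18):  # 12 to 17 repeats, as in the text
        allele = make_allele(UNIT, repeats)
        print(f"{count_repeats(allele, UNIT)} repeats -> {len(allele)} bp")
```

Because each extra repeat adds 13 bp, the alleles in this example differ in total length, from 156 bp (12 repeats) to 221 bp (17 repeats).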

2.3.6.3 Categories of DNA Sequences

One human cell contains some 6,000,000,000 base pairs (bp) in 46 DNA molecules (chromosomes). The 46 DNAs make up 2 genomes, and 1 genome consists of 23 DNAs with 3,000,000,000 bp (2 genomes x 23 DNAs = 46 DNAs). One DNA therefore has some 130,000,000 bp (= 6,000,000,000/46) on average. DNA sequence can be categorized into "gene DNA regions" and "non-gene DNA regions."
The definition of "gene" is disputed, but the dominant meaning of "a gene" is a unit (or piece of information) consisting of "a DNA sequence (a region) that is transcribed into mRNA, tRNA, or rRNA, together with adjacent related sequences." (One gene basically corresponds to one protein or one RNA to be created. A gene (region) may include introns.) It is said that one genome (23 DNAs) contains some 22,000 genes (regions).
Among the 3,000,000,000 bp of the 23 DNAs, gene DNA regions account for 30%, or 900,000,000 bp. The other 70% might be called "non-gene DNA regions."
DNA sequence can also be categorized in another way, into "coding DNA regions" and "non-coding DNA regions." "Coding DNA regions" are the base sequence regions that code for amino acids according to the codon table. Coding DNA regions are naturally included in gene DNA regions.
The 3,000,000,000 bp of the 23 human DNAs (one genome) can then be roughly categorized as follows.

Gene DNA regions (regions associated with RNAs) (30% of a genome)
  Gene DNA regions associated with mRNA creation (amino acid/protein creation)
    Coding DNA regions (coding sequences: CDS) (1-1.5% of a genome)
    Non-coding DNA region adjacent to coding DNA regions (27% of a genome)
      Untranslated regions (UTR)
      Introns
      Spacer DNA: (*Spacer DNA might be categorized as Non-repeat Sequences)
    (* CDS and UTR together make up the exons)
  Gene DNA regions associated with non-coding RNAs (2% of a genome)
    Gene DNA regions associated with tRNA creation
    Gene DNA regions associated with rRNA creation
Non-gene DNA regions (regions not so associated with RNAs) (70% of a genome)
  Non-repeat Sequences (16% of a genome)
    Pseudogenes (no functional sequences)
    Spacer DNA: (*Spacer DNA might be categorized as Non-coding DNA)
  Repeated Sequences (54% of a genome)
    Tandem Repeated Sequences (8% of a genome)
      Satellite DNA (large tandemly repeated sequences which mostly compose centromeres)
      Minisatellite (Variable Number of Tandem Repeat) (repeats of some 10-60 base sequences)
        e.g. Telomeres
      Microsatellite (Short Tandem Repeat) (repeats of some 2-6 base sequences)
    Interspersed Repeated Sequences (46% of a genome)
      Retrotransposon Repeated Sequences (43% of a genome)
        Long Terminal Repeat Retrotransposon (Endogenous Retroviruses) (8% of a genome)
        Short Interspersed Nuclear Elements (SINEs) (14% of a genome)
          e.g. Alu Sequence
        Long Interspersed Nuclear Elements (LINEs) (21% of a genome)
      Inverted Repeated Sequences (3% of a genome)
        DNA Transposon
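
The percentages in the categorization above can be checked for internal consistency with a small sketch. The `Region` class and the helper names are hypothetical, only entries that carry a percentage are included, and coding DNA regions are entered as 1% (the low end of the quoted 1-1.5%), which is an assumption made so that the shares add up.

```python
from dataclasses import dataclass, field

GENOME_BP = 3_000_000_000  # base pairs in one genome (23 DNAs), as quoted in the text


@dataclass
class Region:
    """One entry of the categorization, with its share of a genome in percent."""
    name: str
    percent: float
    children: list["Region"] = field(default_factory=list)


def base_pairs(percent: float) -> int:
    """Convert a share of the genome (in percent) into base pairs."""
    return round(GENOME_BP * percent / 100)


def mismatches(region: Region, tolerance: float = 0.5) -> list[tuple[str, float, float]]:
    """Return (name, stated percent, sum of children) wherever the two disagree."""
    found = []
    if region.children:
        total = sum(child.percent for child in region.children)
        if abs(total - region.percent) > tolerance:
            found.append((region.name, region.percent, total))
        for child in region.children:
            found.extend(mismatches(child, tolerance))
    return found


# Assumption: coding DNA regions are entered as 1%, the low end of 1-1.5%.
GENOME = Region("Genome", 100, [
    Region("Gene DNA regions", 30, [
        Region("Coding DNA regions", 1),
        Region("Non-coding DNA adjacent to coding regions", 27),
        Region("Regions for non-coding RNAs", 2),
    ]),
    Region("Non-gene DNA regions", 70, [
        Region("Non-repeat sequences", 16),
        Region("Repeated sequences", 54, [
            Region("Tandem repeated sequences", 8),
            Region("Interspersed repeated sequences", 46, [
                Region("Retrotransposon repeated sequences", 43, [
                    Region("LTR retrotransposons", 8),
                    Region("SINEs", 14),
                    Region("LINEs", 21),
                ]),
                Region("Inverted repeated sequences", 3),
            ]),
        ]),
    ]),
])

if __name__ == "__main__":
    print("inconsistent entries:", mismatches(GENOME))
    print(f"gene DNA regions: {base_pairs(30):,} bp")
```

With these inputs every parent entry equals the sum of its children, and 30% of a genome works out to the 900,000,000 bp quoted for gene DNA regions.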

*STRP and VNTR
STRP and VNTR, the genetic polymorphisms mentioned above, correspond to the Microsatellite and Minisatellite entries under Non-gene DNA regions in the above categorization. Since it is said that STRP and VNTR polymorphism (variation in the number of repeats) occurs frequently from generation to generation (through altered reproductive cells), STRP and VNTR are poorly suited to tracing lineage. For example, the frequency (probability) of an STRP change is said to be some 0.0001/generation (based on the traditional theory of evolution). Changes in the number of repeats supposedly occur through enzymes involved with Retrotransposons.

*SNP
In contrast, SNPs (Single Nucleotide Polymorphisms) lie everywhere in a genome, across all of the above categories. SNPs arise through copying errors during DNA replication. Since it is said that SNP changes occur far less frequently, SNPs are well suited to tracing lineage. The frequency (probability) of an SNP change is said to be some 0.000000001/generation - 0.00000001/generation (based on the traditional theory of evolution). That is why haplogroups are defined in terms of SNPs.
* "Haplogroup in Wikipedia" http://en.wikipedia.org/wiki/Haplogroup
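
To see why these frequencies matter for tracing lineage, the following sketch estimates the probability that a single marker passes through many generations unchanged. It assumes a constant, independent change probability per generation, which is a simplification, and `prob_unchanged` is a made-up name; the rates are the ones quoted above.

```python
STRP_RATE = 1e-4       # some 0.0001 per generation, as quoted for STRP
SNP_RATE_LOW = 1e-9    # lower quoted value for SNP
SNP_RATE_HIGH = 1e-8   # upper quoted value for SNP


def prob_unchanged(rate_per_generation: float, generations: int) -> float:
    """Probability that a marker does not change in any of `generations` generations.

    Assumes the same independent change probability in every generation.
    """
    return (1.0 - rate_per_generation) ** generations


if __name__ == "__main__":
    for generations in (100, 1_000, 10_000):
        strp = prob_unchanged(STRP_RATE, generations)
        snp = prob_unchanged(SNP_RATE_HIGH, generations)
        print(f"{generations:>6} generations: STRP {strp:.4f}, SNP {snp:.6f}")
```

Under this simplified model, an STRP marker has only about a 37% chance of surviving 10,000 generations unchanged, while an SNP marker remains unchanged with a probability above 99.98% even at the higher quoted rate.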

*Retrotransposon
In addition, Retrotransposons are distinctive sequences worth knowing about: copies of them are found frequently throughout genomes. Present-day Retrotransposons seem mostly stable and rarely create new copies. However, it seems that Retrotransposons once created copies of themselves and inserted them into DNA sequences. Generally, if a DNA sequence has a promoter and so forth, it is transcribed into RNA. That specific DNA sequences are transcribed into RNA is part of the so-called "central dogma." However, it was found that in rare cases RNA can be copied back into a DNA sequence (reverse transcription). The enzymes that carry out reverse transcription are called "reverse transcriptases."
* "Reverse Transcriptase in Wikipedia" http://en.wikipedia.org/wiki/Reverse_transcriptase
In this way, a copy of a specific DNA sequence can be inserted into an existing DNA sequence. A representative retrotransposon repeated sequence is the "Alu sequence."
* "Alu Sequence in Wikipedia" http://en.wikipedia.org/wiki/Alu_sequence
It was named after Alu I, a restriction enzyme that cuts within the sequence and was isolated from the bacterium "Arthrobacter luteus."
* "Arthrobacter in Wikipedia" http://en.wikipedia.org/wiki/Arthrobacter
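
The copy-and-insert behavior described above can be pictured with a toy model. This is only a sketch under strong simplifications: sequences are plain strings written as the coding strand, strands and enzymes are not modeled, and all function names are hypothetical.

```python
def transcribe(dna: str) -> str:
    """DNA -> RNA, written from the coding strand: T is replaced by U."""
    return dna.replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """RNA -> DNA (reverse transcription): U is replaced by T."""
    return rna.replace("U", "T")


def retrotranspose(genome: str, start: int, end: int, target: int) -> str:
    """Copy genome[start:end] through an RNA intermediate and insert it at `target`.

    The original element stays in place, so the number of copies grows.
    """
    element = genome[start:end]
    rna = transcribe(element)        # DNA is transcribed into RNA
    copy = reverse_transcribe(rna)   # RNA is copied back into DNA
    return genome[:target] + copy + genome[target:]


if __name__ == "__main__":
    genome = "AAGGCTAACC"
    element = genome[2:6]  # "GGCT" plays the role of the retrotransposon
    new_genome = retrotranspose(genome, 2, 6, 8)
    print(genome, "->", new_genome)
    print("copies before:", genome.count(element), "after:", new_genome.count(element))
```

The point of the sketch is the "copy and paste" character of the process: the element is not moved but duplicated, which is how such sequences could come to be found so frequently in a genome.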






