Single-Nucleotide Polymorphisms (SNPs): Difference between revisions

== Single-Nucleotide Polymorphisms (SNPs) ==


Of the roughly 3.2 billion base pairs in the human genome, about 99.9% are identical in all humans. At each of those positions everyone carries the same one of the four nitrogenous bases (A, C, G, or T), so no variation is possible there. These positions simply make us human. The places where variation can occur are called SNPs, which stands for single-nucleotide polymorphisms. SNPs are what give DNA testing its genealogical value. When two individuals match on enough consecutive SNPs, the run becomes a matching segment. The more consecutive SNPs match, the bigger the segment is. If a segment is big enough (more than 15 cM, or centimorgans), it is almost certainly identical by descent (IBD), which means the two individuals share that segment because they both descend from a common ancestor who passed that segment of DNA on to both of them. The more matching segments there are and the bigger they are, the more closely two test takers are probably related. By testing a sample of a person's SNPs and comparing them to everyone else in the database, it is possible to identify that person's genetic relatives. Most major companies test 500,000 to 700,000 SNPs.


In theory any of the four bases could appear at a SNP, but in practice only two alleles are found at any one SNP the vast majority of the time. The more common allele is called the major allele, and the less common one is called the minor allele. In autosomal DNA, each person has two alleles at each SNP, one inherited from their mother and one inherited from their father. This means that at each SNP tested a person can have one of three combinations:
 
1. Two copies of the major allele (called a homozygous SNP).
2. Two copies of the minor allele (also called a homozygous SNP).
3. One copy of the major allele and one copy of the minor allele (called a heterozygous SNP).
 
 
When two people have at least one matching allele at a SNP, it is a half match. If both alleles match, it is a full match, and if neither allele matches, it is no match. Since there are only three possible combinations at each SNP, most people will be either a full or half match at any one SNP by coincidence even though they are not related. In fact, somebody who is heterozygous will be at least a half match with everybody on earth at that SNP. This is why hundreds or thousands of SNPs in a row must match before one can be confident that a matching segment is truly identical by descent.
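
The half match, full match, and no match rules above can be sketched in a few lines of Python. This is only an illustration: the function names `match_type` and `longest_matching_run` are made up for this example, genotypes are assumed to be two-letter strings such as "AG", and no testing company's actual matching algorithm is implied.

```python
def match_type(genotype_a, genotype_b):
    """Classify two unordered two-letter genotypes as 'full', 'half', or 'none'."""
    # Sorting makes "AG" and "GA" compare equal, since allele order carries no meaning.
    if sorted(genotype_a) == sorted(genotype_b):
        return "full"
    # At least one shared allele is enough for a half match.
    if set(genotype_a) & set(genotype_b):
        return "half"
    return "none"


def longest_matching_run(kit_a, kit_b):
    """Count the longest run of consecutive SNPs that are at least a half match.

    kit_a and kit_b are lists of genotypes for the same SNPs, in the same order
    (a simplifying assumption for this sketch).
    """
    best = current = 0
    for g_a, g_b in zip(kit_a, kit_b):
        if match_type(g_a, g_b) == "none":
            current = 0  # a non-matching SNP breaks the run
        else:
            current += 1
            best = max(best, current)
    return best
```

With alleles A and G at a SNP, a person who is "AG" gets at least a half match against "AA", "AG", and "GG" alike, which is why a single matching SNP says nothing about relatedness and long runs are needed.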


== Phasing ==


Your raw data file lists your SNPs in the following order:
 
1. The leftmost SNP on chromosome 1 is listed first.
2. The next SNP listed is whichever SNP comes next, moving from left to right.
3. After the last (rightmost) SNP on a chromosome is listed, the process repeats with the next chromosome.
 
At each SNP, however, the two alleles are not listed in any particular order. AG, for example, means exactly the same thing as GA. It would be nice if the first allele were always from the father and the second always from the mother, or vice versa, but unfortunately it is a random mixture. If a SNP is homozygous (A A, for example), then one A came from each parent. When dealing with a heterozygous SNP, the best way to know which allele came from which parent is to compare against other relatives. Sorting out the paternal and maternal alleles is called phasing. For example, if you compare your DNA against your mom's and at a SNP where you are C G your mom is G G, then you must have inherited the C from your dad and the G from your mom. If you are C G and your mom is also C G, it is still unclear which allele came from which parent, and comparing against your dad or another relative would be necessary to figure it out.

Programs such as GedMatch.com offer the ability to phase your DNA by comparing it against one or both parents. Using phased kits reduces the number of false segments identified between you and a match and is a valuable tool for people interested in small DNA segments. However, in cases where you and the parent being compared against are both heterozygous (like C G), the value becomes a no call and is discarded from the comparison. For this reason, comparing your DNA against both parents gives better results than comparing against just one. Perhaps at the SNP where you and your mom are both C G, your father is C C. Now it can be concluded that you inherited the C from your father and the G from your mother.
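
The phasing logic described above can be sketched for a single SNP as follows. This is a minimal illustration under stated assumptions, not how GedMatch.com or any other program actually implements phasing: the function name `phase_snp` is hypothetical, genotypes are assumed to be unordered two-letter strings, and a no call is represented as `None`.

```python
def phase_snp(child, mother=None, father=None):
    """Return (maternal_allele, paternal_allele) for one SNP, or None for a no call.

    Each genotype is an unordered two-letter string such as "CG".
    Either parent may be omitted by leaving it as None.
    """
    first, second = child[0], child[1]
    if first == second:
        # Homozygous: one copy came from each parent, so no comparison is needed.
        return (first, first)

    # Try both ways of assigning the two alleles to mother and father,
    # keeping only the assignments that the available parents could have passed on.
    candidates = []
    for maternal, paternal in ((first, second), (second, first)):
        if mother is not None and maternal not in mother:
            continue
        if father is not None and paternal not in father:
            continue
        candidates.append((maternal, paternal))

    # Exactly one consistent assignment means the SNP is phased;
    # zero or two means it stays a no call.
    return candidates[0] if len(candidates) == 1 else None
```

For the examples in the text, `phase_snp("CG", mother="GG")` gives the G as maternal and the C as paternal, `phase_snp("CG", mother="CG")` is a no call, and adding `father="CC"` to the latter resolves it.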
[[Category:Genetic_Research]]