A raw data file will typically have four or five columns. The first column records the "rsid." This is a code that the DNA company uses to record which SNP is being looked at. Each SNP has a unique code here that is always "rs" followed by a number. The second column holds the chromosome that the SNP is located on, given as a number from 1 to 26. The X chromosome can be referred to as X or 23, the Y chromosome as Y or 24, the pseudo-autosomal region shared by the X and Y chromosomes as PAR or 25, and the mitochondrial DNA as MT or 26. The third column is a number that records where on the chromosome the SNP is located. The first nitrogenous base on a chromosome is always 1, the next is 2, the next is 3, and so on up into the millions. The higher the number, the farther along the chromosome the SNP is located. Since not all base pairs are tested, you may notice large jumps in the numbers, but within a chromosome the numbers keep increasing until the end of the chromosome is reached. The fourth and fifth columns record the two values you have at each SNP. Some companies combine both values into one column (the four-column template) and some give each value its own column (the five-column template).
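The column layout described above can be sketched as a small parser. This is only an illustration under stated assumptions: it assumes the columns are separated by tabs or spaces, that a four-column file stores the two values as exactly two characters, and that the names `SnpRecord` and `parse_raw_line` are hypothetical helpers, not part of any company's tooling. The sample values in the usage lines are made up and not taken from a real file.

```python
from typing import NamedTuple

# Numeric chromosome codes mentioned in the text, mapped to their letter labels.
# Any other code is passed through unchanged.
CHROMOSOME_LABELS = {"23": "X", "24": "Y", "26": "MT"}


class SnpRecord(NamedTuple):
    """One row of a raw data file (hypothetical structure for illustration)."""
    rsid: str
    chromosome: str
    position: int
    allele1: str
    allele2: str


def parse_raw_line(line: str) -> SnpRecord:
    """Parse one whitespace-separated row in the four- or five-column template."""
    fields = line.split()
    if len(fields) == 4:
        # Four-column template: both values share the last column, e.g. "AG".
        rsid, chrom, pos, genotype = fields
        if len(genotype) != 2:
            raise ValueError(f"expected two values in {genotype!r}")
        allele1, allele2 = genotype[0], genotype[1]
    elif len(fields) == 5:
        # Five-column template: each value has its own column.
        rsid, chrom, pos, allele1, allele2 = fields
    else:
        raise ValueError(f"expected 4 or 5 columns, got {len(fields)}")
    chrom = CHROMOSOME_LABELS.get(chrom, chrom)
    return SnpRecord(rsid, chrom, int(pos), allele1, allele2)


# Usage with made-up rows in each template.
four_col = parse_raw_line("rs12345\t23\t1000\tAG")
five_col = parse_raw_line("rs12345 7 1000 A G")
print(four_col)
print(five_col)
```

Real files often begin with comment or header lines and may use a different separator, so a practical reader would need to skip or handle those first.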
== Nitrogenous Bases (ATCG) ==
DNA chromosomes look like long twisty ladders. The longest chromosome (1) has over 249 million rungs and the smallest (21) has over 48 million. In total there are over 3 billion of these rungs in human DNA. Each rung in the ladder contains a pair of nitrogenous bases drawn from four types: Adenine (A), Thymine (T), Cytosine (C), and Guanine (G). A and T are always paired together, and C and G are always paired together. Although two bases will always be together at each spot, only one of the two values at each spot will do any coding; the other is just a backbone that holds the structure together. The side that does the coding is called the + strand and the side that is the backbone is the - strand. Sometimes in an A T pair, the A will be the coding base and the T will be the backbone; other times it will be the reverse. The same is true for C and G pairs. For simplicity, DNA companies therefore just record the value of a person's + strand at each spot they test.
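Because A always pairs with T and C always pairs with G, the value on the - strand can be worked out from the recorded + strand value. The following is a minimal sketch of that pairing rule; the names `COMPLEMENT` and `minus_strand` are hypothetical and chosen only for this illustration.

```python
# Pairing rule from the text: A pairs with T, and C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def minus_strand(plus_strand_values: str) -> str:
    """Return the paired base for each + strand value, position by position."""
    return "".join(COMPLEMENT[base] for base in plus_strand_values)


# Usage: a + strand reading of A and G pairs with T and C on the - strand.
print(minus_strand("AG"))
```

The function only handles the four letters A, T, C, and G; any other character in the input raises a `KeyError`.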