Exercise: Count Amino Acids
-
Each sequence consists of many repetitions of the 4 bases, represented by the characters A, C, T, and G.
-
There are 64 possible codons (sets of 3 consecutive bases).
-
There are 20 amino acids, each of which is represented by 3 bases (one codon).
-
Some of the amino acids can be encoded in multiple ways, as listed in the codon table. For example, Histidine can be encoded by both CAT and CAC (written CAU and CAC in RNA notation).
-
Create a file called count_amino_acids.py that, given a file with a DNA sequence in it, counts the amino acids encoded by the sequence.
-
Read the sequence from a plain text (.txt) file.
-
You can generate a sequence with a random number generator and save it to that file, but it would be much better if you used a real sequence.
-
An even better way would be to read the sequence from a FASTA file. You can download one from NCBI.
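The input options above could be sketched as follows. This is a minimal sketch, not a reference solution: the function names `random_sequence` and `read_sequence` are hypothetical, and `read_sequence` assumes the file holds a single sequence, treating any line that starts with `>` as a FASTA header to skip.

```python
import random


def random_sequence(length, seed=None):
    """Return a random DNA string of the given length (hypothetical helper)."""
    rng = random.Random(seed)  # a seed makes the output reproducible
    return "".join(rng.choice("ACGT") for _ in range(length))


def read_sequence(path):
    """Read a DNA sequence from a plain text or single-record FASTA file.

    Assumption: lines starting with '>' are FASTA headers and are skipped;
    all remaining non-empty lines are joined into one uppercase string.
    """
    parts = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith(">"):
                parts.append(line.upper())
    return "".join(parts)
```

Because header lines are simply skipped, the same reader works for a plain .txt file and for a FASTA file with one record. A file with several records would be merged into one string, which is probably not what you want, so handle that case separately.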
-
Skeleton:
codon_table = {
'Phe' : ['TTT', 'TTC'],
'Leu' : ['TTA', 'TTG', 'CTT', 'CTC', 'CTA', 'CTG'],
'Ile' : ['ATT', 'ATC', 'ATA'],
'Met' : ['ATG'],
'Val' : ['GTT', 'GTC', 'GTA', 'GTG'],
'Ser' : ['TCT', 'TCC', 'TCA', 'TCG', 'AGT', 'AGC'],
'Pro' : ['CCT', 'CCC', 'CCA', 'CCG'],
'Thr' : ['ACT', 'ACC', 'ACA', 'ACG'],
'Ala' : ['GCT', 'GCC', 'GCA', 'GCG'],
'Tyr' : ['TAT', 'TAC'],
'His' : ['CAT', 'CAC'],
'Gln' : ['CAA', 'CAG'],
'Asn' : ['AAT', 'AAC'],
'Lys' : ['AAA', 'AAG'],
'Asp' : ['GAT', 'GAC'],
'Glu' : ['GAA', 'GAG'],
'Cys' : ['TGT', 'TGC'],
'Trp' : ['TGG'],
'Arg' : ['CGT', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'],
'Gly' : ['GGT', 'GGC', 'GGA', 'GGG'],
'STOP' : ['TAA', 'TAG', 'TGA']
}
- You will want to convert this to a dictionary that maps each codon to an Amino Acid. Do it programmatically!
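One possible way to do the conversion, followed by the counting, is sketched below. The table is deliberately abbreviated to keep the example short; the same code should work unchanged with the full `codon_table` from the skeleton. The names `codon_to_amino` and `count_amino_acids` are hypothetical, and the sketch assumes the sequence is read in frame from the first base, that STOP codons are counted like any other entry, and that unknown codons and trailing leftover bases are ignored.

```python
from collections import Counter

# Abbreviated table for illustration only; use the full skeleton table in practice.
codon_table = {
    'Met': ['ATG'],
    'His': ['CAT', 'CAC'],
    'Trp': ['TGG'],
    'STOP': ['TAA', 'TAG', 'TGA'],
}

# Invert the table: every codon becomes a key that maps to its amino acid.
codon_to_amino = {
    codon: amino
    for amino, codons in codon_table.items()
    for codon in codons
}


def count_amino_acids(sequence):
    """Count amino acids over non-overlapping codons, starting at the first base.

    Assumptions: codons missing from the table are skipped, and 1 or 2
    leftover bases at the end of the sequence are ignored.
    """
    counts = Counter()
    for i in range(0, len(sequence) - 2, 3):
        amino = codon_to_amino.get(sequence[i:i + 3])
        if amino is not None:
            counts[amino] += 1
    return counts


print(count_amino_acids("ATGCATCACTGGTAAGG"))
```

The demo sequence splits into the codons ATG, CAT, CAC, TGG and TAA, with the final two bases left over. Whether to stop counting at the first STOP codon, or to start only at the first ATG, is a design decision the exercise leaves open.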