Exercise: Count Amino Acids

  • Each DNA sequence consists of many repetitions of the 4 bases, represented by the characters A, C, T, and G.

  • There are 64 codons (sets of 3 consecutive bases). In the standard table, 61 of them encode Amino Acids and 3 are STOP codons.

  • There are 20 Amino Acids, and each one is represented by 3 bases (one codon).

  • Some Amino Acids can be encoded in more than one way, as listed in the Codon Table. For example, Histidine can be encoded by both CAU and CAC in RNA notation (CAT and CAC in DNA notation).

  • Create a file called count_amino_acids.py that, given a file with a DNA sequence in it, will count the Amino Acids encoded by the sequence.

  • Read the sequence saved in a txt file.

  • You can generate a sequence with a random number generator and save it to that file, but it would be much better if you used a real sequence.

  • An even better way would be to read the sequence from a FASTA file. You can download one from NCBI.

  • Skeleton:

codon_table = {
    'Phe' : ['TTT', 'TTC'],
    'Leu' : ['TTA', 'TTG', 'CTT', 'CTC', 'CTA', 'CTG'],
    'Ile' : ['ATT', 'ATC', 'ATA'],
    'Met' : ['ATG'],
    'Val' : ['GTT', 'GTC', 'GTA', 'GTG'],
    'Ser' : ['TCT', 'TCC', 'TCA', 'TCG', 'AGT', 'AGC'],
    'Pro' : ['CCT', 'CCC', 'CCA', 'CCG'],
    'Thr' : ['ACT', 'ACC', 'ACA', 'ACG'],
    'Ala' : ['GCT', 'GCC', 'GCA', 'GCG'],
    'Tyr' : ['TAT', 'TAC'],
    'His' : ['CAT', 'CAC'],
    'Gln' : ['CAA', 'CAG'],
    'Asn' : ['AAT', 'AAC'],
    'Lys' : ['AAA', 'AAG'],
    'Asp' : ['GAT', 'GAC'],
    'Glu' : ['GAA', 'GAG'],
    'Cys' : ['TGT', 'TGC'],
    'Trp' : ['TGG'],
    'Arg' : ['CGT', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'],
    'Gly' : ['GGT', 'GGC', 'GGA', 'GGG'],
    'STOP' : ['TAA', 'TAG', 'TGA']
}


  • You will want to convert this to a dictionary that maps each codon to an Amino Acid. Do it programmatically!
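
One possible way to approach the conversion and the counting is sketched below. It is not the reference solution, and it rests on several assumptions that the exercise does not state: the function names (invert_codon_table, read_sequence, count_amino_acids) are hypothetical, codons are read from the first base in non-overlapping steps of 3, a trailing incomplete codon is ignored, codons containing unknown characters (such as N) are skipped, and a FASTA file is assumed to hold a single record.

```python
from collections import Counter


def invert_codon_table(codon_table):
    """Map each codon to its amino acid, e.g. 'CAT' -> 'His'."""
    return {
        codon: amino_acid
        for amino_acid, codons in codon_table.items()
        for codon in codons
    }


def read_sequence(path):
    """Read a DNA sequence from a plain-text or FASTA file.

    Lines starting with '>' (FASTA headers) and empty lines are skipped.
    All other lines are joined into one upper-case string.
    """
    parts = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith(">"):
                parts.append(line.upper())
    return "".join(parts)


def count_amino_acids(sequence, codon_to_amino):
    """Count amino acids, reading non-overlapping codons from position 0."""
    counts = Counter()
    # Stop at len - 2 so that a trailing 1- or 2-base fragment is ignored.
    for i in range(0, len(sequence) - 2, 3):
        codon = sequence[i:i + 3]
        amino_acid = codon_to_amino.get(codon)
        if amino_acid is not None:  # assumption: skip unknown codons, e.g. 'NNN'
            counts[amino_acid] += 1
    return counts
```

With the codon_table from the skeleton, invert_codon_table(codon_table) should produce a dictionary with 64 entries, which is a quick sanity check on the table. The result can then be passed to count_amino_acids together with the string returned by read_sequence. A real gene may not start at the first base of the file, so the fixed reading frame used here is a simplification.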