DNA

1. Generate a random DNA sequence (from https://www.bioinformatics.org/sms2/random_dna.html)

aactcgttggctatgttgcctcgcgttggggactgtgcattacgtcctgtccttcaagta
gacgggcacgcactggtcactttgtaattgcgcagttccgcggcaagtcaacgcccaatt
aaggcgaagtcaagcatgtatgttgtccaccgagtcctgtgagataaagatgagcgtttg
aggaatctagacttcagcgaatcaaccttctctatatcgaccatattaatcgatacgtcc
aaccacgggtacccgatttccgtctgttcaggaagattagactccctgtacacttctatt
accggcgactaacttagatacgcattttgtggaatactttatgagataaggatcaatccg
aggcggatggtttatgccctcagatcccaagagtcaccctaaatgttccggccactaacg
gtgaaatggaccctgtacgcaccatcaagataggatagttttaacattctcaaaagggct
ttcctcaacctgagttctgtgttgctcatccgcagccaggatggtagcctactaggtcac
tttcaagaggggcttttaccgaaatgtggaccttacaaaatatggtacaataaatatcta
ttataggcagtaggtctatcgcatttgagtgccctgtgcagtgcggagtggagccagata
cgccaggaggtgcaaaagttatgtatgcacttcaccgagtcgccgcatcgatgcgagacc
tcatgtagcaaggccgttcggtctgtccgcgtgtggaggccagtagcggacggtgaactg
cccattgaggtgatgcatcaaagggaaacctaatcttgggctctagaaatcacttgcata
ctactatccgactctagtatgctattccgtggtggtcagctagttatttcttctggctga
tacgtacctaagcgttcaacgcagtccaggtggcctgacgacggactcagaaactgaacc
gtatcattcggtcgaagcctaatctccgcacttcgcattctaccaaatgggcctgaaggc
ggtttgacatcagagagaaaatgcgaggataaccggcccagagactcggggtgtctcttt
tttcacgacgtataaccctcccaatagtgtgccgacgatatttctacccacatgtaatga
acaagctttgtagggtttagtggtgcttggttcgtactcaacgtgtaatcattcagagtg
ggccggtttcatgccaaacattaatttacagtgtcggtcgaattcccgtcgggtgtgtgg
tttcctgatttacattttgtgatggcattgtgattgatggctctggagagacgcctagaa
aaggaaactaattctctacggtcttatcatcgttcaatcgtgaggaaaaaggagaggcgt
caaatgtctcccgttctggctcgcggtggcaaattatcagggttcgggagggccagtaag
atcgaagtagtatacgttaatctgcgtggtgggagaccgtaataacgcccggtacagagg
ggtcgctacgacacagagacgcgctcccaagccggagggtaggcatgcaggaattactcc
ggaatttcgcgtgattgccggaaaatctcaagccgcgccttgacagggtaagcgtcattc
ggcacaccaatttttcagctgaccgtgtcgcaccacgcacaggtgtggttaaggatcagg
ctttcagggtcgagcttatcgttgcgaagtatcggcactactggtcttgtcggtctagga
cgattcgttacggtcggcatctatctcatcttacctgcactgacgaccagattaaatttt
agcagataaacggtgttaccatgctttcttcattagttattctccgtgcaggctccgcct
gtgctttacggtattgtgtatgtataccaatgtcatgtttaaattggctctctcacgtga
gtctcaaggattgcattgactgttttcaacacgaatagtttctgagctgtctgttcgagt
ctgataattatcgccccgcgtctaaataccacatcggctaacccacgtagagcgagtcac
gtactaactgagacgtggtgctgagttaaccatttcaccgtatatcactcgatggataag
ttaaccacagcacgcggttcaaaagctacagcttcgcgtaccctttcgaatatcttgcct
tcatcctcttagcgtagtaggtaatcctgccatgaaatttggatctcacgtgagatgccc
gtagccaataagggaaggatatcacggaccttacccggacggatctagatcgagatggat
acaacgggtctttttcaacaactaacgttggttggagatgtgggaaacaagtaagttgta
acggattatgtggtgcagtgagtggttacagaaccctgtagtctatacgtcatgatcgat
ccgttctaaatatgcgtatagatcccagccccccgtcgtgtggtaccagagacggctggt
aaggctggtatggaccaatcacctggggcaacggactcctagtcgtcacaagccccatga
gttggctaggctaagtacgtatttaacaactcctctcgacagtaatgggcaactataggt
ttggacatcttagacttatgatagggcgagcgtgaatcctctggcggccggatatgggga
atatgaaccaagggctgttcagttacaactcaatcccgtattccaggtacaactgaacgc
aaaggttgtgtgcctagtcaccaaggtatctgtctattttgtgcgcacaaaggaaatgat
gtcggctcggattctcgcatacctattaatacagggttttggcaggtaatttccgatgac
actgtgtagaaggtctataaatccacagattttcatcctcaatcgaaggcgtcggcctcc
tagagactgaaacccgccggggcgatgcggatggcgcgctagcagacttctaactgccca
cagattacgccgcatctctgtgagagcgtagacatccgtaccgtgggctctaccgcgtat
cgcggaaactgtgcccgaacgagtagcggctccaggaggttgattcctcagacgctgggc
gcgagcagaagtgcagcattttatctctaacatcaggaggttcgacattgtatgccgaca
gctcattacatgtagcgcgtcaggggcgtcataaagcagtctcgtgctccaaacgtgtca
cttacacaacacccgactttaatgggaaggcagcattctttgaagcagagtcgccttgcc
ggtaccgttctattacatagtatcagaaaggccctcctggtcctcttcatagaagtggga
tctaaaaaattcgtacggcgtatagatgggttgggtctcatcagagcacacacgggagct
ttacctttagttccgtcgggtgaattcggctaactgcttgccgtgtccagcttttagggc
actgaaataggccaccaagctgatataccaaggcaaaaagacccacatttcccgcatcct
ccccctttttgcatataaagcctccgttcaccgcatttccagagaaacaagcactactta
gcgcgcagtcgcagcacgtggagttagtatatgttgtcttgaaaaaacaccgtatatatc
aaagcatagttcggattctcgccagcattgctccctagtgctcaacaccggtcgactcgg
tatccggtgcactctgactgagtgtacgacgcgtgctacaactggtcgctctaattcgcg
cgttgtttagcctttggatatgtaggagcggagctatacggccaaaatcttggaaacgct
tacgtcaggtctgcgagcggtccgggtcgcctccctacgccaagcccaccgatccgactt
tgcgtttggtcatcacagcacacgtagcgcctgtctcacgtgccttcaagccgtctccag
gggggcttatgctgccttattaaccggcggtttggcagagcgagtgtctcggcgggatat
acgagtcttacatcatctacctcggtccatcggacagtatgccagtcattgttcggagcc
aaatccccgcccatgatccaaggaaacccgggagcgggcaaggcccgagcctatctgttg
cgtctctgggcaccgtaacgccaaaaaatctaggtaacgaacatactcggcctaaggggg
ataaccattcatctttggcacataaaaacctttaaatcgaagcccgctggcaccacacga
aacagcccaatcggacctaccaccagcggttgcggaataagatgttacactcaaacctaa
caggaccttaccatccctgtcggcggattaggacgcgacaatctgcccagttgtatccga
attctcaggcgcgccgggagaggatgctcccgacacactgtgttaaaccatcaatttctg
tcatttggttcgtctgtgcttgatgcgtatggtcatcaggatgaagcgaagtcctgcact
cgtctccgttgcttatgcgcttcagtggggtgctcccccgaaaatagaatgtcccagtgg
tgccggctggtccgtctgaccttttcttctttatgctgccgattggccgtgtccatttgc
cccttcttcactacttatccactaggcgtgatggatactgtctacttgcggaccaacacg
tgccaggttggcccgtgttgagtgcaagtaggcgggaaatgcggctggacggattccgca
tgttgcagtcagtcgcaatatacgatttgacctctagatttcgtatcagcatcagtaatt
gcggaggaaaatccaccattggactactaggacccgcgaatcctctaaaatgtggctgag
cgaactaccgcgaacaatgtatgctgggtctttaatggccgctccgcggcaagtaaaaaa
tgccaaatgcctttaagcgcgactctttgggcgaggtcgtgcctgggcgccaccggcacc
ttttgaataattctttcgttcaggccgctaatggcactaaatctgaaattctgaattact
agtatgcccctgccaactcaagagaaaggggtgatactctgcttatagaatcccttgatc
cagagaaagatctggcacgagggttttggcgggtcttggttcgacgtcaaataatagtaa
acgtaacataagatctataacgacgaaagtgcgtaactggcacaggtcggatcctggcac
ttttacgcctaggcatccgctgctgagcctccggggtacaaacagcctaccggcaccgcg
cgggtaaccaagtagattgatcaaaaacctctgacttatcaattgaggctgccagttaag
acccaacagatagacacgtaaattcgtatacccaaaacgtttagtgctcgaaggggcaat
tacgtagcacatctggtacataccttgggcagtcttggagaagatcgaaataagggtaaa
ccttgggggtgcacgttaagtgagtctcagacttttcagcctatatgatgcaagcatagt
cgagcaggtaagccgtcttcacaaattggttatgacgtgctcccatagatccaattgcac
ttatctcataatccagttagacgtctgttagcgaatagagaagcccgctgaatcaccgtt
tctactcgggcctctttcgatcaacttaggatgaagaaaataagtcagtagctgagcata
acgtcaaaatataactcgtaccgtggctgctaatacgtttaagacttgccctggtgtgcg
gtttgtaacttctcctgtggcgtcgatttttgcaatagcgcaacctttcagacatccgtt
gcgtaatgtgttaagcgccgctcggcccgatgatcaagcgtagagcggagtcccccctca
gtagtcgtgctctaaggagaactgccgacaagctagccatcgttgcttccatctactttt
tgatcagagcaccatgaacccacgggaatctttaatgtcgccgcctagtattttggtttg
ctttagacatttgatattccacatgttgcgcagtccgggatatgtcaatggcgcccaaac
acccgctagggggactcgaaatcaaacttctacacgccacgcgcacagcacgatgagtcc
ctacggaatcacggacttgtaccgcccggtgttcagctccctggtaacccaatgatacgc
caggaccccaacctaattgattgatacgccttcgcgcgcatggcgaattgaactcacgac
gaggactagtaccttatagggtcgtggttaattgttcccgacactacctcgaagttgttg
tcacggacagcaaagtcttttccaggtgcccagctactgcagagacatacgaatgtgcag
ttttgcaccgtaatacaagacatcgtgagacgcagtaggatgctcacgtgcacgagatca
gagtacgcattaccaactagcgttgtttattttagcatgttagccacgacctggttgcaa
ttattcgtattactagacctaccacggtcaggaaacggctctacacgaacggtcgcggcg
ctgcgagtcaatctataacgggctcgtccgagaattgtcagctctcccgcgtccggcatc
ggttcaaagtactttttgctacatagggaaagaagcaggccctcttcgttggccatgatg
atagaaaggtaaactggatcgtggactccagccagtggcgacgccatgagaaaagcaaat
tacgcatcgatttgcaggtccggtaacgttccattataagacacaggccgcaagggtatc
cacagccgtcaaactgtaagggaacgatggtcccctaacagactctacgatagcgtatgc
tttgtaatcattttcacaattctccggagcatctccccccgccagcggcccagttcattg
ttctcctccccctctttgctctagctttgtcacgtcgataggcgtaggtccacggaaagt
atgtaaaaaaatcaatccaggaacgtggccccatcgagggtctgtcctctgcggctctgt
gcaattcggatgaataagcgtgccaaaggatcctacgccaggcactagtttgggtatagt
gtaaatccaattgtgcaattccaactagctccatattatgatacatcgttgaggaatctt
cgtgacagaacgacctcagggacctctcagtaatactcaccaaccgacgatctccgctac
aggagactctatcagtggatcgtcgaatgggatgctagcgcgatacattttagccctcgt
aatcacgtcttagtctccctctgacactataggcgtctcttggctcgaggaaaggcaaag
gagtgatctacccttagccacacacaaagaaggggtgcagctacgtgcccgatcgtactc
taggctggtccgcagaattactaaataccacttgaacaatccatcttcatttgattcgtc
gtactgtgtggctaggttggatgtgatccgattctctaaagcaaaagcatttaccttatt
gcgaagacaccctgttgggtatgcttgcgaaaaagtctgcacgtagggctgcttacacga
tccggttcgaggtcgtccactaactcctgaatgacctatccgccatctacgacaggatag
ttggtaaagggggcgtacagatacgaatgatctatataatttagtcatagacacatgccg
acaagccaacacgatcaaccgctatcgaactttagattagggatcactgggtgaaggagt
cgttccggctcgagtacgccctcgcccctttcgcttgcgtccgtatgatggactcatcct
accctccgctacaaggggaagtatattaaacaagcattacgaatagcgctcgtaaccttt
atcattcaacgaaggcttgaagtcttgaacggccctgcgcggtgaggtccgtaacgcttc
aactgcacgcaccagtttgggtgaggagaggctttgtgcgtggtgttcaatagtaagatt
gggcctgtgctgtgtcgcatggacccggggctaaacacggtactcgatctaaagactaaa
cgttgaagccctttcataaccgataacgtagggaattaagttaaaacccctgctcagatg
gtctagccccgtccgggaggtggcggtaagggccatccgggtcgaccacacggggtgctt
cagtttactccccgttacagaaggccctactaacgaatgccgcggagtcataatagtgtg
ttgcataatagcatgatttacttcgcacttcacggcctcagctcatgctcaataccgtag
aagaactttgatcgcaggcgaagatgtgtttgattcgcaaatcgtagtcaattggctgaa
cacgaaatcagactaacagatccgtacatattctagccgtcatagcactgcccttgagat
acgcctatccctcttcacacggtatggcgacgttggttatcactacttcggtaggtagga
gcgacacacggctgcttcgattacattagtagggcggtgcaagcgactgggagttgataa
atttaacgggaccgacccctaagctcgcgggtcctcgaaacacaggttctgtcagagccg
cgatattactaggacccaaacactatactagcacccctcgagagtaggttggggcttggt
tgggcataattcgttctgtcggccagggctacgggactggcgacgccgtgtctaccacgg
tgctgagggcagctgtaacatgcggttgtcggcaactgctctccggaaaaactcacggac
tcttaagtaactggagtatctagcgttcaagtcatttctctctgaaaagtctataccatt
caagattcgcgagtgggagttaatttttcgagggtaactgtatcagtataacagtagacg
gcgttttcttgggttgcattagtccaatgcgaacgtcgagccgggatgtacgtgccacta
ggagcaaacatgtcataatgatccgttgaaccagatgacatgttttatgagaattagtga
cgttgatctgtccgtccggagctcttcgcttcaaatttgttgaaatcaccgtaggctgtg
cgacttaaataaagcaggggtgatgatcatttcatggggatactcagtttgttggacgat
cggggcaaggaacgaggatcggtctcgccggcatcttttcatattctcaatgatggattt
agcctgtcattccatagctacgcggccgactcaataattcagtctcggaccagctagcat
cgcagggtatcagctacagcgttattttcgacccctgcctccttcactcgtcggcaggca
attgtcttatctgaatttaatctctcggcgtttgacctgaggttaagtgagctattgtgg
gtatccactcaggatagcaggatgcgacgcatgcgaaatcacagcttattgcgctgaggg
ccactctcagctcaaagctcctagatacggcgacagctccgtacatagatcaataccacc
ggaccggtgccgtgctgaaaggcgagggtagctctgtaccccagccattcacgagcggat
cgggactcaggcattctcatacacaacaatgtaggcgaaaaaccaatgcacagctcgact
cagagagaagtaaaataatttcggcacgcgaactctctatactccacccacatctgtgct
gccagatagtgtgcctagggcgggtccagatctgcctcttgggtcagcaagtccctggcg
tgctaacatcgcgctcagatcagtgttatgcggactccttctcgaacagacaaggattct
tgtcagacacttaaggacggagtcctaaaagataggggctatgtgggatggagctactta
agcgttttgttatcttattttggtagattctaaccctggcccccggtccactgtaagaat
gcaaccccggggtcacacttagaataatccgcgccgaatgttttgtaagagccagtacga
gccacccaccggacccgggggtatagactaaaacattatcccgaggagcgataacgggac
atgcgtagaagtgagcatttagcccttctggctgcatata

2. Count how many times each character (g, t, c, a) occurs in the sequence.

3. Detect whether each of the following subsequences occurs in the sequence:

  • gagttaaccattt
  • gaaaggggtgatactctgctta
  • ttaagcgccgctcggcccgatgatcaagcgtagagcggagtcccccctcagtagtcgtgctctaaggagaactgccgacaagctagccatcgttgcttccatctactttttgatcagagcaccatgaacccacgggaatctttaatgtcgccgcctagtattttggtttg
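One possible way to approach task 3 is sketched below. The third subsequence is longer than any single line of the sequence, so a match can only be found after the lines have been joined into one string. The helper name find_positions and the two short sample lines are made up for illustration; they are toy data, not taken from the sequence above.

```python
def find_positions(sequence, target):
    """Return the 0-based start index of every occurrence of target.

    Overlapping occurrences are included because the search resumes
    one character after the previous hit.
    """
    positions = []
    start = sequence.find(target)
    while start != -1:
        positions.append(start)
        start = sequence.find(target, start + 1)
    return positions


# Toy data (hypothetical): the target straddles the break between the lines.
sample_lines = ["acgtgagtta\n", "accatttcag\n"]
sample = "".join(line.strip() for line in sample_lines)

hits = find_positions(sample, "gagttaaccattt")
if hits:
    print("found at:", hits)
else:
    print("not found")
```

For a plain yes/no answer, the expression target in sequence is enough; the positions are only needed if the location of each match matters.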

Example code

# read the input text file into a list of lines
with open("random_DNA.txt") as f:
    data = f.readlines()
print(data)

# join the lines into one big string, dropping the trailing newlines
bigString = ""
for line in data:
    bigString += line.rstrip()
print(bigString)

# count how many times "t" occurs in the sequence
target = "t"
count = 0
for c in bigString:
    if c == target:
        count += 1
print(count)
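
The example above counts a single character. A minimal sketch of one way to extend it to all four characters in one pass is given below, using collections.Counter from the standard library. The function name count_bases and the short sample string are assumptions made for illustration; in practice the joined sequence string would be passed in instead.

```python
from collections import Counter


def count_bases(sequence):
    """Count g, t, c and a in sequence, ignoring upper/lower case."""
    counts = Counter(sequence.lower())
    # A Counter returns 0 for characters it has not seen,
    # so a missing base simply shows up with a count of 0.
    return {base: counts[base] for base in "gtca"}


# Toy data (hypothetical), not the sequence from the exercise.
sample = "aacgtT"
for base, n in count_bases(sample).items():
    print(base, n)
```

A quick sanity check is that the four counts add up to the length of the sequence; if they do not, the input contains something other than g, t, c and a (for example leftover whitespace).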