Despite the existence of the Lalign program which could find internal duplications between two strings, I've developed my own algorithm for carrying out a similar function as part of this Rosalind problem.
import distance
b="GACTCCTTTGTTTGCCTTAAATAGATACATATTTACTCTTGACTCTTTTGTTGGCCTTAAATAGATACATATTTGTGCGACTCCACGAGTGATTCGTA"
a="ATGGACTCCTTTGTTTGCCTTAAATAGATACATATTCAACAAGTGTGCACTTAGCCTTGCCGACTCCTTTGTTTGCCTTAAATAGATACATATTTG"
c=0
da=""
db=""
for j in range(32,41):
for i in range(len(a)-j):
c1=0
for k in range(len(b)-j):
if distance.hamming(a[i:i+j], b[k:k+j])<=3:
c1+=1
if c1>c:
c=c1
print c
c=0
da=a
db=b
a=db
b=da
for j in range(32,41):
for i in range(len(a)-j):
c1=0
for k in range(len(b)-j):
if distance.hamming(a[i:i+j], b[k:k+j])<=3:
c1+=1
if c1>c:
c=c1
print c
This code takes two DNA strings that share some short inexact repeat of 32-40 bp that may appear with slight modifications (each repeat differ by ≤3 changes/indels).
As you can see, this code is far from perfect. For one, it involves hardcoding in the sequence (a bad habit of mine). For another thing, anyone human being with at least half a brain will notice that this is essentially the same few lines of code repeated. What could be uglier than seeing the same thing twice? We could easily clean this up:
def func(a, b):
for j in range(32,41):
for i in range(len(a)-j):
c1=0
for k in range(len(b)-j):
if distance.hamming(a[i:i+j], b[k:k+j])<=3:
c1+=1
if c1>c:
c=c1
return c
print func(a, b)
print func(b, a)
Ahh, much better. I wanted to post it as an example of the importance of using functions, even if a script will only be run once or, perhaps, never at all. Now, if only we could do something about that hardcoding habit of mine...
Many of my scripts can be found on my Github page.
I just came across this range of great stats-themed t-shirts from Sassystatistics.com. Surely a must have gift for the stats lecturer in your life. Link to Sassystatistics.com Post written by Christian Jarrett (@psych_writer) for the BPS...
Employment Consultant, Entry Level (Full-Time)- ID 62583 Malcolm & Associates, LLC, Marriotsville, Maryland ...
Case Worker, Entry Level (Full-Time)- Careers4Terps ID 61648 Advocate Support Services Baltimore, Maryland ...
Schedule a Career Assistance Appointment: During these sessions the University Career Center & The President’s Promise staff help UMD undergraduate students plan solutions for career-related concerns while identifying appropriate Center resources,...
A Rockville family is searching for a student to work in-house with their autistic son on Tuesdays, Wednesdays, and Thursdays from 3-7 pm. The student selected will utilize behavioral techniques to teach life skills as well...