I wrote a method to calculate the cosine distance between two arrays:

    from math import sqrt

    def cosine_distance(a, b):
        if len(a) != len(b):
            return False
        numerator = 0
        denoma = 0
        denomb = 0
        for i in range(len(a)):
            numerator += a[i] * b[i]
            denoma += abs(a[i]) ** 2
            denomb += abs(b[i]) ** 2
        result = 1 - numerator / (sqrt(denoma) * sqrt(denomb))
        return result

Running this on a large array can be very slow. Is there an optimized version of this method that would run faster?
Update: I have tried all the suggestions to date, including SciPy. Here is the version to beat, incorporating suggestions from Mike and Steve:

    from math import sqrt

    def cosine_distance(a, b):
        if len(a) != len(b):
            raise ValueError("a and b must be the same length")  # Steve
        numerator = 0
        denoma = 0
        denomb = 0
        for i in range(len(a)):   # Mike's optimizations:
            ai = a[i]             # only calculate once
            bi = b[i]
            numerator += ai * bi  # faster than exponent (barely)
            denoma += ai * ai     # strip abs() since it's squaring
            denomb += bi * bi
        result = 1 - numerator / (sqrt(denoma) * sqrt(denomb))
        return result
If you can use SciPy, you can use cosine from scipy.spatial.distance.
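A minimal sketch of what that call might look like, assuming SciPy is installed. The import is guarded so the snippet still runs without SciPy, and cosine_reference is a hypothetical helper name added here only to cross-check the result against the formula from the question.

```python
from math import sqrt


def cosine_reference(a, b):
    """Pure-Python cosine distance, same formula as in the question."""
    if len(a) != len(b):
        raise ValueError("a and b must be the same length")
    numerator = sum(x * y for x, y in zip(a, b))
    denoma = sum(x * x for x in a)
    denomb = sum(y * y for y in b)
    return 1 - numerator / (sqrt(denoma) * sqrt(denomb))


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
reference = cosine_reference(a, b)
print("pure Python:", reference)

try:
    from scipy.spatial.distance import cosine
except ImportError:
    cosine = None  # SciPy not available; only the reference value is printed

if cosine is not None:
    # cosine(u, v) computes 1 - (u . v) / (||u|| * ||v||)
    print("SciPy:      ", cosine(a, b))
```

Both values should agree up to floating-point rounding; for large inputs it is probably worth passing NumPy arrays rather than lists so that no conversion happens on every call.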
If you can't use SciPy, you could try to get a small speedup by rewriting your Python (EDIT: but this did not work out as I thought it would, see below).

    try:
        from itertools import izip  # Python 2
    except ImportError:
        izip = zip                  # Python 3: the built-in zip is already lazy
    from math import sqrt

    def cosine_distance(a, b):
        if len(a) != len(b):
            raise ValueError("a and b must be the same length")
        numerator = sum(tup[0] * tup[1] for tup in izip(a, b))
        denoma = sum(avalue ** 2 for avalue in a)
        denomb = sum(bvalue ** 2 for bvalue in b)
        result = 1 - numerator / (sqrt(denoma) * sqrt(denomb))
        return result
It is better to raise an exception when the lengths of a and b do not match.
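One way to see why, sketched with hypothetical helper names (distance_or_false and distance_or_raise are not from the question): in Python, False compares equal to 0, so a returned False can be mistaken for a distance of zero, meaning "identical vectors", by any caller that only compares against a threshold.

```python
from math import sqrt


def _cosine_distance(a, b):
    """Shared formula; assumes the lengths have already been checked."""
    numerator = sum(x * y for x, y in zip(a, b))
    denoma = sum(x * x for x in a)
    denomb = sum(y * y for y in b)
    return 1 - numerator / (sqrt(denoma) * sqrt(denomb))


def distance_or_false(a, b):
    # Signals a mismatch with a return value, as in the original question.
    if len(a) != len(b):
        return False
    return _cosine_distance(a, b)


def distance_or_raise(a, b):
    # Signals a mismatch with an exception instead.
    if len(a) != len(b):
        raise ValueError("a and b must be the same length")
    return _cosine_distance(a, b)


mismatch = distance_or_false([1, 2, 3], [1, 2])
# bool is a subclass of int, so False behaves like 0 in this comparison
looks_identical = mismatch < 0.1

try:
    distance_or_raise([1, 2, 3], [1, 2])
    raised = False
except ValueError:
    raised = True

print(looks_identical, raised)
```

With the exception, the mistake surfaces at the call site instead of quietly flowing into later arithmetic.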
By using generator expressions inside the calls to sum(), you can calculate your values with most of the work being done by the C code inside Python. This should be faster than using a for loop.
I have not timed this, so I can't guess how much faster it might be. But the SciPy code is almost certainly written in C or C++, and it should be about as fast as you can get.
If you are doing bioinformatics in Python, you really should be using SciPy anyway.
EDIT: Darius Bacon timed my code and found it slower. So I timed my code and... yes, it is slower. The lesson for everyone: when you are trying to speed things up, don't guess, measure.
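A rough sketch of how such a measurement could be set up with the standard timeit module. The function names cosine_loop, cosine_genexp, and time_both are illustrative, and the list sizes and repeat counts are arbitrary assumptions; absolute numbers will differ between machines and Python versions.

```python
import random
import timeit
from math import sqrt


def cosine_loop(a, b):
    """Explicit for-loop version (the 'version to beat')."""
    numerator = denoma = denomb = 0.0
    for i in range(len(a)):
        ai = a[i]
        bi = b[i]
        numerator += ai * bi
        denoma += ai * ai
        denomb += bi * bi
    return 1 - numerator / (sqrt(denoma) * sqrt(denomb))


def cosine_genexp(a, b):
    """Generator-expression version built on sum()."""
    numerator = sum(x * y for x, y in zip(a, b))
    denoma = sum(x * x for x in a)
    denomb = sum(y * y for y in b)
    return 1 - numerator / (sqrt(denoma) * sqrt(denomb))


def time_both(n, repeats=3, number=100):
    """Return the best total time (seconds) of `number` calls for each version."""
    rng = random.Random(0)  # fixed seed so both versions see the same data
    a = [rng.random() for _ in range(n)]
    b = [rng.random() for _ in range(n)]
    results = {}
    for func in (cosine_loop, cosine_genexp):
        timings = timeit.repeat(lambda: func(a, b), repeat=repeats, number=number)
        results[func.__name__] = min(timings)  # min is the least noisy estimate
    return results


for n in (10, 1000):
    print(n, time_both(n))
```

Timing more than one input size matters here, because the relative ranking of the two versions may change between short and long lists.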
I am baffled as to why my attempt to push more of the work onto the C internals of Python is slower. I tried it for lists of length 1000 and it was still slower.
I can't spend any more time trying to hack the Python cleverly. If you need more speed, I suggest you try SciPy.
EDIT: I just tested by hand, without timeit. I find that for short a and b, the old code is faster; for long a and b, the new code is faster; in both cases the difference is not large. (I am now wondering whether I can trust timeit on my Windows computer; I want to try this test again on Linux.) I would not change working code to try to make it faster. And once more I urge you to try SciPy. :-)