I tried this piece of code:
For i=1 To 1000000
  mystring.s=Str(i)+"'2013-"+mm+"-"+dd+"','"+valoare+"','"+curs+"','"+total+"','"+Str(cont)+"','"+simbolcont+"','Denumire"+Str(i)+"','"+valuta.s+"','"+RSet(Str(i),40,"0")+"','"+total.s+"'"
  id.s=UCase(MD5Fingerprint(@mystring.s,StringByteLength(mystring))+SHA1Fingerprint(@mystring,StringByteLength(mystring)))
Next i
The code above is in PureBasic, but I am more interested in the principle of using this as a unique ID. In 1,000,000 generated strings I did not find any collisions.
Can I use MD5(String)+SHA1(String), which gives a 72-character string, as a unique ID?
Keep in mind that String is the same in both functions and varies in length between 300 and 350 characters.
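A minimal sketch of the same idea in Python, for readers who don't use PureBasic. The function name make_unique_id, the UTF-8 encoding, and the shortened sample record are assumptions for illustration only; PureBasic hashes the string's in-memory bytes, so its digests may differ from the ones produced here.

```python
import hashlib


def make_unique_id(record: str) -> str:
    """Concatenate the MD5 and SHA-1 hex digests of the same string."""
    data = record.encode("utf-8")  # assumption: UTF-8 bytes
    md5_hex = hashlib.md5(data).hexdigest()    # 32 hex characters
    sha1_hex = hashlib.sha1(data).hexdigest()  # 40 hex characters
    return (md5_hex + sha1_hex).upper()        # 72 characters in total


# A smaller run than the original 1,000,000 so it finishes quickly.
# The record layout is a shortened, hypothetical stand-in for the real one.
ids = {
    make_unique_id(f"{i}'2013-01-01','Denumire{i}','{i:040d}'")
    for i in range(100_000)
}
print(len(ids))  # equals the number of inputs when no two IDs collide
```

Collecting the IDs in a set is a cheap way to run the "did anything collide" check: if the set is smaller than the number of inputs, two records produced the same ID.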
Or, the simpler question:
If SHA1 collides for two strings, does the MD5 of the same strings collide too, or vice versa? I'm not a math genius, but I guess the chance of a collision is low.
I cannot use a timestamp-based unique ID here.
Thank you for your time.
To answer my own question, here is a quote from another forum:
If I have two random strings (s1, s2) that are different (s1 != s2), you want to know the probability that md5(s1) == md5(s2) AND sha1(s1) == sha1(s2).
Well, first, for two specific randomly chosen strings, what is the probability that md5(s1) == md5(s2)? The answer is 1/2^128: the first hash is some 128-bit string, and the chance that the second hash equals the first is 1 in 2^128, or about 2.9 x 10^-37 %.
Similarly, P(sha1(s1) == sha1(s2)) = 2^-160 ~ 6.8 x 10^-47 %.
Now the probability that both conditions are true, assuming they are independent conditions (that is, that the hashing functions are fundamentally independent of each other), is found by multiplying the probabilities, since P(X AND Y) = P(X) P(Y). So P(md5(s1) == md5(s2) AND sha1(s1) == sha1(s2)) = 2^-128 x 2^-160 = 2^-288 ~ 2 x 10^-85 %.
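The arithmetic above can be checked with exact fractions. This is only a sketch: the helper name as_percent is made up for illustration, and the multiplication is valid only under the independence assumption stated in the quote.

```python
from fractions import Fraction

p_md5 = Fraction(1, 2**128)   # chance that two given strings share an MD5
p_sha1 = Fraction(1, 2**160)  # chance that two given strings share a SHA-1
p_both = p_md5 * p_sha1       # valid only if the two hashes are independent


def as_percent(p: Fraction) -> str:
    """Format a probability as a percentage with two significant digits."""
    return f"{float(p * 100):.1e}"


for name, p in [("md5", p_md5), ("sha1", p_sha1), ("both", p_both)]:
    print(name, as_percent(p), "%")
# md5 2.9e-37 %
# sha1 6.8e-47 %
# both 2.0e-85 %
```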
Granted, we assumed the hashing functions act independently of each other on the string, which is a fair assumption for md5 and sha1 as hashing functions. But if instead of comparing MD5 and SHA-1 we compared MD5 and a new hashing function that's just MD5 applied to itself 100 times, we would find that whenever md5(s1) == md5(s2), we'd also have md5^100(s1) == md5^100(s2), so the probability of both colliding is the same as the probability of having one collision.
Similarly, if we had a silly "hash" function that was just silly_hash(s) = md5(s) ++ s (where ++ means concatenate), then you could show that if s1 != s2 and md5(s1) == md5(s2) then silly_hash(s1) != silly_hash(s2) -- meaning that you could never have a double collision with md5 and silly_hash.
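Both edge cases can be tried out on a deliberately weak hash, since nobody can brute-force a real MD5 collision this way. In the sketch below, toy_hash (the first 2 bytes of MD5), iterated, and find_collision are hypothetical helpers invented for the demonstration; only silly_hash comes from the quote, and here it is built on toy_hash rather than on full MD5.

```python
import hashlib
from itertools import count


def toy_hash(data: bytes) -> bytes:
    """Hypothetical 16-bit hash: the first 2 bytes of MD5, so collisions are easy to find."""
    return hashlib.md5(data).digest()[:2]


def iterated(data: bytes, rounds: int = 100) -> bytes:
    """toy_hash applied `rounds` times; the result depends only on the first round's output."""
    for _ in range(rounds):
        data = toy_hash(data)
    return data


def silly_hash(data: bytes) -> bytes:
    """Hash followed by the input itself, as in the quote (here with toy_hash)."""
    return toy_hash(data) + data


def find_collision() -> tuple[bytes, bytes]:
    """Brute-force two different inputs with the same toy_hash (at most 65,537 tries)."""
    seen = {}
    for i in count():
        s = str(i).encode()
        h = toy_hash(s)
        if h in seen:
            return seen[h], s
        seen[h] = s


s1, s2 = find_collision()
print(s1 != s2 and toy_hash(s1) == toy_hash(s2))  # True: a genuine collision
print(iterated(s1) == iterated(s2))               # True: the dependent hash collides too
print(silly_hash(s1) == silly_hash(s2))           # False: the appended input always differs
```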
If you take 2 specific strings and compare them, there's a 1 in 2^288 ~ 497323236409786642155382248146820840100456150797347717440463976893159497012533375533056 chance of both matching. Granted, if you generate roughly 2^144 ~ 22300745198530623141535718272648361505980416 strings, there's a good chance that both hashes will match for some pair of them.
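The "good chance" at 2^144 strings comes from the birthday approximation: among n random values of b bits, the chance of at least one collision is roughly 1 - exp(-n(n-1) / 2^(b+1)). A small sketch of that formula follows. The function name collision_probability is an assumption, and the estimate treats the combined 288-bit ID as an ideal random hash on non-adversarial input.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -(n * (n - 1)) / 2 ** (bits + 1)
    return -math.expm1(exponent)  # expm1 keeps precision for tiny probabilities


print(collision_probability(2**144, 288))     # about 0.39
print(collision_probability(3_500_000, 288))  # vanishingly small for a few million records
print(collision_probability(3_500_000, 128))  # MD5 alone is already tiny at this scale
```

Using math.expm1 instead of 1 - math.exp(x) matters here: for a few million records the exponent is so close to zero that the naive form would round to exactly 0.0.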
I tested with 3,500,000 strings and found no match, so it's good enough for me. For the database I use to reach that many records would take 10+ years of input at the current rate (1,400,000 records in 4 years). I also do an ID check along the way, and if needed they can change one character somewhere.
And 22300745198530623141535718272648361505980416? I can't even count that high.
Hope it helps someone. The answer is yes, I can use MD5(s1)+SHA1(s1) as an ID.