I have two 50 KB strings in the variables doc1 and doc2, coming from the database through Entity Framework. I want to compare them to see whether doc1 and doc2 are equal. I can hash both strings and compare the hashes, or I can simply use if (doc1 == doc2).
Is there a third option that's better?
If there is no third option, does anyone have a suggestion (a logical one is good) regarding hashing v. == in terms of optimization, performance and what the IL does in the background? I would imagine that hashing has to scan each string linearly to the end to produce its hash (for both vars). So does ==. Then which one is logically better?
The == operator compares character by character but stops as soon as it finds a mismatch between the two strings (in .NET it also returns immediately when both variables reference the same string or when the lengths differ). In that regard it will perform better, since it likely will not have to scan the whole strings. Even in the worst case, when they are equal, you have not performed any more work than hashing them both.
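To make the comparison concrete, here is a minimal sketch of the two approaches side by side. The class and method names (DocCompare, EqualByOperator, EqualByHash) are made up for illustration, SHA-256 is just one possible hash, and both strings are assumed to be non-null.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not part of any library.
public static class DocCompare
{
    // Ordinal comparison; can stop at the first mismatch.
    public static bool EqualByOperator(string a, string b) => a == b;

    // Always reads both strings to the end before anything is compared.
    // Assumes a and b are non-null.
    public static bool EqualByHash(string a, string b)
    {
        using var sha = SHA256.Create();
        byte[] hashA = sha.ComputeHash(Encoding.UTF8.GetBytes(a));
        byte[] hashB = sha.ComputeHash(Encoding.UTF8.GetBytes(b));
        return hashA.SequenceEqual(hashB);
    }
}
```

Both methods give the same answer for doc1 and doc2 (barring a hash collision), but EqualByHash encodes and hashes every character of both strings on each call, so it should not be expected to beat == when the hashes are computed on the fly.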
If you are really concerned about performance, you could store precomputed hashes of the long strings in the database and thus would not have to look at their content at all (provided a hash collision is not fatal).
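A rough sketch of the precomputed-hash idea, assuming you add a hash column next to the document text and fill it whenever the text is saved. The names DocHash, Compute and SameContent are hypothetical, and SHA-256 with Base64 encoding is only one reasonable choice.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not part of Entity Framework or any library.
public static class DocHash
{
    // Call this once when the document is written and store the result
    // in its own (assumed) column. Assumes doc is non-null.
    public static string Compute(string doc)
    {
        using var sha = SHA256.Create();
        byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(doc));
        return Convert.ToBase64String(hash);
    }

    // Compares two stored hashes; the 50 KB documents are never loaded.
    public static bool SameContent(string storedHash1, string storedHash2) =>
        string.Equals(storedHash1, storedHash2, StringComparison.Ordinal);
}
```

If a collision would be fatal, you can still use the stored hashes as a cheap first check and fall back to doc1 == doc2 only when the hashes match.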