
Locality Sensitive Hashing (LSH) is a technique for scalable, approximate nearest neighbor search. With LSH, a hash is precomputed for each object, and two hashes can then be compared quickly to estimate how similar the underlying objects are. In practice, LSH is used to optimize data processing and analysis. Transportation company Uber, for example, implemented LSH in the infrastructure that handles much of its data to identify trips with overlapping routes and to reduce inconsistencies in GPS data. Trend Micro has been actively researching and publishing reports in this field since 2009. In 2013, we open sourced an implementation of LSH suitable for security solutions: Trend Micro Locality Sensitive Hashing (TLSH).
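To make the idea of "precompute a hash, then compare hashes" concrete, here is a minimal sketch of one well-known LSH scheme, SimHash, applied to overlapping byte n-grams. It is not the TLSH algorithm; the function names, the 4-byte n-gram size, and the 64-bit digest width are assumptions chosen purely for illustration.

```python
import hashlib


def simhash(data: bytes, ngram: int = 4, bits: int = 64) -> int:
    """Toy SimHash digest over overlapping byte n-grams (illustrative, not TLSH)."""
    counts = [0] * bits
    for i in range(max(len(data) - ngram + 1, 1)):
        feature = data[i:i + ngram]
        # Hash each feature to `bits` bits with a stable hash function.
        digest = hashlib.blake2b(feature, digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        # Each feature votes +1 or -1 on every bit position.
        for b in range(bits):
            counts[b] += 1 if (h >> b) & 1 else -1
    # A bit is set in the final digest if its votes sum to a positive value.
    return sum(1 << b for b in range(bits) if counts[b] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits; a smaller value suggests more similar inputs."""
    return bin(a ^ b).count("1")


original = b"The quick brown fox jumps over the lazy dog. " * 50
patched = original.replace(b"lazy", b"hazy", 1)   # small, local change
unrelated = b"Completely different payload with other bytes! " * 50

print(hamming(simhash(original), simhash(patched)))
print(hamming(simhash(original), simhash(unrelated)))
```

Because each digest is computed once per object, comparing two objects afterwards costs only an XOR and a bit count, which is what makes this family of techniques attractive at scale.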
TLSH is an approach to LSH, a kind of fuzzy hashing that can be employed in machine learning extensions of whitelisting. TLSH generates hash values that can be compared with one another for similarity. This helps determine whether a file is safe to run on a system based on its similarity to known, legitimate files. Thousands of hashes of different versions of a single application, for instance, can be sorted through and streamlined for comparison and further analysis. Metadata, such as certificates, can then be used to confirm whether the file is legitimate.
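The whitelisting flow described above can be sketched roughly as follows: find the closest known, legitimate file by digest, then use certificate metadata as a second check. Everything here is hypothetical. The `KnownFile` record, the integer digests, the bit-distance measure, and the threshold of 8 are stand-ins for illustration only; the real TLSH library has its own digest format and distance score.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class KnownFile:
    name: str
    digest: int   # hypothetical 64-bit similarity digest (stand-in for a TLSH digest)
    signer: str   # certificate subject recorded for the known, legitimate file


def bit_distance(a: int, b: int) -> int:
    """Stand-in distance: number of differing bits between two digests."""
    return bin(a ^ b).count("1")


def closest_known(digest: int, known: Sequence[KnownFile],
                  max_distance: int = 8) -> Optional[KnownFile]:
    """Return the nearest known file, or None if nothing is close enough."""
    best = min(known, key=lambda k: bit_distance(digest, k.digest), default=None)
    if best is None or bit_distance(digest, best.digest) > max_distance:
        return None
    return best


def looks_legitimate(digest: int, signer: str, known: Sequence[KnownFile],
                     max_distance: int = 8) -> bool:
    """Similar to a known file AND signed by the same party as that file."""
    match = closest_known(digest, known, max_distance)
    return match is not None and match.signer == signer


known_good = [
    KnownFile("app-1.0", 0b11110000, "Example Corp"),
    KnownFile("app-1.1", 0b11110011, "Example Corp"),
]
print(looks_legitimate(0b11110001, "Example Corp", known_good))
print(looks_legitimate(0b11110001, "Someone Else", known_good))
```

The linear scan keeps the sketch short; with thousands of hashes per application, a real deployment would more likely index or cluster the digests so that each lookup does not have to touch every entry.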