• Trend Micro
  • About TrendLabs Security Intelligence Blog
Search:
  • Home
  • Categories
    • Ransomware
    • Vulnerabilities
    • Exploits
    • Targeted Attacks
    • Deep Web
    • Mobile
    • Internet of Things
    • Malware
    • Bad Sites
    • Spam
    • Botnets
    • Social
    • Open source
Home   »   Malware   »   Malware Classification with ‘Graph Hash,’ Applied to the Orca Cyberespionage Campaign

Malware Classification with ‘Graph Hash,’ Applied to the Orca Cyberespionage Campaign

  • Posted on:September 6, 2019 at 3:31 am
  • Posted in:Malware
  • Author:
    Trend Micro
0

By Chia Ching Fang and Shih-Hao Weng (Threat Researchers)

In malware research, threat hunting and sharing of threat intelligence, such as exchanging indicators of compromise (IoCs) in the form of hashes (e.g., MD5s, SHA256s), are common industry practices and helpful for information security professionals. Researchers, for instance, would typically search for malware samples on VirusTotal using hashes. However, hashes have some characteristics that could limit researchers trying to do file or threat correlation, such as the one-to-one relationship between a file and its hash. To overcome this, other hashing techniques, methodologies, and tools have been proposed, such as ssdeep, sdhash, imphash, and even our own Trend Micro Locality Sensitive Hashing (TLSH) — and they can indeed help researchers find and identify the similarities between binary files. These approaches use binary as a point of view.

Alternatively, researchers use other methodologies, such as using graphs as a point of view, to compare an executable file’s characteristics with others. For example, zynamics’ BinDiff does this by taking and viewing a bigger picture of an executable to see the similarities or differences between two executable files. This approach processes two files at the same time.

Our research, which we’ve named “Graph Hash,” builds on the advantages of these two approaches by calculating the hash of executable files using a graph view, which would help in classifying malware more consistently and efficiently. Our research aims to provide a viable approach to malware classification, which, in turn, can help in the sharing of actionable threat intelligence beyond simple checksums, such as MD5s and secure hash algorithm (SHA) families.

We applied Graph Hash to the Orca cyberespionage campaign to demonstrate this method’s viability.

Graph Hash: Viewing Executables Through Call Graphs

Graph Hash uses a call graph to view executables. A call graph, essentially the executable’s workflow, is generated by creating a vertex graph for each of the executable’s functions and then connecting the vertices of the function based on the existence of a call that cross-references one function to another. Figure 1 is an example of a call graph.


Figure 1. An example of a call graph plotted via Interactive Disassembler Pro (IDA Pro)

As shown in the figure above, executables that have the same workflow will share the same call graph. This, in a nutshell, is how we use a call graph to classify executables. After creating the call graph, we convert it into what we call a “call graph pattern” (CGP). CGP is a binary-based data format presented through its corresponding call graph. With CGP, we can get a checksum, carry out fuzzy hashing, and perform search or similarity analysis.

How Call Graphs Work

A call graph is composed of vertices and edges, where vertices represent an executable’s functions and edges indicate the relationship between vertices (e.g., function A will call function B). For instance, we can take two properties of functions and define the “values” of these functions. The vertex can be added with color to better distinguish call graphs more easily.


Figure 2. An example of a vertex

Let’s take Figure 2 as an example. The value is 16 bits (binary), with 0–7 storing the address block. We can then do the following to acquire the address block:

  1. Divide the executable address information into 16 blocks from top to bottom (this means that there is block 0 to block 15).
  2. Given a start address of a function belonging to block 0, we assign the function’s address block as 0, and so on.

In an 8-15-bit address block that stores functions, there are certain types. Note that the function types are based on BinExport (a sub-project of BinDiff):

  • Type 0 — Regular function with full disassembly and isn’t a library or imported function.
  • Type 1 — A well-known library function.
  • Type 2 — An imported function from a dynamic link library.
  • Type 3 — A thunk function that forwards its work via an unconditional jump.
  • Type 4 — An invalid function.

Furthermore, there is a special function among a call graph’s functions that we named “root,” which is the function that an executable starts with. With these, we could use vertices and edges to describe a call graph.

The next step involves doing a call graph traversal, whose build up is what we call CGP. The traversal algorithm we use is Depth-First Search.

There is one concept that we want to emphasize here, which is the “reuse” characteristic of functions. It means that one function in an executable might be called more than once. A call graph view, for example, will go through this kind of vertex and its descendant more than once. We want to keep this kind of path in CGP, however, in order to reduce duplication, improve efficiency, and avoid having to go through all paths from visited ancestor to descendant. We also keep the visited ancestor in the CGP simply to present the reuse relationship.


Figure 3. An example of simple call graph

Figure 3 is an example of a simple call graph. To further demonstrate, the following is an example of how values were assigned to the vertices, root, and edges:

  • Vertices: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
  • Root Vertex: {5}
  • Edges: {1, 9} {2, 0} {5, 9} {5, 6} {6, 1} {8, 3} {8, 4} {9, 7} {9, 8} {9, 2}

In this example, the graph hashing will be “59783420619” and not “5978342061978342”. With this, we can have a CGP to present the related call graph. With CGP, we could calculate the graph checksum and perform fuzzy hashing. In an MD5 of a CGP, for example, it can be labeled “Graph MD5,” while an ssdeep of CGP can be labeled “Graph ssdeep.”

Applying Graph Hash to the Orca Cyberespionage Campaign

To demonstrate Graph Hash, we applied it to the Orca cyberespionage campaign, which we have been tracking since 2011 (and disclosed in 2017). The table below summarizes our findings based on 322 samples of executables that the group purportedly used against their targeted organizations. We were also able to classify 10 malware families based on their tokens, comprised of the communications protocols or command-and-control (C&C) servers that the malware used. We used Graph Hash to calculate the grouping/similarity rates of samples related to Orca’s attacks. The results are in the table below.

Grouping/Similarity Rate of the Samples Groups
Graph MD5 81% (260/322) 71
Graph ssdeep 85% (274/322) 67
File ssdeep 66%   (211/322) 62
By Malware Handler 100% (322/322) 10

Table 1. Graph Hash grouping of samples related to Orca campaign

Of course, Graph Hash has its limitations. For example, this approach heavily relies on the use of IDA Pro, and that CGP’s ability to recognize malware packer routines is limited. We are, however, still fine-tuning Graph Hash and extending its capabilities to ELF and Mach-O files, as well as frameworks or tools for reverse engineering.

Graph Hash was demonstrated in the Hack in the Box GSEC Conference 2019 during the “What Species of Fish Is This? Malware Classification with Graph Hash” session on August 29, 2019 at InterContinental Singapore.

Learn how to protect Enterprises, Small Businesses, and Home Users from ransomware:
ENTERPRISE »
SMALL BUSINESS»
HOME»
Tags: Graph HashMalware Classification

Featured Stories

  • systemd Vulnerability Leads to Denial of Service on Linux
  • qkG Filecoder: Self-Replicating, Document-Encrypting Ransomware
  • Mitigating CVE-2017-5689, an Intel Management Engine Vulnerability
  • A Closer Look at North Korea’s Internet
  • From Cybercrime to Cyberpropaganda

Security Predictions for 2019

  • Our security predictions for 2019 are based on our experts’ analysis of the progress of current and emerging technologies, user behavior, and market trends, and their impact on the threat landscape. We have categorized them according to the main areas that are likely to be affected, given the sprawling nature of the technological and sociopolitical changes under consideration.
    Read our security predictions for 2019.

Business Process Compromise

  • Attackers are starting to invest in long-term operations that target specific processes enterprises rely on. They scout for vulnerable practices, susceptible systems and operational loopholes that they can leverage or abuse. To learn more, read our Security 101: Business Process Compromise.

Recent Posts

  • Obfuscation Tools Found in the Capesand Exploit Kit Possibly Used in “KurdishCoder” Campaign
  • Mobile Cyberespionage Campaign Distributed Through CallerSpy Mounts Initial Phase of a Targeted Attack
  • Operation ENDTRADE: Finding Multi-Stage Backdoors that TICK
  • Patched GIF Processing Vulnerability CVE-2019-11932 Still Afflicts Multiple Mobile Apps
  • Mac Backdoor Linked to Lazarus Targets Korean Users

Popular Posts

  • Mac Backdoor Linked to Lazarus Targets Korean Users
  • New Magecart Attack Delivered Through Compromised Advertising Supply Chain
  • Fake Photo Beautification Apps on Google Play can Read SMS Verification Code to Trigger Wireless Application Protocol (WAP)/Carrier Billing
  • 49 Disguised Adware Apps With Optimized Evasion Features Found on Google Play
  • Microsoft November 2019 Patch Tuesday Reveals 74 Patches Before Major Windows Update

Stay Updated

  • Home and Home Office
  • |
  • For Business
  • |
  • Security Intelligence
  • |
  • About Trend Micro
  • Asia Pacific Region (APAC): Australia / New Zealand, 中国, 日本, 대한민국, 台灣
  • Latin America Region (LAR): Brasil, México
  • North America Region (NABU): United States, Canada
  • Europe, Middle East, & Africa Region (EMEA): France, Deutschland / Österreich / Schweiz, Italia, Россия, España, United Kingdom / Ireland
  • Privacy Statement
  • Legal Policies
  • Copyright © Trend Micro Incorporated. All rights reserved.