Linux is a family of open-source operating systems (OS) commonly used to run Internet of Things (IoT) devices and web servers. Its prevalence has, as expected, made it a valuable target for cybercriminals who cast wide nets to reach more potential victims.
In the past few years, Linux systems have been hit by attacks involving ransomware, cryptocurrency miners, botnets, and other types of malware. The success of these attacks refutes the old notion that machines and devices running Linux are less likely to be affected by malware.
To develop effective countermeasures, we continually look for ways to analyze malware samples quickly and efficiently so that they can be detected and blocked. One such method involves reverse engineering a file to locate the address of its main() function, which usually contains the code that malware authors write to start malicious routines.
Using GDB to locate the main() function
Locating the address of the main() function is easy when a malware sample is compiled with symbols, which are references added by the compiler to help in the debugging process. GDB, the GNU Project debugger, can put a breakpoint at the beginning of the main() function when its name is passed to the “b” (breakpoint) command.
Figure 1. Breaking at the main() function of a binary compiled with symbols.
However, most of the malware samples we encounter are stripped, i.e., they carry no symbols. In that case, GDB cannot resolve the name main, and the breakpoint command fails, as shown in our analysis of a Linux ELF malware sample below.
Figure 2. Screenshot of GDB failing to find the main() function address on samples without symbols.
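Whether a sample is stripped can also be checked before opening it in GDB. The following Python sketch is our own illustration and is not part of GDB or PEDA. It assumes a 64-bit little-endian ELF image with intact section headers, and the function name has_symtab is hypothetical. It reports whether a full symbol table (a section of type SHT_SYMTAB) is present. A stripped but dynamically linked binary still carries a dynamic symbol table for its imports, which is why only SHT_SYMTAB is checked.

```python
import struct

SHT_SYMTAB = 2  # section type of the full symbol table (.symtab)


def has_symtab(elf: bytes) -> bool:
    """Return True if a 64-bit little-endian ELF image has a .symtab section.

    Hypothetical helper for illustration; it assumes the section header
    table is present and not corrupted.
    """
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF image")
    # ELF64 header: e_shoff at 0x28, e_shentsize at 0x3A, e_shnum at 0x3C
    (e_shoff,) = struct.unpack_from("<Q", elf, 0x28)
    e_shentsize, e_shnum = struct.unpack_from("<HH", elf, 0x3A)
    for index in range(e_shnum):
        # sh_type is the second 4-byte field of each section header
        offset = e_shoff + index * e_shentsize + 4
        (sh_type,) = struct.unpack_from("<I", elf, offset)
        if sh_type == SHT_SYMTAB:
            return True
    return False
```

A result of False suggests that breaking on main by name will fail and that the entry point has to be used instead. Samples with deliberately damaged section headers would need a more defensive parser.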
If a sample has no symbols, another option is to look for its entry point with the “info files” command:
Figure 3. Finding the entry point with the “info files” command.
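The entry point that “info files” reports is stored in the e_entry field of the ELF header, so it can also be read without a debugger. The sketch below is a minimal illustration under the assumption that the first bytes of the file are a well-formed ELF header. The function name elf_entry_point is ours and does not come from GDB or PEDA.

```python
import struct


def elf_entry_point(header: bytes) -> int:
    """Return e_entry from the first bytes of an ELF file.

    Hypothetical helper for illustration; pass at least the first
    32 bytes of the file.
    """
    if len(header) < 0x20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    is_64bit = header[4] == 2                  # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if header[5] == 1 else ">"    # EI_DATA: 1 = little, 2 = big
    # e_entry starts at offset 0x18 in both classes; only its width differs.
    (entry,) = struct.unpack_from(endian + ("Q" if is_64bit else "I"), header, 0x18)
    return entry
```

For a file on disk, the header can be obtained by opening the sample in binary mode and reading its first 64 bytes. For a position-independent executable, the returned value is an offset that the loader relocates at run time, so it will not match the address GDB shows once the program is running.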
We can then put a breakpoint on that address (GDB expects an asterisk before a raw address, as in “b *ADDRESS”), run the program using the “r” command, and inspect its code.
Figure 4. Disassembly of the entry point section.
In the screenshot above, the “x/20i $pc” command tells GDB to disassemble 20 instructions starting at the current program counter (the RIP register in this case). At address 0x401b24, we see the call to the __libc_start_main@plt function. Its prototype is as follows:
int __libc_start_main(int (*main) (int, char * *, char * *), int argc, char * * ubp_av, void (*init) (void), void (*fini) (void), void (*rtld_fini) (void), void (* stack_end));
The first parameter of this function is the address of the main() function that we are looking for. Under the System V AMD64 ABI, the first parameter is passed in the RDI register, and in Figure 4 the instruction at 0x401b1d loads 0x408661 into RDI just before the call. We can then put a breakpoint on that address and continue the execution with the “c” command:
Figure 5. Disassembly of the main() function.
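The manual lookup above can be approximated by pattern matching on the raw bytes at the entry point. The sketch below is our own heuristic and not a disassembler. It assumes x86-64 startup code in which the instruction that loads RDI is immediately followed by the call, as in the sample discussed here, and it recognizes only two encodings: mov with a 32-bit immediate into RDI, and a RIP-relative lea into RDI (an assumption about how position-independent startup code typically looks). The function name main_from_start_code is hypothetical.

```python
import struct
from typing import Optional

MASK64 = 0xFFFFFFFFFFFFFFFF


def main_from_start_code(code: bytes, code_addr: int) -> Optional[int]:
    """Guess the address of main() from x86-64 startup code bytes.

    code      -- bytes starting at the entry point
    code_addr -- virtual address of code[0]
    Returns None when no known pattern is found.
    """
    for i in range(len(code) - 7):
        following = code[i + 7:i + 9]
        # The 7-byte load must be followed by call rel32 (E8)
        # or by an indirect RIP-relative call (FF 15).
        if not (following[:1] == b"\xe8" or following == b"\xff\x15"):
            continue
        prefix = code[i:i + 3]
        if prefix == b"\x48\xc7\xc7":
            # mov $imm32, %rdi (immediate is sign-extended to 64 bits)
            (imm,) = struct.unpack_from("<i", code, i + 3)
            return imm & MASK64
        if prefix == b"\x48\x8d\x3d":
            # lea rel32(%rip), %rdi (relative to the next instruction)
            (rel,) = struct.unpack_from("<i", code, i + 3)
            return (code_addr + i + 7 + rel) & MASK64
    return None
```

Because it matches byte patterns rather than decoding instructions, the sketch can miss startup code that is laid out differently, and it does not cover 32-bit samples, where the arguments are passed on the stack.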
This process is manual and repetitive, and its details vary with the characteristics of the sample. For example, a 32-bit ELF sample would show a different disassembly. The startup code also differs if binaries are linked statically.
Using PEDA to automate processing
To avoid repeating this time-consuming process, we can automate it with a program that enhances GDB. Case in point: PEDA (Python Exploit Development Assistance), a project written by Long Le Dinh in 2012.
One of PEDA’s interesting features is that its “start” command looks for a place to put a temporary breakpoint. PEDA first tries to locate the main() function and has some fallback cases if main() is not found, as shown in this excerpt from its source code:
Figure 6. Source code of the PEDA function to find an initial place to break at.
Unfortunately, PEDA is not capable of doing the __libc_start_main() trick shown in Figure 4. When that symbol is available, it tries to stop inside the __libc_start_main() function, whereas execution should stop at the main() function. But since the project’s source code is open, we can patch it to implement the manual steps we have illustrated here. The result is as follows:
Figure 7. After typing “start”, GDB will stop at the main() function of this malware sample.
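To make the idea concrete, the following Python sketch shows one way the manual steps could be expressed in code. It is not the code of the patched fork. It assumes AT&T-syntax disassembly text of an x86-64 sample, in the form GDB prints for “x/20i $pc”, and it assumes the call target is annotated with the __libc_start_main name. The function name main_from_disassembly is hypothetical.

```python
import re
from typing import Optional

# mov (or movq) of a hexadecimal immediate into %rdi
_MOV_RDI = re.compile(r"\bmov\w*\s+\$0x([0-9a-fA-F]+),\s*%rdi\b")
# call (or callq) whose annotated target mentions __libc_start_main
_CALL_LSM = re.compile(r"\bcall\w*\s+.*__libc_start_main")


def main_from_disassembly(text: str) -> Optional[int]:
    """Return the last immediate loaded into RDI before the call to
    __libc_start_main, or None if the pattern does not appear."""
    candidate = None
    for line in text.splitlines():
        match = _MOV_RDI.search(line)
        if match:
            candidate = int(match.group(1), 16)
        elif _CALL_LSM.search(line):
            return candidate
    return None
```

Inside GDB’s Python environment, the disassembly text can be captured with gdb.execute("x/20i $pc", to_string=True), and the returned address can then be used for a temporary breakpoint with the “tbreak *ADDRESS” command. Samples without a name on the call target, such as stripped static binaries, need a different heuristic.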
The fork, which contains two patches, is available on this GitHub page. It has been tested with both 32-bit and 64-bit binaries, linked both statically and dynamically.
Once the main() function is located, reverse engineering of the malware sample can begin. From there, we can analyze its behavior, its command-and-control (C&C) servers, and other features, all of which are essential for detecting and blocking malware.