Collection of Volatile Data (Linux)
Introduction
During incident response we often find that a compromised system was not powered off by a user or administrator. This is a valuable opportunity to acquire information that is irretrievably lost once the power goes off: running processes, open TCP/UDP ports, images of programs that have been deleted but are still running in main memory, the contents of buffers, queues of connection requests, established connections, and modules loaded into the part of virtual memory reserved for the Linux kernel. All of this data can help the investigator find forensic evidence during offline examination. Moreover, when an incident is still relatively recent we can recover almost all of the data used by an intruder and a record of the activities the intruder performed.

Sometimes the live procedure described here is the only way to acquire incident data, because certain types of malicious code, such as LKM-based rootkits, are loaded only into memory and don't modify any file or directory. A similar situation exists on Windows operating systems. The Code Red worm is a good example: its malicious code was not saved as a file but was inserted into memory and then run directly from it.

On the other hand, the methods presented below have serious limitations and violate a primary requirement of evidence collection for digital investigation, a requirement that cannot easily be fulfilled: every user-space and kernel-space tool used to collect data changes, by its nature, the state of the target system. By running any tool on a live system we load it into memory and create at least one process, which can overwrite possible evidence. When a new process is created, the operating system's memory manager allocates space in main memory and may overwrite unallocated data in main memory or in swap space. Other problems arise when we plan to take legal action and need to comply with local laws.
Signs of intrusion found in images of main memory may be regarded as untrustworthy, because they could have been created by our own acquisition tools. So before taking any action we must decide whether or not to acquire data from the live compromised system. Very often it is worth collecting such information: in a main memory image we can find passwords or decrypted files, and using the /proc pseudo file system we can recover programs that have been deleted but are still allocated in memory.

In an ideal world, I could imagine a hardware-based solution for Intel-based computers that would let us dump the whole memory to an external storage device without the assistance of the operating system. Such a solution exists on Sparc machines, where we can dump the whole physical memory by using the OpenBoot firmware. Unfortunately, no similar solution exists for Intel- or AMD-based computers. Despite the problems above, software-based methods also have advantages for forensic purposes, and I'll try to show them in this paper.

The main goal of this article is to present the methods used during an evidence collection procedure. All collected data can be used later for offline forensic analysis. Some of the presented tasks can also be performed in the preparation and identification phases of the incident response cycle. These are two of the six phases defined in the guide "Incident Handling Step by Step", published by the SANS Institute.
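To illustrate the /proc-based recovery mentioned above, here is a minimal sketch that is not part of the original procedure. It assumes a Linux /proc layout in which the exe link of a process whose program file has been unlinked ends with " (deleted)". The function name list_deleted_exes and its optional directory argument are hypothetical, and on a real compromised host readlink (and the shell itself) would have to come from the trusted, statically linked toolkit.

```sh
# Sketch: list processes whose executable has been deleted from disk.
# Assumption: /proc/PID/exe is a symbolic link whose target ends with
# " (deleted)" once the program file has been unlinked.
# Usage: list_deleted_exes [PROC_ROOT]   (PROC_ROOT defaults to /proc)
list_deleted_exes() {
    proc_root=${1:-/proc}
    for dir in "$proc_root"/[0-9]*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -d "$dir" ] || continue
        # Unreadable links (other users' processes, kernel threads) are skipped.
        target=$(readlink "$dir/exe" 2>/dev/null) || continue
        case $target in
            *' (deleted)')
                # Print "PID target" for every deleted executable.
                printf '%s %s\n' "${dir##*/}" "$target"
                ;;
        esac
    done
}
```

Once a PID is known, the program image can usually still be read through /proc/PID/exe, for example with the trusted cat from the toolkit media, and sent to the remote host.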
2. Forensic Analysis
This article is divided into four related sections:
2.1 Fitting to the environment
2.2 Preparing the forensic toolkit media
2.3 Data collecting from a live system - step by step procedure
2.4 Initial data analysis and keyword searching

Sections 2.1, 2.2, and part of 2.3 will be discussed in this article; the remaining steps and some offline procedures will be discussed next month in part two of this article series.
Try not to run programs that reside on a compromised system. Why? An intruder could have modified system commands (such as netstat) or system libraries (such as libproc), rendering the results unreliable. To fulfill this criterion we have to prepare statically compiled versions of the tools.

Try not to run programs that can modify the meta-data of files and directories.

All results from the investigation must be written to a remote location. To fulfill this criterion we will use a remote host as our destination. The netcat tool will be used to transfer digital data.

Use tools to calculate hash values of the digital data. This provides assurance that the data has not been altered. It is best practice to make sure that data is unaltered and properly saved on the destination host, so we will also compare hash values calculated on both the source and the destination. Sometimes it is impossible to calculate a meaningful hash value on the compromised host; main memory is a good example. When we run md5sum on the /dev/mem device twice in a row, the hash value is different each time. This happens because every time we load the program into memory (and thus create a new process, which needs memory to operate) we change the state of memory. In our procedure we calculate hash values immediately after collection is completed and, when possible, on both the source and the destination host. To maintain the integrity of all results we will use the md5sum tool.

The criterion that our tools must not write data to the memory, or even the swap space, of the compromised system cannot be fulfilled for some steps. This will be discussed in greater detail in section 2.3. For now, let's make sure we have a proper forensic toolkit on removable media, as shown in Table 1.
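The comparison of hash values calculated on the source and on the destination can be sketched as follows. This is only an illustration, under the assumption that both hashes were saved in the usual md5sum output format (hash first, then the file name). The function name same_md5 and the file names in the usage line are hypothetical.

```sh
# Sketch: succeed only if two saved md5sum outputs carry the same hash.
# Only the first field (the hash) is compared, because the file name
# recorded on the source and on the destination will normally differ.
same_md5() {
    src_hash=$(awk '{ print $1; exit }' "$1")
    dst_hash=$(awk '{ print $1; exit }' "$2")
    # An empty hash (missing or empty file) never counts as a match.
    [ -n "$src_hash" ] && [ "$src_hash" = "$dst_hash" ]
}

# Hypothetical usage on the remote host:
# same_md5 fstab_source.md5 fstab_remote.md5 && echo "hashes match"
```

A mismatch does not by itself say which copy changed. It only tells us that the transfer or the stored file needs a closer look.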
Table 1. Forensic toolkit: programs, sources, and method of creation

Program: nc
Source: http://www.atstake.com/research/tools/network_utilities/nc110.tgz
How to build: $ tar zxvf nc110.tgz; make linux
How to verify: file nc or ldd nc

Program: dd, date, cat
Source: http://www.gnu.org/software/fileutils/fileutils.html (added to core utilities), http://www.gnu.org/software/coreutils/
How to build: $ tar zxvf coreutils-5.0.tar.gz; ./configure CC="gcc -static"; make
How to verify: file date cat or ldd date cat

Program: pcat
Source: http://www.porcupine.org/forensics/tct
How to build: $ tar zxvf tct-1.14.tgz; make CC="gcc -static"
How to verify: file pcat or ldd pcat

Program: Hunter.o
Source: http://www.phrack.org/phrack/61/p61-0x03_Linenoise.txt
To make the module more "independent" we have to delete the following lines from the source code:
    #ifdef CONFIG_MODVERSIONS
    #define MODVERSIONS
    #include <linux/modversions.h>
    #endif
Removing MODVERSIONS lets us load this module into other kernels.
How to build: $ gcc -c hunter.c -I/usr/src/linux/include/

Program: insmod
Source: http://www.kernel.org/pub/linux/utils/kernel/modutils/ (for kernel 2.4)
How to build: $ ./configure --enable-insmod_static; make
How to verify: file insmod.static or ldd insmod.static

Program: netstat, arp, route
Source: http://freshmeat.net/projects/net-tools/
How to build: $ bzip2 -d net-tools-1.60.tar.bz2; tar xvf net-tools-1.60.tar; make config; make CC="gcc -static"
How to verify: file netstat arp route or ldd netstat arp route

Program: dmesg
Source: http://ftp.cwi.nl/aeb/util-linux/util-linux-2.12.tar.gz
How to build: $ ./configure; make CC="gcc -static"
How to verify: file dmesg or ldd dmesg
Once all of the above tools have been built successfully, we can copy them to our removable media (such as a CD-RW disc).
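The "How to verify" step from the toolkit table can be automated before the media is burned. The sketch below runs on the build machine, not on the compromised host. It assumes that the file utility describes a static ELF binary with the words "statically linked"; newer toolchains may report other wordings, so treat the check as a hint rather than proof. The function names and the toolkit path in the usage line are hypothetical.

```sh
# Sketch: decide from a `file` description whether a binary is static.
is_static_description() {
    case $1 in
        *'statically linked'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print one status line per tool so that a dynamically linked binary
# is noticed before it is copied to the toolkit media.
check_toolkit() {
    for tool in "$@"; do
        if is_static_description "$(file -L "$tool")"; then
            printf 'OK      %s\n' "$tool"
        else
            printf 'DYNAMIC %s\n' "$tool"
        fi
    done
}

# Hypothetical usage on the build machine:
# check_toolkit ./toolkit/nc ./toolkit/dd ./toolkit/netstat
```

Running ldd on each binary, as the table suggests, remains a useful second opinion.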
File            Meta-data modified by the mount command
[...]           atime
[...]           atime
[...]           atime
[...]           atime
[...]           atime, mtime, ctime
/dev/cdrom      atime
/bin/mount      atime
*We can avoid access to this file by using the "-n" switch.

We can imagine a situation in which an intruder has modified the mount command. When someone tries to run it, instead of mounting the media it might start a special process that removes all evidence from the compromised system. Such a process is called a "deadman switch". But let's assume this is not the case and go back to the process of data collection. I suggest verifying every command that is going to be put on the forensic toolkit media and later used on the compromised system to collect evidence.

We also have to stop and think about potential problems that may be met during the mounting process:

After the media is put into a drive, the Volume Manager process will mount it automatically. Which files and directories will be modified? Are these files listed in Table 1?

Suppose an unknown medium is currently mounted on the compromised system. Then the first task is to unmount it. How should we do that safely? I can suggest two solutions. We can use the untrusted umount command, or we can put a trusted (statically linked) umount command on a floppy disc, use the untrusted mount command to mount the floppy, and then run the trusted umount command. It is a little complicated but effective, and we still use only one untrusted command.

The administrator is logged off or, even worse, the administrator password has been changed by an intruder. When the administrator is logged off we have to log in to the system. What files will be accessed or modified during the login process? How many additional processes will be created? If the administrator password was changed, what other accounts exist on the system? What volatile data can be collected without access to a shell? Open TCP/UDP ports, current connections, what else?

Are there other unpredictable problems?

Photographing the screen of the compromised system is a kind of screenshot, and of course we have to use a digital camera for this task. This is a simple step.
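One way to answer the question of which files a command such as mount touches is to rehearse it on a test system (never on the compromised host) and compare file timestamps before and after. The sketch below assumes GNU stat with the -c option. The function names and the list of candidate files are hypothetical, and on file systems mounted with options such as noatime some access-time updates will simply not appear.

```sh
# Sketch: record atime, mtime and ctime (seconds since the epoch) for
# the given files, one line per file: "atime mtime ctime path".
snapshot_times() {
    stat -c '%X %Y %Z %n' "$@"
}

# Print the paths whose timestamp line differs between two snapshots.
changed_files() {
    awk 'FILENAME == ARGV[1] { before[$0] = 1; next }
         !($0 in before) { sub(/^[0-9]+ [0-9]+ [0-9]+ /, ""); print }' "$1" "$2"
}

# Hypothetical rehearsal on a test system:
# snapshot_times /etc/fstab /dev/cdrom /bin/mount > before
# mount -n /mnt/cdrom
# snapshot_times /etc/fstab /dev/cdrom /bin/mount > after
# changed_files before after
```

Calling stat does not read file contents, so taking the snapshots should not itself update the access times being measured.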
Let's go ahead and mount our media, in this case a CD-ROM with our toolkit.

# mount -n /mnt/cdrom

If the mounting process is successful we can start the most important phase of data collection. Remember, all results generated by trusted commands have to be sent to the remote host. I use the netcat tool and the pipe method to do this. To make clear which tasks are performed on which host, all commands run on the compromised host are prefixed with "(compromised)" and commands run on the remote host are prefixed with "(remote)". Consider the following example. To send the current date of the compromised system to the remote location (the IP address of the remote host in this case is 192.168.1.100), we first open a TCP port on the remote host as follows:

(remote host)# nc -l -p 8888 > date_compromised

Next, on the compromised host we do the following:

(compromised host)# /mnt/cdrom/date | /mnt/cdrom/nc 192.168.1.100 8888 -w 3

To maintain the integrity of the digital evidence we calculate the hash value of the collected file and clearly document every step on paper.

(remote host)# md5sum date_compromised > date_compromised.md5

Sometimes we can generate checksums on the compromised system and send the result to the remote host. Some of the problems this can cause are discussed elsewhere in this article.

(compromised host)# /mnt/cdrom/md5sum /etc/fstab | /mnt/cdrom/nc 192.168.1.100 8888 -w 3
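On the remote host, the receive-then-hash pattern used with netcat above can be wrapped in a small helper. This is a sketch under the assumption that md5sum is available on the remote host. The name record_evidence is hypothetical, and netcat is still started by hand as in the commands shown in this article.

```sh
# Sketch: store whatever arrives on standard input under NAME and
# write its MD5 hash to NAME.md5 immediately after collection.
record_evidence() {
    name=$1
    if [ -z "$name" ]; then
        echo "usage: record_evidence NAME" >&2
        return 2
    fi
    # Never overwrite evidence that has already been collected.
    if [ -e "$name" ]; then
        echo "refusing to overwrite $name" >&2
        return 1
    fi
    cat > "$name" || return 1
    md5sum "$name" > "$name.md5"
}

# Hypothetical usage on the remote host:
# nc -l -p 8888 | record_evidence date_compromised
```

Refusing to overwrite an existing file is a deliberate choice here: a mistyped name should fail loudly instead of silently replacing an earlier result.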
First, we have to collect information from the cache tables, because data placed in these tables has a very short lifetime. I will collect data from the arp and routing tables.
Now we start collecting information about current connections and open TCP/UDP ports. Information about all active raw sockets will be gathered in step eight.

(remote)# nc -l -p port > connections_compromised
(compromised)# /mnt/cdrom/netstat -an | /mnt/cdrom/nc (remote) port
(remote)# md5sum connections_compromised > connections_compromised.md5

We can use the cat command instead of netstat in this case. Information about open ports and current connections is kept in the /proc pseudo file system, in the /proc/net/tcp and /proc/net/udp files. Addresses and ports in those files are represented in hex format. For example, 0100007F:0401 is 127.0.0.1:1025 in decimal. As mentioned before, current connections can also be detected by analyzing recorded traffic.

It is important to note an easy method of detecting a rootkit loaded into kernel memory when one of its tasks is hiding an open port: we scan the compromised host remotely and compare the detected open ports with the output of the netstat command. But this can cause a lot of harm, because we once again change the state of the compromised system. In step seven I will present an alternative method of detecting hidden LKM-based rootkits.
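The hex notation used in /proc/net/tcp can be decoded with a few lines of shell during offline analysis. The sketch below assumes an IPv4 entry as printed on a little-endian machine such as x86, where the four address bytes appear in reverse order. The function name decode_proc_addr is hypothetical.

```sh
# Sketch: convert an "ADDRESS:PORT" pair from /proc/net/tcp or
# /proc/net/udp (IPv4, little-endian host) to dotted decimal notation.
decode_proc_addr() {
    hex_ip=${1%%:*}
    hex_port=${1##*:}
    # Split the eight hex digits of the address into four bytes.
    b1=$(printf '%s' "$hex_ip" | cut -c1-2)
    b2=$(printf '%s' "$hex_ip" | cut -c3-4)
    b3=$(printf '%s' "$hex_ip" | cut -c5-6)
    b4=$(printf '%s' "$hex_ip" | cut -c7-8)
    # The bytes are stored in reverse order; the port is plain hex.
    printf '%d.%d.%d.%d:%d\n' "0x$b4" "0x$b3" "0x$b2" "0x$b1" "0x$hex_port"
}

# decode_proc_addr 0100007F:0401    # prints 127.0.0.1:1025
```

Applied to the local-address and remote-address columns of a saved copy of /proc/net/tcp, this gives roughly the same address information as the output of netstat -an.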
References

Alessandro Rubini, Jonathan Corbet. Linux Device Drivers, 2nd Edition. O'Reilly; 2001.
Dan Farmer, Wietse Venema. Column series for the Doctor Dobb's Journal. http://www.porcupine.org/forensics/column.html
Daniel P. Bovet, Marco Cesati. Understanding the Linux Kernel, 2nd Edition. O'Reilly; 2002.
Kernel source code. http://www.kernel.org
Linux manual pages.
National Institute of Standards and Technology. Computer Security Incident Handling Guide. http://csrc.nist.gov
PHRACK #61. Finding hidden kernel modules (the extrem way) by madsys. http://www.phrack.org
RFC 3227. Guidelines for Evidence Collection and Archiving.
Fred Smith, Rebecca Bace. A Guide to Forensic Testimony. Addison Wesley; 2003.
Symantec Corporation. CodeRed Worm. http://securityresponse.symantec.com
The Honeynet Project. Scan 29. http://www.honeynet.org
The SANS Institute. Incident Handling Step by Step. http://www.sans.org