Parsing Binary File Formats With PowerShell
Matt Graeber @mattifestation
PS> Get-Bio
Security Researcher
Former U.S. Navy Chinese linguist and U.S. Army Red Team member
Alphabet soup of irrelevant certifications
Avid PowerShell enthusiast
Original inspiration: Dave Kennedy and Josh Kelley, "Defcon 18 PowerShell OMFG", Black Hat 2010
Continued motivation from @obscuresec
Fuzzing
You want to generate thousands or millions of malformed files of a certain format in order to stress test programs that open that file format or to find vulnerabilities in them.
Curiosity
You simply want to gain an understanding of how a piece of software interprets a particular file format.
Automation!
Once a parser is written, you can quickly analyze a large number of files and gather statistics on the whole collection. Example: You could parse all known-good files of a given format on a clean system, take a baseline, and use that baseline as a heuristic to determine whether an unknown file is potentially malicious or malformed.
Requirements
A solid understanding of C/C++, .NET, and PowerShell data types is a must!
Windows C/C++ data types are described here:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa383751(v=vs.85).aspx
[Runtime.InteropServices.Marshal]::SizeOf([UInt32]) # 4
DWORD == [System.UInt32]
Per specification, the first two bytes of a DOS header are MZ (0x4D,0x5A).
Trivia: What does MZ stand for?
Nowadays, the only useful field of the DOS header is e_lfanew, the offset to the PE header. The fields of a non-malicious DOS header are relatively consistent.
To see an awesome abuse of the PE file format and DOS header, check out Alexander Sotirov's TinyPE project.
#define IMAGE_VXD_SIGNATURE    0x454C    // LE

typedef struct _IMAGE_DOS_HEADER {
    WORD e_magic;
    WORD e_cblp;
    WORD e_cp;          // Pages in file
    WORD e_crlc;        // Relocations
    WORD e_cparhdr;     // Size of header in paragraphs
    WORD e_minalloc;    // Minimum extra paragraphs needed
    WORD e_maxalloc;    // Maximum extra paragraphs needed
    WORD e_ss;          // Initial (relative) SS value
    WORD e_sp;          // Initial SP value
    WORD e_csum;        // Checksum
    WORD e_ip;          // Initial IP value
    WORD e_cs;          // Initial (relative) CS value
    WORD e_lfarlc;      // File address of relocation table
    WORD e_ovno;        // Overlay number
    WORD e_res[4];      // Reserved words
    WORD e_oemid;       // OEM identifier (for e_oeminfo)
    WORD e_oeminfo;     // OEM information; e_oemid specific
    WORD e_res2[10];    // Reserved words
    LONG e_lfanew;      // File address of new exe header
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
MATTHEW GRAEBER - CREATIVE COMMONS ATTRIBUTION 3.0 UNPORTED LICENSE.
Optional: An enum representation of e_magic since it contains only three possible, mutually-exclusive values.
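One way to sketch such an enum is shown below. It assumes PowerShell 5.0 or later for the enum keyword; the type name DosSignatureSketch and its member names are hypothetical. The MZ and LE values come from the text above, and the NE value is taken from the corresponding winnt.h define.

```powershell
# Hypothetical enum for e_magic; requires PowerShell 5.0+ for the enum keyword.
enum DosSignatureSketch {
    DOS_SIGNATURE = 0x5A4D   # 'MZ'
    OS2_SIGNATURE = 0x454E   # 'NE' (value from winnt.h)
    VXD_SIGNATURE = 0x454C   # 'LE'
}

# e_magic as it would be read from a file (bytes 0x4D, 0x5A in little-endian order).
$Magic = [UInt16] 0x5A4D

# Casting the numeric value yields the symbolic name.
$Signature = [DosSignatureSketch] [int] $Magic
$Signature

# An undefined value can be detected before casting.
$IsKnown = [Enum]::IsDefined([DosSignatureSketch], [int] 0x1234)
$IsKnown
```

Because the three values are mutually exclusive, a plain enum is enough here; no flags-style bit field is needed.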
Again, you can manually validate that these data types match, e.g.
LONG -> System.Int32. A 32-bit signed integer. The range is -2147483648 through 2147483647 decimal.
Min value: [Int32]::MinValue
Max value: [Int32]::MaxValue
Size: [System.Runtime.InteropServices.Marshal]::SizeOf([Int32])
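The same validation can be done for every C type used in the DOS header at once. The following is a minimal sketch; the $TypeMap variable is a hypothetical name, and the mapping covers only WORD, DWORD, and LONG.

```powershell
# Hypothetical lookup table from Windows C types to their .NET equivalents.
$TypeMap = [ordered] @{
    WORD  = [UInt16]
    DWORD = [UInt32]
    LONG  = [Int32]
}

# Emit one row per C type with its size in bytes and its value range.
$Rows = foreach ($Name in $TypeMap.Keys) {
    $NetType = $TypeMap[$Name]
    [PSCustomObject] @{
        CType   = $Name
        NetType = $NetType.FullName
        Size    = [System.Runtime.InteropServices.Marshal]::SizeOf([Type] $NetType)
        Min     = $NetType::MinValue
        Max     = $NetType::MaxValue
    }
}

$Rows | Format-Table -AutoSize
```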
Pure PowerShell: strictly using only the PowerShell scripting language and built-in cmdlets
Pros:
Not complicated and thus easy to implement.
Works in PowerShell on the Surface RT tablet, i.e. PowerShell running in constrained language mode.
Cons:
Very slow when dealing with large, complicated binary files
C# Compilation
Cons:
Doesn't work on the Surface RT tablet. You are restricted from using Add-Type.
Involves calling csc.exe and writing temporary files to disk in order to compile code. This is undesirable if you are trying to maintain a minimal forensic footprint.
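For reference, the C# compilation strategy that these cons describe can be sketched as follows. This is a minimal illustration, not the talk's demo code: the type name DosHeaderSketch is hypothetical, and the sample buffer is synthetic rather than read from a real executable.

```powershell
# C# definition of the DOS header; the fixed-size arrays need MarshalAs so the
# marshaler knows how many WORDs to read for e_res and e_res2.
$Source = @'
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct DosHeaderSketch
{
    public ushort e_magic;
    public ushort e_cblp;
    public ushort e_cp;
    public ushort e_crlc;
    public ushort e_cparhdr;
    public ushort e_minalloc;
    public ushort e_maxalloc;
    public ushort e_ss;
    public ushort e_sp;
    public ushort e_csum;
    public ushort e_ip;
    public ushort e_cs;
    public ushort e_lfarlc;
    public ushort e_ovno;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]
    public ushort[] e_res;
    public ushort e_oemid;
    public ushort e_oeminfo;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 10)]
    public ushort[] e_res2;
    public int e_lfanew;
}
'@
Add-Type -TypeDefinition $Source

# Synthetic 64-byte header: 'MZ' magic and e_lfanew = 0xE8.
$Bytes = New-Object byte[] 64
$Bytes[0] = 0x4D; $Bytes[1] = 0x5A; $Bytes[60] = 0xE8

# Pin the managed array so its address can be handed to the marshaler.
$Handle = [System.Runtime.InteropServices.GCHandle]::Alloc($Bytes, 'Pinned')
try {
    $Header = [System.Runtime.InteropServices.Marshal]::PtrToStructure(
        $Handle.AddrOfPinnedObject(), [Type] [DosHeaderSketch])
} finally {
    $Handle.Free()
}

$HeaderSize = [System.Runtime.InteropServices.Marshal]::SizeOf([Type] [DosHeaderSketch])
$Header
```

The struct's marshaled size should equal the 64 bytes of the C definition, which is a quick sanity check that the field types and array sizes were transcribed correctly.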
Reflection
Cons:
Doesn't work on the Surface RT tablet. You are restricted from using the .NET reflection namespace.
Reflection can be a difficult concept to grasp if you are not comfortable with .NET.
Get-Content -Encoding Byte
[System.IO.File]::ReadAllBytes(string path)
Quickly reads large files
Does not work on the Surface RT tablet
Reads all bytes in a file
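A minimal sketch comparing the two ways of reading bytes is shown below. It writes a small temporary file so that it can run anywhere. One assumption to note: PowerShell 6 and later replaced -Encoding Byte with -AsByteStream, so the sketch picks the parameter based on the running version.

```powershell
# Create a small throwaway file containing the start of a DOS header.
$Path = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllBytes($Path, [byte[]] (0x4D, 0x5A, 0x90, 0x00))

# Cmdlet approach: can read just the first N bytes via -TotalCount.
if ($PSVersionTable.PSVersion.Major -ge 6) {
    $ViaCmdlet = [byte[]] (Get-Content -Path $Path -AsByteStream -TotalCount 2)
} else {
    $ViaCmdlet = [byte[]] (Get-Content -Path $Path -Encoding Byte -TotalCount 2)
}

# .NET approach: always reads the entire file into memory.
$ViaDotNet = [System.IO.File]::ReadAllBytes($Path)

Remove-Item -Path $Path

'Cmdlet read {0} bytes, ReadAllBytes read {1} bytes' -f $ViaCmdlet.Length, $ViaDotNet.Length
```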
Note: many file formats store multi-byte values in little-endian order (least significant byte first), so you must swap the byte order when assembling those values by hand in order to read the proper values.
Helper function to convert bytes into either a UInt16 or an Int32.
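Such a helper might look like the sketch below. The function name Convert-BytesToInteger and its parameters are hypothetical, not the names used in the talk's demo. It assembles the value from little-endian bytes with shifts and bitwise ORs, using only language operators and core type casts.

```powershell
function Convert-BytesToInteger {
    param(
        [Parameter(Mandatory = $true)] [byte[]] $Bytes,
        [int] $Offset = 0,
        [ValidateSet('UInt16', 'Int32')] [string] $Type = 'UInt16'
    )

    if ($Type -eq 'UInt16') {
        # Low byte first, then the high byte shifted into bits 8-15.
        $Value = ([int] $Bytes[$Offset]) -bor (([int] $Bytes[$Offset + 1]) -shl 8)
        return [UInt16] $Value
    }

    # Four bytes, least significant first; the shift by 24 may set the sign bit.
    $Value = ([int] $Bytes[$Offset]) -bor
             (([int] $Bytes[$Offset + 1]) -shl 8) -bor
             (([int] $Bytes[$Offset + 2]) -shl 16) -bor
             (([int] $Bytes[$Offset + 3]) -shl 24)
    return [Int32] $Value
}

# Usage: 'MZ' is stored as 0x4D, 0x5A on disk and reads back as 0x5A4D.
$Magic = Convert-BytesToInteger -Bytes ([byte[]] (0x4D, 0x5A)) -Type UInt16
$Lfanew = Convert-BytesToInteger -Bytes ([byte[]] (0xE8, 0x00, 0x00, 0x00)) -Type Int32
'e_magic = 0x{0:X4}, e_lfanew = {1}' -f $Magic, $Lfanew
```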
A custom object will be formed in this technique since the PowerShell scripting language has no way to define a native .NET type.
You must be aware of the offsets to each field in the DOS header definition.
Demo: Source code analysis and script usage
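The custom-object approach can be sketched as follows. The function name Get-DosHeaderSketch is hypothetical, only a few representative fields are parsed, and the input is a synthetic buffer rather than a real executable. The offsets follow from the structure definition: each WORD is 2 bytes, so e_lfanew lands at offset 60 of the 64-byte header.

```powershell
function Get-DosHeaderSketch {
    param([Parameter(Mandatory = $true)] [byte[]] $Bytes)

    if ($Bytes.Length -lt 64) { throw 'A DOS header occupies 64 bytes.' }

    # Each WORD is assembled from two little-endian bytes with plain arithmetic.
    [PSCustomObject] @{
        e_magic  = [UInt16] ($Bytes[0] + ($Bytes[1] * 256))    # offset 0
        e_cblp   = [UInt16] ($Bytes[2] + ($Bytes[3] * 256))    # offset 2
        e_cp     = [UInt16] ($Bytes[4] + ($Bytes[5] * 256))    # offset 4
        e_oemid  = [UInt16] ($Bytes[36] + ($Bytes[37] * 256))  # offset 36
        # LONG at offset 60: four bytes combined with shifts to keep the sign.
        e_lfanew = [Int32] (([int] $Bytes[60]) -bor
                            (([int] $Bytes[61]) -shl 8) -bor
                            (([int] $Bytes[62]) -shl 16) -bor
                            (([int] $Bytes[63]) -shl 24))
    }
}

# Synthetic header: 'MZ', e_cblp = 0x90, e_lfanew = 0xE8.
$Sample = New-Object byte[] 64
$Sample[0] = 0x4D; $Sample[1] = 0x5A
$Sample[2] = 0x90
$Sample[60] = 0xE8

$Header = Get-DosHeaderSketch -Bytes $Sample
$Header
```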
Reflection allows you to perform code introspection and code assembly. Requires a basic understanding of the .NET architecture.
AppDomain: An execution sandbox for a set of assemblies
Assembly: The dll or exe containing your code
Module: A container for a logical grouping of types. Most assemblies only have a single module.
Type: A class definition
Members: The components that make up a type (Constructor, Method, Event, Field, Property, NestedType)
Once the reflection code has run, a .NET type representing the DOS header will be defined. It is nearly identical to the type created in C# previously.
Demo: Source code analysis and script usage
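A heavily reduced sketch of the reflection technique is shown below. It defines only the first two fields of the DOS header to stay short; the assembly, module, and type names are hypothetical. It uses the static AssemblyBuilder.DefineDynamicAssembly method, which is an assumption made for portability; code from the era of this talk would more typically call DefineDynamicAssembly on the current AppDomain.

```powershell
# Assembly -> Module -> Type -> Fields, all built in memory.
$AssemblyName = New-Object System.Reflection.AssemblyName('DosHeaderSketchAssembly')
$AssemblyBuilder = [System.Reflection.Emit.AssemblyBuilder]::DefineDynamicAssembly(
    $AssemblyName, [System.Reflection.Emit.AssemblyBuilderAccess]::Run)
$ModuleBuilder = $AssemblyBuilder.DefineDynamicModule('DosHeaderSketchModule')

# A sequential-layout value type is the reflection equivalent of a C# struct.
$Attributes = [System.Reflection.TypeAttributes] 'Public, Sealed, SequentialLayout, BeforeFieldInit'
$TypeBuilder = $ModuleBuilder.DefineType('DosHeaderPrefixSketch', $Attributes, [System.ValueType])
[void] $TypeBuilder.DefineField('e_magic', [UInt16], 'Public')
[void] $TypeBuilder.DefineField('e_cblp', [UInt16], 'Public')
$DosHeaderPrefix = $TypeBuilder.CreateType()

# Marshal the first four bytes of a synthetic header into the new type.
$Bytes = [byte[]] (0x4D, 0x5A, 0x90, 0x00)
$Handle = [System.Runtime.InteropServices.GCHandle]::Alloc($Bytes, 'Pinned')
try {
    $Parsed = [System.Runtime.InteropServices.Marshal]::PtrToStructure(
        $Handle.AddrOfPinnedObject(), [Type] $DosHeaderPrefix)
} finally {
    $Handle.Free()
}

$PrefixSize = [System.Runtime.InteropServices.Marshal]::SizeOf([Type] $DosHeaderPrefix)
$Parsed
```

Nothing is written to disk here, which is the main attraction of reflection over C# compilation. A full definition would also need the fixed-size array fields (e_res, e_res2), which require attaching a MarshalAs custom attribute to each field.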
Conclusion
Parsing binary file formats in PowerShell is not a trivial matter. However, once structure is applied to a binary blob and the result is stored in an object, PowerShell really shines.
There are three primary strategies for parsing binary data in PowerShell: pure PowerShell, C# compilation, and reflection. Each strategy has its own pros and cons.
Parsing binary data in PowerShell requires knowledge of C-style structure definitions and data types.
Thanks!
Twitter: @mattifestation
Blog: www.exploit-monday.com
Github: PowerSploit
The word "Rich" to indicate the presence of a Rich signature
The DWORD XOR key used to decode the signature
Let's extend our DOS header parser to parse the Rich signature
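A minimal sketch of locating and decoding the signature is shown below. The function name Get-RichSignatureSketch is hypothetical and the input is a tiny synthetic buffer. It also relies on an assumption not stated above: that the decoded signature begins with the ASCII marker "DanS", which is how the start of the block is recognized when walking backwards from "Rich".

```powershell
function Get-RichSignatureSketch {
    param([Parameter(Mandatory = $true)] [byte[]] $Bytes)

    $Rich = [byte[]] (0x52, 0x69, 0x63, 0x68)   # ASCII 'Rich'
    $DanS = [byte[]] (0x44, 0x61, 0x6E, 0x53)   # ASCII 'DanS' (assumed start marker)

    for ($i = 0; $i -le $Bytes.Length - 8; $i++) {
        # Look for the literal 'Rich' marker at this position.
        $IsRich = $true
        for ($k = 0; $k -lt 4; $k++) {
            if ($Bytes[$i + $k] -ne $Rich[$k]) { $IsRich = $false; break }
        }
        if (-not $IsRich) { continue }

        # The XOR key is the DWORD immediately following 'Rich'.
        $Key = $Bytes[($i + 4)..($i + 7)]

        # Walk backwards one DWORD at a time, decoding until 'DanS' appears.
        for ($j = $i - 4; $j -ge 0; $j -= 4) {
            $IsStart = $true
            for ($k = 0; $k -lt 4; $k++) {
                if (($Bytes[$j + $k] -bxor $Key[$k]) -ne $DanS[$k]) { $IsStart = $false; break }
            }
            if ($IsStart) {
                return [PSCustomObject] @{
                    StartOffset = $j
                    RichOffset  = $i
                    XorKey      = [BitConverter]::ToUInt32($Bytes, $i + 4)
                }
            }
        }
    }
}

# Synthetic sample: padding, encoded 'DanS', one encoded zero DWORD, 'Rich', key.
$Sample = [byte[]] (0, 0, 0, 0, 0, 0, 0, 0,
                    0x55, 0x43, 0x5D, 0x17,
                    0x11, 0x22, 0x33, 0x44,
                    0x52, 0x69, 0x63, 0x68,
                    0x11, 0x22, 0x33, 0x44)
$Result = Get-RichSignatureSketch -Bytes $Sample
$Result
```

The function returns nothing when no "Rich" marker with a matching start marker is found, so callers can test the result for $null.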