diff --git a/ConverterExplanation.md b/ConverterExplanation.md new file mode 100644 index 0000000..34b4362 --- /dev/null +++ b/ConverterExplanation.md @@ -0,0 +1,56 @@ +# Explanation +The Dyld Shared Cache (DSC) is Apple's method of optimizing the loading of system libraries (images). Apple does this by analyzing and combining the images in a way that bypasses much of the usual loading process. This extractor uses several converters that aim to reverse these optimizations so that images can be reverse engineered more easily. The converters are run in the reverse order of the optimizations. + +The goal of this project is not to make runnable files! When the DSC was built, important data was removed. Without this data we cannot completely reverse the optimizations. We could technically try, but failures would be very likely, and it would make the extractor extremely fragile against new caches. + +## slide_info.processSlideInfo() +Dynamic libraries need to be relocatable in memory. Normal libraries use a table of rebase information that locates all the pointers in the file. In the DSC, Apple replaces this information with a linked list of rebase information, where each pointer carries extra bits that locate the next pointer. Unfortunately, this makes pointers look like "0x20XXXXXXXXXX", which breaks most disassemblers. This converter walks down this linked list and restores the pointers to regular plain pointers. Additionally, on arm64e, it removes the pointer authentication bits, which also helps disassemblers. + +## linkedit_optimizer.optimizeLinkedit() +One of these optimizations involves combining the Linkedit of all images into one big Linkedit. While we don't technically need to re-split the Linkedit, doing so allows for faster disassembly and smaller file sizes. This converter is almost a one-to-one copy of Apple's "OptimizerLinkedit", just with the opposite result. + +## stub_fixer.fixStubs() +In the DSC, stubs are bypassed. 
In normal images, stubs generally work like this. + +1. Code in the __text section calls the stub for objc_msgSend. +2. The stub loads and jumps to its symbol pointer, which currently points to a stub helper. +3. The stub helper calls the dyld binder, which changes the symbol pointer to the actual objc_msgSend function and then jumps to objc_msgSend. +4. All future calls to the stub will load and jump to the objc_msgSend function. + +But in the DSC the code is modified to one of the following two cases. + +1. The code jumps to the function directly. +2. The code jumps to one or more "trampoline" stubs, which eventually land on the function. + +To reverse this we need to symbolize each element of the stub process and relink them together. + +## objc_fixer.fixObjC() +A majority of Objective-C structures and data are moved out of the images themselves and put into libobjc's file. We can visit each pointer in classlist, protolist, catlist, etc., to almost recursively pull in all the ObjC data again. Similar to what Apple does, all the data is put into one big segment. + +Also, ObjC uses selectors to call methods. In the DSC all the selectors are combined, and all the instructions that used the original selector reference pointers are changed to load the string directly. This also needs to be reversed, either by relinking the instruction back to the selector pointer or by pointing the load address to a string that's inside the image. + +## macho_offset.optimizeOffsets() +Because the actual segments of the image are split across large distances, the resulting output file would be gigabytes in size, with most of it being unused space. This converter changes the file offsets so that the output file is much smaller. Note that this does not change the VM addresses, as that would break PC-relative instructions and pointers. + +# Contributing +For people that want to contribute to this, here are some links for reference. 
+ +### Objective-C Runtime +* https://opensource.apple.com/source/xnu/xnu-7195.81.3/EXTERNAL_HEADERS/mach-o/loader.h.auto.html +* https://opensource.apple.com/source/objc4/objc4-781/runtime/objc-runtime-new.h.auto.html + +### DYLD Cache +* https://opensource.apple.com/source/dyld/dyld-832.7.3/dyld3/shared-cache/dyld_cache_format.h.auto.html +* https://opensource.apple.com/source/dyld/dyld-832.7.3/dyld3/shared-cache/dsc_extractor.cpp.auto.html + +### Other Extractors +* https://github.com/deepinstinct/dsc_fix/blob/master/dsc_fix.py +* https://github.com/kennytm/Miscellaneous/blob/master/dyld_decache.cpp +* https://github.com/phoenix3200/decache/blob/master/decache.mm + +### Another extractor and a blog about DYLD extraction +* https://worthdoingbadly.com/dscextract/ +* https://github.com/zhuowei/dsc_extractor_badly/blob/master/launch-cache/dsc_extractor.cpp + +### Arm64 Instruction Set +* Search "DDI_0596_ARM_a64_instruction_set_architecture" diff --git a/README.md b/README.md index e1cf00d..97ead3d 100644 --- a/README.md +++ b/README.md @@ -21,61 +21,6 @@ dyldex -e SpringBoard.framework/SpringBoard [dyld_shared_cache_path] # Extracting all frameworks/libraries from a shared cache dyldex_all [dyld_shared_cache_path] -``` - -# Explanation -The Dyld Shared Cache (DSC) is Apple's method of optimizing the loading of system libraries (images). They do this by analyzing and combining the images in a way that it bypasses a lot of processes. 
This extractor uses several convertors that aim to reverse the optimization done so that images can be reverse engineered easier. The order that these convertors are run is in reverse order of the optimization done. - -The goal of this project is not to make runnable files! When the DSC was built, important data was removed. Without this data we cannot completely reverse the optimizations done. We could technically try it, but there would be a very high chance of failures, and would make the extractor extremely fragile against new caches. - -## slide_info.processSlideInfo() -Dynamic libraries need to be moved in memory. On normal libraries they use a table of rebase information that locate all the pointers in the file. In the DSC apple replaces this information with a linked list of rebase information, where each pointer has extra bit information to locate the next pointer. Unfortunately, this makes pointers look like "0x20XXXXXXXXXX" which breaks most disassemblers. This convertor walks down this linked list and restores the pointers to regular plain pointers. Additionally, on arm64e, it removes the pointer authentication bits, which also help disassemblers. - -## linkedit_optimizer.optimizeLinkedit() -One of these optimizations involves combining the Linkedit of all images into one big linkedit. While we don't technically need to re-split the Linkedit, it allows for faster disassembly and smaller file sizes. This convertor is almost a one-to-one copy of Apple's "OptimizerLinkedit", just with the opposite result. - -## Stub_fixer.fixStubs() -In the DSC stubs are bypassed. In normal images, stubs generally work like this. - -1. Code in the __text section calls the stub for objc_msgSend. -2. The stub loads and jumps to its symbol pointer, which currently pointers to a stub helper. -3. The stub helper calls the dyld binder which changes the symbol pointer to the actual objc_msgSend function. And then jumps to objc_msgSend. -4. 
All future calls to the stub will load and jump to the objc_msgSend function. - -But in the DSC the code is modified to either the two following cases. - -1. The code jumps to the function directly. -2. The code jumps to one or more "trampoline" stubs, which eventually lands on the function. - -To reverse this we need to symbolize each element of the stub process and relink them together. +# In any of the above examples, replace "dyldex" and "dyldex_all" with "kextex" and "kextex_all" respectively to extract images from a MH_FILESET kernelcache instead of a DSC -## Objc_fixer.fixObjC() -A majority of Objective-C structures and data are moved out of the images themselves and put into libobjc's file. We can visit each pointer in classlist, protolist, catlist, etc, to almost recursively pull in all the ObjC data again. Similar to what Apple does, all the data is put into one big segment. - -Also, ObjC uses selectors to call on methods. In the DSC all the selectors are combined, and all the instructions that used the original selector reference pointers are changed to just directly load the string. This also needs to be reversed, whether by relinking the instruction back to the selector pointer, or by pointing the load address to a string that's inside the image. - -## Macho_offset.optimizeOffsets() -Because the actual segments of the image are split across large distances, the resulting output file would be gigabytes big, with most of it being unused space. This changes the file offsets so that the output file is much smaller. Note, this does not change the VM Addresses as that would break PC relative instructions and pointers. - -# Contributing -For people that want to contribute to this, here are some links for reference. 
- -### Objective-C Runtime -* https://opensource.apple.com/source/xnu/xnu-7195.81.3/EXTERNAL_HEADERS/mach-o/loader.h.auto.html -* https://opensource.apple.com/source/objc4/objc4-781/runtime/objc-runtime-new.h.auto.html - -### DYLD Cache -* https://opensource.apple.com/source/dyld/dyld-832.7.3/dyld3/shared-cache/dyld_cache_format.h.auto.html -* https://opensource.apple.com/source/dyld/dyld-832.7.3/dyld3/shared-cache/dsc_extractor.cpp.auto.html - -### Other Extractors -* https://github.com/deepinstinct/dsc_fix/blob/master/dsc_fix.py -* https://github.com/kennytm/Miscellaneous/blob/master/dyld_decache.cpp -* https://github.com/phoenix3200/decache/blob/master/decache.mm - -### Another extractor and a blog about DYLD extraction -* https://worthdoingbadly.com/dscextract/ -* https://github.com/zhuowei/dsc_extractor_badly/blob/master/launch-cache/dsc_extractor.cpp - -### Arm64 Instruction Set -* Search "DDI_0596_ARM_a64_instruction_set_architecture" +``` diff --git a/bin/kextex b/bin/kextex new file mode 100755 index 0000000..2698fb4 --- /dev/null +++ b/bin/kextex @@ -0,0 +1,255 @@ +#!/usr/bin/env python3 + +import progressbar +import argparse +import pathlib +import logging +import os +import sys +from typing import List, BinaryIO + +try: + progressbar.streams +except AttributeError: + print("progressbar is installed but progressbar2 required.", file=sys.stderr) + exit(1) + +from DyldExtractor.extraction_context import ExtractionContext +from DyldExtractor.macho.macho_context import MachOContext 
+from DyldExtractor.kc.kc_context import KCContext + +from DyldExtractor.dyld.dyld_structs import ( + dyld_cache_image_info +) + +from DyldExtractor.converter import ( + slide_info, + macho_offset, + linkedit_optimizer, + stub_fixer, + chained_fixups, +) + + +class _DyldExtractorArgs(argparse.Namespace): + + kc_path: pathlib.Path + extract: str + output: pathlib.Path + list_extensions: bool + filter: str + verbosity: int + pass + + +def _getArguments(): + """Get program arguments. + + """ + + parser = argparse.ArgumentParser() + parser.add_argument( + "kc_path", + type=pathlib.Path, + help="A path to the target kernelcache. Only MH_FILESET caches are supported." # noqa + ) + parser.add_argument( + "-e", "--extract", + help="The name of the kext to extract." # noqa + ) + parser.add_argument( + "-o", "--output", + help="Specify the output path for the extracted kext. By default it extracts to the binaries folder." # noqa + ) + parser.add_argument( + "-l", "--list-extensions", action="store_true", + help="List all extensions in the cache." + ) + parser.add_argument( + "-f", "--filter", + help="Filter out extensions when listing them." + ) + parser.add_argument( + "-a", "--addresses", action="store_true", + help="List addresses along with extension paths. Only applies when --list-extensions is specified." + ) + parser.add_argument( + "-b", "--basenames", action="store_true", + help="Print only the basenames of each extension. Only applies when --list-extensions is specified." + ) + parser.add_argument( + "--lookup", + help="Find the library that an address lives in. E.g. kextex --lookup 0xfffffff009bbe250 kernelcache.release.iPhone14,6." + ) + parser.add_argument( + "-v", "--verbosity", type=int, choices=[0, 1, 2, 3], default=1, + help="Increase verbosity, Option 1 is the default. 
| 0 = None | 1 = Critical Error and Warnings | 2 = 1 + Info | 3 = 2 + debug |" # noqa + ) + + return parser.parse_args(namespace=_DyldExtractorArgs()) + + +def _extractImage( + dyldFilePath: pathlib.Path, + dyldCtx: KCContext, + image: dyld_cache_image_info, + outputPath: str +) -> None: + """Extract an image and save it. + + The order of converters is essentially a reverse of Apple's AppCacheBuilder + """ + + logger = logging.getLogger() + + statusBar = progressbar.ProgressBar( + prefix="{variables.unit} >> {variables.status} :: [", + variables={"unit": "--", "status": "--"}, + widgets=[progressbar.widgets.AnimatedMarker(), "]"], + redirect_stdout=True + ) + + # get a writable copy of the MachOContext + machoOffset, context = dyldCtx.convertAddr(image.address) + machoCtx = MachOContext(context.fileObject, machoOffset, True) + + extractionCtx = ExtractionContext(dyldCtx, machoCtx, statusBar, logger) + + #slide_info.processSlideInfo(extractionCtx) + #linkedit_optimizer.optimizeLinkedit(extractionCtx) + chained_fixups.fixChainedPointers(extractionCtx) + stub_fixer.fixStubs(extractionCtx) + + writeProcedures = macho_offset.optimizeOffsets(extractionCtx) + + # Write the MachO file + with open(outputPath, "wb") as outFile: + statusBar.update(unit="Extractor", status="Writing file") + + for procedure in writeProcedures: + outFile.seek(procedure.writeOffset) + outFile.write( + procedure.fileCtx.getBytes(procedure.readOffset, procedure.size) + ) + pass + pass + + statusBar.update(unit="Extractor", status="Done") + pass + + +def _filterImages(imagePaths: List[str], filterTerm: str): + filteredPaths = [] + filterTerm = filterTerm.lower() + + for path in imagePaths: + if filterTerm in path.lower(): + filteredPaths.append(path) + + return sorted(filteredPaths, key=len) + + +def main(): + args = _getArguments() + + # Configure Logging + level = logging.WARNING # default option + + if args.verbosity == 0: + # Set the log level so high that it doesn't do anything + level = 100 + 
elif args.verbosity == 2: + level = logging.INFO + elif args.verbosity == 3: + level = logging.DEBUG + + # needed for logging compatibility + progressbar.streams.wrap_stderr() # type:ignore + + logging.basicConfig( + format="{asctime}:{msecs:3.0f} [{levelname:^9}] {filename}:{lineno:d} : {message}", # noqa + datefmt="%H:%M:%S", + style="{", + level=level + ) + + with open(args.kc_path, "rb") as f: + dyldCtx = KCContext(f) + + # enumerate images, create a map of paths and images + imageMap = {} + for imageData in dyldCtx.images: + path = dyldCtx.readString(imageData.pathFileOffset) + path = path[0:-1] # remove null terminator + path = path.decode("utf-8") + + imageMap[path] = imageData + + # Find the image that an address lives in + if args.lookup: + lookupAddr = int(args.lookup, 0) + + imagePaths = imageMap.keys() + + # sort the paths so they're in VM address order + sortedPaths = sorted(imagePaths, key=lambda path: imageMap[path].address) + + previousImagePath = None + for path in sortedPaths: + imageAddr = imageMap[path].address + if lookupAddr < imageAddr: + if previousImagePath is None: + print("Error: address before first image!", file=sys.stderr) + sys.exit(1) + print(os.path.basename(previousImagePath) if args.basenames else previousImagePath) + return + else: + previousImagePath = path + # We got to the end of the list, must be the last image + path = sortedPaths[-1] + print(os.path.basename(path) if args.basenames else path) + return + + # list images option + if args.list_extensions: + imagePaths = imageMap.keys() + + # filter if needed + if args.filter: + filterTerm = args.filter.strip().lower() + imagePaths = set(_filterImages(imagePaths, filterTerm)) + + # sort the paths so they're displayed in VM address order + sortedPaths = sorted(imagePaths, key=lambda path: imageMap[path].address) + + print("Listing Images\n--------------") + for fullpath in sortedPaths: + path = os.path.basename(fullpath) if args.basenames else fullpath + if args.addresses: + 
print(f"{hex(imageMap[fullpath].address)} : {path}") + else: + print(path) + + return + + # extract image option + if args.extract: + extractionTarget = args.extract.strip() + targetPaths = _filterImages(imageMap.keys(), extractionTarget) + if len(targetPaths) == 0: + print(f"Unable to find image \"{extractionTarget}\"") + return + + outputPath = args.output + if outputPath is None: + outputPath = pathlib.Path("binaries/" + extractionTarget) + os.makedirs(outputPath.parent, exist_ok=True) + + print(f"Extracting {targetPaths[0]}") + _extractImage(args.kc_path, dyldCtx, imageMap[targetPaths[0]], outputPath) + return + + +if "__main__" == __name__: + main() + pass diff --git a/bin/kextex_all b/bin/kextex_all new file mode 100755 index 0000000..7425393 --- /dev/null +++ b/bin/kextex_all @@ -0,0 +1,273 @@ +#!/usr/bin/env python3 + +import argparse +import errno +import io +import logging +import multiprocessing +import pathlib +import signal +import sys +import progressbar + +from typing import ( + List, + BinaryIO, + Tuple +) + +from DyldExtractor.converter import ( + linkedit_optimizer, + macho_offset, + objc_fixer, + slide_info, + stub_fixer, + chained_fixups, +) + +from DyldExtractor.kc.kc_context import KCContext +from DyldExtractor.extraction_context import ExtractionContext +from DyldExtractor.macho.macho_context import MachOContext + +# check dependencies +try: + assert sys.version_info >= (3, 9, 5) +except AssertionError: + print("Python 3.9.5 or greater is required", file=sys.stderr) + exit(1) + +try: + progressbar.streams +except AttributeError: + print("progressbar is installed but progressbar2 required.", file=sys.stderr) + exit(1) + + +class _DyldExtractorArgs(argparse.Namespace): + + kc_path: pathlib.Path + output: pathlib.Path + jobs: int + verbosity: int + pass + + +def _createArgParser() -> argparse.ArgumentParser: + parser = argparse.ArgumentParser(description="Extract all images from a Kernelcache.") # noqa + parser.add_argument( + "kc_path", + 
type=pathlib.Path, + help="A path to the target kernelcache." + ) + parser.add_argument( + "-o", "--output", + type=pathlib.Path, + help="Specify the output path for the extracted extensions. By default it extracts to './binaries/'." # noqa + ) + parser.add_argument( + "-j", "--jobs", type=int, default=multiprocessing.cpu_count(), + help="Number of jobs to run simultaneously." # noqa + ) + parser.add_argument( + "-v", "--verbosity", + choices=[0, 1, 2, 3], + default=1, + type=int, + help="Increase verbosity, Option 1 is the default. | 0 = None | 1 = Critical Error and Warnings | 2 = 1 + Info | 3 = 2 + debug |" # noqa + ) + + return parser + + +class _DummyProgressBar(): + def update(*args, **kwargs): + pass + pass + + +def _workerInitializer(): + """ + Ignore KeyboardInterrupt in workers so that the main process + can receive it and stop everything. + """ + signal.signal(signal.SIGINT, signal.SIG_IGN) + pass + + +def _extractImage( + dyldPath: pathlib.Path, + outputDir: pathlib.Path, + imageIndex: int, + imagePath: str, + loggingLevel: int +) -> str: + # change imagePath to a relative path + if imagePath[0] == "/": + imagePath = imagePath[1:] + pass + + outputPath = outputDir / imagePath + + # setup logging + logger = logging.getLogger(f"Worker: {outputPath}") + + loggingStream = io.StringIO() + handler = logging.StreamHandler(loggingStream) + formatter = logging.Formatter( + fmt="{asctime}:{msecs:03.0f} [{levelname:^9}] {filename}:{lineno:d} : {message}", # noqa + datefmt="%H:%M:%S", + style="{", + ) + + handler.setFormatter(formatter) + logger.addHandler(handler) + logger.setLevel(loggingLevel) + + # Process the image + with open(dyldPath, "rb") as f: + try: + dyldCtx = KCContext(f) + + machoOffset, context = dyldCtx.convertAddr( + dyldCtx.images[imageIndex].address + ) + machoCtx = MachOContext(context.fileObject, machoOffset, True) + + extractionCtx = ExtractionContext( + dyldCtx, + machoCtx, + _DummyProgressBar(), + logger + ) + + 
#slide_info.processSlideInfo(extractionCtx) + #linkedit_optimizer.optimizeLinkedit(extractionCtx) + chained_fixups.fixChainedPointers(extractionCtx) + stub_fixer.fixStubs(extractionCtx) + + writeProcedures = macho_offset.optimizeOffsets(extractionCtx) + + # write the file + outputPath.parent.mkdir(parents=True, exist_ok=True) + with open(outputPath, "wb") as outFile: + for procedure in writeProcedures: + outFile.seek(procedure.writeOffset) + outFile.write( + procedure.fileCtx.getBytes(procedure.readOffset, procedure.size) + ) + pass + pass + pass + + except OSError as e: + if e.errno == errno.EMFILE: + logger.error("Too many files open, you may need to increase your FD limit.") # noqa + else: + raise e + + except Exception as e: + logger.exception(e) + pass + pass + + handler.close() + loggingStream.flush() + loggingOutput = loggingStream.getvalue() + loggingStream.close() + return loggingOutput + + +def _main() -> None: + argParser = _createArgParser() + args = argParser.parse_args(namespace=_DyldExtractorArgs()) + + # Make the output dir + if args.output is None: + outputDir = pathlib.Path("binaries") + pass + else: + outputDir = pathlib.Path(args.output) + pass + + outputDir.mkdir(parents=True, exist_ok=True) + + if args.verbosity == 0: + # Set the log level so high that it doesn't do anything + loggingLevel = 100 + elif args.verbosity == 1: + loggingLevel = logging.WARNING + elif args.verbosity == 2: + loggingLevel = logging.INFO + elif args.verbosity == 3: + loggingLevel = logging.DEBUG + + # create a list of image paths + imagePaths: List[str] = [] + with open(args.kc_path, "rb") as f: + dyldCtx = KCContext(f) + + for image in dyldCtx.images: + imagePath = dyldCtx.readString( + image.pathFileOffset + )[0:-1].decode("utf-8") + imagePaths.append(imagePath) + pass + pass + + with multiprocessing.Pool(args.jobs, initializer=_workerInitializer) as pool: + # Create a job for each image + jobs: List[Tuple[str, multiprocessing.pool.AsyncResult]] = [] + jobsComplete = 
0 + for i, imagePath in enumerate(imagePaths): + # The index should correspond with its index in the DSC + extractionArgs = (args.kc_path, outputDir, i, imagePath, loggingLevel) + jobs.append((imagePath, pool.apply_async(_extractImage, extractionArgs))) + pass + + # setup a progress bar + progressBar = progressbar.ProgressBar( + max_value=len(jobs), + redirect_stdout=True + ) + + # Record potential logging output for each job + jobOutputs: List[str] = [] + + # wait for all jobs + while len(jobs): + for i in reversed(range(len(jobs))): + imagePath, job = jobs[i] + if job.ready(): + jobs.pop(i) + + imageName = imagePath.split("/")[-1] + print(f"Processed: {imageName}") + + jobOutput = job.get() + if jobOutput: + summary = f"----- {imageName} -----\n{jobOutput}--------------------\n" + jobOutputs.append(summary) + print(summary) + pass + + jobsComplete += 1 + progressBar.update(jobsComplete) + pass + pass + pass + + # close the pool and cleanup + pool.close() + pool.join() + progressBar.update(jobsComplete, force=True) + + # reprint any job output + print("\n\n----- Summary -----") + print("".join(jobOutputs)) + print("-------------------\n") + pass + pass + + +if __name__ == "__main__": + _main() diff --git a/setup.py b/setup.py index 10f095e..f339058 100644 --- a/setup.py +++ b/setup.py @@ -2,7 +2,7 @@ setup( name='dyldextractor', - version='2.1.3', + version='2.2.0', description='Extract Binaries from Apple\'s Dyld Shared Cache', long_description='file: README.md', long_description_content_type='text/markdown', diff --git a/src/DyldExtractor/cache_context.py b/src/DyldExtractor/cache_context.py new file mode 100644 index 0000000..85d91f9 --- /dev/null +++ b/src/DyldExtractor/cache_context.py @@ -0,0 +1,45 @@ +import pathlib +from typing import ( + List, + Tuple, + BinaryIO +) + +from DyldExtractor.file_context import FileContext +from DyldExtractor.dyld.dyld_structs import ( + dyld_cache_header, + dyld_cache_mapping_info, + dyld_cache_image_info, + 
dyld_subcache_entry, + dyld_subcache_entry2, +) + + +class CacheContext(FileContext): + + def __init__(self, fileObject: BinaryIO, copyMode: bool = False) -> None: + super().__init__(fileObject, copyMode=copyMode) + + def convertAddr(self, vmaddr: int) -> Tuple[int, "CacheContext"]: + """Convert a vmaddr to its file offset + + Returns: + The file offset and the CacheContext, but if not found, `None`. + """ + + for mapping, ctx in self.mappings: + lowBound = mapping.address + highBound = mapping.address + mapping.size + + if vmaddr >= lowBound and vmaddr < highBound: + mappingOff = vmaddr - lowBound + return mapping.fileOffset + mappingOff, ctx + + # didn't find the address in any mappings... + return None + + def hasSubCaches(self) -> bool: + return False + + def isFileset(self) -> bool: + return False diff --git a/src/DyldExtractor/converter/chained_fixups.py b/src/DyldExtractor/converter/chained_fixups.py new file mode 100644 index 0000000..d6fa400 --- /dev/null +++ b/src/DyldExtractor/converter/chained_fixups.py @@ -0,0 +1,101 @@ +import struct +from dataclasses import dataclass +from typing import ( + Type, + TypeVar, + Union, + Tuple, + List +) + +from DyldExtractor.extraction_context import ExtractionContext +from DyldExtractor.macho.macho_context import MachOContext + +from DyldExtractor.macho.macho_structs import ( + LoadCommands, + linkedit_data_command, +) +from DyldExtractor.macho.fixup_chains_structs import ( + dyld_chained_fixups_header, + dyld_chained_starts_in_image, + dyld_chained_starts_in_segment, + ChainedPtrStart, + ChainedFixupPointerOnDisk, + PointerFormat, +) + +class _PointerFixer(object): + def __init__(self, extractionCtx: ExtractionContext) -> None: + super().__init__() + + self.extractionCtx = extractionCtx + self.machoCtx = extractionCtx.machoCtx + self.dyldCtx = extractionCtx.dyldCtx + self.statusBar = extractionCtx.statusBar + self.logger = extractionCtx.logger + self.context = extractionCtx.dyldCtx._machoCtx + + def 
fixChainedPointers(self) -> None: + self.statusBar.update(unit="Chained Pointers") + + fixupsCmd = self.context.getLoadCommand((LoadCommands.LC_DYLD_CHAINED_FIXUPS,)) + if not fixupsCmd: + self.logger.warning("No LC_DYLD_CHAINED_FIXUPS found in mach-o.") + return + + chainsHeader = dyld_chained_fixups_header(self.context.file, fixupsCmd.dataoff) + if chainsHeader.fixups_version != 0: + self.logger.error("Unrecognised dyld_chained_fixups version.") + return + + startsOffset = chainsHeader._fileOff_ + chainsHeader.starts_offset + startsInfo = dyld_chained_starts_in_image(self.context.file, startsOffset) + + seg_info_offsets = self.context.readFormat("<" + "I" * startsInfo.seg_count, startsOffset + 4) + for segInfoOffset in seg_info_offsets: + if segInfoOffset == 0: + continue + segInfo = dyld_chained_starts_in_segment(self.context.file, startsOffset + segInfoOffset) + self.fixChainedPointersInSegment(segInfo) + + def fixChainedPointersInSegment(self, segInfo: dyld_chained_starts_in_segment) -> None: + page_start_off = 22 + for pageIndex in range(segInfo.page_count): + offsetInPage = self.context.readFormat(" None: + pointer_format = segInfo.pointer_format + if pointer_format != PointerFormat.DYLD_CHAINED_PTR_64_KERNEL_CACHE: + self.logger.error(f"Unsupported chain pointer_format: {pointer_format}") + return + + stride = 4 + chainEnd = False + + while not chainEnd: + chain = ChainedFixupPointerOnDisk(self.context.file, chainOff) + self.fixPointer(chain, segInfo) + + if chain._next == 0: + chainEnd = True + else: + chainOff += chain._next * stride + + def fixPointer(self, chain: ChainedFixupPointerOnDisk, segInfo: dyld_chained_starts_in_segment) -> None: + self.statusBar.update(status="Fixing Pointers") + fixedPointer = self.context.segments[b"__TEXT"].seg.vmaddr + chain.target + self.machoCtx.writeBytes(chain._fileOff_, struct.pack(" None: + fixer = _PointerFixer(extractionCtx) + fixer.fixChainedPointers() diff --git a/src/DyldExtractor/converter/slide_info.py 
b/src/DyldExtractor/converter/slide_info.py index 25b7c42..f37c0f4 100644 --- a/src/DyldExtractor/converter/slide_info.py +++ b/src/DyldExtractor/converter/slide_info.py @@ -330,6 +330,14 @@ def _getMappingInfo( class PointerSlider(object): + def __new__(cls, extractionCtx: ExtractionContext) -> object: + if extractionCtx.dyldCtx.isFileset(): + slider = KCPointerSlider.__new__(KCPointerSlider) + slider.__init__(extractionCtx) + return slider + + return super().__new__(cls, extractionCtx) + def __init__(self, extractionCtx: ExtractionContext) -> None: """Provides a way to slide individual pointers. """ @@ -414,6 +422,20 @@ def slideStruct( return structData +class KCPointerSlider(object): + def __init__(self, extractionCtx: ExtractionContext) -> None: + """Provides a way to slide individual pointers in kernelcaches. + """ + self._dyldCtx = extractionCtx.dyldCtx + pass + + def slideAddress(self, address: int) -> int: + # TODO(muirey03): This doesn't yet deal with chained pointers + if not (offset := self._dyldCtx.convertAddr(address)): + return None + offset, context = offset + return context.readFormat(" None: """Process and remove rebase info. 
diff --git a/src/DyldExtractor/converter/stub_fixer.py b/src/DyldExtractor/converter/stub_fixer.py index 89e8d91..05e2fdd 100644 --- a/src/DyldExtractor/converter/stub_fixer.py +++ b/src/DyldExtractor/converter/stub_fixer.py @@ -57,7 +57,7 @@ def __init__(self, extractionCtx: ExtractionContext) -> None: pass self._enumerateExports() - self._enumerateSymbols() + self._enumerateSymbols(self._machoCtx) pass def symbolizeAddr(self, addr: int) -> List[bytes]: @@ -96,11 +96,18 @@ def _enumerateExports(self) -> None: reExports: List[dyld_trie.ExportInfo] = [] # get an initial list of dependencies - if dylibs := self._machoCtx.getLoadCommand(DEP_LCS, multiple=True): - for dylib in dylibs: - if depInfo := self._getDepInfo(dylib, self._machoCtx): - depsQueue.append(depInfo) - pass + # assume every image in a fileset is a dependency: + if self._dyldCtx.isFileset(): + for image in self._dyldCtx.images: + machoOffset, context = self._dyldCtx.convertAddr(image.address) + context = MachOContext(context.fileObject, machoOffset) + self._enumerateSymbols(context) + else: + if dylibs := self._machoCtx.getLoadCommand(DEP_LCS, multiple=True): + for dylib in dylibs: + if depInfo := self._getDepInfo(dylib, self._machoCtx): + depsQueue.append(depInfo) + pass while len(depsQueue): self._statusBar.update() @@ -248,19 +255,19 @@ def _cacheDepExports( self._symbolCache[functionAddr] = [bytes(export.name)] pass - def _enumerateSymbols(self) -> None: + def _enumerateSymbols(self, machoCtx) -> None: """Cache potential symbols in the symbol table. 
""" - symtab: symtab_command = self._machoCtx.getLoadCommand( + symtab: symtab_command = machoCtx.getLoadCommand( (LoadCommands.LC_SYMTAB,) ) if not symtab: self._logger.warning("Unable to find LC_SYMTAB.") return - linkeditFile = self._machoCtx.ctxForAddr( - self._machoCtx.segments[b"__LINKEDIT"].seg.vmaddr + linkeditFile = machoCtx.ctxForAddr( + machoCtx.segments[b"__LINKEDIT"].seg.vmaddr ) for i in range(symtab.nsyms): @@ -275,7 +282,7 @@ def _enumerateSymbols(self) -> None: if symbolAddr == 0: continue - if not self._machoCtx.containsAddr(symbolAddr): + if not machoCtx.containsAddr(symbolAddr): self._logger.warning(f"Invalid address: {symbolAddr}, for symbol entry: {symbol}.") # noqa continue @@ -1266,6 +1273,39 @@ def _addToMap(stubName: bytes, stubAddr: int): for segment in self._machoCtx.segmentsI: for sect in segment.sectsI: if sect.flags & SECTION_TYPE == S_SYMBOL_STUBS: + if sect.size == 0 and self._dyldCtx.isFileset(): + # fileset stubs section was nuked, rebuild it + # here I expand the __TEXT_EXEC section + # we can assume that we have enough space for this + # as the area after will belong to another binary + sect.offset = segment.seg.fileoff + segment.seg.filesize + sect.reserved2 = 16 + sect.size = sect.reserved2 * len(symbolPtrs) + segment.seg.vmsize += sect.size + segment.seg.filesize += sect.size + self._machoCtx.writeBytes(sect._fileOff_, sect) + self._machoCtx.writeBytes(segment.seg._fileOff_, segment.seg) + + for i, (key, targets) in enumerate(symbolPtrs.items()): + self._statusBar.update(status="Fixing Stubs") + + stubAddr = sect.addr + (i * sect.reserved2) + symPtrAddr = targets[0] + + symPtrOff = self._dyldCtx.convertAddr(symPtrAddr)[0] + if not symbolPtrFile: + symbolPtrFile = self._machoCtx.ctxForAddr(symPtrAddr) + pass + symbolPtrFile.writeBytes(symPtrOff, struct.pack(" None: - if ( - b"__TEXT" not in self._machoCtx.segments - or b"__text" not in self._machoCtx.segments[b"__TEXT"].sects - ): - raise _StubFixerError("Unable to get __text 
section.") + textSect = getattr(self._machoCtx.segments.get(b"__TEXT"), "sects", {}).get(b"__text", None) + if not textSect: + textSect = getattr(self._machoCtx.segments.get(b"__TEXT_EXEC"), "sects", {}).get(b"__text", None) - textSect = self._machoCtx.segments[b"__TEXT"].sects[b"__text"] + if not textSect: + raise _StubFixerError("Unable to get __text section.") textAddr = textSect.addr # Section offsets by section_64.offset are sometimes diff --git a/src/DyldExtractor/dyld/dyld_context.py b/src/DyldExtractor/dyld/dyld_context.py index 12e3eb5..dcdbd7c 100644 --- a/src/DyldExtractor/dyld/dyld_context.py +++ b/src/DyldExtractor/dyld/dyld_context.py @@ -13,9 +13,10 @@ dyld_subcache_entry, dyld_subcache_entry2, ) +from DyldExtractor.cache_context import CacheContext -class DyldContext(FileContext): +class DyldContext(CacheContext): def __init__(self, fileObject: BinaryIO, copyMode: bool = False) -> None: """A wrapper around a dyld file. @@ -60,24 +61,6 @@ def __init__(self, fileObject: BinaryIO, copyMode: bool = False) -> None: self._subCaches: List[DyldContext] = [] pass - def convertAddr(self, vmaddr: int) -> Tuple[int, "DyldContext"]: - """Convert a vmaddr to its file offset - - Returns: - The file offset and the DyldContext, but if not found, `None`. - """ - - for mapping, ctx in self.mappings: - lowBound = mapping.address - highBound = mapping.address + mapping.size - - if vmaddr >= lowBound and vmaddr < highBound: - mappingOff = vmaddr - lowBound - return mapping.fileOffset + mappingOff, ctx - - # didn't find the address in any mappings... - return None - def headerContainsField(self, field: str) -> bool: """Check to see if the header contains the given field.
diff --git a/src/DyldExtractor/extraction_context.py b/src/DyldExtractor/extraction_context.py index 827dc87..d2b846c 100644 --- a/src/DyldExtractor/extraction_context.py +++ b/src/DyldExtractor/extraction_context.py @@ -1,7 +1,7 @@ import progressbar import logging -from DyldExtractor.dyld.dyld_context import DyldContext +from DyldExtractor.cache_context import CacheContext from DyldExtractor.macho.macho_context import MachOContext @@ -9,7 +9,7 @@ class ExtractionContext(object): """Holds state information for extraction """ - dyldCtx: DyldContext + dyldCtx: CacheContext machoCtx: MachOContext # The update method of the the progress bar has @@ -33,7 +33,7 @@ class ExtractionContext(object): def __init__( self, - dyldCtx: DyldContext, + dyldCtx: CacheContext, machoCtx: MachOContext, statusBar: progressbar.ProgressBar, logger: logging.Logger diff --git a/src/DyldExtractor/kc/__init__.py b/src/DyldExtractor/kc/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/src/DyldExtractor/kc/kc_context.py b/src/DyldExtractor/kc/kc_context.py new file mode 100644 index 0000000..581a9a8 --- /dev/null +++ b/src/DyldExtractor/kc/kc_context.py @@ -0,0 +1,78 @@ +import pathlib +from typing import ( + List, + Tuple, + BinaryIO +) + +from DyldExtractor.file_context import FileContext +from DyldExtractor.dyld.dyld_structs import ( + dyld_cache_header, + dyld_cache_mapping_info, + dyld_cache_image_info, + dyld_subcache_entry, + dyld_subcache_entry2, +) + +from DyldExtractor.macho.macho_context import MachOContext +from DyldExtractor.cache_context import CacheContext + +from DyldExtractor.macho.macho_structs import ( + LoadCommandMap, + LoadCommands, + load_command, + UnknownLoadCommand, + mach_header_64, + segment_command_64 +) + +class KCContext(CacheContext): + + def __init__(self, fileObject: BinaryIO, copyMode: bool = False) -> None: + """A wrapper around a kernelcache file. + + Provides convenient methods and attributes for a given kernelcache file. 
+ + Args: + file: an open kernelcache file. + """ + + super().__init__(fileObject, copyMode=copyMode) + + machoCtx = MachOContext(fileObject, 0, False) + self._machoCtx = machoCtx + self.header = machoCtx.header + + # Check filetype + MH_FILESET = 0xc + if self.header.filetype != MH_FILESET: + raise Exception("Only MH_FILESET kernelcaches are supported!") + + self.mappings: List[Tuple[dyld_cache_mapping_info, KCContext]] = [] + for segment in machoCtx.segmentsI: + seg = segment.seg + + info = dyld_cache_mapping_info() + info.address = seg.vmaddr + info.size = seg.vmsize + info.fileOffset = seg.fileoff + self.mappings.append((info, self)) + pass + + # get images + self.images: List[dyld_cache_image_info] = [] + + filesetEntries = machoCtx.getLoadCommand((LoadCommands.LC_FILESET_ENTRY,), multiple=True) + if not filesetEntries: + raise Exception("Kernelcache does not contain any fileset entries!") + + for entry in filesetEntries: + info = dyld_cache_image_info() + info.pathFileOffset = entry._fileOff_ + entry.entry_id.offset + info.address = entry.vmaddr + self.images.append(info) + pass + pass + + def isFileset(self) -> bool: + return True diff --git a/src/DyldExtractor/macho/fixup_chains_structs.py b/src/DyldExtractor/macho/fixup_chains_structs.py new file mode 100644 index 0000000..ccc083a --- /dev/null +++ b/src/DyldExtractor/macho/fixup_chains_structs.py @@ -0,0 +1,103 @@ +"""Structs for fixup chains + +This is mainly sourced from +https://github.com/apple-oss-distributions/dyld/blob/4de7eaf4cce244fbfb9f3562d63200dbf8a6948d/include/mach-o/fixup-chains.h +""" + +import sys +from ctypes import ( + c_char, + c_uint8, + c_uint16, + c_uint32, + c_uint64, + Union, + sizeof +) +from enum import IntEnum + +from DyldExtractor.structure import Structure + +class ChainedPtrStart(IntEnum): + DYLD_CHAINED_PTR_START_NONE = 0xFFFF + DYLD_CHAINED_PTR_START_MULTI = 0x8000 + DYLD_CHAINED_PTR_START_LAST = 0x8000 + +class
PointerFormat(IntEnum): + DYLD_CHAINED_PTR_ARM64E = 1 + DYLD_CHAINED_PTR_64 = 2 + DYLD_CHAINED_PTR_32 = 3 + DYLD_CHAINED_PTR_32_CACHE = 4 + DYLD_CHAINED_PTR_32_FIRMWARE = 5 + DYLD_CHAINED_PTR_64_OFFSET = 6 + DYLD_CHAINED_PTR_ARM64E_OFFSET = 7 + DYLD_CHAINED_PTR_ARM64E_KERNEL = 7 + DYLD_CHAINED_PTR_64_KERNEL_CACHE = 8 + DYLD_CHAINED_PTR_ARM64E_USERLAND = 9 + DYLD_CHAINED_PTR_ARM64E_FIRMWARE = 10 + DYLD_CHAINED_PTR_X86_64_KERNEL_CACHE = 11 + DYLD_CHAINED_PTR_ARM64E_USERLAND24 = 12 + +class dyld_chained_fixups_header(Structure): + SIZE = 28 + + fixups_version: int + starts_offset: int + imports_offset: int + symbols_offset: int + imports_count: int + imports_format: int + symbols_format: int + + _fields_ = [ + ("fixups_version", c_uint32), + ("starts_offset", c_uint32), + ("imports_offset", c_uint32), + ("symbols_offset", c_uint32), + ("imports_count", c_uint32), + ("imports_format", c_uint32), + ("symbols_format", c_uint32), + ] + +class dyld_chained_starts_in_image(Structure): + seg_count: int + + _fields_ = [ + ("seg_count", c_uint32), + ] + +class dyld_chained_starts_in_segment(Structure): + size: int + page_size: int + pointer_format: int + segment_offset: int + max_valid_pointer: int + page_count: int + + _fields_ = [ + ("size", c_uint32), + ("page_size", c_uint16), + ("pointer_format", c_uint16), + ("segment_offset", c_uint64), + ("max_valid_pointer", c_uint32), + ("page_count", c_uint16), + ] + +class ChainedFixupPointerOnDisk(Structure): + target: int + cacheLevel: int + diversity: int + addrDiv: int + key: int + _next: int + isAuth: int + + _fields_ = [ + ("target", c_uint32, 30), + ("cacheLevel", c_uint32, 2), + ("diversity", c_uint32, 16), + ("addrDiv", c_uint32, 1), + ("key", c_uint32, 2), + ("_next", c_uint32, 12), + ("isAuth", c_uint32, 1), + ]
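Review note: the bitfield widths declared in `ChainedFixupPointerOnDisk` add up to exactly 64 bits, so one on-disk pointer can also be taken apart with plain shifts and masks. The sketch below is not part of this change. It assumes the fields are allocated lowest bit first within a little-endian 64-bit value, and `DecodedKernelPointer`, `decodeKernelPointer`, `decodeKernelPointerBytes` and `nextStride` are hypothetical names chosen for illustration.

```python
import struct
from typing import NamedTuple


class DecodedKernelPointer(NamedTuple):
    """Hypothetical helper type; fields mirror ChainedFixupPointerOnDisk."""
    target: int
    cacheLevel: int
    diversity: int
    addrDiv: int
    key: int
    nextStride: int  # corresponds to the struct's `_next` field
    isAuth: int


def decodeKernelPointer(raw: int) -> DecodedKernelPointer:
    """Split a 64-bit on-disk value into its bitfields.

    Assumption: fields are packed starting from the least significant bit,
    in the order they are declared in the struct.
    """
    return DecodedKernelPointer(
        target=raw & 0x3FFFFFFF,          # bits 0-29
        cacheLevel=(raw >> 30) & 0x3,     # bits 30-31
        diversity=(raw >> 32) & 0xFFFF,   # bits 32-47
        addrDiv=(raw >> 48) & 0x1,        # bit 48
        key=(raw >> 49) & 0x3,            # bits 49-50
        nextStride=(raw >> 51) & 0xFFF,   # bits 51-62
        isAuth=(raw >> 63) & 0x1,         # bit 63
    )


def decodeKernelPointerBytes(data: bytes) -> DecodedKernelPointer:
    """Decode 8 little-endian bytes, e.g. as read from the cache file."""
    (raw,) = struct.unpack("<Q", data)
    return decodeKernelPointer(raw)


# Usage: pack known field values into one 64-bit word, then recover them.
sampleRaw = (
    0x1234 | (1 << 30) | (0xBEEF << 32) | (1 << 48)
    | (2 << 49) | (3 << 51) | (1 << 63)
)
print(decodeKernelPointerBytes(struct.pack("<Q", sampleRaw)))
```

A helper along these lines could be one starting point for the chained-pointer TODO in `KCPointerSlider.slideAddress`, though how `target` maps back to a VM address depends on the pointer format and is not covered here.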