Advanced Windows Debugging
I'm Mario Hewardt, and I welcome you to the first module, Debugging Fundamentals, of the Advanced Windows Debugging course. What we're going to be talking about here first of all is the importance of effective debugging. Why is it so important for software engineers to fine-tune and hone their debugging skills? We'll talk a little bit about Debugging Tools for Windows, which is the debugger package that we'll be using throughout the course. We'll cover how to go about installing the debuggers as well as how to run them. Then a little bit of information on symbols and symbol servers, a very key aspect of debugging. Then some of the basic debugger commands that you're going to be using in pretty much every debug session. And then finally, postmortem debugging, which is a very effective way of troubleshooting a problem when live access to the servers is not possible.

Some logistics before we get started. The focus is going to be on user mode. That's the vast majority of software out there, minus device drivers. We're going to be looking primarily at x86 when it comes to demos. Now, the debuggers will abstract away the bitness of the architecture for you. So all of the things that we talk about in this course are equally applicable to 64 bit. The debugging package that we'll use comes with three user mode debuggers. They're all built on one and the same debugger engine, which means that what you can do in one debugger you can do in the others. So it's really a matter of personal preference which one you want to use. Throughout all the demonstrations in this course, I'm going to be using NTSD. And finally, Debugging Tools for Windows is exactly what's going to be covered here, and we will not be talking about Visual Studio debugging.

The importance of effective debugging. There is a quote up there by a gentleman by the name of Brian Kernighan, who really kind of captured the essence of debugging.
And his quote says, "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are by definition not smart enough to debug it." Now, this wasn't meant to be a discouraging quote. It's really just meant to illustrate the importance of the mindset that you have to have when it comes to debugging. Quite often, debugging is considered to be a necessary evil or a second-class citizen of your day-to-day job. But in reality, debugging is very, very hard. And that's what Mr. Kernighan wanted to illustrate here.

There is a staggering statistic out there that was collected by the National Institute of Standards and Technology back in 2002. That statistic puts the economic losses due to software defects at $60 billion per year. That's a fairly substantial amount of money. You can view the strategy for reducing this along two different tracks. One is, well, let's stop writing buggy code. That's, of course, an honorable goal, but it's not always feasible to achieve. The other way to look at this is that when a problem does happen, because we know it will, let's make sure that we're equipped with the right mindset, the right tools, and the right debugging skill set to be able to reduce the root cause analysis from a week down to just minutes. If we can do that, then effectively, we've reduced the amount of downtime that we have, and we can more quickly turn around and fix the problem.

Perfect code is an illusion. Now, we all like to think that we do write perfect code. But unfortunately, a lot of times, there are factors outside of our own control that don't allow us to do that. For example, the schedule gets cut short because the software has to go out before a competitor releases theirs. Or management decides to cut the schedule short. Whatever the case may be, it creates somewhat of a stressful situation for software engineers, and that's typically when bugs start creeping into the product.
Even if you were able to write perfect code, we do still have to deal with legacy code. Now, you may be working on new code for version 4, but you still have versions 1, 2, and 3 out there that need to be maintained because customers are using them. That code may have been written 10 years ago on what was then considered modern hardware, but it now needs to run on hardware that's evolved quite a bit. So being able to effectively troubleshoot problems can really help you here. And finally, if you do embrace this mindset that debugging is a first-class citizen of your job, then one thing that's guaranteed is that you will become a better developer. You will gain an understanding of the abstractions that you're working with, which enables you to write better software for those abstractions moving forward.

Debugging Tools for Windows Overview
One of the key questions that I get from engineers is, why can't we just use the Visual Studio debugger? Everyone loves to sit in Visual Studio, develop their code, and do their debugging, and it's a great, great integrated development environment. Well, it turns out that Debugging Tools for Windows offers several advantages over Visual Studio. No. 1 is zero footprint, which is an excellent feature
to have when you're looking at production environments. For example, take system engineers that maintain a set of production machines. They typically lock them down, and they don't want any configuration changes done on those machines. Well, a zero footprint debugger that you can literally launch from a thumb drive makes them a lot more comfortable when it comes to running debuggers on those machines. It's got a powerful extension model. Not only do the debuggers come with a large number of existing extensions and commands, but the package also publicly exposes an API that allows you to write your own extensions if you so wish. And it also ships with several other very useful tools. For example, UMDH is a great heap memory leak tracking tool, which we'll see in the resource leak section later on.

So now that we're convinced that this is a great debugger to use, what about installing it? Well, it turns out it installs as part of the Windows SDK, which is free. So these debuggers come for free. If you're worried about the Windows SDK not being zero footprint, then fear not, because the Windows SDK allows you to pick and choose what you want to install. So if you run the web installer for the Windows SDK, you can deselect everything and only pick the debuggers. If for some reason you need to run older versions of the debuggers, they are available as standalone installs on Microsoft.com. You just search for Debugging Tools for Windows, and it will take you to the link to install them. The key thing here is that you want to pick the appropriate architecture. So if you are planning to install the debuggers on x86, then pick the x86 package. And on 64 bit, you pick the 64 bit package. Now, 64 bit Windows has the ability to run 32 bit applications. So if you want to debug a 32 bit app on 64 bit Windows, you can install the 32 bit debuggers on 64 bit Windows. The default installation path is under Program Files, in Debugging Tools for Windows.
And if you're installing the 32 bit version on a 64 bit machine, the debuggers will be installed under Program Files (x86), in Debugging Tools for Windows. Now, you can change these paths during the installation process. Once you've got the debuggers installed to the specified path, you're going to have to specify the full path to the debugger when you run them. To avoid that, as an optimization, I typically just add that path to the system path variable.

So what are some of the key components in this package? Well, the first three here, NTSD, CDB, and WinDbg, constitute the three user mode debuggers that are part of the package. They are all built on the same core debugger engine, and this is critical. So what you can do in NTSD you can do in CDB and you can do in WinDbg. So it really just becomes a preference of what type of debugger you want to run. And by type, I mean NTSD and CDB are entirely console based. So you'll essentially be sitting in what looks like a DOS shell without any UI at all. WinDbg, on the other hand, adds UI functionality on top, and if you're used to Visual Studio, you're going to see similarities between the UIs there. KD is the kernel debugger. ADPlus is a great little tool if you want to monitor processes for certain conditions. UMDH is an excellent, excellent heap memory leak tracking tool. And then remote, which is applicable if you want to remote out debugger sessions.

Demo: Installing Debugging Tools for Windows
Now, let's run through a demo of what it looks like to install Debugging Tools for Windows. So I'm going to go off to the Microsoft.com website. I'm going to search for the Windows SDK. And I'll pick the latest one, which is the Microsoft Windows SDK for Windows 7 and .NET Framework 4. Click the download button to launch the web installer. The first thing that the installation process does is go through a detection phase to see if you already have it installed.
On my particular machine, I do already have it installed. And that's why I get the repair, remove, and change options. I'll click on change to show you what a clean install would look like. And here, you can see all the components that are currently installed or that will be installed. What you want to do is deselect everything except for Debugging Tools for Windows under the common utilities. What that does is skip everything else, so the only thing that will be installed on the machine are the debuggers. One more thing worth mentioning here is that there is another Debugging Tools entry under the redistributable packages. You want to select that one if you want the debugger MSIs. The key benefit of that is that you will get all the flavors, so x86 and x64. And you'll have the MSIs for them so that you can go off and just put them on another machine and install from the MSIs directly. Also, during the installation process, you can choose the install folder for the debuggers. I'm going to show you, because in my case, I actually put them under DBG x64, for example. So if I do a DIR and look in the folder, there are all my debuggers and all the tools associated with Debugging Tools for Windows. And this is the same folder that I would actually run the debuggers from, for example, NTSD.

Running the debuggers
Now that you've got the debuggers installed, how do you actually go about running them? Well, the first way is by simply launching a process under the debugger. So you're instructing the debugger to launch a new instance of a process for you. The way that you go about doing that is by specifying the name of the debugger, NTSD in this case, followed by the full path to the binary itself. What that will do is simply start a new instance of the process under the debugger and display the debugger for you, and then the debugger will wait for you to enter commands. One thing to be aware of is that if you launch a new instance of the process under the debugger, certain parts of Windows will assume that you are, in fact, going to debug the process and try to help you out. A great example of this is the Windows heap manager, which enters a heap debug mode, if you will, which adds some additional instrumentation to help you capture issues around heap corruptions. So it's just a caveat to be aware of that such debug sessions might look different.

The second way to run the debugger is when you have an existing process already running, and you tell the debugger that you want to attach to that process. There are scenarios where this is a very realistic approach. For example, if it takes a really, really long time for a problem to reproduce, let's say it takes days in production, you don't want to start the process under the debugger and let it run under the debugger for X number of days. Instead, what you want to do is get the process into the problematic state and then attach the debugger to the process to try to debug it. The way that you do that is by giving the debugger what's known as a process identifier. Every single process on Windows is uniquely identified by a number. And that's the number that you feed into NTSD using the -p switch to tell the debugger to attach.
If for some reason you are not able to find the process ID, you can use the name of the process and the -pn switch. One caveat to this is that if there is more than one instance of that process running, the debugger is not going to be able to attach because it can't figure out which of those particular instances you want. Then you're going to have to resort to going back to the process identifier. There are a lot of different ways to find the process identifier. A simple one is Task Manager. You can hit Ctrl+Shift+Esc. It will bring up Task Manager, and it will have a PID column, which is short for process ID, in the Processes tab. Alternatively, you can also use tlist.exe, which is part of Debugging Tools for Windows, and it will simply list out all the processes on the machine as well as their process IDs.

Demo: Running the debuggers
Now, let's run through a demo of how to launch a process under the debugger and attach to an already existing process. So I'm at my command line prompt here. And the process that we're going to be debugging now is just the good old Notepad.exe. If I wanted to start a new instance of Notepad.exe under the debugger, I would specify the debugger, in this case NTSD, followed by the name of the process, Notepad.exe. What this will do is launch the debugger as well as launch a new instance of Notepad.exe. We can see that at the very bottom, once the debugger has launched it, we have a prompt. Right now, the debugger has paused the process and is waiting for you to input some debugger commands. You will see here some introductory text. The top part will tell you a little bit of information about the debugger, such as version and architecture. And then there's a long list of module load notifications. So for example, what the debugger is telling us here is that a module named Notepad.exe was loaded. And NTDLL was also loaded, Kernel32, and a whole bunch of others.
These are the base modules that have to get loaded into a process before the debugger gets a chance to break execution. So I'm going to use the Q command, which stands for quit. That terminates the debugger as well as the process that the debugger is debugging.

Now, what if we already had an instance of Notepad running? So I'm going to open up a new instance. And let's say we wanted to attach a debugger to this. Well, remember, what I have to do is find out the process ID. In this case, I can use Task Manager, more specifically the Processes tab. Then I can search for Notepad, which I find right here. And it tells me that the PID, or process ID, is 7920. I specify my debugger, NTSD, then -p for process ID, followed by 7920. And that once again opens up the debugger window. The debugger is now attached to that instance. And you can see that it is very, very similar here. You've got a bunch of modules loaded, followed by the debugger waiting for you to do something.

I'm going to quit, and I'm going to show you one more way to attach. I'm going to launch Notepad yet again. This time, I want to attach to Notepad, but I either don't know the process ID or maybe I'm too lazy to actually go and find it. I can use -pn for process name and specify Notepad. And then it attaches yet again. So that's how you can go about attaching to a process as well as launching a new instance of a process in the debugger.
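The three ways of starting a session shown in the demo boil down to three command lines. The following Python sketch only builds the argument lists, it does not run anything. The helper names are hypothetical, and it assumes that ntsd can be found on the system path, as suggested earlier. The -p and -pn switches are the ones described above.

```python
def launch_args(binary_path, debugger="ntsd"):
    """Command line that starts a new instance of the binary under the debugger."""
    return [debugger, binary_path]


def attach_by_pid_args(pid, debugger="ntsd"):
    """Command line that attaches to a running process by its process ID."""
    return [debugger, "-p", str(pid)]


def attach_by_name_args(process_name, debugger="ntsd"):
    """Command line that attaches by process name.

    As noted in the text, this only works when exactly one instance
    of that process is running.
    """
    return [debugger, "-pn", process_name]


if __name__ == "__main__":
    for args in (
        launch_args("notepad.exe"),
        attach_by_pid_args(7920),  # PID taken from the demo; yours will differ
        attach_by_name_args("notepad.exe"),
    ):
        print(" ".join(args))
```

Any of these lists could be handed to subprocess.run on a machine where the debuggers are installed; keeping them as lists avoids quoting problems when the binary path contains spaces.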
Symbols
Symbols, or symbolic information, are really a key construct when it comes to debugging. The best way to think about symbols is as auxiliary metadata. So instead of having to simply look at code addresses or data addresses, you can introduce an additional symbol file, with the extension PDB, that allows the debugger to map those addresses to symbolic information that makes more sense to you when you're debugging. There are two categories of symbols: public symbols and private symbols. Now, the public symbols are a slimmed down version of the private symbols, and they are commonly used when you want to share symbols with external engineers. Private symbols, on the other hand, are used internally for a very rich debugger experience. The private symbols will contain everything that you could possibly want to know as far as symbolic information goes, and the public symbols are simply a slimmed down version. Now, why would you want to make this separation? Well, as it turns out, it gets to be a lot easier for someone to reverse engineer your code if they have access to the private symbols, and that's why you want to keep them locked down and not release them externally.

So where are the symbols located? Well, it turns out that if you need symbols for Microsoft products, Microsoft maintains what's known as a public symbol server. It puts all the public symbols of these products out there, and you can simply instruct the debugger to go and fetch them from that location. For your own products, you can manage your own symbols. So whenever you do a build, you can take the symbol files, the PDB files, and archive them in a known location. And when an engineer needs to debug, they point the debugger to that specific location.
That is kind of a hassle to maintain, especially when you're dealing with multiple products, multiple versions, multiple hot fixes, etc. It turns out that you can create your own symbol server, where the debuggers automatically figure out what the right version is and where to go and fetch the symbols, and we'll see an example of that later on.

So now that we know where the symbols are located, how do you tell the debugger to go and use those symbols? Well, one of the most common commands you'll run into is .symfix, and that's really just a shortcut. It tells the debugger to set the symbol path to the public Microsoft symbol server. .sympath will show you what the current symbol path is and also allows you to set the symbol path. .sympath+ followed by a symbol path appends that symbol path to the existing symbol path. So one common pattern in a debug session is to do a .symfix, which will set the path to the Microsoft public symbol server, followed by .sympath+ and then the path to your own private symbols. .reload reloads all the symbols. Just because you've told the debugger where to find them doesn't mean that it's going to go and load them. And that's what .reload does. And !chksym followed by a module name checks the validity of the specified module, to make sure that the symbols that the debugger attempts to load for that module are a strict match to the module itself. And then there is a plethora of other commands that indirectly use the symbols as well.

Demo: Configuring symbols
Let's do a quick demo of the symbol commands. Yet again, we're going to pick on Notepad as the target process that we want to debug. So I'll specify NTSD followed by Notepad, which will launch a new instance of Notepad under the debugger. And let's use the .sympath command just to see what the current symbol path is. Here we can see that the symbol search path is empty and the expanded symbol search path is empty.
Now, what we want to do is set the symbol path to the public Microsoft symbol server. We can use the .symfix command for that. And now we can, again, check to see what the symbol path is set to using .sympath. Here we can see that the symbol path is now set to http://msdl.microsoft.com/download/symbols. This is the actual URL for the Microsoft public symbol server. Now, let's say that I'm debugging my own application, and I have a private symbol path. How do I append that to the currently set symbol path? Well, you can use .sympath+ and then specify your own symbol path. And now we can use .sympath, and we see that in addition to the public Microsoft symbol server, we also have our own. And they're separated by semicolons.

Now that we've told the debugger where the symbols are, we also have to tell it to reload the symbols. For that, we use the .reload command. It says reloading current modules, and it's done. And now we are ready to kind of party on with the symbols. I'll show you an example command here. What x does is try to resolve the symbol that you give it and tell you what corresponding information it finds. So let's say that I wanted to look up a function located in kernel32.dll, and the function is called CreateFileW. The output of that is that the debugger has actually found that symbolic information, and it correlates it to the following address, which happens to be where the code is located for that particular function. You can use the reciprocal
command to x, called LN, list nearest, and specify an actual address, and the debugger does the same thing but the other way around. It tries to find symbolic information about that address. And here we can see that it tells me that it found an exact match, and it's what we expected: the Kernel32 CreateFileW.

Symbol Servers
One of the great things about symbol servers is that you no longer have to worry about where the symbols are located. As long as you point a debugger to a root path, the debugger figures out by itself which version of the binaries you're running and the equivalent symbol files up on the symbol server. The great example of that is, of course, the Microsoft public symbol server. As it turns out, there's no magic there, and the Debugging Tools for Windows package actually contains all the tools that you need to set up your own symbol server for your company. The steps to do that are as follows. Step 1, build your source code, and make sure that the PDBs, or the symbol files, get generated. Step 2, generate public symbols. Now, this is optional. If you're planning on having an internal symbol server, then you would want to use the private symbols. But if you're planning on having a public symbol server, then I'd strongly recommend generating public symbols. Step 3, you store the symbols in the symbol store. And then Step 4, which is optional: if you want to expose the symbols through HTTP, you can do that as well. That's really all there is to the four step process. Very simple.

Demo: Creating your own symbol server
Well, now that we know about symbol servers, let's run through a demo of how to use that really cool feature to set up your own symbol server. I've navigated right now to a folder called send\pdb. That's where my PDBs, the symbol files that I want to extract and put into a symbol store, are located. If I do a DIR, I see that we've got hello.test.pdb. That's my symbol file.
And now, the only thing that I have to do to create a symbol store is run a tool called symstore. I specify add to say that I'm going to add symbol files. The /f switch tells it where the symbol files are located. In my case, it's this path. The /s switch tells it where you want the symbol store created. So I'm going to use C:\symstore. The /t switch basically allows you to give it a friendly name. I'm going to use the /r switch, which tells it to recursively go through subfolders of the symbol folder as well, looking for symbols. And then finally, the /z switch tells it whether this is a private or a public symbol store. In my case, it's actually private. The result tells me that it stored one file, which is what we expect as we only had one PDB in there. It had no errors, and it didn't ignore any files at all. That's really all you've got to do to create a symbol store.

So let's take a look at what that symbol store actually looks like. I can see that, as per expectations, we have a hello.test.pdb in the symbol store now. But the big difference here is that it's not a file. It's a folder. So let's recurse into that folder and see what's in there. And here, we see yet another folder. This one has a long, unique ID of some sort. So let's go into that folder. And finally, we get down to our hello.test.pdb. What's happening here is that a binary and a symbol file are tightly coupled. Any time you build a binary, it produces a symbol file that will only work with that version of the binary. One of the ways that this coupling is done is by creating a unique ID. And this is exactly what we're seeing here. This unique ID is present in the binary, and it's also present in the PDB itself. Now, let's load up the process under the debugger. And let me set the symbol path over to my symbol store. Do a .reload.
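The folder layout just explored (the store root, then a folder named after the symbol file, then a folder named after the unique ID, then the file itself) can be sketched in a few lines of Python. This is only an illustration of the layout seen in the demo, not the debugger's actual lookup code; the function name is hypothetical and the ID below is a made-up placeholder, since the real value is derived from the build.

```python
from pathlib import PureWindowsPath


def symbol_store_path(store_root, pdb_name, unique_id):
    """Build <root>\\<pdb name>\\<unique id>\\<pdb name>, the layout seen in the demo."""
    return PureWindowsPath(store_root) / pdb_name / unique_id / pdb_name


if __name__ == "__main__":
    # "0123ABCD" is a placeholder; a real store uses the ID embedded in
    # both the binary and its PDB.
    print(symbol_store_path(r"C:\symstore", "hello.test.pdb", "0123ABCD"))
```

Because the ID is part of the path, several builds of the same binary can live side by side under one root, which is why only the root needs to be given to the debugger.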
If you actually attempt to use symbols from that hello world symbol file, the debugger goes through all the loaded modules, finds hello world.dll, extracts that unique ID, and then uses the root path, which is C:\symstore, followed by the name of the symbol file, hello world.pdb, followed by that long, unique ID that it found. This is the reason why you only have to specify the root path to the symbol store. The debugger will take care of finding exactly where to go, depending on which version of the binary is loaded, to get the actual symbol file. Very, very powerful.

Exploratory debugger commands
Now, let's take a look at some of the commands that the debugger offers you and that you can use in your debug sessions. This is not going to be an exhaustive enumeration of all the commands. Rather, it's going to be some of the most commonly used commands. We'll start in the category of threads. Now, one thing to keep in mind is that whenever you debug, the debugger will have a thread context. From the debugger's point of view, what that means is that this is the thread that the debugger currently has selected. If you run any commands that require a thread, they will be run on that thread. So one of the most common things with
threads is you want to see the call stack. If something crashes, you want to get the call stack so you can see where in the application it crashed. There is a command called K, and it has quite a few variants that are formed by appending a letter to the K command. Really, all that does is show the call stack for the currently selected thread in a variety of different flavors. For example, KB and KN will show the call stack with parameters for each frame or, in the case of KN, with the frame number. The tilde star (~*) followed by a command tells the debugger that you want the command to be run for every single thread in the process. !uniqstack is a great little command. It will walk every single thread and its call stack and try to find out if there are duplicates. This is a great command, for example, if you're looking at a process that has a locking problem. You've got 500 threads. One of the threads holds a lock. And the remaining 499 threads are actually waiting to acquire that lock. Well, instead of walking every single thread and looking at the call stack, you can do a !uniqstack to get a more summarized view.

DV shows the local variables for the current frame. Much like the debugger has the notion of a thread context, which is the thread that is currently selected, it applies the same logic to the frames of a particular thread. The DV command will simply show the locals for the currently selected frame. If you want to change the frame, then you can use the .frame command followed by the frame number. Tilde, thread number, S (~<number>s) tells the debugger that you want to change the currently selected thread to the thread number specified.

What about memory? That's another common, problematic area. What are some of the commands that will tell us more about memory? Well, you've got the D family of commands, which can be augmented with a letter specifying how you want the memory to be interpreted and displayed. And the output of the D command is raw memory.
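One way to picture the D family is as different views over the same raw bytes. The Python sketch below is an illustration only: the eight sample bytes and the function names are made up, little-endian byte order is assumed (as on x86 and x64), and the output formatting does not try to match what the debugger prints.

```python
import struct

# Eight sample bytes: the text "Hi" stored as 16-bit characters,
# followed by a 16-bit terminator and two more zero bytes.
RAW = bytes([0x48, 0x00, 0x69, 0x00, 0x00, 0x00, 0x00, 0x00])


def as_double_words(raw):
    """Interpret the bytes as little-endian 32-bit values."""
    return [f"{v:08x}" for v in struct.unpack(f"<{len(raw) // 4}I", raw)]


def as_quad_words(raw):
    """Interpret the bytes as little-endian 64-bit values."""
    return [f"{v:016x}" for v in struct.unpack(f"<{len(raw) // 8}Q", raw)]


def as_ascii(raw):
    """Interpret the bytes as a zero-terminated string of 8-bit characters."""
    return raw.split(b"\x00", 1)[0].decode("ascii")


def as_unicode(raw):
    """Interpret the bytes as a zero-terminated string of 16-bit characters."""
    return raw.decode("utf-16-le").split("\x00", 1)[0]


if __name__ == "__main__":
    print(as_double_words(RAW))
    print(as_quad_words(RAW))
    print(repr(as_ascii(RAW)))
    print(repr(as_unicode(RAW)))
```

Note how the 8-bit string view stops after the first character, because the second byte is zero. Picking the wrong view in the debugger produces the same kind of truncated or odd-looking output.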
So if you do a DD, that's dump double word. DQ is dump quad word. DU is dump Unicode. DA is dump ASCII. And there is a whole list of other modifiers that you can apply. DT allows you to dump data structures. So instead of seeing raw data, you can tell the debugger, I want you to dump out memory at the specified address, and I want you to try to interpret that data according to this symbolic name that I'm giving you. The debugger will use the PDB files and try to display that. !address shows a very detailed overview of memory consumption and usage. !heap essentially does the same thing as !address, but it focuses in on heap memory.

And then finally, reading code. That's another very common thing that we do in the debugger, and it has some commands to help us with that. The U command disassembles code at the current instruction pointer. UB disassembles code going backwards. And UF disassembles an entire function. By default, the U command only disassembles X number of bytes and shows you that. If you want to get the disassembly of the entire function, then UF is a great little command to use.

Demo: Peeking into notepad
Let's take some of these commands that we've talked about out for a test drive. We're going to use Notepad yet again. And we're now broken into the debugger. What I'm going to do first is resume execution by using the G command. You will notice that at that point, since the app is actually running, Notepad displays. I do a Ctrl+C now to break into the debugger, and fix up my symbols with .symfix and then .reload to reload them. Let's say I wanted to display the call stack for all the threads in the Notepad process. I use tilde, star, K (~*k). The output tells us that we have got two threads: a thread with debugger thread ID zero and a thread with debugger thread ID one. One thing you'll notice that's different between these two threads is that this one has a hash sign next to it.
And this is the thread that is the currently active thread in the debugger itself. So if you were to do any type of thread operation right now, it would do that operation against thread No. 1. You can also see in the debugger prompt that it has thread ID 1. Now, let me resume execution once again. And I'll go to Notepad. Now, I'm going to invoke the open file dialog, and watch what happens in the background in the debugger itself. What we see, just by opening the file dialog, is roughly 60 or so module loads happening. I break into the debugger and, again, do a tilde, star, K so I can get the call stacks for all the threads. And all of a sudden, we can see that we've gone from two threads up to 13, with 12 being the currently active one as a matter of fact. And that's just from opening the file dialog box.

Now, let's say, furthermore, that we wanted to switch away from the currently active thread in the debugger, and we know that's 12 from the prompt or from the hash sign next to the thread output. Let's say I'm just going to go and pick one here, and I pick thread No. 9. We would use tilde, thread No. 9, S (~9s). And all of a sudden, we're switched to thread No. 9. So now, any type of thread operation that we do will be on No. 9. I can do a K, which will, of course, give me the call stack. Now, for each line of the call stack, the debugger tells you the return address for a particular frame, and then the frame itself. So here, if we look at it from the bottom up, we start in NTDLL, user thread
start. So that makes sense, because a thread was started. It goes into Kernel32, base thread init. We've got a worker thread. It goes into Shell32, and then it's waiting for something to happen. What if we wanted to unassemble one of these functions just to see what the code looks like? You can take any frame and its symbolic information and do a U, for unassemble, followed by the name itself. And there, we see the disassembly, or unassembly, for that particular function. Now, it only disassembles so many bytes. If you wanted to see the entire function, you could use the UF command, and then it gives you the entire function.

Postmortem debugging
Now, let's talk a little bit about postmortem debugging, which is a very key type of debugging that you will do. In a nutshell, postmortem debugging refers to the ability to debug problems offline. Now, by offline, I mean that you're able to collect information from a faulty machine, bring that information off of the problematic server, let the server resume running, and then debug that fault data on your own machine. Some common scenarios: when live debugging is not feasible, postmortem debugging is a savior. The machine is sitting in a data center, and the problem occurs. Quite commonly, a workaround is simply to restart that process and let it run again. Well, what you really want to do is figure out what the root cause of the problem was. The way to do that is by collecting fault data when the fault happens, bringing that off the machine, and then letting the machine restart. The fault data is simply a static snapshot of the live process. This type of fault data is known as a crash dump file. Once you have a crash dump file, you can use the exact same debuggers. There are no new tools that you need to learn to debug crash dumps. You use the same debuggers as you did before. There are some limitations, however. Since it is a snapshot, you can't control execution. So you can't debug a crash dump 
file and tell it to resume, for example, because there's nothing for it to resume from. It's just some static data.

How do you go about generating these crash dumps? There are a number of different ways. One is just using the debuggers. If the machine in the data center has the debuggers installed, you can attach the debugger to the process and issue the .dump command followed by /ma and then the path to where you want that crash dump written out to.

There are some automatic ways of getting crash dump files. One is ADPlus. It comes as part of the Debugging Tools for Windows. And what you can do with ADPlus is tell it to monitor a specific process for a certain condition, and when that condition triggers, ADPlus will generate a crash dump.

Windows Error Reporting is another great tool. It is a free service that Microsoft offers. And it's a cloud based service. You can log onto Windows Error Reporting and sign up for your account. And then you can tell Windows Error Reporting, here are the binaries that I'm interested in getting fault data about when something bad happens. And now, when your application is out in the wild on customers' machines and something bad happens, a fault report, including a crash dump, is sent up to Microsoft. And that fault report or crash dump gets forwarded to your account. And you can log in again and get some stats on how many faults you had, and you can look at the crash dumps.

ProcDump.exe is another tool. It's a command line driven tool that allows you to generate crash dumps for processes.

So now we know how to generate them. Well, what about actually debugging them? Well, it turns out that the only thing you have to do differently with the debuggers is use the -z switch, which tells the debugger you're interested in crash dumps. And then you specify the full path to the dump file. Now, when you first open up a crash dump, the first thing the debugger tells you is what was the last event that happened.
So for example, if it was a crash, it might say that there was an exception, and it will actually set you to the thread where the exception happened. And then it asks you for input. And here, you go off, and you can debug just like you do with a live debug session, with the exception that you can't control execution.

Demo: ADPlus crash dump generation

Let's walk through a demo of how to generate a crash dump file as well as how you can use the debugger to debug it. For this demo, we'll use a tool called ADPlus, which is a process monitoring tool that allows you to tell the tool, when certain conditions happen, I want you to generate a crash dump that I can debug post mortem.

So let's switch over, and let's pick an application that is not so well written. And I'll pick one called 06overrun.exe and specify an argument, any arbitrary argument. And if I run that, it tells me that it stopped working. So something bad happened to the application itself.

So let's launch ADPlus. The application that is misbehaving is an x86 application, so you have to make sure to run the x86 version of the debugger. Now, ADPlus, by the way, has a very extensive document in the Debugging Tools for Windows folder that contains all the options and all the types of monitoring that it can do. I'm going to run the executable. I'm going
to say I want you to monitor for crashes. I want you to monitor for crashes in the process that's named. And if you do find something suspicious, I want you to output the dump files into the current folder. I'm also going to run this in the background, so I'll use the start command that will open up a new window. And this window tells me that it's starting ADPlus. Logs and memory dump files will be placed in this folder. And it's starting to monitor the following processes. Whenever you're ready to stop monitoring, you can just press enter here.

Now, let's rerun our 06overrun bad application. Again, it says it stopped working. And it exits. I'm going to switch back to the window that ADPlus is running in. Since the crash has happened, I'm going to hit enter to stop monitoring.

And now, let's take a look to see what was actually produced in that folder that it was writing to. And what we can see, besides a few log and text files, is that we have two full dumps and one mini dump. The full dumps are usually what you want to look at because they contain the most information about the process. The difference between these two is that this full dump was taken when the first chance exception occurred during a process shut down. Now, we're not all that interested in the process shut down crash dump because that's pretty normal. What we are interested in is the full dump for the second chance access violation. That just sounds like a bad crash to me. If I scroll a little bit to the right here, you'll notice that these all have the extension .dmp, and that's the convention for crash dump files.

Now, let's go ahead. I'm going to switch the directory into this particular one. And let's debug this dump file. Well, we know it's an x86 application, so we also have to use the x86 debuggers. NTSD. Now, here's the big difference. The -z switch tells the debugger that you're loading a dump. And we're interested in the second chance exception, so I specify the name of the dump
file and then I just hit enter.

Now, the debugger has loaded up the dump file, and you're ready to do root cause analysis. It looks very similar to attaching to, for example, a live process. What you'll notice that's a little bit different here is that it says this dump file has an exception of interest stored in it. The stored exception information can be accessed via a command called .ecxr. And the last event that led up to this dump file being created was an access violation.

Now, I can do K to get the call stack that led up to this access violation. And we can see that we have a couple of frames of 06overrun, but we don't have any function names in here. It's just some bizarre, strange offset. And that's because we haven't loaded the symbols for it yet. So we have to tell the debugger. I'm going to use .symfix first for the Microsoft public symbols. And then I'm going to say my symbol path is at this location. That's where the PDB is located.

Summary

This concludes the first module of the Advanced Windows Debugging course. And what we covered really is we talked about why debugging is such a critically important skill for software engineers to have. We took a high level overview of the Debugging Tools for Windows, which are the debuggers that we're covering. We also showed the power that these debuggers have, and we have a plethora of commands at our disposal to do various types of investigations. Another key feature of these debuggers is that they allow you to create your own symbol servers, which greatly reduces the amount of time you spend on managing symbols. And finally, the very powerful post mortem debugging feature.

Heap Corruptions

Introduction and Overview

Hi. I'm Mario Hewardt, and I welcome you to the Heap Corruptions module that is part of the Advanced Windows Debugging course. What we'll talk about in this module is what heap corruptions are and kind of just a high level overview of why heap corruptions are bad. We'll take a deep
dive into the Windows memory architecture, including the internals of how the Windows heap manager works, which is really kind of a prerequisite of being able to efficiently debug heap corruptions. And then finally, we will explore the tools that are available to make life a lot easier when debugging heap corruptions.

Let's start with what is a heap corruption? Well, at a high level, any time a piece of memory that's on the heap has its integrity violated, we have a heap corruption. Some examples of that:

A stray pointer. It's basically when you have a pointer that is pointing off to a location that you don't own, and you write to that pointer.

Overruns. When an application writes beyond the end of a buffer or a heap allocation, per se, you'll have a heap corruption.

Underruns. This is really just the opposite of overruns, except that it's going backwards.

And over deletion. A block of memory is freed more than once. That will also cause a heap corruption.

One of the bad things about heap corruption, besides the problems that it creates, is that it's also arguably one of the toughest bugs to debug. And the reason for that is heap
corruption, the symptoms of a heap corruption may not surface until way later in time, making it very, very hard to go from when the application crashed and backtrack to kind of figure out where the heap corruption occurred in the first place.

Common misconceptions and symptoms

Now that we know what a heap corruption is, I'd like to address some very common misconceptions that I hear.

The first one and probably the most frequent one: "The topmost frame is not in my code. How could I possibly be the one corrupting the heap?" Here's an example of a call stack. If I'm the developer responsible for 06overrun, and I look at this call stack, well, I am not the one crashing because I don't have 06overrun at the top of the call stack, so I may be inclined to say there's a bug in NTDLL. When in reality, what's happening here is that 06overrun violated the integrity of a heap block, which then caused the heap manager in NTDLL to crash.

"My code is not on the stack, period." That's another example, kind of an extension of the first one. Your code doesn't have to be on the stack to still be the source of the heap corruption. You may be corrupting memory in an entirely different component that could even be a Windows component in your process. And because of that, that component crashes out.

"It must be a hardware problem." So it is true that a hardware problem such as bad memory can masquerade as a heap corruption. It is very, very uncommon.

"I use my own heap, so I can't possibly corrupt someone else's heap." The heap manager in Windows allows applications to create their own heaps. And if an application does so, and works only with its own heap, it does not mean that it cannot corrupt another heap in the same process, because there is no enforcement of the boundaries between the heaps. They all are in the same address space.

"The window of opportunity is too small. It would never happen." Heap corruptions are sometimes very timing related.
And you could be in a situation where you've concluded that the window is so small that it will never happen once the product ships. Well, in general, if a bug surfaces in internal testing, where resources are limited as far as mimicking all the different customer environments out there, chances are it's going to happen out on the customer machine. So it should just be fixed.

What are some of the symptoms? What does a process look like if there's a heap corruption?

Well, the application can crash. And we already saw an example call stack of that.

The application can also hang. If you're corrupting a piece of memory that controls the logic of a loop, for example, well, that loop may, in fact, never end. And it's just going to sit there and spin.

The application exhibits undesirable behavior. This is a very random kind of heap corruption. For example, say you've got a credit card transaction system, and another part of your code corrupts part of the system's memory and sets the flag that says whether or not the system should shut down. If it sets that to off, then your credit card transaction system will at random times just shut off, and it's very, very hard to figure out why that happened.

If you are lucky, the symptoms are close to the source. That is really the biggest key when it comes to heap corruptions. Once the problem surfaces, you want the problem to be as close to the source of the heap corruption as possible. Unfortunately, more commonly than not, the symptoms are far from the source, making heap corruptions very, very tricky to debug. But as we'll see later on, there are some tools out there that can really, really help in the troubleshooting process and kind of force the heap corruption and the problem to occur at the same time.

Windows Memory Architecture overview

One of the key things when looking at heap corruption problems is that you have to have a really, really good understanding of how the Windows memory architecture is defined.
So let's start with a high level overview. At the bottom, you have the virtual memory manager in Windows. And that is the one stop shop when it comes to memory allocations. This virtual memory manager works on the basis of pages. So if someone was to come in, for example, and say I want to allocate 10 bytes of memory, the virtual memory manager would give you back at least a page, which is roughly 4K. And that is an incredible waste of resources, primarily because 10 bytes are being used but 4K is allocated.

And so in the user mode space, Windows introduces a more efficient mechanism, which is layered on top of the virtual memory manager. And that's called the Windows heap manager. The Windows heap manager calls the virtual memory manager to get pages of memory, but it manages those pages internally so that you're not wasting a large part of the page for each allocation. So if you now allocated 10 bytes of memory through the heap manager, the heap manager would ask for a page from the virtual memory manager. It would mark that page as saying 10 bytes are being used, and the rest are free. So it does all that bookkeeping for you.

On top of the Windows heap manager, you have one or more other heap managers that are built on top of the heap manager. For example,
the C runtime heap sits on top of the heap manager. It adds some additional logic in as far as how it wants to manage memory. There are other heap managers out there, third party heap managers. And there's also a default process heap that every single process in Windows will always have.

And finally, you have the applications. Most commonly, the applications simply use the C runtime heap by invoking new or malloc. They can also use the default process heap. And the application is free to use the virtual memory manager directly if it so chooses.

Windows Heap Manager Overview

Now that we understand the high level overview of the Windows memory architecture, let's focus in on the heap manager itself. At a high level, the heap manager consists of two components. One is the front end allocator, and one is the back end allocator.

Now, the front end allocator is nothing but a table known as a lookaside table. And it contains up to 127 entries. Each entry corresponds to free heap blocks of a certain size. So at entry one, the lookaside table stores all the free blocks of size 16 bytes. At index 2, all the free blocks of size 24, all the way up to index 127, where it stores all the free blocks of 1024 bytes.

Now, when your application makes a heap allocation, the request always comes to the front end allocator first. And what the heap manager does is very quickly find out, based on the number of bytes you request, which index that corresponds to. It goes into that index and sees whether there are any free blocks available in that size. If there are, it simply pops one out of the list and returns it to your application. So it's a very, very quick and efficient way of getting at memory. If, however, there are no free blocks of that particular size, the allocation then travels on to the second component, which is the back end allocator.

Now, much like the front end allocator, the back end allocator contains a table. It's also known as the free lists.
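The front end lookup described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the heap manager's actual implementation: the index formula (block size divided by 8, minus 1) is inferred from the sizes quoted in the talk (16 bytes at index 1, 24 at index 2, 1024 at index 127), sizes that are not a multiple of 8 are simply rejected rather than rounded up, and the names lookaside_index and FrontEndAllocator are hypothetical.

```python
from typing import Dict, List, Optional

GRANULARITY = 8      # sizes quoted in the talk step by 8 bytes (16, 24, ..., 1024)
FIRST_INDEX = 1      # index 1 holds 16-byte blocks
LAST_INDEX = 127     # index 127 holds 1024-byte blocks


def lookaside_index(block_size: int) -> Optional[int]:
    """Return the look-aside index for a block size, or None if no entry covers it."""
    if block_size % GRANULARITY != 0:
        return None  # assumption: this sketch does not round sizes up
    index = block_size // GRANULARITY - 1
    return index if FIRST_INDEX <= index <= LAST_INDEX else None


class FrontEndAllocator:
    """Toy look-aside table: one list of free block addresses per index."""

    def __init__(self) -> None:
        self.table: Dict[int, List[int]] = {
            i: [] for i in range(FIRST_INDEX, LAST_INDEX + 1)
        }

    def cache_free_block(self, address: int, block_size: int) -> bool:
        """Park a freed block in the table; False means the size has no entry."""
        index = lookaside_index(block_size)
        if index is None:
            return False
        self.table[index].append(address)
        return True

    def allocate(self, block_size: int) -> Optional[int]:
        """Pop a cached block of exactly this size, or None to signal a miss."""
        index = lookaside_index(block_size)
        if index is None or not self.table[index]:
            return None  # a real request would travel on to the back end allocator
        return self.table[index].pop()


front_end = FrontEndAllocator()
front_end.cache_free_block(address=0x1000, block_size=24)
hit = front_end.allocate(24)    # served from the list at index 2
miss = front_end.allocate(24)   # that list is now empty, so this is a miss
```

The point of the design is that a hit costs one index computation and one pop, which is why the request is tried here before anything more expensive happens.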
The free list at index 2 contains free blocks of size 16, index 3 of size 24, all the way up to 127, which contains a list of free blocks of size 1016 bytes. Now, what's different between the free lists and the front end allocator's table is index 0. The free list here at index 0 contains allocations of variable size. These are sorted by size, so if your allocation doesn't fit in any of these other slots in the free lists, it will then go to the variable size one and try to find one from there. Now, the back end allocator also contains one or more segments, which we'll talk a little bit more about later. But the segments really represent the memory that's available to the heap.

Heap Segments

Now, when an application requests memory from the heap manager, the heap manager in turn requests memory from the virtual memory manager. But in order to avoid having to take too many round trips to the virtual memory manager, the heap manager asks for a larger chunk of memory. And that chunk is known as a segment. From this segment, the heap manager then hands out allocations as the application requests them. Each segment contains what are known as heap blocks. So when your application requests 10 bytes of memory, the heap manager will return those 10 bytes, and that's now considered to be a heap block. The structural integrity is maintained by the heap manager itself.

The segment is represented by a symbolic name, _heap_segment. It's perfectly valid to use in the debugger. If you have a pointer to one of these segments, you could use the DT command to get the data.

Now, if a memory allocation comes in, and there's no available memory in the segment or segments that the heap manager already has, it then makes another call to the virtual memory manager to create another segment.

Here's an example of a heap segment structure. We have a free block, which will also be on the free list. And you have a busy block.
And that's the block that your application is using, followed by another busy block, and then an uncommitted range. Now, when the heap manager allocates another segment of memory, it doesn't commit all of that memory. It commits a small portion of it. And then as that portion gets exhausted and it hits the end, or the beginning of the uncommitted range, it then commits some more memory and uses that as the application requests memory.

Heap Blocks

Now that we know what a heap segment is, let's look at what a heap segment contains. Well, the heap segment contains heap blocks. And they're really the basic allocation unit of the heap manager. So when your application requests memory, the memory that's returned is called the heap block. All the heap blocks must be contained within a segment. And each heap block also contains what's known as allocation metadata. Now, the allocation metadata is a notion that allows the heap manager to effectively manage the memory. So for
example, a heap block's allocation metadata may contain the size of the block and the size of the previous block. With just those two pieces of information, the heap manager can start at the beginning of the segment and traverse every single block that's part of that segment.

So the heap block consists of three pieces: the pre-allocation metadata, the user accessible part, and the post allocation metadata. The user accessible part is the part of the heap block that your application can write to. And that will be the size that the application requested.

If we take a closer look at the pre-allocation metadata, it's going to contain things like current size, previous size, segment index, flags, an unused field, and a tag index. The three fields in this metadata that are most commonly used and looked at are the current size, the previous size, and the flags. The flags simply tell what the state of the block is. So if the block is, for example, being used by an application, then the flags would say that it's a busy block. If the application has already freed it, well, then it's known as a free block. Following that pre-allocation metadata, you have your user accessible part.

The post allocation metadata also contains some fields. For example, the suffix bytes, the fill area, and heap extra. The tools we'll be discussing a little bit later on can use this post allocation metadata, more specifically the fill area. A tool will fill out that area with a certain pattern. So if your code for some reason has a bug in it, so that it doesn't respect the boundaries of the allocation and goes beyond the end of it, it will start writing over this particular fill pattern. Well, the next time the heap manager is able to go and validate the integrity of it, it's going to notice that something has changed there, and then it's going to essentially tell you about it by breaking into the debugger.

Heap
Coalescing

One very common problem that virtually all memory managers suffer from is that of memory fragmentation. At a high level, memory fragmentation refers to there not being enough contiguous space to satisfy a memory allocation. There may be enough overall space, total space. It's just that it's not contiguous, and that happens to be a prerequisite.

Now, the Windows heap manager uses a strategy known as heap coalescing to minimize heap fragmentation. And heap coalescing basically means that the heap manager merges adjacent free blocks into one larger block to avoid having a lot of small memory allocations sitting around. By doing so, it increases the chances of another memory allocation succeeding based on that bigger block.

For example, say we had a segment that consisted of three blocks: one of size 16 that's free, one of size 32 that's busy, and one of size 16 that's free. Now, if the application freed the memory block of size 32, we would end up with three free blocks: size 16, 32, and again 16. If the application then requested 64 bytes, well, there isn't one single block that has 64 bytes available in size. Rather, it's got 16, 32, and 16, and an out of memory condition will ensue, even though there is enough total space. Well, what the heap manager does instead of keeping those three blocks is this: when the allocation of size 32 is freed, it notices that it's got three blocks that it could coalesce into one big one of size 64. Now, when the application allocates 64 bytes, there is enough contiguous memory.

Low Fragmentation Heap

To further reduce heap fragmentation, the low fragmentation heap was introduced. It's actually been available since Windows XP, but it wasn't turned on by default, and applications had to explicitly enable it. Starting with Vista, the low fragmentation heap is on by default.
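Going back to the coalescing example above (blocks of 16, 32, and 16), the effect can be reproduced with a small Python sketch. It is an illustration only: a segment is modelled as a plain list of blocks, and the names Block, free_without_coalescing, free_with_coalescing, and largest_free are hypothetical and do not correspond to anything in the Windows heap manager.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    size: int
    busy: bool


def free_without_coalescing(blocks: List[Block], index: int) -> List[Block]:
    """Mark one block free and leave its neighbours untouched."""
    result = [Block(b.size, b.busy) for b in blocks]  # copy, keep the input intact
    result[index].busy = False
    return result


def free_with_coalescing(blocks: List[Block], index: int) -> List[Block]:
    """Mark one block free, then merge every run of adjacent free blocks."""
    merged: List[Block] = []
    for block in free_without_coalescing(blocks, index):
        if merged and not merged[-1].busy and not block.busy:
            merged[-1].size += block.size   # grow the previous free block
        else:
            merged.append(block)
    return merged


def largest_free(blocks: List[Block]) -> int:
    """Size of the biggest contiguous free block, 0 if none is free."""
    return max((b.size for b in blocks if not b.busy), default=0)


# The segment from the example: 16 free, 32 busy, 16 free.
segment = [Block(16, False), Block(32, True), Block(16, False)]

fragmented = free_without_coalescing(segment, 1)
coalesced = free_with_coalescing(segment, 1)

can_fit_64_fragmented = largest_free(fragmented) >= 64
can_fit_64_coalesced = largest_free(coalesced) >= 64
```

In both cases the total free space is 64 bytes. Only the coalesced layout has it in one contiguous block, which is what a 64 byte request needs.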
Now, the way that it attempts to minimize fragmentation even more is that it organizes allocations into buckets of the same size, thereby reducing the risk of fragmentation. The blocks are housed in sub segments on the back end allocator's segments. And one caveat is that the allocation metadata for each of the heap blocks on the low fragmentation heap is scrambled for security reasons. The good news, though, is that the debugger commands for viewing the heap blocks actually understand and know how to descramble that and show you the proper values.

Tools for debugging heap corruptions

Since heap corruption is such a tough problem to debug and troubleshoot, what are some of the tools that we have available at our disposal to help us with that? Well, again, as mentioned before, the goal for heap corruption is really to break when the corruption occurs and not after. If you're able to achieve that goal, heap corruption becomes a lot easier to debug.

PageHeap is a tool that can help us with that. And the way that PageHeap works is that it annotates the heap blocks to trigger a fault when the write, that is, the heap corruption, is actually taking place. Now, PageHeap comes in two flavors. There is light PageHeap, which
uses fill patterns. And by using fill patterns, the heap manager is able to detect whether that well known pattern has been overwritten or not. The problem, however, with the fill patterns is that the heap manager doesn't constantly check the integrity of the heap. It really only has an opportunity to do that when a new allocation is requested or, for example, when freeing a block of memory, because those are the only entry points that you have into the heap manager for it to do something. So you may still be in situations with light PageHeap where it will break into the debugger, but it's still after the actual heap corruption occurs.

The second type of PageHeap is called full PageHeap. It also uses fill patterns and something that's known as guard pages. Now, a guard page, what that means is that the heap manager will take one page of memory and append it to every single heap block that it has created. And it sets that page of memory to be inaccessible. So we have this heap block now that we've allocated. Under full PageHeap, the heap manager has attached an inaccessible chunk of memory to the end of it. Now, when our code that's doing the heap corruption is writing past the end of that buffer and hits that inaccessible piece of memory, an access violation will occur right away. So there is absolutely no gap between when the heap corruption happened and when the symptoms actually came out. And that is exactly what you want to achieve.

Now, there is a downside with full PageHeap, and that is, because every single heap block will have a page of memory attached to it, it is very, very memory intensive.

Demo: Debugging a heap corruption manually

All right. So let's run through a few demos after all this theory on how to actually go about debugging heap corruption. This is going to be a series of three demos debugging one and the same application, but with three different techniques. And we're going to start by actually manually debugging a heap corruption.
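Before the demos, the difference between light and full PageHeap described above can be made concrete with a toy Python model. Everything in it is an assumption made for illustration: the fill byte 0xAB, the 8 byte suffix, and the class names are hypothetical, and a Python exception stands in for the access violation that a real guard page would raise.

```python
FILL = 0xAB        # hypothetical fill byte, chosen only for this sketch
SUFFIX_LEN = 8     # hypothetical length of the fill area after the block


class GuardPageFault(Exception):
    """Stands in for the access violation raised by an inaccessible page."""

    def __init__(self, offset: int) -> None:
        super().__init__(f"write at offset {offset} hit the guard page")
        self.offset = offset


class LightPageHeapBlock:
    """User bytes followed by a fill pattern that is checked only on demand."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.memory = bytearray(size) + bytearray([FILL]) * SUFFIX_LEN

    def write(self, offset: int, data: bytes) -> None:
        # Nothing is checked here: an overrun silently lands in the fill area.
        self.memory[offset:offset + len(data)] = data

    def suffix_intact(self) -> bool:
        # Models the check that runs when the heap manager is next entered.
        return all(byte == FILL for byte in self.memory[self.size:])


class FullPageHeapBlock:
    """User bytes followed directly by an inaccessible region."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.memory = bytearray(size)

    def write(self, offset: int, data: bytes) -> None:
        for i, byte in enumerate(data):
            if offset + i >= self.size:
                raise GuardPageFault(offset + i)   # faults at the offending write
            self.memory[offset + i] = byte


light = LightPageHeapBlock(size=10)
light.write(0, b"0" * 12)            # two bytes too many, no error at write time
light_ok = light.suffix_intact()     # the damage is only found at this later check

full = FullPageHeapBlock(size=10)
try:
    full.write(0, b"0" * 12)
    fault_offset = None
except GuardPageFault as fault:
    fault_offset = fault.offset      # first byte past the end of the block
```

The light variant reports the overrun only when something asks it to check, while the full variant stops at the exact write that crosses the boundary. That gap is the one the following demos explore.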
The application that we're going to use is called 06overrun. And that takes an argument. I'm going to specify something random. And when I run it, we notice that it crashes out.

Now, if we suspect a heap corruption, let's attach a debugger to the application. So I'll start it again, and this time, I'm going to start it in the background. And then I'm going to attach the debugger. This is an x86 application. I'm going to attach by process name, fix up my symbols, and resume execution. Now, I'll switch back, and then just continue running.

So what happened in the debugger here? What happened is that it broke in, and it tells us that an access violation has occurred. Now, if I do a K command to get the call stack, what I see is our 06overrun is in its main function. It calls some function called dupe string, which calls into HeapFree, presumably to free some memory, and it eventually crashes out with an access violation. So this is kind of an example of a heap corruption where the symptom, the access violation, is actually not where the heap corruption took place, where something overwrote memory that it wasn't supposed to.

What we can do here if we want to manually debug this is use a command called bang heap (!heap), which allows you to drill down in the heap quite a bit. And I'm going to first use the -s switch, which says give me just a quick, statistical dump of all the heaps in this process. And what you'll see at the bottom is a list of heaps. We have a heap with this heap handle, another one, and then finally a heap with this particular heap handle. So we have three heaps. Now, this application isn't creating any new heaps. So it's just using the default process heap.

I can get more information about that heap by doing bang heap -a followed by the heap handle. And really, what this does is walk that heap and tell you every single heap block that's there. And I scroll up just a little bit, and you'll see that we have heap
entries for this particular segment, 00. This is a very small application, and it doesn't use more than one segment. And really, these are the heap blocks that we were talking about that are contained within the segments themselves.

This first entry is the address of the heap block. It's not the address of the user accessible part of it, but it's the address of the pre-allocation metadata. So if you wanted to dump out the memory in the user accessible part, you're going to have to take this address and add 8 bytes to it because that's the size of the pre-allocation metadata. This is the size of the previous block. Well, in this case, it's zero. This is the first block in the segment. And this is the current size.

Now, as you can see, when the heap manager goes to the next one, the previous size is 588, which matches up with the current size of the previous block, and the current size is 240, etc. If we scroll down all the way to the bottom, we notice that we have a previous size of 28 that matches up. We have a previous size of 30 that matches up. And then all of a sudden, we have a previous size that is looking relatively strange. And it certainly doesn't match up with the current size.

So one of the theories now is that somehow, something overwrote the pre-allocation metadata of this block, which should contain the size of the previous block. A reasonable theory would be that somehow, someone that owned this particular block continued writing past the end of the block, thereby overwriting the pre-allocation metadata of the next block.

Well, let's take a look and see if we can find out what's in this particular heap block. I can do a DD for dump
double word. That address plus 8 bytes, because we want to see the user accessible part. We want to kind of see if we can extrapolate the data in there. And what we end up seeing is a repeated pattern of 0030, 0030. Now, a Unicode string kind of has this pattern. So instead of using DD, I can use DU for dump Unicode on the same address, and now we see something different. We see a whole bunch of zeroes. And if we remember back to how we actually executed this application, on the command line, we put in a bunch of zeroes.

So now, the theory is that whoever owns that piece of memory puts the command line into it, but it doesn't respect the size of that memory block and continues writing past that. So the next part of this root cause analysis would then be to go back and see in the source code where this argument that's passed is used, and then try to figure out where a block is overwritten by it. So this is kind of the manual way of doing a heap corruption debug session.

Demo: Debugging a heap corruption using light pageheap

In the last demo, we manually debugged a heap corruption, and it was somewhat painful because you have to manually track the heap and all the heap blocks and try to figure out where the integrity of the heap has been violated. And from there, try to extrapolate what the data was. Well, let's use a tool this time and see if that can help us get to the root cause a little bit faster. And what we're going to use is called light PageHeap.

Now, light PageHeap is actually enabled through another application called Application Verifier. And that tool is available to download for free off of Microsoft.com. You can simply search for Application Verifier. And you have a link here for download details. What's critical here is to make sure you download the right architecture. So we've got AMD64, x86, and then Itanium. Download and install is very straightforward. And when I run Application Verifier, it tells me that an instance is already running.
So let me bring that one up instead. It simply shows you an empty window for the applications that are enabled as far as the tests. So the first thing you want to do is tell Application Verifier which application you're interested in monitoring. And you can go to File, Add Application, and in our case, it's still 06overrun.exe. And then, on the right hand side, a bunch of tests appear. The one that you're interested in for heap corruptions is under the Basics node, and it's called Heaps. So I'm going to go ahead and check that. I'm also going to right click, which allows me to choose a few more options, and then click Properties. And you'll notice here that it says full. That means that full PageHeap is enabled. But what we really want to do here is enable light PageHeap. So you just click the full off, click okay, and then save. And that's all you've got to do.

Now, let's go into the folder where our application is located. And we're going to run the application that performs the heap corruption under the debugger, specifying an argument. Fix up my symbols. And then it asks me to press any key to start. So I hit enter, and what you'll see now is a verifier stop. So this is Application Verifier going in and checking various aspects of the heap. And if it finds a fault or a problem, it's going to report that to you. And then it says that it's a corrupted suffix pattern for a heap block. It tells you what the heap handle is, the actual block that was involved in the operation, and the size of the heap block.

Now, if I do a K to get the call stack, what we see is that our application is calling into a function called dupe string, which then calls into HeapFree. Well, HeapFree itself can't cause a heap corruption. So we know that a heap corruption has already occurred, but it's only being reported when HeapFree is being run. And the reason for that is the way that light PageHeap works, which is that at the end of each block, it appends a known fill pattern.
And if you have a heap corruption, that fill pattern is going to get overwritten. And the next time the heap manager gets an opportunity to do anything, it will check the fill patterns, and if they're not what it expects because they've been overwritten, it will then break and give you this message. So we haven't really gotten to the point yet where we can say that the debugger breaks at the same point where the heap corruption happens. Now, this gives us some amount of information, because we know that we're in 06overrun, in DupString. And that may be sufficient when it comes to code review. But if it's not, we have to employ a more powerful tool, which I'm going to show you in the next demo.

Demo: Debugging a heap corruption using full PageHeap

So far, we've seen two demos. One was a manual debug session of a heap corruption. And the second one was using a tool called light PageHeap to see if we could debug and find the root cause. Well, both approaches were actually useful in the sense that they gave us some information, but they still did not take us to our ultimate goal, which is to break in the debugger as soon as the corruption happens and not afterwards. And what we're going to do now is use this final tool, called full PageHeap, to see if we can achieve that. PageHeap, again, is enabled through Application Verifier. So I'm going to switch over to that. And I'm going to go over to Heaps, right
click Properties, and select Full. That will enable full PageHeap for the process. I'm going to launch 06overrun again, which is the application that corrupts the heap, fix up my symbols, and resume execution. It asks me to press any key to start. And this time, we get an access violation. That's kind of interesting, because it seems like we're taking a step back. In the manual debug session, we got an access violation. It didn't give us a whole lot of detail. In the light PageHeap case, we didn't get an access violation. We got a nice message saying that a heap block was corrupted. And now, with full PageHeap, we're back to the access violation.

Well, let's get the call stack and see what that looks like. And this actually looks a whole lot better. We're not crashing in the heap manager somewhere. As a matter of fact, we're crashing very, very close to our application itself. So we have our 06overrun main calling into DupString, which does a string copy, and that's where the crash happens. This is an illustration of the power of full PageHeap. Because full PageHeap appends an entire inaccessible page to each heap block, if you try to write past the boundaries of your memory, you will hit that page of inaccessible memory and access violate right away. So now, it's really a matter of going into DupString, because we are very certain that that's what's corrupting the heap, and figuring out what the problem is. Full PageHeap: very, very powerful.

Summary

This concludes the heap corruption module. And what we've done in this module is we've looked at what a heap corruption is. We got a good overview of the Windows memory architecture. We took a deep dive into the Windows heap manager internals, talking about the front end allocator, the back end allocator, its segments, heap blocks, etc. And we also took a look at the tools that are available to us to make debugging heap corruptions easier.

Resource Leaks

Introduction and Overview

Hi.
I'm Mario Hewardt, and welcome to the third module, resource leaks, of the advanced Windows debugging course. In this module, we will take a look at what a resource is and what a resource leak is. We'll look at some common categories of resources, one being handles, another being heap memory, and then the tools that we have at our disposal for efficient leak detection and resolution. And finally, we'll also look at some of the pre-emptive strategies that you can put in place to ensure that a leak doesn't crop up in the future.

So let's begin by defining what a resource is. At a very, very high level, it's anything in the system that occupies space. And space usually comes in the form of memory. Resources can come in many forms. Handles are one example. For example, if I call CreateFile, a handle will be returned to me. Synchronization primitives, critical sections, for example. Heap memory, which is probably the type of resource most commonly associated with leaks. Virtual memory, and many, many more. Even though we have what appears to be a lot of different types of resources, don't let that abstraction fool you, because at the end of the day, it all boils down to memory.

One of the problems with having these different types of resources available to us is that there are different APIs for acquiring them and for releasing them. For example, for handles, I've got to make sure that I call CloseHandle to free them. With a synchronization primitive, I've got to make sure I call DeleteCriticalSection, for example. So we have different sets of APIs depending on the resource that we're working with. But underneath the covers, the resource really boils down to memory.

So what is a resource leak? It's when an owning process acquires a resource but fails to release it. For example, if I call CreateEvent, and I assign that handle to a variable, and I don't call CloseHandle on it, we will have a leak. Similarly, for COM, I call QueryInterface.
And if I don't call Release on that interface pointer, there will be a leak. Same for heap memory. Now, the key thing here is that most commonly, a resource leak is confined to an address space, or to a process. But not always. There are ways to create resources that span multiple processes, and it's something to be aware of in case you really can't find why a resource isn't being freed when you're absolutely positive that within your own address space, within your own process, you're doing it correctly.

Lastly, it's just evil, especially when it's not easily reproduced. The most frustrating thing about resource leaks is when you have one and the same application, and you feed it the same set of data twice, only to have it leak really badly the first time and not at all the second time. Those are the types of problems that are really tough to debug.

So why are resource leaks bad? Well, whether we like it or not, resources are limited. If a process leaks a lot of memory, it's taking away memory from other parts of the system, and eventually something has
got to give. Availability can go down. If you leak a lot of memory, you may end up swapping a lot of data back and forth between the page file and memory. And that can cause availability to go down. In a similar fashion, if you're doing a lot of swapping because of a leak, performance is going to go down, and that's going to have a net effect on the bottom line. Just like with any other bug, the cost to fix is high after release.

Misconceptions and symptoms

Top resource leak misconceptions.

"Windows cleans it up, so I don't have to." It is true that when a process terminates, Windows will clean up the resources that the process was using. It's not something that I would rely upon, though, and here's the reason why. Say you have a piece of code in a command line application that's meant to run just for a few seconds, do its work, and finish. You may have a resource leak in there and decide not to do anything about it, because the intention was for this code to only be run in the context of this console application. Well, down the line, someone else decides that they want to borrow that piece of code. But instead of putting it into a short-lived command line application, it gets put into an NT service, which runs 24/7, and the leak just continuously accumulates. So it's always good practice, even though Windows does clean up for you, to make sure that all resources are freed.

"I'll just recycle the process if it gets out of hand." There are a few technologies out there that take this approach. If memory grows too high, they kill the process and immediately restart it. One of the problems with this approach is that it is very, very difficult, unless you're entirely stateless, to ensure that a process can be torn down at any given point in time without affecting its state. And it's very, very hard to properly create an application that can recover from something like that.

"The code I was calling was supposed to clean it up."
It doesn't happen all that frequently, but there are some API contracts where you pass in a buffer that you allocate, the API does some things to the buffer, and then that API actually frees it.

So what are some of the symptoms of resource leaks?

Well, the quantity of any given resource keeps increasing over time. For example, the handle count keeps going up, virtual memory usage goes up, heap memory goes up. One thing to be wary about is caching, which a lot of applications utilize. It may be that the application caches a lot of these resources only to use or reuse them later. Those obviously are not memory leaks, unless there is a bug in the cache. So just because memory grows, don't immediately assume that it's a leak per se. What I typically do if I see memory grow is get it to a point where it's grown quite a bit, and then let the process sit idle. Don't do anything at all with the process. And typically, caches will start purging themselves after a while of inactivity.

System becomes sluggish. Well, if you're using a lot of memory, and you're thrashing on the page file, that will slow the system down.

Sporadic failures. If the process is leaking memory and eventually gets to a point where, when it allocates memory, it gets an out-of-memory error code back, it is very, very hard to properly unwind the application when there is no free memory left. And so what happens a lot of times is that applications will just flat out crash, whether it be with an access violation or something else. And what you see are sporadic failures.

Inability to start new applications. If the overall system commit limit has been reached, then Windows will actually stop you from starting new applications.

Processes flat out die. That's another very common symptom of a resource leak.

Handle Overview

Now, let's turn our attention to a very common type of resource in Windows, and that's a handle.
A handle is nothing more than a user mode identifier of a Windows object type, or a kernel type, if you will. Examples of Windows kernel object types are file objects, process objects, and thread objects. And the user mode code uses the handle to interact with these operating system objects. And really, the best way to view it is as an isolation layer between the user mode code and the underlying kernel. So rather than allowing user mode code to directly manipulate its data structures, such as a file object, the kernel introduces this level of abstraction so that it can tightly control what operations are performed from user mode. If it didn't do that and you had a bug in the user mode code, it could literally take down the kernel, in which case the entire machine goes down.

Here's an illustration. Look at the top part here. That's our user mode code. And we call this CreateEvent API. What that does is return a handle to a kernel mode event object to user mode. Now, when CreateEvent is executed, it first goes through this abstraction layer, the Win32 API. That in turn thunks down into kernel mode. And the kernel knows which user mode process is actually requesting it. And it looks up the EPROCESS data structure that it keeps internally. The EPROCESS data structure has a pointer to a table. Now, this table has a reference count, an object count, and then an object field. Now, the reference count is simply saying how
many references to this particular object we have. In our case, since we're creating a new event, the reference count will be bumped up to one. The object field simply contains the address of the event data structure in the kernel. Now, since events, like all kernel objects, are reference counted, it means that someone else could call CreateEvent and request the same event that was created before, in which case the reference count would go up to two. Now, when that piece of code subsequently calls CloseHandle, the reference count goes to one. And then the original piece of code that created the event to begin with calls CloseHandle. The reference count goes to zero, at which point the underlying kernel mode object is freed.

Tools for debugging handle leaks

If we suspect that we have a handle leak in the process, what are some of the tools that we can use to make it easier to debug that particular problem? Well, Task Manager is a great tool for some initial triaging. It's available out of the box in Windows. There's even a shortcut to it: Control+Shift+Escape. And what it tells you is the total handle count for your process. And this is a great clue that you can get by watching a process and seeing that the handle count keeps going up, up, and up and doesn't really come back down. That could be a key indicator that you've got a handle leak in your process.

The next tool, which gives you more information, is called Process Explorer. And that shows the handle count as well. On top of that, it tells you the type of each handle. So for example, is it a file? Is it a mutant, a section object, etc.? It also gives you the name of the handle and the handle value itself, and it can come in very handy when figuring out what type of handle is being leaked. For example, you use Task Manager. You see that the handle count goes up to 3,000. You now use Process Explorer, and what that tells you further is that 99 percent of those are thread handles.
Well, based on that knowledge, you can now go into your code base, look at the different spots where you're using threads, and figure out why they're not properly being disposed of or closed. But it may turn out that your code base is really, really big, or that you're using threads all over the place, and it's not an easy task to narrow it down with just a code review.

There is one more tool, and it's a command in the debuggers themselves. It's called bang htrace, written !htrace. !htrace will tell you not only what Task Manager and Process Explorer can tell you, but it will also give you the exact call stack where the handles were created and not closed. And armed with that knowledge, you can pinpoint directly in your code where the offending handle was created but not closed.

Demo: Debugging a sporadic handle leak

Let's take a look at an interesting demo of an application that's leaking handles. Now, what makes this application interesting is that, even with the same input, this app is going to leak a different number of handles every time it's run. So I'm going to start by running it. It's 09HLeak. It takes a few arguments. The first one is the number of threads that you want the application to run. So let's pick 20. The next argument is the number of iterations per thread. I'm going to set that to 10. And then the sleep time in between each iteration, and we'll just do zero here.

Now, before we press any key to start the stress application, I'm going to bring up Task Manager, Control+Shift+Escape. And I go to the Processes tab and find the name of my process. It's right here. By the way, the *32 next to the process name means that it is a 32-bit application. And I am running this on a 64-bit machine. Now, if we wanted to find out what the handle count is, it's going to be one of the columns up here. By default, Task Manager will not show the handle count. And the way to enable that is to go to View, Select Columns, scroll down, and pick Handles.
Click OK, and we see that a Handles column was added. And for our application, right now it's 21. So we know that the application, right at startup, has 21 handles. If I switch back to the window and press Enter to start, we see that the handle count all of a sudden goes up to 104. Now, I can choose to let this sit here for a bit to convince myself that this is not some sort of a caching thing. But since the app is about ready to exit, I'm going to just safely assume that it's not. So I'm going to exit the app.

And let's run it again. At startup time, 21 handles, which is the same as last time. And now, the handle count is 124. So it's actually leaked a few more handles this time, about 20 of them. And if we keep running this application, we're going to see different behaviors every single time with the same input.

So how do we debug this? Well, I'm going to launch the app once again. I'm going to run it in the background. It brings up this window. And I'm going to attach a debugger to it. Remember, it's an x86 application, which means I have to run the x86 version of the debuggers, and I'll use a process name this time. And now, we're attached. Fix up my symbols, as we always do. I'm going to resume execution. Actually, before we resume execution, the command that we're going to run to track handle leaks is !htrace. So if I simply just run that, it tells
me that handle tracing is not enabled for this process. Use !htrace -enable to enable it. So I'm going to go ahead and do that. And it tells me that handle tracing is enabled and that a snapshot was successfully taken. So right now, the command has taken a snapshot of the handle activity in the process. Now, let's resume. And I'm going to press any key. It's done its work. Before it exits, I'm going to go back to the debugger, break in using Control+C, and I'm going to run !htrace again. So !htrace at this point took another snapshot of the handle activity and outputs very detailed information about it.

So let's go ahead and scroll up to the top. Right here. This is where we ran our most recent !htrace. The output is broken out on a per-handle basis. So here, we've got one handle followed by another handle, etc. For each of the handles, it tells you the handle value, the state of the handle, here that it was closed, the thread ID that closed it, as well as the process ID. And then comes perhaps the most useful information, which is the call stack that led up to the handle being closed, in this case. And I can scroll through all of these handles. Let me scroll down a little bit quicker. We find an open handle somewhere. And we've got an open handle and its associated call stack.

So this is all really great information, because it tells you exactly when handles are opened and closed and the call stacks of when those operations happen. The problem here, though, is that you would actually have to go in and look at every single open handle with a particular handle value and see if there is an equivalent close somewhere. If you're only talking about a handful of handles, perhaps not such a big deal. But if you're looking at thousands upon thousands of handles, that's a very, very long process and is very error prone. Fortunately, !htrace has this magic switch called -diff. And it does all that work for you.
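To show the kind of bookkeeping that matching opens against closes involves, here is a small Python sketch. It is not the !htrace implementation and it does not parse debugger output; the HandleEvent record, its field names, and the outstanding_opens function are hypothetical names chosen for this illustration.

```python
from collections import namedtuple

# One recorded handle operation. The field names are hypothetical.
HandleEvent = namedtuple("HandleEvent", ["handle", "op", "stack"])


def outstanding_opens(events):
    """Return OPEN events that have no later CLOSE for the same handle value."""
    open_events = {}  # handle value -> most recent unmatched OPEN
    for event in events:
        if event.op == "OPEN":
            open_events[event.handle] = event
        elif event.op == "CLOSE":
            # A close cancels the pending open; the handle value
            # may legitimately be reused by a later open.
            open_events.pop(event.handle, None)
    return list(open_events.values())
```

Given a trace in which handle 0x10 is opened, closed, and then opened again while handle 0x14 is opened once, the function returns the second open of 0x10 and the open of 0x14, each with the call stack that created it. Doing this by eye across thousands of entries is exactly the tedious, error-prone step the narration describes.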
Now, what we have is a list of only the open handles that have not been closed in the process. And if I look through here and take the bottommost one, I look at the call stack. We have our own 09HLeak. It's our application. ThreadWorker probably means that it spawned a thread. That calls into the CServer GetSid function, which calls into GetToken, and then eventually calls into OpenProcessToken. So OpenProcessToken will return a handle to a token. For some reason, the application is not closing it. Well, now we've really got this pinpointed exactly to the piece of code that does the open, and it's relatively trivial to go back and do a more scoped code review to figure out why the handle isn't being closed. As a matter of fact, if I wanted to convince myself that most of the handles that were leaking are due to the same call stack, I can just scroll up a little bit. Here we see another one. GetSid, GetToken, OpenProcessToken. GetSid, GetToken, OpenProcessToken. And for the most part, that is the consistent theme in this application. That's the piece of code that is leaking that handle.

Windows Heap Manager Overview

Let's turn our attention now to perhaps the most common type of resource leak, and that's memory leaks. In the previous module, the heap corruptions module, we already took a brief look at what the overall Windows memory architecture looks like, and I'm just going to do a quick recap of that. The Windows heap manager is the most commonly used allocator for user mode applications. The heap manager uses the virtual memory manager in order to more efficiently provide manageable chunks of memory. Remember, the virtual memory manager works on the basis of pages, so even if you were to allocate 10 bytes, the virtual memory manager would give you back a page of about 4K. The Windows heap manager sits on top
of the virtual memory allocator and makes that allocation pattern much more efficient, so that when you request 10 bytes, you will actually get back 10 bytes.

Tools for debugging heap leaks

Tools for heap memory tracking. The first tool is called UMDH, and it's part of Debugging Tools for Windows. And much like !htrace, which we can use to track handle leaks, UMDH does a very similar thing. It tracks heap-based memory allocations. And furthermore, it will track the call stacks of when the memory was allocated, so that it pinpoints concretely who in your code allocated the memory. And from there, you can go and look to see why it wasn't freed. Now, one caveat about UMDH is that it requires operating system instrumentation to be enabled via a tool called GFlags, which is also part of Debugging Tools for Windows. In essence, the recording of call stacks when memory allocations occur really isn't a function of UMDH. It's a function of the operating system. And all that we're doing is telling the operating system to enable it so that UMDH can utilize it.

Another tool that's available is called DebugDiag. And that's a pretty powerful and automated debugger. And in addition to being able to track heap-based allocations, DebugDiag can support other types of allocators. For example, COM allocators, virtual memory allocators, etc.

Finally, if these automated tools are not able to give you the answer, you can use a debugger command called bang
heap, written !heap. And !heap really just allows you to walk the heap manually and gives you statistical information about the different blocks that exist, how big they are, etc. While it's a more manual, labor-intensive process, it is sometimes viable as a last resort when the other tools don't work.

Demo: Sporadic memory leak

Now, let's take a look at an example of an application that's leaking heap memory. Furthermore, much like in the previous demo on handle leaks, with the same input, this application will leak different amounts of memory. So the memory leak is a little bit sporadic. The application is called 09BasicMLeak. First argument, the number of threads that you want running. So I'm going to pick 20. The number of iterations per thread, I'll do 2,000. And the sleep time between iterations, which is going to be zero.

Now, if I run this and bring up Task Manager and find my process under the Processes tab, I'm looking for a column that's going to tell me how much memory my process is committing. And by default, that's not shown in Task Manager. What you've got to do is go to the View menu, Select Columns, and there's a Memory - Commit Size. Click OK. And it brings up the Commit Size column. At startup time, it looks like our application is using about 752K. Now, if I run this app, we can see that it is growing, and it keeps growing. It's about 2.6 megs, and now it's done. And it drops back down to 1.6 megs. So we've gone from about 750K to about 1.6 megs. So that's an increase that we'd probably want to take a look at to make sure that it's not a real, true leak of some sort.

Let's just run this one more time to see if there's any difference this time. So we start at 748K and run it. We see the increase slowly going up, 2.4, 2.5 megs, and then down to 1.6. So it is actually about the same this time around. Now, if we wanted to debug this memory leak, we're going to use a tool called UMDH. That's an awesome little tool to use for heap-based memory. And there are a couple of things to keep
in mind with UMDH. At a high level, it works on the same principles that the !htrace command does for handles. It's snapshot based. So what you want to do is run UMDH once before the leak starts, let the leak accumulate, and then run UMDH again. And then do a comparison of those two results to find out what allocations are still outstanding. So that's the high-level overview.

A couple of other caveats. UMDH will give you call stacks for all the memory allocations that are outstanding. But in order to get those call stacks, it relies on an operating system feature in Windows. So you have to make sure to enable that collection mechanism before you run UMDH. And the way to do that is with a tool called GFlags, and note here that I'm running the x86 version of the debuggers because 09BasicMLeak is x86. That brings up a pretty simple UI. Since we're only interested in setting some options for a specific application, we want to click the Image File tab. We put in the name of the application, without the path, and then we're able to choose from a whole bunch of options here. The one we're interested in is Create user mode stack trace database. So I'm going to select that and click OK. So now that part is done. We've told the operating system to collect.

The second part is the stack traces themselves. You more than likely want stack traces that are easily digestible, meaning that they have symbolic information. And since UMDH is a console-based app, you need to tell it where the symbols are located. And that's done through an environment variable. So you do set, and the environment variable is called _NT_SYMBOL_PATH, equals the HTTP address of the Microsoft public symbol server, ending in /download/symbols, and that's it for the Microsoft symbol server. We probably also want to give it the symbol path for our own application, 09BasicMLeak. And you can do that by separating it from the rest using a semicolon and then specifying your own symbol path. And that's it.
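The snapshot-and-compare principle described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how UMDH works internally, and it does not read UMDH log files; the function names snapshot_totals and diff_snapshots and the (call stack, size) input format are assumptions made for this example.

```python
from collections import Counter


def snapshot_totals(allocations):
    """Sum outstanding bytes and allocation counts per call stack.

    `allocations` is an iterable of (call_stack, size_in_bytes) pairs,
    where call_stack is a tuple of frame names (a hypothetical format).
    """
    bytes_by_stack = Counter()
    count_by_stack = Counter()
    for stack, size in allocations:
        bytes_by_stack[stack] += size
        count_by_stack[stack] += 1
    return bytes_by_stack, count_by_stack


def diff_snapshots(first, second):
    """Return (stack, byte_delta, count_delta) for call stacks that grew."""
    first_bytes, first_counts = snapshot_totals(first)
    second_bytes, second_counts = snapshot_totals(second)
    growth = []
    for stack, total in second_bytes.items():
        byte_delta = total - first_bytes[stack]  # missing stacks count as 0
        if byte_delta > 0:
            count_delta = second_counts[stack] - first_counts[stack]
            growth.append((stack, byte_delta, count_delta))
    # Largest net gain first, since that is usually the first stack to review.
    growth.sort(key=lambda item: item[1], reverse=True)
    return growth
```

Grouping by call stack rather than by individual allocation is the design choice that matters here: a leak that repeats thousands of times collapses into a single entry with a large net gain, which is the entry you would look at first.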
So now we're ready to actually run UMDH. Let's fire off the application, same arguments. I'm going to run it in the background. Okay. Now, we want to take our first snapshot using UMDH. UMDH works on the basis of process identifiers. You don't have the luxury of telling UMDH a process name. So the first thing we've got to do is bring up Task Manager and find the process identifier, which is 10800. UMDH, again, has got to be the x86 version because it's an x86 app, and -p: allows you to specify the PID. Now, UMDH outputs everything to the console. So what I typically end up doing, because it can get quite verbose, is simply redirecting that to a log file. So I'm going to call it first.log. That's it. It's now done the first snapshot.

So now, let's continue running the app all the way to the end. I have to wait here for a little bit while it's busy leaking. All right. And now, before it's ready to exit, we want to run UMDH again to get a second snapshot. And this time, we'll just change the name of the file to second.log. Obviously, the PID is still the same because it's one and the same application. All right. So now we've got these two logs.

The last bit of the puzzle here is that you want to tell UMDH: take my first log, compare it to the second log, and only output allocations that are still outstanding. And UMDH has a nifty switch for that. It's -d, for diff. Pass it the first log, pass it the second log, and redirect the results to some file, result.log. And now we're done. Result.log is just a plain old text file. So I'm going to open it using Notepad. The first part here is just some help. So if you want to kind
of get more in-depth details, you can take a look at this help. What comes after the help is really the critical part. You've got it broken down by memory allocation and by call stack. The way to read this is that the plus sign says that there's been a net gain. It's been a net gain of around 380K across 13,600 allocations with the following call stack. And the only other call stack we have here says it's a net gain of 32 bytes across 1 allocation with the following call stack. So the one I'm going to focus on here is obviously the one that seems to be leaking the most. If we look at that call stack, we see we've got 09BasicMLeak, which is our app. It's got some ThreadWorker, so it probably spawned off a thread. And then it goes into the CServer GetSid function, which calls operator new and ends up in the heap allocation. So this is the exact place in the code where we're allocating memory and not freeing it. And really now, it's just a matter of going into that particular piece of source code and seeing what the problem might be.

Preemptive Strategies

Now that you've spent weeks hunting down that elusive memory leak, are there things that we can do, or safeguards that we can put in place, to ensure that it won't happen in the future? As it turns out, one very good tool is called PREfast. It's available with the WDK free of charge, and it's a static source code analysis tool. And it's really, really good at doing an in-depth analysis and finding quite a few memory leaks. And fixing those at the point of static analysis is much, much cheaper than doing it later on.

Next, at select milestones, enable UMDH when you run through your tests and see what happens. UMDH can find a lot of memory leaks and give you almost pinpoint instructions on where to go and look in your code to fix the problem.

Lastly, we have memory wrappers. And that's really just about letting the compiler do the work for you. Imagine that you had a function that has a pointer to a block of memory.
At the end of the function, you're supposed to free that memory, but you forget to do so. Well, with constructs such as the shared pointer in C++11, you assign the ownership of that pointer to a class instance. And when that function exits and the class instance goes out of scope, the destructor automatically frees that memory for you. So it really pushes the responsibility of remembering to call free onto the compiler, which enforces it.

Summary

This concludes the resource leak module of the advanced Windows debugging course. What we talked about today: we did a high-level overview of what resources are and what some of the resource leak symptoms are, and then we jumped in and took a look at the two most common types of resources and leaks, namely handles and heap-based memory. In addition, we took a look at the tools that allow us to be much more effective when we try to track down these types of handle leaks and memory leaks. And finally, to make sure that we don't have to repeat the debug sessions later on, we looked at some pre-emptive strategies that are available to make sure this doesn't happen in the future.