Understanding Kernel Oops
Understanding a kernel panic and doing the forensics to trace the bug
is considered a hacker’s job. This is a complex task that requires sound
knowledge of both the architecture you are working on, and the
internals of the Linux kernel. Depending on the type of error detected by
the kernel, panics in the Linux kernel are classified as hard panics
(Aiee!) and soft panics (Oops!). This article explains the workings of a
Linux kernel ‘Oops’, helps to create a simple version, and then debug it.
It is mainly intended for beginners getting into Linux kernel
development, who need to debug the kernel. Knowledge of the Linux
kernel, and C programming, is assumed.
An “Oops” is what the kernel throws at us when it finds something faulty, or an exception, in the
kernel code. It’s somewhat like the segfaults of user-space. An Oops dumps its message on the
console; it contains the processor status and the CPU registers at the time the fault occurred. The
offending process that triggered this Oops gets killed without releasing locks or cleaning up
structures. The system may not even resume its normal operations sometimes; this is called an
unstable state. Once an Oops has occurred, the system cannot be trusted any further.
Let’s try to generate an Oops message with sample code, and try to understand the dump.
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>

/*
 * No MODULE_LICENSE() is declared; the kernel treats such a module as
 * proprietary, which is why the dump shows the taint flag P.
 */

/* Deliberately write through a NULL pointer to trigger the Oops. */
static void create_oops(void)
{
	*(int *)0 = 0;
}

static int __init my_oops_init(void)
{
	printk(KERN_INFO "oops from the module\n");
	create_oops();
	return 0;
}

static void __exit my_oops_exit(void)
{
	printk(KERN_INFO "Goodbye world\n");
}

module_init(my_oops_init);
module_exit(my_oops_exit);

obj-m := oops.o
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)
SYM=$(PWD)
all:
	$(MAKE) -C $(KDIR) M=$(PWD) modules
The value that follows "Oops:" in the dump is the error code in hex. Each bit has a significance of its own.
[#1]: this value is the number of times an Oops has occurred. Multiple Oopses can be triggered as
a cascading effect of the first one.
CPU 1
The Tainted flag points to P here. Each flag has its own meaning. A few other flags, and their
meanings, picked up from kernel/panic.c:
RIP is the CPU register containing the address of the instruction that is getting executed. 0010
comes from the code segment register. my_oops_init+0x12/0x21 is the <symbol> + the
offset/length.
Stack:
ffff88007ad4bf38 ffffffff8100205f ffffffffa03de060 ffffffffa03de060
0000000000000000 00000000016b0030 ffff88007ad4bf78 ffffffff8107aac9
ffff88007ad4bf78 00007fff69f3e814 0000000000019db9 0000000000020000
Call Trace:
[<ffffffff8100205f>] do_one_initcall+0x59/0x154
[<ffffffff8107aac9>] sys_init_module+0xd1/0x230
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
The above is the call trace: the chain of function calls that led up to the Oops.
Code: <c7> 04 25 00 00 00 00 00 00 00 00 31 c0 c9 c3 00 00 00 00 00 00 00
The Code is a hex-dump of the section of machine code that was being run at the time the Oops
occurred.
Next, add the symbol file to the debugger. The add-symbol-file command’s first argument is
oops.o and the second argument is the address of the text section of the module. You can obtain
this address from /sys/module/oops/sections/.init.text (where oops is the module name):
(y or n) y
Reading symbols from /code/oops/oops.o...done.
From the RIP instruction line, we can get the name of the offending function, and disassemble it.
Now, to pinpoint the actual line of offending code, we add the starting address and the offset. The
offset is available in the same RIP instruction line. In our case, we are adding
0x0000000000000038 + 0x12 = 0x000000000000004a. This points to the movl instruction.
References
The kerneloops.org website can be used to pick up a lot of Oops messages to debug. The Linux
kernel documentation directory has information about Oops: kernel/Documentation/oops-tracing.txt.
This, and numerous other online resources, were used while creating this article.
Tags: C, Debugging, Fedora, GDB, kernel aiee, kernel code, kernel development, kernel oops, kernel panic, kerneloops.org, LFY January
2011, Linux kernel, Loadable kernel modules, makefile, modprobe, processor status, segfaults, unstable state
Surya Prabhakar
The author is an engineering advisor in the Product Group at Dell India R&D
Centre, Bengaluru, and has eight years of experience in Linux. He spends most
of his time hacking and playing around with Linux.
All published articles are released under Creative Commons Attribution-ShareAlike 3.0 Unported License, unless otherwise noted.
LINUX For You is powered by WordPress, which gladly sits on top of a CentOS-based LEMP stack.