Linux Device Driver Design

D.W.Hawkins, California Institute of Technology

4/12/2006 12:30 PM EDT
This tutorial presents the author's practical experience with writing Linux device drivers to control
custom-designed hardware. The tutorial starts by providing an overview of the driver writing
process, and describes several example drivers provided with this tutorial [4]. The reader is
encouraged to experiment with those example drivers on their own x86 system, as it provides the
best learning experience.
The ability of a user-space process to transfer data from multiple PCI boards is contingent on the
implementation of both the hardware and driver. The requirements of both the hardware and
software are presented.
The drivers in this tutorial are written for the Linux 2.6 kernel. The drivers have been built against
2.6.9-11 (CentOS 4.1), 2.6.13, and 2.6.14 for x86 and PowerPC targets. Details that are clearly
described in the book 'Linux Device Drivers' [1], by Corbet, Rubini, and Kroah-Hartman are not
repeated in this tutorial, so the reader is encouraged to obtain a copy.
The Linux 2.6 kernel presents a number of generalized interfaces that the driver writer must first
understand, and then implement for their specific driver. The best way to understand the interfaces
is to write simple drivers that exercise a subset of the kernel driver interfaces. The following
sections describe the interfaces used to implement character device drivers.
Kernel modules
The file simple_module.c implements a very basic kernel module. A device driver is a kernel
module, but kernel modules are also used to add features to the kernel that have nothing to do with
device drivers. Welcome to your first generalized kernel interface.
The basic requirement of a kernel module is that it implements an initialization function and an exit
function. Those two functions are identified by the macros module_init() and module_exit(). The
example also shows how to pass load-time parameters to the module, and how to set up logging in a
module.
The code sets up two logging macros: LOG_ERROR() and LOG_DEBUG(). The debug macro can
be removed from the code at compile time (by not defining DEBUG), or can be compiled into the
code and then enabled or disabled via the load-time parameter simple_debug. This method of
adding log messages to code is easier to maintain (e.g., to disable) than a series of printk() calls
littered throughout the code.
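The logging scheme can be tried outside the kernel. The following is a minimal user-space sketch, not the code from simple_module.c: fprintf() stands in for printk(), an ordinary variable stands in for the simple_debug load-time parameter, and log_emit() and log_count are hypothetical names added only for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define DEBUG                /* remove this line to compile LOG_DEBUG() out */

static int simple_debug = 1; /* stand-in for the load-time parameter */
static int log_count;        /* messages emitted so far (illustration only) */

/* Hypothetical sink; a kernel module would call printk() here instead. */
static void log_emit(const char *level, const char *fmt, ...)
{
    va_list ap;

    assert(level != NULL && fmt != NULL);
    fprintf(stderr, "simple_module: %s: ", level);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    log_count++;
}

/* Always emitted, e.g. for the load and unload messages. */
#define LOG_ERROR(...) log_emit("error", __VA_ARGS__)

#ifdef DEBUG
/* Compiled in, but gated at run time by simple_debug. */
#define LOG_DEBUG(...) \
    do { if (simple_debug) log_emit("debug", __VA_ARGS__); } while (0)
#else
/* Compiled out entirely; the arguments are never evaluated. */
#define LOG_DEBUG(...) do { } while (0)
#endif
```

Wrapping the macro body in do { } while (0) lets LOG_DEBUG() be used safely as a single statement, for example in an if/else without braces.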
The following shows the driver usage; the // marks are comments, while the $ (user) and # (root)
prompts show the commands you enter (bash shell syntax).
So with the load-time parameter simple_debug set to zero, the LOG_DEBUG() message does not
appear in the output. The module load and unload messages are generated using the
LOG_ERROR() macro so that they are always generated.
Device drivers
The file simple_driver.c implements a simple device driver. What makes it a device driver, and not
just a kernel module? In simple_init the driver requests a range of major and minor numbers (the
numbers used to represent device nodes in /dev), then allocates memory for an array of
device-specific simple_device_t structures, and then registers the character device member, cdev, of
each structure in the array with the kernel.
Registration of the character device requires a set of file operations, i.e., a kernel-level
implementation of the functions that get called when user-space makes system calls, e.g., open(),
read(), write(), ioctl(), lseek(), select(), and mmap(). The file operations are stored as function
pointers in a struct file_operations; if this code were written in C++, then this structure would be
the base class, and your implementation of its functions would be a derived class.
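The base-class analogy can be illustrated in plain C. The sketch below is a user-space illustration only: struct demo_file_ops is a hypothetical, cut-down stand-in for struct file_operations (the real structure has different members and signatures), and demo_device, demo_open(), demo_read(), and call_read() are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down analogue of struct file_operations. */
struct demo_file_ops {
    int  (*open)(void *dev);
    long (*read)(void *dev, char *buf, size_t len);
};

/* Device-specific state, loosely analogous to simple_device_t. */
struct demo_device {
    int open_count;
    const char *message;
};

static int demo_open(void *dev)
{
    struct demo_device *d = dev;

    d->open_count++;
    return 0;
}

static long demo_read(void *dev, char *buf, size_t len)
{
    struct demo_device *d = dev;
    size_t n = strlen(d->message);

    if (n > len)
        n = len;
    memcpy(buf, d->message, n);
    return (long)n;
}

/* The "derived class": one table of function pointers per driver. */
static const struct demo_file_ops demo_fops = {
    .open = demo_open,
    .read = demo_read,
};

/* The caller (the kernel, in a real driver) only ever sees the table. */
static long call_read(const struct demo_file_ops *ops, void *dev,
                      char *buf, size_t len)
{
    assert(ops != NULL);
    if (ops->read == NULL)
        return -1;          /* operation not implemented by this driver */
    return ops->read(dev, buf, len);
}
```

A driver that does not support an operation simply leaves the corresponding pointer NULL, and the caller decides what error to report.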
The file simple_driver_test.c is a user-space application that tests the functions of the driver. Install
the module, type ls /dev/simple* and once you see device nodes there, run the test. After the test
finishes, type dmesg to see the kernel-level messages triggered by the user-space test. Remove the
driver, and reinstall it with load-time parameters, e.g.,
# insmod simple_driver.ko simple_device_count=3 simple_minor_count=2
This creates three devices, each responsible for two minor numbers (functions on the device).
ls -al /dev/simple* will show the multiple devices created (and their major/minor numbers).
How did the device nodes magically appear in /dev? That's next.
Hotplug, sysfs, and udev

The simple driver initialization code, simple_init, also performs another step: it creates a kernel
object, class_simple or class depending on the kernel version, that creates entries in the sys-file
system, sysfs, in the directory /sys/class. Creation of the class object in the initialization code
creates the entry /sys/class/simple_driver. Devices managed by the driver are then added to the class
object (see the code), creating the device nodes under /sys/class/simple_driver; e.g., if no load-time
parameters are specified, the driver creates one device, and the node
/sys/class/simple_driver/simple_a0 is created.
Why create these class and device 'objects'? The Linux 2.6 kernel supports the concept of hot-
pluggable devices, i.e., devices that can be plugged in while the system is turned on, eg. a USB
camera. In older Linux systems, if you plugged in a camera, you'd have to look at the output of
dmesg to see what the camera was detected as (if at all), and then try and figure out how to get
images off the camera!
The Linux 2.6 system generates 'hot-plug' events every time a kernel object is created or
destroyed, and these hotplug events trigger the execution of scripts in user-space. The
(appropriately written) scripts then automatically populate the /dev entries for a device. A nice
feature of these scripts is that you can decide what name to give the device, e.g., a camera detected
as a USB mass-storage device might appear as /dev/sda1 in a non-hotplug system, but with
hotplug you can set up the camera name to be /dev/camera, much nicer!
The automatic creation of /dev entries relies on three related kernel infrastructures: hotplug,
sysfs, and udev. The man page, man udev, gives details on how the scripts can be set up to create the
/dev entries with specific permissions, and how to map a kernel name (e.g., that used when the
device was added to the class object in simple_init) to a user-space defined name.
On CentOS 4.1, the udev configuration files are kept in /etc/udev/. The line udev_log=no in
/etc/udev/udev.conf can be changed to udev_log=yes and hotplug events will be written to the
system log. For example, as root type tail -f /var/log/messages, and then from another terminal install
simple_driver.ko, and you will see the logging of the hotplug events.
The default name given to a single device created by the simple driver is /dev/simple_a0. With no
udev scripts in place, the device node is created for use by root only, and is named identically to the
string used in simple_init. The permissions on the device node can be changed by creating a
udev script containing a single line:
# /etc/udev/permissions.d/20-simple.permissions
simple_*:dwh:mm:0660
This changes all nodes matching the pattern simple_* to the owner dwh, group
mm, with permissions 0660. The name of the device entry can be changed, or a symbolic link to a
device entry can be created, by adding another script, e.g., the following creates a symbolic link to
the first device entry:
# /etc/udev/rules.d/20-simple.rules
KERNEL="simple_a0", SYMLINK="simple_00"
The udev man page gives more details on the options for device naming (e.g., a user-supplied
program can be run to generate the device name). The automatic creation of /dev entries helps
reduce the contents of /dev to just those devices installed. It also provides flexibility to user-space in
the naming of device nodes.
For example, in the case of PCI devices it allows the PCI location, e.g., bus:dev.fn, to be remapped
into a meaningful slot number, e.g., instead of a device named /dev/board_00:0c.0, the user-space
name can be mapped to /dev/board2.
The class_simple interface, as described in the Linux Device Drivers book [1], was removed from
the kernel (according to the ChangeLog for that kernel), and the API changed again slightly. The
parallel port user-space driver, ppdev.c, is a nice small (easily understandable) driver that uses the
class interface. A diff of different kernel versions of this driver can be used to determine the usage
of any API changes (eg. whether a new argument can be assigned NULL).
Kernel timers
The driver simple_timer.c implements a single device that uses two different kernel mechanisms for
delaying the calls read(), write(), and select(). The test program simple_timer_test.c tests the driver.
The driver demonstrates the usage of timers and events.
Interrupts
The driver simple_irq.c implements a single device that uses the parallel port on an x86 PC. To test
this driver, you might need to first remove the printer driver and parallel port driver, i.e.,
modprobe -r lp, modprobe -r parport_pc. The driver creates a kernel timer that fires every second.
The timer handler writes a low and then high to all the data lines on the parallel port. If a data line,
one of pins 2 through 9, is jumpered to the interrupt line, pin 10, then an IRQ will be generated
every second. The IRQ handler unblocks a blocked read(), write(), or select().
If a data line is not jumpered to the IRQ line, then the blocked calls will time out (2s) and continue
anyway. The test program simple_irq_test.c tests the driver. The driver demonstrates the usage of
timers, IRQs, and events with timeouts.
Data buffering
The driver simple_buffer.c implements a single device that also uses the parallel port on an x86 PC
(so you will need to remove simple_irq to test it). This driver is similar to simple_irq.c with the
change that IRQs write a time-stamp to an internal buffer, user-space write() writes to that buffer,
and read() reads from the buffer. The following are some tests that can be performed using standard
command-line tools:
1) Connect the parallel port IRQ to a data line. Install the driver: insmod simple_buffer.ko.
Once the /dev/simple node is valid, type cat /dev/simple. A UTC timestamp will be printed every
second.
2) Remove the parallel port jumper. Remove the driver. Install the driver and disable the timer and
timeout as follows:
insmod simple_buffer.ko simple_timer_enable=0 simple_timeout_enable=0
On one terminal type cat /dev/simple, on another type echo "Hello" > /dev/simple. (You can also
leave the timer enabled and it will just write messages to the log file).
3) Combine the first two tests (remove and re-install the driver without any load-time parameters);
the IRQ will add a complete timestamp message every second, while write will add a complete
string (whenever the user triggers a write). No messages will be interrupted, since each procedure
locks the internal buffer.
The test shows that the driver works as one would expect; however, take a look at the source for the
details. The internal buffer is a resource that is shared between read() (e.g., one process), write()
(e.g., another process), and the IRQ handler (interrupt context).
The driver uses a spin-lock to protect access to the buffer (and its associated buffer count and
pointers). Without this protection, an IRQ could interrupt a write, and insert a timestamp into the
middle of the string echoed into the driver. Of course in a real driver, the results could be more
disastrous.
If the resource (buffer) being protected by the driver were only ever accessed by processes, then a
semaphore could be used to protect it. Semaphores can be used to block a process, causing it to sleep
while waiting for a resource. Spin-locks are not quite so forgiving.
You are not allowed to sleep, or call a function that might sleep, while holding a spin-lock. Make
sure to build your driver development kernel with CONFIG_DEBUG_SPINLOCK and
CONFIG_DEBUG_SPINLOCK_SLEEP enabled, and the kernel will give you a nice reminder if
you try to do something bad (e.g., calling kmalloc while holding a lock).
The write() and read() operations of the driver need to copy data from (or to) user-space to (or from)
a kernel buffer. However, copy_from_user() and copy_to_user() can sleep, so there is no way to copy
directly to the spin-lock protected buffer!
There's also the following write sequencing issue: to write data into the buffer, you first need to
check whether there is space. However, the spin-lock needs to be held to check the buffer state, so
ideally you would hold the lock, check for space, release the lock, and then copy a matching amount
of user-data to the kernel. But, since you are not holding the lock, an IRQ can come along and use
up your space!
The solution, shown in the driver code, is to first copy all the user data into a kernel buffer, and then
hold the lock while checking for space. This allows the (sleepable) copy and allocation calls to be
performed before holding the lock. Of course in the case of a full buffer and non-blocking write, the
allocation and copy from user-space was a waste of time.
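The ordering of a non-blocking write can be sketched in user-space. This is an illustration under stated assumptions, not the code from simple_buffer.c: a pthread mutex stands in for the spin-lock, malloc() and memcpy() stand in for the sleepable kernel allocation and copy from user-space, and struct staged_buf, staged_init(), and staged_write() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define STAGED_BUF_SIZE 64

struct staged_buf {
    pthread_mutex_t lock;           /* stands in for the spin-lock */
    char data[STAGED_BUF_SIZE];
    size_t count;                   /* bytes currently stored */
};

static void staged_init(struct staged_buf *b)
{
    pthread_mutex_init(&b->lock, NULL);
    b->count = 0;
}

/*
 * Non-blocking write. Returns the number of bytes written, 0 if the
 * buffer has no room for the whole message, or -1 if allocation fails.
 */
static long staged_write(struct staged_buf *b, const char *user, size_t len)
{
    char *tmp;
    long ret = 0;

    assert(b != NULL && user != NULL);
    if (len == 0)
        return 0;

    /* Step 1: allocate and copy with no lock held (both may sleep). */
    tmp = malloc(len);
    if (tmp == NULL)
        return -1;
    memcpy(tmp, user, len);

    /* Step 2: take the lock, then check for space and commit in one go. */
    pthread_mutex_lock(&b->lock);
    if (len <= STAGED_BUF_SIZE - b->count) {
        memcpy(b->data + b->count, tmp, len);
        b->count += len;
        ret = (long)len;
    }
    pthread_mutex_unlock(&b->lock);

    free(tmp);      /* wasted work if the buffer turned out to be full */
    return ret;
}
```

Because the space check and the copy into the shared buffer happen under the same lock, no other writer can take the space between the two steps.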
The code that holds the spin-lock, checks for a condition, and then goes to sleep on a wait-queue if
the condition is not met, should look eerily familiar to anyone who has programmed with Pthreads;
it is the same pattern of code as used with a mutex and condition variable.
A mutex is used to protect a resource, while a condition variable is used to put a thread to sleep
while waiting for some other thread to signal it that the condition has changed. The nice thing about
this analogy is that you can write pthreads code to simulate driver buffering operations to 'figure it
out' outside of the kernel.
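As a starting point for that kind of experiment, here is a minimal Pthreads sketch of the lock, check, and sleep pattern. It is only an analogy: the mutex plays the role of the driver's lock, the condition variable plays the role of the wait-queue, and struct msg_queue, queue_init(), queue_put(), and queue_get() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_SLOTS 8

struct msg_queue {
    pthread_mutex_t lock;       /* plays the role of the driver's lock */
    pthread_cond_t  not_empty;  /* plays the role of the wait-queue */
    int slots[QUEUE_SLOTS];
    size_t head, count;
};

static void queue_init(struct msg_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    q->head = q->count = 0;
}

/* Producer (never blocks): returns 0 on success, -1 if the queue is full. */
static int queue_put(struct msg_queue *q, int value)
{
    int ret = -1;

    assert(q != NULL);
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_SLOTS) {
        q->slots[(q->head + q->count) % QUEUE_SLOTS] = value;
        q->count++;
        ret = 0;
        pthread_cond_signal(&q->not_empty);  /* wake a sleeping consumer */
    }
    pthread_mutex_unlock(&q->lock);
    return ret;
}

/* Consumer (blocking): sleeps until at least one value is available. */
static int queue_get(struct msg_queue *q)
{
    int value;

    assert(q != NULL);
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)                    /* re-check after every wake-up */
        pthread_cond_wait(&q->not_empty, &q->lock);
    value = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_SLOTS;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return value;
}
```

The condition is tested in a while loop rather than an if, since a woken thread must confirm that the data is still there before using it.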
The buffering used in the simple buffer driver is a bit contrived in that there are two 'producers'
writing to the buffer, and one 'consumer'. A more likely scenario for a driver would be to have a
buffer contended for by a single producer (say the receive IRQ), and a single consumer (say read),
and another separate buffer for a single producer (write) and consumer (transmit IRQ).
But even in this situation, you can run into problems if the read from the buffer takes an excessive
amount of time, blocking new data from the receive IRQ. One solution to this issue is to use two
buffers for each producer-consumer pair; e.g., the receive IRQ is initialized to point to an empty
buffer, and receive IRQs fill the buffer until a read is issued. At that point the IRQ buffer is passed
to read, and the IRQ gets the second empty buffer.
Once read has consumed the contents of the first buffer, if the second buffer in use by the IRQ has
new data, then the buffers are swapped again. In this scheme, the lock only needs to be held to swap
the buffers, and since read does not hold the lock once it has a valid buffer, a copy to user-space
from the kernel buffer is allowed, removing the need to use an intermediate buffer as shown in the
simple buffer driver. The kernel tty layer uses this form of buffering scheme and refers to it as
flip-buffering (see linux/tty.h).
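The two-buffer scheme can be sketched in user-space as follows. This is a simplified illustration, not the tty layer's implementation: a pthread mutex stands in for the spin-lock, a single consumer is assumed, the swap is done on every read rather than only when new data is present, and struct flip_buf, flip_init(), flip_produce(), and flip_consume() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define FLIP_SIZE 32

struct flip_buf {
    pthread_mutex_t lock;       /* stand-in for the spin-lock */
    char bufs[2][FLIP_SIZE];
    size_t fill_count;          /* bytes in the buffer the producer owns */
    int fill_idx;               /* which buffer the producer is filling */
};

static void flip_init(struct flip_buf *f)
{
    pthread_mutex_init(&f->lock, NULL);
    f->fill_count = 0;
    f->fill_idx = 0;
}

/* Producer (receive IRQ): append one byte. Returns 0, or -1 if full. */
static int flip_produce(struct flip_buf *f, char c)
{
    int ret = -1;

    assert(f != NULL);
    pthread_mutex_lock(&f->lock);
    if (f->fill_count < FLIP_SIZE) {
        f->bufs[f->fill_idx][f->fill_count++] = c;
        ret = 0;
    }
    pthread_mutex_unlock(&f->lock);
    return ret;
}

/* Consumer (read): swap under the lock, then copy out with no lock held. */
static size_t flip_consume(struct flip_buf *f, char *out, size_t len)
{
    const char *full;
    size_t n;

    assert(f != NULL && out != NULL);
    pthread_mutex_lock(&f->lock);
    full = f->bufs[f->fill_idx];    /* buffer the producer was filling */
    n = f->fill_count;
    f->fill_idx ^= 1;               /* producer moves to the other buffer */
    f->fill_count = 0;
    pthread_mutex_unlock(&f->lock);

    if (n > len)
        n = len;                    /* excess is dropped in this sketch */
    memcpy(out, full, n);           /* the slow copy runs unlocked */
    return n;
}
```

The lock is held only for the few assignments that swap the buffers, so the producer is never blocked for the duration of the copy.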
The simple buffer driver has (at least) two practical applications. If you install it and "cat" the timer
generated time stamps into a file, a plot of the difference between consecutive time stamps minus 1
second will show the error in the kernel's ability to generate a 1 second delay.
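The quantity being plotted can be computed with a few lines of C. The sketch assumes the logged timestamps have already been parsed into integer microseconds; the driver's actual output format is not assumed here, and interval_errors_us() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * For n timestamps t[0..n-1] in microseconds, store in out[i] the error
 * of each interval relative to 1 second: (t[i+1] - t[i]) - 1000000.
 * out must have room for n - 1 entries. Returns the number of intervals.
 */
static size_t interval_errors_us(const int64_t *t, size_t n, int64_t *out)
{
    size_t i;

    assert(t != NULL && out != NULL);
    if (n < 2)
        return 0;
    for (i = 0; i + 1 < n; i++)
        out[i] = (t[i + 1] - t[i]) - 1000000;
    return n - 1;
}
```

A negative value means the timer fired early, i.e., the interval was shorter than 1 second.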
Running some tests
In a test on an HP Omnibook 6100 PIII 1GHz laptop, the error was
approximately -130µs (i.e., the delay was slightly less than 1 second). The test was started on a
1 second boundary, and over the space of 10 minutes, the timer was firing 100ms earlier than a 1 second
boundary. The second test determines how well NTP operates. Install the driver with the timer and
timeout disabled. Connect up the 1pps tick from your NTP server's GPS unit to the parallel port
interrupt of your PC, make sure your PC NTP daemon is running, and cat the IRQ generated
timestamps.
The observed error of the measured timestamp relative to that same timestamp rounded to the
nearest second was about ±0.5ms. If the test PC (laptop) had its ethernet cable disconnected, or the
NTP daemon was stopped, the error of the logged timestamps relative to the GPS 1pps tick would
gradually increase (100 to 200µs over 10 minutes). If you had a method of generating a higher-
frequency square-wave that was also locked to GPS, then you could determine the interrupt latency,
and interrupt handling overhead, of the kernel by hammering the IRQ pin at a few kilohertz.
A 'real-world' PCI driver

The experience presented in this document was gained during the development of the Caltech-
OVRO Broadband Reconfigurable Array (COBRA) Correlator System. The hardware developed is
documented at www.ovro.caltech.edu/~dwh/correlator.
The hardware is currently in use on several radio astronomy projects, e.g., the SZ Array
(http://astro.uchicago.edu/sza/) and the CARMA array (http://www.mmarray.org). The cPCI
digitizer and correlator boards used in the correlator system contain a PLX9054 PCI interface, a
Texas Instruments DSP, Altera FLEX10K FPGAs, and on the digitizer, 1GHz analog-to-digital
converters.
The digitizer output routes to the FPGAs on the digitizer board, where data is digitally filtered,
delayed, and routed to front-panel high-speed connectors. The data travels over LVDS cabling
(Ultra-SCSI cables) to the correlator boards, where FPGAs cross-correlate and average the data.
The on-board DSPs retrieve auto- and cross-correlation results from the FPGAs, perform FFTs,
further corrections, and average the data for 100ms to 500ms. Data is then transferred to a Linux
host.
The system uses a GPS based NTP server with a 1pps output. The 1pps signal is used to derive a
hardware heartbeat, so that the 100ms and 500ms transfers are aligned with real-time. The Linux
hosts run NTP pointing to the NTP server, and check that data from boards arrives within a 50ms
window relative to a 100ms or 500ms boundary.
The Linux driver used in the COBRA system is shown graphically in Figure 1, below [2]. The
driver implements several character device interfaces to the board; a terminal-like interface with
standard-input, output, and error, a read/write control interface, a read-only data interface, and a
read-only monitoring interface.
The choice of multiple devices, rather than a complex scheme of I/O control calls, was determined
by the usage of the driver. For example, one objective was to enable the use of standard command
line tools like cat, od (octal dump), echo, and dd. These tools know nothing of I/O control calls, so
need to be directed to a device node of a specific 'personality'.
Figure 1: COBRA device driver block diagram. The block diagram shows the relationship between
the /dev nodes accessed by user-space applications and the files that implement the driver.
The COBRA control system code controls up to 20 boards in a single sub-system, and data must be
collected from each board at about the same time. The standard method for dealing with multiple
sources of data is to use the select() call, which uses file-descriptors. So by separating out the data
device and monitor device functionality at the driver-level, a user-space server can run a thread
containing a select() call that collects all the data from all boards, and serves that data up to clients.
Then another thread, or another process even, can run a monitor server containing a thread calling
select() on all the monitor file descriptors.
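A minimal sketch of the select() pattern is shown below. It is not the COBRA server code: wait_readable() is a hypothetical helper, and in a real server the descriptors would come from open() calls on the per-board data (or monitor) device nodes.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for any of fds[0..n-1] to become readable.
 * On return ready[i] is 1 if fds[i] can be read without blocking.
 * Returns the number of ready descriptors, 0 on timeout, -1 on error.
 */
static int wait_readable(const int *fds, int n, int timeout_ms, int *ready)
{
    fd_set rfds;
    struct timeval tv;
    int i, maxfd = -1, ret;

    assert(fds != NULL && ready != NULL);
    FD_ZERO(&rfds);
    for (i = 0; i < n; i++) {
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], &rfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* Sleeps until a descriptor is readable or the timeout expires. */
    ret = select(maxfd + 1, &rfds, NULL, NULL, &tv);
    if (ret < 0)
        return -1;
    for (i = 0; i < n; i++)
        ready[i] = FD_ISSET(fds[i], &rfds) ? 1 : 0;
    return ret;
}
```

A server thread would call this in a loop and then read() from each descriptor flagged in ready; a second thread or process can run the same loop over the monitor descriptors.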
Dr. David Hawkins, Senior Scientist at the California Institute of Technology, is currently involved
with the design and development of high-speed digital correlator systems for Caltech, U. Chicago,
and the CARMA (Caltech, Berkeley, U. Illinois, and U. Maryland) radio observatories.
This article is excerpted from a paper of the same name presented at the Embedded Systems
Conference Silicon Valley 2006. Used with permission of the Embedded Systems Conference.
For more information, please visit www.embedded.com/esc/sv.
References
[1] J. Corbet, A. Rubini, and G. Kroah-Hartman. Linux Device Drivers. O'Reilly, 3rd edition, 2005.
[2] D. Hawkins. COBRA device driver. Caltech-OVRO documentation, 2004.
(www.ovro.caltech.edu/~dwh/correlator/pdf/cobra_driver.pdf).
[3] D. Hawkins. PLX-9054 PCI Performance Tests. Caltech-OVRO documentation, 2004.
(www.ovro.caltech.edu/~dwh/correlator/pdf/pci_performance.pdf).
[4] D. Hawkins. Linux driver design source code. Caltech-OVRO documentation, 2005.
(www.ovro.caltech.edu/~dwh/correlator/software/driver_design.tar.gz).
