A kernel module is an extension to the operating system. A module runs at the same privilege level as the rest of the OS (the highest) and can therefore access every resource of the system. Under Linux, a module is nothing more than a C program with a well-defined interface to communicate with user processes and with other parts of the operating system.
In the following sections we will use the term device driver instead of kernel module. A device driver is a kernel module specialized in I/O communication with some sort of device. The term device has a very wide meaning and it does not exclusively refer to an external or physical system. In general, a device is some kind of resource such as a floppy disk, a printer, a mouse, but also a special region of memory, a virtual terminal or a message box.
An excellent source of information on this topic is Linux Device Drivers, by Alessandro Rubini and Jonathan Corbet. This book, published by O'Reilly, is also available online in PDF and HTML formats (see resources).
Writing device drivers for Linux is very easy. By the end of this paper you should be capable of writing a simple device driver. At that point, if you found all of this fascinating, you should definitely read the book above - in any case, take a look at it!
Linux distinguishes between three types of devices. Each module implements only one of these types and thus is classifiable as a character module, block module or a network module.
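- Character devices are read and written as a stream of bytes and are accessed through nodes in the /dev directory. Examples are /dev/tty (terminal) and /dev/port (I/O port access).
- Block devices transfer data in fixed-size blocks and are also accessed through nodes in the /dev directory. Examples are /dev/floppy/0u1440 (first floppy, 1.4 MB) and /dev/hda (first hard disk).
- Network devices (interfaces) exchange packets with the network and are not accessed through nodes in the /dev directory.

It is possible to identify a device's class by using the ls -l command. For example: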
[ealtieri@italia os]$ ls -l /dev/tty
crw-rw-rw- 1 root root 5, 0 Jun 15 12:59 /dev/tty
The "c" in the file properties shows that the tty (terminal) device is a character device.
In this document we will discuss character devices only.
One of the greatest features of Linux and UNIX is that almost every resource on the system looks like a file, including devices. As shown in the previous section, device files (called nodes) are located under the /dev directory. Each of these device files is associated with a particular module in the kernel. If the kernel is compiled with Device File System (devfs) support, the module creates /dev entries automatically at load time and removes them when it is unloaded.
Because devices are files, we can issue file operations on them such as open(), read(), write() and close(). Every time a file operation is issued on a device file, the kernel module associated with that device must handle the operation. For example:
fd = open("/dev/hda", O_RDONLY);
The above operation opens the /dev/hda device (first hard disk) read-only (O_RDONLY). When open() is issued, the operating system knows that /dev/hda is a device file. It therefore locates the kernel module associated with the device and calls the device_open() file operation handler in that module. At this point it is up to the device driver to initialize the device and, if something goes wrong, return an error code.
There is a handler slot for every possible file operation (listed below). The device driver fills in only the operations it implements and leaves the others set to NULL, in which case a default action applies.
How does the OS know which module is associated with a /dev entry? Each module has to register itself using the devfs_register() function. This function automatically creates an entry in the /dev directory. The module also uses this function to tell the operating system the address of the file-operations handler functions, as shown below. Thus, the call to open() above can be translated to the appropriate device-specific function registered through devfs_register().
    /* handlers for the file operations */
    struct file_operations mydev_fops = {
        open   : mydev_open,     /* handler for the open() operation */
        release: mydev_close,    /* handler for the close() operation */
        /* NULL (default actions) */
    };

    /* this function is called when the module is loaded */
    int mydev_init(void)
    {
        ...
        devid = devfs_register(
            ...
            "mydev",             /* create /dev/mydev entry */
            ...
            &mydev_fops,         /* file ops handlers (see above) */
            ...
        );
        ...
    }
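To make the idea of registration and dispatch concrete, here is a small user-space model. It is only a sketch and not kernel code: every name in it (sketch_ops, sketch_register(), sketch_open() and so on) is invented for this illustration, and the default actions chosen for NULL handlers (open succeeds, read fails with -EINVAL) are assumptions of the model. It shows a table of function pointers being registered under a device name and a call being routed to the matching handler.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct file_operations (hypothetical). */
struct sketch_ops {
    int  (*open)(void);
    long (*read)(char *buf, size_t count);
};

/* One registered device: a name and its handler table. */
struct sketch_dev {
    const char *name;
    const struct sketch_ops *ops;
};

#define SKETCH_MAX_DEVS 8
static struct sketch_dev sketch_table[SKETCH_MAX_DEVS];
static int sketch_ndevs;

/* Plays the role of the registration call: remember name -> handlers. */
int sketch_register(const char *name, const struct sketch_ops *ops)
{
    assert(name != NULL && ops != NULL);
    if (sketch_ndevs == SKETCH_MAX_DEVS)
        return -ENOMEM;
    sketch_table[sketch_ndevs].name = name;
    sketch_table[sketch_ndevs].ops = ops;
    return sketch_ndevs++;           /* index of the new entry */
}

/* Find the handler table for a device name, or NULL if none is registered. */
const struct sketch_ops *sketch_lookup(const char *name)
{
    int i;

    for (i = 0; i < sketch_ndevs; i++)
        if (strcmp(sketch_table[i].name, name) == 0)
            return sketch_table[i].ops;
    return NULL;
}

/* Dispatch open(): a NULL handler means the default action (success). */
int sketch_open(const char *name)
{
    const struct sketch_ops *ops = sketch_lookup(name);

    if (ops == NULL)
        return -ENODEV;
    return ops->open ? ops->open() : 0;
}

/* Dispatch read(): in this model a NULL handler is rejected with -EINVAL. */
long sketch_read(const char *name, char *buf, size_t count)
{
    const struct sketch_ops *ops = sketch_lookup(name);

    if (ops == NULL)
        return -ENODEV;
    return ops->read ? ops->read(buf, count) : -EINVAL;
}

/* Example device: implements open() only; read is left NULL. */
int demo_open_calls;
static int demo_open(void) { demo_open_calls++; return 0; }
const struct sketch_ops demo_ops = { .open = demo_open };
```

After sketch_register("demo", &demo_ops), a call to sketch_open("demo") runs demo_open(), while sketch_read("demo", ...) falls back to the default because no read handler was supplied. Note that the model uses the standard C99 initializer syntax (.open = ...) rather than the older GNU form used in the driver code above.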
There are several file operations that a module can implement. These are defined in the file_operations structure, in include/linux/fs.h (line 817). For convenience, this structure is reproduced below. For a more detailed description of these operations see Linux Device Drivers, page 64. In this document we will consider only the simplest file operations: read() and write().
    struct file_operations {
        struct module *owner;
        loff_t (*llseek) (struct file *, loff_t, int);
        ssize_t (*read) (struct file *, char *, size_t, loff_t *);
        ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
        int (*readdir) (struct file *, void *, filldir_t);
        unsigned int (*poll) (struct file *, struct poll_table_struct *);
        int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
        int (*mmap) (struct file *, struct vm_area_struct *);
        int (*open) (struct inode *, struct file *);
        int (*flush) (struct file *);
        int (*release) (struct inode *, struct file *);
        int (*fsync) (struct file *, struct dentry *, int datasync);
        int (*fasync) (int, struct file *, int);
        int (*lock) (struct file *, int, struct file_lock *);
        ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *);
        ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *);
        ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
        unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
    };
Using the above file operations a program can transfer data to and from a device driver, as in the example below:

    /* PROGRAM a.out - reads the first sector of a floppy disk */
    #include <fcntl.h>     /* open(), O_RDONLY */
    #include <unistd.h>    /* read(), close() */

    int main(int argc, char *argv[])
    {
        int fd;
        char sector[512];
        ssize_t count;

        fd = open("/dev/fd0", O_RDONLY);   /* open floppy device */
        if (fd < 0)
            return(1);                     /* device could not be opened */
        count = read(fd, sector, 512);     /* read one sector */
        close(fd);                         /* close device */

        /* do something else... */
        return(0);
    }
If at load time the driver has to register itself, it also needs to unregister when unloaded, as shown below:
    void mydev_exit(void)
    {
        devfs_unregister(devid);
        ...
    }
devfs_unregister() unregisters the device and automatically removes the /dev entry created at load time.
So far we have assumed that mydev_init() and mydev_exit() are called at load time and unload time respectively. We actually need to state this explicitly to the operating system using the following macros:
module_init(mydev_init);
module_exit(mydev_exit);
These macros are generally placed at the end of the file.
skeldev.c is a basic implementation of a device driver. It registers itself and creates a /dev/skeldev entry. The driver does not handle any file operation - the default OS actions apply.
To compile a kernel module just type the following:
[ealtieri@italia dev]$ gcc -c -D __KERNEL__ -D MODULE skeldev.c
Notice the -c flag, which tells GCC to compile the source file without linking it into an executable. The command above produces the skeldev.o object file. This object can now be inserted into the kernel using the (privileged) insmod command:
[ealtieri@italia dev]$ sudo /sbin/insmod skeldev.o
If insmod does not output any error message, the module has been correctly inserted in the kernel. You can see this with the "List Modules" command (lsmod). Also, the module should have created the /dev/skeldev entry.
    [ealtieri@italia dev]$ /sbin/lsmod
    Module                  Size  Used by    Tainted: P
    skeldev                  644   0  (unused)
    [ealtieri@italia dev]$ ls -l /dev/skeldev
    crw-rw-rw-    1 root     root       8,   2 Dec 31  1969 /dev/skeldev
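The list printed by lsmod comes from /proc/modules, so a program can perform the same check by scanning that file. The sketch below is a user-space helper written for this illustration (the name module_listed() is hypothetical); it assumes each line begins with the module name followed by whitespace, as in the listing above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if `name` is the first field of any line read from `fp`,
 * 0 otherwise. Intended for a stream opened on /proc/modules.
 */
int module_listed(FILE *fp, const char *name)
{
    char line[512];
    size_t len;
    int at_line_start = 1;

    assert(fp != NULL && name != NULL);
    len = strlen(name);
    while (fgets(line, sizeof(line), fp) != NULL) {
        /* match the whole first field, not just a prefix of it */
        if (at_line_start && len > 0 &&
            strncmp(line, name, len) == 0 &&
            (line[len] == ' ' || line[len] == '\t' ||
             line[len] == '\n' || line[len] == '\0'))
            return 1;
        /* a chunk without '\n' means the next chunk continues this line */
        at_line_start = (strchr(line, '\n') != NULL);
    }
    return 0;
}
```

A typical use would be to open /proc/modules with fopen(), call module_listed(fp, "skeldev"), and close the stream again.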
Communication between a device driver and a user process occurs mainly with the read() and write() file operations. Using the basic device driver skeleton above as reference, we can add handlers for the read() and write() file operations.
    /* read from device */
    static ssize_t skel_read(struct file *filp, char *buf, size_t count, loff_t *offp);

    /* write to device */
    static ssize_t skel_write(struct file *filp, const char *buf, size_t count, loff_t *offp);

    /* file operations handlers */
    static struct file_operations skel_fops = {
        read  : skel_read,     /* handler for the read() operation */
        write : skel_write,    /* handler for the write() operation */
        /* NULL (default) */
    };
The read() file operation takes four arguments:

- struct file *filp, a pointer to a file structure, defined in include/linux/fs.h (line 519). This structure refers to an open file, in this case the device node. The operating system allocates this structure for the device driver when the device node is opened.
- char *buf, a user-space buffer into which the data read from the device must be copied.
- size_t count, the number of bytes to be read from the device.
- loff_t *offp, which indicates the file position the user is accessing.

The function returns the number of bytes copied to the user buffer, which may or may not be equal to count.
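Because a read handler may return fewer bytes than requested, a user program that needs a fixed amount of data usually loops. The following is a minimal user-space sketch of such a loop; the helper name read_full() is made up for this example, and it assumes the driver eventually either delivers the data or reports end of file by returning 0.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Keep calling read() until `count` bytes have arrived, end of file is
 * reached, or an error occurs. Returns the number of bytes actually
 * stored in `buf`, or -1 on error (errno set by read()).
 */
ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;

    assert(buf != NULL);
    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);

        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;                 /* end of file: nothing more to read */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

The return value tells the caller how much data really arrived, so a result smaller than count means the device had less to offer.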
Following is a simple implementation of skel_read() which copies data from the driver buffer skel_buffer[] to the user buffer:
    /* device buffer */
    static unsigned char skel_buffer[SKEL_BUFMAX];
    ...
    /* read from device */
    static ssize_t skel_read(struct file *filp, char *buf, size_t count, loff_t *offp)
    {
        if (count > SKEL_BUFMAX)
            count = SKEL_BUFMAX;    /* trim data */
        /* copy_to_user() returns the number of bytes it could not copy */
        if (copy_to_user(buf, skel_buffer, count))
            return -EFAULT;         /* bad user-space address */
        return(count);
    }
The write() file operation is similar to read(), but in this case data flows from the user buffer to the driver buffer:
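    /* write to device */
    static ssize_t skel_write(struct file *filp, const char *buf, size_t count, loff_t *offp)
    {
        if (count > SKEL_BUFMAX)
            count = SKEL_BUFMAX;    /* trim data */
        /* copy_from_user() returns the number of bytes it could not copy */
        if (copy_from_user(skel_buffer, buf, count))
            return -EFAULT;         /* bad user-space address */
        return(count);
    }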
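The handlers above ignore the offp argument, so every call starts at the beginning of the buffer. As an illustration of what honouring the file position could look like, here is a user-space model of the two handlers. It is a sketch under stated assumptions, not the code of skeldev2.c: the names model_read(), model_write() and MODEL_BUFMAX are hypothetical, memcpy() stands in for copy_to_user() and copy_from_user(), and the filp argument is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

#define MODEL_BUFMAX 16                      /* hypothetical buffer size */

static unsigned char model_buffer[MODEL_BUFMAX];

/* Read at most `count` bytes starting at *offp; returns 0 at end of data. */
ssize_t model_read(char *buf, size_t count, off_t *offp)
{
    size_t left;

    assert(buf != NULL && offp != NULL);
    if (*offp < 0)
        return -EINVAL;
    if (*offp >= MODEL_BUFMAX)
        return 0;                            /* end of data */
    left = MODEL_BUFMAX - (size_t)*offp;
    if (count > left)
        count = left;                        /* trim to what remains */
    memcpy(buf, model_buffer + *offp, count);
    *offp += (off_t)count;                   /* advance the file position */
    return (ssize_t)count;
}

/* Write at most `count` bytes starting at *offp; -ENOSPC when full. */
ssize_t model_write(const char *buf, size_t count, off_t *offp)
{
    size_t left;

    assert(buf != NULL && offp != NULL);
    if (*offp < 0)
        return -EINVAL;
    if (*offp >= MODEL_BUFMAX)
        return -ENOSPC;                      /* buffer is full */
    left = MODEL_BUFMAX - (size_t)*offp;
    if (count > left)
        count = left;                        /* trim to the free space */
    memcpy(model_buffer + *offp, buf, count);
    *offp += (off_t)count;                   /* advance the file position */
    return (ssize_t)count;
}
```

With this behaviour, repeated reads walk through the buffer and eventually return 0, which is how a reader recognises the end of the data.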
skeldev2.c is a basic device driver implementation with read() and write() file operations.
To test the skeldev2.c device driver we can write a simple C program that opens the skeldev2 device, writes some data and then retrieves it. This is shown below.
    #include <stdio.h>     /* printf(), perror() */
    #include <stdlib.h>    /* exit() */
    #include <string.h>    /* memset(), strcpy() */
    #include <fcntl.h>     /* open(), O_RDWR */
    #include <unistd.h>    /* read(), write(), close() */

    int main(void)
    {
        int fd;
        ssize_t count;
        char buf[50];

        /* open device */
        if ((fd = open("/dev/skeldev2", O_RDWR)) < 0) {
            perror("open()");
            exit(1);
        }

        /* write to device */
        memset(buf, 0x00, sizeof(buf));    /* clear buffer */
        strcpy(buf, "Hello World!");
        count = write(fd, buf, sizeof(buf));
        printf("Written %ld bytes to device\n", (long)count);

        /* read from device */
        memset(buf, 0x00, sizeof(buf));    /* clear buffer */
        count = read(fd, buf, sizeof(buf));
        printf("Read %ld bytes from device: %s\n", (long)count, buf);

        /* close device */
        close(fd);
        exit(0);
    }
test_skel2.c is a test program for the skeldev2.c device.
Because a kernel module does not run in user space, the standard C library is not available to it. This means that the familiar printf() function will not work in a kernel module. Fortunately, the kernel provides a similar function, printk(), which your device driver can use to output messages. However, there are some important differences between these two functions:

- printk() expects a "<x>" at the very beginning of the string to be printed, where "x" is a number from 0 to 7 indicating the priority of the message. For example:

      printk("<7>Hello world!\n");

  will print a message at the lowest (DEBUG) priority. Priorities are defined in include/linux/kernel.h (line 30). Each "<x>" priority has an associated KERN_xxx symbol, so the line above could be rewritten as follows:

      printk(KERN_DEBUG "Hello world!\n");

- printk() will not print the message to standard output. Instead, the message is sent to the system log, typically located in /var/log/messages. To display the most recent messages in this log you can use the tail command:
    [ealtieri@italia linux]$ sudo tail -5 /var/log/messages
    Jun 18 21:20:26 italia kernel: VFS: Disk change detected on device sr(11,0)
    Jun 18 21:20:26 italia kernel: VFS: Disk change detected on device sr(11,0)
    Jun 18 21:20:26 italia kernel: VFS: Disk change detected on device ide1(22,0)
    Jun 18 21:20:26 italia kernel: cdrom: This disc doesn't have any tracks I recognize!
    Jun 18 21:59:54 italia kernel: Hello world!
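The priority prefix is ordinary text at the start of the message, and a KERN_xxx symbol is simply a string literal that the compiler glues onto the message. The user-space sketch below illustrates this; split_priority() and SKETCH_KERN_DEBUG are names invented for the example, with SKETCH_KERN_DEBUG standing in for the kernel's KERN_DEBUG and assumed to expand to "<7>".

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_KERN_DEBUG "<7>"   /* stand-in for KERN_DEBUG (assumed value) */

/*
 * Split a printk-style message into its priority and its text.
 * Returns the priority digit (0-7) and points *text past the "<x>"
 * prefix, or returns -1 and leaves *text at the start of the message
 * when no well-formed prefix is present.
 */
int split_priority(const char *msg, const char **text)
{
    assert(msg != NULL && text != NULL);
    if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>') {
        *text = msg + 3;          /* skip the three prefix characters */
        return msg[1] - '0';
    }
    *text = msg;
    return -1;
}
```

Passing SKETCH_KERN_DEBUG "Hello world!\n" to split_priority() yields priority 7 and the text "Hello world!\n", exactly as if the string had been written as "<7>Hello world!\n".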
A different way to communicate with device drivers is the /proc file system. For example, let's examine the /proc/meminfo file:
    [ealtieri@italia os]$ ls -l /proc/meminfo
    -r--r--r--    1 root     root            0 Jun 19 11:20 /proc/meminfo
    [ealtieri@italia os]$ cat /proc/meminfo
            total:    used:    free:  shared: buffers:  cached:
    Mem:  525320192 447213568 78106624        0 81035264 157761536
    Swap: 271392768        0 271392768
    MemTotal:       513008 kB
    MemFree:         76276 kB
    MemShared:           0 kB
    Buffers:         79136 kB
    ...
As you can see from the ls command above, the /proc/meminfo file has size zero (the number to the left of the date). However, when we show the contents of the file with cat, the file appears to contain information. How can we explain this? The trick is that files under the /proc file system are generated when they are read. Each of these files is associated with a module in the kernel. When the file is read, using cat for example, the kernel locates its module and calls a function to generate the contents of the file.
/proc operations are handled in a similar way to the file operations described earlier. You can define a handler for a read operation on a /proc entry and one for a write operation. In this document we will describe only the read handler. First, a proc entry is created at load time using the create_proc_read_entry() function:
    /* create /proc/xxxx entry */
    proc = create_proc_read_entry(
        skel_name,         /* entry name (/proc/skeldev) */
        0,                 /* default mode */
        NULL,              /* parent directory (NULL=/proc) */
        skel_read_proc,    /* read() operation handler */
        NULL               /* other data */
    );
The code above registers skel_read_proc() as the handler for the read operation on the /proc/skeldev entry. This function must have the following prototype:

    static int skel_read_proc(char *buf, char **start, off_t offset, int count, int *eof, void *data)

The contents of the /proc/skeldev file are generated "on the fly" by writing to the buf parameter. A simple implementation of skel_read_proc() could be the following:
    /* Handler for /proc/skeldev read */
    static int skel_read_proc(char *buf, char **start, off_t offset, int count, int *eof, void *data)
    {
        if (count > SKEL_BUFMAX)
            count = SKEL_BUFMAX;
        memcpy(buf, skel_buffer, count);    /* generate file */
        *eof = 1;                           /* end of file */
        return(count);
    } /* skel_read_proc() */
The functions can be tested as follows:

    [ealtieri@italia dev]$ sudo /sbin/insmod skeldev3.o
    [ealtieri@italia dev]$ echo "Hello World" > /dev/skeldev
    [ealtieri@italia dev]$ cat /proc/skeldev
    Hello World
    [ealtieri@italia dev]$
skeldev3.c implements the /proc/skeldev entry.
The /proc filesystem is a simple way to communicate system information to user processes and is extensively used under Linux. Examples of /proc entries are /proc/meminfo, which displays information about memory, and /proc/cpuinfo, which displays processor type and features.
Writing handlers for /proc files can become really complicated if the amount of data to be output is large. In this case, several read() calls may be needed to retrieve all of the data. Between successive read() calls, the module or kernel data structures being read can change, causing inconsistency in the output.
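To show what serving a file in several pieces involves, here is a user-space model of a read handler that uses the same prototype as skel_read_proc() but honours the offset and count arguments. It is a sketch only: sketch_read_proc() and sketch_text are hypothetical names, the text is a fixed string rather than live kernel data, and setting *start to buf is a convention of this model, not a statement about how the kernel interprets that parameter.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

/* Text to expose; a real driver would generate it from its own state. */
static const char sketch_text[] = "Hello World\n";

/* Copy the slice [offset, offset + count) of the text into buf. */
int sketch_read_proc(char *buf, char **start, off_t offset,
                     int count, int *eof, void *data)
{
    size_t total = sizeof(sketch_text) - 1;   /* without the trailing NUL */
    size_t left;

    (void)data;                               /* unused in this sketch */
    assert(buf != NULL && start != NULL && eof != NULL);
    if (offset < 0 || count < 0)
        return 0;
    if ((size_t)offset >= total) {
        *eof = 1;                             /* nothing left to deliver */
        return 0;
    }
    left = total - (size_t)offset;
    if ((size_t)count >= left) {
        count = (int)left;
        *eof = 1;                             /* this chunk is the last one */
    }
    memcpy(buf, sketch_text + offset, (size_t)count);
    *start = buf;                             /* this call's data begins at buf */
    return count;
}
```

A caller keeps invoking the handler with an increasing offset until *eof is set. Because the text here never changes, the pieces always fit together; with live data, each call may see a different state, which is the inconsistency described above.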
More information about the /proc filesystem can be found in Linux Device Drivers, page 103.
- The /dev/null device discards everything that is written to it and returns nothing when read from. This device is useful if you want to hide the messages output by a program. For example, "gcc example.c > /dev/null 2>&1" will hide all of the compiler's messages. How would you implement this device driver?
- Modify the skeldev2.c device driver so that the read() file operation returns the data written to the buffer in reverse order.
- Linux Device Drivers, Second Edition, by Alessandro Rubini and Jonathan Corbet. Original version published by O'Reilly & Associates. Available online in PDF and HTML format.
- skeldev.c is a basic implementation of a device driver. It registers itself and creates a /dev/skeldev entry. The driver does not handle any file operation - the default OS actions apply.
- skeldev2.c is a basic device driver implementation with read() and write() file operations.
- test_skel2.c is a test program for the skeldev2.c device.
- skeldev3.c implements the /proc/skeldev entry.