CSC 262

How to write a kernel module for Linux
 

This lab will go over the basics of writing a kernel module for Linux. Recall that a kernel module is a piece of code that can be added into the operating system while it is running, and can then interact with kernel data structures. Typically this might be used to add a new device driver. We will create a kernel module that will look to the operating system like a device driver, but in fact it won't control any physical device. Instead, it will help us to peek inside the OS to find out what is going on. Later labs will develop more complicated modules; for now, we just want to test out the mechanics of creating, inserting and removing a module, and of writing code to move data between kernel memory and user memory.

The remainder of this lab has details on different aspects of kernel modules. You will need to be using the lab kernel to complete it, so make sure you save a local copy of this web page and any other resources you might want access to. (The lab kernel has no network access, but you should be able to see any previously cached pages.) Also, make sure you know how to save your work on a floppy or zip disk.

You should concentrate on the following tasks: First, read through the orientation material in the first three sections, until you get to the part on compiling and loading a module. Practice the compiling and loading process using the sample file skeldev2.c provided below. Next, compile and run the program that uses this module, test_skel2.c, also provided below. Then unload the module. Finally, make the modifications specified in the To Do section. The section on the /proc file system is for reference only -- it is similar to the device file system in many ways. We will eventually create a /proc module, but won't try it for now.

Contents

  1. What is a kernel module?
  2. Classes of devices and modules
  3. The file interface
  4. Compiling and loading a module
  5. Communicating with a module
  6. Testing the module
  7. Debugging with printk()
  8. The /proc filesystem
  9. Generic Kernel Modules
  10. To Do
  11. Resources

What is a kernel module?

A kernel module is an extension to the operating system. The module runs at the same privilege level as the OS (the highest) and can therefore access every resource of the system. Under Linux, a module is nothing more than a C program with a well-defined interface for communicating with user processes and with other parts of the operating system.

In the following sections we will use the term device driver instead of kernel module. A device driver is a kernel module specialized in I/O communication with some sort of device. The term device has a very wide meaning and it does not exclusively refer to an external or physical system. In general, a device is some kind of resource such as a floppy disk, a printer, a mouse, but also a special region of memory, a virtual terminal or a message box.

An excellent source of information on this topic is Linux Device Drivers, by Alessandro Rubini and Jonathan Corbet. This book, published by O'Reilly, is also available online in PDF and HTML formats (see resources).

Writing device drivers for Linux is very easy. By the end of this paper you should be capable of writing a simple device driver. At that point, if you found all of this fascinating, you should definitely read the book above - in any case, take a look at it!

Classes of devices and modules

Linux distinguishes between three types of devices: character, block and network. Each module implements only one of these types and can therefore be classified as a character module, a block module or a network module.

It is possible to identify a module's class by using the ls -l command. For example:

[ealtieri@italia os]$ ls -l /dev/tty
crw-rw-rw- 1 root root 5,0 Jun 15 12:59 /dev/tty

The "c" in the file properties shows that the tty (terminal) device is a character device.

In this document we will discuss character devices only.

The File Interface

One of the greatest features of Linux and UNIX is that almost every resource on the system looks like a file, including devices. As shown in the previous section, device files (called nodes) are located under the /dev directory. Each of these device files is associated with a particular module in the kernel. If the kernel is compiled with Device File System support, the module creates /dev entries automatically at load-time and removes them when it is unloaded.


Because devices are files, we can issue file operations on them such as open(), read(), write() and close(). Every time a file operation is issued on a device file, the kernel module associated with that device must handle the operation. For example:

fd = open("/dev/hda", O_RDONLY);

The above operation opens the /dev/hda device (first hard disk) for read only (O_RDONLY). When open() is issued, the operating system knows that /dev/hda is a device file. Therefore it locates the kernel module associated with the device and calls the device_open() file operation handler in that module. At this point it is up to the device driver to initialize the device and maybe return an error code.

There must be a handler for every possible file operation (listed below). However, the device driver can choose default actions for some operations.

How does the OS know which module is associated with a /dev entry? Each module has to register itself using the devfs_register() function. This function automatically creates an entry in the /dev directory. The module also uses this function to tell the operating system the addresses of its file-operation handler functions, as shown below. Thus, the call to open() above can be translated into the appropriate device-specific function registered through devfs_register().

/* handlers for the file operations */
struct file_operations mydev_fops = {
  open:    mydev_open,   /* handler for the open() operation */
  release: mydev_close,  /* handler for the close() operation */
  /* NULL (default actions) */
};

/* this function is called when the module is loaded */
int mydev_init(void)
{
  ...
  devid = devfs_register( ...  
  "mydev",  /* create /dev/mydev entry */
  ...
  &mydev_fops, /* file ops handlers (see above) */
  ... );

There are several file operations that a module can implement. These are defined in the file_operations structure, in include/linux/fs.h (line 817). For convenience, this structure is reproduced below. For a more detailed description of these operations see Linux Device Drivers, page 64. In this document we will consider only the simplest file operations: read() and write().

struct file_operations {
	struct module *owner;
	loff_t (*llseek) (struct file *, loff_t, int);
	ssize_t (*read) (struct file *, char *, size_t, loff_t *);
	ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
	int (*readdir) (struct file *, void *, filldir_t);
	unsigned int (*poll) (struct file *, struct poll_table_struct *);
	int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
	int (*mmap) (struct file *, struct vm_area_struct *);
	int (*open) (struct inode *, struct file *);
	int (*flush) (struct file *);
	int (*release) (struct inode *, struct file *);
	int (*fsync) (struct file *, struct dentry *, int datasync);
	int (*fasync) (int, struct file *, int);
	int (*lock) (struct file *, int, struct file_lock *);
	ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *);
	ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *);
	ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
	unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
};

Using the above file operations, a program can transfer data to and from a device driver, as in the example below:

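/* PROGRAM a.out - reads the first sector of a floppy disk */

#include <fcntl.h>   /* open(), O_RDONLY */
#include <stdio.h>   /* perror() */
#include <unistd.h>  /* read(), close() */

int main(int argc, char *argv[])
{
  int fd;
  char sector[512];
  ssize_t count;

  fd = open("/dev/fd0", O_RDONLY);  /* open floppy device */
  if (fd < 0) {
    perror("open()");
    return 1;
  }

  count = read(fd, sector, sizeof(sector));  /* read one sector */
  if (count < 0)
    perror("read()");

  close(fd);  /* close device */

  /* do something else... */

  return 0;
}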

If at load time the driver has to register itself, it also needs to unregister when unloaded, as shown below:

void mydev_exit(void)
{
  devfs_unregister(devid);
  ...

devfs_unregister() unregisters the device and automatically removes the /dev entry created at load time.

So far we have assumed that mydev_init() and mydev_exit() are called at load time and unload time respectively. In fact, we must tell the operating system this explicitly, using the following macros:

module_init(mydev_init);
module_exit(mydev_exit);

These macros are generally placed at the end of the file.

skeldev.c: a basic implementation of a device driver. It registers itself and creates a /dev/skeldev entry. The driver does not handle any file operation; the default OS actions apply.

Compiling and loading a module

Compiling a kernel module with gcc requires the use of a number of compiler flags (you may need to scroll the window to see them all):

[ealtieri@italia dev]$ gcc -c -nostdinc -I /usr/src/linux-2.4/include/ -I /usr/lib/gcc-lib/i386-redhat-linux/3.2/include/ -D MODULE -D __KERNEL__ skeldev.c

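Each option is explained below:

  -c             compile only; produce an object file without linking
  -nostdinc      do not search the standard system include directories
  -I dir         add dir to the include search path (here, the kernel headers and gcc's own headers)
  -D MODULE      define the MODULE macro, which tells the kernel headers that the code is being built as a loadable module
  -D __KERNEL__  define the __KERNEL__ macro, which exposes the kernel-only declarations in the headers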

The command above should produce the skeldev.o object file. This object can now be injected in the kernel using the (privileged) insmod command:

[ealtieri@italia dev]$ sudo /sbin/insmod skeldev.o

If insmod does not output any error message, the module has been correctly inserted in the kernel. You can see this with the "List Modules" command (lsmod). Also, the module should have created the /dev/skeldev entry.

[ealtieri@italia dev]$ /sbin/lsmod 
Module                  Size  Used by    Tainted: P
skeldev                  644   0  (unused)

[ealtieri@italia dev]$ ls -l /dev/skeldev 
crw-rw-rw- 1 root  root 8,2 Dec 31  1969 /dev/skeldev

Special note: There is a bug in the Linux kernel makefiles that can sometimes cause insmod to fail with a message about an undefined symbol. If this happens, the following workaround should fix the problem. (Unfortunately it requires recompiling the kernel.) When you have finished the steps below, install the new kernel image in the /boot directory and reboot the machine.

mv .config ..
make mrproper
mv ../.config .
make oldconfig
make dep clean bzImage modules

Communicating with a module

Communication between a device driver and a user process occurs mainly with the read() and write() file operations. Using the basic device driver skeleton above as reference, we can add handlers for the read() and write() file operations.

/* read from device */
static ssize_t skel_read(struct file *filp, char *buf, size_t count, loff_t *offp);

/* write to device */
static ssize_t skel_write(struct file *filp, const char *buf, size_t count, loff_t *offp);

/* file operations handlers */
static struct file_operations skel_fops = {
  read:  skel_read,   /* handler for the read() operation  */
  write: skel_write,  /* handler for the write() operation */
  /* NULL (default) */
};

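The read() file operation takes four arguments:

  filp   pointer to the kernel's struct file for the open device file
  buf    pointer to the user-space buffer that receives the data
  count  number of bytes the caller asked for
  offp   pointer to the current position in the file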

The function returns the number of bytes copied to the user buffer, which may or may not be equal to count.

Following is a simple implementation of skel_read(), which copies data from the driver buffer skel_buffer[] to the user buffer:

/* device buffer */
static unsigned char skel_buffer[SKEL_BUFMAX];

...

/* read from device */
static ssize_t skel_read(struct file *filp, char *buf, size_t count, loff_t *offp)
{
	if (count > SKEL_BUFMAX)
		count = SKEL_BUFMAX;  /* trim data */
	/* copy_to_user() returns the number of bytes it could NOT copy */
	if (copy_to_user(buf, skel_buffer, count))
		return -EFAULT;  /* bad user buffer */
	return count;
}

The write() file operation is similar to read(), but in this case data flows from the user buffer to the driver buffer:

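/* write to device */
static ssize_t skel_write(struct file *filp, const char *buf, size_t count, loff_t *offp)
{
	if (count > SKEL_BUFMAX)
		count = SKEL_BUFMAX;  /* trim data */
	/* copy_from_user() returns the number of bytes it could NOT copy */
	if (copy_from_user(skel_buffer, buf, count))
		return -EFAULT;  /* bad user buffer */
	return count;
}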
skeldev2.c: a basic device driver implementation with read() and write() file operations.

Testing the module

To test the skeldev2.c device driver we can write a simple C program that opens the skeldev2 device, writes some data and then retrieves it. This is shown below.

#include <fcntl.h>   /* open(), O_RDWR */
#include <stdio.h>   /* printf(), perror() */
#include <stdlib.h>  /* exit() */
#include <string.h>  /* memset(), strcpy() */
#include <unistd.h>  /* read(), write(), close() */

int main(void)
{
	int fd;
	ssize_t count;
	char buf[50];

	/* open device */
	if ((fd = open("/dev/skeldev2", O_RDWR)) < 0) {
		perror("open()");
		exit(1);
	}

	/* write to device */
	memset(buf, 0x00, sizeof(buf));  /* clear buffer */
	strcpy(buf, "Hello World!");
	count = write(fd, buf, sizeof(buf));
	printf("Wrote %ld bytes to device\n", (long)count);

	/* read from device */
	memset(buf, 0x00, sizeof(buf));  /* clear buffer */
	count = read(fd, buf, sizeof(buf));
	printf("Read %ld bytes from device: %s\n", (long)count, buf);

	/* close device */
	close(fd);
	exit(0);
}
test_skel2.c: a test program for the skeldev2.c device.

Debugging with printk()

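Because a kernel module does not run in user space, the C library is not available to it. This means that the familiar printf() function will not work in a kernel module. Fortunately, the kernel provides a similar function, printk(), which your device driver can use to output messages. However, there are some important differences between these two functions:

  - Each printk() message can carry a priority, written as a macro in front of the format string, for example printk(KERN_INFO "skeldev: loaded\n").
  - The output does not go to the standard output of a user process. It goes to the kernel log, which you can view with the dmesg command and which the system logger usually copies to /var/log/messages.
  - printk() does not support floating-point formats.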

The /proc filesystem

A different way to communicate with device drivers is the /proc file system. For example, let's examine the /proc/meminfo file:

[ealtieri@italia os]$ ls -l /proc/meminfo
-r--r--r-- 1 root  root  0 Jun 19 11:20 /proc/meminfo
[ealtieri@italia os]$ cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  525320192 447213568 78106624        0 81035264 157761536
Swap: 271392768        0 271392768
MemTotal:       513008 kB
MemFree:         76276 kB
MemShared:           0 kB
Buffers:         79136 kB
...

As you can see from the ls command above, the /proc/meminfo file has size zero (the number to the left of the date). However, when we show the contents of the file with cat, the file appears to contain information. How can we explain this? The trick is that files under the /proc file system are generated when they are read. Each of these files is associated with a module in the kernel. When the file is read, using cat for example, the kernel locates its module and calls a function to generate the contents of the file.

/proc operations are handled in a similar way to the file operations described earlier. You can define a handler for a read operation on a /proc entry and one for a write operation. In this document we will describe only the read handler.

First, a proc entry is created at load time using the create_proc_read_entry() function:

/* create /proc/xxxx entry */
proc = create_proc_read_entry
	(
	 skel_name,       /* entry name (/proc/skeldev) */
	 0,               /* default mode */
	 NULL,            /* parent directory (NULL=/proc) */
	 skel_read_proc,  /* read() operation handler */
	 NULL             /* other data */
	 );

The code above registers skel_read_proc() as the handler for the read operation on the /proc/skeldev entry. This function must have the following prototype:

static int skel_read_proc(char *buf, char **start, off_t offset, int count, int *eof, void *data)

The contents of the /proc/skeldev file are generated "on the fly" by writing to the buf parameter. A simple implementation of skel_read_proc() could be the following:

/* Handler for /proc/skeldev read */
static int skel_read_proc(char *buf, char **start, off_t offset, int count, int *eof, void *data)
{
	if (count > SKEL_BUFMAX)
		count = SKEL_BUFMAX;
	memcpy(buf, skel_buffer, count);  /* generate file */
	*eof = 1;                         /* end of file */
	return(count);
} /* skel_read_proc() */

The functions can be tested as follows:

[ealtieri@italia dev]$ sudo /sbin/insmod skeldev3.o 
[ealtieri@italia dev]$ echo "Hello World" > /dev/skeldev 
[ealtieri@italia dev]$ cat /proc/skeldev 
Hello World
[ealtieri@italia dev]$
skeldev3.c: implements the /proc/skeldev entry.

The /proc filesystem is a simple way to communicate system information to user processes and is extensively used under Linux. Examples of /proc entries are /proc/meminfo, which displays information about memory, and /proc/cpuinfo, which displays processor type and features.

Writing handlers for /proc files can become complicated if the data to be output is large. In this case, several read() calls may be needed to retrieve all of the data. Between successive read() calls, the module or kernel data structures being read can change, causing inconsistencies in the output.

More information about the /proc filesystem can be found in Linux Device Drivers, page 103.

Generic Kernel Modules

It's also possible to create a generic kernel module that doesn't use either the /proc or the devfs protocols. It's a little more work, but not too difficult. The module must have a unique major number between 0 and 255. (If you find that the number you chose conflicts with another device, try another.) The functions used to register and unregister your module differ a bit, as do the call signatures of the functions implemented. Here's a sample of a generic kernel module: skeldev4.c.

Generic kernel modules are compiled just like any other. Installation is a two-step process the first time, but only one step thereafter. (The second step below only needs to be performed once.) If the module number chosen is 201, the first-time installation process looks like this:

$ sudo /sbin/insmod mymodule.o
$ sudo mknod /dev/mymodule c 201 0
 

To Do

  1. The /dev/null device discards everything written to it and returns nothing when read. This device is useful if you want to hide the messages output by a program. For example, "gcc example.c > /dev/null" hides everything the compiler writes to standard output. Modify the skeleton device driver to create a new module, mynull.c, that mimics /dev/null.
  2. Modify the skeldev2.c device driver so that the read() file operation returns the data written to the buffer in reverse order. Call it revdev.c. Also, test out printk() by making your driver log the input data every time new data is written to it. Check the messages file to see that it works.

Resources

Linux Device Drivers, Second Edition, by Alessandro Rubini and Jonathan Corbet. Original version published by O'Reilly & Associates. Available online in PDF and HTML format.
skeldev.c: a basic implementation of a device driver. It registers itself and creates a /dev/skeldev entry. The driver does not handle any file operation; the default OS actions apply.
skeldev2.c: a basic device driver implementation with read() and write() file operations.
test_skel2.c: a test program for the skeldev2.c device.
skeldev3.c: implements the /proc/skeldev entry.
