CSC 262 Lab: Kernel Modules

This lab will go over the basics of writing a kernel module for Linux. A kernel module is a piece of code that can be added into the operating system while it is running, and can then interact with kernel data structures. Typically this might be used to add a new device driver, and in fact kernel modules look and act more or less like device drivers, even if they do not actually correspond to a physical device. We will take advantage of this capability to create a kernel module that will eventually help us to peek inside the OS to find out what is going on. Later labs will develop more complicated modules that look at kernel data structures; for now, we just want to test out the mechanics of creating, inserting and removing a module, and of writing code to move data between kernel memory and user memory. (For security and optimization reasons, the kernel reserves some portion of the computer's memory for its own use, and user programs are not allowed to access this area.)

Kernel modules and device drivers continue to evolve as Linux does. The way kernel modules worked in version 2.4 of the kernel has changed in version 2.6, and may change again with future versions. Therefore, keep in mind that what you see here is just one flavor of the many possible configurations for kernel modules.

What is a kernel module?
A first kernel module: Hello, World
Debugging with printk()
Classes of devices and modules
The file interface
The /proc file system
Exercises

What is a kernel module?

A kernel module is an extension to the operating system. It is a piece of code written explicitly to work with the existing kernel, but designed to be inserted or removed as necessary while the OS is running. The module runs at the same privilege level as the rest of the OS (the highest) and can therefore access every resource of the system. Under Linux, a module is nothing more than a C program with a well-defined interface to communicate with user processes and with other parts of the operating system. The kernel module must register itself with the operating system by providing a predefined set of function pointers that the OS can call to complete certain standard tasks (initialization and cleanup). Many kernel modules also register themselves to perform additional tasks: for example, a device driver will register functions to read, write, etc.

In the following sections we will sometimes use the term device driver instead of kernel module. A device driver is a kernel module specialized in I/O communication with some sort of device. The term device has a very wide meaning and it does not exclusively refer to an external or physical system. In general, a device is some kind of resource such as a floppy disk, a printer, a mouse, but also a special region of memory, a virtual terminal or a message box.

Many resources are available on the web that cover details of kernel module development. Most of the examples in this lab are based upon ones obtained from The Linux Kernel Module Programming Guide. Another excellent source of information, though somewhat dated at this point, is Linux Device Drivers, by Alessandro Rubini and Jonathan Corbet. This book, published by O'Reilly, is also available online in PDF and HTML formats (see resources).

A First Kernel Module: Hello, World

Developing a kernel module and getting it to work with your existing operating system is more complicated than running a simple user-level program, but not greatly so. This section walks through the steps of a typical module development cycle, using a bare-bones module that does little more than register itself and print a message to a log file. The code has been heavily commented to help you understand what is going on. You'll want to follow a similar process with the more complicated modules described later.

Write the source code. In this particular example, we will be using a program called hello.c. It has been completely written for you, but you should look over the code and try to understand how it works.

hello.c is a basic implementation of a device driver. It registers itself and prints a message to the system log file.
Compile the source code. It used to be possible to do this by hand. However, the configuration for kernel modules has gotten so complicated that it is better to do so automatically using make. Create a Makefile with the following contents:
```
obj-m += hello.o

all:
	make -C /usr/src/redhat/BUILD/kernel-2.6.20/linux-2.6.20.i686/ M=$(PWD) modules

clean:
	make -C /usr/src/redhat/BUILD/kernel-2.6.20/linux-2.6.20.i686/ M=$(PWD) clean
```
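The top line informs make that hello.o should be compiled as a kernel module, with all the settings appropriate thereto. The all and clean targets are conveniences: each invokes the kernel's own build system in the kernel source directory (-C), telling it via M=$(PWD) to build or clean the module in the current directory.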
Once set up, make will compile the file, creating hello.ko. (The .ko suffix indicates that the file has been compiled as a kernel object.) You can get information about the kernel module, including the kernel version it was compiled for, using the modinfo command.
```
$ sudo /sbin/modinfo hello.ko
```
Your module should be ready for insertion into the kernel. If all goes well, the command below should execute silently.
```
$ sudo /sbin/insmod hello.ko
```
If the version of the kernel you are running does not match the version the kernel module was compiled for, the command will fail. It will also fail if there are any other problems with the module file.
Test the module as necessary. In the case of this module, it should print a message to the system log file (/var/log/messages). You can view this file as the superuser; probably you only want to look at the last few lines.
```
$ sudo tail /var/log/messages
```
We can also look at the list of currently installed modules, and make sure that ours is present. The file /proc/modules will give us a list of all currently installed modules. Either look through the whole list, or use grep to filter it down to the one we want.
```
$ cat /proc/modules
  ...
$ cat /proc/modules | grep hello
  ...
```
When you are done, or if there are errors and you wish to update the code and try again, remove the module from the kernel. Our module prints a message to the system log during cleanup also, so we can verify that the cleanup function was indeed called.
```
$ sudo /sbin/rmmod hello
$ sudo tail /var/log/messages
  ...
$ cat /proc/modules | grep hello
  ...
```
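
The grep check above can also be done from a C program. The following is a minimal user-space sketch, assuming the usual /proc/modules layout in which the module name is the first whitespace-delimited field of each line. The function names line_names_module() and module_loaded() are made up for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if one line of /proc/modules describes the module `name`.
 * Assumption: the module name is the first field on the line. */
int line_names_module(const char *line, const char *name)
{
    size_t n = strlen(name);

    /* The name must match exactly and be followed by a field separator. */
    return strncmp(line, name, n) == 0 && (line[n] == ' ' || line[n] == '\t');
}

/* Return 1 if `name` is loaded, 0 if it is not,
 * and -1 if /proc/modules cannot be opened. */
int module_loaded(const char *name)
{
    char line[512];
    int found = 0;
    FILE *fp = fopen("/proc/modules", "r");

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (line_names_module(line, name)) {
            found = 1;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

Unlike a plain grep for hello, this comparison does not match a different module whose name merely starts with or contains "hello". Calling module_loaded("hello") after insmod should return 1, and 0 again after rmmod.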

Debugging with printk()

Because a kernel module does not run in user space, the standard C library is not available to it. This means that the familiar printf() function will not work in a kernel module. The sample module above uses a similar function provided by the kernel, called printk(). However, there are some important differences between these two functions:

printk() requires a "<x>" at the very beginning of the string to be printed, where "x" is a number from 0 to 7 indicating the priority of the message (0 is the most urgent). For example:
printk("<7>Hello world!\n");
will print a message at the lowest (DEBUG) priority. Priorities are defined in include/linux/kernel.h (line 30). Rather than using numbers, it is better style to use the macros provided in the same file, which look like KERN_xxx. The line above could be rewritten as follows:
printk(KERN_DEBUG "Hello world!\n");
printk() will not print the message to standard output. Instead, the message is sent to the system log, located in /var/log/messages. To display the most recent messages in this log you can use the tail command as shown above.
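
The two printk() calls above are equivalent because a KERN_xxx macro is just a string literal, and C joins adjacent string literals at compile time. The sketch below illustrates this in user space, assuming the "<x>" prefix form described above with levels 0 through 7. The DEMO_ macros and both functions are hypothetical stand-ins, not the kernel's own definitions.

```c
#include <stdio.h>

/* Hypothetical user-space stand-ins for two of the kernel's KERN_xxx macros. */
#define DEMO_KERN_ERR   "<3>"
#define DEMO_KERN_DEBUG "<7>"

/* Return the priority encoded as "<x>" at the start of msg, or -1 if absent. */
int demo_loglevel(const char *msg)
{
    if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>')
        return msg[1] - '0';
    return -1;
}

/* Return the message text with any "<x>" prefix skipped. */
const char *demo_text(const char *msg)
{
    return demo_loglevel(msg) >= 0 ? msg + 3 : msg;
}
```

Here demo_loglevel(DEMO_KERN_DEBUG "Hello world!\n") and demo_loglevel("<7>Hello world!\n") both return 7, because the compiler sees the same string in each case.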

Now that you know a little more about kernel modules, let's try something a little more tricky. This is a variant on hello.c that prints a message to the current terminal window when it is inserted or removed, instead of writing to the log file. (Try inserting it in one window and removing it in another.) This is a more complicated process, but the comments do a pretty good job of explaining what is going on.

hellotty.c is a basic implementation of a device driver. It registers itself and prints a message to the current terminal.

Classes of devices and modules

Linux distinguishes between three types of devices. Any single module acting as a device driver implements only one of these types and thus is classifiable as a character device, block device or a network device.

A character device is one that can be accessed as a stream of bytes, like a file. Reading from such a device can be done byte by byte. These devices are located under the /dev directory. Examples are /dev/tty (terminal) and /dev/port (I/O port access).
A block device is a device that can be accessed only as multiples of a block, where the size of a block depends on the device. This category includes hard disks and floppy disks, where the size of a block is two sectors (1 KB). Examples are /dev/floppy/0u1440 (first floppy, 1.4 MB) and /dev/hda (first hard disk).
A network device is a device in charge of exchanging data with other hosts. A network module is not visible on the file system (i.e. cannot be found in the /dev directory).

It is possible to identify a device's class by running the ls -l command on its device file. For example:

[ealtieri@italia os]$ ls -l /dev/tty
crw-rw-rw- 1 root root 5,0 Jun 15 12:59 /dev/tty

The "c" in the file properties shows that the tty (terminal) device is a character device.
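
The same information that ls -l displays is available to a program through the stat() system call. The sketch below is a user-space illustration; the function name device_class() is invented for this example.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Return 'c' for a character device, 'b' for a block device,
 * '-' for any other kind of file, and '?' if stat() fails. */
char device_class(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return '?';
    if (S_ISCHR(st.st_mode))
        return 'c';
    if (S_ISBLK(st.st_mode))
        return 'b';
    return '-';
}
```

On a typical Linux system, device_class("/dev/tty") should return 'c', matching the first character of the ls -l output. Network devices cannot be checked this way, since they have no entry in the file system.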

In this document we will discuss character devices only.

The File Interface

One of the greatest features of Linux and UNIX is that almost every resource on the system looks like a file, including devices. As shown in the previous section, device files (called nodes) are located under the /dev directory. Each of these device files is associated with a particular module in the kernel.

Because devices are files, we can issue file operations on them such as open(), read(), write() and close(). Every time a file operation is issued on a device file, the driver (kernel module) associated with the device must handle that operation. For example, a C program might include the following line.

fd = open("/dev/hda", O_RDONLY);

The above operation opens the /dev/hda device (first hard disk) for read only (O_RDONLY). When open() is issued, the operating system knows that /dev/hda is a device file. Therefore it locates the kernel module associated with the device and calls the device_open() file operation handler in that module. At this point it is up to the device driver to initialize the device and maybe return an error code.
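
From the user's side, the whole exchange might look like the following sketch. The helper name read_device() is made up for this illustration, and the comments describe the handlers a typical character driver would provide.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open a device file read-only, read up to count bytes into buf, close it.
 * Returns the number of bytes read, or -1 if open() or read() fails. */
ssize_t read_device(const char *path, char *buf, size_t count)
{
    ssize_t n;
    int fd = open(path, O_RDONLY);  /* the driver's open handler runs here */

    if (fd < 0)
        return -1;                  /* e.g. no such file, or permission denied */
    n = read(fd, buf, count);       /* the driver's read handler runs here */
    close(fd);                      /* the driver's release handler runs here */
    return n;
}
```

The return value is decided entirely by the driver. For instance, reading from /dev/null yields 0 bytes (end of file), while reading from /dev/zero fills the buffer with zero bytes.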

There is a slot for a handler for every possible file operation (listed below). However, not all operations are appropriate for all devices, so a device driver can leave the slots it does not need unset (NULL), in which case the kernel falls back on a default action.

How does the OS know which module is associated with a /dev entry? Each module has to register itself using an appropriate function. For example, the function to register a character device in Linux is called register_chrdev(). This function tells the operating system the address of the various file-operations handler functions, and returns an integer called the major number of the device (which should be unique among all the devices on the system).

One more step is required to complete the installation of a device driver: it must be connected to the file system by creating a corresponding device file. This is the file name (like /dev/hda above) that will be referenced by user processes wishing to interact with the device. To do this properly, you will need the major number returned by the kernel when the device driver was first registered. In the example below, it is 251; we also provide a minor number of 0, and the c indicates a character device. (The minor number is used by drivers that control more than one device.)

$ sudo mknod /dev/my_char_device c 251 0
$ sudo chmod 666 /dev/my_char_device
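
The major and minor numbers given to mknod are packed into a single device number stored in the device file. The sketch below shows how a user program can build and inspect such numbers, assuming glibc's makedev(), major() and minor() macros from <sys/sysmacros.h>. The function names demo_device_number() and device_major() are invented for this example.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* Build the device number that "mknod ... c 251 0" would record in the node. */
dev_t demo_device_number(unsigned int major_no, unsigned int minor_no)
{
    return makedev(major_no, minor_no);
}

/* Return the major number of an existing device file,
 * or -1 if the path cannot be examined or is not a device. */
int device_major(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISCHR(st.st_mode) && !S_ISBLK(st.st_mode))
        return -1;
    return (int)major(st.st_rdev);
}
```

After running the mknod command above, device_major("/dev/my_char_device") would be expected to return 251, which is how the kernel finds the driver that registered that major number.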

There are several file operations that a module can implement. These are defined in the file_operations structure, in include/linux/fs.h (line 817). For convenience, this structure has been reproduced below. For a more detailed description of these operations see Linux Device Drivers, page 64. The example file below uses only the simplest file operations: read(), write(), open() and close().

struct file_operations {
	struct module *owner;
	loff_t (*llseek) (struct file *, loff_t, int);
	ssize_t (*read) (struct file *, char *, size_t, loff_t *);
	ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
	int (*readdir) (struct file *, void *, filldir_t);
	unsigned int (*poll) (struct file *, struct poll_table_struct *);
	int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
	int (*mmap) (struct file *, struct vm_area_struct *);
	int (*open) (struct inode *, struct file *);
	int (*flush) (struct file *);
	int (*release) (struct inode *, struct file *);
	int (*fsync) (struct file *, struct dentry *, int datasync);
	int (*fasync) (int, struct file *, int);
	int (*lock) (struct file *, int, struct file_lock *);
	ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *);
	ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *);
	ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
	unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
};
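
The idea behind this structure is a table of function pointers that the OS consults whenever a file operation arrives. The following user-space sketch imitates that idea with a much-reduced table and an 80-character buffer. Every name in it (struct demo_file_ops, demo_dispatch_read(), and so on) is hypothetical, and the real kernel interface takes more arguments and must copy data between kernel and user memory.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define DEMO_BUF_LEN 80

/* Hypothetical, much-reduced stand-in for struct file_operations. */
struct demo_file_ops {
    ssize_t (*read)(char *dst, size_t count);
    ssize_t (*write)(const char *src, size_t count);
};

static char demo_buf[DEMO_BUF_LEN];  /* the "device": a small memory buffer */
static size_t demo_len;              /* number of valid bytes in demo_buf */

static ssize_t demo_buf_read(char *dst, size_t count)
{
    if (count > demo_len)
        count = demo_len;
    memcpy(dst, demo_buf, count);
    return (ssize_t)count;
}

static ssize_t demo_buf_write(const char *src, size_t count)
{
    if (count > DEMO_BUF_LEN)
        count = DEMO_BUF_LEN;        /* silently truncate oversized writes */
    memcpy(demo_buf, src, count);
    demo_len = count;
    return (ssize_t)count;
}

/* A "driver" supporting both operations, and one that leaves write unset. */
const struct demo_file_ops demo_buffer_ops = {
    .read = demo_buf_read, .write = demo_buf_write
};
const struct demo_file_ops demo_readonly_ops = {
    .read = demo_buf_read, .write = NULL
};

/* The "OS" side: call the handler if one is registered, otherwise fail. */
ssize_t demo_dispatch_read(const struct demo_file_ops *ops, char *dst, size_t count)
{
    return ops->read != NULL ? ops->read(dst, count) : -1;
}

ssize_t demo_dispatch_write(const struct demo_file_ops *ops, const char *src, size_t count)
{
    return ops->write != NULL ? ops->write(src, count) : -1;
}
```

Writing "hi" through demo_buffer_ops and then reading it back returns the same two bytes, while any write through demo_readonly_ops is rejected because no handler was registered for it.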

Compile and install the kernel module buffer.c below. It creates a new character device consisting of a small (80-character) memory buffer. Once the module has been installed, create a device file for it called /dev/buffer. You should be able to read and write from the new device file using shell commands. When you are done, you can remove the module and the associated device file.

$ cat /dev/buffer
  ...
$ echo "Try changing the buffer message." > /dev/buffer
  ...
$ cat /dev/buffer
  ...

buffer.c is a basic implementation of a character device driver. It registers itself and writes a message to the system log file containing the major number of the new device.

The /proc filesystem

Full-blown character devices are often more than we need for a particular task. Many kernel modules exist solely to provide information about the status of the operating system. The /proc file system simplifies the creation of such utility modules. Although a /proc kernel module will look similar to that of a general character device, there are a few differences.

An entry is created in the file system under the /proc directory when the device is registered. This makes it unnecessary to issue the mknod command.
Although it is possible to create a /proc entry that takes input, most communicate in only one direction (sending kernel information to the user). Thus the only operation which must be defined is read().

You have already seen several examples of how to use pieces of the /proc file system. Here is another, providing information on the status of memory usage.

[ealtieri@italia os]$ ls -l /proc/meminfo
-r--r--r-- 1 root root 0 Jun 19 11:20 /proc/meminfo
[ealtieri@italia os]$ cat /proc/meminfo
        total:     used:     free:  shared:  buffers:   cached:
Mem:  525320192 447213568  78106624        0  81035264 157761536
Swap: 271392768         0 271392768
MemTotal:   513008 kB
MemFree:     76276 kB
MemShared:       0 kB
Buffers:     79136 kB
...

As you can see from the ls command above, the /proc/meminfo file has size zero (the number to the left of the date). However, when we show the contents of the file with cat, the file appears to contain information. How can we explain this? The trick is that files under the /proc file system are generated when they are read. When the file is read, using cat for example, the kernel locates the associated module and calls its read() function to generate the contents of the file.

Try out the sample /proc module below. You will compile and insert it into the kernel normally. At that point, you should see the /proc/test file automatically created. Examine its contents a few times using cat.

procfs.c contains a kernel module that implements the /proc/test entry.

The /proc filesystem is a simple way to communicate system information to user processes and is extensively used under Linux. Examples of /proc entries are /proc/meminfo (above), which displays information about memory, and /proc/cpuinfo, which displays processor type and features.

Writing handlers for /proc files can become complicated when the output is large. In that case, several read() calls may be needed to retrieve all of the data. Between successive read() calls, the module's or the kernel's data structures being read can change, making the output inconsistent.
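
The inconsistency problem can be reproduced in user space. The sketch below imitates a read handler that regenerates its text on every call and copies out the slice starting at the caller's current offset. The names demo_counter and demo_proc_read() are hypothetical, and a real /proc handler has a different signature.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical kernel-side value that may change between read() calls. */
static int demo_counter = 0;

/* Regenerate the full text, then copy out at most count bytes starting
 * at offset *pos. Returns the bytes copied; 0 means end of file. */
ssize_t demo_proc_read(char *dst, size_t count, long *pos)
{
    char text[64];
    int len = snprintf(text, sizeof text, "counter: %d\n", demo_counter);

    if (*pos >= len)
        return 0;
    if (count > (size_t)(len - *pos))
        count = (size_t)(len - *pos);
    memcpy(dst, text + *pos, count);
    *pos += (long)count;
    return (ssize_t)count;
}
```

Suppose demo_counter is 10 and the reader asks for only 10 bytes, receiving "counter: 1". If the counter then changes to 25 before the second read, the remaining bytes are "5" and a newline, so the reader assembles "counter: 15", a value the counter never held. Real handlers avoid this by delivering the output in a single read where possible, or by taking a consistent snapshot of the data first.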

More information about the /proc filesystem can be found in Linux Device Drivers, page 103.

Exercises

	Q1. The `/dev/null` device discards everything that is written to it and returns nothing when read from. This device is useful if you want to hide the messages output by a program. For example: `"gcc example.c > /dev/null 2>&1"` will hide all of the compiler's messages (the `2>&1` is needed because the compiler writes its diagnostics to standard error). Modify `buffer.c` to create a new module, `mynull.c`, that mimics `/dev/null`.
	Q2. Modify the `buffer.c` device driver so that the `read()` file operation returns the data written to the buffer in reverse order. Call it `revbuf.c`. Also, test out `printk` by making your driver log the input data every time new data is written to it. Check the messages file to see that it works.