|
|||
Part I Designing Device Drivers for the Solaris Platform 1. Overview of Solaris Device Drivers 2. Solaris Kernel and Device Tree 5. Managing Events and Queueing Tasks 7. Device Access: Programmed I/O 10. Mapping Device and Kernel Memory 14. Layered Driver Interface (LDI) Part II Designing Specific Kinds of Device Drivers 15. Drivers for Character Devices 18. SCSI Host Bus Adapter Drivers 19. Drivers for Network Devices Part III Building a Device Driver 21. Compiling, Loading, Packaging, and Testing Drivers 22. Debugging, Testing, and Tuning Device Drivers 23. Recommended Coding Practices B. Summary of Solaris DDI/DKI Services C. Making a Device Driver 64-Bit Ready |
Testing DriversTo avoid data loss and other problems, you should take special care when testing a new device driver. This section discusses various testing strategies. For example, setting up a separate system that you control through a serial connection is the safest way to test a new driver. You can load test modules with various kernel variable settings to test performance under different kernel conditions. Should your system crash, you should be prepared to restore backup data, analyze any crash dumps, and rebuild the device directory. Enable the Deadman Feature to Avoid a Hard HangIf your system is in a hard hang, then you cannot break into the debugger. If you enable the deadman feature, the system panics instead of hanging indefinitely. You can then use the kmdb(1) kernel debugger to analyze your problem. The deadman feature checks every second whether the system clock is updating. If the system clock is not updating, then you are in an indefinite hang. If the system clock has not been updated for 50 seconds, the deadman feature induces a panic and puts you in the debugger. Take the following steps to enable the deadman feature:
Note that any zones on your system inherit the deadman setting as well. If your system hangs while the deadman feature is enabled, you should see output similar to the following example on your console: panic[cpu1]/thread=30018dd6cc0: deadman: timed out after 9 seconds of clock inactivity panic: entering debugger (continue to save dump) Inside the debugger, use the ::cpuinfo command to investigate why the clock interrupt was not able to fire and advance the system time. Testing With a Serial ConnectionUsing a serial connection is a good way to test drivers. Use the tip(1) command to make a serial connection between a host system and a test system. With this approach, the tip window on the host console is used as the console of the test machine. See the tip(1) man page for additional information. A tip window has the following advantages:
Note - Although using a tip connection and a second machine are not required to debug a Solaris device driver, this technique is still recommended. To Set Up the Host System for a tip Connection
Setting Up a Target System on the SPARC PlatformA quick way to set up the test machine on the SPARC platform is to unplug the keyboard before turning on the machine. The machine then automatically uses serial port A as the console. Another way to set up the test machine is to use boot PROM commands to make serial port A the console. On the test machine, at the boot PROM ok prompt, direct console I/O to the serial line. To make the test machine always come up with serial port A as the console, set the environment variables: input-device and output-device. Example 22-1 Setting input-device and output-device With Boot PROM Commandsok setenv input-device ttya ok setenv output-device ttya The eeprom command can also be used to make serial port A the console. As superuser, execute the following commands to make the input-device and output-device parameters point to serial port A. The following example demonstrates the eeprom command. Example 22-2 Setting input-device and output-device With the eeprom Command# eeprom input-device=ttya # eeprom output-device=ttya The eeprom commands cause the console to be redirected to serial port A at each subsequent system boot. Setting Up a Target System on the x86 PlatformOn x86 platforms, use the eeprom command to make serial port A the console. This procedure is the same as the SPARC platform procedure. See Setting Up a Target System on the SPARC Platform. The eeprom command causes the console to switch to serial port A (COM1) during reboot. Note - x86 machines do not transfer console control to the tip connection until an early stage in the boot process unless the BIOS supports console redirection to a serial port. In SPARC machines, the tip connection maintains console control throughout the boot process. Setting Up Test ModulesThe system(4) file in the /etc directory enables you to set the value of kernel variables at boot time. With kernel variables, you can toggle different behaviors in a driver and take advantage of debugging features that are provided by the kernel. The kernel variables moddebug and kmem_flags, which can be very useful in debugging, are discussed later in this section. See also Enable the Deadman Feature to Avoid a Hard Hang. Changes to kernel variables after boot are unreliable, because /etc/system is read only once when the kernel boots. After this file is modified, the system must be rebooted for the changes to take effect. If a change in the file causes the system not to work, boot with the ask (-a) option. Then specify /dev/null as the system file. Note - Kernel variables cannot be relied on to be present in subsequent releases. Setting Kernel VariablesThe set command changes the value of module or kernel variables. To set module variables, specify the module name and the variable: set module_name:variable=value For example, to set the variable test_debug in a driver that is named myTest, use set as follows: % set myTest:test_debug=1 To set a variable that is exported by the kernel itself, omit the module name. You can also use a bitwise OR operation to set a value, for example: % set moddebug | 0x80000000 Loading and Unloading Test ModulesThe commands modload(1M), modunload(1M), and modinfo(1M) can be used to add test modules, which is a useful technique for debugging and stress-testing drivers. These commands are generally not needed in normal operation, because the kernel automatically loads needed modules and unloads unused modules. The moddebug kernel variable works with these commands to provide information and set controls. Using the modload() FunctionUse modload(1M) to force a module into memory. The modload command verifies that the driver has no unresolved references when that driver is loaded. Loading a driver does not necessarily mean that the driver can attach. When a driver loads successfully, the driver's _info(9E) entry point is called. The attach() entry point is not necessarily called. Using the modinfo() FunctionUse modinfo(1M) to confirm that the driver is loaded. Example 22-3 Using modinfo to Confirm a Loaded Driver$ modinfo Id Loadaddr Size Info Rev Module Name 6 101b6000 732 - 1 obpsym (OBP symbol callbacks) 7 101b65bd 1acd0 226 1 rpcmod (RPC syscall) 7 101b65bd 1acd0 226 1 rpcmod (32-bit RPC syscall) 7 101b65bd 1acd0 1 1 rpcmod (rpc interface str mod) 8 101ce8dd 74600 0 1 ip (IP STREAMS module) 8 101ce8dd 74600 3 1 ip (IP STREAMS device) ... $ modinfo | grep mydriver 169 781a8d78 13fb 0 1 mydriver (Test Driver 1.5) The number in the info field is the major number that has been chosen for the driver. The modunload(1M) command can be used to unload a module if the module ID is provided. The module ID is found in the left column of modinfo output. Sometimes a driver does not unload as expected after a modunload is issued, because the driver is determined to be busy. This situation occurs when the driver fails detach(9E), either because the driver really is busy, or because the detach entry point is implemented incorrectly. Using modunload()To remove all of the currently unused modules from memory, run modunload(1M) with a module ID of 0: # modunload -i 0 Setting the moddebug Kernel VariableThe moddebug kernel variable controls the module loading process. The possible values of moddebug are:
Setting kmem_flags Debugging FlagsThe kmem_flags kernel variable enables debugging features in the kernel's memory allocator. Set kmem_flags to 0xf to enable the allocator's debugging features. These features include runtime checks to find the following code conditions:
The Solaris Modular Debugger Guide describes how to use the kernel memory allocator to analyze such problems. Note - Testing and developing with kmem_flags set to 0xf can help detect latent memory corruption bugs. Because setting kmem_flags to 0xf changes the internal behavior of the kernel memory allocator, you should thoroughly test without kmem_flags as well. Avoiding Data Loss on a Test SystemA driver bug can sometimes render a system incapable of booting. By taking precautions, you can avoid system reinstallation in this event, as described in this section. Back Up Critical System FilesA number of driver-related system files are difficult, if not impossible, to reconstruct. Files such as /etc/name_to_major, /etc/driver_aliases, /etc/driver_classes, and /etc/minor_perm can be corrupted if the driver crashes the system during installation. See the add_drv(1M) man page. To be safe, make a backup copy of the root file system after the test machine is in the proper configuration. If you plan to modify the /etc/system file, make a backup copy of the file before making modifications. To Boot With an Alternate KernelTo avoid rendering a system inoperable, you should boot from a copy of the kernel and associated binaries rather than from the default kernel.
Example 22-4 Booting an Alternate KernelThe following example demonstrates booting with an alternate kernel. ok boot kernel.test/sparcv9/unix Rebooting with command: boot kernel.test/sparcv9/unix Boot device: /sbus@1f,0/espdma@e,8400000/esp@e,8800000/sd@0,0:a File and \ args: kernel.test/sparcv9/unix Example 22-5 Booting an Alternate Kernel With the -a OptionAlternatively, the module path can be changed by booting with the ask (-a) option. This option results in a series of prompts for configuring the boot method. ok boot -a Rebooting with command: boot -a Boot device: /sbus@1f,0/espdma@e,8400000/esp@e,8800000/sd@0,0:a File and \ args: -a Enter filename [kernel/sparcv9/unix]: kernel.test/sparcv9/unix Enter default directory for modules [/platform/sun4u/kernel.test /kernel /usr/kernel]: <CR> Name of system file [etc/system]: <CR> SunOS Release 5.10 Version Generic 64-bit Copyright 1983-2002 Sun Microsystems, Inc. All rights reserved. root filesystem type [ufs]: <CR> Enter physical name of root device [/sbus@1f,0/espdma@e,8400000/esp@e,8800000/sd@0,0:a]: <CR> Consider Alternative Back-Up PlansIf the system is attached to a network, the test machine can be added as a client of a server. If a problem occurs, the system can be booted from the network. The local disks can then be mounted, and any fixes can be made. Alternatively, the system can be booted directly from the Solaris system CD-ROM. Another way to recover from disaster is to have another bootable root file system. Use format(1M) to make a partition that is the exact size of the original. Then use dd(1M) to copy the bootable root file system. After making a copy, run fsck(1M) on the new file system to ensure its integrity. Subsequently, if the system cannot boot from the original root partition, boot the backup partition. Use dd(1M) to copy the backup partition onto the original partition. You might have a situation where the system cannot boot even though the root file system is undamaged. For example, the damage might be limited to the boot block or the boot program. In such a case, you can boot from the backup partition with the ask (-a) option. You can then specify the original file system as the root file system. Capture System Crash DumpsWhen a system panics, the system writes an image of kernel memory to the dump device. The dump device is by default the most suitable swap device. The dump is a system crash dump, similar to core dumps generated by applications. On rebooting after a panic, savecore(1M) checks the dump device for a crash dump. If a dump is found, savecore makes a copy of the kernel's symbol table, which is called unix.n. The savecore utility then dumps a core file that is called vmcore.n in the core image directory. By default, the core image directory is /var/crash/machine_name. If /var/crash has insufficient space for a core dump, the system displays the needed space but does not actually save the dump. The mdb(1) debugger can then be used on the core dump and the saved kernel. In the Solaris operating system, crash dump is enabled by default. The dumpadm(1M) command is used to configure system crash dumps. Use the dumpadm command to verify that crash dumps are enabled and to determine the location of core files that have been saved. Note - You can prevent the savecore utility from filling the file system. Add a file that is named minfree to the directory in which the dumps are to be saved. In this file, specify the number of kilobytes to remain free after savecore has run. If insufficient space is available, the core file is not saved. Recovering the Device DirectoryDamage to the /devices and /dev directories can occur if the driver crashes during attach(9E). If either directory is damaged, you can rebuild the directory by booting the system and running fsck(1M) to repair the damaged root file system. The root file system can then be mounted. Recreate the /devices and /dev directories by running devfsadm(1M) and specifying the /devices directory on the mounted disk. The following example shows how to repair a damaged root file system on a SPARC system. In this example, the damaged disk is /dev/dsk/c0t3d0s0, and an alternate boot disk is /dev/dsk/c0t1d0s0. Example 22-6 Recovering a Damaged Device Directoryok boot disk1 ... Rebooting with command: boot kernel.test/sparcv9/unix Boot device: /sbus@1f,0/espdma@e,8400000/esp@e,8800000/sd@31,0:a File and \ args: kernel.test/sparcv9/unix ... # fsck /dev/dsk/c0t3d0s0** /dev/dsk/c0t3d0s0 ** Last Mounted on / ** Phase 1 - Check Blocks and Sizes ** Phase 2 - Check Pathnames ** Phase 3 - Check Connectivity ** Phase 4 - Check Reference Counts ** Phase 5 - Check Cyl groups 1478 files, 9922 used, 29261 free (141 frags, 3640 blocks, 0.4% fragmentation) # mount /dev/dsk/c0t3d0s0 /mnt # devfsadm -r /mnt Note - A fix to the /devices and /dev directories can allow the system to boot while other parts of the system are still corrupted. Such repairs are only a temporary fix to save information, such as system crash dumps, before reinstalling the system. |
||
|