[ofa-general] Race condition in core/sysfs.c (kernel panic) when unloading the driver

Jack Morgenstein jackm at dev.mellanox.co.il
Sat Feb 21 23:09:11 PST 2009


On Friday 20 February 2009 08:50, Roland Dreier wrote:
> What test are you using to hit this race?  Are you using a distro kernel
> with OFED?
> 
I ran on RHEL5.2, with a ConnectX card, using the following test (source given at the end of this post):

1. Start the driver.
2. In one console window, compile (plain gcc) and run the app below, which prints out pkeys
   in a tight loop by reading their sysfs files.
3. In another console window, run the bash script below (which loads/unloads the driver, with some
   time randomization added).

After a few hours of this test, I got a kernel panic. Adding a mutex to make the low-level driver
access atomic (wrt ib_core) when showing pkeys fixed the problem entirely.
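
To illustrate the idea only, here is a minimal user-space sketch of that kind of
serialization. It is not the kernel patch: struct fake_dev, fake_query_pkey,
fake_show_pkey and fake_remove are hypothetical names made up for this example, and
pthread mutexes stand in for a kernel mutex. The assumption is that the show path and
the remove path take the same lock, so a reader can never call through query_pkey
after the driver has gone away.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the device as seen by the core layer. */
struct fake_dev {
	pthread_mutex_t lock;   /* serializes the show path against removal */
	int registered;         /* cleared once the low-level driver is gone */
	int (*query_pkey)(int index, unsigned *pkey);
};

/* Hypothetical low-level driver callback; returns a made-up value. */
static int fake_query_pkey(int index, unsigned *pkey)
{
	*pkey = 0xffff - (unsigned)index;
	return 0;
}

static void fake_dev_init(struct fake_dev *dev)
{
	pthread_mutex_init(&dev->lock, NULL);
	dev->registered = 1;
	dev->query_pkey = fake_query_pkey;
}

/* Models the show path: call into the driver only while it is present. */
static int fake_show_pkey(struct fake_dev *dev, int index, unsigned *pkey)
{
	int ret = -1;

	pthread_mutex_lock(&dev->lock);
	if (dev->registered)
		ret = dev->query_pkey(index, pkey);
	pthread_mutex_unlock(&dev->lock);
	return ret;
}

/* Models driver removal: once this returns, no reader is inside query_pkey. */
static void fake_remove(struct fake_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->registered = 0;
	dev->query_pkey = NULL;
	pthread_mutex_unlock(&dev->lock);
}
```

In this model, a fake_show_pkey() call that loses the race with fake_remove() returns -1
instead of jumping to code that is no longer there.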

When I added printouts to the low-level driver and to sysfs.c (a printout in show_port_pkey
just before the call to ib_query_pkey), I saw that the crash occurred as follows
(note that mlx4_ib is not in the list of loaded modules, and that the address of the failed
paging request lies inside the function that the "query_pkey" pointer refers to):

ENTERING mlx4_ib_remove: ibdev = ffff81010dfdf800
show_port_pkey: ibdev=ffff81010dfdf800, query_pkey=ffffffff88422f28, portnum=1, ix=127
show_port_pkey: ibdev=ffff81010dfdf800, query_pkey=ffffffff88422f28, portnum=1, ix=126
show_port_pkey: ibdev=ffff81010dfdf800, query_pkey=ffffffff88422f28, portnum=1, ix=125
...
show_port_pkey: ibdev=ffff81010dfdf800, query_pkey=ffffffff88422f28, portnum=1, ix=79
show_port_pkey: ibdev=ffff81010dfdf800, query_pkey=ffffffff88422f28, portnum=1, ix=78
ib_device_unregister_sysfs: ibd=ffff81010dfdf800, portnum=1
ib_device_unregister_sysfs: ibd=ffff81010dfdf800, portnum=2
LEAVING mlx4_ib_remove: ibdev = ffff81010dfdf800
Unable to handle kernel paging request at ffffffff88422f53 RIP:
 [<ffffffff88422f53>]
PGD 203067 PUD 205063 PMD 11658b067 PTE 0
Oops: 0010 [1] SMP
last sysfs file: /class/infiniband/mlx4_0/ports/1/pkeys/78
CPU 0
Modules linked in: ib_ipoib(U) ib_cm(U) ib_sa(U) ib_uverbs(U) ib_umad(U) mlx4_core(U) ib_mad(U)
ib_core(U) hfsplus netconsole nfsd exportfs auth_rpcgss autofs4 hidp nfs lockd fscache nfs_acl
rfcomm l2cap bluetooth sunrpc ipoib_helper(U) ipv6 xfrm_nalgo crypto_api dm_mirror
dm_multipath scsi_dh video backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac
parport_pc lp parport i2c_piix4 ide_cd k8_edac cdrom edac_mc i2c_core k8temp hwmon sg bnx2
serio_raw pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache sata_svw libata
shpchp megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 26829, comm: opensm Tainted: G      2.6.18-128.el5 #1
RIP: 0010:[<ffffffff88422f53>]  [<ffffffff88422f53>]
RSP: 0018:ffff810212a27e58  EFLAGS: 00010246
RAX: ffff81010ccec180 RBX: ffff81012194bc80 RCX: 0000000000000000
RDX: ffff81010ccec180 RSI: 0000000000000202 RDI: ffff81010ccec280
RBP: ffff81010da7d000 R08: ffff810212a26000 R09: 000000000000003c
R10: ffff810123f88800 R11: 0000000000000001 R12: ffff810115354701
R13: 000000000000004e R14: ffff81010dfdf800 R15: ffff810212a27ea6
FS:  00002ad1a47afc00(0000) GS:ffffffff803ac000(0000) knlGS:00000000f75fdb90
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffff88422f53 CR3: 0000000121354000 CR4: 00000000000006e0
Process opensm (pid: 26829, threadinfo ffff810212a26000, task ffff81021c4dc820)
Stack:  00000010000280d0 ffff81012194bc80 ffff81010da7d000 ffff810115354740
 ffff810212a27f50 ffffffff882665e0 ffff81012194bc80 ffffffff88256e71
 ffff810212a27f50 ffffffff882665e0 ffff810115bcbc90 ffff81010f8ef140
Call Trace:
 [<ffffffff88256e71>] :ib_core:show_port_pkey+0x59/0x7d
 [<ffffffff80107068>] sysfs_read_file+0xa5/0x13f
 [<ffffffff8000b3f3>] vfs_read+0xcb/0x171
 [<ffffffff800117d4>] sys_read+0x45/0x6e
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0


Code:  Bad RIP value.
RIP  [<ffffffff88422f53>]
 RSP <ffff810212a27e58>
CR2: ffffffff88422f53
 <0>Kernel panic - not syncing: Fatal exception
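
As a quick cross-check of the numbers above, the small sketch below does the address
arithmetic. The constants are copied from the printouts and the oops; the helper names
are made up for this example. That R13 held the pkey index at the time of the crash is
only an assumption on my part, based on its value matching the last sysfs file read.

```c
#include <stdint.h>
#include <stdio.h>

/* Values copied from the printouts and the oops above. */
#define QUERY_PKEY_ENTRY 0xffffffff88422f28ULL /* query_pkey= in the show_port_pkey printout */
#define FAULT_RIP        0xffffffff88422f53ULL /* RIP (and CR2) in the oops */
#define R13_VALUE        0x4eULL               /* R13 in the register dump */

/* Byte offset of the faulting address from the start of query_pkey. */
static uint64_t fault_offset(void)
{
	return FAULT_RIP - QUERY_PKEY_ENTRY;
}

/* Print the offset, and R13 in decimal for comparison with the log. */
static void print_oops_summary(void)
{
	printf("fault is at query_pkey+0x%llx\n",
	       (unsigned long long)fault_offset());
	printf("R13 = %llu\n", (unsigned long long)R13_VALUE);
}
```

The faulting address is 0x2b bytes past the entry point that show_port_pkey printed, and
R13 is 78 in decimal, the same number as the last sysfs file (pkeys/78).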

- Jack
=================================
1. Pkeys print app:

/*
 * Copyright (c) 2004-2008 Voltaire Inc.  All rights reserved.
 *
 * This software is available to you under a choice of one of two
 * licenses.  You may choose to be licensed under the terms of the GNU
 * General Public License (GPL) Version 2, available from the file
 * COPYING in the main directory of this source tree, or the
 * OpenIB.org BSD license below:
 *
 *     Redistribution and use in source and binary forms, with or
 *     without modification, are permitted provided that the following
 *     conditions are met:
 *
 *      - Redistributions of source code must retain the above
 *        copyright notice, this list of conditions and the following
 *        disclaimer.
 *
 *      - Redistributions in binary form must reproduce the above
 *        copyright notice, this list of conditions and the following
 *        disclaimer in the documentation and/or other materials
 *        provided with the distribution.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
 * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
 * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
 * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 *
 */

#define _GNU_SOURCE

#if HAVE_CONFIG_H
#  include <config.h>
#endif /* HAVE_CONFIG_H */

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static int
ret_code(void)
{
	int e = errno;

	if (e > 0)
		return -e;
	return e;
}

int
sys_read_string(char *dir_name, char *file_name, char *str, int max_len)
{
	char path[256], *s;
	int fd, r;

	snprintf(path, sizeof(path), "%s/%s", dir_name, file_name);

	if ((fd = open(path, O_RDONLY)) < 0)
		return ret_code();

	if ((r = read(fd, str, max_len)) < 0) {
		int e = errno;
		close(fd);
		errno = e;
		return ret_code();
	}

	str[(r < max_len) ? r : max_len - 1] = 0;

	if ((s = strrchr(str, '\n')))
		*s = 0;

	close(fd);
	return 0;
}

int
sys_read_uint(char *dir_name, char *file_name, unsigned *u)
{
	char buf[32];
	int r;

	if ((r = sys_read_string(dir_name, file_name, buf, sizeof(buf))) < 0)
		return r;

	*u = strtoul(buf, 0, 0);

	return 0;
}

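int main(void)
{
	int i;
	char *path = "/sys/class/infiniband/mlx4_0/ports/1/pkeys";
	char pkey_is[20];
	unsigned u;

	/*
	 * Read pkeys 127..0 forever. If a read fails (e.g. the driver is
	 * currently unloaded), wait a second and start again from 127.
	 */
	while (1) {
		for (i = 127; i >= 0; --i) {
			snprintf(pkey_is, sizeof(pkey_is), "%d", i);
			if (sys_read_uint(path, pkey_is, &u)) {
				sleep(1);
				break;
			}
			printf("%d: %u\n", i, u);
		}
	}
	return 0;
}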
========================================================
2. Bash driver up-down script:

#!/bin/bash -x
i=0
while true; do
        echo iteration number $i; date
        /etc/init.d/openibd start
        opensm &
        sleep 10.$RANDOM
        pkill -9 opensm
        wait
        /etc/init.d/openibd stop
        let i=$i+1
done



