[openib-general] segmentation fault with ibv_pingpong

Sayantan Sur surs at cse.ohio-state.edu
Tue Apr 12 16:28:05 PDT 2005


Hi,

I am getting a segmentation fault with the OpenIB Gen2 drivers when
running the `ibv_pingpong' test. The details are given below. Can
someone point out what may be going wrong here? I have included as much
information as I thought would be required, but I can provide more
specific information if needed.

Thanks,
Sayantan.

Hardware:
---------

Two Dual Intel Xeon EM64T 3.4 GHz nodes
PCI-Express I/O bus
MT25208 Mellanox HCAs (rev a0)

Software:
---------
RedHat AS 4
2.6.11.6/2.6.11.7 kernel with Gen2 InfiniBand drivers
Firmware version 5.0.1
OpenIB Gen2 drivers (user verbs from main branch)
OpenSM (both the OpenIB version and IBGD 1.7.0 give the same result)


Both machines report their ports as ACTIVE.

[surs at x1:~] cat /sys/class/infiniband/mthca0/ports/1/state
4: ACTIVE
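
For anyone who wants to script the same check on several nodes, here is
a minimal sketch in Python. It only assumes the sysfs layout shown in
the command above; the function names and the default device name and
port number are illustrative, not part of any OpenIB tool.

```python
from pathlib import Path


def parse_port_state(text):
    """Split a sysfs state line such as '4: ACTIVE' into (4, 'ACTIVE')."""
    code, _, name = text.strip().partition(":")
    return int(code), name.strip()


def port_is_active(device="mthca0", port=1,
                   sysfs_root="/sys/class/infiniband"):
    """Return True if the port's sysfs state file reports ACTIVE.

    The path layout is taken from the `cat` command in this post;
    device and port are hypothetical defaults matching that command.
    """
    state_file = Path(sysfs_root) / device / "ports" / str(port) / "state"
    _, name = parse_port_state(state_file.read_text())
    return name == "ACTIVE"
```

Calling port_is_active() on each node would reproduce the manual check
above. It raises an error if the state file does not exist.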

[surs at x5:bin] lsmod | grep ib
ib_uverbs              28056  0 
ib_umad                17696  0 
ib_mthca              113952  0 
ib_mad                 38576  2 ib_umad,ib_mthca
ib_core                52352  4 ib_uverbs,ib_umad,ib_mthca,ib_mad
libata                 53000  1 ata_piix
scsi_mod              151888  3 libata,aic79xx,sd_mod

Now, if I try to run ibv_pingpong, I get this error:

--->

[surs at x1:~] ibv_pingpong
Segmentation fault
[surs at x1:~]
Message from syslogd at x1 at Mon Apr 11 18:37:18 2005 ...
x1 kernel: invalid operand: 0000 [1] SMP

<---

The relevant part from the kernel log:

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at pci_gart:537
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: ib_uverbs ib_umad ib_mthca ib_mad ib_core parport_pc
lp parport autofs4 nfs lockd sunrpc dm_mod video button battery ac md5
ipv6 uhci_hcd ehci_hcd hw_random i2c_i801 i2c_core e1000 floppy ext3 jbd
ata_piix libata aic79xx sd_mod scsi_mod
Pid: 4034, comm: ibv_pingpong Not tainted 2.6.11.6
RIP: 0010:[<ffffffff8011da86>] <ffffffff8011da86>{dma_map_sg+223}
RSP: 0018:ffff81001d4dfd58  EFLAGS: 00010246
RAX: 000000001b92b000 RBX: ffff81001764fbf8 RCX: 000000001b92b000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000ff0 R11: 0000000000000246 R12: ffff81001764f000
R13: ffff81001764fbf8 R14: 0000000000000001 R15: ffff81001f92c070
FS:  00002aaaaacca000(0000) GS:ffffffff804c6380(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000515000 CR3: 0000000013145000 CR4: 00000000000006e0
Process ibv_pingpong (pid: 4034, threadinfo ffff81001d4de000, task ffff81001dc641f0)
Stack: ffff81001dc641f0 0000000000000000 0000000100000000 ffff81001764fbd0
       ffff8100176f3000 ffff81001764f000 0000000000513000 0000000000000001
       ffff81001facfa40 ffffffff8824b621
Call Trace:<ffffffff8824b621>{:ib_mthca:mthca_map_user_db+366}
       <ffffffff88249f54>{:ib_mthca:mthca_create_cq+115}
       <ffffffff880f38f6>{:ib_uverbs:ib_uverbs_create_cq+165}
       <ffffffff880f2608>{:ib_uverbs:ib_uverbs_write+139}
       <ffffffff8017443b>{vfs_write+207}
       <ffffffff8017454a>{sys_write+69}
       <ffffffff8010e29e>{system_call+126}

Code: 0f 0b 0f 91 32 80 ff ff ff ff 19 02 89 f8 49 8b 97 08 01 00
RIP <ffffffff8011da86>{dma_map_sg+223} RSP <ffff81001d4dfd58>

----------------------------------------------------------------

Now, if I run ibv_pingpong under gdb (sender side), it progresses a
little further (but not to completion). The receiver now prints this:

<---

[surs at x5:examples] ibv_pingpong 192.168.107.2
local address:  LID 0x0002, QPN 0x000404, PSN 0x104788
remote address: LID 0x0001, QPN 0x000404, PSN 0x08b81e
[ 0] 00000404
[ 4] b3000000
[ 8] fd000003
[ c] 110000c0
[10] 15810000
[14] 00000010
[18] 00008002
[1c] ff100000
Failed status 12 for wr_id 2

--->
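
The numeric status in the "Failed status" line can be turned into a
name with a small lookup. The sketch below is in Python and assumes the
number is an ibv_wc_status value, with the enum ordered as in the
libibverbs verbs.h that I know of; the tree used here may differ, so
treat the mapping as an assumption. The function name is illustrative.

```python
import re

# Assumed order of enum ibv_wc_status in libibverbs' verbs.h.
# Verify against the headers actually installed before relying on it.
WC_STATUS_NAMES = [
    "IBV_WC_SUCCESS",             # 0
    "IBV_WC_LOC_LEN_ERR",         # 1
    "IBV_WC_LOC_QP_OP_ERR",       # 2
    "IBV_WC_LOC_EEC_OP_ERR",      # 3
    "IBV_WC_LOC_PROT_ERR",        # 4
    "IBV_WC_WR_FLUSH_ERR",        # 5
    "IBV_WC_MW_BIND_ERR",         # 6
    "IBV_WC_BAD_RESP_ERR",        # 7
    "IBV_WC_LOC_ACCESS_ERR",      # 8
    "IBV_WC_REM_INV_REQ_ERR",     # 9
    "IBV_WC_REM_ACCESS_ERR",      # 10
    "IBV_WC_REM_OP_ERR",          # 11
    "IBV_WC_RETRY_EXC_ERR",       # 12
    "IBV_WC_RNR_RETRY_EXC_ERR",   # 13
]


def decode_failed_status(line):
    """Parse 'Failed status N for wr_id M' into (status_name, wr_id).

    Returns None if the line does not match; unknown status numbers
    are reported as 'UNKNOWN(N)'.
    """
    match = re.search(r"Failed status (\d+) for wr_id (\d+)", line)
    if match is None:
        return None
    status, wr_id = int(match.group(1)), int(match.group(2))
    if status < len(WC_STATUS_NAMES):
        name = WC_STATUS_NAMES[status]
    else:
        name = "UNKNOWN(%d)" % status
    return name, wr_id
```

Under that assumption, status 12 decodes to IBV_WC_RETRY_EXC_ERR
(transport retry count exceeded). That might simply reflect the sender
side not running to completion, but it is not confirmed by anything
in the logs above.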


-- 
---------------------------------------------------------
Sayantan Sur            Graduate Research Assistant

395 Dreese Labs,        Computer Science and Engineering
Ohio State University,  Office : 774, Dreese Labs
Columbus,               email  : surs at cse.ohio-state.edu
Ohio - 43210.           phone(res) : 614.688.9792
USA.                    phone(off) : 614.292.8501
---------------------------------------------------------


