[ewg] Need urgent help : Soft-RoCE in Linux-3.19
Kamal Heib
kamalh.mellanox at gmail.com
Thu Jun 11 00:07:01 PDT 2015
On Thu, Jun 11, 2015 at 10:03 AM, Amir Vadai <amirv.mellanox at gmail.com>
wrote:
> On Tue, Jun 9, 2015 at 11:36 AM, Dinesh Kb <dinesh.kb at vvdntech.in> wrote:
> >
> > One more thing we have modified is in the file
> > "drivers/infiniband/hw/rxe/rxe_net.c".
> >
> > We have commented out line 426
> >
> > //struct net_device *ndev = arg;
> >
> > and added the following
> >
> > struct netdev_notifier_info *info = arg;
> > struct net_device *ndev = info->dev;
> >
> > Moreover, we have copied "ib_pack.h" from the vanilla (3.0.0+) kernel to the
> > latest kernel (4.0.4).
> >
> > Please help me get SoftRoCE working on the latest kernel.
> >
> >
> >
> > with warm regards
> > Dinesh.K.B
> > Software Engineer
> >
> >
> > Cell : +91 9944456867 | Skype : dinesh_kb93
> >
> >
> > On Tue, Jun 9, 2015 at 1:53 PM, Dinesh Kb <dinesh.kb at vvdntech.in> wrote:
> >>
> >> Hi,
> >>
> >> "I am new to OFED"
> >>
> >> I have downloaded "Soft-RoCE", which includes the kernel (3.0.0+) and rxe
> >> support. I have installed all the packages included in OFED-1.5.2-rxe
> >> except kernel-ib.
> >>
> >> In kernel 3.0.0+, I am able to run the "rping" test successfully and to
> >> run the benchmark tests; I am comfortable with the benchmark output.
> >>
> >> Now I need to run RoCE on a higher kernel version, say 4.0.4.
> >>
> >> I have downloaded the source for kernel version 4.0.4, copied the
> >> "drivers/infiniband/hw/rxe" directory from kernel 3.0.0+ into the 4.0.4
> >> kernel source, and applied a patch for rxe to work.
> >>
> >> I am facing a kernel crash while running rping...
> >> and regarding the benchmark...
> >>
> >> Running rping as the server on the higher kernel version and as the
> >> client on the lower version works fine,
> >> but it crashes when run the other way around.
> >>
> >> Now I am facing problems when kernel 4.0.4 acts as a client.
> >> In the Wireshark output, the server constantly sends the
> >> "read request" and does not get any response from the client.
> >>
> >> The virtual address and DMA length fields are "zero".
> >>
> >>
> >> Kindly help me.
> >>
> >> Is there a patch that makes SoftRoCE work with higher kernel versions
> >> (say kernel 3.19+)?
> >>
> >> I have attached the Wireshark output for ib_read_bw (when the higher
> >> version acts as the client and 3.0.0+ acts as the server), along with
> >> the patch I applied.
> >>
> >> The crash on the server is as follows:
> >>
> >> root@ls2085aqds:/etc/libibverbs.d# rping -s
> >> ------------[ cut here ]------------
> >> WARNING: CPU: 0 PID: 810 at kernel/softirq.c:146 __local_bh_enable_ip+0x84/0xc0()
> >> Modules linked in: ib_rxe_net ib_rxe rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr ipv6
> >> CPU: 0 PID: 810 Comm: kworker/0:1 Not tainted 3.16.0-Layerscape2-SDK+g330dd4b #20
> >> Workqueue: ib_cm cm_work_handler [ib_cm]
> >> Call trace:
> >> [<ffffffc000088084>] dump_backtrace+0x0/0x12c
> >> [<ffffffc0000881c0>] show_stack+0x10/0x1c
> >> [<ffffffc000552fa4>] dump_stack+0x74/0xc4
> >> [<ffffffc0000a9b8c>] warn_slowpath_common+0x84/0xac
> >> [<ffffffc0000a9c78>] warn_slowpath_null+0x14/0x20
> >> [<ffffffc0000ae3d0>] __local_bh_enable_ip+0x80/0xc0
> >> [<ffffffc00049fccc>] __dev_queue_xmit+0x1cc/0x408
> >> [<ffffffc00049ff14>] dev_queue_xmit+0xc/0x18
> >> [<ffffffbffc11b620>] send_finish+0x34/0x40 [ib_rxe_net]
> >> [<ffffffbffc11b6b8>] send+0x8c/0xec [ib_rxe_net]
> >> [<ffffffbffc10933c>] $x+0x318/0x334 [ib_rxe]
> >> [<ffffffbffc109be0>] $x+0xa8/0x120 [ib_rxe]
> >> [<ffffffbffc109d54>] rxe_run_task+0x4c/0x90 [ib_rxe]
> >> [<ffffffbffc1093b4>] arbiter_skb_queue+0x5c/0x8c [ib_rxe]
> >> [<ffffffbffc0fffd4>] rxe_requester+0x83c/0xddc [ib_rxe]
> >> [<ffffffbffc109be0>] $x+0xa8/0x120 [ib_rxe]
> >> [<ffffffbffc109d54>] rxe_run_task+0x4c/0x90 [ib_rxe]
> >> [<ffffffbffc104e0c>] rxe_post_send+0x80/0x40c [ib_rxe]
> >> [<ffffffbffc097068>] ib_send_mad+0x288/0x454 [ib_mad]
> >> [<ffffffbffc09759c>] ib_post_send_mad+0x190/0x544 [ib_mad]
> >> [<ffffffbffc0b075c>] ib_send_cm_rej+0xd0/0x194 [ib_cm]
> >> [<ffffffbffc0b19b8>] cm_destroy_id+0x188/0x300 [ib_cm]
> >> [<ffffffbffc0b1e80>] cm_process_work+0x154/0x17c [ib_cm]
> >> [<ffffffbffc0b2600>] cm_req_handler+0x758/0x978 [ib_cm]
> >> [<ffffffbffc0b28ec>] cm_work_handler+0xcc/0x1584 [ib_cm]
> >> [<ffffffc0000c0918>] process_one_work+0x114/0x354
> >> [<ffffffc0000c12f8>] worker_thread+0x13c/0x500
> >> [<ffffffc0000c7034>] kthread+0xd0/0xe8
> >> ---[ end trace be7a1b95934c8f03 ]---
> >>
> >>
> >> with warm regards
> >> Dinesh.K.B
> >> Software Engineer
> >>
> >>
> >> Cell : +91 9944456867 | Skype : dinesh_kb93
>
> + Kamal
>
>
Hi Dinesh,
I think you are using an old version of Soft-RoCE. We have a GitHub
project [1] that includes a newer version of Soft-RoCE based on kernel
4.0.*.
To configure your environment, please follow the wiki page at [2]. To
validate that everything is working, follow the wiki page at [3].
Please let me know if you run into any issues.
[1] - https://github.com/SoftRoCE/
[2] - https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home
[3] - https://github.com/SoftRoCE/rxe-dev/wiki/Validate-that-RXE-is-working
Thanks,
Kamal
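
For reference, the rxe_net.c change quoted above (reading the device out of a
struct netdev_notifier_info rather than casting the notifier argument straight
to struct net_device *) can be sketched as follows. This is a userspace mock-up
only: the struct definitions and constants are simplified stand-ins for the
kernel's own, and rxe_notify_sketch and last_up are hypothetical names that do
not appear in the rxe sources. netdev_notifier_info_to_dev() is the accessor
the kernel provides for this purpose in newer trees (as far as I know, from
about 3.11 onward); it is re-declared here only so the fragment compiles on
its own.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for kernel definitions (assumptions for this sketch). */
#define NOTIFY_DONE 0x0000
#define NOTIFY_OK   0x0001
#define NETDEV_UP   0x0001
#define IFNAMSIZ    16

struct net_device { char name[IFNAMSIZ]; };
struct notifier_block { int priority; };

/* Newer kernels hand netdevice notifiers this wrapper, not the device itself. */
struct netdev_notifier_info { struct net_device *dev; };

/* Mock of the kernel accessor with the same name. */
static inline struct net_device *
netdev_notifier_info_to_dev(const struct netdev_notifier_info *info)
{
    return info->dev;
}

/* Hypothetical record of the last interface reported as up. */
static char last_up[IFNAMSIZ];

/* Hypothetical handler with the shape of a netdevice notifier callback. */
static int rxe_notify_sketch(struct notifier_block *nb,
                             unsigned long event, void *arg)
{
    /* Older code did: struct net_device *ndev = arg; */
    struct net_device *ndev = netdev_notifier_info_to_dev(arg);

    (void)nb;
    if (event != NETDEV_UP || ndev == NULL)
        return NOTIFY_DONE;

    /* Remember which interface came up; real code would act on it here. */
    strncpy(last_up, ndev->name, IFNAMSIZ - 1);
    last_up[IFNAMSIZ - 1] = '\0';
    return NOTIFY_OK;
}
```

Using the accessor instead of dereferencing info->dev by hand keeps the handler
independent of the wrapper's layout, which is the main reason to prefer it over
the two-line version in the quoted patch.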