[openib-general] Re: Questions about libibat, ib_uat, and ib_a
Hal Rosenstock
halr at voltaire.com
Mon Oct 17 07:07:43 PDT 2005
Hi Heiko,
On Mon, 2005-10-17 at 09:54, Heiko J Schick wrote:
> Hello Roland and Hal,
>
> Did you change the mailing-list settings? It seems that I can no
> longer send to "openib-general". Must I be a member nowadays? I
> apologize if you received my message twice.
You shouldn't need to be a member to send. It's an open list.
> I have some basic questions about address translation in OpenIB
> (libibat, ib_uat, and ib_at).
>
> When I run "uatt" I get the output below. It seems to me that
> ib_at_route_by_ip works just fine: at least I receive a callback
> and get the SGID, DGID, etc. But I'm not sure how ib_at_cancel
> works. This function always reports -1 (EPERM / Operation not
> permitted) as its return code.
I don't think it is always, but that's what is currently returned if
there is no pending request to cancel.
> It seems to me that ib_at_cancel in
> /trunk/src/linux-kernel/infiniband/core/at.c only reports -1 when
> lookup_req_id finds no corresponding pending request with the same
> ID. So is it OK that ib_at_cancel reports -EPERM?
EPERM is 1, so -EPERM and -1 are the same value.
The comments say:
/**
 * ib_at_cancel - cancel possible active asynchronous operation
 * @req_id: asynchronous request ID
 *
 * Return 0 if canceled, -1 if cancel failed (e.g. bad ID)
 */
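
To illustrate why the two readings agree, here is a minimal sketch. It
assumes a Linux <errno.h>, where EPERM is defined as 1. The helper names
describe_cancel_result and minus_eperm_is_minus_one are made up for this
example and are not part of libibat or the kernel AT code.

```c
#include <errno.h>

/* Hypothetical helper (not part of libibat): turn the documented
 * ib_at_cancel result, 0 or -1, into a short description. */
static const char *describe_cancel_result(int ret)
{
    if (ret == 0)
        return "canceled";
    if (ret == -1)
        return "cancel failed (e.g. bad ID or nothing pending)";
    return "unexpected return value";
}

/* Assumption: EPERM is 1, as on Linux. Then -EPERM and -1 are the
 * same integer, so a caller cannot tell them apart. */
static int minus_eperm_is_minus_one(void)
{
    return -EPERM == -1;
}
```

So a caller that sees "ret -1 errno 1" from a cancel is looking at the
documented "cancel failed" case, whichever way the value was produced.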
> When should ib_at_cancel normally be called?
To terminate a pending request (if the callback to some AT request has
not been issued). It does no harm to call it even if the callback has
been invoked.
> XXXXXXXXXXX:/tmp/heiko # ./uatt
> uatt: main: src ip address c0a80841
> uatt: main: dest ip address c0a80841
> uatt: main: uat test start
> uatt: main: ib_at_route_by_ip: ret 1 errno 0 for request 1 id 0 0
> uatt: att_rt_comp_fn: id 0 context 0x10013258 completed with rec_num 1
> ===> rt 0x10013258 sgid 0xfe8000000000000002e625f000020003 dgid
> 0xfe8000000000000002e625f000020003
> uatt: att_rt_comp_fn: ib_at_paths_by_route: ret 0 errno 0 id 1 1
> uatt: att_path_comp_fn: id 1 context 0x10012658 completed with rec_num
> 1
> ===> slid 0x7 dlid 0x7
> uatt: main: ib_at_route_by_ip: ret 1 errno 0 for request 2 id 0 0
> uatt: att_rt_comp_fn: id 0 context 0x10013290 completed with rec_num 1
> ===> rt 0x10013290 sgid 0xfe8000000000000002e625f000020003 dgid
> 0xfe8000000000000002e625f000020003
> uatt: att_rt_comp_fn: ib_at_paths_by_route: ret 0 errno 0 id 2 2
> ...
> uatt: main: sleeping for 30 secs
> ...
> uatt: main: uat test cleanup
> uatt: main: cancel but no rt id 0 ret -1 errno 1
> uatt: main: cancel but no path id 1 ret -1 errno 1
> uatt: main: cancel but no rt id 0 ret -1 errno 1
> uatt: main: cancel but no path id 2 ret -1 errno 1
>
> If I understood everything correctly, the normal sequence is:
>
> 1. Execute ib_at_route_by_ip and check the return code against >0,
>    =0, <0, etc.
> 2. The callback is executed and I can process the received
>    information in struct ib_at_ib_route *rt (context).
> 3. After some timeout, cancel pending requests with ib_at_cancel.
Yes, but no requests were pending in this test execution.
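
The behavior seen in the test run can be modeled with a small
stand-alone sketch. This is not the AT implementation: pending_table,
pending_add, pending_complete and pending_cancel are hypothetical names
that only mimic the documented contract (0 if canceled, -1 if there is
nothing pending under that ID).

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 8

/* Hypothetical stand-in for the list of pending AT requests. */
struct pending_table {
    uint64_t ids[MAX_PENDING];
    int used[MAX_PENDING];
};

/* Record a request as pending; returns 0 on success, -1 if full. */
static int pending_add(struct pending_table *t, uint64_t id)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!t->used[i]) {
            t->ids[i] = id;
            t->used[i] = 1;
            return 0;
        }
    }
    return -1;
}

/* Drop a request by ID; returns 0 if found, -1 otherwise. */
static int pending_remove(struct pending_table *t, uint64_t id)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (t->used[i] && t->ids[i] == id) {
            t->used[i] = 0;
            return 0;
        }
    }
    return -1;
}

/* Models the completion callback firing: the request stops pending. */
static void pending_complete(struct pending_table *t, uint64_t id)
{
    (void)pending_remove(t, id);
}

/* Models the cancel contract: 0 if canceled, -1 if nothing pending. */
static int pending_cancel(struct pending_table *t, uint64_t id)
{
    return pending_remove(t, id);
}
```

In this model, a cancel issued after pending_complete for the same ID
returns -1, which matches the "cancel but no rt" lines in the output
above: every callback had already run before the cleanup phase.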
> I've modified the att.c test case to call ibv_get_device_name with
> the parameter rt->out_dev in the route completion function. The
> source code looks like:
>
> static void att_rt_comp_fn(uint64_t req_id, void *context, int rec_num)
> {
>         struct ib_at_ib_route *rt = context;
>         int r, i;
>         uint64_t req_id2;
>         char *ib_dev_name;
>
>         printf("rt->out_dev: %p\n", rt->out_dev);
>         ibv_get_device_name(rt->out_dev);
>         ...
>
> Should this code work? It seems that out_dev is a kernel address
> (platform: PPC64), which cannot be accessed by a userspace program.
> Via GDB I can see that rt has the following content:
>
> The address is rt->out_dev = 0xc0000000cffaa800, which looks like a
> kernel address.
Yes, this is a bug which has been previously pointed out on the list and
not fixed.
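
Until that is fixed, a test program could at least avoid dereferencing
the pointer. The following is only a heuristic sketch, assuming the
usual 64-bit PowerPC Linux layout in which kernel addresses start at
0xc000000000000000. The macro and function names are invented for this
example. The check takes the address as a plain 64-bit integer so that
it does not depend on the host's pointer width.

```c
#include <stdint.h>

/* Assumption: on 64-bit PowerPC Linux, kernel addresses begin at
 * 0xc000000000000000 and userspace addresses lie below that. */
#define PPC64_KERNEL_BASE UINT64_C(0xc000000000000000)

/* Hypothetical check: nonzero if addr falls in the kernel range and
 * therefore must not be dereferenced from userspace. */
static int looks_like_ppc64_kernel_addr(uint64_t addr)
{
    return addr >= PPC64_KERNEL_BASE;
}
```

In att_rt_comp_fn this could guard the call, for example by testing
looks_like_ppc64_kernel_addr((uint64_t)(uintptr_t)rt->out_dev) before
passing rt->out_dev to ibv_get_device_name.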
-- Hal
> Starting program: /home/schickhj/heiko/att -s 3232237633 -d 3232237633
> [Thread debugging using libthread_db enabled]
> [New Thread 549758242848 (LWP 3430)]
> uatt: main: src ip address c0a80841
> uatt: main: dest ip address c0a80841
> uatt: main: uat test start
> uatt: main: ib_at_route_by_ip: ret 1 errno 0 for request 1 id 0 0
> [Switching to Thread 549758242848 (LWP 3430)]
>
> Breakpoint 1, att_rt_comp_fn (req_id=0, context=0x10013208, rec_num=1)
> at att.c:139
> 139 struct ib_at_ib_route *rt = context;
> (gdb) bt
>
> (gdb) print /x *rt
> $1 = {sgid = {raw = {0xfe, 0x80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2,
> 0xe6, 0x25, 0xf0, 0x0, 0x2, 0x0, 0x3}, global = {
> subnet_prefix = 0xfe80000000000000, interface_id =
> 0x2e625f000020003}}, dgid = {raw = {0xfe, 0x80, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x2, 0xe6, 0x25, 0xf0, 0x0, 0x2, 0x0, 0x3}, global =
> {subnet_prefix = 0xfe80000000000000,
> interface_id = 0x2e625f000020003}}, out_dev =
> 0xc0000000cffaa800, out_port = 0x1, attr = {qos_tag = 0x0, pkey =
> 0xffff,
> multi_path_type = 0x0}}
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 549758242848 (LWP 3605)]
> ibv_get_device_name (device=0xc0000000cffaa800) at device.c:62
> 62 device.c: No such file or directory.
> in device.c
> (gdb) p /x *device
> Cannot access memory at address 0xc0000000cffaa800
>
> Mit freundlichen Gruessen / Kind Regards
> Heiko Joerg Schick
>
> IBM Deutschland Entwicklung GmbH
> I/Ox Microcode Development
> Linux Infiniband Device Drivers
>
> Schoenaicher Str. 220
> 71032 Boeblingen
> E-Mail: schickhj at de.ibm.com
> External: 49-7031-16-0 x4219, t/l: 120-4219