[openib-general] Re: Questions about libibat, ib_uat, and ib_a

Hal Rosenstock halr at voltaire.com
Mon Oct 17 07:07:43 PDT 2005


Hi Heiko,

On Mon, 2005-10-17 at 09:54, Heiko J Schick wrote:
> Hello Roland and Hal,
> 
> Did you change the mailing-list settings? It seems that I can no
> longer send to "openib-general". Must I be a member nowadays? I
> apologize if you received my message twice.

You shouldn't need to be a member to send. It's an open list.

> I have some basic question about address translation in OpenIB
> (libibat, ib_uat, and ib_at).
> 
> When I run "uatt" I get the output below. To me it seems that the
> function ib_at_route_by_ip works just fine. At least I receive a
> callback and get the SGID, DGID, etc. But I'm not sure how
> ib_at_cancel works. This function always reports -1 (EPERM /
> Operation not permitted) as its return code.

I don't think it is always -1, but that's what is currently returned if
there is no pending request to cancel.

> It seems to me that ib_at_cancel in
> /trunk/src/linux-kernel/infiniband/core/at.c only reports -1 when
> lookup_req_id finds no corresponding pending request with the same
> ID. So is it ok that ib_at_cancel reports -EPERM?

EPERM is 1, so -EPERM and -1 are the same thing.

The comments say:
/**
 * ib_at_cancel - cancel possible active asynchronous operation
 * @req_id: asynchronous request ID
 *
 * Return 0 if canceled, -1 if cancel failed (e.g. bad ID)
 */

> When should ib_at_cancel normally be called?

To terminate a pending request (if the callback to some AT request has
not been issued). It does no harm to call it even if the callback has
been invoked.

> XXXXXXXXXXX:/tmp/heiko # ./uatt
> uatt: main: src  ip address c0a80841
> uatt: main: dest ip address c0a80841
> uatt: main: uat test start
> uatt: main: ib_at_route_by_ip: ret 1 errno 0 for request 1 id 0 0
> uatt: att_rt_comp_fn: id 0 context 0x10013258 completed with rec_num 1
> ===> rt 0x10013258 sgid 0xfe8000000000000002e625f000020003 dgid 0xfe8000000000000002e625f000020003
> uatt: att_rt_comp_fn: ib_at_paths_by_route: ret 0 errno 0 id 1 1
> uatt: att_path_comp_fn: id 1 context 0x10012658 completed with rec_num 1
> ===> slid 0x7 dlid 0x7
> uatt: main: ib_at_route_by_ip: ret 1 errno 0 for request 2 id 0 0
> uatt: att_rt_comp_fn: id 0 context 0x10013290 completed with rec_num 1
> ===> rt 0x10013290 sgid 0xfe8000000000000002e625f000020003 dgid 0xfe8000000000000002e625f000020003
> uatt: att_rt_comp_fn: ib_at_paths_by_route: ret 0 errno 0 id 2 2
> ...
> uatt: main: sleeping for 30 secs
> ...
> uatt: main: uat test cleanup
> uatt: main: cancel but no rt id 0 ret -1 errno 1
> uatt: main: cancel but no path id 1 ret -1 errno 1
> uatt: main: cancel but no rt id 0 ret -1 errno 1
> uatt: main: cancel but no path id 2 ret -1 errno 1
> 
> If I understood everything correctly the normal sequence is like:
> 
> 1. Execute ib_at_route_by_ip and check the return code against >0, =0,
> <0, etc.
> 2. Callback will be executed and I can process the received
> information included in struct ib_at_ib_route *rt (context)
> 3. After some timeout cancel pending requests with ib_at_cancel

Yes, but no requests were pending in this test execution.

> I've modified the att.c testcase so that the route completion
> function calls ibv_get_device_name with parameter rt->out_dev. The
> source code looks like:
> 
> static void att_rt_comp_fn(uint64_t req_id, void *context, int rec_num)
> {
>         struct ib_at_ib_route *rt = context;
>         int r, i;
>         uint64_t req_id2;
>         char *ib_dev_name;
> 
>         printf("rt->out_dev: %p\n", rt->out_dev);
>         ibv_get_device_name(rt->out_dev);
>         ...
> 
> Should this code work? It seems that out_dev is a kernel address
> (platform: PPC64), which cannot be accessed by a userspace program.
> Via GDB I can see that rt has the following content:
> 
> The address is rt->out_dev = 0xc0000000cffaa800 which looks like a
> kernel address.

Yes, this is a bug which has been previously pointed out on the list and
not fixed.

-- Hal

> Starting program: /home/schickhj/heiko/att -s 3232237633 -d 3232237633
> [Thread debugging using libthread_db enabled]
> [New Thread 549758242848 (LWP 3430)]
> uatt: main: src  ip address c0a80841
> uatt: main: dest ip address c0a80841
> uatt: main: uat test start
> uatt: main: ib_at_route_by_ip: ret 1 errno 0 for request 1 id 0 0
> [Switching to Thread 549758242848 (LWP 3430)]
> 
> Breakpoint 1, att_rt_comp_fn (req_id=0, context=0x10013208, rec_num=1)
> at att.c:139
> 139             struct ib_at_ib_route *rt = context;
> (gdb) bt
> 
> (gdb) print /x *rt
> $1 = {sgid = {raw = {0xfe, 0x80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2,
> 0xe6, 0x25, 0xf0, 0x0, 0x2, 0x0, 0x3}, global = {
>       subnet_prefix = 0xfe80000000000000, interface_id =
> 0x2e625f000020003}}, dgid = {raw = {0xfe, 0x80, 0x0, 0x0, 0x0, 0x0,
>       0x0, 0x0, 0x2, 0xe6, 0x25, 0xf0, 0x0, 0x2, 0x0, 0x3}, global =
> {subnet_prefix = 0xfe80000000000000,
>       interface_id = 0x2e625f000020003}}, out_dev =
> 0xc0000000cffaa800, out_port = 0x1, attr = {qos_tag = 0x0, pkey =
> 0xffff,
>     multi_path_type = 0x0}}
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 549758242848 (LWP 3605)]
> ibv_get_device_name (device=0xc0000000cffaa800) at device.c:62
> 62      device.c: No such file or directory.
>         in device.c
> (gdb) p /x *device
> Cannot access memory at address 0xc0000000cffaa800
> 
> Mit freundlichen Gruessen / Kind Regards
> Heiko Joerg Schick
> 
> IBM Deutschland Entwicklung GmbH
> I/Ox Microcode Development
> Linux Infiniband Device Drivers
> 
> Schoenaicher Str. 220
> 71032 Boeblingen
> E-Mail: schickhj at de.ibm.com
> External: 49-7031-16-0 x4219,   t/l: 120-4219



