[openib-general] List of issues in uverbs

viswanath krishnamurthy viswak at yahoo.com
Wed Aug 31 14:49:41 PDT 2005



--- Sean Hefty <mshefty at ichips.intel.com> wrote:

> viswanath krishnamurthy wrote:
> > 1. ib_cm_destroy_id(cm_id)
> >     hangs (does not return to the caller)
> >     Is there a particular shutdown sequence
> >     that needs to be followed? Is there a
> >     trace/debug I can enable?
> 
> There's no significant debug to enable.  What app
> are you running that's calling 
> ib_cm_destroy_id()?  I didn't think that the ping
> pong tests used it.  Are you 
> trying to call this function from within a CM
> callback?

   Probably called from a callback. The application
   is a small one that accepts incoming connections
   (like a socket server).
   When is a good time to call the destroy?
> 
> The call will hang while there is a CM callback
> outstanding or if a CM event has 
> not been completed by calling put_event.
> 
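To make the constraint above concrete, here is a minimal sketch of one way to stay clear of the hang: the handler only records that the connection should go away, and the destroy happens after the event has been released. This is a plain C model, not OpenIB code. `struct conn`, `on_cm_event`, `process_one_event` and `destroy_id_stub` are hypothetical names; `destroy_id_stub` stands in for ib_cm_destroy_id() and the counter stands in for events that have not yet been completed with put_event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-connection state; not an OpenIB structure. */
struct conn {
    int  events_outstanding; /* events fetched but not yet released */
    bool want_destroy;       /* set by the handler, acted on later */
    bool destroyed;
};

/* Stand-in for ib_cm_destroy_id(). The real call would block while an
 * event is outstanding, so this model asserts that none is held. */
static void destroy_id_stub(struct conn *c)
{
    assert(c->events_outstanding == 0);
    c->destroyed = true;
}

/* Event handler: only records the decision to tear down. */
static void on_cm_event(struct conn *c, bool peer_disconnected)
{
    if (peer_disconnected)
        c->want_destroy = true;
}

/* One iteration of the event loop for a single connection. */
static void process_one_event(struct conn *c, bool peer_disconnected)
{
    c->events_outstanding++;            /* event fetched */
    on_cm_event(c, peer_disconnected);
    c->events_outstanding--;            /* event released (put_event) */

    /* Safe point: no event is held for this connection any more. */
    if (c->want_destroy && !c->destroyed)
        destroy_id_stub(c);
}
```

The point of the design is ordering: the teardown decision is made inside the handler, but the destroy itself runs only once nothing is outstanding for that ID.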
> > 2. libmthca library crashes when a server accepts
> >     lots of new incoming sessions. See log (gdb)
> >     in the attachment. (It accepts about 170
> > connections) Looks like a memory allocation issue.
> 
> The log file borders on unreadable.

I hope the attachment is more readable this time.

  See information here
==================
A server program that accepts multiple incoming
connections. After about 170 connections
the library dies as seen in the gdb output
==========================================

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1208648784 (LWP 21309)]
0xb7f79de8 in mthca_free_db (db_tab=0x805c688, type=MTHCA_DB_TYPE_CQ_SET_CI, db_index=494) at src/memfree.c:150
150             db_tab->page[db_index / MTHCA_DB_REC_PER_PAGE].
(gdb) bt
#0  0xb7f79de8 in mthca_free_db (db_tab=0x805c688, type=MTHCA_DB_TYPE_CQ_SET_CI, db_index=494)
    at src/memfree.c:150
#1  0xb7f7c699 in mthca_create_cq (context=0x805a0b4, cqe=10) at mthca.h:243
#2  0xb7f81eb5 in ibv_create_cq (context=0x805a0b4, cqe=10, cq_context=0x0) at src/verbs.c:107
#3  0xb7f5d6c0 in xib_qp_alloc_init (hp=0x865c958, port=1) at xsocket_trans2.c:157
#4  0xb7f5e19f in xib_conn_init (xcbp=0x865c958) at xsocket_trans2.c:496
#5  0xb7f5bd06 in handle_cm_req (hp=0x805da08, comm_id=0x865cab0, rguid=0x805db64 "", rn_guid=0x805db64 "",
    data=0x805d7b0, len=90) at xsocket.c:230
#6  0xb7f5ec73 in cm_handler () at xsocket_trans2.c:799
#7  0x007993ae in start_thread () from /lib/tls/libpthread.so.0
#8  0x00619aee in clone () from /lib/tls/libc.so.6



> 
> > 3. Kernel oops when there is lots of traffic between
> >    multiple clients and the server. Very consistently
> >    reproducible.  See attachment for details
> 
> Can you clarify what application you're running?  I
> can't understand your 
> configuration from the log file.

The application is a simple one: it accepts incoming
requests and spawns a thread to handle each one.
The application does a simple "ping-pong" of data.

 printing eip:
c0285f7d
*pde = 3649a001
Oops: 0000 [#1]
SMP
Modules linked in: nfs nfsd exportfs lockd autofs4 sunrpc uhci_hcd ehci_hcd hw_random e1000 ext3 jbd sd_mod
CPU:    0
EIP:    0060:[<c0285f7d>]    Not tainted VLI
EFLAGS: 00010002   (2.6.12.5)
EIP is at mthca_poll_cq+0x158/0x534
eax: 00000000   ebx: c2027080   ecx: 00000007   edx: 00000a60
esi: 0000013c   edi: c2027104   ebp: c1a33f0c   esp: c1a33ea4
ds: 007b   es: 007b   ss: 0068
Process ib_mad1 (pid: 312, threadinfo=c1a32000 task=f7f16540)
Stack: c1800560 c17f8560 c17f8ec0 c1a33edc c0116819 f7d9489c f78a31e0 00000000
       00000080 00000000 00000000 00000286 f7d83000 c1a33f0c 00000001 f7d94880
       f8806000 00000292 00000001 00000000 c2027080 f7d83000 f789bc00 c1a33f0c
Call Trace:
 [<c0116819>] load_balance_newidle+0x76/0x81
 [<c026b28c>] ib_mad_completion_handler+0x2c/0x8d
 [<c012d86a>] remove_wait_queue+0xf/0x34
 [<c0129aad>] worker_thread+0x1b0/0x23a
 [<c026b260>] ib_mad_completion_handler+0x0/0x8d
 [<c0116ff6>] default_wake_function+0x0/0xc
 [<c0116ff6>] default_wake_function+0x0/0xc
 [<c01298fd>] worker_thread+0x0/0x23a
 [<c012d5a0>] kthread+0x8a/0xb2
 [<c012d516>] kthread+0x0/0xb2
 [<c0101bb1>] kernel_thread_helper+0x5/0xb
Code: 01 00 00 8b 44 24 18 8d bb 84 00 00 00 8b 53 5c 8b 70 18 8b 4f 24 0f ce 2b b3 b8 00 00 00 8b 83 bc 00 00 00 d3 ee 01 f2 8d 14 d0 <8b> 02 8b 52 04 85 ff 89 45 00 89 55 04 74 16 8b 57 10 89 f0 39




After about 170 incoming connections the library
(and hence the application) dies.

> 
> > 4. Is there a way to get the Port GUID from
> >    an incoming connection? I can only get the
> >    remote node GUID, but not the port GUID, from
> >    the CM REQ data. This was possible in the gen1 stack.
> 
> You can use the returned path record to obtain port
> information.  What do you 
> need the port GUID for?

If an HCA has multiple ports, the node GUID will be the
same for all of them. It would be good to get the port
GUID to uniquely identify the port.
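
For illustration only: assuming the returned path record carries the remote port's GID, and assuming that GID was formed from the port GUID (the usual case for the default GID at index 0), the port GUID is the low 64 bits of the GID. The helper name `gid_to_port_guid` is hypothetical and is not part of the OpenIB stack; it just shows the byte handling in plain C.

```c
#include <stdint.h>

/* Return the low 64 bits (the interface ID) of a 16-byte GID.
 * The GID is taken in network byte order, as it appears on the wire.
 * Assumption: the interface ID is the port GUID for this GID. */
static uint64_t gid_to_port_guid(const uint8_t gid[16])
{
    uint64_t guid = 0;

    /* Bytes 0..7 are the subnet prefix; bytes 8..15 are the GUID. */
    for (int i = 8; i < 16; i++)
        guid = (guid << 8) | gid[i];
    return guid;
}
```

Two ports on the same HCA would then give different values here even though the node GUID is shared, which is the distinction needed above.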
> 
> - Sean
> 
Here is the code version used:

[root at IB]# svn info
Path: .
URL: https://openib.org/svn/gen2/trunk
Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd
Revision: 3169
Node Kind: directory
Schedule: normal
Last Changed Author: mst
Last Changed Rev: 3169
Last Changed Date: 2005-08-23 09:25:31 -0700 (Tue, 23 Aug 2005)


# cat /sys/class/infiniband/mthca0/hw_rev
a0
# cat /sys/class/infiniband/mthca0/fw_ver
1.0.1

[root at subnetmgr4 ~]# ibv_devices
    device                 node GUID
    ------              ----------------
    mthca0              0002c90200400d00










		