[openib-general] [DAPL] ran kdapl test, got slab corruption

Hal Rosenstock halr at voltaire.com
Fri Apr 29 03:31:04 PDT 2005


On Thu, 2005-04-28 at 18:31, Tom Duffy wrote:
> On Fri, 2005-04-29 at 00:06 +0300, Itamar Rabenstein wrote:
> > I think the problem is related to the use of double in the kernel.
> > On the i386 arch we need to add this to the makefile:
> > ifeq (${IS_i686},i686)
> > # Override -msoft-float in arch/i386/Makefile
> > EXTRA_CFLAGS += -mhard-float
> > endif 
> > 
> > I am not sure that you have a flag like this.
> > I am working now on a new version of kdapltest without any use of doubles
> > in the kernel.
> > I think it will be ready early next week.
> 
> If I add:
> 
> EXTRA_CFLAGS += -msse
> 
> I can compile on x86_64.

I don't need to do this on x86_64. Not sure why. Could this be due to a
compiler difference? I am using gcc version 3.4.2 20041017 (Red Hat
3.4.2-6.fc3). I also build "in tree" rather than out of tree.
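
To illustrate the direction Itamar describes (no doubles in kernel code), here
is a minimal user-space sketch of computing a throughput figure with integer
math only. I am assuming the doubles are used for statistics of this kind; the
function name mbps_x100 is hypothetical and not taken from kdapltest.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from kdapltest): throughput in MB/s scaled by 100,
 * so that 1.50 MB/s is returned as 150. Uses only integer arithmetic.
 * Note: in real kernel code on a 32-bit arch, a 64-bit divide would normally
 * go through a helper such as do_div() rather than the plain '/' operator.
 */
uint64_t mbps_x100(uint64_t bytes, uint64_t usecs)
{
    const uint64_t scale = 100000000ULL;    /* 1e6 usec/sec * 100 */

    if (usecs == 0)
        return 0;                           /* avoid divide by zero */

    /* This sketch is only valid while bytes * scale fits in 64 bits. */
    assert(bytes <= UINT64_MAX / scale);

    /* Multiply before dividing to keep two decimal places of precision. */
    return (bytes * scale) / (usecs * 1048576ULL);
}
```

The result can be printed as whole and fractional parts (value / 100 and
value % 100) with integer format specifiers, so no floating point format is
needed either.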

> Now that I run the test, I get the following:
> 
> [root@flopteron2 ~]# ./kdapltest -T S -D mthca0a -d
> Server_Cmd.debug:       1
> Server_Cmd.dapl_name: mthca0a
> DT_cs_Server: IA mthca0a opened
> DT_cs_Server: PZ created
> DT_cs_Server: EP created
> DT_cs_Server: PSP created
> *****  DAPL  Characteristics  *****
> Provider: mthca0a  Version 1.0  DAPL 1.2
> Adapter: Generic InfiniBand HCA by DAPL Reference Implementation Version 0.0
> Supporting:
>         64512 EPs with 65535 DTOs and 0 RDMA/RDs each
>         65408 EVDs of up to 65535 entries  (default S/R size is 16/16)
>         IOVs of up to 59 elements
>         131056 LMRs (and 131056 RMRs) of up to 0xffffffffffffffff bytes
>         Maximum MTU 0x80000000 bytes, RDMA 0x80000000 bytes
>         Maximum Private data size 92 bytes
> ***** ***** ***** ***** ***** *****
> DT_cs_Server: Posting 2 recvs
> Dapltest: Service Point Ready - mthca0a
> DT_cs_Server: Waiting for Connection Request
> DT_cs_Server: Accepting Connection Request
> DT_cs_Server: Awaiting connection ...
> DT_cs_Server: Connected!
> DAT_STATE: DAT_EP_STATE_CONNECTED
> DAT_STATE: Inbound DTO Status: Active
> DAT_STATE: Outbound DTO Status: Idle
> DT_cs_Server: Waiting for Client_Info
> DT_cs_Server: Got Client_Info
> DT_cs_Server: Waiting for Client_Cmd_Info
> DT_cs_Server: Send Server_Info
> Client Requests Server to Quit
> DT_cs_Server: Waiting for clients to all go away...
> DT_cs_Server: Cleaning up ...
> DT_cs_Server: IA mthca0a closed
> DT_cs_Server (mthca0a):  Exiting.
> TEST INSTANCE 1
> 
> [root@sins-stinger-10 ~]# ./kdapltest -T Q -s 192.168.0.26 -D mthca0a -d
> Server Name: 192.168.0.26
> Server Net Address: 192.168.0.26
> DT_cs_Client: Starting Test ...
> DT_cs_Client: IA mthca0a opened
> DT_cs_Client: EP created
> *****  DAPL  Characteristics  *****
> Provider: mthca0a  Version 1.0  DAPL 1.2
> Adapter: Generic InfiniBand HCA by DAPL Reference Implementation Version 0.0
> Supporting:
>         64512 EPs with 65535 DTOs and 0 RDMA/RDs each
>         65408 EVDs of up to 65535 entries  (default S/R size is 16/16)
>         IOVs of up to 28 elements
>         131056 LMRs (and 131056 RMRs) of up to 0xffffffffffffffff bytes
>         Maximum MTU 0x80000000 bytes, RDMA 0x80000000 bytes
>         Maximum Private data size 92 bytes
> ***** ***** ***** ***** ***** *****
> DT_cs_Client: Posting 1 recv buffer
> DT_cs_Client: Connect Endpoint
> DT_cs_Client: Await connection ...
> DT_cs_Client: Connected!
> DAT_STATE: DAT_EP_STATE_CONNECTED
> DAT_STATE: Inbound DTO Status: Active
> DAT_STATE: Outbound DTO Status: Idle
> DT_cs_Client: Sending Client_Info
> DT_cs_Client: Sent Client_Info - awaiting completion
> DT_cs_Client: Sending Command
> DT_cs_Client: Sent Command - awaiting completion
> DT_cs_Client: Waiting for Server_Info
> DT_cs_Client: Server_Info Received
> DT_cs_Client: Version OK!
> -------------------------------------
> Server_Info.dapltest_version   : 6
> Server_Info.is_little_endian   : 1
> -------------------------------------
> Client_Info.dapltest_version   : 6
> Client_Info.is_little_endian   : 1
> Client_Info.test_type          : 4
> Quit_Cmd.server_name: 192.168.0.26
> Quit_Cmd.device_name: mthca0a
> DT_cs_Client: Cleaning Up ...
> DT_cs_Client: IA mthca0a closed
> DT_cs_Client: ========== End of Work -- Client Exiting
> TEST INSTANCE 1
> 
> Unfortunately, I get this error in dmesg:
> 
> dapl_ib_disconnect_clean: ep_ptr 0xffff81003f085320 has invalid CM handle

Do you know if the DREQ has actually been sent on IB?

In any case, that message is really just debug output and can be eliminated.

> Also, this is bad: slab corruption
> 
> Slab corruption: start=ffff810077455eb8, len=312
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<ffffffff882c8252>](req_comp_work+0x42/0x90 [ib_at])
> 050: 6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 00 00 00 00
> 120: 6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 00 00 00 00
> Prev obj: start=ffff810077455d68, len=312
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<0000000000000000>](0x0)
> 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> Slab corruption: start=ffff81003aecbdb0, len=288
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<ffffffff882c8264>](req_comp_work+0x54/0x90 [ib_at])
> 040: 00 00 00 00 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b
> 110: 00 00 00 00 00 00 00 00 6b 6b 6b 6b 6b 6b 6b a5
> Prev obj: start=ffff81003aecbc78, len=288
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<0000000000000000>](0x0)
> 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

Is this associated with the above (quit test)? Can this be reproduced?
I will inspect the code to see how this could occur. Is it between 2
x86_64 machines?
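
For anyone reading the dump: with slab debugging enabled, freed objects are
filled with the poison byte 0x6b and end with 0xa5, so the runs of 00 inside
the 6b pattern mean something wrote to those objects after they were freed
(the "Last user" lines point at req_comp_work in ib_at). The check can be
sketched in user space roughly as follows. This is only an illustration, not
the kernel's code; the function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POISON_FREE 0x6b    /* fill byte for a freed object */
#define POISON_END  0xa5    /* last byte of a poisoned object */

/* Hypothetical helper: fill an object the way a freed slab object is poisoned. */
void poison_object(unsigned char *obj, size_t len)
{
    assert(obj != NULL && len > 0);
    memset(obj, POISON_FREE, len - 1);
    obj[len - 1] = POISON_END;
}

/*
 * Hypothetical helper: return the offset of the first byte that no longer
 * matches the poison pattern, or -1 if the object is intact.
 */
long first_corrupt_offset(const unsigned char *obj, size_t len)
{
    size_t i;

    assert(obj != NULL);
    for (i = 0; i < len; i++) {
        unsigned char expect = (i == len - 1) ? POISON_END : POISON_FREE;

        if (obj[i] != expect)
            return (long)i;
    }
    return -1;
}
```

Applied to a 312-byte object with eight zero bytes written at offset 0x58,
as in the first dump, the scan reports 0x58 as the first corrupted offset.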

-- Hal

> -tduffy