[openib-general] ping problem with ammasso cards (iWARP interface)

Pradipta Kumar Banerjee bpradip at in.ibm.com
Wed Jul 19 11:55:52 PDT 2006


Ravinandan Arakali wrote:
> Steve/Pradipta,
> Without the -O2 option, rping is now working !
>
Good to know! But we still need to find the root cause of the problem.

Thanks,
Pradipta

> Earlier, I did not realize that the cable was yanked out.
> Thanks for all the help..
> 
> Ravi
> -----Original Message-----
> From: Ravinandan Arakali [mailto:ravinandan.arakali at neterion.com]
> Sent: Friday, July 14, 2006 3:37 PM
> To: 'Steve Wise'
> Cc: 'bpradip at in.ibm.com'; 'openib-general at openib.org'; Leonid. Grossman
> (E-mail)
> Subject: RE: [openib-general] ping problem with
> ammassocards(iWARPinterface)
> 
> 
> As Pradipta suggested, I rebuilt the libraries after removing
> the optimization flag (-O2) from the Makefile. Now I don't see the
> core dump, but rping still does not establish a connection.
> This is similar to the failure I am seeing with the rdma_lat test.
> 
> BTW, when I start rping in server mode on, say, port 9999,
> should I expect to see an entity listening on that port number
> when I do "netstat -an"? Currently, I don't see one.
> 
> Ravi
> 
> -----Original Message-----
> From: Steve Wise [mailto:swise at opengridcomputing.com]
> Sent: Thursday, July 13, 2006 12:10 PM
> To: ravinandan.arakali at neterion.com
> Cc: bpradip at in.ibm.com; openib-general at openib.org
> Subject: Re: [openib-general] ping problem with
> ammassocards(iWARPinterface)
> 
> 
> By the way, does this failure happen immediately or after some period of
> time?
> 
> 
> On Thu, 2006-07-13 at 13:27 -0500, Steve Wise wrote:
>> I guess this isn't surprising since rping doesn't work for you either.
>> Something fundamental is screwed up on your user side methinks...
>>
>> CM event 8 == RDMA_CM_EVENT_REJECTED, which means either the server side
>> wasn't listening on the appropriate TCP port, or the server process did
>> an rdma_reject().  I'm guessing it's the former...
>>
>> You could use tcpdump to see if the connection request is getting RST
>> by the remote side.
>>
>>
>>
>>
>> On Thu, 2006-07-13 at 11:20 -0700, Ravinandan Arakali wrote:
>>> With the --cma option, I don't see the error about running SM.
>>> But there's no connection established.
>>>
>>> openfab2:/tmp/ib/src/userspace/perftest # ./rdma_lat --cma
>>> pp_server_connect_cma starting server
>>>
>>> openfab:/tmp/ib/src/userspace/perftest # ./rdma_lat --cma 17.2.2.102
>>> pp_client_connect_cma starting client
>>> pp_client_connect_cma/856 unexpected CM event 8
>>> pp_client_connect_cma NOT connected!
>>> pp_connect_cma(17.2.2.102,18515) failed!
>>>
>>> There are no messages in dmesg either.
>>>
>>> Ravi
>>>
>>> -----Original Message-----
>>> From: Steve Wise [mailto:swise at opengridcomputing.com]
>>> Sent: Thursday, July 13, 2006 6:55 AM
>>> To: Ravinandan Arakali
>>> Cc: bpradip at in.ibm.com; openib-general at openib.org
>>> Subject: Re: [openib-general] ping problem with ammasso
>>> cards(iWARPinterface)
>>>
>>>
>>> Are you trying to run this over iwarp?  It doesn't need an SM...
>>>
>>> For the perftests rdma_lat and rdma_bw in the iwarp branch, use the
>>> --cma flag.
>>>
>>> Steve.
>>>
>>>
>>> On Wed, 2006-07-12 at 16:39 -0700, Ravinandan Arakali wrote:
>>>> Also, I am trying to run some of the iwarp bandwidth/latency tests
>>>> (available under directory perftest).
>>>> The first thing to do here is to run opensm. When I run opensm (with debug
>>>> level 10), I get the following error. Any idea what needs to be done to get
>>>> this working?
>>>>
>>>> openfab2:/tmp/ib/src/userspace # opensm  -d 10
>>>> -------------------------------------------------
>>>> OpenSM Rev:openib-1.2.0
>>>> Command Line Arguments:
>>>>  d level = 0xa
>>>>  Log File: /var/log/osm.log
>>>> -------------------------------------------------
>>>> OpenSM Rev:openib-1.2.0
>>>>
>>>> Using default GUID 0x0
>>>> Error: Could not get port guid
>>>> Exiting SM
>>>>
>>>> openfab2:/tmp/ib/src/userspace # cat /var/log/osm.log
>>>> Jul 12 08:35:04 718914 [B7E518C0] -> OpenSM Rev:openib-1.2.0
>>>> Jul 12 08:35:04 719111 [0000] -> OpenSM Rev:openib-1.2.0
>>>>
>>>> Jul 12 08:35:04 721381 [B7E518C0] -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
>>>> Jul 12 08:35:04 721702 [0000] -> Exiting SM
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com]
>>>> Sent: Wednesday, July 12, 2006 10:31 AM
>>>> To: Ravinandan Arakali
>>>> Cc: openib-general at openib.org
>>>> Subject: Re: [openib-general] ping problem with ammasso cards(iWARP
>>>> interface)
>>>>
>>>>
>>>> Ravinandan,
>>>>   Do you still see the rping crash?
>>>>
>>>> Thanks,
>>>> Pradipta Kumar.
>>>>
>>>> Ravinandan Arakali wrote:
>>>>> Pradipta,
>>>>> Okay, thanks. Initially, I was not sure since I don't remember non-zero
>>>>> values in /proc/krping. When I re-ran the krping test, I saw the following
>>>>> output:
>>>>> openfab2:~ # cat /proc/krping
>>>>> 1-amso0 891376 55711 891376 55711 1782720 27855 1782784 27856
>>>>>
>>>>> As you mentioned, the RDMA traffic does indeed seem to be flowing!
>>>>> Any idea why rping is dumping core?
>>>>>
>>>>> Has any testing been done using SDP with ammasso cards?
>>>>>
>>>>> Regards,
>>>>> Ravi
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com]
>>>>> Sent: Friday, July 07, 2006 11:20 PM
>>>>> To: Ravinandan Arakali
>>>>> Cc: Leonid. Grossman (E-mail); sivakumar.subramani at neterion.com;
>>>>> openib-general at openib.org
>>>>> Subject: Re: [openib-general] ping problem with ammasso cards(iWARP
>>>>> interface)
>>>>>
>>>>>
>>>>> Ravinandan Arakali wrote:
>>>>>> Pradipta,
>>>>>> Following is the output from gdb after core dump. I have also copy-pasted
>>>>>> the gdb output on client system.
>>>>>>
>>>>>> Attached is the dmesg output when krping test is run in verbose mode.
>>>>>> The ping data on the sender (client) seems okay. The content is shifted
>>>>>> forward by one character for each packet. On receiver, after receiving ping
>>>>>> pkt 9, it seems to jump to pkt no. 1935. Not sure if it's because messages
>>>>>> can be lost during writing to /var/log/messages?
>>>>> krping is indeed working!! Using 'verbose' allows you to see the ping
>>>>> data.
>>>>> When not using 'verbose' you see only 'send/recv' messages.
>>>>>> -----------------------------------------
>>>>>> (gdb) run -s -vV -C100 -d -a 0.0.0.0 -p 9999
>>>>>> Starting program: /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -s -vV -C100 -d -a 0.0.0.0 -p 9999
>>>>>> [Thread debugging using libthread_db enabled]
>>>>>> [New Thread -1210054992 (LWP 3668)]
>>>>>> ipaddr (0.0.0.0)
>>>>>> port 9999
>>>>>> created cm_id 0x804e6e0
>>>>>> [New Thread -1210057824 (LWP 3671)]
>>>>>> rdma_bind_addr successful
>>>>>> rdma_listen
>>>>>> cma_event type 4 cma_id 0x804e968 (child)
>>>>>> child cma 0x804e968
>>>>>>
>>>>>> Program received signal SIGSEGV, Segmentation fault.
>>>>>> [Switching to Thread -1210054992 (LWP 3668)]
>>>>>> rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514
>>>>>> 514             cb->pd = ibv_alloc_pd(cm_id->verbs);
>>>>>> (gdb) bt
>>>>>> #0  rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514
>>>>>> #1  0x0804a716 in main (argc=9, argv=Cannot access memory at address 0x6
>>>>>> ) at examples/rping.c:767
>>>>>> (gdb)
>>>>>>
>>>>>> ---------------------------------
>>>>>> (gdb) run -c -vV -C100 -d  -a 17.2.2.102 -p 9999
>>>>>> Starting program: tmp/ib/src/userspace/librdmacm/examples/.libs/rping -c -vV -C100 -d  -a 17.2.2.102 -p 9999
>>>>>> [Thread debugging using libthread_db enabled]
>>>>>> [New Thread 47388824908032 (LWP 4620)]
>>>>>> ipaddr (17.2.2.102)
>>>>>> port 9999
>>>>>> created cm_id 0x506b00
>>>>>> [New Thread 1082132800 (LWP 4623)]
>>>>>> cma_event type 0 cma_id 0x506b00 (parent)
>>>>>> cma_event type 2 cma_id 0x506b00 (parent)
>>>>>> rdma_resolve_addr - rdma_resolve_route successful
>>>>>> created pd 0x506e60
>>>>>> created channel 0x506e80
>>>>>> created cq 0x506ea0
>>>>>> created qp 0x506f40
>>>>>> rping_setup_buffers called on cb 0x505010
>>>>>> allocated & registered buffers...
>>>>>> [New Thread 1090525504 (LWP 4624)]
>>>>>> cq_thread started.
>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> openib-general mailing list
>>>>> openib-general at openib.org
>>>>> http://openib.org/mailman/listinfo/openib-general
>>>>>
>>>>> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>>>>>
> 




