[openib-general] ping problem with ammasso cards(iWARP interface)

Ravinandan Arakali ravinandan.arakali at neterion.com
Tue Jul 18 17:30:26 PDT 2006


Steve/Pradipta,
Without the -O2 option, rping is now working!

Earlier, I did not realize that the cable had been yanked out.
Thanks for all the help.

Ravi
-----Original Message-----
From: Ravinandan Arakali [mailto:ravinandan.arakali at neterion.com]
Sent: Friday, July 14, 2006 3:37 PM
To: 'Steve Wise'
Cc: 'bpradip at in.ibm.com'; 'openib-general at openib.org'; Leonid. Grossman (E-mail)
Subject: RE: [openib-general] ping problem with ammasso cards(iWARP interface)


As Pradipta suggested, I rebuilt the libraries after removing
the optimization (-O2 flag) from the Makefile. Now I don't see the
core dump, but no connection is established with rping.
This is similar to the failure I am seeing with the rdma_lat test.

BTW, when I start rping in server mode at, say, port 9999,
should I expect to see an entity listening on that port number
when I do "netstat -an"? Currently, I don't see that.

Ravi

-----Original Message-----
From: Steve Wise [mailto:swise at opengridcomputing.com]
Sent: Thursday, July 13, 2006 12:10 PM
To: ravinandan.arakali at neterion.com
Cc: bpradip at in.ibm.com; openib-general at openib.org
Subject: Re: [openib-general] ping problem with ammasso cards(iWARP interface)


By the way, does this failure happen immediately or after some period of
time?


On Thu, 2006-07-13 at 13:27 -0500, Steve Wise wrote:
> I guess this isn't surprising since rping doesn't work for you either.
> Something fundamental is screwed up on your user side methinks...
>
> CM event 8 == RDMA_CM_EVENT_REJECTED which means either the server side
> wasn't listening on the appropriate TCP port, or the server process did
> an rdma_reject().  I'm guessing it's the former...
>
> You could use tcpdump to see if the connection request is getting RST
> by the remote side.
>
>
>
>
> On Thu, 2006-07-13 at 11:20 -0700, Ravinandan Arakali wrote:
> > With the --cma option, I don't see the error about running SM.
> > But there's no connection established.
> >
> > openfab2:/tmp/ib/src/userspace/perftest # ./rdma_lat --cma
> > pp_server_connect_cma starting server
> >
> > openfab:/tmp/ib/src/userspace/perftest # ./rdma_lat --cma 17.2.2.102
> > pp_client_connect_cma starting client
> > pp_client_connect_cma/856 unexpected CM event 8
> > pp_client_connect_cma NOT connected!
> > pp_connect_cma(17.2.2.102,18515) failed!
> >
> > There are no messages in dmesg either.
> >
> > Ravi
> >
> > -----Original Message-----
> > From: Steve Wise [mailto:swise at opengridcomputing.com]
> > Sent: Thursday, July 13, 2006 6:55 AM
> > To: Ravinandan Arakali
> > Cc: bpradip at in.ibm.com; openib-general at openib.org
> > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP interface)
> >
> >
> > Are you trying to run this over iwarp?  It doesn't need an SM...
> >
> > For the perftests rdma_lat and rdma_bw in the iwarp branch, use the
> > --cma flag.
> >
> > Steve.
> >
> >
> > On Wed, 2006-07-12 at 16:39 -0700, Ravinandan Arakali wrote:
> > > Also, I am trying to run some of the iwarp bandwidth/latency tests
> > > (available under directory perftest).
> > > The first thing to do here is to run opensm. When I run opensm (with
> > > debug level 10), I get the following error. Any idea what needs to be
> > > done to get this working ?
> > >
> > > openfab2:/tmp/ib/src/userspace # opensm  -d 10
> > > -------------------------------------------------
> > > OpenSM Rev:openib-1.2.0
> > > Command Line Arguments:
> > >  d level = 0xa
> > >  Log File: /var/log/osm.log
> > > -------------------------------------------------
> > > OpenSM Rev:openib-1.2.0
> > >
> > > Using default GUID 0x0
> > > Error: Could not get port guid
> > > Exiting SM
> > >
> > > openfab2:/tmp/ib/src/userspace # cat /var/log/osm.log
> > > Jul 12 08:35:04 718914 [B7E518C0] -> OpenSM Rev:openib-1.2.0
> > > Jul 12 08:35:04 719111 [0000] -> OpenSM Rev:openib-1.2.0
> > >
> > > Jul 12 08:35:04 721381 [B7E518C0] -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
> > > Jul 12 08:35:04 721702 [0000] -> Exiting SM
> > >
> > >
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com]
> > > Sent: Wednesday, July 12, 2006 10:31 AM
> > > To: Ravinandan Arakali
> > > Cc: openib-general at openib.org
> > > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP
> > > interface)
> > >
> > >
> > > Ravinandan,
> > >   Do you still see the rping crash?
> > >
> > > Thanks,
> > > Pradipta Kumar.
> > >
> > > Ravinandan Arakali wrote:
> > > > Pradipta,
> > > > Okay, thanks.. Initially, I was not sure since I don't remember
> > > > non-zero values in /proc/krping. When I re-ran the krping test, I see
> > > > following output
> > > > openfab2:~ # cat /proc/krping
> > > > 1-amso0 891376 55711 891376 55711 1782720 27855 1782784 27856
> > > >
> > > > As you mentioned, the RDMA traffic seems to be flowing indeed !
> > > > Any idea why rping is dumping core ?
> > > >
> > > > Has any testing been done using SDP with ammasso cards ?
> > > >
> > > > Regards,
> > > > Ravi
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Pradipta Kumar Banerjee [mailto:bpradip at in.ibm.com]
> > > > Sent: Friday, July 07, 2006 11:20 PM
> > > > To: Ravinandan Arakali
> > > > Cc: Leonid. Grossman (E-mail); sivakumar.subramani at neterion.com;
> > > > openib-general at openib.org
> > > > Subject: Re: [openib-general] ping problem with ammasso cards(iWARP
> > > > interface)
> > > >
> > > >
> > > > Ravinandan Arakali wrote:
> > > >> Pradipta,
> > > > >> Following is the output from gdb after core dump. I have also
> > > > >> copy-pasted the gdb output on client system.
> > > > >>
> > > > >> Attached is the dmesg output when krping test is run in verbose
> > > > >> mode. The ping data on the sender(client) seems okay. The content
> > > > >> is shifted forward by one character for each packet. On receiver,
> > > > >> after receiving ping pkt 9, it seems to jump to pkt no. 1935. Not
> > > > >> sure if it's because messages can be lost during writing to
> > > > >> /var/log/messages ?
> > > > > krping is indeed working!!...Using 'verbose' allows you to see the
> > > > > ping data.
> > > > > When not using 'verbose' you see only 'send/recv' messages.
> > > >> -----------------------------------------
> > > >> (gdb) run -s -vV -C100 -d -a 0.0.0.0 -p 9999
> > > > >> Starting program: /tmp/ib/src/userspace/librdmacm/examples/.libs/rping -s -vV -C100 -d -a 0.0.0.0 -p 9999
> > > >> [Thread debugging using libthread_db enabled]
> > > >> [New Thread -1210054992 (LWP 3668)]
> > > >> ipaddr (0.0.0.0)
> > > >> port 9999
> > > >> created cm_id 0x804e6e0
> > > >> [New Thread -1210057824 (LWP 3671)]
> > > >> rdma_bind_addr successful
> > > >> rdma_listen
> > > >> cma_event type 4 cma_id 0x804e968 (child)
> > > >> child cma 0x804e968
> > > >>
> > > >> Program received signal SIGSEGV, Segmentation fault.
> > > >> [Switching to Thread -1210054992 (LWP 3668)]
> > > >> rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514
> > > >> 514             cb->pd = ibv_alloc_pd(cm_id->verbs);
> > > >> (gdb) bt
> > > > >> #0  rping_setup_qp (cb=0x0, cm_id=0x804e968) at examples/rping.c:514
> > > > >> #1  0x0804a716 in main (argc=9, argv=Cannot access memory at address 0x6
> > > > >> ) at examples/rping.c:767
> > > >> (gdb)
> > > >>
> > > >> ---------------------------------
> > > >> (gdb) run -c -vV -C100 -d  -a 17.2.2.102 -p 9999
> > > > >> Starting program: tmp/ib/src/userspace/librdmacm/examples/.libs/rping -c -vV -C100 -d  -a 17.2.2.102 -p 9999
> > > >> [Thread debugging using libthread_db enabled]
> > > >> [New Thread 47388824908032 (LWP 4620)]
> > > >> ipaddr (17.2.2.102)
> > > >> port 9999
> > > >> created cm_id 0x506b00
> > > >> [New Thread 1082132800 (LWP 4623)]
> > > >> cma_event type 0 cma_id 0x506b00 (parent)
> > > >> cma_event type 2 cma_id 0x506b00 (parent)
> > > >> rdma_resolve_addr - rdma_resolve_route successful
> > > >> created pd 0x506e60
> > > >> created channel 0x506e80
> > > >> created cq 0x506ea0
> > > >> created qp 0x506f40
> > > >> rping_setup_buffers called on cb 0x505010
> > > >> allocated & registered buffers...
> > > >> [New Thread 1090525504 (LWP 4624)]
> > > >> cq_thread started.
> > > >>
> > > >>
> > > >
> > > >
> > > > _______________________________________________
> > > > openib-general mailing list
> > > > openib-general at openib.org
> > > > http://openib.org/mailman/listinfo/openib-general
> > > >
> > > > To unsubscribe, please visit
> > > http://openib.org/mailman/listinfo/openib-general
> > > >
> > > >
> > >
> > >
> > >
> >
>
>
>
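
For anyone decoding the numeric CM events quoted in this thread (0, 2, 4 and 8), here is a small lookup sketch. The names are listed in what I believe is the order of the rdma_cm_event_type enum in librdmacm's rdma/rdma_cma.h; only 8 == RDMA_CM_EVENT_REJECTED is confirmed in the thread itself, so please check the rest against the header in your own tree. The helper name cm_event_name is made up for illustration.

```python
# Names in the assumed order of enum rdma_cm_event_type (rdma/rdma_cma.h).
# Assumption: values start at 0 and increase by 1; only 8 is confirmed
# by the thread above. Verify against your installed header.
CM_EVENTS = [
    "RDMA_CM_EVENT_ADDR_RESOLVED",     # 0
    "RDMA_CM_EVENT_ADDR_ERROR",        # 1
    "RDMA_CM_EVENT_ROUTE_RESOLVED",    # 2
    "RDMA_CM_EVENT_ROUTE_ERROR",       # 3
    "RDMA_CM_EVENT_CONNECT_REQUEST",   # 4
    "RDMA_CM_EVENT_CONNECT_RESPONSE",  # 5
    "RDMA_CM_EVENT_CONNECT_ERROR",     # 6
    "RDMA_CM_EVENT_UNREACHABLE",       # 7
    "RDMA_CM_EVENT_REJECTED",          # 8
    "RDMA_CM_EVENT_ESTABLISHED",       # 9
    "RDMA_CM_EVENT_DISCONNECTED",      # 10
    "RDMA_CM_EVENT_DEVICE_REMOVAL",    # 11
]


def cm_event_name(code):
    """Return the symbolic name for a numeric CM event (hypothetical helper)."""
    if 0 <= code < len(CM_EVENTS):
        return CM_EVENTS[code]
    return "UNKNOWN(%d)" % code


if __name__ == "__main__":
    # The event numbers that appear in the rping and rdma_lat output above.
    for code in (0, 2, 4, 8):
        print(code, cm_event_name(code))
```

Read this way, events 0 and 2 in the client's rping output would be address and route resolution, event 4 on the server would be the incoming connect request, and event 8 from rdma_lat is the rejection Steve describes.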




