[openib-general] Infiniband on Debian etch RC1

Hal Rosenstock halr at voltaire.com
Tue Nov 21 05:35:15 PST 2006


On Tue, 2006-11-21 at 07:51, Diego Guella wrote:
> > If you are using OFED 1.1, then you should use the source RPM for
> > OpenSM. There was one patch on the list found with Debian for a stack
> > smashing issue with osm_helper.c.
> 
> The SM which is currently running (on SuSE 9.3) is the one included in
> OFED-1.0.
> Should I migrate to OFED-1.1 or can I build opensm from the OFED-1.0
> source RPM?

I can't answer that question for you. All I can say is that a lot of bug
fixes (and new features) have gone into OpenSM in OFED 1.1. I don't know
how the OFED 1.0 OpenSM will do on Debian, as this is new ground. If you
do decide to upgrade, I can advise on which bits to pick up.

> >> I added a line with "ib_ipoib" to /etc/modules.
> >>
> >> So now I think I have to configure 2 new devices (MHES28 has 2 ports) in
> >> /etc/network/interfaces.
> >>
> >> I added 2 devices named ib0 and ib1, and I configured them to have
> >> static IP addresses, just like a normal ethernet device.
> >>
> >> ifconfig shows they are up, one has the attribute "RUNNING" too, the
> >> other not (I think this is because one has the cable plugged, the
> >> other not).
> >>
> >> All this is done on server PE1950
> >>
> >> Now, that cable goes to the other server, a PE2850, which has a SM
> >> running.
> >>
> >> I try to ping that server, using the IP address of the infiniband IPoIB
> >> interface, but I get "destination unreachable".
> >
> > I presume the two machines are on the same IPoIB subnet. Are there any
> > errors in the OpenSM log ?
> >
> Yes, my Ethernet subnet is 192.168.200.0/255 and my Infiniband IPoIB subnet 
> is 193.168.200.0/255, this is the same on all the machines.
> 
> I opened /var/log/osm.log for the first time now and (apart from the
> log size - 31MB!) there is this error, which has been repeating every
> 10 seconds since June 26 (the date when I installed OFED-1.0) until
> today:
> -----
> Nov 21 13:00:22 438727 [42003960] -> __osm_sm_state_mgr_signal_error: ERR 3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in state IB_SMINFO_STATE_DISCOVERING
> Nov 21 13:00:32 441056 [0000] -> SM port is down
> -----

The SM state message is "normal" and can be ignored.

The problem is that the SM port is down. Do you have a physical link
between the two HCAs? That is the problem to solve.
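
One way to check the link on each machine is to read the per-port files
under sysfs. What follows is only a rough sketch in Python, assuming the
kernel exposes /sys/class/infiniband/<device>/ports/<port>/state and
phys_state with values of the form "4: ACTIVE". The helper names
parse_state and port_states are made up for illustration.

```python
import os

# Assumed sysfs location of the InfiniBand device class.
SYSFS_IB = "/sys/class/infiniband"


def parse_state(text):
    """Split a sysfs value such as '4: ACTIVE' into (4, 'ACTIVE')."""
    number, _, name = text.strip().partition(":")
    return int(number), name.strip()


def port_states(root=SYSFS_IB):
    """Return {(device, port): (state, phys_state)} for every port under root."""
    result = {}
    if not os.path.isdir(root):
        # No InfiniBand class directory: nothing to report.
        return result
    for device in sorted(os.listdir(root)):
        ports_dir = os.path.join(root, device, "ports")
        if not os.path.isdir(ports_dir):
            continue
        for port in sorted(os.listdir(ports_dir)):
            values = []
            for name in ("state", "phys_state"):
                with open(os.path.join(ports_dir, port, name)) as f:
                    values.append(parse_state(f.read()))
            result[(device, port)] = tuple(values)
    return result


if __name__ == "__main__":
    for (device, port), (state, phys) in port_states().items():
        print(device, port, state[1], phys[1])
```

A port with a good cable to a live peer should report a phys_state of
LinkUp. If phys_state stays at Polling, there is no physical link, and
no SM can bring the port to ACTIVE.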

-- Hal

> I want to point out that I have another system, a desktop with SuSE 9.3,
> OFED-1.0 and a MHES14 card installed, and that works fine: IPoIB, SDP,
> RDMA, all the features are OK.
> 
> 
> 
> Thanks,
> Diego
> 




