[openfabrics-ewg] IPoIB HA not working on RHEL4 U3

John Blackwood john.blackwood at ccur.com
Mon Sep 25 11:17:29 PDT 2006


Hi Michael,

Thank you for the info/update.

While we're on the topic of IPoIBHA, I did a bit of testing on
RHEL4 U2 and SLES 10 systems, and I noticed the following two issues.

Have you noticed these same issues too?


1. If the port goes down, then ipoib_ha.pl will start up mcasthandle.
    If you then do a 'service openibd restart' then the mcasthandle
    process is not killed off.  Thus, you can end up with two copies of
    mcasthandle.  It would be nice if the 'stop' portion of openibd
    could kill off mcasthandle if it is running.


2. In OFED-1.1-rc5, the ipoib_ha.pl script would set the down port's
    IP address to zero.   In OFED-1.1-rc6, the script instead marks
    the interface 'down'.

    I've noticed that in rc5, when the primary port went down, the
    secondary IP address became unusable.

    In rc6, when the primary port goes down, the secondary IP address
    is still usable, but now I notice that every 8 seconds or so,
    traffic across either IP address will pause for a bit.

    If the port comes back up, and I restart openibd, then this
    pausing behavior goes away.
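
To put a number on the pauses in issue 2, one option is to timestamp
each reply (from ping, for example) and look for gaps between
consecutive replies.  A minimal sketch in Python, assuming the arrival
times have already been collected as seconds; the name find_pauses,
the 1-second threshold, and the sample data are made up for
illustration and are not part of OFED:

```python
def find_pauses(timestamps, threshold=1.0):
    """Return (start, duration) for each gap between consecutive
    timestamps that is longer than `threshold` seconds."""
    pauses = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > threshold:
            # Record when the stall began and how long it lasted.
            pauses.append((prev, gap))
    return pauses


# Hypothetical arrival times in seconds: replies every 0.5 s,
# with one stall after t=1.0.
sample = [0.0, 0.5, 1.0, 3.0, 3.5, 4.0]
for start, length in find_pauses(sample):
    print("pause after t=%.1fs lasting %.1fs" % (start, length))
```

Comparing the start times of successive pauses would show whether the
interval really is close to 8 seconds.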

I was just wondering if you are also seeing this behavior?
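
For issue 1, the cleanup I have in mind for the 'stop' path is
roughly the following.  This is only a sketch of the logic in Python,
not a patch against the openibd script.  It assumes a Linux kernel
that exposes /proc/<pid>/comm and that the helper's command name is
exactly "mcasthandle"; find_pids and stop_all are hypothetical names:

```python
import os
import signal


def find_pids(name, proc_root="/proc"):
    """Return the sorted PIDs whose command name, as read from
    <proc_root>/<pid>/comm, is exactly `name`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            # The process exited during the scan, or is unreadable.
            continue
    return sorted(pids)


def stop_all(name, proc_root="/proc"):
    """Send SIGTERM to every process named `name` and return the
    list of PIDs that were signalled."""
    signalled = []
    for pid in find_pids(name, proc_root):
        try:
            os.kill(pid, signal.SIGTERM)
            signalled.append(pid)
        except ProcessLookupError:
            pass  # already gone
    return signalled
```

Calling stop_all("mcasthandle") during 'stop' would then clear out any
leftover copy before the next 'start'.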


Thanks for listening.


John Blackwood
john.blackwood at ccur.com
Concurrent Computer Corp.






Michael S. Tsirkin wrote:
> Quoting r. John Blackwood <john.blackwood at ccur.com>:
> 
>>Subject: Re: IPoIB HA not working on RHEL4 U3
>>
>>Just FYI,
>>
>>I was also having problems with IPoIBHA on RHEL4 U2.
>>
>>I found that the iproute-2.6.9-3 rpm did not output the NO-CARRIER 
>>state, but when I installed iproute-2.6.11-1, the NO-CARRIER state did 
>>appear, and IPoIBHA started working much better.
> 
> 
> Correct. What we are doing is packaging a version of iproute into ofed
> for use solely by ipoibha. This will then be used whenever
> iproute is older than 2.6.9.
> 
> 
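
As an illustration of the check that depends on the newer iproute:
the NO-CARRIER state appears among the flags in angle brackets in
'ip link show' output.  The Python sketch below looks for it.  The
sample line is hand-written in the usual iproute layout, not captured
from a real system, and how ipoib_ha.pl itself parses the output is
not shown here; link_flags and has_no_carrier are hypothetical names:

```python
import re


def link_flags(ip_link_output):
    """Return the set of flags from the first <...> group in
    `ip link show` output, or an empty set if there is none."""
    match = re.search(r"<([^>]*)>", ip_link_output)
    if not match:
        return set()
    return set(match.group(1).split(","))


def has_no_carrier(ip_link_output):
    """True if the interface reports the NO-CARRIER flag."""
    return "NO-CARRIER" in link_flags(ip_link_output)


# Hand-written sample line, for illustration only.
sample = "4: ib0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast qlen 128"
print(has_no_carrier(sample))
```

With an iproute that never prints NO-CARRIER, a check like this can
never fire, which matches the behavior described above.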




