[openib-general] [PATCH]proposal for enabling partial ports on HCA

Roland Dreier rolandd at cisco.com
Wed Oct 5 12:06:57 PDT 2005


    Shirley> A port failure here means that a SW client failed to
    Shirley> initialize that port. It doesn't matter whether the link
    Shirley> is up or down, or whether there is a hardware/firmware
    Shirley> problem. If any of these SW errors occurs, the upper
    Shirley> users can't use that port correctly, or even the whole
    Shirley> device. This is easy to prove: set error points during
    Shirley> client registration and then start the upper users. The
    Shirley> result can be a kernel hang or a kernel oops. For
    Shirley> example, if mad_client fails to initialize its ports and
    Shirley> you start ipoib_client, ifconfig will hang in the kernel.
    Shirley> If sa_client fails, the ipoib multicast join will hit a
    Shirley> kernel oops. Starting the upper users without checking
    Shirley> that the resources they depend on were allocated is
    Shirley> buggy. It is definitely worth spending time to address
    Shirley> this.

Yes, I agree we should fix the bugs in error handling during
registration.  However, I don't think that a mask of ports is the
right answer -- it doesn't seem to address the real issue.  We should
just make sure that if, say, the MAD layer fails to initialize a
device, then all clients that depend on the MAD layer don't try to use
that device.  I'm not sure what the right way to express these
dependencies is, however.
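
To make the idea concrete, here is a rough user-space sketch of one
possible way to express such dependencies. It is only an illustration
under stated assumptions, not a proposal for the actual interface:
struct client, depends_on, register_clients() and the add_ok/add_fail
stubs are all hypothetical names, and each client is assumed to have at
most one dependency that is listed earlier in the table. A client is
skipped for a device whenever the client it depends on failed (or was
itself skipped) for that device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical client descriptor; NOT the real struct ib_client. */
struct client {
    const char *name;
    const char *depends_on;   /* name of an earlier client, or NULL */
    int (*add)(int device);   /* per-device init, returns 0 on success */
    bool ok;                  /* result for the device last registered */
};

/* Stub init routines standing in for real per-device setup. */
static int add_ok(int device)   { (void)device; return 0; }
static int add_fail(int device) { (void)device; return -1; }

/* Find a client by name among the first n entries of the table. */
static const struct client *find_client(const struct client *clients,
                                        size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(clients[i].name, name) == 0)
            return &clients[i];
    return NULL;
}

/*
 * Initialize every client for one device, in table order.  A client
 * whose dependency is missing, failed, or was skipped is not started.
 * Returns the number of clients that are usable on this device.
 */
static size_t register_clients(struct client *clients, size_t n, int device)
{
    size_t usable = 0;

    assert(clients != NULL);
    for (size_t i = 0; i < n; i++) {
        struct client *c = &clients[i];

        c->ok = false;
        if (c->depends_on) {
            /* Only look at entries before this one. */
            const struct client *dep =
                find_client(clients, i, c->depends_on);
            if (!dep || !dep->ok)
                continue;   /* dependency unusable: skip this client */
        }
        c->ok = (c->add(device) == 0);
        if (c->ok)
            usable++;
    }
    return usable;
}
```

With a table of mad, sa (depends on mad) and ipoib (depends on sa), a
failure in the mad entry leaves sa and ipoib unstarted for that device,
while clients with no dependency on mad still come up.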

 - R.


