[openib-general] [PATCH]proposal for enabling partial ports on HCA

Fab Tillier ftillier at silverstorm.com
Wed Oct 5 12:15:49 PDT 2005


> From: Roland Dreier [mailto:rolandd at cisco.com]
> Sent: Wednesday, October 05, 2005 12:07 PM
> 
>     Shirley> The port failure means the SW clients' initialization of
>     Shirley> that port failed. It doesn't matter whether the link is
>     Shirley> up/down or there is a hardware/firmware problem. On
>     Shirley> any of these SW errors, the upper users can't use that
>     Shirley> port correctly, or even the whole device correctly. It's
>     Shirley> easy to prove this if you set error points during
>     Shirley> client registration and start the upper users. The
>     Shirley> problems could be a kernel hang or a kernel oops. For
>     Shirley> example, if mad_client port initialization fails and you
>     Shirley> start ipoib_client, ifconfig will hang in the kernel. If
>     Shirley> sa_client fails, the ipoib multicast join will hit a
>     Shirley> kernel oops. Starting the upper users without checking
>     Shirley> the dependent resource allocation is buggy. It is
>     Shirley> definitely worth spending time to address this.
> 
> Yes, I agree we should fix the bugs in error handling during
> registration.  However, I don't think that a mask of ports is the
> right answer -- it doesn't seem to address the real issue.  We should
> just make sure that if, say, the MAD layer fails to initialize a
> device, then all clients that depend on the MAD layer don't try to use
> that device.

Shouldn't a user get an error (not an oops) if they try to use the MAD layer for
a device that didn't initialize properly within the MAD layer?  Doesn't the MAD
layer check that device requests are valid?  It seems that adding such checks
would be much simpler to implement than trying to figure out how to express
these limitations to the various ULPs.
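
As a rough illustration of the kind of check meant here, the sketch below
validates a request at the layer's entry point and returns an error code
instead of touching uninitialized state. It is a userspace sketch only: the
names mad_device, port_ready, MAX_PORTS and mad_register_agent are
hypothetical and are not the actual ib_mad interface.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PORTS 2 /* hypothetical port count for this sketch */

/* Hypothetical per-device state recorded by the MAD layer at init time. */
struct mad_device {
    /* Indexed by 1-based port number; slot 0 is unused. */
    bool port_ready[MAX_PORTS + 1];
};

/*
 * Hypothetical entry point a ULP would call before using a port.
 * Returns 0 on success or a negative errno value, so a caller such as
 * IPoIB can fail cleanly instead of dereferencing missing state.
 */
int mad_register_agent(const struct mad_device *dev, unsigned int port)
{
    if (dev == NULL)
        return -ENODEV;   /* device never registered with this layer */
    if (port < 1 || port > MAX_PORTS)
        return -EINVAL;   /* port number out of range */
    if (!dev->port_ready[port])
        return -ENODEV;   /* this port failed initialization */
    return 0;
}
```

With a check like this, a ULP only needs to handle a failed return value; it
does not need to be told in advance which ports are usable.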

- Fab



