[openib-general] [PATCH]proposal for enabling partial ports on HCA

Fab Tillier ftillier at silverstorm.com
Wed Oct 5 12:10:48 PDT 2005


> From: Shirley Ma [mailto:xma at us.ibm.com]
> Sent: Wednesday, October 05, 2005 11:56 AM
> 
> A port failure here means that a SW client's initialization of that port
> failed. It doesn't matter whether the link is up/down or whether it is a
> hardware/firmware problem. If any of these SW errors is encountered, the upper
> users can't use that port correctly, or even the whole device correctly. This
> is easy to demonstrate: set error points during client registration and start
> the upper users. The result can be a kernel hang or a kernel oops. For example,
> if mad_client initialization fails on a port and you start ipoib_client,
> ifconfig will hang in the kernel. If sa_client fails, the ipoib multicast join
> will hit a kernel oops. Starting the upper users without checking the
> dependent resource allocation is buggy. It is definitely worth spending time
> to address this.

These sound like bugs in the code where we don't trap failures gracefully.  I
think fixing those is probably much more useful.  There will always be
situations where runtime errors can occur (memory allocation failure, for
example), and all upper-level protocols must handle failures of these calls.

Putting in code and requiring every client to compare all the various bit fields
they're interested in doesn't remove the need for proper error handling.  Proper
error handling should resolve both the ifconfig hang and multicast join oops.
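
To illustrate what I mean by trapping failures gracefully, here is a rough
user-space sketch.  All names (port_ctx, port_init, upper_open, port_cleanup)
are hypothetical and are not the actual mad, sa, or ipoib code; the only point
is that init reports its error and the upper-level user checks for the missing
resource before touching it.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-port state; not the real ib_mad or ipoib structures. */
struct port_ctx {
    void *mad_agent;    /* NULL if client setup failed on this port */
};

/*
 * Hypothetical per-port init.  Returns 0 on success or a negative errno.
 * On failure the context is left in a well-defined state (mad_agent == NULL).
 * simulate_failure stands in for an injected error point.
 */
static int port_init(struct port_ctx *port, int simulate_failure)
{
    port->mad_agent = NULL;
    if (simulate_failure)
        return -ENOMEM;

    port->mad_agent = malloc(1);    /* placeholder for the real resource */
    if (!port->mad_agent)
        return -ENOMEM;
    return 0;
}

/*
 * Hypothetical upper-level user.  It fails the open with an error code
 * rather than dereferencing a resource that was never set up.
 */
static int upper_open(const struct port_ctx *port)
{
    if (!port->mad_agent)
        return -ENODEV;
    return 0;
}

/* Release the port's resource; safe to call after a failed init. */
static void port_cleanup(struct port_ctx *port)
{
    free(port->mad_agent);
    port->mad_agent = NULL;
}
```

With this shape, a failed port_init() on one port just makes upper_open() on
that port return -ENODEV.  No bit fields need to be compared by the client,
and the same check also covers errors that happen later at runtime.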

Just my $0.02

- Fab



