[openib-general] [PATCH]proposal for enabling partial ports on HCA
Roland Dreier
rolandd at cisco.com
Wed Oct 5 12:06:57 PDT 2005
Shirley> A port failure here means that a SW client failed to
Shirley> initialize that port. It doesn't matter whether the link is
Shirley> up or down, or whether there is a hardware/firmware problem.
Shirley> If any of these SW errors occurs, the upper users can't use
Shirley> that port correctly, or even the whole device correctly.
Shirley> It's easy to prove: set error points during client
Shirley> registration and start the upper users. The problems can be
Shirley> a kernel hang or a kernel oops. For example, if mad_client
Shirley> fails to initialize the ports and you start ipoib_client,
Shirley> ifconfig will hang in the kernel. If sa_client fails, the
Shirley> ipoib multicast join will hit a kernel oops. Starting the
Shirley> upper users without checking that the resources they depend
Shirley> on were allocated is buggy. It is definitely worth spending
Shirley> time to address this.
Yes, I agree we should fix the bugs in error handling during
registration. However, I don't think that a mask of ports is the
right answer -- it doesn't seem to address the real issue. We should
just make sure that if, say, the MAD layer fails to initialize a
device, then all clients that depend on the MAD layer don't try to use
that device. I'm not sure what the right way to express these
dependencies is, however.
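
One possible way to express such dependencies, purely as an
illustration: each client names the client it requires, and the
registration loop skips any client whose prerequisite failed on that
device. This is a user-space sketch, not the ib_core API. struct client,
add_device(), dep_ready() and the demo callbacks are all hypothetical
names, and the single "depends_on" field is an assumption made to keep
the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_CLIENTS 8

/* Hypothetical client descriptor; NOT the real struct ib_client. */
struct client {
    const char *name;
    const char *depends_on;   /* name of the required client, or NULL */
    int (*add)(int device);   /* returns 0 on success, nonzero on failure */
};

/* True only if the named dependency was already initialized successfully. */
static bool dep_ready(const struct client *clients, const bool *ready,
                      size_t n_done, const char *dep)
{
    for (size_t i = 0; i < n_done; i++)
        if (strcmp(clients[i].name, dep) == 0)
            return ready[i];
    return false;             /* unknown or not-yet-registered dependency */
}

/*
 * Initialize the clients, in order, for one device.  ready[i] records
 * whether client i may use the device.  Returns the number of usable
 * clients.  A client whose dependency failed never has add() called.
 */
static size_t add_device(const struct client *clients, size_t n,
                         int device, bool *ready)
{
    size_t usable = 0;

    assert(n <= MAX_CLIENTS);
    for (size_t i = 0; i < n; i++) {
        const struct client *c = &clients[i];

        ready[i] = false;
        if (c->depends_on && !dep_ready(clients, ready, i, c->depends_on))
            continue;         /* prerequisite failed: skip this client */
        ready[i] = (c->add(device) == 0);
        if (ready[i])
            usable++;
    }
    return usable;
}

/* Demo callbacks (hypothetical): "mad" fails on device 1 only. */
static int add_ok(int device) { (void)device; return 0; }
static int add_fail_dev1(int device) { return device == 1 ? -1 : 0; }

static const struct client demo_clients[] = {
    { "mad",   NULL,  add_fail_dev1 },
    { "sa",    "mad", add_ok },
    { "ipoib", "sa",  add_ok },
};
```

With these demo clients, a failure of "mad" on device 1 means "sa" and
"ipoib" are never started on that device, while device 0 is unaffected.
A real solution would also have to handle unregistration and clients
with more than one prerequisite.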
- R.