[ofa-general] Re: Re: IPoIB-CM UC mode

Sean Hefty mshefty at ichips.intel.com
Tue Jul 3 11:14:23 PDT 2007


> Hmm, I don't see how REQ gives you data on existing connection. Further,
> this would need a spec extension to define private data format then?
> LAP trick works out of the box ...

LAP keep-alives require the apps to implement the keep-alive timers and
detection, and the messages themselves are sent out-of-band.  Why not
send the messages in-band?  Would it make more sense to implement the
entire keep-alive solution in the CM?
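
To make the "timers and detection" part concrete, here is a minimal sketch of
the per-connection state that either the app or the CM would have to keep.
Everything in it (KeepAliveTracker, interval, max_missed) is a hypothetical
name for illustration, not part of any existing CM API, and the clock is passed
in explicitly so the logic can be exercised without real timers.

```python
class KeepAliveTracker:
    """Hypothetical helper: remembers when each connection was last heard from."""

    def __init__(self, interval, max_missed=3):
        self.interval = interval      # seconds between expected keep-alives
        self.max_missed = max_missed  # missed intervals before a peer is declared dead
        self.last_seen = {}           # connection id -> timestamp of last reply

    def heard_from(self, conn_id, now):
        # Any keep-alive reply refreshes the connection's timestamp.
        self.last_seen[conn_id] = now

    def dead_connections(self, now):
        # A connection counts as dead once max_missed intervals
        # have elapsed without a reply.
        limit = self.interval * self.max_missed
        return sorted(c for c, t in self.last_seen.items() if now - t >= limit)
```

Whichever layer owns this state also has to own the periodic send, which is
the part that would move out of the apps if the CM implemented keep-alives.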

> I actually think a single working solution is enough.
> No need to explore all of them :).

I'm not saying implement all of them, just make sure that we have the 
best solution.  I can't think of one that I like better than using LAP, 
but it feels like the CM protocol / MADs are being hijacked.  For 
example, if there's only one path between two nodes, LAP doesn't really 
make any sense, but it ends up being used.  Should we instead look at 
adding new CM messages for just this purpose?

>> For example, event registration could be used to detect that a remote
>> node has gone down.  We could use per node keep alive messages, rather
>> than per connection messages.
> 
> No, these won't address cases such as DREQ timeout after remote
> decides to close connection, without reboot.

Per node keep alive messages could.  It depends on what data is carried 
in the message (e.g. all currently connected QPs to the node in 
question).  I mentioned this because it may be more efficient under some 
circumstances.
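
As a rough sketch of that idea, assume the per node message carries the set of
QP numbers the remote side still considers connected.  The receiver can then
flag any local connection whose remote QP is missing from the list, which
would cover the lost-DREQ case.  The message contents and the names below
(stale_connections, local_conns) are assumptions for illustration only;
nothing like this is defined in the CM protocol today.

```python
def stale_connections(local_conns, reported_remote_qpns):
    """Return local QPNs whose remote QP the peer no longer lists as connected.

    local_conns: dict mapping local QPN -> remote QPN for one remote node.
    reported_remote_qpns: QPNs carried in the (hypothetical) per node message.
    """
    reported = set(reported_remote_qpns)
    return sorted(lqpn for lqpn, rqpn in local_conns.items()
                  if rqpn not in reported)
```

One message per node replaces one per connection, at the cost of a message
whose size grows with the number of connected QPs.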

- Sean


