[ofa-general] Re: Re: IPoIB-CM UC mode
Sean Hefty
mshefty at ichips.intel.com
Tue Jul 3 11:14:23 PDT 2007
> Hmm, I don't see how REQ gives you data on existing connection. Further,
> this would need a spec extension to define private data format then?
> LAP trick works out of the box ...
LAP keep-alives require the apps to implement the keep-alive timers and
detection, but send the messages out-of-band. Why not send the
messages in-band? Would it make more sense to implement the entire
keep-alive solution in the CM?
> I actually think a single working solution is enough.
> No need to explore all of them :).
I'm not saying implement all of them, just make sure that we have the
best solution. I can't think of one that I like better than using LAP,
but it feels like the CM protocol / MADs are being hijacked. For
example, if there's only one path between two nodes, LAP doesn't really
make any sense, but it ends up being used. Should we instead look at
adding new CM messages for just this purpose?
>> For
>> example, event registration could be used to detect that a remote node
>> has gone down.
>> We could use per node keep alive messages, rather than
>> per connection messages.
>
> No, these won't address cases such as DREQ timeout after remote
> decides to close connection, without reboot.
Per-node keep-alive messages could. It depends on what data is carried
in the message (e.g. a list of all QPs currently connected to the node
in question). I mentioned this because it may be more efficient under
some circumstances.
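
To make the per-node idea concrete, here is a rough sketch of how a
receiver might use such a message to spot a connection that the remote
side has already closed (the lost-DREQ case). Everything in it is an
assumption for illustration: the NodeKeepAlive payload, its fields, and
stale_connections() are hypothetical names, not an existing CM MAD or
kernel API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeKeepAlive:
    """Hypothetical per-node keep-alive payload (not a defined CM message)."""
    sender: str
    # (sender_qpn, receiver_qpn) pairs the sender still considers connected.
    connected: frozenset


def stale_connections(local_conns, msg):
    """Return local connections to msg.sender that the sender no longer lists.

    local_conns maps a peer name to a set of (local_qpn, remote_qpn) pairs.
    """
    mine = local_conns.get(msg.sender, set())
    # Flip the sender's pairs into our own (local, remote) orientation.
    theirs = {(receiver_qpn, sender_qpn)
              for (sender_qpn, receiver_qpn) in msg.connected}
    return mine - theirs


# We believe two connections to nodeB are up; nodeB only reports one.
local = {"nodeB": {(0x11, 0x21), (0x12, 0x22)}}
msg = NodeKeepAlive(sender="nodeB", connected=frozenset({(0x21, 0x11)}))
stale = stale_connections(local, msg)  # the (0x12, 0x22) pair is stale
```

One message per node covers every connection to that node, which is
where the possible efficiency gain over per-connection messages would
come from.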
- Sean