[ofa-general] Couple of questions about OpenSM
Nicolas Morey-Chaisemartin
devel at morey-chaisemartin.com
Fri Mar 20 09:03:47 PDT 2009
Sasha Khapyorsky wrote:
> Hi Nicolas,
>
> On 11:12 Thu 19 Mar , Nicolas Morey Chaisemartin wrote:
>> First question and an easy one:
>> By default, which optimization options is OpenSM compiled with?
>
> It is defined in autoconf I guess.
>
>> Building from git without any specific options, there is no -O2 or similar, so none of the inline functions are actually inlined, which has a huge negative impact on performance (a few million non-inlined calls to osm_switch_get_least_hops consume over 15% of the computing time).
>
> In my setup I have in /usr/share/autoconf/autoconf/c.m4:
>
> if test "$GCC" = yes; then
> CFLAGS="-g -O2"
>
> And -O2 is turned on by default.
This doesn't seem to be the case for me. I'll check where that comes from.
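One quick way to check whether a given build actually had optimization enabled is to test the `__OPTIMIZE__` macro, which GCC predefines for optimizing compilations. The following is only a minimal sketch: the function name `built_with_optimization` is hypothetical and is not part of OpenSM.

```c
/* Hypothetical helper, not part of OpenSM.
 * GCC (and Clang) predefine __OPTIMIZE__ when compiling with
 * optimization enabled, so this reports how this file was built. */
int built_with_optimization(void)
{
#ifdef __OPTIMIZE__
    return 1;   /* compiled with optimization enabled */
#else
    return 0;   /* compiled without optimization, e.g. -O0 or no -O flag */
#endif
}
```

Printing the result at startup (or simply looking at the compiler command lines emitted by make) would show whether the CFLAGS default from autoconf reached the build.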
>
>> Next one and a bit harder:
>> In the Fat-Tree we have a 2D array for the hop table (destination lid/port num). Why is this table allocated on demand rather than all at once?
>
> Yes, I think it can be preallocated. Just note that this should not have
> port arrays for each lid in a fabric, but only for switches.
Good. There should be one table per switch, and each of them has one entry per (port, lid) pair, no?
>> And is it really necessary to check each time whether the lid we use is greater than max_lid_ho?
>
> Maybe not, but I will need to check carefully.
>
>> The only reason I can see for this is if a new node/switch with a bigger lid were added to the fabric while OpenSM is routing. In such a case, wouldn't a lock protect the variables so that new lids can't appear/disappear while it calculates the routes?
>
> Routing calculation cannot happen in parallel with discovery (it is all
> serialized in do_sweep() function), we should be protected at least in
> this part.
>
>> If yes, we could allocate all and skip a lot of checks. We have millions of calls to malloc and memset in osm_switch_set_hops plus tests in get_hops/get_least_hops.
>
> malloc() calls are conditional: there could be many checks, but only the
> "needed" number of malloc() calls themselves.
>
We don't allocate more than necessary, but making millions of small malloc()/memset() calls costs a lot of time.
>> This may cost a bit more memory,
>
> Min hop port array preallocation should not cost any extra memory (if
> done properly - for switches only) - we are allocating all needed buffers
> at routing calculation time anyway.
>
>> but easily gain 15% on routing computation time.
>
> Well, I'm skeptical about 15% :). But it doesn't really matter - even
> a 1% performance gain and/or cleaner code would be a nice improvement.
>
Well, about 15% of the computation time (for Ftree at least) is spent in set_hops, specifically in the malloc/memset part.
If we do it only once at the start, it should be much faster!
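To make the idea concrete, here is a minimal sketch of a per-switch hop table allocated in one go. It is not the actual OpenSM data structure: the `hop_table` struct, the `hop_table_*` functions, and the `NO_PATH` sentinel value are all hypothetical names chosen for illustration, and the sketch assumes max_lid and the port count are known before routing starts.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NO_PATH 0xFF  /* assumed sentinel meaning "no route known" */

/* Hypothetical stand-in for a switch's min-hop table (not osm_switch_t). */
struct hop_table {
    uint16_t max_lid;    /* highest LID covered by the table */
    uint8_t  num_ports;  /* number of ports on this switch */
    uint8_t *hops;       /* (max_lid + 1) * num_ports entries, one row per LID */
};

/* Allocate and initialize the whole table with one malloc() and one memset(). */
int hop_table_init(struct hop_table *t, uint16_t max_lid, uint8_t num_ports)
{
    size_t n = ((size_t)max_lid + 1) * num_ports;

    if (n == 0)
        return -1;
    t->hops = malloc(n);
    if (!t->hops)
        return -1;
    memset(t->hops, NO_PATH, n);
    t->max_lid = max_lid;
    t->num_ports = num_ports;
    return 0;
}

/* No allocation and no per-LID NULL check on the hot path. */
void hop_table_set(struct hop_table *t, uint16_t lid, uint8_t port, uint8_t hops)
{
    t->hops[(size_t)lid * t->num_ports + port] = hops;
}

uint8_t hop_table_get(const struct hop_table *t, uint16_t lid, uint8_t port)
{
    return t->hops[(size_t)lid * t->num_ports + port];
}

/* Smallest hop count over all ports for a LID, or NO_PATH if none is set. */
uint8_t hop_table_least(const struct hop_table *t, uint16_t lid)
{
    const uint8_t *row = &t->hops[(size_t)lid * t->num_ports];
    uint8_t best = NO_PATH;

    for (uint8_t p = 0; p < t->num_ports; p++)
        if (row[p] < best)
            best = row[p];
    return best;
}

void hop_table_destroy(struct hop_table *t)
{
    free(t->hops);
    t->hops = NULL;
}
```

The setters and getters here deliberately skip bounds checks, which is only safe if the LID range cannot change during routing, as discussed above. A flat array also keeps each LID's ports contiguous, which should help the least-hops scan.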
By the way, is there a reason you don't use likely/unlikely macros in conditions?
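For reference, this is the kind of thing I mean, as a rough sketch only. The macros are the usual wrappers around GCC's `__builtin_expect`; the function `get_hops_checked` and the `NO_PATH` value are hypothetical and do not come from the OpenSM sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Branch-prediction hints; fall back to plain expressions on other compilers. */
#if defined(__GNUC__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

#define NO_PATH 0xFF  /* assumed sentinel meaning "no route known" */

/* Hypothetical lookup in a flat (lid, port) hop table.
 * The range check is kept, but marked as the rare case so the compiler
 * can lay out the common path as the fall-through. */
uint8_t get_hops_checked(const uint8_t *hops, uint16_t max_lid,
                         uint8_t num_ports, uint16_t lid, uint8_t port)
{
    if (unlikely(lid > max_lid || port >= num_ports))
        return NO_PATH;
    return hops[(size_t)lid * num_ports + port];
}
```

Whether this gives a measurable gain would have to be profiled; the hint only affects code layout and has no effect on results.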
Regards
Nicolas