[ofw] [PATCH] complib/user: fix timer race conditions

Smith, Stan stan.smith at intel.com
Fri Jun 25 16:13:24 PDT 2010


Hefty, Sean wrote:
>> I just did, and 3.3.6 is running now.  The diags are working, except
>> for ibnetdiscover, which is hanging.  I should have the fix for that
>> included, but I'm verifying that now.
>
> I deleted all older copies of libibumad off my system, and
> ibnetdiscover no longer hangs.  I have no idea how windows selects
> what dll it uses with an app, but it doesn't seem to give any
> preference to the local directory, windows\system32 directory, or
> even follow the path.
>
> In any case, please pull in this patch and test on the larger cluster.

With Sean's kernel and user-mode cl_timer patches plus Leo's latest cancel patches, running opensm 3.3.6 (vendor-umad) on HPC Edition x64 (Beta), DHCP address assignment for IPoIB worked as expected across 11 reboots of 52 nodes.

DAPL tests over 52 nodes are all working correctly (no hangs).
osmtest (full suite): passing, run on multiple nodes simultaneously.
saquery NR: passing
saquery PR: passing

After the 4th reboot, all IPoIB IPv4 leases were removed, forcing address reassignment on the next reboot; all tests continued to pass.
Rebooted just the head-node, then all compute nodes - no problems.

Preliminary testing indicates that the earlier problems (port state, lost MADs, and DHCP address assignment) are under control.

stan.

PS: Fab, with OpenSM (vendor-umad), ibdiags and osmtest can run from the opensm node.
