[ofw] [PATCH] complib/user: fix timer race conditions
Smith, Stan
stan.smith at intel.com
Fri Jun 25 16:13:24 PDT 2010
Hefty, Sean wrote:
>> I just did, and 3.3.6 is running now. The diags are working, except
>> for ibnetdiscover, which is hanging. I should have the fix for that
>> included, but I'm verifying that now.
>
> I deleted all older copies of libibumad off my system, and
> ibnetdiscover no longer hangs. I have no idea how Windows selects
> which DLL it loads for an app, but it doesn't seem to give any
> preference to the local directory or the Windows\system32 directory,
> or even follow the PATH.
>
> In any case, please pull in this patch and test on the larger cluster.
Using Sean's kernel and user-mode cl_timer patches, along with Leo's latest cancel patches, and OpenSM 3.3.6 (vendor-umad) on HPC Edition x64 (Beta), DHCP address assignment for IPoIB works as expected across 11 reboots of 52 nodes.
DAPL tests across 52 nodes all work correctly (no hangs).
osmtest (full suite): passing, with multiple nodes running simultaneously.
saquery NR: passing
saquery PR: passing
After the 4th reboot, all IPoIB IPv4 leases were removed, forcing address reassignment on the next reboot; all tests continued to pass.
Rebooted only the head node, then all compute nodes: no problems.
Preliminary testing indicates that the DHCP address assignment problems (port state, lost MADs, and DHCP address assignment) are under control.
stan.
PS: Fab, with OpenSM (vendor-umad), ibdiags and osmtest can be run from the OpenSM node.