[ofw] files for network direct(ibndprov.dll)
Go Yoshimura
go-yoshimura at sstc.co.jp
Fri Oct 10 09:07:52 PDT 2008
- "WinOF RC3, soon to be released." is good news!
- We believe the ND binaries are different because their file sizes differ:
2008/09/22 18:12 43,520 ibndprov.dll
2008/08/27 10:21 44,032 ibndprovRC2.dll
- We suspect the reason "why WinIB-ND works and WinOF RC2 ND binaries do not" is a version difference.
WinIB-NDv1.4.1 is version 1.4.1, which seems to add NetworkDirect to OFED 1.3.1 in a special manner (just for HPCS2008).
WinIB-NDv1.4.1 includes only inf files and does not include all packages (udapl, srp, diags).
- According to https://docs.mellanox.com/dm/WinIB/docs/WinIB-ND_1_4_1_ReleaseNotes.txt
Mellanox WinIB-ND will be integrated into MLNX_WinOF v2.0 which is scheduled
for release in October 2008.
We expect MLNX_WinOF v2.0 will be compatible with WinOF_2-0_wlh_x64.zip(v2.0_rc2).
- We are looking forward to MLNX_WinOF v2.0 and WinOF_2-0_wlh_x64.zip(v2.0).
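A size difference already shows that the two ibndprov DLLs listed above are not the same file. When sizes happen to match, a hash comparison gives a firmer answer. A minimal Python sketch, assuming both files have been copied into the current directory under the names shown above (the function names are only for illustration):

```python
import hashlib
from pathlib import Path


def describe_binary(path):
    """Return (size_in_bytes, sha256_hex) for one file."""
    data = Path(path).read_bytes()
    return len(data), hashlib.sha256(data).hexdigest()


def same_binary(path_a, path_b):
    """True only if both files have identical size and SHA-256 digest."""
    return describe_binary(path_a) == describe_binary(path_b)


if __name__ == "__main__":
    # File names taken from the directory listing above; adjust paths as needed.
    for name in ("ibndprov.dll", "ibndprovRC2.dll"):
        if Path(name).exists():
            size, digest = describe_binary(name)
            print(f"{name}: {size:,} bytes, sha256={digest}")
```

This only tells whether the files differ, not which driver release each one belongs to.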
thank you
go
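
Note on the status codes in the quoted report below: the two NdOpenIAdapter failures (c0000001 before replacing ibndprov.dll, c000009a after) appear to match NTSTATUS values. A small Python sketch for reading them, assuming the standard NTSTATUS layout (top two bits are the severity). The table holds only these two codes and the function name is illustrative:

```python
# Names as defined in ntstatus.h; only the two codes seen in this thread.
KNOWN_NTSTATUS = {
    0xC0000001: "STATUS_UNSUCCESSFUL",
    0xC000009A: "STATUS_INSUFFICIENT_RESOURCES",
}

# Bits 31-30 of an NTSTATUS value encode the severity.
SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def decode_ntstatus(text):
    """Parse a hex string such as 'c0000001' or '0xc0000001'."""
    code = int(text, 16)
    severity = SEVERITY[(code >> 30) & 0x3]
    name = KNOWN_NTSTATUS.get(code, "unknown")
    return code, severity, name


if __name__ == "__main__":
    for raw in ("c0000001", "c000009a"):
        code, severity, name = decode_ntstatus(raw)
        print(f"0x{code:08X}: {severity}, {name}")
```

If this reading is right, the error changed from a generic failure to a resource failure after the DLL was swapped, which would fit the provider and the installed driver being from different versions. That is an interpretation, not something the tools reported.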
----------------
Smith, Stan wrote:
>
>Thank you very much for the concise problem descriptions.
>
>OpenSM is working much more reliably in WinOF RC3, soon to be released.
>
>Mellanox will need to explain why WinIB-ND works and WinOF RC2 ND binaries do not?
>Purportedly the ND binaries are the same?
>
>Stan.
>
>Go Yoshimura wrote:
>> - We use WinIB-ND on top of WinOF because
>> - we can not use Network Direct with
>> WinOF_2-0_wlh_x64.zip(v2.0_rc2).
>> - when we use WinOF_2-0_wlh_x64.zip(v2.0_rc2), we get error like
>> <NdOpenIAdapter failed> with
>> diagnostic/performance/MPI-PingPong-lightweight-throughput and all
>> mpi-programs including MP-Linpack.
>> - "ndinstall -r" eliminates this error and we can get 180MB/s
>> throughput from MPI PingPong.
>> - when we use WinIB-NDv1.4.1(from mellanox.com), we can get 663MB/s
>> throughput from MPI PingPong. We got this working 2 days ago.
>> - If we can use Network Direct, we will use
>> WinOF_2-0_wlh_x64.zip(v2.0_rc2). If we can use
>> WinOF_2-0_wlh_x64.zip(v2.0_rc2) mainly and
>> WinIB-NDv1.4.1(from mellanox.com) for Network Direct only,
>> please tell us how to do it.
>> Of course we will be happier if a newer version of
>> WinOF_2-0_wlh_x64.zip enables Network Direct.
>> - We tried replacing ibndprov.dll from WinOF_2-0_wlh_x64.zip(v2.0_rc2)
>> with the one from WinIB-NDv1.4.1(from mellanox.com). But we got a
>> similar error.
>> Before replacing
>> C:\Program Files\Microsoft HPC Pack 2008 SDK\NetworkDirect\Bin\amd64>ndmrrate 192.168.12.90
>> NdOpenIAdapter failed with c0000001
>> After replacing
>> C:\Program Files\Microsoft HPC Pack 2008 SDK\NetworkDirect\Bin\amd64>ndmrrate 192.168.12.90
>> NdOpenIAdapter failed with c000009a
>> - We tried updating the driver
>> d. Open "Server Manager"->"Diagnostics"->"Device Manager".
>> e. Right click on the HCA device and select "Update Driver Software".
>> But we got a message "driver is latest"
>> - We will try adding 'Full-control' when we get messages like "Access
>> denied"
>> - opensmd.exe
>> - We get a problem with opensmd.exe.
>> After we run opensmd.exe, we get problems like
>> - vstat.exe hangs
>> - system response becomes slower
>> 1 cpu is occupied(we can see it in task-manager->process)
>> - we get a BSOD when we shut down
>> - Our workaround is "opensmd.exe -V".
>> opensmd.exe option / result
>> "opensmd.exe" / bad
>> "opensmd.exe -v" / bad
>> "opensmd.exe -V" / good
>> - We also modify
>>
>>
>> HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\OpenSM\ImagePath
>> from "opensmd.exe -e --service" to "opensmd.exe -V -e --service"
>>
>> <NdOpenIAdapter failed>
>> 2008/10/01 15:06:07 CH3_ND::CAdapter::Init(43)..............: [ch3:nd] NdOpenIAdapter failed with 0xc0000001
>> 2008/10/01 15:06:07 CH3_ND::CAdapter::Create(98)............:
>> 2008/10/01 15:06:07 CH3_ND::CEnvironment::CreateAdapter(576):
>> 2008/10/01 15:06:07 CH3_ND::CEnvironment::Listen(161).......:
>> 2008/10/01 15:06:07 MPIDI_CH3_Init(121).....................:
>> 2008/10/01 15:06:07 MPID_Init(157)..........................: channel initialization failed
>> 2008/10/01 15:06:07 MPIR_Init_thread(225)...................: Initialization failed
>> 2008/10/01 15:06:07 Fatal error in MPI_Init: Other MPI error, error stack:
>> 2008/10/01 15:06:07 [0] fatal error
>>
>> <hardware>
>> - Motherboard is S5000PAL
>> - Infiniband HCA is InfiniHost III Lx SDR(fw-25204 Rev 1.0.800)
>> PSID: INT0010000001
>> This is the Intel OEM version. (difficult to update firmware)
>> - 4 systems which form a HPCS2008 cluster
>>
>> <software>
>> - We are running Windows Server 2008 HPC edition. (evaluation)
>> - We are using Windows Server 2008 Multilingual User Interface
>> Language Pack for Japanese.
>> 6001.18000.080118-1840_amd64fre_Server_LP_1-KRMSLPX1_DVD.img
>> - We are using HPCpack
>> - Infiniband software(trying)
>> WinOF_2-0_wlh_x64.zip(v2.0_rc2)
>> WinIB-NDv1.4.1(from mellanox.com)
>> - When we install WinOF_2-0_wlh_x64.zip(v2.0_rc2), we decline only the
>> ConnectX drivers. We have tried some other selections.
>>
>> thank you
>> go
>> --------------------
>>
>> Smith, Stan wrote:
>>> Go Yoshimura wrote:
>>>> Thank you for your help.
>>>> - We understand ND binaries and others are not always updated at
>>>> the same time.
>>>> - Ibscan.bat and Ibcleanup.bat help us very much.
>>>> With these files, we have succeeded uninstalling
>>>> WinOF_2-0_wlh_x64.zip(v2.0_rc2) and then installing WinIB-ND
>>>> v1.4.1(from mellanox.com).
>>>
>>> Why do you add the WinIB-ND on top of WinOF?
>>>
>>>> - When we run IBcleanup.bat, we
>>>> sometimes(2 cases for 5 systems) get messages which mean "Access
>>>> denied". In this case, we have to reboot and re-run IBcleanup.bat.
>>>
>>> I suspect you are running server 2008, as files in the driver store
>>> are owned by System which is now different from Administrator. The
>>> files in the driver store need to have their owner changed to
>>> Administrator, then 'Full-control' applied to said files, then one
>>> can actually delete them....sigh.
>>
>
>
>
----
Go Yoshimura <go-yoshimura at sstc.co.jp>
Scalable Systems Co., Ltd. <http://www.sstc.co.jp/>
Tokyo Kojimachi Office BUREX Kojimachi 8F, 3-5-2 Kojimachi, Chiyoda-ku, Tokyo 102-0083 Japan
Tel: 81-3-5875-4718 Fax: 81-3-3237-7612
Osaka Office HONMACHI-COLLABO Bldg. 4F, 4-4-2 Kita-kyuhoji-machi, Chuo-ku, Osaka 541-0057 Japan
Tel: 81-6-6224-4115