[ofw] files for network direct(ibndprov.dll)

Go Yoshimura go-yoshimura at sstc.co.jp
Fri Oct 10 09:07:52 PDT 2008


- "WinOF RC3, soon to be released." is good news!
- We believe the ND binaries are different because their sizes differ.
  2008/09/22  18:12            43,520 ibndprov.dll
  2008/08/27  10:21            44,032 ibndprovRC2.dll
- We suspect that a version difference is the reason why WinIB-ND works and the WinOF RC2 ND binaries do not.
  WinIB-NDv1.4.1 seems to add NetworkDirect to OFED 1.3.1 in a special manner (just for HPCS2008).
  WinIB-NDv1.4.1 includes only inf files and does not include all packages (udapl, srp, diags).
- According to https://docs.mellanox.com/dm/WinIB/docs/WinIB-ND_1_4_1_ReleaseNotes.txt 
       Mellanox WinIB-ND will be integrated into MLNX_WinOF v2.0 which is scheduled
       for release in October 2008.
  We expect MLNX_WinOF v2.0 will be compatible with WinOF_2-0_wlh_x64.zip(v2.0_rc2).
- We are looking forward to MLNX_WinOF v2.0 and WinOF_2-0_wlh_x64.zip(v2.0).
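
A size difference already shows that the two files are not the same build. A content hash would also catch the opposite case, where two different files happen to have the same size. Below is a minimal sketch in Python, assuming the two file names from the listing above. The helper names (describe, same_binary) are made up for illustration and are not part of WinOF or WinIB-ND.

```python
import hashlib
from pathlib import Path


def describe(path):
    """Return (size in bytes, SHA-256 hex digest) for one file."""
    data = Path(path).read_bytes()
    return len(data), hashlib.sha256(data).hexdigest()


def same_binary(path_a, path_b):
    """True only if both files have the same size and the same contents."""
    return describe(path_a) == describe(path_b)


def report(paths):
    """Build one 'size digest name' line per file, similar to a dir listing."""
    lines = []
    for p in paths:
        size, digest = describe(p)
        lines.append(f"{size:>10,} {digest} {p}")
    return lines
```

For example, print("\n".join(report(["ibndprov.dll", "ibndprovRC2.dll"]))) lists both files, and same_binary("ibndprov.dll", "ibndprovRC2.dll") says whether they match. The paths are assumptions and need to point at wherever the two DLLs were saved.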

thank you
go
----------------

Smith, Stan wrote:
>
>Thank you very much for the concise problem descriptions.
>
>OpenSM is working much more reliably in WinOF RC3, soon to be released.
>
>Mellanox will need to explain why WinIB-ND works and WinOF RC2 ND binaries do not?
>Purportedly the ND binaries are the same?
>
>Stan.
>
>Go Yoshimura wrote:
>> - We use WinIB-ND on top of WinOF because
>>   - we cannot use Network Direct with WinOF_2-0_wlh_x64.zip(v2.0_rc2).
>>   - when we use WinOF_2-0_wlh_x64.zip(v2.0_rc2), we get an error like
>>     <NdOpenIAdapter failed> with
>>     diagnostic/performance/MPI-PingPong-lightweight-throughput and all
>>     MPI programs including MP-Linpack.
>>   - "ndinstall -r" eliminates this error and we get 180MB/s
>>     throughput from MPI PingPong.
>>   - when we use WinIB-NDv1.4.1(from mellanox.com), we get 663MB/s
>>     throughput from MPI PingPong. We managed to get this working
>>     2 days ago.
>> - If we can use Network Direct, we will use
>>   WinOF_2-0_wlh_x64.zip(v2.0_rc2).
>>   If we can use
>>      WinOF_2-0_wlh_x64.zip(v2.0_rc2) mainly and
>>      WinIB-NDv1.4.1(from mellanox.com) for Network Direct only,
>>   please tell us how to do it.
>>   Of course we would be happier if a newer version of
>>   WinOF_2-0_wlh_x64.zip enabled Network Direct.
>> - We tried replacing ibndprov.dll from WinOF_2-0_wlh_x64.zip(v2.0_rc2)
>>   with the one from WinIB-NDv1.4.1(from mellanox.com), but we got a
>>   similar error.
>>   Before replacing
>>     C:\Program Files\Microsoft HPC Pack 2008 SDK\NetworkDirect\Bin\amd64>ndmrrate 192.168.12.90
>>     NdOpenIAdapter failed with c0000001
>>   After replacing
>>     C:\Program Files\Microsoft HPC Pack 2008 SDK\NetworkDirect\Bin\amd64>ndmrrate 192.168.12.90
>>     NdOpenIAdapter failed with c000009a
>> - We tried updating the driver
>>    d. Open "Server Manager"->"Diagnostics"->"Device Manager".
>>    e. Right click on the HCA device and select "Update Driver Software".
>>   But we got a message "driver is latest"
>> - We will try adding 'Full-control' when we get messages like "Access
>>   denied"
>> - opensmd.exe
>>   - We get a problem with opensmd.exe.
>>     After we run opensmd.exe, we get problems like
>>     - vstat.exe hangs
>>     - system response becomes slower
>>       1 CPU is fully occupied (we can see it in task-manager->process)
>>     - we get a BSOD when we shut down
>>   - Our workaround is "opensmd.exe -V".
>>     opensmd.exe option  /  result
>>     "opensmd.exe"       /  bad
>>     "opensmd.exe -v"    /  bad
>>     "opensmd.exe -V"    /  good
>>   - We also modify
>>     HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\OpenSM\ImagePath
>>     from  "opensmd.exe -e --service"
>>     to    "opensmd.exe -V -e --service"
>>
>> <NdOpenIAdapter failed>
>> 2008/10/01 15:06:07     CH3_ND::CAdapter::Init(43)..............: [ch3:nd] NdOpenIAdapter failed with 0xc0000001
>> 2008/10/01 15:06:07     CH3_ND::CAdapter::Create(98)............:
>> 2008/10/01 15:06:07     CH3_ND::CEnvironment::CreateAdapter(576):
>> 2008/10/01 15:06:07     CH3_ND::CEnvironment::Listen(161).......:
>> 2008/10/01 15:06:07     MPIDI_CH3_Init(121).....................:
>> 2008/10/01 15:06:07     MPID_Init(157)..........................: channel initialization failed
>> 2008/10/01 15:06:07     MPIR_Init_thread(225)...................: Initialization failed
>> 2008/10/01 15:06:07     Fatal error in MPI_Init: Other MPI error, error stack:
>> 2008/10/01 15:06:07     [0] fatal error
>>
>> <hardware>
>> - Motherboard is S5000PAL
>> - Infiniband HCA is InfiniHost III Lx SDR(fw-25204 Rev 1.0.800)
>>   PSID:            INT0010000001
>>   This is the Intel OEM version. (difficult to update firmware)
>> - 4 systems which form a HPCS2008 cluster
>>
>> <software>
>> - We are running Windows Server 2008 HPC edition. (evaluation)
>> - We are using Windows Server 2008 Multilingual User Interface
>>   Language Pack for Japanese.
>>   6001.18000.080118-1840_amd64fre_Server_LP_1-KRMSLPX1_DVD.img
>> - We are using HPCpack
>> - Infiniband software(trying)
>>   WinOF_2-0_wlh_x64.zip(v2.0_rc2)
>>   WinIB-NDv1.4.1(from mellanox.com)
>> - When we install WinOF_2-0_wlh_x64.zip(v2.0_rc2), we decline only
>>   the ConnectX drivers. We have tried some other selections.
>>
>> thank you
>> go
>> --------------------
>>
>> Smith, Stan wrote:
>>> Go Yoshimura wrote:
>>>> Thank you for your help.
>>>> - We understand ND binaries and others are not always updated at
>>>> the same time.
>>>> - Ibscan.bat and Ibcleanup.bat help us very much.
>>>>   With these files, we have succeeded in uninstalling
>>>>   WinOF_2-0_wlh_x64.zip(v2.0_rc2) and then installing WinIB-ND
>>>>   v1.4.1(from mellanox.com).
>>>
>>> Why do you add the WinIB-ND on top of WinOF?
>>>
>>>> - When we run IBcleanup.bat, we sometimes (2 cases out of 5
>>>>   systems) get messages that mean "Access denied". In this case,
>>>>   we have to reboot and re-run IBcleanup.bat.
>>>
>>> I suspect you are running server 2008, as files in the driver store
>>> are owned by System which is now different from Administrator. The
>>> files in the driver store need to have their owner changed to
>>> Administrator, then 'Full-control' applied to said files, then one
>>> can actually delete them....sigh.
>>
>> ----
>> Go Yoshimura <go-yoshimura at sstc.co.jp>
>> Scalable Systems Co., Ltd.  <http://www.sstc.co.jp/>
>> Tokyo Kojimachi Office  BUREX Kojimachi 8F, 3-5-2 Kojimachi, Chiyoda-ku, Tokyo 102-0083 Japan
>>               Tel: 81-3-5875-4718 Fax: 81-3-3237-7612
>> Osaka Office            HONMACHI-COLLABO Bldg. 4F, 4-4-2 Kita-kyuhoji-machi, Chuo-ku, Osaka 541-0057 Japan
>>               Tel: 81-6-6224-4115
>> _______________________________________________
>> ofw mailing list
>> ofw at lists.openfabrics.org
>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
>
>
>
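
The two NdOpenIAdapter failure codes quoted above (c0000001 before replacing ibndprov.dll, c000009a after) look like NTSTATUS values. Splitting them into their fields at least shows that the failure changed from a generic one to a resource-related one. Below is a small sketch in Python. The function name decode_ntstatus is made up for illustration, and the name table is deliberately limited to the two codes seen in this thread. The names tell us the kind of failure reported, not its cause.

```python
# Symbolic names as given in the Windows NTSTATUS definitions.
# Only the two values reported in this thread are listed.
KNOWN = {
    0xC0000001: "STATUS_UNSUCCESSFUL",
    0xC000009A: "STATUS_INSUFFICIENT_RESOURCES",
}

# The top two bits of an NTSTATUS value encode the severity.
SEVERITY = ("success", "informational", "warning", "error")


def decode_ntstatus(text):
    """Split a hex NTSTATUS string such as 'c0000001' into its fields."""
    value = int(text, 16) & 0xFFFFFFFF  # accepts 'c0000001' and '0xc0000001'
    return {
        "value": value,
        "severity": SEVERITY[(value >> 30) & 0x3],
        "facility": (value >> 16) & 0x0FFF,
        "code": value & 0xFFFF,
        "name": KNOWN.get(value, "unknown"),
    }


for status in ("c0000001", "c000009a"):
    info = decode_ntstatus(status)
    print(f"0x{info['value']:08X} {info['severity']} "
          f"facility={info['facility']} code=0x{info['code']:04X} {info['name']}")
```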

----
Go Yoshimura <go-yoshimura at sstc.co.jp>
Scalable Systems Co., Ltd.  <http://www.sstc.co.jp/>
Tokyo Kojimachi Office  BUREX Kojimachi 8F, 3-5-2 Kojimachi, Chiyoda-ku, Tokyo 102-0083 Japan 
              Tel: 81-3-5875-4718 Fax: 81-3-3237-7612              
Osaka Office            HONMACHI-COLLABO Bldg. 4F, 4-4-2 Kita-kyuhoji-machi, Chuo-ku, Osaka 541-0057 Japan
              Tel: 81-6-6224-4115


