[ofw] files for network direct(ibndprov.dll)

Smith, Stan stan.smith at intel.com
Fri Oct 10 08:35:13 PDT 2008


Thank you very much for the concise problem descriptions.

OpenSM is working much more reliably in WinOF RC3, soon to be released.

Mellanox will need to explain why WinIB-ND works and the WinOF RC2 ND binaries do not.
Purportedly the ND binaries are the same.

Stan.

Go Yoshimura wrote:
> - We use WinIB-ND on top of WinOF because
>   - we can not use Network Direct with
>     WinOF_2-0_wlh_x64.zip(v2.0_rc2).
>   - when we use WinOF_2-0_wlh_x64.zip(v2.0_rc2), we get error like
>     <NdOpenIAdapter failed> with
>     diagnostic/performance/MPI-PingPong-lightweight-throughput and all
>     mpi-programs including MP-Linpack.
>   - "ndinstall -r" eliminate this error and we can get 180MB/s
>     throughput performance result from MPI PingPong.
>   - when we use WinIB-NDv1.4.1(from mellanox.com), we can get 663MB/s
>     throughput performance result from MPI PingPong. We managed to
>     succeed this 2 days ago.
> - If we can use Network Direct, we will use
>   WinOF_2-0_wlh_x64.zip(v2.0_rc2). If we can use
>      WinOF_2-0_wlh_x64.zip(v2.0_rc2) mainly and
>      WinIB-NDv1.4.1(from mellanox.com) for Network Direct only,
>   please tell us how to do it.
>   Of course it will be more happy if newer version of
>   WinOF_2-0_wlh_x64.zip enable using Network Direct.
> - We tried replacing ibndprov.dll from WinOF_2-0_wlh_x64.zip(v2.0_rc2)
>   to WinIB-NDv1.4.1(from mellanox.com). But we got similar error.
>   Before replacing
>     C:\Program Files\Microsoft HPC Pack 2008 SDK\NetworkDirect\Bin\amd64>ndmrrate 192.168.12.90
>     NdOpenIAdapter failed with c0000001
>   After replacing
>     C:\Program Files\Microsoft HPC Pack 2008 SDK\NetworkDirect\Bin\amd64>ndmrrate 192.168.12.90
>     NdOpenIAdapter failed with c000009a
> - We tried updating driver
>    d. Open "Server Manager"->"Diagnostics"->"Device Manager".
>    e. Right click on the HCA device and select "Update Driver
>       Software".
>   But we got a message "driver is latest"
> - We will try adding 'Full-control' when we get messages like "Access
>   denied"
> - opensmd.exe
>   - We get a problem with opensmd.exe.
>     After we run opensmd.exe, we get problems like
>     - vstat.exe will hang
>     - system response become slower
>       1 cpu is occupied(we can see it in task-manager->process)
>     - we get BSOD when we shutdown
>   - Our workaround is "opensmd.exe -V".
>     opensmd.exe option  /  result
>     "opensmd.exe"       /  bad
>     "opensmd.exe -v"    /  bad
>     "opensmd.exe -V"    /  good
>   - We also modify
>     HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\OpenSM\ImagePath
>     from  "opensmd.exe -e --service"
>     to    "opensmd.exe -V -e --service"
>
> <NdOpenIAdapter failed>
> 2008/10/01 15:06:07     CH3_ND::CAdapter::Init(43)..............: [ch3:nd] NdOpenIAdapter failed with 0xc0000001
> 2008/10/01 15:06:07     CH3_ND::CAdapter::Create(98)............:
> 2008/10/01 15:06:07     CH3_ND::CEnvironment::CreateAdapter(576):
> 2008/10/01 15:06:07     CH3_ND::CEnvironment::Listen(161).......:
> 2008/10/01 15:06:07     MPIDI_CH3_Init(121).....................:
> 2008/10/01 15:06:07     MPID_Init(157)..........................: channel initialization failed
> 2008/10/01 15:06:07     MPIR_Init_thread(225)...................: Initialization failed
> 2008/10/01 15:06:07     Fatal error in MPI_Init: Other MPI error, error stack:
> 2008/10/01 15:06:07     [0] fatal error
>
> <hardware>
> - Motherboard is S5000PAL
> - Infiniband HCA is InfiniHost III Lx SDR(fw-25204 Rev 1.0.800)
>   PSID:            INT0010000001
>   This is Intel OEM version. (difficult to update firmware)
> - 4 systems which form a HPCS2008 cluster
>
> <software>
> - We are running Windows Server 2008 HPC edition. (evaluation)
> - We are using Windows Server 2008 Multilingual User Interface
>   Language Pack for Japanese.
> 6001.18000.080118-1840_amd64fre_Server_LP_1-KRMSLPX1_DVD.img
> - We are using HPCpack
> - Infiniband software(trying)
>   WinOF_2-0_wlh_x64.zip(v2.0_rc2)
>   WinIB-NDv1.4.1(from mellanox.com)
> - When we install WinOF_2-0_wlh_x64.zip(v2.0_rc2), decline only
>   connectX drivers. We have tried some other selections.
>
> thank you
> go
> --------------------
>
> Smith, Stan wrote:
>> Go Yoshimura wrote:
>>> Thank you for your helps.
>>> - We understand ND binaries and others are not always updated at
>>> the same time.
>>> - Ibscan.bat and Ibcleanup.bat help us very much.
>>>   With these files, we have succeeded uninstalling
>>> WinOF_2-0_wlh_x64.zip(v2.0_rc2) and then installing WinIB-ND
>>>   v1.4.1(from mellanox.com).
>>
>> Why do you add the WinIB-ND on top of WinOF?
>>
>>> - When we run IBcleanup.bat, we
>>> sometimes(2 cases for 5 systems) get messages which means "Access
>>> denied". In this case, we have to reboot and re-run IBcleanup.bat.
>>
>> I suspect you are running server 2008, as files in the driver store
>> are owned by System which is now different from Administrator. The
>> files in the driver store need to have their owner changed to
>> Administrator, then 'Full-control' applied to said files, then one
>> can actually delete them....sigh.
>
> ----
> Go Yoshimura <go-yoshimura at sstc.co.jp>
> Scalable Systems Co., Ltd.  <http://www.sstc.co.jp/>
> Tokyo Kojimachi Office  BUREX Kojimachi 8F, 3-5-2 Kojimachi,
>               Chiyoda-ku, Tokyo 102-0083 Japan
>               Tel: 81-3-5875-4718  Fax: 81-3-3237-7612
> Osaka Office            HONMACHI-COLLABO Bldg. 4F, 4-4-2
>               Kita-kyuhoji-machi, Chuo-ku, Osaka 541-0057 Japan
>               Tel: 81-6-6224-4115
> _______________________________________________
> ofw mailing list
> ofw at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
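
For reference, the two values that ndmrrate printed in the quoted report (c0000001 before replacing ibndprov.dll, c000009a after) are NTSTATUS codes. The sketch below is not part of the original exchange. It splits such a value into its severity, facility, and code fields and names the two codes seen in this thread (names as defined in ntstatus.h). The function name describe_ntstatus and the lookup table are illustrative only, and the table covers nothing beyond these two values.

```python
# Minimal NTSTATUS decoder (illustrative sketch, not from the original mail).
# Layout: bits 31-30 severity, bits 27-16 facility, bits 15-0 code.
SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}

# Only the two codes quoted in this thread; names are from ntstatus.h.
KNOWN = {
    0xC0000001: "STATUS_UNSUCCESSFUL",
    0xC000009A: "STATUS_INSUFFICIENT_RESOURCES",
}


def describe_ntstatus(text):
    """Decode a hex NTSTATUS string such as 'c0000001' or '0xc000009a'."""
    value = int(text, 16) & 0xFFFFFFFF
    severity = SEVERITY[value >> 30]
    facility = (value >> 16) & 0x0FFF
    code = value & 0xFFFF
    name = KNOWN.get(value, "unknown to this sketch")
    return (f"0x{value:08X}: {name} "
            f"(severity={severity}, facility={facility}, code=0x{code:04X})")


if __name__ == "__main__":
    for raw in ("c0000001", "c000009a"):
        print(describe_ntstatus(raw))
```

The change from a generic failure to an insufficient-resources status after swapping the DLL suggests the replacement provider got further before failing, but that reading is an inference, not something the thread confirms.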

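The driver-store cleanup described in the quoted text (change the owner of each file to Administrator, apply 'Full-control', then delete) can be sketched with the stock takeown and icacls tools that ship with Server 2008. The helper below only builds the command lines and does not run them. The function name and the example path are hypothetical, and the group name "Administrators" is an assumption that may not hold on every localized install.

```python
# Sketch: build the two commands needed before a System-owned driver-store
# file can be deleted. Nothing is executed here; run the resulting lines
# from an elevated prompt. The path passed in is a hypothetical example.
def driver_store_unlock_commands(path):
    quoted = f'"{path}"'
    return [
        # Give ownership to the Administrators group (/A) instead of System.
        f"takeown /F {quoted} /A",
        # Grant that group full control (F) so the file can be removed.
        f"icacls {quoted} /grant Administrators:F",
    ]


if __name__ == "__main__":
    # Hypothetical file name under the driver store directory.
    example = r"C:\Windows\System32\DriverStore\FileRepository\example.inf"
    for line in driver_store_unlock_commands(example):
        print(line)
```

Once both commands succeed for a file, a normal delete should go through without the "Access denied" message, which may avoid the reboot and re-run of IBcleanup.bat mentioned in the thread.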