[ofw] files for network direct(ibndprov.dll)

Go Yoshimura go-yoshimura at sstc.co.jp
Fri Oct 10 06:37:24 PDT 2008


- We use WinIB-ND on top of WinOF because
  - we cannot use Network Direct with WinOF_2-0_wlh_x64.zip(v2.0_rc2).
  - when we use WinOF_2-0_wlh_x64.zip(v2.0_rc2), we get an error like <NdOpenIAdapter failed> with diagnostic/performance/MPI-PingPong-lightweight-throughput and with all MPI programs, including MP-Linpack.
  - "ndinstall -r" eliminates this error, and we then get 180MB/s throughput from MPI PingPong.
  - when we use WinIB-NDv1.4.1(from mellanox.com), we get 663MB/s throughput from MPI PingPong.
    We got this working 2 days ago.
- If Network Direct worked with it, we would use WinOF_2-0_wlh_x64.zip(v2.0_rc2).
  If it is possible to use
     WinOF_2-0_wlh_x64.zip(v2.0_rc2) mainly and
     WinIB-NDv1.4.1(from mellanox.com) for Network Direct only,
  please tell us how to do it.
  Of course, we would be even happier if a newer version of WinOF_2-0_wlh_x64.zip enabled Network Direct.
- We tried replacing the ibndprov.dll from WinOF_2-0_wlh_x64.zip(v2.0_rc2) with the one from WinIB-NDv1.4.1(from mellanox.com),
  but we got a similar error.
  Before replacing:
    C:\Program Files\Microsoft HPC Pack 2008 SDK\NetworkDirect\Bin\amd64>ndmrrate 192.168.12.90
    NdOpenIAdapter failed with c0000001
  After replacing:
    C:\Program Files\Microsoft HPC Pack 2008 SDK\NetworkDirect\Bin\amd64>ndmrrate 192.168.12.90
    NdOpenIAdapter failed with c000009a
- We tried updating the driver:
   d. Open "Server Manager"->"Diagnostics"->"Device Manager".
   e. Right click on the HCA device and select "Update Driver Software".
  But we got a message meaning "the driver is already the latest".
- We will try adding 'Full-control' when we get messages like "Access denied".
- opensmd.exe
  - We have a problem with opensmd.exe.
    After we run opensmd.exe, we get problems like
    - vstat.exe hangs
    - system response becomes slower
      1 CPU is fully occupied (we can see it in Task Manager -> Processes)
    - we get a BSOD when we shut down
  - Our workaround is "opensmd.exe -V".
    opensmd.exe option  /  result
    "opensmd.exe"       /  bad
    "opensmd.exe -v"    /  bad
    "opensmd.exe -V"    /  good
  - We also modified HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\OpenSM\ImagePath
    from  "opensmd.exe -e --service"
    to    "opensmd.exe -V -e --service"
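
The ImagePath change above can also be scripted. The following is a minimal Python sketch, not something shipped with WinOF: the helper names add_verbose_flag and update_service_image_path are hypothetical, and it assumes the ImagePath value contains the literal name "opensmd.exe". The registry part uses the standard winreg module, so it runs only on Windows and needs administrator rights.

```python
# Registry key from the message above (relative to HKEY_LOCAL_MACHINE).
SERVICE_KEY = r"SYSTEM\CurrentControlSet\Services\OpenSM"


def add_verbose_flag(image_path, flag="-V"):
    """Insert `flag` right after opensmd.exe; leave the string alone if present.

    The comparison of the flag is case-sensitive, so "-v" and "-V" differ.
    """
    exe = "opensmd.exe"
    pos = image_path.lower().find(exe)
    if pos < 0:
        raise ValueError("ImagePath does not reference opensmd.exe")
    end = pos + len(exe)
    if image_path[end:end + 1] == '"':  # executable path was quoted
        end += 1
    head, tail = image_path[:end], image_path[end:]
    if flag in tail.split():
        return image_path
    return f"{head} {flag}{tail}"


def update_service_image_path():
    """Windows only: rewrite the OpenSM service ImagePath in place."""
    import winreg  # standard library, available only on Windows

    access = winreg.KEY_READ | winreg.KEY_SET_VALUE
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, SERVICE_KEY, 0, access) as key:
        value, kind = winreg.QueryValueEx(key, "ImagePath")
        winreg.SetValueEx(key, "ImagePath", 0, kind, add_verbose_flag(value))


if __name__ == "__main__":
    # Pure string rewrite only; the registry is not touched here.
    print(add_verbose_flag("opensmd.exe -e --service"))
```

Calling update_service_image_path() twice is harmless, because the flag is added only when it is missing.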
 
<NdOpenIAdapter failed>
2008/10/01 15:06:07	CH3_ND::CAdapter::Init(43)..............: [ch3:nd] NdOpenIAdapter failed with 0xc0000001
2008/10/01 15:06:07	CH3_ND::CAdapter::Create(98)............: 
2008/10/01 15:06:07	CH3_ND::CEnvironment::CreateAdapter(576): 
2008/10/01 15:06:07	CH3_ND::CEnvironment::Listen(161).......: 
2008/10/01 15:06:07	MPIDI_CH3_Init(121).....................: 
2008/10/01 15:06:07	MPID_Init(157)..........................: channel initialization failed
2008/10/01 15:06:07	MPIR_Init_thread(225)...................: Initialization failed
2008/10/01 15:06:07	Fatal error in MPI_Init: Other MPI error, error stack:
2008/10/01 15:06:07	[0] fatal error
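
The status values reported above (c0000001 and c000009a from ndmrrate, 0xc0000001 in the MPI error stack) can be read as NTSTATUS-style codes. The following is a minimal Python sketch, assuming the values follow the NTSTATUS bit layout (severity in the top two bits, 12-bit facility, 16-bit code). The two names in the table are the standard NTSTATUS names for these values; the helper functions decode_ntstatus and status_from_log are hypothetical.

```python
import re

SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}

# Standard NTSTATUS names for the two values seen in this report.
KNOWN = {
    0xC0000001: "STATUS_UNSUCCESSFUL",
    0xC000009A: "STATUS_INSUFFICIENT_RESOURCES",
}


def decode_ntstatus(value):
    """Split a 32-bit NTSTATUS-style value into its fields."""
    return {
        "severity": SEVERITY[(value >> 30) & 0x3],
        "customer": bool((value >> 29) & 0x1),
        "facility": (value >> 16) & 0xFFF,
        "code": value & 0xFFFF,
        "name": KNOWN.get(value, "unknown"),
    }


def status_from_log(line):
    """Extract the 8-digit hex status after 'failed with', or None."""
    match = re.search(r"failed with (?:0x)?([0-9a-fA-F]{8})\b", line)
    return int(match.group(1), 16) if match else None


if __name__ == "__main__":
    for text in ("NdOpenIAdapter failed with c0000001",
                 "NdOpenIAdapter failed with c000009a"):
        print(text, "->", decode_ntstatus(status_from_log(text)))
```

Read this way, the error before replacing ibndprov.dll is a generic failure, while the one after replacing it reports insufficient resources; the sketch does not say why either occurs.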

<hardware>
- Motherboard is S5000PAL
- Infiniband HCA is InfiniHost III Lx SDR(fw-25204 Rev 1.0.800)
  PSID:            INT0010000001
  This is the Intel OEM version (difficult to update the firmware).
- 4 systems which form an HPCS2008 cluster

<software>
- We are running Windows Server 2008 HPC edition. (evaluation)
- We are using Windows Server 2008 Multilingual User Interface Language Pack for Japanese.
  6001.18000.080118-1840_amd64fre_Server_LP_1-KRMSLPX1_DVD.img
- We are using HPCpack
- Infiniband software(trying)
  WinOF_2-0_wlh_x64.zip(v2.0_rc2)   
  WinIB-NDv1.4.1(from mellanox.com)
- When we install WinOF_2-0_wlh_x64.zip(v2.0_rc2), we deselect only the ConnectX drivers.
  We have also tried some other selections.

thank you
go
--------------------

Smith, Stan wrote:
>Go Yoshimura wrote:
>> Thank you for your helps.
>> - We understand ND binaries and others are not always updated at the
>> same time.
>> - Ibscan.bat and Ibcleanup.bat help us very much.
>>   With these files, we have succeeded uninstalling
>> WinOF_2-0_wlh_x64.zip(v2.0_rc2) and then installing WinIB-ND
>>   v1.4.1(from mellanox.com).
>
>Why do you add the WinIB-ND on top of WinOF?
>
>> - When we run IBcleanup.bat, we
>> sometimes(2 cases for 5 systems) get messages which means "Access
>> denied". In this case, we have to reboot and re-run IBcleanup.bat.
>
>I suspect you are running server 2008, as files in the driver store are owned by System which is now different from Administrator. The files in the driver store need to have their owner changed to
>Administrator, then 'Full-control' applied to said files, then one can actually delete them....sigh.
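
The ownership change described above could be scripted roughly as follows. This is a sketch under assumptions, not the procedure used by IBcleanup.bat: it relies on the stock takeown and icacls tools, refers to the Administrators group by its well-known SID (S-1-5-32-544) so that it does not depend on the UI language, and the file path in the example is hypothetical. By default it only returns the commands; nothing is executed unless dry_run=False is passed.

```python
import subprocess

# Well-known SID of BUILTIN\Administrators; icacls accepts SIDs prefixed with '*'.
ADMINISTRATORS_SID = "*S-1-5-32-544"


def ownership_commands(path):
    """Build the two commands: take ownership, then grant full control."""
    return [
        ["takeown", "/F", path, "/A"],  # /A: owner becomes Administrators
        ["icacls", path, "/grant", f"{ADMINISTRATORS_SID}:F"],  # F = full control
    ]


def take_full_control(path, dry_run=True):
    """Return the commands; run them only when dry_run is False (Windows, elevated)."""
    commands = ownership_commands(path)
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    for command in take_full_control(r"C:\example\driverstore\ibndprov.dll"):
        print(" ".join(command))
```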

----
Go Yoshimura <go-yoshimura at sstc.co.jp>
Scalable Systems Co., Ltd.  <http://www.sstc.co.jp/>
Tokyo Kojimachi Office  BUREX Kojimachi 8F, 3-5-2 Kojimachi, Chiyoda-ku, Tokyo 102-0083 Japan 
              Tel: 81-3-5875-4718 Fax: 81-3-3237-7612              
Osaka Office            HONMACHI-COLLABO Bldg. 4F, 4-4-2 Kita-kyuhoji-machi, Chuo-ku, Osaka 541-0057 Japan
              Tel: 81-6-6224-4115


