[ofiwg] Using libfabric on Titan

Philip Davis philip.e.davis at rutgers.edu
Thu May 3 12:46:02 PDT 2018


Thank you both, I appreciate the assistance. I will raise any issues I find.

> On May 3, 2018, at 3:38 PM, James Swaro <jswaro at cray.com> wrote:
> 
> I would add that you can also attach my name to the issue. 
> 
> @jswaro
> 
> Feel free to reach out. I'd have answered earlier, but Howard already caught what I was going to say. 
> 
> Regards, 
> -- Jim
> 
> On 5/3/18, 2:36 PM, "Pritchard Jr., Howard" <howardp at lanl.gov> wrote:
> 
>    Hi Philip,
> 
>    Yes.  If you hit bugs, please open issues on
>    https://github.com/ofiwg/libfabric
>    and mention @hppritcha in the issue.
> 
>    Don’t expect very good performance on Theta for small transfers.
> 
>    On Theta, you’ll want to use the --enable-ugni-static configure
>    option to help a little bit with those slow processors.
> 
>    You may also want to peruse
> 
>    https://github.com/ofi-cray/libfabric-cray/wiki/GNI-provider-building-it
> 
> 
>    Good luck!
> 
>    Howard
> 
> 
>    -- 
>    Howard Pritchard
> 
>    B Schedule
>    HPC-ENV
>    Office 9, 2nd floor Research Park
>    TA-03, Building 4200, Room 03U
> 
>    Los Alamos National Laboratory
> 
> 
> 
> 
> 
>    On 5/3/18, 1:24 PM, "Philip Davis" <philip.e.davis at rutgers.edu> wrote:
> 
>> Thanks for the quick reply. That’s disappointing, but understandable. Is
>> the GNI provider supported on Cori and Theta?
>> 
>>> On May 3, 2018, at 2:49 PM, Pritchard Jr., Howard <howardp at lanl.gov>
>>> wrote:
>>> 
>>> The GNI provider isn’t supported on Titan; the OS is too old.
>>> 
>>> 
>>> On 5/3/18, 11:21 AM, "Hefty, Sean" <sean.hefty at intel.com> wrote:
>>> 
>>>> Copying ofiwg, Howard, and Jim.
>>>> 
>>>> 
>>>>> Apologies if this has been asked before.
>>>>> 
>>>>> I am trying to use the GNI provider for libfabric on Titan, the XK7
>>>>> machine at ORNL. Is this possible/supported?
>>>>> 
>>>>> More specifically, I need to be able to communicate between aprun
>>>>> instances (e.g. aprun ./writer and aprun ./reader are able to
>>>>> communicate with each other) so the sharing of credentials between the
>>>>> programs is necessary. I have been referring to the cray-tests repo as
>>>>> an example of how to use rdma credentials for Cray in Libfabric. I
>>>>> modified this code somewhat by replacing all DRC code with just
>>>>> setting the cookie manually, and had hoped that would work.
>>>>> Unfortunately, the library relies on GNI_GetPtag to set the ptag, and
>>>>> this API call does not seem to work on Titan. I can get the
>>>>> auth_keys_test to pass by adding a ptag field to the
>>>>> fi_gni_raw_auth_key struct along with appropriate language to handle
>>>>> that, but I was wondering if there was some inherently supported way
>>>>> to do that same thing without modifying libfabric.
>>>>> 
>>>>> Other than credentials exchange/loading, I am wondering what I can
>>>>> expect in trying to use the GNI provider on a non-Aries system. Has
>>>>> anyone done this successfully?
>>>>> 
>>>>> Thanks,
>>>>> Philip
>>> 
>> 
> 
> 
> 


