[libfabric-users] FI_EP_MSG on cray
Howard Pritchard
hppritcha at gmail.com
Mon Feb 13 15:08:16 PST 2017
Hi John,
Could you try the run_gnitest script with UGNI_DEBUG=10 set? I'd
like to understand why that's failing for you.
I cannot get fi_pingpong to work with FI_EP_MSG on the GNI provider, although it
should work. I have filed an issue on the GNI provider's downstream repo.
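In the quoted run below, rank 0 stops in fi_getinfo() with ret=-61 (No data available), and srun then reports exit code 61 for task 0. libfabric defines FI_ENODATA as the system ENODATA (61 on Linux), and fi_getinfo() returns it when no provider matches the requested hints. That suggests the gni provider did not offer a matching FI_EP_MSG endpoint, rather than a failure on the wire. The helper below is a minimal sketch for decoding such fabtests error lines when triaging logs. The function names are hypothetical and are not part of fabtests or libfabric. The sketch assumes a Linux errno table and that FI_ENODATA == ENODATA, as in libfabric's fi_errno.h.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extract the integer after "ret=" in a fabtests
 * error line such as
 *   "fi_getinfo(): common/shared.c:454, ret=-61 (No data available)".
 * Returns 0 if the line has no "ret=" field. */
static int parse_fabtests_ret(const char *line)
{
    const char *p = strstr(line, "ret=");
    if (p == NULL)
        return 0;
    return (int)strtol(p + 4, NULL, 10);
}

/* Assumption: libfabric returns negative error codes, and FI_ENODATA is
 * defined as the system ENODATA. fi_getinfo() returns -FI_ENODATA when
 * no provider matches the hints (e.g. provider "gni" with FI_EP_MSG). */
static int is_no_matching_provider(int ret)
{
    return ret == -ENODATA;
}

/* Print a human-readable explanation of a fabtests ret= value. */
static void explain_ret(int ret)
{
    if (ret == 0) {
        printf("ret=0: success or no ret= field\n");
        return;
    }
    printf("ret=%d: %s%s\n", ret, strerror(ret < 0 ? -ret : ret),
           is_no_matching_provider(ret)
               ? " (no provider matched the fi_getinfo hints)"
               : "");
}
```

For example, passing the rank 0 line to parse_fabtests_ret() yields -61. Calling explain_ret() on that value prints the strerror() text, followed by the no-matching-provider note when ENODATA is 61, as it is on Linux.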
Howard
2017-02-13 13:21 GMT-07:00 Biddiscombe, John A. <biddisco at cscs.ch>:
> Howard, here’s some output …
>
> The machine is the Cray Piz Daint at CSCS.
>
> Allocation as follows:
>
> salloc -N 2 -C mc --time=02:00:00 --exclusive
>
> daint102:/scratch/snx3000/biddisco/build$ export UGNI_DEBUG=10
> daint102:/scratch/snx3000/biddisco/build$ ./frun.sh ~/apps/fabtests/bin/fi_msg
>
> running /users/biddisco/apps/fabtests/bin/fi_msg on nid00[722,724]
> nid00722 is 148.187.34.215
> Generated command is srun -n 2 --ntasks-per-node=1 -l --multi-prog ./scalable.conf
>
> 0 /users/biddisco/apps/fabtests/bin/fi_msg -p gni
> 1 /users/biddisco/apps/fabtests/bin/fi_msg -p gni 148.187.34.215
>
> 0: [ 44] GNII_DebugInit: GNII_debug_level: 10 GNII_subsys_debug: 0 GNII_debug_mask: 0x0 GNII_debug_inst_id: 44
> 0: [ 44] JOB: GNII_GetKernelVersion: kgni version major = 0x0 minor 0x45 code 0xb9 built with major = 0x0 minor = 0x45 code 0x4e24
> 0: [ 44] JOB: GNI_GetJobResInfo: job resource: FMA (6) used: 0 limit: 123
> 0: [ 44] JOB: GNII_GetKernelVersion: kgni version major = 0x0 minor 0x45 code 0xb9 built with major = 0x0 minor = 0x45 code 0x4e24
> 0: [ 44] JOB: GNI_GetJobResInfo: job resource: CQ (5) used: 0 limit: 509
> 0: fi_getinfo(): common/shared.c:454, ret=-61 (No data available)
>
> 1: [ 36] GNII_DebugInit: GNII_debug_level: 10 GNII_subsys_debug: 0 GNII_debug_mask: 0x0 GNII_debug_inst_id: 36
> 1: [ 36] JOB: GNII_GetKernelVersion: kgni version major = 0x0 minor 0x45 code 0xb9 built with major = 0x0 minor = 0x45 code 0x4e24
> 1: [ 36] JOB: GNI_GetJobResInfo: job resource: FMA (6) used: 0 limit: 123
> 1: [ 36] JOB: GNII_GetKernelVersion: kgni version major = 0x0 minor 0x45 code 0xb9 built with major = 0x0 minor = 0x45 code 0x4e24
> 1: [ 36] JOB: GNI_GetJobResInfo: job resource: CQ (5) used: 0 limit: 509
> 1: [ 36] JOB: GNII_GetKernelVersion: kgni version major = 0x0 minor 0x45 code 0xb9 built with major = 0x0 minor = 0x45 code 0x4e24
> 1: [ 36] FMA: GNI_CdmAttach: FMA window size: 32768
> 1: [ 36] FMA: GNI_CdmAttach: NOPRIV_ERR masked
> 1: [ 36] JOB: GNI_CdmAttach: ptag = 36 inst_id = 13864961 fma_window = 0x0000000000000000 fma_ctrl = 0x0000000000000000
> 1: [ 36] CQ: GNI_CqCreate: entry_count: 1361 reqs: 1361 adjusted entries: 1395 alloc_count: 1396
> 1: [ 36] CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE)
> 1: [ 36] CQ: cq_create: #1 cq created, kern_cq_descr = 1, mode = 2, rd_index_ptr = 0x2aaaaaad7ba0, queue = 0x2aaaaaad5000, intr_mask = (nil)
> 1: [ 36] CQ: GNI_CqCreate: entry_count: 1361 reqs: 1361 adjusted entries: 1395 alloc_count: 1396
> 1: [ 36] CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE)
> 1: [ 36] CQ: cq_create: #1 cq created, kern_cq_descr = 394, mode = 20, rd_index_ptr = 0x2aaaaaadc000, queue = 0x2aaaaaad8000, intr_mask = (nil)
> 1: [ 36] FLBTE: GNII_FlbteInit: FLBTE: tx_counter 0x2aaaaaace008, chan 2, max_len -1, total 511
> 1: [ 36] CQ: GNI_CqCreate: entry_count: 2048 reqs: 2048 adjusted entries: 2559 alloc_count: 2560
> 1: [ 36] CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE)
> 1: [ 36] CQ: cq_create: #1 cq created, kern_cq_descr = 395, mode = 4, rd_index_ptr = 0x2aaaaaae6000, queue = 0x2aaaaaade000, intr_mask = (nil)
> 1: [ 36] CQ: GNI_CqCreate: entry_count: 2048 reqs: 2048 adjusted entries: 2559 alloc_count: 2560
> 1: [ 36] CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE)
> 1: [ 36] CQ: cq_create: #1 cq created, kern_cq_descr = 396, mode = 5, rd_index_ptr = 0x2aaaaaaef000, queue = 0x2aaaaaae7000, intr_mask = 0x2aaaaaacf000
> 1: [ 36] CQ: GNI_CqCreate: entry_count: 16384 reqs: 16384 adjusted entries: 16895 alloc_count: 16896
> 1: [ 36] CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE) returned error - Invalid argument
> 1: [ 36] CQ: GNI_CqCreate: GNI_IOC_CQ_CREATE with PHYS_MEM failed trying without PHYS_MEM
> 1: [ 36] MR: GNI_MemRegister: Mem reg of 135168 length at addr 0x2aaaaaaf0000
> 1: [ 36] CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE)
> 1: [ 36] CQ: cq_create: #1 cq created, kern_cq_descr = 397, mode = 0, rd_index_ptr = 0x2aaaaab11000, queue = 0x2aaaaaaf0000, intr_mask = (nil)
> 1: [ 36] CQ: GNI_CqCreate: entry_count: 16384 reqs: 16384 adjusted entries: 16895 alloc_count: 16896
> 1: [ 36] CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE) returned error - Invalid argument
> 1: [ 36] CQ: GNI_CqCreate: GNI_IOC_CQ_CREATE with PHYS_MEM failed trying without PHYS_MEM
> 1: [ 36] MR: GNI_MemRegister: Mem reg of 135168 length at addr 0x2aaaaab23000
> 1: [ 36] CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE)
> 1: [ 36] CQ: cq_create: #1 cq created, kern_cq_descr = 398, mode = 1, rd_index_ptr = 0x2aaaaab12000, queue = 0x2aaaaab23000, intr_mask = 0x2aaaaaacf004
> 1: [ 36] MR: GNI_MemRegister: Mem reg of 136314880 length at addr 0x2aaaae400000
>
> srun: error: nid00722: task 0: Exited with exit code 61
> srun: Terminating job step 789872.11
> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
> srun: error: nid00724: task 1: Killed
> daint102:/scratch/snx3000/biddisco/build$
>
> _______________________________________________
> Libfabric-users mailing list
> Libfabric-users at lists.openfabrics.org
> http://lists.openfabrics.org/mailman/listinfo/libfabric-users
>
>