[libfabric-users] FI_EP_MSG on cray

Biddiscombe, John A. biddisco at cscs.ch
Mon Feb 13 12:21:29 PST 2017


Howard, here’s some output …

The machine is the Cray Piz Daint at CSCS.

The allocation was as follows:

salloc -N 2 -C mc --time=02:00:00 --exclusive

daint102:/scratch/snx3000/biddisco/build$ export UGNI_DEBUG=10
daint102:/scratch/snx3000/biddisco/build$ ./frun.sh ~/apps/fabtests/bin/fi_msg
running /users/biddisco/apps/fabtests/bin/fi_msg   on nid00[722,724]
nid00722 is 148.187.34.215
Generated command is  srun -n 2 --ntasks-per-node=1 -l --multi-prog ./scalable.conf
0 /users/biddisco/apps/fabtests/bin/fi_msg -p gni
1 /users/biddisco/apps/fabtests/bin/fi_msg -p gni   148.187.34.215

0: [    44] GNII_DebugInit: GNII_debug_level: 10 GNII_subsys_debug: 0 GNII_debug_mask: 0x0 GNII_debug_inst_id: 44
0: [    44]   JOB: GNII_GetKernelVersion: kgni version major = 0x0 minor 0x45 code 0xb9 built with major = 0x0 minor = 0x45 code 0x4e24
0: [    44]   JOB: GNI_GetJobResInfo: job resource: FMA (6) used: 0 limit: 123
0: [    44]   JOB: GNII_GetKernelVersion: kgni version major = 0x0 minor 0x45 code 0xb9 built with major = 0x0 minor = 0x45 code 0x4e24
0: [    44]   JOB: GNI_GetJobResInfo: job resource: CQ (5) used: 0 limit: 509
0: fi_getinfo(): common/shared.c:454, ret=-61 (No data available)
1: [    36] GNII_DebugInit: GNII_debug_level: 10 GNII_subsys_debug: 0 GNII_debug_mask: 0x0 GNII_debug_inst_id: 36
1: [    36]   JOB: GNII_GetKernelVersion: kgni version major = 0x0 minor 0x45 code 0xb9 built with major = 0x0 minor = 0x45 code 0x4e24
1: [    36]   JOB: GNI_GetJobResInfo: job resource: FMA (6) used: 0 limit: 123
1: [    36]   JOB: GNII_GetKernelVersion: kgni version major = 0x0 minor 0x45 code 0xb9 built with major = 0x0 minor = 0x45 code 0x4e24
1: [    36]   JOB: GNI_GetJobResInfo: job resource: CQ (5) used: 0 limit: 509
1: [    36]   JOB: GNII_GetKernelVersion: kgni version major = 0x0 minor 0x45 code 0xb9 built with major = 0x0 minor = 0x45 code 0x4e24
1: [    36]   FMA: GNI_CdmAttach: FMA window size: 32768
1: [    36]   FMA: GNI_CdmAttach: NOPRIV_ERR masked
1: [    36]   JOB: GNI_CdmAttach: ptag = 36 inst_id = 13864961 fma_window = 0x0000000000000000 fma_ctrl = 0x0000000000000000
1: [    36]    CQ: GNI_CqCreate: entry_count: 1361 reqs: 1361 adjusted entries: 1395 alloc_count: 1396
1: [    36]    CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE)
1: [    36]    CQ: cq_create: #1 cq created, kern_cq_descr = 1, mode = 2, rd_index_ptr = 0x2aaaaaad7ba0, queue = 0x2aaaaaad5000, intr_mask = (nil)
1: [    36]    CQ: GNI_CqCreate: entry_count: 1361 reqs: 1361 adjusted entries: 1395 alloc_count: 1396
1: [    36]    CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE)
1: [    36]    CQ: cq_create: #1 cq created, kern_cq_descr = 394, mode = 20, rd_index_ptr = 0x2aaaaaadc000, queue = 0x2aaaaaad8000, intr_mask = (nil)
1: [    36] FLBTE: GNII_FlbteInit: FLBTE: tx_counter 0x2aaaaaace008, chan 2, max_len -1, total 511
1: [    36]    CQ: GNI_CqCreate: entry_count: 2048 reqs: 2048 adjusted entries: 2559 alloc_count: 2560
1: [    36]    CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE)
1: [    36]    CQ: cq_create: #1 cq created, kern_cq_descr = 395, mode = 4, rd_index_ptr = 0x2aaaaaae6000, queue = 0x2aaaaaade000, intr_mask = (nil)
1: [    36]    CQ: GNI_CqCreate: entry_count: 2048 reqs: 2048 adjusted entries: 2559 alloc_count: 2560
1: [    36]    CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE)
1: [    36]    CQ: cq_create: #1 cq created, kern_cq_descr = 396, mode = 5, rd_index_ptr = 0x2aaaaaaef000, queue = 0x2aaaaaae7000, intr_mask = 0x2aaaaaacf000
1: [    36]    CQ: GNI_CqCreate: entry_count: 16384 reqs: 16384 adjusted entries: 16895 alloc_count: 16896
1: [    36]    CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE)  returned error - Invalid argument
1: [    36]    CQ: GNI_CqCreate: GNI_IOC_CQ_CREATE with PHYS_MEM failed trying without PHYS_MEM
1: [    36]    MR: GNI_MemRegister: Mem reg of 135168 length at addr 0x2aaaaaaf0000
1: [    36]    CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE)
1: [    36]    CQ: cq_create: #1 cq created, kern_cq_descr = 397, mode = 0, rd_index_ptr = 0x2aaaaab11000, queue = 0x2aaaaaaf0000, intr_mask = (nil)
1: [    36]    CQ: GNI_CqCreate: entry_count: 16384 reqs: 16384 adjusted entries: 16895 alloc_count: 16896
1: [    36]    CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE)  returned error - Invalid argument
1: [    36]    CQ: GNI_CqCreate: GNI_IOC_CQ_CREATE with PHYS_MEM failed trying without PHYS_MEM
1: [    36]    MR: GNI_MemRegister: Mem reg of 135168 length at addr 0x2aaaaab23000
1: [    36]    CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE)
1: [    36]    CQ: cq_create: #1 cq created, kern_cq_descr = 398, mode = 1, rd_index_ptr = 0x2aaaaab12000, queue = 0x2aaaaab23000, intr_mask = 0x2aaaaaacf004
1: [    36]    MR: GNI_MemRegister: Mem reg of 136314880 length at addr 0x2aaaae400000
srun: error: nid00722: task 0: Exited with exit code 61
srun: Terminating job step 789872.11
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: nid00724: task 1: Killed
daint102:/scratch/snx3000/biddisco/build$
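
For anyone trying to reproduce this without frun.sh: the two-line multi-prog file passed to srun above could be generated roughly as follows. This is only a sketch, not the actual frun.sh. The function name write_multiprog_conf and its argument order are made up for illustration; the file format is the standard "rank program args" layout that srun --multi-prog reads.

```shell
#!/bin/sh
# Hypothetical helper (NOT the real frun.sh): writes a Slurm --multi-prog
# file that starts the test as a server on rank 0 and as a client on rank 1.
write_multiprog_conf() {
    conf=$1       # output file, e.g. ./scalable.conf
    binary=$2     # test binary, e.g. ~/apps/fabtests/bin/fi_msg
    provider=$3   # libfabric provider name, e.g. gni
    server_ip=$4  # address of the node that runs rank 0
    {
        # Rank 0 gets no address argument, so it acts as the server.
        printf '0 %s -p %s\n' "$binary" "$provider"
        # Rank 1 is given the server address, so it acts as the client.
        printf '1 %s -p %s %s\n' "$binary" "$provider" "$server_ip"
    } > "$conf"
}

# Example use, mirroring the srun line shown in the output above:
#   write_multiprog_conf ./scalable.conf "$HOME/apps/fabtests/bin/fi_msg" gni 148.187.34.215
#   srun -n 2 --ntasks-per-node=1 -l --multi-prog ./scalable.conf
```

The server address has to be known before the file is written, which is why the session above prints the IP of the first node before launching.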