[libfabric-users] Thread question about gni provider

Biddiscombe, John A. biddisco at cscs.ch
Mon Mar 13 00:54:26 PDT 2017


OK, so now I’m sure. This morning I attached the debugger to a test run and I see this horror:

https://gist.github.com/biddisco/95ea97941ad3a2bdeab1e875adfbda5a

(N threads are stuck on this lock, and the one that holds it has been suspended by HPX, so we have a deadlock.)

So I’ll need to either prevent multiple threads from polling at the same time, or convert libfabric to use the HPX threading model.
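
For the first option, a minimal sketch of what "only one thread polls at a time" could look like. The class name `single_poller_guard` and the `poll_fn` callable are hypothetical stand-ins (the callable would wrap whatever completion-queue polling the parcelport does); nothing here is libfabric or HPX API. The point is that a thread which finds the guard taken returns immediately instead of blocking, so it never waits on a lock held by a suspended task. This only covers the polling path; other calls into the provider may take the same lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical guard: lets at most one thread at a time run the polling routine.
class single_poller_guard {
public:
    // Runs poll_fn() only if no other thread is currently polling.
    // Returns what poll_fn() returned (e.g. completions handled),
    // or 0 when the poll was skipped because someone else is polling.
    template <typename PollFn>
    std::size_t try_poll(PollFn&& poll_fn) {
        // test_and_set returns the previous value: true means "already busy".
        if (busy_.test_and_set(std::memory_order_acquire)) {
            return 0;  // never block here; the caller goes back to the scheduler
        }
        releaser r{busy_};  // clears the flag on return, even if poll_fn throws
        return poll_fn();
    }

private:
    struct releaser {
        std::atomic_flag& flag;
        ~releaser() { flag.clear(std::memory_order_release); }
    };
    std::atomic_flag busy_ = ATOMIC_FLAG_INIT;
};

// Usage: a poll attempted while another poll is in progress is skipped.
inline void example_usage() {
    single_poller_guard guard;
    std::size_t nested = 1;
    std::size_t outer = guard.try_poll([&] {
        nested = guard.try_poll([] { return std::size_t{5}; });  // skipped
        return std::size_t{2};
    });
    assert(outer == 2 && nested == 0);
}
```

The trade-off is that while the polling task is suspended nobody polls at all, but the other worker threads stay free to resume it.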

JB


From: Libfabric-users <libfabric-users-bounces at lists.openfabrics.org> on behalf of John Biddiscombe <biddisco at cscs.ch>
Date: Monday, 13 March 2017 at 00:16
To: "libfabric-users at lists.openfabrics.org" <libfabric-users at lists.openfabrics.org>
Subject: [libfabric-users] Thread question about gni provider

Last question : I hope ☺

I have my parcelport running correctly using the gni provider, and all the simple message sends and RDMA reads work as expected. With more complex examples, however, there are lockups.

Is the GNI provider taking locks using pthread mutexes? I see them mentioned in the wait objects code, but I am not using any wait_sets directly and I wonder if they are present elsewhere.

Unfortunately, the HPX runtime uses lightweight threads, and OS-level mutexes screw things up in a bad way, so I need to remove them. (The verbs API is entirely thread safe, so my code ran very nicely without any problems. I had hoped the fabric port would do the same. I gave up on trying to use ucx for similar reasons, as it isn’t thread safe at the transport level.)

Question: if GNI is using pthread mutexes, and I replace the locks in the libfabric GNI provider with HPX spinlocks and compile libfabric into the HPX project rather than building it outside (so that it becomes part of the runtime), will the GNI layer work, or are there requirements that GNI API calls _must_ be made on certain threads? I seem to recall that GNI has some limitations in terms of threading, and I am worried that whatever I do, I’m going to have problems with locking or threading.
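
To make the idea concrete, here is a rough sketch of the kind of lock I have in mind as a replacement. It is not HPX's own spinlock and not anything from libfabric: `yielding_spinlock` is a hypothetical name, and `std::this_thread::yield()` is only a stand-in for the runtime's own yield call. The assumption is that a waiter which spins and yields (rather than sleeping in the kernel on a pthread mutex) leaves the worker thread available to run the task that holds the lock.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical spinlock: waiters spin and yield instead of blocking in the OS.
class yielding_spinlock {
public:
    void lock() {
        // Spin until the flag was previously clear (i.e. we acquired it).
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // Stand-in for the lightweight-thread runtime's yield: give the
            // scheduler a chance to run the task that currently holds the lock.
            std::this_thread::yield();
        }
    }

    // Returns true if the lock was acquired without waiting.
    bool try_lock() {
        return !flag_.test_and_set(std::memory_order_acquire);
    }

    void unlock() {
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Usage: the type works with the standard lock wrappers.
inline void example_usage() {
    yielding_spinlock lock;
    {
        std::lock_guard<yielding_spinlock> hold(lock);
        assert(!lock.try_lock());  // already held, so try_lock fails
    }
    assert(lock.try_lock());       // released by the guard, so this succeeds
    lock.unlock();
}
```

Whether swapping the lock type is enough depends on the answer to the question above: if GNI calls have to be made from particular threads, a different lock will not help.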

Thanks once more for any guidance

JB


--
John Biddiscombe,                        email:biddisco @.at.@ cscs.ch
http://www.cscs.ch/
CSCS, Swiss National Supercomputing Centre  | Tel:  +41 (91) 610.82.07
Via Trevano 131, 6900 Lugano, Switzerland   | Fax:  +41 (91) 610.82.82
