[openib-general] RE: ptrace peektext failure for Mellanox IBGD 1.7.0 based cluster

Eli Cohen eli at mellanox.co.il
Mon Oct 10 06:22:35 PDT 2005


David,
IBGD 1.7 does not support kernel 2.6.11 so I assume you have made changes to
IBGD to make it compile.
In the files you sent I can't see a call to ptrace with PTRACE_PEEKTEXT but
I can see a call to PTRACE_PEEKDATA. Note that in the IBGD stack, registered
buffers are not inherited by a child process when the parent forks. This
is accomplished by setting the VM_DONTCOPY flag on the vma. It is done to
retain the virtual-to-physical translation of a page in the parent by
disabling COW on the pages. So the child may not even have these buffers in
its address space, and this could be the reason why ptrace fails.
Note also that IBGD 1.8 is the latest release and it does support kernel
2.6.11 so you may consider using it, though the description above holds also
for IBGD 1.8.
Eli
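
The effect of VM_DONTCOPY on a forked child can be sketched from user space.
This is only an illustration, not IBGD code: IBGD sets the flag inside the
driver, whereas the sketch below assumes a later Linux kernel (2.6.16 or
newer) where madvise() with MADV_DONTFORK sets the same flag on a vma. The
function name child_has_mapping is made up for the example, and mincore() is
used purely as a cheap way for the child to ask whether the page is mapped.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a forked child still has the parent's
 * page mapped, 0 if it does not, and -1 on a setup error. */
int child_has_mapping(int mark_dontfork)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    char *buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    buf[0] = 1; /* touch the page so it is actually populated */

    /* MADV_DONTFORK sets VM_DONTCOPY on the vma (assumes Linux >= 2.6.16). */
    if (mark_dontfork && madvise(buf, (size_t)page, MADV_DONTFORK) != 0) {
        munmap(buf, (size_t)page);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(buf, (size_t)page);
        return -1;
    }
    if (pid == 0) {
        /* Child: mincore() fails with ENOMEM if the range is unmapped. */
        unsigned char vec;
        _exit(mincore(buf, (size_t)page, &vec) == 0 ? 1 : 0);
    }

    int status = 0;
    pid_t reaped = waitpid(pid, &status, 0);
    munmap(buf, (size_t)page);
    if (reaped < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling child_has_mapping(0) exercises an ordinary copy-on-write mapping,
while child_has_mapping(1) exercises a mapping that is left out of the
child's address space, which is the situation described above for registered
buffers.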

-----Original Message-----
From: David Lecomber [mailto:david at allinea.com]
Sent: Monday, October 10, 2005 11:23 AM
To: openib-general at openib.org
Subject: ptrace peektext failure for Mellanox IBGD 1.7.0 based cluster


Dear all,

I'm having a kernel problem which I believe to be caused by the
infiniband drivers on the system I am using.

Kernel 2.6.11, Mellanox software stack IBGD 1.7.0.

Essentially, once an MPI code is started, the kernel refuses to allow
ptrace() access to the text segment (i.e. where the program instructions
lie), although it is possible to access the data segment.

This means debugging is impossible (gdb, idb, ddt, etc.).

The attached code demonstrates the problem.

Untar, and then make.  Run the 'mpi' program, pick a line of its
output, and paste it into another shell.  On the standard, non-MPI test code,
the ptrace reads are all successful.  On the MPI test, it gives an error
for the text segment reads.
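
For readers without the attachment, the kind of check described above can be
sketched roughly as follows. This is not the attached test code; the names
probe_fn, probe_word, peek and peek_text_and_data are invented for the
illustration, and it traces a forked child rather than a separately started
MPI process. Note that the Linux ptrace(2) manual page documents
PTRACE_PEEKTEXT and PTRACE_PEEKDATA as equivalent requests, so the sketch
distinguishes the two cases by the address that is read, not by the request.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static long probe_word = 0x1234;          /* lives in the data segment */
static int probe_fn(void) { return 42; }  /* lives in the text segment */

/* Peek one word from the tracee. A result of -1 is ambiguous, so errno
 * must be cleared first and checked afterwards. */
static int peek(pid_t pid, int request, void *addr, long *out)
{
    errno = 0;
    long word = ptrace(request, pid, addr, NULL);
    if (word == -1 && errno != 0)
        return -1;
    *out = word;
    return 0;
}

/* Hypothetical check: returns 0 if both reads succeed and match the
 * parent's own copy, otherwise a bitmask (1 = text read bad,
 * 2 = data read bad), or -1 on a setup error. */
int peek_text_and_data(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);            /* stop so the parent can inspect us */
        _exit(0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status)) {
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return -1;
    }

    void *text_addr = (void *)(uintptr_t)probe_fn;
    long text = 0, data = 0, local_text = 0;
    int result = 0;

    /* The child is a fork of this process, so addresses are identical. */
    memcpy(&local_text, text_addr, sizeof local_text);
    if (peek(pid, PTRACE_PEEKTEXT, text_addr, &text) != 0 || text != local_text)
        result |= 1;
    if (peek(pid, PTRACE_PEEKDATA, &probe_word, &data) != 0 || data != probe_word)
        result |= 2;

    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return result;
}
```

On an unaffected system one would expect peek_text_and_data() to return 0;
a result with bit 1 set would correspond to the text-segment failure
reported here.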

Is this a known issue? Are there any upgrades/fixes which should have
been applied?  I would appreciate it if someone could run the test
suggested on a really new setup, and see if the error happens.


Regards
David
-- 
David Lecomber, CTO, Allinea Software
tel: +44 1926 623231  fax: +44 1926 623232


