[ofa-general] What causes "SRP abort called" error?
John Valdes
valdes at anl.gov
Tue Jan 8 15:44:17 PST 2008
Hello,
I'm new to SRP & IB, so please bear with me...
We have a storage server running RHEL 5.1 with the bundled OFED 1.2
stack, directly attached to an IB port on a DDN 9550. It had been running
OK for about a week, but today we're getting a continuous stream of
SRP abort errors:
# tail /var/log/messages
[...]
Jan 8 17:00:59 server kernel: SRP abort called
Jan 8 17:01:59 server kernel: SRP abort called
Jan 8 17:02:04 server kernel: SRP reset_device called
Jan 8 17:02:09 server kernel: ib_srp: SRP reset_host called
Jan 8 17:02:11 server kernel: ib_srp: connection closed
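To see how often each of these messages occurs and in what order, the log
can be tallied with a short script. The following is only a sketch,
assuming Python is available on the server and that the messages keep
exactly the wording shown above; the names SRP_LINE, srp_events and tally
are made up for illustration.

```python
import re
from collections import Counter

# Matches the kernel messages quoted above, with or without the
# "ib_srp:" prefix. The pattern is an assumption based on those lines only.
SRP_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(?:ib_srp:\s+)?(?P<event>SRP \w+ called|connection closed)\s*$"
)


def srp_events(lines):
    """Return (timestamp, event) tuples for the SRP-related log lines."""
    events = []
    for line in lines:
        match = SRP_LINE.match(line)
        if match:
            events.append((match.group("stamp"), match.group("event")))
    return events


def tally(events):
    """Count how many times each event type appears."""
    return Counter(event for _, event in events)
```

Calling srp_events(open("/var/log/messages")) and passing the result to
tally() gives a per-message count; the timestamps in the list show the
spacing between one abort and the next.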
How can I determine the cause of the aborts? The physical connection
between the server and the DDN seems to be OK (the error counts in
/sys/class/infiniband/mthca0/ports/1/counters/* are all zero), and the
SM (opensm) is still running. Are the aborts being triggered by the
server or by the storage target (the DDN)? I'm guessing something is
timing out, but what, and why?
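The counter check described above can be scripted so that it is easy to
repeat while the aborts are happening. This is a minimal sketch, assuming
Python is available on the server; the function names read_counters and
nonzero_counters are hypothetical, and only the sysfs path comes from
this message.

```python
from pathlib import Path


def read_counters(counters_dir):
    """Return {counter_name: value} for every readable file in counters_dir."""
    counters = {}
    for entry in sorted(Path(counters_dir).iterdir()):
        if not entry.is_file():
            continue
        try:
            counters[entry.name] = int(entry.read_text().strip())
        except (OSError, ValueError):
            # Skip entries that cannot be read or are not plain integers.
            continue
    return counters


def nonzero_counters(counters):
    """Keep only the counters whose value is not zero."""
    return {name: value for name, value in counters.items() if value != 0}
```

For the port mentioned above, the call would be
nonzero_counters(read_counters("/sys/class/infiniband/mthca0/ports/1/counters")),
which returns an empty dict when every counter is zero.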
To complicate matters, the LUNs on the DDN are shared with 7 other
servers as clustered LVM volumes with GFS filesystems. Each of the
other servers has its own, direct IB connection to the DDN.
Any suggestions on how to track down the cause of the aborts would be
welcome.
Thanks,
John
----------------------------------------------------------------------
John Valdes          Mathematics and Computer Science Division
valdes at anl.gov     Argonne National Laboratory