[ofa-general] SRP target sporadic behaviour with Solaris, VMware
Vu Pham
vuhuong at mellanox.com
Wed Mar 12 15:18:00 PDT 2008
Bart Van Assche wrote:
> On Wed, Mar 12, 2008 at 12:03 AM, Daniel Pocock <daniel at pocock.com.au> wrote:
>> I've recently set up the SRP target module on Linux (2.6.22).
>>
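A minimal sketch of bringing up a Linux SRP target of that era, assuming
the OFED/SCST ib_srpt module; the module name and the port check are
assumptions, not details taken from this thread:

    # Load the SRP target kernel module (name as shipped with OFED/SCST)
    modprobe ib_srpt
    # Confirm the HCA port is Active before exporting any devices
    ibstat | grep -i state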
>> Trying to access the target from various initiators (Fedora, Debian,
>> Solaris 10, VMware ESX 3.5) gives mixed results.
>>
>> The Linux clients, despite having limited configuration tools, worked
>> immediately.
>>
>> I've opened a thread on the Sun forums to discuss the Solaris 10 issue:
>>
>> http://forum.java.sun.com/thread.jspa?threadID=5273631
>>
>> On VMware:
>> - I had to reboot my new VMware ESX server a few times before it found
>> my 500GB target.
>> - VMware completely rejects a target if it doesn't have a partition
>> table - I ran parted on Linux and then VMware was OK (see the first
>> sketch after this list)
>> - Also, the messages in VMware gave me the impression it would clobber
>> the whole volume, rather than just a single partition - so to avoid the
>> possibility of losing my other partitions, I made a special target
>> representing the intended partition rather than the entire volume. Now
>> I have a VMware partition table nested within a partition.
>> - VMware only seems to show one target at a time - I had created a few
>> test targets, but I could only see one of them. Is this what other
>> people see? ibsrpdm on the other Linux hosts shows all the targets
>> (the second sketch below shows ibsrpdm in use).
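A minimal sketch of the parted step mentioned above, assuming the SRP disk
shows up as /dev/sdb on the Linux host; the device name and label type are
assumptions:

    # Write an msdos partition table so ESX will accept the LUN
    parted /dev/sdb mklabel msdos
    # Create one primary partition spanning the whole disk
    parted /dev/sdb mkpart primary 0% 100%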
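And a sketch of listing targets with ibsrpdm from a Linux initiator and
connecting to one through the stock ib_srp sysfs interface; the HCA/port
in the path and the placeholder values are assumptions:

    # Print one connection string per discovered target
    ibsrpdm -c
    # Paste a string into add_target to log in to that target
    echo "id_ext=...,ioc_guid=...,dgid=...,pkey=ffff,service_id=..." \
        > /sys/class/infiniband_srp/srp-mthca0-1/add_target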
>
> My experience with SRP is as follows (with Linux 2.6.24 + SCST + SRPT
> as target):
> * Linux SRP initiator: works perfectly.
> * OpenSolaris SRP initiator: I could not get Sun's SRP initiator
> working on OpenSolaris. I even asked a Solaris expert to help me, but
> he couldn't get the SRP initiator working either.
> * VMware ESX 3.5 + Mellanox InfiniBand drivers (released in January
> 2008): so far I have only tested a setup with a single target. When
> doing a lot of I/O over the SRP connection, after about 10 minutes the
> virtual machine running on VMware starts logging communication errors.
> I reported this yesterday to Mellanox support, and Mellanox is
> currently working on this issue. Note: I had to upgrade the InfiniBand
> switch firmware before the ESX server was able to find the SRP target.
>
Which virtual disk mode did you use (rdm, rdmp, vmfs, raw)?
Could you provide the Mellanox FAE with both the VM's /var/log/messages
and the ESX host's /var/log/vmkernel?
I'll look over them.
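A sketch of collecting the two logs requested above; the file paths are
the stock locations and the archive names are placeholders:

    # Inside the virtual machine
    tar czf vm-messages.tgz /var/log/messages
    # On the ESX service console
    tar czf esx-vmkernel.tgz /var/log/vmkernel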
-vu