[openib-general] IBM eHCA Device Driver for gen2 IB stack

IBMEHCA DD IBMEHCAD at de.ibm.com
Tue Sep 20 05:48:14 PDT 2005


we released a 
https://sourceforge.net/projects/ibmehcad/ ehca2_0025
today which adresses most of these comments, 
and which survives most of pallas

Roland Dreier <rolandd at cisco.com> wrote on 25.07.2005 22:42:16:

> IBMEHCA> Hi, we've completed the first alpha code drop of the
> IBMEHCA> Power5 IBM eHCA Device Driver for the for the gen2
> IBMEHCA> openib.org stack.  We're running IPoIB and ibv userspace
> IBMEHCA> programs successfully with this code in our lab setup.
> 
> IBMEHCA> The source files can be downloaded from
> IBMEHCA> https://sourceforge.net/projects/ibmehcad/ ehca2_0011e
> 
> Thanks for posting this.  A few comments from a first read through.
> These are in no particular order, with minor nitpicks mixed in with
> more series problems.  Some more review is still needed before this
> code should go upstream.

> ehca (kernel driver):

> - Please use /* */ instead of // for comments

done

> - Can the debugging stuff be less ugly than EDEB_AV_EN() etc?
> Also the formatting of your goto labels is not very nice.  Instead
> of something like
> 
> EHCA_REG_SMR_EXIT0:
> if (retcode == 0)
> 
> I'd rather see

> out:
> if (retcode == 0)

we changed the goto statements to lower case, but didn't remove the 
numbering yet.
The numbering actually is sort of a coding pattern to detect cleanup bugs.
The idea behind it is that resources must be freed in the inverse order 
they where created,
so a
void * pd=malloc(4096)
if (0 == pd) {
   goto error_out_dealloc_pd
}
...
void * cq=malloc(4096)
if (0 == cq) {
   goto error_out_dealloc_cq
}
....
if (register_cq_fails) {
   goto error_out_dealloc_cq
}
...
void * qp=malloc(4096)
if (0 == qp) {
   goto error_out_dealloc_qp
}

goto out;

error_out_dealloc_qp
free(qp);
error_out_dealloc_cq
free(cq);
error_out_dealloc_pd
free(pd);
out:

return


would be more error prone than using numering. The basic idea is as soon 
as you allocate/register a resource (memory, HCA resources, kobject...)
increase the goto statement number by one
free in opposite order as allocated,
never skip a number during the initial checks
...and if you see sth like
if (...) goto out_0
if (...) goto out_2
if (...) goto out_1
it's very obviously wrong.


> 
> - In ehca_classes.c, it would be better to avoid using vmalloc() to
> allocate your structures.
done
> 
> - In ehca_common.h, all the stuff copied from from hvcall.h should be
> deleted.  Also, there are some functions that seem to use ugly
> return codes like H_Success for no good reason.
done (mostly, some of these defines are IB specific)
> 
> - The definitions of p_to_u64() etc. seem unnecesary and needlessly
> obfuscated.  You can always do
> 
> (unsigned long) ptr

> to cast a void * to u64, and

> (void *) (unsigned long) val
done

> - Similarly in ehca_qp.c,
> #define QP_ATTR_IS_SET(mask,attr)   (((mask)&(attr))!=0)
> is just needless obfuscation: just do the & directly where you need it.
done
> - In ehca_qp.c, is there a way we can take modqp_statetrans_table[]
> into common code and share with mthca_qp.c?
> 

well, maybe, modify QP is very complicated to get straight, but if 
somebody from mthca_qp.c thinks these are the right set of transitions
(this state machine base algorithm over the last 2 years) 

> - It seems many source files include ehca_classes_pSeries.h
> directly.  They should probably include ehca_classes.h.  Also
> ehca_classes_zSeries.h seems to be missing.

done
we currently don't have a ehca_classes_zSeries.h 

> 
> Instead of defining P_SERIES on your command line, I think you
> should just include <linux/config.h> and test CONFIG_PPC_PSERIES
> and CONFIG_ARCH_S390 instead.
> 
> - In ehca_main.c, ehca_hca_resources_show() violates the "one value
> per file" rule for sysfs.  Either put this in debugfs or split it
> up into separate attributes in a subdirectory.

done
> 
> - What is ehca_register_pci() doing?  Why are you changing the main
> kernel's pci_dma_ops??  Shouldn't you be creating a device on your
> own virtual bus with it's own dma ops?
> 

done,
see ebus
we're currently trying to get that integrated into base ppc64 kernel

> - It seems ehca_nopage() accesses ehca_idr without taking the
> ehca_idr_sem, is this OK?
> 

good, point!
done
> libehca:



> - It would probably be a good idea to use GNU autotools to build
> libehca.  In particular you want to make the destination directory
> configurable.

done (partly)
this is the initial version of autotools build,
still learning how to use autotools in a better way...
> 
> - I don't think you should hard code -m64 in your CFLAGS or put
> /usr/local/lib64 anywhere -- I believe nearly everyone running
> ppc64 is using 32-bit userspace.  Have you tested 32-bit userspace
> on a 64-bit kernel?

32 bit userspace will be supported in one of the next versions.
we're running mvapich as 64 bit library

> 
> - I don't think the install should unconditionally delete
> /dev/infiniband and then create the device nodes -- this should
> really be handled by either udev or the distribution.
> 

removed that from standard build, still trying to figure out how to 
configure udev to automatically add the infiniband device nodes

> - In ehca_reqs_core.c, get rid of all the

> #ifndef __KERNEL__
> #define ib_recv_wr ibv_recv_wr
> #define ib_send_wr ibv_send_wr
> //#define ib_ah ibv_ah
> #define ehca_av ehcau_av
> #define unlikely(x) x
> // ib_wr_opcode
> #define IB_WR_SEND IBV_WR_SEND
> #define IB_WR_SEND_WITH_IMM IBV_WR_SEND_WITH_IMM
> #define IB_WR_RDMA_WRITE IBV_WR_RDMA_WRITE
> #define IB_WR_RDMA_WRITE_WITH_IMM IBV_WR_RDMA_WRITE_WITH_IMM
> 
> and so on.  It's not a good idea to try and use the same source for
> code that runs in the kernel and code that runs in userspace.

technically the kernel version looks like it's a kernel only module...

> Splitting your source also gets rid of stuff like
> 
> #ifdef EHCA_USERDRIVER
> u8 *data = (u8 *) sge->addr;
> #else
> u8 *data = (u8 *) abs_to_virt(sge->addr);
> #endif

EHCA_USERDRIVER is the rest of a userspace sanbox to run the kernel code 
in userspace, as done for parts of the scsi stack for example.
we'll try to remove that flag on the next release

> - In ehca_uinit.c, is the LIBEHCA_FILE environment variable handling
> safe?  It seems that a user could trick a setuid process into doing
> something bad here.
> 

good point!
now picking up the place where to trace in a config file in /etc

> - In ehca_umain.c there seems to be some weird indentation:

> my_cq->ehca_cq_core.galpas.kernel.fw_handle = (u64)
> mmap(NULL, 4096, PROT_WRITE, MAP_SHARED, context->cmd_fd,
> ((u64) (my_cq->token) << 32) | 0x11000000);

hmm, changed from bad to not completely good.
Will fix in the next version. Hope we didn't add some more of those.

> - Does it make sense to encapsulate to use of ppc64-specific assembly
> stuff like:
> 
> __asm__ __volatile("dcbz 0,%0"::"r"(adr));
> __asm__ __volatile("dclst 0,%0"::"r"(adr));
> __asm__ __volatile__("sync":::"memory"); // serialize GAL register 
access

done
> 
> - In ehca_utools.h, ehca_swapbl32 and ehca_swapbl16 seem kind of 
useless.
> Why not just use ntohl/ntohs directly?

done
> 
> Thanks,
> Roland

Thanks for your input, I'm pretty sure this only was the start to get that 
code kernel ready.
Christoph

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20050920/41e9c0ab/attachment.html>


More information about the general mailing list