[openib-general] Problem related to integration of OS/NIC

Yingchao Zhou yc_zhou at ncic.ac.cn
Wed Sep 13 17:26:18 PDT 2006


     The current kernel set PAGE_COPY without write bit. This will cause intermittent non-cosistent data for user-level network drivers such as Infiniband, Quadrics and Myrinet. Which has also be mentioned by Costin Iancu in the paper "HUNTing the Overlap " (PACT'05).
    An example of such phenomena is the following sequences: 
	register a memory space BUFF for receive message, 
	receive message,
	call mprotect(...PROT_NONE...) and mprotect(...PROT_READ|PROT_WRITE) one by one, 	
	write into BUFF, 
	then receive again.      
    The second time received data will perhaps not be the data sent by the peer machine but the data written by itself in the 4th step.

     The reson is that :
     1) User-level network driver locks phy pages when memory space is registered;
     2) 2 calls to mprotect change ptes in the space to PAGE_COPY, so write any page in the space will cause a page fault;
     3) In the page fault handler, it goes to do_wp_page, and in it if Page Is Locked, a new page is generated and filled into the pte, which is the COW(Copy-On-Write). So the physical page seen by the host is not the same one by the NIC.

     Adding PAGE_RW to PAGE_COPY will resolve this problem.  
     In my option, the reason for absense of RW is to save memory by mapping all those only read pages into ZERO_PAGE. But is there really programs which make many read-ops in memory space without even initialize them?

___________________________________________________
_      Yingchao Zhou                              _
_      ICT, CAS                                   _
_      (86)010-62601009                           _
___________________________________________________






More information about the general mailing list