From mst at mellanox.co.il Thu Dec 1 02:02:52 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 1 Dec 2005 12:02:52 +0200 Subject: [openib-general] [PATCH] libibverbs: document immediate data ordering Message-ID: <20051201100252.GS25751@mellanox.co.il> verbs.h documents ordering for immediate data in completion, but not in send work request. Signed-off-by: Michael S. Tsirkin Index: libibverbs/include/infiniband/verbs.h =================================================================== --- libibverbs/include/infiniband/verbs.h (revision 4031) +++ libibverbs/include/infiniband/verbs.h (working copy) @@ -441,7 +441,7 @@ struct ibv_send_wr { int num_sge; enum ibv_wr_opcode opcode; enum ibv_send_flags send_flags; - uint32_t imm_data; + uint32_t imm_data; /* in network byte order */ union { struct { uint64_t remote_addr; -- MST From yipeeyipeeyipeeyipee at yahoo.com Thu Dec 1 02:13:33 2005 From: yipeeyipeeyipeeyipee at yahoo.com (yipee) Date: Thu, 1 Dec 2005 10:13:33 +0000 (UTC) Subject: [openib-general] static OpenSM Message-ID: Hi, How can I compile OpenSM to be statically linked? I tried configuring it with '--enable-static' but that was ignored. thanks, x From halr at voltaire.com Thu Dec 1 03:33:22 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Dec 2005 06:33:22 -0500 Subject: [openib-general] RE: [PATCH] [TRIVIAL] OpenSM/complib: Move assert before variable is used In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30E244B@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30E244B@mtlexch01.mtl.com> Message-ID: <1133436608.2984.20781.camel@hal.voltaire.com> On Thu, 2005-12-01 at 01:41, Yael Kalka wrote: > Hi Hal, > This fix isn't correct, since you are asserting on a variable not yet > initialized. Right. My bad. This was pointed out last night by Johannes Erdfelt. Thanks. -- Hal > Yael > > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Wednesday, November 30, 2005 11:04 PM > To: Yael Kalka > Cc: openib-general at openib.org > Subject: [PATCH] [TRIVIAL] OpenSM/complib: Move assert before variable > is used > > > OpenSM/complib: Move assert before variable is used > > Signed-off-by: Hal Rosenstock > > Index: cl_dispatcher.c > =================================================================== > --- cl_dispatcher.c (revision 4257) > +++ cl_dispatcher.c (working copy) > @@ -344,8 +344,8 @@ cl_disp_post( > cl_dispatcher_t *p_disp; > cl_disp_msg_t *p_msg; > > - p_disp = handle->p_disp; > CL_ASSERT( p_disp ); > + p_disp = handle->p_disp; > CL_ASSERT( msg_id != CL_DISP_MSGID_NONE ); > > cl_spinlock_acquire( &p_disp->lock ); From yael at mellanox.co.il Thu Dec 1 04:17:11 2005 From: yael at mellanox.co.il (Yael Kalka) Date: Thu, 1 Dec 2005 14:17:11 +0200 Subject: [Fwd: [openib-general] OpenSM and Wrong SM_Key] Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30E2451@mtlexch01.mtl.com> Hi Hal, Eitan, I think the best option is to add an OpenSM option flag - exit_on_fatal. This flag can decide on the action on fatal cases: 1. Exit or not when seeing SM with different SM_Key. 2. Exit or not when there is a fatal link error (e.g - multiple guids). etc. I tried to run 2 SMs just now with different SM_keys, and I see that none of them exit, since both receive SM_Key=0 on SMInfo GetResp. The reason for that is that in the SMInfo Get request (as in all other requests) we do not send anything in the mad data. Meaning - all fields are clear. 
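For illustration, a condensed sketch of the check described next (hypothetical code - the names follow OpenSM's ib_types.h conventions, this is not the exact source):

/* The SMInfo attribute arrives in the Get request payload, but the
 * requester sent it all-zero, so the state read here is always 0 and
 * the master branch can never be taken. */
static void sminfo_rcv_check_requester_state( IN const ib_sm_info_t* const p_smi )
{
  if( ib_sminfo_get_state( p_smi ) == IB_SMINFO_STATE_MASTER )
  {
    /* requester is a master SM - unreachable with a zeroed payload */
  }
}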
In the __osm_sminfo_rcv_process_get_request function we are checking the state according to the payload data. This is always zero! Thus - the SM will never know that the SMInfo request was sent from an SM that is master. I will work on a fix for that. Yael -----Original Message----- From: Hal Rosenstock [mailto:halr at voltaire.com] Sent: Wednesday, November 30, 2005 11:57 PM To: Yael Kalka; Eitan Zahavi Cc: openib-general at openib.org Subject: [Fwd: [openib-general] OpenSM and Wrong SM_Key] Hi Yael & Eitan, Based on the recent MgtWG discussions, are you still holding your position in terms of exiting OpenSM when a non-matching SM Key is discovered ? Just wondering if I can issue a patch for this and clear this issue so OpenSM can be compliant for this aspect. Thanks. -- Hal -----Forwarded Message----- From: Hal Rosenstock To: openib-general at openib.org Subject: [openib-general] OpenSM and Wrong SM_Key Date: 08 Nov 2005 16:08:47 -0500 Hi, Currently, when OpenSM receives SMInfo with a different SM_Key, it exits as follows: void __osm_sminfo_rcv_process_get_response( IN const osm_sminfo_rcv_t* const p_rcv, IN const osm_madw_t* const p_madw ) { ... /* Check that the sm_key of the found SM is the same as ours, or is zero. If not - OpenSM cannot continue with configuration!. */ if ( p_smi->sm_key != 0 && p_smi->sm_key != p_rcv->p_subn->opt.sm_key ) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, "__osm_sminfo_rcv_process_get_response: ERR 2F18: " "Got SM with sm_key that doesn't match our " "local key. Exiting\n" ); osm_log( p_rcv->p_log, OSM_LOG_SYS, "Found remote SM with non-matching sm_key. Exiting\n" ); osm_exit_flag = TRUE; goto Exit; } C14-61.2.1 states that: A master SM which finds a higher priority master SM with the wrong SM_Key should not relinquish the subnet. Exiting OpenSM relinquishes the subnet. So it appears to me that perhaps this behavior of exiting OpenSM should be at least contingent on the SM state and relative priority of the SMInfo received. Make sense ? If so, I will work on a patch for this. -- Hal _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From yael at mellanox.co.il Thu Dec 1 04:40:49 2005 From: yael at mellanox.co.il (Yael Kalka) Date: Thu, 1 Dec 2005 14:40:49 +0200 Subject: [openib-general] RE: [PATCH] Opensm - fix LinkRecord get Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30E2452@mtlexch01.mtl.com> Hi Hal, Regarding your question - the answer is that there shouldn't be an error message in this case. Assume the following: A LinkRecord request is received with FromLid and ToLid, both Lids of switches. In this case the __osm_lr_rcv_get_port_links function will be called with both src and dest port objects not NULL (the osm_port_t depends on the Lid). The __osm_lr_rcv_get_physp_link function will then be called with every possible pair of such physical ports (over all port numbers) - including pairs that are not connected. This call is not an error, but part of the flow. Yael -----Original Message----- From: Hal Rosenstock [mailto:halr at voltaire.com] Sent: Wednesday, November 30, 2005 4:20 PM To: Yael Kalka Cc: openib-general at openib.org; Eitan Zahavi Subject: Re: [PATCH] Opensm - fix LinkRecord get On Wed, 2005-11-30 at 07:35, Yael Kalka wrote: > Hi Hal, > > During some tests I've noticed that in LinkRecord queries there are > some bugs: > 1.
Trying to ensure the two physical ports are connected comparison > isn't done correctly. > 2. When __osm_lr_rcv_get_physp_link is called with physical ports not > null - there is no check that the value returned is actually different > than null. As a result we can get several links with the same value. > > This patch fixes both issues. Thanks. Applied. Some minor comments below. -- Hal > Thanks, > Yael > > Signed-off-by: Yael Kalka > > Index: opensm/osm_sa_link_record.c > =================================================================== > --- opensm/osm_sa_link_record.c (revision 4231) > +++ opensm/osm_sa_link_record.c (working copy) > @@ -235,7 +235,7 @@ __osm_lr_rcv_get_physp_link( > Ensure the two physp's are actually connected. > If not, bail out. > */ > - if( osm_physp_get_remote( p_src_physp ) != p_src_physp ) > + if( osm_physp_get_remote( p_src_physp ) != p_dest_physp ) > goto Exit; Should there be an error message here ? > } > else > @@ -393,12 +393,16 @@ __osm_lr_rcv_get_port_links( > { > p_dest_physp = osm_port_get_phys_ptr( p_dest_port, > dest_port_num ); > + /* both physical ports should be with data */ > + if (p_src_physp && p_dest_physp) > + { > __osm_lr_rcv_get_physp_link( p_rcv, p_lr, p_src_physp, > p_dest_physp, comp_mask, > p_list, p_req_physp ); > } > } > } > + } Formatting was off here (and similarly below)... I fixed it in the change that was just committed. > else > { > /* > @@ -412,17 +416,22 @@ __osm_lr_rcv_get_port_links( > if (port_num < p_src_port->physp_tbl_size) > { > p_src_physp = osm_port_get_phys_ptr( p_src_port, port_num ); > + if (p_src_physp) > + { > __osm_lr_rcv_get_physp_link( p_rcv, p_lr, p_src_physp, > NULL, comp_mask, p_list, > p_req_physp ); > } > } > + } > else > { > num_ports = osm_port_get_num_physp( p_src_port ); > for( port_num = 1; port_num < num_ports; port_num++ ) > { > p_src_physp = osm_port_get_phys_ptr( p_src_port, port_num ); > + if (p_src_physp) > + { > __osm_lr_rcv_get_physp_link( p_rcv, p_lr, p_src_physp, > NULL, comp_mask, p_list, > p_req_physp ); > @@ -430,6 +439,7 @@ __osm_lr_rcv_get_port_links( > } > } > } > + } > else > { > if( p_dest_port ) > @@ -446,11 +456,14 @@ __osm_lr_rcv_get_port_links( > { > p_dest_physp = osm_port_get_phys_ptr( > p_dest_port, port_num ); > + if (p_dest_physp) > + { > __osm_lr_rcv_get_physp_link( p_rcv, p_lr, NULL, > p_dest_physp, comp_mask, > p_list, p_req_physp ); > } > } > + } > else > { > num_ports = osm_port_get_num_physp( p_dest_port ); > @@ -458,12 +471,15 @@ __osm_lr_rcv_get_port_links( > { > p_dest_physp = osm_port_get_phys_ptr( > p_dest_port, port_num ); > + if (p_dest_physp) > + { > __osm_lr_rcv_get_physp_link( p_rcv, p_lr, NULL, > p_dest_physp, comp_mask, > p_list, p_req_physp ); > } > } > } > + } > else > { > /* > From mst at mellanox.co.il Thu Dec 1 05:04:29 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 1 Dec 2005 15:04:29 +0200 Subject: [openib-general] [PATCH applied] sdp: fix aio completion on cancel Message-ID: <20051201130429.GA25751@mellanox.co.il> SDP AIO : Turn a warning message on completed iocb cancel to debug. SDP AIO : Put the proper AIO req on cancellation. Signed-off-by: Michael S. 
Tsirkin Index: linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_send.c =================================================================== --- linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_send.c (revision 4198) +++ linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_send.c (working copy) @@ -1738,7 +1742,7 @@ /* * completion reference */ - aio_put_req(req); + aio_put_req(iocb->req); result = 0; } @@ -1797,9 +1801,8 @@ * no IOCB found. The cancel is probably in a race with a completion. * Assume the IOCB will be completed, return appropriate value. */ - sdp_warn("Cancel write with no IOCB. <%d:%d:%08lx>", - req->ki_users, req->ki_key, req->ki_flags); - + sdp_dbg_warn(conn, "Cancel write with no IOCB. <%d:%d:%08lx>", + req->ki_users, req->ki_key, req->ki_flags); result = -EAGAIN; unlock: -- MST From halr at voltaire.com Thu Dec 1 05:19:04 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Dec 2005 08:19:04 -0500 Subject: [Fwd: [openib-general] OpenSM and Wrong SM_Key] In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30E2451@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30E2451@mtlexch01.mtl.com> Message-ID: <1133443143.2984.21769.camel@hal.voltaire.com> On Thu, 2005-12-01 at 07:17, Yael Kalka wrote: > Hi Hal, Eitan, > I think the best option is to add an OpenSM option flag - exit_on_fatal. > This flag can decide on the action on fatal cases: > 1. Exit or not when seeing SM with different SM_Key. Still not sure why this would be an option. The compliance seems to me to be pretty clear on this. > 2. Exit or not when there is a fatal link error (e.g - multiple guids). > etc. I think the second issue is separable from the first. I would prefer to keep the discussion of this issue separate from SM Key. > I tried to run 2 SMs just now with different SM_keys, and I see that > none of them > exit, since both receive SM_Key=0 on SMInfo GetResp. > The reason for that is that in the SMInfo Get request (as in all other > requests) > we do not send anything in the mad data. Meaning - all fields are clear. The SM needs a way to know whether the other SM(s) (and which ones) are trusted or not so the SM_Key can be filled in. > In the __osm_sminfo_rcv_process_get_request function we are checking the > state according > to the payload data. This is always zero! Thus - SM will never know that > the SMInfo > request is sent from an SM that is master. Right, on the get side, SMState is reserved as it is a RO component (of SMInfo). > I will work on a fix for that. Thanks. -- Hal > Yael > > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Wednesday, November 30, 2005 11:57 PM > To: Yael Kalka; Eitan Zahavi > Cc: openib-general at openib.org > Subject: [Fwd: [openib-general] OpenSM and Wrong SM_Key] > > > Hi Yael & Eitan, > > Based on the recent MgtWG discussions, are you still holding your > position in terms of exiting OpenSM when a non matching SM Key is > discovered ? Just wondering if I can issue a patch for this and clear > this issue so OpenSM can be compliant for this aspect. Thanks. > > -- Hal > > -----Forwarded Message----- > > From: Hal Rosenstock > To: openib-general at openib.org > Subject: [openib-general] OpenSM and Wrong SM_Key > Date: 08 Nov 2005 16:08:47 -0500 > > Hi, > > Currently, when OpenSM receives SMInfo with a different SM_Key, it exits > as follows: > > > void > __osm_sminfo_rcv_process_get_response( > IN const osm_sminfo_rcv_t* const p_rcv, > IN const osm_madw_t* const p_madw ) > { > ... 
> > > > /* > Check that the sm_key of the found SM is the same as ours, > or is zero. If not - OpenSM cannot continue with configuration!. */ > if ( p_smi->sm_key != 0 && > p_smi->sm_key != p_rcv->p_subn->opt.sm_key ) > { > osm_log( p_rcv->p_log, OSM_LOG_ERROR, > "__osm_sminfo_rcv_process_get_response: ERR 2F18: " > "Got SM with sm_key that doesn't match our " > "local key. Exiting\n" ); > osm_log( p_rcv->p_log, OSM_LOG_SYS, > "Found remote SM with non-matching sm_key. Exiting\n" ); > osm_exit_flag = TRUE; > goto Exit; > } > > C14-61.2.1 states that: > A master SM which finds a higher priority master SM with the wrong > SM_Key should not relinquish the subnet. > > Exiting OpenSM relinquishes the subnet. > > So it appears to me that perhaps this behavior of exiting OpenSM should > be at least contingent on the SM state and relative priority of the > SMInfo received. Make sense ? If so, I will work on a patch for this. > > -- Hal > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general From mst at mellanox.co.il Thu Dec 1 05:47:08 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 1 Dec 2005 15:47:08 +0200 Subject: [openib-general] [PATCH applied] sdp zcopy support for send_msg/recv_msd Message-ID: <20051201134708.GB25751@mellanox.co.il> I have added zcopy option to trunk. With this enabled I am getting good bandwidth with multiple sockets, but typically worse that bcopy bandwidth for a single socket. swlab155:~ # ( export SIMPLE_LIBSDP=1 ; export LD_PRELOAD=/usr/local/lib/libsdp.so; iperf -c 11.4.8.156 -P 4 -l 64000 -f M ) ------------------------------------------------------------ Client connecting to 11.4.8.156, TCP port 5001 TCP window size: 0.11 MByte (default) ------------------------------------------------------------ [ 7] local 11.4.8.155 port 32812 connected with 11.4.8.156 port 5001 [ 5] local 11.4.8.155 port 32810 connected with 11.4.8.156 port 5001 [ 8] local 11.4.8.155 port 32813 connected with 11.4.8.156 port 5001 [ 6] local 11.4.8.155 port 32811 connected with 11.4.8.156 port 5001 [ 7] 0.0-10.0 sec 2309 MBytes 231 MBytes/sec [ 5] 0.0-10.0 sec 2309 MBytes 231 MBytes/sec [ 8] 0.0-10.0 sec 2309 MBytes 231 MBytes/sec [ 6] 0.0-10.0 sec 2309 MBytes 231 MBytes/sec [SUM] 0.0-10.0 sec 9235 MBytes 924 MBytes/sec swlab155:~ # ( export SIMPLE_LIBSDP=1 ; export LD_PRELOAD=/usr/local/lib/libsdp.so; iperf -c 11.4.8.156 -P 2 -l 64000 -f M ) ------------------------------------------------------------ Client connecting to 11.4.8.156, TCP port 5001 TCP window size: 0.11 MByte (default) ------------------------------------------------------------ [ 5] local 11.4.8.155 port 32814 connected with 11.4.8.156 port 5001 [ 6] local 11.4.8.155 port 32815 connected with 11.4.8.156 port 5001 [ 5] 0.0-10.0 sec 4233 MBytes 423 MBytes/sec [ 6] 0.0-10.0 sec 4233 MBytes 423 MBytes/sec [SUM] 0.0-10.0 sec 8466 MBytes 847 MBytes/sec swlab155:~ # ( export SIMPLE_LIBSDP=1 ; export LD_PRELOAD=/usr/local/lib/libsdp.so; iperf -c 11.4.8.156 -l 64000 -f M ) ------------------------------------------------------------ Client connecting to 11.4.8.156, TCP port 5001 TCP window size: 0.11 MByte (default) ------------------------------------------------------------ [ 5] local 11.4.8.155 port 32816 connected with 11.4.8.156 port 5001 [ 5] 0.0-10.0 sec 5092 MBytes 509 MBytes/sec --- Add zero copy support to synchronous 
socket operations (send_msg/recv_msg). Signed-off-by: Michael S. Tsirkin Index: linux-2.6.14/drivers/infiniband/ulp/sdp/Kconfig =================================================================== --- linux-2.6.14/drivers/infiniband/ulp/sdp/Kconfig (revision 4198) +++ linux-2.6.14/drivers/infiniband/ulp/sdp/Kconfig (working copy) @@ -8,6 +8,20 @@ libsdp library from to have standard sockets applications use SDP. +config INFINIBAND_SDP_SEND_ZCOPY + bool "Sockets Direct Protocol Zero Copy Send support" + depends on INFINIBAND_SDP + default n + ---help--- + This option enables Zero Copy support for send_msg transactions. + +config INFINIBAND_SDP_RECV_ZCOPY + bool "Sockets Direct Protocol Zero Copy Receive support" + depends on INFINIBAND_SDP && INFINIBAND_SDP_SEND_ZCOPY + default n + ---help--- + This option enables Zero Copy support for recv_msg transactions. + config INFINIBAND_SDP_DEBUG bool "Sockets Direct Protocol debugging" depends on INFINIBAND_SDP Index: linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_rcvd.c =================================================================== --- linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_rcvd.c (revision 4198) +++ linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_rcvd.c (working copy) @@ -439,6 +439,11 @@ sdp_advt_destroy(advt); } + +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY + /* There are no more src_avail, wake up any waiting thread */ + sdp_iocb_q_wakeup_complete(&conn->r_pend); +#endif /* * If there are active reads, mark the connection as being in * source cancel. Otherwise Index: linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_sock.h =================================================================== --- linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_sock.h (revision 4198) +++ linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_sock.h (working copy) @@ -61,7 +61,9 @@ #define SDP_ZCOPY_THRSH_SRC 257 /* Threshold for AIO write advertisments */ #define SDP_ZCOPY_THRSH_SNK 258 /* Threshold for AIO read advertisments */ #define SDP_ZCOPY_THRSH 256 /* Convenience for read and write */ - +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY +#define SDP_ZCOPY_CANCEL_TIMEOUT (HZ * 60) /* Time before abortive close */ +#endif /* * Default values for SDP specific socket options. 
(for reference) */ Index: linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_proto.h =================================================================== --- linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_proto.h (revision 4198) +++ linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_proto.h (working copy) @@ -152,7 +152,13 @@ void sdp_iocb_q_put_tail(struct sdpc_iocb_q *table, struct sdpc_iocb *iocb); struct sdpc_iocb *sdp_iocb_q_lookup(struct sdpc_iocb_q *table, u32 key); +struct sdpc_iocb *sdp_iocb_q_lookup_req(struct sdpc_iocb_q *table, struct kiocb *req); +struct sdpc_iocb *sdp_iocb_q_lookup_complete(struct sdpc_iocb_q *table, struct kiocb *req); +struct sdpc_iocb *sdp_iocb_q_wakeup_complete(struct sdpc_iocb_q *table); + +void sdp_iocb_q_mark_cancel(struct sdpc_iocb_q *table, struct kiocb *req); + void sdp_iocb_q_cancel(struct sdpc_iocb_q *table, u32 mask, ssize_t comp); void sdp_iocb_q_remove(struct sdpc_iocb *iocb); @@ -197,6 +203,8 @@ void *arg), void *arg); +int sdp_iocb_find_req(struct sdpc_desc *element, void *arg); + int sdp_desc_q_types_size(struct sdpc_desc_q *table, enum sdp_desc_type type); Index: linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_read.c =================================================================== --- linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_read.c (revision 4198) +++ linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_read.c (working copy) @@ -93,6 +93,12 @@ } } +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY + /* If there are no more src_avail, wake up any waiting thread */ + if (!conn->src_recv) + sdp_iocb_q_wakeup_complete(&conn->r_pend); + +#endif done: return 0; error: @@ -222,14 +228,23 @@ iocb->flags &= ~(SDP_IOCB_F_ACTIVE | SDP_IOCB_F_RDMA_R); - if (sk_sdp(conn)->sk_rcvlowat > iocb->post) - break; +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY + if (!iocb->len || (!conn->src_recv && iocb->post >= iocb->lowat)) +#else + if (iocb->post >= iocb->lowat) +#endif + { + /* + * complete IOCB + */ + SDP_CONN_STAT_READ_INC(conn, iocb->post); + SDP_CONN_STAT_RQ_DEC(conn, iocb->size); + /* + * callback to complete IOCB + */ + sdp_iocb_complete(sdp_iocb_q_get_head(&conn->r_pend), 0); + } - SDP_CONN_STAT_READ_INC(conn, iocb->post); - SDP_CONN_STAT_RQ_DEC(conn, iocb->size); - - sdp_iocb_complete(sdp_iocb_q_get_head(&conn->r_pend), 0); - break; default: sdp_warn("Unknown type <%d> at head of READ SRC queue. 
<%d>", Index: linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_send.c =================================================================== --- linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_send.c (revision 4198) +++ linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_send.c (working copy) @@ -122,6 +122,10 @@ send_param.send_flags |= IB_SEND_SIGNALED; conn->send_cons = 0; } + + if (buff->bsdh_hdr->mid == SDP_MID_SRC_CANCEL) + sdp_dbg_ctrl(conn, "SRC_CANCEL bsdh_hdr->seq_num = %d conn->send_seq=%d\n", + buff->bsdh_hdr->seq_num, conn->send_seq); /* * post send */ @@ -1680,8 +1684,8 @@ static int sdp_inet_write_cancel(struct kiocb *req, struct io_event *ev) { struct sock_iocb *si = kiocb_to_siocb(req); - struct sdp_sock *conn; struct sdpc_iocb *iocb; + struct sdp_sock *conn; int result = 0; sdp_dbg_ctrl(NULL, "Cancel Write IOCB user <%d> key <%d> flag <%08lx>", @@ -1810,7 +1813,151 @@ return result; } +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY +static int sdp_write_src_cancel(struct sdpc_desc *element, void *arg) +{ + struct sdpc_iocb *iocb = (struct sdpc_iocb *) element; + struct kiocb *req = (struct kiocb *)arg; + + if (element->type == SDP_DESC_TYPE_IOCB && iocb->req == req) + iocb->flags |= SDP_IOCB_F_CANCEL; + return -ERANGE; +} + +static int sdp_req_busy(struct sdp_sock *conn, struct sdpc_iocb_wait *wait) +{ + unsigned long flags; + int result = -EAGAIN; + + sdp_conn_lock(conn); + sdp_conn_unlock(conn); + + spin_lock_irqsave(&wait->lock, flags); + if (!wait->outstanding) + result = 0; + spin_unlock_irqrestore(&wait->lock, flags); + return result; +} /* + * sdp_write_cancel - cancel a synchronous IO operation + */ +static int sdp_write_cancel(struct kiocb *req, struct sdp_sock *conn, + struct sdpc_iocb_wait *wait) +{ + struct sdpc_iocb *iocb; + int result = 0; + + sdp_dbg_ctrl(NULL, "Cancel Write IOCB user <%d> key <%d> flag <%08lx>", + req->ki_users, req->ki_key, req->ki_flags); + + sdp_conn_lock(conn); + + sdp_dbg_ctrl(conn, "Cancel Write IOCB. <%08x:%04x> <%08x:%04x>", + conn->src_addr, conn->src_port, + conn->dst_addr, conn->dst_port); + /* + * attempt to find the IOCB for this key. we don't have an indication + * whether this is a read or write. + */ + + while ((iocb = (struct sdpc_iocb *) + sdp_desc_q_lookup(&conn->send_queue, sdp_iocb_find_req, req))) { + iocb->flags |= SDP_IOCB_F_CANCEL; + + /* + * always remove the IOCB. + * If active, then place it into the correct active queue + */ + sdp_desc_q_remove((struct sdpc_desc *)iocb); + + if (iocb->flags & SDP_IOCB_F_ACTIVE) { + if (iocb->flags & SDP_IOCB_F_RDMA_W) + sdp_desc_q_put_tail(&conn->w_snk, + (struct sdpc_desc *)iocb); + else { + SDP_EXPECT((iocb->flags & SDP_IOCB_F_RDMA_R)); + + sdp_iocb_q_put_tail(&conn->w_src, iocb); + } + } else { + /* + * empty IOCBs can be deleted, while partials + * needs to be compelted. + */ + if (iocb->post > 0) { + sdp_iocb_complete(iocb, 0); + result = -EAGAIN; + } else { + sdp_iocb_destroy(iocb); + + /* + * completion reference + */ + if (!iocb->wait) + aio_put_req(iocb->req); + else { + unsigned long flags; + spin_lock_irqsave(&iocb->wait->lock, flags); + --iocb->wait->outstanding; + /* No need to wake up, + since we call sdp_req_busy + directly below */ + + spin_unlock_irqrestore(&iocb->wait->lock, flags); + } + } + } + } + + /* + * check the sink queue, not much to do, since the operation is + * already in flight. 
+ */ + sdp_desc_q_lookup(&conn->w_snk, sdp_write_src_cancel, req); + + iocb = (struct sdpc_iocb *)sdp_desc_q_lookup(&conn->w_snk, + sdp_iocb_find_req, + req); + if (iocb) { + sdp_dbg_ctrl(conn, "Sink Queue busy\n"); + result = -EAGAIN; + } + + /* + * check source queue. If we're in the source queue, then a cancel + * needs to be issued. + */ + sdp_iocb_q_mark_cancel(&conn->w_src, req); + + iocb = sdp_iocb_q_lookup_req(&conn->w_src, req); + if (iocb) { + sdp_dbg_ctrl(conn, "Sending Src Cancel\n"); + + if (! (conn->flags & SDP_CONN_F_SRC_CANCEL_L)) { + sdp_desc_q_lookup(&conn->w_snk, sdp_write_src_cancel, req); + conn->flags |= SDP_CONN_F_SRC_CANCEL_L; + result = sdp_send_ctrl_src_cancel(conn); + SDP_EXPECT(result >= 0); + } + + result = -EAGAIN; + } + + if (!result) { + /* + * no IOCB found. Assume the IOCB will be completed. + */ + sdp_dbg_ctrl(conn, "Cancel IOCB done. <%d:%d:%08lx>", + req->ki_users, req->ki_key, req->ki_flags); + } + + sdp_conn_unlock(conn); + + return sdp_req_busy(conn, wait); +} +#endif + +/* * sdp_send_flush_advt - Flush passive sink advertisments */ static int sdp_send_flush_advt(struct sdp_sock *conn) @@ -1987,7 +2134,7 @@ return timeout; } -static inline int sdp_queue_iocb(struct kiocb *req, struct sdp_sock *conn, +static inline int sdp_queue_aio(struct kiocb *req, struct sdp_sock *conn, struct msghdr *msg, size_t size, size_t *copied) { @@ -2038,14 +2185,79 @@ return -EIOCBQUEUED; } +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY +static inline int sdp_queue_sync(struct kiocb *req, struct sdp_sock *conn, + struct msghdr *msg, size_t size, + size_t *copied, + struct sdpc_iocb_wait *wait) +{ + struct sdpc_iocb *iocb; + struct iovec *msg_iov; + unsigned long flags; + size_t len; + int result; + /* + * create IOCB with remaining space + */ + iocb = sdp_iocb_create(); + if (!iocb) { + sdp_dbg_warn(conn, "Failed to allocate IOCB <%Zu:%ld>", + size, (long)*copied); + return -ENOMEM; + } + + for (msg_iov = msg->msg_iov; !msg_iov->iov_len; ++msg_iov); + + /* FMR alignment can add an extra page. */ + len = min(msg_iov->iov_len, (size_t)SDP_IOCB_SIZE_MAX - 4096); + iocb->len = len; + iocb->post = 0; + iocb->size = len; + iocb->req = req; + iocb->key = req->ki_key; + iocb->addr = (unsigned long)msg_iov->iov_base; + iocb->wait = wait; + + result = sdp_iocb_lock(iocb); + if (result < 0) { + sdp_dbg_warn(conn, "Error <%d> locking IOCB <%Zu:%ld>", + result, size, (long)copied); + + sdp_iocb_destroy(iocb); + return result; + } + + SDP_CONN_STAT_WQ_INC(conn, iocb->size); + + result = sdp_send_data_queue(conn, (struct sdpc_desc *)iocb); + if (result < 0) { + sdp_dbg_warn(conn, "Error <%d> queueing write IOCB", result); + sdp_iocb_destroy(iocb); + return result; + } + + spin_lock_irqsave(&wait->lock, flags); + ++wait->outstanding; + spin_unlock_irqrestore(&wait->lock, flags); + + conn->send_pipe += len; + *copied += len; /* copied amount was saved in IOCB. 
*/ + msg_iov->iov_len -= len; + msg_iov->iov_base += len; + return 0; +} +#endif /* * sdp_inet_send - send data from user space to the network */ int sdp_inet_send(struct kiocb *req, struct socket *sock, struct msghdr *msg, size_t size) { - struct sock *sk; - struct sdp_sock *conn; +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY + struct sdpc_iocb_wait wait; +#endif + struct sock *sk; + struct sdp_sock *conn; int result = 0; size_t copied = 0; int oob, zcopy; @@ -2074,6 +2286,7 @@ if (conn->state == SDP_CONN_ST_LISTEN || conn->state == SDP_CONN_ST_CLOSED) { result = -ENOTCONN; + sdp_conn_unlock(conn); goto done; } /* @@ -2082,13 +2295,24 @@ * they are smaller then the zopy threshold, but only if there is * no buffer write space. */ - zcopy = (size >= conn->src_zthresh && !is_sync_kiocb(req)); + zcopy = (size >= conn->src_zthresh && (!is_sync_kiocb(req) +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY + || (!(msg->msg_flags & MSG_DONTWAIT) && !oob) +#endif + )); /* * clear ASYN space bit, it'll be reset if there is no space. */ if (!zcopy) clear_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags); +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY + else if (is_sync_kiocb(req)) { + init_waitqueue_head(&wait.wait); + spin_lock_init(&wait.lock); + wait.outstanding = 0; + } +#endif /* * process data first if window is open, next check conditions, then * wait if there is more work to be done. The absolute window size is @@ -2143,14 +2367,45 @@ * completion. Wait on sync IO call create IOCB for async * call. */ +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY + if (is_sync_kiocb(req) && zcopy) + result = sdp_queue_sync(req, conn, msg, size, &copied, + &wait); + /* TODO: limit the # of outstanding reqs */ + /* TODO: sleep on recoverable errors */ + else +#endif if (is_sync_kiocb(req)) timeout = sdp_wait_till_space(sk, conn, oob, timeout); else - result = sdp_queue_iocb(req, conn, msg, size, &copied); + result = sdp_queue_aio(req, conn, msg, size, &copied); } + sdp_conn_unlock(conn); + +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY + if (!result && is_sync_kiocb(req) && zcopy) { + timeout = wait_event_interruptible_timeout(wait.wait, + !sdp_req_busy(conn, &wait), timeout); + if (!timeout) + result = -EAGAIN; + } + + if (signal_pending(current) && is_sync_kiocb(req) && zcopy) { + result = (timeout > 0) ? sock_intr_errno(timeout) : -EAGAIN; + + timeout = wait_event_timeout(wait.wait, + !sdp_write_cancel(req, conn, &wait), + SDP_ZCOPY_CANCEL_TIMEOUT); + if (!timeout) { + sdp_warn("sdp_write_cancel timed out. Abort.\n"); + sdp_conn_lock(conn); + sdp_conn_abort(conn); + sdp_conn_unlock(conn); + } + } +#endif done: - sdp_conn_unlock(conn); result = ((copied > 0) ? copied : result); if (result == -EPIPE && !(msg->msg_flags & MSG_NOSIGNAL)) Index: linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_recv.c =================================================================== --- linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_recv.c (revision 4198) +++ linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_recv.c (working copy) @@ -327,6 +327,10 @@ iocb = sdp_iocb_q_look(&conn->r_pend); if (!iocb) return ENODEV; +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY + if (iocb->flags & SDP_IOCB_F_RO) + return ENODEV; +#endif /* * check zcopy threshold */ @@ -414,7 +418,11 @@ * loop posting RDMA reads, if there is room. 
*/ if (!sdp_iocb_q_size(&conn->r_pend)) - while (sdp_advt_q_size(&conn->src_pend) > 0 && + while( +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY + !sdp_desc_q_size(&conn->r_src) && +#endif + sdp_advt_q_size(&conn->src_pend) > 0 && conn->recv_max > sdp_buff_q_size(&conn->recv_pool) && conn->rwin_max > conn->byte_strm) { @@ -706,9 +714,8 @@ * b) the amount of data moved into the IOCB is greater then the * socket recv low water mark. */ - if (!iocb->len || - (!conn->src_recv && - !(sk_sdp(conn)->sk_rcvlowat > iocb->post))) { + if (!iocb->len || (!conn->src_recv && iocb->post >= iocb->lowat)) + { /* * complete IOCB */ @@ -1055,7 +1062,151 @@ return result; } +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY +static int sdp_req_busy(struct kiocb *req, struct sdp_sock *conn, + struct sdpc_iocb_wait *wait, size_t *copied) +{ + struct sdpc_iocb *iocb; + unsigned long flags; + int result = -EAGAIN; + int lowat_reached = 0; + + sdp_conn_lock(conn); + /* Unlock polls cqs */ + sdp_conn_unlock(conn); + + for (;;) { + spin_lock_irqsave(&wait->lock, flags); + iocb = sdp_iocb_q_get_head(&wait->q); + if (!iocb) + break; + --wait->outstanding; + + if (iocb->post >= iocb->lowat) + wait->lowat_reached = 1; + + lowat_reached = wait->lowat_reached; + + spin_unlock_irqrestore(&wait->lock, flags); + + *copied -= iocb->len; + sdp_iocb_release(iocb); + sdp_iocb_unlock(iocb); + sdp_iocb_destroy(iocb); + } + + if (!wait->outstanding) + result = 0; + + spin_unlock_irqrestore(&wait->lock, flags); + + /* Remove any outstanding iocbs which have their low watermark + satisfied */ + if (lowat_reached && result) { + sdp_conn_lock(conn); + if (!conn->src_recv) + while ((iocb = sdp_iocb_q_lookup_complete(&conn->r_pend, + req))) { + sdp_iocb_q_remove(iocb); + SDP_CONN_STAT_READ_INC(conn, iocb->post); + SDP_CONN_STAT_RQ_DEC(conn, iocb->size); + sdp_iocb_complete(iocb, 0); + } + sdp_conn_unlock(conn); + } + + return result; +} + /* + * sdp_read_cancel - cancel a synchronous IO operation + */ +static int sdp_read_cancel(struct kiocb *req, struct sdp_sock *conn, + struct sdpc_iocb_wait *wait, size_t *copied) +{ + struct sdpc_iocb *iocb; + sdp_dbg_ctrl(NULL, "Cancel Read IOCBs. user <%d> req <%p> flag <%08lx>", + req->ki_users, req, req->ki_flags); + + sdp_conn_lock(conn); + + sdp_dbg_ctrl(conn, "Cancel Read IOCBs. <%08x:%04x> <%08x:%04x>", + conn->src_addr, conn->src_port, + conn->dst_addr, conn->dst_port); + /* + * attempt to find the IOCB for this req. + */ + while ((iocb = sdp_iocb_q_lookup_req(&conn->r_pend, req))) { + /* + * always remove the IOCB. If active, then place it into + * the correct active queue. Inactive empty IOCBs can be + * deleted, while inactive partials needs to be compelted. + */ + sdp_iocb_q_remove(iocb); + + if (!(iocb->flags & SDP_IOCB_F_ACTIVE)) { + sdp_iocb_complete(iocb, 0); + goto unlock; + } + + if (iocb->flags & SDP_IOCB_F_RDMA_W) + sdp_iocb_q_put_tail(&conn->r_snk, iocb); + else { + SDP_EXPECT((iocb->flags & SDP_IOCB_F_RDMA_R)); + + sdp_desc_q_put_tail(&conn->r_src, + (struct sdpc_desc *)iocb); + } + } + /* + * check the source queue, not much to do, since the operation is + * already in flight. + */ + iocb = (struct sdpc_iocb *)sdp_desc_q_lookup(&conn->r_src, + sdp_iocb_find_req, req); + if (iocb) { + iocb->flags |= SDP_IOCB_F_CANCEL; + goto unlock; + } + /* + * check sink queue. If we're in the sink queue, then a cancel + * needs to be issued. 
+ */ + iocb = sdp_iocb_q_lookup_req(&conn->r_snk, req); + if (iocb) { + /* + * Unfortunetly there is only a course grain cancel in SDP, so + * we have to cancel everything. + */ + if (!(conn->flags & SDP_CONN_F_SNK_CANCEL)) { + int result; + + result = sdp_send_ctrl_snk_cancel(conn); + SDP_EXPECT(result >= 0); + + conn->flags |= SDP_CONN_F_SNK_CANCEL; + } + + iocb->flags |= SDP_IOCB_F_CANCEL; + + goto unlock; + } + /* + * no IOCB found. The cancel is probably in a race with a completion. + */ + sdp_dbg_ctrl(NULL, "Cancel read with no IOCB. <%d:%d:%08lx>", + req->ki_users, req->ki_key, req->ki_flags); + + +unlock: + sdp_conn_unlock(conn); + + return sdp_req_busy(req, conn, wait, copied); +} + +#endif + +/* * sdp_inet_recv - recv data from the network to user space */ int sdp_inet_recv(struct kiocb *req, struct socket *sock, struct msghdr *msg, @@ -1065,17 +1216,22 @@ struct sdp_sock *conn; struct sdpc_iocb *iocb; struct sdpc_buff *buff; - long timeout; + long timeout = 0 /*Turn off compiler warning */; size_t length; int result = 0; int expect; int low_water; - int copied = 0; + size_t copied = 0; int copy; int update; s8 oob = 0; s8 ack = 0; struct sdpc_buff_q peek_queue; +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY + int zcopy = 0; + struct sdpc_iocb_wait wait; + unsigned long f; +#endif sk = sock->sk; conn = sdp_sk(sk); @@ -1293,6 +1449,80 @@ /* * Either wait or create IOCB for defered completion. */ +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY + if (is_sync_kiocb(req) && !(flags & MSG_PEEK) && + (zcopy || size - copied >= conn->snk_zthresh) + /* && (conn->src_recv || + (low_water - copied >= conn->snk_zthresh)) */ ) { + struct iovec *msg_iov; + size_t len; + /* + * create IOCB with remaining space + */ + iocb = sdp_iocb_create(); + if (!iocb) { + sdp_dbg_warn(conn, + "Error allocating IOCB <%Zu:%Zd>", + size, copied); + result = -ENOMEM; + break; + } + + for (msg_iov = msg->msg_iov; !msg_iov->iov_len; ++msg_iov); + + /* FMR alignment can add an extra page. */ + len = min(msg_iov->iov_len, (size_t)SDP_IOCB_SIZE_MAX - 4096); + iocb->len = len; + iocb->post = 0; + iocb->size = len; + iocb->req = req; + iocb->key = req->ki_key; + iocb->addr = (unsigned long)msg_iov->iov_base; + if (copied >= low_water) + iocb->lowat = 0; + else + iocb->lowat = min_t(size_t, len, low_water - copied); + iocb->wait = &wait; + + iocb->flags |= SDP_IOCB_F_RECV | SDP_IOCB_F_RO; + + req->ki_cancel = sdp_inet_read_cancel; + + result = sdp_iocb_lock(iocb); + if (result < 0) { + sdp_dbg_warn(conn, + "Error <%d> IOCB lock <%Zu:%Zd>", + result, size, copied); + + sdp_iocb_destroy(iocb); + break; + } + + SDP_CONN_STAT_RQ_INC(conn, iocb->size); + + if (!zcopy) { + init_waitqueue_head(&wait.wait); + spin_lock_init(&wait.lock); + sdp_iocb_q_init(&wait.q); + wait.outstanding = 0; + wait.lowat_reached = copied >= low_water; + zcopy = 1; + } + + sdp_iocb_q_put_tail(&conn->r_pend, iocb); + + spin_lock_irqsave(&wait.lock, f); + ++wait.outstanding; + spin_unlock_irqrestore(&wait.lock, f); + + /* TODO: set it? 
*/ + ack = 1; + copied += len; + msg_iov->iov_len -= len; + msg_iov->iov_base += len; + break; + } else +#endif if (is_sync_kiocb(req)) { DECLARE_WAITQUEUE(wait, current); @@ -1325,7 +1555,7 @@ iocb = sdp_iocb_create(); if (!iocb) { sdp_dbg_warn(conn, - "Error allocating IOCB <%Zu:%d>", + "Error allocating IOCB <%Zu:%Zd>", size, copied); result = -ENOMEM; break; @@ -1338,7 +1568,7 @@ iocb->key = req->ki_key; iocb->addr = ((unsigned long)msg->msg_iov->iov_base - copied); - + iocb->lowat = low_water; iocb->flags |= SDP_IOCB_F_RECV; req->ki_cancel = sdp_inet_read_cancel; @@ -1346,7 +1576,7 @@ result = sdp_iocb_lock(iocb); if (result < 0) { sdp_dbg_warn(conn, - "Error <%d> IOCB lock <%Zu:%d>", + "Error <%d> IOCB lock <%Zu:%Zd>", result, size, copied); sdp_iocb_destroy(iocb); @@ -1383,5 +1613,28 @@ sdp_buff_q_put_head(&conn->recv_pool, buff); sdp_conn_unlock(conn); +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY + if (!result && is_sync_kiocb(req) && zcopy) { + timeout = wait_event_interruptible_timeout(wait.wait, + !sdp_req_busy(req, conn, &wait, &copied), timeout); + if (!timeout) + result = -EAGAIN; + } + + if (signal_pending(current) && is_sync_kiocb(req) && zcopy) { + result = (timeout > 0) ? sock_intr_errno(timeout) : -EAGAIN; + + timeout = wait_event_timeout(wait.wait, + !sdp_read_cancel(req, conn, &wait, &copied), + SDP_ZCOPY_CANCEL_TIMEOUT); + if (!timeout) { + sdp_warn("sdp_read_cancel timed out. Abort.\n"); + sdp_conn_lock(conn); + sdp_conn_abort(conn); + sdp_conn_unlock(conn); + } + } +#endif + return ((copied > 0) ? copied : result); } Index: linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_iocb.c =================================================================== --- linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_iocb.c (revision 4198) +++ linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_iocb.c (working copy) @@ -317,12 +317,23 @@ sdp_dbg_data(NULL, "IOCB complete. <%d:%d:%08lx> value <%ld>", iocb->req->ki_users, iocb->req->ki_key, iocb->req->ki_flags, value); + +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY + if (iocb->wait) { + unsigned long flags; + spin_lock_irqsave(&iocb->wait->lock, flags); + if (!--iocb->wait->outstanding) { + wake_up(&iocb->wait->wait); + } + spin_unlock_irqrestore(&iocb->wait->lock, flags); + } else +#endif + /* + * valid result can be 0 or 1 for complete so + * we ignore the value. + */ + (void)aio_complete(iocb->req, value, 0); /* - * valid result can be 0 or 1 for complete so - * we ignore the value. 
- */ - (void)aio_complete(iocb->req, value, 0); - /* * delete IOCB */ sdp_iocb_destroy(iocb); @@ -335,7 +346,19 @@ { iocb->status = status; - if (in_atomic() || irqs_disabled()) { +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY + if ((iocb->flags & SDP_IOCB_F_RECV) && iocb->wait) { + unsigned long flags; + spin_lock_irqsave(&iocb->wait->lock, flags); + sdp_iocb_q_put_tail(&iocb->wait->q, iocb); + /* Possible optimization: only wake + if no more outstanding iocbs or low watermark reached */ + wake_up(&iocb->wait->wait); + spin_unlock_irqrestore(&iocb->wait->lock, flags); + } else +#endif + if ((iocb->flags & SDP_IOCB_F_RECV) && + (in_atomic() || irqs_disabled())) { INIT_WORK(&iocb->completion, do_iocb_complete, (void *)iocb); schedule_work(&iocb->completion); } else @@ -392,6 +415,75 @@ return NULL; } +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY +struct sdpc_iocb *sdp_iocb_q_lookup_req(struct sdpc_iocb_q *table, struct kiocb *req) +{ + struct sdpc_iocb *iocb; + int counter; + + for (counter = 0, iocb = table->head; counter < table->size; + counter++, iocb = iocb->next) + if (iocb->req == req) + return iocb; + + return NULL; +} + +void sdp_iocb_q_mark_cancel(struct sdpc_iocb_q *table, struct kiocb *req) +{ + struct sdpc_iocb *iocb = NULL; + int counter; + + for (counter = 0, iocb = table->head; counter < table->size; + counter++, iocb = iocb->next) + if (iocb->req == req) + iocb->flags |= SDP_IOCB_F_CANCEL; + +} + +int sdp_iocb_find_req(struct sdpc_desc *element, void *arg) +{ + struct sdpc_iocb *iocb = (struct sdpc_iocb *) element; + struct kiocb *req = (struct kiocb *)arg; + + if (element->type == SDP_DESC_TYPE_IOCB && iocb->req == req) + return 0; + return -ERANGE; +} +#endif + +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY +struct sdpc_iocb *sdp_iocb_q_lookup_complete(struct sdpc_iocb_q *table, struct kiocb *req) +{ + struct sdpc_iocb *iocb; + int counter; + + for (counter = 0, iocb = table->head; counter < table->size; + counter++, iocb = iocb->next) + if (iocb->req == req && iocb->post >= iocb->lowat) + return iocb; + + return NULL; +} +struct sdpc_iocb *sdp_iocb_q_wakeup_complete(struct sdpc_iocb_q *table) +{ + struct sdpc_iocb *iocb; + unsigned long flags; + int counter; + + for (counter = 0, iocb = table->head; counter < table->size; + counter++, iocb = iocb->next) + if (iocb->wait && iocb->post >= iocb->lowat) { + spin_lock_irqsave(&iocb->wait->lock, flags); + iocb->wait->lowat_reached = 1; + spin_unlock_irqrestore(&iocb->wait->lock, flags); + wake_up(&iocb->wait->wait); + } + + return NULL; +} +#endif + /* * sdp_iocb_create - create an IOCB object */ Index: linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_iocb.h =================================================================== --- linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_iocb.h (revision 4198) +++ linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_iocb.h (working copy) @@ -55,6 +55,9 @@ #define SDP_IOCB_F_LOCKED 0x00000040 /* IOCB is locked in memory */ #define SDP_IOCB_F_REG 0x00000080 /* IOCB memory is registered */ #define SDP_IOCB_F_RECV 0x00000100 /* IOCB is for a receive request */ +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY +#define SDP_IOCB_F_RO 0x00000200 /* Suppress SinkAvail for this IOCB */ +#endif #define SDP_IOCB_F_ALL 0xFFFFFFFF /* IOCB all mask */ /* * zcopy constants. @@ -66,10 +69,12 @@ */ #define sdp_iocb_q_size(table) ((table)->size) +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY +struct sdpc_iocb_wait; +#endif /* * INET read/write IOCBs */ - /* * save a kvec read/write for processing once data shows up. 
*/ @@ -80,7 +85,7 @@ struct sdpc_iocb_q *table; /* table to which this iocb belongs */ void (*release)(struct sdpc_iocb *iocb); /* release the object */ /* - * iocb sepcific + * iocb specific */ int flags; /* usage flags */ /* @@ -89,6 +94,7 @@ u32 key; /* matches kiocb key for lookups */ int len; /* space left in the user buffer */ int post; /* amount of data requested so far. */ + int lowat; /* when to complete this IOCB (receive only). */ u64 wrid; /* work request completing this IOCB */ ssize_t status; /* status of completed iocb */ /* @@ -112,6 +118,9 @@ int page_offset; /* offset into first page. */ struct work_struct completion; /* task for defered completion. */ +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY + struct sdpc_iocb_wait *wait; +#endif /* * kernel iocb structure */ @@ -127,4 +136,17 @@ int size; /* current number of IOCBs in table */ }; +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY +/* Report completions here */ +struct sdpc_iocb_wait { + spinlock_t lock; + int outstanding; + wait_queue_head_t wait; +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY + struct sdpc_iocb_q q; /* Receive iocbs only */ + int lowat_reached; +#endif +}; + +#endif #endif /* _SDP_IOCB_H */ -- MST From halr at voltaire.com Thu Dec 1 06:20:17 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Dec 2005 09:20:17 -0500 Subject: [openib-general] [PATCH] [TRIVIAL] OpenSM: osm_sm_state_mgr.c Fix some error codes Message-ID: <1133446817.4325.44.camel@hal.voltaire.com> osm_sm_state_mgr.c Eliminate some error codes Signed-off-by: Hal Rosenstock Index: osm_sm_state_mgr.c =================================================================== --- osm_sm_state_mgr.c (revision 4278) +++ osm_sm_state_mgr.c (working copy) @@ -300,7 +300,7 @@ __osm_sm_state_mgr_start_polling( if( cl_status != CL_SUCCESS ) { osm_log( p_sm_mgr->p_log, OSM_LOG_ERROR, - "__osm_sm_state_mgr_start_polling : ERROR 1000: " + "__osm_sm_state_mgr_start_polling : ERR 3210: " "Failed to start timer\n" ); } @@ -379,7 +379,7 @@ __osm_sm_state_mgr_polling_callback( if( cl_status != CL_SUCCESS ) { osm_log( p_sm_mgr->p_log, OSM_LOG_ERROR, - "__osm_sm_state_mgr_polling_callback : ERROR 1000: " + "__osm_sm_state_mgr_polling_callback : ERR 3211: " "Failed to re-start timer\n" ); } From jlentini at netapp.com Thu Dec 1 06:51:58 2005 From: jlentini at netapp.com (James Lentini) Date: Thu, 1 Dec 2005 09:51:58 -0500 (EST) Subject: [openib-general] [uDAPL]Linking error of dapltest for uDAPL In-Reply-To: <7b2fa1820511301900q485c8990t40d3876c45d7d0b8@mail.gmail.com> References: <7b2fa1820511301900q485c8990t40d3876c45d7d0b8@mail.gmail.com> Message-ID: > I want to confirm that my uDAPL could run correctly, so I try to build the > dapltest. But I have got a linking error. It seems that there is something > wrong with the "ldat", but I failed to find where this "ldat" is referenced. > I am using the default Makefile. Did you type make in svn/gen2/trunk/src/userspace/dapl/dat/udat? This will produce the dat registry library file in the Obj subdirectory. From halr at voltaire.com Thu Dec 1 06:52:57 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Dec 2005 09:52:57 -0500 Subject: [openib-general] OpenSM: search_mgrp_by_mgid questions Message-ID: <1133448776.4325.179.camel@hal.voltaire.com> Hi Yael, osm_sa_path_record.c::__search_mgrp_by_mgid has the following: p_recvd_mgid = p_ctxt->p_mgid; p_rcv = p_ctxt->p_rcv; /* Why not compare the entire MGID ???? */ /* different scope can sneak in for the same MGID ? */ /* EZ: I changed it to full compare ! 
*/ if (cl_memcmp(&p_mgrp->mcmember_rec.mgid, p_recvd_mgid, sizeof(ib_gid_t))) return; whereas osm_sa_mcmember_record.c::__search_mgrp_by_mgid has the following: p_recvd_mcmember_rec = p_ctxt->p_mcmember_rec; p_rcv = p_ctxt->p_rcv; /* ignore groups marked for deletion */ if (p_mgrp->to_be_deleted) return; /* compare entire MGID so different scope will not sneak in for the same MGID */ if (cl_memcmp(&p_mgrp->mcmember_rec.mgid, &p_recvd_mcmember_rec->mgid, sizeof(ib_gid_t))) return; Shouldn't the SA PR code also check for "to be deleted" ? It also seems like the comments on the MGID comparison should also be made the same. -- Hal From jlentini at netapp.com Thu Dec 1 07:02:19 2005 From: jlentini at netapp.com (James Lentini) Date: Thu, 1 Dec 2005 10:02:19 -0500 (EST) Subject: [openib-general] Re: [PATCH][uDAPL] new provider with uCMA (librdmacm) support In-Reply-To: References: Message-ID: arlin> Here is a provider for the latest uCMA that is tested arlin> with dapltest, dtest, and Intel MPI. I also added a top arlin> level README with instructions. Default build is arlin> set for uCMA. Committed in revision 4279. From eitan at mellanox.co.il Thu Dec 1 07:28:20 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 1 Dec 2005 17:28:20 +0200 Subject: [openib-general] OpenSM: search_mgrp_by_mgid questions Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3618A47@mtlexch01.mtl.com> Hi Hal, You are very right. Thanks. Can you patch it? Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Thursday, December 01, 2005 4:53 PM > To: Yael Kalka > Cc: openib-general at openib.org > Subject: [openib-general] OpenSM: search_mgrp_by_mgid questions > > Hi Yael, > > osm_sa_path_record.c::__search_mgrp_by_mgid has the following: > > p_recvd_mgid = p_ctxt->p_mgid; > p_rcv = p_ctxt->p_rcv; > > /* Why not compare the entire MGID ???? */ > /* different scope can sneak in for the same MGID ? */ > /* EZ: I changed it to full compare ! */ > if (cl_memcmp(&p_mgrp->mcmember_rec.mgid, > p_recvd_mgid, > sizeof(ib_gid_t))) > return; > > whereas osm_sa_mcmember_record.c::__search_mgrp_by_mgid has the > following: > > p_recvd_mcmember_rec = p_ctxt->p_mcmember_rec; > p_rcv = p_ctxt->p_rcv; > > /* ignore groups marked for deletion */ > if (p_mgrp->to_be_deleted) > return; > > /* compare entire MGID so different scope will not sneak in for > the same MGID */ > if (cl_memcmp(&p_mgrp->mcmember_rec.mgid, > &p_recvd_mcmember_rec->mgid, > sizeof(ib_gid_t))) > return; > > Shouldn't the SA PR code also check for "to be deleted" ? It also seems > like the comments on the MGID comparison should also be made the same. > > -- Hal > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From eitan at mellanox.co.il Thu Dec 1 07:35:03 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 1 Dec 2005 17:35:03 +0200 Subject: [Fwd: [openib-general] OpenSM and Wrong SM_Key] Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3618A48@mtlexch01.mtl.com> Hi Yael, As I read through the MgtWg mails I get the impression that an out of spec mechanism is required to know if the other SM is trusted. 
In that case and since OpenSM does not currently provide any such mechanism, I would prefer never to send out the SM_Key on the request and always send zero. Sending our SM_Key to a non - trusted SM is not a good idea in my mind. OpenSM behavior should be to always trust any other SM. So any discovered SM that deserves to be the master should be granted that right. Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Yael Kalka > Sent: Thursday, December 01, 2005 2:17 PM > To: 'Hal Rosenstock'; Eitan Zahavi > Cc: openib-general at openib.org > Subject: RE: [Fwd: [openib-general] OpenSM and Wrong SM_Key] > > Hi Hal, Eitan, > I think the best option is to add an OpenSM option flag - exit_on_fatal. > This flag can decide on the action on fatal cases: > 1. Exit or not when seeing SM with different SM_Key. > 2. Exit or not when there is a fatal link error (e.g - multiple guids). > etc. > > I tried to run 2 SMs just now with different SM_keys, and I see that none of them > exit, since both receive SM_Key=0 on SMInfo GetResp. > The reason for that is that in the SMInfo Get request (as in all other requests) > we do not send anything in the mad data. Meaning - all fields are clear. > In the __osm_sminfo_rcv_process_get_request function we are checking the state > according > to the payload data. This is always zero! Thus - SM will never know that the SMInfo > request is sent from an SM that is master. > > I will work on a fix for that. > Yael > > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Wednesday, November 30, 2005 11:57 PM > To: Yael Kalka; Eitan Zahavi > Cc: openib-general at openib.org > Subject: [Fwd: [openib-general] OpenSM and Wrong SM_Key] > > > Hi Yael & Eitan, > > Based on the recent MgtWG discussions, are you still holding your > position in terms of exiting OpenSM when a non matching SM Key is > discovered ? Just wondering if I can issue a patch for this and clear > this issue so OpenSM can be compliant for this aspect. Thanks. > > -- Hal > > -----Forwarded Message----- > > From: Hal Rosenstock > To: openib-general at openib.org > Subject: [openib-general] OpenSM and Wrong SM_Key > Date: 08 Nov 2005 16:08:47 -0500 > > Hi, > > Currently, when OpenSM receives SMInfo with a different SM_Key, it exits > as follows: > > > void > __osm_sminfo_rcv_process_get_response( > IN const osm_sminfo_rcv_t* const p_rcv, > IN const osm_madw_t* const p_madw ) > { > ... > > > > /* > Check that the sm_key of the found SM is the same as ours, > or is zero. If not - OpenSM cannot continue with configuration!. */ > if ( p_smi->sm_key != 0 && > p_smi->sm_key != p_rcv->p_subn->opt.sm_key ) > { > osm_log( p_rcv->p_log, OSM_LOG_ERROR, > "__osm_sminfo_rcv_process_get_response: ERR 2F18: " > "Got SM with sm_key that doesn't match our " > "local key. Exiting\n" ); > osm_log( p_rcv->p_log, OSM_LOG_SYS, > "Found remote SM with non-matching sm_key. Exiting\n" ); > osm_exit_flag = TRUE; > goto Exit; > } > > C14-61.2.1 states that: > A master SM which finds a higher priority master SM with the wrong > SM_Key should not relinquish the subnet. > > Exiting OpenSM relinquishes the subnet. > > So it appears to me that perhaps this behavior of exiting OpenSM should > be at least contingent on the SM state and relative priority of the > SMInfo received. Make sense ? If so, I will work on a patch for this. 
> > -- Hal > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From Arkady.Kanevsky at netapp.com Thu Dec 1 07:29:56 2005 From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady) Date: Thu, 1 Dec 2005 10:29:56 -0500 Subject: [swg] RE: [openib-general] socket based connection model for IB proposal - round 4 Message-ID: agreed. Arkady Kanevsky email: arkady at netapp.com Network Appliance Inc. phone: 781-768-5395 275 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300 > -----Original Message----- > From: Sean Hefty [mailto:mshefty at ichips.intel.com] > Sent: Wednesday, November 30, 2005 12:59 PM > To: Yaron Haviv > Cc: Kanevsky, Arkady; Ted H. Kim; swg at infinibandta.org; > openib-general at openib.org > Subject: Re: [swg] RE: [openib-general] socket based > connection model for IB proposal - round 4 > > Yaron Haviv wrote: > > How about using ARP to get from IP to DGID+Partition, followed by an > > SIDR to map DGID+PKey+Service to QKey & QP > > > > It is the same concept as CMA that first uses the IP stack (ARP > etc') to > > get to the remote end-point (in that case a GID+PKey combination) > > followed by SA-PR and CM REQ, we just substitute the CM REQ with a > > SIDR REQ. It may not solve all the cases but probably most of the > > practical ones > > This was my thought as well. > > > Anyway the packets will need to carry some header (since it's not a > > connected model), you can add more stuff in that header > (e.g. can use > > the IPoIB header as is, which already contains the src/dst IP) > > I was assuming that each packet would need to carry some sort > of header. > > At this point, we may want to defer defining anything for UDP > until there's a better understanding of what an application > would want. My guess is that such an application will need > new APIs for posting sends based on UDP addressing. > > - Sean > From Arkady.Kanevsky at netapp.com Thu Dec 1 07:33:16 2005 From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady) Date: Thu, 1 Dec 2005 10:33:16 -0500 Subject: [openib-general] socket based connection model for IB - round 5 Message-ID: Here is the fifth and, I hope, the final version of the proposal. The changes from the previous version: 1. IBTA bit numbering scheme (reverse order) 2. Protocol version is split into major and minor with 4 bits each. Arkady Arkady Kanevsky email: arkady at netapp.com Network Appliance Inc. phone: 781-768-5395 275 Totten Pond Rd. Fax: 781-895-1195 Waltham, MA 02451-2010 central phone: 781-768-5300 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: IP Address Support by InfiniBand CM_v5.pdf Type: application/octet-stream Size: 24635 bytes Desc: IP Address Support by InfiniBand CM_v5.pdf URL: From eitan at mellanox.co.il Thu Dec 1 07:41:20 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 1 Dec 2005 17:41:20 +0200 Subject: [openib-general] First Multicast Leave disconnects all other clients Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3618A49@mtlexch01.mtl.com> Hi Hal, SRP uses InformInfo to get notification about new or lost ports (trap 64/65) so that new targets are recognized without a periodic SA query. I do not know if that code already found its way to OpenIB.
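As a sketch of the missing piece, the registration call SRP wants might look like the following (hypothetical signatures only - no such API exists in the tree today; struct ib_device and union ib_gid are assumed from <rdma/ib_verbs.h>):

struct ib_inform_reg;	/* opaque handle returned by the agent */

/* Called for each Report matching the subscription;
 * trap 64 = GID in service, trap 65 = GID out of service. */
typedef void (*ib_report_handler_t)(void *context, u16 trap_num,
				    union ib_gid *source_gid);

/* The agent behind these calls would track registrations per client,
 * refcount duplicate subscriptions, answer each Report with a single
 * ReportRepress, and replay the InformInfo Set after a
 * ClientReregistration event. */
struct ib_inform_reg *ib_reg_inform(struct ib_device *device, u8 port_num,
				    u16 trap_num, ib_report_handler_t handler,
				    void *context);
void ib_unreg_inform(struct ib_inform_reg *reg);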
I do not think it is relevant to that discussion about missing APIs. Maybe to the priority of implementation. But IMO - until we do provide that missing capabilities we are actually preventing SRP and other ulps from doing the right thing and causing them to duplicate "Client Reregistration" handlers and periodic queries . The bottom line: Do you agree we are missing these API's? When can we get those done? By whom? EZ Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Thursday, December 01, 2005 8:20 AM > To: Eitan Zahavi > Cc: OPENIB GENERAL; Yael Kalka; Aviram Gutman; Tziporet Koren; Roland Dreier; > sean.hefty at intel.com > Subject: RE: [openib-general] First Multicast Leave disconnects all other clients > > On Thu, 2005-12-01 at 01:07, Eitan Zahavi wrote: > > > > > > > > The bottom line: > > > > We are missing 3 agents in the OpenIB stack: > > > > InformInfo - handling registrations and Report dispatching > > > > > > These are not currently used. > > [EZ] They are by SRP initiator. > > Not the SRP initiator in OpenIB svn as far as I can tell. > > > > > ServiceRecord - tracks registrations > > > > > > ServiceRecord is implemented in sa_query (and was used by AT/uAT but > > > that is largely historical now) > > > > > > > Multicast Join/Leave - tracking registrations to multicast groups > > and > > > > ref-counting > > > > > > > > All these agents should be able to cleanup dead client registrations > > and > > > > also provide re-registration in case of SM ClientReregistration > > event. > > > > > > In OpenIB, any Set of PortInfo (which includes ClientReregister) > > > currently causes a (coarse) event (LID change) which causes IPoIB > > client > > > to reregister its multicasts registrations with the SA. > > > > > > > Please see below > > > > > > > > > > > > It seems the IBTA intent was that the IB driver will be > > responsible > > > > for maintaining > > > > > the list of clients > > > > > > registered to each group. > > > > > > > > > > Yes, the end node is responsible for tracking the registrations > > within > > > > > the node and fabricating responses when the node does not want to > > > > leave. > > > > > Is delete a different case though ? > > > > [EZ] No it is not. Delete of multicast group is really the last > > leave. > > > > > > There is an explicit delete. While it shouldn't be needed to be > > forced, > > > there is always some scenario where this is useful. > > [EZ] To my best knowledge any leave is a "delete" so there is no way for > > any client to force other members out of a group. It can only leave > > itself. The delete will happen when the last will leave. > > Yes, you are right, other than the last full member (join state) rule. > > > > > > > But the IB core does not track what clients registered (through > > SA > > > > requests) to a > > > > > particular multicast group. > > > > > > The first client to leave the group causes the rest (of the > > clients) > > > > to be disconnected. > > > > > > > > > > This is an implementation issue IMO and applies to other > > subscriptions > > > > > too (not just limited to multicast). > > > > [EZ] I agree it is an implementation issue. I hope it will get > > > > implemented in OpenIB. > > > > > > It will. It's a question of priorities and timing. 
> > > > > > > > > My proposal is to provide an API for such registrations at both > > user > > > > and kernel and > > > > > track the requesting processes. > > > > > > Cleanup is also required both by process and kernel module > > > > granularity. > > > > > > > > > > Is the API the SA client request itself for this ? Shouldn't the > > > > > tracking be done there (within sa_query.c) ? > > > > [EZ] It will be hard to sniff the MADs (especially user level) for > > all > > > > the registration flows. > > > > > > It's not the sniffing which is hard but perhaps identifying which > > client > > > (and reference counting). > > > > > > > So I propose we should have > > > > > > ib_join/ib_leave/ib_reg_svc/ib_unreg_svc/ib_reg_inform/ib_unreg_inform. > > > > Both in user land and in kernel. > > > > > > I think this is TBD and the API would be discussed on this list first > > > prior to any implementation. > > > > > > > > > BTW: The same API could also handle "Client Reregistration" for > > > > multicast groups, > > > > > > > > > > Client reregistration is for all subscriptions (including > > > > ServiceRecords > > > > > and events as well). > > > > [EZ] Yes exactly. I believe similar problem exists for all > > > > registrations. > > > > > > > > > > > such that we could avoid the need to have that code duplicated > > by > > > > every client. > > > > > > > > > > I'm missing how client reregistration would help here. Can you > > > > elaborate > > > > > ? > > > > [EZ] It is related to the reference tracking: > > > > If a kernel module tracks all registrations to refcount them and > > perform > > > > cleanup, it could with similar effort also send the - > > re-registration in > > > > the event of SM change ... > > > > > > Sure, there are multiple ways to skin the same cat. > > > > > > > > > > > > > > But this refers to yet another API that is missing: Report > > > > dispatching which deserves > > > > > its own > > > > > > mail... > > > > > > > > > > I'm missing the connection between reregistration and report > > > > > dispatching. > > > > [EZ] Sorry for not being verbose. The need for Events dispatcher is > > > > based on the fact that only one client should respond to Report with > > > > ReportRepress. Reports are "unsolicited" MADs coming into the > > device. In > > > > umad the implementation prevents any "multiple" client registration > > for > > > > receiving any "unsolicited" MAD - only one class-agent needs to be > > there > > > > handling "unsolicited" messages. This is fine - but what it means is > > > > that when two clients wants to be notified about events they should > > > > register with that agent and the agent should be able to dispatch > > the > > > > message to all registered clients as well as send only one response > > > > back. > > > > > > Wouldn't report represses be reference counted and only actually sent > > on > > > the wire when all subscribed clients within the node indicated repress > > ? > > [EZ] As you say there are many ways to skin a cat. I am not sure we need > > to wait for all clients as they are located on the same node and will be > > surely notified. > > Right, it just needs to be done once whether it was actually delivered > to any client, clients, or none at all. 
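To make the dispatch model being discussed concrete, a minimal sketch follows — all names are hypothetical, since this is exactly the agent the thread says does not yet exist: one class agent owns the unsolicited Report MADs, fans them out to every locally registered client, and represses exactly once on the wire:

    /* hypothetical event dispatcher; struct ib_mad is real, the rest
     * of the names are made up for illustration */
    static void notice_report_handler(struct notice_agent *agent,
                                      struct ib_mad *report)
    {
        struct notice_client *c;

        spin_lock(&agent->lock);
        list_for_each_entry(c, &agent->clients, list)
            c->callback(c->context, report);  /* local fan-out (a real
                                                 agent would defer this
                                                 outside the lock) */
        spin_unlock(&agent->lock);

        send_report_repress(agent, report);   /* once, regardless of
                                                 how many clients saw it */
    }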
> > -- Hal From iod00d at hp.com Thu Dec 1 08:16:00 2005 From: iod00d at hp.com (Grant Grundler) Date: Thu, 1 Dec 2005 08:16:00 -0800 Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable Datagram Sockets) to OpenIB In-Reply-To: <52u0dvt4vx.fsf@cisco.com> References: <52d5li8waw.fsf@cisco.com> <52u0dvt4vx.fsf@cisco.com> Message-ID: <20051201161600.GA32308@esmail.cup.hp.com> On Tue, Nov 29, 2005 at 03:23:46PM -0800, Roland Dreier wrote: > Any progress to report on the port of RDS from the SilverStorm > proprietary stack to the standard Linux stack? I think it would > really move the discussion forward if there were some code that people > could build and use. As primary consumer of RDS, I think Oracle first needs to decide if the deficiencies that Mike Krause pointed out are acceptable or not. grant From halr at voltaire.com Thu Dec 1 09:02:25 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Dec 2005 12:02:25 -0500 Subject: [Fwd: [openib-general] OpenSM and Wrong SM_Key] In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E3618A48@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E3618A48@mtlexch01.mtl.com> Message-ID: <1133456350.4325.766.camel@hal.voltaire.com> Hi Eitan, On Thu, 2005-12-01 at 10:35, Eitan Zahavi wrote: > Hi Yael, > > As I read through the MgtWg mails I get the impression that an out of > spec mechanism is required to know if the other SM is trusted. Yes, that was what I was proposing (in http://openib.org/pipermail/openib-general/2005-December/014186.html where I wrote "The SM needs a way to know whether the other SM(s) (and which ones) are trusted or not so the SM_Key can be filled in."): that OpenSM have a list of trusted SMs and OpenSM would use that information. > In that case and since OpenSM does not currently provide any such > mechanism, I would prefer never to send out the SM_Key on the request > and always send zero. Sending our SM_Key to a non - trusted SM is not a > good idea in my mind. > > OpenSM behavior should be to always trust any other SM. Above you said no other SM was trusted so do you mean not trust rather than trust other SMs ? > So any discovered SM that deserves to be the master should be granted > that right. Only if it were trusted and had the correct SM Key. -- Hal > Eitan Zahavi > Design Technology Director > Mellanox Technologies LTD > Tel:+972-4-9097208 > Fax:+972-4-9593245 > P.O. Box 586 Yokneam 20692 ISRAEL > > > > -----Original Message----- > > From: Yael Kalka > > Sent: Thursday, December 01, 2005 2:17 PM > > To: 'Hal Rosenstock'; Eitan Zahavi > > Cc: openib-general at openib.org > > Subject: RE: [Fwd: [openib-general] OpenSM and Wrong SM_Key] > > > > Hi Hal, Eitan, > > I think the best option is to add an OpenSM option flag - > exit_on_fatal. > > This flag can decide on the action on fatal cases: > > 1. Exit or not when seeing SM with different SM_Key. > > 2. Exit or not when there is a fatal link error (e.g - multiple > guids). > > etc. > > > > I tried to run 2 SMs just now with different SM_keys, and I see that > none of them > > exit, since both receive SM_Key=0 on SMInfo GetResp. > > The reason for that is that in the SMInfo Get request (as in all other > requests) > > we do not send anything in the mad data. Meaning - all fields are > clear. > > In the __osm_sminfo_rcv_process_get_request function we are checking > the state > > according > > to the payload data. This is always zero! Thus - SM will never know > that the SMInfo > > request is sent from an SM that is master. > > > > I will work on a fix for that. 
> > Yael > > > > -----Original Message----- > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > Sent: Wednesday, November 30, 2005 11:57 PM > > To: Yael Kalka; Eitan Zahavi > > Cc: openib-general at openib.org > > Subject: [Fwd: [openib-general] OpenSM and Wrong SM_Key] > > > > > > Hi Yael & Eitan, > > > > Based on the recent MgtWG discussions, are you still holding your > > position in terms of exiting OpenSM when a non matching SM Key is > > discovered ? Just wondering if I can issue a patch for this and clear > > this issue so OpenSM can be compliant for this aspect. Thanks. > > > > -- Hal > > > > -----Forwarded Message----- > > > > From: Hal Rosenstock > > To: openib-general at openib.org > > Subject: [openib-general] OpenSM and Wrong SM_Key > > Date: 08 Nov 2005 16:08:47 -0500 > > > > Hi, > > > > Currently, when OpenSM receives SMInfo with a different SM_Key, it > exits > > as follows: > > > > > > void > > __osm_sminfo_rcv_process_get_response( > > IN const osm_sminfo_rcv_t* const p_rcv, > > IN const osm_madw_t* const p_madw ) > > { > > ... > > > > > > > > /* > > Check that the sm_key of the found SM is the same as ours, > > or is zero. If not - OpenSM cannot continue with configuration!. > */ > > if ( p_smi->sm_key != 0 && > > p_smi->sm_key != p_rcv->p_subn->opt.sm_key ) > > { > > osm_log( p_rcv->p_log, OSM_LOG_ERROR, > > "__osm_sminfo_rcv_process_get_response: ERR 2F18: " > > "Got SM with sm_key that doesn't match our " > > "local key. Exiting\n" ); > > osm_log( p_rcv->p_log, OSM_LOG_SYS, > > "Found remote SM with non-matching sm_key. Exiting\n" ); > > osm_exit_flag = TRUE; > > goto Exit; > > } > > > > C14-61.2.1 states that: > > A master SM which finds a higher priority master SM with the wrong > > SM_Key should not relinquish the subnet. > > > > Exiting OpenSM relinquishes the subnet. > > > > So it appears to me that perhaps this behavior of exiting OpenSM > should > > be at least contingent on the SM state and relative priority of the > > SMInfo received. Make sense ? If so, I will work on a patch for this. > > > > -- Hal > > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general From halr at voltaire.com Thu Dec 1 09:19:31 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Dec 2005 12:19:31 -0500 Subject: [openib-general] OpenSM: search_mgrp_by_mgid questions In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E3618A47@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E3618A47@mtlexch01.mtl.com> Message-ID: <1133457475.4325.888.camel@hal.voltaire.com> Hi Eitan, On Thu, 2005-12-01 at 10:28, Eitan Zahavi wrote: > Hi Hal, > > You are very right. Thanks. Can you patch it? Sure. Any prefereance for which way should the comment be (like PR or MCM) ? -- Hal > Eitan Zahavi > Design Technology Director > Mellanox Technologies LTD > Tel:+972-4-9097208 > Fax:+972-4-9593245 > P.O. Box 586 Yokneam 20692 ISRAEL > > > > -----Original Message----- > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > Sent: Thursday, December 01, 2005 4:53 PM > > To: Yael Kalka > > Cc: openib-general at openib.org > > Subject: [openib-general] OpenSM: search_mgrp_by_mgid questions > > > > Hi Yael, > > > > osm_sa_path_record.c::__search_mgrp_by_mgid has the following: > > > > p_recvd_mgid = p_ctxt->p_mgid; > > p_rcv = p_ctxt->p_rcv; > > > > /* Why not compare the entire MGID ???? 
*/ > > /* different scope can sneak in for the same MGID ? */ > > /* EZ: I changed it to full compare ! */ > > if (cl_memcmp(&p_mgrp->mcmember_rec.mgid, > > p_recvd_mgid, > > sizeof(ib_gid_t))) > > return; > > > > whereas osm_sa_mcmember_record.c::__search_mgrp_by_mgid has the > > following: > > > > p_recvd_mcmember_rec = p_ctxt->p_mcmember_rec; > > p_rcv = p_ctxt->p_rcv; > > > > /* ignore groups marked for deletion */ > > if (p_mgrp->to_be_deleted) > > return; > > > > /* compare entire MGID so different scope will not sneak in for > > the same MGID */ > > if (cl_memcmp(&p_mgrp->mcmember_rec.mgid, > > &p_recvd_mcmember_rec->mgid, > > sizeof(ib_gid_t))) > > return; > > > > Shouldn't the SA PR code also check for "to be deleted" ? It also > seems > > like the comments on the MGID comparison should also be made the same. > > > > -- Hal > > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general From Richard.Frank at oracle.com Thu Dec 1 10:02:37 2005 From: Richard.Frank at oracle.com (Richard Frank) Date: Thu, 01 Dec 2005 13:02:37 -0500 Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable Datagram Sockets) to OpenIB In-Reply-To: <20051201161600.GA32308@esmail.cup.hp.com> References: <52d5li8waw.fsf@cisco.com> <52u0dvt4vx.fsf@cisco.com> <20051201161600.GA32308@esmail.cup.hp.com> Message-ID: <1133460157.6456.44.camel@localhost.localdomain> We do not see any deficiencies - the RDS specification and current implementation so far meet our requirements and is working very well. There is more we will want to do further down the road - such as access the RDS sockets via AIO so we can add zero copy support. On Thu, 2005-12-01 at 08:16 -0800, Grant Grundler wrote: > On Tue, Nov 29, 2005 at 03:23:46PM -0800, Roland Dreier wrote: > > Any progress to report on the port of RDS from the SilverStorm > > proprietary stack to the standard Linux stack? I think it would > > really move the discussion forward if there were some code that people > > could build and use. > > As primary consumer of RDS, I think Oracle first needs to decide if > the deficiencies that Mike Krause pointed out are acceptable or not. > > grant > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From mshefty at ichips.intel.com Thu Dec 1 10:28:00 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 01 Dec 2005 10:28:00 -0800 Subject: [openib-general] Re: spinlock wrong CPU on CPU#1, ib_addr In-Reply-To: References: Message-ID: <438F40B0.4010000@ichips.intel.com> Or Gerlitz wrote: > BUG: spinlock wrong CPU on CPU#1, ib_addr/3866 lock: ffffffff88073428, > .magic: dead4ead, .owner: ib_addr/3866, .owner_cpu: 0 > > Call Trace:{_raw_spin_unlock+112} > {:ib_iser:iser_adaptor_find_by_device+188} Based on my efforts trying to decipher the code, it looks like the adaptor_list_lock in iser_adaptor_find_device() was acquired while running on CPU 0, but an attempt was made to release it on CPU 1. I think that the .owner field above is simply referring to the fact that the thread was created by the ib_addr module. Maybe someone more familiar with the spinlock code can confirm this? 
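For readers following along: the debug check fires when a lock is released on a different CPU than the one recorded at acquire time, which can happen if the task sleeps and migrates while holding the lock. A contrived sketch of that pattern — the names below are made up, this is not the actual iSER code:

    struct iser_adaptor *adaptor;   /* hypothetical type */

    spin_lock(&adaptor_list_lock);              /* acquired on CPU 0 */
    list_for_each_entry(adaptor, &adaptor_list, list) {
        if (do_something_that_sleeps(adaptor))  /* schedule() may run and
                                                   migrate us to CPU 1 */
            break;
    }
    spin_unlock(&adaptor_list_lock);   /* CONFIG_DEBUG_SPINLOCK: "wrong CPU" */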
Also, what happens if an interrupt occurs on the processor while this spinlock is held? A couple of things to try would be: changing from spin_lock to the spin_lock_irqsave, or using a mutex. I didn't notice where there were any errors in ib_addr, the CMA, or iSER. Maybe this is an issue in the spinlock debugging code? How easily is this reproduced? Does the system continue to operate after you see this? - Sean From mshefty at ichips.intel.com Thu Dec 1 11:39:28 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 01 Dec 2005 11:39:28 -0800 Subject: [openib-general] [PATCH] [CM] add private data comparison to match REQs with listens In-Reply-To: References: Message-ID: <438F5170.3000208@ichips.intel.com> Sean Hefty wrote: > The following patch modifies the kernel CM API to support matching > private data in received REQs against listen requests. This allows the > CM to support multiple listeners on the same service ID if a > discriminator is carried in the private data. As an update: further testing revealed that there is an issue with this implementation that is also found in the original code. The issue deals with how listen requests that rely on a data mask are inserted and located in the red/black tree. I'm trying to come up with a fix for this. - Sean From rpandit at silverstorm.com Thu Dec 1 15:16:06 2005 From: rpandit at silverstorm.com (Ranjit Pandit) Date: Thu, 1 Dec 2005 15:16:06 -0800 Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable Datagram Sockets) to OpenIB In-Reply-To: <1133460157.6456.44.camel@localhost.localdomain> References: <52d5li8waw.fsf@cisco.com> <52u0dvt4vx.fsf@cisco.com> <20051201161600.GA32308@esmail.cup.hp.com> <1133460157.6456.44.camel@localhost.localdomain> Message-ID: <96f8e60e0512011516g784d7740j1032e05604519d93@mail.gmail.com> I'm shooting to get something out by mid December. Ranjit On 12/1/05, Richard Frank wrote: > We do not see any deficiencies - the RDS specification and current > implementation so far meet our requirements and is working very well. > > There is more we will want to do further down the road - such as access > the RDS sockets via AIO so we can add zero copy support. > > > On Thu, 2005-12-01 at 08:16 -0800, Grant Grundler wrote: > > On Tue, Nov 29, 2005 at 03:23:46PM -0800, Roland Dreier wrote: > > > Any progress to report on the port of RDS from the SilverStorm > > > proprietary stack to the standard Linux stack? I think it would > > > really move the discussion forward if there were some code that people > > > could build and use. > > > > As primary consumer of RDS, I think Oracle first needs to decide if > > the deficiencies that Mike Krause pointed out are acceptable or not. > > > > grant > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From dledford at redhat.com Thu Dec 1 18:50:06 2005 From: dledford at redhat.com (Doug Ledford) Date: Thu, 01 Dec 2005 21:50:06 -0500 Subject: [openib-general] Announce: Updated packages available Message-ID: <438FB65E.50406@redhat.com> I've added to the list of available packages. 
In addition to libibverbs, libmthca, libsdp, and opensm, we now have udapl compiled. We also have an update initscripts package for RHEL-4 that enables static IP setups on ipoib interfaces and works at boot time. In addition, all the user space tools have been revved up to svn rev 4265. The kernel has not been recompiled since the last one and is still at 3965. I hope to get an updated kernel sometime tomorrow. http://people.redhat.com/dledford/Infiniband From yaronh at voltaire.com Fri Dec 2 05:49:00 2005 From: yaronh at voltaire.com (Yaron Haviv) Date: Fri, 2 Dec 2005 15:49:00 +0200 Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable DatagramSockets) to OpenIB Message-ID: <35EA21F54A45CB47B879F21A91F4862F8FF0B6@taurus.voltaire.com> > -----Original Message----- > From: openib-general-bounces at openib.org [mailto:openib-general- > bounces at openib.org] On Behalf Of Richard Frank > Sent: Thursday, December 01, 2005 1:03 PM > To: Grant Grundler > Cc: openib-general at openib.org > Subject: Re: [openib-general] [ANNOUNCE] Contribute RDS (Reliable > DatagramSockets) to OpenIB > > We do not see any deficiencies - the RDS specification and current > implementation so far meet our requirements and is working very well. > > There is more we will want to do further down the road - such as access > the RDS sockets via AIO so we can add zero copy support. > Richard, In the document you published few weeks ago you listed latency and CPU% as key goals I assume to really get the latency down you need a user space implementation that can leverage on pooling, any plans to work in user space ? Several other comments/suggestions if I may add (may already took them into account): As a UDP consumer isn't there a need to support Multicast as well, and potentially leverage on IB multicast for scalability ? I feel that there is not much benefit in eliminating the reliability checks in the upper (UDP) consumer, since its negligible in CPU or latency overhead, you may even just go with a UC implementation, also UDP consumers may want to use RDS without modifying the application, or may accept dropped packets or over subscription (since they are interested in the most recent data). And it is very important to tie the RDS implementation to the IP stack for routing information/resolution, ARPs, etc' So it would become transparent from the mng/configuration side as well, not requiring separate configuration files, or dealing better with dynamic environments and failures like a real UDP would. Yaron > > On Thu, 2005-12-01 at 08:16 -0800, Grant Grundler wrote: > > On Tue, Nov 29, 2005 at 03:23:46PM -0800, Roland Dreier wrote: > > > Any progress to report on the port of RDS from the SilverStorm > > > proprietary stack to the standard Linux stack? I think it would > > > really move the discussion forward if there were some code that people > > > could build and use. > > > > As primary consumer of RDS, I think Oracle first needs to decide if > > the deficiencies that Mike Krause pointed out are acceptable or not. 
> > > > grant > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib- > general > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib- > general From halr at voltaire.com Fri Dec 2 06:46:55 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Dec 2005 09:46:55 -0500 Subject: [openib-general] [PATCH] OpenSM: In osm_sa_path_record.c::__search_mgrp_by_mgid, ignore groups marked for deletion Message-ID: <1133534815.4337.57.camel@hal.voltaire.com> In osm_sa_path_record.c::__search_mgrp_by_mgid, ignore groups marked for deletion Signed-off-by: Hal Rosenstock Index: osm_sa_path_record.c =================================================================== --- osm_sa_path_record.c (revision 4280) +++ osm_sa_path_record.c (working copy) @@ -1230,6 +1230,10 @@ __search_mgrp_by_mgid( p_recvd_mgid = p_ctxt->p_mgid; p_rcv = p_ctxt->p_rcv; + /* ignore groups marked for deletion */ + if (p_mgrp->to_be_deleted) + return; + /* Why not compare the entire MGID ???? */ /* different scope can sneak in for the same MGID ? */ /* EZ: I changed it to full compare ! */ From halr at voltaire.com Fri Dec 2 08:10:51 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Dec 2005 11:10:51 -0500 Subject: [openib-general] First Multicast Leave disconnects all other clients In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E3618A49@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E3618A49@mtlexch01.mtl.com> Message-ID: <1133539562.4337.454.camel@hal.voltaire.com> On Thu, 2005-12-01 at 10:41, Eitan Zahavi wrote: > Hi Hal, > > SRP uses InformInfo to get notification about new or lost ports (trap > 64/65) such that new targets are recognized without periodic SA query. > I do not know if that code already found its way to OpenIB. It hasn't. > I do not think it is relevant to that discussion about missing APIs. > Maybe to the priority of implementation. But IMO - until we do provide > that missing capabilities we are actually preventing SRP and other ulps > from doing the right thing and causing them to duplicate "Client > Reregistration" handlers and periodic queries . > The bottom line: Do you agree we are missing these API's? Yes, OpenIB is missing this functionality. I vaguely recall having this discussion with you on the list a while ago... What shape the API would take is a discussion for this list. Is it an extension to the existing SA client API ? > When can we get those done? By whom? That's also a discussion for this list. Anyone else care to comment ? -- Hal > EZ > > Eitan Zahavi > Design Technology Director > Mellanox Technologies LTD > Tel:+972-4-9097208 > Fax:+972-4-9593245 > P.O. 
Box 586 Yokneam 20692 ISRAEL > > > > --Original Message-- > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > Sent: Thursday, December 01, 2005 8:20 AM > > To: Eitan Zahavi > > Cc: OPENIB GENERAL; Yael Kalka; Aviram Gutman; Tziporet Koren; Roland > Dreier; > > sean.hefty at intel.com > > Subject: RE: [openib-general] First Multicast Leave disconnects all > other clients > > > > On Thu, 2005-12-01 at 01:07, Eitan Zahavi wrote: > > > > > > > > > > The bottom line: > > > > > We are missing 3 agents in the OpenIB stack: > > > > > InformInfo - handling registrations and Report dispatching > > > > > > > > These are not currently used. > > > [EZ] They are by SRP initiator. > > > > Not the SRP initiator in OpenIB svn as far as I can tell. > > > > > > > ServiceRecord - tracks registrations > > > > > > > > ServiceRecord is implemented in sa_query (and was used by AT/uAT > but > > > > that is largely historical now) > > > > > > > > > Multicast Join/Leave - tracking registrations to multicast > groups > > > and > > > > > ref-counting > > > > > > > > > > All these agents should be able to cleanup dead client > registrations > > > and > > > > > also provide re-registration in case of SM ClientReregistration > > > event. > > > > > > > > In OpenIB, any Set of PortInfo (which includes ClientReregister) > > > > currently causes a (coarse) event (LID change) which causes IPoIB > > > client > > > > to reregister its multicasts registrations with the SA. > > > > > > > > > Please see below > > > > > > > > > > > > > > It seems the IBTA intent was that the IB driver will be > > > responsible > > > > > for maintaining > > > > > > the list of clients > > > > > > > registered to each group. > > > > > > > > > > > > Yes, the end node is responsible for tracking the > registrations > > > within > > > > > > the node and fabricating responses when the node does not want > to > > > > > leave. > > > > > > Is delete a different case though ? > > > > > [EZ] No it is not. Delete of multicast group is really the last > > > leave. > > > > > > > > There is an explicit delete. While it shouldn't be needed to be > > > forced, > > > > there is always some scenario where this is useful. > > > [EZ] To my best knowledge any leave is a "delete" so there is no way > for > > > any client to force other members out of a group. It can only leave > > > itself. The delete will happen when the last will leave. > > > > Yes, you are right, other than the last full member (join state) rule. > > > > > > > > > But the IB core does not track what clients registered > (through > > > SA > > > > > requests) to a > > > > > > particular multicast group. > > > > > > > The first client to leave the group causes the rest (of the > > > clients) > > > > > to be disconnected. > > > > > > > > > > > > This is an implementation issue IMO and applies to other > > > subscriptions > > > > > > too (not just limited to multicast). > > > > > [EZ] I agree it is an implementation issue. I hope it will get > > > > > implemented in OpenIB. > > > > > > > > It will. It's a question of priorities and timing. > > > > > > > > > > > My proposal is to provide an API for such registrations at > both > > > user > > > > > and kernel and > > > > > > track the requesting processes. > > > > > > > Cleanup is also required both by process and kernel module > > > > > granularity. > > > > > > > > > > > > Is the API the SA client request itself for this ? Shouldn't > the > > > > > > tracking be done there (within sa_query.c) ? 
> > > > > [EZ] It will be hard to sniff the MADs (especially user level) > for > > > all > > > > > the registration flows. > > > > > > > > It's not the sniffing which is hard but perhaps identifying which > > > client > > > > (and reference counting). > > > > > > > > > So I propose we should have > > > > > > > > > ib_join/ib_leave/ib_reg_svc/ib_unreg_svc/ib_reg_inform/ib_unreg_inform. > > > > > Both in user land and in kernel. > > > > > > > > I think this is TBD and the API would be discussed on this list > first > > > > prior to any implementation. > > > > > > > > > > > BTW: The same API could also handle "Client Reregistration" > for > > > > > multicast groups, > > > > > > > > > > > > Client reregistration is for all subscriptions (including > > > > > ServiceRecords > > > > > > and events as well). > > > > > [EZ] Yes exactly. I believe similar problem exists for all > > > > > registrations. > > > > > > > > > > > > > such that we could avoid the need to have that code > duplicated > > > by > > > > > every client. > > > > > > > > > > > > I'm missing how client reregistration would help here. Can you > > > > > elaborate > > > > > > ? > > > > > [EZ] It is related to the reference tracking: > > > > > If a kernel module tracks all registrations to refcount them and > > > perform > > > > > cleanup, it could with similar effort also send the - > > > re-registration in > > > > > the event of SM change ... > > > > > > > > Sure, there are multiple ways to skin the same cat. > > > > > > > > > > > > > > > > > But this refers to yet another API that is missing: Report > > > > > dispatching which deserves > > > > > > its own > > > > > > > mail... > > > > > > > > > > > > I'm missing the connection between reregistration and report > > > > > > dispatching. > > > > > [EZ] Sorry for not being verbose. The need for Events dispatcher > is > > > > > based on the fact that only one client should respond to Report > with > > > > > ReportRepress. Reports are "unsolicited" MADs coming into the > > > device. In > > > > > umad the implementation prevents any "multiple" client > registration > > > for > > > > > receiving any "unsolicited" MAD - only one class-agent needs to > be > > > there > > > > > handling "unsolicited" messages. This is fine - but what it > means is > > > > > that when two clients wants to be notified about events they > should > > > > > register with that agent and the agent should be able to > dispatch > > > the > > > > > message to all registered clients as well as send only one > response > > > > > back. > > > > > > > > Wouldn't report represses be reference counted and only actually > sent > > > on > > > > the wire when all subscribed clients within the node indicated > repress > > > ? > > > [EZ] As you say there are many ways to skin a cat. I am not sure we > need > > > to wait for all clients as they are located on the same node and > will be > > > surely notified. > > > > Right, it just needs to be done once whether it was actually delivered > > to any client, clients, or none at all. 
> > -- Hal
From mshefty at ichips.intel.com Fri Dec 2 10:52:13 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 02 Dec 2005 10:52:13 -0800 Subject: [openib-general] [PATCH] [CM] add private data comparison to match REQs with listens In-Reply-To: <438F5170.3000208@ichips.intel.com> References: <438F5170.3000208@ichips.intel.com> Message-ID: <439097DD.2050300@ichips.intel.com> Sean Hefty wrote: > As an update: further testing revealed that there is an issue with this > implementation that is also found in the original code. The issue deals > with how listen requests that rely on a data mask are inserted and > located in the red/black tree. I'm trying to come up with a fix for this. After researching into this, I'm coming to the conclusion that there does not exist an efficient way to sort/search for listens without adding some restrictions. For example, a client listens on id1 with mask1. A request is matched with the listen if its serviceid & mask1 = id1. If a second client listens on id2 with mask2, then a request must check against both requests for a match, or until a match is found. There's no method that I can find that can be used to filter checks that works in a generic fashion, resulting in requests needing to walk a linear list of listens. There are several potential fixes for this, with only a couple mentioned below. One solution around this is to have the IB CM only listen on service IDs, and remove the mask parameter from the API. This requires SDP to change to only listen on ports that have a listener. Another alternative is to restrict the type of masks that are supported. If masks are restricted to a series of most significant bits, then the existing algorithm can be used. For instance, we can support masks 0xFF00 and 0xFFF0, but not 0x00FF or 0xFF0F. This restriction would work for both SDP and the CMA. To be clear, the API could change from a mask to the number of bits to match. Matching on private data can either be done by clients, or restrictions can be placed on it as well. For private data, I believe that a restriction that all listen requests on the same service ID use the same mask is sufficient. Hopefully this makes sense to people. Thoughts? - Sean From caitlinb at broadcom.com Fri Dec 2 10:58:54 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Fri, 2 Dec 2005 10:58:54 -0800 Subject: [openib-general] [PATCH] [CM] add private data comparison to match REQs with listens Message-ID: <54AD0F12E08D1541B826BE97C98F99F10C27C8@NT-SJCA-0751.brcm.ad.broadcom.com> openib-general-bounces at openib.org wrote: > Sean Hefty wrote: >> As an update: further testing revealed that there is an issue with >> this implementation that is also found in the original code. The >> issue deals with how listen requests that rely on a data mask are >> inserted and located in the red/black tree. I'm trying to come up >> with a fix for this. > > After researching into this, I'm coming to the conclusion > that there does not exist an efficient way to sort/search for > listens without adding some restrictions. > > For example, a client listens on id1 with mask1. A request > is matched with the listen if its serviceid & mask1 = id1.
> If a second client listens on id2 with mask2, then a request > must check against both requests for a match, or until a > match is found. There's no method that I can find that can > be used to filter checks that works in a generic fashion, > resulting in requests needing to walk a linear list of > listens. There are several potential fixes for this, with > only a couple mentioned below. > > One solution around this is to have the IB CM only listen on > service IDs, and remove the mask parameter from the API. > This requires SDP to change to only listen on ports that have a > listener. > > Another alternative is to restrict the type of masks that are > supported. If masks are restricted to a series of most > significant bits, then the existing algorithm can be used. > For instance, we can support masks 0xFF00 and 0xFFF0, but not > 0x00FF or 0xFF0F. This restriction would work for both SDP and the > CMA. To be clear, the API could change from a mask to the number of > bits to match. > > Matching on private data can either be done by clients, or > restrictions can be placed on it as well. For private data, > I believe that a restriction that all listen requests on the > same service ID use the same mask is sufficient. > > Hopefully this makes sense to people. Thoughts? > Just listen on the Service ID / Port and let the ULP sort them out by destination IP address. From ftillier at silverstorm.com Fri Dec 2 11:39:15 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Fri, 2 Dec 2005 11:39:15 -0800 Subject: [openib-general] [PATCH] [CM] add private data comparison to match REQs with listens In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F10C27C8@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <001901c5f778$180fc4f0$9e5aa8c0@infiniconsys.com> > From: Caitlin Bestler [mailto:caitlinb at broadcom.com] > Sent: Friday, December 02, 2005 10:59 AM > > Just listen on the Service ID / Port and let the ULP sort them > out by destination IP address. That only works if there is a single kernel module providing the extra checks. Multiple user-mode ULPs cannot do the checking in user-mode - the checking must be done in the kernel to figure out which user-mode client to hand the request to. I think putting in restrictions to the comparisons possible is fine, as the functionality of having the CM facilitate some sort of filtering is useful. - Fab From caitlinb at broadcom.com Fri Dec 2 11:45:34 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Fri, 2 Dec 2005 11:45:34 -0800 Subject: [openib-general] [PATCH] [CM] add private data comparison to match REQs with listens Message-ID: <54AD0F12E08D1541B826BE97C98F99F10C27DB@NT-SJCA-0751.brcm.ad.broadcom.com> Fab Tillier wrote: >> From: Caitlin Bestler [mailto:caitlinb at broadcom.com] >> Sent: Friday, December 02, 2005 10:59 AM >> >> Just listen on the Service ID / Port and let the ULP sort them out by >> destination IP address. > > That only works if there is a single kernel module providing the > extra checks. Multiple user-mode ULPs cannot do the checking in > user-mode - the checking must be done in the kernel to figure out > which user-mode client to hand the request to. > > I think putting in restrictions to the comparisons possible > is fine, as the functionality of having the CM facilitate > some sort of filtering is useful. > > - Fab Filtering between multiple kernels is fine, but it does not involve the API. 
Basically if you are filtering amongst multiple kernels then the Hypervisor is doing it, not any user-mode client or even the individual kernels. Within the context of a single Guest OS, I do not see the need to have multiple listens on the same Service ID / TCP Port number for a given network interface. In that context the more common usage is to have the destination address select the *content* that will be served, not the service that will be selected. (Or put another way, the class of the daemon object being connected with is a constant -- only the instance data is different). For those uses the number of virtual servers can be very large, literally thousands. The filtering/selection is best left to the content-serving daemon itself. From mshefty at ichips.intel.com Fri Dec 2 11:52:31 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 02 Dec 2005 11:52:31 -0800 Subject: [openib-general] [PATCH] [CM] add private data comparison to match REQs with listens In-Reply-To: <001901c5f778$180fc4f0$9e5aa8c0@infiniconsys.com> References: <001901c5f778$180fc4f0$9e5aa8c0@infiniconsys.com> Message-ID: <4390A5FF.9090404@ichips.intel.com> Fab Tillier wrote: >>Just listen on the Service ID / Port and let the ULP sort them >>out by destination IP address. > > That only works if there is a single kernel module providing the extra checks. > Multiple user-mode ULPs cannot do the checking in user-mode - the checking must > be done in the kernel to figure out which user-mode client to hand the request > to. > > I think putting in restrictions to the comparisons possible is fine, as the > functionality of having the CM facilitate some sort of filtering is useful. My concern with pushing this to the ULP is that it requires the ULP to track service IDs for reference counting purposes and adds additional synchronization to the ULP that could have been handled by the CM. I'm looking at what the full effect of implementing this in the ULP would be. - Sean From caitlinb at broadcom.com Fri Dec 2 12:13:00 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Fri, 2 Dec 2005 12:13:00 -0800 Subject: [openib-general] [PATCH] [CM] add private data comparison to match REQs with listens Message-ID: <54AD0F12E08D1541B826BE97C98F99F10C27E4@NT-SJCA-0751.brcm.ad.broadcom.com> Sean Hefty wrote: > Fab Tillier wrote: >>> Just listen on the Service ID / Port and let the ULP sort them out >>> by destination IP address. >> >> That only works if there is a single kernel module providing the >> extra checks. Multiple user-mode ULPs cannot do the checking in >> user-mode - the checking must be done in the kernel to figure out >> which user-mode client to hand the request to. >> >> I think putting in restrictions to the comparisons possible is fine, >> as the functionality of having the CM facilitate some sort of >> filtering is useful. > > My concern with pushing this to the ULP is that it requires > the ULP to track service IDs for reference counting purposes > and adds additional synchronization to the ULP that could have been > handled by the CM. > > I'm looking at what the full effect of implementing this in the ULP > would be. > > - Sean I'm still missing something. My understanding is that there are two scenarios where differentiating on the Destination IP address to subqualify a listen is required: 1) When virtualization is in effect and the device is shared by multiple kernels that are not aware of each other.
In this case a Destination Address (which could be the IP Address, or the Ethernet MAC, or the GID) determines which kernel is the destination for all packets, and which connections can be set up for which kernels. 2) When the daemon itself is virtualizing multiple instances of the same service, such as a virtual web or ftp sites. The same httpd/ftpd is reached in all cases, but the virtual root maps to a different root in the local file system (i.e., the instance data is different). I don't see how filtering in the CM is of benefit in either case. The work either belongs in the Hypervisor or in the Daemon, not the CM. From ftillier at silverstorm.com Fri Dec 2 12:22:47 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Fri, 2 Dec 2005 12:22:47 -0800 Subject: [openib-general] [PATCH] [CM] add private data comparison to match REQs with listens In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F10C27E4@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <001a01c5f77e$2a9c3f30$9e5aa8c0@infiniconsys.com> > From: Caitlin Bestler [mailto:caitlinb at broadcom.com] > Sent: Friday, December 02, 2005 12:13 PM > > Sean Hefty wrote: > > Fab Tillier wrote: > >>> Just listen on the Service ID / Port and let the ULP sort them out > >>> by destination IP address. > >> > >> That only works if there is a single kernel module providing the > >> extra checks. Multiple user-mode ULPs cannot do the checking in > >> user-mode - the checking must be done in the kernel to figure out > >> which user-mode client to hand the request to. > >> > >> I think putting in restrictions to the comparisons possible is fine, > >> as the functionality of having the CM facilitate some sort of > >> filtering is useful. > > > > My concern with pushing this to the ULP is that it requires > > the ULP to track service IDs for reference counting purposes > > and adds additional synchronization to the ULP that could have been > > handled by the CM. > > > > I'm looking at what the full effect of implementing this in the ULP > > would be. > > I'm still missing something. > > I don't see how filtering in the CM is of benefit in either case. The > work either belongs in the Hypervisor or in the Daemon, not the CM. Your focus is strictly on TCP socket semantics, but we're talking about IB CM functionality - the IB CM does more than just provide TCP socket semantics. Imagine a user-mode IB application (not virtualization mind you, but just an app) that wants to listen on a given SID (because the SID defines the application), but wants to discriminate incoming requests based on some content in the private data. Multiple instances of that application can only work properly if the CM performs the private data comparison to properly dispatch the incoming requests to the right user-mode process. If the CM doesn't provide the private data compare functionality, then the app developer needs to create a kernel agent to perform this functionality for the app. The functionality is simple enough, and has potential value to multiple clients, that it makes sense to have the IB CM provide it. 
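As an illustration of the usage Fab describes, and assuming the shape of the listen interface being proposed in this thread (the structure and argument names below are assumptions, not a settled API): two instances of the same application share a SID and are told apart by a 4-byte discriminator at the start of the REQ private data.

    /* sketch only: listen on an exact SID, matching the first 4 bytes
     * of REQ private data against this instance's discriminator */
    struct ib_cm_compare_data cmp;
    __be32 instance_id = cpu_to_be32(2);        /* app-defined value */

    memset(&cmp, 0, sizeof cmp);
    memcpy(cmp.data, &instance_id, sizeof instance_id);  /* value     */
    memset(cmp.mask, 0xFF, sizeof instance_id);          /* bytes to
                                                            compare   */
    ret = ib_cm_listen(cm_id, app_service_id, 0, &cmp);  /* 0 = exact
                                                            SID match */

The CM would then hand each REQ only to the instance whose discriminator matches, with no per-app kernel agent needed.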
- Fab From caitlinb at broadcom.com Fri Dec 2 12:27:41 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Fri, 2 Dec 2005 12:27:41 -0800 Subject: [openib-general] [PATCH] [CM] add private data comparison to match REQs with listens Message-ID: <54AD0F12E08D1541B826BE97C98F99F10C27E9@NT-SJCA-0751.brcm.ad.broadcom.com> Fab Tillier wrote: >> From: Caitlin Bestler [mailto:caitlinb at broadcom.com] >> Sent: Friday, December 02, 2005 12:13 PM >> >> Sean Hefty wrote: >>> Fab Tillier wrote: >>>>> Just listen on the Service ID / Port and let the ULP sort them out >>>>> by destination IP address. >>>> >>>> That only works if there is a single kernel module providing the >>>> extra checks. Multiple user-mode ULPs cannot do the checking in >>>> user-mode - the checking must be done in the kernel to figure out >>>> which user-mode client to hand the request to. >>>> >>>> I think putting in restrictions to the comparisons possible is >>>> fine, as the functionality of having the CM facilitate some sort of >>>> filtering is useful. >>> >>> My concern with pushing this to the ULP is that it requires the ULP >>> to track service IDs for reference counting purposes and adds >>> additional synchronization to the ULP that could have been handled >>> by the CM. >>> >>> I'm looking at what the full effect of implementing this in the ULP >>> would be. >> >> I'm still missing something. >> >> I don't see how filtering in the CM is of benefit in either case. The >> work either belongs in the Hypervisor or in the Daemon, not the CM. > > Your focus is strictly on TCP socket semantics, but we're > talking about IB CM functionality - the IB CM does more than > just provide TCP socket semantics. > > Imagine a user-mode IB application (not virtualization mind > you, but just an > app) that wants to listen on a given SID (because the SID > defines the application), but wants to discriminate incoming > requests based on some content in the private data. Multiple > instances of that application can only work properly if the > CM performs the private data comparison to properly dispatch > the incoming requests to the right user-mode process. > > If the CM doesn't provide the private data compare > functionality, then the app developer needs to create a > kernel agent to perform this functionality for the app. The > functionality is simple enough, and has potential value to > multiple clients, that it makes sense to have the IB CM provide it. > > - Fab You are proposing that the API be made more complex and you do not have any justification other than something some user-mode application *might* want to do. Why are these different user-mode applications sharing a Service ID in the first place? On what basis do they trust each other? How do they co-ordinate their filtering? Couldn't they use CM redirection to share the Service ID? The goal was supposed to be providing TCP-compatible connection setup, but this is describing something that is decidedly un-TCP-like. TCP applications differentiate within the daemon, or redirect connections. If they split connections based upon packet content it is only done by very sophisticated L7 load balancers that identify cookies or other HTTP content.
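To make the restricted-mask alternative from earlier in the thread concrete: if every service-ID mask is a run of most-significant bits, each listen covers one contiguous range of service IDs, so an ordinary ordered comparison still works for the red/black tree. A sketch, with hypothetical names:

    /* assumes listen_mask is of the form 0xFF..F00..0 (high bits only) */
    static int cm_sid_compare(uint64_t listen_id, uint64_t listen_mask,
                              uint64_t req_sid)
    {
        uint64_t prefix = req_sid & listen_mask;  /* request's prefix */

        if (prefix < listen_id)
            return -1;
        if (prefix > listen_id)
            return 1;
        return 0;   /* request falls within this listen's range */
    }

The open issue Sean mentions remains even under this restriction: when listens with different prefix lengths nest, the lookup has to decide which mask to apply before it knows which listen matched.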
From trimmer at silverstorm.com Fri Dec 2 13:22:19 2005 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Fri, 2 Dec 2005 16:22:19 -0500 Subject: [openib-general] [PATCH] [CM] add private data comparison to match REQs with listens Message-ID: <5D78D28F88822E4D8702BB9EEF1A43670A0902@mercury.infiniconsys.com> > -----Original Message----- > From: Caitlin Bestler [mailto:caitlinb at broadcom.com] > Sent: Friday, December 02, 2005 1:59 PM > To: Sean Hefty > Cc: openib-general at openib.org > Subject: RE: [openib-general] [PATCH] [CM] add private data > comparison to > match REQs with listens > > > openib-general-bounces at openib.org wrote: > > Sean Hefty wrote: > >> As an update: further testing revealed that there is an issue with > >> this implementation that is also found in the original code. The > >> issue deals with how listen requests that rely on a data mask are > >> inserted and located in the red/black tree. I'm trying to come up > >> with a fix for this. > > > > After researching into this, I'm coming to the conclusion > > that there does not exist an efficient way to sort/search for > > listens without adding some restrictions. > > > > For example, a client listens on id1 with mask1. A request > > is matched with the listen if its serviceid & mask1 = id1. > > If a second client listens on id2 with mask2, then a request > > must check against both requests for a match, or until a > > match is found. There's no method that I can find that can > > be used to filter checks that works in a generic fashion, > > resulting in requests needing to walk a linear list of > > listens. There are several potential fixes for this, with > > only a couple mentioned below. > > > > One solution around this is to have the IB CM only listen on > > service IDs, and remove the mask parameter from the API. > > This requires SDP to change to only listen on ports that have a > > listener. > > > > Another alternative is to restrict the type of masks that are > > supported. If masks are restricted to a series of most > > significant bits, then the existing algorithm can be used. > > For instance, we can support masks 0xFF00 and 0xFFF0, but not > > 0x00FF or 0xFF0F. This restriction would work for both SDP and the > > CMA. To be clear, the API could change from a mask to the > number of > > bits to match. > > > > Matching on private data can either be done by clients, or > > restrictions can be placed on it as well. For private data, > > I believe that a restriction that all listen requests on the > > same service ID use the same mask is sufficient. > > > > Hopefully this makes sense to people. Thoughts? > > > > Just listen on the Service ID / Port and let the ULP sort them > out by destination IP address. One approach is to make the sort criteria of the tree dependent on a comparison function. For example the sort could have a multi-faceted compare. We solved this problem in our stack (which allows listen by SID, sender GUID, receiver Port, private data, etc) by the following set of functions. These were called per red/black tree comparison (both inserts and searches used functions, potentially different). I realize these would not be used exactly as given, but they can provide some ideas on how to do it. ListenMap is the red/black tree our stack used to keep track of all listening CEPs in the system.
// ListenMap Key Compare functions
// Three functions are provided:
//
// CepListenAddrCompare - is used to insert cep entries into the ListenMap and
// is the primary key_compare function for the ListenMap
//
// CepReqAddrCompare - is used to search the ListenMap as part of processing
// an inbound REQ
//
// CepSidrReqAddrCompare - is used to search the ListenMap as part of
// processing an inbound SIDR_REQ
//
// To provide the maximum flexibility, the key for a CEP bound address is
// sophisticated and allows wildcarded/optional fields. This allows
// a listener to simply bind for all traffic of a given SID or to refine the
// scope by binding for traffic to/from specific addresses, or specific
// private data. The QPN/EECN/CaGUID aspect is used to allow multiple
// outbound Peer Connects to still be considered unique.
//
// The result of this approach is a very flexible CM bind. The same SID
// can be used on different ports or between different node pairs for
// completely different meanings. However a SID used between a given
// pair of nodes must be used for a single model (Listen, Peer, Sidr).
// In addition, for Peer connects, each connect must have a unique
// QPN/EECN/CaGUID.
//
// Comparison allows for wildcarding in all but SID.
// A value of 0 is a wildcard. See ib_helper.h:WildcardGidCompare for
// the rules of GID comparison, which are more involved due to multiple Gid
// formats
//
// Field is used by models as follows:
// Collating order is:                 Listen    Peer Connect  Sidr Register
// SID                                 Y         Y             Y
// local GID                           option    Y             future option
// local LID                           option    Y             future option
// QPN                                 wildcard  Y             wildcard
// EECN                                wildcard  Y             wildcard
// CaGUID                              wildcard  Y             wildcard
// remote GID                          option    Y             future option
// remote LID                          option    Y             future option
// private data discriminator length   option    option        option
// private data discriminator value    option    option        option
//
// if bPeer is 0 for either CEP, the QPN, EECN and CaGUID are treated as a match
//
// FUTURE: add a sid masking option so can easily listen on a group
// of SIDs with 1 listen (such as if low bits of sid have a private meaning)
//
// FUTURE: add a pkey option so can easily listen on a partition
//
// FUTURE: for SIDR to support GID/LID they will have to come from the LRH
// and GRH headers to the CM mad. local GID and lid could be used to merely
// select the local port number

// A qmap key_compare function to compare the bound address for
// two listener, SIDR or Peer Connect CEPs
//
// key1 - CEP1 pointer
// key2 - CEP2 pointer
//
// Returns:
// -1: cep1 bind address < cep2 bind address
//  0: cep1 bind address = cep2 bind address (accounting for wildcards)
//  1: cep1 bind address > cep2 bind address
int
CepListenAddrCompare(uint64 key1, uint64 key2)
{
	IN CM_CEP_OBJECT* pCEP1 = (CM_CEP_OBJECT*)(uintn)key1;
	IN CM_CEP_OBJECT* pCEP2 = (CM_CEP_OBJECT*)(uintn)key2;
	int res;

	if (pCEP1->SID < pCEP2->SID)
		return -1;
	else if (pCEP1->SID > pCEP2->SID)
		return 1;
	res = WildcardGidCompare(&pCEP1->PrimaryPath.LocalGID, &pCEP2->PrimaryPath.LocalGID);
	if (res != 0)
		return res;
	res = WildcardCompareU64(pCEP1->PrimaryPath.LocalLID, pCEP2->PrimaryPath.LocalLID);
	if (res != 0)
		return res;
	if (pCEP1->bPeer && pCEP2->bPeer)
	{
		res = CompareU64(pCEP1->LocalEndPoint.QPN, pCEP2->LocalEndPoint.QPN);
		if (res != 0)
			return res;
		res = CompareU64(pCEP1->LocalEndPoint.EECN, pCEP2->LocalEndPoint.EECN);
		if (res != 0)
			return res;
		res = CompareU64(pCEP1->LocalEndPoint.CaGUID, pCEP2->LocalEndPoint.CaGUID);
		if (res != 0)
			return res;
	}
	res = WildcardGidCompare(&pCEP1->PrimaryPath.RemoteGID, &pCEP2->PrimaryPath.RemoteGID);
	if (res != 0)
		return res;
	res = WildcardCompareU64(pCEP1->PrimaryPath.RemoteLID, pCEP2->PrimaryPath.RemoteLID);
	if (res != 0)
		return res;
	// a length of 0 matches any private data, so this too is a wildcard compare
	if (pCEP1->DiscriminatorLen == 0 || pCEP2->DiscriminatorLen == 0)
		return 0;
	res = CompareU64(pCEP1->DiscriminatorLen, pCEP2->DiscriminatorLen);
	if (res != 0)
		return res;
	res = MemoryCompare(pCEP1->Discriminator, pCEP2->Discriminator, pCEP1->DiscriminatorLen);
	return res;
}

// A qmap key_compare function to search the ListenMap for a match with
// a given REQ
//
// key1 - CEP pointer
// key2 - REQ pointer
//
// Returns:
// -1: cep1 bind address < req remote address
//  0: cep1 bind address = req remote address (accounting for wildcards)
//  1: cep1 bind address > req remote address
//
// The QPN/EECN/CaGUID are not part of the search, hence multiple Peer Connects
// could be matched (and one which was started earliest should be then linearly
// searched for among the neighbors of the matching CEP)
int
CepReqAddrCompare(uint64 key1, uint64 key2)
{
	IN CM_CEP_OBJECT* pCEP = (CM_CEP_OBJECT*)(uintn)key1;
	IN CMM_REQ* pREQ = (CMM_REQ*)(uintn)key2;
	int res;

	if (pCEP->SID < pREQ->ServiceID)
		return -1;
	else if (pCEP->SID > pREQ->ServiceID)
		return 1;
	// local and remote is from the perspective of the sender (the remote
	// node in this case), so we compare local to remote and vice versa
	res = WildcardGidCompare(&pCEP->PrimaryPath.LocalGID, &pREQ->PrimaryRemoteGID);
	if (res != 0)
		return res;
	res = WildcardCompareU64(pCEP->PrimaryPath.LocalLID, pREQ->PrimaryRemoteLID);
	if (res != 0)
		return res;
	// do not compare QPN/EECN/CaGUID
	res = WildcardGidCompare(&pCEP->PrimaryPath.RemoteGID, &pREQ->PrimaryLocalGID);
	if (res != 0)
		return res;
	res = WildcardCompareU64(pCEP->PrimaryPath.RemoteLID, pREQ->PrimaryLocalLID);
	if (res != 0)
		return res;
	// a length of 0 matches any private data, so this too is a wildcard compare
	if (pCEP->DiscriminatorLen == 0)
		return 0;
	res = MemoryCompare(pCEP->Discriminator,
			pREQ->PrivateData+pCEP->DiscrimPrivateDataOffset,
			pCEP->DiscriminatorLen);
	return res;
}

// A qmap key_compare function to search the ListenMap for a match with
// a given SIDR_REQ
//
// key1 - CEP pointer
// key2 - SIDR_REQ pointer
//
// Returns:
// -1: cep bind address < sidr_req address
//  0: cep bind address = sidr_req address (accounting for wildcards)
//  1: cep bind address > sidr_req address
//
// The QPN/EECN/CaGUID are not part of the search.
int
CepSidrReqAddrCompare(uint64 key1, uint64 key2)
{
	IN CM_CEP_OBJECT* pCEP = (CM_CEP_OBJECT*)(uintn)key1;
	IN CMM_SIDR_REQ* pSIDR_REQ = (CMM_SIDR_REQ*)(uintn)key2;
	int res;

	if (pCEP->SID < pSIDR_REQ->ServiceID)
		return -1;
	else if (pCEP->SID > pSIDR_REQ->ServiceID)
		return 1;
	// GID and LIDs are wildcarded/not available at this time
	// do not compare QPN/EECN/CaGUID
	// a length of 0 matches any private data, so this too is a wildcard compare
	if (pCEP->DiscriminatorLen == 0)
		return 0;
	res = MemoryCompare(pCEP->Discriminator,
			pSIDR_REQ->PrivateData+pCEP->DiscrimPrivateDataOffset,
			pCEP->DiscriminatorLen);
	return res;
}

/* non-Wildcarded compare of 2 64 bit values
 * Return:
 *  0 : v1 == v2
 * -1 : v1 < v2
 *  1 : v1 > v2
 */
static __inline int
CompareU64(uint64 v1, uint64 v2)
{
	if (v1 == v2)
		return 0;
	else if (v1 < v2)
		return -1;
	else
		return 1;
}

/* Wildcarded compare of 2 64 bit values
 * Return:
 *  0 : v1 == v2
 * -1 : v1 < v2
 *  1 : v1 > v2
 * if v1 or v2 is 0, they are considered wildcards and match any value
 */
static __inline int
WildcardCompareU64(uint64 v1, uint64 v2)
{
	if (v1 == 0 || v2 == 0 || v1 == v2)
		return 0;
	else if (v1 < v2)
		return -1;
	else
		return 1;
}

/* Compare Gid1 to Gid2 (host byte order)
 * Return:
 *  0 : Gid1 == Gid2
 * -1 : Gid1 < Gid2
 *  1 : Gid1 > Gid2
 * This also allows for Wildcarded compare.
 * A MC Gid with the lower 56 bits all 0, will match any MC gid
 * A SubnetPrefix of 0 will match any top 64 bits of a non-MC gid
 * A InterfaceID of 0 will match any low 64 bits of a non-MC gid
 * Collating order:
 *  non-MC Subnet Prefix (0 is wildcard and comes first)
 *  non-MC Interface ID (0 is wildcard and comes first)
 *  MC wildcard
 *  MC by value of low 56 bits (0 is wildcard and comes first)
 */
static __inline int
WildcardGidCompare(IN const IB_GID* const pGid1, IN const IB_GID* const pGid2 )
{
	if (pGid1->Type.Multicast.s.FormatPrefix == IPV6_MULTICAST_PREFIX
		&& pGid2->Type.Multicast.s.FormatPrefix == IPV6_MULTICAST_PREFIX)
	{
		/* Multicast compare: compare low 120 bits, 120 bits of 0 is wildcard */
		uint64 h1 = pGid1->AsReg64s.H & ~IB_GID_MCAST_FORMAT_MASK_H;
		uint64 h2 = pGid2->AsReg64s.H & ~IB_GID_MCAST_FORMAT_MASK_H;
		/* check for 120 bits of wildcard */
		if ((h1 == 0 && pGid1->AsReg64s.L == 0)
			|| (h2 == 0 && pGid2->AsReg64s.L == 0))
		{
			return 0;
		} else if (h1 < h2) {
			return -1;
		} else if (h1 > h2) {
			return 1;
		} else {
			/* compare Gid1's low bits against Gid2's, not against Gid1's own */
			return CompareU64(pGid1->AsReg64s.L, pGid2->AsReg64s.L);
		}
	} else if (pGid1->Type.Multicast.s.FormatPrefix == IPV6_MULTICAST_PREFIX) {
		/* Gid1 is MC, Gid2 is other, treat MC as > others */
		return 1;
	} else if (pGid2->Type.Multicast.s.FormatPrefix == IPV6_MULTICAST_PREFIX) {
		/* Gid1 is other, Gid2 is MC, treat other as < MC */
		return -1;
	} else {
		/* Non-Multicast compare: compare high 64 bits */
		/* Note all other GID formats are essentially a prefix in the upper */
		/* 64 bits and an identifier in the low 64 bits, */
		/* so this covers link local, site local, global formats */
		int res = WildcardCompareU64(pGid1->AsReg64s.H, pGid2->AsReg64s.H);
		if (res == 0)
		{
			return WildcardCompareU64(pGid1->AsReg64s.L, pGid2->AsReg64s.L);
		} else {
			return res;
		}
	}
}

From tom at opengridcomputing.com Fri Dec 2 14:14:10 2005 From: tom at opengridcomputing.com (Tom Tucker) Date: Fri, 02 Dec 2005 16:14:10 -0600 Subject: [openib-general] [PATCH] [CM] add private data comparisonto match REQs
with listens In-Reply-To: <5D78D28F88822E4D8702BB9EEF1A43670A0902@mercury.infiniconsys.com> References: <5D78D28F88822E4D8702BB9EEF1A43670A0902@mercury.infiniconsys.com> Message-ID: <1133561650.21815.124.camel@trinity.austin.ammasso.com> Am I correct to assume that this functionality is unique to the IB CM and is not going to be exposed through the CMA? On Fri, 2005-12-02 at 16:22 -0500, Rimmer, Todd wrote: > > -----Original Message----- > > From: Caitlin Bestler [mailto:caitlinb at broadcom.com] > > Sent: Friday, December 02, 2005 1:59 PM > > To: Sean Hefty > > Cc: openib-general at openib.org > > Subject: RE: [openib-general] [PATCH] [CM] add private data > > comparisonto > > match REQs with listens > > > > > > openib-general-bounces at openib.org wrote: > > > Sean Hefty wrote: > > >> As an update: further testing revealed that there is an issue with > > >> this implementation that is also found in the original code. The > > >> issue deals with how listen requests that rely on a data mask are > > >> inserted and located in the red/black tree. I'm trying to come up > > >> with a fix for this. > > > > > > After researching into this, I'm coming to the conclusion > > > that there does not exist an efficient way to sort/search for > > > listens without adding some restrictions. > > > > > > For example, a client listens on id1 with mask1. A request > > > is matched with the listen if its serviceid & mask1 = id1. > > > If a second client listens on id2 with mask2, then a request > > > must check against both requests for a match, or until a > > > match is found. There's no method that I can find that can > > > be used to filter checks that works in a generic fashion, > > > resulting in requests needing to walk a linear list of > > > listens. There are several potential fixes for this, with > > > only a couple mentioned below. > > > > > > One solution around this is to have the IB CM only listen on > > > service IDs, and remove the mask parameter from the API. > > > This requires SDP to change to only listen on ports that have a > > > listener. > > > > > > Another alternative is to restrict the type of masks that are > > > supported. If masks are restricted to a series of most > > > significant bits, then the existing algorithm can be used. > > > For instance, we can support masks 0xFF00 and 0xFFF0, but not > > > 0x00FF or 0xFF0F. This restriction would work for both SDP and the > > > CMA. To be clear, the API could change from a mask to the > > number of > > > bits to match. > > > > > > Matching on private data can either be done by clients, or > > > restrictions can be placed on it as well. For private data, > > > I believe that a restriction that all listen requests on the > > > same service ID use the same mask is sufficient. > > > > > > Hopefully this makes sense to people. Thoughts? > > > > > > > Just listen on the Service ID / Port and let the ULP sort them > > out by destination IP address. > > On approach is to make the sort criteria of the tree dependent on a comparison function. > > For example the sort could have a multi-faceted compare. > > We solved this problem in our stack (which allows listen by SID, sender GUID, receiver Port, private data, etc) by the following set of functions. These were called per red/black tree comparison (both inserts and searches used functions, potentially different). I realize these would not be used exactly as given, but they can provide some ideas on how to do it. 
ListenMap is the red/black tree our stack used to keep track of all listening CEPs in the system.
> [Todd's CepListenAddrCompare/CepReqAddrCompare/CepSidrReqAddrCompare code was quoted here in full; snipped -- see the complete listing in Todd's message above.]
From ftillier at silverstorm.com Fri Dec 2 14:21:01 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Fri, 2 Dec 2005 14:21:01 -0800 Subject: [openib-general] [PATCH] [CM] add private data comparison to match REQs with listens In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F10C27E4@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <001b01c5f78e$af5706a0$9e5aa8c0@infiniconsys.com> > From: Caitlin Bestler [mailto:caitlinb at broadcom.com] > Sent: Friday, December 02, 2005 12:13 PM > > Sean Hefty wrote: > > Fab Tillier wrote: > >>> Just listen on the Service ID / Port and let the ULP sort them out > >>> by destination IP address. > >> > >> That only works if there is a single kernel module providing the > >> extra checks. Multiple user-mode ULPs cannot do the checking in > >> user-mode - the checking must be done in the kernel to figure out > >> which user-mode client to hand the request to. > >> > >> I think putting in restrictions to the comparisons possible is > >> fine, as the functionality of having the CM facilitate some sort of > >> filtering is useful. > > > > My concern with pushing this to the ULP is that it requires the ULP > > to track service IDs for reference counting purposes and adds > > additional synchronization to the ULP that could have been handled > > by the CM. > > > > I'm looking at what the full effect of implementing this in the ULP > > would be.
> > I'm still missing something. > > I don't see how filtering in the CM is of benefit in either case. The > work either belongs in the Hypervisor or in the Daemon, not the CM. Your focus is strictly on TCP socket semantics, but we're talking about IB CM functionality - the IB CM does more than just provide TCP socket semantics. Imagine a user-mode IB application (not virtualization mind you, but just an app) that wants to listen on a given SID (because the SID defines the application), but wants to discriminate incoming requests based on some content in the private data. Multiple instances of that application can only work properly if the CM performs the private data comparison to properly dispatch the incoming requests to the right user-mode process. If the CM doesn't provide the private data compare functionality, then the app developer needs to create a kernel agent to perform this functionality for the app. The functionality is simple enough, and has potential value to multiple clients, that it makes sense to have the IB CM provide it. - Fab From ftillier at silverstorm.com Fri Dec 2 14:37:29 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Fri, 2 Dec 2005 14:37:29 -0800 Subject: [openib-general] [PATCH] [CM] add private data comparison to match REQs with listens In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F10C27E9@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <001c01c5f790$fbba2980$9e5aa8c0@infiniconsys.com> > From: Caitlin Bestler [mailto:caitlinb at broadcom.com] > Sent: Friday, December 02, 2005 12:28 PM > > Fab Tillier wrote: > > > > Your focus is strictly on TCP socket semantics, but we're > > talking about IB CM functionality - the IB CM does more than > > just provide TCP socket semantics. > > > > Imagine a user-mode IB application (not virtualization mind > > you, but just an > > app) that wants to listen on a given SID (because the SID > > defines the application), but wants to discriminate incoming > > requests based on some content in the private data. Multiple > > instances of that application can only work properly if the > > CM performs the private data comparison to properly dispatch > > the incoming requests to the right user-mode process. > > > > If the CM doesn't provide the private data compare > > functionality, then the app developer needs to create a > > kernel agent to perform this functionality for the app. The > > functionality is simple enough, and has potential value to > > multiple clients, that it makes sense to have the IB CM provide it. > > You are proposing that the API be made more complex and > you do not have any justification other that something > some user-mode application *might* want to do. In Windows, the Winsock Direct provider does exactly this, and would require a kernel component if the IB CM wasn't providing this functionality. WSD uses the private data to carry the IP address of the client, but uses its own private data format. I believe some native-IB MPI implementations make use of similar functionality, using the rank of the process in the private data. This allows such implementations to limit the size of their SID range to a single value or a single value per job. > Why are these different user-mode applications sharing > a Service ID in the first place? On what basis do they > trust each other? How do they co-ordinate their filtering? > Couldn't they use CM redirection to share the Service ID? The world is larger than just TCP-compatible applications. 
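For the WSD and MPI uses Fab mentions above, the demultiplexing key sits in the leading bytes of the REQ private data. A hypothetical layout for the MPI case -- not taken from WSD's (private) format or from any MPI implementation:

#include <stdint.h>

/* Hypothetical REQ private data for an MPI job: one SID identifies
 * the job, the leading bytes carry the destination rank. A rank's
 * listen would then use a discriminator of offset 0, length 4,
 * value = htonl(my_rank), and the CM hands each REQ only to the
 * process whose rank matches. */
struct mpi_pdata {
	uint32_t dst_rank;     /* target rank, network byte order */
	uint8_t  app_data[88]; /* remainder of the 92 private data bytes */
};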
I'm not talking about two applications sharing a SID, but two instances of one application sharing a SID. Imagine processes in a larger MPI job - the SID can be used to differentiate jobs, and the private data comparison can be used to differentiate different processes within that job. Alternatively, the SID could be constant, and the job ID and rank could be expressed in the private data, with the IB-level CM performing all the proper dispatching. I don't think CM redirection would work since both apps are on the same system, and share the same CM. There can only be a single connection ID namespace per HCA GUID or things quickly become ambiguous. > The goal was supposed to be providing TCP-compatible > connection setup, but this is describing something that > is decidedly un-TCP-like. TCP applications differentiate > within the daemon, or redirect connections. If they split > connections based upon packet content it is only done by > very sophisticated L7 load balancers that identify cookies > or other HTTP content. The goal of the CMA *is* to support TCP-compatible semantics, but that is not the goal of the IB CM. The IB CM already keeps track of listens and performs lookups when a REQ comes in based on service ID. Extending it to do some fairly basic extra checking is far simpler than adding duplicate lookup functionality to the CMA. This allows the IB CM to do all the filtering at once as part of REQ matching, and thus simplifies the CMA. It also allows user-mode apps to use similar functionality without requiring a kernel agent. Anyhow, do you have an objection to the CM enabling simple comparisons on private data? If so, what are your objections (aside from it not being TCP-like)? Thanks, - Fab From ftillier at silverstorm.com Fri Dec 2 14:37:29 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Fri, 2 Dec 2005 14:37:29 -0800 Subject: [openib-general] [PATCH] [CM] add private data comparison to match REQs with listens In-Reply-To: <1133561650.21815.124.camel@trinity.austin.ammasso.com> Message-ID: <001d01c5f790$fd0a5030$9e5aa8c0@infiniconsys.com> > From: Tom Tucker [mailto:tom at opengridcomputing.com] > Sent: Friday, December 02, 2005 2:14 PM > > Am I correct to assume that this functionality is unique to the IB CM > and is not going to be exposed through the CMA? My understanding is that the CMA would make use of that functionality, but it would not be exposed to users of the CMA. - Fab From trimmer at silverstorm.com Fri Dec 2 14:54:05 2005 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Fri, 2 Dec 2005 17:54:05 -0500 Subject: [openib-general] [PATCH] [CM] add private data comparison tomatch REQs with listens Message-ID: <5D78D28F88822E4D8702BB9EEF1A43670A0903@mercury.infiniconsys.com> > -----Original Message----- > From: Tillier, Fabian > Sent: Friday, December 02, 2005 5:21 PM > To: 'Caitlin Bestler'; 'Sean Hefty' > Cc: openib-general at openib.org > Subject: RE: [openib-general] [PATCH] [CM] add private data comparison > tomatch REQs with listens > > > > From: Caitlin Bestler [mailto:caitlinb at broadcom.com] > > Sent: Friday, December 02, 2005 12:13 PM > > > > Sean Hefty wrote: > > > Fab Tillier wrote: > > >>> Just listen on the Service ID / Port and let the ULP > sort them out > > >>> by destination IP address. > > >> > > >> That only works if there is a single kernel module providing the > > >> extra checks. 
Multiple user-mode ULPs cannot do the checking in > > >> user-mode - the checking must be done in the kernel to figure out > > >> which user-mode client to hand the request to. > > >> > > >> I think putting in restrictions to the comparisons > possible is fine, > > >> as the functionality of having the CM facilitate some sort of > > >> filtering is useful. > > > > > > My concern with pushing this to the ULP is that it requires > > > the ULP to track service IDs for reference counting purposes > > > and adds additional synchronization to the ULP that could > have been > > > handled by the CM. > > > > > > I'm looking at what the full effect of implementing this > in the ULP > > > would be. > > > > I'm still missing something. > > > > I don't see how filtering in the CM is of benefit in either > case. The > > work either belongs in the Hypervisor or in the Daemon, not the CM. > > Your focus is strictly on TCP socket semantics, but we're > talking about IB CM > functionality - the IB CM does more than just provide TCP > socket semantics. > > Imagine a user-mode IB application (not virtualization mind > you, but just an > app) that wants to listen on a given SID (because the SID defines the > application), but wants to discriminate incoming requests > based on some content > in the private data. Multiple instances of that application > can only work > properly if the CM performs the private data comparison to > properly dispatch the > incoming requests to the right user-mode process. > > If the CM doesn't provide the private data compare > functionality, then the app > developer needs to create a kernel agent to perform this > functionality for the > app. The functionality is simple enough, and has potential > value to multiple > clients, that it makes sense to have the IB CM provide it. > > - Fab I agree; to give you a good practical example: MPI needs to listen for incoming connections. It is wasteful to have MPI create separate SIDs for each rank (especially when there can be thousands of ranks in many jobs all running in the same cluster, parts of which run on the same node) and then listen on 1000s of SIDs in each process. Instead it makes sense to use a single SID for the entire job (possibly using the global Job ID as part of the SID), and have the private data of the REQ indicate the destination rank of the request. Then each rank in the MPI job can listen for the combination of the global Job ID's SID and private data where the destination rank matches itself (using 1 listening CEP per process) and let the CM filter by both criteria and deliver the REQs to the appropriate processes. The above scheme works very well and minimizes CM resource use for large MPI jobs. I'm sure other interesting and useful examples can be found as well.
Todd Rimmer From caitlinb at broadcom.com Fri Dec 2 15:01:47 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Fri, 2 Dec 2005 15:01:47 -0800 Subject: [openib-general] [PATCH] [CM] add private data comparison tomatch REQs with listens Message-ID: <54AD0F12E08D1541B826BE97C98F99F10C2810@NT-SJCA-0751.brcm.ad.broadcom.com> Rimmer, Todd wrote: >> -----Original Message----- >> From: Tillier, Fabian >> Sent: Friday, December 02, 2005 5:21 PM >> To: 'Caitlin Bestler'; 'Sean Hefty' >> Cc: openib-general at openib.org >> Subject: RE: [openib-general] [PATCH] [CM] add private data >> comparison tomatch REQs with listens >> >> >>> From: Caitlin Bestler [mailto:caitlinb at broadcom.com] >>> Sent: Friday, December 02, 2005 12:13 PM >>> >>> Sean Hefty wrote: >>>> Fab Tillier wrote: >>>>>> Just listen on the Service ID / Port and let the ULP sort them >>>>>> out by destination IP address. >>>>> >>>>> That only works if there is a single kernel module providing the >>>>> extra checks. Multiple user-mode ULPs cannot do the checking in >>>>> user-mode - the checking must be done in the kernel to figure out >>>>> which user-mode client to hand the request to. >>>>> >>>>> I think putting in restrictions to the comparisons possible is >>>>> fine, as the functionality of having the CM facilitate some sort >>>>> of filtering is useful. >>>> >>>> My concern with pushing this to the ULP is that it requires the >>>> ULP to track service IDs for reference counting purposes and adds >>>> additional synchronization to the ULP that could have been handled >>>> by the CM. >>>> >>>> I'm looking at what the full effect of implementing this in the ULP >>>> would be. >>> >>> I'm still missing something. >>> >>> I don't see how filtering in the CM is of benefit in either case. >>> The work either belongs in the Hypervisor or in the Daemon, not the >>> CM. >> >> Your focus is strictly on TCP socket semantics, but we're talking >> about IB CM functionality - the IB CM does more than just provide >> TCP socket semantics. >> >> Imagine a user-mode IB application (not virtualization mind you, but >> just an app) that wants to listen on a given SID (because the SID >> defines the application), but wants to discriminate incoming >> requests based on some content in the private data. Multiple >> instances of that application can only work properly if the CM >> performs the private data comparison to properly dispatch the >> incoming requests to the right user-mode process. >> >> If the CM doesn't provide the private data compare functionality, >> then the app developer needs to create a kernel agent to perform this >> functionality for the app. The functionality is simple enough, and >> has potential value to multiple clients, that it makes sense to have >> the IB CM provide it. >> >> - Fab > > I agree, to give you a good practical example, MPI needs to > listen for incoming connections. > > It is wasteful to have MPI create separate SIDs for each rank > (especially when there can be thousands of ranks in many jobs > all running in the same cluster parts of which on the same > node) and then listen on 1000s of SIDs in each process. > > Instead it makes sense to use a single SID for the entire job > (possibly using the global Job ID as part of the SID), and > have the private data of the REQ indicate the destination > rank of the request. 
Then each rank in the MPI job can > listen for the combination of the global Job ID's SID and > private data where the destination rank matches itself (using > 1 listening CEP per process) and let the CM filter by both > criteria and deliver the REQs to the appropriate processes. > > The above scheme works very well and minimizes CM resource > use for large MPI jobs. > > I'm sure other interesting and useful examples can be found as well. > MPI works over plain TCP right now, and yet there is no such feature in INETD or in current socket listens. And they do not allocate a TCP Port to listen for each connection. Rather, the same listen just accepts each connection and either creates the process or passes the handle to a process. There are many reasons why an established RDMA connection cannot be passed between processes, but I know of no reason why a Connection Request cannot be passed to a child or third process where it can be accepted. Why not emulate the existing solution rather than creating a new interface that is transport specific? Or conversely, if you truly think this is of general utility, why not implement it in INETD as well? From trimmer at silverstorm.com Fri Dec 2 15:11:59 2005 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Fri, 2 Dec 2005 18:11:59 -0500 Subject: [openib-general] [PATCH] [CM] add private data comparisonto match REQs with listens Message-ID: <5D78D28F88822E4D8702BB9EEF1A4367D12AE2@mercury.infiniconsys.com> Sean wrote: > This is similar in concept to what I have in my latest patch. > A difference is > that your discriminator is located at the start of the > private data, whereas I > was trying to use a mask. Actually, our discriminator is an offset/len into the private data; adding an optional mask to that concept would be useful. This allowed a contiguous portion of the private data to be tested, but it did not need to occur at the start of the private data. > > Did you find a use for listening on the sender GUID? We have not yet used that feature, but having the ability to key off of all the assorted addressing info in the REQ seemed sensible. I'm sure some applications can come up with a use, perhaps as a security feature? Todd Rimmer From ftillier at silverstorm.com Fri Dec 2 15:17:10 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Fri, 2 Dec 2005 15:17:10 -0800 Subject: [openib-general] [PATCH] [CM] add private data comparison tomatch REQs with listens In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F10C2810@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <001e01c5f796$86f22b60$9e5aa8c0@infiniconsys.com> > From: Caitlin Bestler [mailto:caitlinb at broadcom.com] > Sent: Friday, December 02, 2005 3:02 PM > > There are many reasons why an established RDMA connection > cannot be passed between processes, but I know of no > reason why a Connection Request cannot be passed to a child > or third process where it can be accepted. > > Why not emulate the existing solution rather than creating > a new interface that is transport specific? Allowing a connection request to come in on one CID (which is associated with the listening process) and letting that connection be accepted by a different process requires making changes to the user-mode CM infrastructure to allow CIDs to be migrated safely between processes. This is very likely to be more difficult than adding private data comparison to the IB CM. This is all under the covers for socket applications.
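Todd's offset/len discriminator, extended with the optional mask he says would be useful, can be sketched as below. The names and signature are hypothetical -- this is not code from the SilverStorm stack or from Sean's patch:

#include <stdint.h>

/* Match a discriminator at (offset, len) into the REQ private data.
 * mask == NULL compares all bits; len == 0 is a wildcard that
 * matches any private data. */
static int discrim_match(const uint8_t *pdata, unsigned offset,
                         unsigned len, const uint8_t *value,
                         const uint8_t *mask)
{
	unsigned i;

	for (i = 0; i < len; i++) {
		uint8_t m = mask ? mask[i] : 0xff;
		if ((pdata[offset + i] & m) != (value[i] & m))
			return 0;
	}
	return 1;
}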
It avoids the need for the CMA to keep an efficiently searchable tree of listen requests to perform private data comparison when the IB CM already does 90% of the work. To sum up, it is simpler to add the private data compare functionality to the IB CM than to add it to every client that wants it. The changes required don't complicate the API significantly, certainly within the grasp of someone interfacing to verbs. I know this from experience because I've done it before. > Or conversely, if you truly think this is of general utility, > why not implement it in INETD as well? I wasn't making the case that it has general utility, just that it has utility within the realm of IB connection management. Someone else is welcome to expand the scope if they see fit, but that's not what I'm advocating. - Fab From caitlinb at broadcom.com Fri Dec 2 15:25:48 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Fri, 2 Dec 2005 15:25:48 -0800 Subject: [openib-general] [PATCH] [CM] add private data comparison tomatch REQs with listens Message-ID: <54AD0F12E08D1541B826BE97C98F99F10C281B@NT-SJCA-0751.brcm.ad.broadcom.com> Fab Tillier wrote: > > To sum up, it is simpler to add the private data compare > functionality to the IB CM than to add it to every client > that wants it. The changes required don't complicate the API > significantly, certainly within the grasp of someone > interfacing to verbs. I know this from experience because > I've done it before. > >> Or conversely, if you truly think this is of general utility, why not >> implement it in INETD as well? > > I wasn't making the case that it has general utility, just > that it has utility within the realm of IB connection > management. Someone else is welcome to expand the scope if > they see fit, but that's not what I'm advocating. > But if your justification is MPI Ranks then you have already exceeded the scope "IB connection management". There is an *existing* solution on how the remote end establishes multiple connections to the same service but with different instances. That solution has been around for a very long time, literally decades. Needing to restructure your code slightly to preserve an existing interface that has been around that long does not seem inappropriate. Are you claiming that there is something in the definition of the protocol that *requires* IB to handle this differently than other networks do? The only IB specific issue that I can think of is that IB actually can afford to waste Service IDs more than IP can afford to waste TCP Ports.
> > > > I wasn't making the case that it has general utility, just > > that it has utility within the realm of IB connection > > management. Someone else is welcome to expand the scope if > > they see fit, but that's not what I'm advocating. > > > > But if your justification is MPI Ranks then you have already > exceed the scope "IB connection management". Why shouldn't an MPI implementation that interfaces directly to IB verbs use the IB CM functionality? Why should it restrict itself to TCP connection semantics when IB can provide it with something richer? > There is an *existing* solution on how the remote end > establishes multiple connections to the same service > but with different instances. > > That solution has been around for a very long time, > literally decades. Needing to restructure your code > slightly to preserve an existing interface that has > been around that long does not seem inapropriate. The IB CM is not an existing interface. I'M TALKING ABOUT I-N-F-I-N-I-B-A-N-D. INFINIBAND. Not IP, not TCP, not sockets, not iWarp. Show me the IB CM API that has existed for decades. > Are you claiming that there is something in the > definition of the protocol that *requires* IB to > handle this differently than other networks do? IB listen semantics are different from socket listen semantics. Again, IB is not Ethernet, not iWarp, not IP, not TCP. This is an important point that I feel you keep missing. - Fab From caitlinb at broadcom.com Fri Dec 2 16:06:28 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Fri, 2 Dec 2005 16:06:28 -0800 Subject: [openib-general] [PATCH] [CM] add private data comparison tomatch REQs with listens Message-ID: <54AD0F12E08D1541B826BE97C98F99F10C2828@NT-SJCA-0751.brcm.ad.broadcom.com> ftillier.sst at gmail.com wrote: > On 12/2/05, Caitlin Bestler wrote: >> Fab Tillier wrote: >> >>> To sum up, it is simpler to add the private data compare >>> functionality to the IB CM than to add it to every client that wants >>> it. The changes required don't complicate the API significantly, >>> certainly within the grasp of someone interfacing to verbs. I know >>> this from experience because I've done it before. >>> >>>> Or conversely, if you truly think this is of general utility, why >>>> not implement it in INETD as well? >>> >>> I wasn't making the case that it has general utility, just that it >>> has utility within the realm of IB connection management. > Someone >>> else is welcome to expand the scope if they see fit, but that's not >>> what I'm advocating. >>> >> >> But if your justification is MPI Ranks then you have already exceed >> the scope "IB connection management". > > Why shouldn't an MPI implementation that interfaces directly > to IB verbs use the IB CM functionality? Why should it > restrict itself to TCP connection semantics when IB can > provide it with something richer? > >> There is an *existing* solution on how the remote end establishes >> multiple connections to the same service but with different >> instances. >> >> That solution has been around for a very long time, literally >> decades. Needing to restructure your code slightly to preserve an >> existing interface that has been around that long does not seem >> inapropriate. > > The IB CM is not an existing interface. I'M TALKING ABOUT > I-N-F-I-N-I-B-A-N-D. INFINIBAND. Not IP, not TCP, not > sockets, not iWarp. Show me the IB CM API that has existed for > decades. 
> >> Are you claiming that there is something in the definition of the >> protocol that *requires* IB to handle this differently than other >> networks do? > > IB listen semantics are different from socket listen semantics. > Again, IB is not Ethernet, not iWarp, not IP, not TCP. This > is an important point that I feel you keep missing. > > - Fab Socket listen semantics have nothing to do with Ethernet. They are Unix/POSIX. In fact, a major point of socket semantics is that they worked over multiple networks. Sockets are part of the problem when it comes to transferring data once a connection is established, which is why we have QPs and CQs. But there is a very simple transport neutral definition of passive side connection setup. The server issues a listen. The server receives connection requests. The server can optionally hand off the connection request, accept it or reject it. That model is a natural extension of both TCP connection setup and the InfiniBand CM. It allows the server to deal with destination multiplexing. DAPL and IT-API both already work this way. Are you opposed to transport neutral connection establishment? From ftillier at silverstorm.com Fri Dec 2 16:57:37 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Fri, 2 Dec 2005 16:57:37 -0800 Subject: [openib-general] [PATCH] [CM] add private data comparison tomatch REQs with listens In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F10C2828@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <001f01c5f7a4$8e9c6ac0$9e5aa8c0@infiniconsys.com> > From: Caitlin Bestler [mailto:caitlinb at broadcom.com] > Sent: Friday, December 02, 2005 4:06 PM > > Socket listen semantics have nothing to do with Ethernet. > They are Unix/POSIX. In fact, a major point of socket semantics > is that they worked over multiple networks. The IB CM doesn't provide socket semantics. Period, end of story. Providing socket semantics is higher level functionality (the CMA), and outside the scope of the IB CM and this email thread. > Sockets are part of the problem when it comes to transferring > data once a connection is established, which is why we have > QPs and CQs. Irrelevant. > But there is a very simple transport neutral definition of > passive side connection setup. The server issues a listen. > The server receives connection requests. The server can > optionally hand off the connection request, accept it > or reject it. There is no notion of per-request handoff in IB - you either accept or reject - that's it. The reject can cause a redirect, but that requires a new connection request from the client. > That model is a natural extension of both TCP connection > setup and the InfiniBand CM. How does the IB CM protocol support hand off? > It allows the server to deal > with destination multiplexing. DAPL and IT-API both already > work this way. > > Are you opposed to transport neutral connection establishment? I don't give a hoot about transport neutral connection establishment, DAPL, or IT-API in the scope of this email thread. They just aren't relevant whatsoever. This thread is about adding private data comparison functionality to the IB CM. The IB CM is the module to which the CMA interfaces. The CMA is a separate module providing higher level functionality, and is designed to provide transport neutral connection establishment, specifically IP addressing over IB. As Sean originally stated in the mail that started this thread, the CMA will make use of the private data comparison functionality.
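(Concretely, such CMA use might look like the sketch below; the header layout is hypothetical, since the thread does not give the CMA's actual private data format.)

#include <stdint.h>

/* Hypothetical CMA-style REQ private data header. With CM-side
 * comparison, rdma_listen() on (port, addr) could become one CM
 * listen on the port's service ID with discriminator offset =
 * offsetof(struct cma_pdata_hdr, dst_addr), len = 4, value = addr. */
struct cma_pdata_hdr {
	uint8_t  ip_version;  /* 4 or 6 */
	uint8_t  reserved[3];
	uint32_t src_addr;    /* IPv4 source, network byte order */
	uint32_t dst_addr;    /* IPv4 destination, network byte order */
};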
Adding this functionality to the IB CM is simpler than implementing it in the CMA, while at the same time providing additional flexibility to future users of the IB CM that wish to have similar functionality. - Fab From ianjiang.ict at gmail.com Sat Dec 3 09:45:52 2005 From: ianjiang.ict at gmail.com (Ian Jiang) Date: Sun, 4 Dec 2005 01:45:52 +0800 Subject: [openib-general] [kDAPL]How to register a vmalloc() allocated buffer Message-ID: <7b2fa1820512030945j22e205d9j86a3b8e7bd709182@mail.gmail.com> I am doing a simple rdma-read test using kDAPL. My test is running in kernel mode. When I allocate both the data source and sink buffers using kmalloc() and register the buffers using dat_lmr_kcreate() with memory type DAT_MEM_TYPE_PHYSICAL, everything goes well. If the sink buffer is allocated with vmalloc() and registered as before, no registration error or rdma read DTO completion error occurs, but ... My questions: 1) Could a buffer allocated with vmalloc() be used for a kDAPL rdma reading? If so, 2) should a buffer of this kind be registered in the same way as a buffer allocated with kmalloc()? Could anyone give some suggestions? Thanks very much! -- Ian Jiang ianjiang.ict at gmail.com Laboratory of Spatial Information Technology Division of System Architecture Institute of Computing Technology Chinese Academy of Sciences From yael at mellanox.co.il Sun Dec 4 01:44:10 2005 From: yael at mellanox.co.il (Yael Kalka) Date: 04 Dec 2005 11:44:10 +0200 Subject: [openib-general] [PATCH] Opensm - fix segfault on exit Message-ID: <5z8xv1xklx.fsf@mtl066.yok.mtl.com> Hi Hal, If the driver isn't loaded, opensm exits with a segfault. This is because it tries to destroy the signal event in the osm_vendor but, due to the failure, this event was never created. The following patch fixes this. Thanks, Yael Signed-off-by: Yael Kalka

Index: libvendor/osm_vendor_ibumad.c
===================================================================
--- libvendor/osm_vendor_ibumad.c	(revision 4281)
+++ libvendor/osm_vendor_ibumad.c	(working copy)
@@ -552,6 +552,7 @@ osm_vendor_delete(
   /* umad receiver thread ? */
   p_ur = (*pp_vend)->receiver;
+  if (&p_ur->signal) cl_event_destroy( &p_ur->signal );
   cl_spinlock_destroy( &(*pp_vend)->cb_lock );
   cl_spinlock_destroy( &(*pp_vend)->match_tbl_lock );

From ogerlitz at voltaire.com Sun Dec 4 03:35:38 2005 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 04 Dec 2005 13:35:38 +0200 Subject: [openib-general] Re: spinlock wrong CPU on CPU#1, ib_addr In-Reply-To: <438F40B0.4010000@ichips.intel.com> References: <438F40B0.4010000@ichips.intel.com> Message-ID: <4392D48A.3090201@voltaire.com> Sean Hefty wrote: > it looks like the adaptor_list_lock in iser_adaptor_find_device() was > acquired while running on CPU 0, but an attempt was made to release it on CPU 1. Indeed. The problem did not reproduce; moreover, I can't see why a thread that was interrupted on one CPU would resume running on another CPU. For now, I will not change this code to lock IRQs. Or. From yael at mellanox.co.il Sun Dec 4 04:10:15 2005 From: yael at mellanox.co.il (Yael Kalka) Date: Sun, 4 Dec 2005 14:10:15 +0200 Subject: [Fwd: [openib-general] OpenSM and Wrong SM_Key] Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30E245C@mtlexch01.mtl.com> Hi Eitan, Hal, I agree that currently we do not have an authentication mechanism, and thus we cannot decide that an SM is not trusted.
I think that in the current situation the option of always sending our true SM_Key when receiving SMInfo SubnGet request is a good one. In this case - there is no need to update anything in the SMInfo SubnGet request. Any objections? Yael -----Original Message----- From: Hal Rosenstock [mailto:halr at voltaire.com] Sent: Thursday, December 01, 2005 7:02 PM To: Eitan Zahavi Cc: Yael Kalka; openib-general at openib.org Subject: RE: [Fwd: [openib-general] OpenSM and Wrong SM_Key] Hi Eitan, On Thu, 2005-12-01 at 10:35, Eitan Zahavi wrote: > Hi Yael, > > As I read through the MgtWg mails I get the impression that an out of > spec mechanism is required to know if the other SM is trusted. Yes, that was what I was proposing (in http://openib.org/pipermail/openib-general/2005-December/014186.html where I wrote "The SM needs a way to know whether the other SM(s) (and which ones) are trusted or not so the SM_Key can be filled in."): that OpenSM have a list of trusted SMs and OpenSM would use that information. > In that case and since OpenSM does not currently provide any such > mechanism, I would prefer never to send out the SM_Key on the request > and always send zero. Sending our SM_Key to a non - trusted SM is not a > good idea in my mind. > > OpenSM behavior should be to always trust any other SM. Above you said no other SM was trusted so do you mean not trust rather than trust other SMs ? > So any discovered SM that deserves to be the master should be granted > that right. Only if it were trusted and had the correct SM Key. -- Hal > Eitan Zahavi > Design Technology Director > Mellanox Technologies LTD > Tel:+972-4-9097208 > Fax:+972-4-9593245 > P.O. Box 586 Yokneam 20692 ISRAEL > > > > -----Original Message----- > > From: Yael Kalka > > Sent: Thursday, December 01, 2005 2:17 PM > > To: 'Hal Rosenstock'; Eitan Zahavi > > Cc: openib-general at openib.org > > Subject: RE: [Fwd: [openib-general] OpenSM and Wrong SM_Key] > > > > Hi Hal, Eitan, > > I think the best option is to add an OpenSM option flag - > exit_on_fatal. > > This flag can decide on the action on fatal cases: > > 1. Exit or not when seeing SM with different SM_Key. > > 2. Exit or not when there is a fatal link error (e.g - multiple > guids). > > etc. > > > > I tried to run 2 SMs just now with different SM_keys, and I see that > none of them > > exit, since both receive SM_Key=0 on SMInfo GetResp. > > The reason for that is that in the SMInfo Get request (as in all other > requests) > > we do not send anything in the mad data. Meaning - all fields are > clear. > > In the __osm_sminfo_rcv_process_get_request function we are checking > the state > > according > > to the payload data. This is always zero! Thus - SM will never know > that the SMInfo > > request is sent from an SM that is master. > > > > I will work on a fix for that. > > Yael > > > > -----Original Message----- > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > Sent: Wednesday, November 30, 2005 11:57 PM > > To: Yael Kalka; Eitan Zahavi > > Cc: openib-general at openib.org > > Subject: [Fwd: [openib-general] OpenSM and Wrong SM_Key] > > > > > > Hi Yael & Eitan, > > > > Based on the recent MgtWG discussions, are you still holding your > > position in terms of exiting OpenSM when a non matching SM Key is > > discovered ? Just wondering if I can issue a patch for this and clear > > this issue so OpenSM can be compliant for this aspect. Thanks. 
> > > > -- Hal
> > > >
> > > > -----Forwarded Message-----
> > > >
> > > > From: Hal Rosenstock
> > To: openib-general at openib.org
> > Subject: [openib-general] OpenSM and Wrong SM_Key
> > Date: 08 Nov 2005 16:08:47 -0500
> >
> > Hi,
> >
> > Currently, when OpenSM receives SMInfo with a different SM_Key, it
> exits
> > as follows:
> >
> > void
> > __osm_sminfo_rcv_process_get_response(
> >   IN const osm_sminfo_rcv_t* const p_rcv,
> >   IN const osm_madw_t* const p_madw )
> > {
> > ...
> >
> >   /*
> >     Check that the sm_key of the found SM is the same as ours,
> >     or is zero. If not - OpenSM cannot continue with
> configuration!. */
> >   if ( p_smi->sm_key != 0 &&
> >        p_smi->sm_key != p_rcv->p_subn->opt.sm_key )
> >   {
> >     osm_log( p_rcv->p_log, OSM_LOG_ERROR,
> >              "__osm_sminfo_rcv_process_get_response: ERR 2F18: "
> >              "Got SM with sm_key that doesn't match our "
> >              "local key. Exiting\n" );
> >     osm_log( p_rcv->p_log, OSM_LOG_SYS,
> >              "Found remote SM with non-matching sm_key. Exiting\n" );
> >     osm_exit_flag = TRUE;
> >     goto Exit;
> >   }
> >
> > C14-61.2.1 states that:
> > A master SM which finds a higher priority master SM with the wrong
> > SM_Key should not relinquish the subnet.
> >
> > Exiting OpenSM relinquishes the subnet.
> >
> > So it appears to me that perhaps this behavior of exiting OpenSM should
> > be at least contingent on the SM state and relative priority of the
> > SMInfo received. Make sense ? If so, I will work on a patch for this.
> >
> > -- Hal
> >
> > _______________________________________________
> > openib-general mailing list
> > openib-general at openib.org
> > http://openib.org/mailman/listinfo/openib-general
> >
> > To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general

From yael at mellanox.co.il  Sun Dec  4 05:02:50 2005
From: yael at mellanox.co.il (Yael Kalka)
Date: 04 Dec 2005 15:02:50 +0200
Subject: [openib-general] [PATCH] Opensm - duplicated guids issue
Message-ID: <5z7jalxbet.fsf@mtl066.yok.mtl.com>

Hi Hal,

Currently, if OpenSM discovers duplicated GUIDs, or a 12x link with lane
reversal badly configured, it only issues an error to the log file. This
issue, though, is much more problematic, since it will cause part of the
subnet to be un-initialized.
The following patch includes a fuller handling of the issue - first, issue
an error message to the /var/log/messages file as well. Second - add an
option flag to the SM that will define whether or not to exit in such
cases.

Thanks,
Yael

Signed-off-by: Yael Kalka

Index: include/opensm/osm_subnet.h
===================================================================
--- include/opensm/osm_subnet.h	(revision 4288)
+++ include/opensm/osm_subnet.h	(working copy)
@@ -235,6 +235,7 @@ typedef struct _osm_subn_opt
   osm_testability_modes_t testability_mode;
   boolean_t updn_activate;
   char * updn_guid_file;
+  boolean_t exit_on_fatal;
 } osm_subn_opt_t;
 /*
 * FIELDS
@@ -372,6 +373,13 @@ typedef struct _osm_subn_opt
 * updn_guid_file
 *   Pointer to name of the UPDN guid file given by User
 *
+* exit_on_fatal
+*   If TRUE (default) - SM will exit on fatal subnet initialization issues.
+*   If FALSE - SM will not exit.
+*   Fatal initialization issues:
+*   a. SM recognizes 2 different nodes with the same guid, or 12x link with
+*      lane reversal badly configured.
+* * SEE ALSO * Subnet object *********/ Index: opensm/osm_subnet.c =================================================================== --- opensm/osm_subnet.c (revision 4288) +++ opensm/osm_subnet.c (working copy) @@ -440,6 +440,7 @@ osm_subn_set_default_opt( p_opt->testability_mode = OSM_TEST_MODE_NONE; p_opt->updn_activate = FALSE; p_opt->updn_guid_file = NULL; + p_opt->exit_on_fatal = TRUE; } /********************************************************************** @@ -765,6 +766,10 @@ osm_subn_parse_conf_file( __osm_subn_opts_unpack_charp( "updn_guid_file" , p_key, p_val, &p_opts->updn_guid_file); + + __osm_subn_opts_unpack_boolean( + "exit_on_fatal", + p_key, p_val, &p_opts->exit_on_fatal); } } fclose(opts_file); @@ -930,14 +935,17 @@ osm_subn_write_conf_file( "# If TRUE if OpenSM should disable multicast support\n" "no_multicast_option %s\n\n" "# No multicast routing is performed if TRUE\n" - "disable_multicast %s\n\n", + "disable_multicast %s\n\n" + "# If TRUE opensm will exit on fatal initialization issues\n" + "exit_on_fatal %s\n\n", p_opts->log_flags, p_opts->force_log_flush ? "TRUE" : "FALSE", p_opts->log_file, p_opts->accum_log_file ? "TRUE" : "FALSE", p_opts->dump_files_dir, p_opts->no_multicast_option ? "TRUE" : "FALSE", - p_opts->disable_multicast ? "TRUE" : "FALSE" + p_opts->disable_multicast ? "TRUE" : "FALSE", + p_opts->exit_on_fatal ? "TRUE" : "FALSE" ); /* optional string attributes ... */ Index: opensm/osm_node_info_rcv.c =================================================================== --- opensm/osm_node_info_rcv.c (revision 4288) +++ opensm/osm_node_info_rcv.c (working copy) @@ -198,6 +198,14 @@ __osm_ni_rcv_set_links( p_ni_context->port_num, dr_new_path ); + + osm_log( p_rcv->p_log, OSM_LOG_SYS, + "Errors on subnet. SM found duplicated guids or 12x " + "link with lane reversal badly configured. " + "Use osm log for more details.\n"); + + if ( p_rcv->p_subn->opt.exit_on_fatal == TRUE ) + exit( 1 ); } /* Index: opensm/main.c =================================================================== --- opensm/main.c (revision 4288) +++ opensm/main.c (working copy) @@ -178,6 +178,12 @@ show_usage(void) " This option will cause deletion of the log file\n" " (if it previously exists). 
By default, the log file\n"
          " is accumulative.\n\n");
+  printf( "-y\n"
+          "--stay_on_fatal\n"
+          " This option will cause SM not to exit on fatal initialization\n"
+          " issues: If SM discovers duplicated guids or 12x link with\n"
+          " lane reversal badly configured.\n"
+          " By default, the SM will exit.\n\n");
   printf( "-v\n"
           "--verbose\n"
           " This option increases the log verbosity level.\n"
@@ -460,7 +466,7 @@ main(
   boolean_t cache_options = FALSE;
   char *ignore_guids_file_name = NULL;
   uint32_t val;
-  const char * const short_option = "i:f:ed:g:l:s:t:a:uvVhorc";
+  const char * const short_option = "i:f:ed:g:l:s:t:a:uvVhorcy";
 
   /*
     In the array below, the 2nd parameter specified the number
@@ -492,6 +498,7 @@ main(
     { "updn", 0, NULL, 'u'},
     { "add_guid_file", 1, NULL, 'a'},
     { "cache-options", 0, NULL, 'c'},
+    { "stay_on_fatal", 0, NULL, 'y'},
     { NULL, 0, NULL, 0 } /* Required at the end of the array */
   };
 
@@ -665,6 +672,11 @@ main(
       printf(" Creating new log file\n");
       break;
 
+    case 'y':
+      opt.exit_on_fatal = FALSE;
+      printf(" Staying on fatal initialization\n");
+      break;
+
     case 'v':
       log_flags = (log_flags <<1 )|1;
       printf(" Verbose option -v (log flags = 0x%X)\n", log_flags );

From halr at voltaire.com  Sun Dec  4 09:11:36 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: Sun, 4 Dec 2005 19:11:36 +0200
Subject: [Fwd: [openib-general] OpenSM and Wrong SM_Key]
Message-ID: <5CE025EE7D88BA4599A2C8FEFCF226F589AB79@taurus.voltaire.com>

Hi Yael,

On Sun, 2005-12-04 at 07:10, Yael Kalka wrote:
> Hi Eitan, Hal,
>
> I agree that currently we do not have an authentication mechanism,
> thus - we cannot decide that an SM is not trusted.
> I think that in the current situation the option of always sending our
> true SM_Key when receiving SMInfo SubnGet request is a good one.
> In this case - there is no need to update anything in the SMInfo SubnGet
> request.
> Any objections?

IMO this is a first step (assuming a subnet with only OpenSMs, and hence
all are trusted). What needs to be done is:
The SM needs a way to know whether the other SM(s) (and which ones) are
trusted or not so the SM_Key can be filled in. To accomplish this,
OpenSM needs to have a list of trusted SMs (e.g. additional
configuration).

Also, given that the current default SM Key is 0, there is no difference
here (so perhaps the default SM Key should be changed to a non-zero
value). There is some ambiguity in the spec currently around this (and a
comment has been filed with the MgtWG on this).

-- Hal

> Yael
>
>
> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com]
> Sent: Thursday, December 01, 2005 7:02 PM
> To: Eitan Zahavi
> Cc: Yael Kalka; openib-general at openib.org
> Subject: RE: [Fwd: [openib-general] OpenSM and Wrong SM_Key]
>
>
> Hi Eitan,
>
> On Thu, 2005-12-01 at 10:35, Eitan Zahavi wrote:
> > Hi Yael,
> >
> > As I read through the MgtWg mails I get the impression that an out of
> > spec mechanism is required to know if the other SM is trusted.
>
> Yes, that was what I was proposing (in
> http://openib.org/pipermail/openib-general/2005-December/014186.html
> where I wrote "The SM needs a way to know whether the other SM(s) (and
> which ones) are trusted or not so the SM_Key can be filled in."): that
> OpenSM have a list of trusted SMs and OpenSM would use that information.
>
> > In that case and since OpenSM does not currently provide any such
> > mechanism, I would prefer never to send out the SM_Key on the request
> > and always send zero.
Sending our SM_Key to a non - trusted SM is not > a > > good idea in my mind. > > > > OpenSM behavior should be to always trust any other SM. > > Above you said no other SM was trusted so do you mean not trust rather > than trust other SMs ? > > > So any discovered SM that deserves to be the master should be granted > > that right. > > Only if it were trusted and had the correct SM Key. > > -- Hal > > > Eitan Zahavi > > Design Technology Director > > Mellanox Technologies LTD > > Tel:+972-4-9097208 > > Fax:+972-4-9593245 > > P.O. Box 586 Yokneam 20692 ISRAEL > > > > > > > -----Original Message----- > > > From: Yael Kalka > > > Sent: Thursday, December 01, 2005 2:17 PM > > > To: 'Hal Rosenstock'; Eitan Zahavi > > > Cc: openib-general at openib.org > > > Subject: RE: [Fwd: [openib-general] OpenSM and Wrong SM_Key] > > > > > > Hi Hal, Eitan, > > > I think the best option is to add an OpenSM option flag - > > exit_on_fatal. > > > This flag can decide on the action on fatal cases: > > > 1. Exit or not when seeing SM with different SM_Key. > > > 2. Exit or not when there is a fatal link error (e.g - multiple > > guids). > > > etc. > > > > > > I tried to run 2 SMs just now with different SM_keys, and I see that > > none of them > > > exit, since both receive SM_Key=0 on SMInfo GetResp. > > > The reason for that is that in the SMInfo Get request (as in all > other > > requests) > > > we do not send anything in the mad data. Meaning - all fields are > > clear. > > > In the __osm_sminfo_rcv_process_get_request function we are checking > > the state > > > according > > > to the payload data. This is always zero! Thus - SM will never know > > that the SMInfo > > > request is sent from an SM that is master. > > > > > > I will work on a fix for that. > > > Yael > > > > > > -----Original Message----- > > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > > Sent: Wednesday, November 30, 2005 11:57 PM > > > To: Yael Kalka; Eitan Zahavi > > > Cc: openib-general at openib.org > > > Subject: [Fwd: [openib-general] OpenSM and Wrong SM_Key] > > > > > > > > > Hi Yael & Eitan, > > > > > > Based on the recent MgtWG discussions, are you still holding your > > > position in terms of exiting OpenSM when a non matching SM Key is > > > discovered ? Just wondering if I can issue a patch for this and > clear > > > this issue so OpenSM can be compliant for this aspect. Thanks. > > > > > > -- Hal > > > > > > -----Forwarded Message----- > > > > > > From: Hal Rosenstock > > > To: openib-general at openib.org > > > Subject: [openib-general] OpenSM and Wrong SM_Key > > > Date: 08 Nov 2005 16:08:47 -0500 > > > > > > Hi, > > > > > > Currently, when OpenSM receives SMInfo with a different SM_Key, it > > exits > > > as follows: > > > > > > > > > void > > > __osm_sminfo_rcv_process_get_response( > > > IN const osm_sminfo_rcv_t* const p_rcv, > > > IN const osm_madw_t* const p_madw ) > > > { > > > ... > > > > > > > > > > > > /* > > > Check that the sm_key of the found SM is the same as ours, > > > or is zero. If not - OpenSM cannot continue with > configuration!. > > */ > > > if ( p_smi->sm_key != 0 && > > > p_smi->sm_key != p_rcv->p_subn->opt.sm_key ) > > > { > > > osm_log( p_rcv->p_log, OSM_LOG_ERROR, > > > "__osm_sminfo_rcv_process_get_response: ERR 2F18: " > > > "Got SM with sm_key that doesn't match our " > > > "local key. Exiting\n" ); > > > osm_log( p_rcv->p_log, OSM_LOG_SYS, > > > "Found remote SM with non-matching sm_key. 
Exiting\n"
> );
> > > osm_exit_flag = TRUE;
> > > goto Exit;
> > > }
> > >
> > > C14-61.2.1 states that:
> > > A master SM which finds a higher priority master SM with the wrong
> > > SM_Key should not relinquish the subnet.
> > >
> > > Exiting OpenSM relinquishes the subnet.
> > >
> > > So it appears to me that perhaps this behavior of exiting OpenSM
> should
> > > be at least contingent on the SM state and relative priority of the
> > > SMInfo received. Make sense ? If so, I will work on a patch for
> this.
> > >
> > > -- Hal
> > >
> > >
> > > _______________________________________________
> > > openib-general mailing list
> > > openib-general at openib.org
> > > http://openib.org/mailman/listinfo/openib-general
> > >
> > > To unsubscribe, please visit
> > http://openib.org/mailman/listinfo/openib-general

From mst at mellanox.co.il  Sun Dec  4 10:57:26 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 4 Dec 2005 20:57:26 +0200
Subject: [openib-general] libibcommon: fix make dist
Message-ID: <20051204185726.GA27549@mellanox.co.il>

fix make dist for libibcommon

Signed-off-by: Michael S. Tsirkin

Index: trunk/src/userspace/management/libibcommon/Makefile.am
===================================================================
--- trunk.orig/src/userspace/management/libibcommon/Makefile.am
+++ trunk/src/userspace/management/libibcommon/Makefile.am
@@ -22,7 +22,8 @@ libibcommonincludedir = $(includedir)/in
 
 libibcommoninclude_HEADERS = $(srcdir)/include/infiniband/common.h
 
-EXTRA_DIST = $(srcdir)/include/infiniband/common.h libibcommon.spec.in
+EXTRA_DIST = $(srcdir)/include/infiniband/common.h libibcommon.spec.in \
+	$(srcdir)/src/libibcommon.map
 
 dist-hook: libibcommon.spec
 	cp libibcommon.spec $(distdir)

-- 
MST

From halr at voltaire.com  Sun Dec  4 14:57:24 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 04 Dec 2005 17:57:24 -0500
Subject: [openib-general] Re: libibcommon: fix make dist
In-Reply-To: <20051204185726.GA27549@mellanox.co.il>
References: <20051204185726.GA27549@mellanox.co.il>
Message-ID: <1133736913.4587.8485.camel@hal.voltaire.com>

On Sun, 2005-12-04 at 13:57, Michael S. Tsirkin wrote:
> fix make dist for libibcommon

Thanks. Applied.
From eitan at mellanox.co.il  Sun Dec  4 23:46:56 2005
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Mon, 5 Dec 2005 09:46:56 +0200
Subject: [openib-general] OpenSM: search_mgrp_by_mgid questions
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3618A56@mtlexch01.mtl.com>

Hi Hal,

I thought we all agree that a full MGID compare is required.
Also we should not deal with MGRPs marked "to be deleted".
For all purposes but MGRP re-route they should not exist...

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O.
Box 586 Yokneam 20692 ISRAEL

> > > -----Original Message-----
> > > From: Hal Rosenstock [mailto:halr at voltaire.com]
> > > Sent: Thursday, December 01, 2005 4:53 PM
> > > To: Yael Kalka
> > > Cc: openib-general at openib.org
> > > Subject: [openib-general] OpenSM: search_mgrp_by_mgid questions
> > >
> > > Hi Yael,
> > >
> > > osm_sa_path_record.c::__search_mgrp_by_mgid has the following:
> > >
> > >   p_recvd_mgid = p_ctxt->p_mgid;
> > >   p_rcv = p_ctxt->p_rcv;
> > >
> > >   /* Why not compare the entire MGID ???? */
> > >   /* different scope can sneak in for the same MGID ? */
> > >   /* EZ: I changed it to full compare ! */
> > >   if (cl_memcmp(&p_mgrp->mcmember_rec.mgid,
> > >                 p_recvd_mgid,
> > >                 sizeof(ib_gid_t)))
> > >     return;
> > >
> > > whereas osm_sa_mcmember_record.c::__search_mgrp_by_mgid has the
> > > following:
> > >
> > >   p_recvd_mcmember_rec = p_ctxt->p_mcmember_rec;
> > >   p_rcv = p_ctxt->p_rcv;
> > >
> > >   /* ignore groups marked for deletion */
> > >   if (p_mgrp->to_be_deleted)
> > >     return;
> > >
> > >   /* compare entire MGID so different scope will not sneak in for
> > >      the same MGID */
> > >   if (cl_memcmp(&p_mgrp->mcmember_rec.mgid,
> > >                 &p_recvd_mcmember_rec->mgid,
> > >                 sizeof(ib_gid_t)))
> > >     return;
> > >
> > > Shouldn't the SA PR code also check for "to be deleted" ? It also
> > seems
> > > like the comments on the MGID comparison should also be made the same.
> > >
> > > -- Hal
> > >
> > > _______________________________________________
> > > openib-general mailing list
> > > openib-general at openib.org
> > > http://openib.org/mailman/listinfo/openib-general
> > >
> > > To unsubscribe, please visit
> > http://openib.org/mailman/listinfo/openib-general

From eitan at mellanox.co.il  Sun Dec  4 23:52:21 2005
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Mon, 5 Dec 2005 09:52:21 +0200
Subject: [Fwd: [openib-general] OpenSM and Wrong SM_Key]
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3618A57@mtlexch01.mtl.com>

> >
> > I agree that currently we do not have an authentication mechanism,
> > thus - we cannot decide that an SM is not trusted.
> > I think that in the current situation the option of always sending our
> > true SM_Key when receiving SMInfo SubnGet request is a good one.
> > In this case - there is no need to update anything in the SMInfo SubnGet
> > request.
> > Any objections?
>
> IMO this is a first step (assuming in subnet with only OpenSMs and hence
> all are trusted). What needs to be done is:
> The SM needs a way to know whether the other SM(s) (and which ones) are
> trusted or not so the SM_Key can be filled in. To accomplish this,
> OpenSM needs to have a list of trusted SMs (e.g. additional
> configuration).
[EZ] I guess what you mean is that a list of trusted SM's port guids will
be provided to the SM? We can do that.
>
> Also, given that the current default SM Key is 0. there is no difference
> here (so perhaps the default SM Key should be changed to a non 0 value).
> There is some ambiguity in the spec currently around this (and a comment
> has been filed with the MgtWG on this).
> > -- Hal > > > Yael > > > > > > -----Original Message----- > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > Sent: Thursday, December 01, 2005 7:02 PM > > To: Eitan Zahavi > > Cc: Yael Kalka; openib-general at openib.org > > Subject: RE: [Fwd: [openib-general] OpenSM and Wrong SM_Key] > > > > > > Hi Eitan, > > > > On Thu, 2005-12-01 at 10:35, Eitan Zahavi wrote: > > > Hi Yael, > > > > > > As I read through the MgtWg mails I get the impression that an out of > > > spec mechanism is required to know if the other SM is trusted. > > > > Yes, that was what I was proposing (in > > http://openib.org/pipermail/openib-general/2005-December/014186.html > > where I wrote "The SM needs a way to know whether the other SM(s) (and > > which ones) are trusted or not so the SM_Key can be filled in."): that > > OpenSM have a list of trusted SMs and OpenSM would use that information. > > > > > In that case and since OpenSM does not currently provide any such > > > mechanism, I would prefer never to send out the SM_Key on the request > > > and always send zero. Sending our SM_Key to a non - trusted SM is not > > a > > > good idea in my mind. > > > > > > OpenSM behavior should be to always trust any other SM. > > > > Above you said no other SM was trusted so do you mean not trust rather > > than trust other SMs ? > > > > > So any discovered SM that deserves to be the master should be granted > > > that right. > > > > Only if it were trusted and had the correct SM Key. > > > > -- Hal > > > > > Eitan Zahavi > > > Design Technology Director > > > Mellanox Technologies LTD > > > Tel:+972-4-9097208 > > > Fax:+972-4-9593245 > > > P.O. Box 586 Yokneam 20692 ISRAEL > > > > > > > > > > -----Original Message----- > > > > From: Yael Kalka > > > > Sent: Thursday, December 01, 2005 2:17 PM > > > > To: 'Hal Rosenstock'; Eitan Zahavi > > > > Cc: openib-general at openib.org > > > > Subject: RE: [Fwd: [openib-general] OpenSM and Wrong SM_Key] > > > > > > > > Hi Hal, Eitan, > > > > I think the best option is to add an OpenSM option flag - > > > exit_on_fatal. > > > > This flag can decide on the action on fatal cases: > > > > 1. Exit or not when seeing SM with different SM_Key. > > > > 2. Exit or not when there is a fatal link error (e.g - multiple > > > guids). > > > > etc. > > > > > > > > I tried to run 2 SMs just now with different SM_keys, and I see that > > > none of them > > > > exit, since both receive SM_Key=0 on SMInfo GetResp. > > > > The reason for that is that in the SMInfo Get request (as in all > > other > > > requests) > > > > we do not send anything in the mad data. Meaning - all fields are > > > clear. > > > > In the __osm_sminfo_rcv_process_get_request function we are checking > > > the state > > > > according > > > > to the payload data. This is always zero! Thus - SM will never know > > > that the SMInfo > > > > request is sent from an SM that is master. > > > > > > > > I will work on a fix for that. > > > > Yael > > > > > > > > -----Original Message----- > > > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > > > Sent: Wednesday, November 30, 2005 11:57 PM > > > > To: Yael Kalka; Eitan Zahavi > > > > Cc: openib-general at openib.org > > > > Subject: [Fwd: [openib-general] OpenSM and Wrong SM_Key] > > > > > > > > > > > > Hi Yael & Eitan, > > > > > > > > Based on the recent MgtWG discussions, are you still holding your > > > > position in terms of exiting OpenSM when a non matching SM Key is > > > > discovered ? 
Just wondering if I can issue a patch for this and > > clear > > > > this issue so OpenSM can be compliant for this aspect. Thanks. > > > > > > > > -- Hal > > > > > > > > -----Forwarded Message----- > > > > > > > > From: Hal Rosenstock > > > > To: openib-general at openib.org > > > > Subject: [openib-general] OpenSM and Wrong SM_Key > > > > Date: 08 Nov 2005 16:08:47 -0500 > > > > > > > > Hi, > > > > > > > > Currently, when OpenSM receives SMInfo with a different SM_Key, it > > > exits > > > > as follows: > > > > > > > > > > > > void > > > > __osm_sminfo_rcv_process_get_response( > > > > IN const osm_sminfo_rcv_t* const p_rcv, > > > > IN const osm_madw_t* const p_madw ) > > > > { > > > > ... > > > > > > > > > > > > > > > > /* > > > > Check that the sm_key of the found SM is the same as ours, > > > > or is zero. If not - OpenSM cannot continue with > > configuration!. > > > */ > > > > if ( p_smi->sm_key != 0 && > > > > p_smi->sm_key != p_rcv->p_subn->opt.sm_key ) > > > > { > > > > osm_log( p_rcv->p_log, OSM_LOG_ERROR, > > > > "__osm_sminfo_rcv_process_get_response: ERR 2F18: " > > > > "Got SM with sm_key that doesn't match our " > > > > "local key. Exiting\n" ); > > > > osm_log( p_rcv->p_log, OSM_LOG_SYS, > > > > "Found remote SM with non-matching sm_key. Exiting\n" > > ); > > > > osm_exit_flag = TRUE; > > > > goto Exit; > > > > } > > > > > > > > C14-61.2.1 states that: > > > > A master SM which finds a higher priority master SM with the wrong > > > > SM_Key should not relinquish the subnet. > > > > > > > > Exiting OpenSM relinquishes the subnet. > > > > > > > > So it appears to me that perhaps this behavior of exiting OpenSM > > > should > > > > be at least contingent on the SM state and relative priority of the > > > > SMInfo received. Make sense ? If so, I will work on a patch for > > this. > > > > > > > > -- Hal > > > > > > > > > > > > _______________________________________________ > > > > openib-general mailing list > > > > openib-general at openib.org > > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > To unsubscribe, please visit > > > http://openib.org/mailman/listinfo/openib-general From eitan at mellanox.co.il Sun Dec 4 23:58:43 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 5 Dec 2005 09:58:43 +0200 Subject: [openib-general] OpenSM: search_mgrp_by_mgid questions Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3618A58@mtlexch01.mtl.com> Hi Hal, Yael tells me I did not understand your question. I also see you have provided the patch implementing exactly what I want. Please ignore my previous mail. (Maybe the zillion mails long inbox can serve as a poor excuse for my previous one ...) Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Eitan Zahavi > Sent: Monday, December 05, 2005 9:47 AM > To: Hal Rosenstock > Cc: openib-general at openib.org > Subject: RE: [openib-general] OpenSM: search_mgrp_by_mgid questions > > Hi Hal, > > I thought we all agree that a full MGID compare is required. > Also we should not deal with MGRPs marked "to be deleted". > For all purposes but MGRP re-route they should not exist... > > Eitan Zahavi > Design Technology Director > Mellanox Technologies LTD > Tel:+972-4-9097208 > Fax:+972-4-9593245 > P.O. 
Box 586 Yokneam 20692 ISRAEL > > > > -----Original Message----- > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > Sent: Thursday, December 01, 2005 7:20 PM > > To: Eitan Zahavi > > Cc: Yael Kalka; openib-general at openib.org > > Subject: RE: [openib-general] OpenSM: search_mgrp_by_mgid questions > > > > Hi Eitan, > > > > On Thu, 2005-12-01 at 10:28, Eitan Zahavi wrote: > > > Hi Hal, > > > > > > You are very right. Thanks. Can you patch it? > > > > Sure. Any prefereance for which way should the comment be (like PR or > > MCM) ? > > > > -- Hal > > > > > Eitan Zahavi > > > Design Technology Director > > > Mellanox Technologies LTD > > > Tel:+972-4-9097208 > > > Fax:+972-4-9593245 > > > P.O. Box 586 Yokneam 20692 ISRAEL > > > > > > > > > > -----Original Message----- > > > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > > > Sent: Thursday, December 01, 2005 4:53 PM > > > > To: Yael Kalka > > > > Cc: openib-general at openib.org > > > > Subject: [openib-general] OpenSM: search_mgrp_by_mgid questions > > > > > > > > Hi Yael, > > > > > > > > osm_sa_path_record.c::__search_mgrp_by_mgid has the following: > > > > > > > > p_recvd_mgid = p_ctxt->p_mgid; > > > > p_rcv = p_ctxt->p_rcv; > > > > > > > > /* Why not compare the entire MGID ???? */ > > > > /* different scope can sneak in for the same MGID ? */ > > > > /* EZ: I changed it to full compare ! */ > > > > if (cl_memcmp(&p_mgrp->mcmember_rec.mgid, > > > > p_recvd_mgid, > > > > sizeof(ib_gid_t))) > > > > return; > > > > > > > > whereas osm_sa_mcmember_record.c::__search_mgrp_by_mgid has the > > > > following: > > > > > > > > p_recvd_mcmember_rec = p_ctxt->p_mcmember_rec; > > > > p_rcv = p_ctxt->p_rcv; > > > > > > > > /* ignore groups marked for deletion */ > > > > if (p_mgrp->to_be_deleted) > > > > return; > > > > > > > > /* compare entire MGID so different scope will not sneak in for > > > > the same MGID */ > > > > if (cl_memcmp(&p_mgrp->mcmember_rec.mgid, > > > > &p_recvd_mcmember_rec->mgid, > > > > sizeof(ib_gid_t))) > > > > return; > > > > > > > > Shouldn't the SA PR code also check for "to be deleted" ? It also > > > seems > > > > like the comments on the MGID comparison should also be made the > same. > > > > > > > > -- Hal > > > > > > > > > > > > _______________________________________________ > > > > openib-general mailing list > > > > openib-general at openib.org > > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > To unsubscribe, please visit > > > http://openib.org/mailman/listinfo/openib-general > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From halr at voltaire.com Mon Dec 5 03:19:21 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Dec 2005 06:19:21 -0500 Subject: [Fwd: [openib-general] OpenSM and Wrong SM_Key] In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E3618A57@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E3618A57@mtlexch01.mtl.com> Message-ID: <1133781560.4587.11462.camel@hal.voltaire.com> On Mon, 2005-12-05 at 02:52, Eitan Zahavi wrote: > > > > > > I agree that currently we do not have an authentication mechanism, > > > thus - we cannot decide that an SM is not trusted. > > > I think that in the current situation the option of always sending > our > > > true SM_Key when receiving SMInfo SubnGet request is a good one. 
> > > In this case - there is no need to update anything in the SMInfo > SubnGet > > > request. > > > Any objections? > > > > IMO this is a first step (assuming in subnet with only OpenSMs and > hence > > all are trusted). What needs to be done is: > > The SM needs a way to know whether the other SM(s) (and which ones) > are > > trusted or not so the SM_Key can be filled in. To accomplish this, > > OpenSM needs to have a list of trusted SMs (e.g. additional > > configuration). > [EZ] I guess what you mean is that a list of trusted SM's port guids > will provided to the SM? We can do that. Yes, that's what I mean/meant. -- Hal > > > > Also, given that the current default SM Key is 0. there is no > difference > > here (so perhaps the default SM Key should be changed to a non 0 > value). > > There is some ambiguity in the spec currently around this (and a > comment > > has been filed with the MgtWG on this). > > > > -- Hal > > > > > Yael > > > > > > > > > -----Original Message----- > > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > > Sent: Thursday, December 01, 2005 7:02 PM > > > To: Eitan Zahavi > > > Cc: Yael Kalka; openib-general at openib.org > > > Subject: RE: [Fwd: [openib-general] OpenSM and Wrong SM_Key] > > > > > > > > > Hi Eitan, > > > > > > On Thu, 2005-12-01 at 10:35, Eitan Zahavi wrote: > > > > Hi Yael, > > > > > > > > As I read through the MgtWg mails I get the impression that an out > of > > > > spec mechanism is required to know if the other SM is trusted. > > > > > > Yes, that was what I was proposing (in > > > http://openib.org/pipermail/openib-general/2005-December/014186.html > > > where I wrote "The SM needs a way to know whether the other SM(s) > (and > > > which ones) are trusted or not so the SM_Key can be filled in."): > that > > > OpenSM have a list of trusted SMs and OpenSM would use that > information. > > > > > > > In that case and since OpenSM does not currently provide any such > > > > mechanism, I would prefer never to send out the SM_Key on the > request > > > > and always send zero. Sending our SM_Key to a non - trusted SM is > not > > > a > > > > good idea in my mind. > > > > > > > > OpenSM behavior should be to always trust any other SM. > > > > > > Above you said no other SM was trusted so do you mean not trust > rather > > > than trust other SMs ? > > > > > > > So any discovered SM that deserves to be the master should be > granted > > > > that right. > > > > > > Only if it were trusted and had the correct SM Key. > > > > > > -- Hal > > > > > > > Eitan Zahavi > > > > Design Technology Director > > > > Mellanox Technologies LTD > > > > Tel:+972-4-9097208 > > > > Fax:+972-4-9593245 > > > > P.O. Box 586 Yokneam 20692 ISRAEL > > > > > > > > > > > > > -----Original Message----- > > > > > From: Yael Kalka > > > > > Sent: Thursday, December 01, 2005 2:17 PM > > > > > To: 'Hal Rosenstock'; Eitan Zahavi > > > > > Cc: openib-general at openib.org > > > > > Subject: RE: [Fwd: [openib-general] OpenSM and Wrong SM_Key] > > > > > > > > > > Hi Hal, Eitan, > > > > > I think the best option is to add an OpenSM option flag - > > > > exit_on_fatal. > > > > > This flag can decide on the action on fatal cases: > > > > > 1. Exit or not when seeing SM with different SM_Key. > > > > > 2. Exit or not when there is a fatal link error (e.g - multiple > > > > guids). > > > > > etc. > > > > > > > > > > I tried to run 2 SMs just now with different SM_keys, and I see > that > > > > none of them > > > > > exit, since both receive SM_Key=0 on SMInfo GetResp. 
> > > > > The reason for that is that in the SMInfo Get request (as in all > > > other > > > > requests) > > > > > we do not send anything in the mad data. Meaning - all fields > are > > > > clear. > > > > > In the __osm_sminfo_rcv_process_get_request function we are > checking > > > > the state > > > > > according > > > > > to the payload data. This is always zero! Thus - SM will never > know > > > > that the SMInfo > > > > > request is sent from an SM that is master. > > > > > > > > > > I will work on a fix for that. > > > > > Yael > > > > > > > > > > -----Original Message----- > > > > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > > > > Sent: Wednesday, November 30, 2005 11:57 PM > > > > > To: Yael Kalka; Eitan Zahavi > > > > > Cc: openib-general at openib.org > > > > > Subject: [Fwd: [openib-general] OpenSM and Wrong SM_Key] > > > > > > > > > > > > > > > Hi Yael & Eitan, > > > > > > > > > > Based on the recent MgtWG discussions, are you still holding > your > > > > > position in terms of exiting OpenSM when a non matching SM Key > is > > > > > discovered ? Just wondering if I can issue a patch for this and > > > clear > > > > > this issue so OpenSM can be compliant for this aspect. Thanks. > > > > > > > > > > -- Hal > > > > > > > > > > -----Forwarded Message----- > > > > > > > > > > From: Hal Rosenstock > > > > > To: openib-general at openib.org > > > > > Subject: [openib-general] OpenSM and Wrong SM_Key > > > > > Date: 08 Nov 2005 16:08:47 -0500 > > > > > > > > > > Hi, > > > > > > > > > > Currently, when OpenSM receives SMInfo with a different SM_Key, > it > > > > exits > > > > > as follows: > > > > > > > > > > > > > > > void > > > > > __osm_sminfo_rcv_process_get_response( > > > > > IN const osm_sminfo_rcv_t* const p_rcv, > > > > > IN const osm_madw_t* const p_madw ) > > > > > { > > > > > ... > > > > > > > > > > > > > > > > > > > > /* > > > > > Check that the sm_key of the found SM is the same as ours, > > > > > or is zero. If not - OpenSM cannot continue with > > > configuration!. > > > > */ > > > > > if ( p_smi->sm_key != 0 && > > > > > p_smi->sm_key != p_rcv->p_subn->opt.sm_key ) > > > > > { > > > > > osm_log( p_rcv->p_log, OSM_LOG_ERROR, > > > > > "__osm_sminfo_rcv_process_get_response: ERR 2F18: " > > > > > "Got SM with sm_key that doesn't match our " > > > > > "local key. Exiting\n" ); > > > > > osm_log( p_rcv->p_log, OSM_LOG_SYS, > > > > > "Found remote SM with non-matching sm_key. > Exiting\n" > > > ); > > > > > osm_exit_flag = TRUE; > > > > > goto Exit; > > > > > } > > > > > > > > > > C14-61.2.1 states that: > > > > > A master SM which finds a higher priority master SM with the > wrong > > > > > SM_Key should not relinquish the subnet. > > > > > > > > > > Exiting OpenSM relinquishes the subnet. > > > > > > > > > > So it appears to me that perhaps this behavior of exiting OpenSM > > > > should > > > > > be at least contingent on the SM state and relative priority of > the > > > > > SMInfo received. Make sense ? If so, I will work on a patch for > > > this. 
> > > > > > > > > > -- Hal > > > > > > > > > > > > > > > _______________________________________________ > > > > > openib-general mailing list > > > > > openib-general at openib.org > > > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > > > To unsubscribe, please visit > > > > http://openib.org/mailman/listinfo/openib-general > From halr at voltaire.com Mon Dec 5 05:19:27 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Dec 2005 08:19:27 -0500 Subject: [openib-general] Re: [PATCH] Opensm - fix segfault on exit In-Reply-To: <5z8xv1xklx.fsf@mtl066.yok.mtl.com> References: <5z8xv1xklx.fsf@mtl066.yok.mtl.com> Message-ID: <1133788766.4587.12025.camel@hal.voltaire.com> On Sun, 2005-12-04 at 04:44, Yael Kalka wrote: > Hi Hal, > > If the driver isn't loaded, opensm exits with segfault. This is since > it tries destroying the signal event in the osm_vendor, but due to the > failure - this event wasn't created. > The following patch fixes this. Thanks. Applied. > Thanks, > Yael From halr at voltaire.com Mon Dec 5 06:14:01 2005 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 5 Dec 2005 16:14:01 +0200 Subject: [openib-general] Re: [PATCH] Opensm - duplicated guids issue Message-ID: <5CE025EE7D88BA4599A2C8FEFCF226F589AB7A@taurus.voltaire.com> Hi Yael, On Sun, 2005-12-04 at 08:02, Yael Kalka wrote: > Hi Hal, > > Currently if OpenSM discovers duplicated guids What is the cause of a duplicated GUID ? Is it a misconfiguration of someone's firmware (rather than some error on the part of OpenSM) ? If so, I'm not sure exiting SM is the best option. IMO the policy is to decide which GUID to "honor" (either the original one or the new one). > or 12x link with lane reversal badly configured What does badly configured mean ? Does it mean the link does not come up at all or just in some non desired mode ? How is "bad lane reversal" reconfigured ? Can't this also occur on a 4x link as well ? > it only issues an error to the log > file. This issue, though, is much more problematic, since it will cause > part of the subnet to be un-initialized. > The following patch includes a fuller handling of the issue - first, > issue an error message to the /var/log/messeges file as well. I am incorporating this part of the patch. > Second - add an option flag to the SM that will define wether or not > to exit on such case. Also, there are other scenarios which mark the subnet initialization as failed (but don't exit the SM). This seems inconsistent to me. These cases also do not put errors out on syslog. Should they ? IMO, in general, exiting out of OpenSM should be avoided at all costs. The admin can always cause this to occur if desired and operating part of the subnet is better than none. Are these cases where the admin would not want to run the SM until the issues were resolved ? -- Hal > Thanks, > Yael > > Signed-off-by: Yael Kalka > > Index: include/opensm/osm_subnet.h > =================================================================== > --- include/opensm/osm_subnet.h (revision 4288) > +++ include/opensm/osm_subnet.h (working copy) > @@ -235,6 +235,7 @@ typedef struct _osm_subn_opt > osm_testability_modes_t testability_mode; > boolean_t updn_activate; > char * updn_guid_file; > + boolean_t exit_on_fatal; > } osm_subn_opt_t; > /* > * FIELDS > @@ -372,6 +373,13 @@ typedef struct _osm_subn_opt > * updn_guid_file > * Pointer to name of the UPDN guid file given by User > * > +* exit_on_fatal > +* If TRUE (default) - SM will exit on fatal subnet initialization issues. 
> +* If FALSE - SM will not exit. > +* Fatal initialization issues: > +* a. SM recognizes 2 different nodes with the same guid, or 12x link with > +* lane reversal badly configured. > +* > * SEE ALSO > * Subnet object > *********/ > Index: opensm/osm_subnet.c > =================================================================== > --- opensm/osm_subnet.c (revision 4288) > +++ opensm/osm_subnet.c (working copy) > @@ -440,6 +440,7 @@ osm_subn_set_default_opt( > p_opt->testability_mode = OSM_TEST_MODE_NONE; > p_opt->updn_activate = FALSE; > p_opt->updn_guid_file = NULL; > + p_opt->exit_on_fatal = TRUE; > } > > /********************************************************************** > @@ -765,6 +766,10 @@ osm_subn_parse_conf_file( > __osm_subn_opts_unpack_charp( > "updn_guid_file" , > p_key, p_val, &p_opts->updn_guid_file); > + > + __osm_subn_opts_unpack_boolean( > + "exit_on_fatal", > + p_key, p_val, &p_opts->exit_on_fatal); > } > } > fclose(opts_file); > @@ -930,14 +935,17 @@ osm_subn_write_conf_file( > "# If TRUE if OpenSM should disable multicast support\n" > "no_multicast_option %s\n\n" > "# No multicast routing is performed if TRUE\n" > - "disable_multicast %s\n\n", > + "disable_multicast %s\n\n" > + "# If TRUE opensm will exit on fatal initialization issues\n" > + "exit_on_fatal %s\n\n", > p_opts->log_flags, > p_opts->force_log_flush ? "TRUE" : "FALSE", > p_opts->log_file, > p_opts->accum_log_file ? "TRUE" : "FALSE", > p_opts->dump_files_dir, > p_opts->no_multicast_option ? "TRUE" : "FALSE", > - p_opts->disable_multicast ? "TRUE" : "FALSE" > + p_opts->disable_multicast ? "TRUE" : "FALSE", > + p_opts->exit_on_fatal ? "TRUE" : "FALSE" > ); > > /* optional string attributes ... */ > Index: opensm/osm_node_info_rcv.c > =================================================================== > --- opensm/osm_node_info_rcv.c (revision 4288) > +++ opensm/osm_node_info_rcv.c (working copy) > @@ -198,6 +198,14 @@ __osm_ni_rcv_set_links( > p_ni_context->port_num, > dr_new_path > ); > + > + osm_log( p_rcv->p_log, OSM_LOG_SYS, > + "Errors on subnet. SM found duplicated guids or 12x " > + "link with lane reversal badly configured. " > + "Use osm log for more details.\n"); > + > + if ( p_rcv->p_subn->opt.exit_on_fatal == TRUE ) > + exit( 1 ); > } > > /* > Index: opensm/main.c > =================================================================== > --- opensm/main.c (revision 4288) > +++ opensm/main.c (working copy) > @@ -178,6 +178,12 @@ show_usage(void) > " This option will cause deletion of the log file\n" > " (if it previously exists). 
By default, the log file\n"
> " is accumulative.\n\n");
> + printf( "-y\n"
> + "--stay_on_fatal\n"
> + " This option will cause SM not to exit on fatal initialization\n"
> + " issues: If SM discovers duplicated guids or 12x link with\n"
> + " lane reversal badly configured.\n"
> + " By default, the SM will exit.\n\n");
> printf( "-v\n"
> "--verbose\n"
> " This option increases the log verbosity level.\n"
> @@ -460,7 +466,7 @@ main(
> boolean_t cache_options = FALSE;
> char *ignore_guids_file_name = NULL;
> uint32_t val;
> - const char * const short_option = "i:f:ed:g:l:s:t:a:uvVhorc";
> + const char * const short_option = "i:f:ed:g:l:s:t:a:uvVhorcy";
>
> /*
> In the array below, the 2nd parameter specified the number
> @@ -492,6 +498,7 @@ main(
> { "updn", 0, NULL, 'u'},
> { "add_guid_file", 1, NULL, 'a'},
> { "cache-options", 0, NULL, 'c'},
> + { "stay_on_fatal", 0, NULL, 'y'},
> { NULL, 0, NULL, 0 } /* Required at the end of the array */
> };
>
> @@ -665,6 +672,11 @@ main(
> printf(" Creating new log file\n");
> break;
>
> + case 'y':
> + opt.exit_on_fatal = FALSE;
> + printf(" Staying on fatal initialization\n");
> + break;
> +
> case 'v':
> log_flags = (log_flags <<1 )|1;
> printf(" Verbose option -v (log flags = 0x%X)\n", log_flags );

From eitan at mellanox.co.il  Mon Dec  5 07:32:06 2005
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Mon, 5 Dec 2005 17:32:06 +0200
Subject: [openib-general] RE: [PATCH] Opensm - duplicated guids issue
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3618A70@mtlexch01.mtl.com>

Hi Hal,

Please see my response below

> > Currently if OpenSM discovers duplicated guids
>
> What is the cause of a duplicated GUID ? Is it a misconfiguration of
> someone's firmware (rather than some error on the part of OpenSM) ? If
> so, I'm not sure exiting SM is the best option. IMO the policy is to
> decide which GUID to "honor" (either the original one or the new one).
[EZ] There is no way to know which GUID to honor if this is the first
sweep. Moreover, the cause of a duplicated GUID is bad firmware burning.
Currently the last GUID found is honored but the fabric behind the first
one is ignored.
>
> > or 12x link with lane reversal badly configured
>
> What does badly configured mean ? Does it mean the link does not come up
> at all or just in some non desired mode ? How is "bad lane reversal"
> reconfigured ?
[EZ] Bad FW configuration. The details are provided in the IS3 PRM. But
if one routes the board with the lanes swizzled, automatic lane reversal
detection has to be enabled in the INI file.
>
> Can't this also occur on a 4x link as well ?
[EZ] No.
>
> > it only issues an error to the log
> > file. This issue, though, is much more problematic, since it will cause
> > part of the subnet to be un-initialized.
> > The following patch includes a fuller handling of the issue - first,
> > issue an error message to the /var/log/messeges file as well.
>
> I am incorporating this part of the patch.
>
> > Second - add an option flag to the SM that will define wether or not
> > to exit on such case.
>
> Also, there are other scenarios which mark the subnet initialization as
> failed (but don't exit the SM). This seems inconsistent to me. These
> cases also do not put errors out on syslog. Should they ?
>
> IMO, in general, exiting out of OpenSM should be avoided at all costs.
> The admin can always cause this to occur if desired and operating part
> of the subnet is better than none. Are these cases where the admin would
> not want to run the SM until the issues were resolved ?
[EZ] The case of "bad connectivity" is different from "initialization
failure": "bad connectivity" is a static problem caused by bad firmware
options used or even bad hardware. "initialization failure" can be caused
by management packet dropping, which may happen due to flaky links or even
a reasonable bit error rate.
The proposal is to provide an option for the sake of exiting the SM on
such "bad hardware/firmware" conditions. If one wants to keep going, all
he has to do is set that option to 0.

Needless to say, we have proposed this "exit condition" based on our
experience, where such cases have happened and the log message was
ignored; many man-hours could have been saved had the SM insisted on not
running under such conditions.

>
> -- Hal
>
> > Thanks,
> > Yael
> >
> > Signed-off-by: Yael Kalka
> >
> > Index: include/opensm/osm_subnet.h
> > ===================================================================
> > --- include/opensm/osm_subnet.h (revision 4288)
> > +++ include/opensm/osm_subnet.h (working copy)
> > @@ -235,6 +235,7 @@ typedef struct _osm_subn_opt
> > osm_testability_modes_t testability_mode;
> > boolean_t updn_activate;
> > char * updn_guid_file;
> > + boolean_t exit_on_fatal;
> > } osm_subn_opt_t;
> > /*
> > * FIELDS
> > @@ -372,6 +373,13 @@ typedef struct _osm_subn_opt
> > * updn_guid_file
> > * Pointer to name of the UPDN guid file given by User
> > *
> > +* exit_on_fatal
> > +* If TRUE (default) - SM will exit on fatal subnet initialization issues.
> > +* If FALSE - SM will not exit.
> > +* Fatal initialization issues:
> > +* a. SM recognizes 2 different nodes with the same guid, or 12x link with
> > +* lane reversal badly configured.
> > +*
> > * SEE ALSO
> > * Subnet object
> > *********/
> > Index: opensm/osm_subnet.c
> > ===================================================================
> > --- opensm/osm_subnet.c (revision 4288)
> > +++ opensm/osm_subnet.c (working copy)
> > @@ -440,6 +440,7 @@ osm_subn_set_default_opt(
> > p_opt->testability_mode = OSM_TEST_MODE_NONE;
> > p_opt->updn_activate = FALSE;
> > p_opt->updn_guid_file = NULL;
> > + p_opt->exit_on_fatal = TRUE;
> > }
> >
> > /**********************************************************************
> > @@ -765,6 +766,10 @@ osm_subn_parse_conf_file(
> > __osm_subn_opts_unpack_charp(
> > "updn_guid_file" ,
> > p_key, p_val, &p_opts->updn_guid_file);
> > +
> > + __osm_subn_opts_unpack_boolean(
> > + "exit_on_fatal",
> > + p_key, p_val, &p_opts->exit_on_fatal);
> > }
> > }
> > fclose(opts_file);
> > @@ -930,14 +935,17 @@ osm_subn_write_conf_file(
> > "# If TRUE if OpenSM should disable multicast support\n"
> > "no_multicast_option %s\n\n"
> > "# No multicast routing is performed if TRUE\n"
> > - "disable_multicast %s\n\n",
> > + "disable_multicast %s\n\n"
> > + "# If TRUE opensm will exit on fatal initialization issues\n"
> > + "exit_on_fatal %s\n\n",
> > p_opts->log_flags,
> > p_opts->force_log_flush ? "TRUE" : "FALSE",
> > p_opts->log_file,
> > p_opts->accum_log_file ? "TRUE" : "FALSE",
> > p_opts->dump_files_dir,
> > p_opts->no_multicast_option ? "TRUE" : "FALSE",
> > - p_opts->disable_multicast ? "TRUE" : "FALSE"
> > + p_opts->disable_multicast ? "TRUE" : "FALSE",
> > + p_opts->exit_on_fatal ? "TRUE" : "FALSE"
> > );
> >
> > /* optional string attributes ...
*/ > > Index: opensm/osm_node_info_rcv.c > > =================================================================== > > --- opensm/osm_node_info_rcv.c (revision 4288) > > +++ opensm/osm_node_info_rcv.c (working copy) > > @@ -198,6 +198,14 @@ __osm_ni_rcv_set_links( > > p_ni_context->port_num, > > dr_new_path > > ); > > + > > + osm_log( p_rcv->p_log, OSM_LOG_SYS, > > + "Errors on subnet. SM found duplicated guids or 12x " > > + "link with lane reversal badly configured. " > > + "Use osm log for more details.\n"); > > + > > + if ( p_rcv->p_subn->opt.exit_on_fatal == TRUE ) > > + exit( 1 ); > > } > > > > /* > > Index: opensm/main.c > > =================================================================== > > --- opensm/main.c (revision 4288) > > +++ opensm/main.c (working copy) > > @@ -178,6 +178,12 @@ show_usage(void) > > " This option will cause deletion of the log file\n" > > " (if it previously exists). By default, the log file\n" > > " is accumulative.\n\n"); > > + printf( "-y\n" > > + "--stay_on_fatal\n" > > + " This option will cause SM not to exit on fatal initialization\n" > > + " issues: If SM discovers duplicated guids or 12x link with\n" > > + " lane reversal badly configured.\n" > > + " By default, the SM will exit.\n\n"); > > printf( "-v\n" > > "--verbose\n" > > " This option increases the log verbosity level.\n" > > @@ -460,7 +466,7 @@ main( > > boolean_t cache_options = FALSE; > > char *ignore_guids_file_name = NULL; > > uint32_t val; > > - const char * const short_option = "i:f:ed:g:l:s:t:a:uvVhorc"; > > + const char * const short_option = "i:f:ed:g:l:s:t:a:uvVhorcy"; > > > > /* > > In the array below, the 2nd parameter specified the number > > @@ -492,6 +498,7 @@ main( > > { "updn", 0, NULL, 'u'}, > > { "add_guid_file", 1, NULL, 'a'}, > > { "cache-options", 0, NULL, 'c'}, > > + { "stay_on_fatal", 0, NULL, 'y'}, > > { NULL, 0, NULL, 0 } /* Required at the end of the array */ > > }; > > > > @@ -665,6 +672,11 @@ main( > > printf(" Creating new log file\n"); > > break; > > > > + case 'y': > > + opt.exit_on_fatal = FALSE; > > + printf(" Staying on fatal initialization\n"); > > + break; > > + > > case 'v': > > log_flags = (log_flags <<1 )|1; > > printf(" Verbose option -v (log flags = 0x%X)\n", log_flags ); > > From jlentini at netapp.com Mon Dec 5 08:11:53 2005 From: jlentini at netapp.com (James Lentini) Date: Mon, 5 Dec 2005 11:11:53 -0500 (EST) Subject: [openib-general] [kDAPL]How to register a vmalloc() allocated buffer In-Reply-To: <7b2fa1820512030945j22e205d9j86a3b8e7bd709182@mail.gmail.com> References: <7b2fa1820512030945j22e205d9j86a3b8e7bd709182@mail.gmail.com> Message-ID: > I am doing a simple rdma-read test using the kDAPL. My test is running in > the kernel model. > When I allocate both the data source and sink buffers using kmalloc() and > register the buffers using dat_lmr_kcreate() with memory type > DAT_MEM_TYPE_PHYSICAL, everything goes well. If the sink buffer is allocated > with vmalloc() and registered as before, no registering error or rdma read > DTO completion error occours but > My questions: > 1) Could a buffer allocated with vmalloc() be used for a kDAPL rdma reading? > If so, > 2) should a buffer of this kind be registered in the same as a buffer > allocated with kmalloc()? > > Could anyone give some suggestion? > Thanks very much! Hi Ian, An IB HCA needs to be able to DMA the memory used for RDMA read. Since vmalloc does not guarantee that the memory it returns can be accessed via DMA, you should not use vmalloc. 
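To make the distinction concrete, here is a minimal sketch, assuming
2.6-era kernel APIs (BUF_SIZE and the function name are hypothetical):
kmalloc() memory is physically contiguous, so a single physical segment can
describe the whole buffer, which is what a DAT_MEM_TYPE_PHYSICAL
registration expects; vmalloc() memory is only virtually contiguous, must
be resolved one page at a time with vmalloc_to_page(), and each resulting
page would still need to be DMA mapped before the HCA could access it.

#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/mm.h>
#include <asm/io.h>

#define BUF_SIZE (4 * PAGE_SIZE)	/* hypothetical buffer size */

static void contiguity_sketch(void)
{
	void *kbuf = kmalloc(BUF_SIZE, GFP_KERNEL);	/* physically contiguous */
	void *vbuf = vmalloc(BUF_SIZE);	/* virtually contiguous only */
	int i;

	if (kbuf && vbuf) {
		/* kmalloc: one base physical address covers the whole buffer */
		unsigned long phys = virt_to_phys(kbuf);

		/* vmalloc: virt_to_phys() is NOT valid here; each page must
		 * be looked up individually, and the pages are in general
		 * scattered across physical memory */
		for (i = 0; i < BUF_SIZE / PAGE_SIZE; i++) {
			struct page *pg = vmalloc_to_page(vbuf + i * PAGE_SIZE);
			/* page_to_phys(pg) yields one page-sized segment */
			(void)pg;
		}
		(void)phys;
	}
	kfree(kbuf);
	vfree(vbuf);
}

So a vmalloc() buffer cannot be described by the single contiguous region
the physical-memory registration path assumes, which is consistent with the
registration appearing to succeed while the transfer does not behave as
expected.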
james From halr at voltaire.com Mon Dec 5 10:40:08 2005 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 5 Dec 2005 20:40:08 +0200 Subject: [openib-general] RE: [PATCH] Opensm - duplicated guids issue Message-ID: <5CE025EE7D88BA4599A2C8FEFCF226F589AB7E@taurus.voltaire.com> Hi Eitan, On Mon, 2005-12-05 at 10:32, Eitan Zahavi wrote: > Hi Hal, > > Please see my response below > > > Currently if OpenSM discovers duplicated guids > > > > What is the cause of a duplicated GUID ? Is it a misconfiguration of > > someone's firmware (rather than some error on the part of OpenSM) ? If > > so, I'm not sure exiting SM is the best option. IMO the policy is to > > decide which GUID to "honor" (either the original one or the new one). > [EZ] There is no way to know which GUID to honor if this is the first > sweep. More over the cause for duplicated GUID is from bad firmware > burning. IMO, leaving the configuration of a globally unique ID to firmware configuration is a poor choice as it lends itself to being error prone. It should be done at manufacturing time in something like an EEPROM. I know this increases the cost, etc. but also reduces the chances of this being an issue. > Currently the last GUID found is honored but the fabric behind > the first one is ignored. > > > > > or 12x link with lane reversal badly configured > > > > What does badly configured mean ? Does it mean the link does not come > up > > at all or just in some non desired mode ? How is "bad lane reversal" > > reconfigured ? > [EZ] Bad FW configuration. The details are provided in the IS3 PRM. But > if one route the board and swizzle the lanes it has to enable automatic > lane reversal detection in the INI file. > > > > Can't this also occur on a 4x link as well ? > [EZ] No. > > > > > it only issues an error to the log > > > file. This issue, though, is much more problematic, since it will > cause > > > part of the subnet to be un-initialized. > > > The following patch includes a fuller handling of the issue - first, > > > issue an error message to the /var/log/messeges file as well. > > > > I am incorporating this part of the patch. > > > > > Second - add an option flag to the SM that will define wether or not > > > to exit on such case. > > > > Also, there are other scenarios which mark the subnet initialization > as > > failed (but don't exit the SM). This seems inconsistent to me. These > > cases also do not put errors out on syslog. Should they ? > > > > IMO, in general, exiting out of OpenSM should be avoided at all costs. > > The admin can always cause this to occur if desired and operating part > > of the subnet is better than none. Are these cases where the admin > would > > not want to run the SM until the issues were resolved ? > [EZ] The case of "bad connectivity" is different then "initialization > failure": > "bad connectivity" is a static problem caused by bad firmware options > used or even bad hardware. "initialization failure" can be caused by > management packet dropping which may happen due to flaky links or even > reasonable bit error rate. I think there are other cases aside from the "bad connectivity" cases you cite (as was seen at SC05). > The proposal is to provide an option for the sake of exiting the SM on > such "bad hardware/firmware" conditions. If one wants to keep going all > he has to do is to set that option to 0. > > Needless to say we have proposed this "exit condition" based on our > experience where such cases have happened and the log message ignored. 
> Such that many man hours could have been saved if the SM would insist on > not running under such conditions. I think there is a chance that there will be support calls this way too since the OpenSM won't come up at all in this case. We can always change the default for this (for exiting on these errors) from TRUE to FALSE if and when this becomes an issue... Anyone else have an opinion on this ? -- Hal > > > > -- Hal > > > > > Thanks, > > > Yael > > > > > > Signed-off-by: Yael Kalka > > > > > > Index: include/opensm/osm_subnet.h > > > =================================================================== > > > --- include/opensm/osm_subnet.h (revision 4288) > > > +++ include/opensm/osm_subnet.h (working copy) > > > @@ -235,6 +235,7 @@ typedef struct _osm_subn_opt > > > osm_testability_modes_t testability_mode; > > > boolean_t updn_activate; > > > char * updn_guid_file; > > > + boolean_t exit_on_fatal; > > > } osm_subn_opt_t; > > > /* > > > * FIELDS > > > @@ -372,6 +373,13 @@ typedef struct _osm_subn_opt > > > * updn_guid_file > > > * Pointer to name of the UPDN guid file given by User > > > * > > > +* exit_on_fatal > > > +* If TRUE (default) - SM will exit on fatal subnet > initialization issues. > > > +* If FALSE - SM will not exit. > > > +* Fatal initialization issues: > > > +* a. SM recognizes 2 different nodes with the same guid, or 12x > link with > > > +* lane reversal badly configured. > > > +* > > > * SEE ALSO > > > * Subnet object > > > *********/ > > > Index: opensm/osm_subnet.c > > > =================================================================== > > > --- opensm/osm_subnet.c (revision 4288) > > > +++ opensm/osm_subnet.c (working copy) > > > @@ -440,6 +440,7 @@ osm_subn_set_default_opt( > > > p_opt->testability_mode = OSM_TEST_MODE_NONE; > > > p_opt->updn_activate = FALSE; > > > p_opt->updn_guid_file = NULL; > > > + p_opt->exit_on_fatal = TRUE; > > > } > > > > > > > /********************************************************************** > > > @@ -765,6 +766,10 @@ osm_subn_parse_conf_file( > > > __osm_subn_opts_unpack_charp( > > > "updn_guid_file" , > > > p_key, p_val, &p_opts->updn_guid_file); > > > + > > > + __osm_subn_opts_unpack_boolean( > > > + "exit_on_fatal", > > > + p_key, p_val, &p_opts->exit_on_fatal); > > > } > > > } > > > fclose(opts_file); > > > @@ -930,14 +935,17 @@ osm_subn_write_conf_file( > > > "# If TRUE if OpenSM should disable multicast support\n" > > > "no_multicast_option %s\n\n" > > > "# No multicast routing is performed if TRUE\n" > > > - "disable_multicast %s\n\n", > > > + "disable_multicast %s\n\n" > > > + "# If TRUE opensm will exit on fatal initialization issues\n" > > > + "exit_on_fatal %s\n\n", > > > p_opts->log_flags, > > > p_opts->force_log_flush ? "TRUE" : "FALSE", > > > p_opts->log_file, > > > p_opts->accum_log_file ? "TRUE" : "FALSE", > > > p_opts->dump_files_dir, > > > p_opts->no_multicast_option ? "TRUE" : "FALSE", > > > - p_opts->disable_multicast ? "TRUE" : "FALSE" > > > + p_opts->disable_multicast ? "TRUE" : "FALSE", > > > + p_opts->exit_on_fatal ? "TRUE" : "FALSE" > > > ); > > > > > > /* optional string attributes ... */ > > > Index: opensm/osm_node_info_rcv.c > > > =================================================================== > > > --- opensm/osm_node_info_rcv.c (revision 4288) > > > +++ opensm/osm_node_info_rcv.c (working copy) > > > @@ -198,6 +198,14 @@ __osm_ni_rcv_set_links( > > > p_ni_context->port_num, > > > dr_new_path > > > ); > > > + > > > + osm_log( p_rcv->p_log, OSM_LOG_SYS, > > > + "Errors on subnet. 
SM found duplicated guids > or 12x " > > > + "link with lane reversal badly configured. " > > > + "Use osm log for more details.\n"); > > > + > > > + if ( p_rcv->p_subn->opt.exit_on_fatal == TRUE ) > > > + exit( 1 ); > > > } > > > > > > /* > > > Index: opensm/main.c > > > =================================================================== > > > --- opensm/main.c (revision 4288) > > > +++ opensm/main.c (working copy) > > > @@ -178,6 +178,12 @@ show_usage(void) > > > " This option will cause deletion of the log > file\n" > > > " (if it previously exists). By default, the log > file\n" > > > " is accumulative.\n\n"); > > > + printf( "-y\n" > > > + "--stay_on_fatal\n" > > > + " This option will cause SM not to exit on fatal > initialization\n" > > > + " issues: If SM discovers duplicated guids or > 12x link with\n" > > > + " lane reversal badly configured.\n" > > > + " By default, the SM will exit.\n\n"); > > > printf( "-v\n" > > > "--verbose\n" > > > " This option increases the log verbosity > level.\n" > > > @@ -460,7 +466,7 @@ main( > > > boolean_t cache_options = FALSE; > > > char *ignore_guids_file_name = NULL; > > > uint32_t val; > > > - const char * const short_option = "i:f:ed:g:l:s:t:a:uvVhorc"; > > > + const char * const short_option = "i:f:ed:g:l:s:t:a:uvVhorcy"; > > > > > > /* > > > In the array below, the 2nd parameter specified the number > > > @@ -492,6 +498,7 @@ main( > > > { "updn", 0, NULL, 'u'}, > > > { "add_guid_file", 1, NULL, 'a'}, > > > { "cache-options", 0, NULL, 'c'}, > > > + { "stay_on_fatal", 0, NULL, 'y'}, > > > { NULL, 0, NULL, 0 } /* Required at the end of > the array */ > > > }; > > > > > > @@ -665,6 +672,11 @@ main( > > > printf(" Creating new log file\n"); > > > break; > > > > > > + case 'y': > > > + opt.exit_on_fatal = FALSE; > > > + printf(" Staying on fatal initialization\n"); > > > + break; > > > + > > > case 'v': > > > log_flags = (log_flags <<1 )|1; > > > printf(" Verbose option -v (log flags = 0x%X)\n", log_flags > ); > > > > From halr at voltaire.com Mon Dec 5 10:47:07 2005 From: halr at voltaire.com (Hal Rosenstock) Date: Mon, 5 Dec 2005 20:47:07 +0200 Subject: [openib-general] Re: [PATCH] Opensm - duplicated guids issue Message-ID: <5CE025EE7D88BA4599A2C8FEFCF226F589AB7F@taurus.voltaire.com> On Sun, 2005-12-04 at 08:02, Yael Kalka wrote: > Hi Hal, > > Currently if OpenSM discovers duplicated guids or 12x link with lane > reversal badly configured it only issues an error to the log > file. This issue, though, is much more problematic, since it will cause > part of the subnet to be un-initialized. > The following patch includes a fuller handling of the issue - first, > issue an error message to the /var/log/messeges file as well. > Second - add an option flag to the SM that will define wether or not > to exit on such case. Thanks. Applied. From mshefty at ichips.intel.com Mon Dec 5 11:02:38 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 05 Dec 2005 11:02:38 -0800 Subject: [openib-general] possible CMA bug In-Reply-To: <013701c5de5a$9124ad10$d5000a0a@STEVO> References: <013701c5de5a$9124ad10$d5000a0a@STEVO> Message-ID: <43948ECE.1010502@ichips.intel.com> Steve Wise wrote: > I'm using the new rdma cma interface and i've perhaps stumbled onto a > bug. I'm trying to bind to port 9999 on both IB ports of a mthca > device. The IPoIB interfaces for the HCA are configured as two seperate > subnets. The second rdma_listen() always fails with EBUSY. 
Maybe this > is a limitation in the CMA design, but TCP stacks allow binding to the > same port on different ip addresses. And the CMA interface allows it > too as long as the two ip addresses map to different IB devices. > Whether this should work or not, I am seeing a crash when I try to > destroy the cm_id after the rdma_listen() failure. This has been fixed in the latest check-in (revision 4313). - Sean From sean.hefty at intel.com Mon Dec 5 12:36:35 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 5 Dec 2005 12:36:35 -0800 Subject: [openib-general] SDP use of CMA Message-ID: I'd like to start a discussion of having the CMA support SDP. To do this, I'd like to propose adding the following optional call to the CMA: enum rdma_cm_protocol { RDMA_CM_DEFAULT, RDMA_CM_SDP }; /** * rdma_set_protocol - Associate a specific connection protocol with the * RDMA identifier. * * @id: RDMA identifier. * @protocol: Connection protocol. * @version: Connection protocol version, or 0 to use the default. * * This is an optional call that may be made to specify that the indicated * connection protocol be used when establishing connections to the * associated RDMA identifier. */ int rdma_set_protocol(struct rdma_cm_id *id, enum rdma_cm_protocol protocol, int version); For most users, this routine would not need to be called. If called by SDP, then the CMA would use SDP's service IDs and private data format. The CMA would be responsible for setting the local port number and source/destination IP addresses. All other fields would be set by the caller. Comments? - Sean From caitlin.bestler at gmail.com Mon Dec 5 13:30:56 2005 From: caitlin.bestler at gmail.com (Caitlin Bestler) Date: Mon, 5 Dec 2005 13:30:56 -0800 Subject: [openib-general] SDP use of CMA In-Reply-To: References: Message-ID: <469958e00512051330o7ef9af94jb5684f04e5e76185@mail.gmail.com> On 12/5/05, Sean Hefty wrote: > > I'd like to start a discussion of having the CMA support SDP. To do this, > I'd > like to propose adding the following optional call to the CMA: > > enum rdma_cm_protocol { > RDMA_CM_DEFAULT, > RDMA_CM_SDP > }; > > /** > * rdma_set_protocol - Associate a specific connection protocol with the > * RDMA identifier. > * > * @id: RDMA identifier. > * @protocol: Connection protocol. > * @version: Connection protocol version, or 0 to use the default. > * > * This is an optional call that may be made to specify that the indicated > * connection protocol be used when establishing connections to the > * associated RDMA identifier. > */ > int rdma_set_protocol(struct rdma_cm_id *id, enum rdma_cm_protocol > protocol, > int version); > > > For most users, this routine would not need to be called. If called by > SDP, > then the CMA would use SDP's service IDs and private data format. The CMA > would > be responsible for setting the local port number and source/destination IP > addresses. All other fields would be set by the caller. > > Comments? > > - Sean > > Who is the intended consumer of this API? My understanding is that there are few to zero end applications that use SDP knowingly. They use the sockets API, which is intercepted at one layer or another by a middleware library, and it is that middleware library that uses SDP. If SDP middleware libraries are the only users of SDP-style connection setup then it would make more sense to have a distinct method to serve that purpose rather than having an enum/option flag on the main method.
In particular I would not want end applications to be expected to "request SDP" merely to get an offloaded SOCK_STREAM connection. On an IB network the advantage of SDP over TCP/IP over IPoIB is a no-brainer. But the tradeoff between the host TCP/IP stack, an offload TCP/IP stack and SDP/iWARP is a much more complex tradeoff. Depending on who the envisioned user is we may need to distinguish between 'definitely use SDP, because I know my peer is using SDP' and 'offloaded by whatever mutually available methods'. Those questions are irrelevant if the call is made from the intercept library itself, as they were decided by controlling the intercept. But if the intercept library is the primary user of this option then I definitely think that a separate method is better than an option param. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mshefty at ichips.intel.com Mon Dec 5 14:00:42 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 05 Dec 2005 14:00:42 -0800 Subject: [openib-general] SDP use of CMA In-Reply-To: <469958e00512051330o7ef9af94jb5684f04e5e76185@mail.gmail.com> References: <469958e00512051330o7ef9af94jb5684f04e5e76185@mail.gmail.com> Message-ID: <4394B88A.2090605@ichips.intel.com> Caitlin Bestler wrote: > Who is the intended consumer of this API? The SDP kernel module is the intended consumer. Currently SDP duplicates most of the functionality found in the CMA. > If SDP middleware libraries are the only users of SDP-style connection setup > then it would make more sense to have a distinct method to serve that > purpose > rather than having an enum/option flag on the main method. I'm open to alternate proposals. Please provide specific details. - Sean From mshefty at ichips.intel.com Mon Dec 5 14:27:20 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 05 Dec 2005 14:27:20 -0800 Subject: [openib-general] SDP use of CMA In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F10C2929@NT-SJCA-0751.brcm.ad.broadcom.com> References: <54AD0F12E08D1541B826BE97C98F99F10C2929@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <4394BEC8.3020104@ichips.intel.com> Caitlin Bestler wrote: > Generally, I was advocating adding an extra method > that appends "_sdp" to the name rather than inserting > an "is_sdp" param. An option that supports this format would be to add a new call similar to: struct rdma_cm_id* sdp_create_id(rdma_cm_event_handler event_handler, void *context); - Sean From mst at mellanox.co.il Mon Dec 5 21:54:58 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Dec 2005 07:54:58 +0200 Subject: [openib-general] Re: SDP use of CMA In-Reply-To: References: Message-ID: <20051206055457.GA13071@mellanox.co.il> Quoting r. Sean Hefty : > Subject: SDP use of CMA > > I'd like to start a discussion of having the CMA support SDP. To do > this, I'd > like to propose adding the following optional call to the CMA: > > enum rdma_cm_protocol { > RDMA_CM_DEFAULT, > RDMA_CM_SDP > }; > > /** > * rdma_set_protocol - Associate a specific connection protocol with the > * RDMA identifier. > * > * @id: RDMA identifier. > * @protocol: Connection protocol. > * @version: Connection protocol version, or 0 to use the default. > * > * This is an optional call that may be made to specify that the > indicated > * connection protocol be used when establishing connections to the > * associated RDMA identifier.
> */ > int rdma_set_protocol(struct rdma_cm_id *id, enum rdma_cm_protocol > protocol, > int version); > > > For most users, this routine would not need to be called. If called by > SDP, > then the CMA would use SDP's service IDs and private data format. The > CMA would > be responsible for setting the local port number and source/destination > IP > addresses. All other fields would be set by the caller. > > Comments? > > - Sean Fine with me. -- MST From mst at mellanox.co.il Mon Dec 5 21:55:28 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Dec 2005 07:55:28 +0200 Subject: [openib-general] Re: SDP use of CMA In-Reply-To: <4394BEC8.3020104@ichips.intel.com> References: <4394BEC8.3020104@ichips.intel.com> Message-ID: <20051206055528.GB13071@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: SDP use of CMA > > Caitlin Bestler wrote: > > Generally, I was advocating adding an extra method > > that appends "_sdp" to the name rather than inserting > > an "is_sdp" param. > > An option that supports this format would be to add a new call similar to: > > struct rdma_cm_id* sdp_create_id(rdma_cm_event_handler event_handler, > void *context); > > - Sean That's fine with me, too. -- MST From rjwalsh at pathscale.com Mon Dec 5 21:59:13 2005 From: rjwalsh at pathscale.com (Robert Walsh) Date: Mon, 05 Dec 2005 21:59:13 -0800 Subject: [openib-general] ip_dev_find resolution? Message-ID: <1133848753.15727.11.camel@phosphene.durables.org> Hi all, There was some discussion back in Sep/Oct about ip_dev_find. Was there ever a resolution to this? Are we just waiting to get the modules that use it put into the kernel so we can justify getting it re-exported once again? Regards, Robert. -- Robert Walsh Email: rjwalsh at pathscale.com PathScale, Inc. Phone: +1 650 934 8117 2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969 Mountain View, CA 94043. From ianjiang.ict at gmail.com Tue Dec 6 00:00:36 2005 From: ianjiang.ict at gmail.com (Ian Jiang) Date: Tue, 6 Dec 2005 16:00:36 +0800 Subject: [openib-general] Want to learn more about the FMR Message-ID: <7b2fa1820512060000n6ce66e7eo2ab1278fa30a1358@mail.gmail.com> It is said in "Zero Copy Sockets Direct Protocol over InfiniBand - Preliminary Implementation and Performance Analysis" that the FMR is a Mellanox feature extending the 1.1 InfiniBand specification, and a similar feature was added later on to the 1.2 InfiniBand specification. But I found nothing about the FMR in "InfiniBand™ Architecture Specification Release 1.2" (http://www.infinibandta.org/specs/register/publicspec/). I read the description of FMR related verbs in "Mellanox IB-Verbs API (VAPI) 1.00", however I am not very clear about the difference between FMR registration and ordinary registration. Is there a more detailed description? Any suggestion is appreciated! -- Ian Jiang ianjiang.ict at gmail.com Laboratory of Spatial Information Technology Division of System Architecture Institute of Computing Technology Chinese Academy of Sciences -------------- next part -------------- An HTML attachment was scrubbed... URL: From eitan at mellanox.co.il Tue Dec 6 00:31:42 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 06 Dec 2005 10:31:42 +0200 Subject: [openib-general] [PATCH] osm: change info message to debug Message-ID: <861x0qsk29.fsf@mtl066.yok.mtl.com> Hi Hal The umad vendor provides an info message every time the osm_vendor_get_all_port_attr is invoked. This patch makes it a debug message.
Thanks Eitan Signed-off-by: Eitan Zahavi Index: osm/libvendor/osm_vendor_ibumad.c =================================================================== --- osm/libvendor/osm_vendor_ibumad.c (revision 4317) +++ osm/libvendor/osm_vendor_ibumad.c (working copy) @@ -630,7 +630,7 @@ osm_vendor_get_all_port_attr( lids[0] = def_port.base_lid; linkstates[0] = def_port.state; - osm_log( p_vend->p_log, OSM_LOG_INFO, + osm_log( p_vend->p_log, OSM_LOG_DEBUG, "osm_vendor_get_all_port_attr: " "assign CA %s port %d guid (0x%"PRIx64") as the default port\n", def_port.ca_name, def_port.portnum, From halr at voltaire.com Tue Dec 6 03:27:49 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Dec 2005 06:27:49 -0500 Subject: [openib-general] ip_dev_find resolution? In-Reply-To: <1133848753.15727.11.camel@phosphene.durables.org> References: <1133848753.15727.11.camel@phosphene.durables.org> Message-ID: <1133868031.4587.20000.camel@hal.voltaire.com> Hi Robert, On Tue, 2005-12-06 at 00:59, Robert Walsh wrote: > Hi all, > > There was some discussion back in Sep/Oct about ip_dev_find. Was there > ever a resolution to this? Are we just waiting to get the modules that > use it put into the kernel so we can justify getting it re-exported once > again? Yes. At one point, Grant had indicated that IPmc might need that but I'm not sure how that was resolved. -- Hal > > Regards, > Robert.
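For readers following the thread, a sketch of what a consumer such as the CMA wants from the export under discussion, assuming the 2.6-era signature struct net_device *ip_dev_find(u32 addr), which returns the device with a reference held; if the export stays private, the fallback is walking dev_base under dev_base_lock, as comes up later in this thread.

#include <linux/errno.h>
#include <linux/netdevice.h>
#include <net/route.h>	/* ip_dev_find() declaration; exact header varied by tree */

/* Sketch only: resolve a local IPv4 address (network byte order) to
 * its net_device; the IB-device translation is a placeholder. */
static int bind_to_local_ip(u32 ipaddr)
{
	struct net_device *dev;

	dev = ip_dev_find(ipaddr);	/* reference held on success */
	if (!dev)
		return -EADDRNOTAVAIL;

	/* ... map dev to an IB device and port here, e.g. via its GID ... */

	dev_put(dev);			/* drop the reference */
	return 0;
}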

From halr at voltaire.com Tue Dec 6 03:37:57 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Dec 2005 06:37:57 -0500 Subject: [openib-general] Re: [PATCH] osm: change info message to debug In-Reply-To: <861x0qsk29.fsf@mtl066.yok.mtl.com> References: <861x0qsk29.fsf@mtl066.yok.mtl.com> Message-ID: <1133869075.4587.20098.camel@hal.voltaire.com> On Tue, 2005-12-06 at 03:31, Eitan Zahavi wrote: > Hi Hal > > The umad vendor provides an info message every time the > osm_vendor_get_all_port_attr is invoked. This patch makes it a debug > message. Thanks. Applied. From yael at mellanox.co.il Tue Dec 6 04:02:49 2005 From: yael at mellanox.co.il (Yael Kalka) Date: 06 Dec 2005 14:02:49 +0200 Subject: [openib-general] [PATCH] Opensm - add node record dumping Message-ID: <5zacfeh1qu.fsf@mtl066.yok.mtl.com> Hi Hal, The following code exists at least in several of the osm_sa_*_record.c files, but is missing in the osm_sa_node_record.c. When running with debug level - add a dump of the node record sent in the request. Thanks, Yael Signed-off-by: Yael Kalka Index: opensm/osm_sa_node_record.c =================================================================== --- opensm/osm_sa_node_record.c (revision 4319) +++ opensm/osm_sa_node_record.c (working copy) @@ -467,6 +467,9 @@ osm_nr_rcv_process( goto Exit; } + if( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) + osm_dump_node_record( p_rcv->p_log, p_rcvd_rec, OSM_LOG_DEBUG ); + cl_qlist_init( &rec_list ); context.p_rcvd_rec = p_rcvd_rec; From halr at voltaire.com Tue Dec 6 04:15:19 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Dec 2005 07:15:19 -0500 Subject: [openib-general] Re: [PATCH] Opensm - add node record dumping In-Reply-To: <5zacfeh1qu.fsf@mtl066.yok.mtl.com> References: <5zacfeh1qu.fsf@mtl066.yok.mtl.com> Message-ID: <1133871318.4587.20286.camel@hal.voltaire.com> Hi Yael, On Tue, 2005-12-06 at 07:02, Yael Kalka wrote: > Hi Hal, > > The following code exists at least in several of the osm_sa_*_record.c > files, but is missing in the osm_sa_node_record.c. > When running with debug level - add a dump of the node record sent > in the request. Thanks. Applied. Does the same thing apply to any other SA records which are not currently dumped on debug ? -- Hal From yael at mellanox.co.il Tue Dec 6 04:44:02 2005 From: yael at mellanox.co.il (Yael Kalka) Date: Tue, 6 Dec 2005 14:44:02 +0200 Subject: [openib-general] RE: [PATCH] Opensm - add node record dumping Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30E246C@mtlexch01.mtl.com> Didn't look at all of them, but I know at least some of them include the record dumping. -----Original Message----- From: Hal Rosenstock [mailto:halr at voltaire.com] Sent: Tuesday, December 06, 2005 2:15 PM To: Yael Kalka Cc: openib-general at openib.org; Eitan Zahavi Subject: Re: [PATCH] Opensm - add node record dumping Hi Yael, On Tue, 2005-12-06 at 07:02, Yael Kalka wrote: > Hi Hal, > > The following code exists at least in several of the osm_sa_*_record.c > files, but is missing in the osm_sa_node_record.c. > When running with debug level - add a dump of the node record sent > in the request. Thanks. Applied. Does the same thing apply to any other SA records which are not currently dumped on debug ?
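To illustrate the question, the same two-line guard from the patch above could be dropped into each handler that doesn't dump yet. Sketch only: the dump helper below is a hypothetical name, since each record type has its own dump routine (check osm_helper.h for the real ones):

  /* e.g. in osm_sa_sminfo_record.c, mirroring the node record change;
   * osm_dump_sm_info_record is hypothetical */
  if( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) )
    osm_dump_sm_info_record( p_rcv->p_log, p_rcvd_rec, OSM_LOG_DEBUG );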
-- Hal From ianjiang.ict at gmail.com Tue Dec 6 04:52:13 2005 From: ianjiang.ict at gmail.com (Ian Jiang) Date: Tue, 6 Dec 2005 20:52:13 +0800 Subject: [openib-general] [kDAPL]How to register a vmalloc() allocated buffer In-Reply-To: References: <7b2fa1820512030945j22e205d9j86a3b8e7bd709182@mail.gmail.com> Message-ID: <7b2fa1820512060452p28a7a552w3b68b57513b3c80d@mail.gmail.com> Hi James, You are always so kind! Now I have a question about reading a buffer of an application in user space. Is using the uDAPL the only way to do this? I used to have an idea like this: The application in user space gives the virtual start address and length of its data buffer to a kernel module program. This kernel program acts as an application of the kDAPL and registers the user space data buffer with the kDAPL, then requests an RDMA read operation to complete the data transfer. But I think it is not feasible after getting your last reply. Am I right? Please give some suggestion and thanks very much! On 12/6/05, James Lentini wrote: > Hi Ian, > An IB HCA needs to be able to DMA the memory used for RDMA read. Since > vmalloc does not guarantee that the memory it returns can be accessed > via DMA, you should not use vmalloc. > james -- Ian Jiang ianjiang.ict at gmail.com Laboratory of Spatial Information Technology Division of System Architecture Institute of Computing Technology Chinese Academy of Sciences -------------- next part -------------- An HTML attachment was scrubbed... URL: From caitlinb at broadcom.com Tue Dec 6 05:24:39 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Tue, 6 Dec 2005 05:24:39 -0800 Subject: [openib-general] [kDAPL]How to register a vmalloc() allocated buffer Message-ID: <54AD0F12E08D1541B826BE97C98F99F10C29C3@NT-SJCA-0751.brcm.ad.broadcom.com> An option like that was discussed in RNIC-PI, but is not generally explicitly supported. What the kernel daemon must do is map the user-space to bus/io addresses and then physically register that as a Memory Region (or LMR if working at the DAT layer). If the user-mode application is not explicitly involved in identifying what buffers are going to be used (i.e., registering the memory) then you won't achieve the full efficiency of RDMA. Creating memory regions per operation at the verb layer is expensive and interferes with pipelining. Full fast-memory-register work requests are defined in iWARP and the latest IBTA spec, but have not made it into wide deployment yet. Unless the user-kernel daemon interface is something you are inheriting I would not recommend this approach. ________________________________ From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Ian Jiang Sent: Tuesday, December 06, 2005 4:52 AM To: James Lentini Cc: openib-general Subject: Re: [openib-general] [kDAPL]How to register a vmalloc() allocated buffer Hi James, You are always so kind! Now I have a question about reading a buffer of an application in user space. Is using the uDAPL the only way to do this? I used to have an idea like this: The application in user space gives the virtual start address and length of its data buffer to a kernel module program. This kernel program acts as an application of the kDAPL and registers the user space data buffer with the kDAPL, then requests an RDMA read operation to complete the data transfer. But I think it is not feasible after getting your last reply. Am I right? Please give some suggestion and thanks very much!
On 12/6/05, James Lentini wrote: Hi Ian, An IB HCA needs to be able to DMA the memory used for RDMA read. Since vmalloc does not guarantee that the memory it returns can be accessed via DMA, you should not use vmalloc. james -- Ian Jiang ianjiang.ict at gmail.com Laboratory of Spatial Information Technology Division of System Architecture Institute of Computing Technology Chinese Academy of Sciences -------------- next part -------------- An HTML attachment was scrubbed... URL: From halr at voltaire.com Tue Dec 6 06:05:36 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Dec 2005 09:05:36 -0500 Subject: [openib-general] [PATCH] OpenSM: SA SMInfoRecord should support GetTable as well as Get method Message-ID: <1133877936.4587.20821.camel@hal.voltaire.com> OpenSM: SA SMInfoRecord should support GetTable as well as Get method Signed-off-by: Hal Rosenstock Index: osm_sa_sminfo_record.c =================================================================== --- osm_sa_sminfo_record.c (revision 4323) +++ osm_sa_sminfo_record.c (working copy) @@ -165,7 +165,8 @@ osm_smir_rcv_process( CL_ASSERT( p_sa_mad->attr_id == IB_MAD_ATTR_SMINFO_RECORD ); - if (p_sa_mad->method != IB_MAD_METHOD_GET) + if ( (p_sa_mad->method != IB_MAD_METHOD_GET) && + (p_sa_mad->method != IB_MAD_METHOD_GETTABLE) ) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, "osm_smir_rcv_process: ERR 2804: " From halr at voltaire.com Tue Dec 6 06:11:40 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Dec 2005 09:11:40 -0500 Subject: [openib-general] RE: [PATCH] Opensm - add node record dumping In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30E246C@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30E246C@mtlexch01.mtl.com> Message-ID: <1133877977.4587.20828.camel@hal.voltaire.com> On Tue, 2005-12-06 at 07:44, Yael Kalka wrote: > Didn't look at all of them, but I know at least some of them include the > record dumping. It looks to me like the following supported SA records don't currently do this: sminfo vlarb slvl pkey lft -- Hal > > > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Tuesday, December 06, 2005 2:15 PM > To: Yael Kalka > Cc: openib-general at openib.org; Eitan Zahavi > Subject: Re: [PATCH] Opensm - add node record dumping > > > Hi Yael, > > On Tue, 2005-12-06 at 07:02, Yael Kalka wrote: > > Hi Hal, > > > > The following code exists at least in several of the osm_sa_*_record.c > > > files, but is missing in the osm_sa_node_record.c. > > When running with debug level - add a dump of the node record sent > > in the request. > > Thanks. Applied. > > Does the same thing apply to any other SA records which are not > currently dumped on debug ? > > -- Hal From tom at opengridcomputing.com Tue Dec 6 07:34:43 2005 From: tom at opengridcomputing.com (Tom Tucker) Date: Tue, 06 Dec 2005 09:34:43 -0600 Subject: [openib-general] SDP use of CMA In-Reply-To: <4394BEC8.3020104@ichips.intel.com> References: <54AD0F12E08D1541B826BE97C98F99F10C2929@NT-SJCA-0751.brcm.ad.broadcom.com> <4394BEC8.3020104@ichips.intel.com> Message-ID: <1133883283.11138.6.camel@trinity.austin.ammasso.com> Not to jump in late on this, but why couldn't we just add a protocol parameter to the create_id call. 
Then it is arbitrarily extensible a la socket(AF_INET, SOCK_STREAM, ) So what I'm specifically suggesting is: struct rdma_cm_id* rdma_create_id(rdma_cm_event_handler* cm_handler, void* context, rdma_cm_proto proto); Then we don't need a new call, it's extensible to new protocols, and it's not a single purpose is_sdp parameter. On Mon, 2005-12-05 at 14:27 -0800, Sean Hefty wrote: > Caitlin Bestler wrote: > > Generally, I was advocating adding an extra method > > that appends "_sdp" to the name rather than inserting > > an "is_sdp" param. > > An option that supports this format would be to add a new call similar to: > > struct rdma_cm_id* sdp_create_id(rdma_cm_event_handler event_handler, > void *context); > > - Sean > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From mst at mellanox.co.il Tue Dec 6 07:38:55 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Dec 2005 17:38:55 +0200 Subject: [openib-general] Re: SDP use of CMA In-Reply-To: <1133883283.11138.6.camel@trinity.austin.ammasso.com> References: <1133883283.11138.6.camel@trinity.austin.ammasso.com> Message-ID: <20051206153855.GG21035@mellanox.co.il> Quoting r. Tom Tucker : > Subject: Re: SDP use of CMA > > Not to jump in late on this, but why couldn't we just add a protocol > parameter to the create_id call. Then it is arbitrarily extensible a la > socket(AF_INET, SOCK_STREAM, ) > > > So what I'm specifically suggesting is: > > struct rdma_cm_id* rdma_create_id(rdma_cm_event_handler* cm_handler, > void* context, > rdma_cm_proto proto); Makes sense. We'd need to define rdma_cm_proto values I guess: SDP, default, .... -- MST From dotanb at mellanox.co.il Tue Dec 6 07:48:13 2005 From: dotanb at mellanox.co.il (Dotan Barak) Date: Tue, 6 Dec 2005 17:48:13 +0200 Subject: [openib-general] RE: can i post a send request with 0 bytes with the inline bit enabled? Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3B8D436@mtlexch01.mtl.com> > > I guess we might as well fix it. I checked in the following patch. > > - R. > Sorry about the delay. I checked this patch: posting SR with 0 s/g list length with the inline flag enabled works only for tavor. For memfree devices I still get a completion with error. thanks Dotan From steve.apo at googlemail.com Tue Dec 6 08:05:26 2005 From: steve.apo at googlemail.com (Steven Wooding) Date: Tue, 6 Dec 2005 16:05:26 +0000 Subject: [openib-general] UC connection server Message-ID: <2cfcf21e0512060805u466c9d83m@mail.gmail.com> Hi, I wonder if anybody could give some advice about an idea for making UC connections with a device that doesn't support a CM (it's a custom-made embedded device). The idea is to use a PC-based stack that does use the standard CM interface. I can then make a connection with that. The PC then gets the info about the real QP from the embedded device via some proprietary method. The problem with this idea is that in the standard CM protocol, it forms the connection using the LID that the REQ was sent to. But I need to change this to the LID of the embedded device. I've looked at doing path migration which looked like it might do this, but I could do with some advice. For example, in path migration, does the original connection remain? Any other suggestions are welcome (I know I could do an Ethernet connection with the PC and exchange the info that way, but that's a last resort at the moment).
Thanks for your time. Cheers, Steve. From steve.apo at googlemail.com Tue Dec 6 08:16:17 2005 From: steve.apo at googlemail.com (Steven Wooding) Date: Tue, 6 Dec 2005 16:16:17 +0000 Subject: [openib-general] Relaying data through an HCA card Message-ID: <2cfcf21e0512060816r6ea2083fr@mail.gmail.com> Hi, I have the requirement for a PC that acts as a data relay. I basically need to pass data from an input QP connection to an output QP connection on the PC. Could this be done entirely within the HCA card, without touching system memory or using a userspace application to supervise the data? This is so the data throughput remains as high as possible. Thanks for your time. Cheers, Steve. From dotanb at mellanox.co.il Tue Dec 6 08:23:40 2005 From: dotanb at mellanox.co.il (Dotan Barak) Date: Tue, 6 Dec 2005 18:23:40 +0200 Subject: [openib-general] RE: can i post a send request with 0 bytes with the inline bit enabled? Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3B8D453@mtlexch01.mtl.com> > > > > > I guess we might as well fix it. I checked in the following patch. > > > > - R. > > > > Sorry about the delay. > > I checked this patch: > posting SR with 0 s/g list length with the inline flag > enabled works only for tavor. > For memfree devices I still get a completion with error. > > > thanks > Dotan > I'm sorry about my last email (I had a mess in my sources): I checked the patch and it works for tavor and memfree devices as well. thanks Dotan From caitlinb at broadcom.com Tue Dec 6 08:10:12 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Tue, 6 Dec 2005 08:10:12 -0800 Subject: [openib-general] SDP use of CMA Message-ID: <54AD0F12E08D1541B826BE97C98F99F10C29E5@NT-SJCA-0751.brcm.ad.broadcom.com> Tom Tucker wrote: > Not to jump in late on this, but why couldn't we just add a > protocol parameter to the create_id call. Then it is > arbitrarily extensible a la socket(AF_INET, SOCK_STREAM, ) > > > So what I'm specifically suggesting is: > > struct rdma_cm_id* rdma_create_id(rdma_cm_event_handler* cm_handler, > void* context, rdma_cm_proto proto); > > > Then we don't need a new call, it's extensible to new > protocols, and it's not a single purpose is_sdp parameter. > That's an excellent suggestion, if we think that this is an area that will be extensible. So far in iWARP we have RDMAC MPA, SDP, IETF MPA using unstructured private data and IETF MPA using the IT-API structured private data. And in IB we have unstructured private data and TCP compatible connection setup private data. If we're confident that there is a dominant one for each transport, and the others are merely transitional relics, then the extra methods make the most sense. If we don't think things are really settled then the 'proto' argument makes a lot of sense. The crux question remains though, will there ever be a caller that does not specify the 'proto' as a constant? If there's a scenario for that, then having a parameter in the call rather than a case statement in the caller makes a lot of sense. But if every actual use will select a constant value then what is gained by having a single method?
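To make the crux concrete, here is how the two shapes proposed in this thread would look at a call site. Sketch only: neither signature is merged API (both rdma_create_id-with-proto and sdp_create_id are proposals from this thread), and sdp_event_handler is a hypothetical callback. In both shapes the protocol is a compile-time constant, which is exactly the point above.

static struct rdma_cm_id *listen_id;

static void create_sdp_listen_id(void)
{
	/* Shape (a): protocol passed as a parameter -- in practice the
	 * caller always supplies a constant. */
	listen_id = rdma_create_id(sdp_event_handler, NULL, RDMA_CM_SDP);

	/* Shape (b): a distinct per-protocol entry point -- the constant
	 * moves into the function name instead. (One shape or the other
	 * would exist, not both.) */
	listen_id = sdp_create_id(sdp_event_handler, NULL);
}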
From mshefty at ichips.intel.com Tue Dec 6 09:31:46 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 06 Dec 2005 09:31:46 -0800 Subject: [openib-general] SDP use of CMA In-Reply-To: <1133883283.11138.6.camel@trinity.austin.ammasso.com> References: <54AD0F12E08D1541B826BE97C98F99F10C2929@NT-SJCA-0751.brcm.ad.broadcom.com> <4394BEC8.3020104@ichips.intel.com> <1133883283.11138.6.camel@trinity.austin.ammasso.com> Message-ID: <4395CB02.9010602@ichips.intel.com> Tom Tucker wrote: > Not to jump in late on this, but why couldn't we just add a protocol > parameter to the create_id call. Then it is arbitrarily extensible a la > socket(AF_INET, SOCK_STREAM, ) > > So what I'm specifically suggesting is: > > struct rdma_cm_id* rdma_create_id(rdma_cm_event_handler* cm_handler, > void* context, > rdma_cm_proto proto); This isn't much different than having a separate call to set the protocol. I went with a separate protocol API to add in version information as well. Changing just the create_id call makes sense. - Sean From iod00d at hp.com Tue Dec 6 09:43:55 2005 From: iod00d at hp.com (Grant Grundler) Date: Tue, 6 Dec 2005 09:43:55 -0800 Subject: [openib-general] ip_dev_find resolution? In-Reply-To: <1133868031.4587.20000.camel@hal.voltaire.com> References: <1133848753.15727.11.camel@phosphene.durables.org> <1133868031.4587.20000.camel@hal.voltaire.com> Message-ID: <20051206174355.GB21980@esmail.cup.hp.com> On Tue, Dec 06, 2005 at 06:27:49AM -0500, Hal Rosenstock wrote: > Hi Robert, > > On Tue, 2005-12-06 at 00:59, Robert Walsh wrote: > > Hi all, > > > > There was some discussion back in Sep/Oct about ip_dev_find. Was there > > ever a resolution to this? Not that I'm aware of. > > Are we just waiting to get the modules that > > use it put into the kernel so we can justify getting it re-exported once > > again? That would be a good approach. > Yes. At one point, Grant had indicated that IPmc might need that but I'm > not sure how that was resolved. IPmc? Oh! IP_MROUTE. But IP_MROUTE doesn't need ip_dev_find exported since IP_MROUTE code can't be built as a module. My original email is here: http://openib.org/pipermail/openib-general/2005-November/013563.html Original email thread starts here: http://openib.org/pipermail/openib-general/2005-November/013471.html thanks, grant From mshefty at ichips.intel.com Tue Dec 6 09:43:50 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 06 Dec 2005 09:43:50 -0800 Subject: [openib-general] UC connection server In-Reply-To: <2cfcf21e0512060805u466c9d83m@mail.gmail.com> References: <2cfcf21e0512060805u466c9d83m@mail.gmail.com> Message-ID: <4395CDD6.1080601@ichips.intel.com> Steven Wooding wrote: > The idea is to use a PC-based stack that does use the standard CM > interface. I can then make a connection with that. The PC then gets > the info about the real QP from the embedded device via some > proprietary method. You can use the existing connection to exchange the QP information, similar to how you would exchange the information over Ethernet. Once the other QPs are set up, you can either tear down the existing connection or use it to connect other QPs. > The problem with this idea is that in the standard CM protocol, it > forms the connection using the LID that the REQ was sent to. But I > need to change this to the LID of the embedded device. I've looked at > doing path migration which looked like it might do this, but I could > do with some advice.
Just a thought, and likely not a very good one, but you could reject the REQ, then issue a new REQ from the PC that contained the new LID. The CM might need some changes to support this. I think that CM port redirection (reject code 25) may also give you what you want. Supporting this in the CM is a little tricky, however. - Sean From mst at mellanox.co.il Tue Dec 6 09:56:52 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Dec 2005 19:56:52 +0200 Subject: [openib-general] Re: ip_dev_find resolution? In-Reply-To: <20051206174355.GB21980@esmail.cup.hp.com> References: <20051206174355.GB21980@esmail.cup.hp.com> Message-ID: <20051206175652.GC22860@mellanox.co.il> Quoting r. Grant Grundler : > Subject: Re: ip_dev_find resolution? > > On Tue, Dec 06, 2005 at 06:27:49AM -0500, Hal Rosenstock wrote: > > Hi Robert, > > > > On Tue, 2005-12-06 at 00:59, Robert Walsh wrote: > > > Hi all, > > > > > > There was some discussion back in Sep/Oct about ip_dev_find. Was > there > > > ever a resolution to this? > > Not that I'm aware of. > > > > Are we just waiting to get the modules that > > > use it put into the kernel so we can justify getting it re-exported > once > > > again? > > That would be a good approach. > > > Yes. At one point, Grant had indicated that IPmc might need that but > I'm > > not sure how that was resolved. > > IPmc? Oh! IP_MROUTE. But IP_MROUTE doesn't need ip_dev_find exported > since IP_MROUTE code can't be built as a module. > > My original email is here: > http://openib.org/pipermail/openib-general/2005-November/013563.html > > Original email thread starts here: > http://openib.org/pipermail/openib-general/2005-November/013471.html Actually, I wonder whether instead of ip_dev_find we can just read_lock(&dev_base_lock); for (dev = dev_base; dev; dev = dev->next) { and check the ip address? If this works, this has the advantage of supporting IPv6 as well. MST -- MST From ftillier at silverstorm.com Tue Dec 6 10:14:47 2005 From: ftillier at silverstorm.com (Fabian Tillier) Date: Tue, 6 Dec 2005 10:14:47 -0800 Subject: [openib-general] UC connection server In-Reply-To: <2cfcf21e0512060805u466c9d83m@mail.gmail.com> References: <2cfcf21e0512060805u466c9d83m@mail.gmail.com> Message-ID: <79ae2f320512061014u31811f8bwafc5ed37acd4f97c@mail.gmail.com> Hi Steve, On 12/6/05, Steven Wooding wrote: > Hi, > > The idea is to use a PC-based stack that does use the standard CM > interface. I can then make a connection with that. The PC then gets > the info about the real QP from the embedded device via some > proprietary method. > > The problem with this idea is that in the standard CM protocol, it > forms the connection using the LID that the REQ was sent to. Actually, it's the other way around. The CM uses the LID from the path information in the REQ as the destination of the MAD. There is no way currently to send a CM REQ to another LID than the one indicated in the REQ's path information.
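A sketch of why that is, using the gen2 CM request structure (field names as in ib_cm.h of this period; treat as illustrative): the REQ's path record supplies both the connection's destination and the address the CM MAD itself is sent to, so there is no separate knob for the MAD's target.

#include <rdma/ib_cm.h>

/* cm_id and path are assumed to be set up by the caller. */
static int send_req(struct ib_cm_id *cm_id, struct ib_sa_path_rec *path)
{
	struct ib_cm_req_param param;

	memset(&param, 0, sizeof param);
	param.primary_path = path;	/* path->dlid doubles as the REQ
					 * MAD's destination LID */
	/* ... fill in service_id, qp_num, etc., then: */
	return ib_send_cm_req(cm_id, &param);
}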
- Fab From ftillier at silverstorm.com Tue Dec 6 10:18:18 2005 From: ftillier at silverstorm.com (Fabian Tillier) Date: Tue, 6 Dec 2005 10:18:18 -0800 Subject: [openib-general] UC connection server In-Reply-To: <4395CDD6.1080601@ichips.intel.com> References: <2cfcf21e0512060805u466c9d83m@mail.gmail.com> <4395CDD6.1080601@ichips.intel.com> Message-ID: <79ae2f320512061018m61a445d9xeee4e37bfb326974@mail.gmail.com> On 12/6/05, Sean Hefty wrote: > Steven Wooding wrote: > > The idea is to use a PC-based stack that does use the standard CM > > interface. I can then make a connection with that. The PC then gets > > the info about the real QP from the embedded device via some > > proprietary method. > > You can use the existing connection to exchange the QP information, similar to > how you would exchange the information over Ethernet. Once the other QPs are > setup, you can either teardown the existing connection or use it to connect > other QPs. I think what Steve wants to do is issue a REQ, send it to the PC, but have the path record go to his embedded device (which has a different LID). The CM protocol supports this, but the implementation of the CM looks at the path record to determine the destination of the CM MADs. Supporting this would require some way for the user to set the target of the CM MADs independently of the path information contained in the REQ. Adding an optional extra path record for the CM path might do the trick. - Fab From iod00d at hp.com Tue Dec 6 10:23:38 2005 From: iod00d at hp.com (Grant Grundler) Date: Tue, 6 Dec 2005 10:23:38 -0800 Subject: [openib-general] [kDAPL]How to register a vmalloc() allocated buffer In-Reply-To: <7b2fa1820512060452p28a7a552w3b68b57513b3c80d@mail.gmail.com> References: <7b2fa1820512030945j22e205d9j86a3b8e7bd709182@mail.gmail.com> <7b2fa1820512060452p28a7a552w3b68b57513b3c80d@mail.gmail.com> Message-ID: <20051206182338.GC21980@esmail.cup.hp.com> On Tue, Dec 06, 2005 at 08:52:13PM +0800, Ian Jiang wrote: > Hi James, > You are always so kind! > Now I have a question about reading a buffer of a application in user space. > Is it the only way to use the uDAPL? > I used to have an idea like this: > The application in user space gives the virtual start address and length of > its data buffer to a kernel module program. This kernel program acts as a > application of the kDAPL and registers the user space data buffer with the > kDAPl, Ian, If you are doing this with OpenIB, my advice is to NOT start with kDAPL. AFAICT, kDAPL is going away once any dependencies on it are resolved. And it's clearly not going to be pushed to kernel.org source trees. ISTR Dan Bar Dov wrote iSER was no longer dependent on kDAPL but not sure if that was the only module. > then request a RDMA read operation to complete the data transferring. > But I think it is not feasible after getting your last reply. Am I right? > Please give some suggestion and thanks very much! In general, a kernel module can map a user space address to a "DMA Address". OpenIB code has interfaces to register the "DMA Address" with the IB card. grant From mst at mellanox.co.il Tue Dec 6 10:57:52 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Dec 2005 20:57:52 +0200 Subject: [openib-general] Re: SDP use of CMA In-Reply-To: <4395CB02.9010602@ichips.intel.com> References: <4395CB02.9010602@ichips.intel.com> Message-ID: <20051206185752.GD23088@mellanox.co.il> Quoting r. 
Sean Hefty : > Subject: Re: SDP use of CMA > > Tom Tucker wrote: > > Not to jump in late on this, but why couldn't we just add a protocol > > parameter to the create_id call. Then it is arbitrarily extensible ala > > socket(AF_INET, SOCK_STREAM, ) > > > > So what I'm specifically suggesting is: > > > > struct rdma_cm_id* rdma_create_id(rdma_cm_event_handler* cm_handler, > > void* context, > > rdma_cm_proto proto); > > This isn't much different that having a separate call to set the protocol. I > went with a separate protocol API to add in version information as well. > Changing just the create_id call makes sense. > > - Sean I also have this notion that it might be a good idea to put the protocol in the reserved bits in the service id. Makes sense? -- MST From mst at mellanox.co.il Tue Dec 6 10:48:14 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Dec 2005 20:48:14 +0200 Subject: [openib-general] [PATCH] core: segmented rmpp sends Message-ID: <20051206184814.GC23088@mellanox.co.il> With the following in place we are able to perform very large RMPP transfers. Please comment. --- Modify the rmpp mad support to accept a linked list of segments instead of a large physically contigious buffer. The list is kept in mad_send_wr private data and constructed with new ib_append_to_multipacket_mad API call. Modify user_mad.c to allocate large MADs for send/receive by chunks. Signed-off-by: Jack Morgenstein Signed-off-by: Michael S. Tsirkin Index: latest/drivers/infiniband/core/mad_rmpp.c =================================================================== --- latest.orig/drivers/infiniband/core/mad_rmpp.c +++ latest/drivers/infiniband/core/mad_rmpp.c @@ -433,44 +433,6 @@ static struct ib_mad_recv_wc * complete_ return rmpp_wc; } -void ib_coalesce_recv_mad(struct ib_mad_recv_wc *mad_recv_wc, void *buf) -{ - struct ib_mad_recv_buf *seg_buf; - struct ib_rmpp_mad *rmpp_mad; - void *data; - int size, len, offset; - u8 flags; - - len = mad_recv_wc->mad_len; - if (len <= sizeof(struct ib_mad)) { - memcpy(buf, mad_recv_wc->recv_buf.mad, len); - return; - } - - offset = data_offset(mad_recv_wc->recv_buf.mad->mad_hdr.mgmt_class); - - list_for_each_entry(seg_buf, &mad_recv_wc->rmpp_list, list) { - rmpp_mad = (struct ib_rmpp_mad *)seg_buf->mad; - flags = ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr); - - if (flags & IB_MGMT_RMPP_FLAG_FIRST) { - data = rmpp_mad; - size = sizeof(*rmpp_mad); - } else { - data = (void *) rmpp_mad + offset; - if (flags & IB_MGMT_RMPP_FLAG_LAST) - size = len; - else - size = sizeof(*rmpp_mad) - offset; - } - - memcpy(buf, data, size); - len -= size; - buf += size; - } -} -EXPORT_SYMBOL(ib_coalesce_recv_mad); - static struct ib_mad_recv_wc * continue_rmpp(struct ib_mad_agent_private *agent, struct ib_mad_recv_wc *mad_recv_wc) @@ -570,16 +532,26 @@ start_rmpp(struct ib_mad_agent_private * return mad_recv_wc; } -static inline u64 get_seg_addr(struct ib_mad_send_wr_private *mad_send_wr) +static inline void * get_seg_addr(struct ib_mad_send_wr_private *mad_send_wr) { - return mad_send_wr->sg_list[0].addr + mad_send_wr->data_offset + - (sizeof(struct ib_rmpp_mad) - mad_send_wr->data_offset) * - (mad_send_wr->seg_num - 1); + struct ib_mad_multipacket_seg *seg; + int i = 2; + + if (list_empty(&mad_send_wr->multipacket_list)) + return NULL; + + list_for_each_entry(seg, &mad_send_wr->multipacket_list, list) { + if (i == mad_send_wr->seg_num) + return seg->data; + i++; + } + return NULL; } -static int send_next_seg(struct ib_mad_send_wr_private *mad_send_wr) +int send_next_seg(struct 
ib_mad_send_wr_private *mad_send_wr) { struct ib_rmpp_mad *rmpp_mad; + void *next_data; int timeout; u32 paylen; @@ -594,12 +566,14 @@ static int send_next_seg(struct ib_mad_s rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(paylen); mad_send_wr->sg_list[0].length = sizeof(struct ib_rmpp_mad); } else { - mad_send_wr->send_wr.num_sge = 2; - mad_send_wr->sg_list[0].length = mad_send_wr->data_offset; - mad_send_wr->sg_list[1].addr = get_seg_addr(mad_send_wr); - mad_send_wr->sg_list[1].length = sizeof(struct ib_rmpp_mad) - - mad_send_wr->data_offset; - mad_send_wr->sg_list[1].lkey = mad_send_wr->sg_list[0].lkey; + next_data = get_seg_addr(mad_send_wr); + if (!next_data) { + printk(KERN_ERR PFX "send_next_seg: " + "could not find next segment\n"); + return -EINVAL; + } + memcpy((void *)rmpp_mad + mad_send_wr->data_offset, next_data, + sizeof(struct ib_rmpp_mad) - mad_send_wr->data_offset); rmpp_mad->rmpp_hdr.paylen_newwin = 0; } Index: latest/drivers/infiniband/include/rdma/ib_mad.h =================================================================== --- latest.orig/drivers/infiniband/include/rdma/ib_mad.h +++ latest/drivers/infiniband/include/rdma/ib_mad.h @@ -141,6 +141,11 @@ struct ib_rmpp_hdr { __be32 paylen_newwin; }; +struct ib_mad_multipacket_seg { + struct list_head list; + u8 data[0]; +}; + typedef u64 __bitwise ib_sa_comp_mask; #define IB_SA_COMP_MASK(n) ((__force ib_sa_comp_mask) cpu_to_be64(1ull << n)) @@ -485,17 +490,6 @@ int ib_unregister_mad_agent(struct ib_ma int ib_post_send_mad(struct ib_mad_send_buf *send_buf, struct ib_mad_send_buf **bad_send_buf); -/** - * ib_coalesce_recv_mad - Coalesces received MAD data into a single buffer. - * @mad_recv_wc: Work completion information for a received MAD. - * @buf: User-provided data buffer to receive the coalesced buffers. The - * referenced buffer should be at least the size of the mad_len specified - * by @mad_recv_wc. - * - * This call copies a chain of received MAD segments into a single data buffer, - * removing duplicated headers. - */ -void ib_coalesce_recv_mad(struct ib_mad_recv_wc *mad_recv_wc, void *buf); /** * ib_free_recv_mad - Returns data buffers used to receive a MAD. @@ -601,6 +595,18 @@ struct ib_mad_send_buf * ib_create_send_ gfp_t gfp_mask); /** + * ib_append_to_multipacket_mad - Append a segment of an RMPP multipacket mad send + * to the send buffer. + * @send_buf: Previously allocated send data buffer. + * @seg: segment to append to linked list (already filled with data). + * + * This routine appends a segment of a multipacket RMPP message + * (copied from user space) to a MAD for sending. + */ +void ib_append_to_multipacket_mad(struct ib_mad_send_buf * send_buf, + struct ib_mad_multipacket_seg *seg); + +/** * ib_free_send_mad - Returns data buffers used to send a MAD. * @send_buf: Previously allocated send data buffer. 
*/ Index: latest/drivers/infiniband/core/mad.c =================================================================== --- latest.orig/drivers/infiniband/core/mad.c +++ latest/drivers/infiniband/core/mad.c @@ -792,17 +792,13 @@ struct ib_mad_send_buf * ib_create_send_ return ERR_PTR(-EINVAL); length = sizeof *mad_send_wr + buf_size; - if (length >= PAGE_SIZE) - buf = (void *)__get_free_pages(gfp_mask, long_log2(roundup_pow_of_two(length)) - PAGE_SHIFT); - else - buf = kmalloc(length, gfp_mask); + buf = kzalloc(sizeof *mad_send_wr + sizeof(struct ib_mad), gfp_mask); if (!buf) return ERR_PTR(-ENOMEM); - memset(buf, 0, length); - - mad_send_wr = buf + buf_size; + mad_send_wr = buf + sizeof(struct ib_mad); + INIT_LIST_HEAD(&mad_send_wr->multipacket_list); mad_send_wr->send_buf.mad = buf; mad_send_wr->mad_agent_priv = mad_agent_priv; @@ -834,23 +830,33 @@ struct ib_mad_send_buf * ib_create_send_ } EXPORT_SYMBOL(ib_create_send_mad); +void ib_append_to_multipacket_mad(struct ib_mad_send_buf * send_buf, + struct ib_mad_multipacket_seg *seg) +{ + struct ib_mad_send_wr_private *mad_send_wr; + + mad_send_wr = container_of(send_buf, struct ib_mad_send_wr_private, + send_buf); + list_add_tail(&seg->list, &mad_send_wr->multipacket_list); +} +EXPORT_SYMBOL(ib_append_to_multipacket_mad); + void ib_free_send_mad(struct ib_mad_send_buf *send_buf) { struct ib_mad_agent_private *mad_agent_priv; - void *mad_send_wr; - int length; + struct ib_mad_send_wr_private *mad_send_wr; + struct ib_mad_multipacket_seg *seg, *tmp; mad_agent_priv = container_of(send_buf->mad_agent, struct ib_mad_agent_private, agent); mad_send_wr = container_of(send_buf, struct ib_mad_send_wr_private, send_buf); - length = sizeof(struct ib_mad_send_wr_private) + (mad_send_wr - send_buf->mad); - if (length >= PAGE_SIZE) - free_pages((unsigned long)send_buf->mad, long_log2(roundup_pow_of_two(length)) - PAGE_SHIFT); - else - kfree(send_buf->mad); - + list_for_each_entry_safe(seg, tmp, &mad_send_wr->multipacket_list, list) { + list_del(&seg->list); + kfree(seg); + } + kfree(send_buf->mad); if (atomic_dec_and_test(&mad_agent_priv->refcount)) wake_up(&mad_agent_priv->wait); } Index: latest/drivers/infiniband/core/mad_priv.h =================================================================== --- latest.orig/drivers/infiniband/core/mad_priv.h +++ latest/drivers/infiniband/core/mad_priv.h @@ -130,6 +130,7 @@ struct ib_mad_send_wr_private { enum ib_wc_status status; /* RMPP control */ + struct list_head multipacket_list; int last_ack; int seg_num; int newwin; Index: latest/drivers/infiniband/core/user_mad.c =================================================================== --- latest.orig/drivers/infiniband/core/user_mad.c +++ latest/drivers/infiniband/core/user_mad.c @@ -123,6 +123,7 @@ struct ib_umad_packet { struct ib_mad_send_buf *msg; struct list_head list; int length; + struct list_head seg_list; struct ib_user_mad mad; }; @@ -176,6 +177,87 @@ static int queue_packet(struct ib_umad_f return ret; } +static int data_offset(u8 mgmt_class) +{ + if (mgmt_class == IB_MGMT_CLASS_SUBN_ADM) + return IB_MGMT_SA_HDR; + else if ((mgmt_class >= IB_MGMT_CLASS_VENDOR_RANGE2_START) && + (mgmt_class <= IB_MGMT_CLASS_VENDOR_RANGE2_END)) + return IB_MGMT_VENDOR_HDR; + else + return IB_MGMT_RMPP_HDR; +} + +static int copy_recv_mad(struct ib_mad_recv_wc *mad_recv_wc, + struct ib_umad_packet *packet) +{ + struct ib_mad_recv_buf *seg_buf; + struct ib_rmpp_mad *rmpp_mad; + void *data; + struct ib_mad_multipacket_seg *seg; + int size, len, offset; + u8 flags; + + len = 
mad_recv_wc->mad_len; + if (len <= sizeof(struct ib_mad)) { + memcpy(&packet->mad.data, mad_recv_wc->recv_buf.mad, len); + return 0; + } + + offset = data_offset(mad_recv_wc->recv_buf.mad->mad_hdr.mgmt_class); + + list_for_each_entry(seg_buf, &mad_recv_wc->rmpp_list, list) { + rmpp_mad = (struct ib_rmpp_mad *)seg_buf->mad; + flags = ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr); + + if (flags & IB_MGMT_RMPP_FLAG_FIRST) { + size = sizeof(*rmpp_mad); + memcpy(&packet->mad.data, rmpp_mad, size); + } else { + data = (void *) rmpp_mad + offset; + if (flags & IB_MGMT_RMPP_FLAG_LAST) + size = len; + else + size = sizeof(*rmpp_mad) - offset; + seg = kmalloc(sizeof(struct ib_mad_multipacket_seg) + + sizeof(struct ib_rmpp_mad) - offset, + GFP_KERNEL); + if (!seg) + return -ENOMEM; + memcpy(seg->data, data, size); + list_add_tail(&seg->list, &packet->seg_list); + } + len -= size; + } + return 0; +} + +static struct ib_umad_packet *alloc_packet(void) +{ + struct ib_umad_packet *packet; + int length = sizeof *packet + sizeof(struct ib_mad); + + packet = kzalloc(length, GFP_KERNEL); + if (!packet) { + printk(KERN_ERR "alloc_packet: mem alloc failed for length %d\n", + length); + return NULL; + } + INIT_LIST_HEAD(&packet->seg_list); + return packet; +} + +static void free_packet(struct ib_umad_packet *packet) +{ + struct ib_mad_multipacket_seg *seg, *tmp; + + list_for_each_entry_safe(seg, tmp, &packet->seg_list, list) { + list_del(&seg->list); + kfree(seg); + } + kfree(packet); +} + static void send_handler(struct ib_mad_agent *agent, struct ib_mad_send_wc *send_wc) { @@ -187,7 +269,7 @@ static void send_handler(struct ib_mad_a ib_free_send_mad(packet->msg); if (send_wc->status == IB_WC_RESP_TIMEOUT_ERR) { - timeout = kzalloc(sizeof *timeout + IB_MGMT_MAD_HDR, GFP_KERNEL); + timeout = alloc_packet(); if (!timeout) goto out; @@ -198,40 +280,14 @@ static void send_handler(struct ib_mad_a sizeof (struct ib_mad_hdr)); if (!queue_packet(file, agent, timeout)) - return; + return; + else + free_packet(timeout); } out: kfree(packet); } -static struct ib_umad_packet *alloc_packet(int buf_size) -{ - struct ib_umad_packet *packet; - int length = sizeof *packet + buf_size; - - if (length >= PAGE_SIZE) - packet = (void *)__get_free_pages(GFP_KERNEL, long_log2(roundup_pow_of_two(length)) - PAGE_SHIFT); - else - packet = kmalloc(length, GFP_KERNEL); - - if (!packet) - return NULL; - - memset(packet, 0, length); - return packet; -} - -static void free_packet(struct ib_umad_packet *packet) -{ - int length = packet->length + sizeof *packet; - if (length >= PAGE_SIZE) - free_pages((unsigned long) packet, long_log2(roundup_pow_of_two(length)) - PAGE_SHIFT); - else - kfree(packet); -} - - - static void recv_handler(struct ib_mad_agent *agent, struct ib_mad_recv_wc *mad_recv_wc) { @@ -243,13 +299,16 @@ static void recv_handler(struct ib_mad_a goto out; length = mad_recv_wc->mad_len; - packet = alloc_packet(length); + packet = alloc_packet(); if (!packet) goto out; packet->length = length; - ib_coalesce_recv_mad(mad_recv_wc, packet->mad.data); + if (copy_recv_mad(mad_recv_wc, packet)) { + free_packet(packet); + goto out; + } packet->mad.hdr.status = 0; packet->mad.hdr.length = length + sizeof (struct ib_user_mad); @@ -278,6 +337,7 @@ static ssize_t ib_umad_read(struct file size_t count, loff_t *pos) { struct ib_umad_file *file = filp->private_data; + struct ib_mad_multipacket_seg *seg; struct ib_umad_packet *packet; ssize_t ret; @@ -304,18 +364,42 @@ static ssize_t ib_umad_read(struct file spin_unlock_irq(&file->recv_lock); - if 
(count < packet->length + sizeof (struct ib_user_mad)) { - /* Return length needed (and first RMPP segment) if too small */ - if (copy_to_user(buf, &packet->mad, - sizeof (struct ib_user_mad) + sizeof (struct ib_mad))) - ret = -EFAULT; - else - ret = -ENOSPC; - } else if (copy_to_user(buf, &packet->mad, - packet->length + sizeof (struct ib_user_mad))) + if (copy_to_user(buf, &packet->mad, + sizeof(struct ib_user_mad) + sizeof(struct ib_mad))) { ret = -EFAULT; - else + goto err; + } + + if (count < packet->length + sizeof (struct ib_user_mad)) + /* User buffer too small. Return first RMPP segment (which + * includes RMPP message length). + */ + ret = -ENOSPC; + else if (packet->length <= sizeof(struct ib_mad)) + ret = packet->length + sizeof(struct ib_user_mad); + else { + int len = packet->length - sizeof(struct ib_mad); + struct ib_rmpp_mad *rmpp_mad = + (struct ib_rmpp_mad *) packet->mad.data; + int max_seg_payload = sizeof(struct ib_mad) - + data_offset(rmpp_mad->mad_hdr.mgmt_class); + int seg_payload; + /* multipacket RMPP MAD message. Copy remainder of message. + * Note that last segment may have a shorter payload. + */ + buf += sizeof(struct ib_user_mad) + sizeof(struct ib_mad); + list_for_each_entry(seg, &packet->seg_list, list) { + seg_payload = min_t(int, len, max_seg_payload); + if (copy_to_user(buf, seg->data, seg_payload)) { + ret = -EFAULT; + goto err; + } + buf += seg_payload; + len -= seg_payload; + } ret = packet->length + sizeof (struct ib_user_mad); + } +err: if (ret < 0) { /* Requeue packet */ spin_lock_irq(&file->recv_lock); @@ -339,6 +423,8 @@ static ssize_t ib_umad_write(struct file __be64 *tid; int ret, length, hdr_len, copy_offset; int rmpp_active, has_rmpp_header; + int max_seg_payload; + struct ib_mad_multipacket_seg *seg; if (count < sizeof (struct ib_user_mad) + IB_MGMT_RMPP_HDR) return -EINVAL; @@ -415,6 +501,11 @@ static ssize_t ib_umad_write(struct file goto err_ah; } + if (!rmpp_active && length > sizeof(struct ib_mad)) { + ret = -EINVAL; + goto err_ah; + } + packet->msg = ib_create_send_mad(agent, be32_to_cpu(packet->mad.hdr.qpn), 0, rmpp_active, @@ -432,12 +523,39 @@ static ssize_t ib_umad_write(struct file /* Copy MAD headers (RMPP header in place) */ memcpy(packet->msg->mad, packet->mad.data, IB_MGMT_MAD_HDR); - /* Now, copy rest of message from user into send buffer */ + /* complete copying first 256 bytes of message into send buffer */ if (copy_from_user(packet->msg->mad + copy_offset, buf + sizeof (struct ib_user_mad) + copy_offset, - length - copy_offset)) { + min_t(int, length, sizeof(struct ib_mad)) - copy_offset)) { ret = -EFAULT; - goto err_msg; + goto err_ah; + } + + /* if multipacket, copy remainder of send message from user to multipacket list */ + length -= sizeof(struct ib_mad); + buf += sizeof (struct ib_user_mad) + sizeof(struct ib_mad); + max_seg_payload = sizeof(struct ib_mad) - + data_offset(rmpp_mad->mad_hdr.mgmt_class); + while (length > 0) { + int seg_payload = min_t(int, length, max_seg_payload); + seg = kzalloc(sizeof(struct ib_mad_multipacket_seg) + + max_seg_payload, GFP_KERNEL); + if (!seg) { + printk(KERN_ERR "ib_umad_write: " + "mem alloc failed for length %d\n", + sizeof(struct ib_mad_multipacket_seg) + + max_seg_payload); + ret = -ENOMEM; + goto err_msg; + } + + if (copy_from_user(seg->data, buf, seg_payload)) { + ret = -EFAULT; + goto err_msg; + } + ib_append_to_multipacket_mad(packet->msg, seg); + buf += seg_payload; + length -= seg_payload; } /* -- MST From swise at opengridcomputing.com Tue Dec 6 11:05:48 2005 From: 
swise at opengridcomputing.com (Steve Wise) Date: Tue, 06 Dec 2005 13:05:48 -0600 Subject: [openib-general] ISER question Message-ID: <1133895948.27598.41.camel@linux.site> Is there iscsi initiator code somewhere that uses the infiniband/ulp/iser module? Thanks, Steve. From caitlinb at broadcom.com Tue Dec 6 11:13:29 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Tue, 6 Dec 2005 11:13:29 -0800 Subject: [openib-general] [kDAPL]How to register a vmalloc() allocated buffer Message-ID: <54AD0F12E08D1541B826BE97C98F99F10C2A1A@NT-SJCA-0751.brcm.ad.broadcom.com> openib-general-bounces at openib.org wrote: > On Tue, Dec 06, 2005 at 08:52:13PM +0800, Ian Jiang wrote: >> Hi James, >> You are always so kind! >> Now I have a question about reading a buffer of a application in >> user space. Is it the only way to use the uDAPL? >> I used to have an idea like this: >> The application in user space gives the virtual start address and >> length of its data buffer to a kernel module program. This kernel >> program acts as a application of the kDAPL and registers the user >> space data buffer with the kDAPl, > > Ian, > If you are doing this with OpenIB, my advice is to NOT start > with kDAPL. > AFAICT, kDAPL is going away once any dependencies on it are resolved. > And it's clearly not going to be pushed to kernel.org source trees. > ISTR Dan Bar Dov wrote iSER was no longer dependent on kDAPL > but not sure if that was the only module. > > >> then request a RDMA read operation to complete the data transferring. >> But I think it is not feasible after getting your last reply. Am I >> right? Please give some suggestion and thanks very much! > > In general, a kernel module can map a user space address to a > "DMA Address". OpenIB code has interfaces to register the > "DMA Address" with the IB card. > kDAPL will still be of value for applications that want to minimize their dependencies on the OS while still operating in kernel space (but obviously not as part of *the* kernel). However, agenting user-mode buffers is going to get very OS specific, so this application doesn't seem to be one that would benefit from kDAPL. From ftillier at silverstorm.com Tue Dec 6 11:15:35 2005 From: ftillier at silverstorm.com (Fabian Tillier) Date: Tue, 6 Dec 2005 11:15:35 -0800 Subject: [openib-general] Re: SDP use of CMA In-Reply-To: <20051206185752.GD23088@mellanox.co.il> References: <4395CB02.9010602@ichips.intel.com> <20051206185752.GD23088@mellanox.co.il> Message-ID: <79ae2f320512061115w58304228ib6c7003c71c34b19@mail.gmail.com> On 12/6/05, Michael S. Tsirkin wrote: > Quoting r. Sean Hefty : > > Subject: Re: SDP use of CMA > > > > Tom Tucker wrote: > > > Not to jump in late on this, but why couldn't we just add a protocol > > > parameter to the create_id call. Then it is arbitrarily extensible ala > > > socket(AF_INET, SOCK_STREAM, ) > > > > > > So what I'm specifically suggesting is: > > > > > > struct rdma_cm_id* rdma_create_id(rdma_cm_event_handler* cm_handler, > > > void* context, > > > rdma_cm_proto proto); > > > > This isn't much different that having a separate call to set the protocol. I > > went with a separate protocol API to add in version information as well. > > Changing just the create_id call makes sense. > > > > - Sean > > I also have this notion that it might be a good idea to put the > protocol in the reserved bits in the service id. > > Makes sense? The protocol defines the private data format. 
SDP already defines its private data format, which is different from the CMA's native private data format. The protocol input to the CMA would dictate what SID range would be used (SDP's or the CMA's) and what the private data format would be, so that the CMA can properly process incoming requests.

- Fab

From mst at mellanox.co.il Tue Dec 6 11:25:58 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 6 Dec 2005 21:25:58 +0200
Subject: [openib-general] Re: SDP use of CMA
In-Reply-To: <20051206185752.GD23088@mellanox.co.il>
References: <20051206185752.GD23088@mellanox.co.il>
Message-ID: <20051206192558.GF23088@mellanox.co.il>

Quoting r. Michael S. Tsirkin :
> Subject: Re: SDP use of CMA
>
> Quoting r. Sean Hefty :
> > Subject: Re: SDP use of CMA
> >
> > Tom Tucker wrote:
> > > Not to jump in late on this, but why couldn't we just add a protocol
> > > parameter to the create_id call. Then it is arbitrarily extensible ala
> > > socket(AF_INET, SOCK_STREAM, )
> > >
> > > So what I'm specifically suggesting is:
> > >
> > > struct rdma_cm_id* rdma_create_id(rdma_cm_event_handler* cm_handler,
> > >                                   void* context,
> > >                                   rdma_cm_proto proto);
> >
> > This isn't much different from having a separate call to set the protocol. I
> > went with a separate protocol API to add in version information as well.
> > Changing just the create_id call makes sense.
> >
> > - Sean
>
> I also have this notion that it might be a good idea to put the
> protocol in the reserved bits in the service id.

It seems I was too brief. What I'm trying to say here is that the service ID not only specifies the private data format but also allows demultiplexing.

For SDP the SID format is set, but for other protocols we can have

	if (id->protocol != CMA_SDP_PROTO)
		return cpu_to_be64(((u64)IB_OPENIB_OUI << 48) +
				   (id->protocol << 16) +
				   ((struct sockaddr_in *) addr)->sin_port);

and this way more than one protocol will be able to listen on the same port.

> Makes sense?

-- MST

From sean.hefty at intel.com Tue Dec 6 10:57:28 2005
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 6 Dec 2005 10:57:28 -0800
Subject: [openib-general] Re: SDP use of CMA
In-Reply-To: <20051206185752.GD23088@mellanox.co.il>
Message-ID: 

>I also have this notion that it might be a good idea to put the
>protocol in the reserved bits in the service id.
>
>Makes sense?

The service ID is an IB specific identifier, not exposed through the CMA. Also, there aren't any "reserved bits" in the service ID. Can you clarify?

- Sean

From mshefty at ichips.intel.com Tue Dec 6 11:29:01 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 06 Dec 2005 11:29:01 -0800
Subject: [openib-general] Re: SDP use of CMA
In-Reply-To: <20051206192558.GF23088@mellanox.co.il>
References: <20051206185752.GD23088@mellanox.co.il> <20051206192558.GF23088@mellanox.co.il>
Message-ID: <4395E67D.9000703@ichips.intel.com>

Michael S. Tsirkin wrote:
> It seems I was too brief. What I'm trying to say here is that the
> service ID not only specifies the private data format but also allows
> demultiplexing.
>
> For SDP the SID format is set, but for other protocols we can have
>
> if (id->protocol != CMA_SDP_PROTO)
> 	return cpu_to_be64(((u64)IB_OPENIB_OUI << 48) +
> 			   (id->protocol << 16) +
> 			   ((struct sockaddr_in *) addr)->sin_port);
>
> and this way more than one protocol will be able to listen on the
> same port.

The latest version of this has the socket protocol as part of the service ID, similar to what you have above.
The CMA won't be updated to reflect this until the proposed protocol becomes standard, however.

- Sean

From halr at voltaire.com Tue Dec 6 11:40:03 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: Tue, 6 Dec 2005 21:40:03 +0200
Subject: [openib-general] ISER question
Message-ID: <5CE025EE7D88BA4599A2C8FEFCF226F589AB80@taurus.voltaire.com>

Hi Steve,

* The Linux open-iscsi initiator is used with the iSER initiator (http://www.open-iscsi.org).
* It needs to be built for the iSER transport and produces scsi_transport_iscsi.ko and iscsi_iser.ko.

We will provide more complete build (and running) instructions on the wiki for this.

Do you have an iSER target?

-- Hal

________________________________

From: openib-general-bounces at openib.org on behalf of Steve Wise
Sent: Tue 12/6/2005 2:05 PM
To: openib-general
Subject: [openib-general] ISER question

Is there iscsi initiator code somewhere that uses the infiniband/ulp/iser module?

Thanks,

Steve.

_______________________________________________
openib-general mailing list
openib-general at openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

From tom at opengridcomputing.com Tue Dec 6 12:19:51 2005
From: tom at opengridcomputing.com (Tom Tucker)
Date: Tue, 06 Dec 2005 14:19:51 -0600
Subject: [openib-general] cm_add_one events
Message-ID: <1133900391.11138.9.camel@trinity.austin.ammasso.com>

Sean:

Should the IB CM ignore add_one events for all but node_type == IB_NODE_HCA?

Thanks,
Tom

From mshefty at ichips.intel.com Tue Dec 6 12:24:04 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 06 Dec 2005 12:24:04 -0800
Subject: [openib-general] Re: cm_add_one events
In-Reply-To: <1133900391.11138.9.camel@trinity.austin.ammasso.com>
References: <1133900391.11138.9.camel@trinity.austin.ammasso.com>
Message-ID: <4395F364.6080204@ichips.intel.com>

Tom Tucker wrote:
> Should the IB CM ignore add_one events for all but node_type ==
> IB_NODE_HCA?

My assumption was that if someone wanted to run it on a switch or router for whatever reason, then there's no real reason to prevent it. If we add an iWarp node type, then the CM could filter out those nodes, but it should just fail when trying to register a MAD agent anyway.

- Sean

From iod00d at hp.com Tue Dec 6 13:00:25 2005
From: iod00d at hp.com (Grant Grundler)
Date: Tue, 6 Dec 2005 13:00:25 -0800
Subject: [openib-general] Flash sector size? eh?
Message-ID: <20051206210025.GJ21980@esmail.cup.hp.com>

Hi,
I'm wondering if anyone has a clue what this is about:

# ./mstflint/mstflint -d /proc/bus/pci/0084\:05/00.0 -i /root/fw-25208-4_7_400-MHGA28-1T.bin -s b
Flash sector size(0x10000) differs from sector size defined in image (0x20000)
#

Did I grab the wrong firmware image?
http://www.mellanox.com/support/firmware_download.php

The HCA is:
0084:05:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) (rev a0)
        Subsystem: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode)
        Flags: bus master, fast devsel, latency 0, IRQ 58
        Memory at 00000f2888800000 (64-bit, non-prefetchable) [size=1M]
        Memory at 00000f2888000000 (64-bit, prefetchable) [size=8M]
        Memory at 00000f2880000000 (64-bit, prefetchable) [size=128M]
        Capabilities: [40] Power Management version 2
        Capabilities: [48] Vital Product Data
        Capabilities: [90] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable-
        Capabilities: [60] #10 [0001]

This is running a recent svn openib bits (less than 2 weeks old) on 2.6.14 kernel.

I'm messing with firmware because when loading mthca driver, I get:
...
GSI 65 (level, low) -> CPU 1 (0x0808) vector 58 unregistered
ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005)
ib_mthca: Initializing 0084:05:00.0
GSI 65 (level, low) -> CPU 0 (0x0008) vector 58
ACPI: PCI Interrupt 0084:05:00.0[A] -> GSI 65 (level, low) -> IRQ 58
ib_mthca 0084:05:00.0: HCA FW version 4.5.0 is old (4.7.0 is current).
ib_mthca 0084:05:00.0: If you have problems, try updating your HCA FW.

And I like that kind of warning.

thanks,
grant

From halr at voltaire.com Tue Dec 6 13:18:27 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 06 Dec 2005 16:18:27 -0500
Subject: [openib-general] Re: ip_dev_find resolution?
In-Reply-To: <20051206175652.GC22860@mellanox.co.il>
References: <20051206174355.GB21980@esmail.cup.hp.com> <20051206175652.GC22860@mellanox.co.il>
Message-ID: <1133903906.4587.23311.camel@hal.voltaire.com>

On Tue, 2005-12-06 at 12:56, Michael S. Tsirkin wrote:
> Actually, I wonder whether instead of ip_dev_find we can just
>
> read_lock(&dev_base_lock);
> for (dev = dev_base; dev; dev = dev->next) {
>
> and check the ip address?

working off the ip_ptr and ip6_ptr ?

> If this works, this has the advantage of supporting IPv6 as well.

This was introduced at one point and we subsequently changed to ip_dev_find. I forget exactly why this was but can dig it out if no one recalls.

-- Hal

From steve.apo at googlemail.com Tue Dec 6 13:57:52 2005
From: steve.apo at googlemail.com (Steven Wooding)
Date: Tue, 6 Dec 2005 21:57:52 +0000
Subject: [openib-general] UC connection server
In-Reply-To: <79ae2f320512061018m61a445d9xeee4e37bfb326974@mail.gmail.com>
References: <2cfcf21e0512060805u466c9d83m@mail.gmail.com> <4395CDD6.1080601@ichips.intel.com> <79ae2f320512061018m61a445d9xeee4e37bfb326974@mail.gmail.com>
Message-ID: <2cfcf21e0512061357h5d1a90h@mail.gmail.com>

Hi Fabian,

> I think what Steve wants to do is issue a REQ, send it to the PC, but
> have the path record go to his embedded device (which has a different
> LID). The CM protocol supports this, but the implementation of the CM
> looks at the path record to determine the destination of the CM MADs.
> Supporting this would require some way for the user to set the target
> of the CM MADs independently of the path information contained in the
> REQ. Adding an optional extra path record for the CM path might do
> the trick.

That's it in a nutshell really. I don't know how useful such a feature would be in the wider IB community. We've been forced into this position by a vendor not following the standard. I wanted to check with you guys whether there was a quick solution that was ready to fly.
It seems that this feature would need to go into the openib drivers, which we don't have time or money to do. We do have a backup solution from the vendor, but it's non-standard and I was trying to keep our side of the interface using the standard.

Anyway, thanks for your suggestions. It's all useful info.

Regards,

Steve.

From panda at cse.ohio-state.edu Tue Dec 6 21:21:37 2005
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Wed, 7 Dec 2005 00:21:37 -0500 (EST)
Subject: [openib-general] Announcing the release of MVAPICH 0.9.6 (MPI-1 over InfiniBand and other RDMA Interconnects)
Message-ID: <200512070521.jB75LbnM010303@xi.cse.ohio-state.edu>

As MVAPICH software keeps on empowering several clusters in the TOP500 list (including the #5 ranked Sandia Thunderbird), the MVAPICH team is aiming to push the performance and scalability of InfiniBand clusters to the next level!!

The team is pleased to announce the release of MVAPICH 0.9.6 for the following platforms, OS, compilers, and InfiniBand adapters:

- Platforms: EM64T, Opteron, IA-32 and Mac G5
- Operating Systems: Linux, Solaris and Mac OSX
- Compilers: gcc, intel, pathscale and pgi
- InfiniBand Adapters: Mellanox adapters with PCI-X and PCI-Express (SDR and DDR with mem-full and mem-free cards)

In addition to delivering high performance with the VAPI interface, MVAPICH 0.9.6 also provides uDAPL support for portability across networks and platforms with highest performance. The uDAPL interface of this release has been tested with InfiniBand (OpenIB SCM/Gen2 uDAPL and Solaris IBTL/uDAPL) and Myrinet (DAPL-GM beta). Starting with this release, MVAPICH enables InfiniBand support for the Solaris environment through uDAPL support.

MVAPICH 0.9.6 is being distributed as a single integrated package (with MPICH 1.2.7 and MVICH). It is available under BSD license.
This release has the following features: - Designs for scaling InfiniBand clusters to multi-thousand nodes with highest performance and reduced memory usage - Optimized implementation of Rendezvous protocol (RDMA Read and RDMA Write) for better computation-communication overlap and progress - Two modes of communication progress (polling and blocking) - Resource-aware registration cache - Optimized intra-node communication for Bus-based and NUMA-based systems with processor affinity - High performance and scalable collective communication support (Broadcast support using IB hardware multicast mechanism; RDMA-based barrier, all-to-all and all-gather) - Multi-rail communication support (multiple ports per adapter and multiple adapters) - Shared library support - ROMIO support - uDAPL support for portability across networks and OS (tested for InfiniBand on Linux and Solaris; and Myrinet) - Scalable job start-up with MPD - TotalView debugger support - Optimized and tuned for the above platforms and different network interfaces (PCI-X and PCI-Express with SDR and DDR) - Support for multiple compilers (gcc, icc, pathscale and pgi) - Single code base for all of the above platforms and OS - Integrated and easy-to-use build script for installing the code on various platforms, OS, compilers, Devices, and InfiniBand adapters - Incorporates a set of runtime and compiler time tunable parameters for convenient tuning on large-scale clusters Other features of this release include: - Excellent performance: Sample performance numbers include: EM64T, PCI-Ex, IBA-DDR: - 3.09 microsec one-way latency (4 bytes) - 1475 MB/sec unidirectional bandwidth - 2661 MB/sec bidirectional bandwidth EM64T, PCI-Ex, IBA-SDR: - 3.52 microsec one-way latency (4 bytes) - 968 MB/sec unidirectional bandwidth with single-rail and 1497 MB/sec with multi-rail - 1781 MB/sec bidirectional bandwidth with single-rail and 2721 MB/sec with multi-rail Opteron, PCI-Ex, IBA-SDR: - 3.42 microsec one-way latency (4 bytes) - 968 MB/sec unidirectional bandwidth with single-rail - 1865 MB/sec bidirectional bandwidth with single-rail Solaris uDAPL/IBTL on Opteron, PCI-X, IBA-SDR: - 5.38 microsec one-way latency (4 bytes) - 651 MB/sec unidirectional bandwidth - 808 MB/sec bidirectional bandwidth OpenIB/Gen2 uDAPL on Opteron, PCI-Ex, IBA-SDR: - 3.39 microsec one-way latency (4 bytes) - 968 MB/sec unidirectional bandwidth - 1890 MB/sec bidirectional bandwidth OpenIB/Gen2 uDAPL on EM64T, PCI-Ex, IBA-SDR: - 3.43 microsec one-way latency (4 bytes) - 968 MB/sec unidirectional bandwidth - 1912 MB/sec bidirectional bandwidth Performance numbers for all other platforms, system configurations and operations can be viewed by visiting `Performance Results' section of the project's web page. - A set of benchmarks to evaluate point-to-point and collective operations - An enhanced and detailed `User Guide' to assist users: - to install this package on different platforms with both interfaces (VAPI and uDAPL) and different options - to vary different parameters of the MPI installation to extract maximum performance and achieve scalability, especially on large-scale systems. You are welcome to download the MVAPICH 0.9.6 package and access relevant information from the following URL: http://nowlab.cse.ohio-state.edu/projects/mpi-iba/ A successive version with support for OpenIB/Gen2 will be available soon. All feedbacks, including bug reports and hints for performance tuning, are welcome. Please send an e-mail to mvapich-help at cse.ohio-state.edu. 
Thanks,

MVAPICH Team at OSU/NBCL

----------
PS: If you would like to be removed from this mailing list, please send an e-mail to mvapich_request at cse.ohio-state.edu.

======================================================================
MVAPICH/MVAPICH2 project is currently supported with funding from U.S. National Science Foundation, U.S. DOE Office of Science, Mellanox, Intel, Cisco Systems, Sun Microsystems, and Linux Networx; and with equipment support from AMD, Apple, Appro, IBM, Intel, Mellanox, Microway, PathScale, SilverStorm and Sun Microsystems. Other technology partners include Etnus.
======================================================================

From ianjiang.ict at gmail.com Wed Dec 7 00:13:04 2005
From: ianjiang.ict at gmail.com (Ian Jiang)
Date: Wed, 7 Dec 2005 16:13:04 +0800
Subject: [openib-general] [kDAPL]How to register a vmalloc() allocated buffer
In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F10C2A1A@NT-SJCA-0751.brcm.ad.broadcom.com>
References: <54AD0F12E08D1541B826BE97C98F99F10C2A1A@NT-SJCA-0751.brcm.ad.broadcom.com>
Message-ID: <7b2fa1820512070013n5f6a41d1ie39049452f5c4083@mail.gmail.com>

My question originally comes from the iSER. I used to think that a data buffer described in an iSCSI data PDU is in the user space, but now I am afraid that it was not correct.

openib-general-bounces at openib.org wrote:
> > On Tue, Dec 06, 2005 at 08:52:13PM +0800, Ian Jiang wrote:
> >> Hi James,
> >> You are always so kind!
> >> Now I have a question about reading a buffer of a application in
> >> user space. Is it the only way to use the uDAPL?
> >> I used to have an idea like this:
> >> The application in user space gives the virtual start address and
> >> length of its data buffer to a kernel module program. This kernel
> >> program acts as a application of the kDAPL and registers the user
> >> space data buffer with the kDAPl,
> >
> > Ian,
> > If you are doing this with OpenIB, my advice is to NOT start
> > with kDAPL.
> > AFAICT, kDAPL is going away once any dependencies on it are resolved.
> > And it's clearly not going to be pushed to kernel.org source trees.
> > ISTR Dan Bar Dov wrote iSER was no longer dependent on kDAPL
> > but not sure if that was the only module.
> >
> >> then request a RDMA read operation to complete the data transferring.
> >> But I think it is not feasible after getting your last reply. Am I
> >> right? Please give some suggestion and thanks very much!
> >
> > In general, a kernel module can map a user space address to a
> > "DMA Address". OpenIB code has interfaces to register the
> > "DMA Address" with the IB card.
>
> kDAPL will still be of value for applications that want to minimize
> their dependencies on the OS while still operating in kernel space
> (but obviously not as part of *the* kernel).
>
> However, agenting user-mode buffers is going to get very OS
> specific, so this application doesn't seem to be one that
> would benefit from kDAPL.

--
Ian Jiang
ianjiang.ict at gmail.com
Laboratory of Spatial Information Technology
Division of System Architecture
Institute of Computing Technology
Chinese Academy of Sciences
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mst at mellanox.co.il Wed Dec 7 00:22:20 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Dec 2005 10:22:20 +0200
Subject: [openib-general] Re: ip_dev_find resolution?
In-Reply-To: <1133903906.4587.23311.camel@hal.voltaire.com>
References: <1133903906.4587.23311.camel@hal.voltaire.com>
Message-ID: <20051207082220.GK21035@mellanox.co.il>

Quoting r. Hal Rosenstock :
> Subject: Re: ip_dev_find resolution?
>
> On Tue, 2005-12-06 at 12:56, Michael S. Tsirkin wrote:
> > Actually, I wonder whether instead of ip_dev_find we can just
> >
> > read_lock(&dev_base_lock);
> > for (dev = dev_base; dev; dev = dev->next) {
> >
> > and check the ip address?
>
> working off the ip_ptr and ip6_ptr ?

Yes.

> > If this works, this has the advantage of supporting IPv6 as well.
>
> This was introduced at one point and we subsequently changed to
> ip_dev_find. I forget exactly why this was but can dig it out if no one
> recalls.

Please do.

-- MST

From danb at voltaire.com Wed Dec 7 00:38:24 2005
From: danb at voltaire.com (Dan Bar Dov)
Date: Wed, 7 Dec 2005 10:38:24 +0200
Subject: [openib-general] ISER question
Message-ID: 

The open-iscsi initiator project has the capability to use the infiniband/ulp/iser module. Currently it needs an in-between module called iscsi_iser, but that in-between is being merged into ib_iser, so that open-iscsi will directly interface ib_iser.

Please let me know if you need iscsi_iser since its code is not in open-iscsi, nor in openIB.

Dan

> -----Original Message-----
> From: openib-general-bounces at openib.org
> [mailto:openib-general-bounces at openib.org] On Behalf Of Steve Wise
> Sent: Tuesday, December 06, 2005 9:06 PM
> To: openib-general
> Subject: [openib-general] ISER question
>
> Is there iscsi initiator code somewhere that uses the
> infiniband/ulp/iser module?
>
> Thanks,
>
> Steve.
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general

From mst at mellanox.co.il Wed Dec 7 01:31:20 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Dec 2005 11:31:20 +0200
Subject: [openib-general] Re: Flash sector size? eh?
In-Reply-To: <20051206210025.GJ21980@esmail.cup.hp.com>
References: <20051206210025.GJ21980@esmail.cup.hp.com>
Message-ID: <20051207093120.GN21035@mellanox.co.il>

You need to query the Board ID on card and in image:
./mstflint/mstflint -d q
./mstflint/mstflint -i q

Quoting r. Grant Grundler :
> Subject: Flash sector size? eh?
>
> Hi,
> I'm wondering if anyone has a clue what this is about:
>
> # ./mstflint/mstflint -d /proc/bus/pci/0084\:05/00.0 -i
> /root/fw-25208-4_7_400-MHGA28-1T.bin -s b
> Flash sector size(0x10000) differs from sector size defined in image
> (0x20000)
> #
>
> Did I grab the wrong firmware image?
> http://www.mellanox.com/support/firmware_download.php
>
> The HCA is:
> 0084:05:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex
> (Tavor compatibility mode) (rev a0)
>         Subsystem: Mellanox Technologies MT25208 InfiniHost III Ex
> (Tavor compatibility mode)
>         Flags: bus master, fast devsel, latency 0, IRQ 58
>         Memory at 00000f2888800000 (64-bit, non-prefetchable) [size=1M]
>         Memory at 00000f2888000000 (64-bit, prefetchable) [size=8M]
>         Memory at 00000f2880000000 (64-bit, prefetchable) [size=128M]
>         Capabilities: [40] Power Management version 2
>         Capabilities: [48] Vital Product Data
>         Capabilities: [90] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable-
>         Capabilities: [60] #10 [0001]
>
> This is running a recent svn openib bits (less than 2 weeks old)
> on 2.6.14 kernel.
>
> I'm messing with firmware because when loading mthca driver, I get:
> ...
> GSI 65 (level, low) -> CPU 1 (0x0808) vector 58 unregistered
> ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005)
> ib_mthca: Initializing 0084:05:00.0
> GSI 65 (level, low) -> CPU 0 (0x0008) vector 58
> ACPI: PCI Interrupt 0084:05:00.0[A] -> GSI 65 (level, low) -> IRQ 58
> ib_mthca 0084:05:00.0: HCA FW version 4.5.0 is old (4.7.0 is current).
> ib_mthca 0084:05:00.0: If you have problems, try updating your HCA FW.
>
> And I like that kind of warning.
>
> thanks,
> grant
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general

-- MST

From yael at mellanox.co.il Wed Dec 7 03:59:14 2005
From: yael at mellanox.co.il (Yael Kalka)
Date: Wed, 7 Dec 2005 13:59:14 +0200
Subject: [openib-general] RE: [PATCH] OpenSM: SA SMInfoRecord should support GetTable as well as Get method
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30E2473@mtlexch01.mtl.com>

Hi Hal,
If you look at the code - currently what is returned is a record with the local SMInfo only.
The code should be fixed to return a table, or a requested SMInfo record - not only of the local port.
So currently a table is not returned, and the code isn't correct with or without the patch...
This issue should be added to our to-do list.
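To make that concrete, the fix will need a loop over all the SMs we know about, roughly like the sketch below. The remote-SM table accessors and the item type here are placeholder names, not the current OpenSM API, and the real allocator and error handling are elided:

	/* Sketch only: emit one SMInfoRecord per SM known to the subnet,
	 * so SubnAdmGetTable(SMInfoRecord) reports all SMs, not just us.
	 * remote_sm_first()/remote_sm_next() and smir_item_t are invented
	 * names for whatever the real subnet SM table provides.
	 */
	static void smir_collect_all(osm_smir_rcv_t *p_rcv, cl_qlist_t *p_list)
	{
		remote_sm_t *p_sm;
		smir_item_t *p_item;

		for (p_sm = remote_sm_first(p_rcv->p_subn); p_sm != NULL;
		     p_sm = remote_sm_next(p_sm)) {
			p_item = malloc(sizeof(*p_item));
			if (p_item == NULL)
				break;
			p_item->rec.lid = p_sm->lid;
			p_item->rec.sm_info = p_sm->smi; /* SMInfo captured at last sweep */
			cl_qlist_insert_tail(p_list, &p_item->list_item);
		}
	}

The usual SA response path can then page this list out as the GetTable reply.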
Yael -----Original Message----- From: Hal Rosenstock [mailto:halr at voltaire.com] Sent: Tuesday, December 06, 2005 4:06 PM To: Yael Kalka Cc: openib-general at openib.org Subject: [PATCH] OpenSM: SA SMInfoRecord should support GetTable as well as Get method OpenSM: SA SMInfoRecord should support GetTable as well as Get method Signed-off-by: Hal Rosenstock Index: osm_sa_sminfo_record.c =================================================================== --- osm_sa_sminfo_record.c (revision 4323) +++ osm_sa_sminfo_record.c (working copy) @@ -165,7 +165,8 @@ osm_smir_rcv_process( CL_ASSERT( p_sa_mad->attr_id == IB_MAD_ATTR_SMINFO_RECORD ); - if (p_sa_mad->method != IB_MAD_METHOD_GET) + if ( (p_sa_mad->method != IB_MAD_METHOD_GET) && + (p_sa_mad->method != IB_MAD_METHOD_GETTABLE) ) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, "osm_smir_rcv_process: ERR 2804: " From halr at voltaire.com Wed Dec 7 04:05:48 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 07 Dec 2005 07:05:48 -0500 Subject: [openib-general] [PATCH] OpenSM: SubnAdmGet PathRecord should assume NumbPath of 1 Message-ID: <1133957147.4587.28533.camel@hal.voltaire.com> OpenSM: SubnAdmGet PathRecord should assume NumbPath of 1 (1.2 erratum) Signed-off-by: Hal Rosenstock Index: opensm/osm_sa_path_record.c =================================================================== --- opensm/osm_sa_path_record.c (revision 4335) +++ opensm/osm_sa_path_record.c (working copy) @@ -709,13 +709,15 @@ __osm_pr_rcv_get_lid_pair_path( static void __osm_pr_rcv_get_port_pair_paths( IN osm_pr_rcv_t* const p_rcv, - IN const ib_path_rec_t* const p_pr, + IN const osm_madw_t* const p_madw, IN const osm_port_t* const p_req_port, IN const osm_port_t* const p_src_port, IN const osm_port_t* const p_dest_port, IN const ib_net64_t comp_mask, IN cl_qlist_t* const p_list ) { + const ib_path_rec_t* p_pr; + const ib_sa_mad_t* p_sa_mad; osm_pr_item_t* p_pr_item; uint16_t src_lid_min_ho; uint16_t src_lid_max_ho; @@ -752,6 +754,9 @@ __osm_pr_rcv_get_port_pair_paths( goto Exit; } + p_sa_mad = osm_madw_get_sa_mad_ptr( p_madw ); + p_pr = (ib_path_rec_t*)ib_sa_mad_get_payload_ptr( p_sa_mad ); + /* We shouldn't be here if the paths are disqualified in some way... Thus, we assume every possible connection is valid. 
@@ -842,10 +847,14 @@ __osm_pr_rcv_get_port_pair_paths( preference = 0; path_num = 0; - if( comp_mask & IB_PR_COMPMASK_NUMBPATH ) - iterations = p_pr->num_path & 0x7F; + /* If SubnAdmGet, assume NumbPaths 1 (1.2 erratum) */ + if (p_sa_mad->method != IB_MAD_METHOD_GET) + if( comp_mask & IB_PR_COMPMASK_NUMBPATH ) + iterations = p_pr->num_path & 0x7F; + else + iterations = (uintn_t)(-1); else - iterations = (uintn_t)(-1); + iterations = 1; while( path_num < iterations ) { @@ -1101,7 +1110,7 @@ __osm_pr_rcv_get_end_points( static void __osm_pr_rcv_process_world( IN osm_pr_rcv_t* const p_rcv, - IN const ib_path_rec_t* const p_pr, + IN const osm_madw_t* const p_madw, IN const osm_port_t* const requestor_port, IN const ib_net64_t comp_mask, IN cl_qlist_t* const p_list ) @@ -1128,7 +1137,7 @@ __osm_pr_rcv_process_world( p_src_port = (osm_port_t*)cl_qmap_head( p_tbl ); while( p_src_port != (osm_port_t*)cl_qmap_end( p_tbl ) ) { - __osm_pr_rcv_get_port_pair_paths( p_rcv, p_pr, requestor_port, p_src_port, + __osm_pr_rcv_get_port_pair_paths( p_rcv, p_madw, requestor_port, p_src_port, p_dest_port, comp_mask, p_list ); p_src_port = (osm_port_t*)cl_qmap_next( &p_src_port->map_item ); @@ -1145,7 +1154,7 @@ __osm_pr_rcv_process_world( static void __osm_pr_rcv_process_half( IN osm_pr_rcv_t* const p_rcv, - IN const ib_path_rec_t* const p_pr, + IN const osm_madw_t* const p_madw, IN const osm_port_t* const requestor_port, IN const osm_port_t* const p_src_port, IN const osm_port_t* const p_dest_port, @@ -1172,7 +1181,7 @@ __osm_pr_rcv_process_half( p_port = (osm_port_t*)cl_qmap_head( p_tbl ); while( p_port != (osm_port_t*)cl_qmap_end( p_tbl ) ) { - __osm_pr_rcv_get_port_pair_paths( p_rcv, p_pr, requestor_port, p_src_port, + __osm_pr_rcv_get_port_pair_paths( p_rcv, p_madw , requestor_port, p_src_port, p_port, comp_mask, p_list ); p_port = (osm_port_t*)cl_qmap_next( &p_port->map_item ); } @@ -1185,7 +1194,7 @@ __osm_pr_rcv_process_half( p_port = (osm_port_t*)cl_qmap_head( p_tbl ); while( p_port != (osm_port_t*)cl_qmap_end( p_tbl ) ) { - __osm_pr_rcv_get_port_pair_paths( p_rcv, p_pr, requestor_port, p_port, + __osm_pr_rcv_get_port_pair_paths( p_rcv, p_madw, requestor_port, p_port, p_dest_port, comp_mask, p_list ); p_port = (osm_port_t*)cl_qmap_next( &p_port->map_item ); } @@ -1199,7 +1208,7 @@ __osm_pr_rcv_process_half( static void __osm_pr_rcv_process_pair( IN osm_pr_rcv_t* const p_rcv, - IN const ib_path_rec_t* const p_pr, + IN const osm_madw_t* const p_madw, IN const osm_port_t* const requestor_port, IN const osm_port_t* const p_src_port, IN const osm_port_t* const p_dest_port, @@ -1208,7 +1217,7 @@ __osm_pr_rcv_process_pair( { OSM_LOG_ENTER( p_rcv->p_log, __osm_pr_rcv_process_pair ); - __osm_pr_rcv_get_port_pair_paths( p_rcv, p_pr, requestor_port, p_src_port, + __osm_pr_rcv_get_port_pair_paths( p_rcv, p_madw, requestor_port, p_src_port, p_dest_port, comp_mask, p_list ); OSM_LOG_EXIT( p_rcv->p_log ); @@ -1413,7 +1422,8 @@ __osm_pr_match_mgrp_attributes( goto Exit; } - if( comp_mask & IB_PR_COMPMASK_NUMBPATH ) + /* If SubnAdmGet, assume NumbPaths of 1 (1.2 erratum) */ + if( ( comp_mask & IB_PR_COMPMASK_NUMBPATH ) && ( p_sa_mad->method != IB_MAD_METHOD_GET ) ) { if( ( p_pr->num_path & 0x7f ) == 0 ) goto Exit; @@ -1513,7 +1523,7 @@ __osm_pr_rcv_respond( /* * C15-0.1.30: - * If we do a SubAdmGet and got more than one record it is an error ! + * If we do a SubnAdmGet and got more than one record it is an error ! 
*/ if ( (p_rcvd_mad->method == IB_MAD_METHOD_GET) && (num_rec > 1)) { @@ -1720,22 +1730,22 @@ osm_pr_rcv_process( if( p_src_port ) { if( p_dest_port ) - __osm_pr_rcv_process_pair( p_rcv, p_pr, requestor_port, p_src_port, p_dest_port, + __osm_pr_rcv_process_pair( p_rcv, p_madw, requestor_port, p_src_port, p_dest_port, p_sa_mad->comp_mask, &pr_list ); else - __osm_pr_rcv_process_half( p_rcv, p_pr, requestor_port, p_src_port, NULL, + __osm_pr_rcv_process_half( p_rcv, p_madw, requestor_port, p_src_port, NULL, p_sa_mad->comp_mask, &pr_list ); } else { if( p_dest_port ) - __osm_pr_rcv_process_half( p_rcv, p_pr, requestor_port, NULL, p_dest_port, + __osm_pr_rcv_process_half( p_rcv, p_madw, requestor_port, NULL, p_dest_port, p_sa_mad->comp_mask, &pr_list ); else /* Katie, bar the door! */ - __osm_pr_rcv_process_world( p_rcv, p_pr, requestor_port, + __osm_pr_rcv_process_world( p_rcv, p_madw, requestor_port, p_sa_mad->comp_mask, &pr_list ); } goto Unlock; From halr at voltaire.com Wed Dec 7 04:14:39 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 07 Dec 2005 07:14:39 -0500 Subject: [openib-general] RE: [PATCH] OpenSM: SA SMInfoRecord should support GetTable as well as Get method In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30E2473@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30E2473@mtlexch01.mtl.com> Message-ID: <1133957678.4587.28597.camel@hal.voltaire.com> Hi Yael, On Wed, 2005-12-07 at 06:59, Yael Kalka wrote: > Hi Hal, > If you look at the code - currently the What is returned is a record > with the > local SMInfo record. > The code should be fixed to return a table, or a requested SMInfo record > - > not only of the local port. > So currently - a table is not returned, and the code isn't correct with > or > without the patch... > This issue should be added to our to-do list. Are you saying that because of that the GetTable should not be accepted until this is fixed or is it a separate issue ? There is more to do here as you point out and I will track this on the TODO list. Do you have other things for this list (see management/osm/doc/todo) ? -- Hal > Yael > > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Tuesday, December 06, 2005 4:06 PM > To: Yael Kalka > Cc: openib-general at openib.org > Subject: [PATCH] OpenSM: SA SMInfoRecord should support GetTable as well > as Get method > > > OpenSM: SA SMInfoRecord should support GetTable as well as Get method > > Signed-off-by: Hal Rosenstock > > Index: osm_sa_sminfo_record.c > =================================================================== > --- osm_sa_sminfo_record.c (revision 4323) > +++ osm_sa_sminfo_record.c (working copy) > @@ -165,7 +165,8 @@ osm_smir_rcv_process( > > CL_ASSERT( p_sa_mad->attr_id == IB_MAD_ATTR_SMINFO_RECORD ); > > - if (p_sa_mad->method != IB_MAD_METHOD_GET) > + if ( (p_sa_mad->method != IB_MAD_METHOD_GET) && > + (p_sa_mad->method != IB_MAD_METHOD_GETTABLE) ) > { > osm_log( p_rcv->p_log, OSM_LOG_ERROR, > "osm_smir_rcv_process: ERR 2804: " From yael at mellanox.co.il Wed Dec 7 04:36:16 2005 From: yael at mellanox.co.il (Yael Kalka) Date: Wed, 7 Dec 2005 14:36:16 +0200 Subject: [openib-general] RE: [PATCH] OpenSM: SA SMInfoRecord should support GetTable as well as Get method Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30E2474@mtlexch01.mtl.com> The GetTable can be accepted. As I said - currently it doesn't mean anything. As for the list - the client re-registration issue can be added. 
I can't think of anything else right now, I will let you know when I have something to add there. Yael -----Original Message----- From: Hal Rosenstock [mailto:halr at voltaire.com] Sent: Wednesday, December 07, 2005 2:15 PM To: Yael Kalka Cc: openib-general at openib.org Subject: RE: [PATCH] OpenSM: SA SMInfoRecord should support GetTable as well as Get method Hi Yael, On Wed, 2005-12-07 at 06:59, Yael Kalka wrote: > Hi Hal, > If you look at the code - currently the What is returned is a record > with the > local SMInfo record. > The code should be fixed to return a table, or a requested SMInfo record > - > not only of the local port. > So currently - a table is not returned, and the code isn't correct with > or > without the patch... > This issue should be added to our to-do list. Are you saying that because of that the GetTable should not be accepted until this is fixed or is it a separate issue ? There is more to do here as you point out and I will track this on the TODO list. Do you have other things for this list (see management/osm/doc/todo) ? -- Hal > Yael > > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Tuesday, December 06, 2005 4:06 PM > To: Yael Kalka > Cc: openib-general at openib.org > Subject: [PATCH] OpenSM: SA SMInfoRecord should support GetTable as well > as Get method > > > OpenSM: SA SMInfoRecord should support GetTable as well as Get method > > Signed-off-by: Hal Rosenstock > > Index: osm_sa_sminfo_record.c > =================================================================== > --- osm_sa_sminfo_record.c (revision 4323) > +++ osm_sa_sminfo_record.c (working copy) > @@ -165,7 +165,8 @@ osm_smir_rcv_process( > > CL_ASSERT( p_sa_mad->attr_id == IB_MAD_ATTR_SMINFO_RECORD ); > > - if (p_sa_mad->method != IB_MAD_METHOD_GET) > + if ( (p_sa_mad->method != IB_MAD_METHOD_GET) && > + (p_sa_mad->method != IB_MAD_METHOD_GETTABLE) ) > { > osm_log( p_rcv->p_log, OSM_LOG_ERROR, > "osm_smir_rcv_process: ERR 2804: " From mst at mellanox.co.il Wed Dec 7 05:11:05 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Dec 2005 15:11:05 +0200 Subject: [openib-general] [PATCH] ipoib_multicast: IPOIB_FLAG_ADMIN_UP test racy Message-ID: <20051207131105.GS21035@mellanox.co.il> Hello, Roland! Here's a simple race scenario. device is up. port event triggers flush_task. ipoib_ib_dev_flush (running from the default work queue) calls ipoib_ib_dev_down. This calls ipoib_mcast_stop_thread. This flushes the ipoib workqueue. mcast_task runs on ipoib workqueue, since IPOIB_FLAG_ADMIN_UP is set, this re-starts the mcast task. As a result mcast_task may be running while mcast_stop_thread is scanning the multicast_list, or after that. --- Fix race condition where mcast_task may be running after ipoib_mcast_stop_thread has flushed the workqueue. Signed-off-by: Michael S. 
Tsirkin

Index: linux-2.6.14/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===================================================================
--- linux-2.6.14/drivers/infiniband/ulp/ipoib/ipoib_multicast.c (revision 4042)
+++ linux-2.6.14/drivers/infiniband/ulp/ipoib/ipoib_multicast.c (working copy)
@@ -904,7 +904,7 @@
 		ipoib_mcast_free(mcast);
 	}
 
-	if (test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
+	if (test_bit(IPOIB_FLAG_OPER_UP, &priv->flags))
 		ipoib_mcast_start_thread(dev);
 }

-- MST

From Arkady.Kanevsky at netapp.com Wed Dec 7 05:11:53 2005
From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady)
Date: Wed, 7 Dec 2005 08:11:53 -0500
Subject: [openib-general] FW: [swg] 12/6 meeting minutes (2nd half)
Message-ID: 

SWG have approved the IP address proposal (v5).

Arkady Kanevsky                       email: arkady at netapp.com
Network Appliance Inc.                phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.          Fax: 781-895-1195
Waltham, MA 02451                     central phone: 781-768-5300

> -----Original Message-----
> From: Mike Ko [mailto:mako at almaden.ibm.com]
> Sent: Tuesday, December 06, 2005 6:41 PM
> To: swg at infinibandta.org
> Subject: [swg] 12/6 meeting minutes (2nd half)
>
> We had a brief discussion on the revised slide deck from
> Arkady on the RDMA-Aware SID and CM REQ Message Extension and
> there were no disagreements on the direction.
>
> Arkady Kanevsky from NetApp made the following motion:
> "Create a new Annex for RDMA aware ULPs that includes:
> a. port mapping between IETF protocols ports and IB SIDs
> b. CM REQ message private data format extensions
> c. CM usage for RDMA aware ULPs"
>
> Ted Kim from Sun seconded the motion.
>
> Vote count:
> Against: 0
> Abstain: 0
>
> Motion passed.
>
> We continued with a discussion on the slide deck from Mike Ko
> on supporting iSER on InfiniBand. There were disagreements
> on the merits on the need for Connection Preference bits. We
> decided to move forward with the rest of the suggestions from
> Mike and postpone the decision on the CP bits until the next meeting.
>
> Mike Ko from IBM made the following motion:
> "Create a new annex to support iSER on InfiniBand release 1.1
> and 1.2 as represented in Mike Ko's slidedeck dated December
> 1 but not including the support for Connection Preference
> bits, and also making ARI a must requirement for CM REJ."
>
> Yaron Haviv from Voltaire seconded the motion.
>
> Vote count:
> Against: 0
> Abstain: 0
>
> Motion passed.
>
> The meeting was adjourned after the vote.
>
> Mike
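To make the port-mapping item concrete: following the formula Michael posted in the SDP/CMA thread, a Service ID for an RDMA-aware ULP could be packed as below. This is only an illustration of the proposal; IB_OPENIB_OUI and the exact field positions are taken from that thread, not from annex text:

	/* Illustration only: pack an IETF protocol number and a TCP/UDP
	 * port into a Service ID. The OUI sits in the top bits, the IP
	 * protocol number in bits 16-23, and the 16-bit port (host order
	 * here) in the low bits.
	 */
	static __be64 rdma_ulp_sid(u8 ip_proto, u16 port)
	{
		return cpu_to_be64(((u64) IB_OPENIB_OUI << 48) |
				   ((u64) ip_proto << 16) |
				   port);
	}

Packed this way, two different ULPs listening on the same 16-bit port number still resolve to distinct SIDs, which is the demultiplexing property discussed in that thread.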

From mst at mellanox.co.il Wed Dec 7 07:43:48 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Dec 2005 17:43:48 +0200
Subject: [openib-general] Re: mthca_qp patch
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E3B8D6CD@mtlexch01.mtl.com>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E3B8D6CD@mtlexch01.mtl.com>
Message-ID: <20051207154348.GZ21035@mellanox.co.il>

Several fixes in mthca:
1. Add limit checking on rd_atomic and dest_rd_atomic attributes: especially for max_dest_rd_atomic, a value that is larger than the HCA capability can cause RDB overflow and corruption of another QP.
2. Fix typo in rd_atomic calculation: ffs(x) - 1 does not find the next power of 2, fls(x - 1) does (e.g. for x = 5, ffs(5) - 1 gives 0, while fls(5 - 1) gives 3, i.e. 2^3 = 8).
3. Only change the driver's copy of the QP attributes in modify QP after checking the modify QP command completed successfully.

Signed-off-by: Jack Morgenstein
Signed-off-by: Michael S. Tsirkin

Index: linux-kernel/drivers/infiniband/hw/mthca/mthca_qp.c
===================================================================
--- linux-kernel.orig/drivers/infiniband/hw/mthca/mthca_qp.c
+++ linux-kernel/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -589,6 +589,20 @@ int mthca_modify_qp(struct ib_qp *ibqp,
 		return -EINVAL;
 	}
 
+	if (attr_mask & IB_QP_MAX_QP_RD_ATOMIC &&
+	    attr->max_rd_atomic > dev->limits.max_qp_init_rdma) {
+		mthca_dbg(dev, "Max rdma_atomic as initiator (%u) too large. max is %d\n",
+			  attr->max_rd_atomic, dev->limits.max_qp_init_rdma);
+		return -EINVAL;
+	}
+
+	if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC &&
+	    attr->max_dest_rd_atomic > 1 << dev->qp_table.rdb_shift) {
+		mthca_dbg(dev, "Max rdma_atomic as responder(%u) too large. max is %d\n",
+			  attr->max_dest_rd_atomic, 1 << dev->qp_table.rdb_shift);
+		return -EINVAL;
+	}
+
 	mailbox = mthca_alloc_mailbox(dev, GFP_KERNEL);
 	if (IS_ERR(mailbox))
 		return PTR_ERR(mailbox);
@@ -712,9 +726,9 @@ int mthca_modify_qp(struct ib_qp *ibqp,
 	}
 
 	if (attr_mask & IB_QP_MAX_QP_RD_ATOMIC) {
-		qp_context->params1 |= cpu_to_be32(min(attr->max_rd_atomic ?
-						       ffs(attr->max_rd_atomic) - 1 : 0,
-						       7) << 21);
+		qp_context->params1 |=
+			cpu_to_be32(attr->max_rd_atomic ?
+				    fls(attr->max_rd_atomic - 1) << 21 : 0);
 		qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_SRA_MAX);
 	}
 
@@ -748,13 +762,9 @@ int mthca_modify_qp(struct ib_qp *ibqp,
 			qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RWE |
 								MTHCA_QP_OPTPAR_RRE |
 								MTHCA_QP_OPTPAR_RAE);
-
-		qp->atomic_rd_en = attr->qp_access_flags;
 	}
 
 	if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC) {
-		u8 rra_max;
-
 		if (qp->resp_depth && !attr->max_dest_rd_atomic) {
 			/*
 			 * Lowering our responder resources to zero.
@@ -782,16 +792,10 @@ int mthca_modify_qp(struct ib_qp *ibqp,
 							MTHCA_QP_OPTPAR_RAE);
 		}
 
-		for (rra_max = 0;
-		     1 << rra_max < attr->max_dest_rd_atomic &&
-			     rra_max < dev->qp_table.rdb_shift;
-		     ++rra_max)
-			; /* nothing */
-
-		qp_context->params2 |= cpu_to_be32(rra_max << 21);
+		qp_context->params2 |=
+			cpu_to_be32(attr->max_dest_rd_atomic ?
+ fls(attr->max_dest_rd_atomic - 1) << 21 : 0); qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RRA_MAX); - - qp->resp_depth = attr->max_dest_rd_atomic; } qp_context->params2 |= cpu_to_be32(MTHCA_QP_BIT_RSC); @@ -833,8 +837,13 @@ int mthca_modify_qp(struct ib_qp *ibqp, err = -EINVAL; } - if (!err) + if (!err) { + if (attr_mask & IB_QP_ACCESS_FLAGS) + qp->atomic_rd_en = attr->qp_access_flags; + if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC) + qp->resp_depth = attr->max_dest_rd_atomic; qp->state = new_state; + } mthca_free_mailbox(dev, mailbox); From mst at mellanox.co.il Wed Dec 7 08:44:33 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Dec 2005 18:44:33 +0200 Subject: [openib-general] [PATCH] ipoib_multicast/ipoib_mcast_send race In-Reply-To: <20051207131105.GS21035@mellanox.co.il> References: <20051207131105.GS21035@mellanox.co.il> Message-ID: <20051207164433.GA21035@mellanox.co.il> Hello, Roland! Here's another race scenario. --- Fix the following race scenario: device is up. port event or set mcast list triggers ipoib_mcast_stop_thread, This cancels the query and waits on mcast "done" completion. completion is called and "done" is set. Meanwhile, ipoib_mcast_send arrives and starts a new query, re-initializing "done". Signed-off-by: Michael S. Tsirkin Index: openib/drivers/infiniband/ulp/ipoib/ipoib.h =================================================================== --- openib.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2005-12-01 14:53:08.000000000 +0200 +++ openib/drivers/infiniband/ulp/ipoib/ipoib.h 2005-12-07 18:23:55.000000000 +0200 @@ -78,6 +78,7 @@ enum { IPOIB_FLAG_SUBINTERFACE = 4, IPOIB_MCAST_RUN = 5, IPOIB_STOP_REAPER = 6, + IPOIB_MCAST_STARTED = 7, IPOIB_MAX_BACKOFF_SECONDS = 16, Index: openib/drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- openib.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2005-12-07 18:22:12.000000000 +0200 +++ openib/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2005-12-07 18:29:49.000000000 +0200 @@ -582,6 +582,10 @@ int ipoib_mcast_start_thread(struct net_ queue_work(ipoib_workqueue, &priv->mcast_task); up(&mcast_mutex); + spin_lock_irq(&priv->lock); + set_bit(IPOIB_MCAST_STARTED, &priv->flags); + spin_unlock_irq(&priv->lock); + return 0; } @@ -592,6 +596,10 @@ int ipoib_mcast_stop_thread(struct net_d ipoib_dbg_mcast(priv, "stopping multicast thread\n"); + spin_lock_irq(&priv->lock); + clear_bit(IPOIB_MCAST_STARTED, &priv->flags); + spin_unlock_irq(&priv->lock); + down(&mcast_mutex); clear_bit(IPOIB_MCAST_RUN, &priv->flags); cancel_delayed_work(&priv->mcast_task); @@ -674,6 +682,9 @@ void ipoib_mcast_send(struct net_device */ spin_lock(&priv->lock); + if (!test_bit(IPOIB_MCAST_STARTED, &priv->flags)) + goto unlock; + mcast = __ipoib_mcast_find(dev, mgid); if (!mcast) { /* Let's create a new send only group now */ @@ -732,6 +743,7 @@ out: ipoib_send(dev, skb, mcast->ah, IB_MULTICAST_QPN); } +unlock: spin_unlock(&priv->lock); } -- MST From rdreier at cisco.com Wed Dec 7 10:38:53 2005 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 07 Dec 2005 10:38:53 -0800 Subject: [openib-general] [PATCH] ipoib_multicast: IPOIB_FLAG_ADMIN_UP test racy In-Reply-To: <20051207131105.GS21035@mellanox.co.il> (Michael S. 
Tsirkin's message of "Wed, 7 Dec 2005 15:11:05 +0200") References: <20051207131105.GS21035@mellanox.co.il> Message-ID: Thanks -- I'm just now digging myself out of the backlog caused by being offline while we moved offices but I will review and apply these patches ASAP. - R. From iod00d at hp.com Wed Dec 7 11:41:37 2005 From: iod00d at hp.com (Grant Grundler) Date: Wed, 7 Dec 2005 11:41:37 -0800 Subject: [openib-general] Re: Flash sector size? eh? In-Reply-To: <20051207093120.GN21035@mellanox.co.il> References: <20051206210025.GJ21980@esmail.cup.hp.com> <20051207093120.GN21035@mellanox.co.il> Message-ID: <20051207194137.GB26945@esmail.cup.hp.com> On Wed, Dec 07, 2005 at 11:31:20AM +0200, Michael S. Tsirkin wrote: > You need to query the Board ID on card and in image: > ./mstflint/mstflint -d q > ./mstflint/mstflint -i q Ah...another issue: # ./mstflint/mstflint -d /proc/bus/pci/0084\:05/00.0 q *** ERROR *** Can't open /proc/bus/pci/0084:05/00.0: Can not obtain Flash semaphore (63). You can use -clear_semaphore to force semaphore unlock. See help for details. I expect that's due to the previous failure not cleaning up behind itself. After clearing the flash semaphore: # ./mstflint/mstflint -d /proc/bus/pci/0084\:05/00.0 q Image type: FailSafe Chip rev.: A0 GUID Des: Node Port1 Port2 Sys image GUIDs: 001321ffff757800 001321ffff757801 001321ffff757802 001321ffff757803 Board ID: 76­ # ./mstflint/mstflint -i /root/fw-25208-4_7_400-MHGA28-1T.bin q Image type: FailSafe Chip rev.: A0 GUID Des: Node Port1 Port2 Sys image GUIDs: 0002c9000100d050 0002c9000100d051 0002c9000100d052 0002c9000100d050 Board ID: V_ym (MT_0200000001) I don't know what to make of the "76-" for board ID. Could this be a prototype board with some HP generated firmware? Is "Board ID" the only way to tell which vendor provided an HCA? I normally expect Subsystem ID to tell me that but have the impression (after looking at several PCI-X HCAs I have installed) that I can't trust that in this case. :( thanks, grant From ftillier at silverstorm.com Wed Dec 7 11:50:18 2005 From: ftillier at silverstorm.com (Fabian Tillier) Date: Wed, 7 Dec 2005 11:50:18 -0800 Subject: [openib-general] Re: Flash sector size? eh? In-Reply-To: <20051207194137.GB26945@esmail.cup.hp.com> References: <20051206210025.GJ21980@esmail.cup.hp.com> <20051207093120.GN21035@mellanox.co.il> <20051207194137.GB26945@esmail.cup.hp.com> Message-ID: <79ae2f320512071150p600f9854o3553340c6b385c97@mail.gmail.com> On 12/7/05, Grant Grundler wrote: > I don't know what to make of the "76-" for board ID. > Could this be a prototype board with some HP generated firmware? > > Is "Board ID" the only way to tell which vendor provided an HCA? > I normally expect Subsystem ID to tell me that but have the impression > (after looking at several PCI-X HCAs I have installed) that I can't > trust that in this case. :( The GUIDs have the OUI of the vendor in the first 3 bytes. 0002c9 is Mellanox, and 001321 is HP. So your card looks like it has HP FW. However, that in itself shouldn't prevent the FW from loading, so I'm not of much use to you. I'll let Michael chime in on that part. - Fab From mst at mellanox.co.il Wed Dec 7 13:25:04 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Dec 2005 23:25:04 +0200 Subject: [openib-general] Re: Flash sector size? eh? In-Reply-To: <20051207194137.GB26945@esmail.cup.hp.com> References: <20051207194137.GB26945@esmail.cup.hp.com> Message-ID: <20051207212504.GE1404@mellanox.co.il> Quoting r. 
Grant Grundler : > Subject: Re: Flash sector size? eh? > > On Wed, Dec 07, 2005 at 11:31:20AM +0200, Michael S. Tsirkin wrote: > > You need to query the Board ID on card and in image: > > ./mstflint/mstflint -d q > > ./mstflint/mstflint -i q > > Ah...another issue: > # ./mstflint/mstflint -d /proc/bus/pci/0084\:05/00.0 q > *** ERROR *** Can't open /proc/bus/pci/0084:05/00.0: Can not obtain Flash semaphore (63). You can use -clear_semaphore to force semaphore unlock. See help for details. > > I expect that's due to the previous failure not cleaning up behind itself. > After clearing the flash semaphore: > > # ./mstflint/mstflint -d /proc/bus/pci/0084\:05/00.0 q > Image type: FailSafe > Chip rev.: A0 > GUID Des: Node Port1 Port2 Sys image > GUIDs: 001321ffff757800 001321ffff757801 001321ffff757802 001321ffff757803 > Board ID: 76­ > > # ./mstflint/mstflint -i /root/fw-25208-4_7_400-MHGA28-1T.bin q > Image type: FailSafe > Chip rev.: A0 > GUID Des: Node Port1 Port2 Sys image > GUIDs: 0002c9000100d050 0002c9000100d051 0002c9000100d052 0002c9000100d050 > Board ID: V_ym (MT_0200000001) > > I don't know what to make of the "76-" for board ID. > Could this be a prototype board with some HP generated firmware? Looks like it is, and looks like HP has its own way to format board IDs, different from both mellanox and topspin boards. Could you find out about this? > Is "Board ID" the only way to tell which vendor provided an HCA? > I normally expect Subsystem ID to tell me that but have the impression > (after looking at several PCI-X HCAs I have installed) that I can't > trust that in this case. :( Mellanox FW lets you set the Subsystem ID, I wouldnt know whether vendors actually do this. -- MST From mst at mellanox.co.il Wed Dec 7 13:27:41 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Dec 2005 23:27:41 +0200 Subject: [openib-general] Re: Flash sector size? eh? In-Reply-To: <20051207194137.GB26945@esmail.cup.hp.com> References: <20051207194137.GB26945@esmail.cup.hp.com> Message-ID: <20051207212741.GF1404@mellanox.co.il> Quoting r. Grant Grundler : > # ./mstflint/mstflint -d /proc/bus/pci/0084\:05/00.0 q > Image type: FailSafe > Chip rev.: A0 > GUID Des: Node Port1 Port2 Sys image > GUIDs: 001321ffff757800 001321ffff757801 001321ffff757802 001321ffff757803 > Board ID: 76­ > > # ./mstflint/mstflint -i /root/fw-25208-4_7_400-MHGA28-1T.bin q > Image type: FailSafe > Chip rev.: A0 > GUID Des: Node Port1 Port2 Sys image > GUIDs: 0002c9000100d050 0002c9000100d051 0002c9000100d052 0002c9000100d050 > Board ID: V_ym (MT_0200000001) > > I don't know what to make of the "76-" for board ID. > Could this be a prototype board with some HP generated firmware? > > Is "Board ID" the only way to tell which vendor provided an HCA? Actually, quite a lot of data is accessible in the PCI VPD records. I'm not sure what, need to look the format up in the spec ... -- MST From ralphc at pathscale.com Wed Dec 7 13:28:04 2005 From: ralphc at pathscale.com (Ralph Campbell) Date: Wed, 07 Dec 2005 13:28:04 -0800 Subject: [openib-general] Async events all lumped together? Message-ID: <1133990884.12986.37.camel@brick.internal.keyresearch.com> I was wondering why the mthca driver generates a single IB_EVENT_LID_CHANGE event for all the changes (if any) contained in a SubnSet(Portinfo) MAD. Also, the various OpenIB core agents seem to respond to more event types than is strictly necessary: ib_sa_event() looks like it should only need to respond to IB_EVENT_SM_CHANGE. 
ib_cache_event() looks like it should only need to respond to IB_EVENT_PKEY_CHANGE or IB_EVENT_GID_CHANGE if there was one. -- Ralph Campbell From iod00d at hp.com Wed Dec 7 13:52:00 2005 From: iod00d at hp.com (Grant Grundler) Date: Wed, 7 Dec 2005 13:52:00 -0800 Subject: [openib-general] Re: Flash sector size? eh? In-Reply-To: <20051207212504.GE1404@mellanox.co.il> References: <20051207194137.GB26945@esmail.cup.hp.com> <20051207212504.GE1404@mellanox.co.il> Message-ID: <20051207215200.GH26945@esmail.cup.hp.com> On Wed, Dec 07, 2005 at 11:25:04PM +0200, Michael S. Tsirkin wrote: > Looks like it is, and looks like HP has its own way to format board IDs, > different from both mellanox and topspin boards. > Could you find out about this? Yes, I should be able to. In anycase, I want to upgrade the HCA firmware to 4.7.0 before posting issues with opensm. > Mellanox FW lets you set the Subsystem ID, > I wouldnt know whether vendors actually do this. I'll raise the issue inside HP and see who bites. thanks, grant From robert.j.woodruff at intel.com Wed Dec 7 15:59:00 2005 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Wed, 7 Dec 2005 15:59:00 -0800 Subject: [openib-general] Announce: Updated packages available In-Reply-To: <438FB65E.50406@redhat.com> Message-ID: Doug Ledford wrote, >I've added to the list of available packages. In addition to >libibverbs, libmthca, libsdp, and opensm, we now have udapl compiled. >We also have an update initscripts package for RHEL-4 that enables >static IP setups on ipoib interfaces and works at boot time. In >addition, all the user space tools have been revved up to svn rev 4265. > The kernel has not been recompiled since the last one and is still at >3965. I hope to get an updated kernel sometime tomorrow. Hi Doug, I loaded your latest code onto a couple of X86_64 boxes and was successful at running MPI over the uDAPL from your RPM. The only problem I ran into was that I had to use my own libdat.so. Are you also planning on installing the libdat.so along with the libdapl.so for InfiniBand ? woody From caitlinb at broadcom.com Wed Dec 7 16:09:33 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Wed, 7 Dec 2005 16:09:33 -0800 Subject: [openib-general] Announce: Updated packages available Message-ID: <54AD0F12E08D1541B826BE97C98F99F10C2B6B@NT-SJCA-0751.brcm.ad.broadcom.com> openib-general-bounces at openib.org wrote: > Doug Ledford wrote, >> I've added to the list of available packages. In addition to >> libibverbs, libmthca, libsdp, and opensm, we now have udapl compiled. >> We also have an update initscripts package for RHEL-4 that enables >> static IP setups on ipoib interfaces and works at boot time. In >> addition, all the user space tools have been revved up to svn rev >> 4265. The kernel has not been recompiled since the last one and is >> still at 3965. I hope to get an updated kernel sometime tomorrow. > > Hi Doug, > > I loaded your latest code onto a couple of X86_64 boxes and > was successful at running MPI over the uDAPL from your RPM. > The only problem I ran into was that I had to use my own > libdat.so. Are you also planning on installing the libdat.so along > with the libdapl.so for InfiniBand ? > > woody > In the true spirit of both RPM and DAPL, libdat should probably be its own distinct package. 
From robert.j.woodruff at intel.com Wed Dec 7 16:14:29 2005 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Wed, 7 Dec 2005 16:14:29 -0800 Subject: [openib-general] Announce: Updated packages available In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F10C2B6B@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: Catlin wrote, >In the true spirit of both RPM and DAPL, libdat should probably >be its own distinct package. Makes sense. woody From dledford at redhat.com Wed Dec 7 16:40:54 2005 From: dledford at redhat.com (Doug Ledford) Date: Wed, 07 Dec 2005 19:40:54 -0500 Subject: [openib-general] Announce: Updated packages available In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F10C2B6B@NT-SJCA-0751.brcm.ad.broadcom.com> References: <54AD0F12E08D1541B826BE97C98F99F10C2B6B@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <43978116.5050301@redhat.com> Caitlin Bestler wrote: > openib-general-bounces at openib.org wrote: > >>Doug Ledford wrote, >> >>>I've added to the list of available packages. In addition to >>>libibverbs, libmthca, libsdp, and opensm, we now have udapl compiled. >>>We also have an update initscripts package for RHEL-4 that enables >>>static IP setups on ipoib interfaces and works at boot time. In >>>addition, all the user space tools have been revved up to svn rev >>> 4265. The kernel has not been recompiled since the last one and is >>>still at 3965. I hope to get an updated kernel sometime tomorrow. >> >>Hi Doug, >> >>I loaded your latest code onto a couple of X86_64 boxes and >>was successful at running MPI over the uDAPL from your RPM. >>The only problem I ran into was that I had to use my own >>libdat.so. Are you also planning on installing the libdat.so along >>with the libdapl.so for InfiniBand ? >> >>woody >> > > In the true spirit of both RPM and DAPL, libdat should probably > be its own distinct package. > For the future, I more or less had that in mind, but I didn't do that this go around. -- Doug Ledford http://people.redhat.com/dledford From robert.j.woodruff at intel.com Wed Dec 7 16:59:01 2005 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Wed, 7 Dec 2005 16:59:01 -0800 Subject: [openib-general] Announce: Updated packages available Message-ID: <1AC79F16F5C5284499BB9591B33D6F00064EF5ED@orsmsx408> Bob Woodruff wrote, >I loaded your latest code onto a couple of X86_64 boxes and >was successful at running MPI over the uDAPL from your >RPM. The only problem I ran into was that I had to use my own >libdat.so. Are you also planning on installing the libdat.so along >with the libdapl.so for InfiniBand ? >woody Never mind, my bad. This turned out to be my problem when converting my systems from the SVN installed S/W to the new Redhat RPMS. woody From ogerlitz at voltaire.com Wed Dec 7 23:17:15 2005 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 08 Dec 2005 09:17:15 +0200 Subject: [openib-general] assumptions on page mapping (was High memory) In-Reply-To: <52acg25zr6.fsf@cisco.com> References: <437C1622.5070505@cse.ohio-state.edu> <52acg25zr6.fsf@cisco.com> Message-ID: <4397DDFB.2060102@voltaire.com> Roland Dreier wrote: > The right way to use the MR from get_dma_mr() is to use "bus > addresses" from the DMA mapping API. For highmem, the right way to > get those addresses is with dma_map_sg() or dma_map_page(). Looking on the kernel x86_64 code, both dma_map_sg and dma_map_page seem to assume that the page is already mapped, since they call page_address(page). 
Specifically is it safe in a SCSI LLD (eg SRP and iSER which is among other things such) to call dma_map_sg on a SG which comes with a SCSI command, so the SCSI Mid-Layer always makes sure the pages are mapped?

Or.
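A minimal sketch of the pattern being asked about, assuming 2.6-era SCSI LLD conventions (cmd->request_buffer holding the scatterlist and cmd->use_sg its entry count); the helper name example_map_cmd_sg and the HCA-registration comment are illustrative only, not code from SRP or iSER, while dma_map_sg(), sg_dma_address() and sg_dma_len() are the standard DMA mapping API:

    #include <linux/dma-mapping.h>
    #include <scsi/scsi_cmnd.h>

    /* Hypothetical helper: map the SG list the SCSI mid-layer hands down
     * with a command. Nothing here needs page_address() or kmap(); after
     * dma_map_sg() only bus addresses are used. */
    static int example_map_cmd_sg(struct device *dev, struct scsi_cmnd *cmd)
    {
            struct scatterlist *sg = cmd->request_buffer;
            unsigned int total = 0;
            int i, nents;

            /* dma_map_sg() returns the number of DMA segments (an IOMMU
             * may coalesce entries) or 0 on failure. */
            nents = dma_map_sg(dev, sg, cmd->use_sg, cmd->sc_data_direction);
            if (!nents)
                    return -ENOMEM;

            for (i = 0; i < nents; i++) {
                    /* sg_dma_address(&sg[i]) is what would be handed to
                     * the HCA when building the memory registration. */
                    total += sg_dma_len(&sg[i]);
            }

            return total;   /* bytes covered by the mapping */
    }

Whether the premise holds is exactly what Christoph answers further down: the mapping is legal, and the mid-layer never guarantees a kernel virtual mapping for the pages.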
From yael at mellanox.co.il Thu Dec 8 02:39:30 2005 From: yael at mellanox.co.il (Yael Kalka) Date: 08 Dec 2005 12:39:30 +0200 Subject: [openib-general] [PATCH] Opensm - fix osm_vendor_get_all_port_attr Message-ID: <5z8xuvhnz1.fsf@mtl066.yok.mtl.com> Hi Hal, If osm_vendor_get_all_port_attr is called before the osm_vendor_bind, then the sm_lid of the default port isn't updated correctly. This patch fixes it. Thanks, Yael Signed-off-by: Yael Kalka Index: libvendor/osm_vendor_ibumad.c =================================================================== --- libvendor/osm_vendor_ibumad.c (revision 4345) +++ libvendor/osm_vendor_ibumad.c (working copy) @@ -577,6 +577,7 @@ osm_vendor_get_all_port_attr( int *p_linkstates = linkstates; umad_port_t def_port = {""}; int r, i, j; + int sm_lid = 0; OSM_LOG_ENTER( p_vend->p_log, osm_vendor_get_all_port_attr ); @@ -636,6 +637,8 @@ osm_vendor_get_all_port_attr( def_port.ca_name, def_port.portnum, cl_hton64(def_port.port_guid)); + sm_lid = def_port.sm_lid; + umad_release_port(&def_port); } @@ -644,6 +647,9 @@ osm_vendor_get_all_port_attr( for (i = 0; i < *p_num_ports; i++) { p_attr_array[i].port_guid = portguids[i]; p_attr_array[i].lid = lids[i]; + if (i == 0) + p_attr_array[i].sm_lid = sm_lid; + else p_attr_array[i].sm_lid = p_vend->umad_port.sm_lid; p_attr_array[i].link_state = linkstates[i]; }
@@ -842,10 +847,14 @@ __osm_pr_rcv_get_port_pair_paths( preference = 0; path_num = 0; - if( comp_mask & IB_PR_COMPMASK_NUMBPATH ) - iterations = p_pr->num_path & 0x7F; + /* If SubnAdmGet, assume NumbPaths 1 (1.2 erratum) */ + if (p_sa_mad->method != IB_MAD_METHOD_GET) + if( comp_mask & IB_PR_COMPMASK_NUMBPATH ) + iterations = p_pr->num_path & 0x7F; + else + iterations = (uintn_t)(-1); else - iterations = (uintn_t)(-1); + iterations = 1; while( path_num < iterations ) { @@ -1101,7 +1110,7 @@ __osm_pr_rcv_get_end_points( static void __osm_pr_rcv_process_world( IN osm_pr_rcv_t* const p_rcv, - IN const ib_path_rec_t* const p_pr, + IN const osm_madw_t* const p_madw, IN const osm_port_t* const requestor_port, IN const ib_net64_t comp_mask, IN cl_qlist_t* const p_list ) @@ -1128,7 +1137,7 @@ __osm_pr_rcv_process_world( p_src_port = (osm_port_t*)cl_qmap_head( p_tbl ); while( p_src_port != (osm_port_t*)cl_qmap_end( p_tbl ) ) { - __osm_pr_rcv_get_port_pair_paths( p_rcv, p_pr, requestor_port, p_src_port, + __osm_pr_rcv_get_port_pair_paths( p_rcv, p_madw, requestor_port, p_src_port, p_dest_port, comp_mask, p_list ); p_src_port = (osm_port_t*)cl_qmap_next( &p_src_port->map_item ); @@ -1145,7 +1154,7 @@ __osm_pr_rcv_process_world( static void __osm_pr_rcv_process_half( IN osm_pr_rcv_t* const p_rcv, - IN const ib_path_rec_t* const p_pr, + IN const osm_madw_t* const p_madw, IN const osm_port_t* const requestor_port, IN const osm_port_t* const p_src_port, IN const osm_port_t* const p_dest_port, @@ -1172,7 +1181,7 @@ __osm_pr_rcv_process_half( p_port = (osm_port_t*)cl_qmap_head( p_tbl ); while( p_port != (osm_port_t*)cl_qmap_end( p_tbl ) ) { - __osm_pr_rcv_get_port_pair_paths( p_rcv, p_pr, requestor_port, p_src_port, + __osm_pr_rcv_get_port_pair_paths( p_rcv, p_madw , requestor_port, p_src_port, p_port, comp_mask, p_list ); p_port = (osm_port_t*)cl_qmap_next( &p_port->map_item ); } @@ -1185,7 +1194,7 @@ __osm_pr_rcv_process_half( p_port = (osm_port_t*)cl_qmap_head( p_tbl ); while( p_port != (osm_port_t*)cl_qmap_end( p_tbl ) ) { - __osm_pr_rcv_get_port_pair_paths( p_rcv, p_pr, requestor_port, p_port, + __osm_pr_rcv_get_port_pair_paths( p_rcv, p_madw, requestor_port, p_port, p_dest_port, comp_mask, p_list ); p_port = (osm_port_t*)cl_qmap_next( &p_port->map_item ); } @@ -1199,7 +1208,7 @@ __osm_pr_rcv_process_half( static void __osm_pr_rcv_process_pair( IN osm_pr_rcv_t* const p_rcv, - IN const ib_path_rec_t* const p_pr, + IN const osm_madw_t* const p_madw, IN const osm_port_t* const requestor_port, IN const osm_port_t* const p_src_port, IN const osm_port_t* const p_dest_port, @@ -1208,7 +1217,7 @@ __osm_pr_rcv_process_pair( { OSM_LOG_ENTER( p_rcv->p_log, __osm_pr_rcv_process_pair ); - __osm_pr_rcv_get_port_pair_paths( p_rcv, p_pr, requestor_port, p_src_port, + __osm_pr_rcv_get_port_pair_paths( p_rcv, p_madw, requestor_port, p_src_port, p_dest_port, comp_mask, p_list ); OSM_LOG_EXIT( p_rcv->p_log ); @@ -1413,7 +1422,8 @@ __osm_pr_match_mgrp_attributes( goto Exit; } - if( comp_mask & IB_PR_COMPMASK_NUMBPATH ) + /* If SubnAdmGet, assume NumbPaths of 1 (1.2 erratum) */ + if( ( comp_mask & IB_PR_COMPMASK_NUMBPATH ) && ( p_sa_mad->method != IB_MAD_METHOD_GET ) ) { if( ( p_pr->num_path & 0x7f ) == 0 ) goto Exit; @@ -1513,7 +1523,7 @@ __osm_pr_rcv_respond( /* * C15-0.1.30: - * If we do a SubAdmGet and got more than one record it is an error ! + * If we do a SubnAdmGet and got more than one record it is an error ! 
*/ if ( (p_rcvd_mad->method == IB_MAD_METHOD_GET) && (num_rec > 1)) { @@ -1720,22 +1730,22 @@ osm_pr_rcv_process( if( p_src_port ) { if( p_dest_port ) - __osm_pr_rcv_process_pair( p_rcv, p_pr, requestor_port, p_src_port, p_dest_port, + __osm_pr_rcv_process_pair( p_rcv, p_madw, requestor_port, p_src_port, p_dest_port, p_sa_mad->comp_mask, &pr_list ); else - __osm_pr_rcv_process_half( p_rcv, p_pr, requestor_port, p_src_port, NULL, + __osm_pr_rcv_process_half( p_rcv, p_madw, requestor_port, p_src_port, NULL, p_sa_mad->comp_mask, &pr_list ); } else { if( p_dest_port ) - __osm_pr_rcv_process_half( p_rcv, p_pr, requestor_port, NULL, p_dest_port, + __osm_pr_rcv_process_half( p_rcv, p_madw, requestor_port, NULL, p_dest_port, p_sa_mad->comp_mask, &pr_list ); else /* Katie, bar the door! */ - __osm_pr_rcv_process_world( p_rcv, p_pr, requestor_port, + __osm_pr_rcv_process_world( p_rcv, p_madw, requestor_port, p_sa_mad->comp_mask, &pr_list ); } goto Unlock; From mst at mellanox.co.il Thu Dec 8 05:51:16 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 8 Dec 2005 15:51:16 +0200 Subject: [openib-general] [PATCH] core: fix user_mad memory leaks on timeout Message-ID: <20051208135116.GL21035@mellanox.co.il> Dont leak packet if it had a timeout. Dont leak timeout mad if queue_packet fails. Signed-off-by: Jack Morgenstein Signed-off-by: Michael S. Tsirkin Index: openib/drivers/infiniband/core/user_mad.c =================================================================== --- openib.orig/drivers/infiniband/core/user_mad.c 2005-12-08 15:40:41.000000000 +0200 +++ openib/drivers/infiniband/core/user_mad.c 2005-12-08 15:40:28.000000000 +0200 @@ -197,8 +197,8 @@ static void send_handler(struct ib_mad_a memcpy(timeout->mad.data, packet->mad.data, sizeof (struct ib_mad_hdr)); - if (!queue_packet(file, agent, timeout)) - return; + if (queue_packet(file, agent, timeout)) + kfree(timeout); } out: kfree(packet); -- MST From mst at mellanox.co.il Thu Dec 8 05:55:43 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 8 Dec 2005 15:55:43 +0200 Subject: [openib-general] [PATCH rebase] large rmpp support Message-ID: <20051208135543.GM21035@mellanox.co.il> Hi! I am still looking at addressing Sean's comments. Meanwhile, for all adventurous testers out there, here's a revision of the previous large rmpp patch that applies on top of the memory leak fix that I've just posted. For review only. Signed-off-by: Jack Morgenstein Signed-off-by: Michael S. 
Tsirkin Index: openib/drivers/infiniband/core/mad_rmpp.c =================================================================== --- openib.orig/drivers/infiniband/core/mad_rmpp.c 2005-11-22 10:53:48.000000000 +0200 +++ openib/drivers/infiniband/core/mad_rmpp.c 2005-12-08 15:44:35.000000000 +0200 @@ -433,44 +433,6 @@ static struct ib_mad_recv_wc * complete_ return rmpp_wc; } -void ib_coalesce_recv_mad(struct ib_mad_recv_wc *mad_recv_wc, void *buf) -{ - struct ib_mad_recv_buf *seg_buf; - struct ib_rmpp_mad *rmpp_mad; - void *data; - int size, len, offset; - u8 flags; - - len = mad_recv_wc->mad_len; - if (len <= sizeof(struct ib_mad)) { - memcpy(buf, mad_recv_wc->recv_buf.mad, len); - return; - } - - offset = data_offset(mad_recv_wc->recv_buf.mad->mad_hdr.mgmt_class); - - list_for_each_entry(seg_buf, &mad_recv_wc->rmpp_list, list) { - rmpp_mad = (struct ib_rmpp_mad *)seg_buf->mad; - flags = ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr); - - if (flags & IB_MGMT_RMPP_FLAG_FIRST) { - data = rmpp_mad; - size = sizeof(*rmpp_mad); - } else { - data = (void *) rmpp_mad + offset; - if (flags & IB_MGMT_RMPP_FLAG_LAST) - size = len; - else - size = sizeof(*rmpp_mad) - offset; - } - - memcpy(buf, data, size); - len -= size; - buf += size; - } -} -EXPORT_SYMBOL(ib_coalesce_recv_mad); - static struct ib_mad_recv_wc * continue_rmpp(struct ib_mad_agent_private *agent, struct ib_mad_recv_wc *mad_recv_wc) @@ -570,16 +532,26 @@ start_rmpp(struct ib_mad_agent_private * return mad_recv_wc; } -static inline u64 get_seg_addr(struct ib_mad_send_wr_private *mad_send_wr) +static inline void * get_seg_addr(struct ib_mad_send_wr_private *mad_send_wr) { - return mad_send_wr->sg_list[0].addr + mad_send_wr->data_offset + - (sizeof(struct ib_rmpp_mad) - mad_send_wr->data_offset) * - (mad_send_wr->seg_num - 1); + struct ib_mad_multipacket_seg *seg; + int i = 2; + + if (list_empty(&mad_send_wr->multipacket_list)) + return NULL; + + list_for_each_entry(seg, &mad_send_wr->multipacket_list, list) { + if (i == mad_send_wr->seg_num) + return seg->data; + i++; + } + return NULL; } -static int send_next_seg(struct ib_mad_send_wr_private *mad_send_wr) +int send_next_seg(struct ib_mad_send_wr_private *mad_send_wr) { struct ib_rmpp_mad *rmpp_mad; + void *next_data; int timeout; u32 paylen; @@ -594,12 +566,14 @@ static int send_next_seg(struct ib_mad_s rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(paylen); mad_send_wr->sg_list[0].length = sizeof(struct ib_rmpp_mad); } else { - mad_send_wr->send_wr.num_sge = 2; - mad_send_wr->sg_list[0].length = mad_send_wr->data_offset; - mad_send_wr->sg_list[1].addr = get_seg_addr(mad_send_wr); - mad_send_wr->sg_list[1].length = sizeof(struct ib_rmpp_mad) - - mad_send_wr->data_offset; - mad_send_wr->sg_list[1].lkey = mad_send_wr->sg_list[0].lkey; + next_data = get_seg_addr(mad_send_wr); + if (!next_data) { + printk(KERN_ERR PFX "send_next_seg: " + "could not find next segment\n"); + return -EINVAL; + } + memcpy((void *)rmpp_mad + mad_send_wr->data_offset, next_data, + sizeof(struct ib_rmpp_mad) - mad_send_wr->data_offset); rmpp_mad->rmpp_hdr.paylen_newwin = 0; } Index: openib/drivers/infiniband/include/rdma/ib_mad.h =================================================================== --- openib.orig/drivers/infiniband/include/rdma/ib_mad.h 2005-11-22 12:52:31.000000000 +0200 +++ openib/drivers/infiniband/include/rdma/ib_mad.h 2005-12-08 15:44:35.000000000 +0200 @@ -141,6 +141,11 @@ struct ib_rmpp_hdr { __be32 paylen_newwin; }; +struct ib_mad_multipacket_seg { + struct list_head list; + u8 data[0]; +}; + typedef 
u64 __bitwise ib_sa_comp_mask; #define IB_SA_COMP_MASK(n) ((__force ib_sa_comp_mask) cpu_to_be64(1ull << n)) @@ -485,17 +490,6 @@ int ib_unregister_mad_agent(struct ib_ma int ib_post_send_mad(struct ib_mad_send_buf *send_buf, struct ib_mad_send_buf **bad_send_buf); -/** - * ib_coalesce_recv_mad - Coalesces received MAD data into a single buffer. - * @mad_recv_wc: Work completion information for a received MAD. - * @buf: User-provided data buffer to receive the coalesced buffers. The - * referenced buffer should be at least the size of the mad_len specified - * by @mad_recv_wc. - * - * This call copies a chain of received MAD segments into a single data buffer, - * removing duplicated headers. - */ -void ib_coalesce_recv_mad(struct ib_mad_recv_wc *mad_recv_wc, void *buf); /** * ib_free_recv_mad - Returns data buffers used to receive a MAD. @@ -601,6 +595,18 @@ struct ib_mad_send_buf * ib_create_send_ gfp_t gfp_mask); /** + * ib_append_to_multipacket_mad - Append a segment of an RMPP multipacket mad send + * to the send buffer. + * @send_buf: Previously allocated send data buffer. + * @seg: segment to append to linked list (already filled with data). + * + * This routine appends a segment of a multipacket RMPP message + * (copied from user space) to a MAD for sending. + */ +void ib_append_to_multipacket_mad(struct ib_mad_send_buf * send_buf, + struct ib_mad_multipacket_seg *seg); + +/** * ib_free_send_mad - Returns data buffers used to send a MAD. * @send_buf: Previously allocated send data buffer. */ Index: openib/drivers/infiniband/core/mad.c =================================================================== --- openib.orig/drivers/infiniband/core/mad.c 2005-11-28 09:03:21.000000000 +0200 +++ openib/drivers/infiniband/core/mad.c 2005-12-08 15:44:35.000000000 +0200 @@ -792,17 +792,13 @@ struct ib_mad_send_buf * ib_create_send_ return ERR_PTR(-EINVAL); length = sizeof *mad_send_wr + buf_size; - if (length >= PAGE_SIZE) - buf = (void *)__get_free_pages(gfp_mask, long_log2(roundup_pow_of_two(length)) - PAGE_SHIFT); - else - buf = kmalloc(length, gfp_mask); + buf = kzalloc(sizeof *mad_send_wr + sizeof(struct ib_mad), gfp_mask); if (!buf) return ERR_PTR(-ENOMEM); - memset(buf, 0, length); - - mad_send_wr = buf + buf_size; + mad_send_wr = buf + sizeof(struct ib_mad); + INIT_LIST_HEAD(&mad_send_wr->multipacket_list); mad_send_wr->send_buf.mad = buf; mad_send_wr->mad_agent_priv = mad_agent_priv; @@ -834,23 +830,33 @@ struct ib_mad_send_buf * ib_create_send_ } EXPORT_SYMBOL(ib_create_send_mad); +void ib_append_to_multipacket_mad(struct ib_mad_send_buf * send_buf, + struct ib_mad_multipacket_seg *seg) +{ + struct ib_mad_send_wr_private *mad_send_wr; + + mad_send_wr = container_of(send_buf, struct ib_mad_send_wr_private, + send_buf); + list_add_tail(&seg->list, &mad_send_wr->multipacket_list); +} +EXPORT_SYMBOL(ib_append_to_multipacket_mad); + void ib_free_send_mad(struct ib_mad_send_buf *send_buf) { struct ib_mad_agent_private *mad_agent_priv; - void *mad_send_wr; - int length; + struct ib_mad_send_wr_private *mad_send_wr; + struct ib_mad_multipacket_seg *seg, *tmp; mad_agent_priv = container_of(send_buf->mad_agent, struct ib_mad_agent_private, agent); mad_send_wr = container_of(send_buf, struct ib_mad_send_wr_private, send_buf); - length = sizeof(struct ib_mad_send_wr_private) + (mad_send_wr - send_buf->mad); - if (length >= PAGE_SIZE) - free_pages((unsigned long)send_buf->mad, long_log2(roundup_pow_of_two(length)) - PAGE_SHIFT); - else - kfree(send_buf->mad); - + list_for_each_entry_safe(seg, tmp, 
&mad_send_wr->multipacket_list, list) { + list_del(&seg->list); + kfree(seg); + } + kfree(send_buf->mad); if (atomic_dec_and_test(&mad_agent_priv->refcount)) wake_up(&mad_agent_priv->wait); } Index: openib/drivers/infiniband/core/mad_priv.h =================================================================== --- openib.orig/drivers/infiniband/core/mad_priv.h 2005-11-13 10:48:32.000000000 +0200 +++ openib/drivers/infiniband/core/mad_priv.h 2005-12-08 15:44:35.000000000 +0200 @@ -130,6 +130,7 @@ struct ib_mad_send_wr_private { enum ib_wc_status status; /* RMPP control */ + struct list_head multipacket_list; int last_ack; int seg_num; int newwin; Index: openib/drivers/infiniband/core/user_mad.c =================================================================== --- openib.orig/drivers/infiniband/core/user_mad.c 2005-12-08 15:40:28.000000000 +0200 +++ openib/drivers/infiniband/core/user_mad.c 2005-12-08 15:45:07.000000000 +0200 @@ -123,6 +123,7 @@ struct ib_umad_packet { struct ib_mad_send_buf *msg; struct list_head list; int length; + struct list_head seg_list; struct ib_user_mad mad; }; @@ -176,6 +177,87 @@ static int queue_packet(struct ib_umad_f return ret; } +static int data_offset(u8 mgmt_class) +{ + if (mgmt_class == IB_MGMT_CLASS_SUBN_ADM) + return IB_MGMT_SA_HDR; + else if ((mgmt_class >= IB_MGMT_CLASS_VENDOR_RANGE2_START) && + (mgmt_class <= IB_MGMT_CLASS_VENDOR_RANGE2_END)) + return IB_MGMT_VENDOR_HDR; + else + return IB_MGMT_RMPP_HDR; +} + +static int copy_recv_mad(struct ib_mad_recv_wc *mad_recv_wc, + struct ib_umad_packet *packet) +{ + struct ib_mad_recv_buf *seg_buf; + struct ib_rmpp_mad *rmpp_mad; + void *data; + struct ib_mad_multipacket_seg *seg; + int size, len, offset; + u8 flags; + + len = mad_recv_wc->mad_len; + if (len <= sizeof(struct ib_mad)) { + memcpy(&packet->mad.data, mad_recv_wc->recv_buf.mad, len); + return 0; + } + + offset = data_offset(mad_recv_wc->recv_buf.mad->mad_hdr.mgmt_class); + + list_for_each_entry(seg_buf, &mad_recv_wc->rmpp_list, list) { + rmpp_mad = (struct ib_rmpp_mad *)seg_buf->mad; + flags = ib_get_rmpp_flags(&rmpp_mad->rmpp_hdr); + + if (flags & IB_MGMT_RMPP_FLAG_FIRST) { + size = sizeof(*rmpp_mad); + memcpy(&packet->mad.data, rmpp_mad, size); + } else { + data = (void *) rmpp_mad + offset; + if (flags & IB_MGMT_RMPP_FLAG_LAST) + size = len; + else + size = sizeof(*rmpp_mad) - offset; + seg = kmalloc(sizeof(struct ib_mad_multipacket_seg) + + sizeof(struct ib_rmpp_mad) - offset, + GFP_KERNEL); + if (!seg) + return -ENOMEM; + memcpy(seg->data, data, size); + list_add_tail(&seg->list, &packet->seg_list); + } + len -= size; + } + return 0; +} + +static struct ib_umad_packet *alloc_packet(void) +{ + struct ib_umad_packet *packet; + int length = sizeof *packet + sizeof(struct ib_mad); + + packet = kzalloc(length, GFP_KERNEL); + if (!packet) { + printk(KERN_ERR "alloc_packet: mem alloc failed for length %d\n", + length); + return NULL; + } + INIT_LIST_HEAD(&packet->seg_list); + return packet; +} + +static void free_packet(struct ib_umad_packet *packet) +{ + struct ib_mad_multipacket_seg *seg, *tmp; + + list_for_each_entry_safe(seg, tmp, &packet->seg_list, list) { + list_del(&seg->list); + kfree(seg); + } + kfree(packet); +} + static void send_handler(struct ib_mad_agent *agent, struct ib_mad_send_wc *send_wc) { @@ -187,7 +269,7 @@ static void send_handler(struct ib_mad_a ib_free_send_mad(packet->msg); if (send_wc->status == IB_WC_RESP_TIMEOUT_ERR) { - timeout = kzalloc(sizeof *timeout + IB_MGMT_MAD_HDR, GFP_KERNEL); + timeout = alloc_packet(); if 
(!timeout) goto out; @@ -198,40 +280,12 @@ static void send_handler(struct ib_mad_a sizeof (struct ib_mad_hdr)); if (queue_packet(file, agent, timeout)) - kfree(timeout); + free_packet(timeout); } out: kfree(packet); } -static struct ib_umad_packet *alloc_packet(int buf_size) -{ - struct ib_umad_packet *packet; - int length = sizeof *packet + buf_size; - - if (length >= PAGE_SIZE) - packet = (void *)__get_free_pages(GFP_KERNEL, long_log2(roundup_pow_of_two(length)) - PAGE_SHIFT); - else - packet = kmalloc(length, GFP_KERNEL); - - if (!packet) - return NULL; - - memset(packet, 0, length); - return packet; -} - -static void free_packet(struct ib_umad_packet *packet) -{ - int length = packet->length + sizeof *packet; - if (length >= PAGE_SIZE) - free_pages((unsigned long) packet, long_log2(roundup_pow_of_two(length)) - PAGE_SHIFT); - else - kfree(packet); -} - - - static void recv_handler(struct ib_mad_agent *agent, struct ib_mad_recv_wc *mad_recv_wc) { @@ -243,13 +297,16 @@ static void recv_handler(struct ib_mad_a goto out; length = mad_recv_wc->mad_len; - packet = alloc_packet(length); + packet = alloc_packet(); if (!packet) goto out; packet->length = length; - ib_coalesce_recv_mad(mad_recv_wc, packet->mad.data); + if (copy_recv_mad(mad_recv_wc, packet)) { + free_packet(packet); + goto out; + } packet->mad.hdr.status = 0; packet->mad.hdr.length = length + sizeof (struct ib_user_mad); @@ -278,6 +335,7 @@ static ssize_t ib_umad_read(struct file size_t count, loff_t *pos) { struct ib_umad_file *file = filp->private_data; + struct ib_mad_multipacket_seg *seg; struct ib_umad_packet *packet; ssize_t ret; @@ -304,18 +362,42 @@ static ssize_t ib_umad_read(struct file spin_unlock_irq(&file->recv_lock); - if (count < packet->length + sizeof (struct ib_user_mad)) { - /* Return length needed (and first RMPP segment) if too small */ - if (copy_to_user(buf, &packet->mad, - sizeof (struct ib_user_mad) + sizeof (struct ib_mad))) - ret = -EFAULT; - else - ret = -ENOSPC; - } else if (copy_to_user(buf, &packet->mad, - packet->length + sizeof (struct ib_user_mad))) + if (copy_to_user(buf, &packet->mad, + sizeof(struct ib_user_mad) + sizeof(struct ib_mad))) { ret = -EFAULT; - else + goto err; + } + + if (count < packet->length + sizeof (struct ib_user_mad)) + /* User buffer too small. Return first RMPP segment (which + * includes RMPP message length). + */ + ret = -ENOSPC; + else if (packet->length <= sizeof(struct ib_mad)) + ret = packet->length + sizeof(struct ib_user_mad); + else { + int len = packet->length - sizeof(struct ib_mad); + struct ib_rmpp_mad *rmpp_mad = + (struct ib_rmpp_mad *) packet->mad.data; + int max_seg_payload = sizeof(struct ib_mad) - + data_offset(rmpp_mad->mad_hdr.mgmt_class); + int seg_payload; + /* multipacket RMPP MAD message. Copy remainder of message. + * Note that last segment may have a shorter payload. 
+ */ + buf += sizeof(struct ib_user_mad) + sizeof(struct ib_mad); + list_for_each_entry(seg, &packet->seg_list, list) { + seg_payload = min_t(int, len, max_seg_payload); + if (copy_to_user(buf, seg->data, seg_payload)) { + ret = -EFAULT; + goto err; + } + buf += seg_payload; + len -= seg_payload; + } ret = packet->length + sizeof (struct ib_user_mad); + } +err: if (ret < 0) { /* Requeue packet */ spin_lock_irq(&file->recv_lock); @@ -339,6 +421,8 @@ static ssize_t ib_umad_write(struct file __be64 *tid; int ret, length, hdr_len, copy_offset; int rmpp_active, has_rmpp_header; + int max_seg_payload; + struct ib_mad_multipacket_seg *seg; if (count < sizeof (struct ib_user_mad) + IB_MGMT_RMPP_HDR) return -EINVAL; @@ -415,6 +499,11 @@ static ssize_t ib_umad_write(struct file goto err_ah; } + if (!rmpp_active && length > sizeof(struct ib_mad)) { + ret = -EINVAL; + goto err_ah; + } + packet->msg = ib_create_send_mad(agent, be32_to_cpu(packet->mad.hdr.qpn), 0, rmpp_active, @@ -432,12 +521,39 @@ static ssize_t ib_umad_write(struct file /* Copy MAD headers (RMPP header in place) */ memcpy(packet->msg->mad, packet->mad.data, IB_MGMT_MAD_HDR); - /* Now, copy rest of message from user into send buffer */ + /* complete copying first 256 bytes of message into send buffer */ if (copy_from_user(packet->msg->mad + copy_offset, buf + sizeof (struct ib_user_mad) + copy_offset, - length - copy_offset)) { + min_t(int, length, sizeof(struct ib_mad)) - copy_offset)) { ret = -EFAULT; - goto err_msg; + goto err_ah; + } + + /* if multipacket, copy remainder of send message from user to multipacket list */ + length -= sizeof(struct ib_mad); + buf += sizeof (struct ib_user_mad) + sizeof(struct ib_mad); + max_seg_payload = sizeof(struct ib_mad) - + data_offset(rmpp_mad->mad_hdr.mgmt_class); + while (length > 0) { + int seg_payload = min_t(int, length, max_seg_payload); + seg = kzalloc(sizeof(struct ib_mad_multipacket_seg) + + max_seg_payload, GFP_KERNEL); + if (!seg) { + printk(KERN_ERR "ib_umad_write: " + "mem alloc failed for length %d\n", + sizeof(struct ib_mad_multipacket_seg) + + max_seg_payload); + ret = -ENOMEM; + goto err_msg; + } + + if (copy_from_user(seg->data, buf, seg_payload)) { + ret = -EFAULT; + goto err_msg; + } + ib_append_to_multipacket_mad(packet->msg, seg); + buf += seg_payload; + length -= seg_payload; } /* -- MST From halr at voltaire.com Thu Dec 8 06:15:33 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 08 Dec 2005 09:15:33 -0500 Subject: [openib-general] [PATCH] [TRIVIAL] OpenSM: Add DivergeNet into recognized vendors in osm_helper.c Message-ID: <1134051331.4587.42550.camel@hal.voltaire.com> OpenSM: Add DivergeNet into list of recognized manufacturers in osm_helper.c::osm_get_manufacturer_str Signed-off-by: Hal Rosenstock Index: osm_helper.c =================================================================== --- osm_helper.c (revision 4332) +++ osm_helper.c (working copy) @@ -1849,6 +1849,7 @@ osm_get_node_type_str_fixed_width( #define OSM_VENDOR_ID_YOTTAYOTTA 0x000453 /* Also, Obsidian Research */ #define OSM_VENDOR_ID_PATHSCALE 0x001175 #define OSM_VENDOR_ID_IBM 0x000255 +#define OSM_VENDOR_ID_DIVERGENET 0x00084E /********************************************************************** **********************************************************************/ @@ -1866,6 +1867,7 @@ osm_get_manufacturer_str( static const char* yotta_str = "YottaYotta "; static const char* pathscale_str = "PathScale "; static const char* ibm_str = "IBM "; + static const char* divergenet_str = "DivergeNet "; 
static const char* unknown_str = "Unknown "; switch( (uint32_t)(guid_ho >> (5 * 8)) ) @@ -1891,6 +1893,8 @@ osm_get_manufacturer_str( return( pathscale_str ); case OSM_VENDOR_ID_IBM: return( ibm_str ); + case OSM_VENDOR_ID_DIVERGENET: + return( divergenet_str ); default: return( unknown_str ); } From ianjiang.ict at gmail.com Thu Dec 8 06:26:14 2005 From: ianjiang.ict at gmail.com (Ian Jiang) Date: Thu, 8 Dec 2005 22:26:14 +0800 Subject: [openib-general] [kDAPL]questions about the LMR creation of different types of memory Message-ID: <7b2fa1820512080626kf4c9c23hdc3f416dcb970f6d@mail.gmail.com> Hi James, As is known to all, there are several memory types used for memory registration in kDAPL. I have some questions about the types DAT_MEM_TYPE_PHYSICAL and DAT_MEM_TYPE_IA: 1) Could memory allocated by kmem_cache_create() be OK? AFAIK, memory allocated by kmalloc() is OK and that by vmalloc() is not. What about that allocated by kmem_cache_create()? Is it OK on the condition that the SLAB_CACHE_DMA flag is used? 2) What is the difference between DAT_MEM_TYPE_PHYSICAL and DAT_MEM_TYPE_IA when a continuous range of physical memory is to be registered? In my opinion, the continuous range should be translated into a series of page addresses before being registered as the DAT_MEM_TYPE_PHYSICAL type, and it's not necessary for the DAT_MEM_TYPE_IA type. Is the translation done in dat_lmr_kcreate() for the DAT_MEM_TYPE_IA type? Thanks a lot! -- Ian Jiang ianjiang.ict at gmail.com Laboratory of Spatial Information Technology Division of System Architecture Institute of Computing Technology Chinese Academy of Sciences -------------- next part -------------- An HTML attachment was scrubbed... URL: From hch at lst.de Thu Dec 8 06:42:15 2005 From: hch at lst.de (Christoph Hellwig) Date: Thu, 8 Dec 2005 15:42:15 +0100 Subject: [openib-general] assumptions on page mapping (was High memory) In-Reply-To: <4397DDFB.2060102@voltaire.com> References: <437C1622.5070505@cse.ohio-state.edu> <52acg25zr6.fsf@cisco.com> <4397DDFB.2060102@voltaire.com> Message-ID: <20051208144215.GA15022@lst.de> On Thu, Dec 08, 2005 at 09:17:15AM +0200, Or Gerlitz wrote: > Roland Dreier wrote: > >The right way to use the MR from get_dma_mr() is to use "bus > >addresses" from the DMA mapping API. For highmem, the right way to > >get those addresses is with dma_map_sg() or dma_map_page(). > > Looking on the kernel x86_64 code, both dma_map_sg and dma_map_page seem > to assume that the page is already mapped, since they call > page_address(page). x86_64 doesn't have highmem, so page_address(page) is valid on every page. > Specifically is it safe in a SCSI LLD (eg SRP and iSER which is among > other things such) to call dma_map_sg on a SG which comes with a SCSI yes, this is definitely safe. > command, so the SCSI Mid-Layer always makes sure the pages are mapped? no, it doesn't. in fact pages don't need to be mapped at all for dma normally. From halr at voltaire.com Thu Dec 8 07:03:16 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 08 Dec 2005 10:03:16 -0500 Subject: [openib-general] Re: [PATCH] Opensm - fix osm_vendor_get_all_port_attr In-Reply-To: <5z8xuvhnz1.fsf@mtl066.yok.mtl.com> References: <5z8xuvhnz1.fsf@mtl066.yok.mtl.com> Message-ID: <1134054074.4485.3.camel@hal.voltaire.com> Hi Yael, On Thu, 2005-12-08 at 05:39, Yael Kalka wrote: > Hi Hal, > > If osm_vendor_get_all_port_attr is called before the osm_vendor_bind, What exercises the vendor calls in this manner ? > then the sm_lid of the default port isn't updated correctly.
> This patch fixes it. Thanks. Applied. -- Hal From halr at voltaire.com Thu Dec 8 07:12:29 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 08 Dec 2005 10:12:29 -0500 Subject: [openib-general] Re: [PATCH] core: fix user_mad memory leaks on timeout In-Reply-To: <20051208135116.GL21035@mellanox.co.il> References: <20051208135116.GL21035@mellanox.co.il> Message-ID: <1134054749.4485.10.camel@hal.voltaire.com> On Thu, 2005-12-08 at 08:51, Michael S. Tsirkin wrote: > Dont leak packet if it had a timeout. > Dont leak timeout mad if queue_packet fails. > > Signed-off-by: Jack Morgenstein > Signed-off-by: Michael S. Tsirkin > > Index: openib/drivers/infiniband/core/user_mad.c > =================================================================== > --- openib.orig/drivers/infiniband/core/user_mad.c 2005-12-08 15:40:41.000000000 +0200 > +++ openib/drivers/infiniband/core/user_mad.c 2005-12-08 15:40:28.000000000 +0200 > @@ -197,8 +197,8 @@ static void send_handler(struct ib_mad_a > memcpy(timeout->mad.data, packet->mad.data, > sizeof (struct ib_mad_hdr)); > > - if (!queue_packet(file, agent, timeout)) > - return; > + if (queue_packet(file, agent, timeout)) > + kfree(timeout); Yes, there appears to be a memory leak here but I don't think this fix is quite right as it has lost the return when the queue_packet succeeds. Isn't that still needed ? if (!queue_packet(file, agent, timeout)) return; kfree(timeout); > } > out: > kfree(packet); Another point: on either failure to allocate the timeout MAD or failure to queue the timeout MAD, is simply throwing this away sufficient ? It seems to me that if this occurs, then the contract is broken and the client still needs to worry about its own timeout. -- Hal From mst at mellanox.co.il Thu Dec 8 07:25:21 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 8 Dec 2005 17:25:21 +0200 Subject: [openib-general] Re: [PATCH] core: fix user_mad memory leaks on timeout In-Reply-To: <1134054749.4485.10.camel@hal.voltaire.com> References: <1134054749.4485.10.camel@hal.voltaire.com> Message-ID: <20051208152521.GQ21035@mellanox.co.il> Quoting r. Hal Rosenstock : > Subject: Re: [PATCH] core: fix user_mad memory leaks on timeout > > On Thu, 2005-12-08 at 08:51, Michael S. Tsirkin wrote: > > Dont leak packet if it had a timeout. > > Dont leak timeout mad if queue_packet fails. > > > > Signed-off-by: Jack Morgenstein > > Signed-off-by: Michael S. Tsirkin > > > > Index: openib/drivers/infiniband/core/user_mad.c > > =================================================================== > > --- openib.orig/drivers/infiniband/core/user_mad.c 2005-12-08 > 15:40:41.000000000 +0200 > > +++ openib/drivers/infiniband/core/user_mad.c 2005-12-08 > 15:40:28.000000000 +0200 > > @@ -197,8 +197,8 @@ static void send_handler(struct ib_mad_a > > memcpy(timeout->mad.data, packet->mad.data, > > sizeof (struct ib_mad_hdr)); > > > > - if (!queue_packet(file, agent, timeout)) > > - return; > > + if (queue_packet(file, agent, timeout)) > > + kfree(timeout); > > Yes, there appears to be a memory leak here but I don't think this fix > is quite right as it has lost the return when the queue_packet succeeds. > Isn't that still needed ? No, the return here was wrong: we copied the packet and we need to free it anyway so falling through to kfree below is the correct behaviour. Thats what I mean by "Dont leak packet if it had a timeout". 
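To make the agreed control flow concrete, here is a sketch of send_handler() as it reads with the patch applied, reconstructed from the quoted diff; the recovery of "file" from agent->context and of "packet" from the work completion is an assumption about the surrounding code (summarized in a comment below), so treat this as illustrative rather than a verbatim copy of user_mad.c:

    static void send_handler(struct ib_mad_agent *agent,
                             struct ib_mad_send_wc *send_wc)
    {
            struct ib_umad_file *file = agent->context;
            struct ib_umad_packet *packet, *timeout;

            /* ... recover "packet" from send_wc and release its send
             * buffer via ib_free_send_mad(packet->msg), as in the
             * original function (elided here) ... */

            if (send_wc->status == IB_WC_RESP_TIMEOUT_ERR) {
                    timeout = kzalloc(sizeof *timeout + IB_MGMT_MAD_HDR,
                                      GFP_KERNEL);
                    if (!timeout)
                            goto out;

                    /* ... fill in the timeout packet's header fields ... */
                    memcpy(timeout->mad.data, packet->mad.data,
                           sizeof (struct ib_mad_hdr));

                    /* Free the timeout MAD only if queueing it failed;
                     * either way fall through, so the copied packet is
                     * freed below. The old "return" on success is what
                     * leaked the packet. */
                    if (queue_packet(file, agent, timeout))
                            kfree(timeout);
            }
    out:
            kfree(packet);
    }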
> if (!queue_packet(file, agent, timeout)) > return; > kfree(timeout); > > > } > > out: > > kfree(packet); > > Another point: on either failure to allocate the timeout MAD or failure > to queue the timeout MAD, is simply throwing this away sufficient ? It > seems to me that if this occurs, then the contract is broken and the > client still needs to worry about its own timeout. > > -- Hal > Not much to do though, since allocating memory with gfp kernel fails: lets at least be careful to avoid a crash or memory leak. -- MST From mst at mellanox.co.il Thu Dec 8 07:36:46 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 8 Dec 2005 17:36:46 +0200 Subject: [openib-general] mthca_multicast_attach/detach questions Message-ID: <20051208153646.GR21035@mellanox.co.il> Hello, Roland! 1. error handling in mthca_multicast_attach looks strange: in particular, dont we want to revert the result of mthca_alloc if QP is already a member of MGM, or if MGM is full? 2. mthca_multicast_detach has an unconditional goto if (i != 1) goto out; goto out; this looks wrong: it seems you'll never remove an empty multicast group. Comments? -- MST From halr at voltaire.com Thu Dec 8 07:35:14 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 08 Dec 2005 10:35:14 -0500 Subject: [openib-general] Re: [PATCH] core: fix user_mad memory leaks on timeout In-Reply-To: <20051208152521.GQ21035@mellanox.co.il> References: <1134054749.4485.10.camel@hal.voltaire.com> <20051208152521.GQ21035@mellanox.co.il> Message-ID: <1134056114.4485.13.camel@hal.voltaire.com> On Thu, 2005-12-08 at 10:25, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > Subject: Re: [PATCH] core: fix user_mad memory leaks on timeout > > > > On Thu, 2005-12-08 at 08:51, Michael S. Tsirkin wrote: > > > Dont leak packet if it had a timeout. > > > Dont leak timeout mad if queue_packet fails. > > > > > > Signed-off-by: Jack Morgenstein > > > Signed-off-by: Michael S. Tsirkin > > > > > > Index: openib/drivers/infiniband/core/user_mad.c > > > =================================================================== > > > --- openib.orig/drivers/infiniband/core/user_mad.c 2005-12-08 > > 15:40:41.000000000 +0200 > > > +++ openib/drivers/infiniband/core/user_mad.c 2005-12-08 > > 15:40:28.000000000 +0200 > > > @@ -197,8 +197,8 @@ static void send_handler(struct ib_mad_a > > > memcpy(timeout->mad.data, packet->mad.data, > > > sizeof (struct ib_mad_hdr)); > > > > > > - if (!queue_packet(file, agent, timeout)) > > > - return; > > > + if (queue_packet(file, agent, timeout)) > > > + kfree(timeout); > > > > Yes, there appears to be a memory leak here but I don't think this fix > > is quite right as it has lost the return when the queue_packet succeeds. > > Isn't that still needed ? > > No, the return here was wrong: we copied the packet > and we need to free it anyway > so falling through to kfree below is the correct behaviour. > Thats what I mean by "Dont leak packet if it had a timeout". You're right. > > if (!queue_packet(file, agent, timeout)) > > return; > > kfree(timeout); > > > > > } > > > out: > > > kfree(packet); > > > > Another point: on either failure to allocate the timeout MAD or failure > > to queue the timeout MAD, is simply throwing this away sufficient ? It > > seems to me that if this occurs, then the contract is broken and the > > client still needs to worry about its own timeout. 
> > > > -- Hal > > > > Not much to do though, since allocating memory with gfp kernel fails: Couldn't the callback be rescheduled for some time later where the allocation might succeed ? > lets at least be careful to avoid a crash or memory leak. Agreed. -- Hal From rdreier at cisco.com Thu Dec 8 07:44:10 2005 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 08 Dec 2005 07:44:10 -0800 Subject: [openib-general] Re: mthca_multicast_attach/detach questions In-Reply-To: <20051208153646.GR21035@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 8 Dec 2005 17:36:46 +0200") References: <20051208153646.GR21035@mellanox.co.il> Message-ID: Michael> Hello, Roland! 1. error handling in Michael> mthca_multicast_attach looks strange: in particular, dont Michael> we want to revert the result of mthca_alloc if QP is Michael> already a member of MGM, or if MGM is full? Yes, I would use the word "wrong" instead of "strange." We certainly have to undo the alloc if something after it fails. Michael> 2. mthca_multicast_detach has an unconditional goto Michael> if (i != 1) goto out; Michael> goto out; Michael> this looks wrong: it seems you'll never remove an empty Michael> multicast group. Yes, the second goto looks wrong as well... probably some sort of typing/editing error in the past. - R. From mst at mellanox.co.il Thu Dec 8 07:54:37 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 8 Dec 2005 17:54:37 +0200 Subject: [openib-general] Re: [PATCH] core: fix user_mad memory leaks on timeout In-Reply-To: <1134056114.4485.13.camel@hal.voltaire.com> References: <1134056114.4485.13.camel@hal.voltaire.com> Message-ID: <20051208155437.GS21035@mellanox.co.il> Quoting Hal Rosenstock : > Couldn't the callback be rescheduled for some time later where the > allocation might succeed ? Arent we allocating with GFP_KERNEL already? I think this means "try as hard as you can" already. No? Maybe we could allocate the timeout packet together with allocating the send mad. > > lets at least be careful to avoid a crash or memory leak. > > Agreed. -- MST From yipeeyipeeyipeeyipee at yahoo.com Thu Dec 8 07:54:29 2005 From: yipeeyipeeyipeeyipee at yahoo.com (yipee) Date: Thu, 8 Dec 2005 15:54:29 +0000 (UTC) Subject: [openib-general] QP from userspace used in kernel Message-ID: Hi, What are the reasons that a qp allocated in user-space can't be passed and used by a kernel module? What are the steps needed to make a userspace-allocated qp usable by a kernel module? Thanks, x From jlentini at netapp.com Thu Dec 8 08:05:48 2005 From: jlentini at netapp.com (James Lentini) Date: Thu, 8 Dec 2005 08:05:48 -0800 (PST) Subject: [openib-general] Re: [kDAPL]questions about the LMR creation of different types of memory In-Reply-To: <7b2fa1820512080626kf4c9c23hdc3f416dcb970f6d@mail.gmail.com> Message-ID: ian> Hi James, ian> As is known to all, there several memory types uesed for memory ian> register in kDAPL. I have some questions about the types ian> DAT_MEM_TYPE_PHYSICAL and DAT_MEM_TYPE_IA: ian> 1) Could memory allocated by kmem_cache_create() be OK? ian> AFAIK, memory allocated by kmalloc() is OK and that by vmalloc() ian> is not. Correct, assuming you pass kmalloc the GFP_DMA flag. ian> What about that allocated by kmem_cache_create()? Is it ian> OK in the condition that the SLAB_CACHE_DMA flag is used? Yes. ian> 2) What is the difference between DAT_MEM_TYPE_PHYSICAL and ian> DAT_MEM_TYPE_IA when a continuous range of physical memory is to ian> be registered? 
ian> In my opinion, the continuous range should be tranlated into a ian> serious of page addresses before registered as the ian> DAT_MEM_TYPE_PHYSICAL type, correct ian> and it's not necessary for the DAT_MEM_TYPE_IA type. Is the ian> translation is done in the dat_lmr_kcreate() for the ian> DAT_MEM_TYPE_IA type? No translation is done in dat_lmr_kcreate for DAT_MEM_TYPE_IA. A DAT_MEM_TYPE_IA address is supposed to be an I/O address that the adapter can use. From jackm at mellanox.co.il Thu Dec 8 08:16:25 2005 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Thu, 8 Dec 2005 18:16:25 +0200 Subject: [openib-general] [PATCH] mthca: fix memory leak Message-ID: <20051208161625.GA7653@mellanox.co.il> Hi, This patch frees the memory allocated in mthca_init_user_db_tab. Signed-off-by: Jack Morgenstein Index: linux-kernel/drivers/infiniband/hw/mthca/mthca_memfree.c =================================================================== --- linux-kernel.orig/drivers/infiniband/hw/mthca/mthca_memfree.c +++ linux-kernel/drivers/infiniband/hw/mthca/mthca_memfree.c @@ -485,6 +485,7 @@ void mthca_cleanup_user_db_tab(struct mt put_page(db_tab->page[i].mem.page); } } + kfree(db_tab); } int mthca_alloc_db(struct mthca_dev *dev, enum mthca_db_type type, From mst at mellanox.co.il Thu Dec 8 08:16:46 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 8 Dec 2005 18:16:46 +0200 Subject: [openib-general] ipoib: ipoib_mcast_join_finish oops Message-ID: <20051208161646.GT21035@mellanox.co.il> Roland, from some ipoib oopses that I see, it seems, that ipoib_mcast_join_finish is running when priv->dev->broadcast is NULL. Any idea how could that be the case? -- MST From caitlinb at broadcom.com Thu Dec 8 08:22:51 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Thu, 8 Dec 2005 08:22:51 -0800 Subject: [openib-general] QP from userspace used in kernel Message-ID: <54AD0F12E08D1541B826BE97C98F99F10C2BF6@NT-SJCA-0751.brcm.ad.broadcom.com> openib-general-bounces at openib.org wrote: > Hi, > > What are the reasons that a qp allocated in user-space can't > be passed and used by a kernel module? > What are the steps needed to make a userspace-allocated qp usable by > a kernel module? > > You would need to construct an environment such that the device-specific verbs module, which assumes it is executing in the user space where the QP was created, would never notice the difference. The device-specific verbs will typically have created shared memory resources that are accessible by both the RDMA device and from the creating user memory map. These resources may include pointers that assume the original memory map. The exact methods of remembering the locations of these resources will vary by device, so the chance of coming up with a scheme that works without explicit support of all device vendors is very low. The chances of convincing all device vendors to add a new option to support this model is similarly low unless you can make a very compelling case as to why this is necessary. Having the in-kernel proxy create the QP and do operations for the end-user is a very adequate work around. For complex cleanup purposes the kernel could simply assume the identity of the failed process, but that would only be required if the standard cleanup was somehow not adequate. From mst at mellanox.co.il Thu Dec 8 10:34:06 2005 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 8 Dec 2005 20:34:06 +0200 Subject: [openib-general] [PATCH fixed] was Re: [PATCH] ipoib_multicast/ipoib_mcast_send race In-Reply-To: <20051207164433.GA21035@mellanox.co.il> References: <20051207164433.GA21035@mellanox.co.il> Message-ID: <20051208183406.GA13614@mellanox.co.il> Quoting Michael S. Tsirkin : > Subject: [PATCH] ipoib_multicast/ipoib_mcast_send race > > Hello, Roland! > Here's another race scenario. > > --- > > Fix the following race scenario: > device is up. > port event or set mcast list triggers ipoib_mcast_stop_thread, > This cancels the query and waits on mcast "done" completion. > completion is called and "done" is set. > Meanwhile, ipoib_mcast_send arrives and starts a new query, > re-initializing "done". > > Signed-off-by: Michael S. Tsirkin The patch I posted previously leaked an skb when a multicast send arrived while the mcast thread is stopped. Further, there's an additional issue that I saw in testing: ipoib_mcast_send may get called when priv->broadcast is NULL (e.g. if the device was downed and then upped internally because of a port event). If this happends and the sendonly join request gets completed before priv->broadcast is set, we get an oops that I posted previously. Here's a better patch to address these two problems. It has been running fine here for a while now. Please note that this replaces the ipoib_multicast/ipoib_mcast_send patch, but not the ADMIN_UP patch that I posted previously. --- Do not send multicasts if mcast thread is stopped or if priv->broadcast is not set. Signed-off-by: Michael S. Tsirkin Index: openib/drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- openib/drivers/infiniband/ulp/ipoib/ipoib_multicast.c (revision 4222) +++ openib/drivers/infiniband/ulp/ipoib/ipoib_multicast.c (working copy) @@ -582,6 +582,10 @@ int ipoib_mcast_start_thread(struct net_ queue_work(ipoib_workqueue, &priv->mcast_task); up(&mcast_mutex); + spin_lock_irq(&priv->lock); + set_bit(IPOIB_MCAST_STARTED, &priv->flags); + spin_unlock_irq(&priv->lock); + return 0; } @@ -592,6 +596,10 @@ int ipoib_mcast_stop_thread(struct net_d ipoib_dbg_mcast(priv, "stopping multicast thread\n"); + spin_lock_irq(&priv->lock); + clear_bit(IPOIB_MCAST_STARTED, &priv->flags); + spin_unlock_irq(&priv->lock); + down(&mcast_mutex); clear_bit(IPOIB_MCAST_RUN, &priv->flags); cancel_delayed_work(&priv->mcast_task); @@ -674,6 +682,11 @@ void ipoib_mcast_send(struct net_device */ spin_lock(&priv->lock); + if (!test_bit(IPOIB_MCAST_STARTED, &priv->flags) || !priv->broadcast) { + dev_kfree_skb_any(skb); + goto unlock; + } + mcast = __ipoib_mcast_find(dev, mgid); if (!mcast) { /* Let's create a new send only group now */ @@ -732,6 +745,7 @@ out: ipoib_send(dev, skb, mcast->ah, IB_MULTICAST_QPN); } +unlock: spin_unlock(&priv->lock); } Index: openib/drivers/infiniband/ulp/ipoib/ipoib.h =================================================================== --- openib/drivers/infiniband/ulp/ipoib/ipoib.h (revision 4222) +++ openib/drivers/infiniband/ulp/ipoib/ipoib.h (working copy) @@ -78,6 +78,7 @@ enum { IPOIB_FLAG_SUBINTERFACE = 4, IPOIB_MCAST_RUN = 5, IPOIB_STOP_REAPER = 6, + IPOIB_MCAST_STARTED = 7, IPOIB_MAX_BACKOFF_SECONDS = 16, -- MST From rdreier at cisco.com Thu Dec 8 12:16:03 2005 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 08 Dec 2005 12:16:03 -0800 Subject: [openib-general] Re: [kDAPL]questions about the LMR creation of different types of memory In-Reply-To: (James Lentini's message of "Thu, 8 
Dec 2005 08:05:48 -0800 (PST)") References: Message-ID: ian> 1) Could memory allocated by kmem_cache_create() be OK? ian> AFAIK, memory allocated by kmalloc() is OK and that by ian> vmalloc() is not. James> Correct, assuming you pass kmalloc the GFP_DMA flag. No, the GFP_DMA flag is not necessary. On x86 it means to allocate from the 24-bit ISA DMA region (ie the low 16 MB of RAM). In general it is never necessary to use GFP_DMA in modern code. - R. From sean.hefty at intel.com Thu Dec 8 16:59:26 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 8 Dec 2005 16:59:26 -0800 Subject: [openib-general] [PATCH] [CMA] support for SDP + standard protocol Message-ID: The following patch updates the CMA to support the IB socket-based protocol standard and SDP's private data format. The CMA now defines RDMA "port spaces". RDMA identifiers are associated with a user-specified port space at creation time. Please respond with any comments on the approach. Note that these changes have not been pushed up to userspace yet. Signed-off-by: Sean Hefty Index: ulp/iser/iser_verbs.c =================================================================== --- ulp/iser/iser_verbs.c (revision 4356) +++ ulp/iser/iser_verbs.c (working copy) @@ -428,7 +428,8 @@ iser_connect(struct iser_conn *p_iser_co return -1; } p_iser_conn->cma_id = rdma_create_id(iser_cma_handler, - (void *)p_iser_conn); + (void *)p_iser_conn, + RDMA_PS_TCP); if (IS_ERR(p_iser_conn->cma_id)) { ret = PTR_ERR(p_iser_conn->cma_id); iser_err("rdma_create_id failed: %d\n", ret); Index: include/rdma/rdma_cm.h =================================================================== --- include/rdma/rdma_cm.h (revision 4356) +++ include/rdma/rdma_cm.h (working copy) @@ -54,6 +54,13 @@ enum rdma_cm_event_type { RDMA_CM_EVENT_DEVICE_REMOVAL, }; +enum rdma_port_space { + RDMA_PS_SDP = 0x0001, + RDMA_PS_TCP = 0x0106, + RDMA_PS_UDP = 0x0111, + RDMA_PS_SCTP = 0x0183 +}; + struct rdma_addr { struct sockaddr src_addr; u8 src_pad[sizeof(struct sockaddr_in6) - @@ -97,11 +104,20 @@ struct rdma_cm_id { struct ib_qp *qp; rdma_cm_event_handler event_handler; struct rdma_route route; + enum rdma_port_space ps; u8 port_num; }; +/** + * rdma_create_id - Create an RDMA identifier. + * + * @event_handler: User callback invoked to report events associated with the + * returned rdma_id. + * @context: User specified context associated with the id. + * @ps: RDMA port space. 
+ */ struct rdma_cm_id* rdma_create_id(rdma_cm_event_handler event_handler, - void *context); + void *context, enum rdma_port_space ps); void rdma_destroy_id(struct rdma_cm_id *id); Index: core/cma.c =================================================================== --- core/cma.c (revision 4356) +++ core/cma.c (working copy) @@ -110,21 +110,35 @@ struct rdma_id_private { u8 srq; }; -struct cma_addr { - u8 version; /* CMA version: 7:4, IP version: 3:0 */ - u8 reserved; - __u16 port; +union cma_ip_addr { + struct in6_addr ip6; struct { - union { - struct in6_addr ip6; - struct { - __u32 pad[3]; - __u32 addr; - } ip4; - } ver; - } src_addr, dst_addr; + __u32 pad[3]; + __u32 addr; + } ip4; +}; + +struct cma_hdr { + u8 cma_version; + u8 ip_version; /* IP version: 7:4 */ + __u16 port; + union cma_ip_addr src_addr; + union cma_ip_addr dst_addr; }; +struct sdp_hh { + u8 sdp_version; + u8 ip_version; /* IP version: 7:4 */ + u8 sdp_specific1[10]; + __u16 port; + __u16 sdp_specific2; + union cma_ip_addr src_addr; + union cma_ip_addr dst_addr; +}; + +#define CMA_VERSION 0x10 +#define SDP_VERSION 0x22 + static int cma_comp(struct rdma_id_private *id_priv, enum cma_state comp) { unsigned long flags; @@ -162,19 +176,24 @@ static enum cma_state cma_exch(struct rd return old; } -static inline u8 cma_get_ip_ver(struct cma_addr *addr) +static inline u8 cma_get_ip_ver(struct cma_hdr *hdr) { - return addr->version & 0xF; + return hdr->ip_version >> 4; } -static inline u8 cma_get_cma_ver(struct cma_addr *addr) +static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver) { - return addr->version >> 4; + hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF); } -static inline void cma_set_vers(struct cma_addr *addr, u8 cma_ver, u8 ip_ver) +static inline u8 sdp_get_ip_ver(struct sdp_hh *hh) { - addr->version = (cma_ver << 4) + (ip_ver & 0xF); + return hh->ip_version >> 4; +} + +static inline void sdp_set_ip_ver(struct sdp_hh *hh, u8 ip_ver) +{ + hh->ip_version = (ip_ver << 4) | (hh->ip_version & 0xF); } static void cma_attach_to_dev(struct rdma_id_private *id_priv, @@ -226,17 +245,18 @@ static void cma_release_remove(struct rd } struct rdma_cm_id* rdma_create_id(rdma_cm_event_handler event_handler, - void *context) + void *context, enum rdma_port_space ps) { struct rdma_id_private *id_priv; id_priv = kzalloc(sizeof *id_priv, GFP_KERNEL); if (!id_priv) - return NULL; + return ERR_PTR(-ENOMEM); id_priv->state = CMA_IDLE; id_priv->id.context = context; id_priv->id.event_handler = event_handler; + id_priv->id.ps = ps; spin_lock_init(&id_priv->lock); init_waitqueue_head(&id_priv->wait); atomic_set(&id_priv->refcount, 1); @@ -387,25 +407,93 @@ int rdma_init_qp_attr(struct rdma_cm_id } EXPORT_SYMBOL(rdma_init_qp_attr); -static int cma_verify_addr(struct cma_addr *addr, - struct sockaddr_in *ip_addr) +static inline int cma_any_addr(struct sockaddr *addr) { - if (cma_get_cma_ver(addr) != 1 || cma_get_ip_ver(addr) != 4) - return -EINVAL; + struct in6_addr *ip6; - if (ip_addr->sin_port != addr->port) - return -EINVAL; + if (addr->sa_family == AF_INET) + return ((struct sockaddr_in *) addr)->sin_addr.s_addr == + INADDR_ANY; + else { + ip6 = &((struct sockaddr_in6 *) addr)->sin6_addr; + return (ip6->s6_addr32[0] | ip6->s6_addr32[1] | + ip6->s6_addr32[2] | ip6->s6_addr32[3]) == 0; + } +} - if (ip_addr->sin_addr.s_addr && - (ip_addr->sin_addr.s_addr != addr->dst_addr.ver.ip4.addr)) - return -EINVAL; +static int cma_get_net_info(void *hdr, enum rdma_port_space ps, + u8 *ip_ver, __u16 *port, + union cma_ip_addr **src,
union cma_ip_addr **dst) +{ + switch (ps) { + case RDMA_PS_SDP: + if (((struct sdp_hh *) hdr)->sdp_version != SDP_VERSION) + return -EINVAL; + *ip_ver = sdp_get_ip_ver(hdr); + *port = ((struct sdp_hh *) hdr)->port; + *src = &((struct sdp_hh *) hdr)->src_addr; + *dst = &((struct sdp_hh *) hdr)->dst_addr; + break; + default: + if (((struct cma_hdr *) hdr)->cma_version != CMA_VERSION) + return -EINVAL; + + *ip_ver = cma_get_ip_ver(hdr); + *port = ((struct cma_hdr *) hdr)->port; + *src = &((struct cma_hdr *) hdr)->src_addr; + *dst = &((struct cma_hdr *) hdr)->dst_addr; + break; + } return 0; } -static inline int cma_any_addr(struct sockaddr *addr) +static void cma_save_net_info(struct rdma_addr *addr, + struct rdma_addr *listen_addr, + u8 ip_ver, __u16 port, + union cma_ip_addr *src, union cma_ip_addr *dst) +{ + struct sockaddr_in *listen4, *ip4; + struct sockaddr_in6 *listen6, *ip6; + + switch (ip_ver) { + case 4: + listen4 = (struct sockaddr_in *) &listen_addr->src_addr; + ip4 = (struct sockaddr_in *) &addr->src_addr; + ip4->sin_family = listen4->sin_family; + ip4->sin_addr.s_addr = dst->ip4.addr; + ip4->sin_port = listen4->sin_port; + + ip4 = (struct sockaddr_in *) &addr->dst_addr; + ip4->sin_family = listen4->sin_family; + ip4->sin_addr.s_addr = src->ip4.addr; + ip4->sin_port = port; + break; + case 6: + listen6 = (struct sockaddr_in6 *) &listen_addr->src_addr; + ip6 = (struct sockaddr_in6 *) &addr->src_addr; + ip6->sin6_family = listen6->sin6_family; + ip6->sin6_addr = dst->ip6; + ip6->sin6_port = listen6->sin6_port; + + ip6 = (struct sockaddr_in6 *) &addr->dst_addr; + ip6->sin6_family = listen6->sin6_family; + ip6->sin6_addr = src->ip6; + ip6->sin6_port = port; + break; + default: + break; + } +} + +static inline int cma_user_data_offset(enum rdma_port_space ps) { - return ((struct sockaddr_in *) addr)->sin_addr.s_addr == 0; + switch (ps) { + case RDMA_PS_SDP: + return 0; + default: + return sizeof(struct cma_hdr); + } } static int cma_notify_user(struct rdma_id_private *id_priv, @@ -640,53 +728,41 @@ static struct rdma_id_private* cma_new_i { struct rdma_id_private *id_priv; struct rdma_cm_id *id; - struct rdma_route *route; - struct sockaddr_in *ip_addr, *listen_addr; - struct ib_sa_path_rec *path_rec; - struct cma_addr *addr; - int num_paths; - - listen_addr = (struct sockaddr_in *) &listen_id->route.addr.src_addr; - if (cma_verify_addr(ib_event->private_data, listen_addr)) - return NULL; + struct rdma_route *rt; + union cma_ip_addr *src, *dst; + __u16 port; + u8 ip_ver; - num_paths = 1 + (ib_event->param.req_rcvd.alternate_path != NULL); - path_rec = kmalloc(sizeof *path_rec * num_paths, GFP_KERNEL); - if (!path_rec) + id = rdma_create_id(listen_id->event_handler, listen_id->context, + listen_id->ps); + if (IS_ERR(id)) return NULL; - id = rdma_create_id(listen_id->event_handler, listen_id->context); - if (!id) + rt = &id->route; + rt->num_paths = ib_event->param.req_rcvd.alternate_path ? 
2 : 1; + rt->path_rec = kmalloc(sizeof *rt->path_rec * rt->num_paths, GFP_KERNEL); + if (!rt->path_rec) goto err; - addr = ib_event->private_data; - route = &id->route; + if (cma_get_net_info(ib_event->private_data, listen_id->ps, + &ip_ver, &port, &src, &dst)) + goto err; - ip_addr = (struct sockaddr_in *) &route->addr.src_addr; - ip_addr->sin_family = listen_addr->sin_family; - ip_addr->sin_addr.s_addr = addr->dst_addr.ver.ip4.addr; - ip_addr->sin_port = listen_addr->sin_port; - - ip_addr = (struct sockaddr_in *) &route->addr.dst_addr; - ip_addr->sin_family = listen_addr->sin_family; - ip_addr->sin_addr.s_addr = addr->src_addr.ver.ip4.addr; - ip_addr->sin_port = addr->port; - - route->num_paths = num_paths; - route->path_rec = path_rec; - path_rec[0] = *ib_event->param.req_rcvd.primary_path; - if (num_paths == 2) - path_rec[1] = *ib_event->param.req_rcvd.alternate_path; - - route->addr.addr.ibaddr.sgid = path_rec->sgid; - route->addr.addr.ibaddr.dgid = path_rec->dgid; - route->addr.addr.ibaddr.pkey = be16_to_cpu(path_rec->pkey); + cma_save_net_info(&id->route.addr, &listen_id->route.addr, + ip_ver, port, src, dst); + rt->path_rec[0] = *ib_event->param.req_rcvd.primary_path; + if (rt->num_paths == 2) + rt->path_rec[1] = *ib_event->param.req_rcvd.alternate_path; + + rt->addr.addr.ibaddr.sgid = rt->path_rec[0].sgid; + rt->addr.addr.ibaddr.dgid = rt->path_rec[0].dgid; + rt->addr.addr.ibaddr.pkey = be16_to_cpu(rt->path_rec[0].pkey); id_priv = container_of(id, struct rdma_id_private, id); id_priv->state = CMA_CONNECT; return id_priv; err: - kfree(path_rec); + rdma_destroy_id(id); return NULL; } @@ -708,7 +784,6 @@ static int cma_req_handler(struct ib_cm_ goto out; } - conn_id->state = CMA_CONNECT; atomic_inc(&conn_id->dev_remove); ret = cma_acquire_ib_dev(conn_id, &conn_id->id.route.path_rec[0].sgid); if (ret) { @@ -722,7 +797,7 @@ static int cma_req_handler(struct ib_cm_ cm_id->context = conn_id; cm_id->cm_handler = cma_ib_handler; - offset = sizeof(struct cma_addr); + offset = cma_user_data_offset(listen_id->id.ps); ret = cma_notify_user(conn_id, RDMA_CM_EVENT_CONNECT_REQUEST, 0, ib_event->private_data + offset, IB_CM_REQ_PRIVATE_DATA_SIZE - offset); @@ -738,16 +813,16 @@ out: return ret; } -static __be64 cma_get_service_id(struct sockaddr *addr) +static __be64 cma_get_service_id(enum rdma_port_space ps, struct sockaddr *addr) { - return cpu_to_be64(((u64)IB_OPENIB_OUI << 48) + + return cpu_to_be64(((u64)ps << 16) + ((struct sockaddr_in *) addr)->sin_port); } static void cma_set_compare_data(struct sockaddr *addr, struct ib_cm_private_data_compare *compare) { - struct cma_addr *data, *mask; + struct cma_hdr *data, *mask; memset(compare, 0, sizeof *compare); data = (void *) compare->data; @@ -755,19 +830,18 @@ static void cma_set_compare_data(struct switch (addr->sa_family) { case AF_INET: - cma_set_vers(data, 0, 4); - cma_set_vers(mask, 0, 0xF); - data->dst_addr.ver.ip4.addr = ((struct sockaddr_in *) addr)-> - sin_addr.s_addr; - mask->dst_addr.ver.ip4.addr = ~0; + cma_set_ip_ver(data, 4); + cma_set_ip_ver(mask, 0xF); + data->dst_addr.ip4.addr = ((struct sockaddr_in *) addr)-> + sin_addr.s_addr; + mask->dst_addr.ip4.addr = ~0; break; case AF_INET6: - cma_set_vers(data, 0, 6); - cma_set_vers(mask, 0, 0xF); - data->dst_addr.ver.ip6 = ((struct sockaddr_in6 *) addr)-> - sin6_addr; - memset(&mask->dst_addr.ver.ip6, 1, - sizeof mask->dst_addr.ver.ip6); + cma_set_ip_ver(data, 6); + cma_set_ip_ver(mask, 0xF); + data->dst_addr.ip6 = ((struct sockaddr_in6 *) addr)-> + sin6_addr; + 
memset(&mask->dst_addr.ip6, 1, sizeof mask->dst_addr.ip6); break; default: break; @@ -787,7 +861,7 @@ static int cma_ib_listen(struct rdma_id_ return PTR_ERR(id_priv->cm_id); addr = &id_priv->id.route.addr.src_addr; - svc_id = cma_get_service_id(addr); + svc_id = cma_get_service_id(id_priv->id.ps, addr); if (cma_any_addr(addr)) ret = ib_cm_listen(id_priv->cm_id, svc_id, 0, NULL); else { @@ -835,7 +909,7 @@ static void cma_listen_on_dev(struct rdm struct rdma_cm_id *id; int ret; - id = rdma_create_id(cma_listen_handler, id_priv); + id = rdma_create_id(cma_listen_handler, id_priv, id_priv->id.ps); if (IS_ERR(id)) return; @@ -1099,19 +1173,34 @@ err: } EXPORT_SYMBOL(rdma_bind_addr); -static void cma_format_addr(struct cma_addr *addr, struct rdma_route *route) +static void cma_format_hdr(void *hdr, enum rdma_port_space ps, + struct rdma_route *route) { - struct sockaddr_in *ip_addr; - - memset(addr, 0, sizeof *addr); - cma_set_vers(addr, 1, 4); - - ip_addr = (struct sockaddr_in *) &route->addr.src_addr; - addr->src_addr.ver.ip4.addr = ip_addr->sin_addr.s_addr; - - ip_addr = (struct sockaddr_in *) &route->addr.dst_addr; - addr->dst_addr.ver.ip4.addr = ip_addr->sin_addr.s_addr; - addr->port = ip_addr->sin_port; + struct sockaddr_in *src4, *dst4; + struct cma_hdr *cma_hdr; + struct sdp_hh *sdp_hdr; + + src4 = (struct sockaddr_in *) &route->addr.src_addr; + dst4 = (struct sockaddr_in *) &route->addr.dst_addr; + + switch (ps) { + case RDMA_PS_SDP: + sdp_hdr = hdr; + sdp_hdr->sdp_version = SDP_VERSION; + sdp_set_ip_ver(sdp_hdr, 4); + sdp_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; + sdp_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; + sdp_hdr->port = src4->sin_port; + break; + default: + cma_hdr = hdr; + cma_hdr->cma_version = CMA_VERSION; + cma_set_ip_ver(cma_hdr, 4); + cma_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; + cma_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; + cma_hdr->port = src4->sin_port; + break; + } } static int cma_connect_ib(struct rdma_id_private *id_priv, @@ -1119,17 +1208,20 @@ static int cma_connect_ib(struct rdma_id { struct ib_cm_req_param req; struct rdma_route *route; - struct cma_addr *addr; void *private_data; - int ret; + int offset, ret; memset(&req, 0, sizeof req); - req.private_data_len = sizeof *addr + conn_param->private_data_len; - - private_data = kmalloc(req.private_data_len, GFP_ATOMIC); + offset = cma_user_data_offset(id_priv->id.ps); + req.private_data_len = offset + conn_param->private_data_len; + private_data = kzalloc(req.private_data_len, GFP_ATOMIC); if (!private_data) return -ENOMEM; + if (conn_param->private_data && conn_param->private_data_len) + memcpy(private_data + offset, conn_param->private_data, + conn_param->private_data_len); + id_priv->cm_id = ib_create_cm_id(id_priv->id.device, cma_ib_handler, id_priv); if (IS_ERR(id_priv->cm_id)) { @@ -1137,20 +1229,16 @@ static int cma_connect_ib(struct rdma_id goto out; } - addr = private_data; route = &id_priv->id.route; - cma_format_addr(addr, route); - - if (conn_param->private_data && conn_param->private_data_len) - memcpy(addr + 1, conn_param->private_data, - conn_param->private_data_len); + cma_format_hdr(private_data, id_priv->id.ps, route); req.private_data = private_data; req.primary_path = &route->path_rec[0]; if (route->num_paths == 2) req.alternate_path = &route->path_rec[1]; - req.service_id = cma_get_service_id(&route->addr.dst_addr); + req.service_id = cma_get_service_id(id_priv->id.ps, + &route->addr.dst_addr); req.qp_num = id_priv->qp_num; req.qp_type = id_priv->qp_type; 
req.starting_psn = id_priv->seq_num; @@ -1317,23 +1405,6 @@ out: } EXPORT_SYMBOL(rdma_disconnect); -/* TODO: add this to the device structure - see Roland's patch */ -static __be64 get_ca_guid(struct ib_device *device) -{ - struct ib_device_attr *device_attr; - __be64 guid; - int ret; - - device_attr = kmalloc(sizeof *device_attr, GFP_KERNEL); - if (!device_attr) - return 0; - - ret = ib_query_device(device, device_attr); - guid = ret ? 0 : device_attr->node_guid; - kfree(device_attr); - return guid; -} - static void cma_add_one(struct ib_device *device) { struct cma_device *cma_dev; @@ -1344,7 +1415,7 @@ static void cma_add_one(struct ib_device return; cma_dev->device = device; - cma_dev->node_guid = get_ca_guid(device); + cma_dev->node_guid = device->node_guid; if (!cma_dev->node_guid) goto err; Index: core/ucma.c =================================================================== --- core/ucma.c (revision 4356) +++ core/ucma.c (working copy) @@ -287,7 +287,7 @@ static ssize_t ucma_create_id(struct ucm return -ENOMEM; ctx->uid = cmd.uid; - ctx->cm_id = rdma_create_id(ucma_event_handler, ctx); + ctx->cm_id = rdma_create_id(ucma_event_handler, ctx, RDMA_PS_TCP); if (IS_ERR(ctx->cm_id)) { ret = PTR_ERR(ctx->cm_id); goto err1; From ianjiang.ict at gmail.com Thu Dec 8 17:42:56 2005 From: ianjiang.ict at gmail.com (Ian Jiang) Date: Fri, 9 Dec 2005 09:42:56 +0800 Subject: [openib-general] Re: [kDAPL]questions about the LMR creation of different types of memory In-Reply-To: References: <7b2fa1820512080626kf4c9c23hdc3f416dcb970f6d@mail.gmail.com> Message-ID: <7b2fa1820512081742j7ef50a27kc2322cbf0e52d908@mail.gmail.com> On 12/9/05, James Lentini wrote: > ian> 1) Could memory allocated by kmem_cache_create() be OK? > ian> AFAIK, memory allocated by kmalloc() is OK and that by vmalloc() > ian> is not. > > Correct, assuming you pass kmalloc the GFP_DMA flag. Roland> No, the GFP_DMA flag is not necessary. On x86 it means to allocate Roland> from the 24-bit ISA DMA region (ie the low 16 MB of RAM). In general Roland> it is never necessary to use GFP_DMA in modern code. I agree with Roland. I tried kmalloc() with the GFP_ATOMIC flag and it was OK. ian> and it's not necessary for the DAT_MEM_TYPE_IA type. Is the > ian> translation done in the dat_lmr_kcreate() for the > ian> DAT_MEM_TYPE_IA type? > > No translation is done in dat_lmr_kcreate for DAT_MEM_TYPE_IA. A > DAT_MEM_TYPE_IA address is supposed to be an I/O address that the > adapter can use. Question 1: How do you distinguish an address that the adapter can use from one that it cannot use? Could you give an example? I am really not very familiar with the I/O address details. Question 2: Which memory type should be used given a contiguous range of physical memory? It seems simpler to use the DAT_MEM_TYPE_IA type since no translation is needed. But isn't there any limitation on the memory to be registered using DAT_MEM_TYPE_IA, as contrasted with the DAT_MEM_PHYSICAL type? Thanks a lot! -- Ian Jiang ianjiang.ict at gmail.com Laboratory of Spatial Information Technology Division of System Architecture Institute of Computing Technology Chinese Academy of Sciences -------------- next part -------------- An HTML attachment was scrubbed...
URL:
From bboas at llnl.gov Thu Dec 8 22:52:38 2005 From: bboas at llnl.gov (Bill Boas) Date: Thu, 08 Dec 2005 22:52:38 -0800 Subject: [openib-general] Next workshop dates? Please respond with your preferences Message-ID: <6.2.3.4.2.20051208224443.03a16be0@mail-lc.llnl.gov> All those wishing to attend the next workshop in Sonoma at the Lodge (same as last year) in late January or early February, please respond with your preferred dates. We currently have Jan 29-Feb 1 held for us, but some people are telling us that is bad for them. The next two Sun-Wed slots (Feb 5-8 or 12-15) may be available, but we need guidance from those planning to attend as to their preferred dates. Bill. Bill Boas bboas at llnl.gov ICCD LLNL, B-453, R-2018 Wk: 925-422-4110 7000 East Ave, L-555 Cell: 925-337-2224 Livermore, CA 94551 Pgr: 877-203-2248 From krkumar2 at in.ibm.com Fri Dec 9 00:48:47 2005 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Fri, 9 Dec 2005 14:18:47 +0530 Subject: [openib-general] [PATCH fixed] was Re: [PATCH] ipoib_multicast/ipoib_mcast_send race In-Reply-To: <20051208183406.GA13614@mellanox.co.il> Message-ID: Hi Michael, Is there a reason to have the atomic set_bit() within a lock (even for a race condition of stop vs send, it doesn't seem to be required)? Which means the test_bit() can also be put before the existing lock... Thanks, - KK openib-general-bounces at openib.org wrote on 12/09/2005 12:04:06 AM: > Quoting Michael S. Tsirkin : > > Subject: [PATCH] ipoib_multicast/ipoib_mcast_send race > > > > Hello, Roland! > > Here's another race scenario. > > The patch I posted previously leaked an skb when a multicast > send arrived while the mcast thread is stopped. > > Further, there's an additional issue that I saw in testing: > ipoib_mcast_send may get called when priv->broadcast is NULL > (e.g. if the device was downed and then upped internally because > of a port event). > If this happens and the sendonly join request gets completed before > priv->broadcast is set, we get an oops that I posted previously. > > Here's a better patch to address these two problems. > It has been running fine here for a while now. > > Please note that this replaces the ipoib_multicast/ipoib_mcast_send patch, > but not the ADMIN_UP patch that I posted previously. > > --- > > Do not send multicasts if mcast thread is stopped or if > priv->broadcast is not set. > > Signed-off-by: Michael S. Tsirkin -------------- next part -------------- An HTML attachment was scrubbed... URL: From mst at mellanox.co.il Fri Dec 9 05:11:51 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 9 Dec 2005 15:11:51 +0200 Subject: [openib-general] [PATCH fixed] was Re: [PATCH] ipoib_multicast/ipoib_mcast_send race In-Reply-To: References: Message-ID: <20051209131151.GA21716@mellanox.co.il> The lock around clear_bit is there to ensure that ipoib_mcast_send isn't running already when we stop the thread. That's why test_bit has to be inside the lock, too. Quoting r. Krishna Kumar2 : > Subject: Re: [openib-general] [PATCH fixed] was Re: [PATCH] ipoib_multicast/ipoib_mcast_send race > > Hi Michael, > > Is there a reason to have the atomic set_bit() within a lock (even for > a race condition of stop vs send, it doesn't seem to be required)? > Which means the test_bit() can also be put before the existing lock... > > Thanks, > > - KK -- MST From halr at voltaire.com Fri Dec 9 07:51:36 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Dec 2005 10:51:36 -0500 Subject: [openib-general] Re: [PATCH] core: fix user_mad memory leaks on timeout In-Reply-To: <20051208135116.GL21035@mellanox.co.il> References:
<20051208135116.GL21035@mellanox.co.il> Message-ID: <1134143496.4485.6696.camel@hal.voltaire.com> On Thu, 2005-12-08 at 08:51, Michael S. Tsirkin wrote: > Don't leak packet if it had a timeout. > Don't leak timeout mad if queue_packet fails. Thanks. Applied. Should this change be pushed upstream to 2.6.15? From arlin.r.davis at intel.com Fri Dec 9 12:39:15 2005 From: arlin.r.davis at intel.com (Arlin Davis) Date: Fri, 9 Dec 2005 12:39:15 -0800 Subject: [openib-general] [PATCH][uDAPL] openib_cma provider update Message-ID: James, I modified the IP address lookup during the open to take either a network name, network address, or device name. This will make the dat.conf setup a little easier and more flexible. I updated the README and doc/dat.conf with details. Thanks, -arlin Signed-off-by: Arlin Davis Index: dapl/openib_cma/dapl_ib_util.c =================================================================== --- dapl/openib_cma/dapl_ib_util.c (revision 4361) +++ dapl/openib_cma/dapl_ib_util.c (working copy) @@ -58,6 +58,13 @@ static const char rcsid[] = "$Id: $"; #include #include +#include /* for IOCTL's */ +#include /* for socket(2) and related bits and pieces */ +#include /* for socket(2) */ +#include /* for struct ifreq */ +#include /* for ARPHRD_INFINIBAND */ + + int g_dapl_loopback_connection = 0; int g_ib_pipe[2]; ib_thread_state_t g_ib_thread_state = 0; @@ -65,39 +72,77 @@ DAPL_OS_THREAD g_ib_thread; DAPL_OS_LOCK g_hca_lock; struct dapl_llist_entry *g_hca_list; -/* Get IP address */ +/* Get IP address using network device name */ +static int getipaddr_netdev(char *name, char *addr, int addr_len) +{ + struct ifreq ifr; + int skfd, ret, len; + + /* Fill in the structure */ + snprintf(ifr.ifr_name, IFNAMSIZ, "%s", name); + ifr.ifr_hwaddr.sa_family = ARPHRD_INFINIBAND; + + /* Create a socket fd */ + skfd = socket(PF_INET, SOCK_STREAM, 0); + ret = ioctl(skfd, SIOCGIFADDR, &ifr); + if (ret) + goto bail; + + switch (ifr.ifr_addr.sa_family) + { +#ifdef AF_INET6 + case AF_INET6: + len = sizeof(struct sockaddr_in6); + break; +#endif + case AF_INET: + default: + len = sizeof(struct sockaddr); + break; + } + + if (len <= addr_len) + memcpy(addr, &ifr.ifr_addr, len); + else + ret = EINVAL; + +bail: + close(skfd); + return ret; +} + +/* Get IP address using network name, address, or device name */ static int getipaddr(char *name, char *addr, int len) { struct addrinfo *res; int ret; - - ret = getaddrinfo(name, NULL, NULL, &res); - if (ret) { - dapl_dbg_log(DAPL_DBG_TYPE_WARN, - " getipaddr: invalid name or address (%s)\n", - name); + + /* Assume network name and address type for first attempt */ + if (getaddrinfo(name, NULL, NULL, &res)) { + /* retry using network device name */ + ret = getipaddr_netdev(name,addr,len); + if (ret) + dapl_dbg_log(DAPL_DBG_TYPE_WARN, + " getipaddr: invalid name, addr, or netdev(%s)\n", + name); return ret; + } else { + if (len >= res->ai_addrlen) + memcpy(addr, res->ai_addr, res->ai_addrlen); + else + return EINVAL; + + freeaddrinfo(res); } dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " getipaddr: family %d port %d addr %d.%d.%d.%d\n", - ((struct sockaddr_in *)res->ai_addr)->sin_family, - ((struct sockaddr_in *)res->ai_addr)->sin_port, - ((struct sockaddr_in *) - res->ai_addr)->sin_addr.s_addr >> 0 & 0xff, - ((struct sockaddr_in *) - res->ai_addr)->sin_addr.s_addr >> 8 & 0xff, - ((struct sockaddr_in *) - res->ai_addr)->sin_addr.s_addr >> 16 & 0xff, - ((struct sockaddr_in *) - res->ai_addr)->sin_addr.s_addr >> 24 & 0xff ); - - if (len >= res->ai_addrlen) - memcpy(addr,
res->ai_addr, res->ai_addrlen); - else - return EINVAL; - - freeaddrinfo(res); + ((struct sockaddr_in *)addr)->sin_family, + ((struct sockaddr_in *)addr)->sin_port, + ((struct sockaddr_in *)addr)->sin_addr.s_addr >> 0 & 0xff, + ((struct sockaddr_in *)addr)->sin_addr.s_addr >> 8 & 0xff, + ((struct sockaddr_in *)addr)->sin_addr.s_addr >> 16 & 0xff, + ((struct sockaddr_in *)addr)->sin_addr.s_addr >> 24 & 0xff); return 0; } Index: doc/dat.conf =================================================================== --- doc/dat.conf (revision 4361) +++ doc/dat.conf (working copy) @@ -9,9 +9,12 @@ # Example for openib_cma and openib_scm # # For scm version you specify as actual device name and port -# For cma version you specify as the ib device network address or network hostname and 0 for port +# For cma version you specify as: +# network address, network hostname, or netdev name and 0 for port # OpenIB-scm1 u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "mthca0 1" "" OpenIB-scm2 u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "mthca0 2" "" OpenIB-cma-ip u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "192.168.0.22 0" "" OpenIB-cma-name u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "svr1-ib0 0" "" +OpenIB-cma-netdev u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "ib0 0" "" + Index: README =================================================================== --- README (revision 4361) +++ README (working copy) @@ -63,12 +63,14 @@ sample /etc/dat.conf # Example for openib_cma and openib_scm # # For scm version you specify as actual device name and port -# For cma version you specify as the ib device network address or network hostname and 0 for port +# For cma version you specify as: +# network address, network hostname, or netdev name and 0 for port # OpenIB-scm1 u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "mthca0 1" "" OpenIB-scm2 u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "mthca0 2" "" OpenIB-cma-ip u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "192.168.0.22 0" "" OpenIB-cma-name u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "svr1-ib0 0" "" +OpenIB-cma-netdev u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "ib0 0" "" ============================= 3.0 SAMPLE uDAPL APPLICATION: From rdreier at cisco.com Fri Dec 9 13:46:46 2005 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 09 Dec 2005 13:46:46 -0800 Subject: [openib-general] Re: [PATCH] core: fix user_mad memory leaks on timeout In-Reply-To: <20051208135116.GL21035@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 8 Dec 2005 15:51:16 +0200") References: <20051208135116.GL21035@mellanox.co.il> Message-ID: Thanks, I queued this in my git tree. - R. From rdreier at cisco.com Fri Dec 9 13:49:14 2005 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 09 Dec 2005 13:49:14 -0800 Subject: [openib-general] Re: [PATCH] mthca: fix memory leak In-Reply-To: <20051208161625.GA7653@mellanox.co.il> (Jack Morgenstein's message of "Thu, 8 Dec 2005 18:16:25 +0200") References: <20051208161625.GA7653@mellanox.co.il> Message-ID: Thanks, applied to svn and queued in git... - R. 
From rolandd at cisco.com Fri Dec 9 13:51:50 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 09 Dec 2005 21:51:50 +0000 Subject: [openib-general] [git patch review 2/5] IB/cm: correct reported reject code In-Reply-To: <1134165110300-0a7b2146d584150e@cisco.com> Message-ID: <1134165110300-7a2e27ea7ca96ec0@cisco.com> Change reject code from TIMEOUT to CONSUMER_REJECT when destroying a cm_id in the process of connecting. Signed-off-by: Sean Hefty Signed-off-by: Roland Dreier --- drivers/infiniband/core/cm.c | 13 +++++++++---- 1 files changed, 9 insertions(+), 4 deletions(-) 227eca83690da7dcbd698d3268e29402e0571723 diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 02110e0..1fe2186 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -684,6 +684,13 @@ retest: cm_reject_sidr_req(cm_id_priv, IB_SIDR_REJECT); break; case IB_CM_REQ_SENT: + ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg); + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + ib_send_cm_rej(cm_id, IB_CM_REJ_TIMEOUT, + &cm_id_priv->av.port->cm_dev->ca_guid, + sizeof cm_id_priv->av.port->cm_dev->ca_guid, + NULL, 0); + break; case IB_CM_MRA_REQ_RCVD: case IB_CM_REP_SENT: case IB_CM_MRA_REP_RCVD: @@ -694,10 +701,8 @@ retest: case IB_CM_REP_RCVD: case IB_CM_MRA_REP_SENT: spin_unlock_irqrestore(&cm_id_priv->lock, flags); - ib_send_cm_rej(cm_id, IB_CM_REJ_TIMEOUT, - &cm_id_priv->av.port->cm_dev->ca_guid, - sizeof cm_id_priv->av.port->cm_dev->ca_guid, - NULL, 0); + ib_send_cm_rej(cm_id, IB_CM_REJ_CONSUMER_DEFINED, + NULL, 0, NULL, 0); break; case IB_CM_ESTABLISHED: spin_unlock_irqrestore(&cm_id_priv->lock, flags); -- 0.99.9l From rolandd at cisco.com Fri Dec 9 13:51:50 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 09 Dec 2005 21:51:50 +0000 Subject: [openib-general] [git patch review 4/5] IB/umad: fix memory leaks In-Reply-To: <1134165110300-7535693e84cc230f@cisco.com> Message-ID: <1134165110301-ac635a95a66180bb@cisco.com> Don't leak packet if it had a timeout, and don't leak timeout struct if queue_packet() fails. Signed-off-by: Jack Morgenstein Signed-off-by: Michael S. Tsirkin Signed-off-by: Roland Dreier --- drivers/infiniband/core/user_mad.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) 0efc4883a6b3de12476cd7a35e638c0a9f5fd75f diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c index eb7f525..c908de8 100644 --- a/drivers/infiniband/core/user_mad.c +++ b/drivers/infiniband/core/user_mad.c @@ -197,8 +197,8 @@ static void send_handler(struct ib_mad_a memcpy(timeout->mad.data, packet->mad.data, sizeof (struct ib_mad_hdr)); - if (!queue_packet(file, agent, timeout)) - return; + if (queue_packet(file, agent, timeout)) + kfree(timeout); } out: kfree(packet); -- 0.99.9l From rolandd at cisco.com Fri Dec 9 13:51:50 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 09 Dec 2005 21:51:50 +0000 Subject: [openib-general] [git patch review 3/5] IB/cm: avoid reusing local ID In-Reply-To: <1134165110300-7a2e27ea7ca96ec0@cisco.com> Message-ID: <1134165110300-7535693e84cc230f@cisco.com> Use an increasing local ID to avoid re-using identifiers while messages may still be outstanding on the old ID. Without this, a quick connect-disconnect-connect sequence can fail by matching messages for the new connection with the old connection. 
Signed-off-by: Sean Hefty Signed-off-by: Roland Dreier --- drivers/infiniband/core/cm.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) de1bb1a64c29bae4f5330c70bd1dc6a62954c9f4 diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 1fe2186..3a611fe 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -308,10 +308,11 @@ static int cm_alloc_id(struct cm_id_priv { unsigned long flags; int ret; + static int next_id; do { spin_lock_irqsave(&cm.lock, flags); - ret = idr_get_new_above(&cm.local_id_table, cm_id_priv, 1, + ret = idr_get_new_above(&cm.local_id_table, cm_id_priv, next_id++, (__force int *) &cm_id_priv->id.local_id); spin_unlock_irqrestore(&cm.lock, flags); } while( (ret == -EAGAIN) && idr_pre_get(&cm.local_id_table, GFP_KERNEL) ); -- 0.99.9l From rolandd at cisco.com Fri Dec 9 13:51:50 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 09 Dec 2005 21:51:50 +0000 Subject: [openib-general] [git patch review 1/5] IB/mthca: fix QP size limits for mem-free HCAs Message-ID: <1134165110300-0a7b2146d584150e@cisco.com> Unlike tavor, the max work queue size is an exact power of 2 for arbel mode, despite what the documentation (of the QUERY_DEV_LIM firmware command) says. Without this patch, on Arbel, we can start with a QP of a valid size and get above the reported limit after rounding to the next power of two. Signed-off-by: Jack Morgenstein Signed-off-by: Michael S. Tsirkin Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mthca/mthca_cmd.c | 12 ++++++++---- 1 files changed, 8 insertions(+), 4 deletions(-) a3c8ab4fe8f006d742c24be677518bfa9862e732 diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c index 9ed3458..22ac72b 100644 --- a/drivers/infiniband/hw/mthca/mthca_cmd.c +++ b/drivers/infiniband/hw/mthca/mthca_cmd.c @@ -937,10 +937,6 @@ int mthca_QUERY_DEV_LIM(struct mthca_dev if (err) goto out; - MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_SRQ_SZ_OFFSET); - dev_lim->max_srq_sz = (1 << field) - 1; - MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_QP_SZ_OFFSET); - dev_lim->max_qp_sz = (1 << field) - 1; MTHCA_GET(field, outbox, QUERY_DEV_LIM_RSVD_QP_OFFSET); dev_lim->reserved_qps = 1 << (field & 0xf); MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_QP_OFFSET); @@ -1056,6 +1052,10 @@ int mthca_QUERY_DEV_LIM(struct mthca_dev mthca_dbg(dev, "Flags: %08x\n", dev_lim->flags); if (mthca_is_memfree(dev)) { + MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_SRQ_SZ_OFFSET); + dev_lim->max_srq_sz = 1 << field; + MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_QP_SZ_OFFSET); + dev_lim->max_qp_sz = 1 << field; MTHCA_GET(field, outbox, QUERY_DEV_LIM_RSZ_SRQ_OFFSET); dev_lim->hca.arbel.resize_srq = field & 1; MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_SG_RQ_OFFSET); @@ -1087,6 +1087,10 @@ int mthca_QUERY_DEV_LIM(struct mthca_dev mthca_dbg(dev, "Max ICM size %lld MB\n", (unsigned long long) dev_lim->hca.arbel.max_icm_sz >> 20); } else { + MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_SRQ_SZ_OFFSET); + dev_lim->max_srq_sz = (1 << field) - 1; + MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_QP_SZ_OFFSET); + dev_lim->max_qp_sz = (1 << field) - 1; MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_AV_OFFSET); dev_lim->hca.tavor.max_avs = 1 << (field & 0x3f); dev_lim->mpt_entry_sz = MTHCA_MPT_ENTRY_SIZE; -- 0.99.9l From rolandd at cisco.com Fri Dec 9 13:51:50 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 09 Dec 2005 21:51:50 +0000 Subject: [openib-general] [git patch review 5/5] IB/mthca: fix memory user DB table leak 
In-Reply-To: <1134165110301-ac635a95a66180bb@cisco.com> Message-ID: <1134165110301-b5d3e449a24a06fe@cisco.com> Free the memory allocated in mthca_init_user_db_tab() when releasing the db_tab in mthca_cleanup_user_db_tab(). Signed-off-by: Jack Morgenstein Signed-off-by: Michael S. Tsirkin Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mthca/mthca_memfree.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) 52d0df153c987e4ad57d15f5df91848f65858e5d diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c index d72fe95..5798ed0 100644 --- a/drivers/infiniband/hw/mthca/mthca_memfree.c +++ b/drivers/infiniband/hw/mthca/mthca_memfree.c @@ -485,6 +485,8 @@ void mthca_cleanup_user_db_tab(struct mt put_page(db_tab->page[i].mem.page); } } + + kfree(db_tab); } int mthca_alloc_db(struct mthca_dev *dev, enum mthca_db_type type, -- 0.99.9l From sean.hefty at intel.com Fri Dec 9 14:55:18 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 9 Dec 2005 14:55:18 -0800 Subject: [openib-general] [PATCH] [uCM] prevent userspace from using SDP/CMA SIDs Message-ID: The following patch rejects requests from userspace to use either the SDP or CMA service IDs. Signed-off-by: Sean Hefty Index: core/ucm.c =================================================================== --- core/ucm.c (revision 4356) +++ core/ucm.c (working copy) @@ -645,6 +645,17 @@ out: return result; } +static int ucm_validate_listen(__be64 service_id, __be64 service_mask) +{ + service_id &= service_mask; + + if (((service_id & IB_CMA_SERVICE_ID_MASK) == IB_CMA_SERVICE_ID) || + ((service_id & IB_SDP_SERVICE_ID_MASK) == IB_SDP_SERVICE_ID)) + return -EINVAL; + + return 0; +} + static ssize_t ib_ucm_listen(struct ib_ucm_file *file, const char __user *inbuf, int in_len, int out_len) @@ -660,8 +671,13 @@ static ssize_t ib_ucm_listen(struct ib_u if (IS_ERR(ctx)) return PTR_ERR(ctx); + result = ucm_validate_listen(cmd.service_id, cmd.service_mask); + if (result) + goto out; + result = ib_cm_listen(ctx->cm_id, cmd.service_id, cmd.service_mask, NULL); +out: ib_ucm_ctx_put(ctx); return result; } Index: include/rdma/ib_cm.h =================================================================== --- include/rdma/ib_cm.h (revision 4356) +++ include/rdma/ib_cm.h (working copy) @@ -317,6 +317,10 @@ void ib_destroy_cm_id(struct ib_cm_id *c #define IB_SERVICE_ID_AGN_MASK __constant_cpu_to_be64(0xFF00000000000000ULL) #define IB_CM_ASSIGN_SERVICE_ID __constant_cpu_to_be64(0x0200000000000000ULL) +#define IB_CMA_SERVICE_ID __constant_cpu_to_be64(0x0000000001000000ULL) +#define IB_CMA_SERVICE_ID_MASK __constant_cpu_to_be64(0xFFFFFFFFFF000000ULL) +#define IB_SDP_SERVICE_ID __constant_cpu_to_be64(0x0000000000010000ULL) +#define IB_SDP_SERVICE_ID_MASK __constant_cpu_to_be64(0xFFFFFFFFFFFF0000ULL) struct ib_cm_private_data_compare { u8 data[IB_CM_PRIVATE_DATA_COMPARE_SIZE]; From xma at us.ibm.com Fri Dec 9 16:11:03 2005 From: xma at us.ibm.com (Shirley Ma) Date: Fri, 9 Dec 2005 17:11:03 -0700 Subject: [openib-general] [PATCH] check create_srq in libibverbs Message-ID: create_srq is not a mandatory device function; therefore, in userspace/libibverbs/src/verbs.c, ibv_create_srq() should check that create_srq() is provided before calling it; otherwise the caller will hit a segmentation fault on a device which doesn't support SRQs.
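For illustration, a caller can then simply test the return value. A minimal sketch of the application-side pattern (the attribute values here are arbitrary, and error handling beyond the NULL check is omitted; with the patch below, ibv_create_srq() returns NULL rather than crashing when the driver provides no create_srq):

#include <stdio.h>
#include <infiniband/verbs.h>

/* Try to create an SRQ; returns NULL if the device has no SRQ support. */
static struct ibv_srq *try_create_srq(struct ibv_pd *pd)
{
	struct ibv_srq_init_attr attr = {
		.attr = { .max_wr = 128, .max_sge = 1 }
	};
	struct ibv_srq *srq = ibv_create_srq(pd, &attr);

	if (!srq)
		fprintf(stderr, "device does not support SRQs\n");
	return srq;
}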
Signed-off-by: Shirley Ma diff -urN userspace/libibverbs/src/verbs.c userspace-srq/libibverbs/src/verbs.c --- userspace/libibverbs/src/verbs.c 2005-11-14 13:44:52.000000000 -0800 +++ userspace-srq/libibverbs/src/verbs.c 2005-12-09 16:04:12.022433272 -0800 @@ -246,7 +246,9 @@ struct ibv_srq *ibv_create_srq(struct ibv_pd *pd, struct ibv_srq_init_attr *srq_init_attr) { - struct ibv_srq *srq = pd->context->ops.create_srq(pd, srq_init_attr); + struct ibv_srq *srq = NULL; + if (pd->context->ops.create_srq) + srq = pd->context->ops.create_srq(pd, srq_init_attr); if (srq) { srq->context = pd->context; Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ibv_srq.patch Type: application/octet-stream Size: 605 bytes Desc: not available URL: From rdreier at cisco.com Fri Dec 9 16:42:56 2005 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 09 Dec 2005 16:42:56 -0800 Subject: [openib-general] Re: mthca_qp patch In-Reply-To: <20051207154348.GZ21035@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 7 Dec 2005 17:43:48 +0200") References: <6AB138A2AB8C8E4A98B9C0C3D52670E3B8D6CD@mtlexch01.mtl.com> <20051207154348.GZ21035@mellanox.co.il> Message-ID: Thanks, I applied this and queued it in git as three separate patches. From rdreier at cisco.com Fri Dec 9 16:48:59 2005 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 09 Dec 2005 16:48:59 -0800 Subject: [openib-general] [PATCH] check create_srq in libibverbs In-Reply-To: (Shirley Ma's message of "Fri, 9 Dec 2005 17:11:03 -0700") References: Message-ID: Thanks, looks good. I'll apply this after some pending stuff I have in my tree... From rdreier at cisco.com Fri Dec 9 17:22:32 2005 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 09 Dec 2005 17:22:32 -0800 Subject: [openib-general] [PATCH/RFC] change ibv_get_devices() to ibv_get_device_list() Message-ID: This patch converts the ibv_get_devices() API to a better ibv_get_device_list(). The old API was bad because it exposed the dlist data structure from libsysfs, which was not thread-safe and was just plain overly complex for what it was used for. In addition, I've converted over all the in-tree users of ibv_get_devices() that I could find -- DAPL, libehca, libibcm, librdmacm and mvapich. I'm planning to commit this early next week; any objections, comments, or suggestions before I do so?
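For anyone converting out-of-tree code, typical usage of the new API looks like the following minimal sketch (modeled on the updated device_list.c example below; error handling is kept to the bare minimum):

#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
	struct ibv_device **dev_list;
	int num_devices;
	int i;

	/* Returns a NULL-terminated array; the count argument is optional. */
	dev_list = ibv_get_device_list(&num_devices);
	if (!dev_list) {
		fprintf(stderr, "No IB devices found\n");
		return 1;
	}

	for (i = 0; i < num_devices; ++i)
		printf("%s\n", ibv_get_device_name(dev_list[i]));

	ibv_free_device_list(dev_list);
	return 0;
}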
Thanks, Roland dapl/dapl/openib/dapl_ib_util.c | 36 ++++++----- dapl/dapl/openib_scm/dapl_ib_util.c | 35 +++++++---- libehca/configure.in | 2 libibcm/configure.in | 4 - libibcm/examples/cmpost.c | 7 -- libibverbs/ChangeLog | 13 +++- libibverbs/examples/asyncwatch.c | 16 +---- libibverbs/examples/device_list.c | 13 ++-- libibverbs/examples/devinfo.c | 101 +++++++++++++++----------------- libibverbs/examples/rc_pingpong.c | 9 +- libibverbs/examples/srq_pingpong.c | 9 +- libibverbs/examples/uc_pingpong.c | 9 +- libibverbs/examples/ud_pingpong.c | 9 +- libibverbs/include/infiniband/verbs.h | 16 ++++- libibverbs/src/device.c | 27 ++++++-- libibverbs/src/ibverbs.h | 11 +-- libibverbs/src/init.c | 72 +++++++++++++--------- libibverbs/src/libibverbs.map | 3 librdmacm/configure.in | 4 - librdmacm/src/cma.c | 14 ++-- mpi/mvapich-gen2/mpid/ch_gen2/viainit.c | 11 +++ perftest/rdma_bw.c | 9 +- perftest/rdma_lat.c | 9 +- 23 files changed, 255 insertions(+), 184 deletions(-) --- userspace/libibverbs/include/infiniband/verbs.h (revision 4360) +++ userspace/libibverbs/include/infiniband/verbs.h (working copy) @@ -585,9 +585,21 @@ struct ibv_context { }; /** - * ibv_get_devices - Return list of IB devices + * ibv_get_device_list - Get list of IB devices currently available + * @num_devices: optional. if non-NULL, set to the number of devices + * returned in the array. + * + * Return a NULL-terminated array of IB devices. The array can be + * released with ibv_free_device_list(). + */ +extern struct ibv_device **ibv_get_device_list(int *num_devices); + +/** + * ibv_free_device_list - Free list from ibv_get_device_list() + * + * Free an array of devices returned from ibv_get_device_list() */ -extern struct dlist *ibv_get_devices(void); +extern void ibv_free_device_list(struct ibv_device **list); /** * ibv_get_device_name - Return kernel device name --- userspace/libibverbs/ChangeLog (revision 4360) +++ userspace/libibverbs/ChangeLog (working copy) @@ -1,4 +1,15 @@ -2005-11-10 Sean Hefty +2005-11-11 Roland Dreier + + * examples/asyncwatch.c, examples/rc_pingpong.c, + examples/srq_pingpong.c, examples/uc_pingpong.c, + examples/ud_pingpong.c, examples/device_list.c, + examples/devinfo.c: Update examples to match new API. + + * include/infiniband/verbs.h, src/device.c, src/init.c, + src/ibverbs.h: Change from dlist-based ibv_get_devices() API to + simpler ibv_get_device_list() and ibv_free_device_list() API. + +2005-11-10 Sean Hefty * include/infiniband/sa-kern-abi.h: New include file to contain definitions of SA structures passed between userspace and kernel. 
--- userspace/libibverbs/src/libibverbs.map (revision 4360) +++ userspace/libibverbs/src/libibverbs.map (working copy) @@ -1,6 +1,7 @@ IBVERBS_1.0 { global: - ibv_get_devices; + ibv_get_device_list; + ibv_free_device_list; ibv_get_device_name; ibv_get_device_guid; ibv_open_device; --- userspace/libibverbs/src/device.c (revision 4360) +++ userspace/libibverbs/src/device.c (working copy) @@ -49,21 +49,36 @@ #include "ibverbs.h" static pthread_mutex_t device_list_lock = PTHREAD_MUTEX_INITIALIZER; -static struct dlist *device_list; +static int num_devices; +static struct ibv_device **device_list; -struct dlist *ibv_get_devices(void) +struct ibv_device **ibv_get_device_list(int *num) { - struct dlist *l; + struct ibv_device **l; + int i; pthread_mutex_lock(&device_list_lock); - if (!device_list) - device_list = ibverbs_init(); - l = device_list; + + if (!num_devices) + num_devices = ibverbs_init(&device_list); + + l = calloc(num_devices + 1, sizeof (struct ibv_device *)); + for (i = 0; i < num_devices; ++i) + l[i] = device_list[i]; + pthread_mutex_unlock(&device_list_lock); + if (num) + *num = l ? num_devices : 0; + return l; } +void ibv_free_device_list(struct ibv_device **list) +{ + free(list); +} + const char *ibv_get_device_name(struct ibv_device *device) { return device->ibdev->name; --- userspace/libibverbs/src/ibverbs.h (revision 4360) +++ userspace/libibverbs/src/ibverbs.h (working copy) @@ -47,7 +47,8 @@ #define PFX "libibverbs: " struct ibv_driver { - ibv_driver_init_func init_func; + ibv_driver_init_func init_func; + struct ibv_driver *next; }; struct ibv_abi_compat_v2 { @@ -57,11 +58,11 @@ struct ibv_abi_compat_v2 { extern HIDDEN int abi_ver; -extern struct dlist *ibverbs_init(void); +extern HIDDEN int ibverbs_init(struct ibv_device ***list); -extern int ibv_init_mem_map(void); -extern int ibv_lock_range(void *base, size_t size); -extern int ibv_unlock_range(void *base, size_t size); +extern HIDDEN int ibv_init_mem_map(void); +extern HIDDEN int ibv_lock_range(void *base, size_t size); +extern HIDDEN int ibv_unlock_range(void *base, size_t size); #define IBV_INIT_CMD(cmd, size, opcode) \ do { \ --- userspace/libibverbs/src/init.c (revision 4360) +++ userspace/libibverbs/src/init.c (working copy) @@ -55,7 +55,7 @@ HIDDEN int abi_ver; static char default_path[] = DRIVER_PATH; static const char *user_path; -static struct dlist *driver_list; +static struct ibv_driver *driver_list; static void load_driver(char *so_path) { @@ -82,7 +82,8 @@ static void load_driver(char *so_path) } driver->init_func = init_func; - dlist_push(driver_list, driver); + driver->next = driver_list; + driver_list = driver; } static void find_drivers(char *dir) @@ -112,8 +113,7 @@ static void find_drivers(char *dir) load_driver(so_glob.gl_pathv[i]); } -static void init_drivers(struct sysfs_class_device *verbs_dev, - struct dlist *device_list) +static struct ibv_device *init_drivers(struct sysfs_class_device *verbs_dev) { struct sysfs_class_device *ib_dev; struct sysfs_attribute *attr; @@ -125,7 +125,7 @@ static void init_drivers(struct sysfs_cl if (!attr) { fprintf(stderr, PFX "Warning: no ibdev class attr for %s\n", verbs_dev->name); - return; + return NULL; } sscanf(attr->value, "%63s", ibdev_name); @@ -134,19 +134,17 @@ static void init_drivers(struct sysfs_cl if (!ib_dev) { fprintf(stderr, PFX "Warning: no infiniband class device %s for %s\n", attr->value, verbs_dev->name); - return; + return NULL; } - dlist_for_each_data(driver_list, driver, struct ibv_driver) { + for (driver = driver_list; driver; driver =
driver->next) { dev = driver->init_func(verbs_dev); if (dev) { dev->dev = verbs_dev; dev->ibdev = ib_dev; dev->driver = driver; - dlist_push(device_list, dev); - - return; + return dev; } } @@ -155,6 +153,8 @@ static void init_drivers(struct sysfs_cl if (user_path) fprintf(stderr, "%s:", user_path); fprintf(stderr, "%s\n", default_path); + + return NULL; } static int check_abi_version(void) @@ -188,28 +188,23 @@ static int check_abi_version(void) } -struct dlist *ibverbs_init(void) +HIDDEN int ibverbs_init(struct ibv_device ***list) { char *wr_path, *dir; struct sysfs_class *cls; struct dlist *verbs_dev_list; - struct dlist *device_list; struct sysfs_class_device *verbs_dev; + struct ibv_device *device; + struct ibv_device **new_list; + int num_devices = 0; + int list_size = 0; - driver_list = dlist_new(sizeof (struct ibv_driver)); - device_list = dlist_new(sizeof (struct ibv_device)); - if (!driver_list || !device_list) { - fprintf(stderr, PFX "Fatal: couldn't allocate device/driver list.\n"); - abort(); - } + *list = NULL; if (ibv_init_mem_map()) - return NULL; + return 0; - /* - * Check if a driver is statically linked, and if so load it first. - */ - load_driver(NULL); + find_drivers(default_path); /* * Only follow the path passed in through the calling user's @@ -224,25 +219,42 @@ struct dlist *ibverbs_init(void) } } - find_drivers(default_path); + /* + * Now check if a driver is statically linked. Since we push + * drivers onto our driver list, the last driver we find will + * be the first one we try. + */ + load_driver(NULL); cls = sysfs_open_class("infiniband_verbs"); if (!cls) { fprintf(stderr, PFX "Fatal: couldn't open sysfs class 'infiniband_verbs'.\n"); - return NULL; + return 0; } if (check_abi_version()) - return NULL; + return 0; verbs_dev_list = sysfs_get_class_devices(cls); if (!verbs_dev_list) { fprintf(stderr, PFX "Fatal: no infiniband class devices found.\n"); - return NULL; + return 0; } - dlist_for_each_data(verbs_dev_list, verbs_dev, struct sysfs_class_device) - init_drivers(verbs_dev, device_list); + dlist_for_each_data(verbs_dev_list, verbs_dev, struct sysfs_class_device) { + device = init_drivers(verbs_dev); + if (device) { + if (list_size <= num_devices) { + list_size = list_size ? 
list_size * 2 : 1; + new_list = realloc(*list, list_size * sizeof (struct ibv_device *)); + if (!new_list) + goto out; + *list = new_list; + } + (*list)[num_devices++] = device; + } - return device_list; +out: + return num_devices; } --- userspace/libibverbs/examples/asyncwatch.c (revision 4360) +++ userspace/libibverbs/examples/asyncwatch.c (working copy) @@ -50,34 +50,30 @@ static inline uint64_t be64_to_cpu(uint6 int main(int argc, char *argv[]) { - struct dlist *dev_list; - struct ibv_device *ib_dev; + struct ibv_device **dev_list; struct ibv_context *context; struct ibv_async_event event; - dev_list = ibv_get_devices(); + dev_list = ibv_get_device_list(NULL); if (!dev_list) { fprintf(stderr, "No IB devices found\n"); return 1; } - dlist_start(dev_list); - ib_dev = dlist_next(dev_list); - - if (!ib_dev) { + if (!*dev_list) { fprintf(stderr, "No IB devices found\n"); return 1; } - context = ibv_open_device(ib_dev); + context = ibv_open_device(*dev_list); if (!context) { fprintf(stderr, "Couldn't get context for %s\n", - ibv_get_device_name(ib_dev)); + ibv_get_device_name(*dev_list)); return 1; } printf("%s: async event FD %d\n", - ibv_get_device_name(ib_dev), context->async_fd); + ibv_get_device_name(*dev_list), context->async_fd); while (1) { if (ibv_get_async_event(context, &event)) --- userspace/libibverbs/examples/rc_pingpong.c (revision 4360) +++ userspace/libibverbs/examples/rc_pingpong.c (working copy) @@ -447,7 +447,7 @@ static void usage(const char *argv0) int main(int argc, char *argv[]) { - struct dlist *dev_list; + struct ibv_device **dev_list; struct ibv_device *ib_dev; struct pingpong_context *ctx; struct pingpong_dest my_dest; @@ -536,21 +536,20 @@ int main(int argc, char *argv[]) page_size = sysconf(_SC_PAGESIZE); - dev_list = ibv_get_devices(); + dev_list = ibv_get_device_list(NULL); if (!dev_list) { fprintf(stderr, "No IB devices found\n"); return 1; } - dlist_start(dev_list); if (!ib_devname) { - ib_dev = dlist_next(dev_list); + ib_dev = *dev_list; if (!ib_dev) { fprintf(stderr, "No IB devices found\n"); return 1; } } else { - dlist_for_each_data(dev_list, ib_dev, struct ibv_device) + for (; (ib_dev = *dev_list); ++dev_list) if (!strcmp(ibv_get_device_name(ib_dev), ib_devname)) break; if (!ib_dev) { --- userspace/libibverbs/examples/srq_pingpong.c (revision 4360) +++ userspace/libibverbs/examples/srq_pingpong.c (working copy) @@ -509,7 +509,7 @@ static void usage(const char *argv0) int main(int argc, char *argv[]) { - struct dlist *dev_list; + struct ibv_device **dev_list; struct ibv_device *ib_dev; struct pingpong_context *ctx; struct pingpong_dest my_dest[MAX_QP]; @@ -605,21 +605,20 @@ int main(int argc, char *argv[]) page_size = sysconf(_SC_PAGESIZE); - dev_list = ibv_get_devices(); + dev_list = ibv_get_device_list(NULL); if (!dev_list) { fprintf(stderr, "No IB devices found\n"); return 1; } - dlist_start(dev_list); if (!ib_devname) { - ib_dev = dlist_next(dev_list); + ib_dev = *dev_list; if (!ib_dev) { fprintf(stderr, "No IB devices found\n"); return 1; } } else { - dlist_for_each_data(dev_list, ib_dev, struct ibv_device) + for (; (ib_dev = *dev_list); ++dev_list) if (!strcmp(ibv_get_device_name(ib_dev), ib_devname)) break; if (!ib_dev) { --- userspace/libibverbs/examples/uc_pingpong.c (revision 4360) +++ userspace/libibverbs/examples/uc_pingpong.c (working copy) @@ -435,7 +435,7 @@ static void usage(const char *argv0) int main(int argc, char *argv[]) { - struct dlist *dev_list; + struct ibv_device **dev_list; struct ibv_device *ib_dev; struct
pingpong_context *ctx; struct pingpong_dest my_dest; @@ -524,21 +524,20 @@ int main(int argc, char *argv[]) page_size = sysconf(_SC_PAGESIZE); - dev_list = ibv_get_devices(); + dev_list = ibv_get_device_list(NULL); if (!dev_list) { fprintf(stderr, "No IB devices found\n"); return 1; } - dlist_start(dev_list); if (!ib_devname) { - ib_dev = dlist_next(dev_list); + ib_dev = *dev_list; if (!ib_dev) { fprintf(stderr, "No IB devices found\n"); return 1; } } else { - dlist_for_each_data(dev_list, ib_dev, struct ibv_device) + for (ib_dev = *dev_list; ib_dev; ++dev_list) if (!strcmp(ibv_get_device_name(ib_dev), ib_devname)) break; if (!ib_dev) { --- userspace/libibverbs/examples/ud_pingpong.c (revision 4360) +++ userspace/libibverbs/examples/ud_pingpong.c (working copy) @@ -443,7 +443,7 @@ static void usage(const char *argv0) int main(int argc, char *argv[]) { - struct dlist *dev_list; + struct ibv_device **dev_list; struct ibv_device *ib_dev; struct pingpong_context *ctx; struct pingpong_dest my_dest; @@ -532,21 +532,20 @@ int main(int argc, char *argv[]) page_size = sysconf(_SC_PAGESIZE); - dev_list = ibv_get_devices(); + dev_list = ibv_get_device_list(NULL); if (!dev_list) { fprintf(stderr, "No IB devices found\n"); return 1; } - dlist_start(dev_list); if (!ib_devname) { - ib_dev = dlist_next(dev_list); + ib_dev = *dev_list; if (!ib_dev) { fprintf(stderr, "No IB devices found\n"); return 1; } } else { - dlist_for_each_data(dev_list, ib_dev, struct ibv_device) + for (ib_dev = *dev_list; ib_dev; ++dev_list) if (!strcmp(ibv_get_device_name(ib_dev), ib_devname)) break; if (!ib_dev) { --- userspace/libibverbs/examples/device_list.c (revision 4360) +++ userspace/libibverbs/examples/device_list.c (working copy) @@ -51,10 +51,9 @@ static inline uint64_t be64_to_cpu(uint6 int main(int argc, char *argv[]) { - struct dlist *dev_list; - struct ibv_device *ib_dev; + struct ibv_device **dev_list; - dev_list = ibv_get_devices(); + dev_list = ibv_get_device_list(NULL); if (!dev_list) { fprintf(stderr, "No IB devices found\n"); return 1; @@ -63,10 +62,12 @@ int main(int argc, char *argv[]) printf(" %-16s\t node GUID\n", "device"); printf(" %-16s\t----------------\n", "------"); - dlist_for_each_data(dev_list, ib_dev, struct ibv_device) + while (*dev_list) { printf(" %-16s\t%016llx\n", - ibv_get_device_name(ib_dev), - (unsigned long long) be64_to_cpu(ibv_get_device_guid(ib_dev))); + ibv_get_device_name(*dev_list), + (unsigned long long) be64_to_cpu(ibv_get_device_guid(*dev_list))); + ++dev_list; + } return 0; } --- userspace/libibverbs/examples/devinfo.c (revision 4360) +++ userspace/libibverbs/examples/devinfo.c (working copy) @@ -299,11 +299,11 @@ cleanup: static void usage(const char *argv0) { - printf("Usage: %s print the ca attributes\n", argv0); - printf("\n"); - printf("Options:\n"); - printf(" -d, --ib-dev= use IB device (default first device found)\n"); - printf(" -i, --ib-port= use port of IB device (default all ports)\n"); + printf("Usage: %s print the ca attributes\n", argv0); + printf("\n"); + printf("Options:\n"); + printf(" -d, --ib-dev= use IB device (default first device found)\n"); + printf(" -i, --ib-port= use port of IB device (default all ports)\n"); printf(" -l, --list print only the IB devices names\n"); printf(" -v, --verbose print all the attributes of the IB device(s)\n"); } @@ -312,60 +312,56 @@ int main(int argc, char *argv[]) { char *ib_devname = NULL; int ret = 0; - struct dlist *dev_list; - struct ibv_device *ib_dev; + struct ibv_device **dev_list; int num_of_hcas; int ib_port = 0; 
/* parse command line options */ while (1) { int c; - static struct option long_options[] = { - { .name = "ib-dev", .has_arg = 1, .val = 'd' }, - { .name = "ib-port", .has_arg = 1, .val = 'i' }, + static struct option long_options[] = { + { .name = "ib-dev", .has_arg = 1, .val = 'd' }, + { .name = "ib-port", .has_arg = 1, .val = 'i' }, { .name = "list", .has_arg = 0, .val = 'l' }, - { .name = "verbose", .has_arg = 0, .val = 'v' }, - { 0, 0, 0, 0} - }; + { .name = "verbose", .has_arg = 0, .val = 'v' }, + { 0, 0, 0, 0} + }; - c = getopt_long(argc, argv, "d:i:lv", long_options, NULL); - if (c == -1) - break; - - switch (c) { - case 'd': - ib_devname = strdup(optarg); - break; - - case 'i': - ib_port = strtol(optarg, NULL, 0); - if (ib_port < 0) { - usage(argv[0]); - return 1; - } - break; + c = getopt_long(argc, argv, "d:i:lv", long_options, NULL); + if (c == -1) + break; + + switch (c) { + case 'd': + ib_devname = strdup(optarg); + break; + + case 'i': + ib_port = strtol(optarg, NULL, 0); + if (ib_port < 0) { + usage(argv[0]); + return 1; + } + break; case 'v': - verbose = 1; - break; + verbose = 1; + break; case 'l': - dev_list = ibv_get_devices(); + dev_list = ibv_get_device_list(&num_of_hcas); if (!dev_list) { fprintf(stderr, "Failed to get IB devices list"); return -1; } - num_of_hcas = 0; - dlist_for_each_data(dev_list, ib_dev, struct ibv_device) - num_of_hcas ++; - printf("%d HCA%s found:\n", num_of_hcas, num_of_hcas != 1 ? "s" : ""); - dlist_start(dev_list); - dlist_for_each_data(dev_list, ib_dev, struct ibv_device) - printf("\t%s\n", ibv_get_device_name(ib_dev)); + while (*dev_list) { + printf("\t%s\n", ibv_get_device_name(*dev_list)); + ++dev_list; + } printf("\n"); return 0; @@ -376,28 +372,31 @@ int main(int argc, char *argv[]) } } - dev_list = ibv_get_devices(); + dev_list = ibv_get_device_list(NULL); if (!dev_list) { fprintf(stderr, "Failed to get IB device list\n"); return -1; } - dlist_start(dev_list); + if (ib_devname) { - dlist_for_each_data(dev_list, ib_dev, struct ibv_device) - if (!strcmp(ibv_get_device_name(ib_dev), ib_devname)) + while (*dev_list) { + if (!strcmp(ibv_get_device_name(*dev_list), ib_devname)) break; - if (!ib_dev) { + ++dev_list; + } + + if (!*dev_list) { fprintf(stderr, "IB device '%s' wasn't found\n", ib_devname); return -1; } - ret |= print_hca_cap(ib_dev, ib_port); + + ret |= print_hca_cap(*dev_list, ib_port); } else { - ib_dev = dlist_next(dev_list); - if (!ib_dev) { - fprintf(stderr, "No IB devices found\n"); - return -1; - } - ret |= print_hca_cap(ib_dev, ib_port); + if (!*dev_list) { + fprintf(stderr, "No IB devices found\n"); + return -1; + } + ret |= print_hca_cap(*dev_list, ib_port); } if (ib_devname) --- userspace/dapl/dapl/openib/dapl_ib_util.c (revision 4360) +++ userspace/dapl/dapl/openib/dapl_ib_util.c (working copy) @@ -206,29 +206,34 @@ DAT_RETURN dapls_ib_open_hca ( IN IB_HCA_NAME hca_name, IN DAPL_HCA *hca_ptr) { - struct dlist *dev_list; + struct ibv_device **dev_list; long opts; + int i; dapl_dbg_log (DAPL_DBG_TYPE_UTIL, " open_hca: %s - %p\n", hca_name, hca_ptr ); /* Get list of all IB devices, find match, open */ - dev_list = ibv_get_devices(); - dlist_start(dev_list); - dlist_for_each_data(dev_list, - hca_ptr->ib_trans.ib_dev, - struct ibv_device) { - if (!strcmp(ibv_get_device_name(hca_ptr->ib_trans.ib_dev), - hca_name)) - break; - } - - if (!hca_ptr->ib_trans.ib_dev) { + dev_list = ibv_get_device_list(NULL); + if (!dev_list) { dapl_dbg_log (DAPL_DBG_TYPE_ERR, - " open_hca: IB device %s not found\n", + " open_hca: 
ibv_get_device_list() failed\n", hca_name); return DAT_INTERNAL_ERROR; } + + for (i = 0; dev_list[i]; ++i) { + hca_ptr->ib_trans.ib_dev = dev_list[i]; + if (!strcmp(ibv_get_device_name(hca_ptr->ib_trans.ib_dev), + hca_name)) + goto found; + } + + dapl_dbg_log (DAPL_DBG_TYPE_ERR, + " open_hca: IB device %s not found\n", + hca_name); + goto err; +found: dapl_dbg_log ( DAPL_DBG_TYPE_UTIL," open_hca: Found dev %s %016llx\n", ibv_get_device_name(hca_ptr->ib_trans.ib_dev), @@ -240,7 +245,7 @@ DAT_RETURN dapls_ib_open_hca ( dapl_dbg_log (DAPL_DBG_TYPE_ERR, " open_hca: IB dev open failed for %s\n", ibv_get_device_name(hca_ptr->ib_trans.ib_dev)); - return DAT_INTERNAL_ERROR; + goto err; } hca_ptr->ib_trans.ib_ctx = hca_ptr->ib_hca_handle; @@ -336,11 +341,14 @@ DAT_RETURN dapls_ib_open_hca ( hca_ptr->ib_trans.max_inline_send ); hca_ptr->ib_trans.d_hca = hca_ptr; + ibv_free_device_list(dev_list); return DAT_SUCCESS; bail: ibv_close_device(hca_ptr->ib_hca_handle); hca_ptr->ib_hca_handle = IB_INVALID_HANDLE; +err: + ibv_free_device_list(dev_list); return DAT_INTERNAL_ERROR; } --- userspace/dapl/dapl/openib_scm/dapl_ib_util.c (revision 4360) +++ userspace/dapl/dapl/openib_scm/dapl_ib_util.c (working copy) @@ -131,28 +131,35 @@ DAT_RETURN dapls_ib_open_hca ( IN IB_HCA_NAME hca_name, IN DAPL_HCA *hca_ptr) { - struct dlist *dev_list; + struct ibv_device **dev_list; int opts; + int i; DAT_RETURN dat_status = DAT_SUCCESS; dapl_dbg_log (DAPL_DBG_TYPE_UTIL, " open_hca: %s - %p\n", hca_name, hca_ptr ); /* Get list of all IB devices, find match, open */ - dev_list = ibv_get_devices(); - dlist_start(dev_list); - dlist_for_each_data(dev_list,hca_ptr->ib_trans.ib_dev,struct ibv_device) { - if (!strcmp(ibv_get_device_name(hca_ptr->ib_trans.ib_dev),hca_name)) - break; - } - - if (!hca_ptr->ib_trans.ib_dev) { + dev_list = ibv_get_device_list(NULL); + if (!dev_list) { dapl_dbg_log (DAPL_DBG_TYPE_ERR, - " open_hca: IB device %s not found\n", + " open_hca: ibv_get_device_list() failed\n", hca_name); return DAT_INTERNAL_ERROR; } - + + for (i = 0; dev_list[i]; ++i) { + hca_ptr->ib_trans.ib_dev = dev_list[i]; + if (!strcmp(ibv_get_device_name(hca_ptr->ib_trans.ib_dev),hca_name)) + goto found; + } + + dapl_dbg_log (DAPL_DBG_TYPE_ERR, + " open_hca: IB device %s not found\n", + hca_name); + goto err; + +found: dapl_dbg_log (DAPL_DBG_TYPE_UTIL," open_hca: Found dev %s %016llx\n", ibv_get_device_name(hca_ptr->ib_trans.ib_dev), (unsigned long long)bswap_64(ibv_get_device_guid(hca_ptr->ib_trans.ib_dev))); @@ -162,7 +169,7 @@ DAT_RETURN dapls_ib_open_hca ( dapl_dbg_log (DAPL_DBG_TYPE_ERR, " open_hca: IB dev open failed for %s\n", ibv_get_device_name(hca_ptr->ib_trans.ib_dev) ); - return DAT_INTERNAL_ERROR; + goto err; } /* set inline max with enviroment or default */ @@ -242,10 +249,14 @@ DAT_RETURN dapls_ib_open_hca ( ((struct sockaddr_in *)&hca_ptr->hca_address)->sin_addr.s_addr >> 16 & 0xff, ((struct sockaddr_in *)&hca_ptr->hca_address)->sin_addr.s_addr >> 24 & 0xff ); + ibv_free_device_list(dev_list); return dat_status; + bail: ibv_close_device(hca_ptr->ib_hca_handle); hca_ptr->ib_hca_handle = IB_INVALID_HANDLE; +err: + ibv_free_device_list(dev_list); return DAT_INTERNAL_ERROR; } --- userspace/mpi/mvapich-gen2/mpid/ch_gen2/viainit.c (revision 4360) +++ userspace/mpi/mvapich-gen2/mpid/ch_gen2/viainit.c (working copy) @@ -74,13 +74,22 @@ static void set_malloc_options(void) static void open_hca(void) { - struct dlist *dev_list; struct ibv_device *ib_dev = NULL; +#ifdef GEN2_OLD_DEVICE_LIST_VERB + struct dlist *dev_list; + 
dev_list = ibv_get_devices(); dlist_start(dev_list); ib_dev = dlist_next(dev_list); +#else + struct ibv_device **dev_list; + + dev_list = ibv_get_device_list(NULL); + ib_dev = dev_list[0]; + ibv_free_device_list(dev_list); +#endif if (!ib_dev) { fprintf(stderr, "No IB devices found\n"); --- userspace/libehca/configure.in (revision 4360) +++ userspace/libehca/configure.in (working copy) @@ -12,7 +12,7 @@ AC_HEADER_STDC dnl Checks for libraries. AC_CHECK_LIB(ibverbs, - ibv_get_devices, + ibv_get_device_list, [], AC_MSG_ERROR([libibverbs not installed])) --- userspace/librdmacm/configure.in (revision 4360) +++ userspace/librdmacm/configure.in (working copy) @@ -25,8 +25,8 @@ AC_CHECK_SIZEOF(long) dnl Checks for libraries if test "$disable_libcheck" != "yes" then -AC_CHECK_LIB(ibverbs, ibv_get_devices, [], - AC_MSG_ERROR([ibv_get_devices() not found. librdmacm requires libibverbs.])) +AC_CHECK_LIB(ibverbs, ibv_get_device_list, [], + AC_MSG_ERROR([ibv_get_device_list() not found. librdmacm requires libibverbs.])) fi dnl Checks for header files. --- userspace/librdmacm/src/cma.c (revision 4360) +++ userspace/librdmacm/src/cma.c (working copy) @@ -114,7 +114,7 @@ struct cma_id_private { uint32_t handle; }; -static struct dlist *dev_list; +static struct ibv_device **dev_list; static struct dlist *cma_dev_list; static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER; static int ucma_initialized; @@ -141,7 +141,7 @@ static void ucma_cleanup(void) static int ucma_init(void) { - struct ibv_device *dev; + int i; struct cma_device *cma_dev; struct ibv_device_attr attr; int ret; @@ -163,22 +163,22 @@ static int ucma_init(void) goto err; } - dev_list = ibv_get_devices(); + dev_list = ibv_get_device_list(NULL); if (!dev_list) { printf("CMA: unable to get RDMA device liste\n"); ret = -ENODEV; goto err; } - dlist_for_each_data(dev_list, dev, struct ibv_device) { + for (i = 0; dev_list[i]; ++i) { cma_dev = malloc(sizeof *cma_dev); if (!cma_dev) { ret = -ENOMEM; goto err; } - cma_dev->guid = ibv_get_device_guid(dev); - cma_dev->verbs = ibv_open_device(dev); + cma_dev->guid = ibv_get_device_guid(dev_list[i]); + cma_dev->verbs = ibv_open_device(dev_list[i]); if (!cma_dev->verbs) { printf("CMA: unable to open RDMA device\n"); ret = -ENODEV; @@ -201,6 +201,8 @@ out: err: ucma_cleanup(); pthread_mutex_unlock(&mut); + if (dev_list) + ibv_free_device_list(dev_list); return ret; } --- userspace/perftest/rdma_lat.c (revision 4360) +++ userspace/perftest/rdma_lat.c (working copy) @@ -105,18 +105,17 @@ static uint16_t pp_get_local_lid(struct static struct ibv_device *pp_find_dev(const char *ib_devname) { - struct dlist *dev_list; + struct ibv_device **dev_list; struct ibv_device *ib_dev = NULL; - dev_list = ibv_get_devices(); + dev_list = ibv_get_device_list(NULL); - dlist_start(dev_list); if (!ib_devname) { - ib_dev = dlist_next(dev_list); + ib_dev = dev_list[0]; if (!ib_dev) fprintf(stderr, "No IB devices found\n"); } else { - dlist_for_each_data(dev_list, ib_dev, struct ibv_device) + for (ib_dev = *dev_list; ib_dev; ++dev_list) if (!strcmp(ibv_get_device_name(ib_dev), ib_devname)) break; if (!ib_dev) --- userspace/perftest/rdma_bw.c (revision 4360) +++ userspace/perftest/rdma_bw.c (working copy) @@ -472,7 +472,7 @@ static void print_report(unsigned int it int main(int argc, char *argv[]) { - struct dlist *dev_list; + struct ibv_device **dev_list; struct ibv_device *ib_dev; struct pingpong_context *ctx; struct pingpong_dest my_dest; @@ -587,17 +587,16 @@ int main(int argc, char *argv[]) page_size = 
sysconf(_SC_PAGESIZE); - dev_list = ibv_get_devices(); + dev_list = ibv_get_device_list(NULL); - dlist_start(dev_list); if (!ib_devname) { - ib_dev = dlist_next(dev_list); + ib_dev = dev_list[0]; if (!ib_dev) { fprintf(stderr, "No IB devices found\n"); return 1; } } else { - dlist_for_each_data(dev_list, ib_dev, struct ibv_device) + for (ib_dev = *dev_list; ib_dev; ++dev_list) if (!strcmp(ibv_get_device_name(ib_dev), ib_devname)) break; if (!ib_dev) { --- userspace/libibcm/configure.in (revision 4360) +++ userspace/libibcm/configure.in (working copy) @@ -25,8 +25,8 @@ AC_CHECK_SIZEOF(long) dnl Checks for libraries if test "$disable_libcheck" != "yes" then -AC_CHECK_LIB(ibverbs, ibv_get_devices, [], - AC_MSG_ERROR([ibv_get_devices() not found. libibcm requires libibverbs.])) +AC_CHECK_LIB(ibverbs, ibv_get_device_list, [], + AC_MSG_ERROR([ibv_get_device_list() not found. libibcm requires libibverbs.])) #AC_CHECK_LIB(rdmacm, rdma_create_id, [], # AC_MSG_ERROR([rdma_create_id() not found. ucmpost requires librdmacm.])) fi --- userspace/libibcm/examples/cmpost.c (revision 4360) +++ userspace/libibcm/examples/cmpost.c (working copy) @@ -423,15 +423,14 @@ static void destroy_messages(void) static int init(void) { - struct dlist *dev_list; + struct ibv_device **dev_list; int ret; test.connects_left = connections; test.disconnects_left = connections; - dev_list = ibv_get_devices(); - dlist_start(dev_list); - test.device = dlist_next(dev_list); + dev_list = ibv_get_device_list(NULL); + test.device = dev_list[0]; if (!test.device) return -1; From sean.hefty at intel.com Sat Dec 10 09:49:49 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Sat, 10 Dec 2005 09:49:49 -0800 Subject: [openib-general] [PATCH/RFC] change ibv_get_devices() toibv_get_device_list() In-Reply-To: Message-ID: >I'm planning to commit this early next week; any objections, comments, >or suggestions before I do so? I'm in favor of this change. - Sean From mst at mellanox.co.il Sat Dec 10 11:17:43 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sat, 10 Dec 2005 21:17:43 +0200 Subject: [openib-general] Re: [PATCH/RFC] change ibv_get_devices() to ibv_get_device_list() In-Reply-To: References: Message-ID: <20051210191743.GB30682@mellanox.co.il> Quoting r. Roland Dreier : > Subject: [PATCH/RFC] change ibv_get_devices() to ibv_get_device_list() > > This patch converts the ibv_get_devices() API to a better > ibv_get_device_list(). The old API was bad because it exposed the > dlist data structure exposed by libsysfs, which was not thread-safe > and was just plain overly complex for what it was used for. > > In addition, I've converted over all the in-tree users of > ibv_get_devices() that I could find -- DAPL, libehca, libibcm, > librdmacm and mvapich. > > I'm planning to commit this early next week; any objections, comments, > or suggestions before I do so? > > Thanks, > Roland To make hotplug feasible, we need to document requirements that 1. users call free_device_list after opening relevant devices 2. no user opens a device after calling free_device_list -- MST From mst at mellanox.co.il Sat Dec 10 13:21:40 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sat, 10 Dec 2005 23:21:40 +0200 Subject: [openib-general] [PATCH applied] sdp: fix kunmap_atomic usage Message-ID: <20051210212140.GA30971@mellanox.co.il> SDP was using kunmap_atomic incorrectly. Of course, I'm typically using it on platforms where its a nop, so I dint notice :) The following is already applied. 
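For reference, kmap_atomic() returns a kernel virtual address, and kunmap_atomic() must be handed that same address back when tearing the mapping down. A minimal sketch of the correct pairing (copy_from_page() is a hypothetical helper for illustration only, not code from the patch; the KM_IRQ0 slot and the irq save/restore mirror how SDP uses the mapping):

#include <linux/highmem.h>
#include <linux/string.h>

static void copy_from_page(struct page *page, void *dest,
                           unsigned long off, size_t len)
{
	unsigned long flags;
	void *addr;

	local_irq_save(flags);
	addr = kmap_atomic(page, KM_IRQ0);	/* map: returns a virtual address */
	memcpy(dest, addr + off, len);
	kunmap_atomic(addr, KM_IRQ0);		/* unmap with that address, not the page */
	local_irq_restore(flags);
}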
--- kunmap_atomic gets a virtual address, not a page* pointer. Signed-off-by: Michael S. Tsirkin Index: openib/drivers/infiniband/ulp/sdp/sdp_send.c =================================================================== --- openib/drivers/infiniband/ulp/sdp/sdp_send.c (revision 4369) +++ openib/drivers/infiniband/ulp/sdp/sdp_send.c (working copy) @@ -647,7 +647,7 @@ static int sdp_send_data_iocb_src(struct memcpy(buff->tail, addr + off, len); - kunmap_atomic(iocb->page_array[pos], KM_IRQ0); + kunmap_atomic(addr, KM_IRQ0); local_irq_restore(flags); Index: openib/drivers/infiniband/ulp/sdp/sdp_recv.c =================================================================== --- openib/drivers/infiniband/ulp/sdp/sdp_recv.c (revision 4369) +++ openib/drivers/infiniband/ulp/sdp/sdp_recv.c (working copy) @@ -618,7 +618,7 @@ static int sdp_read_buff_iocb(struct sdp iocb->io_addr += copy; - kunmap_atomic(iocb->page_array[counter], KM_IRQ0); + kunmap_atomic(addr, KM_IRQ0); ++counter; local_irq_restore(flags); -- MST From mst at mellanox.co.il Sat Dec 10 13:48:32 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sat, 10 Dec 2005 23:48:32 +0200 Subject: [openib-general] Re: [PATCH/RFC] change ibv_get_devices() to ibv_get_device_list() In-Reply-To: References: Message-ID: <20051210214832.GA31057@mellanox.co.il> Quoting Roland Dreier : > --- userspace/mpi/mvapich-gen2/mpid/ch_gen2/viainit.c (revision 4360) > +++ userspace/mpi/mvapich-gen2/mpid/ch_gen2/viainit.c (working copy) > @@ -74,13 +74,22 @@ static void set_malloc_options(void) > > static void open_hca(void) > { > - struct dlist *dev_list; > struct ibv_device *ib_dev = NULL; > > +#ifdef GEN2_OLD_DEVICE_LIST_VERB > + struct dlist *dev_list; > + > dev_list = ibv_get_devices(); > > dlist_start(dev_list); > ib_dev = dlist_next(dev_list); > +#else > + struct ibv_device **dev_list; > + > + dev_list = ibv_get_device_list(NULL); > + ib_dev = dev_list[0]; > + ibv_free_device_list(dev_list); > +#endif > > if (!ib_dev) { > fprintf(stderr, "No IB devices found\n"); This wont work for hotplug: you are saving the device pointer without opening the device, so it might go away from under your feet. I wander whether we can come up with an API that helps people get it right more easily? -- MST From iod00d at hp.com Sat Dec 10 19:23:47 2005 From: iod00d at hp.com (Grant Grundler) Date: Sat, 10 Dec 2005 19:23:47 -0800 Subject: [openib-general] [PATCH] better warning about libsdp.conf location Message-ID: <20051211032347.GC9348@esmail.cup.hp.com> Michael, When LIBSDP_DEFAULT_CONFIG_FILE isn't set, the default location lidsdp looks for libsdp.conf doesn't match where the Makefile installs it (sysconfdir = /usr/local/etc). Patch below also provides a _useful_ warning message by indicating *why* we are warning the user and the default location (which might vary by release). It just occurred to me that libsdp could set LIBSDP_CONFIG_FILE so the warning doesn't appear on the next invocation. Oh well, idea for another patch... 
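In sketch form, that follow-on idea might look like the fragment below inside __sdp_init() (an illustration only, assuming setenv() is safe to call at library-init time; config_file, __sdp_read_config() and LIBSDP_DEFAULT_CONFIG_FILE are the names used in the patch that follows):

	config_file = getenv("LIBSDP_CONFIG_FILE");
	if (!config_file) {
		config_file = LIBSDP_DEFAULT_CONFIG_FILE;
		printf("libsdp.so: $LIBSDP_CONFIG_FILE not set. Using %s\n",
		       config_file);
		/* Export the default so exec'd children inherit the path and
		 * skip this warning; the 0 means never overwrite a user value. */
		setenv("LIBSDP_CONFIG_FILE", config_file, 0);
	}
	__sdp_read_config(config_file);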
thanks, grant Signed-off-by: Grant Grundler Index: src/userspace/libsdp/src/port.c =================================================================== --- src/userspace/libsdp/src/port.c (revision 4356) +++ src/userspace/libsdp/src/port.c (working copy) @@ -1202,8 +1202,9 @@ if (config_file) { __sdp_read_config(config_file); } else { - printf("default libsdp configuration is used\n"); -#define LIBSDP_DEFAULT_CONFIG_FILE "/usr/local/ibgd/etc/libsdp.conf" +#define LIBSDP_DEFAULT_CONFIG_FILE "/usr/local/etc/libsdp.conf" + printf("libsdp.so: $LIBSDP_CONFIG_FILE not set. Using " + LIBSDP_DEFAULT_CONFIG_FILE "\n"); __sdp_read_config(LIBSDP_DEFAULT_CONFIG_FILE); } } /* __sdp_init */ From iod00d at hp.com Sat Dec 10 20:15:13 2005 From: iod00d at hp.com (Grant Grundler) Date: Sat, 10 Dec 2005 20:15:13 -0800 Subject: [openib-general] [PATCH applied] sdp: fix kunmap_atomic usage In-Reply-To: <20051210212140.GA30971@mellanox.co.il> References: <20051210212140.GA30971@mellanox.co.il> Message-ID: <20051211041513.GD9348@esmail.cup.hp.com> On Sat, Dec 10, 2005 at 11:21:40PM +0200, Michael S. Tsirkin wrote: > SDP was using kunmap_atomic incorrectly. > Of course, I'm typically using it on platforms where its a nop, > so I dint notice :) It's a real function on ia64 so I had to try this. :) One of the recent changes (possibly this one) seems to have fixed the issue! I'll have to run a full set but the initial test was promising. IIRC, the most recent "failure" was with r4279. With r4371, I'm now getting: gsyprf3:~# LD_PRELOAD=/usr/local/lib/libsdp.so /usr/local/bin/netperf -p 12866 -l 60 -H 10.0.0.30 -t TCP_STREAM -T 1,1 -- -m 512 -s 16384 -S 16384 libsdp.so: $LIBSDP_CONFIG_FILE not set. Using /usr/local/etc/libsdp.conf bind_to_specific_processor: enter masking masked TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.30 (10.0.0.30) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 32768 32768 512 60.01 4823.45 gsyprf3:~# ("dual rope" PCI-X on a pair of HP rx2600, 1.5Ghz/6M) Single stream throughput normally peaks around ~5.5 to 6 Gb/s with this configuration. thanks! grant From iod00d at hp.com Sat Dec 10 20:34:52 2005 From: iod00d at hp.com (Grant Grundler) Date: Sat, 10 Dec 2005 20:34:52 -0800 Subject: [openib-general] [PATCH] OPENSM make missing /dev/infiniband/umad entries obvious Message-ID: <20051211043452.GF9348@esmail.cup.hp.com> Hi, When installing openib bits on a new machine, I wasted an unreasonable and absurd amount of time (by doing some other stupid things) when OpenSM failed to start and gave an error msg about "osm_vendor_bind: ERR 5424: Unable to Open Port 0x1321ffff75787a" Once I enabled debugging in umad.c the problem was obvious: I forgot to mknod the /dev/infiniband device files! Doh! (obviously didn't have udev installed either.) Just tell me the first time around please. Don't hide error messages that users likely to run opensm are able to correct. This error now shows up on the controlling tty. 
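The distinction the patch below relies on is roughly the following pattern (a sketch of the general idea, not libibumad's actual macro definitions): DEBUG output is gated behind a debug knob, while IBWARN always reaches stderr:

#include <stdio.h>

extern int umaddebug;	/* assumed debug knob, for this sketch only */

#define IBWARN(fmt, ...) \
	fprintf(stderr, "libibumad: %s: " fmt "\n", __func__, ##__VA_ARGS__)

#define DEBUG(fmt, ...) \
	do { if (umaddebug) IBWARN(fmt, ##__VA_ARGS__); } while (0)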
thanks, grant Signed-off-by: Grant Grundler Index: src/userspace/management/libibumad/src/umad.c =================================================================== --- src/userspace/management/libibumad/src/umad.c (revision 4371) +++ src/userspace/management/libibumad/src/umad.c (working copy) @@ -558,7 +558,7 @@ umad_open_port(char *ca_name, int portnu UMAD_DEV_DIR , umad_id); if ((port->dev_fd = open(port->dev_file, O_RDWR|O_NONBLOCK)) < 0) { - DEBUG("open %s failed", port->dev_file); + IBWARN("open %s failed", port->dev_file); return -EIO; } From iod00d at hp.com Sat Dec 10 20:42:31 2005 From: iod00d at hp.com (Grant Grundler) Date: Sat, 10 Dec 2005 20:42:31 -0800 Subject: [openib-general] [PATCH] OPENSM identify failure cases uniquely Message-ID: <20051211044231.GG9348@esmail.cup.hp.com> Hi, When tracking down the opensm "can't open port" failure described in previous email, I added log output for each of the failure cases in osm_vendor_open_port(). The "ERR" numbers need to be compared to some "master list" that I don't know about and replaced. I just picked sequential numbers not used in that routine. thanks, grant Signed-off-by: Grant Grundler Index: src/userspace/management/osm/libvendor/osm_vendor_ibumad.c =================================================================== --- src/userspace/management/osm/libvendor/osm_vendor_ibumad.c (revision 4371) +++ src/userspace/management/osm/libvendor/osm_vendor_ibumad.c (working copy) @@ -715,14 +715,26 @@ osm_vendor_open_port( } /* Port found, try to open it */ - if (umad_get_ca(p_vend->ca_names[ca], &p_vend->umad_ca) < 0) + if (umad_get_ca(p_vend->ca_names[ca], &p_vend->umad_ca) < 0) { + osm_log( p_vend->p_log, OSM_LOG_ERROR, + "osm_vendor_open_port: ERR 5423: " + "umad_get_ca() failed\n" ); goto Exit; + } - if (umad_get_port(p_vend->ca_names[ca], i, &p_vend->umad_port) < 0) + if (umad_get_port(p_vend->ca_names[ca], i, &p_vend->umad_port) < 0) { + osm_log( p_vend->p_log, OSM_LOG_ERROR, + "osm_vendor_open_port: ERR 5424: " + "umad_get_port() failed\n" ); goto Exit; + } - if ((umad_port_id = umad_open_port(p_vend->ca_names[ca], i)) < 0) + if ((umad_port_id = umad_open_port(p_vend->ca_names[ca], i)) < 0) { + osm_log( p_vend->p_log, OSM_LOG_ERROR, + "osm_vendor_open_port: ERR 5425: " + "umad_open_port() failed\n" ); goto Exit; + } p_vend->umad_port_id = umad_port_id; From yael at mellanox.co.il Sun Dec 11 04:06:06 2005 From: yael at mellanox.co.il (Yael Kalka) Date: Sun, 11 Dec 2005 14:06:06 +0200 Subject: [openib-general] RE: [PATCH] Opensm - fix osm_venodr_get_all_port_attr Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30E2486@mtlexch01.mtl.com> Hi Hal, Hi Yael, On Thu, 2005-12-08 at 05:39, Yael Kalka wrote: > Hi Hal, > > If osm_vendor_get_all_port_attr is called before the osm_vendor_bind, What exercises the vendor calls in this manner ? [YK] - We saw it in some test that uses the vendor lib. Later on the bind was called. > then the sm_lid of the default port isn't updated correctly. > This patch fixes it. Thanks. Applied. -- Hal From tziporet at mellanox.co.il Sun Dec 11 07:25:32 2005 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Sun, 11 Dec 2005 17:25:32 +0200 Subject: [openib-general] Next workshop dates? Please respond with your preferences Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E366C44A@mtlexch01.mtl.com> Hi Bill, What is the proposed agenda for the workshop? 
Tziporet -----Original Message----- From: Bill Boas [mailto:bboas at llnl.gov] Sent: Friday, December 09, 2005 8:53 AM To: openib-promoters at openib.org; openib-general at openib.org Subject: [openib-general] Next workshop dates? Please respond with your preferences All those wishing to attend the next workshop in Sonoma at the Lodge (same as last year) in the late January-early February please respond with your preferred dates. We currently have Jan29-Feb1 held for us but some people are telling us that is bad for them. The next 2 Sun-Wed slots (Feb 5-8 or 12-15) maybe available but we need guidance from those planning to attend as to their preferred dates. Bill. Bill Boas bboas at llnl.gov ICCD LLNL, B-453, R-2018 Wk: 925-422-4110 7000 East Ave, L-555 Cell: 925-337-2224 Livermore, CA 94551 Pgr: 877-203-2248 _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From bardov at gmail.com Sun Dec 11 07:57:04 2005 From: bardov at gmail.com (Dan Bar Dov) Date: Sun, 11 Dec 2005 17:57:04 +0200 Subject: [openib-general] [PATCH] [CMA] support for SDP + standard protocol In-Reply-To: References: Message-ID: I would have preferred not to add upper layer aware code into CMA, but I guess I'm late for that discussion. Regarding the patch below, it makes sense. Are you going to apply it to all affected modules? Dan On 12/9/05, Sean Hefty wrote: > The following patch updates the CMA to support the IB socket-based > protocol standard and SDP's private data format. > > The CMA now defines RDMA "port spaces". RDMA identifiers are associated > with a user-specified port space at creation time. > > Please respond with any comments on the approach. Note that these > changes have not been pushed up to userspace yet. > > Signed-off-by: Sean Hefty > > > > Index: ulp/iser/iser_verbs.c > =================================================================== > --- ulp/iser/iser_verbs.c (revision 4356) > +++ ulp/iser/iser_verbs.c (working copy) > @@ -428,7 +428,8 @@ iser_connect(struct iser_conn *p_iser_co > return -1; > } > p_iser_conn->cma_id = rdma_create_id(iser_cma_handler, > - (void *)p_iser_conn); > + (void *)p_iser_conn, > + RDMA_PS_TCP); > if (IS_ERR(p_iser_conn->cma_id)) { > ret = PTR_ERR(p_iser_conn->cma_id); > iser_err("rdma_create_id failed: %d\n", ret); > Index: include/rdma/rdma_cm.h > =================================================================== > --- include/rdma/rdma_cm.h (revision 4356) > +++ include/rdma/rdma_cm.h (working copy) > @@ -54,6 +54,13 @@ enum rdma_cm_event_type { > RDMA_CM_EVENT_DEVICE_REMOVAL, > }; > > +enum rdma_port_space { > + RDMA_PS_SDP = 0x0001, > + RDMA_PS_TCP = 0x0106, > + RDMA_PS_UDP = 0x0111, > + RDMA_PS_SCTP = 0x0183 > +}; > + > struct rdma_addr { > struct sockaddr src_addr; > u8 src_pad[sizeof(struct sockaddr_in6) - > @@ -97,11 +104,20 @@ struct rdma_cm_id { > struct ib_qp *qp; > rdma_cm_event_handler event_handler; > struct rdma_route route; > + enum rdma_port_space ps; > u8 port_num; > }; > > +/** > + * rdma_create_id - Create an RDMA identifier. > + * > + * @event_handler: User callback invoked to report events associated with the > + * returned rdma_id. > + * @context: User specified context associated with the id. > + * @ps: RDMA port space. 
> + */ > struct rdma_cm_id* rdma_create_id(rdma_cm_event_handler event_handler, > - void *context); > + void *context, enum rdma_port_space ps); > > void rdma_destroy_id(struct rdma_cm_id *id); > > Index: core/cma.c > =================================================================== > --- core/cma.c (revision 4356) > +++ core/cma.c (working copy) > @@ -110,21 +110,35 @@ struct rdma_id_private { > u8 srq; > }; > > -struct cma_addr { > - u8 version; /* CMA version: 7:4, IP version: 3:0 */ > - u8 reserved; > - __u16 port; > +union cma_ip_addr { > + struct in6_addr ip6; > struct { > - union { > - struct in6_addr ip6; > - struct { > - __u32 pad[3]; > - __u32 addr; > - } ip4; > - } ver; > - } src_addr, dst_addr; > + __u32 pad[3]; > + __u32 addr; > + } ip4; > +}; > + > +struct cma_hdr { > + u8 cma_version; > + u8 ip_version; /* IP version: 7:4 */ > + __u16 port; > + union cma_ip_addr src_addr; > + union cma_ip_addr dst_addr; > }; > > +struct sdp_hh { > + u8 sdp_version; > + u8 ip_version; /* IP version: 7:4 */ > + u8 sdp_specific1[10]; > + __u16 port; > + __u16 sdp_specific2; > + union cma_ip_addr src_addr; > + union cma_ip_addr dst_addr; > +}; > + > +#define CMA_VERSION 0x10 > +#define SDP_VERSION 0x22 > + > static int cma_comp(struct rdma_id_private *id_priv, enum cma_state comp) > { > unsigned long flags; > @@ -162,19 +176,24 @@ static enum cma_state cma_exch(struct rd > return old; > } > > -static inline u8 cma_get_ip_ver(struct cma_addr *addr) > +static inline u8 cma_get_ip_ver(struct cma_hdr *hdr) > { > - return addr->version & 0xF; > + return hdr->ip_version >> 4; > } > > -static inline u8 cma_get_cma_ver(struct cma_addr *addr) > +static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver) > { > - return addr->version >> 4; > + hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF); > } > > -static inline void cma_set_vers(struct cma_addr *addr, u8 cma_ver, u8 ip_ver) > +static inline u8 sdp_get_ip_ver(struct sdp_hh *hh) > { > - addr->version = (cma_ver << 4) + (ip_ver & 0xF); > + return hh->ip_version >> 4; > +} > + > +static inline void sdp_set_ip_ver(struct sdp_hh *hh, u8 ip_ver) > +{ > + hh->ip_version = (ip_ver << 4) | (hh->ip_version & 0xF); > } > > static void cma_attach_to_dev(struct rdma_id_private *id_priv, > @@ -226,17 +245,18 @@ static void cma_release_remove(struct rd > } > > struct rdma_cm_id* rdma_create_id(rdma_cm_event_handler event_handler, > - void *context) > + void *context, enum rdma_port_space ps) > { > struct rdma_id_private *id_priv; > > id_priv = kzalloc(sizeof *id_priv, GFP_KERNEL); > if (!id_priv) > - return NULL; > + return ERR_PTR(-ENOMEM); > > id_priv->state = CMA_IDLE; > id_priv->id.context = context; > id_priv->id.event_handler = event_handler; > + id_priv->id.ps = ps; > spin_lock_init(&id_priv->lock); > init_waitqueue_head(&id_priv->wait); > atomic_set(&id_priv->refcount, 1); > @@ -387,25 +407,93 @@ int rdma_init_qp_attr(struct rdma_cm_id > } > EXPORT_SYMBOL(rdma_init_qp_attr); > > -static int cma_verify_addr(struct cma_addr *addr, > - struct sockaddr_in *ip_addr) > +static inline int cma_any_addr(struct sockaddr *addr) > { > - if (cma_get_cma_ver(addr) != 1 || cma_get_ip_ver(addr) != 4) > - return -EINVAL; > + struct in6_addr *ip6; > > - if (ip_addr->sin_port != addr->port) > - return -EINVAL; > + if (addr->sa_family == AF_INET) > + return ((struct sockaddr_in *) addr)->sin_addr.s_addr == > + INADDR_ANY; > + else { > + ip6 = &((struct sockaddr_in6 *) addr)->sin6_addr; > + return (ip6->s6_addr32[0] | ip6->s6_addr32[1] | > + ip6->s6_addr32[3] | 
ip6->s6_addr32[4]) == 0; > + } > +} > > - if (ip_addr->sin_addr.s_addr && > - (ip_addr->sin_addr.s_addr != addr->dst_addr.ver.ip4.addr)) > - return -EINVAL; > +static int cma_get_net_info(void *hdr, enum rdma_port_space ps, > + u8 *ip_ver, __u16 *port, > + union cma_ip_addr **src, union cma_ip_addr **dst) > +{ > + switch (ps) { > + case RDMA_PS_SDP: > + if (((struct sdp_hh *) hdr)->sdp_version != SDP_VERSION) > + return -EINVAL; > > + *ip_ver = sdp_get_ip_ver(hdr); > + *port = ((struct sdp_hh *) hdr)->port; > + *src = &((struct sdp_hh *) hdr)->src_addr; > + *dst = &((struct sdp_hh *) hdr)->dst_addr; > + break; > + default: > + if (((struct cma_hdr *) hdr)->cma_version != CMA_VERSION) > + return -EINVAL; > + > + *ip_ver = cma_get_ip_ver(hdr); > + *port = ((struct cma_hdr *) hdr)->port; > + *src = &((struct cma_hdr *) hdr)->src_addr; > + *dst = &((struct cma_hdr *) hdr)->dst_addr; > + break; > + } > return 0; > } > > -static inline int cma_any_addr(struct sockaddr *addr) > +static void cma_save_net_info(struct rdma_addr *addr, > + struct rdma_addr *listen_addr, > + u8 ip_ver, __u16 port, > + union cma_ip_addr *src, union cma_ip_addr *dst) > +{ > + struct sockaddr_in *listen4, *ip4; > + struct sockaddr_in6 *listen6, *ip6; > + > + switch (ip_ver) { > + case 4: > + listen4 = (struct sockaddr_in *) &listen_addr->src_addr; > + ip4 = (struct sockaddr_in *) &addr->src_addr; > + ip4->sin_family = listen4->sin_family; > + ip4->sin_addr.s_addr = dst->ip4.addr; > + ip4->sin_port = listen4->sin_port; > + > + ip4 = (struct sockaddr_in *) &addr->dst_addr; > + ip4->sin_family = listen4->sin_family; > + ip4->sin_addr.s_addr = src->ip4.addr; > + ip4->sin_port = port; > + break; > + case 6: > + listen6 = (struct sockaddr_in6 *) &listen_addr->src_addr; > + ip6 = (struct sockaddr_in6 *) &addr->src_addr; > + ip6->sin6_family = listen6->sin6_family; > + ip6->sin6_addr = dst->ip6; > + ip6->sin6_port = listen6->sin6_port; > + > + ip6 = (struct sockaddr_in6 *) &addr->dst_addr; > + ip6->sin6_family = listen6->sin6_family; > + ip6->sin6_addr = src->ip6; > + ip6->sin6_port = port; > + break; > + default: > + break; > + } > +} > + > +static inline int cma_user_data_offset(enum rdma_port_space ps) > { > - return ((struct sockaddr_in *) addr)->sin_addr.s_addr == 0; > + switch (ps) { > + case RDMA_PS_SDP: > + return 0; > + default: > + return sizeof(struct cma_hdr); > + } > } > > static int cma_notify_user(struct rdma_id_private *id_priv, > @@ -640,53 +728,41 @@ static struct rdma_id_private* cma_new_i > { > struct rdma_id_private *id_priv; > struct rdma_cm_id *id; > - struct rdma_route *route; > - struct sockaddr_in *ip_addr, *listen_addr; > - struct ib_sa_path_rec *path_rec; > - struct cma_addr *addr; > - int num_paths; > - > - listen_addr = (struct sockaddr_in *) &listen_id->route.addr.src_addr; > - if (cma_verify_addr(ib_event->private_data, listen_addr)) > - return NULL; > + struct rdma_route *rt; > + union cma_ip_addr *src, *dst; > + __u16 port; > + u8 ip_ver; > > - num_paths = 1 + (ib_event->param.req_rcvd.alternate_path != NULL); > - path_rec = kmalloc(sizeof *path_rec * num_paths, GFP_KERNEL); > - if (!path_rec) > + id = rdma_create_id(listen_id->event_handler, listen_id->context, > + listen_id->ps); > + if (IS_ERR(id)) > return NULL; > > - id = rdma_create_id(listen_id->event_handler, listen_id->context); > - if (!id) > + rt = &id->route; > + rt->num_paths = ib_event->param.req_rcvd.alternate_path ? 
2 : 1; > + rt->path_rec = kmalloc(sizeof *rt->path_rec * rt->num_paths, GFP_KERNEL); > + if (!rt->path_rec) > goto err; > > - addr = ib_event->private_data; > - route = &id->route; > + if (cma_get_net_info(ib_event->private_data, listen_id->ps, > + &ip_ver, &port, &src, &dst)) > + goto err; > > - ip_addr = (struct sockaddr_in *) &route->addr.src_addr; > - ip_addr->sin_family = listen_addr->sin_family; > - ip_addr->sin_addr.s_addr = addr->dst_addr.ver.ip4.addr; > - ip_addr->sin_port = listen_addr->sin_port; > - > - ip_addr = (struct sockaddr_in *) &route->addr.dst_addr; > - ip_addr->sin_family = listen_addr->sin_family; > - ip_addr->sin_addr.s_addr = addr->src_addr.ver.ip4.addr; > - ip_addr->sin_port = addr->port; > - > - route->num_paths = num_paths; > - route->path_rec = path_rec; > - path_rec[0] = *ib_event->param.req_rcvd.primary_path; > - if (num_paths == 2) > - path_rec[1] = *ib_event->param.req_rcvd.alternate_path; > - > - route->addr.addr.ibaddr.sgid = path_rec->sgid; > - route->addr.addr.ibaddr.dgid = path_rec->dgid; > - route->addr.addr.ibaddr.pkey = be16_to_cpu(path_rec->pkey); > + cma_save_net_info(&id->route.addr, &listen_id->route.addr, > + ip_ver, port, src, dst); > + rt->path_rec[0] = *ib_event->param.req_rcvd.primary_path; > + if (rt->num_paths == 2) > + rt->path_rec[1] = *ib_event->param.req_rcvd.alternate_path; > + > + rt->addr.addr.ibaddr.sgid = rt->path_rec[0].sgid; > + rt->addr.addr.ibaddr.dgid = rt->path_rec[0].dgid; > + rt->addr.addr.ibaddr.pkey = be16_to_cpu(rt->path_rec[0].pkey); > > id_priv = container_of(id, struct rdma_id_private, id); > id_priv->state = CMA_CONNECT; > return id_priv; > err: > - kfree(path_rec); > + rdma_destroy_id(id); > return NULL; > } > > @@ -708,7 +784,6 @@ static int cma_req_handler(struct ib_cm_ > goto out; > } > > - conn_id->state = CMA_CONNECT; > atomic_inc(&conn_id->dev_remove); > ret = cma_acquire_ib_dev(conn_id, &conn_id->id.route.path_rec[0].sgid); > if (ret) { > @@ -722,7 +797,7 @@ static int cma_req_handler(struct ib_cm_ > cm_id->context = conn_id; > cm_id->cm_handler = cma_ib_handler; > > - offset = sizeof(struct cma_addr); > + offset = cma_user_data_offset(listen_id->id.ps); > ret = cma_notify_user(conn_id, RDMA_CM_EVENT_CONNECT_REQUEST, 0, > ib_event->private_data + offset, > IB_CM_REQ_PRIVATE_DATA_SIZE - offset); > @@ -738,16 +813,16 @@ out: > return ret; > } > > -static __be64 cma_get_service_id(struct sockaddr *addr) > +static __be64 cma_get_service_id(enum rdma_port_space ps, struct sockaddr *addr) > { > - return cpu_to_be64(((u64)IB_OPENIB_OUI << 48) + > + return cpu_to_be64(((u64)ps << 16) + > ((struct sockaddr_in *) addr)->sin_port); > } > > static void cma_set_compare_data(struct sockaddr *addr, > struct ib_cm_private_data_compare *compare) > { > - struct cma_addr *data, *mask; > + struct cma_hdr *data, *mask; > > memset(compare, 0, sizeof *compare); > data = (void *) compare->data; > @@ -755,19 +830,18 @@ static void cma_set_compare_data(struct > > switch (addr->sa_family) { > case AF_INET: > - cma_set_vers(data, 0, 4); > - cma_set_vers(mask, 0, 0xF); > - data->dst_addr.ver.ip4.addr = ((struct sockaddr_in *) addr)-> > - sin_addr.s_addr; > - mask->dst_addr.ver.ip4.addr = ~0; > + cma_set_ip_ver(data, 4); > + cma_set_ip_ver(mask, 0xF); > + data->dst_addr.ip4.addr = ((struct sockaddr_in *) addr)-> > + sin_addr.s_addr; > + mask->dst_addr.ip4.addr = ~0; > break; > case AF_INET6: > - cma_set_vers(data, 0, 6); > - cma_set_vers(mask, 0, 0xF); > - data->dst_addr.ver.ip6 = ((struct sockaddr_in6 *) addr)-> > - sin6_addr; > - 
memset(&mask->dst_addr.ver.ip6, 1, > - sizeof mask->dst_addr.ver.ip6); > + cma_set_ip_ver(data, 6); > + cma_set_ip_ver(mask, 0xF); > + data->dst_addr.ip6 = ((struct sockaddr_in6 *) addr)-> > + sin6_addr; > + memset(&mask->dst_addr.ip6, 1, sizeof mask->dst_addr.ip6); > break; > default: > break; > @@ -787,7 +861,7 @@ static int cma_ib_listen(struct rdma_id_ > return PTR_ERR(id_priv->cm_id); > > addr = &id_priv->id.route.addr.src_addr; > - svc_id = cma_get_service_id(addr); > + svc_id = cma_get_service_id(id_priv->id.ps, addr); > if (cma_any_addr(addr)) > ret = ib_cm_listen(id_priv->cm_id, svc_id, 0, NULL); > else { > @@ -835,7 +909,7 @@ static void cma_listen_on_dev(struct rdm > struct rdma_cm_id *id; > int ret; > > - id = rdma_create_id(cma_listen_handler, id_priv); > + id = rdma_create_id(cma_listen_handler, id_priv, id_priv->id.ps); > if (IS_ERR(id)) > return; > > @@ -1099,19 +1173,34 @@ err: > } > EXPORT_SYMBOL(rdma_bind_addr); > > -static void cma_format_addr(struct cma_addr *addr, struct rdma_route *route) > +static void cma_format_hdr(void *hdr, enum rdma_port_space ps, > + struct rdma_route *route) > { > - struct sockaddr_in *ip_addr; > - > - memset(addr, 0, sizeof *addr); > - cma_set_vers(addr, 1, 4); > - > - ip_addr = (struct sockaddr_in *) &route->addr.src_addr; > - addr->src_addr.ver.ip4.addr = ip_addr->sin_addr.s_addr; > - > - ip_addr = (struct sockaddr_in *) &route->addr.dst_addr; > - addr->dst_addr.ver.ip4.addr = ip_addr->sin_addr.s_addr; > - addr->port = ip_addr->sin_port; > + struct sockaddr_in *src4, *dst4; > + struct cma_hdr *cma_hdr; > + struct sdp_hh *sdp_hdr; > + > + src4 = (struct sockaddr_in *) &route->addr.src_addr; > + dst4 = (struct sockaddr_in *) &route->addr.dst_addr; > + > + switch (ps) { > + case RDMA_PS_SDP: > + sdp_hdr = hdr; > + sdp_hdr->sdp_version = SDP_VERSION; > + sdp_set_ip_ver(sdp_hdr, 4); > + sdp_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; > + sdp_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; > + sdp_hdr->port = src4->sin_port; > + break; > + default: > + cma_hdr = hdr; > + cma_hdr->cma_version = CMA_VERSION; > + cma_set_ip_ver(cma_hdr, 4); > + cma_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; > + cma_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; > + cma_hdr->port = src4->sin_port; > + break; > + } > } > > static int cma_connect_ib(struct rdma_id_private *id_priv, > @@ -1119,17 +1208,20 @@ static int cma_connect_ib(struct rdma_id > { > struct ib_cm_req_param req; > struct rdma_route *route; > - struct cma_addr *addr; > void *private_data; > - int ret; > + int offset, ret; > > memset(&req, 0, sizeof req); > - req.private_data_len = sizeof *addr + conn_param->private_data_len; > - > - private_data = kmalloc(req.private_data_len, GFP_ATOMIC); > + offset = cma_user_data_offset(id_priv->id.ps); > + req.private_data_len = offset + conn_param->private_data_len; > + private_data = kzalloc(req.private_data_len, GFP_ATOMIC); > if (!private_data) > return -ENOMEM; > > + if (conn_param->private_data && conn_param->private_data_len) > + memcpy(private_data + offset, conn_param->private_data, > + conn_param->private_data_len); > + > id_priv->cm_id = ib_create_cm_id(id_priv->id.device, cma_ib_handler, > id_priv); > if (IS_ERR(id_priv->cm_id)) { > @@ -1137,20 +1229,16 @@ static int cma_connect_ib(struct rdma_id > goto out; > } > > - addr = private_data; > route = &id_priv->id.route; > - cma_format_addr(addr, route); > - > - if (conn_param->private_data && conn_param->private_data_len) > - memcpy(addr + 1, conn_param->private_data, > - 
conn_param->private_data_len); > + cma_format_hdr(private_data, id_priv->id.ps, route); > req.private_data = private_data; > > req.primary_path = &route->path_rec[0]; > if (route->num_paths == 2) > req.alternate_path = &route->path_rec[1]; > > - req.service_id = cma_get_service_id(&route->addr.dst_addr); > + req.service_id = cma_get_service_id(id_priv->id.ps, > + &route->addr.dst_addr); > req.qp_num = id_priv->qp_num; > req.qp_type = id_priv->qp_type; > req.starting_psn = id_priv->seq_num; > @@ -1317,23 +1405,6 @@ out: > } > EXPORT_SYMBOL(rdma_disconnect); > > -/* TODO: add this to the device structure - see Roland's patch */ > -static __be64 get_ca_guid(struct ib_device *device) > -{ > - struct ib_device_attr *device_attr; > - __be64 guid; > - int ret; > - > - device_attr = kmalloc(sizeof *device_attr, GFP_KERNEL); > - if (!device_attr) > - return 0; > - > - ret = ib_query_device(device, device_attr); > - guid = ret ? 0 : device_attr->node_guid; > - kfree(device_attr); > - return guid; > -} > - > static void cma_add_one(struct ib_device *device) > { > struct cma_device *cma_dev; > @@ -1344,7 +1415,7 @@ static void cma_add_one(struct ib_device > return; > > cma_dev->device = device; > - cma_dev->node_guid = get_ca_guid(device); > + cma_dev->node_guid = device->node_guid; > if (!cma_dev->node_guid) > goto err; > > Index: core/ucma.c > =================================================================== > --- core/ucma.c (revision 4356) > +++ core/ucma.c (working copy) > @@ -287,7 +287,7 @@ static ssize_t ucma_create_id(struct ucm > return -ENOMEM; > > ctx->uid = cmd.uid; > - ctx->cm_id = rdma_create_id(ucma_event_handler, ctx); > + ctx->cm_id = rdma_create_id(ucma_event_handler, ctx, RDMA_PS_TCP); > if (IS_ERR(ctx->cm_id)) { > ret = PTR_ERR(ctx->cm_id); > goto err1; > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From iod00d at hp.com Sun Dec 11 09:53:41 2005 From: iod00d at hp.com (Grant Grundler) Date: Sun, 11 Dec 2005 09:53:41 -0800 Subject: [openib-general] [PATCH applied] sdp: fix kunmap_atomic usage In-Reply-To: <20051210212140.GA30971@mellanox.co.il> References: <20051210212140.GA30971@mellanox.co.il> Message-ID: <20051211175341.GA12176@esmail.cup.hp.com> On Sat, Dec 10, 2005 at 11:21:40PM +0200, Michael S. Tsirkin wrote: > SDP was using kunmap_atomic incorrectly. > Of course, I'm typically using it on platforms where its a nop, > so I dint notice :) I might have spoken too soon...I just started getting "ERR" output from ib_sdp running netperf TCP_STREAM over SDP on the IA64 rx2600's. I killed and restarted the "sdpstream" script. It seems to be working. I've not yet seen this type of error running r4344 on a different box. If it's not obvious what's wrong, I can try r4344 on the rx2600's as well. 
thanks, grant $ for i in run-*; do date; echo $i; ./$i 10.0.0.30 CPU ; done Sat Dec 10 20:12:36 PST 2005 run-sdprr Sun Dec 11 09:07:06 PST 2005 run-sdprr-gnuplot Sun Dec 11 09:07:06 PST 2005 run-sdpstream ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <8192:0:8192> ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <8197:0:8197> ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <16384:0:16384> ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <49152:0:49152> ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <49157:0:49157> ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <57344:0:57344> ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <65536:0:65536> ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <126976:0:126976> ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <126976:0:126976> ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <126976:0:126976> ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <8192:0:8192> ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <8197:0:8197> ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <16384:0:16384> ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <49152:0:49152> ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <49157:0:49157> ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <57344:0:57344> ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <65536:0:65536> ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <126976:0:126976> ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <126976:0:126976> ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <126976:0:126976> ... From mst at mellanox.co.il Sun Dec 11 10:05:43 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 11 Dec 2005 20:05:43 +0200 Subject: [openib-general] [PATCH applied] sdp: fix kunmap_atomic usage In-Reply-To: <20051211175341.GA12176@esmail.cup.hp.com> References: <20051211175341.GA12176@esmail.cup.hp.com> Message-ID: <20051211180543.GR14936@mellanox.co.il> Quoting Grant Grundler : > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <8192:0:8192> > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <8197:0:8197> > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <16384:0:16384> > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <49152:0:49152> > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <49157:0:49157> > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <57344:0:57344> > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <65536:0:65536> > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <126976:0:126976> > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <126976:0:126976> > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <126976:0:126976> > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <8192:0:8192> > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <8197:0:8197> > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <16384:0:16384> > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <49152:0:49152> > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <49157:0:49157> > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <57344:0:57344> > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <65536:0:65536> > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <126976:0:126976> > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <126976:0:126976> > ib_sdp ERR: IOCB <-1> cancel <0> flag <0340> size <126976:0:126976> This might be benign, need to check. Did the test run to completion with these messages? 
-- MST

From bboas at llnl.gov Sun Dec 11 16:10:54 2005
From: bboas at llnl.gov (Bill Boas)
Date: Sun, 11 Dec 2005 16:10:54 -0800
Subject: [openib-general] Next workshop dates? Ideas for agenda???
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E366C44A@mtlexch01.mtl.com>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E366C44A@mtlexch01.mtl.com>
Message-ID: <6.2.3.4.2.20051211160126.03bc3878@mail-lc.llnl.gov>

Tziporet,

Not sure yet - I think, subject to others' input, it'll be focused on wrapping up rel 1.0 of OpenIB, discussing what the developers are going to focus on next, and validating the strategy for RDMA over Ethernet integration at the verbs level to lay the foundation for one, consistent RDMA structure in Linux, if possible.

We may also see the formation of customer working groups with vertical (e.g. financial, HPC, oil and gas) common interests able to express their requirements as a group to the development community.

Just some ideas, it would be good to get feedback from both the developer and the promoter communities????

Bill.

At 07:25 AM 12/11/2005, Tziporet Koren wrote:
>Hi Bill,
>
>What is the proposed agenda for the workshop?
>
>Tziporet
>
>
>-----Original Message-----
>From: Bill Boas [mailto:bboas at llnl.gov]
>Sent: Friday, December 09, 2005 8:53 AM
>To: openib-promoters at openib.org; openib-general at openib.org
>Subject: [openib-general] Next workshop dates? Please respond with your
>preferences
>
>All those wishing to attend the next workshop in Sonoma at the Lodge
>(same as last year) in the late January-early February please respond
>with your preferred dates.
> >We currently have Jan29-Feb1 held for us but some people are telling >us that is bad for them. > >The next 2 Sun-Wed slots (Feb 5-8 or 12-15) maybe available but we >need guidance from those planning to attend as to their preferred dates. > >Bill. > >Bill Boas bboas at llnl.gov >ICCD LLNL, B-453, R-2018 Wk: 925-422-4110 >7000 East Ave, L-555 Cell: 925-337-2224 >Livermore, CA 94551 Pgr: 877-203-2248 > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit >http://openib.org/mailman/listinfo/openib-general >_______________________________________________ >openib-promoters mailing list >openib-promoters at openib.org >http://openib.org/mailman/listinfo/openib-promoters Bill Boas bboas at llnl.gov ICCD LLNL, B-453, R-2018 Wk: 925-422-4110 7000 East Ave, L-555 Cell: 925-337-2224 Livermore, CA 94551 Pgr: 877-203-2248 From spoole at lanl.gov Sun Dec 11 16:16:37 2005 From: spoole at lanl.gov (Steve Poole) Date: Sun, 11 Dec 2005 17:16:37 -0700 Subject: [openib-general] Re: [Openib-promoters] Next workshop dates? Ideas for agenda??? In-Reply-To: <6.2.3.4.2.20051211160126.03bc3878@mail-lc.llnl.gov> References: <6AB138A2AB8C8E4A98B9C0C3D52670E366C44A@mtlexch01.mtl.com> <6.2.3.4.2.20051211160126.03bc3878@mail-lc.llnl.gov> Message-ID: <6.2.5.6.0.20051211171549.0203dd28@lanl.gov> At 05:10 PM 12/11/2005, Bill Boas wrote: >Tziporet, > >Not sure yet - I think, subject to others input, it'll be focused on >wrapping up rel 1.0 of OpenIB, discussing what the developers are >going to focus on next and validating the strategy for RDMA over >Ethernet integration at the verbs level to lay the foundation for >one, consistent RDMA structure in Linux, if possible. > >We may also see the formation of customer working groups with >vertical (e.g.:- financial, HPC, oil and gas, ) common interests >able to express their requirements as a group to the development community. As long as they merge with the rest of the requirements for OpenIB, this is great. We will not have several different versions of OpenIB. Steve... >Just some ideas, it would be good to get feedback from both the >developer and the promoter communities???? > >Bill. > > At 07:25 AM 12/11/2005, Tziporet Koren wrote: >>Hi Bill, >> >>What is the proposed agenda for the workshop? >> >>Tziporet >> >> >>-----Original Message----- >>From: Bill Boas [mailto:bboas at llnl.gov] >>Sent: Friday, December 09, 2005 8:53 AM >>To: openib-promoters at openib.org; openib-general at openib.org >>Subject: [openib-general] Next workshop dates? Please respond with your >>preferences >> >>All those wishing to attend the next workshop in Sonoma at the Lodge >>(same as last year) in the late January-early February please respond >>with your preferred dates. >> >>We currently have Jan29-Feb1 held for us but some people are telling >>us that is bad for them. >> >>The next 2 Sun-Wed slots (Feb 5-8 or 12-15) maybe available but we >>need guidance from those planning to attend as to their preferred dates. >> >>Bill. 
>> >>Bill Boas bboas at llnl.gov >>ICCD LLNL, B-453, R-2018 Wk: 925-422-4110 >>7000 East Ave, L-555 Cell: 925-337-2224 >>Livermore, CA 94551 Pgr: 877-203-2248 >> >>_______________________________________________ >>openib-general mailing list >>openib-general at openib.org >>http://openib.org/mailman/listinfo/openib-general >> >>To unsubscribe, please visit >>http://openib.org/mailman/listinfo/openib-general >>_______________________________________________ >>openib-promoters mailing list >>openib-promoters at openib.org >>http://openib.org/mailman/listinfo/openib-promoters > >Bill Boas bboas at llnl.gov >ICCD LLNL, B-453, R-2018 Wk: 925-422-4110 >7000 East Ave, L-555 Cell: 925-337-2224 >Livermore, CA 94551 Pgr: 877-203-2248 >_______________________________________________ >openib-promoters mailing list >openib-promoters at openib.org >http://openib.org/mailman/listinfo/openib-promoters From bboas at llnl.gov Sun Dec 11 16:18:19 2005 From: bboas at llnl.gov (Bill Boas) Date: Sun, 11 Dec 2005 16:18:19 -0800 Subject: [openib-general] RE: Next workshop dates? Feedback from others please!!! In-Reply-To: <1AC79F16F5C5284499BB9591B33D6F000652719A@orsmsx408> References: <1AC79F16F5C5284499BB9591B33D6F000652719A@orsmsx408> Message-ID: <6.2.3.4.2.20051211161206.039c1b80@mail-lc.llnl.gov> Bob, Voltaire says they cannot make Jan 29 'cos they have an all hands 3 day meeting in that time slot. In the Feb 5-8 timeframe we worked in the super bowl last year OK I think??? By the time we get to late Feb the hotel rates in Sonoma go up so I'm thinking we should go for Feb 5 again, but feedback on this would be very helpful as we need to make a decision in the next few days. Feedback from others please. Bill. At 03:10 PM 12/9/2005, you wrote: >Jan29-Feb1 works for me. >Feb 5 is super bowl Sunday, might want to stay away from that one >Feb 12 - Sean is on vacation. >Feb 20 - Is a holiday in the US. >How about Feb 27- March 2 as an alternative if people cannot make Jan29 >? > >woody > > >-----Original Message----- >From: openib-general-bounces at openib.org >[mailto:openib-general-bounces at openib.org] On Behalf Of Bill Boas >Sent: Thursday, December 08, 2005 10:53 PM >To: openib-promoters at openib.org; openib-general at openib.org >Subject: [openib-general] Next workshop dates? Please respond with >yourpreferences > >All those wishing to attend the next workshop in Sonoma at the Lodge >(same as last year) in the late January-early February please respond >with your preferred dates. > >We currently have Jan29-Feb1 held for us but some people are telling >us that is bad for them. > >The next 2 Sun-Wed slots (Feb 5-8 or 12-15) maybe available but we >need guidance from those planning to attend as to their preferred dates. > >Bill. 
> >Bill Boas bboas at llnl.gov >ICCD LLNL, B-453, R-2018 Wk: 925-422-4110 >7000 East Ave, L-555 Cell: 925-337-2224 >Livermore, CA 94551 Pgr: 877-203-2248 > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit >http://openib.org/mailman/listinfo/openib-general Bill Boas bboas at llnl.gov ICCD LLNL, B-453, R-2018 Wk: 925-422-4110 7000 East Ave, L-555 Cell: 925-337-2224 Livermore, CA 94551 Pgr: 877-203-2248 From krkumar2 at in.ibm.com Sun Dec 11 21:40:12 2005 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Mon, 12 Dec 2005 11:10:12 +0530 Subject: [openib-general] [PATCH fixed] was Re: [PATCH] ipoib_multicast/ipoib_mcast_send race In-Reply-To: <20051209131151.GA21716@mellanox.co.il> Message-ID: Hi Michael, But the lock doesn't help that case. The only difference with having the lock is that in case of a race, the mcast_send() will complete *before* the flag is set, while without the lock the mcast_send() could be in the *middle* of execution when the flag is set. But in both cases, the packet would be sent out. In the reverse race (that is, when stop_thread executes before mcast_send()), the packet would not be sent out in either case, hence the lock does not help in either case. I feel the new code looks fine without the lock. Thanks, - KK "Michael S. Tsirkin" wrote on 12/09/2005 06:41:51 PM: > The lock around clear_bit is there to ensure that ipoib_mcast_send isnt running > already when we stop the thread. > Thats why test_bit has to be inside the lock, too. > > > Quoting r. Krishna Kumar2 : > > Subject: Re: [openib-general] [PATCH fixed] was Re: [PATCH]? > ipoib_multicast/ipoib_mcast_send race > > > > > > Hi Michael, > > > > Is there a reason to have the atomic set_bit() within a lock (even for > > a race condition of stop vs send, it doesn't seem to be required) ? > > Which means the test_bit() can also be put before the existing lock... > > > > Thanks, > > > > - KK > > > > openib-general-bounces at openib.org wrote on 12/09/2005 12:04:06 AM: > > > > > Quoting Michael S. Tsirkin : > > > > Subject: [PATCH] ipoib_multicast/ipoib_mcast_send race > > > > > > > > Hello, Roland! > > > > Here's another race scenario. > > > > > > > > --- > > > > > > > > Fix the following race scenario: > > > > device is up. > > > > port event or set mcast list triggers ipoib_mcast_stop_thread, > > > > This cancels the query and waits on mcast "done" completion. > > > > completion is called and "done" is set. > > > > Meanwhile, ipoib_mcast_send arrives and starts a new query, > > > > re-initializing "done". > > > > > > > > Signed-off-by: Michael S. Tsirkin > > > > > > The patch I posted previously leaked an skb when a multicast > > > send arrived while the mcast thread is stopped. > > > > > > Further, there's an additional issue that I saw in testing: > > > ipoib_mcast_send may get called when priv->broadcast is NULL > > > (e.g. if the device was downed and then upped internally because > > > of a port event). > > > If this happens and the sendonly join request gets completed before > > > priv->broadcast is set, we get an oops that I posted previously. > > > > > > Here's a better patch to address these two problems. > > > It has been running fine here for a while now. > > > > > > Please note that this replaces the ipoib_multicast/ipoib_mcast_send > > patch, > > > but not the ADMIN_UP patch that I posted previously. > > > > > > --- > > > > > > Do not send multicasts if mcast thread is stopped or if > > > priv->broadcast is not set. > > > > > > Signed-off-by: Michael S. Tsirkin > > > > > > Index: openib/drivers/infiniband/ulp/ipoib/ipoib_multicast.c > > > =================================================================== > > > --- openib/drivers/infiniband/ulp/ipoib/ipoib_multicast.c (revision > > 4222) > > > +++ openib/drivers/infiniband/ulp/ipoib/ipoib_multicast.c (working > > copy) > > > @@ -582,6 +582,10 @@ int ipoib_mcast_start_thread(struct net_ > > > queue_work(ipoib_workqueue, &priv->mcast_task); > > > up(&mcast_mutex); > > > > > > + spin_lock_irq(&priv->lock); > > > + set_bit(IPOIB_MCAST_STARTED, &priv->flags); > > > + spin_unlock_irq(&priv->lock); > > > + > > > return 0; > > > } > > > > > > @@ -592,6 +596,10 @@ int ipoib_mcast_stop_thread(struct net_d > > > > > > ipoib_dbg_mcast(priv, "stopping multicast thread\n"); > > > > > > + spin_lock_irq(&priv->lock); > > > + clear_bit(IPOIB_MCAST_STARTED, &priv->flags); > > > + spin_unlock_irq(&priv->lock); > > > + > > > down(&mcast_mutex); > > > clear_bit(IPOIB_MCAST_RUN, &priv->flags); > > > cancel_delayed_work(&priv->mcast_task); > > > @@ -674,6 +682,11 @@ void ipoib_mcast_send(struct net_device > > > */ > > > spin_lock(&priv->lock); > > > > > > + if (!test_bit(IPOIB_MCAST_STARTED, &priv->flags) || > > !priv->broadcast) { > > > + dev_kfree_skb_any(skb); > > > + goto unlock; > > > + } > > > + > > > mcast = __ipoib_mcast_find(dev, mgid); > > > if (!mcast) { > > > /* Let's create a new send only group now */ > > > @@ -732,6 +745,7 @@ out: > > > ipoib_send(dev, skb, mcast->ah, IB_MULTICAST_QPN); > > > } > > > > > > +unlock: > > > spin_unlock(&priv->lock); > > > } > > > > > > Index: openib/drivers/infiniband/ulp/ipoib/ipoib.h > > > =================================================================== > > > --- openib/drivers/infiniband/ulp/ipoib/ipoib.h (revision 4222) > > > +++ openib/drivers/infiniband/ulp/ipoib/ipoib.h (working copy) > > > @@ -78,6 +78,7 @@ enum { > > > IPOIB_FLAG_SUBINTERFACE = 4, > > > IPOIB_MCAST_RUN = 5, > > > IPOIB_STOP_REAPER = 6, > > > + IPOIB_MCAST_STARTED = 7, > > > > > > IPOIB_MAX_BACKOFF_SECONDS = 16, > > > > > > > > > -- > > > MST > > > _______________________________________________ > > > openib-general mailing list > > > openib-general at openib.org > > > http://openib.org/mailman/listinfo/openib-general > > > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > -- > MST -------------- next part -------------- An HTML attachment was scrubbed... URL: 
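
The race being fixed is easier to see laid out as a timeline. This sketch only restates the scenario quoted above, using the names from the patch (priv, mcast->done); it is not code from the posting:

/*
 * stop path                              send path
 * ipoib_mcast_stop_thread()              ipoib_mcast_send()
 *
 * cancels the outstanding query
 *                                        (query completion runs,
 *                                         mcast->done is completed)
 *                                        arrives, starts a new query,
 *                                        re-initializing mcast->done
 * wait_for_completion(&mcast->done)
 *   now waits on the re-initialized
 *   completion and can block forever
 */

The IPOIB_MCAST_STARTED bit added above closes the window by making ipoib_mcast_send() drop the packet once the thread is being stopped.
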
From mst at mellanox.co.il Sun Dec 11 22:12:13 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Dec 2005 08:12:13 +0200 Subject: [openib-general] [PATCH fixed] was Re: [PATCH] ipoib_multicast/ipoib_mcast_send race In-Reply-To: References: Message-ID: <20051212061213.GA24168@mellanox.co.il> Quoting Krishna Kumar2 : > Subject: Re: [openib-general] [PATCH fixed] was Re: [PATCH]?ipoib_multicast/ipoib_mcast_send race > > > Hi Michael, > > But the lock doesn't help that case. The only difference with having the > > lock is that in case of a race, the mcast_send() will complete *before* > the flag is set, while without the lock the mcast_send() could be in the > > *middle* of execution when the flag is set. Exactly, you got it. If you look at mcast_send you'll see that it creates new queries, creates the broadcast group and adds entries to the list. 
So here's why the lock helps: > > > > > Fix the following race scenario: > > > > > device is up. > > > > > port event or set mcast list triggers ipoib_mcast_stop_thread, > > > > > This cancels the query and waits on mcast "done" completion. > > > > > completion is called and "done" is set. > > > > > Meanwhile, ipoib_mcast_send arrives and starts a new query, > > > > > re-initializing "done". Clear now? -- MST From krkumar2 at in.ibm.com Sun Dec 11 22:37:58 2005 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Mon, 12 Dec 2005 12:07:58 +0530 Subject: [openib-general] [PATCH fixed] was Re: [PATCH] ipoib_multicast/ipoib_mcast_send race In-Reply-To: <20051212061213.GA24168@mellanox.co.il> Message-ID: Hi Michael, > Exactly, you got it. If you look at mcast_send you'll see that it creates new > queries, creates the broadcast group and adds entries to the list. Correct, but even with the lock, it would create those (once, and that is true whether or not the lock is held). The only thing stopping creation of those is setting the bit (but only for (a) the non-race case, or (b) the race case where stop_thread executes before mcast_send(), not the race case where mcast_send() wins over stop_thread), and holding the lock for the setting/testing of that bit will not stop creation of those entries in the (b) case. > This cancels the query and waits on mcast "done" completion. > completion is called and "done" is set. > Meanwhile, ipoib_mcast_send arrives and starts a new query, > re-initializing "done". Isn't all that managed by clearing/testing the bit ? Because holding the lock doesn't solve it. To give an example : stop_thread() { lock(); clear(); unlock(); ... wait_for_completion(mcast); } mcast_send() { lock(); test(); results_in_creation_of_entries_done_etc();; unlock(); } In this case, if mcast_send() gets the lock first and proceeds while the stop_thread is spinning on the lock, the entries are created and then the stop_thread() clears the bit and at this point in time, no more entries can be ever created. Now if the lock were removed, the behavior is identical - the mcast_send() would test the bit, and get the lock() while the stop_thread() clears the bit (without a lock) and waits for completion, while *no more* mcast_sends() would ever continue beyond this time. thanks, - KK "Michael S. Tsirkin" wrote on 12/12/2005 11:42:13 AM: > Quoting Krishna Kumar2 : > > Subject: Re: [openib-general] [PATCH fixed] was Re: [PATCH]? > ipoib_multicast/ipoib_mcast_send race > > > > > > Hi Michael, > > > > But the lock doesn't help that case. The only difference with having the > > > > lock is that in case of a race, the mcast_send() will complete *before* > > the flag is set, while without the lock the mcast_send() could be in the > > > > *middle* of execution when the flag is set. > > Exactly, you got it. If you look at mcast_send you'll see that it creates new > queries, creates the broadcast group and adds entries to the list. > > So here's why the lock helps: > > > > > > > Fix the following race scenario: > > > > > > device is up. > > > > > > port event or set mcast list triggers ipoib_mcast_stop_thread, > > > > > > This cancels the query and waits on mcast "done" completion. > > > > > > completion is called and "done" is set. > > > > > > Meanwhile, ipoib_mcast_send arrives and starts a new query, > > > > > > re-initializing "done". > > Clear now? > > -- > MST -------------- next part -------------- An HTML attachment was scrubbed... URL: From mst at mellanox.co.il Sun Dec 11 22:56:23 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Dec 2005 08:56:23 +0200 Subject: [openib-general] [PATCH fixed] was Re: [PATCH] ipoib_multicast/ipoib_mcast_send race In-Reply-To: References: Message-ID: <20051212065623.GD24168@mellanox.co.il> Quoting r. Krishna Kumar2 : > Isn't all that managed by clearing/testing the bit ? Because holding the > lock doesn't solve > it. > To give an example : > > stop_thread() > { > lock(); > clear(); > unlock(); > ... > wait_for_completion(mcast); > } > > mcast_send() > { > lock(); > test(); > results_in_creation_of_entries_done_etc();; > unlock(); > } > > In this case, if mcast_send() gets the lock first and proceeds while the > stop_thread is spinning > on the lock, the entries are created and then the stop_thread() clears > the bit and at this point > in time, no more entries can be ever created. Now if the lock were > removed, the behavior > is identical - the mcast_send() would test the bit, and get the lock() > while the stop_thread() > clears the bit (without a lock) and waits for completion, while *no > more* mcast_sends() would > ever continue beyond this time. Now mcast_send can call init_completion *after* stop_thread does wait for completion. It could also call list_add while mcast_stop_thread walks the list. Thats what I am trying to prevent. -- MST 
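
Pulling the two positions together, here is the locking rule being argued for, condensed from the patch quoted earlier in this thread. A simplified sketch, with error handling and unrelated work elided:

ipoib_mcast_stop_thread()
{
	spin_lock_irq(&priv->lock);
	clear_bit(IPOIB_MCAST_STARTED, &priv->flags);
	spin_unlock_irq(&priv->lock);
	/* Taking the lock here flushes concurrent senders: any
	 * ipoib_mcast_send() that saw the bit set has already done its
	 * list_add()/init_completion() and dropped the lock; any later
	 * sender will see the bit clear and bail out.  So "done" cannot
	 * be re-initialized once the stop path starts waiting. */
	...
	wait_for_completion(&mcast->done);
}

ipoib_mcast_send()
{
	spin_lock(&priv->lock);
	if (!test_bit(IPOIB_MCAST_STARTED, &priv->flags) || !priv->broadcast) {
		dev_kfree_skb_any(skb);
		goto unlock;
	}
	/* query/group creation and list manipulation happen entirely
	 * under the lock, so they cannot straddle the clear_bit() above */
	...
unlock:
	spin_unlock(&priv->lock);
}
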
From yael at mellanox.co.il Sun Dec 11 23:52:49 2005 From: yael at mellanox.co.il (Yael Kalka) Date: 12 Dec 2005 09:52:49 +0200 Subject: [openib-general] [PATCH] Opensm - fix bug in osm_sa_portinfo_record Message-ID: <5z7jaahhv2.fsf@mtl066.yok.mtl.com> Hi Hal, During some tests here, we noticed that if the SA is queried with IB_PIR_COMPMASK_BASELID, and base_lid = zero - the SA will return all the ports in the result. The following patch fixes this. Thanks, Yael Signed-off-by: Yael Kalka Index: opensm/osm_sa_portinfo_record.c =================================================================== --- opensm/osm_sa_portinfo_record.c (revision 4371) +++ opensm/osm_sa_portinfo_record.c (working copy) @@ -266,6 +266,13 @@ __osm_sa_pir_check_physp( &p_physp->port_info, OSM_LOG_DEBUG ); + /* We have to re-check the base_lid, since if the given + base_lid in p_pi is zero - we are comparing on all ports. */ + if( comp_mask & IB_PIR_COMPMASK_BASELID ) + { + if( p_comp_pi->base_lid != p_pi->base_lid ) + goto Exit; + } if( comp_mask & IB_PIR_COMPMASK_MKEY ) { if( p_comp_pi->m_key != p_pi->m_key ) @@ -586,6 +593,9 @@ osm_pir_rcv_process( goto Exit; } + if ( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) + osm_dump_portinfo_record( p_rcv->p_log, p_rcvd_rec, OSM_LOG_DEBUG ); + p_tbl = &p_rcv->p_subn->port_lid_tbl; p_pi = &p_rcvd_rec->port_info; From krkumar2 at in.ibm.com Mon Dec 12 02:52:13 2005 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Mon, 12 Dec 2005 16:22:13 +0530 Subject: [openib-general] [PATCH fixed] was Re: [PATCH] ipoib_multicast/ipoib_mcast_send race In-Reply-To: <20051212065623.GD24168@mellanox.co.il> Message-ID: Hi Michael, I see what you are doing with that lock :-) But isn't the lock a hack ? Eg, I could instead do this (If I can make sure the redundant lock/unlock is not optimized out) and it would still work : stop_thread() { clear_bit(); lock(); /* empty lock/unlock to synchronize with the mcast_send() */ unlock(); /* make the other routine FINISH before we start other activity */ ... ... } mcast_send() { lock(); if (test_bit) ... ... 
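/* (in the real code, everything elided by the "..." above -- creating
   the group, list_add(), init_completion() -- runs while the lock is
   held, which is what would let the empty lock/unlock in stop_thread()
   act as a barrier) */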
unlock(); } So basically, the lock is not required for clearing (and absolutely not required for setting) the bit, but a lock is required before we start waiting, to enable us to synchronize with any sends. The lock is being used as a signalling mechanism between two processes (in this case, lock/unlock is a mechanism for the mcast_send() to finish if running). Thanks, - KK "Michael S. Tsirkin" wrote on 12/12/2005 12:26:23 PM: > Quoting r. Krishna Kumar2 : > > Isn't all that managed by clearing/testing the bit ? Because holding the > > lock doesn't solve > > it. > > To give an example : > > > > stop_thread() > > { > > lock(); > > clear(); > > unlock(); > > ... > > wait_for_completion(mcast); > > } > > > > mcast_send() > > { > > lock(); > > test(); > > results_in_creation_of_entries_done_etc();; > > unlock(); > > } > > > > In this case, if mcast_send() gets the lock first and proceeds while the > > stop_thread is spinning > > on the lock, the entries are created and then the stop_thread() clears > > the bit and at this point > > in time, no more entries can be ever created. Now if the lock were > > removed, the behavior > > is identical - the mcast_send() would test the bit, and get the lock() > > while the stop_thread() > > clears the bit (without a lock) and waits for completion, while *no > > more* mcast_sends() would > > ever continue beyond this time. > > Now mcast_send can call init_completion *after* stop_thread does wait for > completion. > It could also call list_add while mcast_stop_thread walks the list. > > Thats what I am trying to prevent. > > > -- > MST -------------- next part -------------- An HTML attachment was scrubbed... URL: From mst at mellanox.co.il Mon Dec 12 03:03:50 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Dec 2005 13:03:50 +0200 Subject: [openib-general] [PATCH fixed] was Re: [PATCH] ipoib_multicast/ipoib_mcast_send race In-Reply-To: References: Message-ID: <20051212110350.GY14936@mellanox.co.il> Quoting Krishna Kumar2 : > stop_thread() > { > clear_bit(); > lock(); /* empty lock/unlock to synchronize with the mcast_send() */ > unlock(); /* make the other routine FINISH before we start other activity */ > ... > ... > } > mcast_send() > { > lock(); > if (test_bit) > ... > ... > unlock(); > } I think this will work, too. But I have easier time reasoning about locks than barriers and atomic operations. "bit is protected by priv->lock" is a simple rule, and we are not on data path here. The fact that the race went unnoticed for a while validates this approach in my eyes. -- MST From krkumar2 at in.ibm.com Mon Dec 12 03:30:04 2005 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Mon, 12 Dec 2005 17:00:04 +0530 Subject: [openib-general] [PATCH fixed] was Re: [PATCH] ipoib_multicast/ipoib_mcast_send race In-Reply-To: <20051212110350.GY14936@mellanox.co.il> Message-ID: Hi Michael, (wow, don't you sleep!) > I think this will work, too. But I have easier time reasoning about locks than > barriers and atomic operations. "bit is protected by priv->lock" is a simple Correct. Also, some optimization in mcast_send() could be done by moving the label "out:" to just before the spin_unlock() (and change "mcast = NULL" to "goto out"). - KK "Michael S. Tsirkin" wrote on 12/12/2005 04:33:50 PM: > Quoting Krishna Kumar2 : > > stop_thread() > > { > > clear_bit(); > > lock(); /* empty lock/unlock to synchronize with the mcast_send() */ > > unlock(); /* make the other routine FINISH before we start other activity */ > > ... > > ... 
> > } > > mcast_send() > > { > > lock(); > > if (test_bit) > > ... > > ... > > unlock(); > > } > I think this will work, too. But I have easier time reasoning about locks than > barriers and atomic operations. "bit is protected by priv->lock" is a simple > rule, and we are not on data path here. > > The fact that the race went unnoticed for a while validates this approach > in my eyes. > > -- > MST -------------- next part -------------- An HTML attachment was scrubbed... URL: From halr at voltaire.com Mon Dec 12 06:07:20 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Dec 2005 09:07:20 -0500 Subject: [openib-general] [PATCH] OPENSM identify failure cases uniquely In-Reply-To: <20051211044231.GG9348@esmail.cup.hp.com> References: <20051211044231.GG9348@esmail.cup.hp.com> Message-ID: <1134396439.4485.28325.camel@hal.voltaire.com> Hi Grant, On Sat, 2005-12-10 at 23:42, Grant Grundler wrote: > Hi, > When tracking down the opensm "can't open port" failure described > in previous email, I added log output for each of the failure cases > in osm_vendor_open_port(). Thanks. Applied. > The "ERR" numbers need to be compared to some "master list" that > I don't know about and replaced. > I just picked sequential numbers not used in that routine. I modified the error numbers to be unique. -- Hal From halr at voltaire.com Mon Dec 12 06:24:31 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Dec 2005 09:24:31 -0500 Subject: [openib-general] Re: [PATCH] Opensm - fix bug in osm_sa_portinfo_record In-Reply-To: <5z7jaahhv2.fsf@mtl066.yok.mtl.com> References: <5z7jaahhv2.fsf@mtl066.yok.mtl.com> Message-ID: <1134397469.4485.28408.camel@hal.voltaire.com> On Mon, 2005-12-12 at 02:52, Yael Kalka wrote: > Hi Hal, > > During some tests here, we noticed that if the SA is queried with > IB_PIR_COMPMASK_BASELID, and base_lid = zero - the SA will return all > the ports in the result. > The following patch fixes this. Thanks. Applied. From ianjiang.ict at gmail.com Mon Dec 12 06:37:07 2005 From: ianjiang.ict at gmail.com (Ian Jiang) Date: Mon, 12 Dec 2005 22:37:07 +0800 Subject: [openib-general] [kDAPL]Need the array of physical pages be contiguous when using dat_lmr_kcreate Message-ID: <7b2fa1820512120637m2869e1fdjf4c962decc4de9ae@mail.gmail.com> I created a LMR from three buffers which were allocated respectively with kmalloc of size 64kB. The registration went well, but the subsequent rdma read dto completed with a DAT_DTO_ERR_LOCAL_PROTECTION error. Was that because the physical addresses of the three buffers were not contiguous? Any suggestion is appreciated! -- Ian Jiang ianjiang.ict at gmail.com Laboratory of Spatial Information Technology Division of System Architecture Institute of Computing Technology Chinese Academy of Sciences -------------- next part -------------- An HTML attachment was scrubbed... URL: From mst at mellanox.co.il Mon Dec 12 06:56:29 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Dec 2005 16:56:29 +0200 Subject: [openib-general] [PATCH] sdp: replace ip_dev_find with dev_base scan (was Re: ip_dev_find resolution?) In-Reply-To: <20051207082220.GK21035@mellanox.co.il> References: <20051207082220.GK21035@mellanox.co.il> Message-ID: <20051212145629.GF14936@mellanox.co.il> Quoting Michael S. Tsirkin : > Subject: Re: ip_dev_find resolution? > > Quoting r. Hal Rosenstock : > > Subject: Re: ip_dev_find resolution? > > > > On Tue, 2005-12-06 at 12:56, Michael S. 
Tsirkin wrote: > > > Actually, I wander whether instead of ip_dev_find we can just > > > > > > read_lock(&dev_base_lock); > > > for (dev = dev_base; dev; dev = dev->next) { > > > > > > and check the ip address? > > > > working off the ip_ptr and ip6_ptr ? > > Yes. > > > > If this works, this has the advantage of supporting IPv6 as well. > > > > This was introduced at one point and we subsequently changed to > > ip_dev_find. I forget exactly why this was but can dig it out if no > one > > recalls. > > Please do. Any updates? Hal? I've coded the following up since I grew tired of patching my kernels to run sdp. Seems to work fine for me, can someone please speak up on why this isnt a good idea for CMA, as well? Ultimately, IMO this also has a better chance to be generalizable to IPv6. --- Replace ip_dev_find (which isnt exported in 2.6.14) with full device list lookup. Signed-off-by: Michael S. Tsirkin Index: linux-2.6.14.3/drivers/infiniband/ulp/sdp/sdp_link.c =================================================================== --- linux-2.6.14.3.orig/drivers/infiniband/ulp/sdp/sdp_link.c +++ linux-2.6.14.3/drivers/infiniband/ulp/sdp/sdp_link.c @@ -346,6 +346,41 @@ static int sdp_link_path_rec_get(struct return 0; } +static int tryaddrmatch(struct net_device *dev, u32 s_addr, u32 d_addr) +{ + struct in_ifaddr **ifap; + struct in_ifaddr *ifa; + struct in_device *in_dev; + int rc = -ENETUNREACH; + __be32 addr; + + if (dev->type != ARPHRD_INFINIBAND) + return rc; + + in_dev = in_dev_get(dev); + if (!in_dev) + return rc; + + addr = (ZERONET(s_addr) || LOOPBACK(s_addr)) ? d_addr : s_addr; + + /* Hack to enable using SDP on addresses such as 127.0.0.1 */ + if (ZERONET(addr) || LOOPBACK(addr)) { + rc = (dev->flags & IFF_UP) ? 0 : -ENETUNREACH; + goto done; + } + + for (ifap = &in_dev->ifa_list; (ifa = *ifap); ifap = &ifa->ifa_next) { + if (s_addr == ifa->ifa_address) { + rc = 0; + break; /* found */ + } + } + +done: + in_dev_put(in_dev); + return rc; +} + /* * do_link_path_lookup - resolve an ip address to a path record */ @@ -406,17 +441,9 @@ static void do_link_path_lookup(struct s rt->u.dst.neighbour->dev->name, rt->rt_src, rt->rt_dst, rt->rt_gateway, rt->u.dst.neighbour->nud_state); - /* - * device needs to be a valid IB device. Check for loopback. - * In case of loopback find a valid IB device on which to - * direct the loopback traffic. - */ - if (rt->u.dst.neighbour->dev->flags & IFF_LOOPBACK) - dev = ip_dev_find(rt->rt_src); - else { - dev = rt->u.dst.neighbour->dev; - dev_hold(dev); - } + + dev = rt->u.dst.neighbour->dev; + dev_hold(dev); /* * check for IB device or loopback, the later requires extra @@ -433,13 +460,11 @@ static void do_link_path_lookup(struct s if (dev->flags & IFF_LOOPBACK) { dev_put(dev); read_lock(&dev_base_lock); - for (dev = dev_base; dev; dev = dev->next) { - if (dev->type == ARPHRD_INFINIBAND && - (dev->flags & IFF_UP)) { + for (dev = dev_base; dev; dev = dev->next) + if (!tryaddrmatch(dev, rt->rt_src, rt->rt_dst)) { dev_hold(dev); break; } - } read_unlock(&dev_base_lock); } -- MST From halr at voltaire.com Mon Dec 12 07:08:09 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Dec 2005 10:08:09 -0500 Subject: [openib-general] Re: [PATCH] sdp: replace ip_dev_find with dev_base scan (was Re: ip_dev_find resolution?) In-Reply-To: <20051212145629.GF14936@mellanox.co.il> References: <20051207082220.GK21035@mellanox.co.il> <20051212145629.GF14936@mellanox.co.il> Message-ID: <1134400089.4485.28653.camel@hal.voltaire.com> On Mon, 2005-12-12 at 09:56, Michael S. 
Tsirkin wrote: > Quoting Michael S. Tsirkin : > > Subject: Re: ip_dev_find resolution? > > > > Quoting r. Hal Rosenstock : > > > Subject: Re: ip_dev_find resolution? > > > > > > On Tue, 2005-12-06 at 12:56, Michael S. Tsirkin wrote: > > > > Actually, I wander whether instead of ip_dev_find we can just > > > > > > > > read_lock(&dev_base_lock); > > > > for (dev = dev_base; dev; dev = dev->next) { > > > > > > > > and check the ip address? > > > > > > working off the ip_ptr and ip6_ptr ? > > > > Yes. > > > > > > If this works, this has the advantage of supporting IPv6 as well. > > > > > > This was introduced at one point and we subsequently changed to > > > ip_dev_find. I forget exactly why this was but can dig it out if no > > one > > > recalls. > > > > Please do. > > Any updates? Hal? > > I've coded the following up since I grew tired of patching my kernels > to run sdp. Seems to work fine for me, can someone please speak up > on why this isnt a good idea for CMA, as well? Sorry for the slow response on this. I meant to dig this out over the weekend. I believe the reason it was changed from searching the netdevices list to ip_dev_find originally was that this really is a route lookup on the dest addr to determine what the local outgoing interface is and that can't be done directly from the netdevices list if the destination is not (IP) subnet local (e.g. gateway cases). > Ultimately, IMO this also has a better chance to be generalizable to IPv6. Yes, we need a different lookup for IPv6 with the current (ip_dev_find) approach. -- Hal From mst at mellanox.co.il Mon Dec 12 07:28:43 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Dec 2005 17:28:43 +0200 Subject: [openib-general] Re: [PATCH] sdp: replace ip_dev_find with dev_base scan (was Re: ip_dev_find resolution?) In-Reply-To: <1134400089.4485.28653.camel@hal.voltaire.com> References: <1134400089.4485.28653.camel@hal.voltaire.com> Message-ID: <20051212152843.GJ14936@mellanox.co.il> Quoting Hal Rosenstock : > > I've coded the following up since I grew tired of patching my kernels > > to run sdp. Seems to work fine for me, can someone please speak up > > on why this isnt a good idea for CMA, as well? > > Sorry for the slow response on this. I meant to dig this out over the > weekend. > > I believe the reason it was changed from searching the netdevices list > to ip_dev_find originally was that this really is a route lookup on the > dest addr to determine what the local outgoing interface is and that > can't be done directly from the netdevices list if the destination is > not (IP) subnet local (e.g. gateway cases). What you are saying is, the original approach didnt do ip route lookup at all, and thats why it was changed? But now we only do the list walk if the ip route resolution returns a loopback device, so we are ok, right? -- MST From mst at mellanox.co.il Mon Dec 12 08:09:20 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Dec 2005 18:09:20 +0200 Subject: [openib-general] [PATCH] libibverbs: report board id in ibv_devinfo Message-ID: <20051212160920.GM14936@mellanox.co.il> Report board_id from ibv_devinfo, if present. Signed-off-by: Dotan Barak Signed-off-by: Michael S. 
Tsirkin Index: openib/src/userspace/libibverbs/examples/devinfo.c =================================================================== --- openib/src/userspace/libibverbs/examples/devinfo.c (revision 4399) +++ openib/src/userspace/libibverbs/examples/devinfo.c (working copy) @@ -174,6 +174,7 @@ static int print_hca_cap(struct ibv_devi struct ibv_context *ctx; struct ibv_device_attr device_attr; struct ibv_port_attr port_attr; + struct sysfs_attribute *attr; int rc = 0; uint8_t port; char buf[256]; @@ -198,6 +199,12 @@ static int print_hca_cap(struct ibv_devi printf("\tvendor_id:\t\t\t0x%04x\n", device_attr.vendor_id); printf("\tvendor_part_id:\t\t\t%d\n", device_attr.vendor_part_id); printf("\thw_ver:\t\t\t\t0x%X\n", device_attr.hw_ver); + attr = sysfs_get_classdev_attr(ib_dev->ibdev, "board_id"); + if (attr) { + printf("\tboard_id:\t\t\t%s", attr->value); + sysfs_close_attribute(attr); + } + printf("\tphys_port_cnt:\t\t\t%d\n", device_attr.phys_port_cnt); if (verbose) { -- MST From robert.j.woodruff at intel.com Mon Dec 12 08:30:38 2005 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Mon, 12 Dec 2005 08:30:38 -0800 Subject: [openib-general] RE: Next workshop dates? Feedback from others please!!! Message-ID: <1AC79F16F5C5284499BB9591B33D6F00065627B7@orsmsx408> Sounds like the 5th is the best date to accommodate most people. woody -----Original Message----- From: Bill Boas [mailto:bboas at llnl.gov] Sent: Sunday, December 11, 2005 4:18 PM To: Woodruff, Robert J; openib-general at openib.org; openib-promoters at openib.org Subject: RE: Next workshop dates? Feedback from others please!!! Bob, Voltaire says they cannot not make Jan 29 'cos they have an all hands 3 day meeting in that time slot. In the Feb 5-8 timeframe we worked in the super bowl last year OK I think??? By the time we get to late Feb the hotel rates in Sonoma go up so I'm thinking we should go for Feb 5 again, but feedback on this would be very helpful as we need to make a decision in the next few days. Feedback from others please. Bill. At 03:10 PM 12/9/2005, you wrote: >Jan29-Feb1 works for me. >Feb 5 is super bowl Sunday, might want to stay away from that one >Feb 12 - Sean is on vacation. >Feb 20 - Is a holiday in the US. >How about Feb 27- March 2 as an alternative if people cannot make Jan29 >? > >woody > > >-----Original Message----- >From: openib-general-bounces at openib.org >[mailto:openib-general-bounces at openib.org] On Behalf Of Bill Boas >Sent: Thursday, December 08, 2005 10:53 PM >To: openib-promoters at openib.org; openib-general at openib.org >Subject: [openib-general] Next workshop dates? Please respond with >yourpreferences > >All those wishing to attend the next workshop in Sonoma at the Lodge >(same as last year) in the late January-early February please respond >with your preferred dates. > >We currently have Jan29-Feb1 held for us but some people are telling >us that is bad for them. > >The next 2 Sun-Wed slots (Feb 5-8 or 12-15) maybe available but we >need guidance from those planning to attend as to their preferred dates. > >Bill. 
> >Bill Boas bboas at llnl.gov >ICCD LLNL, B-453, R-2018 Wk: 925-422-4110 >7000 East Ave, L-555 Cell: 925-337-2224 >Livermore, CA 94551 Pgr: 877-203-2248 > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit >http://openib.org/mailman/listinfo/openib-general Bill Boas bboas at llnl.gov ICCD LLNL, B-453, R-2018 Wk: 925-422-4110 7000 East Ave, L-555 Cell: 925-337-2224 Livermore, CA 94551 Pgr: 877-203-2248 From mst at mellanox.co.il Mon Dec 12 08:58:59 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Dec 2005 18:58:59 +0200 Subject: [openib-general] [PATCH] mthca: correct IB_QP_ACCESS_FLAGS handling Message-ID: <20051212165859.GO14936@mellanox.co.il> This patch corrects some corner cases in managing the RAE/RRE bits in the mthca qp context. These bits need to be zero if the user requests max_dest_rd_atomic of zero. The bits need to be restored to the value implied by the qp access flags attribute in a previous (or the current) modify-qp command if the dest_rd_atomic variable is changed to non-zero. In the current implementation, the following scenario will not work: RESET-to-INIT set QP access flags to all disabled (zeroes) INIT-to-RTR set max_dest_rd_atomic=10, AND set qp_access_flags = IB_ACCESS_REMOTE_READ | IB_ACCESS_REMOTE_ATOMIC The current code will incorrectly take the access-flags value set in the RESET-to-INIT transition. --- Simplify, and correct, IB_QP_ACCESS_FLAGS handling: it is always safe to set qp access flags in hardware command if either of IB_QP_MAX_DEST_RD_ATOMIC or IB_QP_ACCESS_FLAGS is set, so lets just set it to the correct value, always. Signed-off-by: Jack Morgenstein Signed-off-by: Michael S. Tsirkin Index: linux-kernel/drivers/infiniband/hw/mthca/mthca_qp.c =================================================================== --- linux-kernel.orig/drivers/infiniband/hw/mthca/mthca_qp.c +++ linux-kernel/drivers/infiniband/hw/mthca/mthca_qp.c @@ -520,6 +520,36 @@ static void init_port(struct mthca_dev * mthca_warn(dev, "INIT_IB returned status %02x.\n", status); } +static u32 get_hw_access_flags(struct mthca_qp *qp, struct ib_qp_attr *attr, + int attr_mask) +{ + u8 dest_rd_atomic; + u32 access_flags; + u32 hw_access_flags; + + if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC) + dest_rd_atomic = attr->max_dest_rd_atomic; + else + dest_rd_atomic = qp->resp_depth; + + if (attr_mask & IB_QP_ACCESS_FLAGS) + access_flags = (u32)attr->qp_access_flags; + else + access_flags = qp->atomic_rd_en; + + if (!dest_rd_atomic) + access_flags &= IB_ACCESS_REMOTE_WRITE; + + hw_access_flags = access_flags & IB_ACCESS_REMOTE_READ ? + MTHCA_QP_BIT_RRE : 0; + hw_access_flags |= access_flags & IB_ACCESS_REMOTE_ATOMIC ? + MTHCA_QP_BIT_RAE : 0; + hw_access_flags |= access_flags & IB_ACCESS_REMOTE_WRITE ? + MTHCA_QP_BIT_RWE : 0; + + return cpu_to_be32(hw_access_flags); +} + int mthca_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask) { struct mthca_dev *dev = to_mdev(ibqp->device); @@ -741,57 +776,7 @@ int mthca_modify_qp(struct ib_qp *ibqp, qp_context->snd_db_index = cpu_to_be32(qp->sq.db_index); } - if (attr_mask & IB_QP_ACCESS_FLAGS) { - qp_context->params2 |= - cpu_to_be32(attr->qp_access_flags & IB_ACCESS_REMOTE_WRITE ? - MTHCA_QP_BIT_RWE : 0); - - /* - * Only enable RDMA reads and atomics if we have - * responder resources set to a non-zero value. 
- */ - if (qp->resp_depth) { - qp_context->params2 |= - cpu_to_be32(attr->qp_access_flags & IB_ACCESS_REMOTE_READ ? - MTHCA_QP_BIT_RRE : 0); - qp_context->params2 |= - cpu_to_be32(attr->qp_access_flags & IB_ACCESS_REMOTE_ATOMIC ? - MTHCA_QP_BIT_RAE : 0); - } - - qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RWE | - MTHCA_QP_OPTPAR_RRE | - MTHCA_QP_OPTPAR_RAE); - } - if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC) { - if (qp->resp_depth && !attr->max_dest_rd_atomic) { - /* - * Lowering our responder resources to zero. - * Turn off reads RDMA and atomics as responder. - * (RRE/RAE in params2 already zero) - */ - qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RRE | - MTHCA_QP_OPTPAR_RAE); - } - - if (!qp->resp_depth && attr->max_dest_rd_atomic) { - /* - * Increasing our responder resources from - * zero. Turn on RDMA reads and atomics as - * appropriate. - */ - qp_context->params2 |= - cpu_to_be32(qp->atomic_rd_en & IB_ACCESS_REMOTE_READ ? - MTHCA_QP_BIT_RRE : 0); - qp_context->params2 |= - cpu_to_be32(qp->atomic_rd_en & IB_ACCESS_REMOTE_ATOMIC ? - MTHCA_QP_BIT_RAE : 0); - - qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RRE | - MTHCA_QP_OPTPAR_RAE); - } - if (attr->max_dest_rd_atomic) qp_context->params2 |= cpu_to_be32(fls(attr->max_dest_rd_atomic - 1) << 21); @@ -799,6 +784,13 @@ int mthca_modify_qp, qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RRA_MAX); } + if (attr_mask & (IB_QP_ACCESS_FLAGS | IB_QP_MAX_DEST_RD_ATOMIC)) { + qp_context->params2 |= get_hw_access_flags(qp, attr, attr_mask); + qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RWE | + MTHCA_QP_OPTPAR_RRE | + MTHCA_QP_OPTPAR_RAE); + } + qp_context->params2 |= cpu_to_be32(MTHCA_QP_BIT_RSC); if (ibqp->srq) -- MST 
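
To make the corner case concrete, the problematic sequence from the patch description reads roughly as follows in verbs terms. A sketch only: it assumes an RC QP, and the other attributes the INIT and RTR transitions require (pkey index, port, path, PSN, etc.) are omitted:

struct ib_qp_attr attr;

memset(&attr, 0, sizeof attr);
attr.qp_state        = IB_QPS_INIT;
attr.qp_access_flags = 0;		/* all remote access disabled */
ib_modify_qp(qp, &attr, IB_QP_STATE | IB_QP_ACCESS_FLAGS /* | ... */);

memset(&attr, 0, sizeof attr);
attr.qp_state           = IB_QPS_RTR;
attr.max_dest_rd_atomic = 10;
attr.qp_access_flags    = IB_ACCESS_REMOTE_READ | IB_ACCESS_REMOTE_ATOMIC;
ib_modify_qp(qp, &attr, IB_QP_STATE | IB_QP_MAX_DEST_RD_ATOMIC |
	     IB_QP_ACCESS_FLAGS /* | ... */);

/* Before the patch, mthca computed RRE/RAE for the second call from
 * qp->atomic_rd_en, i.e. the flags stored by the first transition, so
 * reads and atomics stayed disabled; get_hw_access_flags() recomputes
 * them from whichever of the two attributes was supplied most recently. */
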
From iod00d at hp.com Mon Dec 12 09:38:24 2005 From: iod00d at hp.com (Grant Grundler) Date: Mon, 12 Dec 2005 09:38:24 -0800 Subject: [openib-general] [PATCH applied] sdp: fix kunmap_atomic usage In-Reply-To: <20051211180543.GR14936@mellanox.co.il> References: <20051211175341.GA12176@esmail.cup.hp.com> <20051211180543.GR14936@mellanox.co.il> Message-ID: <20051212173824.GA15771@esmail.cup.hp.com> On Sun, Dec 11, 2005 at 08:05:43PM +0200, Michael S. Tsirkin wrote: > This might be benign, need to check. > Did the test run to completion with these messages? I aborted the test with ^C and I tried to restart it. The two hosts could not ping each other via the IPoIB link. I'll poke at this more later today. grant From mst at mellanox.co.il Mon Dec 12 09:42:39 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Dec 2005 19:42:39 +0200 Subject: [openib-general] [PATCH applied] sdp: fix kunmap_atomic usage In-Reply-To: <20051212173824.GA15771@esmail.cup.hp.com> References: <20051212173824.GA15771@esmail.cup.hp.com> Message-ID: <20051212174239.GQ14936@mellanox.co.il> Quoting r. Grant Grundler : > Subject: Re: [openib-general] [PATCH applied] sdp: fix kunmap_atomic usage > > On Sun, Dec 11, 2005 at 08:05:43PM +0200, Michael S. Tsirkin wrote: > > This might be benign, need to check. > > Did the test run to completion with these messages? > > I aborted the test with ^C and I tried to restart it. > The two hosts could not ping each other via the IPoIB link. > I'll poke at this more later today. Unless SDP triggered an oops, this doesnt sound like an SDP problem ... -- MST From rdreier at cisco.com Mon Dec 12 09:39:30 2005 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 12 Dec 2005 09:39:30 -0800 Subject: [openib-general] Next workshop dates? Ideas for agenda??? In-Reply-To: <6.2.3.4.2.20051211160126.03bc3878@mail-lc.llnl.gov> (Bill Boas's message of "Sun, 11 Dec 2005 16:10:54 -0800") References: <6AB138A2AB8C8E4A98B9C0C3D52670E366C44A@mtlexch01.mtl.com> <6.2.3.4.2.20051211160126.03bc3878@mail-lc.llnl.gov> Message-ID: Bill> I think, subject to others input, it'll be focused on Bill> wrapping up rel 1.0 of OpenIB, discussing what the Bill> developers are going to focus on next and validating the Bill> strategy for RDMA over Ethernet integration at the verbs Bill> level to lay the foundation for one, consistent RDMA Bill> structure in Linux, if possible. I'm not sure I see the point in dragging everyone together in early February. With the holidays coming, realistically we only have maybe 5 weeks to prepare a conference agenda, and I don't see that as being enough time to set up a productive meeting. In particular: * wrapping up rel 1.0 -- the release process for a "1.0" release has not even started. About all we could hope to accomplish would be to pick a release manager and tell that person to go start driving a release, and I don't see that as a good use of face-to-face time. It would be much better to pick someone to drive the release and then give the release manager time to start putting the release together before getting together, so that we have some idea of what the real issues that need to be hashed out in person are. * iWARP integration -- again, not enough discussion has taken place in advance. Until the community has a chance to really study the proposed changes and figure out what the real difficult issues that need to be sorted out in person are, again it's a waste of time to meet in person. * discuss developers next steps -- perhaps I'm pessimistic but I think we'll just get the same talks we've already seen twice before at Sonoma and IDF. Sonoma is a short trip for me but given the number of people that will have to come from the East coast and Israel, I think we should think hard about whether this conference is the best use of our time. - R. From mshefty at ichips.intel.com Mon Dec 12 09:54:26 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 12 Dec 2005 09:54:26 -0800 Subject: [openib-general] [PATCH] [CMA] support for SDP + standard protocol In-Reply-To: References: Message-ID: <439DB952.3040506@ichips.intel.com> Dan Bar Dov wrote: > I would have preferred not to add upper layer aware code into CMA, > but I guess I'm late for that discussion. If you mean add SDP code to the CMA, without it, SDP cannot use the CMA and must duplicate most of the same functionality itself. > Regarding the patch below, it makes sense. Are you going to apply it to all > affected modules? I will apply the patch to all affected modules. - Sean From bboas at llnl.gov Mon Dec 12 09:55:03 2005 From: bboas at llnl.gov (Bill Boas) Date: Mon, 12 Dec 2005 09:55:03 -0800 Subject: [Openib-promoters] Re: [openib-general] Next workshop dates? Ideas for agenda??? In-Reply-To: References: <6AB138A2AB8C8E4A98B9C0C3D52670E366C44A@mtlexch01.mtl.com> <6.2.3.4.2.20051211160126.03bc3878@mail-lc.llnl.gov> Message-ID: <6.2.3.4.2.20051212094618.01ff5c70@mail-lc.llnl.gov> Roland, These are all excellent perspectives, I hope others will respond with their viewpoints. 
Certainly repeating what we have heard already is not a good use of anyone's time or money but I'm under the impression that we will have made some progress toward what we want to work on next as a result of PathForward Phase 2, input from Tom Tucker and others on OpenIB iWARP integration and the HSIR meeting in NYC tomorrow. With respect to "release" of OpenIB rel 1.0, did Doug Ledford effectively do that a week or two ago? I think those of us (including me) who originally thought OpenIB was actually going to be an organization that released and supported code (like RedHat, say) had got it wrong. Now I believe that when a Linux distribution, an IB company or a Tier One OEM decides that a version of the code is one that they will support, then that is a "release". OpenIB may be best utilized to try to achieve some consistency in timeframe and content amongst those who wish to "release and support" the code??? Bill. At 09:39 AM 12/12/2005, Roland Dreier wrote: > Bill> I think, subject to others input, it'll be focused on > Bill> wrapping up rel 1.0 of OpenIB, discussing what the > Bill> developers are going to focus on next and validating the > Bill> strategy for RDMA over Ethernet integration at the verbs > Bill> level to lay the foundation for one, consistent RDMA > Bill> structure in Linux, if possible. > >I'm not sure I see the point in dragging everyone together in early >February. With the holidays coming, realistically we only have maybe >5 weeks to prepare a conference agenda, and I don't see that as being >enough time to set up a productive meeting. > >In particular: > > * wrapping up rel 1.0 -- the release process for a "1.0" release has > not even started. About all we could hope to accomplish would be > to pick a release manager and tell that person to go start driving > a release, and I don't see that as a good use of face-to-face > time. It would be much better to pick someone to drive the release > and then give the release manager time to start putting the release > together before getting together, so that we have some idea of what > the real issues that need to be hashed out in person are. > > * iWARP integration -- again, not enough discussion has taken place > in advance. Until the community has a chance to really study the > proposed changes and figure out what the real difficult issues that > need to be sorted out in person are, again it's a waste of time to > meet in person. > > * discuss developers next steps -- perhaps I'm pessimistic but I > think we'll just get the same talks we've already seen twice before > at Sonoma and IDF. > >Sonoma is a short trip for me but given the number of people that will >have to come from the East coast and Israel, I think we should think >hard about whether this conference is the best use of our time. > > - R. >_______________________________________________ >openib-promoters mailing list >openib-promoters at openib.org >http://openib.org/mailman/listinfo/openib-promoters Bill Boas bboas at llnl.gov ICCD LLNL, B-453, R-2018 Wk: 925-422-4110 7000 East Ave, L-555 Cell: 925-337-2224 Livermore, CA 94551 Pgr: 877-203-2248 From robert.j.woodruff at intel.com Mon Dec 12 11:37:26 2005 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Mon, 12 Dec 2005 11:37:26 -0800 Subject: [Openib-promoters] Re: [openib-general] Next workshopdates? Ideas for agenda??? In-Reply-To: <6.2.3.4.2.20051212094618.01ff5c70@mail-lc.llnl.gov> Message-ID: Roland wrote, >I'm not sure I see the point in dragging everyone together in early >February. 
With the holidays coming, realistically we only have maybe >5 weeks to prepare a conference agenda, and I don't see that as being >enough time to set up a productive meeting. Another possibility would be to delay the workshop till early March and have it the day before IDF, as we did last fall. Thoughts ? woody From halr at voltaire.com Mon Dec 12 12:41:50 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Dec 2005 15:41:50 -0500 Subject: [openib-general] Re: [PATCH] sdp: replace ip_dev_find with dev_base scan (was Re: ip_dev_find resolution?) In-Reply-To: <20051212152843.GJ14936@mellanox.co.il> References: <1134400089.4485.28653.camel@hal.voltaire.com> <20051212152843.GJ14936@mellanox.co.il> Message-ID: <1134420109.4485.31207.camel@hal.voltaire.com> On Mon, 2005-12-12 at 10:28, Michael S. Tsirkin wrote: > Quoting Hal Rosenstock : > > > I've coded the following up since I grew tired of patching my kernels > > > to run sdp. Seems to work fine for me, can someone please speak up > > > on why this isnt a good idea for CMA, as well? > > > > Sorry for the slow response on this. I meant to dig this out over the > > weekend. > > > > I believe the reason it was changed from searching the netdevices list > > to ip_dev_find originally was that this really is a route lookup on the > > dest addr to determine what the local outgoing interface is and that > > can't be done directly from the netdevices list if the destination is > > not (IP) subnet local (e.g. gateway cases). > > What you are saying is, the original approach didnt do ip route lookup at all, > and thats why it was changed? Yes, I believe so. > But now we only do the list walk if the ip route resolution returns > a loopback device, so we are ok, right? That's the way it seems to me (at least for SDP but not CMA (addr)). -- Hal From mst at mellanox.co.il Mon Dec 12 12:55:15 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Dec 2005 22:55:15 +0200 Subject: [openib-general] Re: [PATCH] sdp: replace ip_dev_find with dev_base scan (was Re: ip_dev_find resolution?) In-Reply-To: <1134420109.4485.31207.camel@hal.voltaire.com> References: <1134420109.4485.31207.camel@hal.voltaire.com> Message-ID: <20051212205515.GC28391@mellanox.co.il> Quoting Hal Rosenstock : > > But now we only do the list walk if the ip route resolution returns > > a loopback device, so we are ok, right? > > That's the way it seems to me (at least for SDP but not CMA (addr)). OK, Sean, do you want to cook up a patch for CMA based on this code of mine? I could do it too but I dont have a way to test cma yet. -- MST From halr at voltaire.com Mon Dec 12 12:54:55 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Dec 2005 15:54:55 -0500 Subject: [openib-general] Re: [PATCH] sdp: replace ip_dev_find with dev_base scan (was Re: ip_dev_find resolution?) In-Reply-To: <20051212205515.GC28391@mellanox.co.il> References: <1134420109.4485.31207.camel@hal.voltaire.com> <20051212205515.GC28391@mellanox.co.il> Message-ID: <1134420895.4485.31272.camel@hal.voltaire.com> On Mon, 2005-12-12 at 15:55, Michael S. Tsirkin wrote: > Quoting Hal Rosenstock : > > > But now we only do the list walk if the ip route resolution returns > > > a loopback device, so we are ok, right? > > > > That's the way it seems to me (at least for SDP but not CMA (addr)). > > OK, Sean, do you want to cook up a patch for CMA based on this code of mine? > I could do it too but I dont have a way to test cma yet. It's used in 2 ways in addr.c from what I can see. 
One is for the local address, the other not. So I'm not sure the same thing can be done here. -- Hal From lindahl at pathscale.com Mon Dec 12 13:23:25 2005 From: lindahl at pathscale.com (Greg Lindahl) Date: Mon, 12 Dec 2005 13:23:25 -0800 Subject: [Openib-promoters] Re: [openib-general] Next workshop dates? Ideas for agenda??? In-Reply-To: <6.2.3.4.2.20051212094618.01ff5c70@mail-lc.llnl.gov> References: <6AB138A2AB8C8E4A98B9C0C3D52670E366C44A@mtlexch01.mtl.com> <6.2.3.4.2.20051211160126.03bc3878@mail-lc.llnl.gov> <6.2.3.4.2.20051212094618.01ff5c70@mail-lc.llnl.gov> Message-ID: <20051212212325.GA1990@greglaptop.internal.keyresearch.com> On Mon, Dec 12, 2005 at 09:55:03AM -0800, Bill Boas wrote: > Now I believe that when a Linux > distribution, an IB company or a Tier One OEM decides that is a > version of the code that they will support, then that is a "release". Why don't we imitate the Linux kernel process? OpenIB has to follow a sane process of innovation followed by stabilization and bug-fixing in order for the IB companies and Tier 1s to be able to make solid releases. -- greg From rdreier at cisco.com Mon Dec 12 14:01:33 2005 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 12 Dec 2005 14:01:33 -0800 Subject: [Openib-promoters] Re: [openib-general] Next workshop dates? Ideas for agenda??? In-Reply-To: <6.2.3.4.2.20051212094618.01ff5c70@mail-lc.llnl.gov> (Bill Boas's message of "Mon, 12 Dec 2005 09:55:03 -0800") References: <6AB138A2AB8C8E4A98B9C0C3D52670E366C44A@mtlexch01.mtl.com> <6.2.3.4.2.20051211160126.03bc3878@mail-lc.llnl.gov> <6.2.3.4.2.20051212094618.01ff5c70@mail-lc.llnl.gov> Message-ID: Bill> With respect to "release" of OpenIB rel 1.0, did Doug Bill> Ledford effectively do that a week or two ago? No, Doug put a snapshot of the OpenIB tree into a RHEL update, which is quite a different thing. Bill> I think those of us ( including me) who originally thought Bill> OpenIB was actually going be an organization that released Bill> and supported code (like RedHat, say) had got it wrong. Now Bill> I believe that when a Linux distribution, an IB company or a Bill> Tier One OEM decides that is a version of the code that they Bill> will support, then that is a "release". OpenIB may be best Bill> utilized to try to achieve some consistency in timeframe and Bill> content amongst those who wish to "release and support" the Bill> code??? I think the model that projects like the Linux kernel, Gnome, KDE, Mozilla, et al follow is a good one: the project releases well-defined packages with a known version number, which distributors can then base their packages on (adding value through QA, bug fixes, additional functionality, or whatever else they can think of). For example, I know that my Ubuntu desktop ships with Gnome version 2.10, so even though Ubuntu has done a fair bit of customization, I know basically what to expect. - R. From mshefty at ichips.intel.com Mon Dec 12 14:36:56 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 12 Dec 2005 14:36:56 -0800 Subject: [openib-general] Re: [PATCH] sdp: replace ip_dev_find with dev_base scan (was Re: ip_dev_find resolution?) In-Reply-To: <20051212205515.GC28391@mellanox.co.il> References: <1134420109.4485.31207.camel@hal.voltaire.com> <20051212205515.GC28391@mellanox.co.il> Message-ID: <439DFB88.9000006@ichips.intel.com> Michael S. Tsirkin wrote: >>>But now we only do the list walk if the ip route resolution returns >>>a loopback device, so we are ok, right? 
>> >>That's the way it seems to me (at least for SDP but not CMA (addr)). > > OK, Sean, do you want to cook up a patch for CMA based on this code of mine? > I could do it too but I dont have a way to test cma yet. I need to read back over this thread; I didn't follow it the first time. (At this point, I think that it makes more sense to convert SDP to use the CMA or ib_addr, rather than duplicating their functionality.) There are both kernel and userspace test programs (cmatose) for the CMA checked into the tree. ib_addr can use a different approach than ip_dev_find, but I think it makes sense to use existing kernel functionality wherever possible. If ip_dev_find cannot be modified to support IPv6 addresses, then how about adding an API that is similar, but takes a pointer to an address, along with an indication of the address family that's used? I can work on a patch for this. - Sean 
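
As a sketch of what Sean's suggestion might look like for the IPv4 case, modeled on the tryaddrmatch() loop from the patch earlier in this thread. ip_dev_find_family() is an invented name, not an existing kernel API, and the IPv6 branch is only stubbed out:

/* assumes <linux/netdevice.h> and <linux/inetdevice.h> */
static struct net_device *ip_dev_find_family(int family, const void *addr)
{
	struct net_device *dev;
	struct in_device *in_dev;
	struct in_ifaddr *ifa;
	__be32 ip;

	if (family != AF_INET)
		return NULL;	/* IPv6 would walk dev->ip6_ptr here */
	ip = *(const __be32 *)addr;

	read_lock(&dev_base_lock);
	for (dev = dev_base; dev; dev = dev->next) {
		in_dev = in_dev_get(dev);
		if (!in_dev)
			continue;
		for (ifa = in_dev->ifa_list; ifa; ifa = ifa->ifa_next) {
			if (ifa->ifa_address == ip) {
				dev_hold(dev);	/* caller does dev_put() */
				in_dev_put(in_dev);
				goto out;
			}
		}
		in_dev_put(in_dev);
	}
out:
	read_unlock(&dev_base_lock);
	return dev;	/* NULL if no interface owns the address */
}
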
+ + in_dev = in_dev_get(dev); + if (!in_dev) + return rc; And that supports the given address family. + + addr = (ZERONET(s_addr) || LOOPBACK(s_addr)) ? d_addr : s_addr; If the source address is for a loopback device, select by the destination address. + + /* Hack to enable using SDP on addresses such as 127.0.0.1 */ + if (ZERONET(addr) || LOOPBACK(addr)) { If the destination address is for loopback as well, select any device of the appropriate type that is up. + rc = (dev->flags & IFF_UP) ? 0 : -ENETUNREACH; + goto done; + } + + for (ifap = &in_dev->ifa_list; (ifa = *ifap); ifap = &ifa->ifa_next) { + if (addr == ifa->ifa_address) { Otherwise, look for a device with the appropriate IP address. + rc = 0; + break; /* found */ + } + } + +done: + in_dev_put(in_dev); + return rc; +} + /* * do_link_path_lookup - resolve an ip address to a path record */ @@ -406,17 +441,9 @@ static void do_link_path_lookup(struct s rt->u.dst.neighbour->dev->name, rt->rt_src, rt->rt_dst, rt->rt_gateway, rt->u.dst.neighbour->nud_state); - /* - * device needs to be a valid IB device. Check for loopback. - * In case of loopback find a valid IB device on which to - * direct the loopback traffic. - */ - if (rt->u.dst.neighbour->dev->flags & IFF_LOOPBACK) - dev = ip_dev_find(rt->rt_src); - else { - dev = rt->u.dst.neighbour->dev; - dev_hold(dev); - } + + dev = rt->u.dst.neighbour->dev; + dev_hold(dev); /* * check for IB device or loopback, the later requires extra @@ -433,13 +460,11 @@ static void do_link_path_lookup(struct s if (dev->flags & IFF_LOOPBACK) { dev_put(dev); read_lock(&dev_base_lock); - for (dev = dev_base; dev; dev = dev->next) { - if (dev->type == ARPHRD_INFINIBAND && - (dev->flags & IFF_UP)) { + for (dev = dev_base; dev; dev = dev->next) + if (!tryaddrmatch(dev, rt->rt_src, rt->rt_dst)) { + dev_hold(dev); + break; } - } read_unlock(&dev_base_lock); } -- MST From bardov at gmail.com Tue Dec 13 00:40:26 2005 From: bardov at gmail.com (Dan Bar Dov) Date: Tue, 13 Dec 2005 10:40:26 +0200 Subject: [openib-general] [PATCH] [CMA] support for SDP + standard protocol In-Reply-To: <439DB952.3040506@ichips.intel.com> References: <439DB952.3040506@ichips.intel.com> Message-ID: On 12/12/05, Sean Hefty wrote: > Dan Bar Dov wrote: > > I would have preferred not to add upper layer aware code into CMA, > > but I guess I'm late for that discussion. > > If you mean add SDP code to the CMA, without it, SDP cannot use the CMA and must > duplicate most of the same functionality itself. I understand that SDP needs address translation services as well as its own private data. However, I think it could be implemented using optional API functions that allow the ULP to modify the private data per its need, rather than adding ULP knowledge into CMA. As an example, if the ISER spec is modified, or some new ULP is implemented that needs its own private data, we'll need to modify CMA again, as well as creating a dependency between CMA versions and ULPs. > > > Regarding the patch below, it makes sense. Are you going to apply it to all > affected modules? > > I will apply the patch to all affected modules. Thanks. Dan > > - Sean > From mst at mellanox.co.il Tue Dec 13 01:07:26 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 13 Dec 2005 11:07:26 +0200 Subject: [openib-general] Re: [PATCH] [CMA] support for SDP + standard protocol In-Reply-To: References: Message-ID: <20051213090726.GV14936@mellanox.co.il> Quoting Dan Bar Dov : > I understand that SDP needs address translation services as well as > its own private data.
SDP is an exception simply because it was there first. > However, I think it could be implemented using > optional API functions that allow the ULP to modify the private data > per its need, rather than adding ULP knowledge into CMA. I agree this would also work, but I like the existing API better. Hopefully, the simple way in which it's being implemented will help drive new ULP authors to follow the uniform spec rather than override it :) > As an example, if the ISER spec is modified, or some new ULP > is implemented that needs its own private data, we'll need to modify > CMA again, as well as creating a dependency between CMA versions and > ULPs. What do you mean by "CMA versions"? -- MST From mst at mellanox.co.il Tue Dec 13 01:09:19 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 13 Dec 2005 11:09:19 +0200 Subject: [openib-general] [PATCH] mthca: correct max_rd_atomic handling Message-ID: <20051213090919.GW14936@mellanox.co.il> Fix corner cases in max_rd_atomic value handling in modify-qp. sae/sre/swe bits should only be set when setting sra_max. Further, in current code, if caller specifies max_rd_atomic = 0, the sre and sae bits remain set, with the result that max_rd_atomic = 1 in effect. Signed-off-by: Jack Morgenstein Signed-off-by: Michael S. Tsirkin Index: latest/drivers/infiniband/hw/mthca/mthca_qp.c =================================================================== --- latest.orig/drivers/infiniband/hw/mthca/mthca_qp.c +++ latest/drivers/infiniband/hw/mthca/mthca_qp.c @@ -714,10 +714,7 @@ int mthca_modify_qp(struct ib_qp *ibqp, /* leave wqe_base as 0 (we always create an MR based at 0 for WQs) */ qp_context->wqe_lkey = cpu_to_be32(qp->mr.ibmr.lkey); qp_context->params1 = cpu_to_be32((MTHCA_ACK_REQ_FREQ << 28) | - (MTHCA_FLIGHT_LIMIT << 24) | - MTHCA_QP_BIT_SRE | - MTHCA_QP_BIT_SWE | - MTHCA_QP_BIT_SAE); + (MTHCA_FLIGHT_LIMIT << 24)); if (qp->sq_policy == IB_SIGNAL_ALL_WR) qp_context->params1 |= cpu_to_be32(MTHCA_QP_BIT_SSC); if (attr_mask & IB_QP_RETRY_CNT) { @@ -726,9 +723,14 @@ int mthca_modify_qp(struct ib_qp *ibqp, } if (attr_mask & IB_QP_MAX_QP_RD_ATOMIC) { - if (attr->max_rd_atomic) + qp_context->params1 |= cpu_to_be32(MTHCA_QP_BIT_SWE); + if (attr->max_rd_atomic) { + qp_context->params1 |= + cpu_to_be32(MTHCA_QP_BIT_SRE | + MTHCA_QP_BIT_SAE); qp_context->params1 |= cpu_to_be32(fls(attr->max_rd_atomic - 1) << 21); + } qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_SRA_MAX); } -- MST From mst at mellanox.co.il Tue Dec 13 02:22:55 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 13 Dec 2005 12:22:55 +0200 Subject: [openib-general] [PATCH] libmthca: fix error handling in mthca_store_qp In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E3D1B80D@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E3D1B80D@mtlexch01.mtl.com> Message-ID: <20051213102255.GX14936@mellanox.co.il> fix error handling in mthca_store_qp Signed-off-by: Jack Morgenstein Signed-off-by: Michael S.
Tsirkin Index: latest/src/userspace/libmthca/src/qp.c =================================================================== --- latest.orig/src/userspace/libmthca/src/qp.c +++ latest/src/userspace/libmthca/src/qp.c @@ -879,6 +879,7 @@ int mthca_store_qp(struct mthca_context ctx->qp_table[tind].table = calloc(ctx->qp_table_mask + 1, sizeof (struct mthca_qp *)); if (!ctx->qp_table[tind].table) { + --ctx->qp_table[tind].refcnt; ret = -1; goto out; } From sean.hefty at intel.com Tue Dec 13 10:39:13 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 13 Dec 2005 10:39:13 -0800 Subject: [openib-general] [PATCH] [CMA] support for SDP + standard protocol In-Reply-To: Message-ID: >I understand that SDP needs address translation services as well as >its own private data. However, I think it could be implemented using >optional API functions that allow the ULP to modify the private data >per its need, rather than adding ULP knowledge into CMA. >As an example, if the ISER spec is modified, or some new ULP >is implemented that needs its own private data, we'll need to modify >CMA again, as well as creating a dependency between CMA versions and >ULPs. The CMA must be aware of the format of the data in order to set and extract the IP addressing information. SDP and the new CMA format locate these in different areas of the private data. The CMA only defines the SDP hello header, and restricts its definition to the location of the IP addresses, source port, and version information. If a ULP wants to define their own private data format and move the locations of any of those fields, then yes, the CMA would need to be changed again. But I don't see how any API changes can prevent this, since the CMA must be able to extract the data on the remote side. - Sean From sean.hefty at intel.com Tue Dec 13 10:46:03 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 13 Dec 2005 10:46:03 -0800 Subject: [openib-general] Re: [PATCH] sdp: replace ip_dev_find with dev_base scan (was Re: ip_dev_find resolution?) In-Reply-To: <20051213050657.GA4940@mellanox.co.il> Message-ID: >Sure, this was just a test to show how to get rid of ip_dev_find. >I didn't follow CMA recently - does CMA already support SDP private format? >If yes, I need to work on moving SDP to use it. The latest check-in added support for SDP. >As a reminder, what we are trying to do is handle the loopback case, >where IP route resolution simply gives us back the loopback device, but >we want to use the IB loopback: either external if source/destination >are specified, or external if not. Can you look over the CMA/ib_addr code and see if what is done meets your needs (minus support for IPv6)? I've never tested the CMA using an address of 127.0.0.1, so I'm not sure what it would do in that case. I'm also not sure if it makes sense for the CMA to handle that case... I will review this patch early next week. - Sean From ftillier at silverstorm.com Tue Dec 13 12:14:45 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Tue, 13 Dec 2005 12:14:45 -0800 Subject: [openib-general] [PATCH] [CMA] support for SDP + standard protocol In-Reply-To: Message-ID: <000001c60021$e4215bb0$6401a8c0@infiniconsys.com> > From: Sean Hefty [mailto:sean.hefty at intel.com] > Sent: Tuesday, December 13, 2005 10:39 AM > > >I understand that SDP needs address translation services as well as > >its own private data.
However, I think it could be implemented using > >optional API functions that allow the ULP to modify the private data > >per its need, rather than adding ULP knowledge into CMA. > >As an example, if the ISER spec is modified, or some new ULP > >is implemented that needs its own private data, we'll need to modify > >CMA again, as well as creating a dependency between CMA versions and > >ULPs. > > The CMA must be aware of the format of the data in order to > set and extract the IP addressing information. SDP and the > new CMA format locate these in different areas of the private > data. The CMA only defines the SDP hello header, and > restricts its definition to the location of the IP addresses, > source port, and version information. > > If a ULP wants to define their own private data format and move > the locations of any of those fields, then yes, the CMA would > need to be changed again. But I don't see how any API changes > can prevent this, since the CMA must be able to extract the data > on the remote side. Now that the IB spec is going to have a section for how to support IP addressing in CM MADs, there shouldn't be any need for a ULP to duplicate that functionality. SDP is a special case because it predates the IP addressing extension to the CM protocol. - Fab From mst at mellanox.co.il Tue Dec 13 12:37:58 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 13 Dec 2005 22:37:58 +0200 Subject: [openib-general] Re: [PATCH] sdp: replace ip_dev_find with dev_base scan (was Re: ip_dev_find resolution?) In-Reply-To: References: Message-ID: <20051213203758.GA6715@mellanox.co.il> Quoting Sean Hefty : > >As a reminder, what we are trying to do is handle the loopback case, > >where IP route resolution simply gives us back the loopback device, but > >we want to use the IB loopback: either external if source/destination > >are specified, or external if not. > > Can you look over the CMA/ib_addr code and see if what is done meets > your needs > (minus support for IPv6)? Will do. > I've never tested the CMA using an address of > 127.0.0.1, so I'm not sure what it would do in that case. I'm also not > sure if it makes sense for the CMA to handle that case... At least for SDP it's important: people are used to being able to specify 127.0.0.1 and get a loopback connection. And in some cases (zcopy), you actually can get good performance out of it. > I will review this patch early next week. -- MST From mst at mellanox.co.il Tue Dec 13 12:44:01 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 13 Dec 2005 22:44:01 +0200 Subject: [openib-general] Re: [PATCH] [CMA] support for SDP + standard protocol In-Reply-To: References: Message-ID: <20051213204401.GB6715@mellanox.co.il> Quoting Sean Hefty : > Subject: [PATCH] [CMA] support for SDP + standard protocol > > The following patch updates the CMA to support the IB socket-based > protocol standard and SDP's private data format. > > The CMA now defines RDMA "port spaces". RDMA identifiers are associated > with a user-specified port space at creation time. > > Please respond with any comments on the approach. Note that these > changes have not been pushed up to userspace yet. > > Signed-off-by: Sean Hefty OK, I started looking at converting SDP to CMA. One thing I'm a bit confused about: do I do my own QP transitions on the passive side?
-- MST From ardavis at ichips.intel.com Tue Dec 13 13:55:17 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Tue, 13 Dec 2005 13:55:17 -0800 Subject: [openib-general] [PATCH][uDAPL] openib_cma provider update In-Reply-To: References: Message-ID: <439F4345.6070304@ichips.intel.com> Arlin Davis wrote: >James, > >I modified the IP address lookup during the open to take either a network name, network address, or >device name. This will make the dat.conf setup a little easier and more flexible. I updated the >README, and /doc/dat.conf with details. > >Thanks, > >-arlin > >Signed-off-by: Arlin Davis > > > James, Did you get a chance to look at this patch? -arlin From sean.hefty at intel.com Tue Dec 13 14:21:34 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 13 Dec 2005 14:21:34 -0800 Subject: [openib-general] Re: [PATCH] sdp: replace ip_dev_find with dev_base scan (was Re: ip_dev_find resolution?) In-Reply-To: <20051213203758.GA6715@mellanox.co.il> Message-ID: >> I've never tested the CMA using an address of >> 127.0.0.1, so I'm not sure what it would do in that case. I'm also not >> sure if it makes sense for the CMA to handle that case... > >At least for SDP it's important: people are used to being able >to specify 127.0.0.1 and get a loopback connection. >And in some cases (zcopy), you actually can get good performance out of it. I agree that this should be supported from the user's perspective, just not sure if the CMA should perform this functionality. A higher level ULP could map 127.0.0.1 to a specific IP address before calling the CMA, but I'm not sure that's any better. From the CMA's perspective, 127.0.0.1 could just as easily map to an iWarp device as an Infiniband device. - Sean From jlentini at netapp.com Tue Dec 13 14:31:26 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 13 Dec 2005 17:31:26 -0500 (EST) Subject: [openib-general] [PATCH][uDAPL] openib_cma provider update In-Reply-To: <439F4345.6070304@ichips.intel.com> Message-ID: On Tue, 13 Dec 2005, Arlin Davis wrote: > Arlin Davis wrote: > > >James, > > > >I modified the IP address lookup during the open to take either a network name, network address, or > >device name. This will make the dat.conf setup a little easier and more flexible. I updated the > >README, and /doc/dat.conf with details. > > > >Thanks, > > > >-arlin > > > >Signed-off-by: Arlin Davis > > > > > > > James, > > Did you get a chance to look at this patch? > > -arlin I haven't had a chance yet. I'm traveling this week, so my connectivity is sporadic. I'll be able to review it by Thursday at the latest. From sean.hefty at intel.com Tue Dec 13 14:31:47 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 13 Dec 2005 14:31:47 -0800 Subject: [openib-general] RE: [PATCH] [CMA] support for SDP + standard protocol In-Reply-To: <20051213204401.GB6715@mellanox.co.il> Message-ID: >OK, I started looking at converting SDP to CMA. >One thing I'm a bit confused about: do I do >my own QP transitions on the passive side? The CMA should perform the QP transitions on both sides. The main difference between the SDP and other users of the CMA is that SDP passes in the start of the SDP hello header, and owns setting any information not related to the IP addressing, such as the SDP version, MaxAdverts, etc. The CMA will fill in the IP version, IP addresses, and local port. - Sean From mst at mellanox.co.il Tue Dec 13 14:36:22 2005 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Wed, 14 Dec 2005 00:36:22 +0200 Subject: [openib-general] Re: [PATCH] sdp: replace ip_dev_find with dev_base scan (was Re: ip_dev_find resolution?) In-Reply-To: References: Message-ID: <20051213223622.GA7173@mellanox.co.il> Quoting Sean Hefty : > A higher level ULP could map 127.0.0.1 to a specific IP address before > calling the CMA, but I'm not sure that's any better. Ugh. I really would like to hide all the IPv4/IPv6 etc from ULPs. > From the CMA's perspective, 127.0.0.1 could just as easily > map to an iWarp device as an Infiniband device. Which device to select is a difficult problem. I think we might be able to just punt on this for now, selecting an arbitrary device of an appropriate type that happens to be up. My hope is that in the long run, this can be viewed as a special case in the general path selection/multipathing problem. By the way, CMA seems to happily take bits out of the hardware address and assume that these include the gid, pkey, etc. Shouldn't it check the device type before doing this? -- MST From mst at mellanox.co.il Tue Dec 13 14:42:02 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Dec 2005 00:42:02 +0200 Subject: [openib-general] Re: [PATCH] [CMA] support for SDP + standard protocol In-Reply-To: References: Message-ID: <20051213224202.GB7173@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: [PATCH] [CMA] support for SDP + standard protocol > > >OK, I started looking at converting SDP to CMA. > >One thing I'm a bit confused about: do I do > >my own QP transitions on the passive side? > > The CMA should perform the QP transitions on both sides. > > The main difference between the SDP and other users of the CMA is that > SDP > passes in the start of the SDP hello header, and owns setting any > information > not related to the IP addressing, such as the SDP version, MaxAdverts, > etc. The > CMA will fill in the IP version, IP addresses, and local port. > > - Sean > What confuses me is how do I handle creation of multiple QPs when multiple clients want to connect to a specific port on a server. cma id seems to only include one qp: do I disconnect it from qp somehow after connection is set up? -- MST From rpandit at silverstorm.com Tue Dec 13 14:59:13 2005 From: rpandit at silverstorm.com (Ranjit Pandit) Date: Tue, 13 Dec 2005 14:59:13 -0800 Subject: [openib-general] Re: [PATCH] [CMA] support for SDP + standard protocol In-Reply-To: <20051213224202.GB7173@mellanox.co.il> References: <20051213224202.GB7173@mellanox.co.il> Message-ID: <96f8e60e0512131459m76301fb4gae24954921a3e388@mail.gmail.com> On 12/13/05, Michael S. Tsirkin wrote: > > What confuses me is how do I handle creation of multiple QPs > when multiple clients want to connect to a specific port on a server. > cma id seems to only include one qp: do I disconnect it from qp > somehow after connection is set up? RDS has a similar requirement - it creates one listener to which all clients connect. I'm also trying to figure out whether to use CM or CMA for RDS. I would think that rdma_accept() would create a separate cma_id (and its associated qp) which will then become the actual passive side of the connection.
Ranjit > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From tom at opengridcomputing.com Tue Dec 13 15:07:24 2005 From: tom at opengridcomputing.com (Tom Tucker) Date: Tue, 13 Dec 2005 17:07:24 -0600 Subject: [openib-general] dev_remove in the CMA Message-ID: <1134515244.3764.9.camel@trinity.austin.ammasso.com> Sean: I don't understand the dev_remove usage in the rdma_cm_id. It looks to me like, if the user calls rdma_resolve_addr but never calls rdma_resolve_route, the device cannot be removed. Is this the intended behavior? Is the goal to prevent the user from removing the device if the client is in a callback? If so, can't we just increment and decrement in the cma_notify_user function? I guess I just don't understand... Thanks, Tom From rpandit at silverstorm.com Tue Dec 13 15:50:22 2005 From: rpandit at silverstorm.com (Ranjit Pandit) Date: Tue, 13 Dec 2005 15:50:22 -0800 Subject: [openib-general] Re: [PATCH] [CMA] support for SDP + standard protocol In-Reply-To: <96f8e60e0512131459m76301fb4gae24954921a3e388@mail.gmail.com> References: <20051213224202.GB7173@mellanox.co.il> <96f8e60e0512131459m76301fb4gae24954921a3e388@mail.gmail.com> Message-ID: <96f8e60e0512131550m4cdc9081h9b00883e4cd572e8@mail.gmail.com> Looks like the CMA does create a new cm_id on every connect request. cma_req_handler() calls cma_new_id() and passes the new id to the connect request callback. On 12/13/05, Ranjit Pandit wrote: > On 12/13/05, Michael S. Tsirkin wrote: > > > > What confuses me is how do I handle creation of multiple QPs > > when multiple clients want to connect to a specific port on a server. > > cma id seems to only include one qp: do I disconnect it from qp > > somehow after connection is set up? > > RDS has a similar requirement - it creates one listener to which all > clients connect. > I'm also trying to figure out whether to use CM or CMA for RDS. > > I would think that rdma_accept() would create a separate cma_id (and > its associated qp) which will then become the actual passive side of > the connection. > > Ranjit > > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > From abeyn at datadirectnet.com Tue Dec 13 19:07:53 2005 From: abeyn at datadirectnet.com (Alexander Beyn) Date: Tue, 13 Dec 2005 19:07:53 -0800 Subject: [openib-general] [PATCH] ibsrpdm: use the proper HCA and port with non-default umad device Message-ID: <439F8C89.7040507@datadirectnet.com> In srptools-0.0.2, ibsrpdm gets the LID of the first port of the first HCA to do discovery. This means ibsrpdm can't find SRP targets connected to other ports, even if the proper umad device is passed with the -d option. With the following patch, ibsrpdm uses the HCA and port associated with the umad device to get the LID. It was tested with 2 dual-port HCAs directly connected to our Infiniband array, properly finding SRP targets on all four ports.
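The core of the lookup in the patch below, condensed into a rough standalone sketch (the libsysfs calls are the ones the patch itself uses; error reporting and the analogous read of the "port" attribute are trimmed for brevity):

#define _GNU_SOURCE
#include <stdio.h>
#include <sysfs/libsysfs.h>

/* map a umad device name (e.g. "umad0") to the name of its HCA via
 * /sys/class/infiniband_mad/<umad>/ibdev; 'ibdev' must hold 16 bytes */
static int umad_to_ibdev(const char *umad_name, char ibdev[16])
{
	struct sysfs_class_device *cdev;
	struct sysfs_attribute *attr;
	int ret = -1;

	cdev = sysfs_open_class_device("infiniband_mad", (char *) umad_name);
	if (!cdev)
		return -1;

	attr = sysfs_get_classdev_attr(cdev, "ibdev");
	if (attr && !sysfs_read_attribute(attr)) {
		/* attribute value is the HCA name, e.g. "mthca0" */
		sscanf(attr->value, "%15s", ibdev);
		ret = 0;
	}
	sysfs_close_class_device(cdev);
	return ret;
}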
Alexander Beyn DataDirect Networks --- srp-dm.c.orig 2005-12-13 15:32:52.000000000 -0800 +++ srp-dm.c 2005-12-13 18:22:10.000000000 -0800 @@ -44,6 +44,7 @@ static const uint8_t topspin_oui[3] = { 0x00, 0x05, 0xad }; static char *umad_dev = "/dev/infiniband/umad0"; +static char *port_sysfs_path; static int timeout_ms = 2500; static uint16_t sm_lid; static uint32_t tid = 1; @@ -77,6 +78,47 @@ fprintf(stderr, "Usage: %s [-gGvc] [-d ]\n", argv0); } +int setup_port_sysfs_path(void) { + char path[256]; + char ibport[16]; + char ibdev[16]; + char *umad_dev_name; + struct sysfs_class_device *umad_sysfs_dev; + struct sysfs_attribute *umad_attr; + + if (sysfs_get_mnt_path(path, sizeof path)) { + fprintf(stderr, "Couldn't find sysfs mount.\n"); + return -1; + } + if((umad_dev_name = rindex(umad_dev, '/'))) { + umad_dev_name++; + } + umad_sysfs_dev = sysfs_open_class_device("infiniband_mad", + umad_dev_name); + if(!umad_sysfs_dev) { + fprintf(stderr, "Couldn't open umad sysfs entry named: %s\n", + umad_dev_name); + return -1; + } + umad_attr = sysfs_get_classdev_attr(umad_sysfs_dev, "ibdev"); + if(sysfs_read_attribute(umad_attr)) { + fprintf(stderr, "Couldn't read ibdev attribute.\n"); + return -1; + } + sscanf(umad_attr->value, "%15s", ibdev); + + umad_attr = sysfs_get_classdev_attr(umad_sysfs_dev, "port"); + if(sysfs_read_attribute(umad_attr)) { + fprintf(stderr, "Couldn't read port attribute.\n"); + return -1; + } + sscanf(umad_attr->value, "%15s", ibport); + + asprintf(&port_sysfs_path, "%s/class/infiniband/%s/ports/%s", + path, ibdev, ibport); + return 0; +} + int create_agent(int fd, uint32_t agent[2]) { struct ib_user_mad_reg_req req; @@ -196,7 +238,6 @@ struct ib_user_mad in_mad, out_mad; struct srp_dm_mad *out_dm_mad, *in_dm_mad; struct srp_dm_class_port_info *cpi; - char path[256]; char val[64]; char *name; int i; @@ -208,12 +249,7 @@ cpi = (void *) out_dm_mad->data; - if (sysfs_get_mnt_path(path, sizeof path)) { - fprintf(stderr, "Couldn't find sysfs mount.\n"); - return -1; - } - - asprintf(&name, "%s/class/infiniband/mthca0/ports/1/lid", path); + asprintf(&name, "%s/lid", port_sysfs_path); if (sysfs_read_attribute_value(name, val, sizeof val)) { fprintf(stderr, "Couldn't read LID at %s\n", name); @@ -222,7 +258,7 @@ cpi->trap_lid = htons(strtol(val, NULL, 0)); - asprintf(&name, "%s/class/infiniband/mthca0/ports/1/gids/0", path); + asprintf(&name, "%s/gids/0", port_sysfs_path); if (sysfs_read_attribute_value(name, val, sizeof val)) { fprintf(stderr, "Couldn't read GID at %s\n", name); @@ -473,19 +509,13 @@ struct srp_dm_rmpp_sa_mad *out_sa_mad, *in_sa_mad; struct srp_sa_port_info_rec *port_info; ssize_t len; - char path[256]; char val[64]; char *name; int pn; int size; int i; - if (sysfs_get_mnt_path(path, sizeof path)) { - fprintf(stderr, "Couldn't find sysfs mount.\n"); - return -1; - } - - asprintf(&name, "%s/class/infiniband/mthca0/ports/1/sm_lid", path); + asprintf(&name, "%s/sm_lid", port_sysfs_path); if (sysfs_read_attribute_value(name, val, sizeof val)) { fprintf(stderr, "Couldn't read LID at %s\n", name); @@ -585,6 +615,8 @@ perror("open"); return 1; } + if(setup_port_sysfs_path()) + return 1; if (create_agent(fd, agent)) return 1; From bardov at gmail.com Tue Dec 13 22:43:26 2005 From: bardov at gmail.com (Dan Bar Dov) Date: Wed, 14 Dec 2005 08:43:26 +0200 Subject: [openib-general] [PATCH] [CMA] support for SDP + standard protocol In-Reply-To: <000001c60021$e4215bb0$6401a8c0@infiniconsys.com> References: <000001c60021$e4215bb0$6401a8c0@infiniconsys.com> Message-ID: Wouldn't it 
make sense then, to also modify the SDP spec? After all, the change in openIB would modify both client & server sides. Of course existing stacks would have to be changed if they want interoperability, but I think it could fly. Dan On 12/13/05, Fab Tillier wrote: > > From: Sean Hefty [mailto:sean.hefty at intel.com] > > Sent: Tuesday, December 13, 2005 10:39 AM > > > > >I understand that SDP needs address translation services as well as > > >its own private data. However, I think it could be implemented using > > >optional API functions that allow the ULP to modify the private data > > >per its need, rather than adding ULP knowledge into CMA. > > >As an example, if the ISER spec is modified, or some new ULP > > >is implemented that needs its own private data, we'll need to modify > > >CMA again, as well as creating a dependency between CMA versions and > > >ULPs. > > > > The CMA must be aware of the format of the data in order to > > set and extract the IP addressing information. SDP and the > > new CMA format locate these in different areas of the private > > data. The CMA only defines the SDP hello header, and > > restricts its definition to the location of the IP addresses, > > source port, and version information. > > > > If a ULP wants to define their own private data format and move > > the locations of any of those fields, then yes, the CMA would > > need to be changed again. But I don't see how any API changes > > can prevent this, since the CMA must be able to extract the data > > on the remote side. > > Now that the IB spec is going to have a section for how to support IP addressing > in CM MADs, there shouldn't be any need for a ULP to duplicate that > functionality. SDP is a special case because it predates the IP addressing > extension to the CM protocol. > > - Fab > > > From ogerlitz at voltaire.com Wed Dec 14 01:56:22 2005 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 14 Dec 2005 11:56:22 +0200 Subject: [openib-general] Re: [PATCH] [CMA] support for SDP + standard protocol In-Reply-To: <96f8e60e0512131550m4cdc9081h9b00883e4cd572e8@mail.gmail.com> References: <20051213224202.GB7173@mellanox.co.il> <96f8e60e0512131459m76301fb4gae24954921a3e388@mail.gmail.com> <96f8e60e0512131550m4cdc9081h9b00883e4cd572e8@mail.gmail.com> Message-ID: <439FEC46.9080904@voltaire.com> Ranjit Pandit wrote: > Looks like the CMA does create a new cm_id on every connect request. > cma_req_handler() calls cma_new_id() and passes the new id to the > connect request callback. Indeed, to see how it is used you can follow the passive side flow in gen2/utils/src/linux-kernel/infiniband/util/cmatose.
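A rough sketch of that passive-side pattern, along the lines described next (handler and helper names here are hypothetical, and the exact rdma_cm signatures of this period may differ slightly):

struct my_listen_ctx {
	struct rdma_cm_id *listen_id;	/* saved when rdma_listen() was issued */
};

/* The CMA invokes this with a *new* cma id for each connect request
 * (created by cma_new_id() above); the listening id stays bound for
 * further connect requests and is recovered through the context,
 * which the new id inherits from the listener. */
static int my_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
{
	struct my_listen_ctx *ctx = id->context;

	if (event->event == RDMA_CM_EVENT_CONNECT_REQUEST)
		return my_accept_connection(id, ctx);	/* hypothetical helper:
							 * set up qp, rdma_accept() */
	return 0;
}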
Having both the cma ids of the listener and the new connection at hand within the connection request callback is easy: you add the listener id as a field in the context struct which you associate with it, so you get it as the callback's second param. From ogerlitz at voltaire.com Wed Dec 14 02:17:55 2005 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 14 Dec 2005 12:17:55 +0200 Subject: [openib-general] CMA/RDS In-Reply-To: <96f8e60e0512131459m76301fb4gae24954921a3e388@mail.gmail.com> References: <20051213224202.GB7173@mellanox.co.il> <96f8e60e0512131459m76301fb4gae24954921a3e388@mail.gmail.com> Message-ID: <439FF153.8030203@voltaire.com> Ranjit Pandit wrote: > I'm also trying to figure out whether to use CM or CMA for RDS Please note that the CMA does much more than making your CM interaction easier, eg handle IP to RDMA (eg IB) address translation, being a hotplug client and generally providing an IP based RDMA transport neutral connection management service. Combining this with the ib_verbs api being also RDMA transport neutral enables an app/ulp to be such. Eventually, most (all) the middlewares/ULPs are to be coded over the CMA and ib_verbs, eg uDAPL, iSER, NFSoRDMA, Lustre, SDP, SRP(?). The iSER initiator code over the CMA is under work and an initial drop is committed to the openib svn. Or. From yael at mellanox.co.il Wed Dec 14 02:41:42 2005 From: yael at mellanox.co.il (Yael Kalka) Date: 14 Dec 2005 12:41:42 +0200 Subject: [openib-general] [PATCH] Opensm - support arbitrary paths for driver installation Message-ID: <5z64psgduh.fsf@mtl066.yok.mtl.com> Hi Hal, Currently, if the user level driver installation is not in /usr/local/ dir - configure of opensm fails. The following patch enables support for arbitrary paths of user level driver installation. The path can be given using the --with-uldrv flag.
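For example, assuming a hypothetical installation prefix of /usr/local/ibgd, the build could then be configured with something like:

./configure --with-osmv=openib --with-uldrv=/usr/local/ibgd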
Thanks, Yael Signed-off-by: Yael Kalka Index: osmtest/Makefile.am =================================================================== --- osmtest/Makefile.am (revision 4412) +++ osmtest/Makefile.am (working copy) @@ -5,10 +5,7 @@ else DBGFLAGS = -g -O2 endif -INCLUDES = -I$(srcdir)/include \ - -I$(srcdir)/../include \ - -I$(srcdir)/../../libibcommon/include/infiniband \ - -I$(srcdir)/../../libibumad/include/infiniband +INCLUDES = -I$(srcdir)/include $(OSMV_INCLUDES) bin_PROGRAMS = osmtest osmtest_SOURCES = main.c osmtest.c osmt_service.c osmt_slvl_vl_arb.c \ Index: opensm/Makefile.am =================================================================== --- opensm/Makefile.am (revision 4412) +++ opensm/Makefile.am (working copy) @@ -1,7 +1,5 @@ -INCLUDES = -I$(srcdir)/../include \ - -I$(srcdir)/../../libibcommon/include/infiniband \ - -I$(srcdir)/../../libibumad/include/infiniband +INCLUDES = $(OSMV_INCLUDES) lib_LTLIBRARIES = libopensm.la Index: config/osmvsel.m4 =================================================================== --- config/osmvsel.m4 (revision 4412) +++ config/osmvsel.m4 (working copy) @@ -13,21 +13,39 @@ AC_DEFUN([OPENIB_APP_OSMV_SEL], [ dnl Define a way for the user to provide the osm vendor type AC_ARG_WITH(osmv, -[ --with-osmv= define the osm vendor type], +[ --with-osmv= define the osm vendor type to build], AC_MSG_NOTICE(Using OSM Vendor Type:$with_osmv), with_osmv="openib") +dnl Define a way for the user to provide the path to the driver installation +AC_ARG_WITH(uldrv, +[ --with-uldrv= define the dir where the user level driver is installed], +AC_MSG_NOTICE(Using user level installation prefix:$with_uldrv), +with_uldrv="") + dnl Define a way for the user to provide the path to the simulator installation AC_ARG_WITH(sim, [ --with-sim= define the simulator prefix for building sim vendor (/usr)], AC_MSG_NOTICE(Using Simulator from:$with_sim), with_sim="/usr") +dnl Should we use lib64 or lib +if test "$(uname -m)" = "x86_64"; then + osmv_lib_type="lib64" +else + osmv_lib_type="lib" +fi + dnl based on the with_osmv we can try the vendor flag if test $with_osmv = "openib"; then OSMV_CFLAGS="-DOSM_VENDOR_INTF_OPENIB" OSMV_INCLUDES="-I\$(srcdir)/../include -I\$(srcdir)/../../libibcommon/include/infiniband -I\$(srcdir)/../../libibumad/include/infiniband" + if test "x$with_uldrv" = "x"; then OSMV_LDADD="-libumad" + else + OSMV_INCLUDES="-I$with_uldrv/include $OSMV_INCLUDES" + OSMV_LDADD="-L$with_uldrv/$osmv_lib_type -libumad" + fi elif test $with_osmv = "sim" ; then OSMV_CFLAGS="-DOSM_VENDOR_INTF_SIM" OSMV_INCLUDES="-I$with_sim/include -I\$(srcdir)/../include" @@ -90,8 +108,11 @@ if test "$disable_libcheck" != "yes"; th dnl based on the with_osmv we can try the vendor flag if test $with_osmv = "openib"; then + osmv_save_ldflags=$LDFALGS + LDFLAGS="$LDFLAGS $OSMV_LDADD" AC_CHECK_LIB(ibumad, umad_init, [], AC_MSG_ERROR([umad_init() not found. libosmvendor of type openib requires libibumad.])) + LD_FLAGS=$osmv_save_ldflags elif test $with_osmv = "sim" ; then LDFLAGS="$LDFLAGS -L$with_sim/lib" AC_CHECK_FILE([$with_sim/lib/libibmscli.a], [], From mst at mellanox.co.il Wed Dec 14 04:48:40 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Dec 2005 14:48:40 +0200 Subject: [openib-general] ipoib: question Message-ID: <20051214124840.GN14870@mellanox.co.il> Roland, where exactly does the following math come from? 
static inline struct ipoib_neigh **to_ipoib_neigh(struct neighbour *neigh) { return (struct ipoib_neigh **) (neigh->ha + 24 - (offsetof(struct neighbour, ha) & 4)); } 1. What does & 4 do here? 2. Why are we subtracting a function of ha offset? 4. What is 24? Is it related to INFINIBAND_ALEN? Thanks, -- MST From halr at voltaire.com Wed Dec 14 08:09:04 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Dec 2005 11:09:04 -0500 Subject: [openib-general] Re: [PATCH] Opensm - support arbitrary paths for driver installation In-Reply-To: <5z64psgduh.fsf@mtl066.yok.mtl.com> References: <5z64psgduh.fsf@mtl066.yok.mtl.com> Message-ID: <1134576541.26766.7669.camel@hal.voltaire.com> Hi Yael, On Wed, 2005-12-14 at 05:41, Yael Kalka wrote: > Hi Hal, > > Currently, if the user level driver installation is not in /usr/local/ > dir - configure of opensm fails. > The following patch enables support for arbitrary paths of user level > driver installation. The path can be given using the --with-uldrv flag. Thanks. Applied. A couple of comments below. -- Hal
> Index: config/osmvsel.m4 > =================================================================== > --- config/osmvsel.m4 (revision 4412) > +++ config/osmvsel.m4 (working copy) > @@ -13,21 +13,39 @@ AC_DEFUN([OPENIB_APP_OSMV_SEL], [ > > dnl Define a way for the user to provide the osm vendor type > AC_ARG_WITH(osmv, > -[ --with-osmv= define the osm vendor type], > +[ --with-osmv= define the osm vendor type to build], > AC_MSG_NOTICE(Using OSM Vendor Type:$with_osmv), > with_osmv="openib") > > +dnl Define a way for the user to provide the path to the driver installation > +AC_ARG_WITH(uldrv, > +[ --with-uldrv= define the dir where the user level driver is installed], > +AC_MSG_NOTICE(Using user level installation prefix:$with_uldrv), > +with_uldrv="") > + > dnl Define a way for the user to provide the path to the simulator installation > AC_ARG_WITH(sim, > [ --with-sim= define the simulator prefix for building sim vendor (/usr)], > AC_MSG_NOTICE(Using Simulator from:$with_sim), > with_sim="/usr") > > +dnl Should we use lib64 or lib > +if test "$(uname -m)" = "x86_64"; then > + osmv_lib_type="lib64" > +else > + osmv_lib_type="lib" > +fi > + > dnl based on the with_osmv we can try the vendor flag > if test $with_osmv = "openib"; then > OSMV_CFLAGS="-DOSM_VENDOR_INTF_OPENIB" > OSMV_INCLUDES="-I\$(srcdir)/../include -I\$(srcdir)/../../libibcommon/include/infiniband -I\$(srcdir)/../../libibumad/include/infiniband" > + if test "x$with_uldrv" = "x"; then > OSMV_LDADD="-libumad" > + else > + OSMV_INCLUDES="-I$with_uldrv/include $OSMV_INCLUDES" > + OSMV_LDADD="-L$with_uldrv/$osmv_lib_type -libumad" > + fi > elif test $with_osmv = "sim" ; then > OSMV_CFLAGS="-DOSM_VENDOR_INTF_SIM" > OSMV_INCLUDES="-I$with_sim/include -I\$(srcdir)/../include" > @@ -90,8 +108,11 @@ if test "$disable_libcheck" != "yes"; th > > dnl based on the with_osmv we can try the vendor flag > if test $with_osmv = "openib"; then > + osmv_save_ldflags=$LDFALGS ^^^^^^^ LDFLAGS (in some other places too) > + LDFLAGS="$LDFLAGS $OSMV_LDADD" > AC_CHECK_LIB(ibumad, umad_init, [], > AC_MSG_ERROR([umad_init() not found. libosmvendor of type openib requires libibumad.])) > + LD_FLAGS=$osmv_save_ldflags > elif test $with_osmv = "sim" ; then > LDFLAGS="$LDFLAGS -L$with_sim/lib" > AC_CHECK_FILE([$with_sim/lib/libibmscli.a], [], From iod00d at hp.com Wed Dec 14 09:14:52 2005 From: iod00d at hp.com (Grant Grundler) Date: Wed, 14 Dec 2005 09:14:52 -0800 Subject: [openib-general] Re: [openib-commits] r4453 - trunk/contrib/mellanox/gen2/src/userspace/perftest In-Reply-To: <20051214095726.682B32283EE@openib.ca.sandia.gov> References: <20051214095726.682B32283EE@openib.ca.sandia.gov> Message-ID: <20051214171452.GA26274@esmail.cup.hp.com> On Wed, Dec 14, 2005 at 01:57:26AM -0800, sagir at openib.org wrote: > Author: sagir > Date: 2005-12-14 01:57:24 -0800 (Wed, 14 Dec 2005) > New Revision: 4453 > > Modified: > trunk/contrib/mellanox/gen2/src/userspace/perftest/rdma_lat.c Can someone from mellanox explain why mainline src/userspace is cloned under contrib/mellanox? > Log: > mtu per device You guys are certainly welcome to add stuff to contrib/mellanox. I just would like to be able to explain to HP management why there are two versions of rmda_lat.c. thanks, grant From eitan at mellanox.co.il Wed Dec 14 11:25:23 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 14 Dec 2005 21:25:23 +0200 Subject: [openib-general] Next workshop dates? 
Please respond with your preferences In-Reply-To: <6.2.3.4.2.20051208224443.03a16be0@mail-lc.llnl.gov> References: <6.2.3.4.2.20051208224443.03a16be0@mail-lc.llnl.gov> Message-ID: <43A071A3.30500@mellanox.co.il> Hi, I would like to propose the following agenda topics: Core Enhancements: QoS - directions for integration and support by OpenIB stack Partitions - recognize areas needing enhancements Multicast, Services and InformInfo Registrations: reference counting and re-registrations - implementation plan/API Diagnostics: Describe/discuss new diagnostic tools feature. Eitan From arlin.r.davis at intel.com Wed Dec 14 11:56:02 2005 From: arlin.r.davis at intel.com (Arlin Davis) Date: Wed, 14 Dec 2005 11:56:02 -0800 Subject: [openib-general] [PATCH][uDAPL] openib_scm uses incorrect rd_atomic values for modify_qp Message-ID: James, Here is a fix for openib socket cm version. I ran into a problem with the latest verbs qp_modify as a result of incorrect rd_atomic values so I modified to use the values returned from the ibv_query_device() instead of hard coded values. -arlin Signed-off by: Arlin Davis Index: dapl/openib_scm/dapl_ib_qp.c =================================================================== --- dapl/openib_scm/dapl_ib_qp.c (revision 4464) +++ dapl/openib_scm/dapl_ib_qp.c (working copy) @@ -300,10 +300,11 @@ dapls_modify_qp_state ( IN ib_qp_handle_ { struct ibv_qp_attr qp_attr; enum ibv_qp_attr_mask mask = IBV_QP_STATE; - + DAPL_EP *ep_ptr = (DAPL_EP*)qp_handle->qp_context; + dapl_os_memzero((void*)&qp_attr, sizeof(qp_attr)); qp_attr.qp_state = qp_state; - + switch (qp_state) { /* additional attributes with RTR and RTS */ case IBV_QPS_RTR: @@ -318,17 +319,21 @@ dapls_modify_qp_state ( IN ib_qp_handle_ qp_attr.path_mtu = IBV_MTU_1024; qp_attr.dest_qp_num = qp_cm->qpn; qp_attr.rq_psn = 1; - qp_attr.max_dest_rd_atomic = 8; + qp_attr.max_dest_rd_atomic = + ep_ptr->param.ep_attr.max_rdma_read_out; qp_attr.min_rnr_timer = 12; qp_attr.ah_attr.is_global = 0; qp_attr.ah_attr.dlid = qp_cm->lid; qp_attr.ah_attr.sl = 0; qp_attr.ah_attr.src_path_bits = 0; qp_attr.ah_attr.port_num = qp_cm->port; - + dapl_dbg_log (DAPL_DBG_TYPE_EP, - " modify_qp_rtr: qpn %x lid %x port %x\n", - qp_cm->qpn,qp_cm->lid,qp_cm->port ); + " modify_qp_rtr: qpn %x lid %x " + "port %x rd_atomic %d\n", + qp_cm->qpn, qp_cm->lid, qp_cm->port, + qp_attr.max_dest_rd_atomic ); + break; } case IBV_QPS_RTS: @@ -343,9 +348,11 @@ dapls_modify_qp_state ( IN ib_qp_handle_ qp_attr.retry_cnt = 7; qp_attr.rnr_retry = 7; qp_attr.sq_psn = 1; - qp_attr.max_rd_atomic = 8; + qp_attr.max_rd_atomic = + ep_ptr->param.ep_attr.max_rdma_read_out; + dapl_dbg_log (DAPL_DBG_TYPE_EP, - " modify_qp_rts: psn %x or %x\n", + " modify_qp_rts: psn %x rd_atomic %d\n", qp_attr.sq_psn, qp_attr.max_rd_atomic ); break; } From rdreier at cisco.com Wed Dec 14 12:16:45 2005 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 14 Dec 2005 12:16:45 -0800 Subject: [openib-general] Re: ipoib: question In-Reply-To: <20051214124840.GN14870@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 14 Dec 2005 14:48:40 +0200") References: <20051214124840.GN14870@mellanox.co.il> Message-ID: > where exactly does the following math come from? > > static inline struct ipoib_neigh **to_ipoib_neigh(struct neighbour *neigh) > { > return (struct ipoib_neigh **) (neigh->ha + 24 - > (offsetof(struct neighbour, ha) & 4)); > } > > 1. What does & 4 do here? > 2. Why are we subsrctucting a function of ha offset? No #3 again ;) > 4. What is 24? Is it related to INFINIBAND_ALEN? 
Yes, 24 is INFINIBAND_ALEN + 4. Maybe it would be clearer to write it that way. The idea is that we want to get something aligned to 8 bytes. I'd have to check again to be sure but I think that on some architectures, the beginning ha member of struct neighbour is only aligned to 4 bytes, so we should offset by 20 bytes to get to an alignment of 8 and still leave room for the real hardware address. - R. From rdreier at cisco.com Wed Dec 14 12:21:34 2005 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 14 Dec 2005 12:21:34 -0800 Subject: [openib-general] Re: [PATCH] libmthca: fix error handling in mthca_store_qp In-Reply-To: <20051213102255.GX14936@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 13 Dec 2005 12:22:55 +0200") References: <6AB138A2AB8C8E4A98B9C0C3D52670E3D1B80D@mtlexch01.mtl.com> <20051213102255.GX14936@mellanox.co.il> Message-ID: Thanks, good catch. I decided it was clearer to split off the test of refcnt, and only increment refcnt if the allocation succeeds. - R. --- libmthca/src/qp.c (revision 4465) +++ libmthca/src/qp.c (working copy) @@ -875,13 +875,15 @@ int mthca_store_qp(struct mthca_context pthread_mutex_lock(&ctx->qp_table_mutex); - if (!ctx->qp_table[tind].refcnt++) { + if (!ctx->qp_table[tind].refcnt) { ctx->qp_table[tind].table = calloc(ctx->qp_table_mask + 1, sizeof (struct mthca_qp *)); if (!ctx->qp_table[tind].table) { ret = -1; goto out; } + + ++ctx->qp_table[tind].refcnt; } ctx->qp_table[tind].table[qpn & ctx->qp_table_mask] = qp; --- libmthca/ChangeLog (revision 4465) +++ libmthca/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2005-12-14 Roland Dreier + + * src/qp.c (mthca_store_qp): Only increment qp_table ref count if + allocation succeeds. + 2005-11-29 Michael S. Tsirkin * src/qp.c (mthca_arbel_post_send): Add handling for posting long From rdreier at cisco.com Wed Dec 14 12:22:50 2005 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 14 Dec 2005 12:22:50 -0800 Subject: [openib-general] Re: [PATCH] mthca: correct max_rd_atomic handling In-Reply-To: <20051213090919.GW14936@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 13 Dec 2005 11:09:19 +0200") References: <20051213090919.GW14936@mellanox.co.il> Message-ID: What happens if SAE and SRE are turned off and the consumer posts an RDMA read? Does it fail and generate an error completion? I don't think we want it to just stop the QP processing. - R. From rdreier at cisco.com Wed Dec 14 12:28:18 2005 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 14 Dec 2005 12:28:18 -0800 Subject: [openib-general] Re: [PATCH/RFC] change ibv_get_devices() to ibv_get_device_list() In-Reply-To: <20051210214832.GA31057@mellanox.co.il> (Michael S. Tsirkin's message of "Sat, 10 Dec 2005 23:48:32 +0200") References: <20051210214832.GA31057@mellanox.co.il> Message-ID: Michael> This wont work for hotplug: you are saving the device Michael> pointer without opening the device, so it might go away Michael> from under your feet. Good point -- see updated change below. Michael> I wander whether we can come up with an API that helps Michael> people get it right more easily? I guess we could return some opaque cookie or validate pointers using a hash table or something like that. 
--- mvapich-gen2/mpid/ch_gen2/viainit.c (revision 4465) +++ mvapich-gen2/mpid/ch_gen2/viainit.c (working copy) @@ -74,13 +74,21 @@ static void set_malloc_options(void) static void open_hca(void) { - struct dlist *dev_list; struct ibv_device *ib_dev = NULL; +#ifdef GEN2_OLD_DEVICE_LIST_VERB + struct dlist *dev_list; + dev_list = ibv_get_devices(); dlist_start(dev_list); ib_dev = dlist_next(dev_list); +#else + struct ibv_device **dev_list; + + dev_list = ibv_get_device_list(NULL); + ib_dev = dev_list[0]; +#endif if (!ib_dev) { fprintf(stderr, "No IB devices found\n"); @@ -90,6 +98,10 @@ static void open_hca(void) ibv_dev.context = ibv_open_device(ib_dev); +#ifndef GEN2_OLD_DEVICE_LIST_VERB + ibv_free_device_list(dev_list); +#endif + if(!ibv_dev.context) { error_abort_all(GEN_EXIT_ERR, "Error getting HCA context\n"); } From rdreier at cisco.com Wed Dec 14 12:39:43 2005 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 14 Dec 2005 12:39:43 -0800 Subject: [openib-general] Re: [PATCH/RFC] change ibv_get_devices() to ibv_get_device_list() In-Reply-To: (Roland Dreier's message of "Wed, 14 Dec 2005 12:28:18 -0800") References: <20051210214832.GA31057@mellanox.co.il> Message-ID: BTW, I'm going to commit this whole set of changes now... - R. From mst at mellanox.co.il Wed Dec 14 13:25:16 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Dec 2005 23:25:16 +0200 Subject: [openib-general] Re: ipoib: question In-Reply-To: References: Message-ID: <20051214212516.GI17538@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: ipoib: question > > > where exactly does the following math come from? > > > > static inline struct ipoib_neigh **to_ipoib_neigh(struct neighbour *neigh) > > { > > return (struct ipoib_neigh **) (neigh->ha + 24 - > > (offsetof(struct neighbour, ha) & 4)); > > } > The idea is that we want to get something aligned to 8 bytes. Does & 4 do that? -- MST From mst at mellanox.co.il Wed Dec 14 13:28:49 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Dec 2005 23:28:49 +0200 Subject: [openib-general] Re: [PATCH/RFC] change ibv_get_devices() to ibv_get_device_list() In-Reply-To: References: Message-ID: <20051214212848.GJ17538@mellanox.co.il> Quoting r. Roland Dreier : > Michael> I wander whether we can come up with an API that helps > Michael> people get it right more easily? > > I guess we could return some opaque cookie or validate pointers using > a hash table or something like that. Lets just clarify this in the documentation. -- MST From rdreier at cisco.com Wed Dec 14 13:29:17 2005 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 14 Dec 2005 13:29:17 -0800 Subject: [openib-general] Re: ipoib: question In-Reply-To: <20051214212516.GI17538@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 14 Dec 2005 23:25:16 +0200") References: <20051214212516.GI17538@mellanox.co.il> Message-ID: Roland> The idea is that we want to get something aligned to 8 bytes. Michael> Does & 4 do that? I think so -- I'd have to check to be sure, but the idea is that if the offset of ha is 4 mod 8, then we should subtract off 4, and if it's 0 mod 8, then we should subtract off 0. So "& 4" is right, I think. - R. 
From huanwei at cse.ohio-state.edu Wed Dec 14 13:36:38 2005 From: huanwei at cse.ohio-state.edu (wei huang) Date: Wed, 14 Dec 2005 16:36:38 -0500 (EST) Subject: [openib-general] *** glibc detected *** corrupted double-linked list error Message-ID: Hi, We encountered the following error when we call ibv_close_device: *** glibc detected *** corrupted double-linked list: 0x0000000000a54e10 *** Could someone tell us what could be the possible reasons for this error? Thanks! Regards, Wei Huang 774 Dreese Lab, 2015 Neil Ave, Dept. of Computer Science and Engineering Ohio State University OH 43210 Tel: (614)292-8501 From huanwei at cse.ohio-state.edu Wed Dec 14 13:44:52 2005 From: huanwei at cse.ohio-state.edu (wei huang) Date: Wed, 14 Dec 2005 16:44:52 -0500 (EST) Subject: [openib-general] *** glibc detected *** corrupted double-linked list error In-Reply-To: Message-ID: Sorry we forget some detailed information: 1) we use gen2 svn revision 4344 with linux kernel 2.6.14 2) Machine is Opteron. On Wed, 14 Dec 2005, wei huang wrote: > Hi, > > We encountered the following error when we call ibv_close_device: > *** glibc detected *** corrupted double-linked list: 0x0000000000a54e10 *** > > Could someone tell us what could be the possible reasons for this error? > > Thanks! > > Regards, > Wei Huang > > 774 Dreese Lab, 2015 Neil Ave, > Dept. of Computer Science and Engineering > Ohio State University > OH 43210 > Tel: (614)292-8501 > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From mst at mellanox.co.il Wed Dec 14 13:59:15 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Dec 2005 23:59:15 +0200 Subject: [openib-general] Re: ipoib: question In-Reply-To: References: Message-ID: <20051214215915.GA18526@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: ipoib: question > > Roland> The idea is that we want to get something aligned to 8 bytes. > > Michael> Does & 4 do that? > > I think so -- I'd have to check to be sure, but the idea is that if > the offset of ha is 4 mod 8, then we should subtract off 4, and if > it's 0 mod 8, then we should subtract off 0. So "& 4" is right, I think. Oh, I see. Is this better? - return (struct ipoib_neigh **) (neigh->ha + 24 - - (offsetof(struct neighbour, ha) & 4)); + return (void*)neigh + ALIGN(offsetof(struct neighbour, ha) + INFINIBAND_ALEN, x) -- MST From nacc at us.ibm.com Wed Dec 14 15:23:08 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Wed, 14 Dec 2005 15:23:08 -0800 Subject: [openib-general] [PATCH] add LDFLAGS to perftest/Makefile In-Reply-To: <20051214171452.GA26274@esmail.cup.hp.com> References: <20051214095726.682B32283EE@openib.ca.sandia.gov> <20051214171452.GA26274@esmail.cup.hp.com> Message-ID: <20051214232308.GA3369@us.ibm.com> On 14.12.2005 [09:14:52 -0800], Grant Grundler wrote: > On Wed, Dec 14, 2005 at 01:57:26AM -0800, sagir at openib.org wrote: > > Author: sagir > > Date: 2005-12-14 01:57:24 -0800 (Wed, 14 Dec 2005) > > New Revision: 4453 > > > > Modified: > > trunk/contrib/mellanox/gen2/src/userspace/perftest/rdma_lat.c > > Can someone from mellanox explain why mainline src/userspace > is cloned under contrib/mellanox? Is there a reason the perftest/Makefile doesn't use LDFLAGS? 
Specifically, in automating userspace build & test, I put the IB libraries in a temporary directory, and exporting CFLAGS and LDFLAGS works with all other Makefiles (well, the ones I expect to work), but perftest does not seem to pick up my exports. Would something like the following make sense (sorry if a different -p is preferred)? Or does it need to be +=? Description: Add LDFLAGS to the perftest Makefile to allow library directories in non-standard locations to be specified. Signed-off-by: Nishanth Aravamudan --- Makefile 2005-12-14 14:57:04.000000000 -0800 +++ Makefile.ldflags 2005-12-14 14:57:23.000000000 -0800 @@ -2,6 +2,7 @@ TESTS = rdma_lat rdma_bw all: ${TESTS} +LDFLAGS = CFLAGS += -Wall -O2 -g -D_GNU_SOURCE LOADLIBES += -libverbs EXTRA_FILES = get_clock.c From nacc at us.ibm.com Wed Dec 14 18:25:19 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Wed, 14 Dec 2005 18:25:19 -0800 Subject: [openib-general] Re: [PATCH] add LDFLAGS to perftest/Makefile In-Reply-To: <20051214232308.GA3369@us.ibm.com> References: <20051214095726.682B32283EE@openib.ca.sandia.gov> <20051214171452.GA26274@esmail.cup.hp.com> <20051214232308.GA3369@us.ibm.com> Message-ID: <20051215022519.GF11674@us.ibm.com> On 14.12.2005 [15:23:08 -0800], Nishanth Aravamudan wrote: > On 14.12.2005 [09:14:52 -0800], Grant Grundler wrote: > > On Wed, Dec 14, 2005 at 01:57:26AM -0800, sagir at openib.org wrote: > > > Author: sagir > > > Date: 2005-12-14 01:57:24 -0800 (Wed, 14 Dec 2005) > > > New Revision: 4453 > > > > > > Modified: > > > trunk/contrib/mellanox/gen2/src/userspace/perftest/rdma_lat.c > > > > Can someone from mellanox explain why mainline src/userspace > > is cloned under contrib/mellanox? > > Is there a reason the perftest/Makefile doesn't use LDFLAGS? > Specifically, in automating userspace build & test, I put the IB > libraries in a temporary directory, and exporting CFLAGS and LDFLAGS > works with all other Makefiles (well, the ones I expect to work), but > perftest does not seem to pick up my exports. > > Would something like the following make sense (sorry if a different -p > is preferred)? Or does it need to be +=? It does need to be +=... Description: Add LDFLAGS to perftest/Makefile to allow non-standard library location. 
Signed-off-by: Nishanth Aravamudan --- Makefile 2005-12-14 14:57:04.000000000 -0800 +++ Makefile.ldflags 2005-12-14 14:57:23.000000000 -0800 @@ -2,6 +2,7 @@ TESTS = rdma_lat rdma_bw all: ${TESTS} +LDFLAGS += CFLAGS += -Wall -O2 -g -D_GNU_SOURCE LOADLIBES += -libverbs EXTRA_FILES = get_clock.c From nacc at us.ibm.com Wed Dec 14 18:29:04 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Wed, 14 Dec 2005 18:29:04 -0800 Subject: [openib-general] Compilation failure in perftest/rdma_lat (latest svn) Message-ID: <20051215022904.GG11674@us.ibm.com> Hi, ppc32 version of perftest is failing with: gcc -m32 -m32 -I/usr/local/autobench/var/tmp/out/ppc32/include -Wall -O2 -g -D_GNU_SOURCE rdma_lat.c get_clock.c -libverbs -o rdma_lat /usr/local/autobench/var/tmp//ccJld3yB.o(.text+0x6c8): In function `main': /usr/local/autobench/var/tmp/gen2-trunk/userspace/perftest/rdma_lat.c:111: undefined reference to `ibv_get_device_list' collect2: ld returned 1 exit status Thanks, Nish From rdreier at cisco.com Wed Dec 14 19:16:10 2005 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 14 Dec 2005 19:16:10 -0800 Subject: [openib-general] Compilation failure in perftest/rdma_lat (latest svn) In-Reply-To: <20051215022904.GG11674@us.ibm.com> (Nishanth Aravamudan's message of "Wed, 14 Dec 2005 18:29:04 -0800") References: <20051215022904.GG11674@us.ibm.com> Message-ID: > /usr/local/autobench/var/tmp/gen2-trunk/userspace/perftest/rdma_lat.c:111: undefined reference to `ibv_get_device_list' Is your build in sync with the latest tree? ibv_get_device_list was added to libibverbs in the same changeset that changed perftest to use it. - R. From rdreier at cisco.com Wed Dec 14 19:17:36 2005 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 14 Dec 2005 19:17:36 -0800 Subject: [openib-general] *** glibc detected *** corrupted double-linked list error In-Reply-To: (wei huang's message of "Wed, 14 Dec 2005 16:36:38 -0500 (EST)") References: Message-ID: wei> Hi, We encountered the following error when we call wei> ibv_close_device: *** glibc detected *** corrupted wei> double-linked list: 0x0000000000a54e10 *** wei> Could someone tell us what could be the possible reasons for wei> this error? Probably a memory-management bug somewhere. Can you get a traceback from a core dump when this happens? - R. From nacc at us.ibm.com Wed Dec 14 19:26:04 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Wed, 14 Dec 2005 19:26:04 -0800 Subject: [openib-general] Compilation failure in perftest/rdma_lat (latest svn) In-Reply-To: <20051215022904.GG11674@us.ibm.com> References: <20051215022904.GG11674@us.ibm.com> Message-ID: <20051215032604.GI11674@us.ibm.com> On 14.12.2005 [18:29:04 -0800], Nishanth Aravamudan wrote: > Hi, > > ppc32 version of perftest is failing with: > > gcc -m32 -m32 -I/usr/local/autobench/var/tmp/out/ppc32/include -Wall -O2 -g -D_GNU_SOURCE rdma_lat.c get_clock.c -libverbs -o rdma_lat > /usr/local/autobench/var/tmp//ccJld3yB.o(.text+0x6c8): In function `main': > /usr/local/autobench/var/tmp/gen2-trunk/userspace/perftest/rdma_lat.c:111: undefined reference to `ibv_get_device_list' > collect2: ld returned 1 exit status Nevermind, this is is an error on my part. Thanks, Nish From mst at mellanox.co.il Wed Dec 14 22:39:51 2005 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 15 Dec 2005 08:39:51 +0200 Subject: [openib-general] Re: [PATCH] libmthca: fix error handling in mthca_store_qp In-Reply-To: References: Message-ID: <20051215063951.GB26191@mellanox.co.il> Quoting Roland Dreier : > Subject: Re: [PATCH] libmthca: fix error handling in mthca_store_qp > > Thanks, good catch. I decided it was clearer to split off the test of > refcnt, and only increment refcnt if the allocation succeeds. > > - R. > > --- libmthca/src/qp.c (revision 4465) > +++ libmthca/src/qp.c (working copy) > @@ -875,13 +875,15 @@ int mthca_store_qp(struct mthca_context > > pthread_mutex_lock(&ctx->qp_table_mutex); > > - if (!ctx->qp_table[tind].refcnt++) { > + if (!ctx->qp_table[tind].refcnt) { > ctx->qp_table[tind].table = calloc(ctx->qp_table_mask + 1, > sizeof (struct mthca_qp *)); > if (!ctx->qp_table[tind].table) { > ret = -1; > goto out; > } > + > + ++ctx->qp_table[tind].refcnt; > } > > ctx->qp_table[tind].table[qpn & ctx->qp_table_mask] = qp; This does not look right: it seems you are incrementing the counter from 0 to 1, but it never goes to 2. How about this: --- Only increment qp_table ref count if allocation succeeds. Signed-off-by: Michael S. Tsirkin Index: openib/src/userspace/libmthca/src/qp.c =================================================================== --- openib/src/userspace/libmthca/src/qp.c (revision 4466) +++ openib/src/userspace/libmthca/src/qp.c (working copy) @@ -875,7 +875,7 @@ int mthca_store_qp(struct mthca_context pthread_mutex_lock(&ctx->qp_table_mutex); - if (!ctx->qp_table[tind].refcnt++) { + if (!ctx->qp_table[tind].refcnt) { ctx->qp_table[tind].table = calloc(ctx->qp_table_mask + 1, sizeof (struct mthca_qp *)); if (!ctx->qp_table[tind].table) { @@ -884,6 +884,8 @@ int mthca_store_qp(struct mthca_context } } + ++ctx->qp_table[tind].refcnt; + ctx->qp_table[tind].table[qpn & ctx->qp_table_mask] = qp; out: -- MST From mst at mellanox.co.il Wed Dec 14 22:52:41 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 15 Dec 2005 08:52:41 +0200 Subject: [openib-general] Re: [PATCH] add LDFLAGS to perftest/Makefile In-Reply-To: <20051215022519.GF11674@us.ibm.com> References: <20051215022519.GF11674@us.ibm.com> Message-ID: <20051215065241.GE26191@mellanox.co.il> Quoting Nishanth Aravamudan : > > Is there a reason the perftest/Makefile doesn't use LDFLAGS? > > Specifically, in automating userspace build & test, I put the IB > > libraries in a temporary directory, and exporting CFLAGS and LDFLAGS > > works with all other Makefiles (well, the ones I expect to work), but > > perftest does not seem to pick up my exports. I'll fix this. -- MST From mst at mellanox.co.il Wed Dec 14 22:57:24 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 15 Dec 2005 08:57:24 +0200 Subject: [openib-general] Re: [PATCH] add LDFLAGS to perftest/Makefile In-Reply-To: <20051214232308.GA3369@us.ibm.com> References: <20051214232308.GA3369@us.ibm.com> Message-ID: <20051215065723.GA26401@mellanox.co.il> Quoting Nishanth Aravamudan : > Is there a reason the perftest/Makefile doesn't use LDFLAGS? > Specifically, in automating userspace build & test, I put the IB > libraries in a temporary directory, and exporting CFLAGS and LDFLAGS > works with all other Makefiles (well, the ones I expect to work), but > perftest does not seem to pick up my exports. > > Would something like the following make sense (sorry if a different -p > is preferred)? Or does it need to be +=? 
> > Description: Add LDFLAGS to the perftest Makefile to allow library > directories in non-standard locations to be specified. Are you using gnu make? which version? Gnu make should use LDFLAGS automatically: Linking a single object file `N' is made automatically from `N.o' by running the linker (usually called `ld') via the C compiler. The precise command used is `$(CC) $(LDFLAGS) N.o $(LOADLIBES) $(LDLIBS)'. > Signed-off-by: Nishanth Aravamudan > > --- Makefile 2005-12-14 14:57:04.000000000 -0800 > +++ Makefile.ldflags 2005-12-14 14:57:23.000000000 -0800 > @@ -2,6 +2,7 @@ TESTS = rdma_lat rdma_bw > > all: ${TESTS} > > +LDFLAGS += > CFLAGS += -Wall -O2 -g -D_GNU_SOURCE > LOADLIBES += -libverbs > EXTRA_FILES = get_clock.c This really does nothing. Does this patch help you? -- MST From nacc at us.ibm.com Wed Dec 14 23:02:22 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Wed, 14 Dec 2005 23:02:22 -0800 Subject: [openib-general] Re: [PATCH] add LDFLAGS to perftest/Makefile In-Reply-To: <20051215065723.GA26401@mellanox.co.il> References: <20051214232308.GA3369@us.ibm.com> <20051215065723.GA26401@mellanox.co.il> Message-ID: <20051215070222.GK11674@us.ibm.com> On 15.12.2005 [08:57:24 +0200], Michael S. Tsirkin wrote: > Quoting Nishanth Aravamudan : > > Is there a reason the perftest/Makefile doesn't use LDFLAGS? > > Specifically, in automating userspace build & test, I put the IB > > libraries in a temporary directory, and exporting CFLAGS and LDFLAGS > > works with all other Makefiles (well, the ones I expect to work), but > > perftest does not seem to pick up my exports. > > > > Would something like the following make sense (sorry if a different -p > > is preferred)? Or does it need to be +=? > > > > Description: Add LDFLAGS to the perftest Makefile to allow library > > directories in non-standard locations to be specified. > > Are you using gnu make? which version? GNU Make 3.80 on SLES 9 SP2. > Gnu make should use LDFLAGS automatically: > > Linking a single object file > `N' is made automatically from `N.o' by running the linker > (usually called `ld') via the C compiler. The precise command > used is `$(CC) $(LDFLAGS) N.o $(LOADLIBES) $(LDLIBS)'. I thought this would be the case as well, but it didn't seem to work without the Makefile modification. > > Signed-off-by: Nishanth Aravamudan > > > > --- Makefile 2005-12-14 14:57:04.000000000 -0800 > > +++ Makefile.ldflags 2005-12-14 14:57:23.000000000 -0800 > > @@ -2,6 +2,7 @@ TESTS = rdma_lat rdma_bw > > > > all: ${TESTS} > > > > +LDFLAGS += > > CFLAGS += -Wall -O2 -g -D_GNU_SOURCE > > LOADLIBES += -libverbs > > EXTRA_FILES = get_clock.c > > This really does nothing. Does this patch help you? I didn't think it should do anything either, but it did allow the make to work on both ppc32 and ppc64 with LDFLAGS exported in the environment. Without the change, the build would fail as it would not have the appropriate -L flags. Thanks, Nish From mst at mellanox.co.il Wed Dec 14 23:35:12 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 15 Dec 2005 09:35:12 +0200 Subject: [openib-general] Re: [PATCH] add LDFLAGS to perftest/Makefile In-Reply-To: <20051215070222.GK11674@us.ibm.com> References: <20051215070222.GK11674@us.ibm.com> Message-ID: <20051215073512.GA26722@mellanox.co.il> Quoting r. Nishanth Aravamudan : > Subject: Re: [PATCH] add LDFLAGS to perftest/Makefile > > On 15.12.2005 [08:57:24 +0200], Michael S. Tsirkin wrote: > > Quoting Nishanth Aravamudan : > > > Is there a reason the perftest/Makefile doesn't use LDFLAGS? 
> > > Specifically, in automating userspace build & test, I put the IB > > > libraries in a temporary directory, and exporting CFLAGS and LDFLAGS > > > works with all other Makefiles (well, the ones I expect to work), > but > > > perftest does not seem to pick up my exports. > > > > > > Would something like the following make sense (sorry if a different > -p > > > is preferred)? Or does it need to be +=? > > > > > > Description: Add LDFLAGS to the perftest Makefile to allow library > > > directories in non-standard locations to be specified. > > > > Are you using gnu make? which version? > > GNU Make 3.80 on SLES 9 SP2. > > > Gnu make should use LDFLAGS automatically: > > > > Linking a single object file > > `N' is made automatically from `N.o' by running the linker > > (usually called `ld') via the C compiler. The precise command > > used is `$(CC) $(LDFLAGS) N.o $(LOADLIBES) $(LDLIBS)'. > > I thought this would be the case as well, but it didn't seem to work > without the Makefile modification. > > > > Signed-off-by: Nishanth Aravamudan > > > > > > --- Makefile 2005-12-14 14:57:04.000000000 -0800 > > > +++ Makefile.ldflags 2005-12-14 14:57:23.000000000 -0800 > > > @@ -2,6 +2,7 @@ TESTS = rdma_lat rdma_bw > > > > > > all: ${TESTS} > > > > > > +LDFLAGS += > > > CFLAGS += -Wall -O2 -g -D_GNU_SOURCE > > > LOADLIBES += -libverbs > > > EXTRA_FILES = get_clock.c > > > > This really does nothing. Does this patch help you? > > I didn't think it should do anything either, but it did allow the make > to work on both ppc32 and ppc64 with LDFLAGS exported in the > environment. Without the change, the build would fail as it would not > have the appropriate -L flags. Looks like a work around for bug in make. I'll have a look. -- MST From mst at mellanox.co.il Thu Dec 15 00:30:35 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 15 Dec 2005 10:30:35 +0200 Subject: [openib-general] Re: [openib-commits] r4453 - trunk/contrib/mellanox/gen2/src/userspace/perftest In-Reply-To: <20051214171452.GA26274@esmail.cup.hp.com> References: <20051214171452.GA26274@esmail.cup.hp.com> Message-ID: <20051215083035.GF26722@mellanox.co.il> Quoting Grant Grundler : > On Wed, Dec 14, 2005 at 01:57:26AM -0800, sagir at openib.org wrote: > > Author: sagir > > Date: 2005-12-14 01:57:24 -0800 (Wed, 14 Dec 2005) > > New Revision: 4453 > > > > Modified: > > trunk/contrib/mellanox/gen2/src/userspace/perftest/rdma_lat.c > > Can someone from mellanox explain why mainline src/userspace > is cloned under contrib/mellanox? Thats how subversion handles tags: we are tagging mainline approximately weekly. I could put it in some other place - just didnt want to interfere with people. > > Log: > > mtu per device > > You guys are certainly welcome to add stuff to contrib/mellanox. > I just would like to be able to explain to HP management why there > are two versions of rmda_lat.c. > > thanks, > grant Sagi is working on adding performance tests. We are not ready to get community feedback on that, yet. I agree it is probably better to leave rdma_lat.c alone and add new tests under new names. I'll fix that next week. -- MST From mst at mellanox.co.il Thu Dec 15 00:46:47 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 15 Dec 2005 10:46:47 +0200 Subject: [openib-general] [PATCH applied] kill dead code around kmap_atomic Message-ID: <20051215084647.GH26722@mellanox.co.il> kmap_atomic never returns NULL. Signed-off-by: Michael S. 
Tsirkin Index: linux-2.6.14/drivers/infiniband/ulp/ipoib/ipoib.h =================================================================== --- linux-2.6.14.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2005-11-15 21:10:49.000000000 +0200 +++ linux-2.6.14/drivers/infiniband/ulp/ipoib/ipoib.h 2005-12-07 21:23:33.000000000 +0200 @@ -78,6 +78,7 @@ enum { IPOIB_FLAG_SUBINTERFACE = 4, IPOIB_MCAST_RUN = 5, IPOIB_STOP_REAPER = 6, + IPOIB_MCAST_STARTED = 7, IPOIB_MAX_BACKOFF_SECONDS = 16, Index: linux-2.6.14/drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- linux-2.6.14.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2005-12-07 17:42:56.000000000 +0200 +++ linux-2.6.14/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2005-12-08 20:28:02.000000000 +0200 @@ -203,16 +203,20 @@ static int ipoib_mcast_join_finish(struc { struct net_device *dev = mcast->dev; struct ipoib_dev_priv *priv = netdev_priv(dev); + unsigned long flags; int ret; mcast->mcmember = *mcmember; + spin_lock_irqsave(&priv->lock, flags); /* Set the cached Q_Key before we attach if it's the broadcast group */ - if (!memcmp(mcast->mcmember.mgid.raw, priv->dev->broadcast + 4, + if (priv->broadcast && + !memcmp(mcast->mcmember.mgid.raw, priv->dev->broadcast + 4, sizeof (union ib_gid))) { priv->qkey = be32_to_cpu(priv->broadcast->mcmember.qkey); priv->tx_wr.wr.ud.remote_qkey = priv->qkey; } + spin_unlock_irqrestore(&priv->lock, flags); if (!test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)) { if (test_and_set_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) { @@ -582,6 +586,10 @@ int ipoib_mcast_start_thread(struct net_ queue_work(ipoib_workqueue, &priv->mcast_task); up(&mcast_mutex); + spin_lock_irq(&priv->lock); + set_bit(IPOIB_MCAST_STARTED, &priv->flags); + spin_unlock_irq(&priv->lock); + return 0; } @@ -592,6 +600,10 @@ int ipoib_mcast_stop_thread(struct net_d ipoib_dbg_mcast(priv, "stopping multicast thread\n"); + spin_lock_irq(&priv->lock); + clear_bit(IPOIB_MCAST_STARTED, &priv->flags); + spin_unlock_irq(&priv->lock); + down(&mcast_mutex); clear_bit(IPOIB_MCAST_RUN, &priv->flags); cancel_delayed_work(&priv->mcast_task); @@ -674,6 +686,9 @@ void ipoib_mcast_send(struct net_device */ spin_lock(&priv->lock); + if (!test_bit(IPOIB_MCAST_STARTED, &priv->flags)) + goto unlock; + mcast = __ipoib_mcast_find(dev, mgid); if (!mcast) { /* Let's create a new send only group now */ @@ -732,6 +747,7 @@ out: ipoib_send(dev, skb, mcast->ah, IB_MULTICAST_QPN); } +unlock: spin_unlock(&priv->lock); } Index: linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_send.c =================================================================== --- linux-2.6.14.orig/drivers/infiniband/ulp/sdp/sdp_send.c 2005-12-15 13:25:11.000000000 +0200 +++ linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_send.c 2005-12-15 13:25:49.000000000 +0200 @@ -639,11 +639,6 @@ static int sdp_send_data_iocb_src(struct local_irq_save(flags); addr = kmap_atomic(iocb->page_array[pos], KM_IRQ0); - if (!addr) { - result = -ENOMEM; - local_irq_restore(flags); - goto error; - } memcpy(buff->tail, addr + off, len); @@ -711,10 +706,6 @@ static int sdp_send_iocb_buff_write(stru local_irq_save(flags); addr = kmap_atomic(iocb->page_array[counter], KM_IRQ0); - if (!addr) { - local_irq_restore(flags); - break; - } copy = min(PAGE_SIZE - offset, (unsigned long)(buff->end - buff->tail)); Index: linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_recv.c =================================================================== --- 
linux-2.6.14.orig/drivers/infiniband/ulp/sdp/sdp_recv.c 2005-12-15 13:25:11.000000000 +0200 +++ linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_recv.c 2005-12-15 13:25:53.000000000 +0200 @@ -599,8 +599,6 @@ static int sdp_read_buff_iocb(struct sdp local_irq_save(flags); addr = kmap_atomic(iocb->page_array[counter], KM_IRQ0); - if (!addr) - break; copy = min(PAGE_SIZE - offset, (unsigned long)(buff->tail - buff->data)); -- MST From mst at mellanox.co.il Thu Dec 15 00:49:45 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 15 Dec 2005 10:49:45 +0200 Subject: [openib-general] Re: [PATCH applied] kill dead code around kmap_atomic In-Reply-To: <20051215084647.GH26722@mellanox.co.il> References: <20051215084647.GH26722@mellanox.co.il> Message-ID: <20051215084945.GI26722@mellanox.co.il> Quoting r. Michael S. Tsirkin : > Subject: [PATCH applied] kill dead code around kmap_atomic > > kmap_atomic never returns NULL. > > Signed-off-by: Michael S. Tsirkin Sorry, wrong patch. Here it is: Index: linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_send.c =================================================================== --- linux-2.6.14.orig/drivers/infiniband/ulp/sdp/sdp_send.c 2005-12-15 13:25:11.000000000 +0200 +++ linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_send.c 2005-12-15 13:25:49.000000000 +0200 @@ -639,11 +639,6 @@ static int sdp_send_data_iocb_src(struct local_irq_save(flags); addr = kmap_atomic(iocb->page_array[pos], KM_IRQ0); - if (!addr) { - result = -ENOMEM; - local_irq_restore(flags); - goto error; - } memcpy(buff->tail, addr + off, len); @@ -711,10 +706,6 @@ static int sdp_send_iocb_buff_write(stru local_irq_save(flags); addr = kmap_atomic(iocb->page_array[counter], KM_IRQ0); - if (!addr) { - local_irq_restore(flags); - break; - } copy = min(PAGE_SIZE - offset, (unsigned long)(buff->end - buff->tail)); Index: linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_recv.c =================================================================== --- linux-2.6.14.orig/drivers/infiniband/ulp/sdp/sdp_recv.c 2005-12-15 13:25:11.000000000 +0200 +++ linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_recv.c 2005-12-15 13:25:53.000000000 +0200 @@ -599,8 +599,6 @@ static int sdp_read_buff_iocb(struct sdp local_irq_save(flags); addr = kmap_atomic(iocb->page_array[counter], KM_IRQ0); - if (!addr) - break; copy = min(PAGE_SIZE - offset, (unsigned long)(buff->tail - buff->data)); -- MST From jackm at mellanox.co.il Thu Dec 15 01:23:03 2005 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Thu, 15 Dec 2005 11:23:03 +0200 Subject: [openib-general] [PATCH] libmthca: fix SRQ cleanup during destroy-qp Message-ID: <20051215092303.GA27784@mellanox.co.il> When cleaning up a CQ for a QP attached to SRQ, need to free an SRQ wqe only if the CQE is a receive completion. Signed-off-by: Jack Morgenstein Index: latest/src/userspace/libmthca/src/cq.c =================================================================== --- latest.orig/src/userspace/libmthca/src/cq.c +++ latest/src/userspace/libmthca/src/cq.c @@ -121,6 +121,13 @@ struct mthca_err_cqe { uint8_t owner; }; +static inline int is_recv_cqe(struct mthca_cqe * cqe) +{ + return (((cqe->opcode & MTHCA_ERROR_CQE_OPCODE_MASK) == + MTHCA_ERROR_CQE_OPCODE_MASK) ? 
+ !(cqe->opcode & 0x01) : !(cqe->is_send & 0x80)); +} + static inline struct mthca_cqe *get_cqe(struct mthca_cq *cq, int entry) { return cq->buf + entry * MTHCA_CQ_ENTRY_SIZE; @@ -549,7 +556,7 @@ void mthca_cq_clean(struct mthca_cq *cq, while ((int) --prod_index - (int) cq->cons_index >= 0) { cqe = get_cqe(cq, prod_index & cq->ibv_cq.cqe); if (cqe->my_qpn == htonl(qpn)) { - if (srq) + if (srq && is_recv_cqe(cqe)) mthca_free_srq_wqe(srq, ntohl(cqe->wqe) >> srq->wqe_shift); ++nfreed; From jackm at mellanox.co.il Thu Dec 15 01:26:18 2005 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Thu, 15 Dec 2005 11:26:18 +0200 Subject: [openib-general] [patch] mthca: fix SRQ cleanup during destroy-qp Message-ID: <20051215092618.GB27784@mellanox.co.il> When cleaning up a CQ for a QP attached to SRQ, need to free an SRQ wqe only if the CQE is a receive completion. Signed-off-by: Jack Morgenstein Index: openib/drivers/infiniband/hw/mthca/mthca_cq.c =================================================================== --- openib.orig/drivers/infiniband/hw/mthca/mthca_cq.c +++ openib/drivers/infiniband/hw/mthca/mthca_cq.c @@ -150,6 +150,13 @@ struct mthca_err_cqe { #define MTHCA_ARBEL_CQ_DB_REQ_NOT (2 << 24) #define MTHCA_ARBEL_CQ_DB_REQ_NOT_MULT (3 << 24) +static inline int is_recv_cqe(struct mthca_cqe * cqe) +{ + return (((cqe->opcode & MTHCA_ERROR_CQE_OPCODE_MASK) == + MTHCA_ERROR_CQE_OPCODE_MASK) ? + !(cqe->opcode & 0x01) : !(cqe->is_send & 0x80)); +} + static inline struct mthca_cqe *get_cqe(struct mthca_cq *cq, int entry) { if (cq->is_direct) @@ -296,7 +303,7 @@ void mthca_cq_clean(struct mthca_dev *de while ((int) --prod_index - (int) cq->cons_index >= 0) { cqe = get_cqe(cq, prod_index & cq->ibcq.cqe); if (cqe->my_qpn == cpu_to_be32(qpn)) { - if (srq) + if (srq && is_recv_cqe(cqe)) mthca_free_srq_wqe(srq, be32_to_cpu(cqe->wqe)); ++nfreed; } else if (nfreed) From jackm at mellanox.co.il Thu Dec 15 01:51:58 2005 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Thu, 15 Dec 2005 11:51:58 +0200 Subject: [openib-general] [PATCH] mthca: fix SRQ cleanup during destroy-qp Message-ID: <20051215095158.GA27874@mellanox.co.il> When cleaning up a CQ for a QP attached to SRQ, need to free an SRQ wqe only if the CQE is a receive completion. Signed-off-by: Jack Morgenstein Index: openib/drivers/infiniband/hw/mthca/mthca_cq.c =================================================================== --- openib.orig/drivers/infiniband/hw/mthca/mthca_cq.c +++ openib/drivers/infiniband/hw/mthca/mthca_cq.c @@ -150,6 +150,13 @@ struct mthca_err_cqe { #define MTHCA_ARBEL_CQ_DB_REQ_NOT (2 << 24) #define MTHCA_ARBEL_CQ_DB_REQ_NOT_MULT (3 << 24) +static inline int is_recv_cqe(struct mthca_cqe * cqe) +{ + return (((cqe->opcode & MTHCA_ERROR_CQE_OPCODE_MASK) == + MTHCA_ERROR_CQE_OPCODE_MASK) ? + !(cqe->opcode & 0x01) : !(cqe->is_send & 0x80)); +} + static inline struct mthca_cqe *get_cqe(struct mthca_cq *cq, int entry) { if (cq->is_direct) @@ -296,7 +303,7 @@ void mthca_cq_clean(struct mthca_dev *de while ((int) --prod_index - (int) cq->cons_index >= 0) { cqe = get_cqe(cq, prod_index & cq->ibcq.cqe); if (cqe->my_qpn == cpu_to_be32(qpn)) { - if (srq) + if (srq && is_recv_cqe(cqe)) mthca_free_srq_wqe(srq, be32_to_cpu(cqe->wqe)); ++nfreed; } else if (nfreed) From tziporet at mellanox.co.il Thu Dec 15 06:37:40 2005 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Thu, 15 Dec 2005 16:37:40 +0200 Subject: [openib-general] Next workshop dates? 
	Please respond with your preferences
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E366C4B4@mtlexch01.mtl.com>

Hi,

Regarding dates - Mellanox prefer not to have the workshop on Feb 5 but
from the week of 12 Feb or afterward

Tziporet

-----Original Message-----
From: Eitan Zahavi
Sent: Wednesday, December 14, 2005 9:25 PM
To: Bill Boas
Cc: openib-general at openib.org; openib-promoters at openib.org
Subject: Re: [openib-general] Next workshop dates? Please respond with
your preferences

Hi,

I would like to propose the following agenda topics:

Core Enhancements:
QoS - directions for integration and support by OpenIB stack
Partitions - recognize areas needing enhancements
Multicast, Services and InformInfo Registrations: reference counting
and re-registrations - implementation plan/API

Diagnostics:
Describe/discuss new diagnostic tools feature.

Eitan

_______________________________________________
openib-general mailing list
openib-general at openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general

From halr at voltaire.com Thu Dec 15 06:32:54 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Dec 2005 09:32:54 -0500
Subject: [openib-general] [PATCH] OpenSM/ib_types.h: Modify ib_port_info_compute_rate
Message-ID: <1134657173.4334.9122.camel@hal.voltaire.com>

OpenSM/ib_types.h: Modify ib_port_info_compute_rate so that
gcc version 4.0.0 20050519 (Red Hat 4.0.0-8) doesn't complain
when compiling osm_sa_path_record.c as follows:

osm_sa_path_record.c: In function ‘__osm_pr_rcv_get_path_parms’:
osm_sa_path_record.c:194: warning: control may reach end of non-void
function ‘ib_port_info_compute_rate’ being inlined

Signed-off-by: Hal Rosenstock

Index: include/iba/ib_types.h
===================================================================
--- include/iba/ib_types.h	(revision 4479)
+++ include/iba/ib_types.h	(working copy)
@@ -4290,59 +4290,76 @@
 static inline uint8_t
 ib_port_info_compute_rate(
 	IN const ib_port_info_t* const p_pi )
 {
+	uint8_t rate = 0;
+
 	switch (ib_port_info_get_link_speed_active(p_pi)) {
 	case IB_LINK_SPEED_ACTIVE_2_5:
 		switch (p_pi->link_width_active) {
 		case IB_LINK_WIDTH_ACTIVE_1X:
-			return IB_PATH_RECORD_RATE_2_5_GBS;
+			rate = IB_PATH_RECORD_RATE_2_5_GBS;
+			break;
 		case IB_LINK_WIDTH_ACTIVE_4X:
-			return IB_PATH_RECORD_RATE_10_GBS;
-
+			rate = IB_PATH_RECORD_RATE_10_GBS;
+			break;
+
 		case IB_LINK_WIDTH_ACTIVE_12X:
-			return IB_PATH_RECORD_RATE_30_GBS;
-
+			rate = IB_PATH_RECORD_RATE_30_GBS;
+			break;
+
 		default:
-			return IB_PATH_RECORD_RATE_2_5_GBS;
+			rate = IB_PATH_RECORD_RATE_2_5_GBS;
+			break;
 		}
 		break;
 
 	case IB_LINK_SPEED_ACTIVE_5:
 		switch (p_pi->link_width_active) {
 		case IB_LINK_WIDTH_ACTIVE_1X:
-			return IB_PATH_RECORD_RATE_5_GBS;
-
+			rate = IB_PATH_RECORD_RATE_5_GBS;
+			break;
+
 		case IB_LINK_WIDTH_ACTIVE_4X:
-			return IB_PATH_RECORD_RATE_20_GBS;
-
+			rate = IB_PATH_RECORD_RATE_20_GBS;
+			break;
+
 		case IB_LINK_WIDTH_ACTIVE_12X:
-			return IB_PATH_RECORD_RATE_60_GBS;
-
+			rate = IB_PATH_RECORD_RATE_60_GBS;
+			break;
+
 		default:
-			return IB_PATH_RECORD_RATE_5_GBS;
+			rate = IB_PATH_RECORD_RATE_5_GBS;
+			break;
 		}
 		break;
 
 	case IB_LINK_SPEED_ACTIVE_10:
 		switch (p_pi->link_width_active) {
 		case IB_LINK_WIDTH_ACTIVE_1X:
-			return IB_PATH_RECORD_RATE_10_GBS;
-
+			rate = IB_PATH_RECORD_RATE_10_GBS;
+			break;
+
 		case IB_LINK_WIDTH_ACTIVE_4X:
-			return IB_PATH_RECORD_RATE_40_GBS;
-
+			rate = IB_PATH_RECORD_RATE_40_GBS;
+			break;
+
 		case IB_LINK_WIDTH_ACTIVE_12X:
-			return IB_PATH_RECORD_RATE_120_GBS;
-
+			rate = IB_PATH_RECORD_RATE_120_GBS;
+			break;
+
 		default:
-			return IB_PATH_RECORD_RATE_10_GBS;
+			rate = IB_PATH_RECORD_RATE_10_GBS;
+			break;
 		}
 		break;
 
 	default:
-		return IB_PATH_RECORD_RATE_2_5_GBS;
+		rate = IB_PATH_RECORD_RATE_2_5_GBS;
+		break;
 	}
+
+	return rate;
 }
 
 /*
 * PARAMETERS

From halr at voltaire.com Thu Dec 15 06:56:22 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Dec 2005 09:56:22 -0500
Subject: [openib-general] [PATCH] OpenSM/osm_base.h: Modify some SM constants
Message-ID: <1134658582.4334.9332.camel@hal.voltaire.com>

OpenSM/osm_base.h: Modify some SM constants

Default SM Key should not be 0

Also, changed default SM priority to 1 so can have lower priority SM
(and not rely on low GUID comparison). Note also that priority sense
was inverted in comment.

Signed-off-by: Hal Rosenstock

Index: osm_base.h
===================================================================
--- osm_base.h	(revision 4478)
+++ osm_base.h	(working copy)
@@ -122,7 +122,7 @@ BEGIN_C_DECLS
 *
 * SYNOPSIS
 */
-#define OSM_DEFAULT_SM_KEY 0
+#define OSM_DEFAULT_SM_KEY 1
 /********/
 
 /****s* OpenSM: Base/OSM_DEFAULT_LMC
@@ -169,11 +169,11 @@ BEGIN_C_DECLS
 *
 * DESCRIPTION
 * Default SM priority value used by the OpenSM,
-* as defined in the SMInfo attribute. 0 is the highest priority.
+* as defined in the SMInfo attribute. 0 is the lowest priority.
 *
 * SYNOPSIS
 */
-#define OSM_DEFAULT_SM_PRIORITY 0
+#define OSM_DEFAULT_SM_PRIORITY 1
 /********/
 
 /****d* OpenSM: Base/OSM_DEFAULT_TMP_DIR

From rdreier at cisco.com Thu Dec 15 08:08:13 2005
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 15 Dec 2005 08:08:13 -0800
Subject: [openib-general] Re: [PATCH] libmthca: fix error handling in
	mthca_store_qp
In-Reply-To: <20051215063951.GB26191@mellanox.co.il> (Michael S. Tsirkin's
	message of "Thu, 15 Dec 2005 08:39:51 +0200")
References: <20051215063951.GB26191@mellanox.co.il>
Message-ID: 

good catch again, thanks -- applied

From mst at mellanox.co.il Thu Dec 15 08:43:37 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 15 Dec 2005 18:43:37 +0200
Subject: [openib-general] Re: [PATCH] mthca: correct max_rd_atomic handling
In-Reply-To: 
References: 
Message-ID: <20051215164337.GV26722@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: Re: [PATCH] mthca: correct max_rd_atomic handling
>
> What happens if SAE and SRE are turned off and the consumer posts an
> RDMA read? Does it fail and generate an error completion?

Hardware guys confirmed that it does, as per spec: clearing these bits
is the way to tell hardware that we have max_rd_atomic set to 0.

I thought its obvious from documentation: do you think this needs
clarification?

> I don't think we want it to just stop the QP processing.

It doesnt do that.

--
MST

From mst at mellanox.co.il Thu Dec 15 08:44:02 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 15 Dec 2005 18:44:02 +0200
Subject: [openib-general] [PATCH] mthca thinko
Message-ID: <20051215164402.GW26722@mellanox.co.il>

Fix thinko in mthca_table_find: break only escapes from the innermost
loop.
Ishai Rabinovitch

Signed-off-by: Michael S.
Tsirkin Index: openib/drivers/infiniband/hw/mthca/mthca_memfree.c =================================================================== --- openib/drivers/infiniband/hw/mthca/mthca_memfree.c (revision 4369) +++ openib/drivers/infiniband/hw/mthca/mthca_memfree.c (working copy) @@ -232,9 +232,9 @@ void *mthca_table_find(struct mthca_icm_ list_for_each_entry(chunk, &icm->chunk_list, list) { for (i = 0; i < chunk->npages; ++i) { if (chunk->mem[i].length >= offset) { page = chunk->mem[i].page; - break; + goto out; } offset -= chunk->mem[i].length; } } -- MST From iod00d at hp.com Thu Dec 15 08:44:41 2005 From: iod00d at hp.com (Grant Grundler) Date: Thu, 15 Dec 2005 08:44:41 -0800 Subject: [openib-general] Re: [openib-commits] r4453 - trunk/contrib/mellanox/gen2/src/userspace/perftest In-Reply-To: <20051215083035.GF26722@mellanox.co.il> References: <20051214171452.GA26274@esmail.cup.hp.com> <20051215083035.GF26722@mellanox.co.il> Message-ID: <20051215164441.GA2923@esmail.cup.hp.com> On Thu, Dec 15, 2005 at 10:30:35AM +0200, Michael S. Tsirkin wrote: > > Can someone from mellanox explain why mainline src/userspace > > is cloned under contrib/mellanox? > > Thats how subversion handles tags: we are tagging mainline > approximately weekly. > I could put it in some other place - just didnt want to interfere with > people. If the plan is to move them to mainline anyway, I'm perfectly happy to have him do the work on mainline. This is not a broadly used test that everyone depends on. People can continue using the old versions if they really have to. > Sagi is working on adding performance tests. > We are not ready to get community feedback on that, yet. Nice! I'll remind that "open source" is about collaboration. If Sagi gets a new test working on one architecture, he has a good basis to ask for help from the list with running on other architectures. > I agree it is probably better to leave rdma_lat.c alone and add > new tests under new names. I'll fix that next week. If you think that's necessary....I don't mind one test that deals with multiple variants of latency testing. thanks, grant From bardov at gmail.com Thu Dec 15 09:07:47 2005 From: bardov at gmail.com (Dan Bar Dov) Date: Thu, 15 Dec 2005 19:07:47 +0200 Subject: [openib-general] QP from userspace used in kernel In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F10C2BF6@NT-SJCA-0751.brcm.ad.broadcom.com> References: <54AD0F12E08D1541B826BE97C98F99F10C2BF6@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: This is what we are doing in ISER (creating the QP in kernel as the connect action of a socket provider that is accessed by userspace). Dan On 12/8/05, Caitlin Bestler wrote: > openib-general-bounces at openib.org wrote: > > Hi, > > > > What are the reasons that a qp allocated in user-space can't > > be passed and used by a kernel module? > > What are the steps needed to make a userspace-allocated qp usable by > > a kernel module? > > > > > > You would need to construct an environment such that the > device-specific verbs module, which assumes it is executing > in the user space where the QP was created, would never notice > the difference. > > The device-specific verbs will typically have created shared > memory resources that are accessible by both the RDMA device > and from the creating user memory map. These resources may > include pointers that assume the original memory map. 
The > exact methods of remembering the locations of these resources > will vary by device, so the chance of coming up with a scheme > that works without explicit support of all device vendors is > very low. The chances of convincing all device vendors to add > a new option to support this model is similarly low unless you > can make a very compelling case as to why this is necessary. > > Having the in-kernel proxy create the QP and do operations > for the end-user is a very adequate work around. > > For complex cleanup purposes the kernel could simply assume > the identity of the failed process, but that would only be > required if the standard cleanup was somehow not adequate. > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From mst at mellanox.co.il Thu Dec 15 09:44:32 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 15 Dec 2005 19:44:32 +0200 Subject: [openib-general] Re: [PATCH] mthca thinko In-Reply-To: <20051215164402.GW26722@mellanox.co.il> References: <20051215164402.GW26722@mellanox.co.il> Message-ID: <20051215174432.GA29677@mellanox.co.il> Quoting Michael S. Tsirkin : > Subject: [PATCH] mthca thinko > > Fix thinko in mthca_table_find: break only escapes from the innermost > loop. > Ishai Rabinovitch Correction: Noticed by Ishai Rabinovitch -- MST From mst at mellanox.co.il Thu Dec 15 09:33:53 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 15 Dec 2005 19:33:53 +0200 Subject: [openib-general] kmap_atomic slot collision Message-ID: <20051215173353.GA29402@mellanox.co.il> Hi! I'm trying to use kmap_atomic from both interrupt and task context. My idea was to do local_irq_save and then use KM_IRQ0/KM_IRQ1: since I'm disabling interrupts I assumed that this should be safe. The relevant code is here: https://openib.org/svn/gen2/trunk/src/linux-kernel/infiniband/ulp/sdp/sdp_iocb.c However, under stress I see errors from arch/i386/mm/highmem.c:42 if (!pte_none(*(kmap_pte-idx))) BUG(); Apparently, my routine, running from a task context, races with some other kernel code, and so I'm trying to use a slot that was not yet unmapped. Anyone has an idea on what I could be doing wrong? Thanks, -- MST From tom at opengridcomputing.com Thu Dec 15 09:55:36 2005 From: tom at opengridcomputing.com (Tom Tucker) Date: Thu, 15 Dec 2005 11:55:36 -0600 Subject: [openib-general] [PATCH] iWARP Support added to the CMA Message-ID: <1134669336.7186.2.camel@trinity.austin.ammasso.com> This is a patch to the iWARP branch that adds: - A generic iWARP transport CM module - Support for iWARP transports to the CMA - Modifications to the AMSO1100 driver for the iWARP transport CM - ULP add_one event changes to filter events based on node_type The code has been tested on IB and iWARP HCA with both the cmatose and krping applications. The code can also be checked out from the iWARP branch with these patches applied. 
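Of the items above, the node_type filtering is the one pattern repeated
across every ULP the patch touches; a minimal sketch of it follows (the
ULP name my_ulp is a placeholder, not a real module):

/* Sketch only: how an IB-only ULP skips iWARP devices in its
 * ib_client callbacks. An RNIC has no SM/SA, UD QPs, or IB CM, so
 * IB-only consumers bail out before allocating any per-device state;
 * the new iWARP CM performs the inverse check. */
#include <rdma/ib_verbs.h>

static void my_ulp_add_one(struct ib_device *device)
{
	if (device->node_type == IB_NODE_RNIC)
		return;
	/* ... normal per-device initialization ... */
}

static void my_ulp_remove_one(struct ib_device *device)
{
	if (device->node_type == IB_NODE_RNIC)
		return;
	/* ... normal per-device teardown ... */
}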
Signed-off-by: Tom Tucker Index: ulp/ipoib/ipoib_main.c =================================================================== --- ulp/ipoib/ipoib_main.c (revision 4186) +++ ulp/ipoib/ipoib_main.c (working copy) @@ -1024,6 +1024,9 @@ struct ipoib_dev_priv *priv; int s, e, p; + if (device->node_type == IB_NODE_RNIC) + return; + dev_list = kmalloc(sizeof *dev_list, GFP_KERNEL); if (!dev_list) return; @@ -1054,6 +1057,9 @@ struct ipoib_dev_priv *priv, *tmp; struct list_head *dev_list; + if (device->node_type == IB_NODE_RNIC) + return; + dev_list = ib_get_client_data(device, &ipoib_client); list_for_each_entry_safe(priv, tmp, dev_list, list) { Index: include/rdma/ib_verbs.h =================================================================== --- include/rdma/ib_verbs.h (revision 4186) +++ include/rdma/ib_verbs.h (working copy) @@ -805,7 +805,7 @@ struct ib_gid_cache **gid_cache; }; -struct iw_cm; +struct iw_cm_provider; struct ib_device { struct device *dma_device; @@ -822,7 +822,7 @@ u32 flags; - struct iw_cm *iwcm; + struct iw_cm_verbs *iwcm; int (*query_device)(struct ib_device *device, struct ib_device_attr *device_attr); Index: include/rdma/iw_cm.h =================================================================== --- include/rdma/iw_cm.h (revision 4186) +++ include/rdma/iw_cm.h (working copy) @@ -1,5 +1,7 @@ /* * Copyright (c) 2005 Ammasso, Inc. All rights reserved. + * Copyright (c) 2005 Network Appliance, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -33,112 +35,119 @@ #define IW_CM_H #include +#include -/* iWARP connection attributes. */ +struct iw_cm_id; +struct iw_cm_event; -struct iw_conn_attr { - struct in_addr local_addr; - struct in_addr remote_addr; - u16 local_port; - u16 remote_port; +enum iw_cm_event_type { + IW_CM_EVENT_CONNECT_REQUEST = 1, /* connect request received */ + IW_CM_EVENT_CONNECT_REPLY, /* reply from active connect request */ + IW_CM_EVENT_ESTABLISHED, + IW_CM_EVENT_LLP_DISCONNECT, + IW_CM_EVENT_LLP_RESET, + IW_CM_EVENT_LLP_TIMEOUT, + IW_CM_EVENT_CLOSE }; -/* This is provided in the event generated when - * a remote peer accepts our connect request - */ - -enum conn_result { - IW_CONN_ACCEPT = 0, - IW_CONN_RESET, - IW_CONN_PEER_REJECT, - IW_CONN_TIMEDOUT, - IW_CONN_NO_ROUTE_TO_HOST, - IW_CONN_INVALID_PARM +struct iw_cm_event { + enum iw_cm_event_type event; + int status; + u32 provider_id; + struct sockaddr_in local_addr; + struct sockaddr_in remote_addr; + void *private_data; + u8 private_data_len; }; - -/* This structure is provided in the event that - * completes an active connection request. - */ -struct iw_conn_results { - enum conn_result result; - struct iw_conn_attr conn_attr; - u8 *private_data; - int private_data_len; -}; -/* This is provided in the event generated by a remote - * connect request to a listening endpoint - */ -struct iw_conn_request { - u32 cr_id; - struct iw_conn_attr conn_attr; - u8 *private_data; - int private_data_len; -}; +typedef int (*iw_cm_handler)(struct iw_cm_id *cm_id, + struct iw_cm_event *event); -/* Connection events. 
*/ -enum iw_cm_event_type { - IW_EVENT_ACTIVE_CONNECT_RESULTS, - IW_EVENT_CONNECT_REQUEST, - IW_EVENT_DISCONNECT +enum iw_cm_state { + IW_CM_STATE_IDLE, /* unbound, inactive */ + IW_CM_STATE_LISTEN, /* listen waiting for connect */ + IW_CM_STATE_CONN_SENT, /* outbound waiting for peer accept */ + IW_CM_STATE_CONN_RECV, /* inbound waiting for user accept */ + IW_CM_STATE_ESTABLISHED, /* established */ }; -struct iw_cm_event { - struct ib_device *device; - union { - struct iw_conn_results active_results; - struct iw_conn_request conn_request; - } element; - enum iw_cm_event_type event; +typedef void (*iw_event_handler)(struct iw_cm_id* cm_id, + struct iw_cm_event* event); +struct iw_cm_id { + iw_cm_handler cm_handler; /* client callback function */ + void *context; /* context to provide to client cb */ + enum iw_cm_state state; + struct ib_device *device; + struct ib_qp *qp; + struct sockaddr_in local_addr; + struct sockaddr_in remote_addr; + u64 provider_id; /* device handle for this conn. */ + iw_event_handler event_handler; /* callback for IW CM Provider events */ }; -/* Listening endpoint. */ -struct iw_listen_ep_attr { - void (*event_handler)(struct iw_cm_event *, void *); - void *listen_context; - struct in_addr addr; - u16 port; - int backlog; -}; +/** + * iw_create_cm_id - Allocate a communication identifier. + * @device: Device associated with the cm_id. All related communication will + * be associated with the specified device. + * @cm_handler: Callback invoked to notify the user of CM events. + * @context: User specified context associated with the communication + * identifier. + * + * Communication identifiers are used to track connection states, + * addr resolution requests, and listen requests. + */ +struct iw_cm_id *iw_create_cm_id(struct ib_device *device, + iw_cm_handler cm_handler, + void *context); -struct iw_cm { +/* This is provided in the event generated when + * a remote peer accepts our connect request + */ - int (*connect_qp)(struct ib_qp *ib_qp, - struct iw_conn_attr* attr, - void (*event_handler)(struct iw_cm_event*, void*), - void* context, - u8 *pdata, - int pdata_len - ); +struct iw_cm_verbs { + int (*connect)(struct iw_cm_id* cm_id, + const void* private_data, + u8 private_data_len); + + int (*disconnect)(struct iw_cm_id* cm_id, + int abrupt); - int (*disconnect_qp)(struct ib_qp *qp, - int abrupt - ); + int (*accept)(struct iw_cm_id*, + const void *private_data, + u8 pdata_data_len); - int (*accept_cr)(struct ib_device* ibdev, - u32 cr_id, - struct ib_qp *qp, - void (*event_handler)(struct iw_cm_event*, void*), - void *context, - u8 *pdata, - int pdata_len); + int (*reject)(struct iw_cm_id* cm_id, + const void* private_data, + u8 private_data_len); - int (*reject_cr)(struct ib_device* ibdev, - u32 cr_id, - u8 *pdata, - int pdata_len); + int (*getpeername)(struct iw_cm_id* cm_id, + struct sockaddr_in* local_addr, + struct sockaddr_in* remote_addr); - int (*query_cr)(struct ib_device* ibdev, - u32 cr_id, - struct iw_conn_request* req); + int (*create_listen)(struct iw_cm_id* cm_id, + int backlog); - int (*create_listen_ep)(struct ib_device *ibdev, - struct iw_listen_ep_attr *ep_attrs, - void **ep_handle); + int (*destroy_listen)(struct iw_cm_id* cm_id); - int (*destroy_listen_ep)(struct ib_device *ibdev, - void *ep_handle); - }; +struct iw_cm_id *iw_create_cm_id(struct ib_device *device, + iw_cm_handler cm_handler, + void *context); +void iw_destroy_cm_id(struct iw_cm_id *cm_id); +int iw_cm_listen(struct iw_cm_id *cm_id, int backlog); +int 
iw_cm_getpeername(struct iw_cm_id *cm_id, + struct sockaddr_in* local_add, + struct sockaddr_in* remote_addr); +int iw_cm_reject(struct iw_cm_id *cm_id, + const void *private_data, + u8 private_data_len); +int iw_cm_accept(struct iw_cm_id *cm_id, + const void *private_data, + u8 private_data_len); +int iw_cm_connect(struct iw_cm_id *cm_id, + const void* pdata, u8 pdata_len); +int iw_cm_disconnect(struct iw_cm_id *cm_id); +int iw_cm_bind_qp(struct iw_cm_id* cm_id, struct ib_qp* qp); + #endif /* IW_CM_H */ Index: core/cm.c =================================================================== --- core/cm.c (revision 4186) +++ core/cm.c (working copy) @@ -3227,6 +3227,10 @@ int ret; u8 i; + /* Ignore RNIC devices */ + if (device->node_type == IB_NODE_RNIC) + return; + cm_dev = kmalloc(sizeof(*cm_dev) + sizeof(*port) * device->phys_port_cnt, GFP_KERNEL); if (!cm_dev) @@ -3291,6 +3295,10 @@ if (!cm_dev) return; + /* Ignore RNIC devices */ + if (device->node_type == IB_NODE_RNIC) + return; + write_lock_irqsave(&cm.device_lock, flags); list_del(&cm_dev->list); write_unlock_irqrestore(&cm.device_lock, flags); Index: core/iwcm.c =================================================================== --- core/iwcm.c (revision 0) +++ core/iwcm.c (revision 0) @@ -0,0 +1,671 @@ +/* + * Copyright (c) 2004, 2005 Intel Corporation. All rights reserved. + * Copyright (c) 2004 Topspin Corporation. All rights reserved. + * Copyright (c) 2004, 2005 Voltaire Corporation. All rights reserved. + * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * Copyright (c) 2005 Network Appliance, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +#include "cm_msgs.h" + +MODULE_AUTHOR("Tom Tucker"); +MODULE_DESCRIPTION("iWARP CM"); +MODULE_LICENSE("Dual BSD/GPL"); + +static void iwcm_add_one(struct ib_device *device); +static void iwcm_remove_one(struct ib_device *device); +struct iwcm_id_private; + +static struct ib_client iwcm_client = { + .name = "cm", + .add = iwcm_add_one, + .remove = iwcm_remove_one +}; + +static struct { + spinlock_t lock; + struct list_head device_list; + rwlock_t device_lock; + struct workqueue_struct* wq; +} iwcm; + +struct iwcm_device; +struct iwcm_port { + struct iwcm_device *iwcm_dev; + struct sockaddr_in local_addr; + u8 port_num; +}; + +struct iwcm_device { + struct list_head list; + struct ib_device *device; + struct iwcm_port port[0]; +}; + +struct iwcm_id_private { + struct iw_cm_id id; + + spinlock_t lock; + wait_queue_head_t wait; + atomic_t refcount; + + struct rb_node listen_node; + + struct list_head work_list; + atomic_t work_count; +}; + +struct iwcm_work { + struct work_struct work; + struct iwcm_id_private* cm_id; + struct iw_cm_event event; +}; + +/* Called whenever a reference added for a cm_id */ +static inline void iwcm_addref_id(struct iwcm_id_private *cm_id_priv) +{ + atomic_inc(&cm_id_priv->refcount); +} + +/* Called whenever releasing a reference to a cm id */ +static inline void iwcm_deref_id(struct iwcm_id_private *cm_id_priv) +{ + if (atomic_dec_and_test(&cm_id_priv->refcount)) + wake_up(&cm_id_priv->wait); +} + +static void cm_event_handler(struct iw_cm_id* cm_id, struct iw_cm_event* event); + +struct iw_cm_id *iw_create_cm_id(struct ib_device *device, + iw_cm_handler cm_handler, + void *context) +{ + struct iwcm_id_private *iwcm_id_priv; + + iwcm_id_priv = kmalloc(sizeof *iwcm_id_priv, GFP_KERNEL); + if (!iwcm_id_priv) + return ERR_PTR(-ENOMEM); + + memset(iwcm_id_priv, 0, sizeof *iwcm_id_priv); + iwcm_id_priv->id.state = IW_CM_STATE_IDLE; + iwcm_id_priv->id.device = device; + iwcm_id_priv->id.cm_handler = cm_handler; + iwcm_id_priv->id.context = context; + iwcm_id_priv->id.event_handler = cm_event_handler; + + spin_lock_init(&iwcm_id_priv->lock); + init_waitqueue_head(&iwcm_id_priv->wait); + atomic_set(&iwcm_id_priv->refcount, 1); + + return &iwcm_id_priv->id; + +} +EXPORT_SYMBOL(iw_create_cm_id); + +struct iw_cm_id* iw_clone_id(struct iw_cm_id* parent) +{ + return iw_create_cm_id(parent->device, + parent->cm_handler, + parent->context); +} + +void iw_destroy_cm_id(struct iw_cm_id *cm_id) +{ + struct iwcm_id_private *iwcm_id_priv; + unsigned long flags; + int ret = 0; + + + iwcm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + + spin_lock_irqsave(&iwcm_id_priv->lock, flags); + switch (cm_id->state) { + case IW_CM_STATE_LISTEN: + cm_id->state = IW_CM_STATE_IDLE; + spin_unlock_irqrestore(&iwcm_id_priv->lock, flags); + ret = cm_id->device->iwcm->destroy_listen(cm_id); + break; + + case IW_CM_STATE_CONN_RECV: + case IW_CM_STATE_CONN_SENT: + case IW_CM_STATE_ESTABLISHED: + cm_id->state = IW_CM_STATE_IDLE; + spin_unlock_irqrestore(&iwcm_id_priv->lock, flags); + ret = cm_id->device->iwcm->disconnect(cm_id,1); + break; + + case IW_CM_STATE_IDLE: + spin_unlock_irqrestore(&iwcm_id_priv->lock, flags); + break; + + default: + spin_unlock_irqrestore(&iwcm_id_priv->lock, flags); + printk(KERN_ERR "%s:%s:%u Illegal state %d for iw_cm_id.\n", + __FILE__, __FUNCTION__, __LINE__, cm_id->state); + ; + } + + atomic_dec(&iwcm_id_priv->refcount); + 
wait_event(iwcm_id_priv->wait, !atomic_read(&iwcm_id_priv->refcount)); + + kfree(iwcm_id_priv); +} +EXPORT_SYMBOL(iw_destroy_cm_id); + +int iw_cm_listen(struct iw_cm_id *cm_id, int backlog) +{ + struct iwcm_id_private *iwcm_id_priv; + unsigned long flags; + int ret = 0; + + + iwcm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + + if (cm_id->device == 0) { + printk(KERN_ERR "device is NULL\n"); + return -EINVAL; + } + + if (cm_id->device->iwcm == 0) { + printk(KERN_ERR "iwcm is NULL\n"); + return -EINVAL; + } + + spin_lock_irqsave(&iwcm_id_priv->lock, flags); + if (cm_id->state != IW_CM_STATE_IDLE) { + spin_unlock_irqrestore(&iwcm_id_priv->lock, flags); + return -EBUSY; + } + cm_id->state = IW_CM_STATE_LISTEN; + spin_unlock_irqrestore(&iwcm_id_priv->lock, flags); + + ret = cm_id->device->iwcm->create_listen(cm_id, backlog); + if (ret != 0) { + spin_lock_irqsave(&iwcm_id_priv->lock, flags); + cm_id->state = IW_CM_STATE_IDLE; + spin_unlock_irqrestore(&iwcm_id_priv->lock, flags); + } + return ret; +} +EXPORT_SYMBOL(iw_cm_listen); + +int iw_cm_getpeername(struct iw_cm_id *cm_id, + struct sockaddr_in* local_addr, + struct sockaddr_in* remote_addr) +{ + if (cm_id->device == 0) + return -EINVAL; + + if (cm_id->device->iwcm == 0) + return -EINVAL; + + /* Make sure there's a connection */ + if (cm_id->state != IW_CM_STATE_ESTABLISHED) + return -ENOTCONN; + + return cm_id->device->iwcm->getpeername(cm_id, local_addr, remote_addr); +} +EXPORT_SYMBOL(iw_cm_getpeername); + +int iw_cm_reject(struct iw_cm_id *cm_id, + const void *private_data, + u8 private_data_len) +{ + struct iwcm_id_private *iwcm_id_priv; + unsigned long flags; + int ret; + + + if (cm_id->device == 0 || cm_id->device->iwcm == 0) + return -EINVAL; + + iwcm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + + spin_lock_irqsave(&iwcm_id_priv->lock, flags); + switch (cm_id->state) { + case IW_CM_STATE_CONN_RECV: + ret = cm_id->device->iwcm->reject(cm_id, private_data, private_data_len); + cm_id->state = IW_CM_STATE_IDLE; + break; + default: + ret = -EINVAL; + goto out; + } + +out: spin_unlock_irqrestore(&iwcm_id_priv->lock, flags); + return ret; +} +EXPORT_SYMBOL(iw_cm_reject); + +int iw_cm_accept(struct iw_cm_id *cm_id, + const void *private_data, + u8 private_data_len) +{ + struct iwcm_id_private *iwcm_id_priv; + int ret; + + if (cm_id->device == 0 || cm_id->device->iwcm == 0) + return -EINVAL; + + iwcm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + + switch (cm_id->state) { + case IW_CM_STATE_CONN_RECV: + ret = cm_id->device->iwcm->accept(cm_id, private_data, + private_data_len); + if (ret == 0) { + struct iw_cm_event event; + event.event = IW_CM_EVENT_ESTABLISHED; + event.provider_id = cm_id->provider_id; + event.status = 0; + event.local_addr = cm_id->local_addr; + event.remote_addr = cm_id->remote_addr; + event.private_data = 0; + event.private_data_len = 0; + cm_event_handler(cm_id, &event); + } + + break; + default: + ret = -EINVAL; + } + + return ret; +} +EXPORT_SYMBOL(iw_cm_accept); + +int iw_cm_bind_qp(struct iw_cm_id* cm_id, struct ib_qp* qp) +{ + int ret = -EINVAL; + + if (cm_id) + cm_id->qp = qp; + + return ret; +} +EXPORT_SYMBOL(iw_cm_bind_qp); + +int iw_cm_connect(struct iw_cm_id *cm_id, + const void* pdata, u8 pdata_len) +{ + struct iwcm_id_private* cm_id_priv; + int ret = 0; + unsigned long flags; + + if (cm_id->state != IW_CM_STATE_IDLE) + return -EBUSY; + + if (cm_id->device == 0) + return -EINVAL; + + if (cm_id->device->iwcm == 0) + return -ENOSYS; + + cm_id_priv = 
container_of(cm_id, struct iwcm_id_private, id); + + spin_lock_irqsave(&cm_id_priv->lock, flags); + cm_id->state = IW_CM_STATE_CONN_SENT; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + + ret = cm_id->device->iwcm->connect(cm_id, pdata, pdata_len); + if (ret != 0) { + spin_lock_irqsave(&cm_id_priv->lock, flags); + cm_id->state = IW_CM_STATE_IDLE; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + } + return ret; +} +EXPORT_SYMBOL(iw_cm_connect); + +int iw_cm_disconnect(struct iw_cm_id *cm_id) +{ + struct iwcm_id_private *iwcm_id_priv; + int ret; + + if (cm_id->device == 0 || cm_id->device->iwcm == 0 || cm_id->qp == 0) + return -EINVAL; + + iwcm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + + switch (cm_id->state) { + case IW_CM_STATE_ESTABLISHED: + ret = cm_id->device->iwcm->disconnect(cm_id, 1); + cm_id->state = IW_CM_STATE_IDLE; + if (ret == 0) { + struct iw_cm_event event; + event.event = IW_CM_EVENT_LLP_DISCONNECT; + event.provider_id = cm_id->provider_id; + event.status = 0; + event.local_addr = cm_id->local_addr; + event.remote_addr = cm_id->remote_addr; + event.private_data = 0; + event.private_data_len = 0; + cm_event_handler(cm_id, &event); + } + + break; + default: + ret = -EINVAL; + } + + return ret; +} +EXPORT_SYMBOL(iw_cm_disconnect); + +static void iwcm_add_one(struct ib_device *device) +{ + struct iwcm_device *iwcm_dev; + struct iwcm_port *port; + unsigned long flags; + u8 i; + + if (device->node_type != IB_NODE_RNIC) + return; + + iwcm_dev = kmalloc(sizeof(*iwcm_dev) + sizeof(*port) * + device->phys_port_cnt, GFP_KERNEL); + if (!iwcm_dev) + return; + + iwcm_dev->device = device; + + for (i = 1; i <= device->phys_port_cnt; i++) { + port = &iwcm_dev->port[i-1]; + port->iwcm_dev = iwcm_dev; + port->port_num = i; + } + + ib_set_client_data(device, &iwcm_client, iwcm_dev); + + write_lock_irqsave(&iwcm.device_lock, flags); + list_add_tail(&iwcm_dev->list, &iwcm.device_list); + write_unlock_irqrestore(&iwcm.device_lock, flags); + return; +} + +static void iwcm_remove_one(struct ib_device *device) +{ + struct iwcm_device *iwcm_dev; + unsigned long flags; + + if (device->node_type != IB_NODE_RNIC) + return; + + iwcm_dev = ib_get_client_data(device, &iwcm_client); + if (!iwcm_dev) + return; + + write_lock_irqsave(&iwcm.device_lock, flags); + list_del(&iwcm_dev->list); + write_unlock_irqrestore(&iwcm.device_lock, flags); + + kfree(iwcm_dev); +} + +/* Handles an inbound connect request. The function creates a new + * iw_cm_id to represent the new connection and inherits the client + * callback function and other attributes from the listening parent. + * + * The work item contains a pointer to the listen_cm_id and the event. The + * listen_cm_id contains the client cm_handler, context and device. These are + * copied when the device is cloned. The event contains the new four tuple. + */ +static int cm_conn_req_handler(struct iwcm_work* work) +{ + struct iw_cm_id* cm_id; + struct iwcm_id_private* cm_id_priv; + unsigned long flags; + int rc; + + /* If the status was not successful, ignore request */ + if (work->event.status) { + printk(KERN_ERR "Bad status=%d for connection request ... 
" + "should be filtered by provider\n", + work->event.status); + return work->event.status; + } + cm_id = iw_clone_id(&work->cm_id->id); + if (IS_ERR(cm_id)) + return PTR_ERR(cm_id); + + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + spin_lock_irqsave(&cm_id_priv->lock, flags); + cm_id_priv->id.local_addr = work->event.local_addr; + cm_id_priv->id.remote_addr = work->event.remote_addr; + cm_id_priv->id.provider_id = work->event.provider_id; + cm_id_priv->id.state = IW_CM_STATE_CONN_RECV; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + + /* Call the client CM handler */ + rc = cm_id->cm_handler(cm_id, &work->event); + if (rc) { + cm_id->state = IW_CM_STATE_IDLE; + iw_destroy_cm_id(cm_id); + } + kfree(work); + return 0; +} + +/* + * Handles the transition to established state on the passive side. + */ +static int cm_conn_est_handler(struct iwcm_work* work) +{ + struct iwcm_id_private* cm_id_priv; + unsigned long flags; + int ret = 0; + + cm_id_priv = work->cm_id; + spin_lock_irqsave(&cm_id_priv->lock, flags); + if (cm_id_priv->id.state != IW_CM_STATE_CONN_RECV) { + printk(KERN_ERR "%s:%d Invalid cm_id state=%d for established event\n", + __FUNCTION__, __LINE__, cm_id_priv->id.state); + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + ret = -EINVAL; + goto error_out; + } + + if (work->event.status == 0) { + cm_id_priv = work->cm_id; + cm_id_priv->id.local_addr = work->event.local_addr; + cm_id_priv->id.remote_addr = work->event.remote_addr; + cm_id_priv->id.state = IW_CM_STATE_ESTABLISHED; + } else { + cm_id_priv->id.state = IW_CM_STATE_IDLE; + } + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + + /* Call the client CM handler */ + ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, &work->event); + if (ret) { + cm_id_priv->id.state = IW_CM_STATE_IDLE; + iw_destroy_cm_id(&cm_id_priv->id); + } + + error_out: + kfree(work); + return ret; +} + +/* + * Handles the reply to our connect request. There are three + * possibilities: + * - If the cm_id is in the wrong state when the event is + * delivered, the event is ignored. [What should we do when the + * provider does something crazy?] + * - If the remote peer accepts the connection, we update the 4-tuple + * in the cm_id with the remote peer info, move the cm_id to the + * ESTABLISHED state and deliver the event to the client. + * - If the remote peer rejects the connection, or there is some + * connection error, move the cm_id to the IDLE state, and deliver + * the event to the client. 
+ */ +static int cm_conn_rep_handler(struct iwcm_work* work) +{ + struct iwcm_id_private* cm_id_priv; + unsigned long flags; + int ret = 0; + + cm_id_priv = work->cm_id; + spin_lock_irqsave(&cm_id_priv->lock, flags); + if (cm_id_priv->id.state != IW_CM_STATE_CONN_SENT) { + printk(KERN_ERR "%s:%d Invalid cm_id state=%d for connect reply event\n", + __FUNCTION__, __LINE__, cm_id_priv->id.state); + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + ret = -EINVAL; + goto error_out; + } + + if (work->event.status == 0) { + cm_id_priv = work->cm_id; + cm_id_priv->id.local_addr = work->event.local_addr; + cm_id_priv->id.remote_addr = work->event.remote_addr; + cm_id_priv->id.state = IW_CM_STATE_ESTABLISHED; + } else { + cm_id_priv->id.state = IW_CM_STATE_IDLE; + } + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + + /* Call the client CM handler */ + ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, &work->event); + if (ret) { + cm_id_priv->id.state = IW_CM_STATE_IDLE; + iw_destroy_cm_id(&cm_id_priv->id); + } + + error_out: + kfree(work); + return ret; +} + +static int cm_disconnect_handler(struct iwcm_work* work) +{ + struct iwcm_id_private* cm_id_priv; + unsigned long flags; + int ret = 0; + + cm_id_priv = work->cm_id; + spin_lock_irqsave(&cm_id_priv->lock, flags); + cm_id_priv->id.state = IW_CM_STATE_IDLE; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + + /* Call the client CM handler */ + ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, &work->event); + if (ret) { + cm_id_priv->id.state = IW_CM_STATE_IDLE; + iw_destroy_cm_id(&cm_id_priv->id); + } + + kfree(work); + return ret; +} + +static void cm_work_handler(void* arg) +{ + struct iwcm_work* work = (struct iwcm_work*)arg; + int rc; + + switch (work->event.event) { + case IW_CM_EVENT_CONNECT_REQUEST: + rc = cm_conn_req_handler(work); + break; + case IW_CM_EVENT_CONNECT_REPLY: + rc = cm_conn_rep_handler(work); + break; + case IW_CM_EVENT_ESTABLISHED: + rc = cm_conn_est_handler(work); + break; + case IW_CM_EVENT_LLP_DISCONNECT: + case IW_CM_EVENT_LLP_TIMEOUT: + case IW_CM_EVENT_LLP_RESET: + case IW_CM_EVENT_CLOSE: + rc = cm_disconnect_handler(work); + break; + } +} + +/* IW CM provider event callback handler. This function is called on + * interrupt context. The function builds a work queue element + * and enqueues it for processing on a work queue thread. This allows + * CM client callback functions to block. 
+ */ +static void cm_event_handler(struct iw_cm_id* cm_id, + struct iw_cm_event* event) +{ + struct iwcm_work *work; + struct iwcm_id_private* cm_id_priv; + + work = kmalloc(sizeof *work, GFP_ATOMIC); + if (!work) + return; + + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + INIT_WORK(&work->work, cm_work_handler, work); + work->cm_id = cm_id_priv; + work->event = *event; + queue_work(iwcm.wq, &work->work); +} + +static int __init iw_cm_init(void) +{ + memset(&iwcm, 0, sizeof iwcm); + INIT_LIST_HEAD(&iwcm.device_list); + rwlock_init(&iwcm.device_lock); + spin_lock_init(&iwcm.lock); + iwcm.wq = create_workqueue("iw_cm"); + if (!iwcm.wq) + return -ENOMEM; + + return ib_register_client(&iwcm_client); +} + +static void __exit iw_cm_cleanup(void) +{ + ib_unregister_client(&iwcm_client); +} + +module_init(iw_cm_init); +module_exit(iw_cm_cleanup); + Index: core/addr.c =================================================================== --- core/addr.c (revision 4186) +++ core/addr.c (working copy) @@ -73,8 +73,13 @@ if (!dev) return -EADDRNOTAVAIL; - *gid = *(union ib_gid *) (dev->dev_addr + 4); - *pkey = addr_get_pkey(dev); + if (dev->type == ARPHRD_INFINIBAND) { + *gid = *(union ib_gid *) (dev->dev_addr + 4); + *pkey = addr_get_pkey(dev); + } else { + *gid = *(union ib_gid *) (dev->dev_addr); + *pkey = 0; + } dev_put(dev); return 0; } Index: core/Makefile =================================================================== --- core/Makefile (revision 4186) +++ core/Makefile (working copy) @@ -1,6 +1,6 @@ EXTRA_CFLAGS += -Idrivers/infiniband/include -Idrivers/infiniband/ulp/ipoib -obj-$(CONFIG_INFINIBAND) += ib_core.o ib_mad.o ib_ping.o ib_cm.o \ +obj-$(CONFIG_INFINIBAND) += ib_core.o ib_mad.o ib_ping.o ib_cm.o iw_cm.o \ ib_sa.o ib_at.o ib_addr.o rdma_cm.o obj-$(CONFIG_INFINIBAND_USER_MAD) += ib_umad.o obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o ib_uat.o @@ -14,6 +14,8 @@ ib_cm-y := cm.o +iw_cm-y := iwcm.o + rdma_cm-y := cma.o ib_addr-y := addr.o Index: core/cma.c =================================================================== --- core/cma.c (revision 4186) +++ core/cma.c (working copy) @@ -1,4 +1,5 @@ /* + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. * Copyright (c) 2005 Voltaire Inc. All rights reserved. * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved. * Copyright (c) 1999-2005, Mellanox Technologies, Inc. All rights reserved. @@ -30,9 +31,14 @@ */ #include #include +#include +#include +#include +#include #include #include #include +#include #include MODULE_AUTHOR("Guy German"); @@ -100,7 +106,10 @@ int timeout_ms; struct ib_sa_query *query; int query_id; - struct ib_cm_id *cm_id; + union { + struct ib_cm_id *ib; + struct iw_cm_id *iw; + } cm_id; }; struct cma_addr { @@ -266,6 +275,16 @@ IB_QP_PKEY_INDEX | IB_QP_PORT); } +static int cma_init_iw_qp(struct rdma_id_private *id_priv, struct ib_qp *qp) +{ + struct ib_qp_attr qp_attr; + + qp_attr.qp_state = IB_QPS_INIT; + qp_attr.qp_access_flags = IB_ACCESS_LOCAL_WRITE; + + return ib_modify_qp(qp, &qp_attr, IB_QP_STATE | IB_QP_ACCESS_FLAGS); +} + int rdma_create_qp(struct rdma_cm_id *id, struct ib_pd *pd, struct ib_qp_init_attr *qp_init_attr) { @@ -285,6 +304,9 @@ case IB_NODE_CA: ret = cma_init_ib_qp(id_priv, qp); break; + case IB_NODE_RNIC: + ret = cma_init_iw_qp(id_priv, qp); + break; default: ret = -ENOSYS; break; @@ -314,7 +336,7 @@ /* Need to update QP attributes from default values. 
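 * The QP is stepped through INIT and RTR here, with the attribute mask
 * for each state supplied by ib_cm_init_qp_attr(); a companion helper
 * moves it on to RTS the same way.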
*/ qp_attr.qp_state = IB_QPS_INIT; - ret = ib_cm_init_qp_attr(id_priv->cm_id, &qp_attr, &qp_attr_mask); + ret = ib_cm_init_qp_attr(id_priv->cm_id.ib, &qp_attr, &qp_attr_mask); if (ret) return ret; @@ -323,7 +345,7 @@ return ret; qp_attr.qp_state = IB_QPS_RTR; - ret = ib_cm_init_qp_attr(id_priv->cm_id, &qp_attr, &qp_attr_mask); + ret = ib_cm_init_qp_attr(id_priv->cm_id.ib, &qp_attr, &qp_attr_mask); if (ret) return ret; @@ -337,7 +359,7 @@ int qp_attr_mask, ret; qp_attr.qp_state = IB_QPS_RTS; - ret = ib_cm_init_qp_attr(id_priv->cm_id, &qp_attr, &qp_attr_mask); + ret = ib_cm_init_qp_attr(id_priv->cm_id.ib, &qp_attr, &qp_attr_mask); if (ret) return ret; @@ -419,8 +441,8 @@ { cma_exch(id_priv, CMA_DESTROYING); - if (id_priv->cm_id && !IS_ERR(id_priv->cm_id)) - ib_destroy_cm_id(id_priv->cm_id); + if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib)) + ib_destroy_cm_id(id_priv->cm_id.ib); list_del(&id_priv->listen_list); if (id_priv->cma_dev) @@ -476,8 +498,22 @@ state = cma_exch(id_priv, CMA_DESTROYING); cma_cancel_operation(id_priv, state); - if (id_priv->cm_id && !IS_ERR(id_priv->cm_id)) - ib_destroy_cm_id(id_priv->cm_id); + if (id->device) { + switch (id->device->node_type) { + case IB_NODE_RNIC: + if (id_priv->cm_id.iw && !IS_ERR(id_priv->cm_id.iw)) { + iw_destroy_cm_id(id_priv->cm_id.iw); + id_priv->cm_id.iw = 0; + } + break; + default: + if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib)) { + ib_destroy_cm_id(id_priv->cm_id.ib); + id_priv->cm_id.ib = 0; + } + break; + } + } if (id_priv->cma_dev) { down(&mutex); @@ -505,14 +541,14 @@ if (ret) goto reject; - ret = ib_send_cm_rtu(id_priv->cm_id, NULL, 0); + ret = ib_send_cm_rtu(id_priv->cm_id.ib, NULL, 0); if (ret) goto reject; return 0; reject: cma_modify_qp_err(&id_priv->id); - ib_send_cm_rej(id_priv->cm_id, IB_CM_REJ_CONSUMER_DEFINED, + ib_send_cm_rej(id_priv->cm_id.ib, IB_CM_REJ_CONSUMER_DEFINED, NULL, 0, NULL, 0); return ret; } @@ -528,7 +564,7 @@ return 0; reject: cma_modify_qp_err(&id_priv->id); - ib_send_cm_rej(id_priv->cm_id, IB_CM_REJ_CONSUMER_DEFINED, + ib_send_cm_rej(id_priv->cm_id.ib, IB_CM_REJ_CONSUMER_DEFINED, NULL, 0, NULL, 0); return ret; } @@ -586,7 +622,7 @@ private_data_len); if (ret) { /* Destroy the CM ID by returning a non-zero value. */ - id_priv->cm_id = NULL; + id_priv->cm_id.ib = NULL; cma_exch(id_priv, CMA_DESTROYING); cma_release_remove(id_priv); rdma_destroy_id(&id_priv->id); @@ -675,7 +711,7 @@ goto out; } - conn_id->cm_id = cm_id; + conn_id->cm_id.ib = cm_id; cm_id->context = conn_id; cm_id->cm_handler = cma_ib_handler; @@ -685,7 +721,7 @@ IB_CM_REQ_PRIVATE_DATA_SIZE - offset); if (ret) { /* Destroy the CM ID by returning a non-zero value. 
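Returning non-zero tells the IB CM to free this cm_id itself,
 * so conn_id->cm_id is cleared below to keep rdma_destroy_id() from
 * destroying it a second time.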
*/ - conn_id->cm_id = NULL; + conn_id->cm_id.ib = NULL; cma_exch(conn_id, CMA_DESTROYING); cma_release_remove(conn_id); rdma_destroy_id(&conn_id->id); @@ -695,6 +731,112 @@ return ret; } +static int cma_iw_handler(struct iw_cm_id* iw_id, struct iw_cm_event* event) +{ + struct rdma_id_private *id_priv = iw_id->context; + enum rdma_cm_event_type event_type = 0; + int ret = 0; + + atomic_inc(&id_priv->dev_remove); + + switch (event->event) { + case IW_CM_EVENT_LLP_DISCONNECT: + case IW_CM_EVENT_LLP_RESET: + case IW_CM_EVENT_LLP_TIMEOUT: + case IW_CM_EVENT_CLOSE: + event_type = RDMA_CM_EVENT_DISCONNECTED; + break; + + case IW_CM_EVENT_CONNECT_REQUEST: + BUG_ON(1); + break; + + case IW_CM_EVENT_CONNECT_REPLY: { + if (event->status) + event_type = RDMA_CM_EVENT_REJECTED; + else + event_type = RDMA_CM_EVENT_ESTABLISHED; + break; + } + + case IW_CM_EVENT_ESTABLISHED: + event_type = RDMA_CM_EVENT_ESTABLISHED; + break; + } + + ret = cma_notify_user(id_priv, + event_type, + event->status, + event->private_data, + event->private_data_len); + if (ret) { + /* Destroy the CM ID by returning a non-zero value. */ + id_priv->cm_id.iw = NULL; + cma_exch(id_priv, CMA_DESTROYING); + cma_release_remove(id_priv); + rdma_destroy_id(&id_priv->id); + return ret; + } + + cma_release_remove(id_priv); + return ret; +} + +static int iw_conn_req_handler(struct iw_cm_id *cm_id, + struct iw_cm_event *iw_event) +{ + struct rdma_cm_id* new_cm_id; + struct rdma_id_private *listen_id, *conn_id; + struct sockaddr_in* sin; + int ret; + + listen_id = cm_id->context; + atomic_inc(&listen_id->dev_remove); + if (!cma_comp(listen_id, CMA_LISTEN)) { + ret = -ECONNABORTED; + goto out; + } + + /* Create a new RDMA id for the new IW CM ID */ + new_cm_id = rdma_create_id(listen_id->id.event_handler, + listen_id->id.context); + if (!new_cm_id) { + ret = -ENOMEM; + goto out; + } + conn_id = container_of(new_cm_id, struct rdma_id_private, id); + atomic_inc(&conn_id->dev_remove); + conn_id->state = CMA_CONNECT; + + /* New connection inherits device from parent */ + cma_attach_to_dev(conn_id, listen_id->cma_dev); + + conn_id->cm_id.iw = cm_id; + cm_id->context = conn_id; + cm_id->cm_handler = cma_iw_handler; + + sin = (struct sockaddr_in*)&new_cm_id->route.addr.src_addr; + *sin = iw_event->local_addr; + + sin = (struct sockaddr_in*)&new_cm_id->route.addr.dst_addr; + *sin = iw_event->remote_addr; + + ret = cma_notify_user(conn_id, RDMA_CM_EVENT_CONNECT_REQUEST, 0, + iw_event->private_data, + iw_event->private_data_len); + if (ret) { + /* Destroy the CM ID by returning a non-zero value. 
*/ + conn_id->cm_id.iw = NULL; + cma_exch(conn_id, CMA_DESTROYING); + cma_release_remove(conn_id); + rdma_destroy_id(&conn_id->id); + } + +out: + cma_release_remove(listen_id); + return ret; +} + static __be64 cma_get_service_id(struct sockaddr *addr) { return cpu_to_be64(((u64)IB_OPENIB_OUI << 48) + @@ -706,21 +848,44 @@ __be64 svc_id; int ret; - id_priv->cm_id = ib_create_cm_id(id_priv->id.device, cma_req_handler, + id_priv->cm_id.ib = ib_create_cm_id(id_priv->id.device, cma_req_handler, id_priv); - if (IS_ERR(id_priv->cm_id)) - return PTR_ERR(id_priv->cm_id); + if (IS_ERR(id_priv->cm_id.ib)) + return PTR_ERR(id_priv->cm_id.ib); svc_id = cma_get_service_id(&id_priv->id.route.addr.src_addr); - ret = ib_cm_listen(id_priv->cm_id, svc_id, 0); + ret = ib_cm_listen(id_priv->cm_id.ib, svc_id, 0); if (ret) { - ib_destroy_cm_id(id_priv->cm_id); - id_priv->cm_id = NULL; + ib_destroy_cm_id(id_priv->cm_id.ib); + id_priv->cm_id.ib = NULL; } return ret; } +static int cma_iw_listen(struct rdma_id_private *id_priv) +{ + int ret; + struct sockaddr_in* sin; + + id_priv->cm_id.iw = iw_create_cm_id(id_priv->id.device, + iw_conn_req_handler, + id_priv); + if (IS_ERR(id_priv->cm_id.iw)) + return PTR_ERR(id_priv->cm_id.iw); + + sin = (struct sockaddr_in*)&id_priv->id.route.addr.src_addr; + id_priv->cm_id.iw->local_addr = *sin; + + ret = iw_cm_listen(id_priv->cm_id.iw, 10 /* backlog */); + if (ret) { + iw_destroy_cm_id(id_priv->cm_id.iw); + id_priv->cm_id.iw = NULL; + } + + return ret; +} + static int cma_duplicate_listen(struct rdma_id_private *id_priv) { struct rdma_id_private *cur_id_priv; @@ -785,8 +950,9 @@ goto out; list_add_tail(&id_priv->list, &listen_any_list); - list_for_each_entry(cma_dev, &dev_list, list) + list_for_each_entry(cma_dev, &dev_list, list) { cma_listen_on_dev(id_priv, cma_dev); + } out: up(&mutex); return ret; @@ -796,7 +962,6 @@ { struct rdma_id_private *id_priv; int ret; - id_priv = container_of(id, struct rdma_id_private, id); if (!cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_LISTEN)) return -EINVAL; @@ -806,6 +971,9 @@ case IB_NODE_CA: ret = cma_ib_listen(id_priv); break; + case IB_NODE_RNIC: + ret = cma_iw_listen(id_priv); + break; default: ret = -ENOSYS; break; @@ -890,6 +1058,30 @@ return (id_priv->query_id < 0) ? 
id_priv->query_id : 0; } +static int cma_resolve_iw_route(struct rdma_id_private *id_priv, int timeout_ms) +{ + enum rdma_cm_event_type event = RDMA_CM_EVENT_ROUTE_RESOLVED; + int rc; + + atomic_inc(&id_priv->dev_remove); + + if (!cma_comp_exch(id_priv, CMA_ROUTE_QUERY, CMA_ROUTE_RESOLVED)) + BUG_ON(1); + + rc = cma_notify_user(id_priv, event, 0, NULL, 0); + if (rc) { + cma_exch(id_priv, CMA_DESTROYING); + cma_release_remove(id_priv); + cma_deref_id(id_priv); + rdma_destroy_id(&id_priv->id); + return rc; + } + + cma_release_remove(id_priv); + cma_deref_id(id_priv); + return rc; +} + int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms) { struct rdma_id_private *id_priv; @@ -904,6 +1096,9 @@ case IB_NODE_CA: ret = cma_resolve_ib_route(id_priv, timeout_ms); break; + case IB_NODE_RNIC: + ret = cma_resolve_iw_route(id_priv, timeout_ms); + break; default: ret = -ENOSYS; break; @@ -952,20 +1147,133 @@ cma_deref_id(id_priv); } + +/* Find the local interface with a route to the specified address and + * bind the CM ID to this interface's CMA device + */ +static int cma_acquire_iw_dev(struct rdma_cm_id* id, struct sockaddr* addr) +{ + int ret = -ENOENT; + struct cma_device* cma_dev; + struct rdma_id_private *id_priv; + struct sockaddr_in* sin; + struct rtable *rt = 0; + struct flowi fl; + struct net_device* netdev; + struct in_addr src_ip; + unsigned char* dev_addr; + + sin = (struct sockaddr_in*)addr; + if (sin->sin_family != AF_INET) + return -EINVAL; + + id_priv = container_of(id, struct rdma_id_private, id); + + /* If the address is local, use the device. If it is remote, + * look up a route to get the local address + */ + netdev = ip_dev_find(sin->sin_addr.s_addr); + if (netdev) { + src_ip = sin->sin_addr; + dev_addr = netdev->dev_addr; + dev_put(netdev); + } else { + memset(&fl, 0, sizeof(fl)); + fl.nl_u.ip4_u.daddr = sin->sin_addr.s_addr; + if (ip_route_output_key(&rt, &fl)) { + return -ENETUNREACH; + } + dev_addr = rt->idev->dev->dev_addr; + src_ip.s_addr = rt->rt_src; + + ip_rt_put(rt); + } + + down(&mutex); + + list_for_each_entry(cma_dev, &dev_list, list) { + if (memcmp(dev_addr, + &cma_dev->node_guid, + sizeof(cma_dev->node_guid)) == 0) { + /* If we find the device, then check if this + * is an iWARP device. If it is, then call the + * callback handler immediately because we + * already have the native address + */ + if (cma_dev->device->node_type == IB_NODE_RNIC) { + struct sockaddr_in* cm_sin; + /* Set our source address */ + cm_sin = (struct sockaddr_in*) + &id_priv->id.route.addr.src_addr; + cm_sin->sin_family = AF_INET; + cm_sin->sin_addr.s_addr = src_ip.s_addr; + + /* Claim the device in the mutex */ + cma_attach_to_dev(id_priv, cma_dev); + ret = 0; + break; + } + } + } + up(&mutex); + + return ret; +} + + +/** + * rdma_resolve_addr - RDMA Resolve Address + * + * @id: RDMA identifier. + * @src_addr: Source IP address + * @dst_addr: Destination IP address + * @timeout_ms: Timeout to wait for address resolution + * + * Bind the specified cm_id to a local interface and, if this is an IB + * CA, determine the GIDs associated with the specified IP addresses. 
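+ *
+ * A minimal, hypothetical caller (illustration only; dst_addr and the
+ * 2000 ms timeout are assumed values):
+ *
+ *	ret = rdma_resolve_addr(id, NULL, dst_addr, 2000);
+ *
+ * On success, wait for RDMA_CM_EVENT_ADDR_RESOLVED in the event
+ * handler, then call rdma_resolve_route(id, timeout_ms).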
+ */ int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr, struct sockaddr *dst_addr, int timeout_ms) { struct rdma_id_private *id_priv; - int ret; + int ret = 0; id_priv = container_of(id, struct rdma_id_private, id); if (!cma_comp_exch(id_priv, CMA_IDLE, CMA_ADDR_QUERY)) return -EINVAL; atomic_inc(&id_priv->refcount); + id->route.addr.dst_addr = *dst_addr; - ret = ib_resolve_addr(src_addr, dst_addr, &id->route.addr.addr.ibaddr, - timeout_ms, addr_handler, id_priv); + + if (cma_acquire_iw_dev(id, dst_addr)==0) { + + enum rdma_cm_event_type event; + + cma_exch(id_priv, CMA_ADDR_RESOLVED); + + atomic_inc(&id_priv->dev_remove); + + event = RDMA_CM_EVENT_ADDR_RESOLVED; + if (cma_notify_user(id_priv, event, 0, NULL, 0)) { + cma_exch(id_priv, CMA_DESTROYING); + cma_deref_id(id_priv); + cma_release_remove(id_priv); + rdma_destroy_id(&id_priv->id); + return -EINVAL; + } + + cma_release_remove(id_priv); + cma_deref_id(id_priv); + + } else { + + ret = ib_resolve_addr(src_addr, + dst_addr, &id->route.addr.addr.ibaddr, + timeout_ms, addr_handler, id_priv); + + } + if (ret) goto err; @@ -980,10 +1288,13 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr) { struct rdma_id_private *id_priv; + struct sockaddr_in* sin; struct ib_addr *ibaddr = &id->route.addr.addr.ibaddr; int ret; - if (addr->sa_family != AF_INET) + sin = (struct sockaddr_in*)addr; + + if (sin->sin_family != AF_INET) return -EINVAL; id_priv = container_of(id, struct rdma_id_private, id); @@ -994,9 +1305,11 @@ id->route.addr.src_addr = *addr; ret = 0; } else { - ret = ib_translate_addr(addr, &ibaddr->sgid, &ibaddr->pkey); - if (!ret) - ret = cma_acquire_ib_dev(id_priv, &ibaddr->sgid); + if ((ret = cma_acquire_iw_dev(id, addr))) { + ret = ib_translate_addr(addr, &ibaddr->sgid, &ibaddr->pkey); + if (!ret) + ret = cma_acquire_ib_dev(id_priv, &ibaddr->sgid); + } } if (ret) @@ -1041,10 +1354,10 @@ if (!private_data) return -ENOMEM; - id_priv->cm_id = ib_create_cm_id(id_priv->id.device, cma_ib_handler, + id_priv->cm_id.ib = ib_create_cm_id(id_priv->id.device, cma_ib_handler, id_priv); - if (IS_ERR(id_priv->cm_id)) { - ret = PTR_ERR(id_priv->cm_id); + if (IS_ERR(id_priv->cm_id.ib)) { + ret = PTR_ERR(id_priv->cm_id.ib); goto out; } @@ -1075,25 +1388,61 @@ req.max_cm_retries = CMA_MAX_CM_RETRIES; req.srq = id_priv->id.qp->srq ? 
1 : 0; - ret = ib_send_cm_req(id_priv->cm_id, &req); + ret = ib_send_cm_req(id_priv->cm_id.ib, &req); out: kfree(private_data); return ret; } +static int cma_connect_iw(struct rdma_id_private *id_priv, + struct rdma_conn_param *conn_param) +{ + struct iw_cm_id* cm_id; + struct sockaddr_in* sin; + int ret; + + if (id_priv->id.qp == NULL) + return -EINVAL; + + cm_id = iw_create_cm_id(id_priv->id.device, cma_iw_handler, id_priv); + if (IS_ERR(cm_id)) { + ret = PTR_ERR(cm_id); + goto out; + } + + id_priv->cm_id.iw = cm_id; + + sin = (struct sockaddr_in*)&id_priv->id.route.addr.src_addr; + cm_id->local_addr = *sin; + + sin = (struct sockaddr_in*)&id_priv->id.route.addr.dst_addr; + cm_id->remote_addr = *sin; + + iw_cm_bind_qp(cm_id, id_priv->id.qp); + + ret = iw_cm_connect(cm_id, conn_param->private_data, + conn_param->private_data_len); + +out: + return ret; +} + int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) { struct rdma_id_private *id_priv; int ret; id_priv = container_of(id, struct rdma_id_private, id); - if (!cma_comp_exch(id_priv, CMA_ROUTE_RESOLVED, CMA_CONNECT)) + if (!cma_comp_exch(id_priv, CMA_ROUTE_RESOLVED, CMA_CONNECT)) return -EINVAL; switch (id->device->node_type) { case IB_NODE_CA: ret = cma_connect_ib(id_priv, conn_param); break; + case IB_NODE_RNIC: + ret = cma_connect_iw(id_priv, conn_param); + break; default: ret = -ENOSYS; break; @@ -1131,7 +1480,7 @@ rep.rnr_retry_count = conn_param->rnr_retry_count; rep.srq = id_priv->id.qp->srq ? 1 : 0; - return ib_send_cm_rep(id_priv->cm_id, &rep); + return ib_send_cm_rep(id_priv->cm_id.ib, &rep); } int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) @@ -1147,6 +1496,12 @@ case IB_NODE_CA: ret = cma_accept_ib(id_priv, conn_param); break; + case IB_NODE_RNIC: { + iw_cm_bind_qp(id_priv->cm_id.iw, id_priv->id.qp); + ret = iw_cm_accept(id_priv->cm_id.iw, conn_param->private_data, + conn_param->private_data_len); + break; + } default: ret = -ENOSYS; break; @@ -1175,9 +1530,15 @@ switch (id->device->node_type) { case IB_NODE_CA: - ret = ib_send_cm_rej(id_priv->cm_id, IB_CM_REJ_CONSUMER_DEFINED, + ret = ib_send_cm_rej(id_priv->cm_id.ib, IB_CM_REJ_CONSUMER_DEFINED, NULL, 0, private_data, private_data_len); break; + + case IB_NODE_RNIC: + ret = iw_cm_reject(id_priv->cm_id.iw, + private_data, private_data_len); + break; + default: ret = -ENOSYS; break; @@ -1190,7 +1551,6 @@ { struct rdma_id_private *id_priv; int ret; - id_priv = container_of(id, struct rdma_id_private, id); if (!cma_comp(id_priv, CMA_CONNECT)) return -EINVAL; @@ -1202,9 +1562,12 @@ switch (id->device->node_type) { case IB_NODE_CA: /* Initiate or respond to a disconnect. 
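If the DREQ cannot be sent, the peer has most
	 * likely started the teardown already, so a DREP is sent in
	 * response instead.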
*/ - if (ib_send_cm_dreq(id_priv->cm_id, NULL, 0)) - ib_send_cm_drep(id_priv->cm_id, NULL, 0); + if (ib_send_cm_dreq(id_priv->cm_id.ib, NULL, 0)) + ib_send_cm_drep(id_priv->cm_id.ib, NULL, 0); break; + case IB_NODE_RNIC: + ret = iw_cm_disconnect(id_priv->cm_id.iw); + break; default: break; } Index: Makefile =================================================================== --- Makefile (revision 4186) +++ Makefile (working copy) @@ -7,3 +7,5 @@ obj-$(CONFIG_INFINIBAND_SRP) += ulp/srp/ obj-$(CONFIG_KDAPL) += ulp/kdapl/ obj-$(CONFIG_INFINIBAND_ISER) += ulp/iser/ +obj-$(CONFIG_KRPING) += krping/ +obj-$(CONFIG_RDMA_CMATOSE) += cmatose/ Index: hw/amso1100/c2.c =================================================================== --- hw/amso1100/c2.c (revision 4482) +++ hw/amso1100/c2.c (working copy) @@ -933,7 +933,7 @@ spin_lock_init(&c2_port->tx_lock); /* Copy our 48-bit ethernet hardware address */ - memcpy_fromio(netdev->dev_addr, mmio_addr + C2_REGS_ENADDR, 6); + memcpy_fromio(netdev->dev_addr, mmio_addr + C2_REGS_RDMA_ENADDR, 6); /* Validate the MAC address */ if(!is_valid_ether_addr(netdev->dev_addr)) { Index: hw/amso1100/c2_qp.c =================================================================== --- hw/amso1100/c2_qp.c (revision 4482) +++ hw/amso1100/c2_qp.c (working copy) @@ -184,7 +184,7 @@ struct c2_vq_req *vq_req; ccwr_qp_destroy_req_t wr; ccwr_qp_destroy_rep_t *reply; - int err; + int err; /* * Allocate a verb request message @@ -343,8 +343,6 @@ qp->send_sgl_depth = qp_attrs->cap.max_send_sge; qp->rdma_write_sgl_depth = qp_attrs->cap.max_send_sge; qp->recv_sgl_depth = qp_attrs->cap.max_recv_sge; - qp->event_handler = NULL; - qp->context = NULL; /* Initialize the SQ MQ */ q_size = be32_to_cpu(reply->sq_depth); Index: hw/amso1100/c2.h =================================================================== --- hw/amso1100/c2.h (revision 4482) +++ hw/amso1100/c2.h (working copy) @@ -113,6 +113,7 @@ C2_REGS_Q2_MSGSIZE = 0x0038, C2_REGS_Q2_SHARED = 0x0040, C2_REGS_ENADDR = 0x004C, + C2_REGS_RDMA_ENADDR = 0x0054, C2_REGS_HRX_CUR = 0x006C, }; @@ -592,16 +593,11 @@ extern int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify); /* CM */ -extern int c2_qp_connect(struct c2_dev *c2dev, struct c2_qp *qp, u32 remote_addr, - u16 remote_port, u32 pdata_len, u8 *pdata); -extern int c2_cr_query(struct c2_dev *c2dev, u32 cr_id, - struct c2_cr_query_attrs *cr_attrs); -extern int c2_cr_accept(struct c2_dev *c2dev, u32 cr_id, struct c2_qp *qp, - u32 pdata_len, u8 *pdata); -extern int c2_cr_reject(struct c2_dev *c2dev, u32 cr_id); -extern int c2_ep_listen_create(struct c2_dev *c2dev, u32 addr, u16 port, - u32 backlog, struct c2_ep *ep); -extern int c2_ep_listen_destroy(struct c2_dev *c2dev, struct c2_ep *ep); +extern int c2_llp_connect(struct iw_cm_id* cm_id, const void* pdata, u8 pdata_len); +extern int c2_llp_accept(struct iw_cm_id* cm_id, const void* pdata, u8 pdata_len); +extern int c2_llp_reject(struct iw_cm_id* cm_id, const void* pdata, u8 pdata_len); +extern int c2_llp_service_create(struct iw_cm_id* cm_id, int backlog); +extern int c2_llp_service_destroy(struct iw_cm_id* cm_id); /* MM */ extern int c2_nsmr_register_phys_kern(struct c2_dev *c2dev, u64 **addr_list, Index: hw/amso1100/c2_pd.c =================================================================== --- hw/amso1100/c2_pd.c (revision 4482) +++ hw/amso1100/c2_pd.c (working copy) @@ -44,6 +44,8 @@ { int err = 0; + printk(KERN_ERR "%s:%d\n", __FUNCTION__, __LINE__); + might_sleep(); atomic_set(&pd->sqp_count, 0); Index: hw/amso1100/c2_ae.c 
=================================================================== --- hw/amso1100/c2_ae.c (revision 4482) +++ hw/amso1100/c2_ae.c (working copy) @@ -35,51 +35,37 @@ #include "cc_status.h" #include "cc_ae.h" -enum conn_result -c2_convert_cm_status(u32 cc_status) +static int c2_convert_cm_status(u32 cc_status) { switch (cc_status) { - case CC_CONN_STATUS_SUCCESS: return IW_CONN_ACCEPT; - case CC_CONN_STATUS_REJECTED: return IW_CONN_RESET; - case CC_CONN_STATUS_REFUSED: return IW_CONN_PEER_REJECT; - case CC_CONN_STATUS_TIMEDOUT: return IW_CONN_TIMEDOUT; - case CC_CONN_STATUS_NETUNREACH: return IW_CONN_NO_ROUTE_TO_HOST; - case CC_CONN_STATUS_HOSTUNREACH: return IW_CONN_NO_ROUTE_TO_HOST; - case CC_CONN_STATUS_INVALID_RNIC: return IW_CONN_INVALID_PARM; - case CC_CONN_STATUS_INVALID_QP: return IW_CONN_INVALID_PARM; - case CC_CONN_STATUS_INVALID_QP_STATE: return IW_CONN_INVALID_PARM; + case CC_CONN_STATUS_SUCCESS: + return 0; + case CC_CONN_STATUS_REJECTED: + return -ENETRESET; + case CC_CONN_STATUS_REFUSED: + return -ECONNREFUSED; + case CC_CONN_STATUS_TIMEDOUT: + return -ETIMEDOUT; + case CC_CONN_STATUS_NETUNREACH: + return -ENETUNREACH; + case CC_CONN_STATUS_HOSTUNREACH: + return -EHOSTUNREACH; + case CC_CONN_STATUS_INVALID_RNIC: + return -EINVAL; + case CC_CONN_STATUS_INVALID_QP: + return -EINVAL; + case CC_CONN_STATUS_INVALID_QP_STATE: + return -EINVAL; default: panic("Unable to convert CM status: %d\n", cc_status); break; } } -static int -is_cm_event(cc_event_id_t id) -{ - int is_cm; - - switch (id) { - case CCAE_ACTIVE_CONNECT_RESULTS: - case CCAE_BAD_CLOSE: - case CCAE_LLP_CLOSE_COMPLETE: - case CCAE_LLP_CONNECTION_RESET: - case CCAE_LLP_CONNECTION_LOST: - is_cm = 1; - break; - case CCAE_TERMINATE_MESSAGE_RECEIVED: - case CCAE_CQ_SQ_COMPLETION_OVERFLOW: - default: - is_cm = 0; - break; - } - - return is_cm; -} void c2_ae_event(struct c2_dev *c2dev, u32 mq_index) { - ccwr_t *wr; struct c2_mq *mq = c2dev->qptr_array[mq_index]; + ccwr_t *wr; void *resource_user_context; struct iw_cm_event cm_event; struct ib_event ib_event; @@ -94,6 +80,7 @@ if (!wr) return; + memset(&cm_event, 0, sizeof(cm_event)); event_id = c2_wr_get_id(wr); resource_indicator = be32_to_cpu(wr->ae.ae_generic.resource_type); resource_user_context = (void *)(unsigned long)wr->ae.ae_generic.user_context; @@ -102,117 +89,126 @@ case CC_RES_IND_QP: { struct c2_qp *qp = (struct c2_qp *)resource_user_context; + cm_event.status = c2_convert_cm_status(c2_wr_get_result(wr)); - if (is_cm_event(event_id)) { - - cm_event.device = &c2dev->ibdev; - if (event_id == CCAE_ACTIVE_CONNECT_RESULTS) { - cm_event.event = IW_EVENT_ACTIVE_CONNECT_RESULTS; - cm_event.element.active_results.result = - c2_convert_cm_status(c2_wr_get_result(wr)); - cm_event.element.active_results.conn_attr.local_addr.s_addr = - wr->ae.ae_active_connect_results.laddr; - cm_event.element.active_results.conn_attr.remote_addr.s_addr = - wr->ae.ae_active_connect_results.raddr; - cm_event.element.active_results.conn_attr.local_port = - wr->ae.ae_active_connect_results.lport; - cm_event.element.active_results.conn_attr.remote_port = - wr->ae.ae_active_connect_results.rport; - cm_event.element.active_results.private_data_len = + switch (event_id) { + case CCAE_ACTIVE_CONNECT_RESULTS: + cm_event.event = IW_CM_EVENT_CONNECT_REPLY; + cm_event.local_addr.sin_addr.s_addr = + wr->ae.ae_active_connect_results.laddr; + cm_event.remote_addr.sin_addr.s_addr = + wr->ae.ae_active_connect_results.raddr; + cm_event.local_addr.sin_port = + wr->ae.ae_active_connect_results.lport; + 
cm_event.remote_addr.sin_port = + wr->ae.ae_active_connect_results.rport; + cm_event.private_data_len = be32_to_cpu(wr->ae.ae_active_connect_results.private_data_length); + if (cm_event.private_data_len) { /* XXX */ - pdata = kmalloc(cm_event.element.active_results.private_data_len, - GFP_ATOMIC); - if (!pdata) - break; + pdata = kmalloc(cm_event.private_data_len, GFP_ATOMIC); + if (!pdata) { + /* Ignore the request, maybe the remote peer + * will retry */ + dprintk("Ignored connect request -- no memory for pdata" + "private_data_len=%d\n", cm_event.private_data_len); + goto ignore_it; + } memcpy(pdata, wr->ae.ae_active_connect_results.private_data, - cm_event.element.active_results.private_data_len); - cm_event.element.active_results.private_data = pdata; + cm_event.private_data_len); - } else { - cm_event.event = IW_EVENT_DISCONNECT; + cm_event.private_data = pdata; } + if (qp->cm_id->event_handler) + qp->cm_id->event_handler(qp->cm_id, &cm_event); - if (qp->event_handler) - (*qp->event_handler)(&cm_event, qp->context); + break; - if (pdata) - kfree(pdata); - } else { - + case CCAE_TERMINATE_MESSAGE_RECEIVED: + case CCAE_CQ_SQ_COMPLETION_OVERFLOW: ib_event.device = &c2dev->ibdev; ib_event.element.qp = &qp->ibqp; - /* XXX */ ib_event.event = IB_EVENT_QP_REQ_ERR; if(qp->ibqp.event_handler) - (*qp->ibqp.event_handler)(&ib_event, qp->context); - } + (*qp->ibqp.event_handler)(&ib_event, + qp->ibqp.qp_context); + case CCAE_BAD_CLOSE: + case CCAE_LLP_CLOSE_COMPLETE: + case CCAE_LLP_CONNECTION_RESET: + case CCAE_LLP_CONNECTION_LOST: + default: + cm_event.event = IW_CM_EVENT_CLOSE; + if (qp->cm_id->event_handler) + qp->cm_id->event_handler(qp->cm_id, &cm_event); + } break; } + case CC_RES_IND_EP: { - struct c2_ep *ep = (struct c2_ep *)resource_user_context; + struct iw_cm_id* cm_id = (struct iw_cm_id*)resource_user_context; + dprintk("CC_RES_IND_EP event_id=%d\n", event_id); if (event_id != CCAE_CONNECTION_REQUEST) { dprintk("%s: Invalid event_id: %d\n", __FUNCTION__, event_id); break; } - cm_event.device = &c2dev->ibdev; - cm_event.event = IW_EVENT_CONNECT_REQUEST; - cm_event.element.conn_request.cr_id = + cm_event.event = IW_CM_EVENT_CONNECT_REQUEST; + cm_event.provider_id = wr->ae.ae_connection_request.cr_handle; - cm_event.element.conn_request.conn_attr.local_addr.s_addr = + cm_event.local_addr.sin_addr.s_addr = wr->ae.ae_connection_request.laddr; - cm_event.element.conn_request.conn_attr.remote_addr.s_addr = + cm_event.remote_addr.sin_addr.s_addr = wr->ae.ae_connection_request.raddr; - cm_event.element.conn_request.conn_attr.local_port = + cm_event.local_addr.sin_port = wr->ae.ae_connection_request.lport; - cm_event.element.conn_request.conn_attr.remote_port = + cm_event.remote_addr.sin_port = wr->ae.ae_connection_request.rport; - cm_event.element.conn_request.private_data_len = + cm_event.private_data_len = be32_to_cpu(wr->ae.ae_connection_request.private_data_length); - /* XXX */ - pdata = kmalloc(cm_event.element.conn_request.private_data_len, - GFP_ATOMIC); - if (!pdata) - break; + if (cm_event.private_data_len) { + pdata = kmalloc(cm_event.private_data_len, GFP_ATOMIC); + if (!pdata) { + /* Ignore the request, maybe the remote peer + * will retry */ + dprintk("Ignored connect request -- no memory for pdata" + "private_data_len=%d\n", cm_event.private_data_len); + goto ignore_it; + } + memcpy(pdata, + wr->ae.ae_connection_request.private_data, + cm_event.private_data_len); - memcpy(pdata, - wr->ae.ae_connection_request.private_data, - cm_event.element.conn_request.private_data_len); - - 
cm_event.element.conn_request.private_data = pdata; - - if (ep->event_handler) - (*ep->event_handler)(&cm_event, ep->listen_context); - - kfree(pdata); + cm_event.private_data = pdata; + } + if (cm_id->event_handler) + cm_id->event_handler(cm_id, &cm_event); break; } + case CC_RES_IND_CQ: { struct c2_cq *cq = (struct c2_cq *)resource_user_context; + dprintk("IB_EVENT_CQ_ERR\n"); ib_event.device = &c2dev->ibdev; ib_event.element.cq = &cq->ibcq; ib_event.event = IB_EVENT_CQ_ERR; if (cq->ibcq.event_handler) - (*cq->ibcq.event_handler)(&ib_event, cq->ibcq.cq_context); + cq->ibcq.event_handler(&ib_event, cq->ibcq.cq_context); } + default: break; } - - /* - * free the adapter message - */ + + ignore_it: c2_mq_free(mq); } - Index: hw/amso1100/c2_provider.c =================================================================== --- hw/amso1100/c2_provider.c (revision 4482) +++ hw/amso1100/c2_provider.c (working copy) @@ -305,8 +305,6 @@ struct c2_cq *cq; int err; - dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); - cq = kmalloc(sizeof(*cq), GFP_KERNEL); if (!cq) { dprintk("%s: Unable to allocate CQ\n", __FUNCTION__); @@ -315,6 +313,7 @@ err = c2_init_cq(to_c2dev(ibdev), entries, NULL, cq); if (err) { + dprintk("%s: error initializing CQ\n", __FUNCTION__); kfree(cq); return ERR_PTR(err); } @@ -540,156 +539,96 @@ return -ENOSYS; } -static int c2_connect_qp(struct ib_qp *ib_qp, - struct iw_conn_attr *attr, - void (*event_handler)(struct iw_cm_event*, void*), - void *context, - u8 *pdata, - int pdata_len - ) +static int c2_connect(struct iw_cm_id* cm_id, + const void* pdata, u8 pdata_len) { - struct c2_qp *qp = to_c2qp(ib_qp); int err; + struct c2_qp* qp = container_of(cm_id->qp, struct c2_qp, ibqp); dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); - if (!event_handler) + if (cm_id->qp == NULL) return -EINVAL; - /* - * Store the event handler and the - * context in the QP. - */ - qp->event_handler = event_handler; - qp->context = context; + /* Cache the cm_id in the qp */ + qp->cm_id = cm_id; - err = c2_qp_connect(to_c2dev(ib_qp->device), qp, - attr->remote_addr.s_addr, attr->remote_port, - pdata_len, pdata); - if (err) { - qp->event_handler = NULL; - qp->context = NULL; - } + err = c2_llp_connect(cm_id, pdata, pdata_len); return err; } -static int c2_disconnect_qp(struct ib_qp *qp, - int abrupt) +static int c2_disconnect(struct iw_cm_id* cm_id, int abrupt) { struct ib_qp_attr attr; + struct ib_qp *ib_qp = cm_id->qp; int err; dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); + if (ib_qp == 0) + /* If this is a listening endpoint, there is no QP */ + return 0; + memset(&attr, 0, sizeof(struct ib_qp_attr)); if (abrupt) attr.qp_state = IB_QPS_ERR; else attr.qp_state = IB_QPS_SQD; - err = c2_modify_qp(qp, &attr, IB_QP_STATE); + err = c2_modify_qp(ib_qp, &attr, IB_QP_STATE); return err; } -static int c2_accept_cr(struct ib_device *ibdev, - u32 cr_id, - struct ib_qp *ib_qp, - void (*event_handler)(struct iw_cm_event*, void*), - void *context, - u8 *pdata, - int pdata_len) +static int c2_accept(struct iw_cm_id* cm_id, const void *pdata, u8 pdata_len) { - struct c2_qp *qp = to_c2qp(ib_qp); int err; dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); - /* - * Store the event handler and the - * context in the QP. 
- */ - qp->event_handler = event_handler; - qp->context = context; + err = c2_llp_accept(cm_id, pdata, pdata_len); - err = c2_cr_accept(to_c2dev(ibdev), cr_id, qp, - pdata_len, pdata); - return err; } -static int c2_reject_cr(struct ib_device *ibdev, - u32 cr_id, - u8 *pdata, - int pdata_len) +static int c2_reject(struct iw_cm_id* cm_id, const void* pdata, u8 pdata_len) { int err; dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); - err = c2_cr_reject(to_c2dev(ibdev), cr_id); + err = c2_llp_reject(cm_id, pdata, pdata_len); return err; } -static int c2_query_cr(struct ib_device *ibdev, - u32 cr_id, - struct iw_conn_request *req) +static int c2_getpeername(struct iw_cm_id* cm_id, + struct sockaddr_in* local_addr, + struct sockaddr_in* remote_addr ) { - int err; - struct c2_cr_query_attrs cr_attrs; - dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); - err = c2_cr_query(to_c2dev(ibdev), cr_id, &cr_attrs); - if (!err) { - req->cr_id = cr_id; - req->conn_attr.local_addr.s_addr = cr_attrs.local_addr; - req->conn_attr.local_port = cr_attrs.local_port; - req->conn_attr.remote_addr.s_addr = cr_attrs.remote_addr; - req->conn_attr.remote_port = cr_attrs.remote_port; - /* XXX pdata? */ - } - return err; + *local_addr = cm_id->local_addr; + *remote_addr = cm_id->remote_addr; + return 0; } -static int c2_create_listen_ep(struct ib_device *ibdev, - struct iw_listen_ep_attr *ep_attr, - void **ep_handle) +static int c2_service_create(struct iw_cm_id* cm_id, int backlog) { int err; - struct c2_ep *ep; dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); - - ep = kmalloc(sizeof(*ep), GFP_KERNEL); - if (!ep) { - dprintk("%s: Unable to allocate EP\n", __FUNCTION__); - return -ENOMEM; - } - - ep->event_handler = ep_attr->event_handler; - ep->listen_context = ep_attr->listen_context; - - err = c2_ep_listen_create(to_c2dev(ibdev), - ep_attr->addr.s_addr, ep_attr->port, - ep_attr->backlog, ep); - if (err) - kfree(ep); - else - *ep_handle = (void *)ep; - + err = c2_llp_service_create(cm_id, backlog); return err; } -static int c2_destroy_listen_ep(struct ib_device *ibdev, void *ep_handle) +static int c2_service_destroy(struct iw_cm_id* cm_id) { - struct c2_ep *ep = (struct c2_ep *)ep_handle; - + int err; dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); - c2_ep_listen_destroy(to_c2dev(ibdev), ep); - kfree(ep); - return 0; + err = c2_llp_service_destroy(cm_id); + + return err; } int c2_register_device(struct c2_dev *dev) @@ -742,13 +681,13 @@ dev->ibdev.post_recv = c2_post_receive; dev->ibdev.iwcm = kmalloc(sizeof(*dev->ibdev.iwcm), GFP_KERNEL); - dev->ibdev.iwcm->connect_qp = c2_connect_qp; - dev->ibdev.iwcm->disconnect_qp = c2_disconnect_qp; - dev->ibdev.iwcm->accept_cr = c2_accept_cr; - dev->ibdev.iwcm->reject_cr = c2_reject_cr; - dev->ibdev.iwcm->query_cr = c2_query_cr; - dev->ibdev.iwcm->create_listen_ep = c2_create_listen_ep; - dev->ibdev.iwcm->destroy_listen_ep = c2_destroy_listen_ep; + dev->ibdev.iwcm->connect = c2_connect; + dev->ibdev.iwcm->disconnect = c2_disconnect; + dev->ibdev.iwcm->accept = c2_accept; + dev->ibdev.iwcm->reject = c2_reject; + dev->ibdev.iwcm->getpeername = c2_getpeername; + dev->ibdev.iwcm->create_listen = c2_service_create; + dev->ibdev.iwcm->destroy_listen = c2_service_destroy; ret = ib_register_device(&dev->ibdev); if (ret) Index: hw/amso1100/c2_provider.h =================================================================== --- hw/amso1100/c2_provider.h (revision 4482) +++ hw/amso1100/c2_provider.h (working copy) @@ -115,17 +115,15 @@ struct c2_wq { spinlock_t 
lock; }; - +struct iw_cm_id; struct c2_qp { struct ib_qp ibqp; + struct iw_cm_id* cm_id; spinlock_t lock; atomic_t refcount; wait_queue_head_t wait; int qpn; - void (*event_handler)(struct iw_cm_event *, void *); - void *context; - u32 adapter_handle; u32 send_sgl_depth; u32 recv_sgl_depth; @@ -136,15 +134,6 @@ struct c2_mq rq_mq; }; -struct c2_ep { - u32 adapter_handle; - void (*event_handler)(struct iw_cm_event *, void *); - void *listen_context; - u32 addr; - u16 port; - int backlog; -}; - struct c2_cr_query_attrs { u32 local_addr; u32 remote_addr; Index: hw/amso1100/c2_cm.c =================================================================== --- hw/amso1100/c2_cm.c (revision 4482) +++ hw/amso1100/c2_cm.c (working copy) @@ -35,11 +35,10 @@ #include "c2_vq.h" #include -int -c2_qp_connect(struct c2_dev *c2dev, struct c2_qp *qp, - u32 remote_addr, u16 remote_port, - u32 pdata_len, u8 *pdata) +int c2_llp_connect(struct iw_cm_id* cm_id, const void* pdata, u8 pdata_len) { + struct c2_dev *c2dev = to_c2dev(cm_id->device); + struct c2_qp *qp = to_c2qp(cm_id->qp); ccwr_qp_connect_req_t *wr; /* variable size needs a malloc. */ struct c2_vq_req *vq_req; int err; @@ -70,8 +69,8 @@ wr->rnic_handle = c2dev->adapter_handle; wr->qp_handle = qp->adapter_handle; - wr->remote_addr = remote_addr; /* already in Network Byte Order */ - wr->remote_port = remote_port; /* already in Network Byte Order */ + wr->remote_addr = cm_id->remote_addr.sin_addr.s_addr; + wr->remote_port = cm_id->remote_addr.sin_port; /* * Move any private data from the callers's buf into @@ -96,14 +95,18 @@ } int -c2_ep_listen_create(struct c2_dev *c2dev, u32 addr, - u16 port, u32 backlog, struct c2_ep *ep) +c2_llp_service_create(struct iw_cm_id* cm_id, int backlog) { + struct c2_dev *c2dev; ccwr_ep_listen_create_req_t wr; ccwr_ep_listen_create_rep_t *reply; struct c2_vq_req *vq_req; int err; + c2dev = to_c2dev(cm_id->device); + if (c2dev == NULL) + return -EINVAL; + /* * Allocate verbs request. */ @@ -115,15 +118,15 @@ * Build the WR */ c2_wr_set_id(&wr, CCWR_EP_LISTEN_CREATE); - wr.hdr.context = (unsigned long)vq_req; + wr.hdr.context = (u64)(unsigned long)vq_req; wr.rnic_handle = c2dev->adapter_handle; - wr.local_addr = addr; /* already in Net Byte Order */ - wr.local_port = port; /* already in Net Byte Order */ + wr.local_addr = cm_id->local_addr.sin_addr.s_addr; + wr.local_port = cm_id->local_addr.sin_port; wr.backlog = cpu_to_be32(backlog); - wr.user_context = (unsigned long)ep; + wr.user_context = (u64)(unsigned long)cm_id; /* - * reference the request struct. dereferenced in the int handler. + * Reference the request struct. Dereferenced in the int handler. */ vq_req_get(c2dev, vq_req); @@ -160,12 +163,7 @@ /* * get the adapter handle */ - ep->adapter_handle = reply->ep_handle; - if (port != reply->local_port) - { - // XXX - //*p_port = reply->local_port; - } + cm_id->provider_id = reply->ep_handle; /* * free vq stuff @@ -184,13 +182,19 @@ int -c2_ep_listen_destroy(struct c2_dev *c2dev, struct c2_ep *ep) +c2_llp_service_destroy(struct iw_cm_id* cm_id) { + + struct c2_dev *c2dev; ccwr_ep_listen_destroy_req_t wr; ccwr_ep_listen_destroy_rep_t *reply; struct c2_vq_req *vq_req; int err; + c2dev = to_c2dev(cm_id->device); + if (c2dev == NULL) + return -EINVAL; + /* * Allocate verbs request. 
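 * The same verbs-queue request/reply handshake used by
 * c2_llp_service_create() tears the listening endpoint down here.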
*/ @@ -205,7 +209,7 @@ c2_wr_set_id(&wr, CCWR_EP_LISTEN_DESTROY); wr.hdr.context = (unsigned long)vq_req; wr.rnic_handle = c2dev->adapter_handle; - wr.ep_handle = ep->adapter_handle; + wr.ep_handle = cm_id->provider_id; /* * reference the request struct. dereferenced in the int handler. @@ -250,87 +254,20 @@ int -c2_cr_query(struct c2_dev *c2dev, u32 cr_id, - struct c2_cr_query_attrs *cr_attrs) +c2_llp_accept(struct iw_cm_id* cm_id, const void* pdata, u8 pdata_len) { - ccwr_ep_query_req_t wr; - ccwr_ep_query_rep_t *reply; - struct c2_vq_req *vq_req; - int err; + struct c2_dev *c2dev = to_c2dev(cm_id->device); + struct c2_qp *qp = to_c2qp(cm_id->qp); + ccwr_cr_accept_req_t *wr; /* variable length WR */ + struct c2_vq_req *vq_req; + ccwr_cr_accept_rep_t *reply; /* VQ Reply msg ptr. */ + int err; - /* - * Create and send a WR_EP_CREATE... - */ - vq_req = vq_req_alloc(c2dev); - if (!vq_req) { - return -ENOMEM; - } + /* Make sure there's a bound QP */ + if (qp == 0) + return -EINVAL; - /* - * Build the WR - */ - c2_wr_set_id(&wr, CCWR_EP_QUERY); - wr.hdr.context = (unsigned long)vq_req; - wr.rnic_handle = c2dev->adapter_handle; - wr.ep_handle = cr_id; - /* - * reference the request struct. dereferenced in the int handler. - */ - vq_req_get(c2dev, vq_req); - - /* - * Send WR to adapter - */ - err = vq_send_wr(c2dev, (ccwr_t*)&wr); - if (err) { - vq_req_put(c2dev, vq_req); - goto bail0; - } - - /* - * Wait for reply from adapter - */ - err = vq_wait_for_reply(c2dev, vq_req); - if (err) { - goto bail0; - } - - /* - * Process reply - */ - reply = (ccwr_ep_query_rep_t*)(unsigned long)vq_req->reply_msg; - if (!reply) { - err = -ENOMEM; - goto bail0; - } - if ( (err = c2_errno(reply)) != 0) { - goto bail1; - } - - cr_attrs->local_addr = reply->local_addr; - cr_attrs->local_port = reply->local_port; - cr_attrs->remote_addr = reply->remote_addr; - cr_attrs->remote_port = reply->remote_port; - -bail1: - vq_repbuf_free(c2dev, reply); -bail0: - vq_req_free(c2dev, vq_req); - return err; -} - - -int -c2_cr_accept(struct c2_dev *c2dev, u32 cr_id, struct c2_qp *qp, - u32 pdata_len, u8 *pdata) -{ - ccwr_cr_accept_req_t *wr; /* variable length WR */ - struct c2_vq_req *vq_req; - ccwr_cr_accept_rep_t* reply; /* VQ Reply msg ptr. */ - int err; - - /* * only support the max private_data length */ if (pdata_len > CC_MAX_PRIVATE_DATA_SIZE) { @@ -357,7 +294,7 @@ c2_wr_set_id(wr, CCWR_CR_ACCEPT); wr->hdr.context = (unsigned long)vq_req; wr->rnic_handle = c2dev->adapter_handle; - wr->ep_handle = cr_id; + wr->ep_handle = (u32)cm_id->provider_id; wr->qp_handle = qp->adapter_handle; if (pdata) { wr->private_data_length = cpu_to_be32(pdata_len); @@ -407,15 +344,17 @@ return err; } - int -c2_cr_reject(struct c2_dev *c2dev, u32 cr_id) +c2_llp_reject(struct iw_cm_id* cm_id, const void* pdata, u8 pdata_len) { + struct c2_dev *c2dev; ccwr_cr_reject_req_t wr; struct c2_vq_req *vq_req; ccwr_cr_reject_rep_t *reply; int err; + c2dev = to_c2dev(cm_id->device); + /* * Allocate verbs request. */ @@ -430,7 +369,7 @@ c2_wr_set_id(&wr, CCWR_CR_REJECT); wr.hdr.context = (unsigned long)vq_req; wr.rnic_handle = c2dev->adapter_handle; - wr.ep_handle = cr_id; + wr.ep_handle = (u32)cm_id->provider_id; /* * reference the request struct. dereferenced in the int handler. From mst at mellanox.co.il Thu Dec 15 10:07:25 2005 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 15 Dec 2005 20:07:25 +0200 Subject: [openib-general] [PATCH applied] sdp_iocb memory corruption fix Message-ID: <20051215180725.GA29878@mellanox.co.il> Fix thinko in sdp_copy_one_page: dont try to copy beyond the page boundary. Signed-off-by: Michael S. Tsirkin Index: linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_iocb.c =================================================================== --- linux-2.6.14.orig/drivers/infiniband/ulp/sdp/sdp_iocb.c 2005-12-15 22:33:02.000000000 +0200 +++ linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_iocb.c 2005-12-15 22:34:01.000000000 +0200 @@ -43,8 +43,8 @@ static void sdp_copy_one_page(struct pag unsigned long uaddr) { size_t size_left = iocb_addr + iocb_size - uaddr; - size_t size = min(size_left, (size_t)PAGE_SIZE); unsigned long offset = uaddr % PAGE_SIZE; + size_t size = min(size_left, (size_t)(PAGE_SIZE - offset)); unsigned long flags; void* fptr; -- MST From nacc at us.ibm.com Thu Dec 15 10:03:14 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Thu, 15 Dec 2005 10:03:14 -0800 Subject: [openib-general] Re: [PATCH] add LDFLAGS to perftest/Makefile In-Reply-To: <20051215073512.GA26722@mellanox.co.il> References: <20051215070222.GK11674@us.ibm.com> <20051215073512.GA26722@mellanox.co.il> Message-ID: <20051215180314.GN11674@us.ibm.com> On 15.12.2005 [09:35:12 +0200], Michael S. Tsirkin wrote: > Quoting r. Nishanth Aravamudan : > > Subject: Re: [PATCH] add LDFLAGS to perftest/Makefile > > > > On 15.12.2005 [08:57:24 +0200], Michael S. Tsirkin wrote: > > > Quoting Nishanth Aravamudan : > > > > Is there a reason the perftest/Makefile doesn't use LDFLAGS? > > > > Specifically, in automating userspace build & test, I put the IB > > > > libraries in a temporary directory, and exporting CFLAGS and LDFLAGS > > > > works with all other Makefiles (well, the ones I expect to work), > > but > > > > perftest does not seem to pick up my exports. > > > > > > > > Would something like the following make sense (sorry if a different > > -p > > > > is preferred)? Or does it need to be +=? > > > > > > > > Description: Add LDFLAGS to the perftest Makefile to allow library > > > > directories in non-standard locations to be specified. > > > > > > Are you using gnu make? which version? > > > > GNU Make 3.80 on SLES 9 SP2. > > > > > Gnu make should use LDFLAGS automatically: > > > > > > Linking a single object file > > > `N' is made automatically from `N.o' by running the linker > > > (usually called `ld') via the C compiler. The precise command > > > used is `$(CC) $(LDFLAGS) N.o $(LOADLIBES) $(LDLIBS)'. > > > > I thought this would be the case as well, but it didn't seem to work > > without the Makefile modification. > > > > > > Signed-off-by: Nishanth Aravamudan > > > > > > > > --- Makefile 2005-12-14 14:57:04.000000000 -0800 > > > > +++ Makefile.ldflags 2005-12-14 14:57:23.000000000 -0800 > > > > @@ -2,6 +2,7 @@ TESTS = rdma_lat rdma_bw > > > > > > > > all: ${TESTS} > > > > > > > > +LDFLAGS += > > > > CFLAGS += -Wall -O2 -g -D_GNU_SOURCE > > > > LOADLIBES += -libverbs > > > > EXTRA_FILES = get_clock.c > > > > > > This really does nothing. Does this patch help you? > > > > I didn't think it should do anything either, but it did allow the make > > to work on both ppc32 and ppc64 with LDFLAGS exported in the > > environment. Without the change, the build would fail as it would not > > have the appropriate -L flags. > > Looks like a work around for bug in make. I'll have a look. 
Would make more sense than it does now if that's the case :) Thanks, Nish From swise at opengridcomputing.com Thu Dec 15 13:11:34 2005 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 15 Dec 2005 15:11:34 -0600 Subject: [Fwd: [openib-general] [PATCH] iWARP Support added to the CMA] In-Reply-To: <1134669640.7186.8.camel@trinity.austin.ammasso.com> References: <1134669640.7186.8.camel@trinity.austin.ammasso.com> Message-ID: <1134681094.18325.16.camel@stevo-desktop> Here are some comments on iwcm.c: iwcm_client struct: You probably want this ib client to be named something other than "cm". I would suggest you make the IB CM named "ibcm", and the IW CM named "iwcm". iw_cm_listen() - nuke the error path printks or make them more informative. iw_cm_reject() - you don't need the "goto out" and the out: label. iw_bind_qp() - this always returns -EINVAL??? Should probably always return 0. iw_cm_disconnect() - should there be a lock around setting the cm_id state here? cm_disconnect_handler() - if the upcall to the client CM fails, do you really need to set state to IDLE? It should already be IDLE, right? Or am I confused? On Thu, 2005-12-15 at 12:00 -0600, Tom Tucker wrote: > FYI, a patch to support iWARP in the CMA has been posted to OpenIB for > review. The code is also checked into the iWARP branch in svn. > > -------- Forwarded Message -------- > From: Tom Tucker > To: openib-general at openib.org > Subject: [openib-general] [PATCH] iWARP Support added to the CMA > Date: Thu, 15 Dec 2005 11:55:36 -0600 > This is a patch to the iWARP branch that adds: > > - A generic iWARP transport CM module > - Support for iWARP transports to the CMA > - Modifications to the AMSO1100 driver for the iWARP transport CM > - ULP add_one event changes to filter events based on node_type > > The code has been tested on IB and iWARP HCA with both the cmatose and krping applications. > > The code can also be checked out from the iWARP branch with these patches applied. > > Signed-off-by: Tom Tucker > > > Index: ulp/ipoib/ipoib_main.c > =================================================================== > --- ulp/ipoib/ipoib_main.c (revision 4186) > +++ ulp/ipoib/ipoib_main.c (working copy) > @@ -1024,6 +1024,9 @@ > struct ipoib_dev_priv *priv; > int s, e, p; > > + if (device->node_type == IB_NODE_RNIC) > + return; > + > dev_list = kmalloc(sizeof *dev_list, GFP_KERNEL); > if (!dev_list) > return; > @@ -1054,6 +1057,9 @@ > struct ipoib_dev_priv *priv, *tmp; > struct list_head *dev_list; > > + if (device->node_type == IB_NODE_RNIC) > + return; > + > dev_list = ib_get_client_data(device, &ipoib_client); > > list_for_each_entry_safe(priv, tmp, dev_list, list) { > Index: include/rdma/ib_verbs.h > =================================================================== > --- include/rdma/ib_verbs.h (revision 4186) > +++ include/rdma/ib_verbs.h (working copy) > @@ -805,7 +805,7 @@ > struct ib_gid_cache **gid_cache; > }; > > -struct iw_cm; > +struct iw_cm_provider; > struct ib_device { > struct device *dma_device; > > @@ -822,7 +822,7 @@ > > u32 flags; > > - struct iw_cm *iwcm; > + struct iw_cm_verbs *iwcm; > > int (*query_device)(struct ib_device *device, > struct ib_device_attr *device_attr); > Index: include/rdma/iw_cm.h > =================================================================== > --- include/rdma/iw_cm.h (revision 4186) > +++ include/rdma/iw_cm.h (working copy) > @@ -1,5 +1,7 @@ > /* > * Copyright (c) 2005 Ammasso, Inc. All rights reserved. > + * Copyright (c) 2005 Network Appliance, Inc. 
All rights reserved. > + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. > * > * This software is available to you under a choice of one of two > * licenses. You may choose to be licensed under the terms of the GNU > @@ -33,112 +35,119 @@ > #define IW_CM_H > > #include > +#include > > -/* iWARP connection attributes. */ > +struct iw_cm_id; > +struct iw_cm_event; > > -struct iw_conn_attr { > - struct in_addr local_addr; > - struct in_addr remote_addr; > - u16 local_port; > - u16 remote_port; > +enum iw_cm_event_type { > + IW_CM_EVENT_CONNECT_REQUEST = 1, /* connect request received */ > + IW_CM_EVENT_CONNECT_REPLY, /* reply from active connect request */ > + IW_CM_EVENT_ESTABLISHED, > + IW_CM_EVENT_LLP_DISCONNECT, > + IW_CM_EVENT_LLP_RESET, > + IW_CM_EVENT_LLP_TIMEOUT, > + IW_CM_EVENT_CLOSE > }; > > -/* This is provided in the event generated when > - * a remote peer accepts our connect request > - */ > - > -enum conn_result { > - IW_CONN_ACCEPT = 0, > - IW_CONN_RESET, > - IW_CONN_PEER_REJECT, > - IW_CONN_TIMEDOUT, > - IW_CONN_NO_ROUTE_TO_HOST, > - IW_CONN_INVALID_PARM > +struct iw_cm_event { > + enum iw_cm_event_type event; > + int status; > + u32 provider_id; > + struct sockaddr_in local_addr; > + struct sockaddr_in remote_addr; > + void *private_data; > + u8 private_data_len; > }; > - > -/* This structure is provided in the event that > - * completes an active connection request. > - */ > -struct iw_conn_results { > - enum conn_result result; > - struct iw_conn_attr conn_attr; > - u8 *private_data; > - int private_data_len; > -}; > > -/* This is provided in the event generated by a remote > - * connect request to a listening endpoint > - */ > -struct iw_conn_request { > - u32 cr_id; > - struct iw_conn_attr conn_attr; > - u8 *private_data; > - int private_data_len; > -}; > +typedef int (*iw_cm_handler)(struct iw_cm_id *cm_id, > + struct iw_cm_event *event); > > -/* Connection events. */ > -enum iw_cm_event_type { > - IW_EVENT_ACTIVE_CONNECT_RESULTS, > - IW_EVENT_CONNECT_REQUEST, > - IW_EVENT_DISCONNECT > +enum iw_cm_state { > + IW_CM_STATE_IDLE, /* unbound, inactive */ > + IW_CM_STATE_LISTEN, /* listen waiting for connect */ > + IW_CM_STATE_CONN_SENT, /* outbound waiting for peer accept */ > + IW_CM_STATE_CONN_RECV, /* inbound waiting for user accept */ > + IW_CM_STATE_ESTABLISHED, /* established */ > }; > > -struct iw_cm_event { > - struct ib_device *device; > - union { > - struct iw_conn_results active_results; > - struct iw_conn_request conn_request; > - } element; > - enum iw_cm_event_type event; > +typedef void (*iw_event_handler)(struct iw_cm_id* cm_id, > + struct iw_cm_event* event); > +struct iw_cm_id { > + iw_cm_handler cm_handler; /* client callback function */ > + void *context; /* context to provide to client cb */ > + enum iw_cm_state state; > + struct ib_device *device; > + struct ib_qp *qp; > + struct sockaddr_in local_addr; > + struct sockaddr_in remote_addr; > + u64 provider_id; /* device handle for this conn. */ > + iw_event_handler event_handler; /* callback for IW CM Provider events */ > }; > > -/* Listening endpoint. */ > -struct iw_listen_ep_attr { > - void (*event_handler)(struct iw_cm_event *, void *); > - void *listen_context; > - struct in_addr addr; > - u16 port; > - int backlog; > -}; > +/** > + * iw_create_cm_id - Allocate a communication identifier. > + * @device: Device associated with the cm_id. All related communication will > + * be associated with the specified device. 
> + * @cm_handler: Callback invoked to notify the user of CM events. > + * @context: User specified context associated with the communication > + * identifier. > + * > + * Communication identifiers are used to track connection states, > + * addr resolution requests, and listen requests. > + */ > +struct iw_cm_id *iw_create_cm_id(struct ib_device *device, > + iw_cm_handler cm_handler, > + void *context); > > -struct iw_cm { > +/* This is provided in the event generated when > + * a remote peer accepts our connect request > + */ > > - int (*connect_qp)(struct ib_qp *ib_qp, > - struct iw_conn_attr* attr, > - void (*event_handler)(struct iw_cm_event*, void*), > - void* context, > - u8 *pdata, > - int pdata_len > - ); > +struct iw_cm_verbs { > + int (*connect)(struct iw_cm_id* cm_id, > + const void* private_data, > + u8 private_data_len); > + > + int (*disconnect)(struct iw_cm_id* cm_id, > + int abrupt); > > - int (*disconnect_qp)(struct ib_qp *qp, > - int abrupt > - ); > + int (*accept)(struct iw_cm_id*, > + const void *private_data, > + u8 pdata_data_len); > > - int (*accept_cr)(struct ib_device* ibdev, > - u32 cr_id, > - struct ib_qp *qp, > - void (*event_handler)(struct iw_cm_event*, void*), > - void *context, > - u8 *pdata, > - int pdata_len); > + int (*reject)(struct iw_cm_id* cm_id, > + const void* private_data, > + u8 private_data_len); > > - int (*reject_cr)(struct ib_device* ibdev, > - u32 cr_id, > - u8 *pdata, > - int pdata_len); > + int (*getpeername)(struct iw_cm_id* cm_id, > + struct sockaddr_in* local_addr, > + struct sockaddr_in* remote_addr); > > - int (*query_cr)(struct ib_device* ibdev, > - u32 cr_id, > - struct iw_conn_request* req); > + int (*create_listen)(struct iw_cm_id* cm_id, > + int backlog); > > - int (*create_listen_ep)(struct ib_device *ibdev, > - struct iw_listen_ep_attr *ep_attrs, > - void **ep_handle); > + int (*destroy_listen)(struct iw_cm_id* cm_id); > > - int (*destroy_listen_ep)(struct ib_device *ibdev, > - void *ep_handle); > - > }; > > +struct iw_cm_id *iw_create_cm_id(struct ib_device *device, > + iw_cm_handler cm_handler, > + void *context); > +void iw_destroy_cm_id(struct iw_cm_id *cm_id); > +int iw_cm_listen(struct iw_cm_id *cm_id, int backlog); > +int iw_cm_getpeername(struct iw_cm_id *cm_id, > + struct sockaddr_in* local_add, > + struct sockaddr_in* remote_addr); > +int iw_cm_reject(struct iw_cm_id *cm_id, > + const void *private_data, > + u8 private_data_len); > +int iw_cm_accept(struct iw_cm_id *cm_id, > + const void *private_data, > + u8 private_data_len); > +int iw_cm_connect(struct iw_cm_id *cm_id, > + const void* pdata, u8 pdata_len); > +int iw_cm_disconnect(struct iw_cm_id *cm_id); > +int iw_cm_bind_qp(struct iw_cm_id* cm_id, struct ib_qp* qp); > + > #endif /* IW_CM_H */ > Index: core/cm.c > =================================================================== > --- core/cm.c (revision 4186) > +++ core/cm.c (working copy) > @@ -3227,6 +3227,10 @@ > int ret; > u8 i; > > + /* Ignore RNIC devices */ > + if (device->node_type == IB_NODE_RNIC) > + return; > + > cm_dev = kmalloc(sizeof(*cm_dev) + sizeof(*port) * > device->phys_port_cnt, GFP_KERNEL); > if (!cm_dev) > @@ -3291,6 +3295,10 @@ > if (!cm_dev) > return; > > + /* Ignore RNIC devices */ > + if (device->node_type == IB_NODE_RNIC) > + return; > + > write_lock_irqsave(&cm.device_lock, flags); > list_del(&cm_dev->list); > write_unlock_irqrestore(&cm.device_lock, flags); > Index: core/iwcm.c > =================================================================== > --- core/iwcm.c (revision 0) > +++ 
core/iwcm.c (revision 0) > @@ -0,0 +1,671 @@ > +/* > + * Copyright (c) 2004, 2005 Intel Corporation. All rights reserved. > + * Copyright (c) 2004 Topspin Corporation. All rights reserved. > + * Copyright (c) 2004, 2005 Voltaire Corporation. All rights reserved. > + * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. > + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. > + * Copyright (c) 2005 Network Appliance, Inc. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. 
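
[Aside: both connection managers register as ib_clients, and each one
filters add/remove on node_type so a device is claimed by exactly one
of them -- ib_cm skips RNICs (the cm.c hunks above) while iw_cm, below,
skips everything that is not an RNIC. A sketch of the ownership split,
assuming only these two CM clients exist:]

	static void some_cm_add_one(struct ib_device *device)
	{
		if (device->node_type == IB_NODE_RNIC)
			return;		/* iWARP devices belong to iw_cm */
		/* ...per-device setup for IB CAs... */
	}
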
> + * > + */ > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > +#include > + > +#include "cm_msgs.h" > + > +MODULE_AUTHOR("Tom Tucker"); > +MODULE_DESCRIPTION("iWARP CM"); > +MODULE_LICENSE("Dual BSD/GPL"); > + > +static void iwcm_add_one(struct ib_device *device); > +static void iwcm_remove_one(struct ib_device *device); > +struct iwcm_id_private; > + > +static struct ib_client iwcm_client = { > + .name = "cm", > + .add = iwcm_add_one, > + .remove = iwcm_remove_one > +}; > + > +static struct { > + spinlock_t lock; > + struct list_head device_list; > + rwlock_t device_lock; > + struct workqueue_struct* wq; > +} iwcm; > + > +struct iwcm_device; > +struct iwcm_port { > + struct iwcm_device *iwcm_dev; > + struct sockaddr_in local_addr; > + u8 port_num; > +}; > + > +struct iwcm_device { > + struct list_head list; > + struct ib_device *device; > + struct iwcm_port port[0]; > +}; > + > +struct iwcm_id_private { > + struct iw_cm_id id; > + > + spinlock_t lock; > + wait_queue_head_t wait; > + atomic_t refcount; > + > + struct rb_node listen_node; > + > + struct list_head work_list; > + atomic_t work_count; > +}; > + > +struct iwcm_work { > + struct work_struct work; > + struct iwcm_id_private* cm_id; > + struct iw_cm_event event; > +}; > + > +/* Called whenever a reference added for a cm_id */ > +static inline void iwcm_addref_id(struct iwcm_id_private *cm_id_priv) > +{ > + atomic_inc(&cm_id_priv->refcount); > +} > + > +/* Called whenever releasing a reference to a cm id */ > +static inline void iwcm_deref_id(struct iwcm_id_private *cm_id_priv) > +{ > + if (atomic_dec_and_test(&cm_id_priv->refcount)) > + wake_up(&cm_id_priv->wait); > +} > + > +static void cm_event_handler(struct iw_cm_id* cm_id, struct iw_cm_event* event); > + > +struct iw_cm_id *iw_create_cm_id(struct ib_device *device, > + iw_cm_handler cm_handler, > + void *context) > +{ > + struct iwcm_id_private *iwcm_id_priv; > + > + iwcm_id_priv = kmalloc(sizeof *iwcm_id_priv, GFP_KERNEL); > + if (!iwcm_id_priv) > + return ERR_PTR(-ENOMEM); > + > + memset(iwcm_id_priv, 0, sizeof *iwcm_id_priv); > + iwcm_id_priv->id.state = IW_CM_STATE_IDLE; > + iwcm_id_priv->id.device = device; > + iwcm_id_priv->id.cm_handler = cm_handler; > + iwcm_id_priv->id.context = context; > + iwcm_id_priv->id.event_handler = cm_event_handler; > + > + spin_lock_init(&iwcm_id_priv->lock); > + init_waitqueue_head(&iwcm_id_priv->wait); > + atomic_set(&iwcm_id_priv->refcount, 1); > + > + return &iwcm_id_priv->id; > + > +} > +EXPORT_SYMBOL(iw_create_cm_id); > + > +struct iw_cm_id* iw_clone_id(struct iw_cm_id* parent) > +{ > + return iw_create_cm_id(parent->device, > + parent->cm_handler, > + parent->context); > +} > + > +void iw_destroy_cm_id(struct iw_cm_id *cm_id) > +{ > + struct iwcm_id_private *iwcm_id_priv; > + unsigned long flags; > + int ret = 0; > + > + > + iwcm_id_priv = container_of(cm_id, struct iwcm_id_private, id); > + > + spin_lock_irqsave(&iwcm_id_priv->lock, flags); > + switch (cm_id->state) { > + case IW_CM_STATE_LISTEN: > + cm_id->state = IW_CM_STATE_IDLE; > + spin_unlock_irqrestore(&iwcm_id_priv->lock, flags); > + ret = cm_id->device->iwcm->destroy_listen(cm_id); > + break; > + > + case IW_CM_STATE_CONN_RECV: > + case IW_CM_STATE_CONN_SENT: > + case IW_CM_STATE_ESTABLISHED: > + cm_id->state = IW_CM_STATE_IDLE; > + spin_unlock_irqrestore(&iwcm_id_priv->lock, flags); > + ret = cm_id->device->iwcm->disconnect(cm_id,1); > + break; > + > + case IW_CM_STATE_IDLE: > + 
spin_unlock_irqrestore(&iwcm_id_priv->lock, flags); > + break; > + > + default: > + spin_unlock_irqrestore(&iwcm_id_priv->lock, flags); > + printk(KERN_ERR "%s:%s:%u Illegal state %d for iw_cm_id.\n", > + __FILE__, __FUNCTION__, __LINE__, cm_id->state); > + ; > + } > + > + atomic_dec(&iwcm_id_priv->refcount); > + wait_event(iwcm_id_priv->wait, !atomic_read(&iwcm_id_priv->refcount)); > + > + kfree(iwcm_id_priv); > +} > +EXPORT_SYMBOL(iw_destroy_cm_id); > + > +int iw_cm_listen(struct iw_cm_id *cm_id, int backlog) > +{ > + struct iwcm_id_private *iwcm_id_priv; > + unsigned long flags; > + int ret = 0; > + > + > + iwcm_id_priv = container_of(cm_id, struct iwcm_id_private, id); > + > + if (cm_id->device == 0) { > + printk(KERN_ERR "device is NULL\n"); > + return -EINVAL; > + } > + > + if (cm_id->device->iwcm == 0) { > + printk(KERN_ERR "iwcm is NULL\n"); > + return -EINVAL; > + } > + > + spin_lock_irqsave(&iwcm_id_priv->lock, flags); > + if (cm_id->state != IW_CM_STATE_IDLE) { > + spin_unlock_irqrestore(&iwcm_id_priv->lock, flags); > + return -EBUSY; > + } > + cm_id->state = IW_CM_STATE_LISTEN; > + spin_unlock_irqrestore(&iwcm_id_priv->lock, flags); > + > + ret = cm_id->device->iwcm->create_listen(cm_id, backlog); > + if (ret != 0) { > + spin_lock_irqsave(&iwcm_id_priv->lock, flags); > + cm_id->state = IW_CM_STATE_IDLE; > + spin_unlock_irqrestore(&iwcm_id_priv->lock, flags); > + } > + return ret; > +} > +EXPORT_SYMBOL(iw_cm_listen); > + > +int iw_cm_getpeername(struct iw_cm_id *cm_id, > + struct sockaddr_in* local_addr, > + struct sockaddr_in* remote_addr) > +{ > + if (cm_id->device == 0) > + return -EINVAL; > + > + if (cm_id->device->iwcm == 0) > + return -EINVAL; > + > + /* Make sure there's a connection */ > + if (cm_id->state != IW_CM_STATE_ESTABLISHED) > + return -ENOTCONN; > + > + return cm_id->device->iwcm->getpeername(cm_id, local_addr, remote_addr); > +} > +EXPORT_SYMBOL(iw_cm_getpeername); > + > +int iw_cm_reject(struct iw_cm_id *cm_id, > + const void *private_data, > + u8 private_data_len) > +{ > + struct iwcm_id_private *iwcm_id_priv; > + unsigned long flags; > + int ret; > + > + > + if (cm_id->device == 0 || cm_id->device->iwcm == 0) > + return -EINVAL; > + > + iwcm_id_priv = container_of(cm_id, struct iwcm_id_private, id); > + > + spin_lock_irqsave(&iwcm_id_priv->lock, flags); > + switch (cm_id->state) { > + case IW_CM_STATE_CONN_RECV: > + ret = cm_id->device->iwcm->reject(cm_id, private_data, private_data_len); > + cm_id->state = IW_CM_STATE_IDLE; > + break; > + default: > + ret = -EINVAL; > + goto out; > + } > + > +out: spin_unlock_irqrestore(&iwcm_id_priv->lock, flags); > + return ret; > +} > +EXPORT_SYMBOL(iw_cm_reject); > + > +int iw_cm_accept(struct iw_cm_id *cm_id, > + const void *private_data, > + u8 private_data_len) > +{ > + struct iwcm_id_private *iwcm_id_priv; > + int ret; > + > + if (cm_id->device == 0 || cm_id->device->iwcm == 0) > + return -EINVAL; > + > + iwcm_id_priv = container_of(cm_id, struct iwcm_id_private, id); > + > + switch (cm_id->state) { > + case IW_CM_STATE_CONN_RECV: > + ret = cm_id->device->iwcm->accept(cm_id, private_data, > + private_data_len); > + if (ret == 0) { > + struct iw_cm_event event; > + event.event = IW_CM_EVENT_ESTABLISHED; > + event.provider_id = cm_id->provider_id; > + event.status = 0; > + event.local_addr = cm_id->local_addr; > + event.remote_addr = cm_id->remote_addr; > + event.private_data = 0; > + event.private_data_len = 0; > + cm_event_handler(cm_id, &event); > + } > + > + break; > + default: > + ret = -EINVAL; > + } > + > + 
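
[Aside, hypothetical client code: on the passive side the calls above
combine roughly as follows, from inside a connect-request callback.
Note that on success the ESTABLISHED event is synthesized by
iw_cm_accept() and reaches the client asynchronously via the work
queue, not before iw_cm_accept() returns. example_server and its
fields are invented for illustration.]

	static int example_conn_req_cb(struct iw_cm_id *cm_id,
				       struct iw_cm_event *event)
	{
		struct example_server *srv = cm_id->context; /* hypothetical */

		if (!srv->want_peers) {	/* hypothetical policy check */
			iw_cm_reject(cm_id, NULL, 0);
			return -ECONNREFUSED; /* nonzero: core destroys cm_id */
		}

		iw_cm_bind_qp(cm_id, srv->qp);
		return iw_cm_accept(cm_id, NULL, 0);
	}
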
return ret;
> +}
> +EXPORT_SYMBOL(iw_cm_accept);
> +
> +int iw_cm_bind_qp(struct iw_cm_id* cm_id, struct ib_qp* qp)
> +{
> + int ret = -EINVAL;
> +
> + if (cm_id) {
> + cm_id->qp = qp;
> + ret = 0;
> + }
> +
> + return ret;
> +}
> +EXPORT_SYMBOL(iw_cm_bind_qp);
> +
> +int iw_cm_connect(struct iw_cm_id *cm_id,
> + const void* pdata, u8 pdata_len)
> +{
> + struct iwcm_id_private* cm_id_priv;
> + int ret = 0;
> + unsigned long flags;
> +
> + if (cm_id->state != IW_CM_STATE_IDLE)
> + return -EBUSY;
> +
> + if (cm_id->device == 0)
> + return -EINVAL;
> +
> + if (cm_id->device->iwcm == 0)
> + return -ENOSYS;
> +
> + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> +
> + spin_lock_irqsave(&cm_id_priv->lock, flags);
> + cm_id->state = IW_CM_STATE_CONN_SENT;
> + spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +
> + ret = cm_id->device->iwcm->connect(cm_id, pdata, pdata_len);
> + if (ret != 0) {
> + spin_lock_irqsave(&cm_id_priv->lock, flags);
> + cm_id->state = IW_CM_STATE_IDLE;
> + spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> + }
> + return ret;
> +}
> +EXPORT_SYMBOL(iw_cm_connect);
> +
> +int iw_cm_disconnect(struct iw_cm_id *cm_id)
> +{
> + struct iwcm_id_private *iwcm_id_priv;
> + int ret;
> +
> + if (cm_id->device == 0 || cm_id->device->iwcm == 0 || cm_id->qp == 0)
> + return -EINVAL;
> +
> + iwcm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> +
> + switch (cm_id->state) {
> + case IW_CM_STATE_ESTABLISHED:
> + ret = cm_id->device->iwcm->disconnect(cm_id, 1);
> + cm_id->state = IW_CM_STATE_IDLE;
> + if (ret == 0) {
> + struct iw_cm_event event;
> + event.event = IW_CM_EVENT_LLP_DISCONNECT;
> + event.provider_id = cm_id->provider_id;
> + event.status = 0;
> + event.local_addr = cm_id->local_addr;
> + event.remote_addr = cm_id->remote_addr;
> + event.private_data = 0;
> + event.private_data_len = 0;
> + cm_event_handler(cm_id, &event);
> + }
> +
> + break;
> + default:
> + ret = -EINVAL;
> + }
> +
> + return ret;
> +}
> +EXPORT_SYMBOL(iw_cm_disconnect);
> +
> +static void iwcm_add_one(struct ib_device *device)
> +{
> + struct iwcm_device *iwcm_dev;
> + struct iwcm_port *port;
> + unsigned long flags;
> + u8 i;
> +
> + if (device->node_type != IB_NODE_RNIC)
> + return;
> +
> + iwcm_dev = kmalloc(sizeof(*iwcm_dev) + sizeof(*port) *
> + device->phys_port_cnt, GFP_KERNEL);
> + if (!iwcm_dev)
> + return;
> +
> + iwcm_dev->device = device;
> +
> + for (i = 1; i <= device->phys_port_cnt; i++) {
> + port = &iwcm_dev->port[i-1];
> + port->iwcm_dev = iwcm_dev;
> + port->port_num = i;
> + }
> +
> + ib_set_client_data(device, &iwcm_client, iwcm_dev);
> +
> + write_lock_irqsave(&iwcm.device_lock, flags);
> + list_add_tail(&iwcm_dev->list, &iwcm.device_list);
> + write_unlock_irqrestore(&iwcm.device_lock, flags);
> + return;
> +}
> +
> +static void iwcm_remove_one(struct ib_device *device)
> +{
> + struct iwcm_device *iwcm_dev;
> + unsigned long flags;
> +
> + if (device->node_type != IB_NODE_RNIC)
> + return;
> +
> + iwcm_dev = ib_get_client_data(device, &iwcm_client);
> + if (!iwcm_dev)
> + return;
> +
> + write_lock_irqsave(&iwcm.device_lock, flags);
> + list_del(&iwcm_dev->list);
> + write_unlock_irqrestore(&iwcm.device_lock, flags);
> +
> + kfree(iwcm_dev);
> +}
> +
> +/* Handles an inbound connect request. The function creates a new
> + * iw_cm_id to represent the new connection and inherits the client
> + * callback function and other attributes from the listening parent.
> + *
> + * The work item contains a pointer to the listen_cm_id and the event.
The > + * listen_cm_id contains the client cm_handler, context and device. These are > + * copied when the device is cloned. The event contains the new four tuple. > + */ > +static int cm_conn_req_handler(struct iwcm_work* work) > +{ > + struct iw_cm_id* cm_id; > + struct iwcm_id_private* cm_id_priv; > + unsigned long flags; > + int rc; > + > + /* If the status was not successful, ignore request */ > + if (work->event.status) { > + printk(KERN_ERR "Bad status=%d for connection request ... " > + "should be filtered by provider\n", > + work->event.status); > + return work->event.status; > + } > + cm_id = iw_clone_id(&work->cm_id->id); > + if (IS_ERR(cm_id)) > + return PTR_ERR(cm_id); > + > + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); > + spin_lock_irqsave(&cm_id_priv->lock, flags); > + cm_id_priv->id.local_addr = work->event.local_addr; > + cm_id_priv->id.remote_addr = work->event.remote_addr; > + cm_id_priv->id.provider_id = work->event.provider_id; > + cm_id_priv->id.state = IW_CM_STATE_CONN_RECV; > + spin_unlock_irqrestore(&cm_id_priv->lock, flags); > + > + /* Call the client CM handler */ > + rc = cm_id->cm_handler(cm_id, &work->event); > + if (rc) { > + cm_id->state = IW_CM_STATE_IDLE; > + iw_destroy_cm_id(cm_id); > + } > + kfree(work); > + return 0; > +} > + > +/* > + * Handles the transition to established state on the passive side. > + */ > +static int cm_conn_est_handler(struct iwcm_work* work) > +{ > + struct iwcm_id_private* cm_id_priv; > + unsigned long flags; > + int ret = 0; > + > + cm_id_priv = work->cm_id; > + spin_lock_irqsave(&cm_id_priv->lock, flags); > + if (cm_id_priv->id.state != IW_CM_STATE_CONN_RECV) { > + printk(KERN_ERR "%s:%d Invalid cm_id state=%d for established event\n", > + __FUNCTION__, __LINE__, cm_id_priv->id.state); > + spin_unlock_irqrestore(&cm_id_priv->lock, flags); > + ret = -EINVAL; > + goto error_out; > + } > + > + if (work->event.status == 0) { > + cm_id_priv = work->cm_id; > + cm_id_priv->id.local_addr = work->event.local_addr; > + cm_id_priv->id.remote_addr = work->event.remote_addr; > + cm_id_priv->id.state = IW_CM_STATE_ESTABLISHED; > + } else { > + cm_id_priv->id.state = IW_CM_STATE_IDLE; > + } > + spin_unlock_irqrestore(&cm_id_priv->lock, flags); > + > + /* Call the client CM handler */ > + ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, &work->event); > + if (ret) { > + cm_id_priv->id.state = IW_CM_STATE_IDLE; > + iw_destroy_cm_id(&cm_id_priv->id); > + } > + > + error_out: > + kfree(work); > + return ret; > +} > + > +/* > + * Handles the reply to our connect request. There are three > + * possibilities: > + * - If the cm_id is in the wrong state when the event is > + * delivered, the event is ignored. [What should we do when the > + * provider does something crazy?] > + * - If the remote peer accepts the connection, we update the 4-tuple > + * in the cm_id with the remote peer info, move the cm_id to the > + * ESTABLISHED state and deliver the event to the client. > + * - If the remote peer rejects the connection, or there is some > + * connection error, move the cm_id to the IDLE state, and deliver > + * the event to the client. 
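
[Aside: a condensed view of the passive-side flow implemented above --
a restatement for clarity, not new behavior. Every inbound request
gets a child cm_id cloned from the listener, so the listener keeps
listening while each child tracks exactly one connection:]

	cm_id = iw_clone_id(&listen_id->id);	/* inherits handler, context */
	cm_id->local_addr  = event->local_addr;	/* new four-tuple */
	cm_id->remote_addr = event->remote_addr;
	cm_id->state = IW_CM_STATE_CONN_RECV;
	if (cm_id->cm_handler(cm_id, event))	/* client declined */
		iw_destroy_cm_id(cm_id);
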
> + */ > +static int cm_conn_rep_handler(struct iwcm_work* work) > +{ > + struct iwcm_id_private* cm_id_priv; > + unsigned long flags; > + int ret = 0; > + > + cm_id_priv = work->cm_id; > + spin_lock_irqsave(&cm_id_priv->lock, flags); > + if (cm_id_priv->id.state != IW_CM_STATE_CONN_SENT) { > + printk(KERN_ERR "%s:%d Invalid cm_id state=%d for connect reply event\n", > + __FUNCTION__, __LINE__, cm_id_priv->id.state); > + spin_unlock_irqrestore(&cm_id_priv->lock, flags); > + ret = -EINVAL; > + goto error_out; > + } > + > + if (work->event.status == 0) { > + cm_id_priv = work->cm_id; > + cm_id_priv->id.local_addr = work->event.local_addr; > + cm_id_priv->id.remote_addr = work->event.remote_addr; > + cm_id_priv->id.state = IW_CM_STATE_ESTABLISHED; > + } else { > + cm_id_priv->id.state = IW_CM_STATE_IDLE; > + } > + spin_unlock_irqrestore(&cm_id_priv->lock, flags); > + > + /* Call the client CM handler */ > + ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, &work->event); > + if (ret) { > + cm_id_priv->id.state = IW_CM_STATE_IDLE; > + iw_destroy_cm_id(&cm_id_priv->id); > + } > + > + error_out: > + kfree(work); > + return ret; > +} > + > +static int cm_disconnect_handler(struct iwcm_work* work) > +{ > + struct iwcm_id_private* cm_id_priv; > + unsigned long flags; > + int ret = 0; > + > + cm_id_priv = work->cm_id; > + spin_lock_irqsave(&cm_id_priv->lock, flags); > + cm_id_priv->id.state = IW_CM_STATE_IDLE; > + spin_unlock_irqrestore(&cm_id_priv->lock, flags); > + > + /* Call the client CM handler */ > + ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, &work->event); > + if (ret) { > + cm_id_priv->id.state = IW_CM_STATE_IDLE; > + iw_destroy_cm_id(&cm_id_priv->id); > + } > + > + kfree(work); > + return ret; > +} > + > +static void cm_work_handler(void* arg) > +{ > + struct iwcm_work* work = (struct iwcm_work*)arg; > + int rc; > + > + switch (work->event.event) { > + case IW_CM_EVENT_CONNECT_REQUEST: > + rc = cm_conn_req_handler(work); > + break; > + case IW_CM_EVENT_CONNECT_REPLY: > + rc = cm_conn_rep_handler(work); > + break; > + case IW_CM_EVENT_ESTABLISHED: > + rc = cm_conn_est_handler(work); > + break; > + case IW_CM_EVENT_LLP_DISCONNECT: > + case IW_CM_EVENT_LLP_TIMEOUT: > + case IW_CM_EVENT_LLP_RESET: > + case IW_CM_EVENT_CLOSE: > + rc = cm_disconnect_handler(work); > + break; > + } > +} > + > +/* IW CM provider event callback handler. This function is called on > + * interrupt context. The function builds a work queue element > + * and enqueues it for processing on a work queue thread. This allows > + * CM client callback functions to block. 
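
[Aside: a convention worth making explicit. Each work handler above
treats a non-zero return from the client's cm_handler as a request to
destroy the cm_id, so a client can drop a failed connection directly
from its callback. Illustrative client code:]

	static int example_handler(struct iw_cm_id *cm_id,
				   struct iw_cm_event *event)
	{
		if (event->event == IW_CM_EVENT_CONNECT_REPLY &&
		    event->status)
			return event->status; /* nonzero: core destroys cm_id */
		return 0;		      /* keep the cm_id alive */
	}
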
> + */ > +static void cm_event_handler(struct iw_cm_id* cm_id, > + struct iw_cm_event* event) > +{ > + struct iwcm_work *work; > + struct iwcm_id_private* cm_id_priv; > + > + work = kmalloc(sizeof *work, GFP_ATOMIC); > + if (!work) > + return; > + > + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); > + INIT_WORK(&work->work, cm_work_handler, work); > + work->cm_id = cm_id_priv; > + work->event = *event; > + queue_work(iwcm.wq, &work->work); > +} > + > +static int __init iw_cm_init(void) > +{ > + memset(&iwcm, 0, sizeof iwcm); > + INIT_LIST_HEAD(&iwcm.device_list); > + rwlock_init(&iwcm.device_lock); > + spin_lock_init(&iwcm.lock); > + iwcm.wq = create_workqueue("iw_cm"); > + if (!iwcm.wq) > + return -ENOMEM; > + > + return ib_register_client(&iwcm_client); > +} > + > +static void __exit iw_cm_cleanup(void) > +{ > + ib_unregister_client(&iwcm_client); > +} > + > +module_init(iw_cm_init); > +module_exit(iw_cm_cleanup); > + > Index: core/addr.c > =================================================================== > --- core/addr.c (revision 4186) > +++ core/addr.c (working copy) > @@ -73,8 +73,13 @@ > if (!dev) > return -EADDRNOTAVAIL; > > - *gid = *(union ib_gid *) (dev->dev_addr + 4); > - *pkey = addr_get_pkey(dev); > + if (dev->type == ARPHRD_INFINIBAND) { > + *gid = *(union ib_gid *) (dev->dev_addr + 4); > + *pkey = addr_get_pkey(dev); > + } else { > + *gid = *(union ib_gid *) (dev->dev_addr); > + *pkey = 0; > + } > dev_put(dev); > return 0; > } > Index: core/Makefile > =================================================================== > --- core/Makefile (revision 4186) > +++ core/Makefile (working copy) > @@ -1,6 +1,6 @@ > EXTRA_CFLAGS += -Idrivers/infiniband/include -Idrivers/infiniband/ulp/ipoib > > -obj-$(CONFIG_INFINIBAND) += ib_core.o ib_mad.o ib_ping.o ib_cm.o \ > +obj-$(CONFIG_INFINIBAND) += ib_core.o ib_mad.o ib_ping.o ib_cm.o iw_cm.o \ > ib_sa.o ib_at.o ib_addr.o rdma_cm.o > obj-$(CONFIG_INFINIBAND_USER_MAD) += ib_umad.o > obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o ib_uat.o > @@ -14,6 +14,8 @@ > > ib_cm-y := cm.o > > +iw_cm-y := iwcm.o > + > rdma_cm-y := cma.o > > ib_addr-y := addr.o > Index: core/cma.c > =================================================================== > --- core/cma.c (revision 4186) > +++ core/cma.c (working copy) > @@ -1,4 +1,5 @@ > /* > + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. > * Copyright (c) 2005 Voltaire Inc. All rights reserved. > * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved. > * Copyright (c) 1999-2005, Mellanox Technologies, Inc. All rights reserved. 
> @@ -30,9 +31,14 @@ > */ > #include > #include > +#include > +#include > +#include > +#include > #include > #include > #include > +#include > #include > > MODULE_AUTHOR("Guy German"); > @@ -100,7 +106,10 @@ > int timeout_ms; > struct ib_sa_query *query; > int query_id; > - struct ib_cm_id *cm_id; > + union { > + struct ib_cm_id *ib; > + struct iw_cm_id *iw; > + } cm_id; > }; > > struct cma_addr { > @@ -266,6 +275,16 @@ > IB_QP_PKEY_INDEX | IB_QP_PORT); > } > > +static int cma_init_iw_qp(struct rdma_id_private *id_priv, struct ib_qp *qp) > +{ > + struct ib_qp_attr qp_attr; > + > + qp_attr.qp_state = IB_QPS_INIT; > + qp_attr.qp_access_flags = IB_ACCESS_LOCAL_WRITE; > + > + return ib_modify_qp(qp, &qp_attr, IB_QP_STATE | IB_QP_ACCESS_FLAGS); > +} > + > int rdma_create_qp(struct rdma_cm_id *id, struct ib_pd *pd, > struct ib_qp_init_attr *qp_init_attr) > { > @@ -285,6 +304,9 @@ > case IB_NODE_CA: > ret = cma_init_ib_qp(id_priv, qp); > break; > + case IB_NODE_RNIC: > + ret = cma_init_iw_qp(id_priv, qp); > + break; > default: > ret = -ENOSYS; > break; > @@ -314,7 +336,7 @@ > > /* Need to update QP attributes from default values. */ > qp_attr.qp_state = IB_QPS_INIT; > - ret = ib_cm_init_qp_attr(id_priv->cm_id, &qp_attr, &qp_attr_mask); > + ret = ib_cm_init_qp_attr(id_priv->cm_id.ib, &qp_attr, &qp_attr_mask); > if (ret) > return ret; > > @@ -323,7 +345,7 @@ > return ret; > > qp_attr.qp_state = IB_QPS_RTR; > - ret = ib_cm_init_qp_attr(id_priv->cm_id, &qp_attr, &qp_attr_mask); > + ret = ib_cm_init_qp_attr(id_priv->cm_id.ib, &qp_attr, &qp_attr_mask); > if (ret) > return ret; > > @@ -337,7 +359,7 @@ > int qp_attr_mask, ret; > > qp_attr.qp_state = IB_QPS_RTS; > - ret = ib_cm_init_qp_attr(id_priv->cm_id, &qp_attr, &qp_attr_mask); > + ret = ib_cm_init_qp_attr(id_priv->cm_id.ib, &qp_attr, &qp_attr_mask); > if (ret) > return ret; > > @@ -419,8 +441,8 @@ > { > cma_exch(id_priv, CMA_DESTROYING); > > - if (id_priv->cm_id && !IS_ERR(id_priv->cm_id)) > - ib_destroy_cm_id(id_priv->cm_id); > + if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib)) > + ib_destroy_cm_id(id_priv->cm_id.ib); > > list_del(&id_priv->listen_list); > if (id_priv->cma_dev) > @@ -476,8 +498,22 @@ > state = cma_exch(id_priv, CMA_DESTROYING); > cma_cancel_operation(id_priv, state); > > - if (id_priv->cm_id && !IS_ERR(id_priv->cm_id)) > - ib_destroy_cm_id(id_priv->cm_id); > + if (id->device) { > + switch (id->device->node_type) { > + case IB_NODE_RNIC: > + if (id_priv->cm_id.iw && !IS_ERR(id_priv->cm_id.iw)) { > + iw_destroy_cm_id(id_priv->cm_id.iw); > + id_priv->cm_id.iw = 0; > + } > + break; > + default: > + if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib)) { > + ib_destroy_cm_id(id_priv->cm_id.ib); > + id_priv->cm_id.ib = 0; > + } > + break; > + } > + } > > if (id_priv->cma_dev) { > down(&mutex); > @@ -505,14 +541,14 @@ > if (ret) > goto reject; > > - ret = ib_send_cm_rtu(id_priv->cm_id, NULL, 0); > + ret = ib_send_cm_rtu(id_priv->cm_id.ib, NULL, 0); > if (ret) > goto reject; > > return 0; > reject: > cma_modify_qp_err(&id_priv->id); > - ib_send_cm_rej(id_priv->cm_id, IB_CM_REJ_CONSUMER_DEFINED, > + ib_send_cm_rej(id_priv->cm_id.ib, IB_CM_REJ_CONSUMER_DEFINED, > NULL, 0, NULL, 0); > return ret; > } > @@ -528,7 +564,7 @@ > return 0; > reject: > cma_modify_qp_err(&id_priv->id); > - ib_send_cm_rej(id_priv->cm_id, IB_CM_REJ_CONSUMER_DEFINED, > + ib_send_cm_rej(id_priv->cm_id.ib, IB_CM_REJ_CONSUMER_DEFINED, > NULL, 0, NULL, 0); > return ret; > } > @@ -586,7 +622,7 @@ > private_data_len); > if (ret) { > /* Destroy the CM ID by returning a 
non-zero value. */ > - id_priv->cm_id = NULL; > + id_priv->cm_id.ib = NULL; > cma_exch(id_priv, CMA_DESTROYING); > cma_release_remove(id_priv); > rdma_destroy_id(&id_priv->id); > @@ -675,7 +711,7 @@ > goto out; > } > > - conn_id->cm_id = cm_id; > + conn_id->cm_id.ib = cm_id; > cm_id->context = conn_id; > cm_id->cm_handler = cma_ib_handler; > > @@ -685,7 +721,7 @@ > IB_CM_REQ_PRIVATE_DATA_SIZE - offset); > if (ret) { > /* Destroy the CM ID by returning a non-zero value. */ > - conn_id->cm_id = NULL; > + conn_id->cm_id.ib = NULL; > cma_exch(conn_id, CMA_DESTROYING); > cma_release_remove(conn_id); > rdma_destroy_id(&conn_id->id); > @@ -695,6 +731,112 @@ > return ret; > } > > +static int cma_iw_handler(struct iw_cm_id* iw_id, struct iw_cm_event* event) > +{ > + struct rdma_id_private *id_priv = iw_id->context; > + enum rdma_cm_event_type event_type = 0; > + int ret = 0; > + > + atomic_inc(&id_priv->dev_remove); > + > + switch (event->event) { > + case IW_CM_EVENT_LLP_DISCONNECT: > + case IW_CM_EVENT_LLP_RESET: > + case IW_CM_EVENT_LLP_TIMEOUT: > + case IW_CM_EVENT_CLOSE: > + event_type = RDMA_CM_EVENT_DISCONNECTED; > + break; > + > + case IW_CM_EVENT_CONNECT_REQUEST: > + BUG_ON(1); > + break; > + > + case IW_CM_EVENT_CONNECT_REPLY: { > + if (event->status) > + event_type = RDMA_CM_EVENT_REJECTED; > + else > + event_type = RDMA_CM_EVENT_ESTABLISHED; > + break; > + } > + > + case IW_CM_EVENT_ESTABLISHED: > + event_type = RDMA_CM_EVENT_ESTABLISHED; > + break; > + } > + > + ret = cma_notify_user(id_priv, > + event_type, > + event->status, > + event->private_data, > + event->private_data_len); > + if (ret) { > + /* Destroy the CM ID by returning a non-zero value. */ > + id_priv->cm_id.iw = NULL; > + cma_exch(id_priv, CMA_DESTROYING); > + cma_release_remove(id_priv); > + rdma_destroy_id(&id_priv->id); > + return ret; > + } > + > + cma_release_remove(id_priv); > + return ret; > +} > + > +static int iw_conn_req_handler(struct iw_cm_id *cm_id, > + struct iw_cm_event *iw_event) > +{ > + struct rdma_cm_id* new_cm_id; > + struct rdma_id_private *listen_id, *conn_id; > + struct sockaddr_in* sin; > + int ret; > + > + listen_id = cm_id->context; > + atomic_inc(&listen_id->dev_remove); > + if (!cma_comp(listen_id, CMA_LISTEN)) { > + ret = -ECONNABORTED; > + goto out; > + } > + > + /* Create a new RDMA id the new IW CM ID */ > + new_cm_id = rdma_create_id(listen_id->id.event_handler, > + listen_id->id.context); > + if (!new_cm_id) { > + ret = -ENOMEM; > + goto out; > + } > + conn_id = container_of(new_cm_id, struct rdma_id_private, id); > + atomic_inc(&conn_id->dev_remove); > + conn_id->state = CMA_CONNECT; > + > + /* New connection inherits device from parent */ > + cma_attach_to_dev(conn_id, listen_id->cma_dev); > + > + conn_id->cm_id.iw = cm_id; > + cm_id->context = conn_id; > + cm_id->cm_handler = cma_iw_handler; > + > + sin = (struct sockaddr_in*)&new_cm_id->route.addr.src_addr; > + *sin = iw_event->local_addr; > + > + sin = (struct sockaddr_in*)&new_cm_id->route.addr.dst_addr; > + *sin = iw_event->remote_addr; > + > + ret = cma_notify_user(conn_id, RDMA_CM_EVENT_CONNECT_REQUEST, 0, > + iw_event->private_data, > + iw_event->private_data_len); > + if (ret) { > + /* Destroy the CM ID by returning a non-zero value. 
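
[Aside: the pattern these hunks establish, condensed. rdma_id_private
now carries a union of CM identifiers, and every consumer branches on
the device's node_type; this mirrors the rdma_disconnect change later
in this patch:]

	switch (id->device->node_type) {
	case IB_NODE_RNIC:
		ret = iw_cm_disconnect(id_priv->cm_id.iw);
		break;
	default:			/* IB CAs keep using the IB CM */
		if (ib_send_cm_dreq(id_priv->cm_id.ib, NULL, 0))
			ib_send_cm_drep(id_priv->cm_id.ib, NULL, 0);
		break;
	}
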
*/ > + conn_id->cm_id.iw = NULL; > + cma_exch(conn_id, CMA_DESTROYING); > + cma_release_remove(conn_id); > + rdma_destroy_id(&conn_id->id); > + } > + > +out: > + cma_release_remove(listen_id); > + return ret; > +} > + > static __be64 cma_get_service_id(struct sockaddr *addr) > { > return cpu_to_be64(((u64)IB_OPENIB_OUI << 48) + > @@ -706,21 +848,44 @@ > __be64 svc_id; > int ret; > > - id_priv->cm_id = ib_create_cm_id(id_priv->id.device, cma_req_handler, > + id_priv->cm_id.ib = ib_create_cm_id(id_priv->id.device, cma_req_handler, > id_priv); > - if (IS_ERR(id_priv->cm_id)) > - return PTR_ERR(id_priv->cm_id); > + if (IS_ERR(id_priv->cm_id.ib)) > + return PTR_ERR(id_priv->cm_id.ib); > > svc_id = cma_get_service_id(&id_priv->id.route.addr.src_addr); > - ret = ib_cm_listen(id_priv->cm_id, svc_id, 0); > + ret = ib_cm_listen(id_priv->cm_id.ib, svc_id, 0); > if (ret) { > - ib_destroy_cm_id(id_priv->cm_id); > - id_priv->cm_id = NULL; > + ib_destroy_cm_id(id_priv->cm_id.ib); > + id_priv->cm_id.ib = NULL; > } > > return ret; > } > > +static int cma_iw_listen(struct rdma_id_private *id_priv) > +{ > + int ret; > + struct sockaddr_in* sin; > + > + id_priv->cm_id.iw = iw_create_cm_id(id_priv->id.device, > + iw_conn_req_handler, > + id_priv); > + if (IS_ERR(id_priv->cm_id.iw)) > + return PTR_ERR(id_priv->cm_id.iw); > + > + sin = (struct sockaddr_in*)&id_priv->id.route.addr.src_addr; > + id_priv->cm_id.iw->local_addr = *sin; > + > + ret = iw_cm_listen(id_priv->cm_id.iw, 10 /* backlog */); > + if (ret) { > + iw_destroy_cm_id(id_priv->cm_id.iw); > + id_priv->cm_id.iw = NULL; > + } > + > + return ret; > +} > + > static int cma_duplicate_listen(struct rdma_id_private *id_priv) > { > struct rdma_id_private *cur_id_priv; > @@ -785,8 +950,9 @@ > goto out; > > list_add_tail(&id_priv->list, &listen_any_list); > - list_for_each_entry(cma_dev, &dev_list, list) > + list_for_each_entry(cma_dev, &dev_list, list) { > cma_listen_on_dev(id_priv, cma_dev); > + } > out: > up(&mutex); > return ret; > @@ -796,7 +962,6 @@ > { > struct rdma_id_private *id_priv; > int ret; > - > id_priv = container_of(id, struct rdma_id_private, id); > if (!cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_LISTEN)) > return -EINVAL; > @@ -806,6 +971,9 @@ > case IB_NODE_CA: > ret = cma_ib_listen(id_priv); > break; > + case IB_NODE_RNIC: > + ret = cma_iw_listen(id_priv); > + break; > default: > ret = -ENOSYS; > break; > @@ -890,6 +1058,30 @@ > return (id_priv->query_id < 0) ? 
id_priv->query_id : 0; > } > > +static int cma_resolve_iw_route(struct rdma_id_private *id_priv, int timeout_ms) > +{ > + enum rdma_cm_event_type event = RDMA_CM_EVENT_ROUTE_RESOLVED; > + int rc; > + > + atomic_inc(&id_priv->dev_remove); > + > + if (!cma_comp_exch(id_priv, CMA_ROUTE_QUERY, CMA_ROUTE_RESOLVED)) > + BUG_ON(1); > + > + rc = cma_notify_user(id_priv, event, 0, NULL, 0); > + if (rc) { > + cma_exch(id_priv, CMA_DESTROYING); > + cma_release_remove(id_priv); > + cma_deref_id(id_priv); > + rdma_destroy_id(&id_priv->id); > + return rc; > + } > + > + cma_release_remove(id_priv); > + cma_deref_id(id_priv); > + return rc; > +} > + > int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms) > { > struct rdma_id_private *id_priv; > @@ -904,6 +1096,9 @@ > case IB_NODE_CA: > ret = cma_resolve_ib_route(id_priv, timeout_ms); > break; > + case IB_NODE_RNIC: > + ret = cma_resolve_iw_route(id_priv, timeout_ms); > + break; > default: > ret = -ENOSYS; > break; > @@ -952,20 +1147,133 @@ > cma_deref_id(id_priv); > } > > + > +/* Find the local interface with a route to the specified address and > + * bind the CM ID to this interface's CMA device > + */ > +static int cma_acquire_iw_dev(struct rdma_cm_id* id, struct sockaddr* addr) > +{ > + int ret = -ENOENT; > + struct cma_device* cma_dev; > + struct rdma_id_private *id_priv; > + struct sockaddr_in* sin; > + struct rtable *rt = 0; > + struct flowi fl; > + struct net_device* netdev; > + struct in_addr src_ip; > + unsigned char* dev_addr; > + > + sin = (struct sockaddr_in*)addr; > + if (sin->sin_family != AF_INET) > + return -EINVAL; > + > + id_priv = container_of(id, struct rdma_id_private, id); > + > + /* If the address is local, use the device. If it is remote, > + * look up a route to get the local address > + */ > + netdev = ip_dev_find(sin->sin_addr.s_addr); > + if (netdev) { > + src_ip = sin->sin_addr; > + dev_addr = netdev->dev_addr; > + dev_put(netdev); > + } else { > + memset(&fl, 0, sizeof(fl)); > + fl.nl_u.ip4_u.daddr = sin->sin_addr.s_addr; > + if (ip_route_output_key(&rt, &fl)) { > + return -ENETUNREACH; > + } > + dev_addr = rt->idev->dev->dev_addr; > + src_ip.s_addr = rt->rt_src; > + > + ip_rt_put(rt); > + } > + > + down(&mutex); > + > + list_for_each_entry(cma_dev, &dev_list, list) { > + if (memcmp(dev_addr, > + &cma_dev->node_guid, > + sizeof(cma_dev->node_guid)) == 0) { > + /* If we find the device, then check if this > + * is an iWARP device. If it is, then call the > + * callback handler immediately because we > + * already have the native address > + */ > + if (cma_dev->device->node_type == IB_NODE_RNIC) { > + struct sockaddr_in* cm_sin; > + /* Set our source address */ > + cm_sin = (struct sockaddr_in*) > + &id_priv->id.route.addr.src_addr; > + cm_sin->sin_family = AF_INET; > + cm_sin->sin_addr.s_addr = src_ip.s_addr; > + > + /* Claim the device in the mutex */ > + cma_attach_to_dev(id_priv, cma_dev); > + ret = 0; > + break; > + } > + } > + } > + up(&mutex); > + > + return ret; > +} > + > + > +/** > + * rdma_resolve_addr - RDMA Resolve Address > + * > + * @id: RDMA identifier. > + * @src_addr: Source IP address > + * @dst_addr: Destination IP address > + * &timeout_ms: Timeout to wait for address resolution > + * > + * Bind the specified cm_id to a local interface and if this is an IB > + * CA, determine the GIDs associated with the specified IP addresses. 
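
[Aside: why the iWARP branch can complete immediately -- an
explanatory sketch, not new code. An RNIC's "route" is just the
kernel IP route, already determined when the device was acquired, so
cma_resolve_iw_route() skips the SA path query and reports success
synchronously. From the consumer's side the calling pattern is
unchanged:]

	rdma_resolve_route(id, 2000);	/* hypothetical 2s timeout */
	/* for IB: an SA path-record query completes asynchronously;
	 * for iWARP: RDMA_CM_EVENT_ROUTE_RESOLVED is delivered from
	 * within this call path itself */
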
> + */ > int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr, > struct sockaddr *dst_addr, int timeout_ms) > { > struct rdma_id_private *id_priv; > - int ret; > + int ret = 0; > > id_priv = container_of(id, struct rdma_id_private, id); > if (!cma_comp_exch(id_priv, CMA_IDLE, CMA_ADDR_QUERY)) > return -EINVAL; > > atomic_inc(&id_priv->refcount); > + > id->route.addr.dst_addr = *dst_addr; > - ret = ib_resolve_addr(src_addr, dst_addr, &id->route.addr.addr.ibaddr, > - timeout_ms, addr_handler, id_priv); > + > + if (cma_acquire_iw_dev(id, dst_addr)==0) { > + > + enum rdma_cm_event_type event; > + > + cma_exch(id_priv, CMA_ADDR_RESOLVED); > + > + atomic_inc(&id_priv->dev_remove); > + > + event = RDMA_CM_EVENT_ADDR_RESOLVED; > + if (cma_notify_user(id_priv, event, 0, NULL, 0)) { > + cma_exch(id_priv, CMA_DESTROYING); > + cma_deref_id(id_priv); > + cma_release_remove(id_priv); > + rdma_destroy_id(&id_priv->id); > + return -EINVAL; > + } > + > + cma_release_remove(id_priv); > + cma_deref_id(id_priv); > + > + } else { > + > + ret = ib_resolve_addr(src_addr, > + dst_addr, &id->route.addr.addr.ibaddr, > + timeout_ms, addr_handler, id_priv); > + > + } > + > if (ret) > goto err; > > @@ -980,10 +1288,13 @@ > int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr) > { > struct rdma_id_private *id_priv; > + struct sockaddr_in* sin; > struct ib_addr *ibaddr = &id->route.addr.addr.ibaddr; > int ret; > > - if (addr->sa_family != AF_INET) > + sin = (struct sockaddr_in*)addr; > + > + if (sin->sin_family != AF_INET) > return -EINVAL; > > id_priv = container_of(id, struct rdma_id_private, id); > @@ -994,9 +1305,11 @@ > id->route.addr.src_addr = *addr; > ret = 0; > } else { > - ret = ib_translate_addr(addr, &ibaddr->sgid, &ibaddr->pkey); > - if (!ret) > - ret = cma_acquire_ib_dev(id_priv, &ibaddr->sgid); > + if ((ret = cma_acquire_iw_dev(id, addr))) { > + ret = ib_translate_addr(addr, &ibaddr->sgid, &ibaddr->pkey); > + if (!ret) > + ret = cma_acquire_ib_dev(id_priv, &ibaddr->sgid); > + } > } > > if (ret) > @@ -1041,10 +1354,10 @@ > if (!private_data) > return -ENOMEM; > > - id_priv->cm_id = ib_create_cm_id(id_priv->id.device, cma_ib_handler, > + id_priv->cm_id.ib = ib_create_cm_id(id_priv->id.device, cma_ib_handler, > id_priv); > - if (IS_ERR(id_priv->cm_id)) { > - ret = PTR_ERR(id_priv->cm_id); > + if (IS_ERR(id_priv->cm_id.ib)) { > + ret = PTR_ERR(id_priv->cm_id.ib); > goto out; > } > > @@ -1075,25 +1388,61 @@ > req.max_cm_retries = CMA_MAX_CM_RETRIES; > req.srq = id_priv->id.qp->srq ? 
1 : 0; > > - ret = ib_send_cm_req(id_priv->cm_id, &req); > + ret = ib_send_cm_req(id_priv->cm_id.ib, &req); > out: > kfree(private_data); > return ret; > } > > +static int cma_connect_iw(struct rdma_id_private *id_priv, > + struct rdma_conn_param *conn_param) > +{ > + struct iw_cm_id* cm_id; > + struct sockaddr_in* sin; > + int ret; > + > + if (id_priv->id.qp == NULL) > + return -EINVAL; > + > + cm_id = iw_create_cm_id(id_priv->id.device, cma_iw_handler, id_priv); > + if (IS_ERR(cm_id)) { > + ret = PTR_ERR(cm_id); > + goto out; > + } > + > + id_priv->cm_id.iw = cm_id; > + > + sin = (struct sockaddr_in*)&id_priv->id.route.addr.src_addr; > + cm_id->local_addr = *sin; > + > + sin = (struct sockaddr_in*)&id_priv->id.route.addr.dst_addr; > + cm_id->remote_addr = *sin; > + > + iw_cm_bind_qp(cm_id, id_priv->id.qp); > + > + ret = iw_cm_connect(cm_id, conn_param->private_data, > + conn_param->private_data_len); > + > +out: > + return ret; > +} > + > int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) > { > struct rdma_id_private *id_priv; > int ret; > > id_priv = container_of(id, struct rdma_id_private, id); > - if (!cma_comp_exch(id_priv, CMA_ROUTE_RESOLVED, CMA_CONNECT)) > + if (!cma_comp_exch(id_priv, CMA_ROUTE_RESOLVED, CMA_CONNECT)) > return -EINVAL; > > switch (id->device->node_type) { > case IB_NODE_CA: > ret = cma_connect_ib(id_priv, conn_param); > break; > + case IB_NODE_RNIC: > + ret = cma_connect_iw(id_priv, conn_param); > + break; > default: > ret = -ENOSYS; > break; > @@ -1131,7 +1480,7 @@ > rep.rnr_retry_count = conn_param->rnr_retry_count; > rep.srq = id_priv->id.qp->srq ? 1 : 0; > > - return ib_send_cm_rep(id_priv->cm_id, &rep); > + return ib_send_cm_rep(id_priv->cm_id.ib, &rep); > } > > int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) > @@ -1147,6 +1496,12 @@ > case IB_NODE_CA: > ret = cma_accept_ib(id_priv, conn_param); > break; > + case IB_NODE_RNIC: { > + iw_cm_bind_qp(id_priv->cm_id.iw, id_priv->id.qp); > + ret = iw_cm_accept(id_priv->cm_id.iw, conn_param->private_data, > + conn_param->private_data_len); > + break; > + } > default: > ret = -ENOSYS; > break; > @@ -1175,9 +1530,15 @@ > > switch (id->device->node_type) { > case IB_NODE_CA: > - ret = ib_send_cm_rej(id_priv->cm_id, IB_CM_REJ_CONSUMER_DEFINED, > + ret = ib_send_cm_rej(id_priv->cm_id.ib, IB_CM_REJ_CONSUMER_DEFINED, > NULL, 0, private_data, private_data_len); > break; > + > + case IB_NODE_RNIC: > + ret = iw_cm_reject(id_priv->cm_id.iw, > + private_data, private_data_len); > + break; > + > default: > ret = -ENOSYS; > break; > @@ -1190,7 +1551,6 @@ > { > struct rdma_id_private *id_priv; > int ret; > - > id_priv = container_of(id, struct rdma_id_private, id); > if (!cma_comp(id_priv, CMA_CONNECT)) > return -EINVAL; > @@ -1202,9 +1562,12 @@ > switch (id->device->node_type) { > case IB_NODE_CA: > /* Initiate or respond to a disconnect. 
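
[Aside: putting the pieces together. Hypothetical active-side
consumer code; every entry point below is defined in this patch or
already present in cma.c, while the buffer and timeouts are invented:]

	static const char hello[] = "hi";	/* hypothetical payload */
	struct rdma_conn_param param = {
		.private_data	  = hello,
		.private_data_len = sizeof(hello),
	};

	rdma_resolve_addr(id, NULL, dst_addr, 2000);
	/* wait for RDMA_CM_EVENT_ADDR_RESOLVED */
	rdma_resolve_route(id, 2000);
	/* wait for RDMA_CM_EVENT_ROUTE_RESOLVED */
	rdma_create_qp(id, pd, &init_attr);
	rdma_connect(id, &param);
	/* RDMA_CM_EVENT_ESTABLISHED then arrives via cma_iw_handler */
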
*/ > - if (ib_send_cm_dreq(id_priv->cm_id, NULL, 0)) > - ib_send_cm_drep(id_priv->cm_id, NULL, 0); > + if (ib_send_cm_dreq(id_priv->cm_id.ib, NULL, 0)) > + ib_send_cm_drep(id_priv->cm_id.ib, NULL, 0); > break; > + case IB_NODE_RNIC: > + ret = iw_cm_disconnect(id_priv->cm_id.iw); > + break; > default: > break; > } > Index: Makefile > =================================================================== > --- Makefile (revision 4186) > +++ Makefile (working copy) > @@ -7,3 +7,5 @@ > obj-$(CONFIG_INFINIBAND_SRP) += ulp/srp/ > obj-$(CONFIG_KDAPL) += ulp/kdapl/ > obj-$(CONFIG_INFINIBAND_ISER) += ulp/iser/ > +obj-$(CONFIG_KRPING) += krping/ > +obj-$(CONFIG_RDMA_CMATOSE) += cmatose/ > Index: hw/amso1100/c2.c > =================================================================== > --- hw/amso1100/c2.c (revision 4482) > +++ hw/amso1100/c2.c (working copy) > @@ -933,7 +933,7 @@ > spin_lock_init(&c2_port->tx_lock); > > /* Copy our 48-bit ethernet hardware address */ > - memcpy_fromio(netdev->dev_addr, mmio_addr + C2_REGS_ENADDR, 6); > + memcpy_fromio(netdev->dev_addr, mmio_addr + C2_REGS_RDMA_ENADDR, 6); > > /* Validate the MAC address */ > if(!is_valid_ether_addr(netdev->dev_addr)) { > Index: hw/amso1100/c2_qp.c > =================================================================== > --- hw/amso1100/c2_qp.c (revision 4482) > +++ hw/amso1100/c2_qp.c (working copy) > @@ -184,7 +184,7 @@ > struct c2_vq_req *vq_req; > ccwr_qp_destroy_req_t wr; > ccwr_qp_destroy_rep_t *reply; > - int err; > + int err; > > /* > * Allocate a verb request message > @@ -343,8 +343,6 @@ > qp->send_sgl_depth = qp_attrs->cap.max_send_sge; > qp->rdma_write_sgl_depth = qp_attrs->cap.max_send_sge; > qp->recv_sgl_depth = qp_attrs->cap.max_recv_sge; > - qp->event_handler = NULL; > - qp->context = NULL; > > /* Initialize the SQ MQ */ > q_size = be32_to_cpu(reply->sq_depth); > Index: hw/amso1100/c2.h > =================================================================== > --- hw/amso1100/c2.h (revision 4482) > +++ hw/amso1100/c2.h (working copy) > @@ -113,6 +113,7 @@ > C2_REGS_Q2_MSGSIZE = 0x0038, > C2_REGS_Q2_SHARED = 0x0040, > C2_REGS_ENADDR = 0x004C, > + C2_REGS_RDMA_ENADDR = 0x0054, > C2_REGS_HRX_CUR = 0x006C, > }; > > @@ -592,16 +593,11 @@ > extern int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify); > > /* CM */ > -extern int c2_qp_connect(struct c2_dev *c2dev, struct c2_qp *qp, u32 remote_addr, > - u16 remote_port, u32 pdata_len, u8 *pdata); > -extern int c2_cr_query(struct c2_dev *c2dev, u32 cr_id, > - struct c2_cr_query_attrs *cr_attrs); > -extern int c2_cr_accept(struct c2_dev *c2dev, u32 cr_id, struct c2_qp *qp, > - u32 pdata_len, u8 *pdata); > -extern int c2_cr_reject(struct c2_dev *c2dev, u32 cr_id); > -extern int c2_ep_listen_create(struct c2_dev *c2dev, u32 addr, u16 port, > - u32 backlog, struct c2_ep *ep); > -extern int c2_ep_listen_destroy(struct c2_dev *c2dev, struct c2_ep *ep); > +extern int c2_llp_connect(struct iw_cm_id* cm_id, const void* pdata, u8 pdata_len); > +extern int c2_llp_accept(struct iw_cm_id* cm_id, const void* pdata, u8 pdata_len); > +extern int c2_llp_reject(struct iw_cm_id* cm_id, const void* pdata, u8 pdata_len); > +extern int c2_llp_service_create(struct iw_cm_id* cm_id, int backlog); > +extern int c2_llp_service_destroy(struct iw_cm_id* cm_id); > > /* MM */ > extern int c2_nsmr_register_phys_kern(struct c2_dev *c2dev, u64 **addr_list, > Index: hw/amso1100/c2_pd.c > =================================================================== > --- hw/amso1100/c2_pd.c (revision 4482) > +++ 
hw/amso1100/c2_pd.c (working copy) > @@ -44,6 +44,8 @@ > { > int err = 0; > > + printk(KERN_ERR "%s:%d\n", __FUNCTION__, __LINE__); > + > might_sleep(); > > atomic_set(&pd->sqp_count, 0); > Index: hw/amso1100/c2_ae.c > =================================================================== > --- hw/amso1100/c2_ae.c (revision 4482) > +++ hw/amso1100/c2_ae.c (working copy) > @@ -35,51 +35,37 @@ > #include "cc_status.h" > #include "cc_ae.h" > > -enum conn_result > -c2_convert_cm_status(u32 cc_status) > +static int c2_convert_cm_status(u32 cc_status) > { > switch (cc_status) { > - case CC_CONN_STATUS_SUCCESS: return IW_CONN_ACCEPT; > - case CC_CONN_STATUS_REJECTED: return IW_CONN_RESET; > - case CC_CONN_STATUS_REFUSED: return IW_CONN_PEER_REJECT; > - case CC_CONN_STATUS_TIMEDOUT: return IW_CONN_TIMEDOUT; > - case CC_CONN_STATUS_NETUNREACH: return IW_CONN_NO_ROUTE_TO_HOST; > - case CC_CONN_STATUS_HOSTUNREACH: return IW_CONN_NO_ROUTE_TO_HOST; > - case CC_CONN_STATUS_INVALID_RNIC: return IW_CONN_INVALID_PARM; > - case CC_CONN_STATUS_INVALID_QP: return IW_CONN_INVALID_PARM; > - case CC_CONN_STATUS_INVALID_QP_STATE: return IW_CONN_INVALID_PARM; > + case CC_CONN_STATUS_SUCCESS: > + return 0; > + case CC_CONN_STATUS_REJECTED: > + return -ENETRESET; > + case CC_CONN_STATUS_REFUSED: > + return -ECONNREFUSED; > + case CC_CONN_STATUS_TIMEDOUT: > + return -ETIMEDOUT; > + case CC_CONN_STATUS_NETUNREACH: > + return -ENETUNREACH; > + case CC_CONN_STATUS_HOSTUNREACH: > + return -EHOSTUNREACH; > + case CC_CONN_STATUS_INVALID_RNIC: > + return -EINVAL; > + case CC_CONN_STATUS_INVALID_QP: > + return -EINVAL; > + case CC_CONN_STATUS_INVALID_QP_STATE: > + return -EINVAL; > default: > panic("Unable to convert CM status: %d\n", cc_status); > break; > } > } > > -static int > -is_cm_event(cc_event_id_t id) > -{ > - int is_cm; > - > - switch (id) { > - case CCAE_ACTIVE_CONNECT_RESULTS: > - case CCAE_BAD_CLOSE: > - case CCAE_LLP_CLOSE_COMPLETE: > - case CCAE_LLP_CONNECTION_RESET: > - case CCAE_LLP_CONNECTION_LOST: > - is_cm = 1; > - break; > - case CCAE_TERMINATE_MESSAGE_RECEIVED: > - case CCAE_CQ_SQ_COMPLETION_OVERFLOW: > - default: > - is_cm = 0; > - break; > - } > - > - return is_cm; > -} > void c2_ae_event(struct c2_dev *c2dev, u32 mq_index) > { > - ccwr_t *wr; > struct c2_mq *mq = c2dev->qptr_array[mq_index]; > + ccwr_t *wr; > void *resource_user_context; > struct iw_cm_event cm_event; > struct ib_event ib_event; > @@ -94,6 +80,7 @@ > if (!wr) > return; > > + memset(&cm_event, 0, sizeof(cm_event)); > event_id = c2_wr_get_id(wr); > resource_indicator = be32_to_cpu(wr->ae.ae_generic.resource_type); > resource_user_context = (void *)(unsigned long)wr->ae.ae_generic.user_context; > @@ -102,117 +89,126 @@ > case CC_RES_IND_QP: { > > struct c2_qp *qp = (struct c2_qp *)resource_user_context; > + cm_event.status = c2_convert_cm_status(c2_wr_get_result(wr)); > > - if (is_cm_event(event_id)) { > - > - cm_event.device = &c2dev->ibdev; > - if (event_id == CCAE_ACTIVE_CONNECT_RESULTS) { > - cm_event.event = IW_EVENT_ACTIVE_CONNECT_RESULTS; > - cm_event.element.active_results.result = > - c2_convert_cm_status(c2_wr_get_result(wr)); > - cm_event.element.active_results.conn_attr.local_addr.s_addr = > - wr->ae.ae_active_connect_results.laddr; > - cm_event.element.active_results.conn_attr.remote_addr.s_addr = > - wr->ae.ae_active_connect_results.raddr; > - cm_event.element.active_results.conn_attr.local_port = > - wr->ae.ae_active_connect_results.lport; > - cm_event.element.active_results.conn_attr.remote_port = > - 
wr->ae.ae_active_connect_results.rport; > - cm_event.element.active_results.private_data_len = > + switch (event_id) { > + case CCAE_ACTIVE_CONNECT_RESULTS: > + cm_event.event = IW_CM_EVENT_CONNECT_REPLY; > + cm_event.local_addr.sin_addr.s_addr = > + wr->ae.ae_active_connect_results.laddr; > + cm_event.remote_addr.sin_addr.s_addr = > + wr->ae.ae_active_connect_results.raddr; > + cm_event.local_addr.sin_port = > + wr->ae.ae_active_connect_results.lport; > + cm_event.remote_addr.sin_port = > + wr->ae.ae_active_connect_results.rport; > + cm_event.private_data_len = > be32_to_cpu(wr->ae.ae_active_connect_results.private_data_length); > > + if (cm_event.private_data_len) { > /* XXX */ > - pdata = kmalloc(cm_event.element.active_results.private_data_len, > - GFP_ATOMIC); > - if (!pdata) > - break; > + pdata = kmalloc(cm_event.private_data_len, GFP_ATOMIC); > + if (!pdata) { > + /* Ignore the request, maybe the remote peer > + * will retry */ > + dprintk("Ignored connect request -- no memory for pdata" > + "private_data_len=%d\n", cm_event.private_data_len); > + goto ignore_it; > + } > > memcpy(pdata, > wr->ae.ae_active_connect_results.private_data, > - cm_event.element.active_results.private_data_len); > - cm_event.element.active_results.private_data = pdata; > + cm_event.private_data_len); > > - } else { > - cm_event.event = IW_EVENT_DISCONNECT; > + cm_event.private_data = pdata; > } > + if (qp->cm_id->event_handler) > + qp->cm_id->event_handler(qp->cm_id, &cm_event); > > - if (qp->event_handler) > - (*qp->event_handler)(&cm_event, qp->context); > + break; > > - if (pdata) > - kfree(pdata); > - } else { > - > + case CCAE_TERMINATE_MESSAGE_RECEIVED: > + case CCAE_CQ_SQ_COMPLETION_OVERFLOW: > ib_event.device = &c2dev->ibdev; > ib_event.element.qp = &qp->ibqp; > - /* XXX */ > ib_event.event = IB_EVENT_QP_REQ_ERR; > > if(qp->ibqp.event_handler) > - (*qp->ibqp.event_handler)(&ib_event, qp->context); > - } > + (*qp->ibqp.event_handler)(&ib_event, > + qp->ibqp.qp_context); > + case CCAE_BAD_CLOSE: > + case CCAE_LLP_CLOSE_COMPLETE: > + case CCAE_LLP_CONNECTION_RESET: > + case CCAE_LLP_CONNECTION_LOST: > + default: > + cm_event.event = IW_CM_EVENT_CLOSE; > + if (qp->cm_id->event_handler) > + qp->cm_id->event_handler(qp->cm_id, &cm_event); > > + } > break; > } > + > case CC_RES_IND_EP: { > > - struct c2_ep *ep = (struct c2_ep *)resource_user_context; > + struct iw_cm_id* cm_id = (struct iw_cm_id*)resource_user_context; > > + dprintk("CC_RES_IND_EP event_id=%d\n", event_id); > if (event_id != CCAE_CONNECTION_REQUEST) { > dprintk("%s: Invalid event_id: %d\n", __FUNCTION__, event_id); > break; > } > > - cm_event.device = &c2dev->ibdev; > - cm_event.event = IW_EVENT_CONNECT_REQUEST; > - cm_event.element.conn_request.cr_id = > + cm_event.event = IW_CM_EVENT_CONNECT_REQUEST; > + cm_event.provider_id = > wr->ae.ae_connection_request.cr_handle; > - cm_event.element.conn_request.conn_attr.local_addr.s_addr = > + cm_event.local_addr.sin_addr.s_addr = > wr->ae.ae_connection_request.laddr; > - cm_event.element.conn_request.conn_attr.remote_addr.s_addr = > + cm_event.remote_addr.sin_addr.s_addr = > wr->ae.ae_connection_request.raddr; > - cm_event.element.conn_request.conn_attr.local_port = > + cm_event.local_addr.sin_port = > wr->ae.ae_connection_request.lport; > - cm_event.element.conn_request.conn_attr.remote_port = > + cm_event.remote_addr.sin_port = > wr->ae.ae_connection_request.rport; > - cm_event.element.conn_request.private_data_len = > + cm_event.private_data_len = > 
be32_to_cpu(wr->ae.ae_connection_request.private_data_length); > > - /* XXX */ > - pdata = kmalloc(cm_event.element.conn_request.private_data_len, > - GFP_ATOMIC); > - if (!pdata) > - break; > + if (cm_event.private_data_len) { > + pdata = kmalloc(cm_event.private_data_len, GFP_ATOMIC); > + if (!pdata) { > + /* Ignore the request, maybe the remote peer > + * will retry */ > + dprintk("Ignored connect request -- no memory for pdata" > + "private_data_len=%d\n", cm_event.private_data_len); > + goto ignore_it; > + } > + memcpy(pdata, > + wr->ae.ae_connection_request.private_data, > + cm_event.private_data_len); > > - memcpy(pdata, > - wr->ae.ae_connection_request.private_data, > - cm_event.element.conn_request.private_data_len); > - > - cm_event.element.conn_request.private_data = pdata; > - > - if (ep->event_handler) > - (*ep->event_handler)(&cm_event, ep->listen_context); > - > - kfree(pdata); > + cm_event.private_data = pdata; > + } > + if (cm_id->event_handler) > + cm_id->event_handler(cm_id, &cm_event); > break; > } > + > case CC_RES_IND_CQ: { > struct c2_cq *cq = (struct c2_cq *)resource_user_context; > > + dprintk("IB_EVENT_CQ_ERR\n"); > ib_event.device = &c2dev->ibdev; > ib_event.element.cq = &cq->ibcq; > ib_event.event = IB_EVENT_CQ_ERR; > > if (cq->ibcq.event_handler) > - (*cq->ibcq.event_handler)(&ib_event, cq->ibcq.cq_context); > + cq->ibcq.event_handler(&ib_event, cq->ibcq.cq_context); > } > + > default: > break; > } > - > - /* > - * free the adapter message > - */ > + > + ignore_it: > c2_mq_free(mq); > } > - > Index: hw/amso1100/c2_provider.c > =================================================================== > --- hw/amso1100/c2_provider.c (revision 4482) > +++ hw/amso1100/c2_provider.c (working copy) > @@ -305,8 +305,6 @@ > struct c2_cq *cq; > int err; > > - dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > - > cq = kmalloc(sizeof(*cq), GFP_KERNEL); > if (!cq) { > dprintk("%s: Unable to allocate CQ\n", __FUNCTION__); > @@ -315,6 +313,7 @@ > > err = c2_init_cq(to_c2dev(ibdev), entries, NULL, cq); > if (err) { > + dprintk("%s: error initializing CQ\n", __FUNCTION__); > kfree(cq); > return ERR_PTR(err); > } > @@ -540,156 +539,96 @@ > return -ENOSYS; > } > > -static int c2_connect_qp(struct ib_qp *ib_qp, > - struct iw_conn_attr *attr, > - void (*event_handler)(struct iw_cm_event*, void*), > - void *context, > - u8 *pdata, > - int pdata_len > - ) > +static int c2_connect(struct iw_cm_id* cm_id, > + const void* pdata, u8 pdata_len) > { > - struct c2_qp *qp = to_c2qp(ib_qp); > int err; > + struct c2_qp* qp = container_of(cm_id->qp, struct c2_qp, ibqp); > > dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > > - if (!event_handler) > + if (cm_id->qp == NULL) > return -EINVAL; > > - /* > - * Store the event handler and the > - * context in the QP. 
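
[Aside: because c2_convert_cm_status() above maps the adapter's status
codes onto standard errnos, a client sees ordinary kernel error codes
in event->status. For instance (illustrative only):]

	if (event->status == -ECONNREFUSED)
		printk(KERN_INFO "peer refused our connect request\n");
	else if (event->status == -ETIMEDOUT)
		printk(KERN_INFO "LLP connect timed out\n");
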
> - */ > - qp->event_handler = event_handler; > - qp->context = context; > + /* Cache the cm_id in the qp */ > + qp->cm_id = cm_id; > > - err = c2_qp_connect(to_c2dev(ib_qp->device), qp, > - attr->remote_addr.s_addr, attr->remote_port, > - pdata_len, pdata); > - if (err) { > - qp->event_handler = NULL; > - qp->context = NULL; > - } > + err = c2_llp_connect(cm_id, pdata, pdata_len); > > return err; > } > > -static int c2_disconnect_qp(struct ib_qp *qp, > - int abrupt) > +static int c2_disconnect(struct iw_cm_id* cm_id, int abrupt) > { > struct ib_qp_attr attr; > + struct ib_qp *ib_qp = cm_id->qp; > int err; > > dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > > + if (ib_qp == 0) > + /* If this is a lietening endpoint, there is no QP */ > + return 0; > + > memset(&attr, 0, sizeof(struct ib_qp_attr)); > if (abrupt) > attr.qp_state = IB_QPS_ERR; > else > attr.qp_state = IB_QPS_SQD; > > - err = c2_modify_qp(qp, &attr, IB_QP_STATE); > + err = c2_modify_qp(ib_qp, &attr, IB_QP_STATE); > return err; > } > > -static int c2_accept_cr(struct ib_device *ibdev, > - u32 cr_id, > - struct ib_qp *ib_qp, > - void (*event_handler)(struct iw_cm_event*, void*), > - void *context, > - u8 *pdata, > - int pdata_len) > +static int c2_accept(struct iw_cm_id* cm_id, const void *pdata, u8 pdata_len) > { > - struct c2_qp *qp = to_c2qp(ib_qp); > int err; > > dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > > - /* > - * Store the event handler and the > - * context in the QP. > - */ > - qp->event_handler = event_handler; > - qp->context = context; > + err = c2_llp_accept(cm_id, pdata, pdata_len); > > - err = c2_cr_accept(to_c2dev(ibdev), cr_id, qp, > - pdata_len, pdata); > - > return err; > } > > -static int c2_reject_cr(struct ib_device *ibdev, > - u32 cr_id, > - u8 *pdata, > - int pdata_len) > +static int c2_reject(struct iw_cm_id* cm_id, const void* pdata, u8 pdata_len) > { > int err; > > dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > > - err = c2_cr_reject(to_c2dev(ibdev), cr_id); > + err = c2_llp_reject(cm_id, pdata, pdata_len); > return err; > } > > -static int c2_query_cr(struct ib_device *ibdev, > - u32 cr_id, > - struct iw_conn_request *req) > +static int c2_getpeername(struct iw_cm_id* cm_id, > + struct sockaddr_in* local_addr, > + struct sockaddr_in* remote_addr ) > { > - int err; > - struct c2_cr_query_attrs cr_attrs; > - > dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > > - err = c2_cr_query(to_c2dev(ibdev), cr_id, &cr_attrs); > - if (!err) { > - req->cr_id = cr_id; > - req->conn_attr.local_addr.s_addr = cr_attrs.local_addr; > - req->conn_attr.local_port = cr_attrs.local_port; > - req->conn_attr.remote_addr.s_addr = cr_attrs.remote_addr; > - req->conn_attr.remote_port = cr_attrs.remote_port; > - /* XXX pdata? 
*/ > - } > - return err; > + *local_addr = cm_id->local_addr; > + *remote_addr = cm_id->remote_addr; > + return 0; > } > > -static int c2_create_listen_ep(struct ib_device *ibdev, > - struct iw_listen_ep_attr *ep_attr, > - void **ep_handle) > +static int c2_service_create(struct iw_cm_id* cm_id, int backlog) > { > int err; > - struct c2_ep *ep; > > dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > - > - ep = kmalloc(sizeof(*ep), GFP_KERNEL); > - if (!ep) { > - dprintk("%s: Unable to allocate EP\n", __FUNCTION__); > - return -ENOMEM; > - } > - > - ep->event_handler = ep_attr->event_handler; > - ep->listen_context = ep_attr->listen_context; > - > - err = c2_ep_listen_create(to_c2dev(ibdev), > - ep_attr->addr.s_addr, ep_attr->port, > - ep_attr->backlog, ep); > - if (err) > - kfree(ep); > - else > - *ep_handle = (void *)ep; > - > + err = c2_llp_service_create(cm_id, backlog); > return err; > } > > -static int c2_destroy_listen_ep(struct ib_device *ibdev, void *ep_handle) > +static int c2_service_destroy(struct iw_cm_id* cm_id) > { > - struct c2_ep *ep = (struct c2_ep *)ep_handle; > - > + int err; > dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__); > > - c2_ep_listen_destroy(to_c2dev(ibdev), ep); > - kfree(ep); > - return 0; > + err = c2_llp_service_destroy(cm_id); > + > + return err; > } > > int c2_register_device(struct c2_dev *dev) > @@ -742,13 +681,13 @@ > dev->ibdev.post_recv = c2_post_receive; > > dev->ibdev.iwcm = kmalloc(sizeof(*dev->ibdev.iwcm), GFP_KERNEL); > - dev->ibdev.iwcm->connect_qp = c2_connect_qp; > - dev->ibdev.iwcm->disconnect_qp = c2_disconnect_qp; > - dev->ibdev.iwcm->accept_cr = c2_accept_cr; > - dev->ibdev.iwcm->reject_cr = c2_reject_cr; > - dev->ibdev.iwcm->query_cr = c2_query_cr; > - dev->ibdev.iwcm->create_listen_ep = c2_create_listen_ep; > - dev->ibdev.iwcm->destroy_listen_ep = c2_destroy_listen_ep; > + dev->ibdev.iwcm->connect = c2_connect; > + dev->ibdev.iwcm->disconnect = c2_disconnect; > + dev->ibdev.iwcm->accept = c2_accept; > + dev->ibdev.iwcm->reject = c2_reject; > + dev->ibdev.iwcm->getpeername = c2_getpeername; > + dev->ibdev.iwcm->create_listen = c2_service_create; > + dev->ibdev.iwcm->destroy_listen = c2_service_destroy; > > ret = ib_register_device(&dev->ibdev); > if (ret) > Index: hw/amso1100/c2_provider.h > =================================================================== > --- hw/amso1100/c2_provider.h (revision 4482) > +++ hw/amso1100/c2_provider.h (working copy) > @@ -115,17 +115,15 @@ > struct c2_wq { > spinlock_t lock; > }; > - > +struct iw_cm_id; > struct c2_qp { > struct ib_qp ibqp; > + struct iw_cm_id* cm_id; > spinlock_t lock; > atomic_t refcount; > wait_queue_head_t wait; > int qpn; > > - void (*event_handler)(struct iw_cm_event *, void *); > - void *context; > - > u32 adapter_handle; > u32 send_sgl_depth; > u32 recv_sgl_depth; > @@ -136,15 +134,6 @@ > struct c2_mq rq_mq; > }; > > -struct c2_ep { > - u32 adapter_handle; > - void (*event_handler)(struct iw_cm_event *, void *); > - void *listen_context; > - u32 addr; > - u16 port; > - int backlog; > -}; > - > struct c2_cr_query_attrs { > u32 local_addr; > u32 remote_addr; > Index: hw/amso1100/c2_cm.c > =================================================================== > --- hw/amso1100/c2_cm.c (revision 4482) > +++ hw/amso1100/c2_cm.c (working copy) > @@ -35,11 +35,10 @@ > #include "c2_vq.h" > #include > > -int > -c2_qp_connect(struct c2_dev *c2dev, struct c2_qp *qp, > - u32 remote_addr, u16 remote_port, > - u32 pdata_len, u8 *pdata) > +int c2_llp_connect(struct iw_cm_id* 
cm_id, const void* pdata, u8 pdata_len) > { > + struct c2_dev *c2dev = to_c2dev(cm_id->device); > + struct c2_qp *qp = to_c2qp(cm_id->qp); > ccwr_qp_connect_req_t *wr; /* variable size needs a malloc. */ > struct c2_vq_req *vq_req; > int err; > @@ -70,8 +69,8 @@ > wr->rnic_handle = c2dev->adapter_handle; > wr->qp_handle = qp->adapter_handle; > > - wr->remote_addr = remote_addr; /* already in Network Byte Order */ > - wr->remote_port = remote_port; /* already in Network Byte Order */ > + wr->remote_addr = cm_id->remote_addr.sin_addr.s_addr; > + wr->remote_port = cm_id->remote_addr.sin_port; > > /* > * Move any private data from the callers's buf into > @@ -96,14 +95,18 @@ > } > > int > -c2_ep_listen_create(struct c2_dev *c2dev, u32 addr, > - u16 port, u32 backlog, struct c2_ep *ep) > +c2_llp_service_create(struct iw_cm_id* cm_id, int backlog) > { > + struct c2_dev *c2dev; > ccwr_ep_listen_create_req_t wr; > ccwr_ep_listen_create_rep_t *reply; > struct c2_vq_req *vq_req; > int err; > > + c2dev = to_c2dev(cm_id->device); > + if (c2dev == NULL) > + return -EINVAL; > + > /* > * Allocate verbs request. > */ > @@ -115,15 +118,15 @@ > * Build the WR > */ > c2_wr_set_id(&wr, CCWR_EP_LISTEN_CREATE); > - wr.hdr.context = (unsigned long)vq_req; > + wr.hdr.context = (u64)(unsigned long)vq_req; > wr.rnic_handle = c2dev->adapter_handle; > - wr.local_addr = addr; /* already in Net Byte Order */ > - wr.local_port = port; /* already in Net Byte Order */ > + wr.local_addr = cm_id->local_addr.sin_addr.s_addr; > + wr.local_port = cm_id->local_addr.sin_port; > wr.backlog = cpu_to_be32(backlog); > - wr.user_context = (unsigned long)ep; > + wr.user_context = (u64)(unsigned long)cm_id; > > /* > - * reference the request struct. dereferenced in the int handler. > + * Reference the request struct. Dereferenced in the int handler. > */ > vq_req_get(c2dev, vq_req); > > @@ -160,12 +163,7 @@ > /* > * get the adapter handle > */ > - ep->adapter_handle = reply->ep_handle; > - if (port != reply->local_port) > - { > - // XXX > - //*p_port = reply->local_port; > - } > + cm_id->provider_id = reply->ep_handle; > > /* > * free vq stuff > @@ -184,13 +182,19 @@ > > > int > -c2_ep_listen_destroy(struct c2_dev *c2dev, struct c2_ep *ep) > +c2_llp_service_destroy(struct iw_cm_id* cm_id) > { > + > + struct c2_dev *c2dev; > ccwr_ep_listen_destroy_req_t wr; > ccwr_ep_listen_destroy_rep_t *reply; > struct c2_vq_req *vq_req; > int err; > > + c2dev = to_c2dev(cm_id->device); > + if (c2dev == NULL) > + return -EINVAL; > + > /* > * Allocate verbs request. > */ > @@ -205,7 +209,7 @@ > c2_wr_set_id(&wr, CCWR_EP_LISTEN_DESTROY); > wr.hdr.context = (unsigned long)vq_req; > wr.rnic_handle = c2dev->adapter_handle; > - wr.ep_handle = ep->adapter_handle; > + wr.ep_handle = cm_id->provider_id; > > /* > * reference the request struct. dereferenced in the int handler. > @@ -250,87 +254,20 @@ > > > int > -c2_cr_query(struct c2_dev *c2dev, u32 cr_id, > - struct c2_cr_query_attrs *cr_attrs) > +c2_llp_accept(struct iw_cm_id* cm_id, const void* pdata, u8 pdata_len) > { > - ccwr_ep_query_req_t wr; > - ccwr_ep_query_rep_t *reply; > - struct c2_vq_req *vq_req; > - int err; > + struct c2_dev *c2dev = to_c2dev(cm_id->device); > + struct c2_qp *qp = to_c2qp(cm_id->qp); > + ccwr_cr_accept_req_t *wr; /* variable length WR */ > + struct c2_vq_req *vq_req; > + ccwr_cr_accept_rep_t *reply; /* VQ Reply msg ptr. */ > + int err; > > - /* > - * Create and send a WR_EP_CREATE... 
> - */ > - vq_req = vq_req_alloc(c2dev); > - if (!vq_req) { > - return -ENOMEM; > - } > + /* Make sure there's a bound QP */ > + if (qp == 0) > + return -EINVAL; > > - /* > - * Build the WR > - */ > - c2_wr_set_id(&wr, CCWR_EP_QUERY); > - wr.hdr.context = (unsigned long)vq_req; > - wr.rnic_handle = c2dev->adapter_handle; > - wr.ep_handle = cr_id; > - > /* > - * reference the request struct. dereferenced in the int handler. > - */ > - vq_req_get(c2dev, vq_req); > - > - /* > - * Send WR to adapter > - */ > - err = vq_send_wr(c2dev, (ccwr_t*)&wr); > - if (err) { > - vq_req_put(c2dev, vq_req); > - goto bail0; > - } > - > - /* > - * Wait for reply from adapter > - */ > - err = vq_wait_for_reply(c2dev, vq_req); > - if (err) { > - goto bail0; > - } > - > - /* > - * Process reply > - */ > - reply = (ccwr_ep_query_rep_t*)(unsigned long)vq_req->reply_msg; > - if (!reply) { > - err = -ENOMEM; > - goto bail0; > - } > - if ( (err = c2_errno(reply)) != 0) { > - goto bail1; > - } > - > - cr_attrs->local_addr = reply->local_addr; > - cr_attrs->local_port = reply->local_port; > - cr_attrs->remote_addr = reply->remote_addr; > - cr_attrs->remote_port = reply->remote_port; > - > -bail1: > - vq_repbuf_free(c2dev, reply); > -bail0: > - vq_req_free(c2dev, vq_req); > - return err; > -} > - > - > -int > -c2_cr_accept(struct c2_dev *c2dev, u32 cr_id, struct c2_qp *qp, > - u32 pdata_len, u8 *pdata) > -{ > - ccwr_cr_accept_req_t *wr; /* variable length WR */ > - struct c2_vq_req *vq_req; > - ccwr_cr_accept_rep_t* reply; /* VQ Reply msg ptr. */ > - int err; > - > - /* > * only support the max private_data length > */ > if (pdata_len > CC_MAX_PRIVATE_DATA_SIZE) { > @@ -357,7 +294,7 @@ > c2_wr_set_id(wr, CCWR_CR_ACCEPT); > wr->hdr.context = (unsigned long)vq_req; > wr->rnic_handle = c2dev->adapter_handle; > - wr->ep_handle = cr_id; > + wr->ep_handle = (u32)cm_id->provider_id; > wr->qp_handle = qp->adapter_handle; > if (pdata) { > wr->private_data_length = cpu_to_be32(pdata_len); > @@ -407,15 +344,17 @@ > return err; > } > > - > int > -c2_cr_reject(struct c2_dev *c2dev, u32 cr_id) > +c2_llp_reject(struct iw_cm_id* cm_id, const void* pdata, u8 pdata_len) > { > + struct c2_dev *c2dev; > ccwr_cr_reject_req_t wr; > struct c2_vq_req *vq_req; > ccwr_cr_reject_rep_t *reply; > int err; > > + c2dev = to_c2dev(cm_id->device); > + > /* > * Allocate verbs request. > */ > @@ -430,7 +369,7 @@ > c2_wr_set_id(&wr, CCWR_CR_REJECT); > wr.hdr.context = (unsigned long)vq_req; > wr.rnic_handle = c2dev->adapter_handle; > - wr.ep_handle = cr_id; > + wr.ep_handle = (u32)cm_id->provider_id; > > /* > * reference the request struct. dereferenced in the int handler. > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Thu Dec 15 13:22:22 2005 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 15 Dec 2005 13:22:22 -0800 Subject: [openib-general] [PATCH] check create_srq in libibverbs In-Reply-To: (Shirley Ma's message of "Fri, 9 Dec 2005 17:11:03 -0700") References: Message-ID: Thanks, applied (at long last) From rdreier at cisco.com Thu Dec 15 13:24:32 2005 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 15 Dec 2005 13:24:32 -0800 Subject: [openib-general] Re: [PATCH] mthca: correct max_rd_atomic handling In-Reply-To: <20051215164337.GV26722@mellanox.co.il> (Michael S. 
Tsirkin's message of "Thu, 15 Dec 2005 18:43:37 +0200")
References: <20051215164337.GV26722@mellanox.co.il>
Message-ID: 

    Michael> Hardware guys confirmed that it does, as per spec:
    Michael> clearing these bits is the way to tell hardware that we
    Michael> have max_rd_atomic set to 0. I thought it's obvious from
    Michael> documentation: do you think this needs clarification?

I was just lazy and didn't even read the spec before asking.

 - R.

From rdreier at cisco.com Thu Dec 15 13:25:58 2005
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 15 Dec 2005 13:25:58 -0800
Subject: [openib-general] Re: ipoib: question
In-Reply-To: <20051214215915.GA18526@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 14 Dec 2005 23:59:15 +0200")
References: <20051214215915.GA18526@mellanox.co.il>
Message-ID: 

 > Is this better?
 >
 > -	return (struct ipoib_neigh **) (neigh->ha + 24 -
 > -					(offsetof(struct neighbour, ha) & 4));
 > +	return (void*)neigh + ALIGN(offsetof(struct neighbour, ha) + INFINIBAND_ALEN, x)

I guess so, with "x" replaced by "sizeof (void *)".

 - R.

From halr at voltaire.com Thu Dec 15 13:32:44 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Dec 2005 16:32:44 -0500
Subject: [openib-general] [PATCH] osmtest/osmt_service.c: Eliminate compile warnings with gcc version 4.0.0 20050519 (Red Hat 4.0.0-8)
Message-ID: <1134682364.4338.31.camel@hal.voltaire.com>

osmtest/osmt_service.c: Eliminate compile warnings with gcc version 4.0.0 20050519 (Red Hat 4.0.0-8)

Signed-off-by: Hal Rosenstock 

Index: osmt_service.c
===================================================================
--- osmt_service.c	(revision 4478)
+++ osmt_service.c	(working copy)
@@ -1071,7 +1071,8 @@ osmt_get_all_services_and_check_names( I
              "osmt_get_all_services_and_check_names: "
              "-I- Comparing source name : >%s<, with record name : >%s<, idx : %d\n",
              p_valid_service_names_arr[j],p_rec->service_name, p_checked_names[j]);
-    if ( strcmp(p_valid_service_names_arr[j],p_rec->service_name) == 0 )
+    if ( strcmp((const char *)p_valid_service_names_arr[j],
+                (const char *)p_rec->service_name) == 0 )
     {
       osm_log( &p_osmt->log, OSM_LOG_VERBOSE,
                "osmt_get_all_services_and_check_names: "

From mst at mellanox.co.il Thu Dec 15 13:49:27 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 15 Dec 2005 23:49:27 +0200
Subject: [openib-general] [PATCH] ipoib: fix destructor usage
Message-ID: <20051215214927.GA31053@mellanox.co.il>

IPoIB uses neighbour ops->destructor to clean up struct ipoib_neigh, but ignores the fact that multiple neighbour objects can share the same ops structure, so setting it to NULL affects multiple neighbours. Fix this by tracking all ipoib_neigh objects, and only clearing the destructor after no neighbour is going to use it. Note that the ops structure isn't per device, so we track them in a global list.

Signed-off-by: Michael S.
Tsirkin Index: linux-2.6.14/drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- linux-2.6.14.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2005-12-16 01:48:54.000000000 +0200 +++ linux-2.6.14/drivers/infiniband/ulp/ipoib/ipoib_main.c 2005-12-16 01:52:26.000000000 +0200 @@ -71,6 +71,9 @@ static const u8 ipv4_bcast_addr[] = { struct workqueue_struct *ipoib_workqueue; +static spinlock_t ipoib_neigh_ops_list_lock; +static LIST_HEAD(ipoib_neigh_ops_list); + static void ipoib_add_one(struct ib_device *device); static void ipoib_remove_one(struct ib_device *device); @@ -244,9 +247,8 @@ static void path_free(struct net_device */ if (neigh->ah) ipoib_put_ah(neigh->ah); - *to_ipoib_neigh(neigh->neighbour) = NULL; - neigh->neighbour->ops->destructor = NULL; - kfree(neigh); + + ipoib_neigh_free(neigh); } spin_unlock_irqrestore(&priv->lock, flags); @@ -474,7 +476,7 @@ static void neigh_add_path(struct sk_buf struct ipoib_path *path; struct ipoib_neigh *neigh; - neigh = kmalloc(sizeof *neigh, GFP_ATOMIC); + neigh = ipoib_neigh_alloc(skb->dst->neighbour); if (!neigh) { ++priv->stats.tx_dropped; dev_kfree_skb_any(skb); @@ -482,8 +484,6 @@ static void neigh_add_path(struct sk_buf } skb_queue_head_init(&neigh->queue); - neigh->neighbour = skb->dst->neighbour; - *to_ipoib_neigh(skb->dst->neighbour) = neigh; /* * We can only be called from ipoib_start_xmit, so we're @@ -526,11 +526,8 @@ static void neigh_add_path(struct sk_buf return; err: - *to_ipoib_neigh(skb->dst->neighbour) = NULL; list_del(&neigh->list); - neigh->neighbour->ops->destructor = NULL; - kfree(neigh); - + ipoib_neigh_free(neigh); ++priv->stats.tx_dropped; dev_kfree_skb_any(skb); @@ -757,8 +754,7 @@ static void ipoib_neigh_destructor(struc if (neigh->ah) ah = neigh->ah; list_del(&neigh->list); - *to_ipoib_neigh(n) = NULL; - kfree(neigh); + ipoib_neigh_free(neigh); } spin_unlock_irqrestore(&priv->lock, flags); @@ -767,23 +763,45 @@ static void ipoib_neigh_destructor(struc ipoib_put_ah(ah); } -static int ipoib_neigh_setup(struct neighbour *neigh) +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour) { + struct ipoib_neigh *neigh; + + neigh = kmalloc(sizeof *neigh, GFP_ATOMIC); + if (!neigh) + return NULL; + + neigh->neighbour = neighbour; + *to_ipoib_neigh(neighbour) = neigh; + /* * Is this kosher? I can't find anybody in the kernel that * sets neigh->destructor, so we should be able to set it here * without trouble. 
*/ - neigh->ops->destructor = ipoib_neigh_destructor; - - return 0; + spin_lock(&ipoib_neigh_ops_list_lock); + list_add_tail(&neigh->ops_list, &ipoib_neigh_ops_list); + neigh->neighbour->ops->destructor = ipoib_neigh_destructor; + spin_unlock(&ipoib_neigh_ops_list_lock); + return neigh; } -static int ipoib_neigh_setup_dev(struct net_device *dev, struct neigh_parms *parms) +void ipoib_neigh_free(struct ipoib_neigh *neigh) { - parms->neigh_setup = ipoib_neigh_setup; + struct ipoib_neigh *n; - return 0; + spin_lock(&ipoib_neigh_ops_list_lock); + list_del(&neigh->ops_list); + + list_for_each_entry(n, &ipoib_neigh_ops_list, ops_list) + if (n->neighbour->ops == neigh->neighbour->ops) + goto found; + + neigh->neighbour->ops->destructor = NULL; +found: + spin_unlock(&ipoib_neigh_ops_list_lock); + *to_ipoib_neigh(neigh->neighbour) = NULL; + kfree(neigh); } int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port) @@ -859,7 +877,6 @@ static void ipoib_setup(struct net_devic dev->tx_timeout = ipoib_timeout; dev->hard_header = ipoib_hard_header; dev->set_multicast_list = ipoib_set_mcast_list; - dev->neigh_setup = ipoib_neigh_setup_dev; dev->watchdog_timeo = HZ; @@ -1146,6 +1163,8 @@ static int __init ipoib_init_module(void if (ret) goto err_wq; + spin_lock_init(&ipoib_neigh_ops_list_lock); + return 0; err_wq: Index: linux-2.6.14/drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- linux-2.6.14.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2005-12-16 01:48:54.000000000 +0200 +++ linux-2.6.14/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2005-12-16 01:49:08.000000000 +0200 @@ -107,9 +107,7 @@ static void ipoib_mcast_free(struct ipoi list_for_each_entry_safe(neigh, tmp, &mcast->neigh_list, list) { if (neigh->ah) list_add_tail(&neigh->ah->list, &ah_list); - *to_ipoib_neigh(neigh->neighbour) = NULL; - neigh->neighbour->ops->destructor = NULL; - kfree(neigh); + ipoib_neigh_free(neigh); } spin_unlock_irqrestore(&priv->lock, flags); @@ -733,13 +731,11 @@ out: if (skb->dst && skb->dst->neighbour && !*to_ipoib_neigh(skb->dst->neighbour)) { - struct ipoib_neigh *neigh = kmalloc(sizeof *neigh, GFP_ATOMIC); + struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour); if (neigh) { kref_get(&mcast->ah->ref); neigh->ah = mcast->ah; - neigh->neighbour = skb->dst->neighbour; - *to_ipoib_neigh(skb->dst->neighbour) = neigh; list_add_tail(&neigh->list, &mcast->neigh_list); } } Index: linux-2.6.14/drivers/infiniband/ulp/ipoib/ipoib.h =================================================================== --- linux-2.6.14.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2005-12-16 01:48:54.000000000 +0200 +++ linux-2.6.14/drivers/infiniband/ulp/ipoib/ipoib.h 2005-12-16 01:50:46.000000000 +0200 @@ -215,6 +215,7 @@ struct ipoib_neigh { struct neighbour *neighbour; struct list_head list; + struct list_head ops_list; }; static inline struct ipoib_neigh **to_ipoib_neigh(struct neighbour *neigh) @@ -223,6 +224,9 @@ static inline struct ipoib_neigh **to_ip (offsetof(struct neighbour, ha) & 4)); } +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh); +void ipoib_neigh_free(struct ipoib_neigh *neigh); + extern struct workqueue_struct *ipoib_workqueue; /* functions */ -- MST From rdreier at cisco.com Thu Dec 15 13:47:37 2005 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 15 Dec 2005 13:47:37 -0800 Subject: [openib-general] Re: [PATCH] libmthca: fix SRQ cleanup during destroy-qp In-Reply-To: <20051215092303.GA27784@mellanox.co.il> 
(Jack Morgenstein's message of "Thu, 15 Dec 2005 11:23:03 +0200")
References: <20051215092303.GA27784@mellanox.co.il>
Message-ID: 

Thanks, applied.

From rdreier at cisco.com Thu Dec 15 13:54:35 2005
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 15 Dec 2005 13:54:35 -0800
Subject: [openib-general] Re: [PATCH] mthca thinko
In-Reply-To: <20051215164402.GW26722@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 15 Dec 2005 18:44:02 +0200")
References: <20051215164402.GW26722@mellanox.co.il>
Message-ID: 

Thanks, applied.

From mst at mellanox.co.il Thu Dec 15 14:02:14 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 16 Dec 2005 00:02:14 +0200
Subject: [openib-general] Re: ipoib: question
In-Reply-To: 
References: 
Message-ID: <20051215220214.GA31463@mellanox.co.il>

Quoting Roland Dreier :
> Subject: Re: ipoib: question
> 
> > Is this better?
> >
> > -	return (struct ipoib_neigh **) (neigh->ha + 24 -
> > -					(offsetof(struct neighbour, ha) & 4));
> > +	return (void*)neigh +
> >		ALIGN(offsetof(struct neighbour, ha) + INFINIBAND_ALEN, x)
> 
> I guess so, with "x" replaced by "sizeof (void *)".
> 
> - R.
> 

Right, there's also a ; missing - I hope you figured out it wasn't a real patch.
The below does compile.

Signed-off-by: Michael S. Tsirkin 

Index: linux-2.6.14/drivers/infiniband/ulp/ipoib/ipoib.h
===================================================================
--- linux-2.6.14.orig/drivers/infiniband/ulp/ipoib/ipoib.h	2005-12-16 02:15:55.000000000 +0200
+++ linux-2.6.14/drivers/infiniband/ulp/ipoib/ipoib.h	2005-12-16 02:39:42.000000000 +0200
@@ -219,8 +219,8 @@ struct ipoib_neigh {
 
 static inline struct ipoib_neigh **to_ipoib_neigh(struct neighbour *neigh)
 {
-	return (struct ipoib_neigh **) (neigh->ha + 24 -
-					(offsetof(struct neighbour, ha) & 4));
+	return (void*)neigh + ALIGN(offsetof(struct neighbour, ha) +
+				    INFINIBAND_ALEN, sizeof(void *));
 }
 
 extern struct workqueue_struct *ipoib_workqueue;

-- 
MST

From rdreier at cisco.com Thu Dec 15 14:20:34 2005
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 15 Dec 2005 14:20:34 -0800
Subject: [openib-general] Re: [patch] mthca: fix SRQ cleanup during destroy-qp
In-Reply-To: <20051215092618.GB27784@mellanox.co.il> (Jack Morgenstein's message of "Thu, 15 Dec 2005 11:26:18 +0200")
References: <20051215092618.GB27784@mellanox.co.il>
Message-ID: 

Thanks, applied.

From rdreier at cisco.com Thu Dec 15 14:36:34 2005
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 15 Dec 2005 14:36:34 -0800
Subject: [openib-general] Re: [PATCH] mthca: correct IB_QP_ACCESS_FLAGS handling
In-Reply-To: <20051212165859.GO14936@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 12 Dec 2005 18:58:59 +0200")
References: <20051212165859.GO14936@mellanox.co.il>
Message-ID: 

Thanks, applied.

From rdreier at cisco.com Thu Dec 15 14:39:45 2005
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 15 Dec 2005 14:39:45 -0800
Subject: [openib-general] Re: [PATCH] mthca: correct max_rd_atomic handling
In-Reply-To: <20051213090919.GW14936@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 13 Dec 2005 11:09:19 +0200")
References: <20051213090919.GW14936@mellanox.co.il>
Message-ID: 

Hmm, I'm not sure about this change any more now. Why is setting swe being tied to sra_max? We should be able to post RDMA writes even if reads/atomics are disabled, right?

 - R.

From mst at mellanox.co.il Thu Dec 15 15:09:55 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 16 Dec 2005 01:09:55 +0200
Subject: [openib-general] Re: [PATCH] mthca: correct max_rd_atomic handling
In-Reply-To: 
References: 
Message-ID: <20051215230955.GA31616@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: Re: [PATCH] mthca: correct max_rd_atomic handling
> 
> Hmm, I'm not sure about this change any more now. Why is setting swe
> being tied to sra_max?

Note it's tied to IB_QP_MAX_QP_RD_ATOMIC - the attribute bit, not the value.

Well, as I see it SWE isn't really used in IB spec, but hardware needs it. Our hardware requires/allows you to set this bit in exactly the same transitions where sre and sae are required/optional, and this unsurprisingly is when IB_QP_MAX_QP_RD_ATOMIC is required/optional. In other transitions it's supposed to be 0 since that's the value reserved bits should have.

VAPI and current mthca code seem to always set this bit: this seems to violate what the documentation says, but seems to work nevertheless.

> We should be able to post RDMA writes even if
> reads/atomics are disabled, right?
> 
> - R.
> 

Right. That's why we set it unconditionally when the IB_QP_MAX_QP_RD_ATOMIC attribute bit is set, and IB_QP_MAX_QP_RD_ATOMIC is required to go to RTS.

-- 
MST

From rdreier at cisco.com Thu Dec 15 15:10:26 2005
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 15 Dec 2005 15:10:26 -0800
Subject: [openib-general] Re: [PATCH] mthca: correct max_rd_atomic handling
In-Reply-To: <20051215230955.GA31616@mellanox.co.il> (Michael S. Tsirkin's message of "Fri, 16 Dec 2005 01:09:55 +0200")
References: <20051215230955.GA31616@mellanox.co.il>
Message-ID: 

    Michael> Right. That's why we set it unconditionally when the
    Michael> IB_QP_MAX_QP_RD_ATOMIC attribute bit is set, and
    Michael> IB_QP_MAX_QP_RD_ATOMIC is required to go to RTS.

Not for UC transport... I think this patch would break RDMA on UC QPs, right?

 - R.

From mst at mellanox.co.il Thu Dec 15 15:44:10 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 16 Dec 2005 01:44:10 +0200
Subject: [openib-general] Re: [PATCH] mthca: correct max_rd_atomic handling
In-Reply-To: 
References: 
Message-ID: <20051215234410.GB31616@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: Re: [PATCH] mthca: correct max_rd_atomic handling
> 
>     Michael> Right. That's why we set it unconditionally when the
>     Michael> IB_QP_MAX_QP_RD_ATOMIC attribute bit is set, and
>     Michael> IB_QP_MAX_QP_RD_ATOMIC is required to go to RTS.
> 
> Not for UC transport... I think this patch would break RDMA on UC QPs, right?

Ugh. Looks right. It did seem to work ... go figure. So, let's set MTHCA_QP_BIT_SWE together with MTHCA_FLIGHT_LIMIT as we did previously. The other bits are correct, though, aren't they?

Like this (untested: I'm out of the lab for the weekend).

Signed-off-by: Jack Morgenstein 
Signed-off-by: Michael S. Tsirkin

Index: openib/drivers/infiniband/hw/mthca/mthca_qp.c
===================================================================
--- openib/drivers/infiniband/hw/mthca/mthca_qp.c	(revision 4489)
+++ openib/drivers/infiniband/hw/mthca/mthca_qp.c	(working copy)
@@ -715,9 +715,7 @@ int mthca_modify_qp(struct ib_qp *ibqp,
 		qp_context->wqe_lkey = cpu_to_be32(qp->mr.ibmr.lkey);
 	qp_context->params1 = cpu_to_be32((MTHCA_ACK_REQ_FREQ << 28) |
 					  (MTHCA_FLIGHT_LIMIT << 24) |
-					  MTHCA_QP_BIT_SRE |
-					  MTHCA_QP_BIT_SWE |
-					  MTHCA_QP_BIT_SAE);
+					  MTHCA_QP_BIT_SWE);
 	if (qp->sq_policy == IB_SIGNAL_ALL_WR)
 		qp_context->params1 |= cpu_to_be32(MTHCA_QP_BIT_SSC);
 	if (attr_mask & IB_QP_RETRY_CNT) {
@@ -726,9 +724,13 @@ int mthca_modify_qp(struct ib_qp *ibqp,
 	}
 
 	if (attr_mask & IB_QP_MAX_QP_RD_ATOMIC) {
-		if (attr->max_rd_atomic)
+		if (attr->max_rd_atomic) {
+			qp_context->params1 |=
+				cpu_to_be32(MTHCA_QP_BIT_SRE |
+					    MTHCA_QP_BIT_SAE);
 			qp_context->params1 |=
 				cpu_to_be32(fls(attr->max_rd_atomic - 1) << 21);
+		}
 		qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_SRA_MAX);
 	}

-- 
MST

From jlentini at netapp.com Thu Dec 15 17:55:31 2005
From: jlentini at netapp.com (James Lentini)
Date: Thu, 15 Dec 2005 20:55:31 -0500 (EST)
Subject: [openib-general] Re: [PATCH][uDAPL] openib_cma provider update
In-Reply-To: 
References: 
Message-ID: 

On Fri, 9 Dec 2005, Arlin Davis wrote:

 > James,
 >
 > I modified the IP address lookup during the open to take either a
 > network name, network address, or device name. This will make the
 > dat.conf setup a little easier and more flexible. I updated the
 > README, and /doc/dat.conf with details.
 >
 > Thanks,
 >
 > -arlin

Committed in 4501.

From jlentini at netapp.com Thu Dec 15 18:00:35 2005
From: jlentini at netapp.com (James Lentini)
Date: Thu, 15 Dec 2005 21:00:35 -0500 (EST)
Subject: [openib-general] Re: [PATCH][uDAPL] openib_scm uses incorrect rd_atomic values for modify_qp
In-Reply-To: 
References: 
Message-ID: 

 > James,
 >
 > Here is a fix for openib socket cm version. I ran into a problem
 > with the latest verbs qp_modify as a result of incorrect rd_atomic
 > values so I modified to use the values returned from the
 > ibv_query_device() instead of hard coded values.
 >
 > -arlin

Committed in revision 4502.

From rdreier at cisco.com Thu Dec 15 20:00:05 2005
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 15 Dec 2005 20:00:05 -0800
Subject: [openib-general] Re: [PATCH] mthca: correct max_rd_atomic handling
In-Reply-To: <20051215234410.GB31616@mellanox.co.il> (Michael S. Tsirkin's message of "Fri, 16 Dec 2005 01:44:10 +0200")
References: <20051215234410.GB31616@mellanox.co.il>
Message-ID: 

Looks right to me -- I applied it.

From rolandd at cisco.com Thu Dec 15 20:00:17 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Fri, 16 Dec 2005 04:00:17 +0000
Subject: [openib-general] [git patch review 2/7] IB/mthca: correct log2 calculation
In-Reply-To: <1134705617067-b51dec64cec55f52@cisco.com>
Message-ID: <1134705617067-bb88e1b23a3e36b6@cisco.com>

Fix thinko in rd_atomic calculation: ffs(x) - 1 does not find the next power of 2 -- it should be fls(x - 1).

Signed-off-by: Jack Morgenstein 
Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Roland Dreier 
---
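A quick standalone illustration of the thinko, for readers skimming the archive. This is a user-space sketch, not part of the patch; my_ffs()/my_fls() mimic the kernel's 1-based ffs()/fls() semantics and are not the kernel functions themselves:

	#include <stdio.h>

	/* 1-based index of the lowest/highest set bit; 0 for x == 0,
	 * matching the kernel's ffs()/fls() conventions */
	static int my_ffs(unsigned int x) { return __builtin_ffs(x); }
	static int my_fls(unsigned int x) { return x ? 32 - __builtin_clz(x) : 0; }

	int main(void)
	{
		unsigned int x;

		/* For x = 5: my_ffs(5) - 1 == 0, so 1 << 0 == 1 < 5 (too small),
		 * while my_fls(5 - 1) == 3, and 1 << 3 == 8 >= 5 (correct).
		 * The two expressions only agree when x is already a power of 2. */
		for (x = 1; x <= 8; ++x)
			printf("x=%u  ffs(x)-1=%d  fls(x-1)=%d\n",
			       x, my_ffs(x) - 1, my_fls(x - 1));
		return 0;
	}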
 drivers/infiniband/hw/mthca/mthca_qp.c |   17 ++++++-----------
 1 files changed, 6 insertions(+), 11 deletions(-)

6aa2e4e8063114bd7cea8616dd5848d3c64b4c36
diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c
index c5c3d0e..84056a8 100644
--- a/drivers/infiniband/hw/mthca/mthca_qp.c
+++ b/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -728,9 +728,9 @@ int mthca_modify_qp(struct ib_qp *ibqp,
 	}
 
 	if (attr_mask & IB_QP_MAX_QP_RD_ATOMIC) {
-		qp_context->params1 |= cpu_to_be32(min(attr->max_rd_atomic ?
-						       ffs(attr->max_rd_atomic) - 1 : 0,
-						       7) << 21);
+		if (attr->max_rd_atomic)
+			qp_context->params1 |=
+				cpu_to_be32(fls(attr->max_rd_atomic - 1) << 21);
 		qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_SRA_MAX);
 	}
 
@@ -769,8 +769,6 @@ int mthca_modify_qp(struct ib_qp *ibqp,
 	}
 
 	if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC) {
-		u8 rra_max;
-
 		if (qp->resp_depth && !attr->max_dest_rd_atomic) {
 			/*
 			 * Lowering our responder resources to zero.
@@ -798,13 +796,10 @@ int mthca_modify_qp(struct ib_qp *ibqp,
 							MTHCA_QP_OPTPAR_RAE);
 		}
 
-		for (rra_max = 0;
-		     1 << rra_max < attr->max_dest_rd_atomic &&
-			     rra_max < dev->qp_table.rdb_shift;
-		     ++rra_max)
-			; /* nothing */
+		if (attr->max_dest_rd_atomic)
+			qp_context->params2 |=
+				cpu_to_be32(fls(attr->max_dest_rd_atomic - 1) << 21);
 
-		qp_context->params2 |= cpu_to_be32(rra_max << 21);
 		qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RRA_MAX);
 
 		qp->resp_depth = attr->max_dest_rd_atomic;
-- 
0.99.9n

From rolandd at cisco.com Thu Dec 15 20:00:17 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Fri, 16 Dec 2005 04:00:17 +0000
Subject: [openib-general] [git patch review 3/7] IB/mthca: don't change driver's copy of attributes if modify QP fails
In-Reply-To: <1134705617067-bb88e1b23a3e36b6@cisco.com>
Message-ID: <1134705617068-301e9a8555929947@cisco.com>

Only change the driver's copy of the QP attributes in modify QP after checking the modify QP command completed successfully.

Signed-off-by: Jack Morgenstein 
Signed-off-by: Michael S.
Tsirkin Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mthca/mthca_qp.c | 11 ++++++----- 1 files changed, 6 insertions(+), 5 deletions(-) 44b5b0303327cfb23f135b95b2fe5436c81ed27c diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c index 84056a8..3543299 100644 --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -764,8 +764,6 @@ int mthca_modify_qp(struct ib_qp *ibqp, qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RWE | MTHCA_QP_OPTPAR_RRE | MTHCA_QP_OPTPAR_RAE); - - qp->atomic_rd_en = attr->qp_access_flags; } if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC) { @@ -801,8 +799,6 @@ int mthca_modify_qp(struct ib_qp *ibqp, cpu_to_be32(fls(attr->max_dest_rd_atomic - 1) << 21); qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RRA_MAX); - - qp->resp_depth = attr->max_dest_rd_atomic; } qp_context->params2 |= cpu_to_be32(MTHCA_QP_BIT_RSC); @@ -844,8 +840,13 @@ int mthca_modify_qp(struct ib_qp *ibqp, err = -EINVAL; } - if (!err) + if (!err) { qp->state = new_state; + if (attr_mask & IB_QP_ACCESS_FLAGS) + qp->atomic_rd_en = attr->qp_access_flags; + if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC) + qp->resp_depth = attr->max_dest_rd_atomic; + } mthca_free_mailbox(dev, mailbox); -- 0.99.9n From rolandd at cisco.com Thu Dec 15 20:00:17 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 16 Dec 2005 04:00:17 +0000 Subject: [openib-general] [git patch review 5/7] IB/mthca: Fix SRQ cleanup during QP destroy In-Reply-To: <1134705617068-92874cd0c1ec02ff@cisco.com> Message-ID: <1134705617068-ccaf1fa200ee1176@cisco.com> When cleaning up a CQ for a QP attached to SRQ, need to free an SRQ WQE only if the CQE is a receive completion. Signed-off-by: Jack Morgenstein Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mthca/mthca_cq.c | 11 ++++++++++- 1 files changed, 10 insertions(+), 1 deletions(-) 576d2e4e40315e8140c04be99cd057720d8a3817 diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c index 4a8adce..fcef8dc 100644 --- a/drivers/infiniband/hw/mthca/mthca_cq.c +++ b/drivers/infiniband/hw/mthca/mthca_cq.c @@ -253,6 +253,15 @@ void mthca_cq_event(struct mthca_dev *de wake_up(&cq->wait); } +static inline int is_recv_cqe(struct mthca_cqe *cqe) +{ + if ((cqe->opcode & MTHCA_ERROR_CQE_OPCODE_MASK) == + MTHCA_ERROR_CQE_OPCODE_MASK) + return !(cqe->opcode & 0x01); + else + return !(cqe->is_send & 0x80); +} + void mthca_cq_clean(struct mthca_dev *dev, u32 cqn, u32 qpn, struct mthca_srq *srq) { @@ -296,7 +305,7 @@ void mthca_cq_clean(struct mthca_dev *de while ((int) --prod_index - (int) cq->cons_index >= 0) { cqe = get_cqe(cq, prod_index & cq->ibcq.cqe); if (cqe->my_qpn == cpu_to_be32(qpn)) { - if (srq) + if (srq && is_recv_cqe(cqe)) mthca_free_srq_wqe(srq, be32_to_cpu(cqe->wqe)); ++nfreed; } else if (nfreed) -- 0.99.9n From rolandd at cisco.com Thu Dec 15 20:00:17 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 16 Dec 2005 04:00:17 +0000 Subject: [openib-general] [git patch review 4/7] IB/mthca: Fix thinko in mthca_table_find() In-Reply-To: <1134705617068-301e9a8555929947@cisco.com> Message-ID: <1134705617068-92874cd0c1ec02ff@cisco.com> break only escapes from the innermost loop, and we want to escape both loops and return an answer. Noticed by Ishai Rabinovitch. Signed-off-by: Michael S. 
Tsirkin Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mthca/mthca_memfree.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) 6c7d2a75b512c64c910b69adf32dbaddb461910b diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c index 5798ed0..9fb985a 100644 --- a/drivers/infiniband/hw/mthca/mthca_memfree.c +++ b/drivers/infiniband/hw/mthca/mthca_memfree.c @@ -233,7 +233,7 @@ void *mthca_table_find(struct mthca_icm_ for (i = 0; i < chunk->npages; ++i) { if (chunk->mem[i].length >= offset) { page = chunk->mem[i].page; - break; + goto out; } offset -= chunk->mem[i].length; } -- 0.99.9n From rolandd at cisco.com Thu Dec 15 20:00:17 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 16 Dec 2005 04:00:17 +0000 Subject: [openib-general] [git patch review 7/7] IB/mthca: Fix corner cases in max_rd_atomic value handling in modify QP In-Reply-To: <1134705617068-7e5f92b5a82fa6a2@cisco.com> Message-ID: <1134705617068-3687c807077d2ef3@cisco.com> sae and sre bits should only be set when setting sra_max. Further, in the old code, if the caller specifies max_rd_atomic = 0, the sre and sae bits are still set, with the result that the QP ends up with max_rd_atomic = 1 in effect. Signed-off-by: Jack Morgenstein Signed-off-by: Michael S. Tsirkin Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mthca/mthca_qp.c | 10 ++++++---- 1 files changed, 6 insertions(+), 4 deletions(-) c4342d8a4d95e18b957b898dbf5bfce28fca2780 diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c index e826c9f..d786ef4 100644 --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -747,9 +747,7 @@ int mthca_modify_qp(struct ib_qp *ibqp, qp_context->wqe_lkey = cpu_to_be32(qp->mr.ibmr.lkey); qp_context->params1 = cpu_to_be32((MTHCA_ACK_REQ_FREQ << 28) | (MTHCA_FLIGHT_LIMIT << 24) | - MTHCA_QP_BIT_SRE | - MTHCA_QP_BIT_SWE | - MTHCA_QP_BIT_SAE); + MTHCA_QP_BIT_SWE); if (qp->sq_policy == IB_SIGNAL_ALL_WR) qp_context->params1 |= cpu_to_be32(MTHCA_QP_BIT_SSC); if (attr_mask & IB_QP_RETRY_CNT) { @@ -758,9 +756,13 @@ int mthca_modify_qp(struct ib_qp *ibqp, } if (attr_mask & IB_QP_MAX_QP_RD_ATOMIC) { - if (attr->max_rd_atomic) + if (attr->max_rd_atomic) { + qp_context->params1 |= + cpu_to_be32(MTHCA_QP_BIT_SRE | + MTHCA_QP_BIT_SAE); qp_context->params1 |= cpu_to_be32(fls(attr->max_rd_atomic - 1) << 21); + } qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_SRA_MAX); } -- 0.99.9n From rolandd at cisco.com Thu Dec 15 20:00:17 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 16 Dec 2005 04:00:17 +0000 Subject: [openib-general] [git patch review 1/7] IB/mthca: check RDMA limits Message-ID: <1134705617067-b51dec64cec55f52@cisco.com> Add limit checking on rd_atomic and dest_rd_atomic attributes: especially for max_dest_rd_atomic, a value that is larger than HCA capability can cause RDB overflow and corruption of another QP. Signed-off-by: Jack Morgenstein Signed-off-by: Michael S. 
Tsirkin Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mthca/mthca_qp.c | 14 ++++++++++++++ 1 files changed, 14 insertions(+), 0 deletions(-) 94361cf74a6fca1973d2fed5338d5fb4bcd902fa diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c index 7450550..c5c3d0e 100644 --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -591,6 +591,20 @@ int mthca_modify_qp(struct ib_qp *ibqp, return -EINVAL; } + if (attr_mask & IB_QP_MAX_QP_RD_ATOMIC && + attr->max_rd_atomic > dev->limits.max_qp_init_rdma) { + mthca_dbg(dev, "Max rdma_atomic as initiator %u too large (max is %d)\n", + attr->max_rd_atomic, dev->limits.max_qp_init_rdma); + return -EINVAL; + } + + if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC && + attr->max_dest_rd_atomic > 1 << dev->qp_table.rdb_shift) { + mthca_dbg(dev, "Max rdma_atomic as responder %u too large (max %d)\n", + attr->max_dest_rd_atomic, 1 << dev->qp_table.rdb_shift); + return -EINVAL; + } + mailbox = mthca_alloc_mailbox(dev, GFP_KERNEL); if (IS_ERR(mailbox)) return PTR_ERR(mailbox); -- 0.99.9n From rolandd at cisco.com Thu Dec 15 20:00:17 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 16 Dec 2005 04:00:17 +0000 Subject: [openib-general] [git patch review 6/7] IB/mthca: Fix IB_QP_ACCESS_FLAGS handling. In-Reply-To: <1134705617068-ccaf1fa200ee1176@cisco.com> Message-ID: <1134705617068-7e5f92b5a82fa6a2@cisco.com> This patch corrects some corner cases in managing the RAE/RRE bits in the mthca qp context. These bits need to be zero if the user requests max_dest_rd_atomic of zero. The bits need to be restored to the value implied by the qp access flags attribute in a previous (or the current) modify-qp command if the dest_rd_atomic variable is changed to non-zero. In the current implementation, the following scenario will not work: RESET-to-INIT set QP access flags to all disabled (zeroes) INIT-to-RTR set max_dest_rd_atomic=10, AND set qp_access_flags = IB_ACCESS_REMOTE_READ | IB_ACCESS_REMOTE_ATOMIC The current code will incorrectly take the access-flags value set in the RESET-to-INIT transition. We can simplify, and correct, this IB_QP_ACCESS_FLAGS handling: it is always safe to set qp access flags in the firmware command if either of IB_QP_MAX_DEST_RD_ATOMIC or IB_QP_ACCESS_FLAGS is set, so let's just set it to the correct value, always. Signed-off-by: Jack Morgenstein Signed-off-by: Michael S. 
Tsirkin Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mthca/mthca_qp.c | 87 ++++++++++++++------------------ 1 files changed, 37 insertions(+), 50 deletions(-) d1646f86a2a05a956adbb163c81a81bd621f055e diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c index 3543299..e826c9f 100644 --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -522,6 +522,36 @@ static void init_port(struct mthca_dev * mthca_warn(dev, "INIT_IB returned status %02x.\n", status); } +static __be32 get_hw_access_flags(struct mthca_qp *qp, struct ib_qp_attr *attr, + int attr_mask) +{ + u8 dest_rd_atomic; + u32 access_flags; + u32 hw_access_flags = 0; + + if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC) + dest_rd_atomic = attr->max_dest_rd_atomic; + else + dest_rd_atomic = qp->resp_depth; + + if (attr_mask & IB_QP_ACCESS_FLAGS) + access_flags = attr->qp_access_flags; + else + access_flags = qp->atomic_rd_en; + + if (!dest_rd_atomic) + access_flags &= IB_ACCESS_REMOTE_WRITE; + + if (access_flags & IB_ACCESS_REMOTE_READ) + hw_access_flags |= MTHCA_QP_BIT_RRE; + if (access_flags & IB_ACCESS_REMOTE_ATOMIC) + hw_access_flags |= MTHCA_QP_BIT_RAE; + if (access_flags & IB_ACCESS_REMOTE_WRITE) + hw_access_flags |= MTHCA_QP_BIT_RWE; + + return cpu_to_be32(hw_access_flags); +} + int mthca_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask) { struct mthca_dev *dev = to_mdev(ibqp->device); @@ -743,57 +773,7 @@ int mthca_modify_qp(struct ib_qp *ibqp, qp_context->snd_db_index = cpu_to_be32(qp->sq.db_index); } - if (attr_mask & IB_QP_ACCESS_FLAGS) { - qp_context->params2 |= - cpu_to_be32(attr->qp_access_flags & IB_ACCESS_REMOTE_WRITE ? - MTHCA_QP_BIT_RWE : 0); - - /* - * Only enable RDMA reads and atomics if we have - * responder resources set to a non-zero value. - */ - if (qp->resp_depth) { - qp_context->params2 |= - cpu_to_be32(attr->qp_access_flags & IB_ACCESS_REMOTE_READ ? - MTHCA_QP_BIT_RRE : 0); - qp_context->params2 |= - cpu_to_be32(attr->qp_access_flags & IB_ACCESS_REMOTE_ATOMIC ? - MTHCA_QP_BIT_RAE : 0); - } - - qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RWE | - MTHCA_QP_OPTPAR_RRE | - MTHCA_QP_OPTPAR_RAE); - } - if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC) { - if (qp->resp_depth && !attr->max_dest_rd_atomic) { - /* - * Lowering our responder resources to zero. - * Turn off reads RDMA and atomics as responder. - * (RRE/RAE in params2 already zero) - */ - qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RRE | - MTHCA_QP_OPTPAR_RAE); - } - - if (!qp->resp_depth && attr->max_dest_rd_atomic) { - /* - * Increasing our responder resources from - * zero. Turn on RDMA reads and atomics as - * appropriate. - */ - qp_context->params2 |= - cpu_to_be32(qp->atomic_rd_en & IB_ACCESS_REMOTE_READ ? - MTHCA_QP_BIT_RRE : 0); - qp_context->params2 |= - cpu_to_be32(qp->atomic_rd_en & IB_ACCESS_REMOTE_ATOMIC ? 
-					MTHCA_QP_BIT_RAE : 0);
-
-			qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RRE |
-								MTHCA_QP_OPTPAR_RAE);
-		}
-
 		if (attr->max_dest_rd_atomic)
 			qp_context->params2 |=
 				cpu_to_be32(fls(attr->max_dest_rd_atomic - 1) << 21);
@@ -801,6 +781,13 @@ int mthca_modify_qp(struct ib_qp *ibqp,
 		qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RRA_MAX);
 	}
 
+	if (attr_mask & (IB_QP_ACCESS_FLAGS | IB_QP_MAX_DEST_RD_ATOMIC)) {
+		qp_context->params2 |= get_hw_access_flags(qp, attr, attr_mask);
+		qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RWE |
+							MTHCA_QP_OPTPAR_RRE |
+							MTHCA_QP_OPTPAR_RAE);
+	}
+
 	qp_context->params2 |= cpu_to_be32(MTHCA_QP_BIT_RSC);
 
 	if (ibqp->srq)
-- 
0.99.9n

From mst at mellanox.co.il Fri Dec 16 05:57:40 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 16 Dec 2005 15:57:40 +0200
Subject: [openib-general] [RFC] IB_AT_MOST
Message-ID: <20051216135740.GA8300@mellanox.co.il>

Hi!
I recently noted that some middleware seems to use the "as much as possible" approach, for example, using the maximum possible value for max_rd_atomic or other fields, in create/modify qp.

An obvious thing could be to perform query_device and use max. values from there. However, it turns out that hardware max supported values might not be easy to express in terms of a single constant. Consider for example the max number of s/g entries supported per WQE: mellanox HCAs support a different number of these for RC and UD QPs. So whatever single number query device reports, using it will never achieve what the user wants for all QP types.

Rather than extending the device query for all thinkable hardware weirdness, I'd like to propose, instead, the following API extension (below): passing a negative value in e.g. a qp attribute would have the meaning: let hardware use at most the specified value. This, as opposed to the usual "at least the specified value" meaning for positive values.

How does the following work, for an API? Please comment.

Thanks,
MST

Index: openib/drivers/infiniband/include/rdma/ib_verbs.h
===================================================================
--- openib/drivers/infiniband/include/rdma/ib_verbs.h	(revision 4369)
+++ openib/drivers/infiniband/include/rdma/ib_verbs.h	(working copy)
@@ -56,6 +56,8 @@
 	class_device_create(cls, devt, device, fmt, ## arg)
 #endif /* XXX end of hack */
 
+#define IB_AT_MOST(x) (-(x))
+
 union ib_gid {
 	u8			raw[16];
 	struct {

-- 
MST
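A concrete sketch of the proposed semantics, for readers of the archive. This is hypothetical consumer code, not part of the posted patch, and it assumes the attribute field in question is signed or widened so that a negative request survives — a detail the RFC leaves open:

	struct ib_qp_attr attr;
	int err;

	/* Today this requests "at least 4"; ib_modify_qp() fails if the
	 * HCA cannot support 4 outstanding RDMA reads as initiator. */
	attr.max_rd_atomic = 4;

	/* Under the proposal this would request "as many as the hardware
	 * supports, but no more than 4"; the core or the provider would
	 * clamp the value instead of failing. */
	attr.max_rd_atomic = IB_AT_MOST(4);

	err = ib_modify_qp(qp, &attr, IB_QP_MAX_QP_RD_ATOMIC);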
From jlentini at netapp.com Fri Dec 16 06:38:51 2005
From: jlentini at netapp.com (James Lentini)
Date: Fri, 16 Dec 2005 09:38:51 -0500 (EST)
Subject: [openib-general] Re: [kDAPL]questions about the LMR creation of different types of memory
In-Reply-To: <7b2fa1820512081742j7ef50a27kc2322cbf0e52d908@mail.gmail.com>
References: <7b2fa1820512080626kf4c9c23hdc3f416dcb970f6d@mail.gmail.com> <7b2fa1820512081742j7ef50a27kc2322cbf0e52d908@mail.gmail.com>
Message-ID: 

ian> Question 1: How to distinguish a address that the adapter can use
ian> from that the adapter cannot use? Could you give an example? I am
ian> really not very familiar with the I/O address details.

Take a look at Documentation/IO-mapping.txt in the Linux source tree.

ian> Question 2: Which memory type should be use given a continuous
ian> range of physical memory? It seems simpler to use the
ian> DAT_MEM_TYPE_IA type since no translation is needed. But is not
ian> there any limitation to the memory to be registered using the
ian> DAT_MEM_TYPE_IA, contrasted with the DAT_MEM_PHYSICAL type?

If you have physical memory, use DAT_MEM_TYPE_PHYSICAL. If there is only 1 region, the array should only have 1 element.

From jlentini at netapp.com Fri Dec 16 06:42:24 2005
From: jlentini at netapp.com (James Lentini)
Date: Fri, 16 Dec 2005 09:42:24 -0500 (EST)
Subject: [openib-general] [kDAPL]Need the array of physical pages be continuous when using dat_lmr_kcreate
In-Reply-To: <7b2fa1820512120637m2869e1fdjf4c962decc4de9ae@mail.gmail.com>
References: <7b2fa1820512120637m2869e1fdjf4c962decc4de9ae@mail.gmail.com>
Message-ID: 

On Mon, 12 Dec 2005, Ian Jiang wrote:

ian> I created a LMR from three buffers which were allocated
ian> respectively with kmalloc of size 64kB. The registration went
ian> well, but the subsequent rdma read dto completed with a
ian> DAT_DTO_ERR_LOCAL_PROTECTION error. Was that because the physical
ian> address of the three buffers were not continuous?

Was the LMR the target or the destination of the RDMA read? What dat_mem_priv_flags values are you passing to dat_lmr_kcreate? If you are unsure of what to use, try DAT_MEM_PRIV_ALL_FLAG.

From swise at opengridcomputing.com Fri Dec 16 09:00:06 2005
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 16 Dec 2005 11:00:06 -0600
Subject: [openib-general] [PATCH] [IWARP BRANCH] - hang on reset from client
Message-ID: <1134752406.13678.19.camel@stevo-desktop>

The c2 provider was not caching the iw cm_id in the qp for the passive side connection. This causes an OOPS in the interrupt path when a RST is received on the passive side.

Signed-off-by: Steve Wise 

Index: c2_provider.c
===================================================================
--- c2_provider.c	(revision 4497)
+++ c2_provider.c	(working copy)
@@ -583,9 +583,13 @@
 static int c2_accept(struct iw_cm_id* cm_id, const void *pdata, u8 pdata_len)
 {
 	int err;
+	struct c2_qp* qp = container_of(cm_id->qp, struct c2_qp, ibqp);
 
 	dprintk("%s:%s:%u\n", __FILE__, __FUNCTION__, __LINE__);
 
+	/* Cache the cm_id in the qp */
+	qp->cm_id = cm_id;
+
 	err = c2_llp_accept(cm_id, pdata, pdata_len);
 
 	return err;
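For readers unfamiliar with the idiom in the first added line: container_of() recovers the driver-private structure from a pointer to a member embedded inside it. A minimal sketch of what the macro boils down to, assuming the usual kernel definition (illustration only, not code from the branch):

	struct c2_qp {
		struct ib_qp ibqp;	/* embedded member */
		struct iw_cm_id *cm_id;	/* back-pointer this fix caches */
		/* ... */
	};

	/* container_of(cm_id->qp, struct c2_qp, ibqp) is essentially: */
	struct c2_qp *qp = (struct c2_qp *)
		((char *)cm_id->qp - offsetof(struct c2_qp, ibqp));

With the cm_id cached on both the active side (c2_connect, in the earlier iwcm rework posted to this list) and now the passive side, the interrupt path that sees only the QP on an incoming RST can find its way back to the connection identifier.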
From iod00d at hp.com Fri Dec 16 10:03:38 2005
From: iod00d at hp.com (Grant Grundler)
Date: Fri, 16 Dec 2005 10:03:38 -0800
Subject: [openib-general] Re: [kDAPL]questions about the LMR creation of different types of memory
In-Reply-To: 
References: <7b2fa1820512080626kf4c9c23hdc3f416dcb970f6d@mail.gmail.com> <7b2fa1820512081742j7ef50a27kc2322cbf0e52d908@mail.gmail.com>
Message-ID: <20051216180338.GC8493@esmail.cup.hp.com>

On Fri, Dec 16, 2005 at 09:38:51AM -0500, James Lentini wrote:
> ian> Question 1: How to distinguish a address that the adapter can use
> ian> from that the adapter cannot use? Could you give an example? I am
> ian> really not very familiar with the I/O address details.
> 
> Take a look at Documentation/IO-mapping.txt in the Linux source tree.

While IO-mapping.txt gives a nice introduction into the topic of "bus addresses", the answer to the question lies in Documentation/DMA-API.txt. IO devices can only use "bus addresses" that are handed back by the interfaces described in DMA-API.txt. For OpenIB, ULPs (e.g. SDP or IPoIB) are responsible for properly mapping and unmapping for DMA use.

While many architectures don't use IOMMU (and thus have 1:1 between host physical:bus address), virtualization seems to be forcing the issue in the "near" future. All DMA access will need to be enforced to isolate virtualized guests. This is something some platforms with IOMMUs enforce today (e.g. Sparc64, PPC64 and PA-RISC).

hth,
grant

From rdreier at cisco.com Fri Dec 16 12:38:21 2005
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 16 Dec 2005 12:38:21 -0800
Subject: [openib-general] Re: [PATCH] libibverbs: document immediate data ordering
In-Reply-To: <20051201100252.GS25751@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 1 Dec 2005 12:02:52 +0200")
References: <20051201100252.GS25751@mellanox.co.il>
Message-ID: 

Thanks, applied.

From ftillier at silverstorm.com Fri Dec 16 12:42:51 2005
From: ftillier at silverstorm.com (Fab Tillier)
Date: Fri, 16 Dec 2005 12:42:51 -0800
Subject: [openib-general] [RFC] IB_AT_MOST
In-Reply-To: <20051216135740.GA8300@mellanox.co.il>
Message-ID: <000201c60281$4cf5c610$6401a8c0@infiniconsys.com>

Hi Michael,

> From: Michael S. Tsirkin [mailto:mst at mellanox.co.il]
> Sent: Friday, December 16, 2005 5:58 AM
> 
> Hi!
> I recently noted that some middleware seems to use the "as much
> as possible" approach, for example, using maximum possible value
> for max_rd_atomic or other fields, in create/modify qp.
> 
> An obvious thing could be to perform query_device and use max.
> values from there. However, it turns out that hardware max supported
> values might not be easy to express in terms of a single constant.
> Consider for example the max number of s/g entries supported per
> WQE: mellanox HCAs support different number of these for RC and UD
> QPs. So whatever single number query device reports, using it will
> never achieve what the user wants for all QP types.
> 
> Rather than extending the device query for all thinkable hardware
> weirdness, I'd like to propose, instead, the following API extension
> (below): passing a negative value in e.g. qp attribute would have the
> meaning: let hardware use at most the specified value.
> This, as opposed to the usual "at least the specified value" meaning
> for positive values.
> 
> How does the following work, for an API? Please comment.

I don't understand the IB_AT_MOST macro. If someone uses IB_AT_MOST( 1 ) and the hardware supports 4, they will get 4, which is definitely not "at most 1".
I would rename it to IB_MAX, and define it a -1 or something like that. - Fab From halr at voltaire.com Fri Dec 16 12:54:52 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 16 Dec 2005 15:54:52 -0500 Subject: [openib-general] A couple of questions about osm_lid_mgr.c::__osm_lid_mgr_set_physp_pi Message-ID: <1134766491.4338.10299.camel@hal.voltaire.com> Hi, I have a couple of questions about osm_lid_mgr.c::__osm_lid_mgr_set_physp_pi. There is the following code: if ( (mtu != ib_port_info_get_mtu_cap( p_old_pi )) || (op_vls != ib_port_info_get_op_vls(p_old_pi))) { if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) ) { osm_log( p_mgr->p_log, OSM_LOG_DEBUG, "__osm_lid_mgr_set_physp_pi: " "Sending Link Down due to op_vls or mtu change. MTU:%u,%u VL_CAP:%u,%u\n", mtu, ib_port_info_get_mtu_cap( p_old_pi ), op_vls, ib_port_info_get_op_vls(p_old_pi) ); } ib_port_info_set_port_state( p_pi, IB_LINK_DOWN ); This seems a little inconsistent to me. It seems like NeighborMTU would be the equivalent of OperationalVLs, rather than MTUCap (which is RO). Also, why does changing the MTU require that the link be taken down ? I also noticed a nit in the same function: p_pi->m_key_lease_period = p_mgr->p_subn->opt.m_key_lease_period; /* Check to see if the value we are setting is different than the value in the port_info. If it is - turn on send_set flag */ if (cl_memcmp( &p_pi->m_key_lease_period, &p_old_pi->m_key_lease_period, sizeof(p_pi->m_key_lease_period) )) send_set = TRUE; Should that be only when the Mkey is non 0 ? -- Hal From rolandd at cisco.com Fri Dec 16 15:48:54 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 16 Dec 2005 15:48:54 -0800 Subject: [openib-general] [PATCH 03/13] [RFC] ipath copy routines In-Reply-To: <200512161548.HbgfRzF2TysjsR2G@cisco.com> Message-ID: <200512161548.lRw6KI369ooIXS9o@cisco.com> Copy routines for ipath driver --- drivers/infiniband/hw/ipath/ipath_copy.c | 666 ++++++++++++++++++++++++++ drivers/infiniband/hw/ipath/ipath_dwordcpy.S | 62 ++ 2 files changed, 728 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/hw/ipath/ipath_copy.c create mode 100644 drivers/infiniband/hw/ipath/ipath_dwordcpy.S 99f636a78e0d759ab663a7abb29e6a71b32a552d diff --git a/drivers/infiniband/hw/ipath/ipath_copy.c b/drivers/infiniband/hw/ipath/ipath_copy.c new file mode 100644 index 0000000..26211ad --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_copy.c @@ -0,0 +1,666 @@ +/* + * Copyright (c) 2003, 2004, 2005. PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + * + * $Id: ipath_copy.c 4365 2005-12-10 00:04:16Z rjwalsh $ + */ + +/* + * This file provides support for doing sk_buff buffer swapping between + * the low level driver eager buffers, and the network layer. It's part + * of the core driver, rather than the ether driver, because it relies + * on variables and functions in the core driver. It exports a single + * entry point for use in the ipath_ether module. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include /* we can generate our own crc's for testing */ + +#include "ipath_kernel.h" +#include "ips_common.h" +#include "ipath_layer.h" + +#define TRUE 1 +#define FALSE 0 + +/* + * Allocate a PIO send buffer, initialize the header and copy it out. + */ +static int layer_send_getpiobuf(struct copy_data_s *cdp) +{ + int whichpb; + uint32_t device = cdp->device; + uint32_t extra_bytes; + uint32_t len, nwords; + uint32_t *piobuf; + + whichpb = ipath_getpiobuf(device); + if (whichpb < 0) { + cdp->error = whichpb; + return whichpb; + } + + /* + * Compute the max amount of data that can fit into a PIO buffer. + * buffer size - header size - trigger qword length & flags - CRC + */ + len = devdata[device].ipath_ibmaxlen - + sizeof(ether_header_typ) - 8 - (SIZE_OF_CRC << 2); + if (len > (cdp->len + cdp->extra)) + len = (cdp->len + cdp->extra); + /* Compute word aligment (i.e., (len & 3) ? 4 - (len & 3) : 0) */ + extra_bytes = (4 - len) & 3; + nwords = (sizeof(ether_header_typ) + len + extra_bytes) >> 2; + cdp->hdr->lrh[2] = htons(nwords + SIZE_OF_CRC); + cdp->hdr->bth[0] = htonl((OPCODE_ITH4X << 24) + (extra_bytes << 20) + + IPS_DEFAULT_P_KEY); + cdp->hdr->sub_opcode = OPCODE_ENCAP; + + cdp->hdr->bth[2] = 0; + /* Generate an interrupt on the receive side for the last fragment. */ + cdp->hdr->iph.pkt_flags = ((cdp->len+cdp->extra) == len) ? INFINIPATH_KPF_INTR : 0; + cdp->hdr->iph.chksum = + (uint16_t) IPS_LRH_BTH + + (uint16_t) (nwords + SIZE_OF_CRC) - + (uint16_t) ((cdp->hdr->iph.ver_port_tid_offset >> 16) & 0xFFFF) - + (uint16_t) (cdp->hdr->iph.ver_port_tid_offset & 0xFFFF) - + (uint16_t) cdp->hdr->iph.pkt_flags; + + piobuf = (uint32_t *) (((char *)(devdata[device].ipath_kregbase)) + + devdata[device].ipath_piobufbase + + whichpb * devdata[device].ipath_palign); + _IPATH_VDBG("send %d (%x %x %x %x %x %x %x)\n", nwords, + cdp->hdr->lrh[0], + cdp->hdr->lrh[1], + cdp->hdr->lrh[2], + cdp->hdr->lrh[3], + cdp->hdr->bth[0], cdp->hdr->bth[1], cdp->hdr->bth[2]); + /* + * Write len to control qword, no flags. + * +1 is for the qword padding of pbc. 
+ */ + *((uint64_t *) piobuf) = (uint64_t) (nwords + 1); + piobuf += 2; + ipath_dwordcpy(piobuf, (uint32_t *) cdp->hdr, + sizeof(ether_header_typ) >> 2); + cdp->csum_pio = &((ether_header_typ *) piobuf)->csum; + cdp->to = piobuf + (sizeof(ether_header_typ) >> 2); + cdp->flen = nwords - (sizeof(ether_header_typ) >> 2); + cdp->hdr->frag_num++; + return 0; +} + +/* + * Copy data out of one or a chain of sk_buffs, into the PIO buffer. + * Fragment an sk_buff into multiple IB packets if the amount of data is + * more than a single eager send. + * Offset and len are in bytes. + * Note that this function is recursive! + */ +static void copy_bits(const struct sk_buff *skb, unsigned int offset, + unsigned int len, struct copy_data_s *cdp) +{ + unsigned int start = skb_headlen(skb); + unsigned int i, copy; + uint32_t n; + u8 *p; + + /* Copy header. */ + if ((int)(copy = start - offset) > 0) { + if (copy > len) + copy = len; + p = skb->data + offset; + offset += copy; + len -= copy; + /* + * If the alignment buffer is not empty, fill it and write + * it out. + */ + if (cdp->extra) { + if (cdp->extra == 4) + goto extra_copy_bits_done; + + while (copy != 0) { + cdp->u.buf[cdp->extra] = *p++; + copy--; + cdp->offset++; + cdp->len--; + + if (++cdp->extra == 4) { +extra_copy_bits_done: + if (cdp->flen == 0 + && layer_send_getpiobuf(cdp) < 0) + return; + *cdp->to++ = cdp->u.w; + cdp->extra = 0; + cdp->flen -= 1; + break; + } + } + } + while (copy >= 4) { + if (cdp->flen == 0 && layer_send_getpiobuf(cdp) < 0) + return; + n = copy >> 2; + if (n > cdp->flen) + n = cdp->flen; + ipath_dwordcpy(cdp->to, (uint32_t *) p, n); + cdp->to += n; + cdp->flen -= n; + n <<= 2; + p += n; + cdp->offset += n; + cdp->len -= n; + copy -= n; + } + /* + * Either cdp->extra is zero or copy is zero which means that + * the loop here can't cause the alignment buffer to fill up. + */ + while (copy != 0) { + cdp->u.buf[cdp->extra++] = *p++; + copy--; + cdp->offset++; + cdp->len--; + + } + if (len == 0) + return; + } + + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { + skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; + unsigned int end; + + end = start + frag->size; + if ((int)(copy = end - offset) > 0) { + u8 *vaddr; + + if (copy > len) + copy = len; + vaddr = kmap_skb_frag(frag); + p = vaddr + frag->page_offset + offset - start; + offset += copy; + len -= copy; + /* + * If the alignment buffer is not empty, fill + * it and write it out. + */ + if (cdp->extra) { + if (cdp->extra == 4) + goto extra1_copy_bits_done; + + while (copy != 0) { + cdp->u.buf[cdp->extra] = *p++; + copy--; + cdp->offset++; + cdp->len--; + + if (++cdp->extra == 4) { +extra1_copy_bits_done: + if (cdp->flen == 0 + && layer_send_getpiobuf(cdp) + < 0) + return; + *cdp->to++ = cdp->u.w; + cdp->extra = 0; + cdp->flen -= 1; + break; + } + } + } + while (copy >= 4) { + if (cdp->flen == 0 + && layer_send_getpiobuf(cdp) < 0) + return; + n = copy >> 2; + if (n > cdp->flen) + n = cdp->flen; + ipath_dwordcpy(cdp->to, (uint32_t *) p, n); + cdp->to += n; + cdp->flen -= n; + n <<= 2; + p += n; + cdp->offset += n; + cdp->len -= n; + copy -= n; + } + /* + * Either cdp->extra is zero or copy is zero + * which means that the loop here can't cause + * the alignment buffer to fill up. 
+ */ + while (copy != 0) { + cdp->u.buf[cdp->extra++] = *p++; + copy--; + cdp->offset++; + cdp->len--; + } + kunmap_skb_frag(vaddr); + + if (len == 0) + return; + } + start = end; + } + + if (skb_shinfo(skb)->frag_list) { + struct sk_buff *list = skb_shinfo(skb)->frag_list; + + for (; list; list = list->next) { + unsigned int end; + + end = start + list->len; + if ((int)(copy = end - offset) > 0) { + if (copy > len) + copy = len; + copy_bits(list, offset - start, copy, cdp); + if (cdp->error || (len -= copy) == 0) + return; + } + start = end; + } + } + if (len) + cdp->error = -EFAULT; +} + +/* + * Copy data out of one or a chain of sk_buffs, into the PIO buffer, generating + * the checksum as we go. + * Fragment an sk_buff into multiple IB packets if the amount of data is + * more than a single eager send. + * Offset and len are in bytes. + * Note that this function is recursive! + */ +static void copy_and_csum_bits(const struct sk_buff *skb, unsigned int offset, + unsigned int len, struct copy_data_s *cdp) +{ + unsigned int start = skb_headlen(skb); + unsigned int i, copy; + unsigned int csum2; + uint32_t n; + u8 *p; + + /* Copy header. */ + if ((int)(copy = start - offset) > 0) { + if (copy > len) + copy = len; + p = skb->data + offset; + offset += copy; + len -= copy; + if (!cdp->checksum_calc) { + cdp->checksum_calc = TRUE; + + csum2 = csum_partial(p, copy, 0); + cdp->csum = csum_block_add(cdp->csum, csum2, cdp->pos); + cdp->pos += copy; + } + /* + * If the alignment buffer is not empty, fill it and + * write it out. + */ + if (cdp->extra) { + if (cdp->extra == 4) + goto extra_copy_and_csum_bits_done; + + while (copy != 0) { + cdp->u.buf[cdp->extra] = *p++; + copy--; + cdp->offset++; + cdp->len--; + if (++cdp->extra == 4) { +extra_copy_and_csum_bits_done: + if (cdp->flen == 0 + && layer_send_getpiobuf(cdp) < 0) + return; + /* + * write the checksum before + * the last PIO write. + */ + if (cdp->flen == 1) { + *cdp->csum_pio = + csum_fold(cdp->csum); + mb(); + } + *cdp->to++ = cdp->u.w; + cdp->extra = 0; + cdp->flen -= 1; + break; + } + } + } + + while (copy >= 4) { + if (cdp->flen == 0 && layer_send_getpiobuf(cdp) < 0) + return; + + n = copy >> 2; + if (n > cdp->flen) + n = cdp->flen; + /* write the checksum before the last PIO write. */ + if (cdp->flen == n) { + *cdp->csum_pio = csum_fold(cdp->csum); + mb(); + } + ipath_dwordcpy(cdp->to, (uint32_t *) p, n); + cdp->to += n; + cdp->flen -= n; + n <<= 2; + p += n; + cdp->offset += n; + cdp->len -= n; + copy -= n; + } + /* + * Either cdp->extra is zero or copy is zero which means that + * the loop here can't cause the alignment buffer to fill up. + */ + while (copy != 0) { + cdp->u.buf[cdp->extra++] = *p++; + copy--; + cdp->offset++; + cdp->len--; + } + + cdp->checksum_calc = FALSE; + + if (len == 0) + return; + } + + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { + skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; + unsigned int end; + + end = start + frag->size; + if ((int)(copy = end - offset) > 0) { + u8 *vaddr; + + if (copy > len) + copy = len; + vaddr = kmap_skb_frag(frag); + p = vaddr + frag->page_offset + offset - start; + offset += copy; + len -= copy; + + if (!cdp->checksum_calc) { + cdp->checksum_calc = TRUE; + + csum2 = csum_partial(p, copy, 0); + cdp->csum = csum_block_add(cdp->csum, csum2, + cdp->pos); + cdp->pos += copy; + } + /* + * If the alignment buffer is not empty, fill + * it and write it out. 
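+ * Same as the non-checksum path, except that when the flushed dword + * would be the fragment's last (the cdp->flen == 1 case below), the + * folded checksum is stored through cdp->csum_pio first, since that + * final write is what triggers the send.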
+ */ + if (cdp->extra) { + if (cdp->extra == 4) + goto extra1_copy_and_csum_bits_done; + while (copy != 0) { + cdp->u.buf[cdp->extra] = *p++; + copy--; + cdp->offset++; + cdp->len--; + + if (++cdp->extra == 4) { +extra1_copy_and_csum_bits_done: + if (cdp->flen == 0 + && layer_send_getpiobuf(cdp) + < 0) { + kunmap_skb_frag(vaddr); + return; + } + /* + * write the checksum + * before the last PIO + * write. + */ + if (cdp->flen == 1) { + *cdp->csum_pio = + csum_fold(cdp-> + csum); + mb(); + } + *cdp->to++ = cdp->u.w; + cdp->extra = 0; + cdp->flen -= 1; + break; + } + } + } + while (copy >= 4) { + if (cdp->flen == 0 + && layer_send_getpiobuf(cdp) < 0) { + kunmap_skb_frag(vaddr); + return; + } + n = copy >> 2; + if (n > cdp->flen) + n = cdp->flen; + /* + * write the checksum before the last + * PIO write. + */ + if (cdp->flen == n) { + *cdp->csum_pio = csum_fold(cdp->csum); + mb(); + } + ipath_dwordcpy(cdp->to, (uint32_t *) p, n); + cdp->to += n; + cdp->flen -= n; + n <<= 2; + p += n; + cdp->offset += n; + cdp->len -= n; + copy -= n; + } + /* + * Either cdp->extra is zero or copy is zero + * which means that the loop here can't cause + * the alignment buffer to fill up. + */ + while (copy != 0) { + cdp->u.buf[cdp->extra++] = *p++; + copy--; + cdp->offset++; + cdp->len--; + } + kunmap_skb_frag(vaddr); + + cdp->checksum_calc = FALSE; + + if (len == 0) + return; + } + start = end; + } + + if (skb_shinfo(skb)->frag_list) { + struct sk_buff *list = skb_shinfo(skb)->frag_list; + + for (; list; list = list->next) { + unsigned int end; + + end = start + list->len; + if ((int)(copy = end - offset) > 0) { + if (copy > len) + copy = len; + copy_and_csum_bits(list, offset - start, copy, cdp); + if (cdp->error || (len -= copy) == 0) + return; + offset += copy; + } + start = end; + } + } + if (len) + cdp->error = -EFAULT; +} + +/* + * Note that the header should have the unchanging parts + * initialized but the rest of the header is computed as needed in + * order to break up skb data buffers larger than the hardware MTU. + * In other words, the Linux network stack MTU can be larger than the + * hardware MTU. + */ +int ipath_layer_send_skb(struct copy_data_s *cdata) +{ + int ret = 0; + uint16_t vlsllnh; + int device = cdata->device; + + if (device >= infinipath_max) { + _IPATH_INFO("Invalid unit %u, failing\n", device); + return -EINVAL; + } + if (!(devdata[device].ipath_flags & IPATH_RCVHDRSZ_SET)) { + _IPATH_INFO("send while not open\n"); + ret = -EINVAL; + } else + if ((devdata[device].ipath_flags & (IPATH_LINKUNK | IPATH_LINKDOWN)) + || devdata[device].ipath_lid == 0) { + /* lid check is for when sma hasn't yet configured */ + ret = -ENETDOWN; + _IPATH_VDBG("send while not ready, mylid=%u, flags=0x%x\n", + devdata[device].ipath_lid, + devdata[device].ipath_flags); + } + vlsllnh = *((uint16_t *) cdata->hdr); + if (vlsllnh != htons(IPS_LRH_BTH)) { + _IPATH_DBG("Warning: lrh[0] wrong (%x, not %x); not sending\n", + vlsllnh, htons(IPS_LRH_BTH)); + ret = -EINVAL; + } + if (ret) + goto done; + + cdata->error = 0; /* clear last calls error */ + + if (cdata->skb->ip_summed == CHECKSUM_HW) { + unsigned int csstart = cdata->skb->h.raw - cdata->skb->data; + + /* + * Computing the checksum is a bit tricky since if we fragment + * the packet, the fragment that should contain the checksum + * will have already been sent. The solution is to + * store the checksum in the header of the last fragment + * just before we write the last data word which triggers + * the last fragment to be sent. 
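+ * (A hypothetical two-fragment send as a sketch: fragment one goes + * out with csum_offset set in its header but no checksum stored; + * while fragment two is copied, the running csum is completed, + * folded, and written through cdp->csum_pio just ahead of that + * fragment's trigger word.)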
The receiver will + * check the header "tag" field, see that there is a + * checksum, and store the checksum back into the packet. + * + * Save the offset of the two byte checksum. + * Note that we have to add 2 to account for the two + * bytes of the ethernet address we stripped from the + * packet and put in the header. + */ + cdata->hdr->csum_offset = csstart + cdata->skb->csum + 2; + + if (cdata->offset < csstart) + copy_bits(cdata->skb, cdata->offset, + csstart - cdata->offset, cdata); + + if (cdata->error) { + return (cdata->error); + + } + + if (cdata->offset < cdata->skb->len) + copy_and_csum_bits(cdata->skb, cdata->offset, + cdata->skb->len - cdata->offset, + cdata); + + if (cdata->error) { + return (cdata->error); + } + + if (cdata->extra) { + while (cdata->extra < 4) + cdata->u.buf[cdata->extra++] = 0; + if (cdata->flen != 0 + || layer_send_getpiobuf(cdata) >= 0) { + /* + * write the checksum before the last + * PIO write. + */ + *cdata->csum_pio = csum_fold(cdata->csum); + mb(); + *cdata->to = cdata->u.w; + } + } + } else { + copy_bits(cdata->skb, cdata->offset, + cdata->skb->len - cdata->offset, cdata); + + if (cdata->error) { + return (cdata->error); + } + + if (cdata->extra) { + while (cdata->extra < 4) + cdata->u.buf[cdata->extra++] = 0; + if (cdata->flen != 0 + || layer_send_getpiobuf(cdata) >= 0) + *cdata->to = cdata->u.w; + } + } + + if (cdata->error) { + ret = cdata->error; + if (cdata->error != -EBUSY) + /* just means no PIO buffers available */ + _IPATH_UNIT_ERROR(device, + "layer_send copy_bits failed with error %d\n", + -ret); + } + + ipath_stats.sps_ether_spkts++; /* another ether packet sent */ + +done: + return ret; +} + +EXPORT_SYMBOL(ipath_layer_send_skb); diff --git a/drivers/infiniband/hw/ipath/ipath_dwordcpy.S b/drivers/infiniband/hw/ipath/ipath_dwordcpy.S new file mode 100644 index 0000000..fdd8ec7 --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_dwordcpy.S @@ -0,0 +1,62 @@ +/* + * Copyright (c) 2003, 2004, 2005. PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. 
+ * + * $Id: ipath_dwordcpy.S 4365 2005-12-10 00:04:16Z rjwalsh $ + */ + +/* + * ipath_dwordcpy - Copy a memory block, primarily for writing to the + * InfiniPath PIO buffers, which only support dword multiple writes, and + * thus can not use memcpy(). For this reason, we use nothing smaller than + * dword writes. + * It is also used as a fast copy routine in some places that have been + * measured to win over memcpy, and the performance delta matters. + * + * Count is number of dwords; might not be a qword multiple. +*/ + + .globl ipath_dwordcpy +/* rdi destination, rsi source, rdx count */ +ipath_dwordcpy: + movl %edx,%ecx + shrl $1,%ecx + andl $1,%edx + cld + rep + movsq + movl %edx,%ecx + rep + movsd + ret -- 0.99.9n From rolandd at cisco.com Fri Dec 16 15:48:54 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 16 Dec 2005 15:48:54 -0800 Subject: [openib-general] [PATCH 00/13] [RFC] IB: PathScale InfiniPath driver In-Reply-To: <20051031150618.627779f1.akpm@osdl.org> Message-ID: <200512161548.jRuyTS0HPMLd7V81@cisco.com> Here is an initial submission from PathScale of a driver for InfiniPath InfiniBand HCAs. The driver is fairly big -- some single files are more than the 100 KB limit for lkml posts -- so I've split it up into a patch series so it can be reviewed inline. The split-up doesn't make sense functionally but I want to make review as easy as possible; any final import will merge the driver as a single git patch. I've also put the current splitup patchset into my git tree at git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git in the ipath branch. There are some things I noticed that could maybe be cleaned up, like having sysctls that set values also settable through module parameters under /sys/module, code inside #ifndef __KERNEL__ so include files can be shared with other PathScale code, code in ipath_i2c.c that might be simplified by using drivers/i2c, etc. I'd like to try to get a sense of whether I'm being too picky or whether PathScale really does need to fix these up before the driver is merged. Basically I'm trying to feel my way as a maintainer so I can find the right balance between wanting kernel code to be absolutely perfect and not wanting to put arbitrary hurdles in front of a vendor who has done a lot of work on contributing an open driver for their hardware. I am especially interested in feedback about the mergability of this driver from the broader kernel community, although of course feedback from the InfiniBand/RDMA community on IB-specific aspects of the driver is very much appreciated as well. Thanks, Roland From rolandd at cisco.com Fri Dec 16 15:48:54 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 16 Dec 2005 15:48:54 -0800 Subject: [openib-general] [PATCH 02/13] [RFC] ipath debug header In-Reply-To: <200512161548.aLjaDpGm5aqk0k0p@cisco.com> Message-ID: <200512161548.HbgfRzF2TysjsR2G@cisco.com> Debugging macros for ipath driver --- drivers/infiniband/hw/ipath/ipath_debug.h | 211 +++++++++++++++++++++++++++++ 1 files changed, 211 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/hw/ipath/ipath_debug.h 8f834f72344a3c7e9c5eafdf59b9fc96b4e08e5f diff --git a/drivers/infiniband/hw/ipath/ipath_debug.h b/drivers/infiniband/hw/ipath/ipath_debug.h new file mode 100644 index 0000000..c8b7374 --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_debug.h @@ -0,0 +1,211 @@ +/* + * Copyright (c) 2003, 2004, 2005. PathScale, Inc. All rights reserved. 
+ * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + * + * $Id: ipath_debug.h 4504 2005-12-16 06:15:47Z rjwalsh $ + */ + +#ifndef _IPATH_DEBUG_H +#define _IPATH_DEBUG_H + +/* + * This file contains tracing code that is lightweight, and can be + * called from both user and kernel mode. + * _IPATH_DBG should have the same calling conventions and semantics + * for both user and kernel. + */ +#ifndef _IPATH_DEBUGGING /* tracing enabled or not */ +#define _IPATH_DEBUGGING 1 +#endif + +/* This macro should only be used in the trace library code. */ +#define _IPATH_DEBUG_VARS_DECL unsigned infinipath_debug; +extern unsigned infinipath_debug; +extern const char *ipath_get_unit_name(int unit); + +/* + * These are always defined, because _IPATH_ERROR is always defined, + * unlike the other debugging calls. It might make sense to change + * to using "fprintf(stderr", for the usermode version, but not now. + */ +#ifdef __KERNEL__ +#define __IPPRT printk +#define __IPATH_UNIT_ERRID(unit) ipath_get_unit_name(unit) +#define __IPATH_ERRID "infinipath" +#else +#define __IPPRT printf +extern char *__progname; +#define __IPATH_UNIT_ERRID(unit) __progname +#define __IPATH_ERRID __progname +#define KERN_ERR +#endif + +#if _IPATH_DEBUGGING + +/* + * Mask values for debugging.
The scheme allows us to compile out any of + * the debug tracing stuff, and if compiled in, to enable or disable dynamically + * This can be set at modprobe time also: + * modprobe infinipath.ko infinipath_debug=7 + */ +#define __IPATH_INFO 0x1 /* generic low verbosity stuff */ +#define __IPATH_DBG 0x2 /* generic debug */ +#define __IPATH_TRSAMPLE 0x8 /* generate trace buffer sample entries */ +/* leave some low verbosity spots open */ +#define __IPATH_VERBDBG 0x40 /* very verbose debug */ +#define __IPATH_PKTDBG 0x80 /* print packet data */ +/* print process startup (init)/exit messages */ +#define __IPATH_PROCDBG 0x100 +/* print mmap/nopage stuff, not using VDBG any more */ +#define __IPATH_MMDBG 0x200 +#define __IPATH_USER_SEND 0x1000 /* use user mode send */ +#define __IPATH_KERNEL_SEND 0x2000 /* use kernel mode send */ +#define __IPATH_EPKTDBG 0x4000 /* print ethernet packet data */ +#define __IPATH_SMADBG 0x8000 /* sma packet debug */ +#define __IPATH_IPATHDBG 0x10000 /* Ethernet (IPATH) general debug on */ +#define __IPATH_IPATHWARN 0x20000 /* Ethernet (IPATH) warnings on */ +#define __IPATH_IPATHERR 0x40000 /* Ethernet (IPATH) errors on */ +#define __IPATH_IPATHPD 0x80000 /* Ethernet (IPATH) packet dump on */ +#define __IPATH_IPATHTABLE 0x100000 /* Ethernet (IPATH) table dump on */ + +#ifdef __KERNEL__ +#define __IPIDENT +#define __IPIDENT_ARG +#define __IP_INFO_TAG __IPATH_ERRID +#define _Pragma_unlikely +#else +#define KERN_INFO +#define KERN_DEBUG +#define __IPIDENT "%s" +#define __IPIDENT_ARG __ipath_mylabel, +#define __IP_INFO_TAG __func__ +extern char *__ipath_mylabel; +extern void ipath_set_mylabel(char *); +#define _Pragma_unlikely _Pragma("mips_frequency_hint never") +#endif + +#define _IPATH_UNIT_ERROR(unit,fmt,...) do { \ + _Pragma_unlikely \ + __IPPRT (KERN_ERR __IPIDENT "%s: " fmt, __IPIDENT_ARG __IPATH_UNIT_ERRID(unit), \ + ##__VA_ARGS__); \ + } while(0) + +#define _IPATH_ERROR(fmt,...) do { \ + _Pragma_unlikely \ + __IPPRT (KERN_ERR __IPIDENT "%s: " fmt, __IPIDENT_ARG __IPATH_ERRID, \ + ##__VA_ARGS__); \ + } while(0) + +#define _IPATH_INFO(fmt,...) do { \ + _Pragma_unlikely \ + if(unlikely(infinipath_debug&__IPATH_INFO)) \ + __IPPRT (KERN_INFO __IPIDENT "%s: " fmt,\ + __IPIDENT_ARG __IP_INFO_TAG,##__VA_ARGS__); \ + } while(0) + +#define __IPATH_USER_MODE_SEND unlikely(infinipath_debug & __IPATH_USER_SEND) +#define __IPATH_KERNEL_MODE_SEND unlikely(infinipath_debug & __IPATH_KERNEL_SEND) +#define __IPATH_PKTDBG_ON unlikely(infinipath_debug & __IPATH_PKTDBG) + +#define __IPATH_DBG_WHICH(which,fmt,...) do { \ + _Pragma_unlikely \ + if(unlikely(infinipath_debug&(which))) __IPPRT (KERN_DEBUG __IPIDENT "%s: " fmt,\ + __IPIDENT_ARG __func__,##__VA_ARGS__); \ + } while(0) +#define _IPATH_DBG(fmt,...) __IPATH_DBG_WHICH(__IPATH_DBG,fmt,##__VA_ARGS__) +#define _IPATH_VDBG(fmt,...) __IPATH_DBG_WHICH(__IPATH_VERBDBG,fmt,##__VA_ARGS__) +#define _IPATH_PDBG(fmt,...) __IPATH_DBG_WHICH(__IPATH_PKTDBG,fmt,##__VA_ARGS__) +#define _IPATH_EPDBG(fmt,...) __IPATH_DBG_WHICH(__IPATH_EPKTDBG,fmt,##__VA_ARGS__) +#define _IPATH_PRDBG(fmt,...) __IPATH_DBG_WHICH(__IPATH_PROCDBG,fmt,##__VA_ARGS__) +#define _IPATH_MMDBG(fmt,...) __IPATH_DBG_WHICH(__IPATH_MMDBG,fmt,##__VA_ARGS__) +#define _IPATH_SMADBG(fmt,...) __IPATH_DBG_WHICH(__IPATH_SMADBG,fmt,##__VA_ARGS__) +#define _IPATH_IPATHDBG(fmt,...) __IPATH_DBG_WHICH(__IPATH_IPATHDBG,fmt,##__VA_ARGS__) +#define _IPATH_IPATHWARN(fmt,...) __IPATH_DBG_WHICH(__IPATH_IPATHWARN,fmt,##__VA_ARGS__) +#define _IPATH_IPATHERR(fmt,...) 
__IPATH_DBG_WHICH(__IPATH_IPATHERR ,fmt,##__VA_ARGS__) +#define _IPATH_IPATHPD(fmt,...) __IPATH_DBG_WHICH(__IPATH_IPATHPD ,fmt,##__VA_ARGS__) +#define _IPATH_IPATHTABLE(fmt,...) __IPATH_DBG_WHICH(__IPATH_IPATHTABLE ,fmt,##__VA_ARGS__) + +#else /* ! _IPATH_DEBUGGING */ + +#define _IPATH_UNIT_ERROR(unit,fmt,...) do { \ + __IPPRT (KERN_ERR "%s" fmt, "",##__VA_ARGS__); \ + } while(0) + +#define _IPATH_ERROR(fmt,...) do { \ + __IPPRT (KERN_ERR "%s" fmt, "",##__VA_ARGS__); \ + } while(0) + +#define _IPATH_INFO(fmt,...) +#define _IPATH_DBG(fmt,...) +#define _IPATH_PDBG(fmt,...) +#define _IPATH_EPDBG(fmt,...) +#define _IPATH_PRDBG(fmt,...) +#define _IPATH_VDBG(fmt,...) +#define _IPATH_MMDBG(fmt,...) +#define _IPATH_SMADBG(fmt,...) +#define _IPATH_IPATHDBG(fmt,...) +#define _IPATH_IPATHWARN(fmt,...) +#define _IPATH_IPATHERR(fmt,...) +#define _IPATH_IPATHPD(fmt,...) +#define _IPATH_IPATHTABLE(fmt,...) + +/* + * define all of these even with debugging off, for the few places that do + * if(infinipath_debug&_IPATH_xyzzy), but in a way that will make the + * compiler eliminate the code + */ +#define __IPATH_INFO 0x0 /* generic low verbosity stuff */ +#define __IPATH_DBG 0x0 /* generic debug */ +#define __IPATH_CALL 0x0 /* function call entrance/exit */ +#define __IPATH_TRSAMPLE 0x0 /* generate trace buffer sample entries */ +#define __IPATH_VCALL 0x0 /* function call entrance/exit */ +#define __IPATH_VERBDBG 0x0 /* very verbose debug */ +#define __IPATH_PKTDBG 0x0 /* print packet data */ +#define __IPATH_PROCDBG 0x0 /* print process startup (init)/exit messages */ +/* print mmap/nopage stuff, not using VDBG any more */ +#define __IPATH_MMDBG 0x0 +#define __IPATH_EPKTDBG 0x0 /* print ethernet packet data */ +#define __IPATH_SMADBG 0x0 /* sma packet debug */ +#define __IPATH_IPATHDBG 0x0 /* Ethernet (IPATH) general debug on */ +#define __IPATH_IPATHWARN 0x0 /* Ethernet (IPATH) warnings on */ +#define __IPATH_IPATHERR 0x0 /* Ethernet (IPATH) errors on */ +#define __IPATH_IPATHPD 0x0 /* Ethernet (IPATH) packet dump on */ +#define __IPATH_IPATHTABLE 0x0 /* Ethernet (IPATH) table dump on */ +#define __IPATH_USER_MODE_SEND 0 +#define __IPATH_KERNEL_MODE_SEND 0 +#define __IPATH_PKTDBG_ON 0 + +#endif /* _IPATH_DEBUGGING */ + +#endif /* _IPATH_DEBUG_H */ -- 0.99.9n From rolandd at cisco.com Fri Dec 16 15:48:55 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 16 Dec 2005 15:48:55 -0800 Subject: [openib-general] [PATCH 05/13] [RFC] ipath LLD core, part 2 In-Reply-To: <200512161548.20XjmmxDHjOZRXcz@cisco.com> Message-ID: <200512161548.YvnmQHKTsmmCBp1k@cisco.com> Next part of ipath core driver --- drivers/infiniband/hw/ipath/ipath_driver.c | 2290 ++++++++++++++++++++++++++++ 1 files changed, 2290 insertions(+), 0 deletions(-) fc2b052ff2abadc8547dc1b319883f9c942b0ae4 diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index df650d6..0dee4ce 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -2587,3 +2587,2293 @@ static int ipath_get_unit_counters(struc return -EFAULT; return ipath_get_counters(c.unit, (struct infinipath_counters *)c.data); } + +/* + * ioctls for the control device, which is useful when you don't want + * to open the main device and use up a port.
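+ * A minimal userspace sketch of the intent (the "/dev/ipath_ctrl" + * node name is assumed here for illustration; this patch does not + * define it): + * + * struct infinipath_stats st; + * int fd = open("/dev/ipath_ctrl", O_RDONLY); + * if (fd >= 0) + * ioctl(fd, IPATH_GETSTATS, &st);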
+ */ + +static int ipath_ctrl_ioctl(struct file *fp, unsigned int cmd, unsigned long a) +{ + int ret = 0; + + switch (cmd) { + case IPATH_GETSTATS: /* return driver stats */ + ret = ipath_get_stats((struct infinipath_stats *) a); + break; + case IPATH_GETUNITCOUNTERS: /* return chip counters */ + ret = ipath_get_unit_counters((struct infinipath_getunitcounters *) a); + break; + default: + _IPATH_DBG("%x not a valid CTRL ioctl for infinipath\n", cmd); + ret = -EINVAL; + break; + } + + return ret; +} + +long ipath_ioctl(struct file *fp, unsigned int cmd, unsigned long a) +{ + int ret = 0; + ipath_portdata *pd; + ipath_type unit; + uint32_t tmp, i, nactive = 0; + + if (cmd == IPATH_GETUNITS) { + /* + * Return number of units supported. This is called + * here as this ioctl is needed via both the normal and + * diags interface, and it does not need the device to + * be opened. + */ + return ipath_get_units(); + } + + pd = port_fp(fp); + if (!pd) { + if (IPATH_SMA == (unsigned long)fp->private_data) + /* sma separate; no pd */ + return (long)ipath_sma_ioctl(fp, cmd, a); +#ifdef IPATH_DIAG + else if (IPATH_DIAG == (unsigned long)fp->private_data) + /* diags separate; no pd */ + return (long)ipath_diags_ioctl(fp, cmd, a); +#endif + else if (IPATH_CTRL == (unsigned long)fp->private_data) + /* ctrl separate; no pd */ + return (long)ipath_ctrl_ioctl(fp, cmd, a); + else { + _IPATH_DBG("NULL pd from fp (%p), cmd=%x\n", fp, cmd); + return -ENODEV; /* bad; shouldn't ever happen */ + } + } + + unit = pd->port_unit; + + if ((devdata[unit].ipath_flags & IPATH_PRESENT) + && (cmd == IPATH_GETCOUNTERS || cmd == IPATH_GETSTATS + || cmd == IPATH_READ_EEPROM || cmd == IPATH_WRITE_EEPROM)) { + /* allowed to do these, as long as chip is accessible */ + } else if (!(devdata[unit].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG + ("%s not initialized (flags=0x%x), failing ioctl #%u\n", + ipath_get_unit_name(unit), devdata[unit].ipath_flags, + _IOC_NR(cmd)); + ret = -ENODEV; + } else + if ((devdata[unit]. + ipath_flags & (IPATH_LINKDOWN | IPATH_LINKUNK))) { + _IPATH_DBG("%s link is down, failing ioctl #%u\n", + ipath_get_unit_name(unit), _IOC_NR(cmd)); + ret = -ENETDOWN; + } + + if (ret) + return ret; + + /* normal driver ioctls, not sim-specific */ + switch (cmd) { + case IPATH_USERINIT: + /* real application is starting on a port */ + ret = ipath_do_user_init(pd, (struct ipath_user_info *) a); + break; + case IPATH_BASEINFO: + /* it's done the init, now return the info it needs */ + ret = ipath_get_baseinfo(pd, (struct ipath_base_info *) a); + break; + case IPATH_GETPORT: + /* + * just return the unit:port that we were assigned, + * and the number of active chips. This is is used for + * doing sched_setaffinity() before initialization.
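+ * The word copied back is packed as (nactive << 24) | + * (unit << 16) | unit, so a caller can recover the pieces with, + * e.g., nactive = w >> 24 and unit = (w >> 16) & 0xff.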
+ */ + for (i = 0; i < infinipath_max; i++) + if ((devdata[i].ipath_flags & IPATH_PRESENT) + && devdata[i].ipath_kregbase + && devdata[i].ipath_lid + && !(devdata[i].ipath_flags & + (IPATH_LINKDOWN | IPATH_LINKUNK))) + nactive++; + tmp = (nactive << 24) | (unit << 16) | unit; + if(copy_to_user((void *)a, &tmp, sizeof(unit))) + ret = -EFAULT; + break; + case IPATH_GETLID: + /* get LID for given unit # */ + ret = ipath_layer_get_lid(a); + break; + case IPATH_UPDM_TID: /* update expected TID entries */ + ret = ipath_tid_update(pd, (struct _tidupd *)a); + break; + case IPATH_FREE_TID: /* free expected TID entries */ + ret = ipath_tid_free(pd, (struct _tidupd *)a); + break; + case IPATH_GETCOUNTERS: /* return chip counters */ + ret = ipath_get_counters(unit, (struct infinipath_counters *)a); + break; + case IPATH_GETSTATS: /* return driver stats */ + ret = ipath_get_stats((struct infinipath_stats *) a); + break; + case IPATH_GETUNITCOUNTERS: /* return chip counters */ + ret = ipath_get_unit_counters((struct infinipath_getunitcounters *) a); + break; + case IPATH_SET_PKEY: /* set a partition key */ + ret = ipath_set_partkey(pd, (uint16_t) a); + break; + case IPATH_RCVCTRL: /* error handling to manage the rcvq */ + ret = ipath_manage_rcvq(pd, (uint16_t) a); + break; + case IPATH_WRITE_EEPROM: + /* write the eeprom (for GUID) */ + ret = ipath_wr_eeprom(pd, (struct ipath_eeprom_req *)a); + break; + case IPATH_READ_EEPROM: /* read the eeprom (for GUID) */ + ret = ipath_rd_eeprom(pd->port_unit, + (struct ipath_eeprom_req *)a); + break; + case IPATH_WAIT: + /* + * wait for a receive intr for this port, or PIO avail + */ + ret = ipath_wait_intr(pd, (uint32_t) a); + break; + + default: + _IPATH_DBG("cmd %x (%c,%u) not a valid ioctl\n", cmd, + _IOC_TYPE(cmd), _IOC_NR(cmd)); + ret = -EINVAL; + break; + } + + return ret; +} + +static loff_t ipath_llseek(struct file *fp, loff_t off, int whence) +{ + loff_t ret; + + /* range checking is done where offset is used, not here. */ + down(&fp->f_dentry->d_inode->i_sem); + if (!whence) + ret = fp->f_pos = off; + else if (whence == 1) { + fp->f_pos += off; + ret = fp->f_pos; + } else + ret = -EINVAL; + up(&fp->f_dentry->d_inode->i_sem); + _IPATH_DBG("New offset %llx from seek %llx whence=%d\n", fp->f_pos, off, + whence); + + return ret; +} + +/* + * We use this to have a shared buffer between the kernel and the user + * code for the rcvhdr queue, egr buffers, and the per-port user regs and pio + * buffers in the chip. We have the open and close entries so we can bump + * the ref count and keep the driver from being unloaded while still mapped. + */ + +static struct vm_operations_struct ipath_vmops = { + .nopage = ipath_nopage, +}; + +static int ipath_mmap(struct file *fp, struct vm_area_struct *vm) +{ + int setlen = 0, ret = -EINVAL; + ipath_portdata *pd; + + if (fp->private_data && 255UL < (unsigned long)fp->private_data) { + pd = port_fp(fp); + { + /* + * This is the ipath_do_user_init() code, + * mapping the shared buffers into the user + * process. The address referred to by vm_pgoff + * is the virtual, not physical, address; we only + * do one mmap for each space mapped. + */ + uint64_t pgaddr, ureg; + + pgaddr = vm->vm_pgoff << PAGE_SHIFT; + + /* + * note that ureg does *NOT* have the kregvirt + * as part of it, to be sure that for 32 bit + * programs, we don't end up trying to map + * a > 44 address.
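+ * (Sketch of the intent: ureg below is just uregbase + + * palign * port_port, a chip-relative offset rather than a + * kernel virtual address, which is what keeps the mmap token + * small.)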
Has to match ipath_get_baseinfo() + * code that sets __spi_uregbase + */ + + ureg = devdata[pd->port_unit].ipath_uregbase + + devdata[pd->port_unit].ipath_palign * pd->port_port; + + _IPATH_MMDBG + ("ushare: pgaddr %llx vm_start=%lx, vmlen %lx\n", + pgaddr, vm->vm_start, vm->vm_end - vm->vm_start); + + if (pgaddr == ureg) { + /* it's the real hardware, so io_remap works */ + unsigned long phys; + if ((vm->vm_end - vm->vm_start) > PAGE_SIZE) { + _IPATH_INFO + ("FAIL mmap userreg: reqlen %lx > PAGE\n", + vm->vm_end - vm->vm_start); + ret = -EFAULT; + } else { + phys = + devdata[pd->port_unit]. + ipath_physaddr + ureg; + vm->vm_page_prot = + pgprot_noncached(vm->vm_page_prot); + + vm->vm_flags |= + VM_DONTCOPY | VM_DONTEXPAND | VM_IO + | VM_SHM | VM_LOCKED; + ret = + io_remap_pfn_range(vm, vm->vm_start, phys >> PAGE_SHIFT, + vm->vm_end - vm->vm_start, + vm->vm_page_prot); + } + } else if (pgaddr == pd->port_piobufs) { + /* + * We use io_remap, so there is not a + * nopage handler for this case! + * when we map the PIO buffers, we want + * to map them as writeonly, no read possible. + */ + + unsigned long phys; + if ((vm->vm_end - vm->vm_start) > + (devdata[pd->port_unit].ipath_pbufsport * + devdata[pd->port_unit].ipath_palign)) { + _IPATH_INFO + ("FAIL mmap userreg: reqlen %lx > PAGE\n", + vm->vm_end - vm->vm_start); + ret = -EFAULT; + } else { + phys = + devdata[pd->port_unit]. + ipath_physaddr + pd->port_piobufs; + /* + * Do *NOT* mark this as + * non-cached (PWT bit), or we + * don't get the write combining + * behavior we want on the + * PIO buffers! + * vm->vm_page_prot = pgprot_noncached(vm->vm_page_prot); + */ + +#if defined (pgprot_writecombine) && defined(_PAGE_MA_WC) + /* Enable WC */ + vm->vm_page_prot = + pgprot_writecombine(vm-> + vm_page_prot); +#endif + + if (vm->vm_flags & VM_READ) { + _IPATH_INFO + ("Can't map piobufs as readable (flags=%lx)\n", + vm->vm_flags); + ret = -EPERM; + } else { + /* + * don't allow them to + * later change to readable + * with mprotect + */ + + vm->vm_flags &= ~VM_MAYWRITE; + + vm->vm_flags |= + VM_DONTCOPY | VM_DONTEXPAND + | VM_IO | VM_SHM | + VM_LOCKED; + ret = + io_remap_pfn_range(vm, vm->vm_start, phys >> PAGE_SHIFT, + vm->vm_end - vm->vm_start, + vm->vm_page_prot); + } + } + } else if (pgaddr == (uint64_t) pd->port_rcvegr_phys) { + if (!pd->port_rcvegrbuf_virt) + return -EFAULT; + /* + * page_alloc'ed egr memory, not + * physically contiguous + * *BUT* to work around the 32 bit mmap64 + * only handling 44 bits, we have remapped + * the first page to kernel virtual, so + * we have to do the conversion here to + * get back to the original virtual + * address (not contig pages) so we have + * to mark this for special handling. + */ + + /* + * not egrbufs * egrsize since they are + * no longer virtually contiguous. 
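+ * Illustrative numbers (assumed, not chip constants): with 4 KB + * pages, port_rcvegrbuf_order 2 and 8 chunks, setlen below works + * out to 8 * 4096 * 4 = 128 KB of mappable space.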
+ */ + setlen = pd->port_rcvegrbuf_chunks * PAGE_SIZE * + (1 << pd->port_rcvegrbuf_order); + if ((vm->vm_end - vm->vm_start) > setlen) { + _IPATH_INFO + ("FAIL on egr bufs: reqlen %lx > actual %x\n", + vm->vm_end - vm->vm_start, setlen); + ret = -EFAULT; + } else { + vm->vm_ops = &ipath_vmops; + vm->vm_private_data = + (void *)(3 | (uint64_t) pd); + if (vm->vm_flags & VM_WRITE) { + _IPATH_INFO + ("Can't map eager buffers as writable (flags=%lx)\n", + vm->vm_flags); + ret = -EPERM; + } else { + /* + * don't allow them to + * later change to writeable + * with mprotect + */ + + vm->vm_flags &= ~VM_MAYWRITE; + _IPATH_MMDBG + ("egrbufs, set private to %p, not %llx\n", + vm->vm_private_data, + pgaddr); + ret = 0; + } + } + } else if (pgaddr == (uint64_t) pd->port_rcvhdrq_phys) { + /* + * kmalloc'ed memory, physically + * contiguous; this is from + * spi_rcvhdr_base; we allow user to + * map read-write so they can write + * hdrq entries to allow protocol code + * to directly poll whether a hdrq entry + * has been written. + */ + setlen = + round_up(devdata[pd->port_unit]. + ipath_rcvhdrcnt * + devdata[pd->port_unit]. + ipath_rcvhdrentsize * + sizeof(uint32_t), PAGE_SIZE); + if ((vm->vm_end - vm->vm_start) > setlen) { + _IPATH_INFO + ("FAIL on rcvhdrq: reqlen %lx > actual %x\n", + vm->vm_end - vm->vm_start, setlen); + ret = -EFAULT; + } else { + vm->vm_ops = &ipath_vmops; + vm->vm_private_data = + (void *)(pgaddr | 1); + ret = 0; + } + } + /* + * when we map the PIO bufferavail registers, + * we want to map them as readonly, no write + * possible. + */ + else if (pgaddr == + devdata[pd->port_unit]. + ipath_pioavailregs_phys) { + /* + * kmalloc'ed memory, physically + * contiguous, one page only, readonly + */ + setlen = PAGE_SIZE; + if ((vm->vm_end - vm->vm_start) > setlen) { + _IPATH_INFO + ("FAIL on pioavailregs_dma: reqlen %lx > actual %x\n", + vm->vm_end - vm->vm_start, setlen); + ret = -EFAULT; + } else if (vm->vm_flags & VM_WRITE) { + _IPATH_INFO + ("Can't map pioavailregs as writable (flags=%lx)\n", + vm->vm_flags); + ret = -EPERM; + } else { + /* + * don't allow them to later + * change with mprotect + */ + vm->vm_flags &= ~VM_MAYWRITE; + vm->vm_ops = &ipath_vmops; + vm->vm_private_data = + (void *)(pgaddr | 2); + ret = 0; + } + } + if (!ret && setlen) { + /* keep page(s) from being swapped, etc. */ + vm->vm_flags |= + VM_DONTEXPAND | VM_DONTCOPY | VM_RESERVED | + VM_IO | VM_SHM; + } else { + /* failure, or io_remap case */ + vm->vm_private_data = NULL; + if (ret) + _IPATH_INFO + ("Failure %d, setlen %d, on addr %lx, off %lx\n", + ret, setlen, vm->vm_start, + vm->vm_pgoff); + } + } + } else /* something very wrong */ + _IPATH_INFO("fp_private wasn't set, no mmaping\n"); + + return ret; +} + +/* page fault handler. For each page that is first faulted in from the + * mmap'ed shared address buffer, this routine is called. + * It's always for a single page. + * We use the low bits of the private_data field to tell us which case + * we are dealing with.
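+ * The encoding used below: 1 = rcvhdrq and 2 = PIO buffer avail + * registers (both keep the physical address in the upper bits), + * and 3 = eager buffers, where the upper bits instead hold the + * ipath_portdata pointer.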
+ */ + +static struct page *ipath_nopage(struct vm_area_struct *vma, unsigned long addr, + int *type) +{ + unsigned long avirt, /* the original [kv]malloc virtual address */ + paddr, /* physical address */ + off; /* calculated page offset */ + uint32_t which, chunk; + void *vaddr = NULL; + ipath_portdata *pd; + struct page *vpage = NOPAGE_SIGBUS; + + if (!(avirt = (unsigned long)vma->vm_private_data)) { + _IPATH_DBG("NULL private_data, vm_pgoff %lx\n", vma->vm_pgoff); + which = 0; /* quiet incorrect gcc warning */ + goto done; + } + which = avirt & 3; + avirt &= ~3ULL; + + if (addr > vma->vm_end) { + _IPATH_DBG("trying to fault in addr %lx past end\n", addr); + goto done; + } + + /* + * most of our memory is vmalloc'ed, but rcvhdr Q is physically + * contiguous, either from kmalloc or alloc_pages() + * pgoff is virtual. + */ + switch (which) { + case 1: /* rcvhdrq_phys */ + /* should always be 0 */ + off = vma->vm_pgoff - (avirt >> PAGE_SHIFT); + paddr = addr - vma->vm_start + (off << PAGE_SHIFT) + avirt; + _IPATH_MMDBG("hdrq %lx (u=%lx)\n", paddr, addr); + vpage = pfn_to_page(paddr >> PAGE_SHIFT); + break; + case 2: /* PIO buffer avail regs */ + /* should always be 0 */ + off = vma->vm_pgoff - (avirt >> PAGE_SHIFT); + paddr = (addr - vma->vm_start + (off << PAGE_SHIFT) + avirt); + _IPATH_MMDBG("pioav %lx\n", paddr); + vpage = pfn_to_page(paddr >> PAGE_SHIFT); + break; + case 3: + /* + * rcvegrbufs; page_alloc()'ed like rcvhdrq, but we + * have to pick out which page_alloc()'ed chunk it is. + */ + pd = (ipath_portdata *) avirt; + /* this should always be 0 */ + off = + vma->vm_pgoff - + ((unsigned long)pd->port_rcvegr_phys >> PAGE_SHIFT); + off = (addr - vma->vm_start + (off << PAGE_SHIFT)); + + chunk = off / (PAGE_SIZE * (1 << pd->port_rcvegrbuf_order)); + if (chunk > pd->port_rcvegrbuf_chunks) + _IPATH_DBG("Bad egrbuf chunk %u (max %u); off = %lx\n", + chunk, pd->port_rcvegrbuf_chunks, off); + vaddr = pd->port_rcvegrbuf_virt[chunk] + + off % (PAGE_SIZE * (1 << pd->port_rcvegrbuf_order)); + paddr = virt_to_phys(vaddr); + vpage = pfn_to_page(paddr >> PAGE_SHIFT); + _IPATH_MMDBG("egrb %p,%lx\n", vaddr, paddr); + break; + default: + _IPATH_DBG + ("trying to fault in mmap addr %lx (avirt %lx) that isn't known (case %u)\n", + addr, avirt, which); + } + +done: + if (vpage != NOPAGE_SIGBUS && vpage != NOPAGE_OOM) { + if (which == 2) + /* + * media/video/video-buf.c doesn't do get_page() for + * buffer from alloc_page(). Hmmm. + * + * keep it from being swapped, complaints if + * process exits before we [vf]free it, etc, + * and keep shared page counts correct, etc. + */ + get_page(vpage); + mark_page_accessed(vpage); + if (type) + *type = VM_FAULT_MINOR; + } else + _IPATH_DBG("faultin of addr %lx vaddr %p avirt %lx failed\n", + addr, vaddr, avirt); + + return vpage; +} + +/* this is separate to allow for better optimization of ipath_intr() */ + +static void ipath_bad_intr(const ipath_type t, uint32_t * unexpectp) +{ + ipath_devdata *dd = &devdata[t]; + + /* + * sometimes happen during driver init and unload, don't want + * to process any interrupts at that point + */ + + /* this is just a bandaid, not a fix, if something goes badly wrong */ + if (++*unexpectp > 100) { + if (++*unexpectp > 105) { + /* + * ok, we must be taking somebody else's interrupts, + * due to a messed up mptable and/or PIRQ table, so + * unregister the interrupt. We've seen this + * during linuxbios development work, and it + * may happen in the future again. 
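+ * (The thresholds below are heuristics: past roughly 100 stray + * interrupts the chip's interrupt mask is cleared, and past + * roughly 105 the handler also unregisters itself, so broken IRQ + * routing can't wedge the machine.)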
+ */ + if (dd->pcidev && dd->pcidev->irq) { + _IPATH_UNIT_ERROR(t, + "Now %u unexpected interrupts, unregistering interrupt handler\n", + *unexpectp); + _IPATH_DBG("free_irq of irq %x\n", + dd->pcidev->irq); + free_irq(dd->pcidev->irq, dd); + dd->pcidev->irq = 0; + } + } + if (ipath_kget_kreg32(t, kr_intmask)) { + _IPATH_UNIT_ERROR(t, + "%u unexpected interrupts, disabling interrupts completely\n", + *unexpectp); + /* disable all interrupts, something is very wrong */ + ipath_kput_kreg(t, kr_intmask, 0ULL); + } + } else if (*unexpectp > 1) + _IPATH_DBG + ("Interrupt when not ready, should not happen, ignoring\n"); +} + +/* separate routine, for better optimization of ipath_intr() */ + +static void ipath_bad_regread(const ipath_type t) +{ + static int allbits; + ipath_devdata *dd = &devdata[t]; + + /* + * We print the message and disable interrupts, in hope of + * having a better chance of debugging the problem. + */ + _IPATH_UNIT_ERROR(t, + "Read of interrupt status failed (all bits set)\n"); + if (allbits++) { + /* disable all interrupts, something is very wrong */ + ipath_kput_kreg(t, kr_intmask, 0ULL); + if (allbits == 2) { + _IPATH_UNIT_ERROR(t, + "Still bad interrupt status, unregistering interrupt\n"); + free_irq(dd->pcidev->irq, dd); + dd->pcidev->irq = 0; + } else if (allbits > 2) { + if ((allbits % 10000) == 0) + printk("."); + } else + _IPATH_UNIT_ERROR(t, + "Disabling interrupts, multiple errors\n"); + } +} + +static irqreturn_t ipath_intr(int irq, void *data, struct pt_regs *regs) +{ + ipath_devdata *dd = data; + const ipath_type t = IPATH_UNIT(dd); + uint32_t istat = ipath_kget_kreg32(t, kr_intstatus); + uint64_t estat = 0; + static unsigned unexpected = 0; + + if (unlikely(!istat)) { + ipath_stats.sps_nullintr++; + /* not our interrupt, or already handled */ + return IRQ_NONE; + } + if (unlikely(istat == ~0)) { + ipath_bad_regread(t); + /* don't know if it was our interrupt or not */ + return IRQ_NONE; + } + + ipath_stats.sps_ints++; + + /* + * this needs to be flags&initted, not statusp, so we keep + * taking interrupts even after link goes down, etc. + * Also, we *must* clear the interrupt at some point, or we won't + * take it again, which can be real bad for errors, etc... + */ + + if (!(dd->ipath_flags & IPATH_INITTED)) { + ipath_bad_intr(t, &unexpected); + return IRQ_NONE; + } + if (unexpected) + unexpected = 0; + + if (istat & ~infinipath_i_bitsextant) + _IPATH_UNIT_ERROR(t, + "interrupt with unknown interrupts %x set\n", + istat & (uint32_t) ~ infinipath_i_bitsextant); + + if (istat & INFINIPATH_I_ERROR) { + ipath_stats.sps_errints++; + estat = ipath_kget_kreg64(t, kr_errorstatus); + if (!estat) + _IPATH_INFO + ("error interrupt (%x), but no error bits set!\n", + istat); + else if (estat == ~0ULL) + /* + * should we try clearing all, or hope next read + * works? + */ + _IPATH_UNIT_ERROR(t, + "Read of error status failed (all bits set); ignoring\n"); + else + ipath_handle_errors(t, estat); + } + + if (istat & INFINIPATH_I_GPIO) { + /* Clear GPIO status bit 2 */ + ipath_kput_kreg(t, kr_gpio_clear, (uint64_t)(1 << 2)); + + /* + * Packets are available in the port 0 receive queue. + * Eventually this needs to be generalized to check + * IPATH_GPIO_INTR, and the specific GPIO bit, when + * GPIO interrupts start being used for other things. + * We skip that now to improve performance. 
+ */ + ipath_kreceive(t); + } + + /* + * clear the ones we will deal with on this round + * We clear it early, mostly for receive interrupts, so we + * know the chip will have seen this by the time we process + * the queue, and will re-interrupt if necessary. The processor + * itself won't take the interrupt again until we return. + */ + ipath_kput_kreg(t, kr_intclear, istat); + + if (istat & INFINIPATH_I_SPIOBUFAVAIL) { + atomic_clear_mask(INFINIPATH_S_PIOINTBUFAVAIL, + &dd->ipath_sendctrl); + ipath_kput_kreg(t, kr_sendctrl, dd->ipath_sendctrl); + + if (dd->ipath_portpiowait) { + uint32_t i; + /* + * start from port 1, since for now port 0 is + * never using wait_event for PIO + */ + for (i = 1; + dd->ipath_portpiowait && i < dd->ipath_cfgports; + i++) { + if (dd->ipath_pd[i] + && dd->ipath_portpiowait & (1U << i)) { + atomic_clear_mask(1U << i, + &dd-> + ipath_portpiowait); + if (dd->ipath_pd[i]-> + port_flag & IPATH_PORT_WAITING_PIO) + { + dd->ipath_pd[i]->port_flag &= + ~IPATH_PORT_WAITING_PIO; + wake_up_interruptible(&dd-> + ipath_pd + [i]-> + port_wait); + } + } + } + } + + if (dd->ipath_layer.l_intr) { + if (dd->ipath_layer.l_intr(t, + IPATH_LAYER_INT_SEND_CONTINUE)) { + atomic_set_mask(INFINIPATH_S_PIOINTBUFAVAIL, + &dd->ipath_sendctrl); + ipath_kput_kreg(t, kr_sendctrl, + dd->ipath_sendctrl); + } + } + + if (dd->verbs_layer.l_piobufavail) { + if (!dd->verbs_layer.l_piobufavail(t)) { + atomic_set_mask(INFINIPATH_S_PIOINTBUFAVAIL, + &dd->ipath_sendctrl); + ipath_kput_kreg(t, kr_sendctrl, + dd->ipath_sendctrl); + } + } + } + + /* + * we check for both transition from empty to non-empty, and urgent + * packets (those with the interrupt bit set in the header) + */ + + if (istat & ((infinipath_i_rcvavail_mask << INFINIPATH_I_RCVAVAIL_SHIFT) + | (infinipath_i_rcvurg_mask << INFINIPATH_I_RCVURG_SHIFT))) { + uint64_t portr; + int i; + uint32_t rcvdint = 0; + + portr = ((istat >> INFINIPATH_I_RCVAVAIL_SHIFT) & + infinipath_i_rcvavail_mask) + | ((istat >> INFINIPATH_I_RCVURG_SHIFT) & + infinipath_i_rcvurg_mask); + for (i = 0; i < dd->ipath_cfgports; i++) { + if (portr & (1 << i) && dd->ipath_pd[i]) { + if (i == 0) + ipath_kreceive(t); + else if (dd->ipath_pd[i]-> + port_flag & IPATH_PORT_WAITING_RCV) { + atomic_clear_mask + (IPATH_PORT_WAITING_RCV, + &dd->ipath_pd[i]->port_flag); + wake_up_interruptible(&dd->ipath_pd[i]-> + port_wait); + rcvdint |= 1U << i; + } + } + } + if (rcvdint) { + /* + * only want to take one interrupt, so turn off + * the rcv interrupt for all the ports that we + * did the wakeup on (but never for kernel port) + */ + atomic_clear_mask(rcvdint << + INFINIPATH_R_INTRAVAIL_SHIFT, + &dd->ipath_rcvctrl); + ipath_kput_kreg(t, kr_rcvctrl, dd->ipath_rcvctrl); + } + } + + return IRQ_HANDLED; +} + +static void ipath_decode_err(char *buf, size_t blen, uint64_t err) +{ + *buf = '\0'; + if (err & INFINIPATH_E_RHDRLEN) + strlcat(buf, "rhdrlen ", blen); + if (err & INFINIPATH_E_RBADTID) + strlcat(buf, "rbadtid ", blen); + if (err & INFINIPATH_E_RBADVERSION) + strlcat(buf, "rbadversion ", blen); + if (err & INFINIPATH_E_RHDR) + strlcat(buf, "rhdr ", blen); + if (err & INFINIPATH_E_RLONGPKTLEN) + strlcat(buf, "rlongpktlen ", blen); + if (err & INFINIPATH_E_RSHORTPKTLEN) + strlcat(buf, "rshortpktlen ", blen); + if (err & INFINIPATH_E_RMAXPKTLEN) + strlcat(buf, "rmaxpktlen ", blen); + if (err & INFINIPATH_E_RMINPKTLEN) + strlcat(buf, "rminpktlen ", blen); + if (err & INFINIPATH_E_RFORMATERR) + strlcat(buf, "rformaterr ", blen); + if (err & INFINIPATH_E_RUNSUPVL) + strlcat(buf, 
"runsupvl ", blen); + if (err & INFINIPATH_E_RUNEXPCHAR) + strlcat(buf, "runexpchar ", blen); + if (err & INFINIPATH_E_RIBFLOW) + strlcat(buf, "ribflow ", blen); + if (err & INFINIPATH_E_REBP) + strlcat(buf, "EBP ", blen); + if (err & INFINIPATH_E_SUNDERRUN) + strlcat(buf, "sunderrun ", blen); + if (err & INFINIPATH_E_SPIOARMLAUNCH) + strlcat(buf, "spioarmlaunch ", blen); + if (err & INFINIPATH_E_SUNEXPERRPKTNUM) + strlcat(buf, "sunexperrpktnum ", blen); + if (err & INFINIPATH_E_SDROPPEDDATAPKT) + strlcat(buf, "sdroppeddatapkt ", blen); + if (err & INFINIPATH_E_SDROPPEDSMPPKT) + strlcat(buf, "sdroppedsmppkt ", blen); + if (err & INFINIPATH_E_SMAXPKTLEN) + strlcat(buf, "smaxpktlen ", blen); + if (err & INFINIPATH_E_SMINPKTLEN) + strlcat(buf, "sminpktlen ", blen); + if (err & INFINIPATH_E_SUNSUPVL) + strlcat(buf, "sunsupVL ", blen); + if (err & INFINIPATH_E_SPKTLEN) + strlcat(buf, "spktlen ", blen); + if (err & INFINIPATH_E_INVALIDADDR) + strlcat(buf, "invalidaddr ", blen); + if (err & INFINIPATH_E_RICRC) + strlcat(buf, "CRC ", blen); + if (err & INFINIPATH_E_RVCRC) + strlcat(buf, "VCRC ", blen); + if (err & INFINIPATH_E_RRCVEGRFULL) + strlcat(buf, "rcvegrfull ", blen); + if (err & INFINIPATH_E_RRCVHDRFULL) + strlcat(buf, "rcvhdrfull ", blen); + if (err & INFINIPATH_E_IBSTATUSCHANGED) + strlcat(buf, "ibcstatuschg ", blen); + if (err & INFINIPATH_E_RIBLOSTLINK) + strlcat(buf, "riblostlink ", blen); + if (err & INFINIPATH_E_HARDWARE) + strlcat(buf, "hardware ", blen); + if (err & INFINIPATH_E_RESET) + strlcat(buf, "reset ", blen); +} + +/* decode RHF errors; only used one place now, may want more later */ +static void get_rhf_errstring(uint32_t err, char *msg, size_t len) +{ + /* if no errors, and so don't need to check what's first */ + *msg = '\0'; + + if (err & INFINIPATH_RHF_H_ICRCERR) + strlcat(msg, "icrcerr ", len); + if (err & INFINIPATH_RHF_H_VCRCERR) + strlcat(msg, "vcrcerr ", len); + if (err & INFINIPATH_RHF_H_PARITYERR) + strlcat(msg, "parityerr ", len); + if (err & INFINIPATH_RHF_H_LENERR) + strlcat(msg, "lenerr ", len); + if (err & INFINIPATH_RHF_H_MTUERR) + strlcat(msg, "mtuerr ", len); + if (err & INFINIPATH_RHF_H_IHDRERR) + /* infinipath hdr checksum error */ + strlcat(msg, "ipathhdrerr ", len); + if (err & INFINIPATH_RHF_H_TIDERR) + strlcat(msg, "tiderr ", len); + if (err & INFINIPATH_RHF_H_MKERR) + /* bad port, offset, etc. 
*/ + strlcat(msg, "invalid ipathhdr ", len); + if (err & INFINIPATH_RHF_H_IBERR) + strlcat(msg, "iberr ", len); + if (err & INFINIPATH_RHF_L_SWA) + strlcat(msg, "swA ", len); + if (err & INFINIPATH_RHF_L_SWB) + strlcat(msg, "swB ", len); +} + +static void ipath_handle_errors(const ipath_type t, uint64_t errs) +{ + char msg[512]; + uint32_t piobcnt; + uint64_t sbuf[4], ignore_this_time = 0; + int i; + int chkerrpkts = 0, noprint = 0; + cycles_t nc; + static cycles_t nextmsg_time; + static unsigned nmsgs, supp_msgs; + ipath_devdata *dd = &devdata[t]; + +#define E_SUM_PKTERRS (INFINIPATH_E_RHDRLEN | INFINIPATH_E_RBADTID \ + | INFINIPATH_E_RBADVERSION \ + | INFINIPATH_E_RHDR | INFINIPATH_E_RLONGPKTLEN | INFINIPATH_E_RSHORTPKTLEN \ + | INFINIPATH_E_RMAXPKTLEN | INFINIPATH_E_RMINPKTLEN \ + | INFINIPATH_E_RFORMATERR | INFINIPATH_E_RUNSUPVL | INFINIPATH_E_RUNEXPCHAR \ + | INFINIPATH_E_REBP) + +#define E_SUM_ERRS ( INFINIPATH_E_SPIOARMLAUNCH \ + | INFINIPATH_E_SUNEXPERRPKTNUM | INFINIPATH_E_SDROPPEDDATAPKT \ + | INFINIPATH_E_SDROPPEDSMPPKT | INFINIPATH_E_SMAXPKTLEN \ + | INFINIPATH_E_SUNSUPVL | INFINIPATH_E_SMINPKTLEN | INFINIPATH_E_SPKTLEN \ + | INFINIPATH_E_INVALIDADDR) + + /* + * throttle back "fast" messages to no more than 10 per 5 seconds + * (1.4-2GHz clock). This isn't perfect, but it's a reasonable + * heuristic + * If we get more than 10, give a 5x longer delay + */ + nc = get_cycles(); + if (nmsgs > 10) { + if (nc < nextmsg_time) { + noprint = 1; + if (!supp_msgs++) + nextmsg_time = nc + 50000000000ULL; + } else if (supp_msgs) { + /* + * Print the message unless it's ibc status + * change only, which happens so often we never + * want to count it. + */ + if (dd->ipath_lasterror & ~INFINIPATH_E_IBSTATUSCHANGED) { + ipath_decode_err(msg, sizeof msg, + dd-> + ipath_lasterror & + ~INFINIPATH_E_IBSTATUSCHANGED); + if (dd-> + ipath_lasterror & ~(INFINIPATH_E_RRCVEGRFULL + | + INFINIPATH_E_RRCVHDRFULL)) + _IPATH_UNIT_ERROR(t, + "Suppressed %u messages for fast-repeating errors (%s) (%llx)\n", + supp_msgs, msg, + dd->ipath_lasterror); + else { + /* + * rcvegrfull and rcvhdrqfull are + * "normal", for some types of + * processes (mostly benchmarks) + * that send huge numbers of + * messages, while not processing + * them. So only complain about + * these at debug level. 
+ */ + _IPATH_DBG + ("Suppressed %u messages for %s\n", + supp_msgs, msg); + } + } + supp_msgs = 0; + nmsgs = 0; + } + } else if (!nmsgs++ || nc > nextmsg_time) /* start timer */ + nextmsg_time = nc + 10000000000ULL; + + /* + * don't report errors that are masked (includes those always + * ignored) + */ + errs &= ~dd->ipath_maskederrs; + + /* do these first, they are most important */ + if (errs & INFINIPATH_E_HARDWARE) { + /* reuse same msg buf */ + ipath_handle_hwerrors(t, msg, sizeof msg); + } + + if (!noprint && (errs & ~infinipath_e_bitsextant)) + _IPATH_UNIT_ERROR(t, + "error interrupt with unknown errors %llx set\n", + errs & ~infinipath_e_bitsextant); + + if (errs & E_SUM_ERRS) { + /* if possible that sendbuffererror could be valid */ + piobcnt = dd->ipath_piobcnt; + /* read these before writing errorclear */ + sbuf[0] = ipath_kget_kreg64(t, kr_sendbuffererror); + sbuf[1] = ipath_kget_kreg64(t, kr_sendbuffererror + 1); + if (piobcnt > 128) { + sbuf[2] = ipath_kget_kreg64(t, kr_sendbuffererror + 2); + sbuf[3] = ipath_kget_kreg64(t, kr_sendbuffererror + 3); + } + + if (sbuf[0] || sbuf[1] + || (piobcnt > 128 && (sbuf[2] || sbuf[3]))) { + _IPATH_PDBG("SendbufErrs %llx %llx ", sbuf[0], sbuf[1]); + if (infinipath_debug & __IPATH_PKTDBG && piobcnt > 128) + printk("%llx %llx ", sbuf[2], sbuf[3]); + for (i = 0; i < piobcnt; i++) { + if (test_bit(i, sbuf)) { + uint32_t sendctrl; + if (infinipath_debug & __IPATH_PKTDBG) + printk("%u ", i); + sendctrl = + dd-> + ipath_sendctrl | INFINIPATH_S_DISARM + | (i << + INFINIPATH_S_DISARMPIOBUF_SHIFT); + ipath_kput_kreg(t, kr_sendctrl, + sendctrl); + } + } + if (infinipath_debug & __IPATH_PKTDBG) + printk("\n"); + } + if ((errs & + (INFINIPATH_E_SDROPPEDDATAPKT | INFINIPATH_E_SDROPPEDSMPPKT + | INFINIPATH_E_SMINPKTLEN)) + && !(dd->ipath_flags & IPATH_LINKACTIVE)) { + /* + * This can happen when SMA is trying to bring + * the link up, but the IB link changes state + * at the "wrong" time. The IB logic then + * complains that the packet isn't valid. + * We don't want to confuse people, so we just + * don't print them, except at debug + */ + _IPATH_DBG + ("Ignoring pktsend errors %llx, because not yet active\n", + errs); + ignore_this_time |= + INFINIPATH_E_SDROPPEDDATAPKT | + INFINIPATH_E_SDROPPEDSMPPKT | + INFINIPATH_E_SMINPKTLEN; + } + } + + if (supp_msgs == 250000) { + /* + * It's not entirely reasonable assuming that the errors + * set in the last clear period are all responsible for + * the problem, but the alternative is to assume it's the only + * ones on this particular interrupt, which also isn't great + */ + dd->ipath_maskederrs |= dd->ipath_lasterror | errs; + ipath_kput_kreg(t, kr_errormask, ~dd->ipath_maskederrs); + ipath_decode_err(msg, sizeof msg, + (dd->ipath_maskederrs & ~dd-> + ipath_ignorederrs)); + + if ((dd->ipath_maskederrs & ~dd->ipath_ignorederrs) + & ~(INFINIPATH_E_RRCVEGRFULL | INFINIPATH_E_RRCVHDRFULL)) + _IPATH_UNIT_ERROR(t, + "Disabling error(s) %llx because occurring too frequently (%s)\n", + (dd->ipath_maskederrs & ~dd-> + ipath_ignorederrs), msg); + else { + /* + * rcvegrfull and rcvhdrqfull are "normal", + * for some types of processes (mostly benchmarks) + * that send huge numbers of messages, while not + * processing them. So only complain about + * these at debug level. + */ + _IPATH_DBG + ("Disabling frequent queue full errors (%s)\n", + msg); + } + + /* + * re-enable the masked errors after around 3 minutes, + * in ipath_get_faststats().
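+ * (At the ~2 GHz clock assumed earlier, the 400000000000 cycle + * window used below comes to roughly 200 seconds.)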
If we have a series of + * fast repeating but different errors, the interval will keep + * stretching out, but that's OK, as that's pretty catastrophic. + */ + dd->ipath_unmasktime = nc + 400000000000ULL; + } + + ipath_kput_kreg(t, kr_errorclear, errs); + if (ignore_this_time) + errs &= ~ignore_this_time; + if (errs & ~dd->ipath_lasterror) { + errs &= ~dd->ipath_lasterror; + /* never suppress duplicate hwerrors or ibstatuschange */ + dd->ipath_lasterror |= errs & + ~(INFINIPATH_E_HARDWARE | INFINIPATH_E_IBSTATUSCHANGED); + } + if (!errs) + return; + + if (!noprint) + /* the ones we mask off are handled specially below or above */ + ipath_decode_err(msg, sizeof msg, + errs & ~(INFINIPATH_E_IBSTATUSCHANGED | + INFINIPATH_E_RRCVEGRFULL | + INFINIPATH_E_RRCVHDRFULL | + INFINIPATH_E_HARDWARE)); + else + /* so we don't need if(!noprint) at strlcat's below */ + *msg = 0; + + if (errs & E_SUM_PKTERRS) { + ipath_stats.sps_pkterrs++; + chkerrpkts = 1; + } + if (errs & E_SUM_ERRS) + ipath_stats.sps_errs++; + + if (errs & (INFINIPATH_E_RICRC | INFINIPATH_E_RVCRC)) { + ipath_stats.sps_crcerrs++; + chkerrpkts = 1; + } + + /* + * We don't want to print these two as they happen, or we can make + * the situation even worse, because it takes so long to print messages. + * to serial consoles. kernel ports get printed from fast_stats, no + * more than every 5 seconds, user ports get printed on close + */ + if (errs & INFINIPATH_E_RRCVHDRFULL) { + int any; + uint32_t hd, tl; + ipath_stats.sps_hdrqfull++; + for (any = i = 0; i < dd->ipath_cfgports; i++) { + if (i == 0) { + hd = dd->ipath_port0head; + tl = *dd->ipath_hdrqtailptr; + } else if (dd->ipath_pd[i] && + dd->ipath_pd[i]->port_rcvhdrtail_kvaddr) { + /* + * don't report same point multiple times, + * except kernel + */ + tl = (uint32_t) * + dd->ipath_pd[i]->port_rcvhdrtail_kvaddr; + if (tl == dd->ipath_lastrcvhdrqtails[i]) + continue; + hd = ipath_kget_ureg32(t, ur_rcvhdrhead, i); + } else + continue; + if (hd == (tl + 1) || (!hd && tl == dd->ipath_hdrqlast)) { + dd->ipath_lastrcvhdrqtails[i] = tl; + dd->ipath_pd[i]->port_hdrqfull++; + if (i == 0) + chkerrpkts = 1; + } + } + } + if (errs & INFINIPATH_E_RRCVEGRFULL) { + /* + * since this is of less importance and not likely to + * happen without also getting hdrfull, only count + * occurrences; don't check each port (or even the kernel + * vs user) + */ + ipath_stats.sps_etidfull++; + if (dd->ipath_port0head != *dd->ipath_hdrqtailptr) + chkerrpkts = 1; + } + + /* + * do this before IBSTATUSCHANGED, in case both bits set in a single + * interrupt; we want the STATUSCHANGE to "win", so we do our + * internal copy of state machine correctly + */ + if (errs & INFINIPATH_E_RIBLOSTLINK) { + /* force through block below */ + errs |= INFINIPATH_E_IBSTATUSCHANGED; + ipath_stats.sps_iblink++; + dd->ipath_flags |= IPATH_LINKDOWN; + dd->ipath_flags &= ~(IPATH_LINKUNK | IPATH_LINKINIT + | IPATH_LINKARMED | IPATH_LINKACTIVE); + if (!noprint) + _IPATH_DBG("Lost link, link now down (%s)\n", + ipath_ibcstatus_str[ipath_kget_kreg64 + (t, + kr_ibcstatus) & 0xf]); + } + + if ((errs & INFINIPATH_E_IBSTATUSCHANGED) && (!ipath_diags_enabled)) { + uint64_t val; + uint32_t ltstate; + + val = ipath_kget_kreg64(t, kr_ibcstatus); + ltstate = val & 0xff; + if(ltstate == 0x11 || ltstate == 0x21 || ltstate == 0x31) + _IPATH_DBG("Link state changed unit %u to 0x%x, last was 0x%llx\n", + t, ltstate, dd->ipath_lastibcstat); + else { + ltstate = dd->ipath_lastibcstat & 0xff; + if(ltstate == 0x11 || ltstate == 0x21 || ltstate == 0x31) + 
_IPATH_DBG("Link state unit %u changed to down state 0x%llx, last was 0x%llx\n", + t, val, dd->ipath_lastibcstat); + else + _IPATH_VDBG("Link state unit %u changed to 0x%llx from one of down states\n", + t, val); + } + ltstate = (val >> INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) & + INFINIPATH_IBCS_LINKTRAININGSTATE_MASK; + + if (ltstate == 2 || ltstate == 3) { + uint32_t last_ltstate; + + /* + * ignore cycling back and forth from states 2 to 3 + * while waiting for other end of link to come up + * except that if it keeps happening, we switch between + * linkinitstate SLEEP and POLL. While we cycle + * back and forth between them, we aren't seeing + * any other device, either no cable plugged in, + * other device powered off, other device is + * switch that hasn't yet polled us, etc. + */ + last_ltstate = (dd->ipath_lastibcstat >> + INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) + & INFINIPATH_IBCS_LINKTRAININGSTATE_MASK; + if (last_ltstate == 2 || last_ltstate == 3) { + if (++dd->ipath_ibpollcnt > 4) { + uint64_t ibc; + dd->ipath_flags |= + IPATH_LINK_SLEEPING | IPATH_NOCABLE; + *dd->ipath_statusp |= + IPATH_STATUS_IB_NOCABLE; + _IPATH_VDBG + ("linkinitcmd POLL, move to SLEEP\n"); + ibc = dd->ipath_ibcctrl; + ibc |= INFINIPATH_IBCC_LINKINITCMD_SLEEP + << + INFINIPATH_IBCC_LINKINITCMD_SHIFT; + /* + * don't put linkinitcmd in + * ipath_ibcctrl, want that to + * stay a NOP + */ + ipath_kput_kreg(t, kr_ibcctrl, ibc); + dd->ipath_ibpollcnt = 0; + } + goto skip_ibchange; + } + } + /* some state other than 2 or 3 */ + dd->ipath_ibpollcnt = 0; + ipath_stats.sps_iblink++; + /* + * Note: We try to match the Mellanox HCA LED behavior + * as best we can. That changed around Oct 2003. + * Green indicates link state (something is plugged in, + * and we can train). Amber indicates the link is + * logically up (ACTIVE). Mellanox further blinks the + * amber LED to indicate data packet activity, but we + * have no hardware support for that, so it would require + * waking up every 10-20 msecs and checking the counters + * on the chip, and then turning the LED off if + * appropriate. That's visible overhead, so not something + * we will do. 
+ */ + if (ltstate != 1 || ((dd->ipath_lastibcstat & 0x30) == 0x30 && + (val & 0x30) != 0x30)) { + dd->ipath_flags |= IPATH_LINKDOWN; + dd->ipath_flags &= ~(IPATH_LINKUNK | IPATH_LINKINIT + | IPATH_LINKACTIVE | + IPATH_LINKARMED); + *dd->ipath_statusp &= ~IPATH_STATUS_IB_READY; + if (!noprint) { + if ((dd->ipath_lastibcstat & 0x30) == 0x30) + /* if from up to down be more vocal */ + _IPATH_DBG("Link unit %u is now down (%s)\n", + t, ipath_ibcstatus_str + [ltstate]); + else + _IPATH_VDBG("Link unit %u is down (%s)\n", + t, ipath_ibcstatus_str + [ltstate]); + } + + if (val & 0x30) { + /* leave just green on, 0x11 and 0x21 */ + dd->ipath_extctrl &= + ~INFINIPATH_EXTC_LEDPRIPORTYELLOWON; + dd->ipath_extctrl |= + INFINIPATH_EXTC_LEDPRIPORTGREENON; + } else /* not up at all, so turn the leds off */ + dd->ipath_extctrl &= + ~(INFINIPATH_EXTC_LEDPRIPORTGREENON | + INFINIPATH_EXTC_LEDPRIPORTYELLOWON); + ipath_kput_kreg(t, kr_extctrl, + (uint64_t) dd->ipath_extctrl); + if (ltstate == 1 + && (dd-> + ipath_flags & (IPATH_LINK_TOARMED | + IPATH_LINK_TOACTIVE))) { + ipath_set_ib_lstate(t, + INFINIPATH_IBCC_LINKCMD_INIT); + } + } else if ((val & 0x31) == 0x31) { + if (!noprint) + _IPATH_DBG("Link unit %u is now in active state\n", t); + dd->ipath_flags |= IPATH_LINKACTIVE; + dd->ipath_flags &= + ~(IPATH_LINKUNK | IPATH_LINKINIT | IPATH_LINKDOWN | + IPATH_LINKARMED | IPATH_NOCABLE | + IPATH_LINK_TOACTIVE | IPATH_LINK_SLEEPING); + *dd->ipath_statusp &= ~IPATH_STATUS_IB_NOCABLE; + *dd->ipath_statusp |= + IPATH_STATUS_IB_READY | IPATH_STATUS_IB_CONF; + /* set the externally visible LEDs to indicate state */ + dd->ipath_extctrl |= INFINIPATH_EXTC_LEDPRIPORTGREENON + | INFINIPATH_EXTC_LEDPRIPORTYELLOWON; + ipath_kput_kreg(t, kr_extctrl, + (uint64_t) dd->ipath_extctrl); + + /* + * since we are now active, set the linkinitcmd + * to NOP (0) it was probably either POLL or SLEEP + */ + dd->ipath_ibcctrl &= + ~(INFINIPATH_IBCC_LINKINITCMD_MASK << + INFINIPATH_IBCC_LINKINITCMD_SHIFT); + ipath_kput_kreg(t, kr_ibcctrl, dd->ipath_ibcctrl); + + if (devdata[t].ipath_layer.l_intr) + devdata[t].ipath_layer.l_intr(t, + IPATH_LAYER_INT_IF_UP); + } else if ((val & 0x31) == 0x11) { + /* + * set set INIT and DOWN. Down is checked by + * most of the other code, but INIT is useful + * to know in a few places. + */ + dd->ipath_flags |= IPATH_LINKINIT | IPATH_LINKDOWN; + dd->ipath_flags &= + ~(IPATH_LINKUNK | IPATH_LINKACTIVE | IPATH_LINKARMED + | IPATH_NOCABLE | IPATH_LINK_SLEEPING); + *dd->ipath_statusp &= ~(IPATH_STATUS_IB_NOCABLE + | IPATH_STATUS_IB_READY); + + /* set the externally visible LEDs to indicate state */ + dd->ipath_extctrl &= + ~INFINIPATH_EXTC_LEDPRIPORTYELLOWON; + dd->ipath_extctrl |= INFINIPATH_EXTC_LEDPRIPORTGREENON; + ipath_kput_kreg(t, kr_extctrl, + (uint64_t) dd->ipath_extctrl); + if (dd-> + ipath_flags & (IPATH_LINK_TOARMED | + IPATH_LINK_TOACTIVE)) { + /* + * if we got here while trying to bring + * the link up, try again, but only once more! 
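
The "only once more" retry that appears in several branches below is a one-shot latch: the pending bring-up flag both requests the retry and is consumed by issuing it. Reduced to a standalone sketch with hypothetical names:

enum { WANT_ARMED = 0x1, WANT_ACTIVE = 0x2, LINKCMD_ARMED = 0x3 };

static void issue_link_cmd(int cmd)
{
	(void)cmd;	/* stub: the driver writes kr_ibcctrl here */
}

static void on_fell_back_to_init(unsigned *want)
{
	if (*want & (WANT_ARMED | WANT_ACTIVE)) {
		/* consuming the request as we retry is what limits a
		 * flapping link to one extra attempt, not an endless loop */
		issue_link_cmd(LINKCMD_ARMED);
		*want &= ~(WANT_ARMED | WANT_ACTIVE);
	}
}
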
+ */ + ipath_set_ib_lstate(t, + INFINIPATH_IBCC_LINKCMD_ARMED); + dd->ipath_flags &= + ~(IPATH_LINK_TOARMED | IPATH_LINK_TOACTIVE); + } + } else if ((val & 0x31) == 0x21) { + dd->ipath_flags |= IPATH_LINKARMED; + dd->ipath_flags &= + ~(IPATH_LINKUNK | IPATH_LINKDOWN | IPATH_LINKINIT | + IPATH_LINKACTIVE | IPATH_NOCABLE | + IPATH_LINK_TOARMED | IPATH_LINK_SLEEPING); + *dd->ipath_statusp &= ~(IPATH_STATUS_IB_NOCABLE + | IPATH_STATUS_IB_READY); + /* + * set the externally visible LEDs to indicate + * state (same as 0x11) + */ + dd->ipath_extctrl &= + ~INFINIPATH_EXTC_LEDPRIPORTYELLOWON; + dd->ipath_extctrl |= INFINIPATH_EXTC_LEDPRIPORTGREENON; + ipath_kput_kreg(t, kr_extctrl, + (uint64_t) dd->ipath_extctrl); + if (dd->ipath_flags & IPATH_LINK_TOACTIVE) { + /* + * if we got here while trying to bring + * the link up, try again, but only once more! + */ + ipath_set_ib_lstate(t, + INFINIPATH_IBCC_LINKCMD_ACTIVE); + dd->ipath_flags &= ~IPATH_LINK_TOACTIVE; + } + } else { + if (dd-> + ipath_flags & (IPATH_LINK_TOARMED | + IPATH_LINK_TOACTIVE)) + ipath_set_ib_lstate(t, + INFINIPATH_IBCC_LINKCMD_INIT); + else if (!noprint) + _IPATH_DBG("IBstatuschange unit %u: %s\n", + t, ipath_ibcstatus_str[ltstate]); + } + dd->ipath_lastibcstat = val; + } + +skip_ibchange: + + if (errs & INFINIPATH_E_RESET) { + if (!noprint) + _IPATH_UNIT_ERROR(t, + "Got reset, requires re-initialization (unload and reload driver)\n"); + dd->ipath_flags &= ~IPATH_INITTED; /* needs re-init */ + /* mark as having had error */ + *dd->ipath_statusp |= IPATH_STATUS_HWERROR; + *dd->ipath_statusp &= ~IPATH_STATUS_IB_CONF; + } + + if (!noprint && *msg) + _IPATH_UNIT_ERROR(t, "%s error\n", msg); + if (dd->ipath_sma_state_wanted & dd->ipath_flags) { + _IPATH_VDBG("sma wanted state %x, iflags now %x, waking\n", + dd->ipath_sma_state_wanted, dd->ipath_flags); + wake_up_interruptible(&ipath_sma_state_wait); + } + + if (chkerrpkts) + /* process possible error packets in hdrq */ + ipath_kreceive(t); +} + +/* must only be called if ipath_pd[port] is known to be allocated */ +static __inline__ void *ipath_get_egrbuf(const ipath_type t, uint32_t bufnum, + int err) +{ + return devdata[t].ipath_port0_skbs ? + (void *)devdata[t].ipath_port0_skbs[bufnum]->data : NULL; + +#ifdef _USE_FOR_DEBUGGING_ONLY + /* + * want routine to be inlined and fast this is here so if we do ports + * other than 0, I don't have to rewrite the code, since it's slightly + * complicated + */ + if (port != 1) { + void *chunkbase; + /* + * This calculation takes about 50 cycles. Could do + * what I did for protocol code, and have an array of + * addresses, getting it down to just a few cycles per + * lookup, at the cost of 16KB of memory. + */ + if (!devdata[t].ipath_pd[port]->port_rcvegrbuf_virt) + return NULL; + chunkbase = devdata[t].ipath_pd[port]->port_rcvegrbuf_virt + [bufnum / + devdata[t].ipath_pd[port]->port_rcvegrbufs_perchunk]; + return (void *)(chunkbase + + (bufnum % + devdata[t].ipath_pd[port]-> + port_rcvegrbufs_perchunk) + * devdata[t].ipath_rcvegrbufsize); + } +#endif +} + +/* receive an sma packet. 
Separate for better overall optimization */ +static void ipath_rcv_sma(const ipath_type t, uint32_t tlen, + uint64_t * rc, void *ebuf) +{ + int sindex, slen, elen; + void *smbuf; + uint8_t pad, *bthbytes; + + ipath_stats.sps_sma_rpkts++; /* another SMA packet received */ + + bthbytes = (uint8_t *) ((ips_message_header_typ *) & rc[1])->bth; + + pad = (bthbytes[1] >> 4) & 3; + elen = tlen - (IPATH_SMA_HDRSZ + pad + (uint32_t) sizeof(uint32_t)); + if (elen > (SMA_MAX_PKTSZ - IPATH_SMA_HDRSZ)) + elen = SMA_MAX_PKTSZ - IPATH_SMA_HDRSZ; + + spin_lock_irq(&ipath_sma_lock); + sindex = ipath_sma_next; + smbuf = ipath_sma_data[sindex].buf; + ipath_sma_data[sindex].unit = t; + slen = ipath_sma_data[ipath_sma_next].len; + memcpy(smbuf, &rc[1], IPATH_SMA_HDRSZ); + memcpy(smbuf + IPATH_SMA_HDRSZ, ebuf, elen); + if (slen) { + /* + * overwriting a yet unread old one (buffer wrap), have to + * advance ipath_sma_first to next oldest + */ + + /* count OK packets that we drop */ + ipath_stats.sps_krdrops++; + if (++ipath_sma_first >= IPATH_NUM_SMAPKTS) + ipath_sma_first = 0; + } + slen = ipath_sma_data[sindex].len = elen + IPATH_SMA_HDRSZ; + if (++ipath_sma_next >= IPATH_NUM_SMAPKTS) + ipath_sma_next = 0; + spin_unlock_irq(&ipath_sma_lock); +} + +/* + * receive a packet for the layered (ethernet) driver. + * Separate routine for better overall optimization + */ +static void ipath_rcv_layer(const ipath_type t, uint32_t etail, + uint32_t tlen, ether_header_typ * hdr) +{ + uint32_t elen; + uint8_t pad, *bthbytes; + struct sk_buff *skb; + struct sk_buff *nskb; + ipath_devdata *dd = &devdata[t]; + ipath_portdata *pd; + unsigned long pa, pent; + uint64_t *egrbase; + uint64_t lenvalid; /* in words */ + + if (dd->ipath_port0_skbs && hdr->sub_opcode == OPCODE_ENCAP) { + /* + * Allocate a new sk_buff to replace the one we give + * to the network stack. 
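
The SMA receive buffering that ipath_rcv_sma() implements is a fixed-size ring with a producer index (next), a consumer index (first), and a nonzero stored length marking a slot that is still unread; overwriting an unread slot drops the oldest packet. The same invariants as a self-contained sketch:

#include <stdint.h>
#include <string.h>

#define NUM_SLOTS 16	/* stands in for IPATH_NUM_SMAPKTS */

struct slot { uint32_t len; uint8_t buf[256]; };

static struct slot ring[NUM_SLOTS];
static int first, next;		/* consumer / producer indices */
static uint64_t drops;

static void ring_put(const void *data, uint32_t len)
{
	if (ring[next].len) {
		/* producer caught up to an unread slot: count the drop
		 * and advance the consumer past the sacrificed packet */
		drops++;
		if (++first >= NUM_SLOTS)
			first = 0;
	}
	memcpy(ring[next].buf, data, len < sizeof(ring[next].buf) ?
	       len : sizeof(ring[next].buf));
	ring[next].len = len;
	if (++next >= NUM_SLOTS)
		next = 0;
}
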
+ */ + if (!(nskb = dev_alloc_skb(dd->ipath_ibmaxlen + 4))) { + /* count OK packets that we drop */ + ipath_stats.sps_krdrops++; + return; + } + + bthbytes = (uint8_t *) hdr->bth; + pad = (bthbytes[1] >> 4) & 3; + /* +CRC32 */ + elen = tlen - (sizeof(*hdr) + pad + sizeof(uint32_t)); + + skb_reserve(nskb, 4); + + skb = dd->ipath_port0_skbs[etail]; + dd->ipath_port0_skbs[etail] = nskb; + skb_put(skb, elen); + + pd = dd->ipath_pd[0]; + lenvalid = (dd->ipath_ibmaxlen - pd->port_egrskip) >> 2; + lenvalid <<= INFINIPATH_RT_BUFSIZE_SHIFT; + lenvalid |= INFINIPATH_RT_VALID; + pa = virt_to_phys(nskb->data); + pa += pd->port_egrskip; + pent = (pa & INFINIPATH_RT_ADDR_MASK) | lenvalid; + /* This is simplified for port 0 */ + egrbase = (uint64_t *) ((char *)(dd->ipath_kregbase) + + dd->ipath_rcvegrbase); + ipath_kput_memq(t, &egrbase[etail], pent); + + dd->ipath_layer.l_rcv(t, hdr, skb); + + /* another ether packet received */ + ipath_stats.sps_ether_rpkts++; + } else if (hdr->sub_opcode == OPCODE_LID_ARP) { + if (dd->ipath_layer.l_rcv_lid) + dd->ipath_layer.l_rcv_lid(t, hdr); + } + +} + +/* called from interrupt handler for errors or receive interrupt */ +void ipath_kreceive(const ipath_type t) +{ + uint64_t *rc; + void *ebuf; + ipath_devdata *dd = &devdata[t]; + const uint32_t rsize = dd->ipath_rcvhdrentsize; /* words */ + const uint32_t maxcnt = dd->ipath_rcvhdrcnt * rsize; /* in words */ + uint32_t etail = ~0U, l, hdrqtail, sma_this_time = 0; + ips_message_header_typ *hdr; + uint32_t eflags, i, etype, tlen, pkttot=0; + static uint64_t totcalls; /* stats, may eventually remove */ + char emsg[128]; + + if (!dd->ipath_hdrqtailptr) { + _IPATH_UNIT_ERROR(t, + "hdrqtailptr not set, can't do receives\n"); + return; + } + + if (test_and_set_bit(0, &dd->ipath_rcv_pending)) { + /* There is already a thread processing this queue. */ + return; + } + + if (dd->ipath_port0head == *dd->ipath_hdrqtailptr) + goto done; + +gotmore: + /* + * read only once at start. If in flood situation, this helps + * performance slightly. If more arrive while we are processing, + * we'll come back here and do them + */ + hdrqtail = *dd->ipath_hdrqtailptr; + + for (i = 0, l = dd->ipath_port0head; l != hdrqtail; i++) { + uint32_t qp; + uint8_t *bthbytes; + + + rc = (uint64_t *) (dd->ipath_pd[0]->port_rcvhdrq + (l << 2)); + hdr = (ips_message_header_typ *) & rc[1]; + /* + * could make a network order version of IPATH_KD_QP, and + * do the obvious shift before masking to speed this up. + */ + qp = ntohl(hdr->bth[1]) & 0xffffff; + bthbytes = (uint8_t *) hdr->bth; + + eflags = ips_get_hdr_err_flags(rc); + etype = ips_get_rcv_type(rc); + tlen = ips_get_length_in_bytes(rc); /* total length */ + ebuf = NULL; + if (etype != RCVHQ_RCV_TYPE_EXPECTED) { + /* + * it turns out that the chips uses an eager buffer for + * all non-expected packets, whether it "needs" + * one or not. So always get the index, but + * don't set ebuf (so we try to copy data) + * unless the length requires it. + */ + etail = ips_get_index(rc); + if (tlen > sizeof(*hdr) + || etype == RCVHQ_RCV_TYPE_NON_KD) { + ebuf = ipath_get_egrbuf(t, etail, 0); + } + } + + /* + * both tiderr and ipathhdrerr are set for all plain IB + * packets; only ipathhdrerr should be set. 
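
The ethernet receive path shown here never copies payload: the filled sk_buff is handed up the stack and a freshly allocated one takes its place in the eager ring. Schematically (a kernel-context fragment; the ring, repost, and delivery helpers are hypothetical, and the real code also rebuilds the chip's buffer descriptor from the new buffer's physical address):

static void rx_replace_and_deliver(struct sk_buff **ring, int slot,
				   unsigned len, unsigned maxlen)
{
	struct sk_buff *fresh, *full;

	fresh = dev_alloc_skb(maxlen + 4);
	if (!fresh) {
		count_drop();		/* keep the old buffer in the ring */
		return;
	}
	skb_reserve(fresh, 4);		/* mirror the driver's alignment */
	full = ring[slot];		/* the buffer the chip just filled */
	ring[slot] = fresh;
	skb_put(full, len);
	repost_buffer(slot, fresh);	/* hypothetical: rewrite eager entry */
	deliver_up_stack(full);		/* hypothetical: the l_rcv callback */
}
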
+ */ + + if (etype != RCVHQ_RCV_TYPE_NON_KD + && etype != RCVHQ_RCV_TYPE_ERROR + && ips_get_ipath_ver(hdr->iph.ver_port_tid_offset) != + IPS_PROTO_VERSION) { + _IPATH_PDBG("Bad InfiniPath protocol version %x\n", + etype); + } + + if (eflags & + ~(INFINIPATH_RHF_H_TIDERR | INFINIPATH_RHF_H_IHDRERR)) { + get_rhf_errstring(eflags, emsg, sizeof emsg); + _IPATH_PDBG + ("RHFerrs %x hdrqtail=%x typ=%u tlen=%x opcode=%x egridx=%x: %s\n", + eflags, l, etype, tlen, bthbytes[0], + ips_get_index(rc), emsg); + } else if (etype == RCVHQ_RCV_TYPE_NON_KD) { + /* + * If there is a userland SMA and this is a MAD packet, + * then pass it to the userland SMA. + */ + if (ipath_sma_alive && qp <= 1) { + /* + * count OK packets that we drop because + * SMA isn't yet running, or because we + * are in an sma flood (no point in + * constantly acquiring the spin lock, and + * overwriting previous packets). + * Eventually things will recover. + * Similarly if the sma consumer is + * so far behind that we would overwrite + * (yes, it's outside the lock) + */ + if (!ipath_sma_data_spare || + ipath_sma_data[ipath_sma_next].len || + ++sma_this_time > IPATH_NUM_SMAPKTS) { + ipath_stats.sps_krdrops++; + } else if (ebuf) { + ipath_rcv_sma(t, tlen, rc, ebuf); + } + } else if (dd->verbs_layer.l_rcv) { + dd->verbs_layer.l_rcv(t, rc + 1, ebuf, tlen); + } else { + _IPATH_VDBG("received IB packet, not SMA (QP=%x)\n", + qp); + } + } else if (etype == RCVHQ_RCV_TYPE_EAGER) { + if (qp == IPATH_KD_QP && bthbytes[0] == + dd->ipath_layer.l_rcv_opcode && ebuf) + ipath_rcv_layer(t, etail, tlen, + (ether_header_typ *) hdr); + else + _IPATH_PDBG + ("typ %x, opcode %x (eager, qp=%x), len %x; ignored\n", + etype, bthbytes[0], qp, tlen); + } else if (etype == RCVHQ_RCV_TYPE_EXPECTED) { + _IPATH_DBG("Bug: Expected TID, opcode %x; ignored\n", + hdr->bth[0] & 0xff); + } else if (eflags & + (INFINIPATH_RHF_H_TIDERR | INFINIPATH_RHF_H_IHDRERR)) + { + /* + * This is a type 3 packet, only the LRH is in + * the rcvhdrq, the rest of the header is in + * the eager buffer. + */ + uint8_t opcode; + if (ebuf) { + bthbytes = (uint8_t *) ebuf; + opcode = *bthbytes; + } else + opcode = 0; + get_rhf_errstring(eflags, emsg, sizeof emsg); + _IPATH_DBG + ("Err %x (%s), opcode %x, egrbuf %x, len %x\n", + eflags, emsg, opcode, etail, tlen); + } else { + /* + * error packet, type of error unknown. + * Probably type 3, but we don't know, so don't + * even try to print the opcode, etc. + */ + _IPATH_DBG + ("Error Pkt, but no eflags! egrbuf %x, len %x\n" + "hdrq@%lx;hdrq+%x rhf: %llx; hdr %llx %llx %llx %llx %llx\n", + etail, tlen, (unsigned long)rc, l, rc[0], rc[1], + rc[2], rc[3], rc[4], rc[5]); + } + l += rsize; + if (l >= maxcnt) + l = 0; + /* + * update for each packet, to help prevent overflows if we have + * lots of packets. 
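
The demultiplexing in this loop can be summarized as a dispatch on the receive type the chip records in the header flags; flattened into a switch, with hypothetical handler names standing in for the SMA, verbs, and layered-driver paths:

	switch (etype) {
	case RCVHQ_RCV_TYPE_NON_KD:	/* plain IB: MAD to SMA, else verbs */
		if (sma_wants(qp))
			queue_for_sma(hdr, ebuf, tlen);
		else
			verbs_rcv(hdr, ebuf, tlen);
		break;
	case RCVHQ_RCV_TYPE_EAGER:	/* native protocol, e.g. ether encap */
		layer_rcv(etail, tlen, hdr);
		break;
	case RCVHQ_RCV_TYPE_EXPECTED:	/* should have hit a TID buffer: bug */
	default:			/* error packet of unknown type */
		log_and_drop(eflags, hdr);
		break;
	}
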
+ */ + (void)ipath_kput_ureg(t, ur_rcvhdrhead, l, 0); + if (etype != RCVHQ_RCV_TYPE_EXPECTED) + (void)ipath_kput_ureg(t, ur_rcvegrindexhead, etail, 0); + } + + pkttot += i; + + dd->ipath_port0head = l; + + if (hdrqtail != *dd->ipath_hdrqtailptr) + goto gotmore; /* more arrived while we handled first batch */ + + if(pkttot > ipath_stats.sps_maxpkts_call) + ipath_stats.sps_maxpkts_call = pkttot; + ipath_stats.sps_port0pkts += pkttot; + ipath_stats.sps_avgpkts_call = ipath_stats.sps_port0pkts / ++totcalls; + + if (sma_this_time) /* only once at end, not each time */ + wake_up_interruptible(&ipath_sma_wait); + +done: + clear_bit(0, &dd->ipath_rcv_pending); + smp_mb__after_clear_bit(); +} + +/* + * Update our shadow copy of the PIO availability register map, called + * whenever our local copy indicates we have run out of send buffers + * NOTE: This can be called from interrupt context by ipath_bufavail() + * and from non-interrupt context by ipath_getpiobuf(). + */ + +static void ipath_update_pio_bufs(const ipath_type t) +{ + unsigned long flags; + int i; + const unsigned piobregs = (unsigned)devdata[t].ipath_pioavregs; + + /* If the generation (check) bits have changed, then we update the + * busy bit for the corresponding PIO buffer. This algorithm will + * modify positions to the value they already have in some cases + * (i.e., no change), but it's faster than changing only the bits + * that have changed. + * + * We would like to do this atomicly, to avoid spinlocks in the + * critical send path, but that's not really possible, given the + * type of changes, and that this routine could be called on multiple + * cpu's simultaneously, so we lock in this routine only, to avoid + * conflicting updates; all we change is the shadow, and it's a + * single 64 bit memory location, so by definition the update is + * atomic in terms of what other cpu's can see in testing the + * bits. The spin_lock overhead isn't too bad, since it only + * happens when all buffers are in use, so only cpu overhead, + * not latency or bandwidth is affected. + */ +#define _IPATH_ALL_CHECKBITS 0x5555555555555555ULL + if (!devdata[t].ipath_pioavailregs_dma) { + _IPATH_DBG("Update shadow pioavail, but regs_dma NULL!\n"); + return; + } + if (infinipath_debug & __IPATH_VERBDBG) { + /* only if packet debug and verbose */ + _IPATH_PDBG("Refill avail, dma0=%llx shad0=%llx, " + "d1=%llx s1=%llx, d2=%llx s2=%llx, d3=%llx s3=%llx\n", + devdata[t].ipath_pioavailregs_dma[0], + devdata[t].ipath_pioavailshadow[0], + devdata[t].ipath_pioavailregs_dma[1], + devdata[t].ipath_pioavailshadow[1], + devdata[t].ipath_pioavailregs_dma[2], + devdata[t].ipath_pioavailshadow[2], + devdata[t].ipath_pioavailregs_dma[3], + devdata[t].ipath_pioavailshadow[3]); + if (piobregs > 4) + _IPATH_PDBG("2nd group, dma4=%llx shad4=%llx, " + "d5=%llx s5=%llx, d6=%llx s6=%llx, d7=%llx s7=%llx\n", + devdata[t].ipath_pioavailregs_dma[4], + devdata[t].ipath_pioavailshadow[4], + devdata[t].ipath_pioavailregs_dma[5], + devdata[t].ipath_pioavailshadow[5], + devdata[t].ipath_pioavailregs_dma[6], + devdata[t].ipath_pioavailshadow[6], + devdata[t].ipath_pioavailregs_dma[7], + devdata[t].ipath_pioavailshadow[7]); + } + spin_lock_irqsave(&ipath_pioavail_lock, flags); + for (i = 0; i < piobregs; i++) { + uint64_t pchbusy, pchg, piov, pnew; + /* Chip Errata: bug 6641; even and odd qwords>3 are swapped */ + piov = devdata[t].ipath_pioavailregs_dma[i > 3 ? i ^ 1 : i]; + pchg = + _IPATH_ALL_CHECKBITS & ~(devdata[t]. 
+ ipath_pioavailshadow[i] ^ piov); + pchbusy = pchg << INFINIPATH_SENDPIOAVAIL_BUSY_SHIFT; + if (pchg && (pchbusy & devdata[t].ipath_pioavailshadow[i])) { + pnew = devdata[t].ipath_pioavailshadow[i] & ~pchbusy; + pnew |= piov & pchbusy; + devdata[t].ipath_pioavailshadow[i] = pnew; + } + } + spin_unlock_irqrestore(&ipath_pioavail_lock, flags); +} + +static int ipath_do_user_init(ipath_portdata * pd, + struct ipath_user_info *uinfo) +{ + int ret = 0; + ipath_type t = pd->port_unit; + ipath_devdata *dd = &devdata[t]; + struct ipath_user_info kinfo; + + if (copy_from_user(&kinfo, uinfo, sizeof kinfo)) + ret = -EFAULT; + else { + /* for now, if major version is different, bail */ + if ((kinfo.spu_userversion >> 16) != IPATH_USER_SWMAJOR) { + _IPATH_INFO + ("User major version %d not same as driver major %d\n", + kinfo.spu_userversion >> 16, IPATH_USER_SWMAJOR); + ret = -ENODEV; + } else { + if ((kinfo.spu_userversion & 0xffff) != + IPATH_USER_SWMINOR) + _IPATH_DBG + ("User minor version %d not same as driver minor %d\n", + kinfo.spu_userversion & 0xffff, + IPATH_USER_SWMINOR); + if (kinfo.spu_rcvhdrsize) { + if ((ret = + ipath_setrcvhdrsize(t, + kinfo.spu_rcvhdrsize))) + goto done; + } else if (!dd->ipath_rcvhdrsize) { + /* + * first user of field, kernel or user + * code, and using default + */ + dd->ipath_rcvhdrsize = IPATH_DFLT_RCVHDRSIZE; + ipath_kput_kreg(pd->port_unit, kr_rcvhdrsize, + dd->ipath_rcvhdrsize); + _IPATH_VDBG + ("Use default protocol header size %u\n", + dd->ipath_rcvhdrsize); + } + + pd->port_egrskip = kinfo.spu_egrskip; + if (pd->port_egrskip) { + if (pd->port_egrskip & 3) { + _IPATH_DBG + ("eager skip 0x%x invalid, must be word multiple; using 0x%x\n", + pd->port_egrskip, + pd->port_egrskip & ~3); + pd->port_egrskip &= ~3; + } + _IPATH_DBG + ("user reserves 0x%x bytes at start of eager TIDs\n", + pd->port_egrskip); + } + + /* + * for now we do nothing with rcvhdrcnt: + * kinfo.spu_rcvhdrcnt + */ + + /* + * set up for the rcvhdr Q tail register writeback + * to user memory + */ + if (kinfo.spu_rcvhdraddr && + access_ok(VERIFY_WRITE, kinfo.spu_rcvhdraddr, + sizeof(uint64_t))) { + uint64_t physaddr, uaddr, off, atmp; + struct page *pagep; + off = offset_in_page(kinfo.spu_rcvhdraddr); + uaddr = + PAGE_MASK & (unsigned long)kinfo. 
+ spu_rcvhdraddr; + if ((ret = ipath_mlock_nocopy(uaddr, &pagep))) { + _IPATH_INFO + ("Failed to lookup and lock address %llx for rcvhdrtail: errno %d\n", + kinfo.spu_rcvhdraddr, -ret); + goto done; + } + ipath_stats.sps_pagelocks++; + pd->port_rcvhdrtail_uaddr = uaddr; + pd->port_rcvhdrtail_pagep = pagep; + pd->port_rcvhdrtail_kvaddr = + page_address(pagep); + pd->port_rcvhdrtail_kvaddr += off; + physaddr = page_to_phys(pagep) + off; + _IPATH_VDBG + ("port %d user addr %llx hdrtailaddr, %llx physical (off=%llx)\n", + pd->port_port, kinfo.spu_rcvhdraddr, + physaddr, off); + ipath_kput_kreg_port(t, kr_rcvhdrtailaddr, + pd->port_port, physaddr); + atmp = + ipath_kget_kreg64_port(t, kr_rcvhdrtailaddr, + pd->port_port); + if (physaddr != atmp) { + _IPATH_UNIT_ERROR(t, + "Catastrophic software error, RcvHdrTailAddr%u written as %llx, read back as %llx\n", + pd->port_port, + physaddr, atmp); + ret = -EINVAL; + goto done; + } + } else { + _IPATH_DBG + ("Port %d rcvhdrtail addr %llx not valid\n", + pd->port_port, kinfo.spu_rcvhdraddr); + ret = -EINVAL; + goto done; + } + + /* + * for right now, kernel piobufs are at end, + * so port 1 is at 0 + */ + pd->port_piobufs = dd->ipath_piobufbase + + dd->ipath_pbufsport * (pd->port_port - + 1) * dd->ipath_palign; + _IPATH_VDBG("Set base of piobufs for port %u to 0x%x\n", + pd->port_port, pd->port_piobufs); + + /* + * Now allocate the rcvhdr Q and eager TIDs; + * skip the TID array for time being. + * If pd->port_port > chip-supported, we need + * to do extra stuff here to handle by handling + * overflow through port 0, someday + */ + if (!(ret = ipath_create_rcvhdrq(pd))) + ret = ipath_create_user_egr(pd); + if (!ret) { /* enable receives now */ + uint64_t head; + uint32_t head32; + /* atomically set enable bit for this port */ + atomic_set_mask(1U << + (INFINIPATH_R_PORTENABLE_SHIFT + + pd->port_port), + &dd->ipath_rcvctrl); + + /* + * set the head registers for this port + * to the current values of the tail + * pointers, since we don't know if they + * were updated on last use of the port. + */ + head32 = + ipath_kget_ureg32(t, ur_rcvhdrtail, + pd->port_port); + head = (uint64_t) head32; + ipath_kput_ureg(t, ur_rcvhdrhead, head, + pd->port_port); + head32 = + ipath_kget_ureg32(t, ur_rcvegrindextail, + pd->port_port); + ipath_kput_ureg(t, ur_rcvegrindexhead, head32, + pd->port_port); + dd->ipath_lastegrheads[pd->port_port] = ~0; + dd->ipath_lastrcvhdrqtails[pd->port_port] = ~0; + _IPATH_VDBG + ("Wrote port%d head %llx, egrhead %x from tail regs\n", + pd->port_port, head, head32); + /* start at beginning after open */ + pd->port_tidcursor = 0; + { + /* + * now enable the port; the tail + * registers will be written to + * memory by the chip as soon + * as it sees the write to + * kr_rcvctrl. The update only + * happens on transition from 0 + * to 1, so clear it first, then + * set it as part of enabling + * the port. This will (very + * briefly) affect any other open + * ports, but it shouldn't be long + * enough to be an issue. 
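
Note the defensive pattern used when programming the tail writeback address: write the register, read it back, and treat any mismatch as fatal, since a silently truncated DMA address would let the chip write to the wrong memory. The skeleton, with stand-ins for the ipath_kput_kreg_port()/ipath_kget_kreg64_port() helpers:

	write_port_reg(kr_rcvhdrtailaddr, port, physaddr);
	readback = read_port_reg(kr_rcvhdrtailaddr, port);
	if (readback != physaddr) {
		/* the register silently dropped bits; failing the open is
		 * much cheaper than letting the chip DMA somewhere else */
		ret = -EINVAL;
		goto done;
	}
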
+ */ + ipath_kput_kreg(t, kr_rcvctrl, + dd-> + ipath_rcvctrl & + ~INFINIPATH_R_TAILUPD); + ipath_kput_kreg(t, kr_rcvctrl, + dd->ipath_rcvctrl); + } + } + } + } + +done: + return ret; +} + +static int ipath_get_baseinfo(ipath_portdata * pd, + struct ipath_base_info *ubase) +{ + int ret = 0; + struct ipath_base_info kbase; + ipath_devdata *dd = &devdata[pd->port_unit]; + + /* be sure anything we don't set is 0ed */ + memset(&kbase, 0, sizeof kbase); + kbase.spi_rcvhdr_cnt = dd->ipath_rcvhdrcnt; + kbase.spi_rcvhdrent_size = dd->ipath_rcvhdrentsize; + kbase.spi_tidegrcnt = dd->ipath_rcvegrcnt; + kbase.spi_rcv_egrbufsize = dd->ipath_rcvegrbufsize; + kbase.spi_rcv_egrbuftotlen = pd->port_rcvegrbuf_chunks * PAGE_SIZE * (1 << pd->port_rcvegrbuf_order); /* have to mmap whole thing */ + kbase.spi_rcv_egrperchunk = pd->port_rcvegrbufs_perchunk; + kbase.spi_rcv_egrchunksize = kbase.spi_rcv_egrbuftotlen / + pd->port_rcvegrbuf_chunks; + kbase.spi_tidcnt = dd->ipath_rcvtidcnt; + /* + * for this use, may be ipath_cfgports summed over all chips that + * are are configured and present + */ + kbase.spi_nports = dd->ipath_cfgports; + kbase.spi_unit = pd->port_unit; /* unit (chip/board) our port is on */ + /* for now, only a single page */ + kbase.spi_tid_maxsize = PAGE_SIZE; + + /* + * doing this per port, and based on the skip value, etc. + * This has to be the actual buffer size, since the protocol + * code treats it as an array. + * + * These have to be set to user addresses in the user code via mmap + * These values are used on return to user code for the mmap target + * addresses only. For 32 bit, same 44 bit address problem, so use + * the physical address, not virtual. Before 2.6.11, using the + * page_address() macro worked, but in 2.6.11, even that returns + * the full 64 bit address (upper bits all 1's). + * So far, using the physical addresses (or chip offsets, for + * chip mapping) works, but no doubt some future kernel release + * will chang that, and we'll be on to yet another method of + * dealing with this + */ + kbase.spi_rcvhdr_base = (uint64_t) pd->port_rcvhdrq_phys; + kbase.spi_rcv_egrbufs = (uint64_t) pd->port_rcvegr_phys; + kbase.spi_pioavailaddr = (uint64_t) dd->ipath_pioavailregs_phys; + kbase.spi_status = (uint64_t) kbase.spi_pioavailaddr + + (void *)dd->ipath_statusp - (void *)dd->ipath_pioavailregs_dma; + kbase.spi_piobufbase = (uint64_t) pd->port_piobufs; + kbase.__spi_uregbase = + dd->ipath_uregbase + dd->ipath_palign * pd->port_port; + + kbase.spi_pioindex = dd->ipath_pbufsport * (pd->port_port - 1); + kbase.spi_piocnt = dd->ipath_pbufsport; + kbase.spi_pioalign = dd->ipath_palign; + + kbase.spi_qpair = IPATH_KD_QP; + kbase.spi_piosize = dd->ipath_ibmaxlen; + kbase.spi_mtu = dd->ipath_ibmaxlen; /* maxlen, not ibmtu */ + kbase.spi_port = pd->port_port; + kbase.spi_sw_version = IPATH_KERN_SWVERSION; + kbase.spi_hw_version = dd->ipath_revision; + + if (copy_to_user(ubase, &kbase, sizeof kbase)) + ret = -EFAULT; + + return ret; +} + +/* + * return number of units supported by driver. This is infinipath_max, + * unless there are no initted units. 
+ */ +static int ipath_get_units(void) +{ + int i; + + for (i = 0; i < infinipath_max; i++) + if (devdata[i].ipath_flags & IPATH_INITTED) + return infinipath_max; + return 0; +} + +/* write data to the EEPROM on the board */ +static int ipath_wr_eeprom(ipath_portdata * pd, struct ipath_eeprom_req *req) +{ + int ret = 0; + struct ipath_eeprom_req kreq; + void *buf = NULL; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; /* not just any old user can write flash */ + if (copy_from_user(&kreq, req, sizeof kreq)) + return -EFAULT; + if (!kreq.addr || (kreq.offset + kreq.len) > 128) { + _IPATH_DBG + ("called with NULL addr %llx, or bad cnt %u or offset %u\n", + kreq.addr, kreq.len, kreq.offset); + return -EINVAL; + } + + if (!(buf = vmalloc(kreq.len))) { + ret = -ENOMEM; + _IPATH_UNIT_ERROR(pd->port_unit, + "Couldn't allocate memory to write %u bytes from eeprom\n", + kreq.len); + goto done; + } + if (copy_from_user(buf, (void *)kreq.addr, kreq.len)) { + ret = -EFAULT; + goto done; + } + if (ipath_eeprom_write(pd->port_unit, kreq.offset, buf, kreq.len)) { + ret = -ENXIO; + _IPATH_UNIT_ERROR(pd->port_unit, + "Failed write to eeprom %u bytes offset %u\n", + kreq.len, kreq.offset); + } + +done: + if (buf) + vfree(buf); + return ret; +} + +/* read data from the EEPROM on the board */ +int ipath_rd_eeprom(const ipath_type port_unit, struct ipath_eeprom_req *req) +{ + int ret = 0; + struct ipath_eeprom_req kreq; + void *buf = NULL; + + if (copy_from_user(&kreq, req, sizeof kreq)) + return -EFAULT; + if (!kreq.addr || (kreq.offset + kreq.len) > 128) { + _IPATH_DBG + ("called with NULL addr %llx, or bad cnt %u or offset %u\n", + kreq.addr, kreq.len, kreq.offset); + return -EINVAL; + } + + if (!(buf = vmalloc(kreq.len))) { + ret = -ENOMEM; + _IPATH_UNIT_ERROR(port_unit, + "Couldn't allocate memory to read %u bytes from eeprom\n", + kreq.len); + goto done; + } + if (ipath_eeprom_read(port_unit, kreq.offset, buf, kreq.len)) { + ret = -ENXIO; + _IPATH_UNIT_ERROR(port_unit, + "Failed reading %u bytes offset %u from eeprom\n", + kreq.len, kreq.offset); + } + if (copy_to_user((void *)kreq.addr, buf, kreq.len)) + ret = -EFAULT; + +done: + if (buf) + vfree(buf); + return ret; +} + +/* + * wait for something to happen on a port. Currently this is + * PIO buffer available, or a packet being received. For now, at + * least, we wait no longer than 1/2 seconds on rcv, 1 tick on PIO, so + * we recover from any bugs (or, as we see in ips.c init and close, cases + * where other side isn't yet ready). + * NOTE: currently called only with PIO or RCV, never both, so path with both + * has not been tested + */ +static int ipath_wait_intr(ipath_portdata * pd, uint32_t flag) +{ + ipath_devdata *dd = &devdata[pd->port_unit]; + /* stupid compiler can't tell it's initialized */ + uint32_t im = 0; + uint32_t head, tail, timeo = 0, wflag = 0; + + if (!(flag & (IPATH_WAIT_RCV | IPATH_WAIT_PIO))) + return -EINVAL; + if (flag & IPATH_WAIT_RCV) { + head = flag >> 16; + im = (1U << pd->port_port) << INFINIPATH_R_INTRAVAIL_SHIFT; + atomic_set_mask(im, &dd->ipath_rcvctrl); + /* + * now, before blocking, make sure that head is still == tail, + * reading from the chip, so we can be sure the interrupt enable + * has made it to the chip. If not equal, disable + * interrupt again and return immediately. This avoids + * races, and the overhead of the chip read doesn't + * matter much at this point, since we are waiting for + * something anyway. 
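
The ordering that comment insists on is the standard lost-wakeup defense: arm the interrupt, make sure the chip has seen the enable, then re-check the condition, and only block if it still holds. In outline (all helpers hypothetical):

	enable_rcv_interrupt(port);		/* 1. arm the wakeup source */
	write_rcvctrl_to_chip();		/* 2. ensure the chip saw it */
	tail = read_tail_from_chip(port);	/* 3. recheck the condition */
	if (tail != head) {
		disable_rcv_interrupt(port);	/*    data arrived: don't block */
		write_rcvctrl_to_chip();
	} else {
		sleep_with_timeout(HZ / 2);	/* 4. now blocking is safe */
	}
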
+ */ + ipath_kput_kreg(pd->port_unit, kr_rcvctrl, dd->ipath_rcvctrl); + tail = + ipath_kget_ureg32(pd->port_unit, ur_rcvhdrtail, + pd->port_port); + if (tail == head) { + timeo = HZ / 2; + wflag = IPATH_PORT_WAITING_RCV; + } else { + atomic_clear_mask(im, &dd->ipath_rcvctrl); + ipath_kput_kreg(pd->port_unit, kr_rcvctrl, + dd->ipath_rcvctrl); + } + } + if (flag & IPATH_WAIT_PIO) { + /* + * this one's a bit worse than the receive case, in that we + * can't really verify that at least one interrupt + * will happen... + * We do use a really short timeout, however + */ + timeo = 1; /* if both, the short PIO timeout wins */ + atomic_set_mask(1U << pd->port_port, &dd->ipath_portpiowait); + wflag |= IPATH_PORT_WAITING_PIO; + /* + * this has a possible race with the ipath stuff, so do + * it atomicly + */ + atomic_set_mask(INFINIPATH_S_PIOINTBUFAVAIL, + &dd->ipath_sendctrl); + ipath_kput_kreg(pd->port_unit, kr_sendctrl, dd->ipath_sendctrl); + } + if (wflag) { + pd->port_flag |= wflag; + wait_event_interruptible_timeout(pd->port_wait, + (pd->port_flag & wflag) != + wflag, timeo); + if (wflag & pd->port_flag & IPATH_PORT_WAITING_PIO) { + /* timed out, no PIO interrupts */ + atomic_clear_mask(IPATH_PORT_WAITING_PIO, + &pd->port_flag); + pd->port_piowait_to++; + atomic_clear_mask(1U << pd->port_port, + &dd->ipath_portpiowait); + /* + * *don't* clear the pio interrupt enable; + * let that happen in the interrupt handler; + * else we have a race condition. + */ + } + if (wflag & pd->port_flag & IPATH_PORT_WAITING_RCV) { + /* timed out, no packets received */ + atomic_clear_mask(IPATH_PORT_WAITING_RCV, + &pd->port_flag); + pd->port_rcvwait_to++; + atomic_clear_mask(im, &dd->ipath_rcvctrl); + ipath_kput_kreg(pd->port_unit, kr_rcvctrl, + dd->ipath_rcvctrl); + } + } else { + /* else it's already happened, don't do wait_event overhead */ + if (flag & IPATH_WAIT_RCV) + pd->port_rcvnowait++; + if (flag & IPATH_WAIT_PIO) + pd->port_pionowait++; + } + return 0; +} -- 0.99.9n From rolandd at cisco.com Fri Dec 16 15:48:54 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 16 Dec 2005 15:48:54 -0800 Subject: [openib-general] [PATCH 01/13] [RFC] ipath basic headers In-Reply-To: <200512161548.jRuyTS0HPMLd7V81@cisco.com> Message-ID: <200512161548.aLjaDpGm5aqk0k0p@cisco.com> Basic headers for the ipath driver --- drivers/infiniband/hw/ipath/ipath_common.h | 798 +++++++++++++++++++++++++ drivers/infiniband/hw/ipath/ipath_kernel.h | 776 ++++++++++++++++++++++++ drivers/infiniband/hw/ipath/ipath_layer.h | 131 ++++ drivers/infiniband/hw/ipath/ipath_registers.h | 359 +++++++++++ drivers/infiniband/hw/ipath/ips_common.h | 221 +++++++ 5 files changed, 2285 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/hw/ipath/ipath_common.h create mode 100644 drivers/infiniband/hw/ipath/ipath_kernel.h create mode 100644 drivers/infiniband/hw/ipath/ipath_layer.h create mode 100644 drivers/infiniband/hw/ipath/ipath_registers.h create mode 100644 drivers/infiniband/hw/ipath/ips_common.h 200aa6cff25b6ab39be1f9d8949c2b3b4258ee1d diff --git a/drivers/infiniband/hw/ipath/ipath_common.h b/drivers/infiniband/hw/ipath/ipath_common.h new file mode 100644 index 0000000..ac33458 --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_common.h @@ -0,0 +1,798 @@ +/* + * Copyright (c) 2003, 2004, 2005. PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + * + * $Id: ipath_common.h 4491 2005-12-15 22:20:31Z rjwalsh $ + */ + +#ifndef _IPATH_COMMON_H +#define _IPATH_COMMON_H + +/* + * This file contains defines, structures, etc. that are used + * to communicate between kernel and user code. + */ + +#ifdef __KERNEL__ +#include +#include +#include +#else /* !__KERNEL__; user mode */ +#include +#include +#include +#include + +/* these aren't implemented for user mode, which is OK until we multi-thread */ +typedef struct _atomic { + uint32_t counter; +} atomic_t; /* no atomic_t type in user-land */ +#define atomic_set(a,v) ((a)->counter = (v)) +#define atomic_inc_return(a) (++(a)->counter) +#define likely(x) (x) +#define unlikely(x) (x) + +#define yield() sched_yield() + +/* + * too horrible to try and use the kernel get_cycles() or equivalent, + * so define and inline it here + */ + +#if !defined(rdtscll) +#if defined(__x86_64) || defined(__i386) +#define rdtscll(v) do {uint32_t a,d;asm volatile("rdtsc" : "=a" (a), "=d" (d)); \ + (v) = ((uint64_t)a) | (((uint64_t)d)<<32); \ +} while(0) +#else +#error "No cycle counter routine implemented yet for this platform" +#endif +#endif /* !defined(rdtscll) */ + +#endif /* ! __KERNEL__ */ + +typedef uint8_t ipath_type; + +/* This is the IEEE-assigned OUI for PathScale, Inc. */ +#define IPATH_SRC_OUI_1 0x00 +#define IPATH_SRC_OUI_2 0x11 +#define IPATH_SRC_OUI_3 0x75 + +/* version of protocol header (known to chip also). In the long run, + * we should be able to generate and accept a range of version numbers; + * for now we only accept one, and it's compiled in. + */ +#define IPS_PROTO_VERSION 2 + +#ifndef _BITS_PER_BYTE +#define _BITS_PER_BYTE 8 +#endif + +static __inline__ void ipath_shortcopy(void *dest, void *src, uint32_t cnt) + __attribute__ ((always_inline)); + +/* + * this is used for very short copies, usually 1 - 8 bytes, + * *NEVER* to the PIO buffers!!!!!!! use ipath_dwordcpy for longer + * copies, or any copy to the PIO buffers. 
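
The x86-specific rep-movsb assembly that implements ipath_shortcopy() below is functionally a byte copy; on any other architecture the portable stand-in would simply be memcpy, subject to the same restriction the comment states:

#include <string.h>
#include <stdint.h>

/* Portable stand-in for ipath_shortcopy(): a plain byte copy. Must NOT
 * be used for the PIO buffers; the comment below requires
 * ipath_dwordcpy for those, and for any longer copy. */
static inline void short_copy_fallback(void *dest, const void *src,
				       uint32_t cnt)
{
	memcpy(dest, src, cnt);
}
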
Works for 32 and 64 bit + * gcc and pathcc + */ +static __inline__ void ipath_shortcopy(void *dest, void *src, uint32_t cnt) +{ + void *ssv, *dsv; + uint32_t csv; + __asm__ __volatile__("cld\n\trep\n\tmovsb":"=&c"(csv), "=&D"(dsv), + "=&S"(ssv) + :"0"(cnt), "1"(dest), "2"(src) + :"memory"); +} + +/* + * optimized word copy; good for rev C and later opterons. Among the best for + * short copies, and does as well or slightly better than the optimizization + * guide copies 6 and 8 at 2KB. + */ +void ipath_dwordcpy(uint32_t * dest, uint32_t * src, uint32_t ndwords); + +/* + * These are compile time constants that you may want to enable or disable + * if you are trying to debug problems with code or performance. + * IPATH_VERBOSE_TRACING define as 1 if you want additional tracing in + * fastpath code + * IPATH_TRACE_REGWRITES define as 1 if you want register writes to be + * traced in faspath code + * _IPATH_TRACING define as 0 if you want to remove all tracing in a + * compilation unit + * _IPATH_DEBUGGING define as 0 if you want to remove debug prints + */ + +#define round_up(v,sz) (((v) + (sz)-1) & ~((sz)-1)) + +/* These are used in the driver, don't use them elsewhere */ +#define _IPATH_SIMFUNC_IOCTL_LOW 1 +#define _IPATH_SIMFUNC_IOCTL_HIGH 7 + +/* + * These tell the driver which ioctl's belong to the diags interface. + * As above, don't use them elsewhere. + */ +#define _IPATH_DIAG_IOCTL_LOW 100 +#define _IPATH_DIAG_IOCTL_HIGH 109 + +/* for IPATHSETREGBASE the length is the length covered by addr, in bytes */ +struct ipath_setregbase { + void *addr; + size_t length; +}; +/* + * IPATHINTERRUPT ioctl passes this as of rev 1.6 of the simulator; + * used to be an int + */ +struct ipath_int_vec { + int long long addr; + uint32_t info; +}; +struct ipath_eeprom_req { + long long addr; + uint16_t len; + uint16_t offset; +}; + +/* simulated chip space */ +#define IPATHSETREGBASE _IOW('s', 1, struct ipath_setregbase) +/* arg is currently unused */ +#define IPATHINTERRUPT _IOW('s', 2, struct ipath_int_vec) +/* + * arg is low 32 bits of the simulator sync register, and means that + * the simulator has processed up to and including that write + */ +#define IPATHSYNC _IOW('s', 3, int) + +/* + * simulator has initialized the memory from IPATHSETREGBASE, and driver + * can initialize based on the contents + */ +#define IPATHREADY _IOW('s', 4, int) +/* user mode userreg write, so we can notify simulators */ +#define IPATH_USERREG _IOW('s', 5, __ipath_rdummy) + +/* init; user params to kernel */ +#define IPATH_USERINIT _IOW('s', 16, struct ipath_user_info) +/* init; kernel/chip params to user */ +#define IPATH_BASEINFO _IOR('s', 17, struct ipath_base_info) +/* send a packet */ +#define IPATH_SENDPKT _IOW('s', 18, struct ipath_sendpkt) +/* + * if arg is 0, disable port, used when flushing after a hdrq overflow. 
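
These _IOW/_IOR macros pack a direction, the type character 's', a command number, and the argument size into each ioctl number, and user code invokes them in the usual way. A hypothetical caller of IPATH_USERINIT, assuming this header is visible to user space and the device node is the /dev/ipath mentioned further down:

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>

int ipath_open_and_init(struct ipath_user_info *uinfo)
{
	int fd = open("/dev/ipath", O_RDWR);

	if (fd < 0)
		return -1;
	if (ioctl(fd, IPATH_USERINIT, uinfo) < 0) {
		close(fd);
		return -1;
	}
	return fd;	/* next step would be ioctl(fd, IPATH_BASEINFO, ...) */
}
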
+ * If arg ia 1, re-enable, and return new value of head register + */ +#define IPATH_RCVCTRL _IOR('s', 19, uint32_t) +/* only to make iow macro happy, w/o a struct */ +static uint64_t __ipath_rdummy[2] __attribute__ ((unused)); +#define IPATH_READ_EEPROM _IOWR('s', 20, struct ipath_eeprom_req) +/* set an accepted partition key; up to 4 pkeys can be active at once */ +#define IPATH_SET_PKEY _IOW('s', 21, uint16_t) +#define IPATH_WRITE_EEPROM _IOWR('s', 22, struct ipath_eeprom_req) +/* set LID for interface (SMA) */ +#define IPATH_SET_LID _IOW('s', 23, uint32_t) +/* set IB MTU for interface (SMA) */ +#define IPATH_SET_MTU _IOW('s', 24, uint32_t) +/* set IB link state for interface (SMA) */ +#define IPATH_SET_LINKSTATE _IOW('s', 25, uint32_t) +/* send an SMA packet, sps_flags contains "normal" SMA unit and minor number. */ +#define IPATH_SEND_SMA_PKT _IOW('s', 26, struct ipath_sendpkt) +/* receive an SMA packet */ +#define IPATH_RCV_SMA_PKT _IOW('s', 27, struct ipath_sendpkt) +/* get the portinfo data (SMA) + * takes array of 13, returns port info fields. Data is in host order, + * not network order; SMA-only fields are not filled in + */ +#define IPATH_GET_PORTINFO _IOWR('s', 28, uint32_t *) +/* + * get the nodeinfo data (SMA) + * takes an array of 10, returns nodeinfo fields in host order + */ +#define IPATH_GET_NODEINFO _IOWR('s', 29, uint32_t *) +/* set GUID on interface (SMA; GUID given in network order) */ +#define IPATH_SET_GUID _IOW('s', 30, struct ipath_setguid) +/* set MLID for interface (SMA) */ +#define IPATH_SET_MLID _IOW('s', 31, uint32_t) +#define IPATH_GET_MLID _IOWR('s', 32, uint32_t *) /* get the MLID (SMA) */ +/* update expected TID entries */ +#define IPATH_UPDM_TID _IOWR('s', 33, struct _tidupd) +/* free expected TID entries */ +#define IPATH_FREE_TID _IOW('s', 34, struct _tidupd) +/* return assigned unit:port */ +#define IPATH_GETPORT _IOR('s', 35, uint32_t) +/* wait for rcv pkt or pioavail */ +#define IPATH_WAIT _IOW('s', 36, uint32_t) +/* return LID for passed in unit */ +#define IPATH_GETLID _IOR('s', 37, uint16_t) +/* return # of units supported by driver */ +#define IPATH_GETUNITS _IO('s', 38) +/* get the device status */ +#define IPATH_GET_DEVSTATUS _IOWR('s', 39, uint64_t *) + +/* available for reuse ('s', 48) */ + +/* diagnostic read */ +#define IPATH_DIAGREAD _IOR('s', 100, struct ipath_diag_info) +/* diagnostic write */ +#define IPATH_DIAGWRITE _IOW('s', 101, struct ipath_diag_info) +/* HT Config read */ +#define IPATH_DIAG_HTREAD _IOR('s', 102, struct ipath_diag_info) +/* HT config write */ +#define IPATH_DIAG_HTWRITE _IOW('s', 103, struct ipath_diag_info) +#define IPATH_DIAGENTER _IO('s', 104) /* Enter diagnostic mode */ +#define IPATH_DIAGLEAVE _IO('s', 105) /* Leave diagnostic mode */ +/* send a packet, sps_flags contains unit and minor number. */ +#define IPATH_SEND_DIAG_PKT _IOW('s', 106, struct ipath_sendpkt) +/* + * read I2C FLASH + * NOTE: To read the I2C device, the _uaddress field should contain + * a pointer to struct ipath_eeprom_req, and _unit must be valid + */ +#define IPATH_DIAG_RD_I2C _IOW('s', 107, struct ipath_diag_info) + +/* + * Monitoring ioctls. All of these work with the main device + * (/dev/ipath), if you don't mind using a port (e.g. you already have + * the device open.) IPATH_GETSTATS and IPATH_GETUNITCOUNTERS also + * work with the control device (/dev/ipath_ctrl), if you don't want to + * use a port. + */ + +/* return chip counters for current unit. 
*/ +#define IPATH_GETCOUNTERS _IOR('s', 40, struct infinipath_counters) +/* return chip stats */ +#define IPATH_GETSTATS _IOR('s', 41, struct infinipath_stats) +/* return chip counters for a particular unit. */ +#define IPATH_GETUNITCOUNTERS _IOR('s', 42, struct infinipath_getunitcounters) + +/* + * unit is incoming unit number. + * data is a pointer to the infinipath_counters structure. + */ +struct infinipath_getunitcounters { + uint16_t unit; + uint64_t data; +}; + +/* + * The value in the BTH QP field that InfiniPath uses to differentiate + * an infinipath protocol IB packet vs standard IB transport + */ +#define IPATH_KD_QP 0x656b79 + +/* + * valid states passed to ipath_set_linkstate() user call + * (IPATH_SET_LINKSTATE ioctl) + */ +#define IPATH_IB_LINKDOWN 0 +#define IPATH_IB_LINKARM 1 +#define IPATH_IB_LINKACTIVE 2 + +/* + * stats maintained by the driver. For now, at least, this is global + * to all minor devices. + */ +struct infinipath_stats { + uint64_t sps_ints; /* number of interrupts taken */ + uint64_t sps_errints; /* number of interrupts for errors */ + /* number of errors from chip (not including packet errors or CRC) */ + uint64_t sps_errs; + /* number of packet errors from chip other than CRC */ + uint64_t sps_pkterrs; + /* number of packets with CRC errors (ICRC and VCRC) */ + uint64_t sps_crcerrs; + /* number of hardware errors reported (parity, etc.) */ + uint64_t sps_hwerrs; + /* number of times IB link changed state unexpectedly */ + uint64_t sps_iblink; + uint64_t sps_unused3; /* no longer used; left for compatibility */ + uint64_t sps_port0pkts; /* number of kernel (port0) packets received */ + /* number of "ethernet" packets sent by driver */ + uint64_t sps_ether_spkts; + /* number of "ethernet" packets received by driver */ + uint64_t sps_ether_rpkts; + uint64_t sps_sma_spkts; /* number of SMA packets sent by driver */ + uint64_t sps_sma_rpkts; /* number of SMA packets received by driver */ + /* number of times all ports rcvhdrq was full and packet dropped */ + uint64_t sps_hdrqfull; + /* number of times all ports egrtid was full and packet dropped */ + uint64_t sps_etidfull; + /* + * number of times we tried to send from driver, but no pio + * buffers avail + */ + uint64_t sps_nopiobufs; + uint64_t sps_ports; /* number of ports currently open */ + /* list of pkeys (other than default) accepted (0 means not set) */ + uint16_t sps_pkeys[4]; + /* lids for up to 4 infinipaths, indexed by infinipath # */ + uint16_t sps_lid[4]; + /* number of user ports per chip (not IB ports) */ + uint32_t sps_nports; + uint32_t sps_nullintr; /* not our interrupt, or already handled */ + uint32_t sps_maxpkts_call; /* max number of packets handled per receive call */ + uint32_t sps_avgpkts_call; /* avg number of packets handled per receive call */ + uint64_t sps_pagelocks; /* total number of pages ipath_mlock()'ed */ + /* total number of pages ipath_munlock()'ed */ + uint64_t sps_pageunlocks; + /* + * Number of packets dropped in kernel other than errors + * (ether packets if ipath not configured, sma/mad, etc.) + */ + uint64_t sps_krdrops; + /* mlids for up to 4 infinipaths, indexed by infinipath # */ + uint16_t sps_mlid[4]; + uint64_t __sps_pad[45]; /* pad for future growth */ +}; + +/* + * These are the status bits returned (in ascii form, 64bit value) + * by the IPATH_GETSTATS ioctl. 
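
Since the driver-status word is a plain 64-bit bitmask (the IPATH_STATUS_* bits are defined just below), consumers only ever test bits. For instance, a hypothetical readiness check:

#include <stdint.h>

/* returns nonzero when the link is usable and fully configured */
static int ib_link_ready(uint64_t status)
{
	const uint64_t want = IPATH_STATUS_IB_READY | IPATH_STATUS_IB_CONF;

	if (status & IPATH_STATUS_HWERROR)
		return 0;	/* a fatal hardware error was reported */
	return (status & want) == want;
}
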
+ */ +#define IPATH_STATUS_INITTED 0x1 /* basic driver initialization done */ +#define IPATH_STATUS_DISABLED 0x2 /* hardware disabled */ +#define IPATH_STATUS_UNUSED 0x4 /* available */ +#define IPATH_STATUS_OIB_SMA 0x8 /* ipath_mad kernel SMA running */ +#define IPATH_STATUS_SMA 0x10 /* user SMA running */ +/* Chip (simulator) has been found and initted */ +#define IPATH_STATUS_CHIP_PRESENT 0x20 +#define IPATH_STATUS_IB_READY 0x40 /* IB link is at ACTIVE, has LID, + * usable for all VL's */ +/* after link up, LID,MTU,etc. has been configured */ +#define IPATH_STATUS_IB_CONF 0x80 +/* no link established, probably no cable */ +#define IPATH_STATUS_IB_NOCABLE 0x100 +/* A Fatal hardware error has occurred. */ +#define IPATH_STATUS_HWERROR 0x200 + +/* The list of usermode accessible registers. Also see Reg_* later in file */ +typedef enum _ipath_ureg { + ur_rcvhdrtail = 0, /* (RO) DMA RcvHdr to be used next. */ + /* (RW) RcvHdr entry to be processed next by host. */ + ur_rcvhdrhead = 1, + ur_rcvegrindextail = 2, /* (RO) Index of next Eager index to use. */ + ur_rcvegrindexhead = 3, /* (RW) Eager TID to be processed next */ + /* For internal use only; max register number. */ + _IPATH_UregMax +} ipath_ureg; + +/* SMA minor# no portinfo, one for all instances */ +#define IPATH_SMA 128 + +/* Control minor# no portinfo, one for all instances */ +#define IPATH_CTRL 130 + +/* + * This structure is returned by ipath_userinit() immediately after open + * to get implementation-specific info, and info specific to this + * instance. + */ +struct ipath_base_info { + /* version of hardware, for feature checking. */ + uint32_t spi_hw_version; + /* version of software, for feature checking. */ + uint32_t spi_sw_version; + /* InfiniPath port assigned, goes into sent packets */ + uint32_t spi_port; + /* + * IB MTU, packets IB data must be less than this. + * The MTU is in bytes, and will be a multiple of 4 bytes. + */ + uint32_t spi_mtu; + /* + * size of a PIO buffer. Any given packet's total + * size must be less than this (in words). Included is the + * starting control word, so if 513 is returned, then total + * pkt size is 512 words or less. + */ + uint32_t spi_piosize; + /* size of the TID cache in infinipath, in entries */ + uint32_t spi_tidcnt; + /* size of the TID Eager list in infinipath, in entries */ + uint32_t spi_tidegrcnt; + /* size of a single receive header queue entry. */ + uint32_t spi_rcvhdrent_size; + /* Count of receive header queue entries allocated. + * This may be less than the spu_rcvhdrcnt passed in!. + */ + uint32_t spi_rcvhdr_cnt; + + uint32_t __32_bit_compatibility_pad; /* DO NOT MOVE OR REMOVE */ + + /* address where receive buffer queue is mapped into */ + uint64_t spi_rcvhdr_base; + + /* user program. */ + + /* base address of eager TID receive buffers. */ + uint64_t spi_rcv_egrbufs; + + /* Allocated by initialization code, not by protocol. */ + + /* size of each TID buffer in host memory, + * starting at spi_rcv_egrbufs. It includes spu_egrskip, and is + * at least spi_mtu bytes, and the buffers are virtually contiguous + */ + uint32_t spi_rcv_egrbufsize; + /* + * The special QP (queue pair) value that identifies an infinipath + * protocol packet from standard IB packets. More, probably much + * more, to be added. 
+ */
+	uint32_t spi_qpair;
+
+	/*
+	 * user register base for init code, not to be used directly by
+	 * protocol or applications
+	 */
+	uint64_t __spi_uregbase;
+	/*
+	 * maximum buffer size in bytes that can be used in a
+	 * single TID entry (assuming the buffer is aligned to this boundary).
+	 * This is the minimum of what the hardware and software support.
+	 * Guaranteed to be a power of 2.
+	 */
+	uint32_t spi_tid_maxsize;
+	/*
+	 * alignment of each pio send buffer (byte count
+	 * to add to spi_piobufbase to get to second buffer)
+	 */
+	uint32_t spi_pioalign;
+	/*
+	 * the index of the first pio buffer available
+	 * to this process; needed to do lookup in spi_pioavailaddr; not added
+	 * to spi_piobufbase
+	 */
+	uint32_t spi_pioindex;
+	uint32_t spi_piocnt;	/* number of buffers mapped for this process */
+
+	/*
+	 * base address of writeonly pio buffers for this process.
+	 * Each buffer has spi_piosize words, and is aligned on spi_pioalign
+	 * boundaries. spi_piocnt buffers are mapped from this address
+	 */
+	uint64_t spi_piobufbase;
+
+	/*
+	 * base address of readonly memory copy of the pioavail registers.
+	 * There are 2 bits for each buffer.
+	 */
+	uint64_t spi_pioavailaddr;
+
+	/*
+	 * Address where driver updates a copy
+	 * of the interface and driver status (IPATH_STATUS_*) as a 64 bit value
+	 * It's followed by a string indicating hardware error, if there was one
+	 */
+	uint64_t spi_status;
+
+	/* number of chip ports available to user processes */
+	uint32_t spi_nports;
+	uint32_t spi_unit;	/* unit number of chip we are using */
+	uint32_t spi_rcv_egrperchunk;	/* num bufs in each contiguous set */
+	/* size in bytes of each contiguous set */
+	uint32_t spi_rcv_egrchunksize;
+	/* total size of mmap to cover full rcvegrbuffers */
+	uint32_t spi_rcv_egrbuftotlen;
+	/*
+	 * ioctl cmd includes struct size, so pad out, and adjust down as
+	 * new fields are added to keep size constant
+	 */
+	uint32_t __spi_pad[19];
+} __attribute__ ((aligned(8)));
+
+#define IPATH_WAIT_RCV	0x1	/* IPATH_WAIT, receive */
+#define IPATH_WAIT_PIO	0x2	/* IPATH_WAIT, PIO */
+
+/*
+ * This version number is given to the driver by the user code during
+ * initialization in the spu_userversion field of ipath_user_info, so
+ * the driver can check for compatibility with user code.
+ *
+ * The major version changes when data structures
+ * change in an incompatible way. The driver must be the same or higher
+ * for initialization to succeed. In some cases, a higher version
+ * driver will not interoperate with older software, and initialization
+ * will return an error.
+ */
+#define IPATH_USER_SWMAJOR 1
+
+/*
+ * Minor version differences are always compatible
+ * within a major version; however, if user software is larger
+ * than driver software, some new features and/or structure fields
+ * may not be implemented; the user code must deal with this if it
+ * cares, or it must abort after initialization reports the difference
+ */
+#define IPATH_USER_SWMINOR 2
+
+#define IPATH_USER_SWVERSION ((IPATH_USER_SWMAJOR<<16) | IPATH_USER_SWMINOR)
+
+/* Similarly, this is the kernel version going back to the user. It's slightly
+ * different, in that we want to tell if the driver was built as part of a
+ * PathScale release, or from the driver in OpenIB, kernel.org, or a
+ * standard distribution, for support reasons. The high bit is 0 for
+ * non-PathScale, and 1 for PathScale-built/supplied. That bit is defined
+ * in Makefiles, rather than this file.
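
The resulting handshake is: abort on a major-version mismatch, warn or degrade on a minor one. A sketch of the user-side check, the mirror image of what ipath_do_user_init() does earlier in this patch; the mask discards the PathScale-built bit described above:

#include <stdint.h>
#include <stdio.h>

static int check_sw_version(uint32_t spi_sw_version)
{
	uint32_t major = (spi_sw_version >> 16) & 0x7fff; /* skip high bit */
	uint32_t minor = spi_sw_version & 0xffff;

	if (major != IPATH_USER_SWMAJOR)
		return -1;	/* incompatible data structures: abort */
	if (minor != IPATH_USER_SWMINOR)
		fprintf(stderr, "minor version skew (%u vs %u), "
			"some features may be unavailable\n",
			minor, IPATH_USER_SWMINOR);
	return 0;
}
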
+ * + * It's returned by the driver to the user code during initialization + * in the spi_sw_version field of ipath_base_info, so the user code can + * in turn check for compatibility with the kernel. +*/ +#define IPATH_KERN_SWVERSION ((IPATH_KERN_TYPE<<31) | IPATH_USER_SWVERSION) + +/* + * This structure is passed to ipath_userinit() to tell the driver where + * user code buffers are, sizes, etc. + */ +struct ipath_user_info { + /* + * version of user software, to detect compatibility issues. + * Should be set to IPATH_USER_SWVERSION. + */ + uint32_t spu_userversion; + + /* desired number of receive header queue entries */ + uint32_t spu_rcvhdrcnt; + + /* + * Leave this much unused space at the start of + * each eager buffer for software use. Similar in effect to + * setting K_Offset to this value. needs to be 'small', on the + * order of one or two cachelines + */ + uint32_t spu_egrskip; + + /* + * number of words in KD protocol header + * This tells InfiniPath how many words to copy to rcvhdrq. If 0, + * kernel uses a default. Once set, attempts to set any other value + * are an error (EAGAIN) until driver is reloaded. + */ + uint32_t spu_rcvhdrsize; + + /* + * cache line aligned (64 byte) user address to + * which the rcvhdrtail register will be written by infinipath + * whenever it changes, so that no chip registers are read in + * the performance path. + */ + uint64_t spu_rcvhdraddr; + + /* + * ioctl cmd includes struct size, so pad out, + * and adjust down as new fields are added to keep size constant + */ + uint32_t __spu_pad[6]; +} __attribute__ ((aligned(8))); + +struct ipath_iovec { + /* Pointer to data, but same size 32 and 64 bit */ + uint64_t iov_base; + + /* + * Length of data; don't need 64 bits, but want + * ipath_sendpkt to remain same size as before 32 bit changes, so... + */ + uint64_t iov_len; +}; + +/* + * Describes a single packet for send. Each packet can have one or more + * buffers, but the total length (exclusive of IB headers) must be less + * than the MTU, and if using the PIO method, entire packet length, + * including IB headers, must be less than the ipath_piosize value (words). + * Use of this necessitates including sys/uio.h + */ +struct ipath_sendpkt { + uint32_t sps_flags; /* flags for packet (TBD) */ + uint32_t sps_cnt; /* number of entries to use in sps_iov */ + /* array of iov's describing packet. TEMPORARY */ + struct ipath_iovec sps_iov[4]; +}; + +struct _tidupd { /* used only in inlined function for ioctl. */ + uint32_t tidcnt; + uint32_t tid__unused; /* make structure same size in 32 and 64 bit */ + uint64_t tidvaddr; /* virtual address of first page in transfer */ + /* pointer (same size 32/64 bit) to uint16_t tid array */ + uint64_t tidlist; + + /* + * pointer (same size 32/64 bit) to bitmap of TIDs used + * for this call; checked for being large enough at open + */ + uint64_t tidmap; +}; + +struct ipath_setguid { /* set GUID for interface */ + uint64_t sguid; /* in network order */ + uint64_t sunit; /* unit number of interface */ +}; + +/* + * Structure used to send data to and receive data from a diags ioctl. + * + * NOTE: For HT reads and writes, we only support byte, word (16bits) and + * dword (32bits). All other sizes for HT are invalid. + */ +struct ipath_diag_info { + uint64_t _base_offset; /* register to start reading from */ + uint64_t _num_bytes; /* number of bytes to read or write */ + /* + * address in user space. + * for reads, this is the address to store the read result(s). 
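
Filling the send descriptor is mechanical; because ipath_iovec carries a 64-bit pointer and a 64-bit length, the layout is identical for 32-bit and 64-bit callers. A hypothetical single-buffer setup:

#include <stdint.h>
#include <string.h>

static void fill_single_send(struct ipath_sendpkt *pkt,
			     const void *buf, uint32_t len)
{
	memset(pkt, 0, sizeof(*pkt));
	pkt->sps_flags = 0;		/* flags are TBD per the header */
	pkt->sps_cnt = 1;		/* one iovec entry in use */
	pkt->sps_iov[0].iov_base = (uint64_t)(uintptr_t)buf;
	pkt->sps_iov[0].iov_len = len;
	/* the caller would then issue ioctl(fd, IPATH_SENDPKT, pkt) */
}
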
* for writes, it is the address to get the write data from.
+         * This memory better be valid in user space!
+         */
+        uint64_t _uaddress;
+        uint64_t _unit; /* Unit ID of chip we are accessing. */
+        uint64_t _pad[15];
+};
+
+/*
+ * Data layout in I2C flash (for GUID, etc.)
+ * All fields are little-endian binary unless otherwise stated
+ */
+#define IPATH_FLASH_VERSION 1
+struct ipath_flash {
+        uint8_t if_fversion; /* flash layout version (IPATH_FLASH_VERSION) */
+        uint8_t if_csum; /* checksum protecting if_length bytes */
+        /*
+         * valid length (in use, protected by if_csum), including if_fversion
+         * and if_csum themselves
+         */
+        uint8_t if_length;
+        uint8_t if_guid[8]; /* the GUID, in network order */
+        /* number of GUIDs to use, starting from if_guid */
+        uint8_t if_numguid;
+        uint8_t if_serial[12]; /* the board serial number, in ASCII */
+        uint8_t if_mfgdate[8]; /* board mfg date (YYYYMMDD ASCII) */
+        /* last board rework/test date (YYYYMMDD ASCII) */
+        uint8_t if_testdate[8];
+        uint8_t if_errcntp[4]; /* logging of error counts, TBD */
+        /* powered on hours, updated at driver unload */
+        uint8_t if_powerhour[2];
+        uint8_t if_comment[32]; /* ASCII free-form comment field */
+        uint8_t if_future[50]; /* 78 bytes used, min flash size is 128 bytes */
+};
+
+uint8_t ipath_flash_csum(struct ipath_flash *, int);
+
+/*
+ * These are the counters implemented in the chip, and are listed in order.
+ * They are returned in this order by the IPATH_GETCOUNTERS ioctl
+ */
+struct infinipath_counters {
+        unsigned long long LBIntCnt;
+        unsigned long long LBFlowStallCnt;
+        unsigned long long Reserved1;
+        unsigned long long TxUnsupVLErrCnt;
+        unsigned long long TxDataPktCnt;
+        unsigned long long TxFlowPktCnt;
+        unsigned long long TxDwordCnt;
+        unsigned long long TxLenErrCnt;
+        unsigned long long TxMaxMinLenErrCnt;
+        unsigned long long TxUnderrunCnt;
+        unsigned long long TxFlowStallCnt;
+        unsigned long long TxDroppedPktCnt;
+        unsigned long long RxDroppedPktCnt;
+        unsigned long long RxDataPktCnt;
+        unsigned long long RxFlowPktCnt;
+        unsigned long long RxDwordCnt;
+        unsigned long long RxLenErrCnt;
+        unsigned long long RxMaxMinLenErrCnt;
+        unsigned long long RxICRCErrCnt;
+        unsigned long long RxVCRCErrCnt;
+        unsigned long long RxFlowCtrlErrCnt;
+        unsigned long long RxBadFormatCnt;
+        unsigned long long RxLinkProblemCnt;
+        unsigned long long RxEBPCnt;
+        unsigned long long RxLPCRCErrCnt;
+        unsigned long long RxBufOvflCnt;
+        unsigned long long RxTIDFullErrCnt;
+        unsigned long long RxTIDValidErrCnt;
+        unsigned long long RxPKeyMismatchCnt;
+        unsigned long long RxP0HdrEgrOvflCnt;
+        unsigned long long RxP1HdrEgrOvflCnt;
+        unsigned long long RxP2HdrEgrOvflCnt;
+        unsigned long long RxP3HdrEgrOvflCnt;
+        unsigned long long RxP4HdrEgrOvflCnt;
+        unsigned long long RxP5HdrEgrOvflCnt;
+        unsigned long long RxP6HdrEgrOvflCnt;
+        unsigned long long RxP7HdrEgrOvflCnt;
+        unsigned long long RxP8HdrEgrOvflCnt;
+        unsigned long long Reserved6;
+        unsigned long long Reserved7;
+        unsigned long long IBStatusChangeCnt;
+        unsigned long long IBLinkErrRecoveryCnt;
+        unsigned long long IBLinkDownedCnt;
+        unsigned long long IBSymbolErrCnt;
+};
+
+/*
+ * The next set of defines are for packet headers, and chip register
+ * and memory bits that are visible to and/or used by user-mode software
+ * The other bits that are used only by the driver or diags are in
+ * ipath_registers.h
+ */
+
+/* RcvHdrFlags bits */
+#define INFINIPATH_RHF_LENGTH_MASK 0x7FF
+#define INFINIPATH_RHF_LENGTH_SHIFT 0
+#define INFINIPATH_RHF_RCVTYPE_MASK 0x7
+#define
INFINIPATH_RHF_RCVTYPE_SHIFT 11 +#define INFINIPATH_RHF_EGRINDEX_MASK 0x7FF +#define INFINIPATH_RHF_EGRINDEX_SHIFT 16 +#define INFINIPATH_RHF_H_ICRCERR 0x80000000 +#define INFINIPATH_RHF_H_VCRCERR 0x40000000 +#define INFINIPATH_RHF_H_PARITYERR 0x20000000 +#define INFINIPATH_RHF_H_LENERR 0x10000000 +#define INFINIPATH_RHF_H_MTUERR 0x08000000 +#define INFINIPATH_RHF_H_IHDRERR 0x04000000 +#define INFINIPATH_RHF_H_TIDERR 0x02000000 +#define INFINIPATH_RHF_H_MKERR 0x01000000 +#define INFINIPATH_RHF_H_IBERR 0x00800000 +#define INFINIPATH_RHF_L_SWA 0x00008000 +#define INFINIPATH_RHF_L_SWB 0x00004000 + +/* infinipath header fields */ +#define INFINIPATH_I_VERS_MASK 0xF +#define INFINIPATH_I_VERS_SHIFT 28 +#define INFINIPATH_I_PORT_MASK 0xF +#define INFINIPATH_I_PORT_SHIFT 24 +#define INFINIPATH_I_TID_MASK 0x7FF +#define INFINIPATH_I_TID_SHIFT 13 +#define INFINIPATH_I_OFFSET_MASK 0x1FFF +#define INFINIPATH_I_OFFSET_SHIFT 0 + +/* K_PktFlags bits */ +#define INFINIPATH_KPF_INTR 0x1 + +/* SendPIO per-buffer control */ +#define INFINIPATH_SP_LENGTHP1_MASK 0x3FF +#define INFINIPATH_SP_LENGTHP1_SHIFT 0 +#define INFINIPATH_SP_INTR 0x80000000 +#define INFINIPATH_SP_TEST 0x40000000 +#define INFINIPATH_SP_TESTEBP 0x20000000 + +/* SendPIOAvail bits */ +#define INFINIPATH_SENDPIOAVAIL_BUSY_SHIFT 1 +#define INFINIPATH_SENDPIOAVAIL_CHECK_SHIFT 0 + +#endif /* _IPATH_COMMON_H */ diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h new file mode 100644 index 0000000..ba53fa3 --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -0,0 +1,776 @@ +/* + * Copyright (c) 2003, 2004, 2005. PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. 
+ *
+ * $Id: ipath_kernel.h 4491 2005-12-15 22:20:31Z rjwalsh $
+ */
+
+#ifndef _IPATH_KERNEL_H
+#define _IPATH_KERNEL_H
+
+#ifndef PCI_VENDOR_ID_PATHSCALE /* not in pci.ids yet */
+#define PCI_VENDOR_ID_PATHSCALE 0x1fc1
+#define PCI_DEVICE_ID_PATHSCALE_INFINIPATH1 0xa
+#define PCI_DEVICE_ID_PATHSCALE_INFINIPATH2 0xd
+#endif
+
+/*
+ * This header file is the base header file for infinipath kernel code.
+ * ipath_user.h serves a similar purpose for user code.
+ */
+
+#include "ipath_common.h"
+#include "ipath_debug.h"
+#include "ipath_registers.h"
+#include
+#include
+
+/* only s/w major version of InfiniPath we can handle */
+#define IPATH_CHIP_VERS_MAJ 2U
+
+#define IPATH_CHIP_VERS_MIN 0U /* don't care about this except printing */
+
+extern struct infinipath_stats ipath_stats; /* temporary, maybe always */
+
+/* sysctl stuff */
+#define CTL_INFINIPATH 0x70736e69 /* "spin" as a hex value, top level */
+/* rest are in infinipath domain */
+#define CTL_INFINIPATH_DEBUG 1 /* infinipath_debug mask */
+#define CTL_INFINIPATH_TRACEMASK 2 /* trace mask */
+#define CTL_INFINIPATH_UNUSED 4 /* available for re-use */
+/* count of pio buffers reserved for kernel */
+#define CTL_INFINIPATH_LAYERBUF 8
+
+/* only s/w version of chip (simulator) we can handle for now */
+#define IPATH_CHIP_SWVERSION IPATH_CHIP_VERS_MAJ
+
+typedef struct _ipath_portdata {
+        /* minor number of devices, for ipath_type use */
+        unsigned port_unit;
+        /* array of struct page pointers */
+        struct page **port_rcvegrbuf_pages;
+        /* array of virtual addresses (from above) */
+        void **port_rcvegrbuf_virt;
+        void *port_rcvhdrq; /* rcvhdrq base, needs mmap before useful */
+        /* kernel virtual address where hdrqtail is updated */
+        uint64_t *port_rcvhdrtail_kvaddr;
+        struct page *port_rcvhdrtail_pagep; /* page * used for uaddr */
+        /*
+         * temp buffer for expected send setup, allocated at open, instead
+         * of each setup call
+         */
+        void *port_tid_pg_list;
+        wait_queue_head_t port_wait; /* when waiting for rcv or pioavail */
+        /*
+         * rcvegr bufs base, physical (must fit
+         * in 44 bits so 32 bit programs mmap64 44 bit works)
+         */
+        unsigned long port_rcvegr_phys;
+        /* for mmap of hdrq, must fit in 44 bits */
+        unsigned long port_rcvhdrq_phys;
+        /*
+         * the actual user address that we ipath_mlock'ed, so we can
+         * ipath_munlock it at close
+         */
+        unsigned long port_rcvhdrtail_uaddr;
+        /*
+         * number of opens on this instance (0 or 1; ignoring forks, dup,
+         * etc. for now)
+         */
for now) + */ + int port_cnt; + /* + * how much space to leave at start of eager TID entries for protocol + * use, on each TID + */ + unsigned port_egrskip; + unsigned port_port; /* instead of calculating it */ + uint32_t port_piobufs; /* chip offset of PIO buffers for this port */ + /* how many alloc_pages() chunks in port_rcvegrbuf_pages */ + uint32_t port_rcvegrbuf_chunks; + uint32_t port_rcvegrbufs_perchunk; /* how many egrbufs per chunk */ + /* order used with port_rcvegrbuf_pages */ + uint32_t port_rcvegrbuf_order; + uint32_t port_rcvhdrq_order; /* rcvhdrq order (for free_pages) */ + /* next expected TID to check when looking for free */ + uint32_t port_tidcursor; + /* next expected TID to check when looking for free */ + uint32_t port_flag; + /* WAIT_RCV that timed out, no interrupt */ + uint32_t port_rcvwait_to; + /* WAIT_PIO that timed out, no interrupt */ + uint32_t port_piowait_to; + uint32_t port_rcvnowait; /* WAIT_RCV already happened, no wait */ + uint32_t port_pionowait; /* WAIT_PIO already happened, no wait */ + uint32_t port_hdrqfull; /* total number of rcvhdrqfull errors */ + pid_t port_pid; /* pid of process using this port */ + /* same size as task_struct .comm[], but no define */ + char port_comm[16]; + uint16_t port_pkeys[4]; /* pkeys set by this use of this port */ +} ipath_portdata; + +struct sk_buff; + +/* + * control information for layered drivers + * This is used only as part of devdata via ipath_layer; + */ +struct _ipath_layer { + int (*l_intr) (const ipath_type, uint32_t); + int (*l_rcv) (const ipath_type, void *, struct sk_buff *); + int (*l_rcv_lid) (const ipath_type, void *); + uint16_t l_rcv_opcode; + uint16_t l_rcv_lid_opcode; +}; + +/* Verbs layer interface */ +struct _verbs_layer { + int (*l_piobufavail) (const ipath_type); + void (*l_rcv) (const ipath_type, void *, void *, u32); + void (*l_timer_cb) (const ipath_type); + struct timer_list l_timer; + unsigned l_flags; +}; + +/* + * These are the fields that only exist for port 0, not per port, so + * they aren't in ipath_devdata + */ +typedef struct _ipath_devdata { + /* driver data structures */ + /* mem-mapped pointer to base of chip regs */ + volatile uint64_t *ipath_kregbase; + /* end of mem-mapped chip space; range checking */ + uint64_t *ipath_kregend; + /* physical address of chip for io_remap, etc. */ + unsigned long ipath_physaddr; + /* base of memory alloced for ipath_kregbase, for free */ + uint64_t *ipath_kregalloc; + /* + * version of kregbase that doesn't have high bits set (for 32 bit + * programs, so mmap64 44 bit works) + */ + uint64_t *ipath_kregvirt; + /* virtual address where port0 rcvhdrqtail updated for this unit */ + volatile uint64_t *ipath_hdrqtailptr; + ipath_portdata **ipath_pd; /* ipath_cfgports pointers */ + /* sk_buffs used by port 0 eager receive queue */ + struct sk_buff **ipath_port0_skbs; + /* + * points to area where PIOavail registers will be DMA'ed. Has to + * be on a page of it's own, because the page will be mapped into user + * program space. This copy is *ONLY* ever written by DMA, not by + * the driver! 
+        volatile uint64_t *ipath_pioavailregs_dma;
+        /* original address for free */
+        volatile uint64_t *__ipath_pioavailregs_base;
+        /* physical address where updates occur */
+        unsigned long ipath_pioavailregs_phys;
+        struct _ipath_layer ipath_layer;
+        struct _verbs_layer verbs_layer;
+        /* total dwords sent (summed from counter) */
+        uint64_t ipath_sword;
+        /* total dwords received (summed from counter) */
+        uint64_t ipath_rword;
+        /* total packets sent (summed from counter) */
+        uint64_t ipath_spkts;
+        /* total packets received (summed from counter) */
+        uint64_t ipath_rpkts;
+        /* to make the receive interrupt failsafe */
+        uint64_t ipath_lastqtail;
+        uint64_t _ipath_status; /* ipath_statusp initially points to this. */
+        uint64_t ipath_guid; /* GUID for this interface, in network order */
+        /*
+         * aggregate of error bits reported since
+         * last cleared, for limiting of error reporting
+         */
+        uint64_t ipath_lasterror;
+        /*
+         * aggregate of error bits reported
+         * since last cleared, for limiting of hwerror reporting
+         */
+        uint64_t ipath_lasthwerror;
+        /*
+         * errors masked because they occur too fast,
+         * also includes errors that are always ignored (ipath_ignorederrs)
+         */
+        uint64_t ipath_maskederrs;
+        /* time at which to re-enable maskederrs */
+        cycles_t ipath_unmasktime;
+        /*
+         * errors always ignored (masked), at least
+         * for a given chip/device, because they are wrong or not useful
+         */
+        uint64_t ipath_ignorederrs;
+        /* count of egrfull errors, combined for all ports */
+        uint64_t ipath_last_tidfull;
+        uint64_t ipath_lastport0rcv_cnt; /* for ipath_qcheck() */
+
+        uint32_t ipath_kregsize; /* size of memory at ipath_kregbase */
+        /* number of registers used for pioavail */
+        uint32_t ipath_pioavregs;
+        uint32_t ipath_flags; /* IPATH_POLL, etc. */
+        /* ipath_flags sma is waiting for */
+        uint32_t ipath_sma_state_wanted;
+        /* last buffer for user use, first buf for kernel use is this index. */
+        uint32_t ipath_lastport_piobuf;
+        uint32_t pci_registered; /* driver is a registered pci device */
+        uint32_t ipath_stats_timer_active; /* is a stats timer active */
+        /* dwords sent read from infinipath counter */
+        uint32_t ipath_lastsword;
+        /* dwords received read from infinipath counter */
+        uint32_t ipath_lastrword;
+        /* sent packets read from infinipath counter */
+        uint32_t ipath_lastspkts;
+        /* received packets read from infinipath counter */
+        uint32_t ipath_lastrpkts;
+        uint32_t ipath_pbufsport; /* pio bufs allocated per port */
+        /*
+         * number of ports configured as max; zero is
+         * set to number chip supports, less gives more pio bufs/port, etc.
+         */
+        uint32_t ipath_cfgports;
+        /* our idea of the port0 rcvhdrq head offset */
+        uint32_t ipath_port0head;
+        uint32_t ipath_p0_hdrqfull; /* count of port 0 hdrqfull errors */
+
+        /*
+         * (*cfgports) used to suppress multiple instances of same port
+         * staying stuck at same point
+         */
+        uint32_t *ipath_lastrcvhdrqtails;
+        /*
+         * (*cfgports) used to suppress multiple instances of same port
+         * staying stuck at same point
+         */
+        uint32_t *ipath_lastegrheads;
+        /*
+         * index of last piobuffer we used. Speeds up searching, by starting
+         * at this point. Doesn't matter if multiple cpu's use and update,
+         * last updater is only write that matters. Whenever it wraps,
+         * we update shadow copies. Need a copy per device when we get to
+         * multiple devices
+         */
+        uint32_t ipath_lastpioindex;
+        uint32_t ipath_freezelen; /* max length of freezemsg */
+        uint32_t ipath_consec_nopiobuf; /* consecutive times we wanted a PIO buffer
+                                         * but were unable to get one */
+        uint32_t ipath_upd_pio_shadow; /* hint that we should update
+                                        * ipath_pioavailshadow before looking for a PIO buffer */
+        uint32_t ipath_nosma_bufs; /* sequential tries for SMA send and no bufs */
+        uint32_t ipath_nosma_secs; /* duration (seconds) ipath_nosma_bufs set */
+        /* HT/PCI Vendor ID (here for NodeInfo) */
+        uint16_t ipath_vendorid;
+        /* HT/PCI Device ID (here for NodeInfo) */
+        uint16_t ipath_deviceid;
+        /* offset in HT config space of slave/primary interface block */
+        uint8_t ipath_ht_slave_off;
+        int ipath_mtrr; /* registration handle for WRCOMB setting on */
+        /* ref count of how many users set each pkey */
+        atomic_t ipath_pkeyrefs[4];
+        /* shadow copy of all exptids physaddr; used only by funcsim */
+        uint64_t *ipath_tidsimshadow;
+        /* shadow copy of struct page *'s for exp tid pages */
+        struct page **ipath_pageshadow;
+        /*
+         * IPATH_STATUS_*
+         * this address is mapped readonly into user processes so they can
+         * get status cheaply, whenever they want.
+         */
+        uint64_t *ipath_statusp;
+        char *ipath_freezemsg; /* freeze msg if hw error put chip in freeze */
+        struct pci_dev *pcidev; /* pci access data structure */
+        /* timer used to prevent stats overflow, error throttling, etc. */
+        struct timer_list ipath_stats_timer;
+        /* only allow one interrupt at a time. */
+        unsigned long ipath_rcv_pending;
+
+        /*
+         * shadow copies of registers; size indicates read access size.
+         * Most of them are readonly, but some are write-only registers, where
+         * we manipulate the bits in the shadow copy, and then write the shadow
+         * copy to infinipath.
+         * We deliberately make most of these 32 bits, since they have
+         * restricted range, and for any that we read, we want to generate
+         * 32 bit accesses, since Opteron will generate 2 separate 32 bit
+         * HT transactions for a 64 bit read, and we want to avoid unnecessary
+         * HT transactions
+         */
+
+        /* This is the 64 bit group */
+        /*
+         * shadow of pioavail, check to be sure it's large enough at
+         * init time.
+         */
+        uint64_t ipath_pioavailshadow[8];
+        uint64_t ipath_gpio_out; /* shadow of kr_gpio_out, for rmw ops */
+        /* kr_revision value (also see ipath_majrev) */
+        uint64_t ipath_revision;
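        /*
         * (Illustration, not part of the patch: a PIO buffer search
         * using ipath_lastpioindex and the 2-bits-per-buffer
         * ipath_pioavailshadow[] above might look roughly like this;
         * locking and refreshing the shadow from the DMA'ed copy are
         * omitted, and a set busy bit is assumed to mean "in use":
         *
         *      i = dd->ipath_lastpioindex;
         *      for (n = 0; n < piobcnt; n++, i = (i + 1) % piobcnt)
         *              if (!(dd->ipath_pioavailshadow[i / 32] >>
         *                    ((i % 32) * 2 + INFINIPATH_SENDPIOAVAIL_BUSY_SHIFT) & 1))
         *                      break;  // buffer i appears free
         *      dd->ipath_lastpioindex = i;
         * )
         */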
+        /* shadow of ibcctrl, for interrupt handling of link changes, etc. */
+        uint64_t ipath_ibcctrl;
+        /*
+         * last ibcstatus, to suppress "duplicate" status change messages,
+         * mostly from 2 to 3
+         */
+        uint64_t ipath_lastibcstat;
+        /* mask of hardware errors that are enabled */
+        uint64_t ipath_hwerrmask;
+        uint64_t ipath_extctrl; /* shadow the gpio output contents */
+
+        /* these are the "32 bit" regs */
+        /*
+         * number of GUIDs in the flash for this interface; may need some
+         * rethinking for setting on other ifaces
+         */
+        uint32_t ipath_nguid;
+        uint32_t ipath_rcvctrl; /* shadow kr_rcvctrl */
+        uint32_t ipath_sendctrl; /* shadow kr_sendctrl */
+        uint32_t ipath_rcvhdrcnt; /* value we put in kr_rcvhdrcnt */
+        uint32_t ipath_rcvhdrsize; /* value we put in kr_rcvhdrsize */
+        uint32_t ipath_rcvhdrentsize; /* value we put in kr_rcvhdrentsize */
+        /* byte offset of last entry in rcvhdrq */
+        uint32_t ipath_hdrqlast;
+        uint32_t ipath_portcnt; /* kr_portcnt value */
+        uint32_t ipath_palign; /* kr_pagealign value */
+        uint32_t ipath_piobcnt; /* kr_sendpiobufcnt value */
+        uint32_t ipath_piobufbase; /* kr_sendpiobufbase value */
+        uint32_t ipath_piosize; /* kr_sendpiosize */
+        uint32_t ipath_rcvegrbase; /* kr_rcvegrbase value */
+        uint32_t ipath_rcvegrcnt; /* kr_rcvegrcnt value */
+        uint32_t ipath_rcvtidbase; /* kr_rcvtidbase value */
+        uint32_t ipath_rcvtidcnt; /* kr_rcvtidcnt value */
+        uint32_t ipath_sregbase; /* kr_sendregbase */
+        uint32_t ipath_uregbase; /* kr_userregbase */
+        uint32_t ipath_cregbase; /* kr_counterregbase */
+        uint32_t ipath_control; /* shadow the control register contents */
+        uint32_t ipath_pcirev; /* PCI revision register (HTC rev on FPGA) */
+
+        uint32_t ipath_ibmtu; /* The MTU programmed for this unit */
+        /*
+         * The max size IB packet, including IB headers, that we can send.
+         * Starts same as ipath_piosize, but is affected when ibmtu is
+         * changed, or by size of eager buffers
+         */
+        uint32_t ipath_ibmaxlen;
+        /*
+         * ibmaxlen at init time, limited by chip and by receive buffer size.
+         * Not changed after init.
+         */
+        uint32_t ipath_init_ibmaxlen;
+        /* size we allocate for each rcvegrbuffer */
+        uint32_t ipath_rcvegrbufsize;
+        uint32_t ipath_htwidth; /* width (2,4,8,16,32) from HT config reg */
+        uint32_t ipath_htspeed; /* HT speed (200,400,800,1000) from HT config */
+        /* bitmap of ports waiting for PIO avail intr */
+        uint32_t ipath_portpiowait;
+        /*
+         * number of sequential ibcstatus changes for polling active/quiet
+         * (i.e., link not coming up).
+         */
+        uint32_t ipath_ibpollcnt;
+        uint16_t ipath_mlid; /* MLID programmed for this instance */
+        uint16_t ipath_lid; /* LID programmed for this instance */
+        /* list of pkeys programmed; 0 means not set */
+        uint16_t ipath_pkeys[4];
+        uint8_t ipath_serial[12]; /* ASCII serial number, from flash */
+        uint8_t ipath_majrev; /* chip major rev, from ipath_revision */
+        uint8_t ipath_minrev; /* chip minor rev, from ipath_revision */
+        uint8_t ipath_boardrev; /* board rev, from ipath_revision */
+        uint8_t ipath_unit; /* Unit number for this chip */
+} ipath_devdata;
+
+/*
+ * A segment is a linear region of low physical memory.
+ * XXX Maybe we should use phys addr here and kmap()/kunmap()
+ * Used by the verbs layer.
+ */
+struct ipath_seg {
+        void *vaddr;
+        u64 length;
+};
+
+/* The number of ipath_segs that fit in a page. */
+#define IPATH_SEGSZ (PAGE_SIZE / sizeof (struct ipath_seg))
+
+struct ipath_segarray {
+        struct ipath_seg segs[IPATH_SEGSZ];
+};
+
+/*
+ * Used by the verbs layer.
+ */
+struct ipath_mregion {
+        u64 user_base; /* User's address for this region */
+        u64 iova; /* IB start address of this region */
+        size_t length;
+        u32 lkey;
+        u32 offset; /* offset (bytes) to start of region */
+        int access_flags;
+        u32 max_segs; /* number of ipath_segs in all the arrays */
+        u32 mapsz; /* size of the map array */
+        struct ipath_segarray *map[0]; /* the segments */
+};
+
+/*
+ * These keep track of the copy progress within a memory region.
+ * Used by the verbs layer.
+ */
+struct ipath_sge {
+        struct ipath_mregion *mr;
+        void *vaddr; /* current pointer into the segment */
+        u32 sge_length; /* length of the SGE */
+        u32 length; /* remaining length of the segment */
+        u16 m; /* current index: mr->map[m] */
+        u16 n; /* current index: mr->map[m]->segs[n] */
+};
+
+struct ipath_sge_state {
+        struct ipath_sge *sg_list; /* next SGE to be used if any */
+        struct ipath_sge sge; /* progress state for the current SGE */
+        u8 num_sge;
+};
+
+extern ipath_devdata devdata[];
+#define IPATH_UNIT(p) ((p)-devdata)
+extern const uint32_t infinipath_max; /* number of units (chips) supported */
+extern const char *ipath_minor_names[];
+
+extern int ipath_diags_enabled; /* is diags mode enabled? */
+
+/* clean up any per-chip chip-specific stuff */
+void ipath_chip_cleanup(ipath_devdata *);
+void ipath_chip_done(void); /* clean up any chip type-specific stuff */
+void ipath_handle_hwerrors(const ipath_type, char *, int);
+int ipath_validate_rev(ipath_devdata *);
+void ipath_clear_init_hwerrs(const ipath_type);
+
+/*
+ * This is here to simplify compatibility with source that supports
+ * multiple chip types
+ */
+void ipath_ht_get_boardname(const ipath_type t, char *name, size_t namelen);
+
+/* these are primarily for SMA, but are also used by diags */
+int ipath_send_smapkt(struct ipath_sendpkt *);
+
+int ipath_wait_linkstate(const ipath_type, uint32_t, int);
+void ipath_down_link(const ipath_type);
+void ipath_set_ib_lstate(const ipath_type, int);
+void ipath_kreceive(const ipath_type);
+int ipath_setrcvhdrsize(const ipath_type, unsigned);
+
+/* for use in system calls, where we want to know device type, etc. */
+#define port_fp(fp) (((fp)->private_data>(void*)255UL)?((ipath_portdata *)fp->private_data):NULL)
+
+/*
+ * somebody is waiting in poll (initially
+ * used only for simulation notification of register/infinipath memory
+ * changes)
+ */
+#define IPATH_POLL 0x1
+#define IPATH_INITTED 0x2 /* The chip or simulator is up and initted */
+#define IPATH_RCVHDRSZ_SET 0x4 /* set if any user code has set kr_rcvhdrsize */
+/* The chip or simulator is present and valid for accesses */
+#define IPATH_PRESENT 0x8
+/* HT link0 is only 8 bits wide, ignore upper byte crc errors, etc. */
+#define IPATH_8BIT_IN_HT0 0x10
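For illustration, walking a memory region with the ipath_sge cursor shown above: after copying nbytes from the current segment, the m/n indices step through the two-level map[]/segs[] arrays. A sketch only (the function name is made up; it assumes nbytes stays within the current segment and skips num_sge handling):

        static void ipath_sge_advance(struct ipath_sge *sge, u32 nbytes)
        {
                sge->vaddr = (u8 *) sge->vaddr + nbytes;
                sge->length -= nbytes;
                sge->sge_length -= nbytes;
                if (sge->length == 0 && sge->sge_length != 0) {
                        /* segment exhausted: step to the next ipath_seg */
                        if (++sge->n >= IPATH_SEGSZ) {
                                sge->m++;       /* next page of segment pointers */
                                sge->n = 0;
                        }
                        sge->vaddr = sge->mr->map[sge->m]->segs[sge->n].vaddr;
                        sge->length = sge->mr->map[sge->m]->segs[sge->n].length;
                }
        }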
+/* HT link1 is only 8 bits wide, ignore upper byte crc errors, etc. */
+#define IPATH_8BIT_IN_HT1 0x20
+/* The link is down (or not yet up 0x11 or earlier) */
+#define IPATH_LINKDOWN 0x40
+#define IPATH_LINKINIT 0x80 /* The link level is up (0x11) */
+/* The link is in the armed (0x21) state */
+#define IPATH_LINKARMED 0x100
+/* The link is in the active (0x31) state */
+#define IPATH_LINKACTIVE 0x200
+/* The link was taken down, but no interrupt yet */
+#define IPATH_LINKUNK 0x400
+/* link being moved to armed (0x21) state */
+#define IPATH_LINK_TOARMED 0x800
+/* link being moved to active (0x31) state */
+#define IPATH_LINK_TOACTIVE 0x1000
+/* linkinit cmd is SLEEP, move to POLL */
+#define IPATH_LINK_SLEEPING 0x2000
+/* no IB cable, or no device on IB cable */
+#define IPATH_NOCABLE 0x4000
+/* Supports port zero per packet receive interrupts via GPIO */
+#define IPATH_GPIO_INTR 0x8000
+
+/* portdata flag values */
+#define IPATH_PORT_WAITING_RCV 0x4 /* waiting for a packet to arrive */
+/* waiting for a PIO buffer to be available */
+#define IPATH_PORT_WAITING_PIO 0x8
+
+/*
+ * do the chip initialization, either on startup for the real hardware,
+ * or via ioctl for simulation.
+ */
+extern int ipath_init_chip(const ipath_type);
+/* free up any allocated data at close */
+extern void ipath_free_data(ipath_portdata * dd);
+extern void ipath_init_picotime(void); /* init cycles to picosecs conversion */
+extern int ipath_bringup_serdes(const ipath_type);
+extern int ipath_waitfor_mdio_cmdready(const ipath_type);
+extern int ipath_waitfor_complete(const ipath_type, ipath_kreg, uint64_t,
+                                  uint64_t *);
+extern void ipath_quiet_serdes(const ipath_type);
+extern void ipath_get_boardname(uint8_t, char *, size_t);
+extern int ipath_getpiobuf(int);
+extern int ipath_bufavail(int);
+extern int ipath_rd_eeprom(const ipath_type port_unit,
+                           struct ipath_eeprom_req *);
+extern uint64_t ipath_snap_cntr(const ipath_type, ipath_creg);
+
+/*
+ * these should be somewhat dynamic someday, although they are fixed
+ * for all users of the device on any given load.
+ *
+ * NOTE: There is a VM bug in the 2.4 Kernels similar to the one Dave
+ * fixed in the 2.6 Kernel. When using large or discontinuous memory,
+ * we get random kernel oops. So, in 2.4, we are just going to stick
+ * with 4k chunks instead of 64k chunks.
+ */
+/* (words) room for all IB headers and KD proto header */
+#define IPATH_RCVHDRENTSIZE 16
+/*
+ * 64K, which is about all you can hope to get contiguous. API allows
+ * users to request a size, for now I'm ignoring that.
+ */
+#define IPATH_RCVHDRCNT 1024
+
+/*
+ * number of words in KD protocol header if not set by ipath_userinit();
+ * this uses the full 64 bytes of rcvhdrentry
+ */
+#define IPATH_DFLT_RCVHDRSIZE 9
+
+#define IPATH_MDIO_CMD_WRITE 1
+#define IPATH_MDIO_CMD_READ 2
+#define IPATH_MDIO_CLD_DIV 25 /* to get 2.5 MHz mdio clock */
+#define IPATH_MDIO_CMDVALID 0x40000000 /* bit 30 */
+#define IPATH_MDIO_DATAVALID 0x80000000 /* bit 31 */
+#define IPATH_MDIO_CTRL_STD 0x0
+
+#define IPATH_MDIO_REQ(cmd,dev,reg,data) ( (((uint64_t)IPATH_MDIO_CLD_DIV) << 32) | \
+        ((cmd) << 26) | ((dev)<<21) | ((reg) << 16) | ((data) & 0xFFFF))
+
+#define IPATH_MDIO_CTRL_XGXS_REG_8 0x8 /* signal and fifo status, in bank 31 */
+
+/* controls loopback, redundancy */
+#define IPATH_MDIO_CTRL_8355_REG_1 0x10
+#define IPATH_MDIO_CTRL_8355_REG_2 0x11 /* premph, encdec, etc. */
+#define IPATH_MDIO_CTRL_8355_REG_6 0x15 /* Kchars, etc. */
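As a usage note for IPATH_MDIO_REQ(): serdes bringup code could compose, say, a read of the XGXS signal/fifo status register and write it to kr_mdio (declared in ipath_registers.h below). A sketch; it assumes the "bank 31" mentioned in the comment above is what goes in the dev field, and the poll loop for the result is elided:

        uint64_t req = IPATH_MDIO_REQ(IPATH_MDIO_CMD_READ, 31,
                                      IPATH_MDIO_CTRL_XGXS_REG_8, 0);
        ipath_kput_kreg(t, kr_mdio, req);
        /* ...then poll ipath_kget_kreg64(t, kr_mdio) for IPATH_MDIO_DATAVALID */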
+#define IPATH_MDIO_CTRL_8355_REG_9 0x18
+#define IPATH_MDIO_CTRL_8355_REG_10 0x1D
+
+/*
+ * these function similarly to the mlock/munlock system calls.
+ * ipath_mlock() is used to pin an address range (if not already pinned),
+ * and optionally return the list of physical addresses
+ * ipath_munlock() does the obvious, and ipath_mlock_cleanup() cleans up all
+ * private memory, used at driver unload.
+ * ipath_mlock_nocopy() is similar to mlock, but only one page, and marks
+ * the vm so the page isn't taken away on a fork.
+ */
+int ipath_mlock(unsigned long, size_t, struct page **);
+int ipath_mlock_nocopy(unsigned long, struct page **);
+int ipath_munlock(size_t, struct page **);
+void ipath_mlock_cleanup(ipath_portdata *);
+int ipath_eeprom_read(const ipath_type, uint8_t, void *, int);
+int ipath_eeprom_write(const ipath_type, uint8_t, void *, int);
+
+/* these are used for the registers that vary with port */
+void ipath_kput_kreg_port(const ipath_type, ipath_kreg, unsigned, uint64_t);
+uint64_t ipath_kget_kreg64_port(const ipath_type, ipath_kreg, unsigned);
+
+#define ipath_func_krecord(a)
+#define ipath_func_urecord(a, b)
+#define ipath_func_mrecord(a, b)
+#define ipath_func_rkrecord(a)
+#define ipath_func_rurecord(a, b)
+#define ipath_func_rmrecord(a, b)
+#define ipath_func_rsrecord(a)
+#define ipath_func_rcrecord(a)
+
+/*
+ * we could have a single register get/put routine, that takes a group
+ * type, but for now I've chosen to have separate routines; I think this
+ * is somewhat clearer and cleaner, but we'll see. It also gives us some
+ * error checking. 64 bit register reads should always work, but are
+ * inefficient on opteron (2 separate HT 32 bit reads), so we use kreg32
+ * wherever possible. User register and counter register reads are always
+ * 32 bit reads, so only one form of those routines
+ */
+
+/*
+ * return contents of a user register group register; not normally
+ * used in the kernel, except port 0
+ */
+static __inline__ uint32_t ipath_kget_ureg32(const ipath_type, ipath_ureg, int)
+        __attribute__ ((always_inline));
+/* return contents of a kernel register group register */
+static __inline__ uint64_t ipath_kget_kreg64(const ipath_type, ipath_kreg)
+        __attribute__ ((always_inline));
+static __inline__ uint32_t ipath_kget_kreg32(const ipath_type, ipath_kreg)
+        __attribute__ ((always_inline));
+/* return contents of a counter register group register */
+static __inline__ uint32_t ipath_kget_creg32(const ipath_type, ipath_creg)
+        __attribute__ ((always_inline));
+
+/*
+ * change contents of a user register group register; not normally
+ * used in the kernel, except port 0
+ */
+static __inline__ void ipath_kput_ureg(const ipath_type, ipath_ureg, uint64_t,
+                                       int) __attribute__ ((always_inline));
+/* change contents of a kernel register group register */
+static __inline__ void ipath_kput_kreg(const ipath_type, ipath_kreg, uint64_t)
+        __attribute__ ((always_inline));
+static __inline__ void ipath_kput_memq(const ipath_type, volatile uint64_t *,
+                                       uint64_t)
+        __attribute__ ((always_inline));
+
+#ifdef IPATH_COSIM
+extern __u32 sim_readl(const volatile void __iomem * addr);
+extern __u64 sim_readq(const volatile void __iomem * addr);
+extern void sim_writel(__u32 val, volatile void __iomem * addr);
+extern void sim_writeq(__u64 val, volatile void __iomem * addr);
+#define ipath_readl(addr) sim_readl(addr)
+#define ipath_readq(addr) sim_readq(addr)
+#define ipath_writel(val, addr) sim_writel(val, addr)
+#define ipath_writeq(val, addr) sim_writeq(val, addr)
+#else
+#define ipath_readl(addr) readl(addr) +#define ipath_readq(addr) readq(addr) +#define ipath_writel(val, addr) writel(val, addr) +#define ipath_writeq(val, addr) writeq(val, addr) +#endif + +/* + * At the moment, none of the s-registers are writable, so no ipath_kput_sreg() + * At the moment, none of the c-registers are writable, so no ipath_kput_creg() + */ + +/* + * return the contents of a register that is virtualized to be per port + * prints a debug message and returns ~0ULL on errors (not distinguishable from + * valid contents at runtime; we may add a separate error variable at some + * point). Initially, ipath_dev isn't needed because I only have one simulation + * but that will change soon + * This is normally not used by the kernel, but may be for debugging, + * and has a different implementation than user mode, which is why + * it's not in _common.h + */ +static __inline__ uint32_t ipath_kget_ureg32(const ipath_type stype, + ipath_ureg regno, int port) +{ + uint64_t *ubase; + + ubase = (uint64_t *) (devdata[stype].ipath_uregbase + + (char *)devdata[stype].ipath_kregbase + + devdata[stype].ipath_palign * port); + return ubase ? ipath_readl(ubase + regno) : 0; +} + +/* + * change the contents of a register that is virtualized to be per port + * prints a debug message and returns 1 on errors, 0 on success. + * Initially, ipath_dev isn't needed because I only have one simulation + * but that will change soon + */ +static __inline__ void ipath_kput_ureg(const ipath_type stype, ipath_ureg regno, + uint64_t value, int port) +{ + uint64_t *ubase; + + ubase = (uint64_t *) (devdata[stype].ipath_uregbase + + (char *)devdata[stype].ipath_kregbase + + devdata[stype].ipath_palign * port); + if (ubase) + ipath_writeq(value, &ubase[regno]); +} + +static __inline__ uint32_t ipath_kget_kreg32(const ipath_type stype, + ipath_kreg regno) +{ + volatile uint32_t *kreg32; + + if (!devdata[stype].ipath_kregbase) + return ~0; + + kreg32 = (volatile uint32_t *)&devdata[stype].ipath_kregbase[regno]; + return ipath_readl(kreg32); +} + +static __inline__ uint64_t ipath_kget_kreg64(const ipath_type stype, + ipath_kreg regno) +{ + if (!devdata[stype].ipath_kregbase) + return ~0ULL; + + return ipath_readq(&devdata[stype].ipath_kregbase[regno]); +} + +static __inline__ void ipath_kput_kreg(const ipath_type stype, + ipath_kreg regno, uint64_t value) +{ + if (devdata[stype].ipath_kregbase) + ipath_writeq(value, &devdata[stype].ipath_kregbase[regno]); +} + +static __inline__ uint32_t ipath_kget_creg32(const ipath_type stype, + ipath_sreg regno) +{ + uint64_t *cbase; + + cbase = (uint64_t *) (devdata[stype].ipath_cregbase + + (char *)devdata[stype].ipath_kregbase); + return cbase ? ipath_readl(cbase + regno) : 0; +} + +/* + * caddr is the destination chip address (full pointer, not offset), + * val is the qword to write there. We only handle a single qword (8 bytes). + * This is not used for copies to the PIO buffer, just TID updates, etc. + * This function is needed for simulation, and also localizes all chip + * mem writes for better/simpler debugging. 
+ */ +static __inline__ void ipath_kput_memq(const ipath_type stype, + volatile uint64_t * caddr, uint64_t val) +{ + if (devdata[stype].ipath_kregbase) + ipath_writeq(val, caddr); +} + +#endif /* _IPATH_KERNEL_H */ diff --git a/drivers/infiniband/hw/ipath/ipath_layer.h b/drivers/infiniband/hw/ipath/ipath_layer.h new file mode 100644 index 0000000..3b7954d --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_layer.h @@ -0,0 +1,131 @@ +/* + * Copyright (c) 2003, 2004, 2005. PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + * + * $Id: ipath_layer.h 4365 2005-12-10 00:04:16Z rjwalsh $ + */ + +#ifndef _IPATH_LAYER_H +#define _IPATH_LAYER_H + +/* + * This header file is for symbols shared between the infinipath driver + * and drivers layered upon it (such as ipath). 
+ */ + +struct sk_buff; +struct ipath_sge_state; + +struct ipath_layer_counters { + uint64_t symbol_error_counter; + uint64_t link_error_recovery_counter; + uint64_t link_downed_counter; + uint64_t port_rcv_errors; + uint64_t port_rcv_remphys_errors; + uint64_t port_xmit_discards; + uint64_t port_xmit_data; + uint64_t port_rcv_data; + uint64_t port_xmit_packets; + uint64_t port_rcv_packets; +}; + +extern int ipath_layer_register(const ipath_type device, + int (*l_intr) (const ipath_type, uint32_t), + int (*l_rcv) (const ipath_type, void *, + struct sk_buff *), + uint16_t rcv_opcode, + int (*l_rcv_lid) (const ipath_type, void *), + uint16_t rcv_lid_opcode); +extern int ipath_verbs_register(const ipath_type device, + int (*l_piobufavail) (const ipath_type device), + void (*l_rcv) (const ipath_type device, + void *rhdr, void *data, + u32 tlen), + void (*l_timer_cb) (const ipath_type device)); +extern void ipath_verbs_unregister(const ipath_type device); +extern int ipath_layer_open(const ipath_type device, uint32_t * pktmax); +extern int16_t ipath_layer_get_lid(const ipath_type device); +extern int ipath_layer_get_mac(const ipath_type device, uint8_t *); +extern int16_t ipath_layer_get_bcast(const ipath_type device); +extern int ipath_layer_get_num_of_dev(void); +extern int ipath_layer_get_cr_errpkey(const ipath_type device); +extern int ipath_kset_linkstate(uint32_t arg); +extern int ipath_kset_mtu(uint32_t); +extern void ipath_set_sps_lid(const ipath_type, uint32_t); +extern void ipath_layer_close(const ipath_type device); +extern int ipath_layer_send(const ipath_type device, void *hdr, void *data, + uint32_t datalen); +extern int ipath_verbs_send(const ipath_type device, uint32_t hdrwords, + uint32_t *hdr, uint32_t len, + struct ipath_sge_state *ss); +extern int ipath_layer_send_skb(struct copy_data_s *cdata); +extern void ipath_layer_set_piointbufavail_int(const ipath_type device); +extern void ipath_get_boardname(const ipath_type, char *name, size_t namelen); +extern void ipath_layer_snapshot_counters(const ipath_type t, u64 * swords, + u64 * rwords, u64 * spkts, + u64 * rpkts); +extern void ipath_layer_get_counters(const ipath_type device, + struct ipath_layer_counters *cntrs); +extern void ipath_layer_want_buffer(const ipath_type t); +extern int ipath_layer_set_guid(const ipath_type t, uint64_t guid); +extern uint64_t ipath_layer_get_guid(const ipath_type t); +extern uint32_t ipath_layer_get_nguid(const ipath_type t); +extern int ipath_layer_query_device(const ipath_type t, uint32_t * vendor, + uint32_t * boardrev, uint32_t * majrev, + uint32_t * minrev); +extern uint32_t ipath_layer_get_flags(const ipath_type t); +extern struct device *ipath_layer_get_pcidev(const ipath_type t); +extern uint16_t ipath_layer_get_deviceid(const ipath_type t); +extern uint64_t ipath_layer_get_lastibcstat(const ipath_type t); +extern uint32_t ipath_layer_get_ibmtu(const ipath_type t); +extern void ipath_layer_enable_timer(const ipath_type t); +extern void ipath_layer_disable_timer(const ipath_type t); +extern unsigned ipath_verbs_get_flags(const ipath_type device); +extern void ipath_verbs_set_flags(const ipath_type device, unsigned flags); +extern unsigned ipath_layer_get_npkeys(const ipath_type device); +extern unsigned ipath_layer_get_pkey(const ipath_type device, unsigned index); +extern void ipath_layer_get_pkeys(const ipath_type device, uint16_t *pkeys); +extern int ipath_layer_set_pkeys(const ipath_type device, uint16_t *pkeys); + +/* ipath_ether interrupt values */ +#define IPATH_LAYER_INT_IF_UP 
0x2 +#define IPATH_LAYER_INT_IF_DOWN 0x4 +#define IPATH_LAYER_INT_LID 0x8 +#define IPATH_LAYER_INT_SEND_CONTINUE 0x10 +#define IPATH_LAYER_INT_BCAST 0x40 + +/* _verbs_layer.l_flags */ +#define IPATH_VERBS_KERNEL_SMA 0x1 + +#endif /* _IPATH_LAYER_H */ diff --git a/drivers/infiniband/hw/ipath/ipath_registers.h b/drivers/infiniband/hw/ipath/ipath_registers.h new file mode 100644 index 0000000..6bf0c8b --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_registers.h @@ -0,0 +1,359 @@ +/* + * Copyright (c) 2003, 2004, 2005. PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + * + * $Id: ipath_registers.h 4365 2005-12-10 00:04:16Z rjwalsh $ + */ + +#ifndef _IPATH_REGISTERS_H +#define _IPATH_REGISTERS_H + +/* + * This file should only be included by kernel source, and by the diags. + * It defines the registers, and their contents, for the InfiniPath HT-400 chip + */ + +/* + * These are the InfiniPath register and buffer bit definitions, + * that are visible to software, and needed only by the kernel + * and diag code. A few, that are visible to protocol and user + * code are in ipath_common.h. 
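Putting the ipath_layer.h interface just shown together: a layered driver such as ipath_ether would register its interrupt and receive callbacks once per unit. A hypothetical wiring, using the ith4x sub-opcodes from ips_common.h further down in this posting; the stub callbacks do nothing:

        static int my_intr(const ipath_type t, uint32_t what) { return 0; }
        static int my_rcv(const ipath_type t, void *hdr, struct sk_buff *skb) { return 0; }
        static int my_rcv_lid(const ipath_type t, void *hdr) { return 0; }

        static int __init my_layer_init(void)
        {
                /* unit 0 only; real code would loop over ipath_layer_get_num_of_dev() */
                return ipath_layer_register(0, my_intr, my_rcv, OPCODE_ENCAP,
                                            my_rcv_lid, OPCODE_LID_ARP);
        }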
Some bits are specific
+ * to a given chip implementation, and have been moved to the
+ * chip-specific source file
+ */
+
+/* kr_revision bits */
+#define INFINIPATH_R_CHIPREVMINOR_MASK 0xFF
+#define INFINIPATH_R_CHIPREVMINOR_SHIFT 0
+#define INFINIPATH_R_CHIPREVMAJOR_MASK 0xFF
+#define INFINIPATH_R_CHIPREVMAJOR_SHIFT 8
+#define INFINIPATH_R_ARCH_MASK 0xFF
+#define INFINIPATH_R_ARCH_SHIFT 16
+#define INFINIPATH_R_SOFTWARE_MASK 0xFF
+#define INFINIPATH_R_SOFTWARE_SHIFT 24
+#define INFINIPATH_R_BOARDID_MASK 0xFF
+#define INFINIPATH_R_BOARDID_SHIFT 32
+#define INFINIPATH_R_SIMULATOR 0x8000000000000000ULL
+
+/* kr_control bits */
+#define INFINIPATH_C_FREEZEMODE 0x00000002
+#define INFINIPATH_C_LINKENABLE 0x00000004
+
+/* kr_sendctrl bits */
+#define INFINIPATH_S_DISARMPIOBUF_SHIFT 16
+#define INFINIPATH_S_ABORT 0x00000001U
+#define INFINIPATH_S_PIOINTBUFAVAIL 0x00000002U
+#define INFINIPATH_S_PIOBUFAVAILUPD 0x00000004U
+#define INFINIPATH_S_PIOENABLE 0x00000008U
+#define INFINIPATH_S_DISARM 0x80000000U
+
+/* kr_rcvctrl bits */
+#define INFINIPATH_R_PORTENABLE_SHIFT 0
+#define INFINIPATH_R_INTRAVAIL_SHIFT 16
+#define INFINIPATH_R_TAILUPD 0x80000000
+
+/* kr_intstatus, kr_intclear, kr_intmask bits */
+#define INFINIPATH_I_RCVURG_SHIFT 0
+#define INFINIPATH_I_RCVAVAIL_SHIFT 12
+#define INFINIPATH_I_ERROR 0x80000000
+#define INFINIPATH_I_SPIOSENT 0x40000000
+#define INFINIPATH_I_SPIOBUFAVAIL 0x20000000
+#define INFINIPATH_I_GPIO 0x10000000
+
+/* kr_errorstatus, kr_errorclear, kr_errormask bits */
+#define INFINIPATH_E_RFORMATERR 0x0000000000000001ULL
+#define INFINIPATH_E_RVCRC 0x0000000000000002ULL
+#define INFINIPATH_E_RICRC 0x0000000000000004ULL
+#define INFINIPATH_E_RMINPKTLEN 0x0000000000000008ULL
+#define INFINIPATH_E_RMAXPKTLEN 0x0000000000000010ULL
+#define INFINIPATH_E_RLONGPKTLEN 0x0000000000000020ULL
+#define INFINIPATH_E_RSHORTPKTLEN 0x0000000000000040ULL
+#define INFINIPATH_E_RUNEXPCHAR 0x0000000000000080ULL
+#define INFINIPATH_E_RUNSUPVL 0x0000000000000100ULL
+#define INFINIPATH_E_REBP 0x0000000000000200ULL
+#define INFINIPATH_E_RIBFLOW 0x0000000000000400ULL
+#define INFINIPATH_E_RBADVERSION 0x0000000000000800ULL
+#define INFINIPATH_E_RRCVEGRFULL 0x0000000000001000ULL
+#define INFINIPATH_E_RRCVHDRFULL 0x0000000000002000ULL
+#define INFINIPATH_E_RBADTID 0x0000000000004000ULL
+#define INFINIPATH_E_RHDRLEN 0x0000000000008000ULL
+#define INFINIPATH_E_RHDR 0x0000000000010000ULL
+#define INFINIPATH_E_RIBLOSTLINK 0x0000000000020000ULL
+#define INFINIPATH_E_SMINPKTLEN 0x0000000020000000ULL
+#define INFINIPATH_E_SMAXPKTLEN 0x0000000040000000ULL
+#define INFINIPATH_E_SUNDERRUN 0x0000000080000000ULL
+#define INFINIPATH_E_SPKTLEN 0x0000000100000000ULL
+#define INFINIPATH_E_SDROPPEDSMPPKT 0x0000000200000000ULL
+#define INFINIPATH_E_SDROPPEDDATAPKT 0x0000000400000000ULL
+#define INFINIPATH_E_SPIOARMLAUNCH 0x0000000800000000ULL
+#define INFINIPATH_E_SUNEXPERRPKTNUM 0x0000001000000000ULL
+#define INFINIPATH_E_SUNSUPVL 0x0000002000000000ULL
+#define INFINIPATH_E_IBSTATUSCHANGED 0x0001000000000000ULL
+#define INFINIPATH_E_INVALIDADDR 0x0002000000000000ULL
+#define INFINIPATH_E_RESET 0x0004000000000000ULL
+#define INFINIPATH_E_HARDWARE 0x0008000000000000ULL
+
+/* kr_hwerrclear, kr_hwerrmask, kr_hwerrstatus bits */
+#define INFINIPATH_HWE_HTCMEMPARITYERR_SHIFT 0
+#define INFINIPATH_HWE_TXEMEMPARITYERR_MASK 0xFULL
+#define INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT 40
+#define INFINIPATH_HWE_RXEMEMPARITYERR_MASK 0x7FULL
+#define INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT 44
+#define
INFINIPATH_HWE_HTCBUSTREQPARITYERR 0x0000000080000000ULL +#define INFINIPATH_HWE_HTCBUSTRESPPARITYERR 0x0000000100000000ULL +#define INFINIPATH_HWE_HTCBUSIREQPARITYERR 0x0000000200000000ULL +#define INFINIPATH_HWE_RXDSYNCMEMPARITYERR 0x0000000400000000ULL +#define INFINIPATH_HWE_SERDESPLLFAILED 0x2000000000000000ULL +#define INFINIPATH_HWE_IBCBUSTOSPCPARITYERR 0x4000000000000000ULL +#define INFINIPATH_HWE_IBCBUSFRSPCPARITYERR 0x8000000000000000ULL + +/* kr_hwdiagctrl bits */ +#define INFINIPATH_DC_FORCEHTCENABLE 0x20 +#define INFINIPATH_DC_FORCEHTCMEMPARITYERR_MASK 0x3FULL +#define INFINIPATH_DC_FORCEHTCMEMPARITYERR_SHIFT 0 +#define INFINIPATH_DC_FORCETXEMEMPARITYERR_MASK 0xFULL +#define INFINIPATH_DC_FORCETXEMEMPARITYERR_SHIFT 40 +#define INFINIPATH_DC_FORCERXEMEMPARITYERR_MASK 0x7FULL +#define INFINIPATH_DC_FORCERXEMEMPARITYERR_SHIFT 44 +#define INFINIPATH_DC_FORCEHTCBUSTREQPARITYERR 0x0000000080000000ULL +#define INFINIPATH_DC_FORCEHTCBUSTRESPPARITYERR 0x0000000100000000ULL +#define INFINIPATH_DC_FORCEHTCBUSIREQPARITYERR 0x0000000200000000ULL +#define INFINIPATH_DC_FORCERXDSYNCMEMPARITYERR 0x0000000400000000ULL +#define INFINIPATH_DC_COUNTERDISABLE 0x1000000000000000ULL +#define INFINIPATH_DC_COUNTERWREN 0x2000000000000000ULL +#define INFINIPATH_DC_FORCEIBCBUSTOSPCPARITYERR 0x4000000000000000ULL +#define INFINIPATH_DC_FORCEIBCBUSFRSPCPARITYERR 0x8000000000000000ULL + +/* kr_ibcctrl bits */ +#define INFINIPATH_IBCC_FLOWCTRLPERIOD_MASK 0xFFULL +#define INFINIPATH_IBCC_FLOWCTRLPERIOD_SHIFT 0 +#define INFINIPATH_IBCC_FLOWCTRLWATERMARK_MASK 0xFFULL +#define INFINIPATH_IBCC_FLOWCTRLWATERMARK_SHIFT 8 +#define INFINIPATH_IBCC_LINKINITCMD_MASK 0x3ULL +#define INFINIPATH_IBCC_LINKINITCMD_DISABLE 1 +/* cycle through TS1/TS2 till OK */ +#define INFINIPATH_IBCC_LINKINITCMD_POLL 2 +#define INFINIPATH_IBCC_LINKINITCMD_SLEEP 3 /* wait for TS1, then go on */ +#define INFINIPATH_IBCC_LINKINITCMD_SHIFT 16 +#define INFINIPATH_IBCC_LINKCMD_MASK 0x3ULL +#define INFINIPATH_IBCC_LINKCMD_INIT 1 /* move to 0x11 */ +#define INFINIPATH_IBCC_LINKCMD_ARMED 2 /* move to 0x21 */ +#define INFINIPATH_IBCC_LINKCMD_ACTIVE 3 /* move to 0x31 */ +#define INFINIPATH_IBCC_LINKCMD_SHIFT 18 +#define INFINIPATH_IBCC_MAXPKTLEN_MASK 0x7FFULL +#define INFINIPATH_IBCC_MAXPKTLEN_SHIFT 20 +#define INFINIPATH_IBCC_PHYERRTHRESHOLD_MASK 0xFULL +#define INFINIPATH_IBCC_PHYERRTHRESHOLD_SHIFT 32 +#define INFINIPATH_IBCC_OVERRUNTHRESHOLD_MASK 0xFULL +#define INFINIPATH_IBCC_OVERRUNTHRESHOLD_SHIFT 36 +#define INFINIPATH_IBCC_CREDITSCALE_MASK 0x7ULL +#define INFINIPATH_IBCC_CREDITSCALE_SHIFT 40 +#define INFINIPATH_IBCC_LOOPBACK 0x8000000000000000ULL +#define INFINIPATH_IBCC_LINKDOWNDEFAULTSTATE 0x4000000000000000ULL + +/* kr_ibcstatus bits */ +#define INFINIPATH_IBCS_LINKTRAININGSTATE_MASK 0xF +#define INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT 0 +#define INFINIPATH_IBCS_LINKSTATE_MASK 0x7 +#define INFINIPATH_IBCS_LINKSTATE_SHIFT 4 +#define INFINIPATH_IBCS_TXREADY 0x40000000 +#define INFINIPATH_IBCS_TXCREDITOK 0x80000000 + +/* kr_extstatus bits */ +#define INFINIPATH_EXTS_SERDESPLLLOCK 0x1 +#define INFINIPATH_EXTS_GPIOIN_MASK 0xFFFFULL +#define INFINIPATH_EXTS_GPIOIN_SHIFT 48 + +/* kr_extctrl bits */ +#define INFINIPATH_EXTC_GPIOINVERT_MASK 0xFFFFULL +#define INFINIPATH_EXTC_GPIOINVERT_SHIFT 32 +#define INFINIPATH_EXTC_GPIOOE_MASK 0xFFFFULL +#define INFINIPATH_EXTC_GPIOOE_SHIFT 48 +#define INFINIPATH_EXTC_SERDESENABLE 0x80000000ULL +#define INFINIPATH_EXTC_SERDESCONNECT 0x40000000ULL +#define INFINIPATH_EXTC_SERDESENTRUNKING 0x20000000ULL +#define 
INFINIPATH_EXTC_SERDESDISRXFIFO 0x10000000ULL
+#define INFINIPATH_EXTC_SERDESENPLPBK1 0x08000000ULL
+#define INFINIPATH_EXTC_SERDESENPLPBK2 0x04000000ULL
+#define INFINIPATH_EXTC_SERDESENENCDEC 0x02000000ULL
+#define INFINIPATH_EXTC_LEDSECPORTGREENON 0x00000020ULL
+#define INFINIPATH_EXTC_LEDSECPORTYELLOWON 0x00000010ULL
+#define INFINIPATH_EXTC_LEDPRIPORTGREENON 0x00000008ULL
+#define INFINIPATH_EXTC_LEDPRIPORTYELLOWON 0x00000004ULL
+#define INFINIPATH_EXTC_LEDGBLOKGREENON 0x00000002ULL
+#define INFINIPATH_EXTC_LEDGBLERRREDOFF 0x00000001ULL
+
+/* kr_mdio bits */
+#define INFINIPATH_MDIO_CLKDIV_MASK 0x7FULL
+#define INFINIPATH_MDIO_CLKDIV_SHIFT 32
+#define INFINIPATH_MDIO_COMMAND_MASK 0x7ULL
+#define INFINIPATH_MDIO_COMMAND_SHIFT 26
+#define INFINIPATH_MDIO_DEVADDR_MASK 0x1FULL
+#define INFINIPATH_MDIO_DEVADDR_SHIFT 21
+#define INFINIPATH_MDIO_REGADDR_MASK 0x1FULL
+#define INFINIPATH_MDIO_REGADDR_SHIFT 16
+#define INFINIPATH_MDIO_DATA_MASK 0xFFFFULL
+#define INFINIPATH_MDIO_DATA_SHIFT 0
+#define INFINIPATH_MDIO_CMDVALID 0x0000000040000000ULL
+#define INFINIPATH_MDIO_RDDATAVALID 0x0000000080000000ULL
+
+/* kr_partitionkey bits */
+#define INFINIPATH_PKEY_SIZE 16
+#define INFINIPATH_PKEY_MASK 0xFFFF
+#define INFINIPATH_PKEY_DEFAULT_PKEY 0xFFFF
+
+/* kr_serdesconfig0 bits */
+#define INFINIPATH_SERDC0_RESET_MASK 0xfULL /* overall reset bits */
+#define INFINIPATH_SERDC0_RESET_PLL 0x10000000ULL /* pll reset */
+#define INFINIPATH_SERDC0_TXIDLE 0xF000ULL /* tx idle enables (per lane) */
+
+/* kr_xgxsconfig bits */
+#define INFINIPATH_XGXS_RESET 0x7ULL
+#define INFINIPATH_XGXS_MDIOADDR_MASK 0xfULL
+#define INFINIPATH_XGXS_MDIOADDR_SHIFT 4
+
+/* TID entries (memory) */
+#define INFINIPATH_RT_VALID 0x8000000000000000ULL
+#define INFINIPATH_RT_ADDR_MASK 0xFFFFFFFFFFULL
+#define INFINIPATH_RT_ADDR_SHIFT 0
+#define INFINIPATH_RT_BUFSIZE_MASK 0x3FFF
+#define INFINIPATH_RT_BUFSIZE_SHIFT 48
+
+/* mask of defined bits for various registers */
+extern const uint64_t infinipath_c_bitsextant,
+        infinipath_s_bitsextant, infinipath_r_bitsextant,
+        infinipath_i_bitsextant, infinipath_e_bitsextant,
+        infinipath_hwe_bitsextant, infinipath_dc_bitsextant,
+        infinipath_extc_bitsextant, infinipath_mdio_bitsextant,
+        infinipath_ibcs_bitsextant, infinipath_ibcc_bitsextant;
+
+/* masks that are different in different chips */
+extern const uint32_t infinipath_i_rcvavail_mask, infinipath_i_rcvurg_mask;
+extern const uint64_t infinipath_hwe_htcmemparityerr_mask;
+extern const uint64_t infinipath_hwe_spibdcmlockfailed_mask;
+extern const uint64_t infinipath_hwe_sphtdcmlockfailed_mask;
+extern const uint64_t infinipath_hwe_htcdcmlockfailed_mask;
+extern const uint64_t infinipath_hwe_htcdcmlockfailed_shift;
+extern const uint64_t infinipath_hwe_sphtdcmlockfailed_shift;
+extern const uint64_t infinipath_hwe_spibdcmlockfailed_shift;
+
+extern const uint64_t infinipath_hwe_htclnkabyte0crcerr;
+extern const uint64_t infinipath_hwe_htclnkabyte1crcerr;
+extern const uint64_t infinipath_hwe_htclnkbbyte0crcerr;
+extern const uint64_t infinipath_hwe_htclnkbbyte1crcerr;
+
+/*
+ * These are the infinipath general register numbers (not offsets).
+ * The kernel registers are used directly, those beyond the kernel
+ * registers are calculated from one of the base registers. The use of
+ * an integer type doesn't allow type-checking as thorough as, say,
+ * an enum but allows for better hiding of chip differences.
+ */
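Concretely, the register-number indirection described above means a chip-specific source file assigns the numeric values, and common code only ever uses the names through the ipath_kget_*/ipath_kput_* inlines from ipath_kernel.h. A sketch, with invented offsets:

        /* in a chip-specific file (the values here are made up): */
        ipath_kreg kr_control = 8;
        ipath_kreg kr_scratch = 11;

        /* in common code: */
        uint64_t ctrl = ipath_kget_kreg64(t, kr_control);
        ipath_kput_kreg(t, kr_control, ctrl | INFINIPATH_C_LINKENABLE);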
+typedef const uint16_t
+        ipath_kreg, /* kernel-only, infinipath general registers */
+        ipath_creg, /* kernel-only, infinipath counter registers */
+        ipath_sreg; /* kernel-only, infinipath send registers */
+
+/*
+ * These are all implemented such that 64 bit accesses work.
+ * Some implement no more than 32 bits. Because 64 bit reads
+ * require 2 HT cmds on opteron, we access those with 32 bit
+ * reads for efficiency (they are written as 64 bits, since
+ * the extra 32 bits are nearly free on writes, and it slightly reduces
+ * complexity). The rest are all accessed as 64 bits.
+ */
+extern ipath_kreg
+        /* These are the 32 bit group */
+        kr_control, kr_counterregbase, kr_intmask, kr_intstatus,
+        kr_pagealign, kr_portcnt, kr_rcvtidbase, kr_rcvtidcnt,
+        kr_rcvegrbase, kr_rcvegrcnt, kr_scratch, kr_sendctrl,
+        kr_sendpiobufbase, kr_sendpiobufcnt, kr_sendpiosize,
+        kr_sendregbase, kr_userregbase,
+        /* These are the 64 bit group */
+        kr_debugport, kr_debugportselect, kr_errorclear, kr_errormask,
+        kr_errorstatus, kr_extctrl, kr_extstatus, kr_gpio_clear, kr_gpio_mask,
+        kr_gpio_out, kr_gpio_status, kr_hwdiagctrl, kr_hwerrclear,
+        kr_hwerrmask, kr_hwerrstatus, kr_ibcctrl, kr_ibcstatus, kr_intblocked,
+        kr_intclear, kr_interruptconfig, kr_mdio, kr_partitionkey, kr_rcvbthqp,
+        kr_rcvbufbase, kr_rcvbufsize, kr_rcvctrl, kr_rcvhdrcnt,
+        kr_rcvhdrentsize, kr_rcvhdrsize, kr_rcvintmembase, kr_rcvintmemsize,
+        kr_revision, kr_sendbuffererror, kr_sendbuffererror1,
+        kr_sendbuffererror2, kr_sendbuffererror3, kr_sendpioavailaddr,
+        kr_serdesconfig0, kr_serdesconfig1, kr_serdesstatus, kr_txintmembase,
+        kr_txintmemsize, kr_xgxsconfig, kr_sync, kr_dump,
+        kr_simver, /* simulator only */
+        __kr_invalid, /* a marker for debug, don't use them directly */
+        /* a marker for debug, don't use them directly */
+        __kr_lastvaliddirect,
+        /* use only with ipath_k*_kreg64_port(), not *kreg64() */
+        kr_rcvhdraddr,
+        /* use only with ipath_k*_kreg64_port(), not *kreg64() */
+        kr_rcvhdrtailaddr,
+        /* we define the full set for the diags, the kernel doesn't use them */
+        kr_rcvhdraddr1, kr_rcvhdraddr2, kr_rcvhdraddr3, kr_rcvhdraddr4,
+        kr_rcvhdraddr5, kr_rcvhdraddr6, kr_rcvhdraddr7, kr_rcvhdraddr8,
+        kr_rcvhdrtailaddr1, kr_rcvhdrtailaddr2, kr_rcvhdrtailaddr3,
+        kr_rcvhdrtailaddr4, kr_rcvhdrtailaddr5, kr_rcvhdrtailaddr6,
+        kr_rcvhdrtailaddr7, kr_rcvhdrtailaddr8;
+
+/*
+ * first of the pioavail registers, the total number is
+ * (kr_sendpiobufcnt / 32); each buffer uses 2 bits
+ */
+extern ipath_sreg sr_sendpioavail;
+
+extern ipath_creg cr_badformatcnt, cr_erricrccnt, cr_errlinkcnt,
+        cr_errlpcrccnt, cr_errpkey, cr_errrcvflowctrlcnt,
+        cr_err_rlencnt, cr_errslencnt, cr_errtidfull,
+        cr_errtidvalid, cr_errvcrccnt, cr_ibstatuschange,
+        cr_intcnt, cr_invalidrlencnt, cr_invalidslencnt,
+        cr_lbflowstallcnt, cr_iblinkdowncnt, cr_iblinkerrrecovcnt,
+        cr_ibsymbolerrcnt, cr_pktrcvcnt, cr_pktrcvflowctrlcnt,
+        cr_pktsendcnt, cr_pktsendflowcnt, cr_portovflcnt,
+        cr_portovflcnt1, cr_portovflcnt2, cr_portovflcnt3, cr_portovflcnt4,
+        cr_portovflcnt5, cr_portovflcnt6, cr_portovflcnt7, cr_portovflcnt8,
+        cr_rcvebpcnt, cr_rcvovflcnt, cr_rxdroppktcnt,
+        cr_senddropped, cr_sendstallcnt, cr_sendunderruncnt,
+        cr_unsupvlcnt, cr_wordrcvcnt, cr_wordsendcnt;
+
+/*
+ * register bits for selecting i2c direction and values, used for I2C serial
+ * flash
+ */
+extern const uint16_t ipath_gpio_sda_num;
+extern const uint16_t ipath_gpio_scl_num;
+extern const uint64_t ipath_gpio_sda;
+extern const uint64_t ipath_gpio_scl;
+
+#endif /*
_IPATH_REGISTERS_H */ diff --git a/drivers/infiniband/hw/ipath/ips_common.h b/drivers/infiniband/hw/ipath/ips_common.h new file mode 100644 index 0000000..8a6a059 --- /dev/null +++ b/drivers/infiniband/hw/ipath/ips_common.h @@ -0,0 +1,221 @@ +/* + * Copyright (c) 2003, 2004, 2005. PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + * + * $Id: ips_common.h 4365 2005-12-10 00:04:16Z rjwalsh $ + */ + +#ifndef IPS_COMMON_H +#define IPS_COMMON_H + +typedef struct _ipath_header_typ { + /* + * Version - 4 bits, Port - 4 bits, TID - 10 bits and Offset - 14 bits + * before ECO change ~28 Dec 03. + * After that, Vers 4, Port 3, TID 11, offset 14. + */ + uint32_t ver_port_tid_offset; + uint16_t chksum; + uint16_t pkt_flags; +} ipath_header_typ; + +typedef struct _ips_message_header_typ { + uint16_t lrh[4]; + uint32_t bth[3]; + ipath_header_typ iph; + uint8_t sub_opcode; + uint8_t flags; + uint16_t src_rank; + /* 24 bits. 
The upper 8 bits are available for other use */
+ union {
+ struct {
+ unsigned ack_seq_num : 24;
+ unsigned port : 4;
+ unsigned unused : 4;
+ };
+ uint32_t ack_seq_num_org;
+ };
+ uint8_t expected_tid_session_id;
+ uint8_t tinylen; /* to aid MPI */
+ uint16_t tag; /* to aid MPI */
+ union {
+ uint32_t mpi[4]; /* to aid MPI */
+ uint32_t data[4];
+ struct {
+ uint16_t mtu;
+ uint8_t major_ver;
+ uint8_t minor_ver;
+ uint32_t not_used; /* free */
+ uint32_t run_id;
+ uint32_t client_ver;
+ };
+ };
+} ips_message_header_typ;
+
+typedef struct _ether_header_typ {
+ uint16_t lrh[4];
+ uint32_t bth[3];
+ ipath_header_typ iph;
+ uint8_t sub_opcode;
+ uint8_t cmd;
+ uint16_t lid;
+ uint16_t mac[3];
+ uint8_t frag_num;
+ uint8_t seq_num;
+ uint32_t len;
+ /* MUST be of word size due to PIO write requirements */
+ uint32_t csum;
+ uint16_t csum_offset;
+ uint16_t flags;
+ uint16_t first_2_bytes;
+ uint8_t unused[2]; /* currently unused */
+} ether_header_typ;
+
+/*
+ * The PIO buffer used for sending infinipath messages must only be written
+ * in 32-bit words, all the data must be written, and no writes can occur
+ * after the last word is written (which transfers "ownership" of the buffer
+ * to the chip and triggers the message to be sent).
+ * Since the Linux sk_buff structure can be recursive and non-aligned, with
+ * any number of bytes in each segment, we use the following structure
+ * to keep information about the overall state of the copy operation.
+ * This is used to save the information needed to store the checksum
+ * in the right place before sending the last word to the hardware and
+ * to buffer the last 0-3 bytes of non-word sized segments.
+ */
+struct copy_data_s {
+ ether_header_typ *hdr;
+ uint32_t *csum_pio; /* address of the PIO buffer to write csum to */
+ uint32_t *to; /* address of the PIO buffer to write data to */
+ uint32_t device; /* which device to allocate PIO bufs from */
+ int error; /* set if there is an error. */
+ int extra; /* amount of data saved in u.buf below */
+ unsigned int len; /* total length to send in bytes */
+ unsigned int flen; /* fragment length in words */
+ unsigned int csum; /* partial IP checksum */
+ unsigned int pos; /* position for partial checksum */
+ unsigned int offset; /* offset to where data currently starts */
+ int checksum_calc; /* set to 'true' when the checksum has been calculated */
+ struct sk_buff *skb;
+ union {
+ uint32_t w;
+ uint8_t buf[4];
+ } u;
+};
+
+typedef struct copy_data_s copy_data_ctrl_typ;
+
+/* IB - LRH header consts */
+#define IPS_LRH_GRH 0x0003 /* 1. word of IB LRH - next header: GRH */
+#define IPS_LRH_BTH 0x0002 /* 1.
word of IB LRH - next header: BTH */ + +#define IPS_OFFSET 0 + +/* + * defines the cut-off point between the header queue and eager/expected + * TID queue + */ +#define NUM_OF_EKSTRA_WORDS_IN_HEADER_QUEUE ((sizeof(ips_message_header_typ) - offsetof(ips_message_header_typ, iph)) >> 2) + +/* OpCodes */ +#define OPCODE_IPS 0xC0 +#define OPCODE_ITH4X 0xC1 + +/* OpCode 30 is use by stand-alone test programs */ +#define OPCODE_RAW_DATA 0xDE +/* last OpCode (31) is reserved for test */ +#define OPCODE_TEST 0xDF + +/* sub OpCodes - ips */ +#define OPCODE_SEQ_DATA 0x01 +#define OPCODE_SEQ_CTRL 0x02 + +#define OPCODE_ACK 0x10 +#define OPCODE_NAK 0x11 + +#define OPCODE_ERR_CHK 0x20 +#define OPCODE_ERR_CHK_PLS 0x21 + +#define OPCODE_STARTUP 0x30 +#define OPCODE_STARTUP_ACK 0x31 +#define OPCODE_STARTUP_NAK 0x32 + +#define OPCODE_STARTUP_EXT 0x34 +#define OPCODE_STARTUP_ACK_EXT 0x35 +#define OPCODE_STARTUP_NAK_EXT 0x36 + +#define OPCODE_TIDS_RELEASE 0x40 +#define OPCODE_TIDS_RELEASE_CONFIRM 0x41 + +#define OPCODE_CLOSE 0x50 +#define OPCODE_CLOSE_ACK 0x51 +/* + * like OPCODE_CLOSE, but no complaint if other side has already closed. Used + * when doing abort(), MPI_Abort(), etc. + */ +#define OPCODE_ABORT 0x52 + +/* sub OpCodes - ith4x */ +#define OPCODE_ENCAP 0x81 +#define OPCODE_LID_ARP 0x82 + +/* Receive Header Queue: receive type (from infinipath) */ +#define RCVHQ_RCV_TYPE_EXPECTED 0 +#define RCVHQ_RCV_TYPE_EAGER 1 +#define RCVHQ_RCV_TYPE_NON_KD 2 +#define RCVHQ_RCV_TYPE_ERROR 3 + +/* misc. */ +#define SIZE_OF_CRC 1 + +#define EAGER_TID_ID INFINIPATH_I_TID_MASK + +#define IPS_DEFAULT_P_KEY 0xFFFF + +/* macros for processing rcvhdrq entries */ +#define ips_get_hdr_err_flags(StartOfBuffer) *(((uint32_t *)(StartOfBuffer))+1) +#define ips_get_index(StartOfBuffer) (((*((uint32_t *)(StartOfBuffer))) >> \ + INFINIPATH_RHF_EGRINDEX_SHIFT) & INFINIPATH_RHF_EGRINDEX_MASK) +#define ips_get_rcv_type(StartOfBuffer) ((*(((uint32_t *)(StartOfBuffer))) >> \ + INFINIPATH_RHF_RCVTYPE_SHIFT) & INFINIPATH_RHF_RCVTYPE_MASK) +#define ips_get_length_in_bytes(StartOfBuffer) \ + (uint32_t)(((*(((uint32_t *)(StartOfBuffer))) >> \ + INFINIPATH_RHF_LENGTH_SHIFT) & INFINIPATH_RHF_LENGTH_MASK) << 2) +#define ips_get_first_protocol_header(StartOfBuffer) (void *) \ + ((uint32_t *)(StartOfBuffer) + 2) +#define ips_get_ips_header(StartOfBuffer) ((ips_message_header_typ *) \ + ((uint32_t *)(StartOfBuffer) + 2)) +#define ips_get_ipath_ver(ipath_header) (((ipath_header) >> INFINIPATH_I_VERS_SHIFT) \ + & INFINIPATH_I_VERS_MASK) +#endif -- 0.99.9n From rolandd at cisco.com Fri Dec 16 15:48:54 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 16 Dec 2005 15:48:54 -0800 Subject: [openib-general] [PATCH 04/13] [RFC] ipath LLD core, part 1 In-Reply-To: <200512161548.lRw6KI369ooIXS9o@cisco.com> Message-ID: <200512161548.20XjmmxDHjOZRXcz@cisco.com> First part of core driver --- drivers/infiniband/hw/ipath/ipath_driver.c | 2589 ++++++++++++++++++++++++++++ 1 files changed, 2589 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/hw/ipath/ipath_driver.c 04a2c405bc3b7f074758c4329933e9499681ccf0 diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c new file mode 100644 index 0000000..df650d6 --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -0,0 +1,2589 @@ +/* + * Copyright (c) 2003, 2004, 2005. PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + * + * $Id: ipath_driver.c 4500 2005-12-16 01:34:22Z rjwalsh $ + */ + +#include +#include +#include +#include +#include +#include +#include + +#include /* we can generate our own crc's for testing */ + +#include "ipath_kernel.h" +#include "ips_common.h" +#include "ipath_layer.h" + +/* + * Our LSB-assigned major number, so scripts can figure + * out how to make entry in /dev. + */ + +static int ipath_major = 233; + +/* + * number of buffers reserved for driver (layered drivers and SMA send), + * settable via sysctl, although it may not take effect if user + * processes have the port open. Reserved at end of buffer list + */ + +static uint infinipath_kpiobufs = 32; + +/* + * number of ports we are configured to use (to allow for more pio + * buffers per port, etc.) Zero means use chip value + */ + +static uint infinipath_cfgports; + +/* + * number of units we are configured to use (to allow for bringup on + * multi-chip systems) Zero means use only one for now, but eventually + * will mean to use infinipath_max + */ + +static uint infinipath_cfgunits; + +uint64_t ipath_dummy_val_for_testing; + +static __kernel_pid_t ipath_sma_alive; /* PID of SMA, if it's running */ +static spinlock_t ipath_sma_lock; /* SMA receive */ + +/* max SM received packets we'll queue; we keep the most recent packets. */ + +#define IPATH_NUM_SMAPKTS 16 + +#define IPATH_SMA_HDRSZ (8+12+8) /* LRH+BTH+DETH */ + +static struct _ipath_sma_rpkt { + /* length of received packet; non-zero if queued */ + uint32_t len; + /* unit number of interface packet was received from */ + uint32_t unit; + uint8_t *buf; +} ipath_sma_data[IPATH_NUM_SMAPKTS]; + +static unsigned ipath_sma_first; /* oldest sma packet index */ +static unsigned ipath_sma_next; /* next sma packet index to use */ + +/* + * ipath_sma_data_bufs has one extra, pointed to by ipath_sma_data_spare, + * so we can exchange buffers to do copy_to_user, and not hold the lock + * across the copy_to_user(). 
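+ *
+ * Roughly, the consumer swaps the filled buffer with the spare while
+ * holding the lock, then copies from the detached buffer; a minimal
+ * sketch (ubuf, len, flags and ret are illustrative locals only):
+ *
+ *	spin_lock_irqsave(&ipath_sma_lock, flags);
+ *	buf = ipath_sma_data[ipath_sma_first].buf;
+ *	len = ipath_sma_data[ipath_sma_first].len;
+ *	ipath_sma_data[ipath_sma_first].buf = ipath_sma_data_spare;
+ *	ipath_sma_data_spare = buf;
+ *	spin_unlock_irqrestore(&ipath_sma_lock, flags);
+ *	ret = copy_to_user(ubuf, buf, len);	/* lock not held here */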
+ */ + +#define SMA_MAX_PKTSZ (IPATH_SMA_HDRSZ+256) /* max len of an SMA packet */ + +static uint8_t ipath_sma_data_bufs[IPATH_NUM_SMAPKTS + 1][SMA_MAX_PKTSZ]; +static uint8_t *ipath_sma_data_spare; +/* sma waits globally on all units */ +static wait_queue_head_t ipath_sma_wait; +static wait_queue_head_t ipath_sma_state_wait; + +struct infinipath_stats ipath_stats; + +static __inline__ uint64_t ipath_kget_sreg(const ipath_type, ipath_sreg) + __attribute__ ((always_inline)); + +/* + * this will only be used for diags, now that we have enabled the DMA + * of the sendpioavail regs to system memory. + */ + +static __inline__ uint64_t ipath_kget_sreg(const ipath_type stype, + ipath_sreg regno) +{ + uint64_t val; + uint64_t *sbase; + + sbase = (uint64_t *) (devdata[stype].ipath_sregbase + + (char *)devdata[stype].ipath_kregbase); + val = sbase ? sbase[regno] : 0ULL; + return val; +} + +/* + * make infinipath_debug, etc. changeable on the fly via sysctl. + */ + +static int ipath_sysctl(ctl_table *, int, struct file *, void __user *, + size_t *, loff_t *); + +static int ipath_do_user_init(ipath_portdata *, struct ipath_user_info *); +static int ipath_get_baseinfo(ipath_portdata *, struct ipath_base_info *); +static int ipath_get_units(void); +static int ipath_wr_eeprom(ipath_portdata *, struct ipath_eeprom_req *); +static int ipath_wait_intr(ipath_portdata *, uint32_t); +static int ipath_tid_update(ipath_portdata *, struct _tidupd *); +static int ipath_tid_free(ipath_portdata *, struct _tidupd *); +static int ipath_get_counters(ipath_type, struct infinipath_counters *); +static int ipath_get_unit_counters(struct infinipath_getunitcounters *a); +static int ipath_get_stats(struct infinipath_stats *); +static int ipath_set_partkey(ipath_portdata *, uint16_t); +static int ipath_manage_rcvq(ipath_portdata *, uint16_t); +static void ipath_clean_partkey(ipath_portdata *, ipath_devdata *); +static void ipath_disarm_piobufs(const ipath_type, unsigned, unsigned); +static int ipath_create_user_egr(ipath_portdata *); +static int ipath_create_port0_egr(ipath_portdata *); +static int ipath_create_rcvhdrq(ipath_portdata *); +static void ipath_handle_errors(const ipath_type, uint64_t); +static void ipath_update_pio_bufs(const ipath_type); +static __inline__ void *ipath_get_egrbuf(const ipath_type, uint32_t, int); +static int ipath_shutdown_link(const ipath_type); +static int ipath_bringup_link(const ipath_type); +int ipath_bringup_serdes(const ipath_type); +static void ipath_get_faststats(unsigned long); +static int ipath_setup_htconfig(struct pci_dev *, uint64_t *, const ipath_type); +static struct page *ipath_nopage(struct vm_area_struct *, unsigned long, int *); +static irqreturn_t ipath_intr(int irq, void *devid, struct pt_regs *regs); +static void ipath_decode_err(char *, size_t, uint64_t); +void ipath_free_pddata(ipath_devdata *, uint32_t, int); +static void ipath_clear_tids(const ipath_type, unsigned); +static void ipath_get_guid(const ipath_type); +static int ipath_sma_ioctl(struct file *, unsigned int, unsigned long); +static int ipath_rcvsma_pkt(struct ipath_sendpkt *); +static int ipath_kset_lid(uint32_t); +static int ipath_kset_mlid(uint32_t); +static int ipath_get_mlid(uint32_t *); +static int ipath_get_devstatus(uint64_t *); +static int ipath_kset_guid(struct ipath_setguid *); +static int ipath_get_portinfo(uint32_t *); +static int ipath_get_nodeinfo(uint32_t *); +#ifdef _IPATH_EXTRA_DEBUG +static void ipath_dump_allregs(char *, ipath_type); +#endif + +static const char ipath_sma_name[] = 
"infinipath_SMA"; + +/* + * is diags mode enabled? if it is, then things like auto bringup of + * links is disabled + */ + +int ipath_diags_enabled = 0; + +void ipath_chip_done(void) +{ +} + +void ipath_chip_cleanup(ipath_devdata * dd) +{ +} + +/* + * cache aligned location + * + * where port 0 rcvhdrtail register is written back; also want + * nothing else sharing the cache line, so make it a cache line in size + * used for all units + * + * This is volatile as it's the target of a DMA from the chip. + */ + +static volatile uint64_t ipath_port0_rcvhdrtail[512] + __attribute__ ((aligned(4096))); + +#define MODNAME "ipath_core" +#define DRIVER_LOAD_MSG "PathScale " MODNAME " loaded: " +#define PFX MODNAME ": " + +/* + * min buffers we want to have per port, after driver + */ + +#define IPATH_MIN_USER_PORT_BUFCNT 8 + +/* The size has to be longer than this string, so we can + * append board/chip information to it in the init code. + */ +static char ipath_core_version[192] = _IPATH_IDSTR "\n"; +static char *chip_driver_version; +static int chip_driver_size; + +/* mylid and lidbase are to deal with LIDs in "fabric", until SM is working */ + +module_param(infinipath_debug, uint, 0644); +module_param(infinipath_kpiobufs, uint, 0644); +module_param(infinipath_cfgports, uint, 0644); +module_param(infinipath_cfgunits, uint, 0644); + +MODULE_PARM_DESC(infinipath_debug, "mask for debug prints"); +MODULE_PARM_DESC(infinipath_cfgports, "Set max number of ports to use"); +MODULE_PARM_DESC(infinipath_cfgunits, "Set max number of devices to use"); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("PathScale "); +MODULE_DESCRIPTION("Pathscale InfiniPath driver"); + +#ifdef IPATH_DIAG +static __kernel_pid_t ipath_diag_alive; /* PID of diags, if running */ +extern int ipath_diags_ioctl(struct file *, unsigned, unsigned long); +static int ipath_opendiag(struct inode *, struct file *); +#endif + +#if __IPATH_INFO || __IPATH_DBG +static const char *ipath_ibcstatus_str[] = { + "Disabled", + "LinkUp", + "PollActive", + "PollQuiet", + "SleepDelay", + "SleepQuiet", + "LState6", /* unused */ + "LState7", /* unused */ + "CfgDebounce", + "CfgRcvfCfg", + "CfgWaitRmt", + "CfgIdle", + "RecovRetrain", + "LState0xD", /* unused */ + "RecovWaitRmt", + "RecovIdle", +}; +#endif + +static ssize_t show_version(struct device_driver *dev, char *buf) +{ + return snprintf(buf, PAGE_SIZE, "%s", ipath_core_version); +} + +static ssize_t show_status(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + ipath_devdata *dd = dev_get_drvdata(dev); + + if(!dd) + return -EINVAL; + + if(!dd->ipath_statusp) + return -EINVAL; + + return snprintf(buf, PAGE_SIZE, "%llx\n", *(dd->ipath_statusp)); +} + +static const char *ipath_status_str[] = { + "Initted", + "Disabled", + "4", /* unused */ + "OIB_SMA", + "SMA", + "Present", + "IB_link_up", + "IB_configured", + "NoIBcable", + "Fatal_Hardware_Error", + NULL, +}; + +static ssize_t show_status_str(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + ipath_devdata *dd = dev_get_drvdata(dev); + int i, any; + uint64_t s; + + if(!dd) + return -EINVAL; + + if(!dd->ipath_statusp) + return -EINVAL; + + s = *(dd->ipath_statusp); + *buf = '\0'; + for (any = i = 0; s && ipath_status_str[i]; i++) { + if (s & 1) { + if (any && strlcat(buf, " ", PAGE_SIZE) >= PAGE_SIZE) + /* overflow */ + break; + if (strlcat(buf, ipath_status_str[i], + PAGE_SIZE) >= PAGE_SIZE) + break; + any = 1; + } + s >>= 1; + } + if(any) + strlcat(buf, "\n", PAGE_SIZE); + + return strlen(buf); +} + +static ssize_t 
show_lid(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + ipath_devdata *dd = dev_get_drvdata(dev); + + if(!dd) + return -EINVAL; + + return snprintf(buf, PAGE_SIZE, "%x\n", dd->ipath_lid); +} + +static ssize_t show_mlid(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + ipath_devdata *dd = dev_get_drvdata(dev); + + if(!dd) + return -EINVAL; + + return snprintf(buf, PAGE_SIZE, "%x\n", dd->ipath_mlid); +} + +static ssize_t show_guid(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + ipath_devdata *dd = dev_get_drvdata(dev); + uint8_t *guid; + + if(!dd) + return -EINVAL; + + guid = (uint8_t *)&(dd->ipath_guid); + + return snprintf(buf, PAGE_SIZE, "%x:%x:%x:%x:%x:%x:%x:%x\n", + guid[0], guid[1], guid[2], guid[3], guid[4], guid[5], + guid[6], guid[7]); +} + +static ssize_t show_nguid(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + ipath_devdata *dd = dev_get_drvdata(dev); + + if(!dd) + return -EINVAL; + + return snprintf(buf, PAGE_SIZE, "%u\n", dd->ipath_nguid); +} + +static ssize_t show_serial(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + ipath_devdata *dd = dev_get_drvdata(dev); + + if(!dd) + return -EINVAL; + + buf[sizeof dd->ipath_serial] = '\0'; + memcpy(buf, dd->ipath_serial, sizeof dd->ipath_serial); + strcat(buf, "\n"); + return strlen(buf); +} + +static ssize_t show_unit(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + ipath_devdata *dd = dev_get_drvdata(dev); + + if(!dd) + return -EINVAL; + + snprintf(buf, PAGE_SIZE, "%u\n", dd->ipath_unit); + return strlen(buf); +} + +static DRIVER_ATTR(version, S_IRUGO, show_version, NULL); +static DEVICE_ATTR(status, S_IRUGO, show_status, NULL); +static DEVICE_ATTR(status_str, S_IRUGO, show_status_str, NULL); +static DEVICE_ATTR(lid, S_IRUGO, show_lid, NULL); +static DEVICE_ATTR(mlid, S_IRUGO, show_mlid, NULL); +static DEVICE_ATTR(guid, S_IRUGO, show_guid, NULL); +static DEVICE_ATTR(nguid, S_IRUGO, show_nguid, NULL); +static DEVICE_ATTR(serial, S_IRUGO, show_serial, NULL); +static DEVICE_ATTR(unit, S_IRUGO, show_unit, NULL); + +/* + * Much like proc_dointvec_minmax, but only one int, and show as 0x on read + * Apparently between 2.6.3 and 2.6.10, convenience functions were added + * that I should probably convert to using. For now, do the minimum possible + * change, using the new ppos parameter, instead of f_pos + */ + +static int ipath_sysctl(ctl_table * ct, int wr, struct file *f, void __user * b, + size_t * l, loff_t * ppos) +{ + char t[20]; + int len, ret = 0; + + if (*ppos && !wr) + *l = 0; + if (!*l) + goto done; + if (!access_ok(wr ? VERIFY_READ : VERIFY_WRITE, b, *l)) { + ret = -EFAULT; + goto done; + } + len = min_t(int, sizeof t, *l); + + if (!wr) { + /* All of our changeable sysctl stuff is unsigned's for now */ + *l = snprintf(t, len, "%x\n", *(unsigned *)ct->data); + if (*l < 0) + *l = 0; + else + copy_to_user(b, t, *l); + } else { + int i; + char *e; + if (copy_from_user(t, b, len)) { + ret = -EFAULT; + goto done; + } + t[len < (sizeof t - 1) ? 
len : (sizeof t - 1)] = '\0'; + i = simple_strtoul(t, &e, 0); + if (e > t) { + /* + * All of our changeable sysctl stuff is + * unsigned's for now + */ + if (ct->ctl_name == CTL_INFINIPATH_LAYERBUF) { + /* we don't need locking for this, + * because we still do the normal avail + * checks, it's just a question of what + * range we check within; at best + * during the update, we miss checking + * some buffers we could have used, + * for a short period + */ + + int d; + + if (i < 1) { + _IPATH_ERROR + ("Must have at least one kernel PIO buffer\n"); + ret = -EINVAL; + goto done; + } + for (d = 0; d < infinipath_max; d++) { + if (devdata[d].ipath_kregbase) { + if (i > + (devdata[d].ipath_piobcnt - + (devdata[d]. + ipath_cfgports * + IPATH_MIN_USER_PORT_BUFCNT))) + { + _IPATH_UNIT_ERROR(d, + "Allocating %d PIO bufs for kernel leaves too few for %d user ports (%d each)\n", + i, + devdata + [d]. + ipath_cfgports + - 1, + IPATH_MIN_USER_PORT_BUFCNT); + ret = -EINVAL; + goto done; + } + devdata[d].ipath_lastport_piobuf = + devdata[d].ipath_piobcnt - i; + devdata[d].ipath_lastpioindex = + devdata[d].ipath_lastport_piobuf; + } + } + } + *(unsigned *)ct->data = i; + } else { + ret = -EINVAL; + goto done; + } + *l = len; + } +done: + + *ppos += *l; + return ret; +} + +/* + * make infinipath_debug changeable on the fly via sysctl. + */ + +static struct ctl_table_header *ipath_ctl_header; + +static ctl_table ipath_ctl_debug[] = { + { + .ctl_name = CTL_INFINIPATH_DEBUG, + .procname = "debug", + .data = &infinipath_debug, + .maxlen = sizeof(infinipath_debug), + .mode = 0644, + .proc_handler = ipath_sysctl, + } + , + { + .ctl_name = CTL_INFINIPATH_LAYERBUF, + .procname = "kern_piobufs", + .data = &infinipath_kpiobufs, + .maxlen = sizeof(infinipath_kpiobufs), + .mode = 0644, + .proc_handler = ipath_sysctl, + } + , + {.ctl_name = 0} +}; + +static ctl_table ipath_ctl[] = { + { + .ctl_name = CTL_INFINIPATH, + .procname = "infinipath", + .mode = 0555, + .child = ipath_ctl_debug, + }, + {.ctl_name = 0} +}; + +/* + * called from add_timer and user counter read calls, to deal with + * counters that wrap in "human time". The words sent and received, and + * the packets sent and received are all that we worry about. For now, + * at least, we don't worry about error counters, because if they wrap + * that quickly, we probably don't care. We may eventually just make this + * handle all the counters. word counters can wrap in about 20 seconds + * of full bandwidth traffic, packet counters in a few hours. + */ + +uint64_t ipath_snap_cntr(const ipath_type t, ipath_creg creg) +{ + uint32_t val; + uint64_t val64, t0, t1; + ipath_devdata *dd = &devdata[t]; + static uint64_t one_sec_in_cycles; + extern uint32_t _ipath_pico_per_cycle; + + if (!one_sec_in_cycles && _ipath_pico_per_cycle) + one_sec_in_cycles = 1000000000000UL / _ipath_pico_per_cycle; + + t0 = get_cycles(); + val = ipath_kget_creg32(t, creg); + t1 = get_cycles(); + if ((t1 - t0) > one_sec_in_cycles && val == ~0) { + /* + * This is just a way to detect things that are quite broken. + * Normally this should take just a few cycles (the check is + * for long enough that we don't care if we get pre-empted.) + * An Opteron HT O read timeout is 4 seconds with normal + * NB values + */ + + _IPATH_UNIT_ERROR(t, "Error! 
Reading counter 0x%x timed out\n",
+ creg);
+ return 0ULL;
+ }
+
+ /* chip counters are 32 bits wide; unsigned subtraction from the
+ * last sampled value gives the correct delta across a single wrap */
+ if (creg == cr_wordsendcnt) {
+ if (val != dd->ipath_lastsword) {
+ dd->ipath_sword += val - dd->ipath_lastsword;
+ dd->ipath_lastsword = val;
+ }
+ val64 = dd->ipath_sword;
+ } else if (creg == cr_wordrcvcnt) {
+ if (val != dd->ipath_lastrword) {
+ dd->ipath_rword += val - dd->ipath_lastrword;
+ dd->ipath_lastrword = val;
+ }
+ val64 = dd->ipath_rword;
+ } else if (creg == cr_pktsendcnt) {
+ if (val != dd->ipath_lastspkts) {
+ dd->ipath_spkts += val - dd->ipath_lastspkts;
+ dd->ipath_lastspkts = val;
+ }
+ val64 = dd->ipath_spkts;
+ } else if (creg == cr_pktrcvcnt) {
+ if (val != dd->ipath_lastrpkts) {
+ dd->ipath_rpkts += val - dd->ipath_lastrpkts;
+ dd->ipath_lastrpkts = val;
+ }
+ val64 = dd->ipath_rpkts;
+ } else
+ val64 = (uint64_t) val;
+
+ return val64;
+}
+
+/*
+ * print the delta of egrfull/hdrqfull errors for kernel ports no more
+ * than every 5 seconds. User processes are printed at close, but the
+ * kernel doesn't close, so... Separate routine so it may be called from
+ * other places someday, and so the function name is meaningful when
+ * printed by _IPATH_INFO
+ */
+
+static void ipath_qcheck(const ipath_type t)
+{
+ static uint64_t last_tot_hdrqfull;
+ size_t blen = 0;
+ ipath_devdata *dd = &devdata[t];
+ char buf[128];
+
+ *buf = 0;
+ if (dd->ipath_pd[0]->port_hdrqfull != dd->ipath_p0_hdrqfull) {
+ blen = snprintf(buf, sizeof buf, "port 0 hdrqfull %u",
+ dd->ipath_pd[0]->port_hdrqfull -
+ dd->ipath_p0_hdrqfull);
+ dd->ipath_p0_hdrqfull = dd->ipath_pd[0]->port_hdrqfull;
+ }
+ if (ipath_stats.sps_etidfull != dd->ipath_last_tidfull) {
+ blen +=
+ snprintf(buf + blen, sizeof buf - blen, "%srcvegrfull %llu",
+ blen ? ", " : "",
+ ipath_stats.sps_etidfull - dd->ipath_last_tidfull);
+ dd->ipath_last_tidfull = ipath_stats.sps_etidfull;
+ }
+
+ /*
+ * this is actually the number of hdrq full interrupts, not actual
+ * events, but at the moment that's mostly what I'm interested in.
+ * Actual count, etc. is in the counters, if needed. For production
+ * users this won't ordinarily be printed.
+ */
+
+ if ((infinipath_debug & (__IPATH_PKTDBG | __IPATH_DBG)) &&
+ ipath_stats.sps_hdrqfull != last_tot_hdrqfull) {
+ blen +=
+ snprintf(buf + blen, sizeof buf - blen,
+ "%shdrqfull %llu (all ports)", blen ? ", " : "",
+ ipath_stats.sps_hdrqfull - last_tot_hdrqfull);
+ last_tot_hdrqfull = ipath_stats.sps_hdrqfull;
+ }
+ if (blen)
+ _IPATH_DBG("%s\n", buf);
+
+ if(*dd->ipath_hdrqtailptr != dd->ipath_port0head) {
+ if(dd->ipath_lastport0rcv_cnt == ipath_stats.sps_port0pkts) {
+ _IPATH_PDBG("missing rcv interrupts?
port0 hd=%llx tl=%x; port0pkts %llx\n", + *dd->ipath_hdrqtailptr, dd->ipath_port0head,ipath_stats.sps_port0pkts); + ipath_kreceive(t); + } + dd->ipath_lastport0rcv_cnt = ipath_stats.sps_port0pkts; + } +} + +/* + * called from add_timer to get word counters from chip before they + * can overflow + */ + +static void ipath_get_faststats(unsigned long t) +{ + uint32_t val; + ipath_devdata *dd = &devdata[t]; + static unsigned cnt; + + /* + * don't access the chip while running diags, or memory diags + * can fail + */ + if (!dd->ipath_kregbase || !(dd->ipath_flags & IPATH_PRESENT) || + ipath_diags_enabled) { + /* but re-arm the timer, for diags case; won't hurt other */ + goto done; + } + + ipath_snap_cntr((ipath_type) t, cr_wordsendcnt); + ipath_snap_cntr((ipath_type) t, cr_wordrcvcnt); + ipath_snap_cntr((ipath_type) t, cr_pktsendcnt); + ipath_snap_cntr((ipath_type) t, cr_pktrcvcnt); + + ipath_qcheck(t); + + /* + * deal with repeat error suppression. Doesn't really matter if + * last error was almost a full interval ago, or just a few usecs + * ago; still won't get more than 2 per interval. We may want + * longer intervals for this eventually, could do with mod, counter + * or separate timer. Also see code in ipath_handle_errors() and + * ipath_handle_hwerrors(). + */ + + if (dd->ipath_lasterror) + dd->ipath_lasterror = 0; + if (dd->ipath_lasthwerror) + dd->ipath_lasthwerror = 0; + if ((devdata[t].ipath_maskederrs & ~devdata[t].ipath_ignorederrs) + && get_cycles() > devdata[t].ipath_unmasktime) { + char ebuf[256]; + ipath_decode_err(ebuf, sizeof ebuf, + (devdata[t].ipath_maskederrs & ~devdata[t]. + ipath_ignorederrs)); + if ((devdata[t].ipath_maskederrs & ~devdata[t]. + ipath_ignorederrs) + & ~(INFINIPATH_E_RRCVEGRFULL | INFINIPATH_E_RRCVHDRFULL)) { + _IPATH_UNIT_ERROR(t, "Re-enabling masked errors (%s)\n", + ebuf); + } else { + /* + * rcvegrfull and rcvhdrqfull are "normal", + * for some types of processes (mostly benchmarks) + * that send huge numbers of messages, while + * not processing them. So only complain about + * these at debug level. 
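+ * (In either case ipath_maskederrs is reset to the ipath_ignorederrs
+ * baseline just below, and its complement is written back to
+ * kr_errormask, re-enabling reporting of everything not ignored.)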
+ */ + _IPATH_DBG + ("Disabling frequent queue full errors (%s)\n", + ebuf); + } + devdata[t].ipath_maskederrs = devdata[t].ipath_ignorederrs; + ipath_kput_kreg(t, kr_errormask, ~devdata[t].ipath_maskederrs); + } + + if (dd->ipath_flags & IPATH_LINK_SLEEPING) { + uint64_t ibc; + _IPATH_VDBG("linkinitcmd SLEEP, move to POLL\n"); + dd->ipath_flags &= ~IPATH_LINK_SLEEPING; + ibc = dd->ipath_ibcctrl; + /* + * don't put linkinitcmd in ipath_ibcctrl, want that to + * stay a NOP + */ + ibc |= + INFINIPATH_IBCC_LINKINITCMD_POLL << + INFINIPATH_IBCC_LINKINITCMD_SHIFT; + ipath_kput_kreg(t, kr_ibcctrl, ibc); + } + + /* limit qfull messages to ~one per minute per port */ + if ((++cnt & 0x10)) { + for (val = devdata[t].ipath_cfgports - 1; ((int)val) >= 0; + val--) { + if (dd->ipath_lastegrheads[val] != ~0) + dd->ipath_lastegrheads[val] = ~0; + if (dd->ipath_lastrcvhdrqtails[val] != ~0) + dd->ipath_lastrcvhdrqtails[val] = ~0; + } + } + + if(dd->ipath_nosma_bufs) { + dd->ipath_nosma_secs += 5; + if(dd->ipath_nosma_secs >= 30) { + _IPATH_SMADBG("No SMA bufs avail %u seconds; cancelling pending sends\n", + dd->ipath_nosma_secs); + ipath_disarm_piobufs(t, dd->ipath_lastport_piobuf, + dd->ipath_piobcnt - dd->ipath_lastport_piobuf); + dd->ipath_nosma_secs = 0; /* start again, if necessary */ + } + else + _IPATH_SMADBG("No SMA bufs avail %u tries, after %u seconds\n", + dd->ipath_nosma_bufs, dd->ipath_nosma_secs); + } + +done: + mod_timer(&dd->ipath_stats_timer, jiffies + HZ * 5); +} + + +static void __devexit infinipath_remove_one(struct pci_dev *); +static int infinipath_init_one(struct pci_dev *, const struct pci_device_id *); + +const struct pci_device_id infinipath_pci_tbl[] = { + { + PCI_VENDOR_ID_PATHSCALE, PCI_DEVICE_ID_PATHSCALE_INFINIPATH2, + PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, + {0,} +}; + +MODULE_DEVICE_TABLE(pci, infinipath_pci_tbl); + +static struct pci_driver infinipath_driver = { + .name = MODNAME, + .driver.owner = THIS_MODULE, + .probe = infinipath_init_one, + .remove = __devexit_p(infinipath_remove_one), + .id_table = infinipath_pci_tbl, +}; + +#if defined (pgprot_writecombine) && defined(_PAGE_MA_WC) +int remap_area_pages(unsigned long address, unsigned long phys_addr, + unsigned long size, unsigned long flags); +#endif + +static int infinipath_init_one(struct pci_dev *pdev, + const struct pci_device_id *ent) +{ + int ret, len, j; + static int chip_idx = -1; + unsigned long addr; + uint64_t pioaddr, piolen, intconfig; + uint8_t rev; + ipath_type dev; + + /* + * XXX: Right now, we have a hardcoded array of devices. We'll + * change this in a future release, but not just yet. For the + * moment, we're limited to 4 infinipath devices per system. 
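+ * Unit numbers are assigned in probe order: chip_idx starts at -1,
+ * so the first chip probed becomes unit 0, the next unit 1, etc.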
+ */ + + dev = ++chip_idx; + + _IPATH_VDBG("initializing unit #%u\n", dev); + if ((!infinipath_cfgunits && (dev >= 1)) || + (infinipath_cfgunits && (dev >= infinipath_cfgunits)) || + (dev >= infinipath_max)) { + _IPATH_ERROR("Trying to initialize unit %u, max is %u\n", + dev, infinipath_max - 1); + return -EINVAL; + } + + devdata[dev].pci_registered = 1; + devdata[dev].ipath_unit = dev; + + if ((ret = pci_enable_device(pdev))) { + _IPATH_DBG("pci_enable unit %u failed: %x\n", dev, ret); + } + + if ((ret = pci_request_regions(pdev, MODNAME))) + _IPATH_INFO("pci_request_regions unit %u fails: %d\n", dev, + ret); + + if ((ret = pci_set_dma_mask(pdev, DMA_64BIT_MASK)) != 0) + _IPATH_INFO("pci_set_dma_mask unit %u fails: %d\n", dev, ret); + + pci_set_master(pdev); /* probably not be needed for HT */ + + addr = pci_resource_start(pdev, 0); + len = pci_resource_len(pdev, 0); + _IPATH_VDBG + ("regbase (0) %lx len %d irq %x, vend %x/%x driver_data %lx\n", + addr, len, pdev->irq, ent->vendor, ent->device, ent->driver_data); + devdata[dev].ipath_deviceid = ent->device; /* save for later use */ + devdata[dev].ipath_vendorid = ent->vendor; + for (j = 0; j < 6; j++) { + if (!pdev->resource[j].start) + continue; + _IPATH_VDBG("BAR %d start %lx, end %lx, len %lx\n", + j, pdev->resource[j].start, + pdev->resource[j].end, pci_resource_len(pdev, j)); + } + + if (!addr) { + _IPATH_UNIT_ERROR(dev, "No valid address in BAR 0!\n"); + return -ENODEV; + } + + if ((ret = pci_read_config_byte(pdev, PCI_REVISION_ID, &rev))) { + _IPATH_UNIT_ERROR(dev, + "Failed to read PCI revision ID unit %u: %d\n", + dev, ret); + return ret; /* shouldn't ever happen */ + } else + devdata[dev].ipath_pcirev = rev; + + devdata[dev].ipath_kregbase = ioremap_nocache(addr, len); +#if defined (pgprot_writecombine) && defined(_PAGE_MA_WC) + printk("Remapping pages WC\n"); + remap_area_pages((u64) devdata[dev].ipath_kregbase + 1024 * 1024, + addr + 1024 * 1024, 1024 * 1024, _PAGE_MA_WC); + /* devdata[dev].ipath_kregbase = __ioremap(addr, len, _PAGE_MA_WC); */ +#endif + + if (!devdata[dev].ipath_kregbase) { + _IPATH_DBG("Unable to map io addr %lx to kvirt, failing\n", + addr); + ret = -ENOMEM; + goto fail; + } + devdata[dev].ipath_kregend = + (uint64_t *) ((void *)devdata[dev].ipath_kregbase + len); + devdata[dev].ipath_physaddr = addr; /* used for io_remap, etc. */ + /* for user mmap */ + devdata[dev].ipath_kregvirt = phys_to_virt(addr); + _IPATH_VDBG("mapped io addr %lx to kregbase %p kregvirt %p\n", addr, + devdata[dev].ipath_kregbase, devdata[dev].ipath_kregvirt); + + /* + * set these up before registering the interrupt handler, just + * in case + */ + devdata[dev].pcidev = pdev; + pci_set_drvdata(pdev, &(devdata[dev])); + + /* + * set up our interrupt handler; SA_SHIRQ probably not needed, + * but won't hurt for now. + */ + + if (!pdev->irq) { + _IPATH_UNIT_ERROR(dev, "irq is 0, failing init\n"); + ret = -EINVAL; + goto fail; + } + if ((ret = request_irq(pdev->irq, ipath_intr, + SA_SHIRQ, MODNAME, &devdata[dev]))) { + _IPATH_UNIT_ERROR(dev, + "Couldn't setup interrupt handler, irq=%u: %d\n", + pdev->irq, ret); + goto fail; + } + + /* + * clear ipath_flags here instead of in ipath_init_chip as it is set + * by ipath_setup_htconfig. 
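+ * (ipath_setup_htconfig may set IPATH_8BIT_IN_HT0 in ipath_flags,
+ * so the clear has to happen before that call, not during chip init.)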
+ */ + devdata[dev].ipath_flags = 0; + if (ipath_setup_htconfig(pdev, &intconfig, dev)) + _IPATH_DBG + ("Failed to setup HT config, continuing anyway for now\n"); + + ret = ipath_init_chip(dev); /* do the chip-specific init */ + if (!ret) { +#ifdef CONFIG_MTRR + unsigned bits; + /* + * Set the PIO buffers to be WCCOMB, so we get HT bursts + * to the chip. Linux (possibly the hardware) requires + * it to be on a power of 2 address matching the length + * (which has to be a power of 2). For rev1, that means + * the base address, for rev2, it will be just the PIO + * buffers themselves. + */ + pioaddr = addr + devdata[dev].ipath_piobufbase; + piolen = devdata[dev].ipath_piobcnt * + round_up(devdata[dev].ipath_piosize, + devdata[dev].ipath_palign); + + for (bits = 0; !(piolen & (1ULL << bits)); bits++) ; + if (piolen != (1ULL << bits)) { + _IPATH_DBG("piolen 0x%llx not power of 2, bits=%u\n", + piolen, bits); + piolen >>= bits; + while (piolen >>= 1) + bits++; + piolen = 1ULL << (bits + 1); + _IPATH_DBG("Changed piolen to 0x%llx bits=%u\n", piolen, + bits); + } + if (pioaddr & (piolen - 1)) { + uint64_t atmp; + _IPATH_DBG + ("pioaddr %llx not on right boundary for size %llx, fixing\n", + pioaddr, piolen); + atmp = pioaddr & ~(piolen - 1); + if (atmp < addr || (atmp + piolen) > (addr + len)) { + _IPATH_UNIT_ERROR(dev, + "No way to align address/size (%llx/%llx), no WC mtrr\n", + atmp, piolen << 1); + ret = -ENODEV; + } else { + _IPATH_DBG + ("changing WC base from %llx to %llx, len from %llx to %llx\n", + pioaddr, atmp, piolen, piolen << 1); + pioaddr = atmp; + piolen <<= 1; + } + } + + if (!ret) { + int cookie; + _IPATH_VDBG + ("Setting mtrr for chip to WC (addr %llx, len=0x%llx)\n", + pioaddr, piolen); + cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 0); + if (cookie < 0) { + _IPATH_INFO + ("mtrr_add(%llx,0x%llx,WC,0) failed (%d)\n", + pioaddr, piolen, cookie); + ret = -EINVAL; + } else { + _IPATH_VDBG + ("Set mtrr for chip to WC, cookie is %d\n", + cookie); + devdata[dev].ipath_mtrr = (uint32_t) cookie; + } + } +#endif /* CONFIG_MTRR */ + } + + if (!ret && devdata[dev].ipath_kregbase && (devdata[dev].ipath_flags + & IPATH_PRESENT)) { + /* + * for the hardware, enable interrupts only after + * kr_interruptconfig is written, if we could set it up + */ + if (intconfig) { + /* interrupt address */ + ipath_kput_kreg(dev, kr_interruptconfig, intconfig); + /* enable all interrupts */ + ipath_kput_kreg(dev, kr_intmask, ~0ULL); + /* force re-interrupt of any pending interrupts. */ + ipath_kput_kreg(dev, kr_intclear, 0ULL); + /* OK, the chip is usable, marked it as initialized */ + *devdata[dev].ipath_statusp |= IPATH_STATUS_INITTED; + } else + _IPATH_UNIT_ERROR(dev, + "No interrupts enabled, couldn't setup interrupt address\n"); + } else if(ret != -EPERM) + _IPATH_INFO("Not configuring unit %u interrupts, init failed\n", + dev); + + device_create_file(&(pdev->dev), &dev_attr_status); + device_create_file(&(pdev->dev), &dev_attr_status_str); + device_create_file(&(pdev->dev), &dev_attr_lid); + device_create_file(&(pdev->dev), &dev_attr_mlid); + device_create_file(&(pdev->dev), &dev_attr_guid); + device_create_file(&(pdev->dev), &dev_attr_nguid); + device_create_file(&(pdev->dev), &dev_attr_serial); + device_create_file(&(pdev->dev), &dev_attr_unit); + + /* + * We used to cleanup here, with pci_release_regions, etc. but that + * can cause other problems if we want to run diags, etc., so instead + * defer that until driver unload. 
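+ * (The deferred release happens in infinipath_remove_one() below,
+ * which unmaps kregbase and calls pci_release_regions() and
+ * pci_disable_device().)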
+ */ + +fail: /* after we've done at least some of the pci setup */ + if(ret == -EPERM) /* disabled device, don't want module load error; + * just want to carry status through to this point */ + ret = 0; + + return ret; +} + + + +#define HT_CAPABILITY_ID 0x08 /* HT capabilities not defined in kernel */ +#define HT_INTR_DISC_CONFIG 0x80 /* HT interrupt and discovery cap */ +#define HT_INTR_REG_INDEX 2 /* intconfig requires indirect accesses */ + +/* + * setup the interruptconfig register from the HT config info. + * Also clear CRC errors in HT linkcontrol, if necessary. + * This is done only for the real hardware. It is done before + * chip address space is initted, so can't touch infinipath registers + */ + +static int ipath_setup_htconfig(struct pci_dev *pdev, uint64_t * iaddr, + const ipath_type t) +{ + uint8_t cap_type; + uint32_t int_handler_addr_lower; + uint32_t int_handler_addr_upper; + uint64_t ihandler = 0; + int i, pos, ret = 0; + + *iaddr = 0ULL; /* init to zero in case not able to configure */ + + /* + * Read the capability info to find the interrupt info, and also + * handle clearing CRC errors in linkctrl register if necessary. + * We do this early, before we ever enable errors or hardware errors, + * mostly to avoid causing the chip to enter freeze mode. + */ + if(!(pos = pci_find_capability(pdev, HT_CAPABILITY_ID))) { + _IPATH_UNIT_ERROR(t, + "Couldn't find HyperTransport capability; no interrupts\n"); + return -ENODEV; + } + do { + /* the HT capability type byte is 3 bytes after the + * capability byte. + */ + if(pci_read_config_byte(pdev, pos+3, &cap_type)) { + _IPATH_INFO + ("Couldn't read config command @ %d\n", pos); + continue; + } + if(!(cap_type & 0xE0)) { + /* bits 13-15 of command==0 is slave/primary block. + * Clear any HT CRC errors. We only bother to + * do this at load time, because it's OK if it + * happened before we were loaded (first time + * after boot/reset), but any time after that, + * it's fatal anyway. Also need to not check for + * for upper byte errors if we are in 8 bit mode, + * so figure out our width. For now, at least, + * also complain if it's 8 bit. + */ + uint8_t linkwidth = 0, linkerr, link_a_b_off, link_off; + uint16_t linkctrl = 0; + + devdata[t].ipath_ht_slave_off = pos; + /* command word, master_host bit */ + if((cap_type >> 2) & 1) /* master host || slave */ + link_a_b_off = 4; + else + link_a_b_off = 0; + _IPATH_VDBG("HT%u (Link %c) connected to processor\n", + link_a_b_off ? 1 : 0, + link_a_b_off ? 'B' : 'A'); + + link_a_b_off += pos; + + /* + * check both link control registers; clear both + * HT CRC sets if necessary. + */ + + for (i = 0; i < 2; i++) { + link_off = pos + i * 4 + 0x4; + if (pci_read_config_word + (pdev, link_off, &linkctrl)) + _IPATH_UNIT_ERROR(t, + "Couldn't read HT link control%d register\n", + i); + else if (linkctrl & (0xf << 8)) { + _IPATH_VDBG + ("Clear linkctrl%d CRC Error bits %x\n", + i, linkctrl & (0xf << 8)); + /* + * now write them back to clear + * the error. + */ + pci_write_config_byte(pdev, link_off, + linkctrl & (0xf << + 8)); + } + } + + /* + * As with HT CRC bits, same for protocol errors + * that might occur during boot. 
+ */ + + for (i = 0; i < 2; i++) { + link_off = pos + i * 4 + 0xd; + if (pci_read_config_byte + (pdev, link_off, &linkerr)) + _IPATH_INFO + ("Couldn't read linkerror%d of HT slave/primary block\n", + i); + else if (linkerr & 0xf0) { + _IPATH_VDBG + ("HT linkerr%d bits 0x%x set, clearing\n", + linkerr >> 4, i); + /* + * writing the linkerr bits that + * are set will clear them + */ + if (pci_write_config_byte + (pdev, link_off, linkerr)) + _IPATH_DBG + ("Failed write to clear HT linkerror%d\n", + i); + if (pci_read_config_byte + (pdev, link_off, &linkerr)) + _IPATH_INFO + ("Couldn't reread linkerror%d of HT slave/primary block\n", + i); + else if (linkerr & 0xf0) + _IPATH_INFO + ("HT linkerror%d bits 0x%x couldn't be cleared\n", + i, linkerr >> 4); + } + } + + /* + * this is just for our link to the host, not + * devices connected through tunnel. + */ + + if (pci_read_config_byte + (pdev, link_a_b_off + 7, &linkwidth)) + _IPATH_UNIT_ERROR(t, + "Couldn't read HT link width config register\n"); + else { + uint32_t width; + switch (linkwidth & 7) { + case 5: + width = 4; + break; + case 4: + width = 2; + break; + case 3: + width = 32; + break; + case 1: + width = 16; + break; + case 0: + default: /* if wrong, assume 8 bit */ + width = 8; + break; + } + ((ipath_devdata *) pci_get_drvdata(pdev))-> + ipath_htwidth = width; + + if (linkwidth != 0x11) { + _IPATH_UNIT_ERROR(t, + "Not configured for 16 bit HT (%x)\n", + linkwidth); + if (!(linkwidth & 0xf)) { + _IPATH_DBG + ("Will ignore HT lane1 errors\n"); + ((ipath_devdata *) + pci_get_drvdata(pdev))-> + ipath_flags |= IPATH_8BIT_IN_HT0; + } + } + } + + /* + * this is just for our link to the host, not + * devices connected through tunnel. + */ + + if (pci_read_config_byte + (pdev, link_a_b_off + 0xd, &linkwidth)) + _IPATH_UNIT_ERROR(t, + "Couldn't read HT link frequency config register\n"); + else { + uint32_t speed; + switch (linkwidth & 0xf) { + case 6: + speed = 1000; + break; + case 5: + speed = 800; + break; + case 4: + speed = 600; + break; + case 3: + speed = 500; + break; + case 2: + speed = 400; + break; + case 1: + speed = 300; + break; + default: + /* + * assume reserved and + * vendor-specific are 200... + */ + case 0: + speed = 200; + break; + } + ((ipath_devdata *) pci_get_drvdata(pdev))-> + ipath_htspeed = speed; + } + } else if (cap_type == HT_INTR_DISC_CONFIG) { + /* use indirection register to get the intr handler */ + uint32_t intvec; + pci_write_config_byte(pdev, pos + HT_INTR_REG_INDEX, + 0x10); + pci_read_config_dword(pdev, pos + 4, + &int_handler_addr_lower); + + pci_write_config_byte(pdev, pos + HT_INTR_REG_INDEX, + 0x11); + pci_read_config_dword(pdev, pos + 4, + &int_handler_addr_upper); + + ihandler = (uint64_t) int_handler_addr_lower | + ((uint64_t) int_handler_addr_upper << 32); + + /* + * I'm unable to find an exported API to get + * the the actual vector, either from the PCI + * infrastructure, or from the APIC + * infrastructure. This heuristic seems to be + * valid for Opteron on 2.6.x kernels, for irq's > 2. + * It may not be universally true... Bug 2338 + * + * Oh well; the heuristic doesn't work for the + * AMI/Iwill BIOS... But the good news is, + * somewhere by 2.6.9, when CONFIG_PCI_MSI is + * enabled, the irq field actually turned into + * the vector number + * We therefore require that MSI be enabled... 
+ */ + + intvec = pdev->irq; + /* + * clear any bits there; normally not set but + * we'll overload this for some debug purposes + * (setting the HTC debug register value from + * software, rather than GPIOs), so it might be + * set on a driver reload. + */ + + ihandler &= ~0xff0000; + /* x86 vector goes in intrinfo[23:16] */ + ihandler |= intvec << 16; + _IPATH_VDBG + ("ihandler lower %x, upper %x, intvec %x, interruptconfig %llx\n", + int_handler_addr_lower, int_handler_addr_upper, + intvec, ihandler); + + /* return to caller, can't program yet. */ + *iaddr = ihandler; + /* + * no break, have to be sure we find link control + * stuff also + */ + } + + } while((pos=pci_find_next_capability(pdev, pos, HT_CAPABILITY_ID))); + + if (!ihandler) { + _IPATH_UNIT_ERROR(t, + "Couldn't find interrupt handler in config space\n"); + ret = -ENODEV; + } + return ret; +} + +/* + * get the GUID from the i2c device + * When we add the multi-chip support, we will probably have to add + * the ability to use the number of guids field, and get the guid from + * the first chip's flash, to use for all of them. + */ + +static void ipath_get_guid(const ipath_type t) +{ + void *buf; + struct ipath_flash *ifp; + uint64_t guid; + int len; + uint8_t csum, *bguid; + + if (t && devdata[0].ipath_nguid > 1 && t <= devdata[0].ipath_nguid) { + uint8_t oguid; + devdata[t].ipath_guid = devdata[0].ipath_guid; + bguid = (uint8_t *) & devdata[t].ipath_guid; + + oguid = bguid[7]; + bguid[7] += t; + if (oguid > bguid[7]) { + if (bguid[6] == 0xff) { + if (bguid[5] == 0xff) { + _IPATH_UNIT_ERROR(t, + "Can't set %s GUID from base GUID, wraps to OUI!\n", + ipath_get_unit_name + (t)); + devdata[t].ipath_guid = 0; + return; + } + bguid[5]++; + } + bguid[6]++; + } + devdata[t].ipath_nguid = 1; + + _IPATH_DBG + ("nguid %u, so adding %u to device 0 guid, for %llx (big-endian)\n", + devdata[0].ipath_nguid, t, devdata[t].ipath_guid); + return; + } + + len = offsetof(struct ipath_flash, if_future); + if (!(buf = vmalloc(len))) { + _IPATH_UNIT_ERROR(t, + "Couldn't allocate memory to read %u bytes from eeprom for GUID\n", + len); + return; + } + + if (ipath_eeprom_read(t, 0, buf, len)) { + _IPATH_UNIT_ERROR(t, "Failed reading GUID from eeprom\n"); + goto done; + } + ifp = (struct ipath_flash *)buf; + + csum = ipath_flash_csum(ifp, 0); + if (csum != ifp->if_csum) { + _IPATH_INFO("Bad I2C flash checksum: 0x%x, not 0x%x\n", + csum, ifp->if_csum); + goto done; + } + if (*(uint64_t *) ifp->if_guid == 0ULL + || *(uint64_t *) ifp->if_guid == ~0ULL) { + _IPATH_UNIT_ERROR(t, "Invalid GUID %llx from flash; ignoring\n", + *(uint64_t *) ifp->if_guid); + goto done; /* don't allow GUID if all 0 or all 1's */ + } + + /* complain, but allow it */ + if (*(uint64_t *) ifp->if_guid == 0x100007511000000) + _IPATH_INFO + ("Warning, GUID %llx is default, probabaly not correct!\n", + *(uint64_t *) ifp->if_guid); + + bguid = ifp->if_guid; + if(!bguid[0] && !bguid[1] && !bguid[2]) { + /* original incorrect GUID format in flash; fix in core copy, by + * shifting up 2 octets; don't need to change top octet, since both + * it and shifted are 0.. 
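+ * e.g. flash bytes 00:00:00:11:75:xx:yy:zz become
+ * 00:11:75:00:00:xx:yy:zz, putting the OUI back in the top octets.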
*/ + bguid[1] = bguid[3]; + bguid[2] = bguid[4]; + bguid[3] = bguid[4] = 0; + guid = *(uint64_t *)ifp->if_guid; + _IPATH_VDBG("Old GUID format in flash, top 3 zero, shifting 2 octets\n"); + } + else + guid = *(uint64_t *)ifp->if_guid; + devdata[t].ipath_guid = guid; + devdata[t].ipath_nguid = ifp->if_numguid; + memcpy(devdata[t].ipath_serial, ifp->if_serial, sizeof(ifp->if_serial)); + _IPATH_VDBG("Initted GUID to %llx (big-endian) from i2c flash\n", + devdata[t].ipath_guid); + +done: + vfree(buf); +} + +static void __devexit infinipath_remove_one(struct pci_dev *pdev) +{ + ipath_devdata *dd; + + _IPATH_VDBG("pci_release, pdev=%p\n", pdev); + if (pdev) { + device_remove_file(&(pdev->dev), &dev_attr_status); + device_remove_file(&(pdev->dev), &dev_attr_status_str); + device_remove_file(&(pdev->dev), &dev_attr_lid); + device_remove_file(&(pdev->dev), &dev_attr_mlid); + device_remove_file(&(pdev->dev), &dev_attr_guid); + device_remove_file(&(pdev->dev), &dev_attr_nguid); + device_remove_file(&(pdev->dev), &dev_attr_serial); + device_remove_file(&(pdev->dev), &dev_attr_unit); + dd = pci_get_drvdata(pdev); + pci_set_drvdata(pdev, NULL); + _IPATH_VDBG + ("Releasing pci memory regions, devdata %p, unit %u\n", dd, + (uint32_t) (dd - devdata)); + if (dd && dd->ipath_kregbase) { + _IPATH_VDBG("Unmapping kregbase %p\n", + dd->ipath_kregbase); + iounmap((void *)dd->ipath_kregbase); + dd->ipath_kregbase = NULL; + } + pci_release_regions(pdev); + _IPATH_VDBG("calling pci_disable_device\n"); + pci_disable_device(pdev); + } +} + +int ipath_open(struct inode *, struct file *); +static int ipath_opensma(struct inode *, struct file *); +int ipath_close(struct inode *, struct file *); +static unsigned int ipath_poll(struct file *, struct poll_table_struct *); +long ipath_ioctl(struct file *, unsigned int, unsigned long); +static loff_t ipath_llseek(struct file *, loff_t, int); +static int ipath_mmap(struct file *, struct vm_area_struct *); + +static struct file_operations ipath_fops = { + .owner = THIS_MODULE, + .open = ipath_open, + .release = ipath_close, + .poll = ipath_poll, + /* + * all of ours are completely compatible and don't require the + * kernel lock + */ + .compat_ioctl = ipath_ioctl, + /* we don't need kernel lock for our ioctls */ + .unlocked_ioctl = ipath_ioctl, + .llseek = ipath_llseek, + .mmap = ipath_mmap +}; + +static DECLARE_MUTEX(ipath_mutex); /* general driver use */ +spinlock_t ipath_pioavail_lock; + +/* + * For now, at least (and probably forever), we don't require root + * or equivalent permissions to use the device. + */ + +int ipath_open(struct inode *in, struct file *fp) +{ + int ret = 0, minor, i, prefunit=-1, devmax; + int maxofallports, npresent = 0, notup = 0; + ipath_type ndev; + + down(&ipath_mutex); + + minor = iminor(in); + _IPATH_VDBG("open on dev %lx (minor %d)\n", (long)in->i_rdev, minor); + + /* This code is present to allow a knowledgeable person to specify the + * layout of processes to processors before opening this driver, and + * then we'll assign the process to the "closest" HT-400 to + * that processor * (we assume reasonable connectivity, for now). + * This code assumes that if affinity has been set before this + * point, that at most one cpu is set; for now this is reasonable. + * I check for both cpus_empty() and cpus_full(), in case some + * kernel variant sets none of the bits when no affinity is set. + * 2.6.11 and 12 kernels have all present cpus set. + * Some day we'll have to fix it up further to handle a cpu subset. 
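+ * For example, with 8 online cpus and 2 chips present, a process
+ * bound to cpu 5 is steered to prefunit = 5 / (8 / 2) = 1.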
+ * This algorithm fails for two HT-400's connected in tunnel fashion. + * Eventually this needs real topology information. + * There may be some issues with dual core numbering as well. This + * needs more work prior to release. + */ + if(minor != IPATH_SMA +#ifdef IPATH_DIAG + && minor != IPATH_DIAG +#endif + && minor != IPATH_CTRL + && !cpus_empty(current->cpus_allowed) + && !cpus_full(current->cpus_allowed)) { + int ncpus = num_online_cpus(), curcpu = -1; + for(i=0; icpus_allowed)) { + _IPATH_PRDBG("%s[%u] affinity set for cpu %d\n", + current->comm, current->pid, i); + curcpu = i; + } + if(curcpu != -1) { + for(ndev = 0; ndev < infinipath_max; ndev++) + if((devdata[ndev].ipath_flags & IPATH_PRESENT) + && devdata[ndev].ipath_kregbase) + npresent++; + if(npresent) { + prefunit = curcpu/(ncpus/npresent); + _IPATH_DBG("%s[%u] %d chips, %d cpus, " + "%d cpus/chip, select unit %d\n", + current->comm, current->pid, + npresent, ncpus, ncpus/npresent, + prefunit); + } + } + } + + if (minor == IPATH_SMA) { + ret = ipath_opensma(in, fp); + /* for ipath_ioctl */ + fp->private_data = (void *)(unsigned long)minor; + goto done; + } +#ifdef IPATH_DIAG + else if (minor == IPATH_DIAG) { + ret = ipath_opendiag(in, fp); + /* for ipath_ioctl */ + fp->private_data = (void *)(unsigned long)minor; + goto done; + } +#endif + else if (minor == IPATH_CTRL) { + /* for ipath_ioctl */ + fp->private_data = (void *)(unsigned long)minor; + ret = 0; + goto done; + } + else if (minor) { + /* + * minor number 0 is used for all chips, we choose available + * chip ourselves, it isn't based on what they open. + */ + + _IPATH_DBG("open on invalid minor %u\n", minor); + ret = -ENXIO; + goto done; + } + + /* + * for now, we use all ports on one, then all ports on the + * next, etc. Eventually we want to tweak this to be cpu/chip + * topology aware, and round-robin across chips that are + * configured and connected, placing processes on the closest + * available processor that isn't already over-allocated. + * multi-HT400 topology could be better handled + */ + + npresent = maxofallports = 0; + for (ndev = 0; ndev < infinipath_max; ndev++) { + if (!(devdata[ndev].ipath_flags & IPATH_PRESENT) || + !devdata[ndev].ipath_kregbase) + continue; + npresent++; + if ((devdata[ndev]. + ipath_flags & (IPATH_LINKDOWN | IPATH_LINKUNK))) { + _IPATH_VDBG("unit %u present, but link not ready\n", + ndev); + notup++; + continue; + } else if (!devdata[ndev].ipath_lid) { + _IPATH_VDBG + ("unit %u present, but LID not assigned, down\n", + ndev); + notup++; + continue; + } + if (devdata[ndev].ipath_cfgports > maxofallports) + maxofallports = devdata[ndev].ipath_cfgports; + } + + /* + * user ports start at 1, kernel port is 0 + * For now, we do round-robin access across all chips + */ + + devmax = prefunit!=-1 ? prefunit+1 : infinipath_max; +recheck: + for (i = 1; i < maxofallports; i++) { + for (ndev = prefunit!=-1?prefunit:0; ndev < devmax; ndev++) { + if (!(devdata[ndev].ipath_flags & IPATH_PRESENT) || + !devdata[ndev].ipath_kregbase + || !devdata[ndev].ipath_lid + || (devdata[ndev]. + ipath_flags & (IPATH_LINKDOWN | IPATH_LINKUNK))) + break; /* can't use this chip */ + if (i >= devdata[ndev].ipath_cfgports) + break; /* max'ed out on users of this chip */ + if (!devdata[ndev].ipath_pd[i]) { + void *p, *ptmp; + p = kmalloc(sizeof(ipath_portdata), GFP_KERNEL); + + /* + * allocate memory for use in + * ipath_tid_update() just once at open, + * not per call. 
Reduces cost of expected + * send setup + */ + + ptmp = + kmalloc(devdata[ndev].ipath_rcvtidcnt * + sizeof(uint16_t) + + + devdata[ndev].ipath_rcvtidcnt * + sizeof(struct page **), GFP_KERNEL); + if (!p || !ptmp) { + _IPATH_UNIT_ERROR(ndev, + "Unable to allocate portdata memory, failing open\n"); + ret = -ENOMEM; + kfree(p); + kfree(ptmp); + goto done; + } + memset(p, 0, sizeof(ipath_portdata)); + devdata[ndev].ipath_pd[i] = p; + devdata[ndev].ipath_pd[i]->port_port = i; + devdata[ndev].ipath_pd[i]->port_unit = ndev; + devdata[ndev].ipath_pd[i]->port_tid_pg_list = + ptmp; + init_waitqueue_head(&devdata[ndev].ipath_pd[i]-> + port_wait); + } + if (!devdata[ndev].ipath_pd[i]->port_cnt) { + devdata[ndev].ipath_pd[i]->port_cnt = 1; + fp->private_data = + (void *)devdata[ndev].ipath_pd[i]; + _IPATH_PRDBG("%s[%u] opened unit:port %u:%u\n", + current->comm, current->pid, ndev, + i); + devdata[ndev].ipath_pd[i]->port_pid = + current->pid; + strncpy(devdata[ndev].ipath_pd[i]->port_comm, + current->comm, + sizeof(devdata[ndev].ipath_pd[i]-> + port_comm)); + ipath_stats.sps_ports++; + goto done; + } + } + } + + if (npresent) { + if (notup) { + ret = -ENETDOWN; + _IPATH_DBG + ("No ports available (none initialized and ready)\n"); + } else { + if(prefunit > 0) { /* if we started above unit 0, retry from 0 */ + _IPATH_PRDBG("%s[%u] no ports on prefunit %d, clear and re-check\n", + current->comm, current->pid, prefunit); + devmax = infinipath_max; + prefunit = -1; + goto recheck; + } + ret = -EBUSY; + _IPATH_DBG("No ports available\n"); + } + } else { + ret = -ENXIO; + _IPATH_DBG("No boards found\n"); + } + +done: + up(&ipath_mutex); + return ret; +} + +static int ipath_opensma(struct inode *in, struct file *fp) +{ + ipath_type s; + + if (ipath_sma_alive) { + _IPATH_DBG("SMA already running (pid %u), failing\n", + ipath_sma_alive); + return -EBUSY; + } + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; /* all SMA functions are root-only */ + + for (s = 0; s < infinipath_max; s++) { + /* we need at least one infinipath device to be initialized. */ + if (devdata[s].ipath_flags & IPATH_INITTED) { + ipath_sma_alive = current->pid; + *devdata[s].ipath_statusp |= IPATH_STATUS_SMA; + *devdata[s].ipath_statusp &= ~IPATH_STATUS_OIB_SMA; + } + } + if (ipath_sma_alive) { + _IPATH_SMADBG + ("SMA device now open, SMA active as PID %u\n", + ipath_sma_alive); + return 0; + } + _IPATH_DBG("No hardware yet found and initted, failing\n"); + return -ENODEV; +} + + +#ifdef IPATH_DIAG +static int ipath_opendiag(struct inode *in, struct file *fp) +{ + ipath_type s; + + if (ipath_diag_alive) { + _IPATH_DBG("Diags already running (pid %u), failing\n", + ipath_diag_alive); + return -EBUSY; + } + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; /* all diags functions are root-only */ + + for (s = 0; s < infinipath_max; s++) + /* + * we need at least one infinipath device to be present + * (don't use INITTED, because we want to be able to open + * even if device is in freeze mode, which cleared INITTED. + * There is s small amount of risk to this, which is + * why we also verify kregbase is set. + */ + + if ((devdata[s].ipath_flags & IPATH_PRESENT) + && devdata[s].ipath_kregbase) { + ipath_diag_alive = current->pid; + _IPATH_DBG("diag device now open, active as PID %u\n", + ipath_diag_alive); + return 0; + } + _IPATH_DBG("No hardware yet found and initted, failing diags\n"); + return -ENODEV; +} +#endif + +/* + * clear all TID entries for a port, expected and eager. + * Used from ipath_close(), and at chip initialization. 
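+ * Note that entries are not simply zeroed; the buffer-size field is
+ * forced to the maximum as a workaround for chip errata bug 7358
+ * (see the comment in the function body).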
+ */ + +static void ipath_clear_tids(const ipath_type t, unsigned port) +{ + volatile uint64_t *tidbase; + int i; + ipath_devdata *dd; + uint64_t tidval; + dd = &devdata[t]; + + if (!dd->ipath_kregbase) + return; + + /* + * chip errata bug 7358, try to work around it by marking invalid + * tids as having max length + */ + + tidval = + (~0ULL & INFINIPATH_RT_BUFSIZE_MASK) << INFINIPATH_RT_BUFSIZE_SHIFT; + + /* + * need to invalidate all of the expected TID entries for this + * port, so we don't have valid entries that might somehow get + * used (early in next use of this port, or through some bug) + * We don't bother with the eager, because they are initialized + * each time before receives are enabled; expected aren't + */ + + tidbase = (volatile uint64_t *)((char *)(dd->ipath_kregbase) + + dd->ipath_rcvtidbase + + port * dd->ipath_rcvtidcnt * + sizeof(*tidbase)); + _IPATH_VDBG("Invalidate expected TIDs for port %u, tidbase=%p\n", port, + tidbase); + for (i = 0; i < dd->ipath_rcvtidcnt; i++) + ipath_kput_memq(t, &tidbase[i], tidval); + yield(); /* don't hog the cpu */ + + /* zero the eager TID entries */ + tidbase = (volatile uint64_t *)((char *)(dd->ipath_kregbase) + + dd->ipath_rcvegrbase + + port * dd->ipath_rcvegrcnt * + sizeof(*tidbase)); + + for (i = 0; i < dd->ipath_rcvegrcnt; i++) + ipath_kput_memq(t, &tidbase[i], tidval); + yield(); /* don't hog the cpu */ +} + +int ipath_close(struct inode *in, struct file *fp) +{ + int ret = 0; + ipath_portdata *pd; + + _IPATH_VDBG("close on dev %lx, private data %p\n", (long)in->i_rdev, + fp->private_data); + + down(&ipath_mutex); + if (iminor(in) == IPATH_SMA) { + ipath_type s; + + ipath_sma_alive = 0; + _IPATH_SMADBG("Closing SMA device\n"); + for (s = 0; s < infinipath_max; s++) { + if (!(devdata[s].ipath_flags & IPATH_INITTED)) + continue; + *devdata[s].ipath_statusp &= ~IPATH_STATUS_SMA; + if (devdata[s].verbs_layer.l_flags & + IPATH_VERBS_KERNEL_SMA) + *devdata[s].ipath_statusp |= + IPATH_STATUS_OIB_SMA; + } + } +#ifdef IPATH_DIAG + else if (iminor(in) == IPATH_DIAG) { + ipath_diag_alive = 0; + _IPATH_DBG("Closing DIAG device\n"); + } +#endif + else if (fp->private_data && 255UL < (unsigned long)fp->private_data) { + ipath_type t; + unsigned port; + ipath_devdata *dd; + + pd = (ipath_portdata *) fp->private_data; + port = pd->port_port; + fp->private_data = NULL; + t = pd->port_unit; + if (t > infinipath_max) { + _IPATH_ERROR + ("closing, fp %p, pd %p, but unit %x not valid!\n", + fp, pd, t); + goto done; + } + dd = &devdata[t]; + + if (pd->port_hdrqfull) { + _IPATH_PRDBG + ("%s[%u] had %u rcvhdrqfull errors during run\n", + pd->port_comm, pd->port_pid, pd->port_hdrqfull); + pd->port_hdrqfull = 0; + } + + if (pd->port_rcvwait_to || pd->port_piowait_to + || pd->port_rcvnowait || pd->port_pionowait) { + _IPATH_VDBG + ("port%u, %u rcv, %u pio wait timeo; %u rcv %u, pio already\n", + pd->port_port, pd->port_rcvwait_to, + pd->port_piowait_to, pd->port_rcvnowait, + pd->port_pionowait); + pd->port_rcvwait_to = pd->port_piowait_to = + pd->port_rcvnowait = pd->port_pionowait = 0; + } + if (pd->port_flag) { + _IPATH_DBG("port %u port_flag still set to 0x%x\n", + pd->port_port, pd->port_flag); + pd->port_flag = 0; + } + + if (devdata[t].ipath_kregbase) { + if (pd->port_rcvhdrtail_uaddr) { + pd->port_rcvhdrtail_uaddr = 0; + pd->port_rcvhdrtail_kvaddr = NULL; + ipath_munlock(1, &pd->port_rcvhdrtail_pagep); + pd->port_rcvhdrtail_pagep = NULL; + ipath_stats.sps_pageunlocks++; + } + ipath_kput_kreg_port(t, kr_rcvhdrtailaddr, port, 0ULL); + 
ipath_kput_kreg_port(pd->port_unit, kr_rcvhdraddr, + pd->port_port, 0); + + /* clean up the pkeys for this port user */ + ipath_clean_partkey(pd, dd); + + if (port < dd->ipath_cfgports) { + int i = dd->ipath_pbufsport * (port - 1); + ipath_disarm_piobufs(t, i, dd->ipath_pbufsport); + + /* atomically clear receive enable port. */ + atomic_clear_mask(1U << + (INFINIPATH_R_PORTENABLE_SHIFT + + port), + &devdata[t].ipath_rcvctrl); + ipath_kput_kreg(t, kr_rcvctrl, + devdata[t].ipath_rcvctrl); + + if (dd->ipath_pageshadow) { + /* + * unlock any expected TID + * entries port still had in use + */ + int port_tidbase = + pd->port_port * dd->ipath_rcvtidcnt; + int i, cnt = 0, maxtid = + port_tidbase + dd->ipath_rcvtidcnt; + + _IPATH_VDBG + ("Port %u unlocking any locked expTID pages\n", + pd->port_port); + for (i = port_tidbase; i < maxtid; i++) { + if (dd->ipath_pageshadow[i]) { + ipath_munlock(1, + &dd-> + ipath_pageshadow + [i]); + dd->ipath_pageshadow[i] + = NULL; + cnt++; + ipath_stats. + sps_pageunlocks++; + } + } + if (cnt) + _IPATH_VDBG + ("Port %u had %u expTID entries locked\n", + pd->port_port, cnt); + if (ipath_stats.sps_pagelocks + || ipath_stats.sps_pageunlocks) + _IPATH_VDBG + ("%llu pages locked, %llu unlocked with" + " ipath_m{un}lock\n", + ipath_stats.sps_pagelocks, + ipath_stats. + sps_pageunlocks); + } + ipath_stats.sps_ports--; + _IPATH_PRDBG("%s[%u] closed port %u:%u\n", + pd->port_comm, pd->port_pid, t, + port); + } + } + + pd->port_cnt = 0; + pd->port_pid = 0; + + ipath_clear_tids(t, pd->port_port); + + ipath_free_pddata(dd, pd->port_port, 0); + } + +done: + up(&ipath_mutex); + + return ret; +} + +/* + * cancel a range of PIO buffers, used when they might be armed, but + * not triggered. Used at init to ensure buffer state, and also user + * process close, in case it died while writing to a PIO buffer + */ + +static void ipath_disarm_piobufs(const ipath_type t, unsigned first, + unsigned cnt) +{ + unsigned i, last = first + cnt; + uint64_t sendctrl; + for (i = first; i < last; i++) { + sendctrl = devdata[t].ipath_sendctrl | INFINIPATH_S_DISARM | + (i << INFINIPATH_S_DISARMPIOBUF_SHIFT); + ipath_kput_kreg(t, kr_sendctrl, sendctrl); + } +} + +static void ipath_clean_partkey(ipath_portdata * pd, ipath_devdata * dd) +{ + int i, j, pchanged = 0; + uint64_t oldpkey; + + /* for debugging only */ + oldpkey = + (uint64_t) dd->ipath_pkeys[0] | ((uint64_t) dd-> + ipath_pkeys[1] << 16) + | ((uint64_t) dd->ipath_pkeys[2] << 32) + | ((uint64_t) dd->ipath_pkeys[3] << 48); + + for (i = 0; i < (sizeof(pd->port_pkeys) / sizeof(pd->port_pkeys[0])); + i++) { + if (!pd->port_pkeys[i]) + continue; + _IPATH_VDBG("look for key[%d] %hx in pkeys\n", i, + pd->port_pkeys[i]); + for (j = 0; + j < (sizeof(dd->ipath_pkeys) / sizeof(dd->ipath_pkeys[0])); + j++) { + /* check for match independent of the global bit */ + if ((dd->ipath_pkeys[j] & 0x7fff) == + (pd->port_pkeys[i] & 0x7fff)) { + if (atomic_dec_and_test(&dd->ipath_pkeyrefs[j])) { + _IPATH_VDBG + ("p%u clear key %x matches #%d\n", + pd->port_port, pd->port_pkeys[i], + j); + ipath_stats.sps_pkeys[j] = + dd->ipath_pkeys[j] = 0; + pchanged++; + } else + _IPATH_VDBG + ("p%u key %x matches #%d, but ref still %d\n", + pd->port_port, pd->port_pkeys[i], + j, + atomic_read(&dd-> + ipath_pkeyrefs[j])); + break; + } + } + pd->port_pkeys[i] = 0; + } + if (pchanged) { + uint64_t pkey; + pkey = + (uint64_t) dd->ipath_pkeys[0] | ((uint64_t) dd-> + ipath_pkeys[1] << 16) + | ((uint64_t) dd->ipath_pkeys[2] << 32) + | ((uint64_t) dd->ipath_pkeys[3] << 48); + 
_IPATH_VDBG("p%u old pkey reg %llx, new pkey reg %llx\n", + pd->port_port, oldpkey, pkey); + ipath_kput_kreg(pd->port_unit, kr_partitionkey, pkey); + } +} + +static unsigned int ipath_poll(struct file *fp, struct poll_table_struct *pt) +{ + int ret; + ipath_portdata *pd; + + pd = port_fp(fp); + /* nothing for select/poll in this driver, at least for now */ + ret = 0; + + return ret; +} + +/* + * wait up to msecs milliseconds for IB link state change to occur + * for now, take the easy polling route. Currently used only by + * the SMA ioctls. Returns 0 if state reached, otherwise -ETIMEDOUT + * state can have multiple states set, for any of several transitions. + */ + +int ipath_wait_linkstate(const ipath_type t, uint32_t state, int msecs) +{ + devdata[t].ipath_sma_state_wanted = state; + wait_event_interruptible_timeout(ipath_sma_state_wait, + (devdata[t].ipath_flags & state), + msecs_to_jiffies(msecs)); + devdata[t].ipath_sma_state_wanted = 0; + + if (!(devdata[t].ipath_flags & state)) + _IPATH_DBG + ("Didn't reach linkstate %s within %u ms (ibcc %llx %s)\n", + /* test INIT ahead of DOWN, both can be set */ + (state & IPATH_LINKINIT) ? "INIT" : + ((state & IPATH_LINKDOWN) ? "DOWN" : + ((state & IPATH_LINKARMED) ? "ARM" : "ACTIVE")), + msecs, ipath_kget_kreg64(t, kr_ibcctrl), + ipath_ibcstatus_str[ipath_kget_kreg64(t, kr_ibcstatus) & + 0xf]); + return (devdata[t].ipath_flags & state) ? 0 : -ETIMEDOUT; +} + +/* unit number is already validated in ipath_ioctl() */ +static int ipath_kset_lid(uint32_t arg) +{ + unsigned unit = (arg >> 16) & 0xffff; + + if (unit >= infinipath_max + || !(devdata[unit].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", unit); + return -ENODEV; + } + + arg &= 0xffff; + _IPATH_SMADBG("Unit %u setting lid to 0x%x, was 0x%x\n", unit, arg, + devdata[unit].ipath_lid); + ipath_set_sps_lid(unit, arg); + return 0; +} + +static int ipath_kset_mlid(uint32_t arg) +{ + unsigned unit = (arg >> 16) & 0xffff; + + if (unit >= infinipath_max + || !(devdata[unit].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", unit); + return -ENODEV; + } + + arg &= 0xffff; + _IPATH_SMADBG("Unit %u setting mlid to 0x%x, was 0x%x\n", unit, arg, + devdata[unit].ipath_mlid); + ipath_stats.sps_mlid[unit] = devdata[unit].ipath_mlid = arg; + if (devdata[unit].ipath_layer.l_intr) + devdata[unit].ipath_layer.l_intr(unit, IPATH_LAYER_INT_BCAST); + return 0; +} + +/* unit number is in incoming, overwritten on return with data */ + +static int ipath_get_devstatus(uint64_t * a) +{ + int ret; + uint64_t unit64; + uint32_t unit; + uint64_t devstatus; + + if ((ret = copy_from_user(&unit64, a, sizeof unit64))) { + _IPATH_DBG("Failed to copy in unit: %d\n", ret); + return ret; + } + unit = unit64; + if (unit >= infinipath_max + || !(devdata[unit].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", unit); + return -ENODEV; + } + + devstatus = *devdata[unit].ipath_statusp; + + if ((ret = copy_to_user(a, &devstatus, sizeof devstatus))) + _IPATH_DBG("Failed to copy out device status: %d\n", ret); + return ret; +} + +/* unit number is in incoming, overwritten on return with data */ + +static int ipath_get_mlid(uint32_t * a) +{ + int ret; + uint32_t unit; + uint32_t mlid; + + if ((ret = copy_from_user(&unit, a, sizeof unit))) { + _IPATH_DBG("Failed to copy in mlid: %d\n", ret); + return ret; + } + if (unit >= infinipath_max + || !(devdata[unit].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", unit); + return -ENODEV; + } + + mlid = devdata[unit].ipath_mlid; 
+ + if ((ret = copy_to_user(a, &mlid, sizeof mlid))) + _IPATH_DBG("Failed to copy out MLID: %d\n", ret); + return ret; +} + +static int ipath_kset_guid(struct ipath_setguid *a) +{ + struct ipath_setguid setguid; + int ret; + + if ((ret = copy_from_user(&setguid, a, sizeof setguid))) { + _IPATH_DBG("Failed to copy in guid info: %d\n", ret); + return ret; + } + if (setguid.sunit >= infinipath_max || + !(devdata[setguid.sunit].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %llu\n", setguid.sunit); + return -ENODEV; + } + if (setguid.sguid == 0ULL || setguid.sguid == ~0ULL) { + /* + * use INFO, not DBG, because ipath_mux doesn't yet + * complain about errors on this + */ + + _IPATH_INFO("Ignoring attempt to set invalid GUID %llx\n", + setguid.sguid); + return -EINVAL; + } + devdata[setguid.sunit].ipath_guid = setguid.sguid; + devdata[setguid.sunit].ipath_nguid = 1; + _IPATH_DBG("SMA set hardware GUID unit %llu to %llx (network order)\n", + setguid.sunit, devdata[setguid.sunit].ipath_guid); + return 0; +} + +/* + * receive an IB packet with QP 0 or 1. For now, we have no timeout implemented + * We put the actual received count into the iov on return, and the unit we + * received from goes into the lower 16 bits of sps_flags. + * This receives from all/any of the active chips, and we currently do not + * allow specifying just one (we could, by filling in unit in the library + * before the syscall, and checking here). + */ + +static int ipath_rcvsma_pkt(struct ipath_sendpkt * p) +{ + struct ipath_sendpkt rpkt; + int i, any, ret; + unsigned long flags; + + if ((ret = copy_from_user(&rpkt, p, sizeof rpkt))) { + _IPATH_DBG("Failed to copy in pkt struct (%d)\n", ret); + return ret; + } + if (!ipath_sma_data_spare) { + _IPATH_DBG("can't do receive, sma not initialized\n"); + return -ENETDOWN; + } + + for (any = i = 0; i < infinipath_max; i++) + if (devdata[i].ipath_flags & IPATH_INITTED) + any++; + if (!any) { /* no hardware, freeze, etc. */ + _IPATH_SMADBG("Didn't find any initialized and usable chips\n"); + return -ENODEV; + } + + wait_event_interruptible(ipath_sma_wait, + ipath_sma_data[ipath_sma_first].len); + + spin_lock_irqsave(&ipath_sma_lock, flags); + if (ipath_sma_data[ipath_sma_first].len) { + int len; + uint32_t slen; + uint8_t *sdata; + struct _ipath_sma_rpkt *smpkt = + &ipath_sma_data[ipath_sma_first]; + + /* + * we swap out the buffer we are going to use with the + * spare buffer and set spare to that buffer. This code + * is the only code that ever manipulates spare, other + * than the initialization code. This code should never + * be entered by more than one process at a time, and + * if it is, the user code doing so deserves what it gets; + * it won't break anything in the driver by doing so. + * We do it this way to avoid holding a lock across the + * copy_to_user, which could fault, or delay a long time + * while paging occurs; ditto for printks + */ + + slen = smpkt->len; + sdata = smpkt->buf; + rpkt.sps_flags = smpkt->unit; + smpkt->buf = ipath_sma_data_spare; + ipath_sma_data_spare = sdata; + smpkt->len = 0; /* it's available again */ + if (++ipath_sma_first >= IPATH_NUM_SMAPKTS) + ipath_sma_first = 0; + spin_unlock_irqrestore(&ipath_sma_lock, flags); + + len = min((uint32_t) rpkt.sps_iov[0].iov_len, slen); + ret = + copy_to_user((void *)rpkt.sps_iov[0].iov_base, sdata, len); + _IPATH_VDBG + ("SMA packet (index=%d), len %d (actual %d) buf %p, ubuf %llx\n", + ipath_sma_first, slen, len, sdata, + rpkt.sps_iov[0].iov_base); + if (!ret) { + /* actual length read. 
*/
+			rpkt.sps_iov[0].iov_len = len;
+			rpkt.sps_cnt = 1;	/* received one packet */
+			if ((ret = copy_to_user(p, &rpkt, sizeof rpkt)))
+				_IPATH_DBG
+				    ("Failed to copy out pkt struct (%d)\n",
+				     ret);
+		} else
+			_IPATH_DBG("copyout failed: %d\n", ret);
+	} else {
+		/* usually means SMA process received a signal */
+		spin_unlock_irqrestore(&ipath_sma_lock, flags);
+		return -EAGAIN;
+	}
+
+	return ret;
+}
+
+/* unit number is in first word incoming, overwritten on return with data */
+static int ipath_get_portinfo(uint32_t * a)
+{
+	int ret;
+	uint32_t unit, tmp, tmp2;
+	ipath_devdata *dd;
+	uint32_t portinfo[13];	/* just the data for PortInfo, in host order */
+
+	if ((ret = copy_from_user(&unit, a, sizeof unit))) {
+		_IPATH_DBG("Failed to copy in portinfo: %d\n", ret);
+		return ret;
+	}
+	if (unit >= infinipath_max
+	    || !(devdata[unit].ipath_flags & IPATH_INITTED)) {
+		_IPATH_DBG("Invalid unit %u\n", unit);
+		return -ENODEV;
+	}
+	dd = &devdata[unit];
+	/* so we only initialize non-zero fields. */
+	memset(portinfo, 0, sizeof portinfo);
+
+	/*
+	 * Notimpl yet M_Key (64)
+	 * Notimpl yet GID (64)
+	 */
+
+	portinfo[4] = (dd->ipath_lid << 16);
+
+	/*
+	 * Notimpl yet SMLID (should we store this in the driver, in
+	 * case SMA dies?)
+	 * CapabilityMask is 0, we don't support any of these
+	 * DiagCode is 0; we don't store any diag info for now
+	 * Notimpl yet M_KeyLeasePeriod (we don't support M_Key)
+	 */
+
+	/* LocalPortNum is whichever port number they ask for */
+	portinfo[7] = (unit << 24)
+	    /* LinkWidthEnabled */
+	    |(2 << 16)
+	    /* LinkWidthSupported (really 2, but that's not IB valid...) */
+	    |(3 << 8)
+	    /* LinkWidthActive */
+	    |(2 << 0);
+	tmp = dd->ipath_lastibcstat & 0xff;
+	tmp2 = 5;
+	if (tmp == 0x11)
+		tmp = 2;
+	else if (tmp == 0x21)
+		tmp = 3;
+	else if (tmp == 0x31)
+		tmp = 4;
+	else {
+		tmp = 0;	/* down */
+		tmp2 = tmp & 0xf;
+	}
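+	/*
+	 * Worked mapping (editorial note, not in the original code): the
+	 * low byte of the last ibcstatus yields the standard IB PortState
+	 * values here -- 0x11 -> 2 (Initialize), 0x21 -> 3 (Armed),
+	 * 0x31 -> 4 (Active); anything else is reported as down.
+	 */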
+	portinfo[8] = (1 << 28)	/* LinkSpeedSupported */
+	    |(tmp << 24)	/* PortState */
+	    |(tmp2 << 20)	/* PortPhysicalState */
+	    |(2 << 16)
+	    /* LinkDownDefaultState */
+	    /* M_KeyProtectBits == 0 */
+	    /* NotImpl yet LMC == 0 (we can support all values) */
+	    |(1 << 4)		/* LinkSpeedActive */
+	    |(1 << 0);		/* LinkSpeedEnabled */
+	switch (dd->ipath_ibmtu) {
+	case 4096:
+		tmp = 5;
+		break;
+	case 2048:
+		tmp = 4;
+		break;
+	case 1024:
+		tmp = 3;
+		break;
+	case 512:
+		tmp = 2;
+		break;
+	case 256:
+		tmp = 1;
+		break;
+	default:		/* oops, something is wrong */
+		_IPATH_DBG
+		    ("Problem, ipath_ibmtu 0x%x not a valid IB MTU, treat as 2048\n",
+		     dd->ipath_ibmtu);
+		tmp = 4;
+		break;
+	}
+	portinfo[9] = (tmp << 28)
+	    /* NeighborMTU */
+	    /* Notimpl MasterSMSL */
+	    |(1 << 20)
+	    /* VLCap */
+	    /* Notimpl InitType (actually, an SMA decision) */
+	    /* VLHighLimit is 0 (only one VL) */
+	    ;			/* VLArbitrationHighCap is 0 (only one VL) */
+	portinfo[10] =		/* VLArbitrationLowCap is 0 (only one VL) */
+	    /* InitTypeReply is SMA decision */
+	    (5 << 16)		/* MTUCap 4096 */
+	    |(7 << 13)		/* VLStallCount */
+	    |(0x1f << 8)	/* HOQLife */
+	    |(1 << 4)		/* OperationalVLs 0 */
+	    |(1 << 3)
+	    /* PartitionEnforcementInbound */
+	    /* PartitionEnforcementOutbound not enforced */
+	    /* FilterRawInbound not enforced */
+	    ;			/* FilterRawOutbound not enforced */
+	/* M_KeyViolations are not counted by hardware, SMA can count */
+	tmp = ipath_kget_creg32(unit, cr_errpkey);
+	/* P_KeyViolations are counted by hardware. */
+	portinfo[11] = ((tmp & 0xffff) << 0);
+	portinfo[12] =
+	    /* Q_KeyViolations are not counted by hardware */
+	    (1 << 8)
+	    /* GUIDCap */
+	    /* SubnetTimeOut handled by SMA */
+	    /* RespTimeValue handled by SMA */
+	    ;
+	/* LocalPhyErrors are programmed to max */
+	portinfo[12] |= (0xf << 20)
+	    |(0xf << 16)	/* OverRunErrors are programmed to max */
+	    ;
+
+	if ((ret = copy_to_user(a, portinfo, sizeof portinfo)))
+		_IPATH_DBG("Failed to copy out portinfo: %d\n", ret);
+	return ret;
+}
+
+/* unit number is in first word incoming, overwritten on return with data */
+static int ipath_get_nodeinfo(uint32_t * a)
+{
+	int ret;
+	uint32_t unit;		/*, tmp, tmp2; */
+	ipath_devdata *dd;
+	uint32_t nodeinfo[10];	/* just the data for NodeInfo, in host order */
+
+	if ((ret = copy_from_user(&unit, a, sizeof unit))) {
+		_IPATH_DBG("Failed to copy in nodeinfo: %d\n", ret);
+		return ret;
+	}
+	if (unit >= infinipath_max
+	    || !(devdata[unit].ipath_flags & IPATH_INITTED)) {
+		/* VDBG because sma normally probes for all possible units */
+		_IPATH_VDBG("Invalid unit %u\n", unit);
+		return -ENODEV;
+	}
+	dd = &devdata[unit];
+
+	/* so we only initialize non-zero fields. */
+	memset(nodeinfo, 0, sizeof nodeinfo);
+
+	nodeinfo[0] =		/* BaseVersion is SMA */
+	    /* ClassVersion is SMA */
+	    (1 << 8)		/* NodeType */
+	    |(1 << 0);		/* NumPorts */
+	nodeinfo[1] = (uint32_t) (dd->ipath_guid >> 32);
+	nodeinfo[2] = (uint32_t) (dd->ipath_guid & 0xffffffff);
+	nodeinfo[3] = nodeinfo[1];	/* NodeGUID == SystemImageGUID for us */
+	nodeinfo[4] = nodeinfo[2];	/* NodeGUID == SystemImageGUID for us */
+	nodeinfo[5] = nodeinfo[3];	/* PortGUID == NodeGUID for us */
+	nodeinfo[6] = nodeinfo[4];	/* PortGUID == NodeGUID for us */
+	nodeinfo[7] = (4 << 16)	/* we support 4 pkeys */
+	    |(dd->ipath_deviceid << 0);
+	/* our chip version as 16 bits major, 16 bits minor */
+	nodeinfo[8] = dd->ipath_minrev | (dd->ipath_majrev << 16);
+	nodeinfo[9] = (unit << 24) | (dd->ipath_vendorid << 0);
+
+	if ((ret = copy_to_user(a, nodeinfo, sizeof nodeinfo)))
+		_IPATH_DBG("Failed to copy out nodeinfo: %d\n", ret);
+	return ret;
+}
+
+static int ipath_sma_ioctl(struct file *fp, unsigned int cmd, unsigned long a)
+{
+	int ret = 0;
+	switch (cmd) {
+	case IPATH_SEND_SMA_PKT:	/* send SMA packet */
+		if (!(ret = ipath_send_smapkt((struct ipath_sendpkt *) a)))
+			/* another SMA packet sent */
+			ipath_stats.sps_sma_spkts++;
+		break;
+	case IPATH_RCV_SMA_PKT:	/* receive an SMA or MAD packet */
+		ret = ipath_rcvsma_pkt((struct ipath_sendpkt *) a);
+		break;
+	case IPATH_SET_LID:	/* set our lid, (SMA) */
+		ret = ipath_kset_lid((uint32_t) a);
+		break;
+	case IPATH_SET_MTU:	/* set the IB mtu (not maxpktlen) (SMA) */
+		ret = ipath_kset_mtu((uint32_t) a);
+		break;
+	case IPATH_SET_LINKSTATE:
+		/* walk through the linkstate states (SMA) */
+		ret = ipath_kset_linkstate((uint32_t) a);
+		break;
+	case IPATH_GET_PORTINFO:	/* get the SMA portinfo */
+		ret = ipath_get_portinfo((uint32_t *) a);
+		break;
+	case IPATH_GET_NODEINFO:	/* get the SMA nodeinfo */
+		ret = ipath_get_nodeinfo((uint32_t *) a);
+		break;
+	case IPATH_SET_GUID:
+		/*
+		 * set our guid, (SMA).  This is not normally
+		 * used, but provides a way to set the GUID when the i2c flash
+		 * has a problem, or for special testing.
+ */ + ret = ipath_kset_guid((struct ipath_setguid *)a); + break; + case IPATH_SET_MLID: /* set multicast LID for ipath broadcast */ + ret = ipath_kset_mlid((uint32_t) a); + break; + case IPATH_GET_MLID: /* get multicast LID for ipath broadcast */ + ret = ipath_get_mlid((uint32_t *) a); + break; + case IPATH_GET_DEVSTATUS: /* get device status */ + ret = ipath_get_devstatus((uint64_t *) a); + break; + default: + _IPATH_DBG("%x not a valid SMA ioctl for infinipath\n", cmd); + ret = -EINVAL; + break; + } + return ret; +} + +static int ipath_get_unit_counters(struct infinipath_getunitcounters *a) +{ + struct infinipath_getunitcounters c; + + if(copy_from_user(&c, (void *)a, sizeof c)) + return -EFAULT; + return ipath_get_counters(c.unit, (struct infinipath_counters *)c.data); +} -- 0.99.9n From rolandd at cisco.com Fri Dec 16 15:48:55 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 16 Dec 2005 15:48:55 -0800 Subject: [openib-general] [PATCH 06/13] [RFC] ipath LLD core, part 3 In-Reply-To: <200512161548.YvnmQHKTsmmCBp1k@cisco.com> Message-ID: <200512161548.KglSM2YESlGlEQfQ@cisco.com> Last part of core driver --- drivers/infiniband/hw/ipath/ipath_driver.c | 2380 ++++++++++++++++++++++++++++ 1 files changed, 2380 insertions(+), 0 deletions(-) f7ffc0cabd62be5e13ad84027d5712e6f92d9cc1 diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index 0dee4ce..87b6dae 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -4877,3 +4877,2383 @@ static int ipath_wait_intr(ipath_portdat } return 0; } + +/* + * The new implementation as of Oct 2004 is that the driver assigns + * the tid and returns it to the caller. To make it easier to + * catch bugs, and to reduce search time, we keep a cursor for + * each port, walking the shadow tid array to find one that's not + * in use. + * + * For now, if we can't allocate the full list, we fail, although + * in the long run, we'll allocate as many as we can, and the + * caller will deal with that by trying the remaining pages later. + * That means that when we fail, we have to mark the tids as not in + * use again, in our shadow copy. + * + * It's up to the caller to free the tids when they are done. + * We'll unlock the pages as they free them. + * + * Also, right now we are locking one page at a time, but since + * the intended use of this routine is for a single group of + * virtually contiguous pages, that should change to improve + * performance. + */ +static int ipath_tid_update(ipath_portdata * pd, struct _tidupd *tidu) +{ + int ret = 0, ntids; + uint32_t tid, porttid, cnt, i, tidcnt; + struct _tidupd tu; + uint16_t *tidlist; + ipath_devdata *dd = &devdata[pd->port_unit]; + uint64_t vaddr, physaddr, lenvalid; + volatile uint64_t *tidbase; + uint64_t tidmap[8]; + struct page **pagep = NULL; + + tu.tidcnt = 0; /* for early errors */ + if (!dd->ipath_pageshadow) { + ret = -ENOMEM; + goto done; + } + if (copy_from_user(&tu, tidu, sizeof tu)) { + ret = -EFAULT; + goto done; + } + + if (!(cnt = tu.tidcnt)) { + _IPATH_DBG("After copyin, tidcnt 0, tidlist %llx\n", + tu.tidlist); + /* or should we treat as success? 
likely a bug */ + ret = -EFAULT; + goto done; + } + tidcnt = dd->ipath_rcvtidcnt; + if (cnt >= tidcnt) { /* make sure it all fits in port_tid_pg_list */ + _IPATH_INFO + ("Process tried to allocate %u TIDs, only trying max (%u)\n", + cnt, tidcnt); + cnt = tidcnt; + } + pagep = (struct page **)pd->port_tid_pg_list; + tidlist = (uint16_t *) (&pagep[cnt]); + + memset(tidmap, 0, sizeof(tidmap)); + tid = pd->port_tidcursor; + /* before decrement; chip actual # */ + porttid = pd->port_port * tidcnt; + ntids = tidcnt; + tidbase = (volatile uint64_t *)((volatile char *) + (devdata[pd->port_unit]. + ipath_kregbase) + + devdata[pd->port_unit]. + ipath_rcvtidbase + + porttid * sizeof(*tidbase)); + + _IPATH_VDBG("Port%u %u tids, cursor %u, tidbase %p\n", pd->port_port, + cnt, tid, tidbase); + + vaddr = tu.tidvaddr; /* virtual address of first page in transfer */ + if (!access_ok(VERIFY_WRITE, (void *)vaddr, cnt * PAGE_SIZE)) { + _IPATH_DBG("Fail vaddr %llx, %u pages, !access_ok\n", + vaddr, cnt); + ret = -EFAULT; + goto done; + } + if ((ret = ipath_mlock((unsigned long)vaddr, cnt, pagep))) { + if (ret == -EBUSY) { + _IPATH_DBG + ("Failed to lock addr %p, %u pages (already locked)\n", + (void *)vaddr, cnt); + /* + * for now, continue, and see what happens + * but with the new implementation, this should + * never happen, unless perhaps the user has + * mpin'ed the pages themselves (something we + * need to test) + */ + ret = 0; + } else { + _IPATH_INFO + ("Failed to lock addr %p, %u pages: errno %d\n", + (void *)vaddr, cnt, -ret); + goto done; + } + } + for (i = 0; i < cnt; i++, vaddr += PAGE_SIZE) { + for (; ntids--; tid++) { + if (tid == tidcnt) + tid = 0; + if (!dd->ipath_pageshadow[porttid + tid]) + break; + } + if (ntids < 0) { + /* + * oops, wrapped all the way through their TIDs, + * and didn't have enough free; see comments at + * start of routine + */ + _IPATH_DBG + ("Not enough free TIDs for %u pages (index %d), failing\n", + cnt, i); + i--; /* last tidlist[i] not filled in */ + ret = -ENOMEM; + break; + } + tidlist[i] = tid; + _IPATH_VDBG("Updating idx %u to TID %u, vaddr %llx\n", + i, tid, vaddr); + /* for now we "know" system pages and TID pages are same size */ + /* for ipath_free_tid */ + dd->ipath_pageshadow[porttid + tid] = pagep[i]; + __set_bit(tid, tidmap); /* don't need atomic or it's overhead */ + physaddr = page_to_phys(pagep[i]); + ipath_stats.sps_pagelocks++; + _IPATH_VDBG("TID %u, vaddr %llx, physaddr %llx pgp %p\n", + tid, vaddr, physaddr, pagep[i]); + /* + * in words (fixed, full page). could make less for very last + * page in transfer, but for now we won't worry about it. + */ + lenvalid = PAGE_SIZE >> 2; + lenvalid <<= INFINIPATH_RT_BUFSIZE_SHIFT; + physaddr |= lenvalid | INFINIPATH_RT_VALID; + ipath_kput_memq(pd->port_unit, &tidbase[tid], physaddr); + /* + * don't check this tid in ipath_portshadow, since we + * just filled it in; start with the next one. + */ + tid++; + } + + if (ret) { + uint32_t limit; + uint64_t tidval; + /* + * chip errata bug 7358, try to work around it by + * marking invalid tids as having max length + */ + tidval = + (~0ULL & INFINIPATH_RT_BUFSIZE_MASK) << + INFINIPATH_RT_BUFSIZE_SHIFT; + cleanup: + /* jump here if copy out of updated info failed... 
*/ + _IPATH_DBG("After failure (ret=%d), undo %d of %d entries\n", + -ret, i, cnt); + /* same code that's in ipath_free_tid() */ + if ((limit = sizeof(tidmap) * _BITS_PER_BYTE) > tidcnt) + /* just in case size changes in future */ + limit = tidcnt; + tid = find_first_bit((const unsigned long *)tidmap, limit); + /* + * chip errata bug 7358, try to work around it by + * marking invalid tids as having max length + */ + tidval = + (~0ULL & INFINIPATH_RT_BUFSIZE_MASK) << + INFINIPATH_RT_BUFSIZE_SHIFT; + for (; tid < limit; tid++) { + if (!test_bit(tid, tidmap)) + continue; + if (dd->ipath_pageshadow[porttid + tid]) { + _IPATH_VDBG("Freeing TID %u\n", tid); + ipath_kput_memq(pd->port_unit, &tidbase[tid], + tidval); + dd->ipath_pageshadow[porttid + tid] = NULL; + ipath_stats.sps_pageunlocks++; + } + } + (void)ipath_munlock(cnt, pagep); + } else { + /* + * copy the updated array, with ipath_tid's filled in, + * back to user. Since we did the copy in already, this + * "should never fail" + * If it does, we have to clean up... + */ + int r; + if ((r = + copy_to_user((void *)tu.tidlist, tidlist, + cnt * sizeof(*tidlist)))) { + _IPATH_DBG + ("Failed to copy out %d TIDs (%lx bytes) to %llx (ret %x)\n", + cnt, cnt * sizeof(*tidlist), tu.tidlist, r); + ret = -EFAULT; + goto cleanup; + } + if (copy_to_user((void *)tu.tidmap, tidmap, sizeof tidmap)) { + _IPATH_DBG("Failed to copy out TID map to %llx\n", + tu.tidmap); + ret = -EFAULT; + goto cleanup; + } + if (tid == tidcnt) + tid = 0; + pd->port_tidcursor = tid; + } + +done: + if (ret) + _IPATH_DBG + ("Failed to map %u TID pages, failing with %d, tidu %p\n", + tu.tidcnt, -ret, tidu); + return ret; +} + +/* + * right now we are unlocking one page at a time, but since + * the intended use of this routine is for a single group of + * virtually contiguous pages, that should change to improve + * performance. We check that the TID is in range for this port + * but otherwise don't check validity; if user has an error and + * frees the wrong tid, it's only their own data that can thereby + * be corrupted. We do check that the TID was in use, for sanity + * We always use our idea of the saved address, not the address that + * they pass in to us. 
+ */ + +static int ipath_tid_free(ipath_portdata * pd, struct _tidupd *tidu) +{ + int ret = 0; + uint32_t tid, porttid, cnt, limit, tidcnt; + struct _tidupd tu; + ipath_devdata *dd = &devdata[pd->port_unit]; + uint64_t *tidbase; + uint64_t tidmap[8]; + uint64_t tidval; + + tu.tidcnt = 0; /* for early errors */ + if (!dd->ipath_pageshadow) { + ret = -ENOMEM; + goto done; + } + + if (copy_from_user(&tu, tidu, sizeof tu)) { + _IPATH_DBG("copy of tidupd structure failed\n"); + ret = -EFAULT; + goto done; + } + if (copy_from_user(tidmap, (void *)tu.tidmap, sizeof tidmap)) { + _IPATH_DBG("copy of tidmap failed\n"); + ret = -EFAULT; + goto done; + } + + porttid = pd->port_port * dd->ipath_rcvtidcnt; + tidbase = + (uint64_t *) ((char *)(devdata[pd->port_unit].ipath_kregbase) + + devdata[pd->port_unit].ipath_rcvtidbase + + porttid * sizeof(*tidbase)); + + tidcnt = dd->ipath_rcvtidcnt; + if ((limit = sizeof(tidmap) * _BITS_PER_BYTE) > tidcnt) + limit = tidcnt; /* just in case size changes in future */ + tid = find_first_bit((const unsigned long *)tidmap, limit); + _IPATH_VDBG + ("Port%u free %u tids; first bit (max=%d) set is %d, porttid %u\n", + pd->port_port, tu.tidcnt, limit, tid, porttid); + /* + * chip errata bug 7358, try to work around it by marking invalid + * tids as having max length + */ + tidval = + (~0ULL & INFINIPATH_RT_BUFSIZE_MASK) << INFINIPATH_RT_BUFSIZE_SHIFT; + for (cnt = 0; tid < limit; tid++) { + /* + * small optimization; if we detect a run of 3 or so without + * any set, use find_first_bit again. That's mainly to + * accelerate the case where we wrapped, so we have some at + * the beginning, and some at the end, and a big gap + * in the middle. + */ + if (!test_bit(tid, tidmap)) + continue; + cnt++; + if (dd->ipath_pageshadow[porttid + tid]) { + _IPATH_VDBG("Freeing TID %u\n", tid); + ipath_kput_memq(pd->port_unit, &tidbase[tid], tidval); + ipath_munlock(1, &dd->ipath_pageshadow[porttid + tid]); + dd->ipath_pageshadow[porttid + tid] = NULL; + ipath_stats.sps_pageunlocks++; + } else + _IPATH_DBG("Unused tid %u, ignoring\n", tid); + } + if (cnt != tu.tidcnt) + _IPATH_DBG("passed in tidcnt %d, only %d bits set in map\n", + tu.tidcnt, cnt); +done: + if (ret) + _IPATH_DBG("Failed to unmap %u TID pages, failing with %d\n", + tu.tidcnt, -ret); + return ret; +} + +/* called from user init code, and also layered driver init */ +int ipath_setrcvhdrsize(const ipath_type mdev, unsigned rhdrsize) +{ + int ret = 0; + if (devdata[mdev].ipath_flags & IPATH_RCVHDRSZ_SET) { + if (devdata[mdev].ipath_rcvhdrsize != rhdrsize) { + _IPATH_INFO + ("Error: can't set protocol header size %u, already %u\n", + rhdrsize, devdata[mdev].ipath_rcvhdrsize); + ret = -EAGAIN; + } else + /* OK if set already, with same value, nothing to do */ + _IPATH_VDBG("Reuse same protocol header size %u\n", + devdata[mdev].ipath_rcvhdrsize); + } else if (rhdrsize > + (devdata[mdev].ipath_rcvhdrentsize - + (sizeof(uint64_t) / sizeof(uint32_t)))) { + _IPATH_DBG + ("Error: can't set protocol header size %u (> max %u)\n", + rhdrsize, + devdata[mdev].ipath_rcvhdrentsize - + (uint32_t) (sizeof(uint64_t) / sizeof(uint32_t))); + ret = -EOVERFLOW; + } else { + devdata[mdev].ipath_flags |= IPATH_RCVHDRSZ_SET; + devdata[mdev].ipath_rcvhdrsize = rhdrsize; + ipath_kput_kreg(mdev, kr_rcvhdrsize, + devdata[mdev].ipath_rcvhdrsize); + _IPATH_VDBG("Set protocol header size to %u\n", + devdata[mdev].ipath_rcvhdrsize); + } + return ret; +} + +/* + * find an available pio buffer, and do appropriate marking as busy, etc. 
+ * returns buffer number if one found (>=0), negative number is error. + * Used by ipath_send_smapkt and ipath_layer_send + */ +int ipath_getpiobuf(int mdev) +{ + int i, j, starti, updated = 0; + unsigned piobcnt, iter; + unsigned long flags; + ipath_devdata *dd = &devdata[mdev]; + uint64_t *shadow = dd->ipath_pioavailshadow; + + piobcnt = (unsigned)dd->ipath_piobcnt; + starti = dd->ipath_lastport_piobuf; + iter = piobcnt - starti; + if (dd->ipath_upd_pio_shadow) { + /* + * minor optimization. If we had no buffers on last call, + * start out by doing the update; continue and do scan + * even if no buffers were updated, to be paranoid + */ + ipath_update_pio_bufs(mdev); + /* we scanned here, don't do it at end of scan */ + updated = 1; + i = starti; + } else + i = dd->ipath_lastpioindex; + +rescan: + /* + * while test_and_set_bit() is atomic, + * we do that and then the change_bit(), and the pair is not. + * See if this is the cause of the remaining armlaunch errors. + */ + spin_lock_irqsave(&ipath_pioavail_lock, flags); + for (j = 0; j < iter; j++, i++) { + if (i >= piobcnt) + i = starti; + /* + * To avoid bus lock overhead, we first find a candidate + * buffer, then do the test and set, and continue if + * that fails. + */ + if (test_bit((2 * i) + 1, shadow) || + test_and_set_bit((2 * i) + 1, shadow)) { + continue; + } + /* flip generation bit */ + change_bit(2 * i, shadow); + break; + } + spin_unlock_irqrestore(&ipath_pioavail_lock, flags); + + if (j == iter) { + /* + * first time through; shadow exhausted, but may be + * real buffers available, so go see; if any updated, + * rescan (once) + */ + if (!updated) { + ipath_update_pio_bufs(mdev); + updated = 1; + i = starti; + goto rescan; + } + dd->ipath_upd_pio_shadow = 1; + /* not atomic, but if we lose one once in a while, that's OK */ + ipath_stats.sps_nopiobufs++; + if (!(++dd->ipath_consec_nopiobuf % 100000)) { + _IPATH_DBG + ("%u pio sends with no bufavail; dmacopy: %llx %llx %llx %llx; shadow: %llx %llx %llx %llx\n", + dd->ipath_consec_nopiobuf, + dd->ipath_pioavailregs_dma[0], + dd->ipath_pioavailregs_dma[1], + dd->ipath_pioavailregs_dma[2], + dd->ipath_pioavailregs_dma[3], + shadow[0], shadow[1], shadow[2], shadow[3]); + /* + * 4 buffers per byte, 4 registers above, cover + * rest below + */ + if (dd->ipath_piobcnt > (sizeof(shadow[0]) * 4 * 4)) + _IPATH_DBG + ("2nd group: dmacopy: %llx %llx %llx %llx; shadow: %llx %llx %llx %llx\n", + dd->ipath_pioavailregs_dma[4], + dd->ipath_pioavailregs_dma[5], + dd->ipath_pioavailregs_dma[6], + dd->ipath_pioavailregs_dma[7], + shadow[4], shadow[5], shadow[6], + shadow[7]); + } + return -EBUSY; + } + + if (updated && dd->ipath_layer.l_intr) { + /* + * ran out of bufs, now some (at least this one we just got) + * are now available, so tell the layered driver. + */ + dd->ipath_layer.l_intr(mdev, IPATH_LAYER_INT_SEND_CONTINUE); + } + + /* + * set next starting place. Since it's just an optimization, + * it doesn't matter who wins on this, so no locking + */ + dd->ipath_lastpioindex = i + 1; + if(dd->ipath_upd_pio_shadow) + dd->ipath_upd_pio_shadow = 0; + if(dd->ipath_consec_nopiobuf) + dd->ipath_consec_nopiobuf = 0; + return i; +} + +/* + * this is like ipath_getpiobuf(), except it just probes to see if a buffer + * is available. If it returns that there is one, it's not allocated, + * and so may not be available if caller tries to send. + * NOTE: This can be called from interrupt context by ipath_intr() + * and from non-interrupt context by layer_send_getpiobuf(). 
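+ *
+ * (Editorial aside, added for clarity: the pioavail shadow keeps two
+ * bits per buffer -- bit 2*i+1 is the "busy" bit tested here and in
+ * ipath_getpiobuf(), bit 2*i is a generation flag flipped on each
+ * allocation -- so each 64-bit shadow word covers 32 buffers, i.e.
+ * "4 buffers per byte" as the debug output above puts it.)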
+ */ +int ipath_bufavail(int mdev) +{ + int i; + unsigned piobcnt; + uint64_t *shadow = devdata[mdev].ipath_pioavailshadow; + + piobcnt = (unsigned)devdata[mdev].ipath_piobcnt; + + for (i = devdata[mdev].ipath_lastport_piobuf; i < piobcnt; i++) + if (!test_bit((2 * i) + 1, shadow)) + return 1; + + /* if none, check for update and rescan if we updated */ + ipath_update_pio_bufs(mdev); + for (i = devdata[mdev].ipath_lastport_piobuf; i < piobcnt; i++) + if (!test_bit((2 * i) + 1, shadow)) + return 1; + _IPATH_PDBG("No bufs avail\n"); + return 0; +} + +/* + * This routine is no longer on any critical paths; it is used only + * for sending SMA packets, but that could change in the future, so it + * should be kept pretty tight, with anything that + * increases the cache footprint, adds branches, etc. carefully + * examined, and if needed only for unusual cases, should, be moved out to + * a separate routine, or out of the main execution path. + * Because it's currently sma only, there are no checks to see if the + * link is up; sma must be able to send in the not fully initialized state + */ +int ipath_send_smapkt(struct ipath_sendpkt * upkt) +{ + int i, ret = 0, whichpb; + uint32_t *piobuf, plen = 0, clen; + uint64_t pboff; + struct ipath_sendpkt kpkt; + struct ipath_iovec *iov = kpkt.sps_iov; + ipath_type t; + + if (unlikely((copy_from_user(&kpkt, upkt, sizeof kpkt)))) + ret = -EFAULT; + if (ret) { + _IPATH_VDBG("Send failed: error %d\n", -ret); + goto done; + } + t = kpkt.sps_flags; + if (t >= infinipath_max || !(devdata[t].ipath_flags & IPATH_PRESENT) || + !devdata[t].ipath_kregbase) { + _IPATH_SMADBG("illegal unit %u for sma send\n", t); + return -ENODEV; + } + if (!(devdata[t].ipath_flags & IPATH_INITTED)) { + /* no hardware, freeze, etc. */ + _IPATH_SMADBG("unit %u not usable\n", t); + return -ENODEV; + } + + /* need total length before first word written */ + plen = sizeof(uint32_t); /* +1 word is for the qword padding */ + for (i = 0; i < kpkt.sps_cnt; i++) + /* each must be dword multiple */ + plen += kpkt.sps_iov[i].iov_len; + + if ((plen + 4) > devdata[t].ipath_ibmaxlen) { + _IPATH_DBG("Pkt len 0x%x > ibmaxlen %x!\n", plen - 4, + devdata[t].ipath_ibmaxlen); + ret = -EINVAL; + goto done; /* before writing pbc */ + } + plen >>= 2; /* in words */ + + whichpb = ipath_getpiobuf(t); + if (whichpb < 0) { + ret = whichpb; + devdata[t].ipath_nosma_bufs++; + _IPATH_SMADBG("No PIO buffers available unit %u %u times\n", + t, devdata[t].ipath_nosma_bufs); + goto done; + } + if(devdata[t].ipath_nosma_bufs) { + _IPATH_SMADBG( + "Unit %u got SMA send buffer after %u failures, %u seconds\n", + t, devdata[t].ipath_nosma_bufs, devdata[t].ipath_nosma_secs); + devdata[t].ipath_nosma_bufs = 0; + devdata[t].ipath_nosma_secs = 0; + } + if((devdata[t].ipath_lastibcstat & 0x11) != 0x11 && + (devdata[t].ipath_lastibcstat & 0x21) != 0x21) { + /* we need to be at least at INIT for SMA packets to go out. If we + * aren't, something has gone wrong, and SMA hasn't noticed. + * Therefore we'll try to go to INIT here, in hopes of fixing up the + * problem. 
First we verify that indeed the state is still "bad"
+	 * (that is, that lastibcstat isn't "stale") */
+		uint64_t val;
+		val = ipath_kget_kreg64(t, kr_ibcstatus);
+		if((val & 0x11) != 0x11 && (val & 0x21) != 0x21) {
+			_IPATH_SMADBG("Invalid Link state 0x%llx unit %u for send, try INIT\n",
+				val, t);
+			ipath_set_ib_lstate(t, INFINIPATH_IBCC_LINKCMD_INIT);
+			val = ipath_kget_kreg64(t, kr_ibcstatus);
+			if((val & 0x11) != 0x11 && (val & 0x21) != 0x21)
+				_IPATH_SMADBG("Link state still not OK unit %u (0x%llx) after INIT\n",
+					t, val);
+			else
+				_IPATH_SMADBG("Link state OK unit %u (0x%llx) after INIT\n",
+					t, val);
+		}
+		/* and continue, regardless */
+	}
+
+	pboff = devdata[t].ipath_piobufbase;
+	piobuf = (uint32_t *) (((char *)(devdata[t].ipath_kregbase)) + pboff +
+			       whichpb * devdata[t].ipath_palign);
+
+	if(infinipath_debug & __IPATH_PKTDBG)	// SMA and PKT, both
+		_IPATH_SMADBG("unit %u 0x%x+1w pio%d, (scnt %d)\n",
+			      t, plen - 1, whichpb, kpkt.sps_cnt);
+
+	ret = 0;
+	clen = 2;		/* size of the pbc */
+	{
+		/*
+		 * If this code ever gets used for anything performance
+		 * oriented, or that isn't inherently single-threaded,
+		 * then I need to implement the original idea of our
+		 * own equivalent of copy_from_user that uses only dword
+		 * or qword copies.  copy_from_user() can use byte copies,
+		 * and that is a problem for our chip.
+		 */
+		static uint32_t tmpbuf[2176 / sizeof(uint32_t)];
+		*(uint64_t *) tmpbuf = (uint64_t) plen;
+		for (i = 0; i < kpkt.sps_cnt; i++) {
+			if (unlikely
+			    (copy_from_user
+			     (tmpbuf + clen, (void *)iov->iov_base,
+			      iov->iov_len)))
+				ret = -EFAULT;	/* no break */
+			clen += iov->iov_len >> 2;
+			iov++;
+		}
+		ipath_dwordcpy(piobuf, tmpbuf, clen);
+	}
+
+	/* flush the packet out now, don't leave it waiting around */
+	mb();
+
+	if (ret) {
+		/*
+		 * Packet is bad, so we need to use the PIO abort mechanism to
+		 * abort the packet
+		 */
+		uint32_t sendctrl;
+		sendctrl = devdata[t].ipath_sendctrl | INFINIPATH_S_DISARM |
+		    (whichpb << INFINIPATH_S_DISARMPIOBUF_SHIFT);
+		_IPATH_DBG("Doing PIO abort on buffer %u after error\n",
+			   whichpb);
+		ipath_kput_kreg(t, kr_sendctrl, sendctrl);
+	}
+
+done:
+	return ret;
+}
+
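+/*
+ * Illustrative sketch (editorial, not part of this patch): the comment
+ * in ipath_send_smapkt() above notes that the chip cannot tolerate
+ * byte-granularity stores, so a copy routine such as ipath_dwordcpy()
+ * must move data strictly one dword at a time, along the lines of:
+ *
+ *	void dwordcpy(uint32_t *dst, const uint32_t *src, uint32_t n)
+ *	{
+ *		while (n--)
+ *			*dst++ = *src++;
+ *	}
+ */
+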
+/*
+ * implementation of the ioctl to get the counter values from the chip.
+ * For the time being, we get all of them when asked, no shadowing.
+ * We need to shadow the byte counters at a minimum, because otherwise
+ * they will wrap in just a few seconds at full bandwidth.
+ * The second argument is the user address to which we do the copy_to_user()
+ */
+static int ipath_get_counters(ipath_type t,
+			      struct infinipath_counters * ucounters)
+{
+	int ret = 0;
+	uint64_t val;
+	uint64_t *ucreg;
+	uint16_t vcreg;
+
+	ucreg = (uint64_t *) ucounters;
+	/*
+	 * for now, let's do this one at a time.  It's not the most
+	 * optimal method, but it is simple, and has no intermediate
+	 * memory requirements.
+	 */
+	for (vcreg = 0;
+	     vcreg < (sizeof(struct infinipath_counters) / sizeof(val));
+	     vcreg++, ucreg++) {
+		ipath_creg creg = vcreg;
+		val = ipath_snap_cntr(t, creg);
+		if ((ret = copy_to_user(ucreg, &val, sizeof(val)))) {
+			_IPATH_DBG("copy_to_user error on counter %d\n", creg);
+			break;
+		}
+	}
+
+	return ret;
+}
+
+/*
+ * implementation of the ioctl to get the stats values from the driver.
+ * The argument is the user address to which we do the copy_to_user()
+ */
+static int ipath_get_stats(struct infinipath_stats *ustats)
+{
+	int ret = 0;
+
+	if ((ret = copy_to_user(ustats, &ipath_stats, sizeof(ipath_stats))))
+		_IPATH_DBG("copy_to_user error on driver stats\n");
+
+	return ret;
+}
+
+/* set a partition key.  We can have up to 4 active at a time (other than
+ * the default, which is always allowed).  This is somewhat tricky, since
+ * multiple ports may set the same key, so we reference count them, and
+ * clean up at exit.  All 4 partition keys are packed into a single
+ * infinipath register.  It's an error for a process to set the same
+ * pkey multiple times.  We provide no mechanism to de-allocate a pkey
+ * at this time, we may eventually need to do that.
+ * I've used the atomic operations, and no locking, and only make a single
+ * pass through what's available.  This should be more than adequate for
+ * some time.  I'll think about spinlocks or the like if and as it's necessary
+ */
+static int ipath_set_partkey(ipath_portdata *pd, uint16_t key)
+{
+	ipath_devdata *dd;
+	int i, any = 0, pidx = -1;
+	uint16_t lkey = key & 0x7FFF;
+
+	dd = &devdata[pd->port_unit];
+
+	if (lkey == (IPS_DEFAULT_P_KEY & 0x7FFF)) {
+		/* nothing to do; this key always valid */
+		return 0;
+	}
+
+	_IPATH_VDBG
+	    ("p%u try to set pkey %hx, current keys %hx:%x %hx:%x %hx:%x %hx:%x\n",
+	     pd->port_port, key, dd->ipath_pkeys[0],
+	     atomic_read(&dd->ipath_pkeyrefs[0]), dd->ipath_pkeys[1],
+	     atomic_read(&dd->ipath_pkeyrefs[1]), dd->ipath_pkeys[2],
+	     atomic_read(&dd->ipath_pkeyrefs[2]), dd->ipath_pkeys[3],
+	     atomic_read(&dd->ipath_pkeyrefs[3]));
+
+	if (!lkey) {
+		_IPATH_PRDBG("p%u tries to set key 0, not allowed\n",
+			     pd->port_port);
+		return -EINVAL;
+	}
+
+	/*
+	 * Set the full membership bit, because it has to be
+	 * set in the register or the packet, and it seems
+	 * cleaner to set in the register than to force all
+	 * callers to set it.
(see bug 4331) + */ + key |= 0x8000; + + for (i = 0; i < ARRAY_SIZE(pd->port_pkeys); i++) { + if (!pd->port_pkeys[i] && pidx == -1) + pidx = i; + if (pd->port_pkeys[i] == key) { + _IPATH_VDBG + ("p%u tries to set same pkey (%x) more than once\n", + pd->port_port, key); + return -EEXIST; + } + } + if (pidx == -1) { + _IPATH_DBG + ("All pkeys for port %u already in use, can't set %x\n", + pd->port_port, key); + return -EBUSY; + } + for (any = i = 0; i < ARRAY_SIZE(dd->ipath_pkeys); i++) { + if (!dd->ipath_pkeys[i]) { + any++; + continue; + } + if (dd->ipath_pkeys[i] == key) { + if (atomic_inc_return(&dd->ipath_pkeyrefs[i]) > 1) { + pd->port_pkeys[pidx] = key; + _IPATH_VDBG + ("p%u set key %x matches #%d, count now %d\n", + pd->port_port, key, i, + atomic_read(&dd->ipath_pkeyrefs[i])); + return 0; + } else { + /* lost race, decrement count, catch below */ + atomic_dec(&dd->ipath_pkeyrefs[i]); + _IPATH_VDBG + ("Lost race, count was 0, after dec, it's %d\n", + atomic_read(&dd->ipath_pkeyrefs[i])); + any++; + } + } + if ((dd->ipath_pkeys[i] & 0x7FFF) == lkey) { + /* + * It makes no sense to have both the limited and full + * membership PKEY set at the same time since the + * unlimited one will disable the limited one. + */ + return -EEXIST; + } + } + if (!any) { + _IPATH_DBG + ("port %u, all pkeys already in use, can't set %x\n", + pd->port_port, key); + return -EBUSY; + } + for (any = i = 0; i < ARRAY_SIZE(dd->ipath_pkeys); i++) { + if (!dd->ipath_pkeys[i] && + atomic_inc_return(&dd->ipath_pkeyrefs[i]) == 1) { + uint64_t pkey; + + /* for ipathstats, etc. */ + ipath_stats.sps_pkeys[i] = lkey; + pd->port_pkeys[pidx] = dd->ipath_pkeys[i] = key; + pkey = + (uint64_t) dd->ipath_pkeys[0] | + ((uint64_t) dd->ipath_pkeys[1] << 16) | + ((uint64_t) dd->ipath_pkeys[2] << 32) | + ((uint64_t) dd->ipath_pkeys[3] << 48); + _IPATH_PRDBG + ("p%u set key %x in #%d, portidx %d, new pkey reg %llx\n", + pd->port_port, key, i, pidx, pkey); + ipath_kput_kreg(pd->port_unit, kr_partitionkey, pkey); + + return 0; + } + } + _IPATH_DBG + ("port %u, all pkeys already in use 2nd pass, can't set %x\n", + pd->port_port, key); + return -EBUSY; +} + +/* + * stop_start == 0 disables receive on the port, for use in queue overflow + * conditions. stop_start==1 re-enables, and returns value of tail register, + * to be used to re-init the software copy of the head register + */ + +static int ipath_manage_rcvq(ipath_portdata * pd, uint16_t start_stop) +{ + ipath_devdata *dd; + /* + * This needs to be volatile, so that the compiler doesn't + * optimize away the read to the device's mapped memory. + */ + volatile uint64_t tval; + + dd = &devdata[pd->port_unit]; + _IPATH_PRDBG("%sabling rcv for unit %u port %u\n", + start_stop ? "en" : "dis", pd->port_unit, pd->port_port); + /* atomically clear receive enable port. */ + if (start_stop) { + /* + * on enable, force in-memory copy of the tail register + * to 0, so that protocol code doesn't have to worry + * about whether or not the chip has yet updated + * the in-memory copy or not on return from the system + * call. The chip always resets it's tail register back + * to 0 on a transition from disabled to enabled. + * This could cause a problem if software was broken, + * and did the enable w/o the disable, but eventually + * the in-memory copy will be updated and correct + * itself, even in the face of software bugs. 
+	 */
+		*pd->port_rcvhdrtail_kvaddr = 0;
+		atomic_set_mask(1U <<
+				(INFINIPATH_R_PORTENABLE_SHIFT + pd->port_port),
+				&dd->ipath_rcvctrl);
+	} else
+		atomic_clear_mask(1U <<
+				  (INFINIPATH_R_PORTENABLE_SHIFT +
+				   pd->port_port), &dd->ipath_rcvctrl);
+	ipath_kput_kreg(pd->port_unit, kr_rcvctrl, dd->ipath_rcvctrl);
+	/* now be sure chip saw it before we return */
+	tval = ipath_kget_kreg64(pd->port_unit, kr_scratch);
+	if (start_stop) {
+		/*
+		 * and try to be sure that tail reg update has happened
+		 * too.  This should in theory interlock with the RXE
+		 * changes to the tail register.  Don't assign it to
+		 * the tail register in memory copy, since we could
+		 * overwrite an update by the chip if we did.
+		 */
+		tval =
+		    ipath_kget_ureg32(pd->port_unit, ur_rcvhdrtail,
+				      pd->port_port);
+	}
+	/* always; new head should be equal to new tail; see above */
+	return 0;
+}
+
+/*
+ * This routine is now quite different for user and kernel, because
+ * the kernel uses skb's, for the accelerated network performance.
+ * This is the user port version.
+ *
+ * allocate the eager TID buffers and program them into infinipath
+ * They are no longer completely contiguous, we do multiple
+ * alloc_pages() calls.
+ */
+static int ipath_create_user_egr(ipath_portdata * pd)
+{
+	char *buf;
+	ipath_devdata *dd = &devdata[pd->port_unit];
+	uint64_t *egrbase, egroff, lenvalid;
+	unsigned e, egrcnt, alloced, order, egrperchunk, chunk;
+	unsigned long pa, pent;
+
+	egrcnt = dd->ipath_rcvegrcnt;
+	egroff =
+	    dd->ipath_rcvegrbase + pd->port_port * egrcnt * sizeof(*egrbase);
+	egrbase = (uint64_t *) ((char *)(dd->ipath_kregbase) + egroff);
+	_IPATH_VDBG("Allocating %d egr buffers, at chip offset %llx (%p)\n",
+		    egrcnt, egroff, egrbase);
+
+	/*
+	 * to avoid wasting a lot of memory, we allocate 32KB chunks of
+	 * physically contiguous memory, advance through it until used up
+	 * and then allocate more.  Of course, we need memory to store
+	 * those extra pointers, now.  Started out with 256KB, but under
+	 * heavy memory pressure (creating large files and then copying
+	 * them over NFS while doing lots of MPI jobs), we hit some
+	 * alloc_pages() failures, even though we can sleep... (2.6.10)
+	 * Still get failures at 64K.  32K is the lowest we can go without
+	 * waiting more memory again.  It seems likely that the coalescing
+	 * in free_pages, etc. still has issues (as it has had previously
+	 * during 2.6.x development).
+	 */
+	order = get_order(0x8000);
+	alloced =
+	    round_up(dd->ipath_rcvegrbufsize * egrcnt,
+		     (1 << order) * PAGE_SIZE);
+	egrperchunk = ((1 << order) * PAGE_SIZE) / dd->ipath_rcvegrbufsize;
+	chunk = (egrcnt + egrperchunk - 1) / egrperchunk;
+	pd->port_rcvegrbuf_chunks = chunk;
+	pd->port_rcvegrbufs_perchunk = egrperchunk;
+	pd->port_rcvegrbuf_order = order;
+	pd->port_rcvegrbuf_pages =
+	    vmalloc(chunk * sizeof(pd->port_rcvegrbuf_pages[0]));
+	pd->port_rcvegrbuf_virt =
+	    vmalloc(chunk * sizeof(pd->port_rcvegrbuf_virt[0]));
+	if (!pd->port_rcvegrbuf_pages || !pd->port_rcvegrbuf_virt) {
+		_IPATH_UNIT_ERROR(pd->port_unit,
+		    "Unable to allocate %u EGR buffer array pointers\n",
+		    chunk);
+		if (pd->port_rcvegrbuf_pages) {
+			vfree(pd->port_rcvegrbuf_pages);
+			pd->port_rcvegrbuf_pages = NULL;
+		}
+		if (pd->port_rcvegrbuf_virt) {
+			vfree(pd->port_rcvegrbuf_virt);
+			pd->port_rcvegrbuf_virt = NULL;
+		}
+		return -ENOMEM;
+	}
+	for (e = 0; e < pd->port_rcvegrbuf_chunks; e++) {
+		/*
+		 * GFP_USER, but without GFP_FS, so buffer cache can
+		 * be coalesced (we hope); otherwise, even at order 4, heavy
+		 * filesystem activity makes these fail
+		 */
+		if (!
+ (pd->port_rcvegrbuf_pages[e] = + alloc_pages(__GFP_WAIT | __GFP_IO, order))) { + _IPATH_UNIT_ERROR(pd->port_unit, + "Unable to allocate EGR buffer array %u/%u\n", + e, pd->port_rcvegrbuf_chunks); + vfree(pd->port_rcvegrbuf_pages); + pd->port_rcvegrbuf_pages = NULL; + vfree(pd->port_rcvegrbuf_virt); + pd->port_rcvegrbuf_virt = NULL; + return -ENOMEM; + } + } + + /* + * calculate physical, then phys_to_virt() + * so that we get an address that fits in 64 bits, so we can use + * mmap64 from 32 bit programs on the chip and kernel virtual + * addresses (mmap64 for 32 bit programs on i386 and x86_64 + * only has 44 bits of address, because it uses mmap2()) + * We do this with the first chunk; We don't need a kernel + * virtually contiguous address to give the user virtually + * contiguous mappings. It just complicates the nopage routine + * a little tiny bit ;) + */ + buf = page_address(pd->port_rcvegrbuf_pages[0]); + pa = virt_to_phys(buf); + pd->port_rcvegr_phys = pa; + + /* in words */ + lenvalid = (dd->ipath_rcvegrbufsize - pd->port_egrskip) >> 2; + _IPATH_VDBG + ("port%u egrbuf vaddr %p, cpu %d, egrskip %u, len %llx words\n", + pd->port_port, buf, smp_processor_id(), pd->port_egrskip, + lenvalid); + lenvalid <<= INFINIPATH_RT_BUFSIZE_SHIFT; + lenvalid |= INFINIPATH_RT_VALID; + + for (e = chunk = 0; chunk < pd->port_rcvegrbuf_chunks; chunk++) { + int i, n; + struct page *p; + p = pd->port_rcvegrbuf_pages[chunk]; + pa = page_to_phys(p); + buf = page_address(p); + /* + * stash away for later use, since page_address() lookup + * is not cheap + */ + pd->port_rcvegrbuf_virt[chunk] = buf; + if (pa & ~INFINIPATH_RT_ADDR_MASK) + _IPATH_INFO + ("physaddr %lx has more than 40 bits, using only 40!\n", + pa); + n = 1 << pd->port_rcvegrbuf_order; + for (i = 0; i < n; i++) + SetPageReserved(virt_to_page(buf + (i * PAGE_SIZE))); + + /* clear buffer for security, sanity, and, debugging */ + memset(buf, 0, PAGE_SIZE * n); + + for (i = 0; e < egrcnt && i < egrperchunk; e++, i++) { + pent = + ((pa + + pd-> + port_egrskip) & INFINIPATH_RT_ADDR_MASK) | + lenvalid; + + ipath_kput_memq(pd->port_unit, &egrbase[e], pent); + _IPATH_VDBG("egr %u phys %lx val %lx\n", e, pa, pent); + pa += dd->ipath_rcvegrbufsize; + } + yield(); /* don't hog the cpu */ + } + + return 0; +} + +/* + * This routine is now quite different for user and kernel, because + * the kernel uses skb's, for the accelerated network performance + * This is the kernel (port0) version + * + * Allocate the eager TID buffers and program them into infinipath. 
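+ * (Worked example for the user-port variant above, editorial and not
+ * in the original comment: chunks are 32KB (get_order(0x8000)), so
+ * with a hypothetical 2KB ipath_rcvegrbufsize, egrperchunk = 16 and
+ * 512 eager entries need chunk = (512 + 15) / 16 = 32 allocations.)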
We use the network layer alloc_skb() allocator to allocate the memory, and
+ * either use the buffers as is for things like SMA packets, or pass
+ * the buffers up to the ipath layered driver and thence the network layer,
+ * replacing them as we do so (see ipath_kreceive())
+ */
+static int ipath_create_port0_egr(ipath_portdata * pd)
+{
+	int ret = 0;
+	uint64_t *egrbase, egroff;
+	unsigned e, egrcnt;
+	ipath_devdata *dd;
+	struct sk_buff **skbs;
+
+	dd = &devdata[pd->port_unit];
+	egrcnt = dd->ipath_rcvegrcnt;
+	egroff =
+	    dd->ipath_rcvegrbase + pd->port_port * egrcnt * sizeof(*egrbase);
+	egrbase = (uint64_t *) ((char *)(dd->ipath_kregbase) + egroff);
+	_IPATH_VDBG
+	    ("unit%u Allocating %d egr buffers, at chip offset %llx (%p)\n",
+	     pd->port_unit, egrcnt, egroff, egrbase);
+
+	skbs = vmalloc(sizeof(*dd->ipath_port0_skbs) * egrcnt);
+	if (skbs == NULL)
+		ret = -ENOMEM;
+	else {
+		for (e = 0; e < egrcnt; e++) {
+			/*
+			 * This is a bit tricky in that we allocate
+			 * extra space for 2 bytes of the 14 byte
+			 * ethernet header.  These two bytes are passed
+			 * in the ipath header so the rest of the data
+			 * is word aligned.  We allocate 4 bytes so that the
+			 * data buffer stays word aligned.
+			 * See ipath_kreceive() for more details.
+			 */
+			skbs[e] =
+			    __dev_alloc_skb(dd->ipath_ibmaxlen + 4, GFP_KERNEL);
+			if (skbs[e] == NULL) {
+				_IPATH_UNIT_ERROR(pd->port_unit,
+				    "SKB allocation error for eager TID %u\n",
+				    e);
+				while (e != 0)
+					dev_kfree_skb(skbs[--e]);
+				ret = -ENOMEM;
+				break;
+			}
+			skb_reserve(skbs[e], 4);
+		}
+	}
+	/*
+	 * after loop above, so we can test non-NULL
+	 * to see if ready to use at receive, etc.  Hope this fixes some
+	 * panics.
+	 */
+	dd->ipath_port0_skbs = skbs;
+
+	/*
+	 * have to tell chip each time we init it,
+	 * even if we are re-using previous memory.
+	 */
+	if (!ret) {
+		uint64_t lenvalid;	/* in words */
+
+		lenvalid = (dd->ipath_ibmaxlen - pd->port_egrskip) >> 2;
+		lenvalid <<= INFINIPATH_RT_BUFSIZE_SHIFT;
+		lenvalid |= INFINIPATH_RT_VALID;
+		for (e = 0; e < egrcnt; e++) {
+			unsigned long pa, pent;
+
+			pa = virt_to_phys(dd->ipath_port0_skbs[e]->data);
+			pa += pd->port_egrskip;
+			if (!e && (pa & ~INFINIPATH_RT_ADDR_MASK))
+				_IPATH_INFO
+				    ("phys addr %lx has more than 40 bits, using only 40!!!\n",
+				     pa);
+			pent = (pa & INFINIPATH_RT_ADDR_MASK) | lenvalid;
+			/*
+			 * don't need this except extreme debugging,
+			 * but leaving to save future typing.
+			 * _IPATH_VDBG("egr[%d] %p <- %lx\n", e, &egrbase[e], pent);
+			 */
+			ipath_kput_memq(pd->port_unit, &egrbase[e], pent);
+		}
+		yield();	/* don't hog the cpu */
+	}
+
+	return ret;
+}
+
+/*
+ * this *must* be physically contiguous memory, and for now,
+ * that limits it to what kmalloc can do.
+ */
+static int ipath_create_rcvhdrq(ipath_portdata * pd)
+{
+	int i, ret = 0, amt, order, pgs;
+	char *qt;
+	struct page *p;
+	unsigned long pa, pa0;
+
+	amt = round_up(devdata[pd->port_unit].ipath_rcvhdrcnt
+		       * devdata[pd->port_unit].ipath_rcvhdrentsize *
+		       sizeof(uint32_t), PAGE_SIZE);
+	if (!pd->port_rcvhdrq) {
+		order = get_order(amt);
+		/*
+		 * not using REPEAT isn't viable; at 128KB, we can easily fail
+		 * this.  The problem with REPEAT is we can block here
+		 * "forever".  There isn't an in-between, unfortunately.
+		 * We could reduce the risk by never freeing the rcvhdrq
+		 * except at unload, but even then, the first time a
+		 * port is used, we could delay for some time...
+ */ + p = alloc_pages(GFP_USER, order); + if (!p) { + _IPATH_UNIT_ERROR(pd->port_unit, + "attempt to allocate order %u memory for port %u rcvhdrq failed\n", + order, pd->port_port); + return -ENOMEM; + } + + /* + * should use kmap (and later kunmap), even though high mem will + * always be mapped on x86_64, to play it safe, but for some + * bizarre reason these aren't exported symbols... + */ + pd->port_rcvhdrq = page_address(p); + if (!virt_addr_valid(pd->port_rcvhdrq)) { + _IPATH_DBG + ("weird, virt_addr_valid false right after alloc_pages\n"); + _IPATH_DBG("__pa(%p) is %lx, num_physpages %lx\n", + pd->port_rcvhdrq, __pa(pd->port_rcvhdrq), + num_physpages); + } + pd->port_rcvhdrq_phys = virt_to_phys(pd->port_rcvhdrq); + pd->port_rcvhdrq_order = order; + + pa0 = pd->port_rcvhdrq_phys; + pgs = amt >> PAGE_SHIFT; + _IPATH_VDBG + ("%d pages at %p (phys %lx) order=%u for port %u rcvhdr Q\n", + pgs, pd->port_rcvhdrq, pa0, pd->port_rcvhdrq_order, + pd->port_port); + + /* + * verify it's really physically contiguous, to be paranoid + * also mark pages as reserved, to avoid problems when + * user process with them mapped then exits. + */ + qt = pd->port_rcvhdrq; + SetPageReserved(virt_to_page(qt)); + qt += PAGE_SIZE; + for (pa = pa0, i = 1; i < pgs; i++, qt += PAGE_SIZE) { + SetPageReserved(virt_to_page(qt)); + pa = virt_to_phys(qt); + if (pa != (pa0 + (i * PAGE_SIZE))) + _IPATH_INFO + ("pg %d at %p phys %lx not contiguous\n", i, + qt, pa); + else + _IPATH_VDBG("pg %d at %p phys %lx\n", i, qt, + pa); + } + } + + /* + * clear for security, sanity, and/or debugging (each time we + * use/reuse) + */ + memset(pd->port_rcvhdrq, 0, amt); + + /* + * tell chip each time we init it, even if we are re-using previous + * memory (we zero it at process close) + */ + _IPATH_VDBG("writing port %d rcvhdraddr as %lx\n", pd->port_port, + pd->port_rcvhdrq_phys); + ipath_kput_kreg_port(pd->port_unit, kr_rcvhdraddr, pd->port_port, + pd->port_rcvhdrq_phys); + + return ret; +} + +#ifdef _IPATH_EXTRA_DEBUG +/* + * occasionally useful to dump the full set of kernel registers for debugging. + */ +static void ipath_dump_allregs(char *what, ipath_type t) +{ + uint16_t reg; + _IPATH_DBG("%s\n", what); + for (reg = 0; reg <= 0x100; reg++) { + uint64_t v = ipath_kget_kreg64(t, reg); + if (!(reg % 4)) + printk("\n%3x: ", reg); + printk("%16llx ", v); + } + printk("\n"); +} +#endif /* _IPATH_EXTRA_DEBUG */ + +/* + * Do the actual initialization sequence on the chip. For the real + * hardware, this is done from the init routine called from the PCI + * infrastructure. + */ +int ipath_init_chip(const ipath_type t) +{ + int ret = 0, i; + uint32_t val32, kpiobufs; + uint64_t val, atmp; + volatile uint32_t *piobuf; + uint32_t pioincr; + ipath_devdata *dd = &devdata[t]; + ipath_portdata *pd; + struct page *vpage; + char boardn[32]; + + /* first time only, set after static version info */ + if (!chip_driver_version) { + i = strlen(ipath_core_version); + chip_driver_version = ipath_core_version + i; + chip_driver_size = sizeof ipath_core_version - i; + } + + /* + * have to clear shadow copies of registers at init that are not + * otherwise set here, or all kinds of bizarre things happen with + * driver on chip reset + */ + dd->ipath_rcvhdrsize = 0; + + /* + * don't clear ipath_flags as 8bit mode was set before entering + * this func. 
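
/*
 * Sketch of the size-to-order step used for the rcvhdrq above: round
 * the byte count up to whole pages, convert to a power-of-two page
 * order, and allocate one physically contiguous block.  The helper
 * name is illustrative; the kernel calls are real.
 */
#include <linux/mm.h>

static void *example_alloc_contig(unsigned long bytes, int *order)
{
	struct page *p;

	*order = get_order(bytes);	/* get_order() rounds up to pages */
	p = alloc_pages(GFP_USER, *order);
	return p ? page_address(p) : NULL;
}
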
However, we do set the linkstate to unknown + */ + + /* so we can watch for a transition */ + dd->ipath_flags |= IPATH_LINKUNK; + dd->ipath_flags &= ~(IPATH_LINKACTIVE | IPATH_LINKARMED | IPATH_LINKDOWN + | IPATH_LINKINIT); + + _IPATH_VDBG("Try to read spc chip revision\n"); + dd->ipath_revision = ipath_kget_kreg64(t, kr_revision); + + /* + * set up fundamental info we need to use the chip; we assume if + * the revision reg and these regs are OK, we don't need to special + * case the rest + */ + dd->ipath_sregbase = ipath_kget_kreg32(t, kr_sendregbase); + dd->ipath_cregbase = ipath_kget_kreg32(t, kr_counterregbase); + dd->ipath_uregbase = ipath_kget_kreg32(t, kr_userregbase); + _IPATH_VDBG("ipath_kregbase %p, sendbase %x usrbase %x, cntrbase %x\n", + dd->ipath_kregbase, dd->ipath_sregbase, dd->ipath_uregbase, + dd->ipath_cregbase); + if ((dd->ipath_revision & 0xffffffff) == 0xffffffff || + (dd->ipath_sregbase & 0xffffffff) == 0xffffffff || + (dd->ipath_cregbase & 0xffffffff) == 0xffffffff || + (dd->ipath_uregbase & 0xffffffff) == 0xffffffff) { + _IPATH_UNIT_ERROR(t, + "Register read failures from chip, giving up initialization\n"); + ret = -ENODEV; + goto done; + } + + /* clear the initial reset flag, in case first driver load */ + ipath_kput_kreg(t, kr_errorclear, INFINIPATH_E_RESET); + + dd->ipath_portcnt = ipath_kget_kreg32(t, kr_portcnt); + if (!infinipath_cfgports) + dd->ipath_cfgports = dd->ipath_portcnt; + else if (infinipath_cfgports <= dd->ipath_portcnt) { + dd->ipath_cfgports = infinipath_cfgports; + _IPATH_DBG("Configured to use %u ports out of %u in chip\n", + dd->ipath_cfgports, dd->ipath_portcnt); + } else { + dd->ipath_cfgports = dd->ipath_portcnt; + _IPATH_DBG + ("Tried to configured to use %u ports; chip only supports %u\n", + infinipath_cfgports, dd->ipath_portcnt); + } + dd->ipath_pd = kmalloc(sizeof(*dd->ipath_pd) * dd->ipath_cfgports, + GFP_KERNEL); + if (!dd->ipath_pd) { + _IPATH_UNIT_ERROR(t, + "Unable to allocate portdata array, failing\n"); + ret = -ENOMEM; + goto done; + } + memset(dd->ipath_pd, 0, sizeof(*dd->ipath_pd) * dd->ipath_cfgports); + + dd->ipath_lastegrheads = kmalloc(sizeof(*dd->ipath_lastegrheads) + * dd->ipath_cfgports, GFP_KERNEL); + dd->ipath_lastrcvhdrqtails = kmalloc(sizeof(*dd->ipath_lastrcvhdrqtails) + * dd->ipath_cfgports, GFP_KERNEL); + if (!dd->ipath_lastegrheads || !dd->ipath_lastrcvhdrqtails) { + _IPATH_UNIT_ERROR(t, + "Unable to allocate head arrays, failing\n"); + ret = -ENOMEM; + goto done; + } + memset(dd->ipath_lastrcvhdrqtails, 0, + sizeof(*dd->ipath_lastrcvhdrqtails) + * dd->ipath_cfgports); + memset(dd->ipath_lastegrheads, 0, sizeof(*dd->ipath_lastegrheads) + * dd->ipath_cfgports); + + dd->ipath_pd[0] = kmalloc(sizeof(ipath_portdata), GFP_KERNEL); + if (!dd->ipath_pd[0]) { + _IPATH_UNIT_ERROR(t, + "Unable to allocate portdata for port 0, failing\n"); + ret = -ENOMEM; + goto done; + } + memset(dd->ipath_pd[0], 0, sizeof(ipath_portdata)); + + pd = dd->ipath_pd[0]; + pd->port_unit = t; + pd->port_port = 0; + pd->port_cnt = 1; + /* The port 0 pkey table is used by the layer interface. 
*/ + pd->port_pkeys[0] = IPS_DEFAULT_P_KEY; + + dd->ipath_rcvtidcnt = ipath_kget_kreg32(t, kr_rcvtidcnt); + dd->ipath_rcvtidbase = ipath_kget_kreg32(t, kr_rcvtidbase); + dd->ipath_rcvegrcnt = ipath_kget_kreg32(t, kr_rcvegrcnt); + dd->ipath_rcvegrbase = ipath_kget_kreg32(t, kr_rcvegrbase); + dd->ipath_palign = ipath_kget_kreg32(t, kr_pagealign); + dd->ipath_piobufbase = ipath_kget_kreg32(t, kr_sendpiobufbase); + dd->ipath_piosize = ipath_kget_kreg32(t, kr_sendpiosize); + dd->ipath_ibmtu = 4096; /* default to largest legal MTU */ + dd->ipath_piobcnt = ipath_kget_kreg32(t, kr_sendpiobufcnt); + + _IPATH_VDBG + ("Revision %llx (PCI %x), %u ports, %u tids, %u egrtids, %u piobufs\n", + dd->ipath_revision, dd->ipath_pcirev, dd->ipath_portcnt, + dd->ipath_rcvtidcnt, dd->ipath_rcvegrcnt, dd->ipath_piobcnt); + + if (((dd->ipath_revision >> INFINIPATH_R_SOFTWARE_SHIFT) & INFINIPATH_R_SOFTWARE_MASK) != IPATH_CHIP_SWVERSION) { /* >= maybe, someday */ + _IPATH_UNIT_ERROR(t, + "Driver only handles version %d, chip swversion is %d (%llx), failng\n", + IPATH_CHIP_SWVERSION, + (int)(dd-> + ipath_revision >> + INFINIPATH_R_SOFTWARE_SHIFT) & + INFINIPATH_R_SOFTWARE_MASK, + dd->ipath_revision); + ret = -ENOSYS; + goto done; + } + dd->ipath_majrev = (uint8_t) ((dd->ipath_revision >> + INFINIPATH_R_CHIPREVMAJOR_SHIFT) & + INFINIPATH_R_CHIPREVMAJOR_MASK); + dd->ipath_minrev = + (uint8_t) ((dd-> + ipath_revision >> INFINIPATH_R_CHIPREVMINOR_SHIFT) & + INFINIPATH_R_CHIPREVMINOR_MASK); + dd->ipath_boardrev = + (uint8_t) ((dd-> + ipath_revision >> INFINIPATH_R_BOARDID_SHIFT) & + INFINIPATH_R_BOARDID_MASK); + + ipath_get_boardname(t, boardn, sizeof boardn); + + { + snprintf(chip_driver_version, chip_driver_size, + "Driver %u.%u, %s, InfiniPath%u %u.%u, PCI %u, SW Compat %u\n", + IPATH_CHIP_VERS_MAJ, IPATH_CHIP_VERS_MIN, boardn, + (unsigned)(dd-> + ipath_revision >> INFINIPATH_R_ARCH_SHIFT) & + INFINIPATH_R_ARCH_MASK, dd->ipath_majrev, + dd->ipath_minrev, dd->ipath_pcirev, + (unsigned)(dd-> + ipath_revision >> + INFINIPATH_R_SOFTWARE_SHIFT) & + INFINIPATH_R_SOFTWARE_MASK); + + } + + _IPATH_DBG("%s", chip_driver_version); + + /* + * we ignore most issues after reporting them, but have to specially + * handle hardware-disabled chips. + */ + if(ipath_validate_rev(dd) == 2) { + ret = -EPERM; /* unique error, known to infinipath_init_one() */ + goto done; + } + + /* + * zero all the TID entries at startup. We do this for sanity, + * in case of a previous driver crash of some kind, and also + * because the chip powers up with these memories in an unknown + * state. Use portcnt, not cfgports, since this is for the full chip, + * not for current (possibly different) configuration value + * Chip Errata bug 6447 + */ + for (val32 = 0; val32 < dd->ipath_portcnt; val32++) + ipath_clear_tids(t, val32); + + dd->ipath_rcvhdrentsize = IPATH_RCVHDRENTSIZE; + /* we could bump this + * to allow for full rcvegrcnt + rcvtidcnt, but then it no + * longer nicely fits power of two, and since we now use + * alloc_pages, the rest would be wasted. + */ + dd->ipath_rcvhdrcnt = dd->ipath_rcvegrcnt; + /* + * setup offset of last valid entry in rcvhdrq, for various tests, to + * avoid calculating each time we need it + */ + dd->ipath_hdrqlast = + dd->ipath_rcvhdrentsize * (dd->ipath_rcvhdrcnt - 1); + ipath_kput_kreg(t, kr_rcvhdrentsize, dd->ipath_rcvhdrentsize); + ipath_kput_kreg(t, kr_rcvhdrcnt, dd->ipath_rcvhdrcnt); + /* + * not in ipath_rcvhdrsize, so user programs can set differently, but + * so any early packets see the default size. 
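
/*
 * Every revision field above is decoded with the same shift-and-mask
 * idiom; factored out for clarity (helper name is illustrative):
 */
static inline uint32_t example_reg_field(uint64_t reg, unsigned int shift,
					 uint64_t mask)
{
	return (uint32_t)((reg >> shift) & mask);
}

/*
 * e.g.  majrev = example_reg_field(dd->ipath_revision,
 *                                  INFINIPATH_R_CHIPREVMAJOR_SHIFT,
 *                                  INFINIPATH_R_CHIPREVMAJOR_MASK);
 */
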
+ */ + ipath_kput_kreg(t, kr_rcvhdrsize, IPATH_DFLT_RCVHDRSIZE); + + /* + * we "know" that this works + * out OK. It's actually a bit more than we need, but 2048+64 isn't + * quite enough for full size, and we want the +N to be a power of 2 + * to give us reasonable alignment and fit within page_alloc()'ed + * memory + */ + dd->ipath_rcvegrbufsize = dd->ipath_piosize; + + /* + * the min() check here is currently a nop, but it may not always be, + * depending on just how we do ipath_rcvegrbufsize + */ + dd->ipath_ibmaxlen = min(dd->ipath_piosize, dd->ipath_rcvegrbufsize); + dd->ipath_init_ibmaxlen = dd->ipath_ibmaxlen; + + /* + * set up the shadow copies of the piobufavail registers, which + * we compare against the chip registers for now, and the in + * memory DMA'ed copies of the registers. This has to be done + * early, before we calculate lastport, etc. + */ + val = dd->ipath_piobcnt; + /* + * calc number of pioavail registers, and save it; we have 2 bits + * per buffer + */ + dd->ipath_pioavregs = + round_up(val, sizeof(uint64_t) * _BITS_PER_BYTE / 2) / + (sizeof(uint64_t) * _BITS_PER_BYTE / 2); + if (dd->ipath_pioavregs > + (sizeof(dd->ipath_pioavailshadow) / + sizeof(dd->ipath_pioavailshadow[0]))) { + dd->ipath_pioavregs = + sizeof(dd->ipath_pioavailshadow) / + sizeof(dd->ipath_pioavailshadow[0]); + dd->ipath_piobcnt = dd->ipath_pioavregs * sizeof(uint64_t) * _BITS_PER_BYTE >> 1; /* 2 bits/reg */ + _IPATH_INFO + ("Warning: %lld piobufs is too many to fit in shadow, only using %d\n", + val, dd->ipath_piobcnt); + } + + if (!infinipath_kpiobufs) { + /* have to have at least one, for SMA */ + kpiobufs = infinipath_kpiobufs = 1; + } else if (dd->ipath_piobcnt < + (dd->ipath_cfgports * IPATH_MIN_USER_PORT_BUFCNT)) { + _IPATH_INFO + ("Too few PIO buffers (%u) for %u ports to have %u each!\n", + dd->ipath_piobcnt, dd->ipath_cfgports, + IPATH_MIN_USER_PORT_BUFCNT); + kpiobufs = 1; /* reserve just the minimum for SMA/ether */ + } else + kpiobufs = infinipath_kpiobufs; + + if (kpiobufs > + (dd->ipath_piobcnt - + (dd->ipath_cfgports * IPATH_MIN_USER_PORT_BUFCNT))) { + i = dd->ipath_piobcnt - + (dd->ipath_cfgports * IPATH_MIN_USER_PORT_BUFCNT); + if (i < 0) + i = 0; + _IPATH_INFO + ("Allocating %d PIO bufs for kernel leaves too few for %d user ports (%d each); using %u\n", + kpiobufs, dd->ipath_cfgports - 1, + IPATH_MIN_USER_PORT_BUFCNT, i); + /* + * shouldn't change infinipath_kpiobufs, because could be + * different for different devices... + */ + kpiobufs = i; + } + dd->ipath_lastport_piobuf = dd->ipath_piobcnt - kpiobufs; + dd->ipath_pbufsport = dd->ipath_cfgports > 1 ? + dd->ipath_lastport_piobuf / (dd->ipath_cfgports - 1) : 0; + val32 = dd->ipath_lastport_piobuf - + (dd->ipath_pbufsport * (dd->ipath_cfgports - 1)); + if (val32 > 0) { + _IPATH_DBG + ("allocating %u pbufs/port leaves %u unused, add to kernel\n", + dd->ipath_pbufsport, val32); + dd->ipath_lastport_piobuf -= val32; + _IPATH_DBG("%u pbufs/port leaves %u unused, add to kernel\n", + dd->ipath_pbufsport, val32); + } + dd->ipath_lastpioindex = dd->ipath_lastport_piobuf; + _IPATH_VDBG + ("%d PIO bufs %u - %u, %u each for %u user ports\n", + kpiobufs, dd->ipath_lastport_piobuf, dd->ipath_piobcnt, dd->ipath_pbufsport, + dd->ipath_cfgports - 1); + + /* + * this has to be page aligned, and on a page of it's own, so we + * can map it into user space. We also use it to give processes + * a copy of ipath_statusp, on a separate cacheline, followed by + * a copy of the freeze error string, if it's happened. 
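
/*
 * The pioavail sizing above amounts to two bits of state per PIO
 * buffer packed into 64-bit registers, i.e. 32 buffers per register.
 * Restated as standalone arithmetic (illustrative helper):
 */
static unsigned int example_pioav_regs(unsigned int piobufs)
{
	const unsigned int per_reg = 64 / 2;	/* 32 buffers/register */

	return (piobufs + per_reg - 1) / per_reg;	/* round up */
}
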
Might also + * use that space for other things. + */ + val = round_up(2 * L1_CACHE_BYTES + sizeof(*dd->ipath_statusp) + + dd->ipath_pioavregs * sizeof(uint64_t), 2 * PAGE_SIZE); + if (!(dd->ipath_pioavailregs_dma = kmalloc(val * sizeof(uint64_t), + GFP_KERNEL))) { + _IPATH_UNIT_ERROR(t, + "failed to allocate PIOavail reg area in memory\n"); + ret = -ENOMEM; + goto done; + } + if ((PAGE_SIZE - 1) & (uint64_t) dd->ipath_pioavailregs_dma) { + dd->__ipath_pioavailregs_base = dd->ipath_pioavailregs_dma; + dd->ipath_pioavailregs_dma = (uint64_t *) + round_up((uint64_t) dd->ipath_pioavailregs_dma, PAGE_SIZE); + } else + dd->__ipath_pioavailregs_base = dd->ipath_pioavailregs_dma; + /* + * zero initial, since whole thing mapped + * into user space, and don't want info leak, or confusing garbage + */ + memset((void *)dd->ipath_pioavailregs_dma, 0, PAGE_SIZE); + + /* + * we really want L2 cache aligned, but for current CPUs of interest, + * they are the same. + */ + dd->ipath_statusp = (uint64_t *) ((char *)dd->ipath_pioavailregs_dma + + ((2 * L1_CACHE_BYTES + + dd->ipath_pioavregs * + sizeof(uint64_t)) & + ~L1_CACHE_BYTES)); + /* copy the current value now that it's really allocated */ + *dd->ipath_statusp = dd->_ipath_status; + /* + * setup buffer to hold freeze msg, accessible to apps, following + * statusp + */ + dd->ipath_freezemsg = (char *)&dd->ipath_statusp[1]; + /* and it's length */ + dd->ipath_freezelen = L1_CACHE_BYTES - sizeof(dd->ipath_statusp[0]); + + atmp = virt_to_phys(dd->ipath_pioavailregs_dma); + /* stash physical address for user progs */ + dd->ipath_pioavailregs_phys = atmp; + (void)ipath_kput_kreg(t, kr_sendpioavailaddr, atmp); + /* + * this is to detect s/w errors, which the h/w works around by + * ignoring the low 6 bits of address, if it wasn't aligned. + */ + val = ipath_kget_kreg64(t, kr_sendpioavailaddr); + if (val != atmp) { + _IPATH_UNIT_ERROR(t, + "Catastrophic software error, SendPIOAvailAddr written as %llx, read back as %llx\n", + atmp, val); + ret = -EINVAL; + goto done; + } + + if (t * 64 > (sizeof(ipath_port0_rcvhdrtail) - 64)) { + _IPATH_UNIT_ERROR(t, + "unit %u too large for port 0 rcvhdrtail buffer size\n", + t); + ret = -ENODEV; + } + + /* + * kernel modules loaded into vmalloc'ed memory, + * verify that when we assume that, map to phys, and back to virt, + * that we get the right contents, so we did the mapping right. 
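
/*
 * Sketch of the write-then-read-back check used above for
 * kr_sendpioavailaddr: the chip silently ignores the low-order bits
 * of a misaligned address, so reading back a different value than was
 * written pinpoints a software alignment bug.  This reuses the
 * driver's own accessors; the helper name is illustrative.
 */
static int example_kput_verify(const ipath_type t, ipath_kreg reg,
			       uint64_t val)
{
	ipath_kput_kreg(t, reg, val);
	return (ipath_kget_kreg64(t, reg) == val) ? 0 : -EINVAL;
}
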
+ */ + vpage = vmalloc_to_page((void *)ipath_port0_rcvhdrtail); + if (vpage == NOPAGE_SIGBUS || vpage == NOPAGE_OOM) { + _IPATH_UNIT_ERROR(t, "vmalloc_to_page for rcvhdrtail fails!\n"); + ret = -ENOMEM; + goto done; + } + + /* + * 64 is driven by cache line size, and also by chip requirement + * that low 6 bits be 0 + */ + val = page_to_phys(vpage) + t * 64; + + /* verify that the alignment requirement was met */ + ipath_kput_kreg_port(t, kr_rcvhdrtailaddr, 0, val); + atmp = ipath_kget_kreg64_port(t, kr_rcvhdrtailaddr, 0); + if (val != atmp) { + _IPATH_UNIT_ERROR(t, + "Catastrophic software error, RcvHdrTailAddr0 written as %llx, read back as %llx from %x\n", + val, atmp, kr_rcvhdrtailaddr); + ret = -EINVAL; + goto done; + } + /* so we can get current tail in ipath_kreceive(), per chip */ + dd->ipath_hdrqtailptr = + &ipath_port0_rcvhdrtail[t * + (64 / sizeof(ipath_port0_rcvhdrtail[0]))]; + + ipath_kput_kreg(t, kr_rcvbthqp, IPATH_KD_QP); + + /* + * make sure we are not in freeze, and PIO send enabled, so + * writes to pbc happen + */ + ipath_kput_kreg(t, kr_hwerrmask, 0ULL); + ipath_kput_kreg(t, kr_hwerrclear, ~0ULL); + ipath_kput_kreg(t, kr_control, 0ULL); + ipath_kput_kreg(t, kr_sendctrl, INFINIPATH_S_PIOENABLE); + + /* + * write the pbc of each buffer, to be sure it's initialized, then + * cancel all the buffers, and also abort any packets that might + * have been in flight for some reason (the latter is for driver + * unload/reload, but isn't a bad idea at first init). + * PIO send isn't enabled at this point, so there is no danger + * of sending these out on the wire. + * Chip Errata bug 6610 + */ + piobuf = (uint32_t *) (((char *)(dd->ipath_kregbase)) + + dd->ipath_piobufbase); + pioincr = devdata[t].ipath_palign / sizeof(*piobuf); + for (i = 0; i < dd->ipath_piobcnt; i++) { + *piobuf = 16; /* reasonable word count, just to init pbc */ + piobuf += pioincr; + } + /* self-clearing */ + ipath_kput_kreg(t, kr_sendctrl, INFINIPATH_S_ABORT); + + /* + * before error clears, since we expect serdes pll errors during + * this, the first time after reset + */ + if (ipath_bringup_link(t)) { + _IPATH_INFO("Failed to bringup IB link\n"); + ret = -ENETDOWN; + goto done; + } + + /* + * clear any "expected" hwerrs from reset and/or initialization + * clear any that aren't enabled (at least this once), and then + * set the enable mask + */ + ipath_clear_init_hwerrs(t); + ipath_kput_kreg(t, kr_hwerrclear, ~0ULL); + ipath_kput_kreg(t, kr_hwerrmask, dd->ipath_hwerrmask); + + dd->ipath_maskederrs = dd->ipath_ignorederrs; + ipath_kput_kreg(t, kr_errorclear, ~0ULL); /* clear all */ + /* enable errors that are masked, at least this first time. */ + ipath_kput_kreg(t, kr_errormask, ~dd->ipath_maskederrs); + /* clear any interrups up to this point (ints still not enabled) */ + ipath_kput_kreg(t, kr_intclear, ~0ULL); + + ipath_stats.sps_lid[t] = dd->ipath_lid; + + /* + * allocate the shadow TID array, so we can ipath_munlock + * previous entries. 
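
/*
 * The rcvhdrtail setup above in miniature: resolve a vmalloc'ed
 * kernel address to the physical address of its backing page, then
 * give each unit a 64-byte slot so the chip's low-6-bits-zero
 * alignment rule holds.  Sketch only; assumes the array fits in one
 * page, and the helper name is illustrative.
 */
#include <linux/vmalloc.h>

static unsigned long example_tail_slot_phys(void *vbase, unsigned int unit)
{
	return page_to_phys(vmalloc_to_page(vbase)) + unit * 64;
}
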
It may make more sense to move the pageshadow
+ * to the port data structure, so we only allocate memory for ports
+ * actually in use, since it is 8k per port now
+ */
+	dd->ipath_pageshadow = (struct page **)
+	    vmalloc(dd->ipath_cfgports * dd->ipath_rcvtidcnt *
+		    sizeof(struct page *));
+	if (!dd->ipath_pageshadow)
+		_IPATH_UNIT_ERROR(t,
+				  "failed to allocate shadow page * array, no expected sends!\n");
+	else
+		memset(dd->ipath_pageshadow, 0,
+		       dd->ipath_cfgports * dd->ipath_rcvtidcnt *
+		       sizeof(struct page *));
+
+	/* set up the port 0 (kernel) rcvhdr q and egr TIDs */
+	if (!(ret = ipath_create_rcvhdrq(dd->ipath_pd[0])))
+		ret = ipath_create_port0_egr(dd->ipath_pd[0]);
+	if (ret)
+		_IPATH_UNIT_ERROR(t,
+				  "failed to allocate port 0 (kernel) rcvhdrq and/or egr bufs\n");
+	else {
+		init_waitqueue_head(&ipath_sma_wait);
+		init_waitqueue_head(&ipath_sma_state_wait);
+
+		ipath_kput_kreg(pd->port_unit, kr_rcvctrl, dd->ipath_rcvctrl);
+
+		ipath_kput_kreg(t, kr_rcvbthqp, IPATH_KD_QP);
+
+		/* Enable PIO send, and update of PIOavail regs to memory. */
+		dd->ipath_sendctrl = INFINIPATH_S_PIOENABLE
+			| INFINIPATH_S_PIOBUFAVAILUPD;
+		ipath_kput_kreg(t, kr_sendctrl, dd->ipath_sendctrl);
+
+		/*
+		 * enable port 0 receive, and receive interrupt
+		 * other ports done as user opens and inits them
+		 */
+		dd->ipath_rcvctrl = INFINIPATH_R_TAILUPD |
+			(1ULL << INFINIPATH_R_PORTENABLE_SHIFT) |
+			(1ULL << INFINIPATH_R_INTRAVAIL_SHIFT);
+		ipath_kput_kreg(t, kr_rcvctrl, dd->ipath_rcvctrl);
+
+		/*
+		 * now ready for use
+		 * this should be cleared whenever we detect a reset, or
+		 * initiate one.
+		 */
+		dd->ipath_flags |= IPATH_INITTED;
+
+		/*
+		 * init our shadow copies of head from tail values, and write
+		 * head values to match
+		 */
+		val32 = ipath_kget_ureg32(t, ur_rcvegrindextail, 0);
+		(void)ipath_kput_ureg(t, ur_rcvegrindexhead, val32, 0);
+		dd->ipath_port0head = ipath_kget_ureg32(t, ur_rcvhdrtail, 0);
+		(void)ipath_kput_ureg(t, ur_rcvhdrhead, dd->ipath_port0head, 0);
+
+		/*
+		 * by now pioavail updates to memory should have occurred,
+		 * so copy them into our working/shadow registers; this is
+		 * in case something went wrong with abort, but mostly to
+		 * get the initial values of the generation bit correct
+		 */
+		for (i = 0; i < dd->ipath_pioavregs; i++) {
+			/*
+			 * Chip Errata bug 6641; even and odd qwords>3
+			 * are swapped
+			 */
+			if (i > 3) {
+				if (i & 1)
+					dd->ipath_pioavailshadow[i] =
+						dd->ipath_pioavailregs_dma[i - 1];
+				else
+					dd->ipath_pioavailshadow[i] =
+						dd->ipath_pioavailregs_dma[i + 1];
+			} else
+				dd->ipath_pioavailshadow[i] =
+					dd->ipath_pioavailregs_dma[i];
+		}
+		/* can get counters, stats, etc.
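
/*
 * The errata 6641 copy loop above, reduced to its index mapping: for
 * qword indices above 3, even/odd pairs arrive swapped, which is
 * exactly an XOR with 1 (illustrative helper, not part of the patch):
 */
static inline unsigned int example_pioav_dma_index(unsigned int i)
{
	return (i > 3) ? (i ^ 1) : i;
}
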
*/ + dd->ipath_flags |= IPATH_PRESENT; + } + + /* + * cause retrigger of pending interrupts ignored during init, even if + * we had errors + */ + ipath_kput_kreg(t, kr_intclear, 0ULL); + + /* + * set up stats retrieval timer, even if we had errors in last + * portion of setup + */ + init_timer(&dd->ipath_stats_timer); + dd->ipath_stats_timer.function = ipath_get_faststats; + dd->ipath_stats_timer.data = (unsigned long)t; + /* every 5 seconds; */ + dd->ipath_stats_timer.expires = jiffies + 5 * HZ; + /* takes ~16 seconds to overflow at full IB 4x bandwdith */ + add_timer(&dd->ipath_stats_timer); + + dd->ipath_stats_timer_active = 1; + +done: + if (!ret) { + ipath_get_guid(t); + *dd->ipath_statusp |= IPATH_STATUS_CHIP_PRESENT; + if (!ipath_sma_data_spare) { + /* first init, setup SMA data structs */ + ipath_sma_data_spare = + ipath_sma_data_bufs[IPATH_NUM_SMAPKTS]; + for (i = 0; i < IPATH_NUM_SMAPKTS; i++) + ipath_sma_data[i].buf = ipath_sma_data_bufs[i]; + } + /* + * sps_nports is a global, so, we set it to the highest + * number of ports of any of the chips we find; we never + * decrement it, at least for now. + */ + if (dd->ipath_cfgports > ipath_stats.sps_nports) + ipath_stats.sps_nports = dd->ipath_cfgports; + } + /* if ret is non-zero, we probably should do some cleanup here... */ + return ret; +} + +int ipath_waitfor_complete(const ipath_type t, ipath_kreg reg_id, + uint64_t bits_to_wait_for, uint64_t * valp) +{ + uint64_t timeout, lastval, val; + + lastval = ipath_kget_kreg64(t, reg_id); + timeout = get_cycles() + 0x10000000ULL; /* <- ridiculously long time */ + do { + val = ipath_kget_kreg64(t, reg_id); + *valp = val; /* so they have something, even on failures. */ + if ((val & bits_to_wait_for) == bits_to_wait_for) + return 0; + if (val != lastval) + _IPATH_VDBG + ("Changed from %llx to %llx, waiting for %llx bits\n", + lastval, val, bits_to_wait_for); + yield(); + if (get_cycles() > timeout) { + _IPATH_DBG + ("Didn't get bits %llx in register 0x%x, got %llx\n", + bits_to_wait_for, reg_id, *valp); + return ENODEV; + } + } while (1); +} + +/* + * like ipath_waitfor_complete(), but we wait for the CMDVALID bit to go away + * indicating the last command has completed. It doesn't return data + */ +int ipath_waitfor_mdio_cmdready(const ipath_type t) +{ + uint64_t timeout; + uint64_t val; + + timeout = get_cycles() + 0x10000000ULL; /* <- ridiculously long time */ + do { + val = ipath_kget_kreg64(t, kr_mdio); + if (!(val & IPATH_MDIO_CMDVALID)) + return 0; + yield(); + if (get_cycles() > timeout) { + _IPATH_DBG("CMDVALID stuck in mdio reg? (%llx)\n", val); + return ENODEV; + } + } while (1); +} + +void ipath_set_ib_lstate(const ipath_type t, int which) +{ + ipath_devdata *dd = &devdata[t]; + char *what; + + /* + * For all cases, we'll either be setting a new value of linkcmd, or + * we want it to be NOP, so clear it here. 
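
/*
 * Both wait routines above share the same poll/yield/timeout shape;
 * a condensed sketch keeping the patch's deliberately huge cycle
 * budget.  The function pointer stands in for the register read, and
 * the helper name is illustrative.
 */
#include <linux/sched.h>
#include <linux/timex.h>

static int example_poll_for_bits(uint64_t (*rd)(void), uint64_t bits)
{
	uint64_t deadline = get_cycles() + 0x10000000ULL;

	do {
		if ((rd() & bits) == bits)
			return 0;
		yield();		/* don't hog the cpu */
	} while (get_cycles() <= deadline);
	return -ENODEV;
}
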
+ * Similarly, we want the linkinitcmd to be NOP for everything
+ * other than explicitly changing linkinitcmd, and for that case,
+ * we want to first clear any existing bits
+ */
+	dd->ipath_ibcctrl &= ~((INFINIPATH_IBCC_LINKCMD_MASK <<
+				INFINIPATH_IBCC_LINKCMD_SHIFT) |
+			       (INFINIPATH_IBCC_LINKINITCMD_MASK <<
+				INFINIPATH_IBCC_LINKINITCMD_SHIFT));
+
+	if (which == INFINIPATH_IBCC_LINKCMD_INIT) {
+		dd->ipath_flags &= ~(IPATH_LINK_TOARMED | IPATH_LINK_TOACTIVE
+				     | IPATH_LINK_SLEEPING);
+		/* so we can watch for a transition */
+		dd->ipath_flags |= IPATH_LINKDOWN;
+		what = "INIT";
+	} else if (which == INFINIPATH_IBCC_LINKCMD_ARMED) {
+		dd->ipath_flags |= IPATH_LINK_TOARMED;
+		dd->ipath_flags &= ~(IPATH_LINK_TOACTIVE | IPATH_LINK_SLEEPING);
+		/*
+		 * this is mainly for loopback testing. If INITCMD is
+		 * NOP or SLEEP, the link won't ever come up in loopback...
+		 */
+		if (!(dd->ipath_flags & (IPATH_LINKINIT | IPATH_LINKARMED |
+					 IPATH_LINKACTIVE))) {
+			_IPATH_SMADBG
+			    ("going to armed, but link not yet up, set POLL\n");
+			dd->ipath_ibcctrl |=
+				INFINIPATH_IBCC_LINKINITCMD_POLL <<
+				INFINIPATH_IBCC_LINKINITCMD_SHIFT;
+		}
+		what = "ARMED";
+	} else if (which == INFINIPATH_IBCC_LINKCMD_ACTIVE) {
+		dd->ipath_flags |= IPATH_LINK_TOACTIVE;
+		dd->ipath_flags &= ~(IPATH_LINK_TOARMED | IPATH_LINK_SLEEPING);
+		what = "ACTIVE";
+	} else if (which & (INFINIPATH_IBCC_LINKINITCMD_MASK <<
+			    INFINIPATH_IBCC_LINKINITCMD_SHIFT)) {
+		/* down, disable, etc. */
+		dd->ipath_flags &= ~(IPATH_LINK_TOARMED | IPATH_LINK_TOACTIVE);
+		if (((which & INFINIPATH_IBCC_LINKINITCMD_MASK) >>
+		     INFINIPATH_IBCC_LINKINITCMD_SHIFT) ==
+		    INFINIPATH_IBCC_LINKINITCMD_SLEEP) {
+			dd->ipath_flags |= IPATH_LINK_SLEEPING | IPATH_LINKDOWN;
+		} else
+			dd->ipath_flags |= IPATH_LINKDOWN;
+		dd->ipath_ibcctrl |=
+			which & (INFINIPATH_IBCC_LINKINITCMD_MASK <<
+				 INFINIPATH_IBCC_LINKINITCMD_SHIFT);
+		what = "DOWN";
+	} else {
+		what = "UNKNOWN";
+		_IPATH_INFO("Unknown link transition requested (which=0x%x)\n",
+			    which);
+	}
+
+	dd->ipath_ibcctrl |= ((uint64_t) which & INFINIPATH_IBCC_LINKCMD_MASK)
+		<< INFINIPATH_IBCC_LINKCMD_SHIFT;
+
+	_IPATH_SMADBG("Trying to move unit %u to %s, current ltstate is %s\n",
+		      t, what,
+		      ipath_ibcstatus_str[(ipath_kget_kreg64(t, kr_ibcstatus)
+			  >> INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT)
+			  & INFINIPATH_IBCS_LINKTRAININGSTATE_MASK]);
+	ipath_kput_kreg(t, kr_ibcctrl, dd->ipath_ibcctrl);
+}
+
+static int ipath_bringup_link(const ipath_type t)
+{
+	ipath_devdata *dd = &devdata[t];
+	uint64_t val, ibc;
+	int ret = 0;
+
+	dd->ipath_control &= ~INFINIPATH_C_LINKENABLE;	/* hold IBC in reset */
+	ipath_kput_kreg(t, kr_control, dd->ipath_control);
+
+	/*
+	 * Note that prior to try 14 or 15 of IB, the credit scaling
+	 * wasn't working, because it was swapped for writes with the
+	 * 1 bit default linkstate field
+	 */
+
+	/* ignore pbc and align word */
+	val = dd->ipath_piosize - 2 * sizeof(uint32_t);
+	/*
+	 * for ICRC, which we only send in diag test pkt mode, and we don't
+	 * need to worry about that for mtu
+	 */
+	val += 1;
+	/*
+	 * set the IBC maxpktlength to the size of our pio buffers
+	 * the maxpktlength is in words. This is *not* the IB data MTU
+	 */
+	ibc = (val / sizeof(uint32_t)) << INFINIPATH_IBCC_MAXPKTLEN_SHIFT;
+	/* in KB */
+	ibc |= 0x5ULL << INFINIPATH_IBCC_FLOWCTRLWATERMARK_SHIFT;
+	/* how often flowctrl sent
+	 * more or less in usecs; balance against watermark value, so that
+	 * in theory senders always get a flow control update in time to not
+	 * let the IB link go idle.
+ */ + ibc |= 0x3ULL << INFINIPATH_IBCC_FLOWCTRLPERIOD_SHIFT; + /* max error tolerance */ + ibc |= 0xfULL << INFINIPATH_IBCC_PHYERRTHRESHOLD_SHIFT; + /* use "real" buffer space for */ + ibc |= 4ULL << INFINIPATH_IBCC_CREDITSCALE_SHIFT; + /* IB credit flow control. */ + ibc |= 0xfULL << INFINIPATH_IBCC_OVERRUNTHRESHOLD_SHIFT; + /* initially come up waiting for TS1, without sending anything. */ + dd->ipath_ibcctrl = ibc; + /* don't put linkinitcmd in ipath_ibcctrl, want that to stay a NOP */ + ibc |= + INFINIPATH_IBCC_LINKINITCMD_SLEEP << + INFINIPATH_IBCC_LINKINITCMD_SHIFT; + dd->ipath_flags |= IPATH_LINK_SLEEPING; + ipath_kput_kreg(t, kr_ibcctrl, ibc); + + ret = ipath_bringup_serdes(t); + + if (ret) + _IPATH_INFO("Could not initialize SerDes, not usable\n"); + else { + dd->ipath_control |= INFINIPATH_C_LINKENABLE; /* enable IBC */ + ipath_kput_kreg(t, kr_control, dd->ipath_control); + } + + return ret; +} + +/* + * called from ipath_shutdown_link(), and from sma doing a LINKDOWN + * Left as a separate function for historical reasons, and may want + * it to do more than just call ipath_set_ib_lstate() again sometime + * in the future. + */ +void ipath_down_link(const ipath_type t) +{ + ipath_set_ib_lstate(t, INFINIPATH_IBCC_LINKINITCMD_SLEEP << + INFINIPATH_IBCC_LINKINITCMD_SHIFT); +} + +/* + * do this when driver is being unloaded, or perhaps for diags, and + * maybe when we get an interrupt of a fatal link error that requires + * bringing the linkd down and back up + */ +static int ipath_shutdown_link(const ipath_type t) +{ + uint64_t val; + ipath_devdata *dd = &devdata[t]; + int ret = 0; + + _IPATH_DBG("Shutting down the link\n"); + ipath_down_link(t); + + /* + * we are shutting down, so tell the layered driver. We don't + * do this on just a link state change, much like ethernet, + * a cable unplug, etc. doesn't change driver state + */ + if (dd->ipath_layer.l_intr) + dd->ipath_layer.l_intr(t, IPATH_LAYER_INT_IF_DOWN); + + dd->ipath_control &= ~INFINIPATH_C_LINKENABLE; /* disable IBC */ + ipath_kput_kreg(t, kr_control, dd->ipath_control); + + *dd->ipath_statusp &= ~(IPATH_STATUS_IB_CONF | IPATH_STATUS_IB_READY); + + /* + * clear SerdesEnable and turn the leds off; do this here because + * we are unloading, so don't count on interrupts to move along + */ + + ipath_quiet_serdes(t); + val = dd->ipath_extctrl & + ~(INFINIPATH_EXTC_LEDPRIPORTGREENON | + INFINIPATH_EXTC_LEDPRIPORTYELLOWON); + dd->ipath_extctrl = val; + ipath_kput_kreg(t, kr_extctrl, val); + + if (dd->ipath_stats_timer_active) { + del_timer_sync(&dd->ipath_stats_timer); + dd->ipath_stats_timer_active = 0; + } + if (*dd->ipath_statusp & IPATH_STATUS_CHIP_PRESENT) { + /* can't do anything more with chip */ + /* needs re-init */ + *dd->ipath_statusp &= ~IPATH_STATUS_CHIP_PRESENT; + if (dd->ipath_kregbase) { + /* + * if we haven't already cleaned up before these + * are to ensure any register reads/writes "fail" + * until re-init + */ + dd->ipath_kregbase = NULL; + dd->ipath_kregvirt = NULL; + dd->ipath_uregbase = 0ULL; + dd->ipath_sregbase = 0ULL; + dd->ipath_cregbase = 0ULL; + dd->ipath_kregsize = 0; + } +#ifdef CONFIG_MTRR + if (dd->ipath_mtrr) { + _IPATH_VDBG("undoing WCCOMB on pio buffers\n"); + mtrr_del(dd->ipath_mtrr, 0, 0); + dd->ipath_mtrr = 0; + } +#endif + } + + return ret; +} + +/* + * when closing, free up any allocated data for a port, if the + * reference count goes to zero + * Note: this also frees the portdata itself! 
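
/*
 * Gathered in one place for readability, the IBC control word built
 * in ipath_bringup_link() above is just ORed bit fields.  The
 * constants are the ones the patch uses; the helper itself is
 * illustrative and not part of the driver.
 */
static uint64_t example_build_ibcctrl(uint64_t maxpkt_words)
{
	uint64_t ibc;

	ibc  = maxpkt_words << INFINIPATH_IBCC_MAXPKTLEN_SHIFT;
	ibc |= 0x5ULL << INFINIPATH_IBCC_FLOWCTRLWATERMARK_SHIFT; /* KB */
	ibc |= 0x3ULL << INFINIPATH_IBCC_FLOWCTRLPERIOD_SHIFT;	/* ~usecs */
	ibc |= 0xfULL << INFINIPATH_IBCC_PHYERRTHRESHOLD_SHIFT;
	ibc |= 4ULL << INFINIPATH_IBCC_CREDITSCALE_SHIFT;
	ibc |= 0xfULL << INFINIPATH_IBCC_OVERRUNTHRESHOLD_SHIFT;
	return ibc;
}
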
+ */ +void ipath_free_pddata(ipath_devdata * dd, uint32_t port, int freehdrq) +{ + ipath_portdata *pd = dd->ipath_pd[port]; + + if (!pd) + return; + if (freehdrq) + /* + * only clear and free portdata if we are going to + * also release the hdrq, otherwise we leak the hdrq on each + * open/close cycle + */ + dd->ipath_pd[port] = NULL; + /* cleanup locked pages private data structures */ + ipath_mlock_cleanup(pd); + if (freehdrq && pd->port_rcvhdrq) { + int i, n = 1 << pd->port_rcvhdrq_order; + _IPATH_VDBG("free closed port %d rcvhdrq @ %p (order=%u)\n", + pd->port_port, pd->port_rcvhdrq, + pd->port_rcvhdrq_order); + for (i = 0; i < n; i++) + ClearPageReserved(virt_to_page + (pd->port_rcvhdrq + (i * PAGE_SIZE))); + free_pages((unsigned long)pd->port_rcvhdrq, + pd->port_rcvhdrq_order); + pd->port_rcvhdrq = NULL; + } + if (port && pd->port_rcvegrbuf_pages) { /* always free this, however */ + void *virt; + unsigned e, i, n = 1 << pd->port_rcvegrbuf_order; + if (pd->port_rcvegrbuf_virt) { + for (e = 0; e < pd->port_rcvegrbuf_chunks; e++) { + virt = pd->port_rcvegrbuf_virt[e]; + for (i = 0; i < n; i++) + ClearPageReserved(virt_to_page + (virt + + (i * PAGE_SIZE))); + _IPATH_VDBG + ("egrbuf free_pages(%p, %x), chunk %u/%u\n", + virt, pd->port_rcvegrbuf_order, e, + pd->port_rcvegrbuf_chunks); + free_pages((unsigned long)virt, + pd->port_rcvegrbuf_order); + } + vfree(pd->port_rcvegrbuf_virt); + pd->port_rcvegrbuf_virt = NULL; + } + pd->port_rcvegrbuf_chunks = 0; + _IPATH_VDBG("free closed port %d rcvegrbufs ptr array\n", + pd->port_port); + /* now the pointer array. */ + vfree(pd->port_rcvegrbuf_pages); + pd->port_rcvegrbuf_pages = NULL; + } else if (port == 0 && dd->ipath_port0_skbs) { + unsigned e; + struct sk_buff **skbs = dd->ipath_port0_skbs; + + dd->ipath_port0_skbs = NULL; + _IPATH_VDBG("free closed port %d ipath_port0_skbs @ %p\n", + pd->port_port, skbs); + for (e = 0; e < dd->ipath_rcvegrcnt; e++) + if (skbs[e]) + dev_kfree_skb(skbs[e]); + vfree(skbs); + } + if (freehdrq) { + kfree(pd->port_tid_pg_list); + kfree(pd); + } +} + +int __init infinipath_init(void) +{ + int r = 0, i; + + _IPATH_DBG(KERN_INFO DRIVER_LOAD_MSG "%s", ipath_core_version); + + ipath_init_picotime(); /* init cycles -> pico conversion */ + + if (!ipath_ctl_header) { /* should be always */ + if (!(ipath_ctl_header = register_sysctl_table(ipath_ctl, 1))) + _IPATH_INFO("Couldn't register sysctl interface\n"); + } + + /* + * initialize the statusp to temporary storage so we can use it + * everywhere without first checking. When we "really" assign it, + * we copy from _ipath_status + */ + for (i = 0; i < infinipath_max; i++) + devdata[i].ipath_statusp = &devdata[i]._ipath_status; + + /* + * init these early, in case we take an interrupt as soon as the irq + * is setup. Saw a spinlock panic once that appeared to be due to that + * problem, when they were initted later on. + */ + spin_lock_init(&ipath_pioavail_lock); + spin_lock_init(&ipath_sma_lock); + + pci_register_driver(&infinipath_driver); + + driver_create_file(&(infinipath_driver.driver), &driver_attr_version); + + if ((r = register_chrdev(ipath_major, MODNAME, &ipath_fops))) + _IPATH_ERROR("Unable to register %s device\n", MODNAME); + + + /* + * never return an error, since we could have stuff registered, + * resources used, etc., even if no hardware found. This way we + * can clean up through unload. 
+ */ + return 0; +} + +/* + * note: if for some reason the unload fails after this routine, and leaves + * the driver enterable by user code, we'll almost certainly crash and burn... + */ +static void __exit infinipath_cleanup(void) +{ + int r, m, port; + + driver_remove_file(&(infinipath_driver.driver), &driver_attr_version); + if (ipath_ctl_header) { + unregister_sysctl_table(ipath_ctl_header); + ipath_ctl_header = NULL; + } else + _IPATH_DBG("No sysctl unregister, not registered OK\n"); + if ((r = unregister_chrdev(ipath_major, MODNAME))) + _IPATH_DBG("unregister of device failed: %d\n", r); + + + /* + * turn off rcv, send, and interrupts for all ports, all drivers + * should also hard reset the chip here? + * free up port 0 (kernel) rcvhdr, egr bufs, and eventually tid bufs + * for all versions of the driver, if they were allocated + */ + for (m = 0; m < infinipath_max; m++) { + uint64_t val; + ipath_devdata *dd = &devdata[m]; + if (dd->ipath_kregbase) { + /* in case unload fails, be consistent */ + dd->ipath_rcvctrl = 0U; + ipath_kput_kreg(m, kr_rcvctrl, dd->ipath_rcvctrl); + + /* + * gracefully stop all sends allowing any in + * progress to trickle out first. + */ + ipath_kput_kreg(m, kr_sendctrl, 0ULL); + val = ipath_kget_kreg64(m, kr_scratch); /* flush it */ + /* + * enough for anything that's going to trickle + * out to have actually done so. + */ + udelay(5); + + /* + * abort any armed or launched PIO buffers that + * didn't go. (self clearing). Will cause any + * packet currently being transmitted to go out + * with an EBP, and may also cause a short packet + * error on the receiver. + */ + ipath_kput_kreg(m, kr_sendctrl, INFINIPATH_S_ABORT); + + /* mask interrupts, but not errors */ + ipath_kput_kreg(m, kr_intmask, 0ULL); + ipath_shutdown_link(m); + + /* + * clear all interrupts and errors. 
Next time + * driver is loaded, we know that whatever is + * set happened while we were unloaded + */ + ipath_kput_kreg(m, kr_hwerrclear, ~0ULL); + ipath_kput_kreg(m, kr_errorclear, ~0ULL); + ipath_kput_kreg(m, kr_intclear, ~0ULL); + if (dd->__ipath_pioavailregs_base) { + kfree((void *)dd->__ipath_pioavailregs_base); + dd->__ipath_pioavailregs_base = + dd->ipath_pioavailregs_dma = 0; + } + + if (dd->ipath_pageshadow) { + struct page **tmpp = dd->ipath_pageshadow; + int i, cnt = 0; + + _IPATH_VDBG + ("Unlocking any expTID pages still locked\n"); + for (port = 0; port < dd->ipath_cfgports; + port++) { + int port_tidbase = + port * dd->ipath_rcvtidcnt; + int maxtid = + port_tidbase + dd->ipath_rcvtidcnt; + for (i = port_tidbase; i < maxtid; i++) { + if (tmpp[i]) { + ipath_munlock(1, + &tmpp[i]); + tmpp[i] = 0; + cnt++; + } + } + } + if (cnt) { + ipath_stats.sps_pageunlocks += cnt; + _IPATH_VDBG + ("There were still %u expTID entries locked\n", + cnt); + } + if (ipath_stats.sps_pagelocks + || ipath_stats.sps_pageunlocks) + _IPATH_VDBG + ("%llu pages locked, %llu unlocked via ipath_m{un}lock\n", + ipath_stats.sps_pagelocks, + ipath_stats.sps_pageunlocks); + + _IPATH_VDBG + ("Free shadow page tid array at %p\n", + dd->ipath_pageshadow); + vfree(dd->ipath_pageshadow); + dd->ipath_pageshadow = NULL; + } + + /* + * free any resources still in use (usually just + * kernel ports) at unload + */ + for (port = 0; port < dd->ipath_cfgports; port++) + ipath_free_pddata(dd, port, 1); + kfree(dd->ipath_pd); + /* + * debuggability, in case some cleanup path + * tries to use it after this + */ + dd->ipath_pd = NULL; + } + + if (dd->pcidev) { + if (dd->pcidev->irq) { + _IPATH_VDBG("unit %u free_irq of irq %x\n", m, + dd->pcidev->irq); + free_irq(dd->pcidev->irq, dd); + } else + _IPATH_DBG + ("irq is 0, not doing free_irq for unit %u\n", + m); + dd->pcidev = NULL; + } + if (dd->pci_registered) { + _IPATH_VDBG + ("Unregistering pci infrastructure unit %u\n", m); + pci_unregister_driver(&infinipath_driver); + dd->pci_registered = 0; + } else + _IPATH_VDBG + ("unit %u: no pci unreg, wasn't registered\n", m); + ipath_chip_cleanup(dd); /* clean up any per-chip chip-specific stuff */ + } + /* + * clean up any chip-specific stuff for now, only one type of chip + * for any given driver + */ + ipath_chip_done(); + + /* cleanup all our locked pages private data structures */ + ipath_mlock_cleanup(NULL); +} + +/* This is a generic function here, so it can return device-specific + * info. This allows keeping in sync with the version that supports + * multiple chip types. 
+*/ +void ipath_get_boardname(const ipath_type t, char *name, size_t namelen) +{ + ipath_ht_get_boardname(t, name, namelen); +} + +module_init(infinipath_init); +module_exit(infinipath_cleanup); + +EXPORT_SYMBOL(infinipath_debug); +EXPORT_SYMBOL(ipath_get_boardname); -- 0.99.9n From rolandd at cisco.com Fri Dec 16 15:48:55 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 16 Dec 2005 15:48:55 -0800 Subject: [openib-general] [PATCH 08/13] [RFC] ipath core last bit In-Reply-To: <200512161548.3fqe3fMerrheBMdX@cisco.com> Message-ID: <200512161548.y9KRuNtfMzpZjwni@cisco.com> Last piece of ipath LLD --- drivers/infiniband/hw/ipath/ipath_layer.c | 1155 +++++++++++++++++++++++++++++ 1 files changed, 1155 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/hw/ipath/ipath_layer.c 978ded82c9b5a4bca4e55f36d20ef4a585c50f38 diff --git a/drivers/infiniband/hw/ipath/ipath_layer.c b/drivers/infiniband/hw/ipath/ipath_layer.c new file mode 100644 index 0000000..6a60851 --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_layer.c @@ -0,0 +1,1155 @@ +/* + * Copyright (c) 2003, 2004, 2005. PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + * + * $Id: ipath_layer.c 4365 2005-12-10 00:04:16Z rjwalsh $ + */ + +/* + * These are the routines used by layered drivers, currently just the + * layered ethernet driver and verbs layer. 
+ */ + +#include + +#include "ipath_kernel.h" +#include "ips_common.h" +#include "ipath_layer.h" + +/* unit number is already validated in ipath_ioctl() */ +int ipath_kset_linkstate(uint32_t arg) +{ + ipath_type unit = 0xffff & (arg >> 16); + uint32_t lstate; + ipath_devdata *dd; + + if (unit >= infinipath_max || + !(devdata[unit].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", unit); + return -ENODEV; + } + + dd = &devdata[unit]; + arg &= 0xffff; + if (arg != IPATH_IB_LINKDOWN && arg != IPATH_IB_LINKARM && + arg != IPATH_IB_LINKACTIVE) { + _IPATH_DBG("Unknown linkstate 0x%x requested\n", arg); + return -EINVAL; + } + if (arg == IPATH_IB_LINKDOWN) { + ipath_down_link(unit); /* really moving it to idle */ + lstate = IPATH_LINKDOWN | IPATH_LINK_SLEEPING; + } else if (arg == IPATH_IB_LINKARM) { + if (!(dd->ipath_flags & + (IPATH_LINKINIT | IPATH_LINKARMED | IPATH_LINKDOWN | + IPATH_LINK_SLEEPING | IPATH_LINKACTIVE))) + _IPATH_DBG + ("don't know current state (flags 0x%x), try anyway\n", + dd->ipath_flags); + ipath_set_ib_lstate(unit, INFINIPATH_IBCC_LINKCMD_ARMED); + lstate = IPATH_LINKARMED; + } else { + int tryarmed = 0; + /* + * because we sometimes go to ARMED, but then back to 0x11 + * (initialized) before the SMA asks us to move to ACTIVE, + * we will try to advance state to ARMED here, if necessary + */ + if (!(dd->ipath_flags & + (IPATH_LINKINIT | IPATH_LINKARMED | IPATH_LINKDOWN | + IPATH_LINK_SLEEPING | IPATH_LINKACTIVE))) { + /* this one is just paranoia */ + _IPATH_DBG + ("don't know current state (flags 0x%x), try anyway\n", + dd->ipath_flags); + tryarmed = 1; + + } + if (!(dd->ipath_flags & (IPATH_LINKARMED | IPATH_LINKACTIVE))) + tryarmed = 1; + if (tryarmed) { + ipath_set_ib_lstate(unit, + INFINIPATH_IBCC_LINKCMD_ARMED); + /* + * give it up to 2 seconds to get to ARMED or + * ACTIVE; continue afterwards even if we fail + */ + if (ipath_wait_linkstate + (unit, IPATH_LINKARMED | IPATH_LINKACTIVE, 2000)) + _IPATH_VDBG + ("try for active, even though didn't get to ARMED\n"); + } + + ipath_set_ib_lstate(unit, INFINIPATH_IBCC_LINKCMD_ACTIVE); + lstate = IPATH_LINKACTIVE; + } + return ipath_wait_linkstate(unit, lstate, 5000); +} + +/* + * we can handle "any" incoming size, the issue here is whether we + * need to restrict our outgoing size. For now, we don't do any + * sanity checking on this, and we don't deal with what happens to + * programs that are already running when the size changes. + * unit number is already validated in ipath_ioctl() + * NOTE: changing the MTU will usually cause the IBC to go back to + * link initialize (0x11) state... + */ +int ipath_kset_mtu(uint32_t arg) +{ + unsigned unit = (arg >> 16) & 0xffff; + uint32_t piosize; + int changed = 0; + + if (unit >= infinipath_max || + !(devdata[unit].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", unit); + return -ENODEV; + } + + arg &= 0xffff; + /* + * mtu is IB data payload max. It's the largest power of 2 less + * than piosize (or even larger, since it only really controls the + * largest we can receive; we can send the max of the mtu and piosize). + * We check that it's one of the valid IB sizes. 
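
/*
 * The MTU test just below admits exactly the five legal IB payload
 * sizes; an equivalent predicate is "a power of two between 256 and
 * 4096" (illustrative helper, not part of the patch):
 */
static int example_valid_ib_mtu(uint32_t mtu)
{
	return mtu >= 256 && mtu <= 4096 && !(mtu & (mtu - 1));
}
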
+ */ + if (arg != 256 && arg != 512 && arg != 1024 && arg != 2048 && + arg != 4096) { + _IPATH_DBG("Trying to set invalid mtu %u, failing\n", arg); + return -EINVAL; + } + if (devdata[unit].ipath_ibmtu == arg) { + return 0; /* same as current */ + } + + piosize = devdata[unit].ipath_ibmaxlen; + devdata[unit].ipath_ibmtu = arg; + + /* + * the 128 is the max IB header size allowed for in our pio send buffers + * If we are reducing the MTU below that, this doesn't completely make + * sense, but it's OK. + */ + if (arg >= (piosize - 128)) { + /* hasn't been changed */ + if (piosize == devdata[unit].ipath_init_ibmaxlen) + _IPATH_VDBG + ("mtu 0x%x >= ibmaxlen hardware max, nothing to do\n", + arg); + else { + _IPATH_VDBG + ("mtu 0x%x restores ibmaxlen to full amount 0x%x\n", + arg, piosize); + devdata[unit].ipath_ibmaxlen = piosize; + changed = 1; + } + } else if ((arg + 128) == devdata[unit].ipath_ibmaxlen) + _IPATH_VDBG("ibmaxlen %x same as current, no change\n", arg); + else { + piosize = arg + 128; + _IPATH_VDBG("ibmaxlen was 0x%x, setting to 0x%x (mtu 0x%x)\n", + devdata[unit].ipath_ibmaxlen, piosize, arg); + devdata[unit].ipath_ibmaxlen = piosize; + changed = 1; + } + + if (changed) { + /* + * set the IBC maxpktlength to the size of our pio + * buffers in words + */ + uint64_t ibc = devdata[unit].ipath_ibcctrl; + ibc &= ~(INFINIPATH_IBCC_MAXPKTLEN_MASK << + INFINIPATH_IBCC_MAXPKTLEN_SHIFT); + + piosize = piosize - 2 * sizeof(uint32_t); /* ignore pbc */ + devdata[unit].ipath_ibmaxlen = piosize; + piosize /= sizeof(uint32_t); /* in words */ + /* + * for ICRC, which we only send in diag test pkt mode, and we + * don't need to worry about that for mtu + */ + piosize += 1; + + ibc |= piosize << INFINIPATH_IBCC_MAXPKTLEN_SHIFT; + devdata[unit].ipath_ibcctrl = ibc; + ipath_kput_kreg(unit, kr_ibcctrl, devdata[unit].ipath_ibcctrl); + } + return 0; +} + +void ipath_set_sps_lid(const ipath_type unit, uint32_t arg) +{ + if (unit >= infinipath_max || + !(devdata[unit].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", unit); + return; + } + + ipath_stats.sps_lid[unit] = devdata[unit].ipath_lid = arg; + if (devdata[unit].ipath_layer.l_intr) + devdata[unit].ipath_layer.l_intr(unit, IPATH_LAYER_INT_LID); +} + +/* XXX - need to inform anyone who cares this just happened. 
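
/*
 * The MAXPKTLEN update in ipath_kset_mtu() above is the generic
 * clear-then-set idiom for one field of a packed control word
 * (illustrative helper):
 */
static uint64_t example_set_reg_field(uint64_t reg, uint64_t mask,
				      unsigned int shift, uint64_t val)
{
	reg &= ~(mask << shift);
	return reg | ((val & mask) << shift);
}
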
*/ +int ipath_layer_set_guid(const ipath_type device, uint64_t guid) +{ + if (device >= infinipath_max || + !(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", device); + return -ENODEV; + } + devdata[device].ipath_guid = guid; + return 0; +} + +uint64_t ipath_layer_get_guid(const ipath_type device) +{ + if (device >= infinipath_max || + !(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", device); + return 0; + } + return devdata[device].ipath_guid; +} + +uint32_t ipath_layer_get_nguid(const ipath_type device) +{ + if (device >= infinipath_max || + !(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", device); + return 0; + } + return devdata[device].ipath_nguid; +} + +int ipath_layer_query_device(const ipath_type device, uint32_t * vendor, + uint32_t * boardrev, uint32_t * majrev, + uint32_t * minrev) +{ + if (device >= infinipath_max || + !(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", device); + return -ENODEV; + } + + *vendor = devdata[device].ipath_vendorid; + *boardrev = devdata[device].ipath_boardrev; + *majrev = devdata[device].ipath_majrev; + *minrev = devdata[device].ipath_minrev; + + return 0; +} + +uint32_t ipath_layer_get_flags(const ipath_type device) +{ + if (device >= infinipath_max || + !(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", device); + return 0; + } + + return devdata[device].ipath_flags; +} + +struct device *ipath_layer_get_pcidev(const ipath_type device) +{ + if (device >= infinipath_max || + !(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", device); + return NULL; + } + + return &(devdata[device].pcidev->dev); +} + +uint16_t ipath_layer_get_deviceid(const ipath_type device) +{ + if (device >= infinipath_max || + !(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", device); + return 0; + } + + return devdata[device].ipath_deviceid; +} + +uint64_t ipath_layer_get_lastibcstat(const ipath_type device) +{ + if (device >= infinipath_max || + !(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", device); + return 0; + } + + return devdata[device].ipath_lastibcstat; +} + +uint32_t ipath_layer_get_ibmtu(const ipath_type device) +{ + if (device >= infinipath_max || + !(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", device); + return 0; + } + + return devdata[device].ipath_ibmtu; +} + +int ipath_layer_register(const ipath_type device, + int (*l_intr) (const ipath_type, uint32_t), + int (*l_rcv) (const ipath_type, void *, + struct sk_buff *), uint16_t l_rcv_opcode, + int (*l_rcv_lid) (const ipath_type, void *), + uint16_t l_rcv_lid_opcode) +{ + int ret = 0; + + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return 1; + } + if (!(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_VDBG("%s not yet initialized, failing\n", + ipath_get_unit_name(device)); + return 1; + } + + _IPATH_VDBG("intr %p rx %p, rx_lid %p\n", l_intr, l_rcv, l_rcv_lid); + if (devdata[device].ipath_layer.l_intr + || devdata[device].ipath_layer.l_rcv) { + _IPATH_DBG + ("Layered device already registered on unit %u, failing\n", + device); + return 1; + } + + if(!(*devdata[device].ipath_statusp & IPATH_STATUS_SMA)) + *devdata[device].ipath_statusp |= IPATH_STATUS_OIB_SMA; + devdata[device].ipath_layer.l_intr = l_intr; + devdata[device].ipath_layer.l_rcv = l_rcv; + 
devdata[device].ipath_layer.l_rcv_lid = l_rcv_lid; + devdata[device].ipath_layer.l_rcv_opcode = l_rcv_opcode; + devdata[device].ipath_layer.l_rcv_lid_opcode = l_rcv_lid_opcode; + + return ret; +} + +static void ipath_verbs_timer(unsigned long t) +{ + /* + * If port 0 receive packet interrupts are not availabile, + * check the receive queue. + */ + if (!(devdata[t].ipath_flags & IPATH_GPIO_INTR)) + ipath_kreceive(t); + + /* Handle verbs layer timeouts. */ + if (devdata[t].verbs_layer.l_timer_cb) + devdata[t].verbs_layer.l_timer_cb(t); + + mod_timer(&devdata[t].verbs_layer.l_timer, jiffies + 1); +} + +/* Verbs layer registration. */ +int ipath_verbs_register(const ipath_type device, + int (*l_piobufavail) (const ipath_type device), + void (*l_rcv) (const ipath_type device, void *rhdr, + void *data, u32 tlen), + void (*l_timer_cb) (const ipath_type device)) +{ + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return 0; + } + if (!(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_VDBG("%s not yet initialized, failing\n", + ipath_get_unit_name(device)); + return 0; + } + + _IPATH_VDBG("piobufavail %p rx %p\n", l_piobufavail, l_rcv); + if (devdata[device].verbs_layer.l_piobufavail || + devdata[device].verbs_layer.l_rcv) { + _IPATH_DBG("Verbs layer already registered on unit %u, " + "failing\n", device); + return 0; + } + + devdata[device].verbs_layer.l_piobufavail = l_piobufavail; + devdata[device].verbs_layer.l_rcv = l_rcv; + devdata[device].verbs_layer.l_timer_cb = l_timer_cb; + devdata[device].verbs_layer.l_flags = 0; + + return 1; +} + +void ipath_verbs_unregister(const ipath_type device) +{ + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return; + } + if (!(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_VDBG("%s not yet initialized, failing\n", + ipath_get_unit_name(device)); + return; + } + + *devdata[device].ipath_statusp &= ~IPATH_STATUS_OIB_SMA; + devdata[device].verbs_layer.l_piobufavail = NULL; + devdata[device].verbs_layer.l_rcv = NULL; + devdata[device].verbs_layer.l_timer_cb = NULL; + devdata[device].verbs_layer.l_flags = 0; +} + +int ipath_layer_open(const ipath_type device, uint32_t * pktmax) +{ + int ret = 0; + uint32_t intval = 0; + + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return 1; + } + if (!devdata[device].ipath_layer.l_intr + || !devdata[device].ipath_layer.l_rcv) { + _IPATH_DBG("layer not registered, failing\n"); + return 1; + } + + if ((ret = + ipath_setrcvhdrsize(device, NUM_OF_EKSTRA_WORDS_IN_HEADER_QUEUE))) + return ret; + + *pktmax = devdata[device].ipath_ibmaxlen; + + if (*devdata[device].ipath_statusp & IPATH_STATUS_IB_READY) + intval |= IPATH_LAYER_INT_IF_UP; + if (ipath_stats.sps_lid[device]) + intval |= IPATH_LAYER_INT_LID; + if (ipath_stats.sps_mlid[device]) + intval |= IPATH_LAYER_INT_BCAST; + /* + * do this on open, in case low level is already up and + * just layered driver was reloaded, etc. + */ + if (intval) + devdata[device].ipath_layer.l_intr(device, intval); + + return ret; +} + +int16_t ipath_layer_get_lid(const ipath_type device) +{ + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return 0; + } + + _IPATH_VDBG("returning mylid 0x%x for layered dev %d\n", + devdata[device].ipath_lid, device); + return devdata[device].ipath_lid; +} + +/* + * get the MAC address. This is the EUID-64 OUI octets (top 3), then + * skip the next 2 (which should both be zero or 0xff). 
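
/*
 * ipath_verbs_timer() above is a self-rearming one-jiffy poll timer:
 * do the work, then mod_timer() yourself one tick out.  The skeleton,
 * using the timer API of this era; all names here are illustrative.
 */
#include <linux/timer.h>

static struct timer_list example_timer;

static void example_tick(unsigned long data)
{
	/* ... poll the receive queue, run verbs timeouts ... */
	mod_timer(&example_timer, jiffies + 1);	/* re-arm */
}

static void example_timer_start(unsigned long data)
{
	init_timer(&example_timer);
	example_timer.function = example_tick;
	example_timer.data = data;
	example_timer.expires = jiffies + 1;
	add_timer(&example_timer);
}
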
+ * The returned MAC is in network order + * mac points to at least 6 bytes of buffer + * returns 0 on error (to be consistent with get_lid and get_bcast + * return 1 on success + * We assume that by the time the LID is set, that the GUID is as valid + * as it's ever going to be, rather than adding yet another status bit. + */ + +int ipath_layer_get_mac(const ipath_type device, uint8_t * mac) +{ + uint8_t *guid; + + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u, failing\n", device); + return 0; + } + guid = (uint8_t *) & devdata[device].ipath_guid; + + mac[0] = guid[0]; + mac[1] = guid[1]; + mac[2] = guid[2]; + mac[3] = guid[5]; + mac[4] = guid[6]; + mac[5] = guid[7]; + if((guid[3] || guid[4]) && !(guid[3] == 0xff && guid[4] == 0xff)) + _IPATH_DBG("Warning, guid bytes 3 and 4 not 0 or 0xffff: %x %x\n", + guid[3], guid[4]); + _IPATH_VDBG("Returning %x:%x:%x:%x:%x:%x\n", + mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]); + return 1; +} + +int16_t ipath_layer_get_bcast(const ipath_type device) +{ + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u, failing\n", device); + return 0; + } + + _IPATH_VDBG("returning broadcast LID 0x%x for unit %u\n", + devdata[device].ipath_mlid, device); + return devdata[device].ipath_mlid; +} + +int ipath_layer_get_num_of_dev(void) +{ + return infinipath_max; +} + +int ipath_layer_get_cr_errpkey(const ipath_type device) +{ + return ipath_kget_creg32(device, cr_errpkey); +} + +void ipath_layer_close(const ipath_type device) +{ + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return; + } + if (!devdata[device].ipath_layer.l_intr + || !devdata[device].ipath_layer.l_rcv) { + /* normal if not all chips are present */ + _IPATH_VDBG("layer close without open\n"); + } else { + devdata[device].ipath_layer.l_intr = NULL; + devdata[device].ipath_layer.l_rcv = NULL; + devdata[device].ipath_layer.l_rcv_lid = NULL; + devdata[device].ipath_layer.l_rcv_opcode = 0; + devdata[device].ipath_layer.l_rcv_lid_opcode = 0; + } +} + +static inline void copy_aligned(uint32_t *piobuf, struct ipath_sge_state *ss, + uint32_t length) +{ + struct ipath_sge *sge = &ss->sge; + + while (length) { + u32 len = sge->length; + u32 w; + + BUG_ON(len == 0); + if (len > length) + len = length; + /* Need to round up for the last dword in the packet. 
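
/*
 * ipath_layer_get_mac() condensed: an EUI-64 becomes a 48-bit MAC by
 * keeping the 3-byte OUI and the low 3 bytes, dropping the two middle
 * padding bytes (0x0000 or 0xffff).  Illustrative helper:
 */
static void example_guid_to_mac(const uint8_t guid[8], uint8_t mac[6])
{
	mac[0] = guid[0]; mac[1] = guid[1]; mac[2] = guid[2];
	mac[3] = guid[5]; mac[4] = guid[6]; mac[5] = guid[7];
}
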
*/ + w = (len + 3) >> 2; + ipath_dwordcpy(piobuf, sge->vaddr, w); + piobuf += w; + sge->vaddr += len; + sge->length -= len; + sge->sge_length -= len; + if (sge->sge_length == 0) { + if (--ss->num_sge) + *sge = *ss->sg_list++; + } else if (sge->length == 0 && sge->mr != NULL) { + if (++sge->n >= IPATH_SEGSZ) { + if (++sge->m >= sge->mr->mapsz) + break; + sge->n = 0; + } + sge->vaddr = sge->mr->map[sge->m]->segs[sge->n].vaddr; + sge->length = sge->mr->map[sge->m]->segs[sge->n].length; + } + length -= len; + } +} + +static inline void copy_unaligned(uint32_t *piobuf, struct ipath_sge_state *ss, + uint32_t length) +{ + struct ipath_sge *sge = &ss->sge; + union { + u8 wbuf[4]; + u32 w; + } u; + int extra = 0; + + while (length) { + u32 len = sge->length; + + BUG_ON(len == 0); + if (len > length) + len = length; + length -= len; + while (len) { + u.wbuf[extra++] = *(u8 *) sge->vaddr; + sge->vaddr++; + sge->length--; + sge->sge_length--; + if (extra >= 4) { + *piobuf++ = u.w; + extra = 0; + } + len--; + } + if (sge->sge_length == 0) { + if (--ss->num_sge) + *sge = *ss->sg_list++; + } else if (sge->length == 0) { + if (++sge->n >= IPATH_SEGSZ) { + if (++sge->m >= sge->mr->mapsz) + break; + sge->n = 0; + } + sge->vaddr = sge->mr->map[sge->m]->segs[sge->n].vaddr; + sge->length = sge->mr->map[sge->m]->segs[sge->n].length; + } + } + if (extra) { + while (extra < 4) + u.wbuf[extra++] = 0; + *piobuf = u.w; + } +} + +/* + * This is like ipath_send_smapkt() in that we need to be able to send + * packets after the chip is initialized (MADs) but also like + * ipath_layer_send() since its used by the verbs layer. + */ +int ipath_verbs_send(const ipath_type device, uint32_t hdrwords, + uint32_t *hdr, uint32_t len, struct ipath_sge_state *ss) +{ + ipath_devdata *dd = &devdata[device]; + int whichpb; + uint32_t *piobuf, plen; + uint64_t pboff; + + if (device >= infinipath_max || + !(dd->ipath_flags & IPATH_PRESENT) || !dd->ipath_kregbase) { + _IPATH_DBG("illegal unit %u\n", device); + return -ENODEV; + } + if (!(dd->ipath_flags & IPATH_INITTED)) { + /* no hardware, freeze, etc. */ + _IPATH_DBG("unit %u not usable\n", device); + return -ENODEV; + } + /* +1 is for the qword padding of pbc */ + plen = hdrwords + ((len + 3) >> 2) + 1; + if ((plen << 2) > dd->ipath_ibmaxlen) { + _IPATH_DBG("packet len 0x%x too long, failing\n", plen); + return -EINVAL; + } + + /* Get a PIO buffer to use. */ + if ((whichpb = ipath_getpiobuf(device)) < 0) + return whichpb; + + pboff = dd->ipath_piobufbase; + piobuf = (uint32_t *) (((char *)(dd->ipath_kregbase)) + pboff + + whichpb * dd->ipath_palign); + _IPATH_EPDBG("0x%x+1w pio%d\n", plen - 1, whichpb); + + /* Write len to control qword, no flags. */ + *((uint64_t *) piobuf) = (uint64_t) plen; + piobuf += 2; + ipath_dwordcpy(piobuf, hdr, hdrwords); + if (len == 0) + return 0; + piobuf += hdrwords; + /* + * If we really wanted to check everything, we would have to + * check that each segment starts on a dword boundary and is + * a dword multiple in length. + * Since there can be lots of segments, we only check for a simple + * common case where the amount to copy is contained in one segment. 
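
/*
 * The length math in ipath_verbs_send() above: payload bytes are
 * rounded up to 32-bit words (the PIO interface is word oriented),
 * and one extra word accounts for the qword-padded PBC.  Illustrative
 * helper:
 */
static uint32_t example_pio_packet_words(uint32_t hdrwords,
					 uint32_t payload_bytes)
{
	return hdrwords + ((payload_bytes + 3) >> 2) + 1;
}
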
+ */ + if (ss->sge.length == len) + copy_aligned(piobuf, ss, len); + else + copy_unaligned(piobuf, ss, len); + return 0; +} + +void ipath_layer_snapshot_counters(const ipath_type device, u64 * swords, + u64 * rwords, u64 * spkts, u64 * rpkts) +{ + if (device >= infinipath_max || + !(devdata[device].ipath_flags & IPATH_PRESENT)) { + _IPATH_DBG("illegal unit %u\n", device); + return; + } + if (!(devdata[device].ipath_flags & IPATH_INITTED)) { + /* no hardware, freeze, etc. */ + _IPATH_DBG("unit %u not usable\n", device); + return; + } + *swords = ipath_snap_cntr(device, cr_wordsendcnt); + *rwords = ipath_snap_cntr(device, cr_wordrcvcnt); + *spkts = ipath_snap_cntr(device, cr_pktsendcnt); + *rpkts = ipath_snap_cntr(device, cr_pktrcvcnt); +} + +/* + * Return the counters needed by recv_pma_get_portcounters(). + */ +void ipath_layer_get_counters(const ipath_type device, + struct ipath_layer_counters *cntrs) +{ + if (device >= infinipath_max || + !(devdata[device].ipath_flags & IPATH_PRESENT)) { + _IPATH_DBG("illegal unit %u\n", device); + return; + } + if (!(devdata[device].ipath_flags & IPATH_INITTED)) { + /* no hardware, freeze, etc. */ + _IPATH_DBG("unit %u not usable\n", device); + return; + } + cntrs->symbol_error_counter = + ipath_snap_cntr(device, cr_ibsymbolerrcnt); + cntrs->link_error_recovery_counter = + ipath_snap_cntr(device, cr_iblinkerrrecovcnt); + cntrs->link_downed_counter = ipath_snap_cntr(device, cr_iblinkdowncnt); + cntrs->port_rcv_errors = ipath_snap_cntr(device, cr_err_rlencnt) + + ipath_snap_cntr(device, cr_invalidrlencnt) + + ipath_snap_cntr(device, cr_erricrccnt) + + ipath_snap_cntr(device, cr_errvcrccnt) + + ipath_snap_cntr(device, cr_badformatcnt); + cntrs->port_rcv_remphys_errors = ipath_snap_cntr(device, cr_rcvebpcnt); + cntrs->port_xmit_discards = ipath_snap_cntr(device, cr_unsupvlcnt); + cntrs->port_xmit_data = ipath_snap_cntr(device, cr_wordsendcnt); + cntrs->port_rcv_data = ipath_snap_cntr(device, cr_wordrcvcnt); + cntrs->port_xmit_packets = ipath_snap_cntr(device, cr_pktsendcnt); + cntrs->port_rcv_packets = ipath_snap_cntr(device, cr_pktrcvcnt); +} + +void ipath_layer_want_buffer(const ipath_type device) +{ + atomic_set_mask(INFINIPATH_S_PIOINTBUFAVAIL, + &devdata[device].ipath_sendctrl); + ipath_kput_kreg(device, kr_sendctrl, devdata[device].ipath_sendctrl); +} + +int ipath_layer_send(const ipath_type device, void *hdr, void *data, + uint32_t datawords) +{ + int ret = 0, whichpb; + uint32_t *piobuf, plen; + uint16_t vlsllnh; + uint64_t pboff; + + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u, failing\n", device); + return -EINVAL; + } + if (!(devdata[device].ipath_flags & IPATH_RCVHDRSZ_SET)) { + _IPATH_DBG("send while not open\n"); + ret = -EINVAL; + } else + if ((devdata[device].ipath_flags & (IPATH_LINKUNK | IPATH_LINKDOWN)) + || devdata[device].ipath_lid == 0) { + /* lid check is for when sma hasn't yet configured */ + ret = -ENETDOWN; + _IPATH_VDBG("send while not ready, mylid=%u, flags=0x%x\n", + devdata[device].ipath_lid, + devdata[device].ipath_flags); + } + /* +1 is for the qword padding of pbc */ + plen = (sizeof(ips_message_header_typ) >> 2) + datawords + 1; + if (plen > (devdata[device].ipath_ibmaxlen >> 2)) { + _IPATH_DBG("packet len 0x%x too long, failing\n", plen); + ret = -EINVAL; + } + vlsllnh = *((uint16_t *) hdr); + if (vlsllnh != htons(IPS_LRH_BTH)) { + _IPATH_DBG("Warning: lrh[0] wrong (%x, not %x); not sending\n", + vlsllnh, htons(IPS_LRH_BTH)); + ret = -EINVAL; + } + if (ret) + goto done; + + /* Get a PIO buffer to use. 
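+ * PIO buffer N lives at ipath_piobufbase + N * ipath_palign past the mapped register base, as the address arithmetic below computes. 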
*/ + if ((whichpb = ipath_getpiobuf(device)) < 0) { + ret = whichpb; + goto done; + } + + pboff = devdata[device].ipath_piobufbase; + piobuf = + (uint32_t *) (((char *)(devdata[device].ipath_kregbase)) + pboff + + whichpb * devdata[device].ipath_palign); + _IPATH_EPDBG("0x%x+1w pio%d\n", plen - 1, whichpb); + + /* len to control qword, no flags */ + *((uint64_t *) piobuf) = (uint64_t) plen; + piobuf += 2; + ipath_dwordcpy(piobuf, hdr, (sizeof(ips_message_header_typ) >> 2)); + piobuf += (sizeof(ips_message_header_typ) >> 2); + ipath_dwordcpy(piobuf, data, datawords); + + ipath_stats.sps_ether_spkts++; /* another ether packet sent */ + +done: + return ret; +} + +void ipath_layer_set_piointbufavail_int(const ipath_type device) +{ + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return; + } + + atomic_set_mask(INFINIPATH_S_PIOINTBUFAVAIL, + &devdata[device].ipath_sendctrl); + + ipath_kput_kreg(device, kr_sendctrl, devdata[device].ipath_sendctrl); +} + +void ipath_layer_enable_timer(const ipath_type device) +{ + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return; + } + + /* + * HT-400 has a design flaw where the chip and kernel idea + * of the tail register don't always agree, and therefore we won't + * get an interrupt on the next packet received. + * If the board supports per packet receive interrupts, use it. + * Otherwise, the timer function periodically checks for packets + * to cover this case. + * Either way, the timer is needed for verbs layer related + * processing. + */ + if (devdata[device].ipath_flags & IPATH_GPIO_INTR) { + ipath_kput_kreg(device, kr_debugportselect, 0x2074076542310UL); + /* Enable GPIO bit 2 interrupt */ + ipath_kput_kreg(device, kr_gpio_mask, (uint64_t)(1 << 2)); + } + + init_timer(&devdata[device].verbs_layer.l_timer); + devdata[device].verbs_layer.l_timer.function = ipath_verbs_timer; + devdata[device].verbs_layer.l_timer.data = (unsigned long)device; + devdata[device].verbs_layer.l_timer.expires = jiffies + 1; + add_timer(&devdata[device].verbs_layer.l_timer); +} + +void ipath_layer_disable_timer(const ipath_type device) +{ + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return; + } + + /* Disable GPIO bit 2 interrupt */ + if (devdata[device].ipath_flags & IPATH_GPIO_INTR) + ipath_kput_kreg(device, kr_gpio_mask, 0); + + del_timer_sync(&devdata[device].verbs_layer.l_timer); +} + +/* + * Get the verbs layer flags. + */ +unsigned ipath_verbs_get_flags(const ipath_type device) +{ + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return 0; + } + + return devdata[device].verbs_layer.l_flags; +} + +/* + * Set the verbs layer flags. + */ +void ipath_verbs_set_flags(const ipath_type device, unsigned flags) +{ + ipath_type s; + + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return; + } + + devdata[device].verbs_layer.l_flags = flags; + + for (s = 0; s < infinipath_max; s++) { + if (!(devdata[s].ipath_flags & IPATH_INITTED)) + continue; + if ((flags & IPATH_VERBS_KERNEL_SMA) && + !(*devdata[s].ipath_statusp & IPATH_STATUS_SMA)) { + *devdata[s].ipath_statusp |= IPATH_STATUS_OIB_SMA; + } else { + *devdata[s].ipath_statusp &= ~IPATH_STATUS_OIB_SMA; + } + } +} + +/* + * Return the size of the PKEY table for port 0. 
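+ * The per-port PKEY table is a fixed-size array, so this is simply the ARRAY_SIZE of the port 0 pkey array. 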
+ */ +unsigned ipath_layer_get_npkeys(const ipath_type device) +{ + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return 0; + } + + return ARRAY_SIZE(devdata[device].ipath_pd[0]->port_pkeys); +} + +/* + * Return the indexed PKEY from the port 0 PKEY table. + */ +unsigned ipath_layer_get_pkey(const ipath_type device, unsigned index) +{ + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return 0; + } + if (index >= ARRAY_SIZE(devdata[device].ipath_pd[0]->port_pkeys)) + return 0; + + return devdata[device].ipath_pd[0]->port_pkeys[index]; +} + +/* + * Return the PKEY table for port 0. + */ +void ipath_layer_get_pkeys(const ipath_type device, uint16_t *pkeys) +{ + struct _ipath_portdata *pd; + + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return; + } + + pd = devdata[device].ipath_pd[0]; + memcpy(pkeys, pd->port_pkeys, sizeof(pd->port_pkeys)); +} + +/* + * Decrement the reference count for the given PKEY. + * Return true if this was the last reference and the hardware table entry + * needs to be changed. + */ +static inline int rm_pkey(ipath_devdata *dd, uint16_t key) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(dd->ipath_pkeys); i++) { + if (dd->ipath_pkeys[i] != key) + continue; + if (atomic_dec_and_test(&dd->ipath_pkeyrefs[i])) { + dd->ipath_pkeys[i] = 0; + return 1; + } + break; + } + return 0; +} + +/* + * Add the given PKEY to the hardware table. + * Return an error code if unable to add the entry, zero if no change, + * or 1 if the hardware PKEY register needs to be updated. + */ +static inline int add_pkey(ipath_devdata *dd, uint16_t key) +{ + int i; + uint16_t lkey = key & 0x7FFF; + int any = 0; + + for (i = 0; i < ARRAY_SIZE(dd->ipath_pkeys); i++) { + if (!dd->ipath_pkeys[i]) { + any++; + continue; + } + /* If it matches exactly, try to increment the ref count */ + if (dd->ipath_pkeys[i] == key) { + if (atomic_inc_return(&dd->ipath_pkeyrefs[i]) > 1) + return 0; + /* Lost the race. Look for an empty slot below. */ + atomic_dec(&dd->ipath_pkeyrefs[i]); + any++; + } + /* + * It makes no sense to have both the limited and unlimited + * PKEY set at the same time since the unlimited one will + * disable the limited one. + */ + if ((dd->ipath_pkeys[i] & 0x7FFF) == lkey) + return -EEXIST; + } + if (!any) + return -EBUSY; + for (i = 0; i < ARRAY_SIZE(dd->ipath_pkeys); i++) { + if (!dd->ipath_pkeys[i] && + atomic_inc_return(&dd->ipath_pkeyrefs[i]) == 1) { + /* for ipathstats, etc. */ + ipath_stats.sps_pkeys[i] = lkey; + dd->ipath_pkeys[i] = key; + return 1; + } + } + return -EBUSY; +} + +/* + * Set the PKEY table for port 0. + */ +int ipath_layer_set_pkeys(const ipath_type device, uint16_t *pkeys) +{ + ipath_portdata *pd; + ipath_devdata *dd; + int i; + int changed = 0; + + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return -EINVAL; + } + + dd = &devdata[device]; + pd = dd->ipath_pd[0]; + + for (i = 0; i < ARRAY_SIZE(pd->port_pkeys); i++) { + uint16_t key = pkeys[i]; + uint16_t okey = pd->port_pkeys[i]; + + if (key == okey) + continue; + /* + * The value of this PKEY table entry is changing. + * Remove the old entry in the hardware's array of PKEYs. 
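+ * Bit 15 of a PKEY is the membership bit (1 = full, 0 = limited) and the low 15 bits name the partition, which is why rm_pkey() and add_pkey() compare keys masked with 0x7FFF. 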
+ */ + if (okey & 0x7FFF) + changed |= rm_pkey(dd, okey); + if (key & 0x7FFF) { + int ret = add_pkey(dd, key); + + if (ret < 0) + key = 0; + else + changed |= ret; + } + pd->port_pkeys[i] = key; + } + if (changed) { + uint64_t pkey; + + pkey = (uint64_t) dd->ipath_pkeys[0] | + ((uint64_t) dd->ipath_pkeys[1] << 16) | + ((uint64_t) dd->ipath_pkeys[2] << 32) | + ((uint64_t) dd->ipath_pkeys[3] << 48); + _IPATH_VDBG("p0 new pkey reg %llx\n", pkey); + ipath_kput_kreg(pd->port_unit, kr_partitionkey, pkey); + } + return 0; +} + +/* + * Registers that vary with the chip implementation constants (port) + * use this routine. + */ +uint64_t ipath_kget_kreg64_port(const ipath_type stype, ipath_kreg regno, + unsigned port) +{ + ipath_kreg tmp = + (port < devdata[stype].ipath_portcnt && regno == kr_rcvhdraddr) ? + regno + port : + ((port < devdata[stype].ipath_portcnt + && regno == kr_rcvhdrtailaddr) ? regno + port : __kr_invalid); + return ipath_kget_kreg64(stype, tmp); +} + +/* + * Registers that vary with the chip implementation constants (port) + * use this routine. + */ +void ipath_kput_kreg_port(const ipath_type stype, ipath_kreg regno, + unsigned port, uint64_t value) +{ + ipath_kreg tmp = + (port < devdata[stype].ipath_portcnt && regno == kr_rcvhdraddr) ? + regno + port : + ((port < devdata[stype].ipath_portcnt + && regno == kr_rcvhdrtailaddr) ? regno + port : __kr_invalid); + ipath_kput_kreg(stype, tmp, value); +} + +EXPORT_SYMBOL(ipath_kset_linkstate); +EXPORT_SYMBOL(ipath_kset_mtu); +EXPORT_SYMBOL(ipath_layer_close); +EXPORT_SYMBOL(ipath_layer_get_bcast); +EXPORT_SYMBOL(ipath_layer_get_cr_errpkey); +EXPORT_SYMBOL(ipath_layer_get_deviceid); +EXPORT_SYMBOL(ipath_layer_get_flags); +EXPORT_SYMBOL(ipath_layer_get_guid); +EXPORT_SYMBOL(ipath_layer_get_ibmtu); +EXPORT_SYMBOL(ipath_layer_get_lastibcstat); +EXPORT_SYMBOL(ipath_layer_get_lid); +EXPORT_SYMBOL(ipath_layer_get_mac); +EXPORT_SYMBOL(ipath_layer_get_nguid); +EXPORT_SYMBOL(ipath_layer_get_num_of_dev); +EXPORT_SYMBOL(ipath_layer_get_pcidev); +EXPORT_SYMBOL(ipath_layer_open); +EXPORT_SYMBOL(ipath_layer_query_device); +EXPORT_SYMBOL(ipath_layer_register); +EXPORT_SYMBOL(ipath_layer_send); +EXPORT_SYMBOL(ipath_layer_set_guid); +EXPORT_SYMBOL(ipath_layer_set_piointbufavail_int); +EXPORT_SYMBOL(ipath_layer_snapshot_counters); +EXPORT_SYMBOL(ipath_layer_get_counters); +EXPORT_SYMBOL(ipath_layer_want_buffer); +EXPORT_SYMBOL(ipath_verbs_register); +EXPORT_SYMBOL(ipath_verbs_send); +EXPORT_SYMBOL(ipath_verbs_unregister); +EXPORT_SYMBOL(ipath_set_sps_lid); +EXPORT_SYMBOL(ipath_layer_enable_timer); +EXPORT_SYMBOL(ipath_layer_disable_timer); +EXPORT_SYMBOL(ipath_verbs_get_flags); +EXPORT_SYMBOL(ipath_verbs_set_flags); +EXPORT_SYMBOL(ipath_layer_get_npkeys); +EXPORT_SYMBOL(ipath_layer_get_pkey); +EXPORT_SYMBOL(ipath_layer_get_pkeys); +EXPORT_SYMBOL(ipath_layer_set_pkeys); -- 0.99.9n From rolandd at cisco.com Fri Dec 16 15:48:55 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 16 Dec 2005 15:48:55 -0800 Subject: [openib-general] [PATCH 09/13] [RFC] ipath IB driver headers In-Reply-To: <200512161548.y9KRuNtfMzpZjwni@cisco.com> Message-ID: <200512161548.zxp6FKcabEu47EnS@cisco.com> Headers for ipath IB driver --- drivers/infiniband/hw/ipath/ipath_verbs.h | 527 +++++++++++++++++++++++++++++ drivers/infiniband/hw/ipath/verbs_debug.h | 104 ++++++ 2 files changed, 631 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/hw/ipath/ipath_verbs.h create mode 100644 drivers/infiniband/hw/ipath/verbs_debug.h 
8b106f1a0a6cb02f702c4ef957acad1bf8225c7d diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h new file mode 100644 index 0000000..4a4c65a --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h @@ -0,0 +1,527 @@ +/* + * Copyright (c) 2005. PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + * + * $Id: ipath_verbs.h 4504 2005-12-16 06:15:47Z rjwalsh $ + */ + +#ifndef IPATH_VERBS_H +#define IPATH_VERBS_H + +#include +#include +#include +#include +#include + +#include "ipath_kernel.h" +#include "verbs_debug.h" + +#define CTL_IPATH_VERBS 0x70736e68 /* "spin" as a hex value, top level */ +#define CTL_IPATH_VERBS_FAULT 1 +#define CTL_IPATH_VERBS_DEBUG 2 + +/* + * Increment this value if any changes that break userspace ABI + * compatibility are made. + */ +#define IPATH_UVERBS_ABI_VERSION 1 + +/* + * Define an ib_cq_notify value that is not valid so we know when CQ + * notifications are armed. + */ +#define IB_CQ_NONE (IB_CQ_NEXT_COMP + 1) + +enum { + IB_RNR_NAK = 0x20, + + IB_NAK_PSN_ERROR = 0x60, + IB_NAK_INVALID_REQUEST = 0x61, + IB_NAK_REMOTE_ACCESS_ERROR = 0x62, + IB_NAK_REMOTE_OPERATIONAL_ERROR = 0x63, + IB_NAK_INVALID_RD_REQUEST = 0x64 +}; + +/* IB Performance Manager status values */ +enum { + IB_PMA_SAMPLE_STATUS_DONE = 0x00, + IB_PMA_SAMPLE_STATUS_STARTED = 0x01, + IB_PMA_SAMPLE_STATUS_RUNNING = 0x02 +}; + +/* Mandatory IB performance counter select values. 
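+ * These are the CounterSelect codes from the Performance Management class's PortSamplesControl attribute, kept in network byte order (hence __constant_htons) so they compare directly against fields of an incoming MAD. 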
*/ +#define IB_PMA_PORT_XMIT_DATA __constant_htons(0x0001) +#define IB_PMA_PORT_RCV_DATA __constant_htons(0x0002) +#define IB_PMA_PORT_XMIT_PKTS __constant_htons(0x0003) +#define IB_PMA_PORT_RCV_PKTS __constant_htons(0x0004) +#define IB_PMA_PORT_XMIT_WAIT __constant_htons(0x0005) + +struct ib_reth { + u64 vaddr; + u32 rkey; + u32 length; +} __attribute__ ((packed)); + +struct ib_atomic_eth { + u64 vaddr; + u32 rkey; + u64 swap_data; + u64 compare_data; +} __attribute__ ((packed)); + +struct ipath_other_headers { + uint32_t bth[3]; + union { + struct { + uint32_t deth[2]; + uint32_t imm_data; + } ud; + struct { + struct ib_reth reth; + uint32_t imm_data; + } rc; + struct { + uint32_t aeth; + uint64_t atomic_ack_eth; + } at; + uint32_t imm_data; + uint32_t aeth; + struct ib_atomic_eth atomic_eth; + } u; +} __attribute__ ((packed)); + +/* + * Note that UD packets with a GRH header are 8+40+12+8 = 68 bytes long + * (72 w/ imm_data). + * Only the first 56 bytes of the IB header will be in the + * eager header buffer. The remaining 12 or 16 bytes are in the data buffer. + */ +struct ipath_ib_header { + uint16_t lrh[4]; + union { + struct { + struct ib_grh grh; + struct ipath_other_headers oth; + } l; + struct ipath_other_headers oth; + } u; +} __attribute__ ((packed)); + +/* + * There is one struct ipath_mcast for each multicast GID. + * All attached QPs are then stored as a list of + * struct ipath_mcast_qp. + */ +struct ipath_mcast_qp { + struct list_head list; + struct ipath_qp *qp; +}; + +struct ipath_mcast { + struct rb_node rb_node; + union ib_gid mgid; + struct list_head qp_list; + wait_queue_head_t wait; + atomic_t refcount; +}; + +/* Memory region */ +struct ipath_mr { + struct ib_mr ibmr; + struct ipath_mregion mr; /* must be last */ +}; + +/* Fast memory region */ +struct ipath_fmr { + struct ib_fmr ibfmr; + u8 page_size; + struct ipath_mregion mr; /* must be last */ +}; + +/* Protection domain */ +struct ipath_pd { + struct ib_pd ibpd; + int user; /* non-zero if created from user space */ +}; + +/* Address Handle */ +struct ipath_ah { + struct ib_ah ibah; + struct ib_ah_attr attr; +}; + +/* + * Quick description of our CQ/QP locking scheme: + * + * We have one global lock that protects dev->cq/qp_table. Each + * struct ipath_cq/qp also has its own lock. An individual qp lock + * may be taken inside of an individual cq lock. Both cqs attached to + * a qp may be locked, with the send cq locked first. No other + * nesting should be done. + * + * Each struct ipath_cq/qp also has an atomic_t ref count. The + * pointer from the cq/qp_table to the struct counts as one reference. + * This reference also is good for access through the consumer API, so + * modifying the CQ/QP etc doesn't need to take another reference. + * Access because of a completion being polled does need a reference. + * + * Finally, each struct ipath_cq/qp has a wait_queue_head_t for the + * destroy function to sleep on. + * + * This means that access from the consumer API requires nothing but + * taking the struct's lock. 
+ * + * Access because of a completion event should go as follows: + * - lock cq/qp_table and look up struct + * - increment ref count in struct + * - drop cq/qp_table lock + * - lock struct, do your thing, and unlock struct + * - decrement ref count; if zero, wake up waiters + * + * To destroy a CQ/QP, we can do the following: + * - lock cq/qp_table, remove pointer, unlock cq/qp_table lock + * - decrement ref count + * - wait_event until ref count is zero + * + * It is the consumer's responsibility to make sure that no QP + * operations (WQE posting or state modification) are pending when the + * QP is destroyed. Also, the consumer must make sure that calls to + * qp_modify are serialized. + * + * Possible optimizations (wait for profile data to see if/where we + * have locks bouncing between CPUs): + * - split cq/qp table lock into n separate (cache-aligned) locks, + * indexed (say) by the page in the table + */ + +struct ipath_cq { + struct ib_cq ibcq; + struct tasklet_struct comptask; + spinlock_t lock; + u8 notify; + u8 triggered; + u32 head; /* new records added to the head */ + u32 tail; /* poll_cq() reads from here. */ + struct ib_wc queue[1]; /* this is actually ibcq.cqe + 1 */ +}; + +/* + * Send work request queue entry. + * The size of the sg_list is determined when the QP is created and stored + * in qp->s_max_sge. + */ +struct ipath_swqe { + struct ib_send_wr wr; /* don't use wr.sg_list */ + u32 psn; /* first packet sequence number */ + u32 lpsn; /* last packet sequence number */ + u32 ssn; /* send sequence number */ + u32 length; /* total length of data in sg_list */ + struct ipath_sge sg_list[0]; +}; + +/* + * Receive work request queue entry. + * The size of the sg_list is determined when the QP is created and stored + * in qp->r_max_sge. + */ +struct ipath_rwqe { + u64 wr_id; + u32 length; /* total length of data in sg_list */ + u8 num_sge; + struct ipath_sge sg_list[0]; +}; + +struct ipath_rq { + spinlock_t lock; + u32 head; /* new work requests posted to the head */ + u32 tail; /* receives pull requests from here. */ + u32 size; /* size of RWQE array */ + u8 max_sge; + struct ipath_rwqe *wq; /* RWQE array */ +}; + +struct ipath_srq { + struct ib_srq ibsrq; + struct ipath_rq rq; + u32 limit; /* send signal when number of RWQEs < limit */ +}; + +/* + * Variables prefixed with s_ are for the requester (sender). + * Variables prefixed with r_ are for the responder (receiver). + * Variables prefixed with ack_ are for responder replies. + * + * Common variables are protected by both r_rq.lock and s_lock in that order, + * which only happens in modify_qp() or changing the QP 'state'. 
+ */ +struct ipath_qp { + struct ib_qp ibqp; + struct ipath_qp *next; /* link list for QPN hash table */ + struct list_head piowait; /* link for wait PIO buf */ + struct list_head timerwait; /* link for waiting for timeouts */ + struct ib_ah_attr remote_ah_attr; + struct ipath_ib_header s_hdr; /* next packet header to send */ + atomic_t refcount; + wait_queue_head_t wait; + struct tasklet_struct s_task; + struct ipath_sge_state *s_cur_sge; + struct ipath_sge_state s_sge; /* current send request data */ + struct ipath_sge_state s_rdma_sge; /* current RDMA read send data */ + struct ipath_sge_state r_sge; /* current receive data */ + spinlock_t s_lock; + int s_flags; + u32 s_hdrwords; /* size of s_hdr in 32 bit words */ + u32 s_cur_size; /* size of send packet in bytes */ + u32 s_len; /* total length of s_sge */ + u32 s_rdma_len; /* total length of s_rdma_sge */ + u32 s_next_psn; /* PSN for next request */ + u32 s_last_psn; /* last response PSN processed */ + u32 s_psn; /* current packet sequence number */ + u32 s_rnr_timeout; /* number of milliseconds for RNR timeout */ + u32 s_ack_psn; /* PSN for next ACK or RDMA_READ */ + u64 s_ack_atomic; /* data for atomic ACK */ + u64 r_wr_id; /* ID for current receive WQE */ + u64 r_atomic_data; /* data for last atomic op */ + u32 r_atomic_psn; /* PSN of last atomic op */ + u32 r_len; /* total length of r_sge */ + u32 r_rcv_len; /* receive data len processed */ + u32 r_psn; /* expected rcv packet sequence number */ + u8 state; /* QP state */ + u8 s_state; /* opcode of last packet sent */ + u8 s_ack_state; /* opcode of packet to ACK */ + u8 s_nak_state; /* non-zero if NAK is pending */ + u8 r_state; /* opcode of last packet received */ + u8 r_reuse_sge; /* for UC receive errors */ + u8 r_sge_inx; /* current index into sg_list */ + u8 s_max_sge; /* size of s_wq->sg_list */ + u8 qp_access_flags; + u8 s_retry_cnt; /* number of times to retry */ + u8 s_rnr_retry_cnt; + u8 s_min_rnr_timer; + u8 s_retry; /* requester retry counter */ + u8 s_rnr_retry; /* requester RNR retry counter */ + u8 s_pkey_index; /* PKEY index to use */ + enum ib_mtu path_mtu; + atomic_t msn; /* message sequence number */ + u32 remote_qpn; + u32 qkey; /* QKEY for this QP (for UD or RD) */ + u32 s_size; /* send work queue size */ + u32 s_head; /* new entries added here */ + u32 s_tail; /* next entry to process */ + u32 s_cur; /* current work queue entry */ + u32 s_last; /* last un-ACK'ed entry */ + u32 s_ssn; /* SSN of tail entry */ + u32 s_lsn; /* limit sequence number (credit) */ + struct ipath_swqe *s_wq; /* send work queue */ + struct ipath_rq r_rq; /* receive work queue */ +}; + +/* + * Bit definitions for s_flags. + */ +#define IPATH_S_BUSY 0 +#define IPATH_S_SIGNAL_REQ_WR 1 + +/* + * Since struct ipath_swqe is not a fixed size, we can't simply index into + * struct ipath_qp.s_wq. This function does the array index computation. + */ +static inline struct ipath_swqe *get_swqe_ptr(struct ipath_qp *qp, unsigned n) +{ + return (struct ipath_swqe *)((char *) qp->s_wq + + (sizeof(struct ipath_swqe) + + qp->s_max_sge * sizeof(struct ipath_sge)) * n); +} + +/* + * Since struct ipath_rwqe is not a fixed size, we can't simply index into + * struct ipath_rq.wq. This function does the array index computation. 
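+ * For example, with max_sge == 4 each entry occupies sizeof(struct ipath_rwqe) + 4 * sizeof(struct ipath_sge) bytes, and entry n begins n such strides past rq->wq. 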
+ */ +static inline struct ipath_rwqe *get_rwqe_ptr(struct ipath_rq *rq, unsigned n) +{ + return (struct ipath_rwqe *)((char *) rq->wq + + (sizeof(struct ipath_rwqe) + + rq->max_sge * sizeof(struct ipath_sge)) * n); +} + +#define QPN_MAX (1 << 24) +#define QPNMAP_ENTRIES (QPN_MAX / PAGE_SIZE / 8) +#define BITS_PER_PAGE (PAGE_SIZE*8) +#define BITS_PER_PAGE_MASK (BITS_PER_PAGE-1) +#define mk_qpn(qpt, map, off) (((map) - (qpt)->map)*BITS_PER_PAGE + (off)) +#define find_next_offset(map, off) \ + find_next_zero_bit((map)->page, BITS_PER_PAGE, off) + +/* + * QPN-map pages start out as NULL, they get allocated upon + * first use and are never deallocated. This way, + * large bitmaps are not allocated unless large numbers of QPs are used. + */ +struct qpn_map { + atomic_t n_free; + void *page; +}; + +struct ipath_qp_table { + spinlock_t lock; + u32 last; /* last QP number allocated */ + u32 max; /* size of the hash table */ + u32 nmaps; /* size of the map table */ + struct ipath_qp **table; + struct qpn_map map[QPNMAP_ENTRIES]; /* bit map of free numbers */ +}; + +struct ipath_lkey_table { + spinlock_t lock; + u32 next; /* next unused index (speeds search) */ + u32 gen; /* generation count */ + u32 max; /* size of the table */ + struct ipath_mregion **table; +}; + +struct ipath_opcode_stats { + u64 n_packets; /* number of packets */ + u64 n_bytes; /* total number of bytes */ +}; + +struct ipath_ibdev { + struct ib_device ibdev; + ipath_type ib_unit; /* This is the device number */ + u16 sm_lid; /* in host order */ + u8 sm_sl; + u8 mkeyprot_resv_lmc; + unsigned long mkey_lease_timeout; /* non-zero when timer is set */ + + /* The following fields are really per port. */ + struct ipath_qp_table qp_table; + struct ipath_lkey_table lk_table; + struct list_head pending[3]; /* FIFO of QPs waiting for ACKs */ + struct list_head piowait; /* list for wait PIO buf */ + struct list_head rnrwait; /* list of QPs waiting for RNR timer */ + spinlock_t pending_lock; + __be64 sys_image_guid; /* in network order */ + __be64 gid_prefix; /* in network order */ + __be64 mkey; + u64 ipath_sword; /* total dwords sent (sample result) */ + u64 ipath_rword; /* total dwords received (sample result) */ + u64 ipath_spkts; /* total packets sent (sample result) */ + u64 ipath_rpkts; /* total packets received (sample result) */ + u64 n_multicast_xmit; /* total multicast packets sent */ + u64 n_multicast_rcv; /* total multicast packets received */ + u32 n_rc_resends; + u32 n_rc_acks; + u32 n_rc_qacks; + u32 n_seq_naks; + u32 n_rdma_seq; + u32 n_rnr_naks; + u32 n_other_naks; + u32 n_timeouts; + u32 n_pkt_drops; + u32 n_wqe_errs; + u32 n_rdma_dup_busy; + u32 n_piowait; + u32 n_no_piobuf; + u32 port_cap_flags; + u32 pma_sample_start; + u32 pma_sample_interval; + __be16 pma_counter_select[5]; + u16 pma_tag; + u16 qkey_violations; + u16 mkey_violations; + u16 mkey_lease_period; + u16 pending_index; /* which pending queue is active */ + u8 pma_sample_status; + u8 subnet_timeout; + struct ipath_opcode_stats opstats[128]; +}; + +struct ipath_ucontext { + struct ib_ucontext ibucontext; +}; + +static inline struct ipath_mr *to_imr(struct ib_mr *ibmr) +{ + return container_of(ibmr, struct ipath_mr, ibmr); +} + +static inline struct ipath_fmr *to_ifmr(struct ib_fmr *ibfmr) +{ + return container_of(ibfmr, struct ipath_fmr, ibfmr); +} + +static inline struct ipath_pd *to_ipd(struct ib_pd *ibpd) +{ + return container_of(ibpd, struct ipath_pd, ibpd); +} + +static inline struct ipath_ah *to_iah(struct ib_ah *ibah) +{ + return container_of(ibah, 
struct ipath_ah, ibah); +} + +static inline struct ipath_cq *to_icq(struct ib_cq *ibcq) +{ + return container_of(ibcq, struct ipath_cq, ibcq); +} + +static inline struct ipath_srq *to_isrq(struct ib_srq *ibsrq) +{ + return container_of(ibsrq, struct ipath_srq, ibsrq); +} + +static inline struct ipath_qp *to_iqp(struct ib_qp *ibqp) +{ + return container_of(ibqp, struct ipath_qp, ibqp); +} + +static inline struct ipath_ibdev *to_idev(struct ib_device *ibdev) +{ + return container_of(ibdev, struct ipath_ibdev, ibdev); +} + +int ipath_process_mad(struct ib_device *ibdev, + int mad_flags, + u8 port_num, + struct ib_wc *in_wc, + struct ib_grh *in_grh, + struct ib_mad *in_mad, struct ib_mad *out_mad); + +static inline struct ipath_ucontext *to_iucontext(struct ib_ucontext + *ibucontext) +{ + return container_of(ibucontext, struct ipath_ucontext, ibucontext); +} + +#endif /* IPATH_VERBS_H */ diff --git a/drivers/infiniband/hw/ipath/verbs_debug.h b/drivers/infiniband/hw/ipath/verbs_debug.h new file mode 100644 index 0000000..21e0f8c --- /dev/null +++ b/drivers/infiniband/hw/ipath/verbs_debug.h @@ -0,0 +1,104 @@ +/* + * Copyright (c) 2003, 2004, 2005. PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + * + * $Id: verbs_debug.h 4365 2005-12-10 00:04:16Z rjwalsh $ + */ + +#ifndef _VERBS_DEBUG_H +#define _VERBS_DEBUG_H + +/* + * This file contains tracing code for the ib_ipath kernel module. + */ +#ifndef _VERBS_DEBUGGING /* tracing enabled or not */ +#define _VERBS_DEBUGGING 1 +#endif + +extern unsigned ib_ipath_debug; + +#define __VERBS_ERRID "ib_ipath" +#define __VERBS_UNIT_ERRID(unit) ipath_get_unit_name(unit) + +#define _VERBS_ERROR(fmt,...) do { \ + printk (KERN_ERR "%s: " fmt, __VERBS_ERRID,##__VA_ARGS__); \ + } while(0) + +#define _VERBS_UNIT_ERROR(unit,fmt,...) do { \ + printk (KERN_ERR "%s: " fmt, __VERBS_UNIT_ERRID(unit),##__VA_ARGS__); \ + } while(0) + +#if _VERBS_DEBUGGING + +/* + * Mask values for debugging. 
The scheme allows us to compile out any of + * the debug tracing stuff, and if compiled in, to enable or disable it dynamically. + * This can be set at modprobe time also: + * modprobe ib_ipath ib_ipath_debug=3 + */ +#define __VERBS_INFO 0x1 /* generic low verbosity stuff */ +#define __VERBS_DBG 0x2 /* generic debug */ +#define __VERBS_VDBG 0x4 /* verbose debug */ +#define __VERBS_SMADBG 0x8000 /* sma packet debug */ + +#define _VERBS_INFO(fmt,...) do { \ + if(unlikely(ib_ipath_debug&__VERBS_INFO)) \ + printk (KERN_INFO "%s: " fmt,__VERBS_ERRID,##__VA_ARGS__); \ + } while(0) + +#define _VERBS_DBG(fmt,...) do { \ + if(unlikely(ib_ipath_debug&__VERBS_DBG)) \ + printk (KERN_DEBUG "%s: " fmt, __func__,##__VA_ARGS__); \ + } while(0) + +#define _VERBS_VDBG(fmt,...) do { \ + if(unlikely(ib_ipath_debug&__VERBS_VDBG)) \ + printk (KERN_DEBUG "%s: " fmt, __func__,##__VA_ARGS__); \ + } while(0) + +#define _VERBS_SMADBG(fmt,...) do { \ + if(unlikely(ib_ipath_debug&__VERBS_SMADBG)) \ + printk (KERN_DEBUG "%s: " fmt, __func__,##__VA_ARGS__); \ + } while(0) + +#else /* ! _VERBS_DEBUGGING */ + +#define _VERBS_INFO(fmt,...) +#define _VERBS_DBG(fmt,...) +#define _VERBS_VDBG(fmt,...) +#define _VERBS_SMADBG(fmt,...) + +#endif /* _VERBS_DEBUGGING */ + +#endif /* _VERBS_DEBUG_H */ -- 0.99.9n From rolandd at cisco.com Fri Dec 16 15:48:55 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 16 Dec 2005 15:48:55 -0800 Subject: [openib-general] [PATCH 07/13] [RFC] ipath core misc files In-Reply-To: <200512161548.KglSM2YESlGlEQfQ@cisco.com> Message-ID: <200512161548.3fqe3fMerrheBMdX@cisco.com> Misc ipath LLD files --- drivers/infiniband/hw/ipath/ipath_ht400.c | 1164 +++++++++++++++++++++++++++++ drivers/infiniband/hw/ipath/ipath_i2c.c | 472 ++++++++++++ drivers/infiniband/hw/ipath/ipath_lib.c | 92 ++ drivers/infiniband/hw/ipath/ipath_mlock.c | 139 +++ 4 files changed, 1867 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/hw/ipath/ipath_ht400.c create mode 100644 drivers/infiniband/hw/ipath/ipath_i2c.c create mode 100644 drivers/infiniband/hw/ipath/ipath_lib.c create mode 100644 drivers/infiniband/hw/ipath/ipath_mlock.c 8d0f07cdb6b7e4f243e31f5ac9fddb4225908062 diff --git a/drivers/infiniband/hw/ipath/ipath_ht400.c b/drivers/infiniband/hw/ipath/ipath_ht400.c new file mode 100644 index 0000000..2d5b795 --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_ht400.c @@ -0,0 +1,1164 @@ +/* + * Copyright (c) 2003, 2004, 2005. PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + * + * $Id: ipath_ht400.c 4491 2005-12-15 22:20:31Z rjwalsh $ + */ + +/* + * The first part of this file is shared with the diags, the second + * part is used only in the kernel. + */ + +#include /* for offsetof */ + +#ifdef __KERNEL__ +#include +#include +#include +#include +#include "ipath_kernel.h" +#else +#include +#include +#endif + +#include "ipath_registers.h" +#include "ipath_common.h" + +/* + * This lists the InfiniPath registers, in the actual chip layout. This + * structure should never be directly accessed. It is included by the + * user mode diags, and so must be able to be compiled in both user + * and kernel mode. + */ +struct _infinipath_do_not_use_kernel_regs { + unsigned long long Revision; + unsigned long long Control; + unsigned long long PageAlign; + unsigned long long PortCnt; + unsigned long long DebugPortSelect; + unsigned long long DebugPort; + unsigned long long SendRegBase; + unsigned long long UserRegBase; + unsigned long long CounterRegBase; + unsigned long long Scratch; + unsigned long long ReservedMisc1; + unsigned long long InterruptConfig; + unsigned long long IntBlocked; + unsigned long long IntMask; + unsigned long long IntStatus; + unsigned long long IntClear; + unsigned long long ErrorMask; + unsigned long long ErrorStatus; + unsigned long long ErrorClear; + unsigned long long HwErrMask; + unsigned long long HwErrStatus; + unsigned long long HwErrClear; + unsigned long long HwDiagCtrl; + unsigned long long MDIO; + unsigned long long IBCStatus; + unsigned long long IBCCtrl; + unsigned long long ExtStatus; + unsigned long long ExtCtrl; + unsigned long long GPIOOut; + unsigned long long GPIOMask; + unsigned long long GPIOStatus; + unsigned long long GPIOClear; + unsigned long long RcvCtrl; + unsigned long long RcvBTHQP; + unsigned long long RcvHdrSize; + unsigned long long RcvHdrCnt; + unsigned long long RcvHdrEntSize; + unsigned long long RcvTIDBase; + unsigned long long RcvTIDCnt; + unsigned long long RcvEgrBase; + unsigned long long RcvEgrCnt; + unsigned long long RcvBufBase; + unsigned long long RcvBufSize; + unsigned long long RxIntMemBase; + unsigned long long RxIntMemSize; + unsigned long long RcvPartitionKey; + unsigned long long ReservedRcv[10]; + unsigned long long SendCtrl; + unsigned long long SendPIOBufBase; + unsigned long long SendPIOSize; + unsigned long long SendPIOBufCnt; + unsigned long long SendPIOAvailAddr; + unsigned long long TxIntMemBase; + unsigned long long TxIntMemSize; + unsigned long long ReservedSend[9]; + unsigned long long SendBufferError; + unsigned long long SendBufferErrorCONT1; + unsigned long long SendBufferErrorCONT2; + unsigned long long SendBufferErrorCONT3; + unsigned long long ReservedSBE[4]; + unsigned long long RcvHdrAddr0; + unsigned long long RcvHdrAddr1; + unsigned long long RcvHdrAddr2; + unsigned long long RcvHdrAddr3; + unsigned long long RcvHdrAddr4; + unsigned long long RcvHdrAddr5; + unsigned long long RcvHdrAddr6; + unsigned long long RcvHdrAddr7; + unsigned long long RcvHdrAddr8; + unsigned long long ReservedRHA[7]; + unsigned long long RcvHdrTailAddr0; + unsigned long long 
RcvHdrTailAddr1; + unsigned long long RcvHdrTailAddr2; + unsigned long long RcvHdrTailAddr3; + unsigned long long RcvHdrTailAddr4; + unsigned long long RcvHdrTailAddr5; + unsigned long long RcvHdrTailAddr6; + unsigned long long RcvHdrTailAddr7; + unsigned long long RcvHdrTailAddr8; + unsigned long long ReservedRHTA[7]; + unsigned long long Sync; /* Software only */ + unsigned long long Dump; /* Software only */ + unsigned long long SimVer; /* Software only */ + unsigned long long ReservedSW[5]; + unsigned long long SerdesConfig0; + unsigned long long SerdesConfig1; + unsigned long long SerdesStatus; + unsigned long long XGXSConfig; + unsigned long long ReservedSW2[4]; +}; + +#ifdef __KERNEL__ /* kernel uses reg#; diags use offset in bytes, not reg # */ +#define IPATH_KREG_OFFSET(field) (offsetof(struct \ + _infinipath_do_not_use_kernel_regs, field) / sizeof(uint64_t)) +#define IPATH_CREG_OFFSET(field) (offsetof( \ + struct infinipath_counters, field) / sizeof(uint64_t)) +#else /* diags */ +#define IPATH_KREG_OFFSET(field) (offsetof(struct \ + _infinipath_do_not_use_kernel_regs, field)) +#define IPATH_CREG_OFFSET(field) (offsetof( \ + struct infinipath_counters, field)) +#endif /* __KERNEL__ */ + +ipath_kreg + kr_control = IPATH_KREG_OFFSET(Control), + kr_counterregbase = IPATH_KREG_OFFSET(CounterRegBase), + kr_debugport = IPATH_KREG_OFFSET(DebugPort), + kr_debugportselect = IPATH_KREG_OFFSET(DebugPortSelect), + kr_errorclear = IPATH_KREG_OFFSET(ErrorClear), + kr_errormask = IPATH_KREG_OFFSET(ErrorMask), + kr_errorstatus = IPATH_KREG_OFFSET(ErrorStatus), + kr_extctrl = IPATH_KREG_OFFSET(ExtCtrl), + kr_extstatus = IPATH_KREG_OFFSET(ExtStatus), + kr_gpio_clear = IPATH_KREG_OFFSET(GPIOClear), + kr_gpio_mask = IPATH_KREG_OFFSET(GPIOMask), + kr_gpio_out = IPATH_KREG_OFFSET(GPIOOut), + kr_gpio_status = IPATH_KREG_OFFSET(GPIOStatus), + kr_hwdiagctrl = IPATH_KREG_OFFSET(HwDiagCtrl), + kr_hwerrclear = IPATH_KREG_OFFSET(HwErrClear), + kr_hwerrmask = IPATH_KREG_OFFSET(HwErrMask), + kr_hwerrstatus = IPATH_KREG_OFFSET(HwErrStatus), + kr_ibcctrl = IPATH_KREG_OFFSET(IBCCtrl), + kr_ibcstatus = IPATH_KREG_OFFSET(IBCStatus), + kr_intblocked = IPATH_KREG_OFFSET(IntBlocked), + kr_intclear = IPATH_KREG_OFFSET(IntClear), + kr_interruptconfig = IPATH_KREG_OFFSET(InterruptConfig), + kr_intmask = IPATH_KREG_OFFSET(IntMask), + kr_intstatus = IPATH_KREG_OFFSET(IntStatus), + kr_mdio = IPATH_KREG_OFFSET(MDIO), + kr_pagealign = IPATH_KREG_OFFSET(PageAlign), + kr_partitionkey = IPATH_KREG_OFFSET(RcvPartitionKey), + kr_portcnt = IPATH_KREG_OFFSET(PortCnt), + kr_rcvbthqp = IPATH_KREG_OFFSET(RcvBTHQP), + kr_rcvbufbase = IPATH_KREG_OFFSET(RcvBufBase), + kr_rcvbufsize = IPATH_KREG_OFFSET(RcvBufSize), + kr_rcvctrl = IPATH_KREG_OFFSET(RcvCtrl), + kr_rcvegrbase = IPATH_KREG_OFFSET(RcvEgrBase), + kr_rcvegrcnt = IPATH_KREG_OFFSET(RcvEgrCnt), + kr_rcvhdrcnt = IPATH_KREG_OFFSET(RcvHdrCnt), + kr_rcvhdrentsize = IPATH_KREG_OFFSET(RcvHdrEntSize), + kr_rcvhdrsize = IPATH_KREG_OFFSET(RcvHdrSize), + kr_rcvintmembase = IPATH_KREG_OFFSET(RxIntMemBase), + kr_rcvintmemsize = IPATH_KREG_OFFSET(RxIntMemSize), + kr_rcvtidbase = IPATH_KREG_OFFSET(RcvTIDBase), + kr_rcvtidcnt = IPATH_KREG_OFFSET(RcvTIDCnt), + kr_revision = IPATH_KREG_OFFSET(Revision), + kr_scratch = IPATH_KREG_OFFSET(Scratch), + kr_sendbuffererror = IPATH_KREG_OFFSET(SendBufferError), + kr_sendbuffererror1 = IPATH_KREG_OFFSET(SendBufferErrorCONT1), + kr_sendbuffererror2 = IPATH_KREG_OFFSET(SendBufferErrorCONT2), + kr_sendbuffererror3 = IPATH_KREG_OFFSET(SendBufferErrorCONT3), + 
kr_sendctrl = IPATH_KREG_OFFSET(SendCtrl), + kr_sendpioavailaddr = IPATH_KREG_OFFSET(SendPIOAvailAddr), + kr_sendpiobufbase = IPATH_KREG_OFFSET(SendPIOBufBase), + kr_sendpiobufcnt = IPATH_KREG_OFFSET(SendPIOBufCnt), + kr_sendpiosize = IPATH_KREG_OFFSET(SendPIOSize), + kr_sendregbase = IPATH_KREG_OFFSET(SendRegBase), + kr_txintmembase = IPATH_KREG_OFFSET(TxIntMemBase), + kr_txintmemsize = IPATH_KREG_OFFSET(TxIntMemSize), + kr_userregbase = IPATH_KREG_OFFSET(UserRegBase), + /* no simulator, register not used */ + kr_sync = IPATH_KREG_OFFSET(Scratch), + /* no simulator, register not used */ + kr_dump = IPATH_KREG_OFFSET(Scratch), + /* no simulator, register not used */ + kr_simver = IPATH_KREG_OFFSET(Scratch), + /* onchip serdes */ + kr_serdesconfig0 = IPATH_KREG_OFFSET(SerdesConfig0), + /* onchip serdes */ + kr_serdesconfig1 = IPATH_KREG_OFFSET(SerdesConfig1), + /* onchip serdes */ + kr_serdesstatus = IPATH_KREG_OFFSET(SerdesStatus), + /* onchip serdes */ + kr_xgxsconfig = IPATH_KREG_OFFSET(XGXSConfig), + /* + * last valid direct use register other than diag-only registers + */ + __kr_lastvaliddirect = IPATH_KREG_OFFSET(ReservedSW2[0]), + /* always invalid for initializing */ + __kr_invalid = IPATH_KREG_OFFSET(ReservedSW2[0]) + 1, + /* + * These should not be used directly via ipath_kget_kreg64(), + * use them with ipath_kget_kreg64_port() + */ + kr_rcvhdraddr = IPATH_KREG_OFFSET(RcvHdrAddr0), /* not for direct use */ + /* not for direct use */ + kr_rcvhdrtailaddr = IPATH_KREG_OFFSET(RcvHdrTailAddr0), + /* we define the full set for the diags, the kernel doesn't use them */ + kr_rcvhdraddr1 = IPATH_KREG_OFFSET(RcvHdrAddr1), + kr_rcvhdraddr2 = IPATH_KREG_OFFSET(RcvHdrAddr2), + kr_rcvhdraddr3 = IPATH_KREG_OFFSET(RcvHdrAddr3), + kr_rcvhdraddr4 = IPATH_KREG_OFFSET(RcvHdrAddr4), + kr_rcvhdrtailaddr1 = IPATH_KREG_OFFSET(RcvHdrTailAddr1), + kr_rcvhdrtailaddr2 = IPATH_KREG_OFFSET(RcvHdrTailAddr2), + kr_rcvhdrtailaddr3 = IPATH_KREG_OFFSET(RcvHdrTailAddr3), + kr_rcvhdrtailaddr4 = IPATH_KREG_OFFSET(RcvHdrTailAddr4), + kr_rcvhdraddr5 = IPATH_KREG_OFFSET(RcvHdrAddr5), + kr_rcvhdraddr6 = IPATH_KREG_OFFSET(RcvHdrAddr6), + kr_rcvhdraddr7 = IPATH_KREG_OFFSET(RcvHdrAddr7), + kr_rcvhdraddr8 = IPATH_KREG_OFFSET(RcvHdrAddr8), + kr_rcvhdrtailaddr5 = IPATH_KREG_OFFSET(RcvHdrTailAddr5), + kr_rcvhdrtailaddr6 = IPATH_KREG_OFFSET(RcvHdrTailAddr6), + kr_rcvhdrtailaddr7 = IPATH_KREG_OFFSET(RcvHdrTailAddr7), + kr_rcvhdrtailaddr8 = IPATH_KREG_OFFSET(RcvHdrTailAddr8); + +/* + * first of the pioavail registers, the total number is + * (kr_sendpiobufcnt / 32); each buffer uses 2 bits + * More properly, it's: + * (kr_sendpiobufcnt / ((sizeof(uint64_t)*BITS_PER_BYTE)/2)) + */ +ipath_sreg sr_sendpioavail = 0; + +ipath_creg + cr_badformatcnt = IPATH_CREG_OFFSET(RxBadFormatCnt), + cr_erricrccnt = IPATH_CREG_OFFSET(RxICRCErrCnt), + cr_errlinkcnt = IPATH_CREG_OFFSET(RxLinkProblemCnt), + cr_errlpcrccnt = IPATH_CREG_OFFSET(RxLPCRCErrCnt), + cr_errpkey = IPATH_CREG_OFFSET(RxPKeyMismatchCnt), + cr_errrcvflowctrlcnt = IPATH_CREG_OFFSET(RxFlowCtrlErrCnt), + cr_err_rlencnt = IPATH_CREG_OFFSET(RxLenErrCnt), + cr_errslencnt = IPATH_CREG_OFFSET(TxLenErrCnt), + cr_errtidfull = IPATH_CREG_OFFSET(RxTIDFullErrCnt), + cr_errtidvalid = IPATH_CREG_OFFSET(RxTIDValidErrCnt), + cr_errvcrccnt = IPATH_CREG_OFFSET(RxVCRCErrCnt), + cr_ibstatuschange = IPATH_CREG_OFFSET(IBStatusChangeCnt), + /* calc from Reg_CounterRegBase + offset */ + cr_intcnt = IPATH_CREG_OFFSET(LBIntCnt), + cr_invalidrlencnt = IPATH_CREG_OFFSET(RxMaxMinLenErrCnt), + cr_invalidslencnt 
= IPATH_CREG_OFFSET(TxMaxMinLenErrCnt), + cr_lbflowstallcnt = IPATH_CREG_OFFSET(LBFlowStallCnt), + cr_pktrcvcnt = IPATH_CREG_OFFSET(RxDataPktCnt), + cr_pktrcvflowctrlcnt = IPATH_CREG_OFFSET(RxFlowPktCnt), + cr_pktsendcnt = IPATH_CREG_OFFSET(TxDataPktCnt), + cr_pktsendflowcnt = IPATH_CREG_OFFSET(TxFlowPktCnt), + cr_portovflcnt = IPATH_CREG_OFFSET(RxP0HdrEgrOvflCnt), + cr_portovflcnt1 = IPATH_CREG_OFFSET(RxP1HdrEgrOvflCnt), + cr_portovflcnt2 = IPATH_CREG_OFFSET(RxP2HdrEgrOvflCnt), + cr_portovflcnt3 = IPATH_CREG_OFFSET(RxP3HdrEgrOvflCnt), + cr_portovflcnt4 = IPATH_CREG_OFFSET(RxP4HdrEgrOvflCnt), + cr_portovflcnt5 = IPATH_CREG_OFFSET(RxP5HdrEgrOvflCnt), + cr_portovflcnt6 = IPATH_CREG_OFFSET(RxP6HdrEgrOvflCnt), + cr_portovflcnt7 = IPATH_CREG_OFFSET(RxP7HdrEgrOvflCnt), + cr_portovflcnt8 = IPATH_CREG_OFFSET(RxP8HdrEgrOvflCnt), + cr_rcvebpcnt = IPATH_CREG_OFFSET(RxEBPCnt), + cr_rcvovflcnt = IPATH_CREG_OFFSET(RxBufOvflCnt), + cr_senddropped = IPATH_CREG_OFFSET(TxDroppedPktCnt), + cr_sendstallcnt = IPATH_CREG_OFFSET(TxFlowStallCnt), + cr_sendunderruncnt = IPATH_CREG_OFFSET(TxUnderrunCnt), + cr_wordrcvcnt = IPATH_CREG_OFFSET(RxDwordCnt), + cr_wordsendcnt = IPATH_CREG_OFFSET(TxDwordCnt), + cr_unsupvlcnt = IPATH_CREG_OFFSET(TxUnsupVLErrCnt), + cr_rxdroppktcnt = IPATH_CREG_OFFSET(RxDroppedPktCnt), + cr_iblinkerrrecovcnt = IPATH_CREG_OFFSET(IBLinkErrRecoveryCnt), + cr_iblinkdowncnt = IPATH_CREG_OFFSET(IBLinkDownedCnt), + cr_ibsymbolerrcnt = IPATH_CREG_OFFSET(IBSymbolErrCnt); + +/* kr_sendctrl bits */ +#define INFINIPATH_S_DISARMPIOBUF_MASK 0xFF + +/* kr_rcvctrl bits */ +#define INFINIPATH_R_PORTENABLE_MASK 0x1FF +#define INFINIPATH_R_INTRAVAIL_MASK 0x1FF + +/* kr_intstatus, kr_intclear, kr_intmask bits */ +#define INFINIPATH_I_RCVURG_MASK 0x1FF +#define INFINIPATH_I_RCVAVAIL_MASK 0x1FF + +/* kr_hwerrclear, kr_hwerrmask, kr_hwerrstatus, bits */ +#define INFINIPATH_HWE_HTCMEMPARITYERR_MASK 0x3FFFFFULL +#define INFINIPATH_HWE_HTCLNKABYTE0CRCERR 0x0000000000800000ULL +#define INFINIPATH_HWE_HTCLNKABYTE1CRCERR 0x0000000001000000ULL +#define INFINIPATH_HWE_HTCLNKBBYTE0CRCERR 0x0000000002000000ULL +#define INFINIPATH_HWE_HTCLNKBBYTE1CRCERR 0x0000000004000000ULL +#define INFINIPATH_HWE_HTCMISCERR4 0x0000000008000000ULL +#define INFINIPATH_HWE_HTCMISCERR5 0x0000000010000000ULL +#define INFINIPATH_HWE_HTCMISCERR6 0x0000000020000000ULL +#define INFINIPATH_HWE_HTCMISCERR7 0x0000000040000000ULL +#define INFINIPATH_HWE_MEMBISTFAILED 0x0040000000000000ULL +#define INFINIPATH_HWE_COREPLL_FBSLIP 0x0080000000000000ULL +#define INFINIPATH_HWE_COREPLL_RFSLIP 0x0100000000000000ULL +#define INFINIPATH_HWE_HTBPLL_FBSLIP 0x0200000000000000ULL +#define INFINIPATH_HWE_HTBPLL_RFSLIP 0x0400000000000000ULL +#define INFINIPATH_HWE_HTAPLL_FBSLIP 0x0800000000000000ULL +#define INFINIPATH_HWE_HTAPLL_RFSLIP 0x1000000000000000ULL +#define INFINIPATH_HWE_EXTSERDESPLLFAILED 0x2000000000000000ULL + +/* kr_hwdiagctrl bits */ +#define INFINIPATH_DC_NUMHTMEMS 22 + +/* kr_extstatus bits */ +#define INFINIPATH_EXTS_FREQSEL 0x2 +#define INFINIPATH_EXTS_SERDESSEL 0x4 +#define INFINIPATH_EXTS_MEMBIST_ENDTEST 0x0000000000004000 +#define INFINIPATH_EXTS_MEMBIST_CORRECT 0x0000000000008000 + +/* kr_extctrl bits */ + +/* + * masks and bits that are different in different chips, or present only + * in one + */ +const uint32_t infinipath_i_rcvavail_mask = INFINIPATH_I_RCVAVAIL_MASK; +const uint32_t infinipath_i_rcvurg_mask = INFINIPATH_I_RCVURG_MASK; +const uint64_t infinipath_hwe_htcmemparityerr_mask = + INFINIPATH_HWE_HTCMEMPARITYERR_MASK; + +const 
uint64_t infinipath_hwe_spibdcmlockfailed_mask = 0ULL; +const uint64_t infinipath_hwe_sphtdcmlockfailed_mask = 0ULL; +const uint64_t infinipath_hwe_htcdcmlockfailed_mask = 0ULL; +const uint64_t infinipath_hwe_htcdcmlockfailed_shift = 0ULL; +const uint64_t infinipath_hwe_sphtdcmlockfailed_shift = 0ULL; +const uint64_t infinipath_hwe_spibdcmlockfailed_shift = 0ULL; + +const uint64_t infinipath_hwe_htclnkabyte0crcerr = + INFINIPATH_HWE_HTCLNKABYTE0CRCERR; +const uint64_t infinipath_hwe_htclnkabyte1crcerr = + INFINIPATH_HWE_HTCLNKABYTE1CRCERR; +const uint64_t infinipath_hwe_htclnkbbyte0crcerr = + INFINIPATH_HWE_HTCLNKBBYTE0CRCERR; +const uint64_t infinipath_hwe_htclnkbbyte1crcerr = + INFINIPATH_HWE_HTCLNKBBYTE1CRCERR; + +const uint64_t infinipath_c_bitsextant = + (INFINIPATH_C_FREEZEMODE | INFINIPATH_C_LINKENABLE); + +const uint64_t infinipath_s_bitsextant = + (INFINIPATH_S_ABORT | INFINIPATH_S_PIOINTBUFAVAIL | + INFINIPATH_S_PIOBUFAVAILUPD | INFINIPATH_S_PIOENABLE | + INFINIPATH_S_DISARM | + (INFINIPATH_S_DISARMPIOBUF_MASK << INFINIPATH_S_DISARMPIOBUF_SHIFT)); + +const uint64_t infinipath_r_bitsextant = + ((INFINIPATH_R_PORTENABLE_MASK << INFINIPATH_R_PORTENABLE_SHIFT) | + (INFINIPATH_R_INTRAVAIL_MASK << INFINIPATH_R_INTRAVAIL_SHIFT) | + INFINIPATH_R_TAILUPD); + +const uint64_t infinipath_i_bitsextant = + ((INFINIPATH_I_RCVURG_MASK << INFINIPATH_I_RCVURG_SHIFT) | + (INFINIPATH_I_RCVAVAIL_MASK << INFINIPATH_I_RCVAVAIL_SHIFT) | + INFINIPATH_I_ERROR | INFINIPATH_I_SPIOSENT | + INFINIPATH_I_SPIOBUFAVAIL | INFINIPATH_I_GPIO); + +const uint64_t infinipath_e_bitsextant = + (INFINIPATH_E_RFORMATERR | INFINIPATH_E_RVCRC | INFINIPATH_E_RICRC | + INFINIPATH_E_RMINPKTLEN | INFINIPATH_E_RMAXPKTLEN | + INFINIPATH_E_RLONGPKTLEN | INFINIPATH_E_RSHORTPKTLEN | + INFINIPATH_E_RUNEXPCHAR | INFINIPATH_E_RUNSUPVL | INFINIPATH_E_REBP | + INFINIPATH_E_RIBFLOW | INFINIPATH_E_RBADVERSION | + INFINIPATH_E_RRCVEGRFULL | INFINIPATH_E_RRCVHDRFULL | + INFINIPATH_E_RBADTID | INFINIPATH_E_RHDRLEN | + INFINIPATH_E_RHDR | INFINIPATH_E_RIBLOSTLINK | + INFINIPATH_E_SMINPKTLEN | INFINIPATH_E_SMAXPKTLEN | + INFINIPATH_E_SUNDERRUN | INFINIPATH_E_SPKTLEN | + INFINIPATH_E_SDROPPEDSMPPKT | INFINIPATH_E_SDROPPEDDATAPKT | + INFINIPATH_E_SPIOARMLAUNCH | INFINIPATH_E_SUNEXPERRPKTNUM | + INFINIPATH_E_SUNSUPVL | INFINIPATH_E_IBSTATUSCHANGED | + INFINIPATH_E_INVALIDADDR | INFINIPATH_E_RESET | INFINIPATH_E_HARDWARE); + +const uint64_t infinipath_hwe_bitsextant = + (INFINIPATH_HWE_HTCMEMPARITYERR_MASK << + INFINIPATH_HWE_HTCMEMPARITYERR_SHIFT) | + (INFINIPATH_HWE_TXEMEMPARITYERR_MASK << + INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT) | + (INFINIPATH_HWE_RXEMEMPARITYERR_MASK << + INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT) | + INFINIPATH_HWE_HTCLNKABYTE0CRCERR | + INFINIPATH_HWE_HTCLNKABYTE1CRCERR | INFINIPATH_HWE_HTCLNKBBYTE0CRCERR | + INFINIPATH_HWE_HTCLNKBBYTE1CRCERR | INFINIPATH_HWE_HTCMISCERR4 | + INFINIPATH_HWE_HTCMISCERR5 | INFINIPATH_HWE_HTCMISCERR6 | + INFINIPATH_HWE_HTCMISCERR7 | INFINIPATH_HWE_HTCBUSTREQPARITYERR | + INFINIPATH_HWE_HTCBUSTRESPPARITYERR | + INFINIPATH_HWE_HTCBUSIREQPARITYERR | + INFINIPATH_HWE_RXDSYNCMEMPARITYERR | INFINIPATH_HWE_MEMBISTFAILED | + INFINIPATH_HWE_COREPLL_FBSLIP | INFINIPATH_HWE_COREPLL_RFSLIP | + INFINIPATH_HWE_HTBPLL_FBSLIP | INFINIPATH_HWE_HTBPLL_RFSLIP | + INFINIPATH_HWE_HTAPLL_FBSLIP | INFINIPATH_HWE_HTAPLL_RFSLIP | + INFINIPATH_HWE_EXTSERDESPLLFAILED | + INFINIPATH_HWE_IBCBUSTOSPCPARITYERR | + INFINIPATH_HWE_IBCBUSFRSPCPARITYERR; + +const uint64_t infinipath_dc_bitsextant = + 
(INFINIPATH_DC_FORCEHTCMEMPARITYERR_MASK << + INFINIPATH_DC_FORCEHTCMEMPARITYERR_SHIFT) | + (INFINIPATH_DC_FORCETXEMEMPARITYERR_MASK << + INFINIPATH_DC_FORCETXEMEMPARITYERR_SHIFT) | + (INFINIPATH_DC_FORCERXEMEMPARITYERR_MASK << + INFINIPATH_DC_FORCERXEMEMPARITYERR_SHIFT) | + INFINIPATH_DC_FORCEHTCBUSTREQPARITYERR | + INFINIPATH_DC_FORCEHTCBUSTRESPPARITYERR | + INFINIPATH_DC_FORCEHTCBUSIREQPARITYERR | + INFINIPATH_DC_FORCERXDSYNCMEMPARITYERR | + INFINIPATH_DC_COUNTERDISABLE | INFINIPATH_DC_COUNTERWREN | + INFINIPATH_DC_FORCEIBCBUSTOSPCPARITYERR | + INFINIPATH_DC_FORCEIBCBUSFRSPCPARITYERR; + +const uint64_t infinipath_ibcc_bitsextant = + (INFINIPATH_IBCC_FLOWCTRLPERIOD_MASK << + INFINIPATH_IBCC_FLOWCTRLPERIOD_SHIFT) | + (INFINIPATH_IBCC_FLOWCTRLWATERMARK_MASK << + INFINIPATH_IBCC_FLOWCTRLWATERMARK_SHIFT) | + (INFINIPATH_IBCC_LINKINITCMD_MASK << + INFINIPATH_IBCC_LINKINITCMD_SHIFT) | + (INFINIPATH_IBCC_LINKCMD_MASK << INFINIPATH_IBCC_LINKCMD_SHIFT) | + (INFINIPATH_IBCC_MAXPKTLEN_MASK << INFINIPATH_IBCC_MAXPKTLEN_SHIFT) | + (INFINIPATH_IBCC_PHYERRTHRESHOLD_MASK << + INFINIPATH_IBCC_PHYERRTHRESHOLD_SHIFT) | + (INFINIPATH_IBCC_OVERRUNTHRESHOLD_MASK << + INFINIPATH_IBCC_OVERRUNTHRESHOLD_SHIFT) | + (INFINIPATH_IBCC_CREDITSCALE_MASK << + INFINIPATH_IBCC_CREDITSCALE_SHIFT) | + INFINIPATH_IBCC_LOOPBACK | INFINIPATH_IBCC_LINKDOWNDEFAULTSTATE; + +const uint64_t infinipath_mdio_bitsextant = + (INFINIPATH_MDIO_CLKDIV_MASK << INFINIPATH_MDIO_CLKDIV_SHIFT) | + (INFINIPATH_MDIO_COMMAND_MASK << INFINIPATH_MDIO_COMMAND_SHIFT) | + (INFINIPATH_MDIO_DEVADDR_MASK << INFINIPATH_MDIO_DEVADDR_SHIFT) | + (INFINIPATH_MDIO_REGADDR_MASK << INFINIPATH_MDIO_REGADDR_SHIFT) | + (INFINIPATH_MDIO_DATA_MASK << INFINIPATH_MDIO_DATA_SHIFT) | + INFINIPATH_MDIO_CMDVALID | INFINIPATH_MDIO_RDDATAVALID; + +const uint64_t infinipath_ibcs_bitsextant = + (INFINIPATH_IBCS_LINKTRAININGSTATE_MASK << + INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) | + (INFINIPATH_IBCS_LINKSTATE_MASK << INFINIPATH_IBCS_LINKSTATE_SHIFT) | + INFINIPATH_IBCS_TXREADY | INFINIPATH_IBCS_TXCREDITOK; + +const uint64_t infinipath_extc_bitsextant = + (INFINIPATH_EXTC_GPIOINVERT_MASK << INFINIPATH_EXTC_GPIOINVERT_SHIFT) | + (INFINIPATH_EXTC_GPIOOE_MASK << INFINIPATH_EXTC_GPIOOE_SHIFT) | + INFINIPATH_EXTC_SERDESENABLE | INFINIPATH_EXTC_SERDESCONNECT | + INFINIPATH_EXTC_SERDESENTRUNKING | INFINIPATH_EXTC_SERDESDISRXFIFO | + INFINIPATH_EXTC_SERDESENPLPBK1 | INFINIPATH_EXTC_SERDESENPLPBK2 | + INFINIPATH_EXTC_SERDESENENCDEC | INFINIPATH_EXTC_LEDSECPORTGREENON | + INFINIPATH_EXTC_LEDSECPORTYELLOWON | INFINIPATH_EXTC_LEDPRIPORTGREENON | + INFINIPATH_EXTC_LEDPRIPORTYELLOWON | INFINIPATH_EXTC_LEDGBLOKGREENON | + INFINIPATH_EXTC_LEDGBLERRREDOFF; + +/* Start of Documentation block for SerDes registers + * serdes and xgxs register bits; not all have defines, + * since I haven't yet needed them all, and I'm lazy. 
Those that I needed + * are in ipath_registers.h + +serdesConfig0Out (R/W) + Default Value +bit[3:0] - ResetA/B/C/D (4'b1111) +bit[7:4] -L1PwrdnA/B/C/D (4'b0000) +bit[11:8] - RxIdleEnX (4'b0000) +bit[15:12] - TxIdleEnX (4'b0000) +bit[19:16] - RxDetectEnX (4'b0000) +bit[23:20] - BeaconTxEnX (4'b0000) +bit[27:24] - RxTermEnX (4'b0000) +bit[28] - ResetPLL (1'b0) +bit[29] -L2Pwrdn (1'b0) +bit[37:30] - Offset[7:0] (8'b00000000) +bit[38] -OffsetEn (1'b0) +bit[39] -ParLBPK (1'b0) +bit[40] -ParReset (1'b0) +bit[42:41] - RefSel (2'b10) +bit[43] - PW (1'b0) +bit[47:44] - LPBKA/B/C/D (4'b0000) +bit[49:48] - ClkBufTermAdj (2'b0) +bit[51:50] - RxTermAdj (2'b0) +bit[53:52] - TxTermAdj (2'b0) +bit[55:54] - RxEqCtl (2'b0) +bit[63:56] - Reserved + +cce_wip_serdesConfig1Out[63:0] (R/W) +bit[3:0] - HiDrvX (4'b0000) +bit[7:4] - LoDrvX (4'b0000) +bit[12:11] - DtxA[3:0] (4'b0000) +bit[15:12] - DtxB[3:0] (4'b0000) +bit[19:16] - DtxC[3:0] (4'b0000) +bit[23:20] - DtxD[3:0] (4'b0000) +bit[27:24] - DeqA[3:0] (4'b0000) +bit[31:28] - DeqB[3:0] (4'b0000) +bit[35:32] - DeqC[3:0] (4'b0000) +bit[39:36] - DeqD[3:0] (4'b0000) +Framer interface, bits 40-59, not used +bit[44:40] - FmOffsetA[4:0] (5'b00000) +bit[49:45] - FmOffsetB[4:0] (5'b00000) +bit[54:50] - FmOffsetC[4:0] (5'b00000) +bit[59:55] - FmOffsetD[4:0] (5'b00000) +bit[63:60] - FmOffsetEnA/B/C/D (4'b0000) + +SerdesStatus[63:0] (RO) +bit[3:0] - TxIdleDetectA/B/C/D +bit[7:4] - RxDetectA/B/C/D +bit[11:8] - BeaconDetectA/B/C/D +bit[63:12] - Reserved + +XGXSConfigOut[63:0] +bit[2:0] - Resets, init to 1; bit 0 unused? +bit[3] - MDIO, select register bank for vendor specific register + (0x1e if set, else 0x1f); vendor-specific status in register 8 + bits 0-3 lanes0-3 signal detect, 1 if detected + bits 4-7 lanes0-3 CTC fifo errors, 1 if detected (latched until read) +bit[8:4] - MDIO port address +bit[18:9] - lnk_sync_mask +bit[22:19] - polarity inv + +Documentation end */ + +/* + * + * General specs: + * ExtCtrl[63:48] = EXTC_GPIOOE[15:0] + * ExtCtrl[47:32] = EXTC_GPIOInvert[15:0] + * ExtStatus[63:48] = GpioIn[15:0] + * + * GPIO[1] = EEPROM_SDA + * GPIO[0] = EEPROM_SCL + */ + +#define _IPATH_GPIO_SDA_NUM 1 +#define _IPATH_GPIO_SCL_NUM 0 + +#define IPATH_GPIO_SDA \ + (1UL << (_IPATH_GPIO_SDA_NUM+INFINIPATH_EXTC_GPIOOE_SHIFT)) +#define IPATH_GPIO_SCL \ + (1UL << (_IPATH_GPIO_SCL_NUM+INFINIPATH_EXTC_GPIOOE_SHIFT)) + +/* + * register bits for selecting i2c direction and values, used for I2C serial + * flash + */ +const uint16_t ipath_gpio_sda_num = _IPATH_GPIO_SDA_NUM; +const uint16_t ipath_gpio_scl_num = _IPATH_GPIO_SCL_NUM; +const uint64_t ipath_gpio_sda = IPATH_GPIO_SDA; +const uint64_t ipath_gpio_scl = IPATH_GPIO_SCL; + +/* The remaining portion of this file is used only for the kernel */ +#ifdef __KERNEL__ + +#include +#include +#include +#include +#include +#include + +/* + * This file contains all of the code that is specific to the InfiniPath + * HT-400 chip. + */ + +/* we support up to 4 chips per system */ +const uint32_t infinipath_max = 4; +ipath_devdata devdata[4]; +static const char *ipath_unit_names[4] = { + "infinipath0", "infinipath1", "infinipath2", "infinipath3" +}; + +const char *ipath_get_unit_name(int unit) +{ + return ipath_unit_names[unit]; +} + +static void ipath_check_htlink(ipath_type t); + +/* + * display hardware errors. Use same msg buffer as regular errors to avoid + * excessive stack use. Most hardware errors are catastrophic, but for + * right now, we'll print them and continue. 
+ * We reuse the same message buffer as ipath_handle_errors() to avoid
+ * excessive stack usage.
+ */
+void ipath_handle_hwerrors(const ipath_type t, char *msg, int msgl)
+{
+	uint64_t hwerrs = ipath_kget_kreg64(t, kr_hwerrstatus);
+	uint32_t bits, ctrl;
+	int isfatal = 0;
+	char bitsmsg[64];
+
+	if (!hwerrs) {
+		_IPATH_VDBG("Called but no hardware errors set\n");
+		/*
+		 * better than printing confusing messages; this seems to be
+		 * related to clearing the crc error, or the pll error,
+		 * during init.
+		 */
+		return;
+	} else if (hwerrs == ~0ULL) {
+		_IPATH_UNIT_ERROR(t,
+			"Read of hardware error status failed (all bits set); ignoring\n");
+		return;
+	}
+	ipath_stats.sps_hwerrs++;
+
+	/*
+	 * clear the error, regardless of whether we continue or stop using
+	 * the chip.
+	 */
+	ipath_kput_kreg(t, kr_hwerrclear, hwerrs);
+
+	hwerrs &= devdata[t].ipath_hwerrmask;
+
+	/*
+	 * make sure we get this much out, unless told to be quiet,
+	 * or it's occurred within the last 5 seconds
+	 */
+	if ((hwerrs & ~devdata[t].ipath_lasthwerror) ||
+	    (infinipath_debug & __IPATH_VERBDBG))
+		_IPATH_INFO("Hardware error: hwerr=0x%llx (cleared)\n", hwerrs);
+	devdata[t].ipath_lasthwerror |= hwerrs;
+
+	if (hwerrs & ~infinipath_hwe_bitsextant)
+		_IPATH_UNIT_ERROR(t,
+			"hwerror interrupt with unknown errors %llx set\n",
+			hwerrs & ~infinipath_hwe_bitsextant);
+
+	ctrl = ipath_kget_kreg32(t, kr_control);
+	if (ctrl & INFINIPATH_C_FREEZEMODE) {
+		if (hwerrs) {
+			/*
+			 * if any bits are set that we aren't ignoring, only
+			 * make the complaint once, in case it's stuck or
+			 * recurring, and we get here multiple times
+			 */
+			if (devdata[t].ipath_flags & IPATH_INITTED) {
+				_IPATH_UNIT_ERROR(t,
+					"Fatal Error (freezemode), no longer usable\n");
+				isfatal = 1;
+			}
+			*devdata[t].ipath_statusp &= ~IPATH_STATUS_IB_READY;
+			/* mark as having had an error */
+			*devdata[t].ipath_statusp |= IPATH_STATUS_HWERROR;
+			/*
+			 * mark as not usable, at a minimum until driver
+			 * is reloaded, probably until reboot, since no
+			 * other reset is possible.
+			 */
+			devdata[t].ipath_flags &= ~IPATH_INITTED;
+		} else {
+			_IPATH_DBG
+			    ("Clearing freezemode on ignored hardware error\n");
+			ctrl &= ~INFINIPATH_C_FREEZEMODE;
+			ipath_kput_kreg(t, kr_control, ctrl);
+		}
+	}
+
+	*msg = '\0';
+
+	/*
+	 * may someday want to decode into which bits are which
+	 * functional area for parity errors, etc.
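 *
 * (Illustrative aside, not part of the original patch: such a decode
 * could be table-driven rather than the open-coded ifs below; names
 * like hwerr_fields are hypothetical.
 *
 *	static const struct {
 *		uint64_t mask;
 *		int shift;
 *		const char *what;
 *	} hwerr_fields[] = {
 *		{ INFINIPATH_HWE_HTCMEMPARITYERR_MASK,
 *		  INFINIPATH_HWE_HTCMEMPARITYERR_SHIFT, "HTC Parity Errs" },
 *		{ INFINIPATH_HWE_RXEMEMPARITYERR_MASK,
 *		  INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT, "RXE Parity Errs" },
 *		{ INFINIPATH_HWE_TXEMEMPARITYERR_MASK,
 *		  INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT, "TXE Parity Errs" },
 *	};
 *	int i;
 *	for (i = 0; i < 3; i++) {
 *		bits = (uint32_t) ((hwerrs >> hwerr_fields[i].shift) &
 *				   hwerr_fields[i].mask);
 *		if (!bits)
 *			continue;
 *		snprintf(bitsmsg, sizeof bitsmsg, "[%s %x] ",
 *			 hwerr_fields[i].what, bits);
 *		strlcat(msg, bitsmsg, msgl);
 *	}
 *
 * The patch keeps the explicit ifs below, which read well enough for
 * three fields.)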
+ */
+	if (hwerrs & (infinipath_hwe_htcmemparityerr_mask
+		      << INFINIPATH_HWE_HTCMEMPARITYERR_SHIFT)) {
+		bits = (uint32_t) ((hwerrs >>
+				    INFINIPATH_HWE_HTCMEMPARITYERR_SHIFT) &
+				   INFINIPATH_HWE_HTCMEMPARITYERR_MASK);
+		snprintf(bitsmsg, sizeof bitsmsg, "[HTC Parity Errs %x] ",
+			 bits);
+		strlcat(msg, bitsmsg, msgl);
+	}
+	if (hwerrs & (INFINIPATH_HWE_RXEMEMPARITYERR_MASK
+		      << INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT)) {
+		bits = (uint32_t) ((hwerrs >>
+				    INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT) &
+				   INFINIPATH_HWE_RXEMEMPARITYERR_MASK);
+		snprintf(bitsmsg, sizeof bitsmsg, "[RXE Parity Errs %x] ",
+			 bits);
+		strlcat(msg, bitsmsg, msgl);
+	}
+	if (hwerrs & (INFINIPATH_HWE_TXEMEMPARITYERR_MASK
+		      << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT)) {
+		bits = (uint32_t) ((hwerrs >>
+				    INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT) &
+				   INFINIPATH_HWE_TXEMEMPARITYERR_MASK);
+		snprintf(bitsmsg, sizeof bitsmsg, "[TXE Parity Errs %x] ",
+			 bits);
+		strlcat(msg, bitsmsg, msgl);
+	}
+	if (hwerrs & INFINIPATH_HWE_IBCBUSTOSPCPARITYERR)
+		strlcat(msg, "[IB2IPATH Parity]", msgl);
+	if (hwerrs & INFINIPATH_HWE_IBCBUSFRSPCPARITYERR)
+		strlcat(msg, "[IPATH2IB Parity]", msgl);
+	if (hwerrs & INFINIPATH_HWE_HTCBUSIREQPARITYERR)
+		strlcat(msg, "[HTC Ireq Parity]", msgl);
+	if (hwerrs & INFINIPATH_HWE_HTCBUSTREQPARITYERR)
+		strlcat(msg, "[HTC Treq Parity]", msgl);
+	if (hwerrs & INFINIPATH_HWE_HTCBUSTRESPPARITYERR)
+		strlcat(msg, "[HTC Tresp Parity]", msgl);
+
+/* keep the code below somewhat more readable; not used elsewhere */
+#define _IPATH_HTLINK0_CRCBITS (infinipath_hwe_htclnkabyte0crcerr | \
+				infinipath_hwe_htclnkabyte1crcerr)
+#define _IPATH_HTLINK1_CRCBITS (infinipath_hwe_htclnkbbyte0crcerr | \
+				infinipath_hwe_htclnkbbyte1crcerr)
+#define _IPATH_HTLANE0_CRCBITS (infinipath_hwe_htclnkabyte0crcerr | \
+				infinipath_hwe_htclnkbbyte0crcerr)
+#define _IPATH_HTLANE1_CRCBITS (infinipath_hwe_htclnkabyte1crcerr | \
+				infinipath_hwe_htclnkbbyte1crcerr)
+	if (hwerrs & (_IPATH_HTLINK0_CRCBITS | _IPATH_HTLINK1_CRCBITS)) {
+		char bitsmsg[64];
+		uint64_t crcbits = hwerrs &
+		    (_IPATH_HTLINK0_CRCBITS | _IPATH_HTLINK1_CRCBITS);
+		/* don't check if 8bit HT */
+		if (devdata[t].ipath_flags & IPATH_8BIT_IN_HT0)
+			crcbits &= ~infinipath_hwe_htclnkabyte1crcerr;
+		/* don't check if 8bit HT */
+		if (devdata[t].ipath_flags & IPATH_8BIT_IN_HT1)
+			crcbits &= ~infinipath_hwe_htclnkbbyte1crcerr;
+		/*
+		 * we'll want to ignore link errors on a link that is
+		 * not in use, if any. For now, complain about both
+		 */
+		if (crcbits) {
+			uint16_t ctrl0, ctrl1;
+			snprintf(bitsmsg, sizeof bitsmsg,
+				 "[HT%s lane %s CRC (%llx); ignore till reload]",
+				 !(crcbits & _IPATH_HTLINK1_CRCBITS) ?
+				 "0 (A)" : (!(crcbits & _IPATH_HTLINK0_CRCBITS)
+					    ? "1 (B)" : "0+1 (A+B)"),
+				 !(crcbits & _IPATH_HTLANE1_CRCBITS) ? "0"
+				 : (!(crcbits & _IPATH_HTLANE0_CRCBITS) ? "1" :
+				    "0+1"), crcbits);
+			strlcat(msg, bitsmsg, msgl);
+
+			/*
+			 * print extra info for debugging.
+			 * slave/primary config word 4, 8 (link control 0, 1)
+			 */
+
+			if (pci_read_config_word(devdata[t].pcidev,
+						 devdata[t].ipath_ht_slave_off +
+						 0x4, &ctrl0))
+				_IPATH_INFO
+				    ("Couldn't read linkctrl0 of slave/primary config block\n");
+			else if (!(ctrl0 & 1 << 6))	/* not if EOC bit set */
+				_IPATH_DBG("HT linkctrl0 0x%x%s%s\n", ctrl0,
+					   ((ctrl0 >> 8) & 7) ? " CRC" : "",
+					   ((ctrl0 >> 4) & 1) ? "linkfail" :
+					   "");
"linkfail" : + ""); + if (pci_read_config_word + (devdata[t].pcidev, + devdata[t].ipath_ht_slave_off + 0x8, &ctrl1)) + _IPATH_INFO + ("Couldn't read linkctrl1 of slave/primary config block\n"); + else if (!(ctrl1 & 1 << 6)) /* not if EOC bit set */ + _IPATH_DBG("HT linkctrl1 0x%x%s%s\n", ctrl1, + ((ctrl1 >> 8) & 7) ? " CRC" : "", + ((ctrl1 >> 4) & 1) ? "linkfail" : + ""); + + /* disable until driver reloaded */ + devdata[t].ipath_hwerrmask &= ~crcbits; + ipath_kput_kreg(t, kr_hwerrmask, + devdata[t].ipath_hwerrmask); + _IPATH_DBG("HT crc errs: %s\n", msg); + } else + _IPATH_DBG + ("ignoring HT crc errors 0x%llx, not in use\n", + hwerrs & (_IPATH_HTLINK0_CRCBITS | + _IPATH_HTLINK1_CRCBITS)); + } + + if (hwerrs & INFINIPATH_HWE_HTCMISCERR5) + strlcat(msg, "[HT core Misc5]", msgl); + if (hwerrs & INFINIPATH_HWE_HTCMISCERR6) + strlcat(msg, "[HT core Misc6]", msgl); + if (hwerrs & INFINIPATH_HWE_HTCMISCERR7) + strlcat(msg, "[HT core Misc7]", msgl); + if (hwerrs & INFINIPATH_HWE_MEMBISTFAILED) { + strlcat(msg, "[Memory BIST test failed, HT-400 unusable]", + msgl); + /* ignore from now on, so disable until driver reloaded */ + devdata[t].ipath_hwerrmask &= ~INFINIPATH_HWE_MEMBISTFAILED; + ipath_kput_kreg(t, kr_hwerrmask, devdata[t].ipath_hwerrmask); + } +#define _IPATH_PLL_FAIL (INFINIPATH_HWE_COREPLL_FBSLIP | \ + INFINIPATH_HWE_COREPLL_RFSLIP | \ + INFINIPATH_HWE_HTBPLL_FBSLIP | \ + INFINIPATH_HWE_HTBPLL_RFSLIP | \ + INFINIPATH_HWE_HTAPLL_FBSLIP | \ + INFINIPATH_HWE_HTAPLL_RFSLIP) + + if (hwerrs & _IPATH_PLL_FAIL) { + snprintf(bitsmsg, sizeof bitsmsg, + "[PLL failed (%llx), HT-400 unusable]", + hwerrs & _IPATH_PLL_FAIL); + strlcat(msg, bitsmsg, msgl); + /* ignore from now on, so disable until driver reloaded */ + devdata[t].ipath_hwerrmask &= ~(hwerrs & _IPATH_PLL_FAIL); + ipath_kput_kreg(t, kr_hwerrmask, devdata[t].ipath_hwerrmask); + } + + if (hwerrs & INFINIPATH_HWE_EXTSERDESPLLFAILED) { + /* + * If it occurs, it is left masked since the eternal interface + * is unused + */ + devdata[t].ipath_hwerrmask &= + ~INFINIPATH_HWE_EXTSERDESPLLFAILED; + ipath_kput_kreg(t, kr_hwerrmask, devdata[t].ipath_hwerrmask); + } + + if (hwerrs & INFINIPATH_HWE_RXDSYNCMEMPARITYERR) + strlcat(msg, "[Rx Dsync]", msgl); + if (hwerrs & INFINIPATH_HWE_SERDESPLLFAILED) + strlcat(msg, "[SerDes PLL]", msgl); + + _IPATH_UNIT_ERROR(t, "%s hardware error\n", msg); + if (isfatal && (!ipath_diags_enabled)) { + if (devdata[t].ipath_freezemsg) { + /* + * for proc status file ; if no trailing } is copied, we'll know + * it was truncated. 
+ */ + snprintf(devdata[t].ipath_freezemsg, + devdata[t].ipath_freezelen, "{%s}", msg); + } + } +} + +/* fill in the board name, based on the board revision register */ +void ipath_ht_get_boardname(const ipath_type t, char *name, size_t namelen) +{ + char *n = NULL; + uint8_t boardrev = devdata[t].ipath_boardrev; + + switch (boardrev) { + case 4: /* Ponderosa is one of the bringup boards */ + n = "Ponderosa"; + break; + case 5: /* HT-460 original production board */ + n = "InfiniPath_HT-460"; + break; + case 7: /* HT-460 small form factor production board */ + n = "InfiniPath_HT-460-2"; + break; + case 6: + n = "OEM_Board_3"; + break; + case 8: + n = "LS/X-1"; + break; + case 9: /* Comstock bringup test board */ + n = "Comstock"; + break; + case 10: + n = "OEM_Board_2"; + break; + case 11: + n = "OEM_Board_4"; + break; + default: /* don't know, just print the number */ + _IPATH_ERROR("Don't yet know about board with ID %u\n", + boardrev); + snprintf(name, namelen, "UnknownBoardRev%u", boardrev); + break; + } + if (n) + snprintf(name, namelen, "%s", n); +} + +int ipath_validate_rev(ipath_devdata * dd) +{ + if (dd->ipath_majrev != 3 || dd->ipath_minrev != 2) { + /* + * This version of the driver only supports the HT-400 + * Rev 3.2 + */ + _IPATH_UNIT_ERROR(IPATH_UNIT(dd), + "Unsupported HT-400 revision %u.%u!\n", + dd->ipath_majrev, dd->ipath_minrev); + return 1; + } + if (dd->ipath_htspeed != 800) + _IPATH_UNIT_ERROR(IPATH_UNIT(dd), + "Incorrectly configured for HT @ %uMHz\n", + dd->ipath_htspeed); + if(dd->ipath_boardrev == 7 || dd->ipath_boardrev == 11 || + dd->ipath_boardrev == 6) + dd->ipath_flags |= IPATH_GPIO_INTR; + else if (dd->ipath_boardrev == 8) { /* LS/X-1 */ + uint64_t val; + val = ipath_kget_kreg64(dd->ipath_pd[0]->port_unit, kr_extstatus); + if(val & INFINIPATH_EXTS_SERDESSEL) { /* hardware disabled */ + /* This means that the chip is hardware disabled, and will + * not be able to bring up the link, in any case. We special + * case this and abort early, to avoid later messages. 
We
+		 * also set the DISABLED status bit
+		 */
+		_IPATH_DBG("Unit %u is hardware-disabled\n",
+			   dd->ipath_pd[0]->port_unit);
+		*dd->ipath_statusp |= IPATH_STATUS_DISABLED;
+		return 2;	/* this value is handled differently */
+		}
+	}
+	return 0;
+}
+
+static void ipath_check_htlink(ipath_type t)
+{
+	uint8_t linkerr, link_off, i;
+
+	for (i = 0; i < 2; i++) {
+		link_off = devdata[t].ipath_ht_slave_off + i * 4 + 0xd;
+		if (pci_read_config_byte(devdata[t].pcidev, link_off, &linkerr))
+			_IPATH_INFO
+			    ("Couldn't read linkerror%d of HT slave/primary block\n",
+			     i);
+		else if (linkerr & 0xf0) {
+			_IPATH_VDBG("HT linkerr%d bits 0x%x set, clearing\n",
+				    i, linkerr >> 4);
+			/*
+			 * writing the linkerr bits that are set should
+			 * clear them
+			 */
+			if (pci_write_config_byte
+			    (devdata[t].pcidev, link_off, linkerr))
+				_IPATH_DBG
+				    ("Failed write to clear HT linkerror%d\n",
+				     i);
+			if (pci_read_config_byte
+			    (devdata[t].pcidev, link_off, &linkerr))
+				_IPATH_INFO
+				    ("Couldn't reread linkerror%d of HT slave/primary block\n",
+				     i);
+			else if (linkerr & 0xf0)
+				_IPATH_INFO
+				    ("HT linkerror%d bits 0x%x couldn't be cleared\n",
+				     i, linkerr >> 4);
+		}
+	}
+}
+
+/*
+ * now that we have finished initializing everything that might reasonably
+ * cause a hardware error, and cleared those error bits as they occur,
+ * we can enable hardware errors in the mask (potentially enabling
+ * freeze mode), and enable hardware errors as errors (along with
+ * everything else) in errormask
+ */
+void ipath_clear_init_hwerrs(ipath_type t)
+{
+	uint64_t val, extsval;
+
+	extsval = ipath_kget_kreg64(t, kr_extstatus);
+
+	if (!(extsval & INFINIPATH_EXTS_MEMBIST_ENDTEST))
+		_IPATH_UNIT_ERROR(t, "MemBIST did not complete!\n");
+
+	ipath_check_htlink(t);
+
+	/* barring bugs, all hwerrors become interrupts, which can */
+	val = ~0ULL;
+	/* don't look at crc lane1 if 8 bit */
+	if (devdata[t].ipath_flags & IPATH_8BIT_IN_HT0)
+		val &= ~infinipath_hwe_htclnkabyte1crcerr;
+	/* don't look at crc lane1 if 8 bit */
+	if (devdata[t].ipath_flags & IPATH_8BIT_IN_HT1)
+		val &= ~infinipath_hwe_htclnkbbyte1crcerr;
+
+	/*
+	 * disable RXDSYNCMEMPARITY because external serdes is unused,
+	 * and therefore the logic will never be used or initialized,
+	 * and uninitialized state will normally result in this error
+	 * being asserted. Similarly for the external serdes pll
+	 * lock signal.
+	 */
+	val &=
+	    ~(INFINIPATH_HWE_EXTSERDESPLLFAILED |
+	      INFINIPATH_HWE_RXDSYNCMEMPARITYERR);
+
+	/*
+	 * Disable MISCERR4 because of an inversion in the HT core
+	 * logic checking for errors that cause this bit to be set.
+	 * The errata can also cause the protocol error bit to be set
+	 * in the HT config space linkerror register(s).
+	 */
+	val &= ~INFINIPATH_HWE_HTCMISCERR4;
+
+	/*
+	 * PLL ignored because the MDIO interface has a logic problem
+	 * for reads on Comstock and Ponderosa.
BRINGUP + */ + if (devdata[t].ipath_boardrev == 4 || devdata[t].ipath_boardrev == 9) + val &= ~INFINIPATH_HWE_EXTSERDESPLLFAILED; /* BRINGUP */ + devdata[t].ipath_hwerrmask = val; +} + +/* bring up the serdes */ +int ipath_bringup_serdes(ipath_type t) +{ + uint64_t val, config1; + int ret = 0, change = 0; + + _IPATH_DBG("Trying to bringup serdes\n"); + + if (ipath_kget_kreg64(t, kr_hwerrstatus) & + INFINIPATH_HWE_SERDESPLLFAILED) { + _IPATH_DBG + ("At start, serdes PLL failed bit set in hwerrstatus, clearing and continuing\n"); + ipath_kput_kreg(t, kr_hwerrclear, + INFINIPATH_HWE_SERDESPLLFAILED); + } + + val = ipath_kget_kreg64(t, kr_serdesconfig0); + config1 = ipath_kget_kreg64(t, kr_serdesconfig1); + + _IPATH_VDBG + ("Initial serdes status is config0=%llx config1=%llx, sstatus=%llx xgxs %llx\n", + val, config1, ipath_kget_kreg64(t, kr_serdesstatus), + ipath_kget_kreg64(t, kr_xgxsconfig)); + + /* force reset on */ + val |= + INFINIPATH_SERDC0_RESET_PLL /* | INFINIPATH_SERDC0_RESET_MASK */ ; + ipath_kput_kreg(t, kr_serdesconfig0, val); + udelay(15); /* need pll reset set at least for a bit */ + + if (val & INFINIPATH_SERDC0_RESET_PLL) { + uint64_t val2 = val &= ~INFINIPATH_SERDC0_RESET_PLL; + /* set lane resets, and tx idle, during pll reset */ + val2 |= INFINIPATH_SERDC0_RESET_MASK | INFINIPATH_SERDC0_TXIDLE; + _IPATH_VDBG("Clearing serdes PLL reset (writing %llx)\n", val2); + ipath_kput_kreg(t, kr_serdesconfig0, val2); + /* be sure chip saw it */ + val = ipath_kget_kreg64(t, kr_scratch); + /* + * need pll reset clear at least 11 usec before lane resets + * cleared; give it a few more + */ + udelay(15); + val = val2; /* for check below */ + } + + if (val & (INFINIPATH_SERDC0_RESET_PLL | INFINIPATH_SERDC0_RESET_MASK + | INFINIPATH_SERDC0_TXIDLE)) { + val &= + ~(INFINIPATH_SERDC0_RESET_PLL | INFINIPATH_SERDC0_RESET_MASK + | INFINIPATH_SERDC0_TXIDLE); + ipath_kput_kreg(t, kr_serdesconfig0, val); /* clear them */ + } + + val = ipath_kget_kreg64(t, kr_xgxsconfig); + if (((val >> INFINIPATH_XGXS_MDIOADDR_SHIFT) & + INFINIPATH_XGXS_MDIOADDR_MASK) != 3) { + val &= + ~(INFINIPATH_XGXS_MDIOADDR_MASK << + INFINIPATH_XGXS_MDIOADDR_SHIFT); + /* we use address 3 */ + val |= 3ULL << INFINIPATH_XGXS_MDIOADDR_SHIFT; + change = 1; + } + if (val & INFINIPATH_XGXS_RESET) { /* normally true after boot */ + val &= ~INFINIPATH_XGXS_RESET; + change = 1; + } + if (change) + ipath_kput_kreg(t, kr_xgxsconfig, val); + + val = ipath_kget_kreg64(t, kr_serdesconfig0); + + config1 &= ~0x0ffffffff00ULL; /* clear current and de-emphasis bits */ + config1 |= 0x00000000000ULL; /* set current to 20ma */ + config1 |= 0x0cccc000000ULL; /* set de-emphasis to -5.68dB */ + ipath_kput_kreg(t, kr_serdesconfig1, config1); + + _IPATH_VDBG + ("After setup: serdes status is config0=%llx config1=%llx, sstatus=%llx xgxs %llx\n", + val, config1, ipath_kget_kreg64(t, kr_serdesstatus), + ipath_kget_kreg64(t, kr_xgxsconfig)); + + if ((!ipath_waitfor_mdio_cmdready(t))) { + ipath_kput_kreg(t, kr_mdio, IPATH_MDIO_REQ(IPATH_MDIO_CMD_READ, + 31, + IPATH_MDIO_CTRL_XGXS_REG_8, + 0)); + if (ipath_waitfor_complete + (t, kr_mdio, IPATH_MDIO_DATAVALID, &val)) + _IPATH_DBG + ("Never got MDIO data for XGXS status read\n"); + else + _IPATH_VDBG("MDIO Read reg8, 'bank' 31 %x\n", + (uint32_t) val); + } else + _IPATH_DBG("Never got MDIO cmdready for XGXS status read\n"); + + return ret; /* for now, say we always succeeded */ +} + +/* set serdes to txidle; driver is being unloaded */ +void ipath_quiet_serdes(const ipath_type t) +{ + uint64_t val = 
ipath_kget_kreg64(t, kr_serdesconfig0); + + val |= INFINIPATH_SERDC0_TXIDLE; + _IPATH_DBG("Setting TxIdleEn on serdes (config0 = %llx)\n", val); + ipath_kput_kreg(t, kr_serdesconfig0, val); +} + +EXPORT_SYMBOL(ipath_get_unit_name); + +#endif /* __KERNEL__ */ diff --git a/drivers/infiniband/hw/ipath/ipath_i2c.c b/drivers/infiniband/hw/ipath/ipath_i2c.c new file mode 100644 index 0000000..4ee3d46 --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_i2c.c @@ -0,0 +1,472 @@ +/* + * Copyright (c) 2003, 2004, 2005. PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + * + * $Id: ipath_i2c.c 4365 2005-12-10 00:04:16Z rjwalsh $ + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "ipath_kernel.h" +#include "ips_common.h" +#include "ipath_layer.h" + +/* + * InfiniPath I2C Driver for onboard flash + * Bus Master, Standard Speed only. + * HT-460 uses the Atmel AT24C01 I2C serial FLASH part. + * This part is a 1Kbit part, that uses no programmable address bits, + * (the address is 1010000b) + */ + +typedef enum i2c_line_type_e { + i2c_line_scl = 0, + i2c_line_sda +} ipath_i2c_type; + +typedef enum i2c_line_state_e { + i2c_line_low = 0, + i2c_line_high +} ipath_i2c_state; + +#define READ_CMD 1 +#define WRITE_CMD 0 + +static int ipath_eeprom_init; + +/* + * The gpioval manipulation really should be protected by spinlocks + * or be converted to use atomic operations (unfortunately, atomic.h + * doesn't cover 64 bit ops for some of them). + */ + +int i2c_gpio_set(ipath_type dev, ipath_i2c_type line, + ipath_i2c_state new_line_state); +int i2c_gpio_get(ipath_type dev, ipath_i2c_type line, + ipath_i2c_state * curr_statep); + +/* + * returns 0 if the line was set to the new state successfully, non-zero + * on error. 
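 *
 * (Clarifying aside, based on the body below: the line is handled
 * open-drain style. "High" means tri-state the pin -- clear its
 * EXTC_GPIOOE bit so the external pull-up raises the line -- while
 * "low" means enable the pin as an output driving 0:
 *	extctrl &= ~mask;	high: pin floats, pull-up wins
 *	extctrl |=  mask;	low:  pin actively drives the line
 * This is the usual way to share an I2C bus line between devices.)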
+ */ +int i2c_gpio_set(ipath_type dev, ipath_i2c_type line, + ipath_i2c_state new_line_state) +{ + uint64_t read_val, write_val, mask, *gpioval; + + gpioval = &devdata[dev].ipath_gpio_out; + read_val = ipath_kget_kreg64(dev, kr_extctrl); + if (line == i2c_line_scl) + mask = ipath_gpio_scl; + else + mask = ipath_gpio_sda; + + if (new_line_state == i2c_line_high) + /* tri-state the output rather than force high */ + write_val = read_val & ~mask; + else + /* config line to be an output */ + write_val = read_val | mask; + ipath_kput_kreg(dev, kr_extctrl, write_val); + + /* set high and verify */ + if (new_line_state == i2c_line_high) + write_val = 0x1UL; + else + write_val = 0x0UL; + + if (line == i2c_line_scl) { + write_val <<= ipath_gpio_scl_num; + *gpioval = *gpioval & ~(1UL << ipath_gpio_scl_num); + *gpioval |= write_val; + } else { + write_val <<= ipath_gpio_sda_num; + *gpioval = *gpioval & ~(1UL << ipath_gpio_sda_num); + *gpioval |= write_val; + } + ipath_kput_kreg(dev, kr_gpio_out, *gpioval); + + return 0; +} + +/* + * returns 0 if the line was set to the new state successfully, non-zero + * on error. curr_state is not set on error. + */ +int i2c_gpio_get(ipath_type dev, ipath_i2c_type line, + ipath_i2c_state * curr_statep) +{ + uint64_t read_val, write_val, mask; + + /* check args */ + if (curr_statep == NULL) + return 1; + + read_val = ipath_kget_kreg64(dev, kr_extctrl); + /* config line to be an input */ + if (line == i2c_line_scl) + mask = ipath_gpio_scl; + else + mask = ipath_gpio_sda; + write_val = read_val & ~mask; + ipath_kput_kreg(dev, kr_extctrl, write_val); + read_val = ipath_kget_kreg64(dev, kr_extstatus); + + if (read_val & mask) + *curr_statep = i2c_line_high; + else + *curr_statep = i2c_line_low; + + return 0; +} + +/* + * would prefer to not inline this, to avoid code bloat, and simplify debugging + * But when compiling against 2.6.10 kernel tree, it gets an error, so + * not for now. + */ +static void ipath_i2c_delay(ipath_type, int); + +/* + * we use this instead of udelay directly, so we can make sure + * that previous register writes have been flushed all the way + * to the chip. Since we are delaying anyway, the cost doesn't + * hurt, and makes the bit twiddling more regular + * If delay is negative, we'll do the chip read, to be sure write made it + * to our chip, but won't do udelay() + */ +static void ipath_i2c_delay(ipath_type dev, int dtime) +{ + /* + * This needs to be volatile, so that the compiler doesn't + * optimize away the read to the device's mapped memory. + */ + volatile uint32_t read_val; + if (!dtime) + return; + read_val = ipath_kget_kreg32(dev, kr_scratch); + if (--dtime > 0) /* register read takes about .5 usec, itself */ + udelay(dtime); +} + +static void ipath_scl_out(ipath_type dev, uint8_t bit, int delay) +{ + i2c_gpio_set(dev, i2c_line_scl, bit ? i2c_line_high : i2c_line_low); + + ipath_i2c_delay(dev, delay); +} + +static void ipath_sda_out(ipath_type dev, uint8_t bit, int delay) +{ + i2c_gpio_set(dev, i2c_line_sda, bit ? i2c_line_high : i2c_line_low); + + ipath_i2c_delay(dev, delay); +} + +static uint8_t ipath_sda_in(ipath_type dev, int delay) +{ + ipath_i2c_state bit; + + if (i2c_gpio_get(dev, i2c_line_sda, &bit)) + _IPATH_DBG("get bit failed!\n"); + + ipath_i2c_delay(dev, delay); + + return bit == i2c_line_high ? 
1U : 0; +} + +/* see if ack following write is true */ +static int ipath_i2c_ackrcv(ipath_type dev) +{ + uint8_t ack_received; + + /* AT ENTRY SCL = LOW */ + /* change direction, ignore data */ + ack_received = ipath_sda_in(dev, 1); + ipath_scl_out(dev, i2c_line_high, 1); + ack_received = ipath_sda_in(dev, 1) == 0; + ipath_scl_out(dev, i2c_line_low, 1); + return ack_received; +} + +/* + * write a byte, one bit at a time. Returns 0 if we got the following + * ack, otherwise 1 + */ +static int ipath_wr_byte(ipath_type dev, uint8_t data) +{ + int bit_cntr; + uint8_t bit; + + for (bit_cntr = 7; bit_cntr >= 0; bit_cntr--) { + bit = (data >> bit_cntr) & 1; + ipath_sda_out(dev, bit, 1); + ipath_scl_out(dev, i2c_line_high, 1); + ipath_scl_out(dev, i2c_line_low, 1); + } + if (!ipath_i2c_ackrcv(dev)) + return 1; + return 0; +} + +static void send_ack(ipath_type dev) +{ + ipath_sda_out(dev, i2c_line_low, 1); + ipath_scl_out(dev, i2c_line_high, 1); + ipath_scl_out(dev, i2c_line_low, 1); + ipath_sda_out(dev, i2c_line_high, 1); +} + +/* + * ipath_i2c_startcmd - Transmit the start condition, followed by + * address/cmd + * (both clock/data high, clock high, data low while clock is high) + */ +static int ipath_i2c_startcmd(ipath_type dev, uint8_t offset_dir) +{ + int res; + + /* issue start sequence */ + ipath_sda_out(dev, i2c_line_high, 1); + ipath_scl_out(dev, i2c_line_high, 1); + ipath_sda_out(dev, i2c_line_low, 1); + ipath_scl_out(dev, i2c_line_low, 1); + + /* issue length and direction byte */ + res = ipath_wr_byte(dev, offset_dir); + + if (res) + _IPATH_VDBG("No ack to complete start\n"); + return res; +} + +/* + * stop_cmd - Transmit the stop condition + * (both clock/data low, clock high, data high while clock is high) + */ +static void stop_cmd(ipath_type dev) +{ + ipath_scl_out(dev, i2c_line_low, 1); + ipath_sda_out(dev, i2c_line_low, 1); + ipath_scl_out(dev, i2c_line_high, 1); + ipath_sda_out(dev, i2c_line_high, 3); +} + +/* + * ipath_eeprom_reset - reset I2C communication. + * + * eeprom: Atmel AT24C01 + * + */ + +static int ipath_eeprom_reset(ipath_type dev) +{ + int clock_cycles_left = 9; + uint64_t *gpioval = &devdata[dev].ipath_gpio_out; + + ipath_eeprom_init = 1; + *gpioval = ipath_kget_kreg64(dev, kr_gpio_out); + _IPATH_VDBG("Resetting i2c flash; initial gpioout reg is %llx\n", + *gpioval); + + /* + * This is to get the i2c into a known state, by first going low, + * then tristate sda (and then tristate scl as first thing in loop) + */ + ipath_scl_out(dev, i2c_line_low, 1); + ipath_sda_out(dev, i2c_line_high, 1); + + while (clock_cycles_left--) { + ipath_scl_out(dev, i2c_line_high, 1); + + if (ipath_sda_in(dev, 0)) { + ipath_sda_out(dev, i2c_line_low, 1); + ipath_scl_out(dev, i2c_line_low, 1); + return 0; + } + + ipath_scl_out(dev, i2c_line_low, 1); + } + + return 1; +} + +/* + * ipath_eeprom_read - Receives x # byte from the eeprom via I2C. + * + * eeprom: Atmel AT24C01 + * + */ + +int ipath_eeprom_read(ipath_type dev, uint8_t eeprom_offset, void *buffer, + int len) +{ + /* compiler complains unless initialized */ + uint8_t single_byte = 0; + int bit_cntr; + + if (!ipath_eeprom_init) + ipath_eeprom_reset(dev); + + eeprom_offset = (eeprom_offset << 1) | READ_CMD; + + if (ipath_i2c_startcmd(dev, eeprom_offset)) { + _IPATH_DBG("Failed startcmd\n"); + stop_cmd(dev); + return 1; + } + + /* + * flash keeps clocking data out as long as we ack, automatically + * incrementing the address. 
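 *
 * (Summary of the sequential-read protocol implemented below, for
 * illustration: START, send (offset << 1) | READ_CMD, then for each
 * byte clock in 8 bits MSB first, ACK every byte except the last,
 * and finish with STOP. A hypothetical caller reading the first 8
 * bytes of the flash would do:
 *	uint8_t buf[8];
 *	if (ipath_eeprom_read(dev, 0, buf, sizeof buf) == 0)
 *		... use buf ...
 * )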
+ */ + while (len-- > 0) { + /* get data */ + single_byte = 0; + for (bit_cntr = 8; bit_cntr; bit_cntr--) { + uint8_t bit; + ipath_scl_out(dev, i2c_line_high, 1); + bit = ipath_sda_in(dev, 0); + single_byte |= bit << (bit_cntr - 1); + ipath_scl_out(dev, i2c_line_low, 1); + } + + /* send ack if not the last byte */ + if (len) + send_ack(dev); + + *((uint8_t *) buffer) = single_byte; + (uint8_t *) buffer++; + } + + stop_cmd(dev); + + return 0; +} + +/* + * ipath_eeprom_write - writes data to the eeprom via I2C. + * +*/ +int ipath_eeprom_write(ipath_type dev, uint8_t eeprom_offset, void *buffer, + int len) +{ + uint8_t single_byte; + int sub_len; + uint8_t *bp = buffer; + int max_wait_time, i; + + if (!ipath_eeprom_init) + ipath_eeprom_reset(dev); + + while (len > 0) { + if (ipath_i2c_startcmd(dev, (eeprom_offset << 1) | WRITE_CMD)) { + _IPATH_DBG("Failed to start cmd offset %u\n", + eeprom_offset); + goto failed_write; + } + + sub_len = min(len, 4); + eeprom_offset += sub_len; + len -= sub_len; + + for (i = 0; i < sub_len; i++) { + if (ipath_wr_byte(dev, *bp++)) { + _IPATH_DBG + ("no ack after byte %u/%u (%u total remain)\n", + i, sub_len, len + sub_len - i); + goto failed_write; + } + } + + stop_cmd(dev); + + /* + * wait for write complete by waiting for a successful + * read (the chip replies with a zero after the write + * cmd completes, and before it writes to the flash. + * The startcmd for the read will fail the ack until + * the writes have completed. We do this inline to avoid + * the debug prints that are in the real read routine + * if the startcmd fails. + */ + max_wait_time = 100; + while (ipath_i2c_startcmd(dev, READ_CMD)) { + stop_cmd(dev); + if (!--max_wait_time) { + _IPATH_DBG + ("Did not get successful read to complete write\n"); + goto failed_write; + } + } + /* now read the zero byte */ + for (i = single_byte = 0; i < 8; i++) { + uint8_t bit; + ipath_scl_out(dev, i2c_line_high, 1); + bit = ipath_sda_in(dev, 0); + ipath_scl_out(dev, i2c_line_low, 1); + single_byte <<= 1; + single_byte |= bit; + } + stop_cmd(dev); + } + + return 0; + +failed_write: + stop_cmd(dev); + return 1; +} + +uint8_t ipath_flash_csum(struct ipath_flash * ifp, int adjust) +{ + uint8_t *ip = (uint8_t *) ifp; + uint8_t csum = 0, len; + + for (len = 0; len < ifp->if_length; len++) + csum += *ip++; + csum -= ifp->if_csum; + csum = ~csum; + if (adjust) + ifp->if_csum = csum; + return csum; +} diff --git a/drivers/infiniband/hw/ipath/ipath_lib.c b/drivers/infiniband/hw/ipath/ipath_lib.c new file mode 100644 index 0000000..d9a40b8 --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_lib.c @@ -0,0 +1,92 @@ +/* + * Copyright (c) 2003, 2004, 2005. PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + * + * $Id: ipath_lib.c 4365 2005-12-10 00:04:16Z rjwalsh $ + */ + +/* + * This is library code for the driver, similar to what's in libinfinipath for + * usermode code. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "ipath_kernel.h" + +unsigned infinipath_debug = __IPATH_INFO; + +uint32_t _ipath_pico_per_cycle; /* always present, for now */ + +/* + * This isn't perfect, but it's close enough for timing work. We want this + * to work on systems where the cycle counter isn't the same as the clock + * frequency. The one msec spin is OK, since we execute this only once + * when first loaded. We don't use CURRENT_TIME because on some systems + * it only has jiffy resolution; we just assume udelay is well calibrated + * and that we aren't likely to be rescheduled. Do it multiple times, + * with a yield in between, to try to make sure we get the "true minimum" + * value. + * _ipath_pico_per_cycle isn't going to lead to completely accurate + * conversions from timestamps to nanoseconds, but it's close enough + * for our purposes, which is mainly to allow people to show events with + * nsecs or usecs if desired, rather than cycles. + */ +void ipath_init_picotime(void) +{ + int i; + u_int64_t ts, te, delta = -1ULL; + + for (i = 0; i < 5; i++) { + ts = get_cycles(); + udelay(250); + te = get_cycles(); + if ((te - ts) < delta) + delta = te - ts; + yield(); + } + _ipath_pico_per_cycle = 250000000 / delta; +} diff --git a/drivers/infiniband/hw/ipath/ipath_mlock.c b/drivers/infiniband/hw/ipath/ipath_mlock.c new file mode 100644 index 0000000..72eb7c0 --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_mlock.c @@ -0,0 +1,139 @@ +/* + * Copyright (c) 2003, 2004, 2005. PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + * + * $Id: ipath_mlock.c 4365 2005-12-10 00:04:16Z rjwalsh $ + */ + +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#include "ipath_kernel.h" + +/* + * Our version of the kernel mlock function. This function is no longer + * exposed, so we need to do it ourselves. It takes a given start page + * (page aligned user virtual address) and pins it and the following specified + * number of pages. + * For now, num_pages is always 1, but that will probably change at some + * point (because caller is doing expected sends on a single virtually + * contiguous buffer, so we can do all pages at once). + */ +int ipath_mlock(unsigned long start_page, size_t num_pages, struct page **p) +{ + int n; + + _IPATH_VDBG("pin %lx pages from vaddr %lx\n", num_pages, start_page); + down_read(¤t->mm->mmap_sem); + n = get_user_pages(current, current->mm, start_page, num_pages, 1, 1, + p, NULL); + up_read(¤t->mm->mmap_sem); + if (n != num_pages) { + _IPATH_INFO + ("get_user_pages (0x%lx pages starting at 0x%lx failed with %d\n", + num_pages, start_page, n); + if (n < 0) /* it's an errno */ + return n; + return -ENOMEM; /* no way to know actual error */ + } + + return 0; +} + +/* + * this is similar to ipath_mlock, but it's always one page, and we mark + * the page as locked for i/o, and shared. This is used for the user process + * page that contains the destination address for the rcvhdrq tail update, + * so we need to have the vma. If we don't do this, the page can be taken + * away from us on fork, even if the child never touches it, and then + * the user process never sees the tail register updates. + */ +int ipath_mlock_nocopy(unsigned long start_page, struct page **p) +{ + int n; + struct vm_area_struct *vm = NULL; + + down_read(¤t->mm->mmap_sem); + n = get_user_pages(current, current->mm, start_page, 1, 1, 1, p, &vm); + up_read(¤t->mm->mmap_sem); + if (n != 1) { + _IPATH_INFO("get_user_pages for 0x%lx failed with %d\n", + start_page, n); + if (n < 0) /* it's an errno */ + return n; + return -ENOMEM; /* no way to know actual error */ + } + vm->vm_flags |= VM_SHM | VM_LOCKED; + + return 0; +} + +/* + * Our version of the kernel munlock function. This function is no longer + * exposed, so we need to do it ourselves. It unpins the start page + * (a page aligned full user virtual address, not a page number) + * and pins it and the following specified number of pages. + */ +int ipath_munlock(size_t num_pages, struct page **p) +{ + int i; + + for (i = 0; i < num_pages; i++) { + _IPATH_MMDBG("%u/%lu put_page %p\n", i, num_pages, p[i]); + SetPageDirty(p[i]); + put_page(p[i]); + } + return 0; +} + +/* + * This routine frees up all the allocations made in this file; it's a nop + * now, but I'm leaving it in case we go back to a more sophisticated + * implementation later. 
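 *
 * (Illustrative sketch of the pin/unpin pairing the functions above
 * expect; uaddr and the DMA step are hypothetical:
 *	struct page *pages[1];
 *	if (!ipath_mlock(uaddr & PAGE_MASK, 1, pages)) {
 *		... let the chip DMA into the page ...
 *		ipath_munlock(1, pages);   marks dirty, drops the pin
 *	}
 * ipath_mlock_nocopy() is the variant for the single rcvhdrq tail
 * page; it additionally marks the vma VM_SHM | VM_LOCKED so fork()
 * cannot take the page away via copy-on-write.)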
+ */ +void ipath_mlock_cleanup(ipath_portdata * pd) +{ +} -- 0.99.9n From rolandd at cisco.com Fri Dec 16 15:48:55 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 16 Dec 2005 15:48:55 -0800 Subject: [openib-general] [PATCH 11/13] [RFC] ipath verbs, part 2 In-Reply-To: <200512161548.W9sJn4CLmdhnSTcH@cisco.com> Message-ID: <200512161548.mhIvDiba3wkjPaMc@cisco.com> Second half of ipath verbs --- drivers/infiniband/hw/ipath/ipath_verbs.c | 2931 +++++++++++++++++++++++++++++ 1 files changed, 2931 insertions(+), 0 deletions(-) 3f617d81354835f183e089849cca09e295b2df0a diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c index 808326e..25d738d 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.c +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c @@ -3242,3 +3242,2934 @@ static int get_rwqe(struct ipath_qp *qp, spin_unlock(&rq->lock); return 1; } + +/* + * This is called from ipath_qp_rcv() to process an incomming UC packet + * for the given QP. + * Called at interrupt level. + */ +static void ipath_uc_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr, + int has_grh, void *data, u32 tlen, struct ipath_qp *qp) +{ + struct ipath_other_headers *ohdr; + int opcode; + u32 hdrsize; + u32 psn; + u32 pad; + unsigned long flags; + struct ib_wc wc; + u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu); + struct ib_reth *reth; + + /* Check for GRH */ + if (!has_grh) { + ohdr = &hdr->u.oth; + hdrsize = 8 + 12; /* LRH + BTH */ + psn = be32_to_cpu(ohdr->bth[2]); + } else { + ohdr = &hdr->u.l.oth; + hdrsize = 8 + 40 + 12; /* LRH + GRH + BTH */ + /* + * The header with GRH is 60 bytes and the + * core driver sets the eager header buffer + * size to 56 bytes so the last 4 bytes of + * the BTH header (PSN) is in the data buffer. + */ + psn = be32_to_cpu(((u32 *) data)[0]); + data += sizeof(u32); + } + /* + * The opcode is in the low byte when its in network order + * (top byte when in host order). + */ + opcode = *(u8 *) (&ohdr->bth[0]); + + wc.imm_data = 0; + wc.wc_flags = 0; + + spin_lock_irqsave(&qp->r_rq.lock, flags); + + /* Compare the PSN verses the expected PSN. */ + if (unlikely(cmp24(psn, qp->r_psn) != 0)) { + /* + * Handle a sequence error. + * Silently drop any current message. + */ + qp->r_psn = psn; + inv: + qp->r_state = IB_OPCODE_UC_SEND_LAST; + switch (opcode) { + case IB_OPCODE_UC_SEND_FIRST: + case IB_OPCODE_UC_SEND_ONLY: + case IB_OPCODE_UC_SEND_ONLY_WITH_IMMEDIATE: + goto send_first; + + case IB_OPCODE_UC_RDMA_WRITE_FIRST: + case IB_OPCODE_UC_RDMA_WRITE_ONLY: + case IB_OPCODE_UC_RDMA_WRITE_ONLY_WITH_IMMEDIATE: + goto rdma_first; + + default: + dev->n_pkt_drops++; + goto done; + } + } + + /* Check for opcode sequence errors. 
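 * In UC, an opcode is only valid if it continues the message in
 * progress: after SEND_FIRST/MIDDLE only SEND_MIDDLE or a SEND_LAST
 * variant may follow, after RDMA_WRITE_FIRST/MIDDLE only the WRITE
 * continuations, and from any resting state only a FIRST or ONLY
 * opcode may start a new message. Anything else resynchronizes via
 * the inv path above, silently dropping the partial message.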
*/ + switch (qp->r_state) { + case IB_OPCODE_UC_SEND_FIRST: + case IB_OPCODE_UC_SEND_MIDDLE: + if (opcode == IB_OPCODE_UC_SEND_MIDDLE || + opcode == IB_OPCODE_UC_SEND_LAST || + opcode == IB_OPCODE_UC_SEND_LAST_WITH_IMMEDIATE) + break; + goto inv; + + case IB_OPCODE_UC_RDMA_WRITE_FIRST: + case IB_OPCODE_UC_RDMA_WRITE_MIDDLE: + if (opcode == IB_OPCODE_UC_RDMA_WRITE_MIDDLE || + opcode == IB_OPCODE_UC_RDMA_WRITE_LAST || + opcode == IB_OPCODE_UC_RDMA_WRITE_LAST_WITH_IMMEDIATE) + break; + goto inv; + + default: + if (opcode == IB_OPCODE_UC_SEND_FIRST || + opcode == IB_OPCODE_UC_SEND_ONLY || + opcode == IB_OPCODE_UC_SEND_ONLY_WITH_IMMEDIATE || + opcode == IB_OPCODE_UC_RDMA_WRITE_FIRST || + opcode == IB_OPCODE_UC_RDMA_WRITE_ONLY || + opcode == IB_OPCODE_UC_RDMA_WRITE_ONLY_WITH_IMMEDIATE) + break; + goto inv; + } + + /* OK, process the packet. */ + switch (opcode) { + case IB_OPCODE_UC_SEND_FIRST: + case IB_OPCODE_UC_SEND_ONLY: + case IB_OPCODE_UC_SEND_ONLY_WITH_IMMEDIATE: + send_first: + if (qp->r_reuse_sge) { + qp->r_reuse_sge = 0; + qp->r_sge = qp->s_rdma_sge; + } else if (!get_rwqe(qp, 0)) { + dev->n_pkt_drops++; + goto done; + } + /* Save the WQE so we can reuse it in case of an error. */ + qp->s_rdma_sge = qp->r_sge; + qp->r_rcv_len = 0; + if (opcode == IB_OPCODE_UC_SEND_ONLY) + goto send_last; + else if (opcode == IB_OPCODE_UC_SEND_ONLY_WITH_IMMEDIATE) + goto send_last_imm; + /* FALLTHROUGH */ + case IB_OPCODE_UC_SEND_MIDDLE: + /* Check for invalid length PMTU or posted rwqe len. */ + if (unlikely(tlen != (hdrsize + pmtu + 4))) { + qp->r_reuse_sge = 1; + dev->n_pkt_drops++; + goto done; + } + qp->r_rcv_len += pmtu; + if (unlikely(qp->r_rcv_len > qp->r_len)) { + qp->r_reuse_sge = 1; + dev->n_pkt_drops++; + goto done; + } + copy_sge(&qp->r_sge, data, pmtu); + break; + + case IB_OPCODE_UC_SEND_LAST_WITH_IMMEDIATE: + send_last_imm: + if (has_grh) { + wc.imm_data = *(u32 *) data; + data += sizeof(u32); + } else { + /* Immediate data comes after BTH */ + wc.imm_data = ohdr->u.imm_data; + } + hdrsize += 4; + wc.wc_flags = IB_WC_WITH_IMM; + /* FALLTHROUGH */ + case IB_OPCODE_UC_SEND_LAST: + send_last: + /* Get the number of bytes the message was padded by. */ + pad = (ohdr->bth[0] >> 12) & 3; + /* Check for invalid length. */ + /* XXX LAST len should be >= 1 */ + if (unlikely(tlen < (hdrsize + pad + 4))) { + qp->r_reuse_sge = 1; + dev->n_pkt_drops++; + goto done; + } + /* Don't count the CRC. */ + tlen -= (hdrsize + pad + 4); + wc.byte_len = tlen + qp->r_rcv_len; + if (unlikely(wc.byte_len > qp->r_len)) { + qp->r_reuse_sge = 1; + dev->n_pkt_drops++; + goto done; + } + /* XXX Need to free SGEs */ + last_imm: + copy_sge(&qp->r_sge, data, tlen); + wc.wr_id = qp->r_wr_id; + wc.status = IB_WC_SUCCESS; + wc.opcode = IB_WC_RECV; + wc.vendor_err = 0; + wc.qp_num = qp->ibqp.qp_num; + wc.src_qp = qp->remote_qpn; + wc.pkey_index = 0; + wc.slid = qp->remote_ah_attr.dlid; + wc.sl = qp->remote_ah_attr.sl; + wc.dlid_path_bits = 0; + wc.port_num = 0; + /* Signal completion event if the solicited bit is set. 
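 * The solicited bit is bit 23 of the first BTH word; it is tested
 * below while still in network byte order, hence the
 * __constant_cpu_to_be32(1 << 23) rather than a shift after swapping.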
*/ + ipath_cq_enter(to_icq(qp->ibqp.recv_cq), &wc, + ohdr->bth[0] & __constant_cpu_to_be32(1 << 23)); + break; + + case IB_OPCODE_UC_RDMA_WRITE_FIRST: + case IB_OPCODE_UC_RDMA_WRITE_ONLY: + case IB_OPCODE_UC_RDMA_WRITE_ONLY_WITH_IMMEDIATE: /* consume RWQE */ + rdma_first: + /* RETH comes after BTH */ + if (!has_grh) + reth = &ohdr->u.rc.reth; + else { + reth = (struct ib_reth *)data; + data += sizeof(*reth); + } + hdrsize += sizeof(*reth); + qp->r_len = be32_to_cpu(reth->length); + qp->r_rcv_len = 0; + if (qp->r_len != 0) { + u32 rkey = be32_to_cpu(reth->rkey); + u64 vaddr = be64_to_cpu(reth->vaddr); + + /* Check rkey */ + if (unlikely(!ipath_rkey_ok(dev, &qp->r_sge, qp->r_len, + vaddr, rkey, + IB_ACCESS_REMOTE_WRITE))) { + dev->n_pkt_drops++; + goto done; + } + } else { + qp->r_sge.sg_list = NULL; + qp->r_sge.sge.mr = NULL; + qp->r_sge.sge.vaddr = NULL; + qp->r_sge.sge.length = 0; + qp->r_sge.sge.sge_length = 0; + } + if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_WRITE))) { + dev->n_pkt_drops++; + goto done; + } + if (opcode == IB_OPCODE_UC_RDMA_WRITE_ONLY) + goto rdma_last; + else if (opcode == IB_OPCODE_UC_RDMA_WRITE_ONLY_WITH_IMMEDIATE) + goto rdma_last_imm; + /* FALLTHROUGH */ + case IB_OPCODE_UC_RDMA_WRITE_MIDDLE: + /* Check for invalid length PMTU or posted rwqe len. */ + if (unlikely(tlen != (hdrsize + pmtu + 4))) { + dev->n_pkt_drops++; + goto done; + } + qp->r_rcv_len += pmtu; + if (unlikely(qp->r_rcv_len > qp->r_len)) { + dev->n_pkt_drops++; + goto done; + } + copy_sge(&qp->r_sge, data, pmtu); + break; + + case IB_OPCODE_UC_RDMA_WRITE_LAST_WITH_IMMEDIATE: + rdma_last_imm: + /* Get the number of bytes the message was padded by. */ + pad = (ohdr->bth[0] >> 12) & 3; + /* Check for invalid length. */ + /* XXX LAST len should be >= 1 */ + if (unlikely(tlen < (hdrsize + pad + 4))) { + dev->n_pkt_drops++; + goto done; + } + /* Don't count the CRC. */ + tlen -= (hdrsize + pad + 4); + if (unlikely(tlen + qp->r_rcv_len != qp->r_len)) { + dev->n_pkt_drops++; + goto done; + } + if (qp->r_reuse_sge) { + qp->r_reuse_sge = 0; + } else if (!get_rwqe(qp, 1)) { + dev->n_pkt_drops++; + goto done; + } + if (has_grh) { + wc.imm_data = *(u32 *) data; + data += sizeof(u32); + } else { + /* Immediate data comes after BTH */ + wc.imm_data = ohdr->u.imm_data; + } + hdrsize += 4; + wc.wc_flags = IB_WC_WITH_IMM; + wc.byte_len = 0; + goto last_imm; + + case IB_OPCODE_UC_RDMA_WRITE_LAST: + rdma_last: + /* Get the number of bytes the message was padded by. */ + pad = (ohdr->bth[0] >> 12) & 3; + /* Check for invalid length. */ + /* XXX LAST len should be >= 1 */ + if (unlikely(tlen < (hdrsize + pad + 4))) { + dev->n_pkt_drops++; + goto done; + } + /* Don't count the CRC. */ + tlen -= (hdrsize + pad + 4); + if (unlikely(tlen + qp->r_rcv_len != qp->r_len)) { + dev->n_pkt_drops++; + goto done; + } + copy_sge(&qp->r_sge, data, tlen); + break; + + default: + /* Drop packet for unknown opcodes. */ + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + dev->n_pkt_drops++; + return; + } + qp->r_psn++; + qp->r_state = opcode; +done: + spin_unlock_irqrestore(&qp->r_rq.lock, flags); +} + +/* + * Put this QP on the RNR timeout list for the device. + * XXX Use a simple list for now. We might need a priority + * queue if we have lots of QPs waiting for RNR timeouts + * but that should be rare. 
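 * Timeouts on this list are delta-encoded: each entry stores its
 * timeout relative to the entries ahead of it. Worked example: a list
 * with absolute expiries 3 and 8 holds deltas {3, 5}; inserting a QP
 * with timeout 7 walks past the first entry (subtracting 3, leaving
 * 4), stops because 4 < 5, and links itself in with delta 4 after
 * the first entry.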
+ */ +static void insert_rnr_queue(struct ipath_qp *qp) +{ + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); + unsigned long flags; + + spin_lock_irqsave(&dev->pending_lock, flags); + if (list_empty(&dev->rnrwait)) + list_add(&qp->timerwait, &dev->rnrwait); + else { + struct list_head *l = &dev->rnrwait; + struct ipath_qp *nqp = list_entry(l->next, struct ipath_qp, + timerwait); + + while (qp->s_rnr_timeout >= nqp->s_rnr_timeout) { + qp->s_rnr_timeout -= nqp->s_rnr_timeout; + l = l->next; + if (l->next == &dev->rnrwait) + break; + nqp = list_entry(l->next, struct ipath_qp, timerwait); + } + list_add(&qp->timerwait, l); + } + spin_unlock_irqrestore(&dev->pending_lock, flags); +} + +/* + * This is called from do_uc_send() or do_rc_send() to forward a WQE addressed + * to the same HCA. + * Note that although we are single threaded due to the tasklet, we still + * have to protect against post_send(). We don't have to worry about + * receive interrupts since this is a connected protocol and all packets + * will pass through here. + */ +static void ipath_ruc_loopback(struct ipath_qp *sqp, struct ib_wc *wc) +{ + struct ipath_ibdev *dev = to_idev(sqp->ibqp.device); + struct ipath_qp *qp; + struct ipath_swqe *wqe; + struct ipath_sge *sge; + unsigned long flags; + u64 sdata; + + qp = ipath_lookup_qpn(&dev->qp_table, sqp->remote_qpn); + if (!qp) { + dev->n_pkt_drops++; + return; + } + +again: + spin_lock_irqsave(&sqp->s_lock, flags); + + if (!(state_ops[sqp->state] & IPATH_PROCESS_SEND_OK)) { + spin_unlock_irqrestore(&sqp->s_lock, flags); + goto done; + } + + /* Get the next send request. */ + if (sqp->s_last == sqp->s_head) { + /* Send work queue is empty. */ + spin_unlock_irqrestore(&sqp->s_lock, flags); + goto done; + } + + /* + * We can rely on the entry not changing without the s_lock + * being held until we update s_last. 
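 * In other words: post_send() only appends at s_head, and this
 * function is the only consumer that advances s_last, so once s_last
 * has been read under the lock the WQE it names stays stable until
 * we finish with it and bump s_last at the bottom.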
+ */ + wqe = get_swqe_ptr(sqp, sqp->s_last); + spin_unlock_irqrestore(&sqp->s_lock, flags); + + wc->wc_flags = 0; + wc->imm_data = 0; + + sqp->s_sge.sge = wqe->sg_list[0]; + sqp->s_sge.sg_list = wqe->sg_list + 1; + sqp->s_sge.num_sge = wqe->wr.num_sge; + sqp->s_len = wqe->length; + switch (wqe->wr.opcode) { + case IB_WR_SEND_WITH_IMM: + wc->wc_flags = IB_WC_WITH_IMM; + wc->imm_data = wqe->wr.imm_data; + /* FALLTHROUGH */ + case IB_WR_SEND: + spin_lock_irqsave(&qp->r_rq.lock, flags); + if (!get_rwqe(qp, 0)) { + rnr_nak: + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + /* Handle RNR NAK */ + if (qp->ibqp.qp_type == IB_QPT_UC) + goto send_comp; + if (sqp->s_rnr_retry == 0) { + wc->status = IB_WC_RNR_RETRY_EXC_ERR; + goto err; + } + if (sqp->s_rnr_retry_cnt < 7) + sqp->s_rnr_retry--; + dev->n_rnr_naks++; + sqp->s_rnr_timeout = rnr_table[sqp->s_min_rnr_timer]; + insert_rnr_queue(sqp); + goto done; + } + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + break; + + case IB_WR_RDMA_WRITE_WITH_IMM: + wc->wc_flags = IB_WC_WITH_IMM; + wc->imm_data = wqe->wr.imm_data; + spin_lock_irqsave(&qp->r_rq.lock, flags); + if (!get_rwqe(qp, 1)) + goto rnr_nak; + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + /* FALLTHROUGH */ + case IB_WR_RDMA_WRITE: + if (wqe->length == 0) + break; + if (unlikely(!ipath_rkey_ok(dev, &qp->r_sge, wqe->length, + wqe->wr.wr.rdma.remote_addr, + wqe->wr.wr.rdma.rkey, + IB_ACCESS_REMOTE_WRITE))) { + acc_err: + wc->status = IB_WC_REM_ACCESS_ERR; + err: + wc->wr_id = wqe->wr.wr_id; + wc->opcode = wc_opcode[wqe->wr.opcode]; + wc->vendor_err = 0; + wc->byte_len = 0; + wc->qp_num = sqp->ibqp.qp_num; + wc->src_qp = sqp->remote_qpn; + wc->pkey_index = 0; + wc->slid = sqp->remote_ah_attr.dlid; + wc->sl = sqp->remote_ah_attr.sl; + wc->dlid_path_bits = 0; + wc->port_num = 0; + ipath_sqerror_qp(sqp, wc); + goto done; + } + break; + + case IB_WR_RDMA_READ: + if (unlikely(!ipath_rkey_ok(dev, &sqp->s_sge, wqe->length, + wqe->wr.wr.rdma.remote_addr, + wqe->wr.wr.rdma.rkey, + IB_ACCESS_REMOTE_READ))) { + goto acc_err; + } + if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_READ))) + goto acc_err; + qp->r_sge.sge = wqe->sg_list[0]; + qp->r_sge.sg_list = wqe->sg_list + 1; + qp->r_sge.num_sge = wqe->wr.num_sge; + break; + + case IB_WR_ATOMIC_CMP_AND_SWP: + case IB_WR_ATOMIC_FETCH_AND_ADD: + if (unlikely(!ipath_rkey_ok(dev, &qp->r_sge, sizeof(u64), + wqe->wr.wr.rdma.remote_addr, + wqe->wr.wr.rdma.rkey, + IB_ACCESS_REMOTE_ATOMIC))) { + goto acc_err; + } + /* Perform atomic OP and save result. 
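 * Under dev->pending_lock: fetch-and-add stores old + sdata (sdata is
 * taken from the swap field here) and compare-and-swap stores the
 * swap value only when old == compare_add; either way the old value
 * is handed back to the requester through sqp->s_sge just after the
 * unlock.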
 */
+		sdata = wqe->wr.wr.atomic.swap;
+		spin_lock_irqsave(&dev->pending_lock, flags);
+		qp->r_atomic_data = *(u64 *) qp->r_sge.sge.vaddr;
+		if (wqe->wr.opcode == IB_WR_ATOMIC_FETCH_AND_ADD) {
+			*(u64 *) qp->r_sge.sge.vaddr =
+			    qp->r_atomic_data + sdata;
+		} else if (qp->r_atomic_data == wqe->wr.wr.atomic.compare_add) {
+			*(u64 *) qp->r_sge.sge.vaddr = sdata;
+		}
+		spin_unlock_irqrestore(&dev->pending_lock, flags);
+		*(u64 *) sqp->s_sge.sge.vaddr = qp->r_atomic_data;
+		goto send_comp;
+
+	default:
+		goto done;
+	}
+
+	sge = &sqp->s_sge.sge;
+	while (sqp->s_len) {
+		u32 len = sqp->s_len;
+
+		if (len > sge->length)
+			len = sge->length;
+		BUG_ON(len == 0);
+		copy_sge(&qp->r_sge, sge->vaddr, len);
+		sge->vaddr += len;
+		sge->length -= len;
+		sge->sge_length -= len;
+		if (sge->sge_length == 0) {
+			if (--sqp->s_sge.num_sge)
+				*sge = *sqp->s_sge.sg_list++;
+		} else if (sge->length == 0 && sge->mr != NULL) {
+			if (++sge->n >= IPATH_SEGSZ) {
+				if (++sge->m >= sge->mr->mapsz)
+					break;
+				sge->n = 0;
+			}
+			sge->vaddr = sge->mr->map[sge->m]->segs[sge->n].vaddr;
+			sge->length = sge->mr->map[sge->m]->segs[sge->n].length;
+		}
+		sqp->s_len -= len;
+	}
+
+	if (wqe->wr.opcode == IB_WR_RDMA_WRITE ||
+	    wqe->wr.opcode == IB_WR_RDMA_READ)
+		goto send_comp;
+
+	if (wqe->wr.opcode == IB_WR_RDMA_WRITE_WITH_IMM)
+		wc->opcode = IB_WC_RECV_RDMA_WITH_IMM;
+	else
+		wc->opcode = IB_WC_RECV;
+	wc->wr_id = qp->r_wr_id;
+	wc->status = IB_WC_SUCCESS;
+	wc->vendor_err = 0;
+	wc->byte_len = wqe->length;
+	wc->qp_num = qp->ibqp.qp_num;
+	wc->src_qp = qp->remote_qpn;
+	/* XXX do we know which pkey matched? Only needed for GSI. */
+	wc->pkey_index = 0;
+	wc->slid = qp->remote_ah_attr.dlid;
+	wc->sl = qp->remote_ah_attr.sl;
+	wc->dlid_path_bits = 0;
+	/* Signal completion event if the solicited bit is set. */
+	ipath_cq_enter(to_icq(qp->ibqp.recv_cq), wc,
+		       wqe->wr.send_flags & IB_SEND_SOLICITED);
+
+send_comp:
+	sqp->s_rnr_retry = sqp->s_rnr_retry_cnt;
+
+	if (!test_bit(IPATH_S_SIGNAL_REQ_WR, &sqp->s_flags) ||
+	    (wqe->wr.send_flags & IB_SEND_SIGNALED)) {
+		wc->wr_id = wqe->wr.wr_id;
+		wc->status = IB_WC_SUCCESS;
+		wc->opcode = wc_opcode[wqe->wr.opcode];
+		wc->vendor_err = 0;
+		wc->byte_len = wqe->length;
+		wc->qp_num = sqp->ibqp.qp_num;
+		wc->src_qp = 0;
+		wc->pkey_index = 0;
+		wc->slid = 0;
+		wc->sl = 0;
+		wc->dlid_path_bits = 0;
+		wc->port_num = 0;
+		ipath_cq_enter(to_icq(sqp->ibqp.send_cq), wc, 0);
+	}
+
+	/* Update s_last now that we are finished with the SWQE */
+	spin_lock_irqsave(&sqp->s_lock, flags);
+	if (++sqp->s_last >= sqp->s_size)
+		sqp->s_last = 0;
+	spin_unlock_irqrestore(&sqp->s_lock, flags);
+	goto again;
+
+done:
+	if (atomic_dec_and_test(&qp->refcount))
+		wake_up(&qp->wait);
+}
+
+/*
+ * Process the credit field of an incoming AETH and raise the QP's
+ * send credit limit (s_lsn) accordingly.
+ * The QP s_lock should be held.
+ */
+static void ipath_get_credit(struct ipath_qp *qp, u32 aeth)
+{
+	u32 credit = (aeth >> 24) & 0x1F;
+
+	/*
+	 * If credit == 0x1F, credit is invalid and we can send
+	 * as many packets as we like. Otherwise, we have to
+	 * honor the credit field.
+	 */
+	if (credit == 0x1F) {
+		qp->s_lsn = (u32) -1;
+	} else if (qp->s_lsn != (u32) -1) {
+		/* Compute new LSN (i.e., MSN + credit) */
+		credit = (aeth + credit_table[credit]) & 0xFFFFFF;
+		if (cmp24(credit, qp->s_lsn) > 0)
+			qp->s_lsn = credit;
+	}
+
+	/* Restart sending if it was blocked due to lack of credits.
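 * The test below kicks the send tasklet only when an unsent WQE
 * exists (s_cur != s_head) and either the limit is disabled
 * (s_lsn == (u32) -1) or the next WQE's SSN is within the freshly
 * computed limit.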
*/ + if (qp->s_cur != qp->s_head && + (qp->s_lsn == (u32) -1 || + cmp24(get_swqe_ptr(qp, qp->s_cur)->ssn, qp->s_lsn + 1) <= 0)) { + tasklet_schedule(&qp->s_task); + } +} + +/* + * This is called from ipath_rc_rcv() to process an incomming RC ACK + * for the given QP. + * Called at interrupt level with the QP s_lock held. + * Returns 1 if OK, 0 if current operation should be aborted (NAK). + */ +static int do_rc_ack(struct ipath_qp *qp, u32 aeth, u32 psn, int opcode) +{ + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); + struct ib_wc wc; + struct ipath_swqe *wqe; + + /* + * Remove the QP from the timeout queue (or RNR timeout queue). + * If ipath_ib_timer() has already removed it, + * it's OK since we hold the QP s_lock and ipath_restart_rc() + * just won't find anything to restart if we ACK everything. + */ + spin_lock(&dev->pending_lock); + if (qp->timerwait.next != LIST_POISON1) + list_del(&qp->timerwait); + spin_unlock(&dev->pending_lock); + + /* + * Note that NAKs implicitly ACK outstanding SEND and + * RDMA write requests and implicitly NAK RDMA read and + * atomic requests issued before the NAK'ed request. + * The MSN won't include the NAK'ed request but will include + * an ACK'ed request(s). + */ + wqe = get_swqe_ptr(qp, qp->s_last); + + /* Nothing is pending to ACK/NAK. */ + if (qp->s_last == qp->s_tail) + return 0; + + /* + * The MSN might be for a later WQE than the PSN indicates so + * only complete WQEs that the PSN finishes. + */ + while (cmp24(psn, wqe->lpsn) >= 0) { + /* If we are ACKing a WQE, the MSN should be >= the SSN. */ + if (cmp24(aeth, wqe->ssn) < 0) + break; + /* + * If this request is a RDMA read or atomic, and the ACK is + * for a later operation, this ACK NAKs the RDMA read or atomic. + * In other words, only a RDMA_READ_LAST or ONLY can ACK + * a RDMA read and likewise for atomic ops. + * Note that the NAK case can only happen if relaxed ordering + * is used and requests are sent after an RDMA read + * or atomic is sent but before the response is received. + */ + if ((wqe->wr.opcode == IB_WR_RDMA_READ && + opcode != IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST) || + ((wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP || + wqe->wr.opcode == IB_WR_ATOMIC_FETCH_AND_ADD) && + (opcode != IB_OPCODE_RC_ATOMIC_ACKNOWLEDGE || + cmp24(wqe->psn, psn) != 0))) { + /* The last valid PSN seen is the previous request's. */ + qp->s_last_psn = wqe->psn - 1; + /* Retry this request. */ + ipath_restart_rc(qp, wqe->psn, &wc); + /* + * No need to process the ACK/NAK since we are + * restarting an earlier request. + */ + return 0; + } + /* Post a send completion queue entry if requested. */ + if (!test_bit(IPATH_S_SIGNAL_REQ_WR, &qp->s_flags) || + (wqe->wr.send_flags & IB_SEND_SIGNALED)) { + wc.wr_id = wqe->wr.wr_id; + wc.status = IB_WC_SUCCESS; + wc.opcode = wc_opcode[wqe->wr.opcode]; + wc.vendor_err = 0; + wc.byte_len = wqe->length; + wc.qp_num = qp->ibqp.qp_num; + wc.src_qp = qp->remote_qpn; + wc.pkey_index = 0; + wc.slid = qp->remote_ah_attr.dlid; + wc.sl = qp->remote_ah_attr.sl; + wc.dlid_path_bits = 0; + wc.port_num = 0; + ipath_cq_enter(to_icq(qp->ibqp.send_cq), &wc, 0); + } + qp->s_retry = qp->s_retry_cnt; + /* + * If we are completing a request which is in the process + * of being resent, we can stop resending it since we know + * the responder has already seen it. 
+ */ + if (qp->s_last == qp->s_cur) { + if (++qp->s_cur >= qp->s_size) + qp->s_cur = 0; + wqe = get_swqe_ptr(qp, qp->s_cur); + qp->s_state = IB_OPCODE_RC_SEND_LAST; + qp->s_psn = wqe->psn; + } + if (++qp->s_last >= qp->s_size) + qp->s_last = 0; + wqe = get_swqe_ptr(qp, qp->s_last); + if (qp->s_last == qp->s_tail) + break; + } + + switch (aeth >> 29) { + case 0: /* ACK */ + dev->n_rc_acks++; + /* If this is a partial ACK, reset the retransmit timer. */ + if (qp->s_last != qp->s_tail) { + spin_lock(&dev->pending_lock); + list_add_tail(&qp->timerwait, + &dev->pending[dev->pending_index]); + spin_unlock(&dev->pending_lock); + } + ipath_get_credit(qp, aeth); + qp->s_rnr_retry = qp->s_rnr_retry_cnt; + qp->s_retry = qp->s_retry_cnt; + qp->s_last_psn = psn; + return 1; + + case 1: /* RNR NAK */ + dev->n_rnr_naks++; + if (qp->s_rnr_retry == 0) { + if (qp->s_last == qp->s_tail) + return 0; + + wc.status = IB_WC_RNR_RETRY_EXC_ERR; + goto class_b; + } + if (qp->s_rnr_retry_cnt < 7) + qp->s_rnr_retry--; + if (qp->s_last == qp->s_tail) + return 0; + + /* The last valid PSN seen is the previous request's. */ + qp->s_last_psn = wqe->psn - 1; + + /* Restart this request after the RNR timeout. */ + wqe = get_swqe_ptr(qp, qp->s_last); + + dev->n_rc_resends += (int)qp->s_psn - (int)psn; + + /* + * If we are starting the request from the beginning, let the + * normal send code handle initialization. + */ + qp->s_cur = qp->s_last; + if (cmp24(psn, wqe->psn) <= 0) { + qp->s_state = IB_OPCODE_RC_SEND_LAST; + qp->s_psn = wqe->psn; + } else { + u32 n; + + n = qp->s_cur; + for (;;) { + if (++n == qp->s_size) + n = 0; + if (n == qp->s_tail) { + if (cmp24(psn, qp->s_next_psn) >= 0) { + qp->s_cur = n; + wqe = get_swqe_ptr(qp, n); + } + break; + } + wqe = get_swqe_ptr(qp, n); + if (cmp24(psn, wqe->psn) < 0) + break; + qp->s_cur = n; + } + qp->s_psn = psn; + + /* + * Set the state to restart in the middle of a request. + * Don't change the s_sge, s_cur_sge, or s_cur_size. + * See do_rc_send(). + */ + switch (wqe->wr.opcode) { + case IB_WR_SEND: + case IB_WR_SEND_WITH_IMM: + qp->s_state = + IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST; + break; + + case IB_WR_RDMA_WRITE: + case IB_WR_RDMA_WRITE_WITH_IMM: + qp->s_state = + IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST; + break; + + case IB_WR_RDMA_READ: + qp->s_state = + IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE; + break; + + default: + /* + * This case shouldn't happen since its only + * one PSN per req. + */ + qp->s_state = IB_OPCODE_RC_SEND_LAST; + } + } + + qp->s_rnr_timeout = rnr_table[(aeth >> 24) & 0x1F]; + insert_rnr_queue(qp); + return 0; + + case 3: /* NAK */ + /* The last valid PSN seen is the previous request's. */ + if (qp->s_last != qp->s_tail) + qp->s_last_psn = wqe->psn - 1; + switch ((aeth >> 24) & 0x1F) { + case 0: /* PSN sequence error */ + dev->n_seq_naks++; + /* + * Back up to the responder's expected PSN. + * XXX Note that we might get a NAK in the + * middle of an RDMA READ response which + * terminates the RDMA READ. + */ + if (qp->s_last == qp->s_tail) + break; + + if (cmp24(psn, wqe->psn) < 0) { + break; + } + /* Retry the request. 
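+ * The sequence-error NAK carries the responder's expected PSN,
+ * so ipath_restart_rc() resumes transmission from that PSN
+ * rather than from the start of the request.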
*/ + ipath_restart_rc(qp, psn, &wc); + break; + + case 1: /* Invalid Request */ + wc.status = IB_WC_REM_INV_REQ_ERR; + dev->n_other_naks++; + goto class_b; + + case 2: /* Remote Access Error */ + wc.status = IB_WC_REM_ACCESS_ERR; + dev->n_other_naks++; + goto class_b; + + case 3: /* Remote Operation Error */ + wc.status = IB_WC_REM_OP_ERR; + dev->n_other_naks++; + class_b: + wc.wr_id = wqe->wr.wr_id; + wc.opcode = wc_opcode[wqe->wr.opcode]; + wc.vendor_err = 0; + wc.byte_len = 0; + wc.qp_num = qp->ibqp.qp_num; + wc.src_qp = qp->remote_qpn; + wc.pkey_index = 0; + wc.slid = qp->remote_ah_attr.dlid; + wc.sl = qp->remote_ah_attr.sl; + wc.dlid_path_bits = 0; + wc.port_num = 0; + ipath_sqerror_qp(qp, &wc); + break; + + default: + /* Ignore other reserved NAK error codes */ + goto reserved; + } + qp->s_rnr_retry = qp->s_rnr_retry_cnt; + return 0; + + default: /* 2: reserved */ + reserved: + /* Ignore reserved NAK codes. */ + return 0; + } +} + +/* + * This is called from ipath_qp_rcv() to process an incomming RC packet + * for the given QP. + * Called at interrupt level. + */ +static void ipath_rc_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr, + int has_grh, void *data, u32 tlen, struct ipath_qp *qp) +{ + struct ipath_other_headers *ohdr; + int opcode; + u32 hdrsize; + u32 psn; + u32 pad; + unsigned long flags; + struct ib_wc wc; + u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu); + int diff; + struct ib_reth *reth; + + /* Check for GRH */ + if (!has_grh) { + ohdr = &hdr->u.oth; + hdrsize = 8 + 12; /* LRH + BTH */ + psn = be32_to_cpu(ohdr->bth[2]); + } else { + ohdr = &hdr->u.l.oth; + hdrsize = 8 + 40 + 12; /* LRH + GRH + BTH */ + /* + * The header with GRH is 60 bytes and the + * core driver sets the eager header buffer + * size to 56 bytes so the last 4 bytes of + * the BTH header (PSN) is in the data buffer. + */ + psn = be32_to_cpu(((u32 *) data)[0]); + data += sizeof(u32); + } + /* + * The opcode is in the low byte when its in network order + * (top byte when in host order). + */ + opcode = *(u8 *) (&ohdr->bth[0]); + + /* + * Process responses (ACKs) before anything else. + * Note that the packet sequence number will be for something + * in the send work queue rather than the expected receive + * packet sequence number. In other words, this QP is the + * requester. + */ + if (opcode >= IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST && + opcode <= IB_OPCODE_RC_ATOMIC_ACKNOWLEDGE) { + + spin_lock_irqsave(&qp->s_lock, flags); + + /* Ignore invalid responses. */ + if (cmp24(psn, qp->s_next_psn) >= 0) { + goto ack_done; + } + + /* Ignore duplicate responses. 
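+ * A response PSN at or below s_last_psn has already been processed.
+ * The one useful case handled below is a "ghost" ACK that repeats
+ * the last PSN: its AETH can still carry fresh flow-control credit,
+ * so that is extracted before the duplicate is dropped.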
*/ + diff = cmp24(psn, qp->s_last_psn); + if (unlikely(diff <= 0)) { + /* Update credits for "ghost" ACKs */ + if (diff == 0 && opcode == IB_OPCODE_RC_ACKNOWLEDGE) { + if (!has_grh) { + pad = be32_to_cpu(ohdr->u.aeth); + } else { + pad = be32_to_cpu(((u32 *) data)[0]); + data += sizeof(u32); + } + if ((pad >> 29) == 0) { + ipath_get_credit(qp, pad); + } + } + goto ack_done; + } + + switch (opcode) { + case IB_OPCODE_RC_ACKNOWLEDGE: + case IB_OPCODE_RC_ATOMIC_ACKNOWLEDGE: + case IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST: + if (!has_grh) { + pad = be32_to_cpu(ohdr->u.aeth); + } else { + pad = be32_to_cpu(((u32 *) data)[0]); + data += sizeof(u32); + } + if (opcode == IB_OPCODE_RC_ATOMIC_ACKNOWLEDGE) { + *(u64 *) qp->s_sge.sge.vaddr = *(u64 *) data; + } + if (!do_rc_ack(qp, pad, psn, opcode) || + opcode != IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST) { + goto ack_done; + } + hdrsize += 4; + /* + * do_rc_ack() has already checked the PSN so skip + * the sequence check. + */ + goto rdma_read; + + case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE: + /* no AETH, no ACK */ + if (unlikely(cmp24(psn, qp->s_last_psn + 1) != 0)) { + dev->n_rdma_seq++; + ipath_restart_rc(qp, qp->s_last_psn + 1, &wc); + goto ack_done; + } + rdma_read: + if (unlikely(qp->s_state != + IB_OPCODE_RC_RDMA_READ_REQUEST)) + goto ack_done; + if (unlikely(tlen != (hdrsize + pmtu + 4))) + goto ack_done; + if (unlikely(pmtu >= qp->s_len)) + goto ack_done; + /* We got a response so update the timeout. */ + if (unlikely(qp->s_last == qp->s_tail || + get_swqe_ptr(qp, qp->s_last)->wr.opcode != + IB_WR_RDMA_READ)) + goto ack_done; + spin_lock(&dev->pending_lock); + if (qp->s_rnr_timeout == 0 && + qp->timerwait.next != LIST_POISON1) { + list_move_tail(&qp->timerwait, + &dev->pending[dev-> + pending_index]); + } + spin_unlock(&dev->pending_lock); + /* + * Update the RDMA receive state but do the copy w/o + * holding the locks and blocking interrupts. + * XXX Yet another place that affects relaxed + * RDMA order since we don't want s_sge modified. + */ + qp->s_len -= pmtu; + qp->s_last_psn = psn; + spin_unlock_irqrestore(&qp->s_lock, flags); + copy_sge(&qp->s_sge, data, pmtu); + return; + + case IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST: + /* ACKs READ req. */ + if (unlikely(cmp24(psn, qp->s_last_psn + 1) != 0)) { + dev->n_rdma_seq++; + ipath_restart_rc(qp, qp->s_last_psn + 1, &wc); + goto ack_done; + } + /* FALLTHROUGH */ + case IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY: + if (unlikely(qp->s_state != + IB_OPCODE_RC_RDMA_READ_REQUEST)) { + goto ack_done; + } + /* Get the number of bytes the message was padded by. */ + pad = (ohdr->bth[0] >> 12) & 3; + /* + * Check that the data size is >= 1 && <= pmtu. + * Remember to account for the AETH header (4) + * and ICRC (4). + */ + if (unlikely(tlen <= (hdrsize + pad + 8))) { + /* XXX Need to generate an error CQ entry. */ + goto ack_done; + } + tlen -= hdrsize + pad + 8; + if (unlikely(tlen != qp->s_len)) { + /* XXX Need to generate an error CQ entry. */ + goto ack_done; + } + if (!has_grh) { + pad = be32_to_cpu(ohdr->u.aeth); + } else { + pad = be32_to_cpu(((u32 *) data)[0]); + data += sizeof(u32); + } + copy_sge(&qp->s_sge, data, tlen); + if (do_rc_ack(qp, pad, psn, + IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST)) { + /* + * Change the state so we contimue + * processing new requests. + */ + qp->s_state = IB_OPCODE_RC_SEND_LAST; + } + goto ack_done; + } + ack_done: + spin_unlock_irqrestore(&qp->s_lock, flags); + return; + } + + spin_lock_irqsave(&qp->r_rq.lock, flags); + + /* Compute 24 bits worth of difference. 
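+ * PSNs are 24-bit values that wrap, so cmp24() (defined near the
+ * top of this file) compares them circularly by shifting the 32-bit
+ * difference left 8 bits, which puts bit 23 of the difference into
+ * the sign bit. E.g. cmp24(0x000002, 0xFFFFFE) is positive, i.e.
+ * PSN 2 is "after" PSN 0xFFFFFE across the wrap.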
*/ + diff = cmp24(psn, qp->r_psn); + if (unlikely(diff)) { + if (diff > 0) { + /* + * Packet sequence error. + * A NAK will ACK earlier sends and RDMA writes. + * Don't queue the NAK if a RDMA read, atomic, or + * NAK is pending though. + */ + spin_lock(&qp->s_lock); + if ((qp->s_ack_state >= + IB_OPCODE_RC_RDMA_READ_REQUEST && + qp->s_ack_state != IB_OPCODE_ACKNOWLEDGE) || + qp->s_nak_state != 0) { + spin_unlock(&qp->s_lock); + goto done; + } + qp->s_ack_state = IB_OPCODE_RC_SEND_ONLY; + qp->s_nak_state = IB_NAK_PSN_ERROR; + /* Use the expected PSN. */ + qp->s_ack_psn = qp->r_psn; + goto resched; + } + + /* + * Handle a duplicate request. + * Don't re-execute SEND, RDMA write or atomic op. + * Don't NAK errors, just silently drop the duplicate request. + * Note that r_sge, r_len, and r_rcv_len may be + * in use so don't modify them. + * + * We are supposed to ACK the earliest duplicate PSN + * but we can coalesce an outstanding duplicate ACK. + * We have to send the earliest so that RDMA reads + * can be restarted at the requester's expected PSN. + */ + spin_lock(&qp->s_lock); + if (qp->s_ack_state != IB_OPCODE_ACKNOWLEDGE && + cmp24(psn, qp->s_ack_psn) >= 0) { + if (qp->s_ack_state < IB_OPCODE_RDMA_READ_REQUEST) + qp->s_ack_psn = psn; + spin_unlock(&qp->s_lock); + goto done; + } + switch (opcode) { + case IB_OPCODE_RC_RDMA_READ_REQUEST: + /* + * We have to be careful to not change s_rdma_sge + * while do_rc_send() is using it and not holding + * the s_lock. + */ + if (qp->s_ack_state != IB_OPCODE_RC_ACKNOWLEDGE && + qp->s_ack_state >= IB_OPCODE_RDMA_READ_REQUEST) { + spin_unlock(&qp->s_lock); + dev->n_rdma_dup_busy++; + goto done; + } + /* RETH comes after BTH */ + if (!has_grh) + reth = &ohdr->u.rc.reth; + else { + reth = (struct ib_reth *)data; + data += sizeof(*reth); + } + qp->s_rdma_len = be32_to_cpu(reth->length); + if (qp->s_rdma_len != 0) { + u32 rkey = be32_to_cpu(reth->rkey); + u64 vaddr = be64_to_cpu(reth->vaddr); + + /* + * Address range must be a subset of the + * original request and start on pmtu + * boundaries. + */ + if (unlikely(!ipath_rkey_ok(dev, + &qp->s_rdma_sge, + qp->s_rdma_len, + vaddr, rkey, + IB_ACCESS_REMOTE_READ))) + { + goto done; + } + } else { + qp->s_rdma_sge.sg_list = NULL; + qp->s_rdma_sge.num_sge = 0; + qp->s_rdma_sge.sge.mr = NULL; + qp->s_rdma_sge.sge.vaddr = NULL; + qp->s_rdma_sge.sge.length = 0; + qp->s_rdma_sge.sge.sge_length = 0; + } + break; + + case IB_OPCODE_RC_COMPARE_SWAP: + case IB_OPCODE_RC_FETCH_ADD: + /* + * Check for the PSN of the last atomic operations + * performed and resend the result if found. + */ + if ((psn & 0xFFFFFF) != qp->r_atomic_psn) { + spin_unlock(&qp->s_lock); + goto done; + } + qp->s_ack_atomic = qp->r_atomic_data; + break; + } + qp->s_ack_state = opcode; + qp->s_nak_state = 0; + qp->s_ack_psn = psn; + goto resched; + } + + /* Check for opcode sequence errors. */ + switch (qp->r_state) { + case IB_OPCODE_RC_SEND_FIRST: + case IB_OPCODE_RC_SEND_MIDDLE: + if (opcode == IB_OPCODE_RC_SEND_MIDDLE || + opcode == IB_OPCODE_RC_SEND_LAST || + opcode == IB_OPCODE_RC_SEND_LAST_WITH_IMMEDIATE) + break; + nack_inv: + /* + * A NAK will ACK earlier sends and RDMA writes. + * Don't queue the NAK if a RDMA read, atomic, or + * NAK is pending though. 
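+ * Only one ACK/NAK or RDMA READ/atomic response can be outstanding
+ * at a time (there is no response send queue), so a pending
+ * response must drain before this NAK can be scheduled.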
+ */ + spin_lock(&qp->s_lock); + if (qp->s_ack_state >= IB_OPCODE_RC_RDMA_READ_REQUEST && + qp->s_ack_state != IB_OPCODE_ACKNOWLEDGE) { + spin_unlock(&qp->s_lock); + goto done; + } + /* XXX Flush WQEs */ + qp->state = IB_QPS_ERR; + qp->s_ack_state = IB_OPCODE_RC_SEND_ONLY; + qp->s_nak_state = IB_NAK_INVALID_REQUEST; + qp->s_ack_psn = qp->r_psn; + goto resched; + + case IB_OPCODE_RC_RDMA_WRITE_FIRST: + case IB_OPCODE_RC_RDMA_WRITE_MIDDLE: + if (opcode == IB_OPCODE_RC_RDMA_WRITE_MIDDLE || + opcode == IB_OPCODE_RC_RDMA_WRITE_LAST || + opcode == IB_OPCODE_RC_RDMA_WRITE_LAST_WITH_IMMEDIATE) + break; + goto nack_inv; + + case IB_OPCODE_RC_RDMA_READ_REQUEST: + case IB_OPCODE_RC_COMPARE_SWAP: + case IB_OPCODE_RC_FETCH_ADD: + /* + * Drop all new requests until a response has been sent. + * A new request then ACKs the RDMA response we sent. + * Relaxed ordering would allow new requests to be + * processed but we would need to keep a queue + * of rwqe's for all that are in progress. + * Note that we can't RNR NAK this request since the RDMA + * READ or atomic response is already queued to be sent + * (unless we implement a response send queue). + */ + goto done; + + default: + if (opcode == IB_OPCODE_RC_SEND_MIDDLE || + opcode == IB_OPCODE_RC_SEND_LAST || + opcode == IB_OPCODE_RC_SEND_LAST_WITH_IMMEDIATE || + opcode == IB_OPCODE_RC_RDMA_WRITE_MIDDLE || + opcode == IB_OPCODE_RC_RDMA_WRITE_LAST || + opcode == IB_OPCODE_RC_RDMA_WRITE_LAST_WITH_IMMEDIATE) + goto nack_inv; + break; + } + + wc.imm_data = 0; + wc.wc_flags = 0; + + /* OK, process the packet. */ + switch (opcode) { + case IB_OPCODE_RC_SEND_FIRST: + if (!get_rwqe(qp, 0)) { + rnr_nak: + /* + * A RNR NAK will ACK earlier sends and RDMA writes. + * Don't queue the NAK if a RDMA read or atomic + * is pending though. + */ + spin_lock(&qp->s_lock); + if (qp->s_ack_state >= IB_OPCODE_RC_RDMA_READ_REQUEST && + qp->s_ack_state != IB_OPCODE_ACKNOWLEDGE) { + spin_unlock(&qp->s_lock); + goto done; + } + qp->s_ack_state = IB_OPCODE_RC_SEND_ONLY; + qp->s_nak_state = IB_RNR_NAK | qp->s_min_rnr_timer; + qp->s_ack_psn = qp->r_psn; + goto resched; + } + qp->r_rcv_len = 0; + /* FALLTHROUGH */ + case IB_OPCODE_RC_SEND_MIDDLE: + case IB_OPCODE_RC_RDMA_WRITE_MIDDLE: + send_middle: + /* Check for invalid length PMTU or posted rwqe len. */ + if (unlikely(tlen != (hdrsize + pmtu + 4))) { + goto nack_inv; + } + qp->r_rcv_len += pmtu; + if (unlikely(qp->r_rcv_len > qp->r_len)) { + goto nack_inv; + } + copy_sge(&qp->r_sge, data, pmtu); + break; + + case IB_OPCODE_RC_RDMA_WRITE_LAST_WITH_IMMEDIATE: + /* consume RWQE */ + if (!get_rwqe(qp, 1)) + goto rnr_nak; + goto send_last_imm; + + case IB_OPCODE_RC_SEND_ONLY: + case IB_OPCODE_RC_SEND_ONLY_WITH_IMMEDIATE: + if (!get_rwqe(qp, 0)) + goto rnr_nak; + qp->r_rcv_len = 0; + if (opcode == IB_OPCODE_RC_SEND_ONLY) + goto send_last; + /* FALLTHROUGH */ + case IB_OPCODE_RC_SEND_LAST_WITH_IMMEDIATE: + send_last_imm: + if (has_grh) { + wc.imm_data = *(u32 *) data; + data += sizeof(u32); + } else { + /* Immediate data comes after BTH */ + wc.imm_data = ohdr->u.imm_data; + } + hdrsize += 4; + wc.wc_flags = IB_WC_WITH_IMM; + /* FALLTHROUGH */ + case IB_OPCODE_RC_SEND_LAST: + case IB_OPCODE_RC_RDMA_WRITE_LAST: + send_last: + /* Get the number of bytes the message was padded by. */ + pad = (ohdr->bth[0] >> 12) & 3; + /* Check for invalid length. */ + /* XXX LAST len should be >= 1 */ + if (unlikely(tlen < (hdrsize + pad + 4))) { + goto nack_inv; + } + /* Don't count the CRC. 
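+ * E.g. a 1024-byte SEND LAST with no GRH and no pad arrives with
+ * tlen == 20 (LRH + BTH) + 1024 + 4 (ICRC); the subtraction below
+ * leaves exactly the 1024 payload bytes.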
*/ + tlen -= (hdrsize + pad + 4); + wc.byte_len = tlen + qp->r_rcv_len; + if (unlikely(wc.byte_len > qp->r_len)) { + goto nack_inv; + } + /* XXX Need to free SGEs */ + copy_sge(&qp->r_sge, data, tlen); + atomic_inc(&qp->msn); + if (opcode == IB_OPCODE_RC_RDMA_WRITE_LAST || + opcode == IB_OPCODE_RC_RDMA_WRITE_ONLY) + break; + wc.wr_id = qp->r_wr_id; + wc.status = IB_WC_SUCCESS; + wc.opcode = IB_WC_RECV; + wc.vendor_err = 0; + wc.qp_num = qp->ibqp.qp_num; + wc.src_qp = qp->remote_qpn; + wc.pkey_index = 0; + wc.slid = qp->remote_ah_attr.dlid; + wc.sl = qp->remote_ah_attr.sl; + wc.dlid_path_bits = 0; + wc.port_num = 0; + /* Signal completion event if the solicited bit is set. */ + ipath_cq_enter(to_icq(qp->ibqp.recv_cq), &wc, + ohdr->bth[0] & __constant_cpu_to_be32(1 << 23)); + break; + + case IB_OPCODE_RC_RDMA_WRITE_FIRST: + case IB_OPCODE_RC_RDMA_WRITE_ONLY: + case IB_OPCODE_RC_RDMA_WRITE_ONLY_WITH_IMMEDIATE: + /* consume RWQE */ + /* RETH comes after BTH */ + if (!has_grh) + reth = &ohdr->u.rc.reth; + else { + reth = (struct ib_reth *)data; + data += sizeof(*reth); + } + hdrsize += sizeof(*reth); + qp->r_len = be32_to_cpu(reth->length); + qp->r_rcv_len = 0; + if (qp->r_len != 0) { + u32 rkey = be32_to_cpu(reth->rkey); + u64 vaddr = be64_to_cpu(reth->vaddr); + + /* Check rkey & NAK */ + if (unlikely(!ipath_rkey_ok(dev, &qp->r_sge, qp->r_len, + vaddr, rkey, + IB_ACCESS_REMOTE_WRITE))) { + nack_acc: + /* + * A NAK will ACK earlier sends and RDMA + * writes. + * Don't queue the NAK if a RDMA read, + * atomic, or NAK is pending though. + */ + spin_lock(&qp->s_lock); + if (qp->s_ack_state >= + IB_OPCODE_RC_RDMA_READ_REQUEST && + qp->s_ack_state != IB_OPCODE_ACKNOWLEDGE) { + spin_unlock(&qp->s_lock); + goto done; + } + /* XXX Flush WQEs */ + qp->state = IB_QPS_ERR; + qp->s_ack_state = IB_OPCODE_RC_RDMA_WRITE_ONLY; + qp->s_nak_state = IB_NAK_REMOTE_ACCESS_ERROR; + qp->s_ack_psn = qp->r_psn; + goto resched; + } + } else { + qp->r_sge.sg_list = NULL; + qp->r_sge.sge.mr = NULL; + qp->r_sge.sge.vaddr = NULL; + qp->r_sge.sge.length = 0; + qp->r_sge.sge.sge_length = 0; + } + if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_WRITE))) + goto nack_acc; + if (opcode == IB_OPCODE_RC_RDMA_WRITE_FIRST) + goto send_middle; + else if (opcode == IB_OPCODE_RC_RDMA_WRITE_ONLY) + goto send_last; + if (!get_rwqe(qp, 1)) + goto rnr_nak; + goto send_last_imm; + + case IB_OPCODE_RC_RDMA_READ_REQUEST: + /* RETH comes after BTH */ + if (!has_grh) + reth = &ohdr->u.rc.reth; + else { + reth = (struct ib_reth *)data; + data += sizeof(*reth); + } + spin_lock(&qp->s_lock); + if (qp->s_ack_state != IB_OPCODE_RC_ACKNOWLEDGE && + qp->s_ack_state >= IB_OPCODE_RDMA_READ_REQUEST) { + spin_unlock(&qp->s_lock); + goto done; + } + qp->s_rdma_len = be32_to_cpu(reth->length); + if (qp->s_rdma_len != 0) { + u32 rkey = be32_to_cpu(reth->rkey); + u64 vaddr = be64_to_cpu(reth->vaddr); + + /* Check rkey & NAK */ + if (unlikely(!ipath_rkey_ok(dev, &qp->s_rdma_sge, + qp->s_rdma_len, + vaddr, rkey, + IB_ACCESS_REMOTE_READ))) { + spin_unlock(&qp->s_lock); + goto nack_acc; + } + /* + * Update the next expected PSN. + * We add 1 later below, so only add the remainder here. 
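+ * E.g. a 5000-byte RDMA READ with a 2048-byte path MTU returns
+ * three response packets (2048 + 2048 + 904 bytes), so
+ * (5000 - 1) / 2048 = 2 is added here and the r_psn++ just below
+ * brings the total advance to 3.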
+ */ + if (qp->s_rdma_len > pmtu) + qp->r_psn += (qp->s_rdma_len - 1) / pmtu; + } else { + qp->s_rdma_sge.sg_list = NULL; + qp->s_rdma_sge.num_sge = 0; + qp->s_rdma_sge.sge.mr = NULL; + qp->s_rdma_sge.sge.vaddr = NULL; + qp->s_rdma_sge.sge.length = 0; + qp->s_rdma_sge.sge.sge_length = 0; + } + if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_READ))) + goto nack_acc; + /* + * We need to increment the MSN here instead of when we + * finish sending the result since a duplicate request would + * increment it more than once. + */ + atomic_inc(&qp->msn); + qp->s_ack_state = opcode; + qp->s_nak_state = 0; + qp->s_ack_psn = psn; + qp->r_psn++; + qp->r_state = opcode; + goto rdmadone; + + case IB_OPCODE_RC_COMPARE_SWAP: + case IB_OPCODE_RC_FETCH_ADD:{ + struct ib_atomic_eth *ateth; + u64 vaddr; + u64 sdata; + u32 rkey; + + if (!has_grh) + ateth = &ohdr->u.atomic_eth; + else { + ateth = (struct ib_atomic_eth *)data; + data += sizeof(*ateth); + } + vaddr = be64_to_cpu(ateth->vaddr); + if (unlikely(vaddr & 0x7)) + goto nack_inv; + rkey = be32_to_cpu(ateth->rkey); + /* Check rkey & NAK */ + if (unlikely(!ipath_rkey_ok(dev, &qp->r_sge, + sizeof(u64), vaddr, rkey, + IB_ACCESS_REMOTE_ATOMIC))) { + goto nack_acc; + } + if (unlikely(!(qp->qp_access_flags & + IB_ACCESS_REMOTE_ATOMIC))) + goto nack_acc; + /* Perform atomic OP and save result. */ + sdata = be64_to_cpu(ateth->swap_data); + spin_lock(&dev->pending_lock); + qp->r_atomic_data = *(u64 *) qp->r_sge.sge.vaddr; + if (opcode == IB_OPCODE_RC_FETCH_ADD) { + *(u64 *) qp->r_sge.sge.vaddr = + qp->r_atomic_data + sdata; + } else if (qp->r_atomic_data == + be64_to_cpu(ateth->compare_data)) { + *(u64 *) qp->r_sge.sge.vaddr = sdata; + } + spin_unlock(&dev->pending_lock); + atomic_inc(&qp->msn); + qp->r_atomic_psn = psn & 0xFFFFFF; + psn |= 1 << 31; + break; + } + + default: + /* Drop packet for unknown opcodes. */ + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + return; + } + qp->r_psn++; + qp->r_state = opcode; + /* Send an ACK if requested or required. */ + if (psn & (1 << 31)) { + /* + * Coalesce ACKs unless there is a RDMA READ or + * ATOMIC pending. + */ + spin_lock(&qp->s_lock); + if (qp->s_ack_state == IB_OPCODE_RC_ACKNOWLEDGE || + qp->s_ack_state < IB_OPCODE_RDMA_READ_REQUEST) { + qp->s_ack_state = opcode; + qp->s_nak_state = 0; + qp->s_ack_psn = psn; + qp->s_ack_atomic = qp->r_atomic_data; + goto resched; + } + spin_unlock(&qp->s_lock); + } +done: + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + return; + +resched: + /* Try to send ACK right away but not if do_rc_send() is active. */ + if (qp->s_hdrwords == 0 && + (qp->s_ack_state < IB_OPCODE_RDMA_READ_REQUEST || + qp->s_ack_state >= IB_OPCODE_COMPARE_SWAP)) + send_rc_ack(qp); + +rdmadone: + spin_unlock(&qp->s_lock); + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + + /* Call do_rc_send() in another thread. */ + tasklet_schedule(&qp->s_task); +} + +/* + * This is called from ipath_ib_rcv() to process an incomming packet + * for the given QP. + * Called at interrupt level. + */ +static inline void ipath_qp_rcv(struct ipath_ibdev *dev, + struct ipath_ib_header *hdr, int has_grh, + void *data, u32 tlen, struct ipath_qp *qp) +{ + /* Check for valid receive state. 
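+ * state_ops[] (near the top of this file) maps each QP state to a
+ * capability mask; packets are handed to the per-transport receive
+ * routines only when the current state has IPATH_PROCESS_RECV_OK
+ * set, otherwise the packet just bumps n_pkt_drops.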
*/ + if (!(state_ops[qp->state] & IPATH_PROCESS_RECV_OK)) { + dev->n_pkt_drops++; + return; + } + + switch (qp->ibqp.qp_type) { + case IB_QPT_SMI: + case IB_QPT_GSI: + case IB_QPT_UD: + ipath_ud_rcv(dev, hdr, has_grh, data, tlen, qp); + break; + + case IB_QPT_RC: + ipath_rc_rcv(dev, hdr, has_grh, data, tlen, qp); + break; + + case IB_QPT_UC: + ipath_uc_rcv(dev, hdr, has_grh, data, tlen, qp); + break; + + default: + break; + } +} + +/* + * This is called from ipath_kreceive() to process an incomming packet at + * interrupt level. Tlen is the length of the header + data + CRC in bytes. + */ +static void ipath_ib_rcv(const ipath_type t, void *rhdr, void *data, u32 tlen) +{ + struct ipath_ibdev *dev = ipath_devices[t]; + struct ipath_ib_header *hdr = rhdr; + struct ipath_other_headers *ohdr; + struct ipath_qp *qp; + u32 qp_num; + int lnh; + u8 opcode; + + if (dev == NULL) + return; + + if (tlen < 24) { /* LRH+BTH+CRC */ + dev->n_pkt_drops++; + return; + } + + /* Check for GRH */ + lnh = be16_to_cpu(hdr->lrh[0]) & 3; + if (lnh == IPS_LRH_BTH) + ohdr = &hdr->u.oth; + else if (lnh == IPS_LRH_GRH) + ohdr = &hdr->u.l.oth; + else { + dev->n_pkt_drops++; + return; + } + + opcode = *(u8 *) (&ohdr->bth[0]); + dev->opstats[opcode].n_bytes += tlen; + dev->opstats[opcode].n_packets++; + + /* Get the destination QP number. */ + qp_num = be32_to_cpu(ohdr->bth[1]) & 0xFFFFFF; + if (qp_num == 0xFFFFFF) { + struct ipath_mcast *mcast; + struct ipath_mcast_qp *p; + + mcast = ipath_mcast_find(&hdr->u.l.grh.dgid); + if (mcast == NULL) { + dev->n_pkt_drops++; + return; + } + dev->n_multicast_rcv++; + list_for_each_entry_rcu(p, &mcast->qp_list, list) + ipath_qp_rcv(dev, hdr, lnh == IPS_LRH_GRH, data, tlen, + p->qp); + /* + * Notify ipath_multicast_detach() if it is waiting for us + * to finish. + */ + if (atomic_dec_return(&mcast->refcount) <= 1) + wake_up(&mcast->wait); + } else if ((qp = ipath_lookup_qpn(&dev->qp_table, qp_num)) != NULL) { + ipath_qp_rcv(dev, hdr, lnh == IPS_LRH_GRH, data, tlen, qp); + /* + * Notify ipath_destroy_qp() if it is waiting for us to finish. + */ + if (atomic_dec_and_test(&qp->refcount)) + wake_up(&qp->wait); + } else + dev->n_pkt_drops++; +} + +/* + * This is called from ipath_do_rcv_timer() at interrupt level + * to check for QPs which need retransmits and to collect performance numbers. + */ +static void ipath_ib_timer(const ipath_type t) +{ + struct ipath_ibdev *dev = ipath_devices[t]; + struct ipath_qp *resend = NULL; + struct ipath_qp *rnr = NULL; + struct list_head *last; + struct ipath_qp *qp; + unsigned long flags; + + if (dev == NULL) + return; + + spin_lock_irqsave(&dev->pending_lock, flags); + /* Start filling the next pending queue. */ + if (++dev->pending_index >= ARRAY_SIZE(dev->pending)) + dev->pending_index = 0; + /* Save any requests still in the new queue, they have timed out. 
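+ * The pending[] lists act as a small timer wheel: QPs waiting for
+ * an ACK are queued on the slot at pending_index and each timer
+ * tick advances the index, so entries still found on the slot
+ * being reused have waited a full rotation and are collected for
+ * retransmission below.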
*/ + last = &dev->pending[dev->pending_index]; + while (!list_empty(last)) { + qp = list_entry(last->next, struct ipath_qp, timerwait); + if (last->next == LIST_POISON1 || + last->next != &qp->timerwait || + qp->timerwait.prev != last) { + INIT_LIST_HEAD(last); + } else { + list_del(&qp->timerwait); + qp->timerwait.prev = (struct list_head *) resend; + resend = qp; + atomic_inc(&qp->refcount); + } + } + last = &dev->rnrwait; + if (!list_empty(last)) { + qp = list_entry(last->next, struct ipath_qp, timerwait); + if (--qp->s_rnr_timeout == 0) { + do { + if (last->next == LIST_POISON1 || + last->next != &qp->timerwait || + qp->timerwait.prev != last) { + INIT_LIST_HEAD(last); + break; + } + list_del(&qp->timerwait); + qp->timerwait.prev = (struct list_head *) rnr; + rnr = qp; + if (list_empty(last)) + break; + qp = list_entry(last->next, struct ipath_qp, + timerwait); + } while (qp->s_rnr_timeout == 0); + } + } + /* We should only be in the started state if pma_sample_start != 0 */ + if (dev->pma_sample_status == IB_PMA_SAMPLE_STATUS_STARTED && + --dev->pma_sample_start == 0) { + dev->pma_sample_status = IB_PMA_SAMPLE_STATUS_RUNNING; + ipath_layer_snapshot_counters(dev->ib_unit, &dev->ipath_sword, + &dev->ipath_rword, + &dev->ipath_spkts, + &dev->ipath_rpkts); + } + if (dev->pma_sample_status == IB_PMA_SAMPLE_STATUS_RUNNING) { + if (dev->pma_sample_interval == 0) { + u64 ta, tb, tc, td; + + dev->pma_sample_status = IB_PMA_SAMPLE_STATUS_DONE; + ipath_layer_snapshot_counters(dev->ib_unit, + &ta, &tb, &tc, &td); + + dev->ipath_sword = ta - dev->ipath_sword; + dev->ipath_rword = tb - dev->ipath_rword; + dev->ipath_spkts = tc - dev->ipath_spkts; + dev->ipath_rpkts = td - dev->ipath_rpkts; + } else { + dev->pma_sample_interval--; + } + } + spin_unlock_irqrestore(&dev->pending_lock, flags); + + /* XXX What if timer fires again while this is running? */ + for (qp = resend; qp != NULL; + qp = (struct ipath_qp *) qp->timerwait.prev) { + struct ib_wc wc; + + spin_lock_irqsave(&qp->s_lock, flags); + if (qp->s_last != qp->s_tail && qp->state == IB_QPS_RTS) { + dev->n_timeouts++; + ipath_restart_rc(qp, qp->s_last_psn + 1, &wc); + } + spin_unlock_irqrestore(&qp->s_lock, flags); + + /* Notify ipath_destroy_qp() if it is waiting. */ + if (atomic_dec_and_test(&qp->refcount)) + wake_up(&qp->wait); + } + for (qp = rnr; qp != NULL; + qp = (struct ipath_qp *) qp->timerwait.prev) { + tasklet_schedule(&qp->s_task); + } +} + +/* + * This is called from ipath_intr() at interrupt level when a PIO buffer + * is available after ipath_verbs_send() returned an error that no + * buffers were available. + * Return 0 if we consumed all the PIO buffers and we still have QPs + * waiting for buffers (for now, just do a tasklet_schedule and return one). 
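+ * Each waiter's send tasklet is rescheduled below; the tasklet
+ * retries the send itself, so nothing is transmitted directly
+ * from this interrupt path.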
+ */ +static int ipath_ib_piobufavail(const ipath_type t) +{ + struct ipath_ibdev *dev = ipath_devices[t]; + struct ipath_qp *qp; + unsigned long flags; + + if (dev == NULL) + return 1; + + spin_lock_irqsave(&dev->pending_lock, flags); + while (!list_empty(&dev->piowait)) { + qp = list_entry(dev->piowait.next, struct ipath_qp, piowait); + list_del(&qp->piowait); + tasklet_schedule(&qp->s_task); + } + spin_unlock_irqrestore(&dev->pending_lock, flags); + + return 1; +} + +static struct ib_qp *ipath_create_qp(struct ib_pd *ibpd, + struct ib_qp_init_attr *init_attr, + struct ib_udata *udata) +{ + struct ipath_qp *qp; + int err; + struct ipath_swqe *swq = NULL; + struct ipath_ibdev *dev; + size_t sz; + + if (init_attr->cap.max_send_sge > 255 || + init_attr->cap.max_recv_sge > 255) + return ERR_PTR(-ENOMEM); + + switch (init_attr->qp_type) { + case IB_QPT_UC: + case IB_QPT_RC: + sz = sizeof(struct ipath_sge) * init_attr->cap.max_send_sge + + sizeof(struct ipath_swqe); + swq = vmalloc((init_attr->cap.max_send_wr + 1) * sz); + if (swq == NULL) + return ERR_PTR(-ENOMEM); + /* FALLTHROUGH */ + case IB_QPT_UD: + case IB_QPT_SMI: + case IB_QPT_GSI: + qp = kmalloc(sizeof(*qp), GFP_KERNEL); + if (!qp) + return ERR_PTR(-ENOMEM); + qp->r_rq.size = init_attr->cap.max_recv_wr + 1; + sz = sizeof(struct ipath_sge) * init_attr->cap.max_recv_sge + + sizeof(struct ipath_rwqe); + qp->r_rq.wq = vmalloc(qp->r_rq.size * sz); + if (!qp->r_rq.wq) { + kfree(qp); + return ERR_PTR(-ENOMEM); + } + + /* + * ib_create_qp() will initialize qp->ibqp + * except for qp->ibqp.qp_num. + */ + spin_lock_init(&qp->s_lock); + spin_lock_init(&qp->r_rq.lock); + atomic_set(&qp->refcount, 0); + init_waitqueue_head(&qp->wait); + tasklet_init(&qp->s_task, + init_attr->qp_type == IB_QPT_RC ? do_rc_send : + do_uc_send, (unsigned long)qp); + qp->piowait.next = LIST_POISON1; + qp->piowait.prev = LIST_POISON2; + qp->timerwait.next = LIST_POISON1; + qp->timerwait.prev = LIST_POISON2; + qp->state = IB_QPS_RESET; + qp->s_wq = swq; + qp->s_size = init_attr->cap.max_send_wr + 1; + qp->s_max_sge = init_attr->cap.max_send_sge; + qp->r_rq.max_sge = init_attr->cap.max_recv_sge; + qp->s_flags = init_attr->sq_sig_type == IB_SIGNAL_REQ_WR ? + 1 << IPATH_S_SIGNAL_REQ_WR : 0; + dev = to_idev(ibpd->device); + err = ipath_alloc_qpn(&dev->qp_table, qp, init_attr->qp_type); + if (err) { + vfree(swq); + vfree(qp->r_rq.wq); + kfree(qp); + return ERR_PTR(err); + } + ipath_reset_qp(qp); + + /* Tell the core driver that the kernel SMA is present. */ + if (qp->ibqp.qp_type == IB_QPT_SMI) + ipath_verbs_set_flags(dev->ib_unit, + IPATH_VERBS_KERNEL_SMA); + break; + + default: + /* Don't support raw QPs */ + return ERR_PTR(-ENOSYS); + } + + init_attr->cap.max_inline_data = 0; + + return &qp->ibqp; +} + +/* + * Note that this can be called while the QP is actively sending or receiving! + */ +static int ipath_destroy_qp(struct ib_qp *ibqp) +{ + struct ipath_qp *qp = to_iqp(ibqp); + struct ipath_ibdev *dev = to_idev(ibqp->device); + unsigned long flags; + + /* Tell the core driver that the kernel SMA is gone. */ + if (qp->ibqp.qp_type == IB_QPT_SMI) + ipath_verbs_set_flags(dev->ib_unit, 0); + + spin_lock_irqsave(&qp->r_rq.lock, flags); + spin_lock(&qp->s_lock); + qp->state = IB_QPS_ERR; + spin_unlock(&qp->s_lock); + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + + /* Stop the sending tasklet. */ + tasklet_kill(&qp->s_task); + + /* Make sure the QP isn't on the timeout list. 
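+ * list_del() poisons the timerwait and piowait entries, so the
+ * LIST_POISON1 checks below tell whether the QP is still queued;
+ * removal is done under pending_lock since interrupt-level code
+ * walks these lists.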
*/ + spin_lock_irqsave(&dev->pending_lock, flags); + if (qp->timerwait.next != LIST_POISON1) + list_del(&qp->timerwait); + if (qp->piowait.next != LIST_POISON1) + list_del(&qp->piowait); + spin_unlock_irqrestore(&dev->pending_lock, flags); + + /* + * Make sure that the QP is not in the QPN table so receive interrupts + * will discard packets for this QP. + * XXX Also remove QP from multicast table. + */ + if (atomic_read(&qp->refcount) != 0) + ipath_free_qp(&dev->qp_table, qp); + + vfree(qp->s_wq); + vfree(qp->r_rq.wq); + kfree(qp); + return 0; +} + +static struct ib_srq *ipath_create_srq(struct ib_pd *ibpd, + struct ib_srq_init_attr *srq_init_attr, + struct ib_udata *udata) +{ + struct ipath_srq *srq; + u32 sz; + + if (srq_init_attr->attr.max_sge < 1) + return ERR_PTR(-EINVAL); + + srq = kmalloc(sizeof(*srq), GFP_KERNEL); + if (!srq) + return ERR_PTR(-ENOMEM); + + /* Need to use vmalloc() if we want to support large #s of entries. */ + srq->rq.size = srq_init_attr->attr.max_wr + 1; + sz = sizeof(struct ipath_sge) * srq_init_attr->attr.max_sge + + sizeof(struct ipath_rwqe); + srq->rq.wq = vmalloc(srq->rq.size * sz); + if (!srq->rq.wq) { + kfree(srq); + return ERR_PTR(-ENOMEM); + } + + /* + * ib_create_srq() will initialize srq->ibsrq. + */ + spin_lock_init(&srq->rq.lock); + srq->rq.head = 0; + srq->rq.tail = 0; + srq->rq.max_sge = srq_init_attr->attr.max_sge; + srq->limit = srq_init_attr->attr.srq_limit; + + return &srq->ibsrq; +} + +int ipath_modify_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr, + enum ib_srq_attr_mask attr_mask) +{ + struct ipath_srq *srq = to_isrq(ibsrq); + unsigned long flags; + + if (attr_mask & IB_SRQ_LIMIT) { + spin_lock_irqsave(&srq->rq.lock, flags); + srq->limit = attr->srq_limit; + spin_unlock_irqrestore(&srq->rq.lock, flags); + } + if (attr_mask & IB_SRQ_MAX_WR) { + u32 size = attr->max_wr + 1; + struct ipath_rwqe *wq, *p; + u32 n; + u32 sz; + + if (attr->max_sge < srq->rq.max_sge) + return -EINVAL; + + sz = sizeof(struct ipath_rwqe) + + attr->max_sge * sizeof(struct ipath_sge); + wq = vmalloc(size * sz); + if (!wq) + return -ENOMEM; + + spin_lock_irqsave(&srq->rq.lock, flags); + if (srq->rq.head < srq->rq.tail) + n = srq->rq.size + srq->rq.head - srq->rq.tail; + else + n = srq->rq.head - srq->rq.tail; + if (size <= n || size <= srq->limit) { + spin_unlock_irqrestore(&srq->rq.lock, flags); + vfree(wq); + return -EINVAL; + } + n = 0; + p = wq; + while (srq->rq.tail != srq->rq.head) { + struct ipath_rwqe *wqe; + int i; + + wqe = get_rwqe_ptr(&srq->rq, srq->rq.tail); + p->wr_id = wqe->wr_id; + p->length = wqe->length; + p->num_sge = wqe->num_sge; + for (i = 0; i < wqe->num_sge; i++) + p->sg_list[i] = wqe->sg_list[i]; + n++; + p = (struct ipath_rwqe *)((char *) p + sz); + if (++srq->rq.tail >= srq->rq.size) + srq->rq.tail = 0; + } + vfree(srq->rq.wq); + srq->rq.wq = wq; + srq->rq.size = size; + srq->rq.head = n; + srq->rq.tail = 0; + srq->rq.max_sge = attr->max_sge; + spin_unlock_irqrestore(&srq->rq.lock, flags); + } + return 0; +} + +static int ipath_destroy_srq(struct ib_srq *ibsrq) +{ + struct ipath_srq *srq = to_isrq(ibsrq); + + vfree(srq->rq.wq); + kfree(srq); + + return 0; +} + +/* + * This may be called from interrupt context. 
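+ * The CQ is a plain ring buffer: entries are copied out from
+ * cq->tail under cq->lock until num_entries are returned or tail
+ * catches up with head, which keeps polling safe against
+ * completions arriving from interrupt context.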
+ */ +static int ipath_poll_cq(struct ib_cq *ibcq, int num_entries, + struct ib_wc *entry) +{ + struct ipath_cq *cq = to_icq(ibcq); + unsigned long flags; + int npolled; + + spin_lock_irqsave(&cq->lock, flags); + + for (npolled = 0; npolled < num_entries; ++npolled, ++entry) { + if (cq->tail == cq->head) + break; + *entry = cq->queue[cq->tail]; + if (++cq->tail == cq->ibcq.cqe) + cq->tail = 0; + } + + spin_unlock_irqrestore(&cq->lock, flags); + + return npolled; +} + +static struct ib_cq *ipath_create_cq(struct ib_device *ibdev, int entries, + struct ib_ucontext *context, + struct ib_udata *udata) +{ + struct ipath_cq *cq; + + /* Need to use vmalloc() if we want to support large #s of entries. */ + cq = vmalloc(sizeof(*cq) + entries * sizeof(*cq->queue)); + if (!cq) + return ERR_PTR(-ENOMEM); + /* + * ib_create_cq() will initialize cq->ibcq except for cq->ibcq.cqe. + * The number of entries should be >= the number requested or + * return an error. + */ + cq->ibcq.cqe = entries + 1; + cq->notify = IB_CQ_NONE; + cq->triggered = 0; + spin_lock_init(&cq->lock); + tasklet_init(&cq->comptask, send_complete, (unsigned long)cq); + cq->head = 0; + cq->tail = 0; + + return &cq->ibcq; +} + +static int ipath_destroy_cq(struct ib_cq *ibcq) +{ + struct ipath_cq *cq = to_icq(ibcq); + + tasklet_kill(&cq->comptask); + vfree(cq); + + return 0; +} + +/* + * This may be called from interrupt context. + */ +static int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify) +{ + struct ipath_cq *cq = to_icq(ibcq); + unsigned long flags; + + spin_lock_irqsave(&cq->lock, flags); + /* + * Don't change IB_CQ_NEXT_COMP to IB_CQ_SOLICITED but allow + * any other transitions. + */ + if (cq->notify != IB_CQ_NEXT_COMP) + cq->notify = notify; + spin_unlock_irqrestore(&cq->lock, flags); + return 0; +} + +static int ipath_query_device(struct ib_device *ibdev, + struct ib_device_attr *props) +{ + struct ipath_ibdev *dev = to_idev(ibdev); + uint32_t vendor, boardrev, majrev, minrev; + + memset(props, 0, sizeof(*props)); + + props->device_cap_flags = IB_DEVICE_BAD_PKEY_CNTR | + IB_DEVICE_BAD_QKEY_CNTR | IB_DEVICE_SHUTDOWN_PORT | + IB_DEVICE_SYS_IMAGE_GUID; + ipath_layer_query_device(dev->ib_unit, &vendor, &boardrev, + &majrev, &minrev); + props->vendor_id = vendor; + props->vendor_part_id = boardrev; + props->hw_ver = boardrev << 16 | majrev << 8 | minrev; + + props->sys_image_guid = dev->sys_image_guid; + props->node_guid = ipath_layer_get_guid(dev->ib_unit); + + props->max_mr_size = ~0ull; + props->max_qp = 0xffff; + props->max_qp_wr = 0xffff; + props->max_sge = 255; + props->max_cq = 0xffff; + props->max_cqe = 0xffff; + props->max_mr = 0xffff; + props->max_pd = 0xffff; + props->max_qp_rd_atom = 1; + props->max_qp_init_rd_atom = 1; + /* props->max_res_rd_atom */ + props->max_srq = 0xffff; + props->max_srq_wr = 0xffff; + props->max_srq_sge = 255; + /* props->local_ca_ack_delay */ + props->atomic_cap = IB_ATOMIC_HCA; + props->max_pkeys = ipath_layer_get_npkeys(dev->ib_unit); + props->max_mcast_grp = 0xffff; + props->max_mcast_qp_attach = 0xffff; + props->max_total_mcast_qp_attach = props->max_mcast_qp_attach * + props->max_mcast_grp; + + return 0; +} + +static int ipath_query_port(struct ib_device *ibdev, + u8 port, struct ib_port_attr *props) +{ + struct ipath_ibdev *dev = to_idev(ibdev); + uint32_t flags = ipath_layer_get_flags(dev->ib_unit); + enum ib_mtu mtu; + uint32_t l; + uint16_t lid = ipath_layer_get_lid(dev->ib_unit); + + memset(props, 0, sizeof(*props)); + props->lid = lid ? 
lid : IB_LID_PERMISSIVE; + props->lmc = dev->mkeyprot_resv_lmc & 7; + props->sm_lid = dev->sm_lid; + props->sm_sl = dev->sm_sl; + if (flags & IPATH_LINKDOWN) + props->state = IB_PORT_DOWN; + else if (flags & IPATH_LINKARMED) + props->state = IB_PORT_ARMED; + else if (flags & IPATH_LINKACTIVE) + props->state = IB_PORT_ACTIVE; + else if (flags & IPATH_LINK_SLEEPING) + props->state = IB_PORT_ACTIVE_DEFER; + else + props->state = IB_PORT_NOP; + /* See phys_state_show() */ + props->phys_state = 5; /* LinkUp */ + props->port_cap_flags = dev->port_cap_flags; + props->gid_tbl_len = 1; + props->max_msg_sz = 4096; + props->pkey_tbl_len = ipath_layer_get_npkeys(dev->ib_unit); + props->bad_pkey_cntr = ipath_layer_get_cr_errpkey(dev->ib_unit); + props->qkey_viol_cntr = dev->qkey_violations; + props->active_width = IB_WIDTH_4X; + /* See rate_show() */ + props->active_speed = 1; /* Regular 10Mbs speed. */ + props->max_vl_num = 1; /* VLCap = VL0 */ + props->init_type_reply = 0; + + props->max_mtu = IB_MTU_4096; + l = ipath_layer_get_ibmtu(dev->ib_unit); + switch (l) { + case 4096: + mtu = IB_MTU_4096; + break; + case 2048: + mtu = IB_MTU_2048; + break; + case 1024: + mtu = IB_MTU_1024; + break; + case 512: + mtu = IB_MTU_512; + break; + case 256: + mtu = IB_MTU_256; + break; + default: + mtu = IB_MTU_2048; + } + props->active_mtu = mtu; + props->subnet_timeout = dev->subnet_timeout; + + return 0; +} + +static int ipath_modify_device(struct ib_device *device, + int device_modify_mask, + struct ib_device_modify *device_modify) +{ + if (device_modify_mask & IB_DEVICE_MODIFY_SYS_IMAGE_GUID) + to_idev(device)->sys_image_guid = device_modify->sys_image_guid; + + return 0; +} + +static int ipath_modify_port(struct ib_device *ibdev, + u8 port, int port_modify_mask, + struct ib_port_modify *props) +{ + struct ipath_ibdev *dev = to_idev(ibdev); + + atomic_set_mask(props->set_port_cap_mask, &dev->port_cap_flags); + atomic_clear_mask(props->clr_port_cap_mask, &dev->port_cap_flags); + if (port_modify_mask & IB_PORT_SHUTDOWN) + ipath_kset_linkstate(dev->ib_unit << 16 | IPATH_IB_LINKDOWN); + if (port_modify_mask & IB_PORT_RESET_QKEY_CNTR) + dev->qkey_violations = 0; + return 0; +} + +static int ipath_query_pkey(struct ib_device *ibdev, + u8 port, u16 index, u16 *pkey) +{ + struct ipath_ibdev *dev = to_idev(ibdev); + + if (index >= ipath_layer_get_npkeys(dev->ib_unit)) + return -EINVAL; + *pkey = ipath_layer_get_pkey(dev->ib_unit, index); + return 0; +} + +static int ipath_query_gid(struct ib_device *ibdev, u8 port, + int index, union ib_gid *gid) +{ + struct ipath_ibdev *dev = to_idev(ibdev); + + if (index >= 1) + return -EINVAL; + gid->global.subnet_prefix = dev->gid_prefix; + gid->global.interface_id = ipath_layer_get_guid(dev->ib_unit); + + return 0; +} + +static struct ib_pd *ipath_alloc_pd(struct ib_device *ibdev, + struct ib_ucontext *context, + struct ib_udata *udata) +{ + struct ipath_pd *pd; + + pd = kmalloc(sizeof *pd, GFP_KERNEL); + if (!pd) + return ERR_PTR(-ENOMEM); + + /* ib_alloc_pd() will initialize pd->ibpd. */ + pd->user = udata != NULL; + + return &pd->ibpd; +} + +static int ipath_dealloc_pd(struct ib_pd *ibpd) +{ + struct ipath_pd *pd = to_ipd(ibpd); + + kfree(pd); + + return 0; +} + +/* + * This may be called from interrupt context. + */ +static struct ib_ah *ipath_create_ah(struct ib_pd *pd, + struct ib_ah_attr *ah_attr) +{ + struct ipath_ah *ah; + + ah = kmalloc(sizeof *ah, GFP_ATOMIC); + if (!ah) + return ERR_PTR(-ENOMEM); + + /* ib_create_ah() will initialize ah->ibah. 
*/ + ah->attr = *ah_attr; + + return &ah->ibah; +} + +/* + * This may be called from interrupt context. + */ +static int ipath_destroy_ah(struct ib_ah *ibah) +{ + struct ipath_ah *ah = to_iah(ibah); + + kfree(ah); + + return 0; +} + +static struct ib_mr *ipath_get_dma_mr(struct ib_pd *pd, int acc) +{ + struct ipath_mr *mr; + + mr = kmalloc(sizeof *mr, GFP_KERNEL); + if (!mr) + return ERR_PTR(-ENOMEM); + + /* ib_get_dma_mr() will initialize mr->ibmr except for lkey and rkey. */ + memset(mr, 0, sizeof *mr); + mr->mr.access_flags = acc; + return &mr->ibmr; +} + +static struct ib_mr *ipath_reg_phys_mr(struct ib_pd *pd, + struct ib_phys_buf *buffer_list, + int num_phys_buf, + int acc, u64 *iova_start) +{ + struct ipath_mr *mr; + int n, m, i; + + /* Allocate struct plus pointers to first level page tables. */ + m = (num_phys_buf + IPATH_SEGSZ - 1) / IPATH_SEGSZ; + mr = kmalloc(sizeof *mr + m * sizeof mr->mr.map[0], GFP_KERNEL); + if (!mr) + return ERR_PTR(-ENOMEM); + + /* Allocate first level page tables. */ + for (i = 0; i < m; i++) { + mr->mr.map[i] = kmalloc(sizeof *mr->mr.map[0], GFP_KERNEL); + if (!mr->mr.map[i]) { + while (i) + kfree(mr->mr.map[--i]); + kfree(mr); + return ERR_PTR(-ENOMEM); + } + } + mr->mr.mapsz = m; + + /* + * ib_reg_phys_mr() will initialize mr->ibmr except for + * lkey and rkey. + */ + if (!ipath_alloc_lkey(&to_idev(pd->device)->lk_table, &mr->mr)) { + while (i) + kfree(mr->mr.map[--i]); + kfree(mr); + return ERR_PTR(-ENOMEM); + } + mr->ibmr.rkey = mr->ibmr.lkey = mr->mr.lkey; + mr->mr.user_base = *iova_start; + mr->mr.iova = *iova_start; + mr->mr.length = 0; + mr->mr.offset = 0; + mr->mr.access_flags = acc; + mr->mr.max_segs = num_phys_buf; + m = 0; + n = 0; + for (i = 0; i < num_phys_buf; i++) { + mr->mr.map[m]->segs[n].vaddr = + phys_to_virt(buffer_list[i].addr); + mr->mr.map[m]->segs[n].length = buffer_list[i].size; + mr->mr.length += buffer_list[i].size; + if (++n == IPATH_SEGSZ) { + m++; + n = 0; + } + } + return &mr->ibmr; +} + +static struct ib_mr *ipath_reg_user_mr(struct ib_pd *pd, + struct ib_umem *region, + int mr_access_flags, + struct ib_udata *udata) +{ + struct ipath_mr *mr; + struct ib_umem_chunk *chunk; + int n, m, i; + + n = 0; + list_for_each_entry(chunk, ®ion->chunk_list, list) + n += chunk->nents; + + /* Allocate struct plus pointers to first level page tables. */ + m = (n + IPATH_SEGSZ - 1) / IPATH_SEGSZ; + mr = kmalloc(sizeof *mr + m * sizeof mr->mr.map[0], GFP_KERNEL); + if (!mr) + return ERR_PTR(-ENOMEM); + + /* Allocate first level page tables. */ + for (i = 0; i < m; i++) { + mr->mr.map[i] = kmalloc(sizeof *mr->mr.map[0], GFP_KERNEL); + if (!mr->mr.map[i]) { + while (i) + kfree(mr->mr.map[--i]); + kfree(mr); + return ERR_PTR(-ENOMEM); + } + } + mr->mr.mapsz = m; + + /* + * ib_uverbs_reg_mr() will initialize mr->ibmr except for + * lkey and rkey. 
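+ * The region is described by a two-level table: mr->mr.map[] holds
+ * mapsz first-level pointers, each covering IPATH_SEGSZ
+ * (vaddr, length) segments, hence the m = ceil(n / IPATH_SEGSZ)
+ * allocations above.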
+ */ + if (!ipath_alloc_lkey(&to_idev(pd->device)->lk_table, &mr->mr)) { + while (i) + kfree(mr->mr.map[--i]); + kfree(mr); + return ERR_PTR(-ENOMEM); + } + mr->ibmr.rkey = mr->ibmr.lkey = mr->mr.lkey; + mr->mr.user_base = region->user_base; + mr->mr.iova = region->virt_base; + mr->mr.length = region->length; + mr->mr.offset = region->offset; + mr->mr.access_flags = mr_access_flags; + mr->mr.max_segs = n; + m = 0; + n = 0; + list_for_each_entry(chunk, ®ion->chunk_list, list) { + for (i = 0; i < chunk->nmap; i++) { + mr->mr.map[m]->segs[n].vaddr = + page_address(chunk->page_list[i].page); + mr->mr.map[m]->segs[n].length = region->page_size; + if (++n == IPATH_SEGSZ) { + m++; + n = 0; + } + } + } + return &mr->ibmr; +} + +/* + * Note that this is called to free MRs created by + * ipath_get_dma_mr() or ipath_reg_user_mr(). + */ +static int ipath_dereg_mr(struct ib_mr *ibmr) +{ + struct ipath_mr *mr = to_imr(ibmr); + int i; + + ipath_free_lkey(&to_idev(ibmr->device)->lk_table, ibmr->lkey); + i = mr->mr.mapsz; + while (i) + kfree(mr->mr.map[--i]); + kfree(mr); + return 0; +} + +static struct ib_fmr *ipath_alloc_fmr(struct ib_pd *pd, + int mr_access_flags, + struct ib_fmr_attr *fmr_attr) +{ + struct ipath_fmr *fmr; + int m, i; + + /* Allocate struct plus pointers to first level page tables. */ + m = (fmr_attr->max_pages + IPATH_SEGSZ - 1) / IPATH_SEGSZ; + fmr = kmalloc(sizeof *fmr + m * sizeof fmr->mr.map[0], GFP_KERNEL); + if (!fmr) + return ERR_PTR(-ENOMEM); + + /* Allocate first level page tables. */ + for (i = 0; i < m; i++) { + fmr->mr.map[i] = kmalloc(sizeof *fmr->mr.map[0], GFP_KERNEL); + if (!fmr->mr.map[i]) { + while (i) + kfree(fmr->mr.map[--i]); + kfree(fmr); + return ERR_PTR(-ENOMEM); + } + } + fmr->mr.mapsz = m; + + /* ib_alloc_fmr() will initialize fmr->ibfmr except for lkey & rkey. */ + if (!ipath_alloc_lkey(&to_idev(pd->device)->lk_table, &fmr->mr)) { + while (i) + kfree(fmr->mr.map[--i]); + kfree(fmr); + return ERR_PTR(-ENOMEM); + } + fmr->ibfmr.rkey = fmr->ibfmr.lkey = fmr->mr.lkey; + /* Resources are allocated but no valid mapping (RKEY can't be used). */ + fmr->mr.user_base = 0; + fmr->mr.iova = 0; + fmr->mr.length = 0; + fmr->mr.offset = 0; + fmr->mr.access_flags = mr_access_flags; + fmr->mr.max_segs = fmr_attr->max_pages; + fmr->page_size = fmr_attr->page_size; + return &fmr->ibfmr; +} + +/* + * This may be called from interrupt context. + * XXX Can we ever be called to map a portion of the RKEY space? 
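+ * ipath_map_phys_fmr() below just rewrites the segment array in
+ * place under the lkey-table lock; there is no flush or fence, so
+ * the caller is presumably expected to unmap before reusing the
+ * RKEY for a different range.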
+ */ +static int ipath_map_phys_fmr(struct ib_fmr *ibfmr, + u64 * page_list, int list_len, u64 iova) +{ + struct ipath_fmr *fmr = to_ifmr(ibfmr); + struct ipath_lkey_table *rkt; + unsigned long flags; + int m, n, i; + u32 ps; + + if (list_len > fmr->mr.max_segs) + return -EINVAL; + rkt = &to_idev(ibfmr->device)->lk_table; + spin_lock_irqsave(&rkt->lock, flags); + fmr->mr.user_base = iova; + fmr->mr.iova = iova; + ps = 1 << fmr->page_size; + fmr->mr.length = list_len * ps; + m = 0; + n = 0; + ps = 1 << fmr->page_size; + for (i = 0; i < list_len; i++) { + fmr->mr.map[m]->segs[n].vaddr = phys_to_virt(page_list[i]); + fmr->mr.map[m]->segs[n].length = ps; + if (++n == IPATH_SEGSZ) { + m++; + n = 0; + } + } + spin_unlock_irqrestore(&rkt->lock, flags); + return 0; +} + +static int ipath_unmap_fmr(struct list_head *fmr_list) +{ + struct ipath_fmr *fmr; + + list_for_each_entry(fmr, fmr_list, ibfmr.list) { + fmr->mr.user_base = 0; + fmr->mr.iova = 0; + fmr->mr.length = 0; + } + return 0; +} + +static int ipath_dealloc_fmr(struct ib_fmr *ibfmr) +{ + struct ipath_fmr *fmr = to_ifmr(ibfmr); + int i; + + ipath_free_lkey(&to_idev(ibfmr->device)->lk_table, ibfmr->lkey); + i = fmr->mr.mapsz; + while (i) + kfree(fmr->mr.map[--i]); + kfree(fmr); + return 0; +} + +static ssize_t show_rev(struct class_device *cdev, char *buf) +{ + struct ipath_ibdev *dev = + container_of(cdev, struct ipath_ibdev, ibdev.class_dev); + int vendor, boardrev, majrev, minrev; + + ipath_layer_query_device(dev->ib_unit, &vendor, &boardrev, + &majrev, &minrev); + return sprintf(buf, "%d.%d\n", majrev, minrev); +} + +static ssize_t show_hca(struct class_device *cdev, char *buf) +{ + struct ipath_ibdev *dev = + container_of(cdev, struct ipath_ibdev, ibdev.class_dev); + int vendor, boardrev, majrev, minrev; + + ipath_layer_query_device(dev->ib_unit, &vendor, &boardrev, + &majrev, &minrev); + ipath_get_boardname(dev->ib_unit, buf, 128); + strcat(buf, "\n"); + return strlen(buf); +} + +static ssize_t show_board(struct class_device *cdev, char *buf) +{ + struct ipath_ibdev *dev = + container_of(cdev, struct ipath_ibdev, ibdev.class_dev); + int vendor, boardrev, majrev, minrev; + + ipath_layer_query_device(dev->ib_unit, &vendor, &boardrev, + &majrev, &minrev); + ipath_get_boardname(dev->ib_unit, buf, 128); + strcat(buf, "\n"); + return strlen(buf); +} + +static ssize_t show_stats(struct class_device *cdev, char *buf) +{ + struct ipath_ibdev *dev = + container_of(cdev, struct ipath_ibdev, ibdev.class_dev); + char *p; + int i; + + sprintf(buf, + "RC resends %d\n" + "RC QACKs %d\n" + "RC ACKs %d\n" + "RC SEQ NAKs %d\n" + "RC RDMA seq %d\n" + "RC RNR NAKs %d\n" + "RC OTH NAKs %d\n" + "RC timeouts %d\n" + "RC RDMA dup %d\n" + "piobuf wait %d\n" + "no piobuf %d\n" + "PKT drops %d\n" + "WQE errs %d\n", + dev->n_rc_resends, dev->n_rc_qacks, dev->n_rc_acks, + dev->n_seq_naks, dev->n_rdma_seq, dev->n_rnr_naks, + dev->n_other_naks, dev->n_timeouts, dev->n_rdma_dup_busy, + dev->n_piowait, dev->n_no_piobuf, dev->n_pkt_drops, + dev->n_wqe_errs); + p = buf; + for (i = 0; i < ARRAY_SIZE(dev->opstats); i++) { + if (!dev->opstats[i].n_packets && !dev->opstats[i].n_bytes) + continue; + p += strlen(p); + sprintf(p, "%02x %llu/%llu\n", + i, dev->opstats[i].n_packets, dev->opstats[i].n_bytes); + } + return strlen(buf); +} + +static CLASS_DEVICE_ATTR(hw_rev, S_IRUGO, show_rev, NULL); +static CLASS_DEVICE_ATTR(hca_type, S_IRUGO, show_hca, NULL); +static CLASS_DEVICE_ATTR(board_id, S_IRUGO, show_board, NULL); +static CLASS_DEVICE_ATTR(stats, S_IRUGO, show_stats, 
NULL); + +static struct class_device_attribute *ipath_class_attributes[] = { + &class_device_attr_hw_rev, + &class_device_attr_hca_type, + &class_device_attr_board_id, + &class_device_attr_stats +}; + +/* + * Allocate a ucontext. + */ + +static struct ib_ucontext *ipath_alloc_ucontext(struct ib_device *ibdev, + struct ib_udata *udata) +{ + struct ipath_ucontext *context; + + context = kmalloc(sizeof *context, GFP_KERNEL); + if (!context) + return ERR_PTR(-ENOMEM); + + return &context->ibucontext; +} + +static int ipath_dealloc_ucontext(struct ib_ucontext *context) +{ + kfree(to_iucontext(context)); + return 0; +} + +/* + * Register our device with the infiniband core. + */ +static int ipath_register_ib_device(const ipath_type t) +{ + struct ipath_ibdev *idev; + struct ib_device *dev; + int i; + int ret; + + idev = (struct ipath_ibdev *)ib_alloc_device(sizeof *idev); + if (idev == NULL) + return -ENOMEM; + + dev = &idev->ibdev; + + /* Only need to initialize non-zero fields. */ + spin_lock_init(&idev->qp_table.lock); + spin_lock_init(&idev->lk_table.lock); + idev->sm_lid = IB_LID_PERMISSIVE; + idev->gid_prefix = __constant_cpu_to_be64(0xfe80000000000000UL); + idev->qp_table.last = 1; /* QPN 0 and 1 are special. */ + idev->qp_table.max = ib_ipath_qp_table_size; + idev->qp_table.nmaps = 1; + idev->qp_table.table = kmalloc(idev->qp_table.max * + sizeof(*idev->qp_table.table), + GFP_KERNEL); + if (idev->qp_table.table == NULL) { + ret = -ENOMEM; + goto err_qp; + } + memset(idev->qp_table.table, 0, + idev->qp_table.max * sizeof(*idev->qp_table.table)); + for (i = 0; i < ARRAY_SIZE(idev->qp_table.map); i++) { + atomic_set(&idev->qp_table.map[i].n_free, BITS_PER_PAGE); + idev->qp_table.map[i].page = NULL; + } + /* + * The top ib_ipath_lkey_table_size bits are used to index the table. + * The lower 8 bits can be owned by the user (copied from the LKEY). + * The remaining bits act as a generation number or tag. + */ + idev->lk_table.max = 1 << ib_ipath_lkey_table_size; + idev->lk_table.table = kmalloc(idev->lk_table.max * + sizeof(*idev->lk_table.table), + GFP_KERNEL); + if (idev->lk_table.table == NULL) { + ret = -ENOMEM; + goto err_lk; + } + memset(idev->lk_table.table, 0, + idev->lk_table.max * sizeof(*idev->lk_table.table)); + spin_lock_init(&idev->pending_lock); + INIT_LIST_HEAD(&idev->pending[0]); + INIT_LIST_HEAD(&idev->pending[1]); + INIT_LIST_HEAD(&idev->pending[2]); + INIT_LIST_HEAD(&idev->piowait); + INIT_LIST_HEAD(&idev->rnrwait); + idev->pending_index = 0; + idev->port_cap_flags = + IB_PORT_SYS_IMAGE_GUID_SUP | IB_PORT_CLIENT_REG_SUP; + idev->pma_counter_select[0] = IB_PMA_PORT_XMIT_DATA; + idev->pma_counter_select[1] = IB_PMA_PORT_RCV_DATA; + idev->pma_counter_select[2] = IB_PMA_PORT_XMIT_PKTS; + idev->pma_counter_select[3] = IB_PMA_PORT_RCV_PKTS; + idev->pma_counter_select[5] = IB_PMA_PORT_XMIT_WAIT; + + /* + * The system image GUI is supposed to be the same for all + * IB HCAs in a single system. + * Note that this code assumes device zero is found first. + */ + idev->sys_image_guid = + t ? 
ipath_devices[t]->sys_image_guid : ipath_layer_get_guid(t); + idev->ib_unit = t; + + strlcpy(dev->name, "ipath%d", IB_DEVICE_NAME_MAX); + dev->node_guid = ipath_layer_get_guid(t); + dev->uverbs_abi_ver = IPATH_UVERBS_ABI_VERSION; + dev->uverbs_cmd_mask = + (1ull << IB_USER_VERBS_CMD_GET_CONTEXT) | + (1ull << IB_USER_VERBS_CMD_QUERY_DEVICE) | + (1ull << IB_USER_VERBS_CMD_QUERY_PORT) | + (1ull << IB_USER_VERBS_CMD_ALLOC_PD) | + (1ull << IB_USER_VERBS_CMD_DEALLOC_PD) | + (1ull << IB_USER_VERBS_CMD_CREATE_AH) | + (1ull << IB_USER_VERBS_CMD_DESTROY_AH) | + (1ull << IB_USER_VERBS_CMD_REG_MR) | + (1ull << IB_USER_VERBS_CMD_DEREG_MR) | + (1ull << IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL) | + (1ull << IB_USER_VERBS_CMD_CREATE_CQ) | + (1ull << IB_USER_VERBS_CMD_DESTROY_CQ) | + (1ull << IB_USER_VERBS_CMD_POLL_CQ) | + (1ull << IB_USER_VERBS_CMD_REQ_NOTIFY_CQ) | + (1ull << IB_USER_VERBS_CMD_CREATE_QP) | + (1ull << IB_USER_VERBS_CMD_MODIFY_QP) | + (1ull << IB_USER_VERBS_CMD_DESTROY_QP) | + (1ull << IB_USER_VERBS_CMD_POST_SEND) | + (1ull << IB_USER_VERBS_CMD_POST_RECV) | + (1ull << IB_USER_VERBS_CMD_ATTACH_MCAST) | + (1ull << IB_USER_VERBS_CMD_DETACH_MCAST) | + (1ull << IB_USER_VERBS_CMD_CREATE_SRQ) | + (1ull << IB_USER_VERBS_CMD_MODIFY_SRQ) | + (1ull << IB_USER_VERBS_CMD_DESTROY_SRQ) | + (1ull << IB_USER_VERBS_CMD_POST_SRQ_RECV); + dev->node_type = IB_NODE_CA; + dev->phys_port_cnt = 1; + dev->dma_device = ipath_layer_get_pcidev(t); + dev->class_dev.dev = dev->dma_device; + dev->query_device = ipath_query_device; + dev->modify_device = ipath_modify_device; + dev->query_port = ipath_query_port; + dev->modify_port = ipath_modify_port; + dev->query_pkey = ipath_query_pkey; + dev->query_gid = ipath_query_gid; + dev->alloc_ucontext = ipath_alloc_ucontext; + dev->dealloc_ucontext = ipath_dealloc_ucontext; + dev->alloc_pd = ipath_alloc_pd; + dev->dealloc_pd = ipath_dealloc_pd; + dev->create_ah = ipath_create_ah; + dev->destroy_ah = ipath_destroy_ah; + dev->create_srq = ipath_create_srq; + dev->modify_srq = ipath_modify_srq; + dev->destroy_srq = ipath_destroy_srq; + dev->create_qp = ipath_create_qp; + dev->modify_qp = ipath_modify_qp; + dev->destroy_qp = ipath_destroy_qp; + dev->post_send = ipath_post_send; + dev->post_recv = ipath_post_receive; + dev->post_srq_recv = ipath_post_srq_receive; + dev->create_cq = ipath_create_cq; + dev->destroy_cq = ipath_destroy_cq; + dev->poll_cq = ipath_poll_cq; + dev->req_notify_cq = ipath_req_notify_cq; + dev->get_dma_mr = ipath_get_dma_mr; + dev->reg_phys_mr = ipath_reg_phys_mr; + dev->reg_user_mr = ipath_reg_user_mr; + dev->dereg_mr = ipath_dereg_mr; + dev->alloc_fmr = ipath_alloc_fmr; + dev->map_phys_fmr = ipath_map_phys_fmr; + dev->unmap_fmr = ipath_unmap_fmr; + dev->dealloc_fmr = ipath_dealloc_fmr; + dev->attach_mcast = ipath_multicast_attach; + dev->detach_mcast = ipath_multicast_detach; + dev->process_mad = ipath_process_mad; + + ret = ib_register_device(dev); + if (ret) + goto err_reg; + + for (i = 0; i < ARRAY_SIZE(ipath_class_attributes); ++i) { + ret = class_device_create_file(&dev->class_dev, + ipath_class_attributes[i]); + if (ret) + goto err_class; + } + + ipath_layer_enable_timer(t); + + ipath_devices[t] = idev; + return 0; + +err_class: + ib_unregister_device(dev); +err_reg: + kfree(idev->lk_table.table); +err_lk: + kfree(idev->qp_table.table); +err_qp: + ib_dealloc_device(dev); + return ret; +} + +static void ipath_unregister_ib_device(struct ipath_ibdev *dev) +{ + struct ib_device *ibdev = &dev->ibdev; + + ipath_layer_disable_timer(dev->ib_unit); + + 
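/*
+ * Unregister from the IB core first so that no new verbs calls
+ * can arrive while the QP and lkey tables below are torn down.
+ */
+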
ib_unregister_device(ibdev); + + if (!list_empty(&dev->pending[0]) || !list_empty(&dev->pending[1]) || + !list_empty(&dev->pending[2])) + _VERBS_ERROR("ipath%d pending list not empty!\n", dev->ib_unit); + if (!list_empty(&dev->piowait)) + _VERBS_ERROR("ipath%d piowait list not empty!\n", dev->ib_unit); + if (!list_empty(&dev->rnrwait)) + _VERBS_ERROR("ipath%d rnrwait list not empty!\n", dev->ib_unit); + if (mcast_tree.rb_node != NULL) + _VERBS_ERROR("ipath%d multicast table memory leak!\n", + dev->ib_unit); + /* + * Note that ipath_unregister_ib_device() can be called before all + * the QPs are destroyed! + */ + ipath_free_all_qps(&dev->qp_table); + kfree(dev->qp_table.table); + kfree(dev->lk_table.table); + ib_dealloc_device(ibdev); +} + +int __init ipath_verbs_init(void) +{ + int i; + + number_of_devices = ipath_layer_get_num_of_dev(); + i = number_of_devices * sizeof(struct ipath_ibdev *); + ipath_devices = kmalloc(i, GFP_ATOMIC); + if (ipath_devices == NULL) + return -ENOMEM; + + for (i = 0; i < number_of_devices; i++) { + int ret = ipath_verbs_register(i, ipath_ib_piobufavail, + ipath_ib_rcv, ipath_ib_timer); + + if (ret == 0) + ipath_devices[i] = NULL; + else if ((ret = ipath_register_ib_device(i)) != 0) { + _VERBS_ERROR("ib_ipath%d cannot register ib device " + "(%d)!\n", i, ret); + ipath_verbs_unregister(i); + ipath_devices[i] = NULL; + } + } + + return 0; +} + +void __exit ipath_verbs_cleanup(void) +{ + int i; + + for (i = 0; i < number_of_devices; i++) + if (ipath_devices[i]) { + ipath_unregister_ib_device(ipath_devices[i]); + ipath_verbs_unregister(i); + } + + kfree(ipath_devices); +} + +module_init(ipath_verbs_init); +module_exit(ipath_verbs_cleanup); -- 0.99.9n From rolandd at cisco.com Fri Dec 16 15:48:55 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 16 Dec 2005 15:48:55 -0800 Subject: [openib-general] [PATCH 10/13] [RFC] ipath verbs, part 1 In-Reply-To: <200512161548.zxp6FKcabEu47EnS@cisco.com> Message-ID: <200512161548.W9sJn4CLmdhnSTcH@cisco.com> First half of ipath verbs driver --- drivers/infiniband/hw/ipath/ipath_verbs.c | 3244 +++++++++++++++++++++++++++++ 1 files changed, 3244 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/hw/ipath/ipath_verbs.c 72075ecec75f8c42e444a7d7d8ffcf340a845b96 diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c new file mode 100644 index 0000000..808326e --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c @@ -0,0 +1,3244 @@ +/* + * Copyright (c) 2005. PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + * + * $Id: ipath_verbs.c 4491 2005-12-15 22:20:31Z rjwalsh $ + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#include "ipath_common.h" +#include "ips_common.h" +#include "ipath_layer.h" +#include "ipath_verbs.h" + +/* + * Compare the lower 24 bits of the two values. + * Returns an integer <, ==, or > than zero. + */ +static inline int cmp24(u32 a, u32 b) +{ + return (((int) a) - ((int) b)) << 8; +} + +#define MODNAME "ib_ipath" +#define DRIVER_LOAD_MSG "PathScale " MODNAME " loaded: " +#define PFX MODNAME ": " + + +/* Not static, because we don't want the compiler removing it */ +const char ipath_verbs_version[] = "ipath_verbs " _IPATH_IDSTR; + +unsigned int ib_ipath_qp_table_size = 251; +module_param(ib_ipath_qp_table_size, uint, 0444); +MODULE_PARM_DESC(ib_ipath_qp_table_size, "QP table size"); + +unsigned int ib_ipath_lkey_table_size = 12; +module_param(ib_ipath_lkey_table_size, uint, 0444); +MODULE_PARM_DESC(ib_ipath_lkey_table_size, + "LKEY table size in bits (2^n, 1 <= n <= 23)"); + +unsigned int ib_ipath_debug; /* debug mask */ +module_param(ib_ipath_debug, uint, 0644); +MODULE_PARM_DESC(ib_ipath_debug, "Verbs debug mask"); + + +static void ipath_ud_loopback(struct ipath_qp *sqp, struct ipath_sge_state *ss, + u32 len, struct ib_send_wr *wr, struct ib_wc *wc); +static void ipath_ruc_loopback(struct ipath_qp *sqp, struct ib_wc *wc); +static int ipath_destroy_qp(struct ib_qp *ibqp); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("PathScale "); +MODULE_DESCRIPTION("Pathscale InfiniPath driver"); + +enum { + IPATH_FAULT_RC_DROP_SEND_F = 1, + IPATH_FAULT_RC_DROP_SEND_M, + IPATH_FAULT_RC_DROP_SEND_L, + IPATH_FAULT_RC_DROP_SEND_O, + IPATH_FAULT_RC_DROP_RDMA_WRITE_F, + IPATH_FAULT_RC_DROP_RDMA_WRITE_M, + IPATH_FAULT_RC_DROP_RDMA_WRITE_L, + IPATH_FAULT_RC_DROP_RDMA_WRITE_O, + IPATH_FAULT_RC_DROP_RDMA_READ_RESP_F, + IPATH_FAULT_RC_DROP_RDMA_READ_RESP_M, + IPATH_FAULT_RC_DROP_RDMA_READ_RESP_L, + IPATH_FAULT_RC_DROP_RDMA_READ_RESP_O, + IPATH_FAULT_RC_DROP_ACK, +}; + +enum { + IPATH_TRANS_INVALID = 0, + IPATH_TRANS_ANY2RST, + IPATH_TRANS_RST2INIT, + IPATH_TRANS_INIT2INIT, + IPATH_TRANS_INIT2RTR, + IPATH_TRANS_RTR2RTS, + IPATH_TRANS_RTS2RTS, + IPATH_TRANS_SQERR2RTS, + IPATH_TRANS_ANY2ERR, + IPATH_TRANS_RTS2SQD, /* XXX Wait for expected ACKs & signal event */ + IPATH_TRANS_SQD2SQD, /* error if not drained & parameter change */ + IPATH_TRANS_SQD2RTS, /* error if not drained */ +}; + +enum { + IPATH_POST_SEND_OK = 0x0001, + IPATH_POST_RECV_OK = 0x0002, + IPATH_PROCESS_RECV_OK = 0x0004, + IPATH_PROCESS_SEND_OK = 0x0008, +}; + +static int state_ops[IB_QPS_ERR + 1] = { + [IB_QPS_RESET] = 0, + [IB_QPS_INIT] = IPATH_POST_RECV_OK, + [IB_QPS_RTR] = IPATH_POST_RECV_OK | IPATH_PROCESS_RECV_OK, + [IB_QPS_RTS] = IPATH_POST_RECV_OK | IPATH_PROCESS_RECV_OK | + IPATH_POST_SEND_OK | IPATH_PROCESS_SEND_OK, + 
[IB_QPS_SQD] = IPATH_POST_RECV_OK | IPATH_PROCESS_RECV_OK | + IPATH_POST_SEND_OK, + [IB_QPS_SQE] = IPATH_POST_RECV_OK | IPATH_PROCESS_RECV_OK, + [IB_QPS_ERR] = 0, +}; + +/* + * Convert the AETH credit code into the number of credits. + */ +static u32 credit_table[31] = { + 0, /* 0 */ + 1, /* 1 */ + 2, /* 2 */ + 3, /* 3 */ + 4, /* 4 */ + 6, /* 5 */ + 8, /* 6 */ + 12, /* 7 */ + 16, /* 8 */ + 24, /* 9 */ + 32, /* A */ + 48, /* B */ + 64, /* C */ + 96, /* D */ + 128, /* E */ + 192, /* F */ + 256, /* 10 */ + 384, /* 11 */ + 512, /* 12 */ + 768, /* 13 */ + 1024, /* 14 */ + 1536, /* 15 */ + 2048, /* 16 */ + 3072, /* 17 */ + 4096, /* 18 */ + 6144, /* 19 */ + 8192, /* 1A */ + 12288, /* 1B */ + 16384, /* 1C */ + 24576, /* 1D */ + 32768 /* 1E */ +}; + +/* + * Convert the AETH RNR timeout code into the number of milliseconds. + */ +static u32 rnr_table[32] = { + 656, /* 0 */ + 1, /* 1 */ + 1, /* 2 */ + 1, /* 3 */ + 1, /* 4 */ + 1, /* 5 */ + 1, /* 6 */ + 1, /* 7 */ + 1, /* 8 */ + 1, /* 9 */ + 1, /* A */ + 1, /* B */ + 1, /* C */ + 1, /* D */ + 2, /* E */ + 2, /* F */ + 3, /* 10 */ + 4, /* 11 */ + 6, /* 12 */ + 8, /* 13 */ + 11, /* 14 */ + 16, /* 15 */ + 21, /* 16 */ + 31, /* 17 */ + 41, /* 18 */ + 62, /* 19 */ + 82, /* 1A */ + 123, /* 1B */ + 164, /* 1C */ + 246, /* 1D */ + 328, /* 1E */ + 492 /* 1F */ +}; + +/* + * Translate ib_wr_opcode into ib_wc_opcode. + */ +static enum ib_wc_opcode wc_opcode[] = { + [IB_WR_RDMA_WRITE] = IB_WC_RDMA_WRITE, + [IB_WR_RDMA_WRITE_WITH_IMM] = IB_WC_RDMA_WRITE, + [IB_WR_SEND] = IB_WC_SEND, + [IB_WR_SEND_WITH_IMM] = IB_WC_SEND, + [IB_WR_RDMA_READ] = IB_WC_RDMA_READ, + [IB_WR_ATOMIC_CMP_AND_SWP] = IB_WC_COMP_SWAP, + [IB_WR_ATOMIC_FETCH_AND_ADD] = IB_WC_FETCH_ADD +}; + +/* + * Array of device pointers. + */ +static uint32_t number_of_devices; +static struct ipath_ibdev **ipath_devices; + +/* + * Global table of GID to attached QPs. + * The table is global to all ipath devices since a send from one QP/device + * needs to be locally routed to any locally attached QPs on the same + * or different device. + */ +static struct rb_root mcast_tree; +static spinlock_t mcast_lock = SPIN_LOCK_UNLOCKED; + +/* + * Allocate a structure to link a QP to the multicast GID structure. + */ +static struct ipath_mcast_qp *ipath_mcast_qp_alloc(struct ipath_qp *qp) +{ + struct ipath_mcast_qp *mqp; + + mqp = kmalloc(sizeof(*mqp), GFP_KERNEL); + if (!mqp) + return NULL; + + mqp->qp = qp; + atomic_inc(&qp->refcount); + + return mqp; +} + +static void ipath_mcast_qp_free(struct ipath_mcast_qp *mqp) +{ + struct ipath_qp *qp = mqp->qp; + + /* Notify ipath_destroy_qp() if it is waiting. */ + if (atomic_dec_and_test(&qp->refcount)) + wake_up(&qp->wait); + + kfree(mqp); +} + +/* + * Allocate a structure for the multicast GID. + * A list of QPs will be attached to this structure. + */ +static struct ipath_mcast *ipath_mcast_alloc(union ib_gid *mgid) +{ + struct ipath_mcast *mcast; + + mcast = kmalloc(sizeof(*mcast), GFP_KERNEL); + if (!mcast) + return NULL; + + mcast->mgid = *mgid; + INIT_LIST_HEAD(&mcast->qp_list); + init_waitqueue_head(&mcast->wait); + atomic_set(&mcast->refcount, 0); + + return mcast; +} + +static void ipath_mcast_free(struct ipath_mcast *mcast) +{ + struct ipath_mcast_qp *p, *tmp; + + list_for_each_entry_safe(p, tmp, &mcast->qp_list, list) + ipath_mcast_qp_free(p); + + kfree(mcast); +} + +/* + * Search the global table for the given multicast GID. + * Return it or NULL if not found. + * The caller is responsible for decrementing the reference count if found. 
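
A standalone illustration of the wraparound semantics of cmp24() near the top of this file, which orders 24-bit PSNs circularly (modulo 2^24) by shifting their difference up into the sign bit. Plain userspace C, assuming a 32-bit int as the driver does; not part of the patch.

#include <stdio.h>

/* Same trick as cmp24() above: shift the 24-bit difference so that its
 * sign reflects circular (modulo 2^24) ordering of the two PSNs. */
static int cmp24(unsigned int a, unsigned int b)
{
	return (((int) a) - ((int) b)) << 8;
}

int main(void)
{
	/* PSN 0x000002 follows 0xFFFFF0, though numerically smaller. */
	printf("%d\n", cmp24(0x000002, 0xFFFFF0) > 0);	/* 1 */
	printf("%d\n", cmp24(0xFFFFF0, 0x000002) < 0);	/* 1 */
	return 0;
}
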
+ */
+static struct ipath_mcast *ipath_mcast_find(union ib_gid *mgid)
+{
+	struct rb_node *n;
+	unsigned long flags;
+
+	spin_lock_irqsave(&mcast_lock, flags);
+	n = mcast_tree.rb_node;
+	while (n) {
+		struct ipath_mcast *mcast;
+		int ret;
+
+		mcast = rb_entry(n, struct ipath_mcast, rb_node);
+
+		ret = memcmp(mgid->raw, mcast->mgid.raw, sizeof(union ib_gid));
+		if (ret < 0)
+			n = n->rb_left;
+		else if (ret > 0)
+			n = n->rb_right;
+		else {
+			atomic_inc(&mcast->refcount);
+			spin_unlock_irqrestore(&mcast_lock, flags);
+			return mcast;
+		}
+	}
+	spin_unlock_irqrestore(&mcast_lock, flags);
+
+	return NULL;
+}
+
+/*
+ * Insert the multicast GID into the table and
+ * attach the QP structure.
+ * Return zero if both were added.
+ * Return EEXIST if the GID was already in the table but the QP was added.
+ * Return ESRCH if the QP was already attached and neither structure was added.
+ */
+static int ipath_mcast_add(struct ipath_mcast *mcast,
+			   struct ipath_mcast_qp *mqp)
+{
+	struct rb_node **n = &mcast_tree.rb_node;
+	struct rb_node *pn = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&mcast_lock, flags);
+
+	while (*n) {
+		struct ipath_mcast *tmcast;
+		struct ipath_mcast_qp *p;
+		int ret;
+
+		pn = *n;
+		tmcast = rb_entry(pn, struct ipath_mcast, rb_node);
+
+		ret = memcmp(mcast->mgid.raw, tmcast->mgid.raw,
+			     sizeof(union ib_gid));
+		if (ret < 0) {
+			n = &pn->rb_left;
+			continue;
+		}
+		if (ret > 0) {
+			n = &pn->rb_right;
+			continue;
+		}
+
+		/* Search the QP list to see if this is already there. */
+		list_for_each_entry_rcu(p, &tmcast->qp_list, list) {
+			if (p->qp == mqp->qp) {
+				spin_unlock_irqrestore(&mcast_lock, flags);
+				return ESRCH;
+			}
+		}
+		list_add_tail_rcu(&mqp->list, &tmcast->qp_list);
+		spin_unlock_irqrestore(&mcast_lock, flags);
+		return EEXIST;
+	}
+
+	list_add_tail_rcu(&mqp->list, &mcast->qp_list);
+
+	atomic_inc(&mcast->refcount);
+	rb_link_node(&mcast->rb_node, pn, n);
+	rb_insert_color(&mcast->rb_node, &mcast_tree);
+
+	spin_unlock_irqrestore(&mcast_lock, flags);
+
+	return 0;
+}
+
+static int ipath_multicast_attach(struct ib_qp *ibqp, union ib_gid *gid,
+				  u16 lid)
+{
+	struct ipath_qp *qp = to_iqp(ibqp);
+	struct ipath_mcast *mcast;
+	struct ipath_mcast_qp *mqp;
+
+	/*
+	 * Allocate data structures since it's better to do this outside of
+	 * spin locks and it will most likely be needed.
+	 */
+	mcast = ipath_mcast_alloc(gid);
+	if (mcast == NULL)
+		return -ENOMEM;
+	mqp = ipath_mcast_qp_alloc(qp);
+	if (mqp == NULL) {
+		ipath_mcast_free(mcast);
+		return -ENOMEM;
+	}
+	switch (ipath_mcast_add(mcast, mqp)) {
+	case ESRCH:
+		/* Neither was used: can't attach the same QP twice. */
+		ipath_mcast_qp_free(mqp);
+		ipath_mcast_free(mcast);
+		return -EINVAL;
+	case EEXIST:		/* The mcast wasn't used */
+		ipath_mcast_free(mcast);
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
+static int ipath_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid,
+				  u16 lid)
+{
+	struct ipath_qp *qp = to_iqp(ibqp);
+	struct ipath_mcast *mcast = NULL;
+	struct ipath_mcast_qp *p, *tmp;
+	struct rb_node *n;
+	unsigned long flags;
+	int last = 0;
+
+	spin_lock_irqsave(&mcast_lock, flags);
+
+	/* Find the GID in the mcast table. */
+	n = mcast_tree.rb_node;
+	while (1) {
+		int ret;
+
+		if (n == NULL) {
+			spin_unlock_irqrestore(&mcast_lock, flags);
+			return 0;
+		}
+
+		mcast = rb_entry(n, struct ipath_mcast, rb_node);
+		ret = memcmp(gid->raw, mcast->mgid.raw, sizeof(union ib_gid));
+		if (ret < 0)
+			n = n->rb_left;
+		else if (ret > 0)
+			n = n->rb_right;
+		else
+			break;
+	}
+
+	/* Search the QP list.
*/ + list_for_each_entry_safe(p, tmp, &mcast->qp_list, list) { + if (p->qp != qp) + continue; + /* + * We found it, so remove it, but don't poison the forward link + * until we are sure there are no list walkers. + */ + list_del_rcu(&p->list); + + /* If this was the last attached QP, remove the GID too. */ + if (list_empty(&mcast->qp_list)) { + rb_erase(&mcast->rb_node, &mcast_tree); + last = 1; + } + break; + } + + spin_unlock_irqrestore(&mcast_lock, flags); + + if (p) { + /* + * Wait for any list walkers to finish before freeing the + * list element. + */ + wait_event(mcast->wait, atomic_read(&mcast->refcount) <= 1); + ipath_mcast_qp_free(p); + } + if (last) { + atomic_dec(&mcast->refcount); + wait_event(mcast->wait, !atomic_read(&mcast->refcount)); + ipath_mcast_free(mcast); + } + + return 0; +} + +/* + * Copy data to SGE memory. + */ +static void copy_sge(struct ipath_sge_state *ss, void *data, u32 length) +{ + struct ipath_sge *sge = &ss->sge; + + while (length) { + u32 len = sge->length; + + BUG_ON(len == 0); + if (len > length) + len = length; + memcpy(sge->vaddr, data, len); + sge->vaddr += len; + sge->length -= len; + sge->sge_length -= len; + if (sge->sge_length == 0) { + if (--ss->num_sge) + *sge = *ss->sg_list++; + } else if (sge->length == 0 && sge->mr != NULL) { + if (++sge->n >= IPATH_SEGSZ) { + if (++sge->m >= sge->mr->mapsz) + break; + sge->n = 0; + } + sge->vaddr = sge->mr->map[sge->m]->segs[sge->n].vaddr; + sge->length = sge->mr->map[sge->m]->segs[sge->n].length; + } + data += len; + length -= len; + } +} + +/* + * Skip over length bytes of SGE memory. + */ +static void skip_sge(struct ipath_sge_state *ss, u32 length) +{ + struct ipath_sge *sge = &ss->sge; + + while (length > sge->sge_length) { + length -= sge->sge_length; + ss->sge = *ss->sg_list++; + } + while (length) { + u32 len = sge->length; + + BUG_ON(len == 0); + if (len > length) + len = length; + sge->vaddr += len; + sge->length -= len; + sge->sge_length -= len; + if (sge->sge_length == 0) { + if (--ss->num_sge) + *sge = *ss->sg_list++; + } else if (sge->length == 0 && sge->mr != NULL) { + if (++sge->n >= IPATH_SEGSZ) { + if (++sge->m >= sge->mr->mapsz) + break; + sge->n = 0; + } + sge->vaddr = sge->mr->map[sge->m]->segs[sge->n].vaddr; + sge->length = sge->mr->map[sge->m]->segs[sge->n].length; + } + length -= len; + } +} + +static inline u32 alloc_qpn(struct ipath_qp_table *qpt) +{ + u32 i, offset, max_scan, qpn; + struct qpn_map *map; + + qpn = qpt->last + 1; + if (qpn >= QPN_MAX) + qpn = 2; + offset = qpn & BITS_PER_PAGE_MASK; + map = &qpt->map[qpn / BITS_PER_PAGE]; + max_scan = qpt->nmaps - !offset; + for (i = 0;;) { + if (unlikely(!map->page)) { + unsigned long page = get_zeroed_page(GFP_KERNEL); + unsigned long flags; + + /* + * Free the page if someone raced with us + * installing it: + */ + spin_lock_irqsave(&qpt->lock, flags); + if (map->page) + free_page(page); + else + map->page = (void *)page; + spin_unlock_irqrestore(&qpt->lock, flags); + if (unlikely(!map->page)) + break; + } + if (likely(atomic_read(&map->n_free))) { + do { + if (!test_and_set_bit(offset, map->page)) { + atomic_dec(&map->n_free); + qpt->last = qpn; + return qpn; + } + offset = find_next_offset(map, offset); + qpn = mk_qpn(qpt, map, offset); + /* + * This test differs from alloc_pidmap(). + * If find_next_offset() does find a zero bit, + * we don't need to check for QPN wrapping + * around past our starting QPN. We + * just need to be sure we don't loop forever. 
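
A simplified standalone version of the scatter/gather walk that copy_sge() and skip_sge() above perform, with the memory-region segment handling stripped out; illustrative only, not part of the patch.

#include <stdio.h>
#include <string.h>

struct sge { char *vaddr; unsigned int length; };

/* Scatter a flat buffer into a list of SGEs, advancing one SGE at a
 * time, the same walk copy_sge() performs (minus MR segments). */
static void copy_out(struct sge *sg, const char *data, unsigned int len)
{
	while (len) {
		unsigned int n = sg->length < len ? sg->length : len;

		memcpy(sg->vaddr, data, n);
		sg->vaddr += n;
		sg->length -= n;
		if (sg->length == 0)
			sg++;
		data += n;
		len -= n;
	}
}

int main(void)
{
	char a[3], b[5];
	struct sge sg[2] = { { a, sizeof(a) }, { b, sizeof(b) } };

	copy_out(sg, "12345678", 8);
	printf("%.3s %.5s\n", a, b);	/* 123 45678 */
	return 0;
}
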
+ */ + } while (offset < BITS_PER_PAGE && qpn < QPN_MAX); + } + /* + * In order to keep the number of pages allocated to a minimum, + * we scan the all existing pages before increasing the size + * of the bitmap table. + */ + if (++i > max_scan) { + if (qpt->nmaps == QPNMAP_ENTRIES) + break; + map = &qpt->map[qpt->nmaps++]; + offset = 0; + } else if (map < &qpt->map[qpt->nmaps]) { + ++map; + offset = 0; + } else { + map = &qpt->map[0]; + offset = 2; + } + qpn = mk_qpn(qpt, map, offset); + } + return 0; +} + +static inline void free_qpn(struct ipath_qp_table *qpt, u32 qpn) +{ + struct qpn_map *map; + + map = qpt->map + qpn / BITS_PER_PAGE; + if (map->page) + clear_bit(qpn & BITS_PER_PAGE_MASK, map->page); + atomic_inc(&map->n_free); +} + +/* + * Allocate the next available QPN and put the QP into the hash table. + * The hash table holds a reference to the QP. + */ +static int ipath_alloc_qpn(struct ipath_qp_table *qpt, struct ipath_qp *qp, + enum ib_qp_type type) +{ + unsigned long flags; + u32 qpn; + + if (type == IB_QPT_SMI) + qpn = 0; + else if (type == IB_QPT_GSI) + qpn = 1; + else { + /* Allocate the next available QPN */ + qpn = alloc_qpn(qpt); + if (qpn == 0) { + return -ENOMEM; + } + } + qp->ibqp.qp_num = qpn; + + /* Add the QP to the hash table. */ + spin_lock_irqsave(&qpt->lock, flags); + + qpn %= qpt->max; + qp->next = qpt->table[qpn]; + qpt->table[qpn] = qp; + atomic_inc(&qp->refcount); + + spin_unlock_irqrestore(&qpt->lock, flags); + return 0; +} + +/* + * Remove the QP from the table so it can't be found asynchronously by + * the receive interrupt routine. + */ +static void ipath_free_qp(struct ipath_qp_table *qpt, struct ipath_qp *qp) +{ + struct ipath_qp *q, **qpp; + unsigned long flags; + int fnd = 0; + + spin_lock_irqsave(&qpt->lock, flags); + + /* Remove QP from the hash table. */ + qpp = &qpt->table[qp->ibqp.qp_num % qpt->max]; + for (; (q = *qpp) != NULL; qpp = &q->next) { + if (q == qp) { + *qpp = qp->next; + qp->next = NULL; + atomic_dec(&qp->refcount); + fnd = 1; + break; + } + } + + spin_unlock_irqrestore(&qpt->lock, flags); + + if (!fnd) + return; + + /* If QPN is not reserved, mark QPN free in the bitmap. */ + if (qp->ibqp.qp_num > 1) + free_qpn(qpt, qp->ibqp.qp_num); + + wait_event(qp->wait, !atomic_read(&qp->refcount)); +} + +/* + * Remove all QPs from the table. + */ +static void ipath_free_all_qps(struct ipath_qp_table *qpt) +{ + unsigned long flags; + struct ipath_qp *qp, *nqp; + u32 n; + + for (n = 0; n < qpt->max; n++) { + spin_lock_irqsave(&qpt->lock, flags); + qp = qpt->table[n]; + qpt->table[n] = NULL; + spin_unlock_irqrestore(&qpt->lock, flags); + + while (qp) { + nqp = qp->next; + if (qp->ibqp.qp_num > 1) + free_qpn(qpt, qp->ibqp.qp_num); + if (!atomic_dec_and_test(&qp->refcount) || + !ipath_destroy_qp(&qp->ibqp)) + _VERBS_INFO("QP memory leak!\n"); + qp = nqp; + } + } + + for (n = 0; n < ARRAY_SIZE(qpt->map); n++) { + if (qpt->map[n].page) + free_page((unsigned long)qpt->map[n].page); + } +} + +/* + * Return the QP with the given QPN. + * The caller is responsible for decrementing the QP reference count when done. 
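
The QPN allocator above is essentially a bitmap with QPN 0 (SMI) and QPN 1 (GSI) permanently reserved. A toy userspace equivalent, first-fit with no paging or locking; purely illustrative.

#include <stdio.h>

#define QPN_MAX 64	/* tiny for illustration; the real space is 24 bits */

static unsigned char bits[QPN_MAX / 8];

/* First-fit allocation; QPN 0 (SMI) and 1 (GSI) are never handed out,
 * mirroring ipath_alloc_qpn() above.  Returns 0 when the space is full. */
static unsigned int alloc_qpn(void)
{
	unsigned int q;

	for (q = 2; q < QPN_MAX; q++)
		if (!(bits[q / 8] & (1 << (q % 8)))) {
			bits[q / 8] |= 1 << (q % 8);
			return q;
		}
	return 0;
}

static void free_qpn(unsigned int q)
{
	bits[q / 8] &= ~(1 << (q % 8));
}

int main(void)
{
	unsigned int a = alloc_qpn(), b = alloc_qpn();

	free_qpn(a);
	printf("%u %u %u\n", a, b, alloc_qpn());	/* 2 3 2 */
	return 0;
}
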
+ */ +static struct ipath_qp *ipath_lookup_qpn(struct ipath_qp_table *qpt, u32 qpn) +{ + unsigned long flags; + struct ipath_qp *qp; + + spin_lock_irqsave(&qpt->lock, flags); + + for (qp = qpt->table[qpn % qpt->max]; qp; qp = qp->next) { + if (qp->ibqp.qp_num == qpn) { + atomic_inc(&qp->refcount); + break; + } + } + + spin_unlock_irqrestore(&qpt->lock, flags); + return qp; +} + +static int ipath_alloc_lkey(struct ipath_lkey_table *rkt, + struct ipath_mregion *mr) +{ + unsigned long flags; + u32 r; + u32 n; + + spin_lock_irqsave(&rkt->lock, flags); + + /* Find the next available LKEY */ + r = n = rkt->next; + for (;;) { + if (rkt->table[r] == NULL) + break; + r = (r + 1) & (rkt->max - 1); + if (r == n) { + spin_unlock_irqrestore(&rkt->lock, flags); + _VERBS_INFO("LKEY table full\n"); + return 0; + } + } + rkt->next = (r + 1) & (rkt->max - 1); + /* + * Make sure lkey is never zero which is reserved to indicate an + * unrestricted LKEY. + */ + rkt->gen++; + mr->lkey = (r << (32 - ib_ipath_lkey_table_size)) | + ((((1 << (24 - ib_ipath_lkey_table_size)) - 1) & rkt->gen) << 8); + if (mr->lkey == 0) { + mr->lkey |= 1 << 8; + rkt->gen++; + } + rkt->table[r] = mr; + spin_unlock_irqrestore(&rkt->lock, flags); + + return 1; +} + +static void ipath_free_lkey(struct ipath_lkey_table *rkt, u32 lkey) +{ + unsigned long flags; + u32 r; + + if (lkey == 0) + return; + r = lkey >> (32 - ib_ipath_lkey_table_size); + spin_lock_irqsave(&rkt->lock, flags); + rkt->table[r] = NULL; + spin_unlock_irqrestore(&rkt->lock, flags); +} + +/* + * Check the IB SGE for validity and initialize our internal version of it. + * Return 1 if OK, else zero. + */ +static int ipath_lkey_ok(struct ipath_lkey_table *rkt, struct ipath_sge *isge, + struct ib_sge *sge, int acc) +{ + struct ipath_mregion *mr; + size_t off; + + /* + * We use LKEY == zero to mean a physical kmalloc() address. + * This is a bit of a hack since we rely on dma_map_single() + * being reversible by calling bus_to_virt(). + */ + if (sge->lkey == 0) { + isge->mr = NULL; + isge->vaddr = bus_to_virt(sge->addr); + isge->length = sge->length; + isge->sge_length = sge->length; + return 1; + } + spin_lock(&rkt->lock); + mr = rkt->table[(sge->lkey >> (32 - ib_ipath_lkey_table_size))]; + spin_unlock(&rkt->lock); + if (unlikely(mr == NULL || mr->lkey != sge->lkey)) + return 0; + + off = sge->addr - mr->user_base; + if (unlikely(sge->addr < mr->user_base || + off + sge->length > mr->length || + (mr->access_flags & acc) != acc)) + return 0; + + off += mr->offset; + isge->mr = mr; + isge->m = 0; + isge->n = 0; + while (off >= mr->map[isge->m]->segs[isge->n].length) { + off -= mr->map[isge->m]->segs[isge->n].length; + if (++isge->n >= IPATH_SEGSZ) { + isge->m++; + isge->n = 0; + } + } + isge->vaddr = mr->map[isge->m]->segs[isge->n].vaddr + off; + isge->length = mr->map[isge->m]->segs[isge->n].length - off; + isge->sge_length = sge->length; + return 1; +} + +/* + * Initialize the qp->s_sge after a restart. + * The QP s_lock should be held. 
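
A standalone sketch of the LKEY packing used by ipath_alloc_lkey() above: the top ib_ipath_lkey_table_size bits index the table, the middle bits carry the generation tag, and the low 8 bits are left to the consumer. Userspace C with illustrative values; not part of the patch.

#include <stdio.h>
#include <stdint.h>

#define LKEY_TABLE_BITS 12	/* ib_ipath_lkey_table_size above */

static uint32_t make_lkey(uint32_t index, uint32_t gen)
{
	return (index << (32 - LKEY_TABLE_BITS)) |
	       ((((1u << (24 - LKEY_TABLE_BITS)) - 1) & gen) << 8);
}

int main(void)
{
	uint32_t lkey = make_lkey(5, 9);

	printf("index %u\n", lkey >> (32 - LKEY_TABLE_BITS));		/* 5 */
	printf("gen   %u\n",
	       (lkey >> 8) & ((1u << (24 - LKEY_TABLE_BITS)) - 1));	/* 9 */
	return 0;
}
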
+ */ +static void ipath_init_restart(struct ipath_qp *qp, struct ipath_swqe *wqe) +{ + struct ipath_ibdev *dev; + u32 len; + + len = ((qp->s_psn - wqe->psn) & 0xFFFFFF) * + ib_mtu_enum_to_int(qp->path_mtu); + qp->s_sge.sge = wqe->sg_list[0]; + qp->s_sge.sg_list = wqe->sg_list + 1; + qp->s_sge.num_sge = wqe->wr.num_sge; + skip_sge(&qp->s_sge, len); + qp->s_len = wqe->length - len; + dev = to_idev(qp->ibqp.device); + spin_lock(&dev->pending_lock); + if (qp->timerwait.next == LIST_POISON1) + list_add_tail(&qp->timerwait, + &dev->pending[dev->pending_index]); + spin_unlock(&dev->pending_lock); +} + +/* + * Check the IB virtual address, length, and RKEY. + * Return 1 if OK, else zero. + * The QP r_rq.lock should be held. + */ +static int ipath_rkey_ok(struct ipath_ibdev *dev, struct ipath_sge_state *ss, + u32 len, u64 vaddr, u32 rkey, int acc) +{ + struct ipath_lkey_table *rkt = &dev->lk_table; + struct ipath_sge *sge = &ss->sge; + struct ipath_mregion *mr; + size_t off; + + spin_lock(&rkt->lock); + mr = rkt->table[(rkey >> (32 - ib_ipath_lkey_table_size))]; + spin_unlock(&rkt->lock); + if (unlikely(mr == NULL || mr->lkey != rkey)) + return 0; + + off = vaddr - mr->iova; + if (unlikely(vaddr < mr->iova || off + len > mr->length || + (mr->access_flags & acc) == 0)) + return 0; + + off += mr->offset; + sge->mr = mr; + sge->m = 0; + sge->n = 0; + while (off >= mr->map[sge->m]->segs[sge->n].length) { + off -= mr->map[sge->m]->segs[sge->n].length; + if (++sge->n >= IPATH_SEGSZ) { + sge->m++; + sge->n = 0; + } + } + sge->vaddr = mr->map[sge->m]->segs[sge->n].vaddr + off; + sge->length = mr->map[sge->m]->segs[sge->n].length - off; + sge->sge_length = len; + ss->sg_list = NULL; + ss->num_sge = 1; + return 1; +} + +/* + * Add a new entry to the completion queue. + * This may be called with one of the qp->s_lock or qp->r_rq.lock held. + */ +static void ipath_cq_enter(struct ipath_cq *cq, struct ib_wc *entry, int sig) +{ + unsigned long flags; + u32 next; + + spin_lock_irqsave(&cq->lock, flags); + + cq->queue[cq->head] = *entry; + next = cq->head + 1; + if (next == cq->ibcq.cqe) + next = 0; + if (next != cq->tail) + cq->head = next; + else { + /* XXX - need to mark current wr as having an error... */ + } + + if (cq->notify == IB_CQ_NEXT_COMP || + (cq->notify == IB_CQ_SOLICITED && sig)) { + cq->notify = IB_CQ_NONE; + cq->triggered++; + /* + * This will cause send_complete() to be called in + * another thread. + */ + tasklet_schedule(&cq->comptask); + } + + spin_unlock_irqrestore(&cq->lock, flags); + + if (entry->status != IB_WC_SUCCESS) + to_idev(cq->ibcq.device)->n_wqe_errs++; +} + +static void send_complete(unsigned long data) +{ + struct ipath_cq *cq = (struct ipath_cq *)data; + + /* + * The completion handler will most likely rearm the notification + * and poll for all pending entries. If a new completion entry + * is added while we are in this routine, tasklet_schedule() + * won't call us again until we return so we check triggered to + * see if we need to call the handler again. + */ + for (;;) { + u8 triggered = cq->triggered; + + cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context); + + if (cq->triggered == triggered) + return; + } +} + +/* + * This is the QP state transition table. + * See ipath_modify_qp() for details. 
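
ipath_cq_enter() above uses plain head/tail ring arithmetic, so a CQ created with cqe slots holds at most cqe - 1 completions before the overflow branch is taken. A minimal standalone sketch, not part of the patch:

#include <stdio.h>

#define CQE 4

static unsigned int head, tail;	/* tail stays 0: nothing is polled here */

static int cq_enter(void)
{
	unsigned int next = head + 1 == CQE ? 0 : head + 1;

	if (next == tail)
		return -1;	/* overflow; the driver flags an error here */
	head = next;
	return 0;
}

int main(void)
{
	int i, ok = 0;

	for (i = 0; i < CQE; i++)
		ok += cq_enter() == 0;
	printf("%d of %d entries fit\n", ok, CQE);	/* 3 of 4 */
	return 0;
}
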
+ */ +static const struct { + int trans; + u32 req_param[IB_QPT_RAW_IPV6]; + u32 opt_param[IB_QPT_RAW_IPV6]; +} qp_state_table[IB_QPS_ERR + 1][IB_QPS_ERR + 1] = { + [IB_QPS_RESET] = { + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST }, + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR }, + [IB_QPS_INIT] = { + .trans = IPATH_TRANS_RST2INIT, + .req_param = { + [IB_QPT_SMI] = (IB_QP_PKEY_INDEX | + IB_QP_QKEY), + [IB_QPT_GSI] = (IB_QP_PKEY_INDEX | + IB_QP_QKEY), + [IB_QPT_UD] = (IB_QP_PKEY_INDEX | + IB_QP_PORT | + IB_QP_QKEY), + [IB_QPT_UC] = (IB_QP_PKEY_INDEX | + IB_QP_PORT | + IB_QP_ACCESS_FLAGS), + [IB_QPT_RC] = (IB_QP_PKEY_INDEX | + IB_QP_PORT | + IB_QP_ACCESS_FLAGS), + }, + }, + }, + [IB_QPS_INIT] = { + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST }, + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR }, + [IB_QPS_INIT] = { + .trans = IPATH_TRANS_INIT2INIT, + .opt_param = { + [IB_QPT_SMI] = (IB_QP_PKEY_INDEX | + IB_QP_QKEY), + [IB_QPT_GSI] = (IB_QP_PKEY_INDEX | + IB_QP_QKEY), + [IB_QPT_UD] = (IB_QP_PKEY_INDEX | + IB_QP_PORT | + IB_QP_QKEY), + [IB_QPT_UC] = (IB_QP_PKEY_INDEX | + IB_QP_PORT | + IB_QP_ACCESS_FLAGS), + [IB_QPT_RC] = (IB_QP_PKEY_INDEX | + IB_QP_PORT | + IB_QP_ACCESS_FLAGS), + } + }, + [IB_QPS_RTR] = { + .trans = IPATH_TRANS_INIT2RTR, + .req_param = { + [IB_QPT_UC] = (IB_QP_AV | + IB_QP_PATH_MTU | + IB_QP_DEST_QPN | + IB_QP_RQ_PSN), + [IB_QPT_RC] = (IB_QP_AV | + IB_QP_PATH_MTU | + IB_QP_DEST_QPN | + IB_QP_RQ_PSN | + IB_QP_MAX_DEST_RD_ATOMIC | + IB_QP_MIN_RNR_TIMER), + }, + .opt_param = { + [IB_QPT_SMI] = (IB_QP_PKEY_INDEX | + IB_QP_QKEY), + [IB_QPT_GSI] = (IB_QP_PKEY_INDEX | + IB_QP_QKEY), + [IB_QPT_UD] = (IB_QP_PKEY_INDEX | + IB_QP_QKEY), + [IB_QPT_UC] = (IB_QP_ALT_PATH | + IB_QP_ACCESS_FLAGS | + IB_QP_PKEY_INDEX), + [IB_QPT_RC] = (IB_QP_ALT_PATH | + IB_QP_ACCESS_FLAGS | + IB_QP_PKEY_INDEX), + } + } + }, + [IB_QPS_RTR] = { + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST }, + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR }, + [IB_QPS_RTS] = { + .trans = IPATH_TRANS_RTR2RTS, + .req_param = { + [IB_QPT_SMI] = IB_QP_SQ_PSN, + [IB_QPT_GSI] = IB_QP_SQ_PSN, + [IB_QPT_UD] = IB_QP_SQ_PSN, + [IB_QPT_UC] = IB_QP_SQ_PSN, + [IB_QPT_RC] = (IB_QP_TIMEOUT | + IB_QP_RETRY_CNT | + IB_QP_RNR_RETRY | + IB_QP_SQ_PSN | + IB_QP_MAX_QP_RD_ATOMIC), + }, + .opt_param = { + [IB_QPT_SMI] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_GSI] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_UD] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_UC] = (IB_QP_CUR_STATE | + IB_QP_ALT_PATH | + IB_QP_ACCESS_FLAGS | + IB_QP_PKEY_INDEX | + IB_QP_PATH_MIG_STATE), + [IB_QPT_RC] = (IB_QP_CUR_STATE | + IB_QP_ALT_PATH | + IB_QP_ACCESS_FLAGS | + IB_QP_PKEY_INDEX | + IB_QP_MIN_RNR_TIMER | + IB_QP_PATH_MIG_STATE), + } + } + }, + [IB_QPS_RTS] = { + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST }, + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR }, + [IB_QPS_RTS] = { + .trans = IPATH_TRANS_RTS2RTS, + .opt_param = { + [IB_QPT_SMI] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_GSI] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_UD] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_UC] = (IB_QP_ACCESS_FLAGS | + IB_QP_ALT_PATH | + IB_QP_PATH_MIG_STATE), + [IB_QPT_RC] = (IB_QP_ACCESS_FLAGS | + IB_QP_ALT_PATH | + IB_QP_PATH_MIG_STATE | + IB_QP_MIN_RNR_TIMER), + } + }, + [IB_QPS_SQD] = { + .trans = IPATH_TRANS_RTS2SQD, + }, + }, + [IB_QPS_SQD] = { + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST }, + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR }, + [IB_QPS_RTS] = { + .trans = IPATH_TRANS_SQD2RTS, + .opt_param = { + [IB_QPT_SMI] = (IB_QP_CUR_STATE | 
IB_QP_QKEY), + [IB_QPT_GSI] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_UD] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_UC] = (IB_QP_CUR_STATE | + IB_QP_ALT_PATH | + IB_QP_ACCESS_FLAGS | + IB_QP_PATH_MIG_STATE), + [IB_QPT_RC] = (IB_QP_CUR_STATE | + IB_QP_ALT_PATH | + IB_QP_ACCESS_FLAGS | + IB_QP_MIN_RNR_TIMER | + IB_QP_PATH_MIG_STATE), + } + }, + [IB_QPS_SQD] = { + .trans = IPATH_TRANS_SQD2SQD, + .opt_param = { + [IB_QPT_SMI] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_GSI] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_UD] = (IB_QP_PKEY_INDEX | IB_QP_QKEY), + [IB_QPT_UC] = (IB_QP_AV | + IB_QP_TIMEOUT | + IB_QP_CUR_STATE | + IB_QP_ALT_PATH | + IB_QP_ACCESS_FLAGS | + IB_QP_PKEY_INDEX | + IB_QP_PATH_MIG_STATE), + [IB_QPT_RC] = (IB_QP_AV | + IB_QP_TIMEOUT | + IB_QP_RETRY_CNT | + IB_QP_RNR_RETRY | + IB_QP_MAX_QP_RD_ATOMIC | + IB_QP_MAX_DEST_RD_ATOMIC | + IB_QP_CUR_STATE | + IB_QP_ALT_PATH | + IB_QP_ACCESS_FLAGS | + IB_QP_PKEY_INDEX | + IB_QP_MIN_RNR_TIMER | + IB_QP_PATH_MIG_STATE), + } + } + }, + [IB_QPS_SQE] = { + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST }, + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR }, + [IB_QPS_RTS] = { + .trans = IPATH_TRANS_SQERR2RTS, + .opt_param = { + [IB_QPT_SMI] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_GSI] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_UD] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_UC] = IB_QP_CUR_STATE, + [IB_QPT_RC] = (IB_QP_CUR_STATE | + IB_QP_MIN_RNR_TIMER), + } + } + }, + [IB_QPS_ERR] = { + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST }, + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR } + } +}; + +/* + * Initialize the QP state to the reset state. + */ +static void ipath_reset_qp(struct ipath_qp *qp) +{ + qp->remote_qpn = 0; + qp->qkey = 0; + qp->qp_access_flags = 0; + qp->s_hdrwords = 0; + qp->s_psn = 0; + qp->r_psn = 0; + atomic_set(&qp->msn, 0); + if (qp->ibqp.qp_type == IB_QPT_RC) { + qp->s_state = IB_OPCODE_RC_SEND_LAST; + qp->r_state = IB_OPCODE_RC_SEND_LAST; + } else { + qp->s_state = IB_OPCODE_UC_SEND_LAST; + qp->r_state = IB_OPCODE_UC_SEND_LAST; + } + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE; + qp->s_nak_state = 0; + qp->s_rnr_timeout = 0; + qp->s_head = 0; + qp->s_tail = 0; + qp->s_cur = 0; + qp->s_last = 0; + qp->s_ssn = 1; + qp->s_lsn = 0; + qp->r_rq.head = 0; + qp->r_rq.tail = 0; + qp->r_reuse_sge = 0; +} + +/* + * Flush send work queue. + * The QP s_lock should be held. + */ +static void ipath_sqerror_qp(struct ipath_qp *qp, struct ib_wc *wc) +{ + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); + struct ipath_swqe *wqe = get_swqe_ptr(qp, qp->s_last); + + _VERBS_INFO("Send queue error on QP%d/%d: err: %d\n", + qp->ibqp.qp_num, qp->remote_qpn, wc->status); + + spin_lock(&dev->pending_lock); + /* XXX What if its already removed by the timeout code? */ + if (qp->timerwait.next != LIST_POISON1) + list_del(&qp->timerwait); + if (qp->piowait.next != LIST_POISON1) + list_del(&qp->piowait); + spin_unlock(&dev->pending_lock); + + ipath_cq_enter(to_icq(qp->ibqp.send_cq), wc, 1); + if (++qp->s_last >= qp->s_size) + qp->s_last = 0; + + wc->status = IB_WC_WR_FLUSH_ERR; + + while (qp->s_last != qp->s_head) { + wc->wr_id = wqe->wr.wr_id; + wc->opcode = wc_opcode[wqe->wr.opcode]; + ipath_cq_enter(to_icq(qp->ibqp.send_cq), wc, 1); + if (++qp->s_last >= qp->s_size) + qp->s_last = 0; + wqe = get_swqe_ptr(qp, qp->s_last); + } + qp->s_cur = qp->s_tail = qp->s_head; + qp->state = IB_QPS_SQE; +} + +/* + * Flush both send and receive work queues. + * QP r_rq.lock and s_lock should be held. 
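
The req_param/opt_param masks in the transition table above feed the validation in ipath_modify_qp() below: every required bit must be present, and nothing outside required, optional, or IB_QP_STATE may be set. A standalone sketch with made-up mask values; not part of the patch.

#include <stdio.h>

#define QP_STATE	0x01	/* made-up mask values, for shape only */
#define QP_PKEY		0x02
#define QP_PORT		0x04
#define QP_QKEY		0x08
#define QP_AV		0x10

static int check(unsigned int attr_mask, unsigned int req, unsigned int opt)
{
	if ((req & attr_mask) != req)
		return -1;	/* a required attribute is missing */
	if (attr_mask & ~(req | opt | QP_STATE))
		return -1;	/* a stray attribute for this transition */
	return 0;
}

int main(void)
{
	unsigned int req = QP_PKEY | QP_PORT | QP_QKEY, opt = 0;

	printf("%d\n", check(QP_STATE | req, req, opt));		/* 0 */
	printf("%d\n", check(QP_STATE | QP_PKEY, req, opt));		/* -1 */
	printf("%d\n", check(QP_STATE | req | QP_AV, req, opt));	/* -1 */
	return 0;
}
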
+ */ +static void ipath_error_qp(struct ipath_qp *qp) +{ + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); + struct ib_wc wc; + + _VERBS_INFO("QP%d/%d in error state\n", + qp->ibqp.qp_num, qp->remote_qpn); + + spin_lock(&dev->pending_lock); + /* XXX What if its already removed by the timeout code? */ + if (qp->timerwait.next != LIST_POISON1) + list_del(&qp->timerwait); + if (qp->piowait.next != LIST_POISON1) + list_del(&qp->piowait); + spin_unlock(&dev->pending_lock); + + wc.status = IB_WC_WR_FLUSH_ERR; + wc.vendor_err = 0; + wc.byte_len = 0; + wc.imm_data = 0; + wc.qp_num = qp->ibqp.qp_num; + wc.src_qp = 0; + wc.wc_flags = 0; + wc.pkey_index = 0; + wc.slid = 0; + wc.sl = 0; + wc.dlid_path_bits = 0; + wc.port_num = 0; + + while (qp->s_last != qp->s_head) { + struct ipath_swqe *wqe = get_swqe_ptr(qp, qp->s_last); + + wc.wr_id = wqe->wr.wr_id; + wc.opcode = wc_opcode[wqe->wr.opcode]; + if (++qp->s_last >= qp->s_size) + qp->s_last = 0; + ipath_cq_enter(to_icq(qp->ibqp.send_cq), &wc, 1); + } + qp->s_cur = qp->s_tail = qp->s_head; + qp->s_hdrwords = 0; + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE; + + wc.opcode = IB_WC_RECV; + while (qp->r_rq.tail != qp->r_rq.head) { + wc.wr_id = get_rwqe_ptr(&qp->r_rq, qp->r_rq.tail)->wr_id; + if (++qp->r_rq.tail >= qp->r_rq.size) + qp->r_rq.tail = 0; + ipath_cq_enter(to_icq(qp->ibqp.recv_cq), &wc, 1); + } +} + +static int ipath_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, + int attr_mask) +{ + struct ipath_qp *qp = to_iqp(ibqp); + enum ib_qp_state cur_state, new_state; + u32 req_param, opt_param; + unsigned long flags; + + if (attr_mask & IB_QP_CUR_STATE) { + cur_state = attr->cur_qp_state; + if (cur_state != IB_QPS_RTR && + cur_state != IB_QPS_RTS && + cur_state != IB_QPS_SQD && cur_state != IB_QPS_SQE) + return -EINVAL; + spin_lock_irqsave(&qp->r_rq.lock, flags); + spin_lock(&qp->s_lock); + } else { + spin_lock_irqsave(&qp->r_rq.lock, flags); + spin_lock(&qp->s_lock); + cur_state = qp->state; + } + + if (attr_mask & IB_QP_STATE) { + new_state = attr->qp_state; + if (new_state < 0 || new_state > IB_QPS_ERR) + goto inval; + } else + new_state = cur_state; + + switch (qp_state_table[cur_state][new_state].trans) { + case IPATH_TRANS_INVALID: + goto inval; + + case IPATH_TRANS_ANY2RST: + ipath_reset_qp(qp); + break; + + case IPATH_TRANS_ANY2ERR: + ipath_error_qp(qp); + break; + + } + + req_param = + qp_state_table[cur_state][new_state].req_param[qp->ibqp.qp_type]; + opt_param = + qp_state_table[cur_state][new_state].opt_param[qp->ibqp.qp_type]; + + if ((req_param & attr_mask) != req_param) + goto inval; + + if (attr_mask & ~(req_param | opt_param | IB_QP_STATE)) + goto inval; + + if (attr_mask & IB_QP_PKEY_INDEX) { + struct ipath_ibdev *dev = to_idev(ibqp->device); + + if (attr->pkey_index >= ipath_layer_get_npkeys(dev->ib_unit)) + goto inval; + qp->s_pkey_index = attr->pkey_index; + } + + if (attr_mask & IB_QP_DEST_QPN) + qp->remote_qpn = attr->dest_qp_num; + + if (attr_mask & IB_QP_SQ_PSN) { + qp->s_next_psn = attr->sq_psn; + qp->s_last_psn = qp->s_next_psn - 1; + } + + if (attr_mask & IB_QP_RQ_PSN) + qp->r_psn = attr->rq_psn; + + if (attr_mask & IB_QP_ACCESS_FLAGS) + qp->qp_access_flags = attr->qp_access_flags; + + if (attr_mask & IB_QP_AV) + qp->remote_ah_attr = attr->ah_attr; + + if (attr_mask & IB_QP_PATH_MTU) + qp->path_mtu = attr->path_mtu; + + if (attr_mask & IB_QP_RETRY_CNT) + qp->s_retry = qp->s_retry_cnt = attr->retry_cnt; + + if (attr_mask & IB_QP_RNR_RETRY) { + qp->s_rnr_retry = attr->rnr_retry; + if (qp->s_rnr_retry > 7) + 
qp->s_rnr_retry = 7; + qp->s_rnr_retry_cnt = qp->s_rnr_retry; + } + + if (attr_mask & IB_QP_MIN_RNR_TIMER) + qp->s_min_rnr_timer = attr->min_rnr_timer & 0x1F; + + if (attr_mask & IB_QP_QKEY) + qp->qkey = attr->qkey; + + if (attr_mask & IB_QP_PKEY_INDEX) + qp->s_pkey_index = attr->pkey_index; + + qp->state = new_state; + spin_unlock(&qp->s_lock); + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + + /* + * Try to move to ARMED if QP1 changed to the RTS state. + */ + if (qp->ibqp.qp_num == 1 && new_state == IB_QPS_RTS) { + struct ipath_ibdev *dev = to_idev(ibqp->device); + + /* + * Bounce the link even if it was active so the SM will + * reinitialize the SMA's state. + */ + ipath_kset_linkstate((dev->ib_unit << 16) | IPATH_IB_LINKDOWN); + ipath_kset_linkstate((dev->ib_unit << 16) | IPATH_IB_LINKARM); + } + return 0; + +inval: + spin_unlock(&qp->s_lock); + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + return -EINVAL; +} + +/* + * Compute the AETH (syndrome + MSN). + * The QP s_lock should be held. + */ +static u32 ipath_compute_aeth(struct ipath_qp *qp) +{ + u32 aeth = atomic_read(&qp->msn) & 0xFFFFFF; + + if (qp->s_nak_state) { + aeth |= qp->s_nak_state << 24; + } else if (qp->ibqp.srq) { + /* Shared receive queues don't generate credits. */ + aeth |= 0x1F << 24; + } else { + u32 min, max, x; + u32 credits; + + /* + * Compute the number of credits available (RWQEs). + * XXX Not holding the r_rq.lock here so there is a small + * chance that the pair of reads are not atomic. + */ + credits = qp->r_rq.head - qp->r_rq.tail; + if ((int)credits < 0) + credits += qp->r_rq.size; + /* Binary search the credit table to find the code to use. */ + min = 0; + max = 31; + for (;;) { + x = (min + max) / 2; + if (credit_table[x] == credits) + break; + if (credit_table[x] > credits) + max = x; + else if (min == x) + break; + else + min = x; + } + aeth |= x << 24; + } + return cpu_to_be32(aeth); +} + + +static void no_bufs_available(struct ipath_qp *qp, struct ipath_ibdev *dev) +{ + unsigned long flags; + + spin_lock_irqsave(&dev->pending_lock, flags); + if (qp->piowait.next == LIST_POISON1) + list_add_tail(&qp->piowait, &dev->piowait); + spin_unlock_irqrestore(&dev->pending_lock, flags); + /* + * Note that as soon as ipath_layer_want_buffer() is called and + * possibly before it returns, ipath_ib_piobufavail() + * could be called. If we are still in the tasklet function, + * tasklet_schedule() will not call us until the next time + * tasklet_schedule() is called. + * We clear the tasklet flag now since we are committing to return + * from the tasklet function. + */ + tasklet_unlock(&qp->s_task); + ipath_layer_want_buffer(dev->ib_unit); + dev->n_piowait++; +} + +/* + * Process entries in the send work queue until the queue is exhausted. + * Only allow one CPU to send a packet per QP (tasklet). + * Otherwise, after we drop the QP lock, two threads could send + * packets out of order. + * This is similar to do_rc_send() below except we don't have timeouts or + * resends. 
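
A standalone version of the binary search in ipath_compute_aeth() above, run over a prefix of the credit_table[] defined earlier in this file; it yields the largest credit code whose value does not exceed the available RWQEs. Illustrative only, not part of the patch.

#include <stdio.h>

/* First entries of the credit_table[] above; the full table has 31. */
static unsigned int credit_table[] = { 0, 1, 2, 3, 4, 6, 8, 12, 16, 24, 32 };

static unsigned int credits_to_code(unsigned int credits)
{
	unsigned int min = 0;
	unsigned int max = sizeof(credit_table) / sizeof(credit_table[0]) - 1;

	for (;;) {
		unsigned int x = (min + max) / 2;

		if (credit_table[x] == credits)
			return x;
		if (credit_table[x] > credits)
			max = x;
		else if (min == x)
			return x; /* largest code not exceeding credits */
		else
			min = x;
	}
}

int main(void)
{
	/* 5 free RWQEs encode as code 4, i.e. 4 credits advertised. */
	printf("%u %u %u\n", credits_to_code(4), credits_to_code(5),
	       credits_to_code(12));	/* 4 4 7 */
	return 0;
}
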
+ */ +static void do_uc_send(unsigned long data) +{ + struct ipath_qp *qp = (struct ipath_qp *)data; + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); + struct ipath_swqe *wqe; + unsigned long flags; + u16 lrh0; + u32 hwords; + u32 nwords; + u32 extra_bytes; + u32 bth0; + u32 bth2; + u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu); + u32 len; + struct ipath_other_headers *ohdr; + struct ib_wc wc; + + if (test_and_set_bit(IPATH_S_BUSY, &qp->s_flags)) + return; + + if (unlikely(qp->remote_ah_attr.dlid == + ipath_layer_get_lid(dev->ib_unit))) { + /* Pass in an uninitialized ib_wc to save stack space. */ + ipath_ruc_loopback(qp, &wc); + clear_bit(IPATH_S_BUSY, &qp->s_flags); + return; + } + + ohdr = &qp->s_hdr.u.oth; + if (qp->remote_ah_attr.ah_flags & IB_AH_GRH) + ohdr = &qp->s_hdr.u.l.oth; + +again: + /* Check for a constructed packet to be sent. */ + if (qp->s_hdrwords != 0) { + /* + * If no PIO bufs are available, return. + * An interrupt will call ipath_ib_piobufavail() + * when one is available. + */ + if (ipath_verbs_send(dev->ib_unit, qp->s_hdrwords, + (uint32_t *) &qp->s_hdr, + qp->s_cur_size, qp->s_cur_sge)) { + no_bufs_available(qp, dev); + return; + } + /* Record that we sent the packet and s_hdr is empty. */ + qp->s_hdrwords = 0; + } + + lrh0 = IPS_LRH_BTH; + /* header size in 32-bit words LRH+BTH = (8+12)/4. */ + hwords = 5; + + /* + * The lock is needed to synchronize between + * setting qp->s_ack_state and post_send(). + */ + spin_lock_irqsave(&qp->s_lock, flags); + + if (!(state_ops[qp->state] & IPATH_PROCESS_SEND_OK)) + goto done; + + bth0 = ipath_layer_get_pkey(dev->ib_unit, qp->s_pkey_index); + + /* Send a request. */ + wqe = get_swqe_ptr(qp, qp->s_last); + switch (qp->s_state) { + default: + /* Signal the completion of the last send (if there is one). */ + if (qp->s_last != qp->s_tail) { + if (++qp->s_last == qp->s_size) + qp->s_last = 0; + if (!test_bit(IPATH_S_SIGNAL_REQ_WR, &qp->s_flags) || + (wqe->wr.send_flags & IB_SEND_SIGNALED)) { + wc.wr_id = wqe->wr.wr_id; + wc.status = IB_WC_SUCCESS; + wc.opcode = wc_opcode[wqe->wr.opcode]; + wc.vendor_err = 0; + wc.byte_len = wqe->length; + wc.qp_num = qp->ibqp.qp_num; + wc.src_qp = qp->remote_qpn; + wc.pkey_index = 0; + wc.slid = qp->remote_ah_attr.dlid; + wc.sl = qp->remote_ah_attr.sl; + wc.dlid_path_bits = 0; + wc.port_num = 0; + ipath_cq_enter(to_icq(qp->ibqp.send_cq), &wc, + 0); + } + wqe = get_swqe_ptr(qp, qp->s_last); + } + /* Check if send work queue is empty. */ + if (qp->s_tail == qp->s_head) + goto done; + /* + * Start a new request. 
+ */ + qp->s_psn = wqe->psn = qp->s_next_psn; + qp->s_sge.sge = wqe->sg_list[0]; + qp->s_sge.sg_list = wqe->sg_list + 1; + qp->s_sge.num_sge = wqe->wr.num_sge; + qp->s_len = len = wqe->length; + switch (wqe->wr.opcode) { + case IB_WR_SEND: + case IB_WR_SEND_WITH_IMM: + if (len > pmtu) { + qp->s_state = IB_OPCODE_UC_SEND_FIRST; + len = pmtu; + break; + } + if (wqe->wr.opcode == IB_WR_SEND) { + qp->s_state = IB_OPCODE_UC_SEND_ONLY; + } else { + qp->s_state = + IB_OPCODE_UC_SEND_ONLY_WITH_IMMEDIATE; + /* Immediate data comes after the BTH */ + ohdr->u.imm_data = wqe->wr.imm_data; + hwords += 1; + } + if (wqe->wr.send_flags & IB_SEND_SOLICITED) + bth0 |= 1 << 23; + break; + + case IB_WR_RDMA_WRITE: + case IB_WR_RDMA_WRITE_WITH_IMM: + ohdr->u.rc.reth.vaddr = + cpu_to_be64(wqe->wr.wr.rdma.remote_addr); + ohdr->u.rc.reth.rkey = + cpu_to_be32(wqe->wr.wr.rdma.rkey); + ohdr->u.rc.reth.length = cpu_to_be32(len); + hwords += sizeof(struct ib_reth) / 4; + if (len > pmtu) { + qp->s_state = IB_OPCODE_UC_RDMA_WRITE_FIRST; + len = pmtu; + break; + } + if (wqe->wr.opcode == IB_WR_RDMA_WRITE) { + qp->s_state = IB_OPCODE_UC_RDMA_WRITE_ONLY; + } else { + qp->s_state = + IB_OPCODE_UC_RDMA_WRITE_ONLY_WITH_IMMEDIATE; + /* Immediate data comes after the RETH */ + ohdr->u.rc.imm_data = wqe->wr.imm_data; + hwords += 1; + if (wqe->wr.send_flags & IB_SEND_SOLICITED) + bth0 |= 1 << 23; + } + break; + + default: + goto done; + } + if (++qp->s_tail >= qp->s_size) + qp->s_tail = 0; + break; + + case IB_OPCODE_UC_SEND_FIRST: + qp->s_state = IB_OPCODE_UC_SEND_MIDDLE; + /* FALLTHROUGH */ + case IB_OPCODE_UC_SEND_MIDDLE: + len = qp->s_len; + if (len > pmtu) { + len = pmtu; + break; + } + if (wqe->wr.opcode == IB_WR_SEND) + qp->s_state = IB_OPCODE_UC_SEND_LAST; + else { + qp->s_state = IB_OPCODE_UC_SEND_LAST_WITH_IMMEDIATE; + /* Immediate data comes after the BTH */ + ohdr->u.imm_data = wqe->wr.imm_data; + hwords += 1; + } + if (wqe->wr.send_flags & IB_SEND_SOLICITED) + bth0 |= 1 << 23; + break; + + case IB_OPCODE_UC_RDMA_WRITE_FIRST: + qp->s_state = IB_OPCODE_UC_RDMA_WRITE_MIDDLE; + /* FALLTHROUGH */ + case IB_OPCODE_UC_RDMA_WRITE_MIDDLE: + len = qp->s_len; + if (len > pmtu) { + len = pmtu; + break; + } + if (wqe->wr.opcode == IB_WR_RDMA_WRITE) + qp->s_state = IB_OPCODE_UC_RDMA_WRITE_LAST; + else { + qp->s_state = + IB_OPCODE_UC_RDMA_WRITE_LAST_WITH_IMMEDIATE; + /* Immediate data comes after the BTH */ + ohdr->u.imm_data = wqe->wr.imm_data; + hwords += 1; + if (wqe->wr.send_flags & IB_SEND_SOLICITED) + bth0 |= 1 << 23; + } + break; + } + bth2 = qp->s_next_psn++ & 0xFFFFFF; + qp->s_len -= len; + bth0 |= qp->s_state << 24; + + spin_unlock_irqrestore(&qp->s_lock, flags); + + /* Construct the header. */ + extra_bytes = (4 - len) & 3; + nwords = (len + extra_bytes) >> 2; + if (unlikely(qp->remote_ah_attr.ah_flags & IB_AH_GRH)) { + /* Header size in 32-bit words. */ + hwords += 10; + lrh0 = IPS_LRH_GRH; + qp->s_hdr.u.l.grh.version_tclass_flow = + cpu_to_be32((6 << 28) | + (qp->remote_ah_attr.grh.traffic_class << 20) | + qp->remote_ah_attr.grh.flow_label); + qp->s_hdr.u.l.grh.paylen = + cpu_to_be16(((hwords - 12) + nwords + SIZE_OF_CRC) << 2); + qp->s_hdr.u.l.grh.next_hdr = 0x1B; + qp->s_hdr.u.l.grh.hop_limit = qp->remote_ah_attr.grh.hop_limit; + /* The SGID is 32-bit aligned. 
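
The padding arithmetic used in the header construction below, reduced to a quick standalone check: extra_bytes = (4 - len) & 3 rounds the payload up to whole 32-bit words, and the pad count is carried in BTH bits 20-21. Not part of the patch.

#include <stdio.h>

int main(void)
{
	unsigned int len;

	for (len = 5; len <= 8; len++) {
		unsigned int extra_bytes = (4 - len) & 3;
		unsigned int nwords = (len + extra_bytes) >> 2;

		printf("len %u -> %u words + %u pad\n",
		       len, nwords, extra_bytes);
	}
	return 0;
}
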
*/ + qp->s_hdr.u.l.grh.sgid.global.subnet_prefix = dev->gid_prefix; + qp->s_hdr.u.l.grh.sgid.global.interface_id = + ipath_layer_get_guid(dev->ib_unit); + qp->s_hdr.u.l.grh.dgid = qp->remote_ah_attr.grh.dgid; + } + qp->s_hdrwords = hwords; + qp->s_cur_sge = &qp->s_sge; + qp->s_cur_size = len; + lrh0 |= qp->remote_ah_attr.sl << 4; + qp->s_hdr.lrh[0] = cpu_to_be16(lrh0); + /* DEST LID */ + qp->s_hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid); + qp->s_hdr.lrh[2] = cpu_to_be16(hwords + nwords + SIZE_OF_CRC); + qp->s_hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->ib_unit)); + bth0 |= extra_bytes << 20; + ohdr->bth[0] = cpu_to_be32(bth0); + ohdr->bth[1] = cpu_to_be32(qp->remote_qpn); + ohdr->bth[2] = cpu_to_be32(bth2); + + /* Check for more work to do. */ + goto again; + +done: + spin_unlock_irqrestore(&qp->s_lock, flags); + clear_bit(IPATH_S_BUSY, &qp->s_flags); +} + +/* + * Process entries in the send work queue until credit or queue is exhausted. + * Only allow one CPU to send a packet per QP (tasklet). + * Otherwise, after we drop the QP s_lock, two threads could send + * packets out of order. + */ +static void do_rc_send(unsigned long data) +{ + struct ipath_qp *qp = (struct ipath_qp *)data; + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); + struct ipath_swqe *wqe; + struct ipath_sge_state *ss; + unsigned long flags; + u16 lrh0; + u32 hwords; + u32 nwords; + u32 extra_bytes; + u32 bth0; + u32 bth2; + u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu); + u32 len; + struct ipath_other_headers *ohdr; + char newreq; + + if (test_and_set_bit(IPATH_S_BUSY, &qp->s_flags)) + return; + + if (unlikely(qp->remote_ah_attr.dlid == + ipath_layer_get_lid(dev->ib_unit))) { + struct ib_wc wc; + + /* + * Pass in an uninitialized ib_wc to be consistent with + * other places where ipath_ruc_loopback() is called. + */ + ipath_ruc_loopback(qp, &wc); + clear_bit(IPATH_S_BUSY, &qp->s_flags); + return; + } + + ohdr = &qp->s_hdr.u.oth; + if (qp->remote_ah_attr.ah_flags & IB_AH_GRH) + ohdr = &qp->s_hdr.u.l.oth; + +again: + /* Check for a constructed packet to be sent. */ + if (qp->s_hdrwords != 0) { + /* + * If no PIO bufs are available, return. + * An interrupt will call ipath_ib_piobufavail() + * when one is available. + */ + if (ipath_verbs_send(dev->ib_unit, qp->s_hdrwords, + (uint32_t *) &qp->s_hdr, + qp->s_cur_size, qp->s_cur_sge)) { + no_bufs_available(qp, dev); + return; + } + /* Record that we sent the packet and s_hdr is empty. */ + qp->s_hdrwords = 0; + } + + lrh0 = IPS_LRH_BTH; + /* header size in 32-bit words LRH+BTH = (8+12)/4. */ + hwords = 5; + + /* + * The lock is needed to synchronize between + * setting qp->s_ack_state, resend timer, and post_send(). + */ + spin_lock_irqsave(&qp->s_lock, flags); + + bth0 = ipath_layer_get_pkey(dev->ib_unit, qp->s_pkey_index); + + /* Sending responses has higher priority over sending requests. */ + if (qp->s_ack_state != IB_OPCODE_RC_ACKNOWLEDGE) { + /* + * Send a response. + * Note that we are in the responder's side of the QP context. 
+ */ + switch (qp->s_ack_state) { + case IB_OPCODE_RC_RDMA_READ_REQUEST: + ss = &qp->s_rdma_sge; + len = qp->s_rdma_len; + if (len > pmtu) { + len = pmtu; + qp->s_ack_state = + IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST; + } else { + qp->s_ack_state = + IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY; + } + qp->s_rdma_len -= len; + bth0 |= qp->s_ack_state << 24; + ohdr->u.aeth = ipath_compute_aeth(qp); + hwords++; + break; + + case IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST: + qp->s_ack_state = + IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE; + /* FALLTHROUGH */ + case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE: + ss = &qp->s_rdma_sge; + len = qp->s_rdma_len; + if (len > pmtu) { + len = pmtu; + } else { + ohdr->u.aeth = ipath_compute_aeth(qp); + hwords++; + qp->s_ack_state = + IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST; + } + qp->s_rdma_len -= len; + bth0 |= qp->s_ack_state << 24; + break; + + case IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST: + case IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY: + /* + * We have to prevent new requests from changing + * the r_sge state while a ipath_verbs_send() + * is in progress. + * Changing r_state allows the receiver + * to continue processing new packets. + * We do it here now instead of above so + * that we are sure the packet was sent before + * changing the state. + */ + qp->r_state = IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST; + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE; + goto send_req; + + case IB_OPCODE_RC_COMPARE_SWAP: + case IB_OPCODE_RC_FETCH_ADD: + ss = NULL; + len = 0; + qp->r_state = IB_OPCODE_RC_SEND_LAST; + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE; + bth0 |= IB_OPCODE_ATOMIC_ACKNOWLEDGE << 24; + ohdr->u.at.aeth = ipath_compute_aeth(qp); + ohdr->u.at.atomic_ack_eth = + cpu_to_be64(qp->s_ack_atomic); + hwords += sizeof(ohdr->u.at) / 4; + break; + + default: + /* Send a regular ACK. */ + ss = NULL; + len = 0; + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE; + bth0 |= qp->s_ack_state << 24; + ohdr->u.aeth = ipath_compute_aeth(qp); + hwords++; + } + bth2 = qp->s_ack_psn++ & 0xFFFFFF; + } else { + send_req: + if (!(state_ops[qp->state] & IPATH_PROCESS_SEND_OK) || + qp->s_rnr_timeout) + goto done; + + /* Send a request. */ + wqe = get_swqe_ptr(qp, qp->s_cur); + switch (qp->s_state) { + default: + /* + * Resend an old request or start a new one. + * + * We keep track of the current SWQE so that + * we don't reset the "furthest progress" state + * if we need to back up. + */ + newreq = 0; + if (qp->s_cur == qp->s_tail) { + /* Check if send work queue is empty. */ + if (qp->s_tail == qp->s_head) + goto done; + qp->s_psn = wqe->psn = qp->s_next_psn; + newreq = 1; + } + /* + * Note that we have to be careful not to modify the + * original work request since we may need to resend + * it. + */ + qp->s_sge.sge = wqe->sg_list[0]; + qp->s_sge.sg_list = wqe->sg_list + 1; + qp->s_sge.num_sge = wqe->wr.num_sge; + qp->s_len = len = wqe->length; + ss = &qp->s_sge; + bth2 = 0; + switch (wqe->wr.opcode) { + case IB_WR_SEND: + case IB_WR_SEND_WITH_IMM: + /* If no credit, return. 
*/ + if (qp->s_lsn != (u32) -1 && + cmp24(wqe->ssn, qp->s_lsn + 1) > 0) { + goto done; + } + wqe->lpsn = wqe->psn; + if (len > pmtu) { + wqe->lpsn += (len - 1) / pmtu; + qp->s_state = IB_OPCODE_RC_SEND_FIRST; + len = pmtu; + break; + } + if (wqe->wr.opcode == IB_WR_SEND) { + qp->s_state = IB_OPCODE_RC_SEND_ONLY; + } else { + qp->s_state = + IB_OPCODE_RC_SEND_ONLY_WITH_IMMEDIATE; + /* Immediate data comes after the BTH */ + ohdr->u.imm_data = wqe->wr.imm_data; + hwords += 1; + } + if (wqe->wr.send_flags & IB_SEND_SOLICITED) + bth0 |= 1 << 23; + bth2 = 1 << 31; /* Request ACK. */ + if (++qp->s_cur == qp->s_size) + qp->s_cur = 0; + break; + + case IB_WR_RDMA_WRITE: + if (newreq) + qp->s_lsn++; + /* FALLTHROUGH */ + case IB_WR_RDMA_WRITE_WITH_IMM: + /* If no credit, return. */ + if (qp->s_lsn != (u32) -1 && + cmp24(wqe->ssn, qp->s_lsn + 1) > 0) { + goto done; + } + ohdr->u.rc.reth.vaddr = + cpu_to_be64(wqe->wr.wr.rdma.remote_addr); + ohdr->u.rc.reth.rkey = + cpu_to_be32(wqe->wr.wr.rdma.rkey); + ohdr->u.rc.reth.length = cpu_to_be32(len); + hwords += sizeof(struct ib_reth) / 4; + wqe->lpsn = wqe->psn; + if (len > pmtu) { + wqe->lpsn += (len - 1) / pmtu; + qp->s_state = + IB_OPCODE_RC_RDMA_WRITE_FIRST; + len = pmtu; + break; + } + if (wqe->wr.opcode == IB_WR_RDMA_WRITE) { + qp->s_state = + IB_OPCODE_RC_RDMA_WRITE_ONLY; + } else { + qp->s_state = + IB_OPCODE_RC_RDMA_WRITE_ONLY_WITH_IMMEDIATE; + /* Immediate data comes after RETH */ + ohdr->u.rc.imm_data = wqe->wr.imm_data; + hwords += 1; + if (wqe->wr. + send_flags & IB_SEND_SOLICITED) + bth0 |= 1 << 23; + } + bth2 = 1 << 31; /* Request ACK. */ + if (++qp->s_cur == qp->s_size) + qp->s_cur = 0; + break; + + case IB_WR_RDMA_READ: + ohdr->u.rc.reth.vaddr = + cpu_to_be64(wqe->wr.wr.rdma.remote_addr); + ohdr->u.rc.reth.rkey = + cpu_to_be32(wqe->wr.wr.rdma.rkey); + ohdr->u.rc.reth.length = cpu_to_be32(len); + qp->s_state = IB_OPCODE_RC_RDMA_READ_REQUEST; + hwords += sizeof(ohdr->u.rc.reth) / 4; + if (newreq) { + qp->s_lsn++; + /* + * Adjust s_next_psn to count the + * expected number of responses. + */ + if (len > pmtu) + qp->s_next_psn += + (len - 1) / pmtu; + wqe->lpsn = qp->s_next_psn++; + } + ss = NULL; + len = 0; + if (++qp->s_cur == qp->s_size) + qp->s_cur = 0; + break; + + case IB_WR_ATOMIC_CMP_AND_SWP: + case IB_WR_ATOMIC_FETCH_AND_ADD: + qp->s_state = + wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP ? + IB_OPCODE_RC_COMPARE_SWAP : + IB_OPCODE_RC_FETCH_ADD; + ohdr->u.atomic_eth.vaddr = + cpu_to_be64(wqe->wr.wr.atomic.remote_addr); + ohdr->u.atomic_eth.rkey = + cpu_to_be32(wqe->wr.wr.atomic.rkey); + ohdr->u.atomic_eth.swap_data = + cpu_to_be64(wqe->wr.wr.atomic.swap); + ohdr->u.atomic_eth.compare_data = + cpu_to_be64(wqe->wr.wr.atomic.compare_add); + hwords += sizeof(struct ib_atomic_eth) / 4; + if (newreq) { + qp->s_lsn++; + wqe->lpsn = wqe->psn; + } + if (++qp->s_cur == qp->s_size) + qp->s_cur = 0; + ss = NULL; + len = 0; + break; + + default: + goto done; + } + if (newreq) { + if (++qp->s_tail >= qp->s_size) + qp->s_tail = 0; + } + bth2 |= qp->s_psn++ & 0xFFFFFF; + if ((int)(qp->s_psn - qp->s_next_psn) > 0) + qp->s_next_psn = qp->s_psn; + spin_lock(&dev->pending_lock); + if (qp->timerwait.next == LIST_POISON1) { + list_add_tail(&qp->timerwait, + &dev->pending[dev-> + pending_index]); + } + spin_unlock(&dev->pending_lock); + break; + + case IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST: + /* + * This case can only happen if a send is + * restarted. See ipath_restart_rc(). 
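
A standalone sketch of the send-credit gate above: a request with send sequence number ssn may go to the wire only while ssn is at most s_lsn + 1 (circularly), and s_lsn == (u32) -1 disables the check entirely. cmp24() is the helper from the top of this file; illustrative only, not part of the patch.

#include <stdio.h>

static int cmp24(unsigned int a, unsigned int b)
{
	return (((int) a) - ((int) b)) << 8;
}

/* ssn may go out only while ssn <= lsn + 1 (circularly); lsn == ~0
 * disables the check, as in the test above. */
static int may_send(unsigned int ssn, unsigned int lsn)
{
	return lsn == (unsigned int) -1 || cmp24(ssn, lsn + 1) <= 0;
}

int main(void)
{
	printf("%d %d %d\n", may_send(7, 7), may_send(8, 7), may_send(9, 7));
	/* 1 1 0 */
	return 0;
}
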
+ */ + ipath_init_restart(qp, wqe); + /* FALLTHROUGH */ + case IB_OPCODE_RC_SEND_FIRST: + qp->s_state = IB_OPCODE_RC_SEND_MIDDLE; + /* FALLTHROUGH */ + case IB_OPCODE_RC_SEND_MIDDLE: + bth2 = qp->s_psn++ & 0xFFFFFF; + if ((int)(qp->s_psn - qp->s_next_psn) > 0) + qp->s_next_psn = qp->s_psn; + ss = &qp->s_sge; + len = qp->s_len; + if (len > pmtu) { + /* + * Request an ACK every 1/2 MB to avoid + * retransmit timeouts. + */ + if (((wqe->length - len) % (512 * 1024)) == 0) + bth2 |= 1 << 31; + len = pmtu; + break; + } + if (wqe->wr.opcode == IB_WR_SEND) + qp->s_state = IB_OPCODE_RC_SEND_LAST; + else { + qp->s_state = + IB_OPCODE_RC_SEND_LAST_WITH_IMMEDIATE; + /* Immediate data comes after the BTH */ + ohdr->u.imm_data = wqe->wr.imm_data; + hwords += 1; + } + if (wqe->wr.send_flags & IB_SEND_SOLICITED) + bth0 |= 1 << 23; + bth2 |= 1 << 31; /* Request ACK. */ + if (++qp->s_cur >= qp->s_size) + qp->s_cur = 0; + break; + + case IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST: + /* + * This case can only happen if a RDMA write is + * restarted. See ipath_restart_rc(). + */ + ipath_init_restart(qp, wqe); + /* FALLTHROUGH */ + case IB_OPCODE_RC_RDMA_WRITE_FIRST: + qp->s_state = IB_OPCODE_RC_RDMA_WRITE_MIDDLE; + /* FALLTHROUGH */ + case IB_OPCODE_RC_RDMA_WRITE_MIDDLE: + bth2 = qp->s_psn++ & 0xFFFFFF; + if ((int)(qp->s_psn - qp->s_next_psn) > 0) + qp->s_next_psn = qp->s_psn; + ss = &qp->s_sge; + len = qp->s_len; + if (len > pmtu) { + /* + * Request an ACK every 1/2 MB to avoid + * retransmit timeouts. + */ + if (((wqe->length - len) % (512 * 1024)) == 0) + bth2 |= 1 << 31; + len = pmtu; + break; + } + if (wqe->wr.opcode == IB_WR_RDMA_WRITE) + qp->s_state = IB_OPCODE_RC_RDMA_WRITE_LAST; + else { + qp->s_state = + IB_OPCODE_RC_RDMA_WRITE_LAST_WITH_IMMEDIATE; + /* Immediate data comes after the BTH */ + ohdr->u.imm_data = wqe->wr.imm_data; + hwords += 1; + if (wqe->wr.send_flags & IB_SEND_SOLICITED) + bth0 |= 1 << 23; + } + bth2 |= 1 << 31; /* Request ACK. */ + if (++qp->s_cur >= qp->s_size) + qp->s_cur = 0; + break; + + case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE: + /* + * This case can only happen if a RDMA read is + * restarted. See ipath_restart_rc(). + */ + ipath_init_restart(qp, wqe); + len = ((qp->s_psn - wqe->psn) & 0xFFFFFF) * pmtu; + ohdr->u.rc.reth.vaddr = + cpu_to_be64(wqe->wr.wr.rdma.remote_addr + len); + ohdr->u.rc.reth.rkey = + cpu_to_be32(wqe->wr.wr.rdma.rkey); + ohdr->u.rc.reth.length = cpu_to_be32(qp->s_len); + qp->s_state = IB_OPCODE_RC_RDMA_READ_REQUEST; + hwords += sizeof(ohdr->u.rc.reth) / 4; + bth2 = qp->s_psn++ & 0xFFFFFF; + if ((int)(qp->s_psn - qp->s_next_psn) > 0) + qp->s_next_psn = qp->s_psn; + ss = NULL; + len = 0; + if (++qp->s_cur == qp->s_size) + qp->s_cur = 0; + break; + + case IB_OPCODE_RC_RDMA_READ_REQUEST: + case IB_OPCODE_RC_COMPARE_SWAP: + case IB_OPCODE_RC_FETCH_ADD: + /* + * We shouldn't start anything new until this request + * is finished. The ACK will handle rescheduling us. + * XXX The number of outstanding ones is negotiated + * at connection setup time (see pg. 258,289)? + * XXX Also, if we support multiple outstanding + * requests, we need to check the WQE IB_SEND_FENCE + * flag and not send a new request if a RDMA read or + * atomic is pending. + */ + goto done; + } + qp->s_len -= len; + bth0 |= qp->s_state << 24; + /* XXX queue resend timeout. */ + } + /* Make sure it is non-zero before dropping the lock. */ + qp->s_hdrwords = hwords; + spin_unlock_irqrestore(&qp->s_lock, flags); + + /* Construct the header. 
*/ + extra_bytes = (4 - len) & 3; + nwords = (len + extra_bytes) >> 2; + if (unlikely(qp->remote_ah_attr.ah_flags & IB_AH_GRH)) { + /* Header size in 32-bit words. */ + hwords += 10; + lrh0 = IPS_LRH_GRH; + qp->s_hdr.u.l.grh.version_tclass_flow = + cpu_to_be32((6 << 28) | + (qp->remote_ah_attr.grh.traffic_class << 20) | + qp->remote_ah_attr.grh.flow_label); + qp->s_hdr.u.l.grh.paylen = + cpu_to_be16(((hwords - 12) + nwords + SIZE_OF_CRC) << 2); + qp->s_hdr.u.l.grh.next_hdr = 0x1B; + qp->s_hdr.u.l.grh.hop_limit = qp->remote_ah_attr.grh.hop_limit; + /* The SGID is 32-bit aligned. */ + qp->s_hdr.u.l.grh.sgid.global.subnet_prefix = dev->gid_prefix; + qp->s_hdr.u.l.grh.sgid.global.interface_id = + ipath_layer_get_guid(dev->ib_unit); + qp->s_hdr.u.l.grh.dgid = qp->remote_ah_attr.grh.dgid; + qp->s_hdrwords = hwords; + } + qp->s_cur_sge = ss; + qp->s_cur_size = len; + lrh0 |= qp->remote_ah_attr.sl << 4; + qp->s_hdr.lrh[0] = cpu_to_be16(lrh0); + /* DEST LID */ + qp->s_hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid); + qp->s_hdr.lrh[2] = cpu_to_be16(hwords + nwords + SIZE_OF_CRC); + qp->s_hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->ib_unit)); + bth0 |= extra_bytes << 20; + ohdr->bth[0] = cpu_to_be32(bth0); + ohdr->bth[1] = cpu_to_be32(qp->remote_qpn); + ohdr->bth[2] = cpu_to_be32(bth2); + + /* Check for more work to do. */ + goto again; + +done: + spin_unlock_irqrestore(&qp->s_lock, flags); + clear_bit(IPATH_S_BUSY, &qp->s_flags); +} + +static void send_rc_ack(struct ipath_qp *qp) +{ + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); + u16 lrh0; + u32 bth0; + u32 hwords; + struct ipath_other_headers *ohdr; + + /* Construct the header. */ + ohdr = &qp->s_hdr.u.oth; + lrh0 = IPS_LRH_BTH; + /* header size in 32-bit words LRH+BTH+AETH = (8+12+4)/4. */ + hwords = 6; + if (unlikely(qp->remote_ah_attr.ah_flags & IB_AH_GRH)) { + ohdr = &qp->s_hdr.u.l.oth; + /* Header size in 32-bit words. */ + hwords += 10; + lrh0 = IPS_LRH_GRH; + qp->s_hdr.u.l.grh.version_tclass_flow = + cpu_to_be32((6 << 28) | + (qp->remote_ah_attr.grh.traffic_class << 20) | + qp->remote_ah_attr.grh.flow_label); + qp->s_hdr.u.l.grh.paylen = + cpu_to_be16(((hwords - 12) + SIZE_OF_CRC) << 2); + qp->s_hdr.u.l.grh.next_hdr = 0x1B; + qp->s_hdr.u.l.grh.hop_limit = qp->remote_ah_attr.grh.hop_limit; + /* The SGID is 32-bit aligned. */ + qp->s_hdr.u.l.grh.sgid.global.subnet_prefix = dev->gid_prefix; + qp->s_hdr.u.l.grh.sgid.global.interface_id = + ipath_layer_get_guid(dev->ib_unit); + qp->s_hdr.u.l.grh.dgid = qp->remote_ah_attr.grh.dgid; + } + bth0 = ipath_layer_get_pkey(dev->ib_unit, qp->s_pkey_index); + ohdr->u.aeth = ipath_compute_aeth(qp); + if (qp->s_ack_state >= IB_OPCODE_RC_COMPARE_SWAP) { + bth0 |= IB_OPCODE_ATOMIC_ACKNOWLEDGE << 24; + ohdr->u.at.atomic_ack_eth = cpu_to_be64(qp->s_ack_atomic); + hwords += sizeof(ohdr->u.at.atomic_ack_eth) / 4; + } else { + bth0 |= IB_OPCODE_RC_ACKNOWLEDGE << 24; + } + lrh0 |= qp->remote_ah_attr.sl << 4; + qp->s_hdr.lrh[0] = cpu_to_be16(lrh0); + /* DEST LID */ + qp->s_hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid); + qp->s_hdr.lrh[2] = cpu_to_be16(hwords + SIZE_OF_CRC); + qp->s_hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->ib_unit)); + ohdr->bth[0] = cpu_to_be32(bth0); + ohdr->bth[1] = cpu_to_be32(qp->remote_qpn); + ohdr->bth[2] = cpu_to_be32(qp->s_ack_psn & 0xFFFFFF); + + /* + * If we can send the ACK, clear the ACK state. 
+ */ + if (ipath_verbs_send(dev->ib_unit, hwords, (uint32_t *) &qp->s_hdr, + 0, NULL) == 0) { + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE; + dev->n_rc_qacks++; + } +} + +/* + * Back up the requester to resend the last un-ACKed request. + * The QP s_lock should be held. + */ +static void ipath_restart_rc(struct ipath_qp *qp, u32 psn, struct ib_wc *wc) +{ + struct ipath_swqe *wqe = get_swqe_ptr(qp, qp->s_last); + struct ipath_ibdev *dev; + u32 n; + + /* + * If there are no requests pending, we are done. + */ + if (cmp24(psn, qp->s_next_psn) >= 0 || qp->s_last == qp->s_tail) + goto done; + + if (qp->s_retry == 0) { + wc->wr_id = wqe->wr.wr_id; + wc->status = IB_WC_RETRY_EXC_ERR; + wc->opcode = wc_opcode[wqe->wr.opcode]; + wc->vendor_err = 0; + wc->byte_len = 0; + wc->qp_num = qp->ibqp.qp_num; + wc->src_qp = qp->remote_qpn; + wc->pkey_index = 0; + wc->slid = qp->remote_ah_attr.dlid; + wc->sl = qp->remote_ah_attr.sl; + wc->dlid_path_bits = 0; + wc->port_num = 0; + ipath_sqerror_qp(qp, wc); + return; + } + qp->s_retry--; + + /* + * Remove the QP from the timeout queue. + * Note: it may already have been removed by ipath_ib_timer(). + */ + dev = to_idev(qp->ibqp.device); + spin_lock(&dev->pending_lock); + if (qp->timerwait.next != LIST_POISON1) + list_del(&qp->timerwait); + spin_unlock(&dev->pending_lock); + + if (wqe->wr.opcode == IB_WR_RDMA_READ) + dev->n_rc_resends++; + else + dev->n_rc_resends += (int)qp->s_psn - (int)psn; + + /* + * If we are starting the request from the beginning, let the + * normal send code handle initialization. + */ + qp->s_cur = qp->s_last; + if (cmp24(psn, wqe->psn) <= 0) { + qp->s_state = IB_OPCODE_RC_SEND_LAST; + qp->s_psn = wqe->psn; + } else { + n = qp->s_cur; + for (;;) { + if (++n == qp->s_size) + n = 0; + if (n == qp->s_tail) { + if (cmp24(psn, qp->s_next_psn) >= 0) { + qp->s_cur = n; + wqe = get_swqe_ptr(qp, n); + } + break; + } + wqe = get_swqe_ptr(qp, n); + if (cmp24(psn, wqe->psn) < 0) + break; + qp->s_cur = n; + } + qp->s_psn = psn; + + /* + * Reset the state to restart in the middle of a request. + * Don't change the s_sge, s_cur_sge, or s_cur_size. + * See do_rc_send(). + */ + switch (wqe->wr.opcode) { + case IB_WR_SEND: + case IB_WR_SEND_WITH_IMM: + qp->s_state = IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST; + break; + + case IB_WR_RDMA_WRITE: + case IB_WR_RDMA_WRITE_WITH_IMM: + qp->s_state = IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST; + break; + + case IB_WR_RDMA_READ: + qp->s_state = IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE; + break; + + default: + /* + * This case shouldn't happen since it's only + * one PSN per request. + */ + qp->s_state = IB_OPCODE_RC_SEND_LAST; + } + } + +done: + tasklet_schedule(&qp->s_task); +} + +/* + * Handle RC and UC post sends. + */ +static int ipath_post_rc_send(struct ipath_qp *qp, struct ib_send_wr *wr) +{ + struct ipath_swqe *wqe; + unsigned long flags; + u32 next; + int i, j; + int acc; + + /* + * Don't allow RDMA reads or atomic operations on UC, or + * undefined operations. + * Make sure buffer is large enough to hold the result for atomics. + */ + if (qp->ibqp.qp_type == IB_QPT_UC) { + if ((unsigned) wr->opcode >= IB_WR_RDMA_READ) + return -EINVAL; + } else if ((unsigned) wr->opcode > IB_WR_ATOMIC_FETCH_AND_ADD) + return -EINVAL; + else if (wr->opcode >= IB_WR_ATOMIC_CMP_AND_SWP && + (wr->num_sge == 0 || wr->sg_list[0].length < sizeof(u64) || + wr->sg_list[0].addr & 0x7)) + return -EINVAL; + + /* IB spec says that num_sge == 0 is OK.
*/ + if (wr->num_sge > qp->s_max_sge) + return -ENOMEM; + + spin_lock_irqsave(&qp->s_lock, flags); + next = qp->s_head + 1; + if (next >= qp->s_size) + next = 0; + if (next == qp->s_last) { + spin_unlock_irqrestore(&qp->s_lock, flags); + return -EINVAL; + } + + wqe = get_swqe_ptr(qp, qp->s_head); + wqe->wr = *wr; + wqe->ssn = qp->s_ssn++; + wqe->sg_list[0].mr = NULL; + wqe->sg_list[0].vaddr = NULL; + wqe->sg_list[0].length = 0; + wqe->sg_list[0].sge_length = 0; + wqe->length = 0; + acc = wr->opcode >= IB_WR_RDMA_READ ? IB_ACCESS_LOCAL_WRITE : 0; + for (i = 0, j = 0; i < wr->num_sge; i++) { + if (to_ipd(qp->ibqp.pd)->user && wr->sg_list[i].lkey == 0) { + spin_unlock_irqrestore(&qp->s_lock, flags); + return -EINVAL; + } + if (wr->sg_list[i].length == 0) + continue; + if (!ipath_lkey_ok(&to_idev(qp->ibqp.device)->lk_table, + &wqe->sg_list[j], &wr->sg_list[i], acc)) { + spin_unlock_irqrestore(&qp->s_lock, flags); + return -EINVAL; + } + wqe->length += wr->sg_list[i].length; + j++; + } + wqe->wr.num_sge = j; + qp->s_head = next; + /* + * Wake up the send tasklet if the QP is not waiting + * for an RNR timeout. + */ + next = qp->s_rnr_timeout; + spin_unlock_irqrestore(&qp->s_lock, flags); + + if (next == 0) { + if (qp->ibqp.qp_type == IB_QPT_UC) + do_uc_send((unsigned long) qp); + else + do_rc_send((unsigned long) qp); + } + return 0; +} + +/* + * Note that we actually send the data as it is posted instead of putting + * the request into a ring buffer. If we wanted to use a ring buffer, + * we would need to save a reference to the destination address in the SWQE. + */ +static int ipath_post_ud_send(struct ipath_qp *qp, struct ib_send_wr *wr) +{ + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); + struct ipath_other_headers *ohdr; + struct ib_ah_attr *ah_attr; + struct ipath_sge_state ss; + struct ipath_sge *sg_list; + struct ib_wc wc; + u32 hwords; + u32 nwords; + u32 len; + u32 extra_bytes; + u32 bth0; + u16 lrh0; + u16 lid; + int i; + + if (!(state_ops[qp->state] & IPATH_PROCESS_SEND_OK)) + return 0; + + /* IB spec says that num_sge == 0 is OK. */ + if (wr->num_sge > qp->s_max_sge) + return -EINVAL; + + if (wr->num_sge > 1) { + sg_list = kmalloc((qp->s_max_sge - 1) * sizeof(*sg_list), + GFP_ATOMIC); + if (!sg_list) + return -ENOMEM; + } else + sg_list = NULL; + + /* Check the buffer to send. */ + ss.sg_list = sg_list; + ss.sge.mr = NULL; + ss.sge.vaddr = NULL; + ss.sge.length = 0; + ss.sge.sge_length = 0; + ss.num_sge = 0; + len = 0; + for (i = 0; i < wr->num_sge; i++) { + /* Check LKEY */ + if (to_ipd(qp->ibqp.pd)->user && wr->sg_list[i].lkey == 0) + return -EINVAL; + + if (wr->sg_list[i].length == 0) + continue; + if (!ipath_lkey_ok(&dev->lk_table, ss.num_sge ? + sg_list + ss.num_sge : &ss.sge, + &wr->sg_list[i], 0)) { + return -EINVAL; + } + len += wr->sg_list[i].length; + ss.num_sge++; + } + extra_bytes = (4 - len) & 3; + nwords = (len + extra_bytes) >> 2; + + /* Construct the header. */ + ah_attr = &to_iah(wr->wr.ud.ah)->attr; + if (ah_attr->dlid >= 0xC000 && ah_attr->dlid < 0xFFFF) + dev->n_multicast_xmit++; + if (unlikely(ah_attr->dlid == ipath_layer_get_lid(dev->ib_unit))) { + /* Pass in an uninitialized ib_wc to save stack space. */ + ipath_ud_loopback(qp, &ss, len, wr, &wc); + goto done; + } + if (ah_attr->ah_flags & IB_AH_GRH) { + /* Header size in 32-bit words. 
*/ + hwords = 17; + lrh0 = IPS_LRH_GRH; + ohdr = &qp->s_hdr.u.l.oth; + qp->s_hdr.u.l.grh.version_tclass_flow = + cpu_to_be32((6 << 28) | + (ah_attr->grh.traffic_class << 20) | + ah_attr->grh.flow_label); + qp->s_hdr.u.l.grh.paylen = + cpu_to_be16(((wr->opcode == + IB_WR_SEND_WITH_IMM ? 6 : 5) + nwords + + SIZE_OF_CRC) << 2); + qp->s_hdr.u.l.grh.next_hdr = 0x1B; + qp->s_hdr.u.l.grh.hop_limit = ah_attr->grh.hop_limit; + /* The SGID is 32-bit aligned. */ + qp->s_hdr.u.l.grh.sgid.global.subnet_prefix = dev->gid_prefix; + qp->s_hdr.u.l.grh.sgid.global.interface_id = + ipath_layer_get_guid(dev->ib_unit); + qp->s_hdr.u.l.grh.dgid = ah_attr->grh.dgid; + /* + * Don't worry about sending to locally attached + * multicast QPs; the spec leaves what happens unspecified. + */ + } else { + /* Header size in 32-bit words. */ + hwords = 7; + lrh0 = IPS_LRH_BTH; + ohdr = &qp->s_hdr.u.oth; + } + if (wr->opcode == IB_WR_SEND_WITH_IMM) { + ohdr->u.ud.imm_data = wr->imm_data; + wc.imm_data = wr->imm_data; + hwords += 1; + bth0 = IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE << 24; + } else if (wr->opcode == IB_WR_SEND) { + wc.imm_data = 0; + bth0 = IB_OPCODE_UD_SEND_ONLY << 24; + } else + return -EINVAL; + lrh0 |= ah_attr->sl << 4; + if (qp->ibqp.qp_type == IB_QPT_SMI) + lrh0 |= 0xF000; /* Set VL */ + qp->s_hdr.lrh[0] = cpu_to_be16(lrh0); + qp->s_hdr.lrh[1] = cpu_to_be16(ah_attr->dlid); /* DEST LID */ + qp->s_hdr.lrh[2] = cpu_to_be16(hwords + nwords + SIZE_OF_CRC); + lid = ipath_layer_get_lid(dev->ib_unit); + qp->s_hdr.lrh[3] = lid ? cpu_to_be16(lid) : IB_LID_PERMISSIVE; + if (wr->send_flags & IB_SEND_SOLICITED) + bth0 |= 1 << 23; + bth0 |= extra_bytes << 20; + bth0 |= qp->ibqp.qp_type == IB_QPT_SMI ? IPS_DEFAULT_P_KEY : + ipath_layer_get_pkey(dev->ib_unit, qp->s_pkey_index); + ohdr->bth[0] = cpu_to_be32(bth0); + ohdr->bth[1] = cpu_to_be32(wr->wr.ud.remote_qpn); + /* XXX Could lose a PSN count but not worth locking */ + ohdr->bth[2] = cpu_to_be32(qp->s_psn++ & 0xFFFFFF); + /* + * Qkeys with the high order bit set mean use the + * qkey from the QP context instead of the WR. + */ + ohdr->u.ud.deth[0] = cpu_to_be32((int)wr->wr.ud.remote_qkey < 0 ? + qp->qkey : wr->wr.ud.remote_qkey); + ohdr->u.ud.deth[1] = cpu_to_be32(qp->ibqp.qp_num); + if (ipath_verbs_send(dev->ib_unit, hwords, (uint32_t *) &qp->s_hdr, + len, &ss)) + dev->n_no_piobuf++; + +done: + /* Queue the completion status entry. */ + if (!test_bit(IPATH_S_SIGNAL_REQ_WR, &qp->s_flags) || + (wr->send_flags & IB_SEND_SIGNALED)) { + wc.wr_id = wr->wr_id; + wc.status = IB_WC_SUCCESS; + wc.vendor_err = 0; + wc.opcode = IB_WC_SEND; + wc.byte_len = len; + wc.qp_num = qp->ibqp.qp_num; + wc.src_qp = 0; + wc.wc_flags = 0; + /* XXX initialize other fields? */ + ipath_cq_enter(to_icq(qp->ibqp.send_cq), &wc, 0); + } + kfree(sg_list); + + return 0; +} + +/* + * This may be called from interrupt context. + */ +static int ipath_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, + struct ib_send_wr **bad_wr) +{ + struct ipath_qp *qp = to_iqp(ibqp); + int err = 0; + + /* Check that state is OK to post send.
*/ + if (!(state_ops[qp->state] & IPATH_POST_SEND_OK)) { + *bad_wr = wr; + return -EINVAL; + } + + for (; wr; wr = wr->next) { + switch (qp->ibqp.qp_type) { + case IB_QPT_UC: + case IB_QPT_RC: + err = ipath_post_rc_send(qp, wr); + break; + + case IB_QPT_SMI: + case IB_QPT_GSI: + case IB_QPT_UD: + err = ipath_post_ud_send(qp, wr); + break; + + default: + err = -EINVAL; + } + if (err) { + *bad_wr = wr; + break; + } + } + return err; +} + +/* + * This may be called from interrupt context. + */ +static int ipath_post_receive(struct ib_qp *ibqp, struct ib_recv_wr *wr, + struct ib_recv_wr **bad_wr) +{ + struct ipath_qp *qp = to_iqp(ibqp); + unsigned long flags; + + /* Check that state is OK to post receive. */ + if (!(state_ops[qp->state] & IPATH_POST_RECV_OK)) { + *bad_wr = wr; + return -EINVAL; + } + + for (; wr; wr = wr->next) { + struct ipath_rwqe *wqe; + u32 next; + int i, j; + + if (wr->num_sge > qp->r_rq.max_sge) { + *bad_wr = wr; + return -ENOMEM; + } + + spin_lock_irqsave(&qp->r_rq.lock, flags); + next = qp->r_rq.head + 1; + if (next >= qp->r_rq.size) + next = 0; + if (next == qp->r_rq.tail) { + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + *bad_wr = wr; + return -ENOMEM; + } + + wqe = get_rwqe_ptr(&qp->r_rq, qp->r_rq.head); + wqe->wr_id = wr->wr_id; + wqe->sg_list[0].mr = NULL; + wqe->sg_list[0].vaddr = NULL; + wqe->sg_list[0].length = 0; + wqe->sg_list[0].sge_length = 0; + wqe->length = 0; + for (i = 0, j = 0; i < wr->num_sge; i++) { + /* Check LKEY */ + if (to_ipd(qp->ibqp.pd)->user && + wr->sg_list[i].lkey == 0) { + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + *bad_wr = wr; + return -EINVAL; + } + if (wr->sg_list[i].length == 0) + continue; + if (!ipath_lkey_ok(&to_idev(qp->ibqp.device)->lk_table, + &wqe->sg_list[j], &wr->sg_list[i], + IB_ACCESS_LOCAL_WRITE)) { + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + *bad_wr = wr; + return -EINVAL; + } + wqe->length += wr->sg_list[i].length; + j++; + } + wqe->num_sge = j; + qp->r_rq.head = next; + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + } + return 0; +} + +/* + * This may be called from interrupt context. 
+ */ +static int ipath_post_srq_receive(struct ib_srq *ibsrq, struct ib_recv_wr *wr, + struct ib_recv_wr **bad_wr) +{ + struct ipath_srq *srq = to_isrq(ibsrq); + struct ipath_ibdev *dev = to_idev(ibsrq->device); + unsigned long flags; + + for (; wr; wr = wr->next) { + struct ipath_rwqe *wqe; + u32 next; + int i, j; + + if (wr->num_sge > srq->rq.max_sge) { + *bad_wr = wr; + return -ENOMEM; + } + + spin_lock_irqsave(&srq->rq.lock, flags); + next = srq->rq.head + 1; + if (next >= srq->rq.size) + next = 0; + if (next == srq->rq.tail) { + spin_unlock_irqrestore(&srq->rq.lock, flags); + *bad_wr = wr; + return -ENOMEM; + } + + wqe = get_rwqe_ptr(&srq->rq, srq->rq.head); + wqe->wr_id = wr->wr_id; + wqe->sg_list[0].mr = NULL; + wqe->sg_list[0].vaddr = NULL; + wqe->sg_list[0].length = 0; + wqe->sg_list[0].sge_length = 0; + wqe->length = 0; + for (i = 0, j = 0; i < wr->num_sge; i++) { + /* Check LKEY */ + if (to_ipd(srq->ibsrq.pd)->user && + wr->sg_list[i].lkey == 0) { + spin_unlock_irqrestore(&srq->rq.lock, flags); + *bad_wr = wr; + return -EINVAL; + } + if (wr->sg_list[i].length == 0) + continue; + if (!ipath_lkey_ok(&dev->lk_table, + &wqe->sg_list[j], &wr->sg_list[i], + IB_ACCESS_LOCAL_WRITE)) { + spin_unlock_irqrestore(&srq->rq.lock, flags); + *bad_wr = wr; + return -EINVAL; + } + wqe->length += wr->sg_list[i].length; + j++; + } + wqe->num_sge = j; + srq->rq.head = next; + spin_unlock_irqrestore(&srq->rq.lock, flags); + } + return 0; +} + +/* + * This is called from ipath_qp_rcv() to process an incoming UD packet + * for the given QP. + * Called at interrupt level. + */ +static void ipath_ud_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr, + int has_grh, void *data, u32 tlen, struct ipath_qp *qp) +{ + struct ipath_other_headers *ohdr; + int opcode; + u32 hdrsize; + u32 pad; + unsigned long flags; + struct ib_wc wc; + u32 qkey; + u32 src_qp; + struct ipath_rq *rq; + struct ipath_srq *srq; + struct ipath_rwqe *wqe; + + /* Check for GRH */ + if (!has_grh) { + ohdr = &hdr->u.oth; + hdrsize = 8 + 12 + 8; /* LRH + BTH + DETH */ + qkey = be32_to_cpu(ohdr->u.ud.deth[0]); + src_qp = be32_to_cpu(ohdr->u.ud.deth[1]); + } else { + ohdr = &hdr->u.l.oth; + hdrsize = 8 + 40 + 12 + 8; /* LRH + GRH + BTH + DETH */ + /* + * The header with GRH is 68 bytes and the + * core driver sets the eager header buffer + * size to 56 bytes so the last 12 bytes of + * the IB header are in the data buffer. + */ + qkey = be32_to_cpu(((u32 *) data)[1]); + src_qp = be32_to_cpu(((u32 *) data)[2]); + data += 12; + } + src_qp &= 0xFFFFFF; + + /* Check that the qkey matches. */ + if (unlikely(qkey != qp->qkey)) { + /* XXX OK to lose a count once in a while. */ + dev->qkey_violations++; + dev->n_pkt_drops++; + return; + } + + /* Get the number of bytes the message was padded by. */ + pad = (ohdr->bth[0] >> 12) & 3; + if (unlikely(tlen < (hdrsize + pad + 4))) { + /* Drop incomplete packets. */ + dev->n_pkt_drops++; + return; + } + + /* + * A GRH is expected to precede the data even if not + * present on the wire. + */ + wc.byte_len = tlen - (hdrsize + pad + 4) + sizeof(struct ib_grh); + + /* + * The opcode is in the low byte when it's in network order + * (top byte when in host order).
+ */ + opcode = *(u8 *) (&ohdr->bth[0]); + if (opcode == IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE) { + if (has_grh) { + wc.imm_data = *(u32 *) data; + data += sizeof(u32); + } else + wc.imm_data = ohdr->u.ud.imm_data; + wc.wc_flags = IB_WC_WITH_IMM; + hdrsize += sizeof(u32); + } else if (opcode == IB_OPCODE_UD_SEND_ONLY) { + wc.imm_data = 0; + wc.wc_flags = 0; + } else { + dev->n_pkt_drops++; + return; + } + + /* + * Get the next work request entry to find where to put the data. + * Note that it is safe to drop the lock after changing rq->tail + * since ipath_post_receive() won't fill the empty slot. + */ + if (qp->ibqp.srq) { + srq = to_isrq(qp->ibqp.srq); + rq = &srq->rq; + } else { + srq = NULL; + rq = &qp->r_rq; + } + spin_lock_irqsave(&rq->lock, flags); + if (rq->tail == rq->head) { + spin_unlock_irqrestore(&rq->lock, flags); + dev->n_pkt_drops++; + return; + } + /* Silently drop packets which are too big. */ + wqe = get_rwqe_ptr(rq, rq->tail); + if (wc.byte_len > wqe->length) { + spin_unlock_irqrestore(&rq->lock, flags); + dev->n_pkt_drops++; + return; + } + wc.wr_id = wqe->wr_id; + qp->r_sge.sge = wqe->sg_list[0]; + qp->r_sge.sg_list = wqe->sg_list + 1; + qp->r_sge.num_sge = wqe->num_sge; + if (++rq->tail >= rq->size) + rq->tail = 0; + if (srq && srq->ibsrq.event_handler) { + u32 n; + + if (rq->head < rq->tail) + n = rq->size + rq->head - rq->tail; + else + n = rq->head - rq->tail; + if (n < srq->limit) { + struct ib_event ev; + + srq->limit = 0; + spin_unlock_irqrestore(&rq->lock, flags); + ev.device = qp->ibqp.device; + ev.element.srq = qp->ibqp.srq; + ev.event = IB_EVENT_SRQ_LIMIT_REACHED; + srq->ibsrq.event_handler(&ev, srq->ibsrq.srq_context); + } else + spin_unlock_irqrestore(&rq->lock, flags); + } else + spin_unlock_irqrestore(&rq->lock, flags); + if (has_grh) { + copy_sge(&qp->r_sge, &hdr->u.l.grh, sizeof(struct ib_grh)); + wc.wc_flags |= IB_WC_GRH; + } else + skip_sge(&qp->r_sge, sizeof(struct ib_grh)); + copy_sge(&qp->r_sge, data, wc.byte_len - sizeof(struct ib_grh)); + wc.status = IB_WC_SUCCESS; + wc.opcode = IB_WC_RECV; + wc.vendor_err = 0; + wc.qp_num = qp->ibqp.qp_num; + wc.src_qp = src_qp; + /* XXX do we know which pkey matched? Only needed for GSI. */ + wc.pkey_index = 0; + wc.slid = be16_to_cpu(hdr->lrh[3]); + wc.sl = (be16_to_cpu(hdr->lrh[0]) >> 4) & 0xF; + wc.dlid_path_bits = 0; + /* Signal completion event if the solicited bit is set. */ + ipath_cq_enter(to_icq(qp->ibqp.recv_cq), &wc, + ohdr->bth[0] & __constant_cpu_to_be32(1 << 23)); +} + +/* + * This is called from ipath_post_ud_send() to forward a WQE addressed + * to the same HCA. + */ +static void ipath_ud_loopback(struct ipath_qp *sqp, struct ipath_sge_state *ss, + u32 length, struct ib_send_wr *wr, + struct ib_wc *wc) +{ + struct ipath_ibdev *dev = to_idev(sqp->ibqp.device); + struct ipath_qp *qp; + struct ib_ah_attr *ah_attr; + unsigned long flags; + struct ipath_rq *rq; + struct ipath_srq *srq; + struct ipath_sge_state rsge; + struct ipath_sge *sge; + struct ipath_rwqe *wqe; + + qp = ipath_lookup_qpn(&dev->qp_table, wr->wr.ud.remote_qpn); + if (!qp) + return; + + /* Check that the qkey matches. */ + if (unlikely(wr->wr.ud.remote_qkey != qp->qkey)) { + /* XXX OK to lose a count once in a while. */ + dev->qkey_violations++; + dev->n_pkt_drops++; + goto done; + } + + /* + * A GRH is expected to precede the data even if not + * present on the wire.
+ */ + wc->byte_len = length + sizeof(struct ib_grh); + + if (wr->opcode == IB_WR_SEND_WITH_IMM) { + wc->wc_flags = IB_WC_WITH_IMM; + wc->imm_data = wr->imm_data; + } else { + wc->wc_flags = 0; + wc->imm_data = 0; + } + + /* + * Get the next work request entry to find where to put the data. + * Note that it is safe to drop the lock after changing rq->tail + * since ipath_post_receive() won't fill the empty slot. + */ + if (qp->ibqp.srq) { + srq = to_isrq(qp->ibqp.srq); + rq = &srq->rq; + } else { + srq = NULL; + rq = &qp->r_rq; + } + spin_lock_irqsave(&rq->lock, flags); + if (rq->tail == rq->head) { + spin_unlock_irqrestore(&rq->lock, flags); + dev->n_pkt_drops++; + goto done; + } + /* Silently drop packets which are too big. */ + wqe = get_rwqe_ptr(rq, rq->tail); + if (wc->byte_len > wqe->length) { + spin_unlock_irqrestore(&rq->lock, flags); + dev->n_pkt_drops++; + goto done; + } + wc->wr_id = wqe->wr_id; + rsge.sge = wqe->sg_list[0]; + rsge.sg_list = wqe->sg_list + 1; + rsge.num_sge = wqe->num_sge; + if (++rq->tail >= rq->size) + rq->tail = 0; + if (srq && srq->ibsrq.event_handler) { + u32 n; + + if (rq->head < rq->tail) + n = rq->size + rq->head - rq->tail; + else + n = rq->head - rq->tail; + if (n < srq->limit) { + struct ib_event ev; + + srq->limit = 0; + spin_unlock_irqrestore(&rq->lock, flags); + ev.device = qp->ibqp.device; + ev.element.srq = qp->ibqp.srq; + ev.event = IB_EVENT_SRQ_LIMIT_REACHED; + srq->ibsrq.event_handler(&ev, srq->ibsrq.srq_context); + } else + spin_unlock_irqrestore(&rq->lock, flags); + } else + spin_unlock_irqrestore(&rq->lock, flags); + ah_attr = &to_iah(wr->wr.ud.ah)->attr; + if (ah_attr->ah_flags & IB_AH_GRH) { + copy_sge(&rsge, &ah_attr->grh, sizeof(struct ib_grh)); + wc->wc_flags |= IB_WC_GRH; + } else + skip_sge(&rsge, sizeof(struct ib_grh)); + sge = &ss->sge; + while (length) { + u32 len = sge->length; + + if (len > length) + len = length; + BUG_ON(len == 0); + copy_sge(&rsge, sge->vaddr, len); + sge->vaddr += len; + sge->length -= len; + sge->sge_length -= len; + if (sge->sge_length == 0) { + if (--ss->num_sge) + *sge = *ss->sg_list++; + } else if (sge->length == 0 && sge->mr != NULL) { + if (++sge->n >= IPATH_SEGSZ) { + if (++sge->m >= sge->mr->mapsz) + break; + sge->n = 0; + } + sge->vaddr = sge->mr->map[sge->m]->segs[sge->n].vaddr; + sge->length = sge->mr->map[sge->m]->segs[sge->n].length; + } + length -= len; + } + wc->status = IB_WC_SUCCESS; + wc->opcode = IB_WC_RECV; + wc->vendor_err = 0; + wc->qp_num = qp->ibqp.qp_num; + wc->src_qp = sqp->ibqp.qp_num; + /* XXX do we know which pkey matched? Only needed for GSI. */ + wc->pkey_index = 0; + wc->slid = ipath_layer_get_lid(dev->ib_unit); + wc->sl = ah_attr->sl; + wc->dlid_path_bits = 0; + /* Signal completion event if the solicited bit is set. */ + ipath_cq_enter(to_icq(qp->ibqp.recv_cq), wc, + wr->send_flags & IB_SEND_SOLICITED); + +done: + if (atomic_dec_and_test(&qp->refcount)) + wake_up(&qp->wait); +} + +/* + * Copy the next RWQE into the QP's RWQE. + * Return zero if no RWQE is available. + * Called at interrupt level with the QP r_rq.lock held. 
+ */ +static int get_rwqe(struct ipath_qp *qp, int wr_id_only) +{ + struct ipath_rq *rq; + struct ipath_srq *srq; + struct ipath_rwqe *wqe; + + if (!qp->ibqp.srq) { + rq = &qp->r_rq; + if (unlikely(rq->tail == rq->head)) + return 0; + wqe = get_rwqe_ptr(rq, rq->tail); + qp->r_wr_id = wqe->wr_id; + if (!wr_id_only) { + qp->r_sge.sge = wqe->sg_list[0]; + qp->r_sge.sg_list = wqe->sg_list + 1; + qp->r_sge.num_sge = wqe->num_sge; + qp->r_len = wqe->length; + } + if (++rq->tail >= rq->size) + rq->tail = 0; + return 1; + } + + srq = to_isrq(qp->ibqp.srq); + rq = &srq->rq; + spin_lock(&rq->lock); + if (unlikely(rq->tail == rq->head)) { + spin_unlock(&rq->lock); + return 0; + } + wqe = get_rwqe_ptr(rq, rq->tail); + qp->r_wr_id = wqe->wr_id; + if (!wr_id_only) { + qp->r_sge.sge = wqe->sg_list[0]; + qp->r_sge.sg_list = wqe->sg_list + 1; + qp->r_sge.num_sge = wqe->num_sge; + qp->r_len = wqe->length; + } + if (++rq->tail >= rq->size) + rq->tail = 0; + if (srq->ibsrq.event_handler) { + struct ib_event ev; + u32 n; + + if (rq->head < rq->tail) + n = rq->size + rq->head - rq->tail; + else + n = rq->head - rq->tail; + if (n < srq->limit) { + srq->limit = 0; + spin_unlock(&rq->lock); + ev.device = qp->ibqp.device; + ev.element.srq = qp->ibqp.srq; + ev.event = IB_EVENT_SRQ_LIMIT_REACHED; + srq->ibsrq.event_handler(&ev, srq->ibsrq.srq_context); + } else + spin_unlock(&rq->lock); + } else + spin_unlock(&rq->lock); + return 1; +} -- 0.99.9n From rolandd at cisco.com Fri Dec 16 15:48:55 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 16 Dec 2005 15:48:55 -0800 Subject: [openib-general] [PATCH 13/13] [RFC] ipath Kconfig and Makefile In-Reply-To: <200512161548.MdcxE8ZQTy1yj4v1@cisco.com> Message-ID: <200512161548.lokgvLraSGi0enUH@cisco.com> Kconfig and Makefile for ipath driver. (Leaving out changes to base drivers/infiniband/{Kconfig,Makefile} to hook these new files into kernel build) --- drivers/infiniband/hw/ipath/Kconfig | 18 ++++++++++++++++++ drivers/infiniband/hw/ipath/Makefile | 15 +++++++++++++++ 2 files changed, 33 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/hw/ipath/Kconfig create mode 100644 drivers/infiniband/hw/ipath/Makefile 8748441795d589631fc58cbf477f485ff6716348 diff --git a/drivers/infiniband/hw/ipath/Kconfig b/drivers/infiniband/hw/ipath/Kconfig new file mode 100644 index 0000000..092faa6 --- /dev/null +++ b/drivers/infiniband/hw/ipath/Kconfig @@ -0,0 +1,18 @@ +config IPATH_CORE + tristate "PathScale InfiniPath Driver" + depends on PCI_MSI && X86_64 + ---help--- + This is a low-level driver for PathScale InfiniPath host + channel adapters (HCAs) based on the HT-400 chip, including the + InfiniPath HT-460, the small form factor InfiniPath HT-460, + the InfiniPath HT-470 and the Linux Networx LS/X. + +config INFINIBAND_IPATH + tristate "PathScale InfiniPath Verbs Driver" + depends on IPATH_CORE && INFINIBAND + ---help--- + This is a driver that provides InfiniBand verbs support for + PathScale InfiniPath host channel adapters (HCAs). This + allows these devices to be used with both kernel upper level + protocols such as IP-over-InfiniBand as well as with userspace + applications (in conjunction with InfiniBand userspace access). 
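With the two Kconfig entries above hooked into drivers/infiniband/Kconfig as noted, selecting the driver reduces to a two-line .config fragment; a sketch, assuming the symbol names stay as posted:

CONFIG_IPATH_CORE=m
CONFIG_INFINIBAND_IPATH=m

IPATH_CORE builds the low-level chip driver and INFINIBAND_IPATH layers the verbs support on top of it, which is why the second symbol depends on the first.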
diff --git a/drivers/infiniband/hw/ipath/Makefile b/drivers/infiniband/hw/ipath/Makefile new file mode 100644 index 0000000..dbe2557 --- /dev/null +++ b/drivers/infiniband/hw/ipath/Makefile @@ -0,0 +1,15 @@ +EXTRA_CFLAGS += -Idrivers/infiniband/include + +EXTRA_CFLAGS += -Wall -O3 -g3 + +_ipath_idstr:="$$""Id: kernel.org InfiniPath Release 1.1 $$"" $$""Date: $(shell date +%F-%R)"" $$" +EXTRA_CFLAGS += -D_IPATH_IDSTR='$(_ipath_idstr)' -DIPATH_KERN_TYPE=0 + +obj-$(CONFIG_IPATH_CORE) += ipath_core.o +obj-$(CONFIG_INFINIBAND_IPATH) += ib_ipath.o + +ipath_core-objs := ipath_copy.o ipath_driver.o \ + ipath_dwordcpy.o ipath_ht400.o ipath_i2c.o ipath_layer.o \ + ipath_lib.o ipath_mlock.o + +ib_ipath-objs := ipath_mad.o ipath_verbs.o -- 0.99.9n From rolandd at cisco.com Fri Dec 16 15:48:55 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 16 Dec 2005 15:48:55 -0800 Subject: [openib-general] [PATCH 12/13] [RFC] ipath verbs MAD handling In-Reply-To: <200512161548.mhIvDiba3wkjPaMc@cisco.com> Message-ID: <200512161548.MdcxE8ZQTy1yj4v1@cisco.com> MAD handling for ipath verbs driver --- drivers/infiniband/hw/ipath/ipath_mad.c | 1020 +++++++++++++++++++++++++++++++ 1 files changed, 1020 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/hw/ipath/ipath_mad.c 21556dcebce3886215a02ff1b730a60beea53125 diff --git a/drivers/infiniband/hw/ipath/ipath_mad.c b/drivers/infiniband/hw/ipath/ipath_mad.c new file mode 100644 index 0000000..361c7fb --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_mad.c @@ -0,0 +1,1020 @@ +/* + * Copyright (c) 2005. PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + * + * $Id: ipath_mad.c 4491 2005-12-15 22:20:31Z rjwalsh $ + */ + +#include +#include + +#include "ips_common.h" +#include "ipath_verbs.h" +#include "ipath_layer.h" + + +#define IB_SMP_INVALID_FIELD __constant_htons(0x001C) + +static int reply(struct ib_smp *smp, int line) +{ + + /* + * The verbs framework will handle the directed/LID route + * packet changes. 
+ */ + smp->method = IB_MGMT_METHOD_GET_RESP; + if (smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) + smp->status |= IB_SMP_DIRECTION; + return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY; +} + +static inline int recv_subn_get_nodedescription(struct ib_smp *smp) +{ + + strncpy(smp->data, "Infinipath", sizeof(smp->data)); + + return reply(smp, __LINE__); +} + +struct nodeinfo { + u8 base_version; + u8 class_version; + u8 node_type; + u8 num_ports; + __be64 sys_guid; + __be64 node_guid; + __be64 port_guid; + __be16 partition_cap; + __be16 device_id; + __be32 revision; + u8 local_port_num; + u8 vendor_id[3]; +} __attribute__ ((packed)); + +/* + * XXX The num_ports value will need a layer function to get the value + * if we ever have more than one IB port on a chip. + * We will also need to get the GUID for the port. + */ +static inline int recv_subn_get_nodeinfo(struct ib_smp *smp, + struct ib_device *ibdev, u8 port) +{ + struct nodeinfo *nip = (struct nodeinfo *)&smp->data; + ipath_type t = to_idev(ibdev)->ib_unit; + uint32_t vendor, boardid, majrev, minrev; + + nip->base_version = 1; + nip->class_version = 1; + nip->node_type = 1; /* channel adapter */ + nip->num_ports = 1; + /* This is already in network order */ + nip->sys_guid = to_idev(ibdev)->sys_image_guid; + nip->node_guid = ipath_layer_get_guid(t); + nip->port_guid = nip->sys_guid; + nip->partition_cap = cpu_to_be16(ipath_layer_get_npkeys(t)); + nip->device_id = cpu_to_be16(ipath_layer_get_deviceid(t)); + ipath_layer_query_device(t, &vendor, &boardid, &majrev, &minrev); + nip->revision = cpu_to_be32((majrev << 16) | minrev); + nip->local_port_num = port; + nip->vendor_id[0] = 0; + nip->vendor_id[1] = vendor >> 8; + nip->vendor_id[2] = vendor; + + return reply(smp, __LINE__); +} + +static int recv_subn_get_guidinfo(struct ib_smp *smp, struct ib_device *ibdev) +{ + uint32_t t = to_idev(ibdev)->ib_unit; + u32 startgx = 8 * be32_to_cpu(smp->attr_mod); + u64 *p = (u64 *) smp->data; + + /* 32 blocks of 8 64-bit GUIDs per block */ + + memset(smp->data, 0, sizeof(smp->data)); + + /* + * We only support one GUID for now. + * If this changes, the portinfo.guid_cap field needs to be updated too. + */ + if (startgx == 0) { + /* The first is a copy of the read-only HW GUID. 
*/ + *p = ipath_layer_get_guid(t); + } + + return reply(smp, __LINE__); +} + +struct port_info { + __be64 mkey; + __be64 gid_prefix; + __be16 lid; + __be16 sm_lid; + __be32 cap_mask; + __be16 diag_code; + __be16 mkey_lease_period; + u8 local_port_num; + u8 link_width_enabled; + u8 link_width_supported; + u8 link_width_active; + u8 linkspeed_portstate; /* 4 bits, 4 bits */ + u8 portphysstate_linkdown; /* 4 bits, 4 bits */ + u8 mkeyprot_resv_lmc; /* 2 bits, 3 bits, 3 bits */ + u8 linkspeedactive_enabled; /* 4 bits, 4 bits */ + u8 neighbormtu_mastersmsl; /* 4 bits, 4 bits */ + u8 vlcap_inittype; /* 4 bits, 4 bits */ + u8 vl_high_limit; + u8 vl_arb_high_cap; + u8 vl_arb_low_cap; + u8 inittypereply_mtucap; /* 4 bits, 4 bits */ + u8 vlstallcnt_hoqlife; /* 3 bits, 5 bits */ + u8 operationalvl_pei_peo_fpi_fpo; /* 4 bits, 1, 1, 1, 1 */ + __be16 mkey_violations; + __be16 pkey_violations; + __be16 qkey_violations; + u8 guid_cap; + u8 clientrereg_resv_subnetto; /* 1 bit, 2 bits, 5 bits */ + u8 resv_resptimevalue; /* 3 bits, 5 bits */ + u8 localphyerrors_overrunerrors; /* 4 bits, 4 bits */ + __be16 max_credit_hint; + u8 resv; + u8 link_roundtrip_latency[3]; +} __attribute__ ((packed)); + +static int recv_subn_get_portinfo(struct ib_smp *smp, struct ib_device *ibdev, + u8 port) +{ + u32 lportnum = be32_to_cpu(smp->attr_mod); + struct ipath_ibdev *dev; + struct port_info *pip = (struct port_info *)smp->data; + u32 tmp, tmp2; + + if (lportnum == 0) { + lportnum = port; + smp->attr_mod = cpu_to_be32(lportnum); + } + + if (lportnum < 1 || lportnum > ibdev->phys_port_cnt) + return IB_MAD_RESULT_FAILURE; + + dev = to_idev(ibdev); + + /* Clear all fields. Only set the non-zero fields. */ + memset(smp->data, 0, sizeof(smp->data)); + + /* Only return the mkey if the protection field allows it. */ + if ((dev->mkeyprot_resv_lmc >> 6) == 0) + pip->mkey = dev->mkey; + else + pip->mkey = 0; + pip->gid_prefix = dev->gid_prefix; + tmp = ipath_layer_get_lid(dev->ib_unit); + pip->lid = tmp ? 
cpu_to_be16(tmp) : IB_LID_PERMISSIVE; + pip->sm_lid = cpu_to_be16(dev->sm_lid); + pip->cap_mask = cpu_to_be32(dev->port_cap_flags); + /* pip->diag_code; */ + pip->mkey_lease_period = cpu_to_be16(dev->mkey_lease_period); + pip->local_port_num = port; + pip->link_width_enabled = 2; /* 4x */ + pip->link_width_supported = 3; /* 1x or 4x */ + pip->link_width_active = 2; /* 4x */ + pip->linkspeed_portstate = 0x10; /* 2.5Gbps */ + tmp = ipath_layer_get_lastibcstat(dev->ib_unit) & 0xff; + tmp2 = 5; /* link up */ + if (tmp == 0x11) + pip->linkspeed_portstate |= 2; /* initialize */ + else if (tmp == 0x21) + pip->linkspeed_portstate |= 3; /* armed */ + else if (tmp == 0x31) + pip->linkspeed_portstate |= 4; /* active */ + else { + pip->linkspeed_portstate |= 1; /* down */ + tmp2 = tmp & 0xf; + } + /* default state is polling */ + pip->portphysstate_linkdown = (tmp2 << 4) | 2; + pip->mkeyprot_resv_lmc = dev->mkeyprot_resv_lmc; + pip->linkspeedactive_enabled = 0x11; /* 2.5Gbps, 2.5Gbps */ + switch (ipath_layer_get_ibmtu(dev->ib_unit)) { + case 4096: + tmp = IB_MTU_4096; + break; + case 2048: + tmp = IB_MTU_2048; + break; + case 1024: + tmp = IB_MTU_1024; + break; + case 512: + tmp = IB_MTU_512; + break; + case 256: + tmp = IB_MTU_256; + break; + default: /* oops, something is wrong */ + tmp = IB_MTU_2048; + break; + } + pip->neighbormtu_mastersmsl = (tmp << 4) | dev->sm_sl; + pip->vlcap_inittype = 0x10; /* VLCap = VL0, InitType = 0 */ + /* pip->vl_high_limit; // only one VL */ + /* pip->vl_arb_high_cap; // only one VL */ + /* pip->vl_arb_low_cap; // only one VL */ + pip->inittypereply_mtucap = IB_MTU_4096; /* InitTypeReply = 0 */ + /* pip->vlstallcnt_hoqlife; // HCAs ignore VLStallCount and HOQLife */ + pip->operationalvl_pei_peo_fpi_fpo = 0x18; /* OVLs = 1, PEI = 1 */ + pip->mkey_violations = cpu_to_be16(dev->mkey_violations); + /* P_KeyViolations are counted by hardware. */ + tmp = ipath_layer_get_cr_errpkey(dev->ib_unit) & 0xFFFF; + pip->pkey_violations = cpu_to_be16(tmp); + pip->qkey_violations = cpu_to_be16(dev->qkey_violations); + /* Only the hardware GUID is supported for now */ + pip->guid_cap = 1; + pip->clientrereg_resv_subnetto = dev->subnet_timeout; + /* 32.768 usec. response time (guessing) */ + pip->resv_resptimevalue = 3; + /* LocalPhyErrors=max, OverRunErrors=max */ + pip->localphyerrors_overrunerrors = 0xFF; + /* pip->max_credit_hint; */ + /* pip->link_roundtrip_latency[3]; */ + + return reply(smp, __LINE__); +} + +static int recv_subn_get_pkeytable(struct ib_smp *smp, struct ib_device *ibdev) +{ + u32 startpx = 32 * (be32_to_cpu(smp->attr_mod) & 0xffff); + u16 *p = (u16 *) smp->data; + + /* 64 blocks of 32 16-bit P_Key entries */ + + memset(smp->data, 0, sizeof(smp->data)); + if (startpx == 0) + ipath_layer_get_pkeys(to_idev(ibdev)->ib_unit, p); + else + smp->status |= IB_SMP_INVALID_FIELD; + + return reply(smp, __LINE__); +} + +static inline int recv_subn_set_guidinfo(struct ib_smp *smp, + struct ib_device *ibdev) +{ + /* The only GUID we support is the first read-only entry. 
*/ + return recv_subn_get_guidinfo(smp, ibdev); +} + +static inline int recv_subn_set_portinfo(struct ib_smp *smp, + struct ib_device *ibdev, u8 port) +{ + struct port_info *pip = (struct port_info *)smp->data; + uint32_t lportnum = be32_to_cpu(smp->attr_mod); + struct ib_event event; + struct ipath_ibdev *dev; + uint32_t flags; + char clientrereg = 0; + u32 tmp; + u32 tmp2; + int ret; + + if (lportnum == 0) { + lportnum = port; + smp->attr_mod = cpu_to_be32(lportnum); + } + + if (lportnum < 1 || lportnum > ibdev->phys_port_cnt) + return IB_MAD_RESULT_FAILURE; + + dev = to_idev(ibdev); + event.device = ibdev; + event.element.port_num = port; + + if (dev->mkey != pip->mkey) + dev->mkey = pip->mkey; + + if (pip->gid_prefix != dev->gid_prefix) + dev->gid_prefix = pip->gid_prefix; + + tmp = be16_to_cpu(pip->lid); + if (tmp != ipath_layer_get_lid(dev->ib_unit)) { + ipath_set_sps_lid(dev->ib_unit, tmp); + event.event = IB_EVENT_LID_CHANGE; + ib_dispatch_event(&event); + } + + tmp = be16_to_cpu(pip->sm_lid); + if (tmp != dev->sm_lid) { + dev->sm_lid = tmp; + event.event = IB_EVENT_SM_CHANGE; + ib_dispatch_event(&event); + } + + dev->mkey_lease_period = be16_to_cpu(pip->mkey_lease_period); + +#if 0 + tmp = pip->link_width_enabled; + if (tmp && (tmp != lpp->linkwidthenabled)) { + lpp->linkwidthenabled = tmp; + /* JAG - notify driver here */ + } +#endif + + tmp = pip->linkspeed_portstate & 0xF; + flags = ipath_layer_get_flags(dev->ib_unit); + if (flags & IPATH_LINKDOWN) + tmp2 = IB_PORT_DOWN; + else if (flags & IPATH_LINKINIT) + tmp2 = IB_PORT_INIT; + else if (flags & IPATH_LINKARMED) + tmp2 = IB_PORT_ARMED; + else if (flags & IPATH_LINKACTIVE) + tmp2 = IB_PORT_ACTIVE; + else + tmp2 = IB_PORT_NOP; + if (tmp && tmp != tmp2) { + switch (tmp) { + case IB_PORT_DOWN: + case IB_PORT_INIT: + ipath_kset_linkstate(dev->ib_unit << 16 | + IPATH_IB_LINKDOWN); + if (tmp2 == IB_PORT_ACTIVE) { + event.event = IB_EVENT_PORT_ERR; + ib_dispatch_event(&event); + } + break; + + case IB_PORT_ARMED: + ipath_kset_linkstate(dev->ib_unit << 16 | + IPATH_IB_LINKARM); + if (tmp2 == IB_PORT_ACTIVE) { + event.event = IB_EVENT_PORT_ERR; + ib_dispatch_event(&event); + } + break; + + case IB_PORT_ACTIVE: + ipath_kset_linkstate(dev->ib_unit << 16 | + IPATH_IB_LINKACTIVE); + event.event = IB_EVENT_PORT_ACTIVE; + ib_dispatch_event(&event); + break; + + default: + /* XXX We have already partially updated our state! */ + return IB_MAD_RESULT_FAILURE; + } + } +#if 0 + tmp = BF_GET(g.madp, iba_Subn_PortInfo, FIELD_PortPhysicalState); + if (tmp && (tmp != lpp->portphysicalstate)) { + lpp->portphysicalstate = tmp; + /* JAG - notify driver here */ + } + + tmp = BF_GET(g.madp, iba_Subn_PortInfo, FIELD_LinkDownDefaultState); + if (tmp && (tmp != lpp->linkdowndefaultstate)) { + lpp->linkdowndefaultstate = tmp; + /* JAG - notify driver here */ + } +#endif + + dev->mkeyprot_resv_lmc = pip->mkeyprot_resv_lmc; + +#if 0 + tmp = BF_GET(g.madp, iba_Subn_PortInfo, FIELD_LinkSpeedEnabled); + if (tmp && (tmp != lpp->linkspeedenabled)) { + lpp->linkspeedenabled = tmp; + /* JAG - notify driver here */ + } +#endif + + tmp = (pip->neighbormtu_mastersmsl >> 4) & 0xF; + if (tmp) { + switch (tmp) { + case IB_MTU_256: + tmp2 = 256; + break; + case IB_MTU_512: + tmp2 = 512; + break; + case IB_MTU_1024: + tmp2 = 1024; + break; + case IB_MTU_2048: + tmp2 = 2048; + break; + case IB_MTU_4096: + tmp2 = 4096; + break; + default: + /* XXX We have already partially updated our state! 
*/ + return IB_MAD_RESULT_FAILURE; + } + + ipath_kset_mtu(dev->ib_unit << 16 | tmp2); + } + + dev->sm_sl = pip->neighbormtu_mastersmsl & 0xF; + +#if 0 + tmp = BF_GET(g.madp, iba_Subn_PortInfo, FIELD_VLHighLimit); + if (tmp != lpp->vlhighlimit) { + lpp->vlhighlimit = tmp; + /* JAG - notify driver here */ + } + + lpp->inittypereply = + BF_GET(g.madp, iba_Subn_PortInfo, FIELD_InitTypeReply); + + tmp = BF_GET(g.madp, iba_Subn_PortInfo, FIELD_OperationalVLs); + if (tmp && (tmp != lpp->operationalvls)) { + lpp->operationalvls = tmp; + /* JAG - notify driver here */ + } +#endif + + if (pip->mkey_violations != 0) + dev->mkey_violations = 0; +#if 0 + /* XXX Hardware counter can't be reset. */ + if (pip->pkey_violations != 0) + dev->pkey_violations = 0; +#endif + + if (pip->qkey_violations != 0) + dev->qkey_violations = 0; + +#if 0 + tmp = BF_GET(g.madp, iba_Subn_PortInfo, FIELD_LocalPhyErrors); + if (tmp != lpp->localphyerrors) { + lpp->localphyerrors = tmp; + /* JAG - notify driver here */ + } + + tmp = BF_GET(g.madp, iba_Subn_PortInfo, FIELD_OverrunErrors); + if (tmp != lpp->overrunerrors) { + lpp->overrunerrors = tmp; + /* JAG - notify driver here */ + } +#endif + + dev->subnet_timeout = pip->clientrereg_resv_subnetto & 0x1F; + + if (pip->clientrereg_resv_subnetto & 0x80) { + clientrereg = 1; + event.event = IB_EVENT_LID_CHANGE; + ib_dispatch_event(&event); + } + + ret = recv_subn_get_portinfo(smp, ibdev, port); + + if (clientrereg) + pip->clientrereg_resv_subnetto |= 0x80; + + return ret; +} + +static inline int recv_subn_set_pkeytable(struct ib_smp *smp, + struct ib_device *ibdev) +{ + u32 startpx = 32 * (be32_to_cpu(smp->attr_mod) & 0xffff); + u16 *p = (u16 *) smp->data; + + if (startpx != 0 || + ipath_layer_set_pkeys(to_idev(ibdev)->ib_unit, p) != 0) + smp->status |= IB_SMP_INVALID_FIELD; + + return recv_subn_get_pkeytable(smp, ibdev); +} + +#define IB_PMA_CLASS_PORT_INFO __constant_htons(0x0001) +#define IB_PMA_PORT_SAMPLES_CONTROL __constant_htons(0x0010) +#define IB_PMA_PORT_SAMPLES_RESULT __constant_htons(0x0011) +#define IB_PMA_PORT_COUNTERS __constant_htons(0x0012) +#define IB_PMA_PORT_COUNTERS_EXT __constant_htons(0x001D) +#define IB_PMA_PORT_SAMPLES_RESULT_EXT __constant_htons(0x001E) + +struct ib_perf { + u8 base_version; + u8 mgmt_class; + u8 class_version; + u8 method; + __be16 status; + __be16 unused; + __be64 tid; + __be16 attr_id; + __be16 resv; + __be32 attr_mod; + u8 reserved[40]; + u8 data[192]; +} __attribute__ ((packed)); + +struct ib_pma_classportinfo { + u8 base_version; + u8 class_version; + __be16 cap_mask; + u8 reserved[3]; + u8 resp_time_value; /* only lower 5 bits */ + union ib_gid redirect_gid; + __be32 redirect_tc_sl_fl; /* 8, 4, 20 bits respectively */ + __be16 redirect_lid; + __be16 redirect_pkey; + __be32 redirect_qp; /* only lower 24 bits */ + __be32 redirect_qkey; + union ib_gid trap_gid; + __be32 trap_tc_sl_fl; /* 8, 4, 20 bits respectively */ + __be16 trap_lid; + __be16 trap_pkey; + __be32 trap_hl_qp; /* 8, 24 bits respectively */ + __be32 trap_qkey; +} __attribute__ ((packed)); + +struct ib_pma_portsamplescontrol { + u8 opcode; + u8 port_select; + u8 tick; + u8 counter_width; /* only lower 3 bits */ + __be32 counter_mask0_9; /* 2, 10 * 3, bits */ + __be16 counter_mask10_14; /* 1, 5 * 3, bits */ + u8 sample_mechanisms; + u8 sample_status; /* only lower 2 bits */ + __be64 option_mask; + __be64 vendor_mask; + __be32 sample_start; + __be32 sample_interval; + __be16 tag; + __be16 counter_select[15]; +} __attribute__ ((packed)); + +struct 
ib_pma_portsamplesresult { + __be16 tag; + __be16 sample_status; /* only lower 2 bits */ + __be32 counter[15]; +} __attribute__ ((packed)); + +struct ib_pma_portsamplesresult_ext { + __be16 tag; + __be16 sample_status; /* only lower 2 bits */ + __be32 extended_width; /* only upper 2 bits */ + __be64 counter[15]; +} __attribute__ ((packed)); + +struct ib_pma_portcounters { + u8 reserved; + u8 port_select; + __be16 counter_select; + __be16 symbol_error_counter; + u8 link_error_recovery_counter; + u8 link_downed_counter; + __be16 port_rcv_errors; + __be16 port_rcv_remphys_errors; + __be16 port_rcv_switch_relay_errors; + __be16 port_xmit_discards; + u8 port_xmit_constraint_errors; + u8 port_rcv_constraint_errors; + u8 reserved1; + u8 lli_ebor_errors; /* 4, 4, bits */ + __be16 reserved2; + __be16 vl15_dropped; + __be32 port_xmit_data; + __be32 port_rcv_data; + __be32 port_xmit_packets; + __be32 port_rcv_packets; +} __attribute__ ((packed)); + +struct ib_pma_portcounters_ext { + u8 reserved; + u8 port_select; + __be16 counter_select; + __be32 reserved1; + __be64 port_xmit_data; + __be64 port_rcv_data; + __be64 port_xmit_packets; + __be64 port_rcv_packets; + __be64 port_unicast_xmit_packets; + __be64 port_unicast_rcv_packets; + __be64 port_multicast_xmit_packets; + __be64 port_multicast_rcv_packets; +} __attribute__ ((packed)); + +static int recv_pma_get_classportinfo(struct ib_perf *pmp) +{ + /* + struct ib_pma_classportinfo *p = + (struct ib_pma_classportinfo *)pmp->data; + */ + + memset(pmp->data, 0, sizeof(pmp->data)); + + return reply((struct ib_smp *)pmp, __LINE__); +} + +static int recv_pma_get_portsamplescontrol(struct ib_perf *pmp, + struct ib_device *ibdev, u8 port) +{ + struct ib_pma_portsamplescontrol *p = + (struct ib_pma_portsamplescontrol *)pmp->data; + struct ipath_ibdev *dev = to_idev(ibdev); + unsigned long flags; + + memset(pmp->data, 0, sizeof(pmp->data)); + + p->port_select = port; + p->tick = 0xFA; /* 1 ms. 
*/ + p->counter_width = 4; /* 32 bit counters */ + p->counter_mask0_9 = __constant_htonl(0x09248000); /* counters 0-4 */ + spin_lock_irqsave(&dev->pending_lock, flags); + p->sample_status = dev->pma_sample_status; + p->sample_start = cpu_to_be32(dev->pma_sample_start); + p->sample_interval = cpu_to_be32(dev->pma_sample_interval); + p->tag = cpu_to_be16(dev->pma_tag); + p->counter_select[0] = dev->pma_counter_select[0]; + p->counter_select[1] = dev->pma_counter_select[1]; + p->counter_select[2] = dev->pma_counter_select[2]; + p->counter_select[3] = dev->pma_counter_select[3]; + p->counter_select[4] = dev->pma_counter_select[4]; + spin_unlock_irqrestore(&dev->pending_lock, flags); + + return reply((struct ib_smp *)pmp, __LINE__); +} + +static int recv_pma_set_portsamplescontrol(struct ib_perf *pmp, + struct ib_device *ibdev, u8 port) +{ + struct ib_pma_portsamplescontrol *p = + (struct ib_pma_portsamplescontrol *)pmp->data; + struct ipath_ibdev *dev = to_idev(ibdev); + unsigned long flags; + u32 start = be32_to_cpu(p->sample_start); + + if (pmp->attr_mod == 0 && p->port_select == port && start != 0) { + spin_lock_irqsave(&dev->pending_lock, flags); + if (dev->pma_sample_status == IB_PMA_SAMPLE_STATUS_DONE) { + dev->pma_sample_status = IB_PMA_SAMPLE_STATUS_STARTED; + dev->pma_sample_start = start; + dev->pma_sample_interval = + be32_to_cpu(p->sample_interval); + dev->pma_tag = be16_to_cpu(p->tag); + if (p->counter_select[0]) + dev->pma_counter_select[0] = + p->counter_select[0]; + if (p->counter_select[1]) + dev->pma_counter_select[1] = + p->counter_select[1]; + if (p->counter_select[2]) + dev->pma_counter_select[2] = + p->counter_select[2]; + if (p->counter_select[3]) + dev->pma_counter_select[3] = + p->counter_select[3]; + if (p->counter_select[4]) + dev->pma_counter_select[4] = + p->counter_select[4]; + } + spin_unlock_irqrestore(&dev->pending_lock, flags); + } + return recv_pma_get_portsamplescontrol(pmp, ibdev, port); +} + +static u64 get_counter(struct ipath_ibdev *dev, __be16 sel) +{ + switch (sel) { + case IB_PMA_PORT_XMIT_DATA: + return dev->ipath_sword; + case IB_PMA_PORT_RCV_DATA: + return dev->ipath_rword; + case IB_PMA_PORT_XMIT_PKTS: + return dev->ipath_spkts; + case IB_PMA_PORT_RCV_PKTS: + return dev->ipath_rpkts; + case IB_PMA_PORT_XMIT_WAIT: + default: + return 0; + } +} + +static int recv_pma_get_portsamplesresult(struct ib_perf *pmp, + struct ib_device *ibdev) +{ + struct ib_pma_portsamplesresult *p = + (struct ib_pma_portsamplesresult *)pmp->data; + struct ipath_ibdev *dev = to_idev(ibdev); + int i; + + memset(pmp->data, 0, sizeof(pmp->data)); + p->tag = cpu_to_be16(dev->pma_tag); + p->sample_status = cpu_to_be16(dev->pma_sample_status); + for (i = 0; i < ARRAY_SIZE(dev->pma_counter_select); i++) + p->counter[i] = + cpu_to_be32(get_counter(dev, dev->pma_counter_select[i])); + + return reply((struct ib_smp *)pmp, __LINE__); +} + +static int recv_pma_get_portsamplesresult_ext(struct ib_perf *pmp, + struct ib_device *ibdev) +{ + struct ib_pma_portsamplesresult_ext *p = + (struct ib_pma_portsamplesresult_ext *)pmp->data; + struct ipath_ibdev *dev = to_idev(ibdev); + int i; + + memset(pmp->data, 0, sizeof(pmp->data)); + p->tag = cpu_to_be16(dev->pma_tag); + p->sample_status = cpu_to_be16(dev->pma_sample_status); + p->extended_width = __constant_cpu_to_be32(0x80000000); /* 64 bits */ + for (i = 0; i < ARRAY_SIZE(dev->pma_counter_select); i++) + p->counter[i] = + cpu_to_be64(get_counter(dev, dev->pma_counter_select[i])); + + return reply((struct ib_smp *)pmp, __LINE__); +} +
+static int recv_pma_get_portcounters(struct ib_perf *pmp, + struct ib_device *ibdev, u8 port) +{ + struct ib_pma_portcounters *p = (struct ib_pma_portcounters *)pmp->data; + struct ipath_layer_counters cntrs; + + ipath_layer_get_counters(to_idev(ibdev)->ib_unit, &cntrs); + + memset(pmp->data, 0, sizeof(pmp->data)); + p->port_select = port; + if (cntrs.symbol_error_counter > 0xFFFFUL) + p->symbol_error_counter = 0xFFFF; + else + p->symbol_error_counter = + cpu_to_be16((u16)cntrs.symbol_error_counter); + if (cntrs.link_error_recovery_counter > 0xFFUL) + p->link_error_recovery_counter = 0xFF; + else + p->link_error_recovery_counter = + (u8)cntrs.link_error_recovery_counter; + if (cntrs.link_downed_counter > 0xFFUL) + p->link_downed_counter = 0xFF; + else + p->link_downed_counter = (u8)cntrs.link_downed_counter; + if (cntrs.port_rcv_errors > 0xFFFFUL) + p->port_rcv_errors = 0xFFFF; + else + p->port_rcv_errors = cpu_to_be16((u16)cntrs.port_rcv_errors); + if (cntrs.port_rcv_remphys_errors > 0xFFFFUL) + p->port_rcv_remphys_errors = 0xFFFF; + else + p->port_rcv_remphys_errors = + cpu_to_be16((u16)cntrs.port_rcv_remphys_errors); + if (cntrs.port_xmit_discards > 0xFFFFUL) + p->port_xmit_discards = 0xFFFF; + else + p->port_xmit_discards = + cpu_to_be16((u16)cntrs.port_xmit_discards); + if (cntrs.port_xmit_data > 0xFFFFFFFFUL) + p->port_xmit_data = 0xFFFFFFFF; + else + p->port_xmit_data = cpu_to_be32((u32)cntrs.port_xmit_data); + if (cntrs.port_rcv_data > 0xFFFFFFFFUL) + p->port_rcv_data = 0xFFFFFFFF; + else + p->port_rcv_data = cpu_to_be32((u32)cntrs.port_rcv_data); + if (cntrs.port_xmit_packets > 0xFFFFFFFFUL) + p->port_xmit_packets = 0xFFFFFFFF; + else + p->port_xmit_packets = + cpu_to_be32((u32)cntrs.port_xmit_packets); + if (cntrs.port_rcv_packets > 0xFFFFFFFFUL) + p->port_rcv_packets = 0xFFFFFFFF; + else + p->port_rcv_packets = cpu_to_be32((u32)cntrs.port_rcv_packets); + + return reply((struct ib_smp *)pmp, __LINE__); +} + +static int recv_pma_get_portcounters_ext(struct ib_perf *pmp, + struct ib_device *ibdev, u8 port) +{ + struct ib_pma_portcounters_ext *p = + (struct ib_pma_portcounters_ext *)pmp->data; + struct ipath_ibdev *dev = to_idev(ibdev); + u64 swords, rwords, spkts, rpkts; + + ipath_layer_snapshot_counters(to_idev(ibdev)->ib_unit, + &swords, &rwords, &spkts, &rpkts); + + memset(pmp->data, 0, sizeof(pmp->data)); + p->port_select = port; + p->port_xmit_data = cpu_to_be64(swords); + p->port_rcv_data = cpu_to_be64(rwords); + p->port_xmit_packets = cpu_to_be64(spkts); + p->port_rcv_packets = cpu_to_be64(rpkts); + p->port_unicast_xmit_packets = + cpu_to_be64(spkts - dev->n_multicast_xmit); + p->port_unicast_rcv_packets = + cpu_to_be64(rpkts - dev->n_multicast_rcv); + p->port_multicast_xmit_packets = cpu_to_be64(dev->n_multicast_xmit); + p->port_multicast_rcv_packets = cpu_to_be64(dev->n_multicast_rcv); + + return reply((struct ib_smp *)pmp, __LINE__); +} + +static int recv_pma_set_portcounters(struct ib_perf *pmp, + struct ib_device *ibdev, u8 port) +{ + /* XXX HW counters can't be cleared. */ + return recv_pma_get_portcounters(pmp, ibdev, port); +} + +static int recv_pma_set_portcounters_ext(struct ib_perf *pmp, + struct ib_device *ibdev, u8 port) +{ + /* XXX HW counters can't be cleared. 
*/ + return recv_pma_get_portcounters_ext(pmp, ibdev, port); +} + +static inline int process_subn(struct ib_device *ibdev, int mad_flags, + u8 port_num, struct ib_mad *in_mad, + struct ib_mad *out_mad) +{ + struct ib_smp *smp = (struct ib_smp *)out_mad; + struct ipath_ibdev *dev = to_idev(ibdev); + + /* Is the mkey in the process of expiring? */ + if (dev->mkey_lease_timeout && jiffies >= dev->mkey_lease_timeout) { + dev->mkey_lease_timeout = 0; + dev->mkeyprot_resv_lmc &= 0x3F; + } + + /* + * M_Key checking depends on + * Portinfo:M_Key_protect_bits + */ + if ((mad_flags & IB_MAD_IGNORE_MKEY) == 0 && dev->mkey != 0 && + dev->mkey != smp->mkey && (smp->method != IB_MGMT_METHOD_GET || + (dev->mkeyprot_resv_lmc >> 7) != 0)) { + if (dev->mkey_violations != 0xFFFF) + ++dev->mkey_violations; + if (dev->mkey_lease_timeout || dev->mkey_lease_period == 0) + return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED; + dev->mkey_lease_timeout = jiffies + dev->mkey_lease_period * HZ; + /* Future: Generate a trap notice. */ + return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED; + } + + *out_mad = *in_mad; + switch (smp->method) { + case IB_MGMT_METHOD_GET: + switch (smp->attr_id) { + case IB_SMP_ATTR_NODE_DESC: + return recv_subn_get_nodedescription(smp); + + case IB_SMP_ATTR_NODE_INFO: + return recv_subn_get_nodeinfo(smp, ibdev, port_num); + + case IB_SMP_ATTR_GUID_INFO: + return recv_subn_get_guidinfo(smp, ibdev); + + case IB_SMP_ATTR_PORT_INFO: + return recv_subn_get_portinfo(smp, ibdev, port_num); + + case IB_SMP_ATTR_PKEY_TABLE: + return recv_subn_get_pkeytable(smp, ibdev); + + default: + break; + } + break; + + case IB_MGMT_METHOD_SET: + switch (smp->attr_id) { + case IB_SMP_ATTR_GUID_INFO: + return recv_subn_set_guidinfo(smp, ibdev); + + case IB_SMP_ATTR_PORT_INFO: + return recv_subn_set_portinfo(smp, ibdev, port_num); + + case IB_SMP_ATTR_PKEY_TABLE: + return recv_subn_set_pkeytable(smp, ibdev); + + default: + break; + } + break; + + default: + break; + } + return IB_MAD_RESULT_FAILURE; +} + +static inline int process_perf(struct ib_device *ibdev, u8 port_num, + struct ib_mad *in_mad, struct ib_mad *out_mad) +{ + struct ib_perf *pmp = (struct ib_perf *)out_mad; + + *out_mad = *in_mad; + switch (pmp->method) { + case IB_MGMT_METHOD_GET: + switch (pmp->attr_id) { + case IB_PMA_CLASS_PORT_INFO: + return recv_pma_get_classportinfo(pmp); + + case IB_PMA_PORT_SAMPLES_CONTROL: + return recv_pma_get_portsamplescontrol(pmp, ibdev, + port_num); + + case IB_PMA_PORT_SAMPLES_RESULT: + return recv_pma_get_portsamplesresult(pmp, ibdev); + + case IB_PMA_PORT_SAMPLES_RESULT_EXT: + return recv_pma_get_portsamplesresult_ext(pmp, ibdev); + + case IB_PMA_PORT_COUNTERS: + return recv_pma_get_portcounters(pmp, ibdev, port_num); + + case IB_PMA_PORT_COUNTERS_EXT: + return recv_pma_get_portcounters_ext(pmp, ibdev, + port_num); + + default: + break; + } + break; + + case IB_MGMT_METHOD_SET: + switch (pmp->attr_id) { + case IB_PMA_PORT_SAMPLES_CONTROL: + return recv_pma_set_portsamplescontrol(pmp, ibdev, + port_num); + + case IB_PMA_PORT_COUNTERS: + return recv_pma_set_portcounters(pmp, ibdev, port_num); + + case IB_PMA_PORT_COUNTERS_EXT: + return recv_pma_set_portcounters_ext(pmp, ibdev, + port_num); + + default: + break; + } + break; + + default: + break; + } + return IB_MAD_RESULT_FAILURE; +} + +/* + * Note that the verbs framework has already done the MAD sanity checks, + * and hop count/pointer updating for IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE MADs. 
+ *
+ * Return IB_MAD_RESULT_SUCCESS if this is a MAD that we are not interested
+ * in processing.
+ */
+int ipath_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
+ struct ib_wc *in_wc, struct ib_grh *in_grh,
+ struct ib_mad *in_mad, struct ib_mad *out_mad)
+{
+ switch (in_mad->mad_hdr.mgmt_class) {
+ case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE:
+ case IB_MGMT_CLASS_SUBN_LID_ROUTED:
+ return process_subn(ibdev, mad_flags, port_num,
+ in_mad, out_mad);
+
+ case IB_MGMT_CLASS_PERF_MGMT:
+ return process_perf(ibdev, port_num, in_mad, out_mad);
+
+ default:
+ return IB_MAD_RESULT_SUCCESS;
+ }
+}

--
0.99.9n

From robert.j.woodruff at intel.com Fri Dec 16 16:37:17 2005
From: robert.j.woodruff at intel.com (Bob Woodruff)
Date: Fri, 16 Dec 2005 16:37:17 -0800
Subject: [openib-general] SDP problem on SVN 4507
In-Reply-To: <20051215220214.GA31463@mellanox.co.il>
Message-ID:

I am seeing a strange problem with SDP on SVN4507. When I run NetPIPE over SDP by itself, it runs just fine. However, if I run MPI over uDAPL/CMA at the same time, I seem to have a problem. I start 2 copies of MPI running the Intel MPI benchmark. Then, I start the NetPIPE server and it starts listening for a connect request. Then I start the client side and it fails (on the connect() call) with an errno of 111. If I then stop the MPI/uDAPL/CMA jobs, SDP/NetPIPE can then connect OK. Not sure if this is an SDP issue or some problem with CMA/CM.

Has anyone else seen similar behavior?

woody

From ianjiang.ict at gmail.com Sat Dec 17 00:43:42 2005
From: ianjiang.ict at gmail.com (Ian Jiang)
Date: Sat, 17 Dec 2005 16:43:42 +0800
Subject: [openib-general] Re: [kDAPL]questions about the LMR creation of different types of memory
In-Reply-To: <20051216180338.GC8493@esmail.cup.hp.com>
References: <7b2fa1820512080626kf4c9c23hdc3f416dcb970f6d@mail.gmail.com> <7b2fa1820512081742j7ef50a27kc2322cbf0e52d908@mail.gmail.com> <20051216180338.GC8493@esmail.cup.hp.com>
Message-ID: <7b2fa1820512170043y7ae0e0ccrc577733b708b6399@mail.gmail.com>

Hi Grant,

Thanks very much. I scanned IO-mapping.txt, DMA-API.txt and DMA-mapping.txt as soon as I could and now have a grasp of the main concepts. As you mentioned, ULPs in OpenIB (e.g. SDP or IPoIB) are responsible for properly mapping and unmapping for DMA use.

AFAIK, SDP is implemented with the native IB verbs. What about kDAPL? In my opinion kDAPL does not do the mapping and unmapping work, so it is the responsibility of the kernel applications using kDAPL. Am I right?

On 12/17/05, Grant Grundler wrote:
>
> While IO-mapping.txt gives a nice introduction into the topic
> of "bus addresses", the answer to the question lies in
> Documentation/DMA-API.txt. IO devices can only use "bus addresses"
> that are handed back by the interfaces described in DMA-API.txt.
> For OpenIB, ULPs (e.g. SDP or IPoIB) are responsible for properly
> mapping and unmapping for DMA use.
>
> While many architectures don't use IOMMU (and thus have 1:1
> between host physical:bus address), virtualization seems to be
> forcing the issue in the "near" future. All DMA access will need
> to be enforced to isolate virtualized guests. This is something
> some platforms with IOMMUs enforce today (e.g. Sparc64, PPC64 and
> PA-RISC).
>
> hth,
> grant
>

--
Ian Jiang
ianjiang.ict at gmail.com
Laboratory of Spatial Information Technology
Division of System Architecture
Institute of Computing Technology
Chinese Academy of Sciences
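To make the mapping rule above concrete, the pattern Grant points at in DMA-API.txt boils down to roughly the sketch below. This is illustrative only; ulp_post_send() is a hypothetical stand-in for whatever verb the ULP or kDAPL consumer actually posts with.

    #include <linux/dma-mapping.h>

    /*
     * Sketch of the DMA-API.txt pattern: map a kernel buffer to get a
     * bus address, hand that address (never virt_to_phys()) to the HCA,
     * then unmap when the work completes.  ulp_post_send() is a
     * hypothetical helper, not a real OpenIB call, and a real ULP
     * would do the unmap from its completion handler.
     */
    static void ulp_send_sketch(struct device *dev, void *buf, size_t len)
    {
            dma_addr_t bus_addr;

            bus_addr = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
            ulp_post_send(bus_addr, len);   /* device DMAs from bus_addr */
            dma_unmap_single(dev, bus_addr, len, DMA_TO_DEVICE);
    }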
From penberg at cs.helsinki.fi Sat Dec 17 04:33:16 2005
From: penberg at cs.helsinki.fi (Pekka Enberg)
Date: Sat, 17 Dec 2005 14:33:16 +0200
Subject: [openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers
In-Reply-To: <200512161548.aLjaDpGm5aqk0k0p@cisco.com>
References: <200512161548.jRuyTS0HPMLd7V81@cisco.com> <200512161548.aLjaDpGm5aqk0k0p@cisco.com>
Message-ID: <84144f020512170433h151a7667o42c382242f81347b@mail.gmail.com>

Hi Roland,

On 12/17/05, Roland Dreier wrote:
> +/*
> + * This file contains defines, structures, etc. that are used
> + * to communicate between kernel and user code.
> + */
> +
> +#ifdef __KERNEL__
> +#include
> +#include
> +#include
> +#else /* !__KERNEL__; user mode */
> +#include
> +#include
> +#include
> +#include
> +
> +/* these aren't implemented for user mode, which is OK until we multi-thread */
> +typedef struct _atomic {
> + uint32_t counter;
> +} atomic_t; /* no atomic_t type in user-land */
> +#define atomic_set(a,v) ((a)->counter = (v))
> +#define atomic_inc_return(a) (++(a)->counter)
> +#define likely(x) (x)
> +#define unlikely(x) (x)
> +
> +#define yield() sched_yield()
> +
> +/*
> + * too horrible to try and use the kernel get_cycles() or equivalent,
> + * so define and inline it here
> + */
> +
> +#if !defined(rdtscll)
> +#if defined(__x86_64) || defined(__i386)
> +#define rdtscll(v) do {uint32_t a,d;asm volatile("rdtsc" : "=a" (a), "=d" (d)); \
> + (v) = ((uint64_t)a) | (((uint64_t)d)<<32); \
> +} while(0)
> +#else
> +#error "No cycle counter routine implemented yet for this platform"
> +#endif
> +#endif /* !defined(rdtscll) */

Do we really need this ugly userspace emulation code in the kernel?

> +/*
> + * this is used for very short copies, usually 1 - 8 bytes,
> + * *NEVER* to the PIO buffers!!!!!!! use ipath_dwordcpy for longer
> + * copies, or any copy to the PIO buffers. Works for 32 and 64 bit
> + * gcc and pathcc
> + */
> +static __inline__ void ipath_shortcopy(void *dest, void *src, uint32_t cnt)
> +{
> + void *ssv, *dsv;
> + uint32_t csv;
> + __asm__ __volatile__("cld\n\trep\n\tmovsb":"=&c"(csv), "=&D"(dsv),
> + "=&S"(ssv)
> + :"0"(cnt), "1"(dest), "2"(src)
> + :"memory");
> +}
> +
> +/*
> + * optimized word copy; good for rev C and later opterons. Among the best for
> + * short copies, and does as well or slightly better than the optimization
> + * guide copies 6 and 8 at 2KB.
> + */
> +void ipath_dwordcpy(uint32_t * dest, uint32_t * src, uint32_t ndwords);

What is this used for? Why can't you use memcpy?

> +#define round_up(v,sz) (((v) + (sz)-1) & ~((sz)-1))

Please use ALIGN().

> +/* These are used in the driver, don't use them elsewhere */
> +#define _IPATH_SIMFUNC_IOCTL_LOW 1
> +#define _IPATH_SIMFUNC_IOCTL_HIGH 7
> +
> +/*
> + * These tell the driver which ioctl's belong to the diags interface.
> + * As above, don't use them elsewhere.
> + */
> +#define _IPATH_DIAG_IOCTL_LOW 100
> +#define _IPATH_DIAG_IOCTL_HIGH 109

[snip, snip]

You seem to be introducing loads of new ioctls. Any reason you can't use sysfs and/or configfs?
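For reference, the sysfs route Pekka suggests would look something like this for one read-only value. It is purely a sketch: ipath_read_stats_word() is a made-up accessor standing in for whatever one of the ioctls returns today.

    static ssize_t show_stats_word(struct device *dev,
                                   struct device_attribute *attr, char *buf)
    {
            /* one value per attribute file, newline-terminated;
             * ipath_read_stats_word() is a hypothetical accessor */
            return sprintf(buf, "%llu\n",
                           (unsigned long long)ipath_read_stats_word(dev));
    }
    static DEVICE_ATTR(stats_word, S_IRUGO, show_stats_word, NULL);

A matching device_create_file(&pdev->dev, &dev_attr_stats_word) call in the probe path would then publish the file.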
> +/* macros for processing rcvhdrq entries */
> +#define ips_get_hdr_err_flags(StartOfBuffer) *(((uint32_t *)(StartOfBuffer))+1)
> +#define ips_get_index(StartOfBuffer) (((*((uint32_t *)(StartOfBuffer))) >> \
> + INFINIPATH_RHF_EGRINDEX_SHIFT) & INFINIPATH_RHF_EGRINDEX_MASK)
> +#define ips_get_rcv_type(StartOfBuffer) ((*(((uint32_t *)(StartOfBuffer))) >> \
> + INFINIPATH_RHF_RCVTYPE_SHIFT) & INFINIPATH_RHF_RCVTYPE_MASK)
> +#define ips_get_length_in_bytes(StartOfBuffer) \
> + (uint32_t)(((*(((uint32_t *)(StartOfBuffer))) >> \
> + INFINIPATH_RHF_LENGTH_SHIFT) & INFINIPATH_RHF_LENGTH_MASK) << 2)
> +#define ips_get_first_protocol_header(StartOfBuffer) (void *) \
> + ((uint32_t *)(StartOfBuffer) + 2)
> +#define ips_get_ips_header(StartOfBuffer) ((ips_message_header_typ *) \
> + ((uint32_t *)(StartOfBuffer) + 2))
> +#define ips_get_ipath_ver(ipath_header) (((ipath_header) >> INFINIPATH_I_VERS_SHIFT) \
> + & INFINIPATH_I_VERS_MASK)

Please use static inlines instead for readability.

From penberg at cs.helsinki.fi Sat Dec 17 04:38:57 2005
From: penberg at cs.helsinki.fi (Pekka Enberg)
Date: Sat, 17 Dec 2005 14:38:57 +0200
Subject: [openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines
In-Reply-To: <200512161548.lRw6KI369ooIXS9o@cisco.com>
References: <200512161548.HbgfRzF2TysjsR2G@cisco.com> <200512161548.lRw6KI369ooIXS9o@cisco.com>
Message-ID: <84144f020512170438p5acbc445v30f275aca2d09afe@mail.gmail.com>

On 12/17/05, Roland Dreier wrote:
> +#define TRUE 1
> +#define FALSE 0

Please kill these.

Pekka

From hch at infradead.org Sat Dec 17 05:14:56 2005
From: hch at infradead.org (Christoph Hellwig)
Date: Sat, 17 Dec 2005 13:14:56 +0000
Subject: [openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers
In-Reply-To: <200512161548.aLjaDpGm5aqk0k0p@cisco.com>
References: <200512161548.jRuyTS0HPMLd7V81@cisco.com> <200512161548.aLjaDpGm5aqk0k0p@cisco.com>
Message-ID: <20051217131456.GA13043@infradead.org>

> + * $Id: ipath_common.h 4491 2005-12-15 22:20:31Z rjwalsh $

please remove RCSIDs everywhere.

> +#ifdef __KERNEL__
> +#include
> +#include
> +#include
> +#else /* !__KERNEL__; user mode */
> +#include
> +#include
> +#include
> +#include
> +
> +/* these aren't implemented for user mode, which is OK until we multi-thread */
> +typedef struct _atomic {
> + uint32_t counter;
> +} atomic_t; /* no atomic_t type in user-land */
> +#define atomic_set(a,v) ((a)->counter = (v))
> +#define atomic_inc_return(a) (++(a)->counter)
> +#define likely(x) (x)
> +#define unlikely(x) (x)
> +
> +#define yield() sched_yield()

Please push this out. It's fine if they reuse kernel-code in userspace this way, but please move the compat wrappers to a separate file that's not in the kernel tree.

> +typedef uint8_t ipath_type;

totally meaningless typedef

> +#ifndef _BITS_PER_BYTE
> +#define _BITS_PER_BYTE 8
> +#endif

WTF?

> +
> +static __inline__ void ipath_shortcopy(void *dest, void *src, uint32_t cnt)
> + __attribute__ ((always_inline));
> +/*
> + * this is used for very short copies, usually 1 - 8 bytes,
> + * *NEVER* to the PIO buffers!!!!!!! use ipath_dwordcpy for longer
> + * copies, or any copy to the PIO buffers. Works for 32 and 64 bit
> + * gcc and pathcc
> + */
> +static __inline__ void ipath_shortcopy(void *dest, void *src, uint32_t cnt)

in kernel land __inline__ includes always_inline. Also no need for a separate prototype for an inline function that immediately follows.
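Concretely, the conversion Pekka asks for might look like the following for one of the macros above. This is a sketch; the RHF shift and mask constants are the ones from the quoted header.

    /* ips_get_index() as a typed static inline rather than a macro */
    static inline u32 ips_get_index(const u32 *rcvhdr)
    {
            /* same bit manipulation as the macro, but type-checked */
            return (*rcvhdr >> INFINIPATH_RHF_EGRINDEX_SHIFT) &
                    INFINIPATH_RHF_EGRINDEX_MASK;
    }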
> +{ > + void *ssv, *dsv; > + uint32_t csv; > + __asm__ __volatile__("cld\n\trep\n\tmovsb":"=&c"(csv), "=&D"(dsv), > + "=&S"(ssv) > + :"0"(cnt), "1"(dest), "2"(src) > + :"memory"); > +} No way we're gonna put assembler code into such a driver. > +struct ipath_int_vec { > + int long long addr; > + uint32_t info; > +}; please always used fixes-size types for user communication. also please avoid ioctls like the rest of the IB codebase. > +/* Similarly, this is the kernel version going back to the user. It's slightly > + * different, in that we want to tell if the driver was built as part of a > + * PathScale release, or from the driver from the OpenIB, kernel.org, or a > + * standard distribution, for support reasons. The high bit is 0 for > + * non-PathScale, and 1 for PathScale-built/supplied. That bit is defined > + * in Makefiles, rather than this file. > + * > + * It's returned by the driver to the user code during initialization > + * in the spi_sw_version field of ipath_base_info, so the user code can > + * in turn check for compatibility with the kernel. > +*/ > +#define IPATH_KERN_SWVERSION ((IPATH_KERN_TYPE<<31) | IPATH_USER_SWVERSION) NACK, there's no way we're gonna put in a way to identify an "official" version. The official version is the last one in mainline always. > +#ifndef PCI_VENDOR_ID_PATHSCALE /* not in pci.ids yet */ > +#define PCI_VENDOR_ID_PATHSCALE 0x1fc1 > +#define PCI_DEVICE_ID_PATHSCALE_INFINIPATH1 0xa > +#define PCI_DEVICE_ID_PATHSCALE_INFINIPATH2 0xd > +#endif so move it there? > +typedef struct _ipath_portdata { please avoid typedefs for struct types. > +/* > + * these should be somewhat dynamic someday, although they are fixed > + * for all users of the device on any given load. > + * > + * NOTE: There is a VM bug in the 2.4 Kernels similar to the one Dave > + * fixed in the 2.6 Kernel. When using large or discontinuous memory, > + * we get random kernel oops. So, in 2.4, we are just going to stick > + * with 4k chunks instead of 64k chunks. > + */ No one cares about 2.4 kernels here. > + * these function similarly to the mlock/munlock system calls. > + * ipath_mlock() is used to pin an address range (if not already pinned), > + * and optionally return the list of physical addresses > + * ipath_munlock() does the obvious, and ipath_mlock() cleans up all > + * private memory, used at driver unload. > + * ipath_mlock_nocopy() is similar to mlock, but only one page, and marks > + * the vm so the page isn't taken away on a fork. > + */ > +int ipath_mlock(unsigned long, size_t, struct page **); > +int ipath_mlock_nocopy(unsigned long, struct page **); this kind of thing definitly doesn't belong into an LLDD. or maybe it's just stale prototypes? > +#ifdef IPATH_COSIM > +extern __u32 sim_readl(const volatile void __iomem * addr); > +extern __u64 sim_readq(const volatile void __iomem * addr); > +extern void sim_writel(__u32 val, volatile void __iomem * addr); > +extern void sim_writeq(__u64 val, volatile void __iomem * addr); > +#define ipath_readl(addr) sim_readl(addr) > +#define ipath_readq(addr) sim_readq(addr) > +#define ipath_writel(val, addr) sim_writel(val, addr) > +#define ipath_writeq(val, addr) sim_writeq(val, addr) > +#else > +#define ipath_readl(addr) readl(addr) > +#define ipath_readq(addr) readq(addr) > +#define ipath_writel(val, addr) writel(val, addr) > +#define ipath_writeq(val, addr) writeq(val, addr) > +#endif Please use the proper functions directly. Your simulator can override them if nessecary. 
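To make Christoph's fixed-size point above concrete, the quoted structure could be declared along these lines. The reserved field is an illustrative addition that pads the size to a multiple of 8 bytes so 32-bit and 64-bit builds agree on the layout.

    /* user/kernel ABI struct with fixed-size types and explicit padding */
    struct ipath_int_vec {
            __u64 addr;             /* fixed-size, same on all ABIs */
            __u32 info;
            __u32 reserved;         /* explicit pad: identical layout on
                                     * 32-bit and 64-bit architectures */
    };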
> +static __inline__ uint32_t ipath_kget_kreg32(const ipath_type stype, > + ipath_kreg regno) > +{ > + volatile uint32_t *kreg32; > + > + if (!devdata[stype].ipath_kregbase) > + return ~0; > + > + kreg32 = (volatile uint32_t *)&devdata[stype].ipath_kregbase[regno]; volatile use is probably always wrong. but this whole functions looks like a very odd wrapper anyway? From hch at infradead.org Sat Dec 17 05:16:14 2005 From: hch at infradead.org (Christoph Hellwig) Date: Sat, 17 Dec 2005 13:16:14 +0000 Subject: [openib-general] Re: [PATCH 00/13] [RFC] IB: PathScale InfiniPath driver In-Reply-To: <200512161548.jRuyTS0HPMLd7V81@cisco.com> References: <20051031150618.627779f1.akpm@osdl.org> <200512161548.jRuyTS0HPMLd7V81@cisco.com> Message-ID: <20051217131614.GB13043@infradead.org> On Fri, Dec 16, 2005 at 03:48:54PM -0800, Roland Dreier wrote: > having sysctls that set values also settable through module parameters > under /sys/module, code inside #ifndef __KERNEL__ so include files can > be shared with other PathScale code, code in ipath_i2c.c that might be > simplified by using drivers/i2c, etc. I'd like to try to get a sense > of whether I'm being too picky or whether PathScale really does need > to fix these up before the driver is merged. Yes, please fix this stuff before. The current driver looks like a horrible mess. Is there some political plot going where pathscale folks are forcing you to send this out in this scheme? Otherwise I couldn't explain the code quality magnitudes lower than normally expected from your merges. From hch at infradead.org Sat Dec 17 05:16:49 2005 From: hch at infradead.org (Christoph Hellwig) Date: Sat, 17 Dec 2005 13:16:49 +0000 Subject: [openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines In-Reply-To: <200512161548.lRw6KI369ooIXS9o@cisco.com> References: <200512161548.HbgfRzF2TysjsR2G@cisco.com> <200512161548.lRw6KI369ooIXS9o@cisco.com> Message-ID: <20051217131649.GC13043@infradead.org> On Fri, Dec 16, 2005 at 03:48:54PM -0800, Roland Dreier wrote: > Copy routines for ipath driver NACK, assembler copy routines don't belong into drivers. From mst at mellanox.co.il Sat Dec 17 07:32:39 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sat, 17 Dec 2005 17:32:39 +0200 Subject: [openib-general] Re: SDP problem on SVN 4507 In-Reply-To: References: Message-ID: <20051217153239.GA17388@mellanox.co.il> Quoting r. Bob Woodruff : > Subject: SDP problem on SVN 4507 > > > I am seeing a strange problem with SDP on SVN4507. > > When I run NetPIPE over SDP by itself, it runs just fine. > However, if I run MPI over uDAPL/CMA at the same > time, I seem to have a problem. > I start 2 copies of MPI running Intel MPI benchmark. Then, > I start the NetPIPE server and it starts to listen waiting for a connect > request. > Then I start the client side and it fails (on the connect() call) with > an > errno of 111. If I then stop the MPI/uDAPL/CMA jobs, > SDP/NetPIPE can then connect OK. Not sure if this is an SDP issue > or some problem with CMA/CM. > > Has anyone else seen a similar behavior ? > > > woody > I hope this will be resolved with the move to CMA. 
-- MST From rdreier at cisco.com Sat Dec 17 07:51:04 2005 From: rdreier at cisco.com (Roland Dreier) Date: Sat, 17 Dec 2005 07:51:04 -0800 Subject: [openib-general] Re: [PATCH 00/13] [RFC] IB: PathScale InfiniPath driver In-Reply-To: <20051217131614.GB13043@infradead.org> (Christoph Hellwig's message of "Sat, 17 Dec 2005 13:16:14 +0000") References: <20051031150618.627779f1.akpm@osdl.org> <200512161548.jRuyTS0HPMLd7V81@cisco.com> <20051217131614.GB13043@infradead.org> Message-ID: Christoph> Is there some political plot going where pathscale Christoph> folks are forcing you to send this out in this scheme? Christoph> Otherwise I couldn't explain the code quality Christoph> magnitudes lower than normally expected from your Christoph> merges. No political plot -- this posting was an RFC in the literal sense, with no expectation that the code is mergable as-is. I just want to get comments early so that we have a better idea of what needs to be fixed. For example, what's your feeling about sysctls in drivers? BTW, Pathscale people -- please respond to the comments that are made about your driver... - R. From mst at mellanox.co.il Sat Dec 17 07:55:44 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sat, 17 Dec 2005 17:55:44 +0200 Subject: [openib-general] Re: [RFC] IB_AT_MOST In-Reply-To: <000201c60281$4cf5c610$6401a8c0@infiniconsys.com> References: <000201c60281$4cf5c610$6401a8c0@infiniconsys.com> Message-ID: <20051217155544.GB17388@mellanox.co.il> Quoting r. Fab Tillier : > Subject: RE: [RFC] IB_AT_MOST > > Hi Michael, > > > From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] > > Sent: Friday, December 16, 2005 5:58 AM > > > > Hi! > > I recently noted that some middleware seems to use the "as much > > as possible" approach, for example, using maximum possible value > > for max_rd_atomic or other fields, in create/modify qp. > > > > An obvious thing could be to perform query_device and use max. > > values from there. However, it turns out that hardware max supported > > values might not be easy to express in terms of a single constant. > > Consider for example the max number of s/g entries supported per > > WQE: mellanox HCAs support different number of these for RC and UD > > QPs. So whatever single number query device reports, using it will > > never achieve what the user wants for all QP types. > > > > Rather than extending the device query for all thinkable hardware > > weirdness, I'd like to propose, instead, the following API extension > > (below): passing a negative value in e.g. qp attribute would have the > > meaning: let hardware use at most the specified value. > > This, as opposed to the usual "at least the specified value" meaning > > for positive values. > > > > How does the following work, for an API? Please comment. > > I don't understand the IB_AT_MOST macro. If someone uses IB_AT_MOST( 1) and > the hardware supports 4, they will get 4, which is definitely not "at most 1". Yes, but we could easily fix this in the hardware provider so that they get 1. > I would rename it to IB_MAX, and define it a -1 or something like that. This is an option, too. 
-- MST From akpm at osdl.org Sat Dec 17 12:38:16 2005 From: akpm at osdl.org (Andrew Morton) Date: Sat, 17 Dec 2005 12:38:16 -0800 Subject: [openib-general] Re: [git patch review 2/7] IB/mthca: correct log2 calculation In-Reply-To: <1134705617067-bb88e1b23a3e36b6@cisco.com> References: <1134705617067-b51dec64cec55f52@cisco.com> <1134705617067-bb88e1b23a3e36b6@cisco.com> Message-ID: <20051217123816.18ad94e0.akpm@osdl.org> Roland Dreier wrote: > > Fix thinko in rd_atomic calculation: ffs(x) - 1 does not find the next > power of 2 -- it should be fls(x - 1). Please use round_up_pow_of_two(). From akpm at osdl.org Sat Dec 17 12:38:27 2005 From: akpm at osdl.org (Andrew Morton) Date: Sat, 17 Dec 2005 12:38:27 -0800 Subject: [openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers In-Reply-To: <200512161548.aLjaDpGm5aqk0k0p@cisco.com> References: <200512161548.jRuyTS0HPMLd7V81@cisco.com> <200512161548.aLjaDpGm5aqk0k0p@cisco.com> Message-ID: <20051217123827.32f119da.akpm@osdl.org> Roland Dreier wrote: > > ... > > +#ifdef __KERNEL__ > +#include > +#include > +#include > +#else /* !__KERNEL__; user mode */ > +#include > +#include > +#include > +#include > + > +/* these aren't implemented for user mode, which is OK until we multi-thread */ > +typedef struct _atomic { > + uint32_t counter; > +} atomic_t; /* no atomic_t type in user-land */ > +#define atomic_set(a,v) ((a)->counter = (v)) > +#define atomic_inc_return(a) (++(a)->counter) > +#define likely(x) (x) > +#define unlikely(x) (x) > + > +#define yield() sched_yield() Some might get upset about what I assume is userspace test harness code or what _is_ this doing?) in a driver. But if the maintainers find it useful we can live with it, > +#ifndef _BITS_PER_BYTE > +#define _BITS_PER_BYTE 8 > +#endif I'd be inclined to stick BITS_PER_BYTE into include/linux/types.h. > +static __inline__ void ipath_shortcopy(void *dest, void *src, uint32_t cnt) > + __attribute__ ((always_inline)); s/__inline__/inline/ throughout. > +#define round_up(v,sz) (((v) + (sz)-1) & ~((sz)-1)) We have ALIGN() > +struct ipath_int_vec { > + int long long addr; long long > + uint32_t info; > +}; > +struct ipath_eeprom_req { > + long long addr; Like this. > + uint16_t len; > + uint16_t offset; > +}; > ... > +#define IPATH_USERINIT _IOW('s', 16, struct ipath_user_info) > +/* init; kernel/chip params to user */ > +#define IPATH_BASEINFO _IOR('s', 17, struct ipath_base_info) > +/* send a packet */ > +#define IPATH_SENDPKT _IOW('s', 18, struct ipath_sendpkt) uh-oh. ioctls. Do we have compat conversions for them all, if needed? > +/* > + * A segment is a linear region of low physical memory. > + * XXX Maybe we should use phys addr here and kmap()/kunmap() > + * Used by the verbs layer. > + */ > +struct ipath_seg { > + void *vaddr; > + u64 length; > +}; Suggest `long' for the length. We don't need 64 bits on 32-bit machines. > +struct ipath_mregion { > + u64 user_base; /* User's address for this region */ void *. > + u64 iova; /* IB start address of this region */ Maybe here too. > +int ipath_mlock(unsigned long, size_t, struct page **); Sometimes it does `int foo()' and sometimes `extern int foo()'. I tend to think the `extern' is a waste of space. > +#define ipath_func_krecord(a) > +#define ipath_func_urecord(a, b) > +#define ipath_func_mrecord(a, b) > +#define ipath_func_rkrecord(a) > +#define ipath_func_rurecord(a, b) > +#define ipath_func_rmrecord(a, b) > +#define ipath_func_rsrecord(a) > +#define ipath_func_rcrecord(a) What are all these doing? 
Might need do{}while(0) for safety. > +#ifdef IPATH_COSIM > +extern __u32 sim_readl(const volatile void __iomem * addr); > +extern __u64 sim_readq(const volatile void __iomem * addr); The driver has a strange mixture of int32_t, s32 and __s32. s32 is preferred. > + */ > +static __inline__ uint32_t ipath_kget_ureg32(const ipath_type stype, > + ipath_ureg regno, int port) > +{ > + uint64_t *ubase; > + > + ubase = (uint64_t *) (devdata[stype].ipath_uregbase > + + (char *)devdata[stype].ipath_kregbase > + + devdata[stype].ipath_palign * port); > + return ubase ? ipath_readl(ubase + regno) : 0; > +} Are all these u64's needed on 32-bit? > +static __inline__ uint64_t ipath_kget_kreg64(const ipath_type stype, > + ipath_kreg regno) > +{ > + if (!devdata[stype].ipath_kregbase) > + return ~0ULL; We don't know that the architecture implements u64 as unsigned long long. Some use unsigned long. Best way of implmenting the all-ones pattern is just `-1'. Gee. Big driver. From akpm at osdl.org Sat Dec 17 12:38:33 2005 From: akpm at osdl.org (Andrew Morton) Date: Sat, 17 Dec 2005 12:38:33 -0800 Subject: [openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines In-Reply-To: <200512161548.lRw6KI369ooIXS9o@cisco.com> References: <200512161548.HbgfRzF2TysjsR2G@cisco.com> <200512161548.lRw6KI369ooIXS9o@cisco.com> Message-ID: <20051217123833.1aa430ab.akpm@osdl.org> Roland Dreier wrote: > > + .globl ipath_dwordcpy > +/* rdi destination, rsi source, rdx count */ > +ipath_dwordcpy: > + movl %edx,%ecx > + shrl $1,%ecx > + andl $1,%edx > + cld > + rep > + movsq > + movl %edx,%ecx > + rep > + movsd > + ret err, we have a portability problem. From akpm at osdl.org Sat Dec 17 12:38:38 2005 From: akpm at osdl.org (Andrew Morton) Date: Sat, 17 Dec 2005 12:38:38 -0800 Subject: [openib-general] Re: [PATCH 04/13] [RFC] ipath LLD core, part 1 In-Reply-To: <200512161548.20XjmmxDHjOZRXcz@cisco.com> References: <200512161548.lRw6KI369ooIXS9o@cisco.com> <200512161548.20XjmmxDHjOZRXcz@cisco.com> Message-ID: <20051217123838.7732c201.akpm@osdl.org> Roland Dreier wrote: > > + if ((ret = copy_from_user(&rpkt, p, sizeof rpkt))) { > + _IPATH_DBG("Failed to copy in pkt struct (%d)\n", ret); > + return ret; > + } The driver does this quite a lot. copy_from_user() will return the number of bytes remaining to copy. So I think we'll be wanting `return -EFAULT;' in lots of places rather than this. From akpm at osdl.org Sat Dec 17 12:38:50 2005 From: akpm at osdl.org (Andrew Morton) Date: Sat, 17 Dec 2005 12:38:50 -0800 Subject: [openib-general] Re: [PATCH 07/13] [RFC] ipath core misc files In-Reply-To: <200512161548.3fqe3fMerrheBMdX@cisco.com> References: <200512161548.KglSM2YESlGlEQfQ@cisco.com> <200512161548.3fqe3fMerrheBMdX@cisco.com> Message-ID: <20051217123850.aa6cfd53.akpm@osdl.org> Roland Dreier wrote: > > ... > +/* > + * This isn't perfect, but it's close enough for timing work. We want this > + * to work on systems where the cycle counter isn't the same as the clock > + * frequency. The one msec spin is OK, since we execute this only once > + * when first loaded. We don't use CURRENT_TIME because on some systems > + * it only has jiffy resolution; we just assume udelay is well calibrated > + * and that we aren't likely to be rescheduled. Do it multiple times, > + * with a yield in between, to try to make sure we get the "true minimum" > + * value. 
> + * _ipath_pico_per_cycle isn't going to lead to completely accurate > + * conversions from timestamps to nanoseconds, but it's close enough > + * for our purposes, which is mainly to allow people to show events with > + * nsecs or usecs if desired, rather than cycles. > + */ > +void ipath_init_picotime(void) > +{ > + int i; > + u_int64_t ts, te, delta = -1ULL; > + > + for (i = 0; i < 5; i++) { > + ts = get_cycles(); > + udelay(250); > + te = get_cycles(); > + if ((te - ts) < delta) > + delta = te - ts; > + yield(); > + } > + _ipath_pico_per_cycle = 250000000 / delta; > +} hm, I hope this is debug code which is going away. If not, we should take a look at what it's trying to do here. > +/* > + * Our version of the kernel mlock function. This function is no longer > + * exposed, so we need to do it ourselves. It takes a given start page > + * (page aligned user virtual address) and pins it and the following specified > + * number of pages. > + * For now, num_pages is always 1, but that will probably change at some > + * point (because caller is doing expected sends on a single virtually > + * contiguous buffer, so we can do all pages at once). > + */ > +int ipath_mlock(unsigned long start_page, size_t num_pages, struct page **p) > +{ > + int n; > + > + _IPATH_VDBG("pin %lx pages from vaddr %lx\n", num_pages, start_page); > + down_read(¤t->mm->mmap_sem); > + n = get_user_pages(current, current->mm, start_page, num_pages, 1, 1, > + p, NULL); > + up_read(¤t->mm->mmap_sem); > + if (n != num_pages) { > + _IPATH_INFO > + ("get_user_pages (0x%lx pages starting at 0x%lx failed with %d\n", > + num_pages, start_page, n); > + if (n < 0) /* it's an errno */ > + return n; > + return -ENOMEM; /* no way to know actual error */ > + } > + > + return 0; > +} OK. It's perhaps not a very well named function. > +/* > + * this is similar to ipath_mlock, but it's always one page, and we mark > + * the page as locked for i/o, and shared. This is used for the user process > + * page that contains the destination address for the rcvhdrq tail update, > + * so we need to have the vma. If we don't do this, the page can be taken > + * away from us on fork, even if the child never touches it, and then > + * the user process never sees the tail register updates. > + */ > +int ipath_mlock_nocopy(unsigned long start_page, struct page **p) > +{ > + int n; > + struct vm_area_struct *vm = NULL; > + > + down_read(¤t->mm->mmap_sem); > + n = get_user_pages(current, current->mm, start_page, 1, 1, 1, p, &vm); > + up_read(¤t->mm->mmap_sem); > + if (n != 1) { > + _IPATH_INFO("get_user_pages for 0x%lx failed with %d\n", > + start_page, n); > + if (n < 0) /* it's an errno */ > + return n; > + return -ENOMEM; /* no way to know actual error */ > + } > + vm->vm_flags |= VM_SHM | VM_LOCKED; > + > + return 0; > +} I don't think we want to be setting the user's VMA's vm_flags in this manner. This is purely to retain the physical page across fork? > +/* > + * Our version of the kernel munlock function. This function is no longer > + * exposed, so we need to do it ourselves. It unpins the start page > + * (a page aligned full user virtual address, not a page number) > + * and pins it and the following specified number of pages. 
> + */
> +int ipath_munlock(size_t num_pages, struct page **p)
> +{
> + int i;
> +
> + for (i = 0; i < num_pages; i++) {
> + _IPATH_MMDBG("%u/%lu put_page %p\n", i, num_pages, p[i]);
> + SetPageDirty(p[i]);
> + put_page(p[i]);
> + }
> + return 0;
> +}

Nope, SetPageDirty() doesn't tell the VM that the page is dirty - it'll never get written out. Use set_page_dirty_lock().

From akpm at osdl.org Sat Dec 17 12:38:56 2005
From: akpm at osdl.org (Andrew Morton)
Date: Sat, 17 Dec 2005 12:38:56 -0800
Subject: [openib-general] Re: [PATCH 08/13] [RFC] ipath core last bit
In-Reply-To: <200512161548.y9KRuNtfMzpZjwni@cisco.com>
References: <200512161548.3fqe3fMerrheBMdX@cisco.com> <200512161548.y9KRuNtfMzpZjwni@cisco.com>
Message-ID: <20051217123856.d16529a5.akpm@osdl.org>

Roland Dreier wrote:
>
> +EXPORT_SYMBOL(ipath_kset_linkstate);
> +EXPORT_SYMBOL(ipath_kset_mtu);
> +EXPORT_SYMBOL(ipath_layer_close);
> +EXPORT_SYMBOL(ipath_layer_get_bcast);
> +EXPORT_SYMBOL(ipath_layer_get_cr_errpkey);
> +EXPORT_SYMBOL(ipath_layer_get_deviceid);
> +EXPORT_SYMBOL(ipath_layer_get_flags);
> +EXPORT_SYMBOL(ipath_layer_get_guid);
> +EXPORT_SYMBOL(ipath_layer_get_ibmtu);
> etc

EXPORT_SYMBOL_GPL?

From mst at mellanox.co.il Sat Dec 17 13:30:30 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sat, 17 Dec 2005 23:30:30 +0200
Subject: [openib-general] Re: [git patch review 2/7] IB/mthca: correct log2 calculation
In-Reply-To: <20051217123816.18ad94e0.akpm@osdl.org>
References: <20051217123816.18ad94e0.akpm@osdl.org>
Message-ID: <20051217213030.GB19246@mellanox.co.il>

Quoting r. Andrew Morton :
> Subject: Re: [git patch review 2/7] IB/mthca: correct log2 calculation
>
> Roland Dreier wrote:
> >
> > Fix thinko in rd_atomic calculation: ffs(x) - 1 does not find the next
> > power of 2 -- it should be fls(x - 1).
>
> Please use round_up_pow_of_two().

Yes, but we want the bit number. roundup_pow_of_two does a shift.

--
MST

From rjwalsh at pathscale.com Sat Dec 17 13:29:23 2005
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Sat, 17 Dec 2005 13:29:23 -0800
Subject: [openib-general] Re: [PATCH 07/13] [RFC] ipath core misc files
In-Reply-To: <20051217123850.aa6cfd53.akpm@osdl.org>
References: <200512161548.KglSM2YESlGlEQfQ@cisco.com> <200512161548.3fqe3fMerrheBMdX@cisco.com> <20051217123850.aa6cfd53.akpm@osdl.org>
Message-ID: <1134854963.20575.17.camel@phosphene.durables.org>

> > +void ipath_init_picotime(void)
> > +{
> > + int i;
> > + u_int64_t ts, te, delta = -1ULL;
> > +
> > + for (i = 0; i < 5; i++) {
> > + ts = get_cycles();
> > + udelay(250);
> > + te = get_cycles();
> > + if ((te - ts) < delta)
> > + delta = te - ts;
> > + yield();
> > + }
> > + _ipath_pico_per_cycle = 250000000 / delta;
> > +}
>
> hm, I hope this is debug code which is going away. If not, we should take
> a look at what it's trying to do here.

This isn't debug code. It's used to calculate the rough number of picoseconds per cycle. This is used in the driver to make sure HyperTransport reads haven't timed out (see ipath_snap_cntr in ipath_driver.c) when reading chip counters.

If you can think of a better way of figuring out whether a read took longer than a certain length of time, I'd be interested in knowing it.

Regards,
Robert.

--
Robert Walsh Email: rjwalsh at pathscale.com
PathScale, Inc. Phone: +1 650 934 8117
2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969
Mountain View, CA 94043.
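Folding Andrew's set_page_dirty_lock() comment back into the quoted function gives roughly the following sketch, with the debug print dropped for brevity:

    int ipath_munlock(size_t num_pages, struct page **p)
    {
            size_t i;

            for (i = 0; i < num_pages; i++) {
                    /* dirty via the locked helper so writeback happens */
                    set_page_dirty_lock(p[i]);
                    put_page(p[i]);
            }
            return 0;
    }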
From rjwalsh at pathscale.com Sat Dec 17 13:33:55 2005 From: rjwalsh at pathscale.com (Robert Walsh) Date: Sat, 17 Dec 2005 13:33:55 -0800 Subject: [openib-general] Re: [PATCH 07/13] [RFC] ipath core misc files In-Reply-To: <20051217123850.aa6cfd53.akpm@osdl.org> References: <200512161548.KglSM2YESlGlEQfQ@cisco.com> <200512161548.3fqe3fMerrheBMdX@cisco.com> <20051217123850.aa6cfd53.akpm@osdl.org> Message-ID: <1134855235.20575.22.camel@phosphene.durables.org> > > +int ipath_mlock(unsigned long start_page, size_t num_pages, struct page **p) > OK. It's perhaps not a very well named function. Really? Suggestion for a better name? > > + } > > + vm->vm_flags |= VM_SHM | VM_LOCKED; > > + > > + return 0; > > +} > > I don't think we want to be setting the user's VMA's vm_flags in this > manner. This is purely to retain the physical page across fork? I didn't write this bit of the driver, but I believe this is the case. Is there a better way of doing this? > > +int ipath_munlock(size_t num_pages, struct page **p) > > +{ > > + int i; > > + > > + for (i = 0; i < num_pages; i++) { > > + _IPATH_MMDBG("%u/%lu put_page %p\n", i, num_pages, p[i]); > > + SetPageDirty(p[i]); > > + put_page(p[i]); > > + } > > + return 0; > > +} > > Nope, SetPageDirty() doesn't tell the VM that the page is dirty - it'll > never get written out. Use set_page_dirty_lock(). OK. Regards, Robert. -- Robert Walsh Email: rjwalsh at pathscale.com PathScale, Inc. Phone: +1 650 934 8117 2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969 Mountain View, CA 94043. From rjwalsh at pathscale.com Sat Dec 17 13:34:36 2005 From: rjwalsh at pathscale.com (Robert Walsh) Date: Sat, 17 Dec 2005 13:34:36 -0800 Subject: [openib-general] Re: [PATCH 04/13] [RFC] ipath LLD core, part 1 In-Reply-To: <20051217123838.7732c201.akpm@osdl.org> References: <200512161548.lRw6KI369ooIXS9o@cisco.com> <200512161548.20XjmmxDHjOZRXcz@cisco.com> <20051217123838.7732c201.akpm@osdl.org> Message-ID: <1134855276.20575.25.camel@phosphene.durables.org> On Sat, 2005-12-17 at 12:38 -0800, Andrew Morton wrote: > Roland Dreier wrote: > > > > + if ((ret = copy_from_user(&rpkt, p, sizeof rpkt))) { > > + _IPATH_DBG("Failed to copy in pkt struct (%d)\n", ret); > > + return ret; > > + } > > The driver does this quite a lot. copy_from_user() will return the number > of bytes remaining to copy. So I think we'll be wanting `return -EFAULT;' > in lots of places rather than this. Thanks. Will fix. Regards, Robert. -- Robert Walsh Email: rjwalsh at pathscale.com PathScale, Inc. Phone: +1 650 934 8117 2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969 Mountain View, CA 94043. From rjwalsh at pathscale.com Sat Dec 17 13:38:43 2005 From: rjwalsh at pathscale.com (Robert Walsh) Date: Sat, 17 Dec 2005 13:38:43 -0800 Subject: [openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines In-Reply-To: <84144f020512170438p5acbc445v30f275aca2d09afe@mail.gmail.com> References: <200512161548.HbgfRzF2TysjsR2G@cisco.com> <200512161548.lRw6KI369ooIXS9o@cisco.com> <84144f020512170438p5acbc445v30f275aca2d09afe@mail.gmail.com> Message-ID: <1134855523.20575.29.camel@phosphene.durables.org> On Sat, 2005-12-17 at 14:38 +0200, Pekka Enberg wrote: > On 12/17/05, Roland Dreier wrote: > > +#define TRUE 1 > > +#define FALSE 0 > > Please kill these. OK. -- Robert Walsh Email: rjwalsh at pathscale.com PathScale, Inc. Phone: +1 650 934 8117 2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969 Mountain View, CA 94043. From ebiederm at xmission.com Sat Dec 17 13:51:19 2005 From: ebiederm at xmission.com (Eric W. 
Biederman) Date: Sat, 17 Dec 2005 14:51:19 -0700 Subject: [openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers In-Reply-To: <20051217131456.GA13043@infradead.org> (Christoph Hellwig's message of "Sat, 17 Dec 2005 13:14:56 +0000") References: <200512161548.jRuyTS0HPMLd7V81@cisco.com> <200512161548.aLjaDpGm5aqk0k0p@cisco.com> <20051217131456.GA13043@infradead.org> Message-ID: Christoph Hellwig writes: > please always used fixes-size types for user communication. also please > avoid ioctls like the rest of the IB codebase. Could someone please explain to me how the uverbs abuse of write is better that ioctl? Every single command seems to have a __u64 response fields that is a pointer into user space. When you write your commands and read your responses like the netlink layer does I can see the sense of it. But making write an ioctl by another name... One of the scarier comments I have seen lately from ib_user_verbs.h /* * Make sure that all structs defined in this file remain laid out so * that they pack the same way on 32-bit and 64-bit architectures (to * avoid incompatibility between 32-bit userspace and 64-bit kernels). * Specifically: * - Do not use pointer types -- pass pointers in __u64 instead. * - Make sure that any structure larger than 4 bytes is padded to a * multiple of 8 bytes. Otherwise the structure size will be * different between 32-bit and 64-bit architectures. */ The two points that get called out. - Embedded pointers are a large part of what make ioctl a maintenance nightmare. I admit we are 15-20 years away before big machines exhaust the capability of 64bit pointers so we aren't likely to run into size issues soon. But a write that changes your address space is ugly, and unexpected. What looks like a reimplementation of readv/writev using this technique is also scary. - 64bit compilers will not pad every structure to 8 bytes. This only will happen if you happen to have an 8 byte element in your structure that is only aligned to 32bits by a 32bit structure. Unfortunately the 32bit gcc only aligns long long to 32bits on x86, which triggers the described behavior. Eric From bunk at stusta.de Sat Dec 17 13:52:51 2005 From: bunk at stusta.de (Adrian Bunk) Date: Sat, 17 Dec 2005 22:52:51 +0100 Subject: [openib-general] Re: [PATCH 13/13] [RFC] ipath Kconfig and Makefile In-Reply-To: <200512161548.lokgvLraSGi0enUH@cisco.com> References: <200512161548.MdcxE8ZQTy1yj4v1@cisco.com> <200512161548.lokgvLraSGi0enUH@cisco.com> Message-ID: <20051217215251.GV23349@stusta.de> On Fri, Dec 16, 2005 at 03:48:55PM -0800, Roland Dreier wrote: >... > --- /dev/null > +++ b/drivers/infiniband/hw/ipath/Kconfig > @@ -0,0 +1,18 @@ > +config IPATH_CORE > + tristate "PathScale InfiniPath Driver" > + depends on PCI_MSI && X86_64 >... The driver shouldn't use assembler code and therefore no longer depend on X86_64. > --- /dev/null > +++ b/drivers/infiniband/hw/ipath/Makefile > @@ -0,0 +1,15 @@ > +EXTRA_CFLAGS += -Idrivers/infiniband/include > + > +EXTRA_CFLAGS += -Wall -O3 -g3 -Wall is always set when compiling the kernel. -O3 doesn't make much sense since the fight for producing the fastest code is between -O2 and -Os. You don't want to always compile your driver with -g3. > +_ipath_idstr:="$$""Id: kernel.org InfiniPath Release 1.1 $$"" $$""Date: $(shell date +%F-%R)"" $$" > +EXTRA_CFLAGS += -D_IPATH_IDSTR='$(_ipath_idstr)' -DIPATH_KERN_TYPE=0 >... Please move the _IPATH_IDSTR revision tag to a header file and remove IPATH_KERN_TYPE. cu Adrian -- "Is there not promise of rain?" 
Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

From rjwalsh at pathscale.com Sat Dec 17 13:55:47 2005
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Sat, 17 Dec 2005 13:55:47 -0800
Subject: [openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers
In-Reply-To: <84144f020512170433h151a7667o42c382242f81347b@mail.gmail.com>
References: <200512161548.jRuyTS0HPMLd7V81@cisco.com> <200512161548.aLjaDpGm5aqk0k0p@cisco.com> <84144f020512170433h151a7667o42c382242f81347b@mail.gmail.com>
Message-ID: <1134856547.20575.39.camel@phosphene.durables.org>

> Do we really need this ugly userspace emulation code in the kernel?

Probably not. I'll look into removing it.

> What is this used for? Why can't you use memcpy?

Our chip can only handle double-word copies.

> > +#define round_up(v,sz) (((v) + (sz)-1) & ~((sz)-1))
>
> Please use ALIGN().

Fair enough.

> You seem to be introducing loads of new ioctls. Any reason you can't
> use sysfs and/or configfs?

I'll see what people here think of that idea.

> > +#define ips_get_ipath_ver(ipath_header) (((ipath_header) >> INFINIPATH_I_VERS_SHIFT) \
> > + & INFINIPATH_I_VERS_MASK)
>
> Please use static inlines instead for readability.

OK.

Regards,
Robert.

--
Robert Walsh Email: rjwalsh at pathscale.com
PathScale, Inc. Phone: +1 650 934 8117
2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969
Mountain View, CA 94043.

From rjwalsh at pathscale.com Sat Dec 17 14:19:13 2005
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Sat, 17 Dec 2005 14:19:13 -0800
Subject: [openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers
In-Reply-To: <20051217131456.GA13043@infradead.org>
References: <200512161548.jRuyTS0HPMLd7V81@cisco.com> <200512161548.aLjaDpGm5aqk0k0p@cisco.com> <20051217131456.GA13043@infradead.org>
Message-ID: <1134857953.20575.59.camel@phosphene.durables.org>

> > + * $Id: ipath_common.h 4491 2005-12-15 22:20:31Z rjwalsh $
>
> please remove RCSIDs everywhere.

These are everywhere in the OpenIB code. I was actually asked by one of the OpenIB developers to include them. I'm happy to remove them again, but what do the OpenIB folks think?

> > +#define yield() sched_yield()
>
> Please push this out. It's fine if they reuse kernel-code in userspace
> this way, but please move the compat wrappers to a separate file that's
> not in the kernel tree.

I will do this.

> > +typedef uint8_t ipath_type;
>
> totally meaningless typedef

In what way?

> > +#ifndef _BITS_PER_BYTE
> > +#define _BITS_PER_BYTE 8
> > +#endif
>
> WTF?

Hmm. That is odd. I'll ask the folks here if we can remove this.

> in kernel land __inline__ includes always_inline. Also no need for
> a separate prototype for an inline function that immediately follows.

Fine.

> > +{
> > + void *ssv, *dsv;
> > + uint32_t csv;
> > + __asm__ __volatile__("cld\n\trep\n\tmovsb":"=&c"(csv), "=&D"(dsv),
> > + "=&S"(ssv)
> > + :"0"(cnt), "1"(dest), "2"(src)
> > + :"memory");
> > +}
>
> No way we're gonna put assembler code into such a driver.

Why not? The chip (and therefore the driver) only works with Opterons. It's tied to the HT bus, not PCI or anything like that.

> > +struct ipath_int_vec {
> > + int long long addr;
> > + uint32_t info;
> > +};
>
> please always used fixes-size types for user communication.

OK.

> also please
> avoid ioctls like the rest of the IB codebase.

More complex, but I'll look into it.

> > +/* Similarly, this is the kernel version going back to the user.
It's slightly > > + * different, in that we want to tell if the driver was built as part of a > > + * PathScale release, or from the driver from the OpenIB, kernel.org, or a > > + * standard distribution, for support reasons. The high bit is 0 for > > + * non-PathScale, and 1 for PathScale-built/supplied. That bit is defined > > + * in Makefiles, rather than this file. > > + * > > + * It's returned by the driver to the user code during initialization > > + * in the spi_sw_version field of ipath_base_info, so the user code can > > + * in turn check for compatibility with the kernel. > > +*/ > > +#define IPATH_KERN_SWVERSION ((IPATH_KERN_TYPE<<31) | IPATH_USER_SWVERSION) > > NACK, there's no way we're gonna put in a way to identify an "official" > version. The official version is the last one in mainline always. Why make this hard for vendors? You may only care about the latest mainline, but if we want to sell chips, we have to support this all the way back to 2.6.9 (RHEL). > > +#ifndef PCI_VENDOR_ID_PATHSCALE /* not in pci.ids yet */ > > +#define PCI_VENDOR_ID_PATHSCALE 0x1fc1 > > +#define PCI_DEVICE_ID_PATHSCALE_INFINIPATH1 0xa > > +#define PCI_DEVICE_ID_PATHSCALE_INFINIPATH2 0xd > > +#endif > > so move it there? Sounds like a good idea. I'll submit a separate patch. > > +typedef struct _ipath_portdata { > > please avoid typedefs for struct types. I thought I had, but I must have missed this one. > > +/* > > + * these should be somewhat dynamic someday, although they are fixed > > + * for all users of the device on any given load. > > + * > > + * NOTE: There is a VM bug in the 2.4 Kernels similar to the one Dave > > + * fixed in the 2.6 Kernel. When using large or discontinuous memory, > > + * we get random kernel oops. So, in 2.4, we are just going to stick > > + * with 4k chunks instead of 64k chunks. > > + */ > > No one cares about 2.4 kernels here. Fine. > > + * these function similarly to the mlock/munlock system calls. > > + * ipath_mlock() is used to pin an address range (if not already pinned), > > + * and optionally return the list of physical addresses > > + * ipath_munlock() does the obvious, and ipath_mlock() cleans up all > > + * private memory, used at driver unload. > > + * ipath_mlock_nocopy() is similar to mlock, but only one page, and marks > > + * the vm so the page isn't taken away on a fork. > > + */ > > +int ipath_mlock(unsigned long, size_t, struct page **); > > +int ipath_mlock_nocopy(unsigned long, struct page **); > > this kind of thing definitly doesn't belong into an LLDD. or maybe > it's just stale prototypes? No - they're used. Why do you say they don't belong? > > +#ifdef IPATH_COSIM > > +extern __u32 sim_readl(const volatile void __iomem * addr); > > +extern __u64 sim_readq(const volatile void __iomem * addr); > > +extern void sim_writel(__u32 val, volatile void __iomem * addr); > > +extern void sim_writeq(__u64 val, volatile void __iomem * addr); > > +#define ipath_readl(addr) sim_readl(addr) > > +#define ipath_readq(addr) sim_readq(addr) > > +#define ipath_writel(val, addr) sim_writel(val, addr) > > +#define ipath_writeq(val, addr) sim_writeq(val, addr) > > +#else > > +#define ipath_readl(addr) readl(addr) > > +#define ipath_readq(addr) readq(addr) > > +#define ipath_writel(val, addr) writel(val, addr) > > +#define ipath_writeq(val, addr) writeq(val, addr) > > +#endif > > Please use the proper functions directly. Your simulator can override > them if nessecary. Fine. 
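As an aside on the pci.ids point above: once the constants move to pci_ids.h, the driver's match table would presumably reference them along these lines. This is a sketch; the table name is illustrative.

    static const struct pci_device_id ipath_pci_tbl[] = {
            /* IDs quoted above: vendor 0x1fc1, devices 0xa and 0xd */
            { PCI_DEVICE(PCI_VENDOR_ID_PATHSCALE,
                         PCI_DEVICE_ID_PATHSCALE_INFINIPATH1) },
            { PCI_DEVICE(PCI_VENDOR_ID_PATHSCALE,
                         PCI_DEVICE_ID_PATHSCALE_INFINIPATH2) },
            { 0, }
    };
    MODULE_DEVICE_TABLE(pci, ipath_pci_tbl);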
> > +static __inline__ uint32_t ipath_kget_kreg32(const ipath_type stype, > > + ipath_kreg regno) > > +{ > > + volatile uint32_t *kreg32; > > + > > + if (!devdata[stype].ipath_kregbase) > > + return ~0; > > + > > + kreg32 = (volatile uint32_t *)&devdata[stype].ipath_kregbase[regno]; > > volatile use is probably always wrong. but this whole functions looks like > a very odd wrapper anyway? The volatile is there so the compiler doesn't optimize away the read. This is important, because reads of our hardware have side-effects and cannot be optimized out. Regards, Robert. -- Robert Walsh Email: rjwalsh at pathscale.com PathScale, Inc. Phone: +1 650 934 8117 2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969 Mountain View, CA 94043. From arjan at infradead.org Sat Dec 17 14:25:23 2005 From: arjan at infradead.org (Arjan van de Ven) Date: Sat, 17 Dec 2005 23:25:23 +0100 Subject: [openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers In-Reply-To: <1134857953.20575.59.camel@phosphene.durables.org> References: <200512161548.jRuyTS0HPMLd7V81@cisco.com> <200512161548.aLjaDpGm5aqk0k0p@cisco.com> <20051217131456.GA13043@infradead.org> <1134857953.20575.59.camel@phosphene.durables.org> Message-ID: <1134858323.2997.11.camel@laptopd505.fenrus.org> On Sat, 2005-12-17 at 14:19 -0800, Robert Walsh wrote: > > > +{ > > > + void *ssv, *dsv; > > > + uint32_t csv; > > > + __asm__ __volatile__("cld\n\trep\n\tmovsb":"=&c"(csv), "=&D"(dsv), > > > + "=&S"(ssv) > > > + :"0"(cnt), "1"(dest), "2"(src) > > > + :"memory"); > > > +} > > > > No way we're gonna put assembler code into such a driver. > > Why not? The chip (and therefore the driver) only works with Opterons. > It's tied to the HT bus, but PCI or anything like that. and opterons can already run 2 architectures. And the HT bus is a generic bus.. with public specs. Others than just AMD use it as well. also.. what is wrong with memcpy and co ? > > > +static __inline__ uint32_t ipath_kget_kreg32(const ipath_type stype, > > > + ipath_kreg regno) > > > +{ > > > + volatile uint32_t *kreg32; > > > + > > > + if (!devdata[stype].ipath_kregbase) > > > + return ~0; > > > + > > > + kreg32 = (volatile uint32_t *)&devdata[stype].ipath_kregbase[regno]; > > > > volatile use is probably always wrong. but this whole functions looks like > > a very odd wrapper anyway? > > The volatile is there so the compiler doesn't optimize away the read. > This is important, because reads of our hardware have side-effects and > cannot be optimized out. then you need to use readl() and family most like; they already take care of this anyway. From rjwalsh at pathscale.com Sat Dec 17 14:39:18 2005 From: rjwalsh at pathscale.com (Robert Walsh) Date: Sat, 17 Dec 2005 14:39:18 -0800 Subject: [openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers In-Reply-To: <20051217123827.32f119da.akpm@osdl.org> References: <200512161548.jRuyTS0HPMLd7V81@cisco.com> <200512161548.aLjaDpGm5aqk0k0p@cisco.com> <20051217123827.32f119da.akpm@osdl.org> Message-ID: <1134859158.20575.82.camel@phosphene.durables.org> > > +#define yield() sched_yield() > > Some might get upset about what I assume is userspace test harness code or > what _is_ this doing?) in a driver. But if the maintainers find it useful > we can live with it, That is cosimulator code. It's easy enough to remove. I'll look into it. > > +#ifndef _BITS_PER_BYTE > > +#define _BITS_PER_BYTE 8 > > +#endif > > I'd be inclined to stick BITS_PER_BYTE into include/linux/types.h. Really? 
I was just going to suggest removing it, but if sticking it in types.h works for you, then fine. > > +static __inline__ void ipath_shortcopy(void *dest, void *src, uint32_t cnt) > > + __attribute__ ((always_inline)); > > s/__inline__/inline/ throughout. OK. > > +#define round_up(v,sz) (((v) + (sz)-1) & ~((sz)-1)) > > We have ALIGN() Yup. > > +struct ipath_int_vec { > > + int long long addr; > > long long OK. > > +#define IPATH_USERINIT _IOW('s', 16, struct ipath_user_info) > > +/* init; kernel/chip params to user */ > > +#define IPATH_BASEINFO _IOR('s', 17, struct ipath_base_info) > > +/* send a packet */ > > +#define IPATH_SENDPKT _IOW('s', 18, struct ipath_sendpkt) > > uh-oh. ioctls. Do we have compat conversions for them all, if needed? For those that are needed, I believe we covered them all. Some have suggested removing ioctls. I'm willing to look into alternatives, but if you think they're OK, I'd rather leave them. > > +/* > > + * A segment is a linear region of low physical memory. > > + * XXX Maybe we should use phys addr here and kmap()/kunmap() > > + * Used by the verbs layer. > > + */ > > +struct ipath_seg { > > + void *vaddr; > > + u64 length; > > +}; > > Suggest `long' for the length. We don't need 64 bits on 32-bit machines. OK. > > +struct ipath_mregion { > > + u64 user_base; /* User's address for this region */ > > void *. > > > + u64 iova; /* IB start address of this region */ > > Maybe here too. OK. > > +int ipath_mlock(unsigned long, size_t, struct page **); > > Sometimes it does `int foo()' and sometimes `extern int foo()'. I tend to > think the `extern' is a waste of space. Yup. > > +#define ipath_func_krecord(a) > > +#define ipath_func_urecord(a, b) > > +#define ipath_func_mrecord(a, b) > > +#define ipath_func_rkrecord(a) > > +#define ipath_func_rurecord(a, b) > > +#define ipath_func_rmrecord(a, b) > > +#define ipath_func_rsrecord(a) > > +#define ipath_func_rcrecord(a) > > What are all these doing? Might need do{}while(0) for safety. I'll look at cleaning them out. Probably left-overs from some earlier experiment. > > +#ifdef IPATH_COSIM > > +extern __u32 sim_readl(const volatile void __iomem * addr); > > +extern __u64 sim_readq(const volatile void __iomem * addr); > > The driver has a strange mixture of int32_t, s32 and __s32. s32 is > preferred. Yea - I'll clean that up. > > + */ > > +static __inline__ uint32_t ipath_kget_ureg32(const ipath_type stype, > > + ipath_ureg regno, int port) > > +{ > > + uint64_t *ubase; > > + > > + ubase = (uint64_t *) (devdata[stype].ipath_uregbase > > + + (char *)devdata[stype].ipath_kregbase > > + + devdata[stype].ipath_palign * port); > > + return ubase ? ipath_readl(ubase + regno) : 0; > > +} > > Are all these u64's needed on 32-bit? Don't know - I'll ask around. We don't support the hardware in 32-bit anyway, so... > > +static __inline__ uint64_t ipath_kget_kreg64(const ipath_type stype, > > + ipath_kreg regno) > > +{ > > + if (!devdata[stype].ipath_kregbase) > > + return ~0ULL; > > We don't know that the architecture implements u64 as unsigned long long. > Some use unsigned long. Best way of implmenting the all-ones pattern is > just `-1'. OK. > Gee. Big driver. Tell me about it :-) Basically, we're doing infiniband in software: no offload. Regards, Robert. -- Robert Walsh Email: rjwalsh at pathscale.com PathScale, Inc. Phone: +1 650 934 8117 2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969 Mountain View, CA 94043. 
From rjwalsh at pathscale.com Sat Dec 17 14:40:43 2005 From: rjwalsh at pathscale.com (Robert Walsh) Date: Sat, 17 Dec 2005 14:40:43 -0800 Subject: [openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines In-Reply-To: <20051217123833.1aa430ab.akpm@osdl.org> References: <200512161548.HbgfRzF2TysjsR2G@cisco.com> <200512161548.lRw6KI369ooIXS9o@cisco.com> <20051217123833.1aa430ab.akpm@osdl.org> Message-ID: <1134859243.20575.84.camel@phosphene.durables.org> On Sat, 2005-12-17 at 12:38 -0800, Andrew Morton wrote: > Roland Dreier wrote: > > > > + .globl ipath_dwordcpy > > +/* rdi destination, rsi source, rdx count */ > > +ipath_dwordcpy: > > + movl %edx,%ecx > > + shrl $1,%ecx > > + andl $1,%edx > > + cld > > + rep > > + movsq > > + movl %edx,%ecx > > + rep > > + movsd > > + ret > > err, we have a portability problem. Any chance we could get these moved into the x86_64 arch directory, then? We have to do double-word copies, or our chip gets unhappy. Regards, Robert. -- Robert Walsh Email: rjwalsh at pathscale.com PathScale, Inc. Phone: +1 650 934 8117 2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969 Mountain View, CA 94043. From rjwalsh at pathscale.com Sat Dec 17 14:47:12 2005 From: rjwalsh at pathscale.com (Robert Walsh) Date: Sat, 17 Dec 2005 14:47:12 -0800 Subject: [openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers In-Reply-To: <1134858323.2997.11.camel@laptopd505.fenrus.org> References: <200512161548.jRuyTS0HPMLd7V81@cisco.com> <200512161548.aLjaDpGm5aqk0k0p@cisco.com> <20051217131456.GA13043@infradead.org> <1134857953.20575.59.camel@phosphene.durables.org> <1134858323.2997.11.camel@laptopd505.fenrus.org> Message-ID: <1134859632.20575.92.camel@phosphene.durables.org> > and opterons can already run 2 architectures. And the HT bus is a > generic bus.. with public specs. Others than just AMD use it as well. > > also.. what is wrong with memcpy and co ? Our chips can only handle double-word writes. memcpy isn't guaranteed to do this. > then you need to use readl() and family most like; they already take > care of this anyway. Oh, OK then. Regards, Robert. -- Robert Walsh Email: rjwalsh at pathscale.com PathScale, Inc. Phone: +1 650 934 8117 2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969 Mountain View, CA 94043. From rjwalsh at pathscale.com Sat Dec 17 14:54:44 2005 From: rjwalsh at pathscale.com (Robert Walsh) Date: Sat, 17 Dec 2005 14:54:44 -0800 Subject: [openib-general] Re: [PATCH 13/13] [RFC] ipath Kconfig and Makefile In-Reply-To: <20051217215251.GV23349@stusta.de> References: <200512161548.MdcxE8ZQTy1yj4v1@cisco.com> <200512161548.lokgvLraSGi0enUH@cisco.com> <20051217215251.GV23349@stusta.de> Message-ID: <1134860084.20575.101.camel@phosphene.durables.org> > The driver shouldn't use assembler code and therefore no longer depend > on X86_64. Agreed about the assembler, but one way or the other, x86_64 is the only arch we support. > -Wall is always set when compiling the kernel. Fine. > -O3 doesn't make much sense since the fight for producing the fastest > code is between -O2 and -Os. Makes many nanoseconds of difference to us for our latency numbers. At the low latency numbers we measuring (1.29us), this is a very important difference to our customers. > You don't want to always compile your driver with -g3. Good point. I'll ask around here why we're doing this. > > +_ipath_idstr:="$$""Id: kernel.org InfiniPath Release 1.1 $$"" $$""Date: $(shell date +%F-%R)"" $$" > > +EXTRA_CFLAGS += -D_IPATH_IDSTR='$(_ipath_idstr)' -DIPATH_KERN_TYPE=0 > >... 
> > Please move the _IPATH_IDSTR revision tag to a header file and remove > IPATH_KERN_TYPE. I'll see what I can do. Regards, Robert. -- Robert Walsh Email: rjwalsh at pathscale.com PathScale, Inc. Phone: +1 650 934 8117 2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969 Mountain View, CA 94043. From bunk at stusta.de Sat Dec 17 15:55:54 2005 From: bunk at stusta.de (Adrian Bunk) Date: Sun, 18 Dec 2005 00:55:54 +0100 Subject: [openib-general] Re: [PATCH 13/13] [RFC] ipath Kconfig and Makefile In-Reply-To: <1134860084.20575.101.camel@phosphene.durables.org> References: <200512161548.MdcxE8ZQTy1yj4v1@cisco.com> <200512161548.lokgvLraSGi0enUH@cisco.com> <20051217215251.GV23349@stusta.de> <1134860084.20575.101.camel@phosphene.durables.org> Message-ID: <20051217235554.GW23349@stusta.de> On Sat, Dec 17, 2005 at 02:54:44PM -0800, Robert Walsh wrote: > > The driver shouldn't use assembler code and therefore no longer depend > > on X86_64. > > Agreed about the assembler, but one way or the other, x86_64 is the only > arch we support. >... There's a difference between "technically supported by the driver" and "officially supported for our customers": It's fine if you tell the customers buying your hardware "anything else than 64bit x86_64 kernels is completely unsupported", but for getting your driver included into the kernel it should be 32bit clean [1] and should also work for people using 32bit kernels on an Opteron. > > -O3 doesn't make much sense since the fight for producing the fastest > > code is between -O2 and -Os. > > Makes many nanoseconds of difference to us for our latency numbers. At > the low latency numbers we're measuring (1.29us), this is a very important > difference to our customers. >... There's no doubt that this is important for your customers. What surprises me is that -O3 turned out to be the fastest flag for you. Can you send numbers comparing -Os/-O2/-O3 (without -g3, preferably with gcc 4.0) including a description what and how you are measuring? > Regards, > Robert. cu Adrian [1] not long ago, it used to be the other way round that drivers weren't 64bit clean... -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed From alan at lxorguk.ukuu.org.uk Sat Dec 17 16:27:03 2005 From: alan at lxorguk.ukuu.org.uk (Alan Cox) Date: Sun, 18 Dec 2005 00:27:03 +0000 Subject: [openib-general] Re: [PATCH 13/13] [RFC] ipath Kconfig and Makefile In-Reply-To: <1134860084.20575.101.camel@phosphene.durables.org> References: <200512161548.MdcxE8ZQTy1yj4v1@cisco.com> <200512161548.lokgvLraSGi0enUH@cisco.com> <20051217215251.GV23349@stusta.de> <1134860084.20575.101.camel@phosphene.durables.org> Message-ID: <1134865624.11953.58.camel@localhost.localdomain> On Sad, 2005-12-17 at 14:54 -0800, Robert Walsh wrote: > Agreed about the assembler, but one way or the other, x86_64 is the only > arch we support. If you need a quad only copy then put it into asm/string.h (asm/io.h if its operating on I/O space I guess) or somewhere similar as a generic function that does just that. That allows people to come along and provide the same functions for other platforms if they need it, and also makes it possible for others to use this feature if their hardware has the same feature. Nobody expects you as a vendor to support it on sparc64, x86-32 or whatever, nor to write sparc64 asm functions just to make it possible for someone to do so cleanly later on.
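As a concrete sketch of the generic function Alan describes (and of the portable C fallback David Miller asks for later in this thread): the function name here is hypothetical, and Alan's caveat applies -- the compiler is technically free to split the 64-bit stores on some targets, so an architecture wanting a hard guarantee would still override this with assembler.

    /* Copy ndwords 32-bit words: 64-bit stores for the bulk, one
     * 32-bit store for a trailing odd word.  Assumes 8-byte-aligned
     * pointers so the 64-bit stores are natural, single stores. */
    static void ipath_dwordcpy_c(u32 *dst, const u32 *src,
                                 unsigned long ndwords)
    {
            u64 *d64 = (u64 *) dst;
            const u64 *s64 = (const u64 *) src;
            unsigned long n = ndwords >> 1;

            while (n--)
                    *d64++ = *s64++;
            if (ndwords & 1)
                    dst[ndwords - 1] = src[ndwords - 1];
    }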
From rjwalsh at pathscale.com Sat Dec 17 17:17:04 2005 From: rjwalsh at pathscale.com (Robert Walsh) Date: Sat, 17 Dec 2005 17:17:04 -0800 Subject: [openib-general] Re: [PATCH 13/13] [RFC] ipath Kconfig and Makefile In-Reply-To: <20051217235554.GW23349@stusta.de> References: <200512161548.MdcxE8ZQTy1yj4v1@cisco.com> <200512161548.lokgvLraSGi0enUH@cisco.com> <20051217215251.GV23349@stusta.de> <1134860084.20575.101.camel@phosphene.durables.org> <20051217235554.GW23349@stusta.de> Message-ID: <1134868624.20575.106.camel@phosphene.durables.org> > There's a difference between "technically supported by the driver" and > "officially supported for our customers": > > It's fine if you tell the customers buying your hardware "anything else > than 64bit x86_64 kernels is completely unsupported", but for getting > your driver included into the kernel it should be 32bit clean [1] and > should also work for people using 32bit kernels on an Opteron. Fair enough - I'll see what I can do. > What surprises me is that -O3 turned out to be the fastest flag for you. > > Can you send numbers comparing -Os/-O2/-O3 (without -g3, preferably with > gcc 4.0) including a description what and how you are measuring? I'll try to get around to this after my vacation and after we've had time to absorb and address all the other feedback we received. Regards, Robert. -- Robert Walsh Email: rjwalsh at pathscale.com PathScale, Inc. Phone: +1 650 934 8117 2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969 Mountain View, CA 94043. From akpm at osdl.org Sat Dec 17 19:10:07 2005 From: akpm at osdl.org (Andrew Morton) Date: Sat, 17 Dec 2005 19:10:07 -0800 Subject: [openib-general] Re: [PATCH 07/13] [RFC] ipath core misc files In-Reply-To: <1134855235.20575.22.camel@phosphene.durables.org> References: <200512161548.KglSM2YESlGlEQfQ@cisco.com> <200512161548.3fqe3fMerrheBMdX@cisco.com> <20051217123850.aa6cfd53.akpm@osdl.org> <1134855235.20575.22.camel@phosphene.durables.org> Message-ID: <20051217191007.a77d23af.akpm@osdl.org> Robert Walsh wrote: > > > > +int ipath_mlock(unsigned long start_page, size_t num_pages, struct page **p) > > OK. It's perhaps not a very well named function. > > Really? Suggestion for a better name? > ipath_get_user_pages() would cause the least surprise. > > > + } > > > + vm->vm_flags |= VM_SHM | VM_LOCKED; > > > + > > > + return 0; > > > +} > > I don't think we want to be setting the user's VMA's vm_flags in this > manner. This is purely to retain the physical page across fork? > > I didn't write this bit of the driver, but I believe this is the case. > Is there a better way of doing this? This stuff has been churning a bit lately. I've drawn Hugh Dickins's attention to the patch - he'd have a better handle on what the best approach would be. From rjwalsh at pathscale.com Sat Dec 17 19:13:09 2005 From: rjwalsh at pathscale.com (Robert Walsh) Date: Sat, 17 Dec 2005 19:13:09 -0800 Subject: [openib-general] Re: [PATCH 07/13] [RFC] ipath core misc files In-Reply-To: <20051217191007.a77d23af.akpm@osdl.org> References: <200512161548.KglSM2YESlGlEQfQ@cisco.com> <200512161548.3fqe3fMerrheBMdX@cisco.com> <20051217123850.aa6cfd53.akpm@osdl.org> <1134855235.20575.22.camel@phosphene.durables.org> <20051217191007.a77d23af.akpm@osdl.org> Message-ID: <1134875589.20575.122.camel@phosphene.durables.org> On Sat, 2005-12-17 at 19:10 -0800, Andrew Morton wrote: > Robert Walsh wrote: > > > > > > +int ipath_mlock(unsigned long start_page, size_t num_pages, struct page **p) > > > OK. It's perhaps not a very well named function.
> > > > Really? Suggestion for a better name? > > > > ipath_get_user_pages() would cause the least surprise. Seems reasonable. I'll look at the related functions, too. > > > I don't think we want to be setting the user's VMA's vm_flags in this > > > manner. This is purely to retain the physical page across fork? > > > > I didn't write this bit of the driver, but I believe this is the case. > > Is there a better way of doing this? > > This stuff has been churning a bit lately. I've drawn Hugh Dickins's > attention to the patch - he'd have a better handle on what the best > approach would be. OK then - I'll wait and see. Regards, Robert. -- Robert Walsh Email: rjwalsh at pathscale.com PathScale, Inc. Phone: +1 650 934 8117 2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969 Mountain View, CA 94043. From akpm at osdl.org Sat Dec 17 19:14:17 2005 From: akpm at osdl.org (Andrew Morton) Date: Sat, 17 Dec 2005 19:14:17 -0800 Subject: [openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers In-Reply-To: <1134859158.20575.82.camel@phosphene.durables.org> References: <200512161548.jRuyTS0HPMLd7V81@cisco.com> <200512161548.aLjaDpGm5aqk0k0p@cisco.com> <20051217123827.32f119da.akpm@osdl.org> <1134859158.20575.82.camel@phosphene.durables.org> Message-ID: <20051217191417.f16011bb.akpm@osdl.org> Robert Walsh wrote: > > > I'd be inclined to stick BITS_PER_BYTE into include/linux/types.h. > > Really? I was just going to suggest removing it, but if sticking it in > types.h works for you, then fine. > I think it's a readability thing. x += 8; /* wtf? */ vs x += BITS_PER_BYTE; /* ah! */ From akpm at osdl.org Sat Dec 17 19:19:32 2005 From: akpm at osdl.org (Andrew Morton) Date: Sat, 17 Dec 2005 19:19:32 -0800 Subject: [openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines In-Reply-To: <1134859243.20575.84.camel@phosphene.durables.org> References: <200512161548.HbgfRzF2TysjsR2G@cisco.com> <200512161548.lRw6KI369ooIXS9o@cisco.com> <20051217123833.1aa430ab.akpm@osdl.org> <1134859243.20575.84.camel@phosphene.durables.org> Message-ID: <20051217191932.af2b422c.akpm@osdl.org> Robert Walsh wrote: > > > > + movl %edx,%ecx > > > + shrl $1,%ecx > > > + andl $1,%edx > > > + cld > > > + rep > > > + movsq > > > + movl %edx,%ecx > > > + rep > > > + movsd > > > + ret > > > > err, we have a portability problem. > > Any chance we could get these moved into the x86_64 arch directory, > then? That would make sense. Give it a non-ipath-related name and require that all architectures which wish to run this driver must implement that (documented) function. And, in Kconfig, make sure that architectures which don't implement that library function do not attempt to build this driver. To avoid breaking `make allmodconfig'. > We have to do double-word copies, or our chip gets unhappy. In what form is this chip available? As a standard PCI/PCIX card which people will want to plug into power4/ia64/x86 machines? Or is it in some way exclusively tied to x86_64? From ak at suse.de Sat Dec 17 19:25:27 2005 From: ak at suse.de (Andi Kleen) Date: 18 Dec 2005 04:25:27 +0100 Subject: [openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers In-Reply-To: References: <200512161548.jRuyTS0HPMLd7V81@cisco.com> <200512161548.aLjaDpGm5aqk0k0p@cisco.com> <20051217131456.GA13043@infradead.org> Message-ID: ebiederm at xmission.com (Eric W. Biederman) writes: > Christoph Hellwig writes: > > > please always use fixed-size types for user communication. also please > > avoid ioctls like the rest of the IB codebase.
> > Could someone please explain to me how the uverbs abuse of write > is better than ioctl? It's actually worse because if they have a 32bit compat issue then ioctl can be fixed up, but read/write can't. I wish the people arguing against ioctl all the time would just stop that because the alternatives are usually worse. > - 64bit compilers will not pad every structure to 8 bytes. This > only will happen if you happen to have an 8 byte element in your > structure that is only aligned to 32bits by a 32bit structure. > Unfortunately the 32bit gcc only aligns long long to 32bits on > x86, which triggers the described behavior. Exactly - and driver writers usually don't get that right so we need to have a tool to fix it up in the end. And with ioctl that's easiest. -Andi From ak at suse.de Sat Dec 17 19:27:06 2005 From: ak at suse.de (Andi Kleen) Date: 18 Dec 2005 04:27:06 +0100 Subject: [openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines In-Reply-To: <1134859243.20575.84.camel@phosphene.durables.org> References: <200512161548.HbgfRzF2TysjsR2G@cisco.com> <200512161548.lRw6KI369ooIXS9o@cisco.com> <20051217123833.1aa430ab.akpm@osdl.org> <1134859243.20575.84.camel@phosphene.durables.org> Message-ID: Robert Walsh writes: > > Any chance we could get these moved into the x86_64 arch directory, > then? We have to do double-word copies, or our chip gets unhappy. Standard memcpy will do double word copies if everything is suitably aligned. Just use that. -Andi From bunk at stusta.de Sat Dec 17 19:35:17 2005 From: bunk at stusta.de (Adrian Bunk) Date: Sun, 18 Dec 2005 04:35:17 +0100 Subject: [openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines In-Reply-To: <20051217191932.af2b422c.akpm@osdl.org> References: <200512161548.HbgfRzF2TysjsR2G@cisco.com> <200512161548.lRw6KI369ooIXS9o@cisco.com> <20051217123833.1aa430ab.akpm@osdl.org> <1134859243.20575.84.camel@phosphene.durables.org> <20051217191932.af2b422c.akpm@osdl.org> Message-ID: <20051218033517.GY23349@stusta.de> On Sat, Dec 17, 2005 at 07:19:32PM -0800, Andrew Morton wrote: >... > In what form is this chip available? As a standard PCI/PCIX card which > people will want to plug into power4/ia64/x86 machines? Or is it in some > way exclusively tied to x86_64? Hardware can hardly be exclusively tied to x86_64 without also being available on x86 machines since i386 kernels run on x86_64 hardware. cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed From rjwalsh at pathscale.com Sat Dec 17 21:33:51 2005 From: rjwalsh at pathscale.com (Robert Walsh) Date: Sat, 17 Dec 2005 21:33:51 -0800 Subject: [openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines In-Reply-To: <20051217191932.af2b422c.akpm@osdl.org> References: <200512161548.HbgfRzF2TysjsR2G@cisco.com> <200512161548.lRw6KI369ooIXS9o@cisco.com> <20051217123833.1aa430ab.akpm@osdl.org> <1134859243.20575.84.camel@phosphene.durables.org> <20051217191932.af2b422c.akpm@osdl.org> Message-ID: <1134884031.20575.126.camel@phosphene.durables.org> > > Any chance we could get these moved into the x86_64 arch directory, > > then? > > That would make sense. Give it a non-ipath-related name and require that > all architectures which wish to run this driver must implement that > (documented) function. > > And, in Kconfig, make sure that architectures which don't implement that > library function do not attempt to build this driver.
To avoid breaking > `make allmodconfig'. Sounds good. I'll get something together next week. > > We have to do double-word copies, or our chip gets unhappy. > > In what form is this chip available? As a standard PCI/PCIX card which > people will want to plug into power4/ia64/x86 machines? Or is it in some > way exclusively tied to x86_64? It's a HyperTransport card, not PCI/PCIe/PCIX. It plugs into the HTX slot on a suitably-equipped motherboard. On some machines, it's available on the motherboard itself (e.g. the Linux Networx LS/X.) Regards, Robert. -- Robert Walsh Email: rjwalsh at pathscale.com PathScale, Inc. Phone: +1 650 934 8117 2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969 Mountain View, CA 94043. From rjwalsh at pathscale.com Sat Dec 17 21:36:29 2005 From: rjwalsh at pathscale.com (Robert Walsh) Date: Sat, 17 Dec 2005 21:36:29 -0800 Subject: [openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines In-Reply-To: References: <200512161548.HbgfRzF2TysjsR2G@cisco.com> <200512161548.lRw6KI369ooIXS9o@cisco.com> <20051217123833.1aa430ab.akpm@osdl.org> <1134859243.20575.84.camel@phosphene.durables.org> Message-ID: <1134884189.20575.129.camel@phosphene.durables.org> On Sun, 2005-12-18 at 04:27 +0100, Andi Kleen wrote: > Robert Walsh writes: > > > > Any chance we could get these moved into the x86_64 arch directory, > > then? We have to do double-word copies, or our chip gets unhappy. > > Standard memcpy will do double word copies if everything is suitably > aligned. Just use that. This is dealing with buffers that may be passed in from user space, so there's no guarantee of alignment for either the start address or the length. Regards, Robert. -- Robert Walsh Email: rjwalsh at pathscale.com PathScale, Inc. Phone: +1 650 934 8117 2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969 Mountain View, CA 94043. From ak at suse.de Sat Dec 17 21:41:50 2005 From: ak at suse.de (Andi Kleen) Date: Sun, 18 Dec 2005 06:41:50 +0100 Subject: [openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines In-Reply-To: <1134884189.20575.129.camel@phosphene.durables.org> References: <200512161548.HbgfRzF2TysjsR2G@cisco.com> <200512161548.lRw6KI369ooIXS9o@cisco.com> <20051217123833.1aa430ab.akpm@osdl.org> <1134859243.20575.84.camel@phosphene.durables.org> <1134884189.20575.129.camel@phosphene.durables.org> Message-ID: <20051218054149.GE23384@wotan.suse.de> On Sat, Dec 17, 2005 at 09:36:29PM -0800, Robert Walsh wrote: > On Sun, 2005-12-18 at 04:27 +0100, Andi Kleen wrote: > > Robert Walsh writes: > > > > > > Any chance we could get these moved into the x86_64 arch directory, > > > then? We have to do double-word copies, or our chip gets unhappy. > > > > Standard memcpy will do double word copies if everything is suitably > > aligned. Just use that. > > This is dealing with buffers that may be passed in from user space, so > there's no guarantee of alignment for either the start address or the > length. So how can you do double word access when the length is not a multiple of four? The current x86-64 copy_from_user will use double word access even in that case, except for the end of course. But what you're doing is so deeply unportable it's not funny. I am not sure such an unportable driver even belongs in the kernel. If the code was really intended to run on user space addresses it was totally broken btw because it didn't handle exceptions. -Andi
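Andi's two objections -- arbitrary user alignment/length and unhandled faults -- are conventionally both answered with a bounce buffer. A hypothetical sketch (illustrative names, not from the posted driver), reusing the portable copy sketched earlier in the thread:

    /* copy_from_user() copes with any user alignment and with faulting
     * addresses; the device-mandated double-word copy then runs from
     * an aligned kernel buffer. */
    static int copy_user_dwords(u32 *piobuf, const u32 __user *ubuf,
                                unsigned long ndwords)
    {
            u32 bounce[128] __attribute__((aligned(8)));

            while (ndwords) {
                    unsigned long n = min(ndwords, 128UL);

                    if (copy_from_user(bounce, ubuf, n * sizeof(u32)))
                            return -EFAULT;
                    ipath_dwordcpy_c(piobuf, bounce, n);
                    piobuf += n;
                    ubuf += n;
                    ndwords -= n;
            }
            return 0;
    }

The price is an extra copy per chunk; whether that is tolerable at the latencies discussed earlier in the thread is a separate question.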
From jackm at mellanox.co.il Sat Dec 17 23:52:24 2005 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Sun, 18 Dec 2005 09:52:24 +0200 Subject: [openib-general] [PATCH] mthca: fix WQE size calculation in create-srq Message-ID: <20051218075224.GA1741@mellanox.co.il> Thinko: 64 bytes is the minimum SRQ WQE size (not the maximum). Signed-off-by: Jack Morgenstein Index: linux-kernel/drivers/infiniband/hw/mthca/mthca_srq.c =================================================================== --- linux-kernel.orig/drivers/infiniband/hw/mthca/mthca_srq.c +++ linux-kernel/drivers/infiniband/hw/mthca/mthca_srq.c @@ -198,7 +198,7 @@ int mthca_alloc_srq(struct mthca_dev *de if (mthca_is_memfree(dev)) srq->max = roundup_pow_of_two(srq->max + 1); - ds = min(64UL, + ds = max(64UL, roundup_pow_of_two(sizeof (struct mthca_next_seg) + srq->max_gs * sizeof (struct mthca_data_seg))); srq->wqe_shift = long_log2(ds); From eitan at mellanox.co.il Sun Dec 18 00:53:30 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sun, 18 Dec 2005 10:53:30 +0200 Subject: [openib-general] Re: A couple of questions about osm_lid_mgr.c::__osm_lid_mgr_set_physp_pi In-Reply-To: <1134766491.4338.10299.camel@hal.voltaire.com> References: <1134766491.4338.10299.camel@hal.voltaire.com> Message-ID: <43A5238A.1040702@mellanox.co.il> Hal Rosenstock wrote: > Hi, > > I have a couple of questions about > osm_lid_mgr.c::__osm_lid_mgr_set_physp_pi. > > There is the following code: > > if ( (mtu != ib_port_info_get_mtu_cap( p_old_pi )) || > (op_vls != ib_port_info_get_op_vls(p_old_pi))) > { > if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) ) > { > osm_log( p_mgr->p_log, OSM_LOG_DEBUG, > "__osm_lid_mgr_set_physp_pi: " > "Sending Link Down due to op_vls or mtu change. > MTU:%u,%u VL_CAP:%u,%u\n", > mtu, ib_port_info_get_mtu_cap( p_old_pi ), > op_vls, ib_port_info_get_op_vls(p_old_pi) > ); > } > ib_port_info_set_port_state( p_pi, IB_LINK_DOWN ); > > This seems a little inconsistent to me. It seems like NeighborMTU would > be the equivalent of OperationalVLs, rather than MTUCap (which is RO). Yes I think we should have checked the NeighborMTU and not the MTUCap. > Also, why does changing the MTU require that the link be taken down ? The behavior of the link when the neighbor MTU changes is not very well defined. So the best way to handle that is to force it down. > > I also noticed a nit in the same function: > > p_pi->m_key_lease_period = p_mgr->p_subn->opt.m_key_lease_period; > /* Check to see if the value we are setting is different than > the value in the port_info. If it is - turn on send_set flag */ > if (cl_memcmp( &p_pi->m_key_lease_period, > &p_old_pi->m_key_lease_period, > sizeof(p_pi->m_key_lease_period) )) > send_set = TRUE; > > Should that be only when the Mkey is non 0 ? Well, I know the lease is not relevant when MKey = 0. But for code clarity I propose to ignore that fact. The effect is only when someone sets a lease period but MKey = 0 which IMO does not make any sense anyway. > > -- Hal > From davem at davemloft.net Sun Dec 18 01:33:41 2005 From: davem at davemloft.net (David S.
Miller) Date: Sun, 18 Dec 2005 01:33:41 -0800 (PST) Subject: [openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines In-Reply-To: <20051217191932.af2b422c.akpm@osdl.org> References: <20051217123833.1aa430ab.akpm@osdl.org> <1134859243.20575.84.camel@phosphene.durables.org> <20051217191932.af2b422c.akpm@osdl.org> Message-ID: <20051218.013341.34772534.davem@davemloft.net> From: Andrew Morton Date: Sat, 17 Dec 2005 19:19:32 -0800 > That would make sense. Give it a non-ipath-related name and require that > all architectures which wish to run this driver must implement that > (documented) function. > > And, in Kconfig, make sure that architectures which don't implement that > library function do not attempt to build this driver. To avoid breaking > `make allmodconfig'. How about we implement a portable version in C that you get by default if you don't implement the assembler routine? Pretty please? :-) From eitan at mellanox.co.il Sun Dec 18 03:53:24 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sun, 18 Dec 2005 13:53:24 +0200 Subject: [openib-general] RE: [PATCH] OpenSM/ib_types.h: Modify ib_port_info_compute_rate Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3618B49@mtlexch01.mtl.com> Hi Hal, The attached patch is fine. Please go ahead and commit it. BTW: In the following commit 4509 you have changed the name of a switch info record field. Note this is an API change and have severe effect on any application using ib_types.h (and there are plenty of these) I would appreciate if you will revert this un-necessary change. Also in the future please post a patch before changing ib_types.h osm_vendor_api.h osm_vendor_sa_api.h and any of the complib H files. ------------------------------------------------------------------------ r4509 | halr | 2005-12-16 22:25:29 +0200 (Fri, 16 Dec 2005) | 5 lines In switchinfo, rename enforce_cap to partition_enf_cap to be closer to it's IBA spec name EZ Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O.
Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Thursday, December 15, 2005 4:33 PM > To: Yael Kalka > Cc: openib-general at openib.org; Eitan Zahavi > Subject: [PATCH] OpenSM/ib_types.h: Modify ib_port_info_compute_rate > > OpenSM/ib_types.h: Modify ib_port_info_compute_rate so that gcc version > 4.0.0 20050519 (Red Hat 4.0.0-8) doesn't complain when compiling > osm_sa_path_record.c as follows: > > osm_sa_path_record.c: In function '__osm_pr_rcv_get_path_parms': > osm_sa_path_record.c:194: warning: control may reach end of non-void function > 'ib_port_info_compute_rate' being inlined > > Signed-off-by: Hal Rosenstock > > Index: include/iba/ib_types.h > =================================================================== > --- include/iba/ib_types.h (revision 4479) > +++ include/iba/ib_types.h (working copy) > @@ -4290,59 +4290,76 @@ static inline uint8_t > ib_port_info_compute_rate( > IN const ib_port_info_t* const p_pi ) > { > + uint8_t rate = 0; > + > switch (ib_port_info_get_link_speed_active(p_pi)) > { > case IB_LINK_SPEED_ACTIVE_2_5: > switch (p_pi->link_width_active) > { > case IB_LINK_WIDTH_ACTIVE_1X: > - return IB_PATH_RECORD_RATE_2_5_GBS; > + rate = IB_PATH_RECORD_RATE_2_5_GBS; > + break; > > case IB_LINK_WIDTH_ACTIVE_4X: > - return IB_PATH_RECORD_RATE_10_GBS; > - > + rate = IB_PATH_RECORD_RATE_10_GBS; > + break; > + > case IB_LINK_WIDTH_ACTIVE_12X: > - return IB_PATH_RECORD_RATE_30_GBS; > - > + rate = IB_PATH_RECORD_RATE_30_GBS; > + break; > + > default: > - return IB_PATH_RECORD_RATE_2_5_GBS; > + rate = IB_PATH_RECORD_RATE_2_5_GBS; > + break; > } > break; > case IB_LINK_SPEED_ACTIVE_5: > switch (p_pi->link_width_active) > { > case IB_LINK_WIDTH_ACTIVE_1X: > - return IB_PATH_RECORD_RATE_5_GBS; > - > + rate = IB_PATH_RECORD_RATE_5_GBS; > + break; > + > case IB_LINK_WIDTH_ACTIVE_4X: > - return IB_PATH_RECORD_RATE_20_GBS; > - > + rate = IB_PATH_RECORD_RATE_20_GBS; > + break; > + > case IB_LINK_WIDTH_ACTIVE_12X: > - return IB_PATH_RECORD_RATE_60_GBS; > - > + rate = IB_PATH_RECORD_RATE_60_GBS; > + break; > + > default: > - return IB_PATH_RECORD_RATE_5_GBS; > + rate = IB_PATH_RECORD_RATE_5_GBS; > + break; > } > break; > case IB_LINK_SPEED_ACTIVE_10: > switch (p_pi->link_width_active) > { > case IB_LINK_WIDTH_ACTIVE_1X: > - return IB_PATH_RECORD_RATE_10_GBS; > - > + rate = IB_PATH_RECORD_RATE_10_GBS; > + break; > + > case IB_LINK_WIDTH_ACTIVE_4X: > - return IB_PATH_RECORD_RATE_40_GBS; > - > + rate = IB_PATH_RECORD_RATE_40_GBS; > + break; > + > case IB_LINK_WIDTH_ACTIVE_12X: > - return IB_PATH_RECORD_RATE_120_GBS; > - > + rate =IB_PATH_RECORD_RATE_120_GBS; > + break; > + > default: > - return IB_PATH_RECORD_RATE_10_GBS; > + rate = IB_PATH_RECORD_RATE_10_GBS; > + break; > } > break; > default: > - return IB_PATH_RECORD_RATE_2_5_GBS; > + rate = IB_PATH_RECORD_RATE_2_5_GBS; > + break; > } > + > + return rate; > } > /* > * PARAMETERS From mst at mellanox.co.il Sun Dec 18 04:07:25 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 18 Dec 2005 14:07:25 +0200 Subject: [openib-general] [PATCH] ipoib: mcast allocation error handling Message-ID: <20051218120725.GE4241@mellanox.co.il> Here's a patch for a potential memory leak in ipoib. BTW, given that mcast group allocations are done here with GFP_ATOMIC, dont we want to do something safer than just print a warning if they fail? --- Warn, and dont leak memory, on allocation failure for broadcast mcast group. Signed-off-by: Eli Cohen Signed-off-by: Michael S. 
Tsirkin Index: latest/drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- latest.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ latest/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -795,9 +796,11 @@ void ipoib_mcast_dev_flush(struct net_de &priv->multicast_tree); list_add_tail(&priv->broadcast->list, &remove_list); - } - - priv->broadcast = nmcast; + priv->broadcast = nmcast; + } else + ipoib_warn(priv, "could not reallocate broadcast group " + IPOIB_GID_FMT "\n", + IPOIB_GID_ARG(priv->broadcast->mcmember.mgid)); } spin_unlock_irqrestore(&priv->lock, flags); -- MST From alan at lxorguk.ukuu.org.uk Sun Dec 18 05:25:48 2005 From: alan at lxorguk.ukuu.org.uk (Alan Cox) Date: Sun, 18 Dec 2005 13:25:48 +0000 Subject: [openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines In-Reply-To: References: <200512161548.HbgfRzF2TysjsR2G@cisco.com> <200512161548.lRw6KI369ooIXS9o@cisco.com> <20051217123833.1aa430ab.akpm@osdl.org> <1134859243.20575.84.camel@phosphene.durables.org> Message-ID: <1134912348.26141.19.camel@localhost.localdomain> On Sul, 2005-12-18 at 04:27 +0100, Andi Kleen wrote: > Robert Walsh writes: > > > > Any chance we could get these moved into the x86_64 arch directory, > > then? We have to do double-word copies, or our chip gets unhappy. > > Standard memcpy will do double word copies if everything is suitably > aligned. Just use that. Sorry I have to disagree with that. The standard memcpy may change in future to have different properties. If you really need specific unusual properties then you want to be sure that it is obvious they are there. I'd also like to see Dave's generic C implementation as I don't believe you can create one the compiler isn't allowed at least technically to do differently. From ebiederm at xmission.com Sun Dec 18 07:02:47 2005 From: ebiederm at xmission.com (Eric W. Biederman) Date: Sun, 18 Dec 2005 08:02:47 -0700 Subject: [openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers In-Reply-To: (Andi Kleen's message of "18 Dec 2005 04:25:27 +0100") References: <200512161548.jRuyTS0HPMLd7V81@cisco.com> <200512161548.aLjaDpGm5aqk0k0p@cisco.com> <20051217131456.GA13043@infradead.org> Message-ID: Andi Kleen writes: > ebiederm at xmission.com (Eric W. Biederman) writes: > >> Christoph Hellwig writes: >> >> > please always use fixed-size types for user communication. also please >> > avoid ioctls like the rest of the IB codebase.
In this case I don't see any current problems. But I don't think this is a pattern we want to encourage, and if there is a more maintainable pattern now would be the time to fix it. Eric From halr at voltaire.com Sun Dec 18 07:48:04 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Dec 2005 10:48:04 -0500 Subject: [openib-general] RE: [PATCH] OpenSM/ib_types.h: Modify ib_port_info_compute_rate In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E3618B49@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E3618B49@mtlexch01.mtl.com> Message-ID: <1134920884.4328.21299.camel@hal.voltaire.com> Hi Eitan, On Sun, 2005-12-18 at 06:53, Eitan Zahavi wrote: > Hi Hal, > > The attached patch is fine. Please go ahead and commit it. > > BTW: > In the following commit 4509 you have changed the name of a switch info > record field. > Note this is an API change and have severe effect on any application > using ib_types.h > (and there are plenty of these) Is this "API" frozen for all time ? How would you propose that this "API" evolve ? I do not see where there is any versioning to the API. > I would appreciate if you will revert this un-necessary change. I reverted this change. > Also in > the future please post a patch before changing ib_types.h > osm_vendor_api.h osm_vendor_sa_api.h and any of the complib H files. The previous request on this was for any non cosmetic changes. This was viewed as a cosmetic change (a simple variable name change).I'll now post any changes to these header files. -- Hal From eitan at mellanox.co.il Sun Dec 18 08:03:32 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sun, 18 Dec 2005 18:03:32 +0200 Subject: [openib-general] RE: [PATCH] OpenSM/ib_types.h: Modify ib_port_info_compute_rate Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3618B50@mtlexch01.mtl.com> Hi Hal, Thanks for reverting the patch. Regarding changing the API, I propose we will discuss every change and try to limit them to really critical ones. I do not know how we can use versioning in Header files. Eitan Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Sunday, December 18, 2005 5:48 PM > To: Eitan Zahavi > Cc: openib-general at openib.org; Yael Kalka > Subject: RE: [PATCH] OpenSM/ib_types.h: Modify ib_port_info_compute_rate > > Hi Eitan, > > On Sun, 2005-12-18 at 06:53, Eitan Zahavi wrote: > > Hi Hal, > > > > The attached patch is fine. Please go ahead and commit it. > > > > BTW: > > In the following commit 4509 you have changed the name of a switch info > > record field. > > Note this is an API change and have severe effect on any application > > using ib_types.h > > (and there are plenty of these) > > Is this "API" frozen for all time ? How would you propose that this > "API" evolve ? I do not see where there is any versioning to the API. > > > I would appreciate if you will revert this un-necessary change. > > I reverted this change. > > > Also in > > the future please post a patch before changing ib_types.h > > osm_vendor_api.h osm_vendor_sa_api.h and any of the complib H files. > > The previous request on this was for any non cosmetic changes. This was > viewed as a cosmetic change (a simple variable name change).I'll now > post any changes to these header files. 
> > -- Hal From halr at voltaire.com Sun Dec 18 08:13:55 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Dec 2005 11:13:55 -0500 Subject: [openib-general] Re: A couple of questions about osm_lid_mgr.c::__osm_lid_mgr_set_physp_pi In-Reply-To: <43A5238A.1040702@mellanox.co.il> References: <1134766491.4338.10299.camel@hal.voltaire.com> <43A5238A.1040702@mellanox.co.il> Message-ID: <1134922434.4328.21565.camel@hal.voltaire.com> On Sun, 2005-12-18 at 03:53, Eitan Zahavi wrote: [snip...] > > This seems a little inconsistent to me. It seems like NeighborMTU would > > be the equivalent of OperationalVLs, rather than MTUCap (which is RO). > Yes I think we should have checked the NeighborMTU and not the MTUCap. OK. I'll fix this. > > Also, why does changing the MTU require that the link be taken down ? > The behavior of the link when the neighbor MTU changes is not very well defined. > So the best way to handle that is to force it down. NeighborMTU is not involved with the link negotiation nor is there a comment in the description like OperationalVLs. What behavior are you referring to ? > > I also noticed a nit in the same function: > > > > p_pi->m_key_lease_period = p_mgr->p_subn->opt.m_key_lease_period; > > /* Check to see if the value we are setting is different than > > the value in the port_info. If it is - turn on send_set flag */ > > if (cl_memcmp( &p_pi->m_key_lease_period, > > &p_old_pi->m_key_lease_period, > > sizeof(p_pi->m_key_lease_period) )) > > send_set = TRUE; > > > > Should that be only when the Mkey is non 0 ? > Well, I know the lease is not relevant when MKey = 0. But for code clarity I > propose to ignore that fact. The effect is only when someone sets a lease period but MKey = 0 > which IMO does not make any sense anyway. I agree it does not make sense but could happen (is it prevented somehow ?) so my take is to minimize the need for sets. As I said this is a nit.
-- Hal From halr at voltaire.com Sun Dec 18 08:18:29 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Dec 2005 11:18:29 -0500 Subject: [openib-general] [PATCH] OpenSM/osm_port.h: In osm_physp_trim_base_lid_to_valid_range, also fix LID 0 Message-ID: <1134922709.4328.21620.camel@hal.voltaire.com> OpenSM/osm_port.h: In osm_physp_trim_base_lid_to_valid_range, fix LID 0 as well as multicast LIDs Signed-off-by: Hal Rosenstock Index: osm_port.h =================================================================== --- osm_port.h (revision 4522) +++ osm_port.h (working copy) @@ -473,7 +473,8 @@ osm_physp_trim_base_lid_to_valid_range( ib_net16_t orig_lid = 0; CL_ASSERT( osm_physp_is_valid( p_physp ) ); - if ( cl_ntoh16( p_physp->port_info.base_lid ) > IB_LID_UCAST_END_HO ) + if ( ( cl_ntoh16( p_physp->port_info.base_lid ) > IB_LID_UCAST_END_HO ) || + ( cl_ntoh16( p_physp->port_info.base_lid ) < IB_LID_UCAST_START_HO ) ) { orig_lid = p_physp->port_info.base_lid; p_physp->port_info.base_lid = 0; From halr at voltaire.com Sun Dec 18 08:29:04 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Dec 2005 11:29:04 -0500 Subject: [openib-general] OpenSM/osm_lid_mgr.c: In __osm_lid_mgr_set_physp_pi, check for change in neighbor MTU rather than MTUCap Message-ID: <1134923343.4328.21735.camel@hal.voltaire.com> OpenSM/osm_lid_mgr.c: In __osm_lid_mgr_set_physp_pi, check for change in neighbor MTU rather than MTUCap in order to determine whether link should be DOWNed Signed-off-by: Hal Rosenstock Index: osm_lid_mgr.c =================================================================== --- osm_lid_mgr.c (revision 4522) +++ osm_lid_mgr.c (working copy) @@ -1095,7 +1095,7 @@ __osm_lid_mgr_set_physp_pi( To reset the port state machine we can send PortInfo.State = DOWN. (see: 7.2.7 p161 lines:10-19.) */ - if ( (mtu != ib_port_info_get_mtu_cap( p_old_pi )) || + if ( (mtu != ib_port_info_get_neighbor_mtu( p_old_pi )) || (op_vls != ib_port_info_get_op_vls(p_old_pi))) { if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) ) From ftillier at silverstorm.com Sun Dec 18 08:38:42 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Sun, 18 Dec 2005 08:38:42 -0800 Subject: [openib-general] RE: [PATCH] OpenSM/ib_types.h: Modify ib_port_info_compute_rate In-Reply-To: <1134920884.4328.21299.camel@hal.voltaire.com> Message-ID: <000401c603f1$866d4eb0$6401a8c0@infiniconsys.com> Hi Hal, > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Sunday, December 18, 2005 7:48 AM > > On Sun, 2005-12-18 at 06:53, Eitan Zahavi wrote: > > Hi Hal, > > > > The attached patch is fine. Please go ahead and commit it. > > > > BTW: > > In the following commit 4509 you have changed the name of a > > switch info record field. > > Note this is an API change and have severe effect on any > > application using ib_types.h (and there are plenty of these) > > Is this "API" frozen for all time ? How would you propose that this > "API" evolve ? I do not see where there is any versioning to the API. The ib_types.h file in the OpenSM project was originally lifted from the IBAL project. The Windows OpenIB Project, since it's derived from IBAL, has that file. Currently, OpenSM has its own shadow copy of that header, so changes to it must be carefully controlled to keep OpenSM building on Windows. Hopefully that helps explain. - Fab
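On Eitan's question elsewhere in this thread of how header files could be versioned at all: one conventional approach -- purely a sketch, nothing like this exists in ib_types.h -- is a pair of compile-time version macros that every consumer (OpenSM on Linux, the Windows/IBAL tree, external tools) can check:

    /* In the shared header (hypothetical): bump MAJOR on incompatible
     * changes such as renaming a struct field, MINOR on additions. */
    #define IB_TYPES_API_MAJOR  1
    #define IB_TYPES_API_MINOR  2

    /* In a dependent project, fail fast at compile time: */
    #if IB_TYPES_API_MAJOR != 1
    #error "this tool requires ib_types.h API major version 1"
    #endif

That would at least turn a silent field rename into an explicit build error for out-of-tree users.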
From halr at voltaire.com Sun Dec 18 08:43:36 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Dec 2005 11:43:36 -0500 Subject: [openib-general] RE: [PATCH] OpenSM/ib_types.h: Modify ib_port_info_compute_rate In-Reply-To: <000401c603f1$866d4eb0$6401a8c0@infiniconsys.com> References: <000401c603f1$866d4eb0$6401a8c0@infiniconsys.com> Message-ID: <1134924028.4328.21864.camel@hal.voltaire.com> On Sun, 2005-12-18 at 11:38, Fab Tillier wrote: > Hi Hal, > > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > Sent: Sunday, December 18, 2005 7:48 AM > > > > On Sun, 2005-12-18 at 06:53, Eitan Zahavi wrote: > > > Hi Hal, > > > > > > The attached patch is fine. Please go ahead and commit it. > > > > > > BTW: > > > In the following commit 4509 you have changed the name of a > > > switch info record field. > > > Note this is an API change and have severe effect on any > > > application using ib_types.h (and there are plenty of these) > > > > Is this "API" frozen for all time ? How would you propose that this > > "API" evolve ? I do not see where there is any versioning to the API. > > The ib_types.h file in the OpenSM project was originally lifted from the IBAL > project. The Windows OpenIB Project, since it's derived from IBAL, has that > file. Currently, OpenSM has its own shadow copy of that header, so changes to > it must be carefully controlled to keep OpenSM building on Windows. > > Hopefully that helps explain. OpenSM builds. It is other tools (currently non OpenIB) which experience problems. -- Hal From eitan at mellanox.co.il Sun Dec 18 11:20:20 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sun, 18 Dec 2005 21:20:20 +0200 Subject: [openib-general] RE: A couple of questions about osm_lid_mgr.c::__osm_lid_mgr_set_physp_pi Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3618B52@mtlexch01.mtl.com> > On Sun, 2005-12-18 at 03:53, Eitan Zahavi wrote: > > [snip...] > > > > This seems a little inconsistent to me. It seems like NeighborMTU would > > > be the equivalent of OperationalVLs, rather than MTUCap (which is RO). > > > Yes I think we should have checked the NeighborMTU and not the MTUCap. > > OK. I'll fix this. [EZ] Thanks. I have seen the patch. It is fine. > > > > Also, why does changing the MTU require that the link be taken down ? > > > The behavior of the link when the neighbor MTU changes is not very well defined. > > So the best way to handle that is to force it down. > > NeighborMTU is not involved with the link negotiation nor is there a > comment in the description like OperationalVLs. What behavior are you > referring to ? [EZ] I actually do not see any spec note about modifying neighbor MTU during link up. However, I remember we had to add this functionality. I tried to dig this up in the old BitKeeper tree and found the first occurrence of the setting of the port down in version 1.7. But the log does not say why. > > > > I also noticed a nit in the same function: > > > > > > p_pi->m_key_lease_period = p_mgr->p_subn->opt.m_key_lease_period; > > > /* Check to see if the value we are setting is different than > > > the value in the port_info. If it is - turn on send_set flag */ > > > if (cl_memcmp( &p_pi->m_key_lease_period, > > > &p_old_pi->m_key_lease_period, > > > sizeof(p_pi->m_key_lease_period) )) > > > send_set = TRUE; > > > > > > Should that be only when the Mkey is non 0 ? > > > Well, I know the lease is not relevant when MKey = 0. But for code clarity I > > propose to ignore that fact.
The effect is only when someone sets a lease period but > MKey = 0 > > which IMO does not make any sense anyway. > > I agree it does not make sense but could happen (is it prevented somehow > ?) so my take is to minimize the need for sets. As I said this is a nit. [EZ] We could avoid that but I do not think this is required. > > -- Hal From rjwalsh at pathscale.com Sun Dec 18 11:52:05 2005 From: rjwalsh at pathscale.com (Robert Walsh) Date: Sun, 18 Dec 2005 11:52:05 -0800 Subject: [openib-general] Re: [PATCH 03/13] [RFC] ipath copy routines In-Reply-To: <20051218.013341.34772534.davem@davemloft.net> References: <20051217123833.1aa430ab.akpm@osdl.org> <1134859243.20575.84.camel@phosphene.durables.org> <20051217191932.af2b422c.akpm@osdl.org> <20051218.013341.34772534.davem@davemloft.net> Message-ID: <1134935525.5826.0.camel@phosphene.durables.org> > > That would make sense. Give it a non-ipath-related name and require that > > all architectures which wish to run this driver must implement that > > (documented) function. > > > > And, in Kconfig, make sure that architectures which don't implement that > > library function do not attempt to build this driver. To avoid breaking > > `make allmodconfig'. > > How about we implement a portable version in C that you get > by default if you don't implement the assembler routine? > Pretty please? :-) Sure. :-) -- Robert Walsh Email: rjwalsh at pathscale.com PathScale, Inc. Phone: +1 650 934 8117 2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969 Mountain View, CA 94043. From sam at ravnborg.org Sun Dec 18 11:23:56 2005 From: sam at ravnborg.org (Sam Ravnborg) Date: Sun, 18 Dec 2005 20:23:56 +0100 Subject: [openib-general] Re: [PATCH 13/13] [RFC] ipath Kconfig and Makefile In-Reply-To: <200512161548.lokgvLraSGi0enUH@cisco.com> References: <200512161548.MdcxE8ZQTy1yj4v1@cisco.com> <200512161548.lokgvLraSGi0enUH@cisco.com> Message-ID: <20051218192356.GB9145@mars.ravnborg.org> On Fri, Dec 16, 2005 at 03:48:55PM -0800, Roland Dreier wrote: > @@ -0,0 +1,15 @@ > +EXTRA_CFLAGS += -Idrivers/infiniband/include If this is needed then some header files should be moved to include/rdma > + > +ipath_core-objs := ipath_copy.o ipath_driver.o \ > + ipath_dwordcpy.o ipath_ht400.o ipath_i2c.o ipath_layer.o \ > + ipath_lib.o ipath_mlock.o > + > +ib_ipath-objs := ipath_mad.o ipath_verbs.o Please use: ipath_core-y := ... ib_ipath-y := ... Use of -y lets you do better Kconfig selection in the makefile, and is preferred compared to -objs Sam From paulmck at us.ibm.com Sun Dec 18 11:59:22 2005 From: paulmck at us.ibm.com (Paul E. McKenney) Date: Sun, 18 Dec 2005 11:59:22 -0800 Subject: [openib-general] Re: [PATCH 10/13] [RFC] ipath verbs, part 1 In-Reply-To: <200512161548.W9sJn4CLmdhnSTcH@cisco.com> References: <200512161548.zxp6FKcabEu47EnS@cisco.com> <200512161548.W9sJn4CLmdhnSTcH@cisco.com> Message-ID: <20051218195922.GC31184@us.ibm.com> On Fri, Dec 16, 2005 at 03:48:55PM -0800, Roland Dreier wrote: > First half of ipath verbs driver Some RCU-related questions interspersed. Basic question is "where is the lock-free read-side traversal?"
Thanx, Paul > --- > > drivers/infiniband/hw/ipath/ipath_verbs.c | 3244 +++++++++++++++++++++++++++++ > 1 files changed, 3244 insertions(+), 0 deletions(-) > create mode 100644 drivers/infiniband/hw/ipath/ipath_verbs.c > > 72075ecec75f8c42e444a7d7d8ffcf340a845b96 > diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c > new file mode 100644 > index 0000000..808326e > --- /dev/null > +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c > @@ -0,0 +1,3244 @@ > +/* > + * Copyright (c) 2005. PathScale, Inc. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. > + * > + * Patent licenses, if any, provided herein do not apply to > + * combinations of this program with other software, or any other > + * product whatsoever. > + * > + * $Id: ipath_verbs.c 4491 2005-12-15 22:20:31Z rjwalsh $ > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > + > +#include "ipath_common.h" > +#include "ips_common.h" > +#include "ipath_layer.h" > +#include "ipath_verbs.h" > + > +/* > + * Compare the lower 24 bits of the two values. > + * Returns an integer <, ==, or > than zero. 
> + */ > +static inline int cmp24(u32 a, u32 b) > +{ > + return (((int) a) - ((int) b)) << 8; > +} > + > +#define MODNAME "ib_ipath" > +#define DRIVER_LOAD_MSG "PathScale " MODNAME " loaded: " > +#define PFX MODNAME ": " > + > + > +/* Not static, because we don't want the compiler removing it */ > +const char ipath_verbs_version[] = "ipath_verbs " _IPATH_IDSTR; > + > +unsigned int ib_ipath_qp_table_size = 251; > +module_param(ib_ipath_qp_table_size, uint, 0444); > +MODULE_PARM_DESC(ib_ipath_qp_table_size, "QP table size"); > + > +unsigned int ib_ipath_lkey_table_size = 12; > +module_param(ib_ipath_lkey_table_size, uint, 0444); > +MODULE_PARM_DESC(ib_ipath_lkey_table_size, > + "LKEY table size in bits (2^n, 1 <= n <= 23)"); > + > +unsigned int ib_ipath_debug; /* debug mask */ > +module_param(ib_ipath_debug, uint, 0644); > +MODULE_PARM_DESC(ib_ipath_debug, "Verbs debug mask"); > + > + > +static void ipath_ud_loopback(struct ipath_qp *sqp, struct ipath_sge_state *ss, > + u32 len, struct ib_send_wr *wr, struct ib_wc *wc); > +static void ipath_ruc_loopback(struct ipath_qp *sqp, struct ib_wc *wc); > +static int ipath_destroy_qp(struct ib_qp *ibqp); > + > +MODULE_LICENSE("GPL"); > +MODULE_AUTHOR("PathScale "); > +MODULE_DESCRIPTION("Pathscale InfiniPath driver"); > + > +enum { > + IPATH_FAULT_RC_DROP_SEND_F = 1, > + IPATH_FAULT_RC_DROP_SEND_M, > + IPATH_FAULT_RC_DROP_SEND_L, > + IPATH_FAULT_RC_DROP_SEND_O, > + IPATH_FAULT_RC_DROP_RDMA_WRITE_F, > + IPATH_FAULT_RC_DROP_RDMA_WRITE_M, > + IPATH_FAULT_RC_DROP_RDMA_WRITE_L, > + IPATH_FAULT_RC_DROP_RDMA_WRITE_O, > + IPATH_FAULT_RC_DROP_RDMA_READ_RESP_F, > + IPATH_FAULT_RC_DROP_RDMA_READ_RESP_M, > + IPATH_FAULT_RC_DROP_RDMA_READ_RESP_L, > + IPATH_FAULT_RC_DROP_RDMA_READ_RESP_O, > + IPATH_FAULT_RC_DROP_ACK, > +}; > + > +enum { > + IPATH_TRANS_INVALID = 0, > + IPATH_TRANS_ANY2RST, > + IPATH_TRANS_RST2INIT, > + IPATH_TRANS_INIT2INIT, > + IPATH_TRANS_INIT2RTR, > + IPATH_TRANS_RTR2RTS, > + IPATH_TRANS_RTS2RTS, > + IPATH_TRANS_SQERR2RTS, > + IPATH_TRANS_ANY2ERR, > + IPATH_TRANS_RTS2SQD, /* XXX Wait for expected ACKs & signal event */ > + IPATH_TRANS_SQD2SQD, /* error if not drained & parameter change */ > + IPATH_TRANS_SQD2RTS, /* error if not drained */ > +}; > + > +enum { > + IPATH_POST_SEND_OK = 0x0001, > + IPATH_POST_RECV_OK = 0x0002, > + IPATH_PROCESS_RECV_OK = 0x0004, > + IPATH_PROCESS_SEND_OK = 0x0008, > +}; > + > +static int state_ops[IB_QPS_ERR + 1] = { > + [IB_QPS_RESET] = 0, > + [IB_QPS_INIT] = IPATH_POST_RECV_OK, > + [IB_QPS_RTR] = IPATH_POST_RECV_OK | IPATH_PROCESS_RECV_OK, > + [IB_QPS_RTS] = IPATH_POST_RECV_OK | IPATH_PROCESS_RECV_OK | > + IPATH_POST_SEND_OK | IPATH_PROCESS_SEND_OK, > + [IB_QPS_SQD] = IPATH_POST_RECV_OK | IPATH_PROCESS_RECV_OK | > + IPATH_POST_SEND_OK, > + [IB_QPS_SQE] = IPATH_POST_RECV_OK | IPATH_PROCESS_RECV_OK, > + [IB_QPS_ERR] = 0, > +}; > + > +/* > + * Convert the AETH credit code into the number of credits. 
> + */ > +static u32 credit_table[31] = { > + 0, /* 0 */ > + 1, /* 1 */ > + 2, /* 2 */ > + 3, /* 3 */ > + 4, /* 4 */ > + 6, /* 5 */ > + 8, /* 6 */ > + 12, /* 7 */ > + 16, /* 8 */ > + 24, /* 9 */ > + 32, /* A */ > + 48, /* B */ > + 64, /* C */ > + 96, /* D */ > + 128, /* E */ > + 192, /* F */ > + 256, /* 10 */ > + 384, /* 11 */ > + 512, /* 12 */ > + 768, /* 13 */ > + 1024, /* 14 */ > + 1536, /* 15 */ > + 2048, /* 16 */ > + 3072, /* 17 */ > + 4096, /* 18 */ > + 6144, /* 19 */ > + 8192, /* 1A */ > + 12288, /* 1B */ > + 16384, /* 1C */ > + 24576, /* 1D */ > + 32768 /* 1E */ > +}; > + > +/* > + * Convert the AETH RNR timeout code into the number of milliseconds. > + */ > +static u32 rnr_table[32] = { > + 656, /* 0 */ > + 1, /* 1 */ > + 1, /* 2 */ > + 1, /* 3 */ > + 1, /* 4 */ > + 1, /* 5 */ > + 1, /* 6 */ > + 1, /* 7 */ > + 1, /* 8 */ > + 1, /* 9 */ > + 1, /* A */ > + 1, /* B */ > + 1, /* C */ > + 1, /* D */ > + 2, /* E */ > + 2, /* F */ > + 3, /* 10 */ > + 4, /* 11 */ > + 6, /* 12 */ > + 8, /* 13 */ > + 11, /* 14 */ > + 16, /* 15 */ > + 21, /* 16 */ > + 31, /* 17 */ > + 41, /* 18 */ > + 62, /* 19 */ > + 82, /* 1A */ > + 123, /* 1B */ > + 164, /* 1C */ > + 246, /* 1D */ > + 328, /* 1E */ > + 492 /* 1F */ > +}; > + > +/* > + * Translate ib_wr_opcode into ib_wc_opcode. > + */ > +static enum ib_wc_opcode wc_opcode[] = { > + [IB_WR_RDMA_WRITE] = IB_WC_RDMA_WRITE, > + [IB_WR_RDMA_WRITE_WITH_IMM] = IB_WC_RDMA_WRITE, > + [IB_WR_SEND] = IB_WC_SEND, > + [IB_WR_SEND_WITH_IMM] = IB_WC_SEND, > + [IB_WR_RDMA_READ] = IB_WC_RDMA_READ, > + [IB_WR_ATOMIC_CMP_AND_SWP] = IB_WC_COMP_SWAP, > + [IB_WR_ATOMIC_FETCH_AND_ADD] = IB_WC_FETCH_ADD > +}; > + > +/* > + * Array of device pointers. > + */ > +static uint32_t number_of_devices; > +static struct ipath_ibdev **ipath_devices; > + > +/* > + * Global table of GID to attached QPs. > + * The table is global to all ipath devices since a send from one QP/device > + * needs to be locally routed to any locally attached QPs on the same > + * or different device. > + */ > +static struct rb_root mcast_tree; > +static spinlock_t mcast_lock = SPIN_LOCK_UNLOCKED; > + > +/* > + * Allocate a structure to link a QP to the multicast GID structure. > + */ > +static struct ipath_mcast_qp *ipath_mcast_qp_alloc(struct ipath_qp *qp) > +{ > + struct ipath_mcast_qp *mqp; > + > + mqp = kmalloc(sizeof(*mqp), GFP_KERNEL); > + if (!mqp) > + return NULL; > + > + mqp->qp = qp; > + atomic_inc(&qp->refcount); > + > + return mqp; > +} > + > +static void ipath_mcast_qp_free(struct ipath_mcast_qp *mqp) > +{ > + struct ipath_qp *qp = mqp->qp; > + > + /* Notify ipath_destroy_qp() if it is waiting. */ > + if (atomic_dec_and_test(&qp->refcount)) > + wake_up(&qp->wait); > + > + kfree(mqp); > +} > + > +/* > + * Allocate a structure for the multicast GID. > + * A list of QPs will be attached to this structure. > + */ > +static struct ipath_mcast *ipath_mcast_alloc(union ib_gid *mgid) > +{ > + struct ipath_mcast *mcast; > + > + mcast = kmalloc(sizeof(*mcast), GFP_KERNEL); > + if (!mcast) > + return NULL; > + > + mcast->mgid = *mgid; > + INIT_LIST_HEAD(&mcast->qp_list); > + init_waitqueue_head(&mcast->wait); > + atomic_set(&mcast->refcount, 0); > + > + return mcast; > +} > + > +static void ipath_mcast_free(struct ipath_mcast *mcast) > +{ > + struct ipath_mcast_qp *p, *tmp; > + > + list_for_each_entry_safe(p, tmp, &mcast->qp_list, list) > + ipath_mcast_qp_free(p); > + > + kfree(mcast); > +} > + > +/* > + * Search the global table for the given multicast GID. > + * Return it or NULL if not found. 
> + * The caller is responsible for decrementing the reference count if found. > + */ > +static struct ipath_mcast *ipath_mcast_find(union ib_gid *mgid) > +{ > + struct rb_node *n; > + unsigned long flags; > + > + spin_lock_irqsave(&mcast_lock, flags); > + n = mcast_tree.rb_node; > + while (n) { > + struct ipath_mcast *mcast; > + int ret; > + > + mcast = rb_entry(n, struct ipath_mcast, rb_node); > + > + ret = memcmp(mgid->raw, mcast->mgid.raw, sizeof(union ib_gid)); > + if (ret < 0) > + n = n->rb_left; > + else if (ret > 0) > + n = n->rb_right; > + else { > + atomic_inc(&mcast->refcount); > + spin_unlock_irqrestore(&mcast_lock, flags); > + return mcast; > + } > + } > + spin_unlock_irqrestore(&mcast_lock, flags); > + > + return NULL; > +} > + > +/* > + * Insert the multicast GID into the table and > + * attach the QP structure. > + * Return zero if both were added. > + * Return EEXIST if the GID was already in the table but the QP was added. > + * Return ESRCH if the QP was already attached and neither structure was added. > + */ > +static int ipath_mcast_add(struct ipath_mcast *mcast, > + struct ipath_mcast_qp *mqp) > +{ > + struct rb_node **n = &mcast_tree.rb_node; > + struct rb_node *pn = NULL; > + unsigned long flags; > + > + spin_lock_irqsave(&mcast_lock, flags); > + > + while (*n) { > + struct ipath_mcast *tmcast; > + struct ipath_mcast_qp *p; > + int ret; > + > + pn = *n; > + tmcast = rb_entry(pn, struct ipath_mcast, rb_node); > + > + ret = memcmp(mcast->mgid.raw, tmcast->mgid.raw, > + sizeof(union ib_gid)); > + if (ret < 0) { > + n = &pn->rb_left; > + continue; > + } > + if (ret > 0) { > + n = &pn->rb_right; > + continue; > + } > + > + /* Search the QP list to see if this is already there. */ > + list_for_each_entry_rcu(p, &tmcast->qp_list, list) { Given that we hold the global mcast_lock, how is RCU helping here? Is there a lock-free read-side traversal path somewhere that I am missing? > + if (p->qp == mqp->qp) { > + spin_unlock_irqrestore(&mcast_lock, flags); > + return ESRCH; > + } > + } > + list_add_tail_rcu(&mqp->list, &tmcast->qp_list); Ditto... > + spin_unlock_irqrestore(&mcast_lock, flags); > + return EEXIST; > + } > + > + list_add_tail_rcu(&mqp->list, &mcast->qp_list); Ditto... > + spin_unlock_irqrestore(&mcast_lock, flags); > + > + atomic_inc(&mcast->refcount); > + rb_link_node(&mcast->rb_node, pn, n); > + rb_insert_color(&mcast->rb_node, &mcast_tree); > + > + spin_unlock_irqrestore(&mcast_lock, flags); > + > + return 0; > +} > + > +static int ipath_multicast_attach(struct ib_qp *ibqp, union ib_gid *gid, > + u16 lid) > +{ > + struct ipath_qp *qp = to_iqp(ibqp); > + struct ipath_mcast *mcast; > + struct ipath_mcast_qp *mqp; > + > + /* > + * Allocate data structures since its better to do this outside of > + * spin locks and it will most likely be needed. > + */ > + mcast = ipath_mcast_alloc(gid); > + if (mcast == NULL) > + return -ENOMEM; > + mqp = ipath_mcast_qp_alloc(qp); > + if (mqp == NULL) { > + ipath_mcast_free(mcast); > + return -ENOMEM; > + } > + switch (ipath_mcast_add(mcast, mqp)) { > + case ESRCH: > + /* Neither was used: can't attach the same QP twice. 
*/ > + ipath_mcast_qp_free(mqp); > + ipath_mcast_free(mcast); > + return -EINVAL; > + case EEXIST: /* The mcast wasn't used */ > + ipath_mcast_free(mcast); > + break; > + default: > + break; > + } > + return 0; > +} > + > +static int ipath_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid, > + u16 lid) > +{ > + struct ipath_qp *qp = to_iqp(ibqp); > + struct ipath_mcast *mcast = NULL; > + struct ipath_mcast_qp *p, *tmp; > + struct rb_node *n; > + unsigned long flags; > + int last = 0; > + > + spin_lock_irqsave(&mcast_lock, flags); > + > + /* Find the GID in the mcast table. */ > + n = mcast_tree.rb_node; > + while (1) { > + int ret; > + > + if (n == NULL) { > + spin_unlock_irqrestore(&mcast_lock, flags); > + return 0; > + } > + > + mcast = rb_entry(n, struct ipath_mcast, rb_node); > + ret = memcmp(gid->raw, mcast->mgid.raw, sizeof(union ib_gid)); > + if (ret < 0) > + n = n->rb_left; > + else if (ret > 0) > + n = n->rb_right; > + else > + break; > + } > + > + /* Search the QP list. */ > + list_for_each_entry_safe(p, tmp, &mcast->qp_list, list) { > + if (p->qp != qp) > + continue; > + /* > + * We found it, so remove it, but don't poison the forward link > + * until we are sure there are no list walkers. > + */ > + list_del_rcu(&p->list); Ditto... > + spin_unlock_irqrestore(&mcast_lock, flags); > + > + /* If this was the last attached QP, remove the GID too. */ > + if (list_empty(&mcast->qp_list)) { > + rb_erase(&mcast->rb_node, &mcast_tree); > + last = 1; > + } > + break; > + } > + > + spin_unlock_irqrestore(&mcast_lock, flags); > + > + if (p) { > + /* > + * Wait for any list walkers to finish before freeing the > + * list element. > + */ > + wait_event(mcast->wait, atomic_read(&mcast->refcount) <= 1); > + ipath_mcast_qp_free(p); > + } > + if (last) { > + atomic_dec(&mcast->refcount); > + wait_event(mcast->wait, !atomic_read(&mcast->refcount)); > + ipath_mcast_free(mcast); > + } > + > + return 0; > +} > + > +/* > + * Copy data to SGE memory. > + */ > +static void copy_sge(struct ipath_sge_state *ss, void *data, u32 length) > +{ > + struct ipath_sge *sge = &ss->sge; > + > + while (length) { > + u32 len = sge->length; > + > + BUG_ON(len == 0); > + if (len > length) > + len = length; > + memcpy(sge->vaddr, data, len); > + sge->vaddr += len; > + sge->length -= len; > + sge->sge_length -= len; > + if (sge->sge_length == 0) { > + if (--ss->num_sge) > + *sge = *ss->sg_list++; > + } else if (sge->length == 0 && sge->mr != NULL) { > + if (++sge->n >= IPATH_SEGSZ) { > + if (++sge->m >= sge->mr->mapsz) > + break; > + sge->n = 0; > + } > + sge->vaddr = sge->mr->map[sge->m]->segs[sge->n].vaddr; > + sge->length = sge->mr->map[sge->m]->segs[sge->n].length; > + } > + data += len; > + length -= len; > + } > +} > + > +/* > + * Skip over length bytes of SGE memory. 
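Similar locking problem in ipath_multicast_detach() above: when the QP is found, mcast_lock is released inside the loop and then released again after it, and the list_empty()/rb_erase() pair runs after the first unlock. Also, if the loop completes without a match, p is left pointing at the list head, so the later "if (p)" test can free garbage. A sketch that keeps the unlink and rb_erase() under the lock and tracks the match explicitly (untested):

	int found = 0;
	...
	list_for_each_entry_safe(p, tmp, &mcast->qp_list, list) {
		if (p->qp != qp)
			continue;
		/* Unlink, and unhook the GID if this was the last QP,
		 * all while still holding mcast_lock. */
		list_del_rcu(&p->list);
		if (list_empty(&mcast->qp_list)) {
			rb_erase(&mcast->rb_node, &mcast_tree);
			last = 1;
		}
		found = 1;
		break;
	}

	spin_unlock_irqrestore(&mcast_lock, flags);

	if (found) {
		wait_event(mcast->wait, atomic_read(&mcast->refcount) <= 1);
		ipath_mcast_qp_free(p);
	}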
> + */ > +static void skip_sge(struct ipath_sge_state *ss, u32 length) > +{ > + struct ipath_sge *sge = &ss->sge; > + > + while (length > sge->sge_length) { > + length -= sge->sge_length; > + ss->sge = *ss->sg_list++; > + } > + while (length) { > + u32 len = sge->length; > + > + BUG_ON(len == 0); > + if (len > length) > + len = length; > + sge->vaddr += len; > + sge->length -= len; > + sge->sge_length -= len; > + if (sge->sge_length == 0) { > + if (--ss->num_sge) > + *sge = *ss->sg_list++; > + } else if (sge->length == 0 && sge->mr != NULL) { > + if (++sge->n >= IPATH_SEGSZ) { > + if (++sge->m >= sge->mr->mapsz) > + break; > + sge->n = 0; > + } > + sge->vaddr = sge->mr->map[sge->m]->segs[sge->n].vaddr; > + sge->length = sge->mr->map[sge->m]->segs[sge->n].length; > + } > + length -= len; > + } > +} > + > +static inline u32 alloc_qpn(struct ipath_qp_table *qpt) > +{ > + u32 i, offset, max_scan, qpn; > + struct qpn_map *map; > + > + qpn = qpt->last + 1; > + if (qpn >= QPN_MAX) > + qpn = 2; > + offset = qpn & BITS_PER_PAGE_MASK; > + map = &qpt->map[qpn / BITS_PER_PAGE]; > + max_scan = qpt->nmaps - !offset; > + for (i = 0;;) { > + if (unlikely(!map->page)) { > + unsigned long page = get_zeroed_page(GFP_KERNEL); > + unsigned long flags; > + > + /* > + * Free the page if someone raced with us > + * installing it: > + */ > + spin_lock_irqsave(&qpt->lock, flags); > + if (map->page) > + free_page(page); > + else > + map->page = (void *)page; > + spin_unlock_irqrestore(&qpt->lock, flags); > + if (unlikely(!map->page)) > + break; > + } > + if (likely(atomic_read(&map->n_free))) { > + do { > + if (!test_and_set_bit(offset, map->page)) { > + atomic_dec(&map->n_free); > + qpt->last = qpn; > + return qpn; > + } > + offset = find_next_offset(map, offset); > + qpn = mk_qpn(qpt, map, offset); > + /* > + * This test differs from alloc_pidmap(). > + * If find_next_offset() does find a zero bit, > + * we don't need to check for QPN wrapping > + * around past our starting QPN. We > + * just need to be sure we don't loop forever. > + */ > + } while (offset < BITS_PER_PAGE && qpn < QPN_MAX); > + } > + /* > + * In order to keep the number of pages allocated to a minimum, > + * we scan the all existing pages before increasing the size > + * of the bitmap table. > + */ > + if (++i > max_scan) { > + if (qpt->nmaps == QPNMAP_ENTRIES) > + break; > + map = &qpt->map[qpt->nmaps++]; > + offset = 0; > + } else if (map < &qpt->map[qpt->nmaps]) { > + ++map; > + offset = 0; > + } else { > + map = &qpt->map[0]; > + offset = 2; > + } > + qpn = mk_qpn(qpt, map, offset); > + } > + return 0; > +} > + > +static inline void free_qpn(struct ipath_qp_table *qpt, u32 qpn) > +{ > + struct qpn_map *map; > + > + map = qpt->map + qpn / BITS_PER_PAGE; > + if (map->page) > + clear_bit(qpn & BITS_PER_PAGE_MASK, map->page); > + atomic_inc(&map->n_free); > +} > + > +/* > + * Allocate the next available QPN and put the QP into the hash table. > + * The hash table holds a reference to the QP. > + */ > +static int ipath_alloc_qpn(struct ipath_qp_table *qpt, struct ipath_qp *qp, > + enum ib_qp_type type) > +{ > + unsigned long flags; > + u32 qpn; > + > + if (type == IB_QPT_SMI) > + qpn = 0; > + else if (type == IB_QPT_GSI) > + qpn = 1; > + else { > + /* Allocate the next available QPN */ > + qpn = alloc_qpn(qpt); > + if (qpn == 0) { > + return -ENOMEM; > + } > + } > + qp->ibqp.qp_num = qpn; > + > + /* Add the QP to the hash table. 
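mk_qpn(), find_next_offset() and the BITS_PER_PAGE constants are not in this hunk; since the comments in alloc_qpn() reference alloc_pidmap(), presumably they mirror the pidmap helpers, roughly:

	#define BITS_PER_PAGE		(PAGE_SIZE * 8)
	#define BITS_PER_PAGE_MASK	(BITS_PER_PAGE - 1)
	#define mk_qpn(qpt, map, off)	(((map) - (qpt)->map) * \
					 BITS_PER_PAGE + (off))
	#define find_next_offset(map, off) \
		find_next_zero_bit((map)->page, BITS_PER_PAGE, off)

(Names and details assumed from the alloc_pidmap() analogy, not taken from this patch.)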
*/ > + spin_lock_irqsave(&qpt->lock, flags); > + > + qpn %= qpt->max; > + qp->next = qpt->table[qpn]; > + qpt->table[qpn] = qp; > + atomic_inc(&qp->refcount); > + > + spin_unlock_irqrestore(&qpt->lock, flags); > + return 0; > +} > + > +/* > + * Remove the QP from the table so it can't be found asynchronously by > + * the receive interrupt routine. > + */ > +static void ipath_free_qp(struct ipath_qp_table *qpt, struct ipath_qp *qp) > +{ > + struct ipath_qp *q, **qpp; > + unsigned long flags; > + int fnd = 0; > + > + spin_lock_irqsave(&qpt->lock, flags); > + > + /* Remove QP from the hash table. */ > + qpp = &qpt->table[qp->ibqp.qp_num % qpt->max]; > + for (; (q = *qpp) != NULL; qpp = &q->next) { > + if (q == qp) { > + *qpp = qp->next; > + qp->next = NULL; > + atomic_dec(&qp->refcount); > + fnd = 1; > + break; > + } > + } > + > + spin_unlock_irqrestore(&qpt->lock, flags); > + > + if (!fnd) > + return; > + > + /* If QPN is not reserved, mark QPN free in the bitmap. */ > + if (qp->ibqp.qp_num > 1) > + free_qpn(qpt, qp->ibqp.qp_num); > + > + wait_event(qp->wait, !atomic_read(&qp->refcount)); > +} > + > +/* > + * Remove all QPs from the table. > + */ > +static void ipath_free_all_qps(struct ipath_qp_table *qpt) > +{ > + unsigned long flags; > + struct ipath_qp *qp, *nqp; > + u32 n; > + > + for (n = 0; n < qpt->max; n++) { > + spin_lock_irqsave(&qpt->lock, flags); > + qp = qpt->table[n]; > + qpt->table[n] = NULL; > + spin_unlock_irqrestore(&qpt->lock, flags); > + > + while (qp) { > + nqp = qp->next; > + if (qp->ibqp.qp_num > 1) > + free_qpn(qpt, qp->ibqp.qp_num); > + if (!atomic_dec_and_test(&qp->refcount) || > + !ipath_destroy_qp(&qp->ibqp)) > + _VERBS_INFO("QP memory leak!\n"); > + qp = nqp; > + } > + } > + > + for (n = 0; n < ARRAY_SIZE(qpt->map); n++) { > + if (qpt->map[n].page) > + free_page((unsigned long)qpt->map[n].page); > + } > +} > + > +/* > + * Return the QP with the given QPN. > + * The caller is responsible for decrementing the QP reference count when done. > + */ > +static struct ipath_qp *ipath_lookup_qpn(struct ipath_qp_table *qpt, u32 qpn) > +{ > + unsigned long flags; > + struct ipath_qp *qp; > + > + spin_lock_irqsave(&qpt->lock, flags); > + > + for (qp = qpt->table[qpn % qpt->max]; qp; qp = qp->next) { > + if (qp->ibqp.qp_num == qpn) { > + atomic_inc(&qp->refcount); > + break; > + } > + } > + > + spin_unlock_irqrestore(&qpt->lock, flags); > + return qp; > +} > + > +static int ipath_alloc_lkey(struct ipath_lkey_table *rkt, > + struct ipath_mregion *mr) > +{ > + unsigned long flags; > + u32 r; > + u32 n; > + > + spin_lock_irqsave(&rkt->lock, flags); > + > + /* Find the next available LKEY */ > + r = n = rkt->next; > + for (;;) { > + if (rkt->table[r] == NULL) > + break; > + r = (r + 1) & (rkt->max - 1); > + if (r == n) { > + spin_unlock_irqrestore(&rkt->lock, flags); > + _VERBS_INFO("LKEY table full\n"); > + return 0; > + } > + } > + rkt->next = (r + 1) & (rkt->max - 1); > + /* > + * Make sure lkey is never zero which is reserved to indicate an > + * unrestricted LKEY. 
> + */ > + rkt->gen++; > + mr->lkey = (r << (32 - ib_ipath_lkey_table_size)) | > + ((((1 << (24 - ib_ipath_lkey_table_size)) - 1) & rkt->gen) << 8); > + if (mr->lkey == 0) { > + mr->lkey |= 1 << 8; > + rkt->gen++; > + } > + rkt->table[r] = mr; > + spin_unlock_irqrestore(&rkt->lock, flags); > + > + return 1; > +} > + > +static void ipath_free_lkey(struct ipath_lkey_table *rkt, u32 lkey) > +{ > + unsigned long flags; > + u32 r; > + > + if (lkey == 0) > + return; > + r = lkey >> (32 - ib_ipath_lkey_table_size); > + spin_lock_irqsave(&rkt->lock, flags); > + rkt->table[r] = NULL; > + spin_unlock_irqrestore(&rkt->lock, flags); > +} > + > +/* > + * Check the IB SGE for validity and initialize our internal version of it. > + * Return 1 if OK, else zero. > + */ > +static int ipath_lkey_ok(struct ipath_lkey_table *rkt, struct ipath_sge *isge, > + struct ib_sge *sge, int acc) > +{ > + struct ipath_mregion *mr; > + size_t off; > + > + /* > + * We use LKEY == zero to mean a physical kmalloc() address. > + * This is a bit of a hack since we rely on dma_map_single() > + * being reversible by calling bus_to_virt(). > + */ > + if (sge->lkey == 0) { > + isge->mr = NULL; > + isge->vaddr = bus_to_virt(sge->addr); > + isge->length = sge->length; > + isge->sge_length = sge->length; > + return 1; > + } > + spin_lock(&rkt->lock); > + mr = rkt->table[(sge->lkey >> (32 - ib_ipath_lkey_table_size))]; > + spin_unlock(&rkt->lock); > + if (unlikely(mr == NULL || mr->lkey != sge->lkey)) > + return 0; > + > + off = sge->addr - mr->user_base; > + if (unlikely(sge->addr < mr->user_base || > + off + sge->length > mr->length || > + (mr->access_flags & acc) != acc)) > + return 0; > + > + off += mr->offset; > + isge->mr = mr; > + isge->m = 0; > + isge->n = 0; > + while (off >= mr->map[isge->m]->segs[isge->n].length) { > + off -= mr->map[isge->m]->segs[isge->n].length; > + if (++isge->n >= IPATH_SEGSZ) { > + isge->m++; > + isge->n = 0; > + } > + } > + isge->vaddr = mr->map[isge->m]->segs[isge->n].vaddr + off; > + isge->length = mr->map[isge->m]->segs[isge->n].length - off; > + isge->sge_length = sge->length; > + return 1; > +} > + > +/* > + * Initialize the qp->s_sge after a restart. > + * The QP s_lock should be held. > + */ > +static void ipath_init_restart(struct ipath_qp *qp, struct ipath_swqe *wqe) > +{ > + struct ipath_ibdev *dev; > + u32 len; > + > + len = ((qp->s_psn - wqe->psn) & 0xFFFFFF) * > + ib_mtu_enum_to_int(qp->path_mtu); > + qp->s_sge.sge = wqe->sg_list[0]; > + qp->s_sge.sg_list = wqe->sg_list + 1; > + qp->s_sge.num_sge = wqe->wr.num_sge; > + skip_sge(&qp->s_sge, len); > + qp->s_len = wqe->length - len; > + dev = to_idev(qp->ibqp.device); > + spin_lock(&dev->pending_lock); > + if (qp->timerwait.next == LIST_POISON1) > + list_add_tail(&qp->timerwait, > + &dev->pending[dev->pending_index]); > + spin_unlock(&dev->pending_lock); > +} > + > +/* > + * Check the IB virtual address, length, and RKEY. > + * Return 1 if OK, else zero. > + * The QP r_rq.lock should be held. 
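To make the LKEY layout in ipath_alloc_lkey() above concrete: with ib_ipath_lkey_table_size == 12 (an assumed value for illustration; the module parameter is not in this hunk), the expression reduces to

	lkey = (r << 20) | ((rkt->gen & 0xFFF) << 8);

so table index r = 5 with generation 3 yields lkey = 0x00500300: the index in the top 12 bits, the generation count in bits 8-19, and the low byte unused.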
> + */ > +static int ipath_rkey_ok(struct ipath_ibdev *dev, struct ipath_sge_state *ss, > + u32 len, u64 vaddr, u32 rkey, int acc) > +{ > + struct ipath_lkey_table *rkt = &dev->lk_table; > + struct ipath_sge *sge = &ss->sge; > + struct ipath_mregion *mr; > + size_t off; > + > + spin_lock(&rkt->lock); > + mr = rkt->table[(rkey >> (32 - ib_ipath_lkey_table_size))]; > + spin_unlock(&rkt->lock); > + if (unlikely(mr == NULL || mr->lkey != rkey)) > + return 0; > + > + off = vaddr - mr->iova; > + if (unlikely(vaddr < mr->iova || off + len > mr->length || > + (mr->access_flags & acc) == 0)) > + return 0; > + > + off += mr->offset; > + sge->mr = mr; > + sge->m = 0; > + sge->n = 0; > + while (off >= mr->map[sge->m]->segs[sge->n].length) { > + off -= mr->map[sge->m]->segs[sge->n].length; > + if (++sge->n >= IPATH_SEGSZ) { > + sge->m++; > + sge->n = 0; > + } > + } > + sge->vaddr = mr->map[sge->m]->segs[sge->n].vaddr + off; > + sge->length = mr->map[sge->m]->segs[sge->n].length - off; > + sge->sge_length = len; > + ss->sg_list = NULL; > + ss->num_sge = 1; > + return 1; > +} > + > +/* > + * Add a new entry to the completion queue. > + * This may be called with one of the qp->s_lock or qp->r_rq.lock held. > + */ > +static void ipath_cq_enter(struct ipath_cq *cq, struct ib_wc *entry, int sig) > +{ > + unsigned long flags; > + u32 next; > + > + spin_lock_irqsave(&cq->lock, flags); > + > + cq->queue[cq->head] = *entry; > + next = cq->head + 1; > + if (next == cq->ibcq.cqe) > + next = 0; > + if (next != cq->tail) > + cq->head = next; > + else { > + /* XXX - need to mark current wr as having an error... */ > + } > + > + if (cq->notify == IB_CQ_NEXT_COMP || > + (cq->notify == IB_CQ_SOLICITED && sig)) { > + cq->notify = IB_CQ_NONE; > + cq->triggered++; > + /* > + * This will cause send_complete() to be called in > + * another thread. > + */ > + tasklet_schedule(&cq->comptask); > + } > + > + spin_unlock_irqrestore(&cq->lock, flags); > + > + if (entry->status != IB_WC_SUCCESS) > + to_idev(cq->ibcq.device)->n_wqe_errs++; > +} > + > +static void send_complete(unsigned long data) > +{ > + struct ipath_cq *cq = (struct ipath_cq *)data; > + > + /* > + * The completion handler will most likely rearm the notification > + * and poll for all pending entries. If a new completion entry > + * is added while we are in this routine, tasklet_schedule() > + * won't call us again until we return so we check triggered to > + * see if we need to call the handler again. > + */ > + for (;;) { > + u8 triggered = cq->triggered; > + > + cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context); > + > + if (cq->triggered == triggered) > + return; > + } > +} > + > +/* > + * This is the QP state transition table. > + * See ipath_modify_qp() for details. 
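A note on the CQ ring in ipath_cq_enter() above: head == tail means empty and (head + 1) % cqe == tail means full, so one slot is always sacrificed, e.g.:

	/* cq->ibcq.cqe == 4: slots 0..3, capacity 3 completions;
	 * a 4th ipath_cq_enter() before a poll hits the overflow
	 * branch and the entry is silently dropped. */

That silent drop probably wants fixing before this goes in, as the XXX there already notes.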
> + */ > +static const struct { > + int trans; > + u32 req_param[IB_QPT_RAW_IPV6]; > + u32 opt_param[IB_QPT_RAW_IPV6]; > +} qp_state_table[IB_QPS_ERR + 1][IB_QPS_ERR + 1] = { > + [IB_QPS_RESET] = { > + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST }, > + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR }, > + [IB_QPS_INIT] = { > + .trans = IPATH_TRANS_RST2INIT, > + .req_param = { > + [IB_QPT_SMI] = (IB_QP_PKEY_INDEX | > + IB_QP_QKEY), > + [IB_QPT_GSI] = (IB_QP_PKEY_INDEX | > + IB_QP_QKEY), > + [IB_QPT_UD] = (IB_QP_PKEY_INDEX | > + IB_QP_PORT | > + IB_QP_QKEY), > + [IB_QPT_UC] = (IB_QP_PKEY_INDEX | > + IB_QP_PORT | > + IB_QP_ACCESS_FLAGS), > + [IB_QPT_RC] = (IB_QP_PKEY_INDEX | > + IB_QP_PORT | > + IB_QP_ACCESS_FLAGS), > + }, > + }, > + }, > + [IB_QPS_INIT] = { > + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST }, > + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR }, > + [IB_QPS_INIT] = { > + .trans = IPATH_TRANS_INIT2INIT, > + .opt_param = { > + [IB_QPT_SMI] = (IB_QP_PKEY_INDEX | > + IB_QP_QKEY), > + [IB_QPT_GSI] = (IB_QP_PKEY_INDEX | > + IB_QP_QKEY), > + [IB_QPT_UD] = (IB_QP_PKEY_INDEX | > + IB_QP_PORT | > + IB_QP_QKEY), > + [IB_QPT_UC] = (IB_QP_PKEY_INDEX | > + IB_QP_PORT | > + IB_QP_ACCESS_FLAGS), > + [IB_QPT_RC] = (IB_QP_PKEY_INDEX | > + IB_QP_PORT | > + IB_QP_ACCESS_FLAGS), > + } > + }, > + [IB_QPS_RTR] = { > + .trans = IPATH_TRANS_INIT2RTR, > + .req_param = { > + [IB_QPT_UC] = (IB_QP_AV | > + IB_QP_PATH_MTU | > + IB_QP_DEST_QPN | > + IB_QP_RQ_PSN), > + [IB_QPT_RC] = (IB_QP_AV | > + IB_QP_PATH_MTU | > + IB_QP_DEST_QPN | > + IB_QP_RQ_PSN | > + IB_QP_MAX_DEST_RD_ATOMIC | > + IB_QP_MIN_RNR_TIMER), > + }, > + .opt_param = { > + [IB_QPT_SMI] = (IB_QP_PKEY_INDEX | > + IB_QP_QKEY), > + [IB_QPT_GSI] = (IB_QP_PKEY_INDEX | > + IB_QP_QKEY), > + [IB_QPT_UD] = (IB_QP_PKEY_INDEX | > + IB_QP_QKEY), > + [IB_QPT_UC] = (IB_QP_ALT_PATH | > + IB_QP_ACCESS_FLAGS | > + IB_QP_PKEY_INDEX), > + [IB_QPT_RC] = (IB_QP_ALT_PATH | > + IB_QP_ACCESS_FLAGS | > + IB_QP_PKEY_INDEX), > + } > + } > + }, > + [IB_QPS_RTR] = { > + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST }, > + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR }, > + [IB_QPS_RTS] = { > + .trans = IPATH_TRANS_RTR2RTS, > + .req_param = { > + [IB_QPT_SMI] = IB_QP_SQ_PSN, > + [IB_QPT_GSI] = IB_QP_SQ_PSN, > + [IB_QPT_UD] = IB_QP_SQ_PSN, > + [IB_QPT_UC] = IB_QP_SQ_PSN, > + [IB_QPT_RC] = (IB_QP_TIMEOUT | > + IB_QP_RETRY_CNT | > + IB_QP_RNR_RETRY | > + IB_QP_SQ_PSN | > + IB_QP_MAX_QP_RD_ATOMIC), > + }, > + .opt_param = { > + [IB_QPT_SMI] = (IB_QP_CUR_STATE | IB_QP_QKEY), > + [IB_QPT_GSI] = (IB_QP_CUR_STATE | IB_QP_QKEY), > + [IB_QPT_UD] = (IB_QP_CUR_STATE | IB_QP_QKEY), > + [IB_QPT_UC] = (IB_QP_CUR_STATE | > + IB_QP_ALT_PATH | > + IB_QP_ACCESS_FLAGS | > + IB_QP_PKEY_INDEX | > + IB_QP_PATH_MIG_STATE), > + [IB_QPT_RC] = (IB_QP_CUR_STATE | > + IB_QP_ALT_PATH | > + IB_QP_ACCESS_FLAGS | > + IB_QP_PKEY_INDEX | > + IB_QP_MIN_RNR_TIMER | > + IB_QP_PATH_MIG_STATE), > + } > + } > + }, > + [IB_QPS_RTS] = { > + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST }, > + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR }, > + [IB_QPS_RTS] = { > + .trans = IPATH_TRANS_RTS2RTS, > + .opt_param = { > + [IB_QPT_SMI] = (IB_QP_CUR_STATE | IB_QP_QKEY), > + [IB_QPT_GSI] = (IB_QP_CUR_STATE | IB_QP_QKEY), > + [IB_QPT_UD] = (IB_QP_CUR_STATE | IB_QP_QKEY), > + [IB_QPT_UC] = (IB_QP_ACCESS_FLAGS | > + IB_QP_ALT_PATH | > + IB_QP_PATH_MIG_STATE), > + [IB_QPT_RC] = (IB_QP_ACCESS_FLAGS | > + IB_QP_ALT_PATH | > + IB_QP_PATH_MIG_STATE | > + IB_QP_MIN_RNR_TIMER), > + } > + }, > + [IB_QPS_SQD] = { > + 
.trans = IPATH_TRANS_RTS2SQD, > + }, > + }, > + [IB_QPS_SQD] = { > + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST }, > + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR }, > + [IB_QPS_RTS] = { > + .trans = IPATH_TRANS_SQD2RTS, > + .opt_param = { > + [IB_QPT_SMI] = (IB_QP_CUR_STATE | IB_QP_QKEY), > + [IB_QPT_GSI] = (IB_QP_CUR_STATE | IB_QP_QKEY), > + [IB_QPT_UD] = (IB_QP_CUR_STATE | IB_QP_QKEY), > + [IB_QPT_UC] = (IB_QP_CUR_STATE | > + IB_QP_ALT_PATH | > + IB_QP_ACCESS_FLAGS | > + IB_QP_PATH_MIG_STATE), > + [IB_QPT_RC] = (IB_QP_CUR_STATE | > + IB_QP_ALT_PATH | > + IB_QP_ACCESS_FLAGS | > + IB_QP_MIN_RNR_TIMER | > + IB_QP_PATH_MIG_STATE), > + } > + }, > + [IB_QPS_SQD] = { > + .trans = IPATH_TRANS_SQD2SQD, > + .opt_param = { > + [IB_QPT_SMI] = (IB_QP_CUR_STATE | IB_QP_QKEY), > + [IB_QPT_GSI] = (IB_QP_CUR_STATE | IB_QP_QKEY), > + [IB_QPT_UD] = (IB_QP_PKEY_INDEX | IB_QP_QKEY), > + [IB_QPT_UC] = (IB_QP_AV | > + IB_QP_TIMEOUT | > + IB_QP_CUR_STATE | > + IB_QP_ALT_PATH | > + IB_QP_ACCESS_FLAGS | > + IB_QP_PKEY_INDEX | > + IB_QP_PATH_MIG_STATE), > + [IB_QPT_RC] = (IB_QP_AV | > + IB_QP_TIMEOUT | > + IB_QP_RETRY_CNT | > + IB_QP_RNR_RETRY | > + IB_QP_MAX_QP_RD_ATOMIC | > + IB_QP_MAX_DEST_RD_ATOMIC | > + IB_QP_CUR_STATE | > + IB_QP_ALT_PATH | > + IB_QP_ACCESS_FLAGS | > + IB_QP_PKEY_INDEX | > + IB_QP_MIN_RNR_TIMER | > + IB_QP_PATH_MIG_STATE), > + } > + } > + }, > + [IB_QPS_SQE] = { > + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST }, > + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR }, > + [IB_QPS_RTS] = { > + .trans = IPATH_TRANS_SQERR2RTS, > + .opt_param = { > + [IB_QPT_SMI] = (IB_QP_CUR_STATE | IB_QP_QKEY), > + [IB_QPT_GSI] = (IB_QP_CUR_STATE | IB_QP_QKEY), > + [IB_QPT_UD] = (IB_QP_CUR_STATE | IB_QP_QKEY), > + [IB_QPT_UC] = IB_QP_CUR_STATE, > + [IB_QPT_RC] = (IB_QP_CUR_STATE | > + IB_QP_MIN_RNR_TIMER), > + } > + } > + }, > + [IB_QPS_ERR] = { > + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST }, > + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR } > + } > +}; > + > +/* > + * Initialize the QP state to the reset state. > + */ > +static void ipath_reset_qp(struct ipath_qp *qp) > +{ > + qp->remote_qpn = 0; > + qp->qkey = 0; > + qp->qp_access_flags = 0; > + qp->s_hdrwords = 0; > + qp->s_psn = 0; > + qp->r_psn = 0; > + atomic_set(&qp->msn, 0); > + if (qp->ibqp.qp_type == IB_QPT_RC) { > + qp->s_state = IB_OPCODE_RC_SEND_LAST; > + qp->r_state = IB_OPCODE_RC_SEND_LAST; > + } else { > + qp->s_state = IB_OPCODE_UC_SEND_LAST; > + qp->r_state = IB_OPCODE_UC_SEND_LAST; > + } > + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE; > + qp->s_nak_state = 0; > + qp->s_rnr_timeout = 0; > + qp->s_head = 0; > + qp->s_tail = 0; > + qp->s_cur = 0; > + qp->s_last = 0; > + qp->s_ssn = 1; > + qp->s_lsn = 0; > + qp->r_rq.head = 0; > + qp->r_rq.tail = 0; > + qp->r_reuse_sge = 0; > +} > + > +/* > + * Flush send work queue. > + * The QP s_lock should be held. > + */ > +static void ipath_sqerror_qp(struct ipath_qp *qp, struct ib_wc *wc) > +{ > + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); > + struct ipath_swqe *wqe = get_swqe_ptr(qp, qp->s_last); > + > + _VERBS_INFO("Send queue error on QP%d/%d: err: %d\n", > + qp->ibqp.qp_num, qp->remote_qpn, wc->status); > + > + spin_lock(&dev->pending_lock); > + /* XXX What if its already removed by the timeout code? 
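Reading one row of the state table above: the RESET -> INIT transition for an RC QP requires exactly IB_QP_PKEY_INDEX, IB_QP_PORT and IB_QP_ACCESS_FLAGS, so a consumer would do something like this sketch:

	struct ib_qp_attr attr = {
		.qp_state	 = IB_QPS_INIT,
		.pkey_index	 = 0,
		.port_num	 = 1,
		.qp_access_flags = IB_ACCESS_REMOTE_WRITE,
	};

	ret = ib_modify_qp(qp, &attr,
			   IB_QP_STATE | IB_QP_PKEY_INDEX |
			   IB_QP_PORT | IB_QP_ACCESS_FLAGS);

ipath_modify_qp() below rejects a mask that is missing a required bit or that carries anything outside req_param | opt_param | IB_QP_STATE.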
*/ > + if (qp->timerwait.next != LIST_POISON1) > + list_del(&qp->timerwait); > + if (qp->piowait.next != LIST_POISON1) > + list_del(&qp->piowait); > + spin_unlock(&dev->pending_lock); > + > + ipath_cq_enter(to_icq(qp->ibqp.send_cq), wc, 1); > + if (++qp->s_last >= qp->s_size) > + qp->s_last = 0; > + > + wc->status = IB_WC_WR_FLUSH_ERR; > + > + while (qp->s_last != qp->s_head) { > + wc->wr_id = wqe->wr.wr_id; > + wc->opcode = wc_opcode[wqe->wr.opcode]; > + ipath_cq_enter(to_icq(qp->ibqp.send_cq), wc, 1); > + if (++qp->s_last >= qp->s_size) > + qp->s_last = 0; > + wqe = get_swqe_ptr(qp, qp->s_last); > + } > + qp->s_cur = qp->s_tail = qp->s_head; > + qp->state = IB_QPS_SQE; > +} > + > +/* > + * Flush both send and receive work queues. > + * QP r_rq.lock and s_lock should be held. > + */ > +static void ipath_error_qp(struct ipath_qp *qp) > +{ > + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); > + struct ib_wc wc; > + > + _VERBS_INFO("QP%d/%d in error state\n", > + qp->ibqp.qp_num, qp->remote_qpn); > + > + spin_lock(&dev->pending_lock); > + /* XXX What if its already removed by the timeout code? */ > + if (qp->timerwait.next != LIST_POISON1) > + list_del(&qp->timerwait); > + if (qp->piowait.next != LIST_POISON1) > + list_del(&qp->piowait); > + spin_unlock(&dev->pending_lock); > + > + wc.status = IB_WC_WR_FLUSH_ERR; > + wc.vendor_err = 0; > + wc.byte_len = 0; > + wc.imm_data = 0; > + wc.qp_num = qp->ibqp.qp_num; > + wc.src_qp = 0; > + wc.wc_flags = 0; > + wc.pkey_index = 0; > + wc.slid = 0; > + wc.sl = 0; > + wc.dlid_path_bits = 0; > + wc.port_num = 0; > + > + while (qp->s_last != qp->s_head) { > + struct ipath_swqe *wqe = get_swqe_ptr(qp, qp->s_last); > + > + wc.wr_id = wqe->wr.wr_id; > + wc.opcode = wc_opcode[wqe->wr.opcode]; > + if (++qp->s_last >= qp->s_size) > + qp->s_last = 0; > + ipath_cq_enter(to_icq(qp->ibqp.send_cq), &wc, 1); > + } > + qp->s_cur = qp->s_tail = qp->s_head; > + qp->s_hdrwords = 0; > + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE; > + > + wc.opcode = IB_WC_RECV; > + while (qp->r_rq.tail != qp->r_rq.head) { > + wc.wr_id = get_rwqe_ptr(&qp->r_rq, qp->r_rq.tail)->wr_id; > + if (++qp->r_rq.tail >= qp->r_rq.size) > + qp->r_rq.tail = 0; > + ipath_cq_enter(to_icq(qp->ibqp.recv_cq), &wc, 1); > + } > +} > + > +static int ipath_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, > + int attr_mask) > +{ > + struct ipath_qp *qp = to_iqp(ibqp); > + enum ib_qp_state cur_state, new_state; > + u32 req_param, opt_param; > + unsigned long flags; > + > + if (attr_mask & IB_QP_CUR_STATE) { > + cur_state = attr->cur_qp_state; > + if (cur_state != IB_QPS_RTR && > + cur_state != IB_QPS_RTS && > + cur_state != IB_QPS_SQD && cur_state != IB_QPS_SQE) > + return -EINVAL; > + spin_lock_irqsave(&qp->r_rq.lock, flags); > + spin_lock(&qp->s_lock); > + } else { > + spin_lock_irqsave(&qp->r_rq.lock, flags); > + spin_lock(&qp->s_lock); > + cur_state = qp->state; > + } > + > + if (attr_mask & IB_QP_STATE) { > + new_state = attr->qp_state; > + if (new_state < 0 || new_state > IB_QPS_ERR) > + goto inval; > + } else > + new_state = cur_state; > + > + switch (qp_state_table[cur_state][new_state].trans) { > + case IPATH_TRANS_INVALID: > + goto inval; > + > + case IPATH_TRANS_ANY2RST: > + ipath_reset_qp(qp); > + break; > + > + case IPATH_TRANS_ANY2ERR: > + ipath_error_qp(qp); > + break; > + > + } > + > + req_param = > + qp_state_table[cur_state][new_state].req_param[qp->ibqp.qp_type]; > + opt_param = > + qp_state_table[cur_state][new_state].opt_param[qp->ibqp.qp_type]; > + > + if ((req_param & attr_mask) 
!= req_param) > + goto inval; > + > + if (attr_mask & ~(req_param | opt_param | IB_QP_STATE)) > + goto inval; > + > + if (attr_mask & IB_QP_PKEY_INDEX) { > + struct ipath_ibdev *dev = to_idev(ibqp->device); > + > + if (attr->pkey_index >= ipath_layer_get_npkeys(dev->ib_unit)) > + goto inval; > + qp->s_pkey_index = attr->pkey_index; > + } > + > + if (attr_mask & IB_QP_DEST_QPN) > + qp->remote_qpn = attr->dest_qp_num; > + > + if (attr_mask & IB_QP_SQ_PSN) { > + qp->s_next_psn = attr->sq_psn; > + qp->s_last_psn = qp->s_next_psn - 1; > + } > + > + if (attr_mask & IB_QP_RQ_PSN) > + qp->r_psn = attr->rq_psn; > + > + if (attr_mask & IB_QP_ACCESS_FLAGS) > + qp->qp_access_flags = attr->qp_access_flags; > + > + if (attr_mask & IB_QP_AV) > + qp->remote_ah_attr = attr->ah_attr; > + > + if (attr_mask & IB_QP_PATH_MTU) > + qp->path_mtu = attr->path_mtu; > + > + if (attr_mask & IB_QP_RETRY_CNT) > + qp->s_retry = qp->s_retry_cnt = attr->retry_cnt; > + > + if (attr_mask & IB_QP_RNR_RETRY) { > + qp->s_rnr_retry = attr->rnr_retry; > + if (qp->s_rnr_retry > 7) > + qp->s_rnr_retry = 7; > + qp->s_rnr_retry_cnt = qp->s_rnr_retry; > + } > + > + if (attr_mask & IB_QP_MIN_RNR_TIMER) > + qp->s_min_rnr_timer = attr->min_rnr_timer & 0x1F; > + > + if (attr_mask & IB_QP_QKEY) > + qp->qkey = attr->qkey; > + > + if (attr_mask & IB_QP_PKEY_INDEX) > + qp->s_pkey_index = attr->pkey_index; > + > + qp->state = new_state; > + spin_unlock(&qp->s_lock); > + spin_unlock_irqrestore(&qp->r_rq.lock, flags); > + > + /* > + * Try to move to ARMED if QP1 changed to the RTS state. > + */ > + if (qp->ibqp.qp_num == 1 && new_state == IB_QPS_RTS) { > + struct ipath_ibdev *dev = to_idev(ibqp->device); > + > + /* > + * Bounce the link even if it was active so the SM will > + * reinitialize the SMA's state. > + */ > + ipath_kset_linkstate((dev->ib_unit << 16) | IPATH_IB_LINKDOWN); > + ipath_kset_linkstate((dev->ib_unit << 16) | IPATH_IB_LINKARM); > + } > + return 0; > + > +inval: > + spin_unlock(&qp->s_lock); > + spin_unlock_irqrestore(&qp->r_rq.lock, flags); > + return -EINVAL; > +} > + > +/* > + * Compute the AETH (syndrome + MSN). > + * The QP s_lock should be held. > + */ > +static u32 ipath_compute_aeth(struct ipath_qp *qp) > +{ > + u32 aeth = atomic_read(&qp->msn) & 0xFFFFFF; > + > + if (qp->s_nak_state) { > + aeth |= qp->s_nak_state << 24; > + } else if (qp->ibqp.srq) { > + /* Shared receive queues don't generate credits. */ > + aeth |= 0x1F << 24; > + } else { > + u32 min, max, x; > + u32 credits; > + > + /* > + * Compute the number of credits available (RWQEs). > + * XXX Not holding the r_rq.lock here so there is a small > + * chance that the pair of reads are not atomic. > + */ > + credits = qp->r_rq.head - qp->r_rq.tail; > + if ((int)credits < 0) > + credits += qp->r_rq.size; > + /* Binary search the credit table to find the code to use. 
*/ > + min = 0; > + max = 31; > + for (;;) { > + x = (min + max) / 2; > + if (credit_table[x] == credits) > + break; > + if (credit_table[x] > credits) > + max = x; > + else if (min == x) > + break; > + else > + min = x; > + } > + aeth |= x << 24; > + } > + return cpu_to_be32(aeth); > +} > + > + > +static void no_bufs_available(struct ipath_qp *qp, struct ipath_ibdev *dev) > +{ > + unsigned long flags; > + > + spin_lock_irqsave(&dev->pending_lock, flags); > + if (qp->piowait.next == LIST_POISON1) > + list_add_tail(&qp->piowait, &dev->piowait); > + spin_unlock_irqrestore(&dev->pending_lock, flags); > + /* > + * Note that as soon as ipath_layer_want_buffer() is called and > + * possibly before it returns, ipath_ib_piobufavail() > + * could be called. If we are still in the tasklet function, > + * tasklet_schedule() will not call us until the next time > + * tasklet_schedule() is called. > + * We clear the tasklet flag now since we are committing to return > + * from the tasklet function. > + */ > + tasklet_unlock(&qp->s_task); > + ipath_layer_want_buffer(dev->ib_unit); > + dev->n_piowait++; > +} > + > +/* > + * Process entries in the send work queue until the queue is exhausted. > + * Only allow one CPU to send a packet per QP (tasklet). > + * Otherwise, after we drop the QP lock, two threads could send > + * packets out of order. > + * This is similar to do_rc_send() below except we don't have timeouts or > + * resends. > + */ > +static void do_uc_send(unsigned long data) > +{ > + struct ipath_qp *qp = (struct ipath_qp *)data; > + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); > + struct ipath_swqe *wqe; > + unsigned long flags; > + u16 lrh0; > + u32 hwords; > + u32 nwords; > + u32 extra_bytes; > + u32 bth0; > + u32 bth2; > + u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu); > + u32 len; > + struct ipath_other_headers *ohdr; > + struct ib_wc wc; > + > + if (test_and_set_bit(IPATH_S_BUSY, &qp->s_flags)) > + return; > + > + if (unlikely(qp->remote_ah_attr.dlid == > + ipath_layer_get_lid(dev->ib_unit))) { > + /* Pass in an uninitialized ib_wc to save stack space. */ > + ipath_ruc_loopback(qp, &wc); > + clear_bit(IPATH_S_BUSY, &qp->s_flags); > + return; > + } > + > + ohdr = &qp->s_hdr.u.oth; > + if (qp->remote_ah_attr.ah_flags & IB_AH_GRH) > + ohdr = &qp->s_hdr.u.l.oth; > + > +again: > + /* Check for a constructed packet to be sent. */ > + if (qp->s_hdrwords != 0) { > + /* > + * If no PIO bufs are available, return. > + * An interrupt will call ipath_ib_piobufavail() > + * when one is available. > + */ > + if (ipath_verbs_send(dev->ib_unit, qp->s_hdrwords, > + (uint32_t *) &qp->s_hdr, > + qp->s_cur_size, qp->s_cur_sge)) { > + no_bufs_available(qp, dev); > + return; > + } > + /* Record that we sent the packet and s_hdr is empty. */ > + qp->s_hdrwords = 0; > + } > + > + lrh0 = IPS_LRH_BTH; > + /* header size in 32-bit words LRH+BTH = (8+12)/4. */ > + hwords = 5; > + > + /* > + * The lock is needed to synchronize between > + * setting qp->s_ack_state and post_send(). > + */ > + spin_lock_irqsave(&qp->s_lock, flags); > + > + if (!(state_ops[qp->state] & IPATH_PROCESS_SEND_OK)) > + goto done; > + > + bth0 = ipath_layer_get_pkey(dev->ib_unit, qp->s_pkey_index); > + > + /* Send a request. */ > + wqe = get_swqe_ptr(qp, qp->s_last); > + switch (qp->s_state) { > + default: > + /* Signal the completion of the last send (if there is one). 
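The binary search in ipath_compute_aeth() above rounds the credit count down to the nearest table entry: for credits == 100 the loop visits x = 15 (192 > 100), 7, 11, 13, 14 (128 > 100), then min == x == 13 and it breaks, so code 0xD goes out and 96 credits are advertised, never more than are actually available. A quick userspace check of the same loop (same table, stdio only):

	#include <stdio.h>

	static unsigned t[31] = { 0, 1, 2, 3, 4, 6, 8, 12, 16, 24, 32,
		48, 64, 96, 128, 192, 256, 384, 512, 768, 1024, 1536,
		2048, 3072, 4096, 6144, 8192, 12288, 16384, 24576, 32768 };

	int main(void)
	{
		unsigned credits = 100, min = 0, max = 31, x;

		for (;;) {
			x = (min + max) / 2;
			if (t[x] == credits)
				break;
			if (t[x] > credits)
				max = x;
			else if (min == x)
				break;
			else
				min = x;
		}
		/* Prints: code 0xD -> 96 credits */
		printf("code 0x%X -> %u credits\n", x, t[x]);
		return 0;
	}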
*/ > + if (qp->s_last != qp->s_tail) { > + if (++qp->s_last == qp->s_size) > + qp->s_last = 0; > + if (!test_bit(IPATH_S_SIGNAL_REQ_WR, &qp->s_flags) || > + (wqe->wr.send_flags & IB_SEND_SIGNALED)) { > + wc.wr_id = wqe->wr.wr_id; > + wc.status = IB_WC_SUCCESS; > + wc.opcode = wc_opcode[wqe->wr.opcode]; > + wc.vendor_err = 0; > + wc.byte_len = wqe->length; > + wc.qp_num = qp->ibqp.qp_num; > + wc.src_qp = qp->remote_qpn; > + wc.pkey_index = 0; > + wc.slid = qp->remote_ah_attr.dlid; > + wc.sl = qp->remote_ah_attr.sl; > + wc.dlid_path_bits = 0; > + wc.port_num = 0; > + ipath_cq_enter(to_icq(qp->ibqp.send_cq), &wc, > + 0); > + } > + wqe = get_swqe_ptr(qp, qp->s_last); > + } > + /* Check if send work queue is empty. */ > + if (qp->s_tail == qp->s_head) > + goto done; > + /* > + * Start a new request. > + */ > + qp->s_psn = wqe->psn = qp->s_next_psn; > + qp->s_sge.sge = wqe->sg_list[0]; > + qp->s_sge.sg_list = wqe->sg_list + 1; > + qp->s_sge.num_sge = wqe->wr.num_sge; > + qp->s_len = len = wqe->length; > + switch (wqe->wr.opcode) { > + case IB_WR_SEND: > + case IB_WR_SEND_WITH_IMM: > + if (len > pmtu) { > + qp->s_state = IB_OPCODE_UC_SEND_FIRST; > + len = pmtu; > + break; > + } > + if (wqe->wr.opcode == IB_WR_SEND) { > + qp->s_state = IB_OPCODE_UC_SEND_ONLY; > + } else { > + qp->s_state = > + IB_OPCODE_UC_SEND_ONLY_WITH_IMMEDIATE; > + /* Immediate data comes after the BTH */ > + ohdr->u.imm_data = wqe->wr.imm_data; > + hwords += 1; > + } > + if (wqe->wr.send_flags & IB_SEND_SOLICITED) > + bth0 |= 1 << 23; > + break; > + > + case IB_WR_RDMA_WRITE: > + case IB_WR_RDMA_WRITE_WITH_IMM: > + ohdr->u.rc.reth.vaddr = > + cpu_to_be64(wqe->wr.wr.rdma.remote_addr); > + ohdr->u.rc.reth.rkey = > + cpu_to_be32(wqe->wr.wr.rdma.rkey); > + ohdr->u.rc.reth.length = cpu_to_be32(len); > + hwords += sizeof(struct ib_reth) / 4; > + if (len > pmtu) { > + qp->s_state = IB_OPCODE_UC_RDMA_WRITE_FIRST; > + len = pmtu; > + break; > + } > + if (wqe->wr.opcode == IB_WR_RDMA_WRITE) { > + qp->s_state = IB_OPCODE_UC_RDMA_WRITE_ONLY; > + } else { > + qp->s_state = > + IB_OPCODE_UC_RDMA_WRITE_ONLY_WITH_IMMEDIATE; > + /* Immediate data comes after the RETH */ > + ohdr->u.rc.imm_data = wqe->wr.imm_data; > + hwords += 1; > + if (wqe->wr.send_flags & IB_SEND_SOLICITED) > + bth0 |= 1 << 23; > + } > + break; > + > + default: > + goto done; > + } > + if (++qp->s_tail >= qp->s_size) > + qp->s_tail = 0; > + break; > + > + case IB_OPCODE_UC_SEND_FIRST: > + qp->s_state = IB_OPCODE_UC_SEND_MIDDLE; > + /* FALLTHROUGH */ > + case IB_OPCODE_UC_SEND_MIDDLE: > + len = qp->s_len; > + if (len > pmtu) { > + len = pmtu; > + break; > + } > + if (wqe->wr.opcode == IB_WR_SEND) > + qp->s_state = IB_OPCODE_UC_SEND_LAST; > + else { > + qp->s_state = IB_OPCODE_UC_SEND_LAST_WITH_IMMEDIATE; > + /* Immediate data comes after the BTH */ > + ohdr->u.imm_data = wqe->wr.imm_data; > + hwords += 1; > + } > + if (wqe->wr.send_flags & IB_SEND_SOLICITED) > + bth0 |= 1 << 23; > + break; > + > + case IB_OPCODE_UC_RDMA_WRITE_FIRST: > + qp->s_state = IB_OPCODE_UC_RDMA_WRITE_MIDDLE; > + /* FALLTHROUGH */ > + case IB_OPCODE_UC_RDMA_WRITE_MIDDLE: > + len = qp->s_len; > + if (len > pmtu) { > + len = pmtu; > + break; > + } > + if (wqe->wr.opcode == IB_WR_RDMA_WRITE) > + qp->s_state = IB_OPCODE_UC_RDMA_WRITE_LAST; > + else { > + qp->s_state = > + IB_OPCODE_UC_RDMA_WRITE_LAST_WITH_IMMEDIATE; > + /* Immediate data comes after the BTH */ > + ohdr->u.imm_data = wqe->wr.imm_data; > + hwords += 1; > + if (wqe->wr.send_flags & IB_SEND_SOLICITED) > + bth0 |= 1 << 23; > + } > + break; > + 
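To make the UC FIRST/MIDDLE/LAST progression above concrete: a 9000-byte IB_WR_SEND with a 4096-byte path MTU leaves here as three packets,

	UC_SEND_FIRST   4096 bytes   (s_len 9000 -> 4904)
	UC_SEND_MIDDLE  4096 bytes   (s_len 4904 ->  808)
	UC_SEND_LAST     808 bytes   (s_len  808 ->    0)

with any immediate data carried only on the LAST (or ONLY) packet.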
} > + bth2 = qp->s_next_psn++ & 0xFFFFFF; > + qp->s_len -= len; > + bth0 |= qp->s_state << 24; > + > + spin_unlock_irqrestore(&qp->s_lock, flags); > + > + /* Construct the header. */ > + extra_bytes = (4 - len) & 3; > + nwords = (len + extra_bytes) >> 2; > + if (unlikely(qp->remote_ah_attr.ah_flags & IB_AH_GRH)) { > + /* Header size in 32-bit words. */ > + hwords += 10; > + lrh0 = IPS_LRH_GRH; > + qp->s_hdr.u.l.grh.version_tclass_flow = > + cpu_to_be32((6 << 28) | > + (qp->remote_ah_attr.grh.traffic_class << 20) | > + qp->remote_ah_attr.grh.flow_label); > + qp->s_hdr.u.l.grh.paylen = > + cpu_to_be16(((hwords - 12) + nwords + SIZE_OF_CRC) << 2); > + qp->s_hdr.u.l.grh.next_hdr = 0x1B; > + qp->s_hdr.u.l.grh.hop_limit = qp->remote_ah_attr.grh.hop_limit; > + /* The SGID is 32-bit aligned. */ > + qp->s_hdr.u.l.grh.sgid.global.subnet_prefix = dev->gid_prefix; > + qp->s_hdr.u.l.grh.sgid.global.interface_id = > + ipath_layer_get_guid(dev->ib_unit); > + qp->s_hdr.u.l.grh.dgid = qp->remote_ah_attr.grh.dgid; > + } > + qp->s_hdrwords = hwords; > + qp->s_cur_sge = &qp->s_sge; > + qp->s_cur_size = len; > + lrh0 |= qp->remote_ah_attr.sl << 4; > + qp->s_hdr.lrh[0] = cpu_to_be16(lrh0); > + /* DEST LID */ > + qp->s_hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid); > + qp->s_hdr.lrh[2] = cpu_to_be16(hwords + nwords + SIZE_OF_CRC); > + qp->s_hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->ib_unit)); > + bth0 |= extra_bytes << 20; > + ohdr->bth[0] = cpu_to_be32(bth0); > + ohdr->bth[1] = cpu_to_be32(qp->remote_qpn); > + ohdr->bth[2] = cpu_to_be32(bth2); > + > + /* Check for more work to do. */ > + goto again; > + > +done: > + spin_unlock_irqrestore(&qp->s_lock, flags); > + clear_bit(IPATH_S_BUSY, &qp->s_flags); > +} > + > +/* > + * Process entries in the send work queue until credit or queue is exhausted. > + * Only allow one CPU to send a packet per QP (tasklet). > + * Otherwise, after we drop the QP s_lock, two threads could send > + * packets out of order. > + */ > +static void do_rc_send(unsigned long data) > +{ > + struct ipath_qp *qp = (struct ipath_qp *)data; > + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); > + struct ipath_swqe *wqe; > + struct ipath_sge_state *ss; > + unsigned long flags; > + u16 lrh0; > + u32 hwords; > + u32 nwords; > + u32 extra_bytes; > + u32 bth0; > + u32 bth2; > + u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu); > + u32 len; > + struct ipath_other_headers *ohdr; > + char newreq; > + > + if (test_and_set_bit(IPATH_S_BUSY, &qp->s_flags)) > + return; > + > + if (unlikely(qp->remote_ah_attr.dlid == > + ipath_layer_get_lid(dev->ib_unit))) { > + struct ib_wc wc; > + > + /* > + * Pass in an uninitialized ib_wc to be consistent with > + * other places where ipath_ruc_loopback() is called. > + */ > + ipath_ruc_loopback(qp, &wc); > + clear_bit(IPATH_S_BUSY, &qp->s_flags); > + return; > + } > + > + ohdr = &qp->s_hdr.u.oth; > + if (qp->remote_ah_attr.ah_flags & IB_AH_GRH) > + ohdr = &qp->s_hdr.u.l.oth; > + > +again: > + /* Check for a constructed packet to be sent. */ > + if (qp->s_hdrwords != 0) { > + /* > + * If no PIO bufs are available, return. > + * An interrupt will call ipath_ib_piobufavail() > + * when one is available. > + */ > + if (ipath_verbs_send(dev->ib_unit, qp->s_hdrwords, > + (uint32_t *) &qp->s_hdr, > + qp->s_cur_size, qp->s_cur_sge)) { > + no_bufs_available(qp, dev); > + return; > + } > + /* Record that we sent the packet and s_hdr is empty. */ > + qp->s_hdrwords = 0; > + } > + > + lrh0 = IPS_LRH_BTH; > + /* header size in 32-bit words LRH+BTH = (8+12)/4. 
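Sanity-checking the GRH paylen arithmetic above (assuming SIZE_OF_CRC is 1, i.e. one 32-bit ICRC word): a 256-byte UC_SEND_ONLY with a GRH has hwords = 5 + 10 = 15 and nwords = 64, so

	paylen = ((15 - 12) + 64 + 1) << 2 = 272 bytes

which is BTH (12) + payload (256) + ICRC (4); the 12 subtracted is the LRH (2 words) plus the GRH itself (10 words), neither of which belongs in the GRH payload length.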
*/ > + hwords = 5; > + > + /* > + * The lock is needed to synchronize between > + * setting qp->s_ack_state, resend timer, and post_send(). > + */ > + spin_lock_irqsave(&qp->s_lock, flags); > + > + bth0 = ipath_layer_get_pkey(dev->ib_unit, qp->s_pkey_index); > + > + /* Sending responses has higher priority over sending requests. */ > + if (qp->s_ack_state != IB_OPCODE_RC_ACKNOWLEDGE) { > + /* > + * Send a response. > + * Note that we are in the responder's side of the QP context. > + */ > + switch (qp->s_ack_state) { > + case IB_OPCODE_RC_RDMA_READ_REQUEST: > + ss = &qp->s_rdma_sge; > + len = qp->s_rdma_len; > + if (len > pmtu) { > + len = pmtu; > + qp->s_ack_state = > + IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST; > + } else { > + qp->s_ack_state = > + IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY; > + } > + qp->s_rdma_len -= len; > + bth0 |= qp->s_ack_state << 24; > + ohdr->u.aeth = ipath_compute_aeth(qp); > + hwords++; > + break; > + > + case IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST: > + qp->s_ack_state = > + IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE; > + /* FALLTHROUGH */ > + case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE: > + ss = &qp->s_rdma_sge; > + len = qp->s_rdma_len; > + if (len > pmtu) { > + len = pmtu; > + } else { > + ohdr->u.aeth = ipath_compute_aeth(qp); > + hwords++; > + qp->s_ack_state = > + IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST; > + } > + qp->s_rdma_len -= len; > + bth0 |= qp->s_ack_state << 24; > + break; > + > + case IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST: > + case IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY: > + /* > + * We have to prevent new requests from changing > + * the r_sge state while a ipath_verbs_send() > + * is in progress. > + * Changing r_state allows the receiver > + * to continue processing new packets. > + * We do it here now instead of above so > + * that we are sure the packet was sent before > + * changing the state. > + */ > + qp->r_state = IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST; > + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE; > + goto send_req; > + > + case IB_OPCODE_RC_COMPARE_SWAP: > + case IB_OPCODE_RC_FETCH_ADD: > + ss = NULL; > + len = 0; > + qp->r_state = IB_OPCODE_RC_SEND_LAST; > + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE; > + bth0 |= IB_OPCODE_ATOMIC_ACKNOWLEDGE << 24; > + ohdr->u.at.aeth = ipath_compute_aeth(qp); > + ohdr->u.at.atomic_ack_eth = > + cpu_to_be64(qp->s_ack_atomic); > + hwords += sizeof(ohdr->u.at) / 4; > + break; > + > + default: > + /* Send a regular ACK. */ > + ss = NULL; > + len = 0; > + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE; > + bth0 |= qp->s_ack_state << 24; > + ohdr->u.aeth = ipath_compute_aeth(qp); > + hwords++; > + } > + bth2 = qp->s_ack_psn++ & 0xFFFFFF; > + } else { > + send_req: > + if (!(state_ops[qp->state] & IPATH_PROCESS_SEND_OK) || > + qp->s_rnr_timeout) > + goto done; > + > + /* Send a request. */ > + wqe = get_swqe_ptr(qp, qp->s_cur); > + switch (qp->s_state) { > + default: > + /* > + * Resend an old request or start a new one. > + * > + * We keep track of the current SWQE so that > + * we don't reset the "furthest progress" state > + * if we need to back up. > + */ > + newreq = 0; > + if (qp->s_cur == qp->s_tail) { > + /* Check if send work queue is empty. */ > + if (qp->s_tail == qp->s_head) > + goto done; > + qp->s_psn = wqe->psn = qp->s_next_psn; > + newreq = 1; > + } > + /* > + * Note that we have to be careful not to modify the > + * original work request since we may need to resend > + * it. 
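The responder side of do_rc_send() above segments RDMA read responses the same way as the UC sender. For example, a 12288-byte read request with a 4096-byte MTU produces

	RDMA_READ_RESPONSE_FIRST   4096 bytes + AETH
	RDMA_READ_RESPONSE_MIDDLE  4096 bytes
	RDMA_READ_RESPONSE_LAST    4096 bytes + AETH

matching the spec's rule that the AETH appears in FIRST, LAST and ONLY response packets but not in MIDDLE.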
> + */ > + qp->s_sge.sge = wqe->sg_list[0]; > + qp->s_sge.sg_list = wqe->sg_list + 1; > + qp->s_sge.num_sge = wqe->wr.num_sge; > + qp->s_len = len = wqe->length; > + ss = &qp->s_sge; > + bth2 = 0; > + switch (wqe->wr.opcode) { > + case IB_WR_SEND: > + case IB_WR_SEND_WITH_IMM: > + /* If no credit, return. */ > + if (qp->s_lsn != (u32) -1 && > + cmp24(wqe->ssn, qp->s_lsn + 1) > 0) { > + goto done; > + } > + wqe->lpsn = wqe->psn; > + if (len > pmtu) { > + wqe->lpsn += (len - 1) / pmtu; > + qp->s_state = IB_OPCODE_RC_SEND_FIRST; > + len = pmtu; > + break; > + } > + if (wqe->wr.opcode == IB_WR_SEND) { > + qp->s_state = IB_OPCODE_RC_SEND_ONLY; > + } else { > + qp->s_state = > + IB_OPCODE_RC_SEND_ONLY_WITH_IMMEDIATE; > + /* Immediate data comes after the BTH */ > + ohdr->u.imm_data = wqe->wr.imm_data; > + hwords += 1; > + } > + if (wqe->wr.send_flags & IB_SEND_SOLICITED) > + bth0 |= 1 << 23; > + bth2 = 1 << 31; /* Request ACK. */ > + if (++qp->s_cur == qp->s_size) > + qp->s_cur = 0; > + break; > + > + case IB_WR_RDMA_WRITE: > + if (newreq) > + qp->s_lsn++; > + /* FALLTHROUGH */ > + case IB_WR_RDMA_WRITE_WITH_IMM: > + /* If no credit, return. */ > + if (qp->s_lsn != (u32) -1 && > + cmp24(wqe->ssn, qp->s_lsn + 1) > 0) { > + goto done; > + } > + ohdr->u.rc.reth.vaddr = > + cpu_to_be64(wqe->wr.wr.rdma.remote_addr); > + ohdr->u.rc.reth.rkey = > + cpu_to_be32(wqe->wr.wr.rdma.rkey); > + ohdr->u.rc.reth.length = cpu_to_be32(len); > + hwords += sizeof(struct ib_reth) / 4; > + wqe->lpsn = wqe->psn; > + if (len > pmtu) { > + wqe->lpsn += (len - 1) / pmtu; > + qp->s_state = > + IB_OPCODE_RC_RDMA_WRITE_FIRST; > + len = pmtu; > + break; > + } > + if (wqe->wr.opcode == IB_WR_RDMA_WRITE) { > + qp->s_state = > + IB_OPCODE_RC_RDMA_WRITE_ONLY; > + } else { > + qp->s_state = > + IB_OPCODE_RC_RDMA_WRITE_ONLY_WITH_IMMEDIATE; > + /* Immediate data comes after RETH */ > + ohdr->u.rc.imm_data = wqe->wr.imm_data; > + hwords += 1; > + if (wqe->wr. > + send_flags & IB_SEND_SOLICITED) > + bth0 |= 1 << 23; > + } > + bth2 = 1 << 31; /* Request ACK. */ > + if (++qp->s_cur == qp->s_size) > + qp->s_cur = 0; > + break; > + > + case IB_WR_RDMA_READ: > + ohdr->u.rc.reth.vaddr = > + cpu_to_be64(wqe->wr.wr.rdma.remote_addr); > + ohdr->u.rc.reth.rkey = > + cpu_to_be32(wqe->wr.wr.rdma.rkey); > + ohdr->u.rc.reth.length = cpu_to_be32(len); > + qp->s_state = IB_OPCODE_RC_RDMA_READ_REQUEST; > + hwords += sizeof(ohdr->u.rc.reth) / 4; > + if (newreq) { > + qp->s_lsn++; > + /* > + * Adjust s_next_psn to count the > + * expected number of responses. > + */ > + if (len > pmtu) > + qp->s_next_psn += > + (len - 1) / pmtu; > + wqe->lpsn = qp->s_next_psn++; > + } > + ss = NULL; > + len = 0; > + if (++qp->s_cur == qp->s_size) > + qp->s_cur = 0; > + break; > + > + case IB_WR_ATOMIC_CMP_AND_SWP: > + case IB_WR_ATOMIC_FETCH_AND_ADD: > + qp->s_state = > + wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP ? 
> + IB_OPCODE_RC_COMPARE_SWAP : > + IB_OPCODE_RC_FETCH_ADD; > + ohdr->u.atomic_eth.vaddr = > + cpu_to_be64(wqe->wr.wr.atomic.remote_addr); > + ohdr->u.atomic_eth.rkey = > + cpu_to_be32(wqe->wr.wr.atomic.rkey); > + ohdr->u.atomic_eth.swap_data = > + cpu_to_be64(wqe->wr.wr.atomic.swap); > + ohdr->u.atomic_eth.compare_data = > + cpu_to_be64(wqe->wr.wr.atomic.compare_add); > + hwords += sizeof(struct ib_atomic_eth) / 4; > + if (newreq) { > + qp->s_lsn++; > + wqe->lpsn = wqe->psn; > + } > + if (++qp->s_cur == qp->s_size) > + qp->s_cur = 0; > + ss = NULL; > + len = 0; > + break; > + > + default: > + goto done; > + } > + if (newreq) { > + if (++qp->s_tail >= qp->s_size) > + qp->s_tail = 0; > + } > + bth2 |= qp->s_psn++ & 0xFFFFFF; > + if ((int)(qp->s_psn - qp->s_next_psn) > 0) > + qp->s_next_psn = qp->s_psn; > + spin_lock(&dev->pending_lock); > + if (qp->timerwait.next == LIST_POISON1) { > + list_add_tail(&qp->timerwait, > + &dev->pending[dev-> > + pending_index]); > + } > + spin_unlock(&dev->pending_lock); > + break; > + > + case IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST: > + /* > + * This case can only happen if a send is > + * restarted. See ipath_restart_rc(). > + */ > + ipath_init_restart(qp, wqe); > + /* FALLTHROUGH */ > + case IB_OPCODE_RC_SEND_FIRST: > + qp->s_state = IB_OPCODE_RC_SEND_MIDDLE; > + /* FALLTHROUGH */ > + case IB_OPCODE_RC_SEND_MIDDLE: > + bth2 = qp->s_psn++ & 0xFFFFFF; > + if ((int)(qp->s_psn - qp->s_next_psn) > 0) > + qp->s_next_psn = qp->s_psn; > + ss = &qp->s_sge; > + len = qp->s_len; > + if (len > pmtu) { > + /* > + * Request an ACK every 1/2 MB to avoid > + * retransmit timeouts. > + */ > + if (((wqe->length - len) % (512 * 1024)) == 0) > + bth2 |= 1 << 31; > + len = pmtu; > + break; > + } > + if (wqe->wr.opcode == IB_WR_SEND) > + qp->s_state = IB_OPCODE_RC_SEND_LAST; > + else { > + qp->s_state = > + IB_OPCODE_RC_SEND_LAST_WITH_IMMEDIATE; > + /* Immediate data comes after the BTH */ > + ohdr->u.imm_data = wqe->wr.imm_data; > + hwords += 1; > + } > + if (wqe->wr.send_flags & IB_SEND_SOLICITED) > + bth0 |= 1 << 23; > + bth2 |= 1 << 31; /* Request ACK. */ > + if (++qp->s_cur >= qp->s_size) > + qp->s_cur = 0; > + break; > + > + case IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST: > + /* > + * This case can only happen if a RDMA write is > + * restarted. See ipath_restart_rc(). > + */ > + ipath_init_restart(qp, wqe); > + /* FALLTHROUGH */ > + case IB_OPCODE_RC_RDMA_WRITE_FIRST: > + qp->s_state = IB_OPCODE_RC_RDMA_WRITE_MIDDLE; > + /* FALLTHROUGH */ > + case IB_OPCODE_RC_RDMA_WRITE_MIDDLE: > + bth2 = qp->s_psn++ & 0xFFFFFF; > + if ((int)(qp->s_psn - qp->s_next_psn) > 0) > + qp->s_next_psn = qp->s_psn; > + ss = &qp->s_sge; > + len = qp->s_len; > + if (len > pmtu) { > + /* > + * Request an ACK every 1/2 MB to avoid > + * retransmit timeouts. > + */ > + if (((wqe->length - len) % (512 * 1024)) == 0) > + bth2 |= 1 << 31; > + len = pmtu; > + break; > + } > + if (wqe->wr.opcode == IB_WR_RDMA_WRITE) > + qp->s_state = IB_OPCODE_RC_RDMA_WRITE_LAST; > + else { > + qp->s_state = > + IB_OPCODE_RC_RDMA_WRITE_LAST_WITH_IMMEDIATE; > + /* Immediate data comes after the BTH */ > + ohdr->u.imm_data = wqe->wr.imm_data; > + hwords += 1; > + if (wqe->wr.send_flags & IB_SEND_SOLICITED) > + bth0 |= 1 << 23; > + } > + bth2 |= 1 << 31; /* Request ACK. */ > + if (++qp->s_cur >= qp->s_size) > + qp->s_cur = 0; > + break; > + > + case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE: > + /* > + * This case can only happen if a RDMA read is > + * restarted. See ipath_restart_rc(). 
> + */ > + ipath_init_restart(qp, wqe); > + len = ((qp->s_psn - wqe->psn) & 0xFFFFFF) * pmtu; > + ohdr->u.rc.reth.vaddr = > + cpu_to_be64(wqe->wr.wr.rdma.remote_addr + len); > + ohdr->u.rc.reth.rkey = > + cpu_to_be32(wqe->wr.wr.rdma.rkey); > + ohdr->u.rc.reth.length = cpu_to_be32(qp->s_len); > + qp->s_state = IB_OPCODE_RC_RDMA_READ_REQUEST; > + hwords += sizeof(ohdr->u.rc.reth) / 4; > + bth2 = qp->s_psn++ & 0xFFFFFF; > + if ((int)(qp->s_psn - qp->s_next_psn) > 0) > + qp->s_next_psn = qp->s_psn; > + ss = NULL; > + len = 0; > + if (++qp->s_cur == qp->s_size) > + qp->s_cur = 0; > + break; > + > + case IB_OPCODE_RC_RDMA_READ_REQUEST: > + case IB_OPCODE_RC_COMPARE_SWAP: > + case IB_OPCODE_RC_FETCH_ADD: > + /* > + * We shouldn't start anything new until this request > + * is finished. The ACK will handle rescheduling us. > + * XXX The number of outstanding ones is negotiated > + * at connection setup time (see pg. 258,289)? > + * XXX Also, if we support multiple outstanding > + * requests, we need to check the WQE IB_SEND_FENCE > + * flag and not send a new request if a RDMA read or > + * atomic is pending. > + */ > + goto done; > + } > + qp->s_len -= len; > + bth0 |= qp->s_state << 24; > + /* XXX queue resend timeout. */ > + } > + /* Make sure it is non-zero before dropping the lock. */ > + qp->s_hdrwords = hwords; > + spin_unlock_irqrestore(&qp->s_lock, flags); > + > + /* Construct the header. */ > + extra_bytes = (4 - len) & 3; > + nwords = (len + extra_bytes) >> 2; > + if (unlikely(qp->remote_ah_attr.ah_flags & IB_AH_GRH)) { > + /* Header size in 32-bit words. */ > + hwords += 10; > + lrh0 = IPS_LRH_GRH; > + qp->s_hdr.u.l.grh.version_tclass_flow = > + cpu_to_be32((6 << 28) | > + (qp->remote_ah_attr.grh.traffic_class << 20) | > + qp->remote_ah_attr.grh.flow_label); > + qp->s_hdr.u.l.grh.paylen = > + cpu_to_be16(((hwords - 12) + nwords + SIZE_OF_CRC) << 2); > + qp->s_hdr.u.l.grh.next_hdr = 0x1B; > + qp->s_hdr.u.l.grh.hop_limit = qp->remote_ah_attr.grh.hop_limit; > + /* The SGID is 32-bit aligned. */ > + qp->s_hdr.u.l.grh.sgid.global.subnet_prefix = dev->gid_prefix; > + qp->s_hdr.u.l.grh.sgid.global.interface_id = > + ipath_layer_get_guid(dev->ib_unit); > + qp->s_hdr.u.l.grh.dgid = qp->remote_ah_attr.grh.dgid; > + qp->s_hdrwords = hwords; > + } > + qp->s_cur_sge = ss; > + qp->s_cur_size = len; > + lrh0 |= qp->remote_ah_attr.sl << 4; > + qp->s_hdr.lrh[0] = cpu_to_be16(lrh0); > + /* DEST LID */ > + qp->s_hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid); > + qp->s_hdr.lrh[2] = cpu_to_be16(hwords + nwords + SIZE_OF_CRC); > + qp->s_hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->ib_unit)); > + bth0 |= extra_bytes << 20; > + ohdr->bth[0] = cpu_to_be32(bth0); > + ohdr->bth[1] = cpu_to_be32(qp->remote_qpn); > + ohdr->bth[2] = cpu_to_be32(bth2); > + > + /* Check for more work to do. */ > + goto again; > + > +done: > + spin_unlock_irqrestore(&qp->s_lock, flags); > + clear_bit(IPATH_S_BUSY, &qp->s_flags); > +} > + > +static void send_rc_ack(struct ipath_qp *qp) > +{ > + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); > + u16 lrh0; > + u32 bth0; > + u32 hwords; > + struct ipath_other_headers *ohdr; > + > + /* Construct the header. */ > + ohdr = &qp->s_hdr.u.oth; > + lrh0 = IPS_LRH_BTH; > + /* header size in 32-bit words LRH+BTH+AETH = (8+12+4)/4. */ > + hwords = 6; > + if (unlikely(qp->remote_ah_attr.ah_flags & IB_AH_GRH)) { > + ohdr = &qp->s_hdr.u.l.oth; > + /* Header size in 32-bit words. 
*/ > + hwords += 10; > + lrh0 = IPS_LRH_GRH; > + qp->s_hdr.u.l.grh.version_tclass_flow = > + cpu_to_be32((6 << 28) | > + (qp->remote_ah_attr.grh.traffic_class << 20) | > + qp->remote_ah_attr.grh.flow_label); > + qp->s_hdr.u.l.grh.paylen = > + cpu_to_be16(((hwords - 12) + SIZE_OF_CRC) << 2); > + qp->s_hdr.u.l.grh.next_hdr = 0x1B; > + qp->s_hdr.u.l.grh.hop_limit = qp->remote_ah_attr.grh.hop_limit; > + /* The SGID is 32-bit aligned. */ > + qp->s_hdr.u.l.grh.sgid.global.subnet_prefix = dev->gid_prefix; > + qp->s_hdr.u.l.grh.sgid.global.interface_id = > + ipath_layer_get_guid(dev->ib_unit); > + qp->s_hdr.u.l.grh.dgid = qp->remote_ah_attr.grh.dgid; > + } > + bth0 = ipath_layer_get_pkey(dev->ib_unit, qp->s_pkey_index); > + ohdr->u.aeth = ipath_compute_aeth(qp); > + if (qp->s_ack_state >= IB_OPCODE_RC_COMPARE_SWAP) { > + bth0 |= IB_OPCODE_ATOMIC_ACKNOWLEDGE << 24; > + ohdr->u.at.atomic_ack_eth = cpu_to_be64(qp->s_ack_atomic); > + hwords += sizeof(ohdr->u.at.atomic_ack_eth) / 4; > + } else { > + bth0 |= IB_OPCODE_RC_ACKNOWLEDGE << 24; > + } > + lrh0 |= qp->remote_ah_attr.sl << 4; > + qp->s_hdr.lrh[0] = cpu_to_be16(lrh0); > + /* DEST LID */ > + qp->s_hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid); > + qp->s_hdr.lrh[2] = cpu_to_be16(hwords + SIZE_OF_CRC); > + qp->s_hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->ib_unit)); > + ohdr->bth[0] = cpu_to_be32(bth0); > + ohdr->bth[1] = cpu_to_be32(qp->remote_qpn); > + ohdr->bth[2] = cpu_to_be32(qp->s_ack_psn & 0xFFFFFF); > + > + /* > + * If we can send the ACK, clear the ACK state. > + */ > + if (ipath_verbs_send(dev->ib_unit, hwords, (uint32_t *) &qp->s_hdr, > + 0, NULL) == 0) { > + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE; > + dev->n_rc_qacks++; > + } > +} > + > +/* > + * Back up the requester to resend the last un-ACKed request. > + * The QP s_lock should be held. > + */ > +static void ipath_restart_rc(struct ipath_qp *qp, u32 psn, struct ib_wc *wc) > +{ > + struct ipath_swqe *wqe = get_swqe_ptr(qp, qp->s_last); > + struct ipath_ibdev *dev; > + u32 n; > + > + /* > + * If there are no requests pending, we are done. > + */ > + if (cmp24(psn, qp->s_next_psn) >= 0 || qp->s_last == qp->s_tail) > + goto done; > + > + if (qp->s_retry == 0) { > + wc->wr_id = wqe->wr.wr_id; > + wc->status = IB_WC_RETRY_EXC_ERR; > + wc->opcode = wc_opcode[wqe->wr.opcode]; > + wc->vendor_err = 0; > + wc->byte_len = 0; > + wc->qp_num = qp->ibqp.qp_num; > + wc->src_qp = qp->remote_qpn; > + wc->pkey_index = 0; > + wc->slid = qp->remote_ah_attr.dlid; > + wc->sl = qp->remote_ah_attr.sl; > + wc->dlid_path_bits = 0; > + wc->port_num = 0; > + ipath_sqerror_qp(qp, wc); > + return; > + } > + qp->s_retry--; > + > + /* > + * Remove the QP from the timeout queue. > + * Note: it may already have been removed by ipath_ib_timer(). > + */ > + dev = to_idev(qp->ibqp.device); > + spin_lock(&dev->pending_lock); > + if (qp->timerwait.next != LIST_POISON1) > + list_del(&qp->timerwait); > + spin_unlock(&dev->pending_lock); > + > + if (wqe->wr.opcode == IB_WR_RDMA_READ) > + dev->n_rc_resends++; > + else > + dev->n_rc_resends += (int)qp->s_psn - (int)psn; > + > + /* > + * If we are starting the request from the beginning, let the > + * normal send code handle initialization. 
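cmp24() isn't in this hunk; presumably it compares 24-bit PSNs with wraparound by shifting the difference up into the sign bit, along the lines of (name and exact form assumed):

	static inline int cmp24(u32 a, u32 b)
	{
		/* <0, ==0 or >0 like memcmp(), but modulo 2^24. */
		return (((int) a) - ((int) b)) << 8;
	}

so for example cmp24(0x000005, 0xFFFFFE) > 0: PSN 5 is "after" 0xFFFFFE once the counter wraps.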
> + */ > + qp->s_cur = qp->s_last; > + if (cmp24(psn, wqe->psn) <= 0) { > + qp->s_state = IB_OPCODE_RC_SEND_LAST; > + qp->s_psn = wqe->psn; > + } else { > + n = qp->s_cur; > + for (;;) { > + if (++n == qp->s_size) > + n = 0; > + if (n == qp->s_tail) { > + if (cmp24(psn, qp->s_next_psn) >= 0) { > + qp->s_cur = n; > + wqe = get_swqe_ptr(qp, n); > + } > + break; > + } > + wqe = get_swqe_ptr(qp, n); > + if (cmp24(psn, wqe->psn) < 0) > + break; > + qp->s_cur = n; > + } > + qp->s_psn = psn; > + > + /* > + * Reset the state to restart in the middle of a request. > + * Don't change the s_sge, s_cur_sge, or s_cur_size. > + * See do_rc_send(). > + */ > + switch (wqe->wr.opcode) { > + case IB_WR_SEND: > + case IB_WR_SEND_WITH_IMM: > + qp->s_state = IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST; > + break; > + > + case IB_WR_RDMA_WRITE: > + case IB_WR_RDMA_WRITE_WITH_IMM: > + qp->s_state = IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST; > + break; > + > + case IB_WR_RDMA_READ: > + qp->s_state = IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE; > + break; > + > + default: > + /* > + * This case shouldn't happen since its only > + * one PSN per req. > + */ > + qp->s_state = IB_OPCODE_RC_SEND_LAST; > + } > + } > + > +done: > + tasklet_schedule(&qp->s_task); > +} > + > +/* > + * Handle RC and UC post sends. > + */ > +static int ipath_post_rc_send(struct ipath_qp *qp, struct ib_send_wr *wr) > +{ > + struct ipath_swqe *wqe; > + unsigned long flags; > + u32 next; > + int i, j; > + int acc; > + > + /* > + * Don't allow RDMA reads or atomic operations on UC or > + * undefined operations. > + * Make sure buffer is large enough to hold the result for atomics. > + */ > + if (qp->ibqp.qp_type == IB_QPT_UC) { > + if ((unsigned) wr->opcode >= IB_WR_RDMA_READ) > + return -EINVAL; > + } else if ((unsigned) wr->opcode > IB_WR_ATOMIC_FETCH_AND_ADD) > + return -EINVAL; > + else if (wr->opcode >= IB_WR_ATOMIC_CMP_AND_SWP && > + (wr->num_sge == 0 || wr->sg_list[0].length < sizeof(u64) || > + wr->sg_list[0].addr & 0x7)) > + return -EINVAL; > + > + /* IB spec says that num_sge == 0 is OK. */ > + if (wr->num_sge > qp->s_max_sge) > + return -ENOMEM; > + > + spin_lock_irqsave(&qp->s_lock, flags); > + next = qp->s_head + 1; > + if (next >= qp->s_size) > + next = 0; > + if (next == qp->s_last) { > + spin_unlock_irqrestore(&qp->s_lock, flags); > + return -EINVAL; > + } > + > + wqe = get_swqe_ptr(qp, qp->s_head); > + wqe->wr = *wr; > + wqe->ssn = qp->s_ssn++; > + wqe->sg_list[0].mr = NULL; > + wqe->sg_list[0].vaddr = NULL; > + wqe->sg_list[0].length = 0; > + wqe->sg_list[0].sge_length = 0; > + wqe->length = 0; > + acc = wr->opcode >= IB_WR_RDMA_READ ? IB_ACCESS_LOCAL_WRITE : 0; > + for (i = 0, j = 0; i < wr->num_sge; i++) { > + if (to_ipd(qp->ibqp.pd)->user && wr->sg_list[i].lkey == 0) { > + spin_unlock_irqrestore(&qp->s_lock, flags); > + return -EINVAL; > + } > + if (wr->sg_list[i].length == 0) > + continue; > + if (!ipath_lkey_ok(&to_idev(qp->ibqp.device)->lk_table, > + &wqe->sg_list[j], &wr->sg_list[i], acc)) { > + spin_unlock_irqrestore(&qp->s_lock, flags); > + return -EINVAL; > + } > + wqe->length += wr->sg_list[i].length; > + j++; > + } > + wqe->wr.num_sge = j; > + qp->s_head = next; > + /* > + * Wake up the send tasklet if the QP is not waiting > + * for an RNR timeout. 
> + */ > + next = qp->s_rnr_timeout; > + spin_unlock_irqrestore(&qp->s_lock, flags); > + > + if (next == 0) { > + if (qp->ibqp.qp_type == IB_QPT_UC) > + do_uc_send((unsigned long) qp); > + else > + do_rc_send((unsigned long) qp); > + } > + return 0; > +} > + > +/* > + * Note that we actually send the data as it is posted instead of putting > + * the request into a ring buffer. If we wanted to use a ring buffer, > + * we would need to save a reference to the destination address in the SWQE. > + */ > +static int ipath_post_ud_send(struct ipath_qp *qp, struct ib_send_wr *wr) > +{ > + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); > + struct ipath_other_headers *ohdr; > + struct ib_ah_attr *ah_attr; > + struct ipath_sge_state ss; > + struct ipath_sge *sg_list; > + struct ib_wc wc; > + u32 hwords; > + u32 nwords; > + u32 len; > + u32 extra_bytes; > + u32 bth0; > + u16 lrh0; > + u16 lid; > + int i; > + > + if (!(state_ops[qp->state] & IPATH_PROCESS_SEND_OK)) > + return 0; > + > + /* IB spec says that num_sge == 0 is OK. */ > + if (wr->num_sge > qp->s_max_sge) > + return -EINVAL; > + > + if (wr->num_sge > 1) { > + sg_list = kmalloc((qp->s_max_sge - 1) * sizeof(*sg_list), > + GFP_ATOMIC); > + if (!sg_list) > + return -ENOMEM; > + } else > + sg_list = NULL; > + > + /* Check the buffer to send. */ > + ss.sg_list = sg_list; > + ss.sge.mr = NULL; > + ss.sge.vaddr = NULL; > + ss.sge.length = 0; > + ss.sge.sge_length = 0; > + ss.num_sge = 0; > + len = 0; > + for (i = 0; i < wr->num_sge; i++) { > + /* Check LKEY */ > + if (to_ipd(qp->ibqp.pd)->user && wr->sg_list[i].lkey == 0) > + return -EINVAL; > + > + if (wr->sg_list[i].length == 0) > + continue; > + if (!ipath_lkey_ok(&dev->lk_table, ss.num_sge ? > + sg_list + ss.num_sge : &ss.sge, > + &wr->sg_list[i], 0)) { > + return -EINVAL; > + } > + len += wr->sg_list[i].length; > + ss.num_sge++; > + } > + extra_bytes = (4 - len) & 3; > + nwords = (len + extra_bytes) >> 2; > + > + /* Construct the header. */ > + ah_attr = &to_iah(wr->wr.ud.ah)->attr; > + if (ah_attr->dlid >= 0xC000 && ah_attr->dlid < 0xFFFF) > + dev->n_multicast_xmit++; > + if (unlikely(ah_attr->dlid == ipath_layer_get_lid(dev->ib_unit))) { > + /* Pass in an uninitialized ib_wc to save stack space. */ > + ipath_ud_loopback(qp, &ss, len, wr, &wc); > + goto done; > + } > + if (ah_attr->ah_flags & IB_AH_GRH) { > + /* Header size in 32-bit words. */ > + hwords = 17; > + lrh0 = IPS_LRH_GRH; > + ohdr = &qp->s_hdr.u.l.oth; > + qp->s_hdr.u.l.grh.version_tclass_flow = > + cpu_to_be32((6 << 28) | > + (ah_attr->grh.traffic_class << 20) | > + ah_attr->grh.flow_label); > + qp->s_hdr.u.l.grh.paylen = > + cpu_to_be16(((wr->opcode == > + IB_WR_SEND_WITH_IMM ? 6 : 5) + nwords + > + SIZE_OF_CRC) << 2); > + qp->s_hdr.u.l.grh.next_hdr = 0x1B; > + qp->s_hdr.u.l.grh.hop_limit = ah_attr->grh.hop_limit; > + /* The SGID is 32-bit aligned. */ > + qp->s_hdr.u.l.grh.sgid.global.subnet_prefix = dev->gid_prefix; > + qp->s_hdr.u.l.grh.sgid.global.interface_id = > + ipath_layer_get_guid(dev->ib_unit); > + qp->s_hdr.u.l.grh.dgid = ah_attr->grh.dgid; > + /* > + * Don't worry about sending to locally attached > + * multicast QPs. It is unspecified by the spec. what happens. > + */ > + } else { > + /* Header size in 32-bit words. 
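> + * (LRH is 2 words, BTH 3, DETH 2, for a total of 7; one more word
> + * is added below if an immediate is present.)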
*/ > + hwords = 7; > + lrh0 = IPS_LRH_BTH; > + ohdr = &qp->s_hdr.u.oth; > + } > + if (wr->opcode == IB_WR_SEND_WITH_IMM) { > + ohdr->u.ud.imm_data = wr->imm_data; > + wc.imm_data = wr->imm_data; > + hwords += 1; > + bth0 = IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE << 24; > + } else if (wr->opcode == IB_WR_SEND) { > + wc.imm_data = 0; > + bth0 = IB_OPCODE_UD_SEND_ONLY << 24; > + } else > + return -EINVAL; > + lrh0 |= ah_attr->sl << 4; > + if (qp->ibqp.qp_type == IB_QPT_SMI) > + lrh0 |= 0xF000; /* Set VL */ > + qp->s_hdr.lrh[0] = cpu_to_be16(lrh0); > + qp->s_hdr.lrh[1] = cpu_to_be16(ah_attr->dlid); /* DEST LID */ > + qp->s_hdr.lrh[2] = cpu_to_be16(hwords + nwords + SIZE_OF_CRC); > + lid = ipath_layer_get_lid(dev->ib_unit); > + qp->s_hdr.lrh[3] = lid ? cpu_to_be16(lid) : IB_LID_PERMISSIVE; > + if (wr->send_flags & IB_SEND_SOLICITED) > + bth0 |= 1 << 23; > + bth0 |= extra_bytes << 20; > + bth0 |= qp->ibqp.qp_type == IB_QPT_SMI ? IPS_DEFAULT_P_KEY : > + ipath_layer_get_pkey(dev->ib_unit, qp->s_pkey_index); > + ohdr->bth[0] = cpu_to_be32(bth0); > + ohdr->bth[1] = cpu_to_be32(wr->wr.ud.remote_qpn); > + /* XXX Could lose a PSN count but not worth locking */ > + ohdr->bth[2] = cpu_to_be32(qp->s_psn++ & 0xFFFFFF); > + /* > + * Qkeys with the high order bit set mean use the > + * qkey from the QP context instead of the WR. > + */ > + ohdr->u.ud.deth[0] = cpu_to_be32((int)wr->wr.ud.remote_qkey < 0 ? > + qp->qkey : wr->wr.ud.remote_qkey); > + ohdr->u.ud.deth[1] = cpu_to_be32(qp->ibqp.qp_num); > + if (ipath_verbs_send(dev->ib_unit, hwords, (uint32_t *) &qp->s_hdr, > + len, &ss)) > + dev->n_no_piobuf++; > + > +done: > + /* Queue the completion status entry. */ > + if (!test_bit(IPATH_S_SIGNAL_REQ_WR, &qp->s_flags) || > + (wr->send_flags & IB_SEND_SIGNALED)) { > + wc.wr_id = wr->wr_id; > + wc.status = IB_WC_SUCCESS; > + wc.vendor_err = 0; > + wc.opcode = IB_WC_SEND; > + wc.byte_len = len; > + wc.qp_num = qp->ibqp.qp_num; > + wc.src_qp = 0; > + wc.wc_flags = 0; > + /* XXX initialize other fields? */ > + ipath_cq_enter(to_icq(qp->ibqp.send_cq), &wc, 0); > + } > + kfree(sg_list); > + > + return 0; > +} > + > +/* > + * This may be called from interrupt context. > + */ > +static int ipath_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, > + struct ib_send_wr **bad_wr) > +{ > + struct ipath_qp *qp = to_iqp(ibqp); > + int err = 0; > + > + /* Check that state is OK to post send. */ > + if (!(state_ops[qp->state] & IPATH_POST_SEND_OK)) { > + *bad_wr = wr; > + return -EINVAL; > + } > + > + for (; wr; wr = wr->next) { > + switch (qp->ibqp.qp_type) { > + case IB_QPT_UC: > + case IB_QPT_RC: > + err = ipath_post_rc_send(qp, wr); > + break; > + > + case IB_QPT_SMI: > + case IB_QPT_GSI: > + case IB_QPT_UD: > + err = ipath_post_ud_send(qp, wr); > + break; > + > + default: > + err = -EINVAL; > + } > + if (err) { > + *bad_wr = wr; > + break; > + } > + } > + return err; > +} > + > +/* > + * This may be called from interrupt context. > + */ > +static int ipath_post_receive(struct ib_qp *ibqp, struct ib_recv_wr *wr, > + struct ib_recv_wr **bad_wr) > +{ > + struct ipath_qp *qp = to_iqp(ibqp); > + unsigned long flags; > + > + /* Check that state is OK to post receive. 
*/ > + if (!(state_ops[qp->state] & IPATH_POST_RECV_OK)) { > + *bad_wr = wr; > + return -EINVAL; > + } > + > + for (; wr; wr = wr->next) { > + struct ipath_rwqe *wqe; > + u32 next; > + int i, j; > + > + if (wr->num_sge > qp->r_rq.max_sge) { > + *bad_wr = wr; > + return -ENOMEM; > + } > + > + spin_lock_irqsave(&qp->r_rq.lock, flags); > + next = qp->r_rq.head + 1; > + if (next >= qp->r_rq.size) > + next = 0; > + if (next == qp->r_rq.tail) { > + spin_unlock_irqrestore(&qp->r_rq.lock, flags); > + *bad_wr = wr; > + return -ENOMEM; > + } > + > + wqe = get_rwqe_ptr(&qp->r_rq, qp->r_rq.head); > + wqe->wr_id = wr->wr_id; > + wqe->sg_list[0].mr = NULL; > + wqe->sg_list[0].vaddr = NULL; > + wqe->sg_list[0].length = 0; > + wqe->sg_list[0].sge_length = 0; > + wqe->length = 0; > + for (i = 0, j = 0; i < wr->num_sge; i++) { > + /* Check LKEY */ > + if (to_ipd(qp->ibqp.pd)->user && > + wr->sg_list[i].lkey == 0) { > + spin_unlock_irqrestore(&qp->r_rq.lock, flags); > + *bad_wr = wr; > + return -EINVAL; > + } > + if (wr->sg_list[i].length == 0) > + continue; > + if (!ipath_lkey_ok(&to_idev(qp->ibqp.device)->lk_table, > + &wqe->sg_list[j], &wr->sg_list[i], > + IB_ACCESS_LOCAL_WRITE)) { > + spin_unlock_irqrestore(&qp->r_rq.lock, flags); > + *bad_wr = wr; > + return -EINVAL; > + } > + wqe->length += wr->sg_list[i].length; > + j++; > + } > + wqe->num_sge = j; > + qp->r_rq.head = next; > + spin_unlock_irqrestore(&qp->r_rq.lock, flags); > + } > + return 0; > +} > + > +/* > + * This may be called from interrupt context. > + */ > +static int ipath_post_srq_receive(struct ib_srq *ibsrq, struct ib_recv_wr *wr, > + struct ib_recv_wr **bad_wr) > +{ > + struct ipath_srq *srq = to_isrq(ibsrq); > + struct ipath_ibdev *dev = to_idev(ibsrq->device); > + unsigned long flags; > + > + for (; wr; wr = wr->next) { > + struct ipath_rwqe *wqe; > + u32 next; > + int i, j; > + > + if (wr->num_sge > srq->rq.max_sge) { > + *bad_wr = wr; > + return -ENOMEM; > + } > + > + spin_lock_irqsave(&srq->rq.lock, flags); > + next = srq->rq.head + 1; > + if (next >= srq->rq.size) > + next = 0; > + if (next == srq->rq.tail) { > + spin_unlock_irqrestore(&srq->rq.lock, flags); > + *bad_wr = wr; > + return -ENOMEM; > + } > + > + wqe = get_rwqe_ptr(&srq->rq, srq->rq.head); > + wqe->wr_id = wr->wr_id; > + wqe->sg_list[0].mr = NULL; > + wqe->sg_list[0].vaddr = NULL; > + wqe->sg_list[0].length = 0; > + wqe->sg_list[0].sge_length = 0; > + wqe->length = 0; > + for (i = 0, j = 0; i < wr->num_sge; i++) { > + /* Check LKEY */ > + if (to_ipd(srq->ibsrq.pd)->user && > + wr->sg_list[i].lkey == 0) { > + spin_unlock_irqrestore(&srq->rq.lock, flags); > + *bad_wr = wr; > + return -EINVAL; > + } > + if (wr->sg_list[i].length == 0) > + continue; > + if (!ipath_lkey_ok(&dev->lk_table, > + &wqe->sg_list[j], &wr->sg_list[i], > + IB_ACCESS_LOCAL_WRITE)) { > + spin_unlock_irqrestore(&srq->rq.lock, flags); > + *bad_wr = wr; > + return -EINVAL; > + } > + wqe->length += wr->sg_list[i].length; > + j++; > + } > + wqe->num_sge = j; > + srq->rq.head = next; > + spin_unlock_irqrestore(&srq->rq.lock, flags); > + } > + return 0; > +} > + > +/* > + * This is called from ipath_qp_rcv() to process an incoming UD packet > + * for the given QP. > + * Called at interrupt level.
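> + * (No locks are held on entry; the receive queue lock is only taken
> + * below once a receive WQE is needed for the payload.)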
*/ > +static void ipath_ud_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr, > + int has_grh, void *data, u32 tlen, struct ipath_qp *qp) > +{ > + struct ipath_other_headers *ohdr; > + int opcode; > + u32 hdrsize; > + u32 pad; > + unsigned long flags; > + struct ib_wc wc; > + u32 qkey; > + u32 src_qp; > + struct ipath_rq *rq; > + struct ipath_srq *srq; > + struct ipath_rwqe *wqe; > + > + /* Check for GRH */ > + if (!has_grh) { > + ohdr = &hdr->u.oth; > + hdrsize = 8 + 12 + 8; /* LRH + BTH + DETH */ > + qkey = be32_to_cpu(ohdr->u.ud.deth[0]); > + src_qp = be32_to_cpu(ohdr->u.ud.deth[1]); > + } else { > + ohdr = &hdr->u.l.oth; > + hdrsize = 8 + 40 + 12 + 8; /* LRH + GRH + BTH + DETH */ > + /* > + * The header with GRH is 68 bytes and the > + * core driver sets the eager header buffer > + * size to 56 bytes so the last 12 bytes of > + * the IB header are in the data buffer. > + */ > + qkey = be32_to_cpu(((u32 *) data)[1]); > + src_qp = be32_to_cpu(((u32 *) data)[2]); > + data += 12; > + } > + src_qp &= 0xFFFFFF; > + > + /* Check that the qkey matches. */ > + if (unlikely(qkey != qp->qkey)) { > + /* XXX OK to lose a count once in a while. */ > + dev->qkey_violations++; > + dev->n_pkt_drops++; > + return; > + } > + > + /* Get the number of bytes the message was padded by. */ > + pad = (ohdr->bth[0] >> 12) & 3; > + if (unlikely(tlen < (hdrsize + pad + 4))) { > + /* Drop incomplete packets. */ > + dev->n_pkt_drops++; > + return; > + } > + > + /* > + * A GRH is expected to precede the data even if not > + * present on the wire. > + */ > + wc.byte_len = tlen - (hdrsize + pad + 4) + sizeof(struct ib_grh); > + > + /* > + * The opcode is in the low byte when it's in network order > + * (top byte when in host order). > + */ > + opcode = *(u8 *) (&ohdr->bth[0]); > + if (opcode == IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE) { > + if (has_grh) { > + wc.imm_data = *(u32 *) data; > + data += sizeof(u32); > + } else > + wc.imm_data = ohdr->u.ud.imm_data; > + wc.wc_flags = IB_WC_WITH_IMM; > + hdrsize += sizeof(u32); > + } else if (opcode == IB_OPCODE_UD_SEND_ONLY) { > + wc.imm_data = 0; > + wc.wc_flags = 0; > + } else { > + dev->n_pkt_drops++; > + return; > + } > + > + /* > + * Get the next work request entry to find where to put the data. > + * Note that it is safe to drop the lock after changing rq->tail > + * since ipath_post_receive() won't fill the empty slot. > + */ > + if (qp->ibqp.srq) { > + srq = to_isrq(qp->ibqp.srq); > + rq = &srq->rq; > + } else { > + srq = NULL; > + rq = &qp->r_rq; > + } > + spin_lock_irqsave(&rq->lock, flags); > + if (rq->tail == rq->head) { > + spin_unlock_irqrestore(&rq->lock, flags); > + dev->n_pkt_drops++; > + return; > + } > + /* Silently drop packets which are too big.
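> + * (A payload larger than the posted receive WQE, GRH space included,
> + * just bumps n_pkt_drops; no completion entry is generated.)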
*/ > + wqe = get_rwqe_ptr(rq, rq->tail); > + if (wc.byte_len > wqe->length) { > + spin_unlock_irqrestore(&rq->lock, flags); > + dev->n_pkt_drops++; > + return; > + } > + wc.wr_id = wqe->wr_id; > + qp->r_sge.sge = wqe->sg_list[0]; > + qp->r_sge.sg_list = wqe->sg_list + 1; > + qp->r_sge.num_sge = wqe->num_sge; > + if (++rq->tail >= rq->size) > + rq->tail = 0; > + if (srq && srq->ibsrq.event_handler) { > + u32 n; > + > + if (rq->head < rq->tail) > + n = rq->size + rq->head - rq->tail; > + else > + n = rq->head - rq->tail; > + if (n < srq->limit) { > + struct ib_event ev; > + > + srq->limit = 0; > + spin_unlock_irqrestore(&rq->lock, flags); > + ev.device = qp->ibqp.device; > + ev.element.srq = qp->ibqp.srq; > + ev.event = IB_EVENT_SRQ_LIMIT_REACHED; > + srq->ibsrq.event_handler(&ev, srq->ibsrq.srq_context); > + } else > + spin_unlock_irqrestore(&rq->lock, flags); > + } else > + spin_unlock_irqrestore(&rq->lock, flags); > + if (has_grh) { > + copy_sge(&qp->r_sge, &hdr->u.l.grh, sizeof(struct ib_grh)); > + wc.wc_flags |= IB_WC_GRH; > + } else > + skip_sge(&qp->r_sge, sizeof(struct ib_grh)); > + copy_sge(&qp->r_sge, data, wc.byte_len - sizeof(struct ib_grh)); > + wc.status = IB_WC_SUCCESS; > + wc.opcode = IB_WC_RECV; > + wc.vendor_err = 0; > + wc.qp_num = qp->ibqp.qp_num; > + wc.src_qp = src_qp; > + /* XXX do we know which pkey matched? Only needed for GSI. */ > + wc.pkey_index = 0; > + wc.slid = be16_to_cpu(hdr->lrh[3]); > + wc.sl = (be16_to_cpu(hdr->lrh[0]) >> 4) & 0xF; > + wc.dlid_path_bits = 0; > + /* Signal completion event if the solicited bit is set. */ > + ipath_cq_enter(to_icq(qp->ibqp.recv_cq), &wc, > + ohdr->bth[0] & __constant_cpu_to_be32(1 << 23)); > +} > + > +/* > + * This is called from ipath_post_ud_send() to forward a WQE addressed > + * to the same HCA. > + */ > +static void ipath_ud_loopback(struct ipath_qp *sqp, struct ipath_sge_state *ss, > + u32 length, struct ib_send_wr *wr, > + struct ib_wc *wc) > +{ > + struct ipath_ibdev *dev = to_idev(sqp->ibqp.device); > + struct ipath_qp *qp; > + struct ib_ah_attr *ah_attr; > + unsigned long flags; > + struct ipath_rq *rq; > + struct ipath_srq *srq; > + struct ipath_sge_state rsge; > + struct ipath_sge *sge; > + struct ipath_rwqe *wqe; > + > + qp = ipath_lookup_qpn(&dev->qp_table, wr->wr.ud.remote_qpn); > + if (!qp) > + return; > + > + /* Check that the qkey matches. */ > + if (unlikely(wr->wr.ud.remote_qkey != qp->qkey)) { > + /* XXX OK to lose a count once in a while. */ > + dev->qkey_violations++; > + dev->n_pkt_drops++; > + goto done; > + } > + > + /* > + * A GRH is expected to precede the data even if not > + * present on the wire. > + */ > + wc->byte_len = length + sizeof(struct ib_grh); > + > + if (wr->opcode == IB_WR_SEND_WITH_IMM) { > + wc->wc_flags = IB_WC_WITH_IMM; > + wc->imm_data = wr->imm_data; > + } else { > + wc->wc_flags = 0; > + wc->imm_data = 0; > + } > + > + /* > + * Get the next work request entry to find where to put the data. > + * Note that it is safe to drop the lock after changing rq->tail > + * since ipath_post_receive() won't fill the empty slot. > + */ > + if (qp->ibqp.srq) { > + srq = to_isrq(qp->ibqp.srq); > + rq = &srq->rq; > + } else { > + srq = NULL; > + rq = &qp->r_rq; > + } > + spin_lock_irqsave(&rq->lock, flags); > + if (rq->tail == rq->head) { > + spin_unlock_irqrestore(&rq->lock, flags); > + dev->n_pkt_drops++; > + goto done; > + } > + /* Silently drop packets which are too big.
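> + * (Same policy as in ipath_ud_rcv() above: the loopback payload plus
> + * the implicit GRH must fit in the receive WQE.)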
*/ > + wqe = get_rwqe_ptr(rq, rq->tail); > + if (wc->byte_len > wqe->length) { > + spin_unlock_irqrestore(&rq->lock, flags); > + dev->n_pkt_drops++; > + goto done; > + } > + wc->wr_id = wqe->wr_id; > + rsge.sge = wqe->sg_list[0]; > + rsge.sg_list = wqe->sg_list + 1; > + rsge.num_sge = wqe->num_sge; > + if (++rq->tail >= rq->size) > + rq->tail = 0; > + if (srq && srq->ibsrq.event_handler) { > + u32 n; > + > + if (rq->head < rq->tail) > + n = rq->size + rq->head - rq->tail; > + else > + n = rq->head - rq->tail; > + if (n < srq->limit) { > + struct ib_event ev; > + > + srq->limit = 0; > + spin_unlock_irqrestore(&rq->lock, flags); > + ev.device = qp->ibqp.device; > + ev.element.srq = qp->ibqp.srq; > + ev.event = IB_EVENT_SRQ_LIMIT_REACHED; > + srq->ibsrq.event_handler(&ev, srq->ibsrq.srq_context); > + } else > + spin_unlock_irqrestore(&rq->lock, flags); > + } else > + spin_unlock_irqrestore(&rq->lock, flags); > + ah_attr = &to_iah(wr->wr.ud.ah)->attr; > + if (ah_attr->ah_flags & IB_AH_GRH) { > + copy_sge(&rsge, &ah_attr->grh, sizeof(struct ib_grh)); > + wc->wc_flags |= IB_WC_GRH; > + } else > + skip_sge(&rsge, sizeof(struct ib_grh)); > + sge = &ss->sge; > + while (length) { > + u32 len = sge->length; > + > + if (len > length) > + len = length; > + BUG_ON(len == 0); > + copy_sge(&rsge, sge->vaddr, len); > + sge->vaddr += len; > + sge->length -= len; > + sge->sge_length -= len; > + if (sge->sge_length == 0) { > + if (--ss->num_sge) > + *sge = *ss->sg_list++; > + } else if (sge->length == 0 && sge->mr != NULL) { > + if (++sge->n >= IPATH_SEGSZ) { > + if (++sge->m >= sge->mr->mapsz) > + break; > + sge->n = 0; > + } > + sge->vaddr = sge->mr->map[sge->m]->segs[sge->n].vaddr; > + sge->length = sge->mr->map[sge->m]->segs[sge->n].length; > + } > + length -= len; > + } > + wc->status = IB_WC_SUCCESS; > + wc->opcode = IB_WC_RECV; > + wc->vendor_err = 0; > + wc->qp_num = qp->ibqp.qp_num; > + wc->src_qp = sqp->ibqp.qp_num; > + /* XXX do we know which pkey matched? Only needed for GSI. */ > + wc->pkey_index = 0; > + wc->slid = ipath_layer_get_lid(dev->ib_unit); > + wc->sl = ah_attr->sl; > + wc->dlid_path_bits = 0; > + /* Signal completion event if the solicited bit is set. */ > + ipath_cq_enter(to_icq(qp->ibqp.recv_cq), wc, > + wr->send_flags & IB_SEND_SOLICITED); > + > +done: > + if (atomic_dec_and_test(&qp->refcount)) > + wake_up(&qp->wait); > +} > + > +/* > + * Copy the next RWQE into the QP's RWQE. > + * Return zero if no RWQE is available. > + * Called at interrupt level with the QP r_rq.lock held. 
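> + * (For SRQs, the SRQ's rq.lock is additionally taken below, nested
> + * inside r_rq.lock, and dropped before the limit event handler runs.)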
> + */ > +static int get_rwqe(struct ipath_qp *qp, int wr_id_only) > +{ > + struct ipath_rq *rq; > + struct ipath_srq *srq; > + struct ipath_rwqe *wqe; > + > + if (!qp->ibqp.srq) { > + rq = &qp->r_rq; > + if (unlikely(rq->tail == rq->head)) > + return 0; > + wqe = get_rwqe_ptr(rq, rq->tail); > + qp->r_wr_id = wqe->wr_id; > + if (!wr_id_only) { > + qp->r_sge.sge = wqe->sg_list[0]; > + qp->r_sge.sg_list = wqe->sg_list + 1; > + qp->r_sge.num_sge = wqe->num_sge; > + qp->r_len = wqe->length; > + } > + if (++rq->tail >= rq->size) > + rq->tail = 0; > + return 1; > + } > + > + srq = to_isrq(qp->ibqp.srq); > + rq = &srq->rq; > + spin_lock(&rq->lock); > + if (unlikely(rq->tail == rq->head)) { > + spin_unlock(&rq->lock); > + return 0; > + } > + wqe = get_rwqe_ptr(rq, rq->tail); > + qp->r_wr_id = wqe->wr_id; > + if (!wr_id_only) { > + qp->r_sge.sge = wqe->sg_list[0]; > + qp->r_sge.sg_list = wqe->sg_list + 1; > + qp->r_sge.num_sge = wqe->num_sge; > + qp->r_len = wqe->length; > + } > + if (++rq->tail >= rq->size) > + rq->tail = 0; > + if (srq->ibsrq.event_handler) { > + struct ib_event ev; > + u32 n; > + > + if (rq->head < rq->tail) > + n = rq->size + rq->head - rq->tail; > + else > + n = rq->head - rq->tail; > + if (n < srq->limit) { > + srq->limit = 0; > + spin_unlock(&rq->lock); > + ev.device = qp->ibqp.device; > + ev.element.srq = qp->ibqp.srq; > + ev.event = IB_EVENT_SRQ_LIMIT_REACHED; > + srq->ibsrq.event_handler(&ev, srq->ibsrq.srq_context); > + } else > + spin_unlock(&rq->lock); > + } else > + spin_unlock(&rq->lock); > + return 1; > +} > -- > 0.99.9n > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo at vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > > > From rjwalsh at pathscale.com Sun Dec 18 12:05:58 2005 From: rjwalsh at pathscale.com (Robert Walsh) Date: Sun, 18 Dec 2005 12:05:58 -0800 Subject: [openib-general] Re: [PATCH 10/13] [RFC] ipath verbs, part 1 In-Reply-To: <20051218195922.GC31184@us.ibm.com> References: <200512161548.zxp6FKcabEu47EnS@cisco.com> <200512161548.W9sJn4CLmdhnSTcH@cisco.com> <20051218195922.GC31184@us.ibm.com> Message-ID: <1134936358.5826.2.camel@phosphene.durables.org> On Sun, 2005-12-18 at 11:59 -0800, Paul E. McKenney wrote: > On Fri, Dec 16, 2005 at 03:48:55PM -0800, Roland Dreier wrote: > > First half of ipath verbs driver > > Some RCU-related questions interspersed. Basic question is "where is > the lock-free read-side traversal?" Good question. I'll take a closer look. Regards, Robert. -- Robert Walsh Email: rjwalsh at pathscale.com PathScale, Inc. Phone: +1 650 934 8117 2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969 Mountain View, CA 94043. From yael at mellanox.co.il Mon Dec 19 01:19:39 2005 From: yael at mellanox.co.il (Yael Kalka) Date: 19 Dec 2005 11:19:39 +0200 Subject: [openib-general] [PATCH] Opensm - fix segfault on exit - cont. Message-ID: <5z4q55h2ac.fsf@mtl066.yok.mtl.com> Hi Hal, I've noticed that under certain operating systems, when driver isn't loaded, the SM still exits with segfault. The following patch fixes this. Thanks, Yael Signed-off-by: Yael Kalka Index: libvendor/osm_vendor_ibumad.c =================================================================== --- libvendor/osm_vendor_ibumad.c (revision 4522) +++ libvendor/osm_vendor_ibumad.c (working copy) @@ -552,7 +552,7 @@ osm_vendor_delete( /* umad receiver thread ? 
*/ p_ur = (*pp_vend)->receiver; - if (&p_ur->signal) + if (&p_ur->signal != NULL) cl_event_destroy( &p_ur->signal ); cl_spinlock_destroy( &(*pp_vend)->cb_lock ); cl_spinlock_destroy( &(*pp_vend)->match_tbl_lock ); From krkumar2 at in.ibm.com Mon Dec 19 02:46:40 2005 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Mon, 19 Dec 2005 16:16:40 +0530 Subject: [openib-general] [PATCH 07/13] [RFC] ipath core misc files In-Reply-To: <200512161548.3fqe3fMerrheBMdX@cisco.com> Message-ID: Roland Dreier wrote: ... > +int ipath_mlock(unsigned long start_page, size_t num_pages, struct page **p) > +{ > + int n; > + > + _IPATH_VDBG("pin %lx pages from vaddr %lx\n", num_pages, start_page); > + down_read(&current->mm->mmap_sem); > + n = get_user_pages(current, current->mm, start_page, num_pages, 1, 1, > + p, NULL); > + up_read(&current->mm->mmap_sem); > + if (n != num_pages) { > + _IPATH_INFO > + ("get_user_pages (0x%lx pages starting at 0x%lx failed with %d\n", > + num_pages, start_page, n); > + if (n < 0) /* it's an errno */ > + return n; > + return -ENOMEM; /* no way to know actual error */ > + } > + > + return 0; > +} For this routine (where num_pages can be >1), in the error case you need to page_cache_release() the pages that were successfully 'got' (get_page()'d). - KK -------------- next part -------------- An HTML attachment was scrubbed... URL: From halr at voltaire.com Mon Dec 19 03:37:54 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Dec 2005 06:37:54 -0500 Subject: [openib-general] RE: A couple of questions about osm_lid_mgr.c::__osm_lid_mgr_set_physp_pi In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E3618B52@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E3618B52@mtlexch01.mtl.com> Message-ID: <1134992274.4328.32896.camel@hal.voltaire.com> Hi Eitan, On Sun, 2005-12-18 at 14:20, Eitan Zahavi wrote: > [EZ] Thanks. I have seen the patch. It is fine. Thanks. I just committed it. > > > > Also, why does changing the MTU require that the link be taken > down ? > > > > > The behavior of the link when a neighbor MTU is changed is not very > well defined. > > > So the best way to handle that is to force it down. > > > > NeighborMTU is not involved with the link negotiation nor is there a > > comment in the description like OperationalVLs. What behavior are you > > referring to ? > [EZ] I actually do not see any spec note about modifying neighbor MTU > during link up. Yes, that was what I was saying. > However, I remember we had to add this functionality. I > try to dig this up in the old bit keeper and found the first occurrence > of the setting of the port down in version 1.7. But the log does not say > why. Thanks for looking. This seems mysterious to me but I would hesitate to remove it even though I don't think it should be required; if it is, some spec comment should be made. I would like to close the loop on this but don't see how. > > > > I also noticed a nit in the same function: > > > > > > > > p_pi->m_key_lease_period = > p_mgr->p_subn->opt.m_key_lease_period; > > > > /* Check to see if the value we are setting is different than > > > > the value in the port_info. If it is - turn on send_set flag > */ > > > > if (cl_memcmp( &p_pi->m_key_lease_period, > > > > &p_old_pi->m_key_lease_period, > > > > sizeof(p_pi->m_key_lease_period) )) > > > > send_set = TRUE; > > > > > > > > Should that be only when the Mkey is non 0 ? > > > > > Well, I know the lease is not relevant when MKey = 0. But for code > clarity I > > > propose to ignore that fact.
The effect is only when someone sets > the lease period but > > MKey = 0 > > > which IMO does not make any sense anyway. > > > > I agree it does not make sense but could happen (is it prevented > somehow > > ?) so my take is to minimize the need for sets. As I said this is a > nit. > [EZ] We could avoid that but I do not think this is required. I agree that it is not required. It's a trivial "optimization". -- Hal From halr at voltaire.com Mon Dec 19 03:41:53 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Dec 2005 06:41:53 -0500 Subject: [openib-general] Re: [PATCH] Opensm - fix segfault on exit - cont. In-Reply-To: <5z4q55h2ac.fsf@mtl066.yok.mtl.com> References: <5z4q55h2ac.fsf@mtl066.yok.mtl.com> Message-ID: <1134992405.4328.32921.camel@hal.voltaire.com> Hi Yael, On Mon, 2005-12-19 at 04:19, Yael Kalka wrote: > Hi Hal, > > I've noticed that under certain operating systems, Seems more likely a compiler difference than an OS difference. Can you mention on which distribution/compiler this made a difference ? > when driver isn't > loaded, the SM still exits with segfault. > The following patch fixes this. > > Thanks, > Yael > > Signed-off-by: Yael Kalka > > Index: libvendor/osm_vendor_ibumad.c > =================================================================== > --- libvendor/osm_vendor_ibumad.c (revision 4522) > +++ libvendor/osm_vendor_ibumad.c (working copy) > @@ -552,7 +552,7 @@ osm_vendor_delete( > > /* umad receiver thread ? */ > p_ur = (*pp_vend)->receiver; > - if (&p_ur->signal) > + if (&p_ur->signal != NULL) If this makes a difference, there are other uses of similar syntax which should be changed :-( > cl_event_destroy( &p_ur->signal ); > cl_spinlock_destroy( &(*pp_vend)->cb_lock ); > cl_spinlock_destroy( &(*pp_vend)->match_tbl_lock ); -- Hal From jackm at mellanox.co.il Mon Dec 19 04:00:49 2005 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Mon, 19 Dec 2005 14:00:49 +0200 Subject: [openib-general] [PATCH] mthca: check return value in mthca_dev_lim call Message-ID: <20051219120049.GA4858@mellanox.co.il> Check error return on call to mthca_dev_lim for Tavor (as is done for memfree). Signed-off-by: Jack Morgenstein Index: linux-kernel/drivers/infiniband/hw/mthca/mthca_main.c =================================================================== --- linux-kernel.orig/drivers/infiniband/hw/mthca/mthca_main.c +++ linux-kernel/drivers/infiniband/hw/mthca/mthca_main.c @@ -261,6 +261,10 @@ static int __devinit mthca_init_tavor(st } err = mthca_dev_lim(mdev, &dev_lim); + if (err) { + mthca_err(mdev, "QUERY_DEV_LIM command failed, aborting.\n"); + goto err_disable; + } profile = default_profile; profile.num_uar = dev_lim.uar_size / PAGE_SIZE; From krkumar2 at in.ibm.com Mon Dec 19 04:05:39 2005 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Mon, 19 Dec 2005 17:35:39 +0530 Subject: [openib-general] [PATCH 07/13] [RFC] ipath core misc files Message-ID: (Please ignore if you see this twice, I am sending this a second time as I got an error on previous send) Roland Dreier wrote: ...
> +int ipath_mlock(unsigned long start_page, size_t num_pages, struct page **p) > +{ > + int n; > + > + _IPATH_VDBG("pin %lx pages from vaddr %lx\n", num_pages, start_page); > + down_read(&current->mm->mmap_sem); > + n = get_user_pages(current, current->mm, start_page, num_pages, 1, 1, > + p, NULL); > + up_read(&current->mm->mmap_sem); > + if (n != num_pages) { > + _IPATH_INFO > + ("get_user_pages (0x%lx pages starting at 0x%lx failed with %d\n", > + num_pages, start_page, n); > + if (n < 0) /* it's an errno */ > + return n; > + return -ENOMEM; /* no way to know actual error */ > + } > + > + return 0; > +} For this routine (where num_pages can be >1), in the error case you need to page_cache_release() the pages that were successfully 'got' (get_page()'d). - KK From jackm at mellanox.co.il Mon Dec 19 05:17:36 2005 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Mon, 19 Dec 2005 15:17:36 +0200 Subject: [openib-general] [PATCH] core/ib_uverbs: fix error flow in ib_uverbs_create_cq Message-ID: <20051219131736.GA8822@mellanox.co.il> ib_uverbs_create_cq did not release the completion channel event file reference count in the error flow. Signed-off-by: Jack Morgenstein Index: linux-kernel/drivers/infiniband/core/uverbs_cmd.c =================================================================== --- linux-kernel.orig/drivers/infiniband/core/uverbs_cmd.c +++ linux-kernel/drivers/infiniband/core/uverbs_cmd.c @@ -593,13 +593,13 @@ ssize_t ib_uverbs_create_cq(struct ib_uv if (cmd.comp_vector >= file->device->num_comp_vectors) return -EINVAL; - if (cmd.comp_channel >= 0) - ev_file = ib_uverbs_lookup_comp_file(cmd.comp_channel); - uobj = kmalloc(sizeof *uobj, GFP_KERNEL); if (!uobj) return -ENOMEM; + if (cmd.comp_channel >= 0) + ev_file = ib_uverbs_lookup_comp_file(cmd.comp_channel); + uobj->uobject.user_handle = cmd.user_handle; uobj->uobject.context = file->ucontext; uobj->uverbs_file = file; @@ -663,6 +663,8 @@ err_up: ib_destroy_cq(cq); err: + if (ev_file) + ib_uverbs_release_ucq(file, ev_file, uobj); kfree(uobj); return ret; } From elylevy at cs.huji.ac.il Mon Dec 19 05:23:53 2005 From: elylevy at cs.huji.ac.il (Ely Levy) Date: Mon, 19 Dec 2005 15:23:53 +0200 (IST) Subject: [openib-general] Problems with dsp Message-ID: I'm trying to run iperf with sdp using svn from few days ago It works find until the speed goes above 1.49GB (by making block size bigger or running 2 iperf) Then I'm starting to get weird debug messages in the log and as wall from root. Any idea what might cause it? This is what I see in /var/log/message: Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: SDP module load. Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: Initializing /proc filesystem entries. Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: Advertisment cache initialization. Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: Link level services initialization. Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: Main pool initialized. Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: Creating connection tables. Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: IOCB cache initialization.
Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: Started listening for SDP connection requests Dec 15 17:34:33 cmos-17 kernel: NET: Registered protocol family 27 Dec 15 17:34:47 cmos-17 kernel: ib_sdp CRTL: SOCKET: type <1> proto <0> state <1:00000000> Dec 15 17:34:47 cmos-17 kernel: ib_sdp CRTL: <0> <8e01> BIND: family <2> addr <00000000:8913> Dec 15 17:34:47 cmos-17 kernel: ib_sdp CRTL: <0> <8e01> LISTEN: addr <00000000:1389> backlog <0005> Dec 15 17:34:47 cmos-17 kernel: ib_sdp CRTL: <0> <0100> ACCEPT: addr <00000000:1389> Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: event <1> commID <00000014> ID <-180208768> Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: CM REQ. comm <00000014> SID <8913010000000000> ca port <1> Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: Hello BSDH <003f:00:00:0000005c:00000000:00000000> Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: Hello HH Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: <0> <0100> ACCEPT: complete <1> <0a000702:1389><0a000701:8001> Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: <1> <2340> GETNAME: src <0a000702:1389> dst <0a000701:8001> Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: <1> <2340> GETNAME: src <0a000702:1389> dst <0a000701:8001> Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: event <4> commID <00000014> ID <1> Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: <1> <2340> CM ESTABLISHED. commID <00000014> Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: <1> <1171> Passive Establish src <0a000702:1389> dst <0a000701:8001> Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: <1> <1171> Mode request <2> from current mode. <1:1> Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: <0> <0100> ACCEPT: addr <00000000:1389> Dec 15 17:34:51 cmos-17 kernel: <7shent <1> Dec 15 17:34:51 cmos-17 kernel: 6> bytes. Dec 15 17:34:51 cmos-17 kernel: < wrid <2543> of <4096> bytplete1171> RECV BUFF, bytes <4096> Dec 15 17:34:51 cmos-17 kernel: <1171> Read complete Read complete <3888> of <4096> bytes. Dec 15 17:34:51 cmos-17 kernel: POST READ BUFF wrid <3959> of <4096> bytes. Dec 15 17:34:51 cmos-17 kernel: <<1> <1171> POST READ BUFF wrid <4186> of <4096> bytes. Dec 15 17:34:51 cmos-17 kernel: bytes. Dec 15 17:34:51 cmos-17 kernel: 357> of <4096> bytes. Dec 15 17:34:51 cmos-17 kernel: <7ST READ BUFF wrid71> Read complete <4296 DATA: <1>71> POST READib_sdp DATA: <1> <1171> POST READ BUFF wrid <4360> of <4096> bytes. Dec 15 17:34:51 cmos-17 kernel: <7id <5116> of <4096> bytes.te 1> RECV BUFF,te <5066> of <4096> byt1> RECV BUFF, bytes < UFF wr> <1171> Read>> <1171> PTA: <1> <1171> Re. Dec 15 17:34:51 cmos-17 kernel: <7 <1> <1171> Read complete <5507> of <4096> byte <1> <1171> RECV BUFF, bytes <4096> Dec 15 17:34:51 cmos-17 kernel: bytes. Dec 15 17:34:51 cmos-17 kernel: <_sdp DATA: <1> <4096> bytes. Dec 15 17:34:51 cmos-17 kernel: <7 <40, bytes <4096> Dec 15 17:34:51 cmos-17 kernel: 63> of <40<4096> Dec 15 17:34:51 cmos-17 kernel: <7te <6164> of <4096> bytes. Dec 15 17:34:51 cmos-17 kernel: 5> of <4096> bytes. Dec 15 17:34:51 cmos-17 kernel: <74096> Dec 15 17:34:51 cmos-17 kernel: <7d <6450> of <409e <6396> of <4096> bytes. Ely Levy System group Computer Science Hebrew University Jerusalem Israel From mst at mellanox.co.il Mon Dec 19 05:55:45 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 19 Dec 2005 15:55:45 +0200 Subject: [openib-general] Re: Problems with dsp In-Reply-To: References: Message-ID: <20051219135545.GA1677@mellanox.co.il> Quoting r. 
Ely Levy : > Subject: Problems with dsp > > I'm trying to run iperf with sdp using svn from few days ago It works > find > until the speed goes above 1.49GB (by making block size bigger or > running > 2 iperf) > Then I'm starting to get weird debug messages in the log and as wall > from > root. > Any idea what might cause it? > This is what I see in /var/log/message: > > Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: SDP module load. > Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: Initializing /proc > filesystem > entries. > Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: Advertisment cache > initialization. > Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: Link level services > initialization. > Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: Main pool initialized. > Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: Creating connection tables. > Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: IOCB cache initialization. > Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: Started listening for SDP > connection requests > Dec 15 17:34:33 cmos-17 kernel: NET: Registered protocol family 27 > Dec 15 17:34:47 cmos-17 kernel: ib_sdp CRTL: SOCKET: type <1> proto <0> > state <1:00000000> > Dec 15 17:34:47 cmos-17 kernel: ib_sdp CRTL: <0> <8e01> BIND: family <2> > addr <00000000:8913> > Dec 15 17:34:47 cmos-17 kernel: ib_sdp CRTL: <0> <8e01> LISTEN: addr > <00000000:1389> backlog <0005> > Dec 15 17:34:47 cmos-17 kernel: ib_sdp CRTL: <0> <0100> ACCEPT: addr > <00000000:1389> > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: event <1> commID <00000014> > ID <-180208768> > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: CM REQ. comm <00000014> SID > <8913010000000000> ca port <1> > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: Hello BSDH > <003f:00:00:0000005c:00000000:00000000> > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: Hello HH > > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: <0> <0100> ACCEPT: complete > <1> <0a000702:1389><0a000701:8001> > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: <1> <2340> GETNAME: src > <0a000702:1389> dst <0a000701:8001> > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: <1> <2340> GETNAME: src > <0a000702:1389> dst <0a000701:8001> > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: event <4> commID <00000014> > ID <1> > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: <1> <2340> CM ESTABLISHED. > commID <00000014> > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: <1> <1171> Passive > Establish > src <0a000702:1389> dst <0a000701:8001> > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: <1> <1171> Mode request <2> > from current mode. <1:1> > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: <0> <0100> ACCEPT: addr > <00000000:1389> > Dec 15 17:34:51 cmos-17 kernel: <7shent <1> > Dec 15 17:34:51 cmos-17 kernel: 6> bytes. > Dec 15 17:34:51 cmos-17 kernel: < wrid <2543> of <4096> bytplete1171> > RECV > BUFF, bytes <4096> > Dec 15 17:34:51 cmos-17 kernel: <1171> Read complete READ > BUFF w1171> Read complete <3888> of <4096> bytes. > Dec 15 17:34:51 cmos-17 kernel: POST READ BUFF wrid <3959> of <4096> > bytes. > Dec 15 17:34:51 cmos-17 kernel: <<1> <1171> POST READ BUFF wrid <4186> > of > <4096> bytes. > Dec 15 17:34:51 cmos-17 kernel: bytes. > Dec 15 17:34:51 cmos-17 kernel: 357> of <4096> bytes. > Dec 15 17:34:51 cmos-17 kernel: <7ST READ BUFF wrid71> Read complete > <4296 > DATA: <1>71> POST READib_sdp DATA: <1> > <1171> POST READ BUFF wrid <4360> of <4096> bytes. 
> Dec 15 17:34:51 cmos-17 kernel: <7id <5116> of <4096> bytes.te 1> RECV > BUFF,te <5066> of <4096> byt1> RECV BUFF, bytes < > UFF wr> <1171> Read>> <1171> PTA: <1> <1171> Re. > Dec 15 17:34:51 cmos-17 kernel: <7 <1> <1171> Read complete <5507> of > <4096> byte <1> <1171> RECV BUFF, bytes <4096> > Dec 15 17:34:51 cmos-17 kernel: bytes. > Dec 15 17:34:51 cmos-17 kernel: <_sdp DATA: <1> <4096> bytes. > Dec 15 17:34:51 cmos-17 kernel: <7 <40, bytes <4096> > Dec 15 17:34:51 cmos-17 kernel: 63> of <40<4096> > Dec 15 17:34:51 cmos-17 kernel: <7te <6164> of <4096> bytes. > Dec 15 17:34:51 cmos-17 kernel: 5> of <4096> bytes. > Dec 15 17:34:51 cmos-17 kernel: <74096> > Dec 15 17:34:51 cmos-17 kernel: <7d <6450> of <409e <6396> of <4096> > bytes. > > > Ely Levy > System group > Computer Science > Hebrew University > Jerusalem Israel Are you saying you see these when SDP is built with debug disabled? -- MST From mst at mellanox.co.il Mon Dec 19 05:58:14 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 19 Dec 2005 15:58:14 +0200 Subject: [openib-general] [PATCH applied] sdp: return ECONNREFUSED on illegal address Message-ID: <20051219135814.GB1677@mellanox.co.il> Make more ltp tests pass: attempts to connect to an illegal address should return -ECONNREFUSED. Signed-off-by: Michael S. Tsirkin Index: linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_inet.c =================================================================== --- linux-2.6.14.orig/drivers/infiniband/ulp/sdp/sdp_inet.c 2005-12-19 18:29:43.000000000 +0200 +++ linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_inet.c 2005-12-19 18:30:09.000000000 +0200 @@ -468,7 +468,7 @@ static int sdp_inet_connect(struct socke ZERONET(addr->sin_addr.s_addr) || LOCAL_MCAST(addr->sin_addr.s_addr) || INADDR_ANY == addr->sin_addr.s_addr) - return -EINVAL; + return -ECONNREFUSED; /* * lock socket */ -- MST From elylevy at cs.huji.ac.il Mon Dec 19 06:01:02 2005 From: elylevy at cs.huji.ac.il (Ely Levy) Date: Mon, 19 Dec 2005 16:01:02 +0200 (IST) Subject: [openib-general] Re: Problems with dsp In-Reply-To: <20051219135545.GA1677@mellanox.co.il> References: <20051219135545.GA1677@mellanox.co.il> Message-ID: On Mon, 19 Dec 2005, Michael S. Tsirkin wrote: > Quoting r. Ely Levy : > > Subject: Problems with dsp > > > > I'm trying to run iperf with sdp using svn from few days ago It works > > find > > until the speed goes above 1.49GB (by making block size bigger or > > running > > 2 iperf) > > Then I'm starting to get weird debug messages in the log and as wall > > from > > root. > > Any idea what might cause it? > > This is what I see in /var/log/message: > > > > Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: SDP module load. > > Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: Initializing /proc > > filesystem > > entries. > > Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: Advertisment cache > > initialization. > > Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: Link level services > > initialization. > > Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: Main pool initialized. > > Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: Creating connection tables. > > Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: IOCB cache initialization.
> > Dec 15 17:34:33 cmos-17 kernel: ib_sdp INIT: Started listening for SDP > > connection requests > > Dec 15 17:34:33 cmos-17 kernel: NET: Registered protocol family 27 > > Dec 15 17:34:47 cmos-17 kernel: ib_sdp CRTL: SOCKET: type <1> proto <0> > > state <1:00000000> > > Dec 15 17:34:47 cmos-17 kernel: ib_sdp CRTL: <0> <8e01> BIND: family <2> > > addr <00000000:8913> > > Dec 15 17:34:47 cmos-17 kernel: ib_sdp CRTL: <0> <8e01> LISTEN: addr > > <00000000:1389> backlog <0005> > > Dec 15 17:34:47 cmos-17 kernel: ib_sdp CRTL: <0> <0100> ACCEPT: addr > > <00000000:1389> > > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: event <1> commID <00000014> > > ID <-180208768> > > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: CM REQ. comm <00000014> SID > > <8913010000000000> ca port <1> > > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: Hello BSDH > > <003f:00:00:0000005c:00000000:00000000> > > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: Hello HH > > > > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: <0> <0100> ACCEPT: complete > > <1> <0a000702:1389><0a000701:8001> > > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: <1> <2340> GETNAME: src > > <0a000702:1389> dst <0a000701:8001> > > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: <1> <2340> GETNAME: src > > <0a000702:1389> dst <0a000701:8001> > > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: event <4> commID <00000014> > > ID <1> > > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: <1> <2340> CM ESTABLISHED. > > commID <00000014> > > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: <1> <1171> Passive > > Establish > > src <0a000702:1389> dst <0a000701:8001> > > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: <1> <1171> Mode request <2> > > from current mode. <1:1> > > Dec 15 17:34:51 cmos-17 kernel: ib_sdp CRTL: <0> <0100> ACCEPT: addr > > <00000000:1389> > > Dec 15 17:34:51 cmos-17 kernel: <7shent <1> > > Dec 15 17:34:51 cmos-17 kernel: 6> bytes. > > Dec 15 17:34:51 cmos-17 kernel: < wrid <2543> of <4096> bytplete1171> > > RECV > > BUFF, bytes <4096> > > Dec 15 17:34:51 cmos-17 kernel: <1171> Read complete > READ > > BUFF w1171> Read complete <3888> of <4096> bytes. > > Dec 15 17:34:51 cmos-17 kernel: POST READ BUFF wrid <3959> of <4096> > > bytes. > > Dec 15 17:34:51 cmos-17 kernel: <<1> <1171> POST READ BUFF wrid <4186> > > of > > <4096> bytes. > > Dec 15 17:34:51 cmos-17 kernel: bytes. > > Dec 15 17:34:51 cmos-17 kernel: 357> of <4096> bytes. > > Dec 15 17:34:51 cmos-17 kernel: <7ST READ BUFF wrid71> Read complete > > <4296 > > DATA: <1>71> POST READib_sdp DATA: <1> > > <1171> POST READ BUFF wrid <4360> of <4096> bytes. > > Dec 15 17:34:51 cmos-17 kernel: <7id <5116> of <4096> bytes.te 1> RECV > > BUFF,te <5066> of <4096> byt1> RECV BUFF, bytes < > > UFF wr> <1171> Read>> <1171> PTA: <1> <1171> Re. > > Dec 15 17:34:51 cmos-17 kernel: <7 <1> <1171> Read complete <5507> of > > <4096> byte <1> <1171> RECV BUFF, bytes <4096> > > Dec 15 17:34:51 cmos-17 kernel: bytes. > > Dec 15 17:34:51 cmos-17 kernel: <_sdp DATA: <1> <4096> bytes. > > Dec 15 17:34:51 cmos-17 kernel: <7 <40, bytes <4096> > > Dec 15 17:34:51 cmos-17 kernel: 63> of <40<4096> > > Dec 15 17:34:51 cmos-17 kernel: <7te <6164> of <4096> bytes. > > Dec 15 17:34:51 cmos-17 kernel: 5> of <4096> bytes. > > Dec 15 17:34:51 cmos-17 kernel: <74096> > > Dec 15 17:34:51 cmos-17 kernel: <7d <6450> of <409e <6396> of <4096> > > bytes. > > > > Are you saying you see these when SDP is build with debug disabled? 
No, DSP debug is enabled but it doesn't start sending those messages as wall unless you get to 1.49GB speed, it also seems to stop at that speed though I thought infiniband can get to higher bw. Also as you can see the debuging info is very messed up so it's hard to tell what was going on. > -- > MST > Ely From mst at mellanox.co.il Mon Dec 19 06:12:10 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 19 Dec 2005 16:12:10 +0200 Subject: [openib-general] Re: Problems with dsp In-Reply-To: References: Message-ID: <20051219141210.GD1677@mellanox.co.il> Quoting Ely Levy : > > > <4096> byte <1> <1171> RECV BUFF, bytes <4096> > > > Dec 15 17:34:51 cmos-17 kernel: bytes. > > > Dec 15 17:34:51 cmos-17 kernel: <_sdp DATA: <1> <4096> bytes. > > > Dec 15 17:34:51 cmos-17 kernel: <7 <40, bytes <4096> > > > Dec 15 17:34:51 cmos-17 kernel: 63> of <40<4096> > > > Dec 15 17:34:51 cmos-17 kernel: <7te <6164> of <4096> bytes. > > > Dec 15 17:34:51 cmos-17 kernel: 5> of <4096> bytes. > > > Dec 15 17:34:51 cmos-17 kernel: <74096> > > > Dec 15 17:34:51 cmos-17 kernel: <7d <6450> of <409e <6396> of <4096> > > > bytes. > > > > > > > Are you saying you see these when SDP is build with debug disabled? > > No, DSP SDP? > debug is enabled but it doesn't start sending those messages as > wall unless you get to 1.49GB speed, Weird. > it also seems to stop at that speed > though I thought infiniband can get to higher bw. It might be a good idea to disable data path debug at compile time if you are benchmarking speed. > Also as you can see the debuging info is very messed up so it's hard to > tell what was going on. Miltiple sockets running in parallel? -- MST From mst at mellanox.co.il Mon Dec 19 06:44:21 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 19 Dec 2005 16:44:21 +0200 Subject: [openib-general] [PATCH applied] return -ENOPROTOOPT on an unsupported socket option Message-ID: <20051219144421.GG1677@mellanox.co.il> Make more ltp tests pass: return -ENOPROTOOPT on an unsupported socket option. Signed-off-by: Michael S. Tsirkin Index: linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_inet.c =================================================================== --- linux-2.6.14.orig/drivers/infiniband/ulp/sdp/sdp_inet.c 2005-12-19 18:30:09.000000000 +0200 +++ linux-2.6.14/drivers/infiniband/ulp/sdp/sdp_inet.c 2005-12-19 19:22:39.000000000 +0200 @@ -1094,6 +1094,7 @@ static int sdp_inet_setopt(struct socket default: sdp_warn("SETSOCKOPT unimplemented option <%d:%d> conn <%d>.", level, optname, conn->hashent); + result = -ENOPROTOOPT; break; } -- MST From mst at mellanox.co.il Mon Dec 19 07:20:56 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 19 Dec 2005 17:20:56 +0200 Subject: [openib-general] [PATCH applied] libsdp.conf: documentation update Message-ID: <20051219152056.GA2232@mellanox.co.il> SIMPLE_LIBSDP must be set to a non-zero string. Document that. Full path in LD_PRELOAD is not a good idea for mixed 64/32 bit environments. Adding the path to LD_LIBRARY_PATH is a better idea. Document that. Signed-off-by: Michael S. Tsirkin Index: openib/src/userspace/libsdp/libsdp.conf =================================================================== --- openib/src/userspace/libsdp/libsdp.conf (revision 4466) +++ openib/src/userspace/libsdp/libsdp.conf (working copy) @@ -66,17 +66,26 @@ # match_both program netserver # match_both program sshd # -# One more function of libsdp.so, is using a simple libsdp by setting the -# envrinoment variable SIMPLE_LIBSDP. 
This definition is significantly -# simpler. It has no configuration and converts all calls to socket(2) -# with a family of AF_INET and a type of SOCK_STREAM into family of +# One more function of libsdp.so, is using a simple libsdp by setting the +# envrinoment variable SIMPLE_LIBSDP to a non-empty value. This definition is +# significantly simpler. It has no configuration and converts all calls to +# socket(2) with a family of AF_INET and a type of SOCK_STREAM into family of # AF_INET_SDP. # # libsdp.so isn't setup automatically. it can # be used in one of 2 ways: # -# 1) LD_PRELOAD environment variable. Setting this to the full path of the +# 1) LD_PRELOAD environment variable. Setting this to the name of the # library you want to use will cause it to be preloaded. -# 2) Adding the full path of the library into /etc/ld.so.preload. This will +# 2) Adding the name of the library into /etc/ld.so.preload. This will # cause the library to be preloaded for every executable that is linked # with libc. +# +# The library should be installed in a directory in which the dynamic loader +# searches for shared libraries (as specified by LD_LIBRARY_PATH, +# /etc/ld.so.conf, etc). +# Alternatively, you can specify the full path to the library that +# you want to use in LD_PRELOAD or /etc/ld.so.preload as described above. +# +# The last option cant be used if you have multiple library versions +# (e.g. 64/32 bit) and want the linker to select between them automatically. -- MST From halr at voltaire.com Mon Dec 19 07:26:45 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Dec 2005 10:26:45 -0500 Subject: [openib-general] Separating SA and SM Keys Message-ID: <1135006004.4328.35130.camel@hal.voltaire.com> Hi, In order to support separate SA and SM keys and make this clearer, I propose to change ib_types.h as follows. -- Hal Index: ib_types.h =================================================================== --- ib_types.h (revision 4540) +++ ib_types.h (working copy) @@ -3618,7 +3618,7 @@ typedef struct _ib_sa_mad ib_net32_t seg_num; ib_net32_t paylen_newwin; - ib_net64_t sm_key; + ib_net64_t sa_key; ib_net16_t attr_offset; ib_net16_t resv3; From rdreier at cisco.com Mon Dec 12 09:30:46 2005 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 12 Dec 2005 09:30:46 -0800 Subject: [openib-general] Re: [Openib-promoters] Next workshop dates? Ideas for agenda??? In-Reply-To: <6.2.5.6.0.20051211171549.0203dd28@lanl.gov> (Steve Poole's message of "Sun, 11 Dec 2005 17:16:37 -0700") References: <6AB138A2AB8C8E4A98B9C0C3D52670E366C44A@mtlexch01.mtl.com> <6.2.3.4.2.20051211160126.03bc3878@mail-lc.llnl.gov> <6.2.5.6.0.20051211171549.0203dd28@lanl.gov> Message-ID: Steve> As long as they merge with the rest of the requirements for Steve> OpenIB, this is great. We will not have several different Steve> versions of OpenIB. Why do you say that? Obviously the whole point is that their requirements are different and don't merge with the requirements we have from other areas. If their requirements were already being met then they wouldn't need to participate. The idea is that we get input from as many different markets as we can so that OpenIB is as broadly applicable as possible. - R. From head.bubba at csfb.com Mon Dec 12 11:48:06 2005 From: head.bubba at csfb.com (Head Bubba) Date: Mon, 12 Dec 2005 19:48:06 -0000 Subject: [Openib-promoters] Re: [openib-general] Next workshopdates? I deas for agenda??? 
Message-ID: I agree with Woody -----Original Message----- From: Bob Woodruff [mailto:robert.j.woodruff at intel.com] Sent: Monday, December 12, 2005 2:37 PM To: 'Bill Boas'; Roland Dreier Cc: Henry Brandt; Peter Haas; Head Bubba; Peter Krey (JP Morgan) (E-mail); openib-general at openib.org; Tim Lyons (Morgan Stanley) (E-mail); openib-promoters at openib.org Subject: RE: [Openib-promoters] Re: [openib-general] Next workshopdates? Ideas for agenda??? Roland wrote, >I'm not sure I see the point in dragging everyone together in early >February. With the holidays coming, realistically we only have maybe >5 weeks to prepare a conference agenda, and I don't see that as being >enough time to set up a productive meeting. Another possibility would be to delay the workshop till early March and have it the day before IDF, as we did last fall. Thoughts ? woody From Thad at Mellanox.com Mon Dec 12 17:33:07 2005 From: Thad at Mellanox.com (Thad Omura) Date: Mon, 12 Dec 2005 17:33:07 -0800 Subject: [Openib-promoters] Re: [openib-general] Next workshop dates? Ideas for agenda??? Message-ID: <25AE7F432672D511B8DC00B0D0DF11DA060BA4C5@MTIEX01> I agree that we should have the event in early FEB that most people can attend (week of the 6th is OK, although it stinks for Superbowl fans...:) ). I see significant updates, even from our last event at the end of AUG, on all fronts: Windows (MemFree, SDP, etc.), Linux (Updates from distros, stack updates), Storage (SRP, iSER/NFSoRDMA over OpenIB directly), Virtualization, MPI, RDS, discussions about QoS, management, etc. Comments from the last event that many wanted to attend many sessions that were concurrent as we had a windows/linux/applications session all running together. The agenda will have to allow flexibility so folks can attend both while not taking too much of people's time. Maybe run the Windows and Linux together one day and applications on another day because it seemed everyone wanted to get there. It will be up to the organizers and presenters to propose an agenda that is "fresh" and reduce the number of repeat slides from previous conferences. I do see the developer community growing, and as a result, some first time attendees will be seeing some info for the first time so a minimal amount of overlap for continuity isn't going to kill anyone. Regards, THAD ======================================================= Thad Omura - Vice President of Product Marketing thad at mellanox.com Mellanox Technologies, 2900 Stender Way, Santa Clara CA 95054 Work: 408-916-0020 Mobile: 408-750-6236 Skype: tomura74 -----Original Message----- From: Bill Boas [mailto:bboas at llnl.gov] Sent: Monday, December 12, 2005 9:55 AM To: Roland Dreier Cc: Henry Brandt; Peter Haas; Head Bubba; Tziporet Koren; Peter Krey (JP Morgan) (E-mail); openib-general at openib.org; Tim Lyons (Morgan Stanley) (E-mail); openib-promoters at openib.org Subject: Re: [Openib-promoters] Re: [openib-general] Next workshop dates? Ideas for agenda??? Roland, These are all excellent perspectives, I hope others will respond with their view points.
Certainly repeating what we have heard already is not a good use of anyone's time or money but I'm under the impression that we will have made some progress toward what we want to work on next as a result of PathForward Phase 2, input from Tom Tucker and others on OpenIB iWARP integration and the HSIR meeting in NYC tomorrow. With respect to "release" of OpenIB rel 1.0, did Doug Ledford effectively do that a week or two ago? I think those of us ( including me) who originally thought OpenIB was actually going be an organization that released and supported code (like RedHat, say) had got it wrong. Now I believe that when a Linux distribution, an IB company or a Tier One OEM decides that is a version of the code that they will support, then that is a "release". OpenIB may be best utilized to try to achieve some consistency in timeframe and content amongst those who wish to "release and support" the code??? Bill. At 09:39 AM 12/12/2005, Roland Dreier wrote: > Bill> I think, subject to others input, it'll be focused on > Bill> wrapping up rel 1.0 of OpenIB, discussing what the > Bill> developers are going to focus on next and validating the > Bill> strategy for RDMA over Ethernet integration at the verbs > Bill> level to lay the foundation for one, consistent RDMA > Bill> structure in Linux, if possible. > >I'm not sure I see the point in dragging everyone together in early >February. With the holidays coming, realistically we only have maybe >5 weeks to prepare a conference agenda, and I don't see that as being >enough time to set up a productive meeting. > >In particular: > > * wrapping up rel 1.0 -- the release process for a "1.0" release has > not even started. About all we could hope to accomplish would be > to pick a release manager and tell that person to go start driving > a release, and I don't see that as a good use of face-to-face > time. It would be much better to pick someone to drive the release > and then give the release manager time to start putting the release > together before getting together, so that we have some idea of what > the real issues that need to be hashed out in person are. > > * iWARP integration -- again, not enough discussion has taken place > in advance. Until the community has a chance to really study the > proposed changes and figure out what the real difficult issues that > need to be sorted out in person are, again it's a waste of time to > meet in person. > > * discuss developers next steps -- perhaps I'm pessimistic but I > think we'll just get the same talks we've already seen twice before > at Sonoma and IDF. > >Sonoma is a short trip for me but given the number of people that will >have to come from the East coast and Israel, I think we should think >hard about whether this conference is the best use of our time. > > - R. >_______________________________________________ >openib-promoters mailing list >openib-promoters at openib.org >http://openib.org/mailman/listinfo/openib-promoters Bill Boas bboas at llnl.gov ICCD LLNL, B-453, R-2018 Wk: 925-422-4110 7000 East Ave, L-555 Cell: 925-337-2224 Livermore, CA 94551 Pgr: 877-203-2248 _______________________________________________ openib-promoters mailing list openib-promoters at openib.org http://openib.org/mailman/listinfo/openib-promoters -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From mshefty at ichips.intel.com Mon Dec 19 10:05:51 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 19 Dec 2005 10:05:51 -0800
Subject: [openib-general] Re: dev_remove in the CMA
In-Reply-To: <1134515244.3764.9.camel@trinity.austin.ammasso.com>
References: <1134515244.3764.9.camel@trinity.austin.ammasso.com>
Message-ID: <43A6F67F.4060906@ichips.intel.com>

Tom Tucker wrote:
> I don't understand the dev_remove usage in the rdma_cm_id. It looks to
> me like if the user calls rdma_resolve_addr, but never calls
> rdma_resolve_route, the device can never be removed. Is this the
> intended behavior?

Once a rdma_cm_id has been bound to a device, the user must destroy that
rdma_cm_id on device removal before the device removal can proceed.

> Is the goal to prevent the user from removing the device if the client
> is in a callback? If so, can't we just increment and decrement in the
> cma_notify_user function? I guess I just don't understand...

Users above the CMA obtain their device pointer from the CMA. The device
pointer must be valid outside of CMA callbacks, so device removal is
delayed until the user releases all resources associated with a device.
Destruction of the rdma_cm_id indicates that the user is no longer using
that device.

- Sean

From mshefty at ichips.intel.com Mon Dec 19 11:32:55 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 19 Dec 2005 11:32:55 -0800
Subject: [openib-general] [RFC] IB_AT_MOST
In-Reply-To: <000201c60281$4cf5c610$6401a8c0@infiniconsys.com>
References: <000201c60281$4cf5c610$6401a8c0@infiniconsys.com>
Message-ID: <43A70AE7.6080604@ichips.intel.com>

Fab Tillier wrote:
> I don't understand the IB_AT_MOST macro. If someone uses IB_AT_MOST( 1 ) and
> the hardware supports 4, they will get 4, which is definitely not "at most 1".
>
> I would rename it to IB_MAX, and define it as -1 or something like that.

I agree with Fab. What the user wants is the maximum that they can get,
and IB_MAX conveys this better than IB_AT_MOST.

- Sean
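A minimal sketch of what the rename being discussed could look like (the
macro name, its value, and the clamping helper are assumptions drawn from
this thread, not code from the tree):

#include <stdint.h>

/* sentinel meaning "whatever the device supports" */
#define IB_MAX ((uint32_t) -1)

/* the layer that owns the device attributes clamps the request, so a
 * plain number is honored as-is and IB_MAX becomes the device maximum */
static uint32_t resolve_limit(uint32_t requested, uint32_t device_max)
{
	return requested == IB_MAX ? device_max : requested;
}

With a sentinel like this, IB_MAX(1)-style surprises cannot happen: a
caller either asks for a specific value or explicitly asks for the
maximum.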
From sean.hefty at intel.com Mon Dec 19 12:07:23 2005
From: sean.hefty at intel.com (Sean Hefty)
Date: Mon, 19 Dec 2005 12:07:23 -0800
Subject: [openib-general] [PATCH] [CMA] support for SDP + standard protocol
In-Reply-To:
Message-ID:

>Wouldn't it make sense then, to also modify the SDP spec?

You could create a new version of the SDP headers that would match the
CMA, but I don't think that changing an existing, published version of
the headers is really an option.

- Sean

From sean.hefty at intel.com Mon Dec 19 12:18:16 2005
From: sean.hefty at intel.com (Sean Hefty)
Date: Mon, 19 Dec 2005 12:18:16 -0800
Subject: [openib-general] Re: [PATCH] sdp: replace ip_dev_find withdev_base scan (was Re: ip_dev_find resolution?)
In-Reply-To: <20051213223622.GA7173@mellanox.co.il>
Message-ID:

>> A higher level ULP could map 127.0.0.1 to a specific IP address before
>> calling the CMA, but I'm not sure that's any better.
>
>Ugh. I really would like to hide all the IPv4/IPv6 etc from ULPs.

Not sure that I understand this comment. The ULP needs to provide some
sort of address.

>> From the CMA's perspective, 127.0.0.1 could just as easily
>> map to an iWarp device as an Infiniband device.
>
>Which device to select is a difficult problem.
>I think we might be able to just punt on this for now, selecting
>an arbitrary device of an appropriate type that happens to be up.

Long term, we may be able to add some quality of service values that
help select devices that map to the same address.

>By the way, CMA seems to happily take bits out of the hardware address
>and assume that these include the gid, pkey, etc.
>Shouldn't it check the device type before doing this?

The validation comes later in the CMA, and is needed there due to hotplug
support, even if an earlier check is done. So, ib_addr assumes that the
address maps to an IB device and reads out of the hardware address. The
CMA then does a lookup against known IB devices based on the returned
values.

- Sean

From jlentini at netapp.com Mon Dec 19 12:35:57 2005
From: jlentini at netapp.com (James Lentini)
Date: Mon, 19 Dec 2005 15:35:57 -0500 (EST)
Subject: [openib-general] Re: [kDAPL]questions about the LMR creation of different types of memory
In-Reply-To: <7b2fa1820512170043y7ae0e0ccrc577733b708b6399@mail.gmail.com>
References: <7b2fa1820512080626kf4c9c23hdc3f416dcb970f6d@mail.gmail.com>
	<7b2fa1820512081742j7ef50a27kc2322cbf0e52d908@mail.gmail.com>
	<20051216180338.GC8493@esmail.cup.hp.com>
	<7b2fa1820512170043y7ae0e0ccrc577733b708b6399@mail.gmail.com>
Message-ID:

ian> As you mentioned, ULPs in OpenIB (e.g. SDP or IPoIB) are
ian> responsible for properly mapping and unmapping for DMA use.
ian> AFAIK, SDP is implemented with the IB native verbs. What about
ian> the kDAPL? In my opinion the kDAPL does not do the mapping and
ian> unmapping work. So it is the responsibility of the kernel
ian> applications using the kDAPL. Am I right?

Per the spec, it depends on the flag you pass dat_lmr_kcreate. If you
register using the DAT_MEM_TYPE_PHYSICAL flag, the memory addresses
should be physical addresses. If you register using DAT_MEM_TYPE_IA, the
addresses should be DMA (aka I/O, aka bus) addresses. Currently the code
is assuming that physical addresses are valid DMA addresses and not
performing a translation.

From ralphc at pathscale.com Mon Dec 19 12:50:27 2005
From: ralphc at pathscale.com (Ralph Campbell)
Date: Mon, 19 Dec 2005 12:50:27 -0800
Subject: [openib-general] Re: [PATCH 10/13] [RFC] ipath verbs, part 1
In-Reply-To: <20051218195922.GC31184@us.ibm.com>
References: <200512161548.zxp6FKcabEu47EnS@cisco.com>
	<200512161548.W9sJn4CLmdhnSTcH@cisco.com>
	<20051218195922.GC31184@us.ibm.com>
Message-ID: <1135025427.6397.21.camel@brick.internal.keyresearch.com>

The quick answer is the qp_list is traversed w/o the lock held in
ipath_ib_rcv(). The intent is to be able to do a lookup on the GID to
get a reference to the struct ipath_mcast and then walk the qp_list w/o
locks being held while processing the received packets at interrupt
level.

On Sun, 2005-12-18 at 11:59 -0800, Paul E. McKenney wrote:
> On Fri, Dec 16, 2005 at 03:48:55PM -0800, Roland Dreier wrote:
> > First half of ipath verbs driver
>
> Some RCU-related questions interspersed. Basic question is "where is
> the lock-free read-side traversal?"
>
> Thanx, Paul
>
> > ---
> >
> > drivers/infiniband/hw/ipath/ipath_verbs.c | 3244 +++++++++++++++++++++++++++++
> > 1 files changed, 3244 insertions(+), 0 deletions(-)
> > create mode 100644 drivers/infiniband/hw/ipath/ipath_verbs.c

...

> > +/*
> > + * Insert the multicast GID into the table and
> > + * attach the QP structure.
> > + * Return zero if both were added.
> > + * Return EEXIST if the GID was already in the table but the QP was added.
> > + * Return ESRCH if the QP was already attached and neither structure was added.
> > + */ > > +static int ipath_mcast_add(struct ipath_mcast *mcast, > > + struct ipath_mcast_qp *mqp) > > +{ > > + struct rb_node **n = &mcast_tree.rb_node; > > + struct rb_node *pn = NULL; > > + unsigned long flags; > > + > > + spin_lock_irqsave(&mcast_lock, flags); > > + > > + while (*n) { > > + struct ipath_mcast *tmcast; > > + struct ipath_mcast_qp *p; > > + int ret; > > + > > + pn = *n; > > + tmcast = rb_entry(pn, struct ipath_mcast, rb_node); > > + > > + ret = memcmp(mcast->mgid.raw, tmcast->mgid.raw, > > + sizeof(union ib_gid)); > > + if (ret < 0) { > > + n = &pn->rb_left; > > + continue; > > + } > > + if (ret > 0) { > > + n = &pn->rb_right; > > + continue; > > + } > > + > > + /* Search the QP list to see if this is already there. */ > > + list_for_each_entry_rcu(p, &tmcast->qp_list, list) { > > Given that we hold the global mcast_lock, how is RCU helping here? Its not really. I'm just trying to be consistent where ever the qp_list is traversed. > Is there a lock-free read-side traversal path somewhere that I am > missing? The lock free traversal is in ipath_ib_rcv() which is an interrupt routine. > > + if (p->qp == mqp->qp) { > > + spin_unlock_irqrestore(&mcast_lock, flags); > > + return ESRCH; > > + } > > + } > > + list_add_tail_rcu(&mqp->list, &tmcast->qp_list); > > Ditto... > > > + spin_unlock_irqrestore(&mcast_lock, flags); > > + return EEXIST; > > + } > > + > > + list_add_tail_rcu(&mqp->list, &mcast->qp_list); > > Ditto... > > > + spin_unlock_irqrestore(&mcast_lock, flags); > > + > > + atomic_inc(&mcast->refcount); > > + rb_link_node(&mcast->rb_node, pn, n); > > + rb_insert_color(&mcast->rb_node, &mcast_tree); > > + > > + spin_unlock_irqrestore(&mcast_lock, flags); > > + > > + return 0; > > +} -- Ralph Campbell From mst at mellanox.co.il Mon Dec 19 13:05:19 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 19 Dec 2005 23:05:19 +0200 Subject: [openib-general] Re: Re: [PATCH] sdp: replace ip_dev_find withdev_base scan (was Re: ip_dev_find resolution?) In-Reply-To: References: Message-ID: <20051219210518.GA2694@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: Re: [PATCH] sdp: replace ip_dev_find withdev_base scan (was Re: ip_dev_find resolution?) > > >> A higher level ULP could map 127.0.0.1 to a specific IP address > before > >> calling the CMA, but I'm not sure that's any better. > > > >Ugh. I really would like to hide all the IPv4/IPv6 etc from ULPs. > > Not sure that I understand this comment. The ULP needs to provide some > sort of > address. Right but I hoped we could just copy sockaddr from the user to CMA, aboit looking at it in the ULP. > >> From the CMA's perspective, 127.0.0.1 could just as easily > >> map to an iWarp device as an Infiniband device. > > > >Which device to select is a difficult problem. > >I think we might be able to just punt on this for now, selecting > >an arbitrary device of an appropriate type that happens to be up. > > Long term, we may be able to add some quality of service values that > help select > devices that map to the same address. Right. So lets map 127.0.0.1 to an arbitrary local device for now? -- MST From mst at mellanox.co.il Mon Dec 19 13:42:29 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 19 Dec 2005 23:42:29 +0200 Subject: [openib-general] Re: Re: [PATCH] sdp: replace ip_dev_find withdev_base scan (was Re: ip_dev_find resolution?) 
In-Reply-To: <20051219210518.GA2694@mellanox.co.il> References: <20051219210518.GA2694@mellanox.co.il> Message-ID: <20051219214229.GB2694@mellanox.co.il> Quoting Michael S. Tsirkin : > aboit looking at it in the ULP. without looking at it in the ULP. -- MST From mst at mellanox.co.il Mon Dec 19 13:52:54 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 19 Dec 2005 23:52:54 +0200 Subject: [openib-general] outstanding patches Message-ID: <20051219215254.GC2694@mellanox.co.il> Hello, Roland! Please note that there are currently multiple outstanding patches from Mellanox, that have been posted but haven't been reviewed yet. Most of them are small, I have collected them here: https://openib.org/svn/trunk/contrib/mellanox/patches I would be especially interested to get feedback on several ipoib patches, which have been outstanding for a while now. Thanks, -- MST From thomas.duffy.99 at alumni.brown.edu Mon Dec 19 14:50:57 2005 From: thomas.duffy.99 at alumni.brown.edu (Tom Duffy) Date: Mon, 19 Dec 2005 14:50:57 -0800 Subject: [openib-general] Re: [PATCH applied] return -ENOPROTOOPT on an unsupported socket option In-Reply-To: <20051219144421.GG1677@mellanox.co.il> References: <20051219144421.GG1677@mellanox.co.il> Message-ID: <33038B33-7572-463C-B307-B5114E3243A0@alumni.brown.edu> On Dec 19, 2005, at 6:44 AM, Michael S. Tsirkin wrote: > > Make more ltp tests pass: return -ENOPROTOOPT on an unsupported > socket option. This one is a bit controversial. We had the discussion in the past about doing this, but the problem is that some applications won't run if a particular sockopt is not supported on SDP. Some socket opts don't make much sense on SDP/Infiniband, others work as intended. -tduffy From mst at mellanox.co.il Mon Dec 19 15:05:10 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 20 Dec 2005 01:05:10 +0200 Subject: [openib-general] Re: [PATCH applied] return -ENOPROTOOPT on an unsupported socket option In-Reply-To: <33038B33-7572-463C-B307-B5114E3243A0@alumni.brown.edu> References: <33038B33-7572-463C-B307-B5114E3243A0@alumni.brown.edu> Message-ID: <20051219230510.GD2694@mellanox.co.il> Quoting Tom Duffy : > > > > Make more ltp tests pass: return -ENOPROTOOPT on an unsupported > > socket option. > > This one is a bit controversial. We had the discussion in the past > about doing this, but the problem is that some applications won't run > if a particular sockopt is not supported on SDP. Some socket opts > don't make much sense on SDP/Infiniband, others work as intended. Hmm. Which option do you have in mind, specifically? The right thing, in my eyes, is to emulate a TCP socket. So we dont want to support options that TCP doesnt support. -- MST From mshefty at ichips.intel.com Mon Dec 19 15:24:51 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 19 Dec 2005 15:24:51 -0800 Subject: [openib-general] Re: [PATCH] sdp: replace ip_dev_find withdev_base scan (was Re: ip_dev_find resolution?) In-Reply-To: <20051219210518.GA2694@mellanox.co.il> References: <20051219210518.GA2694@mellanox.co.il> Message-ID: <43A74143.30106@ichips.intel.com> Michael S. Tsirkin wrote: > Right. So lets map 127.0.0.1 to an arbitrary local device for now? Sounds good for now. - Sean From mst at mellanox.co.il Mon Dec 19 15:46:57 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 20 Dec 2005 01:46:57 +0200 Subject: [openib-general] Re: [PATCH] sdp: replace ip_dev_find withdev_base scan (was Re: ip_dev_find resolution?) 
In-Reply-To: <43A74143.30106@ichips.intel.com> References: <43A74143.30106@ichips.intel.com> Message-ID: <20051219234657.GE2694@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [PATCH] sdp: replace ip_dev_find withdev_base scan (was Re: ip_dev_find resolution?) > > Michael S. Tsirkin wrote: > > Right. So lets map 127.0.0.1 to an arbitrary local device for now? > > Sounds good for now. OK, thats what the patch did. Something along the following lines would be a good fit for cma, wouldnt it: > 1. Get rid of ip_dev_find. > 2. Add support for local addresses such as 127.0.0.1, > resolving them to an arbitrary rdma device, while we are at it. > > Signed-off-by: Michael S. Tsirkin +static int tryaddrmatch(struct net_device *dev, u32 s_addr, u32 d_addr) +{ + struct in_ifaddr **ifap; + struct in_ifaddr *ifa; + struct in_device *in_dev; + int rc = -ENETUNREACH; + __be32 addr; + + if (dev->type != ARPHRD_INFINIBAND) + return rc; + + in_dev = in_dev_get(dev); + if (!in_dev) + return rc; + + addr = (ZERONET(s_addr) || LOOPBACK(s_addr)) ? d_addr : s_addr; + + /* Hack to enable using SDP on addresses such as 127.0.0.1 */ + if (ZERONET(addr) || LOOPBACK(addr)) { + rc = (dev->flags & IFF_UP) ? 0 : -ENETUNREACH; + goto done; + } + + for (ifap = &in_dev->ifa_list; (ifa = *ifap); ifap = &ifa->ifa_next) { + if (s_addr == ifa->ifa_address) { + rc = 0; + break; /* found */ + } + } + +done: + in_dev_put(in_dev); + return rc; +} + And, after resolving the route: if (dev->flags & IFF_LOOPBACK) { dev_put(dev); read_lock(&dev_base_lock); + for (dev = dev_base; dev; dev = dev->next) + if (!tryaddrmatch(dev, rt->rt_src, rt->rt_dst)) { + dev_hold(dev); + break; + } read_unlock(&dev_base_lock); } -- MST From rjwalsh at pathscale.com Mon Dec 19 16:32:55 2005 From: rjwalsh at pathscale.com (Robert Walsh) Date: Mon, 19 Dec 2005 16:32:55 -0800 Subject: [openib-general] Re: [PATCH 13/13] [RFC] ipath Kconfig and Makefile In-Reply-To: <20051218192356.GB9145@mars.ravnborg.org> References: <200512161548.MdcxE8ZQTy1yj4v1@cisco.com> <200512161548.lokgvLraSGi0enUH@cisco.com> <20051218192356.GB9145@mars.ravnborg.org> Message-ID: <1135038775.7306.18.camel@hematite.internal.keyresearch.com> > > @@ -0,0 +1,15 @@ > > +EXTRA_CFLAGS += -Idrivers/infiniband/include > If this is needed then some header files should be moved to include/rdma Actually, this is done by other IB drivers, too, so I assumed it was OK. Roland, have you any comments on this? > > + > > +ipath_core-objs := ipath_copy.o ipath_driver.o \ > > + ipath_dwordcpy.o ipath_ht400.o ipath_i2c.o ipath_layer.o \ > > + ipath_lib.o ipath_mlock.o > > + > > +ib_ipath-objs := ipath_mad.o ipath_verbs.o > > Please use: > ipath_core-y := ... > ib_ipath-y := ... > > Use of -y let you do better Kconfig selection in the makefile, and is > preferred compared to -objs No problem. Regards, Robert. -- Robert Walsh Email: rjwalsh at pathscale.com PathScale, Inc. Phone: +1 650 934 8117 2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969 Mountain View, CA 94043 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 481 bytes Desc: This is a digitally signed message part URL: From ralphc at pathscale.com Mon Dec 19 16:38:52 2005 From: ralphc at pathscale.com (Ralph Campbell) Date: Mon, 19 Dec 2005 16:38:52 -0800 Subject: [openib-general] mthca calls ib_register_mad_agent() and implements ib_device.process_mad()? 
Message-ID: <1135039132.6397.52.camel@brick.internal.keyresearch.com>

Can someone explain why the mthca driver calls ib_register_mad_agent()
and implements ib_device.process_mad()?

It looks like the latter does the actual processing of MAD packets for
the SMA and PMA whereas the former doesn't seem to do anything except
cause the ib_mad module to be modprobe'd.

I understand the need to have ib_mad loaded before ib_mthca since the
call to ib_register_device() will cause ib_mad to create QP 0 & 1.

Normally, it looks like ib_register_mad_agent() is used to tell the
ib_mad module to send MADs to agents like the SM, CM, etc.

-- Ralph Campbell

From mshefty at ichips.intel.com Mon Dec 19 17:09:38 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 19 Dec 2005 17:09:38 -0800
Subject: [openib-general] RFC MPI and app. requirements of OpenIB
Message-ID: <43A759D2.6040101@ichips.intel.com>

I'm soliciting feedback from the MPI and other application developers
regarding which OpenIB APIs they will be targeting with their
implementations. Specifically, myself and some of the other IB
developers are interested in knowing if userspace applications will be
written to the RDMA CMA interface, the IB CM API, or some other
abstraction. This will let us focus our priorities on which features to
expose to userspace.

- Sean

From rjwalsh at pathscale.com Mon Dec 19 17:43:17 2005
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Mon, 19 Dec 2005 17:43:17 -0800
Subject: [openib-general] Re: [PATCH 01/13] [RFC] ipath basic headers
In-Reply-To: <20051217123827.32f119da.akpm@osdl.org>
References: <200512161548.jRuyTS0HPMLd7V81@cisco.com>
	<200512161548.aLjaDpGm5aqk0k0p@cisco.com>
	<20051217123827.32f119da.akpm@osdl.org>
Message-ID: <1135042997.7306.26.camel@hematite.internal.keyresearch.com>

> > +#ifdef IPATH_COSIM
> > +extern __u32 sim_readl(const volatile void __iomem * addr);
> > +extern __u64 sim_readq(const volatile void __iomem * addr);
>
> The driver has a strange mixture of int32_t, s32 and __s32. s32 is
> preferred.

The cosim stuff has been nuked, as it was old code anyway. With those
functions gone, we now use int32_t (and related 8-, 16-, 32- and 64-bit
signed and unsigned versions) consistently throughout the code. We'd
prefer to keep it that way, instead of changing over to s32 and friends,
as some of our header files are used by userland programs. Unless we put
in magic typedefs for s32 and friends in userland, that won't work,
hence we use the C standard typedefs.

Is this a problem?

Regards,
Robert.

--
Robert Walsh
Email: rjwalsh at pathscale.com      PathScale, Inc.
Phone: +1 650 934 8117             2071 Stierlin Court, Suite 200
Fax: +1 650 428 1969               Mountain View, CA 94043
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 481 bytes
Desc: This is a digitally signed message part
URL:
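Robert's constraint can be shown with a small sketch (the struct below is
hypothetical, not from the ipath driver): a header compiled by both the
kernel build and userland tools can rely on the <stdint.h> names
everywhere, while s32 and friends would need extra typedefs on the
userland side.

/* hypothetical header shared between driver and userland tools */
#include <stdint.h>

struct example_port_stats {
	uint64_t rcv_bytes;	/* fixed width on 32- and 64-bit ABIs alike */
	int32_t  unit;		/* s32 here would not compile in userland
				 * unless a kernel-style typedef is supplied */
};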
From halr at voltaire.com Mon Dec 19 17:40:31 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 19 Dec 2005 20:40:31 -0500
Subject: [openib-general] mthca calls ib_register_mad_agent() and implements ib_device.process_mad()?
In-Reply-To: <1135039132.6397.52.camel@brick.internal.keyresearch.com>
References: <1135039132.6397.52.camel@brick.internal.keyresearch.com>
Message-ID: <1135042830.4328.41828.camel@hal.voltaire.com>

On Mon, 2005-12-19 at 19:38, Ralph Campbell wrote:
> Can someone explain why the mthca driver calls
> ib_register_mad_agent() and implements ib_device.process_mad()?

This is because the mthca has the agents in firmware and the driver wants
to see each MAD and give the firmware the right of first refusal on each
received MAD. There was a long thread on this in the early days of
OpenIB.

> It looks like the latter does the actual processing of MAD packets
> for the SMA and PMA

and any other agents that might be implemented in firmware. Right now it
is just SMA and PMA.

> whereas the former doesn't seem to do anything
> except cause the ib_mad module to be modprobe'd.

It also registers a send completion handler for MADs sent. Yes, it would
also have the effect of pulling in ib_mad.

> I understand the need to have ib_mad loaded before ib_mthca
> since the call to ib_register_device() will cause ib_mad
> to create QP 0 & 1.
>
> Normally, it looks like ib_register_mad_agent() is used to
> tell the ib_mad module to send MADs to agents like the
> SM, CM, etc.

If what you mean is demultiplex receive MADs, and also entities such as
SM, SMA, PMA, CM, etc., then yes (as well as handle the completions from
sending MADs which have some resources to clean up).

From ib_mad.h:

/**
 * ib_register_mad_agent - Register to send/receive MADs.
 * @device: The device to register with.
 * @port_num: The port on the specified device to use.
 * @qp_type: Specifies which QP to access. Must be either
 *   IB_QPT_SMI or IB_QPT_GSI.
 * @mad_reg_req: Specifies which unsolicited MADs should be received
 *   by the caller. This parameter may be NULL if the caller only
 *   wishes to receive solicited responses.
 * @rmpp_version: If set, indicates that the client will send
 *   and receive MADs that contain the RMPP header for the given version.
 *   If set to 0, indicates that RMPP is not used by this client.
 * @send_handler: The completion callback routine invoked after a send
 *   request has completed.
 * @recv_handler: The completion callback routine invoked for a received
 *   MAD.
 * @context: User specified context associated with the registration.
 */

-- Hal
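Pieced together from the comment above, a registration might look like
the sketch below (the management class, method bit, and handler bodies
are illustrative, not taken from mthca):

#include <rdma/ib_mad.h>

static void example_send_handler(struct ib_mad_agent *agent,
                                 struct ib_mad_send_wc *send_wc)
{
	/* clean up resources tied to the completed send */
}

static void example_recv_handler(struct ib_mad_agent *agent,
                                 struct ib_mad_recv_wc *recv_wc)
{
	/* consume the MAD, then release it back to the MAD layer */
	ib_free_recv_mad(recv_wc);
}

static struct ib_mad_agent *example_register(struct ib_device *device,
                                             u8 port)
{
	struct ib_mad_reg_req req = {
		.mgmt_class         = IB_MGMT_CLASS_PERF_MGMT,
		.mgmt_class_version = 1,
	};

	/* ask to receive unsolicited Get requests of this class */
	set_bit(IB_MGMT_METHOD_GET, req.method_mask);

	return ib_register_mad_agent(device, port, IB_QPT_GSI, &req, 0,
				     example_send_handler,
				     example_recv_handler, NULL);
}

In mthca's case, per Hal's description above, no unsolicited registration
is needed at all; only the send completion handler (and the side effect
of pulling in ib_mad) matters.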
From yael at mellanox.co.il Mon Dec 19 23:11:12 2005
From: yael at mellanox.co.il (Yael Kalka)
Date: Tue, 20 Dec 2005 09:11:12 +0200
Subject: [openib-general] RE: [PATCH] Opensm - fix segfault on exit - cont.
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30E24BE@mtlexch01.mtl.com>

Hi Hal,
The compiler version is gcc (GCC) 4.0.2
As you see from the patch - something strange is going on with the
compiler (probably). I have a pointer whose value is NULL, but the code
still enters the if statement and doesn't treat the pointer as zero.
There are many more places where we have if statements on pointers
similar to this case, and this compiler change can be very problematic.
Do you/anyone else know about this change in the gcc version? Is this
behavior controlled by some flag?
Thanks,
Yael

-----Original Message-----
From: Hal Rosenstock [mailto:halr at voltaire.com]
Sent: Monday, December 19, 2005 1:42 PM
To: Yael Kalka
Cc: openib-general at openib.org; Eitan Zahavi
Subject: Re: [PATCH] Opensm - fix segfault on exit - cont.

Hi Yael,

On Mon, 2005-12-19 at 04:19, Yael Kalka wrote:
> Hi Hal,
>
> I've noticed that under certain operating systems,

Seems more likely like a compiler difference rather than OS difference.
Can you mention on which distribution/compiler this made a difference ?

> when driver isn't
> loaded, the SM still exits with segfault.
> The following patch fixes this.
>
> Thanks,
> Yael
>
> Signed-off-by: Yael Kalka
>
> Index: libvendor/osm_vendor_ibumad.c
> ===================================================================
> --- libvendor/osm_vendor_ibumad.c (revision 4522)
> +++ libvendor/osm_vendor_ibumad.c (working copy)
> @@ -552,7 +552,7 @@ osm_vendor_delete(
>
>   /* umad receiver thread ? */
>   p_ur = (*pp_vend)->receiver;
> - if (&p_ur->signal)
> + if (&p_ur->signal != NULL)

If this makes a difference, there are other uses of similar syntax which
should be changed :-(

>     cl_event_destroy( &p_ur->signal );
>   cl_spinlock_destroy( &(*pp_vend)->cb_lock );
>   cl_spinlock_destroy( &(*pp_vend)->match_tbl_lock );

-- Hal

From eitan at mellanox.co.il Tue Dec 20 00:17:35 2005
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Tue, 20 Dec 2005 10:17:35 +0200
Subject: [openib-general] RE: Separating SA and SM Keys
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3618B69@mtlexch01.mtl.com>

Hi Hal,

I think we need to stick to the IB spec terminology.
Since the spec did not change and only added some note describing the
change we probably just need to add such a comment too.
I can see people looking at the sa_key and trying to find it in the
spec...

Eitan

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com]
> Sent: Monday, December 19, 2005 5:27 PM
> To: Eitan Zahavi; Yael Kalka
> Cc: openib-general at openib.org
> Subject: Separating SA and SM Keys
>
> Hi,
>
> In order to support separate SA and SM keys and make this clearer, I
> propose to change ib_types.h as follows.
>
> -- Hal
>
> Index: ib_types.h
> ===================================================================
> --- ib_types.h	(revision 4540)
> +++ ib_types.h	(working copy)
> @@ -3618,7 +3618,7 @@ typedef struct _ib_sa_mad
>  	ib_net32_t seg_num;
>  	ib_net32_t paylen_newwin;
>
> -	ib_net64_t sm_key;
> +	ib_net64_t sa_key;
>
>  	ib_net16_t attr_offset;
>  	ib_net16_t resv3;
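If the on-the-wire structure keeps the spec's spelling, the clarification
Eitan suggests could be as small as an in-place comment rather than a
rename (a sketch of that option only, not an agreed patch):

	ib_net64_t sm_key; /* in SA-class MADs this field carries the SA key */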
From halr at voltaire.com Tue Dec 20 03:45:12 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Dec 2005 06:45:12 -0500
Subject: [openib-general] RE: Separating SA and SM Keys
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E3618B69@mtlexch01.mtl.com>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E3618B69@mtlexch01.mtl.com>
Message-ID: <1135078986.4328.49300.camel@hal.voltaire.com>

Hi Eitan,

On Tue, 2005-12-20 at 03:17, Eitan Zahavi wrote:
> Hi Hal,
>
> I think we need to stick to the IB spec terminology.
> Since the spec did not change and only added some note describing the
> change we probably just need to add such a comment too.

Ideally the spec would have been changed.

> I can see people looking at the sa_key and trying to find it in the
> spec...

How do you propose handling the setting of the 2 keys differently ? They
need different names at least for configuration purposes.

-- Hal

> Eitan
>
> Eitan Zahavi
> Design Technology Director
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
>
>
> > -----Original Message-----
> > From: Hal Rosenstock [mailto:halr at voltaire.com]
> > Sent: Monday, December 19, 2005 5:27 PM
> > To: Eitan Zahavi; Yael Kalka
> > Cc: openib-general at openib.org
> > Subject: Separating SA and SM Keys
> >
> > Hi,
> >
> > In order to support separate SA and SM keys and make this clearer, I
> > propose to change ib_types.h as follows.
> >
> > -- Hal
> >
> > Index: ib_types.h
> > ===================================================================
> > --- ib_types.h	(revision 4540)
> > +++ ib_types.h	(working copy)
> > @@ -3618,7 +3618,7 @@ typedef struct _ib_sa_mad
> >  	ib_net32_t seg_num;
> >  	ib_net32_t paylen_newwin;
> >
> > -	ib_net64_t sm_key;
> > +	ib_net64_t sa_key;
> >
> >  	ib_net16_t attr_offset;
> >  	ib_net16_t resv3;

From eitan at mellanox.co.il Tue Dec 20 04:05:51 2005
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Tue, 20 Dec 2005 14:05:51 +0200
Subject: [openib-general] RE: Separating SA and SM Keys
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3618B70@mtlexch01.mtl.com>

Hi Hal,

OpenSM can support 2 configuration flags. This is different than the wire
protocol described in ib_types.h which should follow the IB spec.

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com]
> Sent: Tuesday, December 20, 2005 1:45 PM
> To: Eitan Zahavi
> Cc: Yael Kalka; openib-general at openib.org
> Subject: RE: Separating SA and SM Keys
>
> Hi Eitan,
>
> On Tue, 2005-12-20 at 03:17, Eitan Zahavi wrote:
> > Hi Hal,
> >
> > I think we need to stick to the IB spec terminology.
> > Since the spec did not change and only added some note describing the
> > change we probably just need to add such a comment too.
>
> Ideally the spec would have been changed.
>
> > I can see people looking at the sa_key and trying to find it in the
> > spec...
>
> How do you propose handling the setting of the 2 keys differently ? They
> need different names at least for configuration purposes.
>
> -- Hal
>
> > Eitan
> >
> > Eitan Zahavi
> > Design Technology Director
> > Mellanox Technologies LTD
> > Tel:+972-4-9097208
> > Fax:+972-4-9593245
> > P.O.
Box 586 Yokneam 20692 ISRAEL > > > > > > > -----Original Message----- > > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > > Sent: Monday, December 19, 2005 5:27 PM > > > To: Eitan Zahavi; Yael Kalka > > > Cc: openib-general at openib.org > > > Subject: Separating SA and SM Keys > > > > > > Hi, > > > > > > In order to support separate SA and SM keys and make this clearer, I > > > propose to change ib_types.h as follows. > > > > > > -- Hal > > > > > > Index: ib_types.h > > > =================================================================== > > > --- ib_types.h (revision 4540) > > > +++ ib_types.h (working copy) > > > @@ -3618,7 +3618,7 @@ typedef struct _ib_sa_mad > > > ib_net32_t seg_num; > > > ib_net32_t paylen_newwin; > > > > > > - ib_net64_t sm_key; > > > + ib_net64_t sa_key; > > > > > > ib_net16_t attr_offset; > > > ib_net16_t resv3; From mst at mellanox.co.il Tue Dec 20 04:55:13 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 20 Dec 2005 14:55:13 +0200 Subject: [openib-general] [PATCH repost] option to set node description in sysfs Message-ID: <20051220125513.GC2366@mellanox.co.il> Here's an updated version of the node description patch. changes: Updated to svn rev 4044 Make sure the whole 64 byte description is initialized before passing it to hardware. Signed-off-by: Michael S. Tsirkin This patch does a few things: - Adds node_guid and node_desc fields to struct ib_device - Has mthca set these fields on startup - Extends modify_device method to handle setting node_desc - Exposes node_desc in sysfs - Allows userspace to set node_desc by writing into sysfs file, eg. echo -n `hostname` >> /sys/class/linux-kernel/drivers/infiniband/mthca0/node_desc This should probably be combined with Sean's work to get rid of node_guid queries in ULPs. Comments? - R. 
Index: linux-2.6.14/drivers/infiniband/core/sysfs.c =================================================================== --- linux-2.6.14/drivers/infiniband/core/sysfs.c (revision 4042) +++ linux-2.6.14/drivers/infiniband/core/sysfs.c (working copy) @@ -637,14 +637,42 @@ be16_to_cpu(((__be16 *) &attr.node_guid)[3])); } +static ssize_t show_node_desc(struct class_device *cdev, char *buf) +{ + struct ib_device *dev = container_of(cdev, struct ib_device, class_dev); + + return sprintf(buf, "%.64s\n", dev->node_desc); +} + +static ssize_t set_node_desc(struct class_device *cdev, const char *buf, + size_t count) +{ + struct ib_device *dev = container_of(cdev, struct ib_device, class_dev); + struct ib_device_modify desc = {}; + int ret; + + if (!dev->modify_device) + return -EIO; + + memcpy(desc.node_desc, buf, min_t(int, count, 64)); + ret = ib_modify_device(dev, IB_DEVICE_MODIFY_NODE_DESC, &desc); + if (ret) + return ret; + + return count; +} + static CLASS_DEVICE_ATTR(node_type, S_IRUGO, show_node_type, NULL); static CLASS_DEVICE_ATTR(sys_image_guid, S_IRUGO, show_sys_image_guid, NULL); static CLASS_DEVICE_ATTR(node_guid, S_IRUGO, show_node_guid, NULL); +static CLASS_DEVICE_ATTR(node_desc, S_IRUGO | S_IWUSR, show_node_desc, + set_node_desc); static struct class_device_attribute *ib_class_attributes[] = { &class_device_attr_node_type, &class_device_attr_sys_image_guid, - &class_device_attr_node_guid + &class_device_attr_node_guid, + &class_device_attr_node_desc }; static struct class ib_class = { Index: linux-2.6.14/drivers/infiniband/hw/mthca/mthca_provider.c =================================================================== --- linux-2.6.14/drivers/infiniband/hw/mthca/mthca_provider.c (revision 4042) +++ linux-2.6.14/drivers/infiniband/hw/mthca/mthca_provider.c (working copy) @@ -177,6 +177,23 @@ return err; } +static int mthca_modify_device(struct ib_device *ibdev, + int mask, + struct ib_device_modify *props) +{ + if (mask & ~IB_DEVICE_MODIFY_NODE_DESC) + return -EOPNOTSUPP; + + if (mask & IB_DEVICE_MODIFY_NODE_DESC) { + if (down_interruptible(&to_mdev(ibdev)->cap_mask_mutex)) + return -ERESTARTSYS; + memcpy(ibdev->node_desc, props->node_desc, 64); + up(&to_mdev(ibdev)->cap_mask_mutex); + } + + return 0; +} + static int mthca_modify_port(struct ib_device *ibdev, u8 port, int port_modify_mask, struct ib_port_modify *props) @@ -1071,6 +1088,20 @@ goto out; init_query_mad(in_mad); + in_mad->attr_id = IB_SMP_ATTR_NODE_DESC; + + err = mthca_MAD_IFC(dev, 1, 1, + 1, NULL, NULL, in_mad, out_mad, + &status); + if (err) + goto out; + if (status) { + err = -EINVAL; + goto out; + } + + memcpy(dev->ib_dev.node_desc, out_mad->data, 64); + in_mad->attr_id = IB_SMP_ATTR_NODE_INFO; err = mthca_MAD_IFC(dev, 1, 1, @@ -1129,6 +1160,7 @@ dev->ib_dev.class_dev.dev = &dev->pdev->dev; dev->ib_dev.query_device = mthca_query_device; dev->ib_dev.query_port = mthca_query_port; + dev->ib_dev.modify_device = mthca_modify_device; dev->ib_dev.modify_port = mthca_modify_port; dev->ib_dev.query_pkey = mthca_query_pkey; dev->ib_dev.query_gid = mthca_query_gid; Index: linux-2.6.14/drivers/infiniband/hw/mthca/mthca_mad.c =================================================================== --- linux-2.6.14/drivers/infiniband/hw/mthca/mthca_mad.c (revision 4042) +++ linux-2.6.14/drivers/infiniband/hw/mthca/mthca_mad.c (working copy) @@ -106,6 +106,19 @@ } } +static void node_desc_override(struct ib_device *dev, + struct ib_mad *mad) +{ + if ((mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED || + mad->mad_hdr.mgmt_class 
== IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) && + mad->mad_hdr.method == IB_MGMT_METHOD_GET_RESP && + mad->mad_hdr.attr_id == IB_SMP_ATTR_NODE_DESC) { + down(&to_mdev(dev)->cap_mask_mutex); + memcpy(((struct ib_smp *) mad)->data, dev->node_desc, 64); + up(&to_mdev(dev)->cap_mask_mutex); + } +} + static void forward_trap(struct mthca_dev *dev, u8 port_num, struct ib_mad *mad) @@ -204,8 +217,10 @@ return IB_MAD_RESULT_FAILURE; } - if (!out_mad->mad_hdr.status) + if (!out_mad->mad_hdr.status) { smp_snoop(ibdev, port_num, in_mad); + node_desc_override(ibdev, out_mad); + } /* set return bit in status of directed route responses */ if (in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) Index: linux-2.6.14/drivers/infiniband/include/rdma/ib_verbs.h =================================================================== --- linux-2.6.14/drivers/infiniband/include/rdma/ib_verbs.h (revision 4044) +++ linux-2.6.14/drivers/infiniband/include/rdma/ib_verbs.h (working copy) @@ -231,11 +231,13 @@ }; enum ib_device_modify_flags { - IB_DEVICE_MODIFY_SYS_IMAGE_GUID = 1 + IB_DEVICE_MODIFY_SYS_IMAGE_GUID = 1 << 0, + IB_DEVICE_MODIFY_NODE_DESC = 1 << 1 }; struct ib_device_modify { u64 sys_image_guid; + char node_desc[64]; }; enum ib_port_modify_flags { @@ -959,6 +961,7 @@ u64 uverbs_cmd_mask; int uverbs_abi_ver; + char node_desc[64]; __be64 node_guid; u8 node_type; u8 phys_port_cnt; -- MST From halr at voltaire.com Tue Dec 20 05:17:27 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Dec 2005 08:17:27 -0500 Subject: [openib-general] RE: [PATCH] Opensm - fix segfault on exit - cont. In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30E24BE@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30E24BE@mtlexch01.mtl.com> Message-ID: <1135082027.4328.49841.camel@hal.voltaire.com> Hi Yael, On Tue, 2005-12-20 at 02:11, Yael Kalka wrote: > Hi Hal, > The compiler version is gcc (GCC) 4.0.2 > As you see from the patch - something strange is going on with the > compiler(probably). > I have a pointer that its value is null, but it still enters the if > statement, and > doesn't handle it as zero. > There are many more places where we have if statements on pointers > similar to this > case, and this compiler change can be very problematic. > Do you/anyone else know about this change in the gcc version? Is this > behavior > controlled by some flag? I haven't seen this (but only used up to gcc 4.0.0) and don't know about this change. Have you searched on the web for this ? -- Hal > Thanks, > Yael > > > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Monday, December 19, 2005 1:42 PM > To: Yael Kalka > Cc: openib-general at openib.org; Eitan Zahavi > Subject: Re: [PATCH] Opensm - fix segfault on exit - cont. > > > Hi Yael, > > On Mon, 2005-12-19 at 04:19, Yael Kalka wrote: > > Hi Hal, > > > > I've noticed that under certain operating systems, > > Seems more likely like a compiler difference rather than OS difference. > Can you mention on which distribution/compiler this made a difference ? > > > when driver isn't > > loaded, the SM still exits with segfault. > > The following patch fixes this. > > > > Thanks, > > Yael > > > > Signed-off-by: Yael Kalka > > > > Index: libvendor/osm_vendor_ibumad.c > > =================================================================== > > --- libvendor/osm_vendor_ibumad.c (revision 4522) > > +++ libvendor/osm_vendor_ibumad.c (working copy) > > @@ -552,7 +552,7 @@ osm_vendor_delete( > > > > /* umad receiver thread ? 
*/
> > p_ur = (*pp_vend)->receiver;
> > - if (&p_ur->signal)
> > + if (&p_ur->signal != NULL)
>
> If this makes a difference, there are other uses of similar syntax which
> should be changed :-(
>
> > cl_event_destroy( &p_ur->signal );
> > cl_spinlock_destroy( &(*pp_vend)->cb_lock );
> > cl_spinlock_destroy( &(*pp_vend)->match_tbl_lock );

-- Hal

From halr at voltaire.com Tue Dec 20 05:38:22 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Dec 2005 08:38:22 -0500
Subject: [openib-general] [PATCH] OpenSM: Extend default transaction timeout from 100 msec to 1 second
Message-ID: <1135085900.4328.50504.camel@hal.voltaire.com>

OpenSM: Extend default transaction timeout from 100 msec to 1 second.
With the advent of long distance IB and software SMAs, 100 msec is no
longer adequate as a default transaction timeout. Increase this to 1
second so that the default is sufficient in most common cases.

Signed-off-by: Hal Rosenstock

Index: include/opensm/osm_base.h
===================================================================
--- include/opensm/osm_base.h	(revision 4549)
+++ include/opensm/osm_base.h	(working copy)
@@ -246,7 +246,7 @@ BEGIN_C_DECLS
 *
 * SYNOPSIS
 */
-#define OSM_DEFAULT_TRANS_TIMEOUT_MILLISEC 100
+#define OSM_DEFAULT_TRANS_TIMEOUT_MILLISEC 1000
 /***********/

 /****d* OpenSM: Base/OSM_DEFAULT_SUBNET_TIMEOUT

Index: opensm/main.c
===================================================================
--- opensm/main.c	(revision 4549)
+++ opensm/main.c	(working copy)
@@ -153,7 +153,7 @@ show_usage(void)
 " used for transaction timeouts.\n"
 " Specifying -t 0 disables timeouts.\n"
 " Without -t, OpenSM defaults to a timeout value of\n"
-" 100 milliseconds.\n\n" );
+" 1 second (1000 milliseconds).\n\n" );
 printf( "-maxsmps \n"
 " This option specifies the number of VL15 SMP MADs\n"
 " allowed on the wire at any one time.\n"

From halr at voltaire.com Tue Dec 20 05:42:49 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Dec 2005 08:42:49 -0500
Subject: [openib-general] [PATCH repost] option to set node description in sysfs
In-Reply-To: <20051220125513.GC2366@mellanox.co.il>
References: <20051220125513.GC2366@mellanox.co.il>
Message-ID: <1135085984.4328.50518.camel@hal.voltaire.com>

On Tue, 2005-12-20 at 07:55, Michael S. Tsirkin wrote:
> Here's an updated version of the node description patch.
> changes:

I agree that being able to tailor the NodeDescription is a good thing
but how does an SM know that this has changed ?

Without this aspect covered as well, this is only a partial solution.

-- Hal

From mst at mellanox.co.il Tue Dec 20 06:25:23 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 20 Dec 2005 16:25:23 +0200
Subject: [openib-general] [PATCH repost] option to set node description in sysfs
In-Reply-To: <1135085984.4328.50518.camel@hal.voltaire.com>
References: <1135085984.4328.50518.camel@hal.voltaire.com>
Message-ID: <20051220142523.GE2366@mellanox.co.il>

Quoting r. Hal Rosenstock :
> Subject: Re: [openib-general] [PATCH repost] option to set node description in sysfs
>
> On Tue, 2005-12-20 at 07:55, Michael S. Tsirkin wrote:
> > Here's an updated version of the node description patch.
> > changes:
>
> I agree that being able to tailor the NodeDescription is a good thing
> but how does an SM know that this has changed ?

Why does SM need this string? To print it out in the log file?

I want to address a simple need: I need a tool that can
figure out the host name/hca from the GUID, without keeping
a database in some file.
So if I hit the race window while the system is booting, I know
that its booting since the description does not have the format
that I set from userspace, and can simply re-run the tool.

> Without this aspect covered as well, this is only a partial solution.

Its good enough for what I want it for, and it seems that no one proposed
anything better so far.

It should be easy to invent some kind of trap for this, but I dont think
anything in the spec matches.

-- MST

From halr at voltaire.com Tue Dec 20 06:44:16 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Dec 2005 09:44:16 -0500
Subject: [openib-general] [PATCH repost] option to set node description in sysfs
In-Reply-To: <20051220142523.GE2366@mellanox.co.il>
References: <1135085984.4328.50518.camel@hal.voltaire.com>
	<20051220142523.GE2366@mellanox.co.il>
Message-ID: <1135089855.4328.51171.camel@hal.voltaire.com>

On Tue, 2005-12-20 at 09:25, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock :
> > Subject: Re: [openib-general] [PATCH repost] option to set node description in sysfs
> >
> > On Tue, 2005-12-20 at 07:55, Michael S. Tsirkin wrote:
> > > Here's an updated version of the node description patch.
> > > changes:
> >
> > I agree that being able to tailor the NodeDescription is a good thing
> > but how does an SM know that this has changed ?
>
> Why does SM need this string? To print it out in the log file?

It would be used for (upper layer) management and in a similar vein to
how you want to use this (for some diag tool). One could argue that
without this, there is no need for an SM to obtain NodeDescription (e.g.
OpenSM).

> I want to address a simple need: I need a tool that can
> figure out the host name/hca from the GUID, without keeping
> a database in some file.
>
> So if I hit the race window while the system is booting, I know
> that its booting since the description does not have the format
> that I set from userspace, and can simply re-run the tool.
>
> > Without this aspect covered as well, this is only a partial solution.
>
> Its good enough for what I want it for, and it seems that no one proposed
> anything better so far.
>
> It should be easy to invent some kind of trap for this, but I dont think
> anything in the spec matches.

So let's define and implement a trap for this but I don't see a way to
extend the SM traps in this manner. Do you ?

-- Hal

From mst at mellanox.co.il Tue Dec 20 08:29:32 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 20 Dec 2005 18:29:32 +0200
Subject: [openib-general] [PATCH] address handle references in ipoib_multicast.c
Message-ID: <20051220162932.GH2366@mellanox.co.il>

Multiple ipoib_neigh structures on mcast->neigh_list may point to the
same ah. Handle this in ipoib_multicast.c, in the same way as it is
handled in ipoib_main.c for struct ipoib_path.

Signed-off-by: Eli Cohen
Signed-off-by: Michael S.
Tsirkin Index: openib/drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- openib/drivers/infiniband/ulp/ipoib/ipoib_multicast.c (revision 4523) +++ openib/drivers/infiniband/ulp/ipoib/ipoib_multicast.c (working copy) @@ -95,8 +95,6 @@ static void ipoib_mcast_free(struct ipoi struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_neigh *neigh, *tmp; unsigned long flags; - LIST_HEAD(ah_list); - struct ipoib_ah *ah, *tah; ipoib_dbg_mcast(netdev_priv(dev), "deleting multicast group " IPOIB_GID_FMT "\n", @@ -105,8 +103,14 @@ static void ipoib_mcast_free(struct ipoi spin_lock_irqsave(&priv->lock, flags); list_for_each_entry_safe(neigh, tmp, &mcast->neigh_list, list) { + /* + * It's safe to call ipoib_put_ah() inside priv->lock + * here, because we know that mcast->ah will always + * hold one more reference, so ipoib_put_ah() will + * never do more than decrement the ref count. + */ if (neigh->ah) - list_add_tail(&neigh->ah->list, &ah_list); + ipoib_put_ah(neigh->ah); *to_ipoib_neigh(neigh->neighbour) = NULL; neigh->neighbour->ops->destructor = NULL; kfree(neigh); @@ -114,9 +118,6 @@ static void ipoib_mcast_free(struct ipoi spin_unlock_irqrestore(&priv->lock, flags); - list_for_each_entry_safe(ah, tah, &ah_list, list) - ipoib_put_ah(ah); - if (mcast->ah) ipoib_put_ah(mcast->ah); -- MST From halr at voltaire.com Tue Dec 20 09:28:24 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Dec 2005 12:28:24 -0500 Subject: [openib-general] [PATCH repost] option to set node description in sysfs In-Reply-To: <1135089855.4328.51171.camel@hal.voltaire.com> References: <1135085984.4328.50518.camel@hal.voltaire.com> <20051220142523.GE2366@mellanox.co.il> <1135089855.4328.51171.camel@hal.voltaire.com> Message-ID: <1135099703.4328.52741.camel@hal.voltaire.com> On Tue, 2005-12-20 at 09:44, Hal Rosenstock wrote: > It would be used for (upper layer) management and in a similar vein to > how you want to use this (for some diag tool). One could argue that > without this, there is no need for an SM to obtain NodeDescription (e.g. > OpenSM). More significantly, it is available via SA NodeRecord so is not just in the log file and is available to any node on the network via SA queries. This is indeed used by various SA clients. -- Hal From mst at mellanox.co.il Tue Dec 20 11:01:31 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 20 Dec 2005 21:01:31 +0200 Subject: [openib-general] [PATCH repost 1 of 2] ipoib: neighbour issues Message-ID: <20051220190131.GA14252@mellanox.co.il> This series includes two patches I posted previously: they address two independent issues but happen to affect adjacent lines in ipoib_multicast.c I also renamed the global list of neighbours to ipoib_all_neigh_list. --- Multiple ipoib_neigh structures on mcast->neigh_list may point to the same ah. Handle this in ipoib_multicast.c, in the same way as it is handled in ipoib_main.c for struct ipoib_path. Signed-off-by: Michael S. 
Tsirkin Index: openib/drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- openib.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2005-12-20 20:24:50.000000000 +0200 +++ openib/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2005-12-20 20:24:44.000000000 +0200 @@ -95,8 +95,6 @@ static void ipoib_mcast_free(struct ipoi struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_neigh *neigh, *tmp; unsigned long flags; - LIST_HEAD(ah_list); - struct ipoib_ah *ah, *tah; ipoib_dbg_mcast(netdev_priv(dev), "deleting multicast group " IPOIB_GID_FMT "\n", @@ -105,8 +103,14 @@ static void ipoib_mcast_free(struct ipoi spin_lock_irqsave(&priv->lock, flags); list_for_each_entry_safe(neigh, tmp, &mcast->neigh_list, list) { + /* + * It's safe to call ipoib_put_ah() inside priv->lock + * here, because we know that mcast->ah will always + * hold one more reference, so ipoib_put_ah() will + * never do more than decrement the ref count. + */ if (neigh->ah) - list_add_tail(&neigh->ah->list, &ah_list); + ipoib_put_ah(neigh->ah); *to_ipoib_neigh(neigh->neighbour) = NULL; neigh->neighbour->ops->destructor = NULL; kfree(neigh); @@ -114,9 +118,6 @@ static void ipoib_mcast_free(struct ipoi spin_unlock_irqrestore(&priv->lock, flags); - list_for_each_entry_safe(ah, tah, &ah_list, list) - ipoib_put_ah(ah); - if (mcast->ah) ipoib_put_ah(mcast->ah); -- MST From mst at mellanox.co.il Tue Dec 20 11:03:10 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 20 Dec 2005 21:03:10 +0200 Subject: [openib-general] [PATCH repost 2 of 2] ipoib: neighbour issues Message-ID: <20051220190310.GA14267@mellanox.co.il> This series includes two patches I posted previously: they address two independent issues but happen to affect adjacent lines in ipoib_multicast.c I also renamed the global list of neighbours to ipoib_all_neigh_list. --- IPoIB uses neighbour ops->destructor to clean up struct ipoib_neigh, but ignores the fact that multiple neighbour objects can share the same ops structure, so setting it to NULL affects multiple neighbours. Fix this, by tracking all ipoib_neigh objects, and only cleaning destructor after no neighbour is going to use it. Note that ops structure isnt per device, so we track them in a global list. Signed-off-by: Michael S. 
Tsirkin Index: openib/drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- openib.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2005-12-20 20:25:55.000000000 +0200 +++ openib/drivers/infiniband/ulp/ipoib/ipoib_main.c 2005-12-20 20:28:19.000000000 +0200 @@ -71,6 +71,9 @@ static const u8 ipv4_bcast_addr[] = { struct workqueue_struct *ipoib_workqueue; +static spinlock_t ipoib_all_neigh_list_lock; +static LIST_HEAD(ipoib_all_neigh_list); + static void ipoib_add_one(struct ib_device *device); static void ipoib_remove_one(struct ib_device *device); @@ -244,9 +247,8 @@ static void path_free(struct net_device */ if (neigh->ah) ipoib_put_ah(neigh->ah); - *to_ipoib_neigh(neigh->neighbour) = NULL; - neigh->neighbour->ops->destructor = NULL; - kfree(neigh); + + ipoib_neigh_free(neigh); } spin_unlock_irqrestore(&priv->lock, flags); @@ -474,7 +476,7 @@ static void neigh_add_path(struct sk_buf struct ipoib_path *path; struct ipoib_neigh *neigh; - neigh = kmalloc(sizeof *neigh, GFP_ATOMIC); + neigh = ipoib_neigh_alloc(skb->dst->neighbour); if (!neigh) { ++priv->stats.tx_dropped; dev_kfree_skb_any(skb); @@ -482,8 +484,6 @@ static void neigh_add_path(struct sk_buf } skb_queue_head_init(&neigh->queue); - neigh->neighbour = skb->dst->neighbour; - *to_ipoib_neigh(skb->dst->neighbour) = neigh; /* * We can only be called from ipoib_start_xmit, so we're @@ -526,11 +526,8 @@ static void neigh_add_path(struct sk_buf return; err: - *to_ipoib_neigh(skb->dst->neighbour) = NULL; list_del(&neigh->list); - neigh->neighbour->ops->destructor = NULL; - kfree(neigh); - + ipoib_neigh_free(neigh); ++priv->stats.tx_dropped; dev_kfree_skb_any(skb); @@ -757,8 +754,7 @@ static void ipoib_neigh_destructor(struc if (neigh->ah) ah = neigh->ah; list_del(&neigh->list); - *to_ipoib_neigh(n) = NULL; - kfree(neigh); + ipoib_neigh_free(neigh); } spin_unlock_irqrestore(&priv->lock, flags); @@ -767,23 +763,45 @@ static void ipoib_neigh_destructor(struc ipoib_put_ah(ah); } -static int ipoib_neigh_setup(struct neighbour *neigh) +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour) { + struct ipoib_neigh *neigh; + + neigh = kmalloc(sizeof *neigh, GFP_ATOMIC); + if (!neigh) + return NULL; + + neigh->neighbour = neighbour; + *to_ipoib_neigh(neighbour) = neigh; + /* * Is this kosher? I can't find anybody in the kernel that * sets neigh->destructor, so we should be able to set it here * without trouble. 
*/ - neigh->ops->destructor = ipoib_neigh_destructor; - - return 0; + spin_lock(&ipoib_all_neigh_list_lock); + list_add_tail(&neigh->all_neigh_list, &ipoib_all_neigh_list); + neigh->neighbour->ops->destructor = ipoib_neigh_destructor; + spin_unlock(&ipoib_all_neigh_list_lock); + return neigh; } -static int ipoib_neigh_setup_dev(struct net_device *dev, struct neigh_parms *parms) +void ipoib_neigh_free(struct ipoib_neigh *neigh) { - parms->neigh_setup = ipoib_neigh_setup; + struct ipoib_neigh *n; - return 0; + spin_lock(&ipoib_all_neigh_list_lock); + list_del(&neigh->all_neigh_list); + + list_for_each_entry(n, &ipoib_all_neigh_list, all_neigh_list) + if (n->neighbour->ops == neigh->neighbour->ops) + goto found; + + neigh->neighbour->ops->destructor = NULL; +found: + spin_unlock(&ipoib_all_neigh_list_lock); + *to_ipoib_neigh(neigh->neighbour) = NULL; + kfree(neigh); } int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port) @@ -859,7 +877,6 @@ static void ipoib_setup(struct net_devic dev->tx_timeout = ipoib_timeout; dev->hard_header = ipoib_hard_header; dev->set_multicast_list = ipoib_set_mcast_list; - dev->neigh_setup = ipoib_neigh_setup_dev; dev->watchdog_timeo = HZ; @@ -1146,6 +1163,8 @@ static int __init ipoib_init_module(void if (ret) goto err_wq; + spin_lock_init(&ipoib_all_neigh_list_lock); + return 0; err_wq: Index: openib/drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- openib.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2005-12-20 20:26:09.000000000 +0200 +++ openib/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2005-12-20 20:27:32.000000000 +0200 @@ -111,9 +111,7 @@ static void ipoib_mcast_free(struct ipoi */ if (neigh->ah) ipoib_put_ah(neigh->ah); - *to_ipoib_neigh(neigh->neighbour) = NULL; - neigh->neighbour->ops->destructor = NULL; - kfree(neigh); + ipoib_neigh_free(neigh); } spin_unlock_irqrestore(&priv->lock, flags); @@ -719,13 +717,11 @@ out: if (skb->dst && skb->dst->neighbour && !*to_ipoib_neigh(skb->dst->neighbour)) { - struct ipoib_neigh *neigh = kmalloc(sizeof *neigh, GFP_ATOMIC); + struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb->dst->neighbour); if (neigh) { kref_get(&mcast->ah->ref); neigh->ah = mcast->ah; - neigh->neighbour = skb->dst->neighbour; - *to_ipoib_neigh(skb->dst->neighbour) = neigh; list_add_tail(&neigh->list, &mcast->neigh_list); } } Index: openib/drivers/infiniband/ulp/ipoib/ipoib.h =================================================================== --- openib.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2005-12-20 20:25:55.000000000 +0200 +++ openib/drivers/infiniband/ulp/ipoib/ipoib.h 2005-12-20 20:26:25.000000000 +0200 @@ -214,6 +214,7 @@ struct ipoib_neigh { struct neighbour *neighbour; struct list_head list; + struct list_head all_neigh_list; }; static inline struct ipoib_neigh **to_ipoib_neigh(struct neighbour *neigh) @@ -222,6 +223,9 @@ static inline struct ipoib_neigh **to_ip (offsetof(struct neighbour, ha) & 4)); } +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh); +void ipoib_neigh_free(struct ipoib_neigh *neigh); + extern struct workqueue_struct *ipoib_workqueue; /* functions */ -- MST From weiny2 at llnl.gov Tue Dec 20 11:15:29 2005 From: weiny2 at llnl.gov (Ira Weiny) Date: Tue, 20 Dec 2005 11:15:29 -0800 Subject: [openib-general] help with ifconfig output from ipoib Message-ID: <20051220111529.64a885b9.weiny2@llnl.gov> One of my coworkers brought a problem to me which I need guidance from you guys on. 
He pointed out that the HW address for the ipoib interface in ifconfig reported all 0's. 11:12:50 > /sbin/ifconfig ib0 ib0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 ... Using strace, I tracked this down to a difference between the Red Hat kernel and the latest 2.6.14 kernel. The ioctl in Red Hat's 2.6.9 kernel will return EOVERFLOW if the sa_data area is too small. This seemed like an easy enough fix. Change the behavior of the SIOCGIFHWADDR From weiny2 at llnl.gov Tue Dec 20 11:16:46 2005 From: weiny2 at llnl.gov (Ira Weiny) Date: Tue, 20 Dec 2005 11:16:46 -0800 Subject: [openib-general] help with ifconfig output from ipoib In-Reply-To: <20051220111529.64a885b9.weiny2@llnl.gov> References: <20051220111529.64a885b9.weiny2@llnl.gov> Message-ID: <20051220111646.3d580c71.weiny2@llnl.gov> This was sent before I was done writing it... Please disregard... (stupid gui buttons...) :-( Ira On Tue, 20 Dec 2005 11:15:29 -0800 Ira Weiny wrote: > One of my coworkers brought a problem to me which I need guidance > from you guys on. > > He pointed out that the HW address for the ipoib interface in > ifconfig reported all 0's. > > 11:12:50 > /sbin/ifconfig ib0 > ib0 Link encap:UNSPEC HWaddr > 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 ... > > Using strace, I tracked this down to a difference between the Red Hat > kernel and the latest 2.6.14 kernel. The ioctl in Red Hat's 2.6.9 > kernel will return EOVERFLOW if the sa_data area is too small. This > seemed like an easy enough fix. Change the behavior of the > SIOCGIFHWADDR _______________________________________________ > openib-general mailing list openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general From weiny2 at llnl.gov Tue Dec 20 11:21:29 2005 From: weiny2 at llnl.gov (Ira Weiny) Date: Tue, 20 Dec 2005 11:21:29 -0800 Subject: [openib-general] help with ifconfig output from ipoib Message-ID: <20051220112129.1081953b.weiny2@llnl.gov> One of my coworkers brought a problem to me which I need guidance from you guys on. He pointed out that the HW address for the ipoib interface in ifconfig reported all 0's. 11:12:50 > /sbin/ifconfig ib0 ib0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 ... Using strace, I tracked this down to a difference between the Red Hat kernel and the latest 2.6.14 kernel. The SIOCGIFHWADDR ioctl in Red Hat's 2.6.9 kernel will return EOVERFLOW if the sa_data area is too small. This seemed like an easy enough fix. Change the behavior of the SIOCGIFHWADDR. I did this and this is what I got. # ldev6 /root > /sbin/ifconfig ib0 ib0 Link encap:UNSPEC HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00 This is not the GUID of the node? More digging revealed that this "should" work. Am I incorrect in assuming that the HW address should be the node GUID? Does this work for others? Thanks, Ira PS. Sorry about the premature email before. From halr at voltaire.com Tue Dec 20 11:32:26 2005 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 20 Dec 2005 21:32:26 +0200 Subject: [openib-general] help with ifconfig output from ipoib Message-ID: <5CE025EE7D88BA4599A2C8FEFCF226F589AB97@taurus.voltaire.com> IPoIB hardware addresses are QPN + GID (subnet prefix and GUID). ifconfig only supports shorter HW addresses (I forget what the limit is but it is less than the 20-byte IPoIB HW address). Use the newer ip command to see the proper HW address (something like ip link show ib0). -- Hal ________________________________ From: openib-general-bounces at openib.org on behalf of Ira Weiny Sent: Tue 12/20/2005 2:21 PM To: openib-general at openib.org Subject: [openib-general] help with ifconfig output from ipoib One of my coworkers brought a problem to me which I need guidance from you guys on. He pointed out that the HW address for the ipoib interface in ifconfig reported all 0's. 11:12:50 > /sbin/ifconfig ib0 ib0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 ... Using strace, I tracked this down to a difference between the Red Hat kernel and the latest 2.6.14 kernel. The SIOCGIFHWADDR ioctl in Red Hat's 2.6.9 kernel will return EOVERFLOW if the sa_data area is too small. This seemed like an easy enough fix. Change the behavior of the SIOCGIFHWADDR. I did this and this is what I got. # ldev6 /root > /sbin/ifconfig ib0 ib0 Link encap:UNSPEC HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00 This is not the GUID of the node? More digging revealed that this "should" work. Am I incorrect in assuming that the HW address should be the node GUID? Does this work for others? Thanks, Ira PS. Sorry about the premature email before. _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
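To make the size mismatch concrete, here is a small standalone sketch. The struct and its field names are illustrative, not the actual ipoib declarations; it only shows why a 20-byte IPoIB hardware address cannot survive the SIOCGIFHWADDR path, whose result must fit in struct sockaddr's 14-byte sa_data:

#include <stdio.h>
#include <sys/socket.h>

/* Illustrative layout only: 4 bytes of flags + QPN followed by the
 * 16-byte GID (subnet prefix + port GUID), per Hal's description. */
struct ipoib_hw_addr {
	unsigned char qpn[4];
	unsigned char gid[16];
};

int main(void)
{
	printf("IPoIB hw addr: %zu bytes, sockaddr sa_data: %zu bytes\n",
	       sizeof(struct ipoib_hw_addr),
	       sizeof(((struct sockaddr *) 0)->sa_data));
	/* Prints 20 vs 14, which is why ifconfig shows a truncated or
	 * all-zero address and why "ip link show ib0" is needed. */
	return 0;
}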
From lindahl at pathscale.com Tue Dec 20 12:46:38 2005 From: lindahl at pathscale.com (Greg Lindahl) Date: Tue, 20 Dec 2005 12:46:38 -0800 Subject: [openib-general] RE: [PATCH] Opensm - fix segfault on exit - cont. In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30E24BE@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30E24BE@mtlexch01.mtl.com> Message-ID: <20051220204638.GA2083@greglaptop.internal.keyresearch.com> On Tue, Dec 20, 2005 at 09:11:12AM +0200, Yael Kalka wrote: > > - if (&p_ur->signal) > > + if (&p_ur->signal != NULL) Aren't these 2 statements required to execute the same according to the C standard? I wrote a tiny test program and gcc4.0.0 as distributed with Fedora Core 3 generated identical assembly code for both. -- greg From eitan at mellanox.co.il Tue Dec 20 13:20:55 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 20 Dec 2005 23:20:55 +0200 Subject: [openib-general] RE: [PATCH] Opensm - fix segfault on exit - cont. Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3618B7C@mtlexch01.mtl.com> Hi Greg, > > On Tue, Dec 20, 2005 at 09:11:12AM +0200, Yael Kalka wrote: > > > > - if (&p_ur->signal) > > > + if (&p_ur->signal != NULL) > > Aren't these 2 statements required to execute the same according to > the C standard? [EZ] We were puzzled too. But there is nothing stronger than seeing it happening. What could break our compiler? Hmmm. > > I wrote a tiny test program and gcc4.0.0 as distributed with Fedora > Core 3 generated identical assembly code for both. > > -- greg > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
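Greg is right that the two forms are equivalent -- and in fact both are always true, which is the real bug. A minimal standalone illustration (hypothetical struct layout, not the actual osm_vendor_ibumad.c types):

#include <stdio.h>
#include <stddef.h>

struct umad_receiver {
	int other_member;
	int signal;	/* stands in for the cl_event_t member */
};

int main(void)
{
	struct umad_receiver *p_ur = NULL;

	/* &p_ur->signal is just p_ur plus the member offset, so it can
	 * never compare equal to NULL, even when p_ur itself is NULL
	 * (and forming it from a NULL base is formally undefined).
	 * Both "if (&p_ur->signal)" and "if (&p_ur->signal != NULL)"
	 * therefore compile to the same always-true test. */
	printf("offset of signal = %zu\n",
	       offsetof(struct umad_receiver, signal));

	/* The guard that actually protects cl_event_destroy() has to
	 * test the pointer itself, as Yael's patch later in this
	 * thread does: */
	if (p_ur)
		printf("safe to touch p_ur->signal\n");

	return 0;
}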
From eitan at mellanox.co.il Tue Dec 20 13:27:07 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 20 Dec 2005 23:27:07 +0200 Subject: [openib-general] RE: [PATCH] OpenSM: Extend default transaction timeout from 100 msec to 1 second Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3618B7D@mtlexch01.mtl.com> Hi Hal, The effect is basically a slowdown in the case of non-responding or lost packets. With a 1 sec timeout, up to 4 sec per lost transaction are added to the SM bringup time. In many clusters I have seen that 100 msec was enough - but I guess you actually have seen such failures. Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Tuesday, December 20, 2005 3:38 PM > To: Yael Kalka; Eitan Zahavi > Cc: openib-general at openib.org > Subject: [PATCH] OpenSM: Extend default transaction timeout from 100 msec to 1 > second > > OpenSM: Extend default transaction timeout from 100 msec to 1 second. > > With the advent of long distance IB and software SMAs, 100 msec is no > longer adequate as a default transaction timeout. Increase this to 1 > second so that the default is sufficient in most common cases. > > Signed-off-by: Hal Rosenstock > > Index: include/opensm/osm_base.h > =================================================================== > --- include/opensm/osm_base.h (revision 4549) > +++ include/opensm/osm_base.h (working copy) > @@ -246,7 +246,7 @@ BEGIN_C_DECLS > * > * SYNOPSIS > */ > -#define OSM_DEFAULT_TRANS_TIMEOUT_MILLISEC 100 > +#define OSM_DEFAULT_TRANS_TIMEOUT_MILLISEC 1000 > /***********/ > > /****d* OpenSM: Base/OSM_DEFAULT_SUBNET_TIMEOUT > Index: opensm/main.c > =================================================================== > --- opensm/main.c (revision 4549) > +++ opensm/main.c (working copy) > @@ -153,7 +153,7 @@ show_usage(void) > " used for transaction timeouts.\n" > " Specifying -t 0 disables timeouts.\n" > " Without -t, OpenSM defaults to a timeout value of\n" > - " 100 milliseconds.\n\n" ); > + " 1 second (1000 milliseconds).\n\n" ); > printf( "-maxsmps \n" > " This option specifies the number of VL15 SMP MADs\n" > " allowed on the wire at any one time.\n" From halr at voltaire.com Tue Dec 20 13:22:44 2005 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 20 Dec 2005 23:22:44 +0200 Subject: [openib-general] RE: [PATCH] OpenSM: Extend default transaction timeout from 100 msec to 1 second Message-ID: <5CE025EE7D88BA4599A2C8FEFCF226F589ABA1@taurus.voltaire.com> Hi Eitan, Yes, I saw these failures as I mentioned in the original email. Another easy way to see this is to turn on logging on a slow NFS server. Also, wouldn't increasing maxsmps ameliorate this to some degree so maybe that should be done at the same time ? -- Hal ________________________________ From: Eitan Zahavi [mailto:eitan at mellanox.co.il] Sent: Tue 12/20/2005 4:27 PM To: Hal Rosenstock; Yael Kalka Cc: openib-general at openib.org Subject: RE: [PATCH] OpenSM: Extend default transaction timeout from 100 msec to 1 second Hi Hal, The effect is basically a slowdown in the case of non-responding or lost packets.
With a 1 sec timeout, up to 4 sec per lost transaction are added to the SM bringup time. In many clusters I have seen that 100 msec was enough - but I guess you actually have seen such failures. Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Tuesday, December 20, 2005 3:38 PM > To: Yael Kalka; Eitan Zahavi > Cc: openib-general at openib.org > Subject: [PATCH] OpenSM: Extend default transaction timeout from 100 msec to 1 > second > > OpenSM: Extend default transaction timeout from 100 msec to 1 second. > > With the advent of long distance IB and software SMAs, 100 msec is no > longer adequate as a default transaction timeout. Increase this to 1 > second so that the default is sufficient in most common cases. > > Signed-off-by: Hal Rosenstock > > Index: include/opensm/osm_base.h > =================================================================== > --- include/opensm/osm_base.h (revision 4549) > +++ include/opensm/osm_base.h (working copy) > @@ -246,7 +246,7 @@ BEGIN_C_DECLS > * > * SYNOPSIS > */ > -#define OSM_DEFAULT_TRANS_TIMEOUT_MILLISEC 100 > +#define OSM_DEFAULT_TRANS_TIMEOUT_MILLISEC 1000 > /***********/ > > /****d* OpenSM: Base/OSM_DEFAULT_SUBNET_TIMEOUT > Index: opensm/main.c > =================================================================== > --- opensm/main.c (revision 4549) > +++ opensm/main.c (working copy) > @@ -153,7 +153,7 @@ show_usage(void) > " used for transaction timeouts.\n" > " Specifying -t 0 disables timeouts.\n" > " Without -t, OpenSM defaults to a timeout value of\n" > - " 100 milliseconds.\n\n" ); > + " 1 second (1000 milliseconds).\n\n" ); > printf( "-maxsmps \n" > " This option specifies the number of VL15 SMP MADs\n" > " allowed on the wire at any one time.\n" From eitan at mellanox.co.il Tue Dec 20 13:36:12 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 20 Dec 2005 23:36:12 +0200 Subject: [openib-general] RE: [PATCH] OpenSM: Extend default transaction timeout from 100 msec to 1 second Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3618B7E@mtlexch01.mtl.com> Hi Hal, In a way, using parallel MAD sends (maxsmps > 1) would help. But if you count the number of packets that must be sent to every port (NodeInfo, PortInfo, SwitchInfo?, PKey*2, SL2VL, VLArb ....), even a single bad port will slow down the sweep significantly. Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Tuesday, December 20, 2005 11:23 PM > To: Eitan Zahavi; Yael Kalka > Cc: openib-general at openib.org > Subject: RE: [PATCH] OpenSM: Extend default transaction timeout from 100 msec to > 1 second > > Hi Eitan, > > Yes, I saw these failures as I mentioned in the original email. Another easy way to see > this is to turn on logging on a slow NFS server. > > Also, wouldn't increasing maxsmps ameliorate this to some degree so maybe that > should be done at the same time ?
> > -- Hal > > ________________________________ > > From: Eitan Zahavi [mailto:eitan at mellanox.co.il] > Sent: Tue 12/20/2005 4:27 PM > To: Hal Rosenstock; Yael Kalka > Cc: openib-general at openib.org > Subject: RE: [PATCH] OpenSM: Extend default transaction timeout from 100 msec to > 1 second > > > > Hi Hal, > > The effect is basically a slowdown in the case of non-responding or lost > packets. > With a 1 sec timeout, up to 4 sec per lost transaction are added to the SM > bringup time. > > In many clusters I have seen that 100 msec was enough - but I guess you > actually have seen such failures. > > Eitan Zahavi > Design Technology Director > Mellanox Technologies LTD > Tel:+972-4-9097208 > Fax:+972-4-9593245 > P.O. Box 586 Yokneam 20692 ISRAEL > > > > -----Original Message----- > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > Sent: Tuesday, December 20, 2005 3:38 PM > > To: Yael Kalka; Eitan Zahavi > > Cc: openib-general at openib.org > > Subject: [PATCH] OpenSM: Extend default transaction timeout from 100 > msec to 1 > > second > > > > OpenSM: Extend default transaction timeout from 100 msec to 1 second. > > > > With the advent of long distance IB and software SMAs, 100 msec is no > > longer adequate as a default transaction timeout. Increase this to 1 > > second so that the default is sufficient in most common cases. > > > > Signed-off-by: Hal Rosenstock > > > > Index: include/opensm/osm_base.h > > =================================================================== > > --- include/opensm/osm_base.h (revision 4549) > > +++ include/opensm/osm_base.h (working copy) > > @@ -246,7 +246,7 @@ BEGIN_C_DECLS > > * > > * SYNOPSIS > > */ > > -#define OSM_DEFAULT_TRANS_TIMEOUT_MILLISEC 100 > > +#define OSM_DEFAULT_TRANS_TIMEOUT_MILLISEC 1000 > > /***********/ > > > > /****d* OpenSM: Base/OSM_DEFAULT_SUBNET_TIMEOUT > > Index: opensm/main.c > > =================================================================== > > --- opensm/main.c (revision 4549) > > +++ opensm/main.c (working copy) > > @@ -153,7 +153,7 @@ show_usage(void) > > " used for transaction timeouts.\n" > > " Specifying -t 0 disables timeouts.\n" > > " Without -t, OpenSM defaults to a timeout value > of\n" > > - " 100 milliseconds.\n\n" ); > > + " 1 second (1000 milliseconds).\n\n" ); > > printf( "-maxsmps \n" > > " This option specifies the number of VL15 SMP > MADs\n" > > " allowed on the wire at any one time.\n" From halr at voltaire.com Tue Dec 20 13:32:47 2005 From: halr at voltaire.com (Hal Rosenstock) Date: Tue, 20 Dec 2005 23:32:47 +0200 Subject: [openib-general] RE: [PATCH] OpenSM: Extend default transaction timeout from 100 msec to 1 second Message-ID: <5CE025EE7D88BA4599A2C8FEFCF226F589ABA2@taurus.voltaire.com> Hi Eitan, Yes, but what is the alternative for slow ports or long links ? -- Hal ________________________________ From: Eitan Zahavi [mailto:eitan at mellanox.co.il] Sent: Tue 12/20/2005 4:36 PM To: Hal Rosenstock; Yael Kalka Cc: openib-general at openib.org Subject: RE: [PATCH] OpenSM: Extend default transaction timeout from 100 msec to 1 second Hi Hal, In a way, using parallel MAD sends (maxsmps > 1) would help. But if you count the number of packets that must be sent to every port (NodeInfo, PortInfo, SwitchInfo?, PKey*2, SL2VL, VLArb ....), even a single bad port will slow down the sweep significantly. Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O.
Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Tuesday, December 20, 2005 11:23 PM > To: Eitan Zahavi; Yael Kalka > Cc: openib-general at openib.org > Subject: RE: [PATCH] OpenSM: Extend default transaction timeout from 100 msec to > 1 second > > Hi Eitan, > > Yes, I saw these failures as I mentioned in the original email. Another easy way to see > this is to turn on logging on a slow NFS server. > > Also, wouldn't increasing maxsmps ameliorate this to some degree so maybe that > should be done at the same time ? > > -- Hal > > ________________________________ > > From: Eitan Zahavi [mailto:eitan at mellanox.co.il] > Sent: Tue 12/20/2005 4:27 PM > To: Hal Rosenstock; Yael Kalka > Cc: openib-general at openib.org > Subject: RE: [PATCH] OpenSM: Extend default transaction timeout from 100 msec to > 1 second > > > > Hi Hal, > > The effect is basically a slowdown in the case of non-responding or lost > packets. > With a 1 sec timeout, up to 4 sec per lost transaction are added to the SM > bringup time. > > In many clusters I have seen that 100 msec was enough - but I guess you > actually have seen such failures. > > Eitan Zahavi > Design Technology Director > Mellanox Technologies LTD > Tel:+972-4-9097208 > Fax:+972-4-9593245 > P.O. Box 586 Yokneam 20692 ISRAEL > > > > -----Original Message----- > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > Sent: Tuesday, December 20, 2005 3:38 PM > > To: Yael Kalka; Eitan Zahavi > > Cc: openib-general at openib.org > > Subject: [PATCH] OpenSM: Extend default transaction timeout from 100 > msec to 1 > > second > > > > OpenSM: Extend default transaction timeout from 100 msec to 1 second. > > > > With the advent of long distance IB and software SMAs, 100 msec is no > > longer adequate as a default transaction timeout. Increase this to 1 > > second so that the default is sufficient in most common cases. > > > > Signed-off-by: Hal Rosenstock > > > > Index: include/opensm/osm_base.h > > =================================================================== > > --- include/opensm/osm_base.h (revision 4549) > > +++ include/opensm/osm_base.h (working copy) > > @@ -246,7 +246,7 @@ BEGIN_C_DECLS > > * > > * SYNOPSIS > > */ > > -#define OSM_DEFAULT_TRANS_TIMEOUT_MILLISEC 100 > > +#define OSM_DEFAULT_TRANS_TIMEOUT_MILLISEC 1000 > > /***********/ > > > > /****d* OpenSM: Base/OSM_DEFAULT_SUBNET_TIMEOUT > > Index: opensm/main.c > > =================================================================== > > --- opensm/main.c (revision 4549) > > +++ opensm/main.c (working copy) > > @@ -153,7 +153,7 @@ show_usage(void) > > " used for transaction timeouts.\n" > > " Specifying -t 0 disables timeouts.\n" > > " Without -t, OpenSM defaults to a timeout value > of\n" > > - " 100 milliseconds.\n\n" ); > > + " 1 second (1000 milliseconds).\n\n" ); > > printf( "-maxsmps \n" > > " This option specifies the number of VL15 SMP > MADs\n" > > " allowed on the wire at any one time.\n" From thomas.duffy.99 at alumni.brown.edu Tue Dec 20 13:52:57 2005 From: thomas.duffy.99 at alumni.brown.edu (Tom Duffy) Date: Tue, 20 Dec 2005 13:52:57 -0800 Subject: [openib-general] Re: [PATCH applied] return -ENOPROTOOPT on an unsupported socket option In-Reply-To: <20051219230510.GD2694@mellanox.co.il> References: <33038B33-7572-463C-B307-B5114E3243A0@alumni.brown.edu> <20051219230510.GD2694@mellanox.co.il> Message-ID: <5A2FDF41-8407-4C6A-A85E-F26E983C03E1@alumni.brown.edu> On Dec 19, 2005, at 3:05 PM, Michael S.
Tsirkin wrote: > Hmm. Which option do you have in mind, specifically? LINGER comes to mind. But something like SYNCNT, WINDOW_CLAMP, or QUICKACK doesn't make much sense on SDP, so why not just say we support it? > The right thing, in my eyes, is to emulate a TCP socket. > So we don't want to support options that TCP doesn't support. TCP supports many more options. Perhaps we should special case those. -tduffy From mst at mellanox.co.il Tue Dec 20 14:01:37 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 21 Dec 2005 00:01:37 +0200 Subject: [openib-general] Re: [PATCH applied] return -ENOPROTOOPT on an unsupported socket option In-Reply-To: <5A2FDF41-8407-4C6A-A85E-F26E983C03E1@alumni.brown.edu> References: <5A2FDF41-8407-4C6A-A85E-F26E983C03E1@alumni.brown.edu> Message-ID: <20051220220137.GE14598@mellanox.co.il> Quoting r. Tom Duffy : > Subject: Re: [PATCH applied] return -ENOPROTOOPT on an unsupported socket option > > > On Dec 19, 2005, at 3:05 PM, Michael S. Tsirkin wrote: > > Hmm. Which option do you have in mind, specifically? > > LINGER comes to mind. But something like SYNCNT, WINDOW_CLAMP, or > QUICKACK doesn't make much sense on SDP, so why not just say we > support it? > > > The right thing, in my eyes, is to emulate a TCP socket. > > So we don't want to support options that TCP doesn't support. > > TCP supports many more options. Perhaps we should special case those. OK, that's what I'll do longer term. I reverted this patch for now. -- MST
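To sketch what "special casing" the TCP options could look like -- this is a hypothetical outline, not the actual ib_sdp code, and sdp_set_nodelay is an assumed helper -- a setsockopt handler might implement the options SDP really supports, silently accept the TCP options that are harmless no-ops on SDP, and return -ENOPROTOOPT for the rest:

#include <linux/net.h>
#include <linux/tcp.h>
#include <linux/errno.h>

static int sdp_setsockopt(struct socket *sock, int level, int optname,
			  char __user *optval, int optlen)
{
	if (level != SOL_TCP)
		return -ENOPROTOOPT;

	switch (optname) {
	case TCP_NODELAY:
		/* has a real SDP analogue; assumed helper */
		return sdp_set_nodelay(sock, optval, optlen);
	case TCP_SYNCNT:
	case TCP_WINDOW_CLAMP:
	case TCP_QUICKACK:
		/* meaningless on SDP; claim success so TCP-oriented
		 * applications keep working unmodified */
		return 0;
	default:
		return -ENOPROTOOPT;
	}
}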
From rjwalsh at pathscale.com Tue Dec 20 16:00:16 2005 From: rjwalsh at pathscale.com (Robert Walsh) Date: Tue, 20 Dec 2005 16:00:16 -0800 Subject: [openib-general] Re: [PATCH 08/13] [RFC] ipath core last bit In-Reply-To: <20051217123856.d16529a5.akpm@osdl.org> References: <200512161548.3fqe3fMerrheBMdX@cisco.com> <200512161548.y9KRuNtfMzpZjwni@cisco.com> <20051217123856.d16529a5.akpm@osdl.org> Message-ID: <1135123216.13875.5.camel@hematite.internal.keyresearch.com> On Sat, 2005-12-17 at 12:38 -0800, Andrew Morton wrote: > Roland Dreier wrote: > > > > +EXPORT_SYMBOL(ipath_kset_linkstate); > > +EXPORT_SYMBOL(ipath_kset_mtu); > > +EXPORT_SYMBOL(ipath_layer_close); > > +EXPORT_SYMBOL(ipath_layer_get_bcast); > > +EXPORT_SYMBOL(ipath_layer_get_cr_errpkey); > > +EXPORT_SYMBOL(ipath_layer_get_deviceid); > > +EXPORT_SYMBOL(ipath_layer_get_flags); > > +EXPORT_SYMBOL(ipath_layer_get_guid); > > +EXPORT_SYMBOL(ipath_layer_get_ibmtu); > > etc > > EXPORT_SYMBOL_GPL? Hmm, well, nothing else in the infiniband directory uses this, probably because of the dual GPL/BSD license that all files in there have. For consistency, I'll leave it as EXPORT_SYMBOL, but I don't have any real problems with it either way. Regards, Robert. -- Robert Walsh Email: rjwalsh at pathscale.com PathScale, Inc. Phone: +1 650 934 8117 2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969 Mountain View, CA 94043 From yael at mellanox.co.il Wed Dec 21 05:29:49 2005 From: yael at mellanox.co.il (Yael Kalka) Date: Wed, 21 Dec 2005 15:29:49 +0200 Subject: [openib-general] RE: [PATCH] Opensm - fix segfault on exit - cont. Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30E24D6@mtlexch01.mtl.com> I solved the mystery (at least partially).... What is happening is that p_ur itself is already NULL. I will send a new patch that checks both the p_ur and the signal pointer. Yael -----Original Message----- From: Eitan Zahavi Sent: Tuesday, December 20, 2005 11:21 PM To: Greg Lindahl; openib-general at openib.org Subject: RE: [openib-general] RE: [PATCH] Opensm - fix segfault on exit - cont. Hi Greg, > > On Tue, Dec 20, 2005 at 09:11:12AM +0200, Yael Kalka wrote: > > > > - if (&p_ur->signal) > > > + if (&p_ur->signal != NULL) > > Aren't these 2 statements required to execute the same according to > the C standard? [EZ] We were puzzled too. But there is nothing stronger than seeing it happening. What could break our compiler? Hmmm. > > I wrote a tiny test program and gcc4.0.0 as distributed with Fedora > Core 3 generated identical assembly code for both. > > -- greg > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From yael at mellanox.co.il Wed Dec 21 05:29:57 2005 From: yael at mellanox.co.il (Yael Kalka) Date: 21 Dec 2005 15:29:57 +0200 Subject: [openib-general] Re[PATCH] Opensm - fix segfault on exit - cont. #2 Message-ID: <5z3bkmh92i.fsf@mtl066.yok.mtl.com> Hi Hal, As I wrote in the original thread - when the driver isn't loaded, the p_ur itself is NULL, so before trying to destroy the signal, we need to make sure that p_ur isn't NULL. If it is not NULL, this means it was initialized, and p_ur->signal should have a value. The following patch does this. Thanks, Yael Signed-off-by: Yael Kalka Index: libvendor/osm_vendor_ibumad.c =================================================================== --- libvendor/osm_vendor_ibumad.c (revision 4542) +++ libvendor/osm_vendor_ibumad.c (working copy) @@ -552,7 +552,7 @@ osm_vendor_delete( /* umad receiver thread ? */ p_ur = (*pp_vend)->receiver; - if (&p_ur->signal) + if ( p_ur ) cl_event_destroy( &p_ur->signal ); cl_spinlock_destroy( &(*pp_vend)->cb_lock ); cl_spinlock_destroy( &(*pp_vend)->match_tbl_lock ); From halr at voltaire.com Wed Dec 21 06:30:31 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Dec 2005 09:30:31 -0500 Subject: [openib-general] Re: Re[PATCH] Opensm - fix segfault on exit - cont.
#2 In-Reply-To: <5z3bkmh92i.fsf@mtl066.yok.mtl.com> References: <5z3bkmh92i.fsf@mtl066.yok.mtl.com> Message-ID: <1135175428.4328.65041.camel@hal.voltaire.com> On Wed, 2005-12-21 at 08:29, Yael Kalka wrote: > As I wrote in the original thread - when the driver isn't loaded, > the p_ur itself is NULL, so before trying to destroy the signal, > we need to make sure that p_ur isn't NULL. If it is not NULL, this > means it was initialized, and p_ur->signal should have a value. > The following patch does this. Thanks. Applied. From Thomas.Duffy.99 at alumni.brown.edu Wed Dec 21 18:15:59 2005 From: Thomas.Duffy.99 at alumni.brown.edu (Tom Duffy) Date: Wed, 21 Dec 2005 18:15:59 -0800 Subject: [openib-general] Fwd: SDP References: <43695595.8010709@sun.com> Message-ID: <56DECDA0-F445-45E2-B0E1-1A3CEF605AED@alumni.brown.edu> Nitin, I found this email stuck in the back of my inbox. I am sorry I haven't been able to get back to you. Perhaps somebody on the list can better answer your questions. Begin forwarded message: > From: Nitin Hande > Date: November 2, 2005 4:11:01 PM PST > To: Tom Duffy > Subject: Re: SDP > User-Agent: Mozilla Thunderbird 1.0.2 (X11/20050512) > > Hi Tom > > > Tom Duffy wrote: >> On Oct 4, 2005, at 4:38 PM, Nitin Hande wrote: >>> Tom >>> >>> I have a question. The current SDP implementation on OpenIB I >>> believe now creates a socket of AF_INET_SDP type? > While you have chosen AF_INET_SDP, what does this family imply? > How is it connected to the AF_INET family? (I can tell my reasoning: > While doing ARP, AF_INET_SDP relies on IPoIB to find the peer MAC > address. While IPoIB is an AF_INET family, there is some correlation > between AF_INET_SDP and the AF_INET family. A MAC address resolved > through AF_INET is accepted in AF_INET_SDP. This will not be > acceptable, say, for the AF_IPX family, if there was any reason to do it > like that). What I am trying to find out is: what is that relation? > > My second question is: rather than defining AF_INET_SDP, have you > thought about whether we can open an SDP connection with > socket(AF_INET, SOCK_STREAM, IPPROTO_SDP), which perhaps suits more > than AF_INET_SDP? (It is only the transport protocol that is > different, but the addresses actually belong to the AF_INET family). > > Your comments? > > Nitin From halr at voltaire.com Wed Dec 21 21:18:53 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Dec 2005 00:18:53 -0500 Subject: [openib-general] RE: A couple of questions about osm_lid_mgr.c::__osm_lid_mgr_set_physp_pi In-Reply-To: <1134992274.4328.32896.camel@hal.voltaire.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E3618B52@mtlexch01.mtl.com> <1134992274.4328.32896.camel@hal.voltaire.com> Message-ID: <1135228732.4328.73617.camel@hal.voltaire.com> On Mon, 2005-12-19 at 06:37, Hal Rosenstock wrote: > Hi Eitan, > > On Sun, 2005-12-18 at 14:20, Eitan Zahavi wrote: > > [EZ] Thanks. I have seen the patch. It is fine. > > Thanks. I just committed it. > > > > > > Also, why does changing the MTU require that the link be taken > > down ? > > > > > > > The behavior of the link when the neighbor MTU changes is not very > > well defined. > > > > So the best way to handle that is to force it down. > > > > > > NeighborMTU is not involved with the link negotiation nor is there a > > > comment in the description like OperationalVLs. What behavior are you > > > referring to ? > > > [EZ] I actually do not see any spec note about modifying neighbor MTU > > during link up. > > Yes, that was what I was saying.
> > > However, I remember we had to add this functionality. I > > > tried to dig this up in the old BitKeeper and found the first occurrence > > > of the setting of the port down in version 1.7. But the log does not say > > > why. > > > > Thanks for looking. This seems mysterious to me but I would hesitate to > > remove it even though I don't think it should be required, or if it is, > > some spec comment should be made. I would like to close the loop on this > > but don't see how. In looking at this area some more, is it even required for changing OperationalVLs ? The spec component description states "Changing OperationalVLs in certain PortStates may cause flow control update errors which may initiate Link/Phy retraining." So it sounds like the link might retrain (and go down) on its own if needed but the SM needn't take the link down. -- Hal From eitan at mellanox.co.il Wed Dec 21 23:42:16 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 22 Dec 2005 09:42:16 +0200 Subject: [openib-general] RE: A couple of questions about osm_lid_mgr.c::__osm_lid_mgr_set_physp_pi Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3618BB2@mtlexch01.mtl.com> Hi Hal, Yes, when you change OperationalVLs you can run into a situation where the link will retrain. But this retrain might happen anytime, when the watchdog timer expires. Instead of letting it surprise the SM (which thinks the link is armed or active) - I prefer taking the link down and then moving it to arm and active in an orderly manner. Another aspect is that the watchdog timer might expire much later in the bring-up process, causing traps to be injected such that OpenSM will perform extra sweeps which could be avoided. This "preference" of mine was developed over the bad experience of trying to rely on the watchdog timer in the past... EZ Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Thursday, December 22, 2005 7:19 AM > To: Eitan Zahavi > Cc: openib-general at openib.org; Yael Kalka > Subject: Re: [openib-general] RE: A couple of questions > about osm_lid_mgr.c::__osm_lid_mgr_set_physp_pi > > On Mon, 2005-12-19 at 06:37, Hal Rosenstock wrote: > > Hi Eitan, > > > > On Sun, 2005-12-18 at 14:20, Eitan Zahavi wrote: > > > [EZ] Thanks. I have seen the patch. It is fine. > > > > Thanks. I just committed it. > > > > > > > > Also, why does changing the MTU require that the link be taken > > > down ? > > > > > > > > > The behavior of the link when the neighbor MTU changes is not very > > > well defined. > > > > > So the best way to handle that is to force it down. > > > > > > > > NeighborMTU is not involved with the link negotiation nor is there a > > > > comment in the description like OperationalVLs. What behavior are you > > > > referring to ? > > > > > [EZ] I actually do not see any spec note about modifying neighbor MTU > > > during link up. > > > > Yes, that was what I was saying. > > > > > However, I remember we had to add this functionality. I > > > tried to dig this up in the old BitKeeper and found the first occurrence > > > of the setting of the port down in version 1.7. But the log does not say > > > why. > > > > Thanks for looking. This seems mysterious to me but I would hesitate to > > remove it even though I don't think it should be required, or if it is, > > some spec comment should be made. I would like to close the loop on this > > but don't see how.
> > In looking at this area some more, is it even required for changing > OperationalVLs ? The spec component description states "Changing > OperationalVLs in certain PortStates may cause flow control update > errors which may initiate Link/Phy retraining." So it sounds like the > link might retrain (and go down) on its own if needed but the SM needn't > take the link down. > > -- Hal From mst at mellanox.co.il Thu Dec 22 05:11:05 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 22 Dec 2005 15:11:05 +0200 Subject: [openib-general] [PATCH] mthca: error handling fixes Message-ID: <20051222131105.GQ23396@mellanox.co.il> Fix memory leaks in error handling on multicast group operations. Signed-off-by: Jack Morgenstein Signed-off-by: Michael S. Tsirkin Index: latest/drivers/infiniband/hw/mthca/mthca_mcg.c =================================================================== --- latest.orig/drivers/infiniband/hw/mthca/mthca_mcg.c +++ latest/drivers/infiniband/hw/mthca/mthca_mcg.c @@ -109,7 +109,8 @@ static int find_mgm(struct mthca_dev *de goto out; if (status) { mthca_err(dev, "READ_MGM returned status %02x\n", status); - return -EINVAL; + err = -EINVAL; + goto out; } if (!memcmp(mgm->gid, zero_gid, 16)) { @@ -151,8 +152,10 @@ int mthca_multicast_attach(struct ib_qp return PTR_ERR(mailbox); mgm = mailbox->buf; - if (down_interruptible(&dev->mcg_table.sem)) - return -EINTR; + if (down_interruptible(&dev->mcg_table.sem)) { + err = -EINTR; + goto err_sem; + } err = find_mgm(dev, gid->raw, mailbox, &hash, &prev, &index); if (err) @@ -233,6 +236,7 @@ int mthca_multicast_attach(struct ib_qp out: up(&dev->mcg_table.sem); + err_sem: mthca_free_mailbox(dev, mailbox); return err; } @@ -253,8 +257,10 @@ int mthca_multicast_detach(struct ib_qp return PTR_ERR(mailbox); mgm = mailbox->buf; - if (down_interruptible(&dev->mcg_table.sem)) - return -EINTR; + if (down_interruptible(&dev->mcg_table.sem)) { + err = -EINTR; + goto err_sem; + } err = find_mgm(dev, gid->raw, mailbox, &hash, &prev, &index); if (err) @@ -356,6 +362,7 @@ int mthca_multicast_detach(struct ib_qp out: up(&dev->mcg_table.sem); + err_sem: mthca_free_mailbox(dev, mailbox); return err; } -- MST From mst at mellanox.co.il Thu Dec 22 07:18:17 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 22 Dec 2005 17:18:17 +0200 Subject: [openib-general] [PATCH] ipoib: error handling fix Message-ID: <20051222151817.GB1011@mellanox.co.il> The following patch needs to be applied on top of the patch series I posted previously. --- Fix error handling in neigh_add_path: avoid calling list_del on an uninitialized neigh->list.
Signed-off-by: Eli Cohen Signed-off-by: Michael S. Tsirkin Index: openib/drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- openib.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2005-12-22 16:47:48.000000000 +0200 +++ openib/drivers/infiniband/ulp/ipoib/ipoib_main.c 2005-12-22 16:51:49.000000000 +0200 @@ -496,7 +496,7 @@ static void neigh_add_path(struct sk_buf path = path_rec_create(dev, (union ib_gid *) (skb->dst->neighbour->ha + 4)); if (!path) - goto err; + goto err_path; __path_add(dev, path); } @@ -527,6 +527,7 @@ static void neigh_add_path(struct sk_buf err: list_del(&neigh->list); +err_path: ipoib_neigh_free(neigh); ++priv->stats.tx_dropped; dev_kfree_skb_any(skb); -- MST From bboas at llnl.gov Thu Dec 22 09:08:40 2005 From: bboas at llnl.gov (Bill Boas) Date: Thu, 22 Dec 2005 09:08:40 -0800 Subject: [openib-general] Please join us at the 2006 Sonoma Workshop Feb 5-8 Message-ID: <6.2.3.4.2.20051222082405.033a4258@mail-lc.llnl.gov> Please join us at the February 5-8, 2006 Open IB 2006 Sonoma Workshop at the Lodge at Sonoma; see http://www.thelodgeatsonoma.com The group room block is now ready to receive individual call-in reservations to Marriott's toll-free number of 888-710-8008, and rooms may also be reserved at: General group room rate quoting code OPAOPAA use: http://marriott.com/property/propertypage/sfols?groupCode=opaopaa&app=resvlink For US Government badge holders quoting code OPAOPAG use: http://marriott.com/property/propertypage/sfols?groupCode=opaopag&app=resvlink The Workshop draft agenda, registration link and these hotel reservation links will also be available on the OpenIB web site soon. We will email you when these are working. For those wishing to watch the Superbowl we'll set up a bar and big screen again! For those who have a problem with these dates, we apologize, but the Lodge is becoming very popular and these are the best dates we could confirm for the number of us that we are hoping will attend this workshop. Please make your hotel reservations as soon as possible - the current cut-off date set by the Lodge for all group reservations is January 9, 2006. The charming historic town of Sonoma is about a 1 hour drive north of both San Francisco (SFO) and Oakland (OAK) Airports and about 2 hours from San Jose in the heart of the Wine Country. For those wishing additional space with their room, we have reserved 1 Lodge suite at $295 per night and one Cottage suite at $450 per night. There are two more of each available if needed.
Please contact Bill Boas if you wish these rooms as the Lodge will assign your name on receipt of your credit card guarantee. The Lodge looks forward to welcoming the Open IB Alliance back! If your members need additional hotel information please contact them. Very best wishes for the Season from Maureen Keesey Fuentes, Group Housing Coordinator, and Nedra Peterson, Senior Account Executive (707-935-8916) The Lodge at Sonoma 1325 Broadway Sonoma, CA 95476 www.thelodgeatsonoma.com Please contact me directly if you have Workshop suggestions or questions. Stay tuned to www.openib.org as the agenda and details come together. Bill. Bill Boas bboas at llnl.gov ICCD LLNL, B-453, R-2018 Wk: 925-422-4110 7000 East Ave, L-555 Cell: 925-337-2224 Livermore, CA 94551 Pgr: 877-203-2248 From sean.hefty at intel.com Thu Dec 22 11:34:24 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 22 Dec 2005 11:34:24 -0800 Subject: [openib-general] RFC MPI and app. requirements of OpenIB In-Reply-To: <43A759D2.6040101@ichips.intel.com> Message-ID: >I'm soliciting feedback from the MPI and other application developers regarding >which OpenIB APIs they will be targeting with their implementations. >Specifically, myself and some of the other IB developers are interested in >knowing if userspace applications will be written to the RDMA CMA interface, >the >IB CM API, or some other abstraction. To help clarify the trade-offs: The CMA allows the use of IP addressing for connection establishment and abstracts device hotplug. It also operates over any type of RDMA device. A disadvantage of using the CMA is that it may not select the best set of paths between two or more nodes. The IB CM also permits path failover on a single HCA. Use of the IB CM requires that clients also interface with the IB SA to obtain path records. My personal recommendation would be for applications to use the CMA, but that does result in losing some flexibility. - Sean From halr at voltaire.com Thu Dec 22 11:53:51 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Dec 2005 14:53:51 -0500 Subject: [openib-general] RFC MPI and app. requirements of OpenIB In-Reply-To: References: Message-ID: <1135281231.4328.81737.camel@hal.voltaire.com> On Thu, 2005-12-22 at 14:34, Sean Hefty wrote: > >I'm soliciting feedback from the MPI and other application developers regarding > >which OpenIB APIs they will be targeting with their implementations. > >Specifically, myself and some of the other IB developers are interested in > >knowing if userspace applications will be written to the RDMA CMA interface, > >the > >IB CM API, or some other abstraction. > > To help clarify the trade-offs: > > The CMA allows the use of IP addressing for connection establishment and > abstracts device hotplug. It also operates over any type of RDMA device. > > A disadvantage of using the CMA is that it may not select the best set of paths > between two or more nodes. What defines best ? Is this preference or disjointedness or something else ? Note path selection may be important in subnets when LMC > 0. > The IB CM also permits path failover on a single > HCA. Use of the IB CM requires that clients also interface with the IB SA to > obtain path records. Note that interaction with the SA will be required for MPI when multicast groups are to be used. > My personal recommendation would be for applications to use the CMA, but that > does result in losing some flexibility. Would the CMA ultimately support path failover ? -- Hal
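For readers unfamiliar with the CMA, here is a rough sketch of the IP-addressed connection setup being discussed (client side). This is not code from the openib tree; it follows the librdmacm API as it later stabilized, so names and signatures may not match the userspace CMA snapshot under discussion, and QP creation and cleanup are elided:

#include <stdio.h>
#include <rdma/rdma_cma.h>

static int wait_event(struct rdma_event_channel *ch,
		      enum rdma_cm_event_type expected)
{
	struct rdma_cm_event *event;

	if (rdma_get_cm_event(ch, &event))
		return -1;
	if (event->event != expected) {
		fprintf(stderr, "got event %d, wanted %d\n",
			event->event, expected);
		rdma_ack_cm_event(event);
		return -1;
	}
	return rdma_ack_cm_event(event);
}

int cma_connect(struct sockaddr *dst, struct rdma_conn_param *param)
{
	struct rdma_event_channel *ch;
	struct rdma_cm_id *id;

	ch = rdma_create_event_channel();
	if (!ch)
		return -1;
	if (rdma_create_id(ch, &id, NULL, RDMA_PS_TCP))
		return -1;

	/* IP address -> remote GID: this is the "IP addressing" step */
	if (rdma_resolve_addr(id, NULL, dst, 2000) ||
	    wait_event(ch, RDMA_CM_EVENT_ADDR_RESOLVED))
		return -1;

	/* GID -> path record; the CMA queries the SA on our behalf,
	 * which is where the "which path gets picked" question arises */
	if (rdma_resolve_route(id, 2000) ||
	    wait_event(ch, RDMA_CM_EVENT_ROUTE_RESOLVED))
		return -1;

	/* ... create and set up a QP on id->verbs here ... */

	if (rdma_connect(id, param) ||
	    wait_event(ch, RDMA_CM_EVENT_ESTABLISHED))
		return -1;
	return 0;
}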
From sean.hefty at intel.com Thu Dec 22 12:14:44 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 22 Dec 2005 12:14:44 -0800 Subject: [openib-general] RFC MPI and app. requirements of OpenIB In-Reply-To: <1135281231.4328.81737.camel@hal.voltaire.com> Message-ID: >> To help clarify the trade-offs: >> >> The CMA allows the use of IP addressing for connection establishment and >> abstracts device hotplug. It also operates over any type of RDMA device. >> >> A disadvantage of using the CMA is that it may not select the best set of >paths >> between two or more nodes. > >What defines best ? Is this preference or disjointedness or something >else ? I was intentionally vague here to leave this up to the application developer to define. The application may decide that a particular path or set of paths is better than another based on whatever criteria they choose. The current CMA provides less control over which paths are selected for connections than if the user queried the SA for paths and selected one based on some algorithm. (I'd be surprised if an app actually did this though.) >> The IB CM also permits path failover on a single >> HCA. Use of the IB CM requires that clients also interface with the IB SA to >> obtain path records. > >Note that interaction with the SA will be required for MPI when >multicast groups are to be used. An alternative is to provide UD and multicast/broadcast support in the CMA. I know that the Intel MPI runs over DAPL, which does not provide multicast support. Can MPI operate with unreliable multicast support? Does MPI plan on using IB multicast? >> My personal recommendation would be for applications to use the CMA, but that >> does result in losing some flexibility. > >Would the CMA ultimately support path failover ? Only if there's enough demand. Since IB failover is restricted to a single HCA, I can see where a more robust failover mechanism would be desirable. - Sean From bboas at llnl.gov Thu Dec 22 12:17:24 2005 From: bboas at llnl.gov (bboas at llnl.gov) Date: Thu, 22 Dec 2005 12:17:24 -0800 (PST) Subject: [openib-general] Please register for Sonoma Workshop Message-ID: <31093253.1135282644296.JavaMail.SYSTEM@acteva-web-01> You may register at www.acteva.com/go/rdma Thank you ========================================================================== ** If you *do not* wish to receive these emails, please forward or send a message to ** ** removeme at acteva.com, and we'll remove you from the private list. ** ========================================================================== From lindahl at pathscale.com Thu Dec 22 12:20:41 2005 From: lindahl at pathscale.com (Greg Lindahl) Date: Thu, 22 Dec 2005 12:20:41 -0800 Subject: [openib-general] RFC MPI and app. requirements of OpenIB In-Reply-To: References: <1135281231.4328.81737.camel@hal.voltaire.com> Message-ID: <20051222202041.GA2006@greglaptop.internal.keyresearch.com> On Thu, Dec 22, 2005 at 12:14:44PM -0800, Sean Hefty wrote: > Can MPI operate with unreliable multicast support? Does MPI plan on > using IB multicast? Given the large number of MPI implementations over IB, I don't think there's a single answer.
-- greg From parks at lanl.gov Thu Dec 22 11:52:22 2005 From: parks at lanl.gov (Parks Fields) Date: Thu, 22 Dec 2005 12:52:22 -0700 Subject: [openib-general] Please join us at the 2006 Sonoma Workshop Feb 5-8 In-Reply-To: <6.2.3.4.2.20051222082405.033a4258@mail-lc.llnl.gov> References: <6.2.3.4.2.20051222082405.033a4258@mail-lc.llnl.gov> Message-ID: <6.2.3.4.2.20051222125107.03b1f068@ccn-mail.lanl.gov> So is the main part of the program 2/6-2/8, closing at 5:00? From mamidala at cse.ohio-state.edu Thu Dec 22 13:38:44 2005 From: mamidala at cse.ohio-state.edu (amith rajith mamidala) Date: Thu, 22 Dec 2005 16:38:44 -0500 (EST) Subject: [openib-general] RFC MPI and app. requirements of OpenIB In-Reply-To: Message-ID: > An alternative is to provide UD and multicast/broadcast support in the CMA. I > know that the Intel MPI runs over DAPL, which does not provide multicast > support. Can MPI operate with unreliable multicast support? Does MPI plan on > using IB multicast? Yes, MPI can operate with unreliable multicast support. MVAPICH-0.9.6 has this broadcast support over IB multicast. As Hal suggested earlier, application processes interact with the SA to create/join multicast groups. Thanks, Amith > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From trimmer at silverstorm.com Thu Dec 22 13:54:17 2005 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Thu, 22 Dec 2005 16:54:17 -0500 Subject: [openib-general] RFC MPI and app. requirements of OpenIB Message-ID: <5D78D28F88822E4D8702BB9EEF1A4367D12BF3@mercury.infiniconsys.com> Since the SA views multicast operations at a node level, and applications need to individually operate on multicast groups, it is appropriate for the core stack to provide some multicast management APIs (see the sketch below). These would multiplex requests from all the applications on a node to make the appropriate requests to the SA. Reference counts would need to be maintained in the core stack for each MC group so that the node would remove itself from the group only on the last application exit/unregister. This would also be a good place to handle the "MC group persistence issues", namely rejoining requested groups when ports go up/down, SMs change (client reregister), etc. Todd Rimmer Chief Systems Architect SilverStorm Technologies Voice: 610-233-4852 Fax: 610-233-4777 TRimmer at SilverStorm.com www.SilverStorm.com
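As a rough sketch of the node-level multiplexing Todd describes -- invented names, not an existing openib-core interface; sa_send_join/sa_send_leave stand in for the real SA MCMemberRecord transactions, and locking is omitted -- the core stack could keep one refcounted object per MGID and only talk to the SA on the 0->1 and 1->0 transitions:

#include <linux/list.h>
#include <linux/slab.h>
#include <linux/errno.h>
#include <linux/string.h>
#include <rdma/ib_verbs.h>	/* union ib_gid */

struct mcast_group {
	union ib_gid	 mgid;
	int		 refcount;
	struct list_head list;
};

static LIST_HEAD(mcast_groups);

static struct mcast_group *find_group(union ib_gid *mgid)
{
	struct mcast_group *g;

	list_for_each_entry(g, &mcast_groups, list)
		if (!memcmp(&g->mgid, mgid, sizeof *mgid))
			return g;
	return NULL;
}

int mcast_join(union ib_gid *mgid)
{
	struct mcast_group *g = find_group(mgid);

	if (!g) {
		g = kzalloc(sizeof *g, GFP_KERNEL);
		if (!g)
			return -ENOMEM;
		g->mgid = *mgid;
		list_add(&g->list, &mcast_groups);
	}
	if (g->refcount++ == 0)
		return sa_send_join(mgid);	/* first user on this node */
	return 0;				/* already a member */
}

void mcast_leave(union ib_gid *mgid)
{
	struct mcast_group *g = find_group(mgid);

	if (g && --g->refcount == 0) {
		sa_send_leave(mgid);		/* last user just left */
		list_del(&g->list);
		kfree(g);
	}
}

A client-reregister event would then just walk mcast_groups and rejoin every group with a nonzero refcount, which also covers the persistence cases Todd mentions.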
> -----Original Message----- > From: amith rajith mamidala [mailto:mamidala at cse.ohio-state.edu] > Sent: Thursday, December 22, 2005 4:39 PM > To: Sean Hefty > Cc: openib > Subject: RE: [openib-general] RFC MPI and app. requirements of OpenIB > > > > > An alternative is to provide UD and multicast/broadcast > support in the CMA. I > > know that the Intel MPI runs over DAPL, which does not > provide multicast > > support. Can MPI operate with unreliable multicast > support? Does MPI plan on > > using IB multicast? > > Yes, MPI can operate with unreliable multicast support. > MVAPICH-0.9.6 has this broadcast support over IB multicast. As Hal > suggested earlier, application processes interact with the SA to > create/join > multicast groups. > > Thanks, > Amith > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From arlin.r.davis at intel.com Thu Dec 22 15:20:29 2005 From: arlin.r.davis at intel.com (Arlin Davis) Date: Thu, 22 Dec 2005 15:20:29 -0800 Subject: [openib-general] [RFC][PATCH] OpenIB uDAPL extension proposal - sample immed data and atomic api's Message-ID: James and Arkady, DAPL provides a generalized abstraction to RDMA capable transports. As a generalized abstraction, it cannot exploit the unique properties that many of the underlying platforms/interconnects can provide, so I would like to propose a simple (minimum impact on libdat) extensible interface to uDAPL that will allow vendors to expose such capabilities. I am looking for feedback, especially from the DAT collaborative. I have included both a design document and actual working code as a reference. The patch provides a fully tested DAT and DAPL library (openib_cma) set with the following provider extensions: DAT_RETURN dat_ep_post_write_immed( IN DAT_EP_HANDLE ep_handle, IN DAT_COUNT num_segments, IN DAT_LMR_TRIPLET *local_iov, IN DAT_DTO_COOKIE user_cookie, IN DAT_RMR_TRIPLE *remote_iov, IN DAT_UINT32 immediate_data, IN DAT_COMPLETION_FLAGS completion_flags); DAT_RETURN dat_ep_post_cmp_and_swap( IN DAT_EP_HANDLE ep_handle, IN DAT_UINT64 cmp_value, IN DAT_UINT64 swap_value, IN DAT_LMR_TRIPLE *local_iov, IN DAT_DTO_COOKIE user_cookie, IN DAT_RMR_TRIPLE *remote_iov, IN DAT_COMPLETION_FLAGS completion_flags); DAT_RETURN dat_ep_post_fetch_and_add( IN DAT_EP_HANDLE ep_handle, IN DAT_UINT64 add_value, IN DAT_LMR_TRIPLE *local_iov, IN DAT_DTO_COOKIE user_cookie, IN DAT_RMR_TRIPLE *remote_iov, IN DAT_COMPLETION_FLAGS completion_flags); Also included is a sample program (dtest_ext.c) that can be used as a programming example.
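For readers who want the shape of a call before digging into dtest_ext.c, here is a hypothetical sequence for the fetch-and-add extension, based only on the signatures above (it is not an excerpt from dtest_ext.c, it assumes the patched headers that declare the extension, and it assumes ep is a connected endpoint whose iovs describe registered 8-byte buffers):

DAT_RETURN do_fetch_and_add(DAT_EP_HANDLE ep,
                            DAT_LMR_TRIPLET *local_iov,
                            DAT_RMR_TRIPLE *remote_iov)
{
	DAT_DTO_COOKIE cookie;
	DAT_RETURN ret;

	cookie.as_64 = 0x1234;	/* handed back in the DTO completion */

	/* atomically add 1 to the remote 64-bit value; the previous
	 * value lands in the 8-byte buffer local_iov describes */
	ret = dat_ep_post_fetch_and_add(ep, 1, local_iov, cookie,
					remote_iov,
					DAT_COMPLETION_DEFAULT_FLAG);
	if (ret != DAT_SUCCESS)
		return ret;

	/* the completion is then reaped from the EP's request EVD
	 * with dat_evd_wait()/dat_evd_dequeue() as for any DTO */
	return DAT_SUCCESS;
}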
Thanks, -arlin Signed-off by: Arlin Davis Index: test/dtest/dat.conf =================================================================== --- test/dtest/dat.conf (revision 4589) +++ test/dtest/dat.conf (working copy) @@ -1,11 +1,20 @@ # -# DAT 1.1 and 1.2 configuration file +# DAT 1.2 configuration file # # Each entry should have the following fields: # # \ # # -# Example for openib using the first Mellanox adapter, port 1 and port 2 +# Example for openib_cma and openib_scm +# +# For scm version you specify as actual device name and port +# For cma version you specify as: +# network address, network hostname, or netdev name and 0 for port +# +OpenIB-scm1 u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "mthca0 1" "" +OpenIB-scm2 u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "mthca0 2" "" +OpenIB-cma-ip u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "192.168.0.22 0" "" +OpenIB-cma-name u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "svr1-ib0 0" "" +OpenIB-cma-netdev u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "ib0 0" "" -IB1 u1.2 nonthreadsafe default Index: test/dtest/makefile =================================================================== --- test/dtest/makefile (revision 4589) +++ test/dtest/makefile (working copy) @@ -4,13 +4,18 @@ CFLAGS = -O2 -g DAT_INC = ../../dat/include DAT_LIB = /usr/local/lib -all: dtest +all: dtest dtest_ext clean: - rm -f *.o;touch *.c;rm -f dtest + rm -f *.o;touch *.c;rm -f dtest dtest_ext dtest: ./dtest.c $(CC) $(CFLAGS) ./dtest.c -o dtest \ -DDAPL_PROVIDER='"OpenIB-cma-ip"' \ -I $(DAT_INC) -L $(DAT_LIB) -ldat +dtest_ext: ./dtest_ext.c + $(CC) $(CFLAGS) ./dtest_ext.c -o dtest_ext \ + -DDAPL_PROVIDER='"OpenIB-cma-ip"' \ + -I $(DAT_INC) -L $(DAT_LIB) -ldat + Index: test/dtest/README =================================================================== --- test/dtest/README (revision 4589) +++ test/dtest/README (working copy) @@ -1,10 +1,11 @@ simple dapl test just for initial openIB uDAPL testing... dtest/dtest.c + dtest/dtest_ext.c dtest/makefile dtest/dat.conf -to build (default uDAPL name == IB1, ib device == mthca0, port == 1) +to build (default uDAPL name == OpenIB-cma-ip) edit makefile and change path (DAT_LIB) to appropriate libdat.so edit dat.conf and change path to appropriate libdapl.so cp dat.conf to /etc/dat.conf Index: dapl/include/dapl.h =================================================================== --- dapl/include/dapl.h (revision 4589) +++ dapl/include/dapl.h (working copy) @@ -1,25 +1,28 @@ /* - * Copyright (c) 2002-2003, Network Appliance, Inc. All rights reserved. + * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved. * * This Software is licensed under one of the following licenses: - * + * * 1) under the terms of the "Common Public License 1.0" a copy of which is - * available from the Open Source Initiative, see + * in the file LICENSE.txt in the root directory. The license is also + * available from the Open Source Initiative, see * http://www.opensource.org/licenses/cpl.php. - * - * 2) under the terms of the "The BSD License" a copy of which is - * available from the Open Source Initiative, see + * + * 2) under the terms of the "The BSD License" a copy of which is in the file + * LICENSE2.txt in the root directory. 
The license is also available from + * the Open Source Initiative, see * http://www.opensource.org/licenses/bsd-license.php. - * + * * 3) under the terms of the "GNU General Public License (GPL) Version 2" a - * copy of which is available from the Open Source Initiative, see + * copy of which is in the file LICENSE3.txt in the root directory. The + * license is also available from the Open Source Initiative, see * http://www.opensource.org/licenses/gpl-license.php. - * + * * Licensee has the right to choose one of the above licenses. - * + * * Redistributions of source code must retain the above copyright * notice and one of the license notices. - * + * * Redistributions in binary form must reproduce both the above copyright * notice, one of the license notices in the documentation * and/or other materials provided with the distribution. @@ -61,6 +64,8 @@ #include "dapl_dummy_util.h" #elif OPENIB #include "dapl_ib_util.h" +#elif DET +#include "dapl_det_util.h" #endif /********************************************************************* @@ -213,6 +218,10 @@ typedef struct dapl_cookie DAPL_COOKIE; typedef struct dapl_dto_cookie DAPL_DTO_COOKIE; typedef struct dapl_rmr_cookie DAPL_RMR_COOKIE; +#ifdef DAPL_EXTENSIONS +typedef struct dapl_ext_cookie DAPL_EXT_COOKIE; +#endif + typedef struct dapl_private DAPL_PRIVATE; typedef void (*DAPL_CONNECTION_STATE_HANDLER) ( @@ -563,6 +572,13 @@ typedef enum dapl_dto_type DAPL_DTO_TYPE_RECV, DAPL_DTO_TYPE_RDMA_WRITE, DAPL_DTO_TYPE_RDMA_READ, +#ifdef DAPL_EXTENSIONS + DAPL_DTO_TYPE_RDMA_WRITE_IMMED, + DAPL_DTO_TYPE_RECV_IMMED, + DAPL_DTO_TYPE_CMP_AND_SWAP, + DAPL_DTO_TYPE_FETCH_AND_ADD, +#endif + } DAPL_DTO_TYPE; typedef enum dapl_cookie_type @@ -570,6 +586,9 @@ typedef enum dapl_cookie_type DAPL_COOKIE_TYPE_NULL, DAPL_COOKIE_TYPE_DTO, DAPL_COOKIE_TYPE_RMR, +#ifdef DAPL_EXTENSIONS + DAPL_COOKIE_TYPE_EXTENSION, +#endif } DAPL_COOKIE_TYPE; /* DAPL_DTO_COOKIE used as context for DTO WQEs */ @@ -587,6 +606,27 @@ struct dapl_rmr_cookie DAT_RMR_COOKIE cookie; }; +#ifdef DAPL_EXTENSIONS + +/* DAPL extended cookie types */ +typedef enum dapl_ext_type +{ + DAPL_EXT_TYPE_RDMA_WRITE_IMMED, + DAPL_EXT_TYPE_CMP_AND_SWAP, + DAPL_EXT_TYPE_FETCH_AND_ADD, + DAPL_EXT_TYPE_RECV +} DAPL_EXT_TYPE; + +/* DAPL extended cookie */ +struct dapl_ext_cookie +{ + DAPL_EXT_TYPE type; + DAT_DTO_COOKIE cookie; + DAT_COUNT size; /* used RDMA write with immed */ +}; + +#endif + /* DAPL_COOKIE used as context for WQEs */ struct dapl_cookie { @@ -597,6 +637,9 @@ struct dapl_cookie { DAPL_DTO_COOKIE dto; DAPL_RMR_COOKIE rmr; +#ifdef DAPL_EXTENSIONS + DAPL_EXT_COOKIE ext; +#endif } val; }; @@ -1116,6 +1159,15 @@ extern DAT_RETURN dapl_srq_set_lw( IN DAT_SRQ_HANDLE, /* srq_handle */ IN DAT_COUNT); /* low_watermark */ +#ifdef DAPL_EXTENSIONS + +extern DAT_RETURN dapl_extensions( + IN DAT_HANDLE, /* dat_handle */ + IN DAT_EXT_OP, /* extension operation */ + IN va_list ); /* va_list args */ + +#endif + /* * DAPL internal utility function prototpyes */ Index: dapl/udapl/Makefile =================================================================== --- dapl/udapl/Makefile (revision 4589) +++ dapl/udapl/Makefile (working copy) @@ -156,6 +156,7 @@ PROVIDER = $(TOPDIR)/../openib_cma CFLAGS += -DOPENIB CFLAGS += -DCQ_WAIT_OBJECT CFLAGS += -I/usr/local/include/infiniband +CFLAGS += -I/usr/local/include/rdma endif # @@ -168,6 +169,12 @@ endif # VN_MEM_SHARED_VIRTUAL_SUPPORT # CFLAGS += -DVN_MEM_SHARED_VIRTUAL_SUPPORT=1 +# If an implementation supports DAPL extensions +CFLAGS += -DDAPL_EXTENSIONS + +# If an 
implementation supports DAPL provider specific attributes +CFLAGS += -DDAPL_PROVIDER_SPECIFIC_ATTR + CFLAGS += -I. CFLAGS += -I.. CFLAGS += -I../../dat/include @@ -283,6 +290,8 @@ LDFLAGS += -libverbs -lrdmacm LDFLAGS += -rpath /usr/local/lib -L /usr/local/lib PROVIDER_SRCS = dapl_ib_util.c dapl_ib_cq.c dapl_ib_qp.c \ dapl_ib_cm.c dapl_ib_mem.c +# implementation supports DAPL extensions +PROVIDER_SRCS += dapl_ib_extensions.c endif UDAPL_SRCS = dapl_init.c \ Index: dapl/common/dapl_ia_query.c =================================================================== --- dapl/common/dapl_ia_query.c (revision 4589) +++ dapl/common/dapl_ia_query.c (working copy) @@ -167,6 +167,14 @@ dapl_ia_query ( #if !defined(__KDAPL__) provider_attr->pz_support = DAT_PZ_UNIQUE; #endif /* !KDAPL */ + + /* + * Have provider set their own. + */ +#ifdef DAPL_PROVIDER_SPECIFIC_ATTR + dapls_set_provider_specific_attr(provider_attr); +#endif + /* * Set up evd_stream_merging_supported options. Note there is * one bit per allowable combination, using the ordinal Index: dapl/common/dapl_adapter_util.h =================================================================== --- dapl/common/dapl_adapter_util.h (revision 4589) +++ dapl/common/dapl_adapter_util.h (working copy) @@ -256,6 +256,21 @@ dapls_ib_wait_object_wait ( IN u_int32_t timeout); #endif +#ifdef DAPL_PROVIDER_SPECIFIC_ATTR +void +dapls_set_provider_specific_attr( + IN DAT_PROVIDER_ATTR *provider_attr ); +#endif + +#ifdef DAPL_EXTENSIONS +void +dapls_cqe_to_event_extension( + IN DAPL_EP *ep_ptr, + IN DAPL_COOKIE *cookie, + IN ib_work_completion_t *cqe_ptr, + OUT DAT_EVENT *event_ptr); +#endif + /* * Values for provider DAT_NAMED_ATTR */ @@ -272,6 +287,8 @@ dapls_ib_wait_object_wait ( #include "dapl_dummy_dto.h" #elif OPENIB #include "dapl_ib_dto.h" +#elif DET +#include "dapl_det_dto.h" #endif Index: dapl/common/dapl_provider.c =================================================================== --- dapl/common/dapl_provider.c (revision 4589) +++ dapl/common/dapl_provider.c (working copy) @@ -221,7 +221,11 @@ DAT_PROVIDER g_dapl_provider_template = &dapl_srq_post_recv, &dapl_srq_query, &dapl_srq_resize, - &dapl_srq_set_lw + &dapl_srq_set_lw, + +#ifdef DAPL_EXTENSIONS + &dapl_extensions +#endif }; #endif /* __KDAPL__ */ Index: dapl/common/dapl_evd_util.c =================================================================== --- dapl/common/dapl_evd_util.c (revision 4589) +++ dapl/common/dapl_evd_util.c (working copy) @@ -502,6 +502,20 @@ dapli_evd_eh_print_cqe ( #ifdef DAPL_DBG static char *optable[] = { +#ifdef OPENIB + /* different order for openib verbs */ + "OP_RDMA_WRITE", + "OP_RDMA_WRITE_IMM", + "OP_SEND", + "OP_SEND_IMM", + "OP_RDMA_READ", + "OP_COMP_AND_SWAP", + "OP_FETCH_AND_ADD", + "OP_RECEIVE", + "OP_RECEIVE_IMM", + "OP_BIND_MW", + "OP_INVALID", +#else "OP_SEND", "OP_RDMA_READ", "OP_RDMA_WRITE", @@ -509,6 +523,7 @@ dapli_evd_eh_print_cqe ( "OP_FETCH_AND_ADD", "OP_RECEIVE", "OP_BIND_MW", +#endif 0 }; @@ -1113,6 +1128,15 @@ dapli_evd_cqe_to_event ( dapls_cookie_dealloc (&ep_ptr->req_buffer, cookie); break; } + +#ifdef DAPL_EXTENSIONS + case DAPL_COOKIE_TYPE_EXTENSION: + { + dapls_cqe_to_event_extension(ep_ptr, cookie, cqe_ptr, event_ptr); + break; + } +#endif + default: { dapl_os_assert (!"Invalid Operation type"); Index: dapl/openib_cma/dapl_ib_dto.h =================================================================== --- dapl/openib_cma/dapl_ib_dto.h (revision 4589) +++ dapl/openib_cma/dapl_ib_dto.h (working copy) @@ -35,7 +35,7 @@ * * Description: * - * The 
uDAPL openib provider - DTO operations and CQE macros
+ * The OpenIB uCMA provider - DTO operations and CQE macros
 *
 ****************************************************************************
 *    Source Control System Information
@@ -119,7 +119,6 @@ dapls_ib_post_recv (
 	return DAT_SUCCESS;
 }
 
-
 /*
  * dapls_ib_post_send
  *
@@ -191,7 +190,7 @@ dapls_ib_post_send (
 	if (cookie != NULL)
 		cookie->val.dto.size = total_len;
-	
+
 	if ((op_type == OP_RDMA_WRITE) || (op_type == OP_RDMA_READ)) {
 		wr.wr.rdma.remote_addr = remote_iov->target_address;
 		wr.wr.rdma.rkey = remote_iov->rmr_context;
@@ -224,6 +223,152 @@ dapls_ib_post_send (
 	return DAT_SUCCESS;
 }
 
+#ifdef DAPL_EXTENSIONS
+/*
+ * dapls_ib_post_ext_send
+ *
+ * Provider specific extended Post SEND function
+ */
+STATIC _INLINE_ DAT_RETURN
+dapls_ib_post_ext_send (
+	IN  DAPL_EP			*ep_ptr,
+	IN  ib_send_op_type_t		op_type,
+	IN  DAPL_COOKIE			*cookie,
+	IN  DAT_COUNT			segments,
+	IN  DAT_LMR_TRIPLET		*local_iov,
+	IN  const DAT_RMR_TRIPLET	*remote_iov,
+	IN  DAT_UINT32			idata,
+	IN  DAT_UINT64			compare_add,
+	IN  DAT_UINT64			swap,
+	IN  DAT_COMPLETION_FLAGS	completion_flags)
+{
+	ib_data_segment_t ds_array[DEFAULT_DS_ENTRIES];
+	ib_data_segment_t *ds_array_p;
+	struct ibv_send_wr wr;
+	struct ibv_send_wr *bad_wr;
+	ib_hca_transport_t *ibt_ptr =
+		&ep_ptr->header.owner_ia->hca_ptr->ib_trans;
+	DAT_COUNT i, total_len;
+
+	dapl_dbg_log(DAPL_DBG_TYPE_EP,
+		     " post_snd: ep %p op %d ck %p sgs "
+		     "%d l_iov %p r_iov %p f %d\n",
+		     ep_ptr, op_type, cookie, segments, local_iov,
+		     remote_iov, completion_flags);
+
+	dapl_dbg_log(DAPL_DBG_TYPE_EP,
+		     " post_snd: ep %p cookie %p segs %d l_iov %p\n",
+		     ep_ptr, cookie, segments, local_iov);
+
+	if (segments <= DEFAULT_DS_ENTRIES)
+		ds_array_p = ds_array;
+	else
+		ds_array_p =
+			dapl_os_alloc(segments * sizeof(ib_data_segment_t));
+
+	if (NULL == ds_array_p)
+		return (DAT_INSUFFICIENT_RESOURCES);
+
+	/* setup the work request */
+	wr.next = 0;
+	wr.opcode = op_type;
+	wr.num_sge = 0;
+	wr.send_flags = 0;
+	wr.wr_id = (uint64_t)(uintptr_t)cookie;
+	wr.sg_list = ds_array_p;
+	total_len = 0;
+
+	for (i = 0; i < segments; i++ ) {
+		if ( !local_iov[i].segment_length )
+			continue;
+
+		ds_array_p->addr = (uint64_t) local_iov[i].virtual_address;
+		ds_array_p->length = local_iov[i].segment_length;
+		ds_array_p->lkey = local_iov[i].lmr_context;
+
+		dapl_dbg_log(DAPL_DBG_TYPE_EP,
+			     " post_snd: lkey 0x%x va %p len %d\n",
+			     ds_array_p->lkey, ds_array_p->addr,
+			     ds_array_p->length );
+
+		total_len += ds_array_p->length;
+		wr.num_sge++;
+		ds_array_p++;
+	}
+
+	if (cookie != NULL)
+		cookie->val.dto.size = total_len;
+
+	if ((op_type == OP_RDMA_WRITE) ||
+	    (op_type == OP_RDMA_WRITE_IMM) ||
+	    (op_type == OP_RDMA_READ)) {
+		wr.wr.rdma.remote_addr = remote_iov->target_address;
+		wr.wr.rdma.rkey = remote_iov->rmr_context;
+		dapl_dbg_log(DAPL_DBG_TYPE_EP,
+			     " post_snd_rdma: rkey 0x%x va %#016Lx\n",
+			     wr.wr.rdma.rkey, wr.wr.rdma.remote_addr);
+	}
+
+	switch (op_type) {
+	case OP_RDMA_WRITE_IMM:
+		dapl_dbg_log(DAPL_DBG_TYPE_EP,
+			     " post_snd: OP_RDMA_WRITE_IMMED=0x%x\n", idata );
+		wr.imm_data = idata;
+		break;
+	case OP_COMP_AND_SWAP:
+		/* OP_COMP_AND_SWAP has direct IBAL wr_type mapping */
+		dapl_dbg_log(DAPL_DBG_TYPE_EP,
+			     " post_snd: OP_COMP_AND_SWAP=%lx,"
+			     "%lx rkey 0x%x va %#016Lx\n",
+			     compare_add, swap, remote_iov->rmr_context,
+			     remote_iov->target_address);
+
+		wr.wr.atomic.compare_add = compare_add;
+		wr.wr.atomic.swap = swap;
+		wr.wr.atomic.remote_addr = remote_iov->target_address;
+		wr.wr.atomic.rkey = remote_iov->rmr_context;
+		break;
+	case OP_FETCH_AND_ADD:
+		/* OP_FETCH_AND_ADD has direct IBAL wr_type mapping */
+		dapl_dbg_log(DAPL_DBG_TYPE_EP,
+			     " post_snd: OP_FETCH_AND_ADD=%lx"
+			     " rkey 0x%x va %#016Lx\n",
+			     compare_add, remote_iov->rmr_context,
+			     remote_iov->target_address);
+
+		wr.wr.atomic.compare_add = compare_add;
+		wr.wr.atomic.remote_addr = remote_iov->target_address;
+		wr.wr.atomic.rkey = remote_iov->rmr_context;
+		break;
+	default:
+		break;
+	}
+
+	/* inline data for send or write ops */
+	if ((total_len <= ibt_ptr->max_inline_send) &&
+	    ((op_type == OP_SEND) || (op_type == OP_RDMA_WRITE)))
+		wr.send_flags |= IBV_SEND_INLINE;
+
+	/* set completion flags in work request */
+	wr.send_flags |= (DAT_COMPLETION_SUPPRESS_FLAG &
			  completion_flags) ? 0 : IBV_SEND_SIGNALED;
+	wr.send_flags |= (DAT_COMPLETION_BARRIER_FENCE_FLAG &
			  completion_flags) ? IBV_SEND_FENCE : 0;
+	wr.send_flags |= (DAT_COMPLETION_SOLICITED_WAIT_FLAG &
			  completion_flags) ? IBV_SEND_SOLICITED : 0;
+
+	dapl_dbg_log(DAPL_DBG_TYPE_EP,
		     " post_snd: op 0x%x flags 0x%x sglist %p, %d\n",
		     wr.opcode, wr.send_flags, wr.sg_list, wr.num_sge);
+
+	if (ibv_post_send(ep_ptr->qp_handle->cm_id->qp, &wr, &bad_wr))
+		return( dapl_convert_errno(EFAULT,"ibv_send") );
+
+	dapl_dbg_log(DAPL_DBG_TYPE_EP," post_snd: returned\n");
+	return DAT_SUCCESS;
+}
+#endif
+
 STATIC _INLINE_ DAT_RETURN
 dapls_ib_optional_prv_dat(
	IN  DAPL_CR		*cr_ptr,
Index: dapl/openib_cma/dapl_ib_util.c
===================================================================
--- dapl/openib_cma/dapl_ib_util.c	(revision 4589)
+++ dapl/openib_cma/dapl_ib_util.c	(working copy)
@@ -35,7 +35,7 @@
 *
 * Description:
 *
- * The uDAPL openib provider - init, open, close, utilities, work thread
+ * The OpenIB uCMA provider - init, open, close, utilities, work thread
 *
 ****************************************************************************
 *   Source Control System Information
@@ -64,7 +64,6 @@ static const char rcsid[] = "$Id:  $";
 #include		/* for struct ifreq */
 #include		/* for ARPHRD_INFINIBAND */
 
-
 int g_dapl_loopback_connection = 0;
 int g_ib_pipe[2];
 ib_thread_state_t g_ib_thread_state = 0;
@@ -727,7 +726,7 @@ void dapli_thread(void *arg)
	int		ret,idx,fds;
	char		rbuf[2];
	
-	dapl_dbg_log (DAPL_DBG_TYPE_CM,
+	dapl_dbg_log (DAPL_DBG_TYPE_UTIL,
		      " ib_thread(%d,0x%x): ENTER: pipe %d ucma %d\n",
		      getpid(), g_ib_thread, g_ib_pipe[0], rdma_get_fd());
 
@@ -767,7 +766,7 @@ void dapli_thread(void *arg)
			ufds[idx].revents = 0;
			uhca[idx] = hca;
 
-			dapl_dbg_log(DAPL_DBG_TYPE_CM,
+			dapl_dbg_log(DAPL_DBG_TYPE_UTIL,
				     " ib_thread(%d) poll_fd: hca[%d]=%p, async=%d"
				     " pipe=%d cm=%d cq=d\n",
				     getpid(), hca, ufds[idx-1].fd,
@@ -783,14 +782,14 @@ void dapli_thread(void *arg)
		dapl_os_unlock(&g_hca_lock);
		ret = poll(ufds, fds, -1);
		if (ret <= 0) {
-			dapl_dbg_log(DAPL_DBG_TYPE_WARN,
+			dapl_dbg_log(DAPL_DBG_TYPE_UTIL,
				     " ib_thread(%d): ERR %s poll\n",
				     getpid(),strerror(errno));
			dapl_os_lock(&g_hca_lock);
			continue;
		}
 
-		dapl_dbg_log(DAPL_DBG_TYPE_CM,
+		dapl_dbg_log(DAPL_DBG_TYPE_UTIL,
			     " ib_thread(%d) poll_event: "
			     " async=0x%x pipe=0x%x cm=0x%x cq=0x%x\n",
			     getpid(), ufds[idx-1].revents, ufds[0].revents,
@@ -834,3 +833,63 @@ void dapli_thread(void *arg)
	dapl_os_unlock(&g_hca_lock);
 }
 
+#ifdef DAPL_PROVIDER_SPECIFIC_ATTR
+/*
+ * dapls_set_provider_specific_attr
+ *
+ * Input:
+ *	attr_ptr	Pointer to provider attributes
+ *
+ * Output:
+ *	none
+ *
+ * Returns:
+ *	void
+ */
+DAT_NAMED_ATTR	ib_attrs[] = {
+
+#ifdef DAPL_EXTENSIONS
+	{
+		DAT_EXT_ATTR,
+		DAT_EXT_ATTR_TRUE
+	},
+	{
+		DAT_EXT_ATTR_RDMA_WRITE_IMMED,
+		DAT_EXT_ATTR_TRUE
+	},
+	{
DAT_EXT_ATTR_RECV_IMMED, + DAT_EXT_ATTR_TRUE + }, + /* inbound immediate data placed in event, NOT payload */ + { + DAT_EXT_ATTR_RECV_IMMED_EVENT, + DAT_EXT_ATTR_TRUE + }, + { + DAT_EXT_ATTR_FETCH_AND_ADD, + DAT_EXT_ATTR_TRUE + }, + { + DAT_EXT_ATTR_CMP_AND_SWAP, + DAT_EXT_ATTR_TRUE + }, +#else + { + "DAT_EXTENSION_INTERFACE", + "FALSE" + }, +#endif +}; + +#define SPEC_ATTR_SIZE(x) ( sizeof(x)/sizeof(DAT_NAMED_ATTR) ) + +void dapls_set_provider_specific_attr( + IN DAT_PROVIDER_ATTR *attr_ptr ) +{ + attr_ptr->num_provider_specific_attr = SPEC_ATTR_SIZE(ib_attrs); + attr_ptr->provider_specific_attr = ib_attrs; +} + +#endif + Index: dapl/openib_cma/dapl_ib_mem.c =================================================================== --- dapl/openib_cma/dapl_ib_mem.c (revision 4589) +++ dapl/openib_cma/dapl_ib_mem.c (working copy) @@ -25,9 +25,9 @@ /********************************************************************** * - * MODULE: dapl_det_mem.c + * MODULE: dapl_ib_mem.c * - * PURPOSE: Intel DET APIs: Memory windows, registration, + * PURPOSE: OpenIB uCMA provider Memory windows, registration, * and protection domain * * $Id: $ @@ -72,12 +72,10 @@ dapls_convert_privileges(IN DAT_MEM_PRIV access |= IBV_ACCESS_LOCAL_WRITE; if (DAT_MEM_PRIV_REMOTE_WRITE_FLAG & privileges) access |= IBV_ACCESS_REMOTE_WRITE; - if (DAT_MEM_PRIV_REMOTE_READ_FLAG & privileges) - access |= IBV_ACCESS_REMOTE_READ; - if (DAT_MEM_PRIV_REMOTE_READ_FLAG & privileges) - access |= IBV_ACCESS_REMOTE_READ; - if (DAT_MEM_PRIV_REMOTE_READ_FLAG & privileges) + if (DAT_MEM_PRIV_REMOTE_READ_FLAG & privileges) { access |= IBV_ACCESS_REMOTE_READ; + access |= IBV_ACCESS_REMOTE_ATOMIC; + } return access; } Index: dapl/openib_cma/dapl_ib_cm.c =================================================================== --- dapl/openib_cma/dapl_ib_cm.c (revision 4589) +++ dapl/openib_cma/dapl_ib_cm.c (working copy) @@ -35,7 +35,7 @@ * * Description: * - * The uDAPL openib provider - connection management + * The OpenIB uCMA provider - uCMA connection management * **************************************************************************** * Source Control System Information @@ -592,7 +592,11 @@ dapls_ib_setup_conn_listener(IN DAPL_IA if (rdma_bind_addr(conn->cm_id, (struct sockaddr *)&ia_ptr->hca_ptr->hca_address)) { - dat_status = dapl_convert_errno(errno,"setup_listener"); + if (errno == -EBUSY) + dat_status = DAT_CONN_QUAL_IN_USE; + else + dat_status = + dapl_convert_errno(errno,"setup_listener"); goto bail; } Index: dapl/openib_cma/dapl_ib_qp.c =================================================================== --- dapl/openib_cma/dapl_ib_qp.c (revision 4589) +++ dapl/openib_cma/dapl_ib_qp.c (working copy) @@ -25,9 +25,9 @@ /********************************************************************** * - * MODULE: dapl_det_qp.c + * MODULE: dapl_ib_qp.c * - * PURPOSE: QP routines for access to DET Verbs + * PURPOSE: OpenIB uCMA QP routines * * $Id: $ **********************************************************************/ Index: dapl/openib_cma/dapl_ib_util.h =================================================================== --- dapl/openib_cma/dapl_ib_util.h (revision 4589) +++ dapl/openib_cma/dapl_ib_util.h (working copy) @@ -35,7 +35,7 @@ * * Description: * - * The uDAPL openib provider - definitions, prototypes, + * The OpenIB uCMA provider - definitions, prototypes, * **************************************************************************** * Source Control System Information Index: dapl/openib_cma/README 
===================================================================
--- dapl/openib_cma/README	(revision 4589)
+++ dapl/openib_cma/README	(working copy)
@@ -23,15 +23,22 @@ New files for openib_scm provider
	dapl/openib_cma/dapl_ib_util.c
	dapl/openib_cma/dapl_ib_util.h
	dapl/openib_cma/dapl_ib_cm.c
+	dapl/openib_cma/dapl_ib_extensions.c
 
 A simple dapl test just for openib_scm testing...
 
	test/dtest/dtest.c
+	test/dtest/dtest_ext.c
	test/dtest/makefile
 
	server: dtest -s
	client: dtest -h hostname
 
+or with extensions
+
+	server: dtest_ext -s
+	client: dtest_ext -h hostname
+
 known issues: no memory windows support in ibverbs, dat_create_rmr fails.
 
Index: dapl/openib_cma/dapl_ib_cq.c
===================================================================
--- dapl/openib_cma/dapl_ib_cq.c	(revision 4589)
+++ dapl/openib_cma/dapl_ib_cq.c	(working copy)
@@ -35,7 +35,7 @@
 *
 * Description:
 *
- * The uDAPL openib provider - completion queue
+ * The OpenIB uCMA provider - completion queue
 *
 ****************************************************************************
 *   Source Control System Information
@@ -498,7 +498,10 @@ dapls_ib_wait_object_wait(IN ib_wait_obj
	if (timeout != DAT_TIMEOUT_INFINITE)
		timeout_ms = timeout/1000;
 
-	status = poll(&cq_fd, 1, timeout_ms);
+	/* restart the syscall only when interrupted */
+	while ((status = poll(&cq_fd, 1, timeout_ms)) == -1 &&
+	       errno == EINTR)
+		continue;
 
	/* returned event */
	if (status > 0) {
@@ -511,6 +514,8 @@ dapls_ib_wait_obj
	/* timeout */
	} else if (status == 0)
		status = ETIMEDOUT;
+	else
+		status = errno;
 
	dapl_dbg_log(DAPL_DBG_TYPE_UTIL,
		     " cq_object_wait: RET evd %p ibv_cq %p ibv_ctx %p %s\n",
Index: dat/include/dat/dat_redirection.h
===================================================================
--- dat/include/dat/dat_redirection.h	(revision 4589)
+++ dat/include/dat/dat_redirection.h	(working copy)
@@ -59,10 +59,10 @@ typedef struct dat_provider DAT_PROVIDER
 * This would allow a good compiler to avoid indirection overhead when
 * making function calls.
*/ - #define DAT_HANDLE_TO_PROVIDER(handle) (*(DAT_PROVIDER **)(handle)) #endif + #define DAT_IA_QUERY(ia, evd, ia_msk, ia_ptr, p_msk, p_ptr) \ (*DAT_HANDLE_TO_PROVIDER (ia)->ia_query_func) (\ (ia), \ @@ -395,6 +395,12 @@ typedef struct dat_provider DAT_PROVIDER (lbuf), \ (cookie)) +#define DAT_EXTENSION(handle, op, args) \ + (*DAT_HANDLE_TO_PROVIDER (handle)->extension_func) (\ + (handle), \ + (op), \ + (args)) + /*************************************************************** * * FUNCTION PROTOTYPES @@ -720,4 +726,11 @@ typedef DAT_RETURN (*DAT_SRQ_POST_RECV_F IN DAT_LMR_TRIPLET *, /* local_iov */ IN DAT_DTO_COOKIE ); /* user_cookie */ +/* Extension function */ +#include +typedef DAT_RETURN (*DAT_EXTENSION_FUNC) ( + IN DAT_HANDLE, /* dat handle */ + IN DAT_EXT_OP, /* extension operation */ + IN va_list ); /* va_list */ + #endif /* _DAT_REDIRECTION_H_ */ Index: dat/include/dat/dat.h =================================================================== --- dat/include/dat/dat.h (revision 4589) +++ dat/include/dat/dat.h (working copy) @@ -854,11 +854,15 @@ typedef enum dat_event_number DAT_ASYNC_ERROR_EP_BROKEN = 0x08003, DAT_ASYNC_ERROR_TIMED_OUT = 0x08004, DAT_ASYNC_ERROR_PROVIDER_INTERNAL_ERROR = 0x08005, - DAT_SOFTWARE_EVENT = 0x10001 + DAT_SOFTWARE_EVENT = 0x10001, + DAT_EXTENSION_EVENT = 0x20001 + } DAT_EVENT_NUMBER; -/* Union for event Data */ +/* include extension data definitions */ +#include +/* Union for event Data */ typedef union dat_event_data { DAT_DTO_COMPLETION_EVENT_DATA dto_completion_event_data; @@ -867,6 +871,7 @@ typedef union dat_event_data DAT_CONNECTION_EVENT_DATA connect_event_data; DAT_ASYNCH_ERROR_EVENT_DATA asynch_error_event_data; DAT_SOFTWARE_EVENT_DATA software_event_data; + DAT_EXTENSION_DATA extension_data; } DAT_EVENT_DATA; /* Event struct that holds all event information */ @@ -1222,6 +1227,11 @@ extern DAT_RETURN dat_srq_set_lw ( IN DAT_SRQ_HANDLE, /* srq_handle */ IN DAT_COUNT); /* low_watermark */ +extern DAT_RETURN dat_extension( + IN DAT_HANDLE, + IN DAT_EXT_OP, + IN ... ); + /* * DAT registry functions. * Index: dat/include/dat/udat_redirection.h =================================================================== --- dat/include/dat/udat_redirection.h (revision 4589) +++ dat/include/dat/udat_redirection.h (working copy) @@ -199,7 +199,6 @@ typedef DAT_RETURN (*DAT_EVD_SET_UNWAITA typedef DAT_RETURN (*DAT_EVD_CLEAR_UNWAITABLE_FUNC) ( IN DAT_EVD_HANDLE); /* evd_handle */ - #include struct dat_provider @@ -294,6 +293,10 @@ struct dat_provider DAT_SRQ_QUERY_FUNC srq_query_func; DAT_SRQ_RESIZE_FUNC srq_resize_func; DAT_SRQ_SET_LW_FUNC srq_set_lw_func; + + /* extension for provder specific functions */ + DAT_EXTENSION_FUNC extension_func; + }; #endif /* _UDAT_REDIRECTION_H_ */ Index: dat/include/dat/dat_extensions.h =================================================================== --- dat/include/dat/dat_extensions.h (revision 0) +++ dat/include/dat/dat_extensions.h (revision 0) @@ -0,0 +1,209 @@ +/* + * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved. + * + * This Software is licensed under one of the following licenses: + * + * 1) under the terms of the "Common Public License 1.0" a copy of which is + * in the file LICENSE.txt in the root directory. The license is also + * available from the Open Source Initiative, see + * http://www.opensource.org/licenses/cpl.php. + * + * 2) under the terms of the "The BSD License" a copy of which is in the file + * LICENSE2.txt in the root directory. 
The license is also available from
+ * the Open Source Initiative, see
+ * http://www.opensource.org/licenses/bsd-license.php.
+ *
+ * 3) under the terms of the "GNU General Public License (GPL) Version 2" a
+ * copy of which is in the file LICENSE3.txt in the root directory. The
+ * license is also available from the Open Source Initiative, see
+ * http://www.opensource.org/licenses/gpl-license.php.
+ *
+ * Licensee has the right to choose one of the above licenses.
+ *
+ * Redistributions of source code must retain the above copyright
+ * notice and one of the license notices.
+ *
+ * Redistributions in binary form must reproduce both the above copyright
+ * notice, one of the license notices in the documentation
+ * and/or other materials provided with the distribution.
+ */
+/**********************************************************************
+ *
+ * HEADER: dat_extensions.h
+ *
+ * PURPOSE: defines the extensions to the DAT API for uDAPL.
+ *
+ * Description: Header file for "uDAPL: User Direct Access Programming
+ *		Library, Version: 1.2"
+ *
+ * Mapping rules:
+ *      All global symbols are prepended with "DAT_" or "dat_"
+ *      All DAT objects have an 'api' tag, such as 'ep' or 'lmr'
+ *      The method table is in the provider definition structure.
+ *
+ *
+ **********************************************************************/
+
+#ifndef _DAT_EXTENSIONS_H_
+#define _DAT_EXTENSIONS_H_
+
+extern int dat_extensions;
+
+/*
+ * Provider specific attribute strings for extension support
+ * returned with dat_ia_query() and
+ * DAT_PROVIDER_ATTR_MASK == DAT_PROVIDER_FIELD_PROVIDER_SPECIFIC_ATTR
+ *
+ * DAT_NAMED_ATTR	name == extended operation,
+ *			value == TRUE if extended operation is supported
+ */
+#define DAT_EXT_ATTR			"DAT_EXTENSION_INTERFACE"
+#define DAT_EXT_ATTR_RDMA_WRITE_IMMED	"DAT_EXT_RDMA_WRITE_IMMED"
+#define DAT_EXT_ATTR_RECV_IMMED		"DAT_EXT_RECV_IMMED"
+#define DAT_EXT_ATTR_RECV_IMMED_EVENT	"DAT_EXT_RECV_IMMED_EVENT"
+#define DAT_EXT_ATTR_RECV_IMMED_PAYLOAD	"DAT_EXT_RECV_IMMED_PAYLOAD"
+#define DAT_EXT_ATTR_FETCH_AND_ADD	"DAT_EXT_FETCH_AND_ADD"
+#define DAT_EXT_ATTR_CMP_AND_SWAP	"DAT_EXT_CMP_AND_SWAP"
+#define DAT_EXT_ATTR_TRUE		"TRUE"
+#define DAT_EXT_ATTR_FALSE		"FALSE"
+
+/*
+ * Extension OPERATIONS
+ */
+typedef enum dat_ext_op
+{
+	DAT_EXT_RDMA_WRITE_IMMED,
+	DAT_EXT_RECV_IMMED,
+	DAT_EXT_FETCH_AND_ADD,
+	DAT_EXT_CMP_AND_SWAP,
+
+} DAT_EXT_OP;
+
+/*
+ * Extension completion event TYPES
+ */
+typedef enum dat_ext_event_type
+{
+	DAT_EXT_RDMA_WRITE_IMMED_STATUS = 1,
+	DAT_EXT_RECV_NO_IMMED,
+	DAT_EXT_RECV_IMMED_DATA_EVENT,
+	DAT_EXT_RECV_IMMED_DATA_PAYLOAD,
+	DAT_EXT_FETCH_AND_ADD_STATUS,
+	DAT_EXT_CMP_AND_SWAP_STATUS,
+
+} DAT_EXT_EVENT_TYPE;
+
+/*
+ * Extension completion event DATA
+ */
+typedef struct dat_immediate_data
+{
+	DAT_UINT32	data;
+
+} DAT_RDMA_WRITE_IMMED_DATA;
+
+typedef struct dat_extension_data
+{
+	DAT_DTO_COMPLETION_EVENT_DATA	dto;
+	DAT_EXT_EVENT_TYPE		type;
+	union {
+		DAT_RDMA_WRITE_IMMED_DATA	immed;
+	} val;
+} DAT_EXTENSION_DATA;
+
+typedef enum dat_ext_flags
+{
+	DAT_EXT_WRITE_IMMED_FLAG	= 0x1,
+	DAT_EXT_WRITE_CONFIRM_FLAG	= 0x2,
+
+} DAT_EXT_FLAGS;
+
+/*
+ * Extended API with redirection via DAT extension function
+ */
+
+/*
+ * RDMA Write with IMMEDIATE extension:
+ *
+ * Asynchronous call performs a normal RDMA write to the remote endpoint
+ * followed by a post of an extended immediate data value to the receive
+ * EVD on the remote endpoint. Event completion for the request completes
+ * as a DAT_EXTENSION_EVENT with type set to DAT_EXT_RDMA_WRITE_IMMED_STATUS.
+ * Event completion on the remote endpoint completes as a DAT_EXTENSION_EVENT
+ * with type set to DAT_EXT_RECV_IMMED_DATA_EVENT or
+ * DAT_EXT_RECV_IMMED_DATA_PAYLOAD depending on the provider transport.
+ *
+ * DAT_EXT_WRITE_IMMED_FLAG requests that the supplied
+ * 'immediate' value be sent as the payload of a four byte send following
+ * the RDMA Write, or any transport-dependent equivalent thereof.
+ * For example, on InfiniBand the request should be translated as an
+ * RDMA Write with Immediate.
+ *
+ * DAT_EXT_WRITE_CONFIRM_FLAG requests that this DTO
+ * not complete until receipt by the far end is confirmed.
+ *
+ * Note to Consumers: the immediate data will consume a receive
+ * buffer at the Data Sink.
+ *
+ * Other extension flags:
+ *	n/a
+ */
+#define dat_ep_post_rdma_write_immed(ep, size, lbuf, cookie, rbuf, idata, eflgs, flgs) \
+	dat_extension(	ep, \
+			DAT_EXT_RDMA_WRITE_IMMED, \
+			(size), \
+			(lbuf), \
+			(cookie), \
+			(rbuf), \
+			(idata), \
+			(eflgs), \
+			(flgs))
+
+/*
+ * Call performs a normal post receive message to the local endpoint
+ * that includes additional 32-bit buffer space for immediate data.
+ * Event completion for the request completes as a DAT_EXTENSION_EVENT
+ * with type set to DAT_EXT_RECV_IMMED_DATA_EVENT or
+ * DAT_EXT_RECV_IMMED_DATA_PAYLOAD.
+ */
+#define dat_ep_post_recv_immed(ep, size, lbuf, cookie, flgs) \
+	dat_extension(	ep, \
+			DAT_EXT_RECV_IMMED, \
+			(size), \
+			(lbuf), \
+			(cookie), \
+			(flgs))
+
+/*
+ * This asynchronous call is modeled after the InfiniBand atomic
+ * Fetch and Add operation. The add_value is added to the 64 bit
+ * value stored at the remote memory location specified in remote_iov
+ * and the result is stored in the local_iov.
+ */
+#define dat_ep_post_fetch_and_add(ep, add_val, lbuf, cookie, rbuf, flgs) \
+	dat_extension(	ep, \
+			DAT_EXT_FETCH_AND_ADD, \
+			(add_val), \
+			(lbuf), \
+			(cookie), \
+			(rbuf), \
+			(flgs))
+
+/*
+ * This asynchronous call is modeled after the InfiniBand atomic
+ * Compare and Swap operation. The cmp_value is compared to the 64 bit
+ * value stored at the remote memory location specified in remote_iov.
+ * If the two values are equal, the 64 bit swap_value is stored in
+ * the remote memory location. In all cases, the original 64 bit
+ * value stored in the remote memory location is copied to the local_iov.
+ */
+#define dat_ep_post_cmp_and_swap(ep, cmp_val, swap_val, lbuf, cookie, rbuf, flgs) \
+	dat_extension(	ep, \
+			DAT_EXT_CMP_AND_SWAP, \
+			(cmp_val), \
+			(swap_val), \
+			(lbuf), \
+			(cookie), \
+			(rbuf), \
+			(flgs))
+
+#endif /* _DAT_EXTENSIONS_H_ */
+
Index: dat/common/dat_api.c
===================================================================
--- dat/common/dat_api.c	(revision 4594)
+++ dat/common/dat_api.c	(working copy)
@@ -1142,6 +1142,36 @@ DAT_RETURN dat_srq_set_lw(
			  low_watermark);
 }
 
+DAT_RETURN dat_extension(
+	IN	DAT_HANDLE	handle,
+	IN	DAT_EXT_OP	ext_op,
+	IN	... )
+
+{
+	DAT_RETURN status;
+	va_list args;
+
+	if (handle == NULL)
+	{
+		return DAT_ERROR(DAT_INVALID_HANDLE, DAT_INVALID_HANDLE_EP);
+	}
+
+	/* verify provider extension support */
+	if (!dat_extensions)
+	{
+		return DAT_ERROR(DAT_NOT_IMPLEMENTED, 0);
+	}
+
+	va_start(args, ext_op);
+
+	status = DAT_EXTENSION(handle,
+			       ext_op,
+			       args);
+	va_end(args);
+
+	return status;
+}
+
 /*
  * Local variables:
  *  c-indent-level: 4
Index: dat/udat/udat.c
===================================================================
--- dat/udat/udat.c	(revision 4594)
+++ dat/udat/udat.c	(working copy)
@@ -66,6 +66,10 @@ udat_check_state ( void );
  *                                                                   *
  *********************************************************************/
 
+/*
+ * Use a global so the application gets an unresolved symbol when run
+ * with a pre-extension library
+ */
+int dat_extensions = 0;
 
 /*
  *
@@ -230,13 +234,44 @@ dat_ia_openv (
				       async_event_qlen,
				       async_event_handle,
				       ia_handle);
+
+	/*
+	 * See if the provider supports extensions
+	 */
	if (dat_status == DAT_SUCCESS)
	{
-	    return_handle = dats_set_ia_handle (*ia_handle);
-	    if (return_handle >= 0)
-	    {
-		*ia_handle = (DAT_IA_HANDLE)return_handle;
-	    }
+	    DAT_PROVIDER_ATTR	p_attr;
+	    int			i;
+
+	    return_handle = dats_set_ia_handle (*ia_handle);
+	    if (return_handle >= 0)
+	    {
+		*ia_handle = (DAT_IA_HANDLE)return_handle;
+	    }
+
+	    if ( dat_ia_query( *ia_handle,
+			       NULL,
+			       0,
+			       NULL,
+			       DAT_PROVIDER_FIELD_PROVIDER_SPECIFIC_ATTR,
+			       &p_attr ) == DAT_SUCCESS )
+	    {
+		for ( i = 0; i < p_attr.num_provider_specific_attr; i++ )
+		{
+		    if ( (strcmp( p_attr.provider_specific_attr[i].name,
+				  "DAT_EXTENSION_INTERFACE" ) == 0) &&
+			 (strcmp( p_attr.provider_specific_attr[i].value,
+				  "TRUE" ) == 0) )
+		    {
+			dat_os_dbg_print(DAT_OS_DBG_TYPE_CONSUMER_API,
+					 "DAT Registry: dat_ia_open () "
+					 "DAPL Extension Interface supported!\n");
+
+			dat_extensions = 1;
+			break;
+		    }
+		}
+	    }
	}
	return dat_status;
Index: README
===================================================================
--- README	(revision 4589)
+++ README	(working copy)
@@ -1,5 +1,10 @@
 There are now 3 uDAPL providers for openib (openib,openib_scm,openib_cma).
 
+NEW FEATURES for openib_cma provider:
+API extensions for immediate data and atomic operations have been added.
+See dat/include/dat/dat_extensions.h for the new APIs.
+See dapl/test/dtest/dtest_ext.c for an example test case.
+
 ==========
 1.0 BUILD:

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DAT_Extensions.pdf
Type: application/pdf
Size: 83940 bytes
Desc: not available
URL: 

From devesh28 at gmail.com  Fri Dec 23 00:59:16 2005
From: devesh28 at gmail.com (Devesh Sharma)
Date: Fri, 23 Dec 2005 14:29:16 +0530
Subject: [openib-general] [Openib-SDP] Integrating in 2.6 kernel
Message-ID: <309a667c0512230059u6d345172m662b097aa1d8870@mail.gmail.com>

Hi all,
I am trying to use the SDP implementation available at
openib.org/svn/gen2/trunk/ and to test some programs on a 2.6 Linux
kernel. I would like help with the following topics:
1) Where can I get a tarball of the SDP code?
2) How do I integrate the SDP code with a 2.6 kernel?
Please help me out.

Devesh

From devesh28 at gmail.com  Fri Dec 23 04:25:35 2005
From: devesh28 at gmail.com (Devesh Sharma)
Date: Fri, 23 Dec 2005 17:55:35 +0530
Subject: [openib-general] Errors in compilation with 2.6.14.4!!
Message-ID: <309a667c0512230425o6869a2a2u8a7f0fd2ede720e8@mail.gmail.com>

Hi, I have downloaded the openib stack from svn at revision 4595 and am
trying to compile it against 2.6.14.4.
I am getting following errors [root at infini00 linux-2.6.14.4]# make CHK include/linux/version.h CHK include/linux/compile.h CHK usr/initramfs_list CC [M] drivers/infiniband/core/cm.o drivers/infiniband/core/cm.c: In function `cm_alloc_msg': drivers/infiniband/core/cm.c:180: error: `IB_MGMT_MAD_HDR' undeclared (first use in this function) drivers/infiniband/core/cm.c:180: error: (Each undeclared identifier is reported only once drivers/infiniband/core/cm.c:180: error: for each function it appears in.) drivers/infiniband/core/cm.c:181: error: too few arguments to function `ib_create_send_mad' drivers/infiniband/core/cm.c:188: error: structure has no member named `ah' drivers/infiniband/core/cm.c:189: error: structure has no member named `retries' drivers/infiniband/core/cm.c: In function `cm_alloc_response_msg': drivers/infiniband/core/cm.c:210: error: `IB_MGMT_MAD_HDR' undeclared (first use in this function) drivers/infiniband/core/cm.c:211: error: too few arguments to function `ib_create_send_mad' drivers/infiniband/core/cm.c:216: error: structure has no member named `ah' drivers/infiniband/core/cm.c: In function `cm_free_msg': drivers/infiniband/core/cm.c:223: error: structure has no member named `ah' drivers/infiniband/core/cm.c: In function `cm_mask_compare_data': drivers/infiniband/core/cm.c:363: error: `IB_CM_PRIVATE_DATA_COMPARE_SIZE' undeclared (first use in this function) drivers/infiniband/core/cm.c: In function `cm_compare_data': drivers/infiniband/core/cm.c:370: error: `IB_CM_PRIVATE_DATA_COMPARE_SIZE' undeclared (first use in this function) drivers/infiniband/core/cm.c:376: error: dereferencing pointer to incomplete type drivers/infiniband/core/cm.c:376: error: dereferencing pointer to incomplete type drivers/infiniband/core/cm.c:377: error: dereferencing pointer to incomplete type drivers/infiniband/core/cm.c:377: error: dereferencing pointer to incomplete type drivers/infiniband/core/cm.c:370: warning: unused variable `src' drivers/infiniband/core/cm.c:371: warning: unused variable `dst' drivers/infiniband/core/cm.c: In function `cm_compare_private_data': drivers/infiniband/core/cm.c:384: error: `IB_CM_PRIVATE_DATA_COMPARE_SIZE' undeclared (first use in this function) drivers/infiniband/core/cm.c:389: error: dereferencing pointer to incomplete type drivers/infiniband/core/cm.c:390: error: dereferencing pointer to incomplete type drivers/infiniband/core/cm.c:384: warning: unused variable `src' drivers/infiniband/core/cm.c: In function `cm_insert_listen': drivers/infiniband/core/cm.c:410: error: structure has no member named `device' drivers/infiniband/core/cm.c:410: error: structure has no member named `device' drivers/infiniband/core/cm.c:414: error: structure has no member named `device' drivers/infiniband/core/cm.c:414: error: structure has no member named `device' drivers/infiniband/core/cm.c:416: error: structure has no member named `device' drivers/infiniband/core/cm.c:416: error: structure has no member named `device' drivers/infiniband/core/cm.c: In function `cm_find_listen': drivers/infiniband/core/cm.c:446: error: structure has no member named `device' drivers/infiniband/core/cm.c:449: error: structure has no member named `device' drivers/infiniband/core/cm.c:451: error: structure has no member named `device' drivers/infiniband/core/cm.c: At top level: drivers/infiniband/core/cm.c:595: error: conflicting types for 'ib_create_cm_id' include/rdma/ib_cm.h:306: error: previous declaration of 'ib_create_cm_id' was here drivers/infiniband/core/cm.c:595: error: 
conflicting types for 'ib_create_cm_id' include/rdma/ib_cm.h:306: error: previous declaration of 'ib_create_cm_id' was here drivers/infiniband/core/cm.c: In function `ib_create_cm_id': drivers/infiniband/core/cm.c:604: error: structure has no member named `device' drivers/infiniband/core/cm.c: In function `ib_destroy_cm_id': drivers/infiniband/core/cm.c:731: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c:739: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c:749: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c:764: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c: At top level: drivers/infiniband/core/cm.c:790: error: conflicting types for 'ib_cm_listen' include/rdma/ib_cm.h:334: error: previous declaration of 'ib_cm_listen' was here drivers/infiniband/core/cm.c:790: error: conflicting types for 'ib_cm_listen' include/rdma/ib_cm.h:334: error: previous declaration of 'ib_cm_listen' was here drivers/infiniband/core/cm.c: In function `ib_cm_listen': drivers/infiniband/core/cm.c:807: error: dereferencing pointer to incomplete type drivers/infiniband/core/cm.c:811: error: dereferencing pointer to incomplete type drivers/infiniband/core/cm.c:812: error: dereferencing pointer to incomplete type drivers/infiniband/core/cm.c:812: error: dereferencing pointer to incomplete type drivers/infiniband/core/cm.c:813: error: `IB_CM_PRIVATE_DATA_COMPARE_SIZE' undeclared (first use in this function) drivers/infiniband/core/cm.c:813: error: dereferencing pointer to incomplete type drivers/infiniband/core/cm.c:813: error: dereferencing pointer to incomplete type drivers/infiniband/core/cm.c:813: error: dereferencing pointer to incomplete type drivers/infiniband/core/cm.c:813: error: dereferencing pointer to incomplete type drivers/infiniband/core/cm.c: In function `ib_send_cm_req': drivers/infiniband/core/cm.c:1003: error: structure has no member named `timeout_ms' drivers/infiniband/core/cm.c:1012: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:1012: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_issue_rej': drivers/infiniband/core/cm.c:1057: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:1057: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_dup_req_handler': drivers/infiniband/core/cm.c:1265: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:1265: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_match_req': drivers/infiniband/core/cm.c:1305: error: structure has no member named `device' drivers/infiniband/core/cm.c: In function `ib_send_cm_rep': drivers/infiniband/core/cm.c:1452: error: structure has no member named `timeout_ms' drivers/infiniband/core/cm.c:1455: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:1455: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `ib_send_cm_rtu': drivers/infiniband/core/cm.c:1519: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:1519: 
error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_dup_rep_handler': drivers/infiniband/core/cm.c:1591: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:1591: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_rep_handler': drivers/infiniband/core/cm.c:1659: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c: In function `cm_establish_handler': drivers/infiniband/core/cm.c:1693: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c: In function `cm_rtu_handler': drivers/infiniband/core/cm.c:1732: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c: In function `ib_send_cm_dreq': drivers/infiniband/core/cm.c:1790: error: structure has no member named `timeout_ms' drivers/infiniband/core/cm.c:1793: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:1793: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `ib_send_cm_drep': drivers/infiniband/core/cm.c:1856: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:1856: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_dreq_handler': drivers/infiniband/core/cm.c:1891: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c:1905: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:1905: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_drep_handler': drivers/infiniband/core/cm.c:1952: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c: In function `ib_send_cm_rej': drivers/infiniband/core/cm.c:2020: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:2020: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_rej_handler': drivers/infiniband/core/cm.c:2096: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c:2106: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c: In function `ib_send_cm_mra': drivers/infiniband/core/cm.c:2164: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:2164: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c:2177: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:2177: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c:2190: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:2190: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_mra_handler': drivers/infiniband/core/cm.c:2252: warning: passing arg 2 of `ib_modify_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c:2259: warning: passing arg 2 of `ib_modify_mad' makes integer from pointer without 
a cast drivers/infiniband/core/cm.c:2267: warning: passing arg 2 of `ib_modify_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c: In function `ib_send_cm_lap': drivers/infiniband/core/cm.c:2350: error: structure has no member named `timeout_ms' drivers/infiniband/core/cm.c:2353: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:2353: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_lap_handler': drivers/infiniband/core/cm.c:2430: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:2430: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `ib_send_cm_apr': drivers/infiniband/core/cm.c:2508: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:2508: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_apr_handler': drivers/infiniband/core/cm.c:2547: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c: In function `ib_send_cm_sidr_req': drivers/infiniband/core/cm.c:2644: error: structure has no member named `timeout_ms' drivers/infiniband/core/cm.c:2649: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:2649: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_sidr_req_handler': drivers/infiniband/core/cm.c:2713: error: structure has no member named `device' drivers/infiniband/core/cm.c: In function `ib_send_cm_sidr_rep': drivers/infiniband/core/cm.c:2785: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:2785: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_sidr_rep_handler': drivers/infiniband/core/cm.c:2838: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c: In function `cm_send_handler': drivers/infiniband/core/cm.c:2906: error: structure has no member named `send_buf' make[3]: *** [drivers/infiniband/core/cm.o] Error 1 make[2]: *** [drivers/infiniband/core] Error 2 make[1]: *** [drivers/infiniband] Error 2 make: *** [drivers] Error 2 What is the issue?? From halr at voltaire.com Fri Dec 23 04:31:21 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Dec 2005 07:31:21 -0500 Subject: [openib-general] Errors in compilation with 2.6.14.4!! In-Reply-To: <309a667c0512230425o6869a2a2u8a7f0fd2ede720e8@mail.gmail.com> References: <309a667c0512230425o6869a2a2u8a7f0fd2ede720e8@mail.gmail.com> Message-ID: <1135341078.4328.91384.camel@hal.voltaire.com> On Fri, 2005-12-23 at 07:25, Devesh Sharma wrote: > Hi I have downloaded openib stack from svn with Revision 4595 and > trying to compile it with 2.6.14.4. I am getting following errors > > > [root at infini00 linux-2.6.14.4]# make > CHK include/linux/version.h > CHK include/linux/compile.h > CHK usr/initramfs_list > CC [M] drivers/infiniband/core/cm.o > drivers/infiniband/core/cm.c: In function `cm_alloc_msg': > drivers/infiniband/core/cm.c:180: error: `IB_MGMT_MAD_HDR' undeclared > (first use in this function) > drivers/infiniband/core/cm.c:180: error: (Each undeclared identifier > is reported only once > drivers/infiniband/core/cm.c:180: error: for each function it appears in.) 
`ib_cancel_mad' makes integer from pointer without a cast
> [several hundred additional lines of compiler output snipped; they are
> identical to the log above]
> What is the issue??

You need to link include/rdma in your Linux tree to the OpenIB one,
after moving the Linux one away, as it is not up to date.

-- Hal

From jlentini at netapp.com  Fri Dec 23 07:35:36 2005
From: jlentini at netapp.com (James Lentini)
Date: Fri, 23 Dec 2005 10:35:36 -0500 (EST)
Subject: [openib-general] Re: [RFC][PATCH] OpenIB uDAPL extension proposal - sample immed data and atomic api's
In-Reply-To: 
References: 
Message-ID: 

arlin> DAPL provides a generalized abstraction to RDMA capable
arlin> transports. As a generalized abstraction, it cannot exploit the
arlin> unique properties that many of the underlying
arlin> platforms/interconnects can provide so I would like to propose
arlin> a simple (minimum impact on libdat) extensible interface to
arlin> uDAPL that will allow vendors to expose such capabilities. I am
arlin> looking for feedback, especially from the DAT collaborative. I
arlin> have included both a design document and actual working code as
arlin> a reference.

This is an excellent document and clearly certain applications will
benefit greatly from adding this additional functionality.
Since DAPL's inception, the DAT_PROVIDER structure has contained a field called "extension" of type void *. The purpose of this field was to allow for the kind of provider/platform/interconnect specific extensions you describe. I believe these features can be added without modifications to the current API by defining a particular format for the DAT_PROVIDER's extension data and indicating its presence via a provider attribute. That would require creating an extension document like this one describing an "extension" structure w/ function pointers to the new functions and a well known provider attribute value. Is there a reason this was not feasible? Would minor modifications to the existing framework be sufficient (perhaps an "extension" event type)? james From halr at voltaire.com Fri Dec 23 09:49:16 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Dec 2005 12:49:16 -0500 Subject: [openib-general] [ANNOUNCE] Updated OpenIB diagnostics Message-ID: <1135360153.4328.94443.camel@hal.voltaire.com> Hi, The OpenIB diagnostics (https://openib.org/svn/gen2/trunk/src/userspace/management/diags) have been updated as follows: 1. ibportstate diagnostic tool added to query, disable, and enable switch ports 2. Added error only mode to diagnostic scripts so less data to weed through on a large fabric (also verbose mode to see everything) 3. Tree structure collapsed so all tools in same directory as opposed to individual ones and build simplified Let me know about any comments or issues. Thanks. -- Hal From halr at voltaire.com Fri Dec 23 10:44:17 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Dec 2005 13:44:17 -0500 Subject: [openib-general] PathScale license Message-ID: <1135363454.4328.95007.camel@hal.voltaire.com> Hi, The PathScale OpenIB license includes the following which is beyond the normal OpenIB license: * Patent licenses, if any, provided herein do not apply to * combinations of this program with other software, or any other * product whatsoever. Can you comment/elaborate on this addition to the license ? Thanks. -- Hal From halr at voltaire.com Fri Dec 23 11:01:27 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Dec 2005 14:01:27 -0500 Subject: [openib-general] RFC MPI and app. requirements of OpenIB In-Reply-To: <5D78D28F88822E4D8702BB9EEF1A4367D12BF3@mercury.infiniconsys.com> References: <5D78D28F88822E4D8702BB9EEF1A4367D12BF3@mercury.infiniconsys.com> Message-ID: <1135364466.4328.95174.camel@hal.voltaire.com> On Thu, 2005-12-22 at 16:54, Rimmer, Todd wrote: > Since the SA views multicast operations at a node level, and applications > need to individually operate on multicast groups. It is appropriate for > the core stack to provide some multicast management APIs. These would > multiplex requests from all the applications on a node to make the > appropriate requests to the SA. reference counts would need to be > maintained in the core stack for each MC group so that the node would > remove itself from the group only on last application exit/unregister. > > This would also be a good place to handle the "MC group persistence issues". > Namely rejoining requested groups when ports go up/down, SMs change > (client reregister), etc. Yes, this has been raised in a thread by Eitan started on 11/22 entitled "First Multicast Leave disconnects all other clients" (http://openib.org/pipermail/openib-general/2005-November/014023.html). 
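To make the bookkeeping concrete, the node level reference counting being
suggested could look something like the sketch below (illustrative only:
sa_join()/sa_leave() are placeholders for the real SA interactions, the
table is a fixed array for brevity, and locking and error recovery are
omitted):

#include <stdint.h>
#include <string.h>

/* placeholders for the real SA MAD join/leave exchange */
extern int sa_join(const uint8_t *mgid);
extern int sa_leave(const uint8_t *mgid);

/* hypothetical per-node bookkeeping, one entry per multicast GID */
struct mc_ref {
	uint8_t	mgid[16];	/* multicast GID */
	int	refcnt;		/* local consumers currently joined */
};

#define MAX_GROUPS 64
static struct mc_ref groups[MAX_GROUPS];

static struct mc_ref *find_ref(const uint8_t *mgid, int alloc)
{
	int i, free_slot = -1;

	for (i = 0; i < MAX_GROUPS; i++) {
		if (groups[i].refcnt && !memcmp(groups[i].mgid, mgid, 16))
			return &groups[i];
		if (!groups[i].refcnt && free_slot < 0)
			free_slot = i;
	}
	if (alloc && free_slot >= 0) {
		memcpy(groups[free_slot].mgid, mgid, 16);
		return &groups[free_slot];
	}
	return NULL;
}

/* only the first local consumer triggers a real SA join */
int mc_join(const uint8_t *mgid)
{
	struct mc_ref *g = find_ref(mgid, 1);

	if (!g)
		return -1;
	if (g->refcnt == 0 && sa_join(mgid))
		return -1;
	g->refcnt++;
	return 0;
}

/* only the last local consumer triggers a real SA leave */
void mc_leave(const uint8_t *mgid)
{
	struct mc_ref *g = find_ref(mgid, 0);

	if (g && --g->refcnt == 0)
		sa_leave(mgid);
}

On client reregister or a port up event, replaying sa_join() for every
entry with a nonzero refcnt would also cover the persistence case Todd
mentions.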
-- Hal

> Todd Rimmer
> Chief Systems Architect SilverStorm Technologies
> Voice: 610-233-4852 Fax: 610-233-4777
> TRimmer at SilverStorm.com www.SilverStorm.com
>
> > -----Original Message-----
> > From: amith rajith mamidala [mailto:mamidala at cse.ohio-state.edu]
> > Sent: Thursday, December 22, 2005 4:39 PM
> > To: Sean Hefty
> > Cc: openib
> > Subject: RE: [openib-general] RFC MPI and app. requirements of OpenIB
> >
> > > An alternative is to provide UD and multicast/broadcast
> > > support in the CMA. I know that the Intel MPI runs over DAPL,
> > > which does not provide multicast support. Can MPI operate with
> > > unreliable multicast support? Does MPI plan on using IB multicast?
> >
> > Yes, the MPI can operate with unreliable multicast support.
> > MVAPICH-0.9.6 has this broadcast support over IB multicast. As Hal
> > suggested earlier, application processes interact with SA to
> > create/join multicast groups.
> >
> > Thanks,
> > Amith

From Arkady.Kanevsky at netapp.com Fri Dec 23 11:54:39 2005
From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady)
Date: Fri, 23 Dec 2005 14:54:39 -0500
Subject: [openib-general] RE: [RFC][PATCH] OpenIB uDAPL extension proposal - sample immed data and atomic api's
Message-ID:

Arlin, nice proposal, thanks.
I have one high level question and a few specific technical ones.

1. Why do you want to provide this functionality via extension instead of
as part of a new DAT spec, say 2.0? This will allow Consumers to use all
events, operations, and Provider/IA functionality uniformly instead of via
2 separate layers. This will also ensure that this basic functionality can
be provided by all DAPL Providers the same way on the DAPL and DAT layers.
DAPL 2.0 is not done yet so we have time to incorporate that. DAPL 2.0
already introduced new functionality which is easy to beef up for your
proposal. See DAT_DTOS for example. DAT_EVENT is also modified to handle
remote invalidation, so a small addition for Immediate data and Atomic ops
is a sensible addition. This should simplify the proposal significantly,
as you will not need to introduce any new EXT structures.

In general, the extension route was intended for RNIC|HCA providers to
expose HW capabilities beyond the IBTA, iWARP and VIA standards. The
standard RDMA functionality is best handled via a spec addition. DAT 2.0
does it for FMR, remote and local memory invalidation as well as others.
I had posted a complete list of changes/additions to DAT 2.0 about a month
ago. But we have not yet discussed the version change from 1.3 to 2.0, nor
how much backwards compatibility the spec will provide.

2. What is IMMED_EVENT? Is it just immediate data without any payload?
I suggest changing the name so it will not use "EVENT".
Just call it NO_PAYLOAD.
Do you want to support 2 different ways to deliver immediate data, one in
the event and one in the data payload? Why? I would think that just the
event way will do.

3. I suggest beefing up DAT_DTO_COMPLETION_EVENT_DATA and DAT_DTOS to
convey which operation completed and to return Immediate data if the
completed operation had immediate data. Since we already modified these 2
structs as part of the DAT 2.0 change, let's add your proposal to the
change. This will allow Consumers to use a single approach to deal with
completions, an extension to the current one but not a structural one.
No need for DAT_EXTENSION_DATA, DAT_EXT_EVENT_TYPE, DAT_EXT_OP nor the
whole mechanism for extended ops.

4. What is the purpose of DAT_EXT_WRITE_CONFIRM_FLAG? Is it to expose the
IB round-trip semantic? iWARP does not support immediate data. One can try
to format the payload to pass immediate data. Is that what you had in
mind? What is the semantic meaning of the completion with this flag set?
Without the flag set? Are the extended flags additional values for
COMPLETION_FLAGS? 2.4.1 talks about extended flags but where they are
passed in is not defined. DAT 2.0 extended them already for the FMR
barrier. I would prefer to follow that route rather than creating separate
extension completion flags.

5. Why do you need RECV_IMMED? If Immed data is delivered in the event, no
new Recv operation is needed. If the Consumer asks for immediate data in
the payload, where in the payload will it be? If this is needed as a local
match for a remote RDMA_Write to handle immediate data, let's state so.
What happens on a mismatch between the local and remote ops? That is, a
recv was posted for a Send and an RDMA_Write "arrived"? Vice versa?

6. I see an extension for immediate data for rdma_write but not for send.
Is this deliberate? If we are going to extend the DAT semantics to support
Immediate data we may as well support the full IBTA/iWARP functionality
for it.

7. Currently memory registration does not support access to an LMR or RMR
by Atomic ops. Do you propose to extend the meaning of the current
MEM_PRIV for LMR and RMR to cover atomic accesses, or to add new values to
LMR_MEM_PRIV and RMR_MEM_PRIV for atomic operation support?

8. Any alignment requirements for memory used for atomic ops?

9. Any correlation requirements for SRQ buffers to support recv with
immediate data?

Have great holidays,
Arkady

Arkady Kanevsky                       email: arkady at netapp.com
Network Appliance Inc.                phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.          Fax: 781-895-1195
Waltham, MA 02451                     central phone: 781-768-5300

________________________________

From: Arlin Davis [mailto:arlin.r.davis at intel.com]
Sent: Thursday, December 22, 2005 6:20 PM
To: Lentini, James; Kanevsky, Arkady
Cc: openib-general at openib.org; dat-discussions at yahoogroups.com
Subject: [RFC][PATCH] OpenIB uDAPL extension proposal - sample immed data and atomic api's

James and Arkady,

DAPL provides a generalized abstraction to RDMA capable transports. As a
generalized abstraction, it cannot exploit the unique properties that many
of the underlying platforms/interconnects can provide, so I would like to
propose a simple (minimum impact on libdat) extensible interface to uDAPL
that will allow vendors to expose such capabilities. I am looking for
feedback, especially from the DAT collaborative. I have included both a
design document and actual working code as a reference.
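To make the intended usage concrete, here is a minimal consumer-side
sketch built on the calls listed below (a sketch only, not part of the
patch: it assumes an already connected EP and its request EVD, a
registered 8-byte local buffer for the result, the standard uDAPL 1.2
DAT_RMR_TRIPLET spelling for the remote iov, and a made-up helper name;
all error handling beyond return codes is omitted):

#include <dat/udat.h>

/* hypothetical helper: add 'val' to a remote 64-bit counter and wait
 * for the completion; the pre-add value lands in the buffer that
 * 'lmr' describes */
static DAT_RETURN
fetch_add_sync(DAT_EP_HANDLE ep, DAT_EVD_HANDLE evd,
               DAT_LMR_TRIPLET *lmr, DAT_RMR_TRIPLET *rmr,
               DAT_UINT64 val)
{
        DAT_DTO_COOKIE cookie;
        DAT_EVENT      event;
        DAT_COUNT      nmore;
        DAT_RETURN     ret;

        cookie.as_64 = 0;
        ret = dat_ep_post_fetch_and_add(ep, val, lmr, cookie, rmr,
                                        DAT_COMPLETION_DEFAULT_FLAG);
        if (ret != DAT_SUCCESS)
                return ret;

        /* the completion surfaces as a DAT_EXTENSION_EVENT, not as a
         * plain DTO completion */
        ret = dat_evd_wait(evd, DAT_TIMEOUT_INFINITE, 1, &event, &nmore);
        if (ret != DAT_SUCCESS)
                return ret;

        if (event.event_number != DAT_EXTENSION_EVENT ||
            event.event_data.extension_data.type !=
            DAT_EXT_FETCH_AND_ADD_STATUS)
                return DAT_INTERNAL_ERROR;

        return DAT_SUCCESS;
}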
The patch provides a fully tested DAT and DAPL library (openib_cma) set
with the following provider extensions:

DAT_RETURN dat_ep_post_write_immed(
        IN      DAT_EP_HANDLE           ep_handle,
        IN      DAT_COUNT               num_segments,
        IN      DAT_LMR_TRIPLET         *local_iov,
        IN      DAT_DTO_COOKIE          user_cookie,
        IN      DAT_RMR_TRIPLE          *remote_iov,
        IN      DAT_UINT32              immediate_data,
        IN      DAT_COMPLETION_FLAGS    completion_flags);

DAT_RETURN dat_ep_post_cmp_and_swap(
        IN      DAT_EP_HANDLE           ep_handle,
        IN      DAT_UINT64              cmp_value,
        IN      DAT_UINT64              swap_value,
        IN      DAT_LMR_TRIPLE          *local_iov,
        IN      DAT_DTO_COOKIE          user_cookie,
        IN      DAT_RMR_TRIPLE          *remote_iov,
        IN      DAT_COMPLETION_FLAGS    completion_flags);

DAT_RETURN dat_ep_post_fetch_and_add(
        IN      DAT_EP_HANDLE           ep_handle,
        IN      DAT_UINT64              add_value,
        IN      DAT_LMR_TRIPLE          *local_iov,
        IN      DAT_DTO_COOKIE          user_cookie,
        IN      DAT_RMR_TRIPLE          *remote_iov,
        IN      DAT_COMPLETION_FLAGS    completion_flags);

Also, included is a sample program (dtest_ext.c) that can be used as a
programming example.

Thanks,

-arlin

Signed-off-by: Arlin Davis

Index: test/dtest/dat.conf
===================================================================
--- test/dtest/dat.conf (revision 4589)
+++ test/dtest/dat.conf (working copy)
@@ -1,11 +1,20 @@
 #
-# DAT 1.1 and 1.2 configuration file
+# DAT 1.2 configuration file
 #
 # Each entry should have the following fields:
 #
 # <ia_name> <api_version> <threadsafety> <default> <lib_path> \
 #           <provider_version> <ia_params> <platform_params>
 #
-# Example for openib using the first Mellanox adapter, port 1 and port 2
+# Example for openib_cma and openib_scm
+#
+# For scm version you specify <ia_params> as actual device name and port
+# For cma version you specify <ia_params> as:
+#    network address, network hostname, or netdev name and 0 for port
+#
+OpenIB-scm1 u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "mthca0 1" ""
+OpenIB-scm2 u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "mthca0 2" ""
+OpenIB-cma-ip u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "192.168.0.22 0" ""
+OpenIB-cma-name u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "svr1-ib0 0" ""
+OpenIB-cma-netdev u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "ib0 0" ""
-IB1 u1.2 nonthreadsafe default

Index: test/dtest/makefile
===================================================================
--- test/dtest/makefile (revision 4589)
+++ test/dtest/makefile (working copy)
@@ -4,13 +4,18 @@
 CFLAGS = -O2 -g
 DAT_INC = ../../dat/include
 DAT_LIB = /usr/local/lib

-all: dtest
+all: dtest dtest_ext

 clean:
-	rm -f *.o;touch *.c;rm -f dtest
+	rm -f *.o;touch *.c;rm -f dtest dtest_ext

 dtest: ./dtest.c
	$(CC) $(CFLAGS) ./dtest.c -o dtest \
	-DDAPL_PROVIDER='"OpenIB-cma-ip"' \
	-I $(DAT_INC) -L $(DAT_LIB) -ldat

+dtest_ext: ./dtest_ext.c
+	$(CC) $(CFLAGS) ./dtest_ext.c -o dtest_ext \
+	-DDAPL_PROVIDER='"OpenIB-cma-ip"' \
+	-I $(DAT_INC) -L $(DAT_LIB) -ldat
+

Index: test/dtest/README
===================================================================
--- test/dtest/README (revision 4589)
+++ test/dtest/README (working copy)
@@ -1,10 +1,11 @@
 simple dapl test just for initial openIB uDAPL testing...
dtest/dtest.c + dtest/dtest_ext.c dtest/makefile dtest/dat.conf -to build (default uDAPL name == IB1, ib device == mthca0, port == 1) +to build (default uDAPL name == OpenIB-cma-ip) edit makefile and change path (DAT_LIB) to appropriate libdat.so edit dat.conf and change path to appropriate libdapl.so cp dat.conf to /etc/dat.conf Index: dapl/include/dapl.h =================================================================== --- dapl/include/dapl.h (revision 4589) +++ dapl/include/dapl.h (working copy) @@ -1,25 +1,28 @@ /* - * Copyright (c) 2002-2003, Network Appliance, Inc. All rights reserved. + * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved. * * This Software is licensed under one of the following licenses: - * + * * 1) under the terms of the "Common Public License 1.0" a copy of which is - * available from the Open Source Initiative, see + * in the file LICENSE.txt in the root directory. The license is also + * available from the Open Source Initiative, see * http://www.opensource.org/licenses/cpl.php. - * - * 2) under the terms of the "The BSD License" a copy of which is - * available from the Open Source Initiative, see + * + * 2) under the terms of the "The BSD License" a copy of which is in the file + * LICENSE2.txt in the root directory. The license is also available from + * the Open Source Initiative, see * http://www.opensource.org/licenses/bsd-license.php. - * + * * 3) under the terms of the "GNU General Public License (GPL) Version 2" a - * copy of which is available from the Open Source Initiative, see + * copy of which is in the file LICENSE3.txt in the root directory. The + * license is also available from the Open Source Initiative, see * http://www.opensource.org/licenses/gpl-license.php. - * + * * Licensee has the right to choose one of the above licenses. - * + * * Redistributions of source code must retain the above copyright * notice and one of the license notices. - * + * * Redistributions in binary form must reproduce both the above copyright * notice, one of the license notices in the documentation * and/or other materials provided with the distribution. 
@@ -61,6 +64,8 @@ #include "dapl_dummy_util.h" #elif OPENIB #include "dapl_ib_util.h" +#elif DET +#include "dapl_det_util.h" #endif /********************************************************************* @@ -213,6 +218,10 @@ typedef struct dapl_cookie DAPL_COOKIE; typedef struct dapl_dto_cookie DAPL_DTO_COOKIE; typedef struct dapl_rmr_cookie DAPL_RMR_COOKIE; +#ifdef DAPL_EXTENSIONS +typedef struct dapl_ext_cookie DAPL_EXT_COOKIE; +#endif + typedef struct dapl_private DAPL_PRIVATE; typedef void (*DAPL_CONNECTION_STATE_HANDLER) ( @@ -563,6 +572,13 @@ typedef enum dapl_dto_type DAPL_DTO_TYPE_RECV, DAPL_DTO_TYPE_RDMA_WRITE, DAPL_DTO_TYPE_RDMA_READ, +#ifdef DAPL_EXTENSIONS + DAPL_DTO_TYPE_RDMA_WRITE_IMMED, + DAPL_DTO_TYPE_RECV_IMMED, + DAPL_DTO_TYPE_CMP_AND_SWAP, + DAPL_DTO_TYPE_FETCH_AND_ADD, +#endif + } DAPL_DTO_TYPE; typedef enum dapl_cookie_type @@ -570,6 +586,9 @@ typedef enum dapl_cookie_type DAPL_COOKIE_TYPE_NULL, DAPL_COOKIE_TYPE_DTO, DAPL_COOKIE_TYPE_RMR, +#ifdef DAPL_EXTENSIONS + DAPL_COOKIE_TYPE_EXTENSION, +#endif } DAPL_COOKIE_TYPE; /* DAPL_DTO_COOKIE used as context for DTO WQEs */ @@ -587,6 +606,27 @@ struct dapl_rmr_cookie DAT_RMR_COOKIE cookie; }; +#ifdef DAPL_EXTENSIONS + +/* DAPL extended cookie types */ +typedef enum dapl_ext_type +{ + DAPL_EXT_TYPE_RDMA_WRITE_IMMED, + DAPL_EXT_TYPE_CMP_AND_SWAP, + DAPL_EXT_TYPE_FETCH_AND_ADD, + DAPL_EXT_TYPE_RECV +} DAPL_EXT_TYPE; + +/* DAPL extended cookie */ +struct dapl_ext_cookie +{ + DAPL_EXT_TYPE type; + DAT_DTO_COOKIE cookie; + DAT_COUNT size; /* used RDMA write with immed */ +}; + +#endif + /* DAPL_COOKIE used as context for WQEs */ struct dapl_cookie { @@ -597,6 +637,9 @@ struct dapl_cookie { DAPL_DTO_COOKIE dto; DAPL_RMR_COOKIE rmr; +#ifdef DAPL_EXTENSIONS + DAPL_EXT_COOKIE ext; +#endif } val; }; @@ -1116,6 +1159,15 @@ extern DAT_RETURN dapl_srq_set_lw( IN DAT_SRQ_HANDLE, /* srq_handle */ IN DAT_COUNT); /* low_watermark */ +#ifdef DAPL_EXTENSIONS + +extern DAT_RETURN dapl_extensions( + IN DAT_HANDLE, /* dat_handle */ + IN DAT_EXT_OP, /* extension operation */ + IN va_list ); /* va_list args */ + +#endif + /* * DAPL internal utility function prototpyes */ Index: dapl/udapl/Makefile =================================================================== --- dapl/udapl/Makefile (revision 4589) +++ dapl/udapl/Makefile (working copy) @@ -156,6 +156,7 @@ PROVIDER = $(TOPDIR)/../openib_cma CFLAGS += -DOPENIB CFLAGS += -DCQ_WAIT_OBJECT CFLAGS += -I/usr/local/include/infiniband +CFLAGS += -I/usr/local/include/rdma endif # @@ -168,6 +169,12 @@ endif # VN_MEM_SHARED_VIRTUAL_SUPPORT # CFLAGS += -DVN_MEM_SHARED_VIRTUAL_SUPPORT=1 +# If an implementation supports DAPL extensions +CFLAGS += -DDAPL_EXTENSIONS + +# If an implementation supports DAPL provider specific attributes +CFLAGS += -DDAPL_PROVIDER_SPECIFIC_ATTR + CFLAGS += -I. CFLAGS += -I.. CFLAGS += -I../../dat/include @@ -283,6 +290,8 @@ LDFLAGS += -libverbs -lrdmacm LDFLAGS += -rpath /usr/local/lib -L /usr/local/lib PROVIDER_SRCS = dapl_ib_util.c dapl_ib_cq.c dapl_ib_qp.c \ dapl_ib_cm.c dapl_ib_mem.c +# implementation supports DAPL extensions +PROVIDER_SRCS += dapl_ib_extensions.c endif UDAPL_SRCS = dapl_init.c \ Index: dapl/common/dapl_ia_query.c =================================================================== --- dapl/common/dapl_ia_query.c (revision 4589) +++ dapl/common/dapl_ia_query.c (working copy) @@ -167,6 +167,14 @@ dapl_ia_query ( #if !defined(__KDAPL__) provider_attr->pz_support = DAT_PZ_UNIQUE; #endif /* !KDAPL */ + + /* + * Have provider set their own. 
+ */ +#ifdef DAPL_PROVIDER_SPECIFIC_ATTR + dapls_set_provider_specific_attr(provider_attr); +#endif + /* * Set up evd_stream_merging_supported options. Note there is * one bit per allowable combination, using the ordinal Index: dapl/common/dapl_adapter_util.h =================================================================== --- dapl/common/dapl_adapter_util.h (revision 4589) +++ dapl/common/dapl_adapter_util.h (working copy) @@ -256,6 +256,21 @@ dapls_ib_wait_object_wait ( IN u_int32_t timeout); #endif +#ifdef DAPL_PROVIDER_SPECIFIC_ATTR +void +dapls_set_provider_specific_attr( + IN DAT_PROVIDER_ATTR *provider_attr ); +#endif + +#ifdef DAPL_EXTENSIONS +void +dapls_cqe_to_event_extension( + IN DAPL_EP *ep_ptr, + IN DAPL_COOKIE *cookie, + IN ib_work_completion_t *cqe_ptr, + OUT DAT_EVENT *event_ptr); +#endif + /* * Values for provider DAT_NAMED_ATTR */ @@ -272,6 +287,8 @@ dapls_ib_wait_object_wait ( #include "dapl_dummy_dto.h" #elif OPENIB #include "dapl_ib_dto.h" +#elif DET +#include "dapl_det_dto.h" #endif Index: dapl/common/dapl_provider.c =================================================================== --- dapl/common/dapl_provider.c (revision 4589) +++ dapl/common/dapl_provider.c (working copy) @@ -221,7 +221,11 @@ DAT_PROVIDER g_dapl_provider_template = &dapl_srq_post_recv, &dapl_srq_query, &dapl_srq_resize, - &dapl_srq_set_lw + &dapl_srq_set_lw, + +#ifdef DAPL_EXTENSIONS + &dapl_extensions +#endif }; #endif /* __KDAPL__ */ Index: dapl/common/dapl_evd_util.c =================================================================== --- dapl/common/dapl_evd_util.c (revision 4589) +++ dapl/common/dapl_evd_util.c (working copy) @@ -502,6 +502,20 @@ dapli_evd_eh_print_cqe ( #ifdef DAPL_DBG static char *optable[] = { +#ifdef OPENIB + /* different order for openib verbs */ + "OP_RDMA_WRITE", + "OP_RDMA_WRITE_IMM", + "OP_SEND", + "OP_SEND_IMM", + "OP_RDMA_READ", + "OP_COMP_AND_SWAP", + "OP_FETCH_AND_ADD", + "OP_RECEIVE", + "OP_RECEIVE_IMM", + "OP_BIND_MW", + "OP_INVALID", +#else "OP_SEND", "OP_RDMA_READ", "OP_RDMA_WRITE", @@ -509,6 +523,7 @@ dapli_evd_eh_print_cqe ( "OP_FETCH_AND_ADD", "OP_RECEIVE", "OP_BIND_MW", +#endif 0 }; @@ -1113,6 +1128,15 @@ dapli_evd_cqe_to_event ( dapls_cookie_dealloc (&ep_ptr->req_buffer, cookie); break; } + +#ifdef DAPL_EXTENSIONS + case DAPL_COOKIE_TYPE_EXTENSION: + { + dapls_cqe_to_event_extension(ep_ptr, cookie, cqe_ptr, event_ptr); + break; + } +#endif + default: { dapl_os_assert (!"Invalid Operation type"); Index: dapl/openib_cma/dapl_ib_dto.h =================================================================== --- dapl/openib_cma/dapl_ib_dto.h (revision 4589) +++ dapl/openib_cma/dapl_ib_dto.h (working copy) @@ -35,7 +35,7 @@ * * Description: * - * The uDAPL openib provider - DTO operations and CQE macros + * The OpenIB uCMA provider - DTO operations and CQE macros * ************************************************************************ **** * Source Control System Information @@ -119,7 +119,6 @@ dapls_ib_post_recv ( return DAT_SUCCESS; } - /* * dapls_ib_post_send * @@ -191,7 +190,7 @@ dapls_ib_post_send ( if (cookie != NULL) cookie->val.dto.size = total_len; - + if ((op_type == OP_RDMA_WRITE) || (op_type == OP_RDMA_READ)) { wr.wr.rdma.remote_addr = remote_iov->target_address; wr.wr.rdma.rkey = remote_iov->rmr_context; @@ -224,6 +223,152 @@ dapls_ib_post_send ( return DAT_SUCCESS; } +#ifdef DAPL_EXTENSIONS +/* + * dapls_ib_post_ext_send + * + * Provider specific extended Post SEND function + */ +STATIC _INLINE_ DAT_RETURN +dapls_ib_post_ext_send ( + IN DAPL_EP 
*ep_ptr, + IN ib_send_op_type_t op_type, + IN DAPL_COOKIE *cookie, + IN DAT_COUNT segments, + IN DAT_LMR_TRIPLET *local_iov, + IN const DAT_RMR_TRIPLET *remote_iov, + IN DAT_UINT32 idata, + IN DAT_UINT64 compare_add, + IN DAT_UINT64 swap, + IN DAT_COMPLETION_FLAGS completion_flags) +{ + dapl_dbg_log(DAPL_DBG_TYPE_EP, + " post_snd: ep %p op %d ck %p sgs", + "%d l_iov %p r_iov %p f %d\n", + ep_ptr, op_type, cookie, segments, local_iov, + remote_iov, completion_flags); + + ib_data_segment_t ds_array[DEFAULT_DS_ENTRIES]; + ib_data_segment_t *ds_array_p; + struct ibv_send_wr wr; + struct ibv_send_wr *bad_wr; + ib_hca_transport_t *ibt_ptr = + &ep_ptr->header.owner_ia->hca_ptr->ib_trans; + DAT_COUNT i, total_len; + + dapl_dbg_log(DAPL_DBG_TYPE_EP, + " post_snd: ep %p cookie %p segs %d l_iov %p\n", + ep_ptr, cookie, segments, local_iov); + + if(segments <= DEFAULT_DS_ENTRIES) + ds_array_p = ds_array; + else + ds_array_p = + dapl_os_alloc(segments * sizeof(ib_data_segment_t)); + + if (NULL == ds_array_p) + return (DAT_INSUFFICIENT_RESOURCES); + + /* setup the work request */ + wr.next = 0; + wr.opcode = op_type; + wr.num_sge = 0; + wr.send_flags = 0; + wr.wr_id = (uint64_t)(uintptr_t)cookie; + wr.sg_list = ds_array_p; + total_len = 0; + + for (i = 0; i < segments; i++ ) { + if ( !local_iov[i].segment_length ) + continue; + + ds_array_p->addr = (uint64_t) local_iov[i].virtual_address; + ds_array_p->length = local_iov[i].segment_length; + ds_array_p->lkey = local_iov[i].lmr_context; + + dapl_dbg_log(DAPL_DBG_TYPE_EP, + " post_snd: lkey 0x%x va %p len %d\n", + ds_array_p->lkey, ds_array_p->addr, + ds_array_p->length ); + + total_len += ds_array_p->length; + wr.num_sge++; + ds_array_p++; + } + + if (cookie != NULL) + cookie->val.dto.size = total_len; + + if ((op_type == OP_RDMA_WRITE) || + (op_type == OP_RDMA_WRITE_IMM) || + (op_type == OP_RDMA_READ)) { + wr.wr.rdma.remote_addr = remote_iov->target_address; + wr.wr.rdma.rkey = remote_iov->rmr_context; + dapl_dbg_log(DAPL_DBG_TYPE_EP, + " post_snd_rdma: rkey 0x%x va %#016Lx\n", + wr.wr.rdma.rkey, wr.wr.rdma.remote_addr); + } + + switch (op_type) { + case OP_RDMA_WRITE_IMM: + dapl_dbg_log(DAPL_DBG_TYPE_EP, + " post_snd: OP_RDMA_WRITE_IMMED=0x%x\n", idata ); + wr.imm_data = idata; + break; + case OP_COMP_AND_SWAP: + /* OP_COMP_AND_SWAP has direct IBAL wr_type mapping */ + dapl_dbg_log(DAPL_DBG_TYPE_EP, + " post_snd: OP_COMP_AND_SWAP=%lx," + "%lx rkey 0x%x va %#016Lx\n", + compare_add, swap, remote_iov->rmr_context, + remote_iov->target_address); + + wr.wr.atomic.compare_add = compare_add; + wr.wr.atomic.swap = swap; + wr.wr.atomic.remote_addr = remote_iov->target_address; + wr.wr.atomic.rkey = remote_iov->rmr_context; + break; + case OP_FETCH_AND_ADD: + /* OP_FETCH_AND_ADD has direct IBAL wr_type mapping */ + dapl_dbg_log(DAPL_DBG_TYPE_EP, + " post_snd: OP_FETCH_AND_ADD=%lx," + "%lx rkey 0x%x va %#016Lx\n", + compare_add, remote_iov->rmr_context, + remote_iov->target_address); + + wr.wr.atomic.compare_add = compare_add; + wr.wr.atomic.remote_addr = remote_iov->target_address; + wr.wr.atomic.rkey = remote_iov->rmr_context; + break; + default: + break; + } + + /* inline data for send or write ops */ + if ((total_len <= ibt_ptr->max_inline_send) && + ((op_type == OP_SEND) || (op_type == OP_RDMA_WRITE))) + wr.send_flags |= IBV_SEND_INLINE; + + /* set completion flags in work request */ + wr.send_flags |= (DAT_COMPLETION_SUPPRESS_FLAG & + completion_flags) ? 0 : IBV_SEND_SIGNALED; + wr.send_flags |= (DAT_COMPLETION_BARRIER_FENCE_FLAG & + completion_flags) ? 
IBV_SEND_FENCE : 0; + wr.send_flags |= (DAT_COMPLETION_SOLICITED_WAIT_FLAG & + completion_flags) ? IBV_SEND_SOLICITED : 0; + + dapl_dbg_log(DAPL_DBG_TYPE_EP, + " post_snd: op 0x%x flags 0x%x sglist %p, %d\n", + wr.opcode, wr.send_flags, wr.sg_list, wr.num_sge); + + if (ibv_post_send(ep_ptr->qp_handle->cm_id->qp, &wr, &bad_wr)) + return( dapl_convert_errno(EFAULT,"ibv_recv") ); + + dapl_dbg_log(DAPL_DBG_TYPE_EP," post_snd: returned\n"); + return DAT_SUCCESS; +} +#endif + STATIC _INLINE_ DAT_RETURN dapls_ib_optional_prv_dat( IN DAPL_CR *cr_ptr, Index: dapl/openib_cma/dapl_ib_util.c =================================================================== --- dapl/openib_cma/dapl_ib_util.c (revision 4589) +++ dapl/openib_cma/dapl_ib_util.c (working copy) @@ -35,7 +35,7 @@ * * Description: * - * The uDAPL openib provider - init, open, close, utilities, work thread + * The OpenIB uCMA provider - init, open, close, utilities, work thread * ************************************************************************ **** * Source Control System Information @@ -64,7 +64,6 @@ static const char rcsid[] = "$Id: $"; #include /* for struct ifreq */ #include /* for ARPHRD_INFINIBAND */ - int g_dapl_loopback_connection = 0; int g_ib_pipe[2]; ib_thread_state_t g_ib_thread_state = 0; @@ -727,7 +726,7 @@ void dapli_thread(void *arg) int ret,idx,fds; char rbuf[2]; - dapl_dbg_log (DAPL_DBG_TYPE_CM, + dapl_dbg_log (DAPL_DBG_TYPE_UTIL, " ib_thread(%d,0x%x): ENTER: pipe %d ucma %d\n", getpid(), g_ib_thread, g_ib_pipe[0], rdma_get_fd()); @@ -767,7 +766,7 @@ void dapli_thread(void *arg) ufds[idx].revents = 0; uhca[idx] = hca; - dapl_dbg_log(DAPL_DBG_TYPE_CM, + dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " ib_thread(%d) poll_fd: hca[%d]=%p, async=%d" " pipe=%d cm=%d cq=d\n", getpid(), hca, ufds[idx-1].fd, @@ -783,14 +782,14 @@ void dapli_thread(void *arg) dapl_os_unlock(&g_hca_lock); ret = poll(ufds, fds, -1); if (ret <= 0) { - dapl_dbg_log(DAPL_DBG_TYPE_WARN, + dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " ib_thread(%d): ERR %s poll\n", getpid(),strerror(errno)); dapl_os_lock(&g_hca_lock); continue; } - dapl_dbg_log(DAPL_DBG_TYPE_CM, + dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " ib_thread(%d) poll_event: " " async=0x%x pipe=0x%x cm=0x%x cq=0x%x\n", getpid(), ufds[idx-1].revents, ufds[0].revents, @@ -834,3 +833,63 @@ void dapli_thread(void *arg) dapl_os_unlock(&g_hca_lock); } +#ifdef DAPL_PROVIDER_SPECIFIC_ATTR +/* + * dapls_set_provider_specific_attr + * + * Input: + * attr_ptr Pointer provider attributes + * + * Output: + * none + * + * Returns: + * void + */ +DAT_NAMED_ATTR ib_attrs[] = { + +#ifdef DAPL_EXTENSIONS + { + DAT_EXT_ATTR, + DAT_EXT_ATTR_TRUE + }, + { + DAT_EXT_ATTR_RDMA_WRITE_IMMED, + DAT_EXT_ATTR_TRUE + }, + { + DAT_EXT_ATTR_RECV_IMMED, + DAT_EXT_ATTR_TRUE + }, + /* inbound immediate data placed in event, NOT payload */ + { + DAT_EXT_ATTR_RECV_IMMED_EVENT, + DAT_EXT_ATTR_TRUE + }, + { + DAT_EXT_ATTR_FETCH_AND_ADD, + DAT_EXT_ATTR_TRUE + }, + { + DAT_EXT_ATTR_CMP_AND_SWAP, + DAT_EXT_ATTR_TRUE + }, +#else + { + "DAT_EXTENSION_INTERFACE", + "FALSE" + }, +#endif +}; + +#define SPEC_ATTR_SIZE(x) ( sizeof(x)/sizeof(DAT_NAMED_ATTR) ) + +void dapls_set_provider_specific_attr( + IN DAT_PROVIDER_ATTR *attr_ptr ) +{ + attr_ptr->num_provider_specific_attr = SPEC_ATTR_SIZE(ib_attrs); + attr_ptr->provider_specific_attr = ib_attrs; +} + +#endif + Index: dapl/openib_cma/dapl_ib_mem.c =================================================================== --- dapl/openib_cma/dapl_ib_mem.c (revision 4589) +++ dapl/openib_cma/dapl_ib_mem.c (working copy) @@ -25,9 
+25,9 @@ /********************************************************************** * - * MODULE: dapl_det_mem.c + * MODULE: dapl_ib_mem.c * - * PURPOSE: Intel DET APIs: Memory windows, registration, + * PURPOSE: OpenIB uCMA provider Memory windows, registration, * and protection domain * * $Id: $ @@ -72,12 +72,10 @@ dapls_convert_privileges(IN DAT_MEM_PRIV access |= IBV_ACCESS_LOCAL_WRITE; if (DAT_MEM_PRIV_REMOTE_WRITE_FLAG & privileges) access |= IBV_ACCESS_REMOTE_WRITE; - if (DAT_MEM_PRIV_REMOTE_READ_FLAG & privileges) - access |= IBV_ACCESS_REMOTE_READ; - if (DAT_MEM_PRIV_REMOTE_READ_FLAG & privileges) - access |= IBV_ACCESS_REMOTE_READ; - if (DAT_MEM_PRIV_REMOTE_READ_FLAG & privileges) + if (DAT_MEM_PRIV_REMOTE_READ_FLAG & privileges) { access |= IBV_ACCESS_REMOTE_READ; + access |= IBV_ACCESS_REMOTE_ATOMIC; + } return access; } Index: dapl/openib_cma/dapl_ib_cm.c =================================================================== --- dapl/openib_cma/dapl_ib_cm.c (revision 4589) +++ dapl/openib_cma/dapl_ib_cm.c (working copy) @@ -35,7 +35,7 @@ * * Description: * - * The uDAPL openib provider - connection management + * The OpenIB uCMA provider - uCMA connection management * ************************************************************************ **** * Source Control System Information @@ -592,7 +592,11 @@ dapls_ib_setup_conn_listener(IN DAPL_IA if (rdma_bind_addr(conn->cm_id, (struct sockaddr *)&ia_ptr->hca_ptr->hca_address)) { - dat_status = dapl_convert_errno(errno,"setup_listener"); + if (errno == -EBUSY) + dat_status = DAT_CONN_QUAL_IN_USE; + else + dat_status = + dapl_convert_errno(errno,"setup_listener"); goto bail; } Index: dapl/openib_cma/dapl_ib_qp.c =================================================================== --- dapl/openib_cma/dapl_ib_qp.c (revision 4589) +++ dapl/openib_cma/dapl_ib_qp.c (working copy) @@ -25,9 +25,9 @@ /********************************************************************** * - * MODULE: dapl_det_qp.c + * MODULE: dapl_ib_qp.c * - * PURPOSE: QP routines for access to DET Verbs + * PURPOSE: OpenIB uCMA QP routines * * $Id: $ **********************************************************************/ Index: dapl/openib_cma/dapl_ib_util.h =================================================================== --- dapl/openib_cma/dapl_ib_util.h (revision 4589) +++ dapl/openib_cma/dapl_ib_util.h (working copy) @@ -35,7 +35,7 @@ * * Description: * - * The uDAPL openib provider - definitions, prototypes, + * The OpenIB uCMA provider - definitions, prototypes, * ************************************************************************ **** * Source Control System Information Index: dapl/openib_cma/README =================================================================== --- dapl/openib_cma/README (revision 4589) +++ dapl/openib_cma/README (working copy) @@ -23,15 +23,22 @@ New files for openib_scm provider dapl/openib_cma/dapl_ib_util.c dapl/openib_cma/dapl_ib_util.h dapl/openib_cma/dapl_ib_cm.c + dapl/openib_cma/dapl_ib_extensions.c A simple dapl test just for openib_scm testing... test/dtest/dtest.c + test/dtest/dtest_ext.c test/dtest/makefile server: dtest -s client: dtest -h hostname +or with extensions + + server: dtest_ext -s + client: dtest_ext -h hostname + known issues: no memory windows support in ibverbs, dat_create_rmr fails. 
Index: dapl/openib_cma/dapl_ib_cq.c =================================================================== --- dapl/openib_cma/dapl_ib_cq.c (revision 4589) +++ dapl/openib_cma/dapl_ib_cq.c (working copy) @@ -35,7 +35,7 @@ * * Description: * - * The uDAPL openib provider - completion queue + * The OpenIB uCMA provider - completion queue * ************************************************************************ **** * Source Control System Information @@ -498,7 +498,10 @@ dapls_ib_wait_object_wait(IN ib_wait_obj if (timeout != DAT_TIMEOUT_INFINITE) timeout_ms = timeout/1000; - status = poll(&cq_fd, 1, timeout_ms); + /* restart syscall */ + while ((status = poll(&cq_fd, 1, timeout_ms)) == -1 ) + if (errno == EINTR) + continue; /* returned event */ if (status > 0) { @@ -511,6 +514,8 @@ dapls_ib_wait_object_wait(IN ib_wait_obj /* timeout */ } else if (status == 0) status = ETIMEDOUT; + else + status = errno; dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " cq_object_wait: RET evd %p ibv_cq %p ibv_ctx %p %s\n", Index: dat/include/dat/dat_redirection.h =================================================================== --- dat/include/dat/dat_redirection.h (revision 4589) +++ dat/include/dat/dat_redirection.h (working copy) @@ -59,10 +59,10 @@ typedef struct dat_provider DAT_PROVIDER * This would allow a good compiler to avoid indirection overhead when * making function calls. */ - #define DAT_HANDLE_TO_PROVIDER(handle) (*(DAT_PROVIDER **)(handle)) #endif + #define DAT_IA_QUERY(ia, evd, ia_msk, ia_ptr, p_msk, p_ptr) \ (*DAT_HANDLE_TO_PROVIDER (ia)->ia_query_func) (\ (ia), \ @@ -395,6 +395,12 @@ typedef struct dat_provider DAT_PROVIDER (lbuf), \ (cookie)) +#define DAT_EXTENSION(handle, op, args) \ + (*DAT_HANDLE_TO_PROVIDER (handle)->extension_func) (\ + (handle), \ + (op), \ + (args)) + /*************************************************************** * * FUNCTION PROTOTYPES @@ -720,4 +726,11 @@ typedef DAT_RETURN (*DAT_SRQ_POST_RECV_F IN DAT_LMR_TRIPLET *, /* local_iov */ IN DAT_DTO_COOKIE ); /* user_cookie */ +/* Extension function */ +#include +typedef DAT_RETURN (*DAT_EXTENSION_FUNC) ( + IN DAT_HANDLE, /* dat handle */ + IN DAT_EXT_OP, /* extension operation */ + IN va_list ); /* va_list */ + #endif /* _DAT_REDIRECTION_H_ */ Index: dat/include/dat/dat.h =================================================================== --- dat/include/dat/dat.h (revision 4589) +++ dat/include/dat/dat.h (working copy) @@ -854,11 +854,15 @@ typedef enum dat_event_number DAT_ASYNC_ERROR_EP_BROKEN = 0x08003, DAT_ASYNC_ERROR_TIMED_OUT = 0x08004, DAT_ASYNC_ERROR_PROVIDER_INTERNAL_ERROR = 0x08005, - DAT_SOFTWARE_EVENT = 0x10001 + DAT_SOFTWARE_EVENT = 0x10001, + DAT_EXTENSION_EVENT = 0x20001 + } DAT_EVENT_NUMBER; -/* Union for event Data */ +/* include extension data definitions */ +#include +/* Union for event Data */ typedef union dat_event_data { DAT_DTO_COMPLETION_EVENT_DATA dto_completion_event_data; @@ -867,6 +871,7 @@ typedef union dat_event_data DAT_CONNECTION_EVENT_DATA connect_event_data; DAT_ASYNCH_ERROR_EVENT_DATA asynch_error_event_data; DAT_SOFTWARE_EVENT_DATA software_event_data; + DAT_EXTENSION_DATA extension_data; } DAT_EVENT_DATA; /* Event struct that holds all event information */ @@ -1222,6 +1227,11 @@ extern DAT_RETURN dat_srq_set_lw ( IN DAT_SRQ_HANDLE, /* srq_handle */ IN DAT_COUNT); /* low_watermark */ +extern DAT_RETURN dat_extension( + IN DAT_HANDLE, + IN DAT_EXT_OP, + IN ... ); + /* * DAT registry functions. 
* Index: dat/include/dat/udat_redirection.h =================================================================== --- dat/include/dat/udat_redirection.h (revision 4589) +++ dat/include/dat/udat_redirection.h (working copy) @@ -199,7 +199,6 @@ typedef DAT_RETURN (*DAT_EVD_SET_UNWAITA typedef DAT_RETURN (*DAT_EVD_CLEAR_UNWAITABLE_FUNC) ( IN DAT_EVD_HANDLE); /* evd_handle */ - #include struct dat_provider @@ -294,6 +293,10 @@ struct dat_provider DAT_SRQ_QUERY_FUNC srq_query_func; DAT_SRQ_RESIZE_FUNC srq_resize_func; DAT_SRQ_SET_LW_FUNC srq_set_lw_func; + + /* extension for provder specific functions */ + DAT_EXTENSION_FUNC extension_func; + }; #endif /* _UDAT_REDIRECTION_H_ */ Index: dat/include/dat/dat_extensions.h =================================================================== --- dat/include/dat/dat_extensions.h (revision 0) +++ dat/include/dat/dat_extensions.h (revision 0) @@ -0,0 +1,209 @@ +/* + * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved. + * + * This Software is licensed under one of the following licenses: + * + * 1) under the terms of the "Common Public License 1.0" a copy of which is + * in the file LICENSE.txt in the root directory. The license is also + * available from the Open Source Initiative, see + * http://www.opensource.org/licenses/cpl.php. + * + * 2) under the terms of the "The BSD License" a copy of which is in the file + * LICENSE2.txt in the root directory. The license is also available from + * the Open Source Initiative, see + * http://www.opensource.org/licenses/bsd-license.php. + * + * 3) under the terms of the "GNU General Public License (GPL) Version 2" a + * copy of which is in the file LICENSE3.txt in the root directory. The + * license is also available from the Open Source Initiative, see + * http://www.opensource.org/licenses/gpl-license.php. + * + * Licensee has the right to choose one of the above licenses. + * + * Redistributions of source code must retain the above copyright + * notice and one of the license notices. + * + * Redistributions in binary form must reproduce both the above copyright + * notice, one of the license notices in the documentation + * and/or other materials provided with the distribution. + */ +/********************************************************************** + * + * HEADER: dat_extensions.h + * + * PURPOSE: defines the extensions to the DAT API for uDAPL. + * + * Description: Header file for "uDAPL: User Direct Access Programming + * Library, Version: 1.2" + * + * Mapping rules: + * All global symbols are prepended with "DAT_" or "dat_" + * All DAT objects have an 'api' tag which, such as 'ep' or 'lmr' + * The method table is in the provider definition structure. 
+ * + * + **********************************************************************/ + +#ifndef _DAT_EXTENSIONS_H_ + +extern int dat_extensions; + +/* + * Provider specific attribute strings for extension support + * returned with dat_ia_query() and + * DAT_PROVIDER_ATTR_MASK == DAT_PROVIDER_FIELD_PROVIDER_SPECIFIC_ATTR + * + * DAT_NAMED_ATTR name == extended operation, + * value == TRUE if extended operation is supported + */ +#define DAT_EXT_ATTR "DAT_EXTENSION_INTERFACE" +#define DAT_EXT_ATTR_RDMA_WRITE_IMMED "DAT_EXT_RDMA_WRITE_IMMED" +#define DAT_EXT_ATTR_RECV_IMMED "DAT_EXT_RECV_IMMED" +#define DAT_EXT_ATTR_RECV_IMMED_EVENT "DAT_EXT_RECV_IMMED_EVENT" +#define DAT_EXT_ATTR_RECV_IMMED_PAYLOAD "DAT_EXT_RECV_IMMED_PAYLOAD" +#define DAT_EXT_ATTR_FETCH_AND_ADD "DAT_EXT_FETCH_AND_ADD" +#define DAT_EXT_ATTR_CMP_AND_SWAP "DAT_EXT_CMP_AND_SWAP" +#define DAT_EXT_ATTR_TRUE "TRUE" +#define DAT_EXT_ATTR_FALSE "FALSE" + +/* + * Extension OPERATIONS + */ +typedef enum dat_ext_op +{ + DAT_EXT_RDMA_WRITE_IMMED, + DAT_EXT_RECV_IMMED, + DAT_EXT_FETCH_AND_ADD, + DAT_EXT_CMP_AND_SWAP, + +} DAT_EXT_OP; + +/* + * Extension completion event TYPES + */ +typedef enum dat_ext_event_type +{ + DAT_EXT_RDMA_WRITE_IMMED_STATUS = 1, + DAT_EXT_RECV_NO_IMMED, + DAT_EXT_RECV_IMMED_DATA_EVENT, + DAT_EXT_RECV_IMMED_DATA_PAYLOAD, + DAT_EXT_FETCH_AND_ADD_STATUS, + DAT_EXT_CMP_AND_SWAP_STATUS, + +} DAT_EXT_EVENT_TYPE; + +/* + * Extension completion event DATA + */ +typedef struct dat_immediate_data +{ + DAT_UINT32 data; + +} DAT_RDMA_WRITE_IMMED_DATA; + +typedef struct dat_extension_data +{ + DAT_DTO_COMPLETION_EVENT_DATA dto; + DAT_EXT_EVENT_TYPE type; + union { + DAT_RDMA_WRITE_IMMED_DATA immed; + } val; +} DAT_EXTENSION_DATA; + +typedef enum dat_ext_flags +{ + DAT_EXT_WRITE_IMMED_FLAG = 0x1, + DAT_EXT_WRITE_CONFIRM_FLAG = 0x2, + +} DAT_EXT_FLAGS; + +/* + * Extended API with redirection via DAT extension function + */ + +/* + * RDMA Write with IMMEDIATE extension: + * + * Asynchronous call performs a normal RDMA write to the remote endpoint + * followed by a post of an extended immediate data value to the receive + * EVD on the remote endpoint. Event completion for the request completes + * as an DAT_EXTENSION_EVENT with type set to DAT_EXT_RDMA_WRITE_IMMED_STATUS. + * Event completion on the remote endpoint completes as an DAT_EXTENSION_EVENT + * with type set to DAT_EXT_RECV_IMMED_DATA_IN_EVENT or + * DAT_EXT_RECV_IMMED_DATA_IN_PAYLOAD depending on the provider transport. + * + * DAT_EXT_WRITE_IMMED_FLAG requests that the supplied + *'immediate' value be sent as the payload of a four byte send following + * the RDMA Write, or any transport-dependent equivalent thereof. + * For example, on InfiniBand the request should be translated as an + * RDMA Write with Immediate. + * + * DAT_EXT_WRITE_CONFIRM_FLAG requests that this DTO + * not complete until receipt by the far end is confirmed. + * + * Note to Consumers: the immediate data will consume a receive + * buffer at the Data Sink. + * + * Other extension flags: + * n/a + */ +#define dat_ep_post_rdma_write_immed(ep, size, lbuf, cookie, rbuf, idata, eflgs, flgs) \ + dat_extension( ep, \ + DAT_EXT_RDMA_WRITE_IMMED, \ + (size), \ + (lbuf), \ + (cookie), \ + (rbuf), \ + (idata), \ + (eflgs), \ + (flgs)) + +/* + * Call performs a normal post receive message to the local endpoint + * that includes additional 32-bit buffer space for immediate data + * Event completion for the request completes as an + * DAT_EXTENSION_EVENT with type set to DAT_EXT_RDMA_WRITE_IMMED_STATUS. 
+ */ +#define dat_ep_post_recv_immed(ep, size, lbuf, cookie, flgs) \ + dat_extension( ep, \ + DAT_EXT_RECV_IMMED, \ + (size), \ + (lbuf), \ + (cookie), \ + (flgs)) + +/* + * This asynchronous call is modeled after the InfiniBand atomic + * Fetch and Add operation. The add_value is added to the 64 bit + * value stored at the remote memory location specified in remote_iov + * and the result is stored in the local_iov. + */ +#define dat_ep_post_fetch_and_add(ep, add_val, lbuf, cookie, rbuf, flgs) \ + dat_extension( ep, \ + DAT_EXT_FETCH_AND_ADD, \ + (add_val), \ + (lbuf), \ + (cookie), \ + (rbuf), \ + (flgs)) + +/* + * This asynchronous call is modeled after the InfiniBand atomic + * Compare and Swap operation. The cmp_value is compared to the 64 bit + * value stored at the remote memory location specified in remote_iov. + * If the two values are equal, the 64 bit swap_value is stored in + * the remote memory location. In all cases, the original 64 bit + * value stored in the remote memory location is copied to the local_iov. + */ +#define dat_ep_post_cmp_and_swap(ep, cmp_val, swap_val, lbuf, cookie, rbuf, flgs) \ + dat_extension( ep, \ + DAT_EXT_CMP_AND_SWAP, \ + (cmp_val), \ + (swap_val), \ + (lbuf), \ + (cookie), \ + (rbuf), \ + (flgs)) + +#endif /* _DAT_EXTENSIONS_H_ */ + Index: dat/common/dat_api.c =================================================================== --- dat/common/dat_api.c (revision 4594) +++ dat/common/dat_api.c (working copy) @@ -1142,6 +1142,36 @@ DAT_RETURN dat_srq_set_lw( low_watermark); } +DAT_RETURN dat_extension( + IN DAT_HANDLE handle, + IN DAT_EXT_OP ext_op, + IN ... ) + +{ + DAT_RETURN status; + va_list args; + + if (handle == NULL) + { + return DAT_ERROR(DAT_INVALID_HANDLE, DAT_INVALID_HANDLE_EP); + } + + /* verify provider extension support */ + if (!dat_extensions) + { + return DAT_ERROR(DAT_NOT_IMPLEMENTED, 0); + } + + va_start(args, ext_op); + + status = DAT_EXTENSION(handle, + ext_op, + args); + va_end(args); + + return status; +} + /* * Local variables: * c-indent-level: 4 Index: dat/udat/udat.c =================================================================== --- dat/udat/udat.c (revision 4594) +++ dat/udat/udat.c (working copy) @@ -66,6 +66,10 @@ udat_check_state ( void ); * * *********************************************************************/ +/* + * Use a global to get an unresolved when run with pre-extension library + */ +int dat_extensions = 0; /* * @@ -230,13 +234,44 @@ dat_ia_openv ( async_event_qlen, async_event_handle, ia_handle); + + /* + * See if provider supports extensions + */ if (dat_status == DAT_SUCCESS) { - return_handle = dats_set_ia_handle (*ia_handle); - if (return_handle >= 0) - { - *ia_handle = (DAT_IA_HANDLE)return_handle; - } + DAT_PROVIDER_ATTR p_attr; + int i; + + return_handle = dats_set_ia_handle (*ia_handle); + if (return_handle >= 0) + { + *ia_handle = (DAT_IA_HANDLE)return_handle; + } + + if ( dat_ia_query( *ia_handle, + NULL, + 0, + NULL, + DAT_PROVIDER_FIELD_PROVIDER_SPECIFIC_ATTR, + &p_attr ) == DAT_SUCCESS ) + { + for ( i = 0; i < p_attr.num_provider_specific_attr; i++ ) + { + if ( (strcmp( p_attr.provider_specific_attr[i].name, + "DAT_EXTENSION_INTERFACE" ) == 0) && + (strcmp( p_attr.provider_specific_attr[i].value, + "TRUE" ) == 0) ) + { + dat_os_dbg_print(DAT_OS_DBG_TYPE_CONSUMER_API, + "DAT Registry: dat_ia_open () " + "DAPL Extension Interface supported!\n"); + + dat_extensions = 1; + break; + } + } + } } return dat_status; Index: README =================================================================== 
--- README (revision 4589)
+++ README (working copy)
@@ -1,5 +1,10 @@
 There are now 3 uDAPL providers for openib (openib,openib_scm,openib_cma).

+NEW FEATURES for openib_cma provider:
+API extensions for immediate data and atomic operations have been added.
+see dat/include/dat/dat_extensions.h for new APIs.
+see dapl/test/dtest/dtest_ext.c for example test case
+
 ==========
 1.0 BUILD:

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ardavis at ichips.intel.com Fri Dec 23 12:02:10 2005
From: ardavis at ichips.intel.com (Arlin Davis)
Date: Fri, 23 Dec 2005 12:02:10 -0800
Subject: [openib-general] Re: [RFC][PATCH] OpenIB uDAPL extension proposal - sample immed data and atomic api's
In-Reply-To:
References:
Message-ID: <43AC57C2.8050509@ichips.intel.com>

James Lentini wrote:

>arlin> DAPL provides a generalized abstraction to RDMA capable
>arlin> transports. As a generalized abstraction, it cannot exploit the
>arlin> unique properties that many of the underlying
>arlin> platforms/interconnects can provide so I would like to propose
>arlin> a simple (minimum impact on libdat) extensible interface to
>arlin> uDAPL that will allow vendors to expose such capabilities. I am
>arlin> looking for feedback, especially from the DAT collaborative. I
>arlin> have included both a design document and actual working code as
>arlin> a reference.
>
>This is an excellent document and clearly certain applications will
>benefit greatly from adding this additional functionality.
>
>Since DAPL's inception, the DAT_PROVIDER structure has contained a
>field called "extension" of type void *. The purpose of this field was
>to allow for the kind of provider/platform/interconnect specific
>extensions you describe.
>
>I believe these features can be added without modifications to the
>current API by defining a particular format for the DAT_PROVIDER's
>extension data and indicating its presence via a provider attribute.
>That would require creating an extension document like this one
>describing an "extension" structure w/ function pointers to the new
>functions and a well known provider attribute value.
>
>Is there a reason this was not feasible? Would minor modifications to
>the existing framework be sufficient (perhaps an "extension" event
>type)?
>

A single entry point is still there with this patch, I just defined it a
little differently, with a function definition for better DAT API
mappings. The idea was to replace the existing pvoid extension definition
with this new one. Can you give me an idea of how you would map these
extended DAT calls to this pvoid function definition?

What is your opinion on the way I extended event data, dapl event
processing, event types, and cookies?

-arlin

>james

From Arkady.Kanevsky at netapp.com Fri Dec 23 12:07:45 2005
From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady)
Date: Fri, 23 Dec 2005 15:07:45 -0500
Subject: [openib-general] RE: [dat-discussions] RE: [RFC][PATCH] OpenIB uDAPL extension proposal - sample immed data and atomic api's
Message-ID:

Actually, iWARP supports neither immediate data nor atomic ops.
Arkady

Arkady Kanevsky                       email: arkady at netapp.com
Network Appliance Inc.                phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.
Fax: 781-895-1195
Waltham, MA 02451                     central phone: 781-768-5300

________________________________

From: Kanevsky, Arkady
Sent: Friday, December 23, 2005 2:55 PM
To: Arlin Davis; Lentini, James
Cc: openib-general at openib.org; dat-discussions at yahoogroups.com
Subject: [dat-discussions] RE: [RFC][PATCH] OpenIB uDAPL extension proposal - sample immed data and atomic api's

Arlin, nice proposal, thanks.
I have one high level question and a few specific technical ones.

1. Why do you want to provide this functionality via extension instead of
as part of a new DAT spec, say 2.0? This will allow Consumers to use all
events, operations, and Provider/IA functionality uniformly instead of via
2 separate layers. This will also ensure that this basic functionality can
be provided by all DAPL Providers the same way on the DAPL and DAT layers.
DAPL 2.0 is not done yet so we have time to incorporate that. DAPL 2.0
already introduced new functionality which is easy to beef up for your
proposal. See DAT_DTOS for example. DAT_EVENT is also modified to handle
remote invalidation, so a small addition for Immediate data and Atomic ops
is a sensible addition. This should simplify the proposal significantly,
as you will not need to introduce any new EXT structures.

In general, the extension route was intended for RNIC|HCA providers to
expose HW capabilities beyond the IBTA, iWARP and VIA standards. The
standard RDMA functionality is best handled via a spec addition. DAT 2.0
does it for FMR, remote and local memory invalidation as well as others.
I had posted a complete list of changes/additions to DAT 2.0 about a month
ago. But we have not yet discussed the version change from 1.3 to 2.0, nor
how much backwards compatibility the spec will provide.

2. What is IMMED_EVENT? Is it just immediate data without any payload?
I suggest changing the name so it will not use "EVENT". Just call it
NO_PAYLOAD. Do you want to support 2 different ways to deliver immediate
data, one in the event and one in the data payload? Why? I would think
that just the event way will do.

3. I suggest beefing up DAT_DTO_COMPLETION_EVENT_DATA and DAT_DTOS to
convey which operation completed and to return Immediate data if the
completed operation had immediate data. Since we already modified these 2
structs as part of the DAT 2.0 change, let's add your proposal to the
change. This will allow Consumers to use a single approach to deal with
completions, an extension to the current one but not a structural one.
No need for DAT_EXTENSION_DATA, DAT_EXT_EVENT_TYPE, DAT_EXT_OP nor the
whole mechanism for extended ops.

4. What is the purpose of DAT_EXT_WRITE_CONFIRM_FLAG? Is it to expose the
IB round-trip semantic? iWARP does not support immediate data. One can try
to format the payload to pass immediate data. Is that what you had in
mind? What is the semantic meaning of the completion with this flag set?
Without the flag set? Are the extended flags additional values for
COMPLETION_FLAGS? 2.4.1 talks about extended flags but where they are
passed in is not defined. DAT 2.0 extended them already for the FMR
barrier. I would prefer to follow that route rather than creating separate
extension completion flags.

5. Why do you need RECV_IMMED? If Immed data is delivered in the event, no
new Recv operation is needed. If the Consumer asks for immediate data in
the payload, where in the payload will it be? If this is needed as a local
match for a remote RDMA_Write to handle immediate data, let's state so.
What happens on a mismatch between the local and remote ops? That is, a
recv was posted for a Send and an RDMA_Write "arrived"? Vice versa?

6. I see an extension for immediate data for rdma_write but not for send.
Is this deliberate? If we are going to extend the DAT semantics to support
Immediate data we may as well support the full IBTA/iWARP functionality
for it.

7. Currently memory registration does not support access to an LMR or RMR
by Atomic ops. Do you propose to extend the meaning of the current
MEM_PRIV for LMR and RMR to cover atomic accesses, or to add new values to
LMR_MEM_PRIV and RMR_MEM_PRIV for atomic operation support?

8. Any alignment requirements for memory used for atomic ops?

9. Any correlation requirements for SRQ buffers to support recv with
immediate data?

Have great holidays,
Arkady

Arkady Kanevsky                       email: arkady at netapp.com
Network Appliance Inc.                phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.          Fax: 781-895-1195
Waltham, MA 02451                     central phone: 781-768-5300

________________________________

From: Arlin Davis [mailto:arlin.r.davis at intel.com]
Sent: Thursday, December 22, 2005 6:20 PM
To: Lentini, James; Kanevsky, Arkady
Cc: openib-general at openib.org; dat-discussions at yahoogroups.com
Subject: [RFC][PATCH] OpenIB uDAPL extension proposal - sample immed data and atomic api's

James and Arkady,

DAPL provides a generalized abstraction to RDMA capable transports. As a
generalized abstraction, it cannot exploit the unique properties that many
of the underlying platforms/interconnects can provide, so I would like to
propose a simple (minimum impact on libdat) extensible interface to uDAPL
that will allow vendors to expose such capabilities. I am looking for
feedback, especially from the DAT collaborative. I have included both a
design document and actual working code as a reference.

The patch provides a fully tested DAT and DAPL library (openib_cma) set
with the following provider extensions:

DAT_RETURN dat_ep_post_write_immed(
        IN      DAT_EP_HANDLE           ep_handle,
        IN      DAT_COUNT               num_segments,
        IN      DAT_LMR_TRIPLET         *local_iov,
        IN      DAT_DTO_COOKIE          user_cookie,
        IN      DAT_RMR_TRIPLE          *remote_iov,
        IN      DAT_UINT32              immediate_data,
        IN      DAT_COMPLETION_FLAGS    completion_flags);

DAT_RETURN dat_ep_post_cmp_and_swap(
        IN      DAT_EP_HANDLE           ep_handle,
        IN      DAT_UINT64              cmp_value,
        IN      DAT_UINT64              swap_value,
        IN      DAT_LMR_TRIPLE          *local_iov,
        IN      DAT_DTO_COOKIE          user_cookie,
        IN      DAT_RMR_TRIPLE          *remote_iov,
        IN      DAT_COMPLETION_FLAGS    completion_flags);

DAT_RETURN dat_ep_post_fetch_and_add(
        IN      DAT_EP_HANDLE           ep_handle,
        IN      DAT_UINT64              add_value,
        IN      DAT_LMR_TRIPLE          *local_iov,
        IN      DAT_DTO_COOKIE          user_cookie,
        IN      DAT_RMR_TRIPLE          *remote_iov,
        IN      DAT_COMPLETION_FLAGS    completion_flags);

Also, included is a sample program (dtest_ext.c) that can be used as a
programming example.
Thanks, -arlin Signed-off by: Arlin Davis Index: test/dtest/dat.conf =================================================================== --- test/dtest/dat.conf (revision 4589) +++ test/dtest/dat.conf (working copy) @@ -1,11 +1,20 @@ # -# DAT 1.1 and 1.2 configuration file +# DAT 1.2 configuration file # # Each entry should have the following fields: # # \ # # -# Example for openib using the first Mellanox adapter, port 1 and port 2 +# Example for openib_cma and openib_scm +# +# For scm version you specify as actual device name and port +# For cma version you specify as: +# network address, network hostname, or netdev name and 0 for port +# +OpenIB-scm1 u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "mthca0 1" "" +OpenIB-scm2 u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "mthca0 2" "" +OpenIB-cma-ip u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "192.168.0.22 0" "" +OpenIB-cma-name u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "svr1-ib0 0" "" +OpenIB-cma-netdev u1.2 nonthreadsafe default /usr/local/openib_dapl/udapl/Target/libdapl.so mv_dapl.1.2 "ib0 0" "" -IB1 u1.2 nonthreadsafe default Index: test/dtest/makefile =================================================================== --- test/dtest/makefile (revision 4589) +++ test/dtest/makefile (working copy) @@ -4,13 +4,18 @@ CFLAGS = -O2 -g DAT_INC = ../../dat/include DAT_LIB = /usr/local/lib -all: dtest +all: dtest dtest_ext clean: - rm -f *.o;touch *.c;rm -f dtest + rm -f *.o;touch *.c;rm -f dtest dtest_ext dtest: ./dtest.c $(CC) $(CFLAGS) ./dtest.c -o dtest \ -DDAPL_PROVIDER='"OpenIB-cma-ip"' \ -I $(DAT_INC) -L $(DAT_LIB) -ldat +dtest_ext: ./dtest_ext.c + $(CC) $(CFLAGS) ./dtest_ext.c -o dtest_ext \ + -DDAPL_PROVIDER='"OpenIB-cma-ip"' \ + -I $(DAT_INC) -L $(DAT_LIB) -ldat + Index: test/dtest/README =================================================================== --- test/dtest/README (revision 4589) +++ test/dtest/README (working copy) @@ -1,10 +1,11 @@ simple dapl test just for initial openIB uDAPL testing... dtest/dtest.c + dtest/dtest_ext.c dtest/makefile dtest/dat.conf -to build (default uDAPL name == IB1, ib device == mthca0, port == 1) +to build (default uDAPL name == OpenIB-cma-ip) edit makefile and change path (DAT_LIB) to appropriate libdat.so edit dat.conf and change path to appropriate libdapl.so cp dat.conf to /etc/dat.conf Index: dapl/include/dapl.h =================================================================== --- dapl/include/dapl.h (revision 4589) +++ dapl/include/dapl.h (working copy) @@ -1,25 +1,28 @@ /* - * Copyright (c) 2002-2003, Network Appliance, Inc. All rights reserved. + * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved. * * This Software is licensed under one of the following licenses: - * + * * 1) under the terms of the "Common Public License 1.0" a copy of which is - * available from the Open Source Initiative, see + * in the file LICENSE.txt in the root directory. The license is also + * available from the Open Source Initiative, see * http://www.opensource.org/licenses/cpl.php. - * - * 2) under the terms of the "The BSD License" a copy of which is - * available from the Open Source Initiative, see + * + * 2) under the terms of the "The BSD License" a copy of which is in the file + * LICENSE2.txt in the root directory. 
The license is also available from + * the Open Source Initiative, see * http://www.opensource.org/licenses/bsd-license.php. - * + * * 3) under the terms of the "GNU General Public License (GPL) Version 2" a - * copy of which is available from the Open Source Initiative, see + * copy of which is in the file LICENSE3.txt in the root directory. The + * license is also available from the Open Source Initiative, see * http://www.opensource.org/licenses/gpl-license.php. - * + * * Licensee has the right to choose one of the above licenses. - * + * * Redistributions of source code must retain the above copyright * notice and one of the license notices. - * + * * Redistributions in binary form must reproduce both the above copyright * notice, one of the license notices in the documentation * and/or other materials provided with the distribution. @@ -61,6 +64,8 @@ #include "dapl_dummy_util.h" #elif OPENIB #include "dapl_ib_util.h" +#elif DET +#include "dapl_det_util.h" #endif /********************************************************************* @@ -213,6 +218,10 @@ typedef struct dapl_cookie DAPL_COOKIE; typedef struct dapl_dto_cookie DAPL_DTO_COOKIE; typedef struct dapl_rmr_cookie DAPL_RMR_COOKIE; +#ifdef DAPL_EXTENSIONS +typedef struct dapl_ext_cookie DAPL_EXT_COOKIE; +#endif + typedef struct dapl_private DAPL_PRIVATE; typedef void (*DAPL_CONNECTION_STATE_HANDLER) ( @@ -563,6 +572,13 @@ typedef enum dapl_dto_type DAPL_DTO_TYPE_RECV, DAPL_DTO_TYPE_RDMA_WRITE, DAPL_DTO_TYPE_RDMA_READ, +#ifdef DAPL_EXTENSIONS + DAPL_DTO_TYPE_RDMA_WRITE_IMMED, + DAPL_DTO_TYPE_RECV_IMMED, + DAPL_DTO_TYPE_CMP_AND_SWAP, + DAPL_DTO_TYPE_FETCH_AND_ADD, +#endif + } DAPL_DTO_TYPE; typedef enum dapl_cookie_type @@ -570,6 +586,9 @@ typedef enum dapl_cookie_type DAPL_COOKIE_TYPE_NULL, DAPL_COOKIE_TYPE_DTO, DAPL_COOKIE_TYPE_RMR, +#ifdef DAPL_EXTENSIONS + DAPL_COOKIE_TYPE_EXTENSION, +#endif } DAPL_COOKIE_TYPE; /* DAPL_DTO_COOKIE used as context for DTO WQEs */ @@ -587,6 +606,27 @@ struct dapl_rmr_cookie DAT_RMR_COOKIE cookie; }; +#ifdef DAPL_EXTENSIONS + +/* DAPL extended cookie types */ +typedef enum dapl_ext_type +{ + DAPL_EXT_TYPE_RDMA_WRITE_IMMED, + DAPL_EXT_TYPE_CMP_AND_SWAP, + DAPL_EXT_TYPE_FETCH_AND_ADD, + DAPL_EXT_TYPE_RECV +} DAPL_EXT_TYPE; + +/* DAPL extended cookie */ +struct dapl_ext_cookie +{ + DAPL_EXT_TYPE type; + DAT_DTO_COOKIE cookie; + DAT_COUNT size; /* used RDMA write with immed */ +}; + +#endif + /* DAPL_COOKIE used as context for WQEs */ struct dapl_cookie { @@ -597,6 +637,9 @@ struct dapl_cookie { DAPL_DTO_COOKIE dto; DAPL_RMR_COOKIE rmr; +#ifdef DAPL_EXTENSIONS + DAPL_EXT_COOKIE ext; +#endif } val; }; @@ -1116,6 +1159,15 @@ extern DAT_RETURN dapl_srq_set_lw( IN DAT_SRQ_HANDLE, /* srq_handle */ IN DAT_COUNT); /* low_watermark */ +#ifdef DAPL_EXTENSIONS + +extern DAT_RETURN dapl_extensions( + IN DAT_HANDLE, /* dat_handle */ + IN DAT_EXT_OP, /* extension operation */ + IN va_list ); /* va_list args */ + +#endif + /* * DAPL internal utility function prototpyes */ Index: dapl/udapl/Makefile =================================================================== --- dapl/udapl/Makefile (revision 4589) +++ dapl/udapl/Makefile (working copy) @@ -156,6 +156,7 @@ PROVIDER = $(TOPDIR)/../openib_cma CFLAGS += -DOPENIB CFLAGS += -DCQ_WAIT_OBJECT CFLAGS += -I/usr/local/include/infiniband +CFLAGS += -I/usr/local/include/rdma endif # @@ -168,6 +169,12 @@ endif # VN_MEM_SHARED_VIRTUAL_SUPPORT # CFLAGS += -DVN_MEM_SHARED_VIRTUAL_SUPPORT=1 +# If an implementation supports DAPL extensions +CFLAGS += -DDAPL_EXTENSIONS + +# If an 
implementation supports DAPL provider specific attributes +CFLAGS += -DDAPL_PROVIDER_SPECIFIC_ATTR + CFLAGS += -I. CFLAGS += -I.. CFLAGS += -I../../dat/include @@ -283,6 +290,8 @@ LDFLAGS += -libverbs -lrdmacm LDFLAGS += -rpath /usr/local/lib -L /usr/local/lib PROVIDER_SRCS = dapl_ib_util.c dapl_ib_cq.c dapl_ib_qp.c \ dapl_ib_cm.c dapl_ib_mem.c +# implementation supports DAPL extensions +PROVIDER_SRCS += dapl_ib_extensions.c endif UDAPL_SRCS = dapl_init.c \ Index: dapl/common/dapl_ia_query.c =================================================================== --- dapl/common/dapl_ia_query.c (revision 4589) +++ dapl/common/dapl_ia_query.c (working copy) @@ -167,6 +167,14 @@ dapl_ia_query ( #if !defined(__KDAPL__) provider_attr->pz_support = DAT_PZ_UNIQUE; #endif /* !KDAPL */ + + /* + * Have provider set their own. + */ +#ifdef DAPL_PROVIDER_SPECIFIC_ATTR + dapls_set_provider_specific_attr(provider_attr); +#endif + /* * Set up evd_stream_merging_supported options. Note there is * one bit per allowable combination, using the ordinal Index: dapl/common/dapl_adapter_util.h =================================================================== --- dapl/common/dapl_adapter_util.h (revision 4589) +++ dapl/common/dapl_adapter_util.h (working copy) @@ -256,6 +256,21 @@ dapls_ib_wait_object_wait ( IN u_int32_t timeout); #endif +#ifdef DAPL_PROVIDER_SPECIFIC_ATTR +void +dapls_set_provider_specific_attr( + IN DAT_PROVIDER_ATTR *provider_attr ); +#endif + +#ifdef DAPL_EXTENSIONS +void +dapls_cqe_to_event_extension( + IN DAPL_EP *ep_ptr, + IN DAPL_COOKIE *cookie, + IN ib_work_completion_t *cqe_ptr, + OUT DAT_EVENT *event_ptr); +#endif + /* * Values for provider DAT_NAMED_ATTR */ @@ -272,6 +287,8 @@ dapls_ib_wait_object_wait ( #include "dapl_dummy_dto.h" #elif OPENIB #include "dapl_ib_dto.h" +#elif DET +#include "dapl_det_dto.h" #endif Index: dapl/common/dapl_provider.c =================================================================== --- dapl/common/dapl_provider.c (revision 4589) +++ dapl/common/dapl_provider.c (working copy) @@ -221,7 +221,11 @@ DAT_PROVIDER g_dapl_provider_template = &dapl_srq_post_recv, &dapl_srq_query, &dapl_srq_resize, - &dapl_srq_set_lw + &dapl_srq_set_lw, + +#ifdef DAPL_EXTENSIONS + &dapl_extensions +#endif }; #endif /* __KDAPL__ */ Index: dapl/common/dapl_evd_util.c =================================================================== --- dapl/common/dapl_evd_util.c (revision 4589) +++ dapl/common/dapl_evd_util.c (working copy) @@ -502,6 +502,20 @@ dapli_evd_eh_print_cqe ( #ifdef DAPL_DBG static char *optable[] = { +#ifdef OPENIB + /* different order for openib verbs */ + "OP_RDMA_WRITE", + "OP_RDMA_WRITE_IMM", + "OP_SEND", + "OP_SEND_IMM", + "OP_RDMA_READ", + "OP_COMP_AND_SWAP", + "OP_FETCH_AND_ADD", + "OP_RECEIVE", + "OP_RECEIVE_IMM", + "OP_BIND_MW", + "OP_INVALID", +#else "OP_SEND", "OP_RDMA_READ", "OP_RDMA_WRITE", @@ -509,6 +523,7 @@ dapli_evd_eh_print_cqe ( "OP_FETCH_AND_ADD", "OP_RECEIVE", "OP_BIND_MW", +#endif 0 }; @@ -1113,6 +1128,15 @@ dapli_evd_cqe_to_event ( dapls_cookie_dealloc (&ep_ptr->req_buffer, cookie); break; } + +#ifdef DAPL_EXTENSIONS + case DAPL_COOKIE_TYPE_EXTENSION: + { + dapls_cqe_to_event_extension(ep_ptr, cookie, cqe_ptr, event_ptr); + break; + } +#endif + default: { dapl_os_assert (!"Invalid Operation type"); Index: dapl/openib_cma/dapl_ib_dto.h =================================================================== --- dapl/openib_cma/dapl_ib_dto.h (revision 4589) +++ dapl/openib_cma/dapl_ib_dto.h (working copy) @@ -35,7 +35,7 @@ * * Description: * - * The 
uDAPL openib provider - DTO operations and CQE macros + * The OpenIB uCMA provider - DTO operations and CQE macros * ************************************************************************ **** * Source Control System Information @@ -119,7 +119,6 @@ dapls_ib_post_recv ( return DAT_SUCCESS; } - /* * dapls_ib_post_send * @@ -191,7 +190,7 @@ dapls_ib_post_send ( if (cookie != NULL) cookie->val.dto.size = total_len; - + if ((op_type == OP_RDMA_WRITE) || (op_type == OP_RDMA_READ)) { wr.wr.rdma.remote_addr = remote_iov->target_address; wr.wr.rdma.rkey = remote_iov->rmr_context; @@ -224,6 +223,152 @@ dapls_ib_post_send ( return DAT_SUCCESS; } +#ifdef DAPL_EXTENSIONS +/* + * dapls_ib_post_ext_send + * + * Provider specific extended Post SEND function + */ +STATIC _INLINE_ DAT_RETURN +dapls_ib_post_ext_send ( + IN DAPL_EP *ep_ptr, + IN ib_send_op_type_t op_type, + IN DAPL_COOKIE *cookie, + IN DAT_COUNT segments, + IN DAT_LMR_TRIPLET *local_iov, + IN const DAT_RMR_TRIPLET *remote_iov, + IN DAT_UINT32 idata, + IN DAT_UINT64 compare_add, + IN DAT_UINT64 swap, + IN DAT_COMPLETION_FLAGS completion_flags) +{ + dapl_dbg_log(DAPL_DBG_TYPE_EP, + " post_snd: ep %p op %d ck %p sgs", + "%d l_iov %p r_iov %p f %d\n", + ep_ptr, op_type, cookie, segments, local_iov, + remote_iov, completion_flags); + + ib_data_segment_t ds_array[DEFAULT_DS_ENTRIES]; + ib_data_segment_t *ds_array_p; + struct ibv_send_wr wr; + struct ibv_send_wr *bad_wr; + ib_hca_transport_t *ibt_ptr = + &ep_ptr->header.owner_ia->hca_ptr->ib_trans; + DAT_COUNT i, total_len; + + dapl_dbg_log(DAPL_DBG_TYPE_EP, + " post_snd: ep %p cookie %p segs %d l_iov %p\n", + ep_ptr, cookie, segments, local_iov); + + if(segments <= DEFAULT_DS_ENTRIES) + ds_array_p = ds_array; + else + ds_array_p = + dapl_os_alloc(segments * sizeof(ib_data_segment_t)); + + if (NULL == ds_array_p) + return (DAT_INSUFFICIENT_RESOURCES); + + /* setup the work request */ + wr.next = 0; + wr.opcode = op_type; + wr.num_sge = 0; + wr.send_flags = 0; + wr.wr_id = (uint64_t)(uintptr_t)cookie; + wr.sg_list = ds_array_p; + total_len = 0; + + for (i = 0; i < segments; i++ ) { + if ( !local_iov[i].segment_length ) + continue; + + ds_array_p->addr = (uint64_t) local_iov[i].virtual_address; + ds_array_p->length = local_iov[i].segment_length; + ds_array_p->lkey = local_iov[i].lmr_context; + + dapl_dbg_log(DAPL_DBG_TYPE_EP, + " post_snd: lkey 0x%x va %p len %d\n", + ds_array_p->lkey, ds_array_p->addr, + ds_array_p->length ); + + total_len += ds_array_p->length; + wr.num_sge++; + ds_array_p++; + } + + if (cookie != NULL) + cookie->val.dto.size = total_len; + + if ((op_type == OP_RDMA_WRITE) || + (op_type == OP_RDMA_WRITE_IMM) || + (op_type == OP_RDMA_READ)) { + wr.wr.rdma.remote_addr = remote_iov->target_address; + wr.wr.rdma.rkey = remote_iov->rmr_context; + dapl_dbg_log(DAPL_DBG_TYPE_EP, + " post_snd_rdma: rkey 0x%x va %#016Lx\n", + wr.wr.rdma.rkey, wr.wr.rdma.remote_addr); + } + + switch (op_type) { + case OP_RDMA_WRITE_IMM: + dapl_dbg_log(DAPL_DBG_TYPE_EP, + " post_snd: OP_RDMA_WRITE_IMMED=0x%x\n", idata ); + wr.imm_data = idata; + break; + case OP_COMP_AND_SWAP: + /* OP_COMP_AND_SWAP has direct IBAL wr_type mapping */ + dapl_dbg_log(DAPL_DBG_TYPE_EP, + " post_snd: OP_COMP_AND_SWAP=%lx," + "%lx rkey 0x%x va %#016Lx\n", + compare_add, swap, remote_iov->rmr_context, + remote_iov->target_address); + + wr.wr.atomic.compare_add = compare_add; + wr.wr.atomic.swap = swap; + wr.wr.atomic.remote_addr = remote_iov->target_address; + wr.wr.atomic.rkey = remote_iov->rmr_context; + break; + case 
OP_FETCH_AND_ADD: + /* OP_FETCH_AND_ADD has direct IBAL wr_type mapping */ + dapl_dbg_log(DAPL_DBG_TYPE_EP, + " post_snd: OP_FETCH_AND_ADD=%lx," + "%lx rkey 0x%x va %#016Lx\n", + compare_add, remote_iov->rmr_context, + remote_iov->target_address); + + wr.wr.atomic.compare_add = compare_add; + wr.wr.atomic.remote_addr = remote_iov->target_address; + wr.wr.atomic.rkey = remote_iov->rmr_context; + break; + default: + break; + } + + /* inline data for send or write ops */ + if ((total_len <= ibt_ptr->max_inline_send) && + ((op_type == OP_SEND) || (op_type == OP_RDMA_WRITE))) + wr.send_flags |= IBV_SEND_INLINE; + + /* set completion flags in work request */ + wr.send_flags |= (DAT_COMPLETION_SUPPRESS_FLAG & + completion_flags) ? 0 : IBV_SEND_SIGNALED; + wr.send_flags |= (DAT_COMPLETION_BARRIER_FENCE_FLAG & + completion_flags) ? IBV_SEND_FENCE : 0; + wr.send_flags |= (DAT_COMPLETION_SOLICITED_WAIT_FLAG & + completion_flags) ? IBV_SEND_SOLICITED : 0; + + dapl_dbg_log(DAPL_DBG_TYPE_EP, + " post_snd: op 0x%x flags 0x%x sglist %p, %d\n", + wr.opcode, wr.send_flags, wr.sg_list, wr.num_sge); + + if (ibv_post_send(ep_ptr->qp_handle->cm_id->qp, &wr, &bad_wr)) + return( dapl_convert_errno(EFAULT,"ibv_recv") ); + + dapl_dbg_log(DAPL_DBG_TYPE_EP," post_snd: returned\n"); + return DAT_SUCCESS; +} +#endif + STATIC _INLINE_ DAT_RETURN dapls_ib_optional_prv_dat( IN DAPL_CR *cr_ptr, Index: dapl/openib_cma/dapl_ib_util.c =================================================================== --- dapl/openib_cma/dapl_ib_util.c (revision 4589) +++ dapl/openib_cma/dapl_ib_util.c (working copy) @@ -35,7 +35,7 @@ * * Description: * - * The uDAPL openib provider - init, open, close, utilities, work thread + * The OpenIB uCMA provider - init, open, close, utilities, work thread * ************************************************************************ **** * Source Control System Information @@ -64,7 +64,6 @@ static const char rcsid[] = "$Id: $"; #include /* for struct ifreq */ #include /* for ARPHRD_INFINIBAND */ - int g_dapl_loopback_connection = 0; int g_ib_pipe[2]; ib_thread_state_t g_ib_thread_state = 0; @@ -727,7 +726,7 @@ void dapli_thread(void *arg) int ret,idx,fds; char rbuf[2]; - dapl_dbg_log (DAPL_DBG_TYPE_CM, + dapl_dbg_log (DAPL_DBG_TYPE_UTIL, " ib_thread(%d,0x%x): ENTER: pipe %d ucma %d\n", getpid(), g_ib_thread, g_ib_pipe[0], rdma_get_fd()); @@ -767,7 +766,7 @@ void dapli_thread(void *arg) ufds[idx].revents = 0; uhca[idx] = hca; - dapl_dbg_log(DAPL_DBG_TYPE_CM, + dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " ib_thread(%d) poll_fd: hca[%d]=%p, async=%d" " pipe=%d cm=%d cq=d\n", getpid(), hca, ufds[idx-1].fd, @@ -783,14 +782,14 @@ void dapli_thread(void *arg) dapl_os_unlock(&g_hca_lock); ret = poll(ufds, fds, -1); if (ret <= 0) { - dapl_dbg_log(DAPL_DBG_TYPE_WARN, + dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " ib_thread(%d): ERR %s poll\n", getpid(),strerror(errno)); dapl_os_lock(&g_hca_lock); continue; } - dapl_dbg_log(DAPL_DBG_TYPE_CM, + dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " ib_thread(%d) poll_event: " " async=0x%x pipe=0x%x cm=0x%x cq=0x%x\n", getpid(), ufds[idx-1].revents, ufds[0].revents, @@ -834,3 +833,63 @@ void dapli_thread(void *arg) dapl_os_unlock(&g_hca_lock); } +#ifdef DAPL_PROVIDER_SPECIFIC_ATTR +/* + * dapls_set_provider_specific_attr + * + * Input: + * attr_ptr Pointer provider attributes + * + * Output: + * none + * + * Returns: + * void + */ +DAT_NAMED_ATTR ib_attrs[] = { + +#ifdef DAPL_EXTENSIONS + { + DAT_EXT_ATTR, + DAT_EXT_ATTR_TRUE + }, + { + DAT_EXT_ATTR_RDMA_WRITE_IMMED, + DAT_EXT_ATTR_TRUE + }, + { + 
DAT_EXT_ATTR_RECV_IMMED, + DAT_EXT_ATTR_TRUE + }, + /* inbound immediate data placed in event, NOT payload */ + { + DAT_EXT_ATTR_RECV_IMMED_EVENT, + DAT_EXT_ATTR_TRUE + }, + { + DAT_EXT_ATTR_FETCH_AND_ADD, + DAT_EXT_ATTR_TRUE + }, + { + DAT_EXT_ATTR_CMP_AND_SWAP, + DAT_EXT_ATTR_TRUE + }, +#else + { + "DAT_EXTENSION_INTERFACE", + "FALSE" + }, +#endif +}; + +#define SPEC_ATTR_SIZE(x) ( sizeof(x)/sizeof(DAT_NAMED_ATTR) ) + +void dapls_set_provider_specific_attr( + IN DAT_PROVIDER_ATTR *attr_ptr ) +{ + attr_ptr->num_provider_specific_attr = SPEC_ATTR_SIZE(ib_attrs); + attr_ptr->provider_specific_attr = ib_attrs; +} + +#endif + Index: dapl/openib_cma/dapl_ib_mem.c =================================================================== --- dapl/openib_cma/dapl_ib_mem.c (revision 4589) +++ dapl/openib_cma/dapl_ib_mem.c (working copy) @@ -25,9 +25,9 @@ /********************************************************************** * - * MODULE: dapl_det_mem.c + * MODULE: dapl_ib_mem.c * - * PURPOSE: Intel DET APIs: Memory windows, registration, + * PURPOSE: OpenIB uCMA provider Memory windows, registration, * and protection domain * * $Id: $ @@ -72,12 +72,10 @@ dapls_convert_privileges(IN DAT_MEM_PRIV access |= IBV_ACCESS_LOCAL_WRITE; if (DAT_MEM_PRIV_REMOTE_WRITE_FLAG & privileges) access |= IBV_ACCESS_REMOTE_WRITE; - if (DAT_MEM_PRIV_REMOTE_READ_FLAG & privileges) - access |= IBV_ACCESS_REMOTE_READ; - if (DAT_MEM_PRIV_REMOTE_READ_FLAG & privileges) - access |= IBV_ACCESS_REMOTE_READ; - if (DAT_MEM_PRIV_REMOTE_READ_FLAG & privileges) + if (DAT_MEM_PRIV_REMOTE_READ_FLAG & privileges) { access |= IBV_ACCESS_REMOTE_READ; + access |= IBV_ACCESS_REMOTE_ATOMIC; + } return access; } Index: dapl/openib_cma/dapl_ib_cm.c =================================================================== --- dapl/openib_cma/dapl_ib_cm.c (revision 4589) +++ dapl/openib_cma/dapl_ib_cm.c (working copy) @@ -35,7 +35,7 @@ * * Description: * - * The uDAPL openib provider - connection management + * The OpenIB uCMA provider - uCMA connection management * ************************************************************************ **** * Source Control System Information @@ -592,7 +592,11 @@ dapls_ib_setup_conn_listener(IN DAPL_IA if (rdma_bind_addr(conn->cm_id, (struct sockaddr *)&ia_ptr->hca_ptr->hca_address)) { - dat_status = dapl_convert_errno(errno,"setup_listener"); + if (errno == -EBUSY) + dat_status = DAT_CONN_QUAL_IN_USE; + else + dat_status = + dapl_convert_errno(errno,"setup_listener"); goto bail; } Index: dapl/openib_cma/dapl_ib_qp.c =================================================================== --- dapl/openib_cma/dapl_ib_qp.c (revision 4589) +++ dapl/openib_cma/dapl_ib_qp.c (working copy) @@ -25,9 +25,9 @@ /********************************************************************** * - * MODULE: dapl_det_qp.c + * MODULE: dapl_ib_qp.c * - * PURPOSE: QP routines for access to DET Verbs + * PURPOSE: OpenIB uCMA QP routines * * $Id: $ **********************************************************************/ Index: dapl/openib_cma/dapl_ib_util.h =================================================================== --- dapl/openib_cma/dapl_ib_util.h (revision 4589) +++ dapl/openib_cma/dapl_ib_util.h (working copy) @@ -35,7 +35,7 @@ * * Description: * - * The uDAPL openib provider - definitions, prototypes, + * The OpenIB uCMA provider - definitions, prototypes, * ************************************************************************ **** * Source Control System Information Index: dapl/openib_cma/README 
=================================================================== --- dapl/openib_cma/README (revision 4589) +++ dapl/openib_cma/README (working copy) @@ -23,15 +23,22 @@ New files for openib_scm provider dapl/openib_cma/dapl_ib_util.c dapl/openib_cma/dapl_ib_util.h dapl/openib_cma/dapl_ib_cm.c + dapl/openib_cma/dapl_ib_extensions.c A simple dapl test just for openib_scm testing... test/dtest/dtest.c + test/dtest/dtest_ext.c test/dtest/makefile server: dtest -s client: dtest -h hostname +or with extensions + + server: dtest_ext -s + client: dtest_ext -h hostname + known issues: no memory windows support in ibverbs, dat_create_rmr fails. Index: dapl/openib_cma/dapl_ib_cq.c =================================================================== --- dapl/openib_cma/dapl_ib_cq.c (revision 4589) +++ dapl/openib_cma/dapl_ib_cq.c (working copy) @@ -35,7 +35,7 @@ * * Description: * - * The uDAPL openib provider - completion queue + * The OpenIB uCMA provider - completion queue * ************************************************************************ **** * Source Control System Information @@ -498,7 +498,10 @@ dapls_ib_wait_object_wait(IN ib_wait_obj if (timeout != DAT_TIMEOUT_INFINITE) timeout_ms = timeout/1000; - status = poll(&cq_fd, 1, timeout_ms); + /* restart syscall */ + while ((status = poll(&cq_fd, 1, timeout_ms)) == -1 ) + if (errno == EINTR) + continue; /* returned event */ if (status > 0) { @@ -511,6 +514,8 @@ dapls_ib_wait_object_wait(IN ib_wait_obj /* timeout */ } else if (status == 0) status = ETIMEDOUT; + else + status = errno; dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " cq_object_wait: RET evd %p ibv_cq %p ibv_ctx %p %s\n", Index: dat/include/dat/dat_redirection.h =================================================================== --- dat/include/dat/dat_redirection.h (revision 4589) +++ dat/include/dat/dat_redirection.h (working copy) @@ -59,10 +59,10 @@ typedef struct dat_provider DAT_PROVIDER * This would allow a good compiler to avoid indirection overhead when * making function calls. 
*/ - #define DAT_HANDLE_TO_PROVIDER(handle) (*(DAT_PROVIDER **)(handle)) #endif + #define DAT_IA_QUERY(ia, evd, ia_msk, ia_ptr, p_msk, p_ptr) \ (*DAT_HANDLE_TO_PROVIDER (ia)->ia_query_func) (\ (ia), \ @@ -395,6 +395,12 @@ typedef struct dat_provider DAT_PROVIDER (lbuf), \ (cookie)) +#define DAT_EXTENSION(handle, op, args) \ + (*DAT_HANDLE_TO_PROVIDER (handle)->extension_func) (\ + (handle), \ + (op), \ + (args)) + /*************************************************************** * * FUNCTION PROTOTYPES @@ -720,4 +726,11 @@ typedef DAT_RETURN (*DAT_SRQ_POST_RECV_F IN DAT_LMR_TRIPLET *, /* local_iov */ IN DAT_DTO_COOKIE ); /* user_cookie */ +/* Extension function */ +#include +typedef DAT_RETURN (*DAT_EXTENSION_FUNC) ( + IN DAT_HANDLE, /* dat handle */ + IN DAT_EXT_OP, /* extension operation */ + IN va_list ); /* va_list */ + #endif /* _DAT_REDIRECTION_H_ */ Index: dat/include/dat/dat.h =================================================================== --- dat/include/dat/dat.h (revision 4589) +++ dat/include/dat/dat.h (working copy) @@ -854,11 +854,15 @@ typedef enum dat_event_number DAT_ASYNC_ERROR_EP_BROKEN = 0x08003, DAT_ASYNC_ERROR_TIMED_OUT = 0x08004, DAT_ASYNC_ERROR_PROVIDER_INTERNAL_ERROR = 0x08005, - DAT_SOFTWARE_EVENT = 0x10001 + DAT_SOFTWARE_EVENT = 0x10001, + DAT_EXTENSION_EVENT = 0x20001 + } DAT_EVENT_NUMBER; -/* Union for event Data */ +/* include extension data definitions */ +#include +/* Union for event Data */ typedef union dat_event_data { DAT_DTO_COMPLETION_EVENT_DATA dto_completion_event_data; @@ -867,6 +871,7 @@ typedef union dat_event_data DAT_CONNECTION_EVENT_DATA connect_event_data; DAT_ASYNCH_ERROR_EVENT_DATA asynch_error_event_data; DAT_SOFTWARE_EVENT_DATA software_event_data; + DAT_EXTENSION_DATA extension_data; } DAT_EVENT_DATA; /* Event struct that holds all event information */ @@ -1222,6 +1227,11 @@ extern DAT_RETURN dat_srq_set_lw ( IN DAT_SRQ_HANDLE, /* srq_handle */ IN DAT_COUNT); /* low_watermark */ +extern DAT_RETURN dat_extension( + IN DAT_HANDLE, + IN DAT_EXT_OP, + IN ... ); + /* * DAT registry functions. * Index: dat/include/dat/udat_redirection.h =================================================================== --- dat/include/dat/udat_redirection.h (revision 4589) +++ dat/include/dat/udat_redirection.h (working copy) @@ -199,7 +199,6 @@ typedef DAT_RETURN (*DAT_EVD_SET_UNWAITA typedef DAT_RETURN (*DAT_EVD_CLEAR_UNWAITABLE_FUNC) ( IN DAT_EVD_HANDLE); /* evd_handle */ - #include struct dat_provider @@ -294,6 +293,10 @@ struct dat_provider DAT_SRQ_QUERY_FUNC srq_query_func; DAT_SRQ_RESIZE_FUNC srq_resize_func; DAT_SRQ_SET_LW_FUNC srq_set_lw_func; + + /* extension for provder specific functions */ + DAT_EXTENSION_FUNC extension_func; + }; #endif /* _UDAT_REDIRECTION_H_ */ Index: dat/include/dat/dat_extensions.h =================================================================== --- dat/include/dat/dat_extensions.h (revision 0) +++ dat/include/dat/dat_extensions.h (revision 0) @@ -0,0 +1,209 @@ +/* + * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved. + * + * This Software is licensed under one of the following licenses: + * + * 1) under the terms of the "Common Public License 1.0" a copy of which is + * in the file LICENSE.txt in the root directory. The license is also + * available from the Open Source Initiative, see + * http://www.opensource.org/licenses/cpl.php. + * + * 2) under the terms of the "The BSD License" a copy of which is in the file + * LICENSE2.txt in the root directory. 
The license is also available from + * the Open Source Initiative, see + * http://www.opensource.org/licenses/bsd-license.php. + * + * 3) under the terms of the "GNU General Public License (GPL) Version 2" a + * copy of which is in the file LICENSE3.txt in the root directory. The + * license is also available from the Open Source Initiative, see + * http://www.opensource.org/licenses/gpl-license.php. + * + * Licensee has the right to choose one of the above licenses. + * + * Redistributions of source code must retain the above copyright + * notice and one of the license notices. + * + * Redistributions in binary form must reproduce both the above copyright + * notice, one of the license notices in the documentation + * and/or other materials provided with the distribution. + */ +/********************************************************************** + * + * HEADER: dat_extensions.h + * + * PURPOSE: defines the extensions to the DAT API for uDAPL. + * + * Description: Header file for "uDAPL: User Direct Access Programming + * Library, Version: 1.2" + * + * Mapping rules: + * All global symbols are prepended with "DAT_" or "dat_" + * All DAT objects have an 'api' tag which, such as 'ep' or 'lmr' + * The method table is in the provider definition structure. + * + * + **********************************************************************/ + +#ifndef _DAT_EXTENSIONS_H_ + +extern int dat_extensions; + +/* + * Provider specific attribute strings for extension support + * returned with dat_ia_query() and + * DAT_PROVIDER_ATTR_MASK == DAT_PROVIDER_FIELD_PROVIDER_SPECIFIC_ATTR + * + * DAT_NAMED_ATTR name == extended operation, + * value == TRUE if extended operation is supported + */ +#define DAT_EXT_ATTR "DAT_EXTENSION_INTERFACE" +#define DAT_EXT_ATTR_RDMA_WRITE_IMMED "DAT_EXT_RDMA_WRITE_IMMED" +#define DAT_EXT_ATTR_RECV_IMMED "DAT_EXT_RECV_IMMED" +#define DAT_EXT_ATTR_RECV_IMMED_EVENT "DAT_EXT_RECV_IMMED_EVENT" +#define DAT_EXT_ATTR_RECV_IMMED_PAYLOAD "DAT_EXT_RECV_IMMED_PAYLOAD" +#define DAT_EXT_ATTR_FETCH_AND_ADD "DAT_EXT_FETCH_AND_ADD" +#define DAT_EXT_ATTR_CMP_AND_SWAP "DAT_EXT_CMP_AND_SWAP" +#define DAT_EXT_ATTR_TRUE "TRUE" +#define DAT_EXT_ATTR_FALSE "FALSE" + +/* + * Extension OPERATIONS + */ +typedef enum dat_ext_op +{ + DAT_EXT_RDMA_WRITE_IMMED, + DAT_EXT_RECV_IMMED, + DAT_EXT_FETCH_AND_ADD, + DAT_EXT_CMP_AND_SWAP, + +} DAT_EXT_OP; + +/* + * Extension completion event TYPES + */ +typedef enum dat_ext_event_type +{ + DAT_EXT_RDMA_WRITE_IMMED_STATUS = 1, + DAT_EXT_RECV_NO_IMMED, + DAT_EXT_RECV_IMMED_DATA_EVENT, + DAT_EXT_RECV_IMMED_DATA_PAYLOAD, + DAT_EXT_FETCH_AND_ADD_STATUS, + DAT_EXT_CMP_AND_SWAP_STATUS, + +} DAT_EXT_EVENT_TYPE; + +/* + * Extension completion event DATA + */ +typedef struct dat_immediate_data +{ + DAT_UINT32 data; + +} DAT_RDMA_WRITE_IMMED_DATA; + +typedef struct dat_extension_data +{ + DAT_DTO_COMPLETION_EVENT_DATA dto; + DAT_EXT_EVENT_TYPE type; + union { + DAT_RDMA_WRITE_IMMED_DATA immed; + } val; +} DAT_EXTENSION_DATA; + +typedef enum dat_ext_flags +{ + DAT_EXT_WRITE_IMMED_FLAG = 0x1, + DAT_EXT_WRITE_CONFIRM_FLAG = 0x2, + +} DAT_EXT_FLAGS; + +/* + * Extended API with redirection via DAT extension function + */ + +/* + * RDMA Write with IMMEDIATE extension: + * + * Asynchronous call performs a normal RDMA write to the remote endpoint + * followed by a post of an extended immediate data value to the receive + * EVD on the remote endpoint. Event completion for the request completes + * as an DAT_EXTENSION_EVENT with type set to DAT_EXT_RDMA_WRITE_IMMED_STATUS. 
+ * Event completion on the remote endpoint completes as an DAT_EXTENSION_EVENT + * with type set to DAT_EXT_RECV_IMMED_DATA_IN_EVENT or + * DAT_EXT_RECV_IMMED_DATA_IN_PAYLOAD depending on the provider transport. + * + * DAT_EXT_WRITE_IMMED_FLAG requests that the supplied + *'immediate' value be sent as the payload of a four byte send following + * the RDMA Write, or any transport-dependent equivalent thereof. + * For example, on InfiniBand the request should be translated as an + * RDMA Write with Immediate. + * + * DAT_EXT_WRITE_CONFIRM_FLAG requests that this DTO + * not complete until receipt by the far end is confirmed. + * + * Note to Consumers: the immediate data will consume a receive + * buffer at the Data Sink. + * + * Other extension flags: + * n/a + */ +#define dat_ep_post_rdma_write_immed(ep, size, lbuf, cookie, rbuf, idata, eflgs, flgs) \ + dat_extension( ep, \ + DAT_EXT_RDMA_WRITE_IMMED, \ + (size), \ + (lbuf), \ + (cookie), \ + (rbuf), \ + (idata), \ + (eflgs), \ + (flgs)) + +/* + * Call performs a normal post receive message to the local endpoint + * that includes additional 32-bit buffer space for immediate data + * Event completion for the request completes as an + * DAT_EXTENSION_EVENT with type set to DAT_EXT_RDMA_WRITE_IMMED_STATUS. + */ +#define dat_ep_post_recv_immed(ep, size, lbuf, cookie, flgs) \ + dat_extension( ep, \ + DAT_EXT_RECV_IMMED, \ + (size), \ + (lbuf), \ + (cookie), \ + (flgs)) + +/* + * This asynchronous call is modeled after the InfiniBand atomic + * Fetch and Add operation. The add_value is added to the 64 bit + * value stored at the remote memory location specified in remote_iov + * and the result is stored in the local_iov. + */ +#define dat_ep_post_fetch_and_add(ep, add_val, lbuf, cookie, rbuf, flgs) \ + dat_extension( ep, \ + DAT_EXT_FETCH_AND_ADD, \ + (add_val), \ + (lbuf), \ + (cookie), \ + (rbuf), \ + (flgs)) + +/* + * This asynchronous call is modeled after the InfiniBand atomic + * Compare and Swap operation. The cmp_value is compared to the 64 bit + * value stored at the remote memory location specified in remote_iov. + * If the two values are equal, the 64 bit swap_value is stored in + * the remote memory location. In all cases, the original 64 bit + * value stored in the remote memory location is copied to the local_iov. + */ +#define dat_ep_post_cmp_and_swap(ep, cmp_val, swap_val, lbuf, cookie, rbuf, flgs) \ + dat_extension( ep, \ + DAT_EXT_CMP_AND_SWAP, \ + (cmp_val), \ + (swap_val), \ + (lbuf), \ + (cookie), \ + (rbuf), \ + (flgs)) + +#endif /* _DAT_EXTENSIONS_H_ */ + Index: dat/common/dat_api.c =================================================================== --- dat/common/dat_api.c (revision 4594) +++ dat/common/dat_api.c (working copy) @@ -1142,6 +1142,36 @@ DAT_RETURN dat_srq_set_lw( low_watermark); } +DAT_RETURN dat_extension( + IN DAT_HANDLE handle, + IN DAT_EXT_OP ext_op, + IN ... 
) + { + DAT_RETURN status; + va_list args; + + if (handle == NULL) + { + return DAT_ERROR(DAT_INVALID_HANDLE, DAT_INVALID_HANDLE_EP); + } + + /* verify provider extension support */ + if (!dat_extensions) + { + return DAT_ERROR(DAT_NOT_IMPLEMENTED, 0); + } + + va_start(args, ext_op); + + status = DAT_EXTENSION(handle, + ext_op, + args); + va_end(args); + + return status; +} + /* * Local variables: * c-indent-level: 4 Index: dat/udat/udat.c =================================================================== --- dat/udat/udat.c (revision 4594) +++ dat/udat/udat.c (working copy) @@ -66,6 +66,10 @@ udat_check_state ( void ); * * *********************************************************************/ +/* + * Use a global to get an unresolved when run with pre-extension library + */ +int dat_extensions = 0; /* * @@ -230,13 +234,44 @@ dat_ia_openv ( async_event_qlen, async_event_handle, ia_handle); + + /* + * See if provider supports extensions + */ if (dat_status == DAT_SUCCESS) { - return_handle = dats_set_ia_handle (*ia_handle); - if (return_handle >= 0) - { - *ia_handle = (DAT_IA_HANDLE)return_handle; - } + DAT_PROVIDER_ATTR p_attr; + int i; + + return_handle = dats_set_ia_handle (*ia_handle); + if (return_handle >= 0) + { + *ia_handle = (DAT_IA_HANDLE)return_handle; + } + + if ( dat_ia_query( *ia_handle, + NULL, + 0, + NULL, + DAT_PROVIDER_FIELD_PROVIDER_SPECIFIC_ATTR, + &p_attr ) == DAT_SUCCESS ) + { + for ( i = 0; i < p_attr.num_provider_specific_attr; i++ ) + { + if ( (strcmp( p_attr.provider_specific_attr[i].name, + "DAT_EXTENSION_INTERFACE" ) == 0) && + (strcmp( p_attr.provider_specific_attr[i].value, + "TRUE" ) == 0) ) + { + dat_os_dbg_print(DAT_OS_DBG_TYPE_CONSUMER_API, + "DAT Registry: dat_ia_open () " + "DAPL Extension Interface supported!\n"); + + dat_extensions = 1; + break; + } + } + } } return dat_status; Index: README =================================================================== --- README (revision 4589) +++ README (working copy) @@ -1,5 +1,10 @@ There are now 3 uDAPL providers for openib (openib,openib_scm,openib_cma). +NEW FEATURES for openib_cma provider: +API extensions for immediate data and atomic operations have been added. +see dat/include/dat/dat_extensions.h for new API's. +see dapl/test/dtest/dtest_ext.c for an example test case + ========== 1.0 BUILD: From caitlinb at broadcom.com Fri Dec 23 13:46:45 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Fri, 23 Dec 2005 13:46:45 -0800 Subject: [openib-general] Re: [RFC][PATCH] OpenIB uDAPL extension proposal - sample immed data and atomic api's Message-ID: <54AD0F12E08D1541B826BE97C98F99F114206D@NT-SJCA-0751.brcm.ad.broadcom.com> openib-general-bounces at openib.org wrote: > arlin> DAPL provides a generalized abstraction to RDMA capable > arlin> transports. As a generalized abstraction, it cannot > exploit the > arlin> unique properties that many of the underlying > arlin> platforms/interconnects can provide so I would like to propose > a arlin> simple (minimum impact on libdat) extensible interface > to uDAPL > arlin> that will allow vendors to expose such > capabilities.
I > am looking > arlin> for feedback, especially from the DAT collaborative. I have > arlin> included both a design document and actual working code as a > arlin> reference. > > This is an excellent document and clearly certain > applications will benefit greatly from adding this additional > functionality. > > Since DAPL's inception, the DAT_PROVIDER structure has > contained a field called "extension" of type void *. The > purpose of this field was to allow for the kind of > provider/platform/interconnect specific extensions you describe. > > I believe these features can be added without modifications > to the current API by defining a particular format for the > DAT_PROVIDER's extension data and indicating its presence via > a provider attribute. > That would require creating an extension document like this > one describing an "extension" structure w/ function pointers > to the new functions and a well known provider attribute value. > > Is there a reason this was not feasible? Would minor > modifications to the existing framework be sufficient > (perhaps an "extension" event type)? > Good points. Promoting something from a provider-specific extension, or even an extension that many providers agree to, creates an expectation that other providers SHOULD implement at least an emulation of this new method if it is at all relevant on their transport. And at the minimum they have to explicitly reject calls to the new method. An extension creates no similar expectations on other Providers. From caitlinb at broadcom.com Fri Dec 23 13:29:56 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Fri, 23 Dec 2005 13:29:56 -0800 Subject: [openib-general] RE: [dat-discussions] [RFC][PATCH] OpenIB uDAPL extension proposal - sample immed data and atomic api's Message-ID: <54AD0F12E08D1541B826BE97C98F99F1142064@NT-SJCA-0751.brcm.ad.broadcom.com> dat-discussions at yahoogroups.com wrote: > James and Arkady, > > > > DAPL provides a generalized abstraction to RDMA capable > transports. As a generalized abstraction, it cannot exploit > the unique properties that many of the underlying > platforms/interconnects can provide so I would like to > propose a simple (minimum impact on libdat) extensible > interface to uDAPL that will allow vendors to expose such > capabilities. I am looking for feedback, especially from the > DAT collaborative. I have included both a design document and > actual working code as a reference. > > > > The patch provides a fully tested DAT and DAPL library > (openib_cma) set with the following provider extensions: > > > > DAT_RETURN > dat_ep_post_write_immed( > IN DAT_EP_HANDLE ep_handle, > > IN DAT_COUNT num_segments, > IN DAT_LMR_TRIPLET *local_iov, > IN DAT_DTO_COOKIE user_cookie, > > IN DAT_RMR_TRIPLE *remote_iov, > IN DAT_UINT32 immediate_data, > IN DAT_COMPLETION_FLAGS completion_flags); > > There was an earlier discussion on "extended RDMA Writes" that could optionally include immediate data and/or rdma-peer-confirmed delivery. I think any discussions regarding RDMA Writes should build upon that, since we already outlined the differences between iWARP and IB and how each could be supported. To recap: it is easy to support any of the IB RDMA Write semantics over iWARP, but doing so maps to a variable number of work requests (for example, an RDMA Write *and* an RDMA Send). As I recall, there was insufficient interest to justify the extra complexity. But there is a base of interest in being able to do these "extended RDMA Writes" in a transport-neutral fashion.
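As a rough illustration of that recap (not code from any posted patch; libibverbs names are used only for familiarity), the write-plus-immediate case decomposes into two chained work requests on a transport without a native immediate-data write:

#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

/* Illustrative sketch: emulate "RDMA write with immediate" as a bulk
 * write chained to a 4-byte send carrying the immediate value.  The
 * qp and SGEs are assumed to be set up elsewhere. */
static int post_write_then_immed(struct ibv_qp *qp,
                                 struct ibv_sge *payload_sge,
                                 uint64_t remote_addr, uint32_t rkey,
                                 struct ibv_sge *immed_sge /* 4 bytes */)
{
    struct ibv_send_wr wr[2], *bad_wr;

    memset(wr, 0, sizeof wr);

    /* first WR: the RDMA write of the payload, left unsignaled */
    wr[0].opcode = IBV_WR_RDMA_WRITE;
    wr[0].sg_list = payload_sge;
    wr[0].num_sge = 1;
    wr[0].wr.rdma.remote_addr = remote_addr;
    wr[0].wr.rdma.rkey = rkey;
    wr[0].next = &wr[1];

    /* second WR: the 4-byte send standing in for the immediate data;
     * as the proposal notes, it consumes a receive buffer at the
     * data sink */
    wr[1].opcode = IBV_WR_SEND;
    wr[1].sg_list = immed_sge;
    wr[1].num_sge = 1;
    wr[1].send_flags = IBV_SEND_SIGNALED;

    return ibv_post_send(qp, wr, &bad_wr);
}

Posting both WRs in one call keeps them ordered on the send queue, so the peer sees the send only after the write data has been placed.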
> > DAT_RETURN > dat_ep_post_cmp_and_swap( > IN DAT_EP_HANDLE ep_handle, > > IN DAT_UINT64 cmp_value, > IN DAT_UINT64 swap_value, > > IN DAT_LMR_TRIPLE *local_iov, > IN DAT_DTO_COOKIE user_cookie, > > IN DAT_RMR_TRIPLE *remote_iov, > IN DAT_COMPLETION_FLAGS completion_flags); > > > > DAT_RETURN > dat_ep_post_fetch_and_add( > IN DAT_EP_HANDLE ep_handle, > > IN DAT_UINT64 add_value, > > IN DAT_LMR_TRIPLE *local_iov, > IN DAT_DTO_COOKIE user_cookie, > > IN DAT_RMR_TRIPLE *remote_iov, > IN DAT_COMPLETION_FLAGS completion_flags); > IT-API has already made a similar proposal, and the idea that atomic work requests at least exist is present in all of the verbs layer APIs (with the caveat that the response might be "illegal request"). It might be interesting to explore if there is any way that these can be defined so that they could easily be implemented at the driver or Provider layer since they are obviously not a part of an iWARP RNIC. I believe this is possible if there is no other untagged traffic (send/recv) on the same connection. Would transport-neutral atomics with that restriction be of any use at the application layer? Or is just being told "no we don't support hardware-assisted atomics" a simpler attribute? Keep in mind that atomics were not omitted from iWARP by oversight. They would have been a fairly simple opcode to add to the same work queue that handles RDMA Read requests. It was the consensus of the drafters that IB atomics had not justified the silicon they required. The key question is whether the benefits of using atomics when they are hardware optimized can be retained when the overhead of checking to see if they can be used is added. From rminnich at lanl.gov Sat Dec 24 17:34:38 2005 From: rminnich at lanl.gov (Ronald G. Minnich) Date: Sat, 24 Dec 2005 18:34:38 -0700 (MST) Subject: [openib-general] PathScale license In-Reply-To: <1135363454.4328.95007.camel@hal.voltaire.com> References: <1135363454.4328.95007.camel@hal.voltaire.com> Message-ID: <21289.128.165.0.81.1135474478.squirrel@webmail.lanl.gov> > Hi, > > The PathScale OpenIB license includes the following which > is beyond the normal OpenIB license: > > * Patent licenses, if any, provided herein do not apply to > * combinations of this program with other software, or any other > * product whatsoever. ??? What the heck could this mean? This kind of comment, lacking any real explanation, can cause trouble for a vendor -- it causes worries for customers. Customer worries can lead to customers looking to other vendors. I had hoped we had gotten past this 'one wrong move and I kill myself' aspect of the IB vendor community. I think I'd like to see PathScale comments on this as well. Over the last few years, the PathScale position on IP has led to confusion among potential customers, including me, which has led to a concern about the consequences of buying the PathScale HCAs -- and from there, a decision not to buy them. I'm getting set to buy some of these interfaces but could still be scared off by this type of IP confusion -- please help us understand what this means. IANAL, so keep it simple :-) Can I hear from a lawyer if I buy PathScale HCAs, run PathScale software, and then do something to it that involves a patent I've never heard of?
ron From hch at lst.de Sun Dec 25 02:41:23 2005 From: hch at lst.de (Christoph Hellwig) Date: Sun, 25 Dec 2005 11:41:23 +0100 Subject: [openib-general] PathScale license In-Reply-To: <21289.128.165.0.81.1135474478.squirrel@webmail.lanl.gov> References: <1135363454.4328.95007.camel@hal.voltaire.com> <21289.128.165.0.81.1135474478.squirrel@webmail.lanl.gov> Message-ID: <20051225104123.GA7180@lst.de> On Sat, Dec 24, 2005 at 06:34:38PM -0700, Ronald G. Minnich wrote: > > Hi, > > > > The PathScale OpenIB license includes the following which > > is beyond the normal OpenIB license: > > > > * Patent licenses, if any, provided herein do not apply to > > * combinations of this program with other software, or any other > > * product whatsoever. > > ??? What the heck could this mean? This kind of comment, lacking any real > explanation, can cause trouble for a vendor -- it causes worries for > customers. Customer worries can lead to customers looking to other > vendors. I had hoped we had gotten past this 'one wrong move and I kill > myself' aspect of the IB vendor community. Yepp. The second problem is that this is not GPL-compatible, so we definitely couldn't put the code into the kernel tree with this license. From rdreier at cisco.com Sun Dec 25 18:49:19 2005 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 25 Dec 2005 18:49:19 -0800 Subject: [openib-general] Re: [PATCH 13/13] [RFC] ipath Kconfig and Makefile In-Reply-To: <20051218192356.GB9145@mars.ravnborg.org> (Sam Ravnborg's message of "Sun, 18 Dec 2005 20:23:56 +0100") References: <200512161548.MdcxE8ZQTy1yj4v1@cisco.com> <200512161548.lokgvLraSGi0enUH@cisco.com> <20051218192356.GB9145@mars.ravnborg.org> Message-ID: > > +EXTRA_CFLAGS += -Idrivers/infiniband/include > If this is needed then some header files should be moved to include/rdma Sorry, this is really my fault -- it's a remnant to make building our subversion tree easier. It's not needed when the driver is part of the kernel proper, and I'll make sure to remove it when finally merging. Thanks, Roland From rdreier at cisco.com Sun Dec 25 18:53:14 2005 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 25 Dec 2005 18:53:14 -0800 Subject: [openib-general] Re: outstanding patches In-Reply-To: <20051219215254.GC2694@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 19 Dec 2005 23:52:54 +0200") References: <20051219215254.GC2694@mellanox.co.il> Message-ID: Michael> Most of them are small, I have collected them here: Michael> https://openib.org/svn/trunk/contrib/mellanox/patches Great. I don't think I've lost any but this is useful as well. Michael> I would be especially interested to get feedback on Michael> several ipoib patches, which have been outstanding for a Michael> while now. Sorry, since the end of November, I've had vacation, office move, and back on vacation (and I'm still not back really, just stealing time to check my email quickly). I'll be back with a vengeance January 4. - R. From rdreier at cisco.com Sun Dec 25 18:55:55 2005 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 25 Dec 2005 18:55:55 -0800 Subject: [openib-general] mthca calls ib_register_mad_agent() and implements ib_device.process_mad()? In-Reply-To: <1135039132.6397.52.camel@brick.internal.keyresearch.com> (Ralph Campbell's message of "Mon, 19 Dec 2005 16:38:52 -0800") References: <1135039132.6397.52.camel@brick.internal.keyresearch.com> Message-ID: Ralph> Can someone explain why the mthca driver calls Ralph> ib_register_mad_agent() and implements Ralph> ib_device.process_mad()?
It looks like the latter does the Ralph> actual processing of MAD packets for the SMA and PMA Ralph> whereas the former doesn't seem to do anything except cause Ralph> the ib_mad module to be modprobe'd. The MAD agent is created to handle sending MADs that the device itself generates -- traps and notices essentially. Look at mthca_mad.c::forward_trap() to see where it gets used. MADs from the network are passed down to the firmware in the process_mad method, and those replies are returned back up to the MAD layer for sending. - R. Fill vendor_err field in completion with error. Signed-off-by: Michael S. Tsirkin Index: last_stable/drivers/infiniband/hw/mthca/mthca_cq.c =================================================================== --- last_stable.orig/drivers/infiniband/hw/mthca/mthca_cq.c 2005-12-26 13:54:22.000000000 +0200 +++ last_stable/drivers/infiniband/hw/mthca/mthca_cq.c 2005-12-26 13:54:38.000000000 +0200 @@ -128,12 +128,12 @@ struct mthca_err_cqe { __be32 my_qpn; u32 reserved1[3]; u8 syndrome; - u8 reserved2; + u8 vendor_err; __be16 db_cnt; - u32 reserved3; + u32 reserved2; __be32 wqe; u8 opcode; - u8 reserved4[2]; + u8 reserved3[2]; u8 owner; }; @@ -342,8 +342,8 @@ static int handle_error_cqe(struct mthca } /* - * For completions in error, only work request ID, status (and - * freed resource count for RD) have to be set. + * For completions in error, only work request ID, status, vendor error + * (and freed resource count for RD) have to be set. */ switch (cqe->syndrome) { case SYNDROME_LOCAL_LENGTH_ERR: @@ -405,6 +405,8 @@ static int handle_error_cqe(struct mthca break; } + entry->vendor_err = cqe->vendor_err; + /* * Mem-free HCAs always generate one CQE per WQE, even in the * error case, so we don't have to check the doorbell count, etc. -- MST From mst at mellanox.co.il Mon Dec 26 04:10:18 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 26 Dec 2005 14:10:18 +0200 Subject: [openib-general] [PATCH] libmthca: fill vendor_err in completion with error Message-ID: <20051226121018.GS4907@mellanox.co.il> Fill vendor_err field for completion with error. Index: last_stable/src/userspace/libmthca/src/cq.c =================================================================== --- last_stable.orig/src/userspace/libmthca/src/cq.c 2005-12-25 17:04:23.000000000 +0200 +++ last_stable/src/userspace/libmthca/src/cq.c 2005-12-26 13:57:00.000000000 +0200 @@ -112,12 +112,12 @@ struct mthca_err_cqe { uint32_t my_qpn; uint32_t reserved1[3]; uint8_t syndrome; - uint8_t reserved2; + uint8_t vendor_err; uint16_t db_cnt; - uint32_t reserved3; + uint32_t reserved2; uint32_t wqe; uint8_t opcode; - uint8_t reserved4[2]; + uint8_t reserved3[2]; uint8_t owner; }; @@ -197,8 +197,8 @@ static int handle_error_cqe(struct mthca } /* - * For completions in error, only work request ID, status (and - * freed resource count for RD) have to be set. + * For completions in error, only work request ID, status, vendor error + * (and freed resource count for RD) have to be set.
*/ switch (cqe->syndrome) { case SYNDROME_LOCAL_LENGTH_ERR: @@ -260,6 +260,8 @@ static int handle_error_cqe(struct mthca break; } + wc->vendor_err = cqe->vendor_err; + /* * Mem-free HCAs always generate one CQE per WQE, even in the * error case, so we don't have to check the doorbell count, etc. -- MST From devesh28 at gmail.com Mon Dec 26 04:32:46 2005 From: devesh28 at gmail.com (Devesh Sharma) Date: Mon, 26 Dec 2005 18:02:46 +0530 Subject: [openib-general] OpenIB Design Document Message-ID: <309a667c0512260432w20ae94f4u73abdd95959b95f4@mail.gmail.com> Hi all, I have a query regarding the design specification the OpenIB forum is currently following: is it the document available at /trunk/contrib/sourceforge/Documentation/LinuxSAS.1.0.1.pdf or the one at trunk/contrib/voltaire/openib-access.pdf, or is there no difference between the two files? Please help me out on this. Devesh From mst at mellanox.co.il Mon Dec 26 07:08:04 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 26 Dec 2005 17:08:04 +0200 Subject: [openib-general] [PATCH] mthca_mcg: multiple fixes Message-ID: <20051226150804.GW4907@mellanox.co.il> The following patch supersedes the patch for mthca_mcg that I sent previously. Unfortunately the patch has got relatively big, and it's somewhat hard to split it up into smaller chunks. --- Multicast group management fixes: . Don't leak mailbox memory in error handling on multicast group operations . Free AMGM indices at detach and in attach error handling . Fix the amount to shift for aligning next_gid_index in the mailbox: it starts at bit 6, not bit 5 . Allocate AMGM indices after the end of the MGM table, in the range num_mgms to multicast table size - 1. Add some BUG_ON checks to catch cases where the index falls in the MGM hash area. . Initialize the list of QPs in a newly-allocated group from AMGM to 0. This is necessary since when a group is moved from AMGM to MGM (in the case where the MGM entry has been emptied of QPs), the AMGM entry is not reset to 0 (and we don't want an extra command to do that). Signed-off-by: Jack Morgenstein Signed-off-by: Michael S.
Tsirkin Index: latest/drivers/infiniband/hw/mthca/mthca_mcg.c =================================================================== --- latest.orig/drivers/infiniband/hw/mthca/mthca_mcg.c +++ latest/drivers/infiniband/hw/mthca/mthca_mcg.c @@ -109,7 +109,8 @@ static int find_mgm(struct mthca_dev *de goto out; if (status) { mthca_err(dev, "READ_MGM returned status %02x\n", status); - return -EINVAL; + err = -EINVAL; + goto out; } if (!memcmp(mgm->gid, zero_gid, 16)) { @@ -124,7 +125,7 @@ static int find_mgm(struct mthca_dev *de goto out; *prev = *index; - *index = be32_to_cpu(mgm->next_gid_index) >> 5; + *index = be32_to_cpu(mgm->next_gid_index) >> 6; } while (*index); *index = -1; @@ -151,8 +152,10 @@ int mthca_multicast_attach(struct ib_qp return PTR_ERR(mailbox); mgm = mailbox->buf; - if (down_interruptible(&dev->mcg_table.sem)) - return -EINTR; + if (down_interruptible(&dev->mcg_table.sem)) { + err = -EINTR; + goto err_sem; + } err = find_mgm(dev, gid->raw, mailbox, &hash, &prev, &index); if (err) @@ -179,9 +182,8 @@ int mthca_multicast_attach(struct ib_qp err = -EINVAL; goto out; } - + memset(mgm, 0, sizeof *mgm); memcpy(mgm->gid, gid->raw, 16); - mgm->next_gid_index = 0; } for (i = 0; i < MTHCA_QP_PER_MGM; ++i) @@ -207,6 +209,7 @@ int mthca_multicast_attach(struct ib_qp if (status) { mthca_err(dev, "WRITE_MGM returned status %02x\n", status); err = -EINVAL; + goto out; } if (!link) @@ -221,7 +224,7 @@ int mthca_multicast_attach(struct ib_qp goto out; } - mgm->next_gid_index = cpu_to_be32(index << 5); + mgm->next_gid_index = cpu_to_be32(index << 6); err = mthca_WRITE_MGM(dev, prev, mailbox, &status); if (err) @@ -232,7 +235,12 @@ int mthca_multicast_attach(struct ib_qp } out: + if (err && link && index != -1) { + BUG_ON(index < dev->limits.num_mgms); + mthca_free(&dev->mcg_table.alloc, index); + } up(&dev->mcg_table.sem); + err_sem: mthca_free_mailbox(dev, mailbox); return err; } @@ -253,8 +261,10 @@ int mthca_multicast_detach(struct ib_qp return PTR_ERR(mailbox); mgm = mailbox->buf; - if (down_interruptible(&dev->mcg_table.sem)) - return -EINTR; + if (down_interruptible(&dev->mcg_table.sem)) { + err = -EINTR; + goto err_sem; + } err = find_mgm(dev, gid->raw, mailbox, &hash, &prev, &index); if (err) @@ -303,13 +313,11 @@ int mthca_multicast_detach(struct ib_qp if (i != 1) goto out; - goto out; - if (prev == -1) { /* Remove entry from MGM */ - if (be32_to_cpu(mgm->next_gid_index) >> 5) { - err = mthca_READ_MGM(dev, - be32_to_cpu(mgm->next_gid_index) >> 5, + int amgm_index_to_free = be32_to_cpu(mgm->next_gid_index) >> 6; + if (amgm_index_to_free) { + err = mthca_READ_MGM(dev, amgm_index_to_free, mailbox, &status); if (err) goto out; @@ -330,9 +338,13 @@ int mthca_multicast_detach(struct ib_qp err = -EINVAL; goto out; } + if (amgm_index_to_free) { + BUG_ON(amgm_index_to_free < dev->limits.num_mgms); + mthca_free(&dev->mcg_table.alloc, amgm_index_to_free); + } } else { /* Remove entry from AMGM */ - index = be32_to_cpu(mgm->next_gid_index) >> 5; + int curr_next_index = be32_to_cpu(mgm->next_gid_index) >> 6; err = mthca_READ_MGM(dev, prev, mailbox, &status); if (err) goto out; @@ -342,7 +354,7 @@ int mthca_multicast_detach(struct ib_qp goto out; } - mgm->next_gid_index = cpu_to_be32(index << 5); + mgm->next_gid_index = cpu_to_be32(curr_next_index << 6); err = mthca_WRITE_MGM(dev, prev, mailbox, &status); if (err) @@ -352,10 +364,13 @@ int mthca_multicast_detach(struct ib_qp err = -EINVAL; goto out; } + BUG_ON(index < dev->limits.num_mgms); + mthca_free(&dev->mcg_table.alloc, index); } out: 
up(&dev->mcg_table.sem); + err_sem: mthca_free_mailbox(dev, mailbox); return err; } @@ -363,11 +378,12 @@ int mthca_multicast_detach(struct ib_qp int __devinit mthca_init_mcg_table(struct mthca_dev *dev) { int err; + int table_size = dev->limits.num_mgms + dev->limits.num_amgms; err = mthca_alloc_init(&dev->mcg_table.alloc, - dev->limits.num_amgms, - dev->limits.num_amgms - 1, - 0); + table_size, + table_size - 1, + dev->limits.num_mgms); if (err) return err; -- MST From ardavis at ichips.intel.com Mon Dec 26 13:54:09 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Mon, 26 Dec 2005 13:54:09 -0800 Subject: [openib-general] Re: [RFC][PATCH] OpenIB uDAPL extension proposal - sample immed data and atomic api's In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F114206D@NT-SJCA-0751.brcm.ad.broadcom.com> References: <54AD0F12E08D1541B826BE97C98F99F114206D@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <43B06681.2000404@ichips.intel.com> Caitlin Bestler wrote: >openib-general-bounces at openib.org wrote: > > >>arlin> DAPL provides a generalized abstraction to RDMA capable >>arlin> transports. As a generalized abstraction, it cannot >>exploit the >>arlin> unique properties that many of the underlying >>arlin> platforms/interconnects can provide so I would like to propose >>a arlin> simple (minimum impact on libdat) extensible interface >>to uDAPL >>arlin> that will allow vendors to expose such capabilities. I >>am looking >>arlin> for feedback, especially from the DAT collaborative. I have >>arlin> included both a design document and actual working code as a >>arlin> reference. >> >>This is an excellent document and clearly certain >>applications will benefit greatly from adding this additional >>functionality. >> >>Since DAPL's inception, the DAT_PROVIDER structure has >>contained a field called "extension" of type void *. The >>purpose of this field was to allow for the kind of >>provider/platform/interconnect specific extensions you describe. >> >>I believe these features can be added without modifications >>to the current API by defining a particular format for the >>DAT_PROVIDER's extension data and indicating its presence via >>a provider attribute. >>That would require creating an extension document like this >>one describing an "extension" structure w/ function pointers >>to the new functions and a well known provider attribute value. >> >>Is there a reason this was not feasible? Would minor >>modifications to the existing framework be sufficient >>(perhaps an "extension" event type)? >> >> >> > > >Good points. > >Promoting something from a provider-specific extension, or even >an extension that many providers agree to, creates an expectation >that other providers SHOULD implement at least an emulation of >this new method if it is at all relevant on their transport. >And at the minimum they have to explicitly reject calls to >the new method. > > I am not promoting these specific extensions for all providers, they are merely working examples that add value to the IB provider and are included so everyone can see exactly how the "proposed extension interface" can plug into new provider specific entry points (i.e. real code, working patch). Please concentrate on the extension method proposed and not the provider specific calls. I am looking for comments on the new extension function definition, extension event processing, event type and data extensions, and cookie extensions. >An extension creates no similar expectations on other Providers. > > I agree.
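For those evaluating the event-processing piece specifically, here is a sketch of consumer-side dispatch against the proposed DAT_EXTENSION_DATA layout. Only the event number, type enum, and field names come from the patch; the surrounding function is illustrative:

#include <stdio.h>
#include <dat/udat.h>

/* sketch of handling the new event class next to normal DTO events */
static void process_event_example(const DAT_EVENT *event)
{
    const DAT_EXTENSION_DATA *ext;

    if (event->event_number != DAT_EXTENSION_EVENT)
        return;                 /* existing event handling applies */

    ext = &event->event_data.extension_data;
    switch (ext->type) {
    case DAT_EXT_RECV_IMMED_DATA_EVENT:
        /* immediate data arrived in the event itself, not the payload */
        printf("immed 0x%x, %d bytes written\n",
               ext->val.immed.data, (int) ext->dto.transfered_length);
        break;
    case DAT_EXT_FETCH_AND_ADD_STATUS:
    case DAT_EXT_CMP_AND_SWAP_STATUS:
        /* original remote value was placed in the local buffer;
           ext->dto.status carries the DTO completion status */
        break;
    default:
        break;
    }
}

Because DAT_EXTENSION_DATA embeds the standard DAT_DTO_COMPLETION_EVENT_DATA, existing cookie and status bookkeeping keeps working for extended completions.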
>_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > From mst at mellanox.co.il Mon Dec 26 23:55:40 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 27 Dec 2005 09:55:40 +0200 Subject: [openib-general] checkstack warnings Message-ID: <20051227075540.GZ4907@mellanox.co.il> Hi! Running make checkstack on the openib tree generates quite a long list of stack hogs (below). Do we want to clean the code with respect to these warnings? drivers/infiniband/ulp/ipoib/ib_ipoib.ko 0x00000e57 ipoib_start_xmit: 232 0x000018c9 ipoib_start_xmit: 232 0x00002629 ipoib_ib_dev_stop: 200 0x000029a6 ipoib_ib_dev_stop: 200 0x00004640 ipoib_mcast_send: 184 0x00004b48 ipoib_mcast_send: 184 0x00005b10 ipoib_init_qp: 184 0x00005caa ipoib_init_qp: 184 0x00005000 ipoib_mcast_join: 136 0x000051df ipoib_mcast_join: 136 0x00000a13 path_rec_completion: 120 0x00000cea path_rec_completion: 120 0x00002360 ipoib_ib_post_receive: 120 0x0000250b ipoib_ib_post_receive: 120 0x00003a80 ipoib_mcast_leave: 120 0x00003bb2 ipoib_mcast_leave: 120 0x00002bc0 ipoib_send: 104 0x00002e7e ipoib_send: 104 0x00003244 ipoib_ib_completion: 104 0x000037f9 ipoib_ib_completion: 104 drivers/infiniband/ulp/sdp/ib_sdp.ko 0x000091bd sdp_inet_send: 312 0x00009df7 sdp_inet_send: 312 0x0000afa5 sdp_inet_recv: 296 0x0000bcc6 sdp_inet_recv: 296 0x00005b08 sdp_proc_dump_conn_data: 248 0x00005d76 sdp_proc_dump_conn_data: 248 0x0000a84f sdp_recv_flush: 248 0x0000af7e sdp_recv_flush: 248 0x00005da5 sdp_proc_dump_conn_main: 232 0x00006016 sdp_proc_dump_conn_main: 232 0x0000ddc0 do_link_path_lookup: 232 0x0000e1a4 do_link_path_lookup: 232 0x000058c0 sdp_proc_dump_conn_rdma: 200 0x00005ae7 sdp_proc_dump_conn_rdma: 200 0x00003320 sdp_inet_connect: 184 0x00003840 sdp_inet_connect: 184 0x00007d90 sdp_send_data_queue_test: 184 0x000087e0 sdp_send_data_queue_test: 184 0x00003ad0 sdp_inet_release: 168 0x00003e5f sdp_inet_release: 168 0x0000cec0 sdp_cm_path_complete: 152 0x0000d2e2 sdp_cm_path_complete: 152 0x00002eaf sdp_inet_accept: 136 0x000032fd sdp_inet_accept: 136 0x00005700 sdp_proc_dump_conn_sopt: 136 0x000058aa sdp_proc_dump_conn_sopt: 136 0x00006e88 sdp_conn_internal_lock: 104 0x00006f4d sdp_conn_internal_lock: 104 0x0000c35d sdp_cm_req_handler: 104 0x0000cba5 sdp_cm_req_handler: 104 drivers/infiniband/ulp/srp/ib_srp.ko 0x00001a26 srp_create_target: 200 0x000020ce srp_create_target: 200 0x000016d7 srp_reconnect_target: 192 0x0000192f srp_reconnect_target: 192 0x000003b7 __srp_post_send: 112 0x0000045e __srp_post_send: 112 drivers/infiniband/core/ib_cm.ko 0x0000123d ib_cm_listen: 168 0x000014a6 ib_cm_listen: 168 0x000030ee ib_destroy_cm_id: 144 0x00003397 ib_destroy_cm_id: 144 0x000034a0 cm_send_handler: 136 0x000035f9 cm_send_handler: 136 0x00001ca0 ib_send_cm_apr: 104 0x00001e46 ib_send_cm_apr: 104 drivers/infiniband/core/ib_core.ko 0x00001372 show_sys_image_guid: 176 0x000013f4 show_sys_image_guid: 176 0x00001412 show_node_guid: 176 0x00001494 show_node_guid: 176 0x00002d30 ib_flush_fmr_pool: 136 0x00002e39 ib_flush_fmr_pool: 136 0x00001950 add_port: 104 0x00001b70 add_port: 104 drivers/infiniband/core/ib_mad.ko 0x00003200 ib_register_mad_agent: 184 0x00003b8a ib_register_mad_agent: 184 0x0000297d ib_unregister_mad_agent: 168 0x00002f05 ib_unregister_mad_agent: 168 0x00001bb2 ib_mad_completion_handler: 136 0x00002149 ib_mad_completion_handler: 136 
0x0000216d ib_post_send_mad: 136 0x00002621 ib_post_send_mad: 136 0x00000620 create_mad_qp: 104 0x000006f4 create_mad_qp: 104 0x000010f4 local_completions: 104 0x00001305 local_completions: 104 drivers/infiniband/core/ib_sa.ko 0x0c60 ib_sa_service_rec_callback: 200 0x0cd4 ib_sa_service_rec_callback: 200 0x0130 update_sm_ah: 120 0x0256 update_sm_ah: 120 drivers/infiniband/core/ib_uat.ko 0x00000d10 ib_uat_paths_by_route: 184 0x00000f59 ib_uat_paths_by_route: 184 0x000009a0 ib_uat_ips_by_gid: 136 0x00000bc5 ib_uat_ips_by_gid: 136 0x000003b0 ib_uat_event: 112 0x000006ff ib_uat_event: 112 0x00000f70 ib_uat_route_by_ip: 104 0x00001177 ib_uat_route_by_ip: 104 drivers/infiniband/core/ib_umad.ko 0x00000c30 ib_umad_ioctl: 136 0x00000e81 ib_umad_ioctl: 136 0x00000e90 ib_umad_write: 136 0x0000119c ib_umad_write: 136 0x000011c1 ib_umad_read: 136 0x000013c3 ib_umad_read: 136 drivers/infiniband/core/ib_uverbs.ko 0x00004431 ib_uverbs_query_device: 368 0x000046b0 ib_uverbs_query_device: 368 0x00002ed0 ib_uverbs_create_qp: 280 0x0000335e ib_uverbs_create_qp: 280 0x00002060 ib_uverbs_create_srq: 168 0x0000233d ib_uverbs_create_srq: 168 0x00002aa0 ib_uverbs_create_ah: 168 0x00002d2c ib_uverbs_create_ah: 168 0x00003370 ib_uverbs_modify_qp: 168 0x0000360f ib_uverbs_modify_qp: 168 0x00003c30 ib_uverbs_reg_mr: 168 0x00003f7f ib_uverbs_reg_mr: 168 0x00000eda ib_uverbs_event_read: 136 0x000010cd ib_uverbs_event_read: 136 0x000046c0 ib_uverbs_get_context: 136 0x00004923 ib_uverbs_get_context: 136 0x000042e1 ib_uverbs_query_port: 112 0x00004410 ib_uverbs_query_port: 112 0x000018c0 ib_uverbs_unmarshall_recv: 104 0x00001ac7 ib_uverbs_unmarshall_recv: 104 0x00002350 ib_uverbs_post_srq_recv: 104 0x000024c7 ib_uverbs_post_srq_recv: 104 0x000024d0 ib_uverbs_post_recv: 104 0x00002647 ib_uverbs_post_recv: 104 drivers/infiniband/core/ib_at.ko 0x000010f0 ib_dev_ats_op: 296 0x00001260 ib_dev_ats_op: 296 0x00001548 resolve_ats_ips: 216 0x0000160a resolve_ats_ips: 216 0x00001758 resolve_ats_route: 216 0x00001819 resolve_ats_route: 216 0x000012f0 ib_at_ats_reg: 200 0x0000152e ib_at_ats_reg: 200 0x00001900 resolve_ip: 152 0x00001c25 resolve_ip: 152 0x000021d0 ib_at_route_by_ip: 104 0x000023ca ib_at_route_by_ip: 104 drivers/infiniband/core/ib_ucm.ko 0x000006a0 ib_ucm_init_qp_attr: 344 0x000007b6 ib_ucm_init_qp_attr: 344 0x00000a93 ib_ucm_destroy_id: 136 0x00000c30 ib_ucm_destroy_id: 136 0x000010b0 ib_ucm_path_get: 104 0x00001145 ib_ucm_path_get: 104 0x000013a0 ib_ucm_send_rep: 104 0x000014b2 ib_ucm_send_rep: 104 drivers/infiniband/core/ib_addr.ko 0x05d0 ib_resolve_addr: 184 0x08a3 ib_resolve_addr: 184 0x0040 addr_resolve_remote: 152 0x01c5 addr_resolve_remote: 152 drivers/infiniband/core/rdma_cm.ko 0x00001430 cma_modify_qp_rtr: 184 0x000014c1 cma_modify_qp_rtr: 184 0x000014d0 cma_modify_qp_rts: 184 0x00001531 cma_modify_qp_rts: 184 0x00001890 rdma_create_qp: 168 0x00001998 rdma_create_qp: 168 0x00000b60 cma_modify_qp_err: 152 0x00000b86 cma_modify_qp_err: 152 0x00000691 cma_remove_one: 136 0x00000999 cma_remove_one: 136 0x000012a0 rdma_resolve_route: 120 0x000013c3 rdma_resolve_route: 120 0x00000c90 rdma_connect: 104 0x00000efb rdma_connect: 104 drivers/infiniband/core/rdma_ucm.ko 0x00000900 ucma_init_qp_attr: 344 0x00000a16 ucma_init_qp_attr: 344 0x00000b20 ucma_accept: 344 0x00000bd9 ucma_accept: 344 0x00000c60 ucma_connect: 344 0x00000d0d ucma_connect: 344 0x00000a80 ucma_reject: 296 0x00000b0f ucma_reject: 296 0x00000e60 ucma_query_route: 264 0x00001067 ucma_query_route: 264 0x00000713 ucma_destroy_id: 136 0x000008b0 
ucma_destroy_id: 136 0x00000480 ucma_get_event: 112 0x0000066f ucma_get_event: 112 drivers/infiniband/hw/mthca/ib_mthca.ko 0x000007e0 mthca_init_one: 1048 0x0000200d mthca_init_one: 1048 0x00009d80 mthca_process_mad: 184 0x0000a178 mthca_process_mad: 184 0x000079a0 mthca_free_qp: 136 0x00007bb3 mthca_free_qp: 136 0x00008473 mthca_modify_qp: 136 0x00008f76 mthca_modify_qp: 136 0x0000b500 mthca_create_qp: 136 0x0000b7a9 mthca_create_qp: 136 0x0000c150 mthca_modify_port: 136 0x0000c232 mthca_modify_port: 136 0x000095d0 mthca_multicast_detach: 120 0x00009878 mthca_multicast_detach: 120 0x0000ae60 mthca_reg_phys_mr: 120 0x0000b1a1 mthca_reg_phys_mr: 120 0x0000cba0 mthca_map_user_db: 120 0x0000cdf2 mthca_map_user_db: 120 0x00004473 mthca_free_cq: 112 0x000046b0 mthca_free_cq: 112 0x0000de23 mthca_free_srq: 112 0x0000e02e mthca_free_srq: 112 0x00001225 mthca_map_cmd: 104 0x000014b1 mthca_map_cmd: 104 0x0000337a mthca_buf_alloc: 104 0x0000367d mthca_buf_alloc: 104 0x00004b4f mthca_poll_cq: 104 0x000051ab mthca_poll_cq: 104 0x0000a6a0 mthca_query_gid: 104 0x0000a851 mthca_query_gid: 104 0x000020f6 mthca_create_eq: 104 0x00002550 mthca_create_eq: 104 -- MST From mst at mellanox.co.il Tue Dec 27 01:11:54 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 27 Dec 2005 11:11:54 +0200 Subject: [openib-general] [PATCH] uverbs: error handling fixes Message-ID: <20051227091153.GC4907@mellanox.co.il> uverbs: fix reference counting in error handling. Signed-off-by: Jack Morgenstein Signed-off-by: Michael S. Tsirkin Index: linux-kernel/drivers/infiniband/core/uverbs_cmd.c =================================================================== --- linux-kernel/drivers/infiniband/core/uverbs_cmd.c (revision 4611) +++ linux-kernel/drivers/infiniband/core/uverbs_cmd.c (working copy) @@ -488,6 +488,7 @@ err_idr: idr_remove(&ib_uverbs_mr_idr, obj->uobject.id); err_unreg: + atomic_dec(&pd->usecnt); ib_dereg_mr(mr); err_up: @@ -934,6 +935,11 @@ err_idr: idr_remove(&ib_uverbs_qp_idr, uobj->uevent.uobject.id); err_destroy: + atomic_dec(&pd->usecnt); + atomic_dec(&attr.send_cq->usecnt); + atomic_dec(&attr.recv_cq->usecnt); + if (attr.srq) + atomic_dec(&attr.srq->usecnt); ib_destroy_qp(qp); err_up: @@ -1728,6 +1734,7 @@ err_idr: idr_remove(&ib_uverbs_srq_idr, uobj->uobject.id); err_destroy: + atomic_dec(&pd->usecnt); ib_destroy_srq(srq); err_up: -- MST From danb at voltaire.com Tue Dec 27 05:56:04 2005 From: danb at voltaire.com (Dan Bar Dov) Date: Tue, 27 Dec 2005 15:56:04 +0200 Subject: [openib-general] iser/iscsi merge Message-ID: With svn r4622 iSER (ib_iser.ko) provides the following two APIs: - iscsi transport class defined by the Linux open-iscsi initiator - SCSI LLD defined by the Linux SCSI Mid Layer The Open-iscsi initiator is an open source project whose website and mailing list are http://open-iscsi.org and http://groups.google.co.il/group/open-iscsi/ . The kernel portion of it is made of two modules, scsi_transport_iscsi and iscsi_tcp, which were pushed upstream and exist in the 2.6.15 rc patches (eg patch-2.6.15-rc6.bz2). The Documentation section of the website and specifically the "Startup, Login, Read/Write" slide provides good intro to the initiator SW architecture. This code has been tested with 2.6.14 / openib r4606 / AMD x86_64 system. It was also compile-tested on x86 and ppc64. 
To compile ulp/iser with 2.6.14 the kernel must be patched with 3 files: scsi_transport_iscsi.h iscsi_proto.h iscsi_if.h The patch with the above files is available in https://openib.org/svn/trunk/contrib/voltaire/patches/iscsi_includes.patch Dan From eitan at mellanox.co.il Tue Dec 27 07:40:09 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 27 Dec 2005 17:40:09 +0200 Subject: [openib-general] [PATCH] osm: support arbitrary ibumad installation path Message-ID: <86zmmm8s6e.fsf@mtl066.yok.mtl.com> Hi Hal Current build for OpenSM assumed ibumad is located always under /usr/local . Some installations require more flexibility. The patch below allows the user to specify either: --with-umad-prefix (then we use $with_umad_prefix/include and lib) or --with-umad-libs and --with-umad-includes This provides the greatest flexibility. Thanks Eitan Signed-off-by: Eitan Zahavi Index: config/osmvsel.m4 =================================================================== --- config/osmvsel.m4 (revision 4622) +++ config/osmvsel.m4 (working copy) @@ -17,34 +17,56 @@ AC_ARG_WITH(osmv, AC_MSG_NOTICE(Using OSM Vendor Type:$with_osmv), with_osmv="openib") -dnl Define a way for the user to provide the path to the driver installation -AC_ARG_WITH(uldrv, -[ --with-uldrv= define the dir where the user level driver is installed], -AC_MSG_NOTICE(Using user level installation prefix:$with_uldrv), -with_uldrv="") +dnl Define a way for the user to provide the path to the ibumad installation +AC_ARG_WITH(umad-prefix, +[ --with-umad-prefix= define the dir used as prefix for ibumad installation], +AC_MSG_NOTICE(Using ibumad installation prefix:$with_umad_prefix), +with_umad_prefix="") + +dnl Define a way for the user to provide the path to the ibumad includes +AC_ARG_WITH(umad-includes, +[ --with-umad-includes= define the dir where ibumad includes are installed], +AC_MSG_NOTICE(Using ibumad includes from:$with_umad_includes), +with_umad_includes="") + +if test x$with_umad_includes = x; then + if test x$with_umad_prefix != x; then + with_umad_includes=$with_umad_prefix/include + fi +fi -dnl Define a way for the user to provide the path to the simulator installation -AC_ARG_WITH(sim, -[ --with-sim= define the simulator prefix for building sim vendor (/usr)], -AC_MSG_NOTICE(Using Simulator from:$with_sim), -with_sim="/usr") +dnl Define a way for the user to provide the path to the ibumad libs +AC_ARG_WITH(umad-libs, +[ --with-umad-libs= define the dir where ibumad libs are installed], +AC_MSG_NOTICE(Using ibumad libs from:$with_umad_libs), +with_umad_libs="") +if test x$with_umad_libs = x; then + if test x$with_umad_prefix != x; then dnl Should we use lib64 or lib if test "$(uname -m)" = "x86_64"; then - osmv_lib_type="lib64" + with_umad_libs=$with_umad_prefix/lib64 else - osmv_lib_type="lib" + with_umad_libs=$with_umad_prefix/lib + fi fi +fi + +dnl Define a way for the user to provide the path to the simulator installation +AC_ARG_WITH(sim, +[ --with-sim= define the simulator prefix for building sim vendor (/usr)], +AC_MSG_NOTICE(Using Simulator from:$with_sim), +with_sim="/usr") dnl based on the with_osmv we can try the vendor flag if test $with_osmv = "openib"; then OSMV_CFLAGS="-DOSM_VENDOR_INTF_OPENIB" OSMV_INCLUDES="-I\$(srcdir)/../include -I\$(srcdir)/../../libibcommon/include/infiniband -I\$(srcdir)/../../libibumad/include/infiniband" - if test "x$with_uldrv" = "x"; then + if test "x$with_umad_libs" = "x"; then OSMV_LDADD="-libumad" else - OSMV_INCLUDES="-I$with_uldrv/include $OSMV_INCLUDES" - 
OSMV_LDADD="-L$with_uldrv/$osmv_lib_type -libumad" + OSMV_INCLUDES="-I$with_umad_includes $OSMV_INCLUDES" + OSMV_LDADD="-L$with_umad_libs -libumad" fi elif test $with_osmv = "sim" ; then OSMV_CFLAGS="-DOSM_VENDOR_INTF_SIM" From bardov at gmail.com Tue Dec 27 08:38:00 2005 From: bardov at gmail.com (Dan Bar Dov) Date: Tue, 27 Dec 2005 18:38:00 +0200 Subject: [openib-general] iser/iscsi merge In-Reply-To: References: Message-ID: I've copied the 2.6.14 patch necessary for iser compilation to https://openib.org/svn/gen2/trunk/src/linux-kernel/patches/linux-2.6.14-iscsi_includes.diff These headers files are part of the 2.6.15-rc6 and later. Dan On 12/27/05, Dan Bar Dov wrote: > > With svn r4622 iSER (ib_iser.ko) provides the following two APIs: > > - iscsi transport class defined by the Linux open-iscsi initiator > - SCSI LLD defined by the Linux SCSI Mid Layer > > The Open-iscsi initiator is an open source project whose website and mailing list are > http://open-iscsi.org and http://groups.google.co.il/group/open-iscsi/ . The kernel portion > of it is made of two modules, scsi_transport_iscsi and iscsi_tcp, which were pushed > upstream and exist in the 2.6.15 rc patches (eg patch-2.6.15-rc6.bz2). The Documentation > section of the website and specifically the "Startup, Login, Read/Write" slide provides > good intro to the initiator SW architecture. > > This code has been tested with 2.6.14 / openib r4606 / AMD x86_64 system. > It was also compile-tested on x86 and ppc64. > > To compile ulp/iser with 2.6.14 the kernel must be patched with 3 files: > scsi_transport_iscsi.h > iscsi_proto.h > iscsi_if.h > The patch with the above files is available in > https://openib.org/svn/trunk/contrib/voltaire/patches/iscsi_includes.patch > > Dan > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From jlentini at netapp.com Tue Dec 27 08:55:14 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 27 Dec 2005 11:55:14 -0500 (EST) Subject: [openib-general] Re: [RFC][PATCH] OpenIB uDAPL extension proposal - sample immed data and atomic api's In-Reply-To: <43AC57C2.8050509@ichips.intel.com> References: <43AC57C2.8050509@ichips.intel.com> Message-ID: On Fri, 23 Dec 2005, Arlin Davis wrote: arlin> James Lentini wrote: arlin> arlin> > arlin> DAPL provides a generalized abstraction to RDMA capable arlin> arlin> > transports. As a generalized abstraction, it cannot exploit the arlin> arlin> > unique properties that many of the underlying arlin> arlin> > platforms/interconnects can provide so I would like to propose arlin> a arlin> > simple (minimum impact on libdat) extensible interface to arlin> uDAPL arlin> > that will allow vendors to expose such capabilities. I am arlin> arlin> > looking for feedback, especially from the DAT collaborative. I arlin> arlin> > have included both a design document and actual working code as arlin> arlin> > a reference. arlin> > arlin> > This is an excellent document and clearly certain applications will arlin> > benefit greatly from adding this additional functionality. arlin> > arlin> > Since DAPL's inception, the DAT_PROVIDER structure has contained a arlin> > field called "extension" of type void *. The purpose of this field was arlin> > to allow for the kind of provider/platform/interconnect specific arlin> > extensions you describe. 
arlin> > arlin> > I believe these features can be added without modifications to the arlin> > current API by defining a particular format for the DAT_PROVIDER's arlin> > extension data and indicating its presence via a provider attribute. arlin> > That would require creating an extension document like this one arlin> > describing an "extension" structure w/ function pointers to the new arlin> > functions and a well known provider attribute value. arlin> > Is there a reason this was not feasible? Would minor modifications to arlin> > the existing framework be sufficient (perhaps an "extension" event arlin> > type)? arlin> arlin> A single entry point is still there with this patch, I just arlin> defined it a little different with a function definition for arlin> better DAT API mappings. The idea was to replace the existing arlin> pvoid extension definition with this new one. Can you give me arlin> an idea of how you would map these extended DAT calls to this arlin> pvoid function definition? For uDAPL, the DAT_PROVIDER structure is defined as follows: struct dat_provider { const char * device_name; DAT_PVOID extension; ... You could create a well known extensions API by defining a structure with several function pointers struct dat_atomic_extensions { DAT_RETURN (*cmp_and_swap_func)(IN DAT_EP_HANDLE ep_handle, IN DAT_UINT64 cmp_value, IN DAT_UINT64 swap_value, IN DAT_LMR_TRIPLE *local_iov, IN DAT_DTO_COOKIE user_cookie, IN DAT_RMR_TRIPLE *remote_iov, IN DAT_COMPLETION_FLAGS completion_flags); ... } and require the dat_provider's extensions member to point to your new extension struct. To make the API easier to use, you could also create macros, similar to the standard DAT macros, to reach inside an objects provider structure and call the correct extension function. #define dat_ep_post_cmp_and_swap(ep, cmp, swap, local_iov, cookie, remote_iov, flags) \ (*DAT_HANDLE_TO_EXTENSION (ep)->cmp_and_swap_func) (\ (ep), \ (cmp), \ (swap), \ (local_iov), \ (cookie), \ (remote_iov), \ (flags)) A drawback to this approach is that adding new extensions requires synchronizing with the original extension specification document. To eliminate that issue, you could require that the dat_provider's extension member point to a typed list of these sorts of extension structures. arlin> What is your opinion on the way I extended event data, dapl arlin> event processing, event types, and cookies? These looked good to me. I also agree with Arkady that these functions are of such general utility that they should be considered for inclusion into the DAT specification. From halr at voltaire.com Tue Dec 27 10:24:54 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Dec 2005 13:24:54 -0500 Subject: [openib-general] OpenIB Design Document In-Reply-To: <309a667c0512260432w20ae94f4u73abdd95959b95f4@mail.gmail.com> References: <309a667c0512260432w20ae94f4u73abdd95959b95f4@mail.gmail.com> Message-ID: <1135707839.4328.159780.camel@hal.voltaire.com> On Mon, 2005-12-26 at 07:32, Devesh Sharma wrote: > Hi all, > > I have a query regarding the design specification currently openib > fourm is following is it the document which is available on > /trunk/contrib/sourceforge/Documentation/LinuxSAS.1.0.1.pdf This document pertains to the Intel gen1 implementation (IBAL based). > OR > > trunk/contrib/voltaire/openib-access.pdf This document was a proposal for OpenIB (gen2) architecture. The gen2 architecture roughly follows this but there are numerous changes from this. > Or there is no difference between the two file? 
The document on OpenIB (gen2) architecture can be found in: https://openib.org/svn/gen2/trunk/arch/ It is a high level document and also needs some updating. -- Hal > please help me out on this. > > Devesh > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
From halr at voltaire.com Tue Dec 27 12:01:29 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Dec 2005 15:01:29 -0500 Subject: [openib-general] Re: [PATCH] osm: support arbitrary ibumad installation path In-Reply-To: <86zmmm8s6e.fsf@mtl066.yok.mtl.com> References: <86zmmm8s6e.fsf@mtl066.yok.mtl.com> Message-ID: <1135713688.4328.160816.camel@hal.voltaire.com> On Tue, 2005-12-27 at 10:40, Eitan Zahavi wrote: > Hi Hal > > Current build for OpenSM assumed ibumad is located always under > /usr/local . Some installations require more flexibility. > The patch below allows the user to specify either: > --with-umad-prefix (then we use $with_umad_prefix/include and lib) > or > --with-umad-libs and --with-umad-includes > > This provides the greatest flexibility. Thanks. Applied.
From surs at cse.ohio-state.edu Tue Dec 27 15:52:33 2005 From: surs at cse.ohio-state.edu (Sayantan Sur) Date: Tue, 27 Dec 2005 18:52:33 -0500 Subject: [openib-general] error modifying QP capabilities Message-ID: <20051227235230.GA8353@cse.ohio-state.edu> Hi, I am trying to modify the QP capabilities (after the QP has been created and transitioned to IBV_QPS_RTS successfully). I am getting a return error value of 22. Are there any other conditions I need to take care of before modifying the QP? I am using Arbel (MemFree) cards with firmware version 5.1.0. The OpenIB svn revision number is 4594 (on linux 2.6.14) <--- Snippet ---> struct ibv_qp_attr attr; int ret = 0; memset(&attr, 0, sizeof(struct ibv_qp_attr)); attr.cap.max_send_wr = 128; /* initial value was 32 */ attr.cap.max_recv_wr = 0; attr.cap.max_send_sge = 1; attr.cap.max_recv_sge = 1; attr.cap.max_inline_data = 128; if((ret = ibv_modify_qp(c->qp_hndl, &attr, IBV_QP_CAP)) != 0) { error_abort_all(GEN_EXIT_ERR, "Couldn't modify QP size, ret %d", ret) } <--- /Snippet ---> TIA, Sayantan. -- http://www.cse.ohio-state.edu/~surs
From johann at pathscale.com Tue Dec 27 18:02:55 2005 From: johann at pathscale.com (Johann George) Date: Tue, 27 Dec 2005 18:02:55 -0800 Subject: [openib-general] PathScale license In-Reply-To: <1135363454.4328.95007.camel@hal.voltaire.com> References: <1135363454.4328.95007.camel@hal.voltaire.com> Message-ID: <20051228020255.GA3280@cuprite.internal.keyresearch.com> We have heard the issues that have been raised regarding the PathScale license. PathScale's intent is solely to protect its hardware IP and not to limit use of the software in any way. PathScale's use of this language is not original. SGI has used, and perhaps originated, the additional language. It currently appears in several files in the Linux kernel. As an example, see fs/xfs/linux-2.6/kmem.c At the bottom of their license, they provide a URL which no longer exists. Going through the web archives, it appears that the following is a facsimile: http://web.archive.org/web/20040311225417/oss.sgi.com/projects/GenInfo/NoticeExplan/ In it they claim that the additional language has been reviewed by the Free Software Foundation which concluded there was no conflict between it and the GPL and LGPL.
PathScale's intent is to comply with the licenses that OpenIB allows for contributions. Nevertheless, in our ongoing efforts to be as friendly to the Open Source community as possible, we will review our language as soon as people return in the new year. We are fully supportive of OpenIB and do not want to see it hindered in any way. Johann
From dotanb at mellanox.co.il Tue Dec 27 22:25:49 2005 From: dotanb at mellanox.co.il (Dotan Barak) Date: Wed, 28 Dec 2005 08:25:49 +0200 Subject: [openib-general] error modifying QP capabilities Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3FA2CDC@mtlexch01.mtl.com> > I am trying to modify the QP capabilities (after the QP has > been created > and transitioned to IBV_QPS_RTS successfully). I am getting a return > error value of 22. Are there any other conditions I need to > take care of > before modifying the QP? I am using Arbel (MemFree) cards > with firmware > version 5.1.0. The OpenIB svn revision number is 4594 (on > linux 2.6.14) I looked at the code of the mthca driver and it doesn't support resize of the QP. Dotan
From yipeeyipeeyipeeyipee at yahoo.com Wed Dec 28 06:16:49 2005 From: yipeeyipeeyipeeyipee at yahoo.com (yipee) Date: Wed, 28 Dec 2005 14:16:49 +0000 (UTC) Subject: [openib-general] understanding mthca_alloc_db() Message-ID: Hi, I'm trying to understand what's going on inside mthca_alloc_db() (in libmthca/src/memfree.c). I understand that this function manages a bitmap-based freelist of doorbells for send, receive & completion queues. Is there a possibility of a memory leak when posix_memalign() steps on a previously allocated 'db_tab->page[i].db_rec' (when all of 'db_tab->page[i].free[j]' is cleared)? Isn't 'i' already out of bounds? The function defines two types/groups of doorbells. Why are these doorbells allocated differently (one group starts at the beginning of the array and the other at the end)? Another thing I noticed is that doorbells are different between Tavor and Arbel HCAs (e.g., see update_cons_index()). Is it correct that Arbel doorbells are only 32 bits wide? thanks, x
From dotanb at mellanox.co.il Wed Dec 28 06:52:33 2005 From: dotanb at mellanox.co.il (Dotan Barak) Date: Wed, 28 Dec 2005 16:52:33 +0200 Subject: [openib-general] srq_pingpong with many QPs and events may never end Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3FA2E7B@mtlexch01.mtl.com> Here is a patch to solve this problem: Empty the CQ on each event (there can be more than 1 QP connected to a CQ in this test) before arming it. Make sure there is a receive work request for each QP.
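In outline, the event handling this moves the test toward looks like the simplified sketch below (not part of the patch itself; process_completions() is a hypothetical stand-in for the test's switch on wr_id):

/* Re-arm the CQ first, then drain it completely: one completion
 * event may cover work completions from several QPs sharing the CQ. */
struct ibv_cq *ev_cq;
void *ev_ctx;
int ne;

if (ibv_get_cq_event(channel, &ev_cq, &ev_ctx))
	return 1;
if (ibv_req_notify_cq(ev_cq, 0))
	return 1;
while ((ne = ibv_poll_cq(ev_cq, num_of_wc, wc_arr)) > 0)
	process_completions(wc_arr, ne);
if (ne < 0)
	return 1;	/* poll failed */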
Signed-off-by: Dotan Barak Index: last_stable/src/userspace/libibverbs/examples/srq_pingpong.c =================================================================== --- last_stable.orig/src/userspace/libibverbs/examples/srq_pingpong.c +++ last_stable/src/userspace/libibverbs/examples/srq_pingpong.c @@ -514,9 +514,11 @@ int main(int argc, char *argv[]) struct pingpong_context *ctx; struct pingpong_dest my_dest[MAX_QP]; struct pingpong_dest *rem_dest; + struct ibv_wc *wc_arr; struct timeval start, end; char *ib_devname = NULL; char *servername = NULL; + int num_of_wc; int port = 18515; int ib_port = 1; int size = 4096; @@ -603,6 +605,13 @@ int main(int argc, char *argv[]) return 1; } + num_of_wc = num_qp + rx_depth; + wc_arr = malloc(num_of_wc * sizeof *wc_arr); + if (!wc_arr) { + fprintf(stderr, "Failed to allocate memory\n"); + return 1; + } + page_size = sysconf(_SC_PAGESIZE); dev_list = ibv_get_device_list(NULL); @@ -714,11 +723,10 @@ int main(int argc, char *argv[]) } { - struct ibv_wc wc[2]; int ne, qp_ind; do { - ne = ibv_poll_cq(ctx->cq, 2, wc); + ne = ibv_poll_cq(ctx->cq, num_of_wc, wc_arr); } while (!use_event && ne < 1); if (ne < 0) { @@ -727,26 +735,26 @@ int main(int argc, char *argv[]) } for (i = 0; i < ne; ++i) { - if (wc[i].status != IBV_WC_SUCCESS) { + if (wc_arr[i].status != IBV_WC_SUCCESS) { fprintf(stderr, "Failed status %d for wr_id %d\n", - wc[i].status, (int) wc[i].wr_id); + wc_arr[i].status, (int) wc_arr[i].wr_id); return 1; } - qp_ind = find_qp(wc[i].qp_num, ctx, num_qp); + qp_ind = find_qp(wc_arr[i].qp_num, ctx, num_qp); if (qp_ind < 0) { fprintf(stderr, "Couldn't find QPN %06x\n", - wc[i].qp_num); + wc_arr[i].qp_num); return 1; } - switch ((int) wc[i].wr_id) { + switch ((int) wc_arr[i].wr_id) { case PINGPONG_SEND_WRID: ++scnt; break; case PINGPONG_RECV_WRID: - if (--routs <= 1) { + if (--routs <= num_qp) { routs += pp_post_recv(ctx, ctx->rx_depth - routs); if (routs < ctx->rx_depth) { fprintf(stderr, @@ -761,11 +769,11 @@ int main(int argc, char *argv[]) default: fprintf(stderr, "Completion for unknown wr_id %d\n", - (int) wc[i].wr_id); + (int) wc_arr[i].wr_id); return 1; } - ctx->pending[qp_ind] &= ~(int) wc[i].wr_id; + ctx->pending[qp_ind] &= ~(int) wc_arr[i].wr_id; if (scnt < iters && !ctx->pending[qp_ind]) { if (pp_post_send(ctx, qp_ind)) { fprintf(stderr, "Couldn't post send\n"); Dotan Barak Software Verification Engineer Mellanox Technologies LTD Tel: +972-4-9097200 Ext: 231 Fax: +972-4-9593245 P.O. Box 86 Yokneam 20692 ISRAEL. Home: +972-77-8841095 Cell: 052-4222383 [ May the fork be with you ] -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Wed Dec 28 10:46:07 2005 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 28 Dec 2005 10:46:07 -0800 Subject: [openib-general] error modifying QP capabilities References: <20051227235230.GA8353@cse.ohio-state.edu> Message-ID: Sayantan> Hi, I am trying to modify the QP capabilities (after the Sayantan> QP has been created and transitioned to IBV_QPS_RTS Sayantan> successfully). I am getting an return error value of Sayantan> 22. Is there any other conditions I need to take care of Sayantan> before modifying the QP? I am using Arbel (MemFree) Sayantan> cards with firmware version 5.1.0. The OpenIB svn Sayantan> revision number is 4594 (on linux 2.6.14) The current mthca driver does not support resizing QPs unfortunately. Making this work would be quite a bit of work... - R. 
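For what it's worth, one workaround is to size the send queue at creation time instead, clamped against the device limit reported by ibv_query_device(). A minimal sketch (context, pd and cq are assumed to exist already):

struct ibv_device_attr dev_attr;
struct ibv_qp_init_attr init_attr;

if (ibv_query_device(context, &dev_attr))
	return NULL;
memset(&init_attr, 0, sizeof init_attr);
init_attr.send_cq = cq;
init_attr.recv_cq = cq;
init_attr.qp_type = IBV_QPT_RC;
/* ask for the worst-case depth up front rather than resizing later */
init_attr.cap.max_send_wr = dev_attr.max_qp_wr < 128 ?
			    dev_attr.max_qp_wr : 128;
init_attr.cap.max_recv_wr = 1;
init_attr.cap.max_send_sge = 1;
init_attr.cap.max_recv_sge = 1;
return ibv_create_qp(pd, &init_attr);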
From bboas at llnl.gov Sat Dec 24 11:00:42 2005 From: bboas at llnl.gov (Bill Boas) Date: Sat, 24 Dec 2005 11:00:42 -0800 Subject: [openib-general] RE: Technical content of Sonoma Workshop Feb 5-8 In-Reply-To: <1AC79F16F5C5284499BB9591B33D6F0006728A1D@orsmsx408> References: <1AC79F16F5C5284499BB9591B33D6F0006728A1D@orsmsx408> Message-ID: <6.2.3.4.2.20051224090550.02005458@mail-lc.llnl.gov> Woody, It'll be a longshot for the Pats to get to the Superbowl this year, I think. But I hope! Your list is a great start, but isn't each item you mention in the context of Release 1.0? From the Labs and Wall Street perspectives, the preference is to "tie a ribbon around" Rel 1.0 (both Windows and Linux) as soon as we can, and go to the next stage of the evolution of the stack. So that means making the definition of the content of Rel 2.0 the main technical focus of the workshop. OpenIB PathForward Phase 2, iWARP integration, QOS, improved OpenSM, and more ...... Perhaps Matt, Steve Poole, Hal, Roland, HB, Peter, and others will join in this discussion and express different opinions..... Bill. At 02:56 PM 12/23/2005, you wrote: >I'll give it some thought and try to start a discussion on >the list. Some ideas for a technical track that come to mind are: > >RDS - perhaps we could get someone from Oracle and Silverstorm >to present something on this. There has been some discussion on >the list, but not sure we have everyone aligned on what needs to >be done for this. > >Core S/W update. where we are and where we are going moving forward. > >Generic RDMA support, what is there, what needs to be done. > >iSer update. > >SDP update, what needs to be done before it is ready to be pushed >upstream. > >OpenMPI update > >OpenSM and diags update > >Linux distributor update, RedHat, Suse, ... > >New H/W support, Pathscale, IBM ? > >Why the Patriots didn't win another superbowl, can we give someone else >a turn please... > >Were there any specific topics that the DOE folks would like to >hear on the technical side ? > >I'll be OOP on vacation next week, but will probably >be checking email and perhaps we can start a discussion on the list. > >woody > >-----Original Message----- >From: Bill Boas [mailto:bboas at llnl.gov] >Sent: Friday, December 23, 2005 12:02 PM >To: Woodruff, Robert J >Subject: RE: [openib-general] Please register for Sonoma Workshop > >No agenda yet, and definitely need help......I was planning to send >out ideas... maybe you could start that process, please. > >At 01:42 PM 12/22/2005, you wrote: > > > >Hi Bill, > > > >Do you have a proposed agenda for this yet > >or need any help in putting one together. > > > >Trying to determine who from my team should attend. > > > >woody > >Bill Boas bboas at llnl.gov >ICCD LLNL, B-453, R-2018 Wk: 925-422-4110 >7000 East Ave, L-555 Cell: 925-337-2224 >Livermore, CA 94551 Pgr: 877-203-2248 Bill Boas bboas at llnl.gov ICCD LLNL, B-453, R-2018 Wk: 925-422-4110 7000 East Ave, L-555 Cell: 925-337-2224 Livermore, CA 94551 Pgr: 877-203-2248
From head.bubba at csfb.com Tue Dec 27 04:57:28 2005 From: head.bubba at csfb.com (Head Bubba) Date: Tue, 27 Dec 2005 12:57:28 -0000 Subject: [openib-general] RE: Technical content of Sonoma Workshop Feb 5-8 Message-ID: Windows and Linux were required yesterday...
Since we will have both Mellanox and PathScale for a roundtable session, we should also add any enhancements we need to future HCA and the firmware as well to the discussion - from a little SDP Proof of Concept we did at CSFB, we ended up needing a firmware upgrade As for SDP... At the roundtable we did not go into the gory details of the SDP Proof Of Concept we did at CSFB with Mellanox in which we could dynamically change virtual lane being used, so I think at this we should get Mellanox to go over the details with us to get this something done with SDP in OpenIB to have the real implementation it needs (for those not having the details, contact Nimrod). A better implementation of SDP is needed. This is a good first step to get off of TCP/IP without code changes, but this has also been problematic in our experience. Additionally, it needs to be code better to deliver at near native performance /// we use SDP to eliminate TCP/IP issues, so IPoIB is not viable for us As for RDS, we should all see who has it aside from Bubba which everyone knows about, and whether or not we can get an end-user experience discussed We also would like to virtualize everything... the server, the desktop, the fabric, the storage, etc... to create a Virtual Resource Market (VRM) -----Original Message----- From: Bill Boas [mailto:bboas at llnl.gov] Sent: Saturday, December 24, 2005 2:01 PM To: Woodruff, Robert J Cc: Matt Leininger; Steve Poole; Hal Rosenstock; Roland Dreier; Head Bubba; Peter Krey; openib-promoters at openib.org; openib-windows at openib.org; openib-general at openib.org Subject: RE: Technical content of Sonoma Workshop Feb 5-8 Woody, It'll be a longshot for the Pats to get to theSuperbowl this year, I think. But I hope! Your list is a great start, but isn't each item you mention in the context of Release 1.0? From the Labs and Wall Street perspectives, the preference is to"tie a ribbon around" Rel 1.0 (both Windows and Linux) as soon as we can, and go to the next stage of the evolution of the stack. So that means making the definition of the content of Rel 2.0 the main technical focus of the workshop. OpenIB PathForward Phase 2, iWARP integration, QOS, improved OpenSM, and more ...... Perhaps Matt, Steve Poole, Hal, Roland, HB, Peter,and others will join in this discussion and express different opinions..... Bill. At 02:56 PM 12/23/2005, you wrote: >I'll give it some thought and try to start a discussion on >the list. Some ideas for a technical track that come to mind are: > >RDS - perhaps we could get someone from Oracle and Silverstorm >to present something on this. There has been some discussion on >the list, but not sure we have everyone aligned on what needs to >be done for this. > >Core S/W update. where we are and where we are going moving forward. > >Generic RDMA support, what is there, what needs to be done. > >iSer update. > >SDP update, what needs to be done before it is ready to be pushed >upstream. > >OpenMPI update > >OpenSM and diags update > >Linux distributor update, RedHat, Suse, ... > >New H/W support, Pathscale, IBM ? > >Why the Patriots didn't win another superbowl, can we give someone else >a turn please... > >Were there any specific topics that the DOE folks would like to >hear on the technical side ? > >I'll be OOP on vacation next week, but will probably >be checking email and perhaps we can start a discussion on the list. 
> >woody > >-----Original Message----- >From: Bill Boas [mailto:bboas at llnl.gov] >Sent: Friday, December 23, 2005 12:02 PM >To: Woodruff, Robert J >Subject: RE: [openib-general] Please register for Sonoma Workshop > >No agenda yet, and definitely need help......I was planning to send >out ideas... maybe you could start that process, please. > >At 01:42 PM 12/22/2005, you wrote: > > > >Hi Bill, > > > >Do you have a proposed agenda for this yet > >or need any help in putting one together. > > > >Trying to determine who from my team should attend. > > > >woody > >Bill Boas bboas at llnl.gov >ICCD LLNL, B-453, R-2018 Wk: 925-422-4110 >7000 East Ave, L-555 Cell: 925-337-2224 >Livermore, CA 94551 Pgr: 877-203-2248 Bill Boas bboas at llnl.gov ICCD LLNL, B-453, R-2018 Wk: 925-422-4110 7000 East Ave, L-555 Cell: 925-337-2224 Livermore, CA 94551 Pgr: 877-203-2248 ============================================================================== Please access the attached hyperlink for an important electronic communications disclaimer: http://www.csfb.com/legal_terms/disclaimer_external_email.shtml ============================================================================== From robert.j.woodruff at intel.com Wed Dec 28 10:09:48 2005 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Wed, 28 Dec 2005 10:09:48 -0800 Subject: [openib-general] RE: Technical content of Sonoma Workshop Feb 5-8 In-Reply-To: <6.2.3.4.2.20051224090550.02005458@mail-lc.llnl.gov> Message-ID: Bill Boas wrote, >Woody, >It'll be a longshot for the Pats to get to theSuperbowl this year, I >think. But I hope! I have not watched too much football this year, but I did see the Pats play the other day, and they still look pretty darn good, so I would not count them out yet. >Your list is a great start, but isn't each item you mention in the >context of Release 1.0? I was thinking more about what is coming in the future (for 2.0) for things like the core, opensm, RDMA support etc. >From the Labs and Wall Street perspectives, the preference is to"tie >a ribbon around" Rel 1.0 (both Windows and Linux) as soon as we can, >and go to the next stage of the evolution of the stack. I agree. >So that means making the definition of the content of Rel 2.0 the >main technical focus of the workshop. Agreed. Let's focus on what needs to be done going forward and not too much on what has been done, other than what, if anything, needs to be finished up for a release 1.0. >OpenIB PathForward Phase 2, iWARP integration, QOS, improved OpenSM, >and more ...... woody From robert.j.woodruff at intel.com Wed Dec 28 10:25:12 2005 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Wed, 28 Dec 2005 10:25:12 -0800 Subject: [openib-general] RE: Technical content of Sonoma Workshop Feb 5-8 In-Reply-To: Message-ID: Head Bubba wrote, >Since we will have both Mellanox and PathScale for a roundtable session, we >should also add any enhancements we need to future HCA and the firmware as well >to the discussion - from a little SDP Proof of Concept we did at CSFB, we ended >up needing a firmware upgrade Yes. updates from the hardware vendors and feedback from the user community on what is needed in the hardware/firmware would be great. >As for SDP... 
>At the roundtable we did not go into the gory details of the SDP Proof Of >Concept we did at CSFB with Mellanox in which we could dynamically change >virtual lane being used, so I think at this we should get Mellanox to go >over the details with us to get this something done with SDP in OpenIB to have >the real implementation it needs (for those not having the details, contact >Nimrod).A better implementation of SDP is needed. This is a good first step to >get off of TCP/IP without code changes, but this has >also been problematic in our experience. Additionally, it needs to be code better to deliver at near native performance /// >ie use SDP to eliminate TCP/IP issues, so IPoIB is not viable for us I think that the SDP discussion can be split into 2 areas, what is needed short term to get the code into shape for submission upstream, and second, what features are needed in a future 2.0 release. Feedback from both the maintainers and the user community would be helpful here. >As for RDS, we should all see who has it aside from Bubba which everyone knows >about, and whether or not we can get an end-user experience discussed I think the folks from Oracle and Silverstorm have the most to say about RDS, so perhaps they could present a session on this. Richard Frank/Ranjit Pandit ? >We also would like to virtualize everything... the server, the desktop, the >fabric, the storage, etc... to create a Virtual Resource Market (VRM) Good one. We should have a session on virtualization, what has been done so far and what is needed. I think that Mellanox has already done some work on this. Perhaps they could lead a session on this ? woody
From rdreier at cisco.com Wed Dec 28 10:54:39 2005 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 28 Dec 2005 10:54:39 -0800 Subject: [openib-general] understanding mthca_alloc_db() In-Reply-To: (yipee's message of "Wed, 28 Dec 2005 14:16:49 +0000 (UTC)") References: Message-ID: yipee> Hi, I'm trying to understand what's going on inside yipee> mthca_alloc_db() (in libmthca/ src/memfree.c). I yipee> understand that this function manages a bitmap-based yipee> freelist of doorbells for send, receive & completion yipee> queues. Is there a possibility of a memory leak when yipee> posix_memalign() steps on a previously allocated yipee> 'db_tab->page[i].db_rec' (when all of 'db_tab->page[i]. yipee> free[j]' is cleared)? isn't 'i' already out of bounds? I don't think so. i is being set by the loop for (i = start; i != end; i += dir) so if we fall through the end of loop, then i == end, which is where we want to allocate a new doorbell page. yipee> The functions defines two types/groups of doorbells. Why yipee> are these doorbells allocated differently (one group starts yipee> at the begining of the array and the other at the end)? This is the way the hardware works. yipee> Another thing I noticed is that doorbells are different yipee> between Tavor and Arbel HCA's (e.g. see yipee> update_cons_index(). Is it correct that Arbel doorbells are yipee> only 32 bits wide? Sort of. It is definitely true that Tavor-mode doorbells work differently from Arbel/mem-free-mode doorbells. - R.
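For readers of the thread, the shape of that scan (paraphrased from memory, simplified, and with invented helper names; not the literal libmthca source) is roughly:

/* Group 0 doorbells are carved from the front of the page table and
 * group 1 from the back, mirroring the hardware layout, hence the
 * flipped scan direction. */
if (group == 0) {
	start = 0;
	end = max_group1;
	dir = 1;
} else {
	start = npages - 1;
	end = min_group2;
	dir = -1;
}
for (i = start; i != end; i += dir)
	if (page_has_free_db(db_tab, i))
		return alloc_db_from_page(db_tab, i, group);
/* fall-through: i == end, so a fresh doorbell page is allocated here */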
From mst at mellanox.co.il Wed Dec 28 12:13:29 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 28 Dec 2005 22:13:29 +0200 Subject: [openib-general] Re: error modifying QP capabilities In-Reply-To: References: Message-ID: <20051228201328.GA2720@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: error modifying QP capabilities > > Sayantan> Hi, I am trying to modify the QP capabilities (after the > Sayantan> QP has been created and transitioned to IBV_QPS_RTS > Sayantan> successfully). I am getting a return error value of > Sayantan> 22. Are there any other conditions I need to take care of > Sayantan> before modifying the QP? I am using Arbel (MemFree) > Sayantan> cards with firmware version 5.1.0. The OpenIB svn > Sayantan> revision number is 4594 (on linux 2.6.14) > > The current mthca driver does not support resizing QPs unfortunately. > Making this work would be quite a bit of work... > > - R. Maybe we should be returning EOPNOTSUPP rather than EINVAL then. Makes sense? -- MST
From rdreier at cisco.com Wed Dec 28 12:15:38 2005 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 28 Dec 2005 12:15:38 -0800 Subject: [openib-general] Re: error modifying QP capabilities In-Reply-To: <20051228201328.GA2720@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 28 Dec 2005 22:13:29 +0200") References: <20051228201328.GA2720@mellanox.co.il> Message-ID: Michael> Maybe we should be returning EOPNOTSUPP rather than Michael> EINVAL then. Makes sense? I guess so, but I'm not sure it's really worth worrying about. - R.
From rdreier at cisco.com Wed Dec 28 12:23:51 2005 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 28 Dec 2005 12:23:51 -0800 Subject: [openib-general] Re: checkstack warnings In-Reply-To: <20051227075540.GZ4907@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 27 Dec 2005 09:55:40 +0200") References: <20051227075540.GZ4907@mellanox.co.il> Message-ID: Michael> Hi! Running make checkstack on the openib tree generates Michael> quite a long list of stack hogs (below). Do we want to Michael> clean the code with respect to these warnings? It wouldn't hurt to try and improve things, but I don't see anything really scary here... > 0x000007e0 mthca_init_one: 1048 This one seems pretty bad -- I guess it's -funit-at-a-time going berserk. Are you building for x86_64? What compiler version? With $ gcc --version gcc (GCC) 4.0.2 (Debian 4.0.2-2) I get something slightly different: 0x000005a0 mthca_init_hca: 568 is the top offender, and 0x0000e9a0 mthca_process_mad: 168 is the second-biggest function. In any case it might be worth dynamically allocating some of the large structures used in the init path, just to be safe. - R.
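The kind of change being suggested, sketched against the init path (illustrative only; mthca_init_hca()'s actual locals and flow differ):

static int mthca_init_hca(struct mthca_dev *mdev)
{
	struct mthca_init_hca_param *init_hca;
	int err;

	/* keep the large parameter block off the stack */
	init_hca = kmalloc(sizeof *init_hca, GFP_KERNEL);
	if (!init_hca)
		return -ENOMEM;
	memset(init_hca, 0, sizeof *init_hca);

	/* ... fill in *init_hca, issue the commands, set err ... */

	kfree(init_hca);
	return err;
}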
From mst at mellanox.co.il Wed Dec 28 12:41:31 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 28 Dec 2005 22:41:31 +0200 Subject: [openib-general] Re: checkstack warnings In-Reply-To: References: Message-ID: <20051228204131.GB2720@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: checkstack warnings > > Michael> Hi! Running make checkstack on the openib tree generates > Michael> quite a long list of stack hogs (below). Do we want to > Michael> clean the code with respect to these warnings? > > It wouldn't hurt to try and improve things, but I don't see anything > really scary here... > > > 0x000007e0 mthca_init_one: 1048 > > This one seems pretty bad -- I guess it's -funit-at-a-time going > berserk. Are you building for x86_64? What compiler version? > > With > > $ gcc --version > gcc (GCC) 4.0.2 (Debian 4.0.2-2) > > I get something slightly different: > > 0x000005a0 mthca_init_hca: 568 Yes, it's a slightly old gcc (GCC) 3.3.5 20050117 (prerelease) (SUSE Linux) > is the top offender, and > > 0x0000e9a0 mthca_process_mad: 168 > > is the second-biggest function. I guess gcc 4.0.2 is better at stack utilization. > In any case it might be worth dynamically allocating some of the large > structures used in the init path, just to be safe. > > - R.
> -- MST From halr at voltaire.com Wed Dec 28 12:54:07 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Dec 2005 15:54:07 -0500 Subject: [openib-general] [PATCH] OpenSM/osm_helper.c: In osm_dump_smp_dr_path, display DR LIDs only if DR SMP Message-ID: <1135803246.7258.458.camel@hal.voltaire.com> OpenSM/osm_helper.c: In osm_dump_smp_dr_path, display DR LIDs only if DR SMP Signed-off-by: Hal Rosenstock Index: osm_helper.c =================================================================== --- osm_helper.c (revision 4645) +++ osm_helper.c (working copy) @@ -1457,9 +1457,7 @@ osm_dump_dr_smp( "\t\t\t\tattr_id.................0x%X (%s)\n" "\t\t\t\tresv....................0x%X\n" "\t\t\t\tattr_mod................0x%X\n" - "\t\t\t\tm_key...................0x%016" PRIx64 "\n" - "\t\t\t\tdr_slid.................0x%X\n" - "\t\t\t\tdr_dlid.................0x%X\n", + "\t\t\t\tm_key...................0x%016" PRIx64 "\n", p_smp->hop_ptr, p_smp->hop_count, cl_ntoh64(p_smp->trans_id), @@ -1467,14 +1465,20 @@ osm_dump_dr_smp( ib_get_sm_attr_str( p_smp->attr_id ), cl_ntoh16(p_smp->resv), cl_ntoh32(p_smp->attr_mod), - cl_ntoh64(p_smp->m_key), - cl_ntoh16(p_smp->dr_slid), - cl_ntoh16(p_smp->dr_dlid) + cl_ntoh64(p_smp->m_key) ); strcat( buf, line ); if (p_smp->mgmt_class == IB_MCLASS_SUBN_DIR) { + sprintf( line, + "\t\t\t\tdr_slid.................0x%X\n" + "\t\t\t\tdr_dlid.................0x%X\n", + cl_ntoh16(p_smp->dr_slid), + cl_ntoh16(p_smp->dr_dlid) + ); + strcat( buf, line ); + strcat( buf, "\n\t\t\t\tInitial path: " ); for( i = 0; i <= p_smp->hop_count; i++ ) @@ -1652,7 +1656,7 @@ osm_dump_smp_dr_path( if( osm_log_is_active( p_log, log_level) ) { - sprintf( buf, "Received a SMP on a %u hop path:" + sprintf( buf, "Received SMP on a %u hop path:" "\n\t\t\t\tInitial path = ", p_smp->hop_count ); for( i = 0; i <= p_smp->hop_count; i++ ) From sean.hefty at intel.com Wed Dec 28 14:13:46 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 28 Dec 2005 14:13:46 -0800 Subject: [openib-general] [PATCH] [CMA] support loopback addresses Message-ID: The following patch adds support to the CMA for loopback IP addresses. By default, loopback addresses map to the first port on the first HCA in the CMA's device list. (At some point, quality of service values could be used to select a more appropriate device.) A minor update was also made to how sockaddr values are copied to account for the different sizes between IPv4 and IPv6. Note that this approach differs from the patch previously submitted by Michael. The reasons for the differences are to allow loopback on non-IB RDMA devices and eventual support for QoS. Signed-off-by: Sean Hefty Index: include/rdma/ib_addr.h =================================================================== --- include/rdma/ib_addr.h (revision 4356) +++ include/rdma/ib_addr.h (working copy) @@ -30,6 +30,8 @@ #if !defined(IB_ADDR_H) #define IB_ADDR_H +#include +#include #include #include @@ -68,5 +70,11 @@ int ib_resolve_addr(struct sockaddr *src void ib_addr_cancel(struct ib_addr *addr); +static inline int ip_addr_size(struct sockaddr *addr) +{ + return addr->sa_family == AF_INET6 ? + sizeof(struct sockaddr_in6) : sizeof(struct sockaddr_in); +} + #endif /* IB_ADDR_H */ Index: core/addr.c =================================================================== --- core/addr.c (revision 4356) +++ core/addr.c (working copy) @@ -27,8 +27,6 @@ * notice, one of the license notices in the documentation * and/or other materials provided with the distribution. 
*/ -#include -#include #include #include #include @@ -258,8 +256,8 @@ int ib_resolve_addr(struct sockaddr *src memset(req, 0, sizeof *req); if (src_addr) - req->src_addr = *src_addr; - req->dst_addr = *dst_addr; + memcpy(&req->src_addr, src_addr, ip_addr_size(src_addr)); + memcpy(&req->dst_addr, dst_addr, ip_addr_size(dst_addr)); req->addr = addr; req->callback = callback; req->context = context; Index: core/cma.c =================================================================== --- core/cma.c (revision 4404) +++ core/cma.c (working copy) @@ -55,6 +55,7 @@ static struct ib_client cma_client = { static LIST_HEAD(dev_list); static LIST_HEAD(listen_any_list); static DECLARE_MUTEX(mutex); +static struct workqueue_struct *wq; struct cma_device { struct list_head list; @@ -421,6 +422,12 @@ static inline int cma_any_addr(struct so } } +static inline int cma_loopback_addr(struct sockaddr *addr) +{ + return ((struct sockaddr_in *) addr)->sin_addr.s_addr == + ntohl(INADDR_LOOPBACK); +} + static int cma_get_net_info(void *hdr, enum rdma_port_space ps, u8 *ip_ver, __u16 *port, union cma_ip_addr **src, union cma_ip_addr **dst) @@ -1070,6 +1077,35 @@ err: } EXPORT_SYMBOL(rdma_resolve_route); +static int cma_bind_loopback(struct rdma_id_private *id_priv) +{ + struct cma_device *cma_dev; + int ret; + + down(&mutex); + if (list_empty(&dev_list)) { + ret = -ENODEV; + goto out; + } + + cma_dev = list_entry(dev_list.next, struct cma_device, list); + ret = ib_get_cached_gid(cma_dev->device, 1, 0, + &id_priv->id.route.addr.addr.ibaddr.sgid); + if (ret) + goto out; + + ret = ib_get_cached_pkey(cma_dev->device, 1, 0, + &id_priv->id.route.addr.addr.ibaddr.pkey); + if (ret) + goto out; + + id_priv->id.port_num = 1; + cma_attach_to_dev(id_priv, cma_dev); +out: + up(&mutex); + return ret; +} + static void addr_handler(int status, struct sockaddr *src_addr, struct ib_addr *ibaddr, void *context) { @@ -1092,7 +1128,8 @@ static void addr_handler(int status, str } else { if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_RESOLVED)) goto out; - id_priv->id.route.addr.src_addr = *src_addr; + memcpy(&id_priv->id.route.addr.src_addr, src_addr, + ip_addr_size(src_addr)); event = RDMA_CM_EVENT_ADDR_RESOLVED; } @@ -1108,6 +1145,57 @@ out: cma_deref_id(id_priv); } +static void loopback_addr_handler(void *data) +{ + struct rdma_id_private *id_priv = data; + + atomic_inc(&id_priv->dev_remove); + + if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_RESOLVED)) + goto out; + + if (cma_notify_user(id_priv, RDMA_CM_EVENT_ADDR_RESOLVED, 0, NULL, 0)) { + cma_exch(id_priv, CMA_DESTROYING); + cma_release_remove(id_priv); + cma_deref_id(id_priv); + rdma_destroy_id(&id_priv->id); + return; + } +out: + cma_release_remove(id_priv); + cma_deref_id(id_priv); +} + +static int cma_resolve_loopback(struct rdma_id_private *id_priv, + struct sockaddr *src_addr, enum cma_state state) +{ + struct work_struct *work; + int ret; + + work = kmalloc(sizeof *work, GFP_KERNEL); + if (!work) + return -ENOMEM; + + if (state == CMA_IDLE) { + ret = cma_bind_loopback(id_priv); + if (ret) + goto err; + id_priv->id.route.addr.addr.ibaddr.dgid = + id_priv->id.route.addr.addr.ibaddr.sgid; + if (!src_addr || cma_any_addr(src_addr)) + src_addr = &id_priv->id.route.addr.dst_addr; + memcpy(&id_priv->id.route.addr.src_addr, src_addr, + ip_addr_size(src_addr)); + } + + INIT_WORK(work, loopback_addr_handler, id_priv); + queue_work(wq, work); + return 0; +err: + kfree(work); + return ret; +} + int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr, 
struct sockaddr *dst_addr, int timeout_ms) { @@ -1126,9 +1214,13 @@ int rdma_resolve_addr(struct rdma_cm_id return -EINVAL; atomic_inc(&id_priv->refcount); - id->route.addr.dst_addr = *dst_addr; - ret = ib_resolve_addr(src_addr, dst_addr, &id->route.addr.addr.ibaddr, - timeout_ms, addr_handler, id_priv); + memcpy(&id->route.addr.dst_addr, dst_addr, ip_addr_size(dst_addr)); + if (cma_loopback_addr(dst_addr)) + ret = cma_resolve_loopback(id_priv, src_addr, expected_state); + else + ret = ib_resolve_addr(src_addr, dst_addr, + &id->route.addr.addr.ibaddr, + timeout_ms, addr_handler, id_priv); if (ret) goto err; @@ -1143,7 +1235,7 @@ EXPORT_SYMBOL(rdma_resolve_addr); int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr) { struct rdma_id_private *id_priv; - struct ib_addr *ibaddr = &id->route.addr.addr.ibaddr; + struct ib_addr *ibaddr; int ret; if (addr->sa_family != AF_INET) @@ -1154,9 +1246,11 @@ int rdma_bind_addr(struct rdma_cm_id *id return -EINVAL; if (cma_any_addr(addr)) { - id->route.addr.src_addr = *addr; ret = 0; + } else if (cma_loopback_addr(addr)) { + ret = cma_bind_loopback(id_priv); } else { + ibaddr = &id->route.addr.addr.ibaddr; ret = ib_translate_addr(addr, &ibaddr->sgid, &ibaddr->pkey); if (!ret) ret = cma_acquire_ib_dev(id_priv, &ibaddr->sgid); @@ -1165,7 +1259,7 @@ int rdma_bind_addr(struct rdma_cm_id *id if (ret) goto err; - id->route.addr.src_addr = *addr; + memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr)); return 0; err: cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_IDLE); @@ -1508,12 +1602,17 @@ static void cma_remove_one(struct ib_dev static int cma_init(void) { + wq = create_singlethread_workqueue("rdma_cm"); + if (!wq) + return -ENOMEM; + return ib_register_client(&cma_client); } static void cma_cleanup(void) { ib_unregister_client(&cma_client); + destroy_workqueue(wq); } module_init(cma_init);
From ralphc at pathscale.com Wed Dec 28 14:14:52 2005 From: ralphc at pathscale.com (Ralph Campbell) Date: Wed, 28 Dec 2005 14:14:52 -0800 Subject: [openib-general] minor bug in pingpong programs Message-ID: <1135808092.5081.7.camel@brick.internal.keyresearch.com> The pingpong test programs don't check the return value from ibv_poll_cq() properly. Here is a patch to fix them.
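For readers skimming the diffs below, the reason the check has to move: a negative return from ibv_poll_cq() also satisfies ne < 1, so with events disabled the old check after the loop was unreachable and the busy-poll would spin forever on a persistent error. The fixed shape, with that reasoning as a comment:

do {
	ne = ibv_poll_cq(ctx->cq, 2, wc);
	/* check inside the loop: otherwise ne < 0 keeps satisfying
	 * (!use_event && ne < 1) and the failure is never reported */
	if (ne < 0) {
		fprintf(stderr, "poll CQ failed %d\n", ne);
		return 1;
	}
} while (!use_event && ne < 1);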
Index: rc_pingpong.c =================================================================== --- rc_pingpong.c (revision 4412) +++ rc_pingpong.c (working copy) @@ -644,13 +644,12 @@ do { ne = ibv_poll_cq(ctx->cq, 2, wc); + if (ne < 0) { + fprintf(stderr, "poll CQ failed %d\n", ne); + return 1; + } } while (!use_event && ne < 1); - if (ne < 0) { - fprintf(stderr, "poll CQ failed %d\n", ne); - return 1; - } - for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { fprintf(stderr, "Failed status %d for wr_id %d\n", Index: srq_pingpong.c =================================================================== --- srq_pingpong.c (revision 4412) +++ srq_pingpong.c (working copy) @@ -720,13 +720,12 @@ do { ne = ibv_poll_cq(ctx->cq, 2, wc); + if (ne < 0) { + fprintf(stderr, "poll CQ failed %d\n", ne); + return 1; + } } while (!use_event && ne < 1); - if (ne < 0) { - fprintf(stderr, "poll CQ failed %d\n", ne); - return 1; - } - for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { fprintf(stderr, "Failed status %d for wr_id %d\n", Index: uc_pingpong.c =================================================================== --- uc_pingpong.c (revision 4412) +++ uc_pingpong.c (working copy) @@ -632,13 +632,12 @@ do { ne = ibv_poll_cq(ctx->cq, 2, wc); + if (ne < 0) { + fprintf(stderr, "poll CQ failed %d\n", ne); + return 1; + } } while (!use_event && ne < 1); - if (ne < 0) { - fprintf(stderr, "poll CQ failed %d\n", ne); - return 1; - } - for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { fprintf(stderr, "Failed status %d for wr_id %d\n", Index: ud_pingpong.c =================================================================== --- ud_pingpong.c (revision 4412) +++ ud_pingpong.c (working copy) @@ -640,13 +640,12 @@ do { ne = ibv_poll_cq(ctx->cq, 2, wc); + if (ne < 0) { + fprintf(stderr, "poll CQ failed %d\n", ne); + return 1; + } } while (!use_event && ne < 1); - if (ne < 0) { - fprintf(stderr, "poll CQ failed %d\n", ne); - return 1; - } - for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { fprintf(stderr, "Failed status %d for wr_id %d\n", -- Ralph Campbell From surs at cse.ohio-state.edu Wed Dec 28 14:19:29 2005 From: surs at cse.ohio-state.edu (Sayantan Sur) Date: Wed, 28 Dec 2005 17:19:29 -0500 Subject: [openib-general] Re: error modifying QP capabilities In-Reply-To: References: <20051228201328.GA2720@mellanox.co.il> Message-ID: <20051228221928.GA9271@cse.ohio-state.edu> Dotan, Roland & Michael, * On Dec,5 Roland Dreier wrote : > Michael> Maybe we should be returning EOPNOTSUPP rather than > Michael> EINVAL then. Makes sense? > > I guess so, but I'm not sure it's really worth worrying about. Thanks for your responses. I can understand that resizing QPs is a complex thing to do for the driver. Sayantan. -- http://www.cse.ohio-state.edu/~surs From mst at mellanox.co.il Wed Dec 28 14:31:51 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 29 Dec 2005 00:31:51 +0200 Subject: [openib-general] Re: [PATCH] [CMA] support loopback addresses In-Reply-To: References: Message-ID: <20051228223151.GA3614@mellanox.co.il> Hello, Sean! Thanks for looking at this. Quoting Sean Hefty : > + INIT_WORK(work, loopback_addr_handler, id_priv); > + queue_work(wq, work); > + return 0; > +err: > + kfree(work); > + return ret; > +} I'm not following this: why are we deferring the work? Cant it be done directly? 
> @@ -1508,12 +1602,17 @@ static void cma_remove_one(struct ib_dev
> 
>  static int cma_init(void)
>  {
> +	wq = create_singlethread_workqueue("rdma_cm");
> +	if (!wq)
> +		return -ENOMEM;
> +
>  	return ib_register_client(&cma_client);
>  }
> 
>  static void cma_cleanup(void)
>  {
>  	ib_unregister_client(&cma_client);
> +	destroy_workqueue(wq);
>  }
> 
>  module_init(cma_init);

It would be nice to avoid adding another workqueue.

-- 
MST

From mst at mellanox.co.il Wed Dec 28 14:42:08 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 29 Dec 2005 00:42:08 +0200
Subject: [openib-general] Re: error modifying QP capabilities
In-Reply-To: 
References: 
Message-ID: <20051228224207.GA3851@mellanox.co.il>

Quoting Roland Dreier :
> Subject: Re: error modifying QP capabilities
> 
> Michael> Maybe we should be returning EOPNOTSUPP rather than
> Michael> EINVAL then. Makes sense?
> 
> I guess so, but I'm not sure it's really worth worrying about.

What about moving most parameter checks from mthca to verbs.c?
Then mthca would only have to worry about what is supported.

-- 
MST

From mshefty at ichips.intel.com Wed Dec 28 14:39:37 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 28 Dec 2005 14:39:37 -0800
Subject: [openib-general] Re: [PATCH] [CMA] support loopback addresses
In-Reply-To: <20051228223151.GA3614@mellanox.co.il>
References: <20051228223151.GA3614@mellanox.co.il>
Message-ID: <43B31429.1080408@ichips.intel.com>

Michael S. Tsirkin wrote:
>>+	INIT_WORK(work, loopback_addr_handler, id_priv);
>>+	queue_work(wq, work);
>>+	return 0;
>>+err:
>>+	kfree(work);
>>+	return ret;
>>+}
> 
> I'm not following this: why are we deferring the work?
> Can't it be done directly?

To keep rdma_resolve_addr() generic, it is an asynchronous call. The work
queue is used to callback the user from a separate thread other than the one
that they called down with. The ib_addr module does something similar when
the destination address is actually a local address, deferring the callback to
another thread.

The alternative is to have the API behave one way for destination addresses that
are local, versus those that are remote, but this complicates applications that
are not aware if an address belongs to the local or a remote system.

- Sean

From mst at mellanox.co.il Wed Dec 28 14:49:10 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 29 Dec 2005 00:49:10 +0200
Subject: [openib-general] Re: Re: [PATCH] [CMA] support loopback addresses
In-Reply-To: <43B31429.1080408@ichips.intel.com>
References: <43B31429.1080408@ichips.intel.com>
Message-ID: <20051228224910.GB3851@mellanox.co.il>

Quoting r. Sean Hefty :
> Subject: Re: Re: [PATCH] [CMA] support loopback addresses
> 
> Michael S. Tsirkin wrote:
> >>+	INIT_WORK(work, loopback_addr_handler, id_priv);
> >>+	queue_work(wq, work);
> >>+	return 0;
> >>+err:
> >>+	kfree(work);
> >>+	return ret;
> >>+}
> > 
> > I'm not following this: why are we deferring the work?
> > Can't it be done directly?
> 
> To keep rdma_resolve_addr() generic, it is an asynchronous call. The work
> queue is used to callback the user from a separate thread other than the one
> that they called down with. The ib_addr module does something similar when
> the destination address is actually a local address, deferring the callback to
> another thread.

Maybe this should be moved to ib_addr then?
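(For reference, the deferral idiom being discussed, queueing a one-shot work
item so that the user's callback runs on the workqueue thread rather than on
the caller's stack, looks roughly like this with the 2.6-era three-argument
INIT_WORK(); the names below are illustrative only, not the actual CMA
symbols:)

    struct deferred_ctx {
        struct work_struct work;
        void *context;          /* whatever the callback needs */
    };

    static void deferred_handler(void *data)
    {
        struct deferred_ctx *ctx = data;
        /* ... invoke the user's callback with ctx->context here ... */
        kfree(ctx);             /* one-shot item; free it once it has run */
    }

    static int defer_callback(struct workqueue_struct *wq, void *context)
    {
        struct deferred_ctx *ctx = kmalloc(sizeof *ctx, GFP_KERNEL);

        if (!ctx)
            return -ENOMEM;
        ctx->context = context;
        INIT_WORK(&ctx->work, deferred_handler, ctx);
        queue_work(wq, &ctx->work);
        return 0;               /* callback fires later, asynchronously */
    }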
> 
> The alternative is to have the API behave one way for destination addresses that
> are local, versus those that are remote, but this complicates applications that
> are not aware if an address belongs to the local or a remote system.
> 
> - Sean

So, maybe we can reuse the ib_addr wq? Having all requests complete from
a single thread could help applications, I think.

-- 
MST

From mshefty at ichips.intel.com Wed Dec 28 15:03:20 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 28 Dec 2005 15:03:20 -0800
Subject: [openib-general] Re: [PATCH] [CMA] support loopback addresses
In-Reply-To: <20051228224910.GB3851@mellanox.co.il>
References: <43B31429.1080408@ichips.intel.com> <20051228224910.GB3851@mellanox.co.il>
Message-ID: <43B319B8.1040100@ichips.intel.com>

Michael S. Tsirkin wrote:
>>To keep rdma_resolve_addr() generic, it is an asynchronous call. The work
>>queue is used to callback the user from a separate thread other than the one
>>that they called down with. The ib_addr module does something similar when
>>the destination address is actually a local address, deferring the callback to
>>another thread.
> 
> Maybe this should be moved to ib_addr then?
> 
>>The alternative is to have the API behave one way for destination addresses that
>>are local, versus those that are remote
> 
> So, maybe we can reuse the ib_addr wq? Having all requests complete from
> a single thread could help applications, I think.

I did consider pushing this down into ib_addr because it has an existing work
queue, and it would have been easier to implement, but ib_addr simply converts
an IP address into a GID/pkey. To keep the CMA RDMA centric, rather than
IB-specific, it seemed better to me to map loopback addresses to an RDMA device
outside of IB specific code. For example, if the mapping were moved into
ib_addr, then loopback addresses would not work for a system containing only
iWarp devices.

If there's concern about using too many work queues in the IB stack, we could
export a single work queue from ib_core that's accessible to multiple modules.
Work queues appear to be used by these modules: ib_mad, rdma_cm, ib_cm, ib_addr,
sdp, and ipoib.

- Sean

From mst at mellanox.co.il Wed Dec 28 15:18:19 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 29 Dec 2005 01:18:19 +0200
Subject: [openib-general] Re: [PATCH] [CMA] support loopback addresses
In-Reply-To: <43B319B8.1040100@ichips.intel.com>
References: <43B319B8.1040100@ichips.intel.com>
Message-ID: <20051228231819.GA4060@mellanox.co.il>

Quoting r. Sean Hefty :
> >>The alternative is to have the API behave one way for destination addresses that
> >>are local, versus those that are remote
> > 
> > So, maybe we can reuse the ib_addr wq? Having all requests complete from
> > a single thread could help applications, I think.
> I did consider pushing this down into ib_addr because it has an existing work
> queue, and it would have been easier to implement, but ib_addr simply converts
> an IP address into a GID/pkey. To keep the CMA RDMA centric, rather than
> IB-specific, it seemed better to me to map loopback addresses to an RDMA device
> outside of IB specific code. For example, if the mapping were moved into
> ib_addr, then loopback addresses would not work for a system containing only
> iWarp devices.
Does it really matter in which module does the function reside?

> If there's concern about using too many work queues in the IB stack, we could
> export a single work queue from ib_core that's accessible to multiple modules.
> Work queues appear to be used by these modules: ib_mad, rdma_cm, ib_cm, ib_addr,
> sdp, and ipoib.
> 
> - Sean
> 

Sounds good, I would start with just rdma_cm, ib_cm, and ib_addr.
But note that exposing the wq means you must be explicit about what runs
on this queue, since otherwise a user can flush the queue from inside it
or other such deadlocking silliness. Need to be careful.

-- 
MST

From mshefty at ichips.intel.com Wed Dec 28 15:37:06 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 28 Dec 2005 15:37:06 -0800
Subject: [openib-general] Re: [PATCH] [CMA] support loopback addresses
In-Reply-To: <20051228231819.GA4060@mellanox.co.il>
References: <43B319B8.1040100@ichips.intel.com> <20051228231819.GA4060@mellanox.co.il>
Message-ID: <43B321A2.8020802@ichips.intel.com>

Michael S. Tsirkin wrote:
>>For example, if the mapping were moved into
>>ib_addr, then loopback addresses would not work for a system containing only
>>iWarp devices.
> 
> Does it really matter in which module does the function reside?

Yes - ib_addr does not have a list of RDMA devices. It simply converts an IP
address to a HW address, then interprets that address as a GID/pkey. Placing
the functionality in the CMA allows loopback operation without ipoib being
loaded, and allows it to operate over non-IB devices.

And I agree that we need to be careful merging any work queues together. It's
probably simple enough to merge ib_addr and rdma_cm work queues that I will go
ahead and try that, but would like to defer merging in the ib_cm or other work
queues for now.

- Sean

From sean.hefty at intel.com Wed Dec 28 16:22:39 2005
From: sean.hefty at intel.com (Sean Hefty)
Date: Wed, 28 Dec 2005 16:22:39 -0800
Subject: [openib-general] RE: [PATCH] [CMA] support loopback addresses
In-Reply-To: <20051228223151.GA3614@mellanox.co.il>
Message-ID: 

>It would be nice to avoid adding another workqueue.

Here's an updated patch that exports the work queue in ib_addr and has the CMA
make use of that one. Thanks for the feedback - please respond if there are
any other issues.

Signed-off-by: Sean Hefty

Index: include/rdma/ib_addr.h
===================================================================
--- include/rdma/ib_addr.h	(revision 4651)
+++ include/rdma/ib_addr.h	(working copy)
@@ -30,9 +30,13 @@
 #if !defined(IB_ADDR_H)
 #define IB_ADDR_H
 
+#include
+#include
 #include
 #include
 
+extern struct workqueue_struct *rdma_wq;
+
 struct ib_addr {
 	union ib_gid	sgid;
 	union ib_gid	dgid;
@@ -68,5 +72,11 @@
 
 void ib_addr_cancel(struct ib_addr *addr);
 
+static inline int ip_addr_size(struct sockaddr *addr)
+{
+	return addr->sa_family == AF_INET6 ?
+	       sizeof(struct sockaddr_in6) : sizeof(struct sockaddr_in);
+}
+
 #endif /* IB_ADDR_H */

Index: core/addr.c
===================================================================
--- core/addr.c	(revision 4651)
+++ core/addr.c	(working copy)
@@ -27,8 +27,6 @@
  * notice, one of the license notices in the documentation
  * and/or other materials provided with the distribution.
*/ -#include -#include #include #include #include @@ -57,7 +55,8 @@ static DECLARE_MUTEX(mutex); static LIST_HEAD(req_list); static DECLARE_WORK(work, process_req, NULL); -static struct workqueue_struct *wq; +struct workqueue_struct *rdma_wq; +EXPORT_SYMBOL(rdma_wq); static u16 addr_get_pkey(struct net_device *dev) { @@ -90,7 +89,7 @@ if ((long)delay <= 0) delay = 1; - queue_delayed_work(wq, &work, delay); + queue_delayed_work(rdma_wq, &work, delay); } static void queue_req(struct addr_req *req) @@ -258,8 +257,8 @@ memset(req, 0, sizeof *req); if (src_addr) - req->src_addr = *src_addr; - req->dst_addr = *dst_addr; + memcpy(&req->src_addr, src_addr, ip_addr_size(src_addr)); + memcpy(&req->dst_addr, dst_addr, ip_addr_size(dst_addr)); req->addr = addr; req->callback = callback; req->context = context; @@ -333,8 +332,8 @@ static int addr_init(void) { - wq = create_singlethread_workqueue("ib_addr"); - if (!wq) + rdma_wq = create_singlethread_workqueue("rdma_wq"); + if (!rdma_wq) return -ENOMEM; dev_add_pack(&addr_arp); @@ -344,7 +343,7 @@ static void addr_cleanup(void) { dev_remove_pack(&addr_arp); - destroy_workqueue(wq); + destroy_workqueue(rdma_wq); } module_init(addr_init); Index: core/cma.c =================================================================== --- core/cma.c (revision 4651) +++ core/cma.c (working copy) @@ -421,6 +421,12 @@ } } +static inline int cma_loopback_addr(struct sockaddr *addr) +{ + return ((struct sockaddr_in *) addr)->sin_addr.s_addr == + ntohl(INADDR_LOOPBACK); +} + static int cma_get_net_info(void *hdr, enum rdma_port_space ps, u8 *ip_ver, __u16 *port, union cma_ip_addr **src, union cma_ip_addr **dst) @@ -1070,6 +1076,35 @@ } EXPORT_SYMBOL(rdma_resolve_route); +static int cma_bind_loopback(struct rdma_id_private *id_priv) +{ + struct cma_device *cma_dev; + int ret; + + down(&mutex); + if (list_empty(&dev_list)) { + ret = -ENODEV; + goto out; + } + + cma_dev = list_entry(dev_list.next, struct cma_device, list); + ret = ib_get_cached_gid(cma_dev->device, 1, 0, + &id_priv->id.route.addr.addr.ibaddr.sgid); + if (ret) + goto out; + + ret = ib_get_cached_pkey(cma_dev->device, 1, 0, + &id_priv->id.route.addr.addr.ibaddr.pkey); + if (ret) + goto out; + + id_priv->id.port_num = 1; + cma_attach_to_dev(id_priv, cma_dev); +out: + up(&mutex); + return ret; +} + static void addr_handler(int status, struct sockaddr *src_addr, struct ib_addr *ibaddr, void *context) { @@ -1092,7 +1127,8 @@ } else { if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_RESOLVED)) goto out; - id_priv->id.route.addr.src_addr = *src_addr; + memcpy(&id_priv->id.route.addr.src_addr, src_addr, + ip_addr_size(src_addr)); event = RDMA_CM_EVENT_ADDR_RESOLVED; } @@ -1108,6 +1144,57 @@ cma_deref_id(id_priv); } +static void loopback_addr_handler(void *data) +{ + struct rdma_id_private *id_priv = data; + + atomic_inc(&id_priv->dev_remove); + + if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_RESOLVED)) + goto out; + + if (cma_notify_user(id_priv, RDMA_CM_EVENT_ADDR_RESOLVED, 0, NULL, 0)) { + cma_exch(id_priv, CMA_DESTROYING); + cma_release_remove(id_priv); + cma_deref_id(id_priv); + rdma_destroy_id(&id_priv->id); + return; + } +out: + cma_release_remove(id_priv); + cma_deref_id(id_priv); +} + +static int cma_resolve_loopback(struct rdma_id_private *id_priv, + struct sockaddr *src_addr, enum cma_state state) +{ + struct work_struct *work; + int ret; + + work = kmalloc(sizeof *work, GFP_KERNEL); + if (!work) + return -ENOMEM; + + if (state == CMA_IDLE) { + ret = cma_bind_loopback(id_priv); + if (ret) + goto 
err; + id_priv->id.route.addr.addr.ibaddr.dgid = + id_priv->id.route.addr.addr.ibaddr.sgid; + if (!src_addr || cma_any_addr(src_addr)) + src_addr = &id_priv->id.route.addr.dst_addr; + memcpy(&id_priv->id.route.addr.src_addr, src_addr, + ip_addr_size(src_addr)); + } + + INIT_WORK(work, loopback_addr_handler, id_priv); + queue_work(rdma_wq, work); + return 0; +err: + kfree(work); + return ret; +} + int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr, struct sockaddr *dst_addr, int timeout_ms) { @@ -1126,9 +1213,13 @@ return -EINVAL; atomic_inc(&id_priv->refcount); - id->route.addr.dst_addr = *dst_addr; - ret = ib_resolve_addr(src_addr, dst_addr, &id->route.addr.addr.ibaddr, - timeout_ms, addr_handler, id_priv); + memcpy(&id->route.addr.dst_addr, dst_addr, ip_addr_size(dst_addr)); + if (cma_loopback_addr(dst_addr)) + ret = cma_resolve_loopback(id_priv, src_addr, expected_state); + else + ret = ib_resolve_addr(src_addr, dst_addr, + &id->route.addr.addr.ibaddr, + timeout_ms, addr_handler, id_priv); if (ret) goto err; @@ -1143,7 +1234,7 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr) { struct rdma_id_private *id_priv; - struct ib_addr *ibaddr = &id->route.addr.addr.ibaddr; + struct ib_addr *ibaddr; int ret; if (addr->sa_family != AF_INET) @@ -1154,9 +1245,11 @@ return -EINVAL; if (cma_any_addr(addr)) { - id->route.addr.src_addr = *addr; ret = 0; + } else if (cma_loopback_addr(addr)) { + ret = cma_bind_loopback(id_priv); } else { + ibaddr = &id->route.addr.addr.ibaddr; ret = ib_translate_addr(addr, &ibaddr->sgid, &ibaddr->pkey); if (!ret) ret = cma_acquire_ib_dev(id_priv, &ibaddr->sgid); @@ -1165,7 +1258,7 @@ if (ret) goto err; - id->route.addr.src_addr = *addr; + memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr)); return 0; err: cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_IDLE); From bos at pathscale.com Wed Dec 28 16:31:21 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 28 Dec 2005 16:31:21 -0800 Subject: [openib-general] [PATCH 2 of 20] memcpy32 for x86_64 In-Reply-To: Message-ID: <801287704e408ed65660.1135816281@eng-12.pathscale.com> Introduce an x86_64-specific memcpy32 routine. The routine is similar to memcpy, but is guaranteed to work in units of 32 bits at a time. 
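In plain C, the intended semantics are just a word-at-a-time copy, with count
measured in 32-bit words rather than bytes. A rough, unoptimized reference
sketch (the actual patch below implements this in assembly with rep movsq and
rep movsd for speed):

    #include <linux/types.h>

    /* Reference semantics for memcpy32: copy `count` aligned 32-bit words. */
    static void memcpy32_ref(void *dst, const void *src, size_t count)
    {
        u32 *d = dst;
        const u32 *s = src;

        while (count--)
            *d++ = *s++;
    }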
Signed-off-by: Bryan O'Sullivan diff -r ef833f6712e7 -r 801287704e40 arch/x86_64/kernel/x8664_ksyms.c --- a/arch/x86_64/kernel/x8664_ksyms.c Wed Dec 28 14:19:42 2005 -0800 +++ b/arch/x86_64/kernel/x8664_ksyms.c Wed Dec 28 14:19:42 2005 -0800 @@ -164,6 +164,8 @@ EXPORT_SYMBOL(memcpy); EXPORT_SYMBOL(__memcpy); +EXPORT_SYMBOL_GPL(memcpy32); + #ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM /* prototypes are wrong, these are assembly with custom calling functions */ extern void rwsem_down_read_failed_thunk(void); diff -r ef833f6712e7 -r 801287704e40 arch/x86_64/lib/Makefile --- a/arch/x86_64/lib/Makefile Wed Dec 28 14:19:42 2005 -0800 +++ b/arch/x86_64/lib/Makefile Wed Dec 28 14:19:42 2005 -0800 @@ -9,4 +9,4 @@ lib-y := csum-partial.o csum-copy.o csum-wrappers.o delay.o \ usercopy.o getuser.o putuser.o \ thunk.o clear_page.o copy_page.o bitstr.o bitops.o -lib-y += memcpy.o memmove.o memset.o copy_user.o +lib-y += memcpy.o memcpy32.o memmove.o memset.o copy_user.o diff -r ef833f6712e7 -r 801287704e40 include/asm-x86_64/string.h --- a/include/asm-x86_64/string.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-x86_64/string.h Wed Dec 28 14:19:42 2005 -0800 @@ -45,6 +45,15 @@ #define __HAVE_ARCH_MEMMOVE void * memmove(void * dest,const void *src,size_t count); +/* + * memcpy32 - copy data, 32 bits at a time + * + * @dst: destination (must be 32-bit aligned) + * @src: source (must be 32-bit aligned) + * @count: number of 32-bit quantities to copy + */ +void memcpy32(void *dst, const void *src, size_t count); + /* Use C out of line version for memcmp */ #define memcmp __builtin_memcmp int memcmp(const void * cs,const void * ct,size_t count); diff -r ef833f6712e7 -r 801287704e40 arch/x86_64/lib/memcpy32.S --- /dev/null Thu Jan 1 00:00:00 1970 +0000 +++ b/arch/x86_64/lib/memcpy32.S Wed Dec 28 14:19:42 2005 -0800 @@ -0,0 +1,23 @@ +/* + * Copyright (c) 2003, 2004, 2005 PathScale, Inc. + */ + +/* + * memcpy32 - Copy a memory block, 32 bits at a time. + * + * Count is number of dwords; it need not be a qword multiple. + * Input: + * rdi destination + * rsi source + * rdx count + */ + + .globl memcpy32 +memcpy32: + movl %edx,%ecx + shrl $1,%ecx + andl $1,%edx + rep movsq + movl %edx,%ecx + rep movsd + ret From bos at pathscale.com Wed Dec 28 16:31:19 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 28 Dec 2005 16:31:19 -0800 Subject: [openib-general] [PATCH 0 of 20] [RFC] ipath - PathScale InfiniPath driver Message-ID: Following Roland's submission of our InfiniPath InfiniBand HCA driver earlier this month, we have responded to people's comments by making a large number of changes to the driver. Here is another set of driver patches for review. Roland is on vacation until January 4, so I'm posting these in his place. Once again, your comments are appreciated. We'd like to submit this driver for inclusion in 2.6.16, so we'll be responding quickly to all feedback. 
A short summary of the changes we have made is as follows: - sparse annotations (yes, it passes "make C=1") - Removed x86_64 specificity from driver - Introduced generic memcpy_toio32 for safe MMIO access - Got rid of release and RCS IDs - Use set_page_dirty_lock instead of SetPageDirty - Fixed misuse of copy_from_user - Removed all sysctls - Removed stuff inside #ifndef __KERNEL__ - Use ALIGN() instead of round_up() - Use static inlines instead of #defines, generally tidied inline functions - Renamed _BITS_PER_BYTE to BITS_PER_BYTE, and moved it into linux/types.h - Got rid of ipath_shortcopy - Use fixed-size types for user/kernel communication - Renamed ipath_mlock to ipath_get_user_pages, fixed some bugs There are a few requested changes we have chosen to omit for now: - The driver still uses EXPORT_SYMBOL, for consistency with other code in drivers/infiniband - Someone asked for the kernel's i2c infrastructure to be used, but our i2c usage is very specialised, and it would be more of a mess to use the kernel's - We're still using ioctls instead of sysfs or configfs in some cases, to maintain userspace compatibility Please note that these patches require a set of OpenIB kernel patches that are awaiting the 2.6.16 submission window in order to compile; in other words, they really are for review only. I'll be happy to provide a suitable jumbo OpenIB patch to anyone who feels a need to compile-test these patches. Message-ID: This routine is an arch-independent building block for memcpy_toio32. It copies data to a memory-mapped I/O region, using 32-bit accesses. This style of access is required by some devices. Signed-off-by: Bryan O'Sullivan diff -r a56fd6a8895d -r ef833f6712e7 include/asm-generic/iomap.h --- a/include/asm-generic/iomap.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-generic/iomap.h Wed Dec 28 14:19:42 2005 -0800 @@ -56,6 +56,15 @@ extern void fastcall iowrite16_rep(void __iomem *port, const void *buf, unsigned long count); extern void fastcall iowrite32_rep(void __iomem *port, const void *buf, unsigned long count); +/* + * __memcpy_toio32 - copy data to MMIO space, in 32-bit units + * + * @to: destination, in MMIO space (must be 32-bit aligned) + * @from: source (must be 32-bit aligned) + * @count: number of 32-bit quantities to copy + */ +void fastcall __memcpy_toio32(void __iomem *to, const void *from, size_t count); + /* Create a virtual mapping cookie for an IO port range */ extern void __iomem *ioport_map(unsigned long port, unsigned int nr); extern void ioport_unmap(void __iomem *); diff -r a56fd6a8895d -r ef833f6712e7 lib/iomap.c --- a/lib/iomap.c Wed Dec 28 14:19:42 2005 -0800 +++ b/lib/iomap.c Wed Dec 28 14:19:42 2005 -0800 @@ -187,6 +187,17 @@ EXPORT_SYMBOL(iowrite16_rep); EXPORT_SYMBOL(iowrite32_rep); +void fastcall __memcpy_toio32(void __iomem *d, const void *s, size_t count) +{ + u32 __iomem *dst = d; + const u32 *src = s; + size_t i; + + for (i = 0; i < count; i++) + __raw_writel(*src++, dst++); + wmb(); +} + /* Create a virtual mapping cookie for an IO port range */ void __iomem *ioport_map(unsigned long port, unsigned int nr) { From bos at pathscale.com Wed Dec 28 16:31:23 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 28 Dec 2005 16:31:23 -0800 Subject: [openib-general] [PATCH 4 of 20] Define BITS_PER_BYTE In-Reply-To: Message-ID: This can make some arithmetic expressions clearer. 
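A one-line example of the sort of expression this helps (hypothetical usage,
not taken from the patch itself):

    /* number of bits in a u64, without a bare "8" in the arithmetic */
    #define U64_BITS (sizeof(u64) * BITS_PER_BYTE)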
Signed-off-by: Bryan O'Sullivan diff -r b792638cc4bc -r a3a00f637da6 include/linux/types.h --- a/include/linux/types.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/linux/types.h Wed Dec 28 14:19:42 2005 -0800 @@ -8,6 +8,8 @@ (((bits)+BITS_PER_LONG-1)/BITS_PER_LONG) #define DECLARE_BITMAP(name,bits) \ unsigned long name[BITS_TO_LONGS(bits)] + +#define BITS_PER_BYTE 8 #endif #include From bos at pathscale.com Wed Dec 28 16:31:22 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 28 Dec 2005 16:31:22 -0800 Subject: [openib-general] [PATCH 3 of 20] Add memcpy_toio32 to each arch In-Reply-To: Message-ID: Most arches use the generic __memcpy_toio32 routine, while x86_64 uses memcpy32, which is substantially faster. Signed-off-by: Bryan O'Sullivan diff -r 801287704e40 -r b792638cc4bc include/asm-alpha/io.h --- a/include/asm-alpha/io.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-alpha/io.h Wed Dec 28 14:19:42 2005 -0800 @@ -504,6 +504,8 @@ extern void memcpy_toio(volatile void __iomem *, const void *, long); extern void _memset_c_io(volatile void __iomem *, unsigned long, long); +#define memcpy_toio32 __memcpy_toio32 + static inline void memset_io(volatile void __iomem *addr, u8 c, long len) { _memset_c_io(addr, 0x0101010101010101UL * c, len); diff -r 801287704e40 -r b792638cc4bc include/asm-arm/io.h --- a/include/asm-arm/io.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-arm/io.h Wed Dec 28 14:19:42 2005 -0800 @@ -184,6 +184,8 @@ #define memset_io(c,v,l) _memset_io(__mem_pci(c),(v),(l)) #define memcpy_fromio(a,c,l) _memcpy_fromio((a),__mem_pci(c),(l)) #define memcpy_toio(c,a,l) _memcpy_toio(__mem_pci(c),(a),(l)) + +#define memcpy_toio32 __memcpy_toio32 #define eth_io_copy_and_sum(s,c,l,b) \ eth_copy_and_sum((s),__mem_pci(c),(l),(b)) diff -r 801287704e40 -r b792638cc4bc include/asm-cris/io.h --- a/include/asm-cris/io.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-cris/io.h Wed Dec 28 14:19:42 2005 -0800 @@ -121,6 +121,8 @@ #define memcpy_fromio(a,b,c) memcpy((a),(void *)(b),(c)) #define memcpy_toio(a,b,c) memcpy((void *)(a),(b),(c)) +#define memcpy_toio32 __memcpy_toio32 + /* * Again, CRIS does not require mem IO specific function. 
*/ diff -r 801287704e40 -r b792638cc4bc include/asm-frv/io.h --- a/include/asm-frv/io.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-frv/io.h Wed Dec 28 14:19:42 2005 -0800 @@ -124,6 +124,8 @@ memcpy((void __force *) dst, src, count); } +#define memcpy_toio32 __memcpy_toio32 + static inline uint8_t inb(unsigned long addr) { return __builtin_read8((void *)addr); diff -r 801287704e40 -r b792638cc4bc include/asm-h8300/io.h --- a/include/asm-h8300/io.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-h8300/io.h Wed Dec 28 14:19:42 2005 -0800 @@ -209,6 +209,8 @@ #define memcpy_fromio(a,b,c) memcpy((a),(void *)(b),(c)) #define memcpy_toio(a,b,c) memcpy((void *)(a),(b),(c)) +#define memcpy_toio32 __memcpy_toio32 + #define mmiowb() #define inb(addr) ((h8300_buswidth(addr))?readw((addr) & ~1) & 0xff:readb(addr)) diff -r 801287704e40 -r b792638cc4bc include/asm-i386/io.h --- a/include/asm-i386/io.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-i386/io.h Wed Dec 28 14:19:42 2005 -0800 @@ -203,6 +203,9 @@ { __memcpy((void __force *) dst, src, count); } + +#define memcpy_toio32 __memcpy_toio32 + /* * ISA space is 'always mapped' on a typical x86 system, no need to diff -r 801287704e40 -r b792638cc4bc include/asm-ia64/io.h --- a/include/asm-ia64/io.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-ia64/io.h Wed Dec 28 14:19:42 2005 -0800 @@ -443,6 +443,8 @@ extern void memcpy_toio(volatile void __iomem *dst, const void *src, long n); extern void memset_io(volatile void __iomem *s, int c, long n); +#define memcpy_toio32 __memcpy_toio32 + #define dma_cache_inv(_start,_size) do { } while (0) #define dma_cache_wback(_start,_size) do { } while (0) #define dma_cache_wback_inv(_start,_size) do { } while (0) diff -r 801287704e40 -r b792638cc4bc include/asm-m32r/io.h --- a/include/asm-m32r/io.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-m32r/io.h Wed Dec 28 14:19:42 2005 -0800 @@ -216,6 +216,8 @@ memcpy((void __force *) dst, src, count); } +#define memcpy_toio32 __memcpy_toio32 + /* * Convert a physical pointer to a virtual kernel pointer for /dev/mem * access diff -r 801287704e40 -r b792638cc4bc include/asm-m68knommu/io.h --- a/include/asm-m68knommu/io.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-m68knommu/io.h Wed Dec 28 14:19:42 2005 -0800 @@ -113,6 +113,8 @@ #define memcpy_fromio(a,b,c) memcpy((a),(void *)(b),(c)) #define memcpy_toio(a,b,c) memcpy((void *)(a),(b),(c)) +#define memcpy_toio32 __memcpy_toio32 + #define inb(addr) readb(addr) #define inw(addr) readw(addr) #define inl(addr) readl(addr) diff -r 801287704e40 -r b792638cc4bc include/asm-mips/io.h --- a/include/asm-mips/io.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-mips/io.h Wed Dec 28 14:19:42 2005 -0800 @@ -534,6 +534,8 @@ memcpy((void __force *) dst, src, count); } +#define memcpy_toio32 __memcpy_toio32 + /* * Memory Mapped I/O */ diff -r 801287704e40 -r b792638cc4bc include/asm-parisc/io.h --- a/include/asm-parisc/io.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-parisc/io.h Wed Dec 28 14:19:42 2005 -0800 @@ -294,6 +294,8 @@ void memcpy_fromio(void *dst, const volatile void __iomem *src, int count); void memcpy_toio(volatile void __iomem *dst, const void *src, int count); +#define memcpy_toio32 __memcpy_toio32 + /* Support old drivers which don't ioremap. 
* NB this interface is scheduled to disappear in 2.5 */ diff -r 801287704e40 -r b792638cc4bc include/asm-powerpc/io.h --- a/include/asm-powerpc/io.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-powerpc/io.h Wed Dec 28 14:19:42 2005 -0800 @@ -63,6 +63,8 @@ #define memcpy_fromio(a,b,c) iSeries_memcpy_fromio((a), (b), (c)) #define memcpy_toio(a,b,c) iSeries_memcpy_toio((a), (b), (c)) +#define memcpy_toio32 __memcpy_toio32 + #define inb(addr) readb(((void __iomem *)(long)(addr))) #define inw(addr) readw(((void __iomem *)(long)(addr))) #define inl(addr) readl(((void __iomem *)(long)(addr))) diff -r 801287704e40 -r b792638cc4bc include/asm-ppc/io.h --- a/include/asm-ppc/io.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-ppc/io.h Wed Dec 28 14:19:42 2005 -0800 @@ -367,6 +367,8 @@ } #endif +#define memcpy_toio32 __memcpy_toio32 + #define eth_io_copy_and_sum(a,b,c,d) eth_copy_and_sum((a),(void __force *)(void __iomem *)(b),(c),(d)) /* diff -r 801287704e40 -r b792638cc4bc include/asm-s390/io.h --- a/include/asm-s390/io.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-s390/io.h Wed Dec 28 14:19:42 2005 -0800 @@ -99,6 +99,8 @@ #define memcpy_fromio(a,b,c) memcpy((a),__io_virt(b),(c)) #define memcpy_toio(a,b,c) memcpy(__io_virt(a),(b),(c)) +#define memcpy_toio32 __memcpy_toio32 + #define inb_p(addr) readb(addr) #define inb(addr) readb(addr) diff -r 801287704e40 -r b792638cc4bc include/asm-sh/io.h --- a/include/asm-sh/io.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-sh/io.h Wed Dec 28 14:19:42 2005 -0800 @@ -177,6 +177,8 @@ extern void memcpy_toio(unsigned long, const void *, unsigned long); extern void memset_io(unsigned long, int, unsigned long); +#define memcpy_toio32 __memcpy_toio32 + /* SuperH on-chip I/O functions */ static __inline__ unsigned char ctrl_inb(unsigned long addr) { diff -r 801287704e40 -r b792638cc4bc include/asm-sh64/io.h --- a/include/asm-sh64/io.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-sh64/io.h Wed Dec 28 14:19:42 2005 -0800 @@ -125,6 +125,8 @@ void memcpy_toio(void __iomem *to, const void *from, long count); void memcpy_fromio(void *to, void __iomem *from, long count); + +#define memcpy_toio32 __memcpy_toio32 #define mmiowb() diff -r 801287704e40 -r b792638cc4bc include/asm-sparc/io.h --- a/include/asm-sparc/io.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-sparc/io.h Wed Dec 28 14:19:42 2005 -0800 @@ -239,6 +239,8 @@ #define memcpy_toio(d,s,sz) _memcpy_toio(d,s,sz) +#define memcpy_toio32 __memcpy_toio32 + #ifdef __KERNEL__ /* diff -r 801287704e40 -r b792638cc4bc include/asm-sparc64/io.h --- a/include/asm-sparc64/io.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-sparc64/io.h Wed Dec 28 14:19:42 2005 -0800 @@ -440,6 +440,8 @@ #define memcpy_toio(d,s,sz) _memcpy_toio(d,s,sz) +#define memcpy_toio32 __memcpy_toio32 + static inline int check_signature(void __iomem *io_addr, const unsigned char *signature, int length) diff -r 801287704e40 -r b792638cc4bc include/asm-v850/io.h --- a/include/asm-v850/io.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-v850/io.h Wed Dec 28 14:19:42 2005 -0800 @@ -130,6 +130,8 @@ #define memcpy_fromio(dst, src, len) memcpy (dst, (void *)src, len) #define memcpy_toio(dst, src, len) memcpy ((void *)dst, src, len) +#define memcpy_toio32 __memcpy_toio32 + /* * Convert a physical pointer to a virtual kernel pointer for /dev/mem * access diff -r 801287704e40 -r b792638cc4bc include/asm-x86_64/io.h --- a/include/asm-x86_64/io.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-x86_64/io.h Wed Dec 28 14:19:42 2005 -0800 @@ -252,6 
+252,13 @@ __memcpy_toio((unsigned long)to,from,len); } +#include + +static inline void memcpy_toio32(void __iomem *dst, const void *src, size_t count) +{ + memcpy32((void __force *) dst, src, count); +} + void memset_io(volatile void __iomem *a, int b, size_t c); /* diff -r 801287704e40 -r b792638cc4bc include/asm-xtensa/io.h --- a/include/asm-xtensa/io.h Wed Dec 28 14:19:42 2005 -0800 +++ b/include/asm-xtensa/io.h Wed Dec 28 14:19:42 2005 -0800 @@ -159,6 +159,8 @@ #define memcpy_fromio(a,b,c) memcpy((a),(void *)(b),(c)) #define memcpy_toio(a,b,c) memcpy((void *)(a),(b),(c)) +#define memcpy_toio32 __memcpy_toio32 + /* At this point the Xtensa doesn't provide byte swap instructions */ #ifdef __XTENSA_EB__ From bos at pathscale.com Wed Dec 28 16:31:26 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 28 Dec 2005 16:31:26 -0800 Subject: [openib-general] [PATCH 7 of 20] ipath - MMIO copy routines In-Reply-To: Message-ID: Signed-off-by: Bryan O'Sullivan diff -r 9e8d017ed298 -r ffbd416f30d4 drivers/infiniband/hw/ipath/ipath_copy.c --- /dev/null Thu Jan 1 00:00:00 1970 +0000 +++ b/drivers/infiniband/hw/ipath/ipath_copy.c Wed Dec 28 14:19:42 2005 -0800 @@ -0,0 +1,612 @@ +/* + * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + */ + +/* + * This file provides support for doing sk_buff buffer swapping between + * the low level driver eager buffers, and the network layer. It's part + * of the core driver, rather than the ether driver, because it relies + * on variables and functions in the core driver. It exports a single + * entry point for use in the ipath_ether module. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include /* we can generate our own crc's for testing */ + +#include "ipath_kernel.h" +#include "ips_common.h" +#include "ipath_layer.h" + +/* + * Allocate a PIO send buffer, initialize the header and copy it out. 
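+ * The send buffer is write-combining MMIO: the mb() barriers below,
+ * placed before and after the final "trigger" word of each chunk,
+ * are what force the WC contents out to the chip in the right order.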
+ */
+static int layer_send_getpiobuf(struct copy_data_s *cdp)
+{
+	uint32_t device = cdp->device;
+	uint32_t extra_bytes;
+	uint32_t len, nwords;
+	uint32_t __iomem *piobuf;
+
+	if (!(piobuf = ipath_getpiobuf(device, NULL))) {
+		cdp->error = -EBUSY;
+		return cdp->error;
+	}
+
+	/*
+	 * Compute the max amount of data that can fit into a PIO buffer.
+	 * buffer size - header size - trigger qword length & flags - CRC
+	 */
+	len = devdata[device].ipath_ibmaxlen -
+		sizeof(struct ether_header_typ) - 8 - (SIZE_OF_CRC << 2);
+	if (len > devdata[device].ipath_rcvegrbufsize)
+		len = devdata[device].ipath_rcvegrbufsize;
+	if (len > (cdp->len + cdp->extra))
+		len = (cdp->len + cdp->extra);
+	/* Compute word alignment (i.e., (len & 3) ? 4 - (len & 3) : 0) */
+	extra_bytes = (4 - len) & 3;
+	nwords = (sizeof(struct ether_header_typ) + len + extra_bytes) >> 2;
+	cdp->hdr->lrh[2] = htons(nwords + SIZE_OF_CRC);
+	cdp->hdr->bth[0] = htonl((OPCODE_ITH4X << 24) + (extra_bytes << 20) +
+				 IPS_DEFAULT_P_KEY);
+	cdp->hdr->sub_opcode = OPCODE_ENCAP;
+
+	cdp->hdr->bth[2] = 0;
+	/* Generate an interrupt on the receive side for the last fragment. */
+	cdp->hdr->iph.pkt_flags = ((cdp->len+cdp->extra) == len) ? INFINIPATH_KPF_INTR : 0;
+	cdp->hdr->iph.chksum = (uint16_t) IPS_LRH_BTH +
+		(uint16_t) (nwords + SIZE_OF_CRC) -
+		(uint16_t) ((cdp->hdr->iph.ver_port_tid_offset >> 16)&0xFFFF) -
+		(uint16_t) (cdp->hdr->iph.ver_port_tid_offset & 0xFFFF) -
+		(uint16_t) cdp->hdr->iph.pkt_flags;
+
+	_IPATH_VDBG("send %d (%x %x %x %x %x %x %x)\n", nwords,
+		    cdp->hdr->lrh[0], cdp->hdr->lrh[1],
+		    cdp->hdr->lrh[2], cdp->hdr->lrh[3],
+		    cdp->hdr->bth[0], cdp->hdr->bth[1], cdp->hdr->bth[2]);
+	/*
+	 * Write len to control qword, no flags.
+	 * +1 is for the qword padding of pbc.
+	 */
+	writeq(nwords + 1ULL, (uint64_t __iomem *) piobuf);
+	/* we have to flush after the PBC for correctness on some cpus
+	 * or WC buffer can be written out of order */
+	mb();
+	piobuf += 2;
+	memcpy_toio32(piobuf, cdp->hdr, sizeof(struct ether_header_typ) >> 2);
+	cdp->csum_pio = &((struct ether_header_typ __iomem *) piobuf)->csum;
+	cdp->to = piobuf + (sizeof(struct ether_header_typ) >> 2);
+	cdp->flen = nwords - (sizeof(struct ether_header_typ) >> 2);
+	cdp->hdr->frag_num++;
+	return 0;
+}
+
+/*
+ * copy the last full dword when that's the "extra" word, preceding it
+ * with a memory fence, so that all prior data is written to the PIO
+ * buffer before the trigger word, to enforce the correct bus ordering
+ * of the WC buffer contents on the bus.
+ */
+static inline unsigned copy_extra_dword(struct copy_data_s *cdp, unsigned dosum)
+{
+	if (!cdp->flen && layer_send_getpiobuf(cdp) < 0)
+		return 1;
+	/* write the checksum before the last PIO write, if requested. */
+	if (dosum && cdp->flen == 1)
+		writel(csum_fold(cdp->csum), cdp->csum_pio);
+	mb();
+	writel(cdp->u.w, cdp->to++);
+	mb();
+	cdp->extra = 0;
+	cdp->flen -= 1;
+	return 0;
+}
+
+/*
+ * copy a PIO buffer's worth (or the skb fragment, at least) to the PIO
+ * buffer, adding a memory fence before the last word. We need the fence
+ * as part of forcing the WC ordering on some cpus, for the cases where
+ * it will be the trigger word. The final fence after the trigger word
+ * will be done either at the next chunk, or on final return from the caller.
+ * Takes max byte count, returns byte count actually done (always rounded
+ * to dword multiple).
+ */ +static uint32_t copy_a_buffer(struct copy_data_s *cdp, void *p, uint32_t n, + unsigned dosum) +{ + uint32_t *p32; + + if (!cdp->flen && layer_send_getpiobuf(cdp) < 0) + return -1; + if (n > cdp->flen) + n = cdp->flen; + if (dosum && cdp->flen == n) + writel(csum_fold(cdp->csum), cdp->csum_pio); + p32 = p; + memcpy_toio32(cdp->to, p32, n-1); + cdp->to += n-1; + mb(); + writel(p32[n-1], cdp->to++); + mb(); + _IPATH_PDBG("trigger write to pio %p\n", &p32[n-1]); + cdp->flen -= n; + n <<= 2; + cdp->offset += n; + cdp->len -= n; + return n; +} + +/* + * Copy data out of one or a chain of sk_buffs, into the PIO buffer. + * Fragment an sk_buff into multiple IB packets if the amount of data is + * more than a single eager send. + * Offset and len are in bytes. + * Note that this function is recursive! + */ +static void copy_bits(const struct sk_buff *skb, unsigned int offset, + unsigned int len, struct copy_data_s *cdp) +{ + unsigned int start = skb_headlen(skb); + unsigned int i, copy; + uint32_t n; + uint8_t *p; + + /* Copy header. */ + if ((int)(copy = start - offset) > 0) { + if (copy > len) + copy = len; + p = skb->data + offset; + offset += copy; + len -= copy; + /* If the alignment buffer is not empty, fill it and write it out. */ + if (cdp->extra) { + if (cdp->extra == 4) { + if (copy_extra_dword(cdp, 0)) + return; + } + else while (copy != 0) { + cdp->u.buf[cdp->extra] = *p++; + copy--; + cdp->offset++; + cdp->len--; + + if (++cdp->extra == 4) { + if (copy_extra_dword(cdp, 0)) + return; + break; + } + } + } + while (copy >= 4) { + n = copy_a_buffer(cdp, p, copy>>2, 0); + if (n == -1) + return; + p += n; + copy -= n; + } + /* + * Either cdp->extra is zero or copy is zero which means that + * the loop here can't cause the alignment buffer to fill up. + */ + while (copy != 0) { + cdp->u.buf[cdp->extra++] = *p++; + copy--; + cdp->offset++; + cdp->len--; + + } + if (len == 0) + return; + } + + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { + skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; + unsigned int end; + + end = start + frag->size; + if ((int)(copy = end - offset) > 0) { + uint8_t *vaddr; + + if (copy > len) + copy = len; + vaddr = kmap_skb_frag(frag); + p = vaddr + frag->page_offset + offset - start; + offset += copy; + len -= copy; + /* If the alignment buffer is not empty, fill it and write it out. */ + if (cdp->extra) { + if (cdp->extra == 4) { + if (copy_extra_dword(cdp, 0)) + return; + } + else while (copy != 0) { + cdp->u.buf[cdp->extra] = *p++; + copy--; + cdp->offset++; + cdp->len--; + + if (++cdp->extra == 4) { + if (copy_extra_dword(cdp, 0)) + return; + break; + } + } + } + while (copy >= 4) { + n = copy_a_buffer(cdp, p, copy>>2, 0); + if (n == -1) + return; + p += n; + copy -= n; + } + /* + * Either cdp->extra is zero or copy is zero which means that + * the loop here can't cause the alignment buffer to fill up. 
+ */ + while (copy != 0) { + cdp->u.buf[cdp->extra++] = *p++; + copy--; + cdp->offset++; + cdp->len--; + } + kunmap_skb_frag(vaddr); + + if (len == 0) + return; + } + start = end; + } + + if (skb_shinfo(skb)->frag_list) { + struct sk_buff *list = skb_shinfo(skb)->frag_list; + + for (; list; list = list->next) { + unsigned int end; + + end = start + list->len; + if ((int)(copy = end - offset) > 0) { + if (copy > len) + copy = len; + copy_bits(list, offset - start, copy, cdp); + if (cdp->error || (len -= copy) == 0) + return; + } + start = end; + } + } + if (len) + cdp->error = -EFAULT; +} + +/* + * Copy data out of one or a chain of sk_buffs, into the PIO buffer, generating + * the checksum as we go. + * Fragment an sk_buff into multiple IB packets if the amount of data is + * more than a single eager send. + * Offset and len are in bytes. + * Note that this function is recursive! + */ +static void copy_and_csum_bits(const struct sk_buff *skb, unsigned int offset, + unsigned int len, struct copy_data_s *cdp) +{ + unsigned int start = skb_headlen(skb); + unsigned int i, copy; + unsigned int csum2; + uint32_t n; + uint8_t *p; + + /* Copy header. */ + if ((int)(copy = start - offset) > 0) { + if (copy > len) + copy = len; + p = skb->data + offset; + offset += copy; + len -= copy; + if (!cdp->checksum_calc) { + cdp->checksum_calc = 1; + + csum2 = csum_partial(p, copy, 0); + cdp->csum = csum_block_add(cdp->csum, csum2, cdp->pos); + cdp->pos += copy; + } + /* If the alignment buffer is not empty, fill it and write it out. */ + if (cdp->extra) { + if (cdp->extra == 4) { + if (copy_extra_dword(cdp, 1)) + goto done; + } + else while (copy != 0) { + cdp->u.buf[cdp->extra] = *p++; + copy--; + cdp->offset++; + cdp->len--; + if (++cdp->extra == 4) { + if (copy_extra_dword(cdp, 1)) + goto done; + break; + } + } + } + + while (copy >= 4) { + n = copy_a_buffer(cdp, p, copy>>2, 1); + if (n == -1) + goto done; + p += n; + copy -= n; + } + /* + * Either cdp->extra is zero or copy is zero which means that + * the loop here can't cause the alignment buffer to fill up. + */ + while (copy != 0) { + cdp->u.buf[cdp->extra++] = *p++; + copy--; + cdp->offset++; + cdp->len--; + } + + cdp->checksum_calc = 0; + + if (len == 0) + goto done; + } + + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { + skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; + unsigned int end; + + end = start + frag->size; + if ((int)(copy = end - offset) > 0) { + uint8_t *vaddr; + + if (copy > len) + copy = len; + vaddr = kmap_skb_frag(frag); + p = vaddr + frag->page_offset + offset - start; + offset += copy; + len -= copy; + + if (!cdp->checksum_calc) { + cdp->checksum_calc = 1; + + csum2 = csum_partial(p, copy, 0); + cdp->csum = csum_block_add(cdp->csum, csum2, + cdp->pos); + cdp->pos += copy; + } + /* If the alignment buffer is not empty, fill it and write it out. */ + if (cdp->extra) { + if (cdp->extra == 4) { + if (copy_extra_dword(cdp, 1)) { + kunmap_skb_frag(vaddr); + goto done; + } + } + else while (copy != 0) { + cdp->u.buf[cdp->extra] = *p++; + copy--; + cdp->offset++; + cdp->len--; + + if (++cdp->extra == 4) { + if (copy_extra_dword(cdp, 1)) { + kunmap_skb_frag(vaddr); + goto done; + } + break; + } + } + } + while (copy >= 4) { + n = copy_a_buffer(cdp, p, copy>>2, 1); + if (n == -1) { + kunmap_skb_frag(vaddr); + goto done; + } + p += n; + copy -= n; + } + /* + * Either cdp->extra is zero or copy is zero which means that + * the loop here can't cause the alignment buffer to fill up. 
+ */ + while (copy != 0) { + cdp->u.buf[cdp->extra++] = *p++; + copy--; + cdp->offset++; + cdp->len--; + } + kunmap_skb_frag(vaddr); + + cdp->checksum_calc = 0; + + if (len == 0) + goto done; + } + start = end; + } + + if (skb_shinfo(skb)->frag_list) { + struct sk_buff *list = skb_shinfo(skb)->frag_list; + + for (; list; list = list->next) { + unsigned int end; + + end = start + list->len; + if ((int)(copy = end - offset) > 0) { + if (copy > len) + copy = len; + copy_and_csum_bits(list, offset - start, copy, cdp); + if (cdp->error || (len -= copy) == 0) + goto done; + offset += copy; + } + start = end; + } + } + if (len) + cdp->error = -EFAULT; +done: + /* we have to flush after trigger word for correctness on some cpus + * or WC buffer can be written out of order; needed even if + * there was an error */ + mb(); +} + +/* + * Note that the header should have the unchanging parts + * initialized but the rest of the header is computed as needed in + * order to break up skb data buffers larger than the hardware MTU. + * In other words, the Linux network stack MTU can be larger than the + * hardware MTU. + */ +int ipath_layer_send_skb(struct copy_data_s *cdata) +{ + int ret = 0; + uint16_t vlsllnh; + int device = cdata->device; + + if (device >= infinipath_max) { + _IPATH_INFO("Invalid unit %u, failing\n", device); + return -EINVAL; + } + if (!(devdata[device].ipath_flags & IPATH_RCVHDRSZ_SET)) { + _IPATH_INFO("send while not open\n"); + ret = -EINVAL; + } + else if ((devdata[device].ipath_flags & (IPATH_LINKUNK | IPATH_LINKDOWN)) + || devdata[device].ipath_lid == 0) { + /* lid check is for when sma hasn't yet configured */ + ret = -ENETDOWN; + _IPATH_VDBG("send while not ready, mylid=%u, flags=0x%x\n", + devdata[device].ipath_lid, devdata[device].ipath_flags); + } + vlsllnh = *((uint16_t *) cdata->hdr); + if (vlsllnh != htons(IPS_LRH_BTH)) { + _IPATH_DBG("Warning: lrh[0] wrong (%x, not %x); not sending\n", + vlsllnh, htons(IPS_LRH_BTH)); + ret = -EINVAL; + } + if (ret) + goto done; + + cdata->error = 0; /* clear last calls error */ + + if (cdata->skb->ip_summed == CHECKSUM_HW) { + unsigned int csstart = cdata->skb->h.raw - cdata->skb->data; + + /* + * Computing the checksum is a bit tricky since if we fragment + * the packet, the fragment that should contain the checksum + * will have already been sent. The solution is to store the checksum + * in the header of the last fragment just before we write the + * last data word which triggers the last fragment to be sent. + * The receiver will check the header "tag" field, see that + * there is a checksum, and store the checksum back into the packet. + * + * Save the offset of the two byte checksum. + * Note that we have to add 2 to account for the two bytes of the + * ethernet address we stripped from the packet and put in the header. 
+ */ + cdata->hdr->csum_offset = csstart + cdata->skb->csum + 2; + + if (cdata->offset < csstart) + copy_bits(cdata->skb, cdata->offset, + csstart - cdata->offset, cdata); + + if (cdata->error) { + ret = cdata->error; + goto done; + } + + if (cdata->offset < cdata->skb->len) + copy_and_csum_bits(cdata->skb, cdata->offset, + cdata->skb->len - cdata->offset, cdata); + + if (cdata->error) { + ret = cdata->error; + goto done; + } + + if (cdata->extra) { + while (cdata->extra < 4) + cdata->u.buf[cdata->extra++] = 0; + (void)copy_extra_dword(cdata, 1); + } + } + else { + copy_bits(cdata->skb, cdata->offset, + cdata->skb->len - cdata->offset, cdata); + + if (cdata->error) { + ret = cdata->error; + goto done; + } + + if (cdata->extra) { + while (cdata->extra < 4) + cdata->u.buf[cdata->extra++] = 0; + (void)copy_extra_dword(cdata, 1); + } + } + + if (cdata->error) { + ret = cdata->error; + if (cdata->error != -EBUSY) + _IPATH_UNIT_ERROR(device, + "layer_send copy_bits failed with error %d\n", + -ret); + } + + ipath_stats.sps_ether_spkts++; /* another ether packet sent */ + +done: + /* we have to flush after trigger word for correctness on some cpus + * or WC buffer can be written out of order; needed even if + * there was an error */ + mb(); + return ret; +} + +EXPORT_SYMBOL(ipath_layer_send_skb); + From bos at pathscale.com Wed Dec 28 16:31:27 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 28 Dec 2005 16:31:27 -0800 Subject: [openib-general] [PATCH 8 of 20] ipath - core driver, part 1 of 4 In-Reply-To: Message-ID: Signed-off-by: Bryan O'Sullivan diff -r ffbd416f30d4 -r ddd21709e12c drivers/infiniband/hw/ipath/ipath_driver.c --- /dev/null Thu Jan 1 00:00:00 1970 +0000 +++ b/drivers/infiniband/hw/ipath/ipath_driver.c Wed Dec 28 14:19:42 2005 -0800 @@ -0,0 +1,1879 @@ +/* + * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. 
+ */ + +#include +#include +#include +#include +#include +#include + +#include /* we can generate our own crc's for testing */ + +#include "ipath_kernel.h" +#include "ips_common.h" +#include "ipath_layer.h" + +/* + * Our LSB-assigned major number, so scripts can figure + * out how to make entry in /dev. + */ + +static int ipath_major = 233; + +/* + * number of buffers reserved for driver (layered drivers and SMA send). + * Reserved at end of buffer list. + */ + +static uint infinipath_kpiobufs = 32; + +/* + * number of ports we are configured to use (to allow for more pio + * buffers per port, etc.) Zero means use chip value. + */ + +static uint infinipath_cfgports; + +/* + * number of units we are configured to use (to allow for bringup on + * multi-chip systems) Zero means use only one for now, but eventually + * will mean to use infinipath_max + */ + +static uint infinipath_cfgunits; + +uint64_t ipath_dummy_val_for_testing; + +static __kernel_pid_t ipath_sma_alive; /* PID of SMA, if it's running */ +static spinlock_t ipath_sma_lock; /* SMA receive */ + +/* max SM received packets we'll queue; we keep the most recent packets. */ + +#define IPATH_NUM_SMAPKTS 16 + +#define IPATH_SMA_HDRSZ (8+12+8) /* LRH+BTH+DETH */ + +static struct _ipath_sma_rpkt { + /* length of received packet; non-zero if queued */ + uint32_t len; + /* unit number of interface packet was received from */ + uint32_t unit; + uint8_t *buf; +} ipath_sma_data[IPATH_NUM_SMAPKTS]; + +static unsigned ipath_sma_first; /* oldest sma packet index */ +static unsigned ipath_sma_next; /* next sma packet index to use */ + +/* + * ipath_sma_data_bufs has one extra, pointed to by ipath_sma_data_spare, + * so we can exchange buffers to do copy_to_user, and not hold the lock + * across the copy_to_user(). + */ + +#define SMA_MAX_PKTSZ (IPATH_SMA_HDRSZ+256) /* max len of an SMA packet */ + +static uint8_t ipath_sma_data_bufs[IPATH_NUM_SMAPKTS + 1][SMA_MAX_PKTSZ]; +static uint8_t *ipath_sma_data_spare; +/* sma waits globally on all units */ +static wait_queue_head_t ipath_sma_wait; +static wait_queue_head_t ipath_sma_state_wait; + +struct infinipath_stats ipath_stats; + +/* + * this will only be used for diags, now that we have enabled the DMA + * of the sendpioavail regs to system memory. + */ + +static inline uint64_t ipath_kget_sreg(const ipath_type stype, + ipath_sreg regno) +{ + uint64_t val; + uint64_t *sbase; + + sbase = (uint64_t *) (devdata[stype].ipath_sregbase + + (char *)devdata[stype].ipath_kregbase); + val = sbase ? 
sbase[regno] : 0ULL; + return val; +} + +static int ipath_do_user_init(struct ipath_portdata *, + struct ipath_user_info __user *); +static int ipath_get_baseinfo(struct ipath_portdata *, + struct ipath_base_info __user *); +static int ipath_get_units(void); +static int ipath_wr_eeprom(struct ipath_portdata *, + struct ipath_eeprom_req __user *); +static int ipath_wait_intr(struct ipath_portdata *, uint32_t); +static int ipath_tid_update(struct ipath_portdata *, struct _tidupd __user *); +static int ipath_tid_free(struct ipath_portdata *, struct _tidupd __user *); +static int ipath_get_counters(ipath_type, struct infinipath_counters __user *); +static int ipath_get_unit_counters(struct infinipath_getunitcounters __user *a); +static int ipath_get_stats(struct infinipath_stats __user *); +static int ipath_set_partkey(struct ipath_portdata *, uint16_t); +static int ipath_manage_rcvq(struct ipath_portdata *, uint16_t); +static void ipath_clean_partkey(struct ipath_portdata *, + struct ipath_devdata *); +static void ipath_disarm_piobufs(const ipath_type, unsigned, unsigned); +static int ipath_create_user_egr(struct ipath_portdata *); +static int ipath_create_port0_egr(struct ipath_portdata *); +static int ipath_create_rcvhdrq(struct ipath_portdata *); +static void ipath_handle_errors(const ipath_type, uint64_t); +static void ipath_update_pio_bufs(const ipath_type); +static int ipath_shutdown_link(const ipath_type); +static int ipath_bringup_link(const ipath_type); +int ipath_bringup_serdes(const ipath_type); +static void ipath_get_faststats(unsigned long); +static int ipath_setup_htconfig(struct pci_dev *, uint64_t *, const ipath_type); +static struct page *ipath_nopage(struct vm_area_struct *, unsigned long, int *); +static irqreturn_t ipath_intr(int irq, void *devid, struct pt_regs *regs); +static void ipath_decode_err(char *, size_t, uint64_t); +void ipath_free_pddata(struct ipath_devdata *, uint32_t, int); +static void ipath_clear_tids(const ipath_type, unsigned); +static void ipath_get_guid(const ipath_type); +static int ipath_sma_ioctl(struct file *, unsigned int, unsigned long); +static int ipath_rcvsma_pkt(struct ipath_sendpkt __user *); +static int ipath_kset_lid(uint32_t); +static int ipath_kset_mlid(uint32_t); +static int ipath_get_mlid(uint32_t __user *); +static int ipath_get_devstatus(uint64_t __user *); +static int ipath_kset_guid(struct ipath_setguid __user *); +static int ipath_get_portinfo(uint32_t __user *); +static int ipath_get_nodeinfo(uint32_t __user *); +#ifdef _IPATH_EXTRA_DEBUG +static void ipath_dump_allregs(char *, ipath_type); +#endif + +static const char ipath_sma_name[] = "infinipath_SMA"; + +/* + * is diags mode enabled? if it is, then things like auto bringup of + * links is disabled + */ + +int ipath_diags_enabled = 0; + +void ipath_chip_done(void) +{ +} + +void ipath_chip_cleanup(struct ipath_devdata * dd) +{ +} + +/* + * cache aligned location + * + * where port 0 rcvhdrtail register is written back; also want + * nothing else sharing the cache line, so make it a cache line in size + * used for all units + * + * This is volatile as it's the target of a DMA from the chip. 
+ */ + +static volatile uint64_t ipath_port0_rcvhdrtail[512] + __attribute__ ((aligned(4096))); + +#define MODNAME "ipath_core" +#define DRIVER_LOAD_MSG "PathScale " MODNAME " loaded: " +#define PFX MODNAME ": " + +/* + * min buffers we want to have per port, after driver + */ + +#define IPATH_MIN_USER_PORT_BUFCNT 8 + +/* The size has to be longer than this string, so we can + * append board/chip information to it in the init code. + */ +static char ipath_core_version[192] = IPATH_IDSTR; +static char *chip_driver_version; +static int chip_driver_size; + +/* mylid and lidbase are to deal with LIDs in "fabric", until SM is working */ + +module_param(infinipath_debug, uint, 0644); +module_param(infinipath_kpiobufs, uint, 0644); +module_param(infinipath_cfgports, uint, 0644); +module_param(infinipath_cfgunits, uint, 0644); + +MODULE_PARM_DESC(infinipath_debug, "mask for debug prints"); +MODULE_PARM_DESC(infinipath_cfgports, "Set max number of ports to use"); +MODULE_PARM_DESC(infinipath_cfgunits, "Set max number of devices to use"); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("PathScale "); +MODULE_DESCRIPTION("Pathscale InfiniPath driver"); + +#ifdef IPATH_DIAG +static __kernel_pid_t ipath_diag_alive; /* PID of diags, if running */ +int ipath_diags_ioctl(struct file *, unsigned, unsigned long); +static int ipath_opendiag(struct inode *, struct file *); +#endif + +#if __IPATH_INFO || __IPATH_DBG +static const char *ipath_ibcstatus_str[] = { + "Disabled", + "LinkUp", + "PollActive", + "PollQuiet", + "SleepDelay", + "SleepQuiet", + "LState6", /* unused */ + "LState7", /* unused */ + "CfgDebounce", + "CfgRcvfCfg", + "CfgWaitRmt", + "CfgIdle", + "RecovRetrain", + "LState0xD", /* unused */ + "RecovWaitRmt", + "RecovIdle", +}; +#endif + +static ssize_t show_version(struct device_driver *dev, char *buf) +{ + return snprintf(buf, PAGE_SIZE, "%s", ipath_core_version); +} + +static ssize_t show_status(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct ipath_devdata *dd = dev_get_drvdata(dev); + + if (!dd) + return -EINVAL; + + if (!dd->ipath_statusp) + return -EINVAL; + + return snprintf(buf, PAGE_SIZE, "%llx\n", *(dd->ipath_statusp)); +} + +static const char *ipath_status_str[] = { + "Initted", + "Disabled", + "4", /* unused */ + "OIB_SMA", + "SMA", + "Present", + "IB_link_up", + "IB_configured", + "NoIBcable", + "Fatal_Hardware_Error", + NULL, +}; + +static ssize_t show_status_str(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct ipath_devdata *dd = dev_get_drvdata(dev); + int i, any; + uint64_t s; + + if (!dd) + return -EINVAL; + + if (!dd->ipath_statusp) + return -EINVAL; + + s = *(dd->ipath_statusp); + *buf = '\0'; + for (any = i = 0; s && ipath_status_str[i]; i++) { + if (s & 1) { + if (any && strlcat(buf, " ", PAGE_SIZE) >= PAGE_SIZE) + /* overflow */ + break; + if (strlcat(buf, ipath_status_str[i], + PAGE_SIZE) >= PAGE_SIZE) + break; + any = 1; + } + s >>= 1; + } + if (any) + strlcat(buf, "\n", PAGE_SIZE); + + return strlen(buf); +} + +static ssize_t show_lid(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct ipath_devdata *dd = dev_get_drvdata(dev); + + if (!dd) + return -EINVAL; + + return snprintf(buf, PAGE_SIZE, "%x\n", dd->ipath_lid); +} + +static ssize_t show_mlid(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct ipath_devdata *dd = dev_get_drvdata(dev); + + if (!dd) + return -EINVAL; + + return snprintf(buf, PAGE_SIZE, "%x\n", dd->ipath_mlid); +} + +static ssize_t show_guid(struct 
device *dev, + struct device_attribute *attr, + char *buf) +{ + struct ipath_devdata *dd = dev_get_drvdata(dev); + uint8_t *guid; + + if (!dd) + return -EINVAL; + + guid = (uint8_t *)&(dd->ipath_guid); + + return snprintf(buf, PAGE_SIZE, "%x:%x:%x:%x:%x:%x:%x:%x\n", + guid[0], guid[1], guid[2], guid[3], guid[4], guid[5], + guid[6], guid[7]); +} + +static ssize_t show_nguid(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct ipath_devdata *dd = dev_get_drvdata(dev); + + if (!dd) + return -EINVAL; + + return snprintf(buf, PAGE_SIZE, "%u\n", dd->ipath_nguid); +} + +static ssize_t show_serial(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct ipath_devdata *dd = dev_get_drvdata(dev); + + if (!dd) + return -EINVAL; + + buf[sizeof dd->ipath_serial] = '\0'; + memcpy(buf, dd->ipath_serial, sizeof dd->ipath_serial); + strcat(buf, "\n"); + return strlen(buf); +} + +static ssize_t show_unit(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct ipath_devdata *dd = dev_get_drvdata(dev); + + if (!dd) + return -EINVAL; + + snprintf(buf, PAGE_SIZE, "%u\n", dd->ipath_unit); + return strlen(buf); +} + +static DRIVER_ATTR(version, S_IRUGO, show_version, NULL); +static DEVICE_ATTR(status, S_IRUGO, show_status, NULL); +static DEVICE_ATTR(status_str, S_IRUGO, show_status_str, NULL); +static DEVICE_ATTR(lid, S_IRUGO, show_lid, NULL); +static DEVICE_ATTR(mlid, S_IRUGO, show_mlid, NULL); +static DEVICE_ATTR(guid, S_IRUGO, show_guid, NULL); +static DEVICE_ATTR(nguid, S_IRUGO, show_nguid, NULL); +static DEVICE_ATTR(serial, S_IRUGO, show_serial, NULL); +static DEVICE_ATTR(unit, S_IRUGO, show_unit, NULL); + +/* + * called from add_timer and user counter read calls, to deal with + * counters that wrap in "human time". The words sent and received, and + * the packets sent and received are all that we worry about. For now, + * at least, we don't worry about error counters, because if they wrap + * that quickly, we probably don't care. We may eventually just make this + * handle all the counters. word counters can wrap in about 20 seconds + * of full bandwidth traffic, packet counters in a few hours. + */ + +uint64_t ipath_snap_cntr(const ipath_type t, ipath_creg creg) +{ + uint32_t val; + uint64_t val64, t0, t1; + struct ipath_devdata *dd = &devdata[t]; + static uint64_t one_sec_in_cycles; + extern uint32_t _ipath_pico_per_cycle; + + if (!one_sec_in_cycles && _ipath_pico_per_cycle) + one_sec_in_cycles = 1000000000000UL / _ipath_pico_per_cycle; + + t0 = get_cycles(); + val = ipath_kget_creg32(t, creg); + t1 = get_cycles(); + if ((t1 - t0) > one_sec_in_cycles && val == -1) { + /* + * This is just a way to detect things that are quite broken. + * Normally this should take just a few cycles (the check is + * for long enough that we don't care if we get pre-empted.) + * An Opteron HT O read timeout is 4 seconds with normal + * NB values + */ + + _IPATH_UNIT_ERROR(t, "Error! 
Reading counter 0x%x timed out\n", + creg); + return 0ULL; + } + + if (creg == cr_wordsendcnt) { + if (val != dd->ipath_lastsword) { + dd->ipath_sword += val - dd->ipath_lastsword; + dd->ipath_lastsword = val; + } + val64 = dd->ipath_sword; + } else if (creg == cr_wordrcvcnt) { + if (val != dd->ipath_lastrword) { + dd->ipath_rword += val - dd->ipath_lastrword; + dd->ipath_lastrword = val; + } + val64 = dd->ipath_rword; + } else if (creg == cr_pktsendcnt) { + if (val != dd->ipath_lastspkts) { + dd->ipath_spkts += val - dd->ipath_lastspkts; + dd->ipath_lastspkts = val; + } + val64 = dd->ipath_spkts; + } else if (creg == cr_pktrcvcnt) { + if (val != dd->ipath_lastrpkts) { + dd->ipath_rpkts += val - dd->ipath_lastrpkts; + dd->ipath_lastrpkts = val; + } + val64 = dd->ipath_rpkts; + } else + val64 = (uint64_t) val; + + return val64; +} + +/* + * print the delta of egrfull/hdrqfull errors for kernel ports no more + * than every 5 seconds. User processes are printed at close, but kernel + * doesn't close, so... Separate routine so it may be called from other places + * someday, and so the function name is meaningful when printed by _IPATH_INFO + */ + +static void ipath_qcheck(const ipath_type t) +{ + static uint64_t last_tot_hdrqfull; + size_t blen = 0; + struct ipath_devdata *dd = &devdata[t]; + char buf[128]; + + *buf = 0; + if (dd->ipath_pd[0]->port_hdrqfull != dd->ipath_p0_hdrqfull) { + blen = snprintf(buf, sizeof buf, "port 0 hdrqfull %u", + dd->ipath_pd[0]->port_hdrqfull - + dd->ipath_p0_hdrqfull); + dd->ipath_p0_hdrqfull = dd->ipath_pd[0]->port_hdrqfull; + } + if (ipath_stats.sps_etidfull != dd->ipath_last_tidfull) { + blen += + snprintf(buf + blen, sizeof buf - blen, "%srcvegrfull %llu", + blen ? ", " : "", + ipath_stats.sps_etidfull - dd->ipath_last_tidfull); + dd->ipath_last_tidfull = ipath_stats.sps_etidfull; + } + + /* + * this is actually the number of hdrq full interrupts, not actual + * events, but at the moment that's mostly what I'm interested in. + * Actual count, etc. is in the counters, if needed. For production + * users this won't ordinarily be printed. + */ + + if ((infinipath_debug & (__IPATH_PKTDBG | __IPATH_DBG)) && + ipath_stats.sps_hdrqfull != last_tot_hdrqfull) { + blen += + snprintf(buf + blen, sizeof buf - blen, + "%shdrqfull %llu (all ports)", blen ? ", " : "", + ipath_stats.sps_hdrqfull - last_tot_hdrqfull); + last_tot_hdrqfull = ipath_stats.sps_hdrqfull; + } + if (blen) + _IPATH_DBG("%s\n", buf); + + if (*dd->ipath_hdrqtailptr != dd->ipath_port0head) { + if (dd->ipath_lastport0rcv_cnt == ipath_stats.sps_port0pkts) { + _IPATH_PDBG("missing rcv interrupts? 
port0 hd=%llx tl=%x; port0pkts %llx\n", + *dd->ipath_hdrqtailptr, dd->ipath_port0head,ipath_stats.sps_port0pkts); + ipath_kreceive(t); + } + dd->ipath_lastport0rcv_cnt = ipath_stats.sps_port0pkts; + } +} + +/* + * called from add_timer to get word counters from chip before they + * can overflow + */ + +static void ipath_get_faststats(unsigned long t) +{ + uint32_t val; + struct ipath_devdata *dd = &devdata[t]; + static unsigned cnt; + + /* + * don't access the chip while running diags, or memory diags + * can fail + */ + if (!dd->ipath_kregbase || !(dd->ipath_flags & IPATH_PRESENT) || + ipath_diags_enabled) { + /* but re-arm the timer, for diags case; won't hurt other */ + goto done; + } + + ipath_snap_cntr((ipath_type) t, cr_wordsendcnt); + ipath_snap_cntr((ipath_type) t, cr_wordrcvcnt); + ipath_snap_cntr((ipath_type) t, cr_pktsendcnt); + ipath_snap_cntr((ipath_type) t, cr_pktrcvcnt); + + ipath_qcheck(t); + + /* + * deal with repeat error suppression. Doesn't really matter if + * last error was almost a full interval ago, or just a few usecs + * ago; still won't get more than 2 per interval. We may want + * longer intervals for this eventually, could do with mod, counter + * or separate timer. Also see code in ipath_handle_errors() and + * ipath_handle_hwerrors(). + */ + + if (dd->ipath_lasterror) + dd->ipath_lasterror = 0; + if (dd->ipath_lasthwerror) + dd->ipath_lasthwerror = 0; + if ((devdata[t].ipath_maskederrs & ~devdata[t].ipath_ignorederrs) + && get_cycles() > devdata[t].ipath_unmasktime) { + char ebuf[256]; + ipath_decode_err(ebuf, sizeof ebuf, + (devdata[t].ipath_maskederrs & ~devdata[t]. + ipath_ignorederrs)); + if ((devdata[t].ipath_maskederrs & ~devdata[t]. + ipath_ignorederrs) + & ~(INFINIPATH_E_RRCVEGRFULL | INFINIPATH_E_RRCVHDRFULL)) { + _IPATH_UNIT_ERROR(t, "Re-enabling masked errors (%s)\n", + ebuf); + } else { + /* + * rcvegrfull and rcvhdrqfull are "normal", + * for some types of processes (mostly benchmarks) + * that send huge numbers of messages, while + * not processing them. So only complain about + * these at debug level. 
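
The ipath_snap_cntr() routine above folds each raw 32-bit hardware reading into a 64-bit software total. A self-contained sketch of that folding (demo_* names hypothetical, not driver API):

    #include <stdint.h>

    struct demo_cntr {
            uint32_t last;   /* previous raw 32-bit hardware value */
            uint64_t total;  /* accumulated 64-bit count */
    };

    static uint64_t demo_snap(struct demo_cntr *c, uint32_t hw)
    {
            /* unsigned subtraction is wrap-safe provided the counter
             * advances by less than 2^32 between snapshots */
            c->total += (uint32_t)(hw - c->last);
            c->last = hw;
            return c->total;
    }

That proviso is why the timer resnapshots every 5 seconds, comfortably inside the roughly 20 second wrap time quoted above for the word counters.
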
+ */ + _IPATH_DBG + ("Disabling frequent queue full errors (%s)\n", + ebuf); + } + devdata[t].ipath_maskederrs = devdata[t].ipath_ignorederrs; + ipath_kput_kreg(t, kr_errormask, ~devdata[t].ipath_maskederrs); + } + + if (dd->ipath_flags & IPATH_LINK_SLEEPING) { + uint64_t ibc; + _IPATH_VDBG("linkinitcmd SLEEP, move to POLL\n"); + dd->ipath_flags &= ~IPATH_LINK_SLEEPING; + ibc = dd->ipath_ibcctrl; + /* + * don't put linkinitcmd in ipath_ibcctrl, want that to + * stay a NOP + */ + ibc |= + INFINIPATH_IBCC_LINKINITCMD_POLL << + INFINIPATH_IBCC_LINKINITCMD_SHIFT; + ipath_kput_kreg(t, kr_ibcctrl, ibc); + } + + /* limit qfull messages to ~one per minute per port */ + if ((++cnt & 0x10)) { + for (val = devdata[t].ipath_cfgports - 1; ((int)val) >= 0; + val--) { + if (dd->ipath_lastegrheads[val] != -1) + dd->ipath_lastegrheads[val] = -1; + if (dd->ipath_lastrcvhdrqtails[val] != -1) + dd->ipath_lastrcvhdrqtails[val] = -1; + } + } + + if (dd->ipath_nosma_bufs) { + dd->ipath_nosma_secs += 5; + if (dd->ipath_nosma_secs >= 30) { + _IPATH_SMADBG("No SMA bufs avail %u seconds; cancelling pending sends\n", + dd->ipath_nosma_secs); + ipath_disarm_piobufs(t, dd->ipath_lastport_piobuf, + dd->ipath_piobcnt - dd->ipath_lastport_piobuf); + dd->ipath_nosma_secs = 0; /* start again, if necessary */ + } + else + _IPATH_SMADBG("No SMA bufs avail %u tries, after %u seconds\n", + dd->ipath_nosma_bufs, dd->ipath_nosma_secs); + } + +done: + mod_timer(&dd->ipath_stats_timer, jiffies + HZ * 5); +} + + +static void __devexit infinipath_remove_one(struct pci_dev *); +static int infinipath_init_one(struct pci_dev *, const struct pci_device_id *); + +/* Only needed for registration, nothing else needs this info */ +#define PCI_VENDOR_ID_PATHSCALE 0x1fc1 +#define PCI_DEVICE_ID_PATHSCALE_INFINIPATH_HT 0xd + +const struct pci_device_id infinipath_pci_tbl[] = { + { + PCI_VENDOR_ID_PATHSCALE, PCI_DEVICE_ID_PATHSCALE_INFINIPATH_HT, + PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, + {0,} +}; + +MODULE_DEVICE_TABLE(pci, infinipath_pci_tbl); + +static struct pci_driver infinipath_driver = { + .name = MODNAME, + .driver.owner = THIS_MODULE, + .probe = infinipath_init_one, + .remove = __devexit_p(infinipath_remove_one), + .id_table = infinipath_pci_tbl, +}; + +#if defined (pgprot_writecombine) && defined(_PAGE_MA_WC) +int remap_area_pages(unsigned long address, unsigned long phys_addr, + unsigned long size, unsigned long flags); +#endif + +static int infinipath_init_one(struct pci_dev *pdev, + const struct pci_device_id *ent) +{ + int ret, len, j; + static int chip_idx = -1; + unsigned long addr; + uint64_t intconfig; + uint8_t rev; + ipath_type dev; + + /* + * XXX: Right now, we have a hardcoded array of devices. We'll + * change this in a future release, but not just yet. For the + * moment, we're limited to 4 infinipath devices per system. 
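
The probe path below is driven by infinipath_pci_tbl above; conceptually, the PCI core does a linear match of each device's IDs against the table, roughly like this user-space sketch (demo_* names hypothetical):

    #include <stdint.h>

    struct demo_pci_id { uint16_t vendor, device; };

    static const struct demo_pci_id demo_tbl[] = {
            { 0x1fc1, 0xd },        /* PathScale HT-400 */
            { 0, 0 },               /* terminator */
    };

    static const struct demo_pci_id *demo_match(uint16_t ven, uint16_t dev)
    {
            const struct demo_pci_id *id;
            for (id = demo_tbl; id->vendor; id++)
                    if (id->vendor == ven && id->device == dev)
                            return id;
            return 0;               /* no match: probe is not called */
    }
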
+ */ + + dev = ++chip_idx; + + _IPATH_VDBG("initializing unit #%u\n", dev); + if ((!infinipath_cfgunits && (dev >= 1)) || + (infinipath_cfgunits && (dev >= infinipath_cfgunits)) || + (dev >= infinipath_max)) { + _IPATH_ERROR("Trying to initialize unit %u, max is %u\n", + dev, infinipath_max - 1); + return -EINVAL; + } + + devdata[dev].pci_registered = 1; + devdata[dev].ipath_unit = dev; + + if ((ret = pci_enable_device(pdev))) { + _IPATH_DBG("pci_enable unit %u failed: %x\n", dev, ret); + } + + if ((ret = pci_request_regions(pdev, MODNAME))) + _IPATH_INFO("pci_request_regions unit %u fails: %d\n", dev, + ret); + + if ((ret = pci_set_dma_mask(pdev, DMA_64BIT_MASK)) != 0) + _IPATH_INFO("pci_set_dma_mask unit %u fails: %d\n", dev, ret); + + pci_set_master(pdev); /* probably not be needed for HT */ + + addr = pci_resource_start(pdev, 0); + len = pci_resource_len(pdev, 0); + _IPATH_VDBG + ("regbase (0) %lx len %d irq %x, vend %x/%x driver_data %lx\n", + addr, len, pdev->irq, ent->vendor, ent->device, ent->driver_data); + devdata[dev].ipath_deviceid = ent->device; /* save for later use */ + devdata[dev].ipath_vendorid = ent->vendor; + for (j = 0; j < 6; j++) { + if (!pdev->resource[j].start) + continue; + _IPATH_VDBG("BAR %d start %lx, end %lx, len %lx\n", + j, pdev->resource[j].start, + pdev->resource[j].end, pci_resource_len(pdev, j)); + } + + if (!addr) { + _IPATH_UNIT_ERROR(dev, "No valid address in BAR 0!\n"); + return -ENODEV; + } + + if ((ret = pci_read_config_byte(pdev, PCI_REVISION_ID, &rev))) { + _IPATH_UNIT_ERROR(dev, + "Failed to read PCI revision ID unit %u: %d\n", + dev, ret); + return ret; /* shouldn't ever happen */ + } else + devdata[dev].ipath_pcirev = rev; + + devdata[dev].ipath_kregbase = ioremap_nocache(addr, len); +#if defined (pgprot_writecombine) && defined(_PAGE_MA_WC) + printk("Remapping pages WC\n"); + remap_area_pages((unsigned long) devdata[dev].ipath_kregbase + + 1024 * 1024, addr + 1024 * 1024, 1024 * 1024, + _PAGE_MA_WC); + /* devdata[dev].ipath_kregbase = __ioremap(addr, len, _PAGE_MA_WC); */ +#endif + + if (!devdata[dev].ipath_kregbase) { + _IPATH_DBG("Unable to map io addr %lx to kvirt, failing\n", + addr); + ret = -ENOMEM; + goto fail; + } + devdata[dev].ipath_kregend = (uint64_t __iomem *) + ((void __iomem *) devdata[dev].ipath_kregbase + len); + devdata[dev].ipath_physaddr = addr; /* used for io_remap, etc. */ + /* for user mmap */ + devdata[dev].ipath_kregvirt = (uint64_t __iomem *) phys_to_virt(addr); + _IPATH_VDBG("mapped io addr %lx to kregbase %p kregvirt %p\n", addr, + devdata[dev].ipath_kregbase, devdata[dev].ipath_kregvirt); + + /* + * set these up before registering the interrupt handler, just + * in case + */ + devdata[dev].pcidev = pdev; + pci_set_drvdata(pdev, &(devdata[dev])); + + /* + * set up our interrupt handler; SA_SHIRQ probably not needed, + * but won't hurt for now. + */ + + if (!pdev->irq) { + _IPATH_UNIT_ERROR(dev, "irq is 0, failing init\n"); + ret = -EINVAL; + goto fail; + } + if ((ret = request_irq(pdev->irq, ipath_intr, + SA_SHIRQ, MODNAME, &devdata[dev]))) { + _IPATH_UNIT_ERROR(dev, + "Couldn't setup interrupt handler, irq=%u: %d\n", + pdev->irq, ret); + goto fail; + } + + /* + * clear ipath_flags here instead of in ipath_init_chip as it is set + * by ipath_setup_htconfig. 
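
Note that the probe above intentionally logs and carries on after pci_enable_device() or pci_request_regions() failures rather than bailing out immediately. The more conventional shape unwinds each completed step on failure; a sketch under the 2.6-era PCI API (the demo name and error labels are hypothetical):

    static int demo_probe(struct pci_dev *pdev)
    {
            void __iomem *regs;
            int ret;

            ret = pci_enable_device(pdev);
            if (ret)
                    return ret;
            ret = pci_request_regions(pdev, "demo");
            if (ret)
                    goto err_disable;
            regs = ioremap_nocache(pci_resource_start(pdev, 0),
                                   pci_resource_len(pdev, 0));
            if (!regs) {
                    ret = -ENOMEM;
                    goto err_regions;
            }
            return 0;

    err_regions:
            pci_release_regions(pdev);
    err_disable:
            pci_disable_device(pdev);
            return ret;
    }
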
+ */ + devdata[dev].ipath_flags = 0; + if (ipath_setup_htconfig(pdev, &intconfig, dev)) + _IPATH_DBG + ("Failed to setup HT config, continuing anyway for now\n"); + + ret = ipath_init_chip(dev); /* do the chip-specific init */ + if (!ret) { +#ifdef CONFIG_MTRR + uint64_t pioaddr, piolen; + unsigned bits; + /* + * Set the PIO buffers to be WCCOMB, so we get HT bursts + * to the chip. Linux (possibly the hardware) requires + * it to be on a power of 2 address matching the length + * (which has to be a power of 2). For rev1, that means + * the base address, for rev2, it will be just the PIO + * buffers themselves. + */ + pioaddr = addr + devdata[dev].ipath_piobufbase; + piolen = devdata[dev].ipath_piobcnt * + ALIGN(devdata[dev].ipath_piosize, + devdata[dev].ipath_palign); + + for (bits = 0; !(piolen & (1ULL << bits)); bits++) + /* do nothing */; + + if (piolen != (1ULL << bits)) { + _IPATH_DBG("piolen 0x%llx not power of 2, bits=%u\n", + piolen, bits); + piolen >>= bits; + while (piolen >>= 1) + bits++; + piolen = 1ULL << (bits + 1); + _IPATH_DBG("Changed piolen to 0x%llx bits=%u\n", piolen, + bits); + } + if (pioaddr & (piolen - 1)) { + uint64_t atmp; + _IPATH_DBG + ("pioaddr %llx not on right boundary for size %llx, fixing\n", + pioaddr, piolen); + atmp = pioaddr & ~(piolen - 1); + if (atmp < addr || (atmp + piolen) > (addr + len)) { + _IPATH_UNIT_ERROR(dev, + "No way to align address/size (%llx/%llx), no WC mtrr\n", + atmp, piolen << 1); + ret = -ENODEV; + } else { + _IPATH_DBG + ("changing WC base from %llx to %llx, len from %llx to %llx\n", + pioaddr, atmp, piolen, piolen << 1); + pioaddr = atmp; + piolen <<= 1; + } + } + + if (!ret) { + int cookie; + _IPATH_VDBG + ("Setting mtrr for chip to WC (addr %llx, len=0x%llx)\n", + pioaddr, piolen); + cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 0); + if (cookie < 0) { + _IPATH_INFO + ("mtrr_add(%llx,0x%llx,WC,0) failed (%d)\n", + pioaddr, piolen, cookie); + ret = -EINVAL; + } else { + _IPATH_VDBG + ("Set mtrr for chip to WC, cookie is %d\n", + cookie); + devdata[dev].ipath_mtrr = (uint32_t) cookie; + } + } +#endif /* CONFIG_MTRR */ + } + + if (!ret && devdata[dev].ipath_kregbase && (devdata[dev].ipath_flags + & IPATH_PRESENT)) { + /* + * for the hardware, enable interrupts only after + * kr_interruptconfig is written, if we could set it up + */ + if (intconfig) { + /* interrupt address */ + ipath_kput_kreg(dev, kr_interruptconfig, intconfig); + /* enable all interrupts */ + ipath_kput_kreg(dev, kr_intmask, -1LL); + /* force re-interrupt of any pending interrupts. */ + ipath_kput_kreg(dev, kr_intclear, 0ULL); + /* OK, the chip is usable, marked it as initialized */ + *devdata[dev].ipath_statusp |= IPATH_STATUS_INITTED; + } else + _IPATH_UNIT_ERROR(dev, + "No interrupts enabled, couldn't setup interrupt address\n"); + } else if (ret != -EPERM) + _IPATH_INFO("Not configuring unit %u interrupts, init failed\n", + dev); + + device_create_file(&(pdev->dev), &dev_attr_status); + device_create_file(&(pdev->dev), &dev_attr_status_str); + device_create_file(&(pdev->dev), &dev_attr_lid); + device_create_file(&(pdev->dev), &dev_attr_mlid); + device_create_file(&(pdev->dev), &dev_attr_guid); + device_create_file(&(pdev->dev), &dev_attr_nguid); + device_create_file(&(pdev->dev), &dev_attr_serial); + device_create_file(&(pdev->dev), &dev_attr_unit); + + /* + * We used to cleanup here, with pci_release_regions, etc. but that + * can cause other problems if we want to run diags, etc., so instead + * defer that until driver unload. 
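
The MTRR logic above works because a write-combining range must be a power of two in size and base-aligned to that size; the two computations, separated out as a sketch (demo_* names hypothetical):

    #include <stdint.h>

    /* round len up to the next power of two (len must be nonzero) */
    static uint64_t demo_pow2up(uint64_t len)
    {
            uint64_t p = 1;
            while (p < len)
                    p <<= 1;
            return p;
    }

    /* usable as a WC region only if base is aligned to the size */
    static int demo_wc_ok(uint64_t base, uint64_t len)
    {
            uint64_t size = demo_pow2up(len);
            return (base & (size - 1)) == 0;
    }
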
+ */ + +fail: /* after we've done at least some of the pci setup */ + if (ret == -EPERM) /* disabled device, don't want module load error; + * just want to carry status through to this point */ + ret = 0; + + return ret; +} + + + +#define HT_CAPABILITY_ID 0x08 /* HT capabilities not defined in kernel */ +#define HT_INTR_DISC_CONFIG 0x80 /* HT interrupt and discovery cap */ +#define HT_INTR_REG_INDEX 2 /* intconfig requires indirect accesses */ + +/* + * setup the interruptconfig register from the HT config info. + * Also clear CRC errors in HT linkcontrol, if necessary. + * This is done only for the real hardware. It is done before + * chip address space is initted, so can't touch infinipath registers + */ + +static int ipath_setup_htconfig(struct pci_dev *pdev, uint64_t * iaddr, + const ipath_type t) +{ + uint8_t cap_type; + uint32_t int_handler_addr_lower; + uint32_t int_handler_addr_upper; + uint64_t ihandler = 0; + int i, pos, ret = 0; + + *iaddr = 0ULL; /* init to zero in case not able to configure */ + + /* + * Read the capability info to find the interrupt info, and also + * handle clearing CRC errors in linkctrl register if necessary. + * We do this early, before we ever enable errors or hardware errors, + * mostly to avoid causing the chip to enter freeze mode. + */ + if (!(pos = pci_find_capability(pdev, HT_CAPABILITY_ID))) { + _IPATH_UNIT_ERROR(t, + "Couldn't find HyperTransport capability; no interrupts\n"); + return -ENODEV; + } + do { + /* the HT capability type byte is 3 bytes after the + * capability byte. + */ + if (pci_read_config_byte(pdev, pos+3, &cap_type)) { + _IPATH_INFO + ("Couldn't read config command @ %d\n", pos); + continue; + } + if (!(cap_type & 0xE0)) { + /* bits 13-15 of command==0 is slave/primary block. + * Clear any HT CRC errors. We only bother to + * do this at load time, because it's OK if it + * happened before we were loaded (first time + * after boot/reset), but any time after that, + * it's fatal anyway. Also need to not check for + * for upper byte errors if we are in 8 bit mode, + * so figure out our width. For now, at least, + * also complain if it's 8 bit. + */ + uint8_t linkwidth = 0, linkerr, link_a_b_off, link_off; + uint16_t linkctrl = 0; + + devdata[t].ipath_ht_slave_off = pos; + /* command word, master_host bit */ + if ((cap_type >> 2) & 1) /* master host || slave */ + link_a_b_off = 4; + else + link_a_b_off = 0; + _IPATH_VDBG("HT%u (Link %c) connected to processor\n", + link_a_b_off ? 1 : 0, + link_a_b_off ? 'B' : 'A'); + + link_a_b_off += pos; + + /* + * check both link control registers; clear both + * HT CRC sets if necessary. + */ + + for (i = 0; i < 2; i++) { + link_off = pos + i * 4 + 0x4; + if (pci_read_config_word + (pdev, link_off, &linkctrl)) + _IPATH_UNIT_ERROR(t, + "Couldn't read HT link control%d register\n", + i); + else if (linkctrl & (0xf << 8)) { + _IPATH_VDBG + ("Clear linkctrl%d CRC Error bits %x\n", + i, linkctrl & (0xf << 8)); + /* + * now write them back to clear + * the error. + */ + pci_write_config_byte(pdev, link_off, + linkctrl & (0xf << + 8)); + } + } + + /* + * As with HT CRC bits, same for protocol errors + * that might occur during boot. 
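
Both clearing loops here depend on write-one-to-clear semantics: reading the register and writing just the set error bits back clears exactly those bits. In sketch form, with demo_rd/demo_wr standing in for the config-space accessors (both hypothetical):

    #include <stdint.h>

    #define DEMO_CRC_ERRS   (0xfu << 8)     /* per-lane CRC error bits */

    static void demo_clear_w1c(uint16_t (*demo_rd)(int),
                               void (*demo_wr)(int, uint16_t),
                               int off)
    {
            uint16_t v = demo_rd(off);
            if (v & DEMO_CRC_ERRS)
                    /* writing the set bits back clears them */
                    demo_wr(off, v & DEMO_CRC_ERRS);
    }
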
+ */ + + for (i = 0; i < 2; i++) { + link_off = pos + i * 4 + 0xd; + if (pci_read_config_byte + (pdev, link_off, &linkerr)) + _IPATH_INFO + ("Couldn't read linkerror%d of HT slave/primary block\n", + i); + else if (linkerr & 0xf0) { + _IPATH_VDBG + ("HT linkerr%d bits 0x%x set, clearing\n", + linkerr >> 4, i); + /* + * writing the linkerr bits that + * are set will clear them + */ + if (pci_write_config_byte + (pdev, link_off, linkerr)) + _IPATH_DBG + ("Failed write to clear HT linkerror%d\n", + i); + if (pci_read_config_byte + (pdev, link_off, &linkerr)) + _IPATH_INFO + ("Couldn't reread linkerror%d of HT slave/primary block\n", + i); + else if (linkerr & 0xf0) + _IPATH_INFO + ("HT linkerror%d bits 0x%x couldn't be cleared\n", + i, linkerr >> 4); + } + } + + /* + * this is just for our link to the host, not + * devices connected through tunnel. + */ + + if (pci_read_config_byte + (pdev, link_a_b_off + 7, &linkwidth)) + _IPATH_UNIT_ERROR(t, + "Couldn't read HT link width config register\n"); + else { + uint32_t width; + switch (linkwidth & 7) { + case 5: + width = 4; + break; + case 4: + width = 2; + break; + case 3: + width = 32; + break; + case 1: + width = 16; + break; + case 0: + default: /* if wrong, assume 8 bit */ + width = 8; + break; + } + ((struct ipath_devdata *) pci_get_drvdata(pdev))->ipath_htwidth = width; + + if (linkwidth != 0x11) { + _IPATH_UNIT_ERROR(t, + "Not configured for 16 bit HT (%x)\n", + linkwidth); + if (!(linkwidth & 0xf)) { + _IPATH_DBG + ("Will ignore HT lane1 errors\n"); + ((struct ipath_devdata *) pci_get_drvdata(pdev))->ipath_flags |= IPATH_8BIT_IN_HT0; + } + } + } + + /* + * this is just for our link to the host, not + * devices connected through tunnel. + */ + + if (pci_read_config_byte + (pdev, link_a_b_off + 0xd, &linkwidth)) + _IPATH_UNIT_ERROR(t, + "Couldn't read HT link frequency config register\n"); + else { + uint32_t speed; + switch (linkwidth & 0xf) { + case 6: + speed = 1000; + break; + case 5: + speed = 800; + break; + case 4: + speed = 600; + break; + case 3: + speed = 500; + break; + case 2: + speed = 400; + break; + case 1: + speed = 300; + break; + default: + /* + * assume reserved and + * vendor-specific are 200... + */ + case 0: + speed = 200; + break; + } + ((struct ipath_devdata *) pci_get_drvdata(pdev))->ipath_htspeed = speed; + } + } else if (cap_type == HT_INTR_DISC_CONFIG) { + /* use indirection register to get the intr handler */ + uint32_t intvec; + pci_write_config_byte(pdev, pos + HT_INTR_REG_INDEX, + 0x10); + pci_read_config_dword(pdev, pos + 4, + &int_handler_addr_lower); + + pci_write_config_byte(pdev, pos + HT_INTR_REG_INDEX, + 0x11); + pci_read_config_dword(pdev, pos + 4, + &int_handler_addr_upper); + + ihandler = (uint64_t) int_handler_addr_lower | + ((uint64_t) int_handler_addr_upper << 32); + + /* + * I'm unable to find an exported API to get + * the the actual vector, either from the PCI + * infrastructure, or from the APIC + * infrastructure. This heuristic seems to be + * valid for Opteron on 2.6.x kernels, for irq's > 2. + * It may not be universally true... Bug 2338 + * + * Oh well; the heuristic doesn't work for the + * AMI/Iwill BIOS... But the good news is, + * somewhere by 2.6.9, when CONFIG_PCI_MSI is + * enabled, the irq field actually turned into + * the vector number + * We therefore require that MSI be enabled... 
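
The code that follows packs that vector into bits 23:16 of the interrupt-config word; the field update in isolation (sketch, demo name hypothetical):

    #include <stdint.h>

    static uint64_t demo_set_intvec(uint64_t ihandler, uint32_t vec)
    {
            ihandler &= ~0xff0000ULL;                 /* clear old vector field */
            ihandler |= (uint64_t)(vec & 0xff) << 16; /* intrinfo[23:16] */
            return ihandler;
    }
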
+ */ + + intvec = pdev->irq; + /* + * clear any bits there; normally not set but + * we'll overload this for some debug purposes + * (setting the HTC debug register value from + * software, rather than GPIOs), so it might be + * set on a driver reload. + */ + + ihandler &= ~0xff0000; + /* x86 vector goes in intrinfo[23:16] */ + ihandler |= intvec << 16; + _IPATH_VDBG + ("ihandler lower %x, upper %x, intvec %x, interruptconfig %llx\n", + int_handler_addr_lower, int_handler_addr_upper, + intvec, ihandler); + + /* return to caller, can't program yet. */ + *iaddr = ihandler; + /* + * no break, have to be sure we find link control + * stuff also + */ + } + + } while ((pos=pci_find_next_capability(pdev, pos, HT_CAPABILITY_ID))); + + if (!ihandler) { + _IPATH_UNIT_ERROR(t, + "Couldn't find interrupt handler in config space\n"); + ret = -ENODEV; + } + return ret; +} + +/* + * get the GUID from the i2c device + * When we add the multi-chip support, we will probably have to add + * the ability to use the number of guids field, and get the guid from + * the first chip's flash, to use for all of them. + */ + +static void ipath_get_guid(const ipath_type t) +{ + void *buf; + struct ipath_flash *ifp; + uint64_t guid; + int len; + uint8_t csum, *bguid; + + if (t && devdata[0].ipath_nguid > 1 && t <= devdata[0].ipath_nguid) { + uint8_t oguid; + devdata[t].ipath_guid = devdata[0].ipath_guid; + bguid = (uint8_t *) & devdata[t].ipath_guid; + + oguid = bguid[7]; + bguid[7] += t; + if (oguid > bguid[7]) { + if (bguid[6] == 0xff) { + if (bguid[5] == 0xff) { + _IPATH_UNIT_ERROR(t, + "Can't set %s GUID from base GUID, wraps to OUI!\n", + ipath_get_unit_name + (t)); + devdata[t].ipath_guid = 0; + return; + } + bguid[5]++; + } + bguid[6]++; + } + devdata[t].ipath_nguid = 1; + + _IPATH_DBG + ("nguid %u, so adding %u to device 0 guid, for %llx (big-endian)\n", + devdata[0].ipath_nguid, t, devdata[t].ipath_guid); + return; + } + + len = offsetof(struct ipath_flash, if_future); + if (!(buf = vmalloc(len))) { + _IPATH_UNIT_ERROR(t, + "Couldn't allocate memory to read %u bytes from eeprom for GUID\n", + len); + return; + } + + if (ipath_eeprom_read(t, 0, buf, len)) { + _IPATH_UNIT_ERROR(t, "Failed reading GUID from eeprom\n"); + goto done; + } + ifp = (struct ipath_flash *)buf; + + csum = ipath_flash_csum(ifp, 0); + if (csum != ifp->if_csum) { + _IPATH_INFO("Bad I2C flash checksum: 0x%x, not 0x%x\n", + csum, ifp->if_csum); + goto done; + } + if (*(uint64_t *) ifp->if_guid == 0ULL + || *(uint64_t *) ifp->if_guid == -1LL) { + _IPATH_UNIT_ERROR(t, "Invalid GUID %llx from flash; ignoring\n", + *(uint64_t *) ifp->if_guid); + goto done; /* don't allow GUID if all 0 or all 1's */ + } + + /* complain, but allow it */ + if (*(uint64_t *) ifp->if_guid == 0x100007511000000ULL) + _IPATH_INFO + ("Warning, GUID %llx is default, probably not correct!\n", + *(uint64_t *) ifp->if_guid); + + bguid = ifp->if_guid; + if (!bguid[0] && !bguid[1] && !bguid[2]) { + /* original incorrect GUID format in flash; fix in core copy, by + * shifting up 2 octets; don't need to change top octet, since both + * it and shifted are 0.. 
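
The per-unit GUID derivation earlier in ipath_get_guid() is a byte-wise add with carry that must never spill into the 3-octet OUI; the same logic as a standalone sketch (demo name hypothetical):

    #include <stdint.h>

    /* add `unit` to the low octet of a big-endian GUID, carrying
     * into octets 6 and 5 but never into the 3-octet OUI;
     * returns 0 if the carry would corrupt the OUI */
    static int demo_guid_add(uint8_t guid[8], unsigned unit)
    {
            uint8_t old = guid[7];

            guid[7] += (uint8_t)unit;
            if (guid[7] < old) {            /* low octet wrapped */
                    if (guid[6] == 0xff) {
                            if (guid[5] == 0xff)
                                    return 0;
                            guid[5]++;
                    }
                    guid[6]++;
            }
            return 1;
    }
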
*/ + bguid[1] = bguid[3]; + bguid[2] = bguid[4]; + bguid[3] = bguid[4] = 0; + guid = *(uint64_t *)ifp->if_guid; + _IPATH_VDBG("Old GUID format in flash, top 3 zero, shifting 2 octets\n"); + } + else + guid = *(uint64_t *)ifp->if_guid; + devdata[t].ipath_guid = guid; + devdata[t].ipath_nguid = ifp->if_numguid; + memcpy(devdata[t].ipath_serial, ifp->if_serial, sizeof(ifp->if_serial)); + _IPATH_VDBG("Initted GUID to %llx (big-endian) from i2c flash\n", + devdata[t].ipath_guid); + +done: + vfree(buf); +} + +static void __devexit infinipath_remove_one(struct pci_dev *pdev) +{ + struct ipath_devdata *dd; + + _IPATH_VDBG("pci_release, pdev=%p\n", pdev); + if (pdev) { + device_remove_file(&(pdev->dev), &dev_attr_status); + device_remove_file(&(pdev->dev), &dev_attr_status_str); + device_remove_file(&(pdev->dev), &dev_attr_lid); + device_remove_file(&(pdev->dev), &dev_attr_mlid); + device_remove_file(&(pdev->dev), &dev_attr_guid); + device_remove_file(&(pdev->dev), &dev_attr_nguid); + device_remove_file(&(pdev->dev), &dev_attr_serial); + device_remove_file(&(pdev->dev), &dev_attr_unit); + dd = pci_get_drvdata(pdev); + pci_set_drvdata(pdev, NULL); + _IPATH_VDBG + ("Releasing pci memory regions, devdata %p, unit %u\n", dd, + (uint32_t) (dd - devdata)); + if (dd && dd->ipath_kregbase) { + _IPATH_VDBG("Unmapping kregbase %p\n", + dd->ipath_kregbase); + iounmap((volatile void __iomem *) dd->ipath_kregbase); + dd->ipath_kregbase = NULL; + } + pci_release_regions(pdev); + _IPATH_VDBG("calling pci_disable_device\n"); + pci_disable_device(pdev); + } +} + +int ipath_open(struct inode *, struct file *); +static int ipath_opensma(struct inode *, struct file *); +int ipath_close(struct inode *, struct file *); +static unsigned int ipath_poll(struct file *, struct poll_table_struct *); +long ipath_ioctl(struct file *, unsigned int, unsigned long); +static loff_t ipath_llseek(struct file *, loff_t, int); +static int ipath_mmap(struct file *, struct vm_area_struct *); + +static struct file_operations ipath_fops = { + .owner = THIS_MODULE, + .open = ipath_open, + .release = ipath_close, + .poll = ipath_poll, + /* + * all of ours are completely compatible and don't require the + * kernel lock + */ + .compat_ioctl = ipath_ioctl, + /* we don't need kernel lock for our ioctls */ + .unlocked_ioctl = ipath_ioctl, + .llseek = ipath_llseek, + .mmap = ipath_mmap +}; + +static DECLARE_MUTEX(ipath_mutex); /* general driver use */ +spinlock_t ipath_pioavail_lock; + +/* + * For now, at least (and probably forever), we don't require root + * or equivalent permissions to use the device. + */ + +int ipath_open(struct inode *in, struct file *fp) +{ + int ret = 0, minor, i, prefunit=-1, devmax; + int maxofallports, npresent = 0, notup = 0; + ipath_type ndev; + + down(&ipath_mutex); + + minor = iminor(in); + _IPATH_VDBG("open on dev %lx (minor %d)\n", (long)in->i_rdev, minor); + + /* This code is present to allow a knowledgeable person to specify the + * layout of processes to processors before opening this driver, and + * then we'll assign the process to the "closest" HT-400 to + * that processor * (we assume reasonable connectivity, for now). + * This code assumes that if affinity has been set before this + * point, that at most one cpu is set; for now this is reasonable. + * I check for both cpus_empty() and cpus_full(), in case some + * kernel variant sets none of the bits when no affinity is set. + * 2.6.11 and 12 kernels have all present cpus set. + * Some day we'll have to fix it up further to handle a cpu subset. 
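
Given those caveats, the preferred-unit computation in the body below is a simple block mapping of cpus onto chips; in isolation (sketch, with a clamp added for cpu counts that don't divide evenly, demo name hypothetical):

    /* map a cpu index to a preferred unit, assuming units roughly
     * evenly divide the online cpus (e.g. 8 cpus, 2 chips: cpus 0-3
     * prefer unit 0, cpus 4-7 prefer unit 1) */
    static int demo_pref_unit(int curcpu, int ncpus, int nunits)
    {
            int u;

            if (nunits <= 0 || ncpus < nunits)
                    return -1;      /* no sensible mapping */
            u = curcpu / (ncpus / nunits);
            return u < nunits ? u : nunits - 1;
    }
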
+ * This algorithm fails for two HT-400's connected in tunnel fashion. + * Eventually this needs real topology information. + * There may be some issues with dual core numbering as well. This + * needs more work prior to release. + */ + if (minor != IPATH_SMA +#ifdef IPATH_DIAG + && minor != IPATH_DIAG +#endif + && minor != IPATH_CTRL + && !cpus_empty(current->cpus_allowed) + && !cpus_full(current->cpus_allowed)) { + int ncpus = num_online_cpus(), curcpu = -1; + for (i=0; i<ncpus; i++) if (cpu_isset(i, current->cpus_allowed)) { + _IPATH_PRDBG("%s[%u] affinity set for cpu %d\n", + current->comm, current->pid, i); + curcpu = i; + } + if (curcpu != -1) { + for (ndev = 0; ndev < infinipath_max; ndev++) + if ((devdata[ndev].ipath_flags & IPATH_PRESENT) + && devdata[ndev].ipath_kregbase) + npresent++; + if (npresent) { + prefunit = curcpu/(ncpus/npresent); + _IPATH_DBG("%s[%u] %d chips, %d cpus, " + "%d cpus/chip, select unit %d\n", + current->comm, current->pid, + npresent, ncpus, ncpus/npresent, + prefunit); + } + } + } + + if (minor == IPATH_SMA) { + ret = ipath_opensma(in, fp); + /* for ipath_ioctl */ + fp->private_data = (void *)(unsigned long)minor; + goto done; + } +#ifdef IPATH_DIAG + else if (minor == IPATH_DIAG) { + ret = ipath_opendiag(in, fp); + /* for ipath_ioctl */ + fp->private_data = (void *)(unsigned long)minor; + goto done; + } +#endif + else if (minor == IPATH_CTRL) { + /* for ipath_ioctl */ + fp->private_data = (void *)(unsigned long)minor; + ret = 0; + goto done; + } + else if (minor) { + /* + * minor number 0 is used for all chips, we choose available + * chip ourselves, it isn't based on what they open. + */ + + _IPATH_DBG("open on invalid minor %u\n", minor); + ret = -ENXIO; + goto done; + } + + /* + * for now, we use all ports on one, then all ports on the + * next, etc. Eventually we want to tweak this to be cpu/chip + * topology aware, and round-robin across chips that are + * configured and connected, placing processes on the closest + * available processor that isn't already over-allocated. + * multi-HT400 topology could be better handled + */ + + npresent = maxofallports = 0; + for (ndev = 0; ndev < infinipath_max; ndev++) { + if (!(devdata[ndev].ipath_flags & IPATH_PRESENT) || + !devdata[ndev].ipath_kregbase) + continue; + npresent++; + if ((devdata[ndev]. + ipath_flags & (IPATH_LINKDOWN | IPATH_LINKUNK))) { + _IPATH_VDBG("unit %u present, but link not ready\n", + ndev); + notup++; + continue; + } else if (!devdata[ndev].ipath_lid) { + _IPATH_VDBG + ("unit %u present, but LID not assigned, down\n", + ndev); + notup++; + continue; + } + if (devdata[ndev].ipath_cfgports > maxofallports) + maxofallports = devdata[ndev].ipath_cfgports; + } + + /* + * user ports start at 1, kernel port is 0 + * For now, we do round-robin access across all chips + */ + + devmax = prefunit!=-1 ? prefunit+1 : infinipath_max; +recheck: + for (i = 1; i < maxofallports; i++) { + for (ndev = prefunit!=-1?prefunit:0; ndev < devmax; ndev++) { + if (!(devdata[ndev].ipath_flags & IPATH_PRESENT) || + !devdata[ndev].ipath_kregbase + || !devdata[ndev].ipath_lid + || (devdata[ndev]. + ipath_flags & (IPATH_LINKDOWN | IPATH_LINKUNK))) + break; /* can't use this chip */ + if (i >= devdata[ndev].ipath_cfgports) + break; /* max'ed out on users of this chip */ + if (!devdata[ndev].ipath_pd[i]) { + void *p, *ptmp; + p = kmalloc(sizeof(struct ipath_portdata), + GFP_KERNEL); + + /* + * allocate memory for use in + * ipath_tid_update() just once at open, + * not per call. 
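
The port setup just below allocates two buffers and frees both on any failure; kfree(NULL) being a no-op is what lets a single error path cover every combination. The same shape in user space (a sketch; malloc/free stand in for kmalloc/kfree, demo name hypothetical):

    #include <stdlib.h>
    #include <string.h>

    static int demo_alloc_pair(void **a, size_t alen, void **b, size_t blen)
    {
            *a = malloc(alen);
            *b = malloc(blen);
            if (!*a || !*b) {
                    free(*a);       /* free(NULL) is a no-op */
                    free(*b);
                    *a = *b = NULL;
                    return -1;
            }
            memset(*a, 0, alen);
            return 0;
    }
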
Reduces cost of expected + * send setup + */ + + ptmp = + kmalloc(devdata[ndev].ipath_rcvtidcnt * + sizeof(uint16_t) + + + devdata[ndev].ipath_rcvtidcnt * + sizeof(struct page **), GFP_KERNEL); + if (!p || !ptmp) { + _IPATH_UNIT_ERROR(ndev, + "Unable to allocate portdata memory, failing open\n"); + ret = -ENOMEM; + kfree(p); + kfree(ptmp); + goto done; + } + memset(p, 0, sizeof(struct ipath_portdata)); + devdata[ndev].ipath_pd[i] = p; + devdata[ndev].ipath_pd[i]->port_port = i; + devdata[ndev].ipath_pd[i]->port_unit = ndev; + devdata[ndev].ipath_pd[i]->port_tid_pg_list = + ptmp; + init_waitqueue_head(&devdata[ndev].ipath_pd[i]-> + port_wait); + } + if (!devdata[ndev].ipath_pd[i]->port_cnt) { + devdata[ndev].ipath_pd[i]->port_cnt = 1; + fp->private_data = + (void *)devdata[ndev].ipath_pd[i]; + _IPATH_PRDBG("%s[%u] opened unit:port %u:%u\n", + current->comm, current->pid, ndev, + i); + devdata[ndev].ipath_pd[i]->port_pid = + current->pid; + strncpy(devdata[ndev].ipath_pd[i]->port_comm, + current->comm, + sizeof(devdata[ndev].ipath_pd[i]-> + port_comm)); + ipath_stats.sps_ports++; + goto done; + } + } + } + + if (npresent) { + if (notup) { + ret = -ENETDOWN; + _IPATH_DBG + ("No ports available (none initialized and ready)\n"); + } else { + if (prefunit > 0) { /* if we started above unit 0, retry from 0 */ + _IPATH_PRDBG("%s[%u] no ports on prefunit %d, clear and re-check\n", + current->comm, current->pid, prefunit); + devmax = infinipath_max; + prefunit = -1; + goto recheck; + } + ret = -EBUSY; + _IPATH_DBG("No ports available\n"); + } + } else { + ret = -ENXIO; + _IPATH_DBG("No boards found\n"); + } + +done: + up(&ipath_mutex); + return ret; +} + +static int ipath_opensma(struct inode *in, struct file *fp) +{ + ipath_type s; + + if (ipath_sma_alive) { + _IPATH_DBG("SMA already running (pid %u), failing\n", + ipath_sma_alive); + return -EBUSY; + } + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; /* all SMA functions are root-only */ + + for (s = 0; s < infinipath_max; s++) { + /* we need at least one infinipath device to be initialized. */ + if (devdata[s].ipath_flags & IPATH_INITTED) { + ipath_sma_alive = current->pid; + *devdata[s].ipath_statusp |= IPATH_STATUS_SMA; + *devdata[s].ipath_statusp &= ~IPATH_STATUS_OIB_SMA; + } + } + if (ipath_sma_alive) { + _IPATH_SMADBG + ("SMA device now open, SMA active as PID %u\n", + ipath_sma_alive); + return 0; + } + _IPATH_DBG("No hardware yet found and initted, failing\n"); + return -ENODEV; +} + + +#ifdef IPATH_DIAG +static int ipath_opendiag(struct inode *in, struct file *fp) +{ + ipath_type s; + + if (ipath_diag_alive) { + _IPATH_DBG("Diags already running (pid %u), failing\n", + ipath_diag_alive); + return -EBUSY; + } + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; /* all diags functions are root-only */ + + for (s = 0; s < infinipath_max; s++) + /* + * we need at least one infinipath device to be present + * (don't use INITTED, because we want to be able to open + * even if device is in freeze mode, which cleared INITTED). + * There is a small amount of risk to this, which is + * why we also verify kregbase is set. + */ + + if ((devdata[s].ipath_flags & IPATH_PRESENT) + && devdata[s].ipath_kregbase) { + ipath_diag_alive = current->pid; + _IPATH_DBG("diag device now open, active as PID %u\n", + ipath_diag_alive); + return 0; + } + _IPATH_DBG("No hardware yet found and initted, failing diags\n"); + return -ENODEV; +} +#endif + +/* + * clear all TID entries for a port, expected and eager. + * Used from ipath_close(), and at chip initialization. 
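
ipath_clear_tids() below finds each port's TID window by plain offset arithmetic from the mapped chip base; the address computation on its own (sketch, demo name hypothetical):

    #include <stdint.h>

    /* port N's table starts at chip base + table base +
     * N * entries-per-port * entry size */
    static uint64_t *demo_tid_table(char *kregbase, uint64_t tidbase,
                                    unsigned port, unsigned tidcnt)
    {
            return (uint64_t *)(kregbase + tidbase +
                                (uint64_t)port * tidcnt * sizeof(uint64_t));
    }
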
+ */ + +static void ipath_clear_tids(const ipath_type t, unsigned port) +{ + uint64_t __iomem *tidbase; + int i; + struct ipath_devdata *dd; + uint64_t tidval; + dd = &devdata[t]; + + if (!dd->ipath_kregbase) + return; + + /* + * chip errata bug 7358, try to work around it by marking invalid + * tids as having max length + */ + + tidval = + (-1LL & INFINIPATH_RT_BUFSIZE_MASK) << INFINIPATH_RT_BUFSIZE_SHIFT; + + /* + * need to invalidate all of the expected TID entries for this + * port, so we don't have valid entries that might somehow get + * used (early in next use of this port, or through some bug) + * We don't bother with the eager, because they are initialized + * each time before receives are enabled; expected aren't + */ + + tidbase = (uint64_t __iomem *) ((char __iomem *)(dd->ipath_kregbase) + + dd->ipath_rcvtidbase + + port * dd->ipath_rcvtidcnt * + sizeof(*tidbase)); + _IPATH_VDBG("Invalidate expected TIDs for port %u, tidbase=%p\n", port, + tidbase); + for (i = 0; i < dd->ipath_rcvtidcnt; i++) + ipath_kput_memq(t, &tidbase[i], tidval); + yield(); /* don't hog the cpu */ + + /* zero the eager TID entries */ + tidbase = (uint64_t __iomem *)((char __iomem *)(dd->ipath_kregbase) + + dd->ipath_rcvegrbase + + port * dd->ipath_rcvegrcnt * + sizeof(*tidbase)); + + for (i = 0; i < dd->ipath_rcvegrcnt; i++) + ipath_kput_memq(t, &tidbase[i], tidval); + yield(); /* don't hog the cpu */ +} + +int ipath_close(struct inode *in, struct file *fp) +{ + int ret = 0; + struct ipath_portdata *pd; + + _IPATH_VDBG("close on dev %lx, private data %p\n", (long)in->i_rdev, + fp->private_data); + + down(&ipath_mutex); + if (iminor(in) == IPATH_SMA) { + ipath_type s; + + ipath_sma_alive = 0; + _IPATH_SMADBG("Closing SMA device\n"); + for (s = 0; s < infinipath_max; s++) { + if (!(devdata[s].ipath_flags & IPATH_INITTED)) + continue; + *devdata[s].ipath_statusp &= ~IPATH_STATUS_SMA; + if (devdata[s].verbs_layer.l_flags & + IPATH_VERBS_KERNEL_SMA) + *devdata[s].ipath_statusp |= + IPATH_STATUS_OIB_SMA; + } + } +#ifdef IPATH_DIAG + else if (iminor(in) == IPATH_DIAG) { + ipath_diag_alive = 0; + _IPATH_DBG("Closing DIAG device\n"); + } +#endif + else if (fp->private_data && 255UL < (unsigned long)fp->private_data) { + ipath_type t; + unsigned port; + struct ipath_devdata *dd; + + pd = (struct ipath_portdata *) fp->private_data; + port = pd->port_port; + fp->private_data = NULL; + t = pd->port_unit; + if (t > infinipath_max) { + _IPATH_ERROR + ("closing, fp %p, pd %p, but unit %x not valid!\n", + fp, pd, t); + goto done; + } + dd = &devdata[t]; + + if (pd->port_hdrqfull) { + _IPATH_PRDBG + ("%s[%u] had %u rcvhdrqfull errors during run\n", + pd->port_comm, pd->port_pid, pd->port_hdrqfull); + pd->port_hdrqfull = 0; + } + + if (pd->port_rcvwait_to || pd->port_piowait_to + || pd->port_rcvnowait || pd->port_pionowait) { + _IPATH_VDBG + ("port%u, %u rcv, %u pio wait timeo; %u rcv %u, pio already\n", + pd->port_port, pd->port_rcvwait_to, + pd->port_piowait_to, pd->port_rcvnowait, + pd->port_pionowait); + pd->port_rcvwait_to = pd->port_piowait_to = + pd->port_rcvnowait = pd->port_pionowait = 0; + } + if (pd->port_flag) { + _IPATH_DBG("port %u port_flag still set to 0x%x\n", + pd->port_port, pd->port_flag); + pd->port_flag = 0; + } + + if (devdata[t].ipath_kregbase) { + if (pd->port_rcvhdrtail_uaddr) { + pd->port_rcvhdrtail_uaddr = 0; + pd->port_rcvhdrtail_kvaddr = NULL; + ipath_putpages(1, &pd->port_rcvhdrtail_pagep); + pd->port_rcvhdrtail_pagep = NULL; + ipath_stats.sps_pageunlocks++; + } + ipath_kput_kreg_port(t, 
kr_rcvhdrtailaddr, port, 0ULL); + ipath_kput_kreg_port(pd->port_unit, kr_rcvhdraddr, + pd->port_port, 0); + + /* clean up the pkeys for this port user */ + ipath_clean_partkey(pd, dd); + + if (port < dd->ipath_cfgports) { + int i = dd->ipath_pbufsport * (port - 1); + ipath_disarm_piobufs(t, i, dd->ipath_pbufsport); + + /* atomically clear receive enable port. */ + atomic_clear_mask(1U << + (INFINIPATH_R_PORTENABLE_SHIFT + + port), + &devdata[t].ipath_rcvctrl); + ipath_kput_kreg(t, kr_rcvctrl, + devdata[t].ipath_rcvctrl); + + if (dd->ipath_pageshadow) { + /* + * unlock any expected TID + * entries port still had in use + */ + int port_tidbase = + pd->port_port * dd->ipath_rcvtidcnt; + int i, cnt = 0, maxtid = + port_tidbase + dd->ipath_rcvtidcnt; + + _IPATH_VDBG + ("Port %u unlocking any locked expTID pages\n", + pd->port_port); + for (i = port_tidbase; i < maxtid; i++) { + if (dd->ipath_pageshadow[i]) { + ipath_putpages(1, + &dd-> + ipath_pageshadow + [i]); + dd->ipath_pageshadow[i] + = NULL; + cnt++; + ipath_stats. + sps_pageunlocks++; + } + } + if (cnt) + _IPATH_VDBG + ("Port %u had %u expTID entries locked\n", + pd->port_port, cnt); + if (ipath_stats.sps_pagelocks + || ipath_stats.sps_pageunlocks) + _IPATH_VDBG + ("%llu pages locked, %llu unlocked with" + " ipath_m{un}lock\n", + ipath_stats.sps_pagelocks, + ipath_stats. + sps_pageunlocks); + } + ipath_stats.sps_ports--; + _IPATH_PRDBG("%s[%u] closed port %u:%u\n", + pd->port_comm, pd->port_pid, t, + port); + } + } + + pd->port_cnt = 0; + pd->port_pid = 0; + + ipath_clear_tids(t, pd->port_port); + + ipath_free_pddata(dd, pd->port_port, 0); + } + +done: + up(&ipath_mutex); + + return ret; +} From bos at pathscale.com Wed Dec 28 16:31:31 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 28 Dec 2005 16:31:31 -0800 Subject: [openib-general] [PATCH 12 of 20] ipath - misc driver support code In-Reply-To: Message-ID: <5e9b0b7876e2d570f25e.1135816291@eng-12.pathscale.com> Signed-off-by: Bryan O'Sullivan diff -r e8af3873b0d9 -r 5e9b0b7876e2 drivers/infiniband/hw/ipath/ipath_ht400.c --- /dev/null Thu Jan 1 00:00:00 1970 +0000 +++ b/drivers/infiniband/hw/ipath/ipath_ht400.c Wed Dec 28 14:19:43 2005 -0800 @@ -0,0 +1,1137 @@ +/* + * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + */ + +/* + * The first part of this file is shared with the diags, the second + * part is used only in the kernel. + */ + +#include /* for offsetof */ + +#include +#include +#include +#include +#include "ipath_kernel.h" + +#include "ipath_registers.h" +#include "ipath_common.h" + +/* + * This lists the InfiniPath registers, in the actual chip layout. This + * structure should never be directly accessed. It is included by the + * user mode diags, and so must be able to be compiled in both user + * and kernel mode. + */ +struct _infinipath_do_not_use_kernel_regs { + unsigned long long Revision; + unsigned long long Control; + unsigned long long PageAlign; + unsigned long long PortCnt; + unsigned long long DebugPortSelect; + unsigned long long DebugPort; + unsigned long long SendRegBase; + unsigned long long UserRegBase; + unsigned long long CounterRegBase; + unsigned long long Scratch; + unsigned long long ReservedMisc1; + unsigned long long InterruptConfig; + unsigned long long IntBlocked; + unsigned long long IntMask; + unsigned long long IntStatus; + unsigned long long IntClear; + unsigned long long ErrorMask; + unsigned long long ErrorStatus; + unsigned long long ErrorClear; + unsigned long long HwErrMask; + unsigned long long HwErrStatus; + unsigned long long HwErrClear; + unsigned long long HwDiagCtrl; + unsigned long long MDIO; + unsigned long long IBCStatus; + unsigned long long IBCCtrl; + unsigned long long ExtStatus; + unsigned long long ExtCtrl; + unsigned long long GPIOOut; + unsigned long long GPIOMask; + unsigned long long GPIOStatus; + unsigned long long GPIOClear; + unsigned long long RcvCtrl; + unsigned long long RcvBTHQP; + unsigned long long RcvHdrSize; + unsigned long long RcvHdrCnt; + unsigned long long RcvHdrEntSize; + unsigned long long RcvTIDBase; + unsigned long long RcvTIDCnt; + unsigned long long RcvEgrBase; + unsigned long long RcvEgrCnt; + unsigned long long RcvBufBase; + unsigned long long RcvBufSize; + unsigned long long RxIntMemBase; + unsigned long long RxIntMemSize; + unsigned long long RcvPartitionKey; + unsigned long long ReservedRcv[10]; + unsigned long long SendCtrl; + unsigned long long SendPIOBufBase; + unsigned long long SendPIOSize; + unsigned long long SendPIOBufCnt; + unsigned long long SendPIOAvailAddr; + unsigned long long TxIntMemBase; + unsigned long long TxIntMemSize; + unsigned long long ReservedSend[9]; + unsigned long long SendBufferError; + unsigned long long SendBufferErrorCONT1; + unsigned long long SendBufferErrorCONT2; + unsigned long long SendBufferErrorCONT3; + unsigned long long ReservedSBE[4]; + unsigned long long RcvHdrAddr0; + unsigned long long RcvHdrAddr1; + unsigned long long RcvHdrAddr2; + unsigned long long RcvHdrAddr3; + unsigned long long RcvHdrAddr4; + unsigned long long RcvHdrAddr5; + unsigned long long RcvHdrAddr6; + unsigned long long RcvHdrAddr7; + unsigned long long RcvHdrAddr8; + unsigned long long ReservedRHA[7]; + unsigned long long RcvHdrTailAddr0; + unsigned long long RcvHdrTailAddr1; + unsigned long long RcvHdrTailAddr2; + unsigned long long RcvHdrTailAddr3; + unsigned long long 
RcvHdrTailAddr4; + unsigned long long RcvHdrTailAddr5; + unsigned long long RcvHdrTailAddr6; + unsigned long long RcvHdrTailAddr7; + unsigned long long RcvHdrTailAddr8; + unsigned long long ReservedRHTA[7]; + unsigned long long Sync; /* Software only */ + unsigned long long Dump; /* Software only */ + unsigned long long SimVer; /* Software only */ + unsigned long long ReservedSW[5]; + unsigned long long SerdesConfig0; + unsigned long long SerdesConfig1; + unsigned long long SerdesStatus; + unsigned long long XGXSConfig; + unsigned long long ReservedSW2[4]; +}; + +#define IPATH_KREG_OFFSET(field) (offsetof(struct \ + _infinipath_do_not_use_kernel_regs, field) / sizeof(uint64_t)) +#define IPATH_CREG_OFFSET(field) (offsetof( \ + struct infinipath_counters, field) / sizeof(uint64_t)) + +ipath_kreg + kr_control = IPATH_KREG_OFFSET(Control), + kr_counterregbase = IPATH_KREG_OFFSET(CounterRegBase), + kr_debugport = IPATH_KREG_OFFSET(DebugPort), + kr_debugportselect = IPATH_KREG_OFFSET(DebugPortSelect), + kr_errorclear = IPATH_KREG_OFFSET(ErrorClear), + kr_errormask = IPATH_KREG_OFFSET(ErrorMask), + kr_errorstatus = IPATH_KREG_OFFSET(ErrorStatus), + kr_extctrl = IPATH_KREG_OFFSET(ExtCtrl), + kr_extstatus = IPATH_KREG_OFFSET(ExtStatus), + kr_gpio_clear = IPATH_KREG_OFFSET(GPIOClear), + kr_gpio_mask = IPATH_KREG_OFFSET(GPIOMask), + kr_gpio_out = IPATH_KREG_OFFSET(GPIOOut), + kr_gpio_status = IPATH_KREG_OFFSET(GPIOStatus), + kr_hwdiagctrl = IPATH_KREG_OFFSET(HwDiagCtrl), + kr_hwerrclear = IPATH_KREG_OFFSET(HwErrClear), + kr_hwerrmask = IPATH_KREG_OFFSET(HwErrMask), + kr_hwerrstatus = IPATH_KREG_OFFSET(HwErrStatus), + kr_ibcctrl = IPATH_KREG_OFFSET(IBCCtrl), + kr_ibcstatus = IPATH_KREG_OFFSET(IBCStatus), + kr_intblocked = IPATH_KREG_OFFSET(IntBlocked), + kr_intclear = IPATH_KREG_OFFSET(IntClear), + kr_interruptconfig = IPATH_KREG_OFFSET(InterruptConfig), + kr_intmask = IPATH_KREG_OFFSET(IntMask), + kr_intstatus = IPATH_KREG_OFFSET(IntStatus), + kr_mdio = IPATH_KREG_OFFSET(MDIO), + kr_pagealign = IPATH_KREG_OFFSET(PageAlign), + kr_partitionkey = IPATH_KREG_OFFSET(RcvPartitionKey), + kr_portcnt = IPATH_KREG_OFFSET(PortCnt), + kr_rcvbthqp = IPATH_KREG_OFFSET(RcvBTHQP), + kr_rcvbufbase = IPATH_KREG_OFFSET(RcvBufBase), + kr_rcvbufsize = IPATH_KREG_OFFSET(RcvBufSize), + kr_rcvctrl = IPATH_KREG_OFFSET(RcvCtrl), + kr_rcvegrbase = IPATH_KREG_OFFSET(RcvEgrBase), + kr_rcvegrcnt = IPATH_KREG_OFFSET(RcvEgrCnt), + kr_rcvhdrcnt = IPATH_KREG_OFFSET(RcvHdrCnt), + kr_rcvhdrentsize = IPATH_KREG_OFFSET(RcvHdrEntSize), + kr_rcvhdrsize = IPATH_KREG_OFFSET(RcvHdrSize), + kr_rcvintmembase = IPATH_KREG_OFFSET(RxIntMemBase), + kr_rcvintmemsize = IPATH_KREG_OFFSET(RxIntMemSize), + kr_rcvtidbase = IPATH_KREG_OFFSET(RcvTIDBase), + kr_rcvtidcnt = IPATH_KREG_OFFSET(RcvTIDCnt), + kr_revision = IPATH_KREG_OFFSET(Revision), + kr_scratch = IPATH_KREG_OFFSET(Scratch), + kr_sendbuffererror = IPATH_KREG_OFFSET(SendBufferError), + kr_sendbuffererror1 = IPATH_KREG_OFFSET(SendBufferErrorCONT1), + kr_sendbuffererror2 = IPATH_KREG_OFFSET(SendBufferErrorCONT2), + kr_sendbuffererror3 = IPATH_KREG_OFFSET(SendBufferErrorCONT3), + kr_sendctrl = IPATH_KREG_OFFSET(SendCtrl), + kr_sendpioavailaddr = IPATH_KREG_OFFSET(SendPIOAvailAddr), + kr_sendpiobufbase = IPATH_KREG_OFFSET(SendPIOBufBase), + kr_sendpiobufcnt = IPATH_KREG_OFFSET(SendPIOBufCnt), + kr_sendpiosize = IPATH_KREG_OFFSET(SendPIOSize), + kr_sendregbase = IPATH_KREG_OFFSET(SendRegBase), + kr_txintmembase = IPATH_KREG_OFFSET(TxIntMemBase), + kr_txintmemsize = IPATH_KREG_OFFSET(TxIntMemSize), + 
kr_userregbase = IPATH_KREG_OFFSET(UserRegBase), + kr_serdesconfig0 = IPATH_KREG_OFFSET(SerdesConfig0), + kr_serdesconfig1 = IPATH_KREG_OFFSET(SerdesConfig1), + kr_serdesstatus = IPATH_KREG_OFFSET(SerdesStatus), + kr_xgxsconfig = IPATH_KREG_OFFSET(XGXSConfig), + /* + * last valid direct use register other than diag-only registers + */ + __kr_lastvaliddirect = IPATH_KREG_OFFSET(ReservedSW2[0]), + /* always invalid for initializing */ + __kr_invalid = IPATH_KREG_OFFSET(ReservedSW2[0]) + 1, + /* + * These should not be used directly via ipath_kget_kreg64(), + * use them with ipath_kget_kreg64_port() + */ + kr_rcvhdraddr = IPATH_KREG_OFFSET(RcvHdrAddr0), /* not for direct use */ + /* not for direct use */ + kr_rcvhdrtailaddr = IPATH_KREG_OFFSET(RcvHdrTailAddr0), + /* we define the full set for the diags, the kernel doesn't use them */ + kr_rcvhdraddr1 = IPATH_KREG_OFFSET(RcvHdrAddr1), + kr_rcvhdraddr2 = IPATH_KREG_OFFSET(RcvHdrAddr2), + kr_rcvhdraddr3 = IPATH_KREG_OFFSET(RcvHdrAddr3), + kr_rcvhdraddr4 = IPATH_KREG_OFFSET(RcvHdrAddr4), + kr_rcvhdrtailaddr1 = IPATH_KREG_OFFSET(RcvHdrTailAddr1), + kr_rcvhdrtailaddr2 = IPATH_KREG_OFFSET(RcvHdrTailAddr2), + kr_rcvhdrtailaddr3 = IPATH_KREG_OFFSET(RcvHdrTailAddr3), + kr_rcvhdrtailaddr4 = IPATH_KREG_OFFSET(RcvHdrTailAddr4), + kr_rcvhdraddr5 = IPATH_KREG_OFFSET(RcvHdrAddr5), + kr_rcvhdraddr6 = IPATH_KREG_OFFSET(RcvHdrAddr6), + kr_rcvhdraddr7 = IPATH_KREG_OFFSET(RcvHdrAddr7), + kr_rcvhdraddr8 = IPATH_KREG_OFFSET(RcvHdrAddr8), + kr_rcvhdrtailaddr5 = IPATH_KREG_OFFSET(RcvHdrTailAddr5), + kr_rcvhdrtailaddr6 = IPATH_KREG_OFFSET(RcvHdrTailAddr6), + kr_rcvhdrtailaddr7 = IPATH_KREG_OFFSET(RcvHdrTailAddr7), + kr_rcvhdrtailaddr8 = IPATH_KREG_OFFSET(RcvHdrTailAddr8); + +/* + * first of the pioavail registers, the total number is + * (kr_sendpiobufcnt / 32); each buffer uses 2 bits + * More properly, it's: + * (kr_sendpiobufcnt / ((sizeof(uint64_t)*BITS_PER_BYTE)/2)) + */ +ipath_sreg sr_sendpioavail = 0; + +ipath_creg + cr_badformatcnt = IPATH_CREG_OFFSET(RxBadFormatCnt), + cr_erricrccnt = IPATH_CREG_OFFSET(RxICRCErrCnt), + cr_errlinkcnt = IPATH_CREG_OFFSET(RxLinkProblemCnt), + cr_errlpcrccnt = IPATH_CREG_OFFSET(RxLPCRCErrCnt), + cr_errpkey = IPATH_CREG_OFFSET(RxPKeyMismatchCnt), + cr_errrcvflowctrlcnt = IPATH_CREG_OFFSET(RxFlowCtrlErrCnt), + cr_err_rlencnt = IPATH_CREG_OFFSET(RxLenErrCnt), + cr_errslencnt = IPATH_CREG_OFFSET(TxLenErrCnt), + cr_errtidfull = IPATH_CREG_OFFSET(RxTIDFullErrCnt), + cr_errtidvalid = IPATH_CREG_OFFSET(RxTIDValidErrCnt), + cr_errvcrccnt = IPATH_CREG_OFFSET(RxVCRCErrCnt), + cr_ibstatuschange = IPATH_CREG_OFFSET(IBStatusChangeCnt), + /* calc from Reg_CounterRegBase + offset */ + cr_intcnt = IPATH_CREG_OFFSET(LBIntCnt), + cr_invalidrlencnt = IPATH_CREG_OFFSET(RxMaxMinLenErrCnt), + cr_invalidslencnt = IPATH_CREG_OFFSET(TxMaxMinLenErrCnt), + cr_lbflowstallcnt = IPATH_CREG_OFFSET(LBFlowStallCnt), + cr_pktrcvcnt = IPATH_CREG_OFFSET(RxDataPktCnt), + cr_pktrcvflowctrlcnt = IPATH_CREG_OFFSET(RxFlowPktCnt), + cr_pktsendcnt = IPATH_CREG_OFFSET(TxDataPktCnt), + cr_pktsendflowcnt = IPATH_CREG_OFFSET(TxFlowPktCnt), + cr_portovflcnt = IPATH_CREG_OFFSET(RxP0HdrEgrOvflCnt), + cr_portovflcnt1 = IPATH_CREG_OFFSET(RxP1HdrEgrOvflCnt), + cr_portovflcnt2 = IPATH_CREG_OFFSET(RxP2HdrEgrOvflCnt), + cr_portovflcnt3 = IPATH_CREG_OFFSET(RxP3HdrEgrOvflCnt), + cr_portovflcnt4 = IPATH_CREG_OFFSET(RxP4HdrEgrOvflCnt), + cr_portovflcnt5 = IPATH_CREG_OFFSET(RxP5HdrEgrOvflCnt), + cr_portovflcnt6 = IPATH_CREG_OFFSET(RxP6HdrEgrOvflCnt), + cr_portovflcnt7 = 
IPATH_CREG_OFFSET(RxP7HdrEgrOvflCnt), + cr_portovflcnt8 = IPATH_CREG_OFFSET(RxP8HdrEgrOvflCnt), + cr_rcvebpcnt = IPATH_CREG_OFFSET(RxEBPCnt), + cr_rcvovflcnt = IPATH_CREG_OFFSET(RxBufOvflCnt), + cr_senddropped = IPATH_CREG_OFFSET(TxDroppedPktCnt), + cr_sendstallcnt = IPATH_CREG_OFFSET(TxFlowStallCnt), + cr_sendunderruncnt = IPATH_CREG_OFFSET(TxUnderrunCnt), + cr_wordrcvcnt = IPATH_CREG_OFFSET(RxDwordCnt), + cr_wordsendcnt = IPATH_CREG_OFFSET(TxDwordCnt), + cr_unsupvlcnt = IPATH_CREG_OFFSET(TxUnsupVLErrCnt), + cr_rxdroppktcnt = IPATH_CREG_OFFSET(RxDroppedPktCnt), + cr_iblinkerrrecovcnt = IPATH_CREG_OFFSET(IBLinkErrRecoveryCnt), + cr_iblinkdowncnt = IPATH_CREG_OFFSET(IBLinkDownedCnt), + cr_ibsymbolerrcnt = IPATH_CREG_OFFSET(IBSymbolErrCnt); + +/* kr_sendctrl bits */ +#define INFINIPATH_S_DISARMPIOBUF_MASK 0xFF + +/* kr_rcvctrl bits */ +#define INFINIPATH_R_PORTENABLE_MASK 0x1FF +#define INFINIPATH_R_INTRAVAIL_MASK 0x1FF + +/* kr_intstatus, kr_intclear, kr_intmask bits */ +#define INFINIPATH_I_RCVURG_MASK 0x1FF +#define INFINIPATH_I_RCVAVAIL_MASK 0x1FF + +/* kr_hwerrclear, kr_hwerrmask, kr_hwerrstatus, bits */ +#define INFINIPATH_HWE_HTCMEMPARITYERR_MASK 0x3FFFFFULL +#define INFINIPATH_HWE_HTCLNKABYTE0CRCERR 0x0000000000800000ULL +#define INFINIPATH_HWE_HTCLNKABYTE1CRCERR 0x0000000001000000ULL +#define INFINIPATH_HWE_HTCLNKBBYTE0CRCERR 0x0000000002000000ULL +#define INFINIPATH_HWE_HTCLNKBBYTE1CRCERR 0x0000000004000000ULL +#define INFINIPATH_HWE_HTCMISCERR4 0x0000000008000000ULL +#define INFINIPATH_HWE_HTCMISCERR5 0x0000000010000000ULL +#define INFINIPATH_HWE_HTCMISCERR6 0x0000000020000000ULL +#define INFINIPATH_HWE_HTCMISCERR7 0x0000000040000000ULL +#define INFINIPATH_HWE_MEMBISTFAILED 0x0040000000000000ULL +#define INFINIPATH_HWE_COREPLL_FBSLIP 0x0080000000000000ULL +#define INFINIPATH_HWE_COREPLL_RFSLIP 0x0100000000000000ULL +#define INFINIPATH_HWE_HTBPLL_FBSLIP 0x0200000000000000ULL +#define INFINIPATH_HWE_HTBPLL_RFSLIP 0x0400000000000000ULL +#define INFINIPATH_HWE_HTAPLL_FBSLIP 0x0800000000000000ULL +#define INFINIPATH_HWE_HTAPLL_RFSLIP 0x1000000000000000ULL +#define INFINIPATH_HWE_EXTSERDESPLLFAILED 0x2000000000000000ULL + +/* kr_hwdiagctrl bits */ +#define INFINIPATH_DC_NUMHTMEMS 22 + +/* kr_extstatus bits */ +#define INFINIPATH_EXTS_FREQSEL 0x2 +#define INFINIPATH_EXTS_SERDESSEL 0x4 +#define INFINIPATH_EXTS_MEMBIST_ENDTEST 0x0000000000004000 +#define INFINIPATH_EXTS_MEMBIST_CORRECT 0x0000000000008000 + +/* kr_extctrl bits */ + +/* + * masks and bits that are different in different chips, or present only + * in one + */ +const uint32_t infinipath_i_rcvavail_mask = INFINIPATH_I_RCVAVAIL_MASK; +const uint32_t infinipath_i_rcvurg_mask = INFINIPATH_I_RCVURG_MASK; +const uint64_t infinipath_hwe_htcmemparityerr_mask = + INFINIPATH_HWE_HTCMEMPARITYERR_MASK; + +const uint64_t infinipath_hwe_spibdcmlockfailed_mask = 0ULL; +const uint64_t infinipath_hwe_sphtdcmlockfailed_mask = 0ULL; +const uint64_t infinipath_hwe_htcdcmlockfailed_mask = 0ULL; +const uint64_t infinipath_hwe_htcdcmlockfailed_shift = 0ULL; +const uint64_t infinipath_hwe_sphtdcmlockfailed_shift = 0ULL; +const uint64_t infinipath_hwe_spibdcmlockfailed_shift = 0ULL; + +const uint64_t infinipath_hwe_htclnkabyte0crcerr = + INFINIPATH_HWE_HTCLNKABYTE0CRCERR; +const uint64_t infinipath_hwe_htclnkabyte1crcerr = + INFINIPATH_HWE_HTCLNKABYTE1CRCERR; +const uint64_t infinipath_hwe_htclnkbbyte0crcerr = + INFINIPATH_HWE_HTCLNKBBYTE0CRCERR; +const uint64_t infinipath_hwe_htclnkbbyte1crcerr = + INFINIPATH_HWE_HTCLNKBBYTE1CRCERR; + +const uint64_t 
infinipath_c_bitsextant = + (INFINIPATH_C_FREEZEMODE | INFINIPATH_C_LINKENABLE); + +const uint64_t infinipath_s_bitsextant = + (INFINIPATH_S_ABORT | INFINIPATH_S_PIOINTBUFAVAIL | + INFINIPATH_S_PIOBUFAVAILUPD | INFINIPATH_S_PIOENABLE | + INFINIPATH_S_DISARM | + (INFINIPATH_S_DISARMPIOBUF_MASK << INFINIPATH_S_DISARMPIOBUF_SHIFT)); + +const uint64_t infinipath_r_bitsextant = + ((INFINIPATH_R_PORTENABLE_MASK << INFINIPATH_R_PORTENABLE_SHIFT) | + (INFINIPATH_R_INTRAVAIL_MASK << INFINIPATH_R_INTRAVAIL_SHIFT) | + INFINIPATH_R_TAILUPD); + +const uint64_t infinipath_i_bitsextant = + ((INFINIPATH_I_RCVURG_MASK << INFINIPATH_I_RCVURG_SHIFT) | + (INFINIPATH_I_RCVAVAIL_MASK << INFINIPATH_I_RCVAVAIL_SHIFT) | + INFINIPATH_I_ERROR | INFINIPATH_I_SPIOSENT | + INFINIPATH_I_SPIOBUFAVAIL | INFINIPATH_I_GPIO); + +const uint64_t infinipath_e_bitsextant = + (INFINIPATH_E_RFORMATERR | INFINIPATH_E_RVCRC | INFINIPATH_E_RICRC | + INFINIPATH_E_RMINPKTLEN | INFINIPATH_E_RMAXPKTLEN | + INFINIPATH_E_RLONGPKTLEN | INFINIPATH_E_RSHORTPKTLEN | + INFINIPATH_E_RUNEXPCHAR | INFINIPATH_E_RUNSUPVL | INFINIPATH_E_REBP | + INFINIPATH_E_RIBFLOW | INFINIPATH_E_RBADVERSION | + INFINIPATH_E_RRCVEGRFULL | INFINIPATH_E_RRCVHDRFULL | + INFINIPATH_E_RBADTID | INFINIPATH_E_RHDRLEN | + INFINIPATH_E_RHDR | INFINIPATH_E_RIBLOSTLINK | + INFINIPATH_E_SMINPKTLEN | INFINIPATH_E_SMAXPKTLEN | + INFINIPATH_E_SUNDERRUN | INFINIPATH_E_SPKTLEN | + INFINIPATH_E_SDROPPEDSMPPKT | INFINIPATH_E_SDROPPEDDATAPKT | + INFINIPATH_E_SPIOARMLAUNCH | INFINIPATH_E_SUNEXPERRPKTNUM | + INFINIPATH_E_SUNSUPVL | INFINIPATH_E_IBSTATUSCHANGED | + INFINIPATH_E_INVALIDADDR | INFINIPATH_E_RESET | INFINIPATH_E_HARDWARE); + +const uint64_t infinipath_hwe_bitsextant = + (INFINIPATH_HWE_HTCMEMPARITYERR_MASK << + INFINIPATH_HWE_HTCMEMPARITYERR_SHIFT) | + (INFINIPATH_HWE_TXEMEMPARITYERR_MASK << + INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT) | + (INFINIPATH_HWE_RXEMEMPARITYERR_MASK << + INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT) | + INFINIPATH_HWE_HTCLNKABYTE0CRCERR | + INFINIPATH_HWE_HTCLNKABYTE1CRCERR | INFINIPATH_HWE_HTCLNKBBYTE0CRCERR | + INFINIPATH_HWE_HTCLNKBBYTE1CRCERR | INFINIPATH_HWE_HTCMISCERR4 | + INFINIPATH_HWE_HTCMISCERR5 | INFINIPATH_HWE_HTCMISCERR6 | + INFINIPATH_HWE_HTCMISCERR7 | INFINIPATH_HWE_HTCBUSTREQPARITYERR | + INFINIPATH_HWE_HTCBUSTRESPPARITYERR | + INFINIPATH_HWE_HTCBUSIREQPARITYERR | + INFINIPATH_HWE_RXDSYNCMEMPARITYERR | INFINIPATH_HWE_MEMBISTFAILED | + INFINIPATH_HWE_COREPLL_FBSLIP | INFINIPATH_HWE_COREPLL_RFSLIP | + INFINIPATH_HWE_HTBPLL_FBSLIP | INFINIPATH_HWE_HTBPLL_RFSLIP | + INFINIPATH_HWE_HTAPLL_FBSLIP | INFINIPATH_HWE_HTAPLL_RFSLIP | + INFINIPATH_HWE_EXTSERDESPLLFAILED | + INFINIPATH_HWE_IBCBUSTOSPCPARITYERR | + INFINIPATH_HWE_IBCBUSFRSPCPARITYERR; + +const uint64_t infinipath_dc_bitsextant = + (INFINIPATH_DC_FORCEHTCMEMPARITYERR_MASK << + INFINIPATH_DC_FORCEHTCMEMPARITYERR_SHIFT) | + (INFINIPATH_DC_FORCETXEMEMPARITYERR_MASK << + INFINIPATH_DC_FORCETXEMEMPARITYERR_SHIFT) | + (INFINIPATH_DC_FORCERXEMEMPARITYERR_MASK << + INFINIPATH_DC_FORCERXEMEMPARITYERR_SHIFT) | + INFINIPATH_DC_FORCEHTCBUSTREQPARITYERR | + INFINIPATH_DC_FORCEHTCBUSTRESPPARITYERR | + INFINIPATH_DC_FORCEHTCBUSIREQPARITYERR | + INFINIPATH_DC_FORCERXDSYNCMEMPARITYERR | + INFINIPATH_DC_COUNTERDISABLE | INFINIPATH_DC_COUNTERWREN | + INFINIPATH_DC_FORCEIBCBUSTOSPCPARITYERR | + INFINIPATH_DC_FORCEIBCBUSFRSPCPARITYERR; + +const uint64_t infinipath_ibcc_bitsextant = + (INFINIPATH_IBCC_FLOWCTRLPERIOD_MASK << + INFINIPATH_IBCC_FLOWCTRLPERIOD_SHIFT) | + (INFINIPATH_IBCC_FLOWCTRLWATERMARK_MASK << + 
INFINIPATH_IBCC_FLOWCTRLWATERMARK_SHIFT) | + (INFINIPATH_IBCC_LINKINITCMD_MASK << + INFINIPATH_IBCC_LINKINITCMD_SHIFT) | + (INFINIPATH_IBCC_LINKCMD_MASK << INFINIPATH_IBCC_LINKCMD_SHIFT) | + (INFINIPATH_IBCC_MAXPKTLEN_MASK << INFINIPATH_IBCC_MAXPKTLEN_SHIFT) | + (INFINIPATH_IBCC_PHYERRTHRESHOLD_MASK << + INFINIPATH_IBCC_PHYERRTHRESHOLD_SHIFT) | + (INFINIPATH_IBCC_OVERRUNTHRESHOLD_MASK << + INFINIPATH_IBCC_OVERRUNTHRESHOLD_SHIFT) | + (INFINIPATH_IBCC_CREDITSCALE_MASK << + INFINIPATH_IBCC_CREDITSCALE_SHIFT) | + INFINIPATH_IBCC_LOOPBACK | INFINIPATH_IBCC_LINKDOWNDEFAULTSTATE; + +const uint64_t infinipath_mdio_bitsextant = + (INFINIPATH_MDIO_CLKDIV_MASK << INFINIPATH_MDIO_CLKDIV_SHIFT) | + (INFINIPATH_MDIO_COMMAND_MASK << INFINIPATH_MDIO_COMMAND_SHIFT) | + (INFINIPATH_MDIO_DEVADDR_MASK << INFINIPATH_MDIO_DEVADDR_SHIFT) | + (INFINIPATH_MDIO_REGADDR_MASK << INFINIPATH_MDIO_REGADDR_SHIFT) | + (INFINIPATH_MDIO_DATA_MASK << INFINIPATH_MDIO_DATA_SHIFT) | + INFINIPATH_MDIO_CMDVALID | INFINIPATH_MDIO_RDDATAVALID; + +const uint64_t infinipath_ibcs_bitsextant = + (INFINIPATH_IBCS_LINKTRAININGSTATE_MASK << + INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) | + (INFINIPATH_IBCS_LINKSTATE_MASK << INFINIPATH_IBCS_LINKSTATE_SHIFT) | + INFINIPATH_IBCS_TXREADY | INFINIPATH_IBCS_TXCREDITOK; + +const uint64_t infinipath_extc_bitsextant = + (INFINIPATH_EXTC_GPIOINVERT_MASK << INFINIPATH_EXTC_GPIOINVERT_SHIFT) | + (INFINIPATH_EXTC_GPIOOE_MASK << INFINIPATH_EXTC_GPIOOE_SHIFT) | + INFINIPATH_EXTC_SERDESENABLE | INFINIPATH_EXTC_SERDESCONNECT | + INFINIPATH_EXTC_SERDESENTRUNKING | INFINIPATH_EXTC_SERDESDISRXFIFO | + INFINIPATH_EXTC_SERDESENPLPBK1 | INFINIPATH_EXTC_SERDESENPLPBK2 | + INFINIPATH_EXTC_SERDESENENCDEC | INFINIPATH_EXTC_LEDSECPORTGREENON | + INFINIPATH_EXTC_LEDSECPORTYELLOWON | INFINIPATH_EXTC_LEDPRIPORTGREENON | + INFINIPATH_EXTC_LEDPRIPORTYELLOWON | INFINIPATH_EXTC_LEDGBLOKGREENON | + INFINIPATH_EXTC_LEDGBLERRREDOFF; + +/* Start of Documentation block for SerDes registers + * serdes and xgxs register bits; not all have defines, + * since I haven't yet needed them all, and I'm lazy. 
Those that I needed
+ are in ipath_registers.h
+
+serdesConfig0Out (R/W)
+	Default Value
+bit[3:0] - ResetA/B/C/D (4'b1111)
+bit[7:4] - L1PwrdnA/B/C/D (4'b0000)
+bit[11:8] - RxIdleEnX (4'b0000)
+bit[15:12] - TxIdleEnX (4'b0000)
+bit[19:16] - RxDetectEnX (4'b0000)
+bit[23:20] - BeaconTxEnX (4'b0000)
+bit[27:24] - RxTermEnX (4'b0000)
+bit[28] - ResetPLL (1'b0)
+bit[29] - L2Pwrdn (1'b0)
+bit[37:30] - Offset[7:0] (8'b00000000)
+bit[38] - OffsetEn (1'b0)
+bit[39] - ParLPBK (1'b0)
+bit[40] - ParReset (1'b0)
+bit[42:41] - RefSel (2'b10)
+bit[43] - PW (1'b0)
+bit[47:44] - LPBKA/B/C/D (4'b0000)
+bit[49:48] - ClkBufTermAdj (2'b0)
+bit[51:50] - RxTermAdj (2'b0)
+bit[53:52] - TxTermAdj (2'b0)
+bit[55:54] - RxEqCtl (2'b0)
+bit[63:56] - Reserved
+
+cce_wip_serdesConfig1Out[63:0] (R/W)
+bit[3:0] - HiDrvX (4'b0000)
+bit[7:4] - LoDrvX (4'b0000)
+bit[11:8] - DtxA[3:0] (4'b0000)
+bit[15:12] - DtxB[3:0] (4'b0000)
+bit[19:16] - DtxC[3:0] (4'b0000)
+bit[23:20] - DtxD[3:0] (4'b0000)
+bit[27:24] - DeqA[3:0] (4'b0000)
+bit[31:28] - DeqB[3:0] (4'b0000)
+bit[35:32] - DeqC[3:0] (4'b0000)
+bit[39:36] - DeqD[3:0] (4'b0000)
+Framer interface, bits 40-59, not used
+bit[44:40] - FmOffsetA[4:0] (5'b00000)
+bit[49:45] - FmOffsetB[4:0] (5'b00000)
+bit[54:50] - FmOffsetC[4:0] (5'b00000)
+bit[59:55] - FmOffsetD[4:0] (5'b00000)
+bit[63:60] - FmOffsetEnA/B/C/D (4'b0000)
+
+SerdesStatus[63:0] (RO)
+bit[3:0] - TxIdleDetectA/B/C/D
+bit[7:4] - RxDetectA/B/C/D
+bit[11:8] - BeaconDetectA/B/C/D
+bit[63:12] - Reserved
+
+XGXSConfigOut[63:0]
+bit[2:0] - Resets, init to 1; bit 0 unused?
+bit[3] - MDIO, select register bank for vendor specific register
+	(0x1e if set, else 0x1f); vendor-specific status in register 8
+	bits 0-3 lanes0-3 signal detect, 1 if detected
+	bits 4-7 lanes0-3 CTC fifo errors, 1 if detected (latched until read)
+bit[8:4] - MDIO port address
+bit[18:9] - lnk_sync_mask
+bit[22:19] - polarity inv
+
+Documentation end */
+
+/*
+ * General specs:
+ * ExtCtrl[63:48] = EXTC_GPIOOE[15:0]
+ * ExtCtrl[47:32] = EXTC_GPIOInvert[15:0]
+ * ExtStatus[63:48] = GpioIn[15:0]
+ *
+ * GPIO[1] = EEPROM_SDA
+ * GPIO[0] = EEPROM_SCL
+ */
+
+#define _IPATH_GPIO_SDA_NUM 1
+#define _IPATH_GPIO_SCL_NUM 0
+
+#define IPATH_GPIO_SDA \
+	(1UL << (_IPATH_GPIO_SDA_NUM+INFINIPATH_EXTC_GPIOOE_SHIFT))
+#define IPATH_GPIO_SCL \
+	(1UL << (_IPATH_GPIO_SCL_NUM+INFINIPATH_EXTC_GPIOOE_SHIFT))
+
+/*
+ * register bits for selecting i2c direction and values, used for I2C serial
+ * flash
+ */
+const uint16_t ipath_gpio_sda_num = _IPATH_GPIO_SDA_NUM;
+const uint16_t ipath_gpio_scl_num = _IPATH_GPIO_SCL_NUM;
+const uint64_t ipath_gpio_sda = IPATH_GPIO_SDA;
+const uint64_t ipath_gpio_scl = IPATH_GPIO_SCL;
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+/*
+ * This file contains all of the code that is specific to the InfiniPath
+ * HT-400 chip.
+ */
+
+/* we support up to 4 chips per system */
+const uint32_t infinipath_max = 4;
+struct ipath_devdata devdata[4];
+static const char *ipath_unit_names[4] = {
+	"infinipath0", "infinipath1", "infinipath2", "infinipath3"
+};
+
+const char *ipath_get_unit_name(int unit)
+{
+	return ipath_unit_names[unit];
+}
+
+static void ipath_check_htlink(ipath_type t);
+
+/*
+ * display hardware errors. Most hardware errors are catastrophic, but for
+ * right now, we'll print them and continue. We reuse the same message
+ * buffer as ipath_handle_errors() to avoid excessive stack usage.
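+ *
+ * As an illustration only (the real caller is the interrupt handler,
+ * which is not part of this hunk), the expected calling pattern is
+ * roughly:
+ *
+ *	char msg[512];
+ *	if (istat & INFINIPATH_I_ERROR)
+ *		ipath_handle_hwerrors(t, msg, sizeof msg);
+ *
+ * (istat here stands for a hypothetical copy of kr_intstatus)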
+ */
+void ipath_handle_hwerrors(const ipath_type t, char *msg, int msgl)
+{
+	uint64_t hwerrs = ipath_kget_kreg64(t, kr_hwerrstatus);
+	uint32_t bits, ctrl;
+	int isfatal = 0;
+	char bitsmsg[64];
+
+	if (!hwerrs) {
+		_IPATH_VDBG("Called but no hardware errors set\n");
+		/*
+		 * better than printing confusing messages; this seems to
+		 * be related to clearing the crc error, or the pll error,
+		 * during init.
+		 */
+		return;
+	} else if (hwerrs == -1LL) {
+		_IPATH_UNIT_ERROR(t,
+			"Read of hardware error status failed (all bits set); ignoring\n");
+		return;
+	}
+	ipath_stats.sps_hwerrs++;
+
+	/*
+	 * clear the error, regardless of whether we continue or stop using
+	 * the chip.
+	 */
+	ipath_kput_kreg(t, kr_hwerrclear, hwerrs);
+
+	hwerrs &= devdata[t].ipath_hwerrmask;
+
+	/*
+	 * make sure we get this much out, unless told to be quiet,
+	 * or it's occurred within the last 5 seconds
+	 */
+	if ((hwerrs & ~devdata[t].ipath_lasthwerror) ||
+	    (infinipath_debug & __IPATH_VERBDBG))
+		_IPATH_INFO("Hardware error: hwerr=0x%llx (cleared)\n", hwerrs);
+	devdata[t].ipath_lasthwerror |= hwerrs;
+
+	if (hwerrs & ~infinipath_hwe_bitsextant)
+		_IPATH_UNIT_ERROR(t,
+			"hwerror interrupt with unknown errors %llx set\n",
+			hwerrs & ~infinipath_hwe_bitsextant);
+
+	ctrl = ipath_kget_kreg32(t, kr_control);
+	if (ctrl & INFINIPATH_C_FREEZEMODE) {
+		if (hwerrs) {
+			/*
+			 * if any bits are set that we aren't ignoring,
+			 * only make the complaint once, in case it's stuck
+			 * or recurring, and we get here multiple times
+			 */
+			if (devdata[t].ipath_flags & IPATH_INITTED) {
+				_IPATH_UNIT_ERROR(t,
+					"Fatal Error (freezemode), no longer usable\n");
+				isfatal = 1;
+			}
+			*devdata[t].ipath_statusp &= ~IPATH_STATUS_IB_READY;
+			/* mark as having had error */
+			*devdata[t].ipath_statusp |= IPATH_STATUS_HWERROR;
+			/*
+			 * mark as not usable, at a minimum until driver
+			 * is reloaded, probably until reboot, since no
+			 * other reset is possible.
+			 */
+			devdata[t].ipath_flags &= ~IPATH_INITTED;
+		} else {
+			_IPATH_DBG
+			    ("Clearing freezemode on ignored hardware error\n");
+			ctrl &= ~INFINIPATH_C_FREEZEMODE;
+			ipath_kput_kreg(t, kr_control, ctrl);
+		}
+	}
+
+	*msg = '\0';
+
+	/*
+	 * may someday want to decode into which bits are which
+	 * functional area for parity errors, etc.
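+	 * The decode below is plain mask-and-shift extraction; schematically
+	 * (SHIFT and MASK stand in for each field's constants):
+	 *
+	 *	bits = (uint32_t) ((hwerrs >> SHIFT) & MASK);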
+	 */
+	if (hwerrs & (infinipath_hwe_htcmemparityerr_mask
+		      << INFINIPATH_HWE_HTCMEMPARITYERR_SHIFT)) {
+		bits = (uint32_t) ((hwerrs >>
+				    INFINIPATH_HWE_HTCMEMPARITYERR_SHIFT) &
+				   INFINIPATH_HWE_HTCMEMPARITYERR_MASK);
+		snprintf(bitsmsg, sizeof bitsmsg, "[HTC Parity Errs %x] ",
+			 bits);
+		strlcat(msg, bitsmsg, msgl);
+	}
+	if (hwerrs & (INFINIPATH_HWE_RXEMEMPARITYERR_MASK
+		      << INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT)) {
+		bits = (uint32_t) ((hwerrs >>
+				    INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT) &
+				   INFINIPATH_HWE_RXEMEMPARITYERR_MASK);
+		snprintf(bitsmsg, sizeof bitsmsg, "[RXE Parity Errs %x] ",
+			 bits);
+		strlcat(msg, bitsmsg, msgl);
+	}
+	if (hwerrs & (INFINIPATH_HWE_TXEMEMPARITYERR_MASK
+		      << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT)) {
+		bits = (uint32_t) ((hwerrs >>
+				    INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT) &
+				   INFINIPATH_HWE_TXEMEMPARITYERR_MASK);
+		snprintf(bitsmsg, sizeof bitsmsg, "[TXE Parity Errs %x] ",
+			 bits);
+		strlcat(msg, bitsmsg, msgl);
+	}
+	if (hwerrs & INFINIPATH_HWE_IBCBUSTOSPCPARITYERR)
+		strlcat(msg, "[IB2IPATH Parity]", msgl);
+	if (hwerrs & INFINIPATH_HWE_IBCBUSFRSPCPARITYERR)
+		strlcat(msg, "[IPATH2IB Parity]", msgl);
+	if (hwerrs & INFINIPATH_HWE_HTCBUSIREQPARITYERR)
+		strlcat(msg, "[HTC Ireq Parity]", msgl);
+	if (hwerrs & INFINIPATH_HWE_HTCBUSTREQPARITYERR)
+		strlcat(msg, "[HTC Treq Parity]", msgl);
+	if (hwerrs & INFINIPATH_HWE_HTCBUSTRESPPARITYERR)
+		strlcat(msg, "[HTC Tresp Parity]", msgl);
+
+/* keep the code below somewhat more readable; not used elsewhere */
+#define _IPATH_HTLINK0_CRCBITS (infinipath_hwe_htclnkabyte0crcerr | \
+				infinipath_hwe_htclnkabyte1crcerr)
+#define _IPATH_HTLINK1_CRCBITS (infinipath_hwe_htclnkbbyte0crcerr | \
+				infinipath_hwe_htclnkbbyte1crcerr)
+#define _IPATH_HTLANE0_CRCBITS (infinipath_hwe_htclnkabyte0crcerr | \
+				infinipath_hwe_htclnkbbyte0crcerr)
+#define _IPATH_HTLANE1_CRCBITS (infinipath_hwe_htclnkabyte1crcerr | \
+				infinipath_hwe_htclnkbbyte1crcerr)
+	if (hwerrs & (_IPATH_HTLINK0_CRCBITS | _IPATH_HTLINK1_CRCBITS)) {
+		char bitsmsg[64];
+		uint64_t crcbits = hwerrs &
+		    (_IPATH_HTLINK0_CRCBITS | _IPATH_HTLINK1_CRCBITS);
+		/* don't check if 8bit HT */
+		if (devdata[t].ipath_flags & IPATH_8BIT_IN_HT0)
+			crcbits &= ~infinipath_hwe_htclnkabyte1crcerr;
+		/* don't check if 8bit HT */
+		if (devdata[t].ipath_flags & IPATH_8BIT_IN_HT1)
+			crcbits &= ~infinipath_hwe_htclnkbbyte1crcerr;
+		/*
+		 * we'll want to ignore link errors on a link that is
+		 * not in use, if any. For now, complain about both
+		 */
+		if (crcbits) {
+			uint16_t ctrl0, ctrl1;
+			snprintf(bitsmsg, sizeof bitsmsg,
+				 "[HT%s lane %s CRC (%llx); ignore till reload]",
+				 !(crcbits & _IPATH_HTLINK1_CRCBITS) ?
+				 "0 (A)" : (!(crcbits & _IPATH_HTLINK0_CRCBITS)
+					    ? "1 (B)" : "0+1 (A+B)"),
+				 !(crcbits & _IPATH_HTLANE1_CRCBITS) ? "0"
+				 : (!(crcbits & _IPATH_HTLANE0_CRCBITS) ? "1" :
+				    "0+1"), crcbits);
+			strlcat(msg, bitsmsg, msgl);
+
+			/*
+			 * print extra info for debugging.
+			 * slave/primary config word 4, 8 (link control 0, 1)
+			 */
+
+			if (pci_read_config_word(devdata[t].pcidev,
+						 devdata[t].ipath_ht_slave_off +
+						 0x4, &ctrl0))
+				_IPATH_INFO
+				    ("Couldn't read linkctrl0 of slave/primary config block\n");
+			else if (!(ctrl0 & 1 << 6))	/* not if EOC bit set */
+				_IPATH_DBG("HT linkctrl0 0x%x%s%s\n", ctrl0,
+					   ((ctrl0 >> 8) & 7) ? " CRC" : "",
+					   ((ctrl0 >> 4) & 1) ?
"linkfail" : + ""); + if (pci_read_config_word + (devdata[t].pcidev, + devdata[t].ipath_ht_slave_off + 0x8, &ctrl1)) + _IPATH_INFO + ("Couldn't read linkctrl1 of slave/primary config block\n"); + else if (!(ctrl1 & 1 << 6)) /* not if EOC bit set */ + _IPATH_DBG("HT linkctrl1 0x%x%s%s\n", ctrl1, + ((ctrl1 >> 8) & 7) ? " CRC" : "", + ((ctrl1 >> 4) & 1) ? "linkfail" : + ""); + + /* disable until driver reloaded */ + devdata[t].ipath_hwerrmask &= ~crcbits; + ipath_kput_kreg(t, kr_hwerrmask, + devdata[t].ipath_hwerrmask); + _IPATH_DBG("HT crc errs: %s\n", msg); + } else + _IPATH_DBG + ("ignoring HT crc errors 0x%llx, not in use\n", + hwerrs & (_IPATH_HTLINK0_CRCBITS | + _IPATH_HTLINK1_CRCBITS)); + } + + if (hwerrs & INFINIPATH_HWE_HTCMISCERR5) + strlcat(msg, "[HT core Misc5]", msgl); + if (hwerrs & INFINIPATH_HWE_HTCMISCERR6) + strlcat(msg, "[HT core Misc6]", msgl); + if (hwerrs & INFINIPATH_HWE_HTCMISCERR7) + strlcat(msg, "[HT core Misc7]", msgl); + if (hwerrs & INFINIPATH_HWE_MEMBISTFAILED) { + strlcat(msg, "[Memory BIST test failed, HT-400 unusable]", + msgl); + /* ignore from now on, so disable until driver reloaded */ + devdata[t].ipath_hwerrmask &= ~INFINIPATH_HWE_MEMBISTFAILED; + ipath_kput_kreg(t, kr_hwerrmask, devdata[t].ipath_hwerrmask); + } +#define _IPATH_PLL_FAIL (INFINIPATH_HWE_COREPLL_FBSLIP | \ + INFINIPATH_HWE_COREPLL_RFSLIP | \ + INFINIPATH_HWE_HTBPLL_FBSLIP | \ + INFINIPATH_HWE_HTBPLL_RFSLIP | \ + INFINIPATH_HWE_HTAPLL_FBSLIP | \ + INFINIPATH_HWE_HTAPLL_RFSLIP) + + if (hwerrs & _IPATH_PLL_FAIL) { + snprintf(bitsmsg, sizeof bitsmsg, + "[PLL failed (%llx), HT-400 unusable]", + hwerrs & _IPATH_PLL_FAIL); + strlcat(msg, bitsmsg, msgl); + /* ignore from now on, so disable until driver reloaded */ + devdata[t].ipath_hwerrmask &= ~(hwerrs & _IPATH_PLL_FAIL); + ipath_kput_kreg(t, kr_hwerrmask, devdata[t].ipath_hwerrmask); + } + + if (hwerrs & INFINIPATH_HWE_EXTSERDESPLLFAILED) { + /* + * If it occurs, it is left masked since the eternal interface + * is unused + */ + devdata[t].ipath_hwerrmask &= + ~INFINIPATH_HWE_EXTSERDESPLLFAILED; + ipath_kput_kreg(t, kr_hwerrmask, devdata[t].ipath_hwerrmask); + } + + if (hwerrs & INFINIPATH_HWE_RXDSYNCMEMPARITYERR) + strlcat(msg, "[Rx Dsync]", msgl); + if (hwerrs & INFINIPATH_HWE_SERDESPLLFAILED) + strlcat(msg, "[SerDes PLL]", msgl); + + _IPATH_UNIT_ERROR(t, "%s hardware error\n", msg); + if (isfatal && (!ipath_diags_enabled)) { + if (devdata[t].ipath_freezemsg) { + /* + * for proc status file ; if no trailing } is copied, we'll know + * it was truncated. 
+ */ + snprintf(devdata[t].ipath_freezemsg, + devdata[t].ipath_freezelen, "{%s}", msg); + } + } +} + +/* fill in the board name, based on the board revision register */ +void ipath_ht_get_boardname(const ipath_type t, char *name, size_t namelen) +{ + char *n = NULL; + uint8_t boardrev = devdata[t].ipath_boardrev; + + switch (boardrev) { + case 4: /* Ponderosa is one of the bringup boards */ + n = "Ponderosa"; + break; + case 5: /* HT-460 original production board */ + n = "InfiniPath_HT-460"; + break; + case 7: /* HT-460 small form factor production board */ + n = "InfiniPath_HT-465"; + break; + case 6: + n = "OEM_Board_3"; + break; + case 8: + n = "LS/X-1"; + break; + case 9: /* Comstock bringup test board */ + n = "Comstock"; + break; + case 10: + n = "OEM_Board_2"; + break; + case 11: + n = "HT-470"; + break; + default: /* don't know, just print the number */ + _IPATH_ERROR("Don't yet know about board with ID %u\n", + boardrev); + snprintf(name, namelen, "UnknownBoardRev%u", boardrev); + break; + } + if (n) + snprintf(name, namelen, "%s", n); +} + +int ipath_validate_rev(struct ipath_devdata * dd) +{ + if (dd->ipath_majrev != 3 || dd->ipath_minrev != 2) { + /* + * This version of the driver only supports the HT-400 + * Rev 3.2 + */ + _IPATH_UNIT_ERROR(IPATH_UNIT(dd), + "Unsupported HT-400 revision %u.%u!\n", + dd->ipath_majrev, dd->ipath_minrev); + return 1; + } + if (dd->ipath_htspeed != 800) + _IPATH_UNIT_ERROR(IPATH_UNIT(dd), + "Incorrectly configured for HT @ %uMHz\n", + dd->ipath_htspeed); + if (dd->ipath_boardrev == 7 || dd->ipath_boardrev == 11 || + dd->ipath_boardrev == 6) + dd->ipath_flags |= IPATH_GPIO_INTR; + else if (dd->ipath_boardrev == 8) { /* LS/X-1 */ + uint64_t val; + val = ipath_kget_kreg64(dd->ipath_pd[0]->port_unit, kr_extstatus); + if (val & INFINIPATH_EXTS_SERDESSEL) { /* hardware disabled */ + /* This means that the chip is hardware disabled, and will + * not be able to bring up the link, in any case. We special + * case this and abort early, to avoid later messages. 
We
+		 * also set the DISABLED status bit.
+		 */
+			_IPATH_DBG("Unit %u is hardware-disabled\n",
+				   dd->ipath_pd[0]->port_unit);
+			*dd->ipath_statusp |= IPATH_STATUS_DISABLED;
+			return 2;	/* this value is handled differently */
+		}
+	}
+	return 0;
+}
+
+static void ipath_check_htlink(ipath_type t)
+{
+	uint8_t linkerr, link_off, i;
+
+	for (i = 0; i < 2; i++) {
+		link_off = devdata[t].ipath_ht_slave_off + i * 4 + 0xd;
+		if (pci_read_config_byte(devdata[t].pcidev, link_off, &linkerr))
+			_IPATH_INFO
+			    ("Couldn't read linkerror%d of HT slave/primary block\n",
+			     i);
+		else if (linkerr & 0xf0) {
+			_IPATH_VDBG("HT linkerr%d bits 0x%x set, clearing\n",
+				    i, linkerr >> 4);
+			/*
+			 * writing the linkerr bits that are set should
+			 * clear them
+			 */
+			if (pci_write_config_byte
+			    (devdata[t].pcidev, link_off, linkerr))
+				_IPATH_DBG
+				    ("Failed write to clear HT linkerror%d\n",
+				     i);
+			if (pci_read_config_byte
+			    (devdata[t].pcidev, link_off, &linkerr))
+				_IPATH_INFO
+				    ("Couldn't reread linkerror%d of HT slave/primary block\n",
+				     i);
+			else if (linkerr & 0xf0)
+				_IPATH_INFO
+				    ("HT linkerror%d bits 0x%x couldn't be cleared\n",
+				     i, linkerr >> 4);
+		}
+	}
+}
+
+/*
+ * now that we have finished initializing everything that might reasonably
+ * cause a hardware error, and cleared those error bits as they occur,
+ * we can enable hardware errors in the mask (potentially enabling
+ * freeze mode), and enable hardware errors as errors (along with
+ * everything else) in errormask
+ */
+void ipath_clear_init_hwerrs(ipath_type t)
+{
+	uint64_t val, extsval;
+
+	extsval = ipath_kget_kreg64(t, kr_extstatus);
+
+	if (!(extsval & INFINIPATH_EXTS_MEMBIST_ENDTEST))
+		_IPATH_UNIT_ERROR(t, "MemBIST did not complete!\n");
+
+	ipath_check_htlink(t);
+
+	/* barring bugs, all hwerrors become interrupts, which can */
+	val = -1LL;
+	/* don't look at crc lane1 if 8 bit */
+	if (devdata[t].ipath_flags & IPATH_8BIT_IN_HT0)
+		val &= ~infinipath_hwe_htclnkabyte1crcerr;
+	/* don't look at crc lane1 if 8 bit */
+	if (devdata[t].ipath_flags & IPATH_8BIT_IN_HT1)
+		val &= ~infinipath_hwe_htclnkbbyte1crcerr;
+
+	/*
+	 * disable RXDSYNCMEMPARITY because external serdes is unused,
+	 * and therefore the logic will never be used or initialized,
+	 * and uninitialized state will normally result in this error
+	 * being asserted. Similarly for the external serdes pll
+	 * lock signal.
+	 */
+	val &=
+	    ~(INFINIPATH_HWE_EXTSERDESPLLFAILED |
+	      INFINIPATH_HWE_RXDSYNCMEMPARITYERR);
+
+	/*
+	 * Disable MISCERR4 because of an inversion in the HT core
+	 * logic checking for errors that cause this bit to be set.
+	 * The errata can also cause the protocol error bit to be set
+	 * in the HT config space linkerror register(s).
+	 */
+	val &= ~INFINIPATH_HWE_HTCMISCERR4;
+
+	/*
+	 * PLL ignored because MDIO interface has a logic problem for reads,
+	 * on Comstock and Ponderosa.
BRINGUP + */ + if (devdata[t].ipath_boardrev == 4 || devdata[t].ipath_boardrev == 9) + val &= ~INFINIPATH_HWE_EXTSERDESPLLFAILED; /* BRINGUP */ + devdata[t].ipath_hwerrmask = val; +} + +/* bring up the serdes */ +int ipath_bringup_serdes(ipath_type t) +{ + uint64_t val, config1; + int ret = 0, change = 0; + + _IPATH_DBG("Trying to bringup serdes\n"); + + if (ipath_kget_kreg64(t, kr_hwerrstatus) & + INFINIPATH_HWE_SERDESPLLFAILED) { + _IPATH_DBG + ("At start, serdes PLL failed bit set in hwerrstatus, clearing and continuing\n"); + ipath_kput_kreg(t, kr_hwerrclear, + INFINIPATH_HWE_SERDESPLLFAILED); + } + + val = ipath_kget_kreg64(t, kr_serdesconfig0); + config1 = ipath_kget_kreg64(t, kr_serdesconfig1); + + _IPATH_VDBG + ("Initial serdes status is config0=%llx config1=%llx, sstatus=%llx xgxs %llx\n", + val, config1, ipath_kget_kreg64(t, kr_serdesstatus), + ipath_kget_kreg64(t, kr_xgxsconfig)); + + /* force reset on */ + val |= + INFINIPATH_SERDC0_RESET_PLL /* | INFINIPATH_SERDC0_RESET_MASK */ ; + ipath_kput_kreg(t, kr_serdesconfig0, val); + udelay(15); /* need pll reset set at least for a bit */ + + if (val & INFINIPATH_SERDC0_RESET_PLL) { + uint64_t val2 = val &= ~INFINIPATH_SERDC0_RESET_PLL; + /* set lane resets, and tx idle, during pll reset */ + val2 |= INFINIPATH_SERDC0_RESET_MASK | INFINIPATH_SERDC0_TXIDLE; + _IPATH_VDBG("Clearing serdes PLL reset (writing %llx)\n", val2); + ipath_kput_kreg(t, kr_serdesconfig0, val2); + /* be sure chip saw it */ + val = ipath_kget_kreg64(t, kr_scratch); + /* + * need pll reset clear at least 11 usec before lane resets + * cleared; give it a few more + */ + udelay(15); + val = val2; /* for check below */ + } + + if (val & (INFINIPATH_SERDC0_RESET_PLL | INFINIPATH_SERDC0_RESET_MASK + | INFINIPATH_SERDC0_TXIDLE)) { + val &= + ~(INFINIPATH_SERDC0_RESET_PLL | INFINIPATH_SERDC0_RESET_MASK + | INFINIPATH_SERDC0_TXIDLE); + ipath_kput_kreg(t, kr_serdesconfig0, val); /* clear them */ + } + + val = ipath_kget_kreg64(t, kr_xgxsconfig); + if (((val >> INFINIPATH_XGXS_MDIOADDR_SHIFT) & + INFINIPATH_XGXS_MDIOADDR_MASK) != 3) { + val &= + ~(INFINIPATH_XGXS_MDIOADDR_MASK << + INFINIPATH_XGXS_MDIOADDR_SHIFT); + /* we use address 3 */ + val |= 3ULL << INFINIPATH_XGXS_MDIOADDR_SHIFT; + change = 1; + } + if (val & INFINIPATH_XGXS_RESET) { /* normally true after boot */ + val &= ~INFINIPATH_XGXS_RESET; + change = 1; + } + if (change) + ipath_kput_kreg(t, kr_xgxsconfig, val); + + val = ipath_kget_kreg64(t, kr_serdesconfig0); + + config1 &= ~0x0ffffffff00ULL; /* clear current and de-emphasis bits */ + config1 |= 0x00000000000ULL; /* set current to 20ma */ + config1 |= 0x0cccc000000ULL; /* set de-emphasis to -5.68dB */ + ipath_kput_kreg(t, kr_serdesconfig1, config1); + + _IPATH_VDBG + ("After setup: serdes status is config0=%llx config1=%llx, sstatus=%llx xgxs %llx\n", + val, config1, ipath_kget_kreg64(t, kr_serdesstatus), + ipath_kget_kreg64(t, kr_xgxsconfig)); + + if ((!ipath_waitfor_mdio_cmdready(t))) { + ipath_kput_kreg(t, kr_mdio, IPATH_MDIO_REQ(IPATH_MDIO_CMD_READ, + 31, + IPATH_MDIO_CTRL_XGXS_REG_8, + 0)); + if (ipath_waitfor_complete + (t, kr_mdio, IPATH_MDIO_DATAVALID, &val)) + _IPATH_DBG + ("Never got MDIO data for XGXS status read\n"); + else + _IPATH_VDBG("MDIO Read reg8, 'bank' 31 %x\n", + (uint32_t) val); + } else + _IPATH_DBG("Never got MDIO cmdready for XGXS status read\n"); + + return ret; /* for now, say we always succeeded */ +} + +/* set serdes to txidle; driver is being unloaded */ +void ipath_quiet_serdes(const ipath_type t) +{ + uint64_t val = 
ipath_kget_kreg64(t, kr_serdesconfig0); + + val |= INFINIPATH_SERDC0_TXIDLE; + _IPATH_DBG("Setting TxIdleEn on serdes (config0 = %llx)\n", val); + ipath_kput_kreg(t, kr_serdesconfig0, val); +} + +EXPORT_SYMBOL(ipath_get_unit_name); + diff -r e8af3873b0d9 -r 5e9b0b7876e2 drivers/infiniband/hw/ipath/ipath_i2c.c --- /dev/null Thu Jan 1 00:00:00 1970 +0000 +++ b/drivers/infiniband/hw/ipath/ipath_i2c.c Wed Dec 28 14:19:43 2005 -0800 @@ -0,0 +1,473 @@ +/* + * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "ipath_kernel.h" +#include "ips_common.h" +#include "ipath_layer.h" + +/* + * InfiniPath I2C Driver for a serial flash. Not a generic i2c + * interface. Requires software bitbanging, with readbacks from chip + * to ensure timing (simple udelay is not enough). Specialized enough + * that using the standard kernel i2c bitbanging interface appears as + * though it would make the code longer and harder to maintain, rather + * than simpler. + * Intended to work with parts similar to Atmel AT24C01 (a 1Kbit part, + * that uses no programmable address bits, with address 1010000b). + */ + +typedef enum i2c_line_type_e { + i2c_line_scl = 0, + i2c_line_sda +} ipath_i2c_type; + +typedef enum i2c_line_state_e { + i2c_line_low = 0, + i2c_line_high +} ipath_i2c_state; + +#define READ_CMD 1 +#define WRITE_CMD 0 + +static int ipath_eeprom_init; + +/* + * The gpioval manipulation really should be protected by spinlocks + * or be converted to use atomic operations (unfortunately, atomic.h + * doesn't cover 64 bit ops for some of them). + */ + +int i2c_gpio_set(ipath_type dev, ipath_i2c_type line, + ipath_i2c_state new_line_state); +int i2c_gpio_get(ipath_type dev, ipath_i2c_type line, + ipath_i2c_state * curr_statep); + +/* + * returns 0 if the line was set to the new state successfully, non-zero + * on error. 
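+ * For illustration, a caller toggling SCL low and then (tri-stated)
+ * high would do, sketch only:
+ *
+ *	i2c_gpio_set(dev, i2c_line_scl, i2c_line_low);
+ *	i2c_gpio_set(dev, i2c_line_scl, i2c_line_high);
+ *
+ * (the ipath_scl_out()/ipath_sda_out() wrappers below add the required
+ * settle delay)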
+ */ +int i2c_gpio_set(ipath_type dev, ipath_i2c_type line, + ipath_i2c_state new_line_state) +{ + uint64_t read_val, write_val, mask, *gpioval; + + gpioval = &devdata[dev].ipath_gpio_out; + read_val = ipath_kget_kreg64(dev, kr_extctrl); + if (line == i2c_line_scl) + mask = ipath_gpio_scl; + else + mask = ipath_gpio_sda; + + if (new_line_state == i2c_line_high) + /* tri-state the output rather than force high */ + write_val = read_val & ~mask; + else + /* config line to be an output */ + write_val = read_val | mask; + ipath_kput_kreg(dev, kr_extctrl, write_val); + + /* set high and verify */ + if (new_line_state == i2c_line_high) + write_val = 0x1UL; + else + write_val = 0x0UL; + + if (line == i2c_line_scl) { + write_val <<= ipath_gpio_scl_num; + *gpioval = *gpioval & ~(1UL << ipath_gpio_scl_num); + *gpioval |= write_val; + } else { + write_val <<= ipath_gpio_sda_num; + *gpioval = *gpioval & ~(1UL << ipath_gpio_sda_num); + *gpioval |= write_val; + } + ipath_kput_kreg(dev, kr_gpio_out, *gpioval); + + return 0; +} + +/* + * returns 0 if the line was set to the new state successfully, non-zero + * on error. curr_state is not set on error. + */ +int i2c_gpio_get(ipath_type dev, ipath_i2c_type line, + ipath_i2c_state * curr_statep) +{ + uint64_t read_val, write_val, mask; + + /* check args */ + if (curr_statep == NULL) + return 1; + + read_val = ipath_kget_kreg64(dev, kr_extctrl); + /* config line to be an input */ + if (line == i2c_line_scl) + mask = ipath_gpio_scl; + else + mask = ipath_gpio_sda; + write_val = read_val & ~mask; + ipath_kput_kreg(dev, kr_extctrl, write_val); + read_val = ipath_kget_kreg64(dev, kr_extstatus); + + if (read_val & mask) + *curr_statep = i2c_line_high; + else + *curr_statep = i2c_line_low; + + return 0; +} + +/* + * would prefer to not inline this, to avoid code bloat, and simplify debugging + * But when compiling against 2.6.10 kernel tree, it gets an error, so + * not for now. + */ +static void ipath_i2c_delay(ipath_type, int); + +/* + * we use this instead of udelay directly, so we can make sure + * that previous register writes have been flushed all the way + * to the chip. Since we are delaying anyway, the cost doesn't + * hurt, and makes the bit twiddling more regular + * If delay is negative, we'll do the chip read, to be sure write made it + * to our chip, but won't do udelay() + */ +static void ipath_i2c_delay(ipath_type dev, int dtime) +{ + /* + * This needs to be volatile, so that the compiler doesn't + * optimize away the read to the device's mapped memory. + */ + volatile uint32_t read_val; + if (!dtime) + return; + read_val = ipath_kget_kreg32(dev, kr_scratch); + if (--dtime > 0) /* register read takes about .5 usec, itself */ + udelay(dtime); +} + +static void ipath_scl_out(ipath_type dev, uint8_t bit, int delay) +{ + i2c_gpio_set(dev, i2c_line_scl, bit ? i2c_line_high : i2c_line_low); + + ipath_i2c_delay(dev, delay); +} + +static void ipath_sda_out(ipath_type dev, uint8_t bit, int delay) +{ + i2c_gpio_set(dev, i2c_line_sda, bit ? i2c_line_high : i2c_line_low); + + ipath_i2c_delay(dev, delay); +} + +static uint8_t ipath_sda_in(ipath_type dev, int delay) +{ + ipath_i2c_state bit; + + if (i2c_gpio_get(dev, i2c_line_sda, &bit)) + _IPATH_DBG("get bit failed!\n"); + + ipath_i2c_delay(dev, delay); + + return bit == i2c_line_high ? 
1U : 0;
+}
+
+/* see if ack following write is true */
+static int ipath_i2c_ackrcv(ipath_type dev)
+{
+	uint8_t ack_received;
+
+	/* AT ENTRY SCL = LOW */
+	/* change direction, ignore data */
+	ack_received = ipath_sda_in(dev, 1);
+	ipath_scl_out(dev, i2c_line_high, 1);
+	ack_received = ipath_sda_in(dev, 1) == 0;
+	ipath_scl_out(dev, i2c_line_low, 1);
+	return ack_received;
+}
+
+/*
+ * write a byte, one bit at a time. Returns 0 if we got the following
+ * ack, otherwise 1
+ */
+static int ipath_wr_byte(ipath_type dev, uint8_t data)
+{
+	int bit_cntr;
+	uint8_t bit;
+
+	for (bit_cntr = 7; bit_cntr >= 0; bit_cntr--) {
+		bit = (data >> bit_cntr) & 1;
+		ipath_sda_out(dev, bit, 1);
+		ipath_scl_out(dev, i2c_line_high, 1);
+		ipath_scl_out(dev, i2c_line_low, 1);
+	}
+	if (!ipath_i2c_ackrcv(dev))
+		return 1;
+	return 0;
+}
+
+static void send_ack(ipath_type dev)
+{
+	ipath_sda_out(dev, i2c_line_low, 1);
+	ipath_scl_out(dev, i2c_line_high, 1);
+	ipath_scl_out(dev, i2c_line_low, 1);
+	ipath_sda_out(dev, i2c_line_high, 1);
+}
+
+/*
+ * ipath_i2c_startcmd - Transmit the start condition, followed by
+ * address/cmd
+ * (both clock/data high, clock high, data low while clock is high)
+ */
+static int ipath_i2c_startcmd(ipath_type dev, uint8_t offset_dir)
+{
+	int res;
+
+	/* issue start sequence */
+	ipath_sda_out(dev, i2c_line_high, 1);
+	ipath_scl_out(dev, i2c_line_high, 1);
+	ipath_sda_out(dev, i2c_line_low, 1);
+	ipath_scl_out(dev, i2c_line_low, 1);
+
+	/* issue length and direction byte */
+	res = ipath_wr_byte(dev, offset_dir);
+
+	if (res)
+		_IPATH_VDBG("No ack to complete start\n");
+	return res;
+}
+
+/*
+ * stop_cmd - Transmit the stop condition
+ * (both clock/data low, clock high, data high while clock is high)
+ */
+static void stop_cmd(ipath_type dev)
+{
+	ipath_scl_out(dev, i2c_line_low, 1);
+	ipath_sda_out(dev, i2c_line_low, 1);
+	ipath_scl_out(dev, i2c_line_high, 1);
+	ipath_sda_out(dev, i2c_line_high, 3);
+}
+
+/*
+ * ipath_eeprom_reset - reset I2C communication.
+ *
+ * eeprom: Atmel AT24C01
+ */
+static int ipath_eeprom_reset(ipath_type dev)
+{
+	int clock_cycles_left = 9;
+	uint64_t *gpioval = &devdata[dev].ipath_gpio_out;
+
+	ipath_eeprom_init = 1;
+	*gpioval = ipath_kget_kreg64(dev, kr_gpio_out);
+	_IPATH_VDBG("Resetting i2c flash; initial gpioout reg is %llx\n",
+		    *gpioval);
+
+	/*
+	 * This is to get the i2c into a known state, by first going low,
+	 * then tristate sda (and then tristate scl as first thing in loop)
+	 */
+	ipath_scl_out(dev, i2c_line_low, 1);
+	ipath_sda_out(dev, i2c_line_high, 1);
+
+	while (clock_cycles_left--) {
+		ipath_scl_out(dev, i2c_line_high, 1);
+
+		if (ipath_sda_in(dev, 0)) {
+			ipath_sda_out(dev, i2c_line_low, 1);
+			ipath_scl_out(dev, i2c_line_low, 1);
+			return 0;
+		}
+
+		ipath_scl_out(dev, i2c_line_low, 1);
+	}
+
+	return 1;
+}
+
+/*
+ * ipath_eeprom_read - reads the requested number of bytes from the
+ * eeprom via I2C.
+ *
+ * eeprom: Atmel AT24C01
+ */
+int ipath_eeprom_read(ipath_type dev, uint8_t eeprom_offset, void *buffer,
+		      int len)
+{
+	/* compiler complains unless initialized */
+	uint8_t single_byte = 0;
+	int bit_cntr;
+
+	if (!ipath_eeprom_init)
+		ipath_eeprom_reset(dev);
+
+	eeprom_offset = (eeprom_offset << 1) | READ_CMD;
+
+	if (ipath_i2c_startcmd(dev, eeprom_offset)) {
+		_IPATH_DBG("Failed startcmd\n");
+		stop_cmd(dev);
+		return 1;
+	}
+
+	/*
+	 * flash keeps clocking data out as long as we ack, automatically
+	 * incrementing the address.
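+	 * A hypothetical caller (the offset and length here are
+	 * illustrative only; the real layout lives in struct ipath_flash):
+	 *
+	 *	uint8_t buf[8];
+	 *	if (ipath_eeprom_read(dev, 0, buf, sizeof buf))
+	 *		_IPATH_DBG("eeprom read failed\n");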
+	 */
+	while (len-- > 0) {
+		/* get data */
+		single_byte = 0;
+		for (bit_cntr = 8; bit_cntr; bit_cntr--) {
+			uint8_t bit;
+			ipath_scl_out(dev, i2c_line_high, 1);
+			bit = ipath_sda_in(dev, 0);
+			single_byte |= bit << (bit_cntr - 1);
+			ipath_scl_out(dev, i2c_line_low, 1);
+		}
+
+		/* send ack if not the last byte */
+		if (len)
+			send_ack(dev);
+
+		*((uint8_t *) buffer) = single_byte;
+		buffer = (uint8_t *) buffer + 1;
+	}
+
+	stop_cmd(dev);
+
+	return 0;
+}
+
+/*
+ * ipath_eeprom_write - writes data to the eeprom via I2C.
+ */
+int ipath_eeprom_write(ipath_type dev, uint8_t eeprom_offset, void *buffer,
+		       int len)
+{
+	uint8_t single_byte;
+	int sub_len;
+	uint8_t *bp = buffer;
+	int max_wait_time, i;
+
+	if (!ipath_eeprom_init)
+		ipath_eeprom_reset(dev);
+
+	while (len > 0) {
+		if (ipath_i2c_startcmd(dev, (eeprom_offset << 1) | WRITE_CMD)) {
+			_IPATH_DBG("Failed to start cmd offset %u\n",
+				   eeprom_offset);
+			goto failed_write;
+		}
+
+		sub_len = min(len, 4);
+		eeprom_offset += sub_len;
+		len -= sub_len;
+
+		for (i = 0; i < sub_len; i++) {
+			if (ipath_wr_byte(dev, *bp++)) {
+				_IPATH_DBG
+				    ("no ack after byte %u/%u (%u total remain)\n",
+				     i, sub_len, len + sub_len - i);
+				goto failed_write;
+			}
+		}
+
+		stop_cmd(dev);
+
+		/*
+		 * wait for write complete by waiting for a successful
+		 * read (the chip replies with a zero after the write
+		 * cmd completes, and before it writes to the flash).
+		 * The startcmd for the read will fail the ack until
+		 * the writes have completed. We do this inline to avoid
+		 * the debug prints that are in the real read routine
+		 * if the startcmd fails.
+		 */
+		max_wait_time = 100;
+		while (ipath_i2c_startcmd(dev, READ_CMD)) {
+			stop_cmd(dev);
+			if (!--max_wait_time) {
+				_IPATH_DBG
+				    ("Did not get successful read to complete write\n");
+				goto failed_write;
+			}
+		}
+		/* now read the zero byte */
+		for (i = single_byte = 0; i < 8; i++) {
+			uint8_t bit;
+			ipath_scl_out(dev, i2c_line_high, 1);
+			bit = ipath_sda_in(dev, 0);
+			ipath_scl_out(dev, i2c_line_low, 1);
+			single_byte <<= 1;
+			single_byte |= bit;
+		}
+		stop_cmd(dev);
+	}
+
+	return 0;
+
+failed_write:
+	stop_cmd(dev);
+	return 1;
+}
+
+uint8_t ipath_flash_csum(struct ipath_flash *ifp, int adjust)
+{
+	uint8_t *ip = (uint8_t *) ifp;
+	uint8_t csum = 0, len;
+
+	for (len = 0; len < ifp->if_length; len++)
+		csum += *ip++;
+	csum -= ifp->if_csum;
+	csum = ~csum;
+	if (adjust)
+		ifp->if_csum = csum;
+	return csum;
+}
diff -r e8af3873b0d9 -r 5e9b0b7876e2 drivers/infiniband/hw/ipath/ipath_lib.c
--- /dev/null	Thu Jan 1 00:00:00 1970 +0000
+++ b/drivers/infiniband/hw/ipath/ipath_lib.c	Wed Dec 28 14:19:43 2005 -0800
@@ -0,0 +1,90 @@
+/*
+ * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + */ + +/* + * This is library code for the driver, similar to what's in libinfinipath for + * usermode code. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "ipath_kernel.h" + +unsigned infinipath_debug = __IPATH_INFO; + +uint32_t _ipath_pico_per_cycle; /* always present, for now */ + +/* + * This isn't perfect, but it's close enough for timing work. We want this + * to work on systems where the cycle counter isn't the same as the clock + * frequency. The one msec spin is OK, since we execute this only once + * when first loaded. We don't use CURRENT_TIME because on some systems + * it only has jiffy resolution; we just assume udelay is well calibrated + * and that we aren't likely to be rescheduled. Do it multiple times, + * with a yield in between, to try to make sure we get the "true minimum" + * value. + * _ipath_pico_per_cycle isn't going to lead to completely accurate + * conversions from timestamps to nanoseconds, but it's close enough + * for our purposes, which is mainly to allow people to show events with + * nsecs or usecs if desired, rather than cycles. + */ +void ipath_init_picotime(void) +{ + int i; + u_int64_t ts, te, delta = -1ULL; + + for (i = 0; i < 5; i++) { + ts = get_cycles(); + udelay(250); + te = get_cycles(); + if ((te - ts) < delta) + delta = te - ts; + yield(); + } + _ipath_pico_per_cycle = 250000000 / delta; +} diff -r e8af3873b0d9 -r 5e9b0b7876e2 drivers/infiniband/hw/ipath/ipath_upages.c --- /dev/null Thu Jan 1 00:00:00 1970 +0000 +++ b/drivers/infiniband/hw/ipath/ipath_upages.c Wed Dec 28 14:19:43 2005 -0800 @@ -0,0 +1,144 @@ +/* + * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * Patent licenses, if any, provided herein do not apply to
+ * combinations of this program with other software, or any other
+ * product whatsoever.
+ */
+
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+
+#include "ipath_kernel.h"
+
+/*
+ * Our version of the kernel mlock function. This function is no longer
+ * exposed, so we need to do it ourselves. It takes a given start page
+ * (page aligned user virtual address) and pins it and the following specified
+ * number of pages.
+ * For now, num_pages is always 1, but that will probably change at some
+ * point (because caller is doing expected sends on a single virtually
+ * contiguous buffer, so we can do all pages at once).
+ */
+int ipath_get_upages(unsigned long start_page, size_t num_pages, struct page **p)
+{
+	int n;
+
+	_IPATH_VDBG("pin %lx pages from vaddr %lx\n", num_pages, start_page);
+	down_read(&current->mm->mmap_sem);
+	n = get_user_pages(current, current->mm, start_page, num_pages, 1, 1,
+			   p, NULL);
+	up_read(&current->mm->mmap_sem);
+	if (n != num_pages) {
+		_IPATH_INFO
+		    ("get_user_pages (0x%lx pages starting at 0x%lx) failed with %d\n",
+		     num_pages, start_page, n);
+		if (n < 0)	/* it's an errno */
+			return n;
+		/*
+		 * We may have gotten some pages, so unlock those.
+		 * ipath_putpages() correctly handles n==0
+		 */
+		ipath_putpages(n, p);
+		return -ENOMEM;	/* no way to know actual error */
+	}
+
+	return 0;
+}
+
+/*
+ * this is similar to ipath_get_upages, but it's always one page, and we mark
+ * the page as locked for i/o, and shared. This is used for the user process
+ * page that contains the destination address for the rcvhdrq tail update,
+ * so we need to have the vma. If we don't do this, the page can be taken
+ * away from us on fork, even if the child never touches it, and then
+ * the user process never sees the tail register updates.
+ */
+int ipath_get_upages_nocopy(unsigned long start_page, struct page **p)
+{
+	int n;
+	struct vm_area_struct *vm = NULL;
+
+	down_read(&current->mm->mmap_sem);
+	n = get_user_pages(current, current->mm, start_page, 1, 1, 1, p, &vm);
+	up_read(&current->mm->mmap_sem);
+	if (n != 1) {
+		_IPATH_INFO("get_user_pages for 0x%lx failed with %d\n",
+			    start_page, n);
+		if (n < 0)	/* it's an errno */
+			return n;
+		/*
+		 * If we ever ask for more than a single page, we will have to
+		 * free the pages (if any) that we did get, via ipath_get_upages()
+		 * or put_page() directly.
+		 */
+		return -ENOMEM;	/* no way to know actual error */
+	}
+	vm->vm_flags |= VM_SHM | VM_LOCKED;
+
+	return 0;
+}
+
+/*
+ * Unpins the specified number of pages (each previously pinned by
+ * ipath_get_upages() or ipath_get_upages_nocopy()), marking each page
+ * dirty first, since the hardware may have written to it.
+ */
+void ipath_putpages(size_t num_pages, struct page **p)
+{
+	int i;
+
+	for (i = 0; i < num_pages; i++) {
+		_IPATH_MMDBG("%u/%lu put_page %p\n", i, num_pages, p[i]);
+		set_page_dirty_lock(p[i]);
+		put_page(p[i]);
+	}
+}
+
+/*
+ * This routine frees up all the allocations made in this file; it's a nop
+ * now, but I'm leaving it in case we go back to a more sophisticated
+ * implementation later.
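+ *
+ * For reference, the intended pin/use/unpin pairing in callers is
+ * roughly (sketch; addr stands for a page-aligned user virtual
+ * address):
+ *
+ *	struct page *pages[1];
+ *	if (!ipath_get_upages(addr, 1, pages)) {
+ *		... use the page ...
+ *		ipath_putpages(1, pages);
+ *	}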
+ */
+void ipath_upages_cleanup(struct ipath_portdata *pd)
+{
+}

From bos at pathscale.com  Wed Dec 28 16:31:24 2005
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Wed, 28 Dec 2005 16:31:24 -0800
Subject: [openib-general] [PATCH 5 of 20] ipath - driver core header files
In-Reply-To: 
Message-ID: <2d9a3f27a10c8f11df92.1135816284@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan 

diff -r a3a00f637da6 -r 2d9a3f27a10c drivers/infiniband/hw/ipath/ipath_common.h
--- /dev/null	Thu Jan 1 00:00:00 1970 +0000
+++ b/drivers/infiniband/hw/ipath/ipath_common.h	Wed Dec 28 14:19:42 2005 -0800
@@ -0,0 +1,704 @@
+/*
+ * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials
+ *   provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * Patent licenses, if any, provided herein do not apply to
+ * combinations of this program with other software, or any other
+ * product whatsoever.
+ */
+
+#ifndef _IPATH_COMMON_H
+#define _IPATH_COMMON_H
+
+/*
+ * This file contains defines, structures, etc. that are used
+ * to communicate between kernel and user code.
+ */
+
+#define IPATH_IDSTR "PathScale 1.1\n"
+
+typedef uint8_t ipath_type;
+
+/* This is the IEEE-assigned OUI for PathScale, Inc. */
+#define IPATH_SRC_OUI_1 0x00
+#define IPATH_SRC_OUI_2 0x11
+#define IPATH_SRC_OUI_3 0x75
+
+/* version of protocol header (known to chip also). In the long run,
+ * we should be able to generate and accept a range of version numbers;
+ * for now we only accept one, and it's compiled in.
+ */
+#define IPS_PROTO_VERSION 2
+
+/*
+ * These are compile time constants that you may want to enable or disable
+ * if you are trying to debug problems with code or performance.
+ * IPATH_VERBOSE_TRACING define as 1 if you want additional tracing in
+ * fastpath code
+ * IPATH_TRACE_REGWRITES define as 1 if you want register writes to be
+ * traced in fastpath code
+ * _IPATH_TRACING define as 0 if you want to remove all tracing in a
+ * compilation unit
+ * _IPATH_DEBUGGING define as 0 if you want to remove debug prints
+ */
+
+/*
+ * These tell the driver which ioctl's belong to the diags interface.
+ * As above, don't use them elsewhere.
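+ * A dispatch routine can route them with a range check of roughly this
+ * form (sketch only; how the request is then handled is not shown):
+ *
+ *	if (nr >= _IPATH_DIAG_IOCTL_LOW && nr <= _IPATH_DIAG_IOCTL_HIGH)
+ *		... hand off to the diag ioctl code ...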
+ */
+#define _IPATH_DIAG_IOCTL_LOW 100
+#define _IPATH_DIAG_IOCTL_HIGH 109
+
+struct ipath_eeprom_req {
+	long long addr;
+	uint16_t len;
+	uint16_t offset;
+};
+
+/*
+ * NOTE: We use compatible ioctls, the same ioctl code for both 32 and 64
+ * bit user mode. For that reason, all structures, etc. used in these
+ * ioctls must have the same size and offsets, in both 32 and 64 bit mode.
+ * We normally use uint64_t to hold pointers for this reason, doing appropriate
+ * casts on both sides before using the data value.
+ */
+
+/* init; user params to kernel */
+#define IPATH_USERINIT _IOW('s', 16, struct ipath_user_info)
+/* init; kernel/chip params to user */
+#define IPATH_BASEINFO _IOR('s', 17, struct ipath_base_info)
+/* send a packet */
+#define IPATH_SENDPKT _IOW('s', 18, struct ipath_sendpkt)
+/*
+ * if arg is 0, disable port, used when flushing after a hdrq overflow.
+ * If arg is 1, re-enable, and return new value of head register
+ */
+#define IPATH_RCVCTRL _IOR('s', 19, uint32_t)
+#define IPATH_READ_EEPROM _IOWR('s', 20, struct ipath_eeprom_req)
+/* set an accepted partition key; up to 4 pkeys can be active at once */
+#define IPATH_SET_PKEY _IOW('s', 21, uint16_t)
+#define IPATH_WRITE_EEPROM _IOWR('s', 22, struct ipath_eeprom_req)
+/* set LID for interface (SMA) */
+#define IPATH_SET_LID _IOW('s', 23, uint32_t)
+/* set IB MTU for interface (SMA) */
+#define IPATH_SET_MTU _IOW('s', 24, uint32_t)
+/* set IB link state for interface (SMA) */
+#define IPATH_SET_LINKSTATE _IOW('s', 25, uint32_t)
+/* send an SMA packet, sps_flags contains "normal" SMA unit and minor number. */
+#define IPATH_SEND_SMA_PKT _IOW('s', 26, struct ipath_sendpkt)
+/* receive an SMA packet */
+#define IPATH_RCV_SMA_PKT _IOW('s', 27, struct ipath_sendpkt)
+/* get the portinfo data (SMA)
+ * takes array of 13, returns port info fields.
Data is in host order, + * not network order; SMA-only fields are not filled in + */ +#define IPATH_GET_PORTINFO _IOWR('s', 28, uint32_t *) +/* + * get the nodeinfo data (SMA) + * takes an array of 10, returns nodeinfo fields in host order + */ +#define IPATH_GET_NODEINFO _IOWR('s', 29, uint32_t *) +/* set GUID on interface (SMA; GUID given in network order) */ +#define IPATH_SET_GUID _IOW('s', 30, struct ipath_setguid) +/* set MLID for interface (SMA) */ +#define IPATH_SET_MLID _IOW('s', 31, uint32_t) +#define IPATH_GET_MLID _IOWR('s', 32, uint32_t *) /* get the MLID (SMA) */ +/* update expected TID entries */ +#define IPATH_UPDM_TID _IOWR('s', 33, struct _tidupd) +/* free expected TID entries */ +#define IPATH_FREE_TID _IOW('s', 34, struct _tidupd) +/* return assigned unit:port */ +#define IPATH_GETPORT _IOR('s', 35, uint32_t) +/* wait for rcv pkt or pioavail */ +#define IPATH_WAIT _IOW('s', 36, uint32_t) +/* return LID for passed in unit */ +#define IPATH_GETLID _IOR('s', 37, uint16_t) +/* return # of units supported by driver */ +#define IPATH_GETUNITS _IO('s', 38) +/* get the device status */ +#define IPATH_GET_DEVSTATUS _IOWR('s', 39, uint64_t *) + +/* available for reuse ('s', 48) */ + +/* diagnostic read */ +#define IPATH_DIAGREAD _IOR('s', 100, struct ipath_diag_info) +/* diagnostic write */ +#define IPATH_DIAGWRITE _IOW('s', 101, struct ipath_diag_info) +/* HT Config read */ +#define IPATH_DIAG_HTREAD _IOR('s', 102, struct ipath_diag_info) +/* HT config write */ +#define IPATH_DIAG_HTWRITE _IOW('s', 103, struct ipath_diag_info) +#define IPATH_DIAGENTER _IO('s', 104) /* Enter diagnostic mode */ +#define IPATH_DIAGLEAVE _IO('s', 105) /* Leave diagnostic mode */ +/* send a packet, sps_flags contains unit and minor number. */ +#define IPATH_SEND_DIAG_PKT _IOW('s', 106, struct ipath_sendpkt) +/* + * read I2C FLASH + * NOTE: To read the I2C device, the _uaddress field should contain + * a pointer to struct ipath_eeprom_req, and _unit must be valid + */ +#define IPATH_DIAG_RD_I2C _IOW('s', 107, struct ipath_diag_info) + +/* + * Monitoring ioctls. All of these work with the main device + * (/dev/ipath), if you don't mind using a port (e.g. you already have + * the device open.) IPATH_GETSTATS and IPATH_GETUNITCOUNTERS also + * work with the control device (/dev/ipath_ctrl), if you don't want to + * use a port. + */ + +/* return chip counters for current unit. */ +#define IPATH_GETCOUNTERS _IOR('s', 40, struct infinipath_counters) +/* return chip stats */ +#define IPATH_GETSTATS _IOR('s', 41, struct infinipath_stats) +/* return chip counters for a particular unit. */ +#define IPATH_GETUNITCOUNTERS _IOR('s', 43, struct infinipath_getunitcounters) + +/* + * unit is incoming unit number. + * data is a pointer to the infinipath_counters structure. + */ +struct infinipath_getunitcounters { + uint16_t unit; + uint16_t fill[3]; /* required for same size struct 32/64 bit */ + uint64_t data; +}; + +/* + * The value in the BTH QP field that InfiniPath uses to differentiate + * an infinipath protocol IB packet vs standard IB transport + */ +#define IPATH_KD_QP 0x656b79 + +/* + * valid states passed to ipath_set_linkstate() user call + * (IPATH_SET_LINKSTATE ioctl) + */ +#define IPATH_IB_LINKDOWN 0 +#define IPATH_IB_LINKARM 1 +#define IPATH_IB_LINKACTIVE 2 +#define IPATH_IB_LINKINIT 3 +#define IPATH_IB_LINKDOWN_POLL 4 +#define IPATH_IB_LINKDOWN_DISABLE 5 + +/* + * stats maintained by the driver. For now, at least, this is global + * to all minor devices. 
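+ *
+ * Userspace can fetch a snapshot of these with the IPATH_GETSTATS
+ * ioctl defined above; e.g. (sketch only, error handling elided; fd
+ * is an open /dev/ipath descriptor):
+ *
+ *	struct infinipath_stats st;
+ *	ioctl(fd, IPATH_GETSTATS, &st);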
+ */ +struct infinipath_stats { + uint64_t sps_ints; /* number of interrupts taken */ + uint64_t sps_errints; /* number of interrupts for errors */ + /* number of errors from chip (not including packet errors or CRC) */ + uint64_t sps_errs; + /* number of packet errors from chip other than CRC */ + uint64_t sps_pkterrs; + /* number of packets with CRC errors (ICRC and VCRC) */ + uint64_t sps_crcerrs; + /* number of hardware errors reported (parity, etc.) */ + uint64_t sps_hwerrs; + /* number of times IB link changed state unexpectedly */ + uint64_t sps_iblink; + uint64_t sps_unused3; /* no longer used; left for compatibility */ + uint64_t sps_port0pkts; /* number of kernel (port0) packets received */ + /* number of "ethernet" packets sent by driver */ + uint64_t sps_ether_spkts; + /* number of "ethernet" packets received by driver */ + uint64_t sps_ether_rpkts; + uint64_t sps_sma_spkts; /* number of SMA packets sent by driver */ + uint64_t sps_sma_rpkts; /* number of SMA packets received by driver */ + /* number of times all ports rcvhdrq was full and packet dropped */ + uint64_t sps_hdrqfull; + /* number of times all ports egrtid was full and packet dropped */ + uint64_t sps_etidfull; + /* + * number of times we tried to send from driver, but no pio + * buffers avail + */ + uint64_t sps_nopiobufs; + uint64_t sps_ports; /* number of ports currently open */ + /* list of pkeys (other than default) accepted (0 means not set) */ + uint16_t sps_pkeys[4]; + /* lids for up to 4 infinipaths, indexed by infinipath # */ + uint16_t sps_lid[4]; + /* number of user ports per chip (not IB ports) */ + uint32_t sps_nports; + uint32_t sps_nullintr; /* not our interrupt, or already handled */ + uint32_t sps_maxpkts_call; /* max number of packets handled per receive call */ + uint32_t sps_avgpkts_call; /* avg number of packets handled per receive call */ + uint64_t sps_pagelocks; /* total number of pages locked */ + uint64_t sps_pageunlocks; /* total number of pages unlocked */ + /* + * Number of packets dropped in kernel other than errors + * (ether packets if ipath not configured, sma/mad, etc.) + */ + uint64_t sps_krdrops; + /* mlids for up to 4 infinipaths, indexed by infinipath # */ + uint16_t sps_mlid[4]; + uint64_t __sps_pad[45]; /* pad for future growth */ +}; + +/* + * These are the status bits returned (in ascii form, 64bit value) + * by the IPATH_GETSTATS ioctl. + */ +#define IPATH_STATUS_INITTED 0x1 /* basic driver initialization done */ +#define IPATH_STATUS_DISABLED 0x2 /* hardware disabled */ +#define IPATH_STATUS_UNUSED 0x4 /* available */ +#define IPATH_STATUS_OIB_SMA 0x8 /* ipath_mad kernel SMA running */ +#define IPATH_STATUS_SMA 0x10 /* user SMA running */ +/* Chip has been found and initted */ +#define IPATH_STATUS_CHIP_PRESENT 0x20 +#define IPATH_STATUS_IB_READY 0x40 /* IB link is at ACTIVE, has LID, + * usable for all VL's */ +/* after link up, LID,MTU,etc. has been configured */ +#define IPATH_STATUS_IB_CONF 0x80 +/* no link established, probably no cable */ +#define IPATH_STATUS_IB_NOCABLE 0x100 +/* A Fatal hardware error has occurred. */ +#define IPATH_STATUS_HWERROR 0x200 + +/* The list of usermode accessible registers. Also see Reg_* later in file */ +typedef enum _ipath_ureg { + ur_rcvhdrtail = 0, /* (RO) DMA RcvHdr to be used next. */ + /* (RW) RcvHdr entry to be processed next by host. */ + ur_rcvhdrhead = 1, + ur_rcvegrindextail = 2, /* (RO) Index of next Eager index to use. 
*/ + ur_rcvegrindexhead = 3, /* (RW) Eager TID to be processed next */ + /* For internal use only; max register number. */ + _IPATH_UregMax +} ipath_ureg; + +/* SMA minor# no portinfo, one for all instances */ +#define IPATH_SMA 128 + +/* Control minor# no portinfo, one for all instances */ +#define IPATH_CTRL 130 + +/* + * This structure is returned by ipath_userinit() immediately after open + * to get implementation-specific info, and info specific to this + * instance. + */ +struct ipath_base_info { + /* version of hardware, for feature checking. */ + uint32_t spi_hw_version; + /* version of software, for feature checking. */ + uint32_t spi_sw_version; + /* InfiniPath port assigned, goes into sent packets */ + uint32_t spi_port; + /* + * IB MTU, packets IB data must be less than this. + * The MTU is in bytes, and will be a multiple of 4 bytes. + */ + uint32_t spi_mtu; + /* + * size of a PIO buffer. Any given packet's total + * size must be less than this (in words). Included is the + * starting control word, so if 513 is returned, then total + * pkt size is 512 words or less. + */ + uint32_t spi_piosize; + /* size of the TID cache in infinipath, in entries */ + uint32_t spi_tidcnt; + /* size of the TID Eager list in infinipath, in entries */ + uint32_t spi_tidegrcnt; + /* size of a single receive header queue entry. */ + uint32_t spi_rcvhdrent_size; + /* Count of receive header queue entries allocated. + * This may be less than the spu_rcvhdrcnt passed in! + */ + uint32_t spi_rcvhdr_cnt; + + uint32_t __32_bit_compatibility_pad; /* DO NOT MOVE OR REMOVE */ + + /* address where receive header queue is mapped into the + * user program. + */ + uint64_t spi_rcvhdr_base; + + /* base address of eager TID receive buffers. + * Allocated by initialization code, not by protocol. + */ + uint64_t spi_rcv_egrbufs; + + /* size of each TID buffer in host memory, + * starting at spi_rcv_egrbufs. It includes spu_egrskip, and is + * at least spi_mtu bytes, and the buffers are virtually contiguous + */ + uint32_t spi_rcv_egrbufsize; + /* + * The special QP (queue pair) value that identifies an infinipath + * protocol packet from standard IB packets. More, probably much + * more, to be added. + */ + uint32_t spi_qpair; + + /* + * user register base for init code, not to be used directly by + * protocol or applications + */ + uint64_t __spi_uregbase; + /* + * maximum buffer size in bytes that can be used in a + * single TID entry (assuming the buffer is aligned to this boundary). + * This is the minimum of what the hardware and software support. + * Guaranteed to be a power of 2. + */ + uint32_t spi_tid_maxsize; + /* + * alignment of each pio send buffer (byte count + * to add to spi_piobufbase to get to second buffer) + */ + uint32_t spi_pioalign; + /* + * the index of the first pio buffer available + * to this process; needed to do lookup in spi_pioavailaddr; not added + * to spi_piobufbase + */ + uint32_t spi_pioindex; + uint32_t spi_piocnt; /* number of buffers mapped for this process */ + + /* + * base address of writeonly pio buffers for this process. + * Each buffer has spi_piosize words, and is aligned on spi_pioalign + * boundaries. spi_piocnt buffers are mapped from this address + */ + uint64_t spi_piobufbase; + + /* + * base address of readonly memory copy of the pioavail registers. + * There are 2 bits for each buffer.
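+ *
+ * Illustration only, not part of the original patch, and the exact
+ * bit layout here is an assumption: with 2 bits per buffer packed
+ * into 64-bit words, buffer i would live in word i / 32, and its busy
+ * bit (INFINIPATH_SENDPIOAVAIL_BUSY_SHIFT, defined later in this
+ * file) could be tested as:
+ *
+ *	const volatile uint64_t *avail =
+ *		(const volatile uint64_t *) spi_pioavailaddr;
+ *	int busy = (avail[i / 32] >>
+ *		(2 * (i % 32) + INFINIPATH_SENDPIOAVAIL_BUSY_SHIFT)) & 1;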
+ */ + uint64_t spi_pioavailaddr; + + /* + * Address where driver updates a copy + * of the interface and driver status (IPATH_STATUS_*) as a 64 bit value. + * It's followed by a string indicating hardware error, if there was one + */ + uint64_t spi_status; + + /* number of chip ports available to user processes */ + uint32_t spi_nports; + uint32_t spi_unit; /* unit number of chip we are using */ + uint32_t spi_rcv_egrperchunk; /* num bufs in each contiguous set */ + /* size in bytes of each contiguous set */ + uint32_t spi_rcv_egrchunksize; + /* total size of mmap to cover full rcvegrbuffers */ + uint32_t spi_rcv_egrbuftotlen; + /* + * ioctl cmd includes struct size, so pad out, and adjust down as + * new fields are added to keep size constant + */ + uint32_t __spi_pad[19]; +} __attribute__ ((aligned(8))); + +#define IPATH_WAIT_RCV 0x1 /* IPATH_WAIT, receive */ +#define IPATH_WAIT_PIO 0x2 /* IPATH_WAIT, PIO */ + +/* + * This version number is given to the driver by the user code during + * initialization in the spu_userversion field of ipath_user_info, so + * the driver can check for compatibility with user code. + * + * The major version changes when data structures + * change in an incompatible way. The driver must be the same or higher + * for initialization to succeed. In some cases, a higher version + * driver will not interoperate with older software, and initialization + * will return an error. + */ +#define IPATH_USER_SWMAJOR 1 + +/* + * Minor version differences are always compatible + * within a major version; however, if user software is newer + * than driver software, some new features and/or structure fields + * may not be implemented; the user code must deal with this if it + * cares, or it must abort after initialization reports the difference + */ +#define IPATH_USER_SWMINOR 2 + +#define IPATH_USER_SWVERSION ((IPATH_USER_SWMAJOR<<16) | IPATH_USER_SWMINOR) + +#define IPATH_KERN_TYPE 0 + +/* Similarly, this is the kernel version going back to the user. It's slightly + * different, in that we want to tell if the driver was built as part of a + * PathScale release, or from the driver from OpenIB, kernel.org, or a + * standard distribution, for support reasons. The high bit is 0 for + * non-PathScale, and 1 for PathScale-built/supplied. + * + * It's returned by the driver to the user code during initialization + * in the spi_sw_version field of ipath_base_info, so the user code can + * in turn check for compatibility with the kernel. + */ +#define IPATH_KERN_SWVERSION ((IPATH_KERN_TYPE<<31) | IPATH_USER_SWVERSION) + +/* + * This structure is passed to ipath_userinit() to tell the driver where + * user code buffers are, sizes, etc. + */ +struct ipath_user_info { + /* + * version of user software, to detect compatibility issues. + * Should be set to IPATH_USER_SWVERSION. + */ + uint32_t spu_userversion; + + /* desired number of receive header queue entries */ + uint32_t spu_rcvhdrcnt; + + /* + * Leave this much unused space at the start of + * each eager buffer for software use. Similar in effect to + * setting K_Offset to this value. Needs to be 'small', on the + * order of one or two cachelines + */ + uint32_t spu_egrskip; + + /* + * number of words in KD protocol header. + * This tells InfiniPath how many words to copy to rcvhdrq. If 0, + * kernel uses a default. Once set, attempts to set any other value + * are an error (EAGAIN) until driver is reloaded.
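+ *
+ * Illustration only, not part of the original patch: user code might
+ * fill in this structure roughly as follows, with 64 receive header
+ * queue entries as an arbitrary example value, and spu_rcvhdrsize
+ * left 0 to take the kernel default:
+ *
+ *	struct ipath_user_info ui = { 0 };
+ *
+ *	ui.spu_userversion = IPATH_USER_SWVERSION;
+ *	ui.spu_rcvhdrcnt = 64;
+ *	ui.spu_rcvhdrsize = 0;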
+ */ + uint32_t spu_rcvhdrsize; + + /* + * cache line aligned (64 byte) user address to + * which the rcvhdrtail register will be written by infinipath + * whenever it changes, so that no chip registers are read in + * the performance path. + */ + uint64_t spu_rcvhdraddr; + + /* + * ioctl cmd includes struct size, so pad out, + * and adjust down as new fields are added to keep size constant + */ + uint32_t __spu_pad[6]; +} __attribute__ ((aligned(8))); + +struct ipath_iovec { + /* Pointer to data, but same size 32 and 64 bit */ + uint64_t iov_base; + + /* + * Length of data; don't need 64 bits, but want + * ipath_sendpkt to remain same size as before 32 bit changes, so... + */ + uint64_t iov_len; +}; + +/* + * Describes a single packet for send. Each packet can have one or more + * buffers, but the total length (exclusive of IB headers) must be less + * than the MTU, and if using the PIO method, entire packet length, + * including IB headers, must be less than the ipath_piosize value (words). + * Use of this necessitates including sys/uio.h + */ +struct ipath_sendpkt { + uint32_t sps_flags; /* flags for packet (TBD) */ + uint32_t sps_cnt; /* number of entries to use in sps_iov */ + /* array of iov's describing packet. TEMPORARY */ + struct ipath_iovec sps_iov[4]; +}; + +struct _tidupd { /* used only in inlined function for ioctl. */ + uint32_t tidcnt; + uint32_t tid__unused; /* make structure same size in 32 and 64 bit */ + uint64_t tidvaddr; /* virtual address of first page in transfer */ + /* pointer (same size 32/64 bit) to uint16_t tid array */ + uint64_t tidlist; + + /* + * pointer (same size 32/64 bit) to bitmap of TIDs used + * for this call; checked for being large enough at open + */ + uint64_t tidmap; +}; + +struct ipath_setguid { /* set GUID for interface */ + uint64_t sguid; /* in network order */ + uint64_t sunit; /* unit number of interface */ +}; + +/* + * Structure used to send data to and receive data from a diags ioctl. + * + * NOTE: For HT reads and writes, we only support byte, word (16bits) and + * dword (32bits). All other sizes for HT are invalid. + */ +struct ipath_diag_info { + uint64_t _base_offset; /* register to start reading from */ + uint64_t _num_bytes; /* number of bytes to read or write */ + /* + * address in user space. + * for reads, this is the address to store the read result(s). + * for writes, it is the address to get the write data from. + * This memory better be valid in user space! + */ + uint64_t _uaddress; + uint64_t _unit; /* Unit ID of chip we are accessing. */ + uint64_t _pad[15]; +}; + +/* + * Data layout in I2C flash (for GUID, etc.)
+ * All fields are little-endian binary unless otherwise stated + */ +#define IPATH_FLASH_VERSION 1 +struct ipath_flash { + uint8_t if_fversion; /* flash layout version (IPATH_FLASH_VERSION) */ + uint8_t if_csum; /* checksum protecting if_length bytes */ + /* + * valid length (in use, protected by if_csum), including if_fversion + * and if_csum themselves + */ + uint8_t if_length; + uint8_t if_guid[8]; /* the GUID, in network order */ + /* number of GUIDs to use, starting from if_guid */ + uint8_t if_numguid; + char if_serial[12]; /* the board serial number, in ASCII */ + char if_mfgdate[8]; /* board mfg date (YYYYMMDD ASCII) */ + /* last board rework/test date (YYYYMMDD ASCII) */ + char if_testdate[8]; + uint8_t if_errcntp[4]; /* logging of error counts, TBD */ + /* powered on hours, updated at driver unload */ + uint8_t if_powerhour[2]; + char if_comment[32]; /* ASCII free-form comment field */ + uint8_t if_future[50]; /* 78 bytes used, min flash size is 128 bytes */ +}; + +uint8_t ipath_flash_csum(struct ipath_flash *, int); + +/* + * These are the counters implemented in the chip, and are listed in order. + * They are returned in this order by the IPATH_GETCOUNTERS ioctl + */ +struct infinipath_counters { + unsigned long long LBIntCnt; + unsigned long long LBFlowStallCnt; + unsigned long long Reserved1; + unsigned long long TxUnsupVLErrCnt; + unsigned long long TxDataPktCnt; + unsigned long long TxFlowPktCnt; + unsigned long long TxDwordCnt; + unsigned long long TxLenErrCnt; + unsigned long long TxMaxMinLenErrCnt; + unsigned long long TxUnderrunCnt; + unsigned long long TxFlowStallCnt; + unsigned long long TxDroppedPktCnt; + unsigned long long RxDroppedPktCnt; + unsigned long long RxDataPktCnt; + unsigned long long RxFlowPktCnt; + unsigned long long RxDwordCnt; + unsigned long long RxLenErrCnt; + unsigned long long RxMaxMinLenErrCnt; + unsigned long long RxICRCErrCnt; + unsigned long long RxVCRCErrCnt; + unsigned long long RxFlowCtrlErrCnt; + unsigned long long RxBadFormatCnt; + unsigned long long RxLinkProblemCnt; + unsigned long long RxEBPCnt; + unsigned long long RxLPCRCErrCnt; + unsigned long long RxBufOvflCnt; + unsigned long long RxTIDFullErrCnt; + unsigned long long RxTIDValidErrCnt; + unsigned long long RxPKeyMismatchCnt; + unsigned long long RxP0HdrEgrOvflCnt; + unsigned long long RxP1HdrEgrOvflCnt; + unsigned long long RxP2HdrEgrOvflCnt; + unsigned long long RxP3HdrEgrOvflCnt; + unsigned long long RxP4HdrEgrOvflCnt; + unsigned long long RxP5HdrEgrOvflCnt; + unsigned long long RxP6HdrEgrOvflCnt; + unsigned long long RxP7HdrEgrOvflCnt; + unsigned long long RxP8HdrEgrOvflCnt; + unsigned long long Reserved6; + unsigned long long Reserved7; + unsigned long long IBStatusChangeCnt; + unsigned long long IBLinkErrRecoveryCnt; + unsigned long long IBLinkDownedCnt; + unsigned long long IBSymbolErrCnt; +}; + +/* + * The next set of defines are for packet headers, and chip register + * and memory bits that are visible to and/or used by user-mode software + * The other bits that are used only by the driver or diags are in + * ipath_registers.h + */ + +/* RcvHdrFlags bits */ +#define INFINIPATH_RHF_LENGTH_MASK 0x7FF +#define INFINIPATH_RHF_LENGTH_SHIFT 0 +#define INFINIPATH_RHF_RCVTYPE_MASK 0x7 +#define INFINIPATH_RHF_RCVTYPE_SHIFT 11 +#define INFINIPATH_RHF_EGRINDEX_MASK 0x7FF +#define INFINIPATH_RHF_EGRINDEX_SHIFT 16 +#define INFINIPATH_RHF_H_ICRCERR 0x80000000 +#define INFINIPATH_RHF_H_VCRCERR 0x40000000 +#define INFINIPATH_RHF_H_PARITYERR 0x20000000 +#define INFINIPATH_RHF_H_LENERR
0x10000000 +#define INFINIPATH_RHF_H_MTUERR 0x08000000 +#define INFINIPATH_RHF_H_IHDRERR 0x04000000 +#define INFINIPATH_RHF_H_TIDERR 0x02000000 +#define INFINIPATH_RHF_H_MKERR 0x01000000 +#define INFINIPATH_RHF_H_IBERR 0x00800000 +#define INFINIPATH_RHF_L_SWA 0x00008000 +#define INFINIPATH_RHF_L_SWB 0x00004000 + +/* infinipath header fields */ +#define INFINIPATH_I_VERS_MASK 0xF +#define INFINIPATH_I_VERS_SHIFT 28 +#define INFINIPATH_I_PORT_MASK 0xF +#define INFINIPATH_I_PORT_SHIFT 24 +#define INFINIPATH_I_TID_MASK 0x7FF +#define INFINIPATH_I_TID_SHIFT 13 +#define INFINIPATH_I_OFFSET_MASK 0x1FFF +#define INFINIPATH_I_OFFSET_SHIFT 0 + +/* K_PktFlags bits */ +#define INFINIPATH_KPF_INTR 0x1 + +/* SendPIO per-buffer control */ +#define INFINIPATH_SP_LENGTHP1_MASK 0x3FF +#define INFINIPATH_SP_LENGTHP1_SHIFT 0 +#define INFINIPATH_SP_INTR 0x80000000 +#define INFINIPATH_SP_TEST 0x40000000 +#define INFINIPATH_SP_TESTEBP 0x20000000 + +/* SendPIOAvail bits */ +#define INFINIPATH_SENDPIOAVAIL_BUSY_SHIFT 1 +#define INFINIPATH_SENDPIOAVAIL_CHECK_SHIFT 0 + +#endif /* _IPATH_COMMON_H */ diff -r a3a00f637da6 -r 2d9a3f27a10c drivers/infiniband/hw/ipath/ipath_kernel.h --- /dev/null Thu Jan 1 00:00:00 1970 +0000 +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h Wed Dec 28 14:19:42 2005 -0800 @@ -0,0 +1,697 @@ +#ifndef _IPATH_KERNEL_H +#define _IPATH_KERNEL_H +/* + * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + */ + + + +/* + * This header file is the base header file for infinipath kernel code + * ipath_user.h serves a similar purpose for user code. 
+ */ + +#include "ipath_common.h" +#include "ipath_kdebug.h" +#include "ipath_registers.h" +#include +#include + +/* only s/w major version of InfiniPath we can handle */ +#define IPATH_CHIP_VERS_MAJ 2U + +#define IPATH_CHIP_VERS_MIN 0U /* don't care about this except printing */ + +extern struct infinipath_stats ipath_stats; /* temporary, maybe always */ + +/* only s/w version of chip we can handle for now */ +#define IPATH_CHIP_SWVERSION IPATH_CHIP_VERS_MAJ + +struct ipath_portdata { + /* minor number of devices, for ipath_type use */ + unsigned port_unit; + /* array of struct page pointers */ + struct page **port_rcvegrbuf_pages; + /* array of virtual addresses (from above) */ + void **port_rcvegrbuf_virt; + void *port_rcvhdrq; /* rcvhdrq base, needs mmap before useful */ + /* kernel virtual address where hdrqtail is updated */ + uint64_t *port_rcvhdrtail_kvaddr; + struct page *port_rcvhdrtail_pagep; /* page * used for uaddr */ + /* + * temp buffer for expected send setup, allocated at open, instead + * of each setup call + */ + void *port_tid_pg_list; + wait_queue_head_t port_wait; /* when waiting for rcv or pioavail */ + /* + * rcvegr bufs base, physical, must fit + * in 44 bits (so 32 bit programs mmap64 44 bit works) + */ + unsigned long port_rcvegr_phys; + /* for mmap of hdrq, must fit in 44 bits */ + unsigned long port_rcvhdrq_phys; + /* + * the actual user address that we locked, so we can + * unlock it at close + */ + unsigned long port_rcvhdrtail_uaddr; + /* + * number of opens on this instance (0 or 1; ignoring forks, dup, + * etc. for now) + */ + int port_cnt; + /* + * how much space to leave at start of eager TID entries for protocol + * use, on each TID + */ + unsigned port_egrskip; + unsigned port_port; /* instead of calculating it */ + uint32_t port_piobufs; /* chip offset of PIO buffers for this port */ + /* how many alloc_pages() chunks in port_rcvegrbuf_pages */ + uint32_t port_rcvegrbuf_chunks; + uint32_t port_rcvegrbufs_perchunk; /* how many egrbufs per chunk */ + /* order used with port_rcvegrbuf_pages */ + uint32_t port_rcvegrbuf_order; + uint32_t port_rcvhdrq_order; /* rcvhdrq order (for free_pages) */ + /* next expected TID to check when looking for free */ + uint32_t port_tidcursor; + /* port state flags (IPATH_PORT_WAITING_* values) */ + uint32_t port_flag; + /* WAIT_RCV that timed out, no interrupt */ + uint32_t port_rcvwait_to; + /* WAIT_PIO that timed out, no interrupt */ + uint32_t port_piowait_to; + uint32_t port_rcvnowait; /* WAIT_RCV already happened, no wait */ + uint32_t port_pionowait; /* WAIT_PIO already happened, no wait */ + uint32_t port_hdrqfull; /* total number of rcvhdrqfull errors */ + pid_t port_pid; /* pid of process using this port */ + /* same size as task_struct .comm[], but no define */ + char port_comm[16]; + uint16_t port_pkeys[4]; /* pkeys set by this use of this port */ +}; + +struct sk_buff; + +/* + * control information for layered drivers. + * This is used only as part of devdata via ipath_layer; + */ +struct _ipath_layer { + int (*l_intr) (const ipath_type, uint32_t); + int (*l_rcv) (const ipath_type, void *, struct sk_buff *); + int (*l_rcv_lid) (const ipath_type, void *); + uint16_t l_rcv_opcode; + uint16_t l_rcv_lid_opcode; +}; + +/* Verbs layer interface */ +struct _verbs_layer { + int (*l_piobufavail) (const ipath_type); + void (*l_rcv) (const ipath_type, void *, void *, uint32_t); + void (*l_timer_cb) (const ipath_type); + struct timer_list l_timer; + unsigned l_flags; +}; + +/* + * These are the fields that only exist for
port 0, not per port, so + * they aren't in ipath_portdata + */ +struct ipath_devdata { + /* driver data structures */ + /* mem-mapped pointer to base of chip regs; should always use read/write{lq} + * when accesses are made, or via memcpy32() for PIO buffers */ + uint64_t __iomem *ipath_kregbase; + /* end of mem-mapped chip space; range checking */ + uint64_t __iomem *ipath_kregend; + /* physical address of chip for io_remap, etc. */ + unsigned long ipath_physaddr; + /* base of memory alloced for ipath_kregbase, for free */ + uint64_t *ipath_kregalloc; + /* + * version of kregbase that doesn't have high bits set (for 32 bit + * programs, so mmap64 44 bit works) + */ + uint64_t __iomem *ipath_kregvirt; + struct ipath_portdata **ipath_pd; /* ipath_cfgports pointers */ + /* sk_buffs used by port 0 eager receive queue */ + struct sk_buff **ipath_port0_skbs; + void __iomem *ipath_piobase; /* kvirt address of 1st pio buffer */ + + /* + * virtual address where port0 rcvhdrqtail updated by chip via DMA; + * volatile because we want to be sure compiler always makes a memory + * reference when we dereference it. + */ + volatile uint64_t *ipath_hdrqtailptr; + /* + * points to area where PIOavail registers will be DMA'ed. Has to + * be on a page of its own, because the page will be mapped into user + * program space. Updated by chip via DMA, treated as readonly by software. + * volatile because we want to be sure compiler always makes a memory + * reference when we dereference it. + */ + volatile uint64_t *ipath_pioavailregs_dma; + + /* original address for kfree */ + volatile uint64_t *__ipath_pioavailregs_base; + /* physical address where updates occur */ + unsigned long ipath_pioavailregs_phys; + struct _ipath_layer ipath_layer; + struct _verbs_layer verbs_layer; + /* total dwords sent (summed from counter) */ + uint64_t ipath_sword; + /* total dwords received (summed from counter) */ + uint64_t ipath_rword; + /* total packets sent (summed from counter) */ + uint64_t ipath_spkts; + /* total packets received (summed from counter) */ + uint64_t ipath_rpkts; + /* to make the receive interrupt failsafe */ + uint64_t ipath_lastqtail; + uint64_t _ipath_status; /* ipath_statusp initially points to this. */ + uint64_t ipath_guid; /* GUID for this interface, in network order */ + /* + * aggregate of error bits reported since + * last cleared, for limiting of error reporting + */ + uint64_t ipath_lasterror; + /* + * aggregate of error bits reported + * since last cleared, for limiting of hwerror reporting + */ + uint64_t ipath_lasthwerror; + /* + * errors masked because they occur too fast, + * also includes errors that are always ignored (ipath_ignorederrs) + */ + uint64_t ipath_maskederrs; + /* time at which to re-enable maskederrs */ + cycles_t ipath_unmasktime; + /* + * errors always ignored (masked), at least + * for a given chip/device, because they are wrong or not useful + */ + uint64_t ipath_ignorederrs; + /* count of egrfull errors, combined for all ports */ + uint64_t ipath_last_tidfull; + uint64_t ipath_lastport0rcv_cnt; /* for ipath_qcheck() */ + + uint32_t ipath_kregsize; /* size of memory at ipath_kregbase */ + /* number of registers used for pioavail */ + uint32_t ipath_pioavregs; + uint32_t ipath_flags; /* IPATH_POLL, etc. */ + /* ipath_flags sma is waiting for */ + uint32_t ipath_sma_state_wanted; + /* last buffer for user use, first buf for kernel use is this index.
*/ + uint32_t ipath_lastport_piobuf; + uint32_t pci_registered; /* driver is a registered pci device */ + uint32_t ipath_stats_timer_active; /* is a stats timer active */ + /* dwords sent read from infinipath counter */ + uint32_t ipath_lastsword; + /* dwords received read from infinipath counter */ + uint32_t ipath_lastrword; + /* sent packets read from infinipath counter */ + uint32_t ipath_lastspkts; + /* received packets read from infinipath counter */ + uint32_t ipath_lastrpkts; + uint32_t ipath_pbufsport; /* pio bufs allocated per port */ + /* + * number of ports configured as max; zero is + * set to number chip supports, less gives more pio bufs/port, etc. + */ + uint32_t ipath_cfgports; + /* our idea of the port0 rcvhdrq head offset */ + uint32_t ipath_port0head; + uint32_t ipath_p0_hdrqfull; /* count of port 0 hdrqfull errors */ + + /* + * (*cfgports) used to suppress multiple instances of same port + * staying stuck at same point + */ + uint32_t *ipath_lastrcvhdrqtails; + /* + * (*cfgports) used to suppress multiple instances of same port + * staying stuck at same point + */ + uint32_t *ipath_lastegrheads; + /* + * index of last piobuffer we used. Speeds up searching, by starting + * at this point. Doesn't matter if multiple cpu's use and update, + * last updater is only write that matters. Whenever it wraps, + * we update shadow copies. Need a copy per device when we get to + * multiple devices + */ + uint32_t ipath_lastpioindex; + uint32_t ipath_freezelen; /* max length of freezemsg */ + uint32_t ipath_consec_nopiobuf; /* consecutive times we wanted a PIO buffer + * but were unable to get one */ + uint32_t ipath_upd_pio_shadow; /* hint that we should update + * ipath_pioavailshadow before looking for a PIO buffer */ + uint32_t ipath_nosma_bufs; /* sequential tries for SMA send and no bufs */ + uint32_t ipath_nosma_secs; /* duration (seconds) ipath_nosma_bufs set */ + /* HT/PCI Vendor ID (here for NodeInfo) */ + uint16_t ipath_vendorid; + /* HT/PCI Device ID (here for NodeInfo) */ + uint16_t ipath_deviceid; + /* offset in HT config space of slave/primary interface block */ + uint8_t ipath_ht_slave_off; + int ipath_mtrr; /* registration handle for WRCOMB setting on */ + /* ref count of how many users set each pkey */ + atomic_t ipath_pkeyrefs[4]; + /* shadow copy of all exptids physaddr; used only by funcsim */ + uint64_t *ipath_tidsimshadow; + /* shadow copy of struct page *'s for exp tid pages */ + struct page **ipath_pageshadow; + /* + * IPATH_STATUS_* + * this address is mapped readonly into user processes so they can + * get status cheaply, whenever they want. + */ + uint64_t *ipath_statusp; + char *ipath_freezemsg; /* freeze msg if hw error put chip in freeze */ + struct pci_dev *pcidev; /* pci access data structure */ + /* timer used to prevent stats overflow, error throttling, etc. */ + struct timer_list ipath_stats_timer; + /* only allow one interrupt at a time. 
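+ *
+ * Illustration only, not part of the original patch (dd as a
+ * struct ipath_devdata pointer and t its unit are assumptions): a
+ * receive path could enforce this with the usual atomic bit idiom,
+ * bailing out when a receive is already being processed:
+ *
+ *	if (test_and_set_bit(0, &dd->ipath_rcv_pending))
+ *		return;
+ *	ipath_kreceive(t);
+ *	clear_bit(0, &dd->ipath_rcv_pending);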
*/ + unsigned long ipath_rcv_pending; + + /* + * shadow copies of registers; size indicates read access size + * Most of them are readonly, but some are write-only registers, where + * we manipulate the bits in the shadow copy, and then write the shadow + * copy to infinipath. + * We deliberately make most of these 32 bits, since they have + * restricted range and for any that we read, we want to generate + * 32 bit accesses, since Opteron will generate 2 separate 32 bit + * HT transactions for a 64 bit read, and we want to avoid unnecessary + * HT transactions + */ + + /* This is the 64 bit group */ + /* + * shadow of pioavail, check to be sure it's large enough at + * init time. + */ + uint64_t ipath_pioavailshadow[8]; + uint64_t ipath_gpio_out; /* shadow of kr_gpio_out, for rmw ops */ + /* kr_revision value (also see ipath_majrev) */ + uint64_t ipath_revision; + /* shadow of ibcctrl, for interrupt handling of link changes, etc. */ + uint64_t ipath_ibcctrl; + /* + * last ibcstatus, to suppress "duplicate" status change messages, + * mostly from 2 to 3 + */ + uint64_t ipath_lastibcstat; + /* mask of hardware errors that are enabled */ + uint64_t ipath_hwerrmask; + uint64_t ipath_extctrl; /* shadow the gpio output contents */ + + /* these are the "32 bit" regs */ + /* + * number of GUIDs in the flash for this interface; may need some + * rethinking for setting on other ifaces + */ + uint32_t ipath_nguid; + uint32_t ipath_rcvctrl; /* shadow kr_rcvctrl */ + uint32_t ipath_sendctrl; /* shadow kr_sendctrl */ + uint32_t ipath_rcvhdrcnt; /* value we put in kr_rcvhdrcnt */ + uint32_t ipath_rcvhdrsize; /* value we put in kr_rcvhdrsize */ + uint32_t ipath_rcvhdrentsize; /* value we put in kr_rcvhdrentsize */ + /* byte offset of last entry in rcvhdrq */ + uint32_t ipath_hdrqlast; + uint32_t ipath_portcnt; /* kr_portcnt value */ + uint32_t ipath_palign; /* kr_pagealign value */ + uint32_t ipath_piobcnt; /* kr_sendpiobufcnt value */ + uint32_t ipath_piobufbase; /* kr_sendpiobufbase value */ + uint32_t ipath_piosize; /* kr_sendpiosize */ + uint32_t ipath_rcvegrbase; /* kr_rcvegrbase value */ + uint32_t ipath_rcvegrcnt; /* kr_rcvegrcnt value */ + uint32_t ipath_rcvtidbase; /* kr_rcvtidbase value */ + uint32_t ipath_rcvtidcnt; /* kr_rcvtidcnt value */ + uint32_t ipath_sregbase; /* kr_sendregbase */ + uint32_t ipath_uregbase; /* kr_userregbase */ + uint32_t ipath_cregbase; /* kr_counterregbase */ + uint32_t ipath_control; /* shadow the control register contents */ + uint32_t ipath_pcirev; /* PCI revision register (HTC rev on FPGA) */ + + uint32_t ipath_ibmtu; /* The MTU programmed for this unit */ + /* + * The max size IB packet, including IB headers, that we can send. + * Starts same as ipath_piosize, but is affected when ibmtu is + * changed, or by size of eager buffers + */ + uint32_t ipath_ibmaxlen; + /* + * ibmaxlen at init time, limited by chip and by receive buffer size. + * Not changed after init. + */ + uint32_t ipath_init_ibmaxlen; + /* size we allocate for each rcvegrbuffer */ + uint32_t ipath_rcvegrbufsize; + uint32_t ipath_htwidth; /* width (2,4,8,16,32) from HT config reg */ + uint32_t ipath_htspeed; /* HT speed (200,400,800,1000) from HT config */ + /* bitmap of ports waiting for PIO avail intr */ + uint32_t ipath_portpiowait; + /* + * number of sequential ibcstatus changes for polling active/quiet + * (i.e., link not coming up).
+ */ + uint32_t ipath_ibpollcnt; + uint16_t ipath_mlid; /* MLID programmed for this instance */ + uint16_t ipath_lid; /* LID programmed for this instance */ + /* list of pkeys programmed; 0 means not set */ + uint16_t ipath_pkeys[4]; + uint8_t ipath_serial[12]; /* ASCII serial number, from flash */ + uint8_t ipath_majrev; /* chip major rev, from ipath_revision */ + uint8_t ipath_minrev; /* chip minor rev, from ipath_revision */ + uint8_t ipath_boardrev; /* board rev, from ipath_revision */ + uint8_t ipath_unit; /* Unit number for this chip */ +}; + +/* + * A segment is a linear region of low physical memory. + * XXX Maybe we should use phys addr here and kmap()/kunmap() + * Used by the verbs layer. + */ +struct ipath_seg { + void *vaddr; + size_t length; +}; + +/* The number of ipath_segs that fit in a page. */ +#define IPATH_SEGSZ (PAGE_SIZE / sizeof (struct ipath_seg)) + +struct ipath_segarray { + struct ipath_seg segs[IPATH_SEGSZ]; +}; + +/* + * Used by the verbs layer. + */ +struct ipath_mregion { + uint64_t user_base; /* User's address for this region */ + uint64_t iova; /* IB start address of this region */ + size_t length; + uint32_t lkey; + uint32_t offset; /* offset (bytes) to start of region */ + int access_flags; + uint32_t max_segs; /* number of ipath_segs in all the arrays */ + uint32_t mapsz; /* size of the map array */ + struct ipath_segarray *map[0]; /* the segments */ +}; + +/* + * These keep track of the copy progress within a memory region. + * Used by the verbs layer. + */ +struct ipath_sge { + struct ipath_mregion *mr; + void *vaddr; /* current pointer into the segment */ + uint32_t sge_length; /* length of the SGE */ + uint32_t length; /* remaining length of the segment */ + uint16_t m; /* current index: mr->map[m] */ + uint16_t n; /* current index: mr->map[m]->segs[n] */ +}; + +struct ipath_sge_state { + struct ipath_sge *sg_list; /* next SGE to be used if any */ + struct ipath_sge sge; /* progress state for the current SGE */ + uint8_t num_sge; +}; + +extern struct ipath_devdata devdata[]; +#define IPATH_UNIT(p) ((p)-devdata) +extern const uint32_t infinipath_max; /* number of units (chips) supported */ +extern const char *ipath_minor_names[]; + +extern int ipath_diags_enabled; /* is diags mode enabled? */ + +/* clean up any per-chip chip-specific stuff */ +void ipath_chip_cleanup(struct ipath_devdata *); +void ipath_chip_done(void); /* clean up any chip type-specific stuff */ +void ipath_handle_hwerrors(const ipath_type, char *, int); +int ipath_validate_rev(struct ipath_devdata *); +void ipath_clear_init_hwerrs(const ipath_type); + +/* + * This is here to simplify compatibility with source that supports + * multiple chip types + */ +void ipath_ht_get_boardname(const ipath_type t, char *name, size_t namelen); + +/* these are primarily for SMA, but are also used by diags */ +int ipath_send_smapkt(struct ipath_sendpkt __user *); + +int ipath_wait_linkstate(const ipath_type, uint32_t, int); +void ipath_down_link(const ipath_type); +void ipath_set_ib_lstate(const ipath_type, int); +void ipath_kreceive(const ipath_type); +int ipath_setrcvhdrsize(const ipath_type, unsigned); + +/* for use in system calls, where we want to know device type, etc. 
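+ *
+ * Illustration only, not part of the original patch: port_fp() below
+ * yields NULL when ->private_data holds a small cookie (<= 255, e.g.
+ * an SMA or control minor) instead of a real struct ipath_portdata
+ * pointer, so a system call handler can start with:
+ *
+ *	struct ipath_portdata *pd = port_fp(fp);
+ *
+ *	if (!pd)
+ *		return -EINVAL;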
*/ +#define port_fp(fp) (((fp)->private_data>(void*)255UL)?((struct ipath_portdata *)fp->private_data):NULL) + +/* + * values for ipath_flags + */ +#define IPATH_INITTED 0x2 /* The chip is up and initted */ +#define IPATH_RCVHDRSZ_SET 0x4 /* set if any user code has set kr_rcvhdrsize */ +/* The chip is present and valid for accesses */ +#define IPATH_PRESENT 0x8 +/* HT link0 is only 8 bits wide, ignore upper byte crc errors, etc. */ +#define IPATH_8BIT_IN_HT0 0x10 +/* HT link1 is only 8 bits wide, ignore upper byte crc errors, etc. */ +#define IPATH_8BIT_IN_HT1 0x20 +/* The link is down (or not yet up 0x11 or earlier) */ +#define IPATH_LINKDOWN 0x40 +#define IPATH_LINKINIT 0x80 /* The link level is up (0x11) */ +/* The link is in the armed (0x21) state */ +#define IPATH_LINKARMED 0x100 +/* The link is in the active (0x31) state */ +#define IPATH_LINKACTIVE 0x200 +/* The link was taken down, but no interrupt yet */ +#define IPATH_LINKUNK 0x400 +/* link being moved to armed (0x21) state */ +#define IPATH_LINK_TOARMED 0x800 +/* link being moved to active (0x31) state */ +#define IPATH_LINK_TOACTIVE 0x1000 +/* linkinit cmd is SLEEP, move to POLL */ +#define IPATH_LINK_SLEEPING 0x2000 +/* no IB cable, or no device on IB cable */ +#define IPATH_NOCABLE 0x4000 +/* Supports port zero per packet receive interrupts via GPIO */ +#define IPATH_GPIO_INTR 0x8000 + +/* portdata flag values */ +#define IPATH_PORT_WAITING_RCV 0x4 /* waiting for a packet to arrive */ +/* waiting for a PIO buffer to be available */ +#define IPATH_PORT_WAITING_PIO 0x8 + +int ipath_init_chip(const ipath_type); +/* free up any allocated data at closes */ +void ipath_free_data(struct ipath_portdata *dd); +void ipath_init_picotime(void); /* init cycles to picosecs conversion */ +int ipath_bringup_serdes(const ipath_type); +int ipath_waitfor_mdio_cmdready(const ipath_type); +int ipath_waitfor_complete(const ipath_type, ipath_kreg, uint64_t, uint64_t *); +void ipath_quiet_serdes(const ipath_type); +void ipath_get_boardname(uint8_t, char *, size_t); +uint32_t __iomem *ipath_getpiobuf(int, uint32_t *); +int ipath_bufavail(int); +int ipath_rd_eeprom(const ipath_type port_unit, + struct ipath_eeprom_req __user *); +uint64_t ipath_snap_cntr(const ipath_type, ipath_creg); + +/* + * these should be somewhat dynamic someday, although they are fixed + * for all users of the device on any given load. + */ +/* (words) room for all IB headers and KD proto header */ +#define IPATH_RCVHDRENTSIZE 16 +/* + * 64K, which is about all you can hope to get contiguous. API allows + * users to request a size, for now I'm ignoring that. + */ +#define IPATH_RCVHDRCNT 1024 + +/* + * number of words in KD protocol header if not set by ipath_userinit(); + * this uses the full 64 bytes of rcvhdrentry + */ +#define IPATH_DFLT_RCVHDRSIZE 9 + +#define IPATH_MDIO_CMD_WRITE 1 +#define IPATH_MDIO_CMD_READ 2 +#define IPATH_MDIO_CLD_DIV 25 /* to get 2.5 Mhz mdio clock */ +#define IPATH_MDIO_CMDVALID 0x40000000 /* bit 30 */ +#define IPATH_MDIO_DATAVALID 0x80000000 /* bit 31 */ +#define IPATH_MDIO_CTRL_STD 0x0 + +#define IPATH_MDIO_REQ(cmd,dev,reg,data) ( (((uint64_t)IPATH_MDIO_CLD_DIV) << 32) | \ + ((cmd) << 26) | ((dev)<<21) | ((reg) << 16) | ((data) & 0xFFFF)) + +#define IPATH_MDIO_CTRL_XGXS_REG_8 0x8 /* signal and fifo status, in bank 31 */ + +/* controls loopback, redundancy */ +#define IPATH_MDIO_CTRL_8355_REG_1 0x10 +#define IPATH_MDIO_CTRL_8355_REG_2 0x11 /* premph, encdec, etc. */ +#define IPATH_MDIO_CTRL_8355_REG_6 0x15 /* Kchars, etc. 
*/ +#define IPATH_MDIO_CTRL_8355_REG_9 0x18 +#define IPATH_MDIO_CTRL_8355_REG_10 0x1D + +/* + * ipath_get_upages() is used to pin an address range (if not already pinned), + * and optionally return the list of physical addresses. + * ipath_putpages() does the obvious, and ipath_upages_cleanup() cleans up all + * private memory, used at driver unload. + * ipath_get_upages_nocopy() is similar to ipath_get_upages(), but pins only + * 1 page, and marks the vm so the page isn't taken away on a fork. + */ +int ipath_get_upages(unsigned long, size_t, struct page **); +int ipath_get_upages_nocopy(unsigned long, struct page **); +void ipath_putpages(size_t, struct page **); +void ipath_upages_cleanup(struct ipath_portdata *); +int ipath_eeprom_read(const ipath_type, uint8_t, void *, int); +int ipath_eeprom_write(const ipath_type, uint8_t, void *, int); + +/* these are used for the registers that vary with port */ +void ipath_kput_kreg_port(const ipath_type, ipath_kreg, unsigned, uint64_t); +uint64_t ipath_kget_kreg64_port(const ipath_type, ipath_kreg, unsigned); + +/* + * we could have a single register get/put routine that takes a group + * type, but this is somewhat clearer and cleaner. It also gives us some + * error checking. 64 bit register reads should always work, but are + * inefficient on opteron (the northbridge always generates 2 separate + * HT 32 bit reads), so we use kreg32 wherever possible. + * User register and counter register reads are always 32 bit reads, so only + * one form of those routines. + */ + +/* + * At the moment, none of the s-registers are writable, so no ipath_kput_sreg() + * At the moment, none of the c-registers are writable, so no ipath_kput_creg() + */ + +/* + * return the contents of a register that is virtualized to be per port. + * Returns 0 if the chip isn't mapped (not distinguishable from + * valid contents at runtime; we may add a separate error variable at some + * point). + * This is normally not used by the kernel, but may be for debugging, + * and has a different implementation than user mode, which is why + * it's not in _common.h + */ +static inline uint32_t ipath_kget_ureg32(const ipath_type stype, + ipath_ureg regno, int port) +{ + if (!devdata[stype].ipath_kregbase) + return 0; + + return readl(regno + (uint64_t __iomem *) + (devdata[stype].ipath_uregbase + + (char __iomem *) devdata[stype].ipath_kregbase + + devdata[stype].ipath_palign * port)); +} + +/* + * change the contents of a register that is virtualized to be per port; + * silently does nothing if the chip isn't mapped.
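+ *
+ * Illustration only, not part of the original patch (t, head and port
+ * stand for the usual unit, queue offset and port arguments): e.g.
+ * after processing receive headers for a port, the driver could
+ * advance the head register with
+ *
+ *	ipath_kput_ureg(t, ur_rcvhdrhead, head, port);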
+ */ +static inline void ipath_kput_ureg(const ipath_type stype, ipath_ureg regno, + uint64_t value, int port) +{ + uint64_t __iomem *ubase; + + ubase = (uint64_t __iomem *) + (devdata[stype].ipath_uregbase + + (char __iomem *) devdata[stype].ipath_kregbase + + devdata[stype].ipath_palign * port); + if (devdata[stype].ipath_kregbase) + writeq(value, &ubase[regno]); +} + +static inline uint32_t ipath_kget_kreg32(const ipath_type stype, + ipath_kreg regno) +{ + if (!devdata[stype].ipath_kregbase) + return -1; + return readl((uint32_t __iomem *) &devdata[stype].ipath_kregbase[regno]); +} + +static inline uint64_t ipath_kget_kreg64(const ipath_type stype, + ipath_kreg regno) +{ + if (!devdata[stype].ipath_kregbase) + return -1; + + return readq(&devdata[stype].ipath_kregbase[regno]); +} + +static inline void ipath_kput_kreg(const ipath_type stype, + ipath_kreg regno, uint64_t value) +{ + if (devdata[stype].ipath_kregbase) + writeq(value, &devdata[stype].ipath_kregbase[regno]); +} + +static inline uint32_t ipath_kget_creg32(const ipath_type stype, + ipath_creg regno) +{ + if (!devdata[stype].ipath_kregbase) + return 0; + return readl(regno + (uint64_t __iomem *) + (devdata[stype].ipath_cregbase + + (char __iomem *) devdata[stype].ipath_kregbase)); +} + +/* + * caddr is the destination chip address (full pointer, not offset), + * val is the qword to write there. We only handle a single qword (8 bytes). + * This is not used for copies to the PIO buffer, just TID updates, etc. + * This function localizes all chip mem (as opposed to register) writes. + */ +static inline void ipath_kput_memq(const ipath_type stype, + uint64_t __iomem *caddr, uint64_t val) +{ + if (devdata[stype].ipath_kregbase) + writeq(val, caddr); +} + + +#endif /* _IPATH_KERNEL_H */ diff -r a3a00f637da6 -r 2d9a3f27a10c drivers/infiniband/hw/ipath/ipath_layer.h --- /dev/null Thu Jan 1 00:00:00 1970 +0000 +++ b/drivers/infiniband/hw/ipath/ipath_layer.h Wed Dec 28 14:19:42 2005 -0800 @@ -0,0 +1,134 @@ +/* + * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE.
+ * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + */ + +#ifndef _IPATH_LAYER_H +#define _IPATH_LAYER_H + +/* + * This header file is for symbols shared between the infinipath driver + * and drivers layered upon it (such as ipath). + */ + +struct sk_buff; +struct ipath_sge_state; + +struct ipath_layer_counters { + uint64_t symbol_error_counter; + uint64_t link_error_recovery_counter; + uint64_t link_downed_counter; + uint64_t port_rcv_errors; + uint64_t port_rcv_remphys_errors; + uint64_t port_xmit_discards; + uint64_t port_xmit_data; + uint64_t port_rcv_data; + uint64_t port_xmit_packets; + uint64_t port_rcv_packets; +}; + +int ipath_layer_register(const ipath_type device, + int (*l_intr) (const ipath_type, uint32_t), + int (*l_rcv) (const ipath_type, void *, + struct sk_buff *), + uint16_t rcv_opcode, + int (*l_rcv_lid) (const ipath_type, void *), + uint16_t rcv_lid_opcode); +int ipath_verbs_register(const ipath_type device, + int (*l_piobufavail) (const ipath_type device), + void (*l_rcv) (const ipath_type device, + void *rhdr, void *data, + uint32_t tlen), + void (*l_timer_cb) (const ipath_type device)); +void ipath_verbs_unregister(const ipath_type device); +int ipath_layer_open(const ipath_type device, uint32_t * pktmax); +uint16_t ipath_layer_get_lid(const ipath_type device); +int ipath_layer_get_mac(const ipath_type device, uint8_t *); +uint16_t ipath_layer_get_bcast(const ipath_type device); +int ipath_layer_get_num_of_dev(void); +int ipath_layer_get_cr_errpkey(const ipath_type device); +int ipath_kset_linkstate(uint32_t arg); +int ipath_kset_mtu(uint32_t); +void ipath_set_sps_lid(const ipath_type, uint32_t); +void ipath_layer_close(const ipath_type device); +int ipath_layer_send(const ipath_type device, void *hdr, void *data, + uint32_t datalen); +int ipath_verbs_send(const ipath_type device, uint32_t hdrwords, + uint32_t *hdr, uint32_t len, + struct ipath_sge_state *ss); +int ipath_layer_send_skb(struct copy_data_s *cdata); +void ipath_layer_set_piointbufavail_int(const ipath_type device); +void ipath_get_boardname(const ipath_type, char *name, size_t namelen); +void ipath_layer_snapshot_counters(const ipath_type t, uint64_t * swords, + uint64_t * rwords, uint64_t * spkts, uint64_t * rpkts); +void ipath_layer_get_counters(const ipath_type device, + struct ipath_layer_counters *cntrs); +void ipath_layer_want_buffer(const ipath_type t); +int ipath_layer_set_guid(const ipath_type t, uint64_t guid); +uint64_t ipath_layer_get_guid(const ipath_type t); +uint32_t ipath_layer_get_nguid(const ipath_type t); +int ipath_layer_query_device(const ipath_type t, uint32_t * vendor, + uint32_t * boardrev, uint32_t * majrev, + uint32_t * minrev); +uint32_t ipath_layer_get_flags(const ipath_type t); +struct device *ipath_layer_get_pcidev(const ipath_type t); +uint16_t ipath_layer_get_deviceid(const ipath_type t); +uint64_t ipath_layer_get_lastibcstat(const ipath_type t); +uint32_t ipath_layer_get_ibmtu(const ipath_type t); +void ipath_layer_enable_timer(const ipath_type t); +void ipath_layer_disable_timer(const ipath_type t); +unsigned ipath_verbs_get_flags(const ipath_type device); +void ipath_verbs_set_flags(const ipath_type device, unsigned flags); +unsigned ipath_layer_get_npkeys(const ipath_type device); +unsigned ipath_layer_get_pkey(const ipath_type device, unsigned index); +void ipath_layer_get_pkeys(const ipath_type device, uint16_t *pkeys); +int ipath_layer_set_pkeys(const 
ipath_type device, uint16_t *pkeys); +int ipath_layer_get_linkdowndefaultstate(const ipath_type device); +int ipath_layer_set_linkdowndefaultstate(const ipath_type device, int sleep); +int ipath_layer_get_phyerrthreshold(const ipath_type device); +int ipath_layer_set_phyerrthreshold(const ipath_type device, unsigned n); +int ipath_layer_get_overrunthreshold(const ipath_type device); +int ipath_layer_set_overrunthreshold(const ipath_type device, unsigned n); + +/* ipath_ether interrupt values */ +#define IPATH_LAYER_INT_IF_UP 0x2 +#define IPATH_LAYER_INT_IF_DOWN 0x4 +#define IPATH_LAYER_INT_LID 0x8 +#define IPATH_LAYER_INT_SEND_CONTINUE 0x10 +#define IPATH_LAYER_INT_BCAST 0x40 + +/* _verbs_layer.l_flags */ +#define IPATH_VERBS_KERNEL_SMA 0x1 + +#endif /* _IPATH_LAYER_H */ diff -r a3a00f637da6 -r 2d9a3f27a10c drivers/infiniband/hw/ipath/ipath_registers.h --- /dev/null Thu Jan 1 00:00:00 1970 +0000 +++ b/drivers/infiniband/hw/ipath/ipath_registers.h Wed Dec 28 14:19:42 2005 -0800 @@ -0,0 +1,355 @@ +/* + * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + */ + +#ifndef _IPATH_REGISTERS_H +#define _IPATH_REGISTERS_H + +/* + * This file should only be included by kernel source, and by the diags. + * It defines the registers, and their contents, for the InfiniPath HT-400 chip + */ + +/* + * These are the InfiniPath register and buffer bit definitions, + * that are visible to software, and needed only by the kernel + * and diag code. A few, that are visible to protocol and user + * code are in ipath_common.h. 
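+ *
+ * Illustration only, not part of the original patch: fields in these
+ * registers are extracted with the usual shift-and-mask idiom, e.g.
+ * the major chip revision from kr_revision, using the accessor from
+ * ipath_kernel.h with t as a unit number:
+ *
+ *	uint64_t rev = ipath_kget_kreg64(t, kr_revision);
+ *	unsigned majrev = (rev >> INFINIPATH_R_CHIPREVMAJOR_SHIFT) &
+ *		INFINIPATH_R_CHIPREVMAJOR_MASK;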
+ * Some bits are specific + * to a given chip implementation, and have been moved to the + * chip-specific source file + */ + +/* kr_revision bits */ +#define INFINIPATH_R_CHIPREVMINOR_MASK 0xFF +#define INFINIPATH_R_CHIPREVMINOR_SHIFT 0 +#define INFINIPATH_R_CHIPREVMAJOR_MASK 0xFF +#define INFINIPATH_R_CHIPREVMAJOR_SHIFT 8 +#define INFINIPATH_R_ARCH_MASK 0xFF +#define INFINIPATH_R_ARCH_SHIFT 16 +#define INFINIPATH_R_SOFTWARE_MASK 0xFF +#define INFINIPATH_R_SOFTWARE_SHIFT 24 +#define INFINIPATH_R_BOARDID_MASK 0xFF +#define INFINIPATH_R_BOARDID_SHIFT 32 + +/* kr_control bits */ +#define INFINIPATH_C_FREEZEMODE 0x00000002 +#define INFINIPATH_C_LINKENABLE 0x00000004 + +/* kr_sendctrl bits */ +#define INFINIPATH_S_DISARMPIOBUF_SHIFT 16 +#define INFINIPATH_S_ABORT 0x00000001U +#define INFINIPATH_S_PIOINTBUFAVAIL 0x00000002U +#define INFINIPATH_S_PIOBUFAVAILUPD 0x00000004U +#define INFINIPATH_S_PIOENABLE 0x00000008U +#define INFINIPATH_S_DISARM 0x80000000U + +/* kr_rcvctrl bits */ +#define INFINIPATH_R_PORTENABLE_SHIFT 0 +#define INFINIPATH_R_INTRAVAIL_SHIFT 16 +#define INFINIPATH_R_TAILUPD 0x80000000 + +/* kr_intstatus, kr_intclear, kr_intmask bits */ +#define INFINIPATH_I_RCVURG_SHIFT 0 +#define INFINIPATH_I_RCVAVAIL_SHIFT 12 +#define INFINIPATH_I_ERROR 0x80000000 +#define INFINIPATH_I_SPIOSENT 0x40000000 +#define INFINIPATH_I_SPIOBUFAVAIL 0x20000000 +#define INFINIPATH_I_GPIO 0x10000000 + +/* kr_errorstatus, kr_errorclear, kr_errormask bits */ +#define INFINIPATH_E_RFORMATERR 0x0000000000000001ULL +#define INFINIPATH_E_RVCRC 0x0000000000000002ULL +#define INFINIPATH_E_RICRC 0x0000000000000004ULL +#define INFINIPATH_E_RMINPKTLEN 0x0000000000000008ULL +#define INFINIPATH_E_RMAXPKTLEN 0x0000000000000010ULL +#define INFINIPATH_E_RLONGPKTLEN 0x0000000000000020ULL +#define INFINIPATH_E_RSHORTPKTLEN 0x0000000000000040ULL +#define INFINIPATH_E_RUNEXPCHAR 0x0000000000000080ULL +#define INFINIPATH_E_RUNSUPVL 0x0000000000000100ULL +#define INFINIPATH_E_REBP 0x0000000000000200ULL +#define INFINIPATH_E_RIBFLOW 0x0000000000000400ULL +#define INFINIPATH_E_RBADVERSION 0x0000000000000800ULL +#define INFINIPATH_E_RRCVEGRFULL 0x0000000000001000ULL +#define INFINIPATH_E_RRCVHDRFULL 0x0000000000002000ULL +#define INFINIPATH_E_RBADTID 0x0000000000004000ULL +#define INFINIPATH_E_RHDRLEN 0x0000000000008000ULL +#define INFINIPATH_E_RHDR 0x0000000000010000ULL +#define INFINIPATH_E_RIBLOSTLINK 0x0000000000020000ULL +#define INFINIPATH_E_SMINPKTLEN 0x0000000020000000ULL +#define INFINIPATH_E_SMAXPKTLEN 0x0000000040000000ULL +#define INFINIPATH_E_SUNDERRUN 0x0000000080000000ULL +#define INFINIPATH_E_SPKTLEN 0x0000000100000000ULL +#define INFINIPATH_E_SDROPPEDSMPPKT 0x0000000200000000ULL +#define INFINIPATH_E_SDROPPEDDATAPKT 0x0000000400000000ULL +#define INFINIPATH_E_SPIOARMLAUNCH 0x0000000800000000ULL +#define INFINIPATH_E_SUNEXPERRPKTNUM 0x0000001000000000ULL +#define INFINIPATH_E_SUNSUPVL 0x0000002000000000ULL +#define INFINIPATH_E_IBSTATUSCHANGED 0x0001000000000000ULL +#define INFINIPATH_E_INVALIDADDR 0x0002000000000000ULL +#define INFINIPATH_E_RESET 0x0004000000000000ULL +#define INFINIPATH_E_HARDWARE 0x0008000000000000ULL + +/* kr_hwerrclear, kr_hwerrmask, kr_hwerrstatus bits */ +#define INFINIPATH_HWE_HTCMEMPARITYERR_SHIFT 0 +#define INFINIPATH_HWE_TXEMEMPARITYERR_MASK 0xFULL +#define INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT 40 +#define INFINIPATH_HWE_RXEMEMPARITYERR_MASK 0x7FULL +#define INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT 44 +#define INFINIPATH_HWE_HTCBUSTREQPARITYERR 0x0000000080000000ULL +#define
INFINIPATH_HWE_HTCBUSTRESPPARITYERR 0x0000000100000000ULL +#define INFINIPATH_HWE_HTCBUSIREQPARITYERR 0x0000000200000000ULL +#define INFINIPATH_HWE_RXDSYNCMEMPARITYERR 0x0000000400000000ULL +#define INFINIPATH_HWE_SERDESPLLFAILED 0x2000000000000000ULL +#define INFINIPATH_HWE_IBCBUSTOSPCPARITYERR 0x4000000000000000ULL +#define INFINIPATH_HWE_IBCBUSFRSPCPARITYERR 0x8000000000000000ULL + +/* kr_hwdiagctrl bits */ +#define INFINIPATH_DC_FORCEHTCENABLE 0x20 +#define INFINIPATH_DC_FORCEHTCMEMPARITYERR_MASK 0x3FULL +#define INFINIPATH_DC_FORCEHTCMEMPARITYERR_SHIFT 0 +#define INFINIPATH_DC_FORCETXEMEMPARITYERR_MASK 0xFULL +#define INFINIPATH_DC_FORCETXEMEMPARITYERR_SHIFT 40 +#define INFINIPATH_DC_FORCERXEMEMPARITYERR_MASK 0x7FULL +#define INFINIPATH_DC_FORCERXEMEMPARITYERR_SHIFT 44 +#define INFINIPATH_DC_FORCEHTCBUSTREQPARITYERR 0x0000000080000000ULL +#define INFINIPATH_DC_FORCEHTCBUSTRESPPARITYERR 0x0000000100000000ULL +#define INFINIPATH_DC_FORCEHTCBUSIREQPARITYERR 0x0000000200000000ULL +#define INFINIPATH_DC_FORCERXDSYNCMEMPARITYERR 0x0000000400000000ULL +#define INFINIPATH_DC_COUNTERDISABLE 0x1000000000000000ULL +#define INFINIPATH_DC_COUNTERWREN 0x2000000000000000ULL +#define INFINIPATH_DC_FORCEIBCBUSTOSPCPARITYERR 0x4000000000000000ULL +#define INFINIPATH_DC_FORCEIBCBUSFRSPCPARITYERR 0x8000000000000000ULL + +/* kr_ibcctrl bits */ +#define INFINIPATH_IBCC_FLOWCTRLPERIOD_MASK 0xFFULL +#define INFINIPATH_IBCC_FLOWCTRLPERIOD_SHIFT 0 +#define INFINIPATH_IBCC_FLOWCTRLWATERMARK_MASK 0xFFULL +#define INFINIPATH_IBCC_FLOWCTRLWATERMARK_SHIFT 8 +#define INFINIPATH_IBCC_LINKINITCMD_MASK 0x3ULL +#define INFINIPATH_IBCC_LINKINITCMD_DISABLE 1 +/* cycle through TS1/TS2 till OK */ +#define INFINIPATH_IBCC_LINKINITCMD_POLL 2 +#define INFINIPATH_IBCC_LINKINITCMD_SLEEP 3 /* wait for TS1, then go on */ +#define INFINIPATH_IBCC_LINKINITCMD_SHIFT 16 +#define INFINIPATH_IBCC_LINKCMD_MASK 0x3ULL +#define INFINIPATH_IBCC_LINKCMD_INIT 1 /* move to 0x11 */ +#define INFINIPATH_IBCC_LINKCMD_ARMED 2 /* move to 0x21 */ +#define INFINIPATH_IBCC_LINKCMD_ACTIVE 3 /* move to 0x31 */ +#define INFINIPATH_IBCC_LINKCMD_SHIFT 18 +#define INFINIPATH_IBCC_MAXPKTLEN_MASK 0x7FFULL +#define INFINIPATH_IBCC_MAXPKTLEN_SHIFT 20 +#define INFINIPATH_IBCC_PHYERRTHRESHOLD_MASK 0xFULL +#define INFINIPATH_IBCC_PHYERRTHRESHOLD_SHIFT 32 +#define INFINIPATH_IBCC_OVERRUNTHRESHOLD_MASK 0xFULL +#define INFINIPATH_IBCC_OVERRUNTHRESHOLD_SHIFT 36 +#define INFINIPATH_IBCC_CREDITSCALE_MASK 0x7ULL +#define INFINIPATH_IBCC_CREDITSCALE_SHIFT 40 +#define INFINIPATH_IBCC_LOOPBACK 0x8000000000000000ULL +#define INFINIPATH_IBCC_LINKDOWNDEFAULTSTATE 0x4000000000000000ULL + +/* kr_ibcstatus bits */ +#define INFINIPATH_IBCS_LINKTRAININGSTATE_MASK 0xF +#define INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT 0 +#define INFINIPATH_IBCS_LINKSTATE_MASK 0x7 +#define INFINIPATH_IBCS_LINKSTATE_SHIFT 4 +#define INFINIPATH_IBCS_TXREADY 0x40000000 +#define INFINIPATH_IBCS_TXCREDITOK 0x80000000 + +/* kr_extstatus bits */ +#define INFINIPATH_EXTS_SERDESPLLLOCK 0x1 +#define INFINIPATH_EXTS_GPIOIN_MASK 0xFFFFULL +#define INFINIPATH_EXTS_GPIOIN_SHIFT 48 + +/* kr_extctrl bits */ +#define INFINIPATH_EXTC_GPIOINVERT_MASK 0xFFFFULL +#define INFINIPATH_EXTC_GPIOINVERT_SHIFT 32 +#define INFINIPATH_EXTC_GPIOOE_MASK 0xFFFFULL +#define INFINIPATH_EXTC_GPIOOE_SHIFT 48 +#define INFINIPATH_EXTC_SERDESENABLE 0x80000000ULL +#define INFINIPATH_EXTC_SERDESCONNECT 0x40000000ULL +#define INFINIPATH_EXTC_SERDESENTRUNKING 0x20000000ULL +#define INFINIPATH_EXTC_SERDESDISRXFIFO 0x10000000ULL +#define 
INFINIPATH_EXTC_SERDESENPLPBK1 0x08000000ULL +#define INFINIPATH_EXTC_SERDESENPLPBK2 0x04000000ULL +#define INFINIPATH_EXTC_SERDESENENCDEC 0x02000000ULL +#define INFINIPATH_EXTC_LEDSECPORTGREENON 0x00000020ULL +#define INFINIPATH_EXTC_LEDSECPORTYELLOWON 0x00000010ULL +#define INFINIPATH_EXTC_LEDPRIPORTGREENON 0x00000008ULL +#define INFINIPATH_EXTC_LEDPRIPORTYELLOWON 0x00000004ULL +#define INFINIPATH_EXTC_LEDGBLOKGREENON 0x00000002ULL +#define INFINIPATH_EXTC_LEDGBLERRREDOFF 0x00000001ULL + +/* kr_mdio bits */ +#define INFINIPATH_MDIO_CLKDIV_MASK 0x7FULL +#define INFINIPATH_MDIO_CLKDIV_SHIFT 32 +#define INFINIPATH_MDIO_COMMAND_MASK 0x7ULL +#define INFINIPATH_MDIO_COMMAND_SHIFT 26 +#define INFINIPATH_MDIO_DEVADDR_MASK 0x1FULL +#define INFINIPATH_MDIO_DEVADDR_SHIFT 21 +#define INFINIPATH_MDIO_REGADDR_MASK 0x1FULL +#define INFINIPATH_MDIO_REGADDR_SHIFT 16 +#define INFINIPATH_MDIO_DATA_MASK 0xFFFFULL +#define INFINIPATH_MDIO_DATA_SHIFT 0 +#define INFINIPATH_MDIO_CMDVALID 0x0000000040000000ULL +#define INFINIPATH_MDIO_RDDATAVALID 0x0000000080000000ULL + +/* kr_partitionkey bits */ +#define INFINIPATH_PKEY_SIZE 16 +#define INFINIPATH_PKEY_MASK 0xFFFF +#define INFINIPATH_PKEY_DEFAULT_PKEY 0xFFFF + +/* kr_serdesconfig0 bits */ +#define INFINIPATH_SERDC0_RESET_MASK 0xfULL /* overall reset bits */ +#define INFINIPATH_SERDC0_RESET_PLL 0x10000000ULL /* pll reset */ +#define INFINIPATH_SERDC0_TXIDLE 0xF000ULL /* tx idle enables (per lane) */ + +/* kr_xgxsconfig bits */ +#define INFINIPATH_XGXS_RESET 0x7ULL +#define INFINIPATH_XGXS_MDIOADDR_MASK 0xfULL +#define INFINIPATH_XGXS_MDIOADDR_SHIFT 4 + +/* TID entries (memory) */ +#define INFINIPATH_RT_VALID 0x8000000000000000ULL +#define INFINIPATH_RT_ADDR_MASK 0xFFFFFFFFFFULL +#define INFINIPATH_RT_ADDR_SHIFT 0 +#define INFINIPATH_RT_BUFSIZE_MASK 0x3FFF +#define INFINIPATH_RT_BUFSIZE_SHIFT 48 + +/* mask of defined bits for various registers */ +extern const uint64_t infinipath_c_bitsextant, + infinipath_s_bitsextant, infinipath_r_bitsextant, + infinipath_i_bitsextant, infinipath_e_bitsextant, + infinipath_hwe_bitsextant, infinipath_dc_bitsextant, + infinipath_extc_bitsextant, infinipath_mdio_bitsextant, + infinipath_ibcs_bitsextant, infinipath_ibcc_bitsextant; + +/* masks that are different in different chips */ +extern const uint32_t infinipath_i_rcvavail_mask, infinipath_i_rcvurg_mask; +extern const uint64_t infinipath_hwe_htcmemparityerr_mask; +extern const uint64_t infinipath_hwe_spibdcmlockfailed_mask; +extern const uint64_t infinipath_hwe_sphtdcmlockfailed_mask; +extern const uint64_t infinipath_hwe_htcdcmlockfailed_mask; +extern const uint64_t infinipath_hwe_htcdcmlockfailed_shift; +extern const uint64_t infinipath_hwe_sphtdcmlockfailed_shift; +extern const uint64_t infinipath_hwe_spibdcmlockfailed_shift; + +extern const uint64_t infinipath_hwe_htclnkabyte0crcerr; +extern const uint64_t infinipath_hwe_htclnkabyte1crcerr; +extern const uint64_t infinipath_hwe_htclnkbbyte0crcerr; +extern const uint64_t infinipath_hwe_htclnkbbyte1crcerr; + +/* + * These are the infinipath general register numbers (not offsets). + * The kernel registers are used directly; those beyond the kernel + * registers are calculated from one of the base registers. The use of + * an integer type doesn't allow type-checking as thorough as, say, + * an enum but allows for better hiding of chip differences.
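+ *
+ * Illustration only, not part of the original patch: since a register
+ * name is just an index, chip-specific code can assign the real
+ * offsets at init time while common code stays unchanged, e.g.
+ *
+ *	uint32_t ctrl = ipath_kget_kreg32(t, kr_control);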
+ */ +typedef const uint16_t + ipath_kreg, /* kernel-only, infinipath general registers */ + ipath_creg, /* kernel-only, infinipath counter registers */ + ipath_sreg; /* kernel-only, infinipath send registers */ + +/* + * These are all implemented such that 64 bit accesses work. + * Some implement no more than 32 bits. Because 64 bit reads + * require 2 HT cmds on opteron, we access those with 32 bit + * reads for efficiency (they are written as 64 bits, since + * the extra 32 bits are nearly free on writes, and it slightly reduces + * complexity). The rest are all accessed as 64 bits. + */ +extern ipath_kreg + /* These are the 32 bit group */ + kr_control, kr_counterregbase, kr_intmask, kr_intstatus, + kr_pagealign, kr_portcnt, kr_rcvtidbase, kr_rcvtidcnt, + kr_rcvegrbase, kr_rcvegrcnt, kr_scratch, kr_sendctrl, + kr_sendpiobufbase, kr_sendpiobufcnt, kr_sendpiosize, + kr_sendregbase, kr_userregbase, + /* These are the 64 bit group */ + kr_debugport, kr_debugportselect, kr_errorclear, kr_errormask, + kr_errorstatus, kr_extctrl, kr_extstatus, kr_gpio_clear, kr_gpio_mask, + kr_gpio_out, kr_gpio_status, kr_hwdiagctrl, kr_hwerrclear, + kr_hwerrmask, kr_hwerrstatus, kr_ibcctrl, kr_ibcstatus, kr_intblocked, + kr_intclear, kr_interruptconfig, kr_mdio, kr_partitionkey, kr_rcvbthqp, + kr_rcvbufbase, kr_rcvbufsize, kr_rcvctrl, kr_rcvhdrcnt, + kr_rcvhdrentsize, kr_rcvhdrsize, kr_rcvintmembase, kr_rcvintmemsize, + kr_revision, kr_sendbuffererror, kr_sendbuffererror1, + kr_sendbuffererror2, kr_sendbuffererror3, kr_sendpioavailaddr, + kr_serdesconfig0, kr_serdesconfig1, kr_serdesstatus, kr_txintmembase, + kr_txintmemsize, kr_xgxsconfig, + __kr_invalid, /* a marker for debug, don't use them directly */ + /* a marker for debug, don't use them directly */ + __kr_lastvaliddirect, + /* use only with ipath_k*_kreg64_port(), not *kreg64() */ + kr_rcvhdraddr, + /* use only with ipath_k*_kreg64_port(), not *kreg64() */ + kr_rcvhdrtailaddr, + /* we define the full set for the diags, the kernel doesn't use them */ + kr_rcvhdraddr1, kr_rcvhdraddr2, kr_rcvhdraddr3, kr_rcvhdraddr4, + kr_rcvhdraddr5, kr_rcvhdraddr6, kr_rcvhdraddr7, kr_rcvhdraddr8, + kr_rcvhdrtailaddr1, kr_rcvhdrtailaddr2, kr_rcvhdrtailaddr3, + kr_rcvhdrtailaddr4, kr_rcvhdrtailaddr5, kr_rcvhdrtailaddr6, + kr_rcvhdrtailaddr7, kr_rcvhdrtailaddr8; + +/* + * first of the pioavail registers, the total number is + * (kr_sendpiobufcnt / 32); each buffer uses 2 bits + */ +extern ipath_sreg sr_sendpioavail; + +extern ipath_creg cr_badformatcnt, cr_erricrccnt, cr_errlinkcnt, + cr_errlpcrccnt, cr_errpkey, cr_errrcvflowctrlcnt, + cr_err_rlencnt, cr_errslencnt, cr_errtidfull, + cr_errtidvalid, cr_errvcrccnt, cr_ibstatuschange, + cr_intcnt, cr_invalidrlencnt, cr_invalidslencnt, + cr_lbflowstallcnt, cr_iblinkdowncnt, cr_iblinkerrrecovcnt, + cr_ibsymbolerrcnt, cr_pktrcvcnt, cr_pktrcvflowctrlcnt, + cr_pktsendcnt, cr_pktsendflowcnt, cr_portovflcnt, + cr_portovflcnt1, cr_portovflcnt2, cr_portovflcnt3, cr_portovflcnt4, + cr_portovflcnt5, cr_portovflcnt6, cr_portovflcnt7, cr_portovflcnt8, + cr_rcvebpcnt, cr_rcvovflcnt, cr_rxdroppktcnt, + cr_senddropped, cr_sendstallcnt, cr_sendunderruncnt, + cr_unsupvlcnt, cr_wordrcvcnt, cr_wordsendcnt; + +/* + * register bits for selecting i2c direction and values, used for I2C serial + * flash + */ +extern const uint16_t ipath_gpio_sda_num; +extern const uint16_t ipath_gpio_scl_num; +extern const uint64_t ipath_gpio_sda; +extern const uint64_t ipath_gpio_scl; + +#endif /* _IPATH_REGISTERS_H */ diff -r a3a00f637da6 -r 2d9a3f27a10c 
drivers/infiniband/hw/ipath/ips_common.h --- /dev/null Thu Jan 1 00:00:00 1970 +0000 +++ b/drivers/infiniband/hw/ipath/ips_common.h Wed Dec 28 14:19:42 2005 -0800 @@ -0,0 +1,249 @@ +#ifndef IPS_COMMON_H +#define IPS_COMMON_H +/* + * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + */ + +#include "ipath_common.h" + +struct ipath_header_typ { + /* + * Version - 4 bits, Port - 4 bits, TID - 10 bits and Offset - 14 bits + * before ECO change ~28 Dec 03. + * After that, Vers 4, Port 3, TID 11, offset 14. + */ + uint32_t ver_port_tid_offset; + uint16_t chksum; + uint16_t pkt_flags; +}; + +struct ips_message_header_typ { + uint16_t lrh[4]; + uint32_t bth[3]; + struct ipath_header_typ iph; + uint8_t sub_opcode; + uint8_t flags; + uint16_t src_rank; + /* 24 bits. The upper 8 bit is available for other use */ + union { + struct { + unsigned ack_seq_num : 24; + unsigned port : 4; + unsigned unused : 4; + }; + uint32_t ack_seq_num_org; + }; + uint8_t expected_tid_session_id; + uint8_t tinylen; /* to aid MPI */ + uint16_t tag; /* to aid MPI */ + union { + uint32_t mpi[4]; /* to aid MPI */ + uint32_t data[4]; + struct { + uint16_t mtu; + uint8_t major_ver; + uint8_t minor_ver; + uint32_t not_used; //free + uint32_t run_id; + uint32_t client_ver; + }; + }; +}; + +struct ether_header_typ { + uint16_t lrh[4]; + uint32_t bth[3]; + struct ipath_header_typ iph; + uint8_t sub_opcode; + uint8_t cmd; + uint16_t lid; + uint16_t mac[3]; + uint8_t frag_num; + uint8_t seq_num; + uint32_t len; + /* MUST be of word size do to PIO write requirements */ + uint32_t csum; + uint16_t csum_offset; + uint16_t flags; + uint16_t first_2_bytes; + uint8_t unused[2]; /* currently unused */ +}; + +/* + * The PIO buffer used for sending infinipath messages must only be written + * in 32-bit words, all the data must be written, and no writes can occur + * after the last word is written (which transfers "ownership" of the buffer + * to the chip and triggers the message to be sent). 
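+ * Concretely: a 100-byte payload goes out as 25 32-bit words; the
+ * first 24 may be written in any order, but the 25th must be the
+ * last store the chip sees, which is why the driver's copy routines
+ * issue a memory barrier and then a single writel() for that final
+ * trigger word (the 100-byte size is purely illustrative).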
+ * Since the Linux sk_buff structure can be recursive, non-aligned, and + * any number of bytes in each segment, we use the following structure + * to keep information about the overall state of the copy operation. + * This is used to save the information needed to store the checksum + * in the right place before sending the last word to the hardware and + * to buffer the last 0-3 bytes of non-word sized segments. + */ +struct copy_data_s { + struct ether_header_typ *hdr; + uint32_t __iomem *csum_pio; /* addr of PIO buf to write csum to */ + uint32_t __iomem *to; /* addr of PIO buf to write data to */ + uint32_t device; /* which device to allocate PIO bufs from */ + int error; /* set if there is an error. */ + int extra; /* amount of data saved in u.buf below */ + unsigned int len; /* total length to send in bytes */ + unsigned int flen; /* frament length in words */ + unsigned int csum; /* partial IP checksum */ + unsigned int pos; /* position for partial checksum */ + unsigned int offset; /* offset to where data currently starts */ + int checksum_calc; /* set to 'true' when the checksum has been calculated */ + struct sk_buff *skb; + union { + uint32_t w; + uint8_t buf[4]; + } u; +}; + +/* IB - LRH header consts */ +#define IPS_LRH_GRH 0x0003 /* 1. word of IB LRH - next header: GRH */ +#define IPS_LRH_BTH 0x0002 /* 1. word of IB LRH - next header: BTH */ + +#define IPS_OFFSET 0 + +/* + * defines the cut-off point between the header queue and eager/expected + * TID queue + */ +#define NUM_OF_EKSTRA_WORDS_IN_HEADER_QUEUE ((sizeof(struct ips_message_header_typ) - offsetof(struct ips_message_header_typ, iph)) >> 2) + +/* OpCodes */ +#define OPCODE_IPS 0xC0 +#define OPCODE_ITH4X 0xC1 + +/* OpCode 30 is use by stand-alone test programs */ +#define OPCODE_RAW_DATA 0xDE +/* last OpCode (31) is reserved for test */ +#define OPCODE_TEST 0xDF + +/* sub OpCodes - ips */ +#define OPCODE_SEQ_DATA 0x01 +#define OPCODE_SEQ_CTRL 0x02 + +#define OPCODE_ACK 0x10 +#define OPCODE_NAK 0x11 + +#define OPCODE_ERR_CHK 0x20 +#define OPCODE_ERR_CHK_PLS 0x21 + +#define OPCODE_STARTUP 0x30 +#define OPCODE_STARTUP_ACK 0x31 +#define OPCODE_STARTUP_NAK 0x32 + +#define OPCODE_STARTUP_EXT 0x34 +#define OPCODE_STARTUP_ACK_EXT 0x35 +#define OPCODE_STARTUP_NAK_EXT 0x36 + +#define OPCODE_TIDS_RELEASE 0x40 +#define OPCODE_TIDS_RELEASE_CONFIRM 0x41 + +#define OPCODE_CLOSE 0x50 +#define OPCODE_CLOSE_ACK 0x51 +/* + * like OPCODE_CLOSE, but no complaint if other side has already closed. Used + * when doing abort(), MPI_Abort(), etc. + */ +#define OPCODE_ABORT 0x52 + +/* sub OpCodes - ith4x */ +#define OPCODE_ENCAP 0x81 +#define OPCODE_LID_ARP 0x82 + +/* Receive Header Queue: receive type (from infinipath) */ +#define RCVHQ_RCV_TYPE_EXPECTED 0 +#define RCVHQ_RCV_TYPE_EAGER 1 +#define RCVHQ_RCV_TYPE_NON_KD 2 +#define RCVHQ_RCV_TYPE_ERROR 3 + +/* misc. 
*/ +#define SIZE_OF_CRC 1 + +#define EAGER_TID_ID INFINIPATH_I_TID_MASK + +#define IPS_DEFAULT_P_KEY 0xFFFF + +/* functions for extracting fields from rcvhdrq entries */ +static inline uint32_t ips_get_hdr_err_flags(uint32_t *rbuf) +{ + return rbuf[1]; +} + +static inline uint32_t ips_get_index(uint32_t *rbuf) +{ + return (rbuf[0] >> INFINIPATH_RHF_EGRINDEX_SHIFT) + & INFINIPATH_RHF_EGRINDEX_MASK; +} + +static inline uint32_t ips_get_rcv_type(uint32_t *rbuf) +{ + return (rbuf[0] >> INFINIPATH_RHF_RCVTYPE_SHIFT) + & INFINIPATH_RHF_RCVTYPE_MASK; +} + +static inline uint32_t ips_get_length_in_bytes(uint32_t *rbuf) +{ + return ((rbuf[0] >> INFINIPATH_RHF_LENGTH_SHIFT) + & INFINIPATH_RHF_LENGTH_MASK) << 2; +} + +static inline void *ips_get_first_protocol_header(uint32_t *rbuf) +{ + return (void *)&rbuf[2]; +} + +static inline struct ips_message_header_typ *ips_get_ips_header(uint32_t *rbuf) +{ + return (struct ips_message_header_typ *)&rbuf[2]; +} + +static inline uint32_t ips_get_ipath_ver(uint32_t hdrword) +{ + return (hdrword >> INFINIPATH_I_VERS_SHIFT) + & INFINIPATH_I_VERS_MASK; +} + +/* + * Copy routine that is guaranteed to work in terms of aligned 32-bit + * quantities. + */ +void ipath_dwordcpy(uint32_t *dest, uint32_t *src, uint32_t ndwords); + +#endif /* IPS_COMMON_H */ From bos at pathscale.com Wed Dec 28 16:31:25 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 28 Dec 2005 16:31:25 -0800 Subject: [openib-general] [PATCH 6 of 20] ipath - driver debugging headers In-Reply-To: Message-ID: <9e8d017ed298d591ea33.1135816285@eng-12.pathscale.com> Signed-off-by: Bryan O'Sullivan diff -r 2d9a3f27a10c -r 9e8d017ed298 drivers/infiniband/hw/ipath/ipath_debug.h --- /dev/null Thu Jan 1 00:00:00 1970 +0000 +++ b/drivers/infiniband/hw/ipath/ipath_debug.h Wed Dec 28 14:19:42 2005 -0800 @@ -0,0 +1,98 @@ +/* + * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. 
+ */ + +#ifndef _IPATH_DEBUG_H +#define _IPATH_DEBUG_H + +#ifndef _IPATH_DEBUGGING /* debugging enabled or not */ +#define _IPATH_DEBUGGING 1 +#endif + +#if _IPATH_DEBUGGING + +/* + * Mask values for debugging. The scheme allows us to compile out any + * of the debug tracing stuff, and if compiled in, to enable or disable + * dynamically. This can be set at modprobe time also: + * modprobe infinipath.ko infinipath_debug=7 + */ + +#define __IPATH_INFO 0x1 /* generic low verbosity stuff */ +#define __IPATH_DBG 0x2 /* generic debug */ +#define __IPATH_TRSAMPLE 0x8 /* generate trace buffer sample entries */ +/* leave some low verbosity spots open */ +#define __IPATH_VERBDBG 0x40 /* very verbose debug */ +#define __IPATH_PKTDBG 0x80 /* print packet data */ +/* print process startup (init)/exit messages */ +#define __IPATH_PROCDBG 0x100 +/* print mmap/nopage stuff, not using VDBG any more */ +#define __IPATH_MMDBG 0x200 +#define __IPATH_USER_SEND 0x1000 /* use user mode send */ +#define __IPATH_KERNEL_SEND 0x2000 /* use kernel mode send */ +#define __IPATH_EPKTDBG 0x4000 /* print ethernet packet data */ +#define __IPATH_SMADBG 0x8000 /* sma packet debug */ +#define __IPATH_IPATHDBG 0x10000 /* Ethernet (IPATH) general debug on */ +#define __IPATH_IPATHWARN 0x20000 /* Ethernet (IPATH) warnings on */ +#define __IPATH_IPATHERR 0x40000 /* Ethernet (IPATH) errors on */ +#define __IPATH_IPATHPD 0x80000 /* Ethernet (IPATH) packet dump on */ +#define __IPATH_IPATHTABLE 0x100000 /* Ethernet (IPATH) table dump on */ + +#else /* _IPATH_DEBUGGING */ + +/* + * define all of these even with debugging off, for the few places that do + * if(infinipath_debug & _IPATH_xyzzy), but in a way that will make the + * compiler eliminate the code + */ + +#define __IPATH_INFO 0x0 /* generic low verbosity stuff */ +#define __IPATH_DBG 0x0 /* generic debug */ +#define __IPATH_TRSAMPLE 0x0 /* generate trace buffer sample entries */ +#define __IPATH_VERBDBG 0x0 /* very verbose debug */ +#define __IPATH_PKTDBG 0x0 /* print packet data */ +#define __IPATH_PROCDBG 0x0 /* print process startup (init)/exit messages */ +/* print mmap/nopage stuff, not using VDBG any more */ +#define __IPATH_MMDBG 0x0 +#define __IPATH_EPKTDBG 0x0 /* print ethernet packet data */ +#define __IPATH_SMADBG 0x0 /* print process startup (init)/exit messages */#define __IPATH_IPATHDBG 0x0 /* Ethernet (IPATH) table dump on */ +#define __IPATH_IPATHWARN 0x0 /* Ethernet (IPATH) warnings on */ +#define __IPATH_IPATHERR 0x0 /* Ethernet (IPATH) errors on */ +#define __IPATH_IPATHPD 0x0 /* Ethernet (IPATH) packet dump on */ +#define __IPATH_IPATHTABLE 0x0 /* Ethernet (IPATH) packet dump on */ + +#endif /* _IPATH_DEBUGGING */ + +#endif /* _IPATH_DEBUG_H */ diff -r 2d9a3f27a10c -r 9e8d017ed298 drivers/infiniband/hw/ipath/ipath_kdebug.h --- /dev/null Thu Jan 1 00:00:00 1970 +0000 +++ b/drivers/infiniband/hw/ipath/ipath_kdebug.h Wed Dec 28 14:19:42 2005 -0800 @@ -0,0 +1,109 @@ +/* + * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + */ + +#ifndef _IPATH_KDEBUG_H +#define _IPATH_KDEBUG_H + +#include "ipath_debug.h" + +/* + * This file contains lightweight kernel tracing code. + */ + +extern unsigned infinipath_debug; +const char *ipath_get_unit_name(int unit); + +#if _IPATH_DEBUGGING + +#define _IPATH_UNIT_ERROR(unit,fmt,...) \ + printk(KERN_ERR "%s: " fmt, ipath_get_unit_name(unit), ##__VA_ARGS__) + +#define _IPATH_ERROR(fmt,...) printk(KERN_ERR "infinipath: " fmt, ##__VA_ARGS__) + +#define _IPATH_INFO(fmt,...) \ + do { \ + if(unlikely(infinipath_debug & __IPATH_INFO)) \ + printk(KERN_INFO "infinipath: " fmt, ##__VA_ARGS__); \ + } while(0) + +#define __IPATH_DBG_WHICH(which,fmt,...) \ + do { \ + if(unlikely(infinipath_debug&(which))) \ + printk(KERN_DEBUG "%s: " fmt, __func__,##__VA_ARGS__); \ + } while(0) + +#define _IPATH_DBG(fmt,...) __IPATH_DBG_WHICH(__IPATH_DBG,fmt,##__VA_ARGS__) +#define _IPATH_VDBG(fmt,...) __IPATH_DBG_WHICH(__IPATH_VERBDBG,fmt,##__VA_ARGS__) +#define _IPATH_PDBG(fmt,...) __IPATH_DBG_WHICH(__IPATH_PKTDBG,fmt,##__VA_ARGS__) +#define _IPATH_EPDBG(fmt,...) __IPATH_DBG_WHICH(__IPATH_EPKTDBG,fmt,##__VA_ARGS__) +#define _IPATH_PRDBG(fmt,...) __IPATH_DBG_WHICH(__IPATH_PROCDBG,fmt,##__VA_ARGS__) +#define _IPATH_MMDBG(fmt,...) __IPATH_DBG_WHICH(__IPATH_MMDBG,fmt,##__VA_ARGS__) +#define _IPATH_SMADBG(fmt,...) __IPATH_DBG_WHICH(__IPATH_SMADBG,fmt,##__VA_ARGS__) +#define _IPATH_IPATHDBG(fmt,...) __IPATH_DBG_WHICH(__IPATH_IPATHDBG,fmt,##__VA_ARGS__) +#define _IPATH_IPATHWARN(fmt,...) __IPATH_DBG_WHICH(__IPATH_IPATHWARN,fmt,##__VA_ARGS__) +#define _IPATH_IPATHERR(fmt,...) __IPATH_DBG_WHICH(__IPATH_IPATHERR ,fmt,##__VA_ARGS__) +#define _IPATH_IPATHPD(fmt,...) __IPATH_DBG_WHICH(__IPATH_IPATHPD ,fmt,##__VA_ARGS__) +#define _IPATH_IPATHTABLE(fmt,...) __IPATH_DBG_WHICH(__IPATH_IPATHTABLE ,fmt,##__VA_ARGS__) + +#else /* ! _IPATH_DEBUGGING */ + +#define _IPATH_UNIT_ERROR(unit,fmt,...) \ + do { \ + printk(KERN_ERR "%s" fmt, "",##__VA_ARGS__); \ + } while(0) + +#define _IPATH_ERROR(fmt,...) \ + do { \ + printk (KERN_ERR "%s" fmt, "",##__VA_ARGS__); \ + } while(0) + +#define _IPATH_INFO(fmt,...) +#define _IPATH_DBG(fmt,...) 
+#define _IPATH_PDBG(fmt,...) +#define _IPATH_EPDBG(fmt,...) +#define _IPATH_PRDBG(fmt,...) +#define _IPATH_VDBG(fmt,...) +#define _IPATH_MMDBG(fmt,...) +#define _IPATH_SMADBG(fmt,...) +#define _IPATH_IPATHDBG(fmt,...) +#define _IPATH_IPATHWARN(fmt,...) +#define _IPATH_IPATHERR(fmt,...) +#define _IPATH_IPATHPD(fmt,...) +#define _IPATH_IPATHTABLE(fmt,...) + +#endif /* _IPATH_DEBUGGING */ + +#endif /* _IPATH_DEBUG_H */ From bos at pathscale.com Wed Dec 28 16:31:32 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 28 Dec 2005 16:31:32 -0800 Subject: [openib-general] [PATCH 13 of 20] ipath - routines used by upper layer driver code In-Reply-To: Message-ID: Signed-off-by: Bryan O'Sullivan diff -r 5e9b0b7876e2 -r f9bcd9de3548 drivers/infiniband/hw/ipath/ipath_layer.c --- /dev/null Thu Jan 1 00:00:00 1970 +0000 +++ b/drivers/infiniband/hw/ipath/ipath_layer.c Wed Dec 28 14:19:43 2005 -0800 @@ -0,0 +1,1313 @@ +/* + * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + */ + +/* + * These are the routines used by layered drivers, currently just the + * layered ethernet driver and verbs layer. 
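+ * A layered driver registers its callbacks with
+ * ipath_layer_register() and transmits with ipath_layer_send(); the
+ * verbs layer has a parallel ipath_verbs_register() /
+ * ipath_verbs_send() pair. All four are defined below.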
+ */ + +#include + +#include "ipath_kernel.h" +#include "ips_common.h" +#include "ipath_layer.h" + +/* unit number is already validated in ipath_ioctl() */ +int ipath_kset_linkstate(uint32_t arg) +{ + ipath_type unit = 0xffff & (arg >> 16); + uint32_t lstate; + struct ipath_devdata *dd; + int tryarmed = 0; + + if (unit >= infinipath_max || + !(devdata[unit].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", unit); + return -ENODEV; + } + + dd = &devdata[unit]; + arg &= 0xffff; + switch (arg) { + case IPATH_IB_LINKDOWN: + ipath_down_link(unit); /* really moving it to idle */ + lstate = IPATH_LINKDOWN | IPATH_LINK_SLEEPING; + break; + + case IPATH_IB_LINKDOWN_POLL: + ipath_set_ib_lstate(unit, INFINIPATH_IBCC_LINKINITCMD_POLL << + INFINIPATH_IBCC_LINKINITCMD_SHIFT); + lstate = IPATH_LINKDOWN; + break; + + case IPATH_IB_LINKDOWN_DISABLE: + ipath_set_ib_lstate(unit, INFINIPATH_IBCC_LINKINITCMD_DISABLE << + INFINIPATH_IBCC_LINKINITCMD_SHIFT); + lstate = IPATH_LINKDOWN; + break; + + case IPATH_IB_LINKINIT: + ipath_set_ib_lstate(unit, INFINIPATH_IBCC_LINKCMD_INIT); + lstate = IPATH_LINKINIT; + break; + + case IPATH_IB_LINKARM: + ipath_set_ib_lstate(unit, INFINIPATH_IBCC_LINKCMD_ARMED); + lstate = IPATH_LINKARMED; + break; + + case IPATH_IB_LINKACTIVE: + /* + * because we sometimes go to ARMED, but then back to 0x11 + * (initialized) before the SMA asks us to move to ACTIVE, + * we will try to advance state to ARMED here, if necessary + */ + if (!(dd->ipath_flags & + (IPATH_LINKINIT | IPATH_LINKARMED | IPATH_LINKDOWN | + IPATH_LINK_SLEEPING | IPATH_LINKACTIVE))) { + /* this one is just paranoia */ + _IPATH_DBG + ("don't know current state (flags 0x%x), try anyway\n", + dd->ipath_flags); + tryarmed = 1; + + } + if (!(dd->ipath_flags & (IPATH_LINKARMED | IPATH_LINKACTIVE))) + tryarmed = 1; + if (tryarmed) { + ipath_set_ib_lstate(unit, + INFINIPATH_IBCC_LINKCMD_ARMED); + /* + * give it up to 2 seconds to get to ARMED or + * ACTIVE; continue afterwards even if we fail + */ + if (ipath_wait_linkstate + (unit, IPATH_LINKARMED | IPATH_LINKACTIVE, 2000)) + _IPATH_VDBG + ("try for active, even though didn't get to ARMED\n"); + } + + ipath_set_ib_lstate(unit, INFINIPATH_IBCC_LINKCMD_ACTIVE); + lstate = IPATH_LINKACTIVE; + break; + + default: + _IPATH_DBG("Unknown linkstate 0x%x requested\n", arg); + return -EINVAL; + } + return ipath_wait_linkstate(unit, lstate, 2000); +} + +/* + * we can handle "any" incoming size, the issue here is whether we + * need to restrict our outgoing size. For now, we don't do any + * sanity checking on this, and we don't deal with what happens to + * programs that are already running when the size changes. + * unit number is already validated in ipath_ioctl() + * NOTE: changing the MTU will usually cause the IBC to go back to + * link initialize (0x11) state... + */ +int ipath_kset_mtu(uint32_t arg) +{ + unsigned unit = (arg >> 16) & 0xffff; + uint32_t piosize; + int changed = 0; + + if (unit >= infinipath_max || + !(devdata[unit].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", unit); + return -ENODEV; + } + + arg &= 0xffff; + /* + * mtu is IB data payload max. It's the largest power of 2 less + * than piosize (or even larger, since it only really controls the + * largest we can receive; we can send the max of the mtu and piosize). + * We check that it's one of the valid IB sizes. 
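+ * The unit/value packing matches ipath_kset_linkstate() above: the
+ * unit number rides in the top 16 bits of arg and the MTU in the low
+ * 16, so ipath_kset_mtu((0u << 16) | 2048) requests a 2048-byte MTU
+ * on unit 0. With that value piosize becomes 2048 + 128 = 2176
+ * bytes; the code below drops the 8-byte PBC (2168 bytes = 542
+ * words) and adds one ICRC word, so 543 lands in the IBCC MAXPKTLEN
+ * field. (The worked numbers are illustrative only.)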
+ */ + if (arg != 256 && arg != 512 && arg != 1024 && arg != 2048 && + arg != 4096) { + _IPATH_DBG("Trying to set invalid mtu %u, failing\n", arg); + return -EINVAL; + } + if (devdata[unit].ipath_ibmtu == arg) { + return 0; /* same as current */ + } + + piosize = devdata[unit].ipath_ibmaxlen; + devdata[unit].ipath_ibmtu = arg; + + /* + * the 128 is the max IB header size allowed for in our pio send buffers + * If we are reducing the MTU below that, this doesn't completely make + * sense, but it's OK. + */ + if (arg >= (piosize - 128)) { + /* hasn't been changed */ + if (piosize == devdata[unit].ipath_init_ibmaxlen) + _IPATH_VDBG + ("mtu 0x%x >= ibmaxlen hardware max, nothing to do\n", + arg); + else { + _IPATH_VDBG + ("mtu 0x%x restores ibmaxlen to full amount 0x%x\n", + arg, piosize); + devdata[unit].ipath_ibmaxlen = piosize; + changed = 1; + } + } else if ((arg + 128) == devdata[unit].ipath_ibmaxlen) + _IPATH_VDBG("ibmaxlen %x same as current, no change\n", arg); + else { + piosize = arg + 128; + _IPATH_VDBG("ibmaxlen was 0x%x, setting to 0x%x (mtu 0x%x)\n", + devdata[unit].ipath_ibmaxlen, piosize, arg); + devdata[unit].ipath_ibmaxlen = piosize; + changed = 1; + } + + if (changed) { + /* + * set the IBC maxpktlength to the size of our pio + * buffers in words + */ + uint64_t ibc = devdata[unit].ipath_ibcctrl; + ibc &= ~(INFINIPATH_IBCC_MAXPKTLEN_MASK << + INFINIPATH_IBCC_MAXPKTLEN_SHIFT); + + piosize = piosize - 2 * sizeof(uint32_t); /* ignore pbc */ + devdata[unit].ipath_ibmaxlen = piosize; + piosize /= sizeof(uint32_t); /* in words */ + /* + * for ICRC, which we only send in diag test pkt mode, and we + * don't need to worry about that for mtu + */ + piosize += 1; + + ibc |= piosize << INFINIPATH_IBCC_MAXPKTLEN_SHIFT; + devdata[unit].ipath_ibcctrl = ibc; + ipath_kput_kreg(unit, kr_ibcctrl, devdata[unit].ipath_ibcctrl); + } + return 0; +} + +void ipath_set_sps_lid(const ipath_type unit, uint32_t arg) +{ + if (unit >= infinipath_max || + !(devdata[unit].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", unit); + return; + } + + ipath_stats.sps_lid[unit] = devdata[unit].ipath_lid = arg; + if (devdata[unit].ipath_layer.l_intr) + devdata[unit].ipath_layer.l_intr(unit, IPATH_LAYER_INT_LID); +} + +/* XXX - need to inform anyone who cares this just happened. 
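+ * (Today the only notification hook is the l_intr callback that
+ * ipath_set_sps_lid() above fires with IPATH_LAYER_INT_LID; nothing
+ * comparable fires on a GUID change, hence the XXX.)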
*/ +int ipath_layer_set_guid(const ipath_type device, uint64_t guid) +{ + if (device >= infinipath_max || + !(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", device); + return -ENODEV; + } + devdata[device].ipath_guid = guid; + return 0; +} + +uint64_t ipath_layer_get_guid(const ipath_type device) +{ + if (device >= infinipath_max || + !(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", device); + return 0; + } + return devdata[device].ipath_guid; +} + +uint32_t ipath_layer_get_nguid(const ipath_type device) +{ + if (device >= infinipath_max || + !(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", device); + return 0; + } + return devdata[device].ipath_nguid; +} + +int ipath_layer_query_device(const ipath_type device, uint32_t * vendor, + uint32_t * boardrev, uint32_t * majrev, + uint32_t * minrev) +{ + if (device >= infinipath_max || + !(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", device); + return -ENODEV; + } + + *vendor = devdata[device].ipath_vendorid; + *boardrev = devdata[device].ipath_boardrev; + *majrev = devdata[device].ipath_majrev; + *minrev = devdata[device].ipath_minrev; + + return 0; +} + +uint32_t ipath_layer_get_flags(const ipath_type device) +{ + if (device >= infinipath_max || + !(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", device); + return 0; + } + + return devdata[device].ipath_flags; +} + +struct device *ipath_layer_get_pcidev(const ipath_type device) +{ + if (device >= infinipath_max || + !(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", device); + return NULL; + } + + return &(devdata[device].pcidev->dev); +} + +uint16_t ipath_layer_get_deviceid(const ipath_type device) +{ + if (device >= infinipath_max || + !(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", device); + return 0; + } + + return devdata[device].ipath_deviceid; +} + +uint64_t ipath_layer_get_lastibcstat(const ipath_type device) +{ + if (device >= infinipath_max || + !(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", device); + return 0; + } + + return devdata[device].ipath_lastibcstat; +} + +uint32_t ipath_layer_get_ibmtu(const ipath_type device) +{ + if (device >= infinipath_max || + !(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", device); + return 0; + } + + return devdata[device].ipath_ibmtu; +} + +int ipath_layer_register(const ipath_type device, + int (*l_intr) (const ipath_type, uint32_t), + int (*l_rcv) (const ipath_type, void *, + struct sk_buff *), uint16_t l_rcv_opcode, + int (*l_rcv_lid) (const ipath_type, void *), + uint16_t l_rcv_lid_opcode) +{ + int ret = 0; + + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return 1; + } + if (!(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_VDBG("%s not yet initialized, failing\n", + ipath_get_unit_name(device)); + return 1; + } + + _IPATH_VDBG("intr %p rx %p, rx_lid %p\n", l_intr, l_rcv, l_rcv_lid); + if (devdata[device].ipath_layer.l_intr + || devdata[device].ipath_layer.l_rcv) { + _IPATH_DBG + ("Layered device already registered on unit %u, failing\n", + device); + return 1; + } + + if (!(*devdata[device].ipath_statusp & IPATH_STATUS_SMA)) + *devdata[device].ipath_statusp |= IPATH_STATUS_OIB_SMA; + devdata[device].ipath_layer.l_intr = l_intr; + devdata[device].ipath_layer.l_rcv = l_rcv; + 
devdata[device].ipath_layer.l_rcv_lid = l_rcv_lid; + devdata[device].ipath_layer.l_rcv_opcode = l_rcv_opcode; + devdata[device].ipath_layer.l_rcv_lid_opcode = l_rcv_lid_opcode; + + return ret; +} + +static void ipath_verbs_timer(unsigned long t) +{ + /* + * If port 0 receive packet interrupts are not availabile, + * check the receive queue. + */ + if (!(devdata[t].ipath_flags & IPATH_GPIO_INTR)) + ipath_kreceive(t); + + /* Handle verbs layer timeouts. */ + if (devdata[t].verbs_layer.l_timer_cb) + devdata[t].verbs_layer.l_timer_cb(t); + + mod_timer(&devdata[t].verbs_layer.l_timer, jiffies + 1); +} + +/* Verbs layer registration. */ +int ipath_verbs_register(const ipath_type device, + int (*l_piobufavail) (const ipath_type device), + void (*l_rcv) (const ipath_type device, void *rhdr, + void *data, uint32_t tlen), + void (*l_timer_cb) (const ipath_type device)) +{ + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return 0; + } + if (!(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_VDBG("%s not yet initialized, failing\n", + ipath_get_unit_name(device)); + return 0; + } + + _IPATH_VDBG("piobufavail %p rx %p\n", l_piobufavail, l_rcv); + if (devdata[device].verbs_layer.l_piobufavail || + devdata[device].verbs_layer.l_rcv) { + _IPATH_DBG("Verbs layer already registered on unit %u, " + "failing\n", device); + return 0; + } + + devdata[device].verbs_layer.l_piobufavail = l_piobufavail; + devdata[device].verbs_layer.l_rcv = l_rcv; + devdata[device].verbs_layer.l_timer_cb = l_timer_cb; + devdata[device].verbs_layer.l_flags = 0; + + return 1; +} + +void ipath_verbs_unregister(const ipath_type device) +{ + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return; + } + if (!(devdata[device].ipath_flags & IPATH_INITTED)) { + _IPATH_VDBG("%s not yet initialized, failing\n", + ipath_get_unit_name(device)); + return; + } + + *devdata[device].ipath_statusp &= ~IPATH_STATUS_OIB_SMA; + devdata[device].verbs_layer.l_piobufavail = NULL; + devdata[device].verbs_layer.l_rcv = NULL; + devdata[device].verbs_layer.l_timer_cb = NULL; + devdata[device].verbs_layer.l_flags = 0; +} + +int ipath_layer_open(const ipath_type device, uint32_t * pktmax) +{ + int ret = 0; + uint32_t intval = 0; + + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return 1; + } + if (!devdata[device].ipath_layer.l_intr + || !devdata[device].ipath_layer.l_rcv) { + _IPATH_DBG("layer not registered, failing\n"); + return 1; + } + + if ((ret = + ipath_setrcvhdrsize(device, NUM_OF_EKSTRA_WORDS_IN_HEADER_QUEUE))) + return ret; + + *pktmax = devdata[device].ipath_ibmaxlen; + + if (*devdata[device].ipath_statusp & IPATH_STATUS_IB_READY) + intval |= IPATH_LAYER_INT_IF_UP; + if (ipath_stats.sps_lid[device]) + intval |= IPATH_LAYER_INT_LID; + if (ipath_stats.sps_mlid[device]) + intval |= IPATH_LAYER_INT_BCAST; + /* + * do this on open, in case low level is already up and + * just layered driver was reloaded, etc. + */ + if (intval) + devdata[device].ipath_layer.l_intr(device, intval); + + return ret; +} + +uint16_t ipath_layer_get_lid(const ipath_type device) +{ + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return 0; + } + + _IPATH_VDBG("returning mylid 0x%x for layered dev %d\n", + devdata[device].ipath_lid, device); + return devdata[device].ipath_lid; +} + +/* + * get the MAC address. This is the EUID-64 OUI octets (top 3), then + * skip the next 2 (which should both be zero or 0xff). 
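+ * For example, a GUID of 00:11:75:ff:ff:04:7b:c8 yields the MAC
+ * 00:11:75:04:7b:c8: bytes 0-2 are the OUI, bytes 3-4 are the
+ * all-zero or all-ones EUI-64 filler that gets skipped, and bytes
+ * 5-7 are the low part (the GUID value is illustrative only).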
+ * The returned MAC is in network order + * mac points to at least 6 bytes of buffer + * returns 0 on error (to be consistent with get_lid and get_bcast + * return 1 on success + * We assume that by the time the LID is set, that the GUID is as valid + * as it's ever going to be, rather than adding yet another status bit. + */ + +int ipath_layer_get_mac(const ipath_type device, uint8_t * mac) +{ + uint8_t *guid; + + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u, failing\n", device); + return 0; + } + guid = (uint8_t *) & devdata[device].ipath_guid; + + mac[0] = guid[0]; + mac[1] = guid[1]; + mac[2] = guid[2]; + mac[3] = guid[5]; + mac[4] = guid[6]; + mac[5] = guid[7]; + if ((guid[3] || guid[4]) && !(guid[3] == 0xff && guid[4] == 0xff)) + _IPATH_DBG("Warning, guid bytes 3 and 4 not 0 or 0xffff: %x %x\n", + guid[3], guid[4]); + _IPATH_VDBG("Returning %x:%x:%x:%x:%x:%x\n", + mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]); + return 1; +} + +uint16_t ipath_layer_get_bcast(const ipath_type device) +{ + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u, failing\n", device); + return 0; + } + + _IPATH_VDBG("returning broadcast LID 0x%x for unit %u\n", + devdata[device].ipath_mlid, device); + return devdata[device].ipath_mlid; +} + +int ipath_layer_get_num_of_dev(void) +{ + return infinipath_max; +} + +int ipath_layer_get_cr_errpkey(const ipath_type device) +{ + return ipath_kget_creg32(device, cr_errpkey); +} + +void ipath_layer_close(const ipath_type device) +{ + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return; + } + if (!devdata[device].ipath_layer.l_intr + || !devdata[device].ipath_layer.l_rcv) { + /* normal if not all chips are present */ + _IPATH_VDBG("layer close without open\n"); + } else { + devdata[device].ipath_layer.l_intr = NULL; + devdata[device].ipath_layer.l_rcv = NULL; + devdata[device].ipath_layer.l_rcv_lid = NULL; + devdata[device].ipath_layer.l_rcv_opcode = 0; + devdata[device].ipath_layer.l_rcv_lid_opcode = 0; + } +} + +static inline void copy_aligned(uint32_t __iomem *piobuf, + struct ipath_sge_state *ss, + uint32_t length) +{ + struct ipath_sge *sge = &ss->sge; + + while (length) { + uint32_t len = sge->length; + uint32_t w; + + BUG_ON(len == 0); + if (len > length) + len = length; + /* Need to round up for the last dword in the packet. 
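+ * For example, a final chunk with len == 10 bytes gives
+ * w = (10 + 3) >> 2 == 3 dwords: the first two go out via
+ * memcpy_toio32() and the third, the trigger word, via a single
+ * writel() after the mb(), because the last word written hands the
+ * buffer to the chip (len chosen only for illustration).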
*/ + w = (len + 3) >> 2; + if (length == len) { /* last chunk, trigger word is special */ + uint32_t *src32; + memcpy_toio32(piobuf, sge->vaddr, w-1); + src32 = (w-1)+(uint32_t*)sge->vaddr; + mb(); /* must flush early everything before trigger word */ + writel(*src32, piobuf+w-1); + } + else + memcpy_toio32(piobuf, sge->vaddr, w); + piobuf += w; + sge->vaddr += len; + sge->length -= len; + sge->sge_length -= len; + if (sge->sge_length == 0) { + if (--ss->num_sge) + *sge = *ss->sg_list++; + } else if (sge->length == 0 && sge->mr != NULL) { + if (++sge->n >= IPATH_SEGSZ) { + if (++sge->m >= sge->mr->mapsz) + break; + sge->n = 0; + } + sge->vaddr = sge->mr->map[sge->m]->segs[sge->n].vaddr; + sge->length = sge->mr->map[sge->m]->segs[sge->n].length; + } + length -= len; + } +} + +static inline void copy_unaligned(uint32_t __iomem *piobuf, + struct ipath_sge_state *ss, + uint32_t length) +{ + struct ipath_sge *sge = &ss->sge; + union { + uint8_t wbuf[4]; + uint32_t w; + } u; + int extra = 0; + + while (length) { + uint32_t len = sge->length; + + BUG_ON(len == 0); + if (len > length) + len = length; + length -= len; + while (len) { + u.wbuf[extra++] = *(uint8_t *) sge->vaddr; + sge->vaddr++; + sge->length--; + sge->sge_length--; + if (extra >= 4) { + if (!length && len == 1) + mb(); /* flush all before the trigger word write */ + writel(u.w, piobuf); + extra = 0; + } + len--; + } + if (sge->sge_length == 0) { + if (--ss->num_sge) + *sge = *ss->sg_list++; + } else if (sge->length == 0) { + if (++sge->n >= IPATH_SEGSZ) { + if (++sge->m >= sge->mr->mapsz) + break; + sge->n = 0; + } + sge->vaddr = sge->mr->map[sge->m]->segs[sge->n].vaddr; + sge->length = sge->mr->map[sge->m]->segs[sge->n].length; + } + } + if (extra) { + while (extra < 4) + u.wbuf[extra++] = 0; + mb(); /* flush all before the trigger word write */ + writel(u.w, piobuf); + } +} + +/* + * This is like ipath_send_smapkt() in that we need to be able to send + * packets after the chip is initialized (MADs) but also like + * ipath_layer_send() since its used by the verbs layer. + */ +int ipath_verbs_send(const ipath_type device, uint32_t hdrwords, + uint32_t *hdr, uint32_t len, struct ipath_sge_state *ss) +{ + struct ipath_devdata *dd = &devdata[device]; + uint32_t __iomem *piobuf; + uint32_t plen; + + if (device >= infinipath_max || + !(dd->ipath_flags & IPATH_PRESENT) || !dd->ipath_kregbase) { + _IPATH_DBG("illegal unit %u\n", device); + return -ENODEV; + } + if (!(dd->ipath_flags & IPATH_INITTED)) { + /* no hardware, freeze, etc. */ + _IPATH_DBG("unit %u not usable\n", device); + return -ENODEV; + } + /* +1 is for the qword padding of pbc */ + plen = hdrwords + ((len + 3) >> 2) + 1; + if ((plen << 2) > dd->ipath_ibmaxlen) { + _IPATH_DBG("packet len 0x%x too long, failing\n", plen); + return -EINVAL; + } + + /* Get a PIO buffer to use. */ + if (!(piobuf = ipath_getpiobuf(device, NULL))) + return -EBUSY; + + _IPATH_EPDBG("0x%x+1w pio %p\n", plen - 1, piobuf); + + /* Write len to control qword, no flags. 
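+ * plen was computed above as hdrwords + ((len + 3) >> 2) + 1; for
+ * example a 24-dword header with a 256-byte payload gives
+ * 24 + 64 + 1 = 89 dwords, the +1 being the qword padding of the PBC
+ * written here (sizes illustrative only).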
+ * we have to flush after the PBC for correctness on some cpus + * or WC buffer can be written out of order */ + writeq(plen, piobuf); + mb(); + piobuf += 2; + if (len == 0) { + /* if there is just the header portion, must flush before + * writing last word of header for correctness, and after + * the last header word (trigger word) */ + memcpy_toio32(piobuf, hdr, hdrwords-1); + mb(); + writel(hdr[hdrwords-1], piobuf+hdrwords-1); + mb(); + return 0; + } + memcpy_toio32(piobuf, hdr, hdrwords); + piobuf += hdrwords; + /* + * If we really wanted to check everything, we would have to + * check that each segment starts on a dword boundary and is + * a dword multiple in length. + * Since there can be lots of segments, we only check for a simple + * common case where the amount to copy is contained in one segment. + */ + if (ss->sge.length == len) + copy_aligned(piobuf, ss, len); + else + copy_unaligned(piobuf, ss, len); + mb(); /* be sure trigger word is written */ + return 0; +} + +void ipath_layer_snapshot_counters(const ipath_type device, uint64_t * swords, + uint64_t * rwords, uint64_t * spkts, uint64_t * rpkts) +{ + if (device >= infinipath_max || + !(devdata[device].ipath_flags & IPATH_PRESENT)) { + _IPATH_DBG("illegal unit %u\n", device); + return; + } + if (!(devdata[device].ipath_flags & IPATH_INITTED)) { + /* no hardware, freeze, etc. */ + _IPATH_DBG("unit %u not usable\n", device); + return; + } + *swords = ipath_snap_cntr(device, cr_wordsendcnt); + *rwords = ipath_snap_cntr(device, cr_wordrcvcnt); + *spkts = ipath_snap_cntr(device, cr_pktsendcnt); + *rpkts = ipath_snap_cntr(device, cr_pktrcvcnt); +} + +/* + * Return the counters needed by recv_pma_get_portcounters(). + */ +void ipath_layer_get_counters(const ipath_type device, + struct ipath_layer_counters *cntrs) +{ + if (device >= infinipath_max || + !(devdata[device].ipath_flags & IPATH_PRESENT)) { + _IPATH_DBG("illegal unit %u\n", device); + return; + } + if (!(devdata[device].ipath_flags & IPATH_INITTED)) { + /* no hardware, freeze, etc. 
*/ + _IPATH_DBG("unit %u not usable\n", device); + return; + } + cntrs->symbol_error_counter = + ipath_snap_cntr(device, cr_ibsymbolerrcnt); + cntrs->link_error_recovery_counter = + ipath_snap_cntr(device, cr_iblinkerrrecovcnt); + cntrs->link_downed_counter = ipath_snap_cntr(device, cr_iblinkdowncnt); + cntrs->port_rcv_errors = ipath_snap_cntr(device, cr_err_rlencnt) + + ipath_snap_cntr(device, cr_invalidrlencnt) + + ipath_snap_cntr(device, cr_erricrccnt) + + ipath_snap_cntr(device, cr_errvcrccnt) + + ipath_snap_cntr(device, cr_badformatcnt); + cntrs->port_rcv_remphys_errors = ipath_snap_cntr(device, cr_rcvebpcnt); + cntrs->port_xmit_discards = ipath_snap_cntr(device, cr_unsupvlcnt); + cntrs->port_xmit_data = ipath_snap_cntr(device, cr_wordsendcnt); + cntrs->port_rcv_data = ipath_snap_cntr(device, cr_wordrcvcnt); + cntrs->port_xmit_packets = ipath_snap_cntr(device, cr_pktsendcnt); + cntrs->port_rcv_packets = ipath_snap_cntr(device, cr_pktrcvcnt); +} + +void ipath_layer_want_buffer(const ipath_type device) +{ + atomic_set_mask(INFINIPATH_S_PIOINTBUFAVAIL, + &devdata[device].ipath_sendctrl); + ipath_kput_kreg(device, kr_sendctrl, devdata[device].ipath_sendctrl); +} + +int ipath_layer_send(const ipath_type device, void *hdr, void *data, + uint32_t datawords) +{ + int ret = 0; + uint32_t __iomem *piobuf; + uint32_t plen; + uint16_t vlsllnh; + + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u, failing\n", device); + return -EINVAL; + } + if (!(devdata[device].ipath_flags & IPATH_RCVHDRSZ_SET)) { + _IPATH_DBG("send while not open\n"); + ret = -EINVAL; + } else + if ((devdata[device].ipath_flags & (IPATH_LINKUNK | IPATH_LINKDOWN)) + || devdata[device].ipath_lid == 0) { + /* lid check is for when sma hasn't yet configured */ + ret = -ENETDOWN; + _IPATH_VDBG("send while not ready, mylid=%u, flags=0x%x\n", + devdata[device].ipath_lid, + devdata[device].ipath_flags); + } + /* +1 is for the qword padding of pbc */ + plen = (sizeof(struct ips_message_header_typ) >> 2) + datawords + 1; + if (plen > (devdata[device].ipath_ibmaxlen >> 2)) { + _IPATH_DBG("packet len 0x%x too long, failing\n", plen); + ret = -EINVAL; + } + vlsllnh = *((uint16_t *) hdr); + if (vlsllnh != htons(IPS_LRH_BTH)) { + _IPATH_DBG("Warning: lrh[0] wrong (%x, not %x); not sending\n", + vlsllnh, htons(IPS_LRH_BTH)); + ret = -EINVAL; + } + if (ret) + goto done; + + /* Get a PIO buffer to use. 
*/ + if (!(piobuf = ipath_getpiobuf(device, NULL))) { + ret = -EBUSY; + goto done; + } + + _IPATH_EPDBG("0x%x+1w pio %p\n", plen - 1, piobuf); + + /* len to control qword, no flags */ + writeq(plen, piobuf); + piobuf += 2; + memcpy_toio32(piobuf, hdr, + (sizeof(struct ips_message_header_typ) >> 2)); + piobuf += (sizeof(struct ips_message_header_typ) >> 2); + memcpy_toio32(piobuf, data, datawords); + + ipath_stats.sps_ether_spkts++; /* another ether packet sent */ + +done: + return ret; +} + +void ipath_layer_set_piointbufavail_int(const ipath_type device) +{ + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return; + } + + atomic_set_mask(INFINIPATH_S_PIOINTBUFAVAIL, + &devdata[device].ipath_sendctrl); + + ipath_kput_kreg(device, kr_sendctrl, devdata[device].ipath_sendctrl); +} + +void ipath_layer_enable_timer(const ipath_type device) +{ + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return; + } + + /* + * HT-400 has a design flaw where the chip and kernel idea + * of the tail register don't always agree, and therefore we won't + * get an interrupt on the next packet received. + * If the board supports per packet receive interrupts, use it. + * Otherwise, the timer function periodically checks for packets + * to cover this case. + * Either way, the timer is needed for verbs layer related + * processing. + */ + if (devdata[device].ipath_flags & IPATH_GPIO_INTR) { + ipath_kput_kreg(device, kr_debugportselect, 0x2074076542310UL); + /* Enable GPIO bit 2 interrupt */ + ipath_kput_kreg(device, kr_gpio_mask, (uint64_t)(1 << 2)); + } + + init_timer(&devdata[device].verbs_layer.l_timer); + devdata[device].verbs_layer.l_timer.function = ipath_verbs_timer; + devdata[device].verbs_layer.l_timer.data = (unsigned long)device; + devdata[device].verbs_layer.l_timer.expires = jiffies + 1; + add_timer(&devdata[device].verbs_layer.l_timer); +} + +void ipath_layer_disable_timer(const ipath_type device) +{ + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return; + } + + /* Disable GPIO bit 2 interrupt */ + if (devdata[device].ipath_flags & IPATH_GPIO_INTR) + ipath_kput_kreg(device, kr_gpio_mask, 0); + + del_timer_sync(&devdata[device].verbs_layer.l_timer); +} + +/* + * Get the verbs layer flags. + */ +unsigned ipath_verbs_get_flags(const ipath_type device) +{ + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return 0; + } + + return devdata[device].verbs_layer.l_flags; +} + +/* + * Set the verbs layer flags. + */ +void ipath_verbs_set_flags(const ipath_type device, unsigned flags) +{ + ipath_type s; + + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return; + } + + devdata[device].verbs_layer.l_flags = flags; + + for (s = 0; s < infinipath_max; s++) { + if (!(devdata[s].ipath_flags & IPATH_INITTED)) + continue; + if ((flags & IPATH_VERBS_KERNEL_SMA) && + !(*devdata[s].ipath_statusp & IPATH_STATUS_SMA)) { + *devdata[s].ipath_statusp |= IPATH_STATUS_OIB_SMA; + } else { + *devdata[s].ipath_statusp &= ~IPATH_STATUS_OIB_SMA; + } + } +} + +/* + * Return the size of the PKEY table for port 0. + */ +unsigned ipath_layer_get_npkeys(const ipath_type device) +{ + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return 0; + } + + return ARRAY_SIZE(devdata[device].ipath_pd[0]->port_pkeys); +} + +/* + * Return the indexed PKEY from the port 0 PKEY table. 
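+ * A caller that wants the whole table can equivalently loop:
+ *     for (i = 0; i < ipath_layer_get_npkeys(dev); i++)
+ *             pkeys[i] = ipath_layer_get_pkey(dev, i);
+ * which matches what ipath_layer_get_pkeys() below does with a
+ * single memcpy() (sketch only).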
+ */ +unsigned ipath_layer_get_pkey(const ipath_type device, unsigned index) +{ + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return 0; + } + if (index >= ARRAY_SIZE(devdata[device].ipath_pd[0]->port_pkeys)) + return 0; + + return devdata[device].ipath_pd[0]->port_pkeys[index]; +} + +/* + * Return the PKEY table for port 0. + */ +void ipath_layer_get_pkeys(const ipath_type device, uint16_t *pkeys) +{ + struct ipath_portdata *pd; + + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return; + } + + pd = devdata[device].ipath_pd[0]; + memcpy(pkeys, pd->port_pkeys, sizeof(pd->port_pkeys)); +} + +/* + * Decrement the reference count for the given PKEY. + * Return true if this was the last reference and the hardware table entry + * needs to be changed. + */ +static inline int rm_pkey(struct ipath_devdata *dd, uint16_t key) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(dd->ipath_pkeys); i++) { + if (dd->ipath_pkeys[i] != key) + continue; + if (atomic_dec_and_test(&dd->ipath_pkeyrefs[i])) { + dd->ipath_pkeys[i] = 0; + return 1; + } + break; + } + return 0; +} + +/* + * Add the given PKEY to the hardware table. + * Return an error code if unable to add the entry, zero if no change, + * or 1 if the hardware PKEY register needs to be updated. + */ +static inline int add_pkey(struct ipath_devdata *dd, uint16_t key) +{ + int i; + uint16_t lkey = key & 0x7FFF; + int any = 0; + + for (i = 0; i < ARRAY_SIZE(dd->ipath_pkeys); i++) { + if (!dd->ipath_pkeys[i]) { + any++; + continue; + } + /* If it matches exactly, try to increment the ref count */ + if (dd->ipath_pkeys[i] == key) { + if (atomic_inc_return(&dd->ipath_pkeyrefs[i]) > 1) + return 0; + /* Lost the race. Look for an empty slot below. */ + atomic_dec(&dd->ipath_pkeyrefs[i]); + any++; + } + /* + * It makes no sense to have both the limited and unlimited + * PKEY set at the same time since the unlimited one will + * disable the limited one. + */ + if ((dd->ipath_pkeys[i] & 0x7FFF) == lkey) + return -EEXIST; + } + if (!any) + return -EBUSY; + for (i = 0; i < ARRAY_SIZE(dd->ipath_pkeys); i++) { + if (!dd->ipath_pkeys[i] && + atomic_inc_return(&dd->ipath_pkeyrefs[i]) == 1) { + /* for ipathstats, etc. */ + ipath_stats.sps_pkeys[i] = lkey; + dd->ipath_pkeys[i] = key; + return 1; + } + } + return -EBUSY; +} + +/* + * Set the PKEY table for port 0. + */ +int ipath_layer_set_pkeys(const ipath_type device, uint16_t *pkeys) +{ + struct ipath_portdata *pd; + struct ipath_devdata *dd; + int i; + int changed = 0; + + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return -EINVAL; + } + + dd = &devdata[device]; + pd = dd->ipath_pd[0]; + + for (i = 0; i < ARRAY_SIZE(pd->port_pkeys); i++) { + uint16_t key = pkeys[i]; + uint16_t okey = pd->port_pkeys[i]; + + if (key == okey) + continue; + /* + * The value of this PKEY table entry is changing. + * Remove the old entry in the hardware's array of PKEYs.
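+ * Bit 15 of a PKEY is the full-membership bit, so the "& 0x7FFF"
+ * tests below treat an entry as populated only if its 15 base bits
+ * are non-zero (0x0000 and 0x8000 both mean "no key"). Once anything
+ * changed, the four 16-bit ipath_pkeys[] entries are repacked into
+ * the 64-bit kr_partitionkey register further down; e.g. keys
+ * 0xFFFF, 0x8001, 0, 0 pack to 0x000000008001FFFF.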
+ */ + if (okey & 0x7FFF) + changed |= rm_pkey(dd, okey); + if (key & 0x7FFF) { + int ret = add_pkey(dd, key); + + if (ret < 0) + key = 0; + else + changed |= ret; + } + pd->port_pkeys[i] = key; + } + if (changed) { + uint64_t pkey; + + pkey = (uint64_t) dd->ipath_pkeys[0] | + ((uint64_t) dd->ipath_pkeys[1] << 16) | + ((uint64_t) dd->ipath_pkeys[2] << 32) | + ((uint64_t) dd->ipath_pkeys[3] << 48); + _IPATH_VDBG("p0 new pkey reg %llx\n", pkey); + ipath_kput_kreg(pd->port_unit, kr_partitionkey, pkey); + } + return 0; +} + +/* + * Registers that vary with the chip implementation constants (port) + * use this routine. + */ +uint64_t ipath_kget_kreg64_port(const ipath_type stype, ipath_kreg regno, + unsigned port) +{ + ipath_kreg tmp = + (port < devdata[stype].ipath_portcnt && regno == kr_rcvhdraddr) ? + regno + port : + ((port < devdata[stype].ipath_portcnt + && regno == kr_rcvhdrtailaddr) ? regno + port : __kr_invalid); + return ipath_kget_kreg64(stype, tmp); +} + +/* + * Registers that vary with the chip implementation constants (port) + * use this routine. + */ +void ipath_kput_kreg_port(const ipath_type stype, ipath_kreg regno, + unsigned port, uint64_t value) +{ + ipath_kreg tmp = + (port < devdata[stype].ipath_portcnt && regno == kr_rcvhdraddr) ? + regno + port : + ((port < devdata[stype].ipath_portcnt + && regno == kr_rcvhdrtailaddr) ? regno + port : __kr_invalid); + ipath_kput_kreg(stype, tmp, value); +} + +/* + * Returns zero if the default is POLL, 1 if the default is SLEEP. + */ +int ipath_layer_get_linkdowndefaultstate(const ipath_type device) +{ + struct ipath_devdata *dd; + + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return -EINVAL; + } + + dd = &devdata[device]; + return (dd->ipath_ibcctrl & INFINIPATH_IBCC_LINKDOWNDEFAULTSTATE) ? + 1 : 0; +} + +/* + * Note that this will only take effect when the link state changes. + */ +int ipath_layer_set_linkdowndefaultstate(const ipath_type device, int sleep) +{ + struct ipath_devdata *dd; + + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return -EINVAL; + } + + _IPATH_DBG("state %s\n", sleep ? "SLEEP" : "POLL"); + dd = &devdata[device]; + if (sleep) + dd->ipath_ibcctrl |= INFINIPATH_IBCC_LINKDOWNDEFAULTSTATE; + else + dd->ipath_ibcctrl &= ~INFINIPATH_IBCC_LINKDOWNDEFAULTSTATE; + return 0; +} + +int ipath_layer_get_phyerrthreshold(const ipath_type device) +{ + struct ipath_devdata *dd; + + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return -EINVAL; + } + + dd = &devdata[device]; + return (dd->ipath_ibcctrl >> INFINIPATH_IBCC_PHYERRTHRESHOLD_SHIFT) & + INFINIPATH_IBCC_PHYERRTHRESHOLD_MASK; +} + +/* + * Note that this will only take effect when the link state changes. 
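+ * Both threshold setters share a read-modify-write pattern on the
+ * cached ipath_ibcctrl: clear the 4-bit field at its shift, then OR
+ * in the new value. Neither writes kr_ibcctrl here, which is why the
+ * new threshold waits for the next link state transition.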
+ */ +int ipath_layer_set_phyerrthreshold(const ipath_type device, unsigned n) +{ + struct ipath_devdata *dd; + unsigned v; + + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return -EINVAL; + } + + dd = &devdata[device]; + v = (dd->ipath_ibcctrl >> INFINIPATH_IBCC_PHYERRTHRESHOLD_SHIFT) & + INFINIPATH_IBCC_PHYERRTHRESHOLD_MASK; + if (v != n) { + _IPATH_DBG("error threshold %u\n", n); + dd->ipath_ibcctrl &= + ~(INFINIPATH_IBCC_PHYERRTHRESHOLD_MASK << + INFINIPATH_IBCC_PHYERRTHRESHOLD_SHIFT); + dd->ipath_ibcctrl |= + (uint64_t)n << INFINIPATH_IBCC_PHYERRTHRESHOLD_SHIFT; + } + return 0; +} + +int ipath_layer_get_overrunthreshold(const ipath_type device) +{ + struct ipath_devdata *dd; + + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return -EINVAL; + } + + dd = &devdata[device]; + return (dd->ipath_ibcctrl >> INFINIPATH_IBCC_OVERRUNTHRESHOLD_SHIFT) & + INFINIPATH_IBCC_OVERRUNTHRESHOLD_MASK; +} + +/* + * Note that this will only take effect when the link state changes. + */ +int ipath_layer_set_overrunthreshold(const ipath_type device, unsigned n) +{ + struct ipath_devdata *dd; + unsigned v; + + if (device >= infinipath_max) { + _IPATH_DBG("Invalid unit %u\n", device); + return -EINVAL; + } + + dd = &devdata[device]; + v = (dd->ipath_ibcctrl >> INFINIPATH_IBCC_OVERRUNTHRESHOLD_SHIFT) & + INFINIPATH_IBCC_OVERRUNTHRESHOLD_MASK; + if (v != n) { + _IPATH_DBG("overrun threshold %u\n", n); + dd->ipath_ibcctrl &= + ~(INFINIPATH_IBCC_OVERRUNTHRESHOLD_MASK << + INFINIPATH_IBCC_OVERRUNTHRESHOLD_SHIFT); + dd->ipath_ibcctrl |= + (uint64_t)n << INFINIPATH_IBCC_OVERRUNTHRESHOLD_SHIFT; + } + return 0; +} + +EXPORT_SYMBOL(ipath_kset_linkstate); +EXPORT_SYMBOL(ipath_kset_mtu); +EXPORT_SYMBOL(ipath_layer_close); +EXPORT_SYMBOL(ipath_layer_get_bcast); +EXPORT_SYMBOL(ipath_layer_get_cr_errpkey); +EXPORT_SYMBOL(ipath_layer_get_deviceid); +EXPORT_SYMBOL(ipath_layer_get_flags); +EXPORT_SYMBOL(ipath_layer_get_guid); +EXPORT_SYMBOL(ipath_layer_get_ibmtu); +EXPORT_SYMBOL(ipath_layer_get_lastibcstat); +EXPORT_SYMBOL(ipath_layer_get_lid); +EXPORT_SYMBOL(ipath_layer_get_mac); +EXPORT_SYMBOL(ipath_layer_get_nguid); +EXPORT_SYMBOL(ipath_layer_get_num_of_dev); +EXPORT_SYMBOL(ipath_layer_get_pcidev); +EXPORT_SYMBOL(ipath_layer_open); +EXPORT_SYMBOL(ipath_layer_query_device); +EXPORT_SYMBOL(ipath_layer_register); +EXPORT_SYMBOL(ipath_layer_send); +EXPORT_SYMBOL(ipath_layer_set_guid); +EXPORT_SYMBOL(ipath_layer_set_piointbufavail_int); +EXPORT_SYMBOL(ipath_layer_snapshot_counters); +EXPORT_SYMBOL(ipath_layer_get_counters); +EXPORT_SYMBOL(ipath_layer_want_buffer); +EXPORT_SYMBOL(ipath_verbs_register); +EXPORT_SYMBOL(ipath_verbs_send); +EXPORT_SYMBOL(ipath_verbs_unregister); +EXPORT_SYMBOL(ipath_set_sps_lid); +EXPORT_SYMBOL(ipath_layer_enable_timer); +EXPORT_SYMBOL(ipath_layer_disable_timer); +EXPORT_SYMBOL(ipath_verbs_get_flags); +EXPORT_SYMBOL(ipath_verbs_set_flags); +EXPORT_SYMBOL(ipath_layer_get_npkeys); +EXPORT_SYMBOL(ipath_layer_get_pkey); +EXPORT_SYMBOL(ipath_layer_get_pkeys); +EXPORT_SYMBOL(ipath_layer_set_pkeys); +EXPORT_SYMBOL(ipath_layer_get_linkdowndefaultstate); +EXPORT_SYMBOL(ipath_layer_set_linkdowndefaultstate); +EXPORT_SYMBOL(ipath_layer_get_phyerrthreshold); +EXPORT_SYMBOL(ipath_layer_set_phyerrthreshold); +EXPORT_SYMBOL(ipath_layer_get_overrunthreshold); +EXPORT_SYMBOL(ipath_layer_set_overrunthreshold); From bos at pathscale.com Wed Dec 28 16:31:29 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 28 Dec 2005 16:31:29 -0800 Subject: 
[openib-general] [PATCH 10 of 20] ipath - core driver, part 3 of 4 In-Reply-To: Message-ID: Signed-off-by: Bryan O'Sullivan diff -r dad2e87e21f4 -r c37b118ef806 drivers/infiniband/hw/ipath/ipath_driver.c --- a/drivers/infiniband/hw/ipath/ipath_driver.c Wed Dec 28 14:19:42 2005 -0800 +++ b/drivers/infiniband/hw/ipath/ipath_driver.c Wed Dec 28 14:19:42 2005 -0800 @@ -3878,3 +3878,1533 @@ /* process possible error packets in hdrq */ ipath_kreceive(t); } + +/* must only be called if ipath_pd[port] is known to be allocated */ +static inline void *ipath_get_egrbuf(const ipath_type t, uint32_t bufnum, + int err) +{ + return devdata[t].ipath_port0_skbs ? + (void *)devdata[t].ipath_port0_skbs[bufnum]->data : NULL; + +#ifdef _USE_FOR_DEBUGGING_ONLY + /* + * want routine to be inlined and fast this is here so if we do ports + * other than 0, I don't have to rewrite the code, since it's slightly + * complicated + */ + if (port != 1) { + void *chunkbase; + /* + * This calculation takes about 50 cycles. Could do + * what I did for protocol code, and have an array of + * addresses, getting it down to just a few cycles per + * lookup, at the cost of 16KB of memory. + */ + if (!devdata[t].ipath_pd[port]->port_rcvegrbuf_virt) + return NULL; + chunkbase = devdata[t].ipath_pd[port]->port_rcvegrbuf_virt + [bufnum / + devdata[t].ipath_pd[port]->port_rcvegrbufs_perchunk]; + return (void *)(chunkbase + + (bufnum % + devdata[t].ipath_pd[port]-> + port_rcvegrbufs_perchunk) + * devdata[t].ipath_rcvegrbufsize); + } +#endif +} + +/* receive an sma packet. Separate for better overall optimization */ +static void ipath_rcv_sma(const ipath_type t, uint32_t tlen, + uint64_t * rc, void *ebuf) +{ + int sindex, slen, elen; + void *smbuf; + uint8_t pad, *bthbytes; + + ipath_stats.sps_sma_rpkts++; /* another SMA packet received */ + + bthbytes = (uint8_t *)((struct ips_message_header_typ *) &rc[1])->bth; + + pad = (bthbytes[1] >> 4) & 3; + elen = tlen - (IPATH_SMA_HDRSZ + pad + (uint32_t) sizeof(uint32_t)); + if (elen > (SMA_MAX_PKTSZ - IPATH_SMA_HDRSZ)) + elen = SMA_MAX_PKTSZ - IPATH_SMA_HDRSZ; + + spin_lock_irq(&ipath_sma_lock); + sindex = ipath_sma_next; + smbuf = ipath_sma_data[sindex].buf; + ipath_sma_data[sindex].unit = t; + slen = ipath_sma_data[ipath_sma_next].len; + memcpy(smbuf, &rc[1], IPATH_SMA_HDRSZ); + memcpy(smbuf + IPATH_SMA_HDRSZ, ebuf, elen); + if (slen) { + /* + * overwriting a yet unread old one (buffer wrap), have to + * advance ipath_sma_first to next oldest + */ + + /* count OK packets that we drop */ + ipath_stats.sps_krdrops++; + if (++ipath_sma_first >= IPATH_NUM_SMAPKTS) + ipath_sma_first = 0; + } + slen = ipath_sma_data[sindex].len = elen + IPATH_SMA_HDRSZ; + if (++ipath_sma_next >= IPATH_NUM_SMAPKTS) + ipath_sma_next = 0; + spin_unlock_irq(&ipath_sma_lock); +} + +/* + * receive a packet for the layered (ethernet) driver. + * Separate routine for better overall optimization + */ +static void ipath_rcv_layer(const ipath_type t, uint32_t etail, + uint32_t tlen, struct ether_header_typ * hdr) +{ + uint32_t elen; + uint8_t pad, *bthbytes; + struct sk_buff *skb; + struct sk_buff *nskb; + struct ipath_devdata *dd = &devdata[t]; + struct ipath_portdata *pd; + unsigned long pa, pent; + uint64_t __iomem *egrbase; + uint64_t lenvalid; /* in words */ + + if (dd->ipath_port0_skbs && hdr->sub_opcode == OPCODE_ENCAP) { + /* + * Allocate a new sk_buff to replace the one we give + * to the network stack. 
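That replacement is the classic receive-ring swap: hand the filled buffer up the stack and immediately put a fresh one in its slot so the hardware never starves, as the allocation just below does. A sketch of the swap alone, with placeholder names (the real code additionally rewrites the hardware eager-ring entry with the new buffer's physical address):

/* Sketch: swap a fresh sk_buff into an eager-ring slot and return
 * the full one for delivery.  Returns NULL on allocation failure,
 * in which case the caller drops the packet and keeps the old skb. */
static struct sk_buff *ring_swap(struct sk_buff **ring, unsigned slot,
				 unsigned bufsize)
{
	struct sk_buff *fresh = dev_alloc_skb(bufsize + 4);
	struct sk_buff *full;

	if (!fresh)
		return NULL;
	skb_reserve(fresh, 4);	/* keep the payload aligned, as above */
	full = ring[slot];
	ring[slot] = fresh;	/* repost before handing 'full' upstream */
	return full;
}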
+ */ + if (!(nskb = dev_alloc_skb(dd->ipath_ibmaxlen + 4))) { + /* count OK packets that we drop */ + ipath_stats.sps_krdrops++; + return; + } + + bthbytes = (uint8_t *) hdr->bth; + pad = (bthbytes[1] >> 4) & 3; + /* +CRC32 */ + elen = tlen - (sizeof(*hdr) + pad + sizeof(uint32_t)); + + skb_reserve(nskb, 4); + + skb = dd->ipath_port0_skbs[etail]; + dd->ipath_port0_skbs[etail] = nskb; + skb_put(skb, elen); + + pd = dd->ipath_pd[0]; + lenvalid = (dd->ipath_ibmaxlen - pd->port_egrskip) >> 2; + lenvalid <<= INFINIPATH_RT_BUFSIZE_SHIFT; + lenvalid |= INFINIPATH_RT_VALID; + pa = virt_to_phys(nskb->data); + pa += pd->port_egrskip; + pent = (pa & INFINIPATH_RT_ADDR_MASK) | lenvalid; + /* This is simplified for port 0 */ + egrbase = (uint64_t __iomem *) + ((char __iomem *)(dd->ipath_kregbase) + + dd->ipath_rcvegrbase); + ipath_kput_memq(t, &egrbase[etail], pent); + + dd->ipath_layer.l_rcv(t, hdr, skb); + + /* another ether packet received */ + ipath_stats.sps_ether_rpkts++; + } else if (hdr->sub_opcode == OPCODE_LID_ARP) { + if (dd->ipath_layer.l_rcv_lid) + dd->ipath_layer.l_rcv_lid(t, hdr); + } + +} + +/* called from interrupt handler for errors or receive interrupt */ +void ipath_kreceive(const ipath_type t) +{ + uint64_t *rc; + void *ebuf; + struct ipath_devdata *dd = &devdata[t]; + const uint32_t rsize = dd->ipath_rcvhdrentsize; /* words */ + const uint32_t maxcnt = dd->ipath_rcvhdrcnt * rsize; /* in words */ + uint32_t etail = -1, l, hdrqtail, sma_this_time = 0; + struct ips_message_header_typ *hdr; + uint32_t eflags, i, etype, tlen, pkttot=0; + static uint64_t totcalls; /* stats, may eventually remove */ + char emsg[128]; + + if (!dd->ipath_hdrqtailptr) { + _IPATH_UNIT_ERROR(t, + "hdrqtailptr not set, can't do receives\n"); + return; + } + + if (test_and_set_bit(0, &dd->ipath_rcv_pending)) { + /* There is already a thread processing this queue. */ + return; + } + + if (dd->ipath_port0head == *dd->ipath_hdrqtailptr) + goto done; + +gotmore: + /* + * read only once at start. If in flood situation, this helps + * performance slightly. If more arrive while we are processing, + * we'll come back here and do them + */ + hdrqtail = *dd->ipath_hdrqtailptr; + + for (i = 0, l = dd->ipath_port0head; l != hdrqtail; i++) { + uint32_t qp; + uint8_t *bthbytes; + + + rc = (uint64_t *) (dd->ipath_pd[0]->port_rcvhdrq + (l << 2)); + hdr = (struct ips_message_header_typ *) & rc[1]; + /* + * could make a network order version of IPATH_KD_QP, and + * do the obvious shift before masking to speed this up. + */ + qp = ntohl(hdr->bth[1]) & 0xffffff; + bthbytes = (uint8_t *) hdr->bth; + + eflags = ips_get_hdr_err_flags((uint32_t*)rc); + etype = ips_get_rcv_type((uint32_t*)rc); + tlen = ips_get_length_in_bytes((uint32_t*)rc); /* total length */ + ebuf = NULL; + if (etype != RCVHQ_RCV_TYPE_EXPECTED) { + /* + * it turns out that the chips uses an eager buffer for + * all non-expected packets, whether it "needs" + * one or not. So always get the index, but + * don't set ebuf (so we try to copy data) + * unless the length requires it. + */ + etail = ips_get_index((uint32_t*)rc); + if (tlen > sizeof(*hdr) + || etype == RCVHQ_RCV_TYPE_NON_KD) { + ebuf = ipath_get_egrbuf(t, etail, 0); + } + } + + /* + * both tiderr and ipathhdrerr are set for all plain IB + * packets; only ipathhdrerr should be set. 
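Throughout the receive path above, protocol fields are pulled straight out of the raw BTH bytes: the opcode from byte 0, the pad count from bits 5:4 of byte 1, and the destination QP from the low 24 bits of the second word. A self-contained sketch of that decoding, matching the expressions in the code:

#include <arpa/inet.h>
#include <stdint.h>

/* Decode the BTH fields used above; 'bth' is the 12-byte IB Base
 * Transport Header in network byte order, viewed as 32-bit words. */
static inline uint8_t bth_opcode(const uint32_t *bth)
{
	return ((const uint8_t *) bth)[0];
}

static inline uint8_t bth_padcnt(const uint32_t *bth)
{
	return (((const uint8_t *) bth)[1] >> 4) & 3;	/* pad bits 5:4 */
}

static inline uint32_t bth_dest_qp(const uint32_t *bth)
{
	return ntohl(bth[1]) & 0xffffff;	/* low 24 bits of word 1 */
}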
+ */ + + if (etype != RCVHQ_RCV_TYPE_NON_KD + && etype != RCVHQ_RCV_TYPE_ERROR + && ips_get_ipath_ver(hdr->iph.ver_port_tid_offset) != + IPS_PROTO_VERSION) { + _IPATH_PDBG("Bad InfiniPath protocol version %x\n", + etype); + } + + if (eflags & + ~(INFINIPATH_RHF_H_TIDERR | INFINIPATH_RHF_H_IHDRERR)) { + get_rhf_errstring(eflags, emsg, sizeof emsg); + _IPATH_PDBG + ("RHFerrs %x hdrqtail=%x typ=%u tlen=%x opcode=%x egridx=%x: %s\n", + eflags, l, etype, tlen, bthbytes[0], + ips_get_index((uint32_t*)rc), emsg); + } else if (etype == RCVHQ_RCV_TYPE_NON_KD) { + /* + * If there is a userland SMA and this is a MAD packet, + * then pass it to the userland SMA. + */ + if (ipath_sma_alive && qp <= 1) { + /* + * count OK packets that we drop because + * SMA isn't yet running, or because we + * are in an sma flood (no point in + * constantly acquiring the spin lock, and + * overwriting previous packets). + * Eventually things will recover. + * Similarly if the sma consumer is + * so far behind that we would overwrite + * (yes, it's outside the lock) + */ + if (!ipath_sma_data_spare || + ipath_sma_data[ipath_sma_next].len || + ++sma_this_time > IPATH_NUM_SMAPKTS) { + ipath_stats.sps_krdrops++; + } else if (ebuf) { + ipath_rcv_sma(t, tlen, rc, ebuf); + } + } else if (dd->verbs_layer.l_rcv) { + dd->verbs_layer.l_rcv(t, rc + 1, ebuf, tlen); + } else { + _IPATH_VDBG("received IB packet, not SMA (QP=%x)\n", + qp); + } + } else if (etype == RCVHQ_RCV_TYPE_EAGER) { + if (qp == IPATH_KD_QP && bthbytes[0] == + dd->ipath_layer.l_rcv_opcode && ebuf) + ipath_rcv_layer(t, etail, tlen, + (struct ether_header_typ *)hdr); + else + _IPATH_PDBG + ("typ %x, opcode %x (eager, qp=%x), len %x; ignored\n", + etype, bthbytes[0], qp, tlen); + } else if (etype == RCVHQ_RCV_TYPE_EXPECTED) { + _IPATH_DBG("Bug: Expected TID, opcode %x; ignored\n", + hdr->bth[0] & 0xff); + } else if (eflags & + (INFINIPATH_RHF_H_TIDERR | INFINIPATH_RHF_H_IHDRERR)) + { + /* + * This is a type 3 packet, only the LRH is in + * the rcvhdrq, the rest of the header is in + * the eager buffer. + */ + uint8_t opcode; + if (ebuf) { + bthbytes = (uint8_t *) ebuf; + opcode = *bthbytes; + } else + opcode = 0; + get_rhf_errstring(eflags, emsg, sizeof emsg); + _IPATH_DBG + ("Err %x (%s), opcode %x, egrbuf %x, len %x\n", + eflags, emsg, opcode, etail, tlen); + } else { + /* + * error packet, type of error unknown. + * Probably type 3, but we don't know, so don't + * even try to print the opcode, etc. + */ + _IPATH_DBG + ("Error Pkt, but no eflags! egrbuf %x, len %x\n" + "hdrq@%lx;hdrq+%x rhf: %llx; hdr %llx %llx %llx %llx %llx\n", + etail, tlen, (unsigned long)rc, l, rc[0], rc[1], + rc[2], rc[3], rc[4], rc[5]); + } + l += rsize; + if (l >= maxcnt) + l = 0; + /* + * update for each packet, to help prevent overflows if we have + * lots of packets. 
+ */ + (void)ipath_kput_ureg(t, ur_rcvhdrhead, l, 0); + if (etype != RCVHQ_RCV_TYPE_EXPECTED) + (void)ipath_kput_ureg(t, ur_rcvegrindexhead, etail, 0); + } + + pkttot += i; + + dd->ipath_port0head = l; + + if (hdrqtail != *dd->ipath_hdrqtailptr) + goto gotmore; /* more arrived while we handled first batch */ + + if (pkttot > ipath_stats.sps_maxpkts_call) + ipath_stats.sps_maxpkts_call = pkttot; + ipath_stats.sps_port0pkts += pkttot; + ipath_stats.sps_avgpkts_call = ipath_stats.sps_port0pkts / ++totcalls; + + if (sma_this_time) /* only once at end, not each time */ + wake_up_interruptible(&ipath_sma_wait); + +done: + clear_bit(0, &dd->ipath_rcv_pending); + smp_mb__after_clear_bit(); +} + +/* + * Update our shadow copy of the PIO availability register map, called + * whenever our local copy indicates we have run out of send buffers + * NOTE: This can be called from interrupt context by ipath_bufavail() + * and from non-interrupt context by ipath_getpiobuf(). + */ + +static void ipath_update_pio_bufs(const ipath_type t) +{ + unsigned long flags; + int i; + const unsigned piobregs = (unsigned)devdata[t].ipath_pioavregs; + + /* If the generation (check) bits have changed, then we update the + * busy bit for the corresponding PIO buffer. This algorithm will + * modify positions to the value they already have in some cases + * (i.e., no change), but it's faster than changing only the bits + * that have changed. + * + * We would like to do this atomicly, to avoid spinlocks in the + * critical send path, but that's not really possible, given the + * type of changes, and that this routine could be called on multiple + * cpu's simultaneously, so we lock in this routine only, to avoid + * conflicting updates; all we change is the shadow, and it's a + * single 64 bit memory location, so by definition the update is + * atomic in terms of what other cpu's can see in testing the + * bits. The spin_lock overhead isn't too bad, since it only + * happens when all buffers are in use, so only cpu overhead, + * not latency or bandwidth is affected. + */ +#define _IPATH_ALL_CHECKBITS 0x5555555555555555ULL + if (!devdata[t].ipath_pioavailregs_dma) { + _IPATH_DBG("Update shadow pioavail, but regs_dma NULL!\n"); + return; + } + if (infinipath_debug & __IPATH_VERBDBG) { + /* only if packet debug and verbose */ + _IPATH_PDBG("Refill avail, dma0=%llx shad0=%llx, " + "d1=%llx s1=%llx, d2=%llx s2=%llx, d3=%llx s3=%llx\n", + devdata[t].ipath_pioavailregs_dma[0], + devdata[t].ipath_pioavailshadow[0], + devdata[t].ipath_pioavailregs_dma[1], + devdata[t].ipath_pioavailshadow[1], + devdata[t].ipath_pioavailregs_dma[2], + devdata[t].ipath_pioavailshadow[2], + devdata[t].ipath_pioavailregs_dma[3], + devdata[t].ipath_pioavailshadow[3]); + if (piobregs > 4) + _IPATH_PDBG("2nd group, dma4=%llx shad4=%llx, " + "d5=%llx s5=%llx, d6=%llx s6=%llx, d7=%llx s7=%llx\n", + devdata[t].ipath_pioavailregs_dma[4], + devdata[t].ipath_pioavailshadow[4], + devdata[t].ipath_pioavailregs_dma[5], + devdata[t].ipath_pioavailshadow[5], + devdata[t].ipath_pioavailregs_dma[6], + devdata[t].ipath_pioavailshadow[6], + devdata[t].ipath_pioavailregs_dma[7], + devdata[t].ipath_pioavailshadow[7]); + } + spin_lock_irqsave(&ipath_pioavail_lock, flags); + for (i = 0; i < piobregs; i++) { + uint64_t pchbusy, pchg, piov, pnew; + /* Chip Errata: bug 6641; even and odd qwords>3 are swapped */ + piov = devdata[t].ipath_pioavailregs_dma[i > 3 ? i ^ 1 : i]; + pchg = + _IPATH_ALL_CHECKBITS & ~(devdata[t]. 
+ ipath_pioavailshadow[i] ^ piov); + pchbusy = pchg << INFINIPATH_SENDPIOAVAIL_BUSY_SHIFT; + if (pchg && (pchbusy & devdata[t].ipath_pioavailshadow[i])) { + pnew = devdata[t].ipath_pioavailshadow[i] & ~pchbusy; + pnew |= piov & pchbusy; + devdata[t].ipath_pioavailshadow[i] = pnew; + } + } + spin_unlock_irqrestore(&ipath_pioavail_lock, flags); +} + +static int ipath_do_user_init(struct ipath_portdata *pd, + struct ipath_user_info __user *uinfo) +{ + int ret = 0; + ipath_type t = pd->port_unit; + struct ipath_devdata *dd = &devdata[t]; + struct ipath_user_info kinfo; + + if (copy_from_user(&kinfo, uinfo, sizeof kinfo)) + ret = -EFAULT; + else { + /* for now, if major version is different, bail */ + if ((kinfo.spu_userversion >> 16) != IPATH_USER_SWMAJOR) { + _IPATH_INFO + ("User major version %d not same as driver major %d\n", + kinfo.spu_userversion >> 16, IPATH_USER_SWMAJOR); + ret = -ENODEV; + } else { + if ((kinfo.spu_userversion & 0xffff) != + IPATH_USER_SWMINOR) + _IPATH_DBG + ("User minor version %d not same as driver minor %d\n", + kinfo.spu_userversion & 0xffff, + IPATH_USER_SWMINOR); + if (kinfo.spu_rcvhdrsize) { + if ((ret = + ipath_setrcvhdrsize(t, + kinfo.spu_rcvhdrsize))) + goto done; + } else if (!dd->ipath_rcvhdrsize) { + /* + * first user of field, kernel or user + * code, and using default + */ + dd->ipath_rcvhdrsize = IPATH_DFLT_RCVHDRSIZE; + ipath_kput_kreg(pd->port_unit, kr_rcvhdrsize, + dd->ipath_rcvhdrsize); + _IPATH_VDBG + ("Use default protocol header size %u\n", + dd->ipath_rcvhdrsize); + } + + pd->port_egrskip = kinfo.spu_egrskip; + if (pd->port_egrskip) { + if (pd->port_egrskip & 3) { + _IPATH_DBG + ("eager skip 0x%x invalid, must be word multiple; using 0x%x\n", + pd->port_egrskip, + pd->port_egrskip & ~3); + pd->port_egrskip &= ~3; + } + _IPATH_DBG + ("user reserves 0x%x bytes at start of eager TIDs\n", + pd->port_egrskip); + } + + /* + * for now we do nothing with rcvhdrcnt: + * kinfo.spu_rcvhdrcnt + */ + + /* + * set up for the rcvhdr Q tail register writeback + * to user memory + */ + if (kinfo.spu_rcvhdraddr && + access_ok(VERIFY_WRITE, + (uint64_t __user *) kinfo.spu_rcvhdraddr, + sizeof(uint64_t))) { + uint64_t physaddr, uaddr, off, atmp; + struct page *pagep; + off = offset_in_page(kinfo.spu_rcvhdraddr); + uaddr = + PAGE_MASK & (unsigned long)kinfo. 
+ spu_rcvhdraddr; + if ((ret = ipath_get_upages_nocopy(uaddr, &pagep))) { + _IPATH_INFO + ("Failed to lookup and lock address %llx for rcvhdrtail: errno %d\n", + kinfo.spu_rcvhdraddr, -ret); + goto done; + } + ipath_stats.sps_pagelocks++; + pd->port_rcvhdrtail_uaddr = uaddr; + pd->port_rcvhdrtail_pagep = pagep; + pd->port_rcvhdrtail_kvaddr = + page_address(pagep); + pd->port_rcvhdrtail_kvaddr += off; + physaddr = page_to_phys(pagep) + off; + _IPATH_VDBG + ("port %d user addr %llx hdrtailaddr, %llx physical (off=%llx)\n", + pd->port_port, kinfo.spu_rcvhdraddr, + physaddr, off); + ipath_kput_kreg_port(t, kr_rcvhdrtailaddr, + pd->port_port, physaddr); + atmp = + ipath_kget_kreg64_port(t, kr_rcvhdrtailaddr, + pd->port_port); + if (physaddr != atmp) { + _IPATH_UNIT_ERROR(t, + "Catastrophic software error, RcvHdrTailAddr%u written as %llx, read back as %llx\n", + pd->port_port, + physaddr, atmp); + ret = -EINVAL; + goto done; + } + } else { + _IPATH_DBG + ("Port %d rcvhdrtail addr %llx not valid\n", + pd->port_port, kinfo.spu_rcvhdraddr); + ret = -EINVAL; + goto done; + } + + /* + * for right now, kernel piobufs are at end, + * so port 1 is at 0 + */ + pd->port_piobufs = dd->ipath_piobufbase + + dd->ipath_pbufsport * (pd->port_port - + 1) * dd->ipath_palign; + _IPATH_VDBG("Set base of piobufs for port %u to 0x%x\n", + pd->port_port, pd->port_piobufs); + + /* + * Now allocate the rcvhdr Q and eager TIDs; + * skip the TID array for time being. + * If pd->port_port > chip-supported, we need + * to do extra stuff here to handle by handling + * overflow through port 0, someday + */ + if (!(ret = ipath_create_rcvhdrq(pd))) + ret = ipath_create_user_egr(pd); + if (!ret) { /* enable receives now */ + uint64_t head; + uint32_t head32; + /* atomically set enable bit for this port */ + atomic_set_mask(1U << + (INFINIPATH_R_PORTENABLE_SHIFT + + pd->port_port), + &dd->ipath_rcvctrl); + + /* + * set the head registers for this port + * to the current values of the tail + * pointers, since we don't know if they + * were updated on last use of the port. + */ + head32 = + ipath_kget_ureg32(t, ur_rcvhdrtail, + pd->port_port); + head = (uint64_t) head32; + ipath_kput_ureg(t, ur_rcvhdrhead, head, + pd->port_port); + head32 = + ipath_kget_ureg32(t, ur_rcvegrindextail, + pd->port_port); + ipath_kput_ureg(t, ur_rcvegrindexhead, head32, + pd->port_port); + dd->ipath_lastegrheads[pd->port_port] = -1; + dd->ipath_lastrcvhdrqtails[pd->port_port] = -1; + _IPATH_VDBG + ("Wrote port%d head %llx, egrhead %x from tail regs\n", + pd->port_port, head, head32); + /* start at beginning after open */ + pd->port_tidcursor = 0; + { + /* + * now enable the port; the tail + * registers will be written to + * memory by the chip as soon + * as it sees the write to + * kr_rcvctrl. The update only + * happens on transition from 0 + * to 1, so clear it first, then + * set it as part of enabling + * the port. This will (very + * briefly) affect any other open + * ports, but it shouldn't be long + * enough to be an issue. 
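The clear-then-set just described is a general trick for re-arming a control bit that the chip latches only on a 0-to-1 edge. A minimal sketch, with hypothetical names (put_kreg() stands in for the driver's register write; the shadow value already has the bit set):

#include <linux/types.h>

/* Sketch: force a rising edge on a bit that may already be high. */
static void pulse_bit(u64 shadow, u64 bit, void (*put_kreg)(u64 val))
{
	put_kreg(shadow & ~bit);	/* drive it low first */
	put_kreg(shadow);		/* 0 -> 1 edge the chip will see */
}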
+ */
+				ipath_kput_kreg(t, kr_rcvctrl,
+						dd->ipath_rcvctrl &
+						~INFINIPATH_R_TAILUPD);
+				ipath_kput_kreg(t, kr_rcvctrl,
+						dd->ipath_rcvctrl);
+			}
+		}
+	}
+
+done:
+	return ret;
+}
+
+static int ipath_get_baseinfo(struct ipath_portdata *pd,
+			      struct ipath_base_info __user *ubase)
+{
+	int ret = 0;
+	struct ipath_base_info kbase;
+	struct ipath_devdata *dd = &devdata[pd->port_unit];
+
+	/* be sure anything we don't set is 0ed */
+	memset(&kbase, 0, sizeof kbase);
+	kbase.spi_rcvhdr_cnt = dd->ipath_rcvhdrcnt;
+	kbase.spi_rcvhdrent_size = dd->ipath_rcvhdrentsize;
+	kbase.spi_tidegrcnt = dd->ipath_rcvegrcnt;
+	kbase.spi_rcv_egrbufsize = dd->ipath_rcvegrbufsize;
+	/* have to mmap whole thing */
+	kbase.spi_rcv_egrbuftotlen = pd->port_rcvegrbuf_chunks *
+		PAGE_SIZE * (1 << pd->port_rcvegrbuf_order);
+	kbase.spi_rcv_egrperchunk = pd->port_rcvegrbufs_perchunk;
+	kbase.spi_rcv_egrchunksize = kbase.spi_rcv_egrbuftotlen /
+		pd->port_rcvegrbuf_chunks;
+	kbase.spi_tidcnt = dd->ipath_rcvtidcnt;
+	/*
+	 * for this use, may be ipath_cfgports summed over all chips
+	 * that are configured and present
+	 */
+	kbase.spi_nports = dd->ipath_cfgports;
+	kbase.spi_unit = pd->port_unit;	/* unit (chip/board) our port is on */
+	/* for now, only a single page */
+	kbase.spi_tid_maxsize = PAGE_SIZE;
+
+	/*
+	 * doing this per port, and based on the skip value, etc.
+	 * This has to be the actual buffer size, since the protocol
+	 * code treats it as an array.
+	 *
+	 * These have to be set to user addresses in the user code via mmap.
+	 * These values are used on return to user code for the mmap target
+	 * addresses only.  For 32 bit, same 44 bit address problem, so use
+	 * the physical address, not virtual.  Before 2.6.11, using the
+	 * page_address() macro worked, but in 2.6.11, even that returns
+	 * the full 64 bit address (upper bits all 1's).
+	 * So far, using the physical addresses (or chip offsets, for
+	 * chip mapping) works, but no doubt some future kernel release
+	 * will change that, and we'll be on to yet another method of
+	 * dealing with this.
+	 */
+	kbase.spi_rcvhdr_base = (uint64_t) pd->port_rcvhdrq_phys;
+	kbase.spi_rcv_egrbufs = (uint64_t) pd->port_rcvegr_phys;
+	kbase.spi_pioavailaddr = (uint64_t) dd->ipath_pioavailregs_phys;
+	kbase.spi_status = (uint64_t) kbase.spi_pioavailaddr +
+		(void *)dd->ipath_statusp - (void *)dd->ipath_pioavailregs_dma;
+	kbase.spi_piobufbase = (uint64_t) pd->port_piobufs;
+	kbase.__spi_uregbase =
+		dd->ipath_uregbase + dd->ipath_palign * pd->port_port;
+
+	kbase.spi_pioindex = dd->ipath_pbufsport * (pd->port_port - 1);
+	kbase.spi_piocnt = dd->ipath_pbufsport;
+	kbase.spi_pioalign = dd->ipath_palign;
+
+	kbase.spi_qpair = IPATH_KD_QP;
+	kbase.spi_piosize = dd->ipath_ibmaxlen;
+	kbase.spi_mtu = dd->ipath_ibmaxlen;	/* maxlen, not ibmtu */
+	kbase.spi_port = pd->port_port;
+	kbase.spi_sw_version = IPATH_KERN_SWVERSION;
+	kbase.spi_hw_version = dd->ipath_revision;
+
+	if (copy_to_user(ubase, &kbase, sizeof kbase))
+		ret = -EFAULT;
+
+	return ret;
+}
+
+/*
+ * return the number of units supported by the driver.  This is
+ * infinipath_max, unless there are no initted units.
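Backing up to ipath_get_baseinfo() above: it follows the standard pattern for exporting a fixed-size info block, i.e. zero the whole struct first so reserved fields never leak kernel stack, fill it in, then copy it out in one shot. A stripped-down sketch; the struct and fields here are placeholders, not the driver's real ABI:

#include <linux/types.h>
#include <linux/string.h>
#include <asm/uaccess.h>

struct demo_info {
	u32 version;
	u32 nports;
	u64 reserved[4];
};

static int demo_get_info(struct demo_info __user *uinfo, u32 ver, u32 np)
{
	struct demo_info ki;

	memset(&ki, 0, sizeof(ki));	/* never leak stack bytes to user */
	ki.version = ver;
	ki.nports = np;
	return copy_to_user(uinfo, &ki, sizeof(ki)) ? -EFAULT : 0;
}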
+ */ +static int ipath_get_units(void) +{ + int i; + + for (i = 0; i < infinipath_max; i++) + if (devdata[i].ipath_flags & IPATH_INITTED) + return infinipath_max; + return 0; +} + +/* write data to the EEPROM on the board */ +static int ipath_wr_eeprom(struct ipath_portdata* pd, + struct ipath_eeprom_req __user *req) +{ + int ret = 0; + struct ipath_eeprom_req kreq; + void *buf = NULL; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; /* not just any old user can write flash */ + if (copy_from_user(&kreq, req, sizeof kreq)) + return -EFAULT; + if (!kreq.addr || (kreq.offset + kreq.len) > 128) { + _IPATH_DBG + ("called with NULL addr %llx, or bad cnt %u or offset %u\n", + kreq.addr, kreq.len, kreq.offset); + return -EINVAL; + } + + if (!(buf = vmalloc(kreq.len))) { + ret = -ENOMEM; + _IPATH_UNIT_ERROR(pd->port_unit, + "Couldn't allocate memory to write %u bytes from eeprom\n", + kreq.len); + goto done; + } + if (copy_from_user(buf, (void __user *) kreq.addr, kreq.len)) { + ret = -EFAULT; + goto done; + } + if (ipath_eeprom_write(pd->port_unit, kreq.offset, buf, kreq.len)) { + ret = -ENXIO; + _IPATH_UNIT_ERROR(pd->port_unit, + "Failed write to eeprom %u bytes offset %u\n", + kreq.len, kreq.offset); + } + +done: + if (buf) + vfree(buf); + return ret; +} + +/* read data from the EEPROM on the board */ +int ipath_rd_eeprom(const ipath_type port_unit, + struct ipath_eeprom_req __user *req) +{ + int ret = 0; + struct ipath_eeprom_req kreq; + void *buf = NULL; + + if (copy_from_user(&kreq, req, sizeof kreq)) + return -EFAULT; + if (!kreq.addr || (kreq.offset + kreq.len) > 128) { + _IPATH_DBG + ("called with NULL addr %llx, or bad cnt %u or offset %u\n", + kreq.addr, kreq.len, kreq.offset); + return -EINVAL; + } + + if (!(buf = vmalloc(kreq.len))) { + ret = -ENOMEM; + _IPATH_UNIT_ERROR(port_unit, + "Couldn't allocate memory to read %u bytes from eeprom\n", + kreq.len); + goto done; + } + if (ipath_eeprom_read(port_unit, kreq.offset, buf, kreq.len)) { + ret = -ENXIO; + _IPATH_UNIT_ERROR(port_unit, + "Failed reading %u bytes offset %u from eeprom\n", + kreq.len, kreq.offset); + } + if (copy_to_user((void __user *) kreq.addr, buf, kreq.len)) + ret = -EFAULT; + +done: + if (buf) + vfree(buf); + return ret; +} + +/* + * wait for something to happen on a port. Currently this is + * PIO buffer available, or a packet being received. For now, at + * least, we wait no longer than 1/2 seconds on rcv, 1 tick on PIO, so + * we recover from any bugs (or, as we see in ips.c init and close, cases + * where other side isn't yet ready). + * NOTE: currently called only with PIO or RCV, never both, so path with both + * has not been tested + */ +static int ipath_wait_intr(struct ipath_portdata * pd, uint32_t flag) +{ + struct ipath_devdata *dd = &devdata[pd->port_unit]; + /* stupid compiler can't tell it's initialized */ + uint32_t im = 0; + uint32_t head, tail, timeo = 0, wflag = 0; + + if (!(flag & (IPATH_WAIT_RCV | IPATH_WAIT_PIO))) + return -EINVAL; + if (flag & IPATH_WAIT_RCV) { + head = flag >> 16; + im = (1U << pd->port_port) << INFINIPATH_R_INTRAVAIL_SHIFT; + atomic_set_mask(im, &dd->ipath_rcvctrl); + /* + * now, before blocking, make sure that head is still == tail, + * reading from the chip, so we can be sure the interrupt enable + * has made it to the chip. If not equal, disable + * interrupt again and return immediately. This avoids + * races, and the overhead of the chip read doesn't + * matter much at this point, since we are waiting for + * something anyway. 
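That enable-then-recheck sequence is the standard way to close the "data arrived while I was arming the interrupt" race: the chip read both re-samples the tail and guarantees the enable has reached the hardware. Distilled to a sketch, with callbacks standing in for the register accesses above:

#include <linux/types.h>

/* Sketch: race-free arm-and-check before sleeping. */
static int should_sleep(u32 head, u32 (*read_tail)(void),
			void (*intr_enable)(int on))
{
	intr_enable(1);			/* arm the interrupt first */
	if (read_tail() != head) {	/* data raced in: don't sleep */
		intr_enable(0);
		return 0;
	}
	/*
	 * The read above also flushed the enable to the chip, so any
	 * packet arriving from here on will raise the interrupt.
	 */
	return 1;			/* caller does the actual wait */
}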
+ */ + ipath_kput_kreg(pd->port_unit, kr_rcvctrl, dd->ipath_rcvctrl); + tail = + ipath_kget_ureg32(pd->port_unit, ur_rcvhdrtail, + pd->port_port); + if (tail == head) { + timeo = HZ / 2; + wflag = IPATH_PORT_WAITING_RCV; + } else { + atomic_clear_mask(im, &dd->ipath_rcvctrl); + ipath_kput_kreg(pd->port_unit, kr_rcvctrl, + dd->ipath_rcvctrl); + } + } + if (flag & IPATH_WAIT_PIO) { + /* + * this one's a bit worse than the receive case, in that we + * can't really verify that at least one interrupt + * will happen... + * We do use a really short timeout, however + */ + timeo = 1; /* if both, the short PIO timeout wins */ + atomic_set_mask(1U << pd->port_port, &dd->ipath_portpiowait); + wflag |= IPATH_PORT_WAITING_PIO; + /* + * this has a possible race with the ipath stuff, so do + * it atomicly + */ + atomic_set_mask(INFINIPATH_S_PIOINTBUFAVAIL, + &dd->ipath_sendctrl); + ipath_kput_kreg(pd->port_unit, kr_sendctrl, dd->ipath_sendctrl); + } + if (wflag) { + pd->port_flag |= wflag; + wait_event_interruptible_timeout(pd->port_wait, + (pd->port_flag & wflag) != + wflag, timeo); + if (wflag & pd->port_flag & IPATH_PORT_WAITING_PIO) { + /* timed out, no PIO interrupts */ + atomic_clear_mask(IPATH_PORT_WAITING_PIO, + &pd->port_flag); + pd->port_piowait_to++; + atomic_clear_mask(1U << pd->port_port, + &dd->ipath_portpiowait); + /* + * *don't* clear the pio interrupt enable; + * let that happen in the interrupt handler; + * else we have a race condition. + */ + } + if (wflag & pd->port_flag & IPATH_PORT_WAITING_RCV) { + /* timed out, no packets received */ + atomic_clear_mask(IPATH_PORT_WAITING_RCV, + &pd->port_flag); + pd->port_rcvwait_to++; + atomic_clear_mask(im, &dd->ipath_rcvctrl); + ipath_kput_kreg(pd->port_unit, kr_rcvctrl, + dd->ipath_rcvctrl); + } + } else { + /* else it's already happened, don't do wait_event overhead */ + if (flag & IPATH_WAIT_RCV) + pd->port_rcvnowait++; + if (flag & IPATH_WAIT_PIO) + pd->port_pionowait++; + } + return 0; +} + +/* + * The new implementation as of Oct 2004 is that the driver assigns + * the tid and returns it to the caller. To make it easier to + * catch bugs, and to reduce search time, we keep a cursor for + * each port, walking the shadow tid array to find one that's not + * in use. + * + * For now, if we can't allocate the full list, we fail, although + * in the long run, we'll allocate as many as we can, and the + * caller will deal with that by trying the remaining pages later. + * That means that when we fail, we have to mark the tids as not in + * use again, in our shadow copy. + * + * It's up to the caller to free the tids when they are done. + * We'll unlock the pages as they free them. + * + * Also, right now we are locking one page at a time, but since + * the intended use of this routine is for a single group of + * virtually contiguous pages, that should change to improve + * performance. 
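Stripped of the hardware specifics, the cursor walk described above is a circular scan of the shadow array for a free slot. A sketch (the real code below interleaves this with page pinning and the chip TID writes):

/* Sketch: find a free TID starting from a per-port cursor, wrapping
 * once around the whole table.  Returns the slot or -1 if full. */
static int find_free_tid(struct page **shadow, unsigned tidcnt,
			 unsigned *cursor)
{
	unsigned tid = *cursor;
	unsigned n;

	for (n = 0; n < tidcnt; n++, tid++) {
		if (tid >= tidcnt)
			tid = 0;
		if (!shadow[tid]) {
			*cursor = tid + 1;	/* resume past it next time */
			return tid;
		}
	}
	return -1;			/* wrapped: all TIDs in use */
}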
+ */ +static int ipath_tid_update(struct ipath_portdata * pd, + struct _tidupd __user *tidu) +{ + int ret = 0, ntids; + uint32_t tid, porttid, cnt, i, tidcnt; + struct _tidupd tu; + uint16_t *tidlist; + struct ipath_devdata *dd = &devdata[pd->port_unit]; + uint64_t vaddr, physaddr, lenvalid; + uint64_t __iomem *tidbase; + uint64_t tidmap[8]; + struct page **pagep = NULL; + + tu.tidcnt = 0; /* for early errors */ + if (!dd->ipath_pageshadow) { + ret = -ENOMEM; + goto done; + } + if (copy_from_user(&tu, tidu, sizeof tu)) { + ret = -EFAULT; + goto done; + } + + if (!(cnt = tu.tidcnt)) { + _IPATH_DBG("After copyin, tidcnt 0, tidlist %llx\n", + tu.tidlist); + /* or should we treat as success? likely a bug */ + ret = -EFAULT; + goto done; + } + tidcnt = dd->ipath_rcvtidcnt; + if (cnt >= tidcnt) { /* make sure it all fits in port_tid_pg_list */ + _IPATH_INFO + ("Process tried to allocate %u TIDs, only trying max (%u)\n", + cnt, tidcnt); + cnt = tidcnt; + } + pagep = (struct page **)pd->port_tid_pg_list; + tidlist = (uint16_t *) (&pagep[cnt]); + + memset(tidmap, 0, sizeof(tidmap)); + tid = pd->port_tidcursor; + /* before decrement; chip actual # */ + porttid = pd->port_port * tidcnt; + ntids = tidcnt; + tidbase = (uint64_t __iomem *) + (((char __iomem *) devdata[pd->port_unit].ipath_kregbase) + + devdata[pd->port_unit].ipath_rcvtidbase + + porttid * sizeof(*tidbase)); + + _IPATH_VDBG("Port%u %u tids, cursor %u, tidbase %p\n", pd->port_port, + cnt, tid, tidbase); + + vaddr = tu.tidvaddr; /* virtual address of first page in transfer */ + if (!access_ok(VERIFY_WRITE, (void __user *) vaddr, cnt * PAGE_SIZE)) { + _IPATH_DBG("Fail vaddr %llx, %u pages, !access_ok\n", + vaddr, cnt); + ret = -EFAULT; + goto done; + } + if ((ret = ipath_get_upages((unsigned long)vaddr, cnt, pagep))) { + if (ret == -EBUSY) { + _IPATH_DBG + ("Failed to lock addr %p, %u pages (already locked)\n", + (void *)vaddr, cnt); + /* + * for now, continue, and see what happens + * but with the new implementation, this should + * never happen, unless perhaps the user has + * mpin'ed the pages themselves (something we + * need to test) + */ + ret = 0; + } else { + _IPATH_INFO + ("Failed to lock addr %p, %u pages: errno %d\n", + (void *)vaddr, cnt, -ret); + goto done; + } + } + for (i = 0; i < cnt; i++, vaddr += PAGE_SIZE) { + for (; ntids--; tid++) { + if (tid == tidcnt) + tid = 0; + if (!dd->ipath_pageshadow[porttid + tid]) + break; + } + if (ntids < 0) { + /* + * oops, wrapped all the way through their TIDs, + * and didn't have enough free; see comments at + * start of routine + */ + _IPATH_DBG + ("Not enough free TIDs for %u pages (index %d), failing\n", + cnt, i); + i--; /* last tidlist[i] not filled in */ + ret = -ENOMEM; + break; + } + tidlist[i] = tid; + _IPATH_VDBG("Updating idx %u to TID %u, vaddr %llx\n", + i, tid, vaddr); + /* for now we "know" system pages and TID pages are same size */ + /* for ipath_free_tid */ + dd->ipath_pageshadow[porttid + tid] = pagep[i]; + __set_bit(tid, tidmap); /* don't need atomic or it's overhead */ + physaddr = page_to_phys(pagep[i]); + ipath_stats.sps_pagelocks++; + _IPATH_VDBG("TID %u, vaddr %llx, physaddr %llx pgp %p\n", + tid, vaddr, physaddr, pagep[i]); + /* + * in words (fixed, full page). could make less for very last + * page in transfer, but for now we won't worry about it. 
+ */ + lenvalid = PAGE_SIZE >> 2; + lenvalid <<= INFINIPATH_RT_BUFSIZE_SHIFT; + physaddr |= lenvalid | INFINIPATH_RT_VALID; + ipath_kput_memq(pd->port_unit, &tidbase[tid], physaddr); + /* + * don't check this tid in ipath_portshadow, since we + * just filled it in; start with the next one. + */ + tid++; + } + + if (ret) { + uint32_t limit; + uint64_t tidval; + /* + * chip errata bug 7358, try to work around it by + * marking invalid tids as having max length + */ + tidval = + (-1LL & INFINIPATH_RT_BUFSIZE_MASK) << + INFINIPATH_RT_BUFSIZE_SHIFT; + cleanup: + /* jump here if copy out of updated info failed... */ + _IPATH_DBG("After failure (ret=%d), undo %d of %d entries\n", + -ret, i, cnt); + /* same code that's in ipath_free_tid() */ + if ((limit = sizeof(tidmap) * BITS_PER_BYTE) > tidcnt) + /* just in case size changes in future */ + limit = tidcnt; + tid = find_first_bit((const unsigned long *)tidmap, limit); + /* + * chip errata bug 7358, try to work around it by + * marking invalid tids as having max length + */ + tidval = + (-1LL & INFINIPATH_RT_BUFSIZE_MASK) << + INFINIPATH_RT_BUFSIZE_SHIFT; + for (; tid < limit; tid++) { + if (!test_bit(tid, tidmap)) + continue; + if (dd->ipath_pageshadow[porttid + tid]) { + _IPATH_VDBG("Freeing TID %u\n", tid); + ipath_kput_memq(pd->port_unit, &tidbase[tid], + tidval); + dd->ipath_pageshadow[porttid + tid] = NULL; + ipath_stats.sps_pageunlocks++; + } + } + ipath_putpages(cnt, pagep); + } else { + /* + * copy the updated array, with ipath_tid's filled in, + * back to user. Since we did the copy in already, this + * "should never fail" + * If it does, we have to clean up... + */ + int r; + if ((r = copy_to_user((void __user *) tu.tidlist, tidlist, + cnt * sizeof(*tidlist)))) { + _IPATH_DBG("Failed to copy out %d TIDs (%lx bytes) " + "to %llx (ret %x)\n", cnt, + cnt * sizeof(*tidlist), tu.tidlist, r); + ret = -EFAULT; + goto cleanup; + } + if (copy_to_user((void __user *) tu.tidmap, tidmap, + sizeof tidmap)) { + _IPATH_DBG("Failed to copy out TID map to %llx\n", + tu.tidmap); + ret = -EFAULT; + goto cleanup; + } + if (tid == tidcnt) + tid = 0; + pd->port_tidcursor = tid; + } + +done: + if (ret) + _IPATH_DBG("Failed to map %u TID pages, failing with %d, " + "tidu %p\n", tu.tidcnt, -ret, tidu); + return ret; +} + +/* + * right now we are unlocking one page at a time, but since + * the intended use of this routine is for a single group of + * virtually contiguous pages, that should change to improve + * performance. We check that the TID is in range for this port + * but otherwise don't check validity; if user has an error and + * frees the wrong tid, it's only their own data that can thereby + * be corrupted. We do check that the TID was in use, for sanity + * We always use our idea of the saved address, not the address that + * they pass in to us. 
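The free path in the function that follows walks the caller's bitmap with find_first_bit()/test_bit(). In isolation the walk looks like this (a sketch; 'limit' is pre-clamped to the real table size, as in the code):

#include <linux/bitops.h>

/* Sketch: visit every set bit in the caller-supplied TID bitmap. */
static unsigned free_marked_tids(const unsigned long *map, unsigned limit,
				 void (*free_one)(unsigned tid))
{
	unsigned tid, cnt = 0;

	for (tid = find_first_bit(map, limit); tid < limit; tid++) {
		if (!test_bit(tid, map))
			continue;
		free_one(tid);
		cnt++;		/* compared against the user's count later */
	}
	return cnt;
}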
+ */ + +static int ipath_tid_free(struct ipath_portdata * pd, + struct _tidupd __user *tidu) +{ + int ret = 0; + uint32_t tid, porttid, cnt, limit, tidcnt; + struct _tidupd tu; + struct ipath_devdata *dd = &devdata[pd->port_unit]; + uint64_t __iomem *tidbase; + uint64_t tidmap[8]; + uint64_t tidval; + + tu.tidcnt = 0; /* for early errors */ + if (!dd->ipath_pageshadow) { + ret = -ENOMEM; + goto done; + } + + if (copy_from_user(&tu, tidu, sizeof tu)) { + _IPATH_DBG("copy of tidupd structure failed\n"); + ret = -EFAULT; + goto done; + } + if (copy_from_user(tidmap, (void __user *) tu.tidmap, sizeof tidmap)) { + _IPATH_DBG("copy of tidmap failed\n"); + ret = -EFAULT; + goto done; + } + + porttid = pd->port_port * dd->ipath_rcvtidcnt; + tidbase = (uint64_t __iomem *) + ((char __iomem *) (devdata[pd->port_unit].ipath_kregbase) + + devdata[pd->port_unit].ipath_rcvtidbase + + porttid * sizeof(*tidbase)); + + tidcnt = dd->ipath_rcvtidcnt; + if ((limit = sizeof(tidmap) * BITS_PER_BYTE) > tidcnt) + limit = tidcnt; /* just in case size changes in future */ + tid = find_first_bit((const unsigned long *)tidmap, limit); + _IPATH_VDBG + ("Port%u free %u tids; first bit (max=%d) set is %d, porttid %u\n", + pd->port_port, tu.tidcnt, limit, tid, porttid); + /* + * chip errata bug 7358, try to work around it by marking invalid + * tids as having max length + */ + tidval = + (-1LL & INFINIPATH_RT_BUFSIZE_MASK) << INFINIPATH_RT_BUFSIZE_SHIFT; + for (cnt = 0; tid < limit; tid++) { + /* + * small optimization; if we detect a run of 3 or so without + * any set, use find_first_bit again. That's mainly to + * accelerate the case where we wrapped, so we have some at + * the beginning, and some at the end, and a big gap + * in the middle. + */ + if (!test_bit(tid, tidmap)) + continue; + cnt++; + if (dd->ipath_pageshadow[porttid + tid]) { + _IPATH_VDBG("Freeing TID %u\n", tid); + ipath_kput_memq(pd->port_unit, &tidbase[tid], tidval); + ipath_putpages(1, &dd->ipath_pageshadow[porttid + tid]); + dd->ipath_pageshadow[porttid + tid] = NULL; + ipath_stats.sps_pageunlocks++; + } else + _IPATH_DBG("Unused tid %u, ignoring\n", tid); + } + if (cnt != tu.tidcnt) + _IPATH_DBG("passed in tidcnt %d, only %d bits set in map\n", + tu.tidcnt, cnt); +done: + if (ret) + _IPATH_DBG("Failed to unmap %u TID pages, failing with %d\n", + tu.tidcnt, -ret); + return ret; +} + +/* called from user init code, and also layered driver init */ +int ipath_setrcvhdrsize(const ipath_type mdev, unsigned rhdrsize) +{ + int ret = 0; + if (devdata[mdev].ipath_flags & IPATH_RCVHDRSZ_SET) { + if (devdata[mdev].ipath_rcvhdrsize != rhdrsize) { + _IPATH_INFO + ("Error: can't set protocol header size %u, already %u\n", + rhdrsize, devdata[mdev].ipath_rcvhdrsize); + ret = -EAGAIN; + } else + /* OK if set already, with same value, nothing to do */ + _IPATH_VDBG("Reuse same protocol header size %u\n", + devdata[mdev].ipath_rcvhdrsize); + } else if (rhdrsize > + (devdata[mdev].ipath_rcvhdrentsize - + (sizeof(uint64_t) / sizeof(uint32_t)))) { + _IPATH_DBG + ("Error: can't set protocol header size %u (> max %u)\n", + rhdrsize, + devdata[mdev].ipath_rcvhdrentsize - + (uint32_t) (sizeof(uint64_t) / sizeof(uint32_t))); + ret = -EOVERFLOW; + } else { + devdata[mdev].ipath_flags |= IPATH_RCVHDRSZ_SET; + devdata[mdev].ipath_rcvhdrsize = rhdrsize; + ipath_kput_kreg(mdev, kr_rcvhdrsize, + devdata[mdev].ipath_rcvhdrsize); + _IPATH_VDBG("Set protocol header size to %u\n", + devdata[mdev].ipath_rcvhdrsize); + } + return ret; +} + + +/* + * find an available pio buffer, and do 
appropriate marking as busy, etc. + * returns buffer number if one found (>=0), negative number is error. + * Used by ipath_send_smapkt and ipath_layer_send + */ +uint32_t __iomem *ipath_getpiobuf(int mdev, uint32_t *pbufnum) +{ + int i, j, starti, updated = 0; + unsigned piobcnt, iter; + unsigned long flags; + struct ipath_devdata *dd = &devdata[mdev]; + uint64_t *shadow = dd->ipath_pioavailshadow; + uint32_t __iomem *buf; + + piobcnt = (unsigned)devdata[mdev].ipath_piobcnt; + starti = devdata[mdev].ipath_lastport_piobuf; + iter = piobcnt - starti; + if (dd->ipath_upd_pio_shadow) { + /* + * minor optimization. If we had no buffers on last call, start out + * by doing the update; continue and do scan even if no buffers + * were updated, to be paranoid + */ + ipath_update_pio_bufs(mdev); + updated = 1; /* we scanned here, don't do it at end of scan */ + i = starti; + } + else + i = devdata[mdev].ipath_lastpioindex; + +rescan: + /* + * while test_and_set_bit() is atomic, + * we do that and then the change_bit(), and the pair is not. + * See if this is the cause of the remaining armlaunch errors. + */ + spin_lock_irqsave(&ipath_pioavail_lock, flags); + for (j = 0; j < iter; j++, i++) { + if (i >= piobcnt) + i = starti; + /* + * To avoid bus lock overhead, we first find a candidate + * buffer, then do the test and set, and continue if that fails. + */ + if (test_bit((2 * i) + 1, shadow) || + test_and_set_bit((2 * i) + 1, shadow)) { + continue; + } + /* flip generation bit */ + change_bit(2 * i, shadow); + break; + } + spin_unlock_irqrestore(&ipath_pioavail_lock, flags); + + if (j == iter) { + /* + * first time through; shadow exhausted, but may be + * real buffers available, so go see; if any updated, rescan (once) + */ + if (!updated) { + ipath_update_pio_bufs(mdev); + updated = 1; + i = starti; + goto rescan; + } + dd->ipath_upd_pio_shadow = 1; + /* not atomic, but if we lose one once in a while, that's OK */ + ipath_stats.sps_nopiobufs++; + if (!(++dd->ipath_consec_nopiobuf % 100000)) { + _IPATH_DBG + ("%u pio sends with no bufavail; dmacopy: %llx %llx %llx %llx; shadow: %llx %llx %llx %llx\n", + dd->ipath_consec_nopiobuf, + dd->ipath_pioavailregs_dma[0], + dd->ipath_pioavailregs_dma[1], + dd->ipath_pioavailregs_dma[2], + dd->ipath_pioavailregs_dma[3], + shadow[0], shadow[1], shadow[2], shadow[3]); + /* + * 4 buffers per byte, 4 registers above, cover + * rest below + */ + if (dd->ipath_piobcnt > (sizeof(shadow[0]) + * 4 * 4)) + _IPATH_DBG + ("2nd group: dmacopy: %llx %llx %llx %llx; shadow: %llx %llx %llx %llx\n", + devdata[mdev].ipath_pioavailregs_dma[4], + devdata[mdev].ipath_pioavailregs_dma[5], + devdata[mdev].ipath_pioavailregs_dma[6], + devdata[mdev].ipath_pioavailregs_dma[7], + shadow[4], shadow[5], shadow[6], shadow[7]); + } + return NULL; + } + + if (updated && devdata[mdev].ipath_layer.l_intr) { + /* + * ran out of bufs, now some (at least this one we just got) + * are now available, so tell the layered driver. + */ + dd->ipath_layer.l_intr(mdev, IPATH_LAYER_INT_SEND_CONTINUE); + } + + /* + * set next starting place. 
Since it's just an optimization, + * it doesn't matter who wins on this, so no locking + */ + dd->ipath_lastpioindex = i + 1; + if (dd->ipath_upd_pio_shadow) + dd->ipath_upd_pio_shadow = 0; + if (dd->ipath_consec_nopiobuf) + dd->ipath_consec_nopiobuf = 0; + buf = (uint32_t __iomem *)(dd->ipath_piobase + i * dd->ipath_palign); + _IPATH_VDBG("Return piobuf %u @ %p\n", i, buf); + if (pbufnum) + *pbufnum = i; + return buf; +} + +/* + * this is like ipath_getpiobuf(), except it just probes to see if a buffer + * is available. If it returns that there is one, it's not allocated, + * and so may not be available if caller tries to send. + * NOTE: This can be called from interrupt context by ipath_intr() + * and from non-interrupt context by layer_send_getpiobuf(). + */ +int ipath_bufavail(int mdev) +{ + int i; + unsigned piobcnt; + uint64_t *shadow = devdata[mdev].ipath_pioavailshadow; + + piobcnt = (unsigned)devdata[mdev].ipath_piobcnt; + + for (i = devdata[mdev].ipath_lastport_piobuf; i < piobcnt; i++) + if (!test_bit((2 * i) + 1, shadow)) + return 1; + + /* if none, check for update and rescan if we updated */ + ipath_update_pio_bufs(mdev); + for (i = devdata[mdev].ipath_lastport_piobuf; i < piobcnt; i++) + if (!test_bit((2 * i) + 1, shadow)) + return 1; + _IPATH_PDBG("No bufs avail\n"); + return 0; +} + +/* + * This routine is no longer on any critical paths; it is used only + * for sending SMA packets, and some diagnostic usage. + * Because it's currently sma only, there are no checks to see if the + * link is up; sma must be able to send in the not fully initialized state + */ +int ipath_send_smapkt(struct ipath_sendpkt __user *upkt) +{ + int i, ret = 0; + uint32_t __iomem *piobuf; + uint32_t plen = 0, clen, pbufn; + struct ipath_sendpkt kpkt; + struct ipath_iovec *iov = kpkt.sps_iov; + ipath_type t; + uint32_t *tmpbuf = NULL; + + if (unlikely((copy_from_user(&kpkt, upkt, sizeof kpkt)))) + ret = -EFAULT; + if (ret) { + _IPATH_VDBG("Send failed: error %d\n", -ret); + goto done; + } + t = kpkt.sps_flags; + if (t >= infinipath_max || !(devdata[t].ipath_flags & IPATH_PRESENT) || + !devdata[t].ipath_kregbase) { + _IPATH_SMADBG("illegal unit %u for sma send\n", t); + return -ENODEV; + } + if (!(devdata[t].ipath_flags & IPATH_INITTED)) { + /* no hardware, freeze, etc. 
*/
+		_IPATH_SMADBG("unit %u not usable\n", t);
+		return -ENODEV;
+	}
+
+	/* need total length before first word written */
+	plen = sizeof(uint32_t);	/* +1 word is for the qword padding */
+	for (i = 0; i < kpkt.sps_cnt; i++)
+		/* each must be dword multiple */
+		plen += kpkt.sps_iov[i].iov_len;
+
+	if ((plen + 4) > devdata[t].ipath_ibmaxlen) {
+		_IPATH_DBG("Pkt len 0x%x > ibmaxlen %x\n",
+			   plen - 4, devdata[t].ipath_ibmaxlen);
+		ret = -EINVAL;
+		goto done;	/* before writing pbc */
+	}
+	if (!(tmpbuf = vmalloc(plen))) {
+		_IPATH_INFO("Unable to allocate tmp buffer, failing\n");
+		ret = -ENOMEM;
+		goto done;
+	}
+	plen >>= 2;	/* in words */
+
+	piobuf = ipath_getpiobuf(t, &pbufn);
+	if (!piobuf) {
+		ret = -EBUSY;
+		devdata[t].ipath_nosma_bufs++;
+		_IPATH_SMADBG("No PIO buffers available unit %u %u times\n",
+			      t, devdata[t].ipath_nosma_bufs);
+		goto done;
+	}
+	if (devdata[t].ipath_nosma_bufs) {
+		_IPATH_SMADBG(
+			"Unit %u got SMA send buffer after %u failures, %u seconds\n",
+			t, devdata[t].ipath_nosma_bufs,
+			devdata[t].ipath_nosma_secs);
+		devdata[t].ipath_nosma_bufs = 0;
+		devdata[t].ipath_nosma_secs = 0;
+	}
+	if ((devdata[t].ipath_lastibcstat & 0x11) != 0x11 &&
+	    (devdata[t].ipath_lastibcstat & 0x21) != 0x21) {
+		/* we need to be at least at INIT for SMA packets to go out.
+		 * If we aren't, something has gone wrong, and SMA hasn't
+		 * noticed.  Therefore we'll try to go to INIT here, in hopes
+		 * of fixing up the problem.  First we verify that indeed the
+		 * state is still "bad" (that is, that lastibcstat isn't
+		 * "stale") */
+		uint64_t val;
+		val = ipath_kget_kreg64(t, kr_ibcstatus);
+		if ((val & 0x11) != 0x11 && (val & 0x21) != 0x21) {
+			_IPATH_SMADBG("Invalid Link state 0x%llx unit %u for send, try INIT\n",
+				      val, t);
+			ipath_set_ib_lstate(t, INFINIPATH_IBCC_LINKCMD_INIT);
+			val = ipath_kget_kreg64(t, kr_ibcstatus);
+			if ((val & 0x11) != 0x11 && (val & 0x21) != 0x21)
+				_IPATH_SMADBG("Link state still not OK unit %u (0x%llx) after INIT\n",
+					      t, val);
+			else
+				_IPATH_SMADBG("Link state OK unit %u (0x%llx) after INIT\n",
+					      t, val);
+		}
+		/* and continue, regardless */
+	}
+
+	if (infinipath_debug & __IPATH_PKTDBG)	/* SMA and PKT, both */
+		_IPATH_SMADBG("unit %u 0x%x+1w pio%d, (scnt %d)\n",
+			      t, plen - 1, pbufn, kpkt.sps_cnt);
+
+	/* we have to flush after the PBC for correctness on some cpus,
+	 * or the WC buffer can be written out of order */
+	writeq(plen, piobuf);
+	mb();
+	ret = 0;
+	for (clen = i = 0; i < kpkt.sps_cnt; i++) {
+		if (unlikely(copy_from_user(tmpbuf + clen,
+					    (void __user *) iov->iov_base,
+					    iov->iov_len)))
+			ret = -EFAULT;	/* no break */
+		clen += iov->iov_len >> 2;
+		iov++;
+	}
+	/* copy all but the trigger word, then flush, so it's written
+	 * to the chip before the trigger word, then write the trigger
+	 * word, then flush again, so the packet is sent. */
+	memcpy_toio32(piobuf + 2, tmpbuf, clen - 1);
+	mb();
+	writel(tmpbuf[clen - 1], piobuf + clen + 1);
+	mb();
+
+	if (ret) {
+		/*
+		 * Packet is bad, so we need to use the PIO abort mechanism
+		 * to abort the packet
+		 */
+		uint32_t sendctrl;
+		sendctrl = devdata[t].ipath_sendctrl | INFINIPATH_S_DISARM |
+			(pbufn << INFINIPATH_S_DISARMPIOBUF_SHIFT);
+		_IPATH_DBG("Doing PIO abort on buffer %u after error\n",
+			   pbufn);
+		ipath_kput_kreg(t, kr_sendctrl, sendctrl);
+	}
+
+done:
+	vfree(tmpbuf);
+	return ret;
+}
+
+/*
+ * implementation of the ioctl to get the counter values from the chip.
+ * For the time being, we get all of them when asked, no shadowing.
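An aside on the send path just above, before the counters comment continues: the barriers exist because the PIO buffer is write-combining, so the PBC, the payload, and the trigger (last) word must each be flushed in order or the chip can see them out of sequence. The discipline, condensed into a sketch (memcpy_toio32() is the driver's 32-bit MMIO copy helper, used as above):

/* Sketch: ordered PIO send.  The trigger word must reach the chip
 * strictly after everything else. */
static void pio_send_ordered(u32 __iomem *piobuf, u64 pbc,
			     const u32 *payload, u32 nwords)
{
	writeq(pbc, piobuf);	/* PBC (length/control) first */
	mb();			/* flush PBC before payload */
	memcpy_toio32(piobuf + 2, payload, nwords - 1);
	mb();			/* payload before trigger word */
	writel(payload[nwords - 1], piobuf + nwords + 1);
	mb();			/* push the trigger out of the WC buffer */
}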
+ * We need to shadow the byte counters at a minimum, because otherwise + * they will wrap in just a few seconds at full bandwidth + * The second argument is the user address to which we do the copy_to_user() + */ +static int ipath_get_counters(ipath_type t, + struct infinipath_counters __user *ucounters) +{ + int ret = 0; + uint64_t val; + uint64_t __user *ucreg; + uint16_t vcreg; + + ucreg = (uint64_t __user *) ucounters; + /* + * for now, let's do this one at a time. It's not the most + * optimal method, but it is simple, and has no intermediate + * memory requirements. + */ + for (vcreg = 0; + vcreg < (sizeof(struct infinipath_counters) / sizeof(val)); + vcreg++, ucreg++) { + ipath_creg creg = vcreg; + val = ipath_snap_cntr(t, creg); + if ((ret = copy_to_user(ucreg, &val, sizeof(val)))) { + _IPATH_DBG("copy_to_user error on counter %d\n", creg); + ret = -EFAULT; + break; + } + } + + return ret; +} From bos at pathscale.com Wed Dec 28 16:31:34 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 28 Dec 2005 16:31:34 -0800 Subject: [openib-general] [PATCH 15 of 20] ipath - infiniband verbs support, part 1 of 3 In-Reply-To: Message-ID: <471b7a7a005c6ff26e1f.1135816294@eng-12.pathscale.com> Signed-off-by: Bryan O'Sullivan diff -r 26993cb5faee -r 471b7a7a005c drivers/infiniband/hw/ipath/ipath_verbs.c --- /dev/null Thu Jan 1 00:00:00 1970 +0000 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Wed Dec 28 14:19:43 2005 -0800 @@ -0,0 +1,2307 @@ +/* + * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#include "ips_common.h" +#include "ipath_layer.h" +#include "ipath_verbs.h" + +/* + * Compare the lower 24 bits of the two values. + * Returns an integer <, ==, or > than zero. 
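The shift-by-8 in cmp24() below turns a 24-bit modular difference into an ordinary signed comparison: any wraparound falls off the top of the 32-bit value. A self-contained userspace check of the idea (written with unsigned arithmetic to sidestep signed-shift pitfalls; ordering is only meaningful when the PSNs are within 2^23 of each other):

#include <assert.h>
#include <stdint.h>

/* Same idea as cmp24() below: compare 24-bit PSNs modulo 2^24. */
static int cmp24_demo(uint32_t a, uint32_t b)
{
	return (int32_t)((a - b) << 8);
}

int main(void)
{
	assert(cmp24_demo(0x000001, 0xFFFFFF) > 0);	/* 1 follows 0xFFFFFF */
	assert(cmp24_demo(0x000005, 0x000009) < 0);
	assert(cmp24_demo(0x000123, 0x000123) == 0);
	return 0;
}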
+ */ +static inline int cmp24(u32 a, u32 b) +{ + return (((int) a) - ((int) b)) << 8; +} + +#define MODNAME "ib_ipath" +#define DRIVER_LOAD_MSG "PathScale " MODNAME " loaded: " +#define PFX MODNAME ": " + + +#define BITS_PER_PAGE (PAGE_SIZE*BITS_PER_BYTE) +#define BITS_PER_PAGE_MASK (BITS_PER_PAGE-1) +#define mk_qpn(qpt, map, off) (((map) - (qpt)->map)*BITS_PER_PAGE + (off)) +#define find_next_offset(map, off) \ + find_next_zero_bit((map)->page, BITS_PER_PAGE, off) + +/* Not static, because we don't want the compiler removing it */ +const char ipath_verbs_version[] = "ipath_verbs " IPATH_IDSTR; + +unsigned int ib_ipath_qp_table_size = 251; +module_param(ib_ipath_qp_table_size, uint, 0444); +MODULE_PARM_DESC(ib_ipath_qp_table_size, "QP table size"); + +unsigned int ib_ipath_lkey_table_size = 12; +module_param(ib_ipath_lkey_table_size, uint, 0444); +MODULE_PARM_DESC(ib_ipath_lkey_table_size, + "LKEY table size in bits (2^n, 1 <= n <= 23)"); + +unsigned int ib_ipath_debug; /* debug mask */ +module_param(ib_ipath_debug, uint, 0644); +MODULE_PARM_DESC(ib_ipath_debug, "Verbs debug mask"); + + +static void ipath_ud_loopback(struct ipath_qp *sqp, struct ipath_sge_state *ss, + u32 len, struct ib_send_wr *wr, struct ib_wc *wc); +static void ipath_ruc_loopback(struct ipath_qp *sqp, struct ib_wc *wc); +static int ipath_destroy_qp(struct ib_qp *ibqp); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("PathScale "); +MODULE_DESCRIPTION("Pathscale InfiniPath driver"); + +enum { + IPATH_FAULT_RC_DROP_SEND_F = 1, + IPATH_FAULT_RC_DROP_SEND_M, + IPATH_FAULT_RC_DROP_SEND_L, + IPATH_FAULT_RC_DROP_SEND_O, + IPATH_FAULT_RC_DROP_RDMA_WRITE_F, + IPATH_FAULT_RC_DROP_RDMA_WRITE_M, + IPATH_FAULT_RC_DROP_RDMA_WRITE_L, + IPATH_FAULT_RC_DROP_RDMA_WRITE_O, + IPATH_FAULT_RC_DROP_RDMA_READ_RESP_F, + IPATH_FAULT_RC_DROP_RDMA_READ_RESP_M, + IPATH_FAULT_RC_DROP_RDMA_READ_RESP_L, + IPATH_FAULT_RC_DROP_RDMA_READ_RESP_O, + IPATH_FAULT_RC_DROP_ACK, +}; + +enum { + IPATH_TRANS_INVALID = 0, + IPATH_TRANS_ANY2RST, + IPATH_TRANS_RST2INIT, + IPATH_TRANS_INIT2INIT, + IPATH_TRANS_INIT2RTR, + IPATH_TRANS_RTR2RTS, + IPATH_TRANS_RTS2RTS, + IPATH_TRANS_SQERR2RTS, + IPATH_TRANS_ANY2ERR, + IPATH_TRANS_RTS2SQD, /* XXX Wait for expected ACKs & signal event */ + IPATH_TRANS_SQD2SQD, /* error if not drained & parameter change */ + IPATH_TRANS_SQD2RTS, /* error if not drained */ +}; + +enum { + IPATH_POST_SEND_OK = 0x0001, + IPATH_POST_RECV_OK = 0x0002, + IPATH_PROCESS_RECV_OK = 0x0004, + IPATH_PROCESS_SEND_OK = 0x0008, +}; + +static int state_ops[IB_QPS_ERR + 1] = { + [IB_QPS_RESET] = 0, + [IB_QPS_INIT] = IPATH_POST_RECV_OK, + [IB_QPS_RTR] = IPATH_POST_RECV_OK | IPATH_PROCESS_RECV_OK, + [IB_QPS_RTS] = IPATH_POST_RECV_OK | IPATH_PROCESS_RECV_OK | + IPATH_POST_SEND_OK | IPATH_PROCESS_SEND_OK, + [IB_QPS_SQD] = IPATH_POST_RECV_OK | IPATH_PROCESS_RECV_OK | + IPATH_POST_SEND_OK, + [IB_QPS_SQE] = IPATH_POST_RECV_OK | IPATH_PROCESS_RECV_OK, + [IB_QPS_ERR] = 0, +}; + +/* + * Convert the AETH credit code into the number of credits. 
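The table that follows is piecewise exponential: five bits of AETH code span 0 through 32768 credits. To show the intent, here is a hypothetical inverse (the driver itself only decodes), picking the largest code whose value does not exceed a requested credit count, assuming the ascending 31-entry table below:

/* Sketch: credits -> AETH credit code, given credit_table[] below. */
static u32 credits_to_code(const u32 *table, u32 credits)
{
	u32 code = 0;

	while (code < 30 && table[code + 1] <= credits)
		code++;
	return code;	/* 0x00 .. 0x1E */
}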
+ */ +static u32 credit_table[31] = { + 0, /* 0 */ + 1, /* 1 */ + 2, /* 2 */ + 3, /* 3 */ + 4, /* 4 */ + 6, /* 5 */ + 8, /* 6 */ + 12, /* 7 */ + 16, /* 8 */ + 24, /* 9 */ + 32, /* A */ + 48, /* B */ + 64, /* C */ + 96, /* D */ + 128, /* E */ + 192, /* F */ + 256, /* 10 */ + 384, /* 11 */ + 512, /* 12 */ + 768, /* 13 */ + 1024, /* 14 */ + 1536, /* 15 */ + 2048, /* 16 */ + 3072, /* 17 */ + 4096, /* 18 */ + 6144, /* 19 */ + 8192, /* 1A */ + 12288, /* 1B */ + 16384, /* 1C */ + 24576, /* 1D */ + 32768 /* 1E */ +}; + +/* + * Convert the AETH RNR timeout code into the number of milliseconds. + */ +static u32 rnr_table[32] = { + 656, /* 0 */ + 1, /* 1 */ + 1, /* 2 */ + 1, /* 3 */ + 1, /* 4 */ + 1, /* 5 */ + 1, /* 6 */ + 1, /* 7 */ + 1, /* 8 */ + 1, /* 9 */ + 1, /* A */ + 1, /* B */ + 1, /* C */ + 1, /* D */ + 2, /* E */ + 2, /* F */ + 3, /* 10 */ + 4, /* 11 */ + 6, /* 12 */ + 8, /* 13 */ + 11, /* 14 */ + 16, /* 15 */ + 21, /* 16 */ + 31, /* 17 */ + 41, /* 18 */ + 62, /* 19 */ + 82, /* 1A */ + 123, /* 1B */ + 164, /* 1C */ + 246, /* 1D */ + 328, /* 1E */ + 492 /* 1F */ +}; + +/* + * Translate ib_wr_opcode into ib_wc_opcode. + */ +static enum ib_wc_opcode wc_opcode[] = { + [IB_WR_RDMA_WRITE] = IB_WC_RDMA_WRITE, + [IB_WR_RDMA_WRITE_WITH_IMM] = IB_WC_RDMA_WRITE, + [IB_WR_SEND] = IB_WC_SEND, + [IB_WR_SEND_WITH_IMM] = IB_WC_SEND, + [IB_WR_RDMA_READ] = IB_WC_RDMA_READ, + [IB_WR_ATOMIC_CMP_AND_SWP] = IB_WC_COMP_SWAP, + [IB_WR_ATOMIC_FETCH_AND_ADD] = IB_WC_FETCH_ADD +}; + +/* + * Array of device pointers. + */ +static uint32_t number_of_devices; +static struct ipath_ibdev **ipath_devices; + +/* + * Global table of GID to attached QPs. + * The table is global to all ipath devices since a send from one QP/device + * needs to be locally routed to any locally attached QPs on the same + * or different device. + */ +static struct rb_root mcast_tree; +static spinlock_t mcast_lock = SPIN_LOCK_UNLOCKED; + +/* + * Allocate a structure to link a QP to the multicast GID structure. + */ +static struct ipath_mcast_qp *ipath_mcast_qp_alloc(struct ipath_qp *qp) +{ + struct ipath_mcast_qp *mqp; + + mqp = kmalloc(sizeof(*mqp), GFP_KERNEL); + if (!mqp) + return NULL; + + mqp->qp = qp; + atomic_inc(&qp->refcount); + + return mqp; +} + +static void ipath_mcast_qp_free(struct ipath_mcast_qp *mqp) +{ + struct ipath_qp *qp = mqp->qp; + + /* Notify ipath_destroy_qp() if it is waiting. */ + if (atomic_dec_and_test(&qp->refcount)) + wake_up(&qp->wait); + + kfree(mqp); +} + +/* + * Allocate a structure for the multicast GID. + * A list of QPs will be attached to this structure. + */ +static struct ipath_mcast *ipath_mcast_alloc(union ib_gid *mgid) +{ + struct ipath_mcast *mcast; + + mcast = kmalloc(sizeof(*mcast), GFP_KERNEL); + if (!mcast) + return NULL; + + mcast->mgid = *mgid; + INIT_LIST_HEAD(&mcast->qp_list); + init_waitqueue_head(&mcast->wait); + atomic_set(&mcast->refcount, 0); + + return mcast; +} + +static void ipath_mcast_free(struct ipath_mcast *mcast) +{ + struct ipath_mcast_qp *p, *tmp; + + list_for_each_entry_safe(p, tmp, &mcast->qp_list, list) + ipath_mcast_qp_free(p); + + kfree(mcast); +} + +/* + * Search the global table for the given multicast GID. + * Return it or NULL if not found. + * The caller is responsible for decrementing the reference count if found. 
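Since the lookup that follows returns with the reference count already raised, every caller owes a matching drop once it is done with the group. One plausible usage pattern, assuming the wait-queue protocol used by the detach path above (deliver_to_qps() is a placeholder):

/* Sketch: balanced use of a looked-up mcast group. */
static void route_mcast_packet(union ib_gid *mgid, void *packet)
{
	struct ipath_mcast *mcast = ipath_mcast_find(mgid);

	if (!mcast)
		return;				/* no local receivers */
	deliver_to_qps(mcast, packet);		/* ref held: list stable */
	atomic_dec(&mcast->refcount);
	wake_up(&mcast->wait);			/* detach may be waiting */
}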
+ */ +static struct ipath_mcast *ipath_mcast_find(union ib_gid *mgid) +{ + struct rb_node *n; + unsigned long flags; + + spin_lock_irqsave(&mcast_lock, flags); + n = mcast_tree.rb_node; + while (n) { + struct ipath_mcast *mcast; + int ret; + + mcast = rb_entry(n, struct ipath_mcast, rb_node); + + ret = memcmp(mgid->raw, mcast->mgid.raw, sizeof(union ib_gid)); + if (ret < 0) + n = n->rb_left; + else if (ret > 0) + n = n->rb_right; + else { + atomic_inc(&mcast->refcount); + spin_unlock_irqrestore(&mcast_lock, flags); + return mcast; + } + } + spin_unlock_irqrestore(&mcast_lock, flags); + + return NULL; +} + +/* + * Insert the multicast GID into the table and + * attach the QP structure. + * Return zero if both were added. + * Return EEXIST if the GID was already in the table but the QP was added. + * Return ESRCH if the QP was already attached and neither structure was added. + */ +static int ipath_mcast_add(struct ipath_mcast *mcast, + struct ipath_mcast_qp *mqp) +{ + struct rb_node **n = &mcast_tree.rb_node; + struct rb_node *pn = NULL; + unsigned long flags; + + spin_lock_irqsave(&mcast_lock, flags); + + while (*n) { + struct ipath_mcast *tmcast; + struct ipath_mcast_qp *p; + int ret; + + pn = *n; + tmcast = rb_entry(pn, struct ipath_mcast, rb_node); + + ret = memcmp(mcast->mgid.raw, tmcast->mgid.raw, + sizeof(union ib_gid)); + if (ret < 0) { + n = &pn->rb_left; + continue; + } + if (ret > 0) { + n = &pn->rb_right; + continue; + } + + /* Search the QP list to see if this is already there. */ + list_for_each_entry_rcu(p, &tmcast->qp_list, list) { + if (p->qp == mqp->qp) { + spin_unlock_irqrestore(&mcast_lock, flags); + return ESRCH; + } + } + list_add_tail_rcu(&mqp->list, &tmcast->qp_list); + spin_unlock_irqrestore(&mcast_lock, flags); + return EEXIST; + } + + list_add_tail_rcu(&mqp->list, &mcast->qp_list); + + atomic_inc(&mcast->refcount); + rb_link_node(&mcast->rb_node, pn, n); + rb_insert_color(&mcast->rb_node, &mcast_tree); + + spin_unlock_irqrestore(&mcast_lock, flags); + + return 0; +} + +static int ipath_multicast_attach(struct ib_qp *ibqp, union ib_gid *gid, + u16 lid) +{ + struct ipath_qp *qp = to_iqp(ibqp); + struct ipath_mcast *mcast; + struct ipath_mcast_qp *mqp; + + /* + * Allocate data structures since its better to do this outside of + * spin locks and it will most likely be needed. + */ + mcast = ipath_mcast_alloc(gid); + if (mcast == NULL) + return -ENOMEM; + mqp = ipath_mcast_qp_alloc(qp); + if (mqp == NULL) { + ipath_mcast_free(mcast); + return -ENOMEM; + } + switch (ipath_mcast_add(mcast, mqp)) { + case ESRCH: + /* Neither was used: can't attach the same QP twice. */ + ipath_mcast_qp_free(mqp); + ipath_mcast_free(mcast); + return -EINVAL; + case EEXIST: /* The mcast wasn't used */ + ipath_mcast_free(mcast); + break; + default: + break; + } + return 0; +} + +static int ipath_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid, + u16 lid) +{ + struct ipath_qp *qp = to_iqp(ibqp); + struct ipath_mcast *mcast = NULL; + struct ipath_mcast_qp *p, *tmp; + struct rb_node *n; + unsigned long flags; + int last = 0; + + spin_lock_irqsave(&mcast_lock, flags); + + /* Find the GID in the mcast table. */ + n = mcast_tree.rb_node; + while (1) { + int ret; + + if (n == NULL) { + spin_unlock_irqrestore(&mcast_lock, flags); + return 0; + } + + mcast = rb_entry(n, struct ipath_mcast, rb_node); + ret = memcmp(gid->raw, mcast->mgid.raw, sizeof(union ib_gid)); + if (ret < 0) + n = n->rb_left; + else if (ret > 0) + n = n->rb_right; + else + break; + } + + /* Search the QP list. 
*/ + list_for_each_entry_safe(p, tmp, &mcast->qp_list, list) { + if (p->qp != qp) + continue; + /* + * We found it, so remove it, but don't poison the forward link + * until we are sure there are no list walkers. + */ + list_del_rcu(&p->list); + + /* If this was the last attached QP, remove the GID too. */ + if (list_empty(&mcast->qp_list)) { + rb_erase(&mcast->rb_node, &mcast_tree); + last = 1; + } + break; + } + + spin_unlock_irqrestore(&mcast_lock, flags); + + if (p) { + /* + * Wait for any list walkers to finish before freeing the + * list element. + */ + wait_event(mcast->wait, atomic_read(&mcast->refcount) <= 1); + ipath_mcast_qp_free(p); + } + if (last) { + atomic_dec(&mcast->refcount); + wait_event(mcast->wait, !atomic_read(&mcast->refcount)); + ipath_mcast_free(mcast); + } + + return 0; +} + +/* + * Copy data to SGE memory. + */ +static void copy_sge(struct ipath_sge_state *ss, void *data, u32 length) +{ + struct ipath_sge *sge = &ss->sge; + + while (length) { + u32 len = sge->length; + + BUG_ON(len == 0); + if (len > length) + len = length; + memcpy(sge->vaddr, data, len); + sge->vaddr += len; + sge->length -= len; + sge->sge_length -= len; + if (sge->sge_length == 0) { + if (--ss->num_sge) + *sge = *ss->sg_list++; + } else if (sge->length == 0 && sge->mr != NULL) { + if (++sge->n >= IPATH_SEGSZ) { + if (++sge->m >= sge->mr->mapsz) + break; + sge->n = 0; + } + sge->vaddr = sge->mr->map[sge->m]->segs[sge->n].vaddr; + sge->length = sge->mr->map[sge->m]->segs[sge->n].length; + } + data += len; + length -= len; + } +} + +/* + * Skip over length bytes of SGE memory. + */ +static void skip_sge(struct ipath_sge_state *ss, u32 length) +{ + struct ipath_sge *sge = &ss->sge; + + while (length > sge->sge_length) { + length -= sge->sge_length; + ss->sge = *ss->sg_list++; + } + while (length) { + u32 len = sge->length; + + BUG_ON(len == 0); + if (len > length) + len = length; + sge->vaddr += len; + sge->length -= len; + sge->sge_length -= len; + if (sge->sge_length == 0) { + if (--ss->num_sge) + *sge = *ss->sg_list++; + } else if (sge->length == 0 && sge->mr != NULL) { + if (++sge->n >= IPATH_SEGSZ) { + if (++sge->m >= sge->mr->mapsz) + break; + sge->n = 0; + } + sge->vaddr = sge->mr->map[sge->m]->segs[sge->n].vaddr; + sge->length = sge->mr->map[sge->m]->segs[sge->n].length; + } + length -= len; + } +} + +static inline u32 alloc_qpn(struct ipath_qp_table *qpt) +{ + u32 i, offset, max_scan, qpn; + struct qpn_map *map; + + qpn = qpt->last + 1; + if (qpn >= QPN_MAX) + qpn = 2; + offset = qpn & BITS_PER_PAGE_MASK; + map = &qpt->map[qpn / BITS_PER_PAGE]; + max_scan = qpt->nmaps - !offset; + for (i = 0;;) { + if (unlikely(!map->page)) { + unsigned long page = get_zeroed_page(GFP_KERNEL); + unsigned long flags; + + /* + * Free the page if someone raced with us + * installing it: + */ + spin_lock_irqsave(&qpt->lock, flags); + if (map->page) + free_page(page); + else + map->page = (void *)page; + spin_unlock_irqrestore(&qpt->lock, flags); + if (unlikely(!map->page)) + break; + } + if (likely(atomic_read(&map->n_free))) { + do { + if (!test_and_set_bit(offset, map->page)) { + atomic_dec(&map->n_free); + qpt->last = qpn; + return qpn; + } + offset = find_next_offset(map, offset); + qpn = mk_qpn(qpt, map, offset); + /* + * This test differs from alloc_pidmap(). + * If find_next_offset() does find a zero bit, + * we don't need to check for QPN wrapping + * around past our starting QPN. We + * just need to be sure we don't loop forever. 
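+ *
+ * Bitmap-to-QPN mapping (sketch; mk_qpn() is defined elsewhere in
+ * the driver but presumably inverts the lookup above):
+ *
+ *	qpn = (map - qpt->map) * BITS_PER_PAGE + offset
+ *
+ * so advancing "offset" walks QPNs within one bitmap page and
+ * advancing "map" jumps BITS_PER_PAGE QPNs at a time.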
+ */ + } while (offset < BITS_PER_PAGE && qpn < QPN_MAX); + } + /* + * In order to keep the number of pages allocated to a minimum, + * we scan the all existing pages before increasing the size + * of the bitmap table. + */ + if (++i > max_scan) { + if (qpt->nmaps == QPNMAP_ENTRIES) + break; + map = &qpt->map[qpt->nmaps++]; + offset = 0; + } else if (map < &qpt->map[qpt->nmaps]) { + ++map; + offset = 0; + } else { + map = &qpt->map[0]; + offset = 2; + } + qpn = mk_qpn(qpt, map, offset); + } + return 0; +} + +static inline void free_qpn(struct ipath_qp_table *qpt, u32 qpn) +{ + struct qpn_map *map; + + map = qpt->map + qpn / BITS_PER_PAGE; + if (map->page) + clear_bit(qpn & BITS_PER_PAGE_MASK, map->page); + atomic_inc(&map->n_free); +} + +/* + * Allocate the next available QPN and put the QP into the hash table. + * The hash table holds a reference to the QP. + */ +static int ipath_alloc_qpn(struct ipath_qp_table *qpt, struct ipath_qp *qp, + enum ib_qp_type type) +{ + unsigned long flags; + u32 qpn; + + if (type == IB_QPT_SMI) + qpn = 0; + else if (type == IB_QPT_GSI) + qpn = 1; + else { + /* Allocate the next available QPN */ + qpn = alloc_qpn(qpt); + if (qpn == 0) { + return -ENOMEM; + } + } + qp->ibqp.qp_num = qpn; + + /* Add the QP to the hash table. */ + spin_lock_irqsave(&qpt->lock, flags); + + qpn %= qpt->max; + qp->next = qpt->table[qpn]; + qpt->table[qpn] = qp; + atomic_inc(&qp->refcount); + + spin_unlock_irqrestore(&qpt->lock, flags); + return 0; +} + +/* + * Remove the QP from the table so it can't be found asynchronously by + * the receive interrupt routine. + */ +static void ipath_free_qp(struct ipath_qp_table *qpt, struct ipath_qp *qp) +{ + struct ipath_qp *q, **qpp; + unsigned long flags; + int fnd = 0; + + spin_lock_irqsave(&qpt->lock, flags); + + /* Remove QP from the hash table. */ + qpp = &qpt->table[qp->ibqp.qp_num % qpt->max]; + for (; (q = *qpp) != NULL; qpp = &q->next) { + if (q == qp) { + *qpp = qp->next; + qp->next = NULL; + atomic_dec(&qp->refcount); + fnd = 1; + break; + } + } + + spin_unlock_irqrestore(&qpt->lock, flags); + + if (!fnd) + return; + + /* If QPN is not reserved, mark QPN free in the bitmap. */ + if (qp->ibqp.qp_num > 1) + free_qpn(qpt, qp->ibqp.qp_num); + + wait_event(qp->wait, !atomic_read(&qp->refcount)); +} + +/* + * Remove all QPs from the table. + */ +static void ipath_free_all_qps(struct ipath_qp_table *qpt) +{ + unsigned long flags; + struct ipath_qp *qp, *nqp; + u32 n; + + for (n = 0; n < qpt->max; n++) { + spin_lock_irqsave(&qpt->lock, flags); + qp = qpt->table[n]; + qpt->table[n] = NULL; + spin_unlock_irqrestore(&qpt->lock, flags); + + while (qp) { + nqp = qp->next; + if (qp->ibqp.qp_num > 1) + free_qpn(qpt, qp->ibqp.qp_num); + if (!atomic_dec_and_test(&qp->refcount) || + !ipath_destroy_qp(&qp->ibqp)) + _VERBS_INFO("QP memory leak!\n"); + qp = nqp; + } + } + + for (n = 0; n < ARRAY_SIZE(qpt->map); n++) { + if (qpt->map[n].page) + free_page((unsigned long)qpt->map[n].page); + } +} + +/* + * Return the QP with the given QPN. + * The caller is responsible for decrementing the QP reference count when done. 
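+ *
+ * Illustrative caller pattern (sketch only), matching the
+ * wait_event() on qp->wait in ipath_free_qp():
+ *
+ *	qp = ipath_lookup_qpn(qpt, qpn);
+ *	if (qp != NULL) {
+ *		... process the packet ...
+ *		if (atomic_dec_and_test(&qp->refcount))
+ *			wake_up(&qp->wait);
+ *	}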
+ */ +static struct ipath_qp *ipath_lookup_qpn(struct ipath_qp_table *qpt, u32 qpn) +{ + unsigned long flags; + struct ipath_qp *qp; + + spin_lock_irqsave(&qpt->lock, flags); + + for (qp = qpt->table[qpn % qpt->max]; qp; qp = qp->next) { + if (qp->ibqp.qp_num == qpn) { + atomic_inc(&qp->refcount); + break; + } + } + + spin_unlock_irqrestore(&qpt->lock, flags); + return qp; +} + +static int ipath_alloc_lkey(struct ipath_lkey_table *rkt, + struct ipath_mregion *mr) +{ + unsigned long flags; + u32 r; + u32 n; + + spin_lock_irqsave(&rkt->lock, flags); + + /* Find the next available LKEY */ + r = n = rkt->next; + for (;;) { + if (rkt->table[r] == NULL) + break; + r = (r + 1) & (rkt->max - 1); + if (r == n) { + spin_unlock_irqrestore(&rkt->lock, flags); + _VERBS_INFO("LKEY table full\n"); + return 0; + } + } + rkt->next = (r + 1) & (rkt->max - 1); + /* + * Make sure lkey is never zero which is reserved to indicate an + * unrestricted LKEY. + */ + rkt->gen++; + mr->lkey = (r << (32 - ib_ipath_lkey_table_size)) | + ((((1 << (24 - ib_ipath_lkey_table_size)) - 1) & rkt->gen) << 8); + if (mr->lkey == 0) { + mr->lkey |= 1 << 8; + rkt->gen++; + } + rkt->table[r] = mr; + spin_unlock_irqrestore(&rkt->lock, flags); + + return 1; +} + +static void ipath_free_lkey(struct ipath_lkey_table *rkt, u32 lkey) +{ + unsigned long flags; + u32 r; + + if (lkey == 0) + return; + r = lkey >> (32 - ib_ipath_lkey_table_size); + spin_lock_irqsave(&rkt->lock, flags); + rkt->table[r] = NULL; + spin_unlock_irqrestore(&rkt->lock, flags); +} + +/* + * Check the IB SGE for validity and initialize our internal version of it. + * Return 1 if OK, else zero. + */ +static int ipath_lkey_ok(struct ipath_lkey_table *rkt, struct ipath_sge *isge, + struct ib_sge *sge, int acc) +{ + struct ipath_mregion *mr; + size_t off; + + /* + * We use LKEY == zero to mean a physical kmalloc() address. + * This is a bit of a hack since we rely on dma_map_single() + * being reversible by calling bus_to_virt(). + */ + if (sge->lkey == 0) { + isge->mr = NULL; + isge->vaddr = bus_to_virt(sge->addr); + isge->length = sge->length; + isge->sge_length = sge->length; + return 1; + } + spin_lock(&rkt->lock); + mr = rkt->table[(sge->lkey >> (32 - ib_ipath_lkey_table_size))]; + spin_unlock(&rkt->lock); + if (unlikely(mr == NULL || mr->lkey != sge->lkey)) + return 0; + + off = sge->addr - mr->user_base; + if (unlikely(sge->addr < mr->user_base || + off + sge->length > mr->length || + (mr->access_flags & acc) != acc)) + return 0; + + off += mr->offset; + isge->mr = mr; + isge->m = 0; + isge->n = 0; + while (off >= mr->map[isge->m]->segs[isge->n].length) { + off -= mr->map[isge->m]->segs[isge->n].length; + if (++isge->n >= IPATH_SEGSZ) { + isge->m++; + isge->n = 0; + } + } + isge->vaddr = mr->map[isge->m]->segs[isge->n].vaddr + off; + isge->length = mr->map[isge->m]->segs[isge->n].length - off; + isge->sge_length = sge->length; + return 1; +} + +/* + * Initialize the qp->s_sge after a restart. + * The QP s_lock should be held. 
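+ *
+ * The skip length is simply the number of this WQE's packets that
+ * precede the restart PSN times the path MTU: e.g. restarting at
+ * s_psn == wqe->psn + 3 with a 2048-byte MTU skips
+ * 3 * 2048 = 6144 bytes of payload before resending.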
+ */ +static void ipath_init_restart(struct ipath_qp *qp, struct ipath_swqe *wqe) +{ + struct ipath_ibdev *dev; + u32 len; + + len = ((qp->s_psn - wqe->psn) & 0xFFFFFF) * + ib_mtu_enum_to_int(qp->path_mtu); + qp->s_sge.sge = wqe->sg_list[0]; + qp->s_sge.sg_list = wqe->sg_list + 1; + qp->s_sge.num_sge = wqe->wr.num_sge; + skip_sge(&qp->s_sge, len); + qp->s_len = wqe->length - len; + dev = to_idev(qp->ibqp.device); + spin_lock(&dev->pending_lock); + if (qp->timerwait.next == LIST_POISON1) + list_add_tail(&qp->timerwait, + &dev->pending[dev->pending_index]); + spin_unlock(&dev->pending_lock); +} + +/* + * Check the IB virtual address, length, and RKEY. + * Return 1 if OK, else zero. + * The QP r_rq.lock should be held. + */ +static int ipath_rkey_ok(struct ipath_ibdev *dev, struct ipath_sge_state *ss, + u32 len, u64 vaddr, u32 rkey, int acc) +{ + struct ipath_lkey_table *rkt = &dev->lk_table; + struct ipath_sge *sge = &ss->sge; + struct ipath_mregion *mr; + size_t off; + + spin_lock(&rkt->lock); + mr = rkt->table[(rkey >> (32 - ib_ipath_lkey_table_size))]; + spin_unlock(&rkt->lock); + if (unlikely(mr == NULL || mr->lkey != rkey)) + return 0; + + off = vaddr - mr->iova; + if (unlikely(vaddr < mr->iova || off + len > mr->length || + (mr->access_flags & acc) == 0)) + return 0; + + off += mr->offset; + sge->mr = mr; + sge->m = 0; + sge->n = 0; + while (off >= mr->map[sge->m]->segs[sge->n].length) { + off -= mr->map[sge->m]->segs[sge->n].length; + if (++sge->n >= IPATH_SEGSZ) { + sge->m++; + sge->n = 0; + } + } + sge->vaddr = mr->map[sge->m]->segs[sge->n].vaddr + off; + sge->length = mr->map[sge->m]->segs[sge->n].length - off; + sge->sge_length = len; + ss->sg_list = NULL; + ss->num_sge = 1; + return 1; +} + +/* + * Add a new entry to the completion queue. + * This may be called with one of the qp->s_lock or qp->r_rq.lock held. + */ +static void ipath_cq_enter(struct ipath_cq *cq, struct ib_wc *entry, int sig) +{ + unsigned long flags; + u32 next; + + spin_lock_irqsave(&cq->lock, flags); + + cq->queue[cq->head] = *entry; + next = cq->head + 1; + if (next == cq->ibcq.cqe) + next = 0; + if (likely(next != cq->tail)) + cq->head = next; + else { + spin_unlock_irqrestore(&cq->lock, flags); + if (cq->ibcq.event_handler) { + struct ib_event ev; + + ev.device = cq->ibcq.device; + ev.element.cq = &cq->ibcq; + ev.event = IB_EVENT_CQ_ERR; + cq->ibcq.event_handler(&ev, cq->ibcq.cq_context); + } + return; + } + + if (cq->notify == IB_CQ_NEXT_COMP || + (cq->notify == IB_CQ_SOLICITED && sig)) { + cq->notify = IB_CQ_NONE; + cq->triggered++; + /* + * This will cause send_complete() to be called in + * another thread. + */ + tasklet_schedule(&cq->comptask); + } + + spin_unlock_irqrestore(&cq->lock, flags); + + if (entry->status != IB_WC_SUCCESS) + to_idev(cq->ibcq.device)->n_wqe_errs++; +} + +static void send_complete(unsigned long data) +{ + struct ipath_cq *cq = (struct ipath_cq *)data; + + /* + * The completion handler will most likely rearm the notification + * and poll for all pending entries. If a new completion entry + * is added while we are in this routine, tasklet_schedule() + * won't call us again until we return so we check triggered to + * see if we need to call the handler again. + */ + for (;;) { + u8 triggered = cq->triggered; + + cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context); + + if (cq->triggered == triggered) + return; + } +} + +/* + * This is the QP state transition table. + * See ipath_modify_qp() for details. 
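+ *
+ * Validation sketch, mirroring the checks in ipath_modify_qp():
+ *
+ *	req = qp_state_table[cur][new].req_param[qp->ibqp.qp_type];
+ *	opt = qp_state_table[cur][new].opt_param[qp->ibqp.qp_type];
+ *	if ((req & attr_mask) != req ||
+ *	    (attr_mask & ~(req | opt | IB_QP_STATE)))
+ *		the transition is rejected;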
+ */ +static const struct { + int trans; + u32 req_param[IB_QPT_RAW_IPV6]; + u32 opt_param[IB_QPT_RAW_IPV6]; +} qp_state_table[IB_QPS_ERR + 1][IB_QPS_ERR + 1] = { + [IB_QPS_RESET] = { + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST }, + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR }, + [IB_QPS_INIT] = { + .trans = IPATH_TRANS_RST2INIT, + .req_param = { + [IB_QPT_SMI] = (IB_QP_PKEY_INDEX | + IB_QP_QKEY), + [IB_QPT_GSI] = (IB_QP_PKEY_INDEX | + IB_QP_QKEY), + [IB_QPT_UD] = (IB_QP_PKEY_INDEX | + IB_QP_PORT | + IB_QP_QKEY), + [IB_QPT_UC] = (IB_QP_PKEY_INDEX | + IB_QP_PORT | + IB_QP_ACCESS_FLAGS), + [IB_QPT_RC] = (IB_QP_PKEY_INDEX | + IB_QP_PORT | + IB_QP_ACCESS_FLAGS), + }, + }, + }, + [IB_QPS_INIT] = { + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST }, + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR }, + [IB_QPS_INIT] = { + .trans = IPATH_TRANS_INIT2INIT, + .opt_param = { + [IB_QPT_SMI] = (IB_QP_PKEY_INDEX | + IB_QP_QKEY), + [IB_QPT_GSI] = (IB_QP_PKEY_INDEX | + IB_QP_QKEY), + [IB_QPT_UD] = (IB_QP_PKEY_INDEX | + IB_QP_PORT | + IB_QP_QKEY), + [IB_QPT_UC] = (IB_QP_PKEY_INDEX | + IB_QP_PORT | + IB_QP_ACCESS_FLAGS), + [IB_QPT_RC] = (IB_QP_PKEY_INDEX | + IB_QP_PORT | + IB_QP_ACCESS_FLAGS), + } + }, + [IB_QPS_RTR] = { + .trans = IPATH_TRANS_INIT2RTR, + .req_param = { + [IB_QPT_UC] = (IB_QP_AV | + IB_QP_PATH_MTU | + IB_QP_DEST_QPN | + IB_QP_RQ_PSN), + [IB_QPT_RC] = (IB_QP_AV | + IB_QP_PATH_MTU | + IB_QP_DEST_QPN | + IB_QP_RQ_PSN | + IB_QP_MAX_DEST_RD_ATOMIC | + IB_QP_MIN_RNR_TIMER), + }, + .opt_param = { + [IB_QPT_SMI] = (IB_QP_PKEY_INDEX | + IB_QP_QKEY), + [IB_QPT_GSI] = (IB_QP_PKEY_INDEX | + IB_QP_QKEY), + [IB_QPT_UD] = (IB_QP_PKEY_INDEX | + IB_QP_QKEY), + [IB_QPT_UC] = (IB_QP_ALT_PATH | + IB_QP_ACCESS_FLAGS | + IB_QP_PKEY_INDEX), + [IB_QPT_RC] = (IB_QP_ALT_PATH | + IB_QP_ACCESS_FLAGS | + IB_QP_PKEY_INDEX), + } + } + }, + [IB_QPS_RTR] = { + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST }, + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR }, + [IB_QPS_RTS] = { + .trans = IPATH_TRANS_RTR2RTS, + .req_param = { + [IB_QPT_SMI] = IB_QP_SQ_PSN, + [IB_QPT_GSI] = IB_QP_SQ_PSN, + [IB_QPT_UD] = IB_QP_SQ_PSN, + [IB_QPT_UC] = IB_QP_SQ_PSN, + [IB_QPT_RC] = (IB_QP_TIMEOUT | + IB_QP_RETRY_CNT | + IB_QP_RNR_RETRY | + IB_QP_SQ_PSN | + IB_QP_MAX_QP_RD_ATOMIC), + }, + .opt_param = { + [IB_QPT_SMI] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_GSI] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_UD] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_UC] = (IB_QP_CUR_STATE | + IB_QP_ALT_PATH | + IB_QP_ACCESS_FLAGS | + IB_QP_PKEY_INDEX | + IB_QP_PATH_MIG_STATE), + [IB_QPT_RC] = (IB_QP_CUR_STATE | + IB_QP_ALT_PATH | + IB_QP_ACCESS_FLAGS | + IB_QP_PKEY_INDEX | + IB_QP_MIN_RNR_TIMER | + IB_QP_PATH_MIG_STATE), + } + } + }, + [IB_QPS_RTS] = { + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST }, + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR }, + [IB_QPS_RTS] = { + .trans = IPATH_TRANS_RTS2RTS, + .opt_param = { + [IB_QPT_SMI] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_GSI] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_UD] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_UC] = (IB_QP_ACCESS_FLAGS | + IB_QP_ALT_PATH | + IB_QP_PATH_MIG_STATE), + [IB_QPT_RC] = (IB_QP_ACCESS_FLAGS | + IB_QP_ALT_PATH | + IB_QP_PATH_MIG_STATE | + IB_QP_MIN_RNR_TIMER), + } + }, + [IB_QPS_SQD] = { + .trans = IPATH_TRANS_RTS2SQD, + }, + }, + [IB_QPS_SQD] = { + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST }, + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR }, + [IB_QPS_RTS] = { + .trans = IPATH_TRANS_SQD2RTS, + .opt_param = { + [IB_QPT_SMI] = (IB_QP_CUR_STATE | 
IB_QP_QKEY), + [IB_QPT_GSI] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_UD] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_UC] = (IB_QP_CUR_STATE | + IB_QP_ALT_PATH | + IB_QP_ACCESS_FLAGS | + IB_QP_PATH_MIG_STATE), + [IB_QPT_RC] = (IB_QP_CUR_STATE | + IB_QP_ALT_PATH | + IB_QP_ACCESS_FLAGS | + IB_QP_MIN_RNR_TIMER | + IB_QP_PATH_MIG_STATE), + } + }, + [IB_QPS_SQD] = { + .trans = IPATH_TRANS_SQD2SQD, + .opt_param = { + [IB_QPT_SMI] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_GSI] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_UD] = (IB_QP_PKEY_INDEX | IB_QP_QKEY), + [IB_QPT_UC] = (IB_QP_AV | + IB_QP_TIMEOUT | + IB_QP_CUR_STATE | + IB_QP_ALT_PATH | + IB_QP_ACCESS_FLAGS | + IB_QP_PKEY_INDEX | + IB_QP_PATH_MIG_STATE), + [IB_QPT_RC] = (IB_QP_AV | + IB_QP_TIMEOUT | + IB_QP_RETRY_CNT | + IB_QP_RNR_RETRY | + IB_QP_MAX_QP_RD_ATOMIC | + IB_QP_MAX_DEST_RD_ATOMIC | + IB_QP_CUR_STATE | + IB_QP_ALT_PATH | + IB_QP_ACCESS_FLAGS | + IB_QP_PKEY_INDEX | + IB_QP_MIN_RNR_TIMER | + IB_QP_PATH_MIG_STATE), + } + } + }, + [IB_QPS_SQE] = { + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST }, + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR }, + [IB_QPS_RTS] = { + .trans = IPATH_TRANS_SQERR2RTS, + .opt_param = { + [IB_QPT_SMI] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_GSI] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_UD] = (IB_QP_CUR_STATE | IB_QP_QKEY), + [IB_QPT_UC] = IB_QP_CUR_STATE, + [IB_QPT_RC] = (IB_QP_CUR_STATE | + IB_QP_MIN_RNR_TIMER), + } + } + }, + [IB_QPS_ERR] = { + [IB_QPS_RESET] = { .trans = IPATH_TRANS_ANY2RST }, + [IB_QPS_ERR] = { .trans = IPATH_TRANS_ANY2ERR } + } +}; + +/* + * Initialize the QP state to the reset state. + */ +static void ipath_reset_qp(struct ipath_qp *qp) +{ + qp->remote_qpn = 0; + qp->qkey = 0; + qp->qp_access_flags = 0; + qp->s_hdrwords = 0; + qp->s_psn = 0; + qp->r_psn = 0; + atomic_set(&qp->msn, 0); + if (qp->ibqp.qp_type == IB_QPT_RC) { + qp->s_state = IB_OPCODE_RC_SEND_LAST; + qp->r_state = IB_OPCODE_RC_SEND_LAST; + } else { + qp->s_state = IB_OPCODE_UC_SEND_LAST; + qp->r_state = IB_OPCODE_UC_SEND_LAST; + } + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE; + qp->s_nak_state = 0; + qp->s_rnr_timeout = 0; + qp->s_head = 0; + qp->s_tail = 0; + qp->s_cur = 0; + qp->s_last = 0; + qp->s_ssn = 1; + qp->s_lsn = 0; + qp->r_rq.head = 0; + qp->r_rq.tail = 0; + qp->r_reuse_sge = 0; +} + +/* + * Flush send work queue. + * The QP s_lock should be held. + */ +static void ipath_sqerror_qp(struct ipath_qp *qp, struct ib_wc *wc) +{ + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); + struct ipath_swqe *wqe = get_swqe_ptr(qp, qp->s_last); + + _VERBS_INFO("Send queue error on QP%d/%d: err: %d\n", + qp->ibqp.qp_num, qp->remote_qpn, wc->status); + + spin_lock(&dev->pending_lock); + /* XXX What if its already removed by the timeout code? */ + if (qp->timerwait.next != LIST_POISON1) + list_del(&qp->timerwait); + if (qp->piowait.next != LIST_POISON1) + list_del(&qp->piowait); + spin_unlock(&dev->pending_lock); + + ipath_cq_enter(to_icq(qp->ibqp.send_cq), wc, 1); + if (++qp->s_last >= qp->s_size) + qp->s_last = 0; + + wc->status = IB_WC_WR_FLUSH_ERR; + + while (qp->s_last != qp->s_head) { + wc->wr_id = wqe->wr.wr_id; + wc->opcode = wc_opcode[wqe->wr.opcode]; + ipath_cq_enter(to_icq(qp->ibqp.send_cq), wc, 1); + if (++qp->s_last >= qp->s_size) + qp->s_last = 0; + wqe = get_swqe_ptr(qp, qp->s_last); + } + qp->s_cur = qp->s_tail = qp->s_head; + qp->state = IB_QPS_SQE; +} + +/* + * Flush both send and receive work queues. + * QP r_rq.lock and s_lock should be held. 
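+ * Callers take r_rq.lock first and s_lock second (see
+ * ipath_modify_qp() below), so that is the lock order required here.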
+ */ +static void ipath_error_qp(struct ipath_qp *qp) +{ + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); + struct ib_wc wc; + + _VERBS_INFO("QP%d/%d in error state\n", + qp->ibqp.qp_num, qp->remote_qpn); + + spin_lock(&dev->pending_lock); + /* XXX What if its already removed by the timeout code? */ + if (qp->timerwait.next != LIST_POISON1) + list_del(&qp->timerwait); + if (qp->piowait.next != LIST_POISON1) + list_del(&qp->piowait); + spin_unlock(&dev->pending_lock); + + wc.status = IB_WC_WR_FLUSH_ERR; + wc.vendor_err = 0; + wc.byte_len = 0; + wc.imm_data = 0; + wc.qp_num = qp->ibqp.qp_num; + wc.src_qp = 0; + wc.wc_flags = 0; + wc.pkey_index = 0; + wc.slid = 0; + wc.sl = 0; + wc.dlid_path_bits = 0; + wc.port_num = 0; + + while (qp->s_last != qp->s_head) { + struct ipath_swqe *wqe = get_swqe_ptr(qp, qp->s_last); + + wc.wr_id = wqe->wr.wr_id; + wc.opcode = wc_opcode[wqe->wr.opcode]; + if (++qp->s_last >= qp->s_size) + qp->s_last = 0; + ipath_cq_enter(to_icq(qp->ibqp.send_cq), &wc, 1); + } + qp->s_cur = qp->s_tail = qp->s_head; + qp->s_hdrwords = 0; + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE; + + wc.opcode = IB_WC_RECV; + while (qp->r_rq.tail != qp->r_rq.head) { + wc.wr_id = get_rwqe_ptr(&qp->r_rq, qp->r_rq.tail)->wr_id; + if (++qp->r_rq.tail >= qp->r_rq.size) + qp->r_rq.tail = 0; + ipath_cq_enter(to_icq(qp->ibqp.recv_cq), &wc, 1); + } +} + +static int ipath_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, + int attr_mask) +{ + struct ipath_qp *qp = to_iqp(ibqp); + enum ib_qp_state cur_state, new_state; + u32 req_param, opt_param; + unsigned long flags; + + if (attr_mask & IB_QP_CUR_STATE) { + cur_state = attr->cur_qp_state; + if (cur_state != IB_QPS_RTR && + cur_state != IB_QPS_RTS && + cur_state != IB_QPS_SQD && cur_state != IB_QPS_SQE) + return -EINVAL; + spin_lock_irqsave(&qp->r_rq.lock, flags); + spin_lock(&qp->s_lock); + } else { + spin_lock_irqsave(&qp->r_rq.lock, flags); + spin_lock(&qp->s_lock); + cur_state = qp->state; + } + + if (attr_mask & IB_QP_STATE) { + new_state = attr->qp_state; + if (new_state < 0 || new_state > IB_QPS_ERR) + goto inval; + } else + new_state = cur_state; + + switch (qp_state_table[cur_state][new_state].trans) { + case IPATH_TRANS_INVALID: + goto inval; + + case IPATH_TRANS_ANY2RST: + ipath_reset_qp(qp); + break; + + case IPATH_TRANS_ANY2ERR: + ipath_error_qp(qp); + break; + + } + + req_param = + qp_state_table[cur_state][new_state].req_param[qp->ibqp.qp_type]; + opt_param = + qp_state_table[cur_state][new_state].opt_param[qp->ibqp.qp_type]; + + if ((req_param & attr_mask) != req_param) + goto inval; + + if (attr_mask & ~(req_param | opt_param | IB_QP_STATE)) + goto inval; + + if (attr_mask & IB_QP_PKEY_INDEX) { + struct ipath_ibdev *dev = to_idev(ibqp->device); + + if (attr->pkey_index >= ipath_layer_get_npkeys(dev->ib_unit)) + goto inval; + qp->s_pkey_index = attr->pkey_index; + } + + if (attr_mask & IB_QP_DEST_QPN) + qp->remote_qpn = attr->dest_qp_num; + + if (attr_mask & IB_QP_SQ_PSN) { + qp->s_next_psn = attr->sq_psn; + qp->s_last_psn = qp->s_next_psn - 1; + } + + if (attr_mask & IB_QP_RQ_PSN) + qp->r_psn = attr->rq_psn; + + if (attr_mask & IB_QP_ACCESS_FLAGS) + qp->qp_access_flags = attr->qp_access_flags; + + if (attr_mask & IB_QP_AV) + qp->remote_ah_attr = attr->ah_attr; + + if (attr_mask & IB_QP_PATH_MTU) + qp->path_mtu = attr->path_mtu; + + if (attr_mask & IB_QP_RETRY_CNT) + qp->s_retry = qp->s_retry_cnt = attr->retry_cnt; + + if (attr_mask & IB_QP_RNR_RETRY) { + qp->s_rnr_retry = attr->rnr_retry; + if (qp->s_rnr_retry > 7) + 
qp->s_rnr_retry = 7; + qp->s_rnr_retry_cnt = qp->s_rnr_retry; + } + + if (attr_mask & IB_QP_MIN_RNR_TIMER) + qp->s_min_rnr_timer = attr->min_rnr_timer & 0x1F; + + if (attr_mask & IB_QP_QKEY) + qp->qkey = attr->qkey; + + if (attr_mask & IB_QP_PKEY_INDEX) + qp->s_pkey_index = attr->pkey_index; + + qp->state = new_state; + spin_unlock(&qp->s_lock); + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + + /* + * Try to move to ARMED if QP1 changed to the RTS state. + */ + if (qp->ibqp.qp_num == 1 && new_state == IB_QPS_RTS) { + struct ipath_ibdev *dev = to_idev(ibqp->device); + + /* + * Bounce the link even if it was active so the SM will + * reinitialize the SMA's state. + */ + ipath_kset_linkstate((dev->ib_unit << 16) | IPATH_IB_LINKDOWN); + ipath_kset_linkstate((dev->ib_unit << 16) | IPATH_IB_LINKARM); + } + return 0; + +inval: + spin_unlock(&qp->s_lock); + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + return -EINVAL; +} + +/* + * Compute the AETH (syndrome + MSN). + * The QP s_lock should be held. + */ +static u32 ipath_compute_aeth(struct ipath_qp *qp) +{ + u32 aeth = atomic_read(&qp->msn) & 0xFFFFFF; + + if (qp->s_nak_state) { + aeth |= qp->s_nak_state << 24; + } else if (qp->ibqp.srq) { + /* Shared receive queues don't generate credits. */ + aeth |= 0x1F << 24; + } else { + u32 min, max, x; + u32 credits; + + /* + * Compute the number of credits available (RWQEs). + * XXX Not holding the r_rq.lock here so there is a small + * chance that the pair of reads are not atomic. + */ + credits = qp->r_rq.head - qp->r_rq.tail; + if ((int)credits < 0) + credits += qp->r_rq.size; + /* Binary search the credit table to find the code to use. */ + min = 0; + max = 31; + for (;;) { + x = (min + max) / 2; + if (credit_table[x] == credits) + break; + if (credit_table[x] > credits) + max = x; + else if (min == x) + break; + else + min = x; + } + aeth |= x << 24; + } + return cpu_to_be32(aeth); +} + + +static void no_bufs_available(struct ipath_qp *qp, struct ipath_ibdev *dev) +{ + unsigned long flags; + + spin_lock_irqsave(&dev->pending_lock, flags); + if (qp->piowait.next == LIST_POISON1) + list_add_tail(&qp->piowait, &dev->piowait); + spin_unlock_irqrestore(&dev->pending_lock, flags); + /* + * Note that as soon as ipath_layer_want_buffer() is called and + * possibly before it returns, ipath_ib_piobufavail() + * could be called. If we are still in the tasklet function, + * tasklet_schedule() will not call us until the next time + * tasklet_schedule() is called. + * We clear the tasklet flag now since we are committing to return + * from the tasklet function. + */ + tasklet_unlock(&qp->s_task); + ipath_layer_want_buffer(dev->ib_unit); + dev->n_piowait++; +} + +/* + * Process entries in the send work queue until the queue is exhausted. + * Only allow one CPU to send a packet per QP (tasklet). + * Otherwise, after we drop the QP lock, two threads could send + * packets out of order. + * This is similar to do_rc_send() below except we don't have timeouts or + * resends. 
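+ *
+ * Single-sender sketch: the IPATH_S_BUSY bit in s_flags acts as a
+ * per-QP try-lock,
+ *
+ *	if (test_and_set_bit(IPATH_S_BUSY, &qp->s_flags))
+ *		return;	/* another CPU already owns the send side */
+ *
+ * so only one instance at a time builds headers for a given QP.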
+ */ +static void do_uc_send(unsigned long data) +{ + struct ipath_qp *qp = (struct ipath_qp *)data; + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); + struct ipath_swqe *wqe; + unsigned long flags; + u16 lrh0; + u32 hwords; + u32 nwords; + u32 extra_bytes; + u32 bth0; + u32 bth2; + u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu); + u32 len; + struct ipath_other_headers *ohdr; + struct ib_wc wc; + + if (test_and_set_bit(IPATH_S_BUSY, &qp->s_flags)) + return; + + if (unlikely(qp->remote_ah_attr.dlid == + ipath_layer_get_lid(dev->ib_unit))) { + /* Pass in an uninitialized ib_wc to save stack space. */ + ipath_ruc_loopback(qp, &wc); + clear_bit(IPATH_S_BUSY, &qp->s_flags); + return; + } + + ohdr = &qp->s_hdr.u.oth; + if (qp->remote_ah_attr.ah_flags & IB_AH_GRH) + ohdr = &qp->s_hdr.u.l.oth; + +again: + /* Check for a constructed packet to be sent. */ + if (qp->s_hdrwords != 0) { + /* + * If no PIO bufs are available, return. + * An interrupt will call ipath_ib_piobufavail() + * when one is available. + */ + if (ipath_verbs_send(dev->ib_unit, qp->s_hdrwords, + (uint32_t *) &qp->s_hdr, + qp->s_cur_size, qp->s_cur_sge)) { + no_bufs_available(qp, dev); + return; + } + dev->n_unicast_xmit++; + /* Record that we sent the packet and s_hdr is empty. */ + qp->s_hdrwords = 0; + } + + lrh0 = IPS_LRH_BTH; + /* header size in 32-bit words LRH+BTH = (8+12)/4. */ + hwords = 5; + + /* + * The lock is needed to synchronize between + * setting qp->s_ack_state and post_send(). + */ + spin_lock_irqsave(&qp->s_lock, flags); + + if (!(state_ops[qp->state] & IPATH_PROCESS_SEND_OK)) + goto done; + + bth0 = ipath_layer_get_pkey(dev->ib_unit, qp->s_pkey_index); + + /* Send a request. */ + wqe = get_swqe_ptr(qp, qp->s_last); + switch (qp->s_state) { + default: + /* Signal the completion of the last send (if there is one). */ + if (qp->s_last != qp->s_tail) { + if (++qp->s_last == qp->s_size) + qp->s_last = 0; + if (!test_bit(IPATH_S_SIGNAL_REQ_WR, &qp->s_flags) || + (wqe->wr.send_flags & IB_SEND_SIGNALED)) { + wc.wr_id = wqe->wr.wr_id; + wc.status = IB_WC_SUCCESS; + wc.opcode = wc_opcode[wqe->wr.opcode]; + wc.vendor_err = 0; + wc.byte_len = wqe->length; + wc.qp_num = qp->ibqp.qp_num; + wc.src_qp = qp->remote_qpn; + wc.pkey_index = 0; + wc.slid = qp->remote_ah_attr.dlid; + wc.sl = qp->remote_ah_attr.sl; + wc.dlid_path_bits = 0; + wc.port_num = 0; + ipath_cq_enter(to_icq(qp->ibqp.send_cq), &wc, + 0); + } + wqe = get_swqe_ptr(qp, qp->s_last); + } + /* Check if send work queue is empty. */ + if (qp->s_tail == qp->s_head) + goto done; + /* + * Start a new request. 
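+ * Each new request is stamped with the current s_next_psn;
+ * multi-packet messages then advance the PSN once per MTU-sized
+ * fragment via "bth2 = qp->s_next_psn++ & 0xFFFFFF" below.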
+ */ + qp->s_psn = wqe->psn = qp->s_next_psn; + qp->s_sge.sge = wqe->sg_list[0]; + qp->s_sge.sg_list = wqe->sg_list + 1; + qp->s_sge.num_sge = wqe->wr.num_sge; + qp->s_len = len = wqe->length; + switch (wqe->wr.opcode) { + case IB_WR_SEND: + case IB_WR_SEND_WITH_IMM: + if (len > pmtu) { + qp->s_state = IB_OPCODE_UC_SEND_FIRST; + len = pmtu; + break; + } + if (wqe->wr.opcode == IB_WR_SEND) { + qp->s_state = IB_OPCODE_UC_SEND_ONLY; + } else { + qp->s_state = + IB_OPCODE_UC_SEND_ONLY_WITH_IMMEDIATE; + /* Immediate data comes after the BTH */ + ohdr->u.imm_data = wqe->wr.imm_data; + hwords += 1; + } + if (wqe->wr.send_flags & IB_SEND_SOLICITED) + bth0 |= 1 << 23; + break; + + case IB_WR_RDMA_WRITE: + case IB_WR_RDMA_WRITE_WITH_IMM: + ohdr->u.rc.reth.vaddr = + cpu_to_be64(wqe->wr.wr.rdma.remote_addr); + ohdr->u.rc.reth.rkey = + cpu_to_be32(wqe->wr.wr.rdma.rkey); + ohdr->u.rc.reth.length = cpu_to_be32(len); + hwords += sizeof(struct ib_reth) / 4; + if (len > pmtu) { + qp->s_state = IB_OPCODE_UC_RDMA_WRITE_FIRST; + len = pmtu; + break; + } + if (wqe->wr.opcode == IB_WR_RDMA_WRITE) { + qp->s_state = IB_OPCODE_UC_RDMA_WRITE_ONLY; + } else { + qp->s_state = + IB_OPCODE_UC_RDMA_WRITE_ONLY_WITH_IMMEDIATE; + /* Immediate data comes after the RETH */ + ohdr->u.rc.imm_data = wqe->wr.imm_data; + hwords += 1; + if (wqe->wr.send_flags & IB_SEND_SOLICITED) + bth0 |= 1 << 23; + } + break; + + default: + goto done; + } + if (++qp->s_tail >= qp->s_size) + qp->s_tail = 0; + break; + + case IB_OPCODE_UC_SEND_FIRST: + qp->s_state = IB_OPCODE_UC_SEND_MIDDLE; + /* FALLTHROUGH */ + case IB_OPCODE_UC_SEND_MIDDLE: + len = qp->s_len; + if (len > pmtu) { + len = pmtu; + break; + } + if (wqe->wr.opcode == IB_WR_SEND) + qp->s_state = IB_OPCODE_UC_SEND_LAST; + else { + qp->s_state = IB_OPCODE_UC_SEND_LAST_WITH_IMMEDIATE; + /* Immediate data comes after the BTH */ + ohdr->u.imm_data = wqe->wr.imm_data; + hwords += 1; + } + if (wqe->wr.send_flags & IB_SEND_SOLICITED) + bth0 |= 1 << 23; + break; + + case IB_OPCODE_UC_RDMA_WRITE_FIRST: + qp->s_state = IB_OPCODE_UC_RDMA_WRITE_MIDDLE; + /* FALLTHROUGH */ + case IB_OPCODE_UC_RDMA_WRITE_MIDDLE: + len = qp->s_len; + if (len > pmtu) { + len = pmtu; + break; + } + if (wqe->wr.opcode == IB_WR_RDMA_WRITE) + qp->s_state = IB_OPCODE_UC_RDMA_WRITE_LAST; + else { + qp->s_state = + IB_OPCODE_UC_RDMA_WRITE_LAST_WITH_IMMEDIATE; + /* Immediate data comes after the BTH */ + ohdr->u.imm_data = wqe->wr.imm_data; + hwords += 1; + if (wqe->wr.send_flags & IB_SEND_SOLICITED) + bth0 |= 1 << 23; + } + break; + } + bth2 = qp->s_next_psn++ & 0xFFFFFF; + qp->s_len -= len; + bth0 |= qp->s_state << 24; + + spin_unlock_irqrestore(&qp->s_lock, flags); + + /* Construct the header. */ + extra_bytes = (4 - len) & 3; + nwords = (len + extra_bytes) >> 2; + if (unlikely(qp->remote_ah_attr.ah_flags & IB_AH_GRH)) { + /* Header size in 32-bit words. */ + hwords += 10; + lrh0 = IPS_LRH_GRH; + qp->s_hdr.u.l.grh.version_tclass_flow = + cpu_to_be32((6 << 28) | + (qp->remote_ah_attr.grh.traffic_class << 20) | + qp->remote_ah_attr.grh.flow_label); + qp->s_hdr.u.l.grh.paylen = + cpu_to_be16(((hwords - 12) + nwords + SIZE_OF_CRC) << 2); + qp->s_hdr.u.l.grh.next_hdr = 0x1B; + qp->s_hdr.u.l.grh.hop_limit = qp->remote_ah_attr.grh.hop_limit; + /* The SGID is 32-bit aligned. 
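+ * (The GRH itself is 40 bytes, hence the "hwords += 10" above;
+ * paylen drops the 2 LRH words and 10 GRH words so that it covers
+ * only the BTH, payload, and ICRC, converted to bytes by the "<< 2".)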
*/ + qp->s_hdr.u.l.grh.sgid.global.subnet_prefix = dev->gid_prefix; + qp->s_hdr.u.l.grh.sgid.global.interface_id = + ipath_layer_get_guid(dev->ib_unit); + qp->s_hdr.u.l.grh.dgid = qp->remote_ah_attr.grh.dgid; + } + qp->s_hdrwords = hwords; + qp->s_cur_sge = &qp->s_sge; + qp->s_cur_size = len; + lrh0 |= qp->remote_ah_attr.sl << 4; + qp->s_hdr.lrh[0] = cpu_to_be16(lrh0); + /* DEST LID */ + qp->s_hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid); + qp->s_hdr.lrh[2] = cpu_to_be16(hwords + nwords + SIZE_OF_CRC); + qp->s_hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->ib_unit)); + bth0 |= extra_bytes << 20; + ohdr->bth[0] = cpu_to_be32(bth0); + ohdr->bth[1] = cpu_to_be32(qp->remote_qpn); + ohdr->bth[2] = cpu_to_be32(bth2); + + /* Check for more work to do. */ + goto again; + +done: + spin_unlock_irqrestore(&qp->s_lock, flags); + clear_bit(IPATH_S_BUSY, &qp->s_flags); +} + +/* + * Process entries in the send work queue until credit or queue is exhausted. + * Only allow one CPU to send a packet per QP (tasklet). + * Otherwise, after we drop the QP s_lock, two threads could send + * packets out of order. + */ +static void do_rc_send(unsigned long data) +{ + struct ipath_qp *qp = (struct ipath_qp *)data; + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); + struct ipath_swqe *wqe; + struct ipath_sge_state *ss; + unsigned long flags; + u16 lrh0; + u32 hwords; + u32 nwords; + u32 extra_bytes; + u32 bth0; + u32 bth2; + u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu); + u32 len; + struct ipath_other_headers *ohdr; + char newreq; + + if (test_and_set_bit(IPATH_S_BUSY, &qp->s_flags)) + return; + + if (unlikely(qp->remote_ah_attr.dlid == + ipath_layer_get_lid(dev->ib_unit))) { + struct ib_wc wc; + + /* + * Pass in an uninitialized ib_wc to be consistent with + * other places where ipath_ruc_loopback() is called. + */ + ipath_ruc_loopback(qp, &wc); + clear_bit(IPATH_S_BUSY, &qp->s_flags); + return; + } + + ohdr = &qp->s_hdr.u.oth; + if (qp->remote_ah_attr.ah_flags & IB_AH_GRH) + ohdr = &qp->s_hdr.u.l.oth; + +again: + /* Check for a constructed packet to be sent. */ + if (qp->s_hdrwords != 0) { + /* + * If no PIO bufs are available, return. + * An interrupt will call ipath_ib_piobufavail() + * when one is available. + */ + if (ipath_verbs_send(dev->ib_unit, qp->s_hdrwords, + (uint32_t *) &qp->s_hdr, + qp->s_cur_size, qp->s_cur_sge)) { + no_bufs_available(qp, dev); + return; + } + dev->n_unicast_xmit++; + /* Record that we sent the packet and s_hdr is empty. */ + qp->s_hdrwords = 0; + } + + lrh0 = IPS_LRH_BTH; + /* header size in 32-bit words LRH+BTH = (8+12)/4. */ + hwords = 5; + + /* + * The lock is needed to synchronize between + * setting qp->s_ack_state, resend timer, and post_send(). + */ + spin_lock_irqsave(&qp->s_lock, flags); + + bth0 = ipath_layer_get_pkey(dev->ib_unit, qp->s_pkey_index); + + /* Sending responses has higher priority over sending requests. */ + if (qp->s_ack_state != IB_OPCODE_RC_ACKNOWLEDGE) { + /* + * Send a response. + * Note that we are in the responder's side of the QP context. 
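+ * Responses are driven by s_ack_state rather than by the send work
+ * queue, so a partially sent RDMA READ response always finishes
+ * before any new requester-side WQE is started.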
+ */ + switch (qp->s_ack_state) { + case IB_OPCODE_RC_RDMA_READ_REQUEST: + ss = &qp->s_rdma_sge; + len = qp->s_rdma_len; + if (len > pmtu) { + len = pmtu; + qp->s_ack_state = + IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST; + } else { + qp->s_ack_state = + IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY; + } + qp->s_rdma_len -= len; + bth0 |= qp->s_ack_state << 24; + ohdr->u.aeth = ipath_compute_aeth(qp); + hwords++; + break; + + case IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST: + qp->s_ack_state = + IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE; + /* FALLTHROUGH */ + case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE: + ss = &qp->s_rdma_sge; + len = qp->s_rdma_len; + if (len > pmtu) { + len = pmtu; + } else { + ohdr->u.aeth = ipath_compute_aeth(qp); + hwords++; + qp->s_ack_state = + IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST; + } + qp->s_rdma_len -= len; + bth0 |= qp->s_ack_state << 24; + break; + + case IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST: + case IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY: + /* + * We have to prevent new requests from changing + * the r_sge state while a ipath_verbs_send() + * is in progress. + * Changing r_state allows the receiver + * to continue processing new packets. + * We do it here now instead of above so + * that we are sure the packet was sent before + * changing the state. + */ + qp->r_state = IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST; + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE; + goto send_req; + + case IB_OPCODE_RC_COMPARE_SWAP: + case IB_OPCODE_RC_FETCH_ADD: + ss = NULL; + len = 0; + qp->r_state = IB_OPCODE_RC_SEND_LAST; + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE; + bth0 |= IB_OPCODE_ATOMIC_ACKNOWLEDGE << 24; + ohdr->u.at.aeth = ipath_compute_aeth(qp); + ohdr->u.at.atomic_ack_eth = + cpu_to_be64(qp->s_ack_atomic); + hwords += sizeof(ohdr->u.at) / 4; + break; + + default: + /* Send a regular ACK. */ + ss = NULL; + len = 0; + qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE; + bth0 |= qp->s_ack_state << 24; + ohdr->u.aeth = ipath_compute_aeth(qp); + hwords++; + } + bth2 = qp->s_ack_psn++ & 0xFFFFFF; + } else { + send_req: + if (!(state_ops[qp->state] & IPATH_PROCESS_SEND_OK) || + qp->s_rnr_timeout) + goto done; + + /* Send a request. */ + wqe = get_swqe_ptr(qp, qp->s_cur); + switch (qp->s_state) { + default: + /* + * Resend an old request or start a new one. + * + * We keep track of the current SWQE so that + * we don't reset the "furthest progress" state + * if we need to back up. + */ + newreq = 0; + if (qp->s_cur == qp->s_tail) { + /* Check if send work queue is empty. */ + if (qp->s_tail == qp->s_head) + goto done; + qp->s_psn = wqe->psn = qp->s_next_psn; + newreq = 1; + } + /* + * Note that we have to be careful not to modify the + * original work request since we may need to resend + * it. + */ + qp->s_sge.sge = wqe->sg_list[0]; + qp->s_sge.sg_list = wqe->sg_list + 1; + qp->s_sge.num_sge = wqe->wr.num_sge; + qp->s_len = len = wqe->length; + ss = &qp->s_sge; + bth2 = 0; + switch (wqe->wr.opcode) { + case IB_WR_SEND: + case IB_WR_SEND_WITH_IMM: + /* If no credit, return. 
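+ * (s_lsn == (u32) -1 presumably marks credit-based flow control as
+ * disabled -- compare ipath_compute_aeth() above, where an SRQ
+ * advertises the 0x1F "invalid credit" code -- otherwise cmp24()
+ * orders the 24-bit serial numbers modulo 2^24.)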
*/ + if (qp->s_lsn != (u32) -1 && + cmp24(wqe->ssn, qp->s_lsn + 1) > 0) { + goto done; + } + wqe->lpsn = wqe->psn; + if (len > pmtu) { + wqe->lpsn += (len - 1) / pmtu; + qp->s_state = IB_OPCODE_RC_SEND_FIRST; + len = pmtu; + break; + } + if (wqe->wr.opcode == IB_WR_SEND) { + qp->s_state = IB_OPCODE_RC_SEND_ONLY; + } else { + qp->s_state = + IB_OPCODE_RC_SEND_ONLY_WITH_IMMEDIATE; + /* Immediate data comes after the BTH */ + ohdr->u.imm_data = wqe->wr.imm_data; + hwords += 1; + } + if (wqe->wr.send_flags & IB_SEND_SOLICITED) + bth0 |= 1 << 23; + bth2 = 1 << 31; /* Request ACK. */ + if (++qp->s_cur == qp->s_size) + qp->s_cur = 0; + break; + + case IB_WR_RDMA_WRITE: + if (newreq) + qp->s_lsn++; + /* FALLTHROUGH */ + case IB_WR_RDMA_WRITE_WITH_IMM: + /* If no credit, return. */ + if (qp->s_lsn != (u32) -1 && + cmp24(wqe->ssn, qp->s_lsn + 1) > 0) { + goto done; + } + ohdr->u.rc.reth.vaddr = + cpu_to_be64(wqe->wr.wr.rdma.remote_addr); + ohdr->u.rc.reth.rkey = + cpu_to_be32(wqe->wr.wr.rdma.rkey); + ohdr->u.rc.reth.length = cpu_to_be32(len); + hwords += sizeof(struct ib_reth) / 4; + wqe->lpsn = wqe->psn; + if (len > pmtu) { + wqe->lpsn += (len - 1) / pmtu; + qp->s_state = + IB_OPCODE_RC_RDMA_WRITE_FIRST; + len = pmtu; + break; + } + if (wqe->wr.opcode == IB_WR_RDMA_WRITE) { + qp->s_state = + IB_OPCODE_RC_RDMA_WRITE_ONLY; + } else { + qp->s_state = + IB_OPCODE_RC_RDMA_WRITE_ONLY_WITH_IMMEDIATE; + /* Immediate data comes after RETH */ + ohdr->u.rc.imm_data = wqe->wr.imm_data; + hwords += 1; + if (wqe->wr. + send_flags & IB_SEND_SOLICITED) + bth0 |= 1 << 23; + } + bth2 = 1 << 31; /* Request ACK. */ + if (++qp->s_cur == qp->s_size) + qp->s_cur = 0; + break; + + case IB_WR_RDMA_READ: + ohdr->u.rc.reth.vaddr = + cpu_to_be64(wqe->wr.wr.rdma.remote_addr); + ohdr->u.rc.reth.rkey = + cpu_to_be32(wqe->wr.wr.rdma.rkey); + ohdr->u.rc.reth.length = cpu_to_be32(len); + qp->s_state = IB_OPCODE_RC_RDMA_READ_REQUEST; + hwords += sizeof(ohdr->u.rc.reth) / 4; + if (newreq) { + qp->s_lsn++; + /* + * Adjust s_next_psn to count the + * expected number of responses. + */ + if (len > pmtu) + qp->s_next_psn += + (len - 1) / pmtu; + wqe->lpsn = qp->s_next_psn++; + } + ss = NULL; + len = 0; + if (++qp->s_cur == qp->s_size) + qp->s_cur = 0; + break; + + case IB_WR_ATOMIC_CMP_AND_SWP: + case IB_WR_ATOMIC_FETCH_AND_ADD: + qp->s_state = + wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP ? + IB_OPCODE_RC_COMPARE_SWAP : + IB_OPCODE_RC_FETCH_ADD; + ohdr->u.atomic_eth.vaddr = + cpu_to_be64(wqe->wr.wr.atomic.remote_addr); + ohdr->u.atomic_eth.rkey = + cpu_to_be32(wqe->wr.wr.atomic.rkey); + ohdr->u.atomic_eth.swap_data = + cpu_to_be64(wqe->wr.wr.atomic.swap); + ohdr->u.atomic_eth.compare_data = + cpu_to_be64(wqe->wr.wr.atomic.compare_add); + hwords += sizeof(struct ib_atomic_eth) / 4; + if (newreq) { + qp->s_lsn++; + wqe->lpsn = wqe->psn; + } + if (++qp->s_cur == qp->s_size) + qp->s_cur = 0; + ss = NULL; + len = 0; + break; + + default: + goto done; + } + if (newreq) { + if (++qp->s_tail >= qp->s_size) + qp->s_tail = 0; + } + bth2 |= qp->s_psn++ & 0xFFFFFF; + if ((int)(qp->s_psn - qp->s_next_psn) > 0) + qp->s_next_psn = qp->s_psn; + spin_lock(&dev->pending_lock); + if (qp->timerwait.next == LIST_POISON1) { + list_add_tail(&qp->timerwait, + &dev->pending[dev-> + pending_index]); + } + spin_unlock(&dev->pending_lock); + break; + + case IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST: + /* + * This case can only happen if a send is + * restarted. See ipath_restart_rc(). 
+ */ + ipath_init_restart(qp, wqe); + /* FALLTHROUGH */ + case IB_OPCODE_RC_SEND_FIRST: + qp->s_state = IB_OPCODE_RC_SEND_MIDDLE; + /* FALLTHROUGH */ + case IB_OPCODE_RC_SEND_MIDDLE: + bth2 = qp->s_psn++ & 0xFFFFFF; + if ((int)(qp->s_psn - qp->s_next_psn) > 0) + qp->s_next_psn = qp->s_psn; + ss = &qp->s_sge; + len = qp->s_len; + if (len > pmtu) { + /* + * Request an ACK every 1/2 MB to avoid + * retransmit timeouts. + */ + if (((wqe->length - len) % (512 * 1024)) == 0) + bth2 |= 1 << 31; + len = pmtu; + break; + } + if (wqe->wr.opcode == IB_WR_SEND) + qp->s_state = IB_OPCODE_RC_SEND_LAST; + else { + qp->s_state = + IB_OPCODE_RC_SEND_LAST_WITH_IMMEDIATE; + /* Immediate data comes after the BTH */ + ohdr->u.imm_data = wqe->wr.imm_data; + hwords += 1; + } + if (wqe->wr.send_flags & IB_SEND_SOLICITED) + bth0 |= 1 << 23; + bth2 |= 1 << 31; /* Request ACK. */ + if (++qp->s_cur >= qp->s_size) + qp->s_cur = 0; + break; + + case IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST: + /* + * This case can only happen if a RDMA write is + * restarted. See ipath_restart_rc(). + */ + ipath_init_restart(qp, wqe); + /* FALLTHROUGH */ + case IB_OPCODE_RC_RDMA_WRITE_FIRST: + qp->s_state = IB_OPCODE_RC_RDMA_WRITE_MIDDLE; + /* FALLTHROUGH */ + case IB_OPCODE_RC_RDMA_WRITE_MIDDLE: + bth2 = qp->s_psn++ & 0xFFFFFF; + if ((int)(qp->s_psn - qp->s_next_psn) > 0) + qp->s_next_psn = qp->s_psn; + ss = &qp->s_sge; + len = qp->s_len; + if (len > pmtu) { + /* + * Request an ACK every 1/2 MB to avoid + * retransmit timeouts. + */ + if (((wqe->length - len) % (512 * 1024)) == 0) + bth2 |= 1 << 31; + len = pmtu; + break; + } + if (wqe->wr.opcode == IB_WR_RDMA_WRITE) + qp->s_state = IB_OPCODE_RC_RDMA_WRITE_LAST; + else { + qp->s_state = + IB_OPCODE_RC_RDMA_WRITE_LAST_WITH_IMMEDIATE; + /* Immediate data comes after the BTH */ + ohdr->u.imm_data = wqe->wr.imm_data; + hwords += 1; + if (wqe->wr.send_flags & IB_SEND_SOLICITED) + bth0 |= 1 << 23; + } + bth2 |= 1 << 31; /* Request ACK. */ + if (++qp->s_cur >= qp->s_size) + qp->s_cur = 0; + break; + + case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE: + /* + * This case can only happen if a RDMA read is + * restarted. See ipath_restart_rc(). + */ + ipath_init_restart(qp, wqe); + len = ((qp->s_psn - wqe->psn) & 0xFFFFFF) * pmtu; + ohdr->u.rc.reth.vaddr = + cpu_to_be64(wqe->wr.wr.rdma.remote_addr + len); + ohdr->u.rc.reth.rkey = + cpu_to_be32(wqe->wr.wr.rdma.rkey); + ohdr->u.rc.reth.length = cpu_to_be32(qp->s_len); + qp->s_state = IB_OPCODE_RC_RDMA_READ_REQUEST; + hwords += sizeof(ohdr->u.rc.reth) / 4; + bth2 = qp->s_psn++ & 0xFFFFFF; + if ((int)(qp->s_psn - qp->s_next_psn) > 0) + qp->s_next_psn = qp->s_psn; + ss = NULL; + len = 0; + if (++qp->s_cur == qp->s_size) + qp->s_cur = 0; + break; + + case IB_OPCODE_RC_RDMA_READ_REQUEST: + case IB_OPCODE_RC_COMPARE_SWAP: + case IB_OPCODE_RC_FETCH_ADD: + /* + * We shouldn't start anything new until this request + * is finished. The ACK will handle rescheduling us. + * XXX The number of outstanding ones is negotiated + * at connection setup time (see pg. 258,289)? + * XXX Also, if we support multiple outstanding + * requests, we need to check the WQE IB_SEND_FENCE + * flag and not send a new request if a RDMA read or + * atomic is pending. + */ + goto done; + } + qp->s_len -= len; + bth0 |= qp->s_state << 24; + /* XXX queue resend timeout. */ + } + /* Make sure it is non-zero before dropping the lock. */ + qp->s_hdrwords = hwords; + spin_unlock_irqrestore(&qp->s_lock, flags); + + /* Construct the header. 
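+ * Padding example: len == 7 gives extra_bytes = (4 - 7) & 3 = 1 and
+ * nwords = (7 + 1) >> 2 = 2, so the payload is rounded up to whole
+ * 32-bit words and the pad count lands in BTH bits 20-21 via
+ * "bth0 |= extra_bytes << 20".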
*/ + extra_bytes = (4 - len) & 3; + nwords = (len + extra_bytes) >> 2; + if (unlikely(qp->remote_ah_attr.ah_flags & IB_AH_GRH)) { + /* Header size in 32-bit words. */ + hwords += 10; + lrh0 = IPS_LRH_GRH; + qp->s_hdr.u.l.grh.version_tclass_flow = + cpu_to_be32((6 << 28) | + (qp->remote_ah_attr.grh.traffic_class << 20) | + qp->remote_ah_attr.grh.flow_label); + qp->s_hdr.u.l.grh.paylen = + cpu_to_be16(((hwords - 12) + nwords + SIZE_OF_CRC) << 2); + qp->s_hdr.u.l.grh.next_hdr = 0x1B; + qp->s_hdr.u.l.grh.hop_limit = qp->remote_ah_attr.grh.hop_limit; + /* The SGID is 32-bit aligned. */ + qp->s_hdr.u.l.grh.sgid.global.subnet_prefix = dev->gid_prefix; + qp->s_hdr.u.l.grh.sgid.global.interface_id = + ipath_layer_get_guid(dev->ib_unit); + qp->s_hdr.u.l.grh.dgid = qp->remote_ah_attr.grh.dgid; + qp->s_hdrwords = hwords; + } + qp->s_cur_sge = ss; + qp->s_cur_size = len; + lrh0 |= qp->remote_ah_attr.sl << 4; + qp->s_hdr.lrh[0] = cpu_to_be16(lrh0); + /* DEST LID */ + qp->s_hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid); + qp->s_hdr.lrh[2] = cpu_to_be16(hwords + nwords + SIZE_OF_CRC); + qp->s_hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->ib_unit)); + bth0 |= extra_bytes << 20; + ohdr->bth[0] = cpu_to_be32(bth0); + ohdr->bth[1] = cpu_to_be32(qp->remote_qpn); + ohdr->bth[2] = cpu_to_be32(bth2); + + /* Check for more work to do. */ + goto again; + +done: + spin_unlock_irqrestore(&qp->s_lock, flags); + clear_bit(IPATH_S_BUSY, &qp->s_flags); +} From bos at pathscale.com Wed Dec 28 16:31:39 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 28 Dec 2005 16:31:39 -0800 Subject: [openib-general] [PATCH 20 of 20] ipath - integrate driver into infiniband kbuild infrastructure In-Reply-To: Message-ID: <914136b2b8eed9417ce6.1135816299@eng-12.pathscale.com> Signed-off-by: Bryan O'Sullivan diff -r 07bf9f34e221 -r 914136b2b8ee drivers/infiniband/Kconfig --- a/drivers/infiniband/Kconfig Wed Dec 28 14:19:43 2005 -0800 +++ b/drivers/infiniband/Kconfig Wed Dec 28 14:19:43 2005 -0800 @@ -30,6 +30,7 @@ . source "drivers/infiniband/hw/mthca/Kconfig" +source "drivers/infiniband/hw/ipath/Kconfig" source "drivers/infiniband/ulp/ipoib/Kconfig" diff -r 07bf9f34e221 -r 914136b2b8ee drivers/infiniband/Makefile --- a/drivers/infiniband/Makefile Wed Dec 28 14:19:43 2005 -0800 +++ b/drivers/infiniband/Makefile Wed Dec 28 14:19:43 2005 -0800 @@ -1,4 +1,5 @@ obj-$(CONFIG_INFINIBAND) += core/ obj-$(CONFIG_INFINIBAND_MTHCA) += hw/mthca/ +obj-$(CONFIG_IPATH_CORE) += hw/ipath/ obj-$(CONFIG_INFINIBAND_IPOIB) += ulp/ipoib/ obj-$(CONFIG_INFINIBAND_SRP) += ulp/srp/ From bos at pathscale.com Wed Dec 28 16:31:30 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 28 Dec 2005 16:31:30 -0800 Subject: [openib-general] [PATCH 11 of 20] ipath - core driver, part 4 of 4 In-Reply-To: Message-ID: Signed-off-by: Bryan O'Sullivan diff -r c37b118ef806 -r e8af3873b0d9 drivers/infiniband/hw/ipath/ipath_driver.c --- a/drivers/infiniband/hw/ipath/ipath_driver.c Wed Dec 28 14:19:42 2005 -0800 +++ b/drivers/infiniband/hw/ipath/ipath_driver.c Wed Dec 28 14:19:43 2005 -0800 @@ -5408,3 +5408,1709 @@ return ret; } + +/* + * implemention of the ioctl to get the stats values from the driver + * The argument is the user address to which we do the copy_to_user() + */ +static int ipath_get_stats(struct infinipath_stats __user *ustats) +{ + int ret = 0; + + if ((ret = copy_to_user(ustats, &ipath_stats, sizeof(ipath_stats)))) { + _IPATH_DBG("copy_to_user error on driver stats\n"); + ret = -EFAULT; + } + + return ret; +} + +/* set a partition key. 
We can have up to 4 active at a time (other than + * the default, which is always allowed). This is somewhat tricky, since + * multiple ports may set the same key, so we reference count them, and + * clean up at exit. All 4 partition keys are packed into a single + * infinipath register. It's an error for a process to set the same + * pkey multiple times. We provide no mechanism to de-allocate a pkey + * at this time, we may eventually need to do that. + * I've used the atomic operations, and no locking, and only make a single + * pass through what's available. This should be more than adequate for + * some time. I'll think about spinlocks or the like if and as it's necessary + */ +static int ipath_set_partkey(struct ipath_portdata *pd, uint16_t key) +{ + struct ipath_devdata *dd; + int i, any = 0, pidx = -1; + uint16_t lkey = key & 0x7FFF; + + dd = &devdata[pd->port_unit]; + + if (lkey == (IPS_DEFAULT_P_KEY & 0x7FFF)) { + /* nothing to do; this key always valid */ + return 0; + } + + _IPATH_VDBG + ("p%u try to set pkey %hx, current keys %hx:%x %hx:%x %hx:%x %hx:%x\n", + pd->port_port, key, dd->ipath_pkeys[0], + atomic_read(&dd->ipath_pkeyrefs[0]), dd->ipath_pkeys[1], + atomic_read(&dd->ipath_pkeyrefs[1]), dd->ipath_pkeys[2], + atomic_read(&dd->ipath_pkeyrefs[2]), dd->ipath_pkeys[3], + atomic_read(&dd->ipath_pkeyrefs[3])); + + if (!lkey) { + _IPATH_PRDBG("p%u tries to set key 0, not allowed\n", + pd->port_port); + return -EINVAL; + } + + /* + * Set the full membership bit, because it has to be + * set in the register or the packet, and it seems + * cleaner to set in the register than to force all + * callers to set it. (see bug 4331) + */ + key |= 0x8000; + + for (i = 0; i < ARRAY_SIZE(pd->port_pkeys); i++) { + if (!pd->port_pkeys[i] && pidx == -1) + pidx = i; + if (pd->port_pkeys[i] == key) { + _IPATH_VDBG + ("p%u tries to set same pkey (%x) more than once\n", + pd->port_port, key); + return -EEXIST; + } + } + if (pidx == -1) { + _IPATH_DBG + ("All pkeys for port %u already in use, can't set %x\n", + pd->port_port, key); + return -EBUSY; + } + for (any = i = 0; i < ARRAY_SIZE(dd->ipath_pkeys); i++) { + if (!dd->ipath_pkeys[i]) { + any++; + continue; + } + if (dd->ipath_pkeys[i] == key) { + if (atomic_inc_return(&dd->ipath_pkeyrefs[i]) > 1) { + pd->port_pkeys[pidx] = key; + _IPATH_VDBG + ("p%u set key %x matches #%d, count now %d\n", + pd->port_port, key, i, + atomic_read(&dd->ipath_pkeyrefs[i])); + return 0; + } else { + /* lost race, decrement count, catch below */ + atomic_dec(&dd->ipath_pkeyrefs[i]); + _IPATH_VDBG + ("Lost race, count was 0, after dec, it's %d\n", + atomic_read(&dd->ipath_pkeyrefs[i])); + any++; + } + } + if ((dd->ipath_pkeys[i] & 0x7FFF) == lkey) { + /* + * It makes no sense to have both the limited and full + * membership PKEY set at the same time since the + * unlimited one will disable the limited one. + */ + return -EEXIST; + } + } + if (!any) { + _IPATH_DBG + ("port %u, all pkeys already in use, can't set %x\n", + pd->port_port, key); + return -EBUSY; + } + for (any = i = 0; i < ARRAY_SIZE(dd->ipath_pkeys); i++) { + if (!dd->ipath_pkeys[i] && + atomic_inc_return(&dd->ipath_pkeyrefs[i]) == 1) { + uint64_t pkey; + + /* for ipathstats, etc. 
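+ * The four 16-bit pkeys are then packed, key 0 in the low word, into
+ * the single 64-bit kr_partitionkey register (p[] abbreviates
+ * dd->ipath_pkeys[]):
+ *
+ *	pkey = p[0] | p[1] << 16 | p[2] << 32 | p[3] << 48;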
*/ + ipath_stats.sps_pkeys[i] = lkey; + pd->port_pkeys[pidx] = dd->ipath_pkeys[i] = key; + pkey = + (uint64_t) dd->ipath_pkeys[0] | + ((uint64_t) dd->ipath_pkeys[1] << 16) | + ((uint64_t) dd->ipath_pkeys[2] << 32) | + ((uint64_t) dd->ipath_pkeys[3] << 48); + _IPATH_PRDBG + ("p%u set key %x in #%d, portidx %d, new pkey reg %llx\n", + pd->port_port, key, i, pidx, pkey); + ipath_kput_kreg(pd->port_unit, kr_partitionkey, pkey); + + return 0; + } + } + _IPATH_DBG + ("port %u, all pkeys already in use 2nd pass, can't set %x\n", + pd->port_port, key); + return -EBUSY; +} + +/* + * stop_start == 0 disables receive on the port, for use in queue overflow + * conditions. stop_start==1 re-enables, and returns value of tail register, + * to be used to re-init the software copy of the head register + */ + +static int ipath_manage_rcvq(struct ipath_portdata * pd, uint16_t start_stop) +{ + struct ipath_devdata *dd; + /* + * This needs to be volatile, so that the compiler doesn't + * optimize away the read to the device's mapped memory. + */ + volatile uint64_t tval; + + dd = &devdata[pd->port_unit]; + _IPATH_PRDBG("%sabling rcv for unit %u port %u\n", + start_stop ? "en" : "dis", pd->port_unit, pd->port_port); + /* atomically clear receive enable port. */ + if (start_stop) { + /* + * on enable, force in-memory copy of the tail register + * to 0, so that protocol code doesn't have to worry + * about whether or not the chip has yet updated + * the in-memory copy or not on return from the system + * call. The chip always resets it's tail register back + * to 0 on a transition from disabled to enabled. + * This could cause a problem if software was broken, + * and did the enable w/o the disable, but eventually + * the in-memory copy will be updated and correct + * itself, even in the face of software bugs. + */ + *pd->port_rcvhdrtail_kvaddr = 0; + atomic_set_mask(1U << + (INFINIPATH_R_PORTENABLE_SHIFT + pd->port_port), + &dd->ipath_rcvctrl); + } else + atomic_clear_mask(1U << + (INFINIPATH_R_PORTENABLE_SHIFT + + pd->port_port), &dd->ipath_rcvctrl); + ipath_kput_kreg(pd->port_unit, kr_rcvctrl, dd->ipath_rcvctrl); + /* now be sure chip saw it before we return */ + tval = ipath_kget_kreg64(pd->port_unit, kr_scratch); + if (start_stop) { + /* + * and try to be sure that tail reg update has happened + * too. This should in theory interlock with the RXE + * changes to the tail register. Don't assign it to + * the tail register in memory copy, since we could + * overwrite an update by the chip if we did. + */ + tval = + ipath_kget_ureg32(pd->port_unit, ur_rcvhdrtail, + pd->port_port); + } + /* always; new head should be equal to new tail; see above */ + return 0; +} + +/* + * This routine is now quite different for user and kernel, because + * the kernel uses skb's, for the accelerated network performance + * This is the user port version + * + * allocate the eager TID buffers and program them into infinipath + * They are no longer completely contiguous, we do multiple + * alloc_pages() calls. 
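+ *
+ * Chunking arithmetic (illustrative numbers only): with 4KB pages
+ * the chunks below are 32KB, so a 4KB eager buffer size gives
+ *
+ *	egrperchunk = 32768 / 4096 = 8
+ *	chunk = (egrcnt + 8 - 1) / 8
+ *
+ * and 512 eager TIDs would need 64 chunks.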
+ */ +static int ipath_create_user_egr(struct ipath_portdata * pd) +{ + char *buf; + struct ipath_devdata *dd = &devdata[pd->port_unit]; + uint64_t __iomem *egrbase; + uint64_t egroff, lenvalid; + unsigned e, egrcnt, alloced, order, egrperchunk, chunk; + unsigned long pa, pent; + + egrcnt = dd->ipath_rcvegrcnt; + egroff = + dd->ipath_rcvegrbase + pd->port_port * egrcnt * sizeof(*egrbase); + egrbase = (uint64_t __iomem *) + ((char __iomem *)(dd->ipath_kregbase) + egroff); + _IPATH_VDBG("Allocating %d egr buffers, at chip offset %llx (%p)\n", + egrcnt, egroff, egrbase); + + /* + * to avoid wasting a lot of memory, we allocate 32KB chunks of + * physically contiguous memory, advance through it until used up + * and then allocate more. Of course, we need memory to store + * those extra pointers, now. Started out with 256KB, but under + * heavy memory pressure (creating large files and then copying + * them over NFS while doing lots of MPI jobs), we hit some + * alloc_pages() failures, even though we can sleep... (2.6.10) + * Still get failures at 64K. 32K is the lowest we can go without + * waiting more memory again. It seems likely that the coalescing + * in free_pages, etc. still has issues (as it has had previously + * during 2.6.x development). + */ + order = get_order(0x8000); + alloced = ALIGN(dd->ipath_rcvegrbufsize * egrcnt, + (1 << order) * PAGE_SIZE); + egrperchunk = ((1 << order) * PAGE_SIZE) / dd->ipath_rcvegrbufsize; + chunk = (egrcnt + egrperchunk - 1) / egrperchunk; + pd->port_rcvegrbuf_chunks = chunk; + pd->port_rcvegrbufs_perchunk = egrperchunk; + pd->port_rcvegrbuf_order = order; + pd->port_rcvegrbuf_pages = + vmalloc(chunk * sizeof(pd->port_rcvegrbuf_pages[0])); + pd->port_rcvegrbuf_virt = + vmalloc(chunk * sizeof(pd->port_rcvegrbuf_virt[0])); + if (!pd->port_rcvegrbuf_pages || !pd->port_rcvegrbuf_pages) { + _IPATH_UNIT_ERROR(pd->port_unit, + "Unable to allocate %u EGR buffer array pointers\n", + chunk); + if (pd->port_rcvegrbuf_pages) { + vfree(pd->port_rcvegrbuf_pages); + pd->port_rcvegrbuf_pages = NULL; + } + return -ENOMEM; + } + for (e = 0; e < pd->port_rcvegrbuf_chunks; e++) { + /* + * GFP_USER, but without GFP_FS, so buffer cache can + * be coalesced (we hope); otherwise, even at order 4, heavy + * filesystem activity makes these fail + */ + if (! + (pd->port_rcvegrbuf_pages[e] = + alloc_pages(__GFP_WAIT | __GFP_IO, order))) { + _IPATH_UNIT_ERROR(pd->port_unit, + "Unable to allocate EGR buffer array %u/%u\n", + e, pd->port_rcvegrbuf_chunks); + vfree(pd->port_rcvegrbuf_pages); + pd->port_rcvegrbuf_pages = NULL; + vfree(pd->port_rcvegrbuf_virt); + pd->port_rcvegrbuf_virt = NULL; + return -ENOMEM; + } + } + + /* + * calculate physical, then phys_to_virt() + * so that we get an address that fits in 64 bits, so we can use + * mmap64 from 32 bit programs on the chip and kernel virtual + * addresses (mmap64 for 32 bit programs on i386 and x86_64 + * only has 44 bits of address, because it uses mmap2()) + * We do this with the first chunk; We don't need a kernel + * virtually contiguous address to give the user virtually + * contiguous mappings. 
It just complicates the nopage routine + * a little tiny bit ;) + */ + buf = page_address(pd->port_rcvegrbuf_pages[0]); + pa = virt_to_phys(buf); + pd->port_rcvegr_phys = pa; + + /* in words */ + lenvalid = (dd->ipath_rcvegrbufsize - pd->port_egrskip) >> 2; + _IPATH_VDBG + ("port%u egrbuf vaddr %p, cpu %d, egrskip %u, len %llx words\n", + pd->port_port, buf, smp_processor_id(), pd->port_egrskip, + lenvalid); + lenvalid <<= INFINIPATH_RT_BUFSIZE_SHIFT; + lenvalid |= INFINIPATH_RT_VALID; + + for (e = chunk = 0; chunk < pd->port_rcvegrbuf_chunks; chunk++) { + int i, n; + struct page *p; + p = pd->port_rcvegrbuf_pages[chunk]; + pa = page_to_phys(p); + buf = page_address(p); + /* + * stash away for later use, since page_address() lookup + * is not cheap + */ + pd->port_rcvegrbuf_virt[chunk] = buf; + if (pa & ~INFINIPATH_RT_ADDR_MASK) + _IPATH_INFO + ("physaddr %lx has more than 40 bits, using only 40!\n", + pa); + n = 1 << pd->port_rcvegrbuf_order; + for (i = 0; i < n; i++) + SetPageReserved(virt_to_page(buf + (i * PAGE_SIZE))); + + /* clear buffer for security, sanity, and, debugging */ + memset(buf, 0, PAGE_SIZE * n); + + for (i = 0; e < egrcnt && i < egrperchunk; e++, i++) { + pent = ((pa + pd->port_egrskip) & + INFINIPATH_RT_ADDR_MASK) | lenvalid; + + ipath_kput_memq(pd->port_unit, &egrbase[e], pent); + _IPATH_VDBG("egr %u phys %lx val %lx\n", e, pa, pent); + pa += dd->ipath_rcvegrbufsize; + } + yield(); /* don't hog the cpu */ + } + + return 0; +} + +/* + * This routine is now quite different for user and kernel, because + * the kernel uses skb's, for the accelerated network performance + * This is the kernel (port0) version + * + * Allocate the eager TID buffers and program them into infinipath. + * We use the network layer alloc_skb() allocator to allocate the memory, and + * either use the buffers as is for things like SMA packets, or pass + * the buffers up to the ipath layered driver and thence the network layer, + * replacing them as we do so (see ipath_kreceive()) + */ +static int ipath_create_port0_egr(struct ipath_portdata * pd) +{ + int ret = 0; + uint64_t __iomem *egrbase; + uint64_t egroff; + unsigned e, egrcnt; + struct ipath_devdata *dd; + struct sk_buff **skbs; + + dd = &devdata[pd->port_unit]; + egrcnt = dd->ipath_rcvegrcnt; + egroff = dd->ipath_rcvegrbase + + pd->port_port * egrcnt * sizeof(*egrbase); + egrbase = (uint64_t __iomem *) ((char __iomem *)(dd->ipath_kregbase) + + egroff); + _IPATH_VDBG + ("unit%u Allocating %d egr buffers, at chip offset %llx (%p)\n", + pd->port_unit, egrcnt, egroff, egrbase); + + skbs = vmalloc(sizeof(*dd->ipath_port0_skbs) * egrcnt); + if (skbs == NULL) + ret = -ENOMEM; + else { + for (e = 0; e < egrcnt; e++) { + /* + * This is a bit tricky in that we allocate + * extra space for 2 bytes of the 14 byte + * ethernet header. These two bytes are passed + * in the ipath header so the rest of the data + * is word aligned. We allocate 4 bytes so that the + * data buffer stays word aligned. + * See ipath_kreceive() for more details. + */ + skbs[e] = + __dev_alloc_skb(dd->ipath_ibmaxlen + 4, GFP_KERNEL); + if (skbs[e] == NULL) { + _IPATH_UNIT_ERROR(pd->port_unit, + "SKB allocation error for eager TID %u\n", + e); + while (e != 0) + dev_kfree_skb(skbs[--e]); + ret = -ENOMEM; + break; + } + skb_reserve(skbs[e], 4); + } + } + /* + * after loop above, so we can test non-NULL + * to see if ready to use at receive, etc. Hope this fixes some + * panics. 
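
The programming loop above packs each eager-TID entry from a 40-bit physical address, a word count, and a valid bit. A sketch of that encoding (the shift and mask values below are placeholders for illustration, not the chip's real register layout):

#include <stdint.h>

#define RT_ADDR_MASK	0xFFFFFFFFFFULL	/* low 40 bits of physical address */
#define RT_BUFSIZE_SHIFT 48		/* placeholder field position */
#define RT_VALID	0x1ULL		/* placeholder valid bit */

static uint64_t make_egr_entry(uint64_t pa, uint32_t len_bytes)
{
	uint64_t words = len_bytes >> 2;	/* hardware counts 32-bit words */

	return (pa & RT_ADDR_MASK) | (words << RT_BUFSIZE_SHIFT) | RT_VALID;
}
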
+ */ + dd->ipath_port0_skbs = skbs; + + /* + * have to tell chip each time we init it + * even if we are re-using previous memory. + */ + if (!ret) { + uint64_t lenvalid; /* in words */ + + lenvalid = (dd->ipath_ibmaxlen - pd->port_egrskip) >> 2; + lenvalid <<= INFINIPATH_RT_BUFSIZE_SHIFT; + lenvalid |= INFINIPATH_RT_VALID; + for (e = 0; e < egrcnt; e++) { + unsigned long pa, pent; + + pa = virt_to_phys(dd->ipath_port0_skbs[e]->data); + pa += pd->port_egrskip; + if (!e && (pa & ~INFINIPATH_RT_ADDR_MASK)) + _IPATH_INFO + ("phys addr %lx has more than 40 bits, using only 40!!!\n", + pa); + pent = (pa & INFINIPATH_RT_ADDR_MASK) | lenvalid; + /* + * don't need this except extreme debugging, + * but leaving to save future typing. + * _IPATH_VDBG("egr[%d] %p <- %lx\n", e, &egrbase[e], pent); + */ + ipath_kput_memq(pd->port_unit, &egrbase[e], pent); + } + yield(); /* don't hog the cpu */ + } + + return ret; +} + +/* + * this *must* be physically contiguous memory, and for now, + * that limits it to what kmalloc can do. + */ +static int ipath_create_rcvhdrq(struct ipath_portdata * pd) +{ + int i, ret = 0, amt, order, pgs; + char *qt; + struct page *p; + unsigned long pa, pa0; + + amt = ALIGN(devdata[pd->port_unit].ipath_rcvhdrcnt * devdata[pd->port_unit].ipath_rcvhdrentsize * sizeof(uint32_t), PAGE_SIZE); + if (!pd->port_rcvhdrq) { + order = get_order(amt); + /* + * not using REPEAT isn't viable; at 128KB, we can easily fail + * this. The problem with REPEAT is we can block here + * "forever". There isn't an inbetween, unfortunately. + * We could reduce the risk by never freeing the rcvhdrq + * except at unload, but even then, the first time a + * port is used, we could delay for some time... + */ + p = alloc_pages(GFP_USER, order); + if (!p) { + _IPATH_UNIT_ERROR(pd->port_unit, + "attempt to allocate order %u memory for port %u rcvhdrq failed\n", + order, pd->port_port); + return -ENOMEM; + } + + /* + * should use kmap (and later kunmap), even though high mem will + * always be mapped on x86_64, to play it safe, but for some + * bizarre reason these aren't exported symbols... + */ + pd->port_rcvhdrq = page_address(p); + if (!virt_addr_valid(pd->port_rcvhdrq)) { + _IPATH_DBG + ("weird, virt_addr_valid false right after alloc_pages\n"); + _IPATH_DBG("__pa(%p) is %lx, num_physpages %lx\n", + pd->port_rcvhdrq, __pa(pd->port_rcvhdrq), + num_physpages); + } + pd->port_rcvhdrq_phys = virt_to_phys(pd->port_rcvhdrq); + pd->port_rcvhdrq_order = order; + + pa0 = pd->port_rcvhdrq_phys; + pgs = amt >> PAGE_SHIFT; + _IPATH_VDBG + ("%d pages at %p (phys %lx) order=%u for port %u rcvhdr Q\n", + pgs, pd->port_rcvhdrq, pa0, pd->port_rcvhdrq_order, + pd->port_port); + + /* + * verify it's really physically contiguous, to be paranoid + * also mark pages as reserved, to avoid problems when + * user process with them mapped then exits. 
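
The paranoia check that follows verifies one invariant: for a physically contiguous region, phys(base + i * PAGE_SIZE) must equal phys(base) + i * PAGE_SIZE for every page. Distilled into a kernel-context sketch (an illustration, not part of the patch):

static int region_is_contiguous(void *base, unsigned int pages)
{
	unsigned long pa0 = virt_to_phys(base);
	unsigned int i;

	for (i = 1; i < pages; i++)
		if (virt_to_phys((char *)base + i * PAGE_SIZE) !=
		    pa0 + i * PAGE_SIZE)
			return 0;	/* a page broke the run */
	return 1;
}
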
+ */ + qt = pd->port_rcvhdrq; + SetPageReserved(virt_to_page(qt)); + qt += PAGE_SIZE; + for (pa = pa0, i = 1; i < pgs; i++, qt += PAGE_SIZE) { + SetPageReserved(virt_to_page(qt)); + pa = virt_to_phys(qt); + if (pa != (pa0 + (i * PAGE_SIZE))) + _IPATH_INFO + ("pg %d at %p phys %lx not contiguous\n", i, + qt, pa); + else + _IPATH_VDBG("pg %d at %p phys %lx\n", i, qt, + pa); + } + } + + /* + * clear for security, sanity, and/or debugging (each time we + * use/reuse) + */ + memset(pd->port_rcvhdrq, 0, amt); + + /* + * tell chip each time we init it, even if we are re-using previous + * memory (we zero it at process close) + */ + _IPATH_VDBG("writing port %d rcvhdraddr as %lx\n", pd->port_port, + pd->port_rcvhdrq_phys); + ipath_kput_kreg_port(pd->port_unit, kr_rcvhdraddr, pd->port_port, + pd->port_rcvhdrq_phys); + + return ret; +} + +#ifdef _IPATH_EXTRA_DEBUG +/* + * occasionally useful to dump the full set of kernel registers for debugging. + */ +static void ipath_dump_allregs(char *what, ipath_type t) +{ + uint16_t reg; + _IPATH_DBG("%s\n", what); + for (reg = 0; reg <= 0x100; reg++) { + uint64_t v = ipath_kget_kreg64(t, reg); + if (!(reg % 4)) + printk("\n%3x: ", reg); + printk("%16llx ", v); + } + printk("\n"); +} +#endif /* _IPATH_EXTRA_DEBUG */ + +/* + * Do the actual initialization sequence on the chip. For the real + * hardware, this is done from the init routine called from the PCI + * infrastructure. + */ +int ipath_init_chip(const ipath_type t) +{ + int ret = 0, i; + uint32_t val32, kpiobufs; + uint64_t val, atmp; + uint32_t __iomem *piobuf; + uint32_t pioincr; + struct ipath_devdata *dd = &devdata[t]; + struct ipath_portdata *pd; + struct page *vpage; + char boardn[32]; + + /* first time only, set after static version info */ + if (!chip_driver_version) { + i = strlen(ipath_core_version); + chip_driver_version = ipath_core_version + i; + chip_driver_size = sizeof ipath_core_version - i; + } + + /* + * have to clear shadow copies of registers at init that are not + * otherwise set here, or all kinds of bizarre things happen with + * driver on chip reset + */ + dd->ipath_rcvhdrsize = 0; + + /* + * don't clear ipath_flags as 8bit mode was set before entering + * this func. 
However, we do set the linkstate to unknown + */ + + /* so we can watch for a transition */ + dd->ipath_flags |= IPATH_LINKUNK; + dd->ipath_flags &= ~(IPATH_LINKACTIVE | IPATH_LINKARMED | IPATH_LINKDOWN + | IPATH_LINKINIT); + + _IPATH_VDBG("Try to read spc chip revision\n"); + dd->ipath_revision = ipath_kget_kreg64(t, kr_revision); + + /* + * set up fundamental info we need to use the chip; we assume if + * the revision reg and these regs are OK, we don't need to special + * case the rest + */ + dd->ipath_sregbase = ipath_kget_kreg32(t, kr_sendregbase); + dd->ipath_cregbase = ipath_kget_kreg32(t, kr_counterregbase); + dd->ipath_uregbase = ipath_kget_kreg32(t, kr_userregbase); + _IPATH_VDBG("ipath_kregbase %p, sendbase %x usrbase %x, cntrbase %x\n", + dd->ipath_kregbase, dd->ipath_sregbase, dd->ipath_uregbase, + dd->ipath_cregbase); + if ((dd->ipath_revision & 0xffffffff) == 0xffffffff || + (dd->ipath_sregbase & 0xffffffff) == 0xffffffff || + (dd->ipath_cregbase & 0xffffffff) == 0xffffffff || + (dd->ipath_uregbase & 0xffffffff) == 0xffffffff) { + _IPATH_UNIT_ERROR(t, + "Register read failures from chip, giving up initialization\n"); + ret = -ENODEV; + goto done; + } + + /* clear the initial reset flag, in case first driver load */ + ipath_kput_kreg(t, kr_errorclear, INFINIPATH_E_RESET); + + dd->ipath_portcnt = ipath_kget_kreg32(t, kr_portcnt); + if (!infinipath_cfgports) + dd->ipath_cfgports = dd->ipath_portcnt; + else if (infinipath_cfgports <= dd->ipath_portcnt) { + dd->ipath_cfgports = infinipath_cfgports; + _IPATH_DBG("Configured to use %u ports out of %u in chip\n", + dd->ipath_cfgports, dd->ipath_portcnt); + } else { + dd->ipath_cfgports = dd->ipath_portcnt; + _IPATH_DBG + ("Tried to configure to use %u ports; chip only supports %u\n", + infinipath_cfgports, dd->ipath_portcnt); + } + dd->ipath_pd = kmalloc(sizeof(*dd->ipath_pd) * dd->ipath_cfgports, + GFP_KERNEL); + if (!dd->ipath_pd) { + _IPATH_UNIT_ERROR(t, + "Unable to allocate portdata array, failing\n"); + ret = -ENOMEM; + goto done; + } + memset(dd->ipath_pd, 0, sizeof(*dd->ipath_pd) * dd->ipath_cfgports); + + dd->ipath_lastegrheads = kmalloc(sizeof(*dd->ipath_lastegrheads) + * dd->ipath_cfgports, GFP_KERNEL); + dd->ipath_lastrcvhdrqtails = kmalloc(sizeof(*dd->ipath_lastrcvhdrqtails) + * dd->ipath_cfgports, GFP_KERNEL); + if (!dd->ipath_lastegrheads || !dd->ipath_lastrcvhdrqtails) { + _IPATH_UNIT_ERROR(t, + "Unable to allocate head arrays, failing\n"); + ret = -ENOMEM; + goto done; + } + memset(dd->ipath_lastrcvhdrqtails, 0, + sizeof(*dd->ipath_lastrcvhdrqtails) + * dd->ipath_cfgports); + memset(dd->ipath_lastegrheads, 0, sizeof(*dd->ipath_lastegrheads) + * dd->ipath_cfgports); + + dd->ipath_pd[0] = kmalloc(sizeof(struct ipath_portdata), GFP_KERNEL); + if (!dd->ipath_pd[0]) { + _IPATH_UNIT_ERROR(t, + "Unable to allocate portdata for port 0, failing\n"); + ret = -ENOMEM; + goto done; + } + memset(dd->ipath_pd[0], 0, sizeof(struct ipath_portdata)); + + pd = dd->ipath_pd[0]; + pd->port_unit = t; + pd->port_port = 0; + pd->port_cnt = 1; + /* The port 0 pkey table is used by the layer interface.
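
Several allocations above use the kmalloc()-then-memset() idiom. Where kcalloc() is available it collapses the pair and also overflow-checks the multiplication; a sketch of the equivalent call for the portdata array (a suggestion for comparison, not what the patch does):

	dd->ipath_pd = kcalloc(dd->ipath_cfgports, sizeof(*dd->ipath_pd),
			       GFP_KERNEL);
	if (!dd->ipath_pd)
		return -ENOMEM;	/* zeroed on success, no memset needed */
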
*/ + pd->port_pkeys[0] = IPS_DEFAULT_P_KEY; + + dd->ipath_rcvtidcnt = ipath_kget_kreg32(t, kr_rcvtidcnt); + dd->ipath_rcvtidbase = ipath_kget_kreg32(t, kr_rcvtidbase); + dd->ipath_rcvegrcnt = ipath_kget_kreg32(t, kr_rcvegrcnt); + dd->ipath_rcvegrbase = ipath_kget_kreg32(t, kr_rcvegrbase); + dd->ipath_palign = ipath_kget_kreg32(t, kr_pagealign); + dd->ipath_piobufbase = ipath_kget_kreg32(t, kr_sendpiobufbase); + dd->ipath_piosize = ipath_kget_kreg32(t, kr_sendpiosize); + dd->ipath_ibmtu = 4096; /* default to largest legal MTU */ + dd->ipath_piobcnt = ipath_kget_kreg32(t, kr_sendpiobufcnt); + dd->ipath_piobase = (((char __iomem *) dd->ipath_kregbase) + + (dd->ipath_piobufbase & 0xffffffff)); + + _IPATH_VDBG + ("Revision %llx (PCI %x), %u ports, %u tids, %u egrtids, %u piobufs\n", + dd->ipath_revision, dd->ipath_pcirev, dd->ipath_portcnt, + dd->ipath_rcvtidcnt, dd->ipath_rcvegrcnt, dd->ipath_piobcnt); + + if (((dd->ipath_revision >> INFINIPATH_R_SOFTWARE_SHIFT) & INFINIPATH_R_SOFTWARE_MASK) != IPATH_CHIP_SWVERSION) { /* >= maybe, someday */ + _IPATH_UNIT_ERROR(t, + "Driver only handles version %d, chip swversion is %d (%llx), failing\n", + IPATH_CHIP_SWVERSION, + (int)(dd-> + ipath_revision >> + INFINIPATH_R_SOFTWARE_SHIFT) & + INFINIPATH_R_SOFTWARE_MASK, + dd->ipath_revision); + ret = -ENOSYS; + goto done; + } + dd->ipath_majrev = (uint8_t) ((dd->ipath_revision >> + INFINIPATH_R_CHIPREVMAJOR_SHIFT) & + INFINIPATH_R_CHIPREVMAJOR_MASK); + dd->ipath_minrev = + (uint8_t) ((dd-> + ipath_revision >> INFINIPATH_R_CHIPREVMINOR_SHIFT) & + INFINIPATH_R_CHIPREVMINOR_MASK); + dd->ipath_boardrev = + (uint8_t) ((dd-> + ipath_revision >> INFINIPATH_R_BOARDID_SHIFT) & + INFINIPATH_R_BOARDID_MASK); + + ipath_get_boardname(t, boardn, sizeof boardn); + + { + snprintf(chip_driver_version, chip_driver_size, + "Driver %u.%u, %s, InfiniPath%u %u.%u, PCI %u, SW Compat %u\n", + IPATH_CHIP_VERS_MAJ, IPATH_CHIP_VERS_MIN, boardn, + (unsigned)(dd-> + ipath_revision >> INFINIPATH_R_ARCH_SHIFT) & + INFINIPATH_R_ARCH_MASK, dd->ipath_majrev, + dd->ipath_minrev, dd->ipath_pcirev, + (unsigned)(dd-> + ipath_revision >> + INFINIPATH_R_SOFTWARE_SHIFT) & + INFINIPATH_R_SOFTWARE_MASK); + + } + + _IPATH_DBG("%s", chip_driver_version); + + /* + * we ignore most issues after reporting them, but have to specially + * handle hardware-disabled chips. + */ + if (ipath_validate_rev(dd) == 2) { + ret = -EPERM; /* unique error, known to infinipath_init_one() */ + goto done; + } + + /* + * zero all the TID entries at startup. We do this for sanity, + * in case of a previous driver crash of some kind, and also + * because the chip powers up with these memories in an unknown + * state. Use portcnt, not cfgports, since this is for the full chip, + * not for current (possibly different) configuration value + * Chip Errata bug 6447 + */ + for (val32 = 0; val32 < dd->ipath_portcnt; val32++) + ipath_clear_tids(t, val32); + + dd->ipath_rcvhdrentsize = IPATH_RCVHDRENTSIZE; + /* we could bump this + * to allow for full rcvegrcnt + rcvtidcnt, but then it no + * longer nicely fits power of two, and since we now use + * alloc_pages, the rest would be wasted.
+ */ + dd->ipath_rcvhdrcnt = dd->ipath_rcvegrcnt; + /* + * setup offset of last valid entry in rcvhdrq, for various tests, to + * avoid calculating each time we need it + */ + dd->ipath_hdrqlast = + dd->ipath_rcvhdrentsize * (dd->ipath_rcvhdrcnt - 1); + ipath_kput_kreg(t, kr_rcvhdrentsize, dd->ipath_rcvhdrentsize); + ipath_kput_kreg(t, kr_rcvhdrcnt, dd->ipath_rcvhdrcnt); + /* + * not in ipath_rcvhdrsize, so user programs can set differently, but + * so any early packets see the default size. + */ + ipath_kput_kreg(t, kr_rcvhdrsize, IPATH_DFLT_RCVHDRSIZE); + + /* + * we "know" that this works + * out OK. It's actually a bit more than we need, but 2048+64 isn't + * quite enough for full size, and we want the +N to be a power of 2 + * to give us reasonable alignment and fit within page_alloc()'ed + * memory + */ + dd->ipath_rcvegrbufsize = dd->ipath_piosize; + + /* + * the min() check here is currently a nop, but it may not always be, + * depending on just how we do ipath_rcvegrbufsize + */ + dd->ipath_ibmaxlen = min(dd->ipath_piosize, dd->ipath_rcvegrbufsize); + dd->ipath_init_ibmaxlen = dd->ipath_ibmaxlen; + + /* + * set up the shadow copies of the piobufavail registers, which + * we compare against the chip registers for now, and the in + * memory DMA'ed copies of the registers. This has to be done + * early, before we calculate lastport, etc. + */ + val = dd->ipath_piobcnt; + /* + * calc number of pioavail registers, and save it; we have 2 bits + * per buffer + */ + dd->ipath_pioavregs = ALIGN(val, sizeof(uint64_t) * BITS_PER_BYTE / 2) / (sizeof(uint64_t) * BITS_PER_BYTE / 2); + if (dd->ipath_pioavregs > + (sizeof(dd->ipath_pioavailshadow) / + sizeof(dd->ipath_pioavailshadow[0]))) { + dd->ipath_pioavregs = + sizeof(dd->ipath_pioavailshadow) / + sizeof(dd->ipath_pioavailshadow[0]); + dd->ipath_piobcnt = dd->ipath_pioavregs * sizeof(uint64_t) * BITS_PER_BYTE >> 1; /* 2 bits/reg */ + _IPATH_INFO + ("Warning: %lld piobufs is too many to fit in shadow, only using %d\n", + val, dd->ipath_piobcnt); + } + + if (!infinipath_kpiobufs) { + /* have to have at least one, for SMA */ + kpiobufs = infinipath_kpiobufs = 1; + } else if (dd->ipath_piobcnt < + (dd->ipath_cfgports * IPATH_MIN_USER_PORT_BUFCNT)) { + _IPATH_INFO + ("Too few PIO buffers (%u) for %u ports to have %u each!\n", + dd->ipath_piobcnt, dd->ipath_cfgports, + IPATH_MIN_USER_PORT_BUFCNT); + kpiobufs = 1; /* reserve just the minimum for SMA/ether */ + } else + kpiobufs = infinipath_kpiobufs; + + if (kpiobufs > + (dd->ipath_piobcnt - + (dd->ipath_cfgports * IPATH_MIN_USER_PORT_BUFCNT))) { + i = dd->ipath_piobcnt - + (dd->ipath_cfgports * IPATH_MIN_USER_PORT_BUFCNT); + if (i < 0) + i = 0; + _IPATH_INFO + ("Allocating %d PIO bufs for kernel leaves too few for %d user ports (%d each); using %u\n", + kpiobufs, dd->ipath_cfgports - 1, + IPATH_MIN_USER_PORT_BUFCNT, i); + /* + * shouldn't change infinipath_kpiobufs, because could be + * different for different devices... + */ + kpiobufs = i; + } + dd->ipath_lastport_piobuf = dd->ipath_piobcnt - kpiobufs; + dd->ipath_pbufsport = dd->ipath_cfgports > 1 ? 
+ dd->ipath_lastport_piobuf / (dd->ipath_cfgports - 1) : 0; + val32 = dd->ipath_lastport_piobuf - + (dd->ipath_pbufsport * (dd->ipath_cfgports - 1)); + if (val32 > 0) { + _IPATH_DBG + ("allocating %u pbufs/port leaves %u unused, add to kernel\n", + dd->ipath_pbufsport, val32); + dd->ipath_lastport_piobuf -= val32; + _IPATH_DBG("%u pbufs/port leaves %u unused, add to kernel\n", + dd->ipath_pbufsport, val32); + } + dd->ipath_lastpioindex = dd->ipath_lastport_piobuf; + _IPATH_VDBG + ("%d PIO bufs %u - %u, %u each for %u user ports\n", + kpiobufs, dd->ipath_lastport_piobuf, dd->ipath_piobcnt, dd->ipath_pbufsport, + dd->ipath_cfgports - 1); + + /* + * this has to be page aligned, and on a page of its own, so we + * can map it into user space. We also use it to give processes + * a copy of ipath_statusp, on a separate cacheline, followed by + * a copy of the freeze error string, if it's happened. Might also + * use that space for other things. + */ + val = ALIGN(2 * L1_CACHE_BYTES + sizeof(*dd->ipath_statusp) + + dd->ipath_pioavregs * sizeof(uint64_t), 2 * PAGE_SIZE); + if (!(dd->ipath_pioavailregs_dma = kmalloc(val * sizeof(uint64_t), + GFP_KERNEL))) { + _IPATH_UNIT_ERROR(t, + "failed to allocate PIOavail reg area in memory\n"); + ret = -ENOMEM; + goto done; + } + if ((PAGE_SIZE - 1) & (uint64_t) dd->ipath_pioavailregs_dma) { + dd->__ipath_pioavailregs_base = dd->ipath_pioavailregs_dma; + dd->ipath_pioavailregs_dma = (uint64_t *) + ALIGN((uint64_t) dd->ipath_pioavailregs_dma, PAGE_SIZE); + } else + dd->__ipath_pioavailregs_base = dd->ipath_pioavailregs_dma; + /* + * zero initial, since whole thing mapped + * into user space, and don't want info leak, or confusing garbage + */ + memset((void *)dd->ipath_pioavailregs_dma, 0, PAGE_SIZE); + + /* + * we really want L2 cache aligned, but for current CPUs of interest, + * they are the same. + */ + dd->ipath_statusp = (uint64_t *) ((char *)dd->ipath_pioavailregs_dma + + ((2 * L1_CACHE_BYTES + + dd->ipath_pioavregs * + sizeof(uint64_t)) & + ~L1_CACHE_BYTES)); + /* copy the current value now that it's really allocated */ + *dd->ipath_statusp = dd->_ipath_status; + /* + * setup buffer to hold freeze msg, accessible to apps, following + * statusp + */ + dd->ipath_freezemsg = (char *)&dd->ipath_statusp[1]; + /* and its length */ + dd->ipath_freezelen = L1_CACHE_BYTES - sizeof(dd->ipath_statusp[0]); + + atmp = virt_to_phys(dd->ipath_pioavailregs_dma); + /* stash physical address for user progs */ + dd->ipath_pioavailregs_phys = atmp; + (void)ipath_kput_kreg(t, kr_sendpioavailaddr, atmp); + /* + * this is to detect s/w errors, which the h/w works around by + * ignoring the low 6 bits of address, if it wasn't aligned. + */ + val = ipath_kget_kreg64(t, kr_sendpioavailaddr); + if (val != atmp) { + _IPATH_UNIT_ERROR(t, + "Catastrophic software error, SendPIOAvailAddr written as %llx, read back as %llx\n", + atmp, val); + ret = -EINVAL; + goto done; + } + + if (t * 64 > (sizeof(ipath_port0_rcvhdrtail) - 64)) { + _IPATH_UNIT_ERROR(t, + "unit %u too large for port 0 rcvhdrtail buffer size\n", + t); + ret = -ENODEV; + } + + /* + * kernel modules loaded into vmalloc'ed memory, + * verify that when we assume that, map to phys, and back to virt, + * that we get the right contents, so we did the mapping right.
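
The sanity check that follows leans on vmalloc_to_page() to map a vmalloc'ed kernel address back to its struct page. The full round trip to a physical address looks like this (a kernel-context sketch, assuming the address is valid and mapped):

static unsigned long vmalloc_phys(void *vaddr)
{
	struct page *p = vmalloc_to_page(vaddr);

	/* page frame's physical address plus the offset within the page */
	return page_to_phys(p) + ((unsigned long)vaddr & ~PAGE_MASK);
}
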
+ */ + vpage = vmalloc_to_page((void *)ipath_port0_rcvhdrtail); + if (vpage == NOPAGE_SIGBUS || vpage == NOPAGE_OOM) { + _IPATH_UNIT_ERROR(t, "vmalloc_to_page for rcvhdrtail fails!\n"); + ret = -ENOMEM; + goto done; + } + + /* + * 64 is driven by cache line size, and also by chip requirement + * that low 6 bits be 0 + */ + val = page_to_phys(vpage) + t * 64; + + /* verify that the alignment requirement was met */ + ipath_kput_kreg_port(t, kr_rcvhdrtailaddr, 0, val); + atmp = ipath_kget_kreg64_port(t, kr_rcvhdrtailaddr, 0); + if (val != atmp) { + _IPATH_UNIT_ERROR(t, + "Catastrophic software error, RcvHdrTailAddr0 written as %llx, read back as %llx from %x\n", + val, atmp, kr_rcvhdrtailaddr); + ret = -EINVAL; + goto done; + } + /* so we can get current tail in ipath_kreceive(), per chip */ + dd->ipath_hdrqtailptr = + &ipath_port0_rcvhdrtail[t * + (64 / sizeof(ipath_port0_rcvhdrtail[0]))]; + + ipath_kput_kreg(t, kr_rcvbthqp, IPATH_KD_QP); + + /* + * make sure we are not in freeze, and PIO send enabled, so + * writes to pbc happen + */ + ipath_kput_kreg(t, kr_hwerrmask, 0ULL); + ipath_kput_kreg(t, kr_hwerrclear, -1LL); + ipath_kput_kreg(t, kr_control, 0ULL); + ipath_kput_kreg(t, kr_sendctrl, INFINIPATH_S_PIOENABLE); + + /* + * write the pbc of each buffer, to be sure it's initialized, then + * cancel all the buffers, and also abort any packets that might + * have been in flight for some reason (the latter is for driver + * unload/reload, but isn't a bad idea at first init). + * PIO send isn't enabled at this point, so there is no danger + * of sending these out on the wire. + * Chip Errata bug 6610 + */ + piobuf = (uint32_t __iomem *) (((char __iomem *)(dd->ipath_kregbase)) + + dd->ipath_piobufbase); + pioincr = devdata[t].ipath_palign / sizeof(*piobuf); + for (i = 0; i < dd->ipath_piobcnt; i++) { + writel(16, piobuf); /* reasonable word count, just to init pbc */ + piobuf += pioincr; + } + /* self-clearing */ + ipath_kput_kreg(t, kr_sendctrl, INFINIPATH_S_ABORT); + + /* + * before error clears, since we expect serdes pll errors during + * this, the first time after reset + */ + if (ipath_bringup_link(t)) { + _IPATH_INFO("Failed to bringup IB link\n"); + ret = -ENETDOWN; + goto done; + } + + /* + * clear any "expected" hwerrs from reset and/or initialization + * clear any that aren't enabled (at least this once), and then + * set the enable mask + */ + ipath_clear_init_hwerrs(t); + ipath_kput_kreg(t, kr_hwerrclear, -1LL); + ipath_kput_kreg(t, kr_hwerrmask, dd->ipath_hwerrmask); + + dd->ipath_maskederrs = dd->ipath_ignorederrs; + ipath_kput_kreg(t, kr_errorclear, -1LL); /* clear all */ + /* enable errors that are masked, at least this first time. */ + ipath_kput_kreg(t, kr_errormask, ~dd->ipath_maskederrs); + /* clear any interrupts up to this point (ints still not enabled) */ + ipath_kput_kreg(t, kr_intclear, -1LL); + + ipath_stats.sps_lid[t] = dd->ipath_lid; + + /* + * allocate the shadow TID array, so we can ipath_putpages + * previous entries.
It may make more sense to move the pageshadow + * to the port data structure, so we only allocate memory for ports + * actually in use, since we are at 8k per port now + */ + dd->ipath_pageshadow = (struct page **) + vmalloc(dd->ipath_cfgports * dd->ipath_rcvtidcnt * + sizeof(struct page *)); + if (!dd->ipath_pageshadow) + _IPATH_UNIT_ERROR(t, + "failed to allocate shadow page * array, no expected sends!\n"); + else + memset(dd->ipath_pageshadow, 0, + dd->ipath_cfgports * dd->ipath_rcvtidcnt * + sizeof(struct page *)); + + /* set up the port 0 (kernel) rcvhdr q and egr TIDs */ + if (!(ret = ipath_create_rcvhdrq(dd->ipath_pd[0]))) + ret = ipath_create_port0_egr(dd->ipath_pd[0]); + if (ret) + _IPATH_UNIT_ERROR(t, + "failed to allocate port 0 (kernel) rcvhdrq and/or egr bufs\n"); + else { + init_waitqueue_head(&ipath_sma_wait); + init_waitqueue_head(&ipath_sma_state_wait); + + ipath_kput_kreg(pd->port_unit, kr_rcvctrl, dd->ipath_rcvctrl); + + ipath_kput_kreg(t, kr_rcvbthqp, IPATH_KD_QP); + + /* Enable PIO send, and update of PIOavail regs to memory. */ + dd->ipath_sendctrl = INFINIPATH_S_PIOENABLE + | INFINIPATH_S_PIOBUFAVAILUPD; + ipath_kput_kreg(t, kr_sendctrl, dd->ipath_sendctrl); + + /* + * enable port 0 receive, and receive interrupt + * other ports done as user opens and inits them + */ + dd->ipath_rcvctrl = INFINIPATH_R_TAILUPD | + (1ULL << INFINIPATH_R_PORTENABLE_SHIFT) | + (1ULL << INFINIPATH_R_INTRAVAIL_SHIFT); + ipath_kput_kreg(t, kr_rcvctrl, dd->ipath_rcvctrl); + + /* + * now ready for use + * this should be cleared whenever we detect a reset, or + * initiate one. + */ + dd->ipath_flags |= IPATH_INITTED; + + /* + * init our shadow copies of head from tail values, and write + * head values to match + */ + val32 = ipath_kget_ureg32(t, ur_rcvegrindextail, 0); + (void)ipath_kput_ureg(t, ur_rcvegrindexhead, val32, 0); + dd->ipath_port0head = ipath_kget_ureg32(t, ur_rcvhdrtail, 0); + (void)ipath_kput_ureg(t, ur_rcvhdrhead, dd->ipath_port0head, 0); + + /* + * by now pioavail updates to memory should have occurred, + * so copy them into our working/shadow registers; this is + * in case something went wrong with abort, but mostly to + * get the initial values of the generation bit correct + */ + for (i = 0; i < dd->ipath_pioavregs; i++) { + /* + * Chip Errata bug 6641; even and odd qwords>3 + * are swapped + */ + if (i > 3) { + if (i & 1) + dd->ipath_pioavailshadow[i] = + dd->ipath_pioavailregs_dma[i - 1]; + else + dd->ipath_pioavailshadow[i] = + dd->ipath_pioavailregs_dma[i + 1]; + } else + dd->ipath_pioavailshadow[i] = + dd->ipath_pioavailregs_dma[i]; + } + /* can get counters, stats, etc.
*/ + dd->ipath_flags |= IPATH_PRESENT; + } + + /* + * cause retrigger of pending interrupts ignored during init, even if + * we had errors + */ + ipath_kput_kreg(t, kr_intclear, 0ULL); + + /* + * set up stats retrieval timer, even if we had errors in last + * portion of setup + */ + init_timer(&dd->ipath_stats_timer); + dd->ipath_stats_timer.function = ipath_get_faststats; + dd->ipath_stats_timer.data = (unsigned long)t; + /* every 5 seconds; */ + dd->ipath_stats_timer.expires = jiffies + 5 * HZ; + /* takes ~16 seconds to overflow at full IB 4x bandwidth */ + add_timer(&dd->ipath_stats_timer); + + dd->ipath_stats_timer_active = 1; + +done: + if (!ret) { + ipath_get_guid(t); + *dd->ipath_statusp |= IPATH_STATUS_CHIP_PRESENT; + if (!ipath_sma_data_spare) { + /* first init, setup SMA data structs */ + ipath_sma_data_spare = + ipath_sma_data_bufs[IPATH_NUM_SMAPKTS]; + for (i = 0; i < IPATH_NUM_SMAPKTS; i++) + ipath_sma_data[i].buf = ipath_sma_data_bufs[i]; + } + /* + * sps_nports is a global, so, we set it to the highest + * number of ports of any of the chips we find; we never + * decrement it, at least for now. + */ + if (dd->ipath_cfgports > ipath_stats.sps_nports) + ipath_stats.sps_nports = dd->ipath_cfgports; + } + /* if ret is non-zero, we probably should do some cleanup here... */ + return ret; +} + +int ipath_waitfor_complete(const ipath_type t, ipath_kreg reg_id, + uint64_t bits_to_wait_for, uint64_t * valp) +{ + uint64_t timeout, lastval, val; + + lastval = ipath_kget_kreg64(t, reg_id); + timeout = get_cycles() + 0x10000000ULL; /* <- ridiculously long time */ + do { + val = ipath_kget_kreg64(t, reg_id); + *valp = val; /* so they have something, even on failures. */ + if ((val & bits_to_wait_for) == bits_to_wait_for) + return 0; + if (val != lastval) + _IPATH_VDBG + ("Changed from %llx to %llx, waiting for %llx bits\n", + lastval, val, bits_to_wait_for); + yield(); + if (get_cycles() > timeout) { + _IPATH_DBG + ("Didn't get bits %llx in register 0x%x, got %llx\n", + bits_to_wait_for, reg_id, *valp); + return ENODEV; + } + } while (1); +} + +/* + * like ipath_waitfor_complete(), but we wait for the CMDVALID bit to go away + * indicating the last command has completed. It doesn't return data + */ +int ipath_waitfor_mdio_cmdready(const ipath_type t) +{ + uint64_t timeout; + uint64_t val; + + timeout = get_cycles() + 0x10000000ULL; /* <- ridiculously long time */ + do { + val = ipath_kget_kreg64(t, kr_mdio); + if (!(val & IPATH_MDIO_CMDVALID)) + return 0; + yield(); + if (get_cycles() > timeout) { + _IPATH_DBG("CMDVALID stuck in mdio reg? (%llx)\n", val); + return ENODEV; + } + } while (1); +} + +void ipath_set_ib_lstate(const ipath_type t, int which) +{ + struct ipath_devdata *dd = &devdata[t]; + char *what; + + /* + * For all cases, we'll either be setting a new value of linkcmd, or + * we want it to be NOP, so clear it here.
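
ipath_waitfor_complete() and ipath_waitfor_mdio_cmdready() above share one shape: poll a register, yield(), and give up after a cycle-count deadline. Distilled into a sketch (read_reg() is a hypothetical accessor; note the sketch returns the conventional negative errno, where the driver above returns a positive ENODEV):

static int poll_until_set(uint64_t (*read_reg)(void), uint64_t bits)
{
	uint64_t timeout = get_cycles() + 0x10000000ULL;

	for (;;) {
		if ((read_reg() & bits) == bits)
			return 0;		/* all requested bits set */
		if (get_cycles() > timeout)
			return -ENODEV;		/* deadline passed */
		yield();			/* don't hog the cpu */
	}
}
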
+ * Similarly, we want the linkinitcmd to be NOP for everything + * other than explicitly changing linkinitcmd, + * and for that case, we want to first clear any existing bits + */ + dd->ipath_ibcctrl &= ~((INFINIPATH_IBCC_LINKCMD_MASK << + INFINIPATH_IBCC_LINKCMD_SHIFT) | + (INFINIPATH_IBCC_LINKINITCMD_MASK << + INFINIPATH_IBCC_LINKINITCMD_SHIFT)); + + if (which == INFINIPATH_IBCC_LINKCMD_INIT) { + dd->ipath_flags &= ~(IPATH_LINK_TOARMED | IPATH_LINK_TOACTIVE + | IPATH_LINK_SLEEPING); + /* so we can watch for a transition */ + dd->ipath_flags |= IPATH_LINKDOWN; + what = "INIT"; + } else if (which == INFINIPATH_IBCC_LINKCMD_ARMED) { + dd->ipath_flags |= IPATH_LINK_TOARMED; + dd->ipath_flags &= ~(IPATH_LINK_TOACTIVE | IPATH_LINK_SLEEPING); + /* + * this is mainly for loopback testing. If INITCMD is + * NOP or SLEEP, the link won't ever come up in loopback... + */ + if (! + (dd-> + ipath_flags & (IPATH_LINKINIT | IPATH_LINKARMED | + IPATH_LINKACTIVE))) { + _IPATH_SMADBG + ("going to armed, but link not yet up, set POLL\n"); + dd->ipath_ibcctrl |= + INFINIPATH_IBCC_LINKINITCMD_POLL << + INFINIPATH_IBCC_LINKINITCMD_SHIFT; + } + what = "ARMED"; + } else if (which == INFINIPATH_IBCC_LINKCMD_ACTIVE) { + dd->ipath_flags |= IPATH_LINK_TOACTIVE; + dd->ipath_flags &= ~(IPATH_LINK_TOARMED | IPATH_LINK_SLEEPING); + what = "ACTIVE"; + } else if (which & (INFINIPATH_IBCC_LINKINITCMD_MASK << INFINIPATH_IBCC_LINKINITCMD_SHIFT)) { /* down, disable, etc. */ + dd->ipath_flags &= ~(IPATH_LINK_TOARMED | IPATH_LINK_TOACTIVE); + if (((which & INFINIPATH_IBCC_LINKINITCMD_MASK) >> + INFINIPATH_IBCC_LINKINITCMD_SHIFT) == + INFINIPATH_IBCC_LINKINITCMD_SLEEP) { + dd->ipath_flags |= IPATH_LINK_SLEEPING | IPATH_LINKDOWN; + } else + dd->ipath_flags |= IPATH_LINKDOWN; + dd->ipath_ibcctrl |= + which & (INFINIPATH_IBCC_LINKINITCMD_MASK << + INFINIPATH_IBCC_LINKINITCMD_SHIFT); + what = "DOWN"; + } else { + what = "UNKNOWN"; + _IPATH_INFO("Unknown link transition requested (which=0x%x)\n", + which); + } + + dd->ipath_ibcctrl |= ((uint64_t) which & INFINIPATH_IBCC_LINKCMD_MASK) + << INFINIPATH_IBCC_LINKCMD_SHIFT; + + _IPATH_SMADBG("Trying to move unit %u to %s, current ltstate is %s\n", + t, what, ipath_ibcstatus_str[(ipath_kget_kreg64(t, kr_ibcstatus) + >> INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) + & INFINIPATH_IBCS_LINKTRAININGSTATE_MASK]); + ipath_kput_kreg(t, kr_ibcctrl, dd->ipath_ibcctrl); +} + +static int ipath_bringup_link(const ipath_type t) +{ + struct ipath_devdata *dd = &devdata[t]; + uint64_t val, ibc; + int ret = 0; + + dd->ipath_control &= ~INFINIPATH_C_LINKENABLE; /* hold IBC in reset */ + ipath_kput_kreg(t, kr_control, dd->ipath_control); + + /* + * Note that prior to try 14 or 15 of IB, the credit scaling + * wasn't working, because it was swapped for writes with the + * 1 bit default linkstate field + */ + + /* ignore pbc and align word */ + val = dd->ipath_piosize - 2 * sizeof(uint32_t); + /* + * for ICRC, which we only send in diag test pkt mode, and we don't + * need to worry about that for mtu + */ + val += 1; + /* + * set the IBC maxpktlength to the size of our pio buffers + * the maxpktlength is in words. This is *not* the IB data MTU + */ + ibc = (val / sizeof(uint32_t)) << INFINIPATH_IBCC_MAXPKTLEN_SHIFT; + /* in KB */ + ibc |= 0x5ULL << INFINIPATH_IBCC_FLOWCTRLWATERMARK_SHIFT; + /* how often flowctrl sent + * more or less in usecs; balance against watermark value, so that + * in theory senders always get a flow control update in time to not + * let the IB link go idle.
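
The link-state update in ipath_set_ib_lstate() above is a read-modify-write on a shadowed control register: clear both command fields, OR in the new command, then write the shadow back to the chip. The bare pattern, as a sketch (the mask/shift names and write_reg() below are placeholders):

static void set_link_cmd(uint64_t *shadow, uint64_t cmd)
{
	*shadow &= ~(LINKCMD_MASK << LINKCMD_SHIFT);	/* drop old command */
	*shadow |= (cmd & LINKCMD_MASK) << LINKCMD_SHIFT;
	write_reg(kr_ibcctrl, *shadow);			/* hypothetical writer */
}
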
+ */ + ibc |= 0x3ULL << INFINIPATH_IBCC_FLOWCTRLPERIOD_SHIFT; + /* max error tolerance */ + ibc |= 0xfULL << INFINIPATH_IBCC_PHYERRTHRESHOLD_SHIFT; + /* use "real" buffer space for */ + ibc |= 4ULL << INFINIPATH_IBCC_CREDITSCALE_SHIFT; + /* IB credit flow control. */ + ibc |= 0xfULL << INFINIPATH_IBCC_OVERRUNTHRESHOLD_SHIFT; + /* initially come up waiting for TS1, without sending anything. */ + dd->ipath_ibcctrl = ibc; + /* don't put linkinitcmd in ipath_ibcctrl, want that to stay a NOP */ + ibc |= + INFINIPATH_IBCC_LINKINITCMD_SLEEP << + INFINIPATH_IBCC_LINKINITCMD_SHIFT; + dd->ipath_flags |= IPATH_LINK_SLEEPING; + ipath_kput_kreg(t, kr_ibcctrl, ibc); + + ret = ipath_bringup_serdes(t); + + if (ret) + _IPATH_INFO("Could not initialize SerDes, not usable\n"); + else { + dd->ipath_control |= INFINIPATH_C_LINKENABLE; /* enable IBC */ + ipath_kput_kreg(t, kr_control, dd->ipath_control); + } + + return ret; +} + +/* + * called from ipath_shutdown_link(), and from sma doing a LINKDOWN + * Left as a separate function for historical reasons, and may want + * it to do more than just call ipath_set_ib_lstate() again sometime + * in the future. + */ +void ipath_down_link(const ipath_type t) +{ + ipath_set_ib_lstate(t, INFINIPATH_IBCC_LINKINITCMD_SLEEP << + INFINIPATH_IBCC_LINKINITCMD_SHIFT); +} + +/* + * do this when driver is being unloaded, or perhaps for diags, and + * maybe when we get an interrupt of a fatal link error that requires + * bringing the link down and back up + */ +static int ipath_shutdown_link(const ipath_type t) +{ + uint64_t val; + struct ipath_devdata *dd = &devdata[t]; + int ret = 0; + + _IPATH_DBG("Shutting down the link\n"); + ipath_down_link(t); + + /* + * we are shutting down, so tell the layered driver. We don't + * do this on just a link state change, much like ethernet, + * a cable unplug, etc. doesn't change driver state + */ + if (dd->ipath_layer.l_intr) + dd->ipath_layer.l_intr(t, IPATH_LAYER_INT_IF_DOWN); + + dd->ipath_control &= ~INFINIPATH_C_LINKENABLE; /* disable IBC */ + ipath_kput_kreg(t, kr_control, dd->ipath_control); + + *dd->ipath_statusp &= ~(IPATH_STATUS_IB_CONF | IPATH_STATUS_IB_READY); + + /* + * clear SerdesEnable and turn the leds off; do this here because + * we are unloading, so don't count on interrupts to move along + */ + + ipath_quiet_serdes(t); + val = dd->ipath_extctrl & + ~(INFINIPATH_EXTC_LEDPRIPORTGREENON | + INFINIPATH_EXTC_LEDPRIPORTYELLOWON); + dd->ipath_extctrl = val; + ipath_kput_kreg(t, kr_extctrl, val); + + if (dd->ipath_stats_timer_active) { + del_timer_sync(&dd->ipath_stats_timer); + dd->ipath_stats_timer_active = 0; + } + if (*dd->ipath_statusp & IPATH_STATUS_CHIP_PRESENT) { + /* can't do anything more with chip */ + /* needs re-init */ + *dd->ipath_statusp &= ~IPATH_STATUS_CHIP_PRESENT; + if (dd->ipath_kregbase) { + /* + * if we haven't already cleaned up before these + * are to ensure any register reads/writes "fail" + * until re-init + */ + dd->ipath_kregbase = NULL; + dd->ipath_kregvirt = NULL; + dd->ipath_uregbase = 0ULL; + dd->ipath_sregbase = 0ULL; + dd->ipath_cregbase = 0ULL; + dd->ipath_kregsize = 0; + } +#ifdef CONFIG_MTRR + if (dd->ipath_mtrr) { + _IPATH_VDBG("undoing WCCOMB on pio buffers\n"); + mtrr_del(dd->ipath_mtrr, 0, 0); + dd->ipath_mtrr = 0; + } +#endif + } + + return ret; +} + +/* + * when closing, free up any allocated data for a port, if the + * reference count goes to zero + * Note: this also frees the portdata itself!
+ */ +void ipath_free_pddata(struct ipath_devdata * dd, uint32_t port, int freehdrq) +{ + struct ipath_portdata *pd = dd->ipath_pd[port]; + + if (!pd) + return; + if (freehdrq) + /* + * only clear and free portdata if we are going to + * also release the hdrq, otherwise we leak the hdrq on each + * open/close cycle + */ + dd->ipath_pd[port] = NULL; + /* cleanup locked pages private data structures */ + ipath_upages_cleanup(pd); + if (freehdrq && pd->port_rcvhdrq) { + int i, n = 1 << pd->port_rcvhdrq_order; + _IPATH_VDBG("free closed port %d rcvhdrq @ %p (order=%u)\n", + pd->port_port, pd->port_rcvhdrq, + pd->port_rcvhdrq_order); + for (i = 0; i < n; i++) + ClearPageReserved(virt_to_page + (pd->port_rcvhdrq + (i * PAGE_SIZE))); + free_pages((unsigned long)pd->port_rcvhdrq, + pd->port_rcvhdrq_order); + pd->port_rcvhdrq = NULL; + } + if (port && pd->port_rcvegrbuf_pages) { /* always free this, however */ + void *virt; + unsigned e, i, n = 1 << pd->port_rcvegrbuf_order; + if (pd->port_rcvegrbuf_virt) { + for (e = 0; e < pd->port_rcvegrbuf_chunks; e++) { + virt = pd->port_rcvegrbuf_virt[e]; + for (i = 0; i < n; i++) + ClearPageReserved(virt_to_page + (virt + + (i * PAGE_SIZE))); + _IPATH_VDBG + ("egrbuf free_pages(%p, %x), chunk %u/%u\n", + virt, pd->port_rcvegrbuf_order, e, + pd->port_rcvegrbuf_chunks); + free_pages((unsigned long)virt, + pd->port_rcvegrbuf_order); + } + vfree(pd->port_rcvegrbuf_virt); + pd->port_rcvegrbuf_virt = NULL; + } + pd->port_rcvegrbuf_chunks = 0; + _IPATH_VDBG("free closed port %d rcvegrbufs ptr array\n", + pd->port_port); + /* now the pointer array. */ + vfree(pd->port_rcvegrbuf_pages); + pd->port_rcvegrbuf_pages = NULL; + } else if (port == 0 && dd->ipath_port0_skbs) { + unsigned e; + struct sk_buff **skbs = dd->ipath_port0_skbs; + + dd->ipath_port0_skbs = NULL; + _IPATH_VDBG("free closed port %d ipath_port0_skbs @ %p\n", + pd->port_port, skbs); + for (e = 0; e < dd->ipath_rcvegrcnt; e++) + if (skbs[e]) + dev_kfree_skb(skbs[e]); + vfree(skbs); + } + if (freehdrq) { + kfree(pd->port_tid_pg_list); + kfree(pd); + } +} + +int __init infinipath_init(void) +{ + int r = 0, i; + + _IPATH_DBG(KERN_INFO DRIVER_LOAD_MSG "%s", ipath_core_version); + + ipath_init_picotime(); /* init cycles -> pico conversion */ + + /* + * initialize the statusp to temporary storage so we can use it + * everywhere without first checking. When we "really" assign it, + * we copy from _ipath_status + */ + for (i = 0; i < infinipath_max; i++) + devdata[i].ipath_statusp = &devdata[i]._ipath_status; + + /* + * init these early, in case we take an interrupt as soon as the irq + * is setup. Saw a spinlock panic once that appeared to be due to that + * problem, when they were initted later on. + */ + spin_lock_init(&ipath_pioavail_lock); + spin_lock_init(&ipath_sma_lock); + + pci_register_driver(&infinipath_driver); + + driver_create_file(&(infinipath_driver.driver), &driver_attr_version); + + if ((r = register_chrdev(ipath_major, MODNAME, &ipath_fops))) + _IPATH_ERROR("Unable to register %s device\n", MODNAME); + + + /* + * never return an error, since we could have stuff registered, + * resources used, etc., even if no hardware found. This way we + * can clean up through unload. + */ + return 0; +} + +/* + * note: if for some reason the unload fails after this routine, and leaves + * the driver enterable by user code, we'll almost certainly crash and burn... 
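
infinipath_init() above registers a fixed character-device major with register_chrdev() and deliberately returns 0 even on failure, so a later unload can still clean up. The minimal registration/teardown pairing, for reference (MY_MAJOR, "mydev" and my_fops are hypothetical stand-ins):

static int __init example_init(void)
{
	int r = register_chrdev(MY_MAJOR, "mydev", &my_fops);

	if (r)
		printk(KERN_ERR "register_chrdev failed: %d\n", r);
	return 0;	/* keep going so unload can clean up, as above */
}

static void __exit example_exit(void)
{
	unregister_chrdev(MY_MAJOR, "mydev");
}
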
+ */ +static void __exit infinipath_cleanup(void) +{ + int r, m, port; + + driver_remove_file(&(infinipath_driver.driver), &driver_attr_version); + if ((r = unregister_chrdev(ipath_major, MODNAME))) + _IPATH_DBG("unregister of device failed: %d\n", r); + + + /* + * turn off rcv, send, and interrupts for all ports, all drivers + * should also hard reset the chip here? + * free up port 0 (kernel) rcvhdr, egr bufs, and eventually tid bufs + * for all versions of the driver, if they were allocated + */ + for (m = 0; m < infinipath_max; m++) { + uint64_t val; + struct ipath_devdata *dd = &devdata[m]; + if (dd->ipath_kregbase) { + /* in case unload fails, be consistent */ + dd->ipath_rcvctrl = 0U; + ipath_kput_kreg(m, kr_rcvctrl, dd->ipath_rcvctrl); + + /* + * gracefully stop all sends allowing any in + * progress to trickle out first. + */ + ipath_kput_kreg(m, kr_sendctrl, 0ULL); + val = ipath_kget_kreg64(m, kr_scratch); /* flush it */ + /* + * enough for anything that's going to trickle + * out to have actually done so. + */ + udelay(5); + + /* + * abort any armed or launched PIO buffers that + * didn't go. (self clearing). Will cause any + * packet currently being transmitted to go out + * with an EBP, and may also cause a short packet + * error on the receiver. + */ + ipath_kput_kreg(m, kr_sendctrl, INFINIPATH_S_ABORT); + + /* mask interrupts, but not errors */ + ipath_kput_kreg(m, kr_intmask, 0ULL); + ipath_shutdown_link(m); + + /* + * clear all interrupts and errors. Next time + * driver is loaded, we know that whatever is + * set happened while we were unloaded + */ + ipath_kput_kreg(m, kr_hwerrclear, -1LL); + ipath_kput_kreg(m, kr_errorclear, -1LL); + ipath_kput_kreg(m, kr_intclear, -1LL); + if (dd->__ipath_pioavailregs_base) { + kfree((void *)dd->__ipath_pioavailregs_base); + dd->__ipath_pioavailregs_base = NULL; + dd->ipath_pioavailregs_dma = NULL; + } + + if (dd->ipath_pageshadow) { + struct page **tmpp = dd->ipath_pageshadow; + int i, cnt = 0; + + _IPATH_VDBG + ("Unlocking any expTID pages still locked\n"); + for (port = 0; port < dd->ipath_cfgports; + port++) { + int port_tidbase = + port * dd->ipath_rcvtidcnt; + int maxtid = + port_tidbase + dd->ipath_rcvtidcnt; + for (i = port_tidbase; i < maxtid; i++) { + if (tmpp[i]) { + ipath_putpages(1, + &tmpp[i]); + tmpp[i] = NULL; + cnt++; + } + } + } + if (cnt) { + ipath_stats.sps_pageunlocks += cnt; + _IPATH_VDBG + ("There were still %u expTID entries locked\n", + cnt); + } + if (ipath_stats.sps_pagelocks + || ipath_stats.sps_pageunlocks) + _IPATH_VDBG + ("%llu pages locked, %llu unlocked via ipath_m{un}lock\n", + ipath_stats.sps_pagelocks, + ipath_stats.sps_pageunlocks); + + _IPATH_VDBG + ("Free shadow page tid array at %p\n", + dd->ipath_pageshadow); + vfree(dd->ipath_pageshadow); + dd->ipath_pageshadow = NULL; + } + + /* + * free any resources still in use (usually just + * kernel ports) at unload + */ + for (port = 0; port < dd->ipath_cfgports; port++) + ipath_free_pddata(dd, port, 1); + kfree(dd->ipath_pd); + /* + * debuggability, in case some cleanup path + * tries to use it after this + */ + dd->ipath_pd = NULL; + } + + if (dd->pcidev) { + if (dd->pcidev->irq) { + _IPATH_VDBG("unit %u free_irq of irq %x\n", m, + dd->pcidev->irq); + free_irq(dd->pcidev->irq, dd); + } else + _IPATH_DBG + ("irq is 0, not doing free_irq for unit %u\n", + m); + dd->pcidev = NULL; + } + if (dd->pci_registered) { + _IPATH_VDBG + ("Unregistering pci infrastructure unit %u\n", m); + pci_unregister_driver(&infinipath_driver); + dd->pci_registered = 0; + } 
else + _IPATH_VDBG + ("unit %u: no pci unreg, wasn't registered\n", m); + ipath_chip_cleanup(dd); /* clean up any per-chip chip-specific stuff */ + } + /* + * clean up any chip-specific stuff for now, only one type of chip + * for any given driver + */ + ipath_chip_done(); + + /* cleanup all our locked pages private data structures */ + ipath_upages_cleanup(NULL); +} + +/* This is a generic function here, so it can return device-specific + * info. This allows keeping in sync with the version that supports + * multiple chip types. +*/ +void ipath_get_boardname(const ipath_type t, char *name, size_t namelen) +{ + ipath_ht_get_boardname(t, name, namelen); +} + +module_init(infinipath_init); +module_exit(infinipath_cleanup); + +EXPORT_SYMBOL(infinipath_debug); +EXPORT_SYMBOL(ipath_get_boardname); + From bos at pathscale.com Wed Dec 28 16:31:28 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 28 Dec 2005 16:31:28 -0800 Subject: [openib-general] [PATCH 9 of 20] ipath - core driver, part 2 of 4 In-Reply-To: Message-ID: Signed-off-by: Bryan O'Sullivan diff -r ddd21709e12c -r dad2e87e21f4 drivers/infiniband/hw/ipath/ipath_driver.c --- a/drivers/infiniband/hw/ipath/ipath_driver.c Wed Dec 28 14:19:42 2005 -0800 +++ b/drivers/infiniband/hw/ipath/ipath_driver.c Wed Dec 28 14:19:42 2005 -0800 @@ -1877,3 +1877,2004 @@ return ret; } + +/* + * cancel a range of PIO buffers, used when they might be armed, but + * not triggered. Used at init to ensure buffer state, and also user + * process close, in case it died while writing to a PIO buffer + */ + +static void ipath_disarm_piobufs(const ipath_type t, unsigned first, + unsigned cnt) +{ + unsigned i, last = first + cnt; + uint64_t sendctrl; + for (i = first; i < last; i++) { + sendctrl = devdata[t].ipath_sendctrl | INFINIPATH_S_DISARM | + (i << INFINIPATH_S_DISARMPIOBUF_SHIFT); + ipath_kput_kreg(t, kr_sendctrl, sendctrl); + } +} + +static void ipath_clean_partkey(struct ipath_portdata * pd, + struct ipath_devdata * dd) +{ + int i, j, pchanged = 0; + uint64_t oldpkey; + + /* for debugging only */ + oldpkey = + (uint64_t) dd->ipath_pkeys[0] | ((uint64_t) dd-> + ipath_pkeys[1] << 16) + | ((uint64_t) dd->ipath_pkeys[2] << 32) + | ((uint64_t) dd->ipath_pkeys[3] << 48); + + for (i = 0; i < (sizeof(pd->port_pkeys) / sizeof(pd->port_pkeys[0])); + i++) { + if (!pd->port_pkeys[i]) + continue; + _IPATH_VDBG("look for key[%d] %hx in pkeys\n", i, + pd->port_pkeys[i]); + for (j = 0; + j < (sizeof(dd->ipath_pkeys) / sizeof(dd->ipath_pkeys[0])); + j++) { + /* check for match independent of the global bit */ + if ((dd->ipath_pkeys[j] & 0x7fff) == + (pd->port_pkeys[i] & 0x7fff)) { + if (atomic_dec_and_test(&dd->ipath_pkeyrefs[j])) { + _IPATH_VDBG + ("p%u clear key %x matches #%d\n", + pd->port_port, pd->port_pkeys[i], + j); + ipath_stats.sps_pkeys[j] = + dd->ipath_pkeys[j] = 0; + pchanged++; + } else + _IPATH_VDBG + ("p%u key %x matches #%d, but ref still %d\n", + pd->port_port, pd->port_pkeys[i], + j, + atomic_read(&dd-> + ipath_pkeyrefs[j])); + break; + } + } + pd->port_pkeys[i] = 0; + } + if (pchanged) { + uint64_t pkey; + pkey = + (uint64_t) dd->ipath_pkeys[0] | ((uint64_t) dd-> + ipath_pkeys[1] << 16) + | ((uint64_t) dd->ipath_pkeys[2] << 32) + | ((uint64_t) dd->ipath_pkeys[3] << 48); + _IPATH_VDBG("p%u old pkey reg %llx, new pkey reg %llx\n", + pd->port_port, oldpkey, pkey); + ipath_kput_kreg(pd->port_unit, kr_partitionkey, pkey); + } +} + +static unsigned int ipath_poll(struct file *fp, struct poll_table_struct *pt) +{ + int ret; + struct ipath_portdata 
*pd; + + pd = port_fp(fp); + /* nothing for select/poll in this driver, at least for now */ + ret = 0; + + return ret; +} + +/* + * wait up to msecs milliseconds for IB link state change to occur + * for now, take the easy polling route. Currently used only by + * the SMA ioctls. Returns 0 if state reached, otherwise -ETIMEDOUT + * state can have multiple states set, for any of several transitions. + */ + +int ipath_wait_linkstate(const ipath_type t, uint32_t state, int msecs) +{ + devdata[t].ipath_sma_state_wanted = state; + wait_event_interruptible_timeout(ipath_sma_state_wait, + (devdata[t].ipath_flags & state), + msecs_to_jiffies(msecs)); + devdata[t].ipath_sma_state_wanted = 0; + + if (!(devdata[t].ipath_flags & state)) + _IPATH_DBG + ("Didn't reach linkstate %s within %u ms (ibcc %llx %s)\n", + /* test INIT ahead of DOWN, both can be set */ + (state & IPATH_LINKINIT) ? "INIT" : + ((state & IPATH_LINKDOWN) ? "DOWN" : + ((state & IPATH_LINKARMED) ? "ARM" : "ACTIVE")), + msecs, ipath_kget_kreg64(t, kr_ibcctrl), + ipath_ibcstatus_str[ipath_kget_kreg64(t, kr_ibcstatus) & + 0xf]); + return (devdata[t].ipath_flags & state) ? 0 : -ETIMEDOUT; +} + +/* unit number is already validated in ipath_ioctl() */ +static int ipath_kset_lid(uint32_t arg) +{ + unsigned unit = (arg >> 16) & 0xffff; + + if (unit >= infinipath_max + || !(devdata[unit].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", unit); + return -ENODEV; + } + + arg &= 0xffff; + _IPATH_SMADBG("Unit %u setting lid to 0x%x, was 0x%x\n", unit, arg, + devdata[unit].ipath_lid); + ipath_set_sps_lid(unit, arg); + return 0; +} + +static int ipath_kset_mlid(uint32_t arg) +{ + unsigned unit = (arg >> 16) & 0xffff; + + if (unit >= infinipath_max + || !(devdata[unit].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", unit); + return -ENODEV; + } + + arg &= 0xffff; + _IPATH_SMADBG("Unit %u setting mlid to 0x%x, was 0x%x\n", unit, arg, + devdata[unit].ipath_mlid); + ipath_stats.sps_mlid[unit] = devdata[unit].ipath_mlid = arg; + if (devdata[unit].ipath_layer.l_intr) + devdata[unit].ipath_layer.l_intr(unit, IPATH_LAYER_INT_BCAST); + return 0; +} + +/* unit number is in incoming, overwritten on return with data */ + +static int ipath_get_devstatus(uint64_t __user *a) +{ + int ret; + uint64_t unit64; + uint32_t unit; + uint64_t devstatus; + + if ((ret = copy_from_user(&unit64, a, sizeof unit64))) { + _IPATH_DBG("Failed to copy in unit: %d\n", ret); + return -EFAULT; + } + unit = unit64; + if (unit >= infinipath_max + || !(devdata[unit].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", unit); + return -ENODEV; + } + + devstatus = *devdata[unit].ipath_statusp; + + if ((ret = copy_to_user(a, &devstatus, sizeof devstatus))) { + _IPATH_DBG("Failed to copy out device status: %d\n", ret); + ret = -EFAULT; + } + return ret; +} + +/* unit number is in incoming, overwritten on return with data */ + +static int ipath_get_mlid(uint32_t __user *a) +{ + int ret; + uint32_t unit; + uint32_t mlid; + + if ((ret = copy_from_user(&unit, a, sizeof unit))) { + _IPATH_DBG("Failed to copy in mlid: %d\n", ret); + return -EFAULT; + } + if (unit >= infinipath_max + || !(devdata[unit].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", unit); + return -ENODEV; + } + + mlid = devdata[unit].ipath_mlid; + + if ((ret = copy_to_user(a, &mlid, sizeof mlid))) { + _IPATH_DBG("Failed to copy out MLID: %d\n", ret); + ret = -EFAULT; + } + return ret; +} + +static int ipath_kset_guid(struct ipath_setguid __user *a) +{ + struct 
ipath_setguid setguid; + int ret; + + if ((ret = copy_from_user(&setguid, a, sizeof setguid))) { + _IPATH_DBG("Failed to copy in guid info: %d\n", ret); + return -EFAULT; + } + if (setguid.sunit >= infinipath_max || + !(devdata[setguid.sunit].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %llu\n", setguid.sunit); + return -ENODEV; + } + if (setguid.sguid == 0ULL || setguid.sguid == -1LL) { + /* + * use INFO, not DBG, because ipath_mux doesn't yet + * complain about errors on this + */ + + _IPATH_INFO("Ignoring attempt to set invalid GUID %llx\n", + setguid.sguid); + return -EINVAL; + } + devdata[setguid.sunit].ipath_guid = setguid.sguid; + devdata[setguid.sunit].ipath_nguid = 1; + _IPATH_DBG("SMA set hardware GUID unit %llu to %llx (network order)\n", + setguid.sunit, devdata[setguid.sunit].ipath_guid); + return 0; +} + +/* + * receive an IB packet with QP 0 or 1. For now, we have no timeout implemented + * We put the actual received count into the iov on return, and the unit we + * received from goes into the lower 16 bits of sps_flags. + * This receives from all/any of the active chips, and we currently do not + * allow specifying just one (we could, by filling in unit in the library + * before the syscall, and checking here). + */ + +static int ipath_rcvsma_pkt(struct ipath_sendpkt __user *p) +{ + struct ipath_sendpkt rpkt; + int i, any, ret; + unsigned long flags; + + if ((ret = copy_from_user(&rpkt, p, sizeof rpkt))) { + _IPATH_DBG("Failed to copy in pkt struct (%d)\n", ret); + return -EFAULT; + } + if (!ipath_sma_data_spare) { + _IPATH_DBG("can't do receive, sma not initialized\n"); + return -ENETDOWN; + } + + for (any = i = 0; i < infinipath_max; i++) + if (devdata[i].ipath_flags & IPATH_INITTED) + any++; + if (!any) { /* no hardware, freeze, etc. */ + _IPATH_SMADBG("Didn't find any initialized and usable chips\n"); + return -ENODEV; + } + + wait_event_interruptible(ipath_sma_wait, + ipath_sma_data[ipath_sma_first].len); + + spin_lock_irqsave(&ipath_sma_lock, flags); + if (ipath_sma_data[ipath_sma_first].len) { + int len; + uint32_t slen; + uint8_t *sdata; + struct _ipath_sma_rpkt *smpkt = + &ipath_sma_data[ipath_sma_first]; + + /* + * we swap out the buffer we are going to use with the + * spare buffer and set spare to that buffer. This code + * is the only code that ever manipulates spare, other + * than the initialization code. This code should never + * be entered by more than one process at a time, and + * if it is, the user code doing so deserves what it gets; + * it won't break anything in the driver by doing so. + * We do it this way to avoid holding a lock across the + * copy_to_user, which could fault, or delay a long time + * while paging occurs; ditto for printks + */ + + slen = smpkt->len; + sdata = smpkt->buf; + rpkt.sps_flags = smpkt->unit; + smpkt->buf = ipath_sma_data_spare; + ipath_sma_data_spare = sdata; + smpkt->len = 0; /* it's available again */ + if (++ipath_sma_first >= IPATH_NUM_SMAPKTS) + ipath_sma_first = 0; + spin_unlock_irqrestore(&ipath_sma_lock, flags); + + len = min((uint32_t) rpkt.sps_iov[0].iov_len, slen); + ret = copy_to_user((void __user *) rpkt.sps_iov[0].iov_base, + sdata, len); + _IPATH_VDBG("SMA packet (index=%d), len %d (actual %d) " + "buf %p, ubuf %llx\n", ipath_sma_first, slen, + len, sdata, rpkt.sps_iov[0].iov_base); + if (!ret) { + /* actual length read. 
*/ + rpkt.sps_iov[0].iov_len = len; + rpkt.sps_cnt = 1; /* received one packet */ + if ((ret = copy_to_user(p, &rpkt, sizeof rpkt))) { + _IPATH_DBG("Failed to copy out pkt struct " + "(%d)\n", ret); + ret = -EFAULT; + } + } else { + _IPATH_DBG("copyout failed: %d\n", ret); + ret = -EFAULT; + } + } else { + /* usually means SMA process received a signal */ + spin_unlock_irqrestore(&ipath_sma_lock, flags); + return -EAGAIN; + } + + return ret; +} + +/* unit number is in first word incoming, overwritten on return with data */ +static int ipath_get_portinfo(uint32_t __user *a) +{ + int ret; + uint32_t unit, tmp, tmp2; + struct ipath_devdata *dd; + uint32_t portinfo[13]; /* just the data for Portinfo, in host order */ + + if ((ret = copy_from_user(&unit, a, sizeof unit))) { + _IPATH_DBG("Failed to copy in portinfo: %d\n", ret); + return -EFAULT; + } + if (unit >= infinipath_max + || !(devdata[unit].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG("Invalid unit %u\n", unit); + return -ENODEV; + } + dd = &devdata[unit]; + /* so we only initialize non-zero fields. */ + memset(portinfo, 0, sizeof portinfo); + + /* + * Notimpl yet M_Key (64) + * Notimpl yet GID (64) + */ + + portinfo[4] = (dd->ipath_lid << 16); + + /* + * Notimpl yet SMLID (should we store this in the driver, in + * case SMA dies?) + * CapabilityMask is 0, we don't support any of these + * DiagCode is 0; we don't store any diag info for now + * Notimpl yet M_KeyLeasePeriod (we don't support M_Key) + */ + + /* LocalPortNum is whichever port number they ask for */ + portinfo[7] = (unit << 24) + /* LinkWidthEnabled */ + |(2 << 16) + /* LinkWidthSupported (really 2, but that's not IB valid...) */ + |(3 << 8) + /* LinkWidthActive */ + |(2 << 0); + tmp = dd->ipath_lastibcstat & 0xff; + tmp2 = 5; + if (tmp == 0x11) + tmp = 2; + else if (tmp == 0x21) + tmp = 3; + else if (tmp == 0x31) + tmp = 4; + else { + tmp = 0; /* down */ + tmp2 = tmp & 0xf; + } + portinfo[8] = (1 << 28) /* LinkSpeedSupported */ + |(tmp << 24) /* PortState */ + |(tmp2 << 20) /* PortPhysicalState */ + |(2 << 16) + + /* LinkDownDefaultState */ + /* M_KeyProtectBits == 0 */ + /* NotImpl yet LMC == 0 (we can support all values) */ + |(1 << 4) /* LinkSpeedActive */ + |(1 << 0); /* LinkSpeedEnabled */ + switch (dd->ipath_ibmtu) { + case 4096: + tmp = 5; + break; + case 2048: + tmp = 4; + break; + case 1024: + tmp = 3; + break; + case 512: + tmp = 2; + break; + case 256: + tmp = 1; + break; + default: /* oops, something is wrong */ + _IPATH_DBG + ("Problem, ipath_ibmtu 0x%x not a valid IB MTU, treat as 2048\n", + dd->ipath_ibmtu); + tmp = 4; + break; + } + portinfo[9] = (tmp << 28) + /* NeighborMTU */ + /* Notimpl MasterSMSL */ + |(1 << 20) + + /* VLCap */ + /* Notimpl InitType (actually, an SMA decision) */ + /* VLHighLimit is 0 (only one VL) */ + ; /* VLArbitrationHighCap is 0 (only one VL) */ + portinfo[10] = /* VLArbitrationLowCap is 0 (only one VL) */ + /* InitTypeReply is SMA decision */ + (5 << 16) /* MTUCap 4096 */ + |(7 << 13) /* VLStallCount */ + |(0x1f << 8) /* HOQLife */ + |(1 << 4) /* OperationalVLs 0 */ + + /* PartitionEnforcementInbound */ + /* PartitionEnforcementOutbound not enforced */ + /* FilterRawinbound not enforced */ + ; /* FilterRawOutbound not enforced */ + /* M_KeyViolations are not counted by hardware, SMA can count */ + tmp = ipath_kget_creg32(unit, cr_errpkey); + /* P_KeyViolations are counted by hardware.
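
Each PortInfo word above is hand-packed with shifts and ORs in host order. The general shape of that packing, sketched with field positions taken from the code above:

static uint32_t pack_portinfo_word8(uint32_t speed_sup, uint32_t state,
				    uint32_t physstate)
{
	return (speed_sup << 28)	/* LinkSpeedSupported */
	     | (state << 24)		/* PortState */
	     | (physstate << 20);	/* PortPhysicalState */
}
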
*/
+	portinfo[11] = ((tmp & 0xffff) << 0);
+	portinfo[12] =
+		/* Q_KeyViolations are not counted by hardware */
+		(1 << 8)
+
+		/* GUIDCap */
+		/* SubnetTimeOut handled by SMA */
+		/* RespTimeValue handled by SMA */
+		;
+	/* LocalPhyErrors are programmed to max */
+	portinfo[12] |= (0xf << 20)
+		|(0xf << 16)	/* OverRunErrors are programmed to max */
+		;
+
+	if ((ret = copy_to_user(a, portinfo, sizeof portinfo))) {
+		_IPATH_DBG("Failed to copy out portinfo: %d\n", ret);
+		ret = -EFAULT;
+	}
+	return ret;
+}
+
+/* unit number is in first word incoming, overwritten on return with data */
+static int ipath_get_nodeinfo(uint32_t __user *a)
+{
+	int ret;
+	uint32_t unit;	/*, tmp, tmp2; */
+	struct ipath_devdata *dd;
+	uint32_t nodeinfo[10];	/* just the data for Nodeinfo, in host order */
+
+	if ((ret = copy_from_user(&unit, a, sizeof unit))) {
+		_IPATH_DBG("Failed to copy in nodeinfo: %d\n", ret);
+		return -EFAULT;
+	}
+	if (unit >= infinipath_max
+	    || !(devdata[unit].ipath_flags & IPATH_INITTED)) {
+		/* VDBG because sma normally probes for all possible units */
+		_IPATH_VDBG("Invalid unit %u\n", unit);
+		return -ENODEV;
+	}
+	dd = &devdata[unit];
+
+	/* so we only initialize non-zero fields. */
+	memset(nodeinfo, 0, sizeof nodeinfo);
+
+	nodeinfo[0] =	/* BaseVersion is SMA */
+		/* ClassVersion is SMA */
+		(1 << 8)	/* NodeType */
+		|(1 << 0);	/* NumPorts */
+	nodeinfo[1] = (uint32_t) (dd->ipath_guid >> 32);
+	nodeinfo[2] = (uint32_t) (dd->ipath_guid & 0xffffffff);
+	nodeinfo[3] = nodeinfo[1];	/* PortGUID == SystemImageGUID for us */
+	nodeinfo[4] = nodeinfo[2];	/* PortGUID == SystemImageGUID for us */
+	nodeinfo[5] = nodeinfo[3];	/* PortGUID == NodeGUID for us */
+	nodeinfo[6] = nodeinfo[4];	/* PortGUID == NodeGUID for us */
+	nodeinfo[7] = (4 << 16)	/* we support 4 pkeys */
+		|(dd->ipath_deviceid << 0);
+	/* our chip version as 16 bits major, 16 bits minor */
+	nodeinfo[8] = dd->ipath_minrev | (dd->ipath_majrev << 16);
+	nodeinfo[9] = (unit << 24) | (dd->ipath_vendorid << 0);
+
+	if ((ret = copy_to_user(a, nodeinfo, sizeof nodeinfo))) {
+		_IPATH_DBG("Failed to copy out nodeinfo: %d\n", ret);
+		ret = -EFAULT;
+	}
+	return ret;
+}
+
+static int ipath_sma_ioctl(struct file *fp, unsigned int cmd, unsigned long a)
+{
+	int ret = 0;
+	switch (cmd) {
+	case IPATH_SEND_SMA_PKT:	/* send SMA packet */
+		if (!(ret = ipath_send_smapkt((struct ipath_sendpkt __user *) a)))
+			/* another SMA packet sent */
+			ipath_stats.sps_sma_spkts++;
+		break;
+	case IPATH_RCV_SMA_PKT:	/* receive an SMA or MAD packet */
+		ret = ipath_rcvsma_pkt((struct ipath_sendpkt __user *) a);
+		break;
+	case IPATH_SET_LID:	/* set our lid, (SMA) */
+		ret = ipath_kset_lid((uint32_t) a);
+		break;
+	case IPATH_SET_MTU:	/* set the IB mtu (not maxpktlen) (SMA) */
+		ret = ipath_kset_mtu((uint32_t) a);
+		break;
+	case IPATH_SET_LINKSTATE:
+		/* walk through the linkstate states (SMA) */
+		ret = ipath_kset_linkstate((uint32_t) a);
+		break;
+	case IPATH_GET_PORTINFO:	/* get the SMA portinfo */
+		ret = ipath_get_portinfo((uint32_t __user *) a);
+		break;
+	case IPATH_GET_NODEINFO:	/* get the SMA nodeinfo */
+		ret = ipath_get_nodeinfo((uint32_t __user *) a);
+		break;
+	case IPATH_SET_GUID:
+		/*
+		 * set our guid, (SMA). This is not normally
+		 * used, but provides a way to set the GUID when the i2c flash
+		 * has a problem, or for special testing.
+ */ + ret = ipath_kset_guid((struct ipath_setguid __user *) a); + break; + case IPATH_SET_MLID: /* set multicast LID for ipath broadcast */ + ret = ipath_kset_mlid((uint32_t) a); + break; + case IPATH_GET_MLID: /* get multicast LID for ipath broadcast */ + ret = ipath_get_mlid((uint32_t __user *) a); + break; + case IPATH_GET_DEVSTATUS: /* get device status */ + ret = ipath_get_devstatus((uint64_t __user *) a); + break; + default: + _IPATH_DBG("%x not a valid SMA ioctl for infinipath\n", cmd); + ret = -EINVAL; + break; + } + return ret; +} + +static int ipath_get_unit_counters(struct infinipath_getunitcounters __user *a) +{ + struct infinipath_getunitcounters c; + + if (copy_from_user(&c, a, sizeof c)) + return -EFAULT; + + if (c.unit >= infinipath_max || + !(devdata[c.unit].ipath_flags & IPATH_PRESENT)) + return -ENODEV; + + return ipath_get_counters(c.unit, + (struct infinipath_counters __user *) c.data); +} + +/* + * ioctls for the control device, which is useful when you don't want + * to open the main device and use up a port. + */ + +static int ipath_ctrl_ioctl(struct file *fp, unsigned int cmd, unsigned long a) +{ + int ret = 0; + + switch (cmd) { + case IPATH_GETSTATS: /* return driver stats */ + ret = ipath_get_stats((struct infinipath_stats __user *) a); + break; + case IPATH_GETUNITCOUNTERS: /* return chip counters */ + ret = ipath_get_unit_counters((struct infinipath_getunitcounters __user *) a); + break; + default: + _IPATH_DBG("%x not a valid CTRL ioctl for infinipath\n", cmd); + ret = -EINVAL; + break; + } + + return ret; +} + +long ipath_ioctl(struct file *fp, unsigned int cmd, unsigned long a) +{ + int ret = 0; + struct ipath_portdata *pd; + ipath_type unit; + uint32_t tmp, i, nactive = 0; + + if (cmd == IPATH_GETUNITS) { + /* + * Return number of units supported. This is called + * here as this ioctl is needed via both the normal and + * diags interface, and it does not need the device to + * be opened. + */ + return ipath_get_units(); + } + + pd = port_fp(fp); + if (!pd) { + if (IPATH_SMA == (unsigned long)fp->private_data) + /* sma separate; no pd */ + return (long)ipath_sma_ioctl(fp, cmd, a); +#ifdef IPATH_DIAG + else if (IPATH_DIAG == (unsigned long)fp->private_data) + /* diags separate; no pd */ + return (long)ipath_diags_ioctl(fp, cmd, a); +#endif + else if (IPATH_CTRL == (unsigned long)fp->private_data) + /* ctrl separate; no pd */ + return (long)ipath_ctrl_ioctl(fp, cmd, a); + else { + _IPATH_DBG("NULL pd from fp (%p), cmd=%x\n", fp, cmd); + return -ENODEV; /* bad; shouldn't ever happen */ + } + } + + unit = pd->port_unit; + + if ((devdata[unit].ipath_flags & IPATH_PRESENT) + && (cmd == IPATH_GETCOUNTERS || cmd == IPATH_GETSTATS + || cmd == IPATH_READ_EEPROM || cmd == IPATH_WRITE_EEPROM)) { + /* allowed to do these, as long as chip is accessible */ + } else if (!(devdata[unit].ipath_flags & IPATH_INITTED)) { + _IPATH_DBG + ("%s not initialized (flags=0x%x), failing ioctl #%u\n", + ipath_get_unit_name(unit), devdata[unit].ipath_flags, + _IOC_NR(cmd)); + ret = -ENODEV; + } else + if ((devdata[unit]. 
+		   ipath_flags & (IPATH_LINKDOWN | IPATH_LINKUNK))) {
+		_IPATH_DBG("%s link is down, failing ioctl #%u\n",
+			   ipath_get_unit_name(unit), _IOC_NR(cmd));
+		ret = -ENETDOWN;
+	}
+
+	if (ret)
+		return ret;
+
+	switch (cmd) {
+	case IPATH_USERINIT:
+		/* real application is starting on a port */
+		ret = ipath_do_user_init(pd, (struct ipath_user_info __user *) a);
+		break;
+	case IPATH_BASEINFO:
+		/* it's done the init, now return the info it needs */
+		ret = ipath_get_baseinfo(pd, (struct ipath_base_info __user *) a);
+		break;
+	case IPATH_GETPORT:
+		/*
+		 * just return the unit:port that we were assigned,
+		 * and the number of active chips. This is used for
+		 * doing sched_setaffinity() before initialization.
+		 */
+		for (i = 0; i < infinipath_max; i++)
+			if ((devdata[i].ipath_flags & IPATH_PRESENT)
+			    && devdata[i].ipath_kregbase
+			    && devdata[i].ipath_lid
+			    && !(devdata[i].ipath_flags &
+				 (IPATH_LINKDOWN | IPATH_LINKUNK)))
+				nactive++;
+		tmp = (nactive << 24) | (unit << 16) | pd->port_port;
+		if (copy_to_user((void __user *) a, &tmp, sizeof(tmp)))
+			ret = -EFAULT;
+		break;
+	case IPATH_GETLID:
+		/* get LID for given unit # */
+		ret = ipath_layer_get_lid(a);
+		break;
+	case IPATH_UPDM_TID:	/* update expected TID entries */
+		ret = ipath_tid_update(pd, (struct _tidupd __user *) a);
+		break;
+	case IPATH_FREE_TID:	/* free expected TID entries */
+		ret = ipath_tid_free(pd, (struct _tidupd __user *) a);
+		break;
+	case IPATH_GETCOUNTERS:	/* return chip counters */
+		ret = ipath_get_counters(unit, (struct infinipath_counters __user *) a);
+		break;
+	case IPATH_GETSTATS:	/* return driver stats */
+		ret = ipath_get_stats((struct infinipath_stats __user *) a);
+		break;
+	case IPATH_GETUNITCOUNTERS:	/* return chip counters */
+		ret = ipath_get_unit_counters(
+			(struct infinipath_getunitcounters __user *) a);
+		break;
+	case IPATH_SET_PKEY:	/* set a partition key */
+		ret = ipath_set_partkey(pd, (uint16_t) a);
+		break;
+	case IPATH_RCVCTRL:	/* error handling to manage the rcvq */
+		ret = ipath_manage_rcvq(pd, (uint16_t) a);
+		break;
+	case IPATH_WRITE_EEPROM:
+		/* write the eeprom (for GUID) */
+		ret = ipath_wr_eeprom(pd,
+				      (struct ipath_eeprom_req __user *) a);
+		break;
+	case IPATH_READ_EEPROM:	/* read the eeprom (for GUID) */
+		ret = ipath_rd_eeprom(pd->port_unit,
+				      (struct ipath_eeprom_req __user *) a);
+		break;
+	case IPATH_WAIT:
+		/*
+		 * wait for a receive intr for this port, or PIO avail
+		 */
+		ret = ipath_wait_intr(pd, (uint32_t) a);
+		break;
+
+	default:
+		_IPATH_DBG("cmd %x (%c,%u) not a valid ioctl\n", cmd,
+			   _IOC_TYPE(cmd), _IOC_NR(cmd));
+		ret = -EINVAL;
+		break;
+	}
+
+	return ret;
+}
+
+static loff_t ipath_llseek(struct file *fp, loff_t off, int whence)
+{
+	loff_t ret;
+
+	/* range checking is done where offset is used, not here. */
+	down(&fp->f_dentry->d_inode->i_sem);
+	if (!whence)
+		ret = fp->f_pos = off;
+	else if (whence == 1) {
+		fp->f_pos += off;
+		ret = fp->f_pos;
+	} else
+		ret = -EINVAL;
+	up(&fp->f_dentry->d_inode->i_sem);
+	_IPATH_DBG("New offset %llx from seek %llx whence=%d\n", fp->f_pos, off,
+		   whence);
+
+	return ret;
+}
+
+/*
+ * We use this to have a shared buffer between the kernel and the user
+ * code for the rcvhdr queue, egr buffers, and the per-port user regs and pio
+ * buffers in the chip. We have the open and close entries so we can bump
+ * the ref count and keep the driver from being unloaded while still mapped.
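+ *
+ * (Editorial sketch, not part of the original patch: userspace reaches
+ * these mappings with ordinary mmap() calls whose file offset is the
+ * address token handed back by the IPATH_BASEINFO ioctl, e.g.
+ *
+ *	uregs = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
+ *		     MAP_SHARED, fd, uregbase);
+ *
+ * where fd is the opened device and uregbase stands in for whatever
+ * ipath_get_baseinfo() reported; the names are illustrative only.)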
+ */ + +static struct vm_operations_struct ipath_vmops = { + .nopage = ipath_nopage, +}; + +static int ipath_mmap(struct file *fp, struct vm_area_struct *vm) +{ + int setlen = 0, ret = -EINVAL; + struct ipath_portdata *pd; + + if (fp->private_data && 255UL < (unsigned long)fp->private_data) { + pd = port_fp(fp); + { + /* + * This is the ipath_do_user_init() code, + * mapping the shared buffers into the user + * process. The address referred to by vm_pgoff + * is the virtual, not physical, address; we only + * do one mmap for each space mapped. + */ + uint64_t pgaddr, ureg; + + pgaddr = vm->vm_pgoff << PAGE_SHIFT; + + /* + * note that ureg does *NOT* have the kregvirt + * as part of it, to be sure that for 32 bit + * programs, we don't end up trying to map + * a > 44 address. Has to match ipath_get_baseinfo() + * code that sets __spi_uregbase + */ + + ureg = devdata[pd->port_unit].ipath_uregbase + + devdata[pd->port_unit].ipath_palign * pd->port_port; + + _IPATH_MMDBG + ("ushare: pgaddr %llx vm_start=%lx, vmlen %lx\n", + pgaddr, vm->vm_start, vm->vm_end - vm->vm_start); + + if (pgaddr == ureg) { + /* it's the real hardware, so io_remap works */ + unsigned long phys; + if ((vm->vm_end - vm->vm_start) > PAGE_SIZE) { + _IPATH_INFO + ("FAIL mmap userreg: reqlen %lx > PAGE\n", + vm->vm_end - vm->vm_start); + ret = -EFAULT; + } else { + phys = + devdata[pd->port_unit]. + ipath_physaddr + ureg; + vm->vm_page_prot = + pgprot_noncached(vm->vm_page_prot); + + vm->vm_flags |= + VM_DONTCOPY | VM_DONTEXPAND | VM_IO + | VM_SHM | VM_LOCKED; + ret = + io_remap_pfn_range(vm, vm->vm_start, phys >> PAGE_SHIFT, + vm->vm_end - vm->vm_start, + vm->vm_page_prot); + } + } else if (pgaddr == pd->port_piobufs) { + /* + * We use io_remap, so there is not a + * nopage handler for this case! + * when we map the PIO buffers, we want + * to map them as writeonly, no read possible. + */ + + unsigned long phys; + if ((vm->vm_end - vm->vm_start) > + (devdata[pd->port_unit].ipath_pbufsport * + devdata[pd->port_unit].ipath_palign)) { + _IPATH_INFO + ("FAIL mmap piobufs: reqlen %lx > PAGE\n", + vm->vm_end - vm->vm_start); + ret = -EFAULT; + } else { + phys = + devdata[pd->port_unit]. + ipath_physaddr + pd->port_piobufs; + /* + * Do *NOT* mark this as + * non-cached (PWT bit), or we + * don't get the write combining + * behavior we want on the + * PIO buffers! + * vm->vm_page_prot = pgprot_noncached(vm->vm_page_prot); + */ + +#if defined (pgprot_writecombine) && defined(_PAGE_MA_WC) + /* Enable WC */ + vm->vm_page_prot = + pgprot_writecombine(vm-> + vm_page_prot); +#endif + + if (vm->vm_flags & VM_READ) { + _IPATH_INFO + ("Can't map piobufs as readable (flags=%lx)\n", + vm->vm_flags); + ret = -EPERM; + } else { + /* + * don't allow them to + * later change to readable + * with mprotect + */ + + vm->vm_flags &= ~VM_MAYWRITE; + + vm->vm_flags |= + VM_DONTCOPY | VM_DONTEXPAND + | VM_IO | VM_SHM | + VM_LOCKED; + ret = + io_remap_pfn_range(vm, vm->vm_start, phys >> PAGE_SHIFT, + vm->vm_end - vm->vm_start, + vm->vm_page_prot); + } + } + } else if (pgaddr == (uint64_t) pd->port_rcvegr_phys) { + if (!pd->port_rcvegrbuf_virt) + return -EFAULT; + /* + * page_alloc'ed egr memory, not + * physically contiguous + * *BUT* to work around the 32 bit mmap64 + * only handling 44 bits, we have remapped + * the first page to kernel virtual, so + * we have to do the conversion here to + * get back to the original virtual + * address (not contig pages) so we have + * to mark this for special handling. 
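+			 * (The 44 bit limit is just arithmetic: a
+			 * 32 bit vm_pgoff shifted left by PAGE_SHIFT,
+			 * 12 with 4K pages, gives at most a 2^44 byte
+			 * offset.)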
+			 */
+
+			/*
+			 * not egrbufs * egrsize since they are
+			 * no longer virtually contiguous.
+			 */
+			setlen = pd->port_rcvegrbuf_chunks * PAGE_SIZE *
+				(1 << pd->port_rcvegrbuf_order);
+			if ((vm->vm_end - vm->vm_start) > setlen) {
+				_IPATH_INFO
+				    ("FAIL on egr bufs: reqlen %lx > actual %x\n",
+				     vm->vm_end - vm->vm_start, setlen);
+				ret = -EFAULT;
+			} else {
+				vm->vm_ops = &ipath_vmops;
+				vm->vm_private_data =
+					(void *)(3 | (uint64_t) pd);
+				if (vm->vm_flags & VM_WRITE) {
+					_IPATH_INFO
+					    ("Can't map eager buffers as writable (flags=%lx)\n",
+					     vm->vm_flags);
+					ret = -EPERM;
+				} else {
+					/*
+					 * don't allow them to
+					 * later change to writeable
+					 * with mprotect
+					 */
+
+					vm->vm_flags &= ~VM_MAYWRITE;
+					_IPATH_MMDBG
+					    ("egrbufs, set private to %p, not %llx\n",
+					     vm->vm_private_data, pgaddr);
+					ret = 0;
+				}
+			}
+		} else if (pgaddr == (uint64_t) pd->port_rcvhdrq_phys) {
+			/*
+			 * kmalloc'ed memory, physically
+			 * contiguous; this is from
+			 * spi_rcvhdr_base; we allow user to
+			 * map read-write so they can write
+			 * hdrq entries to allow protocol code
+			 * to directly poll whether a hdrq entry
+			 * has been written.
+			 */
+			setlen = ALIGN(devdata[pd->port_unit].ipath_rcvhdrcnt *
+				       devdata[pd->port_unit].ipath_rcvhdrentsize *
+				       sizeof(uint32_t), PAGE_SIZE);
+			if ((vm->vm_end - vm->vm_start) > setlen) {
+				_IPATH_INFO
+				    ("FAIL on rcvhdrq: reqlen %lx > actual %x\n",
+				     vm->vm_end - vm->vm_start, setlen);
+				ret = -EFAULT;
+			} else {
+				vm->vm_ops = &ipath_vmops;
+				vm->vm_private_data = (void *)(pgaddr | 1);
+				ret = 0;
+			}
+		}
+		/*
+		 * when we map the PIO bufferavail registers,
+		 * we want to map them as readonly, no write
+		 * possible.
+		 */
+		else if (pgaddr == devdata[pd->port_unit].ipath_pioavailregs_phys) {
+			/*
+			 * kmalloc'ed memory, physically
+			 * contiguous, one page only, readonly
+			 */
+			setlen = PAGE_SIZE;
+			if ((vm->vm_end - vm->vm_start) > setlen) {
+				_IPATH_INFO
+				    ("FAIL on pioavailregs_dma: reqlen %lx > actual %x\n",
+				     vm->vm_end - vm->vm_start, setlen);
+				ret = -EFAULT;
+			} else if (vm->vm_flags & VM_WRITE) {
+				_IPATH_INFO
+				    ("Can't map pioavailregs as writable (flags=%lx)\n",
+				     vm->vm_flags);
+				ret = -EPERM;
+			} else {
+				/*
+				 * don't allow them to later
+				 * change with mprotect
+				 */
+				vm->vm_flags &= ~VM_MAYWRITE;
+				vm->vm_ops = &ipath_vmops;
+				vm->vm_private_data = (void *)(pgaddr | 2);
+				ret = 0;
+			}
+		}
+		if (!ret && setlen) {
+			/* keep page(s) from being swapped, etc. */
+			vm->vm_flags |=
+				VM_DONTEXPAND | VM_DONTCOPY | VM_RESERVED |
+				VM_IO | VM_SHM;
+		} else {
+			/* failure, or io_remap case */
+			vm->vm_private_data = NULL;
+			if (ret)
+				_IPATH_INFO
+				    ("Failure %d, setlen %d, on addr %lx, off %lx\n",
+				     ret, setlen, vm->vm_start, vm->vm_pgoff);
+		}
+		}
+	} else		/* something very wrong */
+		_IPATH_INFO("fp_private wasn't set, no mmapping\n");
+
+	return ret;
+}
+
+/* page fault handler. For each page that is first faulted in from the
+ * mmap'ed shared address buffer, this routine is called.
+ * It's always for a single page.
+ * We use the low bits of the private_data field to tell us which case
+ * we are dealing with.
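+ *
+ * For clarity (editorial note): the tags set in ipath_mmap() above are
+ * case 1 for the rcvhdrq, stored as (pgaddr | 1); case 2 for the PIO
+ * bufferavail registers, stored as (pgaddr | 2); and case 3 for the
+ * eager buffers, stored as (3 | pd) with pd the port data pointer.
+ * The handler recovers the tag and the original value with
+ *
+ *	avirt = (unsigned long)vma->vm_private_data;
+ *	which = avirt & 3;
+ *	avirt &= ~3ULL;
+ *
+ * which is safe only because all three values are at least 4-byte
+ * aligned.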
+ */ + +static struct page *ipath_nopage(struct vm_area_struct *vma, unsigned long addr, + int *type) +{ + unsigned long avirt, /* the original [kv]malloc virtual address */ + paddr, /* physical address */ + off; /* calculated page offset */ + uint32_t which, chunk; + void *vaddr = NULL; + struct ipath_portdata *pd; + struct page *vpage = NOPAGE_SIGBUS; + + if (!(avirt = (unsigned long)vma->vm_private_data)) { + _IPATH_DBG("NULL private_data, vm_pgoff %lx\n", vma->vm_pgoff); + which = 0; /* quiet incorrect gcc warning */ + goto done; + } + which = avirt & 3; + avirt &= ~3ULL; + + if (addr > vma->vm_end) { + _IPATH_DBG("trying to fault in addr %lx past end\n", addr); + goto done; + } + + /* + * most of our memory is vmalloc'ed, but rcvhdr Q is physically + * contiguous, either from kmalloc or alloc_pages() + * pgoff is virtual. + */ + switch (which) { + case 1: /* rcvhdrq_phys */ + /* should always be 0 */ + off = vma->vm_pgoff - (avirt >> PAGE_SHIFT); + paddr = addr - vma->vm_start + (off << PAGE_SHIFT) + avirt; + _IPATH_MMDBG("hdrq %lx (u=%lx)\n", paddr, addr); + vpage = pfn_to_page(paddr >> PAGE_SHIFT); + break; + case 2: /* PIO buffer avail regs */ + /* should always be 0 */ + off = vma->vm_pgoff - (avirt >> PAGE_SHIFT); + paddr = (addr - vma->vm_start + (off << PAGE_SHIFT) + avirt); + _IPATH_MMDBG("pioav %lx\n", paddr); + vpage = pfn_to_page(paddr >> PAGE_SHIFT); + break; + case 3: + /* + * rcvegrbufs; page_alloc()'ed like rcvhdrq, but we + * have to pick out which page_alloc()'ed chunk it is. + */ + pd = (struct ipath_portdata *) avirt; + /* this should always be 0 */ + off = + vma->vm_pgoff - + ((unsigned long)pd->port_rcvegr_phys >> PAGE_SHIFT); + off = (addr - vma->vm_start + (off << PAGE_SHIFT)); + + chunk = off / (PAGE_SIZE * (1 << pd->port_rcvegrbuf_order)); + if (chunk > pd->port_rcvegrbuf_chunks) + _IPATH_DBG("Bad egrbuf chunk %u (max %u); off = %lx\n", + chunk, pd->port_rcvegrbuf_chunks, off); + vaddr = pd->port_rcvegrbuf_virt[chunk] + + off % (PAGE_SIZE * (1 << pd->port_rcvegrbuf_order)); + paddr = virt_to_phys(vaddr); + vpage = pfn_to_page(paddr >> PAGE_SHIFT); + _IPATH_MMDBG("egrb %p,%lx\n", vaddr, paddr); + break; + default: + _IPATH_DBG + ("trying to fault in mmap addr %lx (avirt %lx) that isn't known (case %u)\n", + addr, avirt, which); + } + +done: + if (vpage != NOPAGE_SIGBUS && vpage != NOPAGE_OOM) { + if (which == 2) + /* + * media/video/video-buf.c doesn't do get_page() for + * buffer from alloc_page(). Hmmm. + * + * keep it from being swapped, complaints if + * process exits before we [vf]free it, etc, + * and keep shared page counts correct, etc. + */ + get_page(vpage); + mark_page_accessed(vpage); + if (type) + *type = VM_FAULT_MINOR; + } else + _IPATH_DBG("faultin of addr %lx vaddr %p avirt %lx failed\n", + addr, vaddr, avirt); + + return vpage; +} + +/* this is separate to allow for better optimization of ipath_intr() */ + +static void ipath_bad_intr(const ipath_type t, uint32_t * unexpectp) +{ + struct ipath_devdata *dd = &devdata[t]; + + /* + * sometimes happen during driver init and unload, don't want + * to process any interrupts at that point + */ + + /* this is just a bandaid, not a fix, if something goes badly wrong */ + if (++*unexpectp > 100) { + if (++*unexpectp > 105) { + /* + * ok, we must be taking somebody else's interrupts, + * due to a messed up mptable and/or PIRQ table, so + * unregister the interrupt. We've seen this + * during linuxbios development work, and it + * may happen in the future again. 
+ */ + if (dd->pcidev && dd->pcidev->irq) { + _IPATH_UNIT_ERROR(t, + "Now %u unexpected interrupts, unregistering interrupt handler\n", + *unexpectp); + _IPATH_DBG("free_irq of irq %x\n", + dd->pcidev->irq); + free_irq(dd->pcidev->irq, dd); + dd->pcidev->irq = 0; + } + } + if (ipath_kget_kreg32(t, kr_intmask)) { + _IPATH_UNIT_ERROR(t, + "%u unexpected interrupts, disabling interrupts completely\n", + *unexpectp); + /* disable all interrupts, something is very wrong */ + ipath_kput_kreg(t, kr_intmask, 0ULL); + } + } else if (*unexpectp > 1) + _IPATH_DBG + ("Interrupt when not ready, should not happen, ignoring\n"); +} + +/* separate routine, for better optimization of ipath_intr() */ + +static void ipath_bad_regread(const ipath_type t) +{ + static int allbits; + struct ipath_devdata *dd = &devdata[t]; + + /* + * We print the message and disable interrupts, in hope of + * having a better chance of debugging the problem. + */ + _IPATH_UNIT_ERROR(t, + "Read of interrupt status failed (all bits set)\n"); + if (allbits++) { + /* disable all interrupts, something is very wrong */ + ipath_kput_kreg(t, kr_intmask, 0ULL); + if (allbits == 2) { + _IPATH_UNIT_ERROR(t, + "Still bad interrupt status, unregistering interrupt\n"); + free_irq(dd->pcidev->irq, dd); + dd->pcidev->irq = 0; + } else if (allbits > 2) { + if ((allbits % 10000) == 0) + printk("."); + } else + _IPATH_UNIT_ERROR(t, + "Disabling interrupts, multiple errors\n"); + } +} + +static irqreturn_t ipath_intr(int irq, void *data, struct pt_regs *regs) +{ + struct ipath_devdata *dd = data; + const ipath_type t = IPATH_UNIT(dd); + uint32_t istat = ipath_kget_kreg32(t, kr_intstatus); + uint64_t estat = 0; + static unsigned unexpected = 0; + + if (unlikely(!istat)) { + ipath_stats.sps_nullintr++; + /* not our interrupt, or already handled */ + return IRQ_NONE; + } + if (unlikely(istat == -1)) { + ipath_bad_regread(t); + /* don't know if it was our interrupt or not */ + return IRQ_NONE; + } + + ipath_stats.sps_ints++; + + /* + * this needs to be flags&initted, not statusp, so we keep + * taking interrupts even after link goes down, etc. + * Also, we *must* clear the interrupt at some point, or we won't + * take it again, which can be real bad for errors, etc... + */ + + if (!(dd->ipath_flags & IPATH_INITTED)) { + ipath_bad_intr(t, &unexpected); + return IRQ_NONE; + } + if (unexpected) + unexpected = 0; + + if (istat & ~infinipath_i_bitsextant) + _IPATH_UNIT_ERROR(t, + "interrupt with unknown interrupts %x set\n", + istat & (uint32_t) ~ infinipath_i_bitsextant); + + if (istat & INFINIPATH_I_ERROR) { + ipath_stats.sps_errints++; + estat = ipath_kget_kreg64(t, kr_errorstatus); + if (!estat) + _IPATH_INFO + ("error interrupt (%x), but no error bits set!\n", + istat); + else if (estat == -1LL) + /* + * should we try clearing all, or hope next read + * works? + */ + _IPATH_UNIT_ERROR(t, + "Read of error status failed (all bits set); ignoring\n"); + else + ipath_handle_errors(t, estat); + } + + if (istat & INFINIPATH_I_GPIO) { + /* Clear GPIO status bit 2 */ + ipath_kput_kreg(t, kr_gpio_clear, (uint64_t)(1 << 2)); + + /* + * Packets are available in the port 0 receive queue. + * Eventually this needs to be generalized to check + * IPATH_GPIO_INTR, and the specific GPIO bit, when + * GPIO interrupts start being used for other things. + * We skip that now to improve performance. 
+ */ + ipath_kreceive(t); + } + + /* + * clear the ones we will deal with on this round + * We clear it early, mostly for receive interrupts, so we + * know the chip will have seen this by the time we process + * the queue, and will re-interrupt if necessary. The processor + * itself won't take the interrupt again until we return. + */ + ipath_kput_kreg(t, kr_intclear, istat); + + if (istat & INFINIPATH_I_SPIOBUFAVAIL) { + atomic_clear_mask(INFINIPATH_S_PIOINTBUFAVAIL, + &dd->ipath_sendctrl); + ipath_kput_kreg(t, kr_sendctrl, dd->ipath_sendctrl); + + if (dd->ipath_portpiowait) { + uint32_t i; + /* + * start from port 1, since for now port 0 is + * never using wait_event for PIO + */ + for (i = 1; + dd->ipath_portpiowait && i < dd->ipath_cfgports; + i++) { + if (dd->ipath_pd[i] + && dd->ipath_portpiowait & (1U << i)) { + atomic_clear_mask(1U << i, + &dd-> + ipath_portpiowait); + if (dd->ipath_pd[i]-> + port_flag & IPATH_PORT_WAITING_PIO) + { + dd->ipath_pd[i]->port_flag &= + ~IPATH_PORT_WAITING_PIO; + wake_up_interruptible(&dd-> + ipath_pd + [i]-> + port_wait); + } + } + } + } + + if (dd->ipath_layer.l_intr) { + if (dd->ipath_layer.l_intr(t, + IPATH_LAYER_INT_SEND_CONTINUE)) { + atomic_set_mask(INFINIPATH_S_PIOINTBUFAVAIL, + &dd->ipath_sendctrl); + ipath_kput_kreg(t, kr_sendctrl, + dd->ipath_sendctrl); + } + } + + if (dd->verbs_layer.l_piobufavail) { + if (!dd->verbs_layer.l_piobufavail(t)) { + atomic_set_mask(INFINIPATH_S_PIOINTBUFAVAIL, + &dd->ipath_sendctrl); + ipath_kput_kreg(t, kr_sendctrl, + dd->ipath_sendctrl); + } + } + } + + /* + * we check for both transition from empty to non-empty, and urgent + * packets (those with the interrupt bit set in the header) + */ + + if (istat & ((infinipath_i_rcvavail_mask << INFINIPATH_I_RCVAVAIL_SHIFT) + | (infinipath_i_rcvurg_mask << INFINIPATH_I_RCVURG_SHIFT))) { + uint64_t portr; + int i; + uint32_t rcvdint = 0; + + portr = ((istat >> INFINIPATH_I_RCVAVAIL_SHIFT) & + infinipath_i_rcvavail_mask) + | ((istat >> INFINIPATH_I_RCVURG_SHIFT) & + infinipath_i_rcvurg_mask); + for (i = 0; i < dd->ipath_cfgports; i++) { + if (portr & (1 << i) && dd->ipath_pd[i]) { + if (i == 0) + ipath_kreceive(t); + else if (dd->ipath_pd[i]-> + port_flag & IPATH_PORT_WAITING_RCV) { + atomic_clear_mask + (IPATH_PORT_WAITING_RCV, + &dd->ipath_pd[i]->port_flag); + wake_up_interruptible(&dd->ipath_pd[i]-> + port_wait); + rcvdint |= 1U << i; + } + } + } + if (rcvdint) { + /* + * only want to take one interrupt, so turn off + * the rcv interrupt for all the ports that we + * did the wakeup on (but never for kernel port) + */ + atomic_clear_mask(rcvdint << + INFINIPATH_R_INTRAVAIL_SHIFT, + &dd->ipath_rcvctrl); + ipath_kput_kreg(t, kr_rcvctrl, dd->ipath_rcvctrl); + } + } + + return IRQ_HANDLED; +} + +static void ipath_decode_err(char *buf, size_t blen, uint64_t err) +{ + *buf = '\0'; + if (err & INFINIPATH_E_RHDRLEN) + strlcat(buf, "rhdrlen ", blen); + if (err & INFINIPATH_E_RBADTID) + strlcat(buf, "rbadtid ", blen); + if (err & INFINIPATH_E_RBADVERSION) + strlcat(buf, "rbadversion ", blen); + if (err & INFINIPATH_E_RHDR) + strlcat(buf, "rhdr ", blen); + if (err & INFINIPATH_E_RLONGPKTLEN) + strlcat(buf, "rlongpktlen ", blen); + if (err & INFINIPATH_E_RSHORTPKTLEN) + strlcat(buf, "rshortpktlen ", blen); + if (err & INFINIPATH_E_RMAXPKTLEN) + strlcat(buf, "rmaxpktlen ", blen); + if (err & INFINIPATH_E_RMINPKTLEN) + strlcat(buf, "rminpktlen ", blen); + if (err & INFINIPATH_E_RFORMATERR) + strlcat(buf, "rformaterr ", blen); + if (err & INFINIPATH_E_RUNSUPVL) + strlcat(buf, 
"runsupvl ", blen); + if (err & INFINIPATH_E_RUNEXPCHAR) + strlcat(buf, "runexpchar ", blen); + if (err & INFINIPATH_E_RIBFLOW) + strlcat(buf, "ribflow ", blen); + if (err & INFINIPATH_E_REBP) + strlcat(buf, "EBP ", blen); + if (err & INFINIPATH_E_SUNDERRUN) + strlcat(buf, "sunderrun ", blen); + if (err & INFINIPATH_E_SPIOARMLAUNCH) + strlcat(buf, "spioarmlaunch ", blen); + if (err & INFINIPATH_E_SUNEXPERRPKTNUM) + strlcat(buf, "sunexperrpktnum ", blen); + if (err & INFINIPATH_E_SDROPPEDDATAPKT) + strlcat(buf, "sdroppeddatapkt ", blen); + if (err & INFINIPATH_E_SDROPPEDSMPPKT) + strlcat(buf, "sdroppedsmppkt ", blen); + if (err & INFINIPATH_E_SMAXPKTLEN) + strlcat(buf, "smaxpktlen ", blen); + if (err & INFINIPATH_E_SMINPKTLEN) + strlcat(buf, "sminpktlen ", blen); + if (err & INFINIPATH_E_SUNSUPVL) + strlcat(buf, "sunsupVL ", blen); + if (err & INFINIPATH_E_SPKTLEN) + strlcat(buf, "spktlen ", blen); + if (err & INFINIPATH_E_INVALIDADDR) + strlcat(buf, "invalidaddr ", blen); + if (err & INFINIPATH_E_RICRC) + strlcat(buf, "CRC ", blen); + if (err & INFINIPATH_E_RVCRC) + strlcat(buf, "VCRC ", blen); + if (err & INFINIPATH_E_RRCVEGRFULL) + strlcat(buf, "rcvegrfull ", blen); + if (err & INFINIPATH_E_RRCVHDRFULL) + strlcat(buf, "rcvhdrfull ", blen); + if (err & INFINIPATH_E_IBSTATUSCHANGED) + strlcat(buf, "ibcstatuschg ", blen); + if (err & INFINIPATH_E_RIBLOSTLINK) + strlcat(buf, "riblostlink ", blen); + if (err & INFINIPATH_E_HARDWARE) + strlcat(buf, "hardware ", blen); + if (err & INFINIPATH_E_RESET) + strlcat(buf, "reset ", blen); +} + +/* decode RHF errors; only used one place now, may want more later */ +static void get_rhf_errstring(uint32_t err, char *msg, size_t len) +{ + /* if no errors, and so don't need to check what's first */ + *msg = '\0'; + + if (err & INFINIPATH_RHF_H_ICRCERR) + strlcat(msg, "icrcerr ", len); + if (err & INFINIPATH_RHF_H_VCRCERR) + strlcat(msg, "vcrcerr ", len); + if (err & INFINIPATH_RHF_H_PARITYERR) + strlcat(msg, "parityerr ", len); + if (err & INFINIPATH_RHF_H_LENERR) + strlcat(msg, "lenerr ", len); + if (err & INFINIPATH_RHF_H_MTUERR) + strlcat(msg, "mtuerr ", len); + if (err & INFINIPATH_RHF_H_IHDRERR) + /* infinipath hdr checksum error */ + strlcat(msg, "ipathhdrerr ", len); + if (err & INFINIPATH_RHF_H_TIDERR) + strlcat(msg, "tiderr ", len); + if (err & INFINIPATH_RHF_H_MKERR) + /* bad port, offset, etc. 
*/ + strlcat(msg, "invalid ipathhdr ", len); + if (err & INFINIPATH_RHF_H_IBERR) + strlcat(msg, "iberr ", len); + if (err & INFINIPATH_RHF_L_SWA) + strlcat(msg, "swA ", len); + if (err & INFINIPATH_RHF_L_SWB) + strlcat(msg, "swB ", len); +} + +static void ipath_handle_errors(const ipath_type t, uint64_t errs) +{ + char msg[512]; + uint32_t piobcnt; + uint64_t sbuf[4], ignore_this_time = 0; + int i; + int chkerrpkts = 0, noprint = 0; + cycles_t nc; + static cycles_t nextmsg_time; + static unsigned nmsgs, supp_msgs; + struct ipath_devdata *dd = &devdata[t]; + +#define E_SUM_PKTERRS (INFINIPATH_E_RHDRLEN | INFINIPATH_E_RBADTID \ + | INFINIPATH_E_RBADVERSION \ + | INFINIPATH_E_RHDR | INFINIPATH_E_RLONGPKTLEN | INFINIPATH_E_RSHORTPKTLEN \ + | INFINIPATH_E_RMAXPKTLEN | INFINIPATH_E_RMINPKTLEN \ + | INFINIPATH_E_RFORMATERR | INFINIPATH_E_RUNSUPVL | INFINIPATH_E_RUNEXPCHAR \ + | INFINIPATH_E_REBP) + +#define E_SUM_ERRS ( INFINIPATH_E_SPIOARMLAUNCH \ + | INFINIPATH_E_SUNEXPERRPKTNUM | INFINIPATH_E_SDROPPEDDATAPKT \ + | INFINIPATH_E_SDROPPEDSMPPKT | INFINIPATH_E_SMAXPKTLEN \ + | INFINIPATH_E_SUNSUPVL | INFINIPATH_E_SMINPKTLEN | INFINIPATH_E_SPKTLEN \ + | INFINIPATH_E_INVALIDADDR) + + /* + * throttle back "fast" messages to no more than 10 per 5 seconds + * (1.4-2GHz clock). This isn't perfect, but it's a reasonable + * heuristic + * If we get more than 10, give a 5x longer delay + */ + nc = get_cycles(); + if (nmsgs > 10) { + if (nc < nextmsg_time) { + noprint = 1; + if (!supp_msgs++) + nextmsg_time = nc + 50000000000ULL; + } else if (supp_msgs) { + /* + * Print the message unless it's ibc status + * change only, which happens so often we never + * want to count it. + */ + if (dd->ipath_lasterror & ~INFINIPATH_E_IBSTATUSCHANGED) { + ipath_decode_err(msg, sizeof msg, + dd-> + ipath_lasterror & + ~INFINIPATH_E_IBSTATUSCHANGED); + if (dd-> + ipath_lasterror & ~(INFINIPATH_E_RRCVEGRFULL + | + INFINIPATH_E_RRCVHDRFULL)) + _IPATH_UNIT_ERROR(t, + "Suppressed %u messages for fast-repeating errors (%s) (%llx)\n", + supp_msgs, msg, + dd->ipath_lasterror); + else { + /* + * rcvegrfull and rcvhdrqfull are + * "normal", for some types of + * processes (mostly benchmarks) + * that send huge numbers of + * messages, while not processing + * them. So only complain about + * these at debug level. 
+ */ + _IPATH_DBG + ("Suppressed %u messages for %s\n", + supp_msgs, msg); + } + } + supp_msgs = 0; + nmsgs = 0; + } + } else if (!nmsgs++ || nc > nextmsg_time) /* start timer */ + nextmsg_time = nc + 10000000000ULL; + + /* + * don't report errors that are masked (includes those always + * ignored) + */ + errs &= ~dd->ipath_maskederrs; + + /* do these first, they are most important */ + if (errs & INFINIPATH_E_HARDWARE) { + /* reuse same msg buf */ + ipath_handle_hwerrors(t, msg, sizeof msg); + } + + if (!noprint && (errs & ~infinipath_e_bitsextant)) + _IPATH_UNIT_ERROR(t, + "error interrupt with unknown errors %llx set\n", + errs & ~infinipath_e_bitsextant); + + if (errs & E_SUM_ERRS) { + /* if possible that sendbuffererror could be valid */ + piobcnt = dd->ipath_piobcnt; + /* read these before writing errorclear */ + sbuf[0] = ipath_kget_kreg64(t, kr_sendbuffererror); + sbuf[1] = ipath_kget_kreg64(t, kr_sendbuffererror + 1); + if (piobcnt > 128) { + sbuf[2] = ipath_kget_kreg64(t, kr_sendbuffererror + 2); + sbuf[3] = ipath_kget_kreg64(t, kr_sendbuffererror + 3); + } + + if (sbuf[0] || sbuf[1] + || (piobcnt > 128 && (sbuf[2] || sbuf[3]))) { + _IPATH_PDBG("SendbufErrs %llx %llx ", sbuf[0], sbuf[1]); + if (infinipath_debug & __IPATH_PKTDBG && piobcnt > 128) + printk("%llx %llx ", sbuf[2], sbuf[3]); + for (i = 0; i < piobcnt; i++) { + if (test_bit(i, sbuf)) { + uint32_t sendctrl; + if (infinipath_debug & __IPATH_PKTDBG) + printk("%u ", i); + sendctrl = + dd-> + ipath_sendctrl | INFINIPATH_S_DISARM + | (i << + INFINIPATH_S_DISARMPIOBUF_SHIFT); + ipath_kput_kreg(t, kr_sendctrl, + sendctrl); + } + } + if (infinipath_debug & __IPATH_PKTDBG) + printk("\n"); + } + if ((errs & + (INFINIPATH_E_SDROPPEDDATAPKT | INFINIPATH_E_SDROPPEDSMPPKT + | INFINIPATH_E_SMINPKTLEN)) + && !(dd->ipath_flags & IPATH_LINKACTIVE)) { + /* + * This can happen when SMA is trying to bring + * the link up, but the IB link changes state + * at the "wrong" time. The IB logic then + * complains that the packet isn't valid. + * We don't want to confuse people, so we just + * don't print them, except at debug + */ + _IPATH_DBG + ("Ignoring pktsend errors %llx, because not yet active\n", + errs); + ignore_this_time |= + INFINIPATH_E_SDROPPEDDATAPKT | + INFINIPATH_E_SDROPPEDSMPPKT | + INFINIPATH_E_SMINPKTLEN; + } + } + + if (supp_msgs == 250000) { + /* + * It's not entirely reasonable assuming that the errors + * set in the last clear period are all responsible for + * the problem, but the alternative is to assume it's the only + * ones on this particular interrupt, which also isn't great + */ + dd->ipath_maskederrs |= dd->ipath_lasterror | errs; + ipath_kput_kreg(t, kr_errormask, ~dd->ipath_maskederrs); + ipath_decode_err(msg, sizeof msg, + (dd->ipath_maskederrs & ~dd-> + ipath_ignorederrs)); + + if ((dd->ipath_maskederrs & ~dd->ipath_ignorederrs) + & ~(INFINIPATH_E_RRCVEGRFULL | INFINIPATH_E_RRCVHDRFULL)) + _IPATH_UNIT_ERROR(t, + "Disabling error(s) %llx because occuring too frequently (%s)\n", + (dd->ipath_maskederrs & ~dd-> + ipath_ignorederrs), msg); + else { + /* + * rcvegrfull and rcvhdrqfull are "normal", + * for some types of processes (mostly benchmarks) + * that send huge numbers of messages, while not + * processing them. So only complain about + * these at debug level. + */ + _IPATH_DBG + ("Disabling frequent queue full errors (%s)\n", + msg); + } + + /* + * re-enable the masked errors after around 3 minutes. + * in ipath_get_faststats(). 
If we have a series of + * fast repeating but different errors, the interval will keep + * stretching out, but that's OK, as that's pretty catastrophic. + */ + dd->ipath_unmasktime = nc + 400000000000ULL; + } + + ipath_kput_kreg(t, kr_errorclear, errs); + if (ignore_this_time) + errs &= ~ignore_this_time; + if (errs & ~dd->ipath_lasterror) { + errs &= ~dd->ipath_lasterror; + /* never suppress duplicate hwerrors or ibstatuschange */ + dd->ipath_lasterror |= errs & + ~(INFINIPATH_E_HARDWARE | INFINIPATH_E_IBSTATUSCHANGED); + } + if (!errs) + return; + + if (!noprint) + /* the ones we mask off are handled specially below or above */ + ipath_decode_err(msg, sizeof msg, + errs & ~(INFINIPATH_E_IBSTATUSCHANGED | + INFINIPATH_E_RRCVEGRFULL | + INFINIPATH_E_RRCVHDRFULL | + INFINIPATH_E_HARDWARE)); + else + /* so we don't need if (!noprint) at strlcat's below */ + *msg = 0; + + if (errs & E_SUM_PKTERRS) { + ipath_stats.sps_pkterrs++; + chkerrpkts = 1; + } + if (errs & E_SUM_ERRS) + ipath_stats.sps_errs++; + + if (errs & (INFINIPATH_E_RICRC | INFINIPATH_E_RVCRC)) { + ipath_stats.sps_crcerrs++; + chkerrpkts = 1; + } + + /* + * We don't want to print these two as they happen, or we can make + * the situation even worse, because it takes so long to print messages. + * to serial consoles. kernel ports get printed from fast_stats, no + * more than every 5 seconds, user ports get printed on close + */ + if (errs & INFINIPATH_E_RRCVHDRFULL) { + int any; + uint32_t hd, tl; + ipath_stats.sps_hdrqfull++; + for (any = i = 0; i < dd->ipath_cfgports; i++) { + if (i == 0) { + hd = dd->ipath_port0head; + tl = *dd->ipath_hdrqtailptr; + } else if (dd->ipath_pd[i] && + dd->ipath_pd[i]->port_rcvhdrtail_kvaddr) { + /* + * don't report same point multiple times, + * except kernel + */ + tl = (uint32_t) * + dd->ipath_pd[i]->port_rcvhdrtail_kvaddr; + if (tl == dd->ipath_lastrcvhdrqtails[i]) + continue; + hd = ipath_kget_ureg32(t, ur_rcvhdrhead, i); + } else + continue; + if (hd == (tl + 1) || (!hd && tl == dd->ipath_hdrqlast)) { + dd->ipath_lastrcvhdrqtails[i] = tl; + dd->ipath_pd[i]->port_hdrqfull++; + if (i == 0) + chkerrpkts = 1; + } + } + } + if (errs & INFINIPATH_E_RRCVEGRFULL) { + /* + * since this is of less importance and not likely to + * happen without also getting hdrfull, only count + * occurrences; don't check each port (or even the kernel + * vs user) + */ + ipath_stats.sps_etidfull++; + if (dd->ipath_port0head != *dd->ipath_hdrqtailptr) + chkerrpkts = 1; + } + + /* + * do this before IBSTATUSCHANGED, in case both bits set in a single + * interrupt; we want the STATUSCHANGE to "win", so we do our + * internal copy of state machine correctly + */ + if (errs & INFINIPATH_E_RIBLOSTLINK) { + /* force through block below */ + errs |= INFINIPATH_E_IBSTATUSCHANGED; + ipath_stats.sps_iblink++; + dd->ipath_flags |= IPATH_LINKDOWN; + dd->ipath_flags &= ~(IPATH_LINKUNK | IPATH_LINKINIT + | IPATH_LINKARMED | IPATH_LINKACTIVE); + if (!noprint) + _IPATH_DBG("Lost link, link now down (%s)\n", + ipath_ibcstatus_str[ipath_kget_kreg64 + (t, + kr_ibcstatus) & 0xf]); + } + + if ((errs & INFINIPATH_E_IBSTATUSCHANGED) && (!ipath_diags_enabled)) { + uint64_t val; + uint32_t ltstate; + + val = ipath_kget_kreg64(t, kr_ibcstatus); + ltstate = val & 0xff; + if (ltstate == 0x11 || ltstate == 0x21 || ltstate == 0x31) + _IPATH_DBG("Link state changed unit %u to 0x%x, last was 0x%llx\n", + t, ltstate, dd->ipath_lastibcstat); + else { + ltstate = dd->ipath_lastibcstat & 0xff; + if (ltstate == 0x11 || ltstate == 0x21 || ltstate == 0x31) + 
_IPATH_DBG("Link state unit %u changed to down state 0x%llx, last was 0x%llx\n", + t, val, dd->ipath_lastibcstat); + else + _IPATH_VDBG("Link state unit %u changed to 0x%llx from one of down states\n", + t, val); + } + ltstate = (val >> INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) & + INFINIPATH_IBCS_LINKTRAININGSTATE_MASK; + + if (ltstate == 2 || ltstate == 3) { + uint32_t last_ltstate; + + /* + * ignore cycling back and forth from states 2 to 3 + * while waiting for other end of link to come up + * except that if it keeps happening, we switch between + * linkinitstate SLEEP and POLL. While we cycle + * back and forth between them, we aren't seeing + * any other device, either no cable plugged in, + * other device powered off, other device is + * switch that hasn't yet polled us, etc. + */ + last_ltstate = (dd->ipath_lastibcstat >> + INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) + & INFINIPATH_IBCS_LINKTRAININGSTATE_MASK; + if (last_ltstate == 2 || last_ltstate == 3) { + if (++dd->ipath_ibpollcnt > 4) { + uint64_t ibc; + dd->ipath_flags |= + IPATH_LINK_SLEEPING | IPATH_NOCABLE; + *dd->ipath_statusp |= + IPATH_STATUS_IB_NOCABLE; + _IPATH_VDBG + ("linkinitcmd POLL, move to SLEEP\n"); + ibc = dd->ipath_ibcctrl; + ibc |= INFINIPATH_IBCC_LINKINITCMD_SLEEP + << + INFINIPATH_IBCC_LINKINITCMD_SHIFT; + /* + * don't put linkinitcmd in + * ipath_ibcctrl, want that to + * stay a NOP + */ + ipath_kput_kreg(t, kr_ibcctrl, ibc); + dd->ipath_ibpollcnt = 0; + } + goto skip_ibchange; + } + } + /* some state other than 2 or 3 */ + dd->ipath_ibpollcnt = 0; + ipath_stats.sps_iblink++; + /* + * Note: We try to match the Mellanox HCA LED behavior + * as best we can. That changed around Oct 2003. + * Green indicates link state (something is plugged in, + * and we can train). Amber indicates the link is + * logically up (ACTIVE). Mellanox further blinks the + * amber LED to indicate data packet activity, but we + * have no hardware support for that, so it would require + * waking up every 10-20 msecs and checking the counters + * on the chip, and then turning the LED off if + * appropriate. That's visible overhead, so not something + * we will do. 
+ */ + if (ltstate != 1 || ((dd->ipath_lastibcstat & 0x30) == 0x30 && + (val & 0x30) != 0x30)) { + dd->ipath_flags |= IPATH_LINKDOWN; + dd->ipath_flags &= ~(IPATH_LINKUNK | IPATH_LINKINIT + | IPATH_LINKACTIVE | + IPATH_LINKARMED); + *dd->ipath_statusp &= ~IPATH_STATUS_IB_READY; + if (!noprint) { + if ((dd->ipath_lastibcstat & 0x30) == 0x30) + /* if from up to down be more vocal */ + _IPATH_DBG("Link unit %u is now down (%s)\n", + t, ipath_ibcstatus_str + [ltstate]); + else + _IPATH_VDBG("Link unit %u is down (%s)\n", + t, ipath_ibcstatus_str + [ltstate]); + } + + if (val & 0x30) { + /* leave just green on, 0x11 and 0x21 */ + dd->ipath_extctrl &= + ~INFINIPATH_EXTC_LEDPRIPORTYELLOWON; + dd->ipath_extctrl |= + INFINIPATH_EXTC_LEDPRIPORTGREENON; + } else /* not up at all, so turn the leds off */ + dd->ipath_extctrl &= + ~(INFINIPATH_EXTC_LEDPRIPORTGREENON | + INFINIPATH_EXTC_LEDPRIPORTYELLOWON); + ipath_kput_kreg(t, kr_extctrl, + (uint64_t) dd->ipath_extctrl); + if (ltstate == 1 + && (dd-> + ipath_flags & (IPATH_LINK_TOARMED | + IPATH_LINK_TOACTIVE))) { + ipath_set_ib_lstate(t, + INFINIPATH_IBCC_LINKCMD_INIT); + } + } else if ((val & 0x31) == 0x31) { + if (!noprint) + _IPATH_DBG("Link unit %u is now in active state\n", t); + dd->ipath_flags |= IPATH_LINKACTIVE; + dd->ipath_flags &= + ~(IPATH_LINKUNK | IPATH_LINKINIT | IPATH_LINKDOWN | + IPATH_LINKARMED | IPATH_NOCABLE | + IPATH_LINK_TOACTIVE | IPATH_LINK_SLEEPING); + *dd->ipath_statusp &= ~IPATH_STATUS_IB_NOCABLE; + *dd->ipath_statusp |= + IPATH_STATUS_IB_READY | IPATH_STATUS_IB_CONF; + /* set the externally visible LEDs to indicate state */ + dd->ipath_extctrl |= INFINIPATH_EXTC_LEDPRIPORTGREENON + | INFINIPATH_EXTC_LEDPRIPORTYELLOWON; + ipath_kput_kreg(t, kr_extctrl, + (uint64_t) dd->ipath_extctrl); + + /* + * since we are now active, set the linkinitcmd + * to NOP (0) it was probably either POLL or SLEEP + */ + dd->ipath_ibcctrl &= + ~(INFINIPATH_IBCC_LINKINITCMD_MASK << + INFINIPATH_IBCC_LINKINITCMD_SHIFT); + ipath_kput_kreg(t, kr_ibcctrl, dd->ipath_ibcctrl); + + if (devdata[t].ipath_layer.l_intr) + devdata[t].ipath_layer.l_intr(t, + IPATH_LAYER_INT_IF_UP); + } else if ((val & 0x31) == 0x11) { + /* + * set set INIT and DOWN. Down is checked by + * most of the other code, but INIT is useful + * to know in a few places. + */ + dd->ipath_flags |= IPATH_LINKINIT | IPATH_LINKDOWN; + dd->ipath_flags &= + ~(IPATH_LINKUNK | IPATH_LINKACTIVE | IPATH_LINKARMED + | IPATH_NOCABLE | IPATH_LINK_SLEEPING); + *dd->ipath_statusp &= ~(IPATH_STATUS_IB_NOCABLE + | IPATH_STATUS_IB_READY); + + /* set the externally visible LEDs to indicate state */ + dd->ipath_extctrl &= + ~INFINIPATH_EXTC_LEDPRIPORTYELLOWON; + dd->ipath_extctrl |= INFINIPATH_EXTC_LEDPRIPORTGREENON; + ipath_kput_kreg(t, kr_extctrl, + (uint64_t) dd->ipath_extctrl); + if (dd-> + ipath_flags & (IPATH_LINK_TOARMED | + IPATH_LINK_TOACTIVE)) { + /* + * if we got here while trying to bring + * the link up, try again, but only once more! 
+ */ + ipath_set_ib_lstate(t, + INFINIPATH_IBCC_LINKCMD_ARMED); + dd->ipath_flags &= + ~(IPATH_LINK_TOARMED | IPATH_LINK_TOACTIVE); + } + } else if ((val & 0x31) == 0x21) { + dd->ipath_flags |= IPATH_LINKARMED; + dd->ipath_flags &= + ~(IPATH_LINKUNK | IPATH_LINKDOWN | IPATH_LINKINIT | + IPATH_LINKACTIVE | IPATH_NOCABLE | + IPATH_LINK_TOARMED | IPATH_LINK_SLEEPING); + *dd->ipath_statusp &= ~(IPATH_STATUS_IB_NOCABLE + | IPATH_STATUS_IB_READY); + /* + * set the externally visible LEDs to indicate + * state (same as 0x11) + */ + dd->ipath_extctrl &= + ~INFINIPATH_EXTC_LEDPRIPORTYELLOWON; + dd->ipath_extctrl |= INFINIPATH_EXTC_LEDPRIPORTGREENON; + ipath_kput_kreg(t, kr_extctrl, + (uint64_t) dd->ipath_extctrl); + if (dd->ipath_flags & IPATH_LINK_TOACTIVE) { + /* + * if we got here while trying to bring + * the link up, try again, but only once more! + */ + ipath_set_ib_lstate(t, + INFINIPATH_IBCC_LINKCMD_ACTIVE); + dd->ipath_flags &= ~IPATH_LINK_TOACTIVE; + } + } else { + if (dd-> + ipath_flags & (IPATH_LINK_TOARMED | + IPATH_LINK_TOACTIVE)) + ipath_set_ib_lstate(t, + INFINIPATH_IBCC_LINKCMD_INIT); + else if (!noprint) + _IPATH_DBG("IBstatuschange unit %u: %s\n", + t, ipath_ibcstatus_str[ltstate]); + } + dd->ipath_lastibcstat = val; + } + +skip_ibchange: + + if (errs & INFINIPATH_E_RESET) { + if (!noprint) + _IPATH_UNIT_ERROR(t, + "Got reset, requires re-initialization (unload and reload driver)\n"); + dd->ipath_flags &= ~IPATH_INITTED; /* needs re-init */ + /* mark as having had error */ + *dd->ipath_statusp |= IPATH_STATUS_HWERROR; + *dd->ipath_statusp &= ~IPATH_STATUS_IB_CONF; + } + + if (!noprint && *msg) + _IPATH_UNIT_ERROR(t, "%s error\n", msg); + if (dd->ipath_sma_state_wanted & dd->ipath_flags) { + _IPATH_VDBG("sma wanted state %x, iflags now %x, waking\n", + dd->ipath_sma_state_wanted, dd->ipath_flags); + wake_up_interruptible(&ipath_sma_state_wait); + } + + if (chkerrpkts) + /* process possible error packets in hdrq */ + ipath_kreceive(t); +} From bos at pathscale.com Wed Dec 28 16:31:38 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 28 Dec 2005 16:31:38 -0800 Subject: [openib-general] [PATCH 19 of 20] ipath - kbuild infrastructure In-Reply-To: Message-ID: <07bf9f34e2218a4b26e3.1135816298@eng-12.pathscale.com> Signed-off-by: Bryan O'Sullivan diff -r e7cabc7a2e78 -r 07bf9f34e221 drivers/infiniband/hw/ipath/Kconfig --- /dev/null Thu Jan 1 00:00:00 1970 +0000 +++ b/drivers/infiniband/hw/ipath/Kconfig Wed Dec 28 14:19:43 2005 -0800 @@ -0,0 +1,18 @@ +config IPATH_CORE + tristate "PathScale InfiniPath Driver" + depends on PCI_MSI + ---help--- + This is a low-level driver for PathScale InfiniPath host + channel adapters (HCAs) based on the HT-400 chip, including the + InfiniPath HT-460, the small form factor InfiniPath HT-460, + the InfiniPath HT-470 and the Linux Networx LS/X. + +config INFINIBAND_IPATH + tristate "PathScale InfiniPath Verbs Driver" + depends on IPATH_CORE && INFINIBAND + ---help--- + This is a driver that provides InfiniBand verbs support for + PathScale InfiniPath host channel adapters (HCAs). This + allows these devices to be used with both kernel upper level + protocols such as IP-over-InfiniBand as well as with userspace + applications (in conjunction with InfiniBand userspace access). 
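[A note for readers trying these patches: a minimal kernel .config
fragment satisfying the dependencies declared in the Kconfig above would
look roughly like the following; this is an illustration, not part of
the patch, and assumes the rest of the configuration is already sane:

    CONFIG_PCI_MSI=y
    CONFIG_INFINIBAND=m
    CONFIG_IPATH_CORE=m
    CONFIG_INFINIBAND_IPATH=m

The two tristate options build the ipath_core and ib_ipath modules wired
up in the Makefile below.]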
diff -r e7cabc7a2e78 -r 07bf9f34e221 drivers/infiniband/hw/ipath/Makefile --- /dev/null Thu Jan 1 00:00:00 1970 +0000 +++ b/drivers/infiniband/hw/ipath/Makefile Wed Dec 28 14:19:43 2005 -0800 @@ -0,0 +1,7 @@ +obj-$(CONFIG_IPATH_CORE) += ipath_core.o +obj-$(CONFIG_INFINIBAND_IPATH) += ib_ipath.o + +ipath_core-y := ipath_copy.o ipath_driver.o ipath_ht400.o ipath_i2c.o \ + ipath_layer.o ipath_lib.o ipath_upages.o + +ib_ipath-y := ipath_mad.o ipath_verbs.o From bos at pathscale.com Wed Dec 28 16:31:33 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 28 Dec 2005 16:31:33 -0800 Subject: [openib-general] [PATCH 14 of 20] ipath - infiniband verbs header In-Reply-To: Message-ID: <26993cb5faeef807a840.1135816293@eng-12.pathscale.com> Signed-off-by: Bryan O'Sullivan diff -r f9bcd9de3548 -r 26993cb5faee drivers/infiniband/hw/ipath/ipath_verbs.h --- /dev/null Thu Jan 1 00:00:00 1970 +0000 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Wed Dec 28 14:19:43 2005 -0800 @@ -0,0 +1,532 @@ +/* + * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + */ + +#ifndef IPATH_VERBS_H +#define IPATH_VERBS_H + +#include +#include +#include +#include +#include + +#include "ipath_kernel.h" +#include "verbs_debug.h" + +#define CTL_IPATH_VERBS 0x70736e68 /* "spin" as a hex value, top level */ +#define CTL_IPATH_VERBS_FAULT 1 +#define CTL_IPATH_VERBS_DEBUG 2 + +#define QPN_MAX (1 << 24) +#define QPNMAP_ENTRIES (QPN_MAX / PAGE_SIZE / BITS_PER_BYTE) + +/* + * Increment this value if any changes that break userspace ABI + * compatibility are made. + */ +#define IPATH_UVERBS_ABI_VERSION 1 + +/* + * Define an ib_cq_notify value that is not valid so we know when CQ + * notifications are armed. 
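+ * (At the time of this patch, enum ib_cq_notify has just
+ * IB_CQ_SOLICITED and IB_CQ_NEXT_COMP, so one past IB_CQ_NEXT_COMP can
+ * never be requested by a consumer.)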
+ */ +#define IB_CQ_NONE (IB_CQ_NEXT_COMP + 1) + +enum { + IB_RNR_NAK = 0x20, + + IB_NAK_PSN_ERROR = 0x60, + IB_NAK_INVALID_REQUEST = 0x61, + IB_NAK_REMOTE_ACCESS_ERROR = 0x62, + IB_NAK_REMOTE_OPERATIONAL_ERROR = 0x63, + IB_NAK_INVALID_RD_REQUEST = 0x64 +}; + +/* IB Performance Manager status values */ +enum { + IB_PMA_SAMPLE_STATUS_DONE = 0x00, + IB_PMA_SAMPLE_STATUS_STARTED = 0x01, + IB_PMA_SAMPLE_STATUS_RUNNING = 0x02 +}; + +/* Mandatory IB performance counter select values. */ +#define IB_PMA_PORT_XMIT_DATA __constant_htons(0x0001) +#define IB_PMA_PORT_RCV_DATA __constant_htons(0x0002) +#define IB_PMA_PORT_XMIT_PKTS __constant_htons(0x0003) +#define IB_PMA_PORT_RCV_PKTS __constant_htons(0x0004) +#define IB_PMA_PORT_XMIT_WAIT __constant_htons(0x0005) + +struct ib_reth { + u64 vaddr; + u32 rkey; + u32 length; +} __attribute__ ((packed)); + +struct ib_atomic_eth { + u64 vaddr; + u32 rkey; + u64 swap_data; + u64 compare_data; +} __attribute__ ((packed)); + +struct ipath_other_headers { + u32 bth[3]; + union { + struct { + u32 deth[2]; + u32 imm_data; + } ud; + struct { + struct ib_reth reth; + u32 imm_data; + } rc; + struct { + u32 aeth; + u64 atomic_ack_eth; + } at; + u32 imm_data; + u32 aeth; + struct ib_atomic_eth atomic_eth; + } u; +} __attribute__ ((packed)); + +/* + * Note that UD packets with a GRH header are 8+40+12+8 = 68 bytes long + * (72 w/ imm_data). + * Only the first 56 bytes of the IB header will be in the + * eager header buffer. The remaining 12 or 16 bytes are in the data buffer. + */ +struct ipath_ib_header { + u16 lrh[4]; + union { + struct { + struct ib_grh grh; + struct ipath_other_headers oth; + } l; + struct ipath_other_headers oth; + } u; +} __attribute__ ((packed)); + +/* + * There is one struct ipath_mcast for each multicast GID. + * All attached QPs are then stored as a list of + * struct ipath_mcast_qp. + */ +struct ipath_mcast_qp { + struct list_head list; + struct ipath_qp *qp; +}; + +struct ipath_mcast { + struct rb_node rb_node; + union ib_gid mgid; + struct list_head qp_list; + wait_queue_head_t wait; + atomic_t refcount; +}; + +/* Memory region */ +struct ipath_mr { + struct ib_mr ibmr; + struct ipath_mregion mr; /* must be last */ +}; + +/* Fast memory region */ +struct ipath_fmr { + struct ib_fmr ibfmr; + u8 page_size; + struct ipath_mregion mr; /* must be last */ +}; + +/* Protection domain */ +struct ipath_pd { + struct ib_pd ibpd; + int user; /* non-zero if created from user space */ +}; + +/* Address Handle */ +struct ipath_ah { + struct ib_ah ibah; + struct ib_ah_attr attr; +}; + +/* + * Quick description of our CQ/QP locking scheme: + * + * We have one global lock that protects dev->cq/qp_table. Each + * struct ipath_cq/qp also has its own lock. An individual qp lock + * may be taken inside of an individual cq lock. Both cqs attached to + * a qp may be locked, with the send cq locked first. No other + * nesting should be done. + * + * Each struct ipath_cq/qp also has an atomic_t ref count. The + * pointer from the cq/qp_table to the struct counts as one reference. + * This reference also is good for access through the consumer API, so + * modifying the CQ/QP etc doesn't need to take another reference. + * Access because of a completion being polled does need a reference. + * + * Finally, each struct ipath_cq/qp has a wait_queue_head_t for the + * destroy function to sleep on. + * + * This means that access from the consumer API requires nothing but + * taking the struct's lock. 
 *
+ * Access because of a completion event should go as follows:
+ * - lock cq/qp_table and look up struct
+ * - increment ref count in struct
+ * - drop cq/qp_table lock
+ * - lock struct, do your thing, and unlock struct
+ * - decrement ref count; if zero, wake up waiters
+ *
+ * To destroy a CQ/QP, we can do the following:
+ * - lock cq/qp_table, remove pointer, unlock cq/qp_table lock
+ * - decrement ref count
+ * - wait_event until ref count is zero
+ *
+ * It is the consumer's responsibility to make sure that no QP
+ * operations (WQE posting or state modification) are pending when the
+ * QP is destroyed. Also, the consumer must make sure that calls to
+ * qp_modify are serialized.
+ *
+ * Possible optimizations (wait for profile data to see if/where we
+ * have locks bouncing between CPUs):
+ * - split cq/qp table lock into n separate (cache-aligned) locks,
+ *   indexed (say) by the page in the table
+ */
+
+struct ipath_cq {
+	struct ib_cq ibcq;
+	struct tasklet_struct comptask;
+	spinlock_t lock;
+	u8 notify;
+	u8 triggered;
+	u32 head;		/* new records added to the head */
+	u32 tail;		/* poll_cq() reads from here. */
+	struct ib_wc queue[1];	/* this is actually ibcq.cqe + 1 */
+};
+
+/*
+ * Send work request queue entry.
+ * The size of the sg_list is determined when the QP is created and stored
+ * in qp->s_max_sge.
+ */
+struct ipath_swqe {
+	struct ib_send_wr wr;	/* don't use wr.sg_list */
+	u32 psn;		/* first packet sequence number */
+	u32 lpsn;		/* last packet sequence number */
+	u32 ssn;		/* send sequence number */
+	u32 length;		/* total length of data in sg_list */
+	struct ipath_sge sg_list[0];
+};
+
+/*
+ * Receive work request queue entry.
+ * The size of the sg_list is determined when the QP is created and stored
+ * in qp->r_max_sge.
+ */
+struct ipath_rwqe {
+	u64 wr_id;
+	u32 length;		/* total length of data in sg_list */
+	u8 num_sge;
+	struct ipath_sge sg_list[0];
+};
+
+struct ipath_rq {
+	spinlock_t lock;
+	u32 head;		/* new work requests posted to the head */
+	u32 tail;		/* receives pull requests from here. */
+	u32 size;		/* size of RWQE array */
+	u8 max_sge;
+	struct ipath_rwqe *wq;	/* RWQE array */
+};
+
+struct ipath_srq {
+	struct ib_srq ibsrq;
+	struct ipath_rq rq;
+	u32 limit;		/* send signal when number of RWQEs < limit */
+};
+
+/*
+ * Variables prefixed with s_ are for the requester (sender).
+ * Variables prefixed with r_ are for the responder (receiver).
+ * Variables prefixed with ack_ are for responder replies.
+ *
+ * Common variables are protected by both r_rq.lock and s_lock in that
+ * order; taking both only happens in modify_qp() or when changing the
+ * QP 'state'.
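+ *
+ * As a sketch of that ordering (an illustration, not code from this
+ * patch), code touching state covered by both locks would do:
+ *
+ *	spin_lock_irqsave(&qp->r_rq.lock, flags);
+ *	spin_lock(&qp->s_lock);
+ *	... modify common QP state ...
+ *	spin_unlock(&qp->s_lock);
+ *	spin_unlock_irqrestore(&qp->r_rq.lock, flags);
+ *
+ * always r_rq.lock first and s_lock second, never the reverse.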
+ */ +struct ipath_qp { + struct ib_qp ibqp; + struct ipath_qp *next; /* link list for QPN hash table */ + struct list_head piowait; /* link for wait PIO buf */ + struct list_head timerwait; /* link for waiting for timeouts */ + struct ib_ah_attr remote_ah_attr; + struct ipath_ib_header s_hdr; /* next packet header to send */ + atomic_t refcount; + wait_queue_head_t wait; + struct tasklet_struct s_task; + struct ipath_sge_state *s_cur_sge; + struct ipath_sge_state s_sge; /* current send request data */ + struct ipath_sge_state s_rdma_sge; /* current RDMA read send data */ + struct ipath_sge_state r_sge; /* current receive data */ + spinlock_t s_lock; + int s_flags; + u32 s_hdrwords; /* size of s_hdr in 32 bit words */ + u32 s_cur_size; /* size of send packet in bytes */ + u32 s_len; /* total length of s_sge */ + u32 s_rdma_len; /* total length of s_rdma_sge */ + u32 s_next_psn; /* PSN for next request */ + u32 s_last_psn; /* last response PSN processed */ + u32 s_psn; /* current packet sequence number */ + u32 s_rnr_timeout; /* number of milliseconds for RNR timeout */ + u32 s_ack_psn; /* PSN for next ACK or RDMA_READ */ + u64 s_ack_atomic; /* data for atomic ACK */ + u64 r_wr_id; /* ID for current receive WQE */ + u64 r_atomic_data; /* data for last atomic op */ + u32 r_atomic_psn; /* PSN of last atomic op */ + u32 r_len; /* total length of r_sge */ + u32 r_rcv_len; /* receive data len processed */ + u32 r_psn; /* expected rcv packet sequence number */ + u8 state; /* QP state */ + u8 s_state; /* opcode of last packet sent */ + u8 s_ack_state; /* opcode of packet to ACK */ + u8 s_nak_state; /* non-zero if NAK is pending */ + u8 r_state; /* opcode of last packet received */ + u8 r_reuse_sge; /* for UC receive errors */ + u8 r_sge_inx; /* current index into sg_list */ + u8 s_max_sge; /* size of s_wq->sg_list */ + u8 qp_access_flags; + u8 s_retry_cnt; /* number of times to retry */ + u8 s_rnr_retry_cnt; + u8 s_min_rnr_timer; + u8 s_retry; /* requester retry counter */ + u8 s_rnr_retry; /* requester RNR retry counter */ + u8 s_pkey_index; /* PKEY index to use */ + enum ib_mtu path_mtu; + atomic_t msn; /* message sequence number */ + u32 remote_qpn; + u32 qkey; /* QKEY for this QP (for UD or RD) */ + u32 s_size; /* send work queue size */ + u32 s_head; /* new entries added here */ + u32 s_tail; /* next entry to process */ + u32 s_cur; /* current work queue entry */ + u32 s_last; /* last un-ACK'ed entry */ + u32 s_ssn; /* SSN of tail entry */ + u32 s_lsn; /* limit sequence number (credit) */ + struct ipath_swqe *s_wq; /* send work queue */ + struct ipath_rq r_rq; /* receive work queue */ +}; + +/* + * Bit definitions for s_flags. + */ +#define IPATH_S_BUSY 0 +#define IPATH_S_SIGNAL_REQ_WR 1 + +/* + * Since struct ipath_swqe is not a fixed size, we can't simply index into + * struct ipath_qp.s_wq. This function does the array index computation. + */ +static inline struct ipath_swqe *get_swqe_ptr(struct ipath_qp *qp, unsigned n) +{ + return (struct ipath_swqe *)((char *) qp->s_wq + + (sizeof(struct ipath_swqe) + + qp->s_max_sge * sizeof(struct ipath_sge)) * n); +} + +/* + * Since struct ipath_rwqe is not a fixed size, we can't simply index into + * struct ipath_rq.wq. This function does the array index computation. 
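+ * Each entry occupies a stride of sizeof(struct ipath_rwqe) +
+ * rq->max_sge * sizeof(struct ipath_sge) bytes, so entry n starts at
+ * wq + n * stride; get_swqe_ptr() above uses the same scheme for the
+ * send queue.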
+ */
+static inline struct ipath_rwqe *get_rwqe_ptr(struct ipath_rq *rq, unsigned n)
+{
+	return (struct ipath_rwqe *)((char *) rq->wq +
+				     (sizeof(struct ipath_rwqe) +
+				      rq->max_sge *
+				      sizeof(struct ipath_sge)) * n);
+}
+
+/*
+ * QPN-map pages start out as NULL; they get allocated upon
+ * first use and are never deallocated. This way,
+ * large bitmaps are not allocated unless large numbers of QPs are used.
+ */
+struct qpn_map {
+	atomic_t n_free;
+	void *page;
+};
+
+struct ipath_qp_table {
+	spinlock_t lock;
+	u32 last;		/* last QP number allocated */
+	u32 max;		/* size of the hash table */
+	u32 nmaps;		/* size of the map table */
+	struct ipath_qp **table;
+	struct qpn_map map[QPNMAP_ENTRIES];	/* bit map of free numbers */
+};
+
+struct ipath_lkey_table {
+	spinlock_t lock;
+	u32 next;		/* next unused index (speeds search) */
+	u32 gen;		/* generation count */
+	u32 max;		/* size of the table */
+	struct ipath_mregion **table;
+};
+
+struct ipath_opcode_stats {
+	u64 n_packets;		/* number of packets */
+	u64 n_bytes;		/* total number of bytes */
+};
+
+struct ipath_ibdev {
+	struct ib_device ibdev;
+	ipath_type ib_unit;	/* This is the device number */
+	u16 sm_lid;		/* in host order */
+	u8 sm_sl;
+	u8 mkeyprot_resv_lmc;
+	unsigned long mkey_lease_timeout;	/* non-zero when timer is set */
+
+	/* The following fields are really per port. */
+	struct ipath_qp_table qp_table;
+	struct ipath_lkey_table lk_table;
+	struct list_head pending[3];	/* FIFO of QPs waiting for ACKs */
+	struct list_head piowait;	/* list for wait PIO buf */
+	struct list_head rnrwait;	/* list of QPs waiting for RNR timer */
+	spinlock_t pending_lock;
+	__be64 sys_image_guid;	/* in network order */
+	__be64 gid_prefix;	/* in network order */
+	__be64 mkey;
+	u64 ipath_sword;	/* total dwords sent (sample result) */
+	u64 ipath_rword;	/* total dwords received (sample result) */
+	u64 ipath_spkts;	/* total packets sent (sample result) */
+	u64 ipath_rpkts;	/* total packets received (sample result) */
+	u64 n_unicast_xmit;	/* total unicast packets sent */
+	u64 n_unicast_rcv;	/* total unicast packets received */
+	u64 n_multicast_xmit;	/* total multicast packets sent */
+	u64 n_multicast_rcv;	/* total multicast packets received */
+	u64 n_symbol_error_counter;	/* starting count for PMA */
+	u64 n_link_error_recovery_counter;	/* starting count for PMA */
+	u64 n_link_downed_counter;	/* starting count for PMA */
+	u64 n_port_rcv_errors;	/* starting count for PMA */
+	u64 n_port_rcv_remphys_errors;	/* starting count for PMA */
+	u64 n_port_xmit_discards;	/* starting count for PMA */
+	u64 n_port_xmit_data;	/* starting count for PMA */
+	u64 n_port_rcv_data;	/* starting count for PMA */
+	u64 n_port_xmit_packets;	/* starting count for PMA */
+	u64 n_port_rcv_packets;	/* starting count for PMA */
+	u32 n_rc_resends;
+	u32 n_rc_acks;
+	u32 n_rc_qacks;
+	u32 n_seq_naks;
+	u32 n_rdma_seq;
+	u32 n_rnr_naks;
+	u32 n_other_naks;
+	u32 n_timeouts;
+	u32 n_pkt_drops;
+	u32 n_wqe_errs;
+	u32 n_rdma_dup_busy;
+	u32 n_piowait;
+	u32 n_no_piobuf;
+	u32 port_cap_flags;
+	u32 pma_sample_start;
+	u32 pma_sample_interval;
+	__be16 pma_counter_select[5];
+	u16 pma_tag;
+	u16 qkey_violations;
+	u16 mkey_violations;
+	u16 mkey_lease_period;
+	u16 pending_index;	/* which pending queue is active */
+	u8 pma_sample_status;
+	u8 subnet_timeout;
+	struct ipath_opcode_stats opstats[128];
+};
+
+struct ipath_ucontext {
+	struct ib_ucontext ibucontext;
+};
+
+static inline struct ipath_mr *to_imr(struct ib_mr *ibmr)
+{
+	return container_of(ibmr, struct ipath_mr, ibmr);
+}
+
+static inline struct ipath_fmr *to_ifmr(struct ib_fmr *ibfmr)
+{
+	return container_of(ibfmr, struct ipath_fmr, ibfmr);
+}
+
+static inline struct ipath_pd *to_ipd(struct ib_pd *ibpd)
+{
+	return container_of(ibpd, struct ipath_pd, ibpd);
+}
+
+static inline struct ipath_ah *to_iah(struct ib_ah *ibah)
+{
+	return container_of(ibah, struct ipath_ah, ibah);
+}
+
+static inline struct ipath_cq *to_icq(struct ib_cq *ibcq)
+{
+	return container_of(ibcq, struct ipath_cq, ibcq);
+}
+
+static inline struct ipath_srq *to_isrq(struct ib_srq *ibsrq)
+{
+	return container_of(ibsrq, struct ipath_srq, ibsrq);
+}
+
+static inline struct ipath_qp *to_iqp(struct ib_qp *ibqp)
+{
+	return container_of(ibqp, struct ipath_qp, ibqp);
+}
+
+static inline struct ipath_ibdev *to_idev(struct ib_device *ibdev)
+{
+	return container_of(ibdev, struct ipath_ibdev, ibdev);
+}
+
+int ipath_process_mad(struct ib_device *ibdev,
+		      int mad_flags,
+		      u8 port_num,
+		      struct ib_wc *in_wc,
+		      struct ib_grh *in_grh,
+		      struct ib_mad *in_mad, struct ib_mad *out_mad);
+
+static inline struct ipath_ucontext *to_iucontext(struct ib_ucontext
+						  *ibucontext)
+{
+	return container_of(ibucontext, struct ipath_ucontext, ibucontext);
+}
+
+#endif				/* IPATH_VERBS_H */
diff -r f9bcd9de3548 -r 26993cb5faee drivers/infiniband/hw/ipath/verbs_debug.h
--- /dev/null	Thu Jan 1 00:00:00 1970 +0000
+++ b/drivers/infiniband/hw/ipath/verbs_debug.h	Wed Dec 28 14:19:43 2005 -0800
@@ -0,0 +1,106 @@
+/*
+ * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * Patent licenses, if any, provided herein do not apply to
+ * combinations of this program with other software, or any other
+ * product whatsoever.
+ */
+
+#ifndef _VERBS_DEBUG_H
+#define _VERBS_DEBUG_H
+
+/*
+ * This file contains tracing code for the ib_ipath kernel module.
+ */
+#ifndef _VERBS_DEBUGGING	/* tracing enabled or not */
+#define _VERBS_DEBUGGING 1
+#endif
+
+extern unsigned ib_ipath_debug;
+
+#define _VERBS_ERROR(fmt,...) \
+	do { \
+		printk(KERN_ERR "%s: " fmt, "ib_ipath", ##__VA_ARGS__); \
+	} while(0)
+
+#define _VERBS_UNIT_ERROR(unit,fmt,...) \
+	do { \
+		printk(KERN_ERR "%s: " fmt, "ib_ipath", ##__VA_ARGS__); \
+	} while(0)
+
+#if _VERBS_DEBUGGING
+
+/*
+ * Mask values for debugging.  The scheme allows us to compile out any of
+ * the debug tracing stuff, and if compiled in, to enable or disable it
+ * dynamically.  This can be set at modprobe time also:
+ *	modprobe ib_ipath ib_ipath_debug=3
+ */
+
+#define __VERBS_INFO	0x1	/* generic low verbosity stuff */
+#define __VERBS_DBG	0x2	/* generic debug */
+#define __VERBS_VDBG	0x4	/* verbose debug */
+#define __VERBS_SMADBG	0x8000	/* sma packet debug */
+
+#define _VERBS_INFO(fmt,...) \
+	do { \
+		if(unlikely(ib_ipath_debug&__VERBS_INFO)) \
+			printk(KERN_INFO "%s: " fmt,"ib_ipath",##__VA_ARGS__); \
+	} while(0)
+
+#define _VERBS_DBG(fmt,...) \
+	do { \
+		if(unlikely(ib_ipath_debug&__VERBS_DBG)) \
+			printk(KERN_DEBUG "%s: " fmt, __func__,##__VA_ARGS__); \
+	} while(0)
+
+#define _VERBS_VDBG(fmt,...) \
+	do { \
+		if(unlikely(ib_ipath_debug&__VERBS_VDBG)) \
+			printk(KERN_DEBUG "%s: " fmt, __func__,##__VA_ARGS__); \
+	} while(0)
+
+#define _VERBS_SMADBG(fmt,...) \
+	do { \
+		if(unlikely(ib_ipath_debug&__VERBS_SMADBG)) \
+			printk(KERN_DEBUG "%s: " fmt, __func__,##__VA_ARGS__); \
+	} while(0)
+
+#else				/* ! _VERBS_DEBUGGING */
+
+#define _VERBS_INFO(fmt,...)
+#define _VERBS_DBG(fmt,...)
+#define _VERBS_VDBG(fmt,...)
+#define _VERBS_SMADBG(fmt,...)
+
+#endif				/* _VERBS_DEBUGGING */
+
+#endif				/* _VERBS_DEBUG_H */

From bos at pathscale.com Wed Dec 28 16:31:36 2005
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Wed, 28 Dec 2005 16:31:36 -0800
Subject: [openib-general] [PATCH 17 of 20] ipath - infiniband verbs support, part 3 of 3
In-Reply-To: 
Message-ID: <584777b6f4dc5269fa89.1135816296@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan

diff -r fc067af322a1 -r 584777b6f4dc drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c	Wed Dec 28 14:19:43 2005 -0800
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c	Wed Dec 28 14:19:43 2005 -0800
@@ -4815,3 +4815,1393 @@
 	/* Call do_rc_send() in another thread. */
 	tasklet_schedule(&qp->s_task);
 }
+
+/*
+ * This is called from ipath_ib_rcv() to process an incoming packet
+ * for the given QP.
+ * Called at interrupt level.
+ */
+static inline void ipath_qp_rcv(struct ipath_ibdev *dev,
+				struct ipath_ib_header *hdr, int has_grh,
+				void *data, u32 tlen, struct ipath_qp *qp)
+{
+	/* Check for valid receive state. */
+	if (!(state_ops[qp->state] & IPATH_PROCESS_RECV_OK)) {
+		dev->n_pkt_drops++;
+		return;
+	}
+
+	switch (qp->ibqp.qp_type) {
+	case IB_QPT_SMI:
+	case IB_QPT_GSI:
+	case IB_QPT_UD:
+		ipath_ud_rcv(dev, hdr, has_grh, data, tlen, qp);
+		break;
+
+	case IB_QPT_RC:
+		ipath_rc_rcv(dev, hdr, has_grh, data, tlen, qp);
+		break;
+
+	case IB_QPT_UC:
+		ipath_uc_rcv(dev, hdr, has_grh, data, tlen, qp);
+		break;
+
+	default:
+		break;
+	}
+}
+
+/*
+ * This is called from ipath_kreceive() to process an incoming packet at
+ * interrupt level. Tlen is the length of the header + data + CRC in bytes.
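+ *
+ * The QP lookup below follows the reference counting scheme described
+ * at the top of ipath_verbs.h: the lookup takes a reference on the QP
+ * (or multicast group), the packet is handed to ipath_qp_rcv(), and the
+ * reference is then dropped, waking ipath_destroy_qp() (or
+ * ipath_multicast_detach()) if it is waiting for the count to reach
+ * zero.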
+ */
+static void ipath_ib_rcv(const ipath_type t, void *rhdr, void *data, u32 tlen)
+{
+	struct ipath_ibdev *dev = ipath_devices[t];
+	struct ipath_ib_header *hdr = rhdr;
+	struct ipath_other_headers *ohdr;
+	struct ipath_qp *qp;
+	u32 qp_num;
+	int lnh;
+	u8 opcode;
+
+	if (dev == NULL)
+		return;
+
+	if (tlen < 24) {	/* LRH+BTH+CRC */
+		dev->n_pkt_drops++;
+		return;
+	}
+
+	/* Check for GRH */
+	lnh = be16_to_cpu(hdr->lrh[0]) & 3;
+	if (lnh == IPS_LRH_BTH)
+		ohdr = &hdr->u.oth;
+	else if (lnh == IPS_LRH_GRH)
+		ohdr = &hdr->u.l.oth;
+	else {
+		dev->n_pkt_drops++;
+		return;
+	}
+
+	opcode = *(u8 *) (&ohdr->bth[0]);
+	dev->opstats[opcode].n_bytes += tlen;
+	dev->opstats[opcode].n_packets++;
+
+	/* Get the destination QP number. */
+	qp_num = be32_to_cpu(ohdr->bth[1]) & 0xFFFFFF;
+	if (qp_num == 0xFFFFFF) {
+		struct ipath_mcast *mcast;
+		struct ipath_mcast_qp *p;
+
+		mcast = ipath_mcast_find(&hdr->u.l.grh.dgid);
+		if (mcast == NULL) {
+			dev->n_pkt_drops++;
+			return;
+		}
+		dev->n_multicast_rcv++;
+		list_for_each_entry_rcu(p, &mcast->qp_list, list)
+			ipath_qp_rcv(dev, hdr, lnh == IPS_LRH_GRH, data, tlen,
+				     p->qp);
+		/*
+		 * Notify ipath_multicast_detach() if it is waiting for us
+		 * to finish.
+		 */
+		if (atomic_dec_return(&mcast->refcount) <= 1)
+			wake_up(&mcast->wait);
+	} else if ((qp = ipath_lookup_qpn(&dev->qp_table, qp_num)) != NULL) {
+		dev->n_unicast_rcv++;
+		ipath_qp_rcv(dev, hdr, lnh == IPS_LRH_GRH, data, tlen, qp);
+		/*
+		 * Notify ipath_destroy_qp() if it is waiting for us to finish.
+		 */
+		if (atomic_dec_and_test(&qp->refcount))
+			wake_up(&qp->wait);
+	} else
+		dev->n_pkt_drops++;
+}
+
+/*
+ * This is called from ipath_do_rcv_timer() at interrupt level
+ * to check for QPs which need retransmits and to collect performance numbers.
+ */
+static void ipath_ib_timer(const ipath_type t)
+{
+	struct ipath_ibdev *dev = ipath_devices[t];
+	struct ipath_qp *resend = NULL;
+	struct ipath_qp *rnr = NULL;
+	struct list_head *last;
+	struct ipath_qp *qp;
+	unsigned long flags;
+
+	if (dev == NULL)
+		return;
+
+	spin_lock_irqsave(&dev->pending_lock, flags);
+	/* Start filling the next pending queue. */
+	if (++dev->pending_index >= ARRAY_SIZE(dev->pending))
+		dev->pending_index = 0;
+	/* Save any requests still in the new queue, they have timed out.
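+	 * The pending[] queues are rotated once per timer tick, so a QP
+	 * still linked on the queue that has just become current has been
+	 * waiting for at least one full tick.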
+	 */
+	last = &dev->pending[dev->pending_index];
+	while (!list_empty(last)) {
+		qp = list_entry(last->next, struct ipath_qp, timerwait);
+		if (last->next == LIST_POISON1 ||
+		    last->next != &qp->timerwait ||
+		    qp->timerwait.prev != last) {
+			INIT_LIST_HEAD(last);
+		} else {
+			list_del(&qp->timerwait);
+			qp->timerwait.prev = (struct list_head *) resend;
+			resend = qp;
+			atomic_inc(&qp->refcount);
+		}
+	}
+	last = &dev->rnrwait;
+	if (!list_empty(last)) {
+		qp = list_entry(last->next, struct ipath_qp, timerwait);
+		if (--qp->s_rnr_timeout == 0) {
+			do {
+				if (last->next == LIST_POISON1 ||
+				    last->next != &qp->timerwait ||
+				    qp->timerwait.prev != last) {
+					INIT_LIST_HEAD(last);
+					break;
+				}
+				list_del(&qp->timerwait);
+				qp->timerwait.prev = (struct list_head *) rnr;
+				rnr = qp;
+				if (list_empty(last))
+					break;
+				qp = list_entry(last->next, struct ipath_qp,
+						timerwait);
+			} while (qp->s_rnr_timeout == 0);
+		}
+	}
+	/* We should only be in the started state if pma_sample_start != 0 */
+	if (dev->pma_sample_status == IB_PMA_SAMPLE_STATUS_STARTED &&
+	    --dev->pma_sample_start == 0) {
+		dev->pma_sample_status = IB_PMA_SAMPLE_STATUS_RUNNING;
+		ipath_layer_snapshot_counters(dev->ib_unit, &dev->ipath_sword,
+					      &dev->ipath_rword,
+					      &dev->ipath_spkts,
+					      &dev->ipath_rpkts);
+	}
+	if (dev->pma_sample_status == IB_PMA_SAMPLE_STATUS_RUNNING) {
+		if (dev->pma_sample_interval == 0) {
+			u64 ta, tb, tc, td;
+
+			dev->pma_sample_status = IB_PMA_SAMPLE_STATUS_DONE;
+			ipath_layer_snapshot_counters(dev->ib_unit,
+						      &ta, &tb, &tc, &td);
+
+			dev->ipath_sword = ta - dev->ipath_sword;
+			dev->ipath_rword = tb - dev->ipath_rword;
+			dev->ipath_spkts = tc - dev->ipath_spkts;
+			dev->ipath_rpkts = td - dev->ipath_rpkts;
+		} else {
+			dev->pma_sample_interval--;
+		}
+	}
+	spin_unlock_irqrestore(&dev->pending_lock, flags);
+
+	/* XXX What if timer fires again while this is running? */
+	for (qp = resend; qp != NULL;
+	     qp = (struct ipath_qp *) qp->timerwait.prev) {
+		struct ib_wc wc;
+
+		spin_lock_irqsave(&qp->s_lock, flags);
+		if (qp->s_last != qp->s_tail && qp->state == IB_QPS_RTS) {
+			dev->n_timeouts++;
+			ipath_restart_rc(qp, qp->s_last_psn + 1, &wc);
+		}
+		spin_unlock_irqrestore(&qp->s_lock, flags);
+
+		/* Notify ipath_destroy_qp() if it is waiting. */
+		if (atomic_dec_and_test(&qp->refcount))
+			wake_up(&qp->wait);
+	}
+	for (qp = rnr; qp != NULL;
+	     qp = (struct ipath_qp *) qp->timerwait.prev) {
+		tasklet_schedule(&qp->s_task);
+	}
+}
+
+/*
+ * This is called from ipath_intr() at interrupt level when a PIO buffer
+ * is available after ipath_verbs_send() returned an error that no
+ * buffers were available.
+ * Return 0 if we consumed all the PIO buffers and we still have QPs
+ * waiting for buffers (for now, just do a tasklet_schedule and return one).
+ */
+static int ipath_ib_piobufavail(const ipath_type t)
+{
+	struct ipath_ibdev *dev = ipath_devices[t];
+	struct ipath_qp *qp;
+	unsigned long flags;
+
+	if (dev == NULL)
+		return 1;
+
+	spin_lock_irqsave(&dev->pending_lock, flags);
+	while (!list_empty(&dev->piowait)) {
+		qp = list_entry(dev->piowait.next, struct ipath_qp, piowait);
+		list_del(&qp->piowait);
+		tasklet_schedule(&qp->s_task);
+	}
+	spin_unlock_irqrestore(&dev->pending_lock, flags);
+
+	return 1;
+}
+
+static struct ib_qp *ipath_create_qp(struct ib_pd *ibpd,
+				     struct ib_qp_init_attr *init_attr,
+				     struct ib_udata *udata)
+{
+	struct ipath_qp *qp;
+	int err;
+	struct ipath_swqe *swq = NULL;
+	struct ipath_ibdev *dev;
+	size_t sz;
+
+	if (init_attr->cap.max_send_sge > 255 ||
+	    init_attr->cap.max_recv_sge > 255)
+		return ERR_PTR(-ENOMEM);
+
+	switch (init_attr->qp_type) {
+	case IB_QPT_UC:
+	case IB_QPT_RC:
+		sz = sizeof(struct ipath_sge) * init_attr->cap.max_send_sge +
+		     sizeof(struct ipath_swqe);
+		swq = vmalloc((init_attr->cap.max_send_wr + 1) * sz);
+		if (swq == NULL)
+			return ERR_PTR(-ENOMEM);
+		/* FALLTHROUGH */
+	case IB_QPT_UD:
+	case IB_QPT_SMI:
+	case IB_QPT_GSI:
+		qp = kmalloc(sizeof(*qp), GFP_KERNEL);
+		if (!qp)
+			return ERR_PTR(-ENOMEM);
+		qp->r_rq.size = init_attr->cap.max_recv_wr + 1;
+		sz = sizeof(struct ipath_sge) * init_attr->cap.max_recv_sge +
+		     sizeof(struct ipath_rwqe);
+		qp->r_rq.wq = vmalloc(qp->r_rq.size * sz);
+		if (!qp->r_rq.wq) {
+			kfree(qp);
+			return ERR_PTR(-ENOMEM);
+		}
+
+		/*
+		 * ib_create_qp() will initialize qp->ibqp
+		 * except for qp->ibqp.qp_num.
+		 */
+		spin_lock_init(&qp->s_lock);
+		spin_lock_init(&qp->r_rq.lock);
+		atomic_set(&qp->refcount, 0);
+		init_waitqueue_head(&qp->wait);
+		tasklet_init(&qp->s_task,
+			     init_attr->qp_type == IB_QPT_RC ? do_rc_send :
+			     do_uc_send, (unsigned long)qp);
+		qp->piowait.next = LIST_POISON1;
+		qp->piowait.prev = LIST_POISON2;
+		qp->timerwait.next = LIST_POISON1;
+		qp->timerwait.prev = LIST_POISON2;
+		qp->state = IB_QPS_RESET;
+		qp->s_wq = swq;
+		qp->s_size = init_attr->cap.max_send_wr + 1;
+		qp->s_max_sge = init_attr->cap.max_send_sge;
+		qp->r_rq.max_sge = init_attr->cap.max_recv_sge;
+		qp->s_flags = init_attr->sq_sig_type == IB_SIGNAL_REQ_WR ?
+			      1 << IPATH_S_SIGNAL_REQ_WR : 0;
+		dev = to_idev(ibpd->device);
+		err = ipath_alloc_qpn(&dev->qp_table, qp, init_attr->qp_type);
+		if (err) {
+			vfree(swq);
+			vfree(qp->r_rq.wq);
+			kfree(qp);
+			return ERR_PTR(err);
+		}
+		ipath_reset_qp(qp);
+
+		/* Tell the core driver that the kernel SMA is present. */
+		if (qp->ibqp.qp_type == IB_QPT_SMI)
+			ipath_verbs_set_flags(dev->ib_unit,
+					      IPATH_VERBS_KERNEL_SMA);
+		break;
+
+	default:
+		/* Don't support raw QPs */
+		return ERR_PTR(-ENOSYS);
+	}
+
+	init_attr->cap.max_inline_data = 0;
+
+	return &qp->ibqp;
+}
+
+/*
+ * Note that this can be called while the QP is actively sending or receiving!
+ */
+static int ipath_destroy_qp(struct ib_qp *ibqp)
+{
+	struct ipath_qp *qp = to_iqp(ibqp);
+	struct ipath_ibdev *dev = to_idev(ibqp->device);
+	unsigned long flags;
+
+	/* Tell the core driver that the kernel SMA is gone. */
+	if (qp->ibqp.qp_type == IB_QPT_SMI)
+		ipath_verbs_set_flags(dev->ib_unit, 0);
+
+	spin_lock_irqsave(&qp->r_rq.lock, flags);
+	spin_lock(&qp->s_lock);
+	qp->state = IB_QPS_ERR;
+	spin_unlock(&qp->s_lock);
+	spin_unlock_irqrestore(&qp->r_rq.lock, flags);
+
+	/* Stop the sending tasklet. */
+	tasklet_kill(&qp->s_task);
+
+	/* Make sure the QP isn't on the timeout list. */
+	spin_lock_irqsave(&dev->pending_lock, flags);
+	if (qp->timerwait.next != LIST_POISON1)
+		list_del(&qp->timerwait);
+	if (qp->piowait.next != LIST_POISON1)
+		list_del(&qp->piowait);
+	spin_unlock_irqrestore(&dev->pending_lock, flags);
+
+	/*
+	 * Make sure that the QP is not in the QPN table so receive interrupts
+	 * will discard packets for this QP.
+	 * XXX Also remove QP from multicast table.
+	 */
+	if (atomic_read(&qp->refcount) != 0)
+		ipath_free_qp(&dev->qp_table, qp);
+
+	vfree(qp->s_wq);
+	vfree(qp->r_rq.wq);
+	kfree(qp);
+	return 0;
+}
+
+static struct ib_srq *ipath_create_srq(struct ib_pd *ibpd,
+				       struct ib_srq_init_attr *srq_init_attr,
+				       struct ib_udata *udata)
+{
+	struct ipath_srq *srq;
+	u32 sz;
+
+	if (srq_init_attr->attr.max_sge < 1)
+		return ERR_PTR(-EINVAL);
+
+	srq = kmalloc(sizeof(*srq), GFP_KERNEL);
+	if (!srq)
+		return ERR_PTR(-ENOMEM);
+
+	/* Need to use vmalloc() if we want to support large #s of entries. */
+	srq->rq.size = srq_init_attr->attr.max_wr + 1;
+	sz = sizeof(struct ipath_sge) * srq_init_attr->attr.max_sge +
+	     sizeof(struct ipath_rwqe);
+	srq->rq.wq = vmalloc(srq->rq.size * sz);
+	if (!srq->rq.wq) {
+		kfree(srq);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	/*
+	 * ib_create_srq() will initialize srq->ibsrq.
+	 */
+	spin_lock_init(&srq->rq.lock);
+	srq->rq.head = 0;
+	srq->rq.tail = 0;
+	srq->rq.max_sge = srq_init_attr->attr.max_sge;
+	srq->limit = srq_init_attr->attr.srq_limit;
+
+	return &srq->ibsrq;
+}
+
+int ipath_modify_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr,
+		     enum ib_srq_attr_mask attr_mask)
+{
+	struct ipath_srq *srq = to_isrq(ibsrq);
+	unsigned long flags;
+
+	if (attr_mask & IB_SRQ_LIMIT) {
+		spin_lock_irqsave(&srq->rq.lock, flags);
+		srq->limit = attr->srq_limit;
+		spin_unlock_irqrestore(&srq->rq.lock, flags);
+	}
+	if (attr_mask & IB_SRQ_MAX_WR) {
+		u32 size = attr->max_wr + 1;
+		struct ipath_rwqe *wq, *p;
+		u32 n;
+		u32 sz;
+
+		if (attr->max_sge < srq->rq.max_sge)
+			return -EINVAL;
+
+		sz = sizeof(struct ipath_rwqe) +
+		     attr->max_sge * sizeof(struct ipath_sge);
+		wq = vmalloc(size * sz);
+		if (!wq)
+			return -ENOMEM;
+
+		spin_lock_irqsave(&srq->rq.lock, flags);
+		if (srq->rq.head < srq->rq.tail)
+			n = srq->rq.size + srq->rq.head - srq->rq.tail;
+		else
+			n = srq->rq.head - srq->rq.tail;
+		if (size <= n || size <= srq->limit) {
+			spin_unlock_irqrestore(&srq->rq.lock, flags);
+			vfree(wq);
+			return -EINVAL;
+		}
+		n = 0;
+		p = wq;
+		while (srq->rq.tail != srq->rq.head) {
+			struct ipath_rwqe *wqe;
+			int i;
+
+			wqe = get_rwqe_ptr(&srq->rq, srq->rq.tail);
+			p->wr_id = wqe->wr_id;
+			p->length = wqe->length;
+			p->num_sge = wqe->num_sge;
+			for (i = 0; i < wqe->num_sge; i++)
+				p->sg_list[i] = wqe->sg_list[i];
+			n++;
+			p = (struct ipath_rwqe *)((char *) p + sz);
+			if (++srq->rq.tail >= srq->rq.size)
+				srq->rq.tail = 0;
+		}
+		vfree(srq->rq.wq);
+		srq->rq.wq = wq;
+		srq->rq.size = size;
+		srq->rq.head = n;
+		srq->rq.tail = 0;
+		srq->rq.max_sge = attr->max_sge;
+		spin_unlock_irqrestore(&srq->rq.lock, flags);
+	}
+	return 0;
+}
+
+static int ipath_destroy_srq(struct ib_srq *ibsrq)
+{
+	struct ipath_srq *srq = to_isrq(ibsrq);
+
+	vfree(srq->rq.wq);
+	kfree(srq);
+
+	return 0;
+}
+
+/*
+ * This may be called from interrupt context.
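+ *
+ * The CQ is a ring buffer: entries are consumed from cq->tail until it
+ * catches up with cq->head, wrapping at cq->ibcq.cqe. A typical caller
+ * loop (an illustrative sketch only; handle_wc() stands in for whatever
+ * the consumer does with a completion) looks like:
+ *
+ *	struct ib_wc wc[4];
+ *	int i, n;
+ *
+ *	while ((n = ib_poll_cq(ibcq, 4, wc)) > 0)
+ *		for (i = 0; i < n; i++)
+ *			handle_wc(&wc[i]);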
+ */
+static int ipath_poll_cq(struct ib_cq *ibcq, int num_entries,
+			 struct ib_wc *entry)
+{
+	struct ipath_cq *cq = to_icq(ibcq);
+	unsigned long flags;
+	int npolled;
+
+	spin_lock_irqsave(&cq->lock, flags);
+
+	for (npolled = 0; npolled < num_entries; ++npolled, ++entry) {
+		if (cq->tail == cq->head)
+			break;
+		*entry = cq->queue[cq->tail];
+		if (++cq->tail == cq->ibcq.cqe)
+			cq->tail = 0;
+	}
+
+	spin_unlock_irqrestore(&cq->lock, flags);
+
+	return npolled;
+}
+
+static struct ib_cq *ipath_create_cq(struct ib_device *ibdev, int entries,
+				     struct ib_ucontext *context,
+				     struct ib_udata *udata)
+{
+	struct ipath_cq *cq;
+
+	/* Need to use vmalloc() if we want to support large #s of entries. */
+	cq = vmalloc(sizeof(*cq) + entries * sizeof(*cq->queue));
+	if (!cq)
+		return ERR_PTR(-ENOMEM);
+	/*
+	 * ib_create_cq() will initialize cq->ibcq except for cq->ibcq.cqe.
+	 * The number of entries should be >= the number requested or
+	 * return an error.
+	 */
+	cq->ibcq.cqe = entries + 1;
+	cq->notify = IB_CQ_NONE;
+	cq->triggered = 0;
+	spin_lock_init(&cq->lock);
+	tasklet_init(&cq->comptask, send_complete, (unsigned long)cq);
+	cq->head = 0;
+	cq->tail = 0;
+
+	return &cq->ibcq;
+}
+
+static int ipath_destroy_cq(struct ib_cq *ibcq)
+{
+	struct ipath_cq *cq = to_icq(ibcq);
+
+	tasklet_kill(&cq->comptask);
+	vfree(cq);
+
+	return 0;
+}
+
+/*
+ * This may be called from interrupt context.
+ */
+static int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify)
+{
+	struct ipath_cq *cq = to_icq(ibcq);
+	unsigned long flags;
+
+	spin_lock_irqsave(&cq->lock, flags);
+	/*
+	 * Don't change IB_CQ_NEXT_COMP to IB_CQ_SOLICITED but allow
+	 * any other transitions.
+	 */
+	if (cq->notify != IB_CQ_NEXT_COMP)
+		cq->notify = notify;
+	spin_unlock_irqrestore(&cq->lock, flags);
+	return 0;
+}
+
+static int ipath_query_device(struct ib_device *ibdev,
+			      struct ib_device_attr *props)
+{
+	struct ipath_ibdev *dev = to_idev(ibdev);
+	uint32_t vendor, boardrev, majrev, minrev;
+
+	memset(props, 0, sizeof(*props));
+
+	props->device_cap_flags = IB_DEVICE_BAD_PKEY_CNTR |
+		IB_DEVICE_BAD_QKEY_CNTR | IB_DEVICE_SHUTDOWN_PORT |
+		IB_DEVICE_SYS_IMAGE_GUID;
+	ipath_layer_query_device(dev->ib_unit, &vendor, &boardrev,
+				 &majrev, &minrev);
+	props->vendor_id = vendor;
+	props->vendor_part_id = boardrev;
+	props->hw_ver = boardrev << 16 | majrev << 8 | minrev;
+
+	props->sys_image_guid = dev->sys_image_guid;
+	props->node_guid = ipath_layer_get_guid(dev->ib_unit);
+
+	props->max_mr_size = ~0ull;
+	props->max_qp = 0xffff;
+	props->max_qp_wr = 0xffff;
+	props->max_sge = 255;
+	props->max_cq = 0xffff;
+	props->max_cqe = 0xffff;
+	props->max_mr = 0xffff;
+	props->max_pd = 0xffff;
+	props->max_qp_rd_atom = 1;
+	props->max_qp_init_rd_atom = 1;
+	/* props->max_res_rd_atom */
+	props->max_srq = 0xffff;
+	props->max_srq_wr = 0xffff;
+	props->max_srq_sge = 255;
+	/* props->local_ca_ack_delay */
+	props->atomic_cap = IB_ATOMIC_HCA;
+	props->max_pkeys = ipath_layer_get_npkeys(dev->ib_unit);
+	props->max_mcast_grp = 0xffff;
+	props->max_mcast_qp_attach = 0xffff;
+	props->max_total_mcast_qp_attach = props->max_mcast_qp_attach *
+		props->max_mcast_grp;
+
+	return 0;
+}
+
+static int ipath_query_port(struct ib_device *ibdev,
+			    u8 port, struct ib_port_attr *props)
+{
+	struct ipath_ibdev *dev = to_idev(ibdev);
+	uint32_t flags = ipath_layer_get_flags(dev->ib_unit);
+	enum ib_mtu mtu;
+	uint32_t l;
+	uint16_t lid = ipath_layer_get_lid(dev->ib_unit);
+
+	memset(props, 0, sizeof(*props));
+	props->lid = lid ? lid : IB_LID_PERMISSIVE;
+	props->lmc = dev->mkeyprot_resv_lmc & 7;
+	props->sm_lid = dev->sm_lid;
+	props->sm_sl = dev->sm_sl;
+	if (flags & IPATH_LINKDOWN)
+		props->state = IB_PORT_DOWN;
+	else if (flags & IPATH_LINKARMED)
+		props->state = IB_PORT_ARMED;
+	else if (flags & IPATH_LINKACTIVE)
+		props->state = IB_PORT_ACTIVE;
+	else if (flags & IPATH_LINK_SLEEPING)
+		props->state = IB_PORT_ACTIVE_DEFER;
+	else
+		props->state = IB_PORT_NOP;
+	/* See phys_state_show() */
+	props->phys_state = 5;	/* LinkUp */
+	props->port_cap_flags = dev->port_cap_flags;
+	props->gid_tbl_len = 1;
+	props->max_msg_sz = 4096;
+	props->pkey_tbl_len = ipath_layer_get_npkeys(dev->ib_unit);
+	props->bad_pkey_cntr = ipath_layer_get_cr_errpkey(dev->ib_unit);
+	props->qkey_viol_cntr = dev->qkey_violations;
+	props->active_width = IB_WIDTH_4X;
+	/* See rate_show() */
+	props->active_speed = 1;	/* 2.5 Gbps per lane, i.e. 10 Gbps at 4X */
+	props->max_vl_num = 1;	/* VLCap = VL0 */
+	props->init_type_reply = 0;
+
+	props->max_mtu = IB_MTU_4096;
+	l = ipath_layer_get_ibmtu(dev->ib_unit);
+	switch (l) {
+	case 4096:
+		mtu = IB_MTU_4096;
+		break;
+	case 2048:
+		mtu = IB_MTU_2048;
+		break;
+	case 1024:
+		mtu = IB_MTU_1024;
+		break;
+	case 512:
+		mtu = IB_MTU_512;
+		break;
+	case 256:
+		mtu = IB_MTU_256;
+		break;
+	default:
+		mtu = IB_MTU_2048;
+	}
+	props->active_mtu = mtu;
+	props->subnet_timeout = dev->subnet_timeout;
+
+	return 0;
+}
+
+static int ipath_modify_device(struct ib_device *device,
+			       int device_modify_mask,
+			       struct ib_device_modify *device_modify)
+{
+	if (device_modify_mask & IB_DEVICE_MODIFY_SYS_IMAGE_GUID)
+		to_idev(device)->sys_image_guid = device_modify->sys_image_guid;
+
+	return 0;
+}
+
+static int ipath_modify_port(struct ib_device *ibdev,
+			     u8 port, int port_modify_mask,
+			     struct ib_port_modify *props)
+{
+	struct ipath_ibdev *dev = to_idev(ibdev);
+
+	atomic_set_mask(props->set_port_cap_mask, &dev->port_cap_flags);
+	atomic_clear_mask(props->clr_port_cap_mask, &dev->port_cap_flags);
+	if (port_modify_mask & IB_PORT_SHUTDOWN)
+		ipath_kset_linkstate(dev->ib_unit << 16 | IPATH_IB_LINKDOWN);
+	if (port_modify_mask & IB_PORT_RESET_QKEY_CNTR)
+		dev->qkey_violations = 0;
+	return 0;
+}
+
+static int ipath_query_pkey(struct ib_device *ibdev,
+			    u8 port, u16 index, u16 *pkey)
+{
+	struct ipath_ibdev *dev = to_idev(ibdev);
+
+	if (index >= ipath_layer_get_npkeys(dev->ib_unit))
+		return -EINVAL;
+	*pkey = ipath_layer_get_pkey(dev->ib_unit, index);
+	return 0;
+}
+
+static int ipath_query_gid(struct ib_device *ibdev, u8 port,
+			   int index, union ib_gid *gid)
+{
+	struct ipath_ibdev *dev = to_idev(ibdev);
+
+	if (index >= 1)
+		return -EINVAL;
+	gid->global.subnet_prefix = dev->gid_prefix;
+	gid->global.interface_id = ipath_layer_get_guid(dev->ib_unit);
+
+	return 0;
+}
+
+static struct ib_pd *ipath_alloc_pd(struct ib_device *ibdev,
+				    struct ib_ucontext *context,
+				    struct ib_udata *udata)
+{
+	struct ipath_pd *pd;
+
+	pd = kmalloc(sizeof *pd, GFP_KERNEL);
+	if (!pd)
+		return ERR_PTR(-ENOMEM);
+
+	/* ib_alloc_pd() will initialize pd->ibpd. */
+	pd->user = udata != NULL;
+
+	return &pd->ibpd;
+}
+
+static int ipath_dealloc_pd(struct ib_pd *ibpd)
+{
+	struct ipath_pd *pd = to_ipd(ibpd);
+
+	kfree(pd);
+
+	return 0;
+}
+
+/*
+ * This may be called from interrupt context.
+ */
+static struct ib_ah *ipath_create_ah(struct ib_pd *pd,
+				     struct ib_ah_attr *ah_attr)
+{
+	struct ipath_ah *ah;
+
+	ah = kmalloc(sizeof *ah, GFP_ATOMIC);
+	if (!ah)
+		return ERR_PTR(-ENOMEM);
+
+	/* ib_create_ah() will initialize ah->ibah. */
+	ah->attr = *ah_attr;
+
+	return &ah->ibah;
+}
+
+/*
+ * This may be called from interrupt context.
+ */
+static int ipath_destroy_ah(struct ib_ah *ibah)
+{
+	struct ipath_ah *ah = to_iah(ibah);
+
+	kfree(ah);
+
+	return 0;
+}
+
+static struct ib_mr *ipath_get_dma_mr(struct ib_pd *pd, int acc)
+{
+	struct ipath_mr *mr;
+
+	mr = kmalloc(sizeof *mr, GFP_KERNEL);
+	if (!mr)
+		return ERR_PTR(-ENOMEM);
+
+	/* ib_get_dma_mr() will initialize mr->ibmr except for lkey and rkey. */
+	memset(mr, 0, sizeof *mr);
+	mr->mr.access_flags = acc;
+	return &mr->ibmr;
+}
+
+static struct ib_mr *ipath_reg_phys_mr(struct ib_pd *pd,
+				       struct ib_phys_buf *buffer_list,
+				       int num_phys_buf,
+				       int acc, u64 *iova_start)
+{
+	struct ipath_mr *mr;
+	int n, m, i;
+
+	/* Allocate struct plus pointers to first level page tables. */
+	m = (num_phys_buf + IPATH_SEGSZ - 1) / IPATH_SEGSZ;
+	mr = kmalloc(sizeof *mr + m * sizeof mr->mr.map[0], GFP_KERNEL);
+	if (!mr)
+		return ERR_PTR(-ENOMEM);
+
+	/* Allocate first level page tables. */
+	for (i = 0; i < m; i++) {
+		mr->mr.map[i] = kmalloc(sizeof *mr->mr.map[0], GFP_KERNEL);
+		if (!mr->mr.map[i]) {
+			while (i)
+				kfree(mr->mr.map[--i]);
+			kfree(mr);
+			return ERR_PTR(-ENOMEM);
+		}
+	}
+	mr->mr.mapsz = m;
+
+	/*
+	 * ib_reg_phys_mr() will initialize mr->ibmr except for
+	 * lkey and rkey.
+	 */
+	if (!ipath_alloc_lkey(&to_idev(pd->device)->lk_table, &mr->mr)) {
+		while (i)
+			kfree(mr->mr.map[--i]);
+		kfree(mr);
+		return ERR_PTR(-ENOMEM);
+	}
+	mr->ibmr.rkey = mr->ibmr.lkey = mr->mr.lkey;
+	mr->mr.user_base = *iova_start;
+	mr->mr.iova = *iova_start;
+	mr->mr.length = 0;
+	mr->mr.offset = 0;
+	mr->mr.access_flags = acc;
+	mr->mr.max_segs = num_phys_buf;
+	m = 0;
+	n = 0;
+	for (i = 0; i < num_phys_buf; i++) {
+		mr->mr.map[m]->segs[n].vaddr =
+			phys_to_virt(buffer_list[i].addr);
+		mr->mr.map[m]->segs[n].length = buffer_list[i].size;
+		mr->mr.length += buffer_list[i].size;
+		if (++n == IPATH_SEGSZ) {
+			m++;
+			n = 0;
+		}
+	}
+	return &mr->ibmr;
+}
+
+static struct ib_mr *ipath_reg_user_mr(struct ib_pd *pd,
+				       struct ib_umem *region,
+				       int mr_access_flags,
+				       struct ib_udata *udata)
+{
+	struct ipath_mr *mr;
+	struct ib_umem_chunk *chunk;
+	int n, m, i;
+
+	n = 0;
+	list_for_each_entry(chunk, &region->chunk_list, list)
+		n += chunk->nents;
+
+	/* Allocate struct plus pointers to first level page tables. */
+	m = (n + IPATH_SEGSZ - 1) / IPATH_SEGSZ;
+	mr = kmalloc(sizeof *mr + m * sizeof mr->mr.map[0], GFP_KERNEL);
+	if (!mr)
+		return ERR_PTR(-ENOMEM);
+
+	/* Allocate first level page tables. */
+	for (i = 0; i < m; i++) {
+		mr->mr.map[i] = kmalloc(sizeof *mr->mr.map[0], GFP_KERNEL);
+		if (!mr->mr.map[i]) {
+			while (i)
+				kfree(mr->mr.map[--i]);
+			kfree(mr);
+			return ERR_PTR(-ENOMEM);
+		}
+	}
+	mr->mr.mapsz = m;
+
+	/*
+	 * ib_uverbs_reg_mr() will initialize mr->ibmr except for
+	 * lkey and rkey.
+	 */
+	if (!ipath_alloc_lkey(&to_idev(pd->device)->lk_table, &mr->mr)) {
+		while (i)
+			kfree(mr->mr.map[--i]);
+		kfree(mr);
+		return ERR_PTR(-ENOMEM);
+	}
+	mr->ibmr.rkey = mr->ibmr.lkey = mr->mr.lkey;
+	mr->mr.user_base = region->user_base;
+	mr->mr.iova = region->virt_base;
+	mr->mr.length = region->length;
+	mr->mr.offset = region->offset;
+	mr->mr.access_flags = mr_access_flags;
+	mr->mr.max_segs = n;
+	m = 0;
+	n = 0;
+	list_for_each_entry(chunk, &region->chunk_list, list) {
+		for (i = 0; i < chunk->nmap; i++) {
+			mr->mr.map[m]->segs[n].vaddr =
+				page_address(chunk->page_list[i].page);
+			mr->mr.map[m]->segs[n].length = region->page_size;
+			if (++n == IPATH_SEGSZ) {
+				m++;
+				n = 0;
+			}
+		}
+	}
+	return &mr->ibmr;
+}
+
+/*
+ * Note that this is called to free MRs created by
+ * ipath_get_dma_mr() or ipath_reg_user_mr().
+ */
+static int ipath_dereg_mr(struct ib_mr *ibmr)
+{
+	struct ipath_mr *mr = to_imr(ibmr);
+	int i;
+
+	ipath_free_lkey(&to_idev(ibmr->device)->lk_table, ibmr->lkey);
+	i = mr->mr.mapsz;
+	while (i)
+		kfree(mr->mr.map[--i]);
+	kfree(mr);
+	return 0;
+}
+
+static struct ib_fmr *ipath_alloc_fmr(struct ib_pd *pd,
+				      int mr_access_flags,
+				      struct ib_fmr_attr *fmr_attr)
+{
+	struct ipath_fmr *fmr;
+	int m, i;
+
+	/* Allocate struct plus pointers to first level page tables. */
+	m = (fmr_attr->max_pages + IPATH_SEGSZ - 1) / IPATH_SEGSZ;
+	fmr = kmalloc(sizeof *fmr + m * sizeof fmr->mr.map[0], GFP_KERNEL);
+	if (!fmr)
+		return ERR_PTR(-ENOMEM);
+
+	/* Allocate first level page tables. */
+	for (i = 0; i < m; i++) {
+		fmr->mr.map[i] = kmalloc(sizeof *fmr->mr.map[0], GFP_KERNEL);
+		if (!fmr->mr.map[i]) {
+			while (i)
+				kfree(fmr->mr.map[--i]);
+			kfree(fmr);
+			return ERR_PTR(-ENOMEM);
+		}
+	}
+	fmr->mr.mapsz = m;
+
+	/* ib_alloc_fmr() will initialize fmr->ibfmr except for lkey & rkey. */
+	if (!ipath_alloc_lkey(&to_idev(pd->device)->lk_table, &fmr->mr)) {
+		while (i)
+			kfree(fmr->mr.map[--i]);
+		kfree(fmr);
+		return ERR_PTR(-ENOMEM);
+	}
+	fmr->ibfmr.rkey = fmr->ibfmr.lkey = fmr->mr.lkey;
+	/* Resources are allocated but no valid mapping (RKEY can't be used). */
+	fmr->mr.user_base = 0;
+	fmr->mr.iova = 0;
+	fmr->mr.length = 0;
+	fmr->mr.offset = 0;
+	fmr->mr.access_flags = mr_access_flags;
+	fmr->mr.max_segs = fmr_attr->max_pages;
+	fmr->page_size = fmr_attr->page_size;
+	return &fmr->ibfmr;
+}
+
+/*
+ * This may be called from interrupt context.
+ * XXX Can we ever be called to map a portion of the RKEY space?
+ */
+static int ipath_map_phys_fmr(struct ib_fmr *ibfmr,
+			      u64 * page_list, int list_len, u64 iova)
+{
+	struct ipath_fmr *fmr = to_ifmr(ibfmr);
+	struct ipath_lkey_table *rkt;
+	unsigned long flags;
+	int m, n, i;
+	u32 ps;
+
+	if (list_len > fmr->mr.max_segs)
+		return -EINVAL;
+	rkt = &to_idev(ibfmr->device)->lk_table;
+	spin_lock_irqsave(&rkt->lock, flags);
+	fmr->mr.user_base = iova;
+	fmr->mr.iova = iova;
+	ps = 1 << fmr->page_size;
+	fmr->mr.length = list_len * ps;
+	m = 0;
+	n = 0;
+	ps = 1 << fmr->page_size;
+	for (i = 0; i < list_len; i++) {
+		fmr->mr.map[m]->segs[n].vaddr = phys_to_virt(page_list[i]);
+		fmr->mr.map[m]->segs[n].length = ps;
+		if (++n == IPATH_SEGSZ) {
+			m++;
+			n = 0;
+		}
+	}
+	spin_unlock_irqrestore(&rkt->lock, flags);
+	return 0;
+}
+
+static int ipath_unmap_fmr(struct list_head *fmr_list)
+{
+	struct ipath_fmr *fmr;
+
+	list_for_each_entry(fmr, fmr_list, ibfmr.list) {
+		fmr->mr.user_base = 0;
+		fmr->mr.iova = 0;
+		fmr->mr.length = 0;
+	}
+	return 0;
+}
+
+static int ipath_dealloc_fmr(struct ib_fmr *ibfmr)
+{
+	struct ipath_fmr *fmr = to_ifmr(ibfmr);
+	int i;
+
+	ipath_free_lkey(&to_idev(ibfmr->device)->lk_table, ibfmr->lkey);
+	i = fmr->mr.mapsz;
+	while (i)
+		kfree(fmr->mr.map[--i]);
+	kfree(fmr);
+	return 0;
+}
+
+static ssize_t show_rev(struct class_device *cdev, char *buf)
+{
+	struct ipath_ibdev *dev =
+		container_of(cdev, struct ipath_ibdev, ibdev.class_dev);
+	int vendor, boardrev, majrev, minrev;
+
+	ipath_layer_query_device(dev->ib_unit, &vendor, &boardrev,
+				 &majrev, &minrev);
+	return sprintf(buf, "%d.%d\n", majrev, minrev);
+}
+
+static ssize_t show_hca(struct class_device *cdev, char *buf)
+{
+	struct ipath_ibdev *dev =
+		container_of(cdev, struct ipath_ibdev, ibdev.class_dev);
+	int vendor, boardrev, majrev, minrev;
+
+	ipath_layer_query_device(dev->ib_unit, &vendor, &boardrev,
+				 &majrev, &minrev);
+	ipath_get_boardname(dev->ib_unit, buf, 128);
+	strcat(buf, "\n");
+	return strlen(buf);
+}
+
+static ssize_t show_board(struct class_device *cdev, char *buf)
+{
+	struct ipath_ibdev *dev =
+		container_of(cdev, struct ipath_ibdev, ibdev.class_dev);
+	int vendor, boardrev, majrev, minrev;
+
+	ipath_layer_query_device(dev->ib_unit, &vendor, &boardrev,
+				 &majrev, &minrev);
+	ipath_get_boardname(dev->ib_unit, buf, 128);
+	strcat(buf, "\n");
+	return strlen(buf);
+}
+
+static ssize_t show_stats(struct class_device *cdev, char *buf)
+{
+	struct ipath_ibdev *dev =
+		container_of(cdev, struct ipath_ibdev, ibdev.class_dev);
+	char *p;
+	int i;
+
+	sprintf(buf,
+		"RC resends %d\n"
+		"RC QACKs %d\n"
+		"RC ACKs %d\n"
+		"RC SEQ NAKs %d\n"
+		"RC RDMA seq %d\n"
+		"RC RNR NAKs %d\n"
+		"RC OTH NAKs %d\n"
+		"RC timeouts %d\n"
+		"RC RDMA dup %d\n"
+		"piobuf wait %d\n"
+		"no piobuf %d\n"
+		"PKT drops %d\n"
+		"WQE errs %d\n",
+		dev->n_rc_resends, dev->n_rc_qacks, dev->n_rc_acks,
+		dev->n_seq_naks, dev->n_rdma_seq, dev->n_rnr_naks,
+		dev->n_other_naks, dev->n_timeouts, dev->n_rdma_dup_busy,
+		dev->n_piowait, dev->n_no_piobuf, dev->n_pkt_drops,
+		dev->n_wqe_errs);
+	p = buf;
+	for (i = 0; i < ARRAY_SIZE(dev->opstats); i++) {
+		if (!dev->opstats[i].n_packets && !dev->opstats[i].n_bytes)
+			continue;
+		p += strlen(p);
+		sprintf(p, "%02x %llu/%llu\n",
+			i, dev->opstats[i].n_packets, dev->opstats[i].n_bytes);
+	}
+	return strlen(buf);
+}
+
+static CLASS_DEVICE_ATTR(hw_rev, S_IRUGO, show_rev, NULL);
+static CLASS_DEVICE_ATTR(hca_type, S_IRUGO, show_hca, NULL);
+static CLASS_DEVICE_ATTR(board_id, S_IRUGO, show_board, NULL);
+static CLASS_DEVICE_ATTR(stats, S_IRUGO, show_stats, NULL);
+
+static struct class_device_attribute *ipath_class_attributes[] = {
+	&class_device_attr_hw_rev,
+	&class_device_attr_hca_type,
+	&class_device_attr_board_id,
+	&class_device_attr_stats
+};
+
+/*
+ * Allocate a ucontext.
+ */
+
+static struct ib_ucontext *ipath_alloc_ucontext(struct ib_device *ibdev,
+						struct ib_udata *udata)
+{
+	struct ipath_ucontext *context;
+
+	context = kmalloc(sizeof *context, GFP_KERNEL);
+	if (!context)
+		return ERR_PTR(-ENOMEM);
+
+	return &context->ibucontext;
+}
+
+static int ipath_dealloc_ucontext(struct ib_ucontext *context)
+{
+	kfree(to_iucontext(context));
+	return 0;
+}
+
+/*
+ * Register our device with the infiniband core.
+ */
+static int ipath_register_ib_device(const ipath_type t)
+{
+	struct ipath_ibdev *idev;
+	struct ib_device *dev;
+	int i;
+	int ret;
+
+	idev = (struct ipath_ibdev *)ib_alloc_device(sizeof *idev);
+	if (idev == NULL)
+		return -ENOMEM;
+
+	dev = &idev->ibdev;
+
+	/* Only need to initialize non-zero fields. */
+	spin_lock_init(&idev->qp_table.lock);
+	spin_lock_init(&idev->lk_table.lock);
+	idev->sm_lid = IB_LID_PERMISSIVE;
+	idev->gid_prefix = __constant_cpu_to_be64(0xfe80000000000000UL);
+	idev->qp_table.last = 1;	/* QPN 0 and 1 are special. */
+	idev->qp_table.max = ib_ipath_qp_table_size;
+	idev->qp_table.nmaps = 1;
+	idev->qp_table.table = kmalloc(idev->qp_table.max *
+				       sizeof(*idev->qp_table.table),
+				       GFP_KERNEL);
+	if (idev->qp_table.table == NULL) {
+		ret = -ENOMEM;
+		goto err_qp;
+	}
+	memset(idev->qp_table.table, 0,
+	       idev->qp_table.max * sizeof(*idev->qp_table.table));
+	for (i = 0; i < ARRAY_SIZE(idev->qp_table.map); i++) {
+		atomic_set(&idev->qp_table.map[i].n_free, BITS_PER_PAGE);
+		idev->qp_table.map[i].page = NULL;
+	}
+	/*
+	 * The top ib_ipath_lkey_table_size bits are used to index the table.
+	 * The lower 8 bits can be owned by the user (copied from the LKEY).
+	 * The remaining bits act as a generation number or tag.
+	 */
+	idev->lk_table.max = 1 << ib_ipath_lkey_table_size;
+	idev->lk_table.table = kmalloc(idev->lk_table.max *
+				       sizeof(*idev->lk_table.table),
+				       GFP_KERNEL);
+	if (idev->lk_table.table == NULL) {
+		ret = -ENOMEM;
+		goto err_lk;
+	}
+	memset(idev->lk_table.table, 0,
+	       idev->lk_table.max * sizeof(*idev->lk_table.table));
+	spin_lock_init(&idev->pending_lock);
+	INIT_LIST_HEAD(&idev->pending[0]);
+	INIT_LIST_HEAD(&idev->pending[1]);
+	INIT_LIST_HEAD(&idev->pending[2]);
+	INIT_LIST_HEAD(&idev->piowait);
+	INIT_LIST_HEAD(&idev->rnrwait);
+	idev->pending_index = 0;
+	idev->port_cap_flags =
+		IB_PORT_SYS_IMAGE_GUID_SUP | IB_PORT_CLIENT_REG_SUP;
+	idev->pma_counter_select[0] = IB_PMA_PORT_XMIT_DATA;
+	idev->pma_counter_select[1] = IB_PMA_PORT_RCV_DATA;
+	idev->pma_counter_select[2] = IB_PMA_PORT_XMIT_PKTS;
+	idev->pma_counter_select[3] = IB_PMA_PORT_RCV_PKTS;
+	idev->pma_counter_select[4] = IB_PMA_PORT_XMIT_WAIT;
+
+	/*
+	 * The system image GUID is supposed to be the same for all
+	 * IB HCAs in a single system.
+	 * Note that this code assumes device zero is found first.
+	 */
+	idev->sys_image_guid =
+		t ? ipath_devices[t]->sys_image_guid : ipath_layer_get_guid(t);
+	idev->ib_unit = t;
+
+	strlcpy(dev->name, "ipath%d", IB_DEVICE_NAME_MAX);
+	dev->node_guid = ipath_layer_get_guid(t);
+	dev->uverbs_abi_ver = IPATH_UVERBS_ABI_VERSION;
+	dev->uverbs_cmd_mask =
+		(1ull << IB_USER_VERBS_CMD_GET_CONTEXT) |
+		(1ull << IB_USER_VERBS_CMD_QUERY_DEVICE) |
+		(1ull << IB_USER_VERBS_CMD_QUERY_PORT) |
+		(1ull << IB_USER_VERBS_CMD_ALLOC_PD) |
+		(1ull << IB_USER_VERBS_CMD_DEALLOC_PD) |
+		(1ull << IB_USER_VERBS_CMD_CREATE_AH) |
+		(1ull << IB_USER_VERBS_CMD_DESTROY_AH) |
+		(1ull << IB_USER_VERBS_CMD_REG_MR) |
+		(1ull << IB_USER_VERBS_CMD_DEREG_MR) |
+		(1ull << IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL) |
+		(1ull << IB_USER_VERBS_CMD_CREATE_CQ) |
+		(1ull << IB_USER_VERBS_CMD_DESTROY_CQ) |
+		(1ull << IB_USER_VERBS_CMD_POLL_CQ) |
+		(1ull << IB_USER_VERBS_CMD_REQ_NOTIFY_CQ) |
+		(1ull << IB_USER_VERBS_CMD_CREATE_QP) |
+		(1ull << IB_USER_VERBS_CMD_MODIFY_QP) |
+		(1ull << IB_USER_VERBS_CMD_DESTROY_QP) |
+		(1ull << IB_USER_VERBS_CMD_POST_SEND) |
+		(1ull << IB_USER_VERBS_CMD_POST_RECV) |
+		(1ull << IB_USER_VERBS_CMD_ATTACH_MCAST) |
+		(1ull << IB_USER_VERBS_CMD_DETACH_MCAST) |
+		(1ull << IB_USER_VERBS_CMD_CREATE_SRQ) |
+		(1ull << IB_USER_VERBS_CMD_MODIFY_SRQ) |
+		(1ull << IB_USER_VERBS_CMD_DESTROY_SRQ) |
+		(1ull << IB_USER_VERBS_CMD_POST_SRQ_RECV);
+	dev->node_type = IB_NODE_CA;
+	dev->phys_port_cnt = 1;
+	dev->dma_device = ipath_layer_get_pcidev(t);
+	dev->class_dev.dev = dev->dma_device;
+	dev->query_device = ipath_query_device;
+	dev->modify_device = ipath_modify_device;
+	dev->query_port = ipath_query_port;
+	dev->modify_port = ipath_modify_port;
+	dev->query_pkey = ipath_query_pkey;
+	dev->query_gid = ipath_query_gid;
+	dev->alloc_ucontext = ipath_alloc_ucontext;
+	dev->dealloc_ucontext = ipath_dealloc_ucontext;
+	dev->alloc_pd = ipath_alloc_pd;
+	dev->dealloc_pd = ipath_dealloc_pd;
+	dev->create_ah = ipath_create_ah;
+	dev->destroy_ah = ipath_destroy_ah;
+	dev->create_srq = ipath_create_srq;
+	dev->modify_srq = ipath_modify_srq;
+	dev->destroy_srq = ipath_destroy_srq;
+	dev->create_qp = ipath_create_qp;
+	dev->modify_qp = ipath_modify_qp;
+	dev->destroy_qp = ipath_destroy_qp;
+	dev->post_send = ipath_post_send;
+	dev->post_recv = ipath_post_receive;
+	dev->post_srq_recv = ipath_post_srq_receive;
+	dev->create_cq = ipath_create_cq;
+	dev->destroy_cq = ipath_destroy_cq;
+	dev->poll_cq = ipath_poll_cq;
+	dev->req_notify_cq = ipath_req_notify_cq;
+	dev->get_dma_mr = ipath_get_dma_mr;
+	dev->reg_phys_mr = ipath_reg_phys_mr;
+	dev->reg_user_mr = ipath_reg_user_mr;
+	dev->dereg_mr = ipath_dereg_mr;
+	dev->alloc_fmr = ipath_alloc_fmr;
+	dev->map_phys_fmr = ipath_map_phys_fmr;
+	dev->unmap_fmr = ipath_unmap_fmr;
+	dev->dealloc_fmr = ipath_dealloc_fmr;
+	dev->attach_mcast = ipath_multicast_attach;
+	dev->detach_mcast = ipath_multicast_detach;
+	dev->process_mad = ipath_process_mad;
+
+	ret = ib_register_device(dev);
+	if (ret)
+		goto err_reg;
+
+	/*
+	 * We don't need to register a MAD agent, we just need to create
+	 * a linker dependency on ib_mad so the module is loaded before
+	 * this module is initialized.  The call to ib_register_device()
+	 * above will then cause ib_mad to create QP 0 & 1.
+	 */
+	(void) ib_register_mad_agent(dev, 1, (enum ib_qp_type) 2,
+				     NULL, 0, NULL, NULL, NULL);
+
+	for (i = 0; i < ARRAY_SIZE(ipath_class_attributes); ++i) {
+		ret = class_device_create_file(&dev->class_dev,
+					       ipath_class_attributes[i]);
+		if (ret)
+			goto err_class;
+	}
+
+	ipath_layer_enable_timer(t);
+
+	ipath_devices[t] = idev;
+	return 0;
+
+err_class:
+	ib_unregister_device(dev);
+err_reg:
+	kfree(idev->lk_table.table);
+err_lk:
+	kfree(idev->qp_table.table);
+err_qp:
+	ib_dealloc_device(dev);
+	return ret;
+}
+
+static void ipath_unregister_ib_device(struct ipath_ibdev *dev)
+{
+	struct ib_device *ibdev = &dev->ibdev;
+
+	ipath_layer_disable_timer(dev->ib_unit);
+
+	ib_unregister_device(ibdev);
+
+	if (!list_empty(&dev->pending[0]) || !list_empty(&dev->pending[1]) ||
+	    !list_empty(&dev->pending[2]))
+		_VERBS_ERROR("ipath%d pending list not empty!\n", dev->ib_unit);
+	if (!list_empty(&dev->piowait))
+		_VERBS_ERROR("ipath%d piowait list not empty!\n", dev->ib_unit);
+	if (!list_empty(&dev->rnrwait))
+		_VERBS_ERROR("ipath%d rnrwait list not empty!\n", dev->ib_unit);
+	if (mcast_tree.rb_node != NULL)
+		_VERBS_ERROR("ipath%d multicast table memory leak!\n",
+			     dev->ib_unit);
+	/*
+	 * Note that ipath_unregister_ib_device() can be called before all
+	 * the QPs are destroyed!
+	 */
+	ipath_free_all_qps(&dev->qp_table);
+	kfree(dev->qp_table.table);
+	kfree(dev->lk_table.table);
+	ib_dealloc_device(ibdev);
+}
+
+int __init ipath_verbs_init(void)
+{
+	int i;
+
+	number_of_devices = ipath_layer_get_num_of_dev();
+	i = number_of_devices * sizeof(struct ipath_ibdev *);
+	ipath_devices = kmalloc(i, GFP_ATOMIC);
+	if (ipath_devices == NULL)
+		return -ENOMEM;
+
+	for (i = 0; i < number_of_devices; i++) {
+		int ret = ipath_verbs_register(i, ipath_ib_piobufavail,
+					       ipath_ib_rcv, ipath_ib_timer);
+
+		if (ret == 0)
+			ipath_devices[i] = NULL;
+		else if ((ret = ipath_register_ib_device(i)) != 0) {
+			_VERBS_ERROR("ib_ipath%d cannot register ib device "
+				     "(%d)!\n", i, ret);
+			ipath_verbs_unregister(i);
+			ipath_devices[i] = NULL;
+		}
+	}
+
+	return 0;
+}
+
+void __exit ipath_verbs_cleanup(void)
+{
+	int i;
+
+	for (i = 0; i < number_of_devices; i++)
+		if (ipath_devices[i]) {
+			ipath_unregister_ib_device(ipath_devices[i]);
+			ipath_verbs_unregister(i);
+		}
+
+	kfree(ipath_devices);
+}
+
+module_init(ipath_verbs_init);
+module_exit(ipath_verbs_cleanup);

From bos at pathscale.com Wed Dec 28 16:31:35 2005
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Wed, 28 Dec 2005 16:31:35 -0800
Subject: [openib-general] [PATCH 16 of 20] ipath - infiniband verbs support, part 2 of 3
In-Reply-To: 
Message-ID: 

Signed-off-by: Bryan O'Sullivan

diff -r 471b7a7a005c -r fc067af322a1 drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c	Wed Dec 28 14:19:43 2005 -0800
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c	Wed Dec 28 14:19:43 2005 -0800
@@ -2305,3 +2305,2513 @@
 	spin_unlock_irqrestore(&qp->s_lock, flags);
 	clear_bit(IPATH_S_BUSY, &qp->s_flags);
 }
+
+static void send_rc_ack(struct ipath_qp *qp)
+{
+	struct ipath_ibdev *dev = to_idev(qp->ibqp.device);
+	u16 lrh0;
+	u32 bth0;
+	u32 hwords;
+	struct ipath_other_headers *ohdr;
+
+	/* Construct the header. */
+	ohdr = &qp->s_hdr.u.oth;
+	lrh0 = IPS_LRH_BTH;
+	/* header size in 32-bit words LRH+BTH+AETH = (8+12+4)/4. */
+	hwords = 6;
+	if (unlikely(qp->remote_ah_attr.ah_flags & IB_AH_GRH)) {
+		ohdr = &qp->s_hdr.u.l.oth;
+		/* Header size in 32-bit words.
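+		 * A GRH is 40 bytes, i.e. ten more 32-bit words on top
+		 * of the six words of LRH+BTH+AETH counted above.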
+		 */
+		hwords += 10;
+		lrh0 = IPS_LRH_GRH;
+		qp->s_hdr.u.l.grh.version_tclass_flow =
+			cpu_to_be32((6 << 28) |
+				    (qp->remote_ah_attr.grh.traffic_class << 20) |
+				    qp->remote_ah_attr.grh.flow_label);
+		qp->s_hdr.u.l.grh.paylen =
+			cpu_to_be16(((hwords - 12) + SIZE_OF_CRC) << 2);
+		qp->s_hdr.u.l.grh.next_hdr = 0x1B;
+		qp->s_hdr.u.l.grh.hop_limit = qp->remote_ah_attr.grh.hop_limit;
+		/* The SGID is 32-bit aligned. */
+		qp->s_hdr.u.l.grh.sgid.global.subnet_prefix = dev->gid_prefix;
+		qp->s_hdr.u.l.grh.sgid.global.interface_id =
+			ipath_layer_get_guid(dev->ib_unit);
+		qp->s_hdr.u.l.grh.dgid = qp->remote_ah_attr.grh.dgid;
+	}
+	bth0 = ipath_layer_get_pkey(dev->ib_unit, qp->s_pkey_index);
+	ohdr->u.aeth = ipath_compute_aeth(qp);
+	if (qp->s_ack_state >= IB_OPCODE_RC_COMPARE_SWAP) {
+		bth0 |= IB_OPCODE_ATOMIC_ACKNOWLEDGE << 24;
+		ohdr->u.at.atomic_ack_eth = cpu_to_be64(qp->s_ack_atomic);
+		hwords += sizeof(ohdr->u.at.atomic_ack_eth) / 4;
+	} else {
+		bth0 |= IB_OPCODE_RC_ACKNOWLEDGE << 24;
+	}
+	lrh0 |= qp->remote_ah_attr.sl << 4;
+	qp->s_hdr.lrh[0] = cpu_to_be16(lrh0);
+	/* DEST LID */
+	qp->s_hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid);
+	qp->s_hdr.lrh[2] = cpu_to_be16(hwords + SIZE_OF_CRC);
+	qp->s_hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->ib_unit));
+	ohdr->bth[0] = cpu_to_be32(bth0);
+	ohdr->bth[1] = cpu_to_be32(qp->remote_qpn);
+	ohdr->bth[2] = cpu_to_be32(qp->s_ack_psn & 0xFFFFFF);
+
+	/*
+	 * If we can send the ACK, clear the ACK state.
+	 */
+	if (ipath_verbs_send(dev->ib_unit, hwords, (uint32_t *) &qp->s_hdr,
+			     0, NULL) == 0) {
+		qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE;
+		dev->n_rc_qacks++;
+		dev->n_unicast_xmit++;
+	}
+}
+
+/*
+ * Back up the requester to resend the last un-ACKed request.
+ * The QP s_lock should be held.
+ */
+static void ipath_restart_rc(struct ipath_qp *qp, u32 psn, struct ib_wc *wc)
+{
+	struct ipath_swqe *wqe = get_swqe_ptr(qp, qp->s_last);
+	struct ipath_ibdev *dev;
+	u32 n;
+
+	/*
+	 * If there are no requests pending, we are done.
+	 */
+	if (cmp24(psn, qp->s_next_psn) >= 0 || qp->s_last == qp->s_tail)
+		goto done;
+
+	if (qp->s_retry == 0) {
+		wc->wr_id = wqe->wr.wr_id;
+		wc->status = IB_WC_RETRY_EXC_ERR;
+		wc->opcode = wc_opcode[wqe->wr.opcode];
+		wc->vendor_err = 0;
+		wc->byte_len = 0;
+		wc->qp_num = qp->ibqp.qp_num;
+		wc->src_qp = qp->remote_qpn;
+		wc->pkey_index = 0;
+		wc->slid = qp->remote_ah_attr.dlid;
+		wc->sl = qp->remote_ah_attr.sl;
+		wc->dlid_path_bits = 0;
+		wc->port_num = 0;
+		ipath_sqerror_qp(qp, wc);
+		return;
+	}
+	qp->s_retry--;
+
+	/*
+	 * Remove the QP from the timeout queue.
+	 * Note: it may already have been removed by ipath_ib_timer().
+	 */
+	dev = to_idev(qp->ibqp.device);
+	spin_lock(&dev->pending_lock);
+	if (qp->timerwait.next != LIST_POISON1)
+		list_del(&qp->timerwait);
+	spin_unlock(&dev->pending_lock);
+
+	if (wqe->wr.opcode == IB_WR_RDMA_READ)
+		dev->n_rc_resends++;
+	else
+		dev->n_rc_resends += (int)qp->s_psn - (int)psn;
+
+	/*
+	 * If we are starting the request from the beginning, let the
+	 * normal send code handle initialization.
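+	 *
+	 * Otherwise, walk the send queue below to find the request that
+	 * contains "psn", point s_cur and s_psn at it, and pick an s_state
+	 * that lets do_rc_send() resume in the middle of that request.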
+	 */
+	qp->s_cur = qp->s_last;
+	if (cmp24(psn, wqe->psn) <= 0) {
+		qp->s_state = IB_OPCODE_RC_SEND_LAST;
+		qp->s_psn = wqe->psn;
+	} else {
+		n = qp->s_cur;
+		for (;;) {
+			if (++n == qp->s_size)
+				n = 0;
+			if (n == qp->s_tail) {
+				if (cmp24(psn, qp->s_next_psn) >= 0) {
+					qp->s_cur = n;
+					wqe = get_swqe_ptr(qp, n);
+				}
+				break;
+			}
+			wqe = get_swqe_ptr(qp, n);
+			if (cmp24(psn, wqe->psn) < 0)
+				break;
+			qp->s_cur = n;
+		}
+		qp->s_psn = psn;
+
+		/*
+		 * Reset the state to restart in the middle of a request.
+		 * Don't change the s_sge, s_cur_sge, or s_cur_size.
+		 * See do_rc_send().
+		 */
+		switch (wqe->wr.opcode) {
+		case IB_WR_SEND:
+		case IB_WR_SEND_WITH_IMM:
+			qp->s_state = IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST;
+			break;
+
+		case IB_WR_RDMA_WRITE:
+		case IB_WR_RDMA_WRITE_WITH_IMM:
+			qp->s_state = IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST;
+			break;
+
+		case IB_WR_RDMA_READ:
+			qp->s_state = IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE;
+			break;
+
+		default:
+			/*
+			 * This case shouldn't happen since it's only
+			 * one PSN per req.
+			 */
+			qp->s_state = IB_OPCODE_RC_SEND_LAST;
+		}
+	}
+
+done:
+	tasklet_schedule(&qp->s_task);
+}
+
+/*
+ * Handle RC and UC post sends.
+ */
+static int ipath_post_rc_send(struct ipath_qp *qp, struct ib_send_wr *wr)
+{
+	struct ipath_swqe *wqe;
+	unsigned long flags;
+	u32 next;
+	int i, j;
+	int acc;
+
+	/*
+	 * Don't allow RDMA reads or atomic operations on UC or
+	 * undefined operations.
+	 * Make sure buffer is large enough to hold the result for atomics.
+	 */
+	if (qp->ibqp.qp_type == IB_QPT_UC) {
+		if ((unsigned) wr->opcode >= IB_WR_RDMA_READ)
+			return -EINVAL;
+	} else if ((unsigned) wr->opcode > IB_WR_ATOMIC_FETCH_AND_ADD)
+		return -EINVAL;
+	else if (wr->opcode >= IB_WR_ATOMIC_CMP_AND_SWP &&
+		 (wr->num_sge == 0 || wr->sg_list[0].length < sizeof(u64) ||
+		  wr->sg_list[0].addr & 0x7))
+		return -EINVAL;
+
+	/* IB spec says that num_sge == 0 is OK. */
+	if (wr->num_sge > qp->s_max_sge)
+		return -ENOMEM;
+
+	spin_lock_irqsave(&qp->s_lock, flags);
+	next = qp->s_head + 1;
+	if (next >= qp->s_size)
+		next = 0;
+	if (next == qp->s_last) {
+		spin_unlock_irqrestore(&qp->s_lock, flags);
+		return -EINVAL;
+	}
+
+	wqe = get_swqe_ptr(qp, qp->s_head);
+	wqe->wr = *wr;
+	wqe->ssn = qp->s_ssn++;
+	wqe->sg_list[0].mr = NULL;
+	wqe->sg_list[0].vaddr = NULL;
+	wqe->sg_list[0].length = 0;
+	wqe->sg_list[0].sge_length = 0;
+	wqe->length = 0;
+	acc = wr->opcode >= IB_WR_RDMA_READ ? IB_ACCESS_LOCAL_WRITE : 0;
+	for (i = 0, j = 0; i < wr->num_sge; i++) {
+		if (to_ipd(qp->ibqp.pd)->user && wr->sg_list[i].lkey == 0) {
+			spin_unlock_irqrestore(&qp->s_lock, flags);
+			return -EINVAL;
+		}
+		if (wr->sg_list[i].length == 0)
+			continue;
+		if (!ipath_lkey_ok(&to_idev(qp->ibqp.device)->lk_table,
+				   &wqe->sg_list[j], &wr->sg_list[i], acc)) {
+			spin_unlock_irqrestore(&qp->s_lock, flags);
+			return -EINVAL;
+		}
+		wqe->length += wr->sg_list[i].length;
+		j++;
+	}
+	wqe->wr.num_sge = j;
+	qp->s_head = next;
+	/*
+	 * Wake up the send tasklet if the QP is not waiting
+	 * for an RNR timeout.
+	 */
+	next = qp->s_rnr_timeout;
+	spin_unlock_irqrestore(&qp->s_lock, flags);
+
+	if (next == 0) {
+		if (qp->ibqp.qp_type == IB_QPT_UC)
+			do_uc_send((unsigned long) qp);
+		else
+			do_rc_send((unsigned long) qp);
+	}
+	return 0;
+}
+
+/*
+ * Note that we actually send the data as it is posted instead of putting
+ * the request into a ring buffer.  If we wanted to use a ring buffer,
+ * we would need to save a reference to the destination address in the SWQE.
+ */
+static int ipath_post_ud_send(struct ipath_qp *qp, struct ib_send_wr *wr)
+{
+	struct ipath_ibdev *dev = to_idev(qp->ibqp.device);
+	struct ipath_other_headers *ohdr;
+	struct ib_ah_attr *ah_attr;
+	struct ipath_sge_state ss;
+	struct ipath_sge *sg_list;
+	struct ib_wc wc;
+	u32 hwords;
+	u32 nwords;
+	u32 len;
+	u32 extra_bytes;
+	u32 bth0;
+	u16 lrh0;
+	u16 lid;
+	int i;
+
+	if (!(state_ops[qp->state] & IPATH_PROCESS_SEND_OK))
+		return 0;
+
+	/* IB spec says that num_sge == 0 is OK. */
+	if (wr->num_sge > qp->s_max_sge)
+		return -EINVAL;
+
+	if (wr->num_sge > 1) {
+		sg_list = kmalloc((qp->s_max_sge - 1) * sizeof(*sg_list),
+				  GFP_ATOMIC);
+		if (!sg_list)
+			return -ENOMEM;
+	} else
+		sg_list = NULL;
+
+	/* Check the buffer to send. */
+	ss.sg_list = sg_list;
+	ss.sge.mr = NULL;
+	ss.sge.vaddr = NULL;
+	ss.sge.length = 0;
+	ss.sge.sge_length = 0;
+	ss.num_sge = 0;
+	len = 0;
+	for (i = 0; i < wr->num_sge; i++) {
+		/* Check LKEY */
+		if (to_ipd(qp->ibqp.pd)->user && wr->sg_list[i].lkey == 0)
+			return -EINVAL;
+
+		if (wr->sg_list[i].length == 0)
+			continue;
+		if (!ipath_lkey_ok(&dev->lk_table, ss.num_sge ?
+				   sg_list + ss.num_sge : &ss.sge,
+				   &wr->sg_list[i], 0)) {
+			return -EINVAL;
+		}
+		len += wr->sg_list[i].length;
+		ss.num_sge++;
+	}
+	extra_bytes = (4 - len) & 3;
+	nwords = (len + extra_bytes) >> 2;
+
+	/* Construct the header. */
+	ah_attr = &to_iah(wr->wr.ud.ah)->attr;
+	if (ah_attr->dlid >= 0xC000 && ah_attr->dlid < 0xFFFF)
+		dev->n_multicast_xmit++;
+	else
+		dev->n_unicast_xmit++;
+	if (unlikely(ah_attr->dlid == ipath_layer_get_lid(dev->ib_unit))) {
+		/* Pass in an uninitialized ib_wc to save stack space. */
+		ipath_ud_loopback(qp, &ss, len, wr, &wc);
+		goto done;
+	}
+	if (ah_attr->ah_flags & IB_AH_GRH) {
+		/* Header size in 32-bit words. */
+		hwords = 17;
+		lrh0 = IPS_LRH_GRH;
+		ohdr = &qp->s_hdr.u.l.oth;
+		qp->s_hdr.u.l.grh.version_tclass_flow =
+			cpu_to_be32((6 << 28) |
+				    (ah_attr->grh.traffic_class << 20) |
+				    ah_attr->grh.flow_label);
+		qp->s_hdr.u.l.grh.paylen =
+			cpu_to_be16(((wr->opcode ==
+				      IB_WR_SEND_WITH_IMM ? 6 : 5) + nwords +
+				     SIZE_OF_CRC) << 2);
+		qp->s_hdr.u.l.grh.next_hdr = 0x1B;
+		qp->s_hdr.u.l.grh.hop_limit = ah_attr->grh.hop_limit;
+		/* The SGID is 32-bit aligned. */
+		qp->s_hdr.u.l.grh.sgid.global.subnet_prefix = dev->gid_prefix;
+		qp->s_hdr.u.l.grh.sgid.global.interface_id =
+			ipath_layer_get_guid(dev->ib_unit);
+		qp->s_hdr.u.l.grh.dgid = ah_attr->grh.dgid;
+		/*
+		 * Don't worry about sending to locally attached
+		 * multicast QPs.  It is unspecified by the spec what happens.
+		 */
+	} else {
+		/* Header size in 32-bit words. */
+		hwords = 7;
+		lrh0 = IPS_LRH_BTH;
+		ohdr = &qp->s_hdr.u.oth;
+	}
+	if (wr->opcode == IB_WR_SEND_WITH_IMM) {
+		ohdr->u.ud.imm_data = wr->imm_data;
+		wc.imm_data = wr->imm_data;
+		hwords += 1;
+		bth0 = IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE << 24;
+	} else if (wr->opcode == IB_WR_SEND) {
+		wc.imm_data = 0;
+		bth0 = IB_OPCODE_UD_SEND_ONLY << 24;
+	} else
+		return -EINVAL;
+	lrh0 |= ah_attr->sl << 4;
+	if (qp->ibqp.qp_type == IB_QPT_SMI)
+		lrh0 |= 0xF000;	/* Set VL */
+	qp->s_hdr.lrh[0] = cpu_to_be16(lrh0);
+	qp->s_hdr.lrh[1] = cpu_to_be16(ah_attr->dlid);	/* DEST LID */
+	qp->s_hdr.lrh[2] = cpu_to_be16(hwords + nwords + SIZE_OF_CRC);
+	lid = ipath_layer_get_lid(dev->ib_unit);
+	qp->s_hdr.lrh[3] = lid ? cpu_to_be16(lid) : IB_LID_PERMISSIVE;
+	if (wr->send_flags & IB_SEND_SOLICITED)
+		bth0 |= 1 << 23;
+	bth0 |= extra_bytes << 20;
+	bth0 |= qp->ibqp.qp_type == IB_QPT_SMI ? IPS_DEFAULT_P_KEY :
+		ipath_layer_get_pkey(dev->ib_unit, qp->s_pkey_index);
+	ohdr->bth[0] = cpu_to_be32(bth0);
+	ohdr->bth[1] = cpu_to_be32(wr->wr.ud.remote_qpn);
+	/* XXX Could lose a PSN count but not worth locking */
+	ohdr->bth[2] = cpu_to_be32(qp->s_psn++ & 0xFFFFFF);
+	/*
+	 * Qkeys with the high order bit set mean use the
+	 * qkey from the QP context instead of the WR (see 10.2.5).
+	 */
+	ohdr->u.ud.deth[0] = cpu_to_be32((int)wr->wr.ud.remote_qkey < 0 ?
+					 qp->qkey : wr->wr.ud.remote_qkey);
+	ohdr->u.ud.deth[1] = cpu_to_be32(qp->ibqp.qp_num);
+	if (ipath_verbs_send(dev->ib_unit, hwords, (uint32_t *) &qp->s_hdr,
+			     len, &ss))
+		dev->n_no_piobuf++;
+
+done:
+	/* Queue the completion status entry. */
+	if (!test_bit(IPATH_S_SIGNAL_REQ_WR, &qp->s_flags) ||
+	    (wr->send_flags & IB_SEND_SIGNALED)) {
+		wc.wr_id = wr->wr_id;
+		wc.status = IB_WC_SUCCESS;
+		wc.vendor_err = 0;
+		wc.opcode = IB_WC_SEND;
+		wc.byte_len = len;
+		wc.qp_num = qp->ibqp.qp_num;
+		wc.src_qp = 0;
+		wc.wc_flags = 0;
+		/* XXX initialize other fields? */
+		ipath_cq_enter(to_icq(qp->ibqp.send_cq), &wc, 0);
+	}
+	kfree(sg_list);
+
+	return 0;
+}
+
+/*
+ * This may be called from interrupt context.
+ */
+static int ipath_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
+			   struct ib_send_wr **bad_wr)
+{
+	struct ipath_qp *qp = to_iqp(ibqp);
+	int err = 0;
+
+	/* Check that state is OK to post send. */
+	if (!(state_ops[qp->state] & IPATH_POST_SEND_OK)) {
+		*bad_wr = wr;
+		return -EINVAL;
+	}
+
+	for (; wr; wr = wr->next) {
+		switch (qp->ibqp.qp_type) {
+		case IB_QPT_UC:
+		case IB_QPT_RC:
+			err = ipath_post_rc_send(qp, wr);
+			break;
+
+		case IB_QPT_SMI:
+		case IB_QPT_GSI:
+		case IB_QPT_UD:
+			err = ipath_post_ud_send(qp, wr);
+			break;
+
+		default:
+			err = -EINVAL;
+		}
+		if (err) {
+			*bad_wr = wr;
+			break;
+		}
+	}
+	return err;
+}
+
+/*
+ * This may be called from interrupt context.
+ */
+static int ipath_post_receive(struct ib_qp *ibqp, struct ib_recv_wr *wr,
+			      struct ib_recv_wr **bad_wr)
+{
+	struct ipath_qp *qp = to_iqp(ibqp);
+	unsigned long flags;
+
+	/* Check that state is OK to post receive. */
+	if (!(state_ops[qp->state] & IPATH_POST_RECV_OK)) {
+		*bad_wr = wr;
+		return -EINVAL;
+	}
+
+	for (; wr; wr = wr->next) {
+		struct ipath_rwqe *wqe;
+		u32 next;
+		int i, j;
+
+		if (wr->num_sge > qp->r_rq.max_sge) {
+			*bad_wr = wr;
+			return -ENOMEM;
+		}
+
+		spin_lock_irqsave(&qp->r_rq.lock, flags);
+		next = qp->r_rq.head + 1;
+		if (next >= qp->r_rq.size)
+			next = 0;
+		if (next == qp->r_rq.tail) {
+			spin_unlock_irqrestore(&qp->r_rq.lock, flags);
+			*bad_wr = wr;
+			return -ENOMEM;
+		}
+
+		wqe = get_rwqe_ptr(&qp->r_rq, qp->r_rq.head);
+		wqe->wr_id = wr->wr_id;
+		wqe->sg_list[0].mr = NULL;
+		wqe->sg_list[0].vaddr = NULL;
+		wqe->sg_list[0].length = 0;
+		wqe->sg_list[0].sge_length = 0;
+		wqe->length = 0;
+		for (i = 0, j = 0; i < wr->num_sge; i++) {
+			/* Check LKEY */
+			if (to_ipd(qp->ibqp.pd)->user &&
+			    wr->sg_list[i].lkey == 0) {
+				spin_unlock_irqrestore(&qp->r_rq.lock, flags);
+				*bad_wr = wr;
+				return -EINVAL;
+			}
+			if (wr->sg_list[i].length == 0)
+				continue;
+			if (!ipath_lkey_ok(&to_idev(qp->ibqp.device)->lk_table,
+					   &wqe->sg_list[j], &wr->sg_list[i],
+					   IB_ACCESS_LOCAL_WRITE)) {
+				spin_unlock_irqrestore(&qp->r_rq.lock, flags);
+				*bad_wr = wr;
+				return -EINVAL;
+			}
+			wqe->length += wr->sg_list[i].length;
+			j++;
+		}
+		wqe->num_sge = j;
+		qp->r_rq.head = next;
+		spin_unlock_irqrestore(&qp->r_rq.lock, flags);
+	}
+	return 0;
+}
+
+/*
+ * This may be called from interrupt context.
+ */ +static int ipath_post_srq_receive(struct ib_srq *ibsrq, struct ib_recv_wr *wr, + struct ib_recv_wr **bad_wr) +{ + struct ipath_srq *srq = to_isrq(ibsrq); + struct ipath_ibdev *dev = to_idev(ibsrq->device); + unsigned long flags; + + for (; wr; wr = wr->next) { + struct ipath_rwqe *wqe; + u32 next; + int i, j; + + if (wr->num_sge > srq->rq.max_sge) { + *bad_wr = wr; + return -ENOMEM; + } + + spin_lock_irqsave(&srq->rq.lock, flags); + next = srq->rq.head + 1; + if (next >= srq->rq.size) + next = 0; + if (next == srq->rq.tail) { + spin_unlock_irqrestore(&srq->rq.lock, flags); + *bad_wr = wr; + return -ENOMEM; + } + + wqe = get_rwqe_ptr(&srq->rq, srq->rq.head); + wqe->wr_id = wr->wr_id; + wqe->sg_list[0].mr = NULL; + wqe->sg_list[0].vaddr = NULL; + wqe->sg_list[0].length = 0; + wqe->sg_list[0].sge_length = 0; + wqe->length = 0; + for (i = 0, j = 0; i < wr->num_sge; i++) { + /* Check LKEY */ + if (to_ipd(srq->ibsrq.pd)->user && + wr->sg_list[i].lkey == 0) { + spin_unlock_irqrestore(&srq->rq.lock, flags); + *bad_wr = wr; + return -EINVAL; + } + if (wr->sg_list[i].length == 0) + continue; + if (!ipath_lkey_ok(&dev->lk_table, + &wqe->sg_list[j], &wr->sg_list[i], + IB_ACCESS_LOCAL_WRITE)) { + spin_unlock_irqrestore(&srq->rq.lock, flags); + *bad_wr = wr; + return -EINVAL; + } + wqe->length += wr->sg_list[i].length; + j++; + } + wqe->num_sge = j; + srq->rq.head = next; + spin_unlock_irqrestore(&srq->rq.lock, flags); + } + return 0; +} + +/* + * This is called from ipath_qp_rcv() to process an incoming UD packet + * for the given QP. + * Called at interrupt level. + */ +static void ipath_ud_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr, + int has_grh, void *data, u32 tlen, struct ipath_qp *qp) +{ + struct ipath_other_headers *ohdr; + int opcode; + u32 hdrsize; + u32 pad; + unsigned long flags; + struct ib_wc wc; + u32 qkey; + u32 src_qp; + struct ipath_rq *rq; + struct ipath_srq *srq; + struct ipath_rwqe *wqe; + + /* Check for GRH */ + if (!has_grh) { + ohdr = &hdr->u.oth; + hdrsize = 8 + 12 + 8; /* LRH + BTH + DETH */ + qkey = be32_to_cpu(ohdr->u.ud.deth[0]); + src_qp = be32_to_cpu(ohdr->u.ud.deth[1]); + } else { + ohdr = &hdr->u.l.oth; + hdrsize = 8 + 40 + 12 + 8; /* LRH + GRH + BTH + DETH */ + /* + * The header with GRH is 68 bytes and the + * core driver sets the eager header buffer + * size to 56 bytes so the last 12 bytes of + * the IB header are in the data buffer. + */ + qkey = be32_to_cpu(((u32 *) data)[1]); + src_qp = be32_to_cpu(((u32 *) data)[2]); + data += 12; + } + src_qp &= 0xFFFFFF; + + /* Check that the qkey matches (except for QP0, see 9.6.1.4.1). */ + if (unlikely(qp->ibqp.qp_num && qkey != qp->qkey)) { + /* XXX OK to lose a count once in a while. */ + dev->qkey_violations++; + dev->n_pkt_drops++; + return; + } + + /* Get the number of bytes the message was padded by. */ + pad = (ohdr->bth[0] >> 12) & 3; + if (unlikely(tlen < (hdrsize + pad + 4))) { + /* Drop incomplete packets. */ + dev->n_pkt_drops++; + return; + } + + /* + * A GRH is expected to precede the data even if not + * present on the wire. + */ + wc.byte_len = tlen - (hdrsize + pad + 4) + sizeof(struct ib_grh); + + /* + * The opcode is in the low byte when it's in network order + * (top byte when in host order).
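+ *
+ * (Editor's note, illustrative only: BTH dword 0 is big-endian with
+ * the opcode in bits 31:24, so reading the first byte in host memory
+ * below is equivalent to, but cheaper than,
+ *
+ *	opcode = be32_to_cpu(ohdr->bth[0]) >> 24;
+ * )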
+ */ + opcode = *(u8 *) (&ohdr->bth[0]); + if (opcode == IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE) { + if (has_grh) { + wc.imm_data = *(u32 *) data; + data += sizeof(u32); + } else + wc.imm_data = ohdr->u.ud.imm_data; + wc.wc_flags = IB_WC_WITH_IMM; + hdrsize += sizeof(u32); + } else if (opcode == IB_OPCODE_UD_SEND_ONLY) { + wc.imm_data = 0; + wc.wc_flags = 0; + } else { + dev->n_pkt_drops++; + return; + } + + /* + * Get the next work request entry to find where to put the data. + * Note that it is safe to drop the lock after changing rq->tail + * since ipath_post_receive() won't fill the empty slot. + */ + if (qp->ibqp.srq) { + srq = to_isrq(qp->ibqp.srq); + rq = &srq->rq; + } else { + srq = NULL; + rq = &qp->r_rq; + } + spin_lock_irqsave(&rq->lock, flags); + if (rq->tail == rq->head) { + spin_unlock_irqrestore(&rq->lock, flags); + dev->n_pkt_drops++; + return; + } + /* Silently drop packets which are too big. */ + wqe = get_rwqe_ptr(rq, rq->tail); + if (wc.byte_len > wqe->length) { + spin_unlock_irqrestore(&rq->lock, flags); + dev->n_pkt_drops++; + return; + } + wc.wr_id = wqe->wr_id; + qp->r_sge.sge = wqe->sg_list[0]; + qp->r_sge.sg_list = wqe->sg_list + 1; + qp->r_sge.num_sge = wqe->num_sge; + if (++rq->tail >= rq->size) + rq->tail = 0; + if (srq && srq->ibsrq.event_handler) { + u32 n; + + if (rq->head < rq->tail) + n = rq->size + rq->head - rq->tail; + else + n = rq->head - rq->tail; + if (n < srq->limit) { + struct ib_event ev; + + srq->limit = 0; + spin_unlock_irqrestore(&rq->lock, flags); + ev.device = qp->ibqp.device; + ev.element.srq = qp->ibqp.srq; + ev.event = IB_EVENT_SRQ_LIMIT_REACHED; + srq->ibsrq.event_handler(&ev, srq->ibsrq.srq_context); + } else + spin_unlock_irqrestore(&rq->lock, flags); + } else + spin_unlock_irqrestore(&rq->lock, flags); + if (has_grh) { + copy_sge(&qp->r_sge, &hdr->u.l.grh, sizeof(struct ib_grh)); + wc.wc_flags |= IB_WC_GRH; + } else + skip_sge(&qp->r_sge, sizeof(struct ib_grh)); + copy_sge(&qp->r_sge, data, wc.byte_len - sizeof(struct ib_grh)); + wc.status = IB_WC_SUCCESS; + wc.opcode = IB_WC_RECV; + wc.vendor_err = 0; + wc.qp_num = qp->ibqp.qp_num; + wc.src_qp = src_qp; + /* XXX do we know which pkey matched? Only needed for GSI. */ + wc.pkey_index = 0; + wc.slid = be16_to_cpu(hdr->lrh[3]); + wc.sl = (be16_to_cpu(hdr->lrh[0]) >> 4) & 0xF; + wc.dlid_path_bits = 0; + /* Signal completion event if the solicited bit is set. */ + ipath_cq_enter(to_icq(qp->ibqp.recv_cq), &wc, + ohdr->bth[0] & __constant_cpu_to_be32(1 << 23)); +} + +/* + * This is called from ipath_post_ud_send() to forward a WQE addressed + * to the same HCA. + */ +static void ipath_ud_loopback(struct ipath_qp *sqp, struct ipath_sge_state *ss, + u32 length, struct ib_send_wr *wr, + struct ib_wc *wc) +{ + struct ipath_ibdev *dev = to_idev(sqp->ibqp.device); + struct ipath_qp *qp; + struct ib_ah_attr *ah_attr; + unsigned long flags; + struct ipath_rq *rq; + struct ipath_srq *srq; + struct ipath_sge_state rsge; + struct ipath_sge *sge; + struct ipath_rwqe *wqe; + + qp = ipath_lookup_qpn(&dev->qp_table, wr->wr.ud.remote_qpn); + if (!qp) + return; + + /* + * Check that the qkey matches (except for QP0, see 9.6.1.4.1). + * Qkeys with the high order bit set mean use the + * qkey from the QP context instead of the WR (see 10.2.5). + */ + if (unlikely(qp->ibqp.qp_num && ((int)wr->wr.ud.remote_qkey < 0 ? + qp->qkey : wr->wr.ud.remote_qkey) != qp->qkey)) { + /* XXX OK to lose a count once in a while. 
*/ + dev->qkey_violations++; + dev->n_pkt_drops++; + goto done; + } + + /* + * A GRH is expected to preceed the data even if not + * present on the wire. + */ + wc->byte_len = length + sizeof(struct ib_grh); + + if (wr->opcode == IB_WR_SEND_WITH_IMM) { + wc->wc_flags = IB_WC_WITH_IMM; + wc->imm_data = wr->imm_data; + } else { + wc->wc_flags = 0; + wc->imm_data = 0; + } + + /* + * Get the next work request entry to find where to put the data. + * Note that it is safe to drop the lock after changing rq->tail + * since ipath_post_receive() won't fill the empty slot. + */ + if (qp->ibqp.srq) { + srq = to_isrq(qp->ibqp.srq); + rq = &srq->rq; + } else { + srq = NULL; + rq = &qp->r_rq; + } + spin_lock_irqsave(&rq->lock, flags); + if (rq->tail == rq->head) { + spin_unlock_irqrestore(&rq->lock, flags); + dev->n_pkt_drops++; + goto done; + } + /* Silently drop packets which are too big. */ + wqe = get_rwqe_ptr(rq, rq->tail); + if (wc->byte_len > wqe->length) { + spin_unlock_irqrestore(&rq->lock, flags); + dev->n_pkt_drops++; + goto done; + } + wc->wr_id = wqe->wr_id; + rsge.sge = wqe->sg_list[0]; + rsge.sg_list = wqe->sg_list + 1; + rsge.num_sge = wqe->num_sge; + if (++rq->tail >= rq->size) + rq->tail = 0; + if (srq && srq->ibsrq.event_handler) { + u32 n; + + if (rq->head < rq->tail) + n = rq->size + rq->head - rq->tail; + else + n = rq->head - rq->tail; + if (n < srq->limit) { + struct ib_event ev; + + srq->limit = 0; + spin_unlock_irqrestore(&rq->lock, flags); + ev.device = qp->ibqp.device; + ev.element.srq = qp->ibqp.srq; + ev.event = IB_EVENT_SRQ_LIMIT_REACHED; + srq->ibsrq.event_handler(&ev, srq->ibsrq.srq_context); + } else + spin_unlock_irqrestore(&rq->lock, flags); + } else + spin_unlock_irqrestore(&rq->lock, flags); + ah_attr = &to_iah(wr->wr.ud.ah)->attr; + if (ah_attr->ah_flags & IB_AH_GRH) { + copy_sge(&rsge, &ah_attr->grh, sizeof(struct ib_grh)); + wc->wc_flags |= IB_WC_GRH; + } else + skip_sge(&rsge, sizeof(struct ib_grh)); + sge = &ss->sge; + while (length) { + u32 len = sge->length; + + if (len > length) + len = length; + BUG_ON(len == 0); + copy_sge(&rsge, sge->vaddr, len); + sge->vaddr += len; + sge->length -= len; + sge->sge_length -= len; + if (sge->sge_length == 0) { + if (--ss->num_sge) + *sge = *ss->sg_list++; + } else if (sge->length == 0 && sge->mr != NULL) { + if (++sge->n >= IPATH_SEGSZ) { + if (++sge->m >= sge->mr->mapsz) + break; + sge->n = 0; + } + sge->vaddr = sge->mr->map[sge->m]->segs[sge->n].vaddr; + sge->length = sge->mr->map[sge->m]->segs[sge->n].length; + } + length -= len; + } + wc->status = IB_WC_SUCCESS; + wc->opcode = IB_WC_RECV; + wc->vendor_err = 0; + wc->qp_num = qp->ibqp.qp_num; + wc->src_qp = sqp->ibqp.qp_num; + /* XXX do we know which pkey matched? Only needed for GSI. */ + wc->pkey_index = 0; + wc->slid = ipath_layer_get_lid(dev->ib_unit); + wc->sl = ah_attr->sl; + wc->dlid_path_bits = 0; + /* Signal completion event if the solicited bit is set. */ + ipath_cq_enter(to_icq(qp->ibqp.recv_cq), wc, + wr->send_flags & IB_SEND_SOLICITED); + +done: + if (atomic_dec_and_test(&qp->refcount)) + wake_up(&qp->wait); +} + +/* + * Copy the next RWQE into the QP's RWQE. + * Return zero if no RWQE is available. + * Called at interrupt level with the QP r_rq.lock held. 
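+ *
+ * (Editor's note: for the SRQ case below, the number of RWQEs still
+ * queued is the ring occupancy,
+ *
+ *	n = (head - tail + size) % size;
+ *
+ * written out with an explicit wrap test in the code; once n drops
+ * below srq->limit an IB_EVENT_SRQ_LIMIT_REACHED event is delivered
+ * and the limit is disarmed by zeroing it.)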
+ */ +static int get_rwqe(struct ipath_qp *qp, int wr_id_only) +{ + struct ipath_rq *rq; + struct ipath_srq *srq; + struct ipath_rwqe *wqe; + + if (!qp->ibqp.srq) { + rq = &qp->r_rq; + if (unlikely(rq->tail == rq->head)) + return 0; + wqe = get_rwqe_ptr(rq, rq->tail); + qp->r_wr_id = wqe->wr_id; + if (!wr_id_only) { + qp->r_sge.sge = wqe->sg_list[0]; + qp->r_sge.sg_list = wqe->sg_list + 1; + qp->r_sge.num_sge = wqe->num_sge; + qp->r_len = wqe->length; + } + if (++rq->tail >= rq->size) + rq->tail = 0; + return 1; + } + + srq = to_isrq(qp->ibqp.srq); + rq = &srq->rq; + spin_lock(&rq->lock); + if (unlikely(rq->tail == rq->head)) { + spin_unlock(&rq->lock); + return 0; + } + wqe = get_rwqe_ptr(rq, rq->tail); + qp->r_wr_id = wqe->wr_id; + if (!wr_id_only) { + qp->r_sge.sge = wqe->sg_list[0]; + qp->r_sge.sg_list = wqe->sg_list + 1; + qp->r_sge.num_sge = wqe->num_sge; + qp->r_len = wqe->length; + } + if (++rq->tail >= rq->size) + rq->tail = 0; + if (srq->ibsrq.event_handler) { + struct ib_event ev; + u32 n; + + if (rq->head < rq->tail) + n = rq->size + rq->head - rq->tail; + else + n = rq->head - rq->tail; + if (n < srq->limit) { + srq->limit = 0; + spin_unlock(&rq->lock); + ev.device = qp->ibqp.device; + ev.element.srq = qp->ibqp.srq; + ev.event = IB_EVENT_SRQ_LIMIT_REACHED; + srq->ibsrq.event_handler(&ev, srq->ibsrq.srq_context); + } else + spin_unlock(&rq->lock); + } else + spin_unlock(&rq->lock); + return 1; +} + +/* + * This is called from ipath_qp_rcv() to process an incoming UC packet + * for the given QP. + * Called at interrupt level. + */ +static void ipath_uc_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr, + int has_grh, void *data, u32 tlen, struct ipath_qp *qp) +{ + struct ipath_other_headers *ohdr; + int opcode; + u32 hdrsize; + u32 psn; + u32 pad; + unsigned long flags; + struct ib_wc wc; + u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu); + struct ib_reth *reth; + + /* Check for GRH */ + if (!has_grh) { + ohdr = &hdr->u.oth; + hdrsize = 8 + 12; /* LRH + BTH */ + psn = be32_to_cpu(ohdr->bth[2]); + } else { + ohdr = &hdr->u.l.oth; + hdrsize = 8 + 40 + 12; /* LRH + GRH + BTH */ + /* + * The header with GRH is 60 bytes and the + * core driver sets the eager header buffer + * size to 56 bytes so the last 4 bytes of + * the BTH header (PSN) are in the data buffer. + */ + psn = be32_to_cpu(((u32 *) data)[0]); + data += sizeof(u32); + } + /* + * The opcode is in the low byte when it's in network order + * (top byte when in host order). + */ + opcode = *(u8 *) (&ohdr->bth[0]); + + wc.imm_data = 0; + wc.wc_flags = 0; + + spin_lock_irqsave(&qp->r_rq.lock, flags); + + /* Compare the PSN against the expected PSN. */ + if (unlikely(cmp24(psn, qp->r_psn) != 0)) { + /* + * Handle a sequence error. + * Silently drop any current message. + */ + qp->r_psn = psn; + inv: + qp->r_state = IB_OPCODE_UC_SEND_LAST; + switch (opcode) { + case IB_OPCODE_UC_SEND_FIRST: + case IB_OPCODE_UC_SEND_ONLY: + case IB_OPCODE_UC_SEND_ONLY_WITH_IMMEDIATE: + goto send_first; + + case IB_OPCODE_UC_RDMA_WRITE_FIRST: + case IB_OPCODE_UC_RDMA_WRITE_ONLY: + case IB_OPCODE_UC_RDMA_WRITE_ONLY_WITH_IMMEDIATE: + goto rdma_first; + + default: + dev->n_pkt_drops++; + goto done; + } + } + + /* Check for opcode sequence errors.
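+ *
+ * (Editor's note: a UC message must arrive as FIRST, zero or more
+ * MIDDLE packets, then LAST, or as a single ONLY packet.  r_state
+ * remembers the previously accepted opcode, so after e.g. SEND_FIRST
+ * only SEND_MIDDLE or SEND_LAST[_WITH_IMMEDIATE] is legal; anything
+ * else jumps back to the "inv" label above and resynchronizes.)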
*/ + switch (qp->r_state) { + case IB_OPCODE_UC_SEND_FIRST: + case IB_OPCODE_UC_SEND_MIDDLE: + if (opcode == IB_OPCODE_UC_SEND_MIDDLE || + opcode == IB_OPCODE_UC_SEND_LAST || + opcode == IB_OPCODE_UC_SEND_LAST_WITH_IMMEDIATE) + break; + goto inv; + + case IB_OPCODE_UC_RDMA_WRITE_FIRST: + case IB_OPCODE_UC_RDMA_WRITE_MIDDLE: + if (opcode == IB_OPCODE_UC_RDMA_WRITE_MIDDLE || + opcode == IB_OPCODE_UC_RDMA_WRITE_LAST || + opcode == IB_OPCODE_UC_RDMA_WRITE_LAST_WITH_IMMEDIATE) + break; + goto inv; + + default: + if (opcode == IB_OPCODE_UC_SEND_FIRST || + opcode == IB_OPCODE_UC_SEND_ONLY || + opcode == IB_OPCODE_UC_SEND_ONLY_WITH_IMMEDIATE || + opcode == IB_OPCODE_UC_RDMA_WRITE_FIRST || + opcode == IB_OPCODE_UC_RDMA_WRITE_ONLY || + opcode == IB_OPCODE_UC_RDMA_WRITE_ONLY_WITH_IMMEDIATE) + break; + goto inv; + } + + /* OK, process the packet. */ + switch (opcode) { + case IB_OPCODE_UC_SEND_FIRST: + case IB_OPCODE_UC_SEND_ONLY: + case IB_OPCODE_UC_SEND_ONLY_WITH_IMMEDIATE: + send_first: + if (qp->r_reuse_sge) { + qp->r_reuse_sge = 0; + qp->r_sge = qp->s_rdma_sge; + } else if (!get_rwqe(qp, 0)) { + dev->n_pkt_drops++; + goto done; + } + /* Save the WQE so we can reuse it in case of an error. */ + qp->s_rdma_sge = qp->r_sge; + qp->r_rcv_len = 0; + if (opcode == IB_OPCODE_UC_SEND_ONLY) + goto send_last; + else if (opcode == IB_OPCODE_UC_SEND_ONLY_WITH_IMMEDIATE) + goto send_last_imm; + /* FALLTHROUGH */ + case IB_OPCODE_UC_SEND_MIDDLE: + /* Check for invalid length PMTU or posted rwqe len. */ + if (unlikely(tlen != (hdrsize + pmtu + 4))) { + qp->r_reuse_sge = 1; + dev->n_pkt_drops++; + goto done; + } + qp->r_rcv_len += pmtu; + if (unlikely(qp->r_rcv_len > qp->r_len)) { + qp->r_reuse_sge = 1; + dev->n_pkt_drops++; + goto done; + } + copy_sge(&qp->r_sge, data, pmtu); + break; + + case IB_OPCODE_UC_SEND_LAST_WITH_IMMEDIATE: + send_last_imm: + if (has_grh) { + wc.imm_data = *(u32 *) data; + data += sizeof(u32); + } else { + /* Immediate data comes after BTH */ + wc.imm_data = ohdr->u.imm_data; + } + hdrsize += 4; + wc.wc_flags = IB_WC_WITH_IMM; + /* FALLTHROUGH */ + case IB_OPCODE_UC_SEND_LAST: + send_last: + /* Get the number of bytes the message was padded by. */ + pad = (ohdr->bth[0] >> 12) & 3; + /* Check for invalid length. */ + /* XXX LAST len should be >= 1 */ + if (unlikely(tlen < (hdrsize + pad + 4))) { + qp->r_reuse_sge = 1; + dev->n_pkt_drops++; + goto done; + } + /* Don't count the CRC. */ + tlen -= (hdrsize + pad + 4); + wc.byte_len = tlen + qp->r_rcv_len; + if (unlikely(wc.byte_len > qp->r_len)) { + qp->r_reuse_sge = 1; + dev->n_pkt_drops++; + goto done; + } + /* XXX Need to free SGEs */ + last_imm: + copy_sge(&qp->r_sge, data, tlen); + wc.wr_id = qp->r_wr_id; + wc.status = IB_WC_SUCCESS; + wc.opcode = IB_WC_RECV; + wc.vendor_err = 0; + wc.qp_num = qp->ibqp.qp_num; + wc.src_qp = qp->remote_qpn; + wc.pkey_index = 0; + wc.slid = qp->remote_ah_attr.dlid; + wc.sl = qp->remote_ah_attr.sl; + wc.dlid_path_bits = 0; + wc.port_num = 0; + /* Signal completion event if the solicited bit is set. 
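+ *
+ * (Editor's note: bit 23 of BTH dword 0 is the Solicited Event flag;
+ * masking the still-big-endian word, as below, tests it without
+ * byte-swapping the header:
+ *
+ *	se = ohdr->bth[0] & __constant_cpu_to_be32(1 << 23);
+ * )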
*/ + ipath_cq_enter(to_icq(qp->ibqp.recv_cq), &wc, + ohdr->bth[0] & __constant_cpu_to_be32(1 << 23)); + break; + + case IB_OPCODE_UC_RDMA_WRITE_FIRST: + case IB_OPCODE_UC_RDMA_WRITE_ONLY: + case IB_OPCODE_UC_RDMA_WRITE_ONLY_WITH_IMMEDIATE: /* consume RWQE */ + rdma_first: + /* RETH comes after BTH */ + if (!has_grh) + reth = &ohdr->u.rc.reth; + else { + reth = (struct ib_reth *)data; + data += sizeof(*reth); + } + hdrsize += sizeof(*reth); + qp->r_len = be32_to_cpu(reth->length); + qp->r_rcv_len = 0; + if (qp->r_len != 0) { + u32 rkey = be32_to_cpu(reth->rkey); + u64 vaddr = be64_to_cpu(reth->vaddr); + + /* Check rkey */ + if (unlikely(!ipath_rkey_ok(dev, &qp->r_sge, qp->r_len, + vaddr, rkey, + IB_ACCESS_REMOTE_WRITE))) { + dev->n_pkt_drops++; + goto done; + } + } else { + qp->r_sge.sg_list = NULL; + qp->r_sge.sge.mr = NULL; + qp->r_sge.sge.vaddr = NULL; + qp->r_sge.sge.length = 0; + qp->r_sge.sge.sge_length = 0; + } + if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_WRITE))) { + dev->n_pkt_drops++; + goto done; + } + if (opcode == IB_OPCODE_UC_RDMA_WRITE_ONLY) + goto rdma_last; + else if (opcode == IB_OPCODE_UC_RDMA_WRITE_ONLY_WITH_IMMEDIATE) + goto rdma_last_imm; + /* FALLTHROUGH */ + case IB_OPCODE_UC_RDMA_WRITE_MIDDLE: + /* Check for invalid length PMTU or posted rwqe len. */ + if (unlikely(tlen != (hdrsize + pmtu + 4))) { + dev->n_pkt_drops++; + goto done; + } + qp->r_rcv_len += pmtu; + if (unlikely(qp->r_rcv_len > qp->r_len)) { + dev->n_pkt_drops++; + goto done; + } + copy_sge(&qp->r_sge, data, pmtu); + break; + + case IB_OPCODE_UC_RDMA_WRITE_LAST_WITH_IMMEDIATE: + rdma_last_imm: + /* Get the number of bytes the message was padded by. */ + pad = (ohdr->bth[0] >> 12) & 3; + /* Check for invalid length. */ + /* XXX LAST len should be >= 1 */ + if (unlikely(tlen < (hdrsize + pad + 4))) { + dev->n_pkt_drops++; + goto done; + } + /* Don't count the CRC. */ + tlen -= (hdrsize + pad + 4); + if (unlikely(tlen + qp->r_rcv_len != qp->r_len)) { + dev->n_pkt_drops++; + goto done; + } + if (qp->r_reuse_sge) { + qp->r_reuse_sge = 0; + } else if (!get_rwqe(qp, 1)) { + dev->n_pkt_drops++; + goto done; + } + if (has_grh) { + wc.imm_data = *(u32 *) data; + data += sizeof(u32); + } else { + /* Immediate data comes after BTH */ + wc.imm_data = ohdr->u.imm_data; + } + hdrsize += 4; + wc.wc_flags = IB_WC_WITH_IMM; + wc.byte_len = 0; + goto last_imm; + + case IB_OPCODE_UC_RDMA_WRITE_LAST: + rdma_last: + /* Get the number of bytes the message was padded by. */ + pad = (ohdr->bth[0] >> 12) & 3; + /* Check for invalid length. */ + /* XXX LAST len should be >= 1 */ + if (unlikely(tlen < (hdrsize + pad + 4))) { + dev->n_pkt_drops++; + goto done; + } + /* Don't count the CRC. */ + tlen -= (hdrsize + pad + 4); + if (unlikely(tlen + qp->r_rcv_len != qp->r_len)) { + dev->n_pkt_drops++; + goto done; + } + copy_sge(&qp->r_sge, data, tlen); + break; + + default: + /* Drop packet for unknown opcodes. */ + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + dev->n_pkt_drops++; + return; + } + qp->r_psn++; + qp->r_state = opcode; +done: + spin_unlock_irqrestore(&qp->r_rq.lock, flags); +} + +/* + * Put this QP on the RNR timeout list for the device. + * XXX Use a simple list for now. We might need a priority + * queue if we have lots of QPs waiting for RNR timeouts + * but that should be rare. 
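+ *
+ * (Editor's note: the timeouts on this list are delta-encoded; a new
+ * entry has the timeouts of the entries it queues behind subtracted
+ * from it, so that, presumably, only the head entry needs to be
+ * decremented per timer tick.  E.g. queueing QPs with timeouts 3 and
+ * then 9 stores 3 and 6, since 9 = 3 + 6.)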
+ */ +static void insert_rnr_queue(struct ipath_qp *qp) +{ + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); + unsigned long flags; + + spin_lock_irqsave(&dev->pending_lock, flags); + if (list_empty(&dev->rnrwait)) + list_add(&qp->timerwait, &dev->rnrwait); + else { + struct list_head *l = &dev->rnrwait; + struct ipath_qp *nqp = list_entry(l->next, struct ipath_qp, + timerwait); + + while (qp->s_rnr_timeout >= nqp->s_rnr_timeout) { + qp->s_rnr_timeout -= nqp->s_rnr_timeout; + l = l->next; + if (l->next == &dev->rnrwait) + break; + nqp = list_entry(l->next, struct ipath_qp, timerwait); + } + list_add(&qp->timerwait, l); + } + spin_unlock_irqrestore(&dev->pending_lock, flags); +} + +/* + * This is called from do_uc_send() or do_rc_send() to forward a WQE addressed + * to the same HCA. + * Note that although we are single threaded due to the tasklet, we still + * have to protect against post_send(). We don't have to worry about + * receive interrupts since this is a connected protocol and all packets + * will pass through here. + */ +static void ipath_ruc_loopback(struct ipath_qp *sqp, struct ib_wc *wc) +{ + struct ipath_ibdev *dev = to_idev(sqp->ibqp.device); + struct ipath_qp *qp; + struct ipath_swqe *wqe; + struct ipath_sge *sge; + unsigned long flags; + u64 sdata; + + qp = ipath_lookup_qpn(&dev->qp_table, sqp->remote_qpn); + if (!qp) { + dev->n_pkt_drops++; + return; + } + +again: + spin_lock_irqsave(&sqp->s_lock, flags); + + if (!(state_ops[sqp->state] & IPATH_PROCESS_SEND_OK)) { + spin_unlock_irqrestore(&sqp->s_lock, flags); + goto done; + } + + /* Get the next send request. */ + if (sqp->s_last == sqp->s_head) { + /* Send work queue is empty. */ + spin_unlock_irqrestore(&sqp->s_lock, flags); + goto done; + } + + /* + * We can rely on the entry not changing without the s_lock + * being held until we update s_last. 
+ */ + wqe = get_swqe_ptr(sqp, sqp->s_last); + spin_unlock_irqrestore(&sqp->s_lock, flags); + + wc->wc_flags = 0; + wc->imm_data = 0; + + sqp->s_sge.sge = wqe->sg_list[0]; + sqp->s_sge.sg_list = wqe->sg_list + 1; + sqp->s_sge.num_sge = wqe->wr.num_sge; + sqp->s_len = wqe->length; + switch (wqe->wr.opcode) { + case IB_WR_SEND_WITH_IMM: + wc->wc_flags = IB_WC_WITH_IMM; + wc->imm_data = wqe->wr.imm_data; + /* FALLTHROUGH */ + case IB_WR_SEND: + spin_lock_irqsave(&qp->r_rq.lock, flags); + if (!get_rwqe(qp, 0)) { + rnr_nak: + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + /* Handle RNR NAK */ + if (qp->ibqp.qp_type == IB_QPT_UC) + goto send_comp; + if (sqp->s_rnr_retry == 0) { + wc->status = IB_WC_RNR_RETRY_EXC_ERR; + goto err; + } + if (sqp->s_rnr_retry_cnt < 7) + sqp->s_rnr_retry--; + dev->n_rnr_naks++; + sqp->s_rnr_timeout = rnr_table[sqp->s_min_rnr_timer]; + insert_rnr_queue(sqp); + goto done; + } + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + break; + + case IB_WR_RDMA_WRITE_WITH_IMM: + wc->wc_flags = IB_WC_WITH_IMM; + wc->imm_data = wqe->wr.imm_data; + spin_lock_irqsave(&qp->r_rq.lock, flags); + if (!get_rwqe(qp, 1)) + goto rnr_nak; + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + /* FALLTHROUGH */ + case IB_WR_RDMA_WRITE: + if (wqe->length == 0) + break; + if (unlikely(!ipath_rkey_ok(dev, &qp->r_sge, wqe->length, + wqe->wr.wr.rdma.remote_addr, + wqe->wr.wr.rdma.rkey, + IB_ACCESS_REMOTE_WRITE))) { + acc_err: + wc->status = IB_WC_REM_ACCESS_ERR; + err: + wc->wr_id = wqe->wr.wr_id; + wc->opcode = wc_opcode[wqe->wr.opcode]; + wc->vendor_err = 0; + wc->byte_len = 0; + wc->qp_num = sqp->ibqp.qp_num; + wc->src_qp = sqp->remote_qpn; + wc->pkey_index = 0; + wc->slid = sqp->remote_ah_attr.dlid; + wc->sl = sqp->remote_ah_attr.sl; + wc->dlid_path_bits = 0; + wc->port_num = 0; + ipath_sqerror_qp(sqp, wc); + goto done; + } + break; + + case IB_WR_RDMA_READ: + if (unlikely(!ipath_rkey_ok(dev, &sqp->s_sge, wqe->length, + wqe->wr.wr.rdma.remote_addr, + wqe->wr.wr.rdma.rkey, + IB_ACCESS_REMOTE_READ))) { + goto acc_err; + } + if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_READ))) + goto acc_err; + qp->r_sge.sge = wqe->sg_list[0]; + qp->r_sge.sg_list = wqe->sg_list + 1; + qp->r_sge.num_sge = wqe->wr.num_sge; + break; + + case IB_WR_ATOMIC_CMP_AND_SWP: + case IB_WR_ATOMIC_FETCH_AND_ADD: + if (unlikely(!ipath_rkey_ok(dev, &qp->r_sge, sizeof(u64), + wqe->wr.wr.rdma.remote_addr, + wqe->wr.wr.rdma.rkey, + IB_ACCESS_REMOTE_ATOMIC))) { + goto acc_err; + } + /* Perform atomic OP and save result. 
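+ *
+ * (Editor's summary of the two atomics implemented below, where "old"
+ * is returned to the requester in both cases:
+ *
+ *	old = *vaddr;
+ *	FETCH_ADD:	*vaddr = old + operand;		// unconditional
+ *	COMPARE_SWAP:	if (old == compare) *vaddr = swap;
+ * )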
*/ + sdata = wqe->wr.wr.atomic.swap; + spin_lock_irqsave(&dev->pending_lock, flags); + qp->r_atomic_data = *(u64 *) qp->r_sge.sge.vaddr; + if (wqe->wr.opcode == IB_WR_ATOMIC_FETCH_AND_ADD) { + *(u64 *) qp->r_sge.sge.vaddr = + qp->r_atomic_data + sdata; + } else if (qp->r_atomic_data == wqe->wr.wr.atomic.compare_add) { + *(u64 *) qp->r_sge.sge.vaddr = sdata; + } + spin_unlock_irqrestore(&dev->pending_lock, flags); + *(u64 *) sqp->s_sge.sge.vaddr = qp->r_atomic_data; + goto send_comp; + + default: + goto done; + } + + sge = &sqp->s_sge.sge; + while (sqp->s_len) { + u32 len = sqp->s_len; + + if (len > sge->length) + len = sge->length; + BUG_ON(len == 0); + copy_sge(&qp->r_sge, sge->vaddr, len); + sge->vaddr += len; + sge->length -= len; + sge->sge_length -= len; + if (sge->sge_length == 0) { + if (--sqp->s_sge.num_sge) + *sge = *sqp->s_sge.sg_list++; + } else if (sge->length == 0 && sge->mr != NULL) { + if (++sge->n >= IPATH_SEGSZ) { + if (++sge->m >= sge->mr->mapsz) + break; + sge->n = 0; + } + sge->vaddr = sge->mr->map[sge->m]->segs[sge->n].vaddr; + sge->length = sge->mr->map[sge->m]->segs[sge->n].length; + } + sqp->s_len -= len; + } + + if (wqe->wr.opcode == IB_WR_RDMA_WRITE || + wqe->wr.opcode == IB_WR_RDMA_READ) + goto send_comp; + + if (wqe->wr.opcode == IB_WR_RDMA_WRITE_WITH_IMM) + wc->opcode = IB_WC_RECV_RDMA_WITH_IMM; + else + wc->opcode = IB_WC_RECV; + wc->wr_id = qp->r_wr_id; + wc->status = IB_WC_SUCCESS; + wc->vendor_err = 0; + wc->byte_len = wqe->length; + wc->qp_num = qp->ibqp.qp_num; + wc->src_qp = qp->remote_qpn; + /* XXX do we know which pkey matched? Only needed for GSI. */ + wc->pkey_index = 0; + wc->slid = qp->remote_ah_attr.dlid; + wc->sl = qp->remote_ah_attr.sl; + wc->dlid_path_bits = 0; + /* Signal completion event if the solicited bit is set. */ + ipath_cq_enter(to_icq(qp->ibqp.recv_cq), wc, + wqe->wr.send_flags & IB_SEND_SOLICITED); + +send_comp: + sqp->s_rnr_retry = sqp->s_rnr_retry_cnt; + + if (!test_bit(IPATH_S_SIGNAL_REQ_WR, &sqp->s_flags) || + (wqe->wr.send_flags & IB_SEND_SIGNALED)) { + wc->wr_id = wqe->wr.wr_id; + wc->status = IB_WC_SUCCESS; + wc->opcode = wc_opcode[wqe->wr.opcode]; + wc->vendor_err = 0; + wc->byte_len = wqe->length; + wc->qp_num = sqp->ibqp.qp_num; + wc->src_qp = 0; + wc->pkey_index = 0; + wc->slid = 0; + wc->sl = 0; + wc->dlid_path_bits = 0; + wc->port_num = 0; + ipath_cq_enter(to_icq(sqp->ibqp.send_cq), wc, 0); + } + + /* Update s_last now that we are finished with the SWQE */ + spin_lock_irqsave(&sqp->s_lock, flags); + if (++sqp->s_last >= sqp->s_size) + sqp->s_last = 0; + spin_unlock_irqrestore(&sqp->s_lock, flags); + goto again; + +done: + if (atomic_dec_and_test(&qp->refcount)) + wake_up(&qp->wait); +} + +/* + * Process the AETH credit field of an incoming ACK and update the + * limit sequence number (s_lsn). + * The QP s_lock should be held. + */ +static void ipath_get_credit(struct ipath_qp *qp, u32 aeth) +{ + u32 credit = (aeth >> 24) & 0x1F; + + /* + * If credit == 0x1F, credit is invalid and we can send + * as many packets as we like. Otherwise, we have to + * honor the credit field. + */ + if (credit == 0x1F) { + qp->s_lsn = (u32) -1; + } else if (qp->s_lsn != (u32) -1) { + /* Compute new LSN (i.e., MSN + credit) */ + credit = (aeth + credit_table[credit]) & 0xFFFFFF; + if (cmp24(credit, qp->s_lsn) > 0) + qp->s_lsn = credit; + } + + /* Restart sending if it was blocked due to lack of credits.
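+ *
+ * (Editor's note: ipath_get_credit() above converts the 5-bit AETH
+ * credit code into a limit sequence number,
+ *
+ *	s_lsn = (MSN + credit_table[code]) & 0xFFFFFF;
+ *
+ * with code 0x1F meaning "unlimited", stored as s_lsn == (u32) -1;
+ * the check below lets a blocked sender resume once the SSN of the
+ * next SWQE falls within that limit.)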
*/ + if (qp->s_cur != qp->s_head && + (qp->s_lsn == (u32) -1 || + cmp24(get_swqe_ptr(qp, qp->s_cur)->ssn, qp->s_lsn + 1) <= 0)) { + tasklet_schedule(&qp->s_task); + } +} + +/* + * This is called from ipath_rc_rcv() to process an incomming RC ACK + * for the given QP. + * Called at interrupt level with the QP s_lock held. + * Returns 1 if OK, 0 if current operation should be aborted (NAK). + */ +static int do_rc_ack(struct ipath_qp *qp, u32 aeth, u32 psn, int opcode) +{ + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); + struct ib_wc wc; + struct ipath_swqe *wqe; + + /* + * Remove the QP from the timeout queue (or RNR timeout queue). + * If ipath_ib_timer() has already removed it, + * it's OK since we hold the QP s_lock and ipath_restart_rc() + * just won't find anything to restart if we ACK everything. + */ + spin_lock(&dev->pending_lock); + if (qp->timerwait.next != LIST_POISON1) + list_del(&qp->timerwait); + spin_unlock(&dev->pending_lock); + + /* + * Note that NAKs implicitly ACK outstanding SEND and + * RDMA write requests and implicitly NAK RDMA read and + * atomic requests issued before the NAK'ed request. + * The MSN won't include the NAK'ed request but will include + * an ACK'ed request(s). + */ + wqe = get_swqe_ptr(qp, qp->s_last); + + /* Nothing is pending to ACK/NAK. */ + if (qp->s_last == qp->s_tail) + return 0; + + /* + * The MSN might be for a later WQE than the PSN indicates so + * only complete WQEs that the PSN finishes. + */ + while (cmp24(psn, wqe->lpsn) >= 0) { + /* If we are ACKing a WQE, the MSN should be >= the SSN. */ + if (cmp24(aeth, wqe->ssn) < 0) + break; + /* + * If this request is a RDMA read or atomic, and the ACK is + * for a later operation, this ACK NAKs the RDMA read or atomic. + * In other words, only a RDMA_READ_LAST or ONLY can ACK + * a RDMA read and likewise for atomic ops. + * Note that the NAK case can only happen if relaxed ordering + * is used and requests are sent after an RDMA read + * or atomic is sent but before the response is received. + */ + if ((wqe->wr.opcode == IB_WR_RDMA_READ && + opcode != IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST) || + ((wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP || + wqe->wr.opcode == IB_WR_ATOMIC_FETCH_AND_ADD) && + (opcode != IB_OPCODE_RC_ATOMIC_ACKNOWLEDGE || + cmp24(wqe->psn, psn) != 0))) { + /* The last valid PSN seen is the previous request's. */ + qp->s_last_psn = wqe->psn - 1; + /* Retry this request. */ + ipath_restart_rc(qp, wqe->psn, &wc); + /* + * No need to process the ACK/NAK since we are + * restarting an earlier request. + */ + return 0; + } + /* Post a send completion queue entry if requested. */ + if (!test_bit(IPATH_S_SIGNAL_REQ_WR, &qp->s_flags) || + (wqe->wr.send_flags & IB_SEND_SIGNALED)) { + wc.wr_id = wqe->wr.wr_id; + wc.status = IB_WC_SUCCESS; + wc.opcode = wc_opcode[wqe->wr.opcode]; + wc.vendor_err = 0; + wc.byte_len = wqe->length; + wc.qp_num = qp->ibqp.qp_num; + wc.src_qp = qp->remote_qpn; + wc.pkey_index = 0; + wc.slid = qp->remote_ah_attr.dlid; + wc.sl = qp->remote_ah_attr.sl; + wc.dlid_path_bits = 0; + wc.port_num = 0; + ipath_cq_enter(to_icq(qp->ibqp.send_cq), &wc, 0); + } + qp->s_retry = qp->s_retry_cnt; + /* + * If we are completing a request which is in the process + * of being resent, we can stop resending it since we know + * the responder has already seen it. 
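+ *
+ * (Editor's summary of the AETH decode performed below:
+ *
+ *	aeth[31:29]  0 = ACK, 1 = RNR NAK, 3 = NAK, 2 = reserved
+ *	aeth[28:24]  credit code, RNR timer code, or NAK reason
+ *		     (0 PSN sequence error, 1 invalid request,
+ *		      2 remote access error, 3 remote operation error)
+ *	aeth[23:0]   MSN
+ * )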
+ */ + if (qp->s_last == qp->s_cur) { + if (++qp->s_cur >= qp->s_size) + qp->s_cur = 0; + wqe = get_swqe_ptr(qp, qp->s_cur); + qp->s_state = IB_OPCODE_RC_SEND_LAST; + qp->s_psn = wqe->psn; + } + if (++qp->s_last >= qp->s_size) + qp->s_last = 0; + wqe = get_swqe_ptr(qp, qp->s_last); + if (qp->s_last == qp->s_tail) + break; + } + + switch (aeth >> 29) { + case 0: /* ACK */ + dev->n_rc_acks++; + /* If this is a partial ACK, reset the retransmit timer. */ + if (qp->s_last != qp->s_tail) { + spin_lock(&dev->pending_lock); + list_add_tail(&qp->timerwait, + &dev->pending[dev->pending_index]); + spin_unlock(&dev->pending_lock); + } + ipath_get_credit(qp, aeth); + qp->s_rnr_retry = qp->s_rnr_retry_cnt; + qp->s_retry = qp->s_retry_cnt; + qp->s_last_psn = psn; + return 1; + + case 1: /* RNR NAK */ + dev->n_rnr_naks++; + if (qp->s_rnr_retry == 0) { + if (qp->s_last == qp->s_tail) + return 0; + + wc.status = IB_WC_RNR_RETRY_EXC_ERR; + goto class_b; + } + if (qp->s_rnr_retry_cnt < 7) + qp->s_rnr_retry--; + if (qp->s_last == qp->s_tail) + return 0; + + /* The last valid PSN seen is the previous request's. */ + qp->s_last_psn = wqe->psn - 1; + + /* Restart this request after the RNR timeout. */ + wqe = get_swqe_ptr(qp, qp->s_last); + + dev->n_rc_resends += (int)qp->s_psn - (int)psn; + + /* + * If we are starting the request from the beginning, let the + * normal send code handle initialization. + */ + qp->s_cur = qp->s_last; + if (cmp24(psn, wqe->psn) <= 0) { + qp->s_state = IB_OPCODE_RC_SEND_LAST; + qp->s_psn = wqe->psn; + } else { + u32 n; + + n = qp->s_cur; + for (;;) { + if (++n == qp->s_size) + n = 0; + if (n == qp->s_tail) { + if (cmp24(psn, qp->s_next_psn) >= 0) { + qp->s_cur = n; + wqe = get_swqe_ptr(qp, n); + } + break; + } + wqe = get_swqe_ptr(qp, n); + if (cmp24(psn, wqe->psn) < 0) + break; + qp->s_cur = n; + } + qp->s_psn = psn; + + /* + * Set the state to restart in the middle of a request. + * Don't change the s_sge, s_cur_sge, or s_cur_size. + * See do_rc_send(). + */ + switch (wqe->wr.opcode) { + case IB_WR_SEND: + case IB_WR_SEND_WITH_IMM: + qp->s_state = + IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST; + break; + + case IB_WR_RDMA_WRITE: + case IB_WR_RDMA_WRITE_WITH_IMM: + qp->s_state = + IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST; + break; + + case IB_WR_RDMA_READ: + qp->s_state = + IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE; + break; + + default: + /* + * This case shouldn't happen since its only + * one PSN per req. + */ + qp->s_state = IB_OPCODE_RC_SEND_LAST; + } + } + + qp->s_rnr_timeout = rnr_table[(aeth >> 24) & 0x1F]; + insert_rnr_queue(qp); + return 0; + + case 3: /* NAK */ + /* The last valid PSN seen is the previous request's. */ + if (qp->s_last != qp->s_tail) + qp->s_last_psn = wqe->psn - 1; + switch ((aeth >> 24) & 0x1F) { + case 0: /* PSN sequence error */ + dev->n_seq_naks++; + /* + * Back up to the responder's expected PSN. + * XXX Note that we might get a NAK in the + * middle of an RDMA READ response which + * terminates the RDMA READ. + */ + if (qp->s_last == qp->s_tail) + break; + + if (cmp24(psn, wqe->psn) < 0) { + break; + } + /* Retry the request. 
*/ + ipath_restart_rc(qp, psn, &wc); + break; + + case 1: /* Invalid Request */ + wc.status = IB_WC_REM_INV_REQ_ERR; + dev->n_other_naks++; + goto class_b; + + case 2: /* Remote Access Error */ + wc.status = IB_WC_REM_ACCESS_ERR; + dev->n_other_naks++; + goto class_b; + + case 3: /* Remote Operation Error */ + wc.status = IB_WC_REM_OP_ERR; + dev->n_other_naks++; + class_b: + wc.wr_id = wqe->wr.wr_id; + wc.opcode = wc_opcode[wqe->wr.opcode]; + wc.vendor_err = 0; + wc.byte_len = 0; + wc.qp_num = qp->ibqp.qp_num; + wc.src_qp = qp->remote_qpn; + wc.pkey_index = 0; + wc.slid = qp->remote_ah_attr.dlid; + wc.sl = qp->remote_ah_attr.sl; + wc.dlid_path_bits = 0; + wc.port_num = 0; + ipath_sqerror_qp(qp, &wc); + break; + + default: + /* Ignore other reserved NAK error codes */ + goto reserved; + } + qp->s_rnr_retry = qp->s_rnr_retry_cnt; + return 0; + + default: /* 2: reserved */ + reserved: + /* Ignore reserved NAK codes. */ + return 0; + } +} + +/* + * This is called from ipath_qp_rcv() to process an incoming RC packet + * for the given QP. + * Called at interrupt level. + */ +static void ipath_rc_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr, + int has_grh, void *data, u32 tlen, struct ipath_qp *qp) +{ + struct ipath_other_headers *ohdr; + int opcode; + u32 hdrsize; + u32 psn; + u32 pad; + unsigned long flags; + struct ib_wc wc; + u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu); + int diff; + struct ib_reth *reth; + + /* Check for GRH */ + if (!has_grh) { + ohdr = &hdr->u.oth; + hdrsize = 8 + 12; /* LRH + BTH */ + psn = be32_to_cpu(ohdr->bth[2]); + } else { + ohdr = &hdr->u.l.oth; + hdrsize = 8 + 40 + 12; /* LRH + GRH + BTH */ + /* + * The header with GRH is 60 bytes and the + * core driver sets the eager header buffer + * size to 56 bytes so the last 4 bytes of + * the BTH header (PSN) are in the data buffer. + */ + psn = be32_to_cpu(((u32 *) data)[0]); + data += sizeof(u32); + } + /* + * The opcode is in the low byte when it's in network order + * (top byte when in host order). + */ + opcode = *(u8 *) (&ohdr->bth[0]); + + /* + * Process responses (ACKs) before anything else. + * Note that the packet sequence number will be for something + * in the send work queue rather than the expected receive + * packet sequence number. In other words, this QP is the + * requester. + */ + if (opcode >= IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST && + opcode <= IB_OPCODE_RC_ATOMIC_ACKNOWLEDGE) { + + spin_lock_irqsave(&qp->s_lock, flags); + + /* Ignore invalid responses. */ + if (cmp24(psn, qp->s_next_psn) >= 0) { + goto ack_done; + } + + /* Ignore duplicate responses.
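+ *
+ * (Editor's note: a response whose PSN is at or before s_last_psn is
+ * old, but a duplicate plain ACK with diff == 0 may still advertise
+ * newer flow-control credits, so the code below harvests credits from
+ * such "ghost" ACKs before discarding them.)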
*/ + diff = cmp24(psn, qp->s_last_psn); + if (unlikely(diff <= 0)) { + /* Update credits for "ghost" ACKs */ + if (diff == 0 && opcode == IB_OPCODE_RC_ACKNOWLEDGE) { + if (!has_grh) { + pad = be32_to_cpu(ohdr->u.aeth); + } else { + pad = be32_to_cpu(((u32 *) data)[0]); + data += sizeof(u32); + } + if ((pad >> 29) == 0) { + ipath_get_credit(qp, pad); + } + } + goto ack_done; + } + + switch (opcode) { + case IB_OPCODE_RC_ACKNOWLEDGE: + case IB_OPCODE_RC_ATOMIC_ACKNOWLEDGE: + case IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST: + if (!has_grh) { + pad = be32_to_cpu(ohdr->u.aeth); + } else { + pad = be32_to_cpu(((u32 *) data)[0]); + data += sizeof(u32); + } + if (opcode == IB_OPCODE_RC_ATOMIC_ACKNOWLEDGE) { + *(u64 *) qp->s_sge.sge.vaddr = *(u64 *) data; + } + if (!do_rc_ack(qp, pad, psn, opcode) || + opcode != IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST) { + goto ack_done; + } + hdrsize += 4; + /* + * do_rc_ack() has already checked the PSN so skip + * the sequence check. + */ + goto rdma_read; + + case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE: + /* no AETH, no ACK */ + if (unlikely(cmp24(psn, qp->s_last_psn + 1) != 0)) { + dev->n_rdma_seq++; + ipath_restart_rc(qp, qp->s_last_psn + 1, &wc); + goto ack_done; + } + rdma_read: + if (unlikely(qp->s_state != + IB_OPCODE_RC_RDMA_READ_REQUEST)) + goto ack_done; + if (unlikely(tlen != (hdrsize + pmtu + 4))) + goto ack_done; + if (unlikely(pmtu >= qp->s_len)) + goto ack_done; + /* We got a response so update the timeout. */ + if (unlikely(qp->s_last == qp->s_tail || + get_swqe_ptr(qp, qp->s_last)->wr.opcode != + IB_WR_RDMA_READ)) + goto ack_done; + spin_lock(&dev->pending_lock); + if (qp->s_rnr_timeout == 0 && + qp->timerwait.next != LIST_POISON1) { + list_move_tail(&qp->timerwait, + &dev->pending[dev-> + pending_index]); + } + spin_unlock(&dev->pending_lock); + /* + * Update the RDMA receive state but do the copy w/o + * holding the locks and blocking interrupts. + * XXX Yet another place that affects relaxed + * RDMA order since we don't want s_sge modified. + */ + qp->s_len -= pmtu; + qp->s_last_psn = psn; + spin_unlock_irqrestore(&qp->s_lock, flags); + copy_sge(&qp->s_sge, data, pmtu); + return; + + case IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST: + /* ACKs READ req. */ + if (unlikely(cmp24(psn, qp->s_last_psn + 1) != 0)) { + dev->n_rdma_seq++; + ipath_restart_rc(qp, qp->s_last_psn + 1, &wc); + goto ack_done; + } + /* FALLTHROUGH */ + case IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY: + if (unlikely(qp->s_state != + IB_OPCODE_RC_RDMA_READ_REQUEST)) { + goto ack_done; + } + /* Get the number of bytes the message was padded by. */ + pad = (ohdr->bth[0] >> 12) & 3; + /* + * Check that the data size is >= 1 && <= pmtu. + * Remember to account for the AETH header (4) + * and ICRC (4). + */ + if (unlikely(tlen <= (hdrsize + pad + 8))) { + /* XXX Need to generate an error CQ entry. */ + goto ack_done; + } + tlen -= hdrsize + pad + 8; + if (unlikely(tlen != qp->s_len)) { + /* XXX Need to generate an error CQ entry. */ + goto ack_done; + } + if (!has_grh) { + pad = be32_to_cpu(ohdr->u.aeth); + } else { + pad = be32_to_cpu(((u32 *) data)[0]); + data += sizeof(u32); + } + copy_sge(&qp->s_sge, data, tlen); + if (do_rc_ack(qp, pad, psn, + IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST)) { + /* + * Change the state so we contimue + * processing new requests. + */ + qp->s_state = IB_OPCODE_RC_SEND_LAST; + } + goto ack_done; + } + ack_done: + spin_unlock_irqrestore(&qp->s_lock, flags); + return; + } + + spin_lock_irqsave(&qp->r_rq.lock, flags); + + /* Compute 24 bits worth of difference. 
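+ *
+ * (Editor's sketch: cmp24() is defined elsewhere in the driver, not
+ * shown here; a plausible implementation comparing PSNs modulo 2^24
+ * is
+ *
+ *	static int cmp24(u32 a, u32 b)
+ *	{
+ *		return (int)((a - b) << 8) >> 8;   // sign-extend bit 23
+ *	}
+ *
+ * returning negative, zero, or positive like strcmp().)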
*/ + diff = cmp24(psn, qp->r_psn); + if (unlikely(diff)) { + if (diff > 0) { + /* + * Packet sequence error. + * A NAK will ACK earlier sends and RDMA writes. + * Don't queue the NAK if a RDMA read, atomic, or + * NAK is pending though. + */ + spin_lock(&qp->s_lock); + if ((qp->s_ack_state >= + IB_OPCODE_RC_RDMA_READ_REQUEST && + qp->s_ack_state != IB_OPCODE_ACKNOWLEDGE) || + qp->s_nak_state != 0) { + spin_unlock(&qp->s_lock); + goto done; + } + qp->s_ack_state = IB_OPCODE_RC_SEND_ONLY; + qp->s_nak_state = IB_NAK_PSN_ERROR; + /* Use the expected PSN. */ + qp->s_ack_psn = qp->r_psn; + goto resched; + } + + /* + * Handle a duplicate request. + * Don't re-execute SEND, RDMA write or atomic op. + * Don't NAK errors, just silently drop the duplicate request. + * Note that r_sge, r_len, and r_rcv_len may be + * in use so don't modify them. + * + * We are supposed to ACK the earliest duplicate PSN + * but we can coalesce an outstanding duplicate ACK. + * We have to send the earliest so that RDMA reads + * can be restarted at the requester's expected PSN. + */ + spin_lock(&qp->s_lock); + if (qp->s_ack_state != IB_OPCODE_ACKNOWLEDGE && + cmp24(psn, qp->s_ack_psn) >= 0) { + if (qp->s_ack_state < IB_OPCODE_RDMA_READ_REQUEST) + qp->s_ack_psn = psn; + spin_unlock(&qp->s_lock); + goto done; + } + switch (opcode) { + case IB_OPCODE_RC_RDMA_READ_REQUEST: + /* + * We have to be careful to not change s_rdma_sge + * while do_rc_send() is using it and not holding + * the s_lock. + */ + if (qp->s_ack_state != IB_OPCODE_RC_ACKNOWLEDGE && + qp->s_ack_state >= IB_OPCODE_RDMA_READ_REQUEST) { + spin_unlock(&qp->s_lock); + dev->n_rdma_dup_busy++; + goto done; + } + /* RETH comes after BTH */ + if (!has_grh) + reth = &ohdr->u.rc.reth; + else { + reth = (struct ib_reth *)data; + data += sizeof(*reth); + } + qp->s_rdma_len = be32_to_cpu(reth->length); + if (qp->s_rdma_len != 0) { + u32 rkey = be32_to_cpu(reth->rkey); + u64 vaddr = be64_to_cpu(reth->vaddr); + + /* + * Address range must be a subset of the + * original request and start on pmtu + * boundaries. + */ + if (unlikely(!ipath_rkey_ok(dev, + &qp->s_rdma_sge, + qp->s_rdma_len, + vaddr, rkey, + IB_ACCESS_REMOTE_READ))) + { + goto done; + } + } else { + qp->s_rdma_sge.sg_list = NULL; + qp->s_rdma_sge.num_sge = 0; + qp->s_rdma_sge.sge.mr = NULL; + qp->s_rdma_sge.sge.vaddr = NULL; + qp->s_rdma_sge.sge.length = 0; + qp->s_rdma_sge.sge.sge_length = 0; + } + break; + + case IB_OPCODE_RC_COMPARE_SWAP: + case IB_OPCODE_RC_FETCH_ADD: + /* + * Check for the PSN of the last atomic operations + * performed and resend the result if found. + */ + if ((psn & 0xFFFFFF) != qp->r_atomic_psn) { + spin_unlock(&qp->s_lock); + goto done; + } + qp->s_ack_atomic = qp->r_atomic_data; + break; + } + qp->s_ack_state = opcode; + qp->s_nak_state = 0; + qp->s_ack_psn = psn; + goto resched; + } + + /* Check for opcode sequence errors. */ + switch (qp->r_state) { + case IB_OPCODE_RC_SEND_FIRST: + case IB_OPCODE_RC_SEND_MIDDLE: + if (opcode == IB_OPCODE_RC_SEND_MIDDLE || + opcode == IB_OPCODE_RC_SEND_LAST || + opcode == IB_OPCODE_RC_SEND_LAST_WITH_IMMEDIATE) + break; + nack_inv: + /* + * A NAK will ACK earlier sends and RDMA writes. + * Don't queue the NAK if a RDMA read, atomic, or + * NAK is pending though. 
+ */ + spin_lock(&qp->s_lock); + if (qp->s_ack_state >= IB_OPCODE_RC_RDMA_READ_REQUEST && + qp->s_ack_state != IB_OPCODE_ACKNOWLEDGE) { + spin_unlock(&qp->s_lock); + goto done; + } + /* XXX Flush WQEs */ + qp->state = IB_QPS_ERR; + qp->s_ack_state = IB_OPCODE_RC_SEND_ONLY; + qp->s_nak_state = IB_NAK_INVALID_REQUEST; + qp->s_ack_psn = qp->r_psn; + goto resched; + + case IB_OPCODE_RC_RDMA_WRITE_FIRST: + case IB_OPCODE_RC_RDMA_WRITE_MIDDLE: + if (opcode == IB_OPCODE_RC_RDMA_WRITE_MIDDLE || + opcode == IB_OPCODE_RC_RDMA_WRITE_LAST || + opcode == IB_OPCODE_RC_RDMA_WRITE_LAST_WITH_IMMEDIATE) + break; + goto nack_inv; + + case IB_OPCODE_RC_RDMA_READ_REQUEST: + case IB_OPCODE_RC_COMPARE_SWAP: + case IB_OPCODE_RC_FETCH_ADD: + /* + * Drop all new requests until a response has been sent. + * A new request then ACKs the RDMA response we sent. + * Relaxed ordering would allow new requests to be + * processed but we would need to keep a queue + * of rwqe's for all that are in progress. + * Note that we can't RNR NAK this request since the RDMA + * READ or atomic response is already queued to be sent + * (unless we implement a response send queue). + */ + goto done; + + default: + if (opcode == IB_OPCODE_RC_SEND_MIDDLE || + opcode == IB_OPCODE_RC_SEND_LAST || + opcode == IB_OPCODE_RC_SEND_LAST_WITH_IMMEDIATE || + opcode == IB_OPCODE_RC_RDMA_WRITE_MIDDLE || + opcode == IB_OPCODE_RC_RDMA_WRITE_LAST || + opcode == IB_OPCODE_RC_RDMA_WRITE_LAST_WITH_IMMEDIATE) + goto nack_inv; + break; + } + + wc.imm_data = 0; + wc.wc_flags = 0; + + /* OK, process the packet. */ + switch (opcode) { + case IB_OPCODE_RC_SEND_FIRST: + if (!get_rwqe(qp, 0)) { + rnr_nak: + /* + * A RNR NAK will ACK earlier sends and RDMA writes. + * Don't queue the NAK if a RDMA read or atomic + * is pending though. + */ + spin_lock(&qp->s_lock); + if (qp->s_ack_state >= IB_OPCODE_RC_RDMA_READ_REQUEST && + qp->s_ack_state != IB_OPCODE_ACKNOWLEDGE) { + spin_unlock(&qp->s_lock); + goto done; + } + qp->s_ack_state = IB_OPCODE_RC_SEND_ONLY; + qp->s_nak_state = IB_RNR_NAK | qp->s_min_rnr_timer; + qp->s_ack_psn = qp->r_psn; + goto resched; + } + qp->r_rcv_len = 0; + /* FALLTHROUGH */ + case IB_OPCODE_RC_SEND_MIDDLE: + case IB_OPCODE_RC_RDMA_WRITE_MIDDLE: + send_middle: + /* Check for invalid length PMTU or posted rwqe len. */ + if (unlikely(tlen != (hdrsize + pmtu + 4))) { + goto nack_inv; + } + qp->r_rcv_len += pmtu; + if (unlikely(qp->r_rcv_len > qp->r_len)) { + goto nack_inv; + } + copy_sge(&qp->r_sge, data, pmtu); + break; + + case IB_OPCODE_RC_RDMA_WRITE_LAST_WITH_IMMEDIATE: + /* consume RWQE */ + if (!get_rwqe(qp, 1)) + goto rnr_nak; + goto send_last_imm; + + case IB_OPCODE_RC_SEND_ONLY: + case IB_OPCODE_RC_SEND_ONLY_WITH_IMMEDIATE: + if (!get_rwqe(qp, 0)) + goto rnr_nak; + qp->r_rcv_len = 0; + if (opcode == IB_OPCODE_RC_SEND_ONLY) + goto send_last; + /* FALLTHROUGH */ + case IB_OPCODE_RC_SEND_LAST_WITH_IMMEDIATE: + send_last_imm: + if (has_grh) { + wc.imm_data = *(u32 *) data; + data += sizeof(u32); + } else { + /* Immediate data comes after BTH */ + wc.imm_data = ohdr->u.imm_data; + } + hdrsize += 4; + wc.wc_flags = IB_WC_WITH_IMM; + /* FALLTHROUGH */ + case IB_OPCODE_RC_SEND_LAST: + case IB_OPCODE_RC_RDMA_WRITE_LAST: + send_last: + /* Get the number of bytes the message was padded by. */ + pad = (ohdr->bth[0] >> 12) & 3; + /* Check for invalid length. */ + /* XXX LAST len should be >= 1 */ + if (unlikely(tlen < (hdrsize + pad + 4))) { + goto nack_inv; + } + /* Don't count the CRC. 
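+ *
+ * (Editor's note: the payload is what remains after the headers, the
+ * pad bytes, and the 4-byte ICRC; e.g. a SEND_LAST with no GRH and
+ * 2 pad bytes carries
+ *
+ *	payload = tlen - (8 + 12) - 2 - 4;	// LRH + BTH, pad, ICRC
+ * )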
*/ + tlen -= (hdrsize + pad + 4); + wc.byte_len = tlen + qp->r_rcv_len; + if (unlikely(wc.byte_len > qp->r_len)) { + goto nack_inv; + } + /* XXX Need to free SGEs */ + copy_sge(&qp->r_sge, data, tlen); + atomic_inc(&qp->msn); + if (opcode == IB_OPCODE_RC_RDMA_WRITE_LAST || + opcode == IB_OPCODE_RC_RDMA_WRITE_ONLY) + break; + wc.wr_id = qp->r_wr_id; + wc.status = IB_WC_SUCCESS; + wc.opcode = IB_WC_RECV; + wc.vendor_err = 0; + wc.qp_num = qp->ibqp.qp_num; + wc.src_qp = qp->remote_qpn; + wc.pkey_index = 0; + wc.slid = qp->remote_ah_attr.dlid; + wc.sl = qp->remote_ah_attr.sl; + wc.dlid_path_bits = 0; + wc.port_num = 0; + /* Signal completion event if the solicited bit is set. */ + ipath_cq_enter(to_icq(qp->ibqp.recv_cq), &wc, + ohdr->bth[0] & __constant_cpu_to_be32(1 << 23)); + break; + + case IB_OPCODE_RC_RDMA_WRITE_FIRST: + case IB_OPCODE_RC_RDMA_WRITE_ONLY: + case IB_OPCODE_RC_RDMA_WRITE_ONLY_WITH_IMMEDIATE: + /* consume RWQE */ + /* RETH comes after BTH */ + if (!has_grh) + reth = &ohdr->u.rc.reth; + else { + reth = (struct ib_reth *)data; + data += sizeof(*reth); + } + hdrsize += sizeof(*reth); + qp->r_len = be32_to_cpu(reth->length); + qp->r_rcv_len = 0; + if (qp->r_len != 0) { + u32 rkey = be32_to_cpu(reth->rkey); + u64 vaddr = be64_to_cpu(reth->vaddr); + + /* Check rkey & NAK */ + if (unlikely(!ipath_rkey_ok(dev, &qp->r_sge, qp->r_len, + vaddr, rkey, + IB_ACCESS_REMOTE_WRITE))) { + nack_acc: + /* + * A NAK will ACK earlier sends and RDMA + * writes. + * Don't queue the NAK if a RDMA read, + * atomic, or NAK is pending though. + */ + spin_lock(&qp->s_lock); + if (qp->s_ack_state >= + IB_OPCODE_RC_RDMA_READ_REQUEST && + qp->s_ack_state != IB_OPCODE_ACKNOWLEDGE) { + spin_unlock(&qp->s_lock); + goto done; + } + /* XXX Flush WQEs */ + qp->state = IB_QPS_ERR; + qp->s_ack_state = IB_OPCODE_RC_RDMA_WRITE_ONLY; + qp->s_nak_state = IB_NAK_REMOTE_ACCESS_ERROR; + qp->s_ack_psn = qp->r_psn; + goto resched; + } + } else { + qp->r_sge.sg_list = NULL; + qp->r_sge.sge.mr = NULL; + qp->r_sge.sge.vaddr = NULL; + qp->r_sge.sge.length = 0; + qp->r_sge.sge.sge_length = 0; + } + if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_WRITE))) + goto nack_acc; + if (opcode == IB_OPCODE_RC_RDMA_WRITE_FIRST) + goto send_middle; + else if (opcode == IB_OPCODE_RC_RDMA_WRITE_ONLY) + goto send_last; + if (!get_rwqe(qp, 1)) + goto rnr_nak; + goto send_last_imm; + + case IB_OPCODE_RC_RDMA_READ_REQUEST: + /* RETH comes after BTH */ + if (!has_grh) + reth = &ohdr->u.rc.reth; + else { + reth = (struct ib_reth *)data; + data += sizeof(*reth); + } + spin_lock(&qp->s_lock); + if (qp->s_ack_state != IB_OPCODE_RC_ACKNOWLEDGE && + qp->s_ack_state >= IB_OPCODE_RDMA_READ_REQUEST) { + spin_unlock(&qp->s_lock); + goto done; + } + qp->s_rdma_len = be32_to_cpu(reth->length); + if (qp->s_rdma_len != 0) { + u32 rkey = be32_to_cpu(reth->rkey); + u64 vaddr = be64_to_cpu(reth->vaddr); + + /* Check rkey & NAK */ + if (unlikely(!ipath_rkey_ok(dev, &qp->s_rdma_sge, + qp->s_rdma_len, + vaddr, rkey, + IB_ACCESS_REMOTE_READ))) { + spin_unlock(&qp->s_lock); + goto nack_acc; + } + /* + * Update the next expected PSN. + * We add 1 later below, so only add the remainder here. 
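+ *
+ * (Editor's note: an N-byte RDMA READ produces ceil(N / pmtu)
+ * response packets, one PSN each.  With the increment applied further
+ * down, the adjustment below uses
+ *
+ *	(N - 1) / pmtu == ceil(N / pmtu) - 1	// for N > 0
+ *
+ * e.g. N = 8192, pmtu = 2048 advances r_psn by 4 in total.)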
+ */ + if (qp->s_rdma_len > pmtu) + qp->r_psn += (qp->s_rdma_len - 1) / pmtu; + } else { + qp->s_rdma_sge.sg_list = NULL; + qp->s_rdma_sge.num_sge = 0; + qp->s_rdma_sge.sge.mr = NULL; + qp->s_rdma_sge.sge.vaddr = NULL; + qp->s_rdma_sge.sge.length = 0; + qp->s_rdma_sge.sge.sge_length = 0; + } + if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_READ))) + goto nack_acc; + /* + * We need to increment the MSN here instead of when we + * finish sending the result since a duplicate request would + * increment it more than once. + */ + atomic_inc(&qp->msn); + qp->s_ack_state = opcode; + qp->s_nak_state = 0; + qp->s_ack_psn = psn; + qp->r_psn++; + qp->r_state = opcode; + goto rdmadone; + + case IB_OPCODE_RC_COMPARE_SWAP: + case IB_OPCODE_RC_FETCH_ADD:{ + struct ib_atomic_eth *ateth; + u64 vaddr; + u64 sdata; + u32 rkey; + + if (!has_grh) + ateth = &ohdr->u.atomic_eth; + else { + ateth = (struct ib_atomic_eth *)data; + data += sizeof(*ateth); + } + vaddr = be64_to_cpu(ateth->vaddr); + if (unlikely(vaddr & 0x7)) + goto nack_inv; + rkey = be32_to_cpu(ateth->rkey); + /* Check rkey & NAK */ + if (unlikely(!ipath_rkey_ok(dev, &qp->r_sge, + sizeof(u64), vaddr, rkey, + IB_ACCESS_REMOTE_ATOMIC))) { + goto nack_acc; + } + if (unlikely(!(qp->qp_access_flags & + IB_ACCESS_REMOTE_ATOMIC))) + goto nack_acc; + /* Perform atomic OP and save result. */ + sdata = be64_to_cpu(ateth->swap_data); + spin_lock(&dev->pending_lock); + qp->r_atomic_data = *(u64 *) qp->r_sge.sge.vaddr; + if (opcode == IB_OPCODE_RC_FETCH_ADD) { + *(u64 *) qp->r_sge.sge.vaddr = + qp->r_atomic_data + sdata; + } else if (qp->r_atomic_data == + be64_to_cpu(ateth->compare_data)) { + *(u64 *) qp->r_sge.sge.vaddr = sdata; + } + spin_unlock(&dev->pending_lock); + atomic_inc(&qp->msn); + qp->r_atomic_psn = psn & 0xFFFFFF; + psn |= 1 << 31; + break; + } + + default: + /* Drop packet for unknown opcodes. */ + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + return; + } + qp->r_psn++; + qp->r_state = opcode; + /* Send an ACK if requested or required. */ + if (psn & (1 << 31)) { + /* + * Coalesce ACKs unless there is a RDMA READ or + * ATOMIC pending. + */ + spin_lock(&qp->s_lock); + if (qp->s_ack_state == IB_OPCODE_RC_ACKNOWLEDGE || + qp->s_ack_state < IB_OPCODE_RDMA_READ_REQUEST) { + qp->s_ack_state = opcode; + qp->s_nak_state = 0; + qp->s_ack_psn = psn; + qp->s_ack_atomic = qp->r_atomic_data; + goto resched; + } + spin_unlock(&qp->s_lock); + } +done: + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + return; + +resched: + /* Try to send ACK right away but not if do_rc_send() is active. */ + if (qp->s_hdrwords == 0 && + (qp->s_ack_state < IB_OPCODE_RDMA_READ_REQUEST || + qp->s_ack_state >= IB_OPCODE_COMPARE_SWAP)) + send_rc_ack(qp); + +rdmadone: + spin_unlock(&qp->s_lock); + spin_unlock_irqrestore(&qp->r_rq.lock, flags); + + /* Call do_rc_send() in another thread. */ + tasklet_schedule(&qp->s_task); +} From bos at pathscale.com Wed Dec 28 16:31:37 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 28 Dec 2005 16:31:37 -0800 Subject: [openib-general] [PATCH 18 of 20] ipath - infiniband management datagram support In-Reply-To: Message-ID: Signed-off-by: Bryan O'Sullivan diff -r 584777b6f4dc -r e7cabc7a2e78 drivers/infiniband/hw/ipath/ipath_mad.c --- /dev/null Thu Jan 1 00:00:00 1970 +0000 +++ b/drivers/infiniband/hw/ipath/ipath_mad.c Wed Dec 28 14:19:43 2005 -0800 @@ -0,0 +1,1144 @@ +/* + * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Patent licenses, if any, provided herein do not apply to + * combinations of this program with other software, or any other + * product whatsoever. + */ + +#include +#include + +#include "ips_common.h" +#include "ipath_verbs.h" +#include "ipath_layer.h" + + +#define IB_SMP_INVALID_FIELD __constant_htons(0x001C) + +static int reply(struct ib_smp *smp, int line) +{ + + /* + * The verbs framework will handle the directed/LID route + * packet changes. + */ + smp->method = IB_MGMT_METHOD_GET_RESP; + if (smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) + smp->status |= IB_SMP_DIRECTION; + return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY; +} + +static inline int recv_subn_get_nodedescription(struct ib_smp *smp) +{ + + strncpy(smp->data, "Infinipath", sizeof(smp->data)); + + return reply(smp, __LINE__); +} + +struct nodeinfo { + u8 base_version; + u8 class_version; + u8 node_type; + u8 num_ports; + __be64 sys_guid; + __be64 node_guid; + __be64 port_guid; + __be16 partition_cap; + __be16 device_id; + __be32 revision; + u8 local_port_num; + u8 vendor_id[3]; +} __attribute__ ((packed)); + +/* + * XXX The num_ports value will need a layer function to get the value + * if we ever have more than one IB port on a chip. + * We will also need to get the GUID for the port. 
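+ *
+ * (Editor's note: struct nodeinfo above is the packed wire layout of
+ * the NodeInfo attribute, so every multi-byte field is stored
+ * big-endian; vendor_id[] holds the 24-bit OUI most significant byte
+ * first, and the code below assumes the OUI fits in the low 16 bits,
+ * leaving vendor_id[0] zero.)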
+ */ +static inline int recv_subn_get_nodeinfo(struct ib_smp *smp, + struct ib_device *ibdev, u8 port) +{ + struct nodeinfo *nip = (struct nodeinfo *)&smp->data; + ipath_type t = to_idev(ibdev)->ib_unit; + uint32_t vendor, boardid, majrev, minrev; + + nip->base_version = 1; + nip->class_version = 1; + nip->node_type = 1; /* channel adapter */ + nip->num_ports = 1; + /* This is already in network order */ + nip->sys_guid = to_idev(ibdev)->sys_image_guid; + nip->node_guid = ipath_layer_get_guid(t); + nip->port_guid = nip->sys_guid; + nip->partition_cap = cpu_to_be16(ipath_layer_get_npkeys(t)); + nip->device_id = cpu_to_be16(ipath_layer_get_deviceid(t)); + ipath_layer_query_device(t, &vendor, &boardid, &majrev, &minrev); + nip->revision = cpu_to_be32((majrev << 16) | minrev); + nip->local_port_num = port; + nip->vendor_id[0] = 0; + nip->vendor_id[1] = vendor >> 8; + nip->vendor_id[2] = vendor; + + return reply(smp, __LINE__); +} + +static int recv_subn_get_guidinfo(struct ib_smp *smp, struct ib_device *ibdev) +{ + uint32_t t = to_idev(ibdev)->ib_unit; + u32 startgx = 8 * be32_to_cpu(smp->attr_mod); + u64 *p = (u64 *) smp->data; + + /* 32 blocks of 8 64-bit GUIDs per block */ + + memset(smp->data, 0, sizeof(smp->data)); + + /* + * We only support one GUID for now. + * If this changes, the portinfo.guid_cap field needs to be updated too. + */ + if (startgx == 0) { + /* The first is a copy of the read-only HW GUID. */ + *p = ipath_layer_get_guid(t); + } + + return reply(smp, __LINE__); +} + +struct port_info { + __be64 mkey; + __be64 gid_prefix; + __be16 lid; + __be16 sm_lid; + __be32 cap_mask; + __be16 diag_code; + __be16 mkey_lease_period; + u8 local_port_num; + u8 link_width_enabled; + u8 link_width_supported; + u8 link_width_active; + u8 linkspeed_portstate; /* 4 bits, 4 bits */ + u8 portphysstate_linkdown; /* 4 bits, 4 bits */ + u8 mkeyprot_resv_lmc; /* 2 bits, 3 bits, 3 bits */ + u8 linkspeedactive_enabled; /* 4 bits, 4 bits */ + u8 neighbormtu_mastersmsl; /* 4 bits, 4 bits */ + u8 vlcap_inittype; /* 4 bits, 4 bits */ + u8 vl_high_limit; + u8 vl_arb_high_cap; + u8 vl_arb_low_cap; + u8 inittypereply_mtucap; /* 4 bits, 4 bits */ + u8 vlstallcnt_hoqlife; /* 3 bits, 5 bits */ + u8 operationalvl_pei_peo_fpi_fpo; /* 4 bits, 1, 1, 1, 1 */ + __be16 mkey_violations; + __be16 pkey_violations; + __be16 qkey_violations; + u8 guid_cap; + u8 clientrereg_resv_subnetto; /* 1 bit, 2 bits, 5 bits */ + u8 resv_resptimevalue; /* 3 bits, 5 bits */ + u8 localphyerrors_overrunerrors; /* 4 bits, 4 bits */ + __be16 max_credit_hint; + u8 resv; + u8 link_roundtrip_latency[3]; +} __attribute__ ((packed)); + +static int recv_subn_get_portinfo(struct ib_smp *smp, struct ib_device *ibdev, + u8 port) +{ + u32 lportnum = be32_to_cpu(smp->attr_mod); + struct ipath_ibdev *dev; + struct port_info *pip = (struct port_info *)smp->data; + u32 tmp, tmp2; + + if (lportnum == 0) { + lportnum = port; + smp->attr_mod = cpu_to_be32(lportnum); + } + + if (lportnum < 1 || lportnum > ibdev->phys_port_cnt) + return IB_MAD_RESULT_FAILURE; + + dev = to_idev(ibdev); + + /* Clear all fields. Only set the non-zero fields. */ + memset(smp->data, 0, sizeof(smp->data)); + + /* Only return the mkey if the protection field allows it. */ + if ((dev->mkeyprot_resv_lmc >> 6) == 0) + pip->mkey = dev->mkey; + else + pip->mkey = 0; + pip->gid_prefix = dev->gid_prefix; + tmp = ipath_layer_get_lid(dev->ib_unit); + pip->lid = tmp ? 
cpu_to_be16(tmp) : IB_LID_PERMISSIVE; + pip->sm_lid = cpu_to_be16(dev->sm_lid); + pip->cap_mask = cpu_to_be32(dev->port_cap_flags); + /* pip->diag_code; */ + pip->mkey_lease_period = cpu_to_be16(dev->mkey_lease_period); + pip->local_port_num = port; + pip->link_width_enabled = 2; /* 4x */ + pip->link_width_supported = 3; /* 1x or 4x */ + pip->link_width_active = 2; /* 4x */ + pip->linkspeed_portstate = 0x10; /* 2.5Gbps */ + tmp = ipath_layer_get_lastibcstat(dev->ib_unit) & 0xff; + tmp2 = 5; /* link up */ + if (tmp == 0x11) + pip->linkspeed_portstate |= 2; /* initialize */ + else if (tmp == 0x21) + pip->linkspeed_portstate |= 3; /* armed */ + else if (tmp == 0x31) + pip->linkspeed_portstate |= 4; /* active */ + else { + pip->linkspeed_portstate |= 1; /* down */ + tmp2 = tmp & 0xf; + } + pip->portphysstate_linkdown = (tmp2 << 4) | + (ipath_layer_get_linkdowndefaultstate(dev->ib_unit) ? 1 : 2); + pip->mkeyprot_resv_lmc = dev->mkeyprot_resv_lmc; + pip->linkspeedactive_enabled = 0x11; /* 2.5Gbps, 2.5Gbps */ + switch (ipath_layer_get_ibmtu(dev->ib_unit)) { + case 4096: + tmp = IB_MTU_4096; + break; + case 2048: + tmp = IB_MTU_2048; + break; + case 1024: + tmp = IB_MTU_1024; + break; + case 512: + tmp = IB_MTU_512; + break; + case 256: + tmp = IB_MTU_256; + break; + default: /* oops, something is wrong */ + tmp = IB_MTU_2048; + break; + } + pip->neighbormtu_mastersmsl = (tmp << 4) | dev->sm_sl; + pip->vlcap_inittype = 0x10; /* VLCap = VL0, InitType = 0 */ + /* pip->vl_high_limit; // only one VL */ + /* pip->vl_arb_high_cap; // only one VL */ + /* pip->vl_arb_low_cap; // only one VL */ + pip->inittypereply_mtucap = IB_MTU_4096; /* InitTypeReply = 0 */ + /* pip->vlstallcnt_hoqlife; // HCAs ignore VLStallCount and HOQLife */ + pip->operationalvl_pei_peo_fpi_fpo = 0x10; /* OVLs = 1 */ + pip->mkey_violations = cpu_to_be16(dev->mkey_violations); + /* P_KeyViolations are counted by hardware. */ + tmp = ipath_layer_get_cr_errpkey(dev->ib_unit) & 0xFFFF; + pip->pkey_violations = cpu_to_be16(tmp); + pip->qkey_violations = cpu_to_be16(dev->qkey_violations); + /* Only the hardware GUID is supported for now */ + pip->guid_cap = 1; + pip->clientrereg_resv_subnetto = dev->subnet_timeout; + /* 32.768 usec. response time (guessing) */ + pip->resv_resptimevalue = 3; + pip->localphyerrors_overrunerrors = + (ipath_layer_get_phyerrthreshold(dev->ib_unit) << 4) | + ipath_layer_get_overrunthreshold(dev->ib_unit); + /* pip->max_credit_hint; */ + /* pip->link_roundtrip_latency[3]; */ + + return reply(smp, __LINE__); +} + +static int recv_subn_get_pkeytable(struct ib_smp *smp, struct ib_device *ibdev) +{ + u32 startpx = 32 * (be32_to_cpu(smp->attr_mod) & 0xffff); + u16 *p = (u16 *) smp->data; + + /* 64 blocks of 32 16-bit P_Key entries */ + + memset(smp->data, 0, sizeof(smp->data)); + if (startpx == 0) + ipath_layer_get_pkeys(to_idev(ibdev)->ib_unit, p); + else + smp->status |= IB_SMP_INVALID_FIELD; + + return reply(smp, __LINE__); +} + +static inline int recv_subn_set_guidinfo(struct ib_smp *smp, + struct ib_device *ibdev) +{ + /* The only GUID we support is the first read-only entry. 
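The MTU translation that recv_subn_get_portinfo() open-codes above (and that recv_subn_set_portinfo() later inverts) is a fixed five-entry mapping; a table-driven equivalent, with illustrative names that are not in the patch, could look like this:

/* Sketch only: byte count <-> IB_MTU_* mapping used by the two
 * portinfo switch statements. */
static const struct {
	int mtu_enum;
	u32 bytes;
} ib_mtu_map[] = {
	{ IB_MTU_256, 256 },
	{ IB_MTU_512, 512 },
	{ IB_MTU_1024, 1024 },
	{ IB_MTU_2048, 2048 },
	{ IB_MTU_4096, 4096 },
};

static int ib_mtu_from_bytes(u32 bytes)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(ib_mtu_map); i++)
		if (ib_mtu_map[i].bytes == bytes)
			return ib_mtu_map[i].mtu_enum;
	return IB_MTU_2048;	/* same fallback the Get handler uses */
}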
*/ + return recv_subn_get_guidinfo(smp, ibdev); +} + +static inline int recv_subn_set_portinfo(struct ib_smp *smp, + struct ib_device *ibdev, u8 port) +{ + struct port_info *pip = (struct port_info *)smp->data; + uint32_t lportnum = be32_to_cpu(smp->attr_mod); + struct ib_event event; + struct ipath_ibdev *dev; + uint32_t flags; + char clientrereg = 0; + u32 tmp; + u32 tmp2; + int ret; + + if (lportnum == 0) { + lportnum = port; + smp->attr_mod = cpu_to_be32(lportnum); + } + + if (lportnum < 1 || lportnum > ibdev->phys_port_cnt) + return IB_MAD_RESULT_FAILURE; + + dev = to_idev(ibdev); + event.device = ibdev; + event.element.port_num = port; + + if (dev->mkey != pip->mkey) + dev->mkey = pip->mkey; + + if (pip->gid_prefix != dev->gid_prefix) + dev->gid_prefix = pip->gid_prefix; + + tmp = be16_to_cpu(pip->lid); + if (tmp != ipath_layer_get_lid(dev->ib_unit)) { + ipath_set_sps_lid(dev->ib_unit, tmp); + event.event = IB_EVENT_LID_CHANGE; + ib_dispatch_event(&event); + } + + tmp = be16_to_cpu(pip->sm_lid); + if (tmp != dev->sm_lid) { + dev->sm_lid = tmp; + event.event = IB_EVENT_SM_CHANGE; + ib_dispatch_event(&event); + } + + dev->mkey_lease_period = be16_to_cpu(pip->mkey_lease_period); + +#if 0 + tmp = pip->link_width_enabled; + if (tmp && (tmp != lpp->linkwidthenabled)) { + lpp->linkwidthenabled = tmp; + /* JAG - notify driver here */ + } +#endif + + tmp = pip->portphysstate_linkdown & 0xF; + if (tmp == 1) { + /* SLEEP */ + if (ipath_layer_set_linkdowndefaultstate(dev->ib_unit, 1)) + return IB_MAD_RESULT_FAILURE; + } else if (tmp == 2) { + /* POLL */ + if (ipath_layer_set_linkdowndefaultstate(dev->ib_unit, 0)) + return IB_MAD_RESULT_FAILURE; + } else if (tmp) + return IB_MAD_RESULT_FAILURE; + + dev->mkeyprot_resv_lmc = pip->mkeyprot_resv_lmc; + +#if 0 + tmp = BF_GET(g.madp, iba_Subn_PortInfo, FIELD_LinkSpeedEnabled); + if (tmp && (tmp != lpp->linkspeedenabled)) { + lpp->linkspeedenabled = tmp; + /* JAG - notify driver here */ + } +#endif + + switch ((pip->neighbormtu_mastersmsl >> 4) & 0xF) { + case IB_MTU_256: + tmp = 256; + break; + case IB_MTU_512: + tmp = 512; + break; + case IB_MTU_1024: + tmp = 1024; + break; + case IB_MTU_2048: + tmp = 2048; + break; + case IB_MTU_4096: + tmp = 4096; + break; + default: + /* XXX We have already partially updated our state! */ + return IB_MAD_RESULT_FAILURE; + } + ipath_kset_mtu(dev->ib_unit << 16 | tmp); + + dev->sm_sl = pip->neighbormtu_mastersmsl & 0xF; + +#if 0 + tmp = BF_GET(g.madp, iba_Subn_PortInfo, FIELD_VLHighLimit); + if (tmp != lpp->vlhighlimit) { + lpp->vlhighlimit = tmp; + /* JAG - notify driver here */ + } + + lpp->inittypereply = + BF_GET(g.madp, iba_Subn_PortInfo, FIELD_InitTypeReply); + + tmp = BF_GET(g.madp, iba_Subn_PortInfo, FIELD_OperationalVLs); + if (tmp && (tmp != lpp->operationalvls)) { + lpp->operationalvls = tmp; + /* JAG - notify driver here */ + } +#endif + + if (pip->mkey_violations != 0) + dev->mkey_violations = 0; +#if 0 + /* XXX Hardware counter can't be reset. 
*/ + if (pip->pkey_violations != 0) + dev->pkey_violations = 0; +#endif + + if (pip->qkey_violations != 0) + dev->qkey_violations = 0; + + tmp = (pip->localphyerrors_overrunerrors >> 4) & 0xF; + if (ipath_layer_set_phyerrthreshold(dev->ib_unit, tmp)) + return IB_MAD_RESULT_FAILURE; + + tmp = pip->localphyerrors_overrunerrors & 0xF; + if (ipath_layer_set_overrunthreshold(dev->ib_unit, tmp)) + return IB_MAD_RESULT_FAILURE; + + dev->subnet_timeout = pip->clientrereg_resv_subnetto & 0x1F; + + if (pip->clientrereg_resv_subnetto & 0x80) { + clientrereg = 1; + event.event = IB_EVENT_LID_CHANGE; + ib_dispatch_event(&event); + } + + /* + * Do the port state change now that the other link parameters + * have been set. + * Changing the port physical state only makes sense if the link + * is down or is being set to down. + */ + tmp = pip->linkspeed_portstate & 0xF; + flags = ipath_layer_get_flags(dev->ib_unit); + tmp2 = (pip->portphysstate_linkdown >> 4) & 0xF; + if (tmp2) { + if (tmp != IB_PORT_DOWN && !(flags & IPATH_LINKDOWN)) + return IB_MAD_RESULT_FAILURE; + tmp = IB_PORT_DOWN; + tmp2 = IB_PORT_NOP; + } else if (flags & IPATH_LINKDOWN) + tmp2 = IB_PORT_DOWN; + else if (flags & IPATH_LINKINIT) + tmp2 = IB_PORT_INIT; + else if (flags & IPATH_LINKARMED) + tmp2 = IB_PORT_ARMED; + else if (flags & IPATH_LINKACTIVE) + tmp2 = IB_PORT_ACTIVE; + else + tmp2 = IB_PORT_NOP; + + if (tmp && tmp != tmp2) { + switch (tmp) { + case IB_PORT_DOWN: + tmp = (pip->portphysstate_linkdown >> 4) & 0xF; + if (tmp <= 1) + tmp = IPATH_IB_LINKDOWN; + else if (tmp == 2) + tmp = IPATH_IB_LINKDOWN_POLL; + else if (tmp == 3) + tmp = IPATH_IB_LINKDOWN_DISABLE; + else + return IB_MAD_RESULT_FAILURE; + ipath_kset_linkstate(dev->ib_unit << 16 | tmp); + if (tmp2 == IB_PORT_ACTIVE) { + event.event = IB_EVENT_PORT_ERR; + ib_dispatch_event(&event); + } + break; + + case IB_PORT_INIT: + ipath_kset_linkstate(dev->ib_unit << 16 | + IPATH_IB_LINKINIT); + if (tmp2 == IB_PORT_ACTIVE) { + event.event = IB_EVENT_PORT_ERR; + ib_dispatch_event(&event); + } + break; + + case IB_PORT_ARMED: + ipath_kset_linkstate(dev->ib_unit << 16 | + IPATH_IB_LINKARM); + if (tmp2 == IB_PORT_ACTIVE) { + event.event = IB_EVENT_PORT_ERR; + ib_dispatch_event(&event); + } + break; + + case IB_PORT_ACTIVE: + ipath_kset_linkstate(dev->ib_unit << 16 | + IPATH_IB_LINKACTIVE); + event.event = IB_EVENT_PORT_ACTIVE; + ib_dispatch_event(&event); + break; + + default: + /* XXX We have already partially updated our state! 
*/ + return IB_MAD_RESULT_FAILURE; + } + } + + ret = recv_subn_get_portinfo(smp, ibdev, port); + + if (clientrereg) + pip->clientrereg_resv_subnetto |= 0x80; + + return ret; +} + +static inline int recv_subn_set_pkeytable(struct ib_smp *smp, + struct ib_device *ibdev) +{ + u32 startpx = 32 * (be32_to_cpu(smp->attr_mod) & 0xffff); + u16 *p = (u16 *) smp->data; + + if (startpx != 0 || + ipath_layer_set_pkeys(to_idev(ibdev)->ib_unit, p) != 0) + smp->status |= IB_SMP_INVALID_FIELD; + + return recv_subn_get_pkeytable(smp, ibdev); +} + +#define IB_PMA_CLASS_PORT_INFO __constant_htons(0x0001) +#define IB_PMA_PORT_SAMPLES_CONTROL __constant_htons(0x0010) +#define IB_PMA_PORT_SAMPLES_RESULT __constant_htons(0x0011) +#define IB_PMA_PORT_COUNTERS __constant_htons(0x0012) +#define IB_PMA_PORT_COUNTERS_EXT __constant_htons(0x001D) +#define IB_PMA_PORT_SAMPLES_RESULT_EXT __constant_htons(0x001E) + +struct ib_perf { + u8 base_version; + u8 mgmt_class; + u8 class_version; + u8 method; + __be16 status; + __be16 unused; + __be64 tid; + __be16 attr_id; + __be16 resv; + __be32 attr_mod; + u8 reserved[40]; + u8 data[192]; +} __attribute__ ((packed)); + +struct ib_pma_classportinfo { + u8 base_version; + u8 class_version; + __be16 cap_mask; + u8 reserved[3]; + u8 resp_time_value; /* only lower 5 bits */ + union ib_gid redirect_gid; + __be32 redirect_tc_sl_fl; /* 8, 4, 20 bits respectively */ + __be16 redirect_lid; + __be16 redirect_pkey; + __be32 redirect_qp; /* only lower 24 bits */ + __be32 redirect_qkey; + union ib_gid trap_gid; + __be32 trap_tc_sl_fl; /* 8, 4, 20 bits respectively */ + __be16 trap_lid; + __be16 trap_pkey; + __be32 trap_hl_qp; /* 8, 24 bits respectively */ + __be32 trap_qkey; +} __attribute__ ((packed)); + +struct ib_pma_portsamplescontrol { + u8 opcode; + u8 port_select; + u8 tick; + u8 counter_width; /* only lower 3 bits */ + __be32 counter_mask0_9; /* 2, 10 * 3, bits */ + __be16 counter_mask10_14; /* 1, 5 * 3, bits */ + u8 sample_mechanisms; + u8 sample_status; /* only lower 2 bits */ + __be64 option_mask; + __be64 vendor_mask; + __be32 sample_start; + __be32 sample_interval; + __be16 tag; + __be16 counter_select[15]; +} __attribute__ ((packed)); + +struct ib_pma_portsamplesresult { + __be16 tag; + __be16 sample_status; /* only lower 2 bits */ + __be32 counter[15]; +} __attribute__ ((packed)); + +struct ib_pma_portsamplesresult_ext { + __be16 tag; + __be16 sample_status; /* only lower 2 bits */ + __be32 extended_width; /* only upper 2 bits */ + __be64 counter[15]; +} __attribute__ ((packed)); + +struct ib_pma_portcounters { + u8 reserved; + u8 port_select; + __be16 counter_select; + __be16 symbol_error_counter; + u8 link_error_recovery_counter; + u8 link_downed_counter; + __be16 port_rcv_errors; + __be16 port_rcv_remphys_errors; + __be16 port_rcv_switch_relay_errors; + __be16 port_xmit_discards; + u8 port_xmit_constraint_errors; + u8 port_rcv_constraint_errors; + u8 reserved1; + u8 lli_ebor_errors; /* 4, 4, bits */ + __be16 reserved2; + __be16 vl15_dropped; + __be32 port_xmit_data; + __be32 port_rcv_data; + __be32 port_xmit_packets; + __be32 port_rcv_packets; +} __attribute__ ((packed)); + +#define IB_PMA_SEL_SYMBOL_ERROR __constant_htons(0x0001) +#define IB_PMA_SEL_LINK_ERROR_RECOVERY __constant_htons(0x0002) +#define IB_PMA_SEL_LINK_DOWNED __constant_htons(0x0004) +#define IB_PMA_SEL_PORT_RCV_ERRORS __constant_htons(0x0008) +#define IB_PMA_SEL_PORT_RCV_REMPHYS_ERRORS __constant_htons(0x0010) +#define IB_PMA_SEL_PORT_XMIT_DISCARDS __constant_htons(0x0040) +#define 
IB_PMA_SEL_PORT_XMIT_DATA __constant_htons(0x1000) +#define IB_PMA_SEL_PORT_RCV_DATA __constant_htons(0x2000) +#define IB_PMA_SEL_PORT_XMIT_PACKETS __constant_htons(0x4000) +#define IB_PMA_SEL_PORT_RCV_PACKETS __constant_htons(0x8000) + +struct ib_pma_portcounters_ext { + u8 reserved; + u8 port_select; + __be16 counter_select; + __be32 reserved1; + __be64 port_xmit_data; + __be64 port_rcv_data; + __be64 port_xmit_packets; + __be64 port_rcv_packets; + __be64 port_unicast_xmit_packets; + __be64 port_unicast_rcv_packets; + __be64 port_multicast_xmit_packets; + __be64 port_multicast_rcv_packets; +} __attribute__ ((packed)); + +#define IB_PMA_SELX_PORT_XMIT_DATA __constant_htons(0x0001) +#define IB_PMA_SELX_PORT_RCV_DATA __constant_htons(0x0002) +#define IB_PMA_SELX_PORT_XMIT_PACKETS __constant_htons(0x0004) +#define IB_PMA_SELX_PORT_RCV_PACKETS __constant_htons(0x0008) +#define IB_PMA_SELX_PORT_UNI_XMIT_PACKETS __constant_htons(0x0010) +#define IB_PMA_SELX_PORT_UNI_RCV_PACKETS __constant_htons(0x0020) +#define IB_PMA_SELX_PORT_MULTI_XMIT_PACKETS __constant_htons(0x0040) +#define IB_PMA_SELX_PORT_MULTI_RCV_PACKETS __constant_htons(0x0080) + +static int recv_pma_get_classportinfo(struct ib_perf *pmp) +{ + /* + struct ib_pma_classportinfo *p = + (struct ib_pma_classportinfo *)pmp->data; + */ + + memset(pmp->data, 0, sizeof(pmp->data)); + + return reply((struct ib_smp *)pmp, __LINE__); +} + +static int recv_pma_get_portsamplescontrol(struct ib_perf *pmp, + struct ib_device *ibdev, u8 port) +{ + struct ib_pma_portsamplescontrol *p = + (struct ib_pma_portsamplescontrol *)pmp->data; + struct ipath_ibdev *dev = to_idev(ibdev); + unsigned long flags; + + memset(pmp->data, 0, sizeof(pmp->data)); + + p->port_select = port; + p->tick = 0xFA; /* 1 ms. */ + p->counter_width = 4; /* 32 bit counters */ + p->counter_mask0_9 = __constant_htonl(0x09248000); /* counters 0-4 */ + spin_lock_irqsave(&dev->pending_lock, flags); + p->sample_status = dev->pma_sample_status; + p->sample_start = cpu_to_be32(dev->pma_sample_start); + p->sample_interval = cpu_to_be32(dev->pma_sample_interval); + p->tag = cpu_to_be16(dev->pma_tag); + p->counter_select[0] = dev->pma_counter_select[0]; + p->counter_select[1] = dev->pma_counter_select[1]; + p->counter_select[2] = dev->pma_counter_select[2]; + p->counter_select[3] = dev->pma_counter_select[3]; + p->counter_select[4] = dev->pma_counter_select[4]; + spin_unlock_irqrestore(&dev->pending_lock, flags); + + return reply((struct ib_smp *)pmp, __LINE__); +} + +static int recv_pma_set_portsamplescontrol(struct ib_perf *pmp, + struct ib_device *ibdev, u8 port) +{ + struct ib_pma_portsamplescontrol *p = + (struct ib_pma_portsamplescontrol *)pmp->data; + struct ipath_ibdev *dev = to_idev(ibdev); + unsigned long flags; + u32 start = be32_to_cpu(p->sample_start); + + if (pmp->attr_mod == 0 && p->port_select == port && start != 0) { + spin_lock_irqsave(&dev->pending_lock, flags); + if (dev->pma_sample_status == IB_PMA_SAMPLE_STATUS_DONE) { + dev->pma_sample_status = IB_PMA_SAMPLE_STATUS_STARTED; + dev->pma_sample_start = start; + dev->pma_sample_interval = + be32_to_cpu(p->sample_interval); + dev->pma_tag = be16_to_cpu(p->tag); + if (p->counter_select[0]) + dev->pma_counter_select[0] = + p->counter_select[0]; + if (p->counter_select[1]) + dev->pma_counter_select[1] = + p->counter_select[1]; + if (p->counter_select[2]) + dev->pma_counter_select[2] = + p->counter_select[2]; + if (p->counter_select[3]) + dev->pma_counter_select[3] = + p->counter_select[3]; + if (p->counter_select[4]) + 
dev->pma_counter_select[4] = + p->counter_select[4]; + } + spin_unlock_irqrestore(&dev->pending_lock, flags); + } + return recv_pma_get_portsamplescontrol(pmp, ibdev, port); +} + +static u64 get_counter(struct ipath_ibdev *dev, __be16 sel) +{ + switch (sel) { + case IB_PMA_PORT_XMIT_DATA: + return dev->ipath_sword; + case IB_PMA_PORT_RCV_DATA: + return dev->ipath_rword; + case IB_PMA_PORT_XMIT_PKTS: + return dev->ipath_spkts; + case IB_PMA_PORT_RCV_PKTS: + return dev->ipath_rpkts; + case IB_PMA_PORT_XMIT_WAIT: + default: + return 0; + } +} + +static int recv_pma_get_portsamplesresult(struct ib_perf *pmp, + struct ib_device *ibdev) +{ + struct ib_pma_portsamplesresult *p = + (struct ib_pma_portsamplesresult *)pmp->data; + struct ipath_ibdev *dev = to_idev(ibdev); + int i; + + memset(pmp->data, 0, sizeof(pmp->data)); + p->tag = cpu_to_be16(dev->pma_tag); + p->sample_status = cpu_to_be16(dev->pma_sample_status); + for (i = 0; i < ARRAY_SIZE(dev->pma_counter_select); i++) + p->counter[i] = + cpu_to_be32(get_counter(dev, dev->pma_counter_select[i])); + + return reply((struct ib_smp *)pmp, __LINE__); +} + +static int recv_pma_get_portsamplesresult_ext(struct ib_perf *pmp, + struct ib_device *ibdev) +{ + struct ib_pma_portsamplesresult_ext *p = + (struct ib_pma_portsamplesresult_ext *)pmp->data; + struct ipath_ibdev *dev = to_idev(ibdev); + int i; + + memset(pmp->data, 0, sizeof(pmp->data)); + p->tag = cpu_to_be16(dev->pma_tag); + p->sample_status = cpu_to_be16(dev->pma_sample_status); + p->extended_width = __constant_cpu_to_be32(0x80000000); /* 64 bits; field is __be32, so be32, not be16 */ + for (i = 0; i < ARRAY_SIZE(dev->pma_counter_select); i++) + p->counter[i] = + cpu_to_be64(get_counter(dev, dev->pma_counter_select[i])); + + return reply((struct ib_smp *)pmp, __LINE__); +} + +static int recv_pma_get_portcounters(struct ib_perf *pmp, + struct ib_device *ibdev, u8 port) +{ + struct ib_pma_portcounters *p = (struct ib_pma_portcounters *)pmp->data; + struct ipath_ibdev *dev = to_idev(ibdev); + struct ipath_layer_counters cntrs; + + ipath_layer_get_counters(dev->ib_unit, &cntrs); + + /* Adjust counters for any resets done.
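One thing to note: get_counter() above switches on IB_PMA_PORT_XMIT_DATA and friends, but those #defines do not appear in the posted excerpt (only the IB_PMA_SEL_* CounterSelect masks do). Per the PMA PortSamplesControl attribute they would be the small CounterSelect codes; the values below follow that assumption and should be checked against the real header:

/* Assumed PortSamplesControl CounterSelect codes for get_counter();
 * not visible in the posted patch. */
#define IB_PMA_PORT_XMIT_DATA	__constant_htons(0x0001)
#define IB_PMA_PORT_RCV_DATA	__constant_htons(0x0002)
#define IB_PMA_PORT_XMIT_PKTS	__constant_htons(0x0003)
#define IB_PMA_PORT_RCV_PKTS	__constant_htons(0x0004)
#define IB_PMA_PORT_XMIT_WAIT	__constant_htons(0x0005)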
*/ + cntrs.symbol_error_counter -= dev->n_symbol_error_counter; + cntrs.link_error_recovery_counter -= dev->n_link_error_recovery_counter; + cntrs.link_downed_counter -= dev->n_link_downed_counter; + cntrs.port_rcv_errors -= dev->n_port_rcv_errors; + cntrs.port_rcv_remphys_errors -= dev->n_port_rcv_remphys_errors; + cntrs.port_xmit_discards -= dev->n_port_xmit_discards; + cntrs.port_xmit_data -= dev->n_port_xmit_data; + cntrs.port_rcv_data -= dev->n_port_rcv_data; + cntrs.port_xmit_packets -= dev->n_port_xmit_packets; + cntrs.port_rcv_packets -= dev->n_port_rcv_packets; + + memset(pmp->data, 0, sizeof(pmp->data)); + p->port_select = port; + if (cntrs.symbol_error_counter > 0xFFFFUL) + p->symbol_error_counter = 0xFFFF; + else + p->symbol_error_counter = + cpu_to_be16((u16)cntrs.symbol_error_counter); + if (cntrs.link_error_recovery_counter > 0xFFUL) + p->link_error_recovery_counter = 0xFF; + else + p->link_error_recovery_counter = + (u8)cntrs.link_error_recovery_counter; + if (cntrs.link_downed_counter > 0xFFUL) + p->link_downed_counter = 0xFF; + else + p->link_downed_counter = (u8)cntrs.link_downed_counter; + if (cntrs.port_rcv_errors > 0xFFFFUL) + p->port_rcv_errors = 0xFFFF; + else + p->port_rcv_errors = cpu_to_be16((u16)cntrs.port_rcv_errors); + if (cntrs.port_rcv_remphys_errors > 0xFFFFUL) + p->port_rcv_remphys_errors = 0xFFFF; + else + p->port_rcv_remphys_errors = + cpu_to_be16((u16)cntrs.port_rcv_remphys_errors); + if (cntrs.port_xmit_discards > 0xFFFFUL) + p->port_xmit_discards = 0xFFFF; + else + p->port_xmit_discards = + cpu_to_be16((u16)cntrs.port_xmit_discards); + if (cntrs.port_xmit_data > 0xFFFFFFFFUL) + p->port_xmit_data = 0xFFFFFFFF; + else + p->port_xmit_data = cpu_to_be32((u32)cntrs.port_xmit_data); + if (cntrs.port_rcv_data > 0xFFFFFFFFUL) + p->port_rcv_data = 0xFFFFFFFF; + else + p->port_rcv_data = cpu_to_be32((u32)cntrs.port_rcv_data); + if (cntrs.port_xmit_packets > 0xFFFFFFFFUL) + p->port_xmit_packets = 0xFFFFFFFF; + else + p->port_xmit_packets = + cpu_to_be32((u32)cntrs.port_xmit_packets); + if (cntrs.port_rcv_packets > 0xFFFFFFFFUL) + p->port_rcv_packets = 0xFFFFFFFF; + else + p->port_rcv_packets = cpu_to_be32((u32)cntrs.port_rcv_packets); + + return reply((struct ib_smp *)pmp, __LINE__); +} + +static int recv_pma_get_portcounters_ext(struct ib_perf *pmp, + struct ib_device *ibdev, u8 port) +{ + struct ib_pma_portcounters_ext *p = + (struct ib_pma_portcounters_ext *)pmp->data; + struct ipath_ibdev *dev = to_idev(ibdev); + u64 swords, rwords, spkts, rpkts; + + ipath_layer_snapshot_counters(dev->ib_unit, + &swords, &rwords, &spkts, &rpkts); + + /* Adjust counters for any resets done. 
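Two idioms recur throughout the PortCounters handlers above and below: the hardware counters cannot be cleared, so a saved baseline is subtracted on every read and updated on a "clear"; and each 64-bit count is saturated down to its 8-, 16- or 32-bit MAD field. Both fit in a few lines (sketch only, names not from the driver):

/* Software-clear: report hw - baseline; "clearing" moves the baseline. */
struct sw_counter {
	u64 baseline;
};

static u64 swc_read(const struct sw_counter *c, u64 hw)
{
	return hw - c->baseline;
}

static void swc_clear(struct sw_counter *c, u64 hw)
{
	c->baseline = hw;
}

/* Saturating narrow, e.g.
 *	p->port_rcv_errors = cpu_to_be16(sat(cntrs.port_rcv_errors, 0xFFFF));
 */
static inline u64 sat(u64 v, u64 max)
{
	return v > max ? max : v;
}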
*/ + swords -= dev->n_port_xmit_data; + rwords -= dev->n_port_rcv_data; + spkts -= dev->n_port_xmit_packets; + rpkts -= dev->n_port_rcv_packets; + + memset(pmp->data, 0, sizeof(pmp->data)); + p->port_select = port; + p->port_xmit_data = cpu_to_be64(swords); + p->port_rcv_data = cpu_to_be64(rwords); + p->port_xmit_packets = cpu_to_be64(spkts); + p->port_rcv_packets = cpu_to_be64(rpkts); + p->port_unicast_xmit_packets = cpu_to_be64(dev->n_unicast_xmit); + p->port_unicast_rcv_packets = cpu_to_be64(dev->n_unicast_rcv); + p->port_multicast_xmit_packets = cpu_to_be64(dev->n_multicast_xmit); + p->port_multicast_rcv_packets = cpu_to_be64(dev->n_multicast_rcv); + + return reply((struct ib_smp *)pmp, __LINE__); +} + +static int recv_pma_set_portcounters(struct ib_perf *pmp, + struct ib_device *ibdev, u8 port) +{ + struct ib_pma_portcounters *p = (struct ib_pma_portcounters *)pmp->data; + struct ipath_ibdev *dev = to_idev(ibdev); + struct ipath_layer_counters cntrs; + + /* + * Since the HW doesn't support clearing counters, we save the + * current count and subtract it from future responses. + */ + ipath_layer_get_counters(dev->ib_unit, &cntrs); + + if (p->counter_select & IB_PMA_SEL_SYMBOL_ERROR) + dev->n_symbol_error_counter = cntrs.symbol_error_counter; + + if (p->counter_select & IB_PMA_SEL_LINK_ERROR_RECOVERY) + dev->n_link_error_recovery_counter = + cntrs.link_error_recovery_counter; + + if (p->counter_select & IB_PMA_SEL_LINK_DOWNED) + dev->n_link_downed_counter = cntrs.link_downed_counter; + + if (p->counter_select & IB_PMA_SEL_PORT_RCV_ERRORS) + dev->n_port_rcv_errors = cntrs.port_rcv_errors; + + if (p->counter_select & IB_PMA_SEL_PORT_RCV_REMPHYS_ERRORS) + dev->n_port_rcv_remphys_errors = cntrs.port_rcv_remphys_errors; + + if (p->counter_select & IB_PMA_SEL_PORT_XMIT_DISCARDS) + dev->n_port_xmit_discards = cntrs.port_xmit_discards; + + if (p->counter_select & IB_PMA_SEL_PORT_XMIT_DATA) + dev->n_port_xmit_data = cntrs.port_xmit_data; + + if (p->counter_select & IB_PMA_SEL_PORT_RCV_DATA) + dev->n_port_rcv_data = cntrs.port_rcv_data; + + if (p->counter_select & IB_PMA_SEL_PORT_XMIT_PACKETS) + dev->n_port_xmit_packets = cntrs.port_xmit_packets; + + if (p->counter_select & IB_PMA_SEL_PORT_RCV_PACKETS) + dev->n_port_rcv_packets = cntrs.port_rcv_packets; + + return recv_pma_get_portcounters(pmp, ibdev, port); +} + +static int recv_pma_set_portcounters_ext(struct ib_perf *pmp, + struct ib_device *ibdev, u8 port) +{ + struct ib_pma_portcounters *p = (struct ib_pma_portcounters *)pmp->data; + struct ipath_ibdev *dev = to_idev(ibdev); + u64 swords, rwords, spkts, rpkts; + + ipath_layer_snapshot_counters(dev->ib_unit, + &swords, &rwords, &spkts, &rpkts); + + if (p->counter_select & IB_PMA_SELX_PORT_XMIT_DATA) + dev->n_port_xmit_data = swords; + + if (p->counter_select & IB_PMA_SELX_PORT_RCV_DATA) + dev->n_port_rcv_data = rwords; + + if (p->counter_select & IB_PMA_SELX_PORT_XMIT_PACKETS) + dev->n_port_xmit_packets = spkts; + + if (p->counter_select & IB_PMA_SELX_PORT_RCV_PACKETS) + dev->n_port_rcv_packets = rpkts; + + if (p->counter_select & IB_PMA_SELX_PORT_UNI_XMIT_PACKETS) + dev->n_unicast_xmit = 0; + + if (p->counter_select & IB_PMA_SELX_PORT_UNI_RCV_PACKETS) + dev->n_unicast_rcv = 0; + + if (p->counter_select & IB_PMA_SELX_PORT_MULTI_XMIT_PACKETS) + dev->n_multicast_xmit = 0; + + if (p->counter_select & IB_PMA_SELX_PORT_MULTI_RCV_PACKETS) + dev->n_multicast_rcv = 0; + + return recv_pma_get_portcounters_ext(pmp, ibdev, port); +} + +static inline int process_subn(struct ib_device *ibdev, int 
mad_flags, + u8 port_num, struct ib_mad *in_mad, + struct ib_mad *out_mad) +{ + struct ib_smp *smp = (struct ib_smp *)out_mad; + struct ipath_ibdev *dev = to_idev(ibdev); + + /* Is the mkey in the process of expiring? */ + if (dev->mkey_lease_timeout && jiffies >= dev->mkey_lease_timeout) { + dev->mkey_lease_timeout = 0; + dev->mkeyprot_resv_lmc &= 0x3F; + } + + /* + * M_Key checking depends on + * Portinfo:M_Key_protect_bits + */ + if ((mad_flags & IB_MAD_IGNORE_MKEY) == 0 && dev->mkey != 0 && + dev->mkey != smp->mkey && (smp->method != IB_MGMT_METHOD_GET || + (dev->mkeyprot_resv_lmc >> 7) != 0)) { + if (dev->mkey_violations != 0xFFFF) + ++dev->mkey_violations; + if (dev->mkey_lease_timeout || dev->mkey_lease_period == 0) + return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED; + dev->mkey_lease_timeout = jiffies + dev->mkey_lease_period * HZ; + /* Future: Generate a trap notice. */ + return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED; + } + + *out_mad = *in_mad; + switch (smp->method) { + case IB_MGMT_METHOD_GET: + switch (smp->attr_id) { + case IB_SMP_ATTR_NODE_DESC: + return recv_subn_get_nodedescription(smp); + + case IB_SMP_ATTR_NODE_INFO: + return recv_subn_get_nodeinfo(smp, ibdev, port_num); + + case IB_SMP_ATTR_GUID_INFO: + return recv_subn_get_guidinfo(smp, ibdev); + + case IB_SMP_ATTR_PORT_INFO: + return recv_subn_get_portinfo(smp, ibdev, port_num); + + case IB_SMP_ATTR_PKEY_TABLE: + return recv_subn_get_pkeytable(smp, ibdev); + + default: + break; + } + break; + + case IB_MGMT_METHOD_SET: + switch (smp->attr_id) { + case IB_SMP_ATTR_GUID_INFO: + return recv_subn_set_guidinfo(smp, ibdev); + + case IB_SMP_ATTR_PORT_INFO: + return recv_subn_set_portinfo(smp, ibdev, port_num); + + case IB_SMP_ATTR_PKEY_TABLE: + return recv_subn_set_pkeytable(smp, ibdev); + + default: + break; + } + break; + + default: + break; + } + return IB_MAD_RESULT_FAILURE; +} + +static inline int process_perf(struct ib_device *ibdev, u8 port_num, + struct ib_mad *in_mad, struct ib_mad *out_mad) +{ + struct ib_perf *pmp = (struct ib_perf *)out_mad; + + *out_mad = *in_mad; + switch (pmp->method) { + case IB_MGMT_METHOD_GET: + switch (pmp->attr_id) { + case IB_PMA_CLASS_PORT_INFO: + return recv_pma_get_classportinfo(pmp); + + case IB_PMA_PORT_SAMPLES_CONTROL: + return recv_pma_get_portsamplescontrol(pmp, ibdev, + port_num); + + case IB_PMA_PORT_SAMPLES_RESULT: + return recv_pma_get_portsamplesresult(pmp, ibdev); + + case IB_PMA_PORT_SAMPLES_RESULT_EXT: + return recv_pma_get_portsamplesresult_ext(pmp, ibdev); + + case IB_PMA_PORT_COUNTERS: + return recv_pma_get_portcounters(pmp, ibdev, port_num); + + case IB_PMA_PORT_COUNTERS_EXT: + return recv_pma_get_portcounters_ext(pmp, ibdev, + port_num); + + default: + break; + } + break; + + case IB_MGMT_METHOD_SET: + switch (pmp->attr_id) { + case IB_PMA_PORT_SAMPLES_CONTROL: + return recv_pma_set_portsamplescontrol(pmp, ibdev, + port_num); + + case IB_PMA_PORT_COUNTERS: + return recv_pma_set_portcounters(pmp, ibdev, port_num); + + case IB_PMA_PORT_COUNTERS_EXT: + return recv_pma_set_portcounters_ext(pmp, ibdev, + port_num); + + default: + break; + } + break; + + default: + break; + } + return IB_MAD_RESULT_FAILURE; +} + +/* + * Note that the verbs framework has already done the MAD sanity checks, + * and hop count/pointer updating for IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE MADs. + * + * Return IB_MAD_RESULT_SUCCESS if this is a MAD that we are not interested + * in processing. 
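The M_Key test at the top of process_subn() above is easier to read as the positive predicate it negates: a MAD passes if the framework says to ignore M_Keys, if no key is configured, if the keys match, or if it is a SubnGet and the protection level (top bit of mkeyprot_resv_lmc) still permits Get with a wrong key. A restatement, sketch only, not part of the patch:

static int mkey_check_passes(int mad_flags, __be64 dev_mkey,
			     u8 mkeyprot_resv_lmc, const struct ib_smp *smp)
{
	if (mad_flags & IB_MAD_IGNORE_MKEY)
		return 1;		/* framework told us to skip it */
	if (dev_mkey == 0)
		return 1;		/* no M_Key configured */
	if (dev_mkey == smp->mkey)
		return 1;		/* key matches */
	/* wrong key: Get is still allowed at protect levels 0 and 1 */
	return smp->method == IB_MGMT_METHOD_GET &&
		(mkeyprot_resv_lmc >> 7) == 0;
}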
+ */ +int ipath_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num, + struct ib_wc *in_wc, struct ib_grh *in_grh, + struct ib_mad *in_mad, struct ib_mad *out_mad) +{ + switch (in_mad->mad_hdr.mgmt_class) { + case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE: + case IB_MGMT_CLASS_SUBN_LID_ROUTED: + return process_subn(ibdev, mad_flags, port_num, + in_mad, out_mad); + + case IB_MGMT_CLASS_PERF_MGMT: + return process_perf(ibdev, port_num, in_mad, out_mad); + + default: + return IB_MAD_RESULT_SUCCESS; + } +} From rdreier at cisco.com Wed Dec 28 18:19:49 2005 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 28 Dec 2005 18:19:49 -0800 Subject: [openib-general] [PATCH 11 of 20] ipath - core driver, part 4 of 4 In-Reply-To: (Bryan O'Sullivan's message of "Wed, 28 Dec 2005 16:31:30 -0800") References: Message-ID: I didn't notice this before: > + * This is volatile as it's the target of a DMA from the chip. > + */ > + > +static volatile uint64_t ipath_port0_rcvhdrtail[512] > + __attribute__ ((aligned(4096))); ... and then much later ... > + /* > + * kernel modules loaded into vmalloc'ed memory, > + * verify that when we assume that, map to phys, and back to virt, > + * that we get the right contents, so we did the mapping right. > + */ > + vpage = vmalloc_to_page((void *)ipath_port0_rcvhdrtail); > + if (vpage == NOPAGE_SIGBUS || vpage == NOPAGE_OOM) { > + _IPATH_UNIT_ERROR(t, "vmalloc_to_page for rcvhdrtail fails!\n"); > + ret = -ENOMEM; > + goto done; > + } This seems very wrong to me: there's no guarantee that a module will be loaded into memory that can be used as a DMA target. For example, on a non-cache-coherent architecture, I think this memory must be accessed through a non-cached mapping. I think the correct solution is to allocate a buffer for each device with pci_alloc_consistent() (or maybe dma_alloc_coherent()). (As a general comment, I'm still unhappy about how your driver has a static, fixed-size table of devices rather than allocating per-device data structures dynamically) - R. From penberg at cs.helsinki.fi Thu Dec 29 00:18:03 2005 From: penberg at cs.helsinki.fi (Pekka Enberg) Date: Thu, 29 Dec 2005 10:18:03 +0200 Subject: [openib-general] Re: [PATCH 5 of 20] ipath - driver core header files In-Reply-To: <2d9a3f27a10c8f11df92.1135816284@eng-12.pathscale.com> References: <2d9a3f27a10c8f11df92.1135816284@eng-12.pathscale.com> Message-ID: <84144f020512290018q189b0e34pc5cba9b251a8914b@mail.gmail.com> Hi Bryan, On 12/29/05, Bryan O'Sullivan wrote: > +/* > + * Copy routine that is guaranteed to work in terms of aligned 32-bit > + * quantities. > + */ > +void ipath_dwordcpy(uint32_t *dest, uint32_t *src, uint32_t ndwords); Wasn't this supposed to be killed? Pekka From penberg at cs.helsinki.fi Thu Dec 29 00:21:02 2005 From: penberg at cs.helsinki.fi (Pekka Enberg) Date: Thu, 29 Dec 2005 10:21:02 +0200 Subject: [openib-general] Re: [PATCH 14 of 20] ipath - infiniband verbs header In-Reply-To: <26993cb5faeef807a840.1135816293@eng-12.pathscale.com> References: <26993cb5faeef807a840.1135816293@eng-12.pathscale.com> Message-ID: <84144f020512290021x34a028eck290c238a24bd14d1@mail.gmail.com> On 12/29/05, Bryan O'Sullivan wrote: > diff -r f9bcd9de3548 -r 26993cb5faee drivers/infiniband/hw/ipath/verbs_debug.h > --- /dev/null Thu Jan 1 00:00:00 1970 +0000 > +++ b/drivers/infiniband/hw/ipath/verbs_debug.h Wed Dec 28 14:19:43 2005 -0800 > +#ifndef _VERBS_DEBUG_H > +#define _VERBS_DEBUG_H > + > +/* > + * This file contains tracing code for the ib_ipath kernel module. 
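Roland's per-device allocation suggestion above would take roughly this shape; the size (512 tail words) matches the static array he quotes, but the function and field names are illustrative, not from the driver. The matching teardown would use dma_free_coherent() with the same size and handle.

#include <linux/dma-mapping.h>

/* Sketch: give each device a coherent DMA buffer for the port-0
 * receive-header tail instead of static module memory. */
static int alloc_port0_rcvhdrtail(struct pci_dev *pdev, u64 **tail,
				  dma_addr_t *tail_dma)
{
	*tail = dma_alloc_coherent(&pdev->dev, 512 * sizeof(u64),
				   tail_dma, GFP_KERNEL);
	return *tail ? 0 : -ENOMEM;
}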
> + */ > +#ifndef _VERBS_DEBUGGING /* tracing enabled or not */ > +#define _VERBS_DEBUGGING 1 > +#endif > + > +extern unsigned ib_ipath_debug; > + > +#define _VERBS_ERROR(fmt,...) \ > + do { \ > + printk(KERN_ERR "%s: " fmt, "ib_ipath", ##__VA_ARGS__); \ > + } while(0) [snip, snip] Please consider using dev_dbg, dev_err, and friends from <linux/device.h>. Pekka From penberg at cs.helsinki.fi Thu Dec 29 00:22:26 2005 From: penberg at cs.helsinki.fi (Pekka Enberg) Date: Thu, 29 Dec 2005 10:22:26 +0200 Subject: [openib-general] Re: [PATCH 6 of 20] ipath - driver debugging headers In-Reply-To: <9e8d017ed298d591ea33.1135816285@eng-12.pathscale.com> References: <9e8d017ed298d591ea33.1135816285@eng-12.pathscale.com> Message-ID: <84144f020512290022i20504893n95eb01484de62e3f@mail.gmail.com> On 12/29/05, Bryan O'Sullivan wrote: > +#endif /* _IPATH_DEBUG_H */ > diff -r 2d9a3f27a10c -r 9e8d017ed298 drivers/infiniband/hw/ipath/ipath_kdebug.h > --- /dev/null Thu Jan 1 00:00:00 1970 +0000 > +++ b/drivers/infiniband/hw/ipath/ipath_kdebug.h Wed Dec 28 14:19:42 2005 -0800 > @@ -0,0 +1,109 @@ > +#ifndef _IPATH_KDEBUG_H > +#define _IPATH_KDEBUG_H > + > +#include "ipath_debug.h" > + > +/* > + * This file contains lightweight kernel tracing code. > + */ > + > +extern unsigned infinipath_debug; > +const char *ipath_get_unit_name(int unit); > + > +#if _IPATH_DEBUGGING > + > +#define _IPATH_UNIT_ERROR(unit,fmt,...) \ > + printk(KERN_ERR "%s: " fmt, ipath_get_unit_name(unit), ##__VA_ARGS__) > + > +#define _IPATH_ERROR(fmt,...) printk(KERN_ERR "infinipath: " fmt, ##__VA_ARGS__) > + > +#define _IPATH_INFO(fmt,...) \ > + do { \ > + if(unlikely(infinipath_debug & __IPATH_INFO)) \ > + printk(KERN_INFO "infinipath: " fmt, ##__VA_ARGS__); \ > + } while(0) > + [snip, snip] Please consider using dev_dbg, dev_err, et al. from <linux/device.h>. Pekka From eitan at mellanox.co.il Thu Dec 29 00:46:06 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 29 Dec 2005 10:46:06 +0200 Subject: [openib-general] RE: [PATCH] OpenSM/osm_helper.c: In osm_dump_smp_dr_path, display DRLIDs only if DR SMP Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3618BF9@mtlexch01.mtl.com> Hi Hal, This is a fine nit. Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O.
Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Wednesday, December 28, 2005 10:54 PM > To: Yael Kalka; Eitan Zahavi > Cc: openib-general at openib.org > Subject: [PATCH] OpenSM/osm_helper.c: In osm_dump_smp_dr_path, display > DRLIDs only if DR SMP > > OpenSM/osm_helper.c: In osm_dump_smp_dr_path, display DR LIDs only if DR > SMP > > Signed-off-by: Hal Rosenstock > > Index: osm_helper.c > =================================================================== > --- osm_helper.c (revision 4645) > +++ osm_helper.c (working copy) > @@ -1457,9 +1457,7 @@ osm_dump_dr_smp( > "\t\t\t\tattr_id.................0x%X (%s)\n" > "\t\t\t\tresv....................0x%X\n" > "\t\t\t\tattr_mod................0x%X\n" > - "\t\t\t\tm_key...................0x%016" PRIx64 "\n" > - "\t\t\t\tdr_slid.................0x%X\n" > - "\t\t\t\tdr_dlid.................0x%X\n", > + "\t\t\t\tm_key...................0x%016" PRIx64 "\n", > p_smp->hop_ptr, > p_smp->hop_count, > cl_ntoh64(p_smp->trans_id), > @@ -1467,14 +1465,20 @@ > ib_get_sm_attr_str( p_smp->attr_id ), > cl_ntoh16(p_smp->resv), > cl_ntoh32(p_smp->attr_mod), > - cl_ntoh64(p_smp->m_key), > - cl_ntoh16(p_smp->dr_slid), > - cl_ntoh16(p_smp->dr_dlid) > + cl_ntoh64(p_smp->m_key) > ); > strcat( buf, line ); > > if (p_smp->mgmt_class == IB_MCLASS_SUBN_DIR) > { > + sprintf( line, > + "\t\t\t\tdr_slid.................0x%X\n" > + "\t\t\t\tdr_dlid.................0x%X\n", > + cl_ntoh16(p_smp->dr_slid), > + cl_ntoh16(p_smp->dr_dlid) > + ); > + strcat( buf, line ); > + > strcat( buf, "\n\t\t\t\tInitial path: " ); > > for( i = 0; i <= p_smp->hop_count; i++ ) > @@ -1652,7 +1656,7 @@ osm_dump_smp_dr_path( > > if( osm_log_is_active( p_log, log_level) ) > { > - sprintf( buf, "Received a SMP on a %u hop path:" > + sprintf( buf, "Received SMP on a %u hop path:" > "\n\t\t\t\tInitial path = ", p_smp->hop_count ); > > for( i = 0; i <= p_smp->hop_count; i++ ) > From oferg at mellanox.co.il Thu Dec 29 02:20:31 2005 From: oferg at mellanox.co.il (Ofer Gigi) Date: Thu, 29 Dec 2005 12:20:31 +0200 Subject: [openib-general] [PATCH] osm: support for trivial PKey manager Message-ID: <074q4sfbm8.fsf@swlab25.yok.mtl.com> Hi Hal, My name is Ofer Gigi, and I am a new software engineer at Mellanox working on OpenSM. This patch provides a new manager that solves the following problem: OpenSM is not currently compliant with the spec statement C14-62.1.1 (Table 183, p870 l34): "However, the SM shall ensure that one of the P_KeyTable entries in every node contains either the value 0xFFFF (the default P_Key, full membership) or the value 0x7FFF (the default P_Key, partial membership)." Luckily, all IB devices come up from reset with a preconfigured 0xffff key; this was discovered during the last plugfest. To overcome this limitation I implemented an elementary PKey manager that enforces the above rule (currently it adds 0xffff where missing). This manager would serve as the basis for a full PKey policy manager in the future. We have tested this patch. Thanks, Ofer G. Signed-off-by: Ofer Gigi Index: include/opensm/osm_pkey_mgr.h =================================================================== --- include/opensm/osm_pkey_mgr.h (revision 0) +++ include/opensm/osm_pkey_mgr.h (revision 0) @@ -0,0 +1,259 @@ +/* + * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses.
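The C14-62.1.1 rule Ofer quotes only constrains the 15 P_Key base bits: 0xFFFF (full member) and 0x7FFF (limited member) share the base 0x7FFF and differ only in the membership bit. That is why the ib_types.h hunk below fixes IB_PKEY_BASE_MASK to CL_HTON16, and it means the two osm_physp_has_pkey() calls in the patch could equivalently be a single base comparison (sketch only, not part of the patch):

/* TRUE if pkey (network order) is the default P_Key, full or limited. */
static inline boolean_t pkey_is_default_base(ib_net16_t pkey)
{
	return (pkey & IB_PKEY_BASE_MASK) == IB_DEFAULT_PARTIAL_PKEY;
}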
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id: osm_pkey_mgr.h 1743 2005-12-15 09:38:35Z oferg $ + */ + + +/* + * Abstract: + * Declaration of osm_pkey_mgr_t. + * This object represents the P_Key Manager object. + * This object is part of the OpenSM family of objects. + * + * Environment: + * Linux User Mode + * + * $Revision: 1.4 $ + */ + + +#ifndef _OSM_PKEY_MGR_H_ +#define _OSM_PKEY_MGR_H_ + +#include +#include +#include +#include + +#ifdef __cplusplus +# define BEGIN_C_DECLS extern "C" { +# define END_C_DECLS } +#else /* !__cplusplus */ +# define BEGIN_C_DECLS +# define END_C_DECLS +#endif /* __cplusplus */ + +BEGIN_C_DECLS + +/****h* OpenSM/P_Key Manager +* NAME +* P_Key Manager +* +* DESCRIPTION +* The P_Key Manager object manage the p_key tables of all +* objects in the subnet +* +* AUTHOR +* Ofer Gigi, Mellanox +* +*********/ +/****s* OpenSM: P_Key Manager/osm_pkey_mgr_t +* NAME +* osm_pkey_mgr_t +* +* DESCRIPTION +* p_Key Manager structure. +* +* +* SYNOPSIS +*/ + +typedef struct _osm_pkey_mgr +{ + osm_subn_t *p_subn; + osm_log_t *p_log; + osm_req_t *p_req; + cl_plock_t *p_lock; + +} osm_pkey_mgr_t; + +/* +* FIELDS +* p_subn +* Pointer to the Subnet object for this subnet. +* +* p_log +* Pointer to the log object. +* +* p_req +* Pointer to the Request object. +* +* p_lock +* Pointer to the serializing lock. +* +* SEE ALSO +* P_Key Manager object +*********/ + +/****** OpenSM: P_Key Manager/osm_pkey_mgr_construct +* NAME +* osm_pkey_mgr_construct +* +* DESCRIPTION +* This function constructs a P_Key Manager object. +* +* SYNOPSIS +*/ +void +osm_pkey_mgr_construct( + IN osm_pkey_mgr_t* const p_mgr ); +/* +* PARAMETERS +* p_mgr +* [in] Pointer to a P_Key Manager object to construct. +* +* RETURN VALUE +* This function does not return a value. +* +* NOTES +* Allows calling osm_pkey_mgr_init, osm_pkey_mgr_destroy +* +* Calling osm_pkey_mgr_construct is a prerequisite to calling any other +* method except osm_pkey_mgr_init. +* +* SEE ALSO +* P_Key Manager object, osm_pkey_mgr_init, +* osm_pkey_mgr_destroy +*********/ + +/****f* OpenSM: P_Key Manager/osm_pkey_mgr_destroy +* NAME +* osm_pkey_mgr_destroy +* +* DESCRIPTION +* The osm_pkey_mgr_destroy function destroys the object, releasing +* all resources. 
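For orientation, the call sequence this header prescribes is construct, init, process, destroy; a minimal usage sketch, assuming the p_subn/p_log/p_req/p_lock pointers come from the enclosing SM object the way osm_sm.c wires them later in this patch:

static osm_signal_t run_pkey_mgr_once(osm_subn_t *p_subn, osm_log_t *p_log,
				      osm_req_t *p_req, cl_plock_t *p_lock)
{
	osm_pkey_mgr_t mgr;
	osm_signal_t sig = OSM_SIGNAL_NONE;

	osm_pkey_mgr_construct(&mgr);
	if (osm_pkey_mgr_init(&mgr, p_subn, p_log, p_req, p_lock) ==
	    IB_SUCCESS)
		sig = osm_pkey_mgr_process(&mgr);	/* DONE or DONE_PENDING */
	osm_pkey_mgr_destroy(&mgr);
	return sig;
}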
+* +* SYNOPSIS +*/ +void +osm_pkey_mgr_destroy( + IN osm_pkey_mgr_t* const p_mgr ); +/* +* PARAMETERS +* p_mgr +* [in] Pointer to the object to destroy. +* +* RETURN VALUE +* This function does not return a value. +* +* NOTES +* Performs any necessary cleanup of the specified +* P_Key Manager object. +* Further operations should not be attempted on the destroyed object. +* This function should only be called after a call to +* osm_pkey_mgr_construct or osm_pkey_mgr_init. +* +* SEE ALSO +* P_Key Manager object, osm_pkey_mgr_construct, +* osm_pkey_mgr_init +*********/ + +/****f* OpenSM: P_Key Manager/osm_pkey_mgr_init +* NAME +* osm_pkey_mgr_init +* +* DESCRIPTION +* The osm_pkey_mgr_init function initializes a +* P_Key Manager object for use. +* +* SYNOPSIS +*/ +ib_api_status_t +osm_pkey_mgr_init( + IN osm_pkey_mgr_t* const p_mgr, + IN osm_subn_t* const p_subn, + IN osm_log_t* const p_log, + IN osm_req_t* const p_req, + IN cl_plock_t* const p_lock ); +/* +* PARAMETERS +* p_mgr +* [in] Pointer to an osm_pkey_mgr_t object to initialize. +* +* p_subn +* [in] Pointer to the Subnet object for this subnet. +* +* p_log +* [in] Pointer to the log object. +* +* p_req +* [in] Pointer to an osm_req_t object. +* +* p_lock +* [in] Pointer to the OpenSM serializing lock. +* +* RETURN VALUES +* IB_SUCCESS if the P_Key Manager object was initialized +* successfully. +* +* NOTES +* Allows calling other P_Key Manager methods. +* +* SEE ALSO +* P_Key Manager object, osm_pkey_mgr_construct, +* osm_pkey_mgr_destroy +*********/ + +/****f* OpenSM: P_Key Manager/osm_pkey_mgr_process +* NAME +* osm_pkey_mgr_process +* +* DESCRIPTION +* This function enforce pkey rules on the SM DB. +* +* SYNOPSIS +*/ +osm_signal_t +osm_pkey_mgr_process( + IN const osm_pkey_mgr_t* const p_mgr ); +/* +* PARAMETERS +* p_mgr +* [in] Pointer to an osm_pkey_mgr_t object. +* +* RETURN VALUES +* None +* +* NOTES +* Current Operations: +* - Inserts IB_DEFAULT_PKEY to all node objects that don't have +* IB_DEFAULT_PARTIAL_PKEY or IB_DEFAULT_PKEY as part +* of their p_key table +* +* SEE ALSO +* P_Key Manager +*********/ + +END_C_DECLS + +#endif /* _OSM_PKEY_MGR_H_ */ Index: include/opensm/osm_state_mgr.h =================================================================== --- include/opensm/osm_state_mgr.h (revision 4651) +++ include/opensm/osm_state_mgr.h (working copy) @@ -60,6 +60,7 @@ #include #include #include +#include #include #include @@ -112,6 +113,7 @@ typedef struct _osm_state_mgr osm_mcast_mgr_t *p_mcast_mgr; osm_link_mgr_t *p_link_mgr; osm_drop_mgr_t *p_drop_mgr; + osm_pkey_mgr_t *p_pkey_mgr; osm_req_t *p_req; osm_stats_t *p_stats; struct _osm_sm_state_mgr *p_sm_state_mgr; @@ -149,6 +151,9 @@ typedef struct _osm_state_mgr * p_drop_mgr * Pointer to the Drop Manager object. * +* p_pkey_mgr +* Pointer to the P_Key Manager object. +* * p_req * Pointer to the Requester object sending SMPs. * @@ -374,6 +379,7 @@ osm_state_mgr_init( IN osm_mcast_mgr_t* const p_mcast_mgr, IN osm_link_mgr_t* const p_link_mgr, IN osm_drop_mgr_t* const p_drop_mgr, + IN osm_pkey_mgr_t* const p_pkey_mgr, IN osm_req_t* const p_req, IN osm_stats_t* const p_stats, IN struct _osm_sm_state_mgr* const p_sm_state_mgr, @@ -405,6 +411,9 @@ osm_state_mgr_init( * p_drop_mgr * [in] Pointer to the Drop Manager object. * +* p_pkey_mgr +* [in] Pointer to the P_Key Manager object. +* * p_req * [in] Pointer to the Request Controller object. 
* Index: include/opensm/osm_base.h =================================================================== --- include/opensm/osm_base.h (revision 4651) +++ include/opensm/osm_base.h (working copy) @@ -578,8 +578,10 @@ typedef enum _osm_sm_state OSM_SM_STATE_PROCESS_REQUEST_WAIT, OSM_SM_STATE_PROCESS_REQUEST_DONE, OSM_SM_STATE_MASTER_OR_HIGHER_SM_DETECTED, + OSM_SM_STATE_SET_PKEY, + OSM_SM_STATE_SET_PKEY_WAIT, + OSM_SM_STATE_SET_PKEY_DONE, OSM_SM_STATE_MAX - } osm_sm_state_t; /***********/ Index: include/opensm/osm_sm.h =================================================================== --- include/opensm/osm_sm.h (revision 4651) +++ include/opensm/osm_sm.h (working copy) @@ -74,6 +74,7 @@ #include #include #include +#include #include #include #include @@ -161,6 +162,7 @@ typedef struct _osm_sm osm_link_mgr_t link_mgr; osm_state_mgr_t state_mgr; osm_drop_mgr_t drop_mgr; + osm_pkey_mgr_t pkey_mgr; osm_lft_rcv_t lft_rcv; osm_lft_rcv_ctrl_t lft_rcv_ctrl; osm_mft_rcv_t mft_rcv; Index: include/opensm/osm_pkey.h =================================================================== --- include/opensm/osm_pkey.h (revision 4651) +++ include/opensm/osm_pkey.h (working copy) @@ -164,7 +164,7 @@ void osm_pkey_tbl_destroy( * osm_pkey_get_num_blocks * * DESCRIPTION -* Obtain the pointer to the IB PKey table block stored in the object +* Obtain the number of blocks in IB PKey table * * SYNOPSIS */ Index: include/iba/ib_types.h =================================================================== --- include/iba/ib_types.h (revision 4651) +++ include/iba/ib_types.h (working copy) @@ -368,7 +368,7 @@ BEGIN_C_DECLS * * SOURCE */ -#define IB_PKEY_BASE_MASK (CL_NTOH16(0x7FFF)) +#define IB_PKEY_BASE_MASK (CL_HTON16(0x7FFF)) /*********/ /****d* IBA Base: Constants/IB_PKEY_TYPE_MASK @@ -383,6 +383,18 @@ BEGIN_C_DECLS #define IB_PKEY_TYPE_MASK (CL_NTOH16(0x8000)) /*********/ +/****d* IBA Base: Constants/IB_DEFAULT_PARTIAL_PKEY +* NAME +* IB_DEFAULT_PARTIAL_PKEY +* +* DESCRIPTION +* 0x7fff in network order +* +* SOURCE +*/ +#define IB_DEFAULT_PARTIAL_PKEY (CL_HTON16(0x7fff)) +/**********/ + /****d* IBA Base: Constants/IB_MCLASS_SUBN_LID * NAME * IB_MCLASS_SUBN_LID Index: opensm/osm_state_mgr.c =================================================================== --- opensm/osm_state_mgr.c (revision 4651) +++ opensm/osm_state_mgr.c (working copy) @@ -108,6 +108,7 @@ osm_state_mgr_init( IN osm_mcast_mgr_t * const p_mcast_mgr, IN osm_link_mgr_t * const p_link_mgr, IN osm_drop_mgr_t * const p_drop_mgr, + IN osm_pkey_mgr_t * const p_pkey_mgr, IN osm_req_t * const p_req, IN osm_stats_t * const p_stats, IN osm_sm_state_mgr_t * const p_sm_state_mgr, @@ -127,6 +128,7 @@ osm_state_mgr_init( CL_ASSERT( p_mcast_mgr ); CL_ASSERT( p_link_mgr ); CL_ASSERT( p_drop_mgr ); + CL_ASSERT( p_pkey_mgr ); CL_ASSERT( p_req ); CL_ASSERT( p_stats ); CL_ASSERT( p_sm_state_mgr ); @@ -143,6 +145,7 @@ osm_state_mgr_init( p_mgr->p_mcast_mgr = p_mcast_mgr; p_mgr->p_link_mgr = p_link_mgr; p_mgr->p_drop_mgr = p_drop_mgr; + p_mgr->p_pkey_mgr = p_pkey_mgr; p_mgr->p_mad_ctrl = p_mad_ctrl; p_mgr->p_req = p_req; p_mgr->p_stats = p_stats; @@ -2216,9 +2219,11 @@ osm_state_mgr_process( } } } + /* Need to continue with lid assigning */ osm_drop_mgr_process( p_mgr->p_drop_mgr ); - p_mgr->state = OSM_SM_STATE_SET_SM_UCAST_LID; + + p_mgr->state = OSM_SM_STATE_SET_PKEY; /* * If we are not MASTER already - this means that we are @@ -2229,6 +2234,62 @@ osm_state_mgr_process( osm_sm_state_mgr_process( p_mgr->p_sm_state_mgr, OSM_SM_SIGNAL_DISCOVERY_COMPLETED ); + /* signal = 
osm_lid_mgr_process_sm( p_mgr->p_lid_mgr ); */ + /* the returned signal might be DONE or DONE_PENDING */ + signal = osm_pkey_mgr_process( p_mgr->p_pkey_mgr ); + break; + + default: + __osm_state_mgr_signal_error( p_mgr, signal ); + signal = OSM_SIGNAL_NONE; + break; + } + break; + + case OSM_SM_STATE_SET_PKEY: + switch ( signal ) + { + case OSM_SIGNAL_DONE: + p_mgr->state = OSM_SM_STATE_SET_PKEY_DONE; + break; + + case OSM_SIGNAL_DONE_PENDING: + /* + * There are outstanding transactions, so we + * must wait for the wire to clear. + */ + p_mgr->state = OSM_SM_STATE_SET_PKEY_WAIT; + signal = OSM_SIGNAL_NONE; + break; + + default: + __osm_state_mgr_signal_error( p_mgr, signal ); + signal = OSM_SIGNAL_NONE; + break; + } + break; + + case OSM_SM_STATE_SET_PKEY_WAIT: + switch ( signal ) + { + case OSM_SIGNAL_NO_PENDING_TRANSACTIONS: + p_mgr->state = OSM_SM_STATE_SET_PKEY_DONE; + break; + + default: + __osm_state_mgr_signal_error( p_mgr, signal ); + signal = OSM_SIGNAL_NONE; + break; + } + break; + + case OSM_SM_STATE_SET_PKEY_DONE: + switch ( signal ) + { + case OSM_SIGNAL_NO_PENDING_TRANSACTIONS: + case OSM_SIGNAL_DONE: + + p_mgr->state = OSM_SM_STATE_SET_SM_UCAST_LID; signal = osm_lid_mgr_process_sm( p_mgr->p_lid_mgr ); break; @@ -2237,6 +2298,7 @@ osm_state_mgr_process( signal = OSM_SIGNAL_NONE; break; } + break; case OSM_SM_STATE_SET_SM_UCAST_LID: Index: opensm/osm_helper.c =================================================================== --- opensm/osm_helper.c (revision 4651) +++ opensm/osm_helper.c (working copy) @@ -1698,19 +1698,22 @@ const char* const __osm_sm_state_str[] = "OSM_SM_STATE_SET_LINK_PORTS", /* 19 */ "OSM_SM_STATE_SET_LINK_PORTS_WAIT", /* 20 */ "OSM_SM_STATE_SET_LINK_PORTS_DONE", /* 21 */ - "OSM_SM_STATE_SET_ARMED", /* 19 */ - "OSM_SM_STATE_SET_ARMED_WAIT", /* 20 */ - "OSM_SM_STATE_SET_ARMED_DONE", /* 21 */ - "OSM_SM_STATE_SET_ACTIVE", /* 22 */ - "OSM_SM_STATE_SET_ACTIVE_WAIT", /* 23 */ - "OSM_SM_STATE_LOST_NEGOTIATION", /* 24 */ - "OSM_SM_STATE_STANDBY", /* 25 */ - "OSM_SM_STATE_SUBNET_UP", /* 26 */ - "OSM_SM_STATE_PROCESS_REQUEST", /* 27 */ - "OSM_SM_STATE_PROCESS_REQUEST_WAIT", /* 28 */ - "OSM_SM_STATE_PROCESS_REQUEST_DONE", /* 29 */ - "OSM_SM_STATE_MASTER_OR_HIGHER_SM_DETECTED", /* 30 */ - "UNKNOWN STATE!!" /* 31 */ + "OSM_SM_STATE_SET_ARMED", /* 22 */ + "OSM_SM_STATE_SET_ARMED_WAIT", /* 23 */ + "OSM_SM_STATE_SET_ARMED_DONE", /* 24 */ + "OSM_SM_STATE_SET_ACTIVE", /* 25 */ + "OSM_SM_STATE_SET_ACTIVE_WAIT", /* 26 */ + "OSM_SM_STATE_LOST_NEGOTIATION", /* 27 */ + "OSM_SM_STATE_STANDBY", /* 28 */ + "OSM_SM_STATE_SUBNET_UP", /* 29 */ + "OSM_SM_STATE_PROCESS_REQUEST", /* 30 */ + "OSM_SM_STATE_PROCESS_REQUEST_WAIT", /* 31 */ + "OSM_SM_STATE_PROCESS_REQUEST_DONE", /* 32 */ + "OSM_SM_STATE_MASTER_OR_HIGHER_SM_DETECTED",/* 33 */ + "OSM_SM_STATE_SET_PKEY", /* 34 */ + "OSM_SM_STATE_SET_PKEY_WAIT", /* 35 */ + "OSM_SM_STATE_SET_PKEY_DONE", /* 36 */ + "UNKNOWN STATE!!" 
/* 37 */ }; const char* const __osm_sm_signal_str[] = Index: opensm/osm_sm.c =================================================================== --- opensm/osm_sm.c (revision 4651) +++ opensm/osm_sm.c (working copy) @@ -67,6 +67,7 @@ #include #include #include +#include #include #include #include @@ -164,6 +165,7 @@ osm_sm_construct( osm_state_mgr_construct( &p_sm->state_mgr ); osm_state_mgr_ctrl_construct( &p_sm->state_mgr_ctrl ); osm_drop_mgr_construct( &p_sm->drop_mgr ); + osm_pkey_mgr_construct( &p_sm->pkey_mgr ); osm_lft_rcv_construct( &p_sm->lft_rcv ); osm_lft_rcv_ctrl_construct( &p_sm->lft_rcv_ctrl ); osm_mft_rcv_construct( &p_sm->mft_rcv ); @@ -253,6 +255,7 @@ osm_sm_destroy( osm_ucast_mgr_destroy( &p_sm->ucast_mgr ); osm_link_mgr_destroy( &p_sm->link_mgr ); osm_drop_mgr_destroy( &p_sm->drop_mgr ); + osm_pkey_mgr_destroy( &p_sm->pkey_mgr ); osm_lft_rcv_destroy( &p_sm->lft_rcv ); osm_mft_rcv_destroy( &p_sm->mft_rcv ); osm_slvl_rcv_destroy( &p_sm->slvl_rcv ); @@ -410,6 +413,7 @@ osm_sm_init( &p_sm->mcast_mgr, &p_sm->link_mgr, &p_sm->drop_mgr, + &p_sm->pkey_mgr, &p_sm->req, p_stats, &p_sm->sm_state_mgr, @@ -432,6 +436,12 @@ osm_sm_init( if( status != IB_SUCCESS ) goto Exit; + status = osm_pkey_mgr_init( &p_sm->pkey_mgr, + p_sm->p_subn, + p_sm->p_log, &p_sm->req, p_sm->p_lock ); + if( status != IB_SUCCESS ) + goto Exit; + status = osm_lft_rcv_init( &p_sm->lft_rcv, p_subn, p_log, p_lock ); if( status != IB_SUCCESS ) goto Exit; Index: opensm/osm_indent =================================================================== --- opensm/osm_indent (revision 4651) +++ opensm/osm_indent (working copy) @@ -63,8 +63,8 @@ # -i3 Substitute indent with 3 spaces # -npcs No space after procedure calls # -prs Space after parenthesis -# -nsai No space after if keyword -# -nsaw No space after while keyword +# -nsai No space after if keyword - removed +# -nsaw No space after while keyword - removed # -sc Put * at left of comments in a block comment style # -nsob Don't swallow unnecessary blank lines # -ts3 Tab size is 3 @@ -81,7 +81,7 @@ for sourcefile in $*; do perl -piW -e's/\x0D//' "$sourcefile" echo Processing $sourcefile indent -bad -bap -bbb -nbbo -bl -bli0 -bls -cbi0 -ci3 -cli0 -ncs \ - -hnl -i3 -npcs -prs -nsai -nsaf -nsaw -sc -nsob -ts3 -psl -bfda -nut $sourcefile + -hnl -i3 -npcs -prs -sc -nsob -ts3 -psl -bfda -nut $sourcefile rm ${sourcefile}W Index: opensm/Makefile.am =================================================================== --- opensm/Makefile.am (revision 4651) +++ opensm/Makefile.am (working copy) @@ -32,7 +32,7 @@ opensm_SOURCES = main.c osm_console.c os osm_mcm_port.c osm_mtree.c osm_multicast.c osm_node.c \ osm_node_desc_rcv.c osm_node_desc_rcv_ctrl.c \ osm_node_info_rcv.c osm_node_info_rcv_ctrl.c \ - osm_opensm.c osm_pkey.c osm_pkey_rcv.c \ + osm_opensm.c osm_pkey.c osm_pkey_mgr.c osm_pkey_rcv.c \ osm_pkey_rcv_ctrl.c osm_port.c \ osm_port_info_rcv.c osm_port_info_rcv_ctrl.c \ osm_remote_sm.c osm_req.c osm_req_ctrl.c \ Index: opensm/osm_pkey_mgr.c =================================================================== --- opensm/osm_pkey_mgr.c (revision 0) +++ opensm/osm_pkey_mgr.c (revision 0) @@ -0,0 +1,294 @@ +/* + * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. + * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id: osm_pkey_mgr.c 4278 2005-12-15 13:50:55Z oferg $ + */ + + +/* + * Abstract: + * Implementation of osm_pkey_mgr_t. + * This object represents the P_Key Manager object. + * This object is part of the opensm family of objects. + * + * Environment: + * Linux User Mode + * + * $Revision: 1.7 $ + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include +#include + +/********************************************************************** + **********************************************************************/ +void +osm_pkey_mgr_construct( + IN osm_pkey_mgr_t * const p_mgr ) +{ + CL_ASSERT( p_mgr ); + cl_memclr( p_mgr, sizeof( *p_mgr ) ); +} + +/********************************************************************** + **********************************************************************/ +void +osm_pkey_mgr_destroy( + IN osm_pkey_mgr_t * const p_mgr ) +{ + CL_ASSERT( p_mgr ); + + OSM_LOG_ENTER( p_mgr->p_log, osm_pkey_mgr_destroy ); + + OSM_LOG_EXIT( p_mgr->p_log ); +} + +/********************************************************************** + **********************************************************************/ +ib_api_status_t +osm_pkey_mgr_init( + IN osm_pkey_mgr_t * const p_mgr, + IN osm_subn_t * const p_subn, + IN osm_log_t * const p_log, + IN osm_req_t * const p_req, + IN cl_plock_t * const p_lock ) +{ + ib_api_status_t status = IB_SUCCESS; + + OSM_LOG_ENTER( p_log, osm_pkey_mgr_init ); + + osm_pkey_mgr_construct( p_mgr ); + + p_mgr->p_log = p_log; + p_mgr->p_subn = p_subn; + p_mgr->p_lock = p_lock; + p_mgr->p_req = p_req; + + OSM_LOG_EXIT( p_mgr->p_log ); + return ( status ); +} + +/********************************************************************** + **********************************************************************/ +boolean_t +__osm_pkey_mgr_process_physical_port( + IN const osm_pkey_mgr_t * const p_mgr, + IN osm_node_t * p_node, + IN uint8_t port_num, + IN osm_physp_t * p_physp ) +{ + boolean_t return_val = FALSE; /* TRUE if IB_DEFAULT_PKEY was inserted */ + osm_madw_context_t context; + ib_pkey_table_t *block; + uint16_t block_index; + uint16_t num_of_blocks; + const osm_pkey_tbl_t *p_pkey_tbl; + uint32_t attr_mod; + uint32_t i; + ib_net16_t pkey; + ib_api_status_t status; + boolean_t 
block_with_empty_entry_found; + + OSM_LOG_ENTER( p_mgr->p_log, __osm_pkey_mgr_process_physical_port ); + + /* + * Send a new entry for the pkey table for this node that includes + * IB_DEFAULT_PKEY when IB_DEFAULT_PARTIAL_PKEY or IB_DEFAULT_PKEY + * doesn't exist + */ + + if ( ( osm_physp_has_pkey( p_mgr->p_log, + IB_DEFAULT_PKEY, + p_physp ) == FALSE ) && + ( osm_physp_has_pkey( p_mgr->p_log, + IB_DEFAULT_PARTIAL_PKEY, p_physp ) == FALSE ) ) + { + context.pkey_context.node_guid = osm_node_get_node_guid( p_node ); + context.pkey_context.port_guid = osm_physp_get_port_guid( p_physp ); + context.pkey_context.set_method = TRUE; + + p_pkey_tbl = osm_physp_get_pkey_tbl( p_physp ); + num_of_blocks = osm_pkey_tbl_get_num_blocks( p_pkey_tbl ); + block_with_empty_entry_found = FALSE; + + for ( block_index = 0; block_index < num_of_blocks; block_index++ ) + { + block = osm_pkey_tbl_block_get( p_pkey_tbl, block_index ); + for ( i = 0; i < IB_NUM_PKEY_ELEMENTS_IN_BLOCK; i++ ) + { + pkey = block->pkey_entry[i]; + if ( ib_pkey_is_invalid( pkey ) ) + { + block->pkey_entry[i] = IB_DEFAULT_PKEY; + block_with_empty_entry_found = TRUE; + break; + } + } + if ( block_with_empty_entry_found ) + { + break; + } + } + + if ( block_with_empty_entry_found == FALSE ) + { + osm_log( p_mgr->p_log, OSM_LOG_ERROR, + "__osm_pkey_mgr_process_physical_port: ERR 0501: " + "No empty block was found to insert IB_DEFAULT_PKEY for node " + "0x%016" PRIx64 " and port %u.\n", + cl_ntoh64( osm_node_get_node_guid( p_node ) ), port_num ); + } + else + { + /* Building the attribute modifier */ + if ( osm_node_get_type( p_node ) == IB_NODE_TYPE_SWITCH ) + { + /* Port num | Block Index */ + attr_mod = port_num << 16 | block_index; + } + else + { + attr_mod = block_index; + } + + status = osm_req_set( p_mgr->p_req, + osm_physp_get_dr_path_ptr( p_physp ), + ( uint8_t * ) block, + sizeof( *block ), + IB_MAD_ATTR_P_KEY_TABLE, + cl_hton32( attr_mod ), + CL_DISP_MSGID_NONE, &context ); + return_val = TRUE; /*IB_DEFAULT_PKEY was inserted */ + + if ( osm_log_is_active( p_mgr->p_log, OSM_LOG_VERBOSE ) ) + { + osm_log( p_mgr->p_log, OSM_LOG_VERBOSE, + "__osm_pkey_mgr_process_physical_port: " + "IB_DEFAULT_PKEY was inserted for node 0x%016" PRIx64 + " and port %u.\n", + cl_ntoh64( osm_node_get_node_guid( p_node ) ), + port_num ); + } + } + } + else + { + /* default key or partial default key already exist */ + if ( osm_log_is_active( p_mgr->p_log, OSM_LOG_VERBOSE ) ) + { + osm_log( p_mgr->p_log, OSM_LOG_VERBOSE, + "__osm_pkey_mgr_process_physical_port: " + "No need to insert IB_DEFAULT_PKEY for node 0x%016" PRIx64 + " port %u.\n", + cl_ntoh64( osm_node_get_node_guid( p_node ) ), port_num ); + } + } + + OSM_LOG_EXIT( p_mgr->p_log ); + return ( return_val ); +} + +/********************************************************************** + **********************************************************************/ +osm_signal_t +osm_pkey_mgr_process( + IN const osm_pkey_mgr_t * const p_mgr ) +{ + cl_qmap_t *p_node_guid_tbl; + osm_node_t *p_node; + osm_node_t *p_next_node; + + uint32_t port_num; + osm_physp_t *p_physp; + osm_signal_t result = OSM_SIGNAL_DONE; + + CL_ASSERT( p_mgr ); + + OSM_LOG_ENTER( p_mgr->p_log, osm_pkey_mgr_process ); + + p_node_guid_tbl = &p_mgr->p_subn->node_guid_tbl; + + CL_PLOCK_EXCL_ACQUIRE( p_mgr->p_lock ); + + p_next_node = ( osm_node_t * ) cl_qmap_head( p_node_guid_tbl ); + while ( p_next_node != ( osm_node_t * ) cl_qmap_end( p_node_guid_tbl ) ) + { + p_node = p_next_node; + p_next_node = ( osm_node_t * ) cl_qmap_next( 
&p_next_node->map_item ); + + for ( port_num = 0; port_num < osm_node_get_num_physp( p_node ); + port_num++ ) + { + p_physp = osm_node_get_physp_ptr( p_node, port_num ); + if ( osm_physp_is_valid( p_physp ) ) + { + + if ( __osm_pkey_mgr_process_physical_port + ( p_mgr, p_node, port_num, p_physp ) ) + { + if ( osm_log_is_active( p_mgr->p_log, OSM_LOG_VERBOSE ) ) + { + osm_log( p_mgr->p_log, OSM_LOG_VERBOSE, + "osm_pkey_mgr_process: " + "Adding IB_DEFAULT_PKEY for pkey table of node " + "0x%016" PRIx64 " port %u.\n", + cl_ntoh64( osm_node_get_node_guid( p_node ) ), + port_num ); + } + result = OSM_SIGNAL_DONE_PENDING; + } + } + else + { + osm_log( p_mgr->p_log, OSM_LOG_ERROR, + "osm_pkey_mgr_process: ERR 0502: " + "Non-Valid physical port for node 0x%016" PRIx64 + " port %u.\n", + cl_ntoh64( osm_node_get_node_guid( p_node ) ), + port_num ); + } + } + } + + + CL_PLOCK_RELEASE( p_mgr->p_lock ); + OSM_LOG_EXIT( p_mgr->p_log ); + return ( result ); +} From bos at pathscale.com Thu Dec 29 06:15:40 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 29 Dec 2005 06:15:40 -0800 Subject: [openib-general] Re: [PATCH 5 of 20] ipath - driver core header files In-Reply-To: <84144f020512290018q189b0e34pc5cba9b251a8914b@mail.gmail.com> References: <2d9a3f27a10c8f11df92.1135816284@eng-12.pathscale.com> <84144f020512290018q189b0e34pc5cba9b251a8914b@mail.gmail.com> Message-ID: <1135865740.7790.0.camel@localhost.localdomain> On Thu, 2005-12-29 at 10:18 +0200, Pekka Enberg wrote: > > +void ipath_dwordcpy(uint32_t *dest, uint32_t *src, uint32_t ndwords); > Wasn't this supposed to be killed? The routine itself is dead, but the prototype survived. Thanks for spotting that. References: Message-ID: <1135866098.7790.2.camel@localhost.localdomain> On Wed, 2005-12-28 at 18:19 -0800, Roland Dreier wrote: > This seems very wrong to me: there's no guarantee that a module will > be loaded into memory that can be used as a DMA target. Right. I think we're just getting lucky on x86_64. I'll fix this. Thanks, From bos at pathscale.com Thu Dec 29 06:22:58 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 29 Dec 2005 06:22:58 -0800 Subject: [openib-general] Re: [PATCH 14 of 20] ipath - infiniband verbs header In-Reply-To: <84144f020512290021x34a028eck290c238a24bd14d1@mail.gmail.com> References: <26993cb5faeef807a840.1135816293@eng-12.pathscale.com> <84144f020512290021x34a028eck290c238a24bd14d1@mail.gmail.com> Message-ID: <1135866178.7790.5.camel@localhost.localdomain> On Thu, 2005-12-29 at 10:21 +0200, Pekka Enberg wrote: > Please consider using dev_dbg, dev_err, and friends from . Will do, thanks. References: <074q4sfbm8.fsf@swlab25.yok.mtl.com> Message-ID: <1135866212.4341.2334.camel@hal.voltaire.com> Hi Ofer, On Thu, 2005-12-29 at 05:20, Ofer Gigi wrote: > Hi Hal, > > My name is Ofer Gigi, and I am a new software engineer in Mellanox > working on OpenSM. > This patch provides a new manager that solves the following problem: Thanks! I am starting to work on integrating this. I have a couple of questions and some comments and questions embedded below. How are the PKeys configured ? Does the SM always go through the extra states to set PKeys ? What if there are none to be set ? > OpenSM is not currently compliant to the spec statement: > C14.62.1.1 Table 183 p870 l34: > "However, the SM shall ensure that one of the P_KeyTable entries in every > node contains either the value 0xFFFF (the default P_Key, full membership) > or the value 0x7FFF (the default P_Key, partial membership)." 
> > Luckily, all IB devices comes up from reset with preconfigured 0xffff key. > This was discovered during last plugfest. > To overcome this limitation I implemented a simple elementary PKey manager > that will enforce the above rule (currently adds 0xffff if missing). Might 0x7fff be better in this case ? > This additional manager would be used for a full PKey policy manager > in the future. Can you elaborate on any plans in this area as to timing and what "full" means ? > We have tested this patch Can you elaborate on the testing ? Thanks again. -- Hal From hch at lst.de Thu Dec 29 06:42:22 2005 From: hch at lst.de (Christoph Hellwig) Date: Thu, 29 Dec 2005 15:42:22 +0100 Subject: [openib-general] PathScale license In-Reply-To: <20051228020255.GA3280@cuprite.internal.keyresearch.com> References: <1135363454.4328.95007.camel@hal.voltaire.com> <20051228020255.GA3280@cuprite.internal.keyresearch.com> Message-ID: <20051229144221.GA29260@lst.de> On Tue, Dec 27, 2005 at 06:02:55PM -0800, Johann George wrote: > We have heard the issues that have been raised regarding the PathScale > license. PathScale's intent is solely to protect its hardware IP and not to > limit use of the software in any way. > > PathScale's use of this language is not original. SGI has used, and perhaps > originated, the additional language. It currently appears in several files > in the Linux kernel. As an example, see fs/xfs/linux-2.6/kmem.c XFS has been switched to a normal short GPL boilerplate exactly because this wording is not okay. From oferg at mellanox.co.il Thu Dec 29 06:56:30 2005 From: oferg at mellanox.co.il (Ofer Gigi) Date: Thu, 29 Dec 2005 16:56:30 +0200 Subject: [openib-general] RE: [PATCH] osm: support for trivial PKey manager Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3FA311D@mtlexch01.mtl.com> Hi Hal, Every time, between the stages of "Heavy Sweep" and "Set Ucast Lids", the stage of the PKey Manager is done. During this stage, the SM goes through its database and verifies that each valid physical port has either 0x7fff (IB_DEFAULT_PARTIAL_PKEY) or 0xffff (IB_DEFAULT_PKEY). If it does have, nothing is done. If it doesn't, a set request is sent to the physical port with an updated block containing 0xffff. Please email me back if you have further questions. Thanks! Ofer -----Original Message----- From: Hal Rosenstock [mailto:halr at voltaire.com] Sent: Thursday, December 29, 2005 4:24 PM To: Ofer Gigi Cc: OPENIB Subject: Re: [PATCH] osm: support for trivial PKey manager Hi Ofer, On Thu, 2005-12-29 at 05:20, Ofer Gigi wrote: > Hi Hal, > > My name is Ofer Gigi, and I am a new software engineer in Mellanox > working on OpenSM. > This patch provides a new manager that solves the following problem: Thanks! I am starting to work on integrating this. I have a couple of questions and some comments and questions embedded below. How are the PKeys configured ? Does the SM always go through the extra states to set PKeys ? What if there are none to be set ? > OpenSM is not currently compliant to the spec statement: > C14.62.1.1 Table 183 p870 l34: > "However, the SM shall ensure that one of the P_KeyTable entries in every > node contains either the value 0xFFFF (the default P_Key, full membership) > or the value 0x7FFF (the default P_Key, partial membership)." > > Luckily, all IB devices comes up from reset with preconfigured 0xffff key. > This was discovered during last plugfest. 
> To overcome this limitation I implemented a simple elementary PKey manager
> that will enforce the above rule (currently adds 0xffff if missing).

Might 0x7fff be better in this case ?

> This additional manager would be used for a full PKey policy manager
> in the future.

Can you elaborate on any plans in this area as to timing and what "full"
means ?

> We have tested this patch

Can you elaborate on the testing ?

Thanks again.

-- Hal

From oferg at mellanox.co.il  Thu Dec 29 08:14:01 2005
From: oferg at mellanox.co.il (Ofer Gigi)
Date: Thu, 29 Dec 2005 18:14:01 +0200
Subject: [openib-general] RE: [PATCH] osm: support for trivial PKey manager
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3FA314E@mtlexch01.mtl.com>

Hi Hal,
Sorry, I didn't see there are other questions below....

Here are the answers:

1. Might 0x7fff be better in this case ?
As I said, currently all IB devices come up from reset with preconfigured
0xffff key. Since we don't want to be inconsistent, we decided to insert
0xffff.

2. Can you elaborate on any plans in this area as to timing and what "full"
means ?
Since all IB devices come up from reset with 0xffff, and anyway, we insert
it when 0x7fff or 0xffff doesn't exist, currently all physical ports belong
to the same Pkey group, so there is no meaning to pkey groups.
In the future, we would like to have a Pkey manager that will manage
different Pkey groups (that don't share the same pkeys) for the physical
ports.

3. Can you elaborate on the testing ?
In the test, the following steps were done:
1. Remove 0x7fff or 0xffff from all physical ports
2. Force full sweep.
3. Verified that all physical ports have now 0xffff

Thanks!

Ofer

-----Original Message-----
From: Hal Rosenstock [mailto:halr at voltaire.com]
Sent: Thursday, December 29, 2005 4:24 PM
To: Ofer Gigi
Cc: OPENIB
Subject: Re: [PATCH] osm: support for trivial PKey manager

Hi Ofer,

On Thu, 2005-12-29 at 05:20, Ofer Gigi wrote:
> Hi Hal,
>
> My name is Ofer Gigi, and I am a new software engineer in Mellanox
> working on OpenSM.
> This patch provides a new manager that solves the following problem:

Thanks! I am starting to work on integrating this. I have a couple of
questions and some comments and questions embedded below.

How are the PKeys configured ?

Does the SM always go through the extra states to set PKeys ? What if
there are none to be set ?
> OpenSM is not currently compliant to the spec statement: > C14.62.1.1 Table 183 p870 l34: > "However, the SM shall ensure that one of the P_KeyTable entries in every > node contains either the value 0xFFFF (the default P_Key, full membership) > or the value 0x7FFF (the default P_Key, partial membership)." > > Luckily, all IB devices comes up from reset with preconfigured 0xffff key. > This was discovered during last plugfest. > To overcome this limitation I implemented a simple elementary PKey manager > that will enforce the above rule (currently adds 0xffff if missing). Might 0x7fff be better in this case ? > This additional manager would be used for a full PKey policy manager > in the future. Can you elaborate on any plans in this area as to timing and what "full" means ? > We have tested this patch Can you elaborate on the testing ? Thanks again. -- Hal From halr at voltaire.com Thu Dec 29 08:21:38 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Dec 2005 11:21:38 -0500 Subject: [openib-general] RE: [PATCH] osm: support for trivial PKey manager In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E3FA314E@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E3FA314E@mtlexch01.mtl.com> Message-ID: <1135873298.4341.3078.camel@hal.voltaire.com> Hi again Ofer, On Thu, 2005-12-29 at 11:14, Ofer Gigi wrote: > Hi Hal, > Sorry, I didn't see there are other questions below.... > > Here are the answers: > 1. Might 0x7fff be better in this case ? > As I said, currently all IB devices come up from reset with > preconfigured 0xffff key. Since we don't want to be inconsistent, we > decided to insert 0xffff. In the long run, 0x7fff might be better for these ports. Making every port in the default full partition makes them accessible to everyone. > 2. Can you elaborate on any plans in this area as to timing and what > "full" > means ? > Since all IB devices come up from reset with 0xffff, and anyway, we > insert it when 0x7fff or 0xffff doesn't exist, currently all physical > ports belong to the same Pkey group, so there is no meaning to pkey > groups. > In the future, we would like to have a Pkey manager that will manage > different Pkey groups (that don't share the same pkeys) for the physical > ports. So what is there now just ensures the full (or limited) default partition is configured on every port where pkeys are supported. > 3. Can you elaborate on the testing ? > In the test, the following steps were done: > 1. Remove 0x7fff or 0xffff from all physical ports How is this step done ? > 2. Force full sweep. > 3. Verified that all physical ports have now 0xffff Thanks. -- Hal > Thanks! > Ofer From vonbrand at inf.utfsm.cl Thu Dec 29 11:01:53 2005 From: vonbrand at inf.utfsm.cl (Horst von Brand) Date: Thu, 29 Dec 2005 16:01:53 -0300 Subject: [openib-general] Re: [PATCH 0 of 20] [RFC] ipath - PathScale InfiniPath driver In-Reply-To: Message from "Bryan O'Sullivan" of "Wed, 28 Dec 2005 16:31:19 -0800." Message-ID: <200512291901.jBTJ1rOm017519@laptop11.inf.utfsm.cl> Bryan O'Sullivan wrote: > Following Roland's submission of our InfiniPath InfiniBand HCA driver > earlier this month, we have responded to people's comments by making a > large number of changes to the driver. Many thanks! > Here is another set of driver patches for review. Roland is on > vacation until January 4, so I'm posting these in his place. Once > again, your comments are appreciated. We'd like to submit this driver > for inclusion in 2.6.16, so we'll be responding quickly to all > feedback. 
> > A short summary of the changes we have made is as follows: Some comments, just based on this: [...] > - Renamed _BITS_PER_BYTE to BITS_PER_BYTE, and moved it into > linux/types.h Haven't come across anything with this not 8 for a /long/ time now, and no Linux on that in sight. [...] > There are a few requested changes we have chosen to omit for now: > > - The driver still uses EXPORT_SYMBOL, for consistency with other > code in drivers/infiniband I'd suppose that is your choice... > - Someone asked for the kernel's i2c infrastructure to be used, but > our i2c usage is very specialised, and it would be more of a mess > to use the kernel's Problem with that is that if everybody and Aunt Tillie does the same, the kernel as a whole gets to be a mess. > - We're still using ioctls instead of sysfs or configfs in some > cases, to maintain userspace compatibility With what? You can very well ask people to upgrade to the latest userland utilities, and even make them run the old versions when they find that the new interface isn't there. Happened recently with modprobe/modutils. -- Dr. Horst H. von Brand User #22616 counter.li.org Departamento de Informatica Fono: +56 32 654431 Universidad Tecnica Federico Santa Maria +56 32 654239 Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513 From rlrevell at joe-job.com Thu Dec 29 11:26:24 2005 From: rlrevell at joe-job.com (Lee Revell) Date: Thu, 29 Dec 2005 14:26:24 -0500 Subject: [openib-general] Re: [PATCH 0 of 20] [RFC] ipath - PathScale InfiniPath driver In-Reply-To: <200512291901.jBTJ1rOm017519@laptop11.inf.utfsm.cl> References: <200512291901.jBTJ1rOm017519@laptop11.inf.utfsm.cl> Message-ID: <1135884385.6804.0.camel@mindpipe> On Thu, 2005-12-29 at 16:01 -0300, Horst von Brand wrote: > > - Someone asked for the kernel's i2c infrastructure to be used,but > > our i2c usage is very specialised, and it would be more of a mess > > to use the kernel's > > Problem with that is that if everybody and Aunt Tillie does the same, > the kernel as a whole gets to be a mess. ALSA does the exact same thing for the exact same reason. Maybe an indication that the kernel's i2c layer is too heavy? Lee From penberg at cs.helsinki.fi Thu Dec 29 11:24:38 2005 From: penberg at cs.helsinki.fi (Pekka Enberg) Date: Thu, 29 Dec 2005 21:24:38 +0200 Subject: [openib-general] Re: [PATCH 17 of 20] ipath - infiniband verbs support, part 3 of 3 In-Reply-To: <584777b6f4dc5269fa89.1135816296@eng-12.pathscale.com> References: <584777b6f4dc5269fa89.1135816296@eng-12.pathscale.com> Message-ID: <84144f020512291124sd895dfbp87ca9fd75552d671@mail.gmail.com> Hi, [Copy-paste reuse alert!] On 12/29/05, Bryan O'Sullivan wrote: > +static struct ib_mr *ipath_reg_phys_mr(struct ib_pd *pd, > + struct ib_phys_buf *buffer_list, > + int num_phys_buf, > + int acc, u64 *iova_start) > +{ > + struct ipath_mr *mr; > + int n, m, i; > + > + /* Allocate struct plus pointers to first level page tables. */ > + m = (num_phys_buf + IPATH_SEGSZ - 1) / IPATH_SEGSZ; > + mr = kmalloc(sizeof *mr + m * sizeof mr->mr.map[0], GFP_KERNEL); > + if (!mr) > + return ERR_PTR(-ENOMEM); > + > + /* Allocate first level page tables. 
*/ > + for (i = 0; i < m; i++) { > + mr->mr.map[i] = kmalloc(sizeof *mr->mr.map[0], GFP_KERNEL); > + if (!mr->mr.map[i]) { > + while (i) > + kfree(mr->mr.map[--i]); > + kfree(mr); > + return ERR_PTR(-ENOMEM); > + } > + } > + mr->mr.mapsz = m; [snip, snip] > +static struct ib_mr *ipath_reg_user_mr(struct ib_pd *pd, > + struct ib_umem *region, > + int mr_access_flags, > + struct ib_udata *udata) > +{ > + struct ipath_mr *mr; > + struct ib_umem_chunk *chunk; > + int n, m, i; > + > + n = 0; > + list_for_each_entry(chunk, ®ion->chunk_list, list) > + n += chunk->nents; > + > + /* Allocate struct plus pointers to first level page tables. */ > + m = (n + IPATH_SEGSZ - 1) / IPATH_SEGSZ; > + mr = kmalloc(sizeof *mr + m * sizeof mr->mr.map[0], GFP_KERNEL); > + if (!mr) > + return ERR_PTR(-ENOMEM); > + > + /* Allocate first level page tables. */ > + for (i = 0; i < m; i++) { > + mr->mr.map[i] = kmalloc(sizeof *mr->mr.map[0], GFP_KERNEL); > + if (!mr->mr.map[i]) { > + while (i) > + kfree(mr->mr.map[--i]); > + kfree(mr); > + return ERR_PTR(-ENOMEM); > + } > + } > + mr->mr.mapsz = m; [snip, more duplicate code] The above fragment is repeated at least three times. Please factor out the common code into separate functions. Pekka From ralphc at pathscale.com Thu Dec 29 13:00:06 2005 From: ralphc at pathscale.com (Ralph Campbell) Date: Thu, 29 Dec 2005 13:00:06 -0800 Subject: [openib-general] Uninitialized structure field in ib_uverbs_create_ah() Message-ID: <1135890006.5081.60.camel@brick.internal.keyresearch.com> The attr.ah_flags field is not being initialized. Here is a patch: Index: uverbs_cmd.c =================================================================== --- uverbs_cmd.c (revision 4654) +++ uverbs_cmd.c (working copy) @@ -1448,6 +1448,7 @@ attr.sl = cmd.attr.sl; attr.src_path_bits = cmd.attr.src_path_bits; attr.static_rate = cmd.attr.static_rate; + attr.ah_flags = cmd.attr.is_global ? IB_AH_GRH : 0; attr.port_num = cmd.attr.port_num; attr.grh.flow_label = cmd.attr.grh.flow_label; attr.grh.sgid_index = cmd.attr.grh.sgid_index; -- Ralph Campbell From mshefty at ichips.intel.com Thu Dec 29 14:21:00 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 29 Dec 2005 14:21:00 -0800 Subject: [openib-general] Re: [PATCH] fix umad object lifetime stuff In-Reply-To: <528xwdqn4x.fsf@cisco.com> References: <528xwdqn4x.fsf@cisco.com> Message-ID: <43B4614C.7060509@ichips.intel.com> Roland Dreier wrote: > I just committed the following patch for user_mad.c, which fixes > various issues with possibly freeing various data structures before > the last reference is gone. For example, cdev_del() might return > before the last reference to the cdev is gone, so freeing a structure > containing the cdev is wrong at that point. (Side note: it's > essentially impossible to use cdev_init() safely unless the cdev in > question is statically allocated as part of the module). > > Something like this is probably required for ucm and anything else > that exports a character device, since everyone seems to have copied > my bad user_mad code. But I haven't had a chance to do anything > beyond user_mad and uverbs so far... I'm just now getting back to looking at this issue. If I understand the problem in the ucm correctly, struct cdev is freed as part of struct ib_ucm_device after cdev_del() returns; however, a user could still have a reference on the cdev. Also, the user could still make calls into the driver. Is this correct? If this is the case, isn't more protection needed that simply preventing access to cdev? I.e. 
what prevents the user from invoking a call that tries to access the
underlying ib_device? Does every file operation need synchronization
with device removal to ensure that the underlying hardware is still
there? (This appears to be what user_mad now does.)

Assuming that my understanding is correct (which is a stretch), it seems
that there has to be a better way to handle this that is or can be
integrated with the kernel, rather than adding complex reference
counting, synchronization, and clean-up code to every driver that wants
to handle device removal...

- Sean

From caitlinb at broadcom.com  Thu Dec 29 15:06:36 2005
From: caitlinb at broadcom.com (Caitlin Bestler)
Date: Thu, 29 Dec 2005 15:06:36 -0800
Subject: [openib-general] PathScale license
Message-ID: <54AD0F12E08D1541B826BE97C98F99F11421DE@NT-SJCA-0751.brcm.ad.broadcom.com>

openib-general-bounces at openib.org wrote:
> On Tue, Dec 27, 2005 at 06:02:55PM -0800, Johann George wrote:
>> We have heard the issues that have been raised regarding the
>> PathScale license. PathScale's intent is solely to protect its
>> hardware IP and not to limit use of the software in any way.
>>
>> PathScale's use of this language is not original. SGI has used, and
>> perhaps originated, the additional language. It currently appears in
>> several files in the Linux kernel. As an example, see
>> fs/xfs/linux-2.6/kmem.c
>
> XFS has been switched to a normal short GPL boilerplate
> exactly because this wording is not okay.
>

The best statement I could google explicitly states that patented code
*can* be submitted *if* it has a license. The plain reading of
PathScale's license grants an unencumbered license *to the code*. It
merely refrains from waiving any related hardware rights.

As I read it, the code *may* be used with alternative hardware. If the
alternative hardware violates the patent, the driver code is
irrelevant. The code is not being restricted to work only with the
patented hardware, correct?

From mst at mellanox.co.il  Thu Dec 29 15:50:00 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 30 Dec 2005 01:50:00 +0200
Subject: [openib-general] backwards compatibility
Message-ID: <20051229235000.GB13951@mellanox.co.il>

Hi!
I'm reading a thread on lkml about backwards compatibility
http://lkml.org/lkml/2005/12/29/204
and I wonder whether we should work harder on supporting older
userspace library ABIs in kernel?

Currently, we only implement backwards compatibility in user-space, so
that you always have to upgrade userspace when upgrading the kernel,
but we could do this in kernel, too.

We would just need the kernel to return a pair of ABI revision numbers:
minimal and maximal ABI supported.

What do you guys think?

-- 
MST

From mshefty at ichips.intel.com  Thu Dec 29 15:55:39 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 29 Dec 2005 15:55:39 -0800
Subject: [openib-general] backwards compatibility
In-Reply-To: <20051229235000.GB13951@mellanox.co.il>
References: <20051229235000.GB13951@mellanox.co.il>
Message-ID: <43B4777B.1090500@ichips.intel.com>

Michael S. Tsirkin wrote:
> We would just need the kernel to return a pair of ABI revision numbers:
> minimal and maximal ABI supported.
>
> What do you guys think?

I thought that some of the kernel modules had this, but I don't see
where now. I think this becomes more important as the kernel and user
interfaces become better defined and are used by more people.
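
To make the idea concrete, here is a rough sketch of the scheme MST
describes -- the names and the sysfs/ioctl channel are made up for
illustration, nothing like this exists in the tree today:

struct abi_range {
	unsigned int min_abi;	/* oldest userspace ABI still accepted */
	unsigned int max_abi;	/* newest userspace ABI implemented */
};

/* Library side: after reading the kernel's advertised range, pick the
 * newest revision that both ends understand. */
static int pick_abi(struct abi_range kern,
		    unsigned int lib_min, unsigned int lib_max)
{
	unsigned int lo = kern.min_abi > lib_min ? kern.min_abi : lib_min;
	unsigned int hi = kern.max_abi < lib_max ? kern.max_abi : lib_max;

	return lo <= hi ? (int) hi : -1;	/* -1: no revision in common */
}

A library built for ABIs 3-5 against a kernel advertising 2-4 would
then settle on revision 4 instead of refusing to load.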
But I don't think it's worth trying to maintain backwards compatibility
in cases where an API must change in order to fix a serious bug.

- Sean

From mst at mellanox.co.il  Thu Dec 29 16:06:25 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 30 Dec 2005 02:06:25 +0200
Subject: [openib-general] Re: backwards compatibility
In-Reply-To: <43B4777B.1090500@ichips.intel.com>
References: <43B4777B.1090500@ichips.intel.com>
Message-ID: <20051230000625.GC13951@mellanox.co.il>

Quoting Sean Hefty :
> But I don't think it's worth trying to maintain backwards compatibility in
> cases where an API must change in order to fix a serious bug.

Yea ... hope we won't have this anytime soon.

-- 
MST

From nacc at us.ibm.com  Thu Dec 29 16:43:13 2005
From: nacc at us.ibm.com (Nishanth Aravamudan)
Date: Thu, 29 Dec 2005 16:43:13 -0800
Subject: [openib-general] Userspace testing results (2.6.15-rc7-git2 with modules)
Message-ID: <20051230004313.GA8111@us.ibm.com>

Hello all,

After quite a bit of struggling, I have added automated userspace
testing, as described below, to my daily kernel builds. Effectively,
with the combinations of kernel.org -git trees and svn kernel code and
modular or built-in CONFIG_ settings, I have added the combinations of
64-bit and 32-bit userspace, thus leading to 16 builds/test runs daily.
This does make the overall time a little excessive, but gives
reasonable coverage.

Currently, I am running netpipe, iperf and netperf (these three tests
are giving horrible results but we are pretty sure that it is a local
issue, as both eth1 and ib0 based tests lead to poor performance) and
also netpipe with a patch from Shirley Ma to run over native IB [1].
Additionally, I am running the 4 pingpong tests (rc, srq, uc, ud) and
the two perftest tests: rdma_lat and rdma_bw. There are some issues
with some size combinations; or, at least, that is how it seems to me.

Here are the results, where each row's heading indicates server-client
size (e.g. 32-64 is a 32-bit server and a 64-bit client), only related
to userspace; that is, both machines are running (identical) 64-bit
kernels. The userspace svn revision is 4651 (or right around there):

netpipe over IB

rdma_write      avg b/w (Mbps)    peak b/w (Mbps)
32-32           1035.6            1839.98
32-64           Failed with [2]
64-32           Failed with [3]
64-64           1037.19           1839.99

rdma_write with immediate
32-32           Failed with [4]
32-64           Failed with [4]
64-32           Failed with [4]
64-64           Failed with [4]

send_recv
32-32           Failed with [4]
32-64           Failed with [4]
64-32           Failed with [4]
64-64           Failed with [4]

send_recv with immediate
32-32           Failed with [4]
32-64           Failed with [4]
64-32           Failed with [4]
64-64           Failed with [4]

The type [4] failures are due to programmatic errors in the netpipe
patch, I think... If someone has guidance there, I'd appreciate it.
Type [2] & [3] may be a legitimate bug, though, as they seem to be tied
to mixing server and client word-size?

pingpong        b/w (Mbps)
rc
32-32           959.49
32-64           974.79
64-32           973.25
64-64           982.77

srq
32-32           3246.28
32-64           3273.20
64-32           3390.38
64-64           3383.03

uc
32-32           958.89
32-64           978.62
64-32           979.79
64-64           987.14

ud
32-32           464.78
32-64           468.09
64-32           468.47
64-64           471.62

I do have the timing numbers, but didn't include them here, if you'd
like those in the results as well, please don't hesitate to ask.
perftest

rdma_lat   peak b/w (MBps)   avg b/w       peak SD (cycles/KB)   avg SD
32-32      4.34445e-07       4.34445e-07   10486                 0
32-64      1866.19           1866.17       837                   837
64-32      4.34504e-07       4.34504e-07   52284                 0
64-64      1866.19           1866.17       837                   837

rdma_bw    typical (us)      best          worst
32-32      3.26619e+09       3.21854e+09   4.29282e+10
32-64      0.755625          0.742812      9.81156
64-32      3.23062e+09       3.18364e+09   5.64641e+10
64-64      0.743437          0.731563      10.1022

I am pretty convinced that having a 32-bit client with these tests
seems to cause errors, those numbers simply don't look right
(regardless of the server word-size).

Eventually (as more jobs come in and thus I have more data to present),
I will try to create some nice and pretty graphs to track regressions
in performance.

Hopefully, this data is useful for someone...

Thanks,
Nish

P.S. Yes, I am running the same set of tests for mainline with CONFIG_
=m, and using the subversion kernel code with =y and =m, but those
tests are still going and I wanted to post *some* numbers while I had
them :)

[1] Here is the patch I am using right now to the 3.6.2 version of
NetPIPE:

diff -urpN NetPIPE_3.6.2/makefile NetPIPE_3.6.2.patch/makefile
--- NetPIPE_3.6.2/makefile	2004-06-09 12:46:35.000000000 -0700
+++ NetPIPE_3.6.2.patch/makefile	2005-12-15 11:51:52.000000000 -0800
@@ -20,9 +20,10 @@
 #
 ########################################################################
 
-CC      = cc
-CFLAGS  = -O
+CC      = cc
+CFLAGS  = -O
 SRC     = ./src
+LDFLAGS = -L/usr/local/lib
 
 # For MPI, mpicc will set up the proper include and library paths
 
@@ -229,6 +230,10 @@ ib: $(SRC)/ib.c $(SRC)/netpipe.c $(SRC)/
 	   -DINFINIBAND -DTCP -I $(VAPI_INC) -L $(VAPI_LIB) \
 	   -lmpga -lvapi -lpthread
 
+ibv: $(SRC)/ibv.c $(SRC)/netpipe.c $(SRC)/netpipe.h
+	$(CC) $(CFLAGS) $(SRC)/ibv.c $(SRC)/netpipe.c -o NPibv \
+	-DOPENIB -DTCP $(LDFLAGS) -libverbs
+
 atoll: $(SRC)/atoll.c $(SRC)/netpipe.c $(SRC)/netpipe.h
 	$(CC) $(CFLAGS) -DATOLL $(SRC)/netpipe.c \
 	$(SRC)/atoll.c -o NPatoll \
diff -urpN NetPIPE_3.6.2/src/ibv.c NetPIPE_3.6.2.patch/src/ibv.c
--- NetPIPE_3.6.2/src/ibv.c	1969-12-31 16:00:00.000000000 -0800
+++ NetPIPE_3.6.2.patch/src/ibv.c	2005-12-15 11:57:10.000000000 -0800
@@ -0,0 +1,1074 @@
+/*****************************************************************************/
+/* "NetPIPE" -- Network Protocol Independent Performance Evaluator.         */
+/* Copyright 1997, 1998 Iowa State University Research Foundation, Inc.     */
+/*                                                                           */
+/* This program is free software; you can redistribute it and/or modify     */
+/* it under the terms of the GNU General Public License as published by     */
+/* the Free Software Foundation.  You should have received a copy of the    */
+/* GNU General Public License along with this program; if not, write to the */
+/* Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.  */
+/*                                                                           */
+/* ibv.c ---- Infiniband module for OpenIB verbs                            */
+/*****************************************************************************/
+
+#define USE_VOLATILE_RPTR	/* needed for polling on last byte of recv buffer */
+#include "netpipe.h"
+#include
+#include
+#include
+
+/* Debugging output macro */
+
+FILE* logfile;
+
+#if 0
+#define LOGPRINTF(_format, _aa...) fprintf(logfile, "%s: " _format, __func__ , ##_aa); fflush(logfile)
+#else
+#define LOGPRINTF(_format, _aa...)
+#endif + +/* Header files needed for Infiniband */ + +#include +/* Global vars */ + +static struct ibv_device *hca; +static struct ibv_context *ctx; +static struct ibv_comp_channel *channel; +static struct ibv_port_attr hca_port; +static int port_num; +static uint16_t lid; +static uint16_t d_lid; +static struct ibv_pd *pd_hndl; +static int num_cqe; +static int act_num_cqe; +static struct ibv_cq *s_cq_hndl; +static struct ibv_cq *r_cq_hndl; +static struct ibv_mr *s_mr_hndl; +static struct ibv_mr *r_mr_hndl; +static struct ibv_qp_init_attr qp_init_attr; +static struct ibv_qp *qp_hndl; +static uint32_t d_qp_num; +static struct ibv_qp_attr qp_attr; +static struct ibv_wc wc; +static int max_wq=50000; +static void* remote_address; +static uint32_t remote_key; +static volatile int receive_complete; +static pthread_t thread; + +/* Function definitions */ + +void Init(ArgStruct *p, int* pargc, char*** pargv) +{ + /* Set defaults + */ + p->prot.ib_mtu = IBV_MTU_1024; /* 1024 Byte MTU */ + p->prot.commtype = NP_COMM_RDMAWRITE; /* Use RDMA write communications */ + p->prot.comptype = NP_COMP_LOCALPOLL; /* Use local polling for completion */ + p->tr = 0; /* I am not the transmitter */ + p->rcv = 1; /* I am the receiver */ +} + +void Setup(ArgStruct *p) +{ + + int one = 1; + int sockfd; + struct sockaddr_in *lsin1, *lsin2; /* ptr to sockaddr_in in ArgStruct */ + char *host; + struct hostent *addr; + struct protoent *proto; + int send_size, recv_size, sizeofint = sizeof(int); + struct sigaction sigact1; + char logfilename[80]; + + /* Sanity check */ + if( p->prot.commtype == NP_COMM_RDMAWRITE && + p->prot.comptype != NP_COMP_LOCALPOLL ) { + fprintf(stderr, "Error, RDMA Write may only be used with local polling.\n"); + fprintf(stderr, "Try using RDMA Write With Immediate Data with vapi polling\n"); + fprintf(stderr, "or event completion\n"); + exit(-1); + } + + if( p->prot.commtype != NP_COMM_RDMAWRITE && + p->prot.comptype == NP_COMP_LOCALPOLL ) { + fprintf(stderr, "Error, local polling may only be used with RDMA Write.\n"); + fprintf(stderr, "Try using vapi polling or event completion\n"); + exit(-1); + } + + /* Open log file */ + sprintf(logfilename, ".iblog%d", 1 - p->tr); + logfile = fopen(logfilename, "w"); + + host = p->host; /* copy ptr to hostname */ + + lsin1 = &(p->prot.sin1); + lsin2 = &(p->prot.sin2); + + bzero((char *) lsin1, sizeof(*lsin1)); + bzero((char *) lsin2, sizeof(*lsin2)); + + if ( (sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0){ + printf("NetPIPE: can't open stream socket! errno=%d\n", errno); + exit(-4); + } + + if(!(proto = getprotobyname("tcp"))){ + printf("NetPIPE: protocol 'tcp' unknown!\n"); + exit(555); + } + + if (p->tr){ /* if client i.e., Sender */ + + + if (atoi(host) > 0) { /* Numerical IP address */ + lsin1->sin_family = AF_INET; + lsin1->sin_addr.s_addr = inet_addr(host); + + } else { + + if ((addr = gethostbyname(host)) == NULL){ + printf("NetPIPE: invalid hostname '%s'\n", host); + exit(-5); + } + + lsin1->sin_family = addr->h_addrtype; + bcopy(addr->h_addr, (char*) &(lsin1->sin_addr.s_addr), addr->h_length); + } + + lsin1->sin_port = htons(p->port); + + } else { /* we are the receiver (server) */ + + bzero((char *) lsin1, sizeof(*lsin1)); + lsin1->sin_family = AF_INET; + lsin1->sin_addr.s_addr = htonl(INADDR_ANY); + lsin1->sin_port = htons(p->port); + + if (bind(sockfd, (struct sockaddr *) lsin1, sizeof(*lsin1)) < 0){ + printf("NetPIPE: server: bind on local address failed! 
errno=%d", errno); + exit(-6); + } + + } + + if(p->tr) + p->commfd = sockfd; + else + p->servicefd = sockfd; + + + + /* Establish tcp connections */ + + establish(p); + + /* Initialize Mellanox Infiniband */ + + if(initIB(p) == -1) { + CleanUp(p); + exit(-1); + } +} + +void event_handler(struct ibv_cq *cq); + +void *EventThread(void *unused) +{ + struct ibv_cq *cq; + void *data; + + while (1) { + if (ibv_get_cq_event(channel, &cq, &data)) { + fprintf(stderr, "Failed to get CQ event\n"); + return NULL; + } + event_handler(cq); + } +} + +int initIB(ArgStruct *p) +{ + struct ibv_device **dev_list; + int ret; + + dev_list = ibv_get_device_list(NULL); + + hca = *dev_list; + if (!hca) { + fprintf(stderr, "Couldn't find any InfiniBand devices\n"); + return -1; + } else { + LOGPRINTF("Found Infiniband HCA %s\n", ibv_get_device_name(hca)); + } + + ctx = ibv_open_device(hca); + if (!ctx) { + fprintf(stderr, "Couldn't create InfiniBand context\n"); + return -1; + } else { + LOGPRINTF("Found Infiniband HCA %s\n", ibv_get_device_name(hca)); +// channel = ibv_create_comp_channel(ctx); + channel = NULL; + } + + /* Get HCA properties */ + + port_num=1; + ret = ibv_query_port(ctx, port_num, &hca_port); + if(ret) { + fprintf(stderr, "Error querying Infiniband HCA\n"); + return -1; + } else { + LOGPRINTF("Queried Infiniband HCA\n"); + } + lid = hca_port.lid; + LOGPRINTF(" lid = %d\n", lid); + + + /* Allocate Protection Domain */ + + pd_hndl = ibv_alloc_pd(ctx); + if(!pd_hndl) { + fprintf(stderr, "Error allocating PD\n"); + return -1; + } else { + LOGPRINTF("Allocated Protection Domain\n"); + } + + + /* Create send completion queue */ + + num_cqe = 30000; /* Requested number of completion q elements */ + s_cq_hndl = ibv_create_cq(ctx, num_cqe, NULL, channel, 0); + if(!s_cq_hndl) { + fprintf(stderr, "Error creating send CQ\n"); + return -1; + } else { + act_num_cqe = s_cq_hndl->cqe; + LOGPRINTF("Created Send Completion Queue with %d elements\n", act_num_cqe); + } + + + /* Create recv completion queue */ + + num_cqe = 20000; /* Requested number of completion q elements */ + r_cq_hndl = ibv_create_cq(ctx, num_cqe, NULL, channel, 0); + if(!r_cq_hndl) { + fprintf(stderr, "Error creating send CQ\n"); + return -1; + } else { + act_num_cqe = r_cq_hndl->cqe; + LOGPRINTF("Created Recv Completion Queue with %d elements\n", act_num_cqe); + } + + + /* Placeholder for MR */ + + + /* Create Queue Pair */ + + qp_init_attr.cap.max_recv_wr = max_wq; /* Max outstanding WR on RQ */ + qp_init_attr.cap.max_send_wr = max_wq; /* Max outstanding WR on SQ */ + qp_init_attr.cap.max_recv_sge = 1; /* Max scatter/gather entries on RQ */ + qp_init_attr.cap.max_send_sge = 1; /* Max scatter/gather entries on SQ */ + qp_init_attr.recv_cq = r_cq_hndl; /* CQ handle for RQ */ + qp_init_attr.send_cq = s_cq_hndl; /* CQ handle for SQ */ + qp_init_attr.sq_sig_all = 0; /* Signalling type */ + qp_init_attr.qp_type = IBV_QPT_RC; /* Transmission type */ + + qp_hndl = ibv_create_qp(pd_hndl, &qp_init_attr); + if(!qp_hndl) { + fprintf(stderr, "Error creating Queue Pair\n"); + return -1; + } else { + LOGPRINTF("Created Queue Pair\n"); + } + + + /* Exchange lid and qp_num with other node */ + + if( write(p->commfd, &lid, sizeof(lid) ) != sizeof(lid) ) { + fprintf(stderr, "Failed to send lid over socket\n"); + return -1; + } + if( write(p->commfd, &qp_hndl->qp_num, sizeof(qp_hndl->qp_num) ) != sizeof(qp_hndl->qp_num) ) { + fprintf(stderr, "Failed to send qpnum over socket\n"); + return -1; + } + if( read(p->commfd, &d_lid, sizeof(d_lid) ) != sizeof(d_lid) ) { 
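+      /* Without the peer's LID the QP cannot be moved to RTR below, so give up. */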
+ fprintf(stderr, "Failed to read lid from socket\n"); + return -1; + } + if( read(p->commfd, &d_qp_num, sizeof(d_qp_num) ) != sizeof(d_qp_num) ) { + fprintf(stderr, "Failed to read qpnum from socket\n"); + return -1; + } + + LOGPRINTF("Local: lid=%d qp_num=%d Remote: lid=%d qp_num=%d\n", + lid, qp_hndl->qp_num, d_lid, d_qp_num); + + + /* Bring up Queue Pair */ + + /******* INIT state ******/ + + qp_attr.qp_state = IBV_QPS_INIT; + qp_attr.pkey_index = 0; + qp_attr.port_num = port_num; + qp_attr.qp_access_flags = IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_REMOTE_READ; + + ret = ibv_modify_qp(qp_hndl, &qp_attr, + IBV_QP_STATE | + IBV_QP_PKEY_INDEX | + IBV_QP_PORT | + IBV_QP_ACCESS_FLAGS); + if(ret) { + fprintf(stderr, "Error modifying QP to INIT\n"); + return -1; + } + + LOGPRINTF("Modified QP to INIT\n"); + + /******* RTR (Ready-To-Receive) state *******/ + + qp_attr.qp_state = IBV_QPS_RTR; + qp_attr.max_dest_rd_atomic = 1; + qp_attr.dest_qp_num = d_qp_num; + qp_attr.ah_attr.sl = 0; + qp_attr.ah_attr.is_global = 0; + qp_attr.ah_attr.dlid = d_lid; + qp_attr.ah_attr.static_rate = 0; + qp_attr.ah_attr.src_path_bits = 0; + qp_attr.ah_attr.port_num = port_num; + qp_attr.path_mtu = p->prot.ib_mtu; + qp_attr.rq_psn = 0; + qp_attr.pkey_index = 0; + qp_attr.min_rnr_timer = 5; + + ret = ibv_modify_qp(qp_hndl, &qp_attr, + IBV_QP_STATE | + IBV_QP_AV | + IBV_QP_PATH_MTU | + IBV_QP_DEST_QPN | + IBV_QP_RQ_PSN | + IBV_QP_MAX_DEST_RD_ATOMIC | + IBV_QP_MIN_RNR_TIMER); + + if(ret) { + fprintf(stderr, "Error modifying QP to RTR\n"); + return -1; + } + + LOGPRINTF("Modified QP to RTR\n"); + + /* Sync before going to RTS state */ + Sync(p); + + /******* RTS (Ready-to-Send) state *******/ + + qp_attr.qp_state = IBV_QPS_RTS; + qp_attr.sq_psn = 0; + qp_attr.timeout = 31; + qp_attr.retry_cnt = 1; + qp_attr.rnr_retry = 1; + qp_attr.max_rd_atomic = 1; + + ret = ibv_modify_qp(qp_hndl, &qp_attr, + IBV_QP_STATE | + IBV_QP_TIMEOUT | + IBV_QP_RETRY_CNT | + IBV_QP_RNR_RETRY | + IBV_QP_SQ_PSN | + IBV_QP_MAX_QP_RD_ATOMIC); + + if(ret) { + fprintf(stderr, "Error modifying QP to RTS\n"); + return -1; + } + + LOGPRINTF("Modified QP to RTS\n"); + + /* If using event completion, request the initial notification */ + if( p->prot.comptype == NP_COMP_EVENT ) { + if (pthread_create(&thread, NULL, EventThread, NULL)) { + fprintf(stderr, "Couldn't start event thread\n"); + return -1; + } + ibv_req_notify_cq(r_cq_hndl, 0); + } + + return 0; +} + +int finalizeIB(ArgStruct *p) +{ + int ret; + + LOGPRINTF("Finalizing IB stuff\n"); + + if(qp_hndl) { + LOGPRINTF("Destroying QP\n"); + ret = ibv_destroy_qp(qp_hndl); + if(ret) { + fprintf(stderr, "Error destroying Queue Pair\n"); + } + } + + if(r_cq_hndl) { + LOGPRINTF("Destroying Recv CQ\n"); + ret = ibv_destroy_cq(r_cq_hndl); + if(ret) { + fprintf(stderr, "Error destroying recv CQ\n"); + } + } + + if(s_cq_hndl) { + LOGPRINTF("Destroying Send CQ\n"); + ret = ibv_destroy_cq(s_cq_hndl); + if(ret) { + fprintf(stderr, "Error destroying send CQ\n"); + } + } + + /* Check memory registrations just in case user bailed out */ + if(s_mr_hndl) { + LOGPRINTF("Deregistering send buffer\n"); + ret = ibv_dereg_mr(s_mr_hndl); + if(ret) { + fprintf(stderr, "Error deregistering send mr\n"); + } + } + + if(r_mr_hndl) { + LOGPRINTF("Deregistering recv buffer\n"); + ret = ibv_dereg_mr(r_mr_hndl); + if(ret) { + fprintf(stderr, "Error deregistering recv mr\n"); + } + } + + if(pd_hndl) { + LOGPRINTF("Deallocating PD\n"); + ret = ibv_dealloc_pd(pd_hndl); + if(ret) { + fprintf(stderr, "Error deallocating PD\n"); + } + } + + 
/* Application code should not close HCA, just release handle */ + + if(ctx) { + LOGPRINTF("Releasing HCA\n"); + ret = ibv_close_device(ctx); + if(ret) { + fprintf(stderr, "Error releasing HCA\n"); + } + } + + return 0; +} + +void event_handler(struct ibv_cq *cq) +{ + int ret; + + while(1) { + + ret = ibv_poll_cq(cq, 1, &wc); + + if(ret == 0) { + LOGPRINTF("Empty completion queue, requesting next notification\n"); + ibv_req_notify_cq(r_cq_hndl, 0); + return; + } else if(ret < 0) { + fprintf(stderr, "Error in event_handler, polling cq\n"); + exit(-1); + } else if(wc.status != IBV_WC_SUCCESS) { + fprintf(stderr, "Error in event_handler, on returned work completion " + "status: %d\n", wc.status); + exit(-1); + } + + LOGPRINTF("Retrieved work completion\n"); + + /* For ping-pong mode at least, this check shouldn't be needed for + * normal operation, but it will help catch any bugs with multiple + * sends coming through when we're only expecting one. + */ + if(receive_complete == 1) { + + while(receive_complete != 0) sched_yield(); + + } + + receive_complete = 1; + + } + +} + +static int +readFully(int fd, void *obuf, int len) +{ + int bytesLeft = len; + char *buf = (char *) obuf; + int bytesRead = 0; + + while (bytesLeft > 0 && + (bytesRead = read(fd, (void *) buf, bytesLeft)) > 0) + { + bytesLeft -= bytesRead; + buf += bytesRead; + } + if (bytesRead <= 0) + return bytesRead; + return len; +} + +void Sync(ArgStruct *p) +{ + char s[] = "SyncMe"; + char response[7]; + + if (write(p->commfd, s, strlen(s)) < 0 || + readFully(p->commfd, response, strlen(s)) < 0) + { + perror("NetPIPE: error writing or reading synchronization string"); + exit(3); + } + if (strncmp(s, response, strlen(s))) + { + fprintf(stderr, "NetPIPE: Synchronization string incorrect!\n"); + exit(3); + } +} + +void PrepareToReceive(ArgStruct *p) +{ + int ret; /* Return code */ + struct ibv_recv_wr rr; /* Receive request */ + struct ibv_recv_wr *bad_wr; + struct ibv_sge sg_entry; /* Scatter/Gather list - holds buff addr */ + + /* We don't need to post a receive if doing RDMA write with local polling */ + + if( p->prot.commtype == NP_COMM_RDMAWRITE && + p->prot.comptype == NP_COMP_LOCALPOLL ) + return; + + rr.num_sge = 1; + rr.sg_list = &sg_entry; + rr.next = NULL; + + sg_entry.lkey = r_mr_hndl->lkey; + sg_entry.length = p->bufflen; + sg_entry.addr = (uintptr_t)p->r_ptr; + + ret = ibv_post_recv(qp_hndl, &rr, &bad_wr); + if(ret) { + fprintf(stderr, "Error posting recv request\n"); + CleanUp(p); + exit(-1); + } else { + LOGPRINTF("Posted recv request\n"); + } + + /* Set receive flag to zero and request event completion + * notification for this receive so the event handler will + * be triggered when the receive completes. 
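+    * (The flag only matters for event completion: with RDMA Write plus
+    * local polling, RecvData watches the last data byte directly.)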
+ */ + if( p->prot.comptype == NP_COMP_EVENT ) { + receive_complete = 0; + } +} + +void SendData(ArgStruct *p) +{ + int ret; /* Return code */ + struct ibv_send_wr sr; /* Send request */ + struct ibv_send_wr *bad_wr; + struct ibv_sge sg_entry; /* Scatter/Gather list - holds buff addr */ + + /* Fill in send request struct */ + + if(p->prot.commtype == NP_COMM_SENDRECV) { + sr.opcode = IBV_WR_SEND; + LOGPRINTF("Doing regular send\n"); + } else if(p->prot.commtype == NP_COMM_SENDRECV_WITH_IMM) { + sr.opcode = IBV_WR_SEND_WITH_IMM; + LOGPRINTF("Doing regular send with imm\n"); + } else if(p->prot.commtype == NP_COMM_RDMAWRITE) { + sr.opcode = IBV_WR_RDMA_WRITE; + sr.wr.rdma.remote_addr = (uintptr_t)(remote_address + (p->s_ptr - p->s_buff)); + sr.wr.rdma.rkey = remote_key; + LOGPRINTF("Doing RDMA write (raddr=%p)\n", sr.wr.rdma.remote_addr); + } else if(p->prot.commtype == NP_COMM_RDMAWRITE_WITH_IMM) { + sr.opcode = IBV_WR_RDMA_WRITE_WITH_IMM; + sr.wr.rdma.remote_addr = (uintptr_t)(remote_address + (p->s_ptr - p->s_buff)); + sr.wr.rdma.rkey = remote_key; + LOGPRINTF("Doing RDMA write with imm (raddr=%p)\n", sr.wr.rdma.remote_addr); + } else { + fprintf(stderr, "Error, invalid communication type in SendData\n"); + exit(-1); + } + + sr.send_flags = 0; /* This needed due to a bug in Mellanox HW rel a-0 */ + + sr.num_sge = 1; + sr.sg_list = &sg_entry; + sr.next = NULL; + + sg_entry.lkey = s_mr_hndl->lkey; /* Local memory region key */ + sg_entry.length = p->bufflen; + sg_entry.addr = (uintptr_t)p->s_ptr; + + ret = ibv_post_send(qp_hndl, &sr, &bad_wr); + if(ret) { + fprintf(stderr, "Error posting send request\n"); + } else { + LOGPRINTF("Posted send request\n"); + } + +} + +void RecvData(ArgStruct *p) +{ + int ret; + + /* Busy wait for incoming data */ + + LOGPRINTF("Receiving at buffer address %p\n", p->r_ptr); + + /* + * Unsignaled receives are not supported, so we must always poll the + * CQ, except when using RDMA writes. + */ + if( p->prot.commtype == NP_COMM_RDMAWRITE ) { + + /* Poll for receive completion locally on the receive data */ + + LOGPRINTF("Waiting for last byte of data to arrive\n"); + + while(p->r_ptr[p->bufflen-1] != 'a' + (p->cache ? 1 - p->tr : 1) ) + { + /* BUSY WAIT -- this should be fine since we + * declared r_ptr with volatile qualifier */ + } + + /* Reset last byte */ + p->r_ptr[p->bufflen-1] = 'a' + (p->cache ? p->tr : 0); + + LOGPRINTF("Received all of data\n"); + + } else if( p->prot.comptype != NP_COMP_EVENT ) { + + /* Poll for receive completion using VAPI poll function */ + + LOGPRINTF("Polling completion queue for VAPI work completion\n"); + + ret = 0; + while(ret == 0) + ret = ibv_poll_cq(r_cq_hndl, 1, &wc); + + if(ret < 0) { + fprintf(stderr, "Error in RecvData, polling for completion\n"); + exit(-1); + } + + if(wc.status != IBV_WC_SUCCESS) { + fprintf(stderr, "Error in status of returned completion: %d\n", + wc.status); + exit(-1); + } + + LOGPRINTF("Retrieved successful completion\n"); + + } else if( p->prot.comptype == NP_COMP_EVENT ) { + + /* Instead of polling directly on data or VAPI completion queue, + * let the VAPI event completion handler set a flag when the receive + * completes, and poll on that instead. Could try using semaphore here + * as well to eliminate busy polling + */ + + LOGPRINTF("Polling receive flag\n"); + + while( receive_complete == 0 ) + { + /* BUSY WAIT */ + } + + /* If in prepost-burst mode, we won't be calling PrepareToReceive + * between ping-pongs, so we need to reset the receive_complete + * flag here. 
+ */ + if( p->preburst ) receive_complete = 0; + + LOGPRINTF("Receive completed\n"); + } +} + +/* Reset is used after a trial to empty the work request queues so we + have enough room for the next trial to run */ +void Reset(ArgStruct *p) +{ + + int ret; /* Return code */ + struct ibv_send_wr sr; /* Send request */ + struct ibv_send_wr *bad_sr; + struct ibv_recv_wr rr; /* Recv request */ + struct ibv_recv_wr *bad_rr; + + /* If comptype is event, then we'll use event handler to detect receive, + * so initialize receive_complete flag + */ + if(p->prot.comptype == NP_COMP_EVENT) receive_complete = 0; + + /* Prepost receive */ + rr.num_sge = 0; + rr.next = NULL; + + LOGPRINTF("Posting recv request in Reset\n"); + ret = ibv_post_recv(qp_hndl, &rr, &bad_rr); + if(ret) { + fprintf(stderr, " Error posting recv request\n"); + CleanUp(p); + exit(-1); + } + + /* Make sure both nodes have preposted receives */ + Sync(p); + + /* Post Send */ + sr.opcode = IBV_WR_SEND; + sr.send_flags = IBV_SEND_SIGNALED; + sr.num_sge = 0; + sr.next = NULL; + + LOGPRINTF("Posting send request \n"); + ret = ibv_post_send(qp_hndl, &sr, &bad_sr); + if(ret) { + fprintf(stderr, " Error posting send request in Reset\n"); + exit(-1); + } + if(wc.status != IBV_WC_SUCCESS) { + fprintf(stderr, " Error in completion status: %d\n", + wc.status); + exit(-1); + } + + LOGPRINTF("Polling for completion of send request\n"); + ret = 0; + while(ret == 0) + ret = ibv_poll_cq(s_cq_hndl, 1, &wc); + + if(ret < 0) { + fprintf(stderr, "Error polling CQ for send in Reset\n"); + exit(-1); + } + if(wc.status != IBV_WC_SUCCESS) { + fprintf(stderr, " Error in completion status: %d\n", + wc.status); + exit(-1); + } + + LOGPRINTF("Status of send completion: %d\n", wc.status); + + if(p->prot.comptype == NP_COMP_EVENT) { + /* If using event completion, the event handler will set receive_complete + * when it gets the completion event. + */ + LOGPRINTF("Waiting for receive_complete flag\n"); + while(receive_complete == 0) { /* BUSY WAIT */ } + } else { + LOGPRINTF("Polling for completion of recv request\n"); + ret = 0; + while(ret == 0) + ret = ibv_poll_cq(r_cq_hndl, 1, &wc); + + if(ret < 0) { + fprintf(stderr, "Error polling CQ for recv in Reset"); + exit(-1); + } + if(wc.status != IBV_WC_SUCCESS) { + fprintf(stderr, " Error in completion status: %d\n", + wc.status); + exit(-1); + } + + LOGPRINTF("Status of recv completion: %d\n", wc.status); + } + LOGPRINTF("Done with reset\n"); +} + +void SendTime(ArgStruct *p, double *t) +{ + uint32_t ltime, ntime; + + /* + Multiply the number of seconds by 1e6 to get time in microseconds + and convert value to an unsigned 32-bit integer. 
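+      (At 2^32 microseconds this wraps after roughly 4295 seconds,
+      far longer than any single NetPIPE timing.)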
+  */
+  ltime = (uint32_t)(*t * 1.e6);
+
+  /* Send time in network order */
+  ntime = htonl(ltime);
+  if (write(p->commfd, (char *)&ntime, sizeof(uint32_t)) < 0)
+  {
+    printf("NetPIPE: write failed in SendTime: errno=%d\n", errno);
+    exit(301);
+  }
+}
+
+void RecvTime(ArgStruct *p, double *t)
+{
+  uint32_t ltime, ntime;
+  int bytesRead;
+
+  bytesRead = readFully(p->commfd, (void *)&ntime, sizeof(uint32_t));
+  if (bytesRead < 0)
+  {
+    printf("NetPIPE: read failed in RecvTime: errno=%d\n", errno);
+    exit(302);
+  }
+  else if (bytesRead != sizeof(uint32_t))
+  {
+    fprintf(stderr, "NetPIPE: partial read in RecvTime of %d bytes\n",
+            bytesRead);
+    exit(303);
+  }
+  ltime = ntohl(ntime);
+
+  /* Result is ltime (in microseconds) divided by 1.0e6 to get seconds */
+  *t = (double)ltime / 1.0e6;
+}
+
+void SendRepeat(ArgStruct *p, int rpt)
+{
+  uint32_t lrpt, nrpt;
+
+  lrpt = rpt;
+  /* Send repeat count as an unsigned 32-bit integer in network order */
+  nrpt = htonl(lrpt);
+  if (write(p->commfd, (void *) &nrpt, sizeof(uint32_t)) < 0)
+  {
+    printf("NetPIPE: write failed in SendRepeat: errno=%d\n", errno);
+    exit(304);
+  }
+}
+
+void RecvRepeat(ArgStruct *p, int *rpt)
+{
+  uint32_t lrpt, nrpt;
+  int bytesRead;
+
+  bytesRead = readFully(p->commfd, (void *)&nrpt, sizeof(uint32_t));
+  if (bytesRead < 0)
+  {
+    printf("NetPIPE: read failed in RecvRepeat: errno=%d\n", errno);
+    exit(305);
+  }
+  else if (bytesRead != sizeof(uint32_t))
+  {
+    fprintf(stderr, "NetPIPE: partial read in RecvRepeat of %d bytes\n",
+            bytesRead);
+    exit(306);
+  }
+  lrpt = ntohl(nrpt);
+
+  *rpt = lrpt;
+}
+
+void establish(ArgStruct *p)
+{
+  socklen_t clen;
+
+  clen = sizeof(p->prot.sin2);
+  if(p->tr){
+    if(connect(p->commfd, (struct sockaddr *) &(p->prot.sin1),
+               sizeof(p->prot.sin1)) < 0){
+      printf("Client: Cannot Connect! errno=%d\n",errno);
+      exit(-10);
+    }
+  }
+  else {
+    /* SERVER */
+    listen(p->servicefd, 5);
+    p->commfd = accept(p->servicefd, (struct sockaddr *) &(p->prot.sin2),
+                       &clen);
+
+    if(p->commfd < 0){
+      printf("Server: Accept Failed! errno=%d\n",errno);
+      exit(-12);
+    }
+  }
+}
+
+void CleanUp(ArgStruct *p)
+{
+  char quit[] = "QUIT";
+  if (p->tr)
+  {
+    write(p->commfd, quit, 5);
+    read(p->commfd, quit, 5);
+    close(p->commfd);
+  }
+  else
+  {
+    read(p->commfd, quit, 5);
+    write(p->commfd, quit, 5);
+    close(p->commfd);
+    close(p->servicefd);
+  }
+
+  finalizeIB(p);
+}
+
+
+void AfterAlignmentInit(ArgStruct *p)
+{
+  int bytesRead;
+
+  /* Exchange buffer pointers and remote infiniband keys if doing rdma. Do
+   * the exchange in this function because this will happen after any
+   * memory alignment is done, which is important for getting the
+   * correct remote address.
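+   * Note that the buffer pointer received from the peer is used only as
+   * the remote_addr of RDMA write work requests; it is never dereferenced
+   * locally.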
+ */ + if( p->prot.commtype == NP_COMM_RDMAWRITE || + p->prot.commtype == NP_COMM_RDMAWRITE_WITH_IMM ) { + + /* Send my receive buffer address + */ + if(write(p->commfd, (void *)&p->r_buff, sizeof(void*)) < 0) { + perror("NetPIPE: write of buffer address failed in AfterAlignmentInit"); + exit(-1); + } + + LOGPRINTF("Sent buffer address: %p\n", p->r_buff); + + /* Send my remote key for accessing + * my remote buffer via IB RDMA + */ + if(write(p->commfd, (void *)&r_mr_hndl->rkey, sizeof(uint32_t)) < 0) { + perror("NetPIPE: write of remote key failed in AfterAlignmentInit"); + exit(-1); + } + + LOGPRINTF("Sent remote key: %d\n", r_mr_hndl->rkey); + + /* Read the sent data + */ + bytesRead = readFully(p->commfd, (void *)&remote_address, sizeof(void*)); + if (bytesRead < 0) { + perror("NetPIPE: read of buffer address failed in AfterAlignmentInit"); + exit(-1); + } else if (bytesRead != sizeof(void*)) { + perror("NetPIPE: partial read of buffer address in AfterAlignmentInit"); + exit(-1); + } + + LOGPRINTF("Received remote address from other node: %p\n", remote_address); + + bytesRead = readFully(p->commfd, (void *)&remote_key, sizeof(uint32_t)); + if (bytesRead < 0) { + perror("NetPIPE: read of remote key failed in AfterAlignmentInit"); + exit(-1); + } else if (bytesRead != sizeof(uint32_t)) { + perror("NetPIPE: partial read of remote key in AfterAlignmentInit"); + exit(-1); + } + + LOGPRINTF("Received remote key from other node: %d\n", remote_key); + + } +} + + +void MyMalloc(ArgStruct *p, int bufflen, int soffset, int roffset) +{ + /* Allocate buffers */ + + p->r_buff = malloc(bufflen+MAX(soffset,roffset)); + if(p->r_buff == NULL) { + fprintf(stderr, "Error malloc'ing buffer\n"); + exit(-1); + } + + if(p->cache) { + + /* Infiniband spec says we can register same memory region + * more than once, so just copy buffer address. We will register + * the same buffer twice with Infiniband. + */ + p->s_buff = p->r_buff; + + } else { + + p->s_buff = malloc(bufflen+soffset); + if(p->s_buff == NULL) { + fprintf(stderr, "Error malloc'ing buffer\n"); + exit(-1); + } + + } + + /* Register buffers with Infiniband */ + + r_mr_hndl = ibv_reg_mr(pd_hndl, p->r_buff, bufflen + MAX(soffset, roffset), + IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE); + if(!r_mr_hndl) + { + fprintf(stderr, "Error registering recv buffer\n"); + exit(-1); + } + else + { + LOGPRINTF("Registered Recv Buffer\n"); + } + + s_mr_hndl = ibv_reg_mr(pd_hndl, p->s_buff, bufflen+soffset, IBV_ACCESS_LOCAL_WRITE); + if(!s_mr_hndl) { + fprintf(stderr, "Error registering send buffer\n"); + exit(-1); + } else { + LOGPRINTF("Registered Send Buffer\n"); + } + +} +void FreeBuff(char *buff1, char *buff2) +{ + int ret; + + if(s_mr_hndl) { + LOGPRINTF("Deregistering send buffer\n"); + ret = ibv_dereg_mr(s_mr_hndl); + if(ret) { + fprintf(stderr, "Error deregistering send mr\n"); + } else { + s_mr_hndl = NULL; + } + } + + if(r_mr_hndl) { + LOGPRINTF("Deregistering recv buffer\n"); + ret = ibv_dereg_mr(r_mr_hndl); + if(ret) { + fprintf(stderr, "Error deregistering recv mr\n"); + } else { + r_mr_hndl = NULL; + } + } + + if(buff1 != NULL) + free(buff1); + + if(buff2 != NULL) + free(buff2); +} + diff -urpN NetPIPE_3.6.2/src/netpipe.c NetPIPE_3.6.2.patch/src/netpipe.c --- NetPIPE_3.6.2/src/netpipe.c 2004-06-22 12:38:41.000000000 -0700 +++ NetPIPE_3.6.2.patch/src/netpipe.c 2005-12-15 11:51:52.000000000 -0800 @@ -142,7 +142,7 @@ int main(int argc, char **argv) case 's': streamopt = 1; printf("Streaming in one direction only.\n\n"); -#if defined(TCP) && ! 
defined(INFINIBAND) +#if defined(TCP) && ! defined(INFINIBAND) && !defined(OPENIB) printf("Sockets are reset between trials to avoid\n"); printf("degradation from a collapsing window size.\n\n"); #endif @@ -168,7 +168,7 @@ int main(int argc, char **argv) case 'u': end = atoi(optarg); break; -#if defined(TCP) && ! defined(INFINIBAND) +#if defined(TCP) && ! defined(INFINIBAND) && !defined(OPENIB) case 'b': /* -b # resets the buffer size, -b 0 keeps system defs */ args.prot.sndbufsz = args.prot.rcvbufsz = atoi(optarg); break; @@ -178,7 +178,7 @@ int main(int argc, char **argv) /* end will be maxed at sndbufsz+rcvbufsz */ printf("Passing data in both directions simultaneously.\n"); printf("Output is for the combined bandwidth.\n"); -#if defined(TCP) && ! defined(INFINIBAND) +#if defined(TCP) && ! defined(INFINIBAND) && !defined(OPENIB) printf("The socket buffer size limits the maximum test size.\n\n"); #endif if( streamopt ) { @@ -270,7 +270,29 @@ int main(int argc, char **argv) exit(-1); } break; +#endif + +#if defined(OPENIB) + case 'm': switch(atoi(optarg)) { + case 256: args.prot.ib_mtu = IBV_MTU_256; + break; + case 512: args.prot.ib_mtu = IBV_MTU_512; + break; + case 1024: args.prot.ib_mtu = IBV_MTU_1024; + break; + case 2048: args.prot.ib_mtu = IBV_MTU_2048; + break; + case 4096: args.prot.ib_mtu = IBV_MTU_4096; + break; + default: + fprintf(stderr, "Invalid MTU size, must be one of " + "256, 512, 1024, 2048, 4096\n"); + exit(-1); + } + break; +#endif +#if defined(OPENIB) || defined(INFINIBAND) case 't': if( !strcmp(optarg, "send_recv") ) { printf("Using Send/Receive communications\n"); args.prot.commtype = NP_COMM_SENDRECV; @@ -317,7 +339,7 @@ int main(int argc, char **argv) case 'n': nrepeat_const = atoi(optarg); break; -#if defined(TCP) && ! defined(INFINIBAND) +#if defined(TCP) && ! defined(INFINIBAND) && !defined(OPENIB) case 'r': args.reset_conn = 1; printf("Resetting connection after every trial\n"); break; @@ -331,7 +353,7 @@ int main(int argc, char **argv) #endif /* ! defined TCGMSG */ -#if defined(INFINIBAND) +#if defined(OPENIB) || defined(INFINIBAND) asyncReceive = 1; fprintf(stderr, "Preposting asynchronous receives (required for Infiniband)\n"); if(args.bidir && ( @@ -377,7 +399,7 @@ int main(int argc, char **argv) end = args.upper; if( args.tr ) { printf("The upper limit is being set to %d Bytes\n", end); -#if defined(TCP) && ! defined(INFINIBAND) +#if defined(TCP) && ! defined(INFINIBAND) && !defined(OPENIB) printf("due to socket buffer size limitations\n\n"); #endif } } @@ -990,7 +1012,7 @@ void VerifyIntegrity(ArgStruct *p) void PrintUsage() { printf("\n NETPIPE USAGE \n\n"); -#if ! defined(INFINIBAND) +#if ! defined(INFINIBAND) && !defined(OPENIB) printf("a: asynchronous receive (a.k.a. 
preposted receive)\n"); #endif printf("B: burst all preposts before measuring performance\n"); @@ -998,7 +1020,7 @@ void PrintUsage() printf("b: specify TCP send/receive socket buffer sizes\n"); #endif -#if defined(INFINIBAND) +#if defined(INFINIBAND) || defined(OPENIB) printf("c: specify type of completion <-c type>\n" " valid types: local_poll, vapi_poll, event\n" " default: local_poll\n"); @@ -1010,7 +1032,7 @@ void PrintUsage() printf(" all MPI-2 implementations\n"); #endif -#if defined(TCP) || defined(INFINIBAND) +#if defined(TCP) || defined(INFINIBAND) || defined(OPENIB) printf("h: specify hostname of the receiver <-h host>\n"); #endif @@ -1019,7 +1041,7 @@ void PrintUsage() printf("i: Do an integrity check instead of measuring performance\n"); printf("l: lower bound start value e.g. <-l 1>\n"); -#if defined(INFINIBAND) +#if defined(INFINIBAND) || defined(OPENIB) printf("m: set MTU for Infiniband adapter <-m mtu_size>\n"); printf(" valid sizes: 256, 512, 1024, 2048, 4096 (default 1024)\n"); #endif @@ -1030,7 +1052,7 @@ void PrintUsage() printf("p: set the perturbation number <-p 1>\n" " (default = 3 Bytes, set to 0 for no perturbations)\n"); -#if defined(TCP) && ! defined(INFINIBAND) +#if defined(TCP) && ! defined(INFINIBAND) && !defined(OPENIB) printf("r: reset sockets for every trial\n"); #endif @@ -1039,7 +1061,7 @@ void PrintUsage() printf("S: Use synchronous sends.\n"); #endif -#if defined(INFINIBAND) +#if defined(INFINIBAND) || defined(OPENIB) printf("t: specify type of communications <-t type>\n" " valid types: send_recv, send_recv_with_imm,\n" " rdma_write, rdma_write_with_imm\n" @@ -1056,7 +1078,7 @@ void PrintUsage() #if defined(MPI) printf(" May need to use -a to choose asynchronous communications for MPI/n"); #endif -#if defined(TCP) && !defined(INFINIBAND) +#if defined(TCP) && !defined(INFINIBAND) && !defined(OPENIB) printf(" The maximum test size is limited by the TCP buffer size/n"); #endif printf("\n"); @@ -1131,7 +1153,7 @@ void InitBufferData(ArgStruct *p, int nb memset(p->s_buff, 'b', nbytes+soffset); } -#if !defined(INFINIBAND) && !defined(ARMCI) && !defined(LAPI) && !defined(GPSHMEM) && !defined(SHMEM) && !defined(GM) +#if !defined(OPENIB) && !defined(INFINIBAND) && !defined(ARMCI) && !defined(LAPI) && !defined(GPSHMEM) && !defined(SHMEM) && !defined(GM) void MyMalloc(ArgStruct *p, int bufflen, int soffset, int roffset) { diff -urpN NetPIPE_3.6.2/src/netpipe.h NetPIPE_3.6.2.patch/src/netpipe.h --- NetPIPE_3.6.2/src/netpipe.h 2004-06-22 12:38:41.000000000 -0700 +++ NetPIPE_3.6.2.patch/src/netpipe.h 2005-12-15 11:51:52.000000000 -0800 @@ -27,6 +27,10 @@ #include /* ib_mtu_t */ #endif +#ifdef OPENIB +#include /* enum ibv_mtu */ +#endif + #ifdef FINAL #define TRIALS 7 #define RUNTM 0.25 @@ -73,9 +77,14 @@ int commtype; /* Communications type */ int comptype; /* Completion type */ #endif +#if defined(OPENIB) + enum ibv_mtu ib_mtu; /* MTU Size for Infiniband HCA */ + int commtype; /* Communications type */ + int comptype; /* Completion type */ +#endif }; -#if defined(INFINIBAND) +#if defined(INFINIBAND) || defined(OPENIB) enum completion_types { NP_COMP_LOCALPOLL, /* Poll locally on last byte of data */ NP_COMP_VAPIPOLL, /* Poll using vapi function */ [2] Preposting asynchronous receives (required for Infiniband) NetPIPE: error writing or reading synchronization string: Connection reset by peer Using RDMA Write communications [3] Preposting asynchronous receives (required for Infiniband) NetPIPE: Synchronization string incorrect! 
Using RDMA Write communications [4] All are similar to: Preposting asynchronous receives (required for Infiniband) Error, local polling may only be used with RDMA Write. Try using vapi polling or event completion Using RDMA Write communications with immediate data With the differences being in the last line. From mshefty at ichips.intel.com Thu Dec 29 16:53:09 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 29 Dec 2005 16:53:09 -0800 Subject: [openib-general] [PATCH] iWARP Support added to the CMA In-Reply-To: <1134669336.7186.2.camel@trinity.austin.ammasso.com> References: <1134669336.7186.2.camel@trinity.austin.ammasso.com> Message-ID: <43B484F5.7030603@ichips.intel.com> Tom Tucker wrote: I'm a lot slow to review this, but comments below. I'll start to address some of them that affect the generic code next week, in particular changes to ib_addr. > Index: core/cm.c > =================================================================== > --- core/cm.c (revision 4186) > +++ core/cm.c (working copy) > @@ -3227,6 +3227,10 @@ > int ret; > u8 i; > > + /* Ignore RNIC devices */ > + if (device->node_type == IB_NODE_RNIC) > + return; > + > cm_dev = kmalloc(sizeof(*cm_dev) + sizeof(*port) * > device->phys_port_cnt, GFP_KERNEL); > if (!cm_dev) > @@ -3291,6 +3295,10 @@ > if (!cm_dev) > return; > > + /* Ignore RNIC devices */ > + if (device->node_type == IB_NODE_RNIC) > + return; > + > write_lock_irqsave(&cm.device_lock, flags); > list_del(&cm_dev->list); > write_unlock_irqrestore(&cm.device_lock, flags); The changes to cm_remove_one() are not needed. ib_get_client_data() should return NULL because IB_NODE_RNIC is skipped in cm_add_one(). > Index: core/addr.c > =================================================================== > --- core/addr.c (revision 4186) > +++ core/addr.c (working copy) > @@ -73,8 +73,13 @@ > if (!dev) > return -EADDRNOTAVAIL; > > - *gid = *(union ib_gid *) (dev->dev_addr + 4); > - *pkey = addr_get_pkey(dev); > + if (dev->type == ARPHRD_INFINIBAND) { > + *gid = *(union ib_gid *) (dev->dev_addr + 4); > + *pkey = addr_get_pkey(dev); > + } else { > + *gid = *(union ib_gid *) (dev->dev_addr); > + *pkey = 0; > + } > dev_put(dev); > return 0; > } If this call is being used, we should consider changing it to something more generic, rather than returning a "gid" as the hardware address for a non-IB device. One possibility is to make the API return the hardware address of a given IP address. The CMA can then break that address into a GID/pkey pair if needed. > @@ -476,8 +498,22 @@ > state = cma_exch(id_priv, CMA_DESTROYING); > cma_cancel_operation(id_priv, state); > > - if (id_priv->cm_id && !IS_ERR(id_priv->cm_id)) > - ib_destroy_cm_id(id_priv->cm_id); > + if (id->device) { > + switch (id->device->node_type) { > + case IB_NODE_RNIC: > + if (id_priv->cm_id.iw && !IS_ERR(id_priv->cm_id.iw)) { > + iw_destroy_cm_id(id_priv->cm_id.iw); > + id_priv->cm_id.iw = 0; > + } > + break; > + default: > + if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib)) { > + ib_destroy_cm_id(id_priv->cm_id.ib); > + id_priv->cm_id.ib = 0; The iw/ib devices should be set to NULL instead of assigned 0. > + ret = cma_notify_user(id_priv, > + event_type, > + event->status, > + event->private_data, > + event->private_data_len); > + if (ret) { > + /* Destroy the CM ID by returning a non-zero value. 
*/ > + id_priv->cm_id.iw = NULL; > + cma_exch(id_priv, CMA_DESTROYING); > + cma_release_remove(id_priv); > + rdma_destroy_id(&id_priv->id); > + return ret; > + } > + > + cma_release_remove(id_priv); > + return ret; > +} This looks different than the cma_ib_handler, and it makes me think that the cma_ib_handler has a bug where it doesn't decrement dev_remove. > +static int iw_conn_req_handler(struct iw_cm_id *cm_id, > + struct iw_cm_event *iw_event) > +{ > + struct rdma_cm_id* new_cm_id; > + struct rdma_id_private *listen_id, *conn_id; > + struct sockaddr_in* sin; > + int ret; > + > + listen_id = cm_id->context; > + atomic_inc(&listen_id->dev_remove); > + if (!cma_comp(listen_id, CMA_LISTEN)) { > + ret = -ECONNABORTED; > + goto out; > + } > + > + /* Create a new RDMA id the new IW CM ID */ > + new_cm_id = rdma_create_id(listen_id->id.event_handler, > + listen_id->id.context); > + if (!new_cm_id) { > + ret = -ENOMEM; > + goto out; > + } > + conn_id = container_of(new_cm_id, struct rdma_id_private, id); > + atomic_inc(&conn_id->dev_remove); > + conn_id->state = CMA_CONNECT; > + > + /* New connection inherits device from parent */ > + cma_attach_to_dev(conn_id, listen_id->cma_dev); cma_attach_to_dev doesn't provide synchronization around cma_dev->id_list. Access to that list needs to be protected with 'mutex'. Other than that, I think that this works fine. > @@ -785,8 +950,9 @@ > goto out; > > list_add_tail(&id_priv->list, &listen_any_list); > - list_for_each_entry(cma_dev, &dev_list, list) > + list_for_each_entry(cma_dev, &dev_list, list) { > cma_listen_on_dev(id_priv, cma_dev); > + } Please drop the extra braces. > @@ -796,7 +962,6 @@ > { > struct rdma_id_private *id_priv; > int ret; > - > id_priv = container_of(id, struct rdma_id_private, id); > if (!cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_LISTEN)) > return -EINVAL; Please keep the blank line. > @@ -890,6 +1058,30 @@ > +static int cma_resolve_iw_route(struct rdma_id_private *id_priv, int timeout_ms) > +{ > + enum rdma_cm_event_type event = RDMA_CM_EVENT_ROUTE_RESOLVED; > + int rc; Please use 'ret' instead of 'rc' to match the rest of the code. > + > + atomic_inc(&id_priv->dev_remove); > + > + if (!cma_comp_exch(id_priv, CMA_ROUTE_QUERY, CMA_ROUTE_RESOLVED)) > + BUG_ON(1); The device associated with the id could have been removed while the user was in the process of making this call. We should simply fail the call here. > + > + rc = cma_notify_user(id_priv, event, 0, NULL, 0); > + if (rc) { > + cma_exch(id_priv, CMA_DESTROYING); > + cma_release_remove(id_priv); > + cma_deref_id(id_priv); > + rdma_destroy_id(&id_priv->id); > + return rc; > + } The callback needs to come from another thread other than the one that the user called down with. Calling the user back in their own thread can make it difficult for them to provide synchronization. You can use the rdma_wq that's been exposed in ib_addr. Also, the user is likely to call this routine from within a CMA callback (such as after ib_resolve_addr), so deadlock will occur if you try to destroy the id. 
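For example, something along these lines (an untested sketch; the work field
on rdma_id_private and the handler name are made up for illustration, and it
assumes the usual 3-argument INIT_WORK of current kernels):

	/* Runs on rdma_wq, so the user callback never happens on the
	 * thread that called down, and destroying the id is safe. */
	static void cma_iw_route_handler(void *context)
	{
		struct rdma_id_private *id_priv = context;

		if (cma_notify_user(id_priv, RDMA_CM_EVENT_ROUTE_RESOLVED,
				    0, NULL, 0)) {
			cma_exch(id_priv, CMA_DESTROYING);
			cma_release_remove(id_priv);
			cma_deref_id(id_priv);
			rdma_destroy_id(&id_priv->id);
			return;
		}
		cma_release_remove(id_priv);
		cma_deref_id(id_priv);
	}

	/* ...and in cma_resolve_iw_route(), instead of notifying inline: */
	INIT_WORK(&id_priv->work, cma_iw_route_handler, id_priv);
	queue_work(rdma_wq, &id_priv->work);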
> + > + cma_release_remove(id_priv); > + cma_deref_id(id_priv); > + return rc; > +} > + > int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms) > { > struct rdma_id_private *id_priv; > @@ -952,20 +1147,133 @@ > + > +/* Find the local interface with a route to the specified address and > + * bind the CM ID to this interface's CMA device > + */ > +static int cma_acquire_iw_dev(struct rdma_cm_id* id, struct sockaddr* addr) > +{ > + int ret = -ENOENT; > + struct cma_device* cma_dev; > + struct rdma_id_private *id_priv; > + struct sockaddr_in* sin; > + struct rtable *rt = 0; > + struct flowi fl; > + struct net_device* netdev; > + struct in_addr src_ip; > + unsigned char* dev_addr; > + > + sin = (struct sockaddr_in*)addr; > + if (sin->sin_family != AF_INET) > + return -EINVAL; > + > + id_priv = container_of(id, struct rdma_id_private, id); > + > + /* If the address is local, use the device. If it is remote, > + * look up a route to get the local address > + */ > + netdev = ip_dev_find(sin->sin_addr.s_addr); > + if (netdev) { > + src_ip = sin->sin_addr; > + dev_addr = netdev->dev_addr; > + dev_put(netdev); > + } else { > + memset(&fl, 0, sizeof(fl)); > + fl.nl_u.ip4_u.daddr = sin->sin_addr.s_addr; > + if (ip_route_output_key(&rt, &fl)) { > + return -ENETUNREACH; > + } > + dev_addr = rt->idev->dev->dev_addr; > + src_ip.s_addr = rt->rt_src; > + > + ip_rt_put(rt); > + } Can we push the above code into ib_addr? > + down(&mutex); > + > + list_for_each_entry(cma_dev, &dev_list, list) { > + if (memcmp(dev_addr, > + &cma_dev->node_guid, > + sizeof(cma_dev->node_guid)) == 0) { > + /* If we find the device, then check if this > + * is an iWARP device. If it is, then call the > + * callback handler immediately because we > + * already have the native address > + */ I'm not following this comment. What callback is being invoked? > + if (cma_dev->device->node_type == IB_NODE_RNIC) { > + struct sockaddr_in* cm_sin; > + /* Set our source address */ > + cm_sin = (struct sockaddr_in*) > + &id_priv->id.route.addr.src_addr; > + cm_sin->sin_family = AF_INET; > + cm_sin->sin_addr.s_addr = src_ip.s_addr; > + > + /* Claim the device in the mutex */ > + cma_attach_to_dev(id_priv, cma_dev); > + ret = 0; > + break; > + } > + } > + } > + up(&mutex); > + > + return ret; > +} I'd like to see if it's possible to merge this call with cma_acquire_ib_dev() and create a new routine, cma_acquire_dev() that can walk the device list and check for both. Maybe if ib_addr returned the full hardware address, along with a device type it might be possible. (Although the likelihood of a hardware address collision seems near impossible.) 
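Roughly like this (an illustrative sketch only, assuming ib_addr can hand
back the raw hardware address for either device type):

	static int cma_acquire_dev(struct rdma_id_private *id_priv,
				   unsigned char *dev_addr)
	{
		struct cma_device *cma_dev;
		int ret = -ENOENT;

		down(&mutex);
		list_for_each_entry(cma_dev, &dev_list, list) {
			/* node_guid doubles as the hardware address for
			 * both IB HCAs and RNICs in this comparison. */
			if (!memcmp(dev_addr, &cma_dev->node_guid,
				    sizeof(cma_dev->node_guid))) {
				cma_attach_to_dev(id_priv, cma_dev);
				ret = 0;
				break;
			}
		}
		up(&mutex);
		return ret;
	}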
> int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr, > struct sockaddr *dst_addr, int timeout_ms) > { > struct rdma_id_private *id_priv; > - int ret; > + int ret = 0; > > id_priv = container_of(id, struct rdma_id_private, id); > if (!cma_comp_exch(id_priv, CMA_IDLE, CMA_ADDR_QUERY)) > return -EINVAL; > > atomic_inc(&id_priv->refcount); > + > id->route.addr.dst_addr = *dst_addr; > - ret = ib_resolve_addr(src_addr, dst_addr, &id->route.addr.addr.ibaddr, > - timeout_ms, addr_handler, id_priv); > + > + if (cma_acquire_iw_dev(id, dst_addr)==0) { > + > + enum rdma_cm_event_type event; > + > + cma_exch(id_priv, CMA_ADDR_RESOLVED); > + > + atomic_inc(&id_priv->dev_remove); > + > + event = RDMA_CM_EVENT_ADDR_RESOLVED; > + if (cma_notify_user(id_priv, event, 0, NULL, 0)) { > + cma_exch(id_priv, CMA_DESTROYING); > + cma_deref_id(id_priv); > + cma_release_remove(id_priv); > + rdma_destroy_id(&id_priv->id); > + return -EINVAL; > + } Similar to other comments. Callbacks should be scheduled to a separate thread. The behavior is also slightly different. The IB code will return a source IP address that may be used to connect to the destination address if one is not given. This is needed in order to perform the reverse resolution on the remote side. > + cma_release_remove(id_priv); > + cma_deref_id(id_priv); > + > + } else { > + > + ret = ib_resolve_addr(src_addr, > + dst_addr, &id->route.addr.addr.ibaddr, > + timeout_ms, addr_handler, id_priv); We might be able to make this call generic by replacing ibaddr with source and destination hardware addresses. Users that really want to know what 'gid' they're on could be given functions that extract the gid and pkey from the hardware addresses. I.e. struct ib_addr could be renamed to struct rdma_addr and contain source and destination hardware addresses. > @@ -980,10 +1288,13 @@ > int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr) > { > struct rdma_id_private *id_priv; > + struct sockaddr_in* sin; > struct ib_addr *ibaddr = &id->route.addr.addr.ibaddr; > int ret; > > - if (addr->sa_family != AF_INET) > + sin = (struct sockaddr_in*)addr; > + > + if (sin->sin_family != AF_INET) > return -EINVAL; Please remove this change. Right now, the check's only there because the code does not fully support IPv6 addressing. > int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) > { > struct rdma_id_private *id_priv; > int ret; > > id_priv = container_of(id, struct rdma_id_private, id); > - if (!cma_comp_exch(id_priv, CMA_ROUTE_RESOLVED, CMA_CONNECT)) > + if (!cma_comp_exch(id_priv, CMA_ROUTE_RESOLVED, CMA_CONNECT)) Please undo extra white space at the end of the line. > @@ -1190,7 +1551,6 @@ > { > struct rdma_id_private *id_priv; > int ret; > - Please add blank line back in. 
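To make the rdma_addr idea above concrete, the renamed structure might look
something like this (field and helper names are illustrative only, not an
actual patch):

	struct rdma_addr {
		struct sockaddr src_addr;
		struct sockaddr dst_addr;
		unsigned char   src_dev_addr[MAX_ADDR_LEN];
		unsigned char   dst_dev_addr[MAX_ADDR_LEN];
	};

	/* IB-aware users could still extract the GID, e.g. from an
	 * IPoIB-style hardware address where the GID starts at byte 4: */
	static inline void rdma_addr_get_sgid(struct rdma_addr *addr,
					      union ib_gid *gid)
	{
		memcpy(gid, addr->src_dev_addr + 4, sizeof *gid);
	}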
> id_priv = container_of(id, struct rdma_id_private, id); > if (!cma_comp(id_priv, CMA_CONNECT)) > return -EINVAL; - Sean From nacc at us.ibm.com Thu Dec 29 17:31:28 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Thu, 29 Dec 2005 17:31:28 -0800 Subject: [openib-general] ISER fails to build on 2.6.15-rc7-git3 (svn=4654) Message-ID: <20051230013128.GB8111@us.ibm.com> Hi, Building 2.6.15-rc7-git3 on ppc64 with svn=4654 kernel components leads to: drivers/infiniband/ulp/iser/iscsi_iser.c: In function `iscsi_iser_conn_set_param': drivers/infiniband/ulp/iser/iscsi_iser.c:1436: error: `ISCSI_PARAM_RDMAEXTENSIONS' undeclared (first use in this function) drivers/infiniband/ulp/iser/iscsi_iser.c:1436: error: (Each undeclared identifier is reported only once drivers/infiniband/ulp/iser/iscsi_iser.c:1436: error: for each function it appears in.) drivers/infiniband/ulp/iser/iscsi_iser.c: In function `iscsi_iser_conn_get_param': drivers/infiniband/ulp/iser/iscsi_iser.c:1496: error: `ISCSI_PARAM_RDMAEXTENSIONS' undeclared (first use in this function) drivers/infiniband/ulp/iser/iscsi_iser.c: At top level: drivers/infiniband/ulp/iser/iscsi_iser.c:1634: error: unknown field `af' specified in initializer drivers/infiniband/ulp/iser/iscsi_iser.c:1634: warning: initialization makes pointer from integer without a cast drivers/infiniband/ulp/iser/iscsi_iser.c:1635: error: unknown field `rdma' specified in initializer make[3]: *** [drivers/infiniband/ulp/iser/iscsi_iser.o] Error 1 make[2]: *** [drivers/infiniband/ulp/iser] Error 2 make[1]: *** [drivers/infiniband] Error 2 make: *** [drivers] Error 2 Thanks, Nish From nacc at us.ibm.com Thu Dec 29 17:36:06 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Thu, 29 Dec 2005 17:36:06 -0800 Subject: [openib-general] ISER fails to build on 2.6.15-rc7-git3 (svn=4654) In-Reply-To: <20051230013128.GB8111@us.ibm.com> References: <20051230013128.GB8111@us.ibm.com> Message-ID: <20051230013606.GC8111@us.ibm.com> On 29.12.2005 [17:31:28 -0800], Nishanth Aravamudan wrote: > Hi, > > Building 2.6.15-rc7-git3 on ppc64 with svn=4654 kernel components leads > to: > > drivers/infiniband/ulp/iser/iscsi_iser.c: In function `iscsi_iser_conn_set_param': > drivers/infiniband/ulp/iser/iscsi_iser.c:1436: error: `ISCSI_PARAM_RDMAEXTENSIONS' undeclared (first use in this function) > drivers/infiniband/ulp/iser/iscsi_iser.c:1436: error: (Each undeclared identifier is reported only once > drivers/infiniband/ulp/iser/iscsi_iser.c:1436: error: for each function it appears in.) 
> drivers/infiniband/ulp/iser/iscsi_iser.c: In function `iscsi_iser_conn_get_param': > drivers/infiniband/ulp/iser/iscsi_iser.c:1496: error: `ISCSI_PARAM_RDMAEXTENSIONS' undeclared (first use in this function) > drivers/infiniband/ulp/iser/iscsi_iser.c: At top level: > drivers/infiniband/ulp/iser/iscsi_iser.c:1634: error: unknown field `af' specified in initializer > drivers/infiniband/ulp/iser/iscsi_iser.c:1634: warning: initialization makes pointer from integer without a cast > drivers/infiniband/ulp/iser/iscsi_iser.c:1635: error: unknown field `rdma' specified in initializer > make[3]: *** [drivers/infiniband/ulp/iser/iscsi_iser.o] Error 1 > make[2]: *** [drivers/infiniband/ulp/iser] Error 2 > make[1]: *** [drivers/infiniband] Error 2 > make: *** [drivers] Error 2 Forgot one detail, this is with all relevant CONFIG_INFINIBAND options as modules: CONFIG_INFINIBAND_ISER=m CONFIG_INFINIBAND_IPOIB=m CONFIG_INFINIBAND_SRP=m CONFIG_INFINIBAND_SDP=m CONFIG_INFINIBAND_SDP_SEND_ZCOPY=m CONFIG_INFINIBAND_SDP_RECV_ZCOPY=m CONFIG_KDAPL_INFINIBAND=m CONFIG_KDAPL=m CONFIG_INFINIBAND_EHCA=m CONFIG_IPATH_CORE=m CONFIG_IPATH_ETHER=m CONFIG_INFINIBAND_IPATH=m CONFIG_INFINIBAND_MTHCA=m CONFIG_INFINIBAND=m CONFIG_INFINIBAND_USER_MAD=m CONFIG_INFINIBAND_USER_ACCESS=m CONFIG_MODULES=y Thanks, Nish From johann at pathscale.com Thu Dec 29 18:56:27 2005 From: johann at pathscale.com (Johann George) Date: Thu, 29 Dec 2005 18:56:27 -0800 Subject: [openib-general] PathScale license In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F11421DE@NT-SJCA-0751.brcm.ad.broadcom.com> References: <54AD0F12E08D1541B826BE97C98F99F11421DE@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <20051230025627.GA2706@cuprite.internal.keyresearch.com> > The plain reading of pathscale's license grants an unencumbered license > *to the code*. It merely refrains from waving any related hardware rights. That is exactly PathScale's intentions. > The code is not being restricted to work only with the patented hardware, > correct? Absolutely. PathScale is not trying to restrict use of the code in any way. Johann From bos at pathscale.com Thu Dec 29 19:17:29 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 29 Dec 2005 19:17:29 -0800 Subject: [openib-general] Re: [PATCH 0 of 20] [RFC] ipath - PathScale InfiniPath driver In-Reply-To: <200512291901.jBTJ1rOm017519@laptop11.inf.utfsm.cl> References: <200512291901.jBTJ1rOm017519@laptop11.inf.utfsm.cl> Message-ID: <1135912649.7790.11.camel@localhost.localdomain> On Thu, 2005-12-29 at 16:01 -0300, Horst von Brand wrote: > > - Renamed _BITS_PER_BYTE to BITS_PER_BYTE, and moved it into > > linux/types.h > Haven't come across anything with this not 8 for a /long/ time now, and no > Linux on that in sight. The point isn't that it might change, but that it makes code clearer to use BITS_PER_BYTE in arithmetic than to have the magic number 8 sprinkled around mysteriously. References: <584777b6f4dc5269fa89.1135816296@eng-12.pathscale.com> <84144f020512291124sd895dfbp87ca9fd75552d671@mail.gmail.com> Message-ID: <1135912781.7790.15.camel@localhost.localdomain> On Thu, 2005-12-29 at 21:24 +0200, Pekka Enberg wrote: > [Copy-paste reuse alert!] Yep, thanks for pointing that out. 
The source file in question is about to go on a serious diet :-) References: <20051230013128.GB8111@us.ibm.com> Message-ID: <20051230033131.GA3608@cuprite.internal.keyresearch.com> > Building 2.6.15-rc7-git3 on ppc64 with svn=4654 kernel components leads > to: > > drivers/infiniband/ulp/iser/iscsi_iser.c: In function `iscsi_iser_conn_set_param': Also having trouble compiling iser on revision 4655 under Linux 2.6.14.4 on x86_64. Some of the errors: In file included from drivers/infiniband/ulp/iser/iser.h:47, from drivers/infiniband/ulp/iser/iser_mod.c:52: drivers/infiniband/ulp/iser/iscsi_iser.h:27:30: scsi/iscsi_proto.h: No such file or directory In file included from drivers/infiniband/ulp/iser/iser.h:47, from drivers/infiniband/ulp/iser/iser_mod.c:52: drivers/infiniband/ulp/iser/iscsi_iser.h:216: error: field `tmhdr' has incomplete type drivers/infiniband/ulp/iser/iscsi_iser.h:240: error: field `hdr' has incomplete type Johann From halr at voltaire.com Thu Dec 29 20:29:10 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Dec 2005 23:29:10 -0500 Subject: [openib-general] ISER fails to build on 2.6.15-rc7-git3 (svn=4654) In-Reply-To: <20051230013128.GB8111@us.ibm.com> References: <20051230013128.GB8111@us.ibm.com> Message-ID: <1135916950.4331.618.camel@hal.voltaire.com> Hi Nish, On Thu, 2005-12-29 at 20:31, Nishanth Aravamudan wrote: > Hi, > > Building 2.6.15-rc7-git3 on ppc64 with svn=4654 kernel components leads > to: > > drivers/infiniband/ulp/iser/iscsi_iser.c: In function `iscsi_iser_conn_set_param': > drivers/infiniband/ulp/iser/iscsi_iser.c:1436: error: `ISCSI_PARAM_RDMAEXTENSIONS' undeclared (first use in this function) > drivers/infiniband/ulp/iser/iscsi_iser.c:1436: error: (Each undeclared identifier is reported only once > drivers/infiniband/ulp/iser/iscsi_iser.c:1436: error: for each function it appears in.) > drivers/infiniband/ulp/iser/iscsi_iser.c: In function `iscsi_iser_conn_get_param': > drivers/infiniband/ulp/iser/iscsi_iser.c:1496: error: `ISCSI_PARAM_RDMAEXTENSIONS' undeclared (first use in this function) > drivers/infiniband/ulp/iser/iscsi_iser.c: At top level: > drivers/infiniband/ulp/iser/iscsi_iser.c:1634: error: unknown field `af' specified in initializer > drivers/infiniband/ulp/iser/iscsi_iser.c:1634: warning: initialization makes pointer from integer without a cast > drivers/infiniband/ulp/iser/iscsi_iser.c:1635: error: unknown field `rdma' specified in initializer > make[3]: *** [drivers/infiniband/ulp/iser/iscsi_iser.o] Error 1 > make[2]: *** [drivers/infiniband/ulp/iser] Error 2 > make[1]: *** [drivers/infiniband] Error 2 > make: *** [drivers] Error 2 There is an iscsi patch required for this as iser requires an open-iscsi version which is subsequent to what is in 2.6.15-rc7-git3. I'm not sure the best way to handle this yet as the build is different for 2.6.14 which does not contain open-iscsi. 
-- Hal From halr at voltaire.com Thu Dec 29 20:33:01 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Dec 2005 23:33:01 -0500 Subject: [openib-general] ISER fails to build on 2.6.15-rc7-git3 (svn=4654) In-Reply-To: <20051230033131.GA3608@cuprite.internal.keyresearch.com> References: <20051230013128.GB8111@us.ibm.com> <20051230033131.GA3608@cuprite.internal.keyresearch.com> Message-ID: <1135917181.4331.628.camel@hal.voltaire.com> Hi Johann, On Thu, 2005-12-29 at 22:31, Johann George wrote: > > Building 2.6.15-rc7-git3 on ppc64 with svn=4654 kernel components leads > > to: > > > > drivers/infiniband/ulp/iser/iscsi_iser.c: In function `iscsi_iser_conn_set_param': > > Also having trouble compiling iser on revision 4655 under Linux 2.6.14.4 on > x86_64. Some of the errors: > > In file included from drivers/infiniband/ulp/iser/iser.h:47, > from drivers/infiniband/ulp/iser/iser_mod.c:52: > drivers/infiniband/ulp/iser/iscsi_iser.h:27:30: scsi/iscsi_proto.h: No such file or directory > In file included from drivers/infiniband/ulp/iser/iser.h:47, > from drivers/infiniband/ulp/iser/iser_mod.c:52: > drivers/infiniband/ulp/iser/iscsi_iser.h:216: error: field `tmhdr' has incomplete type > drivers/infiniband/ulp/iser/iscsi_iser.h:240: error: field `hdr' has incomplete type This is different from Nish's issue in that open-iscsi is not part of 2.6.14 but will be in 2.6.15. I will post updated directions for this on the iser wiki tomorrow. Can you give this another shot after that ? Thanks. -- Hal From nacc at us.ibm.com Thu Dec 29 21:04:21 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Thu, 29 Dec 2005 21:04:21 -0800 Subject: [openib-general] ISER fails to build on 2.6.15-rc7-git3 (svn=4654) In-Reply-To: <1135916950.4331.618.camel@hal.voltaire.com> References: <20051230013128.GB8111@us.ibm.com> <1135916950.4331.618.camel@hal.voltaire.com> Message-ID: <20051230050421.GA6431@us.ibm.com> On 29.12.2005 [23:29:10 -0500], Hal Rosenstock wrote: > Hi Nish, > > On Thu, 2005-12-29 at 20:31, Nishanth Aravamudan wrote: > > Hi, > > > > Building 2.6.15-rc7-git3 on ppc64 with svn=4654 kernel components leads > > to: > > > > drivers/infiniband/ulp/iser/iscsi_iser.c: In function `iscsi_iser_conn_set_param': > > drivers/infiniband/ulp/iser/iscsi_iser.c:1436: error: `ISCSI_PARAM_RDMAEXTENSIONS' undeclared (first use in this function) > > drivers/infiniband/ulp/iser/iscsi_iser.c:1436: error: (Each undeclared identifier is reported only once > > drivers/infiniband/ulp/iser/iscsi_iser.c:1436: error: for each function it appears in.) > > drivers/infiniband/ulp/iser/iscsi_iser.c: In function `iscsi_iser_conn_get_param': > > drivers/infiniband/ulp/iser/iscsi_iser.c:1496: error: `ISCSI_PARAM_RDMAEXTENSIONS' undeclared (first use in this function) > > drivers/infiniband/ulp/iser/iscsi_iser.c: At top level: > > drivers/infiniband/ulp/iser/iscsi_iser.c:1634: error: unknown field `af' specified in initializer > > drivers/infiniband/ulp/iser/iscsi_iser.c:1634: warning: initialization makes pointer from integer without a cast > > drivers/infiniband/ulp/iser/iscsi_iser.c:1635: error: unknown field `rdma' specified in initializer > > make[3]: *** [drivers/infiniband/ulp/iser/iscsi_iser.o] Error 1 > > make[2]: *** [drivers/infiniband/ulp/iser] Error 2 > > make[1]: *** [drivers/infiniband] Error 2 > > make: *** [drivers] Error 2 > > There is an iscsi patch required for this as iser requires an open-iscsi > version which is subsequent to what is in 2.6.15-rc7-git3. 
I'm not sure > the best way to handle this yet as the build is different for 2.6.14 > which does not contain open-iscsi. Where can I find this patch? I can temporarily add it to the build-path for the svn-based builds, until a better solution is found. Thanks, Nish From johann at pathscale.com Thu Dec 29 21:06:03 2005 From: johann at pathscale.com (Johann George) Date: Thu, 29 Dec 2005 21:06:03 -0800 Subject: [openib-general] ISER fails to build on 2.6.15-rc7-git3 (svn=4654) In-Reply-To: <1135917181.4331.628.camel@hal.voltaire.com> References: <20051230013128.GB8111@us.ibm.com> <20051230033131.GA3608@cuprite.internal.keyresearch.com> <1135917181.4331.628.camel@hal.voltaire.com> Message-ID: <20051230050603.GA5879@cuprite.internal.keyresearch.com> > I will post updated directions for this on the iser wiki tomorrow. Can > you give this another shot after that ? Certainly. Thanks, Hal. Johann From halr at voltaire.com Fri Dec 30 04:26:34 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Dec 2005 07:26:34 -0500 Subject: [openib-general] ISER fails to build on 2.6.15-rc7-git3 (svn=4654) In-Reply-To: <20051230050421.GA6431@us.ibm.com> References: <20051230013128.GB8111@us.ibm.com> <1135916950.4331.618.camel@hal.voltaire.com> <20051230050421.GA6431@us.ibm.com> Message-ID: <1135945593.4331.1109.camel@hal.voltaire.com> Hi Nish, On Fri, 2005-12-30 at 00:04, Nishanth Aravamudan wrote: > On 29.12.2005 [23:29:10 -0500], Hal Rosenstock wrote: > > Hi Nish, > > > > On Thu, 2005-12-29 at 20:31, Nishanth Aravamudan wrote: > > > Hi, > > > > > > Building 2.6.15-rc7-git3 on ppc64 with svn=4654 kernel components leads > > > to: > > > > > > drivers/infiniband/ulp/iser/iscsi_iser.c: In function `iscsi_iser_conn_set_param': > > > drivers/infiniband/ulp/iser/iscsi_iser.c:1436: error: `ISCSI_PARAM_RDMAEXTENSIONS' undeclared (first use in this function) > > > drivers/infiniband/ulp/iser/iscsi_iser.c:1436: error: (Each undeclared identifier is reported only once > > > drivers/infiniband/ulp/iser/iscsi_iser.c:1436: error: for each function it appears in.) > > > drivers/infiniband/ulp/iser/iscsi_iser.c: In function `iscsi_iser_conn_get_param': > > > drivers/infiniband/ulp/iser/iscsi_iser.c:1496: error: `ISCSI_PARAM_RDMAEXTENSIONS' undeclared (first use in this function) > > > drivers/infiniband/ulp/iser/iscsi_iser.c: At top level: > > > drivers/infiniband/ulp/iser/iscsi_iser.c:1634: error: unknown field `af' specified in initializer > > > drivers/infiniband/ulp/iser/iscsi_iser.c:1634: warning: initialization makes pointer from integer without a cast > > > drivers/infiniband/ulp/iser/iscsi_iser.c:1635: error: unknown field `rdma' specified in initializer > > > make[3]: *** [drivers/infiniband/ulp/iser/iscsi_iser.o] Error 1 > > > make[2]: *** [drivers/infiniband/ulp/iser] Error 2 > > > make[1]: *** [drivers/infiniband] Error 2 > > > make: *** [drivers] Error 2 > > > > There is an iscsi patch required for this as iser requires an open-iscsi > > version which is subsequent to what is in 2.6.15-rc7-git3. I'm not sure > > the best way to handle this yet as the build is different for 2.6.14 > > which does not contain open-iscsi. > > Where can I find this patch? I can temporarily add it to the build-path > for the svn-based builds, until a better solution is found. I am attaching the patch for this. Note that this patch is for 2.6.15-rc and not 2.6.14 variants. It has been tested with 2.6.15-rc6. Please let me know if it works for you. Thanks. 
-- Hal diff -ru -x '*.o*' -x '*.ko' linux-2.6.15-rc7/drivers/scsi/iscsi_tcp.c linux-2.6.15-rc7-iser/drivers/scsi/iscsi_tcp.c --- linux-2.6.15-rc7/drivers/scsi/iscsi_tcp.c 2005-12-28 18:21:13.000000000 +0200 +++ linux-2.6.15-rc7-iser/drivers/scsi/iscsi_tcp.c 2005-12-29 08:55:55.000000000 +0200 @@ -3590,6 +3590,8 @@ .name = "tcp", .caps = CAP_RECOVERY_L0 | CAP_MULTI_R2T | CAP_HDRDGST | CAP_DATADGST, + .af = AF_INET, + .rdma = 0, .host_template = &iscsi_sht, .hostdata_size = sizeof(struct iscsi_session), .max_conn = 1, diff -ru -x '*.o*' -x '*.ko' linux-2.6.15-rc7/drivers/scsi/scsi_transport_iscsi.c linux-2.6.15-rc7-iser/drivers/scsi/scsi_transport_iscsi.c --- linux-2.6.15-rc7/drivers/scsi/scsi_transport_iscsi.c 2005-12-28 18:21:13.000000000 +0200 +++ linux-2.6.15-rc7-iser/drivers/scsi/scsi_transport_iscsi.c 2005-12-29 08:57:44.000000000 +0200 @@ -117,6 +117,8 @@ show_transport_attr(max_lun, "%d"); show_transport_attr(max_conn, "%d"); show_transport_attr(max_cmd_len, "%d"); +show_transport_attr(af, "%d"); +show_transport_attr(rdma, "%d"); static struct attribute *iscsi_transport_attrs[] = { &class_device_attr_handle.attr, @@ -124,6 +126,8 @@ &class_device_attr_max_lun.attr, &class_device_attr_max_conn.attr, &class_device_attr_max_cmd_len.attr, + &class_device_attr_af.attr, + &class_device_attr_rdma.attr, NULL, }; diff -ru -x '*.o*' -x '*.ko' linux-2.6.15-rc7/include/scsi/iscsi_if.h linux-2.6.15-rc7-iser/include/scsi/iscsi_if.h --- linux-2.6.15-rc7/include/scsi/iscsi_if.h 2005-12-28 18:21:14.000000000 +0200 +++ linux-2.6.15-rc7-iser/include/scsi/iscsi_if.h 2005-12-29 09:02:43.000000000 +0200 @@ -160,8 +160,9 @@ ISCSI_PARAM_ERL = 11, ISCSI_PARAM_IFMARKER_EN = 12, ISCSI_PARAM_OFMARKER_EN = 13, + ISCSI_PARAM_RDMAEXTENSIONS = 14, }; -#define ISCSI_PARAM_MAX 14 +#define ISCSI_PARAM_MAX 15 typedef uint64_t iscsi_sessionh_t; /* iSCSI Data-Path session handle */ typedef uint64_t iscsi_connh_t; /* iSCSI Data-Path connection handle */ diff -ru -x '*.o*' -x '*.ko' linux-2.6.15-rc7/include/scsi/scsi_transport_iscsi.h linux-2.6.15-rc7-iser/include/scsi/scsi_transport_iscsi.h --- linux-2.6.15-rc7/include/scsi/scsi_transport_iscsi.h 2005-12-28 18:21:14.000000000 +0200 +++ linux-2.6.15-rc7-iser/include/scsi/scsi_transport_iscsi.h 2005-12-29 09:00:18.000000000 +0200 @@ -30,6 +30,8 @@ * * @name: transport name * @caps: iSCSI Data-Path capabilities + * @af socket address family + * @rdma indicates if transport supports RDMA operations * @create_session: create new iSCSI session object * @destroy_session: destroy existing iSCSI session object * @create_conn: create new iSCSI connection @@ -47,6 +49,8 @@ struct module *owner; char *name; unsigned int caps; + unsigned short af; + unsigned short rdma; struct scsi_host_template *host_template; int hostdata_size; int max_lun; From halr at voltaire.com Fri Dec 30 04:36:40 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Dec 2005 07:36:40 -0500 Subject: [openib-general] ISER fails to build on 2.6.15-rc7-git3 (svn=4654) In-Reply-To: <20051230050603.GA5879@cuprite.internal.keyresearch.com> References: <20051230013128.GB8111@us.ibm.com> <20051230033131.GA3608@cuprite.internal.keyresearch.com> <1135917181.4331.628.camel@hal.voltaire.com> <20051230050603.GA5879@cuprite.internal.keyresearch.com> Message-ID: <1135946199.4331.1131.camel@hal.voltaire.com> On Fri, 2005-12-30 at 00:06, Johann George wrote: > > I will post updated directions for this on the iser wiki tomorrow. Can > > you give this another shot after that ? > > Certainly. Thanks, Hal. 
This part is already dealt with in the OpenIB iser wiki: There is a patch
(linux-2.6.14-iscsi_includes.diff) in linux-kernel/patches which will get you
past the compile problem you reported.

The iSER wiki states:

When building with a 2.6.14 kernel, iSER requires open_iscsi header files
that are not part of the 2.6.14 kernel tree. The header files are available
at
https://openib.org/svn/gen2/trunk/src/linux-kernel/patches/linux-2.6.14-iscsi_includes.diff

When building with these headers but without actually building open_iscsi,
the build produces the following warnings:

*** Warning: "iscsi_unregister_transport" [drivers/infiniband/ulp/iser/ib_iser.ko] undefined!
*** Warning: "iscsi_recv_pdu" [drivers/infiniband/ulp/iser/ib_iser.ko] undefined!
*** Warning: "iscsi_conn_error" [drivers/infiniband/ulp/iser/ib_iser.ko] undefined!
*** Warning: "iscsi_register_transport" [drivers/infiniband/ulp/iser/ib_iser.ko] undefined!

These are normal. It means the module will not load since those symbols are
missing and need to be provided by open_iscsi's scsi_transport_iscsi.ko.

The 2.6.15-rc6 kernel and later include open_iscsi and the header files, but
require an iser patch (to be available soon). The open-iscsi code is
available at http://www.open-iscsi.org/.

Please let me know if this gets you to this point or something different.
Thanks.

I am considering adding more instructions on open-iscsi to the OpenIB iser
wiki as I think this will help with 2.6.14 and earlier kernels (although
open-iscsi is only supported at 2.6.11 and beyond).

-- Hal

From halr at voltaire.com Fri Dec 30 06:51:33 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 30 Dec 2005 09:51:33 -0500
Subject: [openib-general] Re: [PATCH] osm: support for trivial PKey manager
In-Reply-To: <074q4sfbm8.fsf@swlab25.yok.mtl.com>
References: <074q4sfbm8.fsf@swlab25.yok.mtl.com>
Message-ID: <1135954292.4331.1348.camel@hal.voltaire.com>

Hi again Ofer,

On Thu, 2005-12-29 at 05:20, Ofer Gigi wrote:
> Hi Hal,
> My name is Ofer Gigi, and I am a new software engineer in Mellanox
> working on OpenSM.
> This patch provides a new manager that solves the following problem:
>
> OpenSM is not currently compliant with the spec statement:
> C14.62.1.1 Table 183 p870 l34:
> "However, the SM shall ensure that one of the P_KeyTable entries in every
> node contains either the value 0xFFFF (the default P_Key, full membership)
> or the value 0x7FFF (the default P_Key, partial membership)."
>
> Luckily, all IB devices come up from reset with a preconfigured 0xffff key.
> This was discovered during the last plugfest.
>
> To overcome this limitation I implemented a simple elementary PKey manager
> that will enforce the above rule (currently it adds 0xffff if missing).
>
> This additional manager would be used for a full PKey policy manager
> in the future.
>
> We have tested this patch.
>
> Thanks

Thanks. Applied. Some mechanical comments below (and also embedded).

The general rule is one thought per patch. osm_indent is separate from this.

Please try to ensure there is no extra whitespace at the end of the lines.
There were several places where it was present.

-- Hal

> Ofer G.
>
> Signed-off-by: Ofer Gigi

[snip...]
> Index: opensm/osm_state_mgr.c > =================================================================== > --- opensm/osm_state_mgr.c (revision 4651) > +++ opensm/osm_state_mgr.c (working copy) > @@ -2216,9 +2219,11 @@ osm_state_mgr_process( > } > } > } > + > /* Need to continue with lid assigning */ > osm_drop_mgr_process( p_mgr->p_drop_mgr ); > - p_mgr->state = OSM_SM_STATE_SET_SM_UCAST_LID; > + > + p_mgr->state = OSM_SM_STATE_SET_PKEY; > > /* > * If we are not MASTER already - this means that we are > @@ -2229,6 +2234,62 @@ osm_state_mgr_process( > osm_sm_state_mgr_process( p_mgr->p_sm_state_mgr, > OSM_SM_SIGNAL_DISCOVERY_COMPLETED ); > > + /* signal = osm_lid_mgr_process_sm( p_mgr->p_lid_mgr ); */ Why add this commented out line ? [I think this was also in one other place as well.] > + /* the returned signal might be DONE or DONE_PENDING */ > + signal = osm_pkey_mgr_process( p_mgr->p_pkey_mgr ); > + break; > + > + default: > + __osm_state_mgr_signal_error( p_mgr, signal ); > + signal = OSM_SIGNAL_NONE; > + break; > + } > + break; > + [snip...] > Index: opensm/osm_indent > =================================================================== > --- opensm/osm_indent (revision 4651) > +++ opensm/osm_indent (working copy) > @@ -63,8 +63,8 @@ > # -i3 Substitute indent with 3 spaces > # -npcs No space after procedure calls > # -prs Space after parenthesis > -# -nsai No space after if keyword > -# -nsaw No space after while keyword > +# -nsai No space after if keyword - removed > +# -nsaw No space after while keyword - removed Should these comments just be removed ? > # -sc Put * at left of comments in a block comment style > # -nsob Don't swallow unnecessary blank lines > # -ts3 Tab size is 3 > @@ -81,7 +81,7 @@ for sourcefile in $*; do > perl -piW -e's/\x0D//' "$sourcefile" > echo Processing $sourcefile > indent -bad -bap -bbb -nbbo -bl -bli0 -bls -cbi0 -ci3 -cli0 -ncs \ > - -hnl -i3 -npcs -prs -nsai -nsaf -nsaw -sc -nsob -ts3 -psl -bfda -nut $sourcefile > + -hnl -i3 -npcs -prs -sc -nsob -ts3 -psl -bfda -nut $sourcefile > > rm ${sourcefile}W > From brilong at cisco.com Fri Dec 30 07:05:54 2005 From: brilong at cisco.com (Brian Long) Date: Fri, 30 Dec 2005 10:05:54 -0500 Subject: [openib-general] Kickstart over OpenIB? Message-ID: <1135955154.5243.34.camel@brilong-lnx> Hello, I would like to know how difficult it would be to modify kickstart such that it would work over Infiniband. I've asked the Red Hat anaconda developers about this and, as you can see in the attached email, they believe IB only accepts netlink and no ioctls for network setup. Is this true? I am a system admin testing OpenIB on RHEL 4 Update 3 beta which was released a few days ago. I was able to kickstart over my standard Ethernet and all the IB drivers are loaded upon boot. How difficult would it be to get Anaconda to be able to use an IB link for kickstart such that a host's only connection to the outside world was IB? Thanks! /Brian/ -- Brian Long | | | IT Data Center Systems | .|||. .|||. Cisco Linux Developer | ..:|||||||:...:|||||||:.. Phone: (919) 392-7363 | C i s c o S y s t e m s -------------- next part -------------- An embedded message was scrubbed... From: Jeremy Katz Subject: RE: Kickstart over OpenIB? 
Date: Tue, 20 Dec 2005 11:54:13 -0500 Size: 3096 URL: From tom at opengridcomputing.com Fri Dec 30 08:08:03 2005 From: tom at opengridcomputing.com (Tom Tucker) Date: Fri, 30 Dec 2005 10:08:03 -0600 Subject: [openib-general] PathScale license In-Reply-To: <20051230025627.GA2706@cuprite.internal.keyresearch.com> References: <54AD0F12E08D1541B826BE97C98F99F11421DE@NT-SJCA-0751.brcm.ad.broadcom.com> <20051230025627.GA2706@cuprite.internal.keyresearch.com> Message-ID: <1135958883.8578.47.camel@strider.opengridcomputing.com> Does the iPod manual that documents how to operate the volume give the user rights to the iPod interface patent? Unfortunately -- no, thus all the goofy knock-offs. EVERY hardware vendor has patents related to its interface and implementation. EVERY hardware vendor that submits open source documents, demonstrates, and utilizes that hardware interface. To presume that the vendor is summarily dismissing its rights to its related patents is -- well -- wrong. Let's suppose that I'm wrong. If there *really* is a legal risk, then *every* hardware vendor shares that risk and a clause should be added to the generic GPL template. Otherwise, it does nothing but confuse and expand the legal encumbrance users assume when they download -- and that is very very bad. Linux absolutely needs hardware vendors and therefore must be sensitive to protecting their legal rights, but the Linux community must also be diligent in keeping the open source legal landscape as simple and uniform as possible -- or the value of open source dies one clause at a time. Thu, 2005-12-29 at 18:56 -0800, Johann George wrote: > > The plain reading of pathscale's license grants an unencumbered license > > *to the code*. It merely refrains from waving any related hardware rights. > > That is exactly PathScale's intentions. > > > The code is not being restricted to work only with the patented hardware, > > correct? > > Absolutely. PathScale is not trying to restrict use of the code in any way. > > Johann > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From halr at voltaire.com Fri Dec 30 08:03:13 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Dec 2005 11:03:13 -0500 Subject: [openib-general] Some opensm/osm_vl15intf.c questions Message-ID: <1135958592.4331.1438.camel@hal.voltaire.com> Hi Eitan, In chasing an issue with a trap repress not being sent in a certain scenario, I stumbled across the following questions about opensm/osm_vl15intf.c. 1. 
osm_vl15_post increments qp0_mads_outstanding when a response is expected (rfifo) and not when unsolicited (ufifo) (what appears to be called unicasts): osm_vl15_post: if( p_madw->resp_expected == TRUE ) { cl_qlist_insert_tail( &p_vl->rfifo, (cl_list_item_t*)p_madw ); cl_atomic_inc( &p_vl->p_stats->qp0_mads_outstanding ); } else { cl_qlist_insert_tail( &p_vl->ufifo, (cl_list_item_t*)p_madw ); } osm_vl15_shutdown retires all outstanding MADs as follows: osm_vl15_shutdown: while ( p_madw != (osm_madw_t*)cl_qlist_end( &p_vl->ufifo ) ) { if( osm_log_is_active( p_vl->p_log, OSM_LOG_DEBUG ) ) { osm_log( p_vl->p_log, OSM_LOG_DEBUG, "osm_vl15_shutdown: " "Releasing Response p_madw = %p\n", p_madw ); } osm_mad_pool_put( p_mad_pool, p_madw ); cl_atomic_dec( &p_vl->p_stats->qp0_mads_outstanding ); p_madw = (osm_madw_t*)cl_qlist_remove_head( &p_vl->ufifo ); } Either post should increment qp0_mads_outstanding for unsolicited or shutdown shouldn't decrement it when removing from ufifo. If you agree, which should it be ? 2. In the case of a failure from osm_vendor_send, __osm_vl15_poller decrements qp0_mads_outstanding regardless of whether a response is expected. This is inconsistent with the increment. This leads me to believe that this should also be incremented for unsolicited (unicasts) as well as those for which responses are expected. Is this correct or am I missing something ? So my conclusion is that in osm_vl15_post, it should be: if( p_madw->resp_expected == TRUE ) { cl_qlist_insert_tail( &p_vl->rfifo, (cl_list_item_t*)p_madw ); } else { cl_qlist_insert_tail( &p_vl->ufifo, (cl_list_item_t*)p_madw ); } cl_atomic_inc( &p_vl->p_stats->qp0_mads_outstanding ); If you agree, I will generate a patch for this. Thanks. -- Hal From nacc at us.ibm.com Fri Dec 30 08:13:26 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Fri, 30 Dec 2005 08:13:26 -0800 Subject: [openib-general] ISER fails to build on 2.6.15-rc7-git3 (svn=4654) In-Reply-To: <1135945593.4331.1109.camel@hal.voltaire.com> References: <20051230013128.GB8111@us.ibm.com> <1135916950.4331.618.camel@hal.voltaire.com> <20051230050421.GA6431@us.ibm.com> <1135945593.4331.1109.camel@hal.voltaire.com> Message-ID: <20051230161326.GD6431@us.ibm.com> On 30.12.2005 [07:26:34 -0500], Hal Rosenstock wrote: > Hi Nish, > > On Fri, 2005-12-30 at 00:04, Nishanth Aravamudan wrote: > > On 29.12.2005 [23:29:10 -0500], Hal Rosenstock wrote: > > > Hi Nish, > > > > > > On Thu, 2005-12-29 at 20:31, Nishanth Aravamudan wrote: > > > > Hi, > > > > > > > > Building 2.6.15-rc7-git3 on ppc64 with svn=4654 kernel components leads > > > > to: > > > > > > > > drivers/infiniband/ulp/iser/iscsi_iser.c: In function `iscsi_iser_conn_set_param': > > > > drivers/infiniband/ulp/iser/iscsi_iser.c:1436: error: `ISCSI_PARAM_RDMAEXTENSIONS' undeclared (first use in this function) > > > > drivers/infiniband/ulp/iser/iscsi_iser.c:1436: error: (Each undeclared identifier is reported only once > > > > drivers/infiniband/ulp/iser/iscsi_iser.c:1436: error: for each function it appears in.) 
> > > > drivers/infiniband/ulp/iser/iscsi_iser.c: In function `iscsi_iser_conn_get_param': > > > > drivers/infiniband/ulp/iser/iscsi_iser.c:1496: error: `ISCSI_PARAM_RDMAEXTENSIONS' undeclared (first use in this function) > > > > drivers/infiniband/ulp/iser/iscsi_iser.c: At top level: > > > > drivers/infiniband/ulp/iser/iscsi_iser.c:1634: error: unknown field `af' specified in initializer > > > > drivers/infiniband/ulp/iser/iscsi_iser.c:1634: warning: initialization makes pointer from integer without a cast > > > > drivers/infiniband/ulp/iser/iscsi_iser.c:1635: error: unknown field `rdma' specified in initializer > > > > make[3]: *** [drivers/infiniband/ulp/iser/iscsi_iser.o] Error 1 > > > > make[2]: *** [drivers/infiniband/ulp/iser] Error 2 > > > > make[1]: *** [drivers/infiniband] Error 2 > > > > make: *** [drivers] Error 2 > > > > > > There is an iscsi patch required for this as iser requires an open-iscsi > > > version which is subsequent to what is in 2.6.15-rc7-git3. I'm not sure > > > the best way to handle this yet as the build is different for 2.6.14 > > > which does not contain open-iscsi. > > > > Where can I find this patch? I can temporarily add it to the build-path > > for the svn-based builds, until a better solution is found. > > I am attaching the patch for this. Note that this patch is for > 2.6.15-rc and not 2.6.14 variants. It has been tested with > 2.6.15-rc6. Please let me know if it works for you. Thanks. Great, I will start some jobs with the patch now and let you know. Thanks, Nish From nacc at us.ibm.com Fri Dec 30 09:23:04 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Fri, 30 Dec 2005 09:23:04 -0800 Subject: [openib-general] ISER fails to build on 2.6.15-rc7-git3 (svn=4654) In-Reply-To: <20051230161326.GD6431@us.ibm.com> References: <20051230013128.GB8111@us.ibm.com> <1135916950.4331.618.camel@hal.voltaire.com> <20051230050421.GA6431@us.ibm.com> <1135945593.4331.1109.camel@hal.voltaire.com> <20051230161326.GD6431@us.ibm.com> Message-ID: <20051230172304.GE6431@us.ibm.com> On 30.12.2005 [08:13:26 -0800], Nishanth Aravamudan wrote: > On 30.12.2005 [07:26:34 -0500], Hal Rosenstock wrote: > > Hi Nish, > > > > On Fri, 2005-12-30 at 00:04, Nishanth Aravamudan wrote: > > > On 29.12.2005 [23:29:10 -0500], Hal Rosenstock wrote: > > > > Hi Nish, > > > > > > > > On Thu, 2005-12-29 at 20:31, Nishanth Aravamudan wrote: > > > > > Hi, > > > > > > > > > > Building 2.6.15-rc7-git3 on ppc64 with svn=4654 kernel components leads > > > > > to: > > > > > > > > > > drivers/infiniband/ulp/iser/iscsi_iser.c: In function `iscsi_iser_conn_set_param': > > > > > drivers/infiniband/ulp/iser/iscsi_iser.c:1436: error: `ISCSI_PARAM_RDMAEXTENSIONS' undeclared (first use in this function) > > > > > drivers/infiniband/ulp/iser/iscsi_iser.c:1436: error: (Each undeclared identifier is reported only once > > > > > drivers/infiniband/ulp/iser/iscsi_iser.c:1436: error: for each function it appears in.) 
> > > > > drivers/infiniband/ulp/iser/iscsi_iser.c: In function `iscsi_iser_conn_get_param': > > > > > drivers/infiniband/ulp/iser/iscsi_iser.c:1496: error: `ISCSI_PARAM_RDMAEXTENSIONS' undeclared (first use in this function) > > > > > drivers/infiniband/ulp/iser/iscsi_iser.c: At top level: > > > > > drivers/infiniband/ulp/iser/iscsi_iser.c:1634: error: unknown field `af' specified in initializer > > > > > drivers/infiniband/ulp/iser/iscsi_iser.c:1634: warning: initialization makes pointer from integer without a cast > > > > > drivers/infiniband/ulp/iser/iscsi_iser.c:1635: error: unknown field `rdma' specified in initializer > > > > > make[3]: *** [drivers/infiniband/ulp/iser/iscsi_iser.o] Error 1 > > > > > make[2]: *** [drivers/infiniband/ulp/iser] Error 2 > > > > > make[1]: *** [drivers/infiniband] Error 2 > > > > > make: *** [drivers] Error 2 > > > > > > > > There is an iscsi patch required for this as iser requires an open-iscsi > > > > version which is subsequent to what is in 2.6.15-rc7-git3. I'm not sure > > > > the best way to handle this yet as the build is different for 2.6.14 > > > > which does not contain open-iscsi. > > > > > > Where can I find this patch? I can temporarily add it to the build-path > > > for the svn-based builds, until a better solution is found. > > > > I am attaching the patch for this. Note that this patch is for > > 2.6.15-rc and not 2.6.14 variants. It has been tested with > > 2.6.15-rc6. Please let me know if it works for you. Thanks. > > Great, I will start some jobs with the patch now and let you know. The patch works, but does fail at LD time with: drivers/built-in.o(.text+0xfd784): In function `iscsi_iser_conn_failure': drivers/infiniband/ulp/iser/iscsi_iser.c:220: undefined reference to `.iscsi_conn_error' drivers/built-in.o(.text+0x1003b0): In function `iscsi_iser_exit': drivers/infiniband/ulp/iser/iscsi_iser.c:1989: undefined reference to `.iscsi_unregister_transport' drivers/built-in.o(.text+0x10046c): In function `iscsi_iser_init': drivers/infiniband/ulp/iser/iscsi_iser.c:1978: undefined reference to `.iscsi_register_transport' drivers/built-in.o(.text+0x100a18): In function `iscsi_iser_control_notify': drivers/infiniband/ulp/iser/iscsi_iser.c:1855: undefined reference to `.iscsi_recv_pdu' drivers/built-in.o(.text+0x100bc0):drivers/infiniband/ulp/iser/iscsi_iser.c:1899: undefined reference to `.iscsi_recv_pdu' drivers/built-in.o(.text+0x100cd4):drivers/infiniband/ulp/iser/iscsi_iser.c:1921: undefined reference to `.iscsi_recv_pdu' Unfortunately, this causes make to fail, killing the job (I realize these are technically non-critical errors). So, I'm just going to disable ISER for now in my tests. Thanks, Nish From halr at voltaire.com Fri Dec 30 09:30:18 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Dec 2005 12:30:18 -0500 Subject: [openib-general] ISER fails to build on 2.6.15-rc7-git3 (svn=4654) In-Reply-To: <20051230172304.GE6431@us.ibm.com> References: <20051230013128.GB8111@us.ibm.com> <1135916950.4331.618.camel@hal.voltaire.com> <20051230050421.GA6431@us.ibm.com> <1135945593.4331.1109.camel@hal.voltaire.com> <20051230161326.GD6431@us.ibm.com> <20051230172304.GE6431@us.ibm.com> Message-ID: <1135963817.4331.1614.camel@hal.voltaire.com> Hi again Nish, On Fri, 2005-12-30 at 12:23, Nishanth Aravamudan wrote: [snip...] > > > I am attaching the patch for this. Note that this patch is for > > > 2.6.15-rc and not 2.6.14 variants. It has been tested with > > > 2.6.15-rc6. Please let me know if it works for you. Thanks. 
> > > > Great, I will start some jobs with the patch now and let you know. > > The patch works, but does fail at LD time with: > > drivers/built-in.o(.text+0xfd784): In function `iscsi_iser_conn_failure': > drivers/infiniband/ulp/iser/iscsi_iser.c:220: undefined reference to `.iscsi_conn_error' > drivers/built-in.o(.text+0x1003b0): In function `iscsi_iser_exit': > drivers/infiniband/ulp/iser/iscsi_iser.c:1989: undefined reference to `.iscsi_unregister_transport' > drivers/built-in.o(.text+0x10046c): In function `iscsi_iser_init': > drivers/infiniband/ulp/iser/iscsi_iser.c:1978: undefined reference to `.iscsi_register_transport' > drivers/built-in.o(.text+0x100a18): In function `iscsi_iser_control_notify': > drivers/infiniband/ulp/iser/iscsi_iser.c:1855: undefined reference to `.iscsi_recv_pdu' > drivers/built-in.o(.text+0x100bc0):drivers/infiniband/ulp/iser/iscsi_iser.c:1899: undefined reference to `.iscsi_recv_pdu' > drivers/built-in.o(.text+0x100cd4):drivers/infiniband/ulp/iser/iscsi_iser.c:1921: undefined reference to `.iscsi_recv_pdu' > > Unfortunately, this causes make to fail, killing the job (I realize > these are technically non-critical errors). > > So, I'm just going to disable ISER for now in my tests. OK but I think you can get this all to build with a little more as follows: Enable the cryptographic API (as built in rather than as module) with CRC32c CRC algorithm, and iSCSI (under Device Drivers/SCSI device support/SCSI low level drivers) in your kernel configuration. That should cause scsi_transport_iscsi (and iscsi_tcp) to be built which should take care of the unresolved symbols you are seeing. -- Hal From nacc at us.ibm.com Fri Dec 30 09:43:41 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Fri, 30 Dec 2005 09:43:41 -0800 Subject: [openib-general] ISER fails to build on 2.6.15-rc7-git3 (svn=4654) In-Reply-To: <1135963817.4331.1614.camel@hal.voltaire.com> References: <20051230013128.GB8111@us.ibm.com> <1135916950.4331.618.camel@hal.voltaire.com> <20051230050421.GA6431@us.ibm.com> <1135945593.4331.1109.camel@hal.voltaire.com> <20051230161326.GD6431@us.ibm.com> <20051230172304.GE6431@us.ibm.com> <1135963817.4331.1614.camel@hal.voltaire.com> Message-ID: <20051230174341.GF6431@us.ibm.com> On 30.12.2005 [12:30:18 -0500], Hal Rosenstock wrote: > Hi again Nish, > > On Fri, 2005-12-30 at 12:23, Nishanth Aravamudan wrote: > [snip...] > > > > > I am attaching the patch for this. Note that this patch is for > > > > 2.6.15-rc and not 2.6.14 variants. It has been tested with > > > > 2.6.15-rc6. Please let me know if it works for you. Thanks. > > > > > > Great, I will start some jobs with the patch now and let you know. 
> > > > The patch works, but does fail at LD time with: > > > > drivers/built-in.o(.text+0xfd784): In function `iscsi_iser_conn_failure': > > drivers/infiniband/ulp/iser/iscsi_iser.c:220: undefined reference to `.iscsi_conn_error' > > drivers/built-in.o(.text+0x1003b0): In function `iscsi_iser_exit': > > drivers/infiniband/ulp/iser/iscsi_iser.c:1989: undefined reference to `.iscsi_unregister_transport' > > drivers/built-in.o(.text+0x10046c): In function `iscsi_iser_init': > > drivers/infiniband/ulp/iser/iscsi_iser.c:1978: undefined reference to `.iscsi_register_transport' > > drivers/built-in.o(.text+0x100a18): In function `iscsi_iser_control_notify': > > drivers/infiniband/ulp/iser/iscsi_iser.c:1855: undefined reference to `.iscsi_recv_pdu' > > drivers/built-in.o(.text+0x100bc0):drivers/infiniband/ulp/iser/iscsi_iser.c:1899: undefined reference to `.iscsi_recv_pdu' > > drivers/built-in.o(.text+0x100cd4):drivers/infiniband/ulp/iser/iscsi_iser.c:1921: undefined reference to `.iscsi_recv_pdu' > > > > Unfortunately, this causes make to fail, killing the job (I realize > > these are technically non-critical errors). > > > > So, I'm just going to disable ISER for now in my tests. > > OK but I think you can get this all to build with a little more as > follows: > > Enable the cryptographic API (as built in rather than as module) with > CRC32c CRC algorithm, and iSCSI (under Device Drivers/SCSI device > support/SCSI low level drivers) > > in your kernel configuration. > > That should cause scsi_transport_iscsi (and iscsi_tcp) to be built which > should take care of the unresolved symbols you are seeing. Ok, I'll try that as well. Thanks, Nish From greg at kroah.com Fri Dec 30 00:25:05 2005 From: greg at kroah.com (Greg KH) Date: Fri, 30 Dec 2005 00:25:05 -0800 Subject: [openib-general] Re: [PATCH 12 of 20] ipath - misc driver support code In-Reply-To: <5e9b0b7876e2d570f25e.1135816291@eng-12.pathscale.com> References: <5e9b0b7876e2d570f25e.1135816291@eng-12.pathscale.com> Message-ID: <20051230082505.GC7438@kroah.com> On Wed, Dec 28, 2005 at 04:31:31PM -0800, Bryan O'Sullivan wrote: > Signed-off-by: Bryan O'Sullivan No description of what the patch does? > +struct _infinipath_do_not_use_kernel_regs { > + unsigned long long Revision; u64? > + unsigned long long Control; > + unsigned long long PageAlign; > + unsigned long long PortCnt; And what's with the InterCapsNamingScheme of these variables? > +/* > + * would prefer to not inline this, to avoid code bloat, and simplify debugging > + * But when compiling against 2.6.10 kernel tree, it gets an error, so > + * not for now. > + */ > +static void ipath_i2c_delay(ipath_type, int); You aren't compiling this for a 2.6.10 kernel anymore :) > +/* > + * we use this instead of udelay directly, so we can make sure > + * that previous register writes have been flushed all the way > + * to the chip. Since we are delaying anyway, the cost doesn't > + * hurt, and makes the bit twiddling more regular > + * If delay is negative, we'll do the chip read, to be sure write made it > + * to our chip, but won't do udelay() > + */ > +static void ipath_i2c_delay(ipath_type dev, int dtime) > +{ > + /* > + * This needs to be volatile, so that the compiler doesn't > + * optimize away the read to the device's mapped memory. > + */ > + volatile uint32_t read_val; > + if (!dtime) > + return; > + read_val = ipath_kget_kreg32(dev, kr_scratch); > + if (--dtime > 0) /* register read takes about .5 usec, itself */ > + udelay(dtime); > +} Huh? 
After reading your comment, I still don't understand why you can't just use udelay(). Or are you counting on calling this function with only "1" being set for dtime? Ah, in looking at your code, that is exactly what is happening. That's a mess, just delay and everything will work properly on the next rev of the hardware where the time to read that register will have dropped to 1/8 the time it does today... > +/* > + * write a byte, one bit at a time. Returns 0 if we got the following > + * ack, otherwise 1 > + */ > +static int ipath_wr_byte(ipath_type dev, uint8_t data) > +{ > + int bit_cntr; > + uint8_t bit; > + > + for (bit_cntr = 7; bit_cntr >= 0; bit_cntr--) { > + bit = (data >> bit_cntr) & 1; > + ipath_sda_out(dev, bit, 1); > + ipath_scl_out(dev, i2c_line_high, 1); > + ipath_scl_out(dev, i2c_line_low, 1); > + } > + if (!ipath_i2c_ackrcv(dev)) > + return 1; > + return 0; > +} Ah, isn't it fun to write bit-banging functions... And the in-kernel i2c code is messier than doing this by hand? > +/* > + * ipath_eeprom_read - Receives x # byte from the eeprom via I2C. > + * > + * eeprom: Atmel AT24C01 > + * > + */ > + > +int ipath_eeprom_read(ipath_type dev, uint8_t eeprom_offset, void *buffer, > + int len) Odd function comment style. Please fix this to be in kerneldoc format. > diff -r e8af3873b0d9 -r 5e9b0b7876e2 drivers/infiniband/hw/ipath/ipath_lib.c > --- /dev/null Thu Jan 1 00:00:00 1970 +0000 > +++ b/drivers/infiniband/hw/ipath/ipath_lib.c Wed Dec 28 14:19:43 2005 -0800 > @@ -0,0 +1,90 @@ > +/* > + * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. > + * > + * Patent licenses, if any, provided herein do not apply to > + * combinations of this program with other software, or any other > + * product whatsoever. > + */ > + > +/* > + * This is library code for the driver, similar to what's in libinfinipath for > + * usermode code. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include Are you _sure_ you need all of these for the one function in this file? 
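A sketch of the simpler pattern Greg is arguing for above: do one read back to flush the posted MMIO writes, then trust udelay() for the full requested time. The register accessor and scratch register names are taken from the quoted driver code; the wrapper itself is invented for illustration.

static void example_i2c_delay(ipath_type dev, unsigned int usecs)
{
	/*
	 * A single register read forces all previously posted MMIO
	 * writes out to the chip; no need to fold the read's latency
	 * into the delay arithmetic.
	 */
	(void)ipath_kget_kreg32(dev, kr_scratch);
	udelay(usecs);	/* then simply delay the full requested time */
}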
> + > +#include "ipath_kernel.h" > + > +unsigned infinipath_debug = __IPATH_INFO; > + > +uint32_t _ipath_pico_per_cycle; /* always present, for now */ > + > +/* > + * This isn't perfect, but it's close enough for timing work. We want this > + * to work on systems where the cycle counter isn't the same as the clock > + * frequency. The one msec spin is OK, since we execute this only once > + * when first loaded. We don't use CURRENT_TIME because on some systems > + * it only has jiffy resolution; we just assume udelay is well calibrated > + * and that we aren't likely to be rescheduled. Do it multiple times, > + * with a yield in between, to try to make sure we get the "true minimum" > + * value. > + * _ipath_pico_per_cycle isn't going to lead to completely accurate > + * conversions from timestamps to nanoseconds, but it's close enough > + * for our purposes, which is mainly to allow people to show events with > + * nsecs or usecs if desired, rather than cycles. > + */ > +void ipath_init_picotime(void) > +{ > + int i; > + u_int64_t ts, te, delta = -1ULL; > + > + for (i = 0; i < 5; i++) { > + ts = get_cycles(); > + udelay(250); > + te = get_cycles(); > + if ((te - ts) < delta) > + delta = te - ts; > + yield(); > + } > + _ipath_pico_per_cycle = 250000000 / delta; > +} Ick. A whole file for one function and 2 public variables? And a horrible timing function too? Please just use the core kernel timing functions, which will work all the time on all arches... > diff -r e8af3873b0d9 -r 5e9b0b7876e2 drivers/infiniband/hw/ipath/ipath_upages.c > --- /dev/null Thu Jan 1 00:00:00 1970 +0000 > +++ b/drivers/infiniband/hw/ipath/ipath_upages.c Wed Dec 28 14:19:43 2005 -0800 > @@ -0,0 +1,144 @@ > +/* > + * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. > + * > + * Patent licenses, if any, provided herein do not apply to > + * combinations of this program with other software, or any other > + * product whatsoever. > + */ > + > +#include Where is this file being pulled in from? 
> + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > + > +#include "ipath_kernel.h" > + > +/* > + * Our version of the kernel mlock function. This function is no longer > + * exposed, so we need to do it ourselves. Woah, um, don't you think that you should either export the main mlock function itself, or fix your code to not need it? Rolling it yourself isn't a good idea... thanks, greg k-h From greg at kroah.com Fri Dec 30 00:00:02 2005 From: greg at kroah.com (Greg KH) Date: Fri, 30 Dec 2005 00:00:02 -0800 Subject: [openib-general] Re: [PATCH 0 of 20] [RFC] ipath - PathScale InfiniPath driver In-Reply-To: References: Message-ID: <20051230080002.GA7438@kroah.com> On Wed, Dec 28, 2005 at 04:31:19PM -0800, Bryan O'Sullivan wrote: > > There are a few requested changes we have chosen to omit for now: > > - The driver still uses EXPORT_SYMBOL, for consistency with other > code in drivers/infiniband Why would that matter? > - Someone asked for the kernel's i2c infrastructure to be used, but > our i2c usage is very specialised, and it would be more of a mess > to use the kernel's Why is this? What is so messy about the in-kernel i2c interfaces? (yeah, I know that there are some oddities, just want to know what you specifically are not liking...) > - We're still using ioctls instead of sysfs or configfs in some > cases, to maintain userspace compatibility Compatibility with what? The driver isn't in the kernel tree yet, so there's no old kernel versions to remain compatibile with :) I also noticed that you are still using the uint64_t type variable types, can you please switch to the proper kernel types instead (u64 in this specific example.) thanks, greg k-h From arjan at infradead.org Fri Dec 30 10:15:47 2005 From: arjan at infradead.org (Arjan van de Ven) Date: Fri, 30 Dec 2005 19:15:47 +0100 Subject: [openib-general] Re: [PATCH 12 of 20] ipath - misc driver support code In-Reply-To: <5e9b0b7876e2d570f25e.1135816291@eng-12.pathscale.com> References: <5e9b0b7876e2d570f25e.1135816291@eng-12.pathscale.com> Message-ID: <1135966547.2941.30.camel@laptopd505.fenrus.org> > +int ipath_get_upages_nocopy(unsigned long start_page, struct page **p) > +{ > + int n; > + struct vm_area_struct *vm = NULL; > + > + down_read(¤t->mm->mmap_sem); > + n = get_user_pages(current, current->mm, start_page, 1, 1, 1, p, &vm); > + up_read(¤t->mm->mmap_sem); > + if (n != 1) { > + _IPATH_INFO("get_user_pages for 0x%lx failed with %d\n", > + start_page, n); > + if (n < 0) /* it's an errno */ > + return n; > + /* > + * If we ever ask for more than a single page, we will have to > + * free the pages (if any) that we did get, via ipath_get_upages() > + * or put_page() directly. > + */ > + return -ENOMEM; /* no way to know actual error */ > + } > + vm->vm_flags |= VM_SHM | VM_LOCKED; > + > + return 0; > +} I hope you're not depending on the VM_LOCKED thing.. since the user can just undo that easily! 
(this is also why all this "sys_mlock from the driver" is traditionally buggy to the point of being a roothole, things like some of the binary 3D drivers have had this security hole for a long time, as did some of the early infiniband drivers) From nacc at us.ibm.com Fri Dec 30 10:44:02 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Fri, 30 Dec 2005 10:44:02 -0800 Subject: [openib-general] ISER fails to build on 2.6.15-rc7-git3 (svn=4654) In-Reply-To: <1135963817.4331.1614.camel@hal.voltaire.com> References: <20051230013128.GB8111@us.ibm.com> <1135916950.4331.618.camel@hal.voltaire.com> <20051230050421.GA6431@us.ibm.com> <1135945593.4331.1109.camel@hal.voltaire.com> <20051230161326.GD6431@us.ibm.com> <20051230172304.GE6431@us.ibm.com> <1135963817.4331.1614.camel@hal.voltaire.com> Message-ID: <20051230184402.GG6431@us.ibm.com> On 30.12.2005 [12:30:18 -0500], Hal Rosenstock wrote: > Hi again Nish, > > On Fri, 2005-12-30 at 12:23, Nishanth Aravamudan wrote: > [snip...] > > > > > I am attaching the patch for this. Note that this patch is for > > > > 2.6.15-rc and not 2.6.14 variants. It has been tested with > > > > 2.6.15-rc6. Please let me know if it works for you. Thanks. > > > > > > Great, I will start some jobs with the patch now and let you know. > > > > The patch works, but does fail at LD time with: > > > > drivers/built-in.o(.text+0xfd784): In function `iscsi_iser_conn_failure': > > drivers/infiniband/ulp/iser/iscsi_iser.c:220: undefined reference to `.iscsi_conn_error' > > drivers/built-in.o(.text+0x1003b0): In function `iscsi_iser_exit': > > drivers/infiniband/ulp/iser/iscsi_iser.c:1989: undefined reference to `.iscsi_unregister_transport' > > drivers/built-in.o(.text+0x10046c): In function `iscsi_iser_init': > > drivers/infiniband/ulp/iser/iscsi_iser.c:1978: undefined reference to `.iscsi_register_transport' > > drivers/built-in.o(.text+0x100a18): In function `iscsi_iser_control_notify': > > drivers/infiniband/ulp/iser/iscsi_iser.c:1855: undefined reference to `.iscsi_recv_pdu' > > drivers/built-in.o(.text+0x100bc0):drivers/infiniband/ulp/iser/iscsi_iser.c:1899: undefined reference to `.iscsi_recv_pdu' > > drivers/built-in.o(.text+0x100cd4):drivers/infiniband/ulp/iser/iscsi_iser.c:1921: undefined reference to `.iscsi_recv_pdu' > > > > Unfortunately, this causes make to fail, killing the job (I realize > > these are technically non-critical errors). > > > > So, I'm just going to disable ISER for now in my tests. > > OK but I think you can get this all to build with a little more as > follows: > > Enable the cryptographic API (as built in rather than as module) with > CRC32c CRC algorithm, and iSCSI (under Device Drivers/SCSI device > support/SCSI low level drivers) > > in your kernel configuration. > > That should cause scsi_transport_iscsi (and iscsi_tcp) to be built which > should take care of the unresolved symbols you are seeing. This seems to work, I'll report on the numbers once the tests are done. Thanks, Nish From torvalds at osdl.org Fri Dec 30 10:46:06 2005 From: torvalds at osdl.org (Linus Torvalds) Date: Fri, 30 Dec 2005 10:46:06 -0800 (PST) Subject: [openib-general] Re: [PATCH 10 of 20] ipath - core driver, part 3 of 4 In-Reply-To: References: Message-ID: All your user page lookup/pinning code is terminally broken. You can't do it that way. You have serveral major conceptual bugs, like keeping track of pages without incrementing their page count, and just expecting that they are magically "pinned" even you do nothing at all to pin them. 
The process exits or does an munmap, and the page will be used for
something else, and you'll just corrupt totally random memory.

Similarly, you do page_address() on the page, which just can't work on
highmem pages.

Crap like this must not be merged. Drivers aren't supposed to play VM
tricks in the first place - even if they were to get it right (which
they never do). Don't do it.

		Linus

From jlentini at netapp.com  Fri Dec 30 12:54:17 2005
From: jlentini at netapp.com (James Lentini)
Date: Fri, 30 Dec 2005 15:54:17 -0500 (EST)
Subject: [openib-general] [kernel verbs] u64 vs dma_addr_t
Message-ID:

Why does the kernel verbs API use u64 as the type for I/O virtual
addresses instead of dma_addr_t (e.g. mthca_mr_alloc_phys()'s iova
param)?

james

From trimmer at silverstorm.com  Fri Dec 30 13:11:00 2005
From: trimmer at silverstorm.com (Rimmer, Todd)
Date: Fri, 30 Dec 2005 16:11:00 -0500
Subject: [openib-general] [kernel verbs] u64 vs dma_addr_t
Message-ID: <5D78D28F88822E4D8702BB9EEF1A43670A0944@mercury.infiniconsys.com>

James wrote:
> Why does the kernel verbs API use u64 as the type for I/O virtual
> addresses instead of dma_addr_t (e.g. mthca_mr_alloc_phys()'s iova
> param)?

dma_addr_t is the appropriate datatype for an address on the local
CPU/system.

However, in Infiniband IO virtual addresses can also be exchanged
across the wire for use by RDMA. Hence the address must be 64 bits
[the size defined by Infiniband RDMA addressing protocols on the
wire] even if the local system is a 32 bit CPU. This is necessary
to support the case where the remote system has a 64 bit address
space.

Todd Rimmer

From jlentini at netapp.com  Fri Dec 30 13:25:44 2005
From: jlentini at netapp.com (James Lentini)
Date: Fri, 30 Dec 2005 16:25:44 -0500 (EST)
Subject: [openib-general] [kernel verbs] u64 vs dma_addr_t
In-Reply-To: <5D78D28F88822E4D8702BB9EEF1A43670A0944@mercury.infiniconsys.com>
References: <5D78D28F88822E4D8702BB9EEF1A43670A0944@mercury.infiniconsys.com>
Message-ID:

On Fri, 30 Dec 2005, Rimmer, Todd wrote:

> James wrote:
> > Why does the kernel verbs API use u64 as the type for I/O virtual
> > addresses instead of dma_addr_t (e.g. mthca_mr_alloc_phys()'s iova
> > param)?
>
> dma_addr_t is the appropriate datatype for an address on the local
> CPU/system.
>
> However, in Infiniband IO virtual addresses can also be exchanged
> across the wire for use by RDMA. Hence the address must be 64 bits
> [the size defined by Infiniband RDMA addressing protocols on the
> wire] even if the local system is a 32 bit CPU. This is necessary
> to support the case where the remote system has a 64 bit address
> space.

Thanks Todd. That makes sense now.
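To illustrate Todd's point with a sketch (the struct and helper below are invented for illustration, not part of any verbs header): the address a peer targets with an RDMA operation travels on the wire, so its width is fixed by the protocol at 64 bits, while dma_addr_t is free to be 32 bits on a given host.

#include <linux/types.h>
#include <asm/byteorder.h>

/* hypothetical wire format for an RDMA segment a peer will target */
struct example_rdma_segment {
	__be64 remote_addr;	/* always 64 bits on the wire */
	__be32 rkey;
	__be32 length;
};

static void example_fill_segment(struct example_rdma_segment *seg,
				 u64 iova, u32 rkey, u32 length)
{
	/* dma_addr_t may be 32 bits on this host; the wire field never shrinks */
	seg->remote_addr = cpu_to_be64(iova);
	seg->rkey = cpu_to_be32(rkey);
	seg->length = cpu_to_be32(length);
}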
From jlentini at netapp.com Fri Dec 30 14:06:44 2005 From: jlentini at netapp.com (James Lentini) Date: Fri, 30 Dec 2005 17:06:44 -0500 (EST) Subject: [openib-general] [kernel verbs] u64 vs dma_addr_t In-Reply-To: References: <5D78D28F88822E4D8702BB9EEF1A43670A0944@mercury.infiniconsys.com> Message-ID: One more question on this topic. Why is the ib_sge's addr a u64 and not a dma_addr_t? From mshefty at ichips.intel.com Fri Dec 30 14:09:21 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 30 Dec 2005 14:09:21 -0800 Subject: [openib-general] [kernel verbs] u64 vs dma_addr_t In-Reply-To: References: <5D78D28F88822E4D8702BB9EEF1A43670A0944@mercury.infiniconsys.com> Message-ID: <43B5B011.6000100@ichips.intel.com> James Lentini wrote: > One more question on this topic. > > Why is the ib_sge's addr a u64 and not a dma_addr_t? It's the same address that the user can transfer to the remote side. Also, if inline sends are being used, the address is not necessarily a DMA address. - Sean From tom at opengridcomputing.com Fri Dec 30 14:39:11 2005 From: tom at opengridcomputing.com (Tom Tucker) Date: Fri, 30 Dec 2005 16:39:11 -0600 Subject: [openib-general] [PATCH] iWARP Support added to the CMA In-Reply-To: <43B484F5.7030603@ichips.intel.com> References: <1134669336.7186.2.camel@trinity.austin.ammasso.com> <43B484F5.7030603@ichips.intel.com> Message-ID: <1135982351.7698.2.camel@strider.opengridcomputing.com> Sean: Thanks for the comments -- great stuff. I'm in the process of merging with the latest CMA changes in the trunk and will address your comments in the next patch. Thanks for the review! On Thu, 2005-12-29 at 16:53 -0800, Sean Hefty wrote: > Tom Tucker wrote: > > I'm a lot slow to review this, but comments below. I'll start to address some > of them that affect the generic code next week, in particular changes to ib_addr. > > > > Index: core/cm.c > > =================================================================== > > --- core/cm.c (revision 4186) > > +++ core/cm.c (working copy) > > @@ -3227,6 +3227,10 @@ > > int ret; > > u8 i; > > > > + /* Ignore RNIC devices */ > > + if (device->node_type == IB_NODE_RNIC) > > + return; > > + > > cm_dev = kmalloc(sizeof(*cm_dev) + sizeof(*port) * > > device->phys_port_cnt, GFP_KERNEL); > > if (!cm_dev) > > @@ -3291,6 +3295,10 @@ > > if (!cm_dev) > > return; > > > > + /* Ignore RNIC devices */ > > + if (device->node_type == IB_NODE_RNIC) > > + return; > > + > > write_lock_irqsave(&cm.device_lock, flags); > > list_del(&cm_dev->list); > > write_unlock_irqrestore(&cm.device_lock, flags); > > The changes to cm_remove_one() are not needed. ib_get_client_data() should > return NULL because IB_NODE_RNIC is skipped in cm_add_one(). > > > Index: core/addr.c > > =================================================================== > > --- core/addr.c (revision 4186) > > +++ core/addr.c (working copy) > > @@ -73,8 +73,13 @@ > > if (!dev) > > return -EADDRNOTAVAIL; > > > > - *gid = *(union ib_gid *) (dev->dev_addr + 4); > > - *pkey = addr_get_pkey(dev); > > + if (dev->type == ARPHRD_INFINIBAND) { > > + *gid = *(union ib_gid *) (dev->dev_addr + 4); > > + *pkey = addr_get_pkey(dev); > > + } else { > > + *gid = *(union ib_gid *) (dev->dev_addr); > > + *pkey = 0; > > + } > > dev_put(dev); > > return 0; > > } > > If this call is being used, we should consider changing it to something more > generic, rather than returning a "gid" as the hardware address for a non-IB > device. One possibility is to make the API return the hardware address of a > given IP address. 
The CMA can then break that address into a GID/pkey pair if > needed. > > > @@ -476,8 +498,22 @@ > > state = cma_exch(id_priv, CMA_DESTROYING); > > cma_cancel_operation(id_priv, state); > > > > - if (id_priv->cm_id && !IS_ERR(id_priv->cm_id)) > > - ib_destroy_cm_id(id_priv->cm_id); > > + if (id->device) { > > + switch (id->device->node_type) { > > + case IB_NODE_RNIC: > > + if (id_priv->cm_id.iw && !IS_ERR(id_priv->cm_id.iw)) { > > + iw_destroy_cm_id(id_priv->cm_id.iw); > > + id_priv->cm_id.iw = 0; > > + } > > + break; > > + default: > > + if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib)) { > > + ib_destroy_cm_id(id_priv->cm_id.ib); > > + id_priv->cm_id.ib = 0; > > The iw/ib devices should be set to NULL instead of assigned 0. > > > + ret = cma_notify_user(id_priv, > > + event_type, > > + event->status, > > + event->private_data, > > + event->private_data_len); > > + if (ret) { > > + /* Destroy the CM ID by returning a non-zero value. */ > > + id_priv->cm_id.iw = NULL; > > + cma_exch(id_priv, CMA_DESTROYING); > > + cma_release_remove(id_priv); > > + rdma_destroy_id(&id_priv->id); > > + return ret; > > + } > > + > > + cma_release_remove(id_priv); > > + return ret; > > +} > > This looks different than the cma_ib_handler, and it makes me think that the > cma_ib_handler has a bug where it doesn't decrement dev_remove. > > > +static int iw_conn_req_handler(struct iw_cm_id *cm_id, > > + struct iw_cm_event *iw_event) > > +{ > > + struct rdma_cm_id* new_cm_id; > > + struct rdma_id_private *listen_id, *conn_id; > > + struct sockaddr_in* sin; > > + int ret; > > + > > + listen_id = cm_id->context; > > + atomic_inc(&listen_id->dev_remove); > > + if (!cma_comp(listen_id, CMA_LISTEN)) { > > + ret = -ECONNABORTED; > > + goto out; > > + } > > + > > + /* Create a new RDMA id the new IW CM ID */ > > + new_cm_id = rdma_create_id(listen_id->id.event_handler, > > + listen_id->id.context); > > + if (!new_cm_id) { > > + ret = -ENOMEM; > > + goto out; > > + } > > + conn_id = container_of(new_cm_id, struct rdma_id_private, id); > > + atomic_inc(&conn_id->dev_remove); > > + conn_id->state = CMA_CONNECT; > > + > > + /* New connection inherits device from parent */ > > + cma_attach_to_dev(conn_id, listen_id->cma_dev); > > > cma_attach_to_dev doesn't provide synchronization around cma_dev->id_list. > Access to that list needs to be protected with 'mutex'. Other than that, I > think that this works fine. > > > @@ -785,8 +950,9 @@ > > goto out; > > > > list_add_tail(&id_priv->list, &listen_any_list); > > - list_for_each_entry(cma_dev, &dev_list, list) > > + list_for_each_entry(cma_dev, &dev_list, list) { > > cma_listen_on_dev(id_priv, cma_dev); > > + } > > Please drop the extra braces. > > > @@ -796,7 +962,6 @@ > > { > > struct rdma_id_private *id_priv; > > int ret; > > - > > id_priv = container_of(id, struct rdma_id_private, id); > > if (!cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_LISTEN)) > > return -EINVAL; > > Please keep the blank line. > > > @@ -890,6 +1058,30 @@ > > +static int cma_resolve_iw_route(struct rdma_id_private *id_priv, int timeout_ms) > > +{ > > + enum rdma_cm_event_type event = RDMA_CM_EVENT_ROUTE_RESOLVED; > > + int rc; > > Please use 'ret' instead of 'rc' to match the rest of the code. > > > + > > + atomic_inc(&id_priv->dev_remove); > > + > > + if (!cma_comp_exch(id_priv, CMA_ROUTE_QUERY, CMA_ROUTE_RESOLVED)) > > + BUG_ON(1); > > The device associated with the id could have been removed while the user was in > the process of making this call. We should simply fail the call here. 
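A sketch of the failure path Sean is suggesting in place of the BUG_ON, using the state helpers visible in the quoted patch (the exact cleanup required may differ from this):

	atomic_inc(&id_priv->dev_remove);

	if (!cma_comp_exch(id_priv, CMA_ROUTE_QUERY, CMA_ROUTE_RESOLVED)) {
		/* state changed under us (e.g. device removal): fail, don't BUG */
		cma_release_remove(id_priv);
		return -EINVAL;
	}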
> > > + > > + rc = cma_notify_user(id_priv, event, 0, NULL, 0); > > + if (rc) { > > + cma_exch(id_priv, CMA_DESTROYING); > > + cma_release_remove(id_priv); > > + cma_deref_id(id_priv); > > + rdma_destroy_id(&id_priv->id); > > + return rc; > > + } > > The callback needs to come from another thread other than the one that the user > called down with. Calling the user back in their own thread can make it > difficult for them to provide synchronization. You can use the rdma_wq that's > been exposed in ib_addr. Also, the user is likely to call this routine from > within a CMA callback (such as after ib_resolve_addr), so deadlock will occur if > you try to destroy the id. > > > + > > + cma_release_remove(id_priv); > > + cma_deref_id(id_priv); > > + return rc; > > +} > > + > > int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms) > > { > > struct rdma_id_private *id_priv; > > @@ -952,20 +1147,133 @@ > > + > > +/* Find the local interface with a route to the specified address and > > + * bind the CM ID to this interface's CMA device > > + */ > > +static int cma_acquire_iw_dev(struct rdma_cm_id* id, struct sockaddr* addr) > > +{ > > + int ret = -ENOENT; > > + struct cma_device* cma_dev; > > + struct rdma_id_private *id_priv; > > + struct sockaddr_in* sin; > > + struct rtable *rt = 0; > > + struct flowi fl; > > + struct net_device* netdev; > > + struct in_addr src_ip; > > + unsigned char* dev_addr; > > + > > + sin = (struct sockaddr_in*)addr; > > + if (sin->sin_family != AF_INET) > > + return -EINVAL; > > + > > + id_priv = container_of(id, struct rdma_id_private, id); > > + > > + /* If the address is local, use the device. If it is remote, > > + * look up a route to get the local address > > + */ > > + netdev = ip_dev_find(sin->sin_addr.s_addr); > > + if (netdev) { > > + src_ip = sin->sin_addr; > > + dev_addr = netdev->dev_addr; > > + dev_put(netdev); > > + } else { > > + memset(&fl, 0, sizeof(fl)); > > + fl.nl_u.ip4_u.daddr = sin->sin_addr.s_addr; > > + if (ip_route_output_key(&rt, &fl)) { > > + return -ENETUNREACH; > > + } > > + dev_addr = rt->idev->dev->dev_addr; > > + src_ip.s_addr = rt->rt_src; > > + > > + ip_rt_put(rt); > > + } > > Can we push the above code into ib_addr? > > > + down(&mutex); > > + > > + list_for_each_entry(cma_dev, &dev_list, list) { > > + if (memcmp(dev_addr, > > + &cma_dev->node_guid, > > + sizeof(cma_dev->node_guid)) == 0) { > > + /* If we find the device, then check if this > > + * is an iWARP device. If it is, then call the > > + * callback handler immediately because we > > + * already have the native address > > + */ > > I'm not following this comment. What callback is being invoked? > > > + if (cma_dev->device->node_type == IB_NODE_RNIC) { > > + struct sockaddr_in* cm_sin; > > + /* Set our source address */ > > + cm_sin = (struct sockaddr_in*) > > + &id_priv->id.route.addr.src_addr; > > + cm_sin->sin_family = AF_INET; > > + cm_sin->sin_addr.s_addr = src_ip.s_addr; > > + > > + /* Claim the device in the mutex */ > > + cma_attach_to_dev(id_priv, cma_dev); > > + ret = 0; > > + break; > > + } > > + } > > + } > > + up(&mutex); > > + > > + return ret; > > +} > > I'd like to see if it's possible to merge this call with cma_acquire_ib_dev() > and create a new routine, cma_acquire_dev() that can walk the device list and > check for both. Maybe if ib_addr returned the full hardware address, along with > a device type it might be possible. (Although the likelihood of a hardware > address collision seems near impossible.) 
> > > int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr, > > struct sockaddr *dst_addr, int timeout_ms) > > { > > struct rdma_id_private *id_priv; > > - int ret; > > + int ret = 0; > > > > id_priv = container_of(id, struct rdma_id_private, id); > > if (!cma_comp_exch(id_priv, CMA_IDLE, CMA_ADDR_QUERY)) > > return -EINVAL; > > > > atomic_inc(&id_priv->refcount); > > + > > id->route.addr.dst_addr = *dst_addr; > > - ret = ib_resolve_addr(src_addr, dst_addr, &id->route.addr.addr.ibaddr, > > - timeout_ms, addr_handler, id_priv); > > + > > + if (cma_acquire_iw_dev(id, dst_addr)==0) { > > + > > + enum rdma_cm_event_type event; > > + > > + cma_exch(id_priv, CMA_ADDR_RESOLVED); > > + > > + atomic_inc(&id_priv->dev_remove); > > + > > + event = RDMA_CM_EVENT_ADDR_RESOLVED; > > + if (cma_notify_user(id_priv, event, 0, NULL, 0)) { > > + cma_exch(id_priv, CMA_DESTROYING); > > + cma_deref_id(id_priv); > > + cma_release_remove(id_priv); > > + rdma_destroy_id(&id_priv->id); > > + return -EINVAL; > > + } > > Similar to other comments. Callbacks should be scheduled to a separate thread. > The behavior is also slightly different. The IB code will return a source IP > address that may be used to connect to the destination address if one is not > given. This is needed in order to perform the reverse resolution on the remote > side. > > > + cma_release_remove(id_priv); > > + cma_deref_id(id_priv); > > + > > + } else { > > + > > + ret = ib_resolve_addr(src_addr, > > + dst_addr, &id->route.addr.addr.ibaddr, > > + timeout_ms, addr_handler, id_priv); > > We might be able to make this call generic by replacing ibaddr with source and > destination hardware addresses. Users that really want to know what 'gid' > they're on could be given functions that extract the gid and pkey from the > hardware addresses. I.e. struct ib_addr could be renamed to struct rdma_addr > and contain source and destination hardware addresses. > > > @@ -980,10 +1288,13 @@ > > int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr) > > { > > struct rdma_id_private *id_priv; > > + struct sockaddr_in* sin; > > struct ib_addr *ibaddr = &id->route.addr.addr.ibaddr; > > int ret; > > > > - if (addr->sa_family != AF_INET) > > + sin = (struct sockaddr_in*)addr; > > + > > + if (sin->sin_family != AF_INET) > > return -EINVAL; > > Please remove this change. Right now, the check's only there because the code > does not fully support IPv6 addressing. > > > int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) > > { > > struct rdma_id_private *id_priv; > > int ret; > > > > id_priv = container_of(id, struct rdma_id_private, id); > > - if (!cma_comp_exch(id_priv, CMA_ROUTE_RESOLVED, CMA_CONNECT)) > > + if (!cma_comp_exch(id_priv, CMA_ROUTE_RESOLVED, CMA_CONNECT)) > > Please undo extra white space at the end of the line. > > > @@ -1190,7 +1551,6 @@ > > { > > struct rdma_id_private *id_priv; > > int ret; > > - > > Please add blank line back in. > > > id_priv = container_of(id, struct rdma_id_private, id); > > if (!cma_comp(id_priv, CMA_CONNECT)) > > return -EINVAL; > > > - Sean From caitlinb at broadcom.com Fri Dec 30 14:44:22 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Fri, 30 Dec 2005 14:44:22 -0800 Subject: [openib-general] [kernel verbs] u64 vs dma_addr_t Message-ID: <54AD0F12E08D1541B826BE97C98F99F1142243@NT-SJCA-0751.brcm.ad.broadcom.com> openib-general-bounces at openib.org wrote: > One more question on this topic. > > Why is the ib_sge's addr a u64 and not a dma_addr_t? 
Because the hardware may need for it to be a 64 bit IO Address accessible on the system bus. That applies to the whole system bus, no matter how many PCI roots or virtual OSs there are. In particular there could be a guest OS that was running in 32-bit mode, and the RDMA hardware receiving fast path requests will not support different work request formats for each guest OS. From greg at kroah.com Fri Dec 30 00:12:18 2005 From: greg at kroah.com (Greg KH) Date: Fri, 30 Dec 2005 00:12:18 -0800 Subject: [openib-general] Re: [PATCH 11 of 20] ipath - core driver, part 4 of 4 In-Reply-To: References: Message-ID: <20051230081218.GB7438@kroah.com> On Wed, Dec 28, 2005 at 04:31:30PM -0800, Bryan O'Sullivan wrote: > Signed-off-by: Bryan O'Sullivan > > diff -r c37b118ef806 -r e8af3873b0d9 drivers/infiniband/hw/ipath/ipath_driver.c > --- a/drivers/infiniband/hw/ipath/ipath_driver.c Wed Dec 28 14:19:42 2005 -0800 > +++ b/drivers/infiniband/hw/ipath/ipath_driver.c Wed Dec 28 14:19:43 2005 -0800 > @@ -5408,3 +5408,1709 @@ Clever use of 4 patches to just add onto the same file. This has grown into a huge file, can't you split it up into smaller pieces? > +int __init infinipath_init(void) > +{ > + int r = 0, i; > + > + _IPATH_DBG(KERN_INFO DRIVER_LOAD_MSG "%s", ipath_core_version); > + > + ipath_init_picotime(); /* init cycles -> pico conversion */ > + > + /* > + * initialize the statusp to temporary storage so we can use it > + * everywhere without first checking. When we "really" assign it, > + * we copy from _ipath_status > + */ > + for (i = 0; i < infinipath_max; i++) > + devdata[i].ipath_statusp = &devdata[i]._ipath_status; > + > + /* > + * init these early, in case we take an interrupt as soon as the irq > + * is setup. Saw a spinlock panic once that appeared to be due to that > + * problem, when they were initted later on. > + */ > + spin_lock_init(&ipath_pioavail_lock); > + spin_lock_init(&ipath_sma_lock); > + > + pci_register_driver(&infinipath_driver); > + > + driver_create_file(&(infinipath_driver.driver), &driver_attr_version); > + > + if ((r = register_chrdev(ipath_major, MODNAME, &ipath_fops))) > + _IPATH_ERROR("Unable to register %s device\n", MODNAME); Why even save off the return value if you don't do anything with it? And please don't put assignments in the middle of if statements, that's just messy and harder to read (the fact that gcc made you put an extra () should be a hint that you were doing something wrong...) And does your driver work with udev? I didn't see where you were exporting the major:minor number of your devices to sysfs, but I might have missed it. > + /* > + * never return an error, since we could have stuff registered, > + * resources used, etc., even if no hardware found. This way we > + * can clean up through unload. > + */ > + return 0; > +} Are you sure that's a good idea? Please do the proper thing and tear down your infrastructure if something fails, that's the correct thing to do. That way you can actually recover if something that you call in this function fails (like driver_create_file(), or pci_register_driver().) Functions don't return error values just so you can ignore them :) > +/* > + * note: if for some reason the unload fails after this routine, and leaves > + * the driver enterable by user code, we'll almost certainly crash and burn... > + */ See, you admit that what you are doing isn't the wisest thing, which should tell you something... 
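A sketch of the unwind pattern Greg is asking for in infinipath_init: tear down everything already set up when a later step fails, instead of returning 0 unconditionally. This is generic; the calls mirror the quoted code but the driver object and labels are invented.

static int __init example_init(void)
{
	int ret;

	ret = pci_register_driver(&example_driver);
	if (ret)
		goto bail;

	ret = driver_create_file(&example_driver.driver, &driver_attr_version);
	if (ret)
		goto bail_pci;

	/* example_major is nonzero, so register_chrdev() returns 0 on success */
	ret = register_chrdev(example_major, "example", &example_fops);
	if (ret < 0)
		goto bail_attr;

	return 0;

bail_attr:
	driver_remove_file(&example_driver.driver, &driver_attr_version);
bail_pci:
	pci_unregister_driver(&example_driver);
bail:
	return ret;
}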
> +static void __exit infinipath_cleanup(void) > +{ > + int r, m, port; > + > + driver_remove_file(&(infinipath_driver.driver), &driver_attr_version); > + if ((r = unregister_chrdev(ipath_major, MODNAME))) > + _IPATH_DBG("unregister of device failed: %d\n", r); > + > + > + /* > + * turn off rcv, send, and interrupts for all ports, all drivers > + * should also hard reset the chip here? > + * free up port 0 (kernel) rcvhdr, egr bufs, and eventually tid bufs > + * for all versions of the driver, if they were allocated > + */ > + for (m = 0; m < infinipath_max; m++) { > + uint64_t val; > + struct ipath_devdata *dd = &devdata[m]; > + if (dd->ipath_kregbase) { > + /* in case unload fails, be consistent */ > + dd->ipath_rcvctrl = 0U; > + ipath_kput_kreg(m, kr_rcvctrl, dd->ipath_rcvctrl); > + > + /* > + * gracefully stop all sends allowing any in > + * progress to trickle out first. > + */ > + ipath_kput_kreg(m, kr_sendctrl, 0ULL); > + val = ipath_kget_kreg64(m, kr_scratch); /* flush it */ > + /* > + * enough for anything that's going to trickle > + * out to have actually done so. > + */ > + udelay(5); > + > + /* > + * abort any armed or launched PIO buffers that > + * didn't go. (self clearing). Will cause any > + * packet currently being transmitted to go out > + * with an EBP, and may also cause a short packet > + * error on the receiver. > + */ > + ipath_kput_kreg(m, kr_sendctrl, INFINIPATH_S_ABORT); > + > + /* mask interrupts, but not errors */ > + ipath_kput_kreg(m, kr_intmask, 0ULL); > + ipath_shutdown_link(m); > + > + /* > + * clear all interrupts and errors. Next time > + * driver is loaded, we know that whatever is > + * set happened while we were unloaded > + */ > + ipath_kput_kreg(m, kr_hwerrclear, -1LL); > + ipath_kput_kreg(m, kr_errorclear, -1LL); > + ipath_kput_kreg(m, kr_intclear, -1LL); > + if (dd->__ipath_pioavailregs_base) { > + kfree((void *)dd->__ipath_pioavailregs_base); > + dd->__ipath_pioavailregs_base = NULL; > + dd->ipath_pioavailregs_dma = NULL; > + } > + > + if (dd->ipath_pageshadow) { > + struct page **tmpp = dd->ipath_pageshadow; > + int i, cnt = 0; > + > + _IPATH_VDBG > + ("Unlocking any expTID pages still locked\n"); > + for (port = 0; port < dd->ipath_cfgports; > + port++) { > + int port_tidbase = > + port * dd->ipath_rcvtidcnt; > + int maxtid = > + port_tidbase + dd->ipath_rcvtidcnt; > + for (i = port_tidbase; i < maxtid; i++) { > + if (tmpp[i]) { > + ipath_putpages(1, > + &tmpp[i]); > + tmpp[i] = NULL; > + cnt++; > + } > + } > + } > + if (cnt) { > + ipath_stats.sps_pageunlocks += cnt; > + _IPATH_VDBG > + ("There were still %u expTID entries locked\n", > + cnt); > + } > + if (ipath_stats.sps_pagelocks > + || ipath_stats.sps_pageunlocks) > + _IPATH_VDBG > + ("%llu pages locked, %llu unlocked via ipath_m{un}lock\n", > + ipath_stats.sps_pagelocks, > + ipath_stats.sps_pageunlocks); > + > + _IPATH_VDBG > + ("Free shadow page tid array at %p\n", > + dd->ipath_pageshadow); > + vfree(dd->ipath_pageshadow); > + dd->ipath_pageshadow = NULL; > + } > + > + /* > + * free any resources still in use (usually just > + * kernel ports) at unload > + */ > + for (port = 0; port < dd->ipath_cfgports; port++) > + ipath_free_pddata(dd, port, 1); > + kfree(dd->ipath_pd); > + /* > + * debuggability, in case some cleanup path > + * tries to use it after this > + */ > + dd->ipath_pd = NULL; > + } > + > + if (dd->pcidev) { > + if (dd->pcidev->irq) { > + _IPATH_VDBG("unit %u free_irq of irq %x\n", m, > + dd->pcidev->irq); > + free_irq(dd->pcidev->irq, dd); > + } else > + _IPATH_DBG > + ("irq 
is 0, not doing free_irq for unit %u\n", > + m); > + dd->pcidev = NULL; > + } > + if (dd->pci_registered) { > + _IPATH_VDBG > + ("Unregistering pci infrastructure unit %u\n", m); > + pci_unregister_driver(&infinipath_driver); This is the call that should have cleaned up all of the memory and other stuff that you do above. If not, then your driver will not work in any hotplug pci systems, which would not be a good thing. Please do like Roland says and put your resources and stuff in the device specific structures, like the rest of the kernel drivers do. You know, we do things like this for a reason, not just because we like to be difficult :) > + dd->pci_registered = 0; > + } else > + _IPATH_VDBG > + ("unit %u: no pci unreg, wasn't registered\n", m); > + ipath_chip_cleanup(dd); /* clean up any per-chip chip-specific stuff */ > + } > + /* > + * clean up any chip-specific stuff for now, only one type of chip > + * for any given driver > + */ > + ipath_chip_done(); > + > + /* cleanup all our locked pages private data structures */ > + ipath_upages_cleanup(NULL); > +} > + > +/* This is a generic function here, so it can return device-specific > + * info. This allows keeping in sync with the version that supports > + * multiple chip types. > +*/ > +void ipath_get_boardname(const ipath_type t, char *name, size_t namelen) > +{ > + ipath_ht_get_boardname(t, name, namelen); > +} Why not just export ipath_ht_get_boardname instead? > +module_init(infinipath_init); > +module_exit(infinipath_cleanup); > + > +EXPORT_SYMBOL(infinipath_debug); > +EXPORT_SYMBOL(ipath_get_boardname); EXPORT_SYMBOL_GPL() ? And put them next to the functions themselves, it's easier to notice that way. thanks, greg k-h From greg at kroah.com Fri Dec 30 00:39:28 2005 From: greg at kroah.com (Greg KH) Date: Fri, 30 Dec 2005 00:39:28 -0800 Subject: [openib-general] Re: [PATCH 8 of 20] ipath - core driver, part 1 of 4 In-Reply-To: References: Message-ID: <20051230083928.GD7438@kroah.com> On Wed, Dec 28, 2005 at 04:31:27PM -0800, Bryan O'Sullivan wrote: > Signed-off-by: Bryan O'Sullivan > > diff -r ffbd416f30d4 -r ddd21709e12c drivers/infiniband/hw/ipath/ipath_driver.c > --- /dev/null Thu Jan 1 00:00:00 1970 +0000 > +++ b/drivers/infiniband/hw/ipath/ipath_driver.c Wed Dec 28 14:19:42 2005 -0800 > @@ -0,0 +1,1879 @@ > +/* > + * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. > + * > + * Patent licenses, if any, provided herein do not apply to > + * combinations of this program with other software, or any other > + * product whatsoever. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include /* we can generate our own crc's for testing */ > + > +#include "ipath_kernel.h" > +#include "ips_common.h" > +#include "ipath_layer.h" > + > +/* > + * Our LSB-assigned major number, so scripts can figure > + * out how to make entry in /dev. > + */ > + > +static int ipath_major = 233; > + > +/* > + * number of buffers reserved for driver (layered drivers and SMA send). > + * Reserved at end of buffer list. > + */ > + > +static uint infinipath_kpiobufs = 32; > + > +/* > + * number of ports we are configured to use (to allow for more pio > + * buffers per port, etc.) Zero means use chip value. > + */ > + > +static uint infinipath_cfgports; > + > +/* > + * number of units we are configured to use (to allow for bringup on > + * multi-chip systems) Zero means use only one for now, but eventually > + * will mean to use infinipath_max > + */ > + > +static uint infinipath_cfgunits; > + > +uint64_t ipath_dummy_val_for_testing; > + > +static __kernel_pid_t ipath_sma_alive; /* PID of SMA, if it's running */ > +static spinlock_t ipath_sma_lock; /* SMA receive */ > + > +/* max SM received packets we'll queue; we keep the most recent packets. */ > + > +#define IPATH_NUM_SMAPKTS 16 > + > +#define IPATH_SMA_HDRSZ (8+12+8) /* LRH+BTH+DETH */ > + > +static struct _ipath_sma_rpkt { > + /* length of received packet; non-zero if queued */ > + uint32_t len; > + /* unit number of interface packet was received from */ > + uint32_t unit; > + uint8_t *buf; > +} ipath_sma_data[IPATH_NUM_SMAPKTS]; > + > +static unsigned ipath_sma_first; /* oldest sma packet index */ > +static unsigned ipath_sma_next; /* next sma packet index to use */ > + > +/* > + * ipath_sma_data_bufs has one extra, pointed to by ipath_sma_data_spare, > + * so we can exchange buffers to do copy_to_user, and not hold the lock > + * across the copy_to_user(). > + */ > + > +#define SMA_MAX_PKTSZ (IPATH_SMA_HDRSZ+256) /* max len of an SMA packet */ > + > +static uint8_t ipath_sma_data_bufs[IPATH_NUM_SMAPKTS + 1][SMA_MAX_PKTSZ]; > +static uint8_t *ipath_sma_data_spare; > +/* sma waits globally on all units */ > +static wait_queue_head_t ipath_sma_wait; > +static wait_queue_head_t ipath_sma_state_wait; > + > +struct infinipath_stats ipath_stats; > + > +/* > + * this will only be used for diags, now that we have enabled the DMA > + * of the sendpioavail regs to system memory. > + */ > + > +static inline uint64_t ipath_kget_sreg(const ipath_type stype, > + ipath_sreg regno) > +{ > + uint64_t val; > + uint64_t *sbase; > + > + sbase = (uint64_t *) (devdata[stype].ipath_sregbase > + + (char *)devdata[stype].ipath_kregbase); > + val = sbase ? 
sbase[regno] : 0ULL; > + return val; > +} > + > +static int ipath_do_user_init(struct ipath_portdata *, > + struct ipath_user_info __user *); > +static int ipath_get_baseinfo(struct ipath_portdata *, > + struct ipath_base_info __user *); > +static int ipath_get_units(void); > +static int ipath_wr_eeprom(struct ipath_portdata *, > + struct ipath_eeprom_req __user *); > +static int ipath_wait_intr(struct ipath_portdata *, uint32_t); > +static int ipath_tid_update(struct ipath_portdata *, struct _tidupd __user *); > +static int ipath_tid_free(struct ipath_portdata *, struct _tidupd __user *); > +static int ipath_get_counters(ipath_type, struct infinipath_counters __user *); > +static int ipath_get_unit_counters(struct infinipath_getunitcounters __user *a); > +static int ipath_get_stats(struct infinipath_stats __user *); > +static int ipath_set_partkey(struct ipath_portdata *, uint16_t); > +static int ipath_manage_rcvq(struct ipath_portdata *, uint16_t); > +static void ipath_clean_partkey(struct ipath_portdata *, > + struct ipath_devdata *); > +static void ipath_disarm_piobufs(const ipath_type, unsigned, unsigned); > +static int ipath_create_user_egr(struct ipath_portdata *); > +static int ipath_create_port0_egr(struct ipath_portdata *); > +static int ipath_create_rcvhdrq(struct ipath_portdata *); > +static void ipath_handle_errors(const ipath_type, uint64_t); > +static void ipath_update_pio_bufs(const ipath_type); > +static int ipath_shutdown_link(const ipath_type); > +static int ipath_bringup_link(const ipath_type); > +int ipath_bringup_serdes(const ipath_type); > +static void ipath_get_faststats(unsigned long); > +static int ipath_setup_htconfig(struct pci_dev *, uint64_t *, const ipath_type); > +static struct page *ipath_nopage(struct vm_area_struct *, unsigned long, int *); > +static irqreturn_t ipath_intr(int irq, void *devid, struct pt_regs *regs); > +static void ipath_decode_err(char *, size_t, uint64_t); > +void ipath_free_pddata(struct ipath_devdata *, uint32_t, int); > +static void ipath_clear_tids(const ipath_type, unsigned); > +static void ipath_get_guid(const ipath_type); > +static int ipath_sma_ioctl(struct file *, unsigned int, unsigned long); > +static int ipath_rcvsma_pkt(struct ipath_sendpkt __user *); > +static int ipath_kset_lid(uint32_t); > +static int ipath_kset_mlid(uint32_t); > +static int ipath_get_mlid(uint32_t __user *); > +static int ipath_get_devstatus(uint64_t __user *); > +static int ipath_kset_guid(struct ipath_setguid __user *); > +static int ipath_get_portinfo(uint32_t __user *); > +static int ipath_get_nodeinfo(uint32_t __user *); > +#ifdef _IPATH_EXTRA_DEBUG > +static void ipath_dump_allregs(char *, ipath_type); > +#endif > + > +static const char ipath_sma_name[] = "infinipath_SMA"; > + > +/* > + * is diags mode enabled? if it is, then things like auto bringup of > + * links is disabled > + */ > + > +int ipath_diags_enabled = 0; > + > +void ipath_chip_done(void) > +{ > +} > + > +void ipath_chip_cleanup(struct ipath_devdata * dd) > +{ > +} What are these two empty functions for? > +/* > + * cache aligned location > + * > + * where port 0 rcvhdrtail register is written back; also want > + * nothing else sharing the cache line, so make it a cache line in size > + * used for all units > + * > + * This is volatile as it's the target of a DMA from the chip. 
> + */ > + > +static volatile uint64_t ipath_port0_rcvhdrtail[512] > + __attribute__ ((aligned(4096))); > + > +#define MODNAME "ipath_core" > +#define DRIVER_LOAD_MSG "PathScale " MODNAME " loaded: " > +#define PFX MODNAME ": " > + > +/* > + * min buffers we want to have per port, after driver > + */ > + > +#define IPATH_MIN_USER_PORT_BUFCNT 8 > + > +/* The size has to be longer than this string, so we can > + * append board/chip information to it in the init code. > + */ > +static char ipath_core_version[192] = IPATH_IDSTR; > +static char *chip_driver_version; > +static int chip_driver_size; > + > +/* mylid and lidbase are to deal with LIDs in "fabric", until SM is working */ > + > +module_param(infinipath_debug, uint, 0644); > +module_param(infinipath_kpiobufs, uint, 0644); > +module_param(infinipath_cfgports, uint, 0644); > +module_param(infinipath_cfgunits, uint, 0644); > + > +MODULE_PARM_DESC(infinipath_debug, "mask for debug prints"); > +MODULE_PARM_DESC(infinipath_cfgports, "Set max number of ports to use"); > +MODULE_PARM_DESC(infinipath_cfgunits, "Set max number of devices to use"); > + > +MODULE_LICENSE("GPL"); > +MODULE_AUTHOR("PathScale "); > +MODULE_DESCRIPTION("Pathscale InfiniPath driver"); > + > +#ifdef IPATH_DIAG > +static __kernel_pid_t ipath_diag_alive; /* PID of diags, if running */ > +int ipath_diags_ioctl(struct file *, unsigned, unsigned long); > +static int ipath_opendiag(struct inode *, struct file *); > +#endif > + > +#if __IPATH_INFO || __IPATH_DBG > +static const char *ipath_ibcstatus_str[] = { > + "Disabled", > + "LinkUp", > + "PollActive", > + "PollQuiet", > + "SleepDelay", > + "SleepQuiet", > + "LState6", /* unused */ > + "LState7", /* unused */ > + "CfgDebounce", > + "CfgRcvfCfg", > + "CfgWaitRmt", > + "CfgIdle", > + "RecovRetrain", > + "LState0xD", /* unused */ > + "RecovWaitRmt", > + "RecovIdle", > +}; > +#endif > + > +static ssize_t show_version(struct device_driver *dev, char *buf) > +{ > + return snprintf(buf, PAGE_SIZE, "%s", ipath_core_version); > +} > + > +static ssize_t show_status(struct device *dev, > + struct device_attribute *attr, > + char *buf) > +{ > + struct ipath_devdata *dd = dev_get_drvdata(dev); > + > + if (!dd) > + return -EINVAL; > + > + if (!dd->ipath_statusp) > + return -EINVAL; > + > + return snprintf(buf, PAGE_SIZE, "%llx\n", *(dd->ipath_statusp)); > +} > + > +static const char *ipath_status_str[] = { > + "Initted", > + "Disabled", > + "4", /* unused */ > + "OIB_SMA", > + "SMA", > + "Present", > + "IB_link_up", > + "IB_configured", > + "NoIBcable", > + "Fatal_Hardware_Error", > + NULL, > +}; > + > +static ssize_t show_status_str(struct device *dev, > + struct device_attribute *attr, > + char *buf) > +{ > + struct ipath_devdata *dd = dev_get_drvdata(dev); > + int i, any; > + uint64_t s; > + > + if (!dd) > + return -EINVAL; > + > + if (!dd->ipath_statusp) > + return -EINVAL; > + > + s = *(dd->ipath_statusp); > + *buf = '\0'; > + for (any = i = 0; s && ipath_status_str[i]; i++) { > + if (s & 1) { > + if (any && strlcat(buf, " ", PAGE_SIZE) >= PAGE_SIZE) > + /* overflow */ > + break; > + if (strlcat(buf, ipath_status_str[i], > + PAGE_SIZE) >= PAGE_SIZE) > + break; > + any = 1; > + } > + s >>= 1; > + } > + if (any) > + strlcat(buf, "\n", PAGE_SIZE); > + > + return strlen(buf); > +} how big can this "status string" be? If it's even getting close to PAGE_SIZE, this doesn't need to be a sysfs attribute, but you should break it up into its individual pieces. Based on the table above, this function can get much simpler... 
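For illustration, one possible shape of the simplification being hinted at here, letting scnprintf() do the bounds handling; this is only a sketch, and assumes the ipath_status_str[] table and the devdata accessor from the patch above:

static ssize_t show_status_str(struct device *dev,
                               struct device_attribute *attr, char *buf)
{
        struct ipath_devdata *dd = dev_get_drvdata(dev);
        ssize_t len = 0;
        uint64_t s;
        int i;

        if (!dd || !dd->ipath_statusp)
                return -EINVAL;

        /* one pass over the bit table; scnprintf() can't overrun PAGE_SIZE */
        for (i = 0, s = *dd->ipath_statusp; s && ipath_status_str[i];
             i++, s >>= 1)
                if (s & 1)
                        len += scnprintf(buf + len, PAGE_SIZE - len, "%s%s",
                                         len ? " " : "", ipath_status_str[i]);
        if (len)
                len += scnprintf(buf + len, PAGE_SIZE - len, "\n");
        return len;
}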
> + > +static ssize_t show_lid(struct device *dev, > + struct device_attribute *attr, > + char *buf) > +{ > + struct ipath_devdata *dd = dev_get_drvdata(dev); > + > + if (!dd) > + return -EINVAL; > + > + return snprintf(buf, PAGE_SIZE, "%x\n", dd->ipath_lid); > +} > + > +static ssize_t show_mlid(struct device *dev, > + struct device_attribute *attr, > + char *buf) > +{ > + struct ipath_devdata *dd = dev_get_drvdata(dev); > + > + if (!dd) > + return -EINVAL; > + > + return snprintf(buf, PAGE_SIZE, "%x\n", dd->ipath_mlid); > +} > + > +static ssize_t show_guid(struct device *dev, > + struct device_attribute *attr, > + char *buf) > +{ > + struct ipath_devdata *dd = dev_get_drvdata(dev); > + uint8_t *guid; > + > + if (!dd) > + return -EINVAL; > + > + guid = (uint8_t *)&(dd->ipath_guid); > + > + return snprintf(buf, PAGE_SIZE, "%x:%x:%x:%x:%x:%x:%x:%x\n", > + guid[0], guid[1], guid[2], guid[3], guid[4], guid[5], > + guid[6], guid[7]); > +} > + > +static ssize_t show_nguid(struct device *dev, > + struct device_attribute *attr, > + char *buf) > +{ > + struct ipath_devdata *dd = dev_get_drvdata(dev); > + > + if (!dd) > + return -EINVAL; > + > + return snprintf(buf, PAGE_SIZE, "%u\n", dd->ipath_nguid); > +} > + > +static ssize_t show_serial(struct device *dev, > + struct device_attribute *attr, > + char *buf) > +{ > + struct ipath_devdata *dd = dev_get_drvdata(dev); > + > + if (!dd) > + return -EINVAL; > + > + buf[sizeof dd->ipath_serial] = '\0'; > + memcpy(buf, dd->ipath_serial, sizeof dd->ipath_serial); > + strcat(buf, "\n"); > + return strlen(buf); > +} > + > +static ssize_t show_unit(struct device *dev, > + struct device_attribute *attr, > + char *buf) > +{ > + struct ipath_devdata *dd = dev_get_drvdata(dev); > + > + if (!dd) > + return -EINVAL; Don't you mean -ENODEV? > + > + snprintf(buf, PAGE_SIZE, "%u\n", dd->ipath_unit); > + return strlen(buf); return the snprintf() call instead of calling strlen() all the time please. > +} > + > +static DRIVER_ATTR(version, S_IRUGO, show_version, NULL); > +static DEVICE_ATTR(status, S_IRUGO, show_status, NULL); > +static DEVICE_ATTR(status_str, S_IRUGO, show_status_str, NULL); > +static DEVICE_ATTR(lid, S_IRUGO, show_lid, NULL); > +static DEVICE_ATTR(mlid, S_IRUGO, show_mlid, NULL); > +static DEVICE_ATTR(guid, S_IRUGO, show_guid, NULL); > +static DEVICE_ATTR(nguid, S_IRUGO, show_nguid, NULL); > +static DEVICE_ATTR(serial, S_IRUGO, show_serial, NULL); > +static DEVICE_ATTR(unit, S_IRUGO, show_unit, NULL); > + > +/* > + * called from add_timer and user counter read calls, to deal with > + * counters that wrap in "human time". The words sent and received, and > + * the packets sent and received are all that we worry about. For now, > + * at least, we don't worry about error counters, because if they wrap > + * that quickly, we probably don't care. We may eventually just make this > + * handle all the counters. word counters can wrap in about 20 seconds > + * of full bandwidth traffic, packet counters in a few hours. 
> + */ > + > +uint64_t ipath_snap_cntr(const ipath_type t, ipath_creg creg) > +{ > + uint32_t val; > + uint64_t val64, t0, t1; > + struct ipath_devdata *dd = &devdata[t]; > + static uint64_t one_sec_in_cycles; > + extern uint32_t _ipath_pico_per_cycle; > + > + if (!one_sec_in_cycles && _ipath_pico_per_cycle) > + one_sec_in_cycles = 1000000000000UL / _ipath_pico_per_cycle; > + > + t0 = get_cycles(); > + val = ipath_kget_creg32(t, creg); > + t1 = get_cycles(); > + if ((t1 - t0) > one_sec_in_cycles && val == -1) { > + /* > + * This is just a way to detect things that are quite broken. > + * Normally this should take just a few cycles (the check is > + * for long enough that we don't care if we get pre-empted.) > + * An Opteron HT O read timeout is 4 seconds with normal > + * NB values > + */ > + > + _IPATH_UNIT_ERROR(t, "Error! Reading counter 0x%x timed out\n", > + creg); > + return 0ULL; > + } > + > + if (creg == cr_wordsendcnt) { > + if (val != dd->ipath_lastsword) { > + dd->ipath_sword += val - dd->ipath_lastsword; > + dd->ipath_lastsword = val; > + } > + val64 = dd->ipath_sword; > + } else if (creg == cr_wordrcvcnt) { > + if (val != dd->ipath_lastrword) { > + dd->ipath_rword += val - dd->ipath_lastrword; > + dd->ipath_lastrword = val; > + } > + val64 = dd->ipath_rword; > + } else if (creg == cr_pktsendcnt) { > + if (val != dd->ipath_lastspkts) { > + dd->ipath_spkts += val - dd->ipath_lastspkts; > + dd->ipath_lastspkts = val; > + } > + val64 = dd->ipath_spkts; > + } else if (creg == cr_pktrcvcnt) { > + if (val != dd->ipath_lastrpkts) { > + dd->ipath_rpkts += val - dd->ipath_lastrpkts; > + dd->ipath_lastrpkts = val; > + } > + val64 = dd->ipath_rpkts; > + } else > + val64 = (uint64_t) val; > + > + return val64; > +} > + > +/* > + * print the delta of egrfull/hdrqfull errors for kernel ports no more > + * than every 5 seconds. User processes are printed at close, but kernel > + * doesn't close, so... Separate routine so may call from other places > + * someday, and so function name when printed by _IPATH_INFO is meaningfull > + */ > + > +static void ipath_qcheck(const ipath_type t) > +{ > + static uint64_t last_tot_hdrqfull; > + size_t blen = 0; > + struct ipath_devdata *dd = &devdata[t]; > + char buf[128]; > + > + *buf = 0; > + if (dd->ipath_pd[0]->port_hdrqfull != dd->ipath_p0_hdrqfull) { > + blen = snprintf(buf, sizeof buf, "port 0 hdrqfull %u", > + dd->ipath_pd[0]->port_hdrqfull - > + dd->ipath_p0_hdrqfull); > + dd->ipath_p0_hdrqfull = dd->ipath_pd[0]->port_hdrqfull; > + } > + if (ipath_stats.sps_etidfull != dd->ipath_last_tidfull) { > + blen += > + snprintf(buf + blen, sizeof buf - blen, "%srcvegrfull %llu", > + blen ? ", " : "", > + ipath_stats.sps_etidfull - dd->ipath_last_tidfull); > + dd->ipath_last_tidfull = ipath_stats.sps_etidfull; > + } > + > + /* > + * this is actually the number of hdrq full interrupts, not actual > + * events, but at the moment that's mostly what I'm interested in. > + * Actual count, etc. is in the counters, if needed. For production > + * users this won't ordinarily be printed. > + */ > + > + if ((infinipath_debug & (__IPATH_PKTDBG | __IPATH_DBG)) && > + ipath_stats.sps_hdrqfull != last_tot_hdrqfull) { > + blen += > + snprintf(buf + blen, sizeof buf - blen, > + "%shdrqfull %llu (all ports)", blen ? 
", " : "", > + ipath_stats.sps_hdrqfull - last_tot_hdrqfull); > + last_tot_hdrqfull = ipath_stats.sps_hdrqfull; > + } > + if (blen) > + _IPATH_DBG("%s\n", buf); > + > + if (*dd->ipath_hdrqtailptr != dd->ipath_port0head) { > + if (dd->ipath_lastport0rcv_cnt == ipath_stats.sps_port0pkts) { > + _IPATH_PDBG("missing rcv interrupts? port0 hd=%llx tl=%x; port0pkts %llx\n", > + *dd->ipath_hdrqtailptr, dd->ipath_port0head,ipath_stats.sps_port0pkts); > + ipath_kreceive(t); > + } > + dd->ipath_lastport0rcv_cnt = ipath_stats.sps_port0pkts; > + } > +} > + > +/* > + * called from add_timer to get word counters from chip before they > + * can overflow > + */ > + > +static void ipath_get_faststats(unsigned long t) > +{ > + uint32_t val; > + struct ipath_devdata *dd = &devdata[t]; > + static unsigned cnt; > + > + /* > + * don't access the chip while running diags, or memory diags > + * can fail > + */ > + if (!dd->ipath_kregbase || !(dd->ipath_flags & IPATH_PRESENT) || > + ipath_diags_enabled) { > + /* but re-arm the timer, for diags case; won't hurt other */ > + goto done; > + } > + > + ipath_snap_cntr((ipath_type) t, cr_wordsendcnt); > + ipath_snap_cntr((ipath_type) t, cr_wordrcvcnt); > + ipath_snap_cntr((ipath_type) t, cr_pktsendcnt); > + ipath_snap_cntr((ipath_type) t, cr_pktrcvcnt); > + > + ipath_qcheck(t); > + > + /* > + * deal with repeat error suppression. Doesn't really matter if > + * last error was almost a full interval ago, or just a few usecs > + * ago; still won't get more than 2 per interval. We may want > + * longer intervals for this eventually, could do with mod, counter > + * or separate timer. Also see code in ipath_handle_errors() and > + * ipath_handle_hwerrors(). > + */ > + > + if (dd->ipath_lasterror) > + dd->ipath_lasterror = 0; > + if (dd->ipath_lasthwerror) > + dd->ipath_lasthwerror = 0; > + if ((devdata[t].ipath_maskederrs & ~devdata[t].ipath_ignorederrs) > + && get_cycles() > devdata[t].ipath_unmasktime) { > + char ebuf[256]; > + ipath_decode_err(ebuf, sizeof ebuf, > + (devdata[t].ipath_maskederrs & ~devdata[t]. > + ipath_ignorederrs)); > + if ((devdata[t].ipath_maskederrs & ~devdata[t]. > + ipath_ignorederrs) > + & ~(INFINIPATH_E_RRCVEGRFULL | INFINIPATH_E_RRCVHDRFULL)) { > + _IPATH_UNIT_ERROR(t, "Re-enabling masked errors (%s)\n", > + ebuf); > + } else { > + /* > + * rcvegrfull and rcvhdrqfull are "normal", > + * for some types of processes (mostly benchmarks) > + * that send huge numbers of messages, while > + * not processing them. So only complain about > + * these at debug level. 
> + */ > + _IPATH_DBG > + ("Disabling frequent queue full errors (%s)\n", > + ebuf); > + } > + devdata[t].ipath_maskederrs = devdata[t].ipath_ignorederrs; > + ipath_kput_kreg(t, kr_errormask, ~devdata[t].ipath_maskederrs); > + } > + > + if (dd->ipath_flags & IPATH_LINK_SLEEPING) { > + uint64_t ibc; > + _IPATH_VDBG("linkinitcmd SLEEP, move to POLL\n"); > + dd->ipath_flags &= ~IPATH_LINK_SLEEPING; > + ibc = dd->ipath_ibcctrl; > + /* > + * don't put linkinitcmd in ipath_ibcctrl, want that to > + * stay a NOP > + */ > + ibc |= > + INFINIPATH_IBCC_LINKINITCMD_POLL << > + INFINIPATH_IBCC_LINKINITCMD_SHIFT; > + ipath_kput_kreg(t, kr_ibcctrl, ibc); > + } > + > + /* limit qfull messages to ~one per minute per port */ > + if ((++cnt & 0x10)) { > + for (val = devdata[t].ipath_cfgports - 1; ((int)val) >= 0; > + val--) { > + if (dd->ipath_lastegrheads[val] != -1) > + dd->ipath_lastegrheads[val] = -1; > + if (dd->ipath_lastrcvhdrqtails[val] != -1) > + dd->ipath_lastrcvhdrqtails[val] = -1; > + } > + } > + > + if (dd->ipath_nosma_bufs) { > + dd->ipath_nosma_secs += 5; > + if (dd->ipath_nosma_secs >= 30) { > + _IPATH_SMADBG("No SMA bufs avail %u seconds; cancelling pending sends\n", > + dd->ipath_nosma_secs); > + ipath_disarm_piobufs(t, dd->ipath_lastport_piobuf, > + dd->ipath_piobcnt - dd->ipath_lastport_piobuf); > + dd->ipath_nosma_secs = 0; /* start again, if necessary */ > + } > + else > + _IPATH_SMADBG("No SMA bufs avail %u tries, after %u seconds\n", > + dd->ipath_nosma_bufs, dd->ipath_nosma_secs); > + } > + > +done: > + mod_timer(&dd->ipath_stats_timer, jiffies + HZ * 5); > +} > + > + > +static void __devexit infinipath_remove_one(struct pci_dev *); > +static int infinipath_init_one(struct pci_dev *, const struct pci_device_id *); > + > +/* Only needed for registration, nothing else needs this info */ > +#define PCI_VENDOR_ID_PATHSCALE 0x1fc1 > +#define PCI_DEVICE_ID_PATHSCALE_INFINIPATH_HT 0xd > + > +const struct pci_device_id infinipath_pci_tbl[] = { > + { > + PCI_VENDOR_ID_PATHSCALE, PCI_DEVICE_ID_PATHSCALE_INFINIPATH_HT, > + PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, PCI_DEVICE() instead? > + {0,} {}, is all that is needed here. > +}; > + > +MODULE_DEVICE_TABLE(pci, infinipath_pci_tbl); > + > +static struct pci_driver infinipath_driver = { > + .name = MODNAME, > + .driver.owner = THIS_MODULE, This line is not needed, you can remove it. > + .probe = infinipath_init_one, > + .remove = __devexit_p(infinipath_remove_one), > + .id_table = infinipath_pci_tbl, > +}; > + > +#if defined (pgprot_writecombine) && defined(_PAGE_MA_WC) > +int remap_area_pages(unsigned long address, unsigned long phys_addr, > + unsigned long size, unsigned long flags); > +#endif > + > +static int infinipath_init_one(struct pci_dev *pdev, > + const struct pci_device_id *ent) > +{ > + int ret, len, j; > + static int chip_idx = -1; > + unsigned long addr; > + uint64_t intconfig; > + uint8_t rev; > + ipath_type dev; > + > + /* > + * XXX: Right now, we have a hardcoded array of devices. We'll > + * change this in a future release, but not just yet. For the > + * moment, we're limited to 4 infinipath devices per system. 
> + */ > + > + dev = ++chip_idx; > + > + _IPATH_VDBG("initializing unit #%u\n", dev); > + if ((!infinipath_cfgunits && (dev >= 1)) || > + (infinipath_cfgunits && (dev >= infinipath_cfgunits)) || > + (dev >= infinipath_max)) { > + _IPATH_ERROR("Trying to initialize unit %u, max is %u\n", > + dev, infinipath_max - 1); > + return -EINVAL; > + } > + > + devdata[dev].pci_registered = 1; > + devdata[dev].ipath_unit = dev; > + > + if ((ret = pci_enable_device(pdev))) { > + _IPATH_DBG("pci_enable unit %u failed: %x\n", dev, ret); > + } {} not needed here. > + > + if ((ret = pci_request_regions(pdev, MODNAME))) > + _IPATH_INFO("pci_request_regions unit %u fails: %d\n", dev, > + ret); > + > + if ((ret = pci_set_dma_mask(pdev, DMA_64BIT_MASK)) != 0) > + _IPATH_INFO("pci_set_dma_mask unit %u fails: %d\n", dev, ret); > + > + pci_set_master(pdev); /* probably not be needed for HT */ > + > + addr = pci_resource_start(pdev, 0); > + len = pci_resource_len(pdev, 0); > + _IPATH_VDBG > + ("regbase (0) %lx len %d irq %x, vend %x/%x driver_data %lx\n", > + addr, len, pdev->irq, ent->vendor, ent->device, ent->driver_data); > + devdata[dev].ipath_deviceid = ent->device; /* save for later use */ > + devdata[dev].ipath_vendorid = ent->vendor; > + for (j = 0; j < 6; j++) { > + if (!pdev->resource[j].start) > + continue; > + _IPATH_VDBG("BAR %d start %lx, end %lx, len %lx\n", > + j, pdev->resource[j].start, > + pdev->resource[j].end, pci_resource_len(pdev, j)); > + } > + > + if (!addr) { > + _IPATH_UNIT_ERROR(dev, "No valid address in BAR 0!\n"); > + return -ENODEV; > + } > + > + if ((ret = pci_read_config_byte(pdev, PCI_REVISION_ID, &rev))) { > + _IPATH_UNIT_ERROR(dev, > + "Failed to read PCI revision ID unit %u: %d\n", > + dev, ret); > + return ret; /* shouldn't ever happen */ > + } else > + devdata[dev].ipath_pcirev = rev; > + > + devdata[dev].ipath_kregbase = ioremap_nocache(addr, len); > +#if defined (pgprot_writecombine) && defined(_PAGE_MA_WC) > + printk("Remapping pages WC\n"); No KERN_ level? > + remap_area_pages((unsigned long) devdata[dev].ipath_kregbase + > + 1024 * 1024, addr + 1024 * 1024, 1024 * 1024, > + _PAGE_MA_WC); > + /* devdata[dev].ipath_kregbase = __ioremap(addr, len, _PAGE_MA_WC); */ > +#endif > + > + if (!devdata[dev].ipath_kregbase) { > + _IPATH_DBG("Unable to map io addr %lx to kvirt, failing\n", > + addr); > + ret = -ENOMEM; > + goto fail; > + } > + devdata[dev].ipath_kregend = (uint64_t __iomem *) > + ((void __iomem *) devdata[dev].ipath_kregbase + len); > + devdata[dev].ipath_physaddr = addr; /* used for io_remap, etc. */ > + /* for user mmap */ > + devdata[dev].ipath_kregvirt = (uint64_t __iomem *) phys_to_virt(addr); > + _IPATH_VDBG("mapped io addr %lx to kregbase %p kregvirt %p\n", addr, > + devdata[dev].ipath_kregbase, devdata[dev].ipath_kregvirt); > + > + /* > + * set these up before registering the interrupt handler, just > + * in case > + */ > + devdata[dev].pcidev = pdev; > + pci_set_drvdata(pdev, &(devdata[dev])); It's not a "just in case" type thing, you have to do this before you register that interrupt handler, as you can be instantly called here. Are you sure everything else is set up properly here before calling that function? > + > + /* > + * set up our interrupt handler; SA_SHIRQ probably not needed, > + * but won't hurt for now. 
> + */ > + > + if (!pdev->irq) { > + _IPATH_UNIT_ERROR(dev, "irq is 0, failing init\n"); > + ret = -EINVAL; > + goto fail; > + } > + if ((ret = request_irq(pdev->irq, ipath_intr, > + SA_SHIRQ, MODNAME, &devdata[dev]))) { > + _IPATH_UNIT_ERROR(dev, > + "Couldn't setup interrupt handler, irq=%u: %d\n", > + pdev->irq, ret); > + goto fail; > + } > + > + /* > + * clear ipath_flags here instead of in ipath_init_chip as it is set > + * by ipath_setup_htconfig. > + */ > + devdata[dev].ipath_flags = 0; > + if (ipath_setup_htconfig(pdev, &intconfig, dev)) > + _IPATH_DBG > + ("Failed to setup HT config, continuing anyway for now\n"); > + > + ret = ipath_init_chip(dev); /* do the chip-specific init */ > + if (!ret) { > +#ifdef CONFIG_MTRR > + uint64_t pioaddr, piolen; > + unsigned bits; > + /* > + * Set the PIO buffers to be WCCOMB, so we get HT bursts > + * to the chip. Linux (possibly the hardware) requires > + * it to be on a power of 2 address matching the length > + * (which has to be a power of 2). For rev1, that means > + * the base address, for rev2, it will be just the PIO > + * buffers themselves. > + */ > + pioaddr = addr + devdata[dev].ipath_piobufbase; > + piolen = devdata[dev].ipath_piobcnt * > + ALIGN(devdata[dev].ipath_piosize, > + devdata[dev].ipath_palign); > + > + for (bits = 0; !(piolen & (1ULL << bits)); bits++) > + /* do nothing */; > + > + if (piolen != (1ULL << bits)) { > + _IPATH_DBG("piolen 0x%llx not power of 2, bits=%u\n", > + piolen, bits); > + piolen >>= bits; > + while (piolen >>= 1) > + bits++; > + piolen = 1ULL << (bits + 1); > + _IPATH_DBG("Changed piolen to 0x%llx bits=%u\n", piolen, > + bits); > + } > + if (pioaddr & (piolen - 1)) { > + uint64_t atmp; > + _IPATH_DBG > + ("pioaddr %llx not on right boundary for size %llx, fixing\n", > + pioaddr, piolen); > + atmp = pioaddr & ~(piolen - 1); > + if (atmp < addr || (atmp + piolen) > (addr + len)) { > + _IPATH_UNIT_ERROR(dev, > + "No way to align address/size (%llx/%llx), no WC mtrr\n", > + atmp, piolen << 1); > + ret = -ENODEV; > + } else { > + _IPATH_DBG > + ("changing WC base from %llx to %llx, len from %llx to %llx\n", > + pioaddr, atmp, piolen, piolen << 1); > + pioaddr = atmp; > + piolen <<= 1; > + } > + } > + > + if (!ret) { > + int cookie; > + _IPATH_VDBG > + ("Setting mtrr for chip to WC (addr %llx, len=0x%llx)\n", > + pioaddr, piolen); > + cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 0); > + if (cookie < 0) { > + _IPATH_INFO > + ("mtrr_add(%llx,0x%llx,WC,0) failed (%d)\n", > + pioaddr, piolen, cookie); > + ret = -EINVAL; > + } else { > + _IPATH_VDBG > + ("Set mtrr for chip to WC, cookie is %d\n", > + cookie); > + devdata[dev].ipath_mtrr = (uint32_t) cookie; > + } > + } > +#endif /* CONFIG_MTRR */ > + } > + > + if (!ret && devdata[dev].ipath_kregbase && (devdata[dev].ipath_flags > + & IPATH_PRESENT)) { > + /* > + * for the hardware, enable interrupts only after > + * kr_interruptconfig is written, if we could set it up > + */ > + if (intconfig) { > + /* interrupt address */ > + ipath_kput_kreg(dev, kr_interruptconfig, intconfig); > + /* enable all interrupts */ > + ipath_kput_kreg(dev, kr_intmask, -1LL); > + /* force re-interrupt of any pending interrupts. 
*/
> +		ipath_kput_kreg(dev, kr_intclear, 0ULL);
> +		/* OK, the chip is usable, marked it as initialized */
> +		*devdata[dev].ipath_statusp |= IPATH_STATUS_INITTED;
> +	} else
> +		_IPATH_UNIT_ERROR(dev,
> +			"No interrupts enabled, couldn't setup interrupt address\n");
> +	} else if (ret != -EPERM)
> +		_IPATH_INFO("Not configuring unit %u interrupts, init failed\n",
> +			dev);
> +
> +	device_create_file(&(pdev->dev), &dev_attr_status);
> +	device_create_file(&(pdev->dev), &dev_attr_status_str);
> +	device_create_file(&(pdev->dev), &dev_attr_lid);
> +	device_create_file(&(pdev->dev), &dev_attr_mlid);
> +	device_create_file(&(pdev->dev), &dev_attr_guid);
> +	device_create_file(&(pdev->dev), &dev_attr_nguid);
> +	device_create_file(&(pdev->dev), &dev_attr_serial);
> +	device_create_file(&(pdev->dev), &dev_attr_unit);

Why not use an attribute array?  Makes for proper error handling if one
of those calls does not work...

> +	/*
> +	 * We used to cleanup here, with pci_release_regions, etc. but that
> +	 * can cause other problems if we want to run diags, etc., so instead
> +	 * defer that until driver unload.
> +	 */

So memory leaks are acceptable?

> +fail:	/* after we've done at least some of the pci setup */
> +	if (ret == -EPERM)	/* disabled device, don't want module load error;
> +				 * just want to carry status through to this point */
> +		ret = 0;

Module load error does not happen no matter what kind of return value
you send back from this function.  So the comment is wrong, and the fact
that you failed initializing the device is also wrong, please don't do
this.

thanks,

greg k-h

From jlentini at netapp.com Fri Dec 30 15:07:52 2005
From: jlentini at netapp.com (James Lentini)
Date: Fri, 30 Dec 2005 18:07:52 -0500 (EST)
Subject: [openib-general] [kernel verbs] u64 vs dma_addr_t
In-Reply-To: <43B5B011.6000100@ichips.intel.com>
References: <5D78D28F88822E4D8702BB9EEF1A43670A0944@mercury.infiniconsys.com>
	<43B5B011.6000100@ichips.intel.com>
Message-ID: 

sean> James Lentini wrote:
sean> > Why is the ib_sge's addr a u64 and not a dma_addr_t?
sean>
sean> It's the same address that the user can transfer to the remote
sean> side.

It can be the same address, but does it have to be?

A user can directly map local addresses to InfiniBand I/O virtual
addresses, but I don't think it is a requirement. In other words, I
thought that a user could register address x and request an InfiniBand
I/O virtual address of y, x != y, for the mapping.

I understand why the ib_send_wr's rdma.remote_addr needs to be a u64,
since it ultimately winds up on the wire.

In the case of the ib_sge's addr, I didn't think these values left the
local node. My assumption (based on looking at the mthca driver) is
that they are supposed to contain "local" I/O addresses (bus
addresses). Therefore, my confusion over why dma_addr_t wasn't used.

sean> Also, if inline sends are being used, the address is not
sean> necessarily a DMA address.

Which ib_wr_opcode[s] are "inline sends"? IB_WR_SEND,
IB_WR_SEND_WITH_IMM, ...?

My expectation was that all of the scatter/gather data for both sends
(of all flavors: send, rdma read, rdma write,...) and recvs would be
DMA addresses. The "not necessarily" part makes me worry. How can I
determine in a device independent way which buffers need to be
DMA'able and which do not? Is the only safe option to assume that all
do?
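To make the question concrete, here is roughly how a kernel consumer
fills an ib_sge on the send side today; this is a sketch of mthca-era
usage, not a statement of what the verbs API guarantees, and it assumes
dev (a struct ib_device *), buf, len, and a registered mr already exist:

	struct ib_sge sge;
	dma_addr_t dma;

	/* map the buffer for device access; 'dma' is a bus address */
	dma = dma_map_single(dev->dma_device, buf, len, DMA_TO_DEVICE);

	sge.addr   = (u64) dma;	/* widened to 64 bits, whatever dma_addr_t is */
	sge.length = len;
	sge.lkey   = mr->lkey;	/* lkey of the region covering 'buf' */

The cast is lossless on every configuration, which is one practical
argument for the u64 field; what it leaves open is exactly the question
above, namely whether the field is defined to hold a bus address or an
address that the memory region translates.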
From bos at pathscale.com Fri Dec 30 15:10:09 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 30 Dec 2005 15:10:09 -0800 Subject: [openib-general] Re: [PATCH 12 of 20] ipath - misc driver support code In-Reply-To: <20051230082505.GC7438@kroah.com> References: <5e9b0b7876e2d570f25e.1135816291@eng-12.pathscale.com> <20051230082505.GC7438@kroah.com> Message-ID: <1135984209.13318.47.camel@serpentine.pathscale.com> On Fri, 2005-12-30 at 00:25 -0800, Greg KH wrote: > No description of what the patch does? Ahem. Oops. > > +struct _infinipath_do_not_use_kernel_regs { > > + unsigned long long Revision; > > u64? Right. > > + unsigned long long Control; > > + unsigned long long PageAlign; > > + unsigned long long PortCnt; > > And what's with the InterCapsNamingScheme of these variables? They're taken straight from the register names in our chip spec. I can squish them to lowercase-only, if that seems important. > > +/* > > + * would prefer to not inline this, to avoid code bloat, and simplify debugging > > + * But when compiling against 2.6.10 kernel tree, it gets an error, so > > + * not for now. > > + */ > > +static void ipath_i2c_delay(ipath_type, int); > > You aren't compiling this for a 2.6.10 kernel anymore :) Yes, that hunk is redundant. Thanks for spotting it. > > +static void ipath_i2c_delay(ipath_type dev, int dtime) > Huh? After reading your comment, I still don't understand why you can't > just use udelay(). Or are you counting on calling this function with > only "1" being set for dtime? It's usually called with a dtime of 1, but there's an added delay in one place. I just rewrote that routine, so it's now a one-liner that does a read which waits for writes to the chip to complete. The sole caller that wanted an added wait calls udelay itself now. > Ah, isn't it fun to write bit-banging functions... And the in-kernel > i2c code is messier than doing this by hand? >From looking at it, it will make the i2c part of the driver longer, rather than shorter. There's nothing objectionable about the kernel i2c interfaces per se, but our bit-banging code is pretty small and specialised. > Odd function comment style. Please fix this to be in kerneldoc format. Sure. > Are you _sure_ you need all of these for the one function in this file? That file will be taken out and put to sleep. > > +#include > > Where is this file being pulled in from? Ugh, braino. > Woah, um, don't you think that you should either export the main mlock > function itself, or fix your code to not need it? Rolling it yourself > isn't a good idea... Other people have pointed out that our page-pinning code is horked. We'll find a saner alternative. Thanks for the comments, Greg. Message-ID: >Thanks for the comments -- great stuff. I'm in the process of merging >with the latest CMA changes in the trunk and will address your comments >in the next patch. I'm testing some changes, but here is a proposed change to ib_addr.h that I hope will help support iWarp. 
- Sean Index: ib_addr.h =================================================================== --- ib_addr.h (revision 4654) +++ ib_addr.h (working copy) @@ -32,26 +32,28 @@ #include #include +#include #include #include extern struct workqueue_struct *rdma_wq; -struct ib_addr { - union ib_gid sgid; - union ib_gid dgid; - u16 pkey; +struct rdma_dev_addr { + unsigned char src_dev_addr[MAX_ADDR_LEN]; + unsigned char dst_dev_addr[MAX_ADDR_LEN]; + unsigned char broadcast[MAX_ADDR_LEN]; + enum ib_node_type dev_type; }; /** - * ib_translate_addr - Translate a local IP address to an Infiniband GID and - * PKey. + * rdma_translate_ip - Translate a local IP address to an RDMA hardware + * address. */ -int ib_translate_addr(struct sockaddr *addr, union ib_gid *gid, u16 *pkey); +int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr); /** - * ib_resolve_addr - Resolve source and destination IP addresses to - * Infiniband network addresses. + * rdma_resolve_ip - Resolve source and destination IP addresses to + * RDMA hardware addresses. * @src_addr: An optional source address to use in the resolution. If a * source address is not provided, a usable address will be returned via * the callback. @@ -64,13 +66,13 @@ * or been canceled. A status of 0 indicates success. * @context: User-specified context associated with the call. */ -int ib_resolve_addr(struct sockaddr *src_addr, struct sockaddr *dst_addr, - struct ib_addr *addr, int timeout_ms, +int rdma_resolve_ip(struct sockaddr *src_addr, struct sockaddr *dst_addr, + struct rdma_dev_addr *addr, int timeout_ms, void (*callback)(int status, struct sockaddr *src_addr, - struct ib_addr *addr, void *context), + struct rdma_dev_addr *addr, void *context), void *context); -void ib_addr_cancel(struct ib_addr *addr); +void rdma_addr_cancel(struct rdma_dev_addr *addr); static inline int ip_addr_size(struct sockaddr *addr) { @@ -78,5 +80,38 @@ sizeof(struct sockaddr_in6) : sizeof(struct sockaddr_in); } +static inline u16 ib_addr_get_pkey(struct rdma_dev_addr *dev_addr) +{ + return ((u16)dev_addr->broadcast[8] << 8) | (u16)dev_addr->broadcast[9]; +} + +static inline void ib_addr_set_pkey(struct rdma_dev_addr *dev_addr, u16 pkey) +{ + dev_addr->broadcast[8] = pkey >> 8; + dev_addr->broadcast[9] = (unsigned char) pkey; +} + +static inline union ib_gid* ib_addr_get_sgid(struct rdma_dev_addr *dev_addr) +{ + return (union ib_gid *) (dev_addr->src_dev_addr + 4); +} + +static inline void ib_addr_set_sgid(struct rdma_dev_addr *dev_addr, + union ib_gid *gid) +{ + memcpy(dev_addr->src_dev_addr + 4, gid, sizeof *gid); +} + +static inline union ib_gid* ib_addr_get_dgid(struct rdma_dev_addr *dev_addr) +{ + return (union ib_gid *) (dev_addr->dst_dev_addr + 4); +} + +static inline void ib_addr_set_dgid(struct rdma_dev_addr *dev_addr, + union ib_gid *gid) +{ + memcpy(dev_addr->dst_dev_addr + 4, gid, sizeof *gid); +} + #endif /* IB_ADDR_H */ From bos at pathscale.com Fri Dec 30 15:11:44 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 30 Dec 2005 15:11:44 -0800 Subject: [openib-general] Re: [PATCH 0 of 20] [RFC] ipath - PathScale InfiniPath driver In-Reply-To: <20051230080002.GA7438@kroah.com> References: <20051230080002.GA7438@kroah.com> Message-ID: <1135984304.13318.50.camel@serpentine.pathscale.com> On Fri, 2005-12-30 at 00:00 -0800, Greg KH wrote: > > - The driver still uses EXPORT_SYMBOL, for consistency with other > > code in drivers/infiniband > > Why would that matter? 
I don't want to do something gratuitously different to the prevailing set of code in which it lives. > > - We're still using ioctls instead of sysfs or configfs in some > > cases, to maintain userspace compatibility > > Compatibility with what? The driver isn't in the kernel tree yet, so > there's no old kernel versions to remain compatibile with :) We already ship userspace code to customers that relies on the ioctl interfaces. > I also noticed that you are still using the uint64_t type variable > types, can you please switch to the proper kernel types instead (u64 in > this specific example.) Yes, we'll use u64 for internal variables, and __u64 for stuff exported to userspace, etc. References: <20051230081218.GB7438@kroah.com> Message-ID: <1135984675.13318.58.camel@serpentine.pathscale.com> On Fri, 2005-12-30 at 00:12 -0800, Greg KH wrote: > This has grown > into a huge file, can't you split it up into smaller pieces? Absolutely. > Why even save off the return value if you don't do anything with it? I think that's just a throwback to an earlier rev of the driver. > And please don't put assignments in the middle of if statements, that's > just messy and harder to read (the fact that gcc made you put an extra > () should be a hint that you were doing something wrong...) OK. > And does your driver work with udev? I didn't see where you were > exporting the major:minor number of your devices to sysfs, but I might > have missed it. It was written in a pre-udev world, so it still uses a fixed major and minor number. How important is this to you? Is it "nice to have", or "blocker"? :-) > Are you sure that's a good idea? Please do the proper thing and tear > down your infrastructure if something fails, that's the correct thing to > do. That way you can actually recover if something that you call in > this function fails (like driver_create_file(), or > pci_register_driver().) Functions don't return error values just so you > can ignore them :) This will take a bit of cleaning up, but it's a reasonable request. > > +/* > > + * note: if for some reason the unload fails after this routine, and leaves > > + * the driver enterable by user code, we'll almost certainly crash and burn... > > + */ > > See, you admit that what you are doing isn't the wisest thing, which > should tell you something... Indeed. > This is the call that should have cleaned up all of the memory and other > stuff that you do above. If not, then your driver will not work in any > hotplug pci systems, which would not be a good thing. Please do like > Roland says and put your resources and stuff in the device specific > structures, like the rest of the kernel drivers do. I'm working on the appropriate hearts and minds as we speak :-) > Why not just export ipath_ht_get_boardname instead? Because that's too specific to HT for my personal liking. > > +module_init(infinipath_init); > > +module_exit(infinipath_cleanup); > > + > > +EXPORT_SYMBOL(infinipath_debug); > > +EXPORT_SYMBOL(ipath_get_boardname); > > EXPORT_SYMBOL_GPL() ? I don't see a problem with that. > And put them next to the functions themselves, it's easier to notice > that way. OK. Thanks again for the review, openib-general-bounces at openib.org wrote: > sean> James Lentini wrote: > sean> > Why is the ib_sge's addr a u64 and not a dma_addr_t? > sean> > sean> It's the same address that the user can transfer to the remote > sean> side. > > It can be the same address, but does it have to be? 
> > A user can directly map local addresses to InfiniBand I/O > virtual addresses, but I don't think it is a requirement. In > other words, I thought that user could register address x and > request an InfiniBand I/O virtual address of y, x != y, for > the mapping. > > I understand why the ib_send_wr's rdma.remote_addr needs to > be a u64, since it ultimately winds up on the wire. > > In the case of the ib_sge's addr, I didn't think these values > left the local node. My assumption (based on looking at the > mthca driver) is that they are supposed to contain "local" > I/O addresses (bus addresses). Therefore, my confusion over > why dma_addr_t wasn't used. > A privileged user, such as an NFS Daemon or iSER iSCSI Target, can and will create Memory Regions that are not part of its own address space out of page buffers. Even running on a 32-bit kernel it might create a memory region larger than 2**32. Admittedly, that isn't very likely unless it is the *only* daemon running on the machine. But it is legal. From bos at pathscale.com Fri Dec 30 15:47:07 2005 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 30 Dec 2005 15:47:07 -0800 Subject: [openib-general] Re: [PATCH 8 of 20] ipath - core driver, part 1 of 4 In-Reply-To: <20051230083928.GD7438@kroah.com> References: <20051230083928.GD7438@kroah.com> Message-ID: <1135986427.13318.79.camel@serpentine.pathscale.com> On Fri, 2005-12-30 at 00:39 -0800, Greg KH wrote: > > +void ipath_chip_done(void) > > +{ > > +} > > + > > +void ipath_chip_cleanup(struct ipath_devdata * dd) > > +{ > > +} > > What are these two empty functions for? They're just as dead as they look. > > +static ssize_t show_status_str(struct device *dev, > how big can this "status string" be? Just a few dozen bytes. > If it's even getting close to > PAGE_SIZE, this doesn't need to be a sysfs attribute, but you should > break it up into its individual pieces. Do you think that's still warranted, given this? > > +static ssize_t show_unit(struct device *dev, > Don't you mean -ENODEV? Yes, thanks. > > + snprintf(buf, PAGE_SIZE, "%u\n", dd->ipath_unit); > > + return strlen(buf); > > return the snprintf() call instead of calling strlen() all the time > please. OK. > > +const struct pci_device_id infinipath_pci_tbl[] = { > > + { > > + PCI_VENDOR_ID_PATHSCALE, PCI_DEVICE_ID_PATHSCALE_INFINIPATH_HT, > > + PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, > > PCI_DEVICE() instead? OK. > > + {0,} > > {}, > is all that is needed here. OK. > > + .driver.owner = THIS_MODULE, > > This line is not needed, you can remove it. OK. > {} not needed here. OK. > > +#if defined (pgprot_writecombine) && defined(_PAGE_MA_WC) > > + printk("Remapping pages WC\n"); > > No KERN_ level? That should just become a debug statement. > > + /* > > + * set these up before registering the interrupt handler, just > > + * in case > > + */ > > + devdata[dev].pcidev = pdev; > > + pci_set_drvdata(pdev, &(devdata[dev])); > > It's not a "just in case" type thing, you have to do this before you > register that interrupt handler, as you can be instantly called here. OK, I'll remove the misleading comment. > Are you sure everything else is set up properly here before calling that > function? I believe so. I'll double check. 
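To spell out the ordering being insisted on here, a hypothetical
rearrangement using the names from the posted driver; everything the
interrupt handler dereferences must be live before request_irq(),
because on a shared line the handler can run before request_irq() even
returns:

	/* handler state first ... */
	devdata[dev].pcidev = pdev;
	pci_set_drvdata(pdev, &devdata[dev]);

	/* ... only then expose the handler */
	ret = request_irq(pdev->irq, ipath_intr, SA_SHIRQ, MODNAME,
			  &devdata[dev]);
	if (ret)
		goto fail;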
> > +	device_create_file(&(pdev->dev), &dev_attr_status);
> > +	device_create_file(&(pdev->dev), &dev_attr_status_str);
> > +	device_create_file(&(pdev->dev), &dev_attr_lid);
> > +	device_create_file(&(pdev->dev), &dev_attr_mlid);
> > +	device_create_file(&(pdev->dev), &dev_attr_guid);
> > +	device_create_file(&(pdev->dev), &dev_attr_nguid);
> > +	device_create_file(&(pdev->dev), &dev_attr_serial);
> > +	device_create_file(&(pdev->dev), &dev_attr_unit);
> 
> Why not use an attribute array?  Makes for proper error handling if one
> of those calls does not work...

OK, thanks.

> > +	/*
> > +	 * We used to cleanup here, with pci_release_regions, etc. but that
> > +	 * can cause other problems if we want to run diags, etc., so instead
> > +	 * defer that until driver unload.
> > +	 */
> 
> So memory leaks are acceptable?

That clearly needs a bit of attention.

> > +fail:	/* after we've done at least some of the pci setup */
> > +	if (ret == -EPERM)	/* disabled device, don't want module load error;
> > +				 * just want to carry status through to this point */
> > +		ret = 0;
> 
> Module load error does not happen no matter what kind of return value
> you send back from this function.  So the comment is wrong, and the fact
> that you failed initializing the device is also wrong, please don't do
> this.

OK.

Thanks for the extensive comments,

References: 
Message-ID: <1135986615.13318.82.camel@serpentine.pathscale.com>

On Fri, 2005-12-30 at 10:46 -0800, Linus Torvalds wrote:

> All your user page lookup/pinning code is terminally broken.

Yes, this has been pointed out by a few others.

> Crap like this must not be merged.

I'm already busy decrappifying it...

References: <54AD0F12E08D1541B826BE97C98F99F1142243@NT-SJCA-0751.brcm.ad.broadcom.com>
Message-ID: 

caitlin> > One more question on this topic.
caitlin> >
caitlin> > Why is the ib_sge's addr a u64 and not a dma_addr_t?
caitlin>
caitlin> Because the hardware may need for it to be a 64 bit
caitlin> IO Address accessible on the system bus. That applies
caitlin> to the whole system bus, no matter how many PCI roots
caitlin> or virtual OSs there are.
caitlin>
caitlin> In particular there could be a guest OS that was
caitlin> running in 32-bit mode, and the RDMA hardware receiving
caitlin> fast path requests will not support different
caitlin> work request formats for each guest OS.

Let me back up a step and explain the context for this question.

As you know, our goal is to use the Linux IB verbs as a
hardware/protocol independent RDMA API. I'm reviewing my use of the
API to make sure that it does not make any particular assumptions.

Roland stated that a scatter/gather list's address value should be a
bus address:

http://openib.org/pipermail/openib-general/2005-August/009748.html

This made me question why the type wasn't dma_addr_t and whether there
was anything protocol/hardware specific about the choice of u64.

At this point, I'm still not sure why dma_addr_t wouldn't be correct
and how a transport/hardware independent consumer of this API should
set this field.
From greg at kroah.com Fri Dec 30 16:13:36 2005 From: greg at kroah.com (Greg KH) Date: Fri, 30 Dec 2005 16:13:36 -0800 Subject: [openib-general] Re: [PATCH 12 of 20] ipath - misc driver support code In-Reply-To: <1135984209.13318.47.camel@serpentine.pathscale.com> References: <5e9b0b7876e2d570f25e.1135816291@eng-12.pathscale.com> <20051230082505.GC7438@kroah.com> <1135984209.13318.47.camel@serpentine.pathscale.com> Message-ID: <20051231001336.GD20314@kroah.com> On Fri, Dec 30, 2005 at 03:10:09PM -0800, Bryan O'Sullivan wrote: > On Fri, 2005-12-30 at 00:25 -0800, Greg KH wrote: > > > + unsigned long long Control; > > > + unsigned long long PageAlign; > > > + unsigned long long PortCnt; > > > > And what's with the InterCapsNamingScheme of these variables? > > They're taken straight from the register names in our chip spec. I can > squish them to lowercase-only, if that seems important. No, but document it that this is the reason for it (along with a pointer to your chip spec, if possible.) thanks, greg k-h From greg at kroah.com Fri Dec 30 16:08:58 2005 From: greg at kroah.com (Greg KH) Date: Fri, 30 Dec 2005 16:08:58 -0800 Subject: [openib-general] Re: [PATCH 11 of 20] ipath - core driver, part 4 of 4 In-Reply-To: <1135984675.13318.58.camel@serpentine.pathscale.com> References: <20051230081218.GB7438@kroah.com> <1135984675.13318.58.camel@serpentine.pathscale.com> Message-ID: <20051231000858.GA20314@kroah.com> On Fri, Dec 30, 2005 at 03:17:55PM -0800, Bryan O'Sullivan wrote: > On Fri, 2005-12-30 at 00:12 -0800, Greg KH wrote: > > And does your driver work with udev? I didn't see where you were > > exporting the major:minor number of your devices to sysfs, but I might > > have missed it. > > It was written in a pre-udev world, so it still uses a fixed major and > minor number. How important is this to you? Is it "nice to have", or > "blocker"? :-) Well, depends on if you want your driver to work with any of the major distros that rely on udev (RHEL, SLES, etc...) If not, fine, you don't need it :) thanks, greg k-h From greg at kroah.com Fri Dec 30 16:10:51 2005 From: greg at kroah.com (Greg KH) Date: Fri, 30 Dec 2005 16:10:51 -0800 Subject: [openib-general] Re: [PATCH 0 of 20] [RFC] ipath - PathScale InfiniPath driver In-Reply-To: <1135984304.13318.50.camel@serpentine.pathscale.com> References: <20051230080002.GA7438@kroah.com> <1135984304.13318.50.camel@serpentine.pathscale.com> Message-ID: <20051231001051.GB20314@kroah.com> On Fri, Dec 30, 2005 at 03:11:44PM -0800, Bryan O'Sullivan wrote: > On Fri, 2005-12-30 at 00:00 -0800, Greg KH wrote: > > > - The driver still uses EXPORT_SYMBOL, for consistency with other > > > code in drivers/infiniband > > > > Why would that matter? > > I don't want to do something gratuitously different to the prevailing > set of code in which it lives. > > > > - We're still using ioctls instead of sysfs or configfs in some > > > cases, to maintain userspace compatibility > > > > Compatibility with what? The driver isn't in the kernel tree yet, so > > there's no old kernel versions to remain compatibile with :) > > We already ship userspace code to customers that relies on the ioctl > interfaces. But we (the kernel community), don't really accept that as a valid reason to accept this kind of code, sorry. 
Why not just update your userspace code and ship that out to your
customers, as you know exactly who they are due to the lack of the
driver in the mainline kernel tree :)

thanks,

greg k-h

From greg at kroah.com Fri Dec 30 16:12:12 2005
From: greg at kroah.com (Greg KH)
Date: Fri, 30 Dec 2005 16:12:12 -0800
Subject: [openib-general] Re: [PATCH 8 of 20] ipath - core driver, part 1 of 4
In-Reply-To: <1135986427.13318.79.camel@serpentine.pathscale.com>
References: <20051230083928.GD7438@kroah.com>
	<1135986427.13318.79.camel@serpentine.pathscale.com>
Message-ID: <20051231001212.GC20314@kroah.com>

On Fri, Dec 30, 2005 at 03:47:07PM -0800, Bryan O'Sullivan wrote:
> On Fri, 2005-12-30 at 00:39 -0800, Greg KH wrote:
> 
> > > +void ipath_chip_done(void)
> > > +{
> > > +}
> > > +
> > > +void ipath_chip_cleanup(struct ipath_devdata * dd)
> > > +{
> > > +}
> > 
> > What are these two empty functions for?
> 
> They're just as dead as they look.

Then you might want to remove them :)

> > > +static ssize_t show_status_str(struct device *dev,
> 
> > how big can this "status string" be?
> 
> Just a few dozen bytes.
> 
> > If it's even getting close to
> > PAGE_SIZE, this doesn't need to be a sysfs attribute, but you should
> > break it up into its individual pieces.
> 
> Do you think that's still warranted, given this?

No I don't, unless you think that message will grow somehow...

thanks,

greg k-h

From bos at pathscale.com Fri Dec 30 17:40:50 2005
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 30 Dec 2005 17:40:50 -0800
Subject: [openib-general] Re: [PATCH 0 of 20] [RFC] ipath - PathScale InfiniPath driver
In-Reply-To: <20051231001051.GB20314@kroah.com>
References: <20051230080002.GA7438@kroah.com>
	<1135984304.13318.50.camel@serpentine.pathscale.com>
	<20051231001051.GB20314@kroah.com>
Message-ID: <1135993250.13318.94.camel@serpentine.pathscale.com>

On Fri, 2005-12-30 at 16:10 -0800, Greg KH wrote:

> But we (the kernel community), don't really accept that as a valid
> reason to accept this kind of code, sorry.

Fair enough. I'd like some guidance in that case.

Some of our ioctls access the hardware more or less directly, while
others do things like read or reset counters. Which of these kinds of
operations are appropriate to retain as ioctls, in your eyes, and which
are best converted to sysfs or configfs alternatives?

As an example, take a look at ipath_sma_ioctl. It seems to me that
receiving or sending subnet management packets ought to remain as
ioctls, while getting port or node data could be turned into sysfs
attributes. Lane identification could live in configfs.
If you think otherwise, please let me know what's more appropriate. The less blind I am in doing these conversions, the fewer rounds we'll have to go in reviewing humongous driver submission patches :-) Thanks, References: <54AD0F12E08D1541B826BE97C98F99F1142243@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <469958e00512301942h29490c39xe82bbe5118371732@mail.gmail.com> On 12/30/05, James Lentini wrote: > > > caitlin> > One more question on this topic. > caitlin> > > caitlin> > Why is the ib_sge's addr a u64 and not a dma_addr_t? > caitlin> > caitlin> Because the hardware may need for it to be a 64 bit > caitlin> IO Address accessible on the system bus. That applies > caitlin> to the whole system bus, no matter how many PCI roots > caitlin> or virtual OSs there are. > caitlin> > caitlin> In particular there could be a guest OS that was > caitlin> running in 32-bit mode, and the RDMA hardware receiving > caitlin> fast path requests will not support different > caitlin> work request formats for each guest OS. > > Let me back up a step and explain the context for this question. > > As you know, our goal is to use the Linux IB verbs as a > hardware/protocol independent RDMA API. I'm reviewing my use of the > API to make sure that it does not make any particular assumptions. > > Roland stated that a scatter/gather list's address value should be a > bus address: > > http://openib.org/pipermail/openib-general/2005-August/009748.html > That depends on whether it is part of a registered memory space, or being used to specify a new registered memory space (i.e. it is for a memory register operation). When *using* an already established memory region, the address is interpreted in the context of that memory region. The size of address within an RDMA managed memory regions is always 64 bits. No matter which transport or what processor. That is extremely unlikely to change (in fact I think the R-Key/L-Key/ STag size would increase to 64-bits before the address size itself changed. But I'm expecting that a 96-bit logical address space should be adequate for quite some time). When creating a memory region the "physical address" is really a bus address, which on a strictly local basis could be 32 or 64 bits. If you were trying to generalize that, the "physical address" is a "RDMA Device accessible address", which on anything even vaguely PCI-ish is a bus address. But just as the distinction between "physical address" and "bus address" would not have been anticipated in the past, there may be some other distinction that we are not anticipating yet. So, in that context, the Memory Region defines the translation from logical addresses with the context of a Memory Region (most typically a subset of an existing virtual memory map) to addresses that the RDMA device can use to access the same memory. Whatever that distinction is, I'm sure it will be relevant before another decade goes by. 
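A short sketch of the two address flavors being described, in terms of
the kernel verbs as they stand: the per-buffer addresses handed to
ib_reg_phys_mr() must be device-reachable (bus) addresses, while
iova_start selects the 64-bit logical address the region answers to
afterwards, which is why registering x and addressing it as y, x != y,
is legal. pd and page are assumed to exist, and bus == physical is
assumed for brevity:

	struct ib_phys_buf pbuf = {
		.addr = page_to_phys(page),	/* device-reachable buffer address */
		.size = PAGE_SIZE,
	};
	u64 iova = 0x100000000ULL;		/* 64-bit logical start for the MR */
	struct ib_mr *mr;

	mr = ib_reg_phys_mr(pd, &pbuf, 1,
			    IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_READ,
			    &iova);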
From jengelh at linux01.gwdg.de Fri Dec 30 21:36:18 2005 From: jengelh at linux01.gwdg.de (Jan Engelhardt) Date: Sat, 31 Dec 2005 06:36:18 +0100 (MET) Subject: [openib-general] Re: [PATCH 0 of 20] [RFC] ipath - PathScale InfiniPath driver In-Reply-To: <1135884385.6804.0.camel@mindpipe> References: <200512291901.jBTJ1rOm017519@laptop11.inf.utfsm.cl> <1135884385.6804.0.camel@mindpipe> Message-ID: >> > - Someone asked for the kernel's i2c infrastructure to be used,but >> > our i2c usage is very specialised, and it would be more of a mess >> > to use the kernel's >> >> Problem with that is that if everybody and Aunt Tillie does the same, >> the kernel as a whole gets to be a mess. > >ALSA does the exact same thing for the exact same reason. Maybe an >indication that the kernel's i2c layer is too heavy? Sounds like a discussion a while back why jfs/xfs/reiser3/reiser4 all have their own journalling - compared to ext3-jbd. Jan Engelhardt -- From iod00d at hp.com Fri Dec 30 23:12:50 2005 From: iod00d at hp.com (Grant Grundler) Date: Fri, 30 Dec 2005 23:12:50 -0800 Subject: [openib-general] backwards compatibility In-Reply-To: <20051229235000.GB13951@mellanox.co.il> References: <20051229235000.GB13951@mellanox.co.il> Message-ID: <20051231071250.GE32607@esmail.cup.hp.com> On Fri, Dec 30, 2005 at 01:50:00AM +0200, Michael S. Tsirkin wrote: > Hi! > I'm reading a thread on lkml about backwards compatibility > http://lkml.org/lkml/2005/12/29/204 > and I wander whether we should work harder on supporting > older userspace library ABIs in kernel? > Currently, we only implement backwards compatibility in user-space, > so that you always have to upgrade userspace when upgrading the kernel, > but we could do this in kernel, too. > > We would just need the kernel to return a pair of ABI revision numbers: > minimal and maximal ABI supported. > > What do you guys think? I don't expect this fly with kernel.org. Stuff like this just clutters up the kernel source tree for the most part. Userspace suffers for it (a la glibc) but I still think it's easier to deal with there. grant From arjan at infradead.org Sat Dec 31 00:36:24 2005 From: arjan at infradead.org (Arjan van de Ven) Date: Sat, 31 Dec 2005 09:36:24 +0100 Subject: [openib-general] Re: [PATCH 10 of 20] ipath - core driver, part 3 of 4 In-Reply-To: <1135986615.13318.82.camel@serpentine.pathscale.com> References: <1135986615.13318.82.camel@serpentine.pathscale.com> Message-ID: <1136018184.2901.6.camel@laptopd505.fenrus.org> On Fri, 2005-12-30 at 15:50 -0800, Bryan O'Sullivan wrote: > On Fri, 2005-12-30 at 10:46 -0800, Linus Torvalds wrote: > > > All your user page lookup/pinning code is terminally broken. > > Yes, this has been pointed out by a few others. > > > Crap like this must not be merged. > > I'm already busy decrappifying it... the point I think also was the fact that it exists is already wrong :) makes it easier for you.. "rm" is a very powerful decrappify tool, as is "block delete" in just about any editor ;) From eitan at mellanox.co.il Sat Dec 31 01:42:45 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sat, 31 Dec 2005 11:42:45 +0200 Subject: [openib-general] RE: Some opensm/osm_vl15intf.c questions Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102B3E4@mtlexch01.mtl.com> Hi Hal, As Yael was working on the ref-counting issues (a month or two ago) I will let her answer. It is very possible we are missing some. Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. 
Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Friday, December 30, 2005 6:03 PM > To: Eitan Zahavi > Cc: openib-general at openib.org > Subject: Some opensm/osm_vl15intf.c questions > > Hi Eitan, > > In chasing an issue with a trap repress not being sent in a certain > scenario, I stumbled across the following questions about > opensm/osm_vl15intf.c. > > 1. osm_vl15_post increments qp0_mads_outstanding when a response is > expected (rfifo) and not when unsolicited (ufifo) (what appears to be > called unicasts): > > osm_vl15_post: > if( p_madw->resp_expected == TRUE ) > { > cl_qlist_insert_tail( &p_vl->rfifo, (cl_list_item_t*)p_madw ); > cl_atomic_inc( &p_vl->p_stats->qp0_mads_outstanding ); > } > else > { > cl_qlist_insert_tail( &p_vl->ufifo, (cl_list_item_t*)p_madw ); > } > > osm_vl15_shutdown retires all outstanding MADs as follows: > > osm_vl15_shutdown: > while ( p_madw != (osm_madw_t*)cl_qlist_end( &p_vl->ufifo ) ) > { > if( osm_log_is_active( p_vl->p_log, OSM_LOG_DEBUG ) ) > { > osm_log( p_vl->p_log, OSM_LOG_DEBUG, > "osm_vl15_shutdown: " > "Releasing Response p_madw = %p\n", p_madw ); > } > > osm_mad_pool_put( p_mad_pool, p_madw ); > cl_atomic_dec( &p_vl->p_stats->qp0_mads_outstanding ); > > p_madw = (osm_madw_t*)cl_qlist_remove_head( &p_vl->ufifo ); > } > > Either post should increment qp0_mads_outstanding for unsolicited or > shutdown shouldn't decrement it when removing from ufifo. If you agree, > which should it be ? > > 2. In the case of a failure from osm_vendor_send, __osm_vl15_poller > decrements qp0_mads_outstanding regardless of whether a response is > expected. This is inconsistent with the increment. This leads me to > believe that this should also be incremented for unsolicited (unicasts) > as well as those for which responses are expected. Is this correct or am > I missing something ? > > So my conclusion is that in osm_vl15_post, it should be: > > if( p_madw->resp_expected == TRUE ) > { > cl_qlist_insert_tail( &p_vl->rfifo, (cl_list_item_t*)p_madw ); > } > else > { > cl_qlist_insert_tail( &p_vl->ufifo, (cl_list_item_t*)p_madw ); > } > cl_atomic_inc( &p_vl->p_stats->qp0_mads_outstanding ); > > If you agree, I will generate a patch for this. Thanks. > > -- Hal From rminnich at lanl.gov Sat Dec 31 07:37:01 2005 From: rminnich at lanl.gov (Ronald G Minnich) Date: Sat, 31 Dec 2005 08:37:01 -0700 Subject: [openib-general] PathScale license In-Reply-To: <1135958883.8578.47.camel@strider.opengridcomputing.com> References: <54AD0F12E08D1541B826BE97C98F99F11421DE@NT-SJCA-0751.brcm.ad.broadcom.com> <20051230025627.GA2706@cuprite.internal.keyresearch.com> <1135958883.8578.47.camel@strider.opengridcomputing.com> Message-ID: <43B6A59D.3000906@lanl.gov> is there any chance that pathscale could reword that to be less confusing? It clearly caused a lot of confusion and worry for folks on this list. 
Is there any chance that PathScale could reword that to be less
confusing? It clearly caused a lot of confusion and worry for folks on
this list.

ron

From bos at pathscale.com Sat Dec 31 08:27:31 2005
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Sat, 31 Dec 2005 08:27:31 -0800
Subject: [openib-general] PathScale license
In-Reply-To: <43B6A59D.3000906@lanl.gov>
References: <54AD0F12E08D1541B826BE97C98F99F11421DE@NT-SJCA-0751.brcm.ad.broadcom.com>
	<20051230025627.GA2706@cuprite.internal.keyresearch.com>
	<1135958883.8578.47.camel@strider.opengridcomputing.com>
	<43B6A59D.3000906@lanl.gov>
Message-ID: <1136046452.18623.2.camel@localhost.localdomain>

On Sat, 2005-12-31 at 08:37 -0700, Ronald G Minnich wrote:
> is there any chance that pathscale could reword that to be less
> confusing? It clearly caused a lot of confusion and worry for folks
> on this list.

We're looking into it. All the vampires^H^H^H^H^H^H^H^Hlawyers are on
their winter break at the moment, so it will take a bit to clear it up.

Hi,

The OpenIB diagnostics
(https://openib.org/svn/gen2/trunk/src/userspace/management/diags)
have been updated as follows:

1. discover.pl diagnostic tool added

discover.pl uses a topology file created by ibnetdiscover, a
discover.map file which the network administrator creates to indicate
the nodes expected to be present, and a discover.topo file which
describes the expected connectivity. It produces a new connectivity
file (discover.topo.new) and outputs the changes to stdout. The
network administrator can choose to replace the "old" topo file with
the new one or to merge selected changes in.

The syntax of the discover.map file is:

<node guid>|port|"Text for node"|# comment

e.g.

8f10400410015|8|"ISR 6000"|# SW-6IB4 Voltaire port 0 lid 5
8f10403960558|2|"HCA 1"|# MT23108 InfiniHost Mellanox Technologies

The syntax of the old and new topo files (discover.topo and
discover.topo.new) is:

<lid>|<node guid>|<port>|<remote node guid>

e.g.

10|5442ba00003080|1|8f10400410015

These topo files are produced by the discover.pl tool.

2. ibportstate diagnostic tool added to query, disable, and enable
switch ports

3. Added an errors-only mode to the diagnostic scripts so there is
less data to weed through on a large fabric (and a verbose mode to see
everything)

4. Tree structure collapsed so all tools are in the same directory
rather than individual ones, and the build was simplified

Let me know about any comments or issues. Thanks.

-- Hal

_______________________________________________
openib-general mailing list
openib-general at openib.org
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general
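To illustrate the map file layout concretely, here is a minimal parser
for one line. The field breakdown is inferred from the examples above;
discover.pl itself is a Perl script, so this C fragment is purely
illustrative and is not code from the diags package:

/* Parse one discover.map line of the form
 *   <node guid>|<port>|"<description>"|# comment
 * Illustration of the format only; not code from the diags package. */
#include <stdio.h>
#include <inttypes.h>

int main(void)
{
	const char *line = "8f10400410015|8|\"ISR 6000\"|# SW-6IB4 Voltaire";
	uint64_t guid;
	unsigned int port;
	char desc[64];

	if (sscanf(line, "%" SCNx64 "|%u|\"%63[^\"]\"|",
		   &guid, &port, desc) == 3)
		printf("guid 0x%" PRIx64 " port %u: %s\n", guid, port, desc);
	return 0;
}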
From hozer at hozed.org Sat Dec 31 18:40:00 2005
From: hozer at hozed.org (Troy Benjegerdes)
Date: Sat, 31 Dec 2005 20:40:00 -0600
Subject: [openib-general] Userspace testing results (2.6.15-rc7-git2 with modules)
In-Reply-To: <20051230004313.GA8111@us.ibm.com>
References: <20051230004313.GA8111@us.ibm.com>
Message-ID: <20060101024000.GG14100@narn.hozed.org>

> Currently, I am running netpipe, iperf and netperf (these three tests
> are giving horrible results, but we are pretty sure that it is a
> local issue, as both eth1 and ib0 based tests lead to poor
> performance) and also netpipe with a patch from Shirley Ma to run
> over native IB [1]. Additionally, I am running the 4 pingpong tests
> (rc, srq, uc, ud) and the two perftest tests: rdma_lat and rdma_bw.
> There are some issues with some size combinations; or, at least,
> that is how it seems to me.

I assume this is the same patch I have here:

http://source.scl.ameslab.gov/hg/netpipe3-dev

Brad Benton also found that the way NetPIPE polls to see if a message
has arrived does not work well with the relaxed memory consistency
model the Power5 systems with ehca seem to use. His workaround was to
add a 'dcbf' instruction in NetPIPE, but that's obviously not
portable. Shouldn't we have a function in OpenIB to force immediate
cache consistency checks?
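The portable way to express what the 'dcbf' hack is doing is an
explicit read barrier between observing the "message arrived" flag and
reading the payload. Note the caveat: a barrier only orders accesses,
while 'dcbf' actually flushes the cache line, so this sketch assumes
cache-coherent DMA. The rmb() definitions and the buffer layout below
are invented for illustration, not an existing OpenIB API:

/* Illustrative only: polling a DMA'd completion flag with an explicit
 * read barrier. Assumes cache-coherent DMA; rmb() and the buffer
 * layout are invented for this sketch. */
#if defined(__powerpc__) || defined(__powerpc64__)
#define rmb()	__asm__ __volatile__ ("sync" : : : "memory")
#else
#define rmb()	__asm__ __volatile__ ("" : : : "memory")
#endif

struct msg {
	volatile unsigned int ready;	/* sender writes this last */
	char payload[1024];
};

static void wait_for_msg(struct msg *m)
{
	while (!m->ready)
		;	/* spin until the flag is observed */
	rmb();		/* order the flag read before any payload reads */
}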
From spoole at lanl.gov Thu Dec 29 07:52:43 2005
From: spoole at lanl.gov (Steve Poole)
Date: Thu, 29 Dec 2005 08:52:43 -0700
Subject: [openib-general] RE: Technical content of Sonoma Workshop Feb 5-8
In-Reply-To: 
References: 
Message-ID: <6.2.5.6.0.20051229085139.01ff3ec8@lanl.gov>

Matt and I are working on a list of what the PF OpenIB project has
paid for and what is in the queue. I have a document on what we will
need for 12X-QDR IB, and more of what we will need for iSCSI and the
like. It is coming together. The original doc is 2 yrs old. :-(

Steve...

At 05:57 AM 12/27/2005, Head Bubba wrote:
> Windows and Linux were required yesterday...
>
> Since we will have both Mellanox and PathScale for a roundtable
> session, we should also add any enhancements we need in future HCAs
> and the firmware to the discussion - from a little SDP Proof of
> Concept we did at CSFB, we ended up needing a firmware upgrade.
>
> As for SDP...
>
> At the roundtable we did not go into the gory details of the SDP
> Proof of Concept we did at CSFB with Mellanox, in which we could
> dynamically change the virtual lane being used, so I think at this
> one we should get Mellanox to go over the details with us, so that
> something gets done with SDP in OpenIB to give it the real
> implementation it needs (for those not having the details, contact
> Nimrod).
>
> A better implementation of SDP is needed. This is a good first step
> to get off of TCP/IP without code changes, but it has also been
> problematic in our experience. Additionally, it needs to be coded
> better to deliver near-native performance. We use SDP to eliminate
> TCP/IP issues, so IPoIB is not viable for us.
>
> As for RDS, we should all see who has it aside from Bubba, which
> everyone knows about, and whether or not we can get an end-user
> experience discussed.
>
> We also would like to virtualize everything... the server, the
> desktop, the fabric, the storage, etc... to create a Virtual
> Resource Market (VRM).
>
>
> -----Original Message-----
> From: Bill Boas [mailto:bboas at llnl.gov]
> Sent: Saturday, December 24, 2005 2:01 PM
> To: Woodruff, Robert J
> Cc: Matt Leininger; Steve Poole; Hal Rosenstock; Roland Dreier; Head
> Bubba; Peter Krey; openib-promoters at openib.org;
> openib-windows at openib.org; openib-general at openib.org
> Subject: RE: Technical content of Sonoma Workshop Feb 5-8
>
>
> Woody,
>
> It'll be a longshot for the Pats to get to the Superbowl this year,
> I think. But I hope!
>
> Your list is a great start, but isn't each item you mention in the
> context of Release 1.0?
>
> From the Labs and Wall Street perspectives, the preference is to
> "tie a ribbon around" Rel 1.0 (both Windows and Linux) as soon as we
> can, and go to the next stage of the evolution of the stack.
>
> So that means making the definition of the content of Rel 2.0 the
> main technical focus of the workshop.
>
> OpenIB PathForward Phase 2, iWARP integration, QOS, improved OpenSM,
> and more ......
>
> Perhaps Matt, Steve Poole, Hal, Roland, HB, Peter, and others will
> join in this discussion and express different opinions.....
>
> Bill.
>
> At 02:56 PM 12/23/2005, you wrote:
> > I'll give it some thought and try to start a discussion on the
> > list. Some ideas for a technical track that come to mind are:
> >
> > RDS - perhaps we could get someone from Oracle and Silverstorm to
> > present something on this. There has been some discussion on the
> > list, but not sure we have everyone aligned on what needs to be
> > done for this.
> >
> > Core S/W update: where we are and where we are going moving
> > forward.
> >
> > Generic RDMA support: what is there, what needs to be done.
> >
> > iSer update.
> >
> > SDP update: what needs to be done before it is ready to be pushed
> > upstream.
> >
> > OpenMPI update.
> >
> > OpenSM and diags update.
> >
> > Linux distributor update: RedHat, Suse, ...
> >
> > New H/W support: Pathscale, IBM ?
> >
> > Why the Patriots didn't win another superbowl; can we give someone
> > else a turn please...
> >
> > Were there any specific topics that the DOE folks would like to
> > hear on the technical side ?
> >
> > I'll be OOP on vacation next week, but will probably be checking
> > email and perhaps we can start a discussion on the list.
> >
> > woody
> >
> > -----Original Message-----
> > From: Bill Boas [mailto:bboas at llnl.gov]
> > Sent: Friday, December 23, 2005 12:02 PM
> > To: Woodruff, Robert J
> > Subject: RE: [openib-general] Please register for Sonoma Workshop
> >
> > No agenda yet, and definitely need help...... I was planning to
> > send out ideas... maybe you could start that process, please.
> >
> > At 01:42 PM 12/22/2005, you wrote:
> > > Hi Bill,
> > >
> > > Do you have a proposed agenda for this yet
> > > or need any help in putting one together.
> > >
> > > Trying to determine who from my team should attend.
> > >
> > > woody
> >
> > Bill Boas                  bboas at llnl.gov
> > ICCD LLNL, B-453, R-2018   Wk: 925-422-4110
> > 7000 East Ave, L-555       Cell: 925-337-2224
> > Livermore, CA 94551        Pgr: 877-203-2248
>
> Bill Boas                  bboas at llnl.gov
> ICCD LLNL, B-453, R-2018   Wk: 925-422-4110
> 7000 East Ave, L-555       Cell: 925-337-2224
> Livermore, CA 94551        Pgr: 877-203-2248