[openib-general][RFC][PATCH] core/sysfs.c: ability to reset port counters

Michael S. Tsirkin mst at mellanox.co.il
Tue May 9 06:17:13 PDT 2006


I think the capability to reset counters is useful.
Some comments on the patch:

Quoting r. Leonid Arsh <leonida at voltaire.com>:
> Subject: [openib-general][RFC][PATCH] core/sysfs.c: ability to reset port counters
> 
> 
> Hello,
> 
>      we need a possibility to reset the port counters in /sys/class/infiniband/mthca0/ports/1/counters/.
> 
>      The attached patch implements the ability to reset the counters by writing to the sysfs  counter files.
>      The patch uses the same process_mad() mechanism as in the counter reading, but with IB_MGMT_METHOD_SET.
> 
>      The patch was checked on IBED-1.0-rc3 code, with MTHCA adaptor.
> 
>      I  checked also possibility to set specific counter values, but always got the counter reset instead.

That's what the spec says.
IB spec:
C16-7: When initially powered-up or reset, the value of all counters on all
ports of a node shall be set to zero. During operation, instead of overflowing,
they shall stop at all ones. At any time, writing (Set) zero into a
counter shall cause the counter to be reset to zero.
Note that writing (Set) anything other than zero into a counter results in
undefined behavior.

>      The questions are:
>           Is it a protocol or a firmware limitation, that I couldn't set specific values?
>           If there is a way to set specific values, should we implement it?

I don't think it would be useful even if some devices supported it.

>           Should we implement an ability to reset (or set) specific counters,
>           just like like I did in this patch?

Generally I think what you do is fine. However you must validate the input
and reject any attempt to set the value to anything except 0, to avoid
triggering undefined behaviour in hardware.

>                (this can cause inconsistency between  counter values,
>                 for example between  port_xmit_data and port_xmit_packets)

You can't read the values atomically anyway, so why do you care
about atomic reset?

>           Should we create an additional sysfs entry for the counter reset purpose
>           (like /sys/class/infiniband/mthca0/ports/1/reset_counters) instead?

I wouldn't think so.

> 
>     Regards,
>        Leonid


Some comments:

> Signed-off-by: Leonid Arsh <leonida at voltaire.com>
> 
> --- linux-kernel/infiniband/core/sysfs.c.orig	2006-05-07 23:07:10.000000000 +0300
> +++ linux-kernel/infiniband/core/sysfs.c	2006-05-09 17:55:55.000000000 +0300
> @@ -88,8 +88,24 @@
>  	return port_attr->show(p, port_attr, buf);
>  }
>  
> +static ssize_t port_attr_store(struct kobject *kobj,
> +			       struct attribute *attr, const char *buf, size_t len)

Move more parameters to the first line.

Documentation/CodingStyle:
"Descendants are always substantially shorter than the parent"

> +{
> +	struct port_attribute *port_attr =
> +		container_of(attr, struct port_attribute, attr);
> +	struct ib_port *p = container_of(kobj, struct ib_port, kobj);
> +
> +	if (!port_attr->store)
> +		return -EIO;
> +	if (!ibdev_is_alive(p->ibdev))
> +		return -ENODEV;
> +
> +	return port_attr->store(p, port_attr, buf, len);
> +}
> +
>  static struct sysfs_ops port_sysfs_ops = {
> -	.show = port_attr_show
> +	.show = port_attr_show,
> +	.store = port_attr_store
>  };
>  
>  static ssize_t state_show(struct ib_port *p, struct port_attribute *unused,
> @@ -292,10 +308,65 @@
>  
>  #define PORT_PMA_ATTR(_name, _counter, _width, _offset)			\
>  struct port_table_attribute port_pma_attr_##_name = {			\
> -	.attr  = __ATTR(_name, S_IRUGO, show_pma_counter, NULL),	\
> +	.attr  = __ATTR(_name, S_IRUGO | S_IWUSR,			\
> +	 show_pma_counter, store_pma_counter),				\

Please move show_pma_counter back to the first line and align descendants at
least to the right from the parent brace:

Documentation/CodingStyle:
"Descendants are always substantially shorter than the parent and are placed
 substantially to the right."


>  	.index = (_offset) | ((_width) << 16) | ((_counter) << 24)	\
>  }
>  
> +static ssize_t store_pma_counter(struct ib_port *p, struct port_attribute *attr,
> +				 const char *buf, size_t count)

You seem to be ignoring the value user is writing.
Could this be why you always get the value set to 0?

Writing 0x1 into sysfs and getting 0 is surprizing.

I think you must check that the value is 0, since:

C16-7: When initially powered-up or reset, the value of all counters on all
ports of a node shall be set to zero. During operation, instead of overflowing,
they shall stop at all ones. At any time, writing (Set) zero into a
counter shall cause the counter to be reset to zero.
Note that writing (Set) anything other than zero into a counter results in
undefined behavior.

Any other value should return -EINVAL.


> +{
> +	struct port_table_attribute *tab_attr =
> +		container_of(attr, struct port_table_attribute, attr);
> +	int counter = (tab_attr->index >> 24) & 0xff;

index is 32 bit, so index >> 24 can't have high bits set, so 0xff is
unnecessary.

Also, make counter u8 for clarity.

> +	struct ib_mad *in_mad  = NULL;
> +	struct ib_mad *out_mad = NULL;

= NULL shouldn't be necessary.

> +	ssize_t ret;

No need for this variable - you can just assign to count.

> +
> +	if (!p->ibdev->process_mad)

This is really bad: user can't be expected to parse var log messages
to figure out why does echo fail. Can we avoid creating the sysfs file if
process_mad is NULL?

> +	{

{ must be on same line with if.

> +		printk("store_pma_counter() process_mad() == NULL");

printk left over from debug?

> +		ret = -EINVAL;
> +		goto out;

return -EINVAL here and don't waste cycles initializing ib_mad/out_mad.

> +	}
> +
> +	in_mad  = kzalloc(sizeof *in_mad, GFP_KERNEL);
> +	out_mad = kmalloc(sizeof *in_mad, GFP_KERNEL);
> +	if (!in_mad || !out_mad) {
> +		printk("store_pma_counter() NOMEM");
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	memset( in_mad, 0, sizeof *in_mad );

memset after kzalloc?
Also space inside ().

> +	in_mad->mad_hdr.base_version  = 1;
> +	in_mad->mad_hdr.mgmt_class    = IB_MGMT_CLASS_PERF_MGMT;
> +	in_mad->mad_hdr.class_version = 1;
> +	in_mad->mad_hdr.method        = IB_MGMT_METHOD_SET;
> +	in_mad->mad_hdr.attr_id       = cpu_to_be16(0x12); /* PortCounters */
> +
> +	*(__be16 *)(in_mad->data+42) = cpu_to_be16( ((__u16)1) << counter ); /* CounterSelect field */

Speces are required around +.

Don't use __u16 in source - its mainly for headers.
And the cast is not needed here anyway.

> +
> +	in_mad->data[41] = p->port_num;	/* PortSelect field */

A huge number of magic constants here. Please use named constants instead.

Instead of casts, I think it would be much better to create a proper structure
describing the MAD format that you use.
Then you just cast in_mad->data to that type and fill it in.


> +
> +	if ((p->ibdev->process_mad(p->ibdev, IB_MAD_IGNORE_MKEY,
> +		 p->port_num, NULL, NULL, in_mad, out_mad) &
> +	     (IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY)) !=

This line is too long. How about a temporary variable here?

!= is weaker than -> and function call, you don't need the extra ().


> +	    (IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY)) {
> +		printk("store_pma_counter() EINVAL");

Leftovers from debug?

> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	ret = count;
> +out:
> +	kfree(in_mad);
> +	kfree(out_mad);
> +
> +	return ret;
> +}
> +
>  static ssize_t show_pma_counter(struct ib_port *p, struct port_attribute *attr,
>  				char *buf)
>  {
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 

-- 
MST



More information about the general mailing list