[openib-general] [PATCH] opensm: truncate log file when fs is overflowed

Sasha Khapyorsky sashak at voltaire.com
Sun Aug 20 10:18:08 PDT 2006


On 13:01 Sun 20 Aug     , Hal Rosenstock wrote:
> Hi Sasha,
> 
> On Sun, 2006-08-20 at 12:05, Sasha Khapyorsky wrote:
> > In case when OpenSM log file overflows filesystem and write() fails with
> > 'No space left on device' try to truncate the log file and wrap-around
> > logging.
> 
> Should it be an (admin) option as to whether to truncate the file or not
> or is there no way to continue without logging (other than this) once
> the log file fills the disk ?

In theory OpenSM may continue, but I don't think it is a good idea to
leave a full disk on the SM machine (by default the log is written under
'/var/log'). To me, truncating the file looks like a reasonable default
behavior, and I don't think we need an option for it.

> 
> See comment below as well.
> 
> -- Hal
> 
> > Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
> > ---
> > 
> >  osm/opensm/osm_log.c |   23 +++++++++++++++--------
> >  1 files changed, 15 insertions(+), 8 deletions(-)
> > 
> > diff --git a/osm/opensm/osm_log.c b/osm/opensm/osm_log.c
> > index 668e9a6..b4700c8 100644
> > --- a/osm/opensm/osm_log.c
> > +++ b/osm/opensm/osm_log.c
> > @@ -58,6 +58,7 @@ #include <stdarg.h>
> >  #include <fcntl.h>
> >  #include <sys/types.h>
> >  #include <sys/stat.h>
> > +#include <errno.h>
> >  
> >  #ifndef WIN32
> >  #include <sys/time.h>
> > @@ -152,6 +153,7 @@ #endif    
> >      cl_spinlock_acquire( &p_log->lock );
> >  #ifdef WIN32
> >      GetLocalTime(&st);
> > + _retry:
> >      ret = fprintf(   p_log->out_port, "[%02d:%02d:%02d:%03d][%04X] -> %s",
> >                       st.wHour, st.wMinute, st.wSecond, st.wMilliseconds,
> >                       pid, buffer);
> > @@ -159,6 +161,7 @@ #ifdef WIN32
> >  #else
> >      pid = pthread_self();
> >      tim = time(NULL);
> > + _retry:
> >      ret = fprintf( p_log->out_port, "%s %02d %02d:%02d:%02d %06d [%04X] -> %s",
> >                     ((result.tm_mon < 12) && (result.tm_mon >= 0) ? 
> >                      month_str[result.tm_mon] : "???"),
> > @@ -166,6 +169,18 @@ #else
> >                     result.tm_min, result.tm_sec,
> >                     usecs, pid, buffer);
> >  #endif /*  WIN32 */
> > +
> > +    if (ret >= 0)
> > +      log_exit_count = 0;
> > +    else if (errno == ENOSPC && log_exit_count < 3) {
> > +      int fd = fileno(p_log->out_port);
> > +      fprintf(stderr, "log write failed: %s. Will truncate the log file.\n",
> > +              strerror(errno));
> > +      ftruncate(fd, 0);
> 
> Should return from ftruncate be checked here ?

It could be checked, but I don't think a potential ftruncate() failure
should change the flow: if it fails, we will still try to continue with
lseek() (so that the file at least wraps around).
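
To illustrate that flow, here is a minimal standalone sketch, not the
patch itself. It assumes a plain file descriptor and write(2) instead of
the fprintf() on p_log->out_port used in osm_log.c, which sidesteps stdio
buffering. The names log_write_wrap and LOG_MAX_RETRIES are hypothetical.
The ftruncate() result is checked only so the failure can be reported;
the code rewinds with lseek() either way.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define LOG_MAX_RETRIES 3   /* hypothetical limit, mirrors the "< 3" in the patch */

/*
 * Write msg to fd. On ENOSPC, try to truncate the file, rewind to the
 * start and write the whole message again, at most LOG_MAX_RETRIES times.
 * Returns 0 on success, -1 on failure.
 */
static int log_write_wrap(int fd, const char *msg)
{
    const char *p;
    size_t left;
    int retries = 0;

    assert(msg != NULL);
    p = msg;
    left = strlen(msg);

    while (left > 0) {
        ssize_t n = write(fd, p, left);

        if (n > 0) {            /* full or partial write: advance */
            p += n;
            left -= (size_t)n;
            continue;
        }
        if (n == 0)             /* no progress and no error: give up */
            return -1;
        if (errno != ENOSPC || retries++ >= LOG_MAX_RETRIES)
            return -1;

        fprintf(stderr, "log write failed: %s. Will truncate the log file.\n",
                strerror(errno));

        /* A failed truncate is reported but does not change the flow. */
        if (ftruncate(fd, 0) < 0)
            fprintf(stderr, "ftruncate failed: %s. Rewinding anyway.\n",
                    strerror(errno));

        /* Wrap around: without a working rewind there is nothing left to try. */
        if (lseek(fd, 0, SEEK_SET) < 0)
            return -1;

        /* Restart the whole message so the line is not split in two. */
        p = msg;
        left = strlen(msg);
    }
    return 0;
}
```

A caller would hold the log lock around log_write_wrap(fd, line), just as
osm_log() holds p_log->lock around the fprintf() call.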

Sasha

> 
> > +      lseek(fd, 0, SEEK_SET);
> > +      log_exit_count++;
> > +      goto _retry;
> > +    }
> >      
> >      /*
> >        Flush log on errors too.
> > @@ -174,14 +189,6 @@ #endif /*  WIN32 */
> >        fflush( p_log->out_port );
> >      
> >      cl_spinlock_release( &p_log->lock );
> > -    
> > -    if (ret < 0)
> > -    {
> > -      if (log_exit_count++ < 10)
> > -      {
> > -        fprintf(stderr, "OSM LOG FAILURE! Quota probably exceeded\n");
> > -      }
> > -    }
> >    }
> >  }
> >  
> 



