[openib-general] [PATCH] opensm: truncate log file when fs is overflowed
Sasha Khapyorsky
sashak at voltaire.com
Tue Aug 29 12:01:40 PDT 2006
On 14:18 Tue 29 Aug , Hal Rosenstock wrote:
> Hi Sasha,
>
> On Tue, 2006-08-29 at 14:15, Sasha Khapyorsky wrote:
> > On 18:28 Sun 27 Aug , Doug Ledford wrote:
> > > On Sun, 2006-08-20 at 20:18 +0300, Sasha Khapyorsky wrote:
> > > > On 13:01 Sun 20 Aug , Hal Rosenstock wrote:
> > > > > Hi Sasha,
> > > > >
> > > > > On Sun, 2006-08-20 at 12:05, Sasha Khapyorsky wrote:
> > > > > > When the OpenSM log file fills up the filesystem and write() fails
> > > > > > with 'No space left on device' (ENOSPC), try to truncate the log file
> > > > > > and wrap logging around.
> > > > >
> > > > > Should it be an (admin) option whether to truncate the file or not,
> > > > > or is there no way to continue without logging (other than this) once
> > > > > the log file fills the disk?
> > > >
> > > > In theory OpenSM could continue, but I don't think it is a good idea to
> > > > leave an overflowed disk on the SM machine (by default the log is in
> > > > '/var/log'). To me, truncating there looks like reasonable default
> > > > behavior; I don't think we need the option.
> > >
> > > I would definitely put the option in, and in fact would default it to
> > > *NOT* truncate. If the disk is full, you have no idea why. It *might*
> > > be your logs, or it might be a mail bomb filling /var/spool/mail. I'm
> > > sure as an admin the last thing I would want is my apps deciding, based
> > > upon incomplete information, that wiping out their log files is the
> > > right thing to do. To me that sounds more like an intruder covering his
> > > tracks than a reasonable thing to do when confronted with ENOSPC.
> > >
> > > Truncating logs is something best left up to the admin that's dealing
> > > with the disk full problem in the first place. After all, if it is
> > > something like an errant app filling the mail spool, truncating the logs
> > > just loses valuable logs while at the same time making room for the app
> > > to keep on adding more to /var/spool/mail. That's just wrong. If you
> > > run out of space, just quit logging things until the admin clears the
> > > problem up. If you put this code in, make the admin turn it on. That
> > > will keep opensm friendly to appliance-like devices that are single-task
> > > subnet managers. But I don't think having this patch always on makes
> > > any sense on a multi-task server.
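Doug's suggested default (stop logging while the disk is full, and truncate only if the admin opts in) can be sketched as below. This is a minimal sketch, not OpenSM code: `struct quiet_log`, its fields and `quiet_log_result()` are hypothetical names, and the caller is assumed to pass in each write's return value and `errno`. Messages that fail with ENOSPC are counted and dropped instead of anything being truncated, and logging resumes on its own once a write succeeds again (for example after the admin has freed space).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical logger state; none of these names come from OpenSM. */
struct quiet_log {
	int truncate_on_enospc; /* admin opt-in; 0 (off) by default */
	int disk_full;          /* set while writes fail with ENOSPC */
	unsigned long dropped;  /* messages discarded while disk_full is set */
};

/*
 * Record the outcome of one write attempt (ret < 0 means it failed and
 * err holds the errno value). With the default policy a full disk only
 * suspends logging; nothing is truncated. Returns 1 if the caller should
 * truncate and retry (only when the admin enabled that), 0 otherwise.
 */
static int quiet_log_result(struct quiet_log *lg, int ret, int err)
{
	if (ret >= 0) {
		/* A write succeeded: space is back, so report what was lost. */
		if (lg->disk_full) {
			fprintf(stderr, "log: space available again, %lu message(s) dropped\n",
				lg->dropped);
			lg->disk_full = 0;
			lg->dropped = 0;
		}
		return 0;
	}
	if (err != ENOSPC)
		return 0; /* other errors are not handled by this policy */
	if (lg->truncate_on_enospc)
		return 1; /* admin explicitly asked for truncation */

	/* Default: warn once, then silently drop until space returns. */
	if (!lg->disk_full) {
		fprintf(stderr, "log: %s, suspending logging\n", strerror(err));
		lg->disk_full = 1;
	}
	lg->dropped++;
	return 0;
}
```

Keeping the decision separate from the actual write makes the policy easy to reason about: the only path that destroys log data is the one an admin switched on.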
> >
> > My expectation is that, while OpenSM is running, it will cause ENOSPC
> > more often than mail bombs or other activities will.
> >
> > But I see your point: don't take this control away from the admin. I will
> > make this ENOSPC handling optional. Another patch that was already
> > submitted adds an option which limits the OpenSM log file size; I will
> > put the ENOSPC processing under the same option.
> >
> > Hal, I will resend the patch soon.
>
> I'd prefer an incremental patch on top of the last one related to this, if
> that isn't too much work, as I'm close to committing the previous one now
> (and it'd be more work to start over on this).
Ok. Here it is:
Optional log file truncation upon ENOSPC errors.
Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
diff --git a/osm/opensm/osm_log.c b/osm/opensm/osm_log.c
index bc5f25c..e1c43d1 100644
--- a/osm/opensm/osm_log.c
+++ b/osm/opensm/osm_log.c
@@ -174,9 +174,11 @@ #endif
if (ret < 0 && errno == ENOSPC && log_exit_count < 3) {
fprintf(stderr, "osm_log write failed: %s. Truncating log file.\n",
strerror(errno));
- truncate_log_file(p_log);
log_exit_count++;
- goto _retry;
+ if (p_log->max_size) {
+ truncate_log_file(p_log);
+ goto _retry;
+ }
}
else {
log_exit_count = 0;
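The control flow after this incremental patch can be restated as a pure decision function, which makes it possible to check the behavior without actually filling a disk. This is a sketch for illustration only: `enum log_action` and `decide_after_write()` are hypothetical names, `max_size` stands in for `p_log->max_size` (nonzero meaning the size-limit option is on, as the `if (p_log->max_size)` test implies), and `exit_count` plays the role of `log_exit_count`.

```c
#include <errno.h>

/* Hypothetical outcomes of one log write attempt. */
enum log_action {
	LOG_DONE,               /* the write succeeded */
	LOG_TRUNCATE_AND_RETRY, /* ENOSPC, option on: truncate and goto _retry */
	LOG_DROP                /* the message is lost (not retried) */
};

/*
 * Mirrors the patched branch: on ENOSPC with fewer than 3 consecutive
 * attempts, bump the counter and retry only if a maximum log size is
 * configured. Every other case resets the counter, as the patch's
 * else branch does.
 */
static enum log_action
decide_after_write(int ret, int err, unsigned long max_size, int *exit_count)
{
	if (ret < 0 && err == ENOSPC && *exit_count < 3) {
		(*exit_count)++;
		return max_size ? LOG_TRUNCATE_AND_RETRY : LOG_DROP;
	}
	*exit_count = 0;
	return ret < 0 ? LOG_DROP : LOG_DONE;
}
```

One detail this separation exposes: in the patch as posted, the "Truncating log file." notice is printed before `p_log->max_size` is checked, so it also appears when the option is off and nothing is truncated. A caller built on a function like this could print that notice only for `LOG_TRUNCATE_AND_RETRY`.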
Sasha
>
> -- Hal
>
> > Sasha
> >
> > >
> > > > >
> > > > > See comment below as well.
> > > > >
> > > > > -- Hal
> > > > >
> > > > > > Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
> > > > > > ---
> > > > > >
> > > > > > osm/opensm/osm_log.c | 23 +++++++++++++++--------
> > > > > > 1 files changed, 15 insertions(+), 8 deletions(-)
> > > > > >
> > > > > > diff --git a/osm/opensm/osm_log.c b/osm/opensm/osm_log.c
> > > > > > index 668e9a6..b4700c8 100644
> > > > > > --- a/osm/opensm/osm_log.c
> > > > > > +++ b/osm/opensm/osm_log.c
> > > > > > @@ -58,6 +58,7 @@ #include <stdarg.h>
> > > > > > #include <fcntl.h>
> > > > > > #include <sys/types.h>
> > > > > > #include <sys/stat.h>
> > > > > > +#include <errno.h>
> > > > > >
> > > > > > #ifndef WIN32
> > > > > > #include <sys/time.h>
> > > > > > @@ -152,6 +153,7 @@ #endif
> > > > > > cl_spinlock_acquire( &p_log->lock );
> > > > > > #ifdef WIN32
> > > > > > GetLocalTime(&st);
> > > > > > + _retry:
> > > > > > ret = fprintf( p_log->out_port, "[%02d:%02d:%02d:%03d][%04X] -> %s",
> > > > > > st.wHour, st.wMinute, st.wSecond, st.wMilliseconds,
> > > > > > pid, buffer);
> > > > > > @@ -159,6 +161,7 @@ #ifdef WIN32
> > > > > > #else
> > > > > > pid = pthread_self();
> > > > > > tim = time(NULL);
> > > > > > + _retry:
> > > > > > ret = fprintf( p_log->out_port, "%s %02d %02d:%02d:%02d %06d [%04X] -> %s",
> > > > > > ((result.tm_mon < 12) && (result.tm_mon >= 0) ?
> > > > > > month_str[result.tm_mon] : "???"),
> > > > > > @@ -166,6 +169,18 @@ #else
> > > > > > result.tm_min, result.tm_sec,
> > > > > > usecs, pid, buffer);
> > > > > > #endif /* WIN32 */
> > > > > > +
> > > > > > + if (ret >= 0)
> > > > > > + log_exit_count = 0;
> > > > > > + else if (errno == ENOSPC && log_exit_count < 3) {
> > > > > > + int fd = fileno(p_log->out_port);
> > > > > > + fprintf(stderr, "log write failed: %s. Will truncate the log file.\n",
> > > > > > + strerror(errno));
> > > > > > + ftruncate(fd, 0);
> > > > >
> > > > > Should the return value of ftruncate() be checked here?
> > > >
> > > > It may be checked, but I don't think a potential ftruncate() failure
> > > > should change the flow: in case of failure we will still continue with
> > > > lseek() (in order to at least wrap around the file).
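The approach described here (check the ftruncate() result, report a failure, but still rewind with lseek() so logging at least wraps around) might look like this on a raw file descriptor. `wrap_log_fd()` and its return codes are hypothetical; the patch itself calls ftruncate() and lseek() inline on `fileno(p_log->out_port)` without checking either result. The sketch works on the descriptor only and does not deal with data still sitting in the stdio buffer of the `FILE *` that owns it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: empty the file behind fd and rewind it so the next
 * write starts at offset 0. A failed ftruncate() is reported but does not
 * stop the rewind, so old data can at least be overwritten in place.
 * Returns 0 if truncated and rewound, 1 if only rewound, -1 if the
 * rewind failed too.
 */
static int wrap_log_fd(int fd)
{
	int truncated = 1;

	if (ftruncate(fd, 0) != 0) {
		fprintf(stderr, "ftruncate failed: %s\n", strerror(errno));
		truncated = 0; /* keep going: wrapping is still better than nothing */
	}
	if (lseek(fd, 0, SEEK_SET) == (off_t)-1) {
		fprintf(stderr, "lseek failed: %s\n", strerror(errno));
		return -1;
	}
	return truncated ? 0 : 1;
}
```

Returning distinct codes lets the caller log whether the file was really emptied or only rewound, without changing the flow Sasha describes.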
> > > >
> > > > Sasha
> > > >
> > > > >
> > > > > > + lseek(fd, 0, SEEK_SET);
> > > > > > + log_exit_count++;
> > > > > > + goto _retry;
> > > > > > + }
> > > > > >
> > > > > > /*
> > > > > > Flush log on errors too.
> > > > > > @@ -174,14 +189,6 @@ #endif /* WIN32 */
> > > > > > fflush( p_log->out_port );
> > > > > >
> > > > > > cl_spinlock_release( &p_log->lock );
> > > > > > -
> > > > > > - if (ret < 0)
> > > > > > - {
> > > > > > - if (log_exit_count++ < 10)
> > > > > > - {
> > > > > > - fprintf(stderr, "OSM LOG FAILURE! Quota probably exceeded\n");
> > > > > > - }
> > > > > > - }
> > > > > > }
> > > > > > }
> > > > > >
> > > > >
> > > >
> > > --
> > > Doug Ledford <dledford at redhat.com>
> > > GPG KeyID: CFBFF194
> > > http://people.redhat.com/dledford
> > >
> > > Infiniband specific RPMs available at
> > > http://people.redhat.com/dledford/Infiniband
> >
> >
>