Discussion:
The case of the phantom reboot
Nick Holland
2021-03-29 12:28:33 UTC
Permalink
OpenBSD 6.8 GENERIC#5 i386
One of my systems rebooted at 03:01 local time today. I've seen kernel
panics and bad hardware but I've never seen OpenBSD "just reboot" by
itself, ever.
OpenBSD, not usually. Hardware OpenBSD is running on? Sure.
There's no cron job that would do this. last(1) is no help; it shows the
reboot    ~                                 Sat Mar 27 03:01
root      ttyp0    192.168.0.132            Wed Mar 24 11:23 - 11:23  (00:00)
wtmp.0 begins Wed Mar 24 11:23 2021
root      ttyp0    192.168.0.132            Tue Mar 16 21:30 - 21:30  (00:00)
root      ttyp0    75.82.86.131             Tue Mar 16 13:14 - 21:30  (08:15)
root      ttyp0    75.82.86.131             Sun Mar 14 21:20 - 21:29  (00:08)
root      ttyp0    75.82.86.131             Sat Mar 13 17:42 - 21:13  (03:31)
The date gaps seem odd. I've ssh'd into this system multiple times
between March 16-27. I don't see other signs of trouble in /var/log.
I could use some help in looking for evidence of foul play, or "just" a
hardware or software problem.
Thanks in advance for further troubleshooting clues.
dn
What kind of a machine is it running on? I remember having reboot
problems on certain HP and Supermicro servers with hardware watchdogs.
This is a 10+-year-old Dell 1U server with a 2-GHz Celeron 440, part of
a pair running CARP. Aside from having to replace spinning disks with
SSDs a couple of years ago, they've been rock solid.
basic machine, worked for a long time, then starts giving problems, almost
certainly a hw problem unless you can tie the problem to a recent upgrade.
And that's not terribly likely on "basic" hardware.

Every broken device started out "rock solid" ... until it isn't. That's
the definition of "Broken".
I too have seen issues with Supermicros but that's with other OSs. I've
never had a spontaneous reboot on this system and am concerned from
the wtmp stuff above that this *may* have been triggered externally. I
could use some clues in other things to check. Thanks.
As Stuart pointed out, that comes from the boot process, not the shutdown.
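
A rough way to act on that point, sketched in Python: if the "reboot" record is written at boot, then what matters is whether a "shutdown" record sits directly before it in wtmp. The sketch below assumes (this is not stated in the thread) that an orderly reboot(8) or shutdown(8) logs a "shutdown" pseudo-user entry, so a "reboot" with no "shutdown" right before it hints at a crash, reset, or power loss. The function name and input format are hypothetical; the input is just the first column of last(1) output, newest first.

```python
def classify_reboots(users_newest_first):
    """Return (index, clean) for each "reboot" record.

    users_newest_first: first column of last(1) output, in the order
    last(1) prints it (newest record first).
    clean is True only when the next-older record is "shutdown".
    Assumption: an orderly shutdown logs a "shutdown" entry to wtmp.
    """
    results = []
    for i, user in enumerate(users_newest_first):
        if user != "reboot":
            continue
        # The record printed just below a "reboot" line is the one
        # written immediately before it in time.
        has_older = i + 1 < len(users_newest_first)
        older = users_newest_first[i + 1] if has_older else None
        results.append((i, older == "shutdown"))
    return results


if __name__ == "__main__":
    # First column of the output quoted in this thread.
    print(classify_reboots(["reboot", "root", "root", "root", "root", "root"]))
```

A result of False only says no shutdown record was seen; it cannot tell a power glitch from a watchdog reset or a deliberate hard reset.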

If you are really curious, you could put a serial console on it and wait
for the next event. PROBABLY won't see much, however.

Believe me, I'm all in favor of recycling computers -- in fact, as I
often tell skeptical employers, I'd rather have two ten year old systems
than one brand new system with a service contract, but computers don't
last as long as they used to, and curiously, some big-name servers seem
to sometimes have a shorter life than some desktops. A ten year old
computer that does the job reliably is good, but not an expectation.

Nick.
Rick Aliwalas
2021-03-28 23:40:13 UTC
Permalink
It is something that could possibly be caused by bad hardware or a
glitch in the power feed amongst other options (the latter may affect
some machines differently than others).
I've had a string of power "blips" over the last year or so. Oddly
enough, the OpenBSD machine always stays up and a Debian machine
next to it on the same power strip reboots. I always figured it was
due to the superior operating system ;)
Marco Scholz
2021-03-30 18:08:21 UTC
Permalink
On Sun, Mar 28, 2021 at 08:05:58PM -0000, Stuart Henderson wrote:
[...]
It is something that could possibly be caused by bad hardware or a
glitch in the power feed amongst other options (the latter may affect
some machines differently than others).
Power glitch, bad power supply, bad RAM, ...
Do you have a UPS? If so I bet it's a hardware problem.
David Newman
2021-03-29 19:51:36 UTC
Permalink
OpenBSD 6.8 GENERIC#5 i386
One of my systems rebooted at 03:01 local time today. I've seen kernel
panics and bad hardware but I've never seen OpenBSD "just reboot" by
itself, ever.
OpenBSD, not usually.  Hardware OpenBSD is running on? Sure.
There's no cron job that would do this. last(1) is no help; it shows the
reboot    ~                                 Sat Mar 27 03:01
root      ttyp0    192.168.0.132            Wed Mar 24 11:23 - 11:23  (00:00)
wtmp.0 begins Wed Mar 24 11:23 2021
root      ttyp0    192.168.0.132            Tue Mar 16 21:30 - 21:30  (00:00)
root      ttyp0    75.82.86.131             Tue Mar 16 13:14 - 21:30  (08:15)
root      ttyp0    75.82.86.131             Sun Mar 14 21:20 - 21:29  (00:08)
root      ttyp0    75.82.86.131             Sat Mar 13 17:42 - 21:13  (03:31)
The date gaps seem odd. I've ssh'd into this system multiple times
between March 16-27. I don't see other signs of trouble in /var/log.
I could use some help in looking for evidence of foul play, or "just" a
hardware or software problem.
Thanks in advance for further troubleshooting clues.
dn
What kind of a machine is it running on? I remember having reboot
problems on certain HP and Supermicro servers with hardware watchdogs.
This is a 10+-year-old Dell 1U server with a 2-GHz Celeron 440, part of
a pair running CARP. Aside from having to replace spinning disks with
SSDs a couple of years ago, they've been rock solid.
basic machine, worked for a long time, then starts giving problems, almost
certainly a hw problem unless you can tie the problem to a recent upgrade.
And that's not terribly likely on "basic" hardware.
Every broken device started out "rock solid" ... until it isn't.  That's
the definition of "Broken".
I too have seen issues with Supermicros but that's with other OSs. I've
never had a spontaneous reboot on this system and am concerned from
the wtmp stuff above that this *may* have been triggered externally. I
could use some clues in other things to check. Thanks.
As Stuart pointed out, that comes from the boot process, not the shutdown.
If you are really curious, you could put a serial console on it and wait
for the next event.  PROBABLY won't see much, however.
Believe me, I'm all in favor of recycling computers -- in fact, as I
often tell skeptical employers, I'd rather have two ten year old systems
than one brand new system with a service contract, but computers don't
last as long as they used to, and curiously, some big-name servers seem
to sometimes have a shorter life than some desktops. A ten year old
computer that does the job reliably is good, but not an expectation.
I hope it is "just" a hardware problem. These ancient machines don't owe
me anything. If anything they've been a testament to how well OpenBSD
just works, year in, year out.

Until I can swap in a replacement (the unit in question is in a colo in
another state), I may try Stuart's suggestion of enabling accounting.
The only concern I have about an external actor is that there seem to be
some missing entries in wtmp, but I don't know enough about init or wtmp
to rule out a hardware glitch.
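
For anyone wanting to quantify the "missing entries" rather than eyeball them, here is a minimal Python sketch that parses last(1)-style lines and lists the intervals with no recorded session start. It is an illustration, not a forensic tool: the function names, the two-day threshold, and the year argument (last(1) does not print one on each line) are all assumptions, and the sample is the output quoted earlier in this thread.

```python
import re
from datetime import datetime, timedelta

MONTHS = {name: num for num, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Start timestamp of a last(1) record, e.g. "Wed Mar 24 11:23".
STAMP = re.compile(
    r"\b(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)\s+"
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+"
    r"(\d{1,2})\s+(\d{2}):(\d{2})")

SAMPLE = """\
reboot    ~                                 Sat Mar 27 03:01
root      ttyp0    192.168.0.132            Wed Mar 24 11:23 - 11:23  (00:00)
wtmp.0 begins Wed Mar 24 11:23 2021
root      ttyp0    192.168.0.132            Tue Mar 16 21:30 - 21:30  (00:00)
root      ttyp0    75.82.86.131             Tue Mar 16 13:14 - 21:30  (08:15)
root      ttyp0    75.82.86.131             Sun Mar 14 21:20 - 21:29  (00:08)
root      ttyp0    75.82.86.131             Sat Mar 13 17:42 - 21:13  (03:31)
"""


def parse_last(text, year):
    """Return a sorted list of (start_time, user) from last(1)-style text."""
    events = []
    for line in text.splitlines():
        if " begins " in line:      # "wtmp.0 begins ..." is a file marker
            continue
        match = STAMP.search(line)
        if not match:
            continue
        month, day, hour, minute = match.groups()
        start = datetime(year, MONTHS[month], int(day), int(hour), int(minute))
        events.append((start, line.split()[0]))
    return sorted(events)


def find_gaps(events, threshold):
    """Return (earlier, later) pairs of consecutive records further apart than threshold."""
    gaps = []
    for (earlier, _), (later, _) in zip(events, events[1:]):
        if later - earlier > threshold:
            gaps.append((earlier, later))
    return gaps


if __name__ == "__main__":
    for earlier, later in find_gaps(parse_last(SAMPLE, 2021), timedelta(days=2)):
        print(f"no records between {earlier} and {later} ({later - earlier})")
```

A gap found this way only shows that no login record survives for that period. Log rotation or a truncated file could produce the same picture, so it does not by itself separate tampering from a hardware glitch.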

Someone else suggested a battery problem, which seems plausible for a
unit this old.

Appreciate all the feedback -- many thanks.

dn
Rafael Possamai
2021-04-01 21:51:24 UTC
Permalink
One of my systems rebooted at 03:01 local time today.
Do you happen to have a cat nearby?
David Newman
2021-04-05 19:03:54 UTC
Permalink
Post by Rafael Possamai
One of my systems rebooted at 03:01 local time today.
Do you happen to have a cat nearby?
:-)

I'm allergic, and this box is in a colo.

Appreciate all the feedback. I've enabled accounting per Stuart's
suggestion and am pretty sure this is a hiccup on old hardware.

dn
