6.4-release tset(1) really slow, what have I missed?

Discussion:

Adam Thompson

2018-12-02 22:09:56 UTC

I've successfully installed OpenBSD 6.4-RELEASE at OVH, but I'm noticing
one thing there that's different from everywhere else I've used 6.4.

tset(1) takes approximately 12-15 seconds to execute, (almost) every
time.

On a DigitalOcean VPS running 6.3-STABLE (via openup) tset sensibly
takes about 1 or 2 seconds:
***@mail.athompso.net:~$ time tset -s
TERM=xterm;
0m01.01s real 0m00.00s user 0m00.01s system
***@mail.athompso.net:~$ uname -r
6.3

On the OVH VPS running 6.4-STABLE (via openup), the same command takes
15 seconds:
***@mail2.athompso.net:~$ time tset -s
TERM=xterm;
0m15.19s real 0m00.00s user 0m00.01s system
***@mail2.athompso.net:~$ uname -r
6.4

That's from two SSH sessions from the same client with the same
parameters.

I've captured ktrace(1) output, which shows tset(1) doing, well,
nothing:
...
57429/443422 tset 0.035908 CALL
kbind(0x7f7fffff7678,24,0xecf2201fc1aab9ca)
57429/443422 tset 0.035933 RET kbind 0
57429/443422 tset 0.035950 CALL
nanosleep(0x7f7fffff7760,0x7f7fffff7750)
57429/443422 tset 0.035967 STRU struct timespec { 1 }
57429/443422 tset 15.809238 STRU struct timespec { 0 }
57429/443422 tset 15.809272 RET nanosleep 0
57429/443422 tset 15.809303 CALL
kbind(0x7f7fffff76c8,24,0xecf2201fc1aab9ca)
57429/443422 tset 15.809380 RET kbind 0
...

I don't think this is a bug in 6.4, it's clearly environment-specific...
but I have no idea what on earth could be causing it.

(dmesg, etc. omitted in this first message, since it's so ridiculously
specific.)

Oh, and to make it even weirder, it doesn't ALWAYS happen. It ran
quickly twice earlier today, but never again.

Normally I'd say "it's DNS", and I thought it was due to the slow login
times, but ktrace(1) says otherwise.

Any ideas?

Thanks,
-Adam

Philip Guenther

2018-12-03 02:50:32 UTC

Permalink

Post by Adam Thompson
I've successfully installed OpenBSD 6.4-RELEASE at OVH, but I'm noticing
one thing there that's different from everywhere else I've used 6.4.
tset(1) takes approximately 12-15 seconds to execute, (almost) every
time.
On a DigitalOcean VPS running 6.3-STABLE (via openup) tset sensibly
TERM=xterm;
0m01.01s real 0m00.00s user 0m00.01s system
6.3
On the OVH VPS running 6.4-STABLE (via openup), the same command takes
TERM=xterm;
0m15.19s real 0m00.00s user 0m00.01s system
6.4
That's from two SSH sessions from the same client with the same
parameters.
I've captured ktrace(1) output, which shows tset(1) doing, well,
...
57429/443422 tset 0.035908 CALL
kbind(0x7f7fffff7678,24,0xecf2201fc1aab9ca)
57429/443422 tset 0.035933 RET kbind 0
57429/443422 tset 0.035950 CALL
nanosleep(0x7f7fffff7760,0x7f7fffff7750)
57429/443422 tset 0.035967 STRU struct timespec { 1 }
57429/443422 tset 15.809238 STRU struct timespec { 0 }
57429/443422 tset 15.809272 RET nanosleep 0
57429/443422 tset 15.809303 CALL
kbind(0x7f7fffff76c8,24,0xecf2201fc1aab9ca)
57429/443422 tset 15.809380 RET kbind 0
...
I don't think this is a bug in 6.4, it's clearly environment-specific...
but I have no idea what on earth could be causing it.

It requested a sleep of 1 second and 15 seconds passed. That's a kernel
timetracking issue, so the output of "sysctl kern.timecounter" would be a
good place to start. Is this is an MP kernel using the CPU TSC, but on a
VM where the virtual CPU's TSCs aren't in sync?

Philip Guenther

Adam Thompson

2018-12-03 04:12:23 UTC

Permalink

It requested a sleep of 1 second and 15 seconds passed. That's a kernel timetracking issue, so the output of "sysctl kern.timecounter" would be a good place to start. Is this is an MP kernel using the CPU TSC, but on a VM where the virtual CPU's TSCs aren't in sync?
Philip Guenther

Thanks for the pointer! I noticed, once, that the system clock was way
off, but I assumed that was one of the boots where I didn't have
networking at startup so ntpd(8) -s failed to operate as expected. Bad
kernel timekeeping would also produce that result, though.

Anyway:

***@mail2.athompso.net:~$ sysctl kern.timecounter
kern.timecounter.tick=1
kern.timecounter.timestepwarnings=0
kern.timecounter.hardware=acpitimer
kern.timecounter.choice=i8254(0) acpitimer0(1000) dummy(-1000000)

and it's an SP kernel running on one vCPU. No way of knowing what's
happening under the hood, other than that it's OpenStack Nova, which
IIRC means KVM virtualization. Note the lack of advertised TSC support.

I'm unsure if my test is valid, but I switched to i8254 (confirmed
successful via sysctl), and tset(1) continues to pause for an
unnaturally long time. But then I rebooted and re-tested the same
sysctl vaules, and this time tset(1) took 1sec, as expected. WTF...

Well, putting that into sysctl.conf seems to be a reasonable workaround
for now. I also enabled timestepwarnings, and they are occurring,
although I don't yet know how often.

Is this likely to be a big enough problem that I should ditch this VPS
platform for now?

dmesg output, to verify SP kernel with no TSC:

OpenBSD 6.4 (GENERIC) #1: Mon Nov 26 09:32:17 CET 2018
***@syspatch-64-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
real mem = 4177379328 (3983MB)
avail mem = 4041621504 (3854MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf6890 (10 entries)
bios0: vendor SeaBIOS version "2:1.10.2-58953eb7" date 04/01/2014
bios0: OpenStack Foundation OpenStack Nova
acpi0 at bios0: rev 0
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP SSDT APIC
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel Core Processor (Haswell, no TSX, IBRS), 2394.81 MHz,
06-3c-01
cpu0:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,SSE3,PCLMUL,VMX,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,IBRS,IBPB,ARAT,XSAVEOPT,MELTDOWN
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB
64b/line 16-way L2 cache
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 1000MHz
ioapic0 at mainbus0: apid 0 pa 0xfec00000, version 11, 24 pins
acpiprt0 at acpi0: bus 0 (PCI0)
acpicpu0 at acpi0: C1(@1 halt!)
...etc...

Thanks,

-Adam

Adam Thompson

2018-12-03 04:14:53 UTC

Permalink

I'm unsure if my test is valid, but I switched to i8254 (confirmed successful via sysctl), and tset(1) continues to pause for an unnaturally long time. But then I rebooted and re-tested the same sysctl vaules, and this time tset(1) took 1sec, as expected. WTF...
Well, putting that into sysctl.conf seems to be a reasonable workaround for now. I also enabled timestepwarnings, and they are occurring, although I don't yet know how often.

I understand why I got inconsistent results... I had two different hosts
open in the two windows I was using. Thank goodness they were both just
personal systems!

I'll re-test tomorrow morning when I'm more awake!

-Adam