Discussion:
pf, relayd, TCP keep alive and NAT, oh my!
Cameron Simpson
2021-06-01 00:25:38 UTC
Can I enforce or implement TCP keep alives on a TCP stream via my
firewall?

Background:

I've got a client with an OpenBSD firewall sitting behind a Telstra NBN
modem.

Their IMAP server is upstream in the cloud (Ubuntu, courier imap). I
have this odd problem which I am beginning to suspect is the NBN modem
getting bored and dropping its NAT entries. Let me explain...

At the firewall end I see about 30 ESTABLISHED connections to the IMAP
server. At the IMAP server I see over 500, which is about where the IMAP
service stops accepting new connections, leading to errors from the
client mail readers.

My current theory is that the IMAP client connections issue the IMAP
IDLE command and go passive, waiting for email notifications from the
server. So we have an idle TCP connection across the firewall and
across the NBN modem (which NATs).

My conjecture is that at some point the modem discards idle connection
states. (This could just as well happen at any other intermediate
stateful router too.) After that event, the client end does something
which tries to use the connection, gets an RST from the modem, clean
tidyup happens on the client and in the firewall.

At the server end, none of this is seen and the imapd just sits around
idle, never releasing the connection and never stopping the matching
daemon process. This gradually rises to hit the server's configured
connection limit and it stops accepting new things.

If I had TCP keep alive turned on, both ends might tidy themselves up.
I can't enable that on the clients (various mail readers) or,
apparently, on the server configuration. I can't do it in PF because PF
just copies packets. I can't seem to do it in relayd either, though that
seems the obvious way to intercept the connection for this purpose.

Any suggestions?

I haven't fully validated my conjecture yet, BTW. It just fits the
symptoms I see.

Plan B is to build the latest courier-imap from source if I find the
time, but there may be no build option for this. I guess a single
setsockopt() call in the source would be enough, _if_ that can be done
on the accept end, which I haven't checked.

Plan B0 might be to disable IMAP IDLE support. Hmm.

Cheers,
Cameron Simpson <***@cskk.id.au>
Dirk Coetzee
2021-06-01 08:53:40 UTC
Hi Cameron,

As a first guess, I would consider changing / implementing "set optimization". This made a massive difference on our customer's satellite internet connection.

man pf.conf

     set optimization environment
             Optimize state timeouts for one of the following network
             environments:

             aggressive
                     Aggressively expire connections.  This can greatly reduce
                     the memory usage of the firewall at the cost of dropping
                     idle connections early.
             conservative
                     Extremely conservative settings.  Avoid dropping
                     legitimate connections at the expense of greater memory
                     utilization (possibly much greater on a busy network) and
                     slightly increased processor utilization.
             high-latency
                     A high-latency environment (such as a satellite
                     connection).
             normal  A normal network environment.  Suitable for almost all
                     networks.
             satellite
                     Alias for high-latency.

             The default value is normal.

Cameron Simpson
2021-06-02 00:30:05 UTC
Post by Dirk Coetzee
As a first guess, I would consider changing / implementing "set
optimization". This made a massive difference on our customer's satellite
internet connection.
The customer has a terrestrial ISP connection.

I've got satellite at home, and do indeed use this setting.

I'm not sure it will help my client though.

Cheers,
Cameron Simpson <***@cskk.id.au>
Claudio Jeker
2021-06-01 09:04:19 UTC
Post by Cameron Simpson
Can I enforce or implement TCP keep alives on a TCP stream via my
firewall?
I've got a client with an OpenBSD firewall and a Telstra NBN modem as
their modem.
Their IMAP server is upstream in the cloud (Ubuntu, courier imap). I
have this odd problem which I am beginning to suspect is the NBN modem
getting bored and dropping its NAT entries. Let me explain...
At the firewall end I see about 30 ESTABLISHED connections to the IMAP
server. At the IMAP server I see over 500, which is about where the IMAP
service stops accepting new connections, leading to errors from the
client mail readers.
My current theory is that the IMAP client connections issue the IMAP
IDLE command and go passive, waiting for email notifications from the
server. So we have an idle TCP connection across the firewall and
across the NBN modem (which NATs).
My conjecture is that at some point the modem discards idle connection
states. (This could just as well happen at any other intermediate
stateful router too.) After that event, the client end does something
which tries to use the connection, gets an RST from the modem, clean
tidyup happens on the client and in the firewall.
At the server end, none of this is seen and the imapd just sits around
idle, never releasing the connection and never stopping the matching
daemon process. This gradually rises to hit the server's configured
connection limit and it stops accepting new things.
If I had TCP keep alive turned on, both ends might tidy themselves up.
I can't enable that on the clients (various mail readers) or,
apparently, on the server configuration. I can't do it in PF because PF
just copies packets. I can't seem to do it in relayd either, though that
seems the obvious way to intercept the connection for this purpose.
Any suggestions?
Make sure you use 'block return' at least for the imap connections. This
way when the state is dropped the firewall will issue a RST packet to the
server which will close the connection.

On OpenBSD there is the 'net.inet.tcp.always_keepalive' sysctl to enable
keepalive by default. So that is something you can enable on the IMAP
server to force keep-alive on there. Other systems have similar knobs.
--
:wq Claudio
Cameron Simpson
2021-06-02 00:23:44 UTC
Post by Claudio Jeker
Make sure you use 'block return' at least for the imap connections.
I already do:

set block-policy return
[... and the first rule ...]
# reject everything except as detailed below
block return log
Post by Claudio Jeker
This
way when the state is dropped the firewall will issue a RST packet to the
server which will close the connection.
Alas, no. I believe that the _modem_ is dropping its NAT state (or some
upstream stateful switch is getting likewise bored) while the
connection is idle. The modem is probably sending an RST to the client
if it tries to use the connection after the modem has forgotten it, or
something similar, causing the client to make a new connection to recover.

The state table on the firewall itself seems fine (about 30 connections,
in keeping with the staff and devices in the office).

The problem is server side (cloud mail server). The connection goes
idle, the office modem forgets the NAT, the server never sees _any_
indication that the TCP is no longer valid because it's idle.
Post by Claudio Jeker
On OpenBSD there is the 'net.inet.tcp.always_keepalive' sysctl to
enable keepalive by default. So that is something you can enable on the IMAP
server to force keep-alive on there. Other systems have similar knobs.
The IMAP server is Linux, so I'll look at that. Thanks!

Setting this on the firewall and interposing relayd would also do the
trick, so that will be my fallback plan.

Thanks,
Cameron Simpson <***@cskk.id.au>
Stuart Henderson
2021-06-01 20:43:44 UTC
Post by Cameron Simpson
If I had TCP keep alive turned on, both ends might tidy themselves up.
I can't enable that on the clients (various mail readers) or,
apparently, on the server configuration. I can't do it in PF because PF
just copies packets. I can't seem to do it in relayd either, though that
seems the obvious way to intercept the connection for this purpose.
It looks like courier-imap does enable SO_KEEPALIVE if available.
By default, keepalive timers are long; on a random Linux I had handy:

$ grep . /proc/sys/net/ipv4/tcp_keepalive_*
/proc/sys/net/ipv4/tcp_keepalive_intvl:75
/proc/sys/net/ipv4/tcp_keepalive_probes:9
/proc/sys/net/ipv4/tcp_keepalive_time:7200

7200s (2h) initially, then every 75 seconds. (OpenBSD default times are
long too; 14400 "slowhz" intervals = also 2h).
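
As a rough worked example of what those defaults mean: the kernel waits tcp_keepalive_time seconds of idleness before the first probe, then sends probes tcp_keepalive_intvl seconds apart, and gives up after tcp_keepalive_probes unanswered ones. The sketch below just does that arithmetic. The function name and the "tuned" values are hypothetical illustrations, not recommendations from this thread.

```python
def dead_peer_detection_seconds(keepalive_time, keepalive_intvl, keepalive_probes):
    """Approximate seconds of silence before the kernel drops a dead connection.

    First probe goes out after keepalive_time idle seconds; the connection is
    abandoned once keepalive_probes probes, sent keepalive_intvl seconds
    apart, have all gone unanswered.
    """
    return keepalive_time + keepalive_intvl * keepalive_probes


# Linux defaults quoted above: time=7200, intvl=75, probes=9
default_timeout = dead_peer_detection_seconds(7200, 75, 9)

# Hypothetical tuned values (assumption, for illustration only)
tuned_timeout = dead_peer_detection_seconds(300, 30, 5)

print(default_timeout, tuned_timeout)
```

With the defaults a silently dead peer lingers for a little over two hours (7875 seconds); with the hypothetical tuned values it would be cleaned up after 450 seconds.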
Post by Cameron Simpson
Plan B is to build the latest courier-imap from source if I find the
time, but there may be no build option for this. I guess a single
setsockopt() call in the source would be enough, _if_ that can be done
on the accept end, which I haven't checked.
https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/addsupport.html but I don't think
you'll need it.
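
For reference, the per-socket approach that HOWTO describes can be sketched in a few lines. This is a minimal Python illustration, not courier-imap's actual code: it shows that SO_KEEPALIVE can be set on the socket returned by accept(), and that on Linux the timers can be overridden per socket with TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT. The helper names and the timer values are assumptions made up for the example.

```python
import socket


def enable_keepalive(sock, idle=300, interval=30, probes=5):
    """Turn on TCP keepalive for one socket (timer values are illustrative)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket timer overrides are platform specific (these names exist on
    # Linux); skip any that this platform's socket module does not define.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


def demo():
    """Accept one loopback connection and enable keepalive on the accept end."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    conn, _addr = listener.accept()
    enable_keepalive(conn)
    enabled = conn.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
    for s in (conn, client, listener):
        s.close()
    return enabled


if __name__ == "__main__":
    print("keepalive enabled on accepted socket:", demo())
```

Setting the timers per socket avoids changing the system-wide sysctls, at the cost of needing to patch or wrap the server.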

So you probably just need to lower tcp_keepalive_time, and perhaps adjust
tcp_keepalive_intvl. Note there is a tradeoff especially with mobile
clients; they will need to wake and transmit more often, so faster
keepalives will result in more battery/data use.
Post by Cameron Simpson
Plan B0 might be to disable IMAP IDLE support. Hmm.
Whether that will help depends on timings; I think it's a last-ditch
effort though, and it will make things noticeably worse for clients.
Cameron Simpson
2021-06-02 00:28:32 UTC
Post by Stuart Henderson
Post by Cameron Simpson
If I had TCP keep alive turned on, both ends might tidy themselves up.
I can't enable that on the clients (various mail readers) or,
apparently, on the server configuration. I can't do it in PF because PF
just copies packets. I can't seem to do it in relayd either, though that
seems the obvious way to intercept the connection for this purpose.
It looks like courier-imap does enable SO_KEEPALIVE if available.
Hmm. Ok. I wonder how recent that is? I have 5.0.6 IIRC, and current is
5.1.something.
Post by Stuart Henderson
$ grep . /proc/sys/net/ipv4/tcp_keepalive_*
/proc/sys/net/ipv4/tcp_keepalive_intvl:75
/proc/sys/net/ipv4/tcp_keepalive_probes:9
/proc/sys/net/ipv4/tcp_keepalive_time:7200
7200s (2h) initially, then every 75 seconds. (OpenBSD default times are
long too; 14400 "slowhz" intervals = also 2h).
Ah. A long time indeed. Yes, winding these down will help - the above
times are of the same order of magnitude as the time required to hit the
connection limits.
Post by Stuart Henderson
Post by Cameron Simpson
Plan B is to build the latest courier-imap from source if I find the
time, but there may be no build option for this. I guess a single
setsockopt() call in the source would be enough, _if_ that can be done
on the accept end, which I haven't checked.
https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/addsupport.html but I don't think
you'll need it.
Ta.
Post by Stuart Henderson
So you probably just need to lower tcp_keepalive_time, and perhaps adjust
tcp_keepalive_intvl. Note there is a tradeoff especially with mobile
clients; they will need to wake and transmit more often, so faster
keepalives will result in more battery/data use.
I can wind it down to a handful of minutes without any serious impact,
I'd expect.
Post by Stuart Henderson
Post by Cameron Simpson
Plan B0 might be to disable IMAP IDLE support. Hmm.
Whether that will help depends on timings; I think it's a last-ditch
effort though, and it will make things noticeably worse for clients.
Courier lets me change the advertised capabilities (it is not clear if
that affects the actual capabilities). No joy; possibly some clients
will try IDLE even if it isn't advertised and just cope if it is not
supported, so maybe some clients are using IDLE successfully anyway.

At any rate, dropping IDLE from the advertised list didn't help, and my
hourly "restart imapd" cron is live again :-(

I'll look at the keepalive settings on the server, many thanks!

Cheers,
Cameron Simpson <***@cskk.id.au>
Stuart Henderson
2021-06-02 08:53:40 UTC
Post by Cameron Simpson
Post by Stuart Henderson
Post by Cameron Simpson
If I had TCP keep alive turned on, both ends might tidy themselves up.
I can't enable that on the clients (various mail readers) or,
apparently, on the server configuration. I can't do it in PF because PF
just copies packets. I can't seem to do it in relayd either, though that
seems the obvious way to intercept the connection for this purpose.
It looks like courier-imap does enable SO_KEEPALIVE if available.
Hmm. Ok. I wonder how recent that is? I have 5.0.6 IIRC, and current is
5.1.something.
A long time - it was there in the initial git commit when the files were
imported from svn, certainly before 5.0.6.

https://github.com/svarshavchik/courier-libs/blame/142f42378608e593eb36ceb33895db99948427aa/tcpd/tcpd.c#L1238
Post by Cameron Simpson
Post by Stuart Henderson
$ grep . /proc/sys/net/ipv4/tcp_keepalive_*
/proc/sys/net/ipv4/tcp_keepalive_intvl:75
/proc/sys/net/ipv4/tcp_keepalive_probes:9
/proc/sys/net/ipv4/tcp_keepalive_time:7200
7200s (2h) initially, then every 75 seconds. (OpenBSD default times are
long too; 14400 "slowhz" intervals = also 2h).
Ah. A long time indeed. Yes, winding these down will help - the above
times are of the same order of magnitude as the time required to hit the
connection limits.
Yes - set in the days before stateful firewalls and NAT devices with limited
memory were common, so the only thing they really needed to protect
against was connections building up from clients that had crashed or
powered off, or from broken network paths.
