TLS suddenly not working over IKED site-to-site
Rachel Roch
2018-12-03 17:18:38 UTC
I hope someone here can shed light on an infuriating problem I’ve spent a week trying to resolve without luck.

The problem concerns an IKED site-to-site VPN on OpenBSD 6.3 (both endpoints fully syspatched).

The VPN worked absolutely perfectly until it suddenly started behaving strangely.  Seriously, I’m talking about “pass any traffic you can think of”, then I go on holiday for a week (nobody else has physical or remote access to the machines, and I did not connect on holiday), then this behaviour starts.

Basically the behaviour I am seeing is that anything that uses TLS is no longer able to connect (or at least gets no further than trying to do a TLS handshake, e.g. Firefox hangs showing "performing TLS handshake..." at the bottom of the screen), so that means:

- HTTPS websites
- VoIP
- IMAP over TLS
- RDP over TLS

Are all broken on the VPN, but all TLS-based services continue to work perfectly off-site (or when the site-to-site VPN is bypassed with a third-party VPN).  This impacts multiple servers and multiple clients, so it's not just one server or one desktop PC, it's anything that tries to talk TLS over that VPN!


However:
- Ping (including large packet size, e.g. “-s 1600”)
- SSH
- DNS
- Anything else you care to name that doesn’t use TLS

All continue to work perfectly over the VPN.

My PF rules (which cannot possibly be the problem, because they have not changed a single bit between “working” and “not working”) don’t even differentiate between traffic types, so it can’t be some sudden PF oddity:

pass in on enc from <remote_vpnets> to <local_vpnets> keep state (if-bound) $midPriority
pass out on enc from <local_vpnets> to <remote_vpnets> keep state (if-bound) $midPriority

Similarly, my IKED config is also completely unchanged between "working" and "not working", and ipsecctl -sa continues to show everything correctly established:

ikev2 "to remote" active esp from $a_net to $b_net\
        local $local_ext peer $remote_ext \
        ikesa auth hmac-sha2-512 enc aes-256 prf hmac-sha2-512 group curve25519 \
        childsa enc chacha20-poly1305 group curve25519 \
        srcid $local_ext dstid $remote_ext \
        ikelifetime 4h lifetime 3h bytes 512M \
        ecdsa384


This whole thing is just driving me crazy !
Rachel Roch
2018-12-03 18:17:15 UTC
Hello,
This appears to be the same thing I have been having issues with and mentioned in a post to misc last week ("Untable ssl connections over ikev2 VPN") - (yes, typo intact - it should be "unstable").
I have tried adding a "max-mss 1300" directive into pf.conf (i.e.: "match in all scrub (no-df random-id max-mss 1300)").
At first, I _thought_ this made a difference, but I am not sure if that is really true.
I have also noticed that the TLS failures seem to vary based on OS. At this point, I was able to get an https connection to work with Firefox on MacOS, but the TLS handshake continues to hang (100% of the time) with Firefox on a Windows 7 PC. With an OpenBSD laptop, it seems like it sometimes works and sometimes doesn't (using "openssl s_client" to test).
I also made no changes in pf.conf or iked.conf from the working to non-working period.
I have no idea what to do; I am just posting my observations if that helps.
Thanks
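
As a rough cross-check on the "max-mss 1300" value mentioned above, the sketch below estimates the largest TCP MSS that should fit through an ESP tunnel without fragmentation. Every figure in it is an assumption rather than something measured in this thread: a 1500-byte path MTU, IPv4 tunnel mode, and the ChaCha20-Poly1305 ESP layout from RFC 7634 (8-byte IV, 16-byte ICV, padding to a 4-byte boundary). The function names are made up for illustration.

```python
# Rough estimate of the largest TCP MSS that fits in one ESP tunnel-mode packet.
# Every constant below is an assumption, not a value measured in this thread.

OUTER_IP = 20     # outer IPv4 header, no options
ESP_HEADER = 8    # SPI + sequence number
ESP_IV = 8        # ChaCha20-Poly1305 IV (RFC 7634)
ESP_ICV = 16      # Poly1305 authentication tag
ESP_TRAILER = 2   # pad length + next header bytes
ESP_ALIGN = 4     # inner packet + padding + trailer ends on a 4-byte boundary
INNER_IP = 20     # inner IPv4 header, no options
TCP_HEADER = 20   # TCP header without options


def max_inner_packet(path_mtu: int, udp_encap: bool = False) -> int:
    """Largest inner IP packet that fits in path_mtu after ESP encapsulation."""
    room = path_mtu - OUTER_IP - ESP_HEADER - ESP_IV - ESP_ICV
    if udp_encap:
        room -= 8  # extra UDP header when NAT traversal is in use
    room -= room % ESP_ALIGN  # encrypted part must be a multiple of ESP_ALIGN
    return room - ESP_TRAILER


def max_mss(path_mtu: int, udp_encap: bool = False) -> int:
    """Largest MSS whose full-size segment still fits in one ESP packet."""
    return max_inner_packet(path_mtu, udp_encap) - INNER_IP - TCP_HEADER


if __name__ == "__main__":
    for encap in (False, True):
        print(f"udp_encap={encap}: mss={max_mss(1500, encap)}")
```

Under these assumptions the limit comes out near 1400 bytes, so a clamp of 1300 would leave some headroom. That would be consistent with the clamp not clearly helping, though it does not rule out a smaller path MTU somewhere between the two sites.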
Hi,

Glad it's not just me !!! Even if you don't know the fix, at least I now know I haven't gone completely crazy !

For me it is more consistent: on OSX it's a 100% hang, on Windows 10 it's a 100% hang.  Haven't tried an OpenBSD client yet, I've been too busy putting emergency workarounds in place to bypass the site-to-site stuff. Will try an OpenBSD client though when I get a chance.

Appreciate you taking the time to email ... keep in touch !
Rachel Roch
2018-12-03 18:45:40 UTC
Rachel,
$ openssl s_client -connect <hostname>:<port> -showcerts
There are more possible options on s_client to debug more deeply but this is a good start.
--Paul
In answer to the above. Testing against three "random" servers  (see bottom of the email for full exchange, top three are through VPN, rest are bypassing VPN):

Through the VPN:
- Server "A" (HTTPS with "real" cert)- Nothing more than "CONNECTED (00000005)"
- Server "B" (HTTPS with "self-signed" cert)- Certificates get displayed (this correlates with behaviour seen in browser where I get shown the "do you want to continue" prompt, I can see details of the certs presented, but when I click continue it hangs)
- Server "C" (IMAPS) - Nothing more than "CONNECTED (00000005)"

Bypassing the VPN:
- Server "A" shows certs in openssl (and browser works ok)
- Server "C" shows certs in openssl (and email client works ok)

foobarOVERVPN $ openssl s_client -connect web1.example.com:443 -showcerts
CONNECTED(00000005)
^C
foobarOVERVPN $ openssl s_client -connect web2.example.com:8443 -showcerts
CONNECTED(00000005)
depth=0 C = US, ST = CA, L = San Jose, O = example.com, OU = MyCorp, CN = MyCorp
verify error:num=18:self signed certificate
verify return:1
depth=0 C = US, ST = CA, L = San Jose, O = example.com, OU = MyCorp, CN = MyCorp
verify return:1
---
Certificate chain
0 s:/C=ZZ/ST=AA/L=BB/O=example.com/OU=MyCorp/CN=MyCorp
   i:/C=ZZ/ST=AA/L=BB/O=example.com/OU=MyCorp/CN=MyCorp
-----BEGIN CERTIFICATE-----
<SNIP>
-----END CERTIFICATE-----
---
Server certificate
subject=/C=ZZ/ST=AA/L=BB/O=example.com/OU=MyCorp/CN=MyCorp
issuer=/C=ZZ/ST=AA/L=BB/O=example.com/OU=MyCorp/CN=MyCorp
---
No client certificate CA names sent
Server Temp Key: ECDH, P-256, 256 bits
---
SSL handshake has read 1316 bytes and written 326 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES128-GCM-SHA256
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES128-GCM-SHA256
    Session-ID: 5C0575730056006E28542F880C1AB6541729337C0DDBEC95347E2B5B4669EAD7
    Session-ID-ctx:
    Master-Key: 66B8EB1A3FB0857509627840D8DDB595659A5794D365D462DED737AAD4532F4AD542663B8BAE27A7665539D15C14ADEA
    Start Time: 1543861619
    Timeout   : 7200 (sec)
    Verify return code: 18 (self signed certificate)
---
^C

foobarOVERVPN $ openssl s_client -connect imaps.example.com:993 -showcerts
CONNECTED(00000005)
^C
foobarBYPASSVPN $ openssl s_client -connect web1.example.com:443 -showcerts
CONNECTED(00000005)
depth=3 C = SE, O = AddTrust AB, OU = AddTrust External TTP Network, CN = AddTrust External CA Root
verify return:1
depth=2 C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = COMODO RSA Certification Authority
verify return:1
depth=1 C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = COMODO RSA Domain Validation Secure Server CA
verify return:1
depth=0 OU = Domain Control Validated, OU = PositiveSSL, CN = web1.example.com
verify return:1
---
Certificate chain
0 s:/OU=Domain Control Validated/OU=PositiveSSL/CN=web1.example.com
   i:/C=GB/ST=Greater Manchester/L=Salford/O=COMODO CA Limited/CN=COMODO RSA Domain Validation Secure Server CA
-----BEGIN CERTIFICATE-----
<SNIP>
-----END CERTIFICATE-----
1 s:/C=GB/ST=Greater Manchester/L=Salford/O=COMODO CA Limited/CN=COMODO RSA Domain Validation Secure Server CA
   i:/C=GB/ST=Greater Manchester/L=Salford/O=COMODO CA Limited/CN=COMODO RSA Certification Authority
-----BEGIN CERTIFICATE-----
<SNIP>
-----END CERTIFICATE-----
2 s:/C=GB/ST=Greater Manchester/L=Salford/O=COMODO CA Limited/CN=COMODO RSA Certification Authority
   i:/C=SE/O=AddTrust AB/OU=AddTrust External TTP Network/CN=AddTrust External CA Root
-----BEGIN CERTIFICATE-----
<SNIP>
-----END CERTIFICATE-----
3 s:/C=SE/O=AddTrust AB/OU=AddTrust External TTP Network/CN=AddTrust External CA Root
   i:/C=SE/O=AddTrust AB/OU=AddTrust External TTP Network/CN=AddTrust External CA Root
-----BEGIN CERTIFICATE-----
<SNIP>
-----END CERTIFICATE-----
---
Server certificate
subject=/OU=Domain Control Validated/OU=PositiveSSL/CN=web1.example.com
issuer=/C=GB/ST=Greater Manchester/L=Salford/O=COMODO CA Limited/CN=COMODO RSA Domain Validation Secure Server CA
---
No client certificate CA names sent
Server Temp Key: ECDH, P-256, 256 bits
---
SSL handshake has read 6299 bytes and written 326 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES256-GCM-SHA384
    Session-ID: 6D5EC20FC1493D55F28309A02B1B589268F251D625A7EB3B5958426225C51795
    Session-ID-ctx:
    Master-Key: D8B2A8181AB5FB4BC6A55ED226CFA9D0F77CF539CE3E4A9FAE6524D631B42BB057375E96BD4EB6014C02996BD6A645C4
    Start Time: 1543861687
    Timeout   : 7200 (sec)
    Verify return code: 0 (ok)
---
^C
foobarBYPASSVPN $ openssl s_client -connect imaps.example.com:993 -showcerts
CONNECTED(00000005)
depth=2 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert High Assurance EV Root CA
verify return:1
depth=1 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert SHA2 High Assurance Server CA
verify return:1
depth=0 C = ZZ, L = MYTOWN, O = MYCORP, CN = imaps.example.com
verify return:1
---
Certificate chain
0 s:/C=ZZ/L=MYTOWN/O=MYCORP/CN=imaps.example.com
   i:/C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert SHA2 High Assurance Server CA
-----BEGIN CERTIFICATE-----
<SNIP>
-----END CERTIFICATE-----
1 s:/C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert SHA2 High Assurance Server CA
   i:/C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert High Assurance EV Root CA
-----BEGIN CERTIFICATE-----
<SNIP>
-----END CERTIFICATE-----
2 s:/C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert High Assurance EV Root CA
   i:/C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert High Assurance EV Root CA
-----BEGIN CERTIFICATE-----
<SNIP>
-----END CERTIFICATE-----
---
Server certificate
subject=/C=ZZ/L=MYTOWN/O=MYCORP/CN=imaps.example.com
issuer=/C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert SHA2 High Assurance Server CA
---
No client certificate CA names sent
Server Temp Key: ECDH, P-384, 384 bits
---
SSL handshake has read 4243 bytes and written 358 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES128-GCM-SHA256
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES128-GCM-SHA256
    Session-ID: F8BB41516529FA657513FB23B803D7CA0990B674446CB78A9D71184C93A810FE
    Session-ID-ctx:
    Master-Key: FCA69ED068B34A1A3B1256A0390A9508357762AFC9E9EEA605979B6A6CD3C2EEA5CEB29E9A67DF219213C924E29328A7
    TLS session ticket lifetime hint: 300 (seconds)
    TLS session ticket:
<SNIP>
    Start Time: 1543861700
    Timeout   : 7200 (sec)
    Verify return code: 0 (ok)
---
* OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE AUTH=PLAIN AUTH=LOGIN] Dovecot ready.
^C
Theodore Wynnychenko
2018-12-08 22:03:11 UTC
Rachel,
As a first step, try using s_client to connect to a TLS service and
$ openssl s_client -connect <hostname>:<port> -showcerts
There are more possible options on s_client to debug more deeply but
this is a good start.
--Paul
In answer to the above. Testing against three "random" servers (see
bottom of the email for full exchange, top three are through VPN, rest
I wanted to follow up on this.

I updated the servers that create the iked VPN to the 12/5 snapshot the other day.

I then took one machine on the "remote" net and ran openssl s_server.
I had another machine on the "local" net try to connect with openssl s_client.

So, the s_client shows:

SSL_connect:before/connect initialization
SSL_connect:SSLv3 write client hello A
... and nothing more.

The s_server shows:

Using auto DH parameters
Using default temp ECDH parameters
ACCEPT
SSL_accept:before/accept initialization
... and nothing more.

I also had tcpdump running at several places along the route.

On the outgoing/sending interface of the "s_client" machine I see:

21:19:54.735257 172.30.1.254.44715 > 172.30.7.205.443: S 950714671:950714671(0) win 16384 <mss 1460,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 523073824 0> (DF)

21:19:54.773320 172.30.7.205.443 > 172.30.1.254.44715: S 668125506:668125506(0) ack 950714672 win 16384 <mss 1300,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 590367976 523073824>

21:19:54.773391 172.30.1.254.44715 > 172.30.7.205.443: . ack 1 win 256 <nop,nop,timestamp 523073824 590367976> (DF)
21:19:54.774143 172.30.1.254.44715 > 172.30.7.205.443: P 1:197(196) ack 1 win 256 <nop,nop,timestamp 523073824 590367976> (DF)

21:19:56.272544 172.30.1.254.44715 > 172.30.7.205.443: P 1:197(196) ack 1 win 256 <nop,nop,timestamp 523073827 590367976> (DF)

21:19:59.272615 172.30.1.254.44715 > 172.30.7.205.443: P 1:197(196) ack 1 win 256 <nop,nop,timestamp 523073833 590367976> (DF)

21:20:05.272786 172.30.1.254.44715 > 172.30.7.205.443: P 1:197(196) ack 1 win 256 <nop,nop,timestamp 523073845 590367976>

21:20:10.743468 172.30.1.254.44715 > 172.30.7.205.443: F 197:197(0) ack 1 win 256 <nop,nop,timestamp 523073856 590367976>

21:20:10.781912 172.30.7.205.443 > 172.30.1.254.44715: . ack 1 win 261 <nop,nop,timestamp 590368008 523073824,nop,nop,sack 1 {197:197} >

21:20:12.124726 172.30.7.205.443 > 172.30.1.254.44715: F 1:1(0) ack 1 win 261 <nop,nop,timestamp 590368011 523073824>
21:20:12.124786 172.30.1.254.44715 > 172.30.7.205.443: F 197:197(0) ack 2 win 256 <nop,nop,timestamp 523073858 590368011>

21:20:12.162326 172.30.7.205.443 > 172.30.1.254.44715: . ack 1 win 261 <nop,nop,timestamp 590368011 523073824>
21:20:17.273069 172.30.1.254.44715 > 172.30.7.205.443: FP 1:197(196) ack 2 win 256 <nop,nop,timestamp 523073869 590368011> (DF)


On the incoming/receiving interface of the "local" iked machine I see:

21:19:54.737490 172.30.1.254.44715 > 172.30.7.205.443: S 950714671:950714671(0) win 16384 <mss 1460,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 523073824 0> (DF)

21:19:54.775299 172.30.7.205.443 > 172.30.1.254.44715: S 668125506:668125506(0) ack 950714672 win 16384 <mss 1300,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 590367976 523073824>

21:19:54.775625 172.30.1.254.44715 > 172.30.7.205.443: . ack 1 win 256 <nop,nop,timestamp 523073824 590367976> (DF)
21:19:54.776378 172.30.1.254.44715 > 172.30.7.205.443: P 1:197(196) ack 1 win 256 <nop,nop,timestamp 523073824 590367976> (DF)

21:19:56.274790 172.30.1.254.44715 > 172.30.7.205.443: P 1:197(196) ack 1 win 256 <nop,nop,timestamp 523073827 590367976> (DF)

21:19:59.274859 172.30.1.254.44715 > 172.30.7.205.443: P 1:197(196) ack 1 win 256 <nop,nop,timestamp 523073833 590367976> (DF)

21:20:05.275017 172.30.1.254.44715 > 172.30.7.205.443: P 1:197(196) ack 1 win 256 <nop,nop,timestamp 523073845 590367976>

21:20:10.745731 172.30.1.254.44715 > 172.30.7.205.443: F 197:197(0) ack 1 win 256 <nop,nop,timestamp 523073856 590367976>

21:20:10.783860 172.30.7.205.443 > 172.30.1.254.44715: . ack 1 win 261 <nop,nop,timestamp 590368008 523073824,nop,nop,sack 1 {197:197} >

21:20:12.126709 172.30.7.205.443 > 172.30.1.254.44715: F 1:1(0) ack 1 win 261 <nop,nop,timestamp 590368011 523073824>
21:20:12.127041 172.30.1.254.44715 > 172.30.7.205.443: F 197:197(0) ack 2 win 256 <nop,nop,timestamp 523073858 590368011>

21:20:12.164312 172.30.7.205.443 > 172.30.1.254.44715: . ack 1 win 261 <nop,nop,timestamp 590368011 523073824>


But, on the outgoing/sending interface of the "remote" iked machine, all that I see is:

21:19:54.733973 172.30.1.254.44715 > 172.30.7.205.443: S 4173630539:4173630539(0) win 16384 <mss 1300,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 523073824 0>

21:19:54.734355 172.30.7.205.443 > 172.30.1.254.44715: S 2645985599:2645985599(0) ack 4173630540 win 16384 <mss 1300,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 590367976 523073824>

21:19:54.773048 172.30.1.254.44715 > 172.30.7.205.443: . ack 1 win 256 <nop,nop,timestamp 523073824 590367976>
21:20:10.742843 172.30.1.254.44715 > 172.30.7.205.443: F 197:197(0) ack 1 win 256 <nop,nop,timestamp 523073856 590367976>

21:20:10.743111 172.30.7.205.443 > 172.30.1.254.44715: . ack 1 win 261 <nop,nop,timestamp 590368008 523073824,nop,nop,sack 1 {197:197} >

21:20:12.085788 172.30.7.205.443 > 172.30.1.254.44715: F 1:1(0) ack 1 win 261 <nop,nop,timestamp 590368011 523073824>
21:20:12.123252 172.30.1.254.44715 > 172.30.7.205.443: F 197:197(0) ack 2 win 256 <nop,nop,timestamp 523073858 590368011>

21:20:12.123472 172.30.7.205.443 > 172.30.1.254.44715: . ack 1 win 261 <nop,nop,timestamp 590368011 523073824>


And that is all that gets delivered to the incoming/receiving interface of the "s_server" machine:

21:19:54.710031 172.30.1.254.44715 > 172.30.7.205.443: S 4173630539:4173630539(0) win 16384 <mss 1300,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 523073824 0>

21:19:54.710134 172.30.7.205.443 > 172.30.1.254.44715: S 2645985599:2645985599(0) ack 4173630540 win 16384 <mss 1300,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 590367976 523073824>

21:19:54.749110 172.30.1.254.44715 > 172.30.7.205.443: . ack 1 win 256 <nop,nop,timestamp 523073824 590367976>
21:20:10.718972 172.30.1.254.44715 > 172.30.7.205.443: F 197:197(0) ack 1 win 256 <nop,nop,timestamp 523073856 590367976>

21:20:10.719023 172.30.7.205.443 > 172.30.1.254.44715: . ack 1 win 261 <nop,nop,timestamp 590368008 523073824,nop,nop,sack 1 {197:197} >

21:20:12.061678 172.30.7.205.443 > 172.30.1.254.44715: F 1:1(0) ack 1 win 261 <nop,nop,timestamp 590368011 523073824>
21:20:12.099433 172.30.1.254.44715 > 172.30.7.205.443: F 197:197(0) ack 2 win 256 <nop,nop,timestamp 523073858 590368011>

21:20:12.099484 172.30.7.205.443 > 172.30.1.254.44715: . ack 1 win 261 <nop,nop,timestamp 590368011 523073824>
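
To make the difference between the capture points easier to see, here is a minimal Python sketch that compares the capture from the "local" iked machine with the one from the "remote" iked machine. The lines are copied from the output above with the TCP options trimmed. Because the absolute sequence numbers on the SYN packets differ between the two sides, the comparison uses only addresses, flags and payload length. The names (LOCAL_GW, REMOTE_GW, signatures, missing) are made up for illustration.

```python
import re
from collections import Counter

# Lines copied from the two gateway captures above, TCP options trimmed.
LOCAL_GW = """\
21:19:54.737490 172.30.1.254.44715 > 172.30.7.205.443: S 950714671:950714671(0) win 16384
21:19:54.775299 172.30.7.205.443 > 172.30.1.254.44715: S 668125506:668125506(0) ack 950714672 win 16384
21:19:54.775625 172.30.1.254.44715 > 172.30.7.205.443: . ack 1 win 256
21:19:54.776378 172.30.1.254.44715 > 172.30.7.205.443: P 1:197(196) ack 1 win 256
21:19:56.274790 172.30.1.254.44715 > 172.30.7.205.443: P 1:197(196) ack 1 win 256
21:19:59.274859 172.30.1.254.44715 > 172.30.7.205.443: P 1:197(196) ack 1 win 256
21:20:05.275017 172.30.1.254.44715 > 172.30.7.205.443: P 1:197(196) ack 1 win 256
21:20:10.745731 172.30.1.254.44715 > 172.30.7.205.443: F 197:197(0) ack 1 win 256
21:20:10.783860 172.30.7.205.443 > 172.30.1.254.44715: . ack 1 win 261
21:20:12.126709 172.30.7.205.443 > 172.30.1.254.44715: F 1:1(0) ack 1 win 261
21:20:12.127041 172.30.1.254.44715 > 172.30.7.205.443: F 197:197(0) ack 2 win 256
21:20:12.164312 172.30.7.205.443 > 172.30.1.254.44715: . ack 1 win 261
"""

REMOTE_GW = """\
21:19:54.733973 172.30.1.254.44715 > 172.30.7.205.443: S 4173630539:4173630539(0) win 16384
21:19:54.734355 172.30.7.205.443 > 172.30.1.254.44715: S 2645985599:2645985599(0) ack 4173630540 win 16384
21:19:54.773048 172.30.1.254.44715 > 172.30.7.205.443: . ack 1 win 256
21:20:10.742843 172.30.1.254.44715 > 172.30.7.205.443: F 197:197(0) ack 1 win 256
21:20:10.743111 172.30.7.205.443 > 172.30.1.254.44715: . ack 1 win 261
21:20:12.085788 172.30.7.205.443 > 172.30.1.254.44715: F 1:1(0) ack 1 win 261
21:20:12.123252 172.30.1.254.44715 > 172.30.7.205.443: F 197:197(0) ack 2 win 256
21:20:12.123472 172.30.7.205.443 > 172.30.1.254.44715: . ack 1 win 261
"""

# timestamp, "src > dst:", flags, then an optional "first:last(length)" field
LINE = re.compile(
    r"^\S+ (?P<src>\S+) > (?P<dst>\S+): (?P<flags>\S+)"
    r"(?: \d+:\d+\((?P<len>\d+)\))?"
)


def signatures(capture: str) -> Counter:
    """Count packets by (src, dst, flags, payload length)."""
    sigs = Counter()
    for line in capture.splitlines():
        m = LINE.match(line)
        if m:
            sigs[(m["src"], m["dst"], m["flags"], int(m["len"] or 0))] += 1
    return sigs


def missing(upstream: str, downstream: str) -> Counter:
    """Packets counted upstream that have no counterpart downstream."""
    return signatures(upstream) - signatures(downstream)


if __name__ == "__main__":
    for sig, count in missing(LOCAL_GW, REMOTE_GW).items():
        print(count, sig)
```

On this data the only packets that do not show up on the remote side are the four copies of the 196-byte push from the client, which is the segment s_client logs as the client hello. The handshake packets and the FINs make it across.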


Now, if I try connecting using s_client ON the "remote" iked machine (so, a connection that does not include the iked tunnel), everything works, and tcpdump shows ("the expected?") data and traffic leaving:

21:36:56.027413 172.30.7.1.33610 > 172.30.7.205.443: S 2112686897:2112686897(0) win 16384 <mss 1460,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 3723402680 0> (DF)

21:36:56.027768 172.30.7.205.443 > 172.30.7.1.33610: S 3448062619:3448062619(0) ack 2112686898 win 16384 <mss 1300,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 3091368224 3723402680>

21:36:56.027817 172.30.7.1.33610 > 172.30.7.205.443: . ack 1 win 256 <nop,nop,timestamp 3723402680 3091368224> (DF)
21:36:56.028403 172.30.7.1.33610 > 172.30.7.205.443: P 1:197(196) ack 1 win 256 <nop,nop,timestamp 3723402680 3091368224> (DF)

21:36:56.046516 172.30.7.205.443 > 172.30.7.1.33610: . 1:1289(1288) ack 197 win 261 <nop,nop,timestamp 3091368224 3723402680>

21:36:56.046518 172.30.7.205.443 > 172.30.7.1.33610: . 1289:2577(1288) ack 197 win 261 <nop,nop,timestamp 3091368224 3723402680>

21:36:56.046519 172.30.7.205.443 > 172.30.7.1.33610: . 2577:3865(1288) ack 197 win 261 <nop,nop,timestamp 3091368224 3723402680>

21:36:56.046520 172.30.7.205.443 > 172.30.7.1.33610: P 3865:4097(232) ack 197 win 261 <nop,nop,timestamp 3091368224 3723402680>

21:36:56.046607 172.30.7.1.33610 > 172.30.7.205.443: . ack 2577 win 215 <nop,nop,timestamp 3723402680 3091368224> (DF)
21:36:56.046793 172.30.7.1.33610 > 172.30.7.205.443: . ack 4097 win 192 <nop,nop,timestamp 3723402680 3091368224> (DF)
21:36:56.047147 172.30.7.205.443 > 172.30.7.1.33610: P 4097:4473(376) ack 197 win 261 <nop,nop,timestamp 3091368224 3723402680>

21:36:56.047196 172.30.7.1.33610 > 172.30.7.205.443: . ack 4473 win 246 <nop,nop,timestamp 3723402680 3091368224> (DF)
21:36:56.053509 172.30.7.1.33610 > 172.30.7.205.443: P 197:315(118) ack 4473 win 256 <nop,nop,timestamp 3723402680 3091368224> (DF)

21:36:56.055675 172.30.7.205.443 > 172.30.7.1.33610: P 4473:4691(218) ack 315 win 261 <nop,nop,timestamp 3091368224 3723402680>

21:36:56.250855 172.30.7.1.33610 > 172.30.7.205.443: . ack 4691 win 256 <nop,nop,timestamp 3723402681 3091368224> (DF)
21:36:58.540398 172.30.7.1.33610 > 172.30.7.205.443: F 315:315(0) ack 4691 win 256 <nop,nop,timestamp 3723402685 3091368224> (DF)

21:36:58.540581 172.30.7.205.443 > 172.30.7.1.33610: . ack 316 win 261 <nop,nop,timestamp 3091368229 3723402685>
21:36:58.541186 172.30.7.205.443 > 172.30.7.1.33610: F 4691:4691(0) ack 316 win 261 <nop,nop,timestamp 3091368229 3723402685>

21:36:58.541219 172.30.7.1.33610 > 172.30.7.205.443: . ack 4692 win 256 <nop,nop,timestamp 3723402685 3091368229> (DF)


I am no expert, but I can see that this "local" connection sends a lot more data.

So, today I was going to try this again, now looking at physical interfaces and the enc0 interface on the iked endpoints.

But, for whatever reason, I did not have to, because this morning https was working without a problem over the iked VPN.


Unfortunately, I noticed that there was a problem with icinga2 (which monitors the hosts on the "remote" net). I noticed that even though the hosts were up, icinga2 was reporting them as down.

I found that on the (alternately) remote or local iked host, icinga connections (over port 5665) were being blocked even though there is a specific "pass rule" in pf.conf to permit them.

For example, in the log I see:
Dec 8 15:50:01 ... pf: Dec 08 15:48:49.346816 rule 4/(match) block out on em0: 172.30.7.205.22112 > 172.30.2.99.5665: R 3963276584:3963276584(0) ack 252894831 win 0

But, pfctl is running with the following:

# pfctl -s rules
match in all scrub (no-df random-id max-mss 1300)
pass in quick on em1 all flags S/SA
pass out quick on em1 all flags S/SA
block drop in log on em0 all
block drop out log on em0 all
...
pass quick inet proto tcp from 172.30.7.205 to 172.30.2.99 port = 5665 flags S/SA
... and on.

There are no other "quick" rules between the default block and the quick pass rule that should allow the packet.

I don't understand why the packet is blocked when it should specifically (and "quickly") be passed.
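
One possible reading of the log line above, offered as a sketch rather than a diagnosis: the blocked packet carries the R (reset) flag, while the pass rule is limited to "flags S/SA", which only matches packets that have SYN set and ACK clear. The Python model below is a simplification that ignores state tracking and the rules hidden behind "..." in the listing. The RULES table and the evaluate helper are hypothetical names used only for illustration.

```python
# Simplified model of pf rule evaluation for one outbound packet on em0.
# It ignores state entries and any rules hidden behind "..." in the listing.

FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10


def flags_match(pkt_flags: int, want: int, mask: int) -> bool:
    """pf 'flags want/mask': among the mask bits, exactly the want bits are set."""
    return (pkt_flags & mask) == want


# (action, quick, (want, mask) or None), in listing order.
RULES = [
    ("block", False, None),            # block drop out log on em0 all
    ("pass", True, (SYN, SYN | ACK)),  # pass quick ... port = 5665 flags S/SA
]


def evaluate(pkt_flags: int) -> str:
    """Last matching rule wins, unless a matching rule is marked quick."""
    action = "pass"  # pf's implicit default when no rule matches
    for rule_action, quick, flag_test in RULES:
        if flag_test is not None and not flags_match(pkt_flags, *flag_test):
            continue  # rule does not apply to this packet
        action = rule_action
        if quick:
            break
    return action


if __name__ == "__main__":
    print("SYN     ->", evaluate(SYN))
    print("RST+ACK ->", evaluate(RST | ACK))
```

In this model a bare SYN is passed by the quick rule, but a reset with ack, like the one in the log, falls through to the block rule. That would also fit the log naming rule 4, if rules are counted from zero in the order pfctl lists them. Whether a state entry should have matched that reset is a separate question that the output above does not answer.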

I played with my pf.conf for a while, and, all of a sudden, icinga was able to connect. I undid my changes (things like removing the default block rules) and it continued to work.

I decided to reboot several of the hosts to see if things would be stable. But, they are not.

I now find I can no longer connect with TLS/SSL over the iked tunnel (the original behavior that seemed to have corrected itself). Also, icinga continues to be unable to verify the status of the remote hosts over port 5665.

I don't have time right now to try using s_client and s_server and watching enc0 to see what is happening, but I will when I can.

If anyone has any ideas on what may be happening, please let me know.

Thanks
Ted
Theodore Wynnychenko
2018-12-11 01:04:28 UTC
I would like to re-title this as something like "pf and iked instability on recent snapshots," but I don’t know if doing so would break the mailing list thread, so I left the subject unchanged...
-----Original Message-----
Sent: Saturday, December 08, 2018 4:03 PM
Cc: 'Rachel Roch'
Subject: RE: TLS suddenly not working over IKED site-to-site
.
.
.
I now find I can no longer connect with TLS/SSL over the iked tunnel
(the original behavior that seemed to have corrected itself). Also,
icinga continues to be unable to verify the status of the remote hosts
over port 5665.
I don't have time right now to try using s_client and s_server and
watching enc0 to see what is happening, but I will when I can.
If anyone has any ideas on what may be happening, please let me know.
Thanks
Ted
Hello again;

So, I am at a complete loss to understand what is going on.
Today, I tried using openssl s_client and s_server to make a connection through the iked vpn (as I described in my last post). However, with NO changes to iked.conf or pf.conf, today I had several connection attempts that completed correctly. I have not included any output from those sporadic, completely functional connections.

Rather, today, most of the connections by s_client are not even acknowledged by the s_server on the other side of the iked vpn.

For example:
On the s_client machine:

# openssl s_client -state -connect "remote.host":https
SSL_connect:before/connect initialization
SSL_connect:SSLv3 write client hello A
... and nothing more ...

But on the s_server machine today all I see is:
# openssl s_server -state -accept https ...certificate options...
Using auto DH parameters
Using default temp ECDH parameters
ACCEPT
... and no connection attempt is ever acknowledged ...

(Yesterday, at least this first part of the connection was received by the s_server:
Using auto DH parameters
Using default temp ECDH parameters
ACCEPT
SSL_accept:before/accept initialization
... and nothing more yesterday ...)


So, today using tcpdump on the outgoing interface of the s_client machine and the incoming interface of the "local" iked vpn endpoint shows:

16:43:05.107524 172.30.1.254.7305 > 172.30.7.205.443: S 1751796302:1751796302(0) win 16384 <mss 1460,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 2698316052 0>

16:43:05.149146 172.30.1.254.7305 > 172.30.7.205.443: . ack 2119500805 win 256 <nop,nop,timestamp 2698316052 3536824996>

16:43:05.149895 172.30.1.254.7305 > 172.30.7.205.443: P 0:196(196) ack 1 win 256 <nop,nop,timestamp 2698316052 3536824996>

16:43:06.648487 172.30.1.254.7305 > 172.30.7.205.443: P 0:196(196) ack 1 win 256 <nop,nop,timestamp 2698316055 3536824996>

16:43:09.648557 172.30.1.254.7305 > 172.30.7.205.443: P 0:196(196) ack 1 win 256 <nop,nop,timestamp 2698316061 3536824996>

16:43:09.948433 172.30.1.254.7305 > 172.30.7.205.443: F 196:196(0) ack 1 win 256 <nop,nop,timestamp 2698316061 3536824996>

16:43:15.648712 172.30.1.254.7305 > 172.30.7.205.443: FP 0:196(196) ack 1 win 256 <nop,nop,timestamp 2698316073 3536825005>

And this traffic (incomplete though it may be for an ssl handshake) appears to be passed to enc0 intact:

16:43:05.105044 (authentic,confidential): SPI 0x151333df: 172.30.1.254.7305 > 172.30.7.205.443: S 3570513915:3570513915(0) win 16384 <mss 1300,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 2698316052 0> (encap)

16:43:05.146122 (authentic,confidential): SPI 0xe1c30e4a: 172.30.7.205.443 > 172.30.1.254.7305: S 1312941075:1312941075(0) ack 3570513916 win 16384 <mss 1300,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 3536824996 2698316052> (encap)

16:43:05.146654 (authentic,confidential): SPI 0x151333df: 172.30.1.254.7305 > 172.30.7.205.443: . ack 1 win 256 <nop,nop,timestamp 2698316052 3536824996> (encap)

16:43:05.147365 (unprotected): SPI 0x0000ef27: 172.30.1.254.7305 > 172.30.7.205.443: P 1:197(196) ack 1 win 256 <nop,nop,timestamp 2698316052 3536824996> (encap)

16:43:06.645932 (unprotected): SPI 0x0000ef27: 172.30.1.254.7305 > 172.30.7.205.443: P 1:197(196) ack 1 win 256 <nop,nop,timestamp 2698316055 3536824996> (encap)

16:43:09.646049 (unprotected): SPI 0x0000ef27: 172.30.1.254.7305 > 172.30.7.205.443: P 1:197(196) ack 1 win 256 <nop,nop,timestamp 2698316061 3536824996> (encap)

16:43:09.945908 (authentic,confidential): SPI 0x151333df: 172.30.1.254.7305 > 172.30.7.205.443: F 197:197(0) ack 1 win 256 <nop,nop,timestamp 2698316061 3536824996> (encap)

16:43:09.981966 (authentic,confidential): SPI 0xe1c30e4a: 172.30.7.205.443 > 172.30.1.254.7305: . ack 1 win 261 <nop,nop,timestamp 3536825005 2698316052,nop,nop,sack 1 {197:197} > (encap)

16:43:15.646158 (unprotected): SPI 0x0000ef27: 172.30.1.254.7305 > 172.30.7.205.443: FP 1:197(196) ack 1 win 256 <nop,nop,timestamp 2698316073 3536825005> (encap)


BUT, at the other end of the VPN, on enc0, all that is seen leaving the iked VPN tunnel is:

16:43:05.130558 (authentic,confidential): SPI 0x151333df: 172.30.1.254.7305 > 172.30.7.205.443: S 3570513915:3570513915(0) win 16384 <mss 1300,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 2698316052 0> (encap)

16:43:05.131049 (authentic,confidential): SPI 0xe1c30e4a: 172.30.7.205.443 > 172.30.1.254.7305: S 1312941075:1312941075(0) ack 3570513916 win 16384 <mss 1300,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 3536824996 2698316052> (encap)

16:43:05.174802 (authentic,confidential): SPI 0x151333df: 172.30.1.254.7305 > 172.30.7.205.443: . ack 1 win 256 <nop,nop,timestamp 2698316052 3536824996> (encap)

16:43:09.966420 (authentic,confidential): SPI 0x151333df: 172.30.1.254.7305 > 172.30.7.205.443: F 197:197(0) ack 1 win 256 <nop,nop,timestamp 2698316061 3536824996> (encap)

16:43:09.966853 (authentic,confidential): SPI 0xe1c30e4a: 172.30.7.205.443 > 172.30.1.254.7305: . ack 1 win 261 <nop,nop,timestamp 3536825005 2698316052,nop,nop,sack 1 {197:197} > (encap)
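
One detail in the two enc0 captures above is easy to miss when reading them line by line, so here is a minimal Python sketch that groups the lines from the local enc0 capture by protection tag and SPI. The lines are copied from the output above with the TCP options and the trailing "(encap)" trimmed. The names (LOCAL_ENC0, by_sa) are made up for illustration, and the sketch only makes the pattern visible. It does not say what causes it.

```python
import re
from collections import defaultdict

# Lines copied from the local enc0 capture above, TCP options trimmed.
LOCAL_ENC0 = """\
16:43:05.105044 (authentic,confidential): SPI 0x151333df: 172.30.1.254.7305 > 172.30.7.205.443: S 3570513915:3570513915(0) win 16384
16:43:05.146122 (authentic,confidential): SPI 0xe1c30e4a: 172.30.7.205.443 > 172.30.1.254.7305: S 1312941075:1312941075(0) ack 3570513916 win 16384
16:43:05.146654 (authentic,confidential): SPI 0x151333df: 172.30.1.254.7305 > 172.30.7.205.443: . ack 1 win 256
16:43:05.147365 (unprotected): SPI 0x0000ef27: 172.30.1.254.7305 > 172.30.7.205.443: P 1:197(196) ack 1 win 256
16:43:06.645932 (unprotected): SPI 0x0000ef27: 172.30.1.254.7305 > 172.30.7.205.443: P 1:197(196) ack 1 win 256
16:43:09.646049 (unprotected): SPI 0x0000ef27: 172.30.1.254.7305 > 172.30.7.205.443: P 1:197(196) ack 1 win 256
16:43:09.945908 (authentic,confidential): SPI 0x151333df: 172.30.1.254.7305 > 172.30.7.205.443: F 197:197(0) ack 1 win 256
16:43:09.981966 (authentic,confidential): SPI 0xe1c30e4a: 172.30.7.205.443 > 172.30.1.254.7305: . ack 1 win 261
16:43:15.646158 (unprotected): SPI 0x0000ef27: 172.30.1.254.7305 > 172.30.7.205.443: FP 1:197(196) ack 1 win 256
"""

# timestamp, "(tags):", "SPI 0x...:", "src > dst:", flags, optional seq field
LINE = re.compile(
    r"^\S+ \((?P<prot>[^)]+)\): SPI (?P<spi>0x[0-9a-f]+): "
    r"(?P<src>\S+) > (?P<dst>\S+): (?P<flags>\S+)"
    r"(?: \d+:\d+\((?P<len>\d+)\))?"
)


def by_sa(capture: str) -> dict:
    """Group (flags, payload length) pairs by (protection tag, SPI)."""
    groups = defaultdict(list)
    for line in capture.splitlines():
        m = LINE.match(line)
        if m:
            key = (m["prot"], m["spi"])
            groups[key].append((m["flags"], int(m["len"] or 0)))
    return dict(groups)


if __name__ == "__main__":
    for (prot, spi), segments in by_sa(LOCAL_ENC0).items():
        print(prot, spi, segments)
```

Grouped this way, every line that carries the 196-byte payload is tagged (unprotected) with SPI 0x0000ef27, while the SYN, ACK and FIN lines are tagged (authentic,confidential) with the two SPIs that also appear at the far end. None of the 0x0000ef27 lines show up in the enc0 capture at the other end of the tunnel.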


I have no idea what this all means, or what to do with it.
But, I am following up in case anybody has any idea of what may be happening.

Also, yesterday I described how the local iked machine appeared to be blocking packets that were explicitly allowed by pf.conf. From my post yesterday:

( For example, in the log I see:
Dec 8 15:50:01 ... pf: Dec 08 15:48:49.346816 rule 4/(match) block out on em0: 172.30.7.205.22112 > 172.30.2.99.5665: R 3963276584:3963276584(0) ack 252894831 win 0

But, pfctl is running with the following:

# pfctl -s rules
match in all scrub (no-df random-id max-mss 1300)
pass in quick on em1 all flags S/SA
pass out quick on em1 all flags S/SA
block drop in log on em0 all
block drop out log on em0 all
...
pass quick inet proto tcp from 172.30.7.205 to 172.30.2.99 port = 5665 flags S/SA
... and on. )

Well, whatever was happening appears to have been resolved, because at about midnight local time on Sunday morning, icinga2 declared that the host was back up.

To be clear, I have made no changes to either pf.conf or iked.conf on any of the machines involved in this testing from Saturday.

Also, this had all been stable for the last (about) 2 years, until about two-three weeks ago. I did have another post, where I discussed the fact the iked VPN had failed to be reestablished after an update about 3-4 snapshots back. I got it working again by changing the local endpoint on the "remote" iked machine from the internal ip associated with the internal interface to an internal "alias" ip address associated with the outgoing/external interface of that machine.

But, again, it had been working for 2 years until the recent update.

I don't have any idea of what may be helpful in figuring out what I am doing wrong, or what has changed, but I am happy to provide any information that may be of help.

I don't believe I have the knowledge to do more on my own at this point.

Thanks for any advice.
Ted
