Discussion:
relayd stops processing traffic intermittently
(too old to reply)
Mischa
2017-12-23 10:40:57 UTC
Permalink
Raw Message
Hi All,

Since OpenBSD 6.2, just confirmed this in the latest snapshot (GENERIC.MP#305) as well, for some reason relayd stops processing traffic and starts flooding the log file with the following message:

Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout
[snip]
Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout
...etc...

Restarting the daemon "fixes" the problem.
Not sure how to trouble shoot this but I am able to reproduce this consistently by pointing SSLLabs towards relayd.
Would be great to get some pointers.

Anonymised config below:
# relayd.conf
local_v4 = "xxx"
local_v6 = "xxx"
table <localhost> { 127.0.0.1 }

www1_addr_v4 = "xxx"
www1_addr_v6 = "xxx"
table <www1hosts> { xxx }

www3_addr_v4 = "xxx"
www3_addr_v6 = "xxx"
table <www3hosts> { xxx }

cust1_addr_v4 = "xxx"
cust1_addr_v6 = "xxx"

cust2_addr_v4 = "xxx"
cust3_addr_v4 = "xxx"
cust4_addr_v4 = "xxx"
table <cust_3hosts> { xxx }
table <cust_3hosts_fallback> { xxx }
table <cust4_hosts> { xxx }

cust5_addr_v4 = "xxx"
table <cust5_hosts> { xxx }

http protocol httpfilter_default {
match request header remove "Proxy"
match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
match request header append "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT"
match response header set "Server" value "Sever"
match response header set "X-Powered-By" value "Power"
tcp { no splice }
}
http protocol httpsfilter_default {
match request header remove "Proxy"
match request header set "X-ClientIP" value "$REMOTE_ADDR"
match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
match request header append "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT"
match response header set "Strict-Transport-Security" value "max-age=31536000"
match response header set "Server" value "Sever"
match response header set "X-Powered-By" value "Power"
match request quick header "Host" value "images.webcam.nl" forward to <imageshosts>
tcp { no splice }
tls { no client-renegotiation }
}

http protocol httpfilter {
match request header remove "Proxy"
match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
match request header append "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT"
match response header set "Content-Security-Policy" value "default-src high5.nl; script-src https://high5.nl http://www.w3.org/; style-src 'self' 'unsafe-inline'; img-src 'self'"
match response header set "Server" value "Sever"
match response header set "X-Powered-By" value "Power"
match response header set "X-Frame-Options" value "SAMEORIGIN"
match response header set "X-Xss-Protection" value "1; mode=block"
match response header set "X-Content-Type-Options" value "nosniff"
match request quick header "Host" value "*xxx*" forward to <www1hosts>
match request quick header "Host" value "*xxx*" forward to <www1hosts>
tcp { no splice }
}
http protocol httpsfilter {
return error
match request header remove "Proxy"
match request header set "X-ClientIP" value "$REMOTE_ADDR"
match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
match request header append "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT"
match response header set "Strict-Transport-Security" value "max-age=31536000"
match response header set "Content-Security-Policy" value "default-src high5.nl; script-src https://high5.nl http://www.w3.org/; style-src 'self' 'unsafe-inline'; img-src 'self'"
match response header set "Server" value "Sever"
match response header set "X-Powered-By" value "Power"
match response header set "X-Frame-Options" value "SAMEORIGIN"
match response header set "X-Xss-Protection" value "1; mode=block"
match response header set "X-Content-Type-Options" value "nosniff"
tcp { no splice }
tls { no client-renegotiation }
}
relay default {
listen on $local_v4 port 80
listen on $local_v6 port 80
protocol httpfilter_default
forward to <localhost> port 8080
}
relay default_redirect {
listen on $cust2_addr_v4 port 80
listen on $cust3_addr_v4 port 80
listen on $cust4_addr_v4 port 80
listen on $cust5_addr_v4 port 80
listen on $cust1_addr_v4 port 80
listen on $cust1_addr_v6 port 80
protocol httpfilter_default
forward to <localhost> port 8081
}
relay default_redirect_tls {
listen on $cust4_addr_v4 port 443 tls
protocol httpsfilter_default
forward to <localhost> port 8081
}
relay www1 {
listen on $www1_addr_v4 port 80
listen on $www1_addr_v6 port 80
protocol httpfilter
forward to <localhost> port 8081
forward to <www1hosts> port 80
}
relay www1_tls {
listen on $www1_addr_v4 port 443 tls
listen on $www1_addr_v6 port 443 tls
protocol httpsfilter
forward to <www1hosts> port 80 mode roundrobin check http "/" host www1 code 200
#forward to <comic> port 80
}
relay www3 {
listen on $www3_addr_v4 port 80
listen on $www3_addr_v6 port 80
forward to <www3hosts> port 80
}
relay www3_tls {
listen on $www3_addr_v4 port 443 tls
listen on $www3_addr_v6 port 443 tls
forward with tls to <www3hosts> port 443
}
relay cust2_tls {
listen on $cust2_addr_v4 port 443 tls
protocol httpsfilter_default
forward to <cust2_hosts> port 80 check http "/" host cust2 code 200
forward to <cust2_hosts_fallback> port 80 check http "/" host cust2 code 200
forward to <cust4_hosts> port 80
}
relay cust3_tls {
listen on $cust3_addr_v4 port 443 tls
protocol httpsfilter_default
forward to <cust3_hosts> port 80 check http "/" host cust3 code 200
forward to <cust3_hosts_fallback> port 80 check http "/" host cust3 code 200
}
relay cust5_tls {
listen on $cust5_addr_v4 port 443 tls
protocol httpsfilter_default
forward to <cust5_hosts> port 80
}

Mischa
Claudio Jeker
2017-12-23 12:08:21 UTC
Permalink
Raw Message
Post by Mischa
Hi All,
Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout
[snip]
Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout
...etc...
Restarting the daemon "fixes" the problem.
Not sure how to trouble shoot this but I am able to reproduce this consistently by pointing SSLLabs towards relayd.
Would be great to get some pointers.
I have seen this as well on our production systems. This is a problem in
the privsep part of the TLS code. I could not do more testing yet but my
assumption is that a new option / feature is freaking this code out.
--
:wq Claudio
Post by Mischa
# relayd.conf
local_v4 = "xxx"
local_v6 = "xxx"
table <localhost> { 127.0.0.1 }
www1_addr_v4 = "xxx"
www1_addr_v6 = "xxx"
table <www1hosts> { xxx }
www3_addr_v4 = "xxx"
www3_addr_v6 = "xxx"
table <www3hosts> { xxx }
cust1_addr_v4 = "xxx"
cust1_addr_v6 = "xxx"
cust2_addr_v4 = "xxx"
cust3_addr_v4 = "xxx"
cust4_addr_v4 = "xxx"
table <cust_3hosts> { xxx }
table <cust_3hosts_fallback> { xxx }
table <cust4_hosts> { xxx }
cust5_addr_v4 = "xxx"
table <cust5_hosts> { xxx }
http protocol httpfilter_default {
match request header remove "Proxy"
match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
match request header append "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT"
match response header set "Server" value "Sever"
match response header set "X-Powered-By" value "Power"
tcp { no splice }
}
http protocol httpsfilter_default {
match request header remove "Proxy"
match request header set "X-ClientIP" value "$REMOTE_ADDR"
match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
match request header append "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT"
match response header set "Strict-Transport-Security" value "max-age=31536000"
match response header set "Server" value "Sever"
match response header set "X-Powered-By" value "Power"
match request quick header "Host" value "images.webcam.nl" forward to <imageshosts>
tcp { no splice }
tls { no client-renegotiation }
}
http protocol httpfilter {
match request header remove "Proxy"
match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
match request header append "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT"
match response header set "Content-Security-Policy" value "default-src high5.nl; script-src https://high5.nl http://www.w3.org/; style-src 'self' 'unsafe-inline'; img-src 'self'"
match response header set "Server" value "Sever"
match response header set "X-Powered-By" value "Power"
match response header set "X-Frame-Options" value "SAMEORIGIN"
match response header set "X-Xss-Protection" value "1; mode=block"
match response header set "X-Content-Type-Options" value "nosniff"
match request quick header "Host" value "*xxx*" forward to <www1hosts>
match request quick header "Host" value "*xxx*" forward to <www1hosts>
tcp { no splice }
}
http protocol httpsfilter {
return error
match request header remove "Proxy"
match request header set "X-ClientIP" value "$REMOTE_ADDR"
match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
match request header append "X-Forwarded-By" value "$SERVER_ADDR:$SERVER_PORT"
match response header set "Strict-Transport-Security" value "max-age=31536000"
match response header set "Content-Security-Policy" value "default-src high5.nl; script-src https://high5.nl http://www.w3.org/; style-src 'self' 'unsafe-inline'; img-src 'self'"
match response header set "Server" value "Sever"
match response header set "X-Powered-By" value "Power"
match response header set "X-Frame-Options" value "SAMEORIGIN"
match response header set "X-Xss-Protection" value "1; mode=block"
match response header set "X-Content-Type-Options" value "nosniff"
tcp { no splice }
tls { no client-renegotiation }
}
relay default {
listen on $local_v4 port 80
listen on $local_v6 port 80
protocol httpfilter_default
forward to <localhost> port 8080
}
relay default_redirect {
listen on $cust2_addr_v4 port 80
listen on $cust3_addr_v4 port 80
listen on $cust4_addr_v4 port 80
listen on $cust5_addr_v4 port 80
listen on $cust1_addr_v4 port 80
listen on $cust1_addr_v6 port 80
protocol httpfilter_default
forward to <localhost> port 8081
}
relay default_redirect_tls {
listen on $cust4_addr_v4 port 443 tls
protocol httpsfilter_default
forward to <localhost> port 8081
}
relay www1 {
listen on $www1_addr_v4 port 80
listen on $www1_addr_v6 port 80
protocol httpfilter
forward to <localhost> port 8081
forward to <www1hosts> port 80
}
relay www1_tls {
listen on $www1_addr_v4 port 443 tls
listen on $www1_addr_v6 port 443 tls
protocol httpsfilter
forward to <www1hosts> port 80 mode roundrobin check http "/" host www1 code 200
#forward to <comic> port 80
}
relay www3 {
listen on $www3_addr_v4 port 80
listen on $www3_addr_v6 port 80
forward to <www3hosts> port 80
}
relay www3_tls {
listen on $www3_addr_v4 port 443 tls
listen on $www3_addr_v6 port 443 tls
forward with tls to <www3hosts> port 443
}
relay cust2_tls {
listen on $cust2_addr_v4 port 443 tls
protocol httpsfilter_default
forward to <cust2_hosts> port 80 check http "/" host cust2 code 200
forward to <cust2_hosts_fallback> port 80 check http "/" host cust2 code 200
forward to <cust4_hosts> port 80
}
relay cust3_tls {
listen on $cust3_addr_v4 port 443 tls
protocol httpsfilter_default
forward to <cust3_hosts> port 80 check http "/" host cust3 code 200
forward to <cust3_hosts_fallback> port 80 check http "/" host cust3 code 200
}
relay cust5_tls {
listen on $cust5_addr_v4 port 443 tls
protocol httpsfilter_default
forward to <cust5_hosts> port 80
}
Mischa
Mischa Peters
2017-12-23 13:04:19 UTC
Permalink
Raw Message
Post by Claudio Jeker
Post by Mischa
Hi All,
Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout
[snip]
Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout
...etc...
Restarting the daemon "fixes" the problem.
Not sure how to trouble shoot this but I am able to reproduce this consistently by pointing SSLLabs towards relayd.
Would be great to get some pointers.
I have seen this as well on our production systems. This is a problem in
the privsep part of the TLS code. I could not do more testing yet but my
assumption is that a new option / feature is freaking this code out.
Anything I can do or collect to give you more information?

Mischa
Claudio Jeker
2017-12-23 14:46:29 UTC
Permalink
Raw Message
Post by Mischa Peters
Post by Claudio Jeker
Post by Mischa
Hi All,
Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout
[snip]
Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout
...etc...
Restarting the daemon "fixes" the problem.
Not sure how to trouble shoot this but I am able to reproduce this consistently by pointing SSLLabs towards relayd.
Would be great to get some pointers.
I have seen this as well on our production systems. This is a problem in
the privsep part of the TLS code. I could not do more testing yet but my
assumption is that a new option / feature is freaking this code out.
Anything I can do or collect to give you more information?
Your tip with SSLLabs is hopefully good enough to produce it at will.
I will try to fix this in the next days. Keep you posted.
--
:wq Claudio
Claudio Jeker
2017-12-24 18:07:11 UTC
Permalink
Raw Message
Post by Mischa Peters
Post by Claudio Jeker
Post by Mischa
Hi All,
Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout
[snip]
Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout
...etc...
Restarting the daemon "fixes" the problem.
Not sure how to trouble shoot this but I am able to reproduce this consistently by pointing SSLLabs towards relayd.
Would be great to get some pointers.
I have seen this as well on our production systems. This is a problem in
the privsep part of the TLS code. I could not do more testing yet but my
assumption is that a new option / feature is freaking this code out.
Anything I can do or collect to give you more information?
So, I think I found the problem. The ca process did not handle errors from
RSA_private_encrypt correctly. So once you got a bad signature in the
system chocked and stopped. This diff seems to work for me (against
SSLlabs).

Cheers
--
:wq Claudio

Index: ca.c
===================================================================
RCS file: /cvs/src/usr.sbin/relayd/ca.c,v
retrieving revision 1.31
diff -u -p -r1.31 ca.c
--- ca.c 28 Nov 2017 00:20:23 -0000 1.31
+++ ca.c 24 Dec 2017 18:01:20 -0000
@@ -266,9 +266,15 @@ ca_dispatch_relay(int fd, struct privsep
break;
}

+ if (cko.cko_tlen == -1) {
+ char buf[256];
+ log_warnx("%s: %s", __func__,
+ ERR_error_string(ERR_get_error(), buf));
+ }
+
iov[c].iov_base = &cko;
iov[c++].iov_len = sizeof(cko);
- if (cko.cko_tlen) {
+ if (cko.cko_tlen > 0) {
iov[c].iov_base = to;
iov[c++].iov_len = cko.cko_tlen;
}
@@ -381,12 +387,12 @@ rsae_send_imsg(int flen, const u_char *f

IMSG_SIZE_CHECK(&imsg, (&cko));
memcpy(&cko, imsg.data, sizeof(cko));
- if (IMSG_DATA_SIZE(&imsg) !=
- (sizeof(cko) + cko.cko_tlen))
- fatalx("data size");

ret = cko.cko_tlen;
- if (ret) {
+ if (ret > 0) {
+ if (IMSG_DATA_SIZE(&imsg) !=
+ (sizeof(cko) + ret))
+ fatalx("data size");
toptr = (u_char *)imsg.data + sizeof(cko);
memcpy(to, toptr, ret);
}
Mischa
2017-12-25 14:54:29 UTC
Permalink
Raw Message
Post by Claudio Jeker
Post by Mischa Peters
Post by Claudio Jeker
Post by Mischa
Hi All,
Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout
[snip]
Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout
...etc...
Restarting the daemon "fixes" the problem.
Not sure how to trouble shoot this but I am able to reproduce this consistently by pointing SSLLabs towards relayd.
Would be great to get some pointers.
I have seen this as well on our production systems. This is a problem in
the privsep part of the TLS code. I could not do more testing yet but my
assumption is that a new option / feature is freaking this code out.
Anything I can do or collect to give you more information?
So, I think I found the problem. The ca process did not handle errors from
RSA_private_encrypt correctly. So once you got a bad signature in the
system chocked and stopped. This diff seems to work for me (against
SSLlabs).
Awesome! Can confirm that it continues processing traffic when hitting it with sslabs.
Will also move it to a more bussier machine to see how that handles.

I am seeing the following messages now:
Dec 25 15:51:07 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF06B:rsa routines:CRYPTO_internal:block type is not 02
Dec 25 15:51:08 lb2 relayd[27420]: ca_dispatch_relay: error:04FFF071:rsa routines:CRYPTO_internal:null before block missing
Dec 25 15:51:17 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF072:rsa routines:CRYPTO_internal:padding check failed
Dec 25 15:51:33 lb2 relayd[73631]: ca_dispatch_relay: error:04FFF071:rsa routines:CRYPTO_internal:null before block missing

Mischa
Post by Claudio Jeker
Cheers
--
:wq Claudio
Index: ca.c
===================================================================
RCS file: /cvs/src/usr.sbin/relayd/ca.c,v
retrieving revision 1.31
diff -u -p -r1.31 ca.c
--- ca.c 28 Nov 2017 00:20:23 -0000 1.31
+++ ca.c 24 Dec 2017 18:01:20 -0000
@@ -266,9 +266,15 @@ ca_dispatch_relay(int fd, struct privsep
break;
}
+ if (cko.cko_tlen == -1) {
+ char buf[256];
+ log_warnx("%s: %s", __func__,
+ ERR_error_string(ERR_get_error(), buf));
+ }
+
iov[c].iov_base = &cko;
iov[c++].iov_len = sizeof(cko);
- if (cko.cko_tlen) {
+ if (cko.cko_tlen > 0) {
iov[c].iov_base = to;
iov[c++].iov_len = cko.cko_tlen;
}
@@ -381,12 +387,12 @@ rsae_send_imsg(int flen, const u_char *f
IMSG_SIZE_CHECK(&imsg, (&cko));
memcpy(&cko, imsg.data, sizeof(cko));
- if (IMSG_DATA_SIZE(&imsg) !=
- (sizeof(cko) + cko.cko_tlen))
- fatalx("data size");
ret = cko.cko_tlen;
- if (ret) {
+ if (ret > 0) {
+ if (IMSG_DATA_SIZE(&imsg) !=
+ (sizeof(cko) + ret))
+ fatalx("data size");
toptr = (u_char *)imsg.data + sizeof(cko);
memcpy(to, toptr, ret);
}
Mischa
2018-03-14 15:27:58 UTC
Permalink
Raw Message
Hi Claudio,
Post by Mischa
Post by Claudio Jeker
Post by Mischa Peters
Post by Claudio Jeker
Post by Mischa
Hi All,
Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout
[snip]
Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout
...etc...
Restarting the daemon "fixes" the problem.
Not sure how to trouble shoot this but I am able to reproduce this consistently by pointing SSLLabs towards relayd.
Would be great to get some pointers.
I have seen this as well on our production systems. This is a problem in
the privsep part of the TLS code. I could not do more testing yet but my
assumption is that a new option / feature is freaking this code out.
Anything I can do or collect to give you more information?
So, I think I found the problem. The ca process did not handle errors from
RSA_private_encrypt correctly. So once you got a bad signature in the
system chocked and stopped. This diff seems to work for me (against
SSLlabs).
Awesome! Can confirm that it continues processing traffic when hitting it with sslabs.
Will also move it to a more bussier machine to see how that handles.
Dec 25 15:51:07 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF06B:rsa routines:CRYPTO_internal:block type is not 02
Dec 25 15:51:08 lb2 relayd[27420]: ca_dispatch_relay: error:04FFF071:rsa routines:CRYPTO_internal:null before block missing
Dec 25 15:51:17 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF072:rsa routines:CRYPTO_internal:padding check failed
Dec 25 15:51:33 lb2 relayd[73631]: ca_dispatch_relay: error:04FFF071:rsa routines:CRYPTO_internal:null before block missing
Not sure if this is supposed to be taken care of, but I am still seeing the following messages in 6.3-beta.
$ uname -a
OpenBSD lb2l 6.3 GENERIC.MP#58 amd64

Mar 13 23:43:38 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF06B:rsa routines:CRYPTO_internal:block type is not 02
Mar 13 23:43:39 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF072:rsa routines:CRYPTO_internal:padding check failed
Mar 13 23:43:48 lb2 relayd[14775]: ca_dispatch_relay: error:04FFF06B:rsa routines:CRYPTO_internal:block type is not 02
Mar 13 23:44:03 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF071:rsa routines:CRYPTO_internal:null before block missing

Any knobs that need to be turned?

Mischa
Tom Smyth
2018-03-14 15:47:01 UTC
Permalink
Raw Message
Hi all
I have had that issue in 6.0,6.1 and 6.2 I haven't tried current yet..
I haven't had enough time to diagnose it to provide an adequate
bug report myself.
just restarting relayd seems to resolve it
Im just confiming that I have seen this issue also ...
Post by Mischa
Hi Claudio,
Post by Mischa
Post by Claudio Jeker
Post by Mischa Peters
Post by Claudio Jeker
Post by Mischa
Hi All,
Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout
[snip]
Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout
...etc...
Restarting the daemon "fixes" the problem.
Not sure how to trouble shoot this but I am able to reproduce this consistently by pointing SSLLabs towards relayd.
Would be great to get some pointers.
I have seen this as well on our production systems. This is a problem in
the privsep part of the TLS code. I could not do more testing yet but my
assumption is that a new option / feature is freaking this code out.
Anything I can do or collect to give you more information?
So, I think I found the problem. The ca process did not handle errors from
RSA_private_encrypt correctly. So once you got a bad signature in the
system chocked and stopped. This diff seems to work for me (against
SSLlabs).
Awesome! Can confirm that it continues processing traffic when hitting it with sslabs.
Will also move it to a more bussier machine to see how that handles.
Dec 25 15:51:07 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF06B:rsa routines:CRYPTO_internal:block type is not 02
Dec 25 15:51:08 lb2 relayd[27420]: ca_dispatch_relay: error:04FFF071:rsa routines:CRYPTO_internal:null before block missing
Dec 25 15:51:17 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF072:rsa routines:CRYPTO_internal:padding check failed
Dec 25 15:51:33 lb2 relayd[73631]: ca_dispatch_relay: error:04FFF071:rsa routines:CRYPTO_internal:null before block missing
Not sure if this is supposed to be taken care of, but I am still seeing the following messages in 6.3-beta.
$ uname -a
OpenBSD lb2l 6.3 GENERIC.MP#58 amd64
Mar 13 23:43:38 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF06B:rsa routines:CRYPTO_internal:block type is not 02
Mar 13 23:43:39 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF072:rsa routines:CRYPTO_internal:padding check failed
Mar 13 23:43:48 lb2 relayd[14775]: ca_dispatch_relay: error:04FFF06B:rsa routines:CRYPTO_internal:block type is not 02
Mar 13 23:44:03 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF071:rsa routines:CRYPTO_internal:null before block missing
Any knobs that need to be turned?
Mischa
--
Kindest regards,
Tom Smyth

Mobile: +353 87 6193172
The information contained in this E-mail is intended only for the
confidential use of the named recipient. If the reader of this message
is not the intended recipient or the person responsible for
delivering it to the recipient, you are hereby notified that you have
received this communication in error and that any review,
dissemination or copying of this communication is strictly prohibited.
If you have received this in error, please notify the sender
immediately by telephone at the number above and erase the message
You are requested to carry out your own virus check before
opening any attachment.
Mischa
2018-03-14 15:48:58 UTC
Permalink
Raw Message
Hi Tom!

Before, relayd would just "crash" and a reload was "solving" the problem.
In this instance it's luckily just extra logging. ;)

Mischa
Post by Tom Smyth
Hi all
I have had that issue in 6.0,6.1 and 6.2 I haven't tried current yet..
I haven't had enough time to diagnose it to provide an adequate
bug report myself.
just restarting relayd seems to resolve it
Im just confiming that I have seen this issue also ...
Post by Mischa
Hi Claudio,
Post by Mischa
Post by Claudio Jeker
Post by Mischa Peters
Post by Claudio Jeker
Post by Mischa
Hi All,
Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout
[snip]
Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout
...etc...
Restarting the daemon "fixes" the problem.
Not sure how to trouble shoot this but I am able to reproduce this consistently by pointing SSLLabs towards relayd.
Would be great to get some pointers.
I have seen this as well on our production systems. This is a problem in
the privsep part of the TLS code. I could not do more testing yet but my
assumption is that a new option / feature is freaking this code out.
Anything I can do or collect to give you more information?
So, I think I found the problem. The ca process did not handle errors from
RSA_private_encrypt correctly. So once you got a bad signature in the
system chocked and stopped. This diff seems to work for me (against
SSLlabs).
Awesome! Can confirm that it continues processing traffic when hitting it with sslabs.
Will also move it to a more bussier machine to see how that handles.
Dec 25 15:51:07 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF06B:rsa routines:CRYPTO_internal:block type is not 02
Dec 25 15:51:08 lb2 relayd[27420]: ca_dispatch_relay: error:04FFF071:rsa routines:CRYPTO_internal:null before block missing
Dec 25 15:51:17 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF072:rsa routines:CRYPTO_internal:padding check failed
Dec 25 15:51:33 lb2 relayd[73631]: ca_dispatch_relay: error:04FFF071:rsa routines:CRYPTO_internal:null before block missing
Not sure if this is supposed to be taken care of, but I am still seeing the following messages in 6.3-beta.
$ uname -a
OpenBSD lb2l 6.3 GENERIC.MP#58 amd64
Mar 13 23:43:38 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF06B:rsa routines:CRYPTO_internal:block type is not 02
Mar 13 23:43:39 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF072:rsa routines:CRYPTO_internal:padding check failed
Mar 13 23:43:48 lb2 relayd[14775]: ca_dispatch_relay: error:04FFF06B:rsa routines:CRYPTO_internal:block type is not 02
Mar 13 23:44:03 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF071:rsa routines:CRYPTO_internal:null before block missing
Any knobs that need to be turned?
Mischa
--
Kindest regards,
Tom Smyth
Mobile: +353 87 6193172
The information contained in this E-mail is intended only for the
confidential use of the named recipient. If the reader of this message
is not the intended recipient or the person responsible for
delivering it to the recipient, you are hereby notified that you have
received this communication in error and that any review,
dissemination or copying of this communication is strictly prohibited.
If you have received this in error, please notify the sender
immediately by telephone at the number above and erase the message
You are requested to carry out your own virus check before
opening any attachment.
Claudio Jeker
2018-03-14 18:34:36 UTC
Permalink
Raw Message
Post by Mischa
Hi Claudio,
Post by Mischa
Post by Claudio Jeker
Post by Mischa Peters
Post by Claudio Jeker
Post by Mischa
Hi All,
Dec 23 11:19:11 lb2 relayd[22515]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[52110]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[69641]: rsae_send_imsg: poll timeout
Dec 23 11:19:12 lb2 relayd[22515]: rsae_send_imsg: poll timeout
[snip]
Dec 23 11:19:17 lb2 relayd[69641]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[22515]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[52110]: rsae_send_imsg: poll timeout
Dec 23 11:19:18 lb2 relayd[69641]: rsae_send_imsg: poll timeout
...etc...
Restarting the daemon "fixes" the problem.
Not sure how to trouble shoot this but I am able to reproduce this consistently by pointing SSLLabs towards relayd.
Would be great to get some pointers.
I have seen this as well on our production systems. This is a problem in
the privsep part of the TLS code. I could not do more testing yet but my
assumption is that a new option / feature is freaking this code out.
Anything I can do or collect to give you more information?
So, I think I found the problem. The ca process did not handle errors from
RSA_private_encrypt correctly. So once you got a bad signature in the
system chocked and stopped. This diff seems to work for me (against
SSLlabs).
Awesome! Can confirm that it continues processing traffic when hitting it with sslabs.
Will also move it to a more bussier machine to see how that handles.
Dec 25 15:51:07 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF06B:rsa routines:CRYPTO_internal:block type is not 02
Dec 25 15:51:08 lb2 relayd[27420]: ca_dispatch_relay: error:04FFF071:rsa routines:CRYPTO_internal:null before block missing
Dec 25 15:51:17 lb2 relayd[7541]: ca_dispatch_relay: error:04FFF072:rsa routines:CRYPTO_internal:padding check failed
Dec 25 15:51:33 lb2 relayd[73631]: ca_dispatch_relay: error:04FFF071:rsa routines:CRYPTO_internal:null before block missing
Not sure if this is supposed to be taken care of, but I am still seeing the following messages in 6.3-beta.
$ uname -a
OpenBSD lb2l 6.3 GENERIC.MP#58 amd64
Mar 13 23:43:38 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF06B:rsa routines:CRYPTO_internal:block type is not 02
Mar 13 23:43:39 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF072:rsa routines:CRYPTO_internal:padding check failed
Mar 13 23:43:48 lb2 relayd[14775]: ca_dispatch_relay: error:04FFF06B:rsa routines:CRYPTO_internal:block type is not 02
Mar 13 23:44:03 lb2 relayd[96581]: ca_dispatch_relay: error:04FFF071:rsa routines:CRYPTO_internal:null before block missing
Any knobs that need to be turned?
Unsure. The errors are from OpenSSL and explain why the key exchange blew
up. Currently this resets the connection.
I wanted to log them so they are more visible. Somebody with deep
knowledge of TLS and libcrypto may be able to answer if those are harmless
or if somebody is probing.
--
:wq Claudio
Loading...