Post by Claudio Jeker
Post by Daniel Ouellet
================
Now with MD5 configured. We only add
"tcp md5sig password test" on the bgpd side and
"neighbor 66.63.12.108 password test" on the Cisco side.
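For context, the option both sides enable here is the TCP MD5 signature from RFC 2385: every segment carries an MD5 digest computed over the TCP pseudo-header, the fixed TCP header with a zeroed checksum, the segment data, and the shared password. The following is only a minimal Python sketch of that digest, assuming IPv4 and an ASCII password. The function name tcp_md5_digest is made up for illustration; this is not bgpd, kernel, or IOS code.

```python
import hashlib
import socket
import struct

TCP_PROTO = 6  # IP protocol number for TCP


def tcp_md5_digest(src_ip, dst_ip, tcp_header, payload, password):
    """Sketch of the RFC 2385 digest for one IPv4 TCP segment.

    tcp_header must start with the 20-byte fixed TCP header; any options
    after it are ignored, as RFC 2385 excludes them from the digest.
    """
    if len(tcp_header) < 20:
        raise ValueError("need at least the 20-byte fixed TCP header")
    fixed = bytearray(tcp_header[:20])
    header_len = (fixed[12] >> 4) * 4   # data offset: header length incl. options
    fixed[16:18] = b"\x00\x00"          # checksum is assumed to be zero
    pseudo = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        0,                              # zero padding before the protocol number
        TCP_PROTO,
        header_len + len(payload),      # TCP segment length
    )
    md5 = hashlib.md5()
    md5.update(pseudo)
    md5.update(bytes(fixed))
    md5.update(payload)
    md5.update(password.encode("ascii"))
    return md5.digest()
```

Because the addresses and the password both feed the digest, a segment signed for one peer or with a different password will not verify on the other side.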
With bgpd master
Clear session from bgpd side, session comes back up right away.
Clear session from remote side, session comes back up, possibly after a
very long delay.
With bgpd slave
Just can't establish a session whatsoever! The Cisco side will get
stuck in OpenSent and cycle a few times, all without success.
66.63.12.108 4 65001 0 1 0 0 0 never OpenSent
I can't reproduce this. On my test setup all sessions come back up.
Configuration with MD5.
Well, let's see if this helps or not. Two examples below. The first might
not be very elegant, but I think it shows the problem well. I force bgpd
to be the slave using a filter on the Cisco router. The filter is only
temporary in my case anyway: I want the session to get stuck in
OpenSent, and at that point I remove the filter and sit back
and watch. What happens is that the session never comes up. I
think it should, but it doesn't.
Then, when I see OpenSent on the Cisco router, I simply remove the
filter to be 100% sure nothing is blocking the regular traffic and see
if the session can recover. It doesn't.
So, I use this filter on the interface facing bgpd to force this state.
ip access-list extended bgpd-slave
 permit tcp any eq bgp any neq bgp
 deny tcp any neq bgp any eq bgp
 permit ip any any
and apply it like this:
interface FastEthernet0/0
 description Connection to OpenBSD Test Lab
 ip address 66.63.12.107 255.255.255.192
 ip access-group bgpd-slave in
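To make the effect of the filter explicit: it is applied inbound on the router interface, so it only sees packets coming from the bgpd hosts. A rough Python model of first-match ACL evaluation follows, as a sketch only. The Rule class and the evaluate function are invented for illustration, and only the port conditions used in bgpd-slave are modelled.

```python
from dataclasses import dataclass
from typing import Optional

BGP_PORT = 179


@dataclass
class Rule:
    action: str                  # "permit" or "deny"
    proto: str                   # "tcp", or "ip" to match any protocol
    src_is_bgp: Optional[bool]   # True: eq bgp, False: neq bgp, None: any port
    dst_is_bgp: Optional[bool]


# The three entries of the bgpd-slave access list, in order.
ACL = [
    Rule("permit", "tcp", True, False),
    Rule("deny", "tcp", False, True),
    Rule("permit", "ip", None, None),
]


def port_ok(want, port):
    """Check one port condition; None means the rule does not care."""
    return want is None or (port == BGP_PORT) == want


def evaluate(proto, sport, dport):
    """Return the action of the first matching rule (first match wins)."""
    for rule in ACL:
        if rule.proto not in ("ip", proto):
            continue
        if port_ok(rule.src_is_bgp, sport) and port_ok(rule.dst_is_bgp, dport):
            return rule.action
    return "deny"  # implicit deny at the end of a Cisco ACL


# Reply from bgpd's listening port to a session the Cisco opened: passes.
print(evaluate("tcp", 179, 14670))
# bgpd opening its own connection to the Cisco's port 179: dropped.
print(evaluate("tcp", 14223, 179))
```

So with the filter in place, only sessions initiated by the Cisco can get through, which is what forces bgpd into the slave role.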
I save my config and, to be ultra sure nothing else interferes, I simply
reload. There is no need to do that, and it is stupid anyway, but I do it
just to be paranoid.
Once I can ping the Cisco for a few seconds, I start bgpd on both
versions of OpenBSD and wait until I see the OpenSent state on the Cisco
router. Even though it should be able to establish a session with bgpd
as slave through this filter, it doesn't. Why, I wish I knew. Then, while
it is in OpenSent, I remove the filter from the interface entirely to be
sure nothing is in the way. Also, remember that no pf is running, and
the two servers are fresh installs with nothing on them other than the
install itself and the bgpd configuration. That's it.
So, when I see:
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
66.63.12.106    4 65001       0       1        0    0    0 never    OpenSent
66.63.12.108    4 65001       0       1        0    0    0 never    OpenSent
I do
no ip access-group bgpd-slave in
on my FastEthernet interface and then sit back. Nothing will ever happen
here. No session will ever come up. Never! It will cycle through close ->
idle -> active -> OpenSent, stay there for a few minutes, then
cycle again to the same point, and do that over and over again.
What I see on OpenBSD 3.7 is:
# bgpctl s neigh 66.63.12.107
BGP neighbor is 66.63.12.107, remote AS 65001
 Description: iBGP Test
  BGP version 4, remote router-id 0.0.0.0
  BGP state = Active
  Last read Never, holdtime 240s, keepalive interval 80s
  Message statistics:
                  Sent   Received
  Opens              1          0
  Notifications      0          0
  Updates            0          0
  Keepalives         0          0
  Route Refresh      0          0
  Total              1          0
  Local host: 66.63.12.106, Local port: 179
  Remote host: 66.63.12.107, Remote port: 14670
==========================
At each cycle of close -> idle -> active -> OpenSent, the remote port
above changes. On current, after the first cycle, it shows
Last error: unknown error code
instead, with no port information. The error logs look like this:
Oct 7 05:44:42 dev2 bgpd[21803]: startup
Oct 7 05:44:42 dev2 bgpd[14625]: route decision engine ready
Oct 7 05:44:42 dev2 bgpd[16756]: listening on 66.63.12.106
Oct 7 05:44:42 dev2 bgpd[16756]: session engine ready
Oct 7 05:44:42 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change None -> Idle, reason: None
Oct 7 05:44:42 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change Idle -> Connect, reason: Start
Oct 7 05:44:42 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change Connect -> OpenSent, reason: Connection opened
Oct 7 05:44:42 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): write error: Invalid argument
Oct 7 05:44:42 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change OpenSent -> Idle, reason: Fatal error
Oct 7 05:44:49 dev2 ntpd[24590]: adjusting local clock by -170.192293s
Oct 7 05:45:12 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change Idle -> Connect, reason: Start
Oct 7 05:46:26 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): socket error: No route to host
Oct 7 05:46:26 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change Connect -> Active, reason: Connection open failed
Oct 7 05:48:16 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change Active -> OpenSent, reason: Connection opened
Oct 7 05:48:16 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): write error: Invalid argument
Oct 7 05:48:16 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change OpenSent -> Idle, reason: Fatal error
Oct 7 05:48:34 dev2 ntpd[24590]: adjusting local clock by -169.939425s
Oct 7 05:49:16 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change Idle -> Connect, reason: Start
Oct 7 05:49:16 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): socket error: Connection refused
Oct 7 05:49:16 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change Connect -> Active, reason: Connection open failed
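A log like this is easier to follow once it is reduced to its state transitions. Here is a small Python sketch that does so, assuming each log entry sits on one line (wrapped lines joined). The names transitions and count_failed_opens are hypothetical helpers, not part of bgpd or bgpctl.

```python
import re

# Matches the "state change A -> B, reason: R" entries written by bgpd.
STATE_RE = re.compile(
    r"neighbor (?P<peer>\S+) \(.*?\): state change "
    r"(?P<old>\w+) -> (?P<new>\w+), reason: (?P<reason>.+)$"
)


def transitions(lines):
    """Return (old_state, new_state, reason) for every state change line."""
    out = []
    for line in lines:
        m = STATE_RE.search(line)
        if m:
            out.append((m["old"], m["new"], m["reason"].strip()))
    return out


def count_failed_opens(trans):
    """Count OpenSent -> Idle drops, i.e. sessions lost right after the OPEN."""
    return sum(1 for old, new, _ in trans if (old, new) == ("OpenSent", "Idle"))


# Four entries taken from the log above, one entry per line.
SAMPLE = """\
Oct 7 05:44:42 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change Idle -> Connect, reason: Start
Oct 7 05:44:42 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change Connect -> OpenSent, reason: Connection opened
Oct 7 05:44:42 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): write error: Invalid argument
Oct 7 05:44:42 dev2 bgpd[16756]: neighbor 66.63.12.107 (iBGP Test): state change OpenSent -> Idle, reason: Fatal error
""".splitlines()

for old, new, reason in transitions(SAMPLE):
    print(f"{old} -> {new} ({reason})")
```

Run over the whole log, this makes the repeated OpenSent -> Idle drop after each "write error: Invalid argument" stand out.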
-------------------
Current is no better but, as noted above, the port information after the
first cycle is replaced with:
Last error: unknown error code
# bgpctl s neigh 66.63.12.107
BGP neighbor is 66.63.12.107, remote AS 65001
 Description: iBGP Test
  BGP version 4, remote router-id 0.0.0.0
  BGP state = Active
  Last read Never, holdtime 240s, keepalive interval 80s
  Message statistics:
                  Sent   Received
  Opens              2          0
  Notifications      0          0
  Updates            0          0
  Keepalives         0          0
  Route Refresh      0          0
  Total              2          0
  Local host: 66.63.12.108, Local port: 179
  Remote host: 66.63.12.107, Remote port: 13386
With error log:
Oct 7 05:41:55 dev1 bgpd[15395]: startup
Oct 7 05:41:55 dev1 bgpd[16398]: route decision engine ready
Oct 7 05:41:55 dev1 bgpd[10475]: listening on 66.63.12.108
Oct 7 05:41:55 dev1 bgpd[10475]: session engine ready
Oct 7 05:41:55 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): state change None -> Idle, reason: None
Oct 7 05:41:55 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): state change Idle -> Connect, reason: Start
Oct 7 05:41:55 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): state change Connect -> OpenSent, reason: Connection opened
Oct 7 05:41:55 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): write error: Invalid argument
Oct 7 05:41:55 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): state change OpenSent -> Idle, reason: Fatal error
Oct 7 05:42:25 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): state change Idle -> Connect, reason: Start
Oct 7 05:43:40 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): socket error: No route to host
Oct 7 05:43:40 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): state change Connect -> Active, reason: Connection open failed
Oct 7 05:45:31 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): state change Active -> OpenSent, reason: Connection opened
Oct 7 05:45:31 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): write error: Invalid argument
Oct 7 05:45:31 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): state change OpenSent -> Idle, reason: Fatal error
Oct 7 05:46:31 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): state change Idle -> Connect, reason: Start
Oct 7 05:46:31 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): socket error: Connection refused
Oct 7 05:46:31 dev1 bgpd[10475]: neighbor 66.63.12.107 (iBGP Test): state change Connect -> Active, reason: Connection open failed
=====================
Second example.
Now I make sure no filters are present on the Cisco, reload it, kill
bgpd on both servers and restart them. What happens then is that I can
establish a session, where it looks like bgpd is always the master.
After it is established, if I reset from the Cisco side, it
never comes back to life. It gets stuck in OpenSent again here too.
I set up two boxes, one with 3.7 and one with current (Oct 6), to see any
difference for this specific event. Same results so far when MD5 is
configured. Same results with Cisco 5350 and 7206. Same thing with
IOS 12.3(9), 12.3(16) or 12.4(3) as well. Obviously, I didn't try every
version under the sun, but the idea is there anyway.
I establish a session with MD5 where bgpd initiates the session to the
Cisco box. "bgpctl show neighbor 66.63.12.107" clearly shows that
bgpd connects to the remote on port 179. After the session is up, if I do
"clear ip bgp 66.63.12.106" or "clear ip bgp 66.63.12.108", either one
gets stuck forever until I manually clear the session from the
bgpd side as well. So, if ONLY the Cisco side initiates a session clear,
the session is gone until a manual clear is also done on the bgpd side.
I do see the session on the Cisco go through close, idle, active,
OpenSent and then get stuck there. It really looks like the bgpd side is
simply not listening anymore. The only difference is that on current the
ports appear to be cleared and you get an error message that 3.7
doesn't provide.
current:
# bgpctl s neigh 66.63.12.107
BGP neighbor is 66.63.12.107, remote AS 65001
 Description: iBGP Test
  BGP version 4, remote router-id 66.63.12.107
  BGP state = Idle, down for 00:16:06
  Last read 00:16:14, holdtime 240s, keepalive interval 80s
  Message statistics:
                  Sent   Received
  Opens             17          4
  Notifications      2          0
  Updates            4          4
  Keepalives        34         41
  Route Refresh      0          0
  Total             57         49
  Last error: unknown error code
--------------
as opposed to 3.7, where you get this:
# bgpctl s neigh 66.63.12.107
BGP neighbor is 66.63.12.107, remote AS 65001
 Description: iBGP Test
  BGP version 4, remote router-id 66.63.12.107
  BGP state = Active, down for 00:16:17
  Last read 00:16:18, holdtime 240s, keepalive interval 80s
  Message statistics:
                  Sent   Received
  Opens              8          4
  Notifications      2          0
  Updates            4          4
  Keepalives        34         42
  Route Refresh      0          0
  Total             48         50
  Local host: 66.63.12.108, Local port: 14223
  Remote host: 66.63.12.107, Remote port: 179
------------------------
and from the router side:
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
66.63.12.106    4 65001      44      73        0    0    0 00:16:23 OpenSent
66.63.12.108    4 65001      44      74        0    0    0 00:16:22 OpenSent
====================
No matter how long I wait, it's stuck there forever.
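For anyone who wants to watch for this condition rather than stare at the router, a rough Python sketch that picks the non-established peers out of a pasted summary is below. It assumes the ten-column layout shown above, where the last column holds a prefix count once a session is established and a state name otherwise. The function name stuck_neighbors is made up.

```python
def stuck_neighbors(summary_text):
    """Return (neighbor, state, up_down) for peers that are not established."""
    stuck = []
    for line in summary_text.splitlines():
        fields = line.split()
        # A neighbor row has 10 fields and starts with a dotted-quad address;
        # this skips the header and anything else that was pasted along.
        if len(fields) != 10 or fields[0].count(".") != 3:
            continue
        state = fields[9]
        if not state.isdigit():   # a number here would be the prefix count
            stuck.append((fields[0], state, fields[8]))
    return stuck


SUMMARY = """\
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
66.63.12.106 4 65001 44 73 0 0 0 00:16:23 OpenSent
66.63.12.108 4 65001 44 74 0 0 0 00:16:22 OpenSent
"""

for neighbor, state, since in stuck_neighbors(SUMMARY):
    print(f"{neighbor} stuck in {state} for {since}")
```

States printed as two words would not fit the ten-field assumption and would need extra handling.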
Now, as far as netstat -sptcp is concerned, here are the results from current.
# netstat -sptcp
tcp:
6392 packets sent
3446 data packets (410744 bytes)
75 data packets (14780 bytes) retransmitted
0 fast retransmitted packets
2774 ack-only packets (3298 delayed)
0 URG only packets
0 window probe packets
7 window update packets
90 control packets
0 packets hardware-checksummed
7564 packets received
2808 acks (for 391996 bytes)
783 duplicate acks
0 acks for unsent data
0 acks for old data
4699 packets (347442 bytes) received in-sequence
435 completely duplicate packets (18997 bytes)
0 old duplicate packets
0 packets with some duplicate data (0 bytes duplicated)
47 out-of-order packets (1404 bytes)
0 packets (0 bytes) of data after window
0 window probes
8 window update packets
37 packets received after close
0 discarded for bad checksums
0 discarded for bad header offset fields
0 discarded because packet too short
0 discarded for missing IPsec protection
0 discarded due to memory shortage
7480 packets hardware-checksummed
2 bad/missing md5 checksums
800 good md5 checksums
29 connection requests
62 connection accepts
76 connections established (including accepts)
112 connections closed (including 3 drops)
0 connections drained
7 embryonic connections dropped
1700 segments updated rtt (of 1664 attempts)
101 retransmit timeouts
3 connections dropped by rexmit timeout
0 persist timeouts
4 keepalive timeouts
0 keepalive probes sent
3 connections dropped by keepalive
1 correct ACK header prediction
2301 correct data packet header predictions
1618 PCB cache misses
0 ECN connections accepted
0 ECE packets received
0 CWR packets received
0 CE packets received
0 ECT packets sent
0 ECE packets sent
0 CWR packets sent
cwr by fastrecovery: 51
cwr by timeout: 101
cwr by ecn: 0
1065 bad connection attempts
245 SYN cache entries added
0 hash collisions
62 completed
0 aborted (no space to build PCB)
172 timed out
0 dropped due to overflow
0 dropped due to bucket overflow
4 dropped due to RST
0 dropped due to ICMP unreachable
725 SYN,ACKs retransmitted
182 duplicate SYNs received for entries already in the cache
0 SYNs dropped (no route or no space)
51 SACK recovery episodes
229 segment rexmits in SACK recovery episodes
18668 byte rexmits in SACK recovery episodes
449 SACK options received
26 SACK options sent
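Of all these counters, the two md5 lines are the ones most directly tied to this problem. A small Python sketch follows that pulls them out of the netstat text so two snapshots can be compared, for instance one taken before and one after a clear from the Cisco side. This is only a suggested way to look at the numbers; md5_counters and delta are hypothetical helper names.

```python
import re

MD5_RE = re.compile(r"\s*(\d+)\s+(bad/missing|good) md5 checksums")


def md5_counters(netstat_text):
    """Extract the md5 checksum counters from `netstat -sptcp` output."""
    counters = {}
    for line in netstat_text.splitlines():
        m = MD5_RE.match(line)
        if m:
            counters[m.group(2)] = int(m.group(1))
    return counters


def delta(before, after):
    """Per-counter difference between two snapshots."""
    return {key: after[key] - before.get(key, 0) for key in after}


# The two relevant lines from the output above.
SNAPSHOT = """\
7480 packets hardware-checksummed
2 bad/missing md5 checksums
800 good md5 checksums
29 connection requests
"""

print(md5_counters(SNAPSHOT))
```

If the "bad/missing" count climbs while the Cisco sits in OpenSent, that would point at the signature check on incoming segments. If it stays flat, the problem is more likely elsewhere.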
I hope this helps a bit more. In any case, it's now been more than 30
minutes and still neither 3.7 nor current has recovered or
established a session. From this state, the only way to establish a
session is to clear from the Cisco side and, while the session is in
Active and before it gets to OpenSent, clear the bgpd side. The
session then comes up right away, but only if done in that order.
Now I need to get some sleep...
Daniel