Discussion:
MPLS: Disable Penultimate Hop Popping?
Rolf Sommerhalder
2010-06-07 14:48:32 UTC
Dear list,

Is there a way to disable PHP, i.e. to prevent ldpd on the last P
router from stripping/popping the label before it reaches the PE
router?

In my little test network that runs -current as of 03 June, I observe
from ldpd's lfib on the last P router that it pops the label on the
ingress interface (see output below), but then it never outputs/routes
the stripped packet to the egress interface towards the PE router (the
use counter of the matching prefix on the P router does not increment,
i.e. the P router appears to blackhole the ICMP Echo Request packets).

Currently, I am using a very basic setup with RIP as IGP and LDP, without
any additional route tables or VLANs; pf is disabled on all nodes.

I can provide a more detailed description of my lab setup, and output
of the various routing tables, etc. However, if there were an easy
way to disable PHP, then that would confirm that my setup is indeed OK
(as I have tried to verify manually so far).

Thank you,
Rolf


We ping from PE router pe11 via two P routers, p1 and p2, to
another PE router, pe21 (3.2.1.1):
[***@pe11:root]# ping 3.2.1.1

The label switched packet makes it via another P router p1 to the
ingress interface of the last P router p2 just fine:
[***@p2:root]# tcpdump -i vr2 -n
tcpdump: listening on vr2, link-type EN10MB
18:34:39.071600 MPLS(label 0x14, exp 0, ttl 254) 2.1.1.2 > 3.2.1.1: icmp: echo request
18:34:40.091795 MPLS(label 0x14, exp 0, ttl 254) 2.1.1.2 > 3.2.1.1: icmp: echo request
18:34:41.111994 MPLS(label 0x14, exp 0, ttl 254) 2.1.1.2 > 3.2.1.1: icmp: echo request
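
For readers who want to relate the tcpdump fields above to the bytes on the wire, here is a minimal sketch of how a 32-bit MPLS label stack entry is laid out according to RFC 3032 (20-bit label, 3-bit exp, 1-bit bottom-of-stack flag, 8-bit TTL). The struct and function names are hypothetical and are not taken from the OpenBSD sources. tcpdump does not print the bottom-of-stack bit here, so treating it as set is an assumption based on the single label shown.

```c
#include <stdint.h>

/* Fields of one MPLS label stack entry (layout per RFC 3032). */
struct shim_fields {
	uint32_t label;	/* 20-bit label value */
	uint8_t  exp;	/* 3-bit experimental / traffic class field */
	uint8_t  bos;	/* 1 if this entry is the bottom of the stack */
	uint8_t  ttl;	/* 8-bit time to live */
};

/* Decode a label stack entry that is already in host byte order. */
static struct shim_fields
shim_decode(uint32_t shim)
{
	struct shim_fields f;

	f.label = shim >> 12;
	f.exp = (uint8_t)((shim >> 9) & 0x7);
	f.bos = (uint8_t)((shim >> 8) & 0x1);
	f.ttl = (uint8_t)(shim & 0xff);
	return f;
}

/* Build a label stack entry in host byte order from its fields. */
static uint32_t
shim_encode(uint32_t label, uint8_t exp, uint8_t bos, uint8_t ttl)
{
	return ((label & 0xfffffu) << 12) |
	    ((uint32_t)(exp & 0x7) << 9) |
	    ((uint32_t)(bos & 0x1) << 8) |
	    (uint32_t)ttl;
}
```

With these helpers, the entry seen above (label 0x14, exp 0, ttl 254, assumed bottom of stack) encodes to 0x000141fe, and label 0x14 is decimal 20, which is the local label listed for 3.2.1.0/30 in the LFIB below.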

According to the LFIB this P router knows it is the Penultimate Hop
which directly connects to pe21. Therefore (probably), it pops the
label 20:
[***@p2:root]# ldpctl show lfib
flags: * = valid, C = Connected, S = Static
Flags Destination Nexthop Local Label Remote Label
*S 0.0.0.0/0 10.7.0.254 16 -
*C 1.1.2.0/29 link#3 imp-null -
*R 1.1.3.0/29 1.1.2.1 17 Pop
*C 1.2.3.0/29 link#2 imp-null -
*R 2.1.1.0/30 1.1.2.1 18 Pop
*C 2.2.1.0/30 link#1 imp-null -
*R 3.1.1.0/30 1.1.2.1 19 19
*R 3.2.1.0/30 2.2.1.2 20 Pop <==
*R 3.2.1.4/30 2.2.1.2 21 Pop
*R 3.2.1.8/30 2.2.1.2 22 Pop
*R 7.0.0.1/32 1.1.2.1 23 23
* 7.0.0.2/32 7.0.0.2 24 -
*R 7.0.0.3/32 1.2.3.3 25 17
*R 7.0.0.11/32 1.1.2.1 26 26
*R 7.0.0.21/32 2.2.1.2 27 26
*C 10.7.0.0/16 link#6 imp-null -
*C 127.0.0.0/8 link#0 - -
*S 127.0.0.0/8 127.0.0.1 - -
* 127.0.0.1/32 127.0.0.1 - -
*S 224.0.0.0/4 127.0.0.1 - -
* 224.0.0.9/32 127.0.0.1 - -


[***@p2:root]# ripctl show rib
Destination Nexthop Cost
1.1.2.0/29 0.0.0.0 1
1.1.3.0/29 1.1.2.1 2
1.2.3.0/29 0.0.0.0 1
2.1.1.0/30 1.1.2.1 2
2.2.1.0/30 0.0.0.0 1
3.1.1.0/30 1.1.2.1 3
3.2.1.0/30 2.2.1.2 2 <==
3.2.1.4/30 2.2.1.2 2
3.2.1.8/30 2.2.1.2 2
7.0.0.1/32 1.1.2.1 2
7.0.0.2/32 7.0.0.2 1
7.0.0.3/32 1.2.3.3 2
7.0.0.11/32 1.1.2.1 3
7.0.0.21/32 2.2.1.2 2

[***@p2:root]# route -n get 3.2.1.1
route to: 3.2.1.1
destination: 3.2.1.0
mask: 255.255.255.252
gateway: 2.2.1.2
interface: vr0
if address: 2.2.1.1
priority: 40 (rip)
flags: <UP,GATEWAY,DONE>
use mtu expire
49 0 0

But p2 appears to eat up those packets, instead of forwarding them to
the egress interface vr0:
[***@p2:root]# tcpdump -i vr0 -nvv
tcpdump: listening on vr0, link-type EN10MB
18:45:53.449355 2.2.1.2.646 > 224.0.0.2.646: [udp sum ok] udp 26 [tos 0xc0] [ttl 1] (id 64128, len 54)
18:45:53.682729 2.2.1.1.646 > 224.0.0.2.646: [udp sum ok] udp 26 [tos 0xc0] [ttl 1] (id 6099, len 54)
18:45:58.460172 2.2.1.2.646 > 224.0.0.2.646: [udp sum ok] udp 26 [tos 0xc0] [ttl 1] (id 6759, len 54)
^C
4 packets received by filter
0 packets dropped by kernel

Note that IP forwarding is enabled on all P and PE routers, i.e.
sysctl net.inet.ip.forwarding=1.
Claudio Jeker
2010-06-07 15:31:01 UTC
Post by Rolf Sommerhalder
Dear list,
Is there a way to disable PHP, i.e. to prevent ldpd on the last P
router from stripping/popping the label before it reaches the PE
router?
It is on the todo list but not yet done. It is a per-interface knob we
need in ldpd.
Post by Rolf Sommerhalder
In my little test network that runs -current as of 03 June, I observe
from ldpd's lfib on the last P router that it pops the label on the
ingress interface (see output below), but then it never outputs/routes
the stripped packet to the egress interface towards the PE router (the
use counter of the matching prefix on the P router does not increment,
i.e. the P router appears to blackhole the ICMP Echo Request packets).
Have a look at the route -n show -mpls output and check the input counter
for label 20.
Hmm. I see, my tests never used PHP on the P router. So there is
something strange going on. Thanks for the report, I'll have a look.
Post by Rolf Sommerhalder
Currently, I am using a very basic setup with RIP as IGP and LDP, without
any additional route tables or VLANs; pf is disabled on all nodes.
I can provide a more detailed description of my lab setup, and output
of the various routing tables, etc. However, if there were an easy
way to disable PHP, then that would confirm that my setup is indeed OK
(as I have tried to verify manually so far).
Setup looks fine. I use OSPF as IGP but I know Michele is using RIP in his
setup.
Post by Rolf Sommerhalder
Thank you,
Rolf
We ping from PE router pe11 via two P routers, p1 and p2, to
another PE router, pe21 (3.2.1.1):
Side note:
Please consider using IP blocks that are available for testing and not
publicly assigned ones.
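
As an illustration of the side note, the following is a minimal sketch of a check for whether an IPv4 address falls into a range that is safe for lab use: the RFC 1918 private blocks (10/8, 172.16/12, 192.168/16) and the RFC 5737 documentation blocks (192.0.2.0/24, 198.51.100.0/24, 203.0.113.0/24). The macro, table and function names are hypothetical, and the choice of ranges is an assumption about what "available for testing" covers.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a host-order IPv4 address from its four octets. */
#define IPV4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

struct prefix {
	uint32_t net;	/* network address, host byte order */
	uint32_t mask;	/* netmask, host byte order */
};

/* RFC 1918 private ranges and RFC 5737 documentation ranges. */
static const struct prefix lab_safe[] = {
	{ IPV4(10, 0, 0, 0),      0xff000000u },	/* 10/8 */
	{ IPV4(172, 16, 0, 0),    0xfff00000u },	/* 172.16/12 */
	{ IPV4(192, 168, 0, 0),   0xffff0000u },	/* 192.168/16 */
	{ IPV4(192, 0, 2, 0),     0xffffff00u },	/* TEST-NET-1 */
	{ IPV4(198, 51, 100, 0),  0xffffff00u },	/* TEST-NET-2 */
	{ IPV4(203, 0, 113, 0),   0xffffff00u },	/* TEST-NET-3 */
};

/* Return 1 if addr lies inside one of the ranges above, else 0. */
static int
is_lab_safe(uint32_t addr)
{
	size_t i;

	for (i = 0; i < sizeof(lab_safe) / sizeof(lab_safe[0]); i++)
		if ((addr & lab_safe[i].mask) == lab_safe[i].net)
			return 1;
	return 0;
}
```

Under this check, 10.7.0.254 from the lab above passes, while 3.2.1.1 and 1.1.2.1 do not.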
--
:wq Claudio
Rolf Sommerhalder
2010-06-07 16:19:10 UTC
Thanks Claudio for your speedy reply.
Post by Claudio Jeker
Have a look at the route -n show -mpls output and check the input counter
for label 20.

It happily counts and confirms what tcpdump shows on the ingress interface:

[***@p2:root]# route -n show -mpls
Routing tables

MPLS:
In label Out label Op Gateway Flags Refs Use
Mtu Prio Interface
3 - LOCAL 127.0.0.1 UGT 0 0
33200 56 lo0
16 - LOCAL 10.7.0.254 UGT 0 0
- 56 udav0
17 - POP 1.1.2.1 UGT 0 0
- 56 vr2
18 - POP 1.1.2.1 UGT 0 5
- 56 vr2
19 19 SWAP 1.1.2.1 UGT 0 0
- 56 vr2
20 - POP 2.2.1.2 UGT 0 7526
- 56 vr0 <==
21 - POP 2.2.1.2 UGT 0 0
- 56 vr0
22 - POP 2.2.1.2 UGT 0 0
- 56 vr0
23 23 SWAP 1.1.2.1 UGT 0 0
- 56 vr2
24 - LOCAL 7.0.0.2 UGT 0 1
33200 56 lo1
25 17 SWAP 1.2.3.3 UGT 0 0
- 56 vr1
26 26 SWAP 1.1.2.1 UGT 0 0
- 56 vr2
27 26 SWAP 2.2.1.2 UGT 0 0
- 56 vr0

Also, I have re-checked the counters of all other routes as well as
the traffic out of all other interfaces on this P router, but the
packets do not appear on a "wrong" interface.
Post by Claudio Jeker
Setup looks fine. I use OSPF as IGP but I know Michele is using RIP in his
setup.

For a cross-check, I will move from RIP to OSPF and report again if it
made any difference.
Post by Claudio Jeker
Please consider using IP blocks that are available for testing and not
publicly assigned ones.

But it is so much more convenient with short addresses which reflect the
topology, although there is actually named running as well. As a
precaution against leakage into the wild Internet, I had added those
public ranges temporarily to the RFC1918 egress filter on my pf lab
firewall :-)
Claudio Jeker
2010-06-07 17:05:56 UTC
Post by Rolf Sommerhalder
Also, I have re-checked the counters of all other routes as well as
the traffic out of all other interfaces on this P router, but the
packets do not appear on a "wrong" interface.
Yeah, the packets are dropped in the POP case of mpls_input.c that's how
far I got until now. I started with a fix but my magic is not strong
enough for now. Need to add some printfs to figure out where the packets
are dropped now.
Post by Rolf Sommerhalder
Post by Claudio Jeker
Setup looks fine. I use OSPF as IGP but I know Michele is using RIP in his
setup.
For a cross-check, I will move from RIP to OSPF and report again if it
made any difference.
Does not make a difference I have the same issue.
Post by Rolf Sommerhalder
Post by Claudio Jeker
Please consider using IP blocks that are available for testing and not
publicly assigned ones.
But it is so much more convenient with short addresses which reflect the
topology, although there is actually named running as well. As a
precaution against leakage into the wild Internet, I had added those
public ranges temporarily to the RFC1918 egress filter on my pf lab
firewall :-)
I just brought it up because of the bad traffic flowing through the newly
assigned 1/8.
I use 10/8 as my playground and 192.168/16 for the VRFs. Works nicely.
--
:wq Claudio
Rolf Sommerhalder
2010-06-07 19:12:44 UTC
Post by Claudio Jeker
Yeah, the packets are dropped in the POP case of mpls_input.c that's how
far I got until now. I started with a fix but my magic is not strong
enough for now.
After taking a look at the source, I essentially backed out the changes
made in rev. 1.10 of /src/usr.sbin/ldpd/kroute.c.

Now my test setup works *somehow* even though the LFIB still shows
that LDP applies PHP:

[***@p2:root]# ldpctl sh lfib 3.2.1.1
flags: * = valid, C = Connected, S = Static
Flags Destination Nexthop Local Label Remote Label
*R 3.2.1.0/30 2.2.1.2 20 Pop


But the kernel routing table is different, i.e. it applies no PHP but
does SWAP the labels, as desired:

[***@p2:root]# route -n show -mpls
Routing tables

MPLS:
In label Out label Op Gateway Flags Refs Use
Mtu Prio Interface
3 - LOCAL 127.0.0.1 UGT 0 0
33200 56 lo0
16 - LOCAL 10.7.0.254 UGT 0 0
- 56 udav0
17 3 SWAP 1.1.2.1 UGT 0 0
- 56 vr2
18 3 SWAP 1.1.2.1 UGT 0 30
- 56 vr2
19 19 SWAP 1.1.2.1 UGT 0 0
- 56 vr2
20 3 SWAP 2.2.1.2 UGT 0 30
- 56 vr0 <==
21 3 SWAP 2.2.1.2 UGT 0 0
- 56 vr0
22 3 SWAP 2.2.1.2 UGT 0 0
- 56 vr0
23 23 SWAP 1.1.2.1 UGT 0 0
- 56 vr2
24 - LOCAL 7.0.0.2 UGT 0 0
33200 56 lo1
25 17 SWAP 1.2.3.3 UGT 0 0
- 56 vr1
26 26 SWAP 1.1.2.1 UGT 0 0
- 56 vr2
27 26 SWAP 2.2.1.2 UGT 0 0
- 56 vr0


Also, the RTT for the pings is approx. 150 ms (and increasing), whereas it
should be in the 1 ms range:
[***@pe11:root]# ping 3.2.1.1
PING 3.2.1.1 (3.2.1.1): 56 data bytes
64 bytes from 3.2.1.1: icmp_seq=364 ttl=252 time=113.307 ms
64 bytes from 3.2.1.1: icmp_seq=365 ttl=252 time=113.211 ms
64 bytes from 3.2.1.1: icmp_seq=366 ttl=252 time=113.285 ms
64 bytes from 3.2.1.1: icmp_seq=367 ttl=252 time=113.260 ms
64 bytes from 3.2.1.1: icmp_seq=368 ttl=252 time=113.204 ms
64 bytes from 3.2.1.1: icmp_seq=369 ttl=252 time=152.024 ms
64 bytes from 3.2.1.1: icmp_seq=370 ttl=252 time=151.746 ms
64 bytes from 3.2.1.1: icmp_seq=371 ttl=252 time=151.777 ms
64 bytes from 3.2.1.1: icmp_seq=373 ttl=252 time=151.778 ms
64 bytes from 3.2.1.1: icmp_seq=374 ttl=252 time=151.753 ms
64 bytes from 3.2.1.1: icmp_seq=375 ttl=252 time=151.729 ms
64 bytes from 3.2.1.1: icmp_seq=376 ttl=252 time=151.782 ms
64 bytes from 3.2.1.1: icmp_seq=377 ttl=252 time=151.789 ms
64 bytes from 3.2.1.1: icmp_seq=378 ttl=252 time=151.760 ms
64 bytes from 3.2.1.1: icmp_seq=379 ttl=252 time=151.712 ms
64 bytes from 3.2.1.1: icmp_seq=380 ttl=252 time=151.787 ms
64 bytes from 3.2.1.1: icmp_seq=381 ttl=252 time=151.853 ms
64 bytes from 3.2.1.1: icmp_seq=382 ttl=252 time=151.798 ms
64 bytes from 3.2.1.1: icmp_seq=383 ttl=252 time=190.438 ms
64 bytes from 3.2.1.1: icmp_seq=384 ttl=252 time=190.347 ms
64 bytes from 3.2.1.1: icmp_seq=385 ttl=252 time=190.349 ms
64 bytes from 3.2.1.1: icmp_seq=386 ttl=252 time=190.395 ms
64 bytes from 3.2.1.1: icmp_seq=387 ttl=252 time=190.315 ms
64 bytes from 3.2.1.1: icmp_seq=388 ttl=252 time=190.316 ms
64 bytes from 3.2.1.1: icmp_seq=389 ttl=252 time=190.309 ms


Obviously, I do not understand enough about the code yet...
Claudio Jeker
2010-06-07 19:33:44 UTC
Post by Rolf Sommerhalder
Post by Claudio Jeker
Yeah, the packets are dropped in the POP case of mpls_input.c that's how
far I got until now. I started with a fix but my magic is not strong
enough for now.
After taking a look at the source, I essentially backed out the changes
made in rev. 1.10 of /src/usr.sbin/ldpd/kroute.c.
Now my test setup works *somehow* even though the LFIB still shows
that LDP applies PHP:
flags: * = valid, C = Connected, S = Static
Flags Destination Nexthop Local Label Remote Label
*R 3.2.1.0/30 2.2.1.2 20 Pop
But the kernel routing table is different, i.e. it applies no PHP but
does SWAP the labels, as desired:
Routing tables
In label Out label Op Gateway Flags Refs Use
Mtu Prio Interface
3 - LOCAL 127.0.0.1 UGT 0 0
33200 56 lo0
16 - LOCAL 10.7.0.254 UGT 0 0
- 56 udav0
17 3 SWAP 1.1.2.1 UGT 0 0
- 56 vr2
18 3 SWAP 1.1.2.1 UGT 0 30
- 56 vr2
19 19 SWAP 1.1.2.1 UGT 0 0
- 56 vr2
20 3 SWAP 2.2.1.2 UGT 0 30
- 56 vr0 <==
This actually causes the implicit null label to become explicit.
I don't think you want that.
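
To make the implicit versus explicit null distinction concrete, here is a minimal sketch of how a label switching router might turn the remote label advertised by its next hop into a forwarding action. The reserved label values (0, 1, 2, 3) are those defined in RFC 3032; the enum, struct and function names are hypothetical, and this is not ldpd's or the kernel's actual logic.

```c
#include <stdint.h>

/* Reserved label values from RFC 3032. */
#define LABEL_IPV4_EXPLICIT_NULL	0u
#define LABEL_ROUTER_ALERT		1u
#define LABEL_IPV6_EXPLICIT_NULL	2u
#define LABEL_IMPLICIT_NULL		3u

enum label_op { OP_POP, OP_SWAP };

struct label_action {
	enum label_op op;
	uint32_t out_label;	/* only meaningful for OP_SWAP */
};

/*
 * Pick the action for a remote label learned from the next hop.
 * Implicit null asks the penultimate hop to pop the label (PHP);
 * any other value, including explicit null, is swapped in.
 */
static struct label_action
choose_action(uint32_t remote_label)
{
	struct label_action a;

	if (remote_label == LABEL_IMPLICIT_NULL) {
		a.op = OP_POP;
		a.out_label = 0;	/* unused for a pop */
	} else {
		a.op = OP_SWAP;
		a.out_label = remote_label;
	}
	return a;
}

/* Implicit null is signalled by LDP but must never be sent on the wire. */
static int
label_valid_on_wire(uint32_t label)
{
	return label <= 0xfffffu && label != LABEL_IMPLICIT_NULL;
}
```

In these terms, the table row "20 3 SWAP" corresponds to writing label 3 into the outgoing packet, which is the case label_valid_on_wire() rejects, whereas a next hop that wants to keep receiving a label would advertise explicit null (0 for IPv4) instead.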
Post by Rolf Sommerhalder
Also, the RTT for the pings is approx. 150 ms (and increasing), whereas it
should be in the 1 ms range:
I guess the packet is doing some extra loops.

As I said, I know where the problem is, but it is far harder to fix than
expected.
--
:wq Claudio
Claudio Jeker
2010-06-08 14:54:22 UTC
Post by Claudio Jeker
As I said, I know where the problem is, but it is far harder to fix than
expected.
Here is a fix for the PHP issue. mpls_input() is a bit of spaghetti code
but I think I will fix that later.
--
:wq Claudio

Index: mpls.h
===================================================================
RCS file: /cvs/src/sys/netmpls/mpls.h,v
retrieving revision 1.23
diff -u -p -r1.23 mpls.h
--- mpls.h 2 Jun 2010 15:41:06 -0000 1.23
+++ mpls.h 8 Jun 2010 13:57:05 -0000
@@ -181,7 +181,4 @@ void mpls_input(struct mbuf *);
int mpls_output(struct ifnet *, struct mbuf *, struct sockaddr *,
struct rtentry *);

-void mpls_ip_input(struct mbuf *, u_int8_t);
-void mpls_ip6_input(struct mbuf *, u_int8_t);
-
#endif /* _KERNEL */
Index: mpls_input.c
===================================================================
RCS file: /cvs/src/sys/netmpls/mpls_input.c,v
retrieving revision 1.26
diff -u -p -r1.26 mpls_input.c
--- mpls_input.c 2 Jun 2010 15:41:07 -0000 1.26
+++ mpls_input.c 8 Jun 2010 13:58:43 -0000
@@ -56,6 +56,9 @@ extern int mpls_inkloop;
extern int mpls_mapttl_ip;
extern int mpls_mapttl_ip6;

+int mpls_ip_adjttl(struct mbuf *, u_int8_t);
+int mpls_ip6_adjttl(struct mbuf *, u_int8_t);
+
void
mpls_init(void)
{
@@ -93,7 +96,7 @@ mpls_input(struct mbuf *m)
struct rtentry *rt = NULL;
struct rt_mpls *rt_mpls;
u_int8_t ttl;
- int i, hasbos;
+ int i, s, hasbos;

if (!ISSET(ifp->if_xflags, IFXF_MPLS)) {
m_freem(m);
@@ -117,7 +120,7 @@ mpls_input(struct mbuf *m)
ifp->if_xname, MPLS_LABEL_GET(shim->shim_label),
MPLS_TTL_GET(shim->shim_label),
MPLS_BOS_ISSET(shim->shim_label));
-#endif /* MPLS_DEBUG */
+#endif

/* check and decrement TTL */
ttl = ntohl(shim->shim_label & MPLS_TTL_MASK);
@@ -159,13 +162,23 @@ mpls_input(struct mbuf *m)
* to be at the beginning of the stack.
*/
if (hasbos) {
- mpls_ip_input(m, ttl);
+ if (mpls_ip_adjttl(m, ttl))
+ goto done;
+ s = splnet();
+ IF_INPUT_ENQUEUE(&ipintrq, m);
+ schednetisr(NETISR_IP);
+ splx(s);
goto done;
} else
continue;
case MPLS_LABEL_IPV6NULL:
if (hasbos) {
- mpls_ip6_input(m, ttl);
+ if (mpls_ip6_adjttl(m, ttl))
+ goto done;
+ s = splnet();
+ IF_INPUT_ENQUEUE(&ip6intrq, m);
+ schednetisr(NETISR_IPV6);
+ splx(s);
goto done;
} else
continue;
@@ -208,20 +221,26 @@ mpls_input(struct mbuf *m)
break;

if (!rt->rt_gateway) {
-#ifdef MPLS_DEBUG
- printf("MPLS_DEBUG: no layer 3 informations "
- "attached\n");
-#endif
m_freem(m);
goto done;
}

switch(rt->rt_gateway->sa_family) {
case AF_INET:
- mpls_ip_input(m, ttl);
+ if (mpls_ip_adjttl(m, ttl))
+ break;
+ s = splnet();
+ IF_INPUT_ENQUEUE(&ipintrq, m);
+ schednetisr(NETISR_IP);
+ splx(s);
break;
case AF_INET6:
- mpls_ip6_input(m, ttl);
+ if (mpls_ip6_adjttl(m, ttl))
+ break;
+ s = splnet();
+ IF_INPUT_ENQUEUE(&ip6intrq, m);
+ schednetisr(NETISR_IPV6);
+ splx(s);
break;
default:
m_freem(m);
@@ -229,19 +248,45 @@ mpls_input(struct mbuf *m)
goto done;
case MPLS_OP_POP:
m = mpls_shim_pop(m);
- if (hasbos) {
+ if (!hasbos)
+ /* redo lookup with next label */
+ break;
+
+ ifp = rt->rt_ifp;
#if NMPE > 0
- if (rt->rt_ifp->if_type == IFT_MPLS) {
- smpls = satosmpls(rt_key(rt));
- mpe_input(m, rt->rt_ifp, smpls, ttl);
- goto done;
- }
+ if (ifp->if_type == IFT_MPLS) {
+ smpls = satosmpls(rt_key(rt));
+ mpe_input(m, rt->rt_ifp, smpls, ttl);
+ goto done;
+ }
#endif
- /* last label but we have no clue so drop */
+ if (!rt->rt_gateway) {
m_freem(m);
goto done;
}
- break;
+
+ switch(rt->rt_gateway->sa_family) {
+ case AF_INET:
+ if (mpls_ip_adjttl(m, ttl))
+ goto done;
+ break;
+ case AF_INET6:
+ if (mpls_ip6_adjttl(m, ttl))
+ goto done;
+ break;
+ default:
+ m_freem(m);
+ goto done;
+ }
+
+ /* Output iface is not MPLS-enabled */
+ if (!ISSET(ifp->if_xflags, IFXF_MPLS)) {
+ m_freem(m);
+ goto done;
+ }
+
+ (*ifp->if_ll_output)(ifp, m, rt->rt_gateway, rt);
+ goto done;
case MPLS_OP_PUSH:
m = mpls_shim_push(m, rt_mpls);
break;
@@ -293,27 +338,27 @@ done:
RTFREE(rt);
}

-void
-mpls_ip_input(struct mbuf *m, u_int8_t ttl)
+int
+mpls_ip_adjttl(struct mbuf *m, u_int8_t ttl)
{
struct ip *ip;
- int s, hlen;
+ int hlen;

if (mpls_mapttl_ip) {
if (m->m_len < sizeof (struct ip) &&
(m = m_pullup(m, sizeof(struct ip))) == NULL)
- return;
+ return -1;
ip = mtod(m, struct ip *);
hlen = ip->ip_hl << 2;
if (m->m_len < hlen) {
if ((m = m_pullup(m, hlen)) == NULL)
- return;
+ return -1;
ip = mtod(m, struct ip *);
}
-
+ /* make sure we have a valid header */
if (in_cksum(m, hlen) != 0) {
m_free(m);
- return;
+ return -1;
}

/* set IP ttl from MPLS ttl */
@@ -323,32 +368,23 @@ mpls_ip_input(struct mbuf *m, u_int8_t t
ip->ip_sum = 0;
ip->ip_sum = in_cksum(m, hlen);
}
-
- s = splnet();
- IF_INPUT_ENQUEUE(&ipintrq, m);
- schednetisr(NETISR_IP);
- splx(s);
+ return 0;
}

-void
-mpls_ip6_input(struct mbuf *m, u_int8_t ttl)
+int
+mpls_ip6_adjttl(struct mbuf *m, u_int8_t ttl)
{
struct ip6_hdr *ip6hdr;
- int s;

if (mpls_mapttl_ip6) {
if (m->m_len < sizeof (struct ip6_hdr) &&
(m = m_pullup(m, sizeof(struct ip6_hdr))) == NULL)
- return;
+ return -1;

ip6hdr = mtod(m, struct ip6_hdr *);

/* set IPv6 ttl from MPLS ttl */
ip6hdr->ip6_hlim = ttl;
}
-
- s = splnet();
- IF_INPUT_ENQUEUE(&ip6intrq, m);
- schednetisr(NETISR_IPV6);
- splx(s);
+ return 0;
}
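
For readers following the diff, here is a userspace sketch of the step that mpls_ip_adjttl() performs on an IPv4 header when TTL mapping is enabled: verify the header checksum, copy the MPLS TTL into the IP header, and recompute the checksum. It works on a plain byte buffer rather than an mbuf, uses the standard Internet checksum from RFC 1071, and its function names are hypothetical; it is not the kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over len bytes, read as big-endian words. */
static uint16_t
inet_cksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)			/* odd trailing byte */
		sum += (uint32_t)buf[i] << 8;
	while (sum >> 16)		/* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Set the TTL of the IPv4 header in hdr and refresh its checksum.
 * Returns 0 on success, -1 if the header is truncated or its
 * checksum does not verify.
 */
static int
ipv4_set_ttl(uint8_t *hdr, size_t len, uint8_t ttl)
{
	size_t hlen;
	uint16_t sum;

	if (len < 20)
		return -1;
	hlen = (size_t)(hdr[0] & 0x0f) << 2;	/* IHL in bytes */
	if (hlen < 20 || len < hlen)
		return -1;
	if (inet_cksum(hdr, hlen) != 0)		/* valid header sums to 0 */
		return -1;

	hdr[8] = ttl;				/* TTL field */
	hdr[10] = hdr[11] = 0;			/* clear old checksum */
	sum = inet_cksum(hdr, hlen);
	hdr[10] = (uint8_t)(sum >> 8);
	hdr[11] = (uint8_t)(sum & 0xff);
	return 0;
}
```

Returning an error code instead of enqueueing the packet mirrors the design of the patch: the caller decides whether the adjusted packet goes to the IP input queue or straight to the output interface.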
Rolf Sommerhalder
2010-06-09 23:08:24 UTC
Post by Claudio Jeker
Here is a fix for the PHP issue.

Great, it fixes the problem. Thank you very much.

The test setup works now, although I had no time yet for in-depth
testing with other traffic than just ICMP pings. But pings from pe11
to pe21 make it now back to pe11 while p1 and p2 both do PHP.

However, in a brief test I observed that the RTT from pe11 to pe21 via
p1 and p2 and back is 73 ms. Sniffing on the interface of the target
pe21, I see that it takes 13 ms from the time the routed,
label-stripped Echo Request comes in, until the labeled Echo Reply is
sent back to p2.

I hope to find some time tomorrow to take a close look at p2, p1 and
pe11 in order to try and figure out why the RTT sums up to 73 ms where
I expect it to be in the order of a few ms only.

Rolf
Claudio Jeker
2010-06-10 05:37:16 UTC
Post by Rolf Sommerhalder
Post by Claudio Jeker
Here is a fix for the PHP issue.
Great, it fixes the problem. Thank you very much.
The test setup works now, although I had no time yet for in-depth
testing with other traffic than just ICMP pings. But pings from pe11
to pe21 make it now back to pe11 while p1 and p2 both do PHP.
However, in a brief test I observed that the RTT from pe11 to pe21 via
p1 and p2 and back is 73 ms. Sniffing on the interface of the target
pe21, I see that it takes 13 ms from the time the routed,
label-stripped Echo Request comes in, until the labeled Echo Reply is
sent back to p2.
I hope to find some time tomorrow to take a close look at p2, p1 and
pe11 in order to try and figure out why the RTT sums up to 73 ms where
I expect it to be in the order of a few ms only.
What kind of HW do you use? I do my tests with little soekris boxes and
there the RTT is in the range of 4-5ms and indistinguishable from non MPLS
operation.
--
:wq Claudio
Rolf Sommerhalder
2010-06-12 17:18:58 UTC
Post by Claudio Jeker
What kind of HW do you use?
The MPLS test setup is made from five ALIX boards, three as P routers
in the core connected in a triangle, and two as PE routers.
Post by Claudio Jeker
I do my tests with little soekris boxes and
there the RTT is in the range of 4-5ms and indistinguishable from non MPLS
operation.
After updating to -current as of this morning, I still observe RTTs of
113 ms when pinging from one PE router to the other PE via two P
routers.
I have been checking the usual suspects, such as duplex-mismatch, but
from tcpdumps it looks as if the P routers take two to three dozen
milliseconds before they route returning ICMP Echo Replies.

Later, I'll try to compile and post a more precise analysis, possibly
with timestamped traces, etc.
