Manual:BGP nexthop selection and validation in RouterOS 3.x

From MikroTik Wiki
Revision as of 12:38, 19 March 2014 by Marisb (talk | contribs)

The problem

Even though the BGP RFC (RFC 4271, section 5.1.3) devotes several pages to the selection of the BGP nexthop that will be included in an UPDATE message, the specification is still vague in places. Besides that, other router vendors tend to give better control over nexthop selection than the RFC describes. A particular example is the XORP routing daemon. It has no nexthop selection logic of its own at all, and requires configuration of a set-nexthop routing map for each peer. RouterOS, on the other hand, tries to conform to the RFC. Quite complicated selection logic is used by default, but if you wish, you can override this logic by using routing filters.

The introduction of IPv6 brings additional nexthop selection problems, as the ubiquitous link-local addresses (fe80::/10) have no equivalent in the IPv4 world.

This page describes the particular nexthop selection algorithm used in RouterOS 3.x. Most of the IPv4-related part also applies to 2.9 routing-test.

IPv4 BGP route output

  • If a nexthop is configured with set-out-nexthop filter, always use this configured value (even if it's not valid!)
  • If we are reflecting a BGP route to an iBGP router (route-reflect=yes), use the nexthop received in UPDATE message.
  • If nexthop-choice is configured as force-self, go to the last step.
  • If we are redistributing a BGP route, the nexthop we received in UPDATE message is considered.
    • If the peer is eBGP and not configured multihop -- go to the next step.
    • If the nexthop is the same as the remote peer's id or remote peer's address used to establish the connection, go to the next step.
    • Else use the received BGP nexthop.
  • The nexthop from the route table (FIB in BGP terms) is considered. If the route has multiple nexthops, or is recursively resolved through multiple nexthops, only the first of them is considered.
    • If the peer is iBGP and we are redistributing a route that was not locally originated, go to the next step.
    • If the peer is eBGP and is multiple IP hops away, go to the next step.
    • If the nexthop is the same as the remote peer's id or remote peer's address used to establish the connection, go to the next step.
    • Else use nexthop from route table (FIB).
  • As the last fallback, use the address used to establish the connection. (In case of IPv6 connection between the peers, use a random IPv4 address of the connection's interface. Same applies to IPv6 nexthop with IPv4 connection.)
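
The steps above can be sketched in Python as follows. This is only an illustrative reading of the list, not RouterOS code. The Peer and Route classes and all of their field names are hypothetical. "Go to the next step" inside a sub-list is assumed to mean the next top-level step, "multiple IP hops away" is modelled by the same multihop flag as "configured multihop", and the cross-family fallback (an IPv6 connection carrying IPv4 routes) is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Peer:
    # Hypothetical model of a BGP peer; field names are assumptions.
    is_ebgp: bool
    multihop: bool            # eBGP peer that is more than one IP hop away
    remote_id: str            # remote peer's router id
    remote_address: str       # remote address used to establish the connection
    local_address: str        # local address used to establish the connection
    force_self: bool = False  # nexthop-choice=force-self


@dataclass
class Route:
    # Hypothetical model of a route that is about to be advertised.
    received_nexthop: Optional[str] = None  # from UPDATE; None if not a BGP route
    fib_nexthop: Optional[str] = None       # first nexthop from the route table
    locally_originated: bool = False
    filter_nexthop: Optional[str] = None    # value of set-out-nexthop, if any
    reflected: bool = False                 # reflected to an iBGP peer


def select_ipv4_out_nexthop(route: Route, peer: Peer) -> str:
    # Step 1: a nexthop set by filter always wins, valid or not.
    if route.filter_nexthop is not None:
        return route.filter_nexthop
    # Step 2: reflected routes keep the received nexthop.
    if route.reflected and route.received_nexthop is not None:
        return route.received_nexthop
    # Step 3: force-self jumps straight to the final fallback.
    if not peer.force_self:
        peer_addrs = {peer.remote_id, peer.remote_address}
        # Step 4: consider the nexthop received in the UPDATE message.
        if route.received_nexthop is not None:
            directly_connected_ebgp = peer.is_ebgp and not peer.multihop
            if (not directly_connected_ebgp
                    and route.received_nexthop not in peer_addrs):
                return route.received_nexthop
        # Step 5: consider the first nexthop from the route table.
        if route.fib_nexthop is not None:
            skip = (
                (not peer.is_ebgp and not route.locally_originated)
                or (peer.is_ebgp and peer.multihop)
                or route.fib_nexthop in peer_addrs
            )
            if not skip:
                return route.fib_nexthop
    # Step 6: fall back to the local address of the connection.
    return peer.local_address
```

For example, with a directly connected eBGP peer the received BGP nexthop is skipped and the route table nexthop is used, unless it points back at the peer itself, in which case the local connection address is advertised.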

IPv4 BGP route input

  • If the nexthop received in an UPDATE message is not a valid IPv4 unicast address, ignore this UPDATE message.
  • If the nexthop is router's local address, ignore this UPDATE message.
  • If the peer is eBGP (note that a peer with a different AS is considered eBGP, even if it is in the same confederation) and it is not configured as multihop, then the RFC requires checking that the nexthop falls in a network shared with the remote peer. In practice we use the network that is used to make the connection with the peer. For example, if the connection is made from address 10.0.0.1/24 to address 10.0.0.2, the nexthop must fall in the range 10.0.0.0 - 10.0.0.255.
  • (In case of IPv6 connection, all IPv4 networks belonging to the interface are tested. Same applies to IPv6 nexthop with IPv4 connection.)
  • After these checks are passed, the user can modify the received nexthop with the set-in-nexthop filter, without limitations. The set-in-nexthop-direct filter can also be used, or the two can be combined. Both filters accept multiple nexthop values.
  • After the route is installed in the RouterOS routing table with the selected nexthop, one last step remains. For this route to become active, the nexthop must be resolved. This can happen in two ways:
    1. When the nexthop falls in some connected route's range (i.e. gateway status is "reachable").
    2. When the nexthop falls in some other route's range with low enough scope attribute (i.e. gateway status is "recursive").
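
The validation checks at the start of this list can be sketched in Python with the standard ipaddress module. This is a hedged illustration only. The function name and parameters are hypothetical, and the test for a "valid IPv4 unicast address" is an approximation, since the exact set of addresses RouterOS rejects is not listed here. The IPv6-connection case, filters and nexthop resolution are not modelled.

```python
import ipaddress


def validate_ipv4_in_nexthop(nexthop, local_addresses, connection_address,
                             is_ebgp, multihop):
    """Return True if an UPDATE with this nexthop should be accepted.

    connection_address is the local address of the BGP connection in
    address/prefix form, e.g. "10.0.0.1/24" (hypothetical parameter).
    """
    addr = ipaddress.IPv4Address(nexthop)
    # Approximation of "not a valid IPv4 unicast address".
    if (addr.is_multicast or addr.is_loopback
            or addr.is_unspecified or addr.is_reserved):
        return False
    # The nexthop must not be one of the router's own addresses.
    if addr in {ipaddress.IPv4Address(a) for a in local_addresses}:
        return False
    # Directly connected eBGP peer: the nexthop must fall in the network
    # used to make the connection with the peer.
    if is_ebgp and not multihop:
        network = ipaddress.IPv4Interface(connection_address).network
        if addr not in network:
            return False
    return True
```

With the example from the text, a connection made from 10.0.0.1/24 accepts the nexthop 10.0.0.2 but rejects 10.0.1.2, because the latter is outside 10.0.0.0/24.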

IPv6 BGP route output

For IPv6, everything is complicated by the introduction of link-local address nexthops (RFC 2545). In short, there are cases when two nexthops should be included in the UPDATE message. The first nexthop is always present and is referred to here as the "global nexthop" (although it can be a link-local address). The second (the "link-local nexthop"), when present, must be a link-local address. Note that a link-local address must always be associated with a "link" (i.e. interface), otherwise it cannot be used for forwarding traffic. In the BGP case, the interface index is deduced from the connection.

  • If a nexthop is configured with set-out-nexthop filter, always use this configured value (even if it's not valid!)
  • If we are reflecting a BGP route to an iBGP router (route-reflect=yes), use the nexthop from UPDATE message. Do not set link-local nexthop in this case.
  • Select global nexthop in the same way we would select IPv4 nexthop.
  • If all of the following hold:
    • peer is reachable directly (i.e. single IP hop away);
    • global nexthop falls in a network shared with peer;
    • global nexthop is not a link local address;

then also include a link-local nexthop in the UPDATE message. Otherwise, stop here.

  • Select the link-local nexthop.
    • First check the nexthop configured with set-out-nexthop-linklocal filter, if any. Use it if it's a link-local address.
    • Then try to use FIB nexthop as link-local nexthop. Use it if it's a link-local address.
    • Finally, take as nexthop the link-local address belonging to the interface used to establish the connection with remote peer.
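
The decision whether to add a link-local nexthop, and which one to pick, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the RouterOS implementation. The function and all parameter names are hypothetical, the global nexthop is assumed to be already selected, and the route reflection case (where no link-local nexthop is set) is not modelled.

```python
import ipaddress
from typing import Optional


def select_link_local_nexthop(global_nexthop: str,
                              peer_is_direct: bool,
                              shared_network: str,
                              filter_linklocal: Optional[str],
                              fib_nexthop: Optional[str],
                              interface_linklocal: str) -> Optional[str]:
    """Return the link-local nexthop to advertise, or None for no second nexthop."""
    g = ipaddress.IPv6Address(global_nexthop)
    # All three conditions must hold before a link-local nexthop is added.
    if not (peer_is_direct
            and g in ipaddress.IPv6Network(shared_network)
            and not g.is_link_local):
        return None
    # Prefer the set-out-nexthop-linklocal value, then the route table
    # nexthop, but only if the candidate really is a link-local address.
    for candidate in (filter_linklocal, fib_nexthop):
        if (candidate is not None
                and ipaddress.IPv6Address(candidate).is_link_local):
            return candidate
    # Otherwise use the link-local address of the connection's interface.
    return interface_linklocal
```

A return value of None corresponds to an UPDATE message that carries only the global nexthop.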

IPv6 BGP route input

  • Validate global nexthop exactly the same way as IPv4 nexthop would be validated. Multicast, reserved and loopback addresses are not acceptable as nexthops.
  • If the link-local nexthop received is not a valid IPv6 link-local address, then ignore it.
  • If the link-local nexthop is a router's local address, then ignore it.
  • If the link-local nexthop is present in UPDATE message and should not be ignored, then use it for installing in route table (FIB). Else use global nexthop.
  • The user can modify the received nexthop with the set-in-nexthop-ipv6 and set-in-nexthop-linklocal filters, without limitations. The set-in-nexthop-direct filter can also be used, or they can be combined. All filters accept multiple nexthop values.
  • In the routing table, non-link-local nexthops are resolved the same way as IPv4 nexthops. Link-local nexthops are always considered reachable if the nexthop's interface has IPv6 support. (An interface has IPv6 support if it has any IPv6 address.)
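
The choice between the global and the link-local nexthop on input can be sketched in Python as below. This is an illustrative reading of the list above. The function and parameter names are hypothetical, the global nexthop is assumed to have passed validation already, and filters and nexthop resolution are not modelled.

```python
import ipaddress
from typing import Iterable, Optional


def choose_ipv6_fib_nexthop(global_nexthop: str,
                            link_local_nexthop: Optional[str],
                            local_addresses: Iterable[str]) -> str:
    """Pick the nexthop that would be installed in the route table."""
    if link_local_nexthop is not None:
        ll = ipaddress.IPv6Address(link_local_nexthop)
        local = {ipaddress.IPv6Address(a) for a in local_addresses}
        # Use the link-local nexthop only if it is a genuine link-local
        # address and is not one of the router's own addresses.
        if ll.is_link_local and ll not in local:
            return link_local_nexthop
    # Link-local nexthop absent or ignored: fall back to the global one.
    return global_nexthop
```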

Other address families

For the l2vpn, l2vpn-cisco and vpnv4 address families, the nexthop is selected and validated in exactly the same way as for IPv4.

Currently there is no support for IPv6 nexthops for l2vpn[-cisco] address families.

References

  • RFC 4271 - A Border Gateway Protocol 4 (BGP-4) - section 5.1.3.
  • RFC 2545 - Use of BGP-4 Multiprotocol Extensions for IPv6 Inter-Domain Routing