ECMP load balancing with masquerade
Introduction
This example is improved (different) version of round-robin load balancing example. It adds persistent user sessions, i.e. a particular user would use the same source IP address for all outgoing connections. Consider the following network layout:
Quick Start for Impatient
Configuration export from the gateway router:
/ ip address add address=192.168.0.1/24 network=192.168.0.0 broadcast=192.168.0.255 interface=Local add address=10.111.0.2/24 network=10.111.0.0 broadcast=10.111.0.255 interface=wlan2 add address=10.112.0.2/24 network=10.112.0.0 broadcast=10.112.0.255 interface=wlan1 / ip route add dst-address=0.0.0.0/0 gateway=10.111.0.1,10.112.0.1 check-gateway=ping / ip firewall nat add chain=srcnat out-interface=wlan1 action=masquerade add chain=srcnat out-interface=wlan2 action=masquerade / ip firewall mangle add chain=input in-interface=wlan1 action=mark-connection new-connection-mark=wlan1_conn add chain=input in-interface=wlan2 action=mark-connection new-connection-mark=wlan2_conn add chain=output connection-mark=wlan1_conn action=mark-routing new-routing-mark=to_wla1 add chain=output connection-mark=wlan2_conn action=mark-routing new-routing-mark=to_wla2 / ip route add dst-address=0.0.0.0/0 gateway=10.111.0.1 routing-mark=to_wla1 add dst-address=0.0.0.0/0 gateway=10.112.0.1 routing-mark=to_wla2
Explanation
First we give a code snippet and then explain what it actually does.
IP Addresses
/ ip address add address=192.168.0.1/24 network=192.168.0.0 broadcast=192.168.0.255 interface=Local add address=10.111.0.2/24 network=10.111.0.0 broadcast=10.111.0.255 interface=wlan2 add address=10.112.0.2/24 network=10.112.0.0 broadcast=10.112.0.255 interface=wlan1
The router has two upstream (WAN) interfaces with the addresses of 10.111.0.2/24 and 10.112.0.2/24. The LAN interface has the name "Local" and IP address of 192.168.0.1/24.
NAT
/ ip firewall nat add chain=srcnat out-interface=wlan1 action=masquerade add chain=srcnat out-interface=wlan2 action=masquerade
As routing decision is already made we just need rules that will fix src-addresses for all outgoing packets. if this packet will leave via wlan1 it will be NATed to 10.112.0.2/24, if via wlan2 then NATed to 10.111.0.2/24
Routing
/ ip route add dst-address=0.0.0.0/0 gateway=10.111.0.1,10.112.0.1 check-gateway=ping
This is typical ECMP (Equal Cost Multi-Path) gateway with check-gateway. ECMP is "persistent per-connection load balancing" or "per-src-dst-address combination load balancing". As soon as one of the gateway will not be reachable, check-gateway will remove it from gateway list. And you will have a "failover" effect.
You can use asymmetric bandwidth links also - for example one link is 2Mbps other 10Mbps. Just use this command to make load balancing 1:5
/ ip route add dst-address=0.0.0.0/0 gateway=10.111.0.1,10.112.0.1,10.112.0.1,10.112.0.1,10.112.0.1,10.112.0.1 check-gateway=ping
Connections to the router itself
/ ip firewall mangle add chain=input in-interface=wlan1 action=mark-connection new-connection-mark=wlan1_conn add chain=input in-interface=wlan2 action=mark-connection new-connection-mark=wlan2_conn add chain=output connection-mark=wlan1_conn action=mark-routing new-routing-mark=to_wlan1 add chain=output connection-mark=wlan2_conn action=mark-routing new-routing-mark=to_wlan2
/ ip route add dst-address=0.0.0.0/0 gateway=10.111.0.1 routing-mark=to_wlan1 add dst-address=0.0.0.0/0 gateway=10.112.0.1 routing-mark=to_wlan2
With all multi-gateway situations there is a usual problem to reach router from public network via one, other or both gateways. Explanations is very simple - Outgoing packets uses same routing decision as packets that are going trough the router. So reply to a packet that was received via wlan1 might be send out and masqueraded via wlan2.
To avoid that we need to policy routing those connections.
Known Issues
DNS issues
ISP specific DNS servers might have custom configuration that treats specific requests from ISP's network differently than requests from other network. So in case connection is made via other gateway those sites will not be accessible.
To avoid that we suggest to use 3rd-party (public) DNS servers, and in case you need ISP specific recourse, create static DNS entry and policy route that traffic to specific gateway.
Routing table flushing
Every time when something triggers flush of the routing table and ECMP cache is flushed. Connections will be assigned to gateways once again and may or may not be on the same gateway.(in case of 2 gateways there are 50% chance that traffic will start to flow via other gateway).
If you have fully routed network (clients address can be routed via all available gateway), change of the gateway will have no ill effect, but in case you use
masquerade, change of the gateway will result in change of the packet's source address and connection will be dropped.
Routing table flush can be caused by 2 things:
1) routing table change (dynamic routing protocol update, user manual changes)
2) every 10 minutes routing table is flushed for security reasons (to avoid possible DoS attacks)
So even if you do not have any changes of routing table, connections may jump to other gateway every 10 minutes