Manual:Multicast detailed example
From MikroTik Wiki
MikroTik uses the PIM-SM protocol implementation from XORP, integrated into our own routing program.
Since RouterOS 3.16 there is also support for simpler multicast routing based on IGMP proxy.
Multicast Routing Overview
IP Multicast is a technology that allows one-to-many and many-to-many distribution of data on the Internet. Senders send their data to a multicast IP destination address, and receivers express an interest in receiving traffic destined for such an address. The network then figures out how to get the data from senders to receivers.
If both the sender and receiver for a multicast group are on the same local broadcast subnet, then the routers do not need to be involved in the process, and communication can take place directly. If, however, the sender and receiver are on different subnets, then a multicast routing protocol needs to be involved in setting up multicast forwarding state on the tree between the sender and the receivers.
MikroTik supports the PIM-SM multicast routing protocol. PIM means "Protocol Independent Multicast" - i.e. this protocol is not tied to any particular unicast routing IGP. SM means "sparse-mode"; as opposed to dense-mode, in sparse-mode protocols explicit control messages are used to ensure that traffic is only delivered to the subnets where there are receivers that requested to receive it.
In addition to the routing protocols used to set up forwarding state between subnets, a way is needed for the routers to discover that there are local receivers on a directly attached subnet. For IPv4 this role is served by the Internet Group Management Protocol (IGMP).
Service Models: ASM vs SSM
There are two different models for IP multicast:
- Any Source Multicast (ASM), in which a receiver joins a multicast group, and receives traffic from any senders that send to that group.
- Source-Specific Multicast (SSM), in which a receiver explicitly joins a (source, group) pair.
For IPv4, multicast addresses are in the range 224.0.0.0 to 239.255.255.255 inclusive. Addresses within 232.0.0.0/8 are reserved for SSM usage. Addresses in 239.0.0.0/8 are ASM addresses defined for varying sizes of limited scope. Addresses within 224.0.0.0/24 are considered link-local and are not forwarded between subnets. These addresses are mostly used by applications that do not require communication with other networks. Here are some host group addresses assigned by the Internet Assigned Numbers Authority (IANA):
- 224.0.0.1 - All systems on the subnet
- 224.0.0.2 - All routers on the subnet
- 224.0.0.9 - For RIPv2
- 224.0.0.18 - For VRRP
- 224.0.1.1 - Network Time Protocol (NTP)
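The address ranges above can be checked programmatically. The following is a minimal sketch, assuming the standard IANA IPv4 multicast assignments (224.0.0.0/4 overall, 224.0.0.0/24 link-local, 232.0.0.0/8 SSM, 239.0.0.0/8 administratively scoped). The function name `classify` and its return labels are hypothetical and chosen only for illustration.

```python
import ipaddress

# Ranges per the standard IANA IPv4 multicast assignments (assumed here).
MULTICAST = ipaddress.ip_network("224.0.0.0/4")
LINK_LOCAL = ipaddress.ip_network("224.0.0.0/24")
SSM = ipaddress.ip_network("232.0.0.0/8")
ADMIN_SCOPED = ipaddress.ip_network("239.0.0.0/8")


def classify(addr: str) -> str:
    """Return a rough label for an IPv4 address (labels are illustrative)."""
    ip = ipaddress.ip_address(addr)
    if ip not in MULTICAST:
        return "not multicast"
    if ip in LINK_LOCAL:
        return "link-local"        # never forwarded by routers
    if ip in SSM:
        return "ssm"               # receivers join (source, group)
    if ip in ADMIN_SCOPED:
        return "admin-scoped asm"  # limited-scope ASM
    return "asm"


if __name__ == "__main__":
    for a in ("224.0.0.18", "232.1.2.3", "239.1.1.1", "224.0.1.1", "10.0.0.1"):
        print(a, "->", classify(a))
```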
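The Internet Assigned Numbers Authority (IANA) allocates Ethernet addresses from 01:00:5E:00:00:00 through 01:00:5E:7F:FF:FF for multicasting, therefore leaving only 23 bits available for the multicast group ID. Because an IPv4 group address has 28 significant bits, 32 different group addresses map to the same Ethernet address.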
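The mapping from a group address to an Ethernet address can be sketched as follows: the low 23 bits of the IPv4 address are copied into the 01:00:5E:00:00:00 prefix. The function name `multicast_mac` is an assumption made for this example.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet multicast address."""
    ip = ipaddress.IPv4Address(group)
    if not ip.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(ip) & 0x7FFFFF        # keep only the low 23 bits of the group
    mac = 0x01005E000000 | low23      # place them under the 01:00:5E prefix
    # Format the 48-bit value as six colon-separated hex octets.
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


if __name__ == "__main__":
    print(multicast_mac("224.0.0.1"))
    print(multicast_mac("224.128.0.1"))  # same MAC: the differing bit is dropped
```

Since the upper bits of the group address are discarded, a host may receive frames for groups it did not join and has to filter them at the IP layer.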
When a receiver joins a multicast group, the multicast routers serving that receiver's subnet need to know that the receiver has joined so that they can arrange for multicast traffic destined for that group to reach this subnet. The Internet Group Management Protocol (IGMP) is a link-local protocol for IPv4 that communicates this information between receivers and routers. The same role for IPv6 is performed by the Multicast Listener Discovery protocol (MLD).
The basic IGMP mechanism works as follows. When a multicast receiver joins a multicast group, it multicasts an IGMP Join message onto the subnet on which it is joining. The local routers receive this join, and cause multicast traffic destined for the group to reach this subnet. Periodically one of the local routers sends an IGMP Query message onto the subnet. If there are multiple multicast routers on the subnet, then one of them is elected as the sole querier for that subnet. In response to an IGMP query, receivers refresh their IGMP Join. If the join is not refreshed in response to queries, then the state is removed, and multicast traffic for this group ceases to reach this subnet.
There are three different versions of IGMP:
- IGMP version 1 functions as described above.
- IGMP version 2 adds support for IGMP Leave messages to allow fast leave from a multicast group.
- IGMP version 3 adds support for source include and exclude lists, to allow a receiver to indicate that it only wants to hear traffic from certain sources, or not receive traffic from certain sources.
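On the receiver side, applications do not build IGMP messages themselves; they ask the operating system to join a group and the kernel sends the membership report. The following is a minimal sketch using the standard socket API. The function names `make_mreq`, `open_receiver` and `leave_group` are hypothetical, and which IGMP version is used on the wire depends on the host and the querier, not on this code.

```python
import socket
import struct


def make_mreq(group: str, local_ip: str = "0.0.0.0") -> bytes:
    """Build the ip_mreq structure: group address followed by local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(local_ip))


def open_receiver(group: str, port: int, local_ip: str = "0.0.0.0") -> socket.socket:
    """Open a UDP socket and join the group (0.0.0.0 lets the kernel pick the interface)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining causes the kernel to announce membership via IGMP on the subnet.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, make_mreq(group, local_ip))
    return sock


def leave_group(sock: socket.socket, group: str, local_ip: str = "0.0.0.0") -> None:
    """Drop membership; the kernel signals the leave (e.g. an IGMPv2 Leave message)."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, make_mreq(group, local_ip))
```

A receiver would call `open_receiver(group, port)`, read datagrams with `recvfrom`, and call `leave_group` before closing the socket.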
PIM-SM Protocol Overview (From the PIM-SM specification RFC 4601)
PIM-SM relies on an underlying topology-gathering protocol to populate a routing table with routes. This routing table is called the MRIB or Multicast Routing Information Base. The routes in this table may be taken directly from the unicast routing table, or they may be different and provided by a separate routing protocol such as Multi-protocol BGP.
Regardless of how it is created, the primary role of the MRIB in the PIM-SM protocol is to provide the next-hop router along a multicast-capable path to each destination subnet. The MRIB is used to determine the next-hop neighbor to which any PIM Join/Prune message is sent. Data flows along the reverse path of the Join messages. Thus, in contrast to the unicast RIB which specifies the next-hop that a data packet would take to get to some subnet, the MRIB gives reverse-path information, and indicates the path that a multicast data packet would take from its origin subnet to the router that has the MRIB.
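The MRIB lookup described above is essentially a longest-prefix match that returns the upstream neighbor towards the RP (for a (*,G) Join) or towards the source (for an (S,G) Join). The following is a toy sketch only: the class `Mrib`, its methods and the addresses in the usage example are hypothetical and do not reflect the RouterOS or XORP data structures.

```python
import ipaddress
from typing import List, Optional, Tuple


class Mrib:
    """Toy MRIB: a list of (prefix, next hop) entries with longest-prefix lookup."""

    def __init__(self) -> None:
        self.routes: List[Tuple[ipaddress.IPv4Network, str]] = []

    def add(self, prefix: str, next_hop: str) -> None:
        self.routes.append((ipaddress.ip_network(prefix), next_hop))

    def rpf_neighbor(self, target: str) -> Optional[str]:
        """Next hop towards `target` (the RP or a source); None if no route matches."""
        ip = ipaddress.ip_address(target)
        matches = [(net.prefixlen, hop) for net, hop in self.routes if ip in net]
        if not matches:
            return None
        # The most specific (longest) matching prefix wins.
        return max(matches, key=lambda m: m[0])[1]


if __name__ == "__main__":
    mrib = Mrib()
    mrib.add("10.0.0.0/8", "192.168.1.1")
    mrib.add("10.1.0.0/16", "192.168.2.1")
    print(mrib.rpf_neighbor("10.1.5.5"))  # Join towards 10.1.5.5 goes via 192.168.2.1
```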
Phase One: RP Tree
In phase one, a multicast receiver expresses its interest in receiving traffic destined for a multicast group. Typically it does this using IGMP or MLD. One of the receiver's local PIM routers is elected as the Designated Router (DR) for that subnet. On receiving the receiver's expression of interest, the DR then sends a PIM Join message towards the Rendezvous Point (RP) for that multicast group. The RP is a PIM-SM router that has been configured to serve a bootstrapping role for certain multicast groups. This Join message is known as a (*,G) Join because it joins group G for all sources to that group. The (*,G) Join travels hop-by-hop towards the RP for the group, and in each router it passes through, multicast tree state for group G is instantiated. Eventually the (*,G) Join either reaches the RP, or reaches a router that already has (*,G) Join state for that group. When many receivers join the group, their Join messages converge on the RP, and form a distribution tree for group G that is rooted at the RP. This is known as the RP Tree (RPT), and is also known as the shared tree because it is shared by all sources sending to that group. Join messages are resent periodically as long as the receiver remains in the group. When all receivers on a leaf network leave the group, the DR will send a PIM (*,G) Prune message towards the RP for that multicast group. However, if the Prune message is not sent for any reason, the state will eventually time out.
A multicast data sender just starts sending data destined for a multicast group. The sender's local router (DR) takes those data packets, unicast-encapsulates them, and sends them directly to the RP. The RP receives these encapsulated data packets, decapsulates them, and forwards them onto the shared tree. The packets then follow the (*,G) multicast tree state in the routers on the RP Tree, being replicated wherever the RP Tree branches, and eventually reaching all the receivers for that multicast group. The process of encapsulating data packets to the RP is called registering, and the encapsulation packets are known as PIM Register packets. At the end of phase one, multicast traffic is flowing encapsulated to the RP, and then natively over the RP tree to the multicast receivers.
Phase Two: Register-Stop
Register-encapsulation of data packets is inefficient for two reasons:
- Encapsulation and decapsulation may be relatively expensive operations for a router to perform, depending on whether or not the router has appropriate hardware for these tasks.
- Traveling all the way to the RP, and then back down the shared tree may entail the packets traveling a relatively long distance to reach receivers that are close to the sender. For some applications, this increased latency is undesirable.
Although Register-encapsulation may continue indefinitely, for the reasons above, the RP will normally choose to switch to native forwarding. To do this, when the RP receives a register-encapsulated data packet from source S on group G, it will normally initiate an (S,G) source-specific Join towards S. This Join message travels hop-by-hop towards S, instantiating (S,G) multicast tree state in the routers along the path. (S,G) multicast tree state is used only to forward packets for group G if those packets come from source S. Eventually the Join message reaches S's subnet or a router that already has (S,G) multicast tree state, and then packets from S start to flow following the (S,G) tree state towards the RP. These data packets may also reach routers with (*,G) state along the path towards the RP - if so, they can short-cut onto the RP tree at this point.
While the RP is in the process of joining the source-specific tree for S, the data packets will continue being encapsulated to the RP. When packets from S also start to arrive natively at the RP, the RP will be receiving two copies of each of these packets. At this point, the RP starts to discard the encapsulated copy of these packets, and it sends a Register-Stop message back to S's DR to prevent the DR from unnecessarily encapsulating the packets.
At the end of phase 2, traffic will be flowing natively from S along a source-specific tree to the RP, and from there along the shared tree to the receivers. Where the two trees intersect, traffic may transfer from the source-specific tree to the RP tree, and so avoid taking a long detour via the RP. It should be noted that a sender may start sending before or after a receiver joins the group, and thus phase two may happen before the shared tree to the receiver is built.
Phase Three: Shortest-Path Tree
Although having the RP join back towards the source removes the encapsulation overhead, it does not completely optimize the forwarding paths. For many receivers the route via the RP may involve a significant detour when compared with the shortest path from the source to the receiver. To obtain lower latencies, a router on the receiver's LAN, typically the DR, may optionally initiate a transfer from the shared tree to a source-specific shortest-path tree (SPT). To do this, it issues an (S,G) Join towards S. This instantiates state in the routers along the path to S. Eventually this join either reaches S's subnet, or reaches a router that already has (S,G) state. When this happens, data packets from S start to flow following the (S,G) state until they reach the receiver.
At this point the receiver (or a router upstream of the receiver) will be receiving two copies of the data - one from the SPT and one from the RPT. When the first traffic starts to arrive from the SPT, the DR or upstream router starts to drop the packets for G from S that arrive via the RP tree. In addition, it sends an (S,G) Prune message towards the RP. This is known as an (S,G,rpt) Prune. The Prune message travels hop-by-hop, instantiating state along the path towards the RP indicating that traffic from S for G should NOT be forwarded in this direction. The prune is propagated until it reaches the RP or a router that still needs the traffic from S for other receivers. By now, the receiver will be receiving traffic from S along the shortest-path tree between the receiver and S. In addition, the RP is receiving the traffic from S, but this traffic is no longer reaching the receiver along the RP tree. As far as the receiver is concerned, this is the final distribution tree.
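How (*,G) Join state, (S,G) Join state and (S,G,rpt) Prune state combine on a single router can be illustrated with a toy model. It loosely follows the idea of the inherited outgoing interface list in RFC 4601 but ignores local IGMP membership, Assert state and timers. The class `TreeState`, its attribute names and the interface names are assumptions for illustration only.

```python
from collections import defaultdict


class TreeState:
    """Toy per-router multicast tree state, keyed by group or (source, group)."""

    def __init__(self) -> None:
        self.star_g_joins = defaultdict(set)   # group -> interfaces with (*,G) Join
        self.sg_joins = defaultdict(set)       # (source, group) -> interfaces with (S,G) Join
        self.sg_rpt_prunes = defaultdict(set)  # (source, group) -> interfaces with (S,G,rpt) Prune

    def outgoing(self, source: str, group: str) -> set:
        """Interfaces on which a packet from `source` to `group` is forwarded."""
        key = (source, group)
        # Shared-tree interfaces, minus those that pruned this source off the RP tree.
        shared = self.star_g_joins.get(group, set()) - self.sg_rpt_prunes.get(key, set())
        # Plus interfaces that joined the source-specific tree.
        return shared | self.sg_joins.get(key, set())


if __name__ == "__main__":
    st = TreeState()
    st.star_g_joins["G"] |= {"ether1", "ether2"}
    st.sg_rpt_prunes[("S", "G")].add("ether2")  # downstream switched to the SPT
    st.sg_joins[("S", "G")].add("ether3")
    print(sorted(st.outgoing("S", "G")))
```

In this model a source without any (S,G) or (S,G,rpt) state is still forwarded on all (*,G) interfaces, which matches the shared-tree behaviour of phase one.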
Multi-access Transit LANs
In contrast to PtP interfaces, multi-access LANs present a new problem. Now there can be more than one router that is connected both to upstream and downstream networks. If all of these routers were forwarding multicast traffic to clients, that would be a waste of bandwidth. To avoid this problem, only one router is elected as traffic forwarder. The election is done by using PIM Assert messages.
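The Assert election can be sketched as a comparison of the metrics carried in Assert messages. Per RFC 4601, an (S,G) assert beats a (*,G) assert, then the lower metric preference wins, then the lower metric, and finally the higher IP address breaks ties. The class and function names below are hypothetical, and the sketch leaves out the Assert state machine and timers.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AssertMetric:
    rpt_bit: int            # 0 for an (S,G) assert, 1 for a (*,G) assert
    metric_preference: int  # e.g. the route distance
    metric: int             # route metric
    address: str            # IP address of the asserting router

    def sort_key(self):
        # Lower tuple is better; the address is negated so the higher IP wins ties.
        return (self.rpt_bit, self.metric_preference, self.metric,
                -int(ipaddress.IPv4Address(self.address)))


def assert_winner(candidates: Iterable[AssertMetric]) -> AssertMetric:
    """Return the candidate that wins the Assert election."""
    return min(candidates, key=AssertMetric.sort_key)


if __name__ == "__main__":
    a = AssertMetric(0, 110, 0xFFFF, "10.0.0.1")
    b = AssertMetric(0, 110, 0xFFFF, "10.0.0.2")
    print(assert_winner([a, b]).address)  # equal metrics: higher address wins
```

When all routers advertise the same metric, the outcome is decided by metric preference and, on a tie, by IP address.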
PIM-SM routers need to know the address of the RP for each group for which they have (*,G) state. This address is obtained either through a bootstrap mechanism or through static configuration. One dynamic way to do this is to use the Bootstrap Router (BSR) mechanism. One router in each PIM-SM domain is elected the Bootstrap Router through a simple election process. All the routers in the domain that are configured to be candidates to be RPs periodically unicast their candidacy to the BSR. From the candidates, the BSR picks an RP-set, and periodically announces this set in a Bootstrap message. Bootstrap messages are flooded hop-by-hop throughout the domain until all routers in the domain know the RP-Set. To map a group to an RP, a router hashes the group address into the RP-set using an order-preserving hash function (one that minimizes changes if the RP-Set changes). The resulting RP is the one that it uses as the RP for that group.
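The group-to-RP hash can be sketched as below, using the hash function given in RFC 4601 (section 4.7.2). This is a minimal sketch under stated assumptions: it covers only the final hash step and assumes all candidates already share the same group range and priority. The function names are hypothetical, and the default hash mask length of 30 is an assumption taken from the BSR specification's suggested IPv4 value.

```python
import ipaddress
from typing import Iterable


def hash_value(group: str, rp: str, mask_len: int = 30) -> int:
    """Value(G, M, C) = (1103515245 * ((1103515245 * (G & M) + 12345) XOR C) + 12345) mod 2^31."""
    g = int(ipaddress.IPv4Address(group))
    c = int(ipaddress.IPv4Address(rp))
    mask = (0xFFFFFFFF << (32 - mask_len)) & 0xFFFFFFFF
    inner = (1103515245 * (g & mask) + 12345) ^ c
    return (1103515245 * inner + 12345) % (1 << 31)


def select_rp(group: str, candidates: Iterable[str], mask_len: int = 30) -> str:
    """Pick the candidate with the highest hash value; the highest address breaks ties."""
    return max(
        candidates,
        key=lambda rp: (hash_value(group, rp, mask_len), int(ipaddress.IPv4Address(rp))),
    )


if __name__ == "__main__":
    rps = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    print(select_rp("239.1.1.1", rps))
```

Because every router computes the same function over the same RP-set, all routers agree on the RP for a group without further signalling. Removing a candidate that was not selected leaves the mapping of that group unchanged.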
Multicast and Wireless
How multicast (and broadcast) works over wireless depends on who is transmitting the multicast packet:
If the AP transmits a multicast packet, the packet is transmitted over the air with a multicast receiver address (and yes - only a single copy of the packet is transmitted no matter how many clients are registered). As there is no single recipient of the packet, it does not get acked - therefore delivery is not reliable (no retransmissions in case somebody does not receive the packet). Due to this unreliable delivery, the lowest basic rate is used to ensure as reliable delivery as possible. So even if you can send a unicast stream between AP and Station using 54Mbps air rate, multicasts are sent only using 6Mbps air rate (assuming that 6Mbps is the lowest basic rate).
If a Station transmits a multicast packet, it is transmitted over the air with the unicast AP receiver address (as Stations always transmit all packets to the AP, and always using a unicast receiver address). Due to this, delivery of multicast packets from Station to AP is reliable and subject to the normal rate selection process - max throughput can be used. What complicates the matter: when the AP receives a multicast packet from a Station, it processes it locally, but it also must send it back over the air so that the rest of the stations in the wireless network see it (think of the AP as a switch in ethernet, except that it does not have to send multiple copies of the packet for all clients to receive it). This causes the AP to execute the multicast transmission procedure as described above, therefore every multicast packet gets transmitted over the air twice - the first time by the Station (using its transmit rate) and the second time by the AP using the lowest basic rate. This behaviour can be altered by disabling the "forwarding" option for a particular Station on the AP (or for all clients using the "default-forwarding" option) - disabling forwarding causes the AP not to forward traffic between clients and also disables "sending back" multicasts and broadcasts received from a client, because that can be considered a special case of forwarding traffic between clients.
If multicast traffic is forwarded across a WDS link, it is transmitted over the air with a unicast receiver address (the remote end of the WDS link) and therefore is reliable and subject to normal rate selection.
From above we can draw some conclusions, how to increase wireless network throughput:
- Ensure that unicast receiver address is used when transmitting multicast
- Increase lowest basic rate if feasible.
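The effect of the lowest basic rate can be estimated with simple airtime arithmetic. The sketch below is a rough upper bound that ignores preambles, headers, acknowledgements and contention; the function name `effective_rate` is made up for this example. If every packet is sent once at each of several air rates, the per-bit airtimes add up, so the achievable rate is the reciprocal of the sum of reciprocals.

```python
def effective_rate(*air_rates_mbps: float) -> float:
    """Upper bound on throughput when each packet is transmitted once at every given air rate."""
    # Airtime per bit at rate r is 1/r; transmissions share the same channel, so times add.
    return 1.0 / sum(1.0 / r for r in air_rates_mbps)


if __name__ == "__main__":
    # Station sends at 54 Mbps, AP repeats at the 6 Mbps lowest basic rate: about 5.4 Mbps.
    print(round(effective_rate(54, 6), 2))
    # Raising the lowest basic rate to 24 Mbps roughly triples the bound.
    print(round(effective_rate(54, 24), 2))
```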
Some ideas on how to apply above conclusions:
- In case multicast traffic is to be delivered over point-to-point link (e.g. some backbone link), you should ensure by some means that multicasts are delivered over wireless using unicast receiver address. This can be achieved by using tunnels (as previously discussed) or by using WDS links (either AP-to-AP WDS or AP-to-StationWDS WDS link). If WDS links are used, care must be taken not to inject multicasts in regular wireless interface (to avoid regular AP multicast transmission procedure).
- In case WDS can not be used, network should be planned so that Station is transmitting multicasts and AP should have forwarding disabled.
- In case multicasts are to be delivered to multiple destinations in wireless network, it is best to organize network such that AP is transmitting multicasts (because AP will transmit just one copy) and increase lowest basic rate, if throughput is not enough.
Almost minimal setup where multicast routing is necessary:
- multicast sender (server);
- multicast receiver (client);
- two routers running PIM between them.
Multicast traffic in this example will be destined to address 18.104.22.168
Sender -- (subnet I) --> Router A -- (subnet II) --> Router B -- (subnet III) --> Receiver
Router A will be configured as Rendezvous Point.
Enable PIM and IGMP on router A:
[admin@A] > routing pim interface add
[admin@A] > routing pim interface p
Flags: X - disabled, I - inactive, D - dynamic, R - designated-router,
v1 - IGMPv1, v2 - IGMPv2, v3 - IGMPv3
 #      INTERFACE  PROTOCOLS
 0  v2  all        pim igmp
 1 DRv2 ether3     pim igmp
 2 DR   register   pim
Configure static Rendezvous Point:
[admin@A] > routing pim rp add address=<IP of router A>
You may also need to configure alternative-subnets on the upstream interface, in case the multicast sender address is in an IP subnet that is not directly reachable from the local router.
[admin@MikroTik] > routing pim interface set <upstream-interface> alternative-subnets=22.214.171.124/24,126.96.36.199/24
Enable PIM and IGMP on router B:
[admin@B] > routing pim interface add interface=ether1
[admin@B] > routing pim interface p
Flags: X - disabled, I - inactive, D - dynamic, R - designated-router,
v1 - IGMPv1, v2 - IGMPv2, v3 - IGMPv3
 #      INTERFACE  PROTOCOLS
 0 Rv2  ether1     pim igmp
 1 DR   register   pim
Configure static Rendezvous Point:
[admin@B] > routing pim rp add address=<IP of router A>
Add route on multicast sender:
# ip route add 188.8.131.52/32 via <IP of router A>
Start sender and receiver programs. You can either write simple programs yourself, or use any of these:
- MGEN is a program that can be used to send or receive multicast traffic, etc: http://cs.itd.nrl.navy.mil/work/mgen/index.php
- Alternatively, mtest is a bare-bones sender/receiver program available from: http://netweb.usc.edu/pim/pimd/
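If you prefer to write the sender yourself, the following is a minimal sketch. The function names, payload format and default values are assumptions; the group address and port are whatever you use in your setup. Note that on most systems the default multicast TTL is 1, so packets would be dropped by the first router unless the TTL is raised, which is why the sketch sets it explicitly.

```python
import socket
import time


def make_sender(ttl: int = 8) -> socket.socket:
    """Create a UDP socket whose multicast packets can cross up to `ttl` hops."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # A TTL of 1 would keep traffic on the local subnet; raise it to pass the routers.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


def send_stream(group: str, port: int, count: int = 10,
                interval: float = 1.0, ttl: int = 8) -> None:
    """Send `count` small test datagrams to the multicast group."""
    sock = make_sender(ttl)
    try:
        for seq in range(count):
            sock.sendto(f"multicast test {seq}".encode(), (group, port))
            time.sleep(interval)
    finally:
        sock.close()
```

A matching receiver would bind a UDP socket to the same port and join the group with the `IP_ADD_MEMBERSHIP` socket option before reading datagrams.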
And hey, it works! Client should receive data now.
- Route metric cannot be configured, 0xffff is always used instead (important for PIM asserts). Route distance (from FIB or static MRIB) is used as "metric preference" and can be only in range 0..255.
- Scope zones are not supported.
- It is unclear whether Linux kernel fully supports IGMPv3.
Q. Does MT support Source Specific Multicast (SSM)?
A. Yes, SSM is a part of PIM-SM specification and we support it.
Q. Is support for PIM-DM planned?
A. No, as PIM-SM performs well in almost every setup, both sparse and dense.
- XORP User Manual, chapters 13 - 16
- Multicast tutorial. Deals with multicast addressing, IGMP, PIM-SM / SSM, MSDP and MBGP
- RFC 2236: Internet Group Management Protocol, Version 2
- RFC 3376: Internet Group Management Protocol, Version 3
- RFC 4601: Protocol Independent Multicast - Sparse Mode (PIM-SM): Protocol Specification (Revised)
- RFC 5059: Bootstrap Router (BSR) Mechanism for PIM