NetworkPro on Quality of Service

Revision as of 14:03, 24 September 2012


Applies to RouterOS: v2.9, v3, v4, v5

Theory

Let's begin with some theory that I've accumulated over the years.
It should save your life when dealing with Bandwidth Management and QoS.

Please register in the MikroTik Wiki and help us out by editing and supplying articles.

TCP control for QoS

TCP rate control (wikipedia.org/wiki/Bandwidth_management) - artificially adjusting the TCP window size, as well as controlling the rate at which ACKs are returned to the sender.


Note: The information on the Wikipedia.org website can be inaccurate, biased, or too general - treat it as a starting point only.


RouterOS HTB can make a sender shrink its TCP window by dropping packets that exceed the set bandwidth limit.


Note: The TCP ACKs should never be dropped! (refer to the example at the end of the article)


More general information on TCP and the ACK packet: wikipedia.org/wiki/ACK_(TCP)
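As a minimal sketch of protecting ACKs in RouterOS (the interface name ADSL1 and the 666k rate are assumptions, adapted from the full example at the end of this article), small TCP ACK packets can be given their own mark and top priority so shaping never starves them:

```
# Hedged sketch - interface and rate are assumptions, not a definitive setup
/ip firewall mangle
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no \
    new-packet-mark=ACK_Up protocol=tcp tcp-flags=ack packet-size=0-123
/queue tree
add name=Total_Up parent=ADSL1 max-limit=666k
add name=ACKs packet-mark=ACK_Up parent=Total_Up priority=1
```

The packet-size=0-123 matcher catches bare ACKs while letting data-carrying packets fall through to the other marks.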

The TCP Window Over-generalised

A TCP window is the amount of data a sender can send on a particular TCP connection before it gets an acknowledgment (ACK packet) back from the receiver that it has gotten some of it.

For example, if a pair of hosts are talking over a TCP connection that has a TCP window size of 64 KB (kilobytes), the sender can only send 64 KB of data and then it must stop and wait for an acknowledgment from the receiver that some or all of the data has been received. If the receiver acknowledges that all the data has been received, then the sender is free to send another 64 KB. If the sender gets back an acknowledgment from the receiver that it received the first 32 KB (which could happen if the second 32 KB was still in transit, or if it got lost, dropped, or shaped), then the sender could only send another 32 KB, since it can't have more than 64 KB of unacknowledged data.

The primary reason for the window is congestion control. The whole network connection, which consists of the hosts at both ends, the routers in between and the actual physical connections themselves (be they fiber, copper, satellite or whatever), will have a bottleneck somewhere that can only handle data so fast. Unless the bottleneck is the sending host itself, transmitting too fast will overrun the bottleneck and data will be lost. The TCP window throttles the "transmission speed" down to a level where congestion and data loss do not occur.

Control is Outbound

Inbound traffic for the router is traffic that hits the router's interfaces, no matter from which side - Internet or local. The interface will receive it no matter what, even malformed packets, and you cannot do anything about that.
Outbound traffic for the router is traffic that goes out of the router's interfaces, in either direction - into your network or out of it. This is where you can set up queues, prioritize, and limit!

HTB allows us to work with the Outgoing traffic (the traffic that is leaving the router via any interface).

Example1: Client sends out 10Mbps UDP traffic - this traffic will get to the router's local interface, but in one of the HTBs (global-in, global-total, global-out or outgoing interface) it will be shaped to (let's say) 1Mbps. So only 1Mbps will leave the router. But in the next second the client can send 10Mbps once again, and we will shape it again.

Example2: Client sends out 10Mbps TCP traffic - this traffic will get to the router's local interface, but in one of the HTBs (global-in, global-total, global-out or outgoing interface) it will be shaped to (let's say) 1Mbps. So only 1Mbps will leave the router. The source gets ACK replies for only 1Mbps of the 10Mbps, so in the next second the source will send only a little more than 1Mbps of TCP traffic (due to the TCP window adjusting to our shaped bandwidth).

There are 4 ways we can look at a flow:
1) client upload that router receives on the local interface
2) client upload that router sends out to the Internet
3) client download that router receives on the public interface
4) client download that router sends out to the customer

1) and 3) - is Inbound traffic
2) and 4) - is Outbound traffic

HTB can control 1) when it is sent out as 2), and 3) when it is sent out as 4)
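As a hedged sketch (interface name, mark name and the 1M rate are assumptions): client upload arriving on the local interface (case 1) can only be shaped once the router treats it as outgoing, for example in global-in:

```
# Sketch - mark client upload in prerouting, shape it in global-in
/ip firewall mangle
add action=mark-packet chain=prerouting in-interface=Local passthrough=no \
    new-packet-mark=Client_Up
/queue tree
add name=Upload_Limit parent=global-in packet-mark=Client_Up max-limit=1M
```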


Note: Connections are not "upload" or "download" - individual packets are. A TCP connection, for example, carries traffic in both directions.


QoS Packet Flow & Double Control

QoS Packet Flow.gif

Working with packets for bandwidth management is done in this order:

1. Mangle chain prerouting
2. HTB global-in
3. Mangle chain forward
4. Mangle chain postrouting
5. HTB global-out
6. HTB out interface

So, in one router, you can do:

a) in #1+#2 - first marking & shaping, in #3+#5 - second marking & shaping
b) in #1+#2 - first marking & shaping, in #3+#6 - second marking & shaping
c) in #1+#2 - first marking & shaping, in #4+#5 - second marking & shaping
d) in #1+#2 - first marking & shaping, in #4+#6 - second marking & shaping
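A minimal sketch of variant a) (all mark names and rates are assumptions): a first mark in prerouting shaped in global-in, then a second mark in forward shaped in global-out:

```
# Sketch of double control - names and rates are assumptions
/ip firewall mangle
add action=mark-packet chain=prerouting passthrough=no new-packet-mark=Stage1
add action=mark-packet chain=forward passthrough=no new-packet-mark=Stage2
/queue tree
add name=Shape1 parent=global-in packet-mark=Stage1 max-limit=100M
add name=Shape2 parent=global-out packet-mark=Stage2 max-limit=10M
```

The packet keeps the Stage1 mark through global-in, is re-marked to Stage2 in forward, and is then caught again in global-out.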

In his presentation http://mum.mikrotik.com/presentations/CZ09/QoS_Megis.pdf
Janis Megis says that creating priorities separately for each client is suicide - there is no hardware that can handle a small queue tree for every user (if you have 1000 of them). So he offers the next best thing, which is as close as possible to the desired behavior.

The main Idea of the setup is to have two separate QoS steps:

1) In the first step we prioritize traffic, we are making sure that traffic with higher priority has more chance to get to the customers than traffic with lower priority.

Example: We have a total of 100Mbps available, but at this particular moment clients would like to receive 10Mbps of priority=1 traffic, 20Mbps of priority=4 and 150Mbps of priority=8. Of course, after our prioritization and limitation, 80Mbps of the priority=8 traffic will be dropped, and only 100Mbps total will get to the next step.

2) The next step is per-user limitation. We already have only the higher-priority traffic, but now we must make sure that no single user overuses it, so we have PCQ with limits.

This way we get virtually the same behavior as "per user prioritization".

So the plan is to mark by traffic type in prerouting and limit by traffic type in global-in. Then remark traffic by IP addresses in forward and limit in global-out.

1) you need to mark all traffic that will be managed by one particular queue in the same place (prerouting)
2) if you use global-total/in/out or Simple Queues, or if you use a Queue Tree and do not mark connections first, you must mark upload and download for every type of traffic separately
3) if you do not use a simple PCQ, you must have a parent queue that has a max-limit and (let's say) parent=global-in, with all other queues set to parent=<parent>
4) you need 2 sets of those queues - one for upload, one for download
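Putting the pieces together, a hedged sketch of the two-step plan (mark names, ports, rates and addresses are all assumptions): priority per traffic type in global-in, then a per-user PCQ limit in global-out:

```
# Sketch only - names, rates and the 192.168.0.0/24 subnet are assumptions
/queue type
add name=pcq-down-2M kind=pcq pcq-classifier=dst-address pcq-rate=2M
/ip firewall mangle
add action=mark-packet chain=prerouting protocol=udp dst-port=53 \
    passthrough=no new-packet-mark=DNS
add action=mark-packet chain=forward dst-address=192.168.0.0/24 \
    passthrough=no new-packet-mark=Users_Down
/queue tree
add name=Total parent=global-in max-limit=100M
add name=Prio1 parent=Total packet-mark=DNS priority=1
add name=PerUser parent=global-out packet-mark=Users_Down max-limit=100M \
    queue=pcq-down-2M
```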

Create packet marks in the mangle chain prerouting for traffic prioritization in the global-in queue.
Limitation for traffic marked in the mangle chain forward can be placed in the global-out or interface queue.

If the queues are placed in the interface queues:
queues on the public interface will capture only the clients' upload
queues on the local interface will capture only the clients' download

If the queues are placed in global-out:
download and upload will be limited together (separate marks needed)

Double Control is achieved with Queue Tree

HTB Queue Tree VS Simple Queue

A Queue Tree is a unidirectional queue in one of the HTBs. It is also the way to add a queue on a separate interface (parent=ether1/pppoe1...) - this way it is possible to simplify the mangle configuration: you don't need separate marks per outgoing interface.
It also makes double queuing possible, for example: prioritization of traffic in global-in or global-out, plus limitation per client on the outgoing interface.


Note: If you have Simple Queues and a Queue Tree in the same HTB, the Simple Queues will get the traffic first, so the Queue Tree placed after them will not work.


A Queue Tree is not ordered - all traffic passes through it together, whereas with Simple Queues traffic is evaluated by each queue, one by one, from top to bottom. If the traffic matches a Simple Queue, it is managed by it; otherwise it is passed on down.

Each simple queue creates 3 separate queues:
one in global-in (the "direct" part)
one in global-out (the "reverse" part)
one in global-total (the "total" part)
Simple queues are ordered - similar to firewall rules:
further down = longer packet processing
further down = smaller chance to get traffic
so it's necessary to reduce the number of queues

In the case of Simple Queues, the order is for 'catching traffic' (mangle) and the "priority" is the HTB feature.
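A short sketch of that ordering (addresses and rates are assumptions; on the RouterOS versions this article covers the parameter is target-addresses, while newer versions use target): the first Simple Queue that matches the traffic manages it, so the more specific queue must sit above the catch-all:

```
# Sketch - addresses and rates are assumptions
/queue simple
add name=VIP target-addresses=192.168.88.10/32 max-limit=5M/5M
add name=Everyone target-addresses=192.168.88.0/24 max-limit=2M/2M
```

If "Everyone" were first, the VIP host would never reach its own queue.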

http://wiki.mikrotik.com/wiki/Queue

Guaranteeing Bandwidth with the HTB "Priority" Feature

See the official article http://wiki.mikrotik.com/wiki/Manual:HTB

We already know that limit-at (CIR) is guaranteed - it will be given out to all queues no matter what.

Priority is responsible for distributing the parent queue's remaining traffic to the child queues, so that they are able to reach max-limit.


Note: Although this is not documented anywhere, my conclusion is that this "reaching of max-limit" gives time to each leaf in a round-robin manner, whereas tests show that reaching limit-at is a process that gives better performance to priority traffic.


Queue with higher priority will reach its max-limit before the queue with lower priority. 8 is the lowest priority, 1 is the highest.

Make a note that priority only works:

  • for leaf queues - priority in an inner queue has no meaning.
  • if max-limit is specified (not 0)

"Priority" feature (1..8): prioritizes one child queue over another child queue. It does not work on parent queues (i.e. on a queue that has at least one child). One is the highest, eight is the lowest priority. A child queue with higher priority will have a chance to reach its limit-at before a child with lower priority (confirmed), and after that a child queue with higher priority will have a chance to reach its max-limit before a child with lower priority.
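A hedged illustration (marks and rates are assumptions): both children are guaranteed their limit-at, and the parent's remaining bandwidth goes to the priority=1 child first:

```
# Sketch - rates and packet marks are assumptions
/queue tree
add name=Parent parent=global-out max-limit=10M
add name=VoIP parent=Parent packet-mark=voip limit-at=2M max-limit=10M priority=1
add name=Bulk parent=Parent packet-mark=bulk limit-at=2M max-limit=10M priority=8
```

With both leaves demanding 10M, VoIP should climb toward max-limit first while Bulk keeps only its 2M guarantee.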


Note: The "Priority" feature has nothing to do with the "bursts" HTB feature.


Guaranteeing Priority with the Limit-At setting

Priority traffic will have better performance within its limit-at than in between limit-at and max-limit. Although this is not documented anywhere, to me this currently means that a leaf with a higher "priority" setting is given more transmission time when the bandwidth it is trying to consume is within its limit-at.

A queue tree with all default-small leaf queues will degrade performance by 10ms on a leaf queue that carries more traffic than what is configured in its limit-at. If traffic is within this limit-at, performance is OK - no additional delays could be noticed. No information about this could be seen in any of the columns inside WinBox - Bytes, Packets, Dropped, Lends, Borrows, Queued Bytes, Queued Packets. Changing to Queue Type "default" (50 packets) did not increase the delay and did not change it noticeably.

Queue Type

Queue type applies only to Child queues. It doesn't matter what queue type you set for the parent. The parent is only used to set the max-limit and to group the leaf queues.

My tests show that PCQ (classified by dst-port, for example) gives better results than default and default-small, and SFQ does as well as the wireless default. So much so, in fact, that I would never use a FIFO queue again. I wonder why it is the default on Ethernet - I guess because FIFO uses as few resources as possible. P.S. use the new hardware-only queue on RouterBOARDs and miniPCI cards.
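Defining such a PCQ queue type and attaching it to a child queue might look like this (names, marks and rates are assumptions):

```
# Sketch - queue type classified by dst-port, names and rates are assumptions
/queue type
add name=pcq-per-port kind=pcq pcq-classifier=dst-port
/queue tree
add name=Downloads parent=global-out packet-mark=down max-limit=8M \
    queue=pcq-per-port
```

Note that the queue= parameter matters only on the leaf; the parent just groups the leaves and sets the max-limit.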


Note: PCQ is optimized in new versions of RouterOS. Please use the latest version. The upcoming v6 should have even more multicore optimizations and simple queue optimizations.


More information:
http://wiki.mikrotik.com/wiki/Queue#Queue_Types

Queue Size

http://wiki.mikrotik.com/wiki/Manual:Queue_Size

Lends VS Borrows

  • a Lend happens when a packet is sent within the queue's limit-at
  • a Borrow happens when a packet is sent between limit-at and max-limit (using bandwidth borrowed from the parent)

Adjusting the Max-Limit

Without setting the max-limit properly, the HTB Queue Tree will not drop enough low-priority packets, and bandwidth control will be lost. In order to have control, we must set the max-limit to a lower value - from 99.99% down to 85% of the tested throughput of the bottleneck.
Also, to keep control, our combined limit-ats must not be more than the max-limit.
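For example (all numbers and names are assumptions): on a line that tests at 10Mbps, set the parent to roughly 85% and keep the children's limit-ats inside it:

```
# Sketch - 10M tested bottleneck, parent at about 85% = 8500k
/queue tree
add name=Ceiling parent=ADSL1 max-limit=8500k
add name=A parent=Ceiling packet-mark=a limit-at=4M max-limit=8500k priority=1
add name=B parent=Ceiling packet-mark=b limit-at=4M max-limit=8500k priority=8
# the limit-at sum (8M) stays below the 8500k parent max-limit
```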

I know it bugs you to set only 80% of your available bandwidth in these fields but this is required. There will be a bottleneck somewhere in the system and QoS can only work if the bottleneck is in your router where it has some control. The goal is to force the bottleneck to be in your router as opposed to some random location out on the wire over which you have no control.

The situation can become confusing because most ISPs offer only "best effort" service, which means they don't actually guarantee any level of service to you. Fortunately there is usually a minimum level that you receive on a consistent basis, and you must set your QoS limits below this minimum. The problem is finding this minimum. For this reason, start with 80% of your measured speed and try things for a couple of days. If the performance is acceptable, you can start to inch your levels up. But I warn you that if you go even 5% higher than you should, your QoS will totally stop working (just too high) or randomly stop working (too high when your ISP is slow). This can lead to a lot of confusion on your part, so take my advice: get it working first by setting these speeds conservatively, and then optimize later.

QoS Example

Upload QoS for ADSL, tested and seems to work well enough. It can easily be adapted and upgraded for all needs - up/down, ADSL or dedicated fiber.

/ip firewall mangle
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no
    new-packet-mark=QoS_1_Up dst-port=80,443 packet-size=0-666 protocol=tcp tcp-flags=syn comment=QoS
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no
    new-packet-mark=QoS_1_Up dst-port=80,443 packet-size=0-123 protocol=tcp tcp-flags=ack
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no
    new-packet-mark=QoS_1_Up dst-port=53,123 protocol=udp
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no
    new-packet-mark=QoS_2_Up dst-port=80,443 connection-bytes=0-1000000 protocol=tcp
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no
    new-packet-mark=QoS_2_Up dst-port=110,995,143,993,25,20,21 packet-size=0-666 protocol=tcp tcp-flags=syn
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no
    new-packet-mark=QoS_2_Up dst-port=110,995,143,993,25,20,21 packet-size=0-123 protocol=tcp tcp-flags=ack
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no
    new-packet-mark=QoS_3_Up packet-size=0-666 protocol=tcp tcp-flags=syn
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no
    new-packet-mark=QoS_3_Up packet-size=0-123 protocol=tcp tcp-flags=ack
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no
    new-packet-mark=QoS_4_Up dst-port=110,995,143,993,25,20,21 protocol=tcp
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no
    new-packet-mark=QoS_4_Up dst-port=80,443 connection-bytes=1000000-0 protocol=tcp
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no
    new-packet-mark=QoS_8_Up p2p=all-p2p
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no
    new-packet-mark=QoS_2_Up src-port=8291 protocol=tcp comment=WinBox
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no
    new-packet-mark=QoS_7_Up


/queue tree
add max-limit=666K name=QoS_ADSL1_Up parent=ADSL1
add name=QoS_1 packet-mark=QoS_1_Up parent=QoS_ADSL1_Up priority=1
add name=QoS_2 packet-mark=QoS_2_Up parent=QoS_ADSL1_Up priority=2
add name=QoS_3 packet-mark=QoS_3_Up parent=QoS_ADSL1_Up priority=3
add name=QoS_7 packet-mark=QoS_7_Up parent=QoS_ADSL1_Up priority=7
add name=QoS_8 packet-mark=QoS_8_Up parent=QoS_ADSL1_Up priority=8
add name=QoS_4 packet-mark=QoS_4_Up parent=QoS_ADSL1_Up priority=4

Note: This mangle setup does not take into account any DSCP values of incoming traffic. To use DSCP values, they must be recognized and the packets marked with the matching packet-mark. This would help with VoIP applications, for example.



Note: This mangle setup will not catch most BitTorrent traffic. To catch it and put it in priority 8, you should use Layer7 matchers. Some torrent traffic will still go unrecognized, since BitTorrent Inc. hides their protocols better with each version. To improve overall QoS, consider using the latest torrent clients across your network, because of µTP.



Note: To optimize mangle rules for lower CPU consumption, you should try to mark connections first with connection-state=new, and then mark the packets of the marked connections. Also try making use of the passthrough option (see Manual:IP/Firewall/Mangle).
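A hedged sketch of that optimization (ports and mark names are assumptions): the expensive conditions are matched only once per connection, after which packets are marked cheaply by connection-mark:

```
# Sketch - ports and mark names are assumptions
/ip firewall mangle
add action=mark-connection chain=prerouting connection-state=new \
    protocol=tcp dst-port=80,443 passthrough=yes new-connection-mark=web
add action=mark-packet chain=prerouting connection-mark=web \
    passthrough=no new-packet-mark=web_pkt
```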



Note: This HTB Queue Tree should be upgraded to a structure like Example 3 in the HTB article. Otherwise, if left as shown, the leaves will be treated in a round-robin manner, which could degrade performance for priority traffic. Most notably, traffic from the lower priorities would not even get a chance and would be dropped. Still, this example works beautifully in production.



Note: To fix traceroute, ICMP needs to have a priority, for example 4.
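One possible way to do that (the mark name is an assumption) is to give ICMP its own priority-4 leaf in the upload tree from the example above:

```
# Sketch - mark name is an assumption; place the mangle rule above
# the catch-all QoS_7_Up rule, since that rule has passthrough=no
/ip firewall mangle
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no \
    new-packet-mark=ICMP_Up protocol=icmp
/queue tree
add name=QoS_ICMP packet-mark=ICMP_Up parent=QoS_ADSL1_Up priority=4
```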



Note: Recommended queue sizes are small.


Test Setups

To study packet queues, one can use a couple of RouterOS machines: one with the packet queues, and the other to generate traffic with Bandwidth Test. If a sniffer is needed, the built-in one can be used.

So far, testing with VMware has shown wrong results with TCP. I have achieved correct behaviour with live RouterBOARDs as well as with Oracle VM VirtualBox.

Bandwidth-Test-Server-for-QoS-test-Primer.gif

This screenshot shows the easiest test scenario.
First, addresses are assigned to ether1 - 192.168.56.55/24, and 192.168.56.56/32 through .59/32.
Then packets are marked in prerouting by their dst-address.
After that a Queue Tree is built as shown, with one leaf per packet mark.
So when generating traffic with the Bandwidth Test and ping tools towards the different IPs, you simulate your different priority traffic.
Then it is up to one's needs and imagination to experiment.
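The steps above can be sketched as follows (the addresses follow the screenshot; mark names and rates are assumptions):

```
# Sketch of the test setup - marks and rates are assumptions
/ip address
add interface=ether1 address=192.168.56.55/24
add interface=ether1 address=192.168.56.56/32
/ip firewall mangle
add action=mark-packet chain=prerouting dst-address=192.168.56.56 \
    passthrough=no new-packet-mark=test1
/queue tree
add name=TestParent parent=global-out max-limit=10M
add name=Test1 parent=TestParent packet-mark=test1 priority=1
```

Repeat the address, mangle rule and leaf for .57 through .59 to get one leaf per simulated priority.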



Example-test-setup.jpg

In the picture you can see an example laboratory setup. I made this diagram for my MSc thesis presentation.

Heads-up on Layer 2 devices, hidden buffers and performance degradation

Make sure Flow Control is turned off on everything that passes Ethernet frames - switches, ports, devices.
Avoid using QoS or priority on cheap switches due to limitations in the level of control over the queues on those devices.

More Information

Head-of-line blocking
http://wiki.mikrotik.com/wiki/Category:QoS
http://www.mikrotik.com/testdocs/ros/3.0/qos/queue.php
http://en.wikipedia.org/wiki/Quality_of_Service

Questions and Contradictions

If you have found a contradiction, or if you have a question, please ask in the forum: http://forum.mikrotik.com/