NetworkPro on Quality of Service

From MikroTik Wiki
Revision as of 13:35, 14 April 2013 by NetworkPro

Applies to RouterOS: v2.9, v3, v4, v5, v6

Relevant Information

Life-saving information for the quality of our Internet service and for providing honest services: bandwidth management, Quality of Service, and queue and drop best practices.

If some of this is not what you believe, please e-mail me and explain. I can reply in a simple and logical way that relieves any concerns, and I can correct the article with better wording where needed. Thank you. This wiki article is meant to help you understand what happens to the packets; it is not anywhere near scientific quality, so if you want to get into that, see the published works in the Links. Some details were found by testing (now easier thanks to the MikroTik Traffic Generator). Enjoy -NetworkPro

What is a Queue

A Queue is a line of packets waiting to be forwarded (see Wikipedia: Queue). It is pronounced KIU (Cyrillic кю).

Discard packets for QoS

Protocols such as TCP/IP have a back-off mechanism: when packets are lost and therefore not acknowledged by the receiver, the sender starts sending less data. From the point of view of Internet applications and protocols, packet loss is considered normal and informative.

RouterOS can drop packets that exceed the configured bandwidth limit, and it can choose which packets to drop according to the priority configuration. This way we have free capacity for priority packets exactly when we need it. Based on the configured max-limit, RouterOS knows exactly how much to drop so that the router forwards only the packets we want it to: all high-priority packets plus as many low-priority packets as fit in the bandwidth that remains under that max-limit.


Note: Small TCP ACKs should not be dropped. My example further down classifies them as high priority.

For sensitive applications, we can lower the probability of a lost or queued packet so significantly that we can offer the customer an SLA, a better experience, near real-time financial market data, online gaming, etc.

All this thanks to the way the Protocols are designed to handle packet loss.

The TCP Window Over-generalised

A TCP window is the amount of data a sender can send on a particular TCP connection before it gets an acknowledgment (ACK packet) back from the receiver confirming that some of it has arrived.

For example, if a pair of hosts are talking over a TCP connection that has a TCP window size of 64 KB (kilobytes), the sender can only send 64 KB of data and then it must stop and wait for an acknowledgment from the receiver that some or all of the data has been received. If the receiver acknowledges that all the data has been received, then the sender is free to send another 64 KB. If the sender gets back an acknowledgment that the receiver got the first 32 KB (which could happen because the second 32 KB is still in transit, or because it was lost, dropped or shaped), then the sender can only send another 32 KB, since it can't have more than 64 KB of unacknowledged data.
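Because at most one window can be unacknowledged per round trip, the window also caps the throughput of a single connection. A worked example, assuming a 64 KB window and a 50 ms round-trip time (RTT), both chosen only for illustration:

```latex
\text{Throughput}_{\max} \approx \frac{W}{\mathrm{RTT}}
  = \frac{65\,536 \times 8\ \text{bit}}{0.05\ \text{s}}
  \approx 10.5\ \text{Mbit/s}
```

So one connection with this window cannot exceed roughly 10.5 Mbit/s over that path, however fast the links are; when a shaper drops packets, the sender shrinks its window and the rate falls further.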

The primary reason for the window is congestion control. The whole network connection, which consists of the hosts at both ends, the routers in between and the actual physical connections themselves (be they fiber, copper, satellite or whatever), will have a bottleneck somewhere that can only handle data so fast. Unless the bottleneck is the sending speed of the transmitting host itself, transmitting too fast will overrun the bottleneck and data will be lost. The TCP window throttles the "transmission speed" down to a level where congestion and data loss do not occur.

TCP Sawtooth and TCP Synchronization

When data networks become congested and drop packets, TCP sessions that suffer packet loss reduce their window size to avoid congestion. Indiscriminate packet drops (statistically speaking) signal many transmitting endpoints at once to slow down, causing most of the senders to sharply decrease their transmission rates simultaneously. This phenomenon is known as TCP synchronization, and it leads to bandwidth fluctuations on congested links: senders reduce their rates together, then slowly increase them again until the link becomes congested, and the process repeats. The random early detection (RED) algorithm mitigates this by randomly dropping packets as queues fill up. The drop probability can be configured as a function of the queue size at any given time, so the more congestion, the more aggressive the drop profile. Randomly dropping traffic before an interface becomes congested signals end hosts to slow down, preventing an overloaded queue.
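RouterOS offers RED as a queue type (kind=red). A minimal sketch in RouterOS v6 syntax; the name red-example and the threshold values are illustrative assumptions, not tested recommendations:

```routeros
/queue type
# Hypothetical RED queue type; thresholds refer to the average queue size.
# Below red-min-threshold: no early drops.
# Between min and max: drop probability grows with the average queue size.
# Above red-max-threshold: all further packets are dropped.
# red-limit is the hard limit of the queue.
add name=red-example kind=red red-min-threshold=10 red-max-threshold=50 \
    red-limit=60 red-burst=20 red-avg-packet=1000
```

A queue tree entry would then use it with queue=red-example.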

Often we configure bursts so that TCP is limited in a way that delivers an average throughput that the customer observes as equal to or slightly more than what we advertise.

A plain hard limit would cause TCP back-off to reduce the throughput of the connection, so that the customer observes less data going through than what is set as the limit.

I think bursting damages other protocols in congested times, so I just set the limit high enough to keep customers happy.

QoS on ultra-fast Ethernet

Classic QoS theory will not help us tweak our MikroTik setup. Most of that theory is for specific switch fabrics and routing "planes", designed and (to some extent) standardized long ago to be very friendly to TDM/SDH/serial links and to half-duplex Ethernet slower than 10Mbps running on top of those.

The problems with the legacy technology are harmful encapsulation, low packets-per-second rates, variable delays and low capacities, not to mention sub-optimal (no QoS) implementations by providers and plain frame loss. The problems with their forwarding planes include expensive chips and no flexibility whatsoever, because most things are done in hardware, with chips optimised to be affordable rather than to deliver truly usable power for the huge amount of energy that the old equipment consumes.

Compare that with a RouterOS-based router, which works more like a computer: a powerful CPU/network CPU + RAM + bus + NIC, with unlimited flexibility and, nowadays, more processing power for very low energy consumption.

The current WAN is much faster and more reliable Ethernet running natively on a wavelength of a fiber-optic transmission system: full-duplex 100Mbps -> multiple 100Mbps -> 1Gbps -> multiple 1Gbps -> 10Gbps -> multiple 10Gbps -> 40Gbps -> 100Gbps and so on. This increases the quality of the best-effort behaviour of the WAN.

The carriers can give additional guarantees for packet-switched services through good practice: smart planning and upgrades, a lot of spare bandwidth, monitoring, and not allowing one client to overwhelm the network. The big-bandwidth routers with multiple 10Gbps ports supposedly have priority-aware forwarding done in hardware, though this feature is rarely used: best-effort forwarding happens so fast that the delay is sub-millisecond and is perceived to be optimal.

With MikroTik routers we can have very low latency and loss-free, high-quality forwarding without delay at 100Mbps or more. I recommend connecting the router to our provider and our clients over Gigabit Ethernet links, to make use of the lower latency even if it passes little traffic. Whether RouterOS does priority-aware forwarding, I haven't asked, and I probably would not get an answer from MikroTik. What we know is that it forwards without delays, so we can think of it as first-in-first-out without delays.

The router forwards packets as they come, with sub-millisecond processing delays, because today's CPUs/network CPUs, RAM and the links between them are so fast - it's a piece of engineering art. For x86, I like to use NICs integrated in the north bridge, or at least PCI-e ones.

Of course, we choose hardware that can deliver the number of packets per second that we need the router to pass without any processing/queueing delay. I personally over-provision router resources if the budget allows it. This way we avoid queueing for software reasons, and we avoid queueing delays altogether, including in the NIC.

If this technology introduces sub-millisecond latency for some packets compared to another router that supposedly has hardware-based priority-aware forwarding, we can ignore this insignificant difference the same way we ignore it in switches. We can think of it as Ethernet's operational latency: before a frame is serialised on the wire, it waits for the previous frame to go first - an extremely fast process that is even faster in 10GbE and data center distributed switch fabrics.

With our own QoS we make sure our priority packets do not get dropped and do not get queued for too long. For the upstream, we prevent sending too many packets and requesting too many packets (by dropping low-priority packets and signalling the protocols to slow down). We also rely on our provider to take care of that, for example if we mark packets with DSCP/ToS marks (classic QoS theory). Or, in most business cases, the bandwidth is sold to us as not shared - reserved even if we don't use it. All the traffic is high priority inside our provider's network, including across their interconnects with their partners. Whatever the case, we would prefer to know for certain exactly how much is sold to us, so that we can set that amount as max-limit.

When the traffic goes out to some other network, things are not so optimistic, though - this is what the saying "there are no guarantees on the Internet" is about. The exceptions are specific business WAN services. On the Internet we can test the bandwidth and latency up to key Internet Exchange points and the places where multiple WAN providers have POPs in key cities and data centers of the world. Whoever built those interconnections should be monitoring them and be aware of a problem before we tell them.

We also make sure we do not have operational delay or queueing in other places inside our router, such as software processing and NIC hardware queues. We do this by having a router so powerful that there is no way our clients can overwhelm it and decrease its performance.

We ensure performance by monitoring with SmokePing, The Dude and every other way we can - this is true for everyone's network, even the most expensive and big ones.

Sharing the Bandwidth - "up to xx Mbps service" with happy customers

We can have happy customers and be honest with them. If we tell them xx Mbps - then that better be what they see most of the time. We can do this thanks to RouterOS.

Of course, the "cleanest" way is to not have limits other than a control for QoS, in which case we would have two services: a highly shared service and, let's say, a business assured service.

Providers get creative here to maximise profits over time. Personally, if I have bandwidth, I let the customers use it all - there is no point in discarding rather than forwarding if there is sufficient capacity. Of course, when a business user starts using up their guaranteed bandwidth, home users' traffic will be discarded to make way for the customers paying real money for our premium services.

We can monitor and review weekly, monthly and yearly graphs so that we can predict how much we can share, selling affordable service to home customers while keeping our promise of what the speed will be most of the time.

This is what the saying "It's easier to sell Internet connectivity to a lot of customers rather than to just a few" is partly about.

Control is Outbound

Router Inbound traffic - traffic that is received by the router from any side. Any frames serialised by someone else on the wire and travelling towards the router.
Router Outbound traffic - traffic that goes out of a router's interface, towards the world or towards our clients. This is where we set up limits and manage the bandwidth.

RouterOS HTB allows us to work with outgoing traffic - the traffic that is leaving the router via any interface.

  • Example1: A client sends out 10Mbps of UDP traffic. This traffic will reach the router's local interface, but it will be shaped to (let's say) 1Mbps, so only 1Mbps will leave the router towards the Internet. In the next second the client can send 10Mbps again, and we will shape it again.
  • Example2: A client sends out 10Mbps of TCP traffic. This traffic will reach the router's local interface, but it will be shaped to (let's say) 1Mbps, so only 1Mbps will leave the router. The source gets ACK replies for only 1Mbps of the 10Mbps, so in the next second it will send only a little more than 1Mbps of TCP traffic. (TCP will back off.)

There are 4 ways we can look at traffic:
1) client's upload that the router receives on the local interface
2) client's upload that the router sends to the Internet
3) client's download that the router receives on the public interface
4) client's download that the router sends to the customer

1) and 3) are inbound traffic
2) and 4) are outbound traffic

We control 1) when it's sent out as 2), and 3) when it's sent out as 4).
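A minimal Queue Tree sketch of this idea. The interface names ether1-public and ether2-local and the packet marks client_up and client_down (assumed to be set earlier in mangle) are hypothetical; the 1M limit follows the examples above:

```routeros
# Hypothetical: ether1-public faces the Internet, ether2-local faces clients,
# and mangle has already set the packet marks client_up and client_down.
/queue tree
# 1) -> 2): client upload is shaped as it leaves the public interface.
add name=client-upload parent=ether1-public packet-mark=client_up max-limit=1M
# 3) -> 4): client download is shaped as it leaves the local interface.
add name=client-download parent=ether2-local packet-mark=client_down max-limit=1M
```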


Note: We think of Connections as "Both Upload and Download" since most connections have packets in both directions. We think of Packets as either "Download" or "Upload".

QoS Packet Flow & Single-WAN Dual-Step Control with RouterOS v5.21

[Figure: QoS Packet Flow]

With RouterOS v5.x and lower, when we have configuration for QoS, packets are processed in the router in the following order:

1. Mangle chain prerouting
2. HTB global-in
3. Mangle chain forward
4. Mangle chain postrouting
5. HTB global-out *
6. HTB out interface

So, in one router, we can do:

a) at 1 and 2 - first marking & shaping, at 3 and 5 - second marking & shaping *
b) at 1 and 2 - first marking & shaping, at 3 and 6 - second marking & shaping
c) at 1 and 2 - first marking & shaping, at 4 and 5 - second marking & shaping *
d) at 1 and 2 - first marking & shaping, at 4 and 6 - second marking & shaping


Note: * In case of SRC-NAT (masquerade), Global-Out will be aware of private client addresses, but Interface HTB will not. Full diagrams are in Manual:Packet_Flow. Since the v6 optimisation, Interface HTB is also aware of src-nat.
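A minimal sketch of option a), assuming RouterOS v5.x. The marks type_dns and user_10_up, the client address 192.168.1.10 and the limits are hypothetical. It covers one client's upload only; download would need matching dst-address rules:

```routeros
# RouterOS v5.x only (global-in/global-out no longer exist in v6).
# Option a): mark in prerouting (1) -> shape in global-in (2),
#            re-mark in forward (3) -> shape in global-out (5).
/ip firewall mangle
add chain=prerouting protocol=udp dst-port=53 action=mark-packet \
    new-packet-mark=type_dns passthrough=no
# The forward rule overwrites the prerouting mark for the second step.
add chain=forward src-address=192.168.1.10 action=mark-packet \
    new-packet-mark=user_10_up passthrough=no
/queue tree
add name=step1-dns parent=global-in packet-mark=type_dns max-limit=1M
add name=step2-user10-up parent=global-out packet-mark=user_10_up max-limit=2M
```

Because global-in (2) runs before the forward chain (3), the first queue still sees the prerouting mark, and global-out (5) sees the new one.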

With this presentation, Janis Megis helps us optimise the performance of our router and use Queue Tree with PCQ instead of a large number of Simple Queues.

Note: Simple Queue will be optimised in RouterOS v6

The presentation shows two separate QoS steps:

1) In the first step we prioritize packets by type

Example: We have total of 100Mbps available, but clients at this particular moment would like to receive 10Mbps of Priority=1 traffic, 20Mbps of Priority=4 and 150Mbps of Priority=8. Of course after our prioritization and limitation 80Mbps of priority=8 will be dropped. And only 100Mbps total will get to the next step.
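The arithmetic of this example, with priorities 1 and 4 served in full before priority 8:

```latex
\begin{aligned}
\text{demand} &= 10 + 20 + 150 = 180\ \text{Mbit/s} \\
\text{left for priority 8} &= 100 - (10 + 20) = 70\ \text{Mbit/s} \\
\text{dropped priority 8} &= 150 - 70 = 80\ \text{Mbit/s}
\end{aligned}
```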

2) Next step is per-user limitation, we already have only higher priority traffic, but now we must make sure that some user will not overuse it, so we have PCQ with limits.

This way we get virtually the same behavior as "per user prioritization".

So the plan is to mark by traffic type in prerouting and limit by traffic type in global-in. Then remark traffic by IP addresses in forward and limit in global-out.

1) we mark all traffic that will be managed by one particular queue at the same place (prerouting)
2) if we use global-total/in/out or Queue Simple, or if we use Queue Tree without marking connections first, we must mark upload and download for every type of traffic separately
3) if we do not use a simple PCQ, we must have a parent queue that has max-limit and (let's say) parent=global-in, with all other queues parent=<parent>
4) we need 2 sets of those queues - one for upload, one for download

Create packet marks in the mangle chain "prerouting" for traffic prioritization in the global-in queue.
Traffic marked in the mangle chain "forward" can be limited in the "global-out" queue or in an interface queue.

If the queues are placed in the interface queues:
queues on the public interface will capture only the client's upload
queues on the local interface will capture only the client's download

If the queues are placed in global-out:
download and upload will be limited together (separate marks needed)

Double Control is achieved with Queue Tree
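A minimal sketch of the two-step plan for the download direction only, assuming RouterOS v5.x; ether1-public, the marks down_prio1/down_prio8/down_users, the 100M total and the 10M per-user rate are hypothetical. Upload would need a mirrored set of marks and queues:

```routeros
/ip firewall mangle
# Step 1 (prerouting): mark download traffic by type.
add chain=prerouting in-interface=ether1-public protocol=udp src-port=53 \
    action=mark-packet new-packet-mark=down_prio1 passthrough=no
add chain=prerouting in-interface=ether1-public action=mark-packet \
    new-packet-mark=down_prio8 passthrough=no
# Step 2 (forward): re-mark the same packets for the per-user step.
add chain=forward in-interface=ether1-public action=mark-packet \
    new-packet-mark=down_users passthrough=no

/queue type
add name=pcq-down kind=pcq pcq-classifier=dst-address pcq-rate=10M

/queue tree
# Step 1 in global-in: prioritise by type under the total max-limit.
add name=down-total parent=global-in max-limit=100M
add name=down-prio1 parent=down-total packet-mark=down_prio1 priority=1 max-limit=100M
add name=down-prio8 parent=down-total packet-mark=down_prio8 priority=8 max-limit=100M
# Step 2 in global-out: per-user limit, one PCQ sub-queue per client IP.
add name=down-users parent=global-out packet-mark=down_users queue=pcq-down
```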


Note: If we have queueing in each PCQ sub-queue, this can be considered "bufferbloat" and could defeat QoS control for each customer when they use up the available bandwidth. To avoid this, we can move PCQ to the first step and/or not set a rate in the PCQ, plus, of course, use smaller queue sizes for the PCQ.
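A sketch of a PCQ type following this advice (no rate, small queues). The name pcq-down-small and the sizes are assumptions for illustration; in RouterOS v6 these limits are counted in packets:

```routeros
# Hypothetical PCQ type: no per-user rate (pcq-rate=0), small queues.
# pcq-limit applies to each sub-queue, pcq-total-limit to all sub-queues.
/queue type
add name=pcq-down-small kind=pcq pcq-classifier=dst-address pcq-rate=0 \
    pcq-limit=10 pcq-total-limit=500
```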

(needs editing) QoS Packet Flow in RouterOS v6

(needs editing) Multi-WAN Dual-Step Control with RouterOS v6

We want to control max-limit per WAN interface and max-limit (rate) per user.
To achieve this, we can mark in prerouting OR postrouting based on the connection mark plus a second criterion, so that we end up with a packet mark that applies only to packets that came into the router from a specific interface and belong to a specific user max-limit group.

First step is in Global and second step is in interface HTB. In each of the two steps we can implement sub-queues for traffic-type priority if we marked that separately.
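A minimal sketch for one WAN and one user group, download direction only, assuming RouterOS v6 and assuming (as described above) that the same packet mark is queued once in the global HTB and again in the interface HTB. All names and limits are hypothetical:

```routeros
# Hypothetical names: ether1-wan1 (WAN1), ether2-local (LAN),
# address list plan_10M (users on a 10M plan).
/ip firewall mangle
# Remember which WAN the connection's traffic arrives on.
add chain=prerouting in-interface=ether1-wan1 connection-mark=no-mark \
    action=mark-connection new-connection-mark=wan1_conn passthrough=yes
# Connection mark + second criterion (user group) -> one packet mark.
add chain=postrouting connection-mark=wan1_conn dst-address-list=plan_10M \
    action=mark-packet new-packet-mark=wan1_plan10M passthrough=no

/queue type
add name=pcq-10M kind=pcq pcq-classifier=dst-address pcq-rate=10M

/queue tree
# Step 1 (global HTB): max-limit for everything downloaded via WAN1.
add name=wan1-down parent=global max-limit=50M
add name=wan1-plan10M parent=wan1-down packet-mark=wan1_plan10M max-limit=50M
# Step 2 (interface HTB): 10M per user of this plan.
add name=plan10M-users parent=ether2-local packet-mark=wan1_plan10M queue=pcq-10M
```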

(needs editing) Dual-Purpose Control with RouterOS v6


In v6, compared to v5, there are no more separate Global-Total/Global-In/Global-Out HTBs.

Simple queue functionality remains the same, just much faster - we can still set limits for download, upload and total.

Currently in v6 we can have a dual QoS system by creating a small simple queue tree for each user.

/queue simple
# Parent queue for one user: fill in target= with the user's address.
add name=queue1 target= burst-limit=100M/100M burst-threshold=10M/10M burst-time=1m/1m limit-at=2M/2M max-limit=10M/10M
# Child queues, one per traffic type (packet mark), all under the parent's limits.
add limit-at=500k/500k max-limit=1M/1M packet-marks=priority1_mark parent=queue1 priority=1/1 target=ether1_local total-priority=1
add limit-at=500k/500k max-limit=1M/1M packet-marks=priority4_mark parent=queue1 priority=4/4 target=ether1_local total-priority=4
add limit-at=500k/500k max-limit=3M/3M packet-marks=priority2_mark parent=queue1 priority=2/2 target=ether1_local total-priority=2
add limit-at=500k/500k max-limit=5M/5M packet-marks=priority6_mark parent=queue1 priority=6/6 target=ether1_local total-priority=6

Since the number of simple queues no longer matters much, we can have thousands of them, and we can easily create them with a script for a large number of addresses, as the only thing that changes is the target address of the parent queue.

# One parent queue per client (192.168.1.1-100 and 192.168.2.1-100), each with four child queues.
:for i from=1 to=2 do={
  :for j from=1 to=100 do={
    /queue simple
    add target="192.168.$i.$j/32" name=("client".($i*100+$j)) burst-limit=100M/100M burst-threshold=10M/10M burst-time=1m/1m limit-at=2M/2M max-limit=10M/10M
    add limit-at=500k/500k max-limit=1M/1M packet-marks=priority1_mark parent=("client".($i*100+$j)) priority=1/1 target=ether1_local total-priority=1
    add limit-at=500k/500k max-limit=1M/1M packet-marks=priority4_mark parent=("client".($i*100+$j)) priority=4/4 target=ether1_local total-priority=4
    add limit-at=500k/500k max-limit=3M/3M packet-marks=priority2_mark parent=("client".($i*100+$j)) priority=2/2 target=ether1_local total-priority=2
    add limit-at=500k/500k max-limit=5M/5M packet-marks=priority6_mark parent=("client".($i*100+$j)) priority=6/6 target=ether1_local total-priority=6
  }
}

This way we achieve Dual-Purpose control: max-limit per user + QoS per user. The example also includes burst. The example lacks the mangle part (the rules that set priority1_mark, priority2_mark, priority4_mark and priority6_mark).
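A rough estimate of how long the burst in this example lasts, assuming the client was idle for at least one full burst-time beforehand and can actually reach burst-limit. RouterOS averages the rate over burst-time, so the burst ends roughly when that average reaches burst-threshold:

```latex
\begin{aligned}
\bar{r}(t) &\approx \frac{\text{burst-limit} \cdot t}{\text{burst-time}} \\
\bar{r}(t_{\max}) = \text{burst-threshold}
  \;&\Rightarrow\; t_{\max} \approx \frac{\text{burst-threshold} \cdot \text{burst-time}}{\text{burst-limit}}
  = \frac{10\ \text{M} \cdot 60\ \text{s}}{100\ \text{M}} = 6\ \text{s}
\end{aligned}
```

So a busy client gets about 6 seconds at up to 100M and then falls back to max-limit=10M.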

(given to MikroTik as proposition) Multi-WAN Multi-Peering Multi-Purpose Control with RouterOS v6

NEED: Each WAN may have country peering + international Internet. We would like these to be under one max-limit so that we give a higher chance to the international traffic to increase the quality of that while still having some limit-at for the country peering.

On each WAN: marked packets that go into two or more HTBs that would ultimately give us:
- (1) overall priority for gaming, VoIP
- (2) overall priority for international without choking the local peering (international_limit-at=80% priority=1 local_limit-at=20% priority=2)
- (3) per-user max-limit (PCQ rate) while keeping (1) and (2) valid, even if a user downloads torrents/floods his line
- (4) per-price user groups - for example "1/High SLA fully guaranteed"; "2/business and premium"; "3/shared fast" and "4/shared affordable"
- (5) possibility to not have gaming priority for service package 4 shared affordable
- (6) all this to work in high-pps high-bandwidth environments with for example 25 000 separate customers

HTB Queue Tree VS Simple Queue

Queue Tree is a unidirectional queue in one of the HTBs. It is also the way to add a queue on a separate interface (parent=ether1/pppoe1...) - this simplifies the mangle configuration, since we don't need separate marks per outgoing interface.
It is also possible to have double queuing, for example: prioritization of traffic in global-in or global-out, and limitation per client on the outgoing interface.


Note: If we have simple queues and queue tree in the same HTB, simple queues will get the traffic first, so the Queue Tree after that will not work.

Queue Tree is not ordered - all traffic passes through it together, whereas with Queue Simple in RouterOS v5 and below, traffic is evaluated by each simple queue, one by one, from top to bottom. If the traffic matches a simple queue, it's managed by it; otherwise it's passed down. In RouterOS v6 and up, Queue Simple is much more optimised and even recommended in some cases.

In RouterOS v5 and below, each simple queue creates 3 separate queues:
One in global-in ("direct" part)
One in global-out ("reverse" part)
One in global-total ("total" part)
Simple queues are ordered - similar to firewall rules:
further down = longer packet processing
further down = smaller chance to get traffic
so it's necessary to reduce the number of queues or, better, upgrade to RouterOS v6/latest

In the case of Simple Queues, the order is for 'catching traffic' (mangle) and the "priority" is the HTB feature.

Guaranteeing Bandwidth with the HTB "Priority" Feature

See the official article

We already know that limit-at (CIR) to all queues will be given out no matter what.

Priority is responsible for distributing the parent queue's remaining traffic to child queues so that they are able to reach max-limit.

A queue with higher priority will reach its max-limit before a queue with lower priority. 8 is the lowest priority, 1 is the highest.

Make a note that priority only works:

  • for leaf queues - priority in an inner queue has no meaning
  • if max-limit is specified (not 0)

"Priority" feature (1..8): prioritize one child queue over another child queue. It does not work on parent queues (a queue with at least one child). One is the highest, eight is the lowest priority. A child queue with higher priority will have a chance to reach its limit-at before a child with lower priority (confirmed), and after that the child queue with higher priority will have a chance to reach its max-limit before a child with lower priority.


Note: The "Priority" feature has nothing to do with the "bursts" HTB feature.
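A sketch of how limit-at and priority combine, in RouterOS v6 syntax (parent=global); the queue names and the marks mark_A and mark_B are hypothetical:

```routeros
/queue tree
add name=parent-10M parent=global max-limit=10M
add name=child-A parent=parent-10M packet-mark=mark_A limit-at=2M max-limit=10M priority=1
add name=child-B parent=parent-10M packet-mark=mark_B limit-at=3M max-limit=10M priority=8
# If both children try to use 10M at the same time:
#   1) each child first gets its limit-at: 2M + 3M = 5M;
#   2) the remaining 5M goes to child-A (priority 1) up to its max-limit;
#   => roughly child-A = 7M, child-B = 3M.
```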

Guaranteeing Priority with the Limit-At setting

Having Limit-At set, the packets would not be delayed in a packet queue (over-generalised explanation). This way we can consider this to be as much as "Low Latency Queueing" as possible. LLQ is not needed and does not apply to such a setup.

Queue Type, Size and Delay

Queue type applies only to Child queues. It doesn't matter what queue type we set for the parent. The parent is only used to set the max-limit and to group the leaf queues.

My tests show that PCQ by dst-port, for example, gives better results than default and default-small, and so does SFQ (the wireless default). So much so, in fact, that I would never use a FIFO queue again. I wonder why it's the default on Ethernet; I guess it's because FIFO uses as few resources as possible.

Please use the new hardware-only queue on RouterBOARDs and miniPCI cards. I use it a lot, and for wired interfaces on RouterBOARDs it is now the default.
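A sketch of the queue types mentioned above, in RouterOS v6 syntax; the type names and ether1 are placeholders:

```routeros
/queue type
# PCQ with one sub-queue per destination port (pcq-rate left at its default).
add name=pcq-dst-port kind=pcq pcq-classifier=dst-port
# An SFQ type, the kind used by the wireless default.
add name=sfq-example kind=sfq
/queue interface
# Hardware-only queue on a RouterBOARD port; ether1 is a placeholder.
set [find interface=ether1] queue=only-hardware-queue
```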

We avoid queueing anywhere in the router and NIC by queueing and dropping in our own QoS Queue Tree.

We can use a large queue only for traffic that we are sure can handle delay such as video on demand streaming.

We know that we can benefit from "max up to 5ms queueing", as per Van Jacobson's latest publication / the CoDel project.
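To turn a 5 ms target into a fixed queue size in packets, a rough conversion, assuming full-size 1500-byte packets and a queue draining at 100 Mbit/s:

```latex
N \approx \frac{R \cdot T}{8 \cdot S}
  = \frac{100 \times 10^{6}\ \text{bit/s} \cdot 0.005\ \text{s}}{8 \cdot 1500\ \text{byte}}
  \approx 42\ \text{packets}
```

At slow rates the numbers shrink fast: at 666 kbit/s a single 1500-byte packet already takes about 18 ms to serialise, which is why small queue sizes matter most on ADSL uploads.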


Note: PCQ is optimized in new versions of RouterOS. Please use the latest version. The upcoming v6 should have even more multicore optimizations and simple queue optimizations.

More information:

Lends VS Borrows

  • a Lend happens when the packet is treated as limit-at
  • a Borrow happens when the packet is between limit-at and max-limit

Adjusting the Max-Limit and ensuring we have control

Without setting the max-limit properly, the HTB Queue Tree will not drop enough low-priority packets, so the bandwidth control would be lost. In order to have control, we must set the max-limit to a lower value: 99.99% to 85% of the tested throughput of the bottleneck.
Also, to keep control, our limit-ats combined must not be more than the max-limit.

I know it is annoying to set only 80% of our available bandwidth in these fields, but this is required. There will be a bottleneck somewhere in the system, and QoS can only work if the bottleneck is in our router, where it has some control. The goal is to force the bottleneck to be in our router as opposed to some random location out on the wire over which we have no control.

The situation can become confusing because most ISPs offer only "best effort" service, which means they don't actually guarantee any level of service to us. Fortunately, there is usually a minimum level that we receive on a consistent basis, and we must set our QoS limits below this minimum. The problem is finding this minimum. For this reason, start with 80% of the measured speed and try things for a couple of days. If the performance is acceptable, we can start to inch our levels up. But if we go even 5% higher than we should, our QoS will totally stop working (just too high) or randomly stop working (too high when our ISP is slow). This can lead to a lot of confusion, so we get it working first by setting these speeds conservatively and then optimize later.

If a packet is not marked to enter any of our queues, it is sent out to the interface directly, bypassing the HTB, so no HTB rules are applied to it (this effectively gives it higher priority than any packet flow managed by the HTB).
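A worked example, assuming a hypothetical measured throughput of 10 Mbit/s at the bottleneck:

```latex
\begin{aligned}
\text{max-limit}_{\text{start}} &= 0.80 \times 10\ \text{Mbit/s} = 8\ \text{Mbit/s} \\
\textstyle\sum_i \text{limit-at}_i &\le \text{max-limit}
  \quad\text{e.g. } 2 + 2 + 3 = 7\ \text{Mbit/s} \le 8\ \text{Mbit/s}
\end{aligned}
```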


Note: All packets need to go through our Queue with our max-limit setting in order for the control to be precise.

What happens upstream?

Sometimes the ISP/telco has variable latency. They probably have "Classic QoS" configured in their legacy network. We can ask them how to mark packets for low latency and set the DSCP/ToS mark on outgoing priority packets accordingly. This was observed with Deutsche Telekom in (sadly) late 2012, but at least they provided the information on which ToS mark should be used.

Whoever owns the Internet Exchanges where packet loss and delay may occur may not feel responsible for it.

If we buy end-to-end WAN services like VPN, MPLS etc., then all the operators underneath our service do feel responsible for providing certain guarantees.
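A sketch of setting a DSCP value on outgoing priority packets. The phone address 192.168.88.20 is hypothetical, ADSL1 follows the example script, and DSCP 46 (EF) is only an assumption; use the value the provider specifies:

```routeros
# Hypothetical: 192.168.88.20 is a VoIP phone; DSCP 46 (EF) is an assumption.
# passthrough=yes lets later mangle rules (e.g. packet marking) still see the
# packet; keep this rule above any rule that uses passthrough=no.
/ip firewall mangle
add action=change-dscp chain=postrouting out-interface=ADSL1 passthrough=yes \
    new-dscp=46 src-address=192.168.88.20
```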

Example QoS script v0.1 alpha

Upload QoS for ADSL. Can be adapted and upgraded for all needs - up/down, ADSL or dedicated optic. If we want a working version/custom QoS script for our purpose - a consultant can provide such ($). In this state the main benefit is ADSL modem bufferbloat avoidance.

/ip firewall mangle
# Priority 1: TCP handshakes and small ACKs of web traffic, DNS and NTP.
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no \
    new-packet-mark=QoS_1_Up dst-port=80,443 packet-size=0-666 protocol=tcp tcp-flags=syn comment=QoS
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no \
    new-packet-mark=QoS_1_Up dst-port=80,443 packet-size=0-123 protocol=tcp tcp-flags=ack
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no \
    new-packet-mark=QoS_1_Up dst-port=53,123 protocol=udp
# Priority 2: the first ~1MB of web connections, mail/FTP handshakes and small ACKs.
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no \
    new-packet-mark=QoS_2_Up dst-port=80,443 connection-bytes=0-1000000 protocol=tcp
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no \
    new-packet-mark=QoS_2_Up dst-port=110,995,143,993,25,20,21 packet-size=0-666 protocol=tcp tcp-flags=syn
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no \
    new-packet-mark=QoS_2_Up dst-port=110,995,143,993,25,20,21 packet-size=0-123 protocol=tcp tcp-flags=ack
# Priority 3: handshakes and small ACKs of any other TCP traffic.
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no \
    new-packet-mark=QoS_3_Up packet-size=0-666 protocol=tcp tcp-flags=syn
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no \
    new-packet-mark=QoS_3_Up packet-size=0-123 protocol=tcp tcp-flags=ack
# Priority 4: bulk mail/FTP, and web connections beyond ~1MB.
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no \
    new-packet-mark=QoS_4_Up dst-port=110,995,143,993,25,20,21 protocol=tcp
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no \
    new-packet-mark=QoS_4_Up dst-port=80,443 connection-bytes=1000000-0 protocol=tcp
# Priority 8: P2P traffic recognised by the p2p matcher.
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no \
    new-packet-mark=QoS_8_Up p2p=all-p2p
# Priority 2: WinBox management traffic from the router (TCP port 8291).
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no \
    new-packet-mark=QoS_2_Up protocol=tcp src-port=8291 comment=WinBox
# Priority 7: everything not matched above, so that all packets pass through the queue tree.
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no \
    new-packet-mark=QoS_7_Up

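/queue tree
# Parent: max-limit set just below the tested ADSL upload throughput, so the bottleneck is here, not in the modem.
add max-limit=666K name=QoS_ADSL1_Up parent=ADSL1
# Leaf queues, highest priority first; each has max-limit set so that priority takes effect.
add max-limit=666K name=QoS_1 packet-mark=QoS_1_Up parent=QoS_ADSL1_Up priority=1
add max-limit=666K name=QoS_2 packet-mark=QoS_2_Up parent=QoS_ADSL1_Up priority=2
add max-limit=666K name=QoS_3 packet-mark=QoS_3_Up parent=QoS_ADSL1_Up priority=3
add max-limit=666K name=QoS_4 packet-mark=QoS_4_Up parent=QoS_ADSL1_Up priority=4
add max-limit=666K name=QoS_7 packet-mark=QoS_7_Up parent=QoS_ADSL1_Up priority=7
add max-limit=666K name=QoS_8 packet-mark=QoS_8_Up parent=QoS_ADSL1_Up priority=8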

Note: This mangle setup does not take into account any DSCP values of incoming traffic. To use the DSCP values, they must be recognised and the packets marked with the corresponding packet-mark. This would help with VoIP applications, for example.
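A sketch of such a rule for the example script. DSCP 46 (EF, commonly used for voice) is an assumption; match whatever values your VoIP equipment actually sets:

```routeros
# Hypothetical addition: packets already carrying DSCP 46 go to priority 1.
# Move it to the top of the list so it is evaluated before the other rules.
/ip firewall mangle
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no \
    new-packet-mark=QoS_1_Up dscp=46
```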


Note: This mangle setup will not catch most BitTorrent traffic. To catch that and put it in priority 8, we can try to use Layer 7 matchers. Some torrent traffic will still be unrecognised, since BitTorrent Inc. takes care to hide their protocols better with each version. To improve overall QoS, consider using the latest versions of BitTorrent Inc. torrent clients across our network, because of µTP.


Note: To optimize mangle rules for less CPU consumption, we should try to mark connections first (with connection-state=new), then mark the packets of the marked connections. Also try using the passthrough option; see Manual:IP/Firewall/Mangle.
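A sketch of this pattern applied to the DNS/NTP rule of the script; the connection mark name conn_QoS_1 is hypothetical:

```routeros
/ip firewall mangle
# 1) Only the first packet of each connection is classified.
add action=mark-connection chain=postrouting out-interface=ADSL1 passthrough=yes \
    new-connection-mark=conn_QoS_1 connection-state=new protocol=udp dst-port=53,123
# 2) Every later packet matches on the simpler connection-mark check.
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no \
    new-packet-mark=QoS_1_Up connection-mark=conn_QoS_1
```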


Note: This HTB Queue Tree should be upgraded to a structure like Example 3 in the HTB article. Traffic from the lower priorities may not have a chance at times.


Note: To observe proper traceroute timing - ICMP needs to have priority, for example 4. Packets originating from the router itself, such as ICMP ttl messages during traceroute, may have software processing delay slightly different than packets routed through the router.
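A sketch of such a rule for the example script, giving ICMP priority 4:

```routeros
# Hypothetical addition: ICMP to priority 4. Place it above any catch-all
# rule that uses passthrough=no, or it will never be reached.
/ip firewall mangle
add action=mark-packet chain=postrouting out-interface=ADSL1 passthrough=no \
    new-packet-mark=QoS_4_Up protocol=icmp
```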


Note: Recommended queue sizes are small.

Testing, Verification and Analysis

Often we rely on monitoring systems that use ICMP, TCP and UDP (IP SLA monitoring)

We also do manual tests with software such as RouterOS Bandwidth Test + RouterOS fast ping; iperf + linux fast ping; iperf + Windows hrping and fast-ping. Also try Mark's PsPing

Don't forget to try out the RouterOS Traffic Generator as well.
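A sketch of a manual test from RouterOS. The address 10.0.0.2 is hypothetical: a test router on the far side of the router under test, with the bandwidth server enabled. Run the commands on a second test router on the near side, so the traffic is forwarded through the router under test:

```routeros
# On 10.0.0.2 beforehand: /tool bandwidth-server set enabled=yes
/tool bandwidth-test address=10.0.0.2 protocol=tcp direction=both duration=30s
# In a second terminal, at the same time: a fast ping over the same path.
/ping 10.0.0.2 interval=10ms count=1000
```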

Graphical Representation

We can plot the output of ping and iperf.

Packet captures can be analysed by various software to produce plots, charts, "reports" and pretty graphics.

Lab example: Performance Testing with Traffic Generator

To study packet queues, one can use a couple of RouterOS machines: one with the packet queues and the other generating traffic with the RouterOS Traffic Generator.

So far, testing with VMware has shown wrong results with TCP. I have achieved the expected behaviour with real RouterBOARDs. The reason is that virtual environments add delays and queueing of packets in some cases. Please use native hardware.

Please test traffic that is being forwarded through the router rather than traffic that is destined to the router itself.

Heads-up on Layer 2 devices, hidden buffers and performance degradation

Make sure Flow Control is turned off on everything that passes Ethernet frames - switches, ports, devices.
Avoid using QoS or priority on cheap switches due to limitations in the level of control over the queues on those devices. Make sure to avoid Bufferbloat - do not overwhelm outgoing interfaces' buffers - use proper QoS upstream to help TCP/IP limit itself before the buffering occurs.
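A sketch of turning flow control off on a RouterOS port; ether1 is a placeholder, and switches need the equivalent setting in their own configuration:

```routeros
# Turn off Ethernet flow control (pause frames) on a RouterOS port.
# Repeat for every port that passes traffic.
/interface ethernet
set [find default-name=ether1] rx-flow-control=off tx-flow-control=off
```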

LINKS / More Information (in the CoDel material, search for "So CoDel is" and read from there onward)

Questions and Discussions

If you have any questions, we can discuss them in the forum.

If you want me to do the QoS Setup for a commercial project - please contact me.

Future Research and Development

SDN/ OpenFlow / Bandwidth Broker centralised controller approach


(To Be) Complete Collection of Published Books (not by Cisco)

Classic QoS information by Cisco

Classic QoS - ADSL, SHDSL, SDH, ATM, Frame Relay, serial async., in hardware

Legality, Ethics, Governance