[afnog] MTU Size for transit links for ISPs

Perreau, Luc Luc.Perreau at cwseychelles.com
Fri Sep 20 12:12:35 UTC 2013


Nishal, thanks for this very comprehensive explanation. It makes sense that the issue may not be related to MTU. What we noticed, though, is that when we removed flow control on the MUX, it improved for a while.

I will keep checking for fragmented packets though.

Mohamed and Johan, I'm still waiting for the upstream to give me their MTU setting anyway.

Will keep everyone posted about how this turns out.

Cheers.

Luc Perreau


-----Original Message-----
From: nishal at controlfreak.co.za [mailto:nishal at controlfreak.co.za] On Behalf Of Nishal Goburdhan
Sent: Friday, September 20, 2013 3:56 PM
To: Perreau, Luc
Cc: afnog at afnog.org
Subject: Re: [afnog] MTU Size for transit links for ISPs

On 20 Sep 2013, at 9:11 AM, "Perreau, Luc" <Luc.Perreau at cwseychelles.com> wrote:

> Hi all,
>  
> We have been working with POS interfaces for a while and with no issues at all. Our MTU size is always set to 4500.
>  
> However, we have recently installed some Ethernet links with default MTU size of 1500. These are STM1 size links and we found that there is degraded performance on browsing and download, but P2P torrent traffic is not affected.

luc,

it's unlikely (though, not impossible) that the change to MTU=1500 on your transit interface is causing your problems.

although you've had POS interfaces with MTU set above 1500, it's likely that the end hosts that you're communicating with - for example, a web/ftp server out on the internet, hosted on a typical ethernet network in a typical data-centre - only supported an MTU of 1500.
people inside your network, on typical ethernet networks where the MTU=1500, would likewise have been communicating *through* your high-mtu link using 1500-byte packets [*] - because that's the most that they could get across *their* lan (forget about your high-mtu POS link...)
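(to make that concrete - this is just back-of-the-envelope arithmetic, not something pulled from your traffic - two hosts on 1500-byte ethernet segments will typically negotiate a TCP MSS of 1500 - 20 (IP header) - 20 (TCP header) = 1460 bytes, so nothing bigger than 1500 bytes was crossing the 4500-MTU POS link in the middle anyway.)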

practically, the MTU is really the largest unit that can get transferred across the *path* of the communications channel.  not just across one component of it.  
for the 4500 mtu to have been practically useful to your network, end systems/users would have needed to know that a large mtu was available along the entire path.
so, unless you had that 4500 MTU (or something else reasonably large) available all the way down to end-systems, it's unlikely that the change you mention would affect performance.

with MTU, you actually care about the *smallest* MTU along a path;  that's the most significantly limiting part of the equation...
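a quick way to see this per-hop, if you have a linux host handy with iputils installed, is tracepath.  the target below is just the test IP from my example further down - substitute your own:

$ tracepath -n 196.36.80.177

watch the "pmtu" value it prints as it walks the path; the smallest value it settles on is your effective path MTU.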


> What is the best way forward to confirm that it is an MTU issue, and if so, what is the best MTU size to set? Jumbo (9000) or mutually agreed size with the transit provider?


you can generally find the mtu along your transmission path by sending packets of variable sizes using the "ping -s" option.  
you should tell the devices in between not to fragment your packet; so set your -D option as well.   
here's a quick example:
[katala:~] # ping -g 1440 -h 10 -G 1510 -D 196.36.80.177
PING 196.36.80.177 (196.36.80.177): (1440 ... 1510) data bytes
1448 bytes from 196.36.80.177: icmp_seq=0 ttl=255 time=3.740 ms
1458 bytes from 196.36.80.177: icmp_seq=1 ttl=255 time=2.267 ms
1468 bytes from 196.36.80.177: icmp_seq=2 ttl=255 time=2.367 ms
1478 bytes from 196.36.80.177: icmp_seq=3 ttl=255 time=4.783 ms
ping: sendto: Message too long
ping: sendto: Message too long
Request timeout for icmp_seq 4
ping: sendto: Message too long
Request timeout for icmp_seq 5
ping: sendto: Message too long
Request timeout for icmp_seq 6
--- 196.36.80.177 ping statistics ---
8 packets transmitted, 4 packets received, 50.0% packet loss
round-trip min/avg/max/stddev = 2.267/3.289/4.783/1.040 ms
[katala:~] # 

...so you can guess that the MTU between my test host and the remote IP lies somewhere between 1478 and 1488.  
you can read more on this process here:  http://en.wikipedia.org/wiki/Path_MTU_Discovery
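if your test host runs linux rather than BSD/OS X, the ping flags differ: -M do sets the don't-fragment bit and -s sets the icmp payload size.  a rough equivalent of the sweep above (same target, sizes purely illustrative, assuming iputils ping) would be something like:

$ for size in 1440 1450 1460 1470 1480 1490 1500 1510; do
    ping -M do -c 1 -W 1 -s $size 196.36.80.177 >/dev/null 2>&1 \
      && echo "$size: ok" || echo "$size: too big (or lost)"
  done

remember that -s is the icmp payload, so add 28 bytes (20 for the IP header, 8 for the ICMP header) to get the size of the packet actually on the wire.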

in practice, you want the MTU across your backbone links to *not* be smaller than what is available to your users across their last mile.  so, for example, if your users are on ethernet networks where MTU=1500, then the MTU inside your core should not be less than 1500.  if it is, then user data starts to get fragmented, and that causes issues... 
also, if you're running other interesting things inside your core, like vlans, mpls, etc. you'd *definitely* need an MTU higher than 1500.
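as a rough sizing sketch (the exact numbers, and how your vendor counts them against "MTU", depend on your own encapsulation stack, so treat this as illustrative): a 1500-byte customer packet picks up 4 bytes for an 802.1Q vlan tag and 4 bytes per MPLS label, so with one tag and two labels you're already at 1500 + 4 + 4 + 4 = 1512 bytes before it hits the core link.  which is why many operators just set core-facing interfaces to something comfortably large - 9000 or more, if the hardware supports it - and stop thinking about it.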

there's no "magic number" per se, other than, try your best to avoid fragmentation, because that will hurt your network.
so, look at your router interfaces carefully, and if you can, through tools like netflow, to see if you have lots of fragments.
if you do, that's a place to start to examine your infrastructure.  
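if you can get onto a box in the traffic path (or hang off a mirror port), a quick-and-dirty way to spot fragments is the classic tcpdump filter below - eth0 is just a placeholder for whatever your interface is actually called:

$ tcpdump -ni eth0 'ip[6:2] & 0x3fff != 0'

that matches any IPv4 packet with a non-zero fragment offset or the more-fragments bit set; if it scrolls fast, you have fragmentation happening somewhere.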

--n.

* simplification.   let's ignore overhead, etc. for now.


