Saturday, April 18, 2015

Taming YouTube


It's pretty easy to completely blow out a monthly bandwidth cap with streaming video traffic.  This article discusses options for throttling that traffic before you get in trouble with your ISP (or your wallet) due to data overages.

Identifying the Problem

A nifty third-party tool for monitoring router traffic is MRTG.  It's a Perl-based web app that provides nice graphical displays of router statistics.  MRTG pulls its stats off the router via SNMP, so you'll need to configure that on your router.  You'll also need to set up a web server for it (Apache works just fine).  Figure 1 below depicts an MRTG-generated table of a hypothetical "problem" (note the Ingress number of 1.41 TB).

Figure 1
Now, if you want to do some real-time analysis, there's a tool right there on the router: Cisco IOS Netflow.  There are a lot of capabilities included in Netflow, but I'm only going to touch on one - Top Talkers.

Top Talkers is just what it sounds like - a real-time depiction of the flows that are generating the most traffic.  Setting it up is easy.

The first step is to activate Netflow on the interface you want to monitor.  That can be in the ingress, egress, or both directions.
router(config)#int gi1/0
router(config-if)#ip flow ingress
router(config-if)#ip flow egress
The next and final step is to activate and configure the Top Talkers feature.
router(config)#ip flow-top-talkers
router(config-flow-top-talkers)#sort-by bytes
router(config-flow-top-talkers)#top 3
In this case, the commands above configure a display of the top three talkers, sorted by byte count.

Now to see some real-time stats:
router#show ip flow top-talkers
SrcIf         SrcIPaddress    DstIf         DstIPaddress    Pr SrcP DstP Bytes
Di1  Gi1/0*    06 01BB B49D    35M
Gi1/0    Di1  06 B49D 01BB   774K
Di1  Gi1/0*    06 01BB B4A0   334K
Clearly, there's some fairly heavy ingress traffic (35M thus far) going on between a 173 address and a local/private 192 address.  A quick whois on that 173 address reveals it's in Google's IP range.  More specifically, this is YouTube video traffic.
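Conceptually, Top Talkers is just a byte-count aggregation over flow keys, sorted descending. A minimal sketch in Python, using made-up flow records that mirror the pattern above (the addresses are hypothetical, not from the actual capture):

```python
# Top Talkers, conceptually: sum bytes per (src, dst) flow key,
# then take the N heaviest flows.
from collections import Counter

# Hypothetical flow records: (src, dst, bytes)
flows = [
    ("173.194.10.20", "192.168.1.5", 35_000_000),  # heavy ingress (video)
    ("192.168.1.5", "173.194.10.20", 774_000),     # the return ACKs
    ("173.194.10.22", "192.168.1.5", 334_000),
]

talkers = Counter()
for src, dst, byte_count in flows:
    talkers[(src, dst)] += byte_count

# Equivalent of 'sort-by bytes' + 'top 3'
top3 = talkers.most_common(3)
print(top3[0])  # the heaviest flow first
```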

As far as inspecting real-time traffic - the tool of choice is Wireshark.  There are some network monitoring capabilities built into Chrome and Firefox as well, but they're primarily focused on the HTTP layer.

Solving the Problem

There are various factors that influence streaming bit-rates, but Netflix gives some general guidance of 3 Mbps for SD video and 5 Mbps for HD.  Some more detailed info from Google regarding YouTube here.

A rough calculation (using Netflix's guidelines) of the amount of bandwidth burned streaming 1 hour of HD video is:

60 min/hr * 60 sec/min * 5 Mb/sec = 18,000 Mb/hr = 2,250 MB/hr = 2.25 GB/hr 
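The same arithmetic as a quick Python sketch, using the Netflix guidance cited above (3 Mbps SD, 5 Mbps HD); note the divide-by-8 converting megabits to megabytes:

```python
# Back-of-the-envelope bandwidth cost of one hour of streaming.
def gb_per_hour(mbps: float) -> float:
    megabits = mbps * 60 * 60  # Mb consumed in one hour
    return megabits / 8 / 1000  # Mb -> MB -> GB

print(gb_per_hour(5))  # HD: 2.25 GB/hr
print(gb_per_hour(3))  # SD: 1.35 GB/hr
```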

As expected, the meter runs pretty fast with streaming video traffic - particularly HD.

Site Settings

There are some easy measures you can take to limit bandwidth on the streaming sites themselves.  You can simply turn down the bandwidth usage for the particular site in the end-user's account settings.  Unfortunately, depending on users to voluntarily degrade their video quality probably isn't realistic.  The rest of this article will mostly focus on the options for imposing limits on the network itself.

Traffic Policing & Shaping

Cisco has a wealth of info published on the policing/shaping topic so I'm not going to spend too much time on the details.  Grossly simplifying the process: "policing" traffic results in dropping it when a defined bandwidth limit is reached.  "Shaping" traffic uses router resources (memory) to queue traffic to avoid dropping it.  However, shaping will degenerate into dropping packets if the traffic reaches a level beyond the limits of the queuing resources.

Figure 2 below is a simple diagram depicting where to implement policing and shaping.  The main concept of note: traffic policing should happen on the inbound interface; shaping has to happen on the outbound interface.

Figure 2

IOS Commands for Configuring Policing and Shaping

Implementing shaping and policing follows the same basic steps:
  1. Define 'class'es of traffic that you want to manage.
  2. Create a policy that allocates bandwidth and/or makes modifications to the classes.
  3. Apply the policy to an interface.
Below is a first attempt at defining a traffic class for streaming video:
class-map match-any http-video-class
 match protocol video-over-http
Line 1 defines the class and matching criteria.
Line 2 invokes Cisco's NBAR feature to utilize a pre-built signature for matching HTTP-based video.

Below is a sample policing policy:
policy-map http-video-police
 class http-video-class
  police 1000000 187500 conform-action transmit exceed-action drop
Line 1 names the policy.
Line 2 invokes the video class we defined above.
Line 3 applies a police policy to that video class.  The class is given a bandwidth of 1 Mbps with a normal burst of 187,500 bytes (the burst parameter is specified in bytes, not bits).  Traffic that conforms to that bandwidth limitation is transmitted; otherwise, the traffic is dropped.
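Under the hood, a single-rate policer like this is a token bucket: tokens refill at the committed rate, the bucket depth is the burst size, and a packet is transmitted only if enough tokens are available. A toy sketch of those semantics (illustrative only, not Cisco's implementation):

```python
# A toy single-rate token-bucket policer, mirroring the police command
# above: conform-action transmit, exceed-action drop.
class Policer:
    def __init__(self, rate_bps: int, burst_bytes: int):
        self.rate = rate_bps
        self.depth = burst_bytes * 8  # bucket depth in bits
        self.tokens = self.depth      # start with a full bucket
        self.last = 0.0

    def offer(self, now: float, size_bytes: int) -> str:
        # Refill tokens for the time elapsed since the last packet.
        self.tokens = min(self.depth,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        bits = size_bytes * 8
        if bits <= self.tokens:
            self.tokens -= bits
            return "transmit"  # conform-action
        return "drop"          # exceed-action

p = Policer(rate_bps=1_000_000, burst_bytes=187_500)
print(p.offer(0.0, 187_500))  # consumes the whole burst allowance: transmit
print(p.offer(0.0, 1_500))    # no time has passed, bucket is empty: drop
```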

Similarly, below is a sample shaping policy:
policy-map http-video-shape
 class http-video-class
  shape average 2000000
  queue-limit 128 packets
Line 3 applies a shaping policy to the video class. Max bandwidth is 2 Mbps.
Line 4 allocates a queue size of 128. The default is 64.
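A shaper, by contrast, delays excess packets rather than dropping them outright: non-conforming packets go into a FIFO queue and are only tail-dropped once the queue is full, which is the "degenerates into dropping" behavior described earlier. A toy sketch of just the queueing side (illustrative, not IOS internals):

```python
# A toy shaper queue with the 128-packet queue-limit configured above.
from collections import deque

class Shaper:
    def __init__(self, queue_limit: int = 128):
        self.queue = deque()
        self.limit = queue_limit
        self.drops = 0

    def enqueue(self, pkt) -> bool:
        if len(self.queue) >= self.limit:
            self.drops += 1   # queue exhausted: degenerate to tail drop
            return False
        self.queue.append(pkt)  # held until the shaped rate allows sending
        return True

s = Shaper(queue_limit=128)
results = [s.enqueue(i) for i in range(130)]
print(results.count(True), s.drops)  # 128 queued, 2 dropped
```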

The last step is to apply the policy to an interface.
interface GigabitEthernet1/0
 ip address
 service-policy output http-video-shape
Line 3 applies the shaping policy to this interface. As discussed previously, shaping/queuing happens in the outbound direction.

The command below will allow you to view real-time statistics on the policy in action:
router#show policy-map int "yourInt"

Video Classification Case Studies

In the example above, I gave the impression that the pre-defined 'video-over-http' NBAR signature was sufficient to classify all of the streaming video out there.  Unfortunately, that's not the case at all.  Different video providers implement streaming differently.  Part of that is due to the fact that we're in a technology transition period - HTML5 is replacing Flash as the streaming video standard.  However, there are other factors at work - in particular with YouTube - that make this classification task (and thereby the whole concept of rate limiting video) non-trivial.  Below are a few video providers whose streaming behavior I took the time to analyze and document: Amazon Prime, Netflix, and YouTube.

Of note, I had to load the latest NBAR2 Protocol Pack (ver 13) to get the correct NBAR signatures for identifying Netflix and YouTube traffic.  That also required an IOS upgrade to the latest version, which has a bug of some sort in its PPP implementation.  Nothing can ever be easy.

Amazon Prime

Amazon currently uses Microsoft Silverlight by default but will fall back to Flash with the Hardware Abstraction Layer (HAL) module if Silverlight isn't available.  Both options were evidently motivated by the need for DRM compliance.

For folks that use Linux and want to watch Prime in their browser, you're kind of in a bind given how Amazon has implemented streaming.  Obviously, Silverlight won't work out of the box for you (but a substitute plugin has been written - Pipelight).  And, on the Flash front - Flash/HAL flat out won't work in Chrome and requires a hack to even work in Firefox.  Instructions here on that hack.

Assuming a Flash implementation against Amazon Prime - classifying Prime traffic is simple.  Flash uses RTMP as the underlying streaming protocol, and Prime uses RTMPE, the encrypted version of RTMP.  So, an NBAR rule for RTMPE will catch Prime streaming traffic.  Incidentally, the built-in NBAR signature for Prime (amazon-instant-video) doesn't identify this Flash traffic.

Our http-video-class looks like this for classifying Amazon Prime traffic (Flash-based).
class-map match-any http-video-class
 match protocol rtmpe

Below is a real-time snapshot of the shaping policy at work on Amazon traffic:
router#show policy-map int gi1/0

  Service-policy output: http-video-shape

    Class-map: http-video-class (match-any)  
      9626 packets, 14460061 bytes
      5 minute offered rate 323000 bps, drop rate 0000 bps
      Match: protocol rtmpe
        9626 packets, 14460061 bytes
        5 minute rate 323000 bps
      queue limit 128 packets
      (queue depth/total drops/no-buffer drops) 41/0/0
      (pkts output/bytes output) 9626/14460061
      shape (average) cir 2000000, bc 8000, be 8000
      target shape rate 2000000

    Class-map: class-default (match-any)  
      18205 packets, 24550571 bytes
      5 minute offered rate 474000 bps, drop rate 0000 bps
      Match: any 
      queue limit 64 packets
      (queue depth/total drops/no-buffer drops) 0/0/0
      (pkts output/bytes output) 18204/24550505

As an aside - the Amazon Flash traffic has a source port of 1935 (TCP).  So, if you don't have a router that supports signatures - it's still easy to classify Amazon's traffic with a simple ACL such as this:
access-list 111 permit tcp any eq 1935 any

Netflix

Netflix just recently started showing some love to those of us in the Linux community.  They were a 100% MS Silverlight shop last year.  Today, they've adopted HTML5 video as well.  HTML5 video tag snippet below from a Netflix page:

<video src="blob:http%3A//" style="position: absolute; width: 100%; height: 100%;"></video>

The current releases of Chrome on Linux will work with the Netflix player (Firefox is not supported, so the inverse of Amazon Prime).

The traffic profile for Netflix is fairly straightforward.  Netflix streaming traffic is TCP packets sourced from port 80.  The current NBAR Protocol Pack (ver 13) correctly classifies Netflix traffic.  So, adding a match for Netflix to our existing class-map yields this:
class-map match-any http-video-class
 match protocol rtmpe
 match protocol netflix

Traffic statistics output below with the Netflix class match added:
router#show policy-map int gi1/0

  Service-policy output: http-video-shape

    Class-map: http-video-class (match-any)  
      19090 packets, 28450585 bytes
      5 minute offered rate 222000 bps, drop rate 0000 bps
      Match: protocol rtmpe
        12390 packets, 18618543 bytes
        5 minute rate 0 bps
      Match: protocol netflix
        6700 packets, 9832042 bytes
        5 minute rate 222000 bps
      queue limit 128 packets
      (queue depth/total drops/no-buffer drops) 56/25/0
      (pkts output/bytes output) 19057/28401286
      shape (average) cir 2000000, bc 8000, be 8000
      target shape rate 2000000

    Class-map: class-default (match-any)  
      99486 packets, 108676494 bytes
      5 minute offered rate 143000 bps, drop rate 0000 bps
      Match: any 
      queue limit 64 packets
      (queue depth/total drops/no-buffer drops) 0/0/0
      (pkts output/bytes output) 99307/108536983


YouTube

Similar to Amazon, YouTube also used Flash previously, but they have since moved on to HTML5 video.  Code snippet below from a YouTube page:

<video class="video-stream html5-main-video" style="width: 640px; height: 360px; left: 0px; top: 0px; transform: none;" src="blob:https%3A//"></video>

Similar to Netflix, they're using a File API blob for the video source (in memory) with Media Source Extensions.  This allows them to do cool stuff like adaptive streaming, restrict downloads to relevant portions of the video, etc.

What makes YouTube more interesting, though, from a traffic classification perspective, is that they've moved to 100% TLS.  That means YouTube traffic is encrypted/unreadable above the transport layer (Layer 4).  The following sorts of things just won't work for classifying YouTube traffic:

  • HTTP URL - You can't read any of the headers in the HTTP segment (Layer 7, application).  It's encrypted.  There's no such thing as 'deep packet inspection' of this traffic.
  • HTTP Host - Same thing.
  • TCP Port - They're using TCP port 443, just like all the rest of the TLS traffic on your network.  You put limits on TCP 443, you limit all TLS traffic.
  • IP Address - Like many of the streaming providers, Google is using a content delivery network (CDN) for their video traffic.  It's common to see multiple different IP addresses delivering video in a single session.  It's all about finding the most efficient route for the content.  To boot, that CDN lives in Google's IP address range, which is >200K addresses wide these days and growing, no doubt.  Neat discussion on how to count up Google's IP addresses yourself here.  Net, trying to maintain an IP address-based ACL for Google seems like an uphill battle.
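Even if you did commit to maintaining a Google prefix list, the membership check itself is trivial with the standard library. A sketch with Python's ipaddress module; the prefixes below are illustrative placeholders, not an authoritative or complete Google range:

```python
# Check whether an address falls inside a maintained prefix list.
import ipaddress

# Hypothetical subset of prefixes -- NOT an authoritative Google list.
google_prefixes = [ipaddress.ip_network(p)
                   for p in ("173.194.0.0/16", "74.125.0.0/16")]

def is_google(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in google_prefixes)

print(is_google("173.194.10.20"))  # True
print(is_google("192.168.1.10"))   # False
```

The check is easy; keeping the prefix list current as the range grows is the unmanageable part.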
So, TLS makes YouTube traffic a bit more challenging to manage.  I would wager Google knew this all along, given today's semi-hostile environment between the content providers and the ISPs.  To that point - earlier this year, a Google engineer uncovered that Gogo, an inflight broadband provider, was using a less-than-wholesome method to manage YouTube traffic.  In a nutshell, Gogo was issuing fake TLS certs for YouTube.  As a man-in-the-middle, that would (and did) enable Gogo to decrypt YouTube traffic.  Gogo evidently ceased this seedy practice shortly after it was uncovered.

In the Cisco world, the current NBAR Protocol Pack (version 13) does in fact have a working signature for identifying YouTube traffic.  So, throttling YouTube is a simple matter of adding it to the class-map:
class-map match-any http-video-class
 match protocol rtmpe
 match protocol netflix
 match protocol youtube

Results of that addition below:
router#show policy-map int gi1/0

  Service-policy output: http-video-shape

    Class-map: http-video-class (match-any)  
      54104 packets, 80313633 bytes
      5 minute offered rate 619000 bps, drop rate 7000 bps
      Match: protocol rtmpe
        12390 packets, 18618543 bytes
        5 minute rate 0 bps
      Match: protocol netflix
        18059 packets, 26614941 bytes
        5 minute rate 0 bps
      Match: protocol youtube
        23655 packets, 35080149 bytes
        5 minute rate 619000 bps
      queue limit 128 packets
      (queue depth/total drops/no-buffer drops) 0/511/0
      (pkts output/bytes output) 53575/79541510
      shape (average) cir 2000000, bc 8000, be 8000
      target shape rate 2000000

    Class-map: class-default (match-any)  
      172255 packets, 159978075 bytes
      5 minute offered rate 7000 bps, drop rate 0000 bps
      Match: any 
      queue limit 64 packets
      (queue depth/total drops/no-buffer drops) 0/0/0
      (pkts output/bytes output) 172197/159807489

Now the question is - How are they classifying this traffic?  I don't have access to the Cisco source code of their Protocol Pack, so I can only guess.  If someone out there does have a solid explanation of how Cisco or others have implemented this, I would appreciate you describing it in the comments section of this blog.

The options I can think of:
  1. IP address filter on the Google IP address range.  I discussed earlier how that maintenance task is probably unmanageable.  But, given Cisco issues regular updates on these Protocol Packs - maybe it's a workable model for them.
  2. Tracking flows based on an extension of the TLS spec known as Server Name Indication (SNI).  If you watch YouTube traffic in Wireshark, you will in fact see the YouTube server names in the SNI field.  The SNI is unencrypted, as it is passed during the TLS handshake, prior to encryption.  A similar method can be implemented using the Common Name field in the X.509 certificate that is presented during the TLS handshake.  For either of these to work, the traffic filter would have to mark all subsequent traffic originating from the target SNI/Common Name source and then manage the flow accordingly.
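To make the SNI option concrete, here's a sketch that pulls the server_name out of a raw TLS ClientHello. Offsets follow the TLS 1.2 record/handshake layout (RFC 5246) and the SNI extension (RFC 6066); this handles only the happy path and is illustrative, not production-hardened:

```python
# Extract the SNI hostname from a raw TLS ClientHello record, if present.
import struct

def extract_sni(record: bytes):
    if record[0] != 0x16:  # not a TLS handshake record
        return None
    i = 5                  # skip record header (type, version, length)
    if record[i] != 0x01:  # not a ClientHello
        return None
    i += 4                 # handshake type + 3-byte length
    i += 2 + 32            # client_version + random
    i += 1 + record[i]     # session_id (length-prefixed)
    (cs_len,) = struct.unpack_from("!H", record, i)
    i += 2 + cs_len        # cipher_suites
    i += 1 + record[i]     # compression_methods
    (ext_total,) = struct.unpack_from("!H", record, i)
    i += 2
    end = i + ext_total
    while i + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", record, i)
        i += 4
        if ext_type == 0:  # server_name extension
            # skip list length (2) + name_type (1), read hostname length
            (name_len,) = struct.unpack_from("!H", record, i + 3)
            return record[i + 5:i + 5 + name_len].decode("ascii")
        i += ext_len
    return None
```

A filter built on this would classify the first packet of the handshake, then track the rest of the flow by its 5-tuple.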
As an aside - Google is experimenting with a protocol they developed known as QUIC.  In short, its purpose is to speed up web connections.  You can turn this on full-time in Chrome easily here: chrome://flags/#enable-quic.

After QUIC is enabled, all YouTube content is delivered to UDP port 443.  That's an easy target for a class map.  Additionally, that traffic won't get mixed in with all of the rest of your TLS traffic.  The simple ACL below will classify it:
access-list 102 permit udp any eq 443 any

Cisco IOS PPP Bug Workaround


There appears to be a bug in the current releases of both the 15M and 15T code trains.  I've tested with 15.4.3M2, 15.5.1T1, and 15.5.2T, with the same results.  From what I can tell, the bug is specifically in the PAP implementation in these releases.


I've had a PPPoE/PAP implementation up for years.  Upon installing any of the above IOS releases, that implementation stopped working.  The symptom is the connection flapping (up/down) continuously.  I got a hint this was an IOS bug by googling the symptom.  These PPP bugs have evidently manifested themselves in previous releases.

Turning up debug is really the only way to narrow down what is happening:
router#debug ppp authentication
router#debug ppp error
router#debug pppoe errors
Here's a sampling of the error messages you'll see:
PPPoE: Failed to add PPPoE switching subblock
PPPoE: Unexpected Event!. PPPoE switching Subblockdestroy called
Vi2 LCP: Sent too many CONFNAKs.  Switch to CONFREJ
I've had CHAP shut off on this implementation (again, for years) with this configured on the Dialer interface:
ppp chap refuse


Turning on CHAP (and removing the 'refuse' command) seems to fix things for me.  That IOS CHAP code apparently is not bug-ridden, and my ISP evidently will allow CHAP authentication.  If yours doesn't, this won't help you.  Your only option is to drop back to a stable release and wait until Cisco corrects the PPP/PAP bug in a future release.
interface Dialer1
 ip address negotiated
 ip mtu 1492
 encapsulation ppp
 ip tcp adjust-mss 1452
 dialer pool 1
 dialer-group 1
 ppp authentication pap chap callin
 ppp chap hostname yourName
 ppp chap password yourPassword
 ppp pap sent-username yourName password yourPassword
 no cdp enable