Wednesday, July 18, 2012

Enabling Support for IPv6

by Rajiv Aggarwal and David Temkin

In our first post about World IPv6 Launch, we talked about our motivations for supporting IPv6.  Now we’ll dive into some of the technical challenges we faced and how we addressed them.  Our primary concern was ensuring uptime and quality of experience for our members throughout the IPv6 deployment.  To accomplish this, we required robust IPv6 support from every platform and component we selected, and we planned an incremental rollout in which each stage was deployed and assessed before the next.

Devices & Systems

We chose the PC/Mac platforms because they met our IPv6 robustness requirements and drive a large number of streaming hours.  A typical PC/Mac playback session touches several critical edge systems, all of which were also able to support IPv6.  These services include:

  • Our public web presence (www.netflix.com and movies.netflix.com)
  • Edge services for movie data and playback support
  • Content delivery services: static assets and audio/video streams

Playback

IPv6 streaming playback is delivered through the Netflix Open Connect network.  Enabling it there was mostly seamless, with two small, distinct exceptions.

First, our FreeBSD content cache appliances, which in some cases were processing 8 Gbps of IPv6 traffic, would eventually panic and hang.  We traced this to a reference leak in the IPv6 code that only became apparent after 2^32 (about 4.3 billion) packets had been processed.  At that point the affected reference counter rolled over, the kernel freed memory that was still in use, and the system panicked.  Because our appliances process so much traffic, we noticed this almost immediately.
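To see why the leak surfaced so quickly, here is a rough back-of-the-envelope sketch. It assumes (our assumption, not a measured figure) that each processed packet leaked exactly one increment on a 32-bit unsigned counter and that packets average 1,500 bytes; `leak_refs`, `release`, and `seconds_to_wrap` are hypothetical helpers for illustration, not FreeBSD kernel code.

```python
MASK32 = 0xFFFFFFFF  # a 32-bit unsigned counter wraps modulo 2**32


def leak_refs(refcount: int, packets: int) -> int:
    """Counter value after `packets` leaked increments (one per packet, assumed)."""
    return (refcount + packets) & MASK32


def release(refcount: int) -> tuple:
    """Drop one reference; return (new value, whether the object would be freed)."""
    refcount = (refcount - 1) & MASK32
    return refcount, refcount == 0


def seconds_to_wrap(gbps: float, avg_packet_bytes: int) -> float:
    """Time until 2**32 packets have been processed at a steady rate."""
    packets_per_second = gbps * 1e9 / (avg_packet_bytes * 8)
    return 2**32 / packets_per_second


# After 2**32 leaked increments the counter is back at its starting value,
# so a single legitimate release drives it to zero and frees live memory.
print(release(leak_refs(1, 2**32)))
print(round(seconds_to_wrap(8, 1500)))
```

Under these assumptions, 8 Gbps is roughly 667,000 packets per second, so the counter wraps after about 6,400 seconds, under two hours of traffic.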

Second, the Open Connect playback URLs we deliver to streaming clients contain inline IPv4 address literals rather than hostnames.  This lets us avoid the overhead of managing content caches in DNS and of resolving their names when clients access them.  Bypassing DNS is possible because we can accurately map a user's IP address to a specific appliance based on IP CIDR block assignments.  We wanted to extend this approach to IPv6, but found that different platforms handle IPv6 literals inconsistently, so our only short-term option was to use DNS hostnames for IPv6 clients and servers.
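The CIDR-based mapping described above can be sketched with Python's standard `ipaddress` module. The prefix table, the appliance addresses (all from documentation ranges), and the `pick_appliance`/`playback_url` helpers are hypothetical; the sketch resolves overlapping blocks with longest-prefix match, one common choice, not necessarily what Open Connect does. It also shows one well-known difference between the two literal forms: RFC 3986 requires an IPv6 literal in a URL host to be enclosed in square brackets.

```python
import ipaddress
from typing import Dict, Optional, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]
IPNetwork = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical CIDR-block -> appliance assignments (documentation ranges).
CIDR_TO_APPLIANCE: Dict[IPNetwork, IPAddress] = {
    ipaddress.ip_network("198.51.100.0/24"): ipaddress.ip_address("203.0.113.10"),
    ipaddress.ip_network("198.51.100.128/25"): ipaddress.ip_address("203.0.113.20"),
    ipaddress.ip_network("2001:db8::/32"): ipaddress.ip_address("2001:db8:ffff::10"),
    ipaddress.ip_network("2001:db8:1::/48"): ipaddress.ip_address("2001:db8:ffff::20"),
}


def pick_appliance(client_ip: str) -> Optional[IPAddress]:
    """Return the appliance for the most specific block containing client_ip."""
    client = ipaddress.ip_address(client_ip)
    # `in` is always False across address families, so IPv4 clients only
    # match IPv4 blocks and IPv6 clients only match IPv6 blocks.
    matches = [net for net in CIDR_TO_APPLIANCE if client in net]
    if not matches:
        return None
    # Longest prefix (most specific block) wins.
    return CIDR_TO_APPLIANCE[max(matches, key=lambda net: net.prefixlen)]


def playback_url(appliance: IPAddress, path: str) -> str:
    """Build a URL whose host is an address literal (path is hypothetical)."""
    # RFC 3986: IPv6 literals must be bracketed, otherwise their colons
    # would be confused with the host:port separator.
    host = f"[{appliance}]" if appliance.version == 6 else str(appliance)
    return f"http://{host}{path}"


print(playback_url(pick_appliance("2001:db8:1::42"), "/seg/1"))
```

For the IPv6 client above, this prints `http://[2001:db8:ffff::20]/seg/1`; the same client code would have to handle both the bare and bracketed host forms.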

DNS

As many of you know, Netflix streaming services are deployed on Amazon's EC2 infrastructure, and our edge services sit behind AWS Elastic Load Balancers (ELBs).  Amazon manages the A and AAAA records for these ELBs, and we configure DNS CNAME records that point to them.  The simplest and most maintainable way to support IPv6 is to point each CNAME at the ELB's “dualstack” hostname, so that a single CNAME resolves to both A (IPv4) and AAAA (IPv6) records.  This is the approach we adopted.

For example, our IPv4 endpoint for the computer-based player initially looked like this:

user$ dig cbp-us.nccp.netflix.com
;; QUESTION SECTION:
;cbp-us.nccp.netflix.com. IN A

;; ANSWER SECTION:
cbp-us.nccp.netflix.com. 3600 IN CNAME nccp-cbp-us-frontend-61636918.us-east-1.elb.amazonaws.com.
nccp-cbp-us-frontend-61636918.us-east-1.elb.amazonaws.com. 47 IN A 107.20.243.161
nccp-cbp-us-frontend-61636918.us-east-1.elb.amazonaws.com. 47 IN A 107.21.121.200

After we changed our DNS entries, the same query returned:

user$ dig cbp-us.nccp.netflix.com
;; QUESTION SECTION:
;cbp-us.nccp.netflix.com. IN A

;; ANSWER SECTION:
cbp-us.nccp.netflix.com. 3600 IN CNAME dualstack.nccp-cbp-us-frontend-61636918.us-east-1.elb.amazonaws.com.
dualstack.nccp-cbp-us-frontend-61636918.us-east-1.elb.amazonaws.com. 47 IN A 107.20.243.161
dualstack.nccp-cbp-us-frontend-61636918.us-east-1.elb.amazonaws.com. 47 IN A 107.21.121.200

Because dig queries for A records by default, verifying the IPv6 addresses requires asking for AAAA records explicitly:

user$ dig -t AAAA cbp-us.nccp.netflix.com
;; QUESTION SECTION:
;cbp-us.nccp.netflix.com. IN AAAA

;; ANSWER SECTION:
cbp-us.nccp.netflix.com. 3600 IN CNAME dualstack.nccp-cbp-us-frontend-61636918.us-east-1.elb.amazonaws.com.
dualstack.nccp-cbp-us-frontend-61636918.us-east-1.elb.amazonaws.com. 0 IN AAAA 2406:da00:ff00::6b15:f2e9
dualstack.nccp-cbp-us-frontend-61636918.us-east-1.elb.amazonaws.com. 0 IN AAAA 2406:da00:ff00::6b15:fe6b
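A similar check can be scripted with the system resolver. The sketch below is an illustration, not a tool we used: `addresses_by_family` is a hypothetical helper built on the standard `socket.getaddrinfo` call, and the `resolve` parameter exists only so a fake resolver can be substituted in tests. Unlike dig, getaddrinfo follows the CNAME chain silently and returns only the final addresses, and its results depend on the local resolver configuration.

```python
import socket
from typing import Callable, Dict, List


def addresses_by_family(host: str,
                        resolve: Callable = socket.getaddrinfo) -> Dict[str, List[str]]:
    """Group the addresses a host resolves to into A (IPv4) and AAAA (IPv6)."""
    grouped: Dict[str, List[str]] = {"A": [], "AAAA": []}
    # Restricting to TCP avoids one entry per socket type; any remaining
    # duplicates are filtered out below.
    for family, _type, _proto, _canon, sockaddr in resolve(
            host, None, proto=socket.IPPROTO_TCP):
        if family == socket.AF_INET:
            key = "A"
        elif family == socket.AF_INET6:
            key = "AAAA"
        else:
            continue
        address = sockaddr[0]  # first element is the address for both families
        if address not in grouped[key]:
            grouped[key].append(address)
    return grouped


# A numeric address resolves locally, without a DNS query.
print(addresses_by_family("127.0.0.1"))
```

Pointed at a dual-stack name such as `cbp-us.nccp.netflix.com` (a 2012 hostname that may no longer resolve), a healthy configuration would return non-empty `A` and `AAAA` lists.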

Rollout

As previously mentioned, our risk-mitigation strategy was to test IPv6 support on a subset of customers before rolling it out broadly.  Our DNS provider lets us answer queries differently depending on the geographic location of the requester, and we used this to stage the testing and rollout of IPv6 region by region.  We started with the state of California and compared metrics for requests arriving via IPv4 and via IPv6.  We looked specifically for any significant drop in IPv4 traffic that was not offset by new IPv6 traffic.  We also checked whether requests arriving via IPv6 failed in the same ways as those arriving via IPv4 or in different ways.  As we gained confidence, we extended support across the U.S., again watching for failures.  When that went well, we expanded to all regions, completing our IPv6 enablement.
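The two checks described above can be expressed as a simple health test. This is a minimal sketch under stated assumptions: the `Window` record, the helper names, and both thresholds are hypothetical, and comparing failure rates is a simplification of comparing how IPv6 and IPv4 requests fail.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request and failure counts for one monitoring window (hypothetical)."""
    ipv4_requests: int
    ipv6_requests: int
    ipv4_failures: int
    ipv6_failures: int


def unexplained_ipv4_drop(baseline: Window, current: Window) -> int:
    """IPv4 requests that disappeared without reappearing as IPv6 requests."""
    ipv4_drop = baseline.ipv4_requests - current.ipv4_requests
    ipv6_gain = current.ipv6_requests - baseline.ipv6_requests
    return max(0, ipv4_drop - ipv6_gain)


def failure_rate(failures: int, requests: int) -> float:
    return failures / requests if requests else 0.0


def rollout_healthy(baseline: Window, current: Window,
                    max_drop_fraction: float = 0.01,
                    max_rate_gap: float = 0.005) -> bool:
    """Hypothetical thresholds: at most 1% of baseline traffic unaccounted
    for, and IPv6 failure rate at most 0.5 points above IPv4's."""
    baseline_total = baseline.ipv4_requests + baseline.ipv6_requests
    if unexplained_ipv4_drop(baseline, current) > max_drop_fraction * baseline_total:
        return False
    v4_rate = failure_rate(current.ipv4_failures, current.ipv4_requests)
    v6_rate = failure_rate(current.ipv6_failures, current.ipv6_requests)
    return v6_rate - v4_rate <= max_rate_gap


baseline = Window(100_000, 0, 100, 0)
print(rollout_healthy(baseline, Window(90_000, 9_950, 90, 10)))
```

Here 10,000 requests left IPv4 and 9,950 arrived via IPv6, so only 50 are unaccounted for and the failure rates are nearly identical; the check passes.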

This rollout was not perfect.  DNS is not geo-aware by design, so the non-authoritative (recursive) resolvers that users query could not distinguish which users should receive IPv6 answers; each resolver cached an answer and handed it to all of its clients.  In addition, the authoritative DNS servers geo-located each query by the IP address of the recursive resolver that sent it, not by the end user's address.  Despite these limitations, we accomplished our goal of targeting a subset of customers and were able to "dial up" IPv6 support gradually rather than making it an all-or-nothing rollout.
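The resolver limitation can be illustrated with a toy model. Everything here is a hypothetical simplification: the region codes, the documentation-range addresses, and the `authoritative_answer`/`RecursiveResolver` names are ours, and TTL expiry is ignored. The point it shows is that the authoritative server only ever sees the resolver, so every client behind that resolver gets the same answer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical first rollout stage: only queries geo-located to California
# receive AAAA answers.
ENABLED_REGIONS: Set[str] = {"US-CA"}
A_RECORDS = ["192.0.2.10"]
AAAA_RECORDS = ["2001:db8::10"]


def authoritative_answer(querier_region: str, qtype: str) -> List[str]:
    """Geo-aware authoritative server: it sees only who sent the query."""
    if qtype == "A":
        return A_RECORDS
    if qtype == "AAAA":
        return AAAA_RECORDS if querier_region in ENABLED_REGIONS else []
    raise ValueError(f"unsupported query type: {qtype}")


@dataclass
class RecursiveResolver:
    region: str
    cache: Dict[str, List[str]] = field(default_factory=dict)

    def resolve(self, qtype: str) -> List[str]:
        # The resolver, not the end user, queries the authoritative server,
        # so the answer is chosen by the resolver's region. The cached
        # answer is then returned to every client, wherever it is.
        if qtype not in self.cache:
            self.cache[qtype] = authoritative_answer(self.region, qtype)
        return self.cache[qtype]


# A user outside California whose ISP resolver is in California still gets
# AAAA records; a Californian using an out-of-state resolver does not.
print(RecursiveResolver("US-CA").resolve("AAAA"))
print(RecursiveResolver("US-OR").resolve("AAAA"))
```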

Outcome

According to a Sandvine report covered by TechCrunch, “IPv6 traffic in the U.S. hit record highs yesterday, but the biggest recent gains actually came about two weeks ago when Netflix turned on IPv6 functionality for its network.”  As a result, Netflix is now the second-largest domain by IPv6 traffic.  We're proud to have contributed so substantially to the growth and evolution of the Internet!



Rajiv Aggarwal, Manager, Streaming DevOps
David Temkin, Open Connect Principal Architect