In our first post about
World IPv6 Launch we talked about our motivations for supporting IPv6.
Now we’ll dive into some of the technical challenges we faced and how we
addressed them. Our primary concern was ensuring uptime and quality of
experience for our members throughout our IPv6 deployment. To accomplish
this, we required robust IPv6 support from the platforms and components we
selected, and we planned an incremental, iterative rollout with assessment
at each step.
Devices & Systems
We chose PC/Mac platforms because
they met our IPv6 robustness requirements and drive a large number of streaming
hours. A typical PC/Mac playback session touches several critical edge
systems, all of which were also able to support IPv6. These services include:
- Our public web presence (www.netflix.com and movies.netflix.com)
- Edge services for movie data and playback support
- Content delivery services - static assets and audio/video streams
Playback
Streaming IPv6 playback is supported
via the Netflix Open Connect Network. This
was mostly seamless, with the exception of two small, distinct issues.
First, we ran into an issue where our
FreeBSD content cache appliances, processing a significant amount of IPv6
traffic (in some cases, 8 Gbps), would eventually panic and hang. We found
a reference leak in the IPv6 code that wasn't apparent until 2^32 packets had
been processed. Once the leaked reference counter rolled over, memory that was
still in use was freed, causing the panic. Because we process large amounts of
traffic we noticed this almost immediately.
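To see why a bug like this surfaces quickly under load, the following sketch models it with simple arithmetic. It assumes the leaked count lives in an unsigned 32-bit field that wraps to zero, that one reference leaks per packet, and that the full 8 Gbps is IPv6 traffic in 1500-byte packets. None of these details are confirmed by the text above, and the function names are illustrative only.

```python
COUNTER_BITS = 32
COUNTER_MOD = 1 << COUNTER_BITS  # 2^32 distinct values in an unsigned 32-bit field


def increment_refcount(count: int) -> int:
    """Increment a reference count stored in an unsigned 32-bit field (assumed width)."""
    return (count + 1) % COUNTER_MOD


def seconds_until_rollover(link_bps: float, packet_bytes: int) -> float:
    """Time needed to process 2^32 packets at the given line rate and packet size."""
    packets_per_second = link_bps / (packet_bytes * 8)
    return COUNTER_MOD / packets_per_second


if __name__ == "__main__":
    # A leaked count sitting at the maximum value wraps to zero on the next packet,
    # which makes the object look unreferenced even though it is still in use.
    print(increment_refcount(COUNTER_MOD - 1))

    # Assumed workload: 8 Gbps of 1500-byte packets.
    hours = seconds_until_rollover(8e9, 1500) / 3600
    print(f"{hours:.2f} hours to reach 2^32 packets")
```

Under these assumptions the rollover arrives in roughly 1.8 hours, and smaller packets would shorten that further, which is consistent with the problem showing up soon after IPv6 traffic was enabled.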
Second, Open Connect playback URLs
delivered to streaming clients use in-lined IPv4 literals rather than hostnames.
This allows us to bypass the overhead of managing and accessing content
caches via DNS. Bypassing DNS is possible because we can accurately map a user IP
address to a specific appliance based on IP CIDR block assignments. We wanted
to extend this functionality for IPv6, but found inconsistencies in how the different
platforms handle IPv6 literals. Our only short-term option was to enable DNS for
IPv6 clients/servers.
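A minimal sketch of the CIDR-based mapping described above, using Python's standard `ipaddress` module. The table, the addresses (taken from the reserved documentation ranges), and the function names are hypothetical and are not Netflix's actual implementation. The URL helper also shows one place where platforms can diverge: in a URL, an IPv6 literal has to be wrapped in square brackets, while an IPv4 literal does not.

```python
import ipaddress
from typing import Optional

# Hypothetical CIDR-to-appliance table built from documentation address ranges.
CIDR_TO_APPLIANCE = {
    "192.0.2.0/24": "198.51.100.10",
    "2001:db8:1::/48": "2001:db8:ffff::10",
}


def appliance_for(client_ip: str) -> Optional[str]:
    """Return the appliance address for the most specific CIDR block containing client_ip."""
    addr = ipaddress.ip_address(client_ip)
    best_net = None
    best_appliance = None
    for cidr, appliance in CIDR_TO_APPLIANCE.items():
        net = ipaddress.ip_network(cidr)
        if addr.version != net.version or addr not in net:
            continue
        if best_net is None or net.prefixlen > best_net.prefixlen:
            best_net, best_appliance = net, appliance
    return best_appliance


def playback_url(appliance: str, path: str) -> str:
    """Build a URL with an inlined IP literal; IPv6 literals need square brackets."""
    host = ipaddress.ip_address(appliance)
    literal = f"[{host}]" if host.version == 6 else str(host)
    return f"http://{literal}{path}"


if __name__ == "__main__":
    for client in ("192.0.2.55", "2001:db8:1::abcd"):
        print(playback_url(appliance_for(client), "/stream/example"))
```

A client that mishandles the bracketed form cannot use the second URL, and falling back to hostnames avoids that dependency at the cost of a DNS lookup.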
DNS
As many of you are aware, Netflix
streaming services are deployed to Amazon's EC2 infrastructure. Edge services
leverage AWS Elastic Load Balancers (ELBs). The A and AAAA records for these
ELB instances are managed by Amazon and we configure DNS CNAME records to point
to them. The best and most maintainable practice for IPv6 support is to
leverage a “dualstack” configuration, allowing a single CNAME to be mapped to both
A (IPv4) and AAAA (IPv6) records. This is the solution we adopted.
For example, initially our IPv4 endpoint
for the computer-based player looked like this:
user$ dig cbp-us.nccp.netflix.com
…
;; QUESTION SECTION:
;cbp-us.nccp.netflix.com. IN A
;; ANSWER SECTION:
cbp-us.nccp.netflix.com. 3600 IN CNAME nccp-cbp-us-frontend-61636918.us-east-1.elb.amazonaws.com.
nccp-cbp-us-frontend-61636918.us-east-1.elb.amazonaws.com. 47 IN A 107.20.243.161
nccp-cbp-us-frontend-61636918.us-east-1.elb.amazonaws.com. 47 IN A 107.21.121.200
…
After changing our DNS entries it
became:
user$ dig cbp-us.nccp.netflix.com
…
;; QUESTION SECTION:
;cbp-us.nccp.netflix.com. IN A
;; ANSWER SECTION:
cbp-us.nccp.netflix.com. 3600 IN CNAME dualstack.nccp-cbp-us-frontend-61636918.us-east-1.elb.amazonaws.com.
dualstack.nccp-cbp-us-frontend-61636918.us-east-1.elb.amazonaws.com. 47 IN A 107.20.243.161
dualstack.nccp-cbp-us-frontend-61636918.us-east-1.elb.amazonaws.com. 47 IN A 107.21.121.200
…
And to verify IPv6 addresses you need
to ask for AAAA records:
user$ dig -t AAAA cbp-us.nccp.netflix.com
…
;; QUESTION SECTION:
;cbp-us.nccp.netflix.com. IN AAAA
;; ANSWER SECTION:
cbp-us.nccp.netflix.com. 3600 IN CNAME dualstack.nccp-cbp-us-frontend-61636918.us-east-1.elb.amazonaws.com.
dualstack.nccp-cbp-us-frontend-61636918.us-east-1.elb.amazonaws.com. 0 IN AAAA 2406:da00:ff00::6b15:f2e9
dualstack.nccp-cbp-us-frontend-61636918.us-east-1.elb.amazonaws.com. 0 IN AAAA 2406:da00:ff00::6b15:fe6b
…
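The manual check above can also be scripted. The sketch below parses answer-section lines in the `name ttl IN type rdata` layout that dig prints, follows the CNAME, and reports whether a hostname ends up with both A and AAAA records. The sample records are the ones shown in this post; the function names are illustrative, and the parser is a simplification that ignores everything outside the answer section.

```python
from collections import defaultdict

ELB = "dualstack.nccp-cbp-us-frontend-61636918.us-east-1.elb.amazonaws.com."
SAMPLE_ANSWER = [
    f"cbp-us.nccp.netflix.com. 3600 IN CNAME {ELB}",
    f"{ELB} 47 IN A 107.20.243.161",
    f"{ELB} 47 IN A 107.21.121.200",
    f"{ELB} 0 IN AAAA 2406:da00:ff00::6b15:f2e9",
    f"{ELB} 0 IN AAAA 2406:da00:ff00::6b15:fe6b",
]


def parse_answer(lines):
    """Group record data by (owner name, record type); skip comments and odd lines."""
    records = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 5 or fields[0].startswith(";") or fields[2] != "IN":
            continue
        name, rtype = fields[0].lower(), fields[3]
        records[(name, rtype)].append(" ".join(fields[4:]))
    return records


def resolve(records, name, rtype, max_hops=8):
    """Follow CNAME records from name until records of type rtype are found."""
    name = name.lower()
    for _ in range(max_hops):
        if (name, rtype) in records:
            return list(records[(name, rtype)])
        cname = records.get((name, "CNAME"))
        if not cname:
            return []
        name = cname[0].lower()
    return []


def is_dual_stack(records, name):
    """True when the name resolves to at least one A and one AAAA record."""
    return bool(resolve(records, name, "A")) and bool(resolve(records, name, "AAAA"))


if __name__ == "__main__":
    recs = parse_answer(SAMPLE_ANSWER)
    print(resolve(recs, "cbp-us.nccp.netflix.com.", "AAAA"))
    print(is_dual_stack(recs, "cbp-us.nccp.netflix.com."))
```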
Rollout
As previously mentioned, our
risk-mitigation strategy was to test IPv6 support on a subset of customers
before rolling it out broadly. Our DNS provider enables us to resolve hostnames
based on the geo-location of the caller. We used this during testing and
rollout of IPv6 by starting with a specific geographic region and then
expanding. We started with the state of California and monitored metrics
for requests coming to us via IPv4 vs. IPv6. We specifically looked for
any significant dips in IPv4 traffic that weren't accounted for by new IPv6
traffic. In addition, we watched to see if requests arriving via IPv6 were failing
in similar or different ways than those via IPv4. As we gained confidence,
we rolled out support across the U.S., again watching for failures. When
that went well we expanded to all regions, thus completing our IPv6 enablement.
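The two checks described above can be written as a small gate that runs before each expansion. This is only a sketch: the metric names, the thresholds (2% unexplained traffic loss, one percentage point of failure-rate gap), and the idea of a single pass/fail function are assumptions made for illustration, not a description of our monitoring.

```python
def unexplained_ipv4_drop(baseline_v4: int, current_v4: int, current_v6: int) -> int:
    """Requests that disappeared from IPv4 without showing up on IPv6."""
    drop = baseline_v4 - current_v4
    return max(0, drop - current_v6)


def failure_rate(failures: int, requests: int) -> float:
    """Fraction of requests that failed; zero when there were no requests."""
    return failures / requests if requests else 0.0


def safe_to_expand(baseline_v4, current_v4, current_v6, v4_failures, v6_failures,
                   max_unexplained=0.02, max_rate_gap=0.01) -> bool:
    """Gate the next rollout stage on traffic accounting and failure parity."""
    missing = unexplained_ipv4_drop(baseline_v4, current_v4, current_v6)
    if baseline_v4 and missing / baseline_v4 > max_unexplained:
        return False  # IPv4 traffic dipped and IPv6 did not absorb it
    gap = failure_rate(v6_failures, current_v6) - failure_rate(v4_failures, current_v4)
    return gap <= max_rate_gap  # IPv6 must not fail noticeably more than IPv4


if __name__ == "__main__":
    # Hypothetical counts for one region and one time window.
    print(safe_to_expand(1000, 900, 95, v4_failures=9, v6_failures=1))
    print(safe_to_expand(1000, 800, 100, v4_failures=8, v6_failures=1))
```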
This rollout was not perfect.
DNS servers are not geo-aware by specification, so non-authoritative
servers did not differentiate which users should receive IPv4 vs. IPv6 traffic.
In addition, the authoritative DNS servers were doing geo-location based on the
IP address of the non-authoritative DNS server rather than that of the end user.
Despite these limitations we accomplished our goal of targeting a subset of
customers and were able to "dial up" support for IPv6 without making it an
all-or-nothing rollout.
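To make the staged, geo-based approach and its main limitation concrete, here is a sketch of the decision an authoritative server could make. It assumes the stages map to region codes and that enabling IPv6 for a region means answering with the dualstack ELB name instead of the IPv4-only one; the region codes, stage table, and function names are hypothetical. Note that the input is the region of the resolver, because that is the only address the authoritative server sees.

```python
ELB_NAME = "nccp-cbp-us-frontend-61636918.us-east-1.elb.amazonaws.com."

# Hypothetical rollout stages: California, then the whole U.S., then everywhere.
ROLLOUT_STAGES = [
    {"US-CA"},
    {"US"},
    {"*"},
]


def region_enabled(region: str, stage: int) -> bool:
    """True when a region such as 'US-CA' or 'DE' is covered by the given stage."""
    enabled = ROLLOUT_STAGES[stage]
    if "*" in enabled:
        return True
    country = region.split("-")[0]
    return region in enabled or country in enabled


def cname_for(resolver_region: str, stage: int) -> str:
    """Pick the CNAME target based on where the querying resolver is located."""
    if region_enabled(resolver_region, stage):
        return "dualstack." + ELB_NAME
    return ELB_NAME


if __name__ == "__main__":
    # During stage 0, a member outside California whose resolver sits in California
    # still receives the dualstack name, since only the resolver's location is known.
    print(cname_for("US-CA", 0))
    print(cname_for("US-NY", 0))
```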
Outcome
According to a Sandvine report covered
by TechCrunch: “IPv6 traffic in the U.S. hit record highs yesterday, but the
biggest recent gains actually came about two weeks ago when Netflix turned on
IPv6 functionality for its network”. The net effect is that we now have the 2nd largest domain taking IPv6 traffic.
We're proud to have contributed in such a substantial way to the growth
and evolution of the Internet!
Rajiv Aggarwal, Manager, Streaming DevOps
David Temkin, Open Connect Principal Architect
