Building a Next-Generation Streaming Platform With Sky (Cloud Next '19)

MELIKA GOLKARAM: Hello, everyone I hope you've had a great Google Next so far

And thank you so much for attending this session My name is Melika Golkaram I'm a media specialist customer engineer from Google Cloud And today, I have the pleasure to co-present this session with two industry experts, Jeff Webb, a principal solution design architect from Sky, and Moore Macauley He's a director of product from Harmonic

So what are we hearing in this session? Well, I think it's useful for all of us to go a little bit back and look at what's happened in the industry, how things have shifted and changed until we've got to this point, and go to 2018 and 2019 and see what Google Cloud has done for media customers to help them achieve their targets and implement their interesting projects And then I'll ask Jeff to come up the stage and take us through their interesting journey of designing and deploying a future-proof sports streaming platform And then, Moore is going to give us a different angle of how they implement it there, basically stream processing applications on top of containers to help Sky in this wonderful project So this is really interesting for all of us Today, we are all so used to get any sort of content with great quality on any device in a way that we all forget that this industry is actually not that old

Look at 2005, when YouTube first launched and became, basically, the biggest streaming platform in the world. Ten years down the line, things had shifted and changed in every way. Streaming protocols, transcoding formats, infrastructure, big data: everything changed so rapidly that, 10 years on, we saw a 4% fall in traditional TV viewership. And that figure translated into a 74% increase in content viewership on YouTube.

So this comparison looks a little bit odd: 4% versus 74%. Part of it is because the overall viewership of content on traditional TV was a huge number, so when you move it onto YouTube, it becomes a really big factor. But the other part was that people overall started to watch a lot more content than they used to, because obviously, on YouTube you could find any sort of niche content you wouldn't be able to find on TV. Three years later, and this is interesting, the overall viewership of content on streaming platforms, not only YouTube but everywhere, saw a 35% increase year on year.

And today, where we're standing, people are consuming one billion hours of content on YouTube alone. That's a massive number. So that makes us hopeful that over the next five years, around $30 billion of profit will shift away from traditional TV platforms, IPTV and cable TV, and be spent on online TV platforms. That will take us to 2025, where half of viewers under the age of 32 will never actually have held a subscription to a traditional TV platform. And that will take media consumption over the internet to 25% of the world's data.

So we know that things have changed so much We're really proud to be standing in this moment But we want to see if Cloud is going to be the next big wave of disruption into the media and tech industry, at the same time So let's see why Cloud matters in this industry So problem number one of every technology company is the speed to delivery

If you're a media company, your problem is a lot bigger Why? You're acquiring a lot of content from studios, from sports organizations– this massive amount of money you're spending And you want to make sure that you're launching the service right on time, you are actually providing any sort of necessary infrastructure without having to worry about infrastructure planning And the other problem about media industry is that it's not only about applications It's also about the delivery of media across the world

So you need to make sure that your content gets encoded and transcoded and is sent across the world with minimum delay. The other problem is that when you work in a really fast-moving environment, you need to be sure that you're scaling without constantly calculating and being stingy with your infrastructure, because you want to make sure your users are not getting delays and that you don't lose your viewers to your competition. A very good example of that is news content. If you lose one or two minutes of content delivery on your network, people will just go to your competition, and then you're losing millions of dollars of ad revenue over something pretty plain and simple.

So with cloud, you can actually do this scaling pretty fast, and then you just shut down your instances. After that, you don't really have to worry about a bunch of annoying servers in your infrastructure. Then again, you can have any sort of analytics you want and think about what you want to do with it later.

You can be super generous with your data And you can also use that data later, once you figure out what data set is important for you to have intelligent decision-making with machine learning And machine learning is infrastructure demanding So with Cloud, you can actually fix that problem And then the sweet story of studio requirements

So if you're a content distributor and you work with multiple different studios, each of them comes to you with a massive contract of different requirements for your infrastructure And if you work with a cloud provider, you really don't have to worry about at least the basics of these requirements With all said and done, we still think hybrid solutions are going to stay with media industry for a while from now, because you have your transcoders, encoders on premise And you want to make sure that you are keeping this infrastructure, but burst into cloud whenever it's needed And here we are

We are in Google Cloud infrastructure When you look at that map, you really shouldn't see a lot of colorful dots and fibers– even though it's entertaining to look at it that way, too But you should actually look at this as a representation of six Google services that have more than one billion subscribers All these services are sending traffic securely with a lot of scalability through this network, through these points of presence And Cloud is using very much the same infrastructure that YouTube, Gmail, Maps, and those services are using

Over the past three years, we have spent $20 billion on infrastructure to maintain and expand our network We have invested in 13 sub-sea cables We've launched in 18 different regions, 55 zones, 134 network edges, and 90 CDN locations, plus 73 interconnect locations We recently launched in Switzerland And we are launching in Indonesia and in Japan for the second time

But we know success doesn't come from wealth What made Google Google is technology So if you look at different layers of technology, here, you see we build our own chips We build our own hardware We build our own data centers

We maintain it. We invented Kubernetes, and all of our services, across the board, use Kubernetes to scale fast and easily. We have multiple layers of identity to control access to our platform. If you look at how much content is being stored and uploaded to YouTube, you will see storage is not an issue that we would worry about, and neither is the security around it.

We talked about our backbone, and it's sitting on top of everything else. The operation and security of devices is something that our SREs are paying attention to every day. So what would your service look like, ultimately? If you're a video studio company and you have people sitting in Los Angeles, in London, in different parts of the world, with content stored in multiple locations, you want to make sure that you can do your editing on a device that is not present in your data center but gives you the illusion of having it in your vicinity. If you are a live sports company like Jeff is, you basically want to make sure that your content is being distributed around the world without any delay. If you are a telco that has acquired other telcos and is distributing to multiple locations, you want to make sure that your studio rights are being obeyed.

And finally, when you look at the amount of storage you will be needing, you really don't want to worry about how much content is being stored every day So with Google Cloud, we're giving you a huge amount of infrastructure to scale your services We give you containers, Kubernetes app deployment services where you can deploy with a lot of scalability and reach your goals We give you an 11-9 durable storage– different classes, but instant access across the board to store your content Your network is basically a global network that is well connected with our fiber

And your services are going to be distributed across different virtual private networks. But because they're within Google infrastructure, security is compliant and the content is encrypted at rest and in transit. We also give you a lot of big data and machine learning capabilities to sit on top of your services, to give you intelligent ideas about how you can push for better business decisions and retain your customers on your platform. This picture always cracks me up. In 2010, this picture went online.

And it drove the whole internet crazy about whether it was Tom Hanks or Bill Murray. What do you think? Thank you. Yes, it's Bill Murray. The whole point is, with all the hype about cloud these days, there is so much similarity between cloud platforms that people sometimes get confused about what would be the best decision for their own business. And we think there are certain things in Google Cloud that shine above all other services.

We are investing a lot in hybrid approaches Cloud storage is a very special service, especially when it comes to media We give you a metadata caching, redundancy, duplication across the board It's multi-regional And regardless of what class of storage you put your content on, it's always instant accessibility

Our CDN is very easy to configure So it's literally just putting a tick at a box, and then content gets cached everywhere We have a new approach to machine learning We use Auto ML, which gives you access to our models But you can train it with your own data

So you can get rid of a lot of unnecessary labels that you would get from a generic model And the network that we talked about several times– it helps you distribute your content a lot faster 2018– it was interesting for us We launched in eight new regions and we took Dedicated and Partner Interconnect to general availability With rendering, we rewrote the whole Zync platform

And it's available on Google infrastructure and provides up to 48,000 cores to editors, including individual editors. We released a live machine learning service for video analysis in real time. We released a lot of features for metadata enrichment, and also for monetization, like video AutoML, object tracking, and so on. In terms of OTT, Anvato, which is our end-to-end platform, launched in Europe. And our CDN, in terms of media, went GA.

And Cloud Armor was launched to protect your service. We are also doing a lot of interesting stuff with archiving: tape digitization, as part of the bigger Google services and Arts & Culture effort, helps our customers that have valuable content to digitize those tapes for very little money and then use machine learning to enrich the metadata for those assets. And what's coming in 2019? We are launching five new subsea cables across the board. Hybrid platforms are going to stay our highest priority this year.

So that will help you work with your streaming services, your different types of– if you have machine learning stuff on premise, and a lot of different aspects of hybrid workflows or in different cloud providers with one control plane and the same set of APIs that are basically industry standards like Kubernetes, for example We are replacing our MPAA licensing with TPN, which covers a lot more broadcasters and content owners That will be available on major GCP infrastructure And we are doing a lot of CDN enhancements We are working on providing media APIs for encoding, transcoding, and ad insertion

We are releasing a neutral rendering platform called OpenCue That's independent of any specific software, like Zync And we are working on other AI services to go to general availability And something very interesting is about edge TPUs– that you will be able to download the model on a device offline and the device will do the analysis and send the results to the cloud So with that, I'm going to ask Jeff to come up to the stage and take us through their interesting story, which is very inspirational

JEFF WEBB: I'm going to talk over the next 20 minutes about live sports in the cloud We've been doing it for a long time on premise And there's lots of opportunities in the cloud So my name's Jeff Webb I'm principal streaming architect

I'm from Sky UK, hence the accent. So what I'm going to talk about is Sky: for those not familiar with who we are, I'll go through that; "believe in better streaming", which is part of our ethos; and the success formula for delivering a live streaming service.

So what does that make? Software-Defined Streaming, which is our concept for the next-generation platform; how the cloud benefits it, and a comparison with on-premise. We have a thing called the Single Channel Fault Domain, which is really where the key innovation comes in. And I'm going to show you what happens when something goes wrong.

So, about Sky: we have 23.6 million customers in Europe, and we operate in seven European countries. We're very proud, now, to be part of the Comcast family; that acquisition went through a few months ago. Sky Q, which is our set-top box proposition in Europe, is in 5.5 million homes. That's up from 3.4 million last year. We're increasing investment in original drama, with a 25% increase over last year. And very interestingly, if you produce really good drama, then more customers will watch it.

So you can see viewing has more than doubled in the last three years Put this up here because this is something else, as well I'm really proud to be part of a socially responsible company So we have things like Sky Ocean Rescue And you might have seen things across the internet– things about trying to save the oceans

There's a lot of plastic that we throw away and ends up in the animals and the whales and things And it's not a good thing So believe in better streaming Going to take you through a little journey of the last 10 years I myself have been at Sky for 14 years

I've been in the streaming team since 2009 So over that 10-year period, we've gone through three different generations We started off, as most broadcasters did, in 2009 when Apple announced the HLS spec We wanted to be able to do live sports– Premier League and things And so we did what most broadcasters did They went out to market, went and did a comparison, go through the RFI, RFP process, bought some white boxes, plugged them in in the data center

You had video in on one side You had video out on the other side And customers were happy Fast forward several years, business changes Customers want higher quality

You've gone from standard definition phones Now you've got retina displays on your phones and your tablets And so customers are expecting to get a better quality Obviously, you've also got big screens, smart TVs, and so on So we went to generation 2

Problem with the first model is that you go and you try and buy the biggest box you can afford because you want to maximize your channel density So you put five channels on this big box And this box cost you like $20,000 If you want to get more, then you spend more That business model is fine up until you have to go and replace it

So every few years, about every three years or so, you have to go through the upgrade, refresh cycle And that's a very expensive way of doing it In reality, as broadcaster, all you need is the software The software is the intellectual property So that's what we did in generation 2

We said, why would I buy your server? I just want your software We can run our own server, as most enterprises do We run VMware So that's what we did We virtualized it

And we've run that platform for six years now And that's been very successful What I'm talking today about is moving to the cloud, specifically for sports, because there's a really great opportunity for that So just to give you some context, what is the success formula for delivering a live streaming service? This can be sports It can be anything

Well, firstly, it starts with content Leaving aside the technology, customers in the pay TV operator– they want to be entertained They want to watch the sports They want to watch the "Game of Thrones" next week They want to watch Italian or German football, and so on

That is what they pay for That's part of the experience And so we shouldn't forget that At Sky, I'm part of what's called software engineering Software engineering is actually quite a lot of people

In the building I'm in, there's 800 or so that are in software engineering And these guys and girls are very clever And they basically go and write the software Now, unlike a lot of other people, we give the software away for free So the network is obviously key to this

So if you own and operate your own network, then, obviously, you can ensure that the last-mile connection, especially, is optimized. In the UK, we offer Sky Broadband, which is the second-largest broadband provider in the UK, now at about six million customers. Sky Broadband is also rolling out in Sky Italy later this year. And obviously we have lots of CDNs, because we have lots of traffic and lots of customers, which segues nicely onto customers.

So here's what a Sky Q box looks like We have, now, TVs that use Roku and, obviously, mobile devices, smart TVs, consoles, and so on Ultimately, if you take content plus developers plus network plus customers, it all gives you quality Because lots of customers, they'll want to do the thing, watch the content at the same time So next Monday is a good example

You'll have "Game of Thrones" on there at 9 PM It's going to be simulcast between US and Europe Customers want to watch it on their devices They want to watch it together, because it's a social experience, just like when you get Super Bowl or Olympics and things And they want to watch it in HD because that's what they want

So, software-defined streaming. This is a reimagining of how we deliver. We came up with this interesting concept called Streaming DevOps, because we've been doing streaming for 10 years, and we've been doing it very well, and obviously customers are happy with that. But we wanted to make it better.

And so it's been part of software engineering We looked out and asked our colleagues and said, how can we take what other teams within Sky are doing from the DevOps world and bring that into streaming? So we created something We made it up It's called Streaming DevOps And really, it's four things that come into it

First is around resilience. As a broadcaster, you're giving the customer the experience. In live sports, you can't afford to have downtime, because if they've missed the goal, it's not going to come again; they're going to have to watch it on the repeat or something. So you want to try and avoid that at all costs.

We came up with the Single Channel Fault Domain concept; I'll show you what that is in a couple of slides. And I'll also give you an example of how it's self-healing. So basically, if you have a problem, don't show the customer. Hide the problem until you've resolved it.

Channels must stay on air And that's critical Geographic resilience is very key because a simple analogy would be, if you've got one data center, don't put all your channels in one data center because chances are, one day, something will go wrong and you'll lose not just one channel, but a whole bunch of channels So have two baskets Do exactly the same

Make sure the baskets are identical with your channels And you'll be a lot better In order to do that, you need highly available synchronized video streaming And that's what we'll show a little bit later The second pillar is around speed to market

Melika touched on it earlier. The problem with enterprise is that you have to go through the procurement cycle. If we order a server today, six to eight weeks later the server will arrive and it'll get plugged in. It's too long. How many football matches, how many basketball games have you missed by then? Well, quite a lot, especially if it's March Madness.

Automatic pipeline– so taking some of the benefits of what's going on in DevOps So the continuous integration, continuous deployment I'm sure, obviously, a lot of you Googlers and other people– don't need to explain what that is The third pillar is observability We're dealing with video

We're a TV company. So when that "Game of Thrones" episode is coming through, you want to make sure, before it gets to the customer, that it's actually good video and good audio coming out, because if you had a lip-sync issue, it's a bad customer experience. So you want your monitoring systems to confirm the output is good before it actually gets to the customer. And doing that in an automated way is the way to go.
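
To make that concrete, here is a toy Python sketch of what one such automated check could look like: flag a channel when paired audio and video timestamps drift apart beyond a threshold. The 45 ms figure and all of the names are illustrative assumptions, not Sky's or Harmonic's actual tooling.

```python
# Toy sketch of an automated lip-sync check: flag a channel when the gap
# between matching audio and video presentation timestamps grows too large.
# The 45 ms threshold and every name here are illustrative assumptions.

LIP_SYNC_THRESHOLD_MS = 45

def lip_sync_ok(video_pts_ms: list[int], audio_pts_ms: list[int]) -> bool:
    """Compare paired audio/video timestamps and reject excessive drift."""
    return all(abs(v - a) <= LIP_SYNC_THRESHOLD_MS
               for v, a in zip(video_pts_ms, audio_pts_ms))

if __name__ == "__main__":
    video = [0, 40, 80, 120]
    audio = [0, 41, 79, 200]   # the last sample has drifted by 80 ms
    print("lip sync ok?", lip_sync_ok(video, audio))  # -> False
```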

Obviously, for operational reasons, you want centralized logging and monitoring using open-source tools like the ELK Stack, and also real-time dashboards using things like Grafana and Prometheus. So you can show the ops guys what's going on right now with customers and see how the platform is performing. And the fourth one, really, is the key one, and the reason why I'm here today: transportability. At the outset, we had a desire, a principle, to make the platform agnostic.

So on premise, we run in VMware So now, we're taking it into Google with GCP Just to mention there about GKE, we're not doing GKE on premise, but I just heard the announcement yesterday that it's coming So how does the cloud benefit live sports streaming? Fairly obvious one It shortens the development cycle because you're not having to wait for infrastructure

We can now spin up channels in minutes, so everything's pre-configured. If you take the Formula One example, those Grands Prix are run, on average, about every two weeks. Sometimes every other week, depending on the geography, but sometimes it's a bit longer.

But on Friday, they have their practice Saturday, you have qualifying And Sunday, you have the race Each of those channels must be consistently built So if I have the Australian Grand Prix and then, two weeks later, I have the Singapore Grand Prix, and then a week later, I have the Bahrain Grand Prix, then the actual channel configuration itself is the same

The only thing that's really changed over those three events is the time and, actually, the content itself. The way that I deliver it to a client is still the same endpoints. When a customer presses Play, it's still the same. Each channel configuration must be built consistently. We have a YAML file, yet another markup language.

It has all the configuration parameters for a channel. So you would have things like your ABR ladder, you would have DRM types, you'd have stream formats, and so on. This is really critical, because this allows you to become stateless.
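
As a rough illustration of what such a declarative, stateless channel definition could look like, here is a minimal Python sketch that loads a hypothetical YAML spec. The schema, field names, and values are assumptions for illustration only, not Sky's actual configuration format.

```python
# Minimal sketch of a stateless channel definition loaded from YAML.
# The schema and values are illustrative assumptions, not Sky's format.
# Requires PyYAML: pip install pyyaml
import yaml

CHANNEL_SPEC = """
channel: sky-sports-f1
abr_ladder:                      # bitrate ladder, kbps per rendition
  - {resolution: 1080p, bitrate: 5000}
  - {resolution: 720p,  bitrate: 3000}
  - {resolution: 540p,  bitrate: 1500}
  - {resolution: 360p,  bitrate: 800}
stream_formats: [hls, dash, smooth]
drm: [widevine, playready]
availability_zones: [az1, az2]
"""

def load_channel(spec_text: str) -> dict:
    """Parse a channel spec; the same spec rebuilds the channel identically
    for every event (practice, qualifying, race)."""
    spec = yaml.safe_load(spec_text)
    assert spec["abr_ladder"], "a channel needs at least one rendition"
    return spec

if __name__ == "__main__":
    channel = load_channel(CHANNEL_SPEC)
    print(channel["channel"], [r["bitrate"] for r in channel["abr_ladder"]])
```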

So your channel configuration can be stateless This is very necessary for reasons of continuity at the moment We wanted a platform to be modular So, obviously, technology changes from time to time Some smart people will go and invent new codecs and things for new devices and stuff

Things will come along and things will go away So you need it to be modular so that you're not locked into a particular technology when something much better can come along and possibly even be cheaper, as well Cost effective So, obviously, make sure that you can do it cost effectively Speed to market– fairly obvious

End-to-end automation pipeline Do everything automatically Do not have engineers going into graphical user interfaces, manually putting in configurations, and pressing Apply We used to do that Not doing that anymore

So increase reliability and protection– a Single Channel Fault Domain I'll show you that And a really interesting thing that we're now able to do in the cloud, because it's much more cost effective, is around A/B multivariate testing So imagine being able to have a Sky Sports channel Let's say it's Formula One, because that's on the next slide

And that's the standard version Then what we have is a B version of it which is slightly different So you make the B version available to some customers You try what it is in the real world And you get feedback on it, like you would do going through a trials process

This is video, so you actually need to see it You need the data So it's quite key This new platform allows us to maintain channel uptime So that's really critical

Software upgrades are a classic problem. So in this diagram, I've taken a layered approach. My arm's not that big. At the top, you've got the Sky clients: you've got Sky Go, Now TV, Sky Sports, Sky Q.

On the next layer, we've got content delivery networks Obviously, there's many of them We've got video workflow with Harmonic We've got core services layer So this is a sort of wrapper

We'll see that on the next slide We're using Jenkins The sort of ketchup bottle is what we call Sauce Sauce is our configuration management tool that we've written in-house We're using Prometheus

We've got Grafana We've got Puppet for orchestration We've got ELK Stack for logging and information We've got container and orchestration with Docker Engine and Kubernetes We do run our own Kubernetes cluster

We're not using GKE at the moment. And, really, what I would call out as the only difference is the bottom layer. On premise you've got infrastructure, your typical Intel Xeon processors, et cetera, that cost you a lot of money and generally go out of date pretty quickly, because the upgrade cycle on servers is pretty fast, 12 to 18 months. And then on the bottom layer on the right-hand side, you've got a bunch of cloud stuff, hence us being here. So we'll go through it.

We've got Google Compute Engine, which we're using We've got Google Cloud Storage, Google Load Balancing, Cloud Router because we're dealing with video We need to reliably get the video to the cloud, not trusting the internet Could use the internet That's possible

But you want to take a more reliable approach to it. So using Google Direct Connect, having redundant 10-gig links delivering multiple channels into the cloud, is the sensible way to go. You need firewalls there because you need to protect your data, and you've got NAT and identity and access management, to name but a few. And really, that's the only difference there: that bottom infrastructure layer.

So what does the Single Channel Fault Domain look like? I've got an example here with the Formula One channel. I appreciate it's a little bit of a busy diagram. Working through from the top, in the blue box at the top there, we've got SDS, the top layer. It manages live channel configuration. That's the YAML files and stuff that I was talking about before: your 5 meg, your 4 meg, your 3 meg, your 2 meg, your 1 meg, your half meg.

That's your ABR ladder for customers, your protocols, your HLS, your Smooth, your DASH, DRM types and stuff, because you're a pay TV operator. It also manages capacity as well. We run our own Kubernetes cluster, and so we manage the capacity on it to make sure that we have enough resources to deliver the video at the right time, in the right place.

It also manages the metrics, as well So it's pulling all that in And what it's doing– it's aggregating the metrics from the whole stack So this is everything, basically, coming in from the left-hand side to the right-hand side It's also pulling in the client-side metrics, as well

So you actually get an end-to-end view, which is really, really cool for operations people because generally, they often complain they don't have enough information, or they don't have the right information So trying to fix that and give them that visibility is really key The purple box on the left hand side– it says, SDS video and quality monitoring So its purpose is to monitor the output of the mezzanine So this is your input into your transcoder

That's taking the 1080 source coming in and comparing it against the output, which is what the customers are consuming. So they might be getting 1080 outputs, 720, 576, and so on and so forth. This is really key, and I mentioned it earlier, because you want to make sure that you're putting good video out, that you're not dropping frames, that you're not having lip-sync issues, for example. So that's quite critical. And then you've got the other green box there, which is Harmonic content prep.

What I would call out here is really– do you see there's a gray box in the middle? It says, availability zone 1 And then it's surrounded by– it's a little bit hard to make out on this big screen, here But there's four boxes there There's transcode, package and storage There's encryption and origin server

What you'll notice is, there's a little Google Cloud in the middle there As we've seen in Melika's diagram, before, Google Cloud is a lot bigger than that But I couldn't fit it on the screen And then below, you've got availability zone 2 Really key is that we're doing this– this is highly available streaming

So in the normal world, we're only just doing the top layer And that's how we've been doing it for the last 10 years or so The innovation, really, has come around being able to do it in parallel So you've got two parallel streams coming in So if you work through the example from the left-hand side, you have mezzanine output going over the Google Direct Connect that I talked about

That's 10 gigabits redundant, coming from two different data centers, coming into a transcoder At the transcoder, you've got this– you see the blue and the red arrows? You're coming in and there's a black arrow in between the transcoders This is synchronization I show it in the yellow box below, where it says, frame accurate synchronization And frame accurate is actually key, because if you want to put advertising on here, you need to make sure that it's frame accurate

Package and storage formats, your HLS, your Smooth, your DASH: you need to be synchronized there as well. So you need to have a protocol that basically says, I'm here, I'm here, I'm here, I'm here.

Make sure that it's the same, because otherwise, on the right-hand side where it says active-active live synchronized content, with the little sort of globe there, you can't do something funky like going AZ1, AZ2, AZ1, AZ2. Because, you'll notice, those two red arrows allow us to load balance live content concurrently. And actually, I think Moore is going to give you a little demo of that later. So here, we're actually seeing that we can use both legs in parallel. These are in two different availability zones.
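
Here is a rough Python sketch of that "I'm here, I'm here" idea: the two legs are only treated as interchangeable while their newest segments line up, and requests are pinned to one leg otherwise. The data shapes, names, and tolerance are assumptions for illustration, not the actual synchronization protocol.

```python
# Rough sketch of frame-accurate leg alignment: only load balance across
# both availability zones while both packagers expose the same latest
# segment on the same timeline. All names and shapes are assumptions.

def legs_in_sync(leg_a: dict, leg_b: dict, max_skew_segments: int = 0) -> bool:
    """Two legs are interchangeable only if their newest segments line up."""
    same_timeline = leg_a["timescale"] == leg_b["timescale"]
    skew = abs(leg_a["last_segment"] - leg_b["last_segment"])
    return same_timeline and skew <= max_skew_segments

def pick_leg(leg_a: dict, leg_b: dict, request_id: int) -> str:
    """Alternate requests across AZ1/AZ2 while the legs stay in sync;
    otherwise pin everything to the leg that is furthest ahead."""
    if legs_in_sync(leg_a, leg_b):
        return "AZ1" if request_id % 2 == 0 else "AZ2"
    return "AZ1" if leg_a["last_segment"] >= leg_b["last_segment"] else "AZ2"

if __name__ == "__main__":
    az1 = {"timescale": 90000, "last_segment": 48210}
    az2 = {"timescale": 90000, "last_segment": 48210}
    print([pick_leg(az1, az2, i) for i in range(4)])  # alternates AZ1, AZ2
```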

So what it should say there is that they're in the same region So if we deployed this– let's use the US example We're west coast now So we went to Google's region in US west coast And we deployed this here

We're putting it into one AZ and a second AZ. This means that if one AZ had an issue, there is another one available to serve customers. So you've got redundancy both at data-center level, within the availability zones, and also at a channel level, because, going back to my earlier example, channel density was all about how many channels I can fit onto this server here. If you buy the most expensive server, the usual economics work out so that you can get more channels on it. The problem is that the fault domain of that box is the number of channels you can get on it. So if you can get five channels on it and that box breaks, then five channels are affected.

You have, now, an outage Go back to my example about software upgrades Software upgrades come out every few months They might fix bugs They might introduce features

But when that happens, you're going to take down five channels, and this happens quite a lot. So we're trying to avoid that. So on my next slide: I've just talked about the happy path, when everything's working great. What happens when something goes wrong? Because we can't assume that everything is great.

I've got something's gone red here This is in availability zone 1 My packager has gone wrong I don't really know– perhaps it's just this example– why that's happened I don't really care, actually

What I care about is the customer experience. So the customers on the, excuse me, on the right-hand side must carry on working. They must keep on serving the CDNs. They must keep filling up the customer's client buffer on playback. And it must be consistent.

So don't interrupt that customer experience So the way that you go about this, in this example, is two things happen Firstly is that you need to detect there's a fault So in the bottom box here, the packager in AZ1 has failed And so this means a loss of resilience, not a loss of channel

And that's a big distinction because you're still on air You're still making money And you're not getting customer tickets Customer's not going to call the call center to say, hey, I've just noticed you had a loss of resilience, because they can still watch "Game of Thrones" or the football or something So the second thing you need to do, on the right-hand side, is, you see there's a red x there where there previously was an arrow

You need to be able to make sure that you stop sending traffic to the broken leg So in here, my example in availability zone 1, that leg is gone Basically, it's not updating So the manifest is not updating There's no fresh video because the packager is no longer outputting video

So therefore, it can't be encrypted. Therefore, it can't be delivered by the origin server. And going back to my example, AZ1 bad, AZ2 good: that model doesn't work.

That's a bad customer experience What you need to do is make sure that you're only going to AZ2 So it's AZ2, AZ2, AZ2, AZ2, because the other leg is fine It's operationally good Now the way that we've solved this problem is, Kubernetes will do, obviously, the monitoring

It will detect that the packager in AZ1 has failed and rebuild it. I'll refer you back to what I was talking about a few minutes ago with the YAML files. The YAML files are the state file of the channel. So in the Sky Sports 1 example, we can return it back to how it was 30 minutes ago, before the fault occurred. We don't know why it's happened.

What we're trying to do is restore that service as quickly as we can and put it back to how it was 30 minutes ago Once we've done that and the health checks pass and the channel resilience is restored, we actually go back to the previous slide, where it's now in AZ1 It's actually working again And that's, obviously, really cool I'll just call out the bottom things on the encryption
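
Before the encryption details, here is a minimal Python sketch of that detect, drain, and restore loop. The 10-second freshness threshold, the function names, and the restore hook are assumptions for illustration rather than Sky's actual Kubernetes tooling.

```python
# Minimal sketch of the detect / drain / restore loop described above:
# if a leg's manifest stops getting fresher, take it out of rotation,
# rebuild it from its declared (YAML) state, and re-admit it once healthy.
# Names, the 10 s freshness threshold, and the helper calls are assumptions.
import time

MANIFEST_STALE_AFTER_S = 10

def leg_is_healthy(last_manifest_update: float, now: float) -> bool:
    """A leg is healthy while its manifest keeps updating with fresh video."""
    return (now - last_manifest_update) < MANIFEST_STALE_AFTER_S

def reconcile(active_legs: set, manifest_updates: dict, restore_leg) -> set:
    """Drop stale legs from the CDN rotation and trigger a rebuild for them;
    this is a loss of resilience, not a loss of channel, as long as one
    healthy leg remains in active_legs."""
    now = time.time()
    for leg, updated_at in manifest_updates.items():
        if leg in active_legs and not leg_is_healthy(updated_at, now):
            active_legs.discard(leg)     # stop sending traffic to the broken leg
            restore_leg(leg)             # rebuild from the channel's YAML state
        elif leg not in active_legs and leg_is_healthy(updated_at, now):
            active_legs.add(leg)         # health checks pass, resilience restored
    return active_legs

if __name__ == "__main__":
    legs = {"AZ1", "AZ2"}
    updates = {"AZ1": time.time() - 60, "AZ2": time.time()}  # AZ1 has stalled
    legs = reconcile(legs, updates, restore_leg=lambda leg: print("rebuild", leg))
    print("serving from:", sorted(legs))
```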

We're a pay TV operator, so we have digital rights management: things like PlayReady, Widevine, other DRMs, et cetera. And the origin, obviously, is serving it out encrypted. As for the tools that we use to do this, to actually recover service: obviously, we operate our own Kubernetes cluster. In the two AZs, we have independent clusters.

So the one in AZ1 is independent of AZ2. We run them in parallel, and we're doing the same workload in parallel, twice. We're using Sauce. Sauce is something that we wrote ourselves; it's our configuration management tool.

And it basically goes and manages that So we use, obviously, things like Git repos And when an engineer wants to make changes– for example, they have to check in and they have to check out We've got version control on it, as well And I don't have a slide on it, but from a confidence point of view, we actually go through five different realms on all our paths to production

We go through a development cycle So we have a development environment We go through a test environment We go through a non-functional test environment, where we're injecting load And we go through a stage, which is all about testing and making sure that it's reliable, because it's very important in video

It's no good if it stays up for 24 hours and falls over; you need it to run for two weeks. And then, obviously, in production, you show what you're doing. One of the things that I will just finish on is, when we talked to Sky Sports about this, we went through the economics of the cloud. One of the things that was called out was, when we did the number crunching on it, they had a system that had been operational for eight years.

And when we went through and said, look, that hardware is no longer supported We can't get spare parts We can't get software upgrades for it We need to go and replace that So we came out, and it was a big number

It was a high six-figure number. So we went to them and said, look, how often do you actually use that system? Because the Formula One is only on once every other week. The football is on several times a week. What do you do at this time of day when there's no football on and there's no Formula One? Well, it's just sat there. Well, maybe that's not the most cost-effective way to do it.

So we asked them, how many hours of programming do you have? It turned out to be about 1,000 hours of programming per year for various events. And we said, well, surely there's got to be a more cost-effective way of doing it. Could we consider an opex-type approach and move those workloads to the cloud? So we did that. And it is a significant saving; it's less than half the cost.
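
As a purely hypothetical back-of-the-envelope version of that comparison, the Python below contrasts an amortised on-premise refresh with paying per channel-hour for roughly 1,000 event hours a year. Every number is a made-up placeholder, not a Sky or Google figure; only the shape of the calculation is the point.

```python
# Hypothetical back-of-the-envelope comparison of the two models discussed:
# a dedicated on-premise system that sits mostly idle versus paying per
# channel-hour in the cloud. Every number below is a made-up placeholder.

HOURS_PER_YEAR = 1_000            # roughly the event hours mentioned above

def on_prem_annual_cost(refresh_cost: float, refresh_years: int,
                        annual_opex: float) -> float:
    """Capex amortised over the refresh cycle, plus running costs."""
    return refresh_cost / refresh_years + annual_opex

def cloud_annual_cost(cost_per_channel_hour: float, channels: int,
                      hours: int = HOURS_PER_YEAR) -> float:
    """Pay only for the hours the channels are actually on air."""
    return cost_per_channel_hour * channels * hours

if __name__ == "__main__":
    on_prem = on_prem_annual_cost(refresh_cost=600_000, refresh_years=3,
                                  annual_opex=50_000)
    cloud = cloud_annual_cost(cost_per_channel_hour=20, channels=4)
    print(f"on-prem ~ ${on_prem:,.0f}/yr, cloud ~ ${cloud:,.0f}/yr")
```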

And it's more flexible And we don't have all those pain points that we talked about before We don't have to worry about infrastructure That's Google's problem We don't have to worry about storage, for example

That's Google's problem Load balancing– it's Google's problem Part of the– there we go Thank you I'm overtime

MOORE MACAULEY: Hello, everyone Thank you, Jeff Thank you for a very interesting presentation You get to enjoy another accent I'm originally from Ireland

And I'm going to try and speak very slowly, because I've got a tendency, when I'm nervous in front of big crowds, to speak awfully quickly Thank you very much So I'm from Harmonic And we believe in smarter, faster, simpler video streaming And we were very excited to be involved with Sky and Google on this project

Just a little bit of background on who Harmonic are Some of you may never have heard of us, although I'm sure that over the years, you've watched video that's flowed through many of our devices We've been around for over 20 years We are headquartered in Silicon Valley If you happen to drive along the 237, we're about North First Street

Around the world, we have over 5,000 customers in the media and service provider sector We started off building appliances And if we look at the number of over-the-top channels we have deployed, we entered the streaming market about when Jeff did, many years ago We have 38,000 live channels deployed today Most of those channels are deployed on appliances

But many of them are now deployed in our software-as-a-service offering. In fact, if I look at our software-as-a-service offering, we actually deliver video from our origin servers to over 65 million subscribers who pay those customers of ours every month for the video services that they watch. Subscription video is really difficult. As Jeff has described, you have to provide a really high quality of service.

And our goal these days is really to make sure that we maximize that quality of experience for our customer's customers So how did we do that? How did we make the transition from what was effectively a box manufacturer in Silicon Valley to someone who provides a software as a service offering? It was no small change But as the product architect, it was my job, six years ago, to kick off a project that looked at everything Harmonic did for its customers from end to end and to think about how we could re-envisage that from a hardware delivery to a software delivery For customers, we take care of many things We take care of ingesting content

We take care of play out, graphics, transcoding, over the top Even into the traditional cable world and satellite world, we provide statistical multiplexing solutions We encrypt the content We package it We provide origin serving functionality, manifest manipulation, and monitoring

And there was a very traditional way of delivering that in the appliance world– very simple in terms of boxes delivering functionality Everybody understood exactly how it was deployed What we needed to do was, we needed to think about it, reinvent it Reinvent it and deliver something that has a different architecture It's based on containers

And think about all of those problems that you have described How do you effectively provide one-for-one resilience? How do you provide upgradability? And we did that in a product that we called VOS, which is our cloud-native media processing platform And it's what Jeff used And we also deliver that as a SAS offering, vOS 360, to our customers When I look at Sky, Sky gave me all of the usual requirements that anyone starts off with a live project

They really wanted to hit all of the devices they needed, all of the streaming formats, to make sure they got onto those devices. Closed captioning. They needed timed metadata because they were doing sports: you want to be able to know that a goal has gone in and send notifications. And they really wanted ease of installation, ease of monitoring, and ease of recovery.

This is what I hear from every customer. But what was different about Sky's requirements was that desire to make it better: no single point of failure. And although Jeff didn't mention it, he also put a second constraint on it, which was that he didn't want any additional delay to his sports. Because, you know what? We could have built a very, very redundant system if we had delayed it by another 30 seconds. But he said no

Don't want any more delay to my sports They need to be faster, not slower So that made it a little bit harder I think we've been 45 minutes into this presentation And it's time you saw some video

But before I do, I'm just going to show you one diagram This is the diagram of the video that I'm going to show you, how we put it together We deployed it on Google Cloud We went a little bit farther than Jeff asked for We put it in separate zones, not just separate availability zones

And we used GKE, not GCE, because I like it when Google deploys Kubernetes for me. Like Jeff said, I don't want to worry about that, either. We built this system. Let's see what that A/B switching looks like from a customer standpoint. So here's some beautiful content.

And while you watch the beautiful content, it does say it's in 4K and HDR And although I'm standing in a theater today, I'm really sorry to tell you what you're watching isn't in 4K and HDR I wish that it was However, if at home you would love to see it and you happen to have a Roku streaming device, there is a NASA channel on the Roku streaming device that we provide the back end for that will allow you to see this same content in 4K and HDR And breathtaking as the content is here, it's even more breathtaking there

While you've been watching the video, perhaps you've noticed the VOS 360 logo popping back and forward from the left and right of the screen. That wasn't put there by a professional. That was put there by me, which is why it's so accurately positioned. So what I did, actually, was go into those two transcoding instances that Jeff showed you previously, and I configured one of them to put the logo on the left side and the other one to put the logo on the right side.

A little bit of a difference And so that's why you get to see it popping back and forward from either side But it is beautiful content, and I do encourage you to go and have a look at it I'm a great believer in NASA I love it

And I love the content they produce. Some of the moonrises are amazing to watch. We did go even further than just zone 1 and zone 2. We offer our software not just as a SaaS; I told you, we sell it to customers on premise.

So here is another diagram where you could actually do that same synchronization with an on-premise cluster and a cluster in GCP or GKE. I think we actually did deliver on what Sky wanted. And the challenge that we really had to solve was the one called the CAP Theorem, or Brewer's Theorem, for the computer scientists among you, who already know what it is. For those of us who don't, basically, if we partition two things and we drop information between them, then we have to make a decision about whether or not we serve every request.

And sometimes the requests are different. Or we only serve requests from one side when we know there are problems. We don't want the video to jump back and forward; it's got to look seamless, like it did here. So we sacrifice availability.

And the whole challenge was to make your availability numbers for that backup system as high as possible. That delivered Sky benefits. It gave them improved fault tolerance. It gave them the ability to do green-blue upgrades. It gave them the ability to do A/B testing.

Way more than we originally thought of when we started off looking at the fault tolerance of this project. I think, as Harmonic, in this project we really did deliver smarter, faster, simpler video streaming. And it's working with customers like Sky and working with people like Google that enables us to deliver on some of the most fantastic projects in the world, with some of the greatest content: Formula One, NASA. It's a great place to be.

Melika, do you have a question for both of us? Or are we out of time? MELIKA GOLKARAM: I think we're out of time MOORE MACAULEY: All right Well, I don't get to answer a question Thank you very much, indeed Thank you, Melika

Thank you, Jeff It's been great to be here I hope you enjoyed it [APPLAUSE]