So, who is Netflix?

Netflix is an amazing digital success story. Starting out more than 15 years ago as a predominantly DVD subscription service, Netflix was able to pivot along the way and take advantage of rapidly evolving mobile technology and ever-improving internet speeds to become one of the largest video distribution networks on the planet.

We know Netflix mainly as an online streaming service for TV shows and movies. What most people do not recognize is that Netflix is also a highly relevant technology company that plays an active role in the software development and open-source communities.

The need to re-architect

Netflix has more than 37 million subscribers in 50+ countries and territories. Over one billion hours are streamed every month in North America alone. This is why the architecture of a system like Netflix's has to be a top priority for its engineers. The Netflix API has to handle thousands of concurrent requests and still perform well.

The Netflix API is a facade on top of the service-oriented architecture (SOA) internal to Netflix. It gives the many devices that run Netflix access to the functionality each of them needs.

As Netflix scaled and grew as a business (the number of devices accessing the service daily was increasing rapidly), some limitations surfaced. Their RESTful API was too generic and optimized for nobody. Some devices were making dozens of calls for a single, simple user experience. This was getting out of hand, because the need to customize the user experience was obvious.

Netflix decided to re-architect their system. They wanted to reduce the number of calls for a single user experience to just one and move all of that logic to the server layer, eliminating redundant network calls and leveraging the far more advanced concurrency capabilities of servers over the smaller, older devices that also support Netflix.
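
To make the idea concrete, here is a minimal sketch of a server-side endpoint that fans out to several internal services and returns one combined response. It uses the JDK's CompletableFuture purely as a stand-in and does not claim to be how Netflix implemented it; the class HomeScreenEndpoint, the three fetch methods and the returned strings are all hypothetical.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical endpoint: one client request fans out to several backend
// services on the server and comes back as a single combined response.
class HomeScreenEndpoint {

    // Stand-ins for internal service calls; names and data are invented.
    static CompletableFuture<String> fetchUser(int userId) {
        return CompletableFuture.supplyAsync(() -> "user-" + userId);
    }

    static CompletableFuture<String> fetchRecommendations(int userId) {
        return CompletableFuture.supplyAsync(() -> "recommendations-for-" + userId);
    }

    static CompletableFuture<String> fetchRatings(int userId) {
        return CompletableFuture.supplyAsync(() -> "ratings-for-" + userId);
    }

    // All three calls are started before any result is awaited, so they can
    // run concurrently; the device only ever makes one network call.
    static CompletableFuture<String> homeScreen(int userId) {
        CompletableFuture<String> user = fetchUser(userId);
        CompletableFuture<String> recs = fetchRecommendations(userId);
        CompletableFuture<String> ratings = fetchRatings(userId);
        return user.thenCombine(recs, (u, r) -> u + "|" + r)
                   .thenCombine(ratings, (ur, ra) -> ur + "|" + ra);
    }
}
```

A device-specific endpoint would call something like HomeScreenEndpoint.homeScreen(7) and serialize the combined result, instead of the device issuing three separate requests over a slow network.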

Netflix also wanted to remain a fairly polyglot development environment. They did not want a single team to be responsible for developing these customized endpoints. Every team should be able to develop its own web services.

Explaining the Problem

Let's sum this up. Why did a growing digital business like Netflix need to find a better architecture?

Netflix was experiencing huge growth in subscribers, which meant that the Netflix API had to deal with more and more requests. The need for the system to scale was obvious.

Performing asynchronously was a must-have feature. The Netflix team decided to refactor the API endpoints so that the client sends only one request per user experience. The client logic was pushed to the service layer, eliminating redundant calls. However, this means that the server has to deal with concurrent conditional execution and perform several asynchronous tasks in order to deliver the best user experience.

It was a priority that these API endpoints and services could be created by all the development teams, and not just the "API team". If only one team were in charge of developing and maintaining the entire API, it would easily become a bottleneck. This means remaining a fairly polyglot development environment, so that the teams could create web services in the languages they were already using.

Enter RX

Method signatures like public Data getData() were locking the contract of the service layer into a lot of decisions that made change difficult. One example is changing the behaviour of a service from synchronous to asynchronous: a method that returns Data directly forces the caller to wait for it.

As we saw before, asynchronicity was a requirement for the Netflix API, and change is a constant in the software development process.

The concepts of Reactive Extensions (RX) started to stand out as a very pure way of thinking about asynchronous development. They were presented as a simple enhancement of the Gang of Four observer pattern, moved into the asynchronous world.

As the principles Erik Meijer presented at Microsoft started to sink in, the Netflix team chose to create RxJava and apply it to their API. This became the standard. But why is RX better than the previously established, "battle-tested" alternatives? Let's discuss this further.

Why is RX better?

What we're looking for is a way to implement robust asynchronous behaviour. So, why not use the Future interface? This would look something like public Future<T> getData().

The Future interface has existed for quite a long time and, at first glance, it seems like a pretty good way to implement asynchronous behaviour. However, this solution is flawed. For instance, if you want to perform conditional logic on the result, you have to dereference the value, and the calling thread blocks. This is hard to avoid when nested conditional logic is involved. To avoid blocking, you need to spawn other threads, and as you start to wrap threads around threads, the whole architecture becomes messy and performance suffers.
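
A small sketch of the problem, using only java.util.concurrent. The plan and catalog lookups are invented for illustration and are not Netflix code; the point is that each get() parks the calling thread until the value is available.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BlockingFutureExample {

    // Hypothetical lookup: fetch the user's plan, then a catalog chosen by that plan.
    static String loadCatalog(ExecutorService pool, int userId)
            throws InterruptedException, ExecutionException {
        Future<String> plan = pool.submit(() -> userId % 2 == 0 ? "premium" : "basic");

        // The conditional logic needs the value, so the calling thread blocks here.
        String planName = plan.get();

        Future<String> catalog;
        if (planName.equals("premium")) {
            catalog = pool.submit(() -> "hd-catalog");
        } else {
            catalog = pool.submit(() -> "sd-catalog");
        }
        // A second blocking dereference for the dependent call.
        return catalog.get();
    }

    // Convenience wrapper that owns the thread pool for the duration of one call.
    static String run(int userId) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            return loadCatalog(pool, userId);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With only two dependent steps the blocking is easy to spot. With nested conditions, every branch that needs a value adds another get(), or another thread whose only job is to wait.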

Are there other solutions?

The Guava library supplies futures with callbacks (ListenableFuture). Instead of blocking, we register callback handlers that are invoked asynchronously when the result is ready. CompletableFuture in Java 8 also allows callbacks to be attached to a future. If not for RX, this would probably be a reasonable solution.
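
A sketch of the callback style with Java 8's CompletableFuture. A hypothetical plan lookup feeds a dependent catalog lookup; the method names and values are assumptions made up for this example.

```java
import java.util.concurrent.CompletableFuture;

class CallbackFutureExample {

    // Hypothetical asynchronous lookup of a user's plan.
    static CompletableFuture<String> fetchPlan(int userId) {
        return CompletableFuture.supplyAsync(() -> userId % 2 == 0 ? "premium" : "basic");
    }

    // Hypothetical asynchronous lookup of the catalog that matches a plan.
    static CompletableFuture<String> fetchCatalog(String plan) {
        return CompletableFuture.supplyAsync(
                () -> plan.equals("premium") ? "hd-catalog" : "sd-catalog");
    }

    // The dependent step is registered as a callback with thenCompose, so no
    // thread sits in get() waiting for the plan before asking for the catalog.
    static CompletableFuture<String> loadCatalog(int userId) {
        return fetchPlan(userId).thenCompose(CallbackFutureExample::fetchCatalog);
    }
}
```

The caller receives a CompletableFuture<String> immediately and can attach further callbacks to it, for example loadCatalog(2).thenAccept(System.out::println).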

However, Observables have the advantage of handling vector (multi-valued) responses as well as simple scalar ones. The Observable is one step further in abstraction, because it allows higher-order functions and composition over both vector and scalar responses. In practice, instead of exposing blocking APIs, you expose Observable APIs. This frees the implementation to run either synchronously or asynchronously. The client code treats all interactions as asynchronous, while the API implementation decides whether something is blocking or non-blocking and which resources are used.
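
The sketch below illustrates that last point with a deliberately tiny, hand-rolled MiniObservable. It is not the RxJava API, only a hypothetical stand-in with a subscribe method and a map operator, and VideoService and its titles are invented. One source emits synchronously from a cache and the other emits from another thread, yet the client code is identical for both.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical, minimal observable; NOT the RxJava Observable class.
interface MiniObservable<T> {
    // Pushes zero or more values to onNext, then signals completion.
    void subscribe(Consumer<T> onNext, Runnable onCompleted);

    // Higher-order composition that works the same for one value or many.
    default <R> MiniObservable<R> map(Function<T, R> f) {
        return (onNext, onCompleted) ->
                subscribe(value -> onNext.accept(f.apply(value)), onCompleted);
    }
}

class VideoService {
    // Synchronous implementation: data is cached, so emit on the caller's thread.
    static MiniObservable<String> titlesFromCache() {
        return (onNext, onCompleted) -> {
            for (String title : Arrays.asList("a", "b")) onNext.accept(title);
            onCompleted.run();
        };
    }

    // Asynchronous implementation: emit from a background thread.
    static MiniObservable<String> titlesFromNetwork() {
        return (onNext, onCompleted) -> CompletableFuture.runAsync(() -> {
            for (String title : Arrays.asList("c", "d")) onNext.accept(title);
            onCompleted.run();
        });
    }

    // Client code: subscribe, collect, and (only for this demo) wait for completion.
    static List<String> collectUpperCase(MiniObservable<String> source) {
        List<String> out = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch done = new CountDownLatch(1);
        source.map(title -> title.toUpperCase()).subscribe(out::add, done::countDown);
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out;
    }
}
```

Swapping titlesFromCache() for titlesFromNetwork() changes where the work runs without touching collectUpperCase, which is the decision the API owner keeps for itself.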

Conclusion

  • Using RxJava, Netflix is able to develop asynchronous web services in the service layer of their architecture, eliminating plenty of redundant calls to their API from the client side.
  • This allows the control of the service logic to be inverted. The way a client-side request is handled is completely up to the development team and can be done in different ways, synchronously or asynchronously. Synchronous operation can be the right option in some cases, for example when the data is already cached.
  • This level of abstraction improves code maintainability, as well as performance and scalability.
  • The owner of the API should retain control over the concurrency behaviour.
  • The multiple development teams are able to develop their own endpoints and services and deploy them on top of the platform, in the production environment. The Netflix API team left the business of developing web services and is dedicated entirely to structuring and maintaining the platform upon which the rest of the teams build and deploy their own services.