Monday, November 7, 2016

Sharing and Caching Assets Across Single-Page BFFs

Over the last couple of posts I described the single-page BFF pattern (a variation of the Backend-For-Frontend, aka BFF, pattern where the BFF serves only one page in a web application) and showed how you will likely need to place single-page BFFs behind a reverse proxy.

In this post I will outline an approach to handling shared assets - like CSS or Javascript that is used on all or most of the pages - in a web application built using a number of single-page BFFs.

When building a web application consisting of a number of pages, there is likely both some look-and-feel and some behavior that we want to share across all pages - buttons should look and behave the same, menus should appear on all pages, colors and fonts should be consistent, etc. This calls for sharing some CSS and some Javascript across the pages. When thinking about how to handle these shared things across a number of single-page BFFs, there are some competing requirements to take into consideration:

  • Clients - i.e. browsers - should be able to cache the CSS and Javascript as the user moves through the web application. The alternative is that the client downloads the same CSS and JS over and over as the user navigates the web application. That would be too wasteful in most cases.
  • Each single-page BFF should be individually deployable. That is, we should be able to deploy a new version of one single-page BFF without having to deploy any other services. If not, the system soon becomes unmanageable. Furthermore, developers should be able to run and test each single-page BFF by itself without having to spin up anything else.
  • I do not want to have to write the same CSS and JS in each single-page BFF. I want to share it.

In this post I will show an approach that allows me to achieve all three of the above.

First notice that the obvious solution to the first bullet - allowing clients to cache shared CSS and JS - is to put the shared CSS and the shared JS in two bundles - a CSS bundle and a JS bundle - and serve the bundles from a known place. That way each single-page BFF can just address the bundles at that known place, as shown below. Since all pages refer to the same bundles, the client can cache them.


The problem with this is that it conflicts with the second half of bullet 2: any given single-page BFF only works when both global bundles are available, so for a developer to work on a single-page BFF, they need not only the BFF but also something to serve the bundles. If we are not careful, this approach can also violate the first half of bullet 2: since the single-page BFFs depend on the two global bundles, but do not contain them, each one has a hard dependency on another service - the one that serves the bundles. That means we can easily end up having to deploy the service that serves the bundles at the same time as a new version of one of the single-page BFFs.

Looking instead at bullet 3 first suggests that we should put the shared CSS and Javascript into a package - an NPM package, that is - and include that package in each single-page BFF. That means each single-page BFF has the bundles it needs, so developers can work with each single-page BFF in isolation. It also means that each single-page BFF continues to be individually deployable: when deployed, each brings along the bundles it needs, so there is no dependency on another service to serve the bundles. The problem now becomes how to allow clients to cache the bundles: when the bundles are included in each single-page BFF, each page will use different URLs for the bundles - completely defeating HTTP cache headers.
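As a minimal sketch of what this can look like in a Node/Express based single-page BFF - the package name @ajax/shared-assets, the /assets route and the port are made up for illustration - the bundles are simply served from the package installed into the BFF itself:

```typescript
// Sketch of a single-page BFF that carries its own copy of the shared bundles,
// installed via a (hypothetical) @ajax/shared-assets NPM package.
import express from "express";

const app = express();

// Serve the bundles that ship with the shared-assets package. A real build
// would resolve the package path properly or copy the bundles into the BFF's
// own static folder at build time.
app.use("/assets", express.static("node_modules/@ajax/shared-assets/dist"));

// The page rendered by this BFF references the bundles the BFF carries itself.
app.get("/", (_req, res) => {
  res.send(`<!doctype html>
<html>
  <head>
    <link rel="stylesheet" href="/assets/shared.css">
    <script defer src="/assets/shared.js"></script>
  </head>
  <body>...frontpage markup...</body>
</html>`);
});

app.listen(3001);
```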


Luckily, as discussed in the last post, the single-page BFFs are already behind a reverse proxy. If we put a bit of smarts into the reverse proxy, we can include the bundles in each single-page BFF and still let every page use the same URLs for the bundles, like so:


The trick to implementing this is to have the reverse proxy look at the referrer of any request for one of the bundles: if the referrer is the frontpage, the request is proxied to the frontpage single-page BFF; if the referrer is the my account page, the request is proxied to the MyAccount single-page BFF. That means the bundles loaded on the frontpage are the bundles deployed in the frontpage single-page BFF, and likewise the bundles loaded on the my account page come from the MyAccount single-page BFF. But from the client's perspective the bundles are at the same URLs - i.e. the client is able to cache them based on HTTP cache headers.
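To make the trick concrete, here is a small sketch of just the routing decision - the actual proxying is still done by whatever reverse proxy you use, and the host names are the example ones from the previous post:

```typescript
// Sketch: choose the upstream single-page BFF for a bundle request based on
// the Referer header. The mapping from page path to BFF host is illustrative.
const bffByPage: Record<string, string> = {
  "/": "https://frontpage.bff.ajax.com",
  "/account": "https://myaccount.bff.ajax.com",
};

function upstreamForBundleRequest(referer: string | undefined): string {
  // No usable referrer: fall back to the frontpage BFF.
  if (!referer) return bffByPage["/"];
  // "https://www.ajax.com/account" -> "/account"
  const page = new URL(referer).pathname;
  return bffByPage[page] ?? bffByPage["/"];
}

// A request for /assets/shared.js made from the my account page goes to the
// MyAccount BFF; the same URL requested from the frontpage goes to the
// frontpage BFF - but the client sees one URL and can cache the response.
upstreamForBundleRequest("https://www.ajax.com/account"); // -> https://myaccount.bff.ajax.com
upstreamForBundleRequest("https://www.ajax.com/");        // -> https://frontpage.bff.ajax.com
```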

To sum up: put shared CSS and JS bundles in an NPM package that all the single-page BFFs use, and make the reverse proxy use the referrer on requests for the bundles to decide which single-page BFF to proxy them to. That lets clients cache the bundles, keeps the single-page BFFs individually deployable, and lets developers work with each single-page BFF in isolation.

Wednesday, October 5, 2016

Using Single-Page BFFs and Hiding It

In the last post I wrote about a variation of the BFF - Backend-for-frontend - pattern: the single-page BFF, where the BFF does not support a full web frontend, but only a single page. The reason to use this pattern is that even BFFs can grow out of hand, at which point we can consider cutting the BFF into several single-page BFFs, each responsible for supporting just one page.

Now imagine we have a web site that uses the single-page BFF pattern extensively. As users navigate through the site and visit different pages, they will in effect interact with different single-page BFFs. Each such single-page BFF is its own service and consequently has its own host name - e.g. the frontpage BFF might be at https://frontpage.bff.ajax.com, the "my account" page might be at https://myaccount.bff.ajax.com/ and so on for each logical page on the site. Like this:


The problem we now have is that if users access these addresses - https://frontpage.bff.ajax.com, https://myaccount.bff.ajax.com etc. - we are exposing the way we have chosen to implement the system on the server side to our users. That's bad: users will bookmark these, and then we are tied to keeping these host names alive, even if we decide to rearchitect the server side to use some other pattern than single-page BFFs. Instead I would like users to just stay on the same domain all the time - like https://www.ajax.com/ - and simply visit different pages on that domain - e.g. https://www.ajax.com/ and https://www.ajax.com/account. So how do we make ends meet? We simply set up a reverse proxy in front of the single-page BFFs. Like so:


This is a standard thing to do and can easily be done with widespread technology, like nginx, squid or IIS with ARR. In other words: there is nothing new here, I am only pointing out that using the single-page BFF pattern will lead to a need to set up a reverse proxy in front of your services.
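To make the mapping concrete, here is a rough sketch in Node/TypeScript of what the reverse proxy does - in practice this would just be nginx, squid or ARR configuration rather than code you write yourself, and the host names are the examples from above:

```typescript
// Sketch of path-based routing: requests on www.ajax.com are forwarded to the
// single-page BFF responsible for the requested page.
import http from "node:http";

const routes = [
  { prefix: "/account", upstream: "myaccount.bff.ajax.com" },
  { prefix: "/", upstream: "frontpage.bff.ajax.com" }, // catch-all, must be last
];

http
  .createServer((req, res) => {
    const url = req.url ?? "/";
    const route = routes.find(r => url.startsWith(r.prefix))!; // "/" always matches
    // Forward the request to the chosen BFF and stream its response back.
    const upstreamReq = http.request(
      {
        host: route.upstream,
        path: url,
        method: req.method,
        headers: { ...req.headers, host: route.upstream },
      },
      upstreamRes => {
        res.writeHead(upstreamRes.statusCode ?? 502, upstreamRes.headers);
        upstreamRes.pipe(res);
      }
    );
    req.pipe(upstreamReq);
  })
  .listen(8080);
```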

Wednesday, September 7, 2016

Single-Page BFFs

In this post I describe a variation of the backend for frontend (AKA BFF) pattern that Søren Trudsø made me aware of. With this variation we create not just a backend for each frontend - one backend for the iOS app, one for the Android app and one for the web site, say - but a backend for each page in the web frontend: A single-page BFF.

The BFF pattern is a pattern for building applications on top of a system of microservices. I describe it in my Microservices in .NET Core book, but you can also find good descriptions of the pattern on the web - for instance from Sam Newman. To sum it up, the BFF pattern says that you create a microservice for each frontend you have. That is, if you have an iOS app there is an iOS BFF, if you have an Android app there is an Android BFF, and if you have an Oculus Rift app there is an Oculus Rift BFF. Each BFF is a microservice with the sole responsibility of serving its frontend - the Android BFF is there solely to serve the Android app and does not care about the Oculus Rift app whatsoever. The BFFs do nothing more than gather whatever their frontends - or apps - need and serve that in a format convenient to the frontend. The BFFs do not implement any business logic themselves; all that is delegated to other microservices. This figure illustrates the setup:


In this setup each BFF tends to grow as its frontend grows. That is, the web BFF tends to grow as the web frontend grows: When more functionality is added to existing pages, that functionality might need new endpoints to make AJAX requests to, and thus the web BFF grows a little bit. When new pages are added to the web frontend, the web BFF also grows.
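To make the division of responsibility concrete, here is a rough sketch of a single endpoint in a web BFF - it only gathers data from other microservices and reshapes it for the page; the downstream service URLs and the response shape are made up for illustration:

```typescript
// Sketch of a web BFF endpoint: fan out to the microservices that own the
// data, then serve it in exactly the shape the page needs. No business logic.
// (Uses the global fetch available in newer Node versions.)
import express from "express";

const app = express();

app.get("/api/frontpage", async (_req, res) => {
  const [products, basket] = await Promise.all([
    fetch("http://productcatalog/api/featured").then(r => r.json()),
    fetch("http://shoppingcart/api/basket/summary").then(r => r.json()),
  ]);
  res.json({ featuredProducts: products, basketItemCount: basket.itemCount });
});

app.listen(3000);
```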

Sidenote: I realize that the term "page" on a web site is somewhat fuzzy these days: Single page apps routinely swap the entire view from one thing to something completely different, giving the user the experience of going to a new "page". In this post I use the term "page" in the more traditional sense of a full page reload. You know, that thing when you follow a link and the browser loads a completely new HTML document from a new URL. I think you've encountered it before :D


The size of the web BFF might not be a problem at first (or ever), but at some point enough may have been added to the web frontend to make it a problem. In this situation I have found it useful to break the web BFF down along page boundaries: instead of having one BFF serve the entire web frontend, I will have one BFF for each page on the web site, like so:


This way the BFFs are kept small and focused on a single task, namely serving a single page.

Notice that one or more of the pages here can be single page apps that include several views, so there need not be a direct correspondence between what the user perceives as separate views - or pages - and the single-page BFFs on the backend. Rather, in such cases, there is a BFF for each single page app.

Wednesday, February 17, 2016

Book Excerpt: Expecting Failures In Microservices and Working Around Them


This article was excerpted from the book Microservices in .NET.

When working with any non-trivial software system, we must expect failures to occur. Hardware can fail. The software itself might fail due, for instance, to unforeseen usage or corrupt data. A distinguishing factor of a microservice system is that there is a lot of communication between the microservices.

Figure 1 shows the communication resulting from a user adding an item to his/her shopping cart. From figure 1 we see that just one user action results in a good deal of communication. Considering that a system will likely have concurrent users all performing many actions, we can see that there really is a lot of communication going on inside a microservice system.

We must expect that communication to fail from time to time. The communication between any two microservices may not fail very often, but across a microservice system as a whole, communication failures are likely to occur often simply because of the amount of communication going on.

Figure 1 In a system of microservices, there will be many communication paths

Since we have to expect that some of the communication in our microservice system will fail, we should design our microservices to be able to cope with those failures.

We can divide the collaborations between microservices into three categories: query-, command- and event-based collaborations. When a communication fails, the impact depends on the type of collaboration and the way the microservices cope with it:
  •  Query based collaboration: When a query fails, the caller does not get the information it needs. If the caller copes well with that, the impact is that the system keeps on working, but with some degraded functionality. If the caller does not cope well, the result could be an error.
  • Command based collaboration: When sending a command fails, the sender won’t know if the receiver got the command or not. Again, depending on how the sender copes, this could result in an error, or it could result in degraded functionality.
  • Event based collaboration: When a subscriber polls an event feed, but the call fails, the impact is limited. The subscriber will poll the event feed again later and, assuming the event feed is up again, receive the events at that time. In other words, the subscriber will still get all events; some of them will just be delayed. This should not be a problem for an event-based collaboration, since it is asynchronous anyway (see the sketch after this list).
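As a sketch of the event-based case, here is a subscriber that polls an event feed and simply keeps its position when a poll fails - the feed URL and the event shape are made up for illustration:

```typescript
// Sketch: a subscriber polling an event feed. A failed poll only delays
// events; none are lost, because the subscriber's position is unchanged.
type FeedEvent = { sequenceNumber: number; payload: unknown };

let lastSeen = 0;

async function pollEventFeed(): Promise<void> {
  try {
    const response = await fetch(`http://specialoffers/events?after=${lastSeen}`);
    const events: FeedEvent[] = await response.json();
    for (const event of events) {
      handle(event);
      lastSeen = event.sequenceNumber;
    }
  } catch {
    // The feed is unreachable: do nothing. The next poll starts from the same
    // position, so the events will simply arrive later.
  }
}

function handle(event: FeedEvent): void {
  // Apply the event to this microservice's own data.
}

setInterval(pollEventFeed, 10_000); // poll every 10 seconds
```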

Have Good Logs

Once we accept that failures are bound to happen and that some of them may result, not just in a degraded end user experience, but in errors, we must make sure that we are able to understand what went wrong when an error occurs. That means that we need good logs that allow us to trace what happened in the system leading up to an error situation. "What happened" will often span several microservices, which is why you should consider introducing a central Log Microservice, as shown in figure 2, that all the other microservices send log messages to, and which allows you to inspect and search the logs when you need to.


Figure 2 A central Log Microservice receives log messages from all other microservices and stores them in a database or a search engine. The log data is accessible through a web interface. The dotted arrows show microservices sending log messages to the central Log Microservice

The Log Microservice is a central component that all other microservices use. We need to make certain that a failure in the Log Microservice does not bring down the whole system by making all the other microservices fail just because they are unable to log messages. Therefore, sending log messages to the Log Microservice must be fire and forget - that is, the messages are sent and then forgotten about. The microservice sending the message should not wait for a response.
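A fire-and-forget log call can be as simple as the sketch below - the log service URL and message shape are made up, and the important part is that nothing waits for or inspects the response:

```typescript
// Sketch of fire-and-forget logging: send the message and carry on.
function sendLogMessage(level: string, message: string, correlationToken?: string): void {
  fetch("http://logservice/api/messages", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ level, message, correlationToken, timestamp: new Date().toISOString() }),
  }).catch(() => {
    // Deliberately swallow failures: a broken Log Microservice must not
    // break the microservice doing the logging.
  });
  // No await and no return value: the caller does not wait for a response.
}
```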

SIDEBAR
Use an Off-the-Shelf Solution for the Log Microservice
A central Log Microservice does not implement a business capability of a particular system. It is an implementation of a generic technical capability. In other words, the requirements for a Log Microservice in system A are not that different from the requirements for a Log Microservice in system B. Therefore I recommend using an off-the-shelf solution to implement your Log Microservice - for instance, logs can be stored in Elasticsearch and made accessible with Kibana. These are well-established and well-documented products, but I will not delve into how to set them up here.


Correlation Tokens

In order to be able to find all log messages related to a particular action in the system, we can use correlation tokens. A correlation token is an identifier attached to a request from an end user when it comes into the system. The correlation token is passed along from microservice to microservice in any communication that stems from that end-user request. Any time one of the microservices sends a log message to the Log Microservice, the message should include the correlation token, and the Log Microservice should allow searching for log messages by correlation token. Referring to figure 2, the API Gateway would create and assign a correlation token to each incoming request. The correlation token is then passed along with every microservice-to-microservice communication.
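As a sketch of what this looks like in code - here using Express, with "Correlation-Token" as an arbitrarily chosen header name - a middleware creates or reuses the token, and every outgoing call carries it along:

```typescript
// Sketch of correlation token handling in a microservice or API Gateway.
import { randomUUID } from "crypto";
import express from "express";

const app = express();

// Create a correlation token when a request enters the system (the API
// Gateway's job); reuse the one already on the request everywhere else.
app.use((req, res, next) => {
  res.locals.correlationToken = req.header("Correlation-Token") ?? randomUUID();
  next();
});

app.get("/api/something", async (_req, res) => {
  const token = res.locals.correlationToken as string;
  // Pass the token along on every call to other microservices, and include it
  // in every log message sent to the Log Microservice.
  await fetch("http://othermicroservice/api/data", {
    headers: { "Correlation-Token": token },
  });
  res.sendStatus(200);
});

app.listen(3000);
```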


Roll forward vs Roll back

When errors happen in production, we are faced with the question of how to fix them. In many traditional systems, if errors start occurring shortly after a deployment, the default would be to roll back to the previous version of the system. In a microservice system, the default can be different. Microservices lend themselves to continuous delivery. With continuous delivery, microservices will be deployed very often, and each deployment should be both fast and easy to perform. Furthermore, microservices are sufficiently small and simple that many bug fixes are also simple. This opens the possibility of rolling forward rather than rolling backward.

Why would we want to default to rolling forward instead of rolling backward? In some situations, rolling backward is complicated, particularly when database changes are involved. When a new version that changes the database is deployed, the microservice will start producing data that fits in the updated database. Once that data is in the database, it has to stay there, which may not be compatible with rolling back to an earlier version. In such a case, rolling forward might be easier.


Do Not Propagate Failures

Sometimes things happen around a microservice that may disturb the normal operation of the microservice. We say that the microservice is under stress in such situations. There are many sources of stress. To name a few, a microservice may be under stress because:
  •  One of the machines in the cluster its data store runs on has crashed
  •  It has lost network connectivity to one of its collaborators
  • It is receiving unusually high amounts of traffic
  • One of its collaborators is down

In all of these situations, the microservice under stress cannot continue to operate the way it normally does. That doesn’t mean that it’s down, only that it must cope with the situation.

When one microservice fails, its collaborators are put under stress and are also at risk of failing. While the microservice is failing, its collaborators will not be able to query it, send commands to it or poll events from it. As illustrated in figure 3, if this makes the collaborators fail, even more microservices are at risk of failing. At this point, the failure has started propagating through the system of microservices. Such a situation can quickly escalate from one microservice failing to a lot of microservices failing.


Figure 3 If the microservice marked FAILED is failing, so is the communication with it. That means that the microservices at the other end of those communications are under stress. If the stressed microservices fail due to the stress, the microservices communicating with them are put under stress. In that situation, the failure in the failed microservice has propagated to several other microservices.

Some examples of how we can stop failures propagating are:
  • When one microservice tries to send a command to another microservice which happens to be failing at the time, that request will fail. If the sender simply fails as well, we get the situation illustrated in figure 3 where the failures propagate back through the system. To stop the propagation, the sender might act as if the command succeeded, but actually store the command in a list of failed commands. The sending microservice can periodically go through the list of failed commands and try to send them again (see the sketch after this list). This is not possible in all situations, because the command may need to be handled here and now, but when this approach is possible it stops the failure in one microservice from propagating.
  • When one microservice queries another one that's failing, the caller could use a cached response. If the caller has a stale response in the cache and the query for a fresh one fails, it might decide to use the stale response anyway. Again, this is not something that will be possible in all situations, but when it is, the failure will not propagate.
  • An API Gateway that is stressed because of a high amount of traffic from a certain client can throttle that client by not responding to more than a certain number of requests per second from that client. Notice that the client may be sending an unusually high number of requests because it is somehow failing internally. When throttled, the client will get a degraded experience, but will still get some responses. Without the throttling, the API Gateway might become slow for all clients or it might fail completely. Moreover, since the API Gateway collaborates with other microservices, handling all the incoming requests would push the stress of those requests onto other microservices too. Again, the throttling stops the failure in the client from propagating further into the system to other microservices.
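Here is a sketch of the first of those safeguards - storing failed commands and retrying them periodically. The command shape and the in-memory list are simplifications; a real implementation would persist the failed commands:

```typescript
// Sketch: act as if a failed command succeeded, but remember it and retry it
// later, so the failure does not propagate back to our own callers.
type Command = { targetUrl: string; body: unknown };

const failedCommands: Command[] = [];

async function sendCommand(command: Command): Promise<void> {
  try {
    const response = await fetch(command.targetUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(command.body),
    });
    if (!response.ok) throw new Error(`status ${response.status}`);
  } catch {
    failedCommands.push(command); // remember the command for a later retry
  }
}

// Periodically go through the failed commands and try to send them again.
setInterval(async () => {
  const retries = failedCommands.splice(0, failedCommands.length);
  for (const command of retries) {
    await sendCommand(command); // re-queues itself if it fails again
  }
}, 30_000);
```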

As we can see from these examples, stopping failure propagation comes in many shapes and sizes. The important thing to take away from this article is the idea of building safeguards into your systems that are specifically designed to stop the kinds of failures you anticipate from propagating. How that is realized depends on the specifics of the systems you are building. Building in these safeguards may take some effort, but it is very often well worth it because of the robustness it gives the system as a whole.