Combining Feature Toggles with the Factory pattern to enable Continuous Delivery
Intro
Feature Toggles have become a frequently used tool in my team, bringing us many benefits such as:
- Decoupling physical deployments from “go-lives”: we can turn features on at any point, separately from physically deploying the code.
- Decoupling features from a release: if we discover a feature has an issue post-deployment, we can turn it off rather than roll back the entire release, thereby mitigating risk.
- Avoiding separate long-running branches: long-running branches are a recipe for pain; a maintenance burden that is likely to produce merge conflicts and increases the chances of finding issues at the last minute.
All of these contribute to our goal of achieving continuous delivery.
If not used with care though, Feature Toggles can produce unsightly, difficult-to-maintain code littered with if statements. This post describes how the team have combined the feature toggle technique with the Factory pattern to conditionally alter the behaviour of multiple objects with just a single if statement.
Context
To give some brief context on the team: we work in an insurance company and, among other things, build and maintain an ASP.NET Core web API that allows consumers (such as our public website & third-party broker applications) to obtain insurance quotes. The API is basically an orchestration service that communicates with multiple internal systems such as a pricing engine, policy management system, CRM system & others.
The application has been designed using the ports & adapters (or Hexagonal) architecture, which fits perfectly for this type of application. Each system we communicate with has an adapter that encapsulates its specific implementation details, whilst any business logic (largely orchestration & validation) is isolated in a “core” layer.
Project “Jedi”
As part of a multi-system project initiative (referred to as project “Jedi”), upcoming breaking changes were announced to the APIs of the pricing engine system (Nearix) & the policy management system (Leah), both of which we and other teams consume. The changes we’d need to make for the project would be to send new data to the pricing engine and pass the new data it returns on to the policy management system. Pretty simple stuff.
Why feature toggle?
The team have worked on these types of changes in the past and they have been rather painful. Not because the changes are difficult to implement, but because of the logistics involved in getting the changes in all of the systems aligned in production. The standard approach from a project management perspective for multi-system initiatives is “Big Bang”; when the green light is given, all teams need to deploy to production at the same time as each other, usually in the middle of the night or at the weekend. If one of the teams isn’t ready, nobody can deploy & the project is delayed. I’ve been on “Go/No-Go” calls with about 40 people geared up to deploy over the weekend, only for it to be called off at the last minute. If the production deployment does go ahead but an issue is discovered in one of the systems that requires a rollback, it’s likely that all of them need to roll back.
Because we had previously accepted this flawed & fragile “big bang” strategy, the team’s approach to delivering this type of multi-system change was to build the new functionality in an isolated branch. If we were to follow suit again for this project, it would look something like this:
Separate instances of the systems we consume are set up in a test environment containing the new breaking changes (shown at the bottom of the diagram). Whilst these new instances are available concurrently in the test environment, only one version will ever be deployed to production.
Where we had built into a separate branch, the changes (shown in red on the diagram) were left isolated. Meanwhile, other changes required from different stakeholders continued to be made in our mainline branch. The longer we waited for the project to be given the green light, the longer the branches were left to diverge.
We could have made the effort to keep the separate branch up to date with the commits in our mainline, but this is a maintenance burden that would slow us down; we’d effectively be maintaining two versions of our application & potentially dealing with merge conflicts. It’s also not unheard of for a project like this to be scrapped altogether, so the effort of maintaining the two versions could be in vain.
We ended up putting the changes we’d made in the mainline branch on hold until the cross-system project had been deployed to production. This is obviously sub-optimal, leaving business stakeholders (and our team) frustrated; we should be capable of delivering multiple initiatives concurrently without getting tied in knots.
So this time round, we want to decouple ourselves from any multi-system project deadlock. We want to be able to continue delivering other streams of work regardless of the readiness of the Jedi project. At the same time we need to be ready to go live with the Jedi changes at a moment’s notice.
Feature toggles allow us to achieve this; we can merge the Jedi project changes into our mainline branch, but allow the new logic flows to be optional. This means we can continue to deliver other changes to production, without the Jedi code either left to diverge or become a maintenance burden. The Jedi code will be deployed to production in advance of the other systems without breaking anything as the Jedi code is dormant. We can be confident the dormant code hasn’t broken anything because our tests have told us so.
Implementing the Feature Toggle
The changes required for Jedi are spread across multiple areas of the code. We need to:
- Send new data to the pricing engine (via the Nearix Adapter)
- Receive new data returned from the pricing engine (via the Nearix Adapter)
- Coordinate the sending of this new data to the policy management system (Core)
- Send new data to the policy management system (via the Leah Adapter)
This post will focus on the last part as an example; feature toggling the changes required to our policy management adapter.
Our standard approach to writing an adapter that communicates with a different system using HTTP is to implement request and response mapper objects. These objects have the responsibility of mapping from our types (defined in the Core layer) to the types required by the third-party system (and vice versa). The request required by Leah to create a quote is verbose to say the least. It’s a fairly old-school API that requires a huge XML (SOAP) request, made up of many fields that sit in many nested elements. To make this manageable and to follow the Single Responsibility Principle, the request mapper is broken down into smaller mappers, each responsible for mapping a particular element of the request. This forms a tree-like structure of mappers which visually corresponds to the nested XML that will be generated. The class diagram below shows a small subset of the objects involved:
To fulfil the Jedi requirements, we need to make changes to the objects coloured green.
One option is to pass the options object containing the feature toggle to each individual mapper object and add if statements wherever we need to do something different for Jedi. But this would produce unsightly code and potentially add cyclomatic complexity. It might not seem that bad for just the one toggle, but if you’re adopting feature toggles as a practice and have multiple toggles active in the code concurrently, it’ll get messy. Ideally we want to keep feature toggle logic separate from the application (in this case mapping) logic; we don’t want to be debugging through a mix of the two.
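To make the problem concrete, here is a minimal sketch of what this first option might look like in a single mapper. The names (`LeahOptions`, `JediEnabled`, `PolicyHolderMapper`, the `PricingReference` element) are assumptions made up for illustration, not the real types or fields in our adapter.

```csharp
using System.Xml.Linq;

// Hypothetical options object; JediEnabled is an assumed property name.
public sealed class LeahOptions
{
    public bool JediEnabled { get; set; }
}

// Hypothetical mapper for one element of the Leah request.
internal sealed class PolicyHolderMapper
{
    private readonly LeahOptions _options;

    public PolicyHolderMapper(LeahOptions options) => _options = options;

    public XElement Map(string name, string pricingReference)
    {
        var element = new XElement("PolicyHolder", new XElement("Name", name));

        // Toggle logic interleaved with mapping logic. Every affected mapper
        // would need a check like this, once for each active toggle.
        if (_options.JediEnabled)
        {
            element.Add(new XElement("PricingReference", pricingReference));
        }

        return element;
    }
}
```

Multiply that if statement by every mapper that changes, and by every toggle that is live at the same time, and the mapping logic soon becomes hard to follow.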
Another option is to extract an interface for each object that we need to alter the behaviour of and wire up Jedi-specific implementations in the IoC container. At first that might seem like an elegant solution, but this approach also has downsides:
- Breaking encapsulation. You’d need to change the low-level granular mapper classes to be public in order to register them in the IoC container in the composition root. This would expose the internal implementation of the adapter; these classes shouldn’t be visible outside of the adapter’s module/assembly.
- Tightly coupled unit tests. The IoC container will take care of wiring up the dependencies when the app is run, but that wiring becomes a problem when writing tests; you’ll need to new-up all of the instances yourself to build up the dependency chain. Not only does this mean a lot of effort is required to set up tests, it also means the tests are tightly coupled to the inner workings of the adapter. Any refactoring you apply to its implementation will require changes to your tests.
Combining the Factory pattern with a Feature Toggle
A favourable option is to use a feature toggle in conjunction with a Factory. This allows us to introduce alternative behaviours at various points in the code with just the single switch point and without the downsides of the two options described above.
To demonstrate this approach, the code below shows the factory that instantiates an IQuoteService implementation within the Leah adapter module. When Jedi is toggled on, it creates an alternative implementation:
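A minimal sketch of such a factory follows. Only `IQuoteService` is a name taken from our code; `LeahOptions`, `JediEnabled`, `QuoteService`, `JediQuoteService`, `QuoteServiceFactory` and the `CreateQuote` signature are assumptions for illustration, and the service bodies are stubbed because only the selection logic matters here.

```csharp
using System;

// Hypothetical options object; JediEnabled is an assumed property name.
public sealed class LeahOptions
{
    public bool JediEnabled { get; set; }
}

public interface IQuoteService
{
    // Assumed signature, for illustration only.
    string CreateQuote(string policyHolderName);
}

// Original behaviour. Internal, so it stays hidden inside the adapter assembly.
internal class QuoteService : IQuoteService
{
    public virtual string CreateQuote(string policyHolderName) =>
        $"standard quote for {policyHolderName}";
}

// Alternative behaviour, built by extending the original.
internal sealed class JediQuoteService : QuoteService
{
    public override string CreateQuote(string policyHolderName) =>
        $"Jedi quote for {policyHolderName}";
}

public static class QuoteServiceFactory
{
    public static IQuoteService Create(LeahOptions options)
    {
        if (options is null) throw new ArgumentNullException(nameof(options));

        // The only place in the adapter that inspects the toggle.
        if (options.JediEnabled)
        {
            return new JediQuoteService();
        }

        return new QuoteService();
    }
}
```

Callers only ever see `IQuoteService`, so the concrete classes can stay internal to the adapter. In an ASP.NET Core application the options object would typically be supplied through the framework's options pattern rather than constructed by hand.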
This slightly different implementation inherits from the original, but overrides the GetRequestMapper method, returning a Jedi-specific request mapper:
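As a sketch of that idea, assuming a `QuoteService` base class with a protected virtual `GetRequestMapper` method: the mapper bodies, the `BuildCreateQuoteRequest` method and the `PricingReference` element below are invented for illustration and are far simpler than the real Leah request.

```csharp
using System.Xml.Linq;

// Hypothetical, heavily simplified request mapper.
internal class RequestMapper
{
    public virtual XElement Map(string policyHolderName) =>
        new XElement("CreateQuote", new XElement("PolicyHolder", policyHolderName));
}

// Hypothetical Jedi variant: reuses the base mapping and adds one assumed element.
internal sealed class JediRequestMapper : RequestMapper
{
    public override XElement Map(string policyHolderName)
    {
        var request = base.Map(policyHolderName);
        request.Add(new XElement("PricingReference", "value-from-pricing-engine"));
        return request;
    }
}

internal class QuoteService
{
    // Unchanged by Jedi: always asks for a mapper and uses it.
    public XElement BuildCreateQuoteRequest(string policyHolderName) =>
        GetRequestMapper().Map(policyHolderName);

    // Extension point that subclasses can override.
    protected virtual RequestMapper GetRequestMapper() => new RequestMapper();
}

internal sealed class JediQuoteService : QuoteService
{
    // The only difference from the original service is which mapper it creates.
    protected override RequestMapper GetRequestMapper() => new JediRequestMapper();
}
```

Note that `JediQuoteService` contains no toggle check of its own; it is simply never constructed unless the toggle is on.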
The same principle is applied to each of its individual mappers for which we need specific behaviour:
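One level further down the tree, this might look roughly like the sketch below. `QuoteData`, `PolicyHolderMapper` and the `PricingReference` element are hypothetical stand-ins; the real request has many more child mappers, most of which have no Jedi variant at all.

```csharp
using System.Xml.Linq;

// Hypothetical input type carrying the data to be mapped.
internal sealed class QuoteData
{
    public string PolicyHolderName { get; set; } = "";
    public string PricingReference { get; set; } = "";
}

// Hypothetical child mapper for one nested element of the request.
internal class PolicyHolderMapper
{
    public virtual XElement Map(QuoteData data) =>
        new XElement("PolicyHolder", new XElement("Name", data.PolicyHolderName));
}

// Jedi variant of the child mapper: base output plus one assumed extra element.
internal sealed class JediPolicyHolderMapper : PolicyHolderMapper
{
    public override XElement Map(QuoteData data)
    {
        var element = base.Map(data);
        element.Add(new XElement("PricingReference", data.PricingReference));
        return element;
    }
}

internal class RequestMapper
{
    public XElement Map(QuoteData data) =>
        new XElement("CreateQuote", GetPolicyHolderMapper().Map(data));

    // Extension point for this one element.
    protected virtual PolicyHolderMapper GetPolicyHolderMapper() => new PolicyHolderMapper();
}

internal sealed class JediRequestMapper : RequestMapper
{
    // Swap in the Jedi child mapper; everything else is inherited untouched.
    protected override PolicyHolderMapper GetPolicyHolderMapper() => new JediPolicyHolderMapper();
}
```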
So what we end up with is:
All of the original logic remains unaltered & there aren’t lots of if statements littered around the code. Instead, we are able to pick out the specific objects that we want to extend and provide alternative implementations for. This keeps things super clean with just the single toggle located within the factory. To test the new Jedi behaviour, our tests just need to set the feature toggle in the options object — the factory provides the means of setting up the various Jedi objects, but all of that is abstracted away from the tests.
SOLID
It’s worth pointing out that this approach is only possible because the code follows the SOLID principles — two in particular are the Open/Closed Principle & the Liskov Substitution Principle.
Open-Closed Principle: Software entities should be open for extension, but closed for modification.
The mapping process that RequestMapper provides is closed; it will always map each of the elements required for the request and process the mapping in the same order. How it maps the data for each element however, is open for extension. These extension points are possible because we offer a virtual method for each element that is required to be mapped:
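The shape of that is sketched below with two hypothetical elements, `PolicyHolder` and `Cover`; the element names, the `jediRated` attribute and the Jedi subclasses are assumptions for illustration. `Map` is deliberately non-virtual, so the set of elements and their order cannot be changed by a subclass, while each `Get...Mapper` method is an extension point.

```csharp
using System.Xml.Linq;

internal class PolicyHolderMapper
{
    public virtual XElement Map() => new XElement("PolicyHolder");
}

internal class CoverMapper
{
    public virtual XElement Map() => new XElement("Cover");
}

internal class RequestMapper
{
    // Closed for modification: not virtual, so the elements and their order are fixed.
    public XElement Map() =>
        new XElement("CreateQuote",
            GetPolicyHolderMapper().Map(),
            GetCoverMapper().Map());

    // Open for extension: one virtual method per element.
    protected virtual PolicyHolderMapper GetPolicyHolderMapper() => new PolicyHolderMapper();
    protected virtual CoverMapper GetCoverMapper() => new CoverMapper();
}

// Hypothetical Jedi variant of a single element.
internal sealed class JediCoverMapper : CoverMapper
{
    public override XElement Map()
    {
        var element = base.Map();
        element.SetAttributeValue("jediRated", true);
        return element;
    }
}

// Overrides only the extension point it cares about.
internal sealed class JediRequestMapper : RequestMapper
{
    protected override CoverMapper GetCoverMapper() => new JediCoverMapper();
}
```

Whichever mapper is used, the request still contains `PolicyHolder` followed by `Cover`; only the content of the overridden element differs.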
Liskov Substitution Principle: Objects in a program should be replaceable with instances of their subtypes without altering the correctness of that program.
This principle is also prominent here; the virtual methods provide the means for sub-classes to replace each individual mapper type with an alternative, with the parent mapper being none the wiser.
Single Responsibility Principle: A class should only have a single responsibility, that is, only changes to one part of the software’s specification should be able to affect the specification of the class.
Another principle that is obvious but worth mentioning is the Single Responsibility Principle. The mapping has been broken down into a series of individual objects, each responsible for mapping a particular element of the monolithic request. This allows us to pick out these specific parts of the implementation for extension.
Conclusion
Feature Toggles are a fantastic technique for decoupling and enabling Continuous Delivery. But I’ve found a certain level of resistance to them (or at least a lack of up-take) in my organisation. I think this is down to a couple of things:
- Mindset. Instead of viewing code we deploy as being in its finished state, we should think of the code we deploy as always being in a state of transition. With a mindset of the former, feature toggles may “feel” wrong.
- Breaking the status quo. The “big bang” strategy has been operational for a long time & it’s what people are used to. Changing something “institutional” is difficult.
Combining feature toggles with the Factory pattern provides the ability to change the behaviour of multiple classes with a single switch, without compromising encapsulation or introducing coupling in our tests.
It seems to me that Factories are rather underused (at least in the .NET community) and dependency injection is often favoured as the default approach. But I hope I’ve made a good case that Factories are sometimes a better option.
I used to think that the Open-Closed & Liskov principles were mainly applicable in the context of writing a re-usable library. In that context it makes sense to provide extension points for consumers. But I’ve come to learn that actually, the principles apply just as much when writing code that can’t be consumed as a library (such as web applications). When we need to support multiple behaviours concurrently, even if only temporarily, we become the consumers of our own code, and the principles that guide these extension points are very useful.