A software release is about delivering a set of features to customers by a certain date. Two factors matter here: the set of features and the date. In the traditional model, we set a fixed “release date”, count backwards to decide the “code freeze date”, and plan a delivery milestone for each sprint. This model requires each feature developer to have a good estimate of how much time the feature needs, and it requires the features to be relatively standalone. Only when both conditions are met is the release date likely to be kept. In the real world, though, software development effort estimation rarely works well; “Hofstadter’s law” and the “90/90 rule” almost always win. In addition, truly “independent features” are usually only a small part of the feature set, especially in a monolithic project.
Agile development thinks about this differently. Among the principles behind its manifesto, “deliver working software frequently” and “working software is the primary measure of progress” are the two key principles about releases. The reasoning is easy to follow: the more features are packed into a release, the more development and testing time it needs, and the harder and longer the release becomes. People then tend to make fewer releases, which in turn means even more features in each one. That leads to a vicious cycle. If we change the way we release, making each release smaller and easier, we can release more often, which helps feature velocity and reduces the drama before the release date.
Most agile practices mention investments in tooling such as test automation and CI/CD. More recently, microservices were introduced to arrange an application as a collection of services and to reduce the dependencies between teams and code, so that feature development can be better decoupled, which helps improve feature velocity. However, agile practice doesn’t provide much guidance on how to balance the release date against the feature set, a tension that usually comes up between product and developers.
We might have noticed that in agile practice, for any one release, we can only get two out of these three guarantees: 1. a fixed release date (D), 2. a fixed number of features (N), 3. feature velocity (V) (delivering as many features as possible).
We can find examples for all three cases:
In most cases, we prefer to keep a short, fixed release cycle so that we can release more frequently. The choice between a fixed feature set within a release and feature velocity then depends on the coupling between features. If a security requirement has to be fixed in the next release, we prefer to make it a fixed feature; if it’s a feature with multiple dependencies (e.g. a k8s upgrade), we might not be able to promise that it will be ready in the next release. Usually the items that have to be included in a release are those whose absence might cause the software to “stop working”: fixes for bugs that cause crashes, data loss or leaks. In short, a release prefers a fixed frequency, without promising what will be included in it unless something is absolutely required.
The question then becomes how to reduce the chance of introducing a “stop working” bug. Such bugs are usually introduced by deeply coupled features - things like a foundational refactor, backward-incompatible changes, or a change that depends on multiple other features. Such changes are usually huge, involve multiple components, and are hard to review and test. To improve on this, we need to make sure the features in a release can be decoupled and are easy to review and test.
A decoupled release has a fixed frequency, makes no promise about which features land in each release, and offers decoupled features that can be reviewed and tested easily. It gives us a way to release on time with a known set of bug fixes and features, and the ability to introduce large changes progressively.
To put the decoupled release into practice, we can introduce the release cargo and the feature switch.
Think of a decoupled release as a train system that runs between two stations: on one side are the developers, on the other side are the users, or whatever the next step in the delivery cycle is. The train always arrives and departs on time. The cargoes on the train are the features shipped from developers to users. The developers’ mission is to put as many features (V) onto the train as possible before it leaves (D), while making sure the train delivers at least as much cargo as it did last round (no regression). We build tools like CI/CD to make loading cargo onto the train as fast as possible, so that most of the time is spent preparing the cargoes.
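The train analogy can be sketched in a few lines of Python. This is only an illustration under assumed names: `Cargo`, `ReleaseTrain`, `departure_for`, `plan` and the two-week interval are all hypothetical and do not come from any real release tool. The point it shows is that the departure dates never move; a cargo that is not ready simply rides on a later train.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Cargo:
    """A feature or bug fix waiting to be shipped (hypothetical model)."""
    name: str
    ready_on: date  # the day the change is merged and passes CI


@dataclass
class ReleaseTrain:
    """A train that departs on a fixed cadence, whatever is on board."""
    first_departure: date
    interval: timedelta

    def departure_for(self, cargo: Cargo) -> date:
        """Return the first departure on or after the day the cargo is ready."""
        departure = self.first_departure
        while departure < cargo.ready_on:
            departure += self.interval
        return departure


def plan(train: ReleaseTrain, cargoes: list) -> dict:
    """Group cargo names by the departure date they will ride on."""
    schedule = {}
    for cargo in cargoes:
        schedule.setdefault(train.departure_for(cargo), []).append(cargo.name)
    return schedule


# Assumed example data: a two-week cadence and three pieces of cargo.
train = ReleaseTrain(first_departure=date(2024, 1, 15), interval=timedelta(days=14))
cargoes = [
    Cargo("crash-fix", ready_on=date(2024, 1, 10)),
    Cargo("small-feature", ready_on=date(2024, 1, 15)),
    Cargo("tls-enablement", ready_on=date(2024, 2, 1)),
]
schedule = plan(train, cargoes)
```

In this sketch the late cargo does not delay the first train; it is assigned to a later departure, which is the "fixed date, no promised feature set" trade-off described above.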
For small cargoes (features) or minor tinkering (bug fixes), it’s easy: we just load the cargo onto the train when it comes. It becomes tricky with large cargoes, such as large features that can’t be completed within a release cycle, or with an engine repair, such as a refactor or a fundamental change. If we put the cargo onto the train only when it is ready, then at the moment of loading we might not know how much it will cost to adjust the space and the engine horsepower: merging a large feature branch, preparing an acceptance testing plan, updating the user manual and so on. Unfortunately these situations are pretty common for an infrastructure-related platform, for instance top-to-bottom TLS enablement, a major dependency upgrade (a new k8s version), or converting operations from synchronous to asynchronous.
What if we put uncompleted features onto the train early? It might sound strange to ship uncompleted features, but if we add a “feature switch” that hides such a feature from users (not just hidden in the UI, but actually disabled), it brings many benefits. It helps us review the code related to the feature early, exposes integration problems early, and reduces the size of feature branches and unmerged code that hide potential issues or technical debt. Turning the switch on and off for a feature under development lets us test and integrate the feature internally while keeping it invisible to users until it is completely accepted. Most importantly, even if we find a last-minute issue before the release, we can always turn the feature off and defer the fix until later. This approach has been widely used in web development for A/B testing, and it can be used in infrastructure platforms too.
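A minimal sketch of a feature switch in Python follows. The class `FeatureSwitches`, the function `create_network` and the switch name `new_network_driver` are assumptions made up for illustration; a real platform would more likely read switches from configuration than keep them in memory. The sketch shows the two properties described above: an unfinished code path ships disabled by default, and it can be turned off again without removing the code.

```python
class FeatureSwitches:
    """In-memory registry of feature switches; every switch defaults to off."""

    def __init__(self, enabled=None):
        self._enabled = set(enabled or [])

    def is_on(self, name):
        return name in self._enabled

    def turn_on(self, name):
        self._enabled.add(name)

    def turn_off(self, name):
        # discard() does nothing if the switch was never turned on
        self._enabled.discard(name)


def create_network(switches, request):
    """Pick the code path for a request based on a hypothetical switch."""
    if switches.is_on("new_network_driver"):
        # Unfinished path: merged and shipped, but reachable only when enabled.
        return f"new-driver:{request}"
    # Existing behaviour stays the default for users.
    return f"legacy-driver:{request}"


switches = FeatureSwitches()
shipped_default = create_network(switches, "net-1")    # switch off: legacy path

switches.turn_on("new_network_driver")                  # internal testing
internal_test = create_network(switches, "net-1")

switches.turn_off("new_network_driver")                 # last-minute issue found
after_rollback = create_network(switches, "net-1")
```

Because the check is made in the code path itself rather than only in the UI, turning the switch off really disables the feature, which is what allows a last-minute fix to be deferred.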
In practice, because an infrastructure platform differs from a website, the approach might look different.
An infrastructure platform has some key factors
In order to adopt the release cargo and feature switch, we need to change the way we develop features:
In the steps above, whenever a release train arrives, ship the code as it currently is.
We actually followed this practice when adding support for a new external network driver to an IaaS platform.
With this pattern, we never blocked a release because of the big switch; all the features were reviewed and integrated progressively. The same pattern should be adoptable for a k8s upgrade or anything else that is too big to fit into a fixed release time frame.
In infrastructure platform development, it’s common to have heavy features that are hard to complete within one release cycle. So we can either keep the release pace and add as many features as we can to each release, or keep the feature set and productivity but let the release date slip. In most cases we want to practice the “decoupled release”, which frees features from the release schedule and decouples feature dependencies where possible. The release cargo and the feature switch are one method that can be used in infrastructure development. They help to improve release timing and the development of large features, to discover integration problems early, and to encourage setting up CI for every feature added. All of this helps a software development team deliver a better product on time.