OpenStack Operator Initialization Resource
RHOSO: a set of Kubernetes operators to deploy OpenStack
Over the past few years, I’ve had the privilege of working on a team at Red Hat, where we’ve developed a comprehensive set of Kubernetes operators specifically designed to deploy OpenStack. At this point we have over 20 operators. While each operator focuses on deploying its respective OpenStack service, our goal is to ship them as a single product called RHOSO (Red Hat OpenStack on OpenShift).
Deploying with OLM, our journey so far
To deploy our operators, we utilize OLM (Operator Lifecycle Manager), a package manager that simplifies the process of deploying operators on your Kubernetes cluster. OLM is included by default with OpenShift and offers several features that enable controlled installation and upgrade of operators over time. Initially, we deployed all our operators using a single OLM bundle, which managed synchronization of all operators and presented them as a single product with multiple components. However, we soon encountered an issue with the OLM bundle size limit.
Our subsequent iteration, which marked the product’s general availability (GA), involved deploying the product using OLM with over 22 bundles and implementing a simple dependency mechanism to manage their deployment. This approach has proven to be effective, but we still face a few challenges:
- We want only one operator product to be visible in the OpenShift Console (UI), while still allowing all operator package manifests to be displayed on the command line.
- If an older version of the openstack-operator is installed, it may still install the latest version of service operators, resulting in a mixed set of operators. While explicit pinning mechanisms exist, they require manual intervention and could be tedious.
- There is still a concern regarding the bundle size for the openstack-operator. We need sufficient space in the bundle to accommodate multiple versions of our Custom Resource Definitions (CRDs) in the future.
There’s hope on the horizon with the upcoming release of OLM v1, which promises to address the bundle size issue. However, we can’t afford to wait as we urgently need a solution for OCP 4.16.
A new initialization resource
The upcoming FR2 release of the RHOSO (Red Hat OpenStack on OpenShift) product will deploy the service operators via a new initialization resource.
An initialization resource is a k8s custom resource that is used to "initialize" something in the k8s cluster early so that the operators themselves can do their work. Many k8s operators and products use a similar pattern. With OLM, you can indicate that you have an initialization resource by adding an annotation to your CSV (Cluster Service Version). The new annotation for our OpenStack operator looks like this:
operatorframework.io/initialization-resource: '{"apiVersion":"operator.openstack.org/v1beta1","kind":"OpenStack","metadata":{"name":"openstack","namespace":"openstack-operators"},"spec":{}}'
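The annotation value is a JSON-encoded custom resource, so it can be unpacked into the manifest that eventually gets created. The following is only a small sketch to make that structure visible: the function name `initialization_resource` is made up for illustration, and the string is copied from the annotation above.

```python
import json

# Annotation value copied from the CSV annotation shown above.
ANNOTATION = (
    '{"apiVersion":"operator.openstack.org/v1beta1","kind":"OpenStack",'
    '"metadata":{"name":"openstack","namespace":"openstack-operators"},"spec":{}}'
)


def initialization_resource(annotation_value: str) -> dict:
    """Parse the annotation and check it has the fields a manifest needs.

    Hypothetical helper for illustration; not part of OLM or the operator.
    """
    resource = json.loads(annotation_value)
    for key in ("apiVersion", "kind", "metadata"):
        if key not in resource:
            raise ValueError(f"initialization resource is missing {key!r}")
    return resource


resource = initialization_resource(ANNOTATION)

# Pretty-print the resource; JSON manifests are accepted by `oc apply -f`.
print(json.dumps(resource, indent=2))
```

Saving the printed output to a file and applying it with `oc apply -f` would be one way to create the resource from the command line instead of the console.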
After adding this annotation to the CSV, the OpenShift Console shows the new initialization resource as required.
What the OpenStack initialization resource currently does
Once created, the new OpenStack initialization resource bootstraps and installs the following resources for all of the OpenStack operators:
- CRDs (custom resource definitions)
- RBAC permissions
- Deployments for all the OpenStack operators, which it creates and manages
- Webhooks. Webhook certificates are now installed via cert-manager directly instead of via OLM
All of these resources get installed directly from the openstack-operator container itself, which means we are no longer constrained by the OLM bundle size limit. Additionally, we can now maintain all our resources as a single product entry in the OLM catalog.
For developers, scaling service operator deployments to 0
In the past our project had individual OLM CSVs for each OpenStack service operator. But now there is only a single CSV for the entire set of operators. The OpenStack service operators now run as k8s Deployments which are owned by the OpenStack initialization resource.
One common use case for developers is to install all the OpenStack operators, scale a single operator's replicas down to 0, and then run a modified version locally to functionally test code changes. The new way to do that is:
1) Edit the CSV for the OpenStack Operator and set the replicas for the controller-operator to 0. This step prevents the initialization resource from overwriting any changes we make to owned resources.
2) Edit the Deployment for the OpenStack service operator you wish to disable and set its replicas to 0.
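The two edits above can also be expressed as patch bodies rather than interactive edits. The sketch below only builds those bodies; it assumes the standard CSV layout where embedded deployments live under `spec.install.spec.deployments`. The helper names, the sample CSV, and the deployment names in it are placeholders, not the real names from our catalog.

```python
import json


def csv_replicas_patch(csv: dict, deployment_name: str, replicas: int = 0) -> list:
    """Build a JSON Patch that sets replicas for one deployment embedded in a CSV."""
    deployments = csv["spec"]["install"]["spec"]["deployments"]
    for index, deployment in enumerate(deployments):
        if deployment["name"] == deployment_name:
            # "add" on an object member sets it whether or not it already exists.
            return [{
                "op": "add",
                "path": f"/spec/install/spec/deployments/{index}/spec/replicas",
                "value": replicas,
            }]
    raise KeyError(f"no deployment named {deployment_name!r} in CSV")


def deployment_replicas_patch(replicas: int = 0) -> dict:
    """Build a merge patch body that sets replicas on a Deployment."""
    return {"spec": {"replicas": replicas}}


# Hypothetical, trimmed-down CSV; both deployment names are placeholders.
sample_csv = {
    "spec": {"install": {"spec": {"deployments": [
        {"name": "example-other-deployment", "spec": {"replicas": 1}},
        {"name": "example-controller-operator", "spec": {}},
    ]}}}
}

# Step 1: stop the operator that reconciles the initialization resource.
csv_patch = csv_replicas_patch(sample_csv, "example-controller-operator")
# Step 2: stop the service operator you want to run locally.
deployment_patch = deployment_replicas_patch(0)

print(json.dumps(csv_patch))
print(json.dumps(deployment_patch))
```

The first body is in the form expected by `oc patch csv <csv-name> --type=json -p '...'` and the second by `oc patch deployment <deployment-name> --type=merge -p '...'`, with the real resource names substituted for the placeholders.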
Summary
The initialization resource really helps streamline the deployment of our operators. It also puts us in control of how operators get updated and should consolidate the management of things in the future. I could see us adding a few features to the initialization resource to help prevent operator updates until a maintenance window and perform other setup type tasks.