Deploy Your Own SolrCloud to Kubernetes for Fun and Profit
One of the big dependencies Sitecore has is Apache Solr (not SOLR or Solar), which it uses for search. Solr is a robust, battle-tested search platform, but it can be a little hairy to run, and much like a lot of open source software, it’ll run on Windows but really feels more at home on Linux.
So if you’re running Sitecore and you’re hosted in the cloud, you’ve got a couple of options for hosting Solr:
- Deploy a bunch of VMs
- Use a managed service (SearchStax, et al.)
- Run it on a cloud service (e.g. Kubernetes)
Obviously, since it’s me, we’re going hard mode and running this thing on Kubernetes.
Surprisingly, there’s not a whole ton of documentation out there on how to set up a basic multi-node SolrCloud in Kubernetes. Most of what’s out there requires some crazy set of CRDs or custom images or something else that seemed…excessive. I just want to run the official Apache-published containers for Solr and Zookeeper!
First, we’ll need to understand how Solr and Zookeeper do “load balancing,” because it’s a little atypical. Normally, you’d have a bunch of instances plus a separate instance that receives traffic and distributes it across them. With Solr, any of the nodes can receive queries (which is good), and the nodes then communicate among themselves to distribute the work, since your cores are sharded across your instances.
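In Kubernetes terms, that means we don’t need a dedicated load balancer tier in front of Solr - a plain Service selecting all of the Solr pods is enough, since whichever pod happens to receive a request will coordinate the distributed query itself. As a rough sketch (the full version is in the Solr manifest further down):
apiVersion: v1
kind: Service
metadata:
  name: solr-service
spec:
  ports:
    - port: 8983
  selector:
    app: solr # spreads client connections across every Solr pod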
The Design
The intention behind this was to be as simple and stripped down as possible - still be able to scale to multiple instances of both Zookeeper and Solr, use the default Docker Hub images, and just spin up a basic Solr instance to support Sitecore.
It’ll look something like this:
Okay, so we really just need one StatefulSet for Zookeeper and one StatefulSet for Solr. A StatefulSet is similar to a Kubernetes Deployment, but it’s intended for, well, stateful applications - that is, each pod is backed by a storage mechanism so that data can be persisted across restarts, and each instance of the pod gets its own storage and manages it accordingly. Since we’ll have multiple replicas in our StatefulSet, we’ll also need a Service as the network entrypoint.
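To make the shape of this concrete before diving into the real manifests, here’s a stripped-down sketch of how a StatefulSet, its headless Service, and per-pod storage fit together. All of the names here are placeholders; the actual Zookeeper and Solr manifests follow below.
# skeleton only - see the full manifests in The Code section
apiVersion: v1
kind: Service
metadata:
  name: example-service
spec:
  clusterIP: None # headless Service: gives each pod a stable DNS name like example-statefulset-0.example-service
  selector:
    app: example
  ports:
    - port: 1234
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-statefulset
spec:
  replicas: 3
  serviceName: example-service # ties the pods' DNS names to the Service above
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
        - name: example
          image: example:latest
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates: # each replica gets (and keeps) its own PersistentVolumeClaim
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi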
The Quirks
This is all good and fine, but due to the nature of Kubernetes and containerized applications, there are a couple of things we need to account for:
Zookeeper instance IDs
Each Zookeeper instance in a cluster needs its own ID, from 1 to 254. In a typical non-containerized scenario, the ID is assigned via a file called myid in each instance’s configuration - you hardcode a different number on each instance. If you wanted to add another instance, you’d set its myid file to the next number up (or another number you weren’t already using).
The Docker container for Zookeeper tries to mitigate this with an environment variable called ZOO_MY_ID that allows the ID to be set at runtime. This means we need to come up with a different value for each replica as it starts up. Kubernetes does provide something called the Downward API, which lets us pull pod information (like the pod name) into environment variables, but there’s no way to do the “take the pod’s ordinal and add one” arithmetic in an env declaration alone - so in the manifest below, an init container derives the ID from the pod’s ordinal hostname and writes the myid file directly to the data volume.
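For reference, this is roughly what the Downward API looks like - a sketch, not part of the final manifests, using a hypothetical POD_NAME variable - and it illustrates why it isn’t quite enough on its own for the myid value:
containers:
  - name: zookeeper
    image: zookeeper:3.4
    env:
      - name: POD_NAME # e.g. "zookeeper-statefulset-0" - still needs the ordinal stripped out and incremented to become a valid ID
        valueFrom:
          fieldRef:
            fieldPath: metadata.name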
The Code
Here are the Kubernetes configs to set up your own SolrCloud instance! Check out the comments to understand what the pieces do, and feel free to take them and modify them to fit your needs.
Find the below and more in my GitHub repo for a sample Sitecore XP0 deployment.
Zookeeper
https://github.com/georgechang/sitecore-k8s/blob/main/zookeeper/zookeeper-statefulset.yaml
# this allows for a max of 1 ZK instance to be unavailable - should be updated based on the number of instances being deployed to maintain a quorum
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zookeeper-pdb
  namespace: solr
spec:
  selector:
    matchLabels:
      app: zookeeper
  maxUnavailable: 1
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zookeeper-statefulset
  namespace: solr
  labels:
    app: zookeeper
spec:
  replicas: 3 # set this to the number of Zookeeper pod instances you want
  updateStrategy:
    type: RollingUpdate
  podManagementPolicy: Parallel
  selector:
    matchLabels:
      app: zookeeper
  serviceName: zookeeper-service
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      nodeSelector:
        kubernetes.io/os: linux
        agentpool: solr
      affinity:
        # this causes K8s to schedule only one Zookeeper pod per node
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: "app"
                    operator: In
                    values:
                      - zookeeper
              topologyKey: "kubernetes.io/hostname"
      containers:
        - name: zookeeper
          image: zookeeper:3.4
          env:
            - name: ZK_REPLICAS
              value: "3" # informs Zookeeper of the number of intended replicas
            - name: ZK_TICK_TIME
              value: "2000"
            - name: ZOO_4LW_COMMANDS_WHITELIST
              value: "mntr,conf,ruok"
            - name: ZOO_STANDALONE_ENABLED
              value: "false"
            # lists all of the Zookeeper servers that are part of this cluster
            - name: ZOO_SERVERS
              value: server.1=zookeeper-statefulset-0.zookeeper-service:2888:3888;2181 server.2=zookeeper-statefulset-1.zookeeper-service:2888:3888;2181 server.3=zookeeper-statefulset-2.zookeeper-service:2888:3888;2181
            - name: ZOO_CFG_EXTRA
              value: "quorumListenOnAllIPs=true electionPortBindRetry=0" # quorumListenOnAllIPs allows ZK to listen on all IP addresses for leader election/follower traffic; electionPortBindRetry=0 disables the max bind retry limit while the other ZK instances are still spinning up
          ports:
            - name: client
              containerPort: 2181
              protocol: TCP
            - name: server
              containerPort: 2888
              protocol: TCP
            - name: election
              containerPort: 3888
              protocol: TCP
          volumeMounts:
            - name: zookeeper-pv
              mountPath: /data
          livenessProbe:
            # runs a shell script to ping the running local Zookeeper instance, which responds with "imok" once the instance is ready
            exec:
              command:
                - sh
                - -c
                - 'OK=$(echo ruok | nc 127.0.0.1 2181); if [ "$OK" = "imok" ]; then exit 0; else exit 1; fi;'
            initialDelaySeconds: 20
            timeoutSeconds: 5
          readinessProbe:
            # runs a shell script to ping the running local Zookeeper instance, which responds with "imok" once the instance is ready
            exec:
              command:
                - sh
                - -c
                - 'OK=$(echo ruok | nc 127.0.0.1 2181); if [ "$OK" = "imok" ]; then exit 0; else exit 1; fi;'
            initialDelaySeconds: 20
            timeoutSeconds: 5
      initContainers:
        # each ZK instance needs its own ID - since we can't compute an incrementing ID through env variables alone, this init container derives it from the pod's ordinal hostname and writes it to the myid file on the data volume
        - name: zookeeper-id
          image: busybox:latest
          command:
            - sh
            - -c
            - echo $((${HOSTNAME##*-}+1)) > /data-new/myid
          volumeMounts:
            - name: zookeeper-pv
              mountPath: /data-new
  volumeClaimTemplates:
    - metadata:
        name: zookeeper-pv
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: managed-premium
        resources:
          requests:
            storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: zookeeper-service
  namespace: solr
  labels:
    app: zookeeper
spec:
  ports:
    - port: 2888
      name: server
    - port: 3888
      name: leader-election
    - port: 2181
      name: client
  clusterIP: None
  selector:
    app: zookeeper
Solr
https://github.com/georgechang/sitecore-k8s/blob/main/solr/solr-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: solr-statefulset
  namespace: solr
  labels:
    app: solr
spec:
  replicas: 3 # set this to the number of Solr pod instances you want
  selector:
    matchLabels:
      app: solr
  serviceName: solr-service
  template:
    metadata:
      labels:
        app: solr
    spec:
      securityContext:
        runAsUser: 1001
        fsGroup: 1001
      nodeSelector:
        kubernetes.io/os: linux
        agentpool: solr
      affinity:
        # this causes K8s to schedule only one Solr pod per node
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: "app"
                    operator: In
                    values:
                      - solr
              topologyKey: "kubernetes.io/hostname"
      containers:
        - name: solr
          image: solr:8.4
          env:
            # ZK_HOST lists the hostnames of all of the Zookeeper instances - this should correspond to however many ZK instances you have running
            - name: ZK_HOST
              value: zookeeper-statefulset-0.zookeeper-service:2181,zookeeper-statefulset-1.zookeeper-service:2181,zookeeper-statefulset-2.zookeeper-service:2181
            - name: SOLR_JAVA_MEM
              value: "-Xms4g -Xmx4g" # set the JVM memory usage and limit
          ports:
            - name: solr
              containerPort: 8983
          volumeMounts:
            - name: solr-pvc
              mountPath: /var/solr
          livenessProbe:
            # runs a built-in script to check for Solr liveness
            exec:
              command:
                - /bin/bash
                - -c
                - "/opt/docker-solr/scripts/wait-for-solr.sh"
            initialDelaySeconds: 20
            timeoutSeconds: 5
          readinessProbe:
            # runs a built-in script to check for Solr readiness
            exec:
              command:
                - /bin/bash
                - -c
                - "/opt/docker-solr/scripts/wait-for-solr.sh"
            initialDelaySeconds: 20
            timeoutSeconds: 5
      initContainers:
        # runs a built-in script to wait until all Zookeeper instances are up and running
        - name: solr-zk-waiter
          image: solr:8.4
          command:
            - /bin/bash
            - "-c"
            - "/opt/docker-solr/scripts/wait-for-zookeeper.sh"
          env:
            - name: ZK_HOST
              value: zookeeper-statefulset-0.zookeeper-service:2181,zookeeper-statefulset-1.zookeeper-service:2181,zookeeper-statefulset-2.zookeeper-service:2181
        # runs a built-in script to initialize the /var/solr directory if necessary
        - name: solr-init
          image: solr:8.4
          command:
            - /bin/bash
            - "-c"
            - "/opt/docker-solr/scripts/init-var-solr"
          volumeMounts:
            - name: solr-pvc
              mountPath: /var/solr
  volumeClaimTemplates:
    - metadata:
        name: solr-pvc
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: managed-premium
        resources:
          requests:
            storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: solr-service
  namespace: solr
  labels:
    app: solr
spec:
  type: LoadBalancer
  ports:
    - port: 8983
  selector:
    app: solr
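Once everything is deployed, Sitecore’s own deployment tooling typically takes care of creating the collections it needs, but it’s nice to have a quick smoke test that the cluster itself is healthy. Here’s a hedged sketch of one way to do that - a throwaway Job that calls Solr’s Collections API through the service; the Job name, collection name, and busybox image are just placeholder choices on my part:
apiVersion: batch/v1
kind: Job
metadata:
  name: solr-smoke-test
  namespace: solr
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: create-collection
          image: busybox:latest
          # asks SolrCloud to create a 1-shard, 3-replica collection using the built-in _default configset
          command:
            - sh
            - -c
            - wget -qO- "http://solr-service:8983/solr/admin/collections?action=CREATE&name=smoke_test&numShards=1&replicationFactor=3"
If the Job completes and the smoke_test collection shows up in the Solr admin UI (port 8983 on the LoadBalancer), queries and replication are wired up end to end; you can drop the collection afterwards with action=DELETE.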