Multi-Node Installation

Production high-availability deployment

Overview

This guide describes the installation of the AgileTV CDN Manager across multiple nodes for production deployments. This configuration provides high availability and horizontal scaling capabilities.

Air-Gapped Deployment? This guide assumes internet connectivity. For air-gapped deployments, see the Air-Gapped Deployment Guide for additional requirements and procedures.

Prerequisites

Hardware Requirements

Refer to the System Requirements Guide for hardware specifications. Production deployments require:

  • Minimum 3 Server nodes (Control Plane Only or Combined role)
  • Optional Agent nodes for additional workload capacity

Operating System

Refer to the System Requirements Guide for supported operating systems.

Software Access

  • Installation ISO: esb3027-acd-manager-X.Y.Z.iso (for each node)
  • Extras ISO (air-gapped only): esb3027-acd-manager-extras-X.Y.Z.iso

Network Configuration

Ensure that required firewall ports are configured between all nodes before installation. See the Networking Guide for complete firewall configuration requirements.
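Before joining nodes, you can spot-check connectivity from a prospective joining node to the primary server. The sketch below probes the K3s API port (6443/tcp) using bash's /dev/tcp; the IP address is a placeholder, and the Networking Guide remains the authoritative port list.

```shell
# Placeholder IP for the primary server node -- substitute your own
PRIMARY_IP=10.0.0.10

# Probe the K3s API server port (6443/tcp) with a 3-second timeout
if timeout 3 bash -c "cat < /dev/null > /dev/tcp/${PRIMARY_IP}/6443" 2>/dev/null; then
  echo "port 6443 reachable"
else
  echo "port 6443 blocked or closed"
fi
```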

Multiple Network Interfaces

If your nodes have multiple network interfaces and you want to use a separate interface for cluster traffic (not the default route interface), configure the INSTALL_K3S_EXEC environment variable before installing the cluster or joining nodes.

For example, if bond0 has the default route but you want cluster traffic on bond1:

# For server nodes
export INSTALL_K3S_EXEC="server --node-ip 10.0.0.10 --flannel-iface=bond1"

# For agent nodes
export INSTALL_K3S_EXEC="agent --node-ip 10.0.0.20 --flannel-iface=bond1"

Where:

  • Mode: Use server for the primary node establishing the cluster, or for additional server nodes. Use agent for agent nodes joining the cluster.
  • --node-ip: The IP address of the interface to use for cluster traffic
  • --flannel-iface: The network interface name for Flannel VXLAN overlay traffic

Set this variable on each node before running the install or join scripts.
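If you are unsure of the address assigned to the cluster interface, you can derive it before setting the variable. A minimal sketch (bond1 is an example interface name):

```shell
# Example interface name -- substitute the interface carrying cluster traffic
CLUSTER_IFACE=bond1

# Extract the first IPv4 address assigned to that interface
NODE_IP=$(ip -4 -o addr show "$CLUSTER_IFACE" | awk '{print $4}' | cut -d/ -f1 | head -n 1)

export INSTALL_K3S_EXEC="server --node-ip ${NODE_IP} --flannel-iface=${CLUSTER_IFACE}"
echo "$INSTALL_K3S_EXEC"
```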

SELinux

If you intend to use SELinux, it must be set to Enforcing mode before running the installer script. The installer configures the appropriate SELinux policies automatically. Note that SELinux cannot be enabled after installation.

Installation Steps

Step 1: Prepare the Primary Server Node

Mount the installation ISO on the primary server node:

mkdir -p /mnt/esb3027
mount -o loop,ro esb3027-acd-manager-X.Y.Z.iso /mnt/esb3027

Replace X.Y.Z with the actual version number.

Step 2: Install the Base Cluster on Primary Server

If your node has multiple network interfaces and you need to specify a separate interface for cluster traffic, set the INSTALL_K3S_EXEC environment variable before running the installer (see Multiple Network Interfaces):

export INSTALL_K3S_EXEC="server --node-ip <node-ip> --flannel-iface=<interface>"

Run the installer to set up the K3s Kubernetes cluster:

/mnt/esb3027/install

This installs:

  • K3s Kubernetes distribution
  • Longhorn distributed storage
  • CloudNativePG operator for PostgreSQL
  • Base system dependencies

Important: After the installer completes, verify that all system pods in both namespaces are in the Running state before proceeding:

# Check kube-system namespace (Kubernetes core components)
kubectl get pods -n kube-system

# Check longhorn-system namespace (distributed storage)
kubectl get pods -n longhorn-system

All pods should show Running status. If any pods are still Pending or ContainerCreating, wait until they are ready. Proceeding with incomplete system pods can cause subsequent steps to fail in unpredictable ways.

This verification confirms:

  • K3s cluster is operational
  • Longhorn distributed storage is running
  • CloudNativePG operator is deployed
  • All core components are healthy before continuing
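Instead of re-running kubectl get pods by hand, you can poll both namespaces until everything settles. A hedged sketch (assumes the kubectl context points at the new cluster; Completed job pods are treated as healthy):

```shell
# Count pods in a namespace whose STATUS is neither Running nor Completed
not_ready() {
  kubectl get pods -n "$1" --no-headers 2>/dev/null \
    | awk '$3 != "Running" && $3 != "Completed"' | wc -l
}

# Poll every 10 seconds until both system namespaces have settled
while [ "$(not_ready kube-system)" -gt 0 ] || [ "$(not_ready longhorn-system)" -gt 0 ]; do
  sleep 10
done
echo "all system pods are up"
```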

Step 3: Retrieve the Node Token

Retrieve the node token for joining additional nodes:

cat /var/lib/rancher/k3s/server/node-token

Save this token for use on additional nodes. Also note the IP address of the primary server node.
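For convenience, you can capture both values in shell variables and print the exact join commands for later use. A sketch (note that hostname -I picks the first local address, which may not be the right one on multi-homed nodes):

```shell
# Read the cluster join token (requires root)
NODE_TOKEN=$(cat /var/lib/rancher/k3s/server/node-token)

# First local IP address; on multi-homed nodes pick the correct interface instead
PRIMARY_IP=$(hostname -I | awk '{print $1}')

echo "Server join: /mnt/esb3027/join-server https://${PRIMARY_IP}:6443 ${NODE_TOKEN}"
echo "Agent join:  /mnt/esb3027/join-agent https://${PRIMARY_IP}:6443 ${NODE_TOKEN}"
```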

Step 4: Server vs Agent Node Roles

Before joining additional nodes, determine which nodes will serve as Server nodes vs Agent nodes:

| Role | Control Plane | Workloads | HA Quorum | Use Case |
| --- | --- | --- | --- | --- |
| Server Node (Combined) | Yes (etcd, API server) | Yes | Participates | Default production role; minimum 3 nodes |
| Server Node (Control Plane Only) | Yes (etcd, API server) | No | Participates | Dedicated control plane; requires separate Agent nodes |
| Agent Node | No | Yes | No | Additional workload capacity only |

Guidance:

  • Combined role (default): Server nodes run both control plane and workloads; minimum 3 nodes required for HA
  • Control Plane Only: Dedicate nodes to control plane functions; requires at least 3 Server nodes plus 3+ Agent nodes for workloads
  • Agent nodes are required if using Control Plane Only servers; optional if using Combined role servers
  • For most deployments, 3 Server nodes (Combined role) with no Agent nodes is sufficient
  • Add Agent nodes to scale workload capacity without affecting control plane quorum

Proceed to Step 5 to join Server nodes. Agent nodes are joined after all Server nodes are ready.

Step 5: Join Additional Server Nodes

On each additional server node:

  1. Mount the ISO:

    mkdir -p /mnt/esb3027
    mount -o loop,ro esb3027-acd-manager-X.Y.Z.iso /mnt/esb3027
    
  2. Join the cluster:

If your node has multiple network interfaces, set the INSTALL_K3S_EXEC environment variable with the server mode before running the join script (see Multiple Network Interfaces):

export INSTALL_K3S_EXEC="server --node-ip <node-ip> --flannel-iface=<interface>"

Run the join script:

/mnt/esb3027/join-server https://<primary-server-ip>:6443 <node-token>

Replace <primary-server-ip> with the IP address of the primary server and <node-token> with the token retrieved in Step 3.

  3. Verify the node joined successfully:
    kubectl get nodes
    

Repeat for each server node. A minimum of 3 server nodes is required for high availability.
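You can confirm that the quorum requirement is met by counting control-plane nodes. A sketch, assuming K3s's standard node-role.kubernetes.io/control-plane node label:

```shell
# Count registered control-plane (server) nodes
SERVER_COUNT=$(kubectl get nodes -l node-role.kubernetes.io/control-plane=true --no-headers | wc -l)

if [ "$SERVER_COUNT" -lt 3 ]; then
  echo "WARNING: only ${SERVER_COUNT} server node(s) registered; HA requires at least 3"
else
  echo "server node count OK (${SERVER_COUNT})"
fi
```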

Step 5b: Taint Control Plane Only Nodes (Optional)

If you are using dedicated Control Plane Only nodes (not Combined role), apply taints to prevent workload scheduling:

kubectl taint nodes <node-name> CriticalAddonsOnly=true:NoSchedule

Apply this taint to each Control Plane Only node. Verify taints are applied:

kubectl describe nodes | grep -A 5 "Taints"

Note: This step is only required if you want dedicated control plane nodes. For Combined role deployments, do not apply taints.

Important: Control Plane Only Server nodes can be deployed with lower hardware specifications (2 cores, 4 GiB, 64 GiB) than the installer’s default minimum requirements. If your Control Plane Only Server nodes do not meet the Single-Node Lab configuration minimums (8 cores, 16 GiB, 128 GiB), you must set the SKIP_REQUIREMENTS_CHECK environment variable before running the installer or join command:

# For the primary server node
export SKIP_REQUIREMENTS_CHECK=1
/mnt/esb3027/install

# For additional Control Plane Only Server nodes
export SKIP_REQUIREMENTS_CHECK=1
/mnt/esb3027/join-server https://<primary-server-ip>:6443 <node-token>

Note: This applies to Server nodes only. Agent nodes have separate minimum requirements.

Step 6: Join Agent Nodes (Optional)

On each agent node:

  1. Mount the ISO:

    mkdir -p /mnt/esb3027
    mount -o loop,ro esb3027-acd-manager-X.Y.Z.iso /mnt/esb3027
    
  2. Join the cluster as an agent:

If your node has multiple network interfaces, set the INSTALL_K3S_EXEC environment variable with the agent mode before running the join script (see Multiple Network Interfaces):

export INSTALL_K3S_EXEC="agent --node-ip <node-ip> --flannel-iface=<interface>"

Run the join script:

/mnt/esb3027/join-agent https://<primary-server-ip>:6443 <node-token>

Replace <primary-server-ip> with the IP address of the primary server and <node-token> with the token retrieved in Step 3.

  3. Verify the node joined successfully:
    kubectl get nodes
    

Agent nodes provide additional workload capacity but do not participate in the control plane quorum.

Step 7: Verify Cluster Status

After all nodes are joined, verify the cluster is operational:

1. Verify all nodes are ready:

kubectl get nodes

Expected output:

NAME                 STATUS   ROLES                       AGE   VERSION
k3s-server-0         Ready    control-plane,etcd,master   5m    v1.33.4+k3s1
k3s-server-1         Ready    control-plane,etcd,master   3m    v1.33.4+k3s1
k3s-server-2         Ready    control-plane,etcd,master   2m    v1.33.4+k3s1
k3s-agent-1          Ready    <none>                      1m    v1.33.4+k3s1
k3s-agent-2          Ready    <none>                      1m    v1.33.4+k3s1

2. Verify system pods in both namespaces are running:

# Check kube-system namespace (Kubernetes core components)
kubectl get pods -n kube-system

# Check longhorn-system namespace (distributed storage)
kubectl get pods -n longhorn-system

All pods should show Running status. If any pods are still Pending or ContainerCreating, wait until they are ready.

This verification confirms:

  • K3s cluster is operational across all nodes
  • Longhorn distributed storage is running
  • CloudNativePG operator is deployed
  • All core components are healthy before proceeding to application deployment

Step 8: Air-Gapped Deployments (If Applicable)

If deploying in an air-gapped environment, on each node:

mkdir -p /mnt/esb3027-extras
mount -o loop,ro esb3027-acd-manager-extras-X.Y.Z.iso /mnt/esb3027-extras
/mnt/esb3027-extras/load-images

Step 9: Create Configuration File

Create a Helm values file for your deployment. At minimum, configure the manager hostnames, Zitadel external domain, and at least one router:

# ~/values.yaml
global:
  hosts:
    manager:
      - host: manager.example.com
      - host: manager-backup.example.com
    routers:
      - name: director-1
        address: 192.0.2.1
      - name: director-2
        address: 192.0.2.2

zitadel:
  zitadel:
    ExternalDomain: manager.example.com

Tip: A complete default values.yaml file is available on the installation ISO at /mnt/esb3027/values.yaml. Copy this file to use as a starting point for your configuration.

Important: The zitadel.zitadel.ExternalDomain must match the first entry in global.hosts.manager or authentication will fail due to CORS policy violations.
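A quick way to catch this mismatch before deploying is to compare the two values mechanically. A sketch, assuming the Go-based yq (v4) is installed and the values file follows the structure shown above:

```shell
# Paths assume the structure of the example ~/values.yaml above
FIRST_HOST=$(yq '.global.hosts.manager[0].host' ~/values.yaml)
EXT_DOMAIN=$(yq '.zitadel.zitadel.ExternalDomain' ~/values.yaml)

if [ "$FIRST_HOST" != "$EXT_DOMAIN" ]; then
  echo "ERROR: ExternalDomain (${EXT_DOMAIN}) does not match first manager host (${FIRST_HOST})"
else
  echo "CORS check OK: ${FIRST_HOST}"
fi
```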

Important: For multi-node deployments, Kafka replication is enabled by default with 3 replicas. Do not modify the kafka.replicaCount or kafka.controller.replicaCount settings unless you understand the implications for data durability.

Step 10: Load MaxMind GeoIP Databases (Optional)

If you plan to use GeoIP-based routing or validation features, load the MaxMind GeoIP databases. The following databases are used by the manager:

  • GeoIP2-City.mmdb - The City Database
  • GeoLite2-ASN.mmdb - The ASN Database
  • GeoIP2-Anonymous-IP.mmdb - The VPN and Anonymous IP Database

A helper utility is provided on the ISO to create the Kubernetes volume:

/mnt/esb3027/generate-maxmind-volume

The utility will prompt for the locations of the three database files and the name of the volume. After running this command, reference the volume in your configuration file:

manager:
  maxmindDbVolume: maxmind-db-volume

Replace maxmind-db-volume with the volume name you specified when running the utility.

Tip: When naming the volume, include a revision number or date (e.g., maxmind-db-volume-2026-04 or maxmind-db-volume-v2). This simplifies future updates: create a new volume with an updated name, update the values.yaml to reference the new volume, and delete the old volume after verification.

Step 11: Configure TLS Certificates (Optional)

For production deployments, configure a valid TLS certificate from a trusted Certificate Authority (CA). A self-signed certificate is deployed by default if no certificate is provided.

Method 1: Create TLS Secret Manually

Create a Kubernetes TLS secret with your certificate and key:

kubectl create secret tls acd-manager-tls --cert=tls.crt --key=tls.key
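Before creating the secret, it is worth confirming that the certificate and key actually belong together. For RSA key pairs, comparing the modulus digests is a common check:

```shell
# Both digests must be identical for a matching RSA certificate/key pair
CERT_HASH=$(openssl x509 -noout -modulus -in tls.crt | openssl md5)
KEY_HASH=$(openssl rsa -noout -modulus -in tls.key | openssl md5)

if [ "$CERT_HASH" = "$KEY_HASH" ]; then
  echo "certificate and key match"
else
  echo "ERROR: certificate and key do not match"
fi
```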

Method 2: Helm-Managed Secret

Add the certificate directly to your values.yaml:

ingress:
  secrets:
    acd-manager-tls: |
      -----BEGIN CERTIFICATE-----
      ...
      -----END CERTIFICATE-----
  tls:
    - hosts:
        - manager.example.com
      secretName: acd-manager-tls

Configuring All Ingress Controllers

All ingress controllers must be configured with the same certificate secret and hostname:

ingress:
  hostname: manager.example.com
  tls: true
  secretName: acd-manager-tls

zitadel:
  ingress:
    tls:
      - hosts:
          - manager.example.com
        secretName: acd-manager-tls

confd:
  ingress:
    hostname: manager.example.com
    tls: true
    secretName: acd-manager-tls

mib-frontend:
  ingress:
    hostname: manager.example.com
    tls: true
    secretName: acd-manager-tls

Important: The hostname must match the first entry in global.hosts.manager for Zitadel CORS compatibility. The secret name has a maximum length of 53 characters.
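The 53-character limit can be checked trivially in the shell before you commit to a name:

```shell
SECRET_NAME=acd-manager-tls

# ${#var} expands to the string length in characters
if [ "${#SECRET_NAME}" -gt 53 ]; then
  echo "ERROR: secret name is ${#SECRET_NAME} characters (max 53)"
else
  echo "secret name length OK (${#SECRET_NAME}/53)"
fi
```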

Step 12: Deploy the Manager Helm Chart

Deploy the CDN Manager application:

helm install acd-manager /mnt/esb3027/helm/charts/acd-manager --values ~/values.yaml

Note: By default, helm install runs silently until completion. To see real-time output during deployment, add the --debug flag:

helm install acd-manager /mnt/esb3027/helm/charts/acd-manager --values ~/values.yaml --debug

Tip: For better organization, split your configuration into multiple files and specify them with repeated --values flags:

helm install acd-manager /mnt/esb3027/helm/charts/acd-manager \
  --values ~/values-base.yaml \
  --values ~/values-tls.yaml \
  --values ~/values-autoscaling.yaml

Later files override earlier files, allowing you to maintain a base configuration with environment-specific overrides.
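To inspect the effect of the merged files without touching the cluster, you can render the chart locally first. A sketch using helm template (the output file name is arbitrary):

```shell
# Render all manifests offline; later --values files override earlier ones
helm template acd-manager /mnt/esb3027/helm/charts/acd-manager \
  --values ~/values-base.yaml \
  --values ~/values-tls.yaml \
  --values ~/values-autoscaling.yaml > /tmp/rendered-manifests.yaml

# Inspect the result before the real install
less /tmp/rendered-manifests.yaml
```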

Monitor the deployment progress:

kubectl get pods

Wait for all pods to show Running status before proceeding.

Note: The default Helm timeout is 5 minutes. If the installation fails due to a rollout timeout, retry with a larger timeout value:

helm install acd-manager /mnt/esb3027/helm/charts/acd-manager --values ~/values.yaml --timeout 10m

If a previous installation attempt failed and you receive an error that the release name is already in use, uninstall the previous release before retrying:

helm uninstall acd-manager
helm install acd-manager /mnt/esb3027/helm/charts/acd-manager --values ~/values.yaml

Step 13: Verify Deployment

Verify all application pods are running:

kubectl get pods

Note: During the initial deployment, several pods may enter a CrashLoopBackOff state depending on the timing of other containers starting up. This is expected, as some services wait for dependencies (such as databases or Kafka) to become available. The deployment should stabilize automatically after a few minutes.

Verify pods are distributed across nodes:

kubectl get pods -o wide

Expected output for a 3-node cluster (pod names will vary):

NAME                                              READY   STATUS      RESTARTS        AGE
acd-cluster-postgresql-1                          1/1     Running     0               11m
acd-cluster-postgresql-2                          1/1     Running     0               11m
acd-cluster-postgresql-3                          1/1     Running     0               10m
acd-manager-5b98d569d9-2pbph                      1/1     Running     0               3m
acd-manager-5b98d569d9-m54f9                      1/1     Running     0               3m
acd-manager-5b98d569d9-pq26f                      1/1     Running     0               3m
acd-manager-confd-6fb78548c4-xnrh4                1/1     Running     0               3m
acd-manager-gateway-8bc8446fc-chs26               1/1     Running     0               3m
acd-manager-gateway-8bc8446fc-wzrml               1/1     Running     0               3m
acd-manager-kafka-controller-0                    2/2     Running     0               3m
acd-manager-kafka-controller-1                    2/2     Running     0               3m
acd-manager-kafka-controller-2                    2/2     Running     0               3m
acd-manager-metrics-aggregator-76d96c4964-lwdcj   1/1     Running     2               3m
acd-manager-mib-frontend-7bdb69684b-6qxn8         1/1     Running     0               3m
acd-manager-mib-frontend-7bdb69684b-pkjrw         1/1     Running     0               3m
acd-manager-redis-master-0                        2/2     Running     0               3m
acd-manager-redis-replicas-0                      2/2     Running     0               3m
acd-manager-selection-input-5fb694b857-qxt67      1/1     Running     2               3m
acd-manager-zitadel-8448b4c4fc-2pkd8              1/1     Running     0               3m
acd-manager-zitadel-8448b4c4fc-vchp9              1/1     Running     0               3m
acd-manager-zitadel-init-hh6j7                    0/1     Completed   0               4m
acd-manager-zitadel-setup-nwp8k                   0/2     Completed   0               4m
alertmanager-0                                    1/1     Running     0               3m
grafana-6d948cfdc6-77ggk                          1/1     Running     0               3m
telegraf-54779f5f46-2jfj5                         1/1     Running     0               3m
victoria-metrics-agent-dc87df588-tn8wv            1/1     Running     0               3m
victoria-metrics-alert-757c44c58f-kk9lp           1/1     Running     0               3m
victoria-metrics-longterm-server-0                1/1     Running     0               3m
victoria-metrics-server-0                         1/1     Running     0               3m

Note: Init pods (such as zitadel-init and zitadel-setup) will show Completed status after successful initialization. This is expected behavior. Some pods may show restart counts as they wait for dependencies to become available.

Step 14: Configure DNS (Optional)

Add DNS records for the manager hostname. For high availability, configure multiple A records pointing to different server nodes:

manager.example.com.  IN  A  <server-1-ip>
manager.example.com.  IN  A  <server-2-ip>
manager.example.com.  IN  A  <server-3-ip>

Alternatively, configure a load balancer to distribute traffic across nodes.
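After publishing the records, you can confirm that all addresses are being served. A sketch using dig (the hostname is the example from above):

```shell
# List the A records currently served for the manager hostname
dig +short manager.example.com A

# Count them -- should equal the number of server nodes you published
dig +short manager.example.com A | wc -l
```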

Post-Installation

After installation completes, proceed to the Next Steps guide for:

  • Initial user configuration
  • Accessing the web interfaces
  • Configuring authentication
  • Setting up monitoring

Accessing the System

Refer to the Accessing the System section in the Getting Started guide for service URLs and default credentials.

Note: A self-signed TLS certificate is deployed by default. For production deployments, configure a valid TLS certificate before exposing the system to users.

High Availability Considerations

Pod Distribution

The Helm chart configures pod anti-affinity rules to ensure:

  • Kafka controllers are scheduled on separate nodes
  • PostgreSQL cluster members are distributed across nodes
  • Application pods are spread across available nodes

Data Replication and Failure Tolerance

For detailed information on data replication strategies and failure scenario tolerance, refer to the Architecture Guide and System Requirements Guide.

Troubleshooting

If pods fail to start or nodes fail to join:

  1. Check node status: kubectl get nodes
  2. Describe problematic pods: kubectl describe pod <pod-name>
  3. Review logs: kubectl logs <pod-name>
  4. Check cluster events: kubectl get events --sort-by='.lastTimestamp'

See the Troubleshooting Guide for additional assistance.

Next Steps

After successful installation:

  1. Next Steps Guide - Post-installation configuration
  2. Configuration Guide - System configuration
  3. Operations Guide - Day-to-day operations