Performance Tuning Guide

Optimization tips for improving CDN Manager performance

Overview

This guide provides performance tuning recommendations for the AgileTV CDN Manager (ESB3027). While the default configuration is suitable for most deployments, certain environments may benefit from additional optimizations.

Network Topology Optimization

Topology Aware Hints

The CDN Manager uses Kubernetes Topology Aware Hints to prefer routing pods in the same zone as the source of network traffic. This reduces cross-zone latency and improves overall system responsiveness.

How It Works

When nodes are labeled with topology zones, Kubernetes automatically routes traffic to pods in the same zone when possible. This is particularly beneficial for:

  • Low-latency requirements: Keeps traffic local to reduce round-trip time
  • Cost optimization: Reduces cross-zone data transfer costs in cloud environments
  • Load distribution: Prevents hotspots by distributing load across zones

Configuring Availability Zones

Each node must have zone and region labels applied for Topology Aware Hints to function:

# Label a node with a zone
kubectl label nodes <node-name> topology.kubernetes.io/zone=us-east-1a

# Label a node with a region
kubectl label nodes <node-name> topology.kubernetes.io/region=us-east-1

Replace <node-name> with your actual node names and adjust the zone/region values to match your deployment geography.
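Labeling many nodes one at a time is tedious. The loop below is a small sketch that generates the label commands from a node-to-zone mapping, so they can be reviewed before being piped to sh. The node names and zones are placeholders:

```shell
#!/bin/sh
# Sketch: emit `kubectl label` commands from a node/zone/region mapping.
# Review the output, then pipe it to `sh` to apply.
label_cmds() {
  while read -r node zone region; do
    echo "kubectl label nodes $node" \
         "topology.kubernetes.io/zone=$zone" \
         "topology.kubernetes.io/region=$region --overwrite"
  done
}

# Placeholder node names and zones -- replace with your own
label_cmds <<'EOF'
node-a us-east-1a us-east-1
node-b us-east-1b us-east-1
EOF
```

The --overwrite flag makes the commands safe to re-run if a label already exists.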

Note: Labels applied via kubectl label are stored on the Node object in the API server, so they survive node reboots. They do not survive the Node object being deleted and recreated (for example, when an autoscaler replaces a node), so consider applying them through your node provisioning tooling or the kubelet's --node-labels flag. Many cloud providers set these topology labels automatically.

Verify Topology Configuration

Verify labels are applied:

kubectl get nodes --show-labels | grep topology.kubernetes.io

Verify EndpointSlices are being generated with hints. The plain list output does not show hints, so inspect the YAML and look for a hints block with forZones entries under each endpoint:

kubectl get endpointslices -o yaml | grep -B2 -A3 'hints:'

Requirements for Topology Aware Hints

For Topology Aware Hints to activate:

  • Zone labels: Every node hosting endpoints must carry the topology.kubernetes.io/zone label; if any node is missing it, hints are not allocated
  • Balanced capacity: The control plane only assigns hints when allocatable CPU is spread across zones roughly in proportion to the endpoints; heavily skewed clusters fall back to cluster-wide routing
  • Zone coverage: Each zone serving traffic should have at least one ready endpoint
  • Service opt-in: Depending on Kubernetes version, the Service must carry the service.kubernetes.io/topology-mode: Auto annotation (service.kubernetes.io/topology-aware-hints: auto before v1.27); the Helm chart may already configure this

Integration with Pod Anti-Affinity

Topology labels complement the pod anti-affinity rules already configured in the Helm chart:

  • Pod Anti-Affinity: Handles pod-to-node placement to ensure high availability
  • Topology Aware Hints: Handles service-to-pod traffic routing to keep requests within the same zone

Together, these features optimize both placement and routing for improved performance.
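As an illustration of how the two mechanisms divide the work, the fragment below shows the general shape of a zone-aware anti-affinity rule. This is a generic sketch, not the chart's actual values; the app label and weight are placeholders:

```yaml
# Illustrative only -- the Helm chart manages the real rules.
# The 'app: acd-manager' label is a placeholder.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: acd-manager
          # Spread replicas across zones so topology-aware
          # routing can always find a local endpoint
          topologyKey: topology.kubernetes.io/zone
```

Spreading replicas across zones is what makes same-zone routing effective: if all replicas landed in one zone, hints would simply steer every request there.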

Fallback Behavior

If zone labels are not configured, the system falls back to random load-balancing across all available pods. This is functionally correct but may result in:

  • Increased cross-zone traffic
  • Higher latency for some requests
  • Less predictable performance characteristics

Kernel Network Tuning (sysctl)

For high-throughput deployments, tuning Linux kernel network parameters can significantly improve connection handling and overall system performance. These settings are particularly beneficial for environments with high connection rates or large numbers of concurrent connections.

Apply the following settings to optimize network performance:

# Networking
net.core.somaxconn = 1024
net.core.netdev_max_backlog = 2048
net.ipv4.tcp_max_syn_backlog = 2048

# Connection Tracking
net.netfilter.nf_conntrack_max = 131072
net.netfilter.nf_conntrack_tcp_timeout_established = 1200

# Port Reuse
net.ipv4.ip_local_port_range = 10240 65535
net.ipv4.tcp_tw_reuse = 1

# Memory Buffers
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608

Setting Descriptions

  • net.core.somaxconn (1024): Maximum socket listen backlog; increases the pending connection queue size
  • net.core.netdev_max_backlog (2048): Maximum packets queued at the network device level; helps absorb burst traffic
  • net.ipv4.tcp_max_syn_backlog (2048): Maximum queued SYN requests; improves handling of connection floods
  • net.netfilter.nf_conntrack_max (131072): Maximum tracked connections; prevents connection tracking table exhaustion
  • net.netfilter.nf_conntrack_tcp_timeout_established (1200): Timeout in seconds for established connections; reduces stale entry buildup
  • net.ipv4.ip_local_port_range (10240 65535): Range of local ports for outbound connections; expands the pool of ephemeral ports
  • net.ipv4.tcp_tw_reuse (1): Allows reusing TIME_WAIT sockets for new outbound connections; reduces port exhaustion under high load
  • net.core.rmem_max (8388608): Maximum receive socket buffer size (8 MB); improves high-bandwidth transfers
  • net.core.wmem_max (8388608): Maximum send socket buffer size (8 MB); improves high-bandwidth transfers

Applying Settings

Temporary (Until Reboot)

Apply settings immediately but they will be lost on reboot:

sudo sysctl -w net.core.somaxconn=1024
sudo sysctl -w net.core.netdev_max_backlog=2048
# ... repeat for each parameter

Persistent (Across Reboots)

Add settings to /etc/sysctl.conf or a file in /etc/sysctl.d/:

# Create a dedicated config file
cat <<EOF | sudo tee /etc/sysctl.d/99-cdn-manager.conf
# CDN Manager Network Tuning
net.core.somaxconn = 1024
net.core.netdev_max_backlog = 2048
net.ipv4.tcp_max_syn_backlog = 2048
net.netfilter.nf_conntrack_max = 131072
net.netfilter.nf_conntrack_tcp_timeout_established = 1200
net.ipv4.ip_local_port_range = 10240 65535
net.ipv4.tcp_tw_reuse = 1
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
EOF

# Apply all settings
sudo sysctl -p /etc/sysctl.d/99-cdn-manager.conf
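To confirm the settings took effect, the current values can be read back from /proc/sys. A minimal sketch, assuming a Linux host (the conntrack parameters are omitted here because they only appear once the nf_conntrack module is loaded):

```shell
#!/bin/sh
# Sketch: read current kernel values directly from /proc/sys.
sysval() {
  # net.core.somaxconn -> /proc/sys/net/core/somaxconn
  cat "/proc/sys/$(echo "$1" | tr . /)" 2>/dev/null || echo "unavailable"
}

for p in net.core.somaxconn net.core.netdev_max_backlog \
         net.ipv4.tcp_max_syn_backlog net.core.rmem_max net.core.wmem_max; do
  echo "$p = $(sysval "$p")"
done
```

Comparing this output against the recommended values above catches the common failure mode where a config file is created but never loaded with sysctl -p.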

Kubernetes Considerations

For Kubernetes deployments, these sysctl settings can be applied via:

  1. Node-level configuration: Use DaemonSets or node provisioning scripts
  2. Pod-level safe sysctls: Some sysctls can be set per-pod via securityContext.sysctls
  3. Container runtime configuration: Configure via container runtime options

Note that some sysctls require privileged containers or node-level configuration.
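For the pod-level route, the fragment below shows the general shape of securityContext.sysctls. It is a sketch, not a supported manifest: only namespaced sysctls can be set this way, and of the values above only net.ipv4.ip_local_port_range is in Kubernetes' default "safe" set; parameters such as net.core.somaxconn or net.ipv4.tcp_tw_reuse must first be allowed via the kubelet's --allowed-unsafe-sysctls flag. Node-level parameters like net.core.netdev_max_backlog cannot be set per pod at all.

```yaml
# Illustrative fragment -- pod and container names are placeholders
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  securityContext:
    sysctls:
      # In the default "safe" set; no kubelet changes required
      - name: net.ipv4.ip_local_port_range
        value: "10240 65535"
  containers:
    - name: app
      image: example/app:latest
```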

Monitoring Impact

After applying these settings, monitor:

  • Connection establishment rates
  • TIME_WAIT socket count: netstat -n | grep TIME_WAIT | wc -l
  • Connection tracking table usage: cat /proc/sys/net/netfilter/nf_conntrack_count
  • Network buffer utilization via Grafana dashboards
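The conntrack check above can be wrapped in a small script that reports table usage as a percentage, which is easier to alert on. A sketch for a Linux host; the 80% threshold mentioned in the comment is illustrative:

```shell
#!/bin/sh
# Sketch: report conntrack table usage as a percentage of nf_conntrack_max.
usage_pct() {
  # usage_pct <count> <max> -> integer percentage
  awk -v c="$1" -v m="$2" 'BEGIN { printf "%d\n", (c * 100) / m }'
}

count=$(cat /proc/sys/net/netfilter/nf_conntrack_count 2>/dev/null || echo 0)
max=$(cat /proc/sys/net/netfilter/nf_conntrack_max 2>/dev/null || echo 131072)
echo "conntrack: $count / $max ($(usage_pct "$count" "$max")%)"
# Consider alerting well before the table fills, e.g. above 80%:
# once nf_conntrack_max is reached, new connections are dropped.
```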

Resource Configuration

Horizontal Pod Autoscaler (HPA)

The default HPA configuration is tuned for production workloads. For environments with variable load, consider adjusting the scale metrics:

  • Core Manager (CPU 50%, Memory 80%): Lower the CPU threshold for faster scale-out
  • NGINX Gateway (CPU 75%, Memory 80%): Raise thresholds for cost optimization
  • MIB Frontend (CPU 75%, Memory 90%): Adjust based on operator concurrency

For detailed HPA configuration, see the Architecture Guide.
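Thresholds are typically adjusted through Helm values rather than by editing the HPA objects directly. The sketch below shows the general shape of such an override; the release name, chart reference, and values path are hypothetical, so check the chart's values.yaml for the real keys:

```shell
# Hypothetical release, chart, and values path -- verify
# against the chart's values.yaml before running
helm upgrade acd-manager acd/manager --reuse-values \
  --set coreManager.hpa.targetCPUUtilizationPercentage=40
```

Editing the HPA directly with kubectl works for experiments, but Helm will revert the change on the next upgrade.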

Resource Requests and Limits

Ensure resource requests and limits are appropriately sized for your workload. Under-provisioned resources can cause:

  • Pod evictions during high load
  • Increased latency due to CPU throttling
  • Slow scaling responses

Refer to the Configuration Guide for preset configurations and planning guidance.
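For reference, a resources stanza has the shape below. The numbers are placeholders, not recommendations; size them from observed usage:

```yaml
# Placeholder values -- size from observed usage, not from this example
resources:
  requests:
    cpu: 500m      # guaranteed share used for scheduling decisions
    memory: 512Mi  # pods exceeding requests are evicted first under pressure
  limits:
    cpu: "1"       # CPU above the limit is throttled, adding latency
    memory: 1Gi    # memory above the limit triggers an OOM kill
```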

Database Optimization

PostgreSQL

The PostgreSQL cluster is managed by the CloudNativePG operator. For improved performance:

  • Connection Pooling: The application uses connection pooling by default
  • Replica Usage: Read queries can be offloaded to replicas for read-heavy workloads
  • Backup Scheduling: Schedule backups during low-traffic periods to minimize I/O impact

Redis

Redis provides in-memory caching for sessions and ephemeral state:

  • Memory Allocation: Ensure Redis has enough memory for its working set; evictions under memory pressure reduce cache hit rates
  • Persistence: RDB snapshots are enabled; adjust snapshot frequency based on durability needs

Kafka

Kafka handles event streaming for selection input and metrics:

  • Partition Count: Default partitions are sized for typical workloads
  • Replication Factor: Production deployments use 3 replicas for fault tolerance
  • Consumer Groups: The Selection Input Worker is limited to one consumer per partition
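Consumer lag can be inspected with Kafka's standard tooling from inside a broker pod. A sketch, assuming a broker pod named acd-kafka-0 and a consumer group named selection-input; both names are placeholders, and the script's location varies by Kafka image:

```shell
# Pod name, broker address, and group name are placeholders
kubectl exec -it acd-kafka-0 -- \
  kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group selection-input
```

The LAG column shows how far each partition's consumer is behind the latest offset; a steadily growing lag means the Selection Input Worker cannot keep up, and since it runs one consumer per partition, increasing the partition count is the usual scaling lever.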

Monitoring Performance

Key Metrics to Watch

Monitor the following metrics for performance insights:

  • API Response Time: Track via Grafana dashboards
  • Pod CPU/Memory Usage: Identify resource bottlenecks
  • Kafka Lag: Monitor consumer lag for selection input processing
  • Database Connections: Watch for connection pool exhaustion

Grafana Dashboards

Pre-built dashboards are available at https://<manager-host>/grafana:

  • System Health: Overall cluster and application health
  • CDN Metrics: Routing and usage statistics
  • Resource Utilization: CPU, memory, and network usage per component

Troubleshooting Performance Issues

High Latency

  1. Check pod distribution across nodes: kubectl get pods -o wide
  2. Verify topology labels are applied: kubectl get nodes --show-labels
  3. Review network latency between nodes
  4. Check for resource contention: kubectl top pods

Slow Scaling

  1. Verify HPA is enabled: kubectl get hpa
  2. Check cluster capacity for scheduling new pods
  3. Review HPA metrics: kubectl describe hpa acd-manager

Database Performance

  1. Check PostgreSQL cluster status: kubectl get pods -l app=postgresql
  2. Review slow query logs (if enabled)
  3. Monitor connection pool usage

Next Steps

After reviewing performance tuning:

  1. Architecture Guide - Understand component interactions
  2. Configuration Guide - Detailed configuration options
  3. Metrics & Monitoring Guide - Comprehensive monitoring setup
  4. Troubleshooting Guide - Resolve performance issues