Performance Tuning Guide

Optimization tips for improving CDN Manager performance

Overview

This guide provides performance tuning recommendations for the AgileTV CDN Manager (ESB3027). While the default configuration is suitable for most deployments, certain environments may benefit from additional optimizations.

Network Topology Optimization

Topology Aware Hints

The CDN Manager uses Kubernetes Topology Aware Hints to prefer routing pods in the same zone as the source of network traffic. This reduces cross-zone latency and improves overall system responsiveness.

How It Works

When nodes are labeled with topology zones, Kubernetes automatically routes traffic to pods in the same zone when possible. This is particularly beneficial for:

  • Low-latency requirements: Keeps traffic local to reduce round-trip time
  • Cost optimization: Reduces cross-zone data transfer costs in cloud environments
  • Load distribution: Prevents hotspots by distributing load across zones

Configuring Availability Zones

Each node must have zone and region labels applied for Topology Aware Hints to function:

# Label a node with a zone
kubectl label nodes <node-name> topology.kubernetes.io/zone=us-east-1a

# Label a node with a region
kubectl label nodes <node-name> topology.kubernetes.io/region=us-east-1

Replace <node-name> with your actual node names and adjust the zone/region values to match your deployment geography.
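Labeling many nodes one at a time is tedious. The loop below is a small sketch that generates the label commands from a node-to-zone mapping, so they can be reviewed before being piped to sh. The node names and zones are placeholders:

```shell
#!/bin/sh
# Sketch: emit `kubectl label` commands from a node/zone/region mapping.
# Review the output, then pipe it to `sh` to apply.
label_cmds() {
  while read -r node zone region; do
    echo "kubectl label nodes $node" \
         "topology.kubernetes.io/zone=$zone" \
         "topology.kubernetes.io/region=$region --overwrite"
  done
}

# Placeholder node names and zones -- replace with your own
label_cmds <<'EOF'
node-a us-east-1a us-east-1
node-b us-east-1b us-east-1
EOF
```

The --overwrite flag makes the commands safe to re-run if a label already exists.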

Note: Labels applied via kubectl label are stored on the Node object in the API server, so they survive node reboots. They do not survive the Node object being deleted and recreated (for example, when an autoscaler replaces a node), so consider applying them through your node provisioning tooling or the kubelet's --node-labels flag. Many cloud providers set these topology labels automatically.

Verify Topology Configuration

Verify labels are applied:

kubectl get nodes --show-labels | grep topology.kubernetes.io

Verify EndpointSlices are being generated with hints. The plain list output does not show hints, so inspect the YAML and look for a hints block with forZones entries under each endpoint:

kubectl get endpointslices -o yaml | grep -B2 -A3 'hints:'

Requirements for Topology Aware Hints

For Topology Aware Hints to activate:

  • Zone labels: Every node hosting endpoints must carry the topology.kubernetes.io/zone label; if any node is missing it, hints are not allocated
  • Balanced capacity: The control plane only assigns hints when allocatable CPU is spread across zones roughly in proportion to the endpoints; heavily skewed clusters fall back to cluster-wide routing
  • Zone coverage: Each zone serving traffic should have at least one ready endpoint
  • Service opt-in: Depending on Kubernetes version, the Service must carry the service.kubernetes.io/topology-mode: Auto annotation (service.kubernetes.io/topology-aware-hints: auto before v1.27); the Helm chart may already configure this

Integration with Pod Anti-Affinity

Topology labels complement the pod anti-affinity rules already configured in the Helm chart:

  • Pod Anti-Affinity: Handles pod-to-node placement to ensure high availability
  • Topology Aware Hints: Handles service-to-pod traffic routing to keep requests within the same zone

Together, these features optimize both placement and routing for improved performance.
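As an illustration of how the two mechanisms divide the work, the fragment below shows the general shape of a zone-aware anti-affinity rule. This is a generic sketch, not the chart's actual values; the app label and weight are placeholders:

```yaml
# Illustrative only -- the Helm chart manages the real rules.
# The 'app: acd-manager' label is a placeholder.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: acd-manager
          # Spread replicas across zones so topology-aware
          # routing can always find a local endpoint
          topologyKey: topology.kubernetes.io/zone
```

Spreading replicas across zones is what makes same-zone routing effective: if all replicas landed in one zone, hints would simply steer every request there.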

Fallback Behavior

If zone labels are not configured, the system falls back to random load-balancing across all available pods. This is functionally correct but may result in:

  • Increased cross-zone traffic
  • Higher latency for some requests
  • Less predictable performance characteristics

Kernel Network Tuning (sysctl)

For high-throughput deployments, tuning Linux kernel network parameters can significantly improve connection handling and overall system performance. These settings are particularly beneficial for environments with high connection rates or large numbers of concurrent connections.

Apply the following settings to optimize network performance:

# Networking
net.core.somaxconn = 1024
net.core.netdev_max_backlog = 2048
net.ipv4.tcp_max_syn_backlog = 2048

# Connection Tracking
net.netfilter.nf_conntrack_max = 131072
net.netfilter.nf_conntrack_tcp_timeout_established = 1200

# Port Reuse
net.ipv4.ip_local_port_range = 10240 65535
net.ipv4.tcp_tw_reuse = 1

# Memory Buffers
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608

Setting Descriptions

  • net.core.somaxconn (1024): Maximum socket listen backlog; increases the pending connection queue size
  • net.core.netdev_max_backlog (2048): Maximum packets queued at the network device level; helps absorb burst traffic
  • net.ipv4.tcp_max_syn_backlog (2048): Maximum queued SYN requests; improves handling of connection floods
  • net.netfilter.nf_conntrack_max (131072): Maximum tracked connections; prevents connection tracking table exhaustion
  • net.netfilter.nf_conntrack_tcp_timeout_established (1200): Timeout in seconds for established connections; reduces stale entry buildup
  • net.ipv4.ip_local_port_range (10240 65535): Range of local ports for outbound connections; expands the pool of ephemeral ports
  • net.ipv4.tcp_tw_reuse (1): Allows reusing TIME_WAIT sockets for new outbound connections; reduces port exhaustion under high load
  • net.core.rmem_max (8388608): Maximum receive socket buffer size (8 MB); improves high-bandwidth transfers
  • net.core.wmem_max (8388608): Maximum send socket buffer size (8 MB); improves high-bandwidth transfers

Applying Settings

Temporary (Until Reboot)

Apply settings immediately but they will be lost on reboot:

sudo sysctl -w net.core.somaxconn=1024
sudo sysctl -w net.core.netdev_max_backlog=2048
# ... repeat for each parameter

Persistent (Across Reboots)

Add settings to /etc/sysctl.conf or a file in /etc/sysctl.d/:

# Create a dedicated config file
cat <<EOF | sudo tee /etc/sysctl.d/99-cdn-manager.conf
# CDN Manager Network Tuning
net.core.somaxconn = 1024
net.core.netdev_max_backlog = 2048
net.ipv4.tcp_max_syn_backlog = 2048
net.netfilter.nf_conntrack_max = 131072
net.netfilter.nf_conntrack_tcp_timeout_established = 1200
net.ipv4.ip_local_port_range = 10240 65535
net.ipv4.tcp_tw_reuse = 1
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
EOF

# Apply all settings
sudo sysctl -p /etc/sysctl.d/99-cdn-manager.conf
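To confirm the settings took effect, the current values can be read back from /proc/sys. A minimal sketch, assuming a Linux host (the conntrack parameters are omitted here because they only appear once the nf_conntrack module is loaded):

```shell
#!/bin/sh
# Sketch: read current kernel values directly from /proc/sys.
sysval() {
  # net.core.somaxconn -> /proc/sys/net/core/somaxconn
  cat "/proc/sys/$(echo "$1" | tr . /)" 2>/dev/null || echo "unavailable"
}

for p in net.core.somaxconn net.core.netdev_max_backlog \
         net.ipv4.tcp_max_syn_backlog net.core.rmem_max net.core.wmem_max; do
  echo "$p = $(sysval "$p")"
done
```

Comparing this output against the recommended values above catches the common failure mode where a config file is created but never loaded with sysctl -p.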

Kubernetes Considerations

For Kubernetes deployments, these sysctl settings can be applied via:

  1. Node-level configuration: Use DaemonSets or node provisioning scripts
  2. Pod-level safe sysctls: Some sysctls can be set per-pod via securityContext.sysctls
  3. Container runtime configuration: Configure via container runtime options

Note that some sysctls require privileged containers or node-level configuration.
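For the pod-level route, the fragment below shows the general shape of securityContext.sysctls. It is a sketch, not a supported manifest: only namespaced sysctls can be set this way, and of the values above only net.ipv4.ip_local_port_range is in Kubernetes' default "safe" set; parameters such as net.core.somaxconn or net.ipv4.tcp_tw_reuse must first be allowed via the kubelet's --allowed-unsafe-sysctls flag. Node-level parameters like net.core.netdev_max_backlog cannot be set per pod at all.

```yaml
# Illustrative fragment -- pod and container names are placeholders
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  securityContext:
    sysctls:
      # In the default "safe" set; no kubelet changes required
      - name: net.ipv4.ip_local_port_range
        value: "10240 65535"
  containers:
    - name: app
      image: example/app:latest
```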

Monitoring Impact

After applying these settings, monitor:

  • Connection establishment rates
  • TIME_WAIT socket count: netstat -n | grep TIME_WAIT | wc -l
  • Connection tracking table usage: cat /proc/sys/net/netfilter/nf_conntrack_count
  • Network buffer utilization via Grafana dashboards
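The conntrack check above can be wrapped in a small script that reports table usage as a percentage, which is easier to alert on. A sketch for a Linux host; the 80% threshold mentioned in the comment is illustrative:

```shell
#!/bin/sh
# Sketch: report conntrack table usage as a percentage of nf_conntrack_max.
usage_pct() {
  # usage_pct <count> <max> -> integer percentage
  awk -v c="$1" -v m="$2" 'BEGIN { printf "%d\n", (c * 100) / m }'
}

count=$(cat /proc/sys/net/netfilter/nf_conntrack_count 2>/dev/null || echo 0)
max=$(cat /proc/sys/net/netfilter/nf_conntrack_max 2>/dev/null || echo 131072)
echo "conntrack: $count / $max ($(usage_pct "$count" "$max")%)"
# Consider alerting well before the table fills, e.g. above 80%:
# once nf_conntrack_max is reached, new connections are dropped.
```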

Resource Configuration

Horizontal Pod Autoscaler (HPA)

The default HPA configuration is tuned for production workloads. For environments with variable load, consider adjusting the scale metrics:

  • Core Manager (CPU 50%, Memory 80%): Lower the CPU threshold for faster scale-out
  • NGINX Gateway (CPU 75%, Memory 80%): Raise thresholds for cost optimization
  • MIB Frontend (CPU 75%, Memory 90%): Adjust based on operator concurrency

For detailed HPA configuration, see the Architecture Guide.
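Thresholds are typically adjusted through Helm values rather than by editing the HPA objects directly. The sketch below shows the general shape of such an override; the release name, chart reference, and values path are hypothetical, so check the chart's values.yaml for the real keys:

```shell
# Hypothetical release, chart, and values path -- verify
# against the chart's values.yaml before running
helm upgrade acd-manager acd/manager --reuse-values \
  --set coreManager.hpa.targetCPUUtilizationPercentage=40
```

Editing the HPA directly with kubectl works for experiments, but Helm will revert the change on the next upgrade.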

Resource Requests and Limits

Ensure resource requests and limits are appropriately sized for your workload. Under-provisioned resources can cause:

  • Pod evictions during high load
  • Increased latency due to CPU throttling
  • Slow scaling responses

Refer to the Configuration Guide for preset configurations and planning guidance.
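For reference, a resources stanza has the shape below. The numbers are placeholders, not recommendations; size them from observed usage:

```yaml
# Placeholder values -- size from observed usage, not from this example
resources:
  requests:
    cpu: 500m      # guaranteed share used for scheduling decisions
    memory: 512Mi  # pods exceeding requests are evicted first under pressure
  limits:
    cpu: "1"       # CPU above the limit is throttled, adding latency
    memory: 1Gi    # memory above the limit triggers an OOM kill
```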

Database Optimization

PostgreSQL

The PostgreSQL cluster is managed by the CloudNativePG operator. For improved performance:

  • Connection Pooling: The application uses connection pooling by default
  • Replica Usage: Read queries can be offloaded to replicas for read-heavy workloads
  • Backup Scheduling: Schedule backups during low-traffic periods to minimize I/O impact

Redis

Redis provides in-memory caching for sessions and ephemeral state:

  • Memory Allocation: Ensure Redis has enough memory for its working set; evictions under memory pressure reduce cache hit rates
  • Persistence: RDB snapshots are enabled; adjust snapshot frequency based on durability needs

Kafka

Kafka handles event streaming for selection input and metrics:

  • Partition Count: Default partitions are sized for typical workloads
  • Replication Factor: Production deployments use 3 replicas for fault tolerance
  • Consumer Groups: The Selection Input Worker is limited to one consumer per partition
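Consumer lag can be inspected with Kafka's standard tooling from inside a broker pod. A sketch, assuming a broker pod named acd-kafka-0 and a consumer group named selection-input; both names are placeholders, and the script's location varies by Kafka image:

```shell
# Pod name, broker address, and group name are placeholders
kubectl exec -it acd-kafka-0 -- \
  kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group selection-input
```

The LAG column shows how far each partition's consumer is behind the latest offset; a steadily growing lag means the Selection Input Worker cannot keep up, and since it runs one consumer per partition, increasing the partition count is the usual scaling lever.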

Monitoring Performance

Key Metrics to Watch

Monitor the following metrics for performance insights:

  • API Response Time: Track via Grafana dashboards
  • Pod CPU/Memory Usage: Identify resource bottlenecks
  • Kafka Lag: Monitor consumer lag for selection input processing
  • Database Connections: Watch for connection pool exhaustion

Grafana Dashboards

Pre-built dashboards are available at https://<manager-host>/grafana:

  • System Health: Overall cluster and application health
  • CDN Metrics: Routing and usage statistics
  • Resource Utilization: CPU, memory, and network usage per component

Troubleshooting Performance Issues

High Latency

  1. Check pod distribution across nodes: kubectl get pods -o wide
  2. Verify topology labels are applied: kubectl get nodes --show-labels
  3. Review network latency between nodes
  4. Check for resource contention: kubectl top pods

Slow Scaling

  1. Verify HPA is enabled: kubectl get hpa
  2. Check cluster capacity for scheduling new pods
  3. Review HPA metrics: kubectl describe hpa acd-manager

Database Performance

  1. Check PostgreSQL cluster status: kubectl get pods -l app=postgresql
  2. Review slow query logs (if enabled)
  3. Monitor connection pool usage

Next Steps

After reviewing performance tuning:

  1. Architecture Guide - Understand component interactions
  2. Configuration Guide - Detailed configuration options
  3. Metrics & Monitoring Guide - Comprehensive monitoring setup
  4. Troubleshooting Guide - Resolve performance issues