Performance Tuning Guide
Overview
This guide provides performance tuning recommendations for the AgileTV CDN Manager (ESB3027). While the default configuration is suitable for most deployments, certain environments may benefit from additional optimizations.
Network Topology Optimization
Topology Aware Hints
The CDN Manager uses Kubernetes Topology Aware Hints to prefer routing traffic to pods in the same zone as its source. This reduces cross-zone latency and improves overall system responsiveness.
How It Works
When nodes are labeled with topology zones, Kubernetes automatically routes traffic to pods in the same zone when possible. This is particularly beneficial for:
- Low-latency requirements: Keeps traffic local to reduce round-trip time
- Cost optimization: Reduces cross-zone data transfer costs in cloud environments
- Load distribution: Prevents hotspots by distributing load across zones
Configuring Availability Zones
Each node must have zone and region labels applied for Topology Aware Hints to function:
```bash
# Label a node with a zone
kubectl label nodes <node-name> topology.kubernetes.io/zone=us-east-1a

# Label a node with a region
kubectl label nodes <node-name> topology.kubernetes.io/region=us-east-1
```
Replace <node-name> with your actual node names and adjust the zone/region values to match your deployment geography.
Note: Labels applied via `kubectl label` are stored on the Node object and survive node reboots. However, if a node is deleted and re-registered (for example, when a cloud instance is replaced), the labels must be reapplied; consider setting them through your provisioning automation or cloud provider integration.
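For clusters with many nodes, the label commands can be generated from a simple mapping file instead of typed one by one. A minimal sketch (node names and zones below are placeholders); it prints the commands so they can be reviewed before piping to `sh`:

```shell
# generate_labels: read "node zone region" triples on stdin and print the
# corresponding kubectl label commands. Blank lines and comments are skipped.
generate_labels() {
  while read -r node zone region; do
    case "$node" in ''|'#'*) continue ;; esac
    echo "kubectl label nodes $node topology.kubernetes.io/zone=$zone topology.kubernetes.io/region=$region --overwrite"
  done
}

# Usage (review the output, then append "| sh" to apply):
# generate_labels <<'EOF'
# node-a us-east-1a us-east-1
# node-b us-east-1b us-east-1
# EOF
```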
Verify Topology Configuration
Verify labels are applied:
```bash
kubectl get nodes --show-labels | grep topology.kubernetes.io
```
Verify EndpointSlices are being generated with hints:
```bash
kubectl get endpointslices -o yaml | grep -A4 'hints:'
```
Hinted EndpointSlices contain a `hints.forZones` entry for each endpoint; a plain `kubectl get endpointslices` listing does not show the hints.
Requirements for Topology Aware Hints
For Topology Aware Hints to activate:
- Minimum Nodes: At least one node must be labeled with each zone referenced by endpoints
- Symmetry: The control plane only sets hints when endpoints can be allocated roughly in proportion to each zone's allocatable CPU; heavily skewed distributions cause hints to be dropped
- Zone Coverage: All zones with endpoints should have at least one ready node
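To sanity-check zone coverage, the node list can be summarized per zone. A small sketch over `kubectl` output (the `-L` flag appends the label value as the last column of each row):

```shell
# nodes_per_zone: count nodes per zone from the output of
#   kubectl get nodes -L topology.kubernetes.io/zone --no-headers
# Unlabeled nodes show up under "<none>".
nodes_per_zone() {
  awk '{ count[$NF]++ } END { for (z in count) print z, count[z] }' | sort
}

# Usage on a live cluster:
# kubectl get nodes -L topology.kubernetes.io/zone --no-headers | nodes_per_zone
```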
Integration with Pod Anti-Affinity
Topology labels complement the pod anti-affinity rules already configured in the Helm chart:
- Pod Anti-Affinity: Handles pod-to-node placement to ensure high availability
- Topology Aware Hints: Handles service-to-pod traffic routing to keep requests within the same zone
Together, these features optimize both placement and routing for improved performance.
Fallback Behavior
If zone labels are not configured, the system falls back to standard cluster-wide load balancing across all available pods, regardless of zone. This is functionally correct but may result in:
- Increased cross-zone traffic
- Higher latency for some requests
- Less predictable performance characteristics
Kernel Network Tuning (sysctl)
For high-throughput deployments, tuning Linux kernel network parameters can significantly improve connection handling and overall system performance. These settings are particularly beneficial for environments with high connection rates or large numbers of concurrent connections.
Recommended sysctl Settings
Apply the following settings to optimize network performance:
```
# Networking
net.core.somaxconn = 1024
net.core.netdev_max_backlog = 2048
net.ipv4.tcp_max_syn_backlog = 2048

# Connection Tracking
net.netfilter.nf_conntrack_max = 131072
net.netfilter.nf_conntrack_tcp_timeout_established = 1200

# Port Reuse
net.ipv4.ip_local_port_range = 10240 65535
net.ipv4.tcp_tw_reuse = 1

# Memory Buffers
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
```
Setting Descriptions
| Parameter | Recommended Value | Purpose |
|---|---|---|
| `net.core.somaxconn` | 1024 | Maximum socket listen backlog. Increases the pending connection queue size. |
| `net.core.netdev_max_backlog` | 2048 | Maximum packets queued at the network device level. Helps handle burst traffic. |
| `net.ipv4.tcp_max_syn_backlog` | 2048 | Maximum queued SYN requests. Improves handling of connection floods. |
| `net.netfilter.nf_conntrack_max` | 131072 | Maximum tracked connections. Prevents connection tracking table exhaustion. |
| `net.netfilter.nf_conntrack_tcp_timeout_established` | 1200 | Timeout for established connections (seconds). Reduces stale entry buildup. |
| `net.ipv4.ip_local_port_range` | 10240 65535 | Range of local ports for outbound connections. Expands available ephemeral ports. |
| `net.ipv4.tcp_tw_reuse` | 1 | Allows reusing TIME_WAIT sockets for outbound connections. Reduces port exhaustion under high load. |
| `net.core.rmem_max` | 8388608 | Maximum receive socket buffer size (8 MB). Improves high-bandwidth transfers. |
| `net.core.wmem_max` | 8388608 | Maximum send socket buffer size (8 MB). Improves high-bandwidth transfers. |
Applying Settings
Temporary (Until Reboot)
Apply settings immediately but they will be lost on reboot:
```bash
sudo sysctl -w net.core.somaxconn=1024
sudo sysctl -w net.core.netdev_max_backlog=2048
# ... repeat for each parameter
```
Persistent (Across Reboots)
Add settings to /etc/sysctl.conf or a file in /etc/sysctl.d/:
```bash
# Create a dedicated config file
cat <<EOF | sudo tee /etc/sysctl.d/99-cdn-manager.conf
# CDN Manager Network Tuning
net.core.somaxconn = 1024
net.core.netdev_max_backlog = 2048
net.ipv4.tcp_max_syn_backlog = 2048
net.netfilter.nf_conntrack_max = 131072
net.netfilter.nf_conntrack_tcp_timeout_established = 1200
net.ipv4.ip_local_port_range = 10240 65535
net.ipv4.tcp_tw_reuse = 1
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
EOF

# Apply all settings
sudo sysctl -p /etc/sysctl.d/99-cdn-manager.conf
```
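After applying, it is worth confirming that the running kernel reflects the intended values. A small sketch: a helper that compares expected values against `sysctl -n` output (multi-value parameters such as `ip_local_port_range` are tab-separated in sysctl output, so compare those by hand):

```shell
# verify_sysctls: read "parameter expected-value" pairs on stdin and compare
# each against a lookup command (default: sysctl -n). Prints a MISMATCH line
# for each differing parameter and returns non-zero if any differ.
verify_sysctls() {
  lookup="${1:-sysctl -n}"
  status=0
  while read -r key expected; do
    actual=$($lookup "$key" 2>/dev/null)
    if [ "$actual" != "$expected" ]; then
      echo "MISMATCH $key: want $expected, got ${actual:-<unset>}"
      status=1
    fi
  done
  return $status
}

# Example (run on the tuned host):
# verify_sysctls <<'EOF'
# net.core.somaxconn 1024
# net.core.netdev_max_backlog 2048
# EOF
```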
Kubernetes Considerations
For Kubernetes deployments, these sysctl settings can be applied via:
- Node-level configuration: Use DaemonSets or node provisioning scripts
- Pod-level safe sysctls: Some sysctls can be set per pod via `securityContext.sysctls`
- Container runtime configuration: Configure via container runtime options
Note that some sysctls require privileged containers or node-level configuration.
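As a sketch of the pod-level option, the manifest below sets `net.ipv4.ip_local_port_range`, one of the namespaced sysctls Kubernetes treats as safe by default; the pod name and image are placeholders. Parameters such as `net.core.somaxconn` are namespaced but not on the default safe list, so they additionally require the kubelet's `allowed-unsafe-sysctls` setting or node-level configuration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example              # placeholder name
spec:
  securityContext:
    sysctls:
      - name: net.ipv4.ip_local_port_range
        value: "10240 65535"
  containers:
    - name: app
      image: registry.example.com/app:latest   # placeholder image
```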
Monitoring Impact
After applying these settings, monitor:
- Connection establishment rates
- TIME_WAIT socket count: `netstat -n | grep TIME_WAIT | wc -l` (or `ss -tan state time-wait | wc -l` on systems without netstat)
- Connection tracking table usage: `cat /proc/sys/net/netfilter/nf_conntrack_count`
- Network buffer utilization via Grafana dashboards
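Raw conntrack counts are easier to act on as a percentage of the table size; alerting well before 100% avoids dropped connections. A minimal sketch:

```shell
# conntrack_pct: percentage of the connection-tracking table currently in use,
# given the current count and the configured maximum.
conntrack_pct() {
  awk -v c="$1" -v m="$2" 'BEGIN { printf "%.1f\n", c * 100 / m }'
}

# On a live node:
# conntrack_pct "$(cat /proc/sys/net/netfilter/nf_conntrack_count)" \
#               "$(cat /proc/sys/net/netfilter/nf_conntrack_max)"
```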
Resource Configuration
Horizontal Pod Autoscaler (HPA)
The default HPA configuration is tuned for production workloads. For environments with variable load, consider adjusting the scale metrics:
| Component | Default Scale Metrics | Tuning Consideration |
|---|---|---|
| Core Manager | CPU 50%, Memory 80% | Lower CPU threshold for faster scale-out |
| NGINX Gateway | CPU 75%, Memory 80% | Increase thresholds for cost optimization |
| MIB Frontend | CPU 75%, Memory 90% | Adjust based on operator concurrency |
For detailed HPA configuration, see the Architecture Guide.
Resource Requests and Limits
Ensure resource requests and limits are appropriately sized for your workload. Under-provisioned resources can cause:
- Pod evictions during high load
- Increased latency due to CPU throttling
- Slow scaling responses
Refer to the Configuration Guide for preset configurations and planning guidance.
Database Optimization
PostgreSQL
The PostgreSQL cluster is managed by the CloudNativePG operator. For improved performance:
- Connection Pooling: The application uses connection pooling by default
- Replica Usage: Read queries can be offloaded to replicas for read-heavy workloads
- Backup Scheduling: Schedule backups during low-traffic periods to minimize I/O impact
Redis
Redis provides in-memory caching for sessions and ephemeral state:
- Memory Allocation: Ensure sufficient memory for cache hit rates
- Persistence: RDB snapshots are enabled; adjust frequency based on durability needs
Kafka
Kafka handles event streaming for selection input and metrics:
- Partition Count: Default partitions are sized for typical workloads
- Replication Factor: Production deployments use 3 replicas for fault tolerance
- Consumer Groups: The Selection Input Worker is limited to one consumer per partition
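Consumer lag is the key signal that partition count or consumer capacity needs revisiting. One way to watch it is the stock `kafka-consumer-groups.sh` tool; the sketch below sums its LAG column (the column position and the group name are assumptions, so verify them against your Kafka version and deployment):

```shell
# total_lag: sum the LAG column (6th field in stock kafka-consumer-groups.sh
# --describe output) across all partitions, skipping the header and any
# partitions reporting a non-numeric lag.
total_lag() {
  awk 'NR > 1 && $6 ~ /^[0-9]+$/ { sum += $6 } END { print sum + 0 }'
}

# Against a live broker (bootstrap server and group name are placeholders):
# kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
#   --describe --group <selection-input-group> | total_lag
```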
Monitoring Performance
Key Metrics to Watch
Monitor the following metrics for performance insights:
- API Response Time: Track via Grafana dashboards
- Pod CPU/Memory Usage: Identify resource bottlenecks
- Kafka Lag: Monitor consumer lag for selection input processing
- Database Connections: Watch for connection pool exhaustion
Grafana Dashboards
Pre-built dashboards are available at https://<manager-host>/grafana:
- System Health: Overall cluster and application health
- CDN Metrics: Routing and usage statistics
- Resource Utilization: CPU, memory, and network usage per component
Troubleshooting Performance Issues
High Latency
- Check pod distribution across nodes: `kubectl get pods -o wide`
- Verify topology labels are applied: `kubectl get nodes --show-labels`
- Review network latency between nodes
- Check for resource contention: `kubectl top pods`
Slow Scaling
- Verify HPA is enabled: `kubectl get hpa`
- Check cluster capacity for scheduling new pods
- Review HPA metrics: `kubectl describe hpa acd-manager`
Database Performance
- Check PostgreSQL cluster status: `kubectl get pods -l app=postgresql`
- Review slow query logs (if enabled)
- Monitor connection pool usage
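Connection pool exhaustion shows up as sessions approaching `max_connections`. A minimal sketch of a headroom check (the pod name, user, and exec invocation are assumptions to adapt to your deployment):

```shell
# pool_headroom: remaining PostgreSQL connection slots, given the current
# session count and the configured max_connections.
pool_headroom() {
  awk -v c="$1" -v m="$2" 'BEGIN { print m - c }'
}

# Gathering the inputs from a live cluster (placeholders; adjust):
# current=$(kubectl exec <postgres-pod> -- psql -U postgres -Atc \
#   'SELECT count(*) FROM pg_stat_activity')
# max=$(kubectl exec <postgres-pod> -- psql -U postgres -Atc \
#   'SHOW max_connections')
# pool_headroom "$current" "$max"
```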
Next Steps
After reviewing performance tuning:
- Architecture Guide - Understand component interactions
- Configuration Guide - Detailed configuration options
- Metrics & Monitoring Guide - Comprehensive monitoring setup
- Troubleshooting Guide - Resolve performance issues