Certificate Monitoring Best Practices
This guide covers common use cases, organizational strategies, and best practices for managing certificate monitoring at scale.
Organizing Monitors with Tags
Tags help you filter, search, and manage monitors efficiently. Develop a consistent tagging strategy across your organization:
Environment Tags: Use production, staging, development, and qa to distinguish between service tiers. This allows you to quickly filter critical production services or apply different alert thresholds.
Service Type Tags: Apply tags like webserver, mailserver, database, ldap, api, cdn, and loadbalancer to group similar services. This helps when troubleshooting service-specific certificate issues.
Team or Ownership Tags: Use devops, platform, security, engineering, or specific team names to identify who manages each service. Route alerts to the appropriate teams through dedicated contact groups.
Geographic or Location Tags: Apply tags like us-east, eu-west, datacenter-1, aws, azure, on-prem to track where services are deployed. This is especially useful for distributed infrastructure.
Criticality Tags: Mark services with critical, high-priority, or customer-facing to identify which certificate expirations would cause the most significant impact.
Combine multiple tags for powerful filtering. For example, tag a monitor with production, webserver, aws, us-east, and critical to fully describe its context.
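As a rough illustration of combined-tag filtering, the sketch below treats each monitor's tags as a set and keeps only monitors that carry every requested tag. The `Monitor` class, the `filter_by_tags` helper, and the host names are hypothetical and exist only for this example; they are not part of any product API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Monitor:
    """Hypothetical record for one certificate monitor."""
    name: str
    tags: frozenset = field(default_factory=frozenset)


def filter_by_tags(monitors, required):
    """Return the monitors that carry every tag in `required`."""
    wanted = frozenset(required)
    return [m for m in monitors if wanted <= m.tags]


# Example inventory (made-up hosts).
monitors = [
    Monitor("www.example.com:443",
            frozenset({"production", "webserver", "aws", "us-east", "critical"})),
    Monitor("staging.example.com:443",
            frozenset({"staging", "webserver", "aws", "us-east"})),
    Monitor("ldap.internal.example:636",
            frozenset({"production", "ldap", "on-prem"})),
]

production_web = filter_by_tags(monitors, {"production", "webserver"})
print([m.name for m in production_web])  # ['www.example.com:443']
```

Subset matching means every extra tag narrows the result, which is why a small, consistent vocabulary matters more than a large one.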
Profile Strategies for Different Environments
Create separate monitoring profiles for different use cases rather than using a single profile for everything:
Production Profile:
- Aggressive thresholds: 90, 60, 45, 30, 15, 7, 3, 1, 0 days
- Enable all alert types (expiration, name verification, CA verification, integrity, connection, CAA records)
- Multiple contact groups for redundancy
- Use case: Customer-facing services where downtime is unacceptable
Staging Profile:
- Standard thresholds: 30, 15, 7, 0 days
- Disable name verification if using non-production hostnames
- Disable CAA record alerts for internal services
- Single contact group for the engineering team
- Use case: Pre-production testing environments
Development Profile:
- Minimal thresholds: 30, 7, 0 days
- Disable CA verification for self-signed certificates
- Disable name verification and CAA alerts
- Configure private CAs for internal PKI
- Use case: Developer workstations and test environments
Internal Services Profile:
- Standard thresholds: 60, 30, 15, 7, 0 days
- Configure private CAs for internal certificate authorities
- Assign monitoring agents for private network access
- Disable CAA alerts for services not on public DNS
- Use case: Internal LDAP, databases, and private APIs
Long-Lived Certificate Profile:
- Extended thresholds: 90, 60, 45, 30, 15, 7, 0 days
- All alerts enabled
- Use case: Certificates with 1-2 year validity that require lengthy procurement processes
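One way to keep these profiles reviewable is to write them down as data. The sketch below is only an illustration: the `Profile` class and the alert identifiers such as `caa_records` are assumed names, not the product's configuration format, and settings like private CAs, agents, and contact groups are left out.

```python
from dataclasses import dataclass

# Assumed identifiers for the six alert types listed in the production profile.
ALL_ALERTS = frozenset({
    "expiration", "name_verification", "ca_verification",
    "integrity", "connection", "caa_records",
})


@dataclass(frozen=True)
class Profile:
    """Hypothetical in-code description of a monitoring profile."""
    name: str
    thresholds_days: tuple
    disabled_alerts: frozenset = frozenset()

    @property
    def enabled_alerts(self):
        return ALL_ALERTS - self.disabled_alerts


PROFILES = {
    "production": Profile("production", (90, 60, 45, 30, 15, 7, 3, 1, 0)),
    # Name verification is disabled here on the assumption that staging
    # uses non-production hostnames.
    "staging": Profile("staging", (30, 15, 7, 0),
                       frozenset({"name_verification", "caa_records"})),
    "development": Profile("development", (30, 7, 0),
                           frozenset({"ca_verification", "name_verification",
                                      "caa_records"})),
    "internal": Profile("internal", (60, 30, 15, 7, 0),
                        frozenset({"caa_records"})),
    "long_lived": Profile("long_lived", (90, 60, 45, 30, 15, 7, 0)),
}

for profile in PROFILES.values():
    print(profile.name, profile.thresholds_days, sorted(profile.enabled_alerts))
```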
When to Use Monitoring Agents vs Public Monitoring
Use Public Monitoring (Default) When:
- Services are accessible from the public internet
- You want to verify certificates from an external perspective (how customers see them)
- Services are behind load balancers with public IP addresses
- Simplicity is preferred - no additional infrastructure to deploy
Use Monitoring Agents When:
- Services are on private networks (RFC 1918 addresses: 10.x.x.x, 172.16-31.x.x, 192.168.x.x)
- Services are behind firewalls or VPNs without public access
- You need to monitor from specific geographic locations for compliance
- You want to control the source IP addresses for security or allowlisting
- You need redundancy across multiple network paths
Hybrid Approach: Monitor the same service from both public infrastructure and internal agents to verify both external and internal perspectives. This catches issues like split-horizon DNS, firewall misconfigurations, or certificate mismatches between internal and external endpoints.
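A quick first check for whether a target needs an agent is to test its address against the three RFC 1918 ranges. The sketch below uses Python's standard `ipaddress` module; the `needs_agent` helper is a hypothetical name, and it deliberately ignores the other reasons listed above (firewalls, compliance, source-IP control).

```python
import ipaddress

# The three private ranges defined by RFC 1918.
RFC1918_NETWORKS = tuple(
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
)


def needs_agent(address: str) -> bool:
    """Return True if `address` falls inside an RFC 1918 private range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


for addr in ("10.1.2.3", "172.31.255.255", "172.32.0.1", "192.168.1.1", "8.8.8.8"):
    print(addr, needs_agent(addr))
```

The explicit network list is used instead of `ip.is_private` because that attribute also covers loopback, link-local, and other reserved ranges, which is broader than the RFC 1918 definition used in this guide.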
Balancing Alert Thresholds
Choose alert thresholds based on your certificate renewal process:
Automated Renewal (Let's Encrypt, ACME):
- Use shorter thresholds: 30, 15, 7, 0 days
- Certificates are short-lived (Let's Encrypt certificates are valid for 90 days) and ACME clients commonly renew them about 30 days before expiry
- Focus on detecting renewal failures rather than advance warnings
Manual Renewal with Quick Turnaround:
- Use standard thresholds: 60, 30, 15, 7, 0 days
- Provides advance warning while avoiding alert fatigue
- Suitable for most commercial certificate providers
Manual Renewal with Approval Process:
- Use aggressive thresholds: 90, 60, 45, 30, 15, 7, 3, 1, 0 days
- Long procurement cycles require maximum advance notice
- Common in enterprises with security review boards or budget approval requirements
Testing or Short-Lived Environments:
- Use minimal thresholds: 7, 0 days or even just 0 days
- Reduces noise for environments that may be torn down before certificates expire
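To make the threshold sets above concrete, the sketch below computes the days remaining on a certificate and reports the tightest threshold that has been reached. The dictionary keys and function names are illustrative assumptions; how a real monitoring service evaluates thresholds may differ.

```python
from datetime import date

# Threshold sets copied from the guidance above; the keys are
# illustrative labels, not product settings.
THRESHOLDS_BY_RENEWAL = {
    "automated": (30, 15, 7, 0),
    "manual_quick": (60, 30, 15, 7, 0),
    "manual_with_approval": (90, 60, 45, 30, 15, 7, 3, 1, 0),
    "short_lived_env": (7, 0),
}


def days_remaining(not_after: date, today: date) -> int:
    """Whole days until expiry; negative once the certificate has expired."""
    return (not_after - today).days


def tightest_threshold_reached(thresholds, days_left):
    """Return the smallest threshold that `days_left` has reached, or None."""
    reached = [t for t in thresholds if days_left <= t]
    return min(reached) if reached else None


left = days_remaining(date(2025, 3, 1), today=date(2025, 2, 17))
print(left)  # 12
print(tightest_threshold_reached(THRESHOLDS_BY_RENEWAL["automated"], left))        # 15
print(tightest_threshold_reached(THRESHOLDS_BY_RENEWAL["short_lived_env"], left))  # None
```

With 12 days left, the automated-renewal set has already passed its 30- and 15-day marks, which under automated renewal usually points to a failed renewal, while the short-lived-environment set stays quiet until day 7.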
Managing Internal vs External Services
External/Public Services:
- Validate against public CAs only
- Enable all alert types including CAA record validation
- Monitor from public infrastructure to match customer experience
- Use production-level alert thresholds
- Examples: Public websites, customer APIs, email servers accepting external mail
Internal Services:
- Configure private CAs for internal PKI validation
- Disable CAA alerts if not using public DNS
- Deploy monitoring agents on internal networks
- Consider disabling name verification when services are monitored by IP address rather than hostname
- Examples: Internal LDAP, databases, private APIs, development services
DMZ or Hybrid Services:
- Monitor from both public and internal perspectives
- Validate against both public and private CAs if using mixed infrastructure
- Carefully configure hostname validation for split-horizon DNS
- Examples: Web applications with public frontend and private backend, mail servers with both internal and external interfaces
Consolidating Monitoring
When to Monitor Individual Instances:
- Each instance has a unique certificate
- Different teams manage different instances
- Geographic distribution requires separate certificates
- Compliance requires monitoring each endpoint
When to Monitor Load Balancers Instead:
- All backend servers use the same certificate
- Load balancer terminates TLS (backend is HTTP)
- Reduces monitor count and simplifies management
- Billing optimization: one monitor instead of many
When to Monitor Both:
- Load balancer and backend servers have different certificates
- End-to-end TLS encryption (load balancer re-encrypts to backends)
- Verify both external customer experience and internal backend health
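The consolidation rules above can be sketched as a small planning function: always monitor the load balancer if it terminates TLS, then add one representative backend for each certificate that is not already covered. This is a simplified illustration under stated assumptions: `plan_monitors` is a hypothetical helper, certificates are compared by fingerprint strings you have collected yourself, and team or compliance requirements for per-instance monitoring are not modeled.

```python
def plan_monitors(lb_fingerprint, backend_fingerprints):
    """Return the targets to monitor so that every distinct certificate is covered once.

    lb_fingerprint: fingerprint of the load balancer certificate, or None
        if there is no TLS-terminating load balancer.
    backend_fingerprints: mapping of backend name -> certificate fingerprint;
        leave it empty when backends speak plain HTTP.
    """
    targets = []
    covered = set()
    if lb_fingerprint is not None:
        targets.append("load-balancer")
        covered.add(lb_fingerprint)
    # Sorted iteration keeps the chosen representative deterministic.
    for host in sorted(backend_fingerprints):
        fingerprint = backend_fingerprints[host]
        if fingerprint not in covered:
            targets.append(host)
            covered.add(fingerprint)
    return targets


# TLS terminates at the load balancer, backends are HTTP.
print(plan_monitors("aaa", {}))                          # ['load-balancer']
# End-to-end TLS: backends share one certificate that differs from the LB's.
print(plan_monitors("aaa", {"b1": "bbb", "b2": "bbb"}))  # ['load-balancer', 'b1']
# No load balancer, each instance has a unique certificate.
print(plan_monitors(None, {"b2": "ccc", "b1": "bbb"}))   # ['b1', 'b2']
```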
Cost Management
At $0.01 per monitor per day, costs scale with the number of configured monitors. See billing details for more information.
Strategies to manage costs:
- Audit monitors regularly and remove obsolete or decommissioned services
- Use tags to identify and clean up old development or test monitors
- Monitor load balancers instead of individual backend servers where appropriate
- Consolidate similar services under a single monitor when they share certificates
- Focus monitoring on customer-facing and critical services rather than every possible endpoint
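The arithmetic behind these strategies is simple enough to check by hand. The sketch below applies the $0.01 per monitor per day rate quoted above; the `monitoring_cost` helper and the 30-day billing period are assumptions made for the example, and `Decimal` is used to avoid floating-point rounding on currency.

```python
from decimal import Decimal

RATE_PER_MONITOR_PER_DAY = Decimal("0.01")  # USD, as quoted above


def monitoring_cost(monitor_count: int, days: int = 30) -> Decimal:
    """Cost in USD of running `monitor_count` monitors for `days` days."""
    return RATE_PER_MONITOR_PER_DAY * monitor_count * days


# Replacing 12 backend monitors with 1 load balancer monitor over 30 days.
before = monitoring_cost(12)
after = monitoring_cost(1)
print(before, after, before - after)  # 3.60 0.30 3.30

# A 250-monitor estate over a full year.
print(monitoring_cost(250, days=365))  # 912.50
```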
Do Not Compromise on:
- Production certificate monitoring - expired certificates cause outages
- Services with manual renewal processes - automated alerts catch renewals that would otherwise be missed
- Customer-facing services - reputation and trust are more valuable than monitoring costs
Review and Maintenance
Weekly: Check the errors page for monitors with issues. Address connection failures, validation errors, and certificates approaching expiration.
Monthly: Review your monitors list for obsolete services. Audit tag usage for consistency, verify contact groups have current team members, and check that monitoring agents are running current versions.
Quarterly: Evaluate profile configurations and alert thresholds. Review private CA expiration dates and plan renewals. Assess monitoring costs and verify all critical services have monitoring configured.
After Infrastructure Changes: Add monitors for newly deployed services and remove monitors for decommissioned ones. Adjust monitoring agents if network topology changes and update contact groups when team membership changes.
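The monthly tag audit can be partly automated by checking each monitor's tags against an agreed vocabulary. The sketch below uses the example tags from this guide as the taxonomy; the `audit_tags` helper and the rule that every monitor carries exactly one environment tag are assumptions for illustration, so adjust both to your own conventions.

```python
# Vocabulary taken from the tag examples in this guide; extend as needed.
TAXONOMY = {
    "environment": {"production", "staging", "development", "qa"},
    "service": {"webserver", "mailserver", "database", "ldap", "api", "cdn",
                "loadbalancer"},
    "ownership": {"devops", "platform", "security", "engineering"},
    "location": {"us-east", "eu-west", "datacenter-1", "aws", "azure", "on-prem"},
    "criticality": {"critical", "high-priority", "customer-facing"},
}
KNOWN_TAGS = set().union(*TAXONOMY.values())


def audit_tags(tags):
    """Return a list of human-readable problems found in one monitor's tags."""
    tags = set(tags)
    problems = []
    unknown = sorted(tags - KNOWN_TAGS)
    if unknown:
        problems.append("unknown tags: " + ", ".join(unknown))
    # Assumed rule: each monitor belongs to exactly one environment.
    environments = tags & TAXONOMY["environment"]
    if len(environments) != 1:
        problems.append(
            f"expected exactly one environment tag, found {len(environments)}"
        )
    return problems


print(audit_tags({"production", "webserver", "aws"}))  # []
print(audit_tags({"prod", "webserver"}))
```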
Multi-Team Organizations
Centralized Management:
- Security or platform team manages all monitors and profiles
- Standardized tagging and naming conventions
- Consistent alert thresholds across the organization
- Pros: Consistency, oversight, cost control
- Cons: Bottleneck for changes, may not fit all team needs
Distributed Management:
- Individual teams manage their own monitors
- Shared profiles and contact group conventions
- Tag-based filtering for each team's services
- Pros: Team autonomy, faster changes, team-specific configuration
- Cons: Potential inconsistency, requires coordination
Hybrid Approach:
- Platform team provides standard profiles and guidelines
- Individual teams create and manage their own monitors
- Shared tagging taxonomy for cross-team visibility
- Platform team monitors critical shared infrastructure
- Combines the consistency of centralized management with the autonomy of distributed management; a good fit for most organizations