Certificate Monitoring Best Practices

This guide covers common use cases, organizational strategies, and best practices for managing certificate monitoring at scale.

Organizing Monitors with Tags

Tags help you filter, search, and manage monitors efficiently. Develop a consistent tagging strategy across your organization:

Environment Tags: Use production, staging, development, and qa to distinguish between environments. This allows you to quickly filter critical production services or apply different alert thresholds per environment.

Service Type Tags: Apply tags like webserver, mailserver, database, ldap, api, cdn, and loadbalancer to group similar services. This helps when troubleshooting service-specific certificate issues.

Team or Ownership Tags: Use devops, platform, security, engineering, or specific team names to identify who manages each service. Route alerts to the appropriate teams through dedicated contact groups.

Geographic or Location Tags: Apply tags like us-east, eu-west, datacenter-1, aws, azure, on-prem to track where services are deployed. This is especially useful for distributed infrastructure.

Criticality Tags: Mark services with critical, high-priority, or customer-facing to identify which certificate expirations would cause the most significant impact.

Combine multiple tags for powerful filtering. For example, tag a monitor with production, webserver, aws, us-east, and critical to fully describe its context.
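As a rough illustration of how combined tags narrow a search, the sketch below models monitors as plain in-memory records and keeps only those that carry every requested tag. The Monitor class, the with_all_tags helper, and the hostnames are hypothetical and are not part of any product API.

```python
from dataclasses import dataclass, field


@dataclass
class Monitor:
    """Hypothetical in-memory record for one certificate monitor."""
    host: str
    tags: set[str] = field(default_factory=set)


def with_all_tags(monitors, required):
    """Return the monitors that carry every tag in `required`."""
    required = set(required)
    # A monitor matches when the required tags are a subset of its tags.
    return [m for m in monitors if required <= m.tags]


# Illustrative data only; the hostnames are placeholders.
monitors = [
    Monitor("www.example.com", {"production", "webserver", "aws", "us-east", "critical"}),
    Monitor("mail.example.com", {"production", "mailserver", "on-prem"}),
    Monitor("api.staging.example.com", {"staging", "api", "aws", "eu-west"}),
]

critical_prod = with_all_tags(monitors, {"production", "critical"})
print([m.host for m in critical_prod])
```

Requiring all tags (set intersection semantics) is one design choice; an "any of these tags" filter would use a non-empty overlap test instead.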

Profile Strategies for Different Environments

Create separate monitoring profiles for different use cases rather than using a single profile for everything:

Production Profile:

Aggressive thresholds: 90, 60, 45, 30, 15, 7, 3, 1, 0 days
Enable all alert types (expiration, name verification, CA verification, integrity, connection, CAA records)
Multiple contact groups for redundancy
Use case: Customer-facing services where downtime is unacceptable

Staging Profile:

Standard thresholds: 30, 15, 7, 0 days
Disable name verification if using non-production hostnames
Disable CAA record alerts for internal services
Single contact group for the engineering team
Use case: Pre-production testing environments

Development Profile:

Minimal thresholds: 30, 7, 0 days
Disable CA verification for self-signed certificates
Disable name verification and CAA alerts
Configure private CAs for internal PKI
Use case: Developer workstations and test environments

Internal Services Profile:

Standard thresholds: 60, 30, 15, 7, 0 days
Configure private CAs for internal certificate authorities
Assign monitoring agents for private network access
Disable CAA alerts for services not on public DNS
Use case: Internal LDAP, databases, and private APIs

Long-Lived Certificate Profile:

Extended thresholds: 90, 60, 45, 30, 15, 7, 0 days
All alerts enabled
Use case: Certificates with 1-2 year validity that require lengthy procurement processes
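The profiles above can be written down as data, which makes differences between them easy to review. This is a minimal sketch, not the product's configuration format: the Profile class, the alert names, and the PROFILES mapping are assumptions made for illustration, and the staging entry assumes non-production hostnames and internal-only DNS.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the six alert types listed in the Production Profile.
ALL_ALERTS = frozenset({
    "expiration", "name_verification", "ca_verification",
    "integrity", "connection", "caa_records",
})


@dataclass(frozen=True)
class Profile:
    """Illustrative profile: alert thresholds in days plus disabled alert types."""
    name: str
    thresholds_days: tuple[int, ...]
    disabled_alerts: frozenset = frozenset()

    @property
    def enabled_alerts(self):
        return ALL_ALERTS - self.disabled_alerts


PROFILES = {
    "production": Profile("production", (90, 60, 45, 30, 15, 7, 3, 1, 0)),
    # Assumes staging uses non-production hostnames and is not on public DNS.
    "staging": Profile("staging", (30, 15, 7, 0),
                       frozenset({"name_verification", "caa_records"})),
    "development": Profile("development", (30, 7, 0),
                           frozenset({"ca_verification", "name_verification", "caa_records"})),
    "internal": Profile("internal", (60, 30, 15, 7, 0), frozenset({"caa_records"})),
    "long_lived": Profile("long_lived", (90, 60, 45, 30, 15, 7, 0)),
}

for profile in PROFILES.values():
    print(profile.name, profile.thresholds_days, sorted(profile.enabled_alerts))
```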

When to Use Monitoring Agents vs Public Monitoring

Use Public Monitoring (Default) When:

Services are accessible from the public internet
You want to verify certificates from an external perspective (how customers see them)
Services are behind load balancers with public IP addresses
Simplicity is preferred - no additional infrastructure to deploy

Use Monitoring Agents When:

Services are on private networks (RFC 1918 addresses: 10.x.x.x, 172.16-31.x.x, 192.168.x.x)
Services are behind firewalls or VPNs without public access
You need to monitor from specific geographic locations for compliance
You want to control the source IP addresses for security or allowlisting
You need redundancy across multiple network paths

Hybrid Approach: Monitor the same service from both public infrastructure and internal agents to verify both external and internal perspectives. This catches issues like split-horizon DNS, firewall misconfigurations, or certificate mismatches between internal and external endpoints.
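One of the criteria above, whether a service sits on an RFC 1918 private address, is easy to check programmatically. The sketch below uses Python's standard ipaddress module; the needs_agent helper name is hypothetical, and it covers only the address criterion, not firewalls, VPNs, or compliance requirements.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def needs_agent(address: str) -> bool:
    """Return True if `address` falls inside an RFC 1918 private range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


for candidate in ["10.1.2.3", "172.31.255.255", "172.32.0.1", "192.168.1.10", "8.8.8.8"]:
    print(candidate, "agent" if needs_agent(candidate) else "public")
```

The explicit network list is used instead of the is_private attribute because is_private also matches loopback, link-local, and other reserved ranges that RFC 1918 does not cover.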

Balancing Alert Thresholds

Choose alert thresholds based on your certificate renewal process:

Automated Renewal (Let's Encrypt, ACME):

Use shorter thresholds: 30, 15, 7, 0 days
Certificates are typically valid for 90 days and renew automatically about 30 days before expiry
Focus on detecting renewal failures rather than advance warnings

Manual Renewal with Quick Turnaround:

Use standard thresholds: 60, 30, 15, 7, 0 days
Provides advance warning while avoiding alert fatigue
Suitable for most commercial certificate providers

Manual Renewal with Approval Process:

Use aggressive thresholds: 90, 60, 45, 30, 15, 7, 3, 1, 0 days
Long procurement cycles require maximum advance notice
Common in enterprises with security review boards or budget approval requirements

Testing or Short-Lived Environments:

Use minimal thresholds: 7 and 0 days, or only 0 days
Reduces noise for environments that may be torn down before certificates expire
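To make the threshold lists concrete, the sketch below computes the days left on a certificate and reports the most urgent threshold it has reached. It assumes an alert fires once the remaining days are at or below a threshold; the function names and that rule are illustrative, not a description of how any particular monitoring service evaluates thresholds.

```python
from datetime import date


def days_remaining(not_after: date, today: date) -> int:
    """Whole days between `today` and the certificate's expiry date."""
    return (not_after - today).days


def reached_threshold(days_left: int, thresholds):
    """Return the smallest threshold that `days_left` has reached, or None."""
    reached = [t for t in thresholds if days_left <= t]
    return min(reached) if reached else None


AUTOMATED_RENEWAL = (30, 15, 7, 0)  # thresholds from the automated renewal case
expiry = date(2025, 3, 31)          # example expiry date

for today in (date(2025, 2, 1), date(2025, 3, 20), date(2025, 4, 2)):
    left = days_remaining(expiry, today)
    print(today, left, reached_threshold(left, AUTOMATED_RENEWAL))
```

With automated renewal, a certificate that reaches the 15-day threshold usually means a renewal job has failed, which is the situation these shorter lists are meant to catch.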

Managing Internal vs External Services

External/Public Services:

Validate against public CAs only
Enable all alert types including CAA record validation
Monitor from public infrastructure to match customer experience
Use production-level alert thresholds
Examples: Public websites, customer APIs, email servers accepting external mail

Internal Services:

Configure private CAs for internal PKI validation
Disable CAA alerts if not using public DNS
Deploy monitoring agents on internal networks
Consider disabling name verification when monitoring by IP address
Examples: Internal LDAP, databases, private APIs, development services

DMZ or Hybrid Services:

Monitor from both public and internal perspectives
Validate against both public and private CAs if using mixed infrastructure
Carefully configure hostname validation for split-horizon DNS
Examples: Web applications with public frontend and private backend, mail servers with both internal and external interfaces
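For readers who want to reproduce these checks by hand, the sketch below shows how the same choices (trusting a private CA, relaxing hostname verification) map onto Python's standard ssl module. The build_context and days_until_expiry helpers are hypothetical names, and this is only an approximation of what a monitoring service does internally.

```python
import socket
import ssl
import time


def build_context(private_ca_file=None, verify_hostname=True):
    """Build a TLS client context that trusts the system CAs and, optionally,
    a private CA bundle (PEM file) for internal PKI."""
    ctx = ssl.create_default_context()
    if private_ca_file is not None:
        # Adds the private CA to the trust store alongside the public CAs.
        ctx.load_verify_locations(cafile=private_ca_file)
    # Chain validation stays on; only the hostname match can be relaxed,
    # for example when a service is monitored by IP address.
    ctx.check_hostname = verify_hostname
    return ctx


def days_until_expiry(host, port, ctx, timeout=5.0):
    """Connect, validate the certificate chain, and return whole days until expiry."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    expires = ssl.cert_time_to_seconds(cert["notAfter"])
    return int((expires - time.time()) // 86400)
```

A public service could be checked with days_until_expiry("www.example.com", 443, build_context()), while an internal service would pass a context built with the path to its private CA bundle. The hostname and any file path here are placeholders.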

Consolidating Monitoring

When to Monitor Individual Instances:

Each instance has a unique certificate
Different teams manage different instances
Geographic distribution requires separate certificates
Compliance requires monitoring each endpoint

When to Monitor Load Balancers Instead:

All backend servers use the same certificate
Load balancer terminates TLS (backend is HTTP)
Reduces monitor count and simplifies management
Billing optimization: one monitor instead of many

When to Monitor Both:

Load balancer and backend servers have different certificates
End-to-end TLS encryption (load balancer re-encrypts to backends)
Verify both external customer experience and internal backend health
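One way to decide between these options is to group endpoints by the certificate they actually serve. The sketch below assumes you have already collected a fingerprint (for example, a SHA-256 digest) per endpoint; the function names, endpoints, and fingerprint strings are made up for illustration.

```python
from collections import defaultdict


def group_by_certificate(fingerprints):
    """Map each certificate fingerprint to the endpoints that serve it."""
    groups = defaultdict(list)
    for endpoint, fingerprint in fingerprints.items():
        groups[fingerprint].append(endpoint)
    return dict(groups)


def consolidation_candidates(fingerprints):
    """Endpoint groups that share one certificate and could share one monitor."""
    return [
        sorted(endpoints)
        for endpoints in group_by_certificate(fingerprints).values()
        if len(endpoints) > 1
    ]


# Placeholder fingerprints; real values would come from the served certificates.
observed = {
    "backend-1.internal:443": "fp-shared",
    "backend-2.internal:443": "fp-shared",
    "backend-3.internal:443": "fp-shared",
    "lb.example.com:443": "fp-public",
}

print(consolidation_candidates(observed))
```

Here the three backends share a certificate and are candidates for a single monitor, while the load balancer serves a different certificate and would keep its own.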

Cost Management

At $0.01 per monitor per day, costs scale with the number of configured monitors. See billing details for more information.
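A quick estimate follows directly from the per-day rate. The sketch below uses the $0.01 rate stated above; the monitor counts and periods are example inputs, and the estimated_cost helper is a hypothetical name.

```python
from decimal import Decimal

RATE_PER_MONITOR_DAY = Decimal("0.01")  # USD, the rate stated in this guide


def estimated_cost(monitors: int, days: int) -> Decimal:
    """Estimated cost in USD for `monitors` monitors running for `days` days."""
    return RATE_PER_MONITOR_DAY * monitors * days


print(estimated_cost(50, 30))    # 50 monitors for a 30-day month
print(estimated_cost(200, 365))  # 200 monitors for a year
# Saving from monitoring one load balancer instead of 12 backends for 30 days.
print(estimated_cost(12, 30) - estimated_cost(1, 30))
```

For example, 50 monitors over 30 days come to 50 x 0.01 x 30 = $15.00. Decimal is used so the cent-level arithmetic stays exact.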

Strategies to manage costs:

Audit monitors regularly and remove obsolete or decommissioned services
Use tags to identify and clean up old development or test monitors
Monitor load balancers instead of individual backend servers where appropriate
Consolidate similar services under a single monitor when they share certificates
Focus monitoring on customer-facing and critical services rather than every possible endpoint

Do Not Compromise on:

Production certificate monitoring - expired certificates cause outages
Services with manual renewal processes - automated monitoring catches renewal deadlines that manual tracking misses
Customer-facing services - reputation and trust are more valuable than monitoring costs

Review and Maintenance

Weekly: Check the errors page for monitors with issues. Address connection failures, validation errors, and certificates approaching expiration.

Monthly: Review your monitors list for obsolete services. Audit tag usage for consistency, verify contact groups have current team members, and check that monitoring agents are running current versions.

Quarterly: Evaluate profile configurations and alert thresholds. Review private CA expiration dates and plan renewals. Assess monitoring costs and verify all critical services have monitoring configured.

After Infrastructure Changes: Add monitors for newly deployed services and remove monitors for decommissioned ones. Adjust monitoring agents if network topology changes and update contact groups when team membership changes.
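The monthly tag audit can be partly automated. The sketch below flags monitors that lack a tag from each required category; the TAXONOMY mapping, the choice of required categories, and the sample inventory are assumptions to adapt to your own conventions.

```python
# Hypothetical taxonomy: every monitor should carry one tag from each category.
TAXONOMY = {
    "environment": {"production", "staging", "development", "qa"},
    "ownership": {"devops", "platform", "security", "engineering"},
}


def missing_categories(tags, taxonomy=TAXONOMY):
    """Return the sorted category names for which `tags` has no matching tag."""
    tags = set(tags)
    return sorted(
        category
        for category, allowed in taxonomy.items()
        if not (tags & allowed)
    )


# Placeholder inventory: monitor name mapped to its tags.
inventory = {
    "www.example.com": {"production", "webserver", "devops"},
    "ldap.internal": {"ldap", "on-prem"},
    "api.staging.example.com": {"staging", "api"},
}

for name, tags in inventory.items():
    gaps = missing_categories(tags)
    if gaps:
        print(f"{name}: missing {', '.join(gaps)}")
```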

Multi-Team Organizations

Centralized Management:

Security or platform team manages all monitors and profiles
Standardized tagging and naming conventions
Consistent alert thresholds across the organization
Pros: Consistency, oversight, cost control
Cons: Bottleneck for changes, may not fit all team needs

Distributed Management:

Individual teams manage their own monitors
Shared profiles and contact group conventions
Tag-based filtering for each team's services
Pros: Team autonomy, faster changes, team-specific configuration
Cons: Potential inconsistency, requires coordination

Hybrid Approach:

Platform team provides standard profiles and guidelines
Individual teams create and manage their own monitors
Shared tagging taxonomy for cross-team visibility
Platform team monitors critical shared infrastructure
Pros: Combines central consistency with team autonomy; a good fit for most organizations
