June 1, 2026
How to Reduce Datadog Log Ingestion Cost Without Losing Visibility
A practical log reduction strategy for classifying high-value logs, sampling noisy streams, filtering at the edge, and routing lower-value telemetry to cheaper storage.
For the broader framework, see Datadog Cost Reduction: What to Keep in Datadog and What to Offload to Zabbix/Grafana.
Datadog is useful when teams need fast access to production logs, metrics, traces, alerts, and incident context in one place. The cost problem usually starts when every log line is treated as equally valuable.
A good log reduction strategy does not begin by deleting data. It begins by classifying logs by operational value, deciding which logs need hot search, which logs only need archive retention, and which logs should never reach Datadog in the first place.
The practical goal is simple: keep critical production and security visibility in Datadog, but stop paying premium observability prices for repetitive, low-value noise.
Why Datadog Log Costs Grow
Datadog log billing has two separate concepts that matter for cost planning: ingestion and indexing. Datadog charges for ingested logs based on the total number of gigabytes submitted to the Datadog Logs service. It also charges for indexed log events based on the number of log events submitted for indexing under the selected retention policy.
That split matters. Index exclusion filters reduce the volume of indexed, searchable logs, but they act on logs that have already been submitted to Datadog. A log that reaches the platform counts toward ingested gigabytes even if it is later excluded from every index, so reducing indexing alone leaves the ingestion line of the bill unchanged.
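The effect of the ingestion and indexing split can be illustrated with a small cost model. This is a minimal sketch: the `LogPricing` type, the `estimate_monthly_cost` function, the volumes, and both unit prices are hypothetical placeholders, not quoted Datadog prices. Substitute the figures from your own contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogPricing:
    """Hypothetical unit prices; replace with the values from your contract."""
    per_ingested_gb: float
    per_million_indexed_events: float


def estimate_monthly_cost(ingested_gb, indexed_events, pricing):
    """Return the ingestion, indexing, and total cost for one month."""
    ingestion = ingested_gb * pricing.per_ingested_gb
    indexing = (indexed_events / 1_000_000) * pricing.per_million_indexed_events
    return {"ingestion": ingestion, "indexing": indexing, "total": ingestion + indexing}


# Placeholder prices for illustration only.
pricing = LogPricing(per_ingested_gb=0.10, per_million_indexed_events=2.00)

# Baseline: everything is ingested and indexed.
baseline = estimate_monthly_cost(5_000, 4_000_000_000, pricing)

# Exclusion filters: half of the events are no longer indexed,
# but the same gigabytes were still submitted to the platform.
after_exclusion = estimate_monthly_cost(5_000, 2_000_000_000, pricing)

# Edge filtering: half of the volume never leaves the cluster,
# so ingestion drops as well.
after_edge = estimate_monthly_cost(2_500, 2_000_000_000, pricing)

for name, cost in [("baseline", baseline),
                   ("exclusion filters", after_exclusion),
                   ("edge filtering", after_edge)]:
    print(f"{name}: ingestion={cost['ingestion']:.2f} "
          f"indexing={cost['indexing']:.2f} total={cost['total']:.2f}")
```

In this model, the exclusion-filter scenario lowers only the indexing term, while the edge-filtering scenario lowers both terms.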
The usual cost drivers are not mysterious:
- debug logging left enabled in production
- health check and readiness probe logs
- HTTP 200 access logs from web servers, load balancers, and ingress controllers
- Kubernetes control plane logs and logs generated by container churn
- VPC flow logs, CDN logs, DNS logs, firewall logs, and service mesh logs
- logs with large unused fields
- duplicated telemetry already available in another system
- logs used as a substitute for metrics
Datadog’s own guidance describes several ways log volume becomes wasteful: debug logs, error loops, unnecessary performance data inside logs, extra fields that are never used, and log streams that are not all equal in value.
The First Rule: Do Not Cut Blindly
Random log cuts create two predictable problems:
- Incident response gets worse. A team may save money on a monthly bill and then lose hours during a production incident because the one log stream that explained the failure was excluded.
- Security and compliance evidence can disappear. Authentication logs, authorization failures, administrative changes, payment-related audit events, and privileged access events may be needed for investigations or audits.
The correct question is not “which logs can we delete?” The correct question is “which logs need hot search, which need cheaper retention, and which should be reduced before they reach Datadog?”
Classify Logs into Three Tiers
A simple three-tier model is enough for most environments.
Tier 1: Keep hot and searchable
These logs need to stay searchable in Datadog or another hot search platform:
- production errors
- HTTP 5xx responses
- failed authentication attempts
- privileged access and configuration changes
- payment, identity, and security events
- database connection failures
- logs directly tied to active monitors or incident workflows
These logs are expensive to lose. They should have clear retention rules, ownership, alerting, and access controls.
Tier 2: Sample, reduce, or move to warm storage
These logs are useful, but usually not useful at full volume:
- HTTP 200 and 302 access logs
- normal application info logs
- successful transaction logs
- high-volume web server logs
- load balancer access logs
- standard service mesh traffic logs
For these streams, keeping 100 percent of events in hot search is often wasteful. A common pattern is to index a sample, generate metrics from the stream, and route the full raw feed to cheaper storage for later investigation.
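One way to index a sample is to decide per request rather than per line, so that every log line belonging to a sampled request is kept together. The sketch below assumes each event carries some stable key such as a request ID. The function name `keep_in_sample` and the 10 percent rate are illustrative choices, not a Datadog or router API.

```python
import hashlib


def keep_in_sample(key: str, sample_rate: float) -> bool:
    """Deterministically keep roughly `sample_rate` of distinct keys.

    The same key always gets the same answer, so all log lines that share
    a request ID are kept or dropped together.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")   # integer in [0, 2**64)
    return bucket < int(sample_rate * 2**64)


# Usage: keep about 10 percent of successful requests.
request_ids = [f"req-{i}" for i in range(10_000)]
kept = [rid for rid in request_ids if keep_in_sample(rid, 0.10)]
print(f"kept {len(kept)} of {len(request_ids)} requests")
```

Hash-based sampling is used here instead of a random draw because it gives the same decision on every host that sees the same request ID, which keeps sampled requests complete across services.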
Tier 3: Drop, archive, or bypass Datadog
These logs often provide little value in hot search:
- Kubernetes liveness and readiness probes
- repetitive health checks
- verbose DEBUG and TRACE logs
- non-production debug noise
- heartbeat messages
- duplicate telemetry already collected elsewhere
Some of this data can be dropped. Some should be archived to object storage if the organization wants forensic retention. The key point is that low-value logs should be reduced at the edge whenever possible, not after they have already driven ingestion volume.
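The three tiers can be written down as an explicit classification rule, which makes the policy reviewable and testable. The following is a minimal sketch, not a complete policy. The field names (`level`, `status`, `env`, `path`, `event_type`), the probe paths, and the security event names are assumptions about how your logs are structured.

```python
from enum import Enum


class Tier(Enum):
    HOT = 1              # keep hot and searchable
    SAMPLE = 2           # sample, reduce, or move to warm storage
    DROP_OR_ARCHIVE = 3  # drop, archive, or bypass Datadog


# Hypothetical values; adjust to your own services.
PROBE_PATHS = {"/healthz", "/ready"}
SECURITY_EVENTS = {"auth_failure", "privileged_access", "config_change", "payment"}


def classify(event: dict) -> Tier:
    """Assign an event to a tier. Tier 1 rules run first so that a
    security or error event is never downgraded by a later rule."""
    level = str(event.get("level", "")).upper()
    status = event.get("status")
    env = event.get("env", "production")

    if event.get("event_type") in SECURITY_EVENTS:
        return Tier.HOT
    is_error = level in {"ERROR", "FATAL"} or (isinstance(status, int) and status >= 500)
    if env == "production" and is_error:
        return Tier.HOT
    if event.get("path") in PROBE_PATHS or level in {"DEBUG", "TRACE"}:
        return Tier.DROP_OR_ARCHIVE
    return Tier.SAMPLE


print(classify({"level": "error", "env": "production"}))
print(classify({"path": "/healthz", "status": 200}))
print(classify({"path": "/api/orders", "status": 200, "level": "info"}))
```

Rule order is the main design choice: checking Tier 1 conditions before the noise rules means a DEBUG-level authentication failure still lands in hot search.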
What to Do Inside Datadog First
Before deploying new infrastructure, use the controls already available in Datadog.
Review top log producers
Start with a seven-day usage review. Group logs by service, environment, source, status code, and team. Identify the top five producers by volume. The goal is to find patterns, not to argue about individual log lines.
Useful questions:
- Which services produce the most logs?
- Which indexes are rarely queried?
- Which logs are mostly HTTP 2xx or health checks?
- Which logs are generated by non-production environments?
- Which teams can explain their log volume?
Datadog recommends usage monitoring, index query review, and exclusion filters for high-volume logs. Log Patterns can also be used to find repetitive log lines that are good exclusion candidates.
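If the usage data can be exported as records, ranking producers takes only a few lines. The sketch below assumes a hypothetical export format: a list of dictionaries with `service`, `env`, and `bytes` keys. It does not call any Datadog API.

```python
from collections import Counter


def top_producers(records, key="service", n=5):
    """Rank the values of `key` by total bytes, largest first."""
    totals = Counter()
    for record in records:
        totals[record.get(key, "unknown")] += record.get("bytes", 0)
    return totals.most_common(n)


# Hypothetical seven-day usage export.
usage = [
    {"service": "checkout", "env": "prod", "bytes": 1_200},
    {"service": "ingress", "env": "prod", "bytes": 9_000},
    {"service": "checkout", "env": "staging", "bytes": 300},
    {"service": "ingress", "env": "prod", "bytes": 4_000},
]

by_service = top_producers(usage)
by_env = top_producers(usage, key="env")
print(by_service)
print(by_env)
```

Running the same function with a different `key` (environment, source, status code, team) answers most of the review questions above from one export.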
Segment indexes
Do not send all logs into one catch-all index with one retention policy. Separate indexes by value and use case.
A practical structure:
| Index | Example content | Suggested treatment |
|---|---|---|
| Production critical | errors, auth, payment, security events | hot search, alerting, normal retention |
| Production operational | access logs, info logs, normal traffic | sampled or shorter retention |
| Non-production | dev and staging logs | short retention or heavy exclusion |
| Archive-only | low-value historical data | route to archive or lower-cost tier |
Datadog index filters are evaluated in order, and logs enter the first index whose filter they match. That means index order matters. Put specific indexes above broad catch-all filters.
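The first-match behavior is easy to model, and the model shows why a broad filter placed first starves the indexes below it. This sketch only illustrates the ordering rule described above. The index names and predicates are hypothetical, and the code is not Datadog's filter syntax or API.

```python
def first_matching_index(event, indexes):
    """Return the name of the first index whose filter matches the event."""
    for name, matches in indexes:
        if matches(event):
            return name
    return None


# Hypothetical indexes: one broad catch-all and one specific index.
catch_all = ("main", lambda e: True)
critical = ("prod-critical",
            lambda e: e.get("env") == "prod" and e.get("status") == "error")

event = {"env": "prod", "status": "error"}

wrong_order = first_matching_index(event, [catch_all, critical])
right_order = first_matching_index(event, [critical, catch_all])
print(wrong_order)   # the catch-all wins because it is listed first
print(right_order)   # the specific index wins when it is listed first
```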
Use exclusion filters carefully
Exclusion filters are useful for controlling indexing. Datadog documents that excluded logs are discarded from indexes, but can still flow through Live Tail, generate metrics, and be archived.
Good exclusion candidates:
- http.status_code:[200 TO 299] for high-volume access logs
- status:DEBUG in production
- /healthz, /ready, and similar health probes
- non-production logs not tied to incidents
Do not use exclusion filters as a substitute for governance. Create owners, change control, and usage alerts. Otherwise one team can accidentally re-enable a noisy stream and recreate the bill problem.
Convert repetitive logs into metrics
Some logs exist only because teams want a count, rate, or trend. In those cases, generate a metric and reduce the raw log volume.
Examples:
- count successful logins by region
- count payment failures by processor
- count API responses by status code
- count queue processing results
Once the metric is validated, the raw informational log can often be sampled, shortened, or excluded from hot indexing.
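The idea behind a log-based metric is a grouped count: the individual lines are collapsed into one number per combination of dimensions. The sketch below shows that aggregation in plain Python. The `count_by` helper and the event fields are illustrative assumptions, and in practice the count would be produced by Datadog's log-based metrics or by the routing layer rather than by application code.

```python
from collections import Counter


def count_by(events, *fields):
    """Count events grouped by the given fields."""
    counts = Counter()
    for event in events:
        counts[tuple(event.get(field, "unknown") for field in fields)] += 1
    return counts


# Hypothetical API access events.
events = [
    {"service": "api", "status": 200},
    {"service": "api", "status": 200},
    {"service": "api", "status": 500},
    {"service": "billing", "status": 200},
]

responses = count_by(events, "service", "status")
for (service, status), total in sorted(responses.items()):
    print(f"api.responses service={service} status={status} count={total}")
```

Four log lines become three metric points here. At production volume the ratio is far larger, which is why the raw informational lines can then be sampled or excluded.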
Where Edge Filtering Matters
In-platform controls help, but edge filtering is where larger cost reductions become possible. If noisy logs never leave the host, node, or cluster, they do not become Datadog ingestion volume.
This is where Vector, Fluent Bit, Datadog Observability Pipelines, or another telemetry router fits.
A practical routing policy looks like this:
| Log type | Destination |
|---|---|
| ERROR, FATAL, HTTP 5xx, auth failures | Datadog hot index |
| sampled HTTP 2xx access logs | Datadog or warm log store |
| full raw access logs | S3, GCS, Azure Blob, Loki, OpenSearch, or OpenObserve |
| health checks and probes | drop or archive only |
| non-production DEBUG logs | short retention or local archive |
This keeps Datadog focused on high-value operational visibility while preserving lower-value data somewhere cheaper.
Vector and Fluent Bit as routing layers
Vector and Fluent Bit are common choices for log routing. Both can collect logs, parse fields, filter events, sample streams, add metadata, and send different classes of logs to different destinations.
For example, a Kubernetes cluster could route app logs this way:
    # simplified routing example
    sources:
      app_logs:
        type: kubernetes_logs
    transforms:
      route_logs:
        type: route
        inputs: [app_logs]
        route:
          critical: '.level == "ERROR" || .status >= 500'
          noisy: '.path == "/healthz" || .path == "/ready" || .level == "DEBUG"'
    sinks:
      datadog_critical:
        type: datadog_logs
        inputs: [route_logs.critical]
      s3_archive:
        type: aws_s3
        inputs: [route_logs.noisy]
This is not a production-ready configuration. It is the pattern: classify at the edge, send critical logs to Datadog, and send noisy logs to cheaper storage.
Choosing lower-cost destinations
There is no universal replacement for Datadog Logs. The right backend depends on query behavior.
Object storage
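Object storage such as S3, GCS, or Azure Blob is usually the cheapest default archive for raw logs. It is useful for long-term retention, audit evidence, and rare forensic retrieval. It is not a good hot investigation interface by itself, because the data has to be queried through a separate engine or rehydrated before anyone can search it.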
Grafana Loki
Loki is useful for high-volume logs when teams usually query by labels such as cluster, namespace, pod, service, or environment. It avoids full-text indexing of every log line, which can reduce storage overhead. The tradeoff is slower broad text search across large time windows.
OpenSearch
OpenSearch is useful when teams need full-text search, flexible filtering, and exploratory log analysis. The tradeoff is operational cost: cluster sizing, shard management, JVM tuning, storage performance, and upgrades.
OpenObserve or columnar backends
Columnar, object-storage-backed systems can be attractive for lower-cost log analytics. Treat vendor savings claims as benchmarks, not guarantees. The real cost depends on volume, query patterns, retention, cloud storage, compute, and the engineering time required to operate the system.
Risks to Handle Before Cutting Volume
Loss of incident context
If traces stay in Datadog but logs move elsewhere, engineers may lose one-click correlation. That does not make offload impossible, but it means trace IDs, service names, environment tags, and request IDs must be preserved across systems.
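A simple guard in the routing layer or in a test suite can catch events that would be impossible to correlate after offload. This is a sketch under assumptions: the field names `trace_id`, `request_id`, `service`, and `env` are placeholders for whatever your logging schema actually uses.

```python
# Hypothetical field names; match them to your own logging schema.
REQUIRED_FIELDS = ("trace_id", "request_id", "service", "env")


def missing_correlation_fields(event: dict) -> list:
    """Return the correlation fields that are absent or empty in an event."""
    return [field for field in REQUIRED_FIELDS if not event.get(field)]


complete = {"trace_id": "abc123", "request_id": "req-9",
            "service": "checkout", "env": "prod", "message": "order placed"}
stripped = {"service": "checkout", "env": "prod", "message": "order placed"}

print(missing_correlation_fields(complete))
print(missing_correlation_fields(stripped))
```

Running a check like this against a sample of events at each destination shows whether a parsing or field-dropping step in the pipeline has removed the identifiers engineers need to join logs back to traces.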
Compliance gaps
Do not reduce authentication, authorization, administrative, or payment audit logs without confirming retention and access requirements. A cheap log strategy that fails an audit is not cheap.
Pipeline backpressure
If the destination is unavailable, the routing layer must buffer safely. Enable disk-backed buffers where possible. Memory-only buffers can fill up and drop events during network problems or backend throttling, and their contents are lost if the router process restarts.
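The principle behind disk-backed buffering can be shown in a few lines: when delivery fails, write the event to a file, then replay the file once the destination recovers. The `DiskSpool` class below is a hypothetical teaching sketch with no size cap, fsync, or concurrency control. Routers such as Vector and Fluent Bit ship their own buffering, which should be used in practice.

```python
import json
import os


class DiskSpool:
    """Append-only JSON-lines spool used when the destination rejects events."""

    def __init__(self, path):
        self.path = path

    def send_or_spool(self, event, send):
        """Try to deliver; on a connection-level failure, persist to disk."""
        try:
            send(event)
        except OSError:  # ConnectionError and timeouts are OSError subclasses
            with open(self.path, "a", encoding="utf-8") as handle:
                handle.write(json.dumps(event) + "\n")

    def replay(self, send):
        """Retry spooled events, keep the ones that still fail, and
        return the number delivered."""
        if not os.path.exists(self.path):
            return 0
        with open(self.path, encoding="utf-8") as handle:
            pending = [json.loads(line) for line in handle if line.strip()]
        delivered, failed = 0, []
        for event in pending:
            try:
                send(event)
                delivered += 1
            except OSError:
                failed.append(event)
        with open(self.path, "w", encoding="utf-8") as handle:
            for event in failed:
                handle.write(json.dumps(event) + "\n")
        return delivered
```

A typical use is to call `send_or_spool` on the hot path and `replay` on a timer. Events spooled during an outage survive a process restart because they live on disk rather than in memory.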
Poor ownership
Log cost reduction fails when nobody owns the policy. Each major service should have an owner, a retention class, a sampling rule, and a review schedule.
Practical 30-Day Action Plan
Week 1: Measure
- Pull seven days of log usage.
- Identify top services, sources, indexes, status codes, and environments.
- Find indexes that are rarely queried.
- Identify debug, health check, access log, and non-production noise.
Week 2: Reduce indexing
- Create or reorder indexes by value.
- Add exclusion filters for obvious low-value logs.
- Sample high-volume HTTP 2xx logs.
- Set usage monitors and alerts on indexed volume.
Week 3: Move reduction upstream
- Add edge filtering with Vector, Fluent Bit, or Observability Pipelines.
- Drop or sample health checks and repetitive info logs before Datadog ingestion.
- Archive full raw streams to object storage where needed.
Week 4: Validate
- Run incident simulations against the new log policy.
- Verify security and audit retention.
- Confirm trace IDs and request IDs still connect logs across systems.
- Review usage before and after changes.
- Document ownership and change control.
Cost Reduction Checklist
- Identify top five log producers by volume.
- Separate production, non-production, security, and archive-only streams.
- Confirm which logs are needed for active alerts.
- Add exclusion filters for repetitive low-value logs.
- Sample high-volume access logs instead of indexing everything.
- Generate metrics from repetitive informational logs.
- Route full raw logs to object storage when retention is needed.
- Use edge filtering to reduce ingestion, not just indexing.
- Preserve trace IDs, request IDs, service names, and environment tags.
- Enable pipeline buffering and retry behavior.
- Review log policy monthly.
Conclusion
Datadog log cost reduction is not a tool replacement project. It is a telemetry classification project.
Keep Datadog for the logs that need hot search, alerting, correlation, and incident response. Reduce, sample, or offload the repetitive streams that rarely help during an outage. The cleanest savings usually come from moving filtering upstream, before low-value logs become Datadog ingestion volume.
I help infrastructure and security teams reduce Datadog log costs by classifying log value, routing noisy logs to lower-cost storage, and keeping the data needed for incidents, security, and compliance.
Sources
- Datadog Pricing List - current public product pricing and billing units.
- Datadog Billing Documentation - log ingestion, indexed log events, host metering, and related billing definitions.
- Datadog Log Indexes Documentation - index filters and exclusion filters.
- Datadog Strategies for Reducing Log Volume - sampling, filtering, dropping attributes, deduplication, quotas, and archive routing.
- Datadog Best Practices for Log Management - usage monitoring, query review, exclusion filters, log-based metrics, sensitive data scanning, and audit trail.
- Vector Documentation - configuration model for sources, transforms, and sinks.
- Fluent Bit Documentation - output routing and buffering options.
Written by
Tymur Chmeruk
Cloud Security & Infrastructure Engineer · Baltimore–Washington Metro