Cloudflare Enterprise — Technical Reference

CF Logpush — Format Transform & Automated Push

Serverless log pipeline · Cloudflare Workers + R2 + Queues · CDN Partner Log Integration

1. Overview
🎯 Purpose

Automatically transform Cloudflare edge access logs from JSON (Logpush format) into a custom CDN partner log format, and deliver them to a remote log ingestion endpoint in near real-time.

🏗️ Stack

100% Cloudflare-native: Logpush + R2 + Queues + Workers. Zero servers, zero infrastructure maintenance, auto-scaling by design.

📋 Log Format

Target: 145-field \u0001-delimited plaintext format. Body compressed with Gzip. Authentication via MD5-signed URLs.

End-to-End Latency

Approximately 70 seconds from request to log delivery. Typically well within partner SLA requirements.

Cloudflare Services Used

Service | Role | Location in Dashboard
Logpush | Exports edge HTTP request logs to R2 (~1 min batches, gzip-compressed ndjson) | Domain → Analytics & Logs → Logpush
R2 Object Storage | Stores raw log files and processed batch files (temporary) | Dashboard → R2 Object Storage
R2 Event Notifications | Triggers a Queue message on every new object created in R2 | R2 Bucket → Settings → Event Notifications
Queues | Decouples parse and send stages; guarantees at-least-once delivery with automatic retry | Dashboard → Workers & Pages → Queues
Workers | Two logical workers in one script: Parser and Sender | Deployed via wrangler deploy or GitHub Actions CI/CD
2. Architecture
[Step 1] Log Generation
  End user opens a webpage / app → request reaches a Cloudflare edge node (300+ cities worldwide).
  CF records one log entry: client IP, timestamp, URL, status code, response size, cache status, latency...
  The edge processes the request; the log entry is buffered in CF's internal system.
      │  Every ~1 minute, Logpush automatically batches and pushes
      ▼
[Step 2] Logpush Writes to R2
  Logpush service (fully automated):
  1. Packages all logs from the past ~1 minute (ndjson, one JSON object per line)
  2. Gzip-compresses the file
  3. Writes to the R2 bucket

  R2 Bucket: cdn-logs-raw
    logs/
    └── 20260328/                           ← auto-partitioned by date
        ├── 20260328T090000Z_xxx.log.gz     ← each file a few KB to a few MB
        ├── 20260328T090100Z_xxx.log.gz     ← filename contains timestamp
        └── ... (new file every ~1 min)
      │  R2 detects new object → immediately fires Event Notification (seconds)
      ▼
[Step 3] R2 Event Notification → parse-queue
  R2 Event Notification sends a message to parse-queue:
    { "bucket": "cdn-logs-raw", "object": { "key": "logs/20260328/xxx.log.gz" } }
  (Message contains only the file path — a few dozen bytes, never exceeds 128 KB)

  Queue: parse-queue
    msg 1: { key: "logs/.../a.log.gz" }   ← file path only, no log content
    msg 2: { key: "logs/.../b.log.gz" }   ← auto-batched, auto-retry
      │  Queue wakes up Parser Worker when messages arrive (~3-5 seconds)
      ▼
[Step 4] Parser Worker — Read, Decompress, Parse, Transform
  ① Obtain streaming reference to the .gz file from R2 (no full download)
  ② Stream-decompress via DecompressionStream — only a few KB in memory at a time
  ③ Split by newline, parse each line as JSON (one CF log entry per line)
  ④ Format transform (transformEdge function): CF JSON → \u0001-delimited 145-field plaintext
       Before: { "RayID":"9e34...", "ClientIP":"221.229...", "EdgeColoCode":"HGH" }
       After:  cs_vod_v3.0\u0001[28/Mar/2026:16:42:25 +0800]\u00019e34...\u0001200\u00011774680145.000\u00010.149\u0001...(145 fields total)
  ⑤ Every 1000 lines (or end of file) → write R2 temp file + send to send-queue
      │  Write R2 temp file, enqueue message
      ▼
[Step 5] R2 Temp File + send-queue
  R2 temp file (processed/ directory, already in target format):
    processed/logs_20260328_xxx-0-1774683718956.txt   ← up to 1000 log lines

  Queue: send-queue
    msg: { "key": "processed/...txt" }   ← file path only, a few dozen bytes; auto-retry up to 5x → send-dlq
      │  Queue wakes up Sender Worker when messages arrive (~3-5 seconds)
      ▼
[Step 6] Sender Worker — Compress, Sign, Send
  ① Read temp file from R2 (up to 1000 lines in target format)
  ② Gzip-compress body (required by partner spec)
  ③ Compute MD5 auth signature:
       ts      = current Unix time + 300s
       rand    = random integer
       sigstr  = "{uri}-{ts}-{rand}-{privateKey}"
       md5hash = MD5(sigstr)
       URL     = {endpoint}{uri}?auth_key={ts}-{rand}-{md5hash}
  ④ HTTP POST to partner log ingestion server (Content-Encoding: gzip)
       ├── Success (HTTP 200) → ack message + delete R2 temp file ✅
       └── Failure → retry → up to 5x → send-dlq on exhaustion ⚠️
      ▼
[Step 7] Partner Ingestion
  Verify auth → Gzip-decompress → split on \u0001 → write to billing/analytics DB ✓
Queue messages carry only R2 object keys (a few dozen bytes), never log content. This sidesteps the 128 KB Queue message size limit entirely.
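The key-only message pattern keeps the consumer trivial. The sketch below shows a parse-queue consumer resolving each message to its R2 object; the binding name RAW_BUCKET matches the wrangler.toml in Configuration Reference, but the handler name and structure are illustrative, not the actual src/index.js code:

```javascript
// Sketch of a parse-queue consumer: each message carries only an R2 object
// key (the Event Notification payload), never log content.
async function handleParseBatch(batch, env) {
  for (const msg of batch.messages) {
    const key = msg.body.object.key;      // e.g. "logs/20260328/xxx.log.gz"
    const obj = await env.RAW_BUCKET.get(key);
    if (obj === null) {                   // object gone (e.g. duplicate delivery)
      msg.ack();
      continue;
    }
    // ...stream-decompress and transform obj.body here (see Step 4)...
    msg.ack();
  }
}
```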
3
Key Design Decisions

Streaming Gzip Decompression

Raw log files are processed via DecompressionStream + ReadableStream. The Worker never buffers the full file in memory — it decompresses and transforms line-by-line, keeping memory usage constant regardless of file size. This is essential given the 128 MB Worker memory limit.
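The streaming stage can be sketched as follows. DecompressionStream and TextDecoderStream exist both in Workers and in Node 18+, so this helper is runnable locally; the function name is illustrative:

```javascript
// Gunzip a ReadableStream and yield complete text lines without ever
// buffering the whole file; only the current chunk plus a partial-line
// tail are held in memory.
async function* gunzipLines(gzStream) {
  const reader = gzStream
    .pipeThrough(new DecompressionStream('gzip'))
    .pipeThrough(new TextDecoderStream())
    .getReader();
  let tail = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    const parts = (tail + value).split('\n');
    tail = parts.pop();                  // keep the trailing partial line
    for (const line of parts) if (line) yield line;
  }
  if (tail) yield tail;                  // flush the last line (no trailing \n)
}
```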

Two-Stage Queue Architecture

parse-queue handles R2 → transform. send-queue handles R2 → HTTP POST. The separation allows independent retry policies: parse failures retry 2×, send failures retry 5×. Both queues have dedicated dead-letter queues (DLQs) for observability and manual recovery.

Idempotent Delivery

Queues use at-least-once semantics. The Sender handles duplicate delivery gracefully: if the R2 temporary file has already been deleted (indicating a prior successful send), the message is silently acknowledged without re-sending.
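The guard can be sketched like this; binding and helper names (TEMP_BUCKET, postToPartner) are illustrative stand-ins for the real Sender code:

```javascript
// Duplicate-delivery guard: a missing temp file means a prior invocation
// already sent and deleted it, so the message is acked without re-sending.
async function handleSendMessage(msg, env, postToPartner) {
  const obj = await env.TEMP_BUCKET.get(msg.body.key);
  if (obj === null) {                      // already delivered earlier
    msg.ack();
    return 'duplicate-ignored';
  }
  await postToPartner(await obj.text());   // throws on failure → Queue retries
  await env.TEMP_BUCKET.delete(msg.body.key);
  msg.ack();
  return 'sent';
}
```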

Timestamp Handling

The parser auto-detects EdgeStartTimestamp in three formats: Unix seconds integer, Unix milliseconds integer, and RFC 3339 string. Configuring Logpush with timestamp_format = unix is recommended for consistency.
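A minimal sketch of the detection, normalizing all three formats to Unix seconds; the magnitude threshold is an illustrative heuristic, not the exact production logic:

```javascript
// Auto-detect EdgeStartTimestamp: RFC 3339 string, Unix milliseconds,
// or Unix seconds. Always returns Unix seconds as a number.
function parseEdgeTimestamp(v) {
  if (typeof v === 'string' && v.includes('T')) {
    return Date.parse(v) / 1000;          // RFC 3339 string
  }
  const n = Number(v);
  return n > 1e12 ? n / 1000 : n;         // milliseconds vs seconds by magnitude
}
```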

CDN Node Country Mapping

The target format requires the CDN node country (not client country). EdgeColoCode (IATA airport code, e.g. NRT = Tokyo) is mapped to ISO 3166-1 alpha-2 country code via a built-in lookup table covering 200+ airports globally. If the IATA code is not in the table, defaults to CN.
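An illustrative fragment of the lookup (the real built-in table covers 200+ airports; these five entries are examples):

```javascript
// EdgeColoCode (IATA airport code) → ISO 3166-1 alpha-2 country code.
// Unknown codes default to "CN", per the behavior described above.
const COLO_COUNTRY = {
  NRT: 'JP',  // Tokyo Narita
  HGH: 'CN',  // Hangzhou
  LAX: 'US',  // Los Angeles
  FRA: 'DE',  // Frankfurt
  SIN: 'SG',  // Singapore
};
const coloCountry = (code) => COLO_COUNTRY[code] ?? 'CN';
```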

MD5 Authentication

Workers' SubtleCrypto API does not support MD5. A pure-JS RFC 1321 implementation is included in src/index.js. The auth URL signature format is: MD5({uri}-{ts}-{rand}-{privateKey}).

Response Content-Length (Field #21)

The http_requests dataset does not expose a dedicated Content-Length field. Field #21 is populated from ResponseHeaders['content-length'] when Logpush Custom Fields are configured via API (see Step 6b in Deployment). If not configured or the header is absent, the field returns -.

Custom Fields configuration for content-length is optional. Without it, field #21 will consistently output -, which is the standard placeholder for unavailable fields in this format.
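The resolution logic amounts to a small fallback; the record shape mirrors a parsed Logpush line, and the function name is illustrative:

```javascript
// Field #21: use the captured content-length response header when present,
// otherwise emit "-", the format's placeholder for unavailable fields.
function sentHttpContentLength(record) {
  const cl = (record.ResponseHeaders || {})['content-length'];
  return cl != null && cl !== '' ? String(cl) : '-';
}
```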

Reliability Model

Failure Scenario | Behavior | Recovery
Remote endpoint temporarily unavailable | send-queue retries up to 5× with exponential backoff (~15 min window) | Automatic; R2 file preserved until delivery confirmed
Malformed log record in source file | Per-line error caught; line skipped; batch continues | Error count logged; file-level processing succeeds
Parser crash on corrupt .gz file | parse-queue retries 2×; moves to parse-dlq | Manual inspection of DLQ message + raw R2 file
Duplicate Queue delivery | R2 file not found → silent ack, no re-send | Automatic; no data duplication
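The per-line tolerance in the table can be sketched as follows; the function name is illustrative:

```javascript
// Parse a batch of ndjson lines: a malformed record is counted and
// skipped, and the rest of the batch still succeeds.
function parseLines(lines) {
  const records = [];
  let errors = 0;
  for (const line of lines) {
    try {
      records.push(JSON.parse(line));
    } catch {
      errors++;                 // skip the corrupt line, keep going
    }
  }
  return { records, errors };
}
```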
4. Performance Characteristics

Measured: 2026-03-30, 6-hour window (11:00–17:00 GMT+8) · HTTP traffic: 13.31M requests · Data transfer: 70.39 GB

Worker

Metric | Observed | Assessment
Worker invocations (6h) | 2k | Matches Logpush batch count
Errors (internal) | 0 | Clean
Exceeded CPU Time Limits | 3 | Occurred at peak; monitor if increasing
Exceeded Memory | 0 | Streaming design effective
Median CPU Time | 3.52 ms | Far below 30 s limit
P90 CPU Time | 355 ms | Normal for large file batches
P99 CPU Time | 3.17 s | Within 30 s limit
Wall Time P50 / P99 | 1.13 s / 18.97 s | Within limits including network wait
5xx subrequests (to partner) | 36 / 15k | 0.24% — partner-side intermittent; auto-retried

parse-queue

Metric | Observed | Assessment
Messages ingested (6h) | 355 | Matches Logpush file count
Messages acknowledged | 347 (97.7%) | Healthy
Messages retried | 0 | No parse failures
Realtime backlog | 0 | Real-time consumption
Backlog (avg/peak) | 5.47 / 13 | Light
Consumer lag time (avg/peak) | 15.13 s / ~250 s | Peak lag at traffic burst; acceptable
Consumer concurrency (avg/peak) | 1.37 / 14 | Auto-scaled effectively
DLQ messages | 0 | No data loss

send-queue

Metric | Observed | Assessment
Messages ingested (6h) | 14.9k | ~14.9M log lines delivered
Messages acknowledged | 15.45k | Includes prior backlog clearance
Messages retried | 0 | No send failures escalated to queue
Realtime backlog | 0 | Real-time consumption
Backlog (avg/peak) | 32.6 / 106 | Moderate at peak; within capacity
Consumer lag time (avg/peak) | 5.62 s / ~55 s | Acceptable
Consumer concurrency (avg/peak) | 22.07 / ~220 | Near the 250-consumer limit at peak
DLQ messages | 0 | No data loss

Capacity Headroom

Dimension | Current Load | 5× Traffic | 10× Traffic
Worker CPU | ~5% of limit | Safe | P99 near limit on large files
Worker Memory | 0 overruns | Safe (streaming design) | Safe
parse-queue lag | avg 15 s, peak 250 s | Peak lag ~5–10 min | Sustained backlog risk
send-queue concurrency | avg 22, peak 220 | Near 250 limit at peak | Exceeds 250 limit
Partner endpoint | 0.24% 5xx | Monitor 5xx rate | Likely bottleneck
Overall assessment: the system operates comfortably at current traffic levels. Stable up to ~5× current volume. Beyond 5×, send-queue concurrency and parse-queue lag become primary constraints. The partner endpoint capacity should be confirmed before scaling.

Key Metrics to Monitor

Priority | Metric | Alert Threshold
★★★ | DLQ message count (both queues) | > 0 — immediate action
★★★ | Realtime backlog (sustained) | > 100 messages for > 5 min
★★★ | Exceeded CPU Time Limits | > 10 per hour
★★ | Consumer lag time (parse-queue) | Sustained > 5 minutes
★★ | 5xx subrequest rate | > 0.5% of total
★★ | send-queue consumer concurrency | Sustained > 200
★ | P99 CPU Time | > 20 s
5. Deployment Guide
Prerequisites: Node.js v18+ installed locally. Wrangler CLI installed (npm install -g wrangler). A Cloudflare API Token scoped to the target account with the following permissions: Workers Scripts:Edit, Workers R2 Storage:Edit, and Workers Queue:Edit (use the Edit Cloudflare Workers template and manually add the R2 and Queue permissions).
Before deploying, obtain from your log ingestion partner: server endpoint URL, authentication private key, and the target URI path.
Step 1 · Create R2 Bucket

Dashboard → R2 Object Storage → Create bucket

  • Name: cdn-logs-raw
  • Location: Automatic (default)
Step 2 · Create Queues

Dashboard → Workers & Pages → Queues → Create queue

Create all four queues (exact names required):

Queue Name | Purpose
parse-queue | Triggers Parser Worker on new R2 files
send-queue | Triggers Sender Worker on processed batches
parse-dlq | Dead-letter queue for parse failures
send-dlq | Dead-letter queue for send failures
Step 3 · Configure R2 Event Notification

R2 → cdn-logs-raw → Settings → Event Notifications → Add notification

  • Event type: object-create
  • Queue: parse-queue
  • Prefix (optional): logs/
  • Suffix (optional): .gz
Step 4 · Set Worker Secrets

From the project directory, set encrypted secrets via Wrangler:

# Log ingestion endpoint (base URL, no trailing slash)
wrangler secret put CTYUN_ENDPOINT

# Authentication private key
wrangler secret put CTYUN_PRIVATE_KEY

# Target URI path (e.g. /logpost/yourpath)
wrangler secret put CTYUN_URI_EDGE

# Verify
wrangler secret list
If using GitHub Actions CI/CD, secrets still need to be set via wrangler secret put directly. They persist in Cloudflare and do not need to be re-uploaded on each deployment.
Step 5 · Deploy the Worker

wrangler deploy
Expected output confirms: Producer bindings for parse-queue and send-queue. Consumer bindings for parse-queue and send-queue. Current Version ID assigned.
If deploying via GitHub Actions, simply push to the main branch. The workflow handles wrangler deploy automatically.
Step 6 · Configure Logpush Job

Dashboard → [Target Zone] → Analytics & Logs → Logpush → Create a Logpush job

  1. Destination: R2
  2. Bucket: cdn-logs-raw  ·  Path prefix: logs/{DATE}/
  3. Dataset: HTTP requests
  4. Fields: Select all fields (the Worker handles missing fields gracefully)
  5. Timestamp format: Unix
  6. Sample rate: 1 (100%)
Set the timestamp format to Unix. The parser auto-detects all three formats, but Unix keeps the pipeline's output consistent.
Step 6b · Enable Content-Length Logging (Optional)

Field #21 (sent_http_content_length) depends on the Content-Length response header, which the http_requests dataset does not expose by default. To capture the true header value, configure Logpush Custom Fields via API. This is a one-time, zone-level configuration.

This configuration requires API access. It cannot be set through the Cloudflare Dashboard UI.

Step A — Check for existing Custom Fields ruleset

# List zone rulesets and find http_log_custom_fields phase
curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/rulesets" \
  --request GET \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  | jq '.result[] | select(.phase == "http_log_custom_fields") | {id, phase}'

If a ruleset is returned, note its id as RULESET_ID and skip to Step C. If no result, proceed to Step B.

Step B — Create the Custom Fields ruleset (only if Step A returned nothing)

curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/rulesets" \
  --request POST \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --json '{
    "name": "Zone-level phase entry point",
    "kind": "zone",
    "description": "Custom log fields for Logpush",
    "phase": "http_log_custom_fields"
  }' | jq '.result.id'

Note the returned id as RULESET_ID.

Step C — Configure content-length capture

Use raw_response_fields to capture the original content-length value from the origin before any CF transformations.

curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/rulesets/$RULESET_ID" \
  --request PUT \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --json '{
    "rules": [
      {
        "action": "log_custom_field",
        "expression": "true",
        "description": "Capture content-length response header for Logpush",
        "action_parameters": {
          "raw_response_fields": [
            { "name": "content-length" }
          ]
        }
      }
    ]
  }'

Step D — Add ResponseHeaders to the Logpush job

# Get your Logpush job ID first
curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/logpush/jobs" \
  --request GET \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  | jq '.result[] | {id, name, dataset}'

# Update job to include ResponseHeaders field
curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/logpush/jobs/$JOB_ID" \
  --request PUT \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --json '{
    "output_options": {
      "field_names": [
        "CacheCacheStatus","ClientCountry","ClientIP","ClientRequestBytes",
        "ClientRequestHost","ClientRequestMethod","ClientRequestProtocol",
        "ClientRequestReferer","ClientRequestScheme","ClientRequestURI",
        "ClientRequestUserAgent","ClientSSLProtocol","ClientSrcPort",
        "EdgeColoCode","EdgeResponseBodyBytes","EdgeResponseBytes",
        "EdgeResponseContentType","EdgeResponseStatus","EdgeServerIP",
        "EdgeStartTimestamp","EdgeTimeToFirstByteMs","OriginIP",
        "OriginRequestHeaderSendDurationMs","OriginResponseDurationMs",
        "OriginResponseHeaderReceiveDurationMs","OriginResponseStatus",
        "OriginTLSHandshakeDurationMs","ParentRayID","RayID",
        "ResponseHeaders"
      ],
      "timestamp_format": "unix"
    }
  }'
Once configured, ResponseHeaders in each log line will contain {"content-length": "12345"} and the Worker will automatically use it for field #21.
Note: CF may omit the Content-Length header on responses it streams (HTTP/2 frames the body; HTTP/1.1 may use chunked transfer encoding, common for gzip-compressed responses). In those cases field #21 outputs -.
Step 7 · Verify End-to-End

Allow ~2 minutes for Logpush to initialize and write the first batch, then verify:

  1. R2: cdn-logs-raw/logs/ contains .log.gz files
  2. Queue: parse-queue → Messages Processed counter is incrementing
  3. Worker logs: Real-time tail via wrangler tail ctyun-logpush
# Expected log output (LOG_LEVEL=info)
[INFO] Parsing: logs/20260328/xxx.log.gz
[INFO] Done: logs/20260328/xxx.log.gz | lines=73 batches=1 errors=0
[INFO] Sent 73 lines → HTTP 200 | processed/xxx.txt
HTTP 401/403 on send → verify CTYUN_PRIVATE_KEY value
HTTP 404 on send → verify CTYUN_URI_EDGE path
6. Configuration Reference

wrangler.toml

name                = "ctyun-logpush"
main                = "src/index.js"
compatibility_date  = "2026-03-27"
compatibility_flags = ["nodejs_compat"]
account_id          = "<your-account-id>"

[[r2_buckets]]
binding     = "RAW_BUCKET"
bucket_name = "cdn-logs-raw"

[[queues.producers]]
binding = "PARSE_QUEUE"
queue   = "parse-queue"

[[queues.producers]]
binding = "SEND_QUEUE"
queue   = "send-queue"

[[queues.consumers]]
queue              = "parse-queue"
max_batch_size     = 5
max_batch_timeout  = 10
max_retries        = 2
dead_letter_queue  = "parse-dlq"

[[queues.consumers]]
queue              = "send-queue"
max_batch_size     = 50
max_batch_timeout  = 5
max_retries        = 5
dead_letter_queue  = "send-dlq"

[vars]
BATCH_SIZE = "1000"  # Lines per HTTP POST batch
LOG_LEVEL  = "info"  # debug | info | warn | error

workers_dev  = false
preview_urls = false

Environment Variables

Name | Type | Example | Description
CTYUN_ENDPOINT | Secret | http://log.example.com:5580 | Log ingestion server base URL (no trailing slash, no URI path)
CTYUN_PRIVATE_KEY | Secret | YourKey@1234 | MD5 authentication private key (provided by log partner)
CTYUN_URI_EDGE | Secret | /logpost/yourpath | Target URI path for HTTP POST (provided by log partner)
BATCH_SIZE | Var | 1000 | Number of log lines per POST batch
LOG_LEVEL | Var | info | Worker log verbosity. Use debug for troubleshooting; revert to info in production
Secrets are set once via wrangler secret put <NAME> and persist in Cloudflare. They are never stored in wrangler.toml or committed to source control.

GitHub Actions CI/CD

The repository includes .github/workflows/deploy.yml. Any push to main triggers an automatic wrangler deploy (code and configuration only).

Required GitHub secret for CI/CD: CLOUDFLARE_API_TOKEN only. Worker secrets (CTYUN_ENDPOINT, CTYUN_PRIVATE_KEY, CTYUN_URI_EDGE) are managed separately via wrangler secret put.

7. Operations

Health Check — Dashboard Indicators

Indicator | Healthy State | Action if Abnormal
R2 logs/ | New .gz files appearing every ~1 min | Check the Logpush job is Enabled
parse-queue backlog | Near 0 at all times | Sustained backlog → check Worker logs for parse errors
send-queue backlog | Near 0 at all times | Sustained backlog → check remote endpoint availability
parse-dlq | Inactive (0 messages) | Messages present → inspect raw .gz file for corruption
send-dlq | Inactive (0 messages) | Messages present → check endpoint, manually reprocess from R2
R2 processed/ | No file accumulation | Files accumulating → Sender not consuming; check Worker logs

Metric Interpretation Guide

Signal | Meaning | Action
Backlog = 0 and lag < 30 s | System consuming in real time | Healthy — no action needed
Backlog growing continuously | Consumption falling behind production — genuine bottleneck | Check Worker errors; scale if needed
Exceeded CPU Time Limits > 0 | Individual Logpush file too large for a single Worker invocation | Monitor frequency; do NOT reduce max_batch_size
Messages Retried > 0 | Partner endpoint instability | Investigate 5xx subrequests; confirm endpoint capacity
DLQ > 0 | Data loss risk — highest priority | Inspect DLQ messages immediately; manually reprocess from R2
send-queue concurrency > 200 | Near the 250 concurrent-consumer limit | Increase max_batch_size for send-queue

Updating Secrets vs. Variables

Change Type | Method | Redeploy Required?
Rotate CTYUN_ENDPOINT / CTYUN_PRIVATE_KEY / CTYUN_URI_EDGE | wrangler secret put <NAME> | No — effective immediately
Change BATCH_SIZE / LOG_LEVEL | Edit wrangler.toml → wrangler deploy | Yes
Update Worker code (src/index.js) | git push → CI/CD auto-deploys | Yes (handled by CI/CD)