Feedico

Home · Blog · Coupon warehouse ETL

Developers

How to build a coupon warehouse with Feedico ETL

Published 2026-06-18 · Updated June 18, 2026 · ~14 min read

High-traffic coupon sites, cashback apps, and internal BI teams eventually outgrow live list API calls on every page view. A coupon warehouse — a local mirror of affiliate promo and merchant data — decouples sync from serving. Feedico remains the normalization layer; your warehouse powers frontends, editors, and analysts. This guide covers schema design, pagination, idempotent upserts, scheduling, and when to add webhooks.

Typical sync cadence
15–60 min
Adjust for retail peaks
Page size starting point
100–200
Balance quota vs memory
Primary API endpoints
2
/me/coupons + /me/networks

When you need a warehouse

A list API is perfect for prototypes and low-traffic editorial sites. Move to a warehouse when any of these become true:

  • Page views exceed comfortable API quota if every request lists coupons live.
  • Editors need SQL or spreadsheet access to merchant coverage and code freshness.
  • You run multiple surfaces (WordPress, Next.js app, mobile) that must show identical rows.
  • Search facets (merchant, category, region, provider) require indexed columns.
  • You want historical snapshots to measure insert velocity — see our live data report for platform benchmarks.

Smaller WordPress-only stacks may stop at post meta — see how to import CJ coupons to WordPress first, then return here when post meta becomes the bottleneck.

Reference architecture

Upstream affiliate networks (CJ, Awin, Impact, Admitad, …) sync into Feedico on your credentials. A scheduled ETL worker paginates authenticated list endpoints and upserts into your warehouse. Consumer apps — web, mobile, email — query the warehouse through your own read API or ORM. Feedico tokens stay on the worker host only.

Optional: the Linux desktop client mirrors the same flow with a GUI for operators who prefer not to maintain scripts. The coupon API ETL section explains the product framing; this article is the engineering deep dive.

Schema design

Split merchants (firms) and coupons (promos). Merchants change slowly; coupons churn daily. Store fetched_at on every row for freshness badges and SLA monitoring. Keep a raw_json column during early iterations so schema migrations do not block sync.

MySQL warehouse tables
-- MySQL / MariaDB warehouse (production sketch)
CREATE TABLE merchants (
  provider        VARCHAR(64)  NOT NULL,
  external_id     VARCHAR(128) NOT NULL,
  display_name    VARCHAR(255) NOT NULL,
  website_url     TEXT,
  fetched_at      DATETIME(3)  NOT NULL,
  PRIMARY KEY (provider, external_id),
  KEY idx_merchant_name (display_name)
);

CREATE TABLE coupons (
  provider        VARCHAR(64)  NOT NULL,
  external_id     VARCHAR(128) NOT NULL,
  merchant_key    VARCHAR(128) NOT NULL,
  code            VARCHAR(128),
  title           VARCHAR(512),
  offer_url       TEXT,
  starts_at       DATETIME(3)  NULL,
  expires_at      DATETIME(3)  NULL,
  fetched_at      DATETIME(3)  NOT NULL,
  PRIMARY KEY (provider, external_id),
  KEY idx_coupon_merchant (merchant_key),
  KEY idx_coupon_code (code),
  KEY idx_coupon_freshness (fetched_at)
);

Minimal ETL pipeline

  1. Extract: worker paginates POST /api/v1/me/coupons and POST /api/v1/me/networks until recordCount is satisfied.
  2. Transform: map normalized fields; derive merchant_key from provider + firm id; drop rows missing required compliance fields.
  3. Load: upsert in batches (500–2000 rows) inside a transaction; commit sync metadata row with duration and counts.
  4. Validate: alert if coded coupon count drops more than 30% vs previous run (often signals auth or filter misconfiguration).
  5. Publish: invalidate CDN / app cache keys tagged by merchant or provider.
ETL stage responsibilities
StageInputOutput
ExtractFeedico list APIsIn-memory batch of JSON rows
TransformRaw JSONTyped rows + merchant_key
LoadTyped rowsUpserted SQL tables
ValidateRow counts vs historyAlert or abort flag
PublishSync complete eventCache bust + webhook fan-out

Python worker example

Prototype against SQLite before provisioning MySQL. Store FEEDICO_API_TOKEN in the environment, not in source control:

Paginate coupons → SQLite (Python)
#!/usr/bin/env python3
"""Minimal Feedico → SQLite coupon warehouse worker."""
import os, sqlite3, time
import urllib.request

TOKEN = os.environ["FEEDICO_API_TOKEN"]
BASE = "https://api.feedico.io/api/v1/me"
HEADERS = {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}

def post(path: str, body: dict) -> dict:
    req = urllib.request.Request(
        f"{BASE}{path}",
        data=__import__("json").dumps(body).encode(),
        headers=HEADERS,
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return __import__("json").load(resp)

def paginate_coupons(page_size: int = 200):
    page = 1
    while True:
        payload = post("/coupons", {"page": page, "pageSize": page_size})
        rows = payload.get("data") or []
        yield from rows
        total = payload.get("recordCount") or 0
        if page * page_size >= total or not rows:
            break
        page += 1

def main():
    conn = sqlite3.connect("coupon_warehouse.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS coupons (
            provider TEXT NOT NULL,
            external_id TEXT NOT NULL,
            code TEXT,
            title TEXT,
            merchant TEXT,
            offer_url TEXT,
            fetched_at TEXT,
            raw_json TEXT,
            PRIMARY KEY (provider, external_id)
        )
    """)
    n = 0
    for row in paginate_coupons():
        conn.execute(
            """INSERT INTO coupons VALUES (?,?,?,?,?,?,?,?)
               ON CONFLICT(provider, external_id) DO UPDATE SET
                 code=excluded.code, title=excluded.title,
                 merchant=excluded.merchant, offer_url=excluded.offer_url,
                 fetched_at=excluded.fetched_at, raw_json=excluded.raw_json""",
            (
                row.get("provider"),
                str(row.get("id") or row.get("couponId") or row.get("code")),
                row.get("code"),
                row.get("title"),
                row.get("merchant") or row.get("networkName"),
                row.get("offerUrl"),
                row.get("fetchedAt") or time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
                __import__("json").dumps(row),
            ),
        )
        n += 1
    conn.commit()
    print(f"Upserted {n} coupon rows")

if __name__ == "__main__":
    main()

Merchant dimension sync

Coupon rows reference merchants. Sync firms first so foreign keys resolve. Use the same pagination pattern on /me/networks with optional provider and firmName filters. Merchant pages, eligibility checks, and “stores with codes” facets all read from this table. Details on the merchant feed API section.

Scheduling, quotas, and observability

Run workers on systemd timers, Kubernetes CronJobs, or GitHub Actions with secrets. Log sync_run_id, start time, duration, pages fetched, rows upserted, and HTTP status. Feedico list calls count against plan quota — design backoff when you receive documented rate-limit responses.

Compare nightly row counts to the affiliate coupon data report if you need external benchmarks for “healthy” catalogue size during seasonal peaks.

Webhooks vs polling

Blind polling every few minutes wastes quota when nothing changed. On eligible plans, register HTTPS webhook endpoints in the Feedico dashboard for sync-complete signals. Your worker then runs incremental pulls or marks cache keys dirty instead of full table scans. See coupon API webhooks and affiliate API webhooks for product context. Verify signatures and retry with idempotent upserts — webhooks may deliver more than once.

Read layer for frontends

Do not expose raw SQL to browsers. Common patterns:

  • Thin internal REST: read-only endpoints over merchants and coupons with pagination mirroring Feedico semantics your apps already expect.
  • GraphQL: flexible merchant search for editorial tools.
  • Materialized views: pre-aggregate “top merchants with coded offers this week” for homepage carousels.
  • WordPress bridge: nightly job pushes diff into post meta for legacy themes — hybrid until full replatform.

For public catalogue exploration without auth, evaluate the global coupon API separately from your private warehouse — different trust and freshness SLAs.

Anti-patterns to avoid

  • Calling Feedico on every HTTP request — exhausts quota and adds tail latency.
  • Deleting all rows before insert — causes read downtime; prefer upsert + soft-delete for missing ids.
  • Keying only on coupon code — duplicates and collisions across merchants.
  • Skipping merchant sync — orphan coupon rows without firm context break filters and disclosures.
  • No freshness metadata in UI — users blame your site when expired codes fail checkout.

Getting started

Create a Feedico account, connect networks, wait for the first dashboard sync, then prototype the Python worker locally. When row counts stabilize, promote to MySQL and add monitoring. Map related intents on the publisher integration hub and read documentation for auth headers and OpenAPI downloads.

Frequently asked questions

What is a coupon warehouse in affiliate publishing?

A local database mirror of affiliate promo and merchant rows synced from upstream networks. Your website, BI tools, and editorial CMS read from the warehouse instead of calling list APIs on every page view, which protects rate limits and keeps latency predictable.

SQLite or MySQL for a coupon warehouse?

SQLite suits single-node workers, prototypes, and the Feedico Linux desktop client. MySQL or Postgres is better when multiple app servers, analysts, and cron workers need concurrent writes. Start with SQLite; migrate when sync duration or lock contention grows.

How do I deduplicate coupons across sync runs?

Upsert on a composite key: provider slug plus the network's external coupon or link id. Never key only on the promo code string — the same code can appear for different merchants or expire and return with a new id.

Should I poll Feedico or use webhooks?

Poll on a fixed cadence (15–60 minutes) as a baseline. On eligible plans, register webhooks for sync-complete events and trigger incremental warehouse jobs instead of full re-pulls. Hybrid patterns reduce API quota usage during quiet periods.

Can I use the global catalog API instead of /me/coupons?

The global catalog API is useful for discovery and free-tier exploration. Production warehouses for your approved programmes should use authenticated /me/coupons and /me/networks so rows reflect networks you joined and filters you configured.

How does this relate to the WordPress CJ import guide?

WordPress can store coupons in post meta for small catalogues. When you exceed roughly 50k rows, need sub-second merchant search, or feed BI dashboards, move to a dedicated warehouse and expose thin read APIs to WordPress, Next.js, or mobile apps.

Related reading

You need programme approval and compliant use at each affiliate network. Feedico provides the integration layer - not a substitute for network terms.