Developers
How to build a coupon warehouse with Feedico ETL
Published 2026-06-18 · Updated June 18, 2026 · ~14 min read
High-traffic coupon sites, cashback apps, and internal BI teams eventually outgrow live list API calls on every page view. A coupon warehouse — a local mirror of affiliate promo and merchant data — decouples sync from serving. Feedico remains the normalization layer; your warehouse powers frontends, editors, and analysts. This guide covers schema design, pagination, idempotent upserts, scheduling, and when to add webhooks.
- Typical sync cadence
- 15–60 min
- Adjust for retail peaks
- Page size starting point
- 100–200
- Balance quota vs memory
- Primary API endpoints
- 2
- /me/coupons + /me/networks
When you need a warehouse
A list API is perfect for prototypes and low-traffic editorial sites. Move to a warehouse when any of these become true:
- Page views exceed comfortable API quota if every request lists coupons live.
- Editors need SQL or spreadsheet access to merchant coverage and code freshness.
- You run multiple surfaces (WordPress, Next.js app, mobile) that must show identical rows.
- Search facets (merchant, category, region, provider) require indexed columns.
- You want historical snapshots to measure insert velocity — see our live data report for platform benchmarks.
Smaller WordPress-only stacks may stop at post meta — see how to import CJ coupons to WordPress first, then return here when post meta becomes the bottleneck.
Reference architecture
Upstream affiliate networks (CJ, Awin, Impact, Admitad, …) sync into Feedico on your credentials. A scheduled ETL worker paginates authenticated list endpoints and upserts into your warehouse. Consumer apps — web, mobile, email — query the warehouse through your own read API or ORM. Feedico tokens stay on the worker host only.
Optional: the Linux desktop client mirrors the same flow with a GUI for operators who prefer not to maintain scripts. The coupon API ETL section explains the product framing; this article is the engineering deep dive.
Schema design
Split merchants (firms) and coupons (promos). Merchants change slowly; coupons churn daily. Store fetched_at on every row for freshness badges and SLA monitoring. Keep a raw_json column during early iterations so schema migrations do not block sync.
-- MySQL / MariaDB warehouse (production sketch)
CREATE TABLE merchants (
provider VARCHAR(64) NOT NULL,
external_id VARCHAR(128) NOT NULL,
display_name VARCHAR(255) NOT NULL,
website_url TEXT,
fetched_at DATETIME(3) NOT NULL,
PRIMARY KEY (provider, external_id),
KEY idx_merchant_name (display_name)
);
CREATE TABLE coupons (
provider VARCHAR(64) NOT NULL,
external_id VARCHAR(128) NOT NULL,
merchant_key VARCHAR(128) NOT NULL,
code VARCHAR(128),
title VARCHAR(512),
offer_url TEXT,
starts_at DATETIME(3) NULL,
expires_at DATETIME(3) NULL,
fetched_at DATETIME(3) NOT NULL,
PRIMARY KEY (provider, external_id),
KEY idx_coupon_merchant (merchant_key),
KEY idx_coupon_code (code),
KEY idx_coupon_freshness (fetched_at)
);Minimal ETL pipeline
- Extract: worker paginates POST /api/v1/me/coupons and POST /api/v1/me/networks until
recordCountis satisfied. - Transform: map normalized fields; derive
merchant_keyfrom provider + firm id; drop rows missing required compliance fields. - Load: upsert in batches (500–2000 rows) inside a transaction; commit sync metadata row with duration and counts.
- Validate: alert if coded coupon count drops more than 30% vs previous run (often signals auth or filter misconfiguration).
- Publish: invalidate CDN / app cache keys tagged by merchant or provider.
| Stage | Input | Output |
|---|---|---|
| Extract | Feedico list APIs | In-memory batch of JSON rows |
| Transform | Raw JSON | Typed rows + merchant_key |
| Load | Typed rows | Upserted SQL tables |
| Validate | Row counts vs history | Alert or abort flag |
| Publish | Sync complete event | Cache bust + webhook fan-out |
Python worker example
Prototype against SQLite before provisioning MySQL. Store FEEDICO_API_TOKEN in the environment, not in source control:
#!/usr/bin/env python3
"""Minimal Feedico → SQLite coupon warehouse worker."""
import os, sqlite3, time
import urllib.request
TOKEN = os.environ["FEEDICO_API_TOKEN"]
BASE = "https://api.feedico.io/api/v1/me"
HEADERS = {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}
def post(path: str, body: dict) -> dict:
req = urllib.request.Request(
f"{BASE}{path}",
data=__import__("json").dumps(body).encode(),
headers=HEADERS,
method="POST",
)
with urllib.request.urlopen(req, timeout=60) as resp:
return __import__("json").load(resp)
def paginate_coupons(page_size: int = 200):
page = 1
while True:
payload = post("/coupons", {"page": page, "pageSize": page_size})
rows = payload.get("data") or []
yield from rows
total = payload.get("recordCount") or 0
if page * page_size >= total or not rows:
break
page += 1
def main():
conn = sqlite3.connect("coupon_warehouse.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS coupons (
provider TEXT NOT NULL,
external_id TEXT NOT NULL,
code TEXT,
title TEXT,
merchant TEXT,
offer_url TEXT,
fetched_at TEXT,
raw_json TEXT,
PRIMARY KEY (provider, external_id)
)
""")
n = 0
for row in paginate_coupons():
conn.execute(
"""INSERT INTO coupons VALUES (?,?,?,?,?,?,?,?)
ON CONFLICT(provider, external_id) DO UPDATE SET
code=excluded.code, title=excluded.title,
merchant=excluded.merchant, offer_url=excluded.offer_url,
fetched_at=excluded.fetched_at, raw_json=excluded.raw_json""",
(
row.get("provider"),
str(row.get("id") or row.get("couponId") or row.get("code")),
row.get("code"),
row.get("title"),
row.get("merchant") or row.get("networkName"),
row.get("offerUrl"),
row.get("fetchedAt") or time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
__import__("json").dumps(row),
),
)
n += 1
conn.commit()
print(f"Upserted {n} coupon rows")
if __name__ == "__main__":
main()Merchant dimension sync
Coupon rows reference merchants. Sync firms first so foreign keys resolve. Use the same pagination pattern on /me/networks with optional provider and firmName filters. Merchant pages, eligibility checks, and “stores with codes” facets all read from this table. Details on the merchant feed API section.
Scheduling, quotas, and observability
Run workers on systemd timers, Kubernetes CronJobs, or GitHub Actions with secrets. Log sync_run_id, start time, duration, pages fetched, rows upserted, and HTTP status. Feedico list calls count against plan quota — design backoff when you receive documented rate-limit responses.
Compare nightly row counts to the affiliate coupon data report if you need external benchmarks for “healthy” catalogue size during seasonal peaks.
Webhooks vs polling
Blind polling every few minutes wastes quota when nothing changed. On eligible plans, register HTTPS webhook endpoints in the Feedico dashboard for sync-complete signals. Your worker then runs incremental pulls or marks cache keys dirty instead of full table scans. See coupon API webhooks and affiliate API webhooks for product context. Verify signatures and retry with idempotent upserts — webhooks may deliver more than once.
Read layer for frontends
Do not expose raw SQL to browsers. Common patterns:
- Thin internal REST: read-only endpoints over merchants and coupons with pagination mirroring Feedico semantics your apps already expect.
- GraphQL: flexible merchant search for editorial tools.
- Materialized views: pre-aggregate “top merchants with coded offers this week” for homepage carousels.
- WordPress bridge: nightly job pushes diff into post meta for legacy themes — hybrid until full replatform.
For public catalogue exploration without auth, evaluate the global coupon API separately from your private warehouse — different trust and freshness SLAs.
Anti-patterns to avoid
- Calling Feedico on every HTTP request — exhausts quota and adds tail latency.
- Deleting all rows before insert — causes read downtime; prefer upsert + soft-delete for missing ids.
- Keying only on coupon code — duplicates and collisions across merchants.
- Skipping merchant sync — orphan coupon rows without firm context break filters and disclosures.
- No freshness metadata in UI — users blame your site when expired codes fail checkout.
Getting started
Create a Feedico account, connect networks, wait for the first dashboard sync, then prototype the Python worker locally. When row counts stabilize, promote to MySQL and add monitoring. Map related intents on the publisher integration hub and read documentation for auth headers and OpenAPI downloads.
Frequently asked questions
What is a coupon warehouse in affiliate publishing?
A local database mirror of affiliate promo and merchant rows synced from upstream networks. Your website, BI tools, and editorial CMS read from the warehouse instead of calling list APIs on every page view, which protects rate limits and keeps latency predictable.
SQLite or MySQL for a coupon warehouse?
SQLite suits single-node workers, prototypes, and the Feedico Linux desktop client. MySQL or Postgres is better when multiple app servers, analysts, and cron workers need concurrent writes. Start with SQLite; migrate when sync duration or lock contention grows.
How do I deduplicate coupons across sync runs?
Upsert on a composite key: provider slug plus the network's external coupon or link id. Never key only on the promo code string — the same code can appear for different merchants or expire and return with a new id.
Should I poll Feedico or use webhooks?
Poll on a fixed cadence (15–60 minutes) as a baseline. On eligible plans, register webhooks for sync-complete events and trigger incremental warehouse jobs instead of full re-pulls. Hybrid patterns reduce API quota usage during quiet periods.
Can I use the global catalog API instead of /me/coupons?
The global catalog API is useful for discovery and free-tier exploration. Production warehouses for your approved programmes should use authenticated /me/coupons and /me/networks so rows reflect networks you joined and filters you configured.
How does this relate to the WordPress CJ import guide?
WordPress can store coupons in post meta for small catalogues. When you exceed roughly 50k rows, need sub-second merchant search, or feed BI dashboards, move to a dedicated warehouse and expose thin read APIs to WordPress, Next.js, or mobile apps.
Related reading
You need programme approval and compliant use at each affiliate network. Feedico provides the integration layer - not a substitute for network terms.