Redis 7+ is an in-memory data structure server most commonly deployed as a cache, message broker, session store, or real-time analytics engine. Its single-threaded event loop delivers sub-millisecond latency at hundreds of thousands of operations per second on commodity hardware. For data engineers, Redis shows up in three primary roles: a cache layer in front of slow queries (BI dashboards, API responses), a rate-limiting and deduplication store in streaming pipelines, and a fast lookup table for enriching stream events. Redis 7.4+ also ships as Redis Stack, which adds modules for vector similarity search (RediSearch), JSON documents (RedisJSON), and probabilistic data structures (RedisBloom).
This guide targets Redis 7.4. All concepts also apply to Valkey 7.x (the community fork) and AWS ElastiCache / Azure Cache for Redis managed offerings.
Redis is not just a key-value store — each key maps to a typed data structure. Choosing the right type avoids O(n) operations and excess memory.
| Type | Key Operations | DE Use Case |
|---|---|---|
| String | SET, GET, INCR, SETNX, GETSET | Cache serialized JSON, feature flags, counters |
| Hash | HSET, HGETALL, HINCRBY | Dimensional lookup tables (user profile, product metadata) |
| List | LPUSH/RPUSH, LRANGE, BRPOP | Simple task queues, recent-events ring buffers |
| Set | SADD, SCARD, SINTERSTORE | Deduplication (seen event IDs), unique visitor counts |
| Sorted Set (ZSet) | ZADD, ZRANGEBYSCORE, ZREVRANK | Leaderboards, sliding-window rate limits, scheduled jobs |
| Stream | XADD, XREADGROUP, XACK | Event bus with consumer groups, log aggregation |
| HyperLogLog | PFADD, PFCOUNT | Approximate unique count (DAU, distinct queries) at 12 KB/key |
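The Set row above is the workhorse for pipeline deduplication. A pure-Python sketch of the semantics (a real deployment would call `SADD`, which returns 1 for a new member and 0 for a duplicate; the in-process `set` here is only a stand-in for the server):

```python
def make_deduper():
    """Mimic SADD-based dedup: True the first time an event ID is seen,
    False on any repeat (just as SADD returns 1, then 0)."""
    seen = set()

    def is_new(event_id: str) -> bool:
        if event_id in seen:
            return False
        seen.add(event_id)
        return True

    return is_new

dedupe = make_deduper()
events = ["e1", "e2", "e1", "e3", "e2"]
fresh = [e for e in events if dedupe(e)]
# fresh == ["e1", "e2", "e3"]
```

In production the state lives in Redis, so deduplication survives process restarts and is shared across all pipeline workers.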
Redis is in-memory by default — a restart loses all data unless persistence is configured.
- **RDB snapshots** — Point-in-time binary dumps (`dump.rdb`). Fast restarts; small file; but data since the last snapshot is lost on crash. Configured via `save <seconds> <changes>`, e.g. `save 300 10`.
- **AOF (Append-Only File)** — Logs every write command to `appendonly.aof`. On restart, Redis replays the log. Three fsync policies: `always` (durability > performance), `everysec` (default, lose at most ~1 s of writes), `no` (OS decides). AOF files can be rewritten (`BGREWRITEAOF`) to compact history.

Key expiry (`EXPIRE key seconds` / `SET key value EX seconds`) removes keys after a TTL. Redis uses a lazy expiry check (deleted on access) plus a background sampler that periodically evicts expired keys. With many expiring keys, add jitter to TTLs to prevent thundering-herd cache stampedes.
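The jitter advice can be sketched in a few lines. This assumes a base TTL and spreads actual expiries over ±10% so a cohort of keys written in the same burst does not expire together (the 10% spread is an illustrative choice, not a Redis default):

```python
import random

def jittered_ttl(base_ttl: int, spread: float = 0.10) -> int:
    """Return base_ttl plus or minus up to `spread` fraction, so keys
    cached together expire at slightly different times."""
    delta = int(base_ttl * spread)
    return base_ttl + random.randint(-delta, delta)

# Usage with redis-py: r.set(key, payload, ex=jittered_ttl(300))
# yields TTLs between 270 and 330 seconds instead of exactly 300.
```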
Eviction policies (set via maxmemory-policy) control what Redis deletes when maxmemory is reached:
- `allkeys-lru` — Evict any key by LRU. Best for pure caches.
- `volatile-lru` — LRU only among keys with TTLs set. For mixed cache + persistent data.
- `allkeys-lfu` — LFU (Least Frequently Used). Better hit rates for hot data with bursty access patterns.
- `noeviction` — Return an OOM error when full. Use for source-of-truth stores where data loss is unacceptable.

Redis supports three deployment topologies:
- **Standalone** — A single node. Simplest to operate; no redundancy.
- **Primary-replica replication** — Replicas sync from the primary (`REPLICAOF host port`). Replicas serve reads; the primary handles writes. Replication is async, so there is a small window of data loss on primary failure. Sentinel can be layered on top for automatic failover.
- **Cluster** — Data sharded across 16,384 hash slots. Multi-key commands and `MULTI`/`EXEC` transactions require all keys on one slot, so use hash tags: `{user:123}:profile` and `{user:123}:sessions` land on the same slot.

Streams (`XADD` / `XREADGROUP`) are an append-only log data structure — conceptually similar to a Kafka topic partition but entirely in-memory. Key concepts:
- **Entry IDs** — Auto-generated as `millisecondsTimestamp-sequenceNumber` (e.g. `1712345678000-0`) or manually specified. IDs are always monotonically increasing.
- **Consumer groups** — `XREADGROUP GROUP mygroup consumer1 COUNT 10 STREAMS mystream >` — the `>` means "give me new undelivered messages".
- **Acknowledgement** — `XACK mystream mygroup <id>` marks a message as processed. Unacknowledged messages sit in the PEL (Pending Entries List) and can be reclaimed by other consumers after a timeout.
- **Capped length** — `XADD mystream MAXLEN ~ 100000 * field value` — the `~` allows trimming in batches for efficiency.

| Feature | Pub/Sub | Streams |
|---|---|---|
| Delivery | Fire-and-forget; offline subscribers miss messages | Persisted log; consumers can replay |
| Consumer groups | ❌ All subscribers receive all messages | ✅ Each consumer group processes each entry once |
| At-least-once delivery | ❌ | ✅ via ACK + PEL redelivery |
| Backpressure | None | MAXLEN trimming |
| Use when | Real-time broadcast (live dashboards, notifications) | Reliable event processing (DE pipelines, audit logs) |
A Spark-based data platform generates complex summary statistics on demand. Each query takes 3–8 seconds on the warehouse. A Redis cache with allkeys-lru eviction stores serialized JSON results keyed by a hash of the query parameters, with a 5-minute TTL. Cache hit rate reaches 85% within 10 minutes of peak traffic, reducing warehouse compute costs by 60%. Cache stampedes on TTL expiry are mitigated by a Lua-based lock that lets only one caller refresh the value.
A data ingestion API accepts events from thousands of IoT devices. To prevent abuse, per-device rate limiting is implemented using a Redis Sorted Set: device events are recorded as ZADD device:{id}:events NX <timestamp> <event_id>. The window is then trimmed (ZREMRANGEBYSCORE) and counted (ZCARD) in a single Lua script, ensuring atomicity without distributed locks.
Five microservices (ingest, validate, enrich, aggregate, export) form a pipeline that processes field sensor data. Each stage publishes to the next service's Redis Stream with consumer groups. Slow stages accumulate a backlog visible in the stream length. Dead-letter handling: entries unacknowledged for >30 seconds are moved to a dlq stream by a background reclaimer that uses XAUTOCLAIM. The total pipeline latency is under 200ms at 50K events/second.
A gaming analytics dashboard shows "top 100 players by score in the last hour". When a game event arrives, ZINCRBY leaderboard:hourly <delta> <player_id> updates the score atomically. ZREVRANGE leaderboard:hourly 0 99 WITHSCORES retrieves the top 100 in O(log n + 100) time. A background job shifts the window by creating a new key every hour and using EXPIRE on the old one. No separate aggregation pass is needed.
A web analytics pipeline tracks unique page views per URL per day. Using PFADD pageviews:{date}:{url} {user_id} for each event and PFCOUNT pageviews:{date}:{url} for the count, each counter uses at most 12 KB regardless of cardinality. Error rate is <1%. Multiple HyperLogLogs can be merged with PFMERGE to compute monthly uniques without re-processing events. Memory savings vs exact counters are 99%+ for high-cardinality URLs.
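The memory claim is easy to sanity-check against an exact counter. A rough pure-Python comparison using the shallow `sys.getsizeof` of a member set (which actually understates the exact-counting cost, since it ignores the member strings themselves):

```python
import sys

HLL_BYTES = 12 * 1024  # fixed upper bound per HyperLogLog key

def exact_counter_bytes(n_users: int) -> int:
    """Approximate memory of an exact distinct-count structure:
    a set holding n_users synthetic user IDs (shallow size only)."""
    members = {f"user-{i:08d}" for i in range(n_users)}
    return sys.getsizeof(members)

for n in (1_000, 100_000):
    print(f"{n} users: exact set ~{exact_counter_bytes(n)} B vs HLL {HLL_BYTES} B")
```

At 100,000 members the shallow set alone is already megabytes, while the HyperLogLog stays at 12 KB no matter how high the cardinality goes.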
import redis
import json, hashlib, time
r = redis.Redis(host="localhost", port=6379, decode_responses=True)
# Lua script: atomic check-and-lock to prevent cache stampede.
# Returns the cached value; the sentinel '__COMPUTE__' when this caller
# has acquired the lock and must recompute; or nil while another caller
# holds the lock.
LOCK_SCRIPT = """
local val = redis.call('GET', KEYS[1])
if val then return val end
local locked = redis.call('SET', KEYS[2], '1', 'NX', 'EX', '10')
if locked then return '__COMPUTE__' end
return redis.call('GET', KEYS[1])
"""

def cache_aside(query_key: str, expensive_fn, ttl: int = 300):
    lock_key = f"{query_key}:lock"
    script = r.register_script(LOCK_SCRIPT)
    for attempt in range(5):
        cached = script(keys=[query_key, lock_key])
        if cached == "__COMPUTE__":
            result = expensive_fn()  # Only one caller reaches here
            r.set(query_key, json.dumps(result), ex=ttl)
            r.delete(lock_key)
            return result
        if cached is not None:
            return json.loads(cached)  # Cache hit
        time.sleep(0.05)  # Lock held by another caller; brief wait
    return json.loads(r.get(query_key) or "{}")
RATE_LIMIT_SCRIPT = """
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local uid = ARGV[4]
redis.call('ZREMRANGEBYSCORE', key, '-inf', now - window)
local count = redis.call('ZCARD', key)
if count >= limit then
return 0
end
redis.call('ZADD', key, 'NX', now, uid .. ':' .. now)
redis.call('PEXPIRE', key, window)  -- window is in milliseconds
return 1
"""
import time, uuid
rate_script = r.register_script(RATE_LIMIT_SCRIPT)
def is_allowed(device_id: str, window_seconds: int = 60, limit: int = 100) -> bool:
    now = int(time.time() * 1000)  # millisecond precision
    key = f"ratelimit:{device_id}"
    uid = str(uuid.uuid4())
    result = rate_script(
        keys=[key],
        args=[now, window_seconds * 1000, limit, uid]
    )
    return bool(result)
import redis, json, time, threading
r = redis.Redis(host="localhost", port=6379, decode_responses=True)
STREAM = "sensor-events"
GROUP = "enrichment-workers"
# Create consumer group (idempotent)
try:
    r.xgroup_create(STREAM, GROUP, id="0", mkstream=True)
except redis.exceptions.ResponseError as e:
    if "BUSYGROUP" not in str(e):
        raise  # only swallow "group already exists"
def worker(consumer_name: str):
    while True:
        # Block up to 1 s waiting for new messages ('>' = undelivered)
        entries = r.xreadgroup(
            GROUP, consumer_name,
            {STREAM: ">"},
            count=10, block=1000
        )
        if not entries:
            continue
        for _, messages in entries:
            for msg_id, data in messages:
                try:
                    process(data)  # user-defined enrichment
                    r.xack(STREAM, GROUP, msg_id)
                except Exception as e:
                    print(f"[WARN] {msg_id} failed: {e} — left in PEL for retry")
# Producer: ingest raw events
def produce(sensor_id: str, value: float):
    r.xadd(STREAM, {"sensor_id": sensor_id, "value": value},
           maxlen=500_000, approximate=True)
# Launch two consumers in parallel threads
for name in ["w1", "w2"]:
    threading.Thread(target=worker, args=(name,), daemon=True).start()
LEADERBOARD = "scores:hourly"
def record_score(player_id: str, delta: float):
    r.zincrby(LEADERBOARD, delta, player_id)

def top_n(n: int = 10) -> list[dict]:
    entries = r.zrevrange(LEADERBOARD, 0, n - 1, withscores=True)
    return [
        {"player": pid, "score": int(score), "rank": i + 1}
        for i, (pid, score) in enumerate(entries)
    ]

def player_rank(player_id: str) -> int | None:
    rank = r.zrevrank(LEADERBOARD, player_id)
    return rank + 1 if rank is not None else None
# Rotate leaderboard every hour: NEW key + expire OLD key
import datetime
def rotate_leaderboard():
    hour_key = datetime.datetime.now(datetime.timezone.utc).strftime("scores:%Y%m%d%H")
    r.rename(LEADERBOARD, hour_key)  # raises if the live leaderboard doesn't exist yet
    r.expire(hour_key, 86400)  # keep 24 h of history
| Dimension | Redis | Memcached | DragonflyDB | Apache Kafka |
|---|---|---|---|---|
| Data structures | Rich (10+ types) | String only | Redis-compatible (10+ types) | Byte arrays (topic records) |
| Persistence | ✅ RDB + AOF | ❌ In-memory only | ✅ Snapshot + journal | ✅ Durable log on disk |
| Consumer groups / at-least-once | ✅ Streams | ❌ | ✅ Streams | ✅ Consumer groups |
| Max throughput | ~1M ops/s (single node) | ~1M ops/s | ~4M ops/s (multi-threaded) | Millions msg/s (partitioned) |
| Retention / replay | Configurable MAXLEN | None | Configurable | Unlimited (log compaction) |
| Best for | Caching, rate limits, leaderboards, real-time enrichment | Simple session caching | Same as Redis, higher throughput | High-volume durable event streaming |
Rule of thumb: Use Redis for low-latency lookups and real-time structures (leaderboards, rate limits, session state). Use Kafka when you need durability, replay, and very high throughput at scale. Do not use Redis Streams as a Kafka replacement when message retention > 24 hours or when partition-level parallelism > ~8 is needed.
Common pitfalls:

- **No persistence configured.** Enable AOF (`appendonly yes`, `appendfsync everysec`) or RDB snapshots before storing anything you can't afford to lose. Managed services (ElastiCache, Azure Cache) configure persistence separately from node spin-up.
- **`KEYS *` in production.** It scans the entire keyspace — O(N) — and blocks the single-threaded server. On a Redis instance with 10M keys, this can freeze responses for several seconds. Use `SCAN` (cursor-based, non-blocking) or query by key prefix patterns via RediSearch instead.
- **Cross-slot commands in Cluster mode.** Multi-key commands (`MSET`, `KEYS`, pipelining across keys) raise `CROSSSLOT` errors unless all keys hash to the same slot. Force co-location by using hash tags: wrap the shared part in `{}`, e.g., `{user:123}:cart` and `{user:123}:wishlist` always land on the same slot.
- **No memory limit.** Without `maxmemory`, Redis will consume all available RAM until the OOM killer terminates it, or the OS starts swapping (catastrophic latency). Always set `maxmemory` and an appropriate eviction policy. Monitor `used_memory` vs `maxmemory` with `INFO memory`.

Exercises:

1. Run the cache-aside pattern with and without the stampede lock under concurrent threads (`threading.Thread`). Measure how many times the "expensive function" is called in each scenario. Confirm the Lua lock reduces it to exactly 1 call. Then observe behavior when the lock holder crashes mid-refresh.
2. Create a stream named `orders`. Write a producer that generates 1,000 fake order events. Create two consumer groups — `fulfillment` and `analytics`. Each group should have two consumers. Confirm each group independently receives all 1,000 events, and each consumer within a group receives roughly half. Implement a dead-letter handler using `XAUTOCLAIM`.
3. Build a sliding-window leaderboard by merging per-minute sorted sets with `ZUNIONSTORE`. Write a function that returns a player's rank and score across the merged window. Rotate the window every 60 seconds in a test environment and verify correctness.

**What is a cache stampede, and how do you prevent it?** A cache stampede (also called a thundering herd) occurs when a popular cache key expires simultaneously for many concurrent requests, causing all of them to bypass the cache and hit the backing store at once.
The standard mitigation is a lightweight atomic lock, set via `SET lock_key 1 NX EX 10` or a Lua script, before recomputing the value. Only the first request acquires the lock; subsequent requests poll until the value is repopulated. Alternatively, probabilistic early expiry can refresh keys slightly before actual expiry, spreading the recomputation load over time.
**What's the difference between FLUSHDB and FLUSHALL?** `FLUSHDB` deletes all keys in the current logical database (default DB 0). `FLUSHALL` deletes all keys across every logical database (16 by default). Both are blocking by default but accept the `ASYNC` modifier to delete in a background thread. In production, neither should ever be run without explicit confirmation — they are irreversible and instant. Use `SCAN` + `DEL` for selective key deletion. A common DE use case for `FLUSHDB`: clearing a test Redis instance before loading fresh fixture data.
**What does a stuck entry in the PEL mean, and how do I handle it?** An entry in the PEL means it was delivered to a consumer but never acknowledged with `XACK` — the consumer either crashed mid-processing or hit an error. Handle it with `XAUTOCLAIM` (Redis 6.2+): a background reclaimer periodically calls `XAUTOCLAIM stream group reclaimer min-idle-time 0-0 COUNT 50` to transfer ownership of stale PEL entries to healthy consumers. After a configurable max-retry count, move the entry to a dead-letter stream for manual inspection. This is analogous to detecting uncommitted consumer offsets in Kafka.
**Why is `KEYS *` dangerous in production?** Redis is single-threaded for command processing. `KEYS *` performs a full keyspace scan — O(N) where N is the total number of keys — and blocks the entire server for its duration. On an instance with millions of keys, this can block for seconds, causing timeouts for all other clients. The production alternative is `SCAN cursor MATCH pattern COUNT 100`, which iterates in small batches without blocking. For structured key lookups, use RediSearch (part of Redis Stack), which maintains an inverted index.
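The difference is easiest to see with a toy cursor iterator. This is a pure-Python stand-in for the server's keyspace, not Redis's actual cursor encoding; real code would just use redis-py's `scan_iter`:

```python
def scan_batches(keys: list[str], count: int = 100):
    """Yield keys in COUNT-sized batches, like SCAN: each call does a small
    bounded amount of work instead of one O(N) pass like KEYS *."""
    cursor = 0
    while cursor < len(keys):
        batch = keys[cursor:cursor + count]
        cursor += count
        yield batch  # between batches the server can serve other clients

keyspace = [f"user:{i}" for i in range(1050)]
batches = list(scan_batches(keyspace, count=100))
# 11 batches: ten of 100 keys, then a final batch of 50
```

With redis-py the equivalent is `for key in r.scan_iter(match="user:*", count=100): ...`, which drives the cursor loop for you.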
**When should I use Sentinel vs Cluster?** Redis Sentinel provides high availability for a single-primary setup: Sentinel processes monitor the primary, detect failure via quorum voting, and promote a replica to primary. There is no data sharding — all data lives on one primary, so it suits datasets that fit on a single server. Redis Cluster provides both high availability and horizontal sharding across up to 1,000 nodes: data is distributed across 16,384 hash slots, and failover is automatic. The tradeoff: multi-key commands require hash tags to ensure co-location, and not all commands are supported. Choose Sentinel for simpler HA; choose Cluster when your dataset outgrows a single node.
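The slot math behind hash tags is compact enough to sketch: Cluster maps a key to `CRC16(key) mod 16384`, hashing only the substring inside the first non-empty `{...}` when one is present. A pure-Python illustration (not a substitute for a real cluster client, which computes this for you):

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16/XMODEM (poly 0x1021, init 0) — the checksum Redis Cluster
    uses for key-to-slot mapping."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Apply the hash-tag rule, then CRC16 mod 16384."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:  # non-empty tag only
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

# Same tag, same slot — so multi-key ops on these keys are legal in Cluster:
# key_slot("{user:123}:profile") == key_slot("{user:123}:sessions")
```

Because both keys reduce to the tag `user:123` before hashing, they always land on the same slot, which is exactly why hash tags make `MULTI`/`EXEC` and multi-key commands work in Cluster mode.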