Real-time is expensive. Make sure you need it.
There's a seductive pull toward building everything as a real-time system. Live dashboards, instant updates, streaming data, sub-second responses. It feels modern. It impresses in demos. And for certain operational functions, it's genuinely necessary.
For others, it's a waste of engineering effort and infrastructure cost. The art is knowing which is which.
When real-time matters
Real-time processing is essential when delayed information leads to degraded outcomes. In operational systems, that means:
- Dispatch and routing: A shuttle assignment based on 5-minute-old data could send a driver to the wrong terminal
- Exception detection: A payment failure needs immediate attention, not a batch report at midnight
- Customer communication: ETA updates need to reflect current reality, not the plan from an hour ago
- Capacity management: When a lot is approaching full, real-time awareness prevents overbooking
The common thread: these are situations where the cost of staleness exceeds the cost of freshness.
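To make that concrete, here is a minimal sketch of the real-time pattern these cases share: a state change is published and handlers react immediately, rather than waiting for a scheduled job to notice. The event names and payload fields are hypothetical, not any particular system's schema.

```python
from collections import defaultdict

class EventBus:
    """Minimal in-process publish/subscribe bus."""
    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Handlers run synchronously, right now -- the "real-time" path.
        # A batch system would instead discover this event hours later.
        for handler in self._handlers[event_type]:
            handler(payload)

alerts = []
bus = EventBus()
# A payment failure gets immediate attention, not a midnight batch report.
bus.subscribe("payment.failed",
              lambda e: alerts.append(f"alert: booking {e['booking_id']}"))
bus.publish("payment.failed", {"booking_id": "BK-42"})
# alerts == ["alert: booking BK-42"]
```

A production version would sit behind a message broker with retries and dead-lettering, but the essential property is the same: the reaction happens at the moment of the event.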
When batch is better
Batch processing - running computations on a schedule rather than continuously - is better whenever slightly stale data is perfectly acceptable, and especially when the computation is expensive.
- Demand forecasting: Tomorrow's demand prediction doesn't change meaningfully between 6 PM and midnight
- Financial reconciliation: End-of-day batch is simpler, more reliable, and easier to audit
- Report generation: Weekly performance reports don't need streaming data
- ML model retraining: Models improve on daily or weekly cycles, not real-time
- Data archival and cleanup: Periodic maintenance is simpler than continuous processing
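The batch counterpart is a plain function invoked on a schedule (cron, a workflow scheduler, anything). As a sketch, here is a deliberately naive nightly demand forecast - the averaging logic and numbers are illustrative, not a real forecasting method:

```python
from datetime import date

def nightly_demand_forecast(bookings_by_day):
    """Naive forecast: mean of recent daily booking counts.

    Runs once per night (e.g. triggered by cron). There is no point
    streaming this -- the estimate barely changes between 6 PM and
    midnight, so continuous recomputation buys nothing.
    """
    counts = list(bookings_by_day.values())
    return sum(counts) / len(counts)

history = {
    date(2024, 5, 1): 118,
    date(2024, 5, 2): 131,
    date(2024, 5, 3): 126,
}
forecast = nightly_demand_forecast(history)
# forecast == 125.0
```

The scheduler invocation is the whole integration surface: no sockets, no subscriptions, nothing to keep alive between runs.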
The hybrid approach
The best operational systems use both. We call this "real-time where it matters, batch where it's efficient."
In Flypark's architecture, booking status changes propagate in real-time through an event system. Every state transition - booking confirmed, vehicle checked in, shuttle dispatched, customer picked up - triggers immediate downstream updates. Operators see current state, customers get timely notifications, and exceptions surface instantly.
Meanwhile, demand forecasting runs as a nightly batch job. Revenue reporting aggregates hourly. ML models retrain weekly. These processes consume the same underlying data but on schedules that match their actual freshness requirements.
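The key to the hybrid is a single write path that feeds both consumption styles. A sketch of the idea (the class and state names are hypothetical, not Flypark's actual code): every transition is appended to a log that batch jobs read on their own schedule, and simultaneously pushed to subscribers who need it now.

```python
class BookingStore:
    """One write path, two consumption styles."""
    def __init__(self):
        self.log = []          # append-only; read nightly by batch jobs
        self.subscribers = []  # notified synchronously on each transition

    def transition(self, booking_id, new_state):
        event = {"booking_id": booking_id, "state": new_state}
        self.log.append(event)           # batch path: durable record
        for notify in self.subscribers:  # real-time path: immediate fan-out
            notify(event)

seen = []
store = BookingStore()
store.subscribers.append(seen.append)  # e.g. an operator dashboard
store.transition("BK-7", "checked_in")
store.transition("BK-7", "shuttle_dispatched")
# seen has both events instantly; store.log holds the same data
# for the nightly forecast and weekly retraining jobs.
```

Both consumers see identical data; only the delivery schedule differs, matched to each consumer's actual freshness requirement.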
The cost difference
Real-time infrastructure (event streaming, WebSocket connections, reactive databases) costs roughly 3-5x more than equivalent batch infrastructure. That multiplier applies to both infrastructure spend and engineering complexity. Every component you move from batch to real-time adds ongoing maintenance burden.
Common mistakes
Making dashboards real-time when nobody watches them continuously. If your operations dashboard is checked 5 times per day, refreshing on load is fine. Streaming updates to an unwatched screen wastes resources.
Real-time analytics on historical data. Analyzing last month's performance doesn't benefit from real-time processing. Batch it, cache it, serve it fast.
Event-driven everything. Not every state change needs to trigger downstream reactions. Some changes are informational. Treating them all as events creates unnecessary complexity and potential failure cascades.
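The dashboard case is easy to quantify. Assuming a hypothetical 2-second push interval for the streaming version, the arithmetic against the five-views-a-day dashboard above looks like this:

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Streaming: push an update every 2 seconds, watched or not.
# (The 2-second interval is an assumed figure for illustration.)
streamed_updates = SECONDS_PER_DAY // 2   # 43_200 updates per day

# Refresh-on-load: compute only when an operator opens the page.
on_load_updates = 5                       # views per day, from the example

waste_ratio = streamed_updates / on_load_updates  # 8_640x more work
```

Four orders of magnitude of computation, almost all of it rendering state nobody is looking at.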
Deciding for your system
For each feature, ask: "What happens if this data is 5 minutes old?" If the answer is "nothing meaningful," batch it. If the answer is "a customer has a bad experience" or "an operator makes a wrong decision," make it real-time.
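The framework is small enough to write down literally. This helper just encodes the question above (the function name and inputs are invented for illustration):

```python
def processing_mode(five_minute_staleness_harm):
    """What happens if this data is 5 minutes old?

    'nothing' meaningful -> batch it.
    Any real harm (bad customer experience, wrong operator
    decision) -> make it real-time.
    """
    return "batch" if five_minute_staleness_harm == "nothing" else "real-time"

processing_mode("nothing")                     # demand forecasting
processing_mode("operator misroutes shuttle")  # dispatch and routing
```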
Simple framework. Saves enormous engineering effort.