
Introduction
Every market-making model you can name — Avellaneda-Stoikov included — was built for liquid books with continuous order flow and reliable parameter estimation. Put it on an instrument that trades 50 times an hour and it doesn’t just underperform; it abandons you entirely. The theory assumes a market that is always there. Illiquid ones are not. The model is rarely what breaks — it’s everything between the book and the quote.
Learning Objectives
- Understand why the Avellaneda-Stoikov framework collapses in low-liquidity environments and how parameter estimation becomes statistically unstable
- Learn to distinguish toxic fills from harmless ones through order book refill detection and microstructural feature engineering
- Build infrastructure-layer defenses — including OFI-based staleness detection, event-driven hedging, and low-latency quote management — that let the model survive when the book isn’t there
You Should Know
1. The Parameter Estimation Problem: When k Becomes Noise
The Avellaneda-Stoikov model's reservation price and optimal spread depend critically on k (often written κ, kappa): the decay parameter of the order arrival intensity, which governs how sensitive liquidity takers are to the distance of your quote from the mid. The reservation price is given by:
r = mid - q × γ × σ² × (T - t)
And the optimal spread by:
s = γ × σ² × (T - t) + (2 / γ) × ln(1 + γ / k)
Where q is the current inventory, γ is inventory risk aversion, σ is volatility, and T - t is the remaining time horizon. The order arrival intensity follows:
Λᵇ(δ) = Λᵃ(δ) = A × e^(-k × δ)
with A, k > 0.
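To make the two formulas concrete, here is a minimal numeric sketch. The function names and all parameter values (mid, inventory q, γ, σ, k, time remaining) are illustrative assumptions, not calibrated figures for any instrument.

```python
import math


def reservation_price(mid, q, gamma, sigma, tau):
    """r = mid - q * gamma * sigma^2 * tau, where tau = T - t."""
    return mid - q * gamma * sigma ** 2 * tau


def optimal_spread(gamma, sigma, tau, k):
    """s = gamma * sigma^2 * tau + (2 / gamma) * ln(1 + gamma / k)."""
    return gamma * sigma ** 2 * tau + (2.0 / gamma) * math.log(1.0 + gamma / k)


# Illustrative values only: long 5 units, half the horizon remaining
mid, q, gamma, sigma, tau, k = 100.0, 5, 0.1, 2.0, 0.5, 1.5

r = reservation_price(mid, q, gamma, sigma, tau)
s = optimal_spread(gamma, sigma, tau, k)
bid, ask = r - s / 2, r + s / 2
print(f"reservation={r:.4f} spread={s:.4f} bid={bid:.4f} ask={ask:.4f}")
```

With a long position the reservation price sits below the mid, so both quotes shift down and the ask becomes more likely to be hit. A larger k shrinks the logarithmic term, which is why a noisy k estimate translates directly into a noisy quoted spread.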
On a liquid name at 10,000 trades an hour, a 60-second window gives you a stable k. At 50 trades an hour, that same window sees 0.83 events. You’re not estimating a parameter — you’re guessing on noise. The classical model relies on static parameters, constant volatility, and a simplified description of order-execution intensity — assumptions that substantially limit applicability under non-stationary microstructure.
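The arithmetic behind that claim can be sketched in a few lines. This assumes trades arrive as a Poisson process, in which case the relative standard error of a rate estimated from a window is roughly one over the square root of the expected event count. It measures noise in the arrival rate only, as a proxy for how little information the window carries about k; the helper names are hypothetical.

```python
import math


def expected_events(trades_per_hour, window_seconds):
    """Expected number of trades falling inside one estimation window."""
    return trades_per_hour * window_seconds / 3600.0


def relative_std_error(expected_count):
    """Relative standard error of a Poisson rate estimate: 1 / sqrt(E[N])."""
    return 1.0 / math.sqrt(expected_count)


for rate in (10_000, 50):
    n = expected_events(rate, 60)
    print(f"{rate:>6} trades/h: {n:7.2f} events per 60 s, "
          f"relative SE ~ {relative_std_error(n):.0%}")

# Window length at 50 trades/h that matches the liquid name's event count
needed_seconds = expected_events(10_000, 60) / 50 * 3600
print(f"window needed at 50 trades/h: {needed_seconds / 60:.0f} minutes")
```

Matching the information content of a 60-second window on the liquid name takes about 200 minutes on the illiquid one, by which time the regime the estimate describes may no longer exist.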
What to do about it: Implement dynamic parameter calibration using a regularised maximum-likelihood estimator, which achieves a regret upper bound of order ln²T in expectation. For practical implementation, consider expanding your feature space from 2 to 22 microstructural variables — one study demonstrated that on SBER, test R² increased from 0.024 to 0.412 (ElasticNet extended), with sign accuracy reaching 80.1%. Dynamic calibration of k proved critical: switching from a constant k = 100 to a daily-calibrated value increased total PnL by 58–78%.
Python snippet for rolling k estimation:
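import numpy as np
from scipy.optimize import minimize


def estimate_kappa(trade_timestamps, spreads, fill_counts, exposure, window_seconds=3600):
    """
    Estimate kappa (decay of fill intensity with quote distance) using
    maximum likelihood over a rolling window.

    spreads, fill_counts and exposure are equal-length arrays: quote distance,
    fills observed at that distance, and seconds spent quoting there.
    """
    trade_timestamps = np.asarray(trade_timestamps, dtype=float)
    spreads = np.asarray(spreads, dtype=float)
    fill_counts = np.asarray(fill_counts, dtype=float)
    exposure = np.asarray(exposure, dtype=float)

    # Arrival rate from the trades inside the rolling window
    recent = trade_timestamps[trade_timestamps >= trade_timestamps[-1] - window_seconds]
    arrivals = np.diff(recent)
    lambda_hat = 1.0 / np.mean(arrivals)  # Poisson rate, trades per second

    # For Avellaneda-Stoikov, kappa is estimated from:
    #   P(arrival at spread delta) = A * exp(-kappa * delta)
    # Fit via Poisson regression on (spread, fill count); A is held at
    # lambda_hat here, which is a simplifying assumption.
    def neg_log_likelihood(params):
        lambda_pred = lambda_hat * exposure * np.exp(-params[0] * spreads)
        return -np.sum(fill_counts * np.log(lambda_pred) - lambda_pred)

    result = minimize(neg_log_likelihood, x0=[1.0], bounds=[(1e-6, None)])
    return result.x[0]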
2. Order Flow Imbalance (OFI): Distinguishing “No Pressure” from “No Data”
The same numeric value can represent a genuine market condition or simply the absence of current evidence. An OFI of zero means “no pressure” in a liquid market, but in an illiquid one, it more likely means “nothing has printed in eight minutes.” Without preserving that context, post-trade attribution can easily blame the model for an infrastructure or data-quality failure.
OFI measures the excess of buy pressure over sell pressure and has been shown to predict short-term returns with remarkable accuracy across asset classes. The key insight is constructing OFI in event time rather than clock time, enabling high-frequency forecasts of buy–sell imbalance. In practice, this means you need features that carry their own staleness — a timestamp of last update, a decay factor, or an explicit “data age” signal that informs the quoting engine whether a zero means equilibrium or absence.
Implementation approach:
import numpy as np


class StaleAwareOFI:
    def __init__(self, decay_seconds=60):
        self.last_update = {}
        self.imbalance = {}
        self.decay_seconds = decay_seconds

    def update(self, symbol, bid_volume, ask_volume, timestamp):
        # Plain depth difference used as the raw imbalance signal
        raw_ofi = bid_volume - ask_volume
        age = timestamp - self.last_update.get(symbol, timestamp)
        # Staleness penalty: a reading that arrives after a long silence
        # is discounted by the length of the gap
        staleness_factor = np.exp(-age / self.decay_seconds)
        self.imbalance[symbol] = raw_ofi * staleness_factor
        self.last_update[symbol] = timestamp
        return self.imbalance[symbol]

    def is_stale(self, symbol, current_time):
        # A regular method, since a property cannot take arguments
        last = self.last_update.get(symbol)
        if last is None:
            return True  # never updated: treat as "no data"
        return current_time - last > self.decay_seconds * 2
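For the event-time construction mentioned above, one widely used definition (from Cont, Kukanov and Stoikov's work on the price impact of order book events) accumulates a signed contribution on every change of the best bid or ask. The sketch below follows that definition; the function names and tuple layout are assumptions chosen for illustration. It returns the event count and last timestamp alongside the value, so a consumer can tell a balanced book from an empty window.

```python
def ofi_increment(prev, curr):
    """
    Contribution of one best-quote update to order flow imbalance.
    prev and curr are (bid_price, bid_size, ask_price, ask_size) tuples.
    """
    pb0, qb0, pa0, qa0 = prev
    pb1, qb1, pa1, qa1 = curr
    # Bid side: size added at an unchanged or improved bid counts as buy pressure
    bid_flow = (qb1 if pb1 >= pb0 else 0.0) - (qb0 if pb1 <= pb0 else 0.0)
    # Ask side: size added at an unchanged or improved ask counts as sell pressure
    ask_flow = (qa1 if pa1 <= pa0 else 0.0) - (qa0 if pa1 >= pa0 else 0.0)
    return bid_flow - ask_flow


def event_time_ofi(quotes):
    """
    Sum increments over a list of
    (timestamp, bid_price, bid_size, ask_price, ask_size) snapshots.
    Returns (ofi, n_events, last_timestamp).
    """
    ofi, n_events = 0.0, 0
    for prev, curr in zip(quotes, quotes[1:]):
        ofi += ofi_increment(prev[1:], curr[1:])
        n_events += 1
    last_ts = quotes[-1][0] if quotes else None
    return ofi, n_events, last_ts


quotes = [
    (0.0, 100.0, 10, 100.1, 8),
    (1.0, 100.0, 15, 100.1, 8),  # 5 lots added at the bid
    (2.0, 100.0, 15, 100.1, 5),  # 3 lots removed from the ask
]
print(event_time_ofi(quotes))
```

An OFI of zero with `n_events == 0` is the "nothing has printed" case; a zero with many events is genuine balance. Keeping both numbers together is the point.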
3. Refill Detection: The Signal That Separates Toxic from Harmless
You can’t tell a toxic fill from a harmless one until the book fails to refill, and by then you’re holding it. A level that returns in milliseconds is uninformed — likely a small retail order or a routine cancellation. A level that stays empty for minutes is toxic — someone just took your liquidity and the market isn’t coming back to rescue you.
Refill detection requires measuring the time between a fill (or level depletion) and the restoration of that level in the order book. In high-frequency environments, this needs sub-millisecond precision. Private fills arrive faster than public trade summaries — market makers on the offer side get a private fill acknowledgment faster than the trade summary is published by the exchange. This latency advantage is precisely what allows sophisticated market makers to rejoin the level with new quotes before the rest of the market even knows a trade occurred.
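The measurement described above can be sketched as a small tracker keyed by side and price. This is a minimal illustration, assuming level updates arrive as (side, price, size, timestamp) with timestamps in seconds; the class name and the 60-second toxicity threshold are hypothetical, and a production version would need to handle the book moving away so that the original price never reappears.

```python
class RefillTimer:
    """Tracks how long a price level stays empty after it is depleted."""

    def __init__(self, toxic_after_seconds=60.0):
        self.toxic_after_seconds = toxic_after_seconds  # assumed threshold
        self.depleted_at = {}   # (side, price) -> time the level emptied
        self.refill_times = []  # observed refill durations, in seconds

    def on_level_update(self, side, price, size, timestamp):
        """Feed every level change; returns the refill time when a level comes back."""
        key = (side, price)
        if size <= 0:
            # Level emptied: start the clock unless it is already running
            self.depleted_at.setdefault(key, timestamp)
            return None
        started = self.depleted_at.pop(key, None)
        if started is None:
            return None  # level was not empty, nothing to measure
        elapsed = timestamp - started
        self.refill_times.append(elapsed)
        return elapsed

    def looks_toxic(self, side, price, now):
        """True if the level has stayed empty longer than the threshold."""
        started = self.depleted_at.get((side, price))
        return started is not None and now - started > self.toxic_after_seconds


timer = RefillTimer(toxic_after_seconds=60.0)
timer.on_level_update("bid", 100.0, 0, timestamp=10.0)   # level taken out
print(timer.looks_toxic("bid", 100.0, now=20.0))          # still within threshold
print(timer.looks_toxic("bid", 100.0, now=100.0))         # empty for 90 s
```

Querying `looks_toxic` on a timer is a fallback; the more useful signal is the distribution in `refill_times`, which shows what a normal refill looks like for this instrument.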
Linux system tuning for low-latency order processing:
# Use the performance governor so the CPU clock does not scale down
sudo cpupower frequency-set -g performance
# Set real-time (SCHED_FIFO) priority for the trading process
sudo chrt -f -p 99 $(pgrep -f trading_engine)
# Raise the locked-memory limit so the process can lock its pages and avoid swapping
sudo prlimit --memlock=unlimited --pid=$(pgrep -f trading_engine)
# Optimize network stack for low latency
sudo sysctl -w net.core.rmem_max=16777216
sudo sysctl -w net.core.wmem_max=16777216
# Legacy option: it has no effect on recent kernels
sudo sysctl -w net.ipv4.tcp_low_latency=1
sudo sysctl -w net.core.busy_poll=50
sudo sysctl -w net.core.busy_read=50
4. Event-Driven Hedging: Fire on the Fill, Not a Clock
One toxic position erases the spread from dozens of clean trades. A hedger that fires on a fixed clock interval will always be too slow for the positions that matter most. The solution is event-driven hedging: the hedge calculation and execution trigger on the fill event itself, not on a timer.
Tick-to-trade latency measures the time from when the exchange acknowledges a message at its order gateway, which produces a market data update (tick), to when your software transmits an instruction (order, cancel, replace, etc.) back to the exchange. Even with the pipeline right, a quiet book and a loaded one look identical until the fill goes against you. The infrastructure buys you seconds to react. It won’t tell you which book you’re in.
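A rough way to watch the in-process part of that interval is to timestamp the tick on arrival and again just before the order is handed to the socket. The sketch below uses Python's monotonic nanosecond clock; the class and method names are hypothetical. It captures software time only, not network or exchange gateway time, so it is a lower bound on true tick-to-trade.

```python
import time


class TickToTradeRecorder:
    """Collects in-process tick-to-order latencies in nanoseconds."""

    def __init__(self):
        self.samples_ns = []

    def on_tick(self):
        """Call when a market data update is received; returns a start stamp."""
        return time.perf_counter_ns()

    def on_order_sent(self, tick_ns):
        """Call right before the order is written to the wire."""
        self.samples_ns.append(time.perf_counter_ns() - tick_ns)

    def percentile(self, pct):
        """Nearest-rank style percentile over the recorded samples."""
        if not self.samples_ns:
            raise ValueError("no samples recorded")
        ordered = sorted(self.samples_ns)
        index = min(len(ordered) - 1, int(len(ordered) * pct / 100))
        return ordered[index]


recorder = TickToTradeRecorder()
for _ in range(1000):
    stamp = recorder.on_tick()
    # ... decision logic and order encoding would run here ...
    recorder.on_order_sent(stamp)
print("p50 ns:", recorder.percentile(50), "p99 ns:", recorder.percentile(99))
```

Tail percentiles matter more than the median here: the fills that hurt tend to arrive during bursts, which is exactly when the slow path gets exercised.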
FIX session management for order entry:
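import quickfix as fix


class MarketMakingApplication(fix.Application):
    def onCreate(self, sessionID):
        # Called when the FIX session object is created
        self.session_id = sessionID
        self.ready = False

    def onLogon(self, sessionID):
        # Session established: ready for order entry
        self.ready = True

    def onLogout(self, sessionID):
        self.ready = False

    # Remaining callbacks required by the Application interface
    def toAdmin(self, message, sessionID):
        pass

    def fromAdmin(self, message, sessionID):
        pass

    def toApp(self, message, sessionID):
        pass

    def fromApp(self, message, sessionID):
        # Application-level messages from the exchange arrive here
        msg_type = fix.MsgType()
        message.getHeader().getField(msg_type)
        if msg_type.getValue() == fix.MsgType_ExecutionReport:
            exec_type = fix.ExecType()
            message.getField(exec_type)
            # Fill event: trigger hedge calculation immediately
            if exec_type.getValue() == fix.ExecType_FILL:
                self.on_fill(message)

    def on_fill(self, execution_report):
        # Event-driven hedge: fire immediately
        symbol, quantity, price = fix.Symbol(), fix.CumQty(), fix.Price()
        execution_report.getField(symbol)
        execution_report.getField(quantity)
        # Price (44) is the order's limit price; read LastPx or AvgPx
        # instead if the executed price is what the hedge needs
        execution_report.getField(price)
        self.hedge_position(symbol.getValue(), quantity.getValue(), price.getValue())

    def hedge_position(self, symbol, quantity, price):
        # Placeholder: hedge sizing and routing are venue-specific
        raise NotImplementedError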
5. The Infrastructure Layer: Where Models Survive
The fixes live in the plumbing, not the formula. Modern production systems typically use the model as a guideline to build on top of — an adaptive market-making architecture that preserves the analytical structure of the Avellaneda–Stoikov framework while introducing adaptation mechanisms for changing market regimes. The central idea is to separate market dynamics from the trading objective: the market state determines a low-dimensional set of Avellaneda–Stoikov parameters, while recent realized rewards determine a low-dimensional objective vector.
In practice, this means:
- Colocation: placing trading servers in close physical proximity to the exchange’s matching engine. Latency asymmetry functions like a barrier to financial inclusion: slower firms are systematically removed from top-of-book competition.
- Kernel bypass networking: using technologies like DPDK or Solarflare’s OpenOnload to bypass the kernel networking stack entirely.
- FPGA acceleration: some modern platforms report tick-to-trade latency under 200 nanoseconds via a full FPGA architecture.
- Real-time volatility forecasting: implementing HAR, GARCH, or LightGBM models to adjust market-making parameters dynamically across different market regimes.
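Of the forecasting options in the last item, HAR is the simplest to sketch: next-day realized variance is regressed on the latest daily value, the 5-day mean, and the 22-day mean. The code below is a minimal ordinary-least-squares version under that standard specification. The function names are assumptions, the input is assumed to be a series of daily realized variances, and the synthetic data exists only to show the calls.

```python
import numpy as np


def har_features(rv):
    """
    Build HAR regressors from a 1-D series of daily realized variance.
    Each row uses day t, the 5-day mean and the 22-day mean ending at t
    to predict day t + 1.
    """
    rv = np.asarray(rv, dtype=float)
    rows, targets = [], []
    for t in range(21, len(rv) - 1):
        rows.append([1.0, rv[t], rv[t - 4:t + 1].mean(), rv[t - 21:t + 1].mean()])
        targets.append(rv[t + 1])
    return np.array(rows), np.array(targets)


def fit_har(rv):
    """Least-squares coefficients: intercept, daily, weekly, monthly."""
    X, y = har_features(rv)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def forecast_next(rv, beta):
    """One-step-ahead forecast from the most recent 22 observations."""
    rv = np.asarray(rv, dtype=float)
    x = np.array([1.0, rv[-1], rv[-5:].mean(), rv[-22:].mean()])
    return float(x @ beta)


# Synthetic series purely for demonstration
rng = np.random.default_rng(0)
rv = np.abs(rng.normal(1.0, 0.1, 200))
beta = fit_har(rv)
print("coefficients:", beta)
print("next-day forecast:", forecast_next(rv, beta))
```

The forecast would feed the σ² term of the spread and reservation-price formulas. On an illiquid instrument the realized-variance input is itself sparse, so the same staleness concerns apply to it as to any other feature.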
Windows performance tuning for trading systems:
# Set the High performance power plan
powercfg -setactive 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c
# Acknowledge every TCP segment immediately (disables delayed ACK).
# Nagle's algorithm itself is turned off per socket with TCP_NODELAY.
# TcpAckFrequency is a per-interface value: the path needs the adapter's GUID appended.
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\" -Name "TcpAckFrequency" -Value 1
# Adjust scheduler quantum and foreground boost for responsiveness
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\PriorityControl" -Name "Win32PrioritySeparation" -Value 26
# Disable TCP receive-window auto-tuning for consistent latency
netsh int tcp set global autotuninglevel=disabled
6. Inventory Risk and the Feedback Loop
The Avellaneda-Stoikov model creates a feedback loop that stabilizes inventory while still providing liquidity. When inventory grows, the reservation price shifts so quotes are skewed to attract trades that reduce the position. But this assumes the market will actually take the other side. In illiquid instruments, there is no other side.
Modern approaches combine inventory management with microstructural signals. One implementation uses spread width driven by realized volatility, an order-book imbalance (OBI) signal as the alpha factor to bias quotes ahead of anticipated moves, and inventory skew that tilts quotes to mean-revert position toward zero. The model's parameters should also be calibrated dynamically rather than fixed: the 58–78% PnL gain cited earlier came from switching k from a constant to a daily-calibrated value, and the same reasoning applies to the risk-aversion parameter γ.
Python inventory management with dynamic skew:
import numpy as np


class InventoryManager:
    def __init__(self, max_inventory=1000, target_inventory=0, risk_aversion=0.1, kappa=1.5):
        self.max_inventory = max_inventory
        self.target = target_inventory
        self.gamma = risk_aversion
        self.kappa = kappa  # fill-intensity decay k; the default is a placeholder
        self.current_inventory = 0

    def compute_skew(self, mid_price, volatility, time_horizon):
        # Reservation price from Avellaneda-Stoikov
        inventory_deviation = self.current_inventory - self.target
        reservation_shift = -inventory_deviation * self.gamma * volatility**2 * time_horizon
        # Dynamic spread based on inventory risk
        base_spread = self.gamma * volatility**2 * time_horizon
        risk_premium = (2 / self.gamma) * np.log(1 + self.gamma / self.kappa)
        optimal_spread = base_spread + risk_premium
        # Skew quotes based on inventory
        bid_price = mid_price + reservation_shift - optimal_spread / 2
        ask_price = mid_price + reservation_shift + optimal_spread / 2
        return bid_price, ask_price

    def update_inventory(self, fill_quantity):
        self.current_inventory += fill_quantity
        # Once inventory passes 70% of the limit, widen spreads aggressively
        if abs(self.current_inventory) > self.max_inventory * 0.7:
            self.gamma *= 1.5  # Increase risk aversion
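The three-signal approach described in this section (volatility-driven width, OBI as the alpha bias, inventory skew) can also be sketched without the Avellaneda-Stoikov terms. Everything here is a hypothetical illustration: the function names and the weights `vol_mult`, `alpha_weight` and `skew_per_unit` are assumptions, not values taken from any cited implementation.

```python
def order_book_imbalance(bid_size, ask_size):
    """OBI in [-1, 1]: positive when resting bid depth exceeds ask depth."""
    total = bid_size + ask_size
    return 0.0 if total <= 0 else (bid_size - ask_size) / total


def quote_with_signals(mid, realized_vol, bid_size, ask_size, inventory,
                       vol_mult=2.0, alpha_weight=0.5, skew_per_unit=0.001):
    """Return (bid, ask) from volatility width, OBI bias and inventory skew."""
    # Spread width scales with realized volatility (in price units)
    half_spread = vol_mult * realized_vol / 2.0
    # Lean the quote center in the direction the book imbalance points
    alpha_shift = alpha_weight * order_book_imbalance(bid_size, ask_size) * half_spread
    # Tilt against the current position to pull inventory toward zero
    inventory_shift = -skew_per_unit * inventory
    center = mid + alpha_shift + inventory_shift
    return center - half_spread, center + half_spread


print(quote_with_signals(100.0, 0.1, 50, 50, 0))     # balanced book, flat
print(quote_with_signals(100.0, 0.1, 90, 10, 0))     # bid-heavy book
print(quote_with_signals(100.0, 0.1, 50, 50, 100))   # long 100 units
```

In an illiquid book the OBI input deserves the same staleness treatment as OFI: a balanced reading computed from an eight-minute-old snapshot should not move the quote center at all.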
What Undercode Say
- The model is rarely what breaks; it’s everything between the book and the quote. Academic models like Avellaneda-Stoikov provide a theoretical framework for market making, but they are not always directly applicable in practice. The gap between theory and reality is where infrastructure matters most.
- Dynamic parameter calibration is non-negotiable: static parameters, constant volatility, and simplified execution intensity assumptions substantially limit the model’s applicability under non-stationary microstructure. The classical model’s assumptions may not hold in real markets, which is why reliable parameter estimation and fast execution are required.
The distinction between model failure and infrastructure failure is critical. The same numeric value can represent a genuine market condition or simply the absence of current evidence. Without preserving that context, post-trade attribution can easily blame the model for an infrastructure or data-quality failure. The best production systems use the model as a guideline to build on top of, not as the final word.
The infrastructure buys you seconds to react, but it won’t tell you which book you’re in. Even with the pipeline right, a quiet book and a loaded one look identical until the fill goes against you. That’s why the plumbing matters more than the formula — and why the people who build that plumbing are worth more than the quants who write the models.
Prediction
- +1 The trend toward adaptive market-making architectures that separate market dynamics from trading objectives will accelerate, with reinforcement learning and zero-shot adaptation methods increasingly integrated into production systems.
- +1 Dynamic parameter calibration using machine learning will become standard practice, with firms moving from daily recalibration to event-driven or regime-switching calibration based on real-time microstructural features.
- -1 The gap between academic models and practical implementation will widen as low-latency infrastructure becomes more specialized and expensive, consolidating market-making profitability among firms with the best plumbing rather than the best models.
- -1 As more firms adopt similar adaptive architectures, the competitive advantage will shift to ever-faster infrastructure (FPGA, kernel bypass, and sub-microsecond tick-to-trade latency), creating an arms race that smaller firms cannot win.
- +1 The development of open-source frameworks and cloud-based low-latency solutions (AWS cluster placement groups, shared CPGs) will democratize access to competitive infrastructure, potentially leveling the playing field for smaller market makers.
Reported By: Silahian Illiquid – Hackers Feeds


