ML Feature Stores & Model Serving
Manage features and serve ML models in production
TL;DR
A feature store is a centralized repository for managing ML features: the computed values used for training and inference. Offline features (for training) are pre-computed in batch and stored in a data warehouse. Online features (for inference) live in a low-latency cache such as Redis and are retrieved in under 100 ms during predictions. Training-serving skew, where features are computed differently during training than during serving, is the silent killer of ML systems. Model serving deploys models as batch jobs, REST APIs, or streaming consumers, with canary deployments for safety. Never compute features differently between training and serving; use the feature store as the single source of truth.
Learning Objectives
By the end of this article, you will understand:
- Difference between offline and online features
- Feature store architecture and components
- How to prevent training-serving skew
- Model serving patterns: batch, REST, streaming
- Monitoring for data drift and model degradation
- Cost-latency trade-offs in feature and model systems
Motivating Scenario
Your ML team builds a fraud detection model using 50 features (transaction amount, user age, historical purchase count, etc.). Training happens monthly on historical data: features are computed from a 30-day window, and training the model takes about two days. Serving needs predictions in under 100 ms for real-time fraud checks. During training, features are computed from cleaned, deduplicated transaction data. In production, features are computed from the raw transaction stream. The slightly different logic leads to training-serving skew: the model sees different data distributions, and accuracy drops 30%. You need a feature store that ensures training and serving use identical feature computations.
Core Concepts
Offline Features (Training)
Offline features are pre-computed from historical data and stored in a data warehouse for model training.
Characteristics:
- Computed once, reused multiple times
- No latency constraints (can take hours)
- High accuracy: can use expensive transformations
- Snapshot at specific time (not real-time)
Typical pipeline:
Raw transaction data (30 days)
→ Group by user, compute aggregates
→ Join with user demographics
→ Create feature matrix: [user_id, amount_mean, purchase_count, age, ...]
→ Store in warehouse as training dataset
Tools: Spark or pandas for batch computation; store results as Parquet files or in a database (a minimal pandas sketch follows).
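A minimal sketch of the offline pipeline above, assuming a pandas DataFrame of raw transactions with user_id, amount, and timestamp columns and a users DataFrame with demographics (all names are illustrative):
# offline_features.py -- illustrative batch feature computation with pandas
import pandas as pd

def build_offline_features(transactions: pd.DataFrame, users: pd.DataFrame) -> pd.DataFrame:
    # Keep the last 30 days of transactions.
    cutoff = transactions["timestamp"].max() - pd.Timedelta(days=30)
    recent = transactions[transactions["timestamp"] >= cutoff]

    # Group by user and compute aggregates.
    aggregates = recent.groupby("user_id").agg(
        amount_mean=("amount", "mean"),
        purchase_count=("amount", "count"),
    ).reset_index()

    # Join with user demographics to form the feature matrix.
    return aggregates.merge(users[["user_id", "age"]], on="user_id", how="left")

# features = build_offline_features(transactions, users)
# features.to_parquet("training_features.parquet")  # store as the training dataset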
Online Features (Inference)
Online features are retrieved in real-time during model predictions, stored in low-latency cache.
Characteristics:
- Must typically respond in under 100 ms
- Stored in Redis, DynamoDB, or specialized feature cache
- Computed continuously as events arrive
- Always fresh, but limited history
Typical flow:
Real-time event: user purchases item
→ Query online feature store: "user_123 features"
→ Feature store returns: {amount_mean: 45.2, purchase_count: 12, ...}
→ Feed to model: predict fraud?
→ Return prediction <50ms
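A minimal sketch of the online flow above, assuming per-user features are kept in Redis as JSON blobs under an illustrative key layout (the key format and field names are not prescribed by any particular tool):
# online_lookup.py -- illustrative online feature retrieval from Redis
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def get_user_features(user_id: int) -> dict:
    # Features are written here continuously by the streaming pipeline.
    raw = r.get(f"features:user:{user_id}")
    if raw is None:
        return {}  # fall back to defaults for users with no features yet
    return json.loads(raw)

# features = get_user_features(123)
# -> {"amount_mean": 45.2, "purchase_count": 12, ...}
# prediction = model.predict([to_vector(features)])  # to_vector is hypothetical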
Training-Serving Skew
Training-serving skew occurs when features differ between training and serving.
Common causes:
- Logic differences: Training uses deduplicated data, serving uses raw
- Data availability: Training sees 30-day history, serving only 7-day
- Timing: Training uses yesterday's aggregates, serving uses real-time
- Schema changes: Production DB changed, training logic didn't update
Example skew:
Training: average_purchase =
SELECT AVG(amount) FROM transactions
WHERE user_id = 123 AND amount > 0 AND NOT duplicate
Serving: average_purchase =
SELECT AVG(amount) FROM transactions
WHERE user_id = 123
-- Forgot to filter duplicates and invalid amounts!
Result: the model was trained with avg=50 but sees avg=62 in production, so its predictions are wrong.
Prevention:
- Store feature definitions in feature registry
- Generate both training and serving code from the same definition (a sketch follows this list)
- Use feature store libraries (Feast, Tecton) that enforce this
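Without a full feature store, the simplest version of this idea is to define each transformation once and call it from both the batch and serving paths. A minimal sketch, with illustrative function and column names:
# feature_definitions.py -- one definition shared by training and serving
import pandas as pd

def average_purchase(transactions: pd.DataFrame) -> float:
    """Average purchase value for one user, excluding invalid and duplicate rows."""
    cleaned = transactions[transactions["amount"] > 0].drop_duplicates()
    return float(cleaned["amount"].mean()) if len(cleaned) else 0.0

# Training (batch): apply the same function per user over historical data.
# train_features = raw_transactions.groupby("user_id").apply(average_purchase)

# Serving (online): apply the identical function to a user's recent rows.
# online_value = average_purchase(recent_rows_for_user)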
Feature Store Architecture
Components:
- Feature Registry: Central repository of feature definitions
  - Name, description, computation logic
  - Data type, freshness requirements
  - Training vs serving requirements
- Offline Store (Data Warehouse): Historical features for training
  - Large, durable, slow
  - Parquet in S3, BigQuery, Snowflake
- Online Store (Cache): Real-time features for serving
  - Small, fast, volatile
  - Redis, DynamoDB, specialized systems
- Feature Computation Engine: Transforms raw data into features
  - Batch (Spark) for the offline store
  - Streaming (Flink) for the online store
  - Both read from the same registry and use the same logic
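A registry entry can be as simple as a declarative record that both the batch and streaming engines read. A minimal sketch; the fields are illustrative and not taken from any particular product's schema:
# registry.py -- illustrative feature registry entries
from dataclasses import dataclass

@dataclass
class FeatureDefinition:
    name: str                # e.g. "avg_purchase_value"
    entity: str              # key the feature attaches to, e.g. "user_id"
    dtype: str               # e.g. "float"
    transformation: str      # reference to the shared computation logic
    freshness_seconds: int   # how stale the online value may be
    online: bool             # materialize to the online store?

REGISTRY = [
    FeatureDefinition("avg_purchase_value", "user_id", "float",
                      "average_purchase", freshness_seconds=3600, online=True),
    FeatureDefinition("total_purchases", "user_id", "int",
                      "purchase_count", freshness_seconds=3600, online=True),
]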
Model Serving Patterns
Batch Predictions:
- Pre-compute predictions for all entities nightly
- Store in database
- API returns precomputed value instantly
- Best for: recommendation systems, customer segments
REST API (Online):
- Request arrives: /predict?user_id=123
- Fetch features from online store
- Run model inference
- Return prediction
- Best for: fraud detection, real-time scoring
Streaming:
- Model subscribes to event stream (Kafka)
- Receives transactions, computes features, runs model
- Publishes predictions downstream
- Best for: continuous monitoring, alert systems
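The REST pattern is shown end to end in the practical example below. For contrast, here is a minimal sketch of the batch pattern, where predictions are pre-computed in a nightly job and only looked up at request time (paths and column names are illustrative):
# batch_predictions.py -- illustrative nightly scoring job
import joblib
import pandas as pd

def score_all_users(features_path: str, output_path: str) -> None:
    model = joblib.load("fraud_model.pkl")      # assumed model artifact
    features = pd.read_parquet(features_path)   # one row of features per user

    feature_cols = [c for c in features.columns if c != "user_id"]
    features["fraud_probability"] = model.predict_proba(features[feature_cols])[:, 1]

    # Store precomputed scores; the API later serves these with a simple lookup.
    features[["user_id", "fraud_probability"]].to_parquet(output_path)

# score_all_users("features/latest.parquet", "scores/latest.parquet")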
Practical Example
The example below walks through four parts:
- Feast: offline features for training
- Feast: online features for inference
- Model serving API (FastAPI)
- Data drift detection
# features.py -- Feast: offline features for training
# Note: exact Feast argument names vary across releases (e.g. newer versions use
# schema/Field instead of features/Feature); treat this as a sketch, not a pinned API.
from datetime import datetime, timedelta

import pandas as pd
from feast import BigQuerySource, Entity, Feature, FeatureStore, FeatureView, ValueType

# Define entity
user = Entity(name="user_id", value_type=ValueType.INT64)

# Define data source
user_stats_source = BigQuerySource(
    table="project.dataset.user_statistics",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_timestamp",
)

# Define features
user_stats_view = FeatureView(
    name="user_stats",
    entities=[user],
    ttl=timedelta(days=30),  # 30-day history
    features=[
        Feature(name="total_purchases", dtype=ValueType.INT64),
        Feature(name="avg_purchase_value", dtype=ValueType.FLOAT),
        Feature(name="days_since_last_purchase", dtype=ValueType.INT64),
    ],
    online=True,  # also materialize to the online store
    source=user_stats_source,
)

# Training: get point-in-time-correct historical features
store = FeatureStore(repo_path=".")
training_data = store.get_historical_features(
    entity_df=pd.DataFrame({
        'user_id': [1, 2, 3, ...],
        'event_timestamp': [datetime(...), ...],
    }),
    features=['user_stats:total_purchases',
              'user_stats:avg_purchase_value'],
).to_df()

# Save for training
training_data.to_csv('training_features.csv', index=False)
# Feast: online features for inference
from datetime import datetime

from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Materialize the latest feature values into the online store (e.g. Redis)
store.materialize_incremental(end_date=datetime.utcnow())

# Online: retrieve features for a prediction
features = store.get_online_features(
    features=['user_stats:total_purchases',
              'user_stats:avg_purchase_value'],
    entity_rows=[{'user_id': 123}],
).to_dict()

print(features)
# Keys are returned without the feature-view prefix by default, e.g.:
# {
#   'user_id': [123],
#   'total_purchases': [45],
#   'avg_purchase_value': [52.30],
# }

# Build a feature vector in training order and feed it to the model
# (fraud_model is assumed to be loaded elsewhere, e.g. with joblib)
feature_vector = [features['total_purchases'][0],
                  features['avg_purchase_value'][0]]
prediction = fraud_model.predict([feature_vector])
# serve.py -- model serving API with FastAPI
from fastapi import FastAPI
from feast import FeatureStore
from pydantic import BaseModel
import joblib

app = FastAPI()
store = FeatureStore(repo_path=".")
model = joblib.load("fraud_model.pkl")

class PredictionRequest(BaseModel):
    user_id: int
    transaction_amount: float

@app.post("/predict")
async def predict(request: PredictionRequest):
    # Fetch features from the online store
    try:
        features = store.get_online_features(
            features=['user_stats:total_purchases',
                      'user_stats:avg_purchase_value'],
            entity_rows=[{'user_id': request.user_id}],
        ).to_dict()
    except Exception as e:
        return {"error": f"Feature fetch failed: {e}"}

    # Build the feature vector (same order as during training)
    feature_vector = [
        features['total_purchases'][0],
        features['avg_purchase_value'][0],
        request.transaction_amount,
        # ... other features
    ]

    # Get prediction
    fraud_prob = model.predict_proba([feature_vector])[0][1]

    return {
        "user_id": request.user_id,
        "fraud_probability": float(fraud_prob),
        "recommended_action": "block" if fraud_prob > 0.8 else "allow",
    }

# curl -X POST http://localhost:8000/predict \
#   -H "Content-Type: application/json" \
#   -d '{"user_id": 123, "transaction_amount": 599.99}'
# drift_check.py -- data drift detection with a two-sample Kolmogorov-Smirnov test
from scipy import stats

def check_data_drift(
    historical_data,   # pandas DataFrame of training-time feature values
    recent_data,       # pandas DataFrame of recent production feature values
    feature_name,
    threshold=0.05,    # p-value threshold below which we flag drift
):
    """Detect whether a feature's distribution has changed significantly."""
# Kolmogorov-Smirnov test
statistic, p_value = stats.ks_2samp(
historical_data[feature_name],
recent_data[feature_name]
)
drifted = p_value < threshold
return {
"feature": feature_name,
"drifted": drifted,
"p_value": p_value,
"historical_mean": historical_data[feature_name].mean(),
"recent_mean": recent_data[feature_name].mean(),
}
# Monitor features
features = ['total_purchases', 'avg_purchase_value', 'days_since_purchase']
drifts = [
check_data_drift(training_data, production_data, f)
for f in features
]
for drift in drifts:
if drift['drifted']:
print(f"ALERT: {drift['feature']} drifted!")
print(f" Historical mean: {drift['historical_mean']:.2f}")
print(f" Recent mean: {drift['recent_mean']:.2f}")
When to Use / When Not to Use
Use batch (precomputed) predictions when:
- Customer segments for campaigns
- Recommendation lists for the homepage
- Churn risk scores for outreach
- No latency constraints
- Millions of entities must be processed efficiently
- Lower serving infrastructure cost matters
Use online (real-time) predictions when:
- Fraud detection at transaction time
- Personalized pricing
- Real-time recommendations
- Sub-100ms latency is required
- Entity access patterns are unpredictable
- The model requires current context
Patterns & Pitfalls
Design Review Checklist
- Defined entities (users, products, transactions) that features attach to
- Listed all features and their source data (offline vs online)
- Documented feature computation logic: is it exactly the same for training and serving?
- Chose a feature store (Feast, Tecton) or decided to build a custom one?
- Configured offline store: where do training features live? (DW, S3)
- Configured online store: where do inference features live? (Redis, DynamoDB)
- Set up materialization schedule: how often are online features refreshed?
- Planned model serving: batch, REST API, streaming?
- Configured monitoring: data drift detection, feature quality checks
- Tested failure scenarios: online store down, feature unavailable, staleness
Self-Check
- What's the difference between offline and online features? (Hint: batch vs real-time, accuracy vs latency)
- How do you prevent training-serving skew? (Hint: single source of truth, same logic for both)
- Why does online feature latency matter for predictions? (Hint: adds to total request latency)
- What is data drift and how do you detect it? (Hint: feature distribution changes, K-S test)
- When would you use batch predictions vs online? (Hint: latency requirements, entity count)
Next Steps
- Set up Feast locally: define features, train model, serve predictions
- Implement online store: materialize features to Redis, test retrieval latency
- Build monitoring: detect data drift, track prediction latency and accuracy
- Test skew scenarios: intentionally compute features differently, measure impact
- Design canary deployment: gradually route traffic to new model, monitor metrics
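As a starting point for the last item, a minimal sketch of canary routing: send a small, configurable fraction of requests to the new model and compare its metrics before promoting it (the fraction and names are illustrative):
# canary.py -- illustrative canary routing between two model versions
import random

CANARY_FRACTION = 0.05  # start small, increase as metrics hold up

def route_prediction(feature_vector, stable_model, canary_model):
    # Route a small fraction of traffic to the candidate model.
    use_canary = random.random() < CANARY_FRACTION
    model = canary_model if use_canary else stable_model
    prob = model.predict_proba([feature_vector])[0][1]
    # Record which version served the request so metrics can be compared.
    return {"fraud_probability": float(prob),
            "model_version": "canary" if use_canary else "stable"}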