Feature Store Mini
Production-minded pattern: one registry for feature meaning, one build path, structured validation, and the same transforms whether you run a CLI job or POST a CSV to FastAPI.
The CLI and the API share one code path, and errors carry stable codes (e.g. for missing raw columns). The live service is documented through OpenAPI.
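One way to keep error codes stable across both entry points is to raise an exception that carries a machine-readable code and serialises to one payload shape for CLI logs and HTTP responses. The sketch below assumes that design; `FeatureBuildError`, `require_columns`, and the `MISSING_RAW_COLUMNS` code string are hypothetical names, not the repo's actual API.

```python
import json


class FeatureBuildError(Exception):
    """Hypothetical error type carrying a stable, machine-readable code."""

    def __init__(self, code, message, details=None):
        super().__init__(message)
        self.code = code
        self.message = message
        self.details = details or {}

    def to_payload(self):
        # One shape for both a CLI log line and an HTTP error body.
        return {"error": {"code": self.code, "message": self.message, "details": self.details}}


def require_columns(present, required):
    """Raise MISSING_RAW_COLUMNS (hypothetical code) if any required raw column is absent."""
    missing = sorted(set(required) - set(present))
    if missing:
        raise FeatureBuildError(
            "MISSING_RAW_COLUMNS",
            f"{len(missing)} required raw column(s) missing",
            {"missing": missing},
        )


if __name__ == "__main__":
    try:
        require_columns(["customer_id", "tenure"], ["customer_id", "tenure", "monthly_charges"])
    except FeatureBuildError as err:
        print(json.dumps(err.to_payload()))
```

Because the code lives on the exception rather than in the message text, the API layer can map the same payload onto an HTTP error response (for example a 422), and callers see the identical code whether they ran the CLI or uploaded a file.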
Limitations & scope
- Not an online feature store: no point-in-time joins, no real-time serving layer, no Feast/Tecton-scale materialization.
- Scope is a single-repo pattern for consistent offline features and reviewable validation, not model training, training orchestration, or model hosting.
- The demo raw data is a synthetic, Telco-style tabular CSV; a production deployment would apply the same pattern to your own column contract.
Overview
The repo implements a single build entrypoint that reads a raw CSV (synthetic sample included), applies registered column transforms, and writes a feature table. A compact validation pass runs on the result so duplicates, schema gaps, and coarse sanity issues surface as structured summaries—suitable for logs or JSON responses.
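The validation pass can be sketched as a function that collects problems into a JSON-serialisable dict instead of failing on the first one. This assumes feature rows are plain dicts; `validate_feature_rows`, the summary keys, and both sanity rules are illustrative assumptions rather than the repo's actual checks.

```python
from collections import Counter

# Hypothetical sanity rule: the only bands a correct build should emit.
ALLOWED_BANDS = {"Low", "Medium", "High"}


def validate_feature_rows(rows, expected_columns, id_column="customer_id"):
    """Summarise problems in a built feature table as a JSON-serialisable dict."""
    # Schema gaps: expected columns absent from at least one row.
    missing_columns = sorted(
        col for col in expected_columns if any(col not in row for row in rows)
    )

    # Duplicates: entity ids that appear more than once.
    id_counts = Counter(row.get(id_column) for row in rows)
    duplicate_ids = sorted(str(key) for key, n in id_counts.items() if n > 1)

    # Coarse sanity checks (illustrative rules only).
    sanity_issues = []
    for index, row in enumerate(rows):
        ratio = row.get("charge_per_tenure")
        if ratio is not None and ratio < 0:
            sanity_issues.append(
                {"row": index, "column": "charge_per_tenure", "issue": "negative value"}
            )
        band = row.get("monthly_charge_band")
        if band is not None and band not in ALLOWED_BANDS:
            sanity_issues.append(
                {"row": index, "column": "monthly_charge_band", "issue": f"unexpected band {band!r}"}
            )

    return {
        "row_count": len(rows),
        "missing_columns": missing_columns,
        "duplicate_ids": duplicate_ids,
        "sanity_issues": sanity_issues,
        "ok": not (missing_columns or duplicate_ids or sanity_issues),
    }
```

Returning a summary rather than raising lets the CLI log every issue in one run and lets the API return the same dict as part of its JSON response.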
Workflow
Transforms and validation code live under src/features and src/validation. The CLI entry is python -m src.pipeline.build_feature_table; HTTP callers use POST /demo/transform with the same engine.
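Sharing one engine mostly means both entry points funnel into the same CSV-in, CSV-out function. In this sketch `transform_csv_text`, `transform_rows`, and the `"v1"` version string are hypothetical, and `transform_rows` stands in for the registered transforms.

```python
import argparse
import csv
import io

FEATURE_VERSION = "v1"  # hypothetical version string


def transform_rows(raw_rows):
    """Stand-in for the registered transforms; both entry points call only this."""
    return [
        {"customer_id": r["customer_id"], "feature_version": FEATURE_VERSION}
        for r in raw_rows
    ]


def transform_csv_text(text):
    """CSV text in, CSV text out: usable for a CLI file read or an HTTP upload body."""
    rows = transform_rows(list(csv.DictReader(io.StringIO(text))))
    out = io.StringIO()
    if rows:
        writer = csv.DictWriter(out, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return out.getvalue()


def main(argv=None):
    """CLI wrapper: read a raw CSV from disk, write the feature CSV to disk."""
    parser = argparse.ArgumentParser(description="Build the feature table from a raw CSV.")
    parser.add_argument("input", help="path to the raw CSV")
    parser.add_argument("output", help="path for the feature CSV")
    args = parser.parse_args(argv)
    with open(args.input, newline="", encoding="utf-8") as f:
        result = transform_csv_text(f.read())
    with open(args.output, "w", newline="", encoding="utf-8") as f:
        f.write(result)
    return 0
```

Adding an `if __name__ == "__main__": sys.exit(main())` guard would make this runnable with `python -m`, and an upload handler would decode the request bytes and call `transform_csv_text` directly, so the CLI and HTTP paths cannot drift apart.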
Engineered outputs
Eight derived columns plus customer_id (names in code: definitions.py):
- num_active_services: count of active service flags
- is_long_term_contract: contract term bucket
- monthly_charge_band: Low / Medium / High from monthly charge
- charge_per_tenure: ratio with safe handling when tenure is zero
- has_tech_support, is_fiber_user, has_streaming_bundle: boolean signals
- feature_version: constant version string for the build
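A registry that maps each output column to one function is a simple way to keep feature meaning in a single place. The sketch below is modelled on the public Telco churn dataset; the raw column names (`monthly_charges`, `contract`, `internet_service`, the service flags), the Low/Medium/High cut points, the zero-tenure rule, and the `"v1"` version string are all assumptions, not the contents of definitions.py.

```python
FEATURE_VERSION = "v1"  # hypothetical; the real constant lives in the repo

# Hypothetical raw service flags, modelled on the public Telco churn columns.
SERVICE_FLAGS = [
    "phone_service", "online_security", "online_backup",
    "device_protection", "tech_support", "streaming_tv", "streaming_movies",
]


def _yes(value):
    return str(value).strip().lower() == "yes"


def num_active_services(raw):
    return sum(_yes(raw.get(col)) for col in SERVICE_FLAGS)


def is_long_term_contract(raw):
    # Assumption: one- and two-year terms count as long-term.
    return raw["contract"] in {"One year", "Two year"}


def monthly_charge_band(raw, low=35.0, high=70.0):
    # Hypothetical cut points; the repo defines its own thresholds.
    charge = float(raw["monthly_charges"])
    if charge < low:
        return "Low"
    if charge < high:
        return "Medium"
    return "High"


def charge_per_tenure(raw):
    # Assumption: monthly charge divided by tenure in months; tenure 0 is
    # treated as 1 so new customers get a finite value instead of an error.
    tenure = int(raw["tenure"])
    return float(raw["monthly_charges"]) / max(tenure, 1)


FEATURE_REGISTRY = {
    "num_active_services": num_active_services,
    "is_long_term_contract": is_long_term_contract,
    "monthly_charge_band": monthly_charge_band,
    "charge_per_tenure": charge_per_tenure,
    "has_tech_support": lambda raw: _yes(raw.get("tech_support")),
    "is_fiber_user": lambda raw: raw.get("internet_service") == "Fiber optic",
    "has_streaming_bundle": lambda raw: _yes(raw.get("streaming_tv")) and _yes(raw.get("streaming_movies")),
    "feature_version": lambda raw: FEATURE_VERSION,
}


def build_features(raw):
    """Apply every registered transform to one raw row."""
    row = {"customer_id": raw["customer_id"]}
    for name, fn in FEATURE_REGISTRY.items():
        row[name] = fn(raw)
    return row
```

Because `build_features` only iterates the registry, adding a feature means adding one entry, and every caller of the build path picks it up at once.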
Tech stack
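Python, FastAPI (with its generated OpenAPI docs), and Uvicorn. There is no Streamlit dashboard in the repo; interaction is through the CLI, an HTTP upload to /demo/transform, or the Swagger UI at /docs.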
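For scripted calls to /demo/transform, a stdlib-only client has to encode the CSV as multipart/form-data itself. The form field name `file` is an assumption (check the endpoint's schema under /docs), and `build_multipart` and `post_csv` are hypothetical helpers.

```python
import urllib.request
import uuid


def build_multipart(field_name, filename, content, content_type="text/csv"):
    """Encode a single file part as multipart/form-data; returns (body, header value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def post_csv(base_url, csv_bytes, field_name="file"):
    """POST raw CSV bytes to /demo/transform (field name is an assumption)."""
    body, content_type = build_multipart(field_name, "raw.csv", csv_bytes)
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/demo/transform",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status, response.read()
```

A call would look like `post_csv("https://features.vahdetkaratas.com", open("raw.csv", "rb").read())`; the response body is returned as raw bytes because its exact shape is documented by the service's OpenAPI schema, not here.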
Deployment / live interface
Canonical service: features.vahdetkaratas.com. The app serves GET / when a static UI bundle is deployed under layout-shell/; the JSON endpoints and /docs do not depend on that UI. Typical VPS path: install dependencies, run Uvicorn against src.api.main:app, and terminate TLS at a reverse proxy in front of it. Docker is not defined in-repo; if you add an image later, a thin one wrapping the same Uvicorn command is enough.
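The VPS path can be scripted with a small launcher. Binding Uvicorn to loopback keeps the app reachable only through the TLS-terminating reverse proxy, and `--proxy-headers` (which Uvicorn already enables by default; it is spelled out here for clarity) lets the app read the original scheme and client address from the X-Forwarded-* headers that proxy sends. The helper names, host, and port are assumptions.

```python
import shlex
import subprocess


def uvicorn_argv(app="src.api.main:app", host="127.0.0.1", port=8000):
    """Hypothetical helper: Uvicorn command bound to loopback behind a reverse proxy."""
    return [
        "uvicorn", app,
        "--host", host,
        "--port", str(port),
        # Trust X-Forwarded-* headers from the local proxy.
        "--proxy-headers",
    ]


def describe(**kwargs):
    """Shell-quoted command line, e.g. for a systemd ExecStart or a Dockerfile CMD."""
    return shlex.join(uvicorn_argv(**kwargs))


def run(**kwargs):
    """Start the server in the foreground; raises if Uvicorn exits non-zero."""
    return subprocess.run(uvicorn_argv(**kwargs), check=True)
```

The reverse proxy then forwards public HTTPS traffic to 127.0.0.1:8000, so the same command works unchanged whether it runs directly on the VPS or inside a thin container.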
Why this project
It demonstrates where features come from and how to keep batch scoring or retraining aligned with the same transforms—without claiming a full hosted feature platform. Good fit for code review: small surface, explicit contracts, tests, and an honest limitations block.