Case Study — Product Data Scientist Portfolio
Data products fail when infrastructure does not scale with organizational growth. Startups, growth-stage companies, and enterprises need different architectures.
Built a simulator that dynamically reconfigures the data architecture (storage, processing, modeling, deployment, monitoring) based on organizational scale, and outputs estimated latency, monthly cost, reliability, and complexity.
Three presets (Startup, Growth, Enterprise) with layered architecture diagram and key metrics. Demonstrates evolution from Postgres + cron to Kafka + K8s.
Choose an architecture tier based on traffic volume, experiment velocity, and reliability requirements.
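The tier-selection logic can be sketched in Python. Everything here is illustrative: the thresholds, preset metric values, and function names are assumptions for demonstration, not figures from the actual simulator.

```python
from dataclasses import dataclass

@dataclass
class TierMetrics:
    stack: str             # representative storage/processing choices
    latency_ms: int        # end-to-end pipeline latency estimate
    monthly_cost_usd: int
    reliability: float     # fraction of successful runs
    complexity: int        # 1 (simple) to 5 (complex)

# Illustrative preset values; the real simulator's numbers may differ.
PRESETS = {
    "startup":    TierMetrics("Postgres + cron",     5000,   500, 0.95,  1),
    "growth":     TierMetrics("Warehouse + Airflow", 1000,  5000, 0.99,  3),
    "enterprise": TierMetrics("Kafka + K8s",          100, 50000, 0.999, 5),
}

def choose_tier(daily_events: int, experiments_per_week: int,
                target_reliability: float) -> str:
    """Pick the simplest tier that satisfies all three requirements."""
    if daily_events > 10_000_000 or target_reliability > 0.99:
        return "enterprise"
    if daily_events > 100_000 or experiments_per_week > 5:
        return "growth"
    return "startup"

print(choose_tier(50_000, 2, 0.95))      # small workload -> startup
print(choose_tier(1_000_000, 10, 0.99))  # higher scale/velocity -> growth
```

The point of the sketch is that tier choice is a pure function of a few observable inputs, which is what makes the simulator's reconfiguration deterministic and testable.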
| Time Horizon | Milestone | Success Criteria |
|---|---|---|
| 0–30 days | Current state documented | Inventory of pipelines, tables, and SLAs; pain points identified |
| 30–60 days | Target architecture designed | Tool choices justified; cost and latency projections |
| 60–90 days | Migration roadmap | Phased rollout plan; success metrics per phase |
Success: Pipeline latency within SLA; data freshness meets product needs; cost within budget; zero data loss incidents.
Tripwire (anti-success): Repeated pipeline failures; model staleness; runaway cloud costs.
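The tripwire conditions above can be made mechanical. A minimal sketch, assuming a 10% failure-rate threshold and a budget cap (both hypothetical parameters):

```python
def tripwire(recent_statuses: list[str], monthly_cost_usd: float,
             budget_usd: float, max_failure_rate: float = 0.1) -> list[str]:
    """Return the list of tripped anti-success conditions."""
    tripped = []
    failures = sum(1 for s in recent_statuses if s != "success")
    if recent_statuses and failures / len(recent_statuses) > max_failure_rate:
        tripped.append("repeated pipeline failures")
    if monthly_cost_usd > budget_usd:
        tripped.append("runaway cloud costs")
    return tripped

# 2 failures in 10 runs (20%) plus cost over budget trips both conditions.
print(tripwire(["success"] * 8 + ["failure"] * 2, 6200.0, 5000.0))
```

Encoding tripwires as code lets them run on a schedule rather than relying on someone noticing a dashboard.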
| Event Name | Required Properties | Notes |
|---|---|---|
| pipeline_run_start | pipeline_id, run_id, timestamp | Orchestration trigger |
| pipeline_run_end | pipeline_id, run_id, status, duration_sec | Success/failure |
| data_freshness | table_name, last_updated, expected_freshness_hours | SLA monitoring |
Staging: One stg_* model per raw source table (APIs, DBs, events), applying deduplication and type casting.
Marts: fct_* and dim_* for business logic. Incremental models where appropriate.
Tests: dbt unique, not_null, relationships, and accepted_values tests. Freshness checks on critical sources.
Documentation: dbt docs; lineage; column descriptions; README per model.
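The test and freshness conventions above map directly onto a dbt `schema.yml`. A minimal sketch; the model, source, and column names (`fct_orders`, `dim_customers`, `raw.orders`, `_loaded_at`) are hypothetical, but the test types match those named above:

```yaml
version: 2
models:
  - name: fct_orders
    description: "One row per order."
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
sources:
  - name: raw
    tables:
      - name: orders
        loaded_at_field: _loaded_at
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
```

`dbt test` runs the column tests and `dbt source freshness` enforces the source SLA, so both feed the same monitoring the event schema describes.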