// Portfolio — Data tooling

ETL Playground

A browser-only CSV cleaning lab: deterministic steps, a quick quality report, and download. No backend, no uploads to any server.

Privacy: All parsing and cleaning run in your browser. Nothing is sent to a server. Do not paste sensitive or regulated personal data; use synthetic samples or non-sensitive test files.
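
As a rough illustration of what "parsing in your browser" can look like, the sketch below reads CSV text with Papa Parse and never touches the network. The function name `parseCsvText` and the choice to pass the Papa Parse object in as a parameter are assumptions made for this example, not necessarily how the demo is written; `header`, `skipEmptyLines`, and `dynamicTyping` are standard Papa Parse options.

```javascript
// Sketch: parse CSV text entirely on the client with Papa Parse.
// `papa` is the Papa Parse object (the global `Papa` when the library is
// loaded via a <script> tag). Passing it in is an assumption of this example.
function parseCsvText(text, papa) {
  const result = papa.parse(text, {
    header: true,          // first row becomes the object keys
    skipEmptyLines: true,  // ignore blank lines
    dynamicTyping: false,  // keep raw strings; typing is left to later steps
  });
  return {
    columns: result.meta.fields || [],
    rows: result.data,
    errors: result.errors,
  };
}
```

In a page, this could be called with the contents of a user-selected file, for example `parseCsvText(await file.text(), Papa)`, so the data stays in memory on the user's machine.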

Data in


Cleaning pipeline

Six steps run in order after parsing:

  • Trim whitespace and normalize null tokens.
  • Dedupe: drop identical rows, then rows that share a normalized email (only when an email column exists and the table is not event-log-shaped).
  • Infer column types.
  • Coerce dates and numbers.
  • Winsorize numeric columns.
  • Impute a missing letter+number id when there is a single gap, format money-like columns to two decimals (empty amounts become 0.00), then drop rows that still have no id and fill the remaining blanks with N/A.
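
The first two steps (trim/null tokens, then dedupe) could be sketched roughly as follows. Everything named here is hypothetical: the `NULL_TOKENS` list, the function names, and the choice to normalize emails by trimming and lowercasing are assumptions for illustration. The check for event-log-shaped tables is left out.

```javascript
// Hypothetical null tokens; the demo's actual list is not documented here.
const NULL_TOKENS = new Set(["", "na", "n/a", "null", "none", "nan"]);

// Step 1: trim every cell and map null-like tokens to null.
function trimAndNullify(rows) {
  return rows.map((row) => {
    const out = {};
    for (const [key, value] of Object.entries(row)) {
      const text = value == null ? "" : String(value).trim();
      out[key] = NULL_TOKENS.has(text.toLowerCase()) ? null : text;
    }
    return out;
  });
}

// Step 2a: drop rows identical to an earlier row (first occurrence wins).
function dropIdenticalRows(rows) {
  const seen = new Set();
  return rows.filter((row) => {
    const key = JSON.stringify(row);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Step 2b: drop rows whose normalized email repeats an earlier one.
// Rows without an email are always kept.
function dropDuplicateEmails(rows, emailColumn) {
  const seen = new Set();
  return rows.filter((row) => {
    const email = row[emailColumn];
    if (email == null) return true;
    const key = String(email).trim().toLowerCase();
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

Each function returns a new array and leaves its input untouched, which keeps the steps easy to reason about and repeatable.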

Load data to see pipeline steps.

Data quality report

No report yet.

Preview

Before (parsed)

After (cleaned)

// Technical summary

What this demo uses

JavaScript · Papa Parse · CSV · Client-side ETL

This demo cleans messy CSV files in the browser and shows exactly what changed, so users can trust the result without sending data to a server.

  • Methodology: A fixed ETL pipeline runs step by step (parse, clean, type handling, outlier control, missing-value fill). The same in-memory table powers reports, previews, and downloads.
  • Technical terms:
    • ETL: Extract, Transform, Load - collect data, clean it, and prepare it for use.
    • Winsorize: Clamps values beyond chosen percentile cutoffs to those cutoffs, so outliers have less influence and no rows are dropped.
    • Deterministic: Same input gives the same output every time.
  • Toolsets/technology used: JavaScript and Papa Parse for CSV processing; data is held in client-side browser storage/memory only.
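
To make the terms above concrete, here is a minimal sketch of a winsorize step and a fixed pipeline runner. The 5%/95% default cutoffs, the linear-interpolation percentile, and the names `percentile`, `winsorizeColumn`, and `runPipeline` are all assumptions for illustration; the demo's actual cutoffs and internals are not stated on this page.

```javascript
// Linear-interpolation percentile of an ascending-sorted array (p in [0, 1]).
function percentile(sorted, p) {
  const pos = (sorted.length - 1) * p;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Winsorize one numeric column: clamp values outside the percentile cutoffs.
// The 5%/95% defaults are an assumption; non-numeric cells are left untouched.
function winsorizeColumn(rows, column, lower = 0.05, upper = 0.95) {
  const values = rows
    .map((row) => row[column])
    .filter((v) => typeof v === "number" && Number.isFinite(v))
    .sort((a, b) => a - b);
  if (values.length === 0) return rows;
  const min = percentile(values, lower);
  const max = percentile(values, upper);
  return rows.map((row) => {
    const v = row[column];
    if (typeof v !== "number" || !Number.isFinite(v)) return row;
    return { ...row, [column]: Math.min(max, Math.max(min, v)) };
  });
}

// Run fixed steps in order and record how each one changed the row count.
// Each step is a pure function (rows in, new rows out), so the same input
// always yields the same output.
function runPipeline(rows, steps) {
  const log = [];
  let current = rows;
  for (const { name, apply } of steps) {
    const next = apply(current);
    log.push({ step: name, rowsBefore: current.length, rowsAfter: next.length });
    current = next;
  }
  return { rows: current, log };
}
```

Because every step returns a new array instead of editing rows in place, the parsed table can be kept for a "before" view while the pipeline output feeds the "after" view and the download.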