From the repo root, run `python3 -m http.server 8080`, then open `/model-console/` in the browser.
// ML · Static evaluation
Model comparison
Metric glossary
Precision vs recall (test)
Up and to the right is better on both axes. Toggle models to focus the chart and table.
Tradeoff
At a fixed 0.5 probability threshold, improving recall usually comes at the cost of precision. PR-AUC summarizes ranking quality on imbalanced data; pair it with the confusion counts on each card.
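The threshold tradeoff above can be sketched in a few lines. This is a minimal illustration with made-up labels and scores, not the demo's data; the `precision_recall` helper is a name assumed here for the example.

```python
# Sketch: precision/recall at one probability threshold (toy data, not the demo's).
def precision_recall(y_true, y_score, threshold=0.5):
    """Count confusion outcomes at a single threshold, then derive the two metrics."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Lowering the threshold flags more URLs: recall rises, precision falls.
y_true  = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.6, 0.4, 0.45, 0.3, 0.1]
print(precision_recall(y_true, y_score, 0.5))   # stricter: fewer flags
print(precision_recall(y_true, y_score, 0.35))  # looser: higher recall, lower precision
```

PR-AUC sidesteps the choice of a single threshold by aggregating this tradeoff across all thresholds.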
// Metrics table
| Model | Precision | Recall | F1 | PR-AUC | ROC-AUC | Brier |
|---|---|---|---|---|---|---|
// When to pick each model
Limitations & honesty
// Technical summary
What this demo uses
This demo compares precomputed phishing-URL model results. It explains what each metric means in plain language so non-technical users can still understand model tradeoffs.
- Methodology: Models were trained and evaluated offline on a labeled URL dataset, then exported as static JSON. The page only visualizes saved results (no live training, no API calls).
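Since the page only reads saved results, each model's metrics can live in a small JSON record. The exact schema of the exported files isn't shown here; the field names and numbers below are a hypothetical example of what one record could look like.

```python
import json

# Hypothetical shape of one exported model record (the real schema may differ).
# The page would load files like this and render them; no live computation needed.
record = json.loads("""
{
  "model": "logistic_regression_tfidf",
  "precision": 0.94, "recall": 0.88, "f1": 0.91,
  "pr_auc": 0.95, "roc_auc": 0.97, "brier": 0.045,
  "confusion": {"tp": 880, "fp": 56, "fn": 120, "tn": 944}
}
""")
print(record["model"], record["f1"])
```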
Technical terms:
- TF-IDF: Turns text into weighted numbers so models can read patterns.
- PR-AUC: How well a model balances catching threats and avoiding false alarms.
- ROC-AUC: Overall ability to separate risky vs safe URLs.
- Brier score: How close confidence scores are to reality; lower is better.
- Tools used: JavaScript, Chart.js, static JSON, and offline Python/scikit-learn model evaluation.
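Of the terms above, the Brier score is the easiest to show concretely: it is the mean squared gap between each predicted probability and the true 0/1 label. A minimal pure-Python sketch, with illustrative labels and probabilities (not the demo's data):

```python
# Brier score: mean squared error between predicted probabilities and 0/1 labels.
def brier_score(y_true, y_prob):
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# A confident, correct model scores near 0; a hedging model scores higher.
sharp = brier_score([1, 0, 1, 0], [0.95, 0.05, 0.9, 0.1])
vague = brier_score([1, 0, 1, 0], [0.6, 0.4, 0.6, 0.4])
print(sharp, vague)  # lower is better
```

This is why the glossary says "lower is better": a model can have a decent ROC-AUC (good ranking) but a poor Brier score if its probabilities are miscalibrated.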