From the repo root, run `python3 -m http.server 8080`, then open `/model-console/` in the browser.
// ML · Static evaluation
Model comparison
Metric glossary
Precision vs recall (test)
Up and to the right is better on both axes. Toggle models to focus the chart and table.
Tradeoff
At a fixed 0.5 probability threshold, improving recall usually comes at the cost of precision. PR-AUC summarizes ranking quality on imbalanced data; pair it with the confusion counts on each card.
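The threshold tradeoff above can be sketched in a few lines. This is a minimal illustration with made-up labels and scores, not the demo's data; the `precision_recall` helper is a name assumed here for the example.

```python
# Sketch: precision/recall at one probability threshold (toy data, not the demo's).
def precision_recall(y_true, y_score, threshold=0.5):
    """Count confusion outcomes at a single threshold, then derive the two metrics."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Lowering the threshold flags more URLs: recall rises, precision falls.
y_true  = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.6, 0.4, 0.45, 0.3, 0.1]
print(precision_recall(y_true, y_score, 0.5))   # stricter: fewer flags
print(precision_recall(y_true, y_score, 0.35))  # looser: higher recall, lower precision
```

PR-AUC sidesteps the choice of a single threshold by aggregating this tradeoff across all thresholds.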
// Metrics table
| Model | Precision | Recall | F1 | PR-AUC | ROC-AUC | Brier |
|---|---|---|---|---|---|---|
// When to pick each model
Limitations & honesty
// Technical summary
What this demo uses
This demo compares precomputed phishing-URL model results. It explains what each metric means in plain language so non-technical users can still understand model tradeoffs.
- Methodology: Models were trained and evaluated offline on a labeled URL dataset, then exported as static JSON. The page only visualizes saved results (no live training, no API calls).
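Since the page only reads saved results, each model's metrics can live in a small JSON record. The exact schema of the exported files isn't shown here; the field names and numbers below are a hypothetical example of what one record could look like.

```python
import json

# Hypothetical shape of one exported model record (the real schema may differ).
# The page would load files like this and render them; no live computation needed.
record = json.loads("""
{
  "model": "logistic_regression_tfidf",
  "precision": 0.94, "recall": 0.88, "f1": 0.91,
  "pr_auc": 0.95, "roc_auc": 0.97, "brier": 0.045,
  "confusion": {"tp": 880, "fp": 56, "fn": 120, "tn": 944}
}
""")
print(record["model"], record["f1"])
```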
Technical terms:
- TF-IDF: Turns text into weighted numbers so models can read patterns.
- PR-AUC: How well a model balances catching threats and avoiding false alarms.
- ROC-AUC: Overall ability to separate risky vs safe URLs.
- Brier score: How close confidence scores are to reality; lower is better.
- Tools used: JavaScript, Chart.js, static JSON, and offline Python/scikit-learn model evaluation.
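Of the terms above, the Brier score is the easiest to show concretely: it is the mean squared gap between each predicted probability and the true 0/1 label. A minimal pure-Python sketch, with illustrative labels and probabilities (not the demo's data):

```python
# Brier score: mean squared error between predicted probabilities and 0/1 labels.
def brier_score(y_true, y_prob):
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# A confident, correct model scores near 0; a hedging model scores higher.
sharp = brier_score([1, 0, 1, 0], [0.95, 0.05, 0.9, 0.1])
vague = brier_score([1, 0, 1, 0], [0.6, 0.4, 0.6, 0.4])
print(sharp, vague)  # lower is better
```

This is why the glossary says "lower is better": a model can have a decent ROC-AUC (good ranking) but a poor Brier score if its probabilities are miscalibrated.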