Recommend

Which setup should I run?

Next action

Loading workspace state

InferGrade is checking account, runner, and evidence readiness.

Setup

Get ready to run

Sign in, pair one runner, confirm readiness, then queue a short decision run.

Overview

Find the setup to run next, then inspect the evidence behind it.

Start with Recommend for an answer-first setup choice. Use Explore to inspect public evidence, or Build to queue a benchmark, when you already know what you want to examine.

How evidence works
Evidence and setup status

Checking evidence and runner readiness.
Active runs: syncing
Verified results: usable in decisions
Open blockers: checking
Sign in: Account attached
Pair a runner: Local execution ready
Choose evidence: Recommendation ready
Run or compare: Next action

Recommend

Which setup should I run?

Recent runs

Tracked execution

More tools

Exports and community

Open exports and contributor activity

Download evidence snapshots or inspect community activity.

Top contributors

Community evidence stays cumulative and exportable.

Recommendations

Which setup should I run?

Why this answer?

Tradeoffs ready. Open for the plot, table, caveats, and next benchmark to run.
Question and filters

Start with known-good questions, then make light scope edits.
Advanced filters
Download data

Explore

Inspect families, setup matches, and evidence

Historical Results

Recent benchmark evidence

Columns: Model, Backend, Use Case, TTFT (time to first token), Tok/s (tokens per second), Hardware, Capability, Verification

Compare

Choose between families, variants, and quants

Preset views

Start from a useful model-choice stance, then refine the exact variants or inspect individual runs.

Individual run comparison

Result

Shareable proof artifact

Family Explorer

Branches, quants, and nearby matches

Download data

Build

Queue a benchmark run

Pick a model, choose the evidence you need, then queue the run.

Why run this benchmark

Run the benchmark that would change the answer.

Best path: start from a recommendation so Build already knows the setup and the evidence gap. If you are starting from scratch, choose a model below.

1. Model, 2. Benchmarks, 3. Queue
Model

Choose the model first. The goal filter only narrows suggested starters and benchmark hints.

Use public artifacts without connecting Hugging Face.

Benchmark scope

Choose the evidence this run should produce.

Benchmark groups

Adjust related checks together.

Individual checks

Exact checks for this run.

Run details

Optional context for history.

Advanced overrides
Artifact and runtime

Only adjust these if you need an override.

Ontology hints

Most users should keep the inferred values.

Run plan

Ready to queue after preparation.

No run plan prepared yet.

Run Status

Active and recent runs

Recent runs

Pick a run to inspect its current stage and progress.

Live timeline

Use the timeline to understand what changed and why.

Saved plans

Reusable runs

My Runs

Contributor activity