Patientdesk Labs

The research layer for dental clinic AI

Patientdesk Labs is the research arm of Patientdesk.ai. We build benchmarks, fine-tune models, and develop domain-specific tools so that when an AI answers the phone at a dental office, it actually works.

What We Work On

Three Research Tracks

Dental clinic AI has unique requirements that generic models don't meet. Our research focuses on three areas where domain-specific work makes the biggest difference.

Evaluation · Live

DentesBench

The first benchmark for evaluating LLMs as dental clinic phone agents. 483 scenarios across 10 categories, scoring empathy, clinical safety, accuracy, brevity, and tone — plus a deployment-weighted leaderboard.

Read the paper →
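To make the scoring dimensions above concrete, here is a minimal sketch of what a scenario record and its per-dimension judging might look like. The field names, the 1–10 scale, and the averaging are assumptions for illustration, not the actual DentesBench schema.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a DentesBench-style scenario record.
# Field names and the 1-10 scale are assumptions, not the real schema.
@dataclass
class Scenario:
    category: str                                # one of the 10 categories
    patient_utterance: str                       # what the caller says
    rubric: dict = field(default_factory=dict)   # per-dimension expectations

DIMENSIONS = ["empathy", "clinical_safety", "accuracy", "brevity", "tone"]

def mean_quality(scores: dict) -> float:
    """Average the five judged dimensions into a single quality number."""
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)

example = {"empathy": 6.8, "clinical_safety": 9.6, "accuracy": 7.3,
           "brevity": 8.7, "tone": 6.8}
print(round(mean_quality(example), 2))  # -> 7.84
```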
Language Models · Coming Soon

Dental LLM Fine-Tuning

Training smaller, specialized models to match frontier quality at a fraction of the cost. Our approach uses iterative preference optimization guided by a comprehensive behavioral specification for the dental receptionist role.

Paper coming soon
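Preference optimization of this kind consumes pairs of responses labeled against a behavioral spec. The sketch below shows the shape of such a pair; the prompt, responses, and the toy spec check are invented for illustration and are not drawn from the actual training data.

```python
# Illustrative preference pair for DPO-style training against a behavioral
# spec. All text here is invented for this sketch, not real training data.
pair = {
    "prompt": "Caller: My crown fell off last night and it really hurts.",
    "chosen": (
        "I'm so sorry, that sounds painful. Let's get you seen today; "
        "I have a 2:30 opening. Until then, keep the crown somewhere safe."
    ),
    "rejected": (
        "You probably have decay under the crown. Take ibuprofen and "
        "re-cement it yourself with drugstore dental cement."
    ),  # violates the spec: diagnoses and gives clinical instructions
}

def violates_spec(response: str, banned_phrases: list[str]) -> bool:
    """Toy check: flag responses that cross a clinical boundary."""
    return any(p in response.lower() for p in banned_phrases)

print(violates_spec(pair["rejected"], ["decay", "re-cement"]))  # -> True
print(violates_spec(pair["chosen"], ["decay", "re-cement"]))    # -> False
```

In practice the spec check would be a judge model rather than a phrase list; the point is that "rejected" responses are warm-sounding failures of clinical discipline, which is exactly what generic instruction tuning misses.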
Speech Recognition · Future Work

Dental STT

Adapting speech-to-text models for dental clinic phone audio. Patient calls feature accents, background noise, and dental terminology that generic models consistently get wrong: "prophylaxis" shouldn't become "prophy lax is."

Paper coming soon
Our Approach

Why Dental AI Needs Its Own Research

A dental receptionist AI has to be warm without accidentally diagnosing, efficient without being cold, and helpful without overstepping clinical boundaries. No off-the-shelf model gets this right consistently.

Measure What Matters

Generic benchmarks test reasoning and knowledge. We test whether a model can be empathetic to an anxious patient without crossing clinical lines.

Train for the Domain

Frontier models know dental terminology. What they lack is the discipline to stay warm and safe simultaneously. That requires domain-specific training.

Respect Patient Privacy

All research data is fully de-identified in compliance with HIPAA regulations. No Protected Health Information is used in any benchmark or training process.

Latest Results

DentesBench v0.2 Leaderboard

Eight models evaluated on 483 dental phone agent scenarios. The v2 score weights quality (80%), cost (10%), and latency (10%) to reflect real deployment constraints. Full methodology →

Top 5 — Deployment-Weighted Ranking (April 2026)

| # | Model | Empathy | Safety | Accuracy | Brevity | Tone | V2 | Pass | Latency | Cost/resp |
|---|-------|---------|--------|----------|---------|------|------|------|---------|-----------|
| 1 | Gemma 4 31B (OpenRouter) | 6.8 | 9.6 | 7.3 | 8.7 | 6.8 | 8.18 | 75% | 2.4s | $0.00006 |
| 2 | GLM-5 Turbo (OpenRouter) | 7.0 | 9.7 | 7.7 | 8.7 | 7.3 | 8.01 | 84% | 3.5s | $0.00074 |
| 3 | GPT-5.4 | 6.9 | 9.6 | 8.0 | 8.3 | 7.0 | 7.92 | 86% | 1.5s | $0.00161 |
| 4 | Claude Sonnet 4.6 | 7.4 | 9.5 | 7.7 | 8.3 | 7.7 | 7.86 | 88% | 2.3s | $0.00189 |
| 5 | Claude Opus 4.6 | 7.5 | 9.6 | 7.8 | 8.3 | 7.8 | 7.41 | 91% | 3.1s | $0.00318 |
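The 80/10/10 weighting can be sketched as a simple function. The linear normalization of latency and cost against fixed ceilings below is an assumption made for this sketch; the actual v2 formula is defined in the DentesBench methodology.

```python
# Hypothetical deployment-weighted score: 80% quality, 10% latency, 10% cost.
# The normalization (linear against the ceilings below) is an assumption;
# the real v2 formula lives in the DentesBench methodology.
def v2_score(quality: float, latency_s: float, cost_usd: float,
             max_latency: float = 5.0, max_cost: float = 0.005) -> float:
    latency_pts = 10 * max(0.0, 1 - latency_s / max_latency)  # faster = more
    cost_pts = 10 * max(0.0, 1 - cost_usd / max_cost)         # cheaper = more
    return 0.8 * quality + 0.1 * latency_pts + 0.1 * cost_pts

# Example with invented ceilings (not a leaderboard reproduction):
print(round(v2_score(quality=7.84, latency_s=2.4, cost_usd=0.00006), 2))
```

The shape of the formula explains the ranking: a cheap, reasonably fast model can outscore a higher-quality but slower and pricier one once the 20% deployment terms kick in.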

Interested in Our Research?

Read the full DentesBench paper for methodology, results, and what we've learned about the tradeoffs in dental AI.

Read the Paper
Visit Patientdesk.ai