Patientdesk Labs is the research arm of Patientdesk.ai. We build benchmarks, fine-tune models, and develop domain-specific tools for dental clinic AI — so that when an AI answers the phone at a dental office, it actually works.
The first benchmark for evaluating LLMs as dental clinic phone agents. 512 scenarios across 10 categories, scored on empathy, clinical safety, accuracy, brevity, tone — plus cost and latency for production viability.
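To make the scoring concrete, here is a minimal sketch of what one benchmark record and one per-call result might look like. The field names and the helper below are illustrative assumptions, not the published schema; only the five quality dimensions, the 10 categories, the cost/latency tracking, and the 8/10 bar come from the benchmark description itself.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: field names are assumptions, not the published schema.
DIMENSIONS = ["empathy", "clinical_safety", "accuracy", "brevity", "tone"]

@dataclass
class Scenario:
    scenario_id: str
    category: str            # one of the 10 scenario categories
    caller_turns: list[str]  # the simulated patient's side of the call

@dataclass
class CallResult:
    scenario_id: str
    model: str
    scores: dict[str, float] = field(default_factory=dict)  # 0-10 per dimension
    cost_usd: float = 0.0    # measured API cost for the call
    latency_ms: float = 0.0  # time to first model response

def clears_bar(result: CallResult, threshold: float = 8.0) -> bool:
    """A model 'clears' a scenario only if every quality dimension meets the threshold."""
    return all(result.scores.get(d, 0.0) >= threshold for d in DIMENSIONS)
```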
Read the paper →

Fine-tuning Gemma 4 31B with a soul-document-driven self-training loop. Opus 4.6 judges candidate responses against our character spec, generates preference pairs, and the model iteratively improves via DPO — targeting dental-domain conversation quality.
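A minimal sketch of the judge-driven preference step, assuming placeholder `generate` and `judge_score` callables for the policy model and the Opus judge; the actual soul document, judge rubric, and DPO training configuration are not shown. The output records use the prompt/chosen/rejected format that DPO trainers such as trl's `DPOTrainer` consume.

```python
from typing import Callable, Sequence

def build_preference_pairs(
    prompts: Sequence[str],
    generate: Callable[[str], str],            # sample one reply from the current policy
    judge_score: Callable[[str, str], float],  # judge scores (prompt, reply) against the character spec
    n_candidates: int = 4,
) -> list[dict[str, str]]:
    """One round of self-training data construction (illustrative, not the exact pipeline).

    For each prompt, sample several candidates, let the judge rank them against
    the soul document, and keep the best/worst pair for the next DPO update.
    """
    pairs = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(n_candidates)]
        ranked = sorted(candidates, key=lambda reply: judge_score(prompt, reply), reverse=True)
        pairs.append({"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]})
    return pairs
```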
Paper coming soon

Fine-tuning Whisper for dental clinic phone audio. Real patient calls with accents, background noise, and dental terminology that generic STT models consistently get wrong — "prophylaxis" shouldn't become "prophy lax is."
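For context, this is roughly how a dental-domain Whisper fine-tune would be used at inference time via the Hugging Face pipeline API; the checkpoint name below is hypothetical, not a released model.

```python
from transformers import pipeline

# Hypothetical checkpoint name, shown only to illustrate inference with a
# dental-domain Whisper fine-tune; a generic Whisper checkpoint loads the same way.
asr = pipeline(
    "automatic-speech-recognition",
    model="patientdesk/whisper-large-v3-dental",
    chunk_length_s=30,  # phone calls longer than 30s are transcribed in chunks
)

transcript = asr("incoming_call.wav")
print(transcript["text"])  # goal: "prophylaxis" stays one clinical term
```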
Paper coming soon

Frontier models evaluated on dental phone agent scenarios. Scored on quality dimensions by an LLM judge, with real-world cost and latency for production context.
Opus scores highest but costs 50x more per call than Flash and takes 7x longer to respond. In a real-time phone call, a 3-second response delay is unacceptable. Flash responds in 400ms but fails 100% of scenarios. Sonnet hits the sweet spot — near-Opus quality at 1.5s latency and $0.03/call — but even then, the style-vs-safety tradeoff means no model clears 8/10 on all dimensions. Read more →
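As a back-of-the-envelope illustration of the production filter these numbers imply: the latency and cost figures below reuse the values quoted above where stated (Flash at 400 ms, Sonnet at 1.5 s and $0.03/call, Opus at 7x Flash latency and 50x Flash cost); the Flash per-call cost and both viability thresholds are assumptions.

```python
# Figures marked "stated" come from the results above; everything else is an assumption.
FLASH_COST = 0.001  # assumed $/call for Flash, used only to anchor the stated 50x multiplier

models = {
    "flash":  {"latency_s": 0.4,     "cost_usd": FLASH_COST},       # stated 400 ms
    "sonnet": {"latency_s": 1.5,     "cost_usd": 0.03},             # stated
    "opus":   {"latency_s": 0.4 * 7, "cost_usd": FLASH_COST * 50},  # stated multipliers vs. Flash
}

MAX_LATENCY_S = 2.0  # assumption: longer than this and the line feels dead to a caller
MAX_COST_USD = 0.05  # assumption: per-call budget at clinic call volumes

for name, m in models.items():
    viable = m["latency_s"] <= MAX_LATENCY_S and m["cost_usd"] <= MAX_COST_USD
    print(f"{name:6s} {m['latency_s']:.1f}s ${m['cost_usd']:.3f}/call production-viable={viable}")

# Note: "viable" here covers only speed and cost; Flash passes both yet still
# fails the quality bar, which is the point of the tradeoff above.
```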
Every scenario captures a real dental clinic interaction, and the mix is skewed toward the harder categories where model failures are most consequential.
Methodology, the style-vs-safety tradeoff, cost/latency analysis, and what we learned about dental AI.
Read the Paper