AI Knows the Answer. The Problem Is Your Question.
LLMs can be highly accurate in isolation, but accuracy can collapse when real people use them.
Key Takeaways
In a randomized study of 1,298 participants, LLMs identified the correct condition 94.9% of the time when tested alone, yet people using the same LLMs identified the relevant condition in fewer than 34.5% of cases—no better than a control group using any source they wanted.
The main failure isn’t always the model’s medical knowledge—it’s the interaction layer. Missing context, vague prompts, and unstructured symptom descriptions can steer even strong models toward the wrong framing and the wrong next steps.
The challenge: Patients don't know how to prompt AI effectively. Generic AI tools return generic answers — because they have no idea what happened in the exam room.
The solution: Neatly already holds the salient context from the visit. The patient doesn't need to know what to include in a prompt — Neatly does.
How it works: Neatly automatically embeds the relevant clinical context into every patient query before it reaches the LLM, producing answers grounded in the patient's actual care plan. Critically, Neatly also identifies what wasn't discussed at the appointment — and understands how to incorporate that missing context into the prompt to further improve the accuracy and relevance of results.
When someone feels unwell—or gets a new diagnosis—the most natural next step is to search. Today, “search” increasingly means asking an AI assistant.
That seems reasonable: large language models (LLMs) can pass medical exams and produce fluent explanations. But fluency is not the same thing as reliability.
A Nature Medicine study put this to the test in a way that matters for real-world care: not just whether an LLM “knows medicine,” but whether people can actually use an LLM to make sense of symptoms and decide what to do next. The results highlight an uncomfortable truth: even good models can produce poor outcomes when patients have to supply the right context on their own.[1]
The “35% problem”: when patient-mediated AI goes off track
In the randomized preregistered study (1,298 participants, ten medical scenarios), researchers compared different approaches to getting medical guidance:
LLMs tested alone were strong: they correctly identified underlying conditions in 94.9% of cases and selected the correct care “disposition” (the appropriate level of care to seek next) 56.3% of the time on average.[1]
Humans using the same LLMs performed dramatically worse: participants identified relevant conditions in fewer than 34.5% of cases and the correct disposition in fewer than 44.2% of cases, no better than the control group.[1]
That ~35% figure lands like a gut punch because it reframes the question. It’s not “Are LLMs smart?” It’s: “Can patients reliably translate their real symptoms and medical history into a question that elicits the right answer?”
Why AI health answers fail in the real world (even with a good model)
The study’s results align with what clinicians and patients already know from experience: the hardest part of healthcare is often not the final recommendation—it’s the inputs.
Here are common pitfalls when patients use LLMs to research health conditions, and what tends to go wrong:
1) Missing clinical context (the most common failure mode)
A patient question often leaves out the very details that determine what’s risky vs. routine:
timeline (hours vs. weeks)
severity and progression
medications (including recent changes)
comorbidities
pregnancy status
recent procedures, infections, travel, exposures
red flags (e.g., chest pain + shortness of breath)
An LLM can’t infer what it’s not told. And if the user doesn’t know what matters, the prompt can unintentionally steer the model into the wrong clinical frame.
2) Vague symptom language that’s normal for humans—but hard for triage
Patients describe experience (“I feel off,” “my heart feels weird,” “bad stomach”) rather than clinical features (onset, localization, duration, associated symptoms, triggers). That’s not a personal failure—it’s a translation problem.
The downside: an LLM may provide content that sounds helpful but doesn’t resolve the key clinical ambiguity.
3) The “confident wrong answer” problem
When an LLM is wrong, it often doesn’t sound wrong.
In health, the cost of “confidently wrong” can be high in either direction:
under-triage (false reassurance that delays or prevents needed care)
over-triage (unnecessary panic and avoidable ED visits)
The Nature Medicine study explicitly evaluated “disposition” accuracy and reported that human–LLM teams did not outperform the control group.[1]
4) No persistent memory of what already happened in the clinic
The LLM chat starts from scratch unless the patient supplies all relevant information every time.
But in reality, the most valuable context is often:
what the clinician already ruled out
what the clinician thought was most likely
why a medication was chosen (and what side effects to watch for)
what “return precautions” were given
Patients frequently don’t have these details in a usable form—especially days later.
5) The appointment itself is incomplete (and patients need guidance anyway)
Even the best clinicians are constrained by time. Many visits end with unanswered questions:
“What does that term mean?”
“What are common side effects?”
“What should make me call sooner?”
“How do I take this correctly?”
“What should I track between now and my follow-up?”
If the patient turns to an LLM afterward, the model may provide generic guidance, but it may not be aligned with the patient’s plan—or with what was actually decided in that visit.
How Neatly is designed to address the interaction-layer gap
Neatly is built around a simple premise: better inputs lead to better outputs—especially in health.
Instead of asking patients to reconstruct their clinical story from memory, Neatly helps in three ways:
1) Capture: preserve what happened in the visit
Neatly records and structures the doctor–patient conversation into patient-friendly summaries. That means the patient doesn’t have to remember (or retype) critical details later.
When a question comes up—two hours after the visit or two weeks later—the patient can ask it with the benefit of their real context, not a blank slate.
2) Translate: turn raw conversation into the medical concepts that belong in the question
A key challenge in patient-mediated prompting is that patients often don’t know which details matter.
Neatly identifies and organizes important clinical concepts so the patient’s questions include what’s missing from a typical prompt, such as:
key diagnoses and differentials discussed
treatments and medication instructions
tests ordered and the reason for each
relevant history that changes risk
“watch for” symptoms and return precautions
In other words, Neatly helps convert a vague question like “Is this normal?” into something closer to what a clinician would ask:
“After starting metformin three days ago for type 2 diabetes, I’m having diarrhea and cramping, no fever, and I’m able to keep fluids down. The plan was to titrate the dose weekly. What are expected side effects vs. warning signs, and when should I contact my clinician?”
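To make the mechanics concrete, here is a minimal sketch of the context-injection idea in Python. Every name here (VisitContext, build_enriched_prompt, and the field list) is an illustrative assumption for this post, not Neatly’s actual code or API; the point is only that structured visit context gets prepended to the patient’s question before it reaches the LLM.

```python
from dataclasses import dataclass, field


@dataclass
class VisitContext:
    """Structured summary of a clinic visit (illustrative fields only)."""
    diagnoses: list[str] = field(default_factory=list)
    medications: list[str] = field(default_factory=list)
    tests_ordered: list[str] = field(default_factory=list)
    return_precautions: list[str] = field(default_factory=list)


def build_enriched_prompt(question: str, ctx: VisitContext) -> str:
    """Prepend the visit's clinical context so the model sees the details
    a patient might otherwise leave out of the prompt."""
    sections = [
        ("Diagnoses discussed", ctx.diagnoses),
        ("Medications and instructions", ctx.medications),
        ("Tests ordered", ctx.tests_ordered),
        ("Return precautions given", ctx.return_precautions),
    ]
    # Keep only the sections that actually have content.
    context_lines = [
        f"{title}: {'; '.join(items)}" for title, items in sections if items
    ]
    return (
        "Patient context from the most recent visit:\n"
        + "\n".join(context_lines)
        + f"\n\nPatient question: {question}\n"
        "Answer with this context in mind; distinguish expected effects "
        "from warning signs, and say when to contact the clinician."
    )


# Example: the vague question from above, now grounded in visit context.
ctx = VisitContext(
    diagnoses=["type 2 diabetes"],
    medications=["metformin 500 mg daily, started 3 days ago, titrate weekly"],
    return_precautions=["fever", "inability to keep fluids down"],
)
print(build_enriched_prompt("Is this normal?", ctx))
```

Run against the vague question above, the enriched prompt carries the medication, the timeline, and the plan, so the model no longer has to guess the clinical frame.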
3) Add: include “should-know” information that may not have been said in the room
This is the part that matters most.
Healthcare conversations are constrained—patients may be overwhelmed, clinicians may be rushed, and not every education point gets covered.
Neatly is designed to supplement the conversation with important guidance patients commonly need even if it wasn’t explicitly discussed (see the sketch at the end of this section), for example:
plain-language definitions for unfamiliar terms
expected time-to-benefit for a medication
common side effects vs. urgent symptoms
how to take a medication correctly (food, timing, missed doses)
what to track between visits
how to prepare for upcoming tests
This helps both patients and clinicians: patients get better understanding and follow-through; clinicians get fewer misinterpretations and more productive follow-up questions.
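To illustrate the gap-detection idea in the same spirit, here is a minimal sketch. The SHOULD_KNOW checklist and the education_gaps function are hypothetical stand-ins, assuming each visit’s covered topics have already been extracted; a real per-medication checklist would be far richer.

```python
# Minimal sketch of education-gap detection (illustrative, not Neatly's code).
# Idea: compare topics actually covered in the visit against a standard
# patient-education checklist, then fold the gaps into the next AI query.

SHOULD_KNOW = {
    "what the medication treats",
    "expected time to benefit",
    "common side effects vs. urgent symptoms",
    "how to take it (food, timing, missed doses)",
    "what to track before the next visit",
}


def education_gaps(topics_covered: set[str]) -> set[str]:
    """Return should-know topics that the visit did not cover."""
    return SHOULD_KNOW - topics_covered


covered = {"what the medication treats", "common side effects vs. urgent symptoms"}
for gap in sorted(education_gaps(covered)):
    print("Not discussed at the visit:", gap)
```

The resulting gaps can then be appended to the enriched prompt from the earlier sketch, so the answer addresses them proactively rather than waiting for the patient to think to ask.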
What this means for clinicians and health systems
The Nature Medicine findings point to a key lesson for any “patient-facing AI” effort:
Do not evaluate the model alone. Evaluate the human–AI system.
That includes:
whether patients can supply the right context
whether the system protects against overconfidence
whether it reinforces the care plan rather than drifting from it
whether it clearly distinguishes education from medical advice
Neatly’s approach is to anchor AI interactions in the patient’s longitudinal, visit-based context and to proactively fill in the educational gaps that predict confusion and non-adherence.
A practical checklist: how to ask safer, better health questions with AI
Whether or not you use Neatly, these are the elements that most improve the quality of AI health answers; a template that assembles them follows the list:
What is the main concern? (one sentence)
Timeline: when it started, whether it’s worsening, and anything that triggered it
Severity: what you can/can’t do, pain scale, impact on breathing, eating, hydration, sleep
Key history: diagnoses, pregnancy status, immune status, recent procedures
Meds: current meds and what changed recently
Red flags: chest pain, fainting, new weakness, severe headache, confusion, dehydration, uncontrolled bleeding
What your clinician already said: working diagnosis, tests ordered, and the plan
Your goal: reassurance, interpretation, when to seek care, how to manage symptoms, how to prepare for next visit
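For readers who want to apply the checklist directly, here is a minimal template that assembles those eight elements into a single question. The field names and the filled-in example (reusing the metformin scenario from earlier) are illustrative, not a Neatly feature:

```python
# Illustrative template: fill in each checklist field before asking an AI.
CHECKLIST_PROMPT = """\
Main concern: {concern}
Timeline: {timeline}
Severity / impact: {severity}
Key history: {history}
Medications (incl. recent changes): {meds}
Red flags present or absent: {red_flags}
What my clinician already said: {clinician_plan}
My goal for this question: {goal}
"""

print(CHECKLIST_PROMPT.format(
    concern="diarrhea and cramping after a new medication",
    timeline="started 3 days ago, stable, began after first metformin dose",
    severity="uncomfortable but eating and drinking normally",
    history="type 2 diabetes; no other chronic conditions; not pregnant",
    meds="metformin 500 mg daily, started 3 days ago",
    red_flags="no fever, no blood in stool, keeping fluids down",
    clinician_plan="titrate metformin weekly; follow-up in 4 weeks",
    goal="know expected side effects vs. warning signs, and when to call",
))
```

Even filled in roughly, a prompt structured this way supplies the timeline, history, and red-flag status that patients commonly omit.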
The bottom line
LLMs can be impressive medical assistants in controlled settings—but patients shouldn’t have to become prompt engineers to get safe, useful guidance.
The evidence suggests that patient-mediated AI can fail frequently even when the underlying model performs well.[1]
Neatly is designed for the messy reality: incomplete memories, incomplete visits, and high-stakes decisions between appointments. By capturing the visit, translating it into the clinical context that matters, and adding the “should-know” guidance that patients need anyway, Neatly helps turn AI from a generic search box into a safer, more personalized health companion.
The best way to learn more about Neatly Health is to try it. Download today.
Medical disclaimer: Neatly provides educational information and helps patients understand and organize questions for their care team. It does not replace professional medical advice, diagnosis, or treatment. If symptoms feel severe or urgent, seek emergency care or call local emergency services.