Hello, I'm Ryuta Hamamoto from TIMEWELL.
In May 2026, the U.S. medical AI OpenEvidence raised an additional 200 million dollars and was widely reported as having grown into infrastructure used by roughly 40% of U.S. physicians[^isao]. It processes 15 million clinical consultations per month and is the official AI partner of the New England Journal of Medicine (NEJM), JAMA, and 11 of their specialty journals[^decades].
It is often lumped together with "search AI for medical professionals," but the core design idea of OpenEvidence is something else: an AI trained specifically on peer-reviewed medical literature that always answers with citations. That deliberate divergence from the general-purpose LLM path has implications well beyond medicine, especially for enterprise AI.
In my AI consulting work, I get a steady stream of questions every week — "Is it okay to feed our internal documents into ChatGPT?" and "How should we design for hallucinations?" Studying OpenEvidence carefully gives you one credible answer to those questions. Working from primary sources as of June 2026, this article walks through how OpenEvidence is built and adopted, how Japanese physicians can actually use it, where it gets dangerous in a Japanese clinical context, and what enterprise AI teams should take away from the design.
What OpenEvidence Is: A Medical AI Used Daily by 40% of U.S. Physicians
OpenEvidence is a U.S. medical AI startup that spun out of Beth Israel Deaconess Medical Center, a Harvard Medical School teaching hospital[^openevidence]. The founder and CEO is Daniel Nadler, who previously built the economic-data AI company Kensho (later sold to S&P Global for 550 million dollars). For his second company, he chose "AI that supports physician decision-making."
Its adoption curve is unusual. As of early 2026, about 40% of the roughly one million U.S. physicians were registered users, and more than 15 million clinical consultations per month were running through the platform[^decades]. It is better understood not as a handy search tool but as a new layer added to the front end of American medicine.
| Metric | Value | Notes |
|---|---|---|
| Share of U.S. physicians using it | ~40% | Of roughly 1 million |
| Monthly clinical consultations | 15M+ | As of January 2026 |
| Partner medical journals | NEJM, JAMA + 11 specialty journals | Official AI partner |
| Most recent funding round | 200M USD | May 2026 |
| Valuation | Over 6B USD | Same round |
| Lead investors | Sequoia Capital, Kleiner Perkins, etc. | Series B and C |
The May 2026 round of 200 million dollars was not reported as the moment OpenEvidence became a unicorn. At a valuation of more than 6 billion dollars, it is already treated as the leading medical AI company[^isao]. Top-tier venture firms like Sequoia Capital and Kleiner Perkins led later rounds, and the partner list reads like a roll call of the world's most respected medical journals.
The business model is distinctive. OpenEvidence is free for healthcare professionals, with revenue coming from pharmaceutical advertising and research partnerships. It is essentially the same playbook Google search used — give the user-facing product away for free, capture the front end, and monetize through advertisers. Applied to medicine, the result is a free front door for clinicians and a paid back door for pharma.
How It Works: Why It Can Cite Its Sources
General-purpose LLMs (ChatGPT, Claude, Gemini, etc.) are trained on broad internet text. OpenEvidence takes the opposite turn: it deliberately constrains its training data to peer-reviewed medical literature.
Based on public information, the training corpus appears to be assembled from four categories:
- Peer-reviewed medical journals: full-text articles from over 300 journals, including NEJM, JAMA, Lancet, BMJ, Cell, and Nature.
- Clinical guidelines: the latest specialty-society guidelines from the American College of Physicians (ACP), American Heart Association (AHA), NCCN (oncology), IDSA (infectious diseases), and others.
- Primary regulatory sources: FDA package inserts and approval data, CDC infectious-disease guidance.
- Drug databases: interaction, dosing, and contraindication data.
A large language model is trained and fine-tuned on this corpus, and answers are generated via retrieval-augmented generation (RAG) over the relevant literature. Every answer comes back with DOI-anchored citations so the physician can jump straight to the primary source[^openevidence].
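To make the retrieve-then-cite pattern concrete, here is a minimal Python sketch. It is not OpenEvidence's implementation: the corpus entries and their DOIs (`10.0000/example.*`) are hypothetical placeholders, keyword overlap stands in for a real retriever (embeddings or BM25), and the LLM generation step is reduced to echoing each retrieved passage with its DOI attached.

```python
from dataclasses import dataclass

# Words too common to count as evidence of relevance
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


@dataclass(frozen=True)
class Passage:
    doi: str   # hypothetical placeholder DOI, not a real paper
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercased word set with punctuation and stopwords removed."""
    words = {w.strip(".,;:()").lower() for w in text.split()}
    return {w for w in words if w and w not in STOPWORDS}


def retrieve(query: str, corpus: list[Passage], k: int = 2) -> list[Passage]:
    """Return up to k passages ranked by keyword overlap with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(p.text)), p) for p in corpus]
    relevant = [(score, p) for score, p in scored if score > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in relevant[:k]]


def answer_with_citations(query: str, corpus: list[Passage], k: int = 2) -> str:
    """Stand-in for generation: every evidence line keeps its DOI anchor."""
    hits = retrieve(query, corpus, k)
    if not hits:
        # Declining is safer than answering without a source
        return "No supporting literature found."
    return "\n".join(f"- {p.text} [doi:{p.doi}]" for p in hits)


CORPUS = [
    Passage("10.0000/example.a", "Guideline summary on evaluation of chronic cough in adults"),
    Passage("10.0000/example.b", "Review of drug interaction screening in primary care"),
    Passage("10.0000/example.c", "Case series on exertional dyspnea workup"),
]

if __name__ == "__main__":
    print(answer_with_citations("evaluation of chronic cough", CORPUS))
```

The design choice worth noticing is the refusal branch: when retrieval finds nothing, the function declines instead of letting a generator improvise an uncited answer.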
The key thing to understand is not "OpenEvidence doesn't hallucinate" — it's that the design lets the physician catch hallucinations immediately. No AI today can fully prevent hallucinations. OpenEvidence's choice is to always return citations, which forces critical appraisal back into the clinical workflow.
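Part of that check can also run automatically before an answer reaches the user. The sketch below assumes a hypothetical `[doi:...]` marker format (not OpenEvidence's actual output) and sorts each cited DOI into "verified" (it was among the retrieved sources) or "unverified" (the generator produced it on its own), while flagging lines that carry no citation at all. A human still has to read the cited paper; this only catches citations that cannot possibly be grounded.

```python
import re

# Hypothetical inline marker format: "[doi:10.xxxx/...]"
CITATION = re.compile(r"\[doi:([^\]]+)\]")


def audit_citations(answer: str, retrieved_dois: set[str]) -> dict[str, list[str]]:
    """Sort cited DOIs by whether retrieval actually returned them."""
    report: dict[str, list[str]] = {"verified": [], "unverified": [], "uncited_lines": []}
    for raw in answer.splitlines():
        line = raw.strip()
        if not line:
            continue
        dois = CITATION.findall(line)
        if not dois:
            # A claim with no source at all cannot be checked
            report["uncited_lines"].append(line)
        for doi in dois:
            # A DOI outside the retrieved set may be a fabricated citation
            key = "verified" if doi in retrieved_dois else "unverified"
            report[key].append(doi)
    return report


if __name__ == "__main__":
    draft = (
        "Finding one [doi:10.0000/example.a]\n"
        "Finding two [doi:10.0000/not.retrieved]\n"
        "Finding three with no source\n"
    )
    print(audit_citations(draft, {"10.0000/example.a"}))
```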
The contrast with general-purpose LLMs looks like this:
| Dimension | OpenEvidence | ChatGPT/Claude (general-purpose) |
|---|---|---|
| Training corpus | Peer-reviewed medical literature only | Broad web text |
| Citations | Always with DOIs | Optional, with hallucination risk |
| Guideline currency | Immediate via official partnerships | Bound by training cutoff |
| Primary user | Verified healthcare professionals | General users |
| Pricing | Free for physicians | Paid plans available |
| Regulatory posture | Used as an "educational tool" | Medical use discouraged |
In early 2026, Anthropic announced that Claude Opus 4.7 improved on medical benchmarks[^anthropic], and the medical task performance of general-purpose LLMs is steadily rising. Even so, OpenEvidence holds its ground in three areas: guideline version management, citation accountability, and a closed user base of verified clinicians.
How Japanese Physicians Actually Use It
"Can we use this from Japan?" is one of the most common questions I get. The short answer as of June 2026: yes. You go straight to openevidence.com, upload proof that you are a physician, and wait for verification. That's the basic flow.
Registration
- Access the web version: open https://www.openevidence.com/.
- Create an account: register an email address (personal addresses like Gmail are accepted, but institutional addresses tend to clear verification faster).
- Verify as a physician: upload an image of your medical license. Multiple reports confirm Japanese medical licenses are accepted[^decades].
- Specify specialty and affiliation: enter your clinical department and institution.
- Start using after approval: typically usable within 1–3 business days.
A guest mode is also available, so you can try the product before going through verification. Native iOS and Android apps make quick lookups between outpatient visits practical.
Japanese-language Use
Since 2026, OpenEvidence has accepted questions in Japanese: it reads the English-language literature and returns its answer in Japanese. The cited sources themselves remain in English, so reading deeper still requires English reading skills, but the language barrier at the summary layer has dropped significantly.
GIGAZINE ran a hands-on test on June 1, 2026, entering symptoms such as "I've recently been feeling chronically fatigued with joint pain and don't understand the cause." OpenEvidence came back with a differential diagnosis list, the underlying papers for each candidate, and a recommended workup, all with citations[^gigazine]. Using it as a diagnostic tool for the general public is out of scope, but for a physician organizing information, the Japanese-language experience is now fully serviceable.
Common Usage Patterns
A few standard prompt patterns have emerged in practice:
- Differential diagnosis: "50-year-old male, three weeks of dry cough and exertional dyspnea. Normal CRP, normal ECG. Give me five differential diagnoses, each with supporting literature."
- Drug selection: "Treatment options for severe COVID-19 in a patient at 20 weeks of pregnancy, based on the latest guidelines."
- Adverse event review: "Drug-drug interaction between Drug A and Drug B, with case reports from the past five years."
- Patient explanation: "Adjuvant therapy options for Stage III breast cancer, summarized in plain language for a patient."
Although the product is built for physicians, it is also being used adjacent to clinical work — drafting patient-facing explanations, scoping research questions for clinical studies, and so on.
[Applying to enterprise AI]
OpenEvidence's design — "specialize in peer-reviewed literature and always cite sources" — translates cleanly to other industries. Law firms want case-law AI, manufacturers want technical-literature AI, and financial institutions want regulatory-document AI. Demand for domain-corpus-specialized AI is forming everywhere.
TIMEWELL's enterprise AI agent ZEROCK is specialized in confidential corporate knowledge: internal documents, contracts, policies, patents. It runs on AWS servers in Japan, input data is excluded from LLM training, and every answer carries its source URL.
How It Differs from ChatGPT, Claude, and UpToDate
To understand OpenEvidence's position, you have to compare it with both general-purpose LLMs and existing clinical tools. The two questions I hear most often are "Isn't ChatGPT enough?" and "How is it different from UpToDate?"
Versus ChatGPT and Claude
General-purpose LLMs are designed to "answer anything, while taking no responsibility for medicine." Their terms of service warn users not to rely on them for medical decisions. In a space where a hallucination can directly affect clinical judgment, both contractual and operational barriers to adoption are high.
OpenEvidence takes the opposite stance: "only answer medical questions, and own that scope." It draws four lines: verified physicians only, mandatory citations, positioning as an educational tool, and a clear separation from clinical-decision functions that would require FDA clearance. Where general-purpose LLMs go "wide and shallow," OpenEvidence chose "narrow and deep."
Versus UpToDate
UpToDate, from Wolters Kluwer, is a clinical decision-support system with more than three decades of history. Specialist physicians write and continually update its peer-reviewed topic articles by hand, so trust is extremely high. The downside is that the path to an answer is long, and the experience isn't conversational.
The differences look like this:
| Axis | OpenEvidence | UpToDate |
|---|---|---|
| Modality | Agentic (conversational) | Database (article browsing) |
| Information sources | Peer-reviewed papers + guidelines | Articles written by specialists |
| Update cadence | Continuous (corpus updates) | Per-article, on a rolling basis |
| Price (for physicians) | Free | Paid subscription |
| Search experience | Question to direct answer with citations | Keyword to article to comprehension |
| Strength | Speed and breadth | Editorial trust |
Rather than replacing UpToDate, OpenEvidence is settling into a different niche: "I have three minutes between outpatient cases and I need a decision now" goes to OpenEvidence; "I want to sit down and read an entire field deeply" stays with UpToDate. Many physicians use both, and OpenEvidence has not displaced existing tools.
The "Differential Diagnosis on Turn 1" UX
One reason clinicians love OpenEvidence is the UX choice to return a differential diagnosis list on the very first turn. You don't get search results or a paper list — you get "the conditions to suspect are these, here is the supporting literature for each." It runs in parallel with the physician's own cognitive process (form a differential, then test it), and that pairing is hard to achieve in a traditional database-style tool.
What to Watch Out for in Japan: Insurance Coverage and Domestic Guidelines
It is easy to come away thinking "if I can use it in Japan, I should." Use among Japanese physicians is indeed rising, but operational caveats matter.
"Latest Standard of Care" That Isn't Approved or Reimbursed in Japan
Because OpenEvidence reflects the latest U.S. evidence, drugs and protocols recommended as standard of care in the U.S. are often unapproved or not reimbursed in Japan. Obesity treatments, some cancer immunotherapies, and rare-disease gene therapies routinely take years to move from U.S. approval to inclusion in Japan's national health insurance.
If you take an OpenEvidence answer and present it to a patient as "the latest standard of care," the treatment may not be covered by insurance, may only be available as self-pay care outside the insurance system, or may not be available in Japan at all. The evidence has to be critically appraised and then interpreted in light of Japanese guidelines and reimbursement rules.
Differences with Japanese Specialty-Society Guidelines
Japan's cancer guidelines, cardiology guidelines, and diabetes guidelines are developed through their own evidence-evaluation and consensus processes. Even when looking at the same RCTs, recommendation grades sometimes diverge from NCCN or European/American guidelines.
OpenEvidence tends to favor U.S. specialty-society guidelines, so its answers can conflict with domestic recommendations. When that happens, the question isn't "which one is right" but "what evidence and context lie behind each recommendation," and that interpretation is the physician's job. As an educational or reference tool, OpenEvidence is fine; as a substitute for the domestic guidelines themselves, it is dangerous.
Where It Sits in Japan's Regulatory Environment
In Japan, AI used as a medical device falls under the SaMD (Software as a Medical Device) regime of the Pharmaceuticals and Medical Devices Act. The line between products that hold PMDA approval as medical device programs and those that don't is strictly enforced[^pmda].
OpenEvidence does not currently hold Japanese SaMD approval and should be positioned as an "educational information tool for healthcare professionals." Treating it as a billable diagnostic AI or recording in a chart that "OpenEvidence made the diagnosis" is out of scope.
For domestic references, the Ministry of Health, Labour and Welfare's September 2024 "Guidelines for the use of medical digital data in AI research and development"[^mhlw] is one anchor. So is the HAIP/CIP "Generative AI Use Guidelines in Healthcare, Version 2"[^haip]. On the U.S. side, the FDA had cleared more than 1,350 AI/ML SaMD products by early 2026, showing how quickly regulation and real-world deployment are advancing together[^fda].
Physician Liability and the Limits of AI
OpenEvidence's own terms of service repeatedly state that final clinical decisions are the physician's responsibility. If clinical harm results from following an AI-generated answer, OpenEvidence does not assume liability. Structurally, this is the same as the terms of any general-purpose LLM. The institutions that deploy it have to write internal operating rules that keep usage inside the boundary of "assistive tool."
A "Literature AI" Approach Companies Can Apply to Knowledge Management
So far we've looked at how OpenEvidence is built and how it operates. Now the business question: how do enterprise AI teams apply this?
Four Principles to Extract from OpenEvidence's Design
The reasons OpenEvidence took hold in clinical practice boil down to four principles:
- Specialization in a domain corpus. Training data was constrained from "the general web" to "peer-reviewed medical literature."
- Mandatory cited answers. Every answer carries a DOI so the user can verify the source.
- Risk control through user verification. Deep features only unlock once the user is verified as a physician.
- Official partnerships with industry canon. Partnerships with NEJM, JAMA, and their specialty journals give its answers institutional credibility.
The same four principles are reproducible in any specialized domain. Case-law and statute AI for law firms, technical-literature and patent AI for manufacturers, regulatory-document and compliance AI for financial institutions, project-deliverable AI for consulting firms — all of them can be built from "domain-specialized corpus + cited answers + access control + partnership with the canon."
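As a rough illustration of how the four principles compose, the sketch below requires user verification, filters a registered document store by role before searching, ranks documents registered as official ("canon") sources first, and attaches a source ID to every hit. All class names, roles, and document IDs are hypothetical, and plain keyword matching stands in for real retrieval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_roles: frozenset[str]  # access control: who may see this document
    official: bool                 # registered as a trusted "canon" source


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset[str]
    verified: bool                 # identity check passed (cf. physician verification)


def words(text: str) -> set[str]:
    return {w.lower() for w in text.split()}


def search(user: User, query: str, docs: list[Document]) -> list[str]:
    """Verified, role-filtered keyword search that returns cited results."""
    if not user.verified:
        raise PermissionError(f"{user.name} has not been verified")
    terms = words(query)
    # Access control runs before retrieval, so hidden documents never surface
    visible = [d for d in docs if user.roles & d.allowed_roles]
    # Domain corpus: only the registered documents are searched
    hits = [d for d in visible if terms & words(d.text)]
    # Canon first: official documents outrank drafts (stable sort keeps the rest in order)
    hits.sort(key=lambda d: not d.official)
    # Cited answers: every result carries its source ID
    return [f"{d.text} [source:{d.doc_id}]" for d in hits]


DOCS = [
    Document("memo-017", "Draft notes on travel expense changes", frozenset({"staff"}), False),
    Document("policy-001", "Travel expense policy for all staff", frozenset({"staff", "legal"}), True),
    Document("contract-042", "Vendor contract renewal terms", frozenset({"legal"}), True),
]

if __name__ == "__main__":
    staff = User("aiko", frozenset({"staff"}), True)
    for line in search(staff, "travel expense", DOCS):
        print(line)
```

Running it prints the official policy before the draft memo, and a staff user searching for "contract renewal" gets nothing, because the contract is visible only to the legal role.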
Three Enterprise Problems General-Purpose LLMs Cannot Solve
In the field, I get the same three problems on repeat:
Problem 1: Risk of confidential data leakage. When employees paste internal documents into ChatGPT or Claude, it is hard to guarantee "zero possibility of leakage," even if the contract excludes the data from training. That uncertainty is hard to reconcile with customer NDAs and the information-management policies of listed companies.
Problem 2: Your own knowledge isn't in the model. General-purpose LLMs are trained on past internet content and know nothing about your internal documents. Pasting context into every prompt is a brute-force workaround that is neither reproducible nor scalable.
Problem 3: Answers without citations are scary. General-purpose LLMs don't attach sources by default, so users have no real way to verify whether an answer is correct. At the level of board decisions or legal judgments, that is almost always a blocker.
OpenEvidence solved exactly these three in medicine with "domain specialization + citations + verification." Carrying that thinking into enterprise confidential-document management is the design behind TIMEWELL's enterprise AI agent ZEROCK.
How ZEROCK Is Designed
ZEROCK is an AI agent specialized in confidential corporate knowledge — internal documents, contracts, policies, patents, meeting minutes. Mapped against the OpenEvidence principles, it looks like this:
| OpenEvidence principle | ZEROCK implementation |
|---|---|
| Specialization in a domain corpus | Tenant-isolated design that only references the customer's own documents |
| Mandatory cited answers | Every answer auto-links back to the source file and the relevant section |
| Risk control through user verification | SSO, permission management, and audit logs for role-based access |
| Official partnerships with industry canon | Customer's "official documents" registered as trusted sources |
In addition, ZEROCK runs on AWS servers in Japan, and input data is excluded from LLM training contractually and technically. Even in healthcare, finance, or public sectors where domestic data residency is mandatory, the architecture clears procurement requirements.
ZEROCK uses GraphRAG for relationship-aware retrieval, so it can answer based on document relationships rather than simple keyword matching. Questions like "what are the renewal conditions of that contract" or "what is the amendment history of that policy" — the kind that search-based RAG tends to miss — are addressable.
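To show what "relationship-aware" means in the simplest terms, here is a toy document graph in Python. It is neither GraphRAG proper (which builds its graph with an LLM) nor ZEROCK's implementation. It only demonstrates the traversal step: following typed edges, such as a hypothetical `amends` relation between hypothetical document IDs, to recover an amendment history that a single similarity lookup has no reason to return as a chain.

```python
from collections import defaultdict, deque


class DocGraph:
    """Toy document graph: an edge (source, relation, target) reads "source <relation> target"."""

    def __init__(self) -> None:
        # For each target, the (source, relation) pairs that point at it
        self._incoming: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_edge(self, source: str, relation: str, target: str) -> None:
        self._incoming[target].append((source, relation))

    def chain(self, start: str, relation: str) -> list[str]:
        """Breadth-first walk over one relation type, nearest documents first."""
        seen = {start}
        order: list[str] = []
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for source, rel in self._incoming.get(node, []):
                if rel == relation and source not in seen:
                    seen.add(source)
                    order.append(source)
                    queue.append(source)
        return order


if __name__ == "__main__":
    g = DocGraph()
    g.add_edge("policy-v2", "amends", "policy-v1")
    g.add_edge("policy-v3", "amends", "policy-v2")
    g.add_edge("faq-2025", "cites", "policy-v1")
    print(g.chain("policy-v1", "amends"))  # ['policy-v2', 'policy-v3']
```

Restricting the walk to one relation type is what keeps the FAQ that merely cites the policy out of its amendment history.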
Just as OpenEvidence added a new layer to the front end of U.S. medicine, I expect "an AI agent specialized in our own knowledge" to become a standard layer in enterprise operations. The choice isn't binary between banning and allowing general-purpose LLMs — it's about building a specialized AI layer alongside.
[Who this is for]
- You want to train AI on your internal documents, contracts, and policies for safe knowledge search.
- You're worried about confidential information leaking through ChatGPT or Claude.
- Users tell you "the AI doesn't show sources" and "hallucinations are scary."
- You're in a law firm, manufacturer, or financial institution that needs domain-specialized AI.
- You need to run on AWS servers in Japan (healthcare, finance, public sector).
If even one of these applies, a 30-minute ZEROCK consultation will get you to a concrete use-case map.
→ Book a 30-minute ZEROCK consultation / → See the ZEROCK service page
Summary
OpenEvidence took a separate evolutionary path from general-purpose LLMs and shipped a medical AI built on "domain-corpus specialization + citations + physician verification." The fact that it has reached 40% of U.S. physicians and 15 million monthly consultations shows that domain-specialized AI is no longer a fad — it can graduate into social infrastructure.
It is also open to Japanese physicians: Japanese-language input and physician-license verification now make it usable in Japan. At the same time, the gap with Japanese national health insurance, the differences with domestic guidelines, and the SaMD regulatory positioning all require operational care.
The lessons here are not limited to medicine. "Specialization in a domain corpus," "cited answers with source URLs," "risk control through user verification," and "official partnerships with industry canon" all transfer directly to enterprise AI for confidential documents. Enterprise AI leaders worn down by the general-purpose-LLM debate should look closely at how OpenEvidence is built and translate it into their own domain.
At TIMEWELL, we have productized that thinking as ZEROCK. If reading about OpenEvidence makes you want to redesign your company's knowledge-management AI, please get in touch any time.
Related Articles
- ZEROCK: An Enterprise AI Agent for Confidential Knowledge Management
- Enterprise AI Agents Compared: Selection Criteria and Major Domestic Products
- Enterprise AI Glossary: Key Concepts You Should Know
References and Footnotes
[^isao]: ISAO International CPA Office, "OpenEvidence Raises 200 Million Dollars" https://isaocpa.com/ai/807/
[^decades]: decades.co.jp, "What is OpenEvidence? The Medical AI Used by 40% of U.S. Physicians" (February 2026) https://decades.co.jp/openevidence-clinical-guide-202602/
[^mhlw]: Ministry of Health, Labour and Welfare, "Guidelines for the Use of Medical Digital Data in AI Research and Development" (September 2024) https://www.mhlw.go.jp/content/001310044.pdf
[^fda]: IntuitionLabs, "FDA AI/ML SaMD Guidance Complete 2026 Compliance Guide" https://intuitionlabs.ai/articles/fda-ai-ml-samd-guidance-compliance
[^openevidence]: OpenEvidence official site https://www.openevidence.com/about
[^anthropic]: Anthropic, Claude Opus 4.7 announcement https://www.anthropic.com/news/claude-opus-4-7
[^pmda]: PMDA, "Medical Device Programs" https://www.mhlw.go.jp/stf/seisakunitsuite/bunya/0000179749_00004.html
[^haip]: HAIP/CIP, "Generative AI Use Guidelines in Healthcare, Version 2" https://haip-cip.org/assets/documents/nr_20241002_02.pdf
[^gigazine]: GIGAZINE, "We Asked OpenEvidence About Long-Standing Symptoms" (June 1, 2026) https://gigazine.net/news/20260601-openevidence-medical-ai-review/
[^cubec]: Cubec, Major Update Announcement https://prtimes.jp/main/html/rd/p/000000007.000124652.html
![[June 2026 Update] OpenEvidence Complete Guide: How the Medical AI Used by 40% of U.S. Physicians Works, How to Use It in Japan, and What It Means for Enterprise AI](/images/columns/open-evidence-ai-medical.png)