Large Language Models for Orthopedic Trainees: What’s Safe, What’s Not
A reference guide by OSCRSJ for residents, fellows, and medical students on the practical and ethical use of large language models in training, research, and clinical work.
The question is not whether orthopedic trainees should use large language models. Most already do. The question is where use is appropriate, where it is risky, and where it is explicitly prohibited by current publishing and practice standards. This guide is a working reference, not an endorsement of any particular tool or workflow. It reflects the state of the field at the time of writing and will be updated as authoritative guidance evolves.
What a large language model actually is
A large language model is a neural network trained to predict text. Given an input, it produces a plausible continuation. Models such as ChatGPT, Claude, and DeepSeek can summarize journal articles, draft clinical notes, answer study questions, and generate code. They are not databases, search engines, or peer reviewers. They do not have live access to PubMed, to hospital records, or to the most recent literature unless they are connected to an external retrieval system. Output quality depends heavily on the prompt, and the same model can return different answers to functionally identical questions. Most critically, large language models can produce fluent text that is false. This failure mode, known as hallucination, is a well-documented property of the architecture.
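To make the prediction mechanism concrete, here is a toy sketch of sampling from a next-token distribution. The vocabulary and probabilities are invented for illustration and bear no relation to any real model's weights; the point is only that sampling makes identical prompts yield different continuations, and that a fluent-but-false continuation is always available by construction.

```python
import random

# Invented next-token distribution for illustration only; a real model computes
# a distribution like this over tens of thousands of tokens at every generation
# step, then samples from it.
NEXT_TOKEN_PROBS = {
    "a plausible answer": 0.5,
    "a different plausible answer": 0.3,
    "a fluent but false answer": 0.2,  # hallucination is possible by design
}

def continue_text(prompt: str) -> str:
    """Sample one continuation from the toy distribution."""
    tokens = list(NEXT_TOKEN_PROBS)
    weights = list(NEXT_TOKEN_PROBS.values())
    return prompt + " -> " + random.choices(tokens, weights=weights)[0]

# The same input can produce different outputs on repeated calls.
for _ in range(3):
    print(continue_text("functionally identical question"))
```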
Where LLM use is generally safe for trainees
Three categories of use are low risk when output is verified against primary sources.
Studying and comprehension
A language model can explain concepts, generate practice questions, and rephrase dense passages from a textbook or paper. Trainees should still verify any factual claim against the source material before relying on it for exams or clinical decisions.
Writing assistance on trainee-authored work
Language models can improve clarity, grammar, and flow in a draft the trainee has already written. This is consistent with ICMJE guidance as long as the model is acknowledged and the human author remains responsible for every claim in the final text. Language models cannot be listed as authors. They do not meet ICMJE authorship criteria because they cannot take accountability for the work.
Code and statistical assistance for research
Language models are frequently used to write or debug analysis code in R or Python. Code should be read, understood, and tested before it is applied to real data, and the use should be disclosed if the resulting work is published.
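A minimal sketch of what "read, understood, and tested" can look like in practice, assuming a hypothetical LLM-drafted odds-ratio helper: the function is checked against a hand-computed value on synthetic counts, and a zero-cell edge case is probed, before any real study data is involved.

```python
# Hypothetical LLM-drafted helper: verify it by hand before using real data.
def odds_ratio(exposed_cases: int, exposed_controls: int,
               unexposed_cases: int, unexposed_controls: int) -> float:
    """Odds ratio for a 2x2 table: (a/b) / (c/d) = (a*d) / (b*c)."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)

# Sanity test on synthetic counts where the answer is known by hand:
# (20 * 40) / (10 * 10) = 800 / 100 = 8.0
assert odds_ratio(20, 10, 10, 40) == 8.0

# Edge case the draft may have ignored: a zero cell raises ZeroDivisionError,
# so decide explicitly how the analysis should handle it before running.
try:
    odds_ratio(20, 0, 10, 40)
except ZeroDivisionError:
    print("zero cell detected; apply a continuity correction or report separately")
```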
Where LLM use is not safe
Several categories of use are either prohibited or carry material clinical and academic risk.
Entering protected health information
Most consumer language model interfaces transmit input data to the vendor’s servers, where it may be retained or used for training. Entering patient names, medical record numbers, dates of birth, or other identifiable data into a public model is a HIPAA violation in US institutions and a comparable violation under the privacy frameworks of most other jurisdictions. HIPAA-compliant enterprise agreements exist at some health systems; default consumer interfaces do not qualify.
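As a rough illustration of a guard this rule implies, the sketch below screens a draft prompt for a few identifier patterns before anything leaves the workstation. The MRN format and other patterns are hypothetical examples, not a validated de-identification method; passing such a screen does not make text safe to send.

```python
import re

# Illustrative identifier patterns only; real PHI takes many more forms
# (names, addresses, device IDs), so this is not a compliance tool.
PHI_PATTERNS = {
    "MRN-like number": re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
    "date of birth": re.compile(r"\b(DOB[:\s]*)?\d{1,2}/\d{1,2}/\d{4}\b"),
    "SSN-like number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def screen_prompt(text: str) -> list[str]:
    """Return the names of identifier patterns found in a draft prompt."""
    return [name for name, pattern in PHI_PATTERNS.items() if pattern.search(text)]

draft = "68F, MRN: 00482913, DOB 03/14/1957, s/p ORIF distal radius"
hits = screen_prompt(draft)
if hits:
    print(f"Do not send: matched {hits}")  # block the request entirely
```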
Generating citations without verification
Language models routinely fabricate citations, producing plausible-looking journal names, author lists, volumes, and DOIs that do not correspond to real papers. Every citation produced by a language model must be verified in PubMed or on the publisher’s website before it enters a manuscript, case report, or clinical note. Submission of fabricated references is a research integrity violation and is grounds for manuscript rejection and retraction.
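A first automated pass at this check can query the public Crossref REST API (api.crossref.org), which returns metadata for registered DOIs and a 404 for unregistered ones. A 404 strongly suggests fabrication; a hit still requires a human to confirm that the returned title and authors match the citation. The DOI in the example is a placeholder.

```python
import json
import urllib.request
from urllib.error import HTTPError

def check_doi(doi: str) -> None:
    """Look up a DOI in the public Crossref registry (api.crossref.org)."""
    url = f"https://api.crossref.org/works/{doi}"
    try:
        with urllib.request.urlopen(url) as response:
            record = json.load(response)["message"]
            title = (record.get("title") or ["(no title)"])[0]
            # A registered DOI is necessary but not sufficient: confirm the
            # title and authors actually match the citation in the manuscript.
            print(f"Registered: {doi} -> {title}")
    except HTTPError as err:
        if err.code == 404:
            print(f"NOT FOUND: {doi} -- likely fabricated, verify manually")
        else:
            raise

check_doi("10.1000/example-doi")  # placeholder DOI for illustration
```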
Answering guideline-sensitive clinical questions without checking the guideline
A language model may produce an answer that is close to but not fully concordant with the current AAOS clinical practice guideline, ACGME requirement, or specialty society position. Benchmark studies have found that general-purpose models agree with published guidelines in roughly 70 to 85 percent of cases, depending on the topic. Trainees should treat model output as a starting point for literature review, not as a substitute for the guideline itself.
Patient-facing communication without physician review
A language model can draft patient education materials, discharge summaries, or letter responses. These outputs must be reviewed and edited by a licensed physician before they are given to a patient. Direct, unreviewed patient-facing use is prohibited in most institutional AI policies.
What the authoritative bodies say
Three sources currently define the boundaries of acceptable LLM use in biomedical research and practice.
The International Committee of Medical Journal Editors (ICMJE) states that artificial intelligence tools do not qualify for authorship and that use of AI in manuscript preparation must be disclosed. Authors are responsible for all submitted content, including any AI-assisted portions, and must be able to vouch for the accuracy and originality of the work. OSCRSJ follows the ICMJE standard on this point.
The World Association of Medical Editors (WAME) offers parallel guidance: AI-generated content must be disclosed, chatbots cannot be authors, and the corresponding author is responsible for any AI-assisted content.
The American Academy of Orthopaedic Surgeons (AAOS) has published position statements on AI in orthopedic practice emphasizing that AI tools are adjuncts to, not substitutes for, clinical judgment, and that responsibility for patient care remains with the treating surgeon.
A working framework for trainees
Four principles summarize current safe use:
- Verify every factual claim against a primary source before relying on it.
- Cite the original literature, not the language model, and verify that every citation corresponds to a real paper.
- Disclose AI assistance in any submitted manuscript per the target journal’s policy.
- Never enter patient data into a consumer language model interface.
Use at the level of reading comprehension, writing polish, and code scaffolding is well within current norms. Use at the level of independent clinical decision-making, unverified citation generation, or patient communication is not.
How OSCRSJ handles AI-assisted submissions
Authors submitting to OSCRSJ must disclose the use of any large language model, generative AI system, or AI writing assistant in the preparation of the manuscript. Disclosure is made in the submission portal’s declarations step and reproduced in the published article. Disclosed use does not reduce the editorial or peer-review standard applied to the manuscript, and editors may request additional verification of methods, citations, or figures if AI involvement is substantial. Language models cannot be listed as authors. The corresponding author remains responsible for the accuracy of every claim, citation, and figure in the submission. Manuscripts found to contain fabricated citations or unverified AI-generated content are rejected and, if already published, retracted per COPE guidelines.
When this guide will be updated
This guide is reviewed each quarter and revised when ICMJE, WAME, AAOS, or relevant regulatory bodies update their positions. The current revision reflects guidance available as of April 2026.
Writing an AI-assisted manuscript?
OSCRSJ accepts case reports and series on novel AI-assisted diagnoses and surgical planning, with a clear AI-disclosure policy. Free to publish in 2026.
Submit a manuscript

This guide is an editorial reference for educational purposes. It is not legal, regulatory, or institutional compliance advice. Readers should consult their local institutional AI, research integrity, and HIPAA compliance offices before adopting new workflows.