
Using ChatGPT to write patient clinic letters

The Lancet Digital Health

Article

Published: Mar 07, 2023

https://doi.org/10.1016/S2589-7500(23)00048-1


The appropriate recording and communication of clinical information between clinicians and patients are of paramount importance. Recently there has been a much-needed drive to improve the information that is shared with patients.[1] However, the preparation of clinical letters can be time consuming. Although there has been an increase in the use of letter templates and voice recognition systems with the aim of improving efficiency, novel technologies such as natural language processing (NLP) and artificial intelligence (AI) have the power to revolutionise this area of practice. NLP algorithms are designed to recognise and understand the structure and meaning of human language, classify texts according to their content or purpose, and generate responses that are appropriate and coherent.[2] OpenAI's ChatGPT chatbot was launched in November, 2022, and uses NLP technology to generate human-like text. The generative pre-trained transformer (GPT) language model is based on a transformer architecture, which allows it to process large amounts of text data and generate coherent text outputs by learning the relationships between input and output sequences. The GPT language model has been trained on large datasets of human language, with several studies demonstrating that it is very good at generating high-quality and coherent text outputs.[3], [4], [5] As a result, AI such as ChatGPT has the potential to produce high-quality clinical letters that are comprehensible to patients, while improving efficiency, consistency, accuracy, and patient satisfaction, and delivering cost savings to a health-care system. In this Comment we describe the early adoption and evaluation of ChatGPT-generated clinical letters to patients with limited clinical input.
The aim was to evaluate the readability, factual correctness, and humanness of ChatGPT-generated clinical letters to patients, using skin cancer, the most common human cancer, as an example.

We created a series of different clinical communication scenarios that covered the remit of a clinician's skin cancer practice. To simulate how clinicians might use ChatGPT in the clinical environment, we created shorthand instructions to input into the chatbot, which we defined as limited clinical input because the input is small compared with the relative amount of natural free text a clinician would otherwise be required to write or dictate to generate a clinical letter. In the USA, it is recommended that patient-facing health literature be written at or below a sixth-grade level (age 11–12 years).[6] There are no specific guidelines on this in the UK. In view of this, all letters were instructed to be written at a reading age of 11–12 years. We sought to evaluate the capabilities of ChatGPT by presenting it with a series of instructions of increasing complexity (appendix p 1). These instructions ranged from simply following specific directions, to using national guidelines and data from these guidelines to provide clinical advice in the letter (eg, the management of anticoagulation peri-operatively). After submitting the instructions, ChatGPT generated a response in the form of a clinical letter to be issued to the patient.
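As a rough illustration of the limited-clinical-input approach described above, shorthand notes can be wrapped in a template that asks for a letter at the target reading age. The field wording and template below are a hypothetical sketch, not the study's actual instructions:

```python
# Hypothetical sketch: turning shorthand clinical notes into a chatbot prompt.
# The template wording and example shorthand are illustrative assumptions,
# not the instructions used in the study (those are in its appendix).

def build_letter_prompt(shorthand: str, reading_age: str = "11-12 years") -> str:
    """Wrap brief shorthand notes in a request for a patient-facing letter."""
    return (
        "Write a clinic letter to a patient, at a reading age of "
        f"{reading_age}, based on the following shorthand notes:\n"
        f"{shorthand}\n"
        "Explain the diagnosis and next steps in plain English."
    )

prompt = build_letter_prompt(
    "BCC left cheek, excised, clear margins, no further treatment needed"
)
```

The point of the sketch is the ratio: a one-line shorthand input stands in for the several paragraphs of free text a clinician would otherwise dictate.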
The online tool Readable was used to evaluate the readability of the letters with commonly used formulae, as described in many other studies.[7], [8] Factual correctness and humanness of the letters were assessed by two independent clinicians using a Likert scale ranging from 0 to 10, with 0 representing completely incorrect or inhuman and 10 representing completely correct and human. Error analysis was performed using linear regression. We used two separate generalised linear models (GLMs) to investigate the effect of the predictor variables of cancer type (basal cell carcinoma [BCC] as the reference category), general commands, specific guidelines, and general guidelines on the outcome variables: median humanness in the first GLM and median correctness in the second GLM. Statistical analysis was done in R (version 4.1.1); p<0·001 was deemed statistically significant.

38 hypothetical clinical scenarios were created, seven of which pertained to BCC, 11 to squamous cell carcinoma (SCC), and 20 to malignant melanoma (MM; appendix p 2). Overall, the readability scores suggest that the text might be suitable for a varying reading ability; the mean readability age for the generated letters was at a USA ninth-grade level (age 14–15 years), considered average difficulty by the US Department of Health and Human Services (appendix p 2).
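The grade levels reported by readability tools come from formulae such as the Flesch-Kincaid grade level. A minimal sketch of how such a score is computed follows; the syllable counter uses a simple vowel-group heuristic, so its output only approximates that of dedicated tools like Readable:

```python
import re

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels, dropping a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(1, n)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

# Short, plain sentences score at a low US grade level
grade = flesch_kincaid_grade(
    "Your skin test found a small cancer. We removed it all. "
    "You do not need more care."
)
```

A ninth-grade result from such a formula corresponds to the age 14–15 years reading level reported above.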
Overall median correctness of the clinical information contained in the letters was 7 (range 1–9). Overall median humanness of the writing style was 7 (range 5–9; appendix pp 3–4). The weighted κ for correctness was 0·80 (p<0·0001) and for humanness was 0·77 (p<0·0001). For median correctness, ANOVA showed a statistically significant difference among the groups (F2,35=10·1, p=0·00035). The Tukey honestly significant difference (HSD) test showed that the mean difference between the MM and BCC groups was statistically significant (–2·71 [95% CI –4·32 to –1·11], p<0·0001). There was no statistically significant difference between the SCC and BCC groups (p=0·31) or the SCC and MM groups (p=0·016). For median humanness, ANOVA showed a statistically significant difference among the groups (F2,35=27·76, p<0·0001). The Tukey HSD test showed that there was a statistically significant difference between the MM and BCC groups (–1·63 [–2·17 to –1·09], p<0·0001) and between the SCC and BCC groups (–1·43 [–2·03 to –0·83], p<0·0001). There was no statistically significant difference between the SCC and MM groups (p=0·55).

Results of the GLM for median humanness showed that cancer type was a significant predictor, with the MM coefficient being –1·45 (SE 0·30) and the SCC coefficient being –1·32 (0·28; both p<0·0001).
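The dummy-coded regression behind these GLMs (BCC as the reference category, so each coefficient is a group's mean difference from BCC) can be sketched with ordinary least squares, which a Gaussian-family GLM reduces to. The scores below are synthetic and purely illustrative, not the study's data:

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Synthetic humanness scores by cancer type (illustrative only)
data = {"BCC": [8, 8, 9], "MM": [6, 7, 6], "SCC": [7, 6, 7]}
X, y = [], []
for group, scores in data.items():
    for s in scores:
        X.append([1.0, float(group == "MM"), float(group == "SCC")])  # BCC = reference
        y.append(float(s))

# Normal equations: (X'X) beta = X'y
n, p = len(X), 3
XtX = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
Xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
beta = solve(XtX, Xty)  # [BCC mean, MM minus BCC, SCC minus BCC]

# R^2: share of the variance in the scores explained by cancer type
y_hat = [sum(X[r][j] * beta[j] for j in range(p)) for r in range(n)]
y_bar = sum(y) / n
r2 = 1 - sum((a - b) ** 2 for a, b in zip(y, y_hat)) / sum((a - y_bar) ** 2 for a in y)
```

With this coding, a negative MM coefficient means MM letters scored lower than BCC letters on average, mirroring the signs reported above.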
The general commands, specific guidelines, and general guidelines variables were not significant predictors of median humanness. The multiple R2 value of 0·622 indicated that the model explained 62·23% of the variance in median humanness, and the F-statistic of 10·55 and corresponding p<0·0001 indicated that the model was significant overall. In the GLM for median correctness, MM was found to be significantly associated with median correctness, with a coefficient of –2·64 (SE 0·90; p<0·0001). The other predictors were not significantly associated with median correctness. The multiple R2 value of 0·3729 suggested that the model explained approximately 37·29% of the variance in median correctness. The F-statistic and corresponding p value were used to test the overall significance of the model, and p<0·0001 indicated that the model was significantly different from a model with no predictors.

This pilot assessment shows that it is possible to generate clinic letters with a high overall correctness and humanness score with ChatGPT. Furthermore, these letters were written at a reading level that is broadly similar to current real-world human-generated letters.[9] The ability of AI to generate clinical letters as an alternative to those written by clinicians raises important considerations for the quality and effectiveness of health-care communication. It is important that potential risks, such as omissions or errors, which might have serious consequences for patient care, are mitigated. The incorrect reporting of results or interpretation of treatment guidelines could affect patient morbidity and mortality.
To mitigate these risks, it is important that the use of AI in health care, including the automated generation of clinical letters, is carefully regulated and monitored. In the early stages of adoption of such new technologies it is necessary to continue with a human-in-the-loop approach, whereby the outputs of such systems are carefully verified by health-care providers. To responsibly incorporate ChatGPT or similar generic AI systems into the clinical workflow, one approach could be to use voice-to-text recognition software with limited human input, followed by rapid clinician editing of the generated letter. This approach could be a feasible starting point for exploring the potential applications of this technology while also addressing any potential risks. Further studies are needed to assess the effectiveness of AI-generated clinical letters in real-world clinical settings and to compare the performance of various AI systems against one another. It is probable that an AI system with a greater medical focus would yield improved results.
Additionally, the quality and writing style of input provided to any chatbot should be assessed in a range of settings, including different languages, resource levels, and cultural contexts. Caution must be exercised and potential risks must be proactively addressed to ensure the safety and quality of patient care while introducing such an important technical advancement.

We declare no competing interests. SRA and TDD are funded by the Welsh Clinical Academic Training Fellowship. SRA received a Paton Masser grant from the British Association of Plastic, Reconstructive and Aesthetic Surgeons to support this work. ISW is the surgical specialty lead for Health and Care Research Wales; the chief investigator for the Scar Free Foundation and Health and Care Research Wales Programme of Reconstructive and Regenerative Surgery Research; an associate editor for the Annals of Plastic Surgery; and an editorial board member of BMC Medicine, among numerous other editorial board roles. Ethical committee approval was obtained from the Swansea University Medical School Research Ethics Subcommittee (2020-0025).
The study was performed in accordance with the Declaration of Helsinki.

References

1. General Medical Council. Domain 3: communication, partnership and teamwork. https://www.gmc-uk.org/ethical-guidance/ethical-guidance-for-doctors/good-medical-practice/domain-3—communication-partnership-and-teamwork#paragraph-31 (accessed Dec 22, 2022).
2. Jurafsky D, Martin JH. Speech and language processing.
3. Brown T, Mann B, Ryder N, et al. Language models are few-shot learners. https://doi.org/10.48550/arXiv.2005.14165
4. Raffel C, Shazeer N, Roberts A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer. https://doi.org/10.48550/arXiv.1910.10683
5. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems; December 2017 (abstr 5998–6008).
6. US Department of Health & Human Services. National action plan to improve health literacy. 2010. https://health.gov/sites/default/files/2019-09/Health_Literacy_Action_Plan.pdf (accessed Dec 22, 2022).
7. Burke V, Greenberg D. Determining readability: how to select and apply easy-to-use readability formulas to assess the difficulty of adult literacy materials. Adult Basic Educ Lit 2010; 4: 34–42.
8. Wang L-W, Miller MJ, Schmitt MR, Wen FK. Assessing readability formula differences with written health information materials: application, results, and recommendations. Res Social Adm Pharm 2013; 9: 503–16.
9. Drury DJ, Kaur A, Dobbs T, Whitaker IS. The readability of outpatient plastic surgery clinic letters: are we adhering to plain English writing standards? Plast Surg Nurs 2021; 41: 27–33.

