- Research
- Open access
The potential of artificial intelligence reading label system on the training of ophthalmologists in retinal diseases, a multicenter bimodal multi-disease study
BMC Medical Education volume 25, Article number: 503 (2025)
Abstract
Objective
To assess the potential of an artificial intelligence (AI) reading label system for training ophthalmologists, in a multicenter, bimodal, multi-disease study.
Methods
The diagnostic accuracy of 16 ophthalmologists, whose study duration ranged from one to nine years, was analyzed across multiple annotation rounds, together with its correlation with the number of rounds and with ophthalmology study duration. The study also evaluated the concordance between diagnoses based on optical coherence tomography (OCT) or color fundus photography (CFP) alone and the final case diagnosis.
Results
The study involved 7777 pairs of OCT and CFP images. Cases labeled with nine prevalent retinal diseases, namely diabetic retinopathy (DR, 2118 cases), retinal detachment (RD, 121 cases), retinal vein occlusion (RVO, 886 cases), dry age-related macular degeneration (dAMD, 549 cases), wet age-related macular degeneration (wAMD, 1023 cases), epiretinal membrane (ERM, 1061 cases), central serous retinopathy (CSC, 150 cases), macular schisis (MS, 128 cases) and macular hole (MH, 86 cases), or with normal fundus (1036 cases) were selected for further analysis. All images were assigned to 16 ophthalmologists over five rounds. The average diagnostic accuracy for the nine retinal diseases and normal fundus improved significantly across the five rounds (p = 0.013) and was significantly correlated with the duration of ophthalmology study (p = 0.007). Furthermore, the diagnostic accuracy of both the OCT (p = 0.028) and CFP (p = 0.021) modalities improved significantly as the number of rounds increased. Notably, OCT single-modal diagnosis was more consistent with the final diagnosis than CFP in cases of RD, ERM, MS, and MH, while CFP single-modal diagnosis was more consistent in DR, RVO and normal fundus.
Conclusion
The implementation of an artificial intelligence reading label system enhances the diagnostic accuracy of retinal diseases among ophthalmologists and holds potential for integration into future medical education.
Introduction
Initially introduced in 1956, artificial intelligence (AI) refers to the technology designed to mimic human behavior. AI has experienced rapid advancements, particularly in the domain of image recognition [1]. Ophthalmology is an ideal field for the application of AI, given that the diagnosis of ophthalmic diseases predominantly relies on image-based information [2]. To develop accurate and reliable AI models capable of diagnosing diseases, it is imperative to have a substantial volume of meticulously annotated image data. The quality of annotated data is crucial for training AI models with precise diagnostic capabilities, underscoring the significance of training annotators effectively [3].
A diverse group of ophthalmologists, varying in levels of experience, participated in the image annotation process. This approach not only aids in the development of AI models but also serves as a practical training method for ophthalmologists, providing exposure to a wide array of specific cases and corresponding images. It is therefore important to investigate the effect of the annotation process on the training of ophthalmologists.
Bimodal AI research that uses paired OCT and CFP images has the potential to enhance our understanding of retinal diseases and improve the efficiency of disease screening and diagnosis [4]. To the best of our knowledge, this is the first study to investigate the training effect of the AI annotation process in a multicenter, bimodal, and multi-disease context. This research examines the application of an AI reading label system in the education of relatively junior ophthalmologists, specifically in retinal diseases. The findings may offer valuable insights into medical education on OCT and CFP interpretation across various retinal diseases.
Methods
Study design
A total of 7777 pairs of swept-source OCT (SS-OCT, VG200, SVision Imaging, Ltd., Luoyang, China) and CFP (Zeiss Clarus 500 fundus camera, Carl Zeiss Meditec, Jena, Germany) images centered on the macular region were loaded into the reading label system. Sixteen ophthalmologists, including attending physicians and residents with study duration ranging from one to nine years, from four teaching hospitals (Peking Union Medical College Hospital, Shandong Eye Hospital, First Affiliated Hospital of China Medical University, First Hospital of Xi’an) were divided into eight groups. Each group was assigned a senior ophthalmologist (associate professor or professor), who checked the annotation results and provided the standard diagnosis. The study duration of the ophthalmologists responsible for annotation was calculated from the start of residency to September 2023. All images were assigned to the groups in five rounds. A training meeting introducing the standard operating procedure (SOP) was held before annotation began, and the SOP document was distributed to all annotators. A stage summary meeting reviewing and summarizing the previous annotations was held after each round. The annotation process started in September 2023 and finished in October 2024. Our study adhered to the tenets of the Declaration of Helsinki. Ethics approval was obtained from the PUMC Hospital Institutional Review Board (K3606). The PUMC Hospital Institutional Review Board approved the waiver of informed consent for participants in this study.
Nine retinal diseases, including diabetic retinopathy (DR), retinal detachment (RD), retinal vein occlusion (RVO), dry age-related macular degeneration (dAMD), wet AMD (wAMD), epiretinal membrane (ERM), central serous retinopathy (CSC), macular schisis (MS) and macular hole (MH), together with normal fundus, were included in the accuracy and kappa analyses.
Reading label system
The reading label system was originally developed for the manual annotation of images used to train deep learning AI models. It is a web-based annotation system. Readers logged in with their accounts and passwords, and the system randomly loaded a certain number of images. Pairs of OCT and CFP images were assigned to annotators. For each case, the OCT pattern employed was a radial scan centered on the macula, and all 18 B-scan images were available to annotators in the reading label system. After reading the OCT and CFP images, readers gave one diagnosis based solely on OCT (OCT diagnosis), one based solely on CFP (CFP diagnosis), and a final case diagnosis (bimodal diagnosis) based on both OCT and CFP images.
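The three diagnoses recorded per case can be pictured as one record per reader and image pair. The following is a minimal sketch of such a record and of a per-round accuracy tally, not the system's actual schema: the class name, field names, and the choice to score every diagnosis against the senior ophthalmologist's standard diagnosis are assumptions made for illustration, and each diagnosis is simplified to a single label although a real case may carry more than one.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One reader's labels for one OCT/CFP pair (hypothetical schema)."""
    case_id: str
    annotator: str
    round_no: int      # annotation round, 1 to 5
    oct_dx: str        # diagnosis based solely on OCT
    cfp_dx: str        # diagnosis based solely on CFP
    bimodal_dx: str    # final case diagnosis using both modalities
    standard_dx: str   # standard diagnosis from the senior ophthalmologist


def accuracy_by_round(records, field):
    """Return {round_no: accuracy} for one diagnosis field, e.g. 'oct_dx'."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r.round_no] += 1
        if getattr(r, field) == r.standard_dx:
            correct[r.round_no] += 1
    return {rnd: correct[rnd] / total[rnd] for rnd in sorted(total)}


# Made-up records for illustration only; these are not data from the study.
records = [
    Annotation("c1", "reader_a", 1, "ERM", "normal", "ERM", "ERM"),
    Annotation("c2", "reader_a", 1, "normal", "DR", "normal", "DR"),
    Annotation("c3", "reader_a", 2, "MH", "MH", "MH", "MH"),
    Annotation("c4", "reader_a", 2, "normal", "RVO", "RVO", "RVO"),
]

bimodal = accuracy_by_round(records, "bimodal_dx")
oct_only = accuracy_by_round(records, "oct_dx")
cfp_only = accuracy_by_round(records, "cfp_dx")
```

Keeping the three diagnoses in one record makes it straightforward to compare single-modal and bimodal accuracy for the same reader and round.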
Statistical methods
Statistical analysis was performed using Stata (Stata/SE 17.0, StataCorp, TX 77845, USA). Accuracy was calculated as the number of correctly diagnosed examples divided by the total number of examples. The kappa statistic was used to measure inter-modality agreement, as previously described [5]. Because the accuracy of each annotator was measured repeatedly over five rounds, generalized estimating equations (GEE) analysis was performed to estimate the correlation between accuracy and both the number of rounds and the ophthalmology study duration of the ophthalmologists. A p value less than 0.05 was considered significant.
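The accuracy and kappa definitions above can be sketched in a few lines of plain Python. This is an illustrative implementation of accuracy and of Cohen's kappa as defined in [5], not the Stata code used in the study; the function names and the example label lists are hypothetical.

```python
from collections import Counter


def accuracy(predicted, reference):
    """Fraction of examples whose predicted label equals the reference label."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(predicted)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), for two equal-length label lists."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("need two non-empty label lists of equal length")
    # Observed agreement: share of cases where both lists carry the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two marginal label frequencies, summed.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Both lists use one identical label throughout; kappa is undefined,
        # and this sketch reports it as full agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Made-up labels for illustration only; these are not data from the study.
oct_dx = ["ERM", "ERM", "MH", "normal", "normal", "normal"]
final_dx = ["ERM", "ERM", "MH", "DR", "normal", "normal"]

acc = accuracy(oct_dx, final_dx)       # 5 of 6 labels match
kappa = cohen_kappa(oct_dx, final_dx)  # p_o = 30/36, p_e = 11/36, kappa = 19/25
```

Kappa is lower than raw accuracy here because part of the observed agreement is expected by chance given how often each label occurs.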
Results
Description of images chosen for analysis
In total, 7777 pairs of CFP and OCT images were assigned to 16 ophthalmologists in five rounds. Cases labeled with nine retinal diseases including diabetic retinopathy (DR, 2118 cases), retinal detachment (RD, 121 cases), retinal vein occlusion (RVO, 886 cases), dry age-related macular degeneration (dAMD, 549 cases), wet AMD (wAMD, 1023 cases), epiretinal membrane (ERM, 1061 cases), central serous retinopathy (CSC, 150 cases), macular schisis (MS, 128 cases), macular hole (MH, 86 cases) and normal fundus (1036 cases) were included in the final analysis (Table 1).
The increase of accuracy in OCT/CFP bimodal diagnosis
The average bimodal diagnostic accuracy of the 16 ophthalmologists for the nine retinal diseases and normal fundus increased over the five annotation rounds (p = 0.013) (Fig. 1A and B) (Table 2). More specifically, the diagnostic accuracy for RD (p = 0.028), wAMD (p = 0.009), MS (p < 0.001), and MH (p = 0.036) improved as the number of rounds increased. The correlation between disease accuracy and ophthalmology study duration was further analyzed. The average bimodal diagnostic accuracy for the nine retinal diseases and normal fundus was higher in ophthalmologists with longer study duration. Furthermore, this correlation was significant for RD (p = 0.031), CSC (p = 0.008) and MH (p = 0.029).
The increase of accuracy in OCT single modal diagnosis
The correlation of OCT single-modal diagnostic accuracy with annotation rounds and with ophthalmology study duration was further analyzed (Table 2). The average accuracy of OCT-only diagnosis among the 16 ophthalmologists improved over the five rounds (p = 0.028) (Fig. 1A and C). Moreover, the OCT single-modal diagnostic accuracy for wAMD (p = 0.005) and MS (p < 0.001) improved significantly as the number of rounds increased. In OCT diagnosis, the average accuracy (p = 0.001) and the accuracy for wAMD (p = 0.02), ERM (p = 0.003), CSC (p = 0.027), and MS (p = 0.001) increased with longer study duration.
The increase of accuracy in CFP single modal diagnosis
For CFP single-modal diagnosis (Table 2), the average accuracy (p = 0.021) (Fig. 1A and D) and the accuracy for RD (p = 0.01), wAMD (p = 0.002), and ERM (p = 0.005) increased significantly as the number of rounds increased. Moreover, the average accuracy (p = 0.003) and the accuracy for DR (p = 0.009), RD (p = 0.008), RVO (p = 0.013), dAMD (p = 0.016), wAMD (p = 0.01), ERM (p = 0.048), and CSC (p = 0.003) in CFP single-modal diagnosis were significantly correlated with ophthalmology study duration.
The auxiliary diagnosis capability of CFP and OCT in case diagnosis
The consistency between the CFP or OCT diagnosis and the final case diagnosis was measured with the kappa statistic and taken as the auxiliary diagnostic capability of each modality (Table 3). OCT performed much better than CFP in the diagnosis of RD (0.895 vs. 0.618) (Fig. 2A), ERM (0.966 vs. 0.520), MS (0.925 vs. 0.000) and MH (0.970 vs. 0.559) (Fig. 2B). Conversely, CFP performed better than OCT in the diagnosis of normal fundus (0.914 vs. 0.594), RVO (0.983 vs. 0.000) (Fig. 2C), and DR (0.988 vs. 0.000) (Fig. 2D).
Typical images with different labels in OCT and CFP. A1 and A2: CFP labeled with wAMD, OCT labeled with wAMD and RD; B1 and B2: CFP labeled with ERM, OCT labeled with ERM and MH; C1 and C2: CFP labeled with BRVO, OCT labeled with normal fundus; D1 and D2: CFP labeled with DR, OCT labeled with normal fundus
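To make the per-disease kappa comparison above more concrete, the sketch below computes Cohen's kappa on a "disease present or absent" reduction of the labels. The paper does not state how the per-disease values were derived, so this one-vs-rest reduction, the function name, and the example labels are all assumptions. Under that assumption, the sketch shows one way a kappa of exactly zero can arise: a modality that never assigns a given label agrees with the final diagnosis no more often than chance predicts.

```python
def binary_kappa(single_modal, final, disease):
    """Cohen's kappa after reducing both label lists to 'is it this disease?'."""
    a = [dx == disease for dx in single_modal]
    b = [dx == disease for dx in final]
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("need two non-empty label lists of equal length")
    # Observed agreement on present/absent.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from the two marginal 'present' rates.
    pa, pb = sum(a) / n, sum(b) / n
    p_e = pa * pb + (1 - pa) * (1 - pb)
    if p_e == 1.0:
        return 1.0  # kappa undefined when both lists are constant and equal
    return (p_o - p_e) / (1 - p_e)


# Made-up labels for illustration only; these are not data from the study.
final_dx = ["DR", "DR", "normal", "normal"]
cfp_dx = ["DR", "DR", "normal", "normal"]          # CFP matches the final label
oct_dx = ["normal", "normal", "normal", "normal"]  # OCT never assigns DR

kappa_cfp = binary_kappa(cfp_dx, final_dx, "DR")
kappa_oct = binary_kappa(oct_dx, final_dx, "DR")
```

In this toy case the CFP labels give a kappa of 1 and the OCT labels give a kappa of 0, even though the OCT labels still match the final diagnosis in half of the cases.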
Discussion
China is confronting the challenge of an aging population, with 18.7% of its populace aged 60 years or older as of 2021 [6]. An aging society tends to experience a heightened burden of non-communicable diseases [7]. Numerous retinal diseases are associated with older age, including diabetic retinopathy [8], retinal vein occlusion [9], age related macular degeneration [10], idiopathic epiretinal membrane [11], and idiopathic macular hole [12]. This demographic shift is likely to result in a significant shortage of ophthalmologists, underscoring the critical importance and urgency of enhancing the ophthalmology training process.
As non-invasive imaging techniques, OCT and CFP have become crucial for the early detection of retinal pathological changes and for guiding personalized treatment strategies. OCT offers high-resolution, cross-sectional imaging of the retina and optic disc [13], while CFP captures high-resolution images of the retina, optic disc, and blood vessels with a wide field of view. The integration of OCT and CFP provides a more comprehensive understanding of retinal diseases.
In our multicenter, bimodal, multi-disease study, we aimed to elucidate factors that improve diagnostic accuracy among ophthalmologists. An analysis of 7,777 pairs of OCT/CFP images was conducted to assess the diagnostic accuracy for nine prevalent retinal diseases and normal fundus. The results indicate that diagnostic accuracy improves with more rounds of image labeling, and that it is significantly influenced by the duration of ophthalmology training, the imaging modality used (OCT/CFP combination, OCT alone, or CFP alone), and the specific retinal disease being assessed. These findings may have implications for the teaching of young ophthalmologists.
Under bimodal conditions, the average accuracy improved as the number of rounds increased, which demonstrates the training efficacy of our reading label system. This improvement was particularly pronounced for RD, wAMD, MS, and MH, suggesting a significant teaching effect of the system for these four diseases. As anticipated, a strong correlation was observed between average accuracy and the duration of ophthalmology study. Notably, the correlation between accuracy and study duration was more pronounced for DR, ERM, and MS, indicating that recognizing these three diseases may require a more extended learning process. In the CFP single-modal setting, the average accuracy and the accuracy for most retinal diseases, including DR, RVO, dAMD, wAMD, ERM, and CSC, were correlated with the duration of study. Compared to OCT, CFP may necessitate a longer study period.
The consistency between single-modal and bimodal diagnoses was evaluated using the kappa statistic. In diagnosing RD, ERM, MS and MH, OCT proved to be a more reliable modality than CFP. The pathological changes in ERM, MS, and MH were more distinctly observed in the cross-sectional retinal images provided by OCT. Traditionally, CFP and ophthalmoscopy have been regarded as appropriate methods for detecting RD. However, OCT offers enhanced visualization of neuroepithelial layer detachment and can even measure the distance between the neuroepithelial and retinal pigment epithelium (RPE) layers [14]. Detecting RD using CFP requires greater clinical expertise from annotators than detecting it using OCT. In the single-modal CFP approach, the accuracy of RD diagnosis improved with more annotation rounds and was significantly correlated with the duration of study, indicating that the interpretation of CFP is highly reliant on clinical experience.
AI has become a powerful tool in medical education, providing unique opportunities to improve the learning process [15]. AI could make medical education more personalized and efficient, and has been widely used in undergraduate medical education [16]. For instance, generative AI could enhance learning experiences by providing personalized instruction, creating virtual patient scenarios for clinical practice, and generating educational content [17]. In radiology, AI could provide exposure to a higher volume of imaging studies than traditional radiologist training, and help triage and sort through massive numbers of medical images [18]. Like radiology, ophthalmology depends heavily on the interpretation of clinical images, such as OCT and CFP. Several AI applications with portable and accessible interactive online systems have been developed to improve the radiology learning process, and these could provide more precise education according to learners’ preferred styles and needs [19]. Our study indicated the training effect of the AI reading label system and provided some information about the learning curve and characteristics of nine primary retinal diseases. In the future, more reliable ophthalmology AI learning systems and applications with high-quality, meticulously annotated clinical images might help with the training of ophthalmologists.
The strength of this study lies in its multicenter, bimodal, and multi-disease design. The participation of ophthalmologists from four teaching hospitals enhances the generalizability of our findings within the context of medical education. Furthermore, the bimodal and multi-disease approach enriches the educational process by providing a broader spectrum of information. Our study demonstrates the effectiveness of the AI reading label system in diagnostic accuracy improvement. Given the rapid advancements in AI technology, the training efficacy of the AI reading label system is likely to become increasingly significant in the education of ophthalmologists.
Conclusion
The AI reading label system contributes to improving the diagnostic accuracy of retinal diseases among ophthalmologists. It holds potential for widespread application in future medical education in ophthalmology.
Data availability
The datasets generated and/or analyzed during the current study are not publicly available due to limitations of ethical approval involving the patient data and anonymity but are available from the corresponding author on reasonable request.
Abbreviations
- AI: Artificial intelligence
- CFP: Color fundus photography
- CSC: Central serous retinopathy
- dAMD: Dry age-related macular degeneration
- DR: Diabetic retinopathy
- ERM: Epiretinal membrane
- GEE: Generalized estimating equations
- MH: Macular hole
- MS: Macular schisis
- OCT: Optical coherence tomography
- RD: Retinal detachment
- RPE: Retinal pigment epithelium
- RVO: Retinal vein occlusion
- SOP: Standard operating procedure
- SS-OCT: Swept-source OCT
- wAMD: Wet age-related macular degeneration
References
1. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, et al. A survey on deep learning in medical image analysis. Med Image Anal. 2017;42:60–88.
2. Li JO, Liu H, Ting DSJ, Jeon S, Chan RVP, Kim JE, et al. Digital technology, tele-medicine and artificial intelligence in ophthalmology: A global perspective. Prog Retin Eye Res. 2021;82:100900.
3. Hasei J, Nakahara R, Otsuka Y, Nakamura Y, Hironari T, Kahara N, et al. High-quality expert annotations enhance artificial intelligence model accuracy for osteosarcoma X-ray diagnosis. Cancer Sci. 2024;115(11):3695–704.
4. Wang S, He X, Jian Z, Li J, Xu C, Chen Y, et al. Advances and prospects of multi-modal ophthalmic artificial intelligence based on deep learning: a review. Eye Vis (Lond). 2024;11(1):38.
5. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960;20(1):37–46.
6. China NBOSO. Communiqué of the Seventh National Population Census (No. 5). 2021.
7. Yip W, Fu H, Chen AT, Zhai T, Jian W, Xu R, et al. 10 Years of health-care reform in China: progress and gaps in universal health coverage. Lancet. 2019;394(10204):1192–204.
8. Cioana M, Deng J, Nadarajah A, Hou M, Qiu Y, Chen SSJ, et al. Global prevalence of diabetic retinopathy in pediatric type 2 diabetes: A systematic review and meta-analysis. JAMA Netw Open. 2023;6(3):e231887.
9. Scott IU, Campochiaro PA, Newman NJ, Biousse V. Retinal vascular occlusions. Lancet. 2020;396(10266):1927–40.
10. Guymer RH, Campbell TG. Age-related macular degeneration. Lancet. 2023;401(10386):1459–72.
11. Fung AT, Galvin J, Tran T. Epiretinal membrane: A review. Clin Exp Ophthalmol. 2021;49(3):289–308.
12. Flaxel CJ, Adelman RA, Bailey ST, Fawzi A, Lim JI, Vemulakonda GA, et al. Idiopathic macular hole preferred practice pattern®. Ophthalmology. 2020;127(2):P184–222.
13. Pandya BU, Grinton M, Mandelcorn ED, Felfeli T. Retinal optical coherence tomography imaging biomarkers: A review of the literature. Retina. 2024;44(3):369–80.
14. Akpinar MH, Sengur A, Faust O, Tong L, Molinari F, Acharya UR. Artificial intelligence in retinal screening using OCT images: A review of the last decade (2013–2023). Comput Methods Programs Biomed. 2024;254:108253.
15. Alam F, Lim MA, Zulkipli IN. Integrating AI in medical education: embracing ethical usage and critical understanding. Front Med (Lausanne). 2023;10:1279707.
16. Lee J, Wu AS, Li D, Kulasegaram KM. Artificial intelligence in undergraduate medical education: A scoping review. Acad Med. 2021;96(11S):S62–70.
17. Boscardin CK, Gin B, Golde PB, Hauer KE. ChatGPT and generative artificial intelligence for medical education: potential impact and opportunity. Acad Med. 2024;99(1):22–7.
18. Fischetti C, Bhatter P, Frisch E, Sidhu A, Helmy M, Lungren M, et al. The evolving importance of artificial intelligence and radiology in medical trainee education. Acad Radiol. 2022;29(Suppl 5):S70–5.
19. Duong MT, Rauschecker AM, Rudie JD, Chen PH, Cook TS, Bryan RN, et al. Artificial intelligence for precision education in radiology. Br J Radiol. 2019;92(1103):20190389.
Acknowledgements
The authors would like to extend special thanks to Jingyuan Yang, Shi Feng, Zhangwanyu Wei, Yishuang Mao, Xufeng Zhao, and Zhiyan Xu from Peking Union Medical College Hospital; Chunli Liu, Congcong Yang, Mengmeng Yu, Tong Su, Leyu Lv, and Qian Wang from Eye Hospital of Shandong First Medical University; Siqi Li and Xiaotong Zhang from the First Affiliated Hospital; and Zejuan Song and Yan Suo from Xi’an No. 1 Hospital for participating in the annotation process.
Funding
Our study was funded by the National High Level Hospital Clinical Research Funding (2022-PUMCH-C-061).
Author information
Contributions
M.W. interpreted the data and drafted the manuscript. X.Z., X.G., T.S., H.F., G.D., C.L., and B.W. checked annotation results and provided the standard diagnosis. X.Z. and D.L. helped with image collection. Q.W. helped develop the statistical method and analyze the data. C.Z. provided the AI reading label system. C.W. and W.Y. designed the study and revised the final manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Our study adhered to the tenets of the Declaration of Helsinki. Ethics approval was obtained from the PUMC Hospital Institutional Review Board (K3606). The PUMC Hospital Institutional Review Board approved the waiver of informed consent for participants in this study.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, M., Zhang, X., Li, D. et al. The potential of artificial intelligence reading label system on the training of ophthalmologists in retinal diseases, a multicenter bimodal multi-disease study. BMC Med Educ 25, 503 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12909-025-07066-1
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12909-025-07066-1