Evaluation of a Natural Language Processing Model to Identify and Characterize Patients in the United States With High-Risk Non-Muscle-Invasive Bladder Cancer.

Submitted by chb4040 on April 23, 2025 - 5:19pm

Title	Evaluation of a Natural Language Processing Model to Identify and Characterize Patients in the United States With High-Risk Non-Muscle-Invasive Bladder Cancer.
Publication Type	Journal Article
Year of Publication	2023
Authors	Narayan VM, Siolas D, Meadows ES, Turzhitsky V, Sillah A, Imai K, McMurry AJ, Li H
Journal	JCO Clin Cancer Inform
Volume	7
Pagination	e2300096
Date Published	2023 Sep
ISSN	2473-4276
Keywords	Adult, Cohort Studies, Humans, Male, Natural Language Processing, Non-Muscle Invasive Bladder Neoplasms, Retrospective Studies, United States, Urinary Bladder Neoplasms
Abstract	PURPOSE: Treatment of non-muscle-invasive bladder cancer (NMIBC) is guided by risk stratification using clinical and pathologic criteria. This study aimed to develop a natural language processing (NLP) model for identifying patients with high-risk NMIBC retrospectively from unstructured electronic medical records (EMRs) and to apply the model to describe patient and tumor characteristics. METHODS: We used three independent EMR-derived data sets including adult patients with a bladder cancer diagnosis in 2011-2020 for NLP model development and training (n = 140), validation (n = 697), and application for the retrospective cohort analysis (n = 4,402). Deep learning methods were used to train NLP recognition of medical chart terminology to identify seven high-risk NMIBC criteria; model performance was assessed using the F1 score, weighted across features. An algorithm was then used to classify each patient as high-risk NMIBC (yes/no). Manually reviewed records served as the gold standard. RESULTS: The F1 scores after model training were >0.7 for all but one uncommon feature (prostatic urethral involvement). The highest area under the receiver operating characteristic curve (AUC) was observed for Ta (0.897) and T1 (0.897); the lowest AUC was for carcinoma in situ (CIS; 0.617). For high-risk NMIBC classification, positive predictive value was 79.4%, negative predictive value was 93.2%, and false-positive rate was 8.9%. Sensitivity and specificity were 83.7% and 91.1%, respectively. Of 748 patients manually confirmed as having high-risk NMIBC, 196 (26%) had CIS (of whom 19% also had T1 and 23% also had Ta disease); 552 tumors (74%) had no associated CIS. CONCLUSION: The NLP model, combined with a rule-based algorithm, identified high-risk NMIBC with good performance and will enable future work to study real-world treatment patterns and clinical outcomes for high-risk NMIBC.
DOI	10.1200/CCI.23.00096
Alternate Journal	JCO Clin Cancer Inform
PubMed ID	37906722
PubMed Central ID	PMC10642898
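
Note on the reported metrics: the abstract gives positive predictive value, negative predictive value, false-positive rate, sensitivity, and specificity for the binary high-risk NMIBC classification, with manual review as the gold standard. The sketch below shows how these quantities are conventionally derived from a 2x2 confusion matrix. It is not the authors' code. The names `ConfusionCounts` and `binary_metrics` and the counts used are hypothetical, chosen only for illustration, since the abstract does not report the underlying counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing model output against a manual gold standard."""
    tp: int  # model says high-risk, reviewer agrees
    fp: int  # model says high-risk, reviewer disagrees
    tn: int  # model says not high-risk, reviewer agrees
    fn: int  # model says not high-risk, reviewer disagrees


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard definitions of the metrics named in the abstract."""
    sensitivity = c.tp / (c.tp + c.fn)   # recall among true high-risk cases
    specificity = c.tn / (c.tn + c.fp)   # recall among true non-high-risk cases
    ppv = c.tp / (c.tp + c.fp)           # precision of a positive call
    npv = c.tn / (c.tn + c.fn)           # precision of a negative call
    fpr = c.fp / (c.fp + c.tn)           # equals 1 - specificity
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "fpr": fpr,
        "f1": f1,
    }


# Hypothetical counts for illustration only; NOT taken from the study.
counts = ConfusionCounts(tp=80, fp=20, tn=180, fn=20)
metrics = binary_metrics(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

By definition the false-positive rate is the complement of specificity, which agrees with the reported pair of 8.9% and 91.1%.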