Identifying non-small cell lung cancer patients from a cohort of heterogeneous lung cancer patients using boosted trees on electronic health records data
The ability to distinguish between subtypes of lung cancer (LC) is important for clinical outcomes research and cost analyses, but this information is seldom captured in structured electronic health record (EHR) data. The objective of this study was to develop and validate a machine learning model based on boosted trees to identify non-small cell lung cancer (NSCLC) patients within a cohort of heterogeneous LC patients using de-identified retrospective EHR data. The study found that machine learning methods, applied to structured EHR features and trained against a curated gold standard, can produce reliable indicators of clinical status in NSCLC. Compared with expert manual curation, this approach could save substantial time and effort by quickly identifying patients for retrospective outcomes and cost studies.