Int J Med Inform. 2020 Oct 22;145:104308. doi: 10.1016/j.ijmedinf.2020.104308. Online ahead of print.
BACKGROUND AND OBJECTIVE: Identifying and standardizing the data elements used in clinical trials may reduce cost and errors in trial operations and enable seamless data exchange between electronic data capture (EDC) systems and electronic health record (EHR) systems. This study presents a methodology for comprehensively capturing clinical trial data element needs.
MATERIALS AND METHODS: Case report forms (CRFs) used for clinical trial data collection served as an approximation of clinical information needs. Each information need was mapped to a semantically equivalent field in an existing FHIR cancer profile. Items without a semantically equivalent field were treated as information needs that current standards cannot represent, and we proposed extensions to support them.
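The mapping step described above can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the lookup table `KNOWN_FIELDS`, the `MappedItem` type, and the functions `map_crf_item` and `split_items` are hypothetical names introduced here, and semantic equivalence is reduced to a simple label lookup, which is an assumption made only to keep the example short.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical lookup table: normalized CRF item label -> FHIR element path.
# The entries are illustrative placeholders, not the mapping used in the study.
KNOWN_FIELDS = {
    "date of birth": "Patient.birthDate",
    "gender": "Patient.gender",
}


@dataclass
class MappedItem:
    crf_item: str                # label as it appears on the CRF
    fhir_path: Optional[str]     # matching FHIR field, if one was found
    needs_extension: bool        # True when no equivalent field exists


def map_crf_item(label: str) -> MappedItem:
    """Look up a CRF item; flag it for an extension when nothing matches."""
    path = KNOWN_FIELDS.get(label.strip().lower())
    return MappedItem(label, path, path is None)


def split_items(labels: List[str]) -> Tuple[List[MappedItem], List[MappedItem]]:
    """Separate items covered by existing fields from those needing extensions."""
    mapped = [map_crf_item(label) for label in labels]
    covered = [m for m in mapped if not m.needs_extension]
    gaps = [m for m in mapped if m.needs_extension]
    return covered, gaps
```

In this sketch, the `gaps` list corresponds to the items that would be candidates for proposed extensions.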
RESULTS: We identified 62 discrete items from a preliminary survey of 43 base questions in four CRFs used in colorectal cancer clinical trials, of which 28 items are modeled with FHIR extensions and their associated responses for colorectal cancer. In populating the CRFs with data, we achieved an average precision of 98.5%, recall of 96.2%, and F-measure of 96.8% across all base questions. We also demonstrated that the auto-filled answers in CRFs can be used to discover patient subgroups using a topic modeling approach.
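For readers unfamiliar with these metrics, the following sketch shows how precision, recall, and F-measure can be computed per question and then averaged. The abstract does not state how the averages were formed; the per-question (macro) averaging shown here is an assumption, and the function names `prf` and `macro_average` are hypothetical.

```python
from typing import List, Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def macro_average(counts: List[Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Average each metric over questions (assumed averaging scheme)."""
    scores = [prf(tp, fp, fn) for tp, fp, fn in counts]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))
```

Under per-question averaging, the averaged F-measure is not in general equal to the harmonic mean of the averaged precision and averaged recall, since each question contributes its own F-measure.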
CONCLUSION: CRFs can be considered a proxy for the information needs of their respective cancer types. Mining these information needs can serve as a valuable resource for expanding existing standards so that they comprehensively represent relevant clinical data without loss of granularity.