Viewing Study NCT04849195


Ignite Creation Date: 2025-12-24 @ 1:29 PM
Ignite Modification Date: 2026-01-05 @ 2:58 AM
Study NCT ID: NCT04849195
Status: UNKNOWN
Last Update Posted: 2021-04-19
First Post: 2021-04-13
Is NOT Gene Therapy: True
Has Adverse Events: False

Brief Title: Comparison of Different Feature Engineering Methods for Automated ICD Coding
Sponsor: China National Center for Cardiovascular Diseases
Organization:

Study Overview

Official Title: Comparison of Different Feature Engineering Methods for Automated ICD Coding
Status: UNKNOWN
Status Verified Date: 2021-04
Last Known Status: ACTIVE_NOT_RECRUITING
Delayed Posting: No
If Stopped, Why?: Not Stopped
Has Expanded Access: False
If Expanded Access, NCT#: N/A
Has Expanded Access, NCT# Status: N/A
Acronym: None
Brief Summary: Using traditional machine learning classifiers, this study compares bag-of-words, word2vec, and RoBERTa features for automated ICD coding of cardiovascular diseases in a Chinese corpus.
Detailed Description: ICD coding is important because it serves as the basis for a wide range of economic and academic applications. Coding is currently done mainly by hand, which is time-consuming and error-prone, and this has made automated ICD coding via machine learning an active research topic.

Feature engineering is a necessary phase of machine learning and plays a crucial role in achieving good coding performance. Although existing studies have reached informative conclusions, they have not compared different feature engineering methods. Identifying which methods perform better under which circumstances would help promote practical applications of automated coding.

The investigators will base this study on inpatient data collected from electronic medical records at Fuwai Hospital, the world's largest medical center for cardiovascular disease. Bag-of-words, word2vec, and RoBERTa will each be used to extract features from the training data. Code-wise logistic regression classifiers and support vector machine classifiers will then be trained to assign codes automatically. Finally, the performance of the models on test data will be evaluated.
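
As an illustration only, the pipeline described above (feature extraction, then one binary classifier per code, then evaluation) might be sketched as follows for the bag-of-words and logistic regression combination. This is a minimal sketch and not the investigators' implementation: the toy pre-tokenized notes, the three ICD-10 codes, the gradient-descent settings, the 0.5 decision threshold, and the choice of micro-averaged F1 as the metric are all assumptions made for the example, and every function name is hypothetical. Real Chinese text would first need word segmentation, which is omitted here.

```python
import math
from collections import Counter

def build_vocab(docs):
    """Map each distinct training token to a feature column index."""
    vocab = {}
    for tokens in docs:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def bow_vector(tokens, vocab):
    """Bag-of-words: raw token counts; tokens outside the vocabulary are dropped."""
    counts = Counter(t for t in tokens if t in vocab)
    vec = [0.0] * len(vocab)
    for tok, c in counts.items():
        vec[vocab[tok]] = float(c)
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(x, w, b):
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_binary_lr(X, y, epochs=500, lr=0.5):
    """Batch gradient descent on the mean logistic loss for a single code."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(xi, w, b)) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def train_code_wise(X, label_sets, codes):
    """Code-wise training: one independent binary classifier per ICD code."""
    return {c: train_binary_lr(X, [1.0 if c in s else 0.0 for s in label_sets])
            for c in codes}

def predict_codes(x, models, threshold=0.5):
    """Assign every code whose classifier probability reaches the threshold."""
    return {c for c, (w, b) in models.items() if sigmoid(score(x, w, b)) >= threshold}

def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over all (note, code) decisions (assumed metric)."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Hypothetical toy notes (already tokenized) with illustrative ICD-10 labels.
train_docs = [
    ["chest", "pain", "st", "elevation"],
    ["high", "blood", "pressure"],
    ["chest", "pain", "high", "blood", "pressure"],
    ["dyspnea", "edema"],
    ["dyspnea", "edema", "high", "blood", "pressure"],
]
train_labels = [{"I21"}, {"I10"}, {"I21", "I10"}, {"I50"}, {"I50", "I10"}]
codes = ["I10", "I21", "I50"]

vocab = build_vocab(train_docs)
X_train = [bow_vector(d, vocab) for d in train_docs]
models = train_code_wise(X_train, train_labels, codes)

train_preds = [predict_codes(x, models) for x in X_train]
new_note_pred = predict_codes(bow_vector(["high", "blood", "pressure", "headache"], vocab), models)
print(train_preds, new_note_pred, micro_f1(train_labels, train_preds))
```

Swapping in word2vec or RoBERTa features would only change how each note is turned into a vector; the code-wise classifiers and the evaluation step would stay the same, which is what makes a like-for-like comparison of feature engineering methods possible. A support vector machine could replace `train_binary_lr` in the same per-code loop.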

Study Oversight

Has Oversight DMC: None
Is a FDA Regulated Drug?: False
Is a FDA Regulated Device?: False
Is an Unapproved Device?: None
Is a PPSD?: None
Is a US Export?: None
Is an FDA AA801 Violation?: