TY - JOUR
T1 - Explainable Machine Learning for Early Assessment of COVID-19 Risk Prediction in Emergency Departments
AU - Casiraghi, Elena
AU - Malchiodi, Dario
AU - Trucco, Gabriella
AU - Frasca, Marco
AU - Cappelletti, Luca
AU - Fontana, Tommaso
AU - Esposito, Alessandro Andrea
AU - Avola, Emanuele
AU - Jachetti, Alessandro
AU - Reese, Justin
AU - Rizzi, Alessandro
AU - Robinson, Peter N.
AU - Valentini, Giorgio
N1 - Funding Information:
This work was supported in part by the Università degli Studi di Milano through the Piano di Sostegno alla ricerca 2019 Grant.
Publisher Copyright:
© 2013 IEEE.
PY - 2020
Y1 - 2020
N2 - Between January and October of 2020, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus infected more than 34 million persons in a worldwide pandemic that led to over one million deaths (data from Johns Hopkins University). Since the virus began to spread, emergency departments were busy with COVID-19 patients for whom a quick decision regarding in- or outpatient care was required. The virus can cause characteristic abnormalities in chest radiographs (CXR), but, due to the low sensitivity of CXR, additional variables and criteria are needed to accurately predict risk. Here, we describe a computerized system primarily aimed at extracting the most relevant radiological, clinical, and laboratory variables for improving patient risk prediction, and secondarily at presenting an explainable machine learning system, which may provide simple decision criteria to be used by clinicians as a support for assessing patient risk. To achieve robust and reliable variable selection, Boruta and Random Forest (RF) are combined in a 10-fold cross-validation scheme to produce a variable importance estimate not biased by the presence of surrogates. The most important variables are then selected to train an RF classifier, whose rules may be extracted, simplified, and pruned to finally build an associative tree, particularly appealing for its simplicity. Results show that the radiological score automatically computed through a neural network is highly correlated with the score computed by radiologists, and that laboratory variables, together with the number of comorbidities, aid risk prediction. The prediction performance of our approach was compared to that of generalized linear models and shown to be effective and robust. The proposed machine learning-based computational system can be easily deployed and used in emergency departments for rapid and accurate risk prediction in COVID-19 patients.
AB - Between January and October of 2020, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus infected more than 34 million persons in a worldwide pandemic that led to over one million deaths (data from Johns Hopkins University). Since the virus began to spread, emergency departments were busy with COVID-19 patients for whom a quick decision regarding in- or outpatient care was required. The virus can cause characteristic abnormalities in chest radiographs (CXR), but, due to the low sensitivity of CXR, additional variables and criteria are needed to accurately predict risk. Here, we describe a computerized system primarily aimed at extracting the most relevant radiological, clinical, and laboratory variables for improving patient risk prediction, and secondarily at presenting an explainable machine learning system, which may provide simple decision criteria to be used by clinicians as a support for assessing patient risk. To achieve robust and reliable variable selection, Boruta and Random Forest (RF) are combined in a 10-fold cross-validation scheme to produce a variable importance estimate not biased by the presence of surrogates. The most important variables are then selected to train an RF classifier, whose rules may be extracted, simplified, and pruned to finally build an associative tree, particularly appealing for its simplicity. Results show that the radiological score automatically computed through a neural network is highly correlated with the score computed by radiologists, and that laboratory variables, together with the number of comorbidities, aid risk prediction. The prediction performance of our approach was compared to that of generalized linear models and shown to be effective and robust. The proposed machine learning-based computational system can be easily deployed and used in emergency departments for rapid and accurate risk prediction in COVID-19 patients.
KW - Associative tree
KW - Boruta feature selection
KW - clinical data analysis
KW - COVID-19
KW - generalized linear models
KW - missing data imputation
KW - random forest classifier
KW - risk prediction
UR - http://www.scopus.com/inward/record.url?scp=85096218342&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85096218342&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2020.3034032
DO - 10.1109/ACCESS.2020.3034032
M3 - Article
AN - SCOPUS:85096218342
SN - 2169-3536
VL - 8
SP - 196299
EP - 196325
JO - IEEE Access
JF - IEEE Access
M1 - 9239931
ER -