Systematic evaluation of machine learning classification models to predict growth faltering in preterm infants

PAS 2021 Virtual

April 30 – May 4, 2021




Growth faltering (GF) in preterm infants, which is associated with undernutrition, clinical disorders, and treatments, leads to poor neurodevelopmental outcomes.

We hypothesized that we could identify infants at risk of GF earlier in their neonatal intensive care unit (NICU) stay based on clinical and feeding data to allow for early nutrition interventions that would produce better outcomes.

We developed machine learning classification models to predict GF at discharge (defined as a birth-to-discharge weight z-score decline of ≥ 1.2). To improve our models, we developed and tested imputation methods to overcome differences in data collection across sites. To support clinically actionable interventions, we trained our prediction models for three time windows starting at birth: 1) birth, 2) two weeks, and 3) one month.
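The outcome definition above can be illustrated with a minimal sketch. The `Infant` class, its field names, and the example z-scores are hypothetical and are not taken from the study data; only the 1.2 threshold comes from the abstract.

```python
from dataclasses import dataclass

GF_THRESHOLD = 1.2  # z-score decline threshold stated in the abstract


@dataclass
class Infant:
    # Hypothetical field names: weight z-scores at birth and at discharge.
    birth_weight_z: float
    discharge_weight_z: float


def has_growth_faltering(infant: Infant, threshold: float = GF_THRESHOLD) -> bool:
    """Return True when the birth-to-discharge z-score decline meets the threshold."""
    decline = infant.birth_weight_z - infant.discharge_weight_z
    return decline >= threshold


# Made-up example values, for illustration only.
cohort = [Infant(-0.2, -1.6), Infant(0.1, -0.5), Infant(-1.0, -1.9)]
labels = [has_growth_faltering(infant) for infant in cohort]
```

Here the first infant declines by about 1.4 z-score units and would be labeled GF, while the other two decline by less than 1.2 and would be labeled GN.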

Our longitudinal dataset consists of 357 infants from three NICUs, including infants with normal growth (GN, n=246) and infants with GF (n=111). The dataset includes infant and maternal characteristics and longitudinal clinical, feeding, medication, and probiotics data. We trained models using several machine learning approaches, including random forest and logistic regression (with and without imputation of missing values), after randomly splitting the dataset into 80% training and 20% test sets. Model performance was assessed systematically through 5-fold cross-validation on the training set and, after model selection, on the test set. Feature selection was used to identify the most informative features for prediction. Our three models were then validated on an independent cohort of 135 preterm infants at a new site with a different GF profile from the training set, including fewer infants with GN (n=54) than with GF (n=81).
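The splitting scheme described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline: the function names, the seed, and the interleaved fold assignment are hypothetical choices, and the abstract does not say which library or fold strategy was used.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and split them into (train, test) lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility (assumed)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (fit, validate) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i in range(k):
        validate = folds[i]
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validate


# 357 infants, as in the abstract; the held-out test indices are never
# touched during cross-validation on the training indices.
train_idx, test_idx = train_test_split_indices(357)
fold_sizes = [len(validate) for _, validate in k_fold_indices(train_idx)]
```

With 357 infants, an 80/20 split gives 286 training and 71 test infants, which agrees with the Training (91 + 195) and Test (20 + 51) row totals in Table 1.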

Results show that the logistic regression (LR) classifier outperforms random forest classifiers for predicting GF and that data imputation improves model performance. The LR classifier performs favorably on the independent validation set for all three time windows. See Table 1 for model performance.

In conclusion, we performed a systematic evaluation of classifiers to predict GF in preterm infants within the first month of life and found that LR with imputation performs best and a subset of the features provides adequate accuracy. To determine generalizability to other preterm patient populations and clinical sites, models were validated on an independent cohort, demonstrating applicability to inform clinical practice.

Table 1: LR Performance across Training, Test and Validation Data Sets


Dataset      Number of Infants   Sensitivity                   Accuracy                      AUC-ROC
             GF     GN           Birth   2 weeks   One month   Birth   2 weeks   One month   Birth   2 weeks   One month
Training     91     195          0.68    0.72      0.71        0.61    0.66      0.62        0.64    0.72      0.68
Test         20     51           0.70    0.80      0.80        0.70    0.66      0.68        0.75    0.72      0.76
Validation   81     54           0.59    0.46      0.76        0.62    0.58      0.66        0.71    0.64      0.70
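For readers less familiar with the three metrics reported in Table 1, the following sketch shows how each is computed for a binary classifier. The labels and scores are toy values invented for illustration, and the 0.5 decision threshold is an assumption; neither comes from the study.

```python
def sensitivity(y_true, y_pred):
    """Fraction of true GF cases (label 1) that are predicted as GF."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Fraction of all infants whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def auc_roc(y_true, scores):
    """Probability that a random GF case scores above a random GN case (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = GF, 0 = GN; scores are hypothetical predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

toy_sensitivity = sensitivity(y_true, y_pred)
toy_accuracy = accuracy(y_true, y_pred)
toy_auc = auc_roc(y_true, scores)
```

Sensitivity and accuracy depend on the chosen threshold, whereas AUC-ROC depends only on how the scores rank GF cases against GN cases, which is why the three columns in the table can move independently.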


S. Xu¹, J. Lugo-Martinez¹, Z. Bar-Joseph¹, L. A. Parker², J. Neu², A. Tandon³, D. Genetti³, J. Levesque³, D. Gallagher³, T. Warren³

¹ Carnegie Mellon University, ² University of Florida, Gainesville, ³ Astarte Medical