The apnea-hypopnea index (AHI), the current severity metric used clinically for diagnosing obstructive sleep apnea (OSA), does not correlate well with daytime sleepiness measured via the Epworth Sleepiness Scale (ESS). Here, we assessed whether a machine-learned combination of possibly independent metrics across ventilatory/hypoxic/arousal domains would be better associated with ESS than the AHI, using data from 3 large cohorts. Polysomnography data were analyzed from The Sleep Heart Health Study (SHHS), The Multi-Ethnic Study of Atherosclerosis (MESA), and The Osteoporotic Fractures in Men (MrOS) Study. A total of N=6618 subjects (39.9% female; age 68.7±6.6) had valid data (ESS and good-quality airflow/EEG/SpO2). Ventilatory burden was evaluated using a derived flow signal based on the sum of the thoracic and abdominal effort signals for SHHS, and using the Nasal Cannula/Pressure Transducer signal for MrOS and MESA. Hypoxic burden was calculated as the area between the baseline and the SpO2 trace for any episode with ≥3% desaturation. Arousal burden was defined as the manually scored arousal index (arousals per hour). Sleepiness, the primary outcome, was coded as present or absent using a cut-off of ESS ≥ 10. Data were analyzed in two ways: by pooling all 3 cohorts and splitting them into training (70%) and test (30%) sets, and by permutations and combinations of the 3 cohorts (70/30 split; e.g., SHHS for training, MESA and MrOS for test). Model performance metrics were the area under the receiver operating characteristic curve (AUROC) and percent accuracy. For comparison, a logistic regression model using AHI3a (3% desaturation and/or EEG arousal) was fit. The logistic regression model (AHI3a) classified sleepiness with an AUROC of 0.51±0.07.
The random forest model trained on 70% of all 3 cohorts achieved the highest AUROC of 0.88±0.07 (mean accuracy of 85.1±2.13%), whereas the permutations and combinations of the 3 datasets resulted in an average AUROC of 0.63±0.12 (mean accuracy of 76.4±6.57%). The machine-learned combination of ventilatory/hypoxic/arousal burdens classified daytime sleepiness in OSA better than AHI3a across data from 3 large cohorts. These results suggest that OSA severity measured using a machine-learned combination of ventilatory/hypoxic/arousal burdens better explains the variability in daytime sleepiness.
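
As an illustration of the hypoxic burden definition above (area between the baseline and the SpO2 trace for episodes with ≥3% desaturation), a minimal sketch might look as follows. It is not the authors' implementation. The function name `hypoxic_burden`, the fixed scalar baseline, the rectangular integration, and the treatment of an "episode" as a contiguous run of samples below baseline are all assumptions made for illustration. The abstract does not state how the baseline was estimated or whether the area was normalized (for example, per hour of sleep), so the sketch returns the raw area in %·seconds.

```python
def hypoxic_burden(spo2, baseline, sample_period_s=1.0, min_desat=3.0):
    """Sum the area (%*s) between `baseline` and the SpO2 trace.

    Assumptions (not from the source): the baseline is a single fixed value,
    an episode is a contiguous run of samples below baseline, and an episode
    counts only if its deepest drop is at least `min_desat` percentage points.
    """
    total_area = 0.0
    episode_area = 0.0   # area accumulated in the current episode
    episode_depth = 0.0  # deepest drop seen in the current episode

    for value in spo2:
        drop = baseline - value
        if drop > 0:
            # Still below baseline: extend the current episode.
            episode_area += drop * sample_period_s
            episode_depth = max(episode_depth, drop)
        else:
            # Back at or above baseline: close the episode.
            if episode_depth >= min_desat:
                total_area += episode_area
            episode_area = 0.0
            episode_depth = 0.0

    # Close an episode that runs to the end of the recording.
    if episode_depth >= min_desat:
        total_area += episode_area
    return total_area


if __name__ == "__main__":
    # Toy 1 Hz trace: one 4% desaturation (counted) and one 1% dip (ignored).
    trace = [96, 95, 92, 93, 96, 95, 96]
    print(hypoxic_burden(trace, baseline=96.0))
```

In the toy trace, only the first episode reaches the 3% threshold, so the shallow 1% dip contributes nothing to the total.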
               