Automatic Sleep Scoring: Shaping the Future of Sleep Health

By María Sierra, Eduardo López Larraz
January 22, 2025

Sleep disorders have become a pressing global concern, with millions affected by conditions such as insomnia and obstructive sleep apnea. This post explores the advancements in automatic sleep scoring, an AI-driven innovation revolutionizing sleep health. By leveraging wearable devices and deep learning, this technology offers accessible, real-time solutions for sleep analysis. From personal health empowerment to enhancing telehealth, discover how automatic sleep scoring is shaping the future of sleep medicine.

Understanding the Importance and Challenges of Sleep

In today’s fast-paced world, sleep often takes a backseat to daily responsibilities. Yet, the importance of a good night's sleep cannot be overstated.

Sleep is not merely a passive state of rest; it is a complex biological process that plays a crucial role in our physical health, mental well-being, and overall quality of life. It is during sleep that our bodies engage in essential restorative processes, including memory consolidation, immune function enhancement, and emotional regulation. 

However, modern lifestyles, marked by stress, irregular schedules, and excessive screen time, have led to widespread sleep deprivation and related health issues. The statistics underscore the urgency of the problem: sleep disorders, ranging from insomnia to obstructive sleep apnea (OSA), are on the rise globally. OSA affects approximately 1 billion adults worldwide, and nearly one-third of the general population complains of insomnia (Ohayon, 2011). According to the World Health Organization, around 1 in 4 people experience sleep disorders at some point in their lives.

These rising numbers impose substantial burdens on individuals and healthcare systems alike. As the prevalence of sleep disorders increases, so does the urgency to develop effective and accessible solutions for assessing and managing sleep health.

Sleep EEG monitoring using polysomnography (PSG)

Traditionally, sleep analysis has relied on polysomnography (PSG), the gold standard for diagnosing sleep disorders, which is comprehensive but cumbersome. PSG requires multiple sensors to monitor various physiological signals, including:

  • Electroencephalography (EEG): Measures brain wave activity.
  • Electrooculography (EOG): Records eye movements.
  • Electromyography (EMG): Captures muscle activity.
  • Respiratory monitoring: Assesses airflow and respiratory effort.
  • Pulse oximetry: Measures pulse and blood oxygen levels.

This multi-sensor setup is not only time-consuming and expensive but also requires specialized equipment and trained personnel to interpret the data, leading to delays in diagnosis and treatment. The complexity of PSG creates a significant barrier to timely and accessible sleep health diagnostics. Consequently, there has been a growing demand for more attainable and efficient solutions that can provide accurate sleep assessments without the need for extensive laboratory setups.

Ikon wearable textile EEG headband

To address the limitations of the current sleep evaluation process, Bitbrain has developed a fully textile-based, wearable garment capable of measuring brain activity with medical-grade precision (López-Larraz et al., 2023). This cost-effective device tackles the main drawbacks of a complex and bulky PSG recording setup: the large number of sensors, the impact on the patient’s comfort during the night, the resources required, and the associated costs.

Wearable devices open the door to home sleep monitoring. However, their mass adoption raises new challenges. It generates a volume of EEG data too large for technicians to analyze and label in full. Furthermore, information is lost, since PSG signals that are also used to classify sleep (such as EOG, EMG or ECG) are unavailable. The recorded data also exhibit higher noise levels and more artifacts than PSG recordings, due to the simplified at-home setup and the absence of a professional during the recording.

All these challenges make sleep evaluation harder and must be overcome to offer a self-administered and large-scale sleep study tool in home environments under uncontrolled conditions. 

Automatic Sleep Scoring: A Game-Changing Innovation

As a result, automated analysis is needed to handle far more data recorded in a more challenging context.

To address these challenges, researchers are turning to technology. Automatic sleep scoring, powered by artificial intelligence (AI), has emerged as a groundbreaking solution. It leverages machine learning algorithms and wearable devices to analyze sleep data with unprecedented accuracy and efficiency, supporting home-based monitoring while helping to streamline sleep analysis and reduce diagnosis times.

Deep learning techniques, a subset of machine learning, are currently being used to analyze physiological data and classify sleep stages with high precision. For example, convolutional neural networks (CNNs) are extensively used to extract spatial features from EEG signals, enabling the model to capture intricate patterns across various sleep stages.

The sleep cycle alternates between REM and NREM stages, each one defined by specific patterns of electrical brain activity. Under the current AASM criteria, NREM is divided into three stages (N1, N2, and N3) based on changes in the EEG pattern (Nayak and Anilkumar, 2023; Patel et al., 2024).

Time and sleep stages in the sleep cycle

Recurrent neural networks (RNNs) and long short-term memory (LSTM) networks complement this by processing sequential data, capturing temporal dependencies crucial for identifying transitions between sleep phases. Transformer architectures, utilizing self-attention mechanisms, further enhance these capabilities by dynamically weighting the importance of features within the data.

Additionally, attention-based models improve interpretability, offering insights into how specific EEG characteristics influence classification outcomes. The combination of these methodologies or their standalone use ensures robust performance across diverse datasets, working as a promising solution to bridge the gap between traditional diagnostics and the need for scalable, accessible solutions for sleep analysis.

These techniques have paved the way for automatic sleep scoring systems that can analyze sleep data collected from various sources, including wearable devices and EEG sensors, to classify sleep stages, often in real time.

Bitbrain’s Pioneering Study

The Bitbrain team recently completed a study (Esparza-Iaizzo et al., 2024) that evaluates the potential of these systems for sleep analysis and explores the factors that influence their effectiveness.

By conducting a series of experiments, the study aimed to determine the most suitable conditions for automatic sleep scoring, ultimately leading to more reliable and generalizable results across diverse populations. The team employed a CNN-based model and tested it on open-access datasets containing both healthy participants and individuals with sleep disorders.

The Architecture of the CNN Model

From a broad perspective, the CNN architecture employed in the study comprises two main submodels. The first submodel (‘Time Distributed’) extracts time-invariant features from each individual epoch. These features are then processed simultaneously by the second submodel (‘Global’) to classify sleep stages. 
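The data flow between the two submodels can be illustrated with a toy sketch. Everything in it is a stand-in rather than the study's model: `epoch_features` uses simple statistics in place of learned convolutions, `global_classify` uses placeholder weights in place of trained layers, and both names are hypothetical. Only the structure mirrors the description above: the same extractor runs on every epoch, and the classifier then sees all epoch features at once.

```python
import math

STAGES = ["Wake", "N1", "N2", "N3", "REM"]

def epoch_features(epoch):
    """Stand-in for the 'Time Distributed' submodel (same function for every epoch)."""
    mean = sum(epoch) / len(epoch)
    std = math.sqrt(sum((x - mean) ** 2 for x in epoch) / len(epoch))
    return [mean, std]

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def global_classify(features, weights):
    """Stand-in for the 'Global' submodel: scores every epoch using the whole sequence."""
    n_feat = len(features[0])
    # Sequence-level context: average of each feature over all epochs
    context = [sum(f[i] for f in features) / len(features) for i in range(n_feat)]
    probs = []
    for f in features:
        x = f + context  # epoch features concatenated with the shared context
        scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
        probs.append(softmax(scores))
    return probs

# Three tiny fake "epochs" (real ones would hold 3000 samples each)
epochs = [[0.0, 1.0, 0.0, -1.0], [2.0, -2.0, 2.0, -2.0], [0.5, 0.5, 0.5, 0.5]]
# Placeholder weights: 5 stages x 4 inputs (2 epoch features + 2 context features)
weights = [[0.1 * (s + 1) * (j - 1.5) for j in range(4)] for s in range(5)]

features = [epoch_features(e) for e in epochs]
probabilities = global_classify(features, weights)
predicted = [STAGES[p.index(max(p))] for p in probabilities]
```

Each entry of `probabilities` is a 5-value vector that sums to 1, one value per sleep stage. With untrained placeholder weights the predicted stages carry no meaning.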

The model’s CNN architecture

Below are the basic technical specifications that define the model’s architecture and functionality: 

  • Input: Pre-processed signal data (segmented into epochs). 
    Every recording contains a variable number of channels sampled at different frequencies, together with one sleep-stage label per epoch. Each file is first resampled to 100 Hz and reshaped into a matrix where the first dimension is the number of epochs, the second is the number of time points per epoch (3000, which corresponds to 30 seconds at 100 Hz), and the third is the number of channels, which depends on the experiment being carried out. The subjects in each dataset are then divided into training (90%) and testing (10%) sets to carry out a 10-fold cross-validation. The training data is further partitioned into a training set (85%) and a validation set (15%). Finally, the recordings are z-scored using statistics from the training data only, ensuring that information from the test set is not used in any way.
  • Convolutions: Multiple layers extracting spatial and frequency-domain features, each followed by Rectified Linear Unit (ReLU) activation functions to introduce non-linearity.
  • Pooling: Max-pooling to reduce dimensionality and enhance computational efficiency while highlighting dominant signal characteristics.
  • Dropout: Regularization layers to mitigate overfitting by randomly deactivating neurons during training.
  • Dense Layers: Fully connected layers aggregating features into high-level abstractions.
  • Output: Softmax layer consolidates the extracted features into a 5-unit output vector that contains the probabilities for each sleep stage (Wake, N1, N2, N3, REM).
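The input preparation described in the first bullet can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's code: the helper names are hypothetical, resampling is omitted (the signal is assumed to already be at 100 Hz), the fold assignment is just one simple way to obtain a subject-wise 90/10 split, and the z-score uses a single mean and standard deviation over all training samples because the post does not specify finer details.

```python
import math
import random

EPOCH_SAMPLES = 3000  # time points per epoch (30 s at 100 Hz)

def to_epochs(signal):
    """Cut one channel into consecutive 3000-sample epochs, dropping any remainder."""
    n = len(signal) // EPOCH_SAMPLES
    return [signal[i * EPOCH_SAMPLES:(i + 1) * EPOCH_SAMPLES] for i in range(n)]

def split_subjects(subjects, fold, n_folds=10, val_fraction=0.15, seed=0):
    """Subject-wise split: ~10% test for this fold, the rest into 85% train / 15% val."""
    shuffled = subjects[:]
    random.Random(seed).shuffle(shuffled)
    test = shuffled[fold::n_folds]
    rest = [s for s in shuffled if s not in test]
    n_val = round(len(rest) * val_fraction)
    return rest[n_val:], rest[:n_val], test  # train, val, test

def fit_zscore(train_epochs):
    """Mean and standard deviation computed from training samples only."""
    values = [x for epoch in train_epochs for x in epoch]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))
    return mean, std

def apply_zscore(epochs, mean, std):
    """Normalize any split (train, val, or test) with the training statistics."""
    return [[(x - mean) / std for x in epoch] for epoch in epochs]

# Usage with made-up subjects and a synthetic 75-second signal
subjects = [f"sub-{i:02d}" for i in range(20)]
train, val, test = split_subjects(subjects, fold=0)

signal = [math.sin(i / 50) for i in range(7500)]
epochs = to_epochs(signal)            # 2 epochs; the last 1500 samples are dropped
mean, std = fit_zscore(epochs)
normalized = apply_zscore(epochs, mean, std)
```

Splitting by subject rather than by epoch keeps every night from a given person in a single partition, and fitting the normalization on the training partition alone is what prevents test-set information from leaking into the model.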

Key Findings and Implications

The key areas of the investigation answer three major questions:

1. What is the optimal sensor configuration for accurate automated sleep scoring?  

To address this, the study examined whether EEG electrode position (frontal, central, or occipital recording sites) affects scoring accuracy, whether multiple EEG channels (all three locations combined) provide a significant advantage, and what effect additional sensors such as EOG (electrooculogram), EMG (electromyogram), and ECG (electrocardiogram) have. Results indicated that a single frontal EEG channel offers a reliable solution, simplifying hardware requirements without compromising accuracy. This is one of the most promising findings of the study: it reduces the need for cumbersome multi-sensor setups and makes wearable devices practical for widespread use. The model achieved:

  • 81.4% accuracy for healthy participants.
  • 78.5% accuracy across diverse populations, including those with sleep apnea.

2. Can real-time sleep staging rival traditional offline scoring? 

This question was explored by comparing the CNN model's performance in real-time versus offline settings. The outcomes showed that real-time scoring achieves nearly the same accuracy as offline scoring, with a gap of only 2% relative to the best offline result, demonstrating the feasibility of immediate feedback and intervention.

Findings from these two sets of experiments suggest that real-time, robust sleep monitoring is feasible with minimal hardware.
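The practical difference between the two settings lies in which epochs the model is allowed to see. Offline scoring can use epochs on both sides of the one being classified, whereas a real-time scorer only has what has been recorded so far. The sketch below shows that difference as index windows; the function names and window sizes are illustrative assumptions, not values from the study.

```python
def offline_context(n_epochs, index, half_width=2):
    """Offline scoring: epochs before AND after the target are available."""
    start = max(0, index - half_width)
    stop = min(n_epochs, index + half_width + 1)
    return list(range(start, stop))

def realtime_context(index, history=4):
    """Real-time scoring: only the current epoch and those already recorded."""
    start = max(0, index - history)
    return list(range(start, index + 1))

print(offline_context(10, 5))   # [3, 4, 5, 6, 7]
print(realtime_context(5))      # [1, 2, 3, 4, 5]
```

The real-time window never contains an index greater than the current epoch, which is the constraint that makes immediate feedback possible.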

Comparison of the two sets of experiments 

3. How do these algorithms perform across different populations, including those with sleep disorders?

As mentioned above, the study tested the CNN-based model on datasets encompassing both healthy participants and those with sleep disorders. While the model achieved 81.4% accuracy in healthy individuals and 78.5% accuracy in diverse populations, challenges remain in scoring sleep for individuals with OSA. Fragmented sleep and frequent arousals dropped the model’s accuracy for this group to 75.5%. This underscores the need for tailored solutions that address the unique challenges of clinical populations.

At the heart of this automatic sleep scoring system lies the convolutional neural network (CNN), a type of deep learning architecture designed to recognize patterns in complex data, which makes it well suited to analyzing sleep signals. The study demonstrates that, by training CNNs on diverse datasets, researchers can develop models that classify sleep stages with high precision, even in real-time scenarios.

The findings indicate that this technology has the potential to transform the way sleep is monitored and analyzed, making it possible to deliver personalized recommendations for improving sleep quality. What’s more, they suggest that real-time sleep scoring can achieve accuracy comparable to traditional offline procedures, which rely on past and future information to classify sleep stages. This has significant implications for the future of sleep medicine, as it allows for immediate feedback and intervention, potentially improving patient outcomes.

Beyond traditional diagnostics, the research underscores other applications of automatic sleep scoring that are directly relevant to improving patient care and supporting healthcare providers:

  • Empowering Personal Health: By providing real-time insights into sleep patterns, these systems enable individuals to make informed decisions about their lifestyle and health.
  • Telehealth Integration: AI-driven sleep analysis allows remote consultations, where clinicians can review accurate, home-generated data to diagnose and monitor conditions without requiring in-person visits. This can enhance telehealth platforms, facilitating remote monitoring and reducing reliance on in-person consultations.
  • Proactive Sleep Health Management: Wearable devices with automatic scoring algorithms enable continuous home monitoring. Such technologies can support early detection of sleep disorders, limiting the progression of related health complications.
  • Closed-Loop Neurostimulation: The real-time feedback from automatic scoring can guide personalized neurostimulation interventions, such as devices adjusting stimulation parameters to improve sleep quality dynamically.
  • Efficiency in Healthcare: Automating time-intensive sleep scoring frees clinicians to dedicate more time to treatment planning and patient interaction, improving outcomes, reducing healthcare costs, and enhancing overall care quality.

The diagnostic process involves capturing EEG data through a wearable device, automatically processing this information with AI algorithms to classify sleep stages and generating comprehensive reports that clinicians can interpret for actionable insights. This seamless integration from data collection to diagnosis exemplifies the transformative potential of AI in healthcare.

This innovation positions AI and wearable technology as central players in addressing global healthcare challenges, particularly in sleep medicine.

Introducing the BOAS dataset: A Milestone in Sleep Research

As part of its commitment to innovation in sleep research, Bitbrain has further advanced the field by developing the Bitbrain Open Access Sleep (BOAS) dataset, a comprehensive resource designed to bridge the gap between traditional clinical PSG technologies and wearable EEG solutions. Recently published on OpenNeuro, the BOAS dataset includes data collected over 128 nights of sleep from healthy participants, each monitored using both clinical and portable technologies. The two recording systems are:

  1. Clinical reference PSG system: The Brain Quick Plus Evolution by Micromed, a system that provides comprehensive, clinically validated sleep parameters. This includes EEG signals from multiple channels, eye movements (EOG), muscle activity (EMG), breathing patterns (via belts and nasal cannulas), and photoplethysmographic activity for pulse, heartbeat, and oxygen saturation.
  2. Wearable EEG headband developed by Bitbrain: A comfortable, user-friendly device designed for home use. This EEG sleep headband measures frontal EEG activity with sensors limited to the forehead, motion through accelerometers and gyroscopes, and pulse with a photoplethysmography sensor. It was developed for easy, self-administered operation, making sleep monitoring more accessible.

Ikon wearable EEG headband

The BOAS dataset captures data simultaneously from both systems, allowing direct comparisons that help evaluate the performance of emerging wearable EEG technologies against gold-standard PSG. This facilitates the validation of accessible, self-administered tools and provides an invaluable resource for researchers and developers.

Each night’s data has been labelled using a rigorous methodology to ensure robust and reliable sleep scoring:

  • Three expert sleep scorers independently annotated the PSG recordings following the American Academy of Sleep Medicine (AASM) criteria. A fourth expert derived a consensus label from these annotations. This consensus labelling approach addresses the inherent variability in human sleep staging, which has an estimated inter-scorer agreement of approximately 85%.
  • The consensus labels were then applied to the corresponding wearable EEG recordings, leveraging the dual-recording setup.
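The post states that a fourth expert derived the consensus from the three annotations but does not describe the exact procedure. One plausible reading, shown here purely as an assumption, is that epochs where at least two scorers agree take the majority label, while epochs where all three differ are set aside for the fourth expert to decide. The function name and the example labels are made up for illustration.

```python
from collections import Counter

def consensus(scorer_a, scorer_b, scorer_c):
    """Majority label per epoch; None marks epochs where all three scorers differ."""
    labels = []
    for votes in zip(scorer_a, scorer_b, scorer_c):
        stage, count = Counter(votes).most_common(1)[0]
        labels.append(stage if count >= 2 else None)
    return labels

# Made-up annotations for five epochs
a = ["W", "N1", "N2", "N2", "REM"]
b = ["W", "N2", "N2", "N3", "REM"]
c = ["W", "W",  "N2", "N3", "N1"]

labels = consensus(a, b, c)
to_review = [i for i, stage in enumerate(labels) if stage is None]

print(labels)      # ['W', None, 'N2', 'N3', 'REM']
print(to_review)   # [1]
```

In this example only the second epoch, where the three scorers chose three different stages, would need adjudication.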

The dataset, therefore, includes:

  • PSG recordings from 128 nights.
  • Wearable EEG recordings from the same nights.
  • Human-consensus sleep stage labels, obtained from the PSG recordings.
  • AI-generated sleep stage labels, separately obtained from PSG recordings and from wearable EEG recordings.

To ensure accessibility and usability for the research community, the BOAS dataset is formatted according to the Brain Imaging Data Structure (BIDS), which is a community-driven initiative designed to standardize the organization and sharing of neuroimaging and related data. It provides a structured format for storing data from studies like MRI, EEG, and MEG, ensuring consistency and compatibility across datasets. BIDS simplifies data sharing, reproducibility, and integration into analysis pipelines by adhering to clear conventions. Its goal is to facilitate collaboration and accelerate research by making data more accessible and usable across disciplines. 
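To give a sense of what these conventions look like in practice, the sketch below builds a file path following the general BIDS pattern for EEG recordings (`sub-<label>/eeg/sub-<label>_task-<label>[_acq-<label>]_eeg.<ext>`). The subject, task, and acquisition labels, the root folder, and the `.edf` extension are placeholders chosen for illustration; the actual names used in BOAS should be checked against the dataset itself.

```python
from pathlib import PurePosixPath

def bids_eeg_path(root, subject, task, acquisition=None, extension=".edf"):
    """Build a BIDS-style EEG path from its entities (labels here are hypothetical)."""
    entities = [f"sub-{subject}", f"task-{task}"]
    if acquisition:
        entities.append(f"acq-{acquisition}")
    filename = "_".join(entities) + "_eeg" + extension
    return PurePosixPath(root) / f"sub-{subject}" / "eeg" / filename

path = bids_eeg_path("boas", "01", "sleep", acquisition="headband")
print(path)  # boas/sub-01/eeg/sub-01_task-sleep_acq-headband_eeg.edf
```

Because every entity sits in a predictable position, analysis pipelines can locate the recordings for a given participant or device without dataset-specific parsing.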

The performance of Bitbrain’s CNN-based model was rigorously tested using this dataset. It was trained and validated separately on the PSG and wearable data using a cross-validation procedure, providing an opportunity to evaluate its performance across two different technologies. 

Results demonstrated that wearable EEG devices, when paired with advanced AI models, can approximate the accuracy of gold-standard clinical methods:

  • For PSG data, the model achieved 87.08% agreement with expert consensus labels.
  • For wearable EEG data, the model achieved 86.64% agreement, highlighting the potential of mobile EEG devices to deliver clinically relevant sleep staging, without the logistical complexities and costs associated with PSG.
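Assuming that "agreement" here means the share of epochs where the model's label matches the consensus label, the figure can be computed as in the sketch below. The labels are made up, and the helper that counts disagreements is an illustrative extra rather than something reported in the post.

```python
from collections import Counter

def agreement(predicted, reference):
    """Percentage of epochs where the two label sequences match."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)

def confusions(predicted, reference):
    """Count (reference, predicted) pairs for the epochs that disagree."""
    return Counter((r, p) for p, r in zip(predicted, reference) if p != r)

# Made-up labels for eight epochs
reference = ["W", "N1", "N2", "N2", "N3", "N3", "REM", "REM"]
predicted = ["W", "N2", "N2", "N2", "N3", "N2", "REM", "REM"]

score = agreement(predicted, reference)
errors = confusions(predicted, reference)

print(score)  # 75.0
```

Looking at which stage pairs are confused, and not only at the overall percentage, helps show where a wearable-based model departs from the PSG-based reference.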

Exploring the Potential of the Bitbrain Open Access Sleep Dataset

One of the most significant aspects of the Bitbrain Open Access Sleep dataset is its ability to explore the strengths and limitations of portable technologies. It offers unparalleled opportunities for advancing sleep research and technology, catering directly to real-world applications such as:

  1. Validation of mobile devices for consumer markets
    The BOAS dataset supports the development and validation of wearable sleep monitoring tools, making sleep assessments more accessible. It enables manufacturers to rigorously validate their devices against clinical standards. For example, wearable EEG headbands can be benchmarked to ensure accuracy comparable to traditional PSG setups, safeguarding reliability for end-users.
  2. Custom AI model development for industry applications
    With synchronized data from multiple sources, researchers and businesses developing AI-driven sleep analysis tools can fine-tune algorithms. This allows for the creation of more robust, generalizable, and accurate models tailored for wearable devices, enhancing their ability to identify complex patterns in multi- or single-channel data and overall product performance.
  3. Optimizing Inter-Technology Integration in Smart Home Devices
    Because it offers data from both PSG and wearable EEG, the dataset makes it possible to analyze how different sensors can best capture relevant sleep data. It supports comparisons between sensor technologies, aiding the design of hybrid systems (e.g., combining wearable EEG with smart mattresses or smartwatches). Companies can refine multi-sensor platforms for enhanced sleep tracking capabilities, tailored to diverse user needs.

What’s more, the BOAS dataset not only supports the technical validation of new tools but also provides valuable insights into addressing critical clinical challenges. For example:

  1. Enhancing telehealth and home monitoring
    By validating portable devices and AI models, the dataset supports and advances innovations in telehealth, allowing individuals to monitor their sleep at home with accuracy comparable to clinical settings.
  2. Addressing sleep disorder patterns
    With data from multiple sensors, including respiratory belts and thermistors, the dataset is a critical resource for modelling patterns associated with various sleep disorders, such as obstructive sleep apnea.
  3. Optimizing portable hardware design
    By identifying which sensors provide the most clinically relevant measurements, the BOAS dataset contributes to the design of more efficient, cost-effective wearable devices.

Together, these points underline the BOAS dataset’s potential to shape the future of sleep research, ensuring that sleep-related solutions meet the specific demands of healthcare providers, technology developers, and consumers alike.

As technologies like automatic sleep scoring evolve, resources such as the BOAS dataset will be essential for training and validating models in real-world environments. Additionally, as an open-access resource, this dataset fosters scientific collaboration, empowering researchers worldwide to drive innovation in sleep medicine.

Future directions include integrating these data into broader studies on portable devices and on closed-loop neurostimulation techniques, as explored in the study, where AI-driven systems provide real-time interventions to improve sleep quality.

By providing a robust foundation for research and development, the BOAS dataset represents a significant step forward in making high-quality, accessible sleep monitoring tools a reality.

Final Thoughts: Measurable Outcomes and Clear Opportunities for a Future of Better Sleep

The BOAS dataset and the results of Bitbrain’s study are already reshaping sleep health. As technology advances, these resources offer:

  • Scalable sleep diagnostics: Automatic sleep scoring systems validated by these resources provide a scalable alternative to traditional PSG setups.
  • Innovative telehealth tools: Real-time diagnostics reduce barriers to access, ensuring patients can receive timely care at home.
  • Industry impact: These advancements give businesses the confidence to deliver accurate, user-friendly solutions validated through comprehensive studies.

They represent transformative steps forward in sleep diagnostics and health management. Their contributions include:

  • High diagnostic accuracy: The CNN model’s agreement with expert consensus labels, 87.08% for PSG data and 86.64% for wearable EEG data, underscores its reliability.
  • Performance across populations: While the CNN model achieved 78.5% accuracy in diverse populations, results in OSA patients were slightly lower at 75.5%, pointing to the need for tailored enhancements for clinical applications.
  • Actionable AI and device insights: The Bitbrain study demonstrated that a single frontal EEG electrode provides reliable sleep scoring with 81.4% accuracy for healthy participants, streamlining hardware requirements for wearable devices. Experiments incorporating additional sensors, such as EOG and EMG, showed only marginal gains, emphasizing that simpler configurations can still meet clinical standards.
  • Immediate feedback capability: The study revealed that real-time scoring is viable, achieving near-parity with offline methods, with only a 2% performance gap. This opens doors for immediate diagnostic feedback in telehealth scenarios, empowering providers to deliver instant interventions, and improving patient outcomes.
  • Path to market readiness: By identifying optimal configurations and validating technologies, the combined insights guide manufacturers in creating market-ready, compliant devices.

Together, these tools enable clients to lead in the rapidly evolving sleep health industry, ensuring scalable and accessible solutions for a global audience.

The research presented is a testament to the incredible potential of automated sleep scoring and portable devices in transforming healthcare, offering a glimpse into a future where sleep health is prioritized and accessible to all. 

As we continue to innovate and explore new frontiers in sleep analysis, we can look forward to a world where quality sleep is on hand for everyone, leading to healthier, happier individuals.

The journey towards better sleep is a multifaceted endeavour that requires collaboration between researchers, healthcare professionals, and individuals.

Sleep well and dream big!  

About the authors

  • María Sierra, Engineer in Electronics and Automation, with a Master’s in Robotics, Graphics, and Computer Vision. Currently a Data Scientist at Bitbrain.
  • Eduardo López Larraz, PhD in Biomedical Engineering and Research Scientist at Bitbrain.

References

  • Ohayon, M. (2011). Epidemiological overview of sleep disorders in the general population. Sleep Medicine Research, 2, 1-9. doi.org/10.17241/smr.2011.2.1.1
  • Nayak, C. S., & Anilkumar, A. C. (2023). EEG Normal Sleep. In StatPearls. StatPearls Publishing.
  • Patel, A. K., Reddy, V., Shumway, K. R., & Araujo, J. F. (2024). Physiology, Sleep Stages. In StatPearls. StatPearls Publishing.
  • López-Larraz, E., Escolano, C., Robledo-Menéndez, A., Morlas, L., Alda, A., & Minguez, J. (2023). A garment that measures brain activity: proof of concept of an EEG sensor layer fully implemented with smart textiles. Frontiers in Human Neuroscience, 17, 1135153. doi.org/10.3389/fnhum.2023.1135153
  • Esparza-Iaizzo, M., Sierra-Torralba, M., Klinzing, J. G., Minguez, J., Montesano, L., & López-Larraz, E. (2024). Automatic sleep scoring for real-time monitoring and stimulation in individuals with and without sleep apnea. bioRxiv 2024.06.12.597764. doi.org/10.1101/2024.06.12.597764