Faculty Publications

The Effects of Individual Differences, Non‐Stationarity, and The Importance of Data Partitioning Decisions for Training and Testing of EEG Cross‐Participant Models

Alexander J. Kamrud [*], Air Force Institute of Technology
Brett J. Borghetti, Air Force Institute of Technology
Christine M. Schubert Kabban, Air Force Institute of Technology

Document Type

Article

Publication Date

5-6-2021

Abstract

EEG-based deep learning models have trended toward models that are designed to perform classification on any individual (cross-participant models). However, because EEG varies across participants due to non-stationarity and individual differences, certain guidelines must be followed for partitioning data into training, validation, and testing sets, in order for cross-participant models to avoid overestimation of model accuracy. Despite this necessity, the majority of EEG-based cross-participant models have not adopted such guidelines. Furthermore, some data repositories may unwittingly contribute to the problem by providing partitioned test and non-test datasets for reasons such as competition support. In this study, we demonstrate how improper dataset partitioning and the resulting improper training, validation, and testing of a cross-participant model lead to overestimated model accuracy. We demonstrate this both mathematically and empirically, using five publicly available datasets. To build the cross-participant models for these datasets, we replicate published results and demonstrate how the model accuracies are significantly reduced when proper EEG cross-participant model guidelines are followed. Our empirical results show that by not following these guidelines, error rates of cross-participant models can be underestimated by between 35% and 3900%. This misrepresentation of model performance for the general population potentially slows scientific progress toward truly high-performing classification models.
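
The abstract does not spell out the partitioning guidelines. As a rough illustration only, the sketch below assumes the central requirement is that every participant's data falls entirely within one of the training, validation, or testing sets. The function name `split_by_participant`, its parameters, and the toy participant IDs are hypothetical and are not taken from the article.

```python
import random
from collections import defaultdict


def split_by_participant(participant_ids, test_fraction=0.2, val_fraction=0.2, seed=0):
    """Assign whole participants to train/val/test so no participant spans two sets.

    participant_ids: one ID per sample (for example, per EEG epoch).
    Returns a dict mapping 'train', 'val', 'test' to lists of sample indices.
    """
    # Group sample indices by the participant they came from.
    by_participant = defaultdict(list)
    for idx, pid in enumerate(participant_ids):
        by_participant[pid].append(idx)

    # Shuffle participants (not individual samples) reproducibly.
    participants = sorted(by_participant)
    random.Random(seed).shuffle(participants)

    n = len(participants)
    n_test = max(1, round(n * test_fraction))
    n_val = max(1, round(n * val_fraction))
    if n_test + n_val >= n:
        raise ValueError("not enough participants for three disjoint sets")

    groups = {
        "test": participants[:n_test],
        "val": participants[n_test:n_test + n_val],
        "train": participants[n_test + n_val:],
    }
    # Expand each group of participants back into sample indices.
    return {
        name: [i for pid in pids for i in by_participant[pid]]
        for name, pids in groups.items()
    }


# Hypothetical toy data: 10 participants with 5 epochs each.
ids = [f"P{p:02d}" for p in range(10) for _ in range(5)]
splits = split_by_participant(ids)
```

Shuffling individual epochs instead of participants would place samples from the same person in both the training and testing sets, which is the kind of partitioning the abstract identifies as a source of overestimated accuracy.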

Comments

© 2021 The Authors. This article is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. Sourced from the published version of record cited below.

Author marked [*] was an AFIT graduate student at the time of publication.

DOI

10.3390/s21093225

Source Publication

Sensors

Recommended Citation

Kamrud, A. J., Borghetti, B. J., & Schubert Kabban, C. M. (2021). The effects of individual differences, non‐stationarity, and the importance of data partitioning decisions for training and testing of EEG cross‐participant models. Sensors, 21(9), art. 3225. https://doi.org/10.3390/s21093225

Included in

Data Science Commons
