Written in the course of CS598 – Machine Learning and Signal Processing at UIUC.
Title: A Method for Removal of EEG Artifacts using Facial Motion Detection and ICA
Author(s): Austin Walters, Esteban Gomez, and Cole Gleason
Electroencephalography (EEG) headsets are used to measure electrical activity in a subject’s brain (commonly referred to as brainwaves). Readings from EEG headsets are prone to a number of artifacts due to blinks, muscle spasms, or other movement by the subject. While much work has been done to remove these artifacts with blind source separation and independent component analysis, almost all methods require training a classifier to recognize artifacts per subject. Additionally, some methods require training per subject-session due to differing quality of the electrode contacts with the subject’s scalp.
This paper presents a method for automatic detection and removal of artifacts in commercial-grade headsets. While there is still much work to be done on success measurements, cleaner removal, and real-time systems, the method outlined seems initially promising.
Electroencephalography (EEG) is a common method of reading electrical activity in the brain for research and medical diagnosis.
Signals are notoriously difficult to manipulate, detect, and correct. In particular, electroencephalography (EEG) signals are often heavily contaminated by muscle movement (5–200 Hz) and power-grid pollution (around 60 Hz). The method described is intended to remove as much signal pollution related to muscle movement as possible using a standard laptop camera or webcam.
The methods currently used to correct or remove noise from muscle movement in EEG signals require the manual training of a classifier every time a subject wears the headset. Every time a headset is placed on the head, the connection to the scalp is altered slightly, and any classifier trained during a previous session can have issues distinguishing noise from useful signal.
The method described is designed to remove the need to train any classifier by finding correlations between movement and the ICA components of an EEG signal. If there is a correlation this can be used to identify the specific component and either correct it or mark it as a possible error. The advantage of this is that there is no need to train a new classifier every time an EEG is used, which takes considerable time, and noise can be more accurately identified.
Research and Neurofeedback
The research in the field of Neurofeedback is still in its infancy, but there have been some strides in making it useful for consumers. One particularly promising area of research is that of steady-state visually evoked potentials (SSVEP), which can be used as a brain-computer interface (BCI). One application is the chess game created in Dr. Bretl’s lab at the University of Illinois, which enabled users to play a game of chess using a robotic arm and flashing lights.
SSVEP works by stimulating the brain with a flashing light at a particular frequency, say 7 Hz. This creates action potentials (neural firing) at the same frequency, and these signals can then be picked up on an EEG with very high accuracy (85–90 percent). Unfortunately, SSVEP responses only seem to work best between 5 Hz and 30 Hz, the range of frequencies most heavily affected by muscle movement. Thus, if the noise due to muscle movement can be reduced, then the accuracy of SSVEP detection, one of the most promising methods used in BCI today, can be improved as well.
Consumer Headsets and Untrained Users
Most of the brain-computer interfaces today require a medical-grade EEG. However, with the introduction of Emotiv and OpenBCI headsets, there is now a surplus of consumer-grade EEG headsets. The downside is that these EEGs are used in uncontrolled environments without expert handling. A whole new range of EEG signal analysis and collection techniques needs to be developed to account for the increase in noise; otherwise, many of these EEGs will be next to useless.
This method can account for some noise and correct the signals without constant manual training of a classifier and without needing any input from the subject. This means that although the subject may not be an expert, the EEG signal can still be corrected and cleaned, and the consumer EEG can be used to greater effect. The requirement of training a classifier or reviewing the EEG data is removed, making the system more consumer friendly and less error prone.
Generating a Motion Signal
The EEG output is a mixture of the signals resulting from many simultaneous brain processes. Since at any moment the brain is processing and responding to many stimuli, a very complex response from the EEG is expected. Outside noise reflected in the signals includes motion artifacts from muscle twitches and other types of movement. Since these responses are of little significance, all effects of movement are categorized as noise.
By combining video data with the EEG, it becomes possible to make a more accurate distinction between the undesired and desired components of the brain signals. To provide an insightful analysis of the relation between the video and the EEG output, it is necessary to convert the video into a facial motion signal. An additional complication is that the location of the EEG sensor determines how strong a motion artifact will be recorded, which means that the motion signal must have a spatial component alongside its intensity component.
Facial Detection and Image Stabilization
For the motion signal to contain useful information, each area of the face must be compared to the same area across multiple frames. If the user changes position without making any facial gestures, the signal should be able to reflect low motion while tracking the user in the frame. As a result, the algorithm must be shift-invariant for the face location but very sensitive to changes within the region of interest (ROI).
The region of interest was extracted using OpenCV’s pre-trained Haar-Cascade facial templates. This returns a region containing the user’s face, determined by its coordinates x. However, there is no guarantee that the coordinates of the ROI will be exact between one frame and the next, even if the user did not move. There is always coordinate variation that shifts the ROI slightly in some direction.
The unwanted jitter in the ROI adds extra noise to the motion signal, so it is possible to interpret x as a combination of the true location x̂ and random noise x_noise. Before extracting the face image from the overall frame, it is necessary to approximate the ROI’s coordinates such that the random noise is minimized.
Under the assumption that x is a random variable, it is possible to minimize noise by making multiple measurements to decrease its variance. As a result, x̂ was approximated by averaging the output of two different facial detection calls each with its own facial template. Although this average increases confidence of the location of the face in the frame, it does very little to handle inter-frame jitter and displacement. To account for the necessary shift-invariance, a ten frame buffer was added that kept track of the last nine x̂ values, so that the ROI was defined by the average of the current location and that of the previous nine frames.
As a result the ROI smoothly follows the user’s face throughout the frame, while still being sensitive to what occurs inside. In the case where both Haar-Cascade calls fail, the ROI extracted is based on the average x̂ from the buffer. Since the user will likely move very little within ten frames, this approach permits using recent history to predict where the face is located.
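The averaging and ten-frame buffer described above can be sketched as follows. This is a minimal numpy-only illustration; the class name and box format are hypothetical, and the actual Haar-Cascade calls (e.g. OpenCV's cascade detector) are assumed to happen outside this helper.

```python
from collections import deque

import numpy as np

class ROISmoother:
    """Hypothetical helper sketching the ROI-smoothing scheme: average
    the outputs of multiple facial detectors for the current frame, then
    average that estimate with the previous nine frames' estimates."""

    def __init__(self, history=10):
        # ring buffer holding the current x-hat plus the previous nine
        self.buf = deque(maxlen=history)

    def update(self, detections):
        """detections: list of (x, y, w, h) boxes, one per Haar-Cascade
        call; may be empty when both cascade calls fail."""
        if detections:
            # average the detectors' outputs to approximate x-hat
            x_hat = np.mean(np.asarray(detections, dtype=float), axis=0)
            self.buf.append(x_hat)
        if not self.buf:
            return None  # no face observed yet
        # ROI = average of the current estimate and recent history
        return tuple(np.mean(self.buf, axis=0))
```

When both cascade calls fail, `update([])` simply returns the buffered average, which matches the fallback behavior described above: recent history predicts where the face is located.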
After a shift-invariant ROI has been extracted, it becomes considerably simpler to track changes in the face image, mainly because the different areas of the face now line up. Any motion detection scheme will measure the overall changes throughout the entire face. However, as mentioned earlier, performing useful analysis of the EEG signal requires spatial information as well. As a result, a 4 x 4 grid is overlaid on the ROI to divide it into sixteen regions. By breaking up the image in this fashion, instead of a single value that conveys all changes, there are sixteen different values containing the motion of each portion of the face. In Figure 1, the overlaid grid shows most of the movement concentrated in the two middle quadrants of the second row.
The grid images provide greater precision for interpreting the motion data, which enables a more detailed analysis against the EEG signal. The grid image is then converted into a series of delta images that contain high values in regions where there was motion and low values in regions that remained constant. The algorithm consists of measuring the pixel changes between the current, the previous, and the next frame.
The delta images in Figure 2 correspond to the grid images previously shown. It is clear that before the blink, there was little motion while there is a concentration of high values around the eyes during the blink. The pixel values for each quadrant of the delta image are summed up to get an overall value representing the amount of motion for that quadrant in that frame.
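The delta-image computation and per-quadrant summation can be sketched with plain numpy. The exact differencing rule is an assumption (the text says only that pixel changes are measured between the current, previous, and next frames); the sketch below uses classic three-frame differencing, which keeps only pixels that changed relative to both neighbors.

```python
import numpy as np

def quadrant_motion(prev, curr, nxt, grid=4):
    """Compute per-quadrant motion values from three consecutive
    grayscale frames (float arrays of equal shape).

    Assumption: three-frame differencing, i.e. a pixel counts as motion
    only if it differs from BOTH the previous and the next frame."""
    delta = np.minimum(np.abs(curr - prev), np.abs(nxt - curr))
    h, w = delta.shape
    gh, gw = h // grid, w // grid
    vals = np.empty((grid, grid))
    for i in range(grid):
        for j in range(grid):
            # sum the pixel changes inside quadrant (i, j)
            vals[i, j] = delta[i * gh:(i + 1) * gh,
                               j * gw:(j + 1) * gw].sum()
    return vals
```

Each call yields the sixteen per-quadrant values for one frame; stacking them over time produces the motion signal analyzed in the figures.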
This technique for converting a video into a motion signal seems to provide intuitive and representative results. Since the signal conveys both intensity and location information, it is easy to determine the moments when different events occurred in the video. Figure 3a shows the change of all sixteen signals over time, arranged so that they correspond to the respective locations on the subject’s face. Aside from the initial period at the beginning of the session, while the buffer was being filled, the only high-valued features align with blinks.
Notice that all regions except 6 and 7 remain relatively quiet during the session. Figure 3b shows a close-up of the signals around the blink presented in Figures 1 and 2. Note that the region shown in Figure 3b is the portion between the red lines in Figure 3a. The red line in Figure 3b marks the moment when the presented blink occurred. As in Figure 2b, all the regions are quiet except for the middle two on the second row.
Gathering EEG Data
One of the goals of this research was to provide a solution for removing artifacts from commodity EEG headsets, which are of much lower quality than those found in the medical field. A low-cost headset that provides 14 channels, the Emotiv Epoc, was chosen; it is known not to perform as well as medical-grade devices. Next, an experiment was developed to collect the readings off of the 14 electrodes while simultaneously capturing video from a laptop webcam.
The experiment consisted of the subjects performing various facial expressions or motions while looking at the camera. Each action was performed for 20 seconds, and the subjects would switch actions when the screen indicated. The last action in the sequence was to look at a screen to the subject’s right, on which a box was flashing. This was meant to establish a baseline reading and examine SSVEP (see Motivation section), although that proved difficult.
Some example expressions were scratching the face, tugging ears, pretending to talk, blinking, and scratching the top of the head.
Each test took approximately 5 minutes in total.
Data Processing and Cleaning
Using start points on the video and EEG data, the two signals were aligned by hand and cropped to be within half of a second of each other. Some of the video data was hard to process due to various factors such as skin color and large amounts of hair preventing facial detection.
Additionally, some subjects exaggerated movements greatly, which made it difficult to track the face. However, most of these obstacles were overcome using the motion signal construction technique in the previous section.
The EEG data, E, then contained 14 channels at a sampling rate of 129 Hz and the motion data, M, consisted of 16 channels at a sampling rate of 15 Hz. Both of these data-sets were normalized to unit variance and zero mean before continuing. The motion data and EEG data were re-sampled to 516 Hz, which is the standard sampling rate in many of the medical grade headsets.
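The normalization and resampling steps can be sketched as below. This is a numpy-only sketch with linear interpolation standing in for proper band-limited resampling (the actual resampling method used is not specified in the text); the function names are illustrative.

```python
import numpy as np

def normalize(x):
    """Zero-mean, unit-variance per channel; x has shape (channels, samples)."""
    x = x - x.mean(axis=1, keepdims=True)
    return x / (x.std(axis=1, keepdims=True) + 1e-12)

def resample(x, fs_in, fs_out):
    """Resample each channel to a common rate. Linear interpolation is a
    simple stand-in for a proper band-limited resampler."""
    n_out = int(round(x.shape[1] * fs_out / fs_in))
    t_in = np.arange(x.shape[1]) / fs_in
    t_out = np.arange(n_out) / fs_out
    return np.vstack([np.interp(t_out, t_in, ch) for ch in x])
```

Applying `normalize` then `resample(..., 15, 516)` to the motion data, and likewise `resample(..., 129, 516)` to the EEG data, brings both onto the common time base used in the rest of the pipeline.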
Removing Motion Components
The method for removing ICA components that contain motion involves first segmenting the data into small windows, Ew and Mw. The authors found success with windows of 0.25 seconds, although they experienced acceptable results up to 3-second windows. Different segment sizes are appropriate for removing different motions. For example, a blink occurs in less than 0.5 seconds, but scratching the head or face might take longer.
Assuming EEG artifacts due to motion and normal EEG data would be statistically independent, blind source separation should be possible by performing independent component analysis (ICA) on Ew to extract the components. Once found, “bad” components could be removed and the original signal could be reconstructed from the remaining ones.
Pseudocode for this procedure can be found in Algorithm 1.
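The decompose/zero/reconstruct step at the core of this procedure can be sketched in numpy. The toy symmetric FastICA below is a simplified stand-in for EEGLab's ICA (which the authors actually used); function names and the convergence criterion are illustrative, not the authors' implementation.

```python
import numpy as np

def fastica(X, n_iter=200, seed=0):
    """Toy symmetric FastICA (tanh nonlinearity) on X of shape
    (channels, samples). A simplified stand-in for EEGLab's ICA."""
    X = X - X.mean(axis=1, keepdims=True)
    # whiten via eigendecomposition of the channel covariance
    d, E = np.linalg.eigh(np.cov(X))
    K = E @ np.diag(d ** -0.5) @ E.T
    Z = K @ X
    n, m = Z.shape
    W = np.random.default_rng(seed).standard_normal((n, n))
    u, _, vt = np.linalg.svd(W)
    W = u @ vt  # orthogonalize the initial unmixing matrix
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        W_new = (G @ Z.T) / m - np.diag((1 - G ** 2).mean(axis=1)) @ W
        u, _, vt = np.linalg.svd(W_new)
        W_new = u @ vt  # symmetric decorrelation
        if np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1)) < 1e-8:
            W = W_new
            break
        W = W_new
    S = W @ Z                 # estimated independent components
    A = np.linalg.inv(W @ K)  # mixing matrix for reconstruction
    return S, A

def remove_components(S, A, bad):
    """Zero out the 'bad' component indices and reconstruct the window."""
    S_clean = S.copy()
    S_clean[list(bad), :] = 0.0
    return A @ S_clean
```

Running this per window Ew, marking components correlated with the motion window Mw as `bad`, and concatenating the reconstructed windows yields the cleaned signal.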
Similarity Measure for Motion and EEG Signal
A measure for deciding which components to remove was needed. Some similar methods use a classifier trained by EEG experts on artifacts in the subject’s data for that particular session. This is problematic, as it requires experts to verify that the headset was applied in a way that provides quality contact with the skin, to control the test for movement, and to later clean out artifacts.
In order to achieve the same results on a commodity EEG in the home without an expert present, an unsupervised metric was needed. A simple normed cross-correlation (xcorr) metric, c, was chosen. This allowed for slight shift invariance between the two signals, as they may not be perfectly aligned. However, due to segmentation, the invariance cannot extend beyond the window’s edges. The metric is shown below.
This metric was an indicator of how strongly an EEG segment component correlated with a motion segment, with a value of 1 being exact correlation and 0 being no correlation at any lag value. After experimentation with various thresholds, empirical results indicated that a threshold of 0.8 removed many ICA components that contributed to artifacts near high levels of motion in the time series. The original signal was then reconstructed without the bad components.
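One common form of such a normed cross-correlation indicator is sketched below; the precise normalization the authors used is not fully specified in the text, so treat this as an assumption.

```python
import numpy as np

def xcorr_score(component, motion):
    """Maximum absolute normalized cross-correlation over all lags.
    Returns ~1 for strongly correlated segments and ~0 for unrelated
    ones. Assumes both segments have the same length (one window)."""
    a = (component - component.mean()) / (component.std() + 1e-12)
    b = (motion - motion.mean()) / (motion.std() + 1e-12)
    # full cross-correlation gives slight shift invariance within the window
    c = np.correlate(a, b, mode="full") / len(a)
    return float(np.abs(c).max())
```

Components whose score against any motion channel exceeds the chosen threshold (0.8 above) would be marked for removal before reconstruction.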
Note on ICA Algorithms
While several algorithms to choose components were tested, the one that performed best was the ICA implementation in EEGLab. FastICA performed well in some segments of the data, but it would fail to converge in others.
Overall, the results of this method are rather mixed. While the method presented did identify and remove some motion-related components, it was not nearly as effective as desired. Typically, only noise with high amplitude that coincided with motion of high amplitude was removed. Additionally, analysis in the frequency domain indicates that the reconstruction introduces some artifacts that may harm analysis of the underlying EEG signal.
Attempts to lower the component-removal threshold (0.8) to remove more noise resulted in large chunks of valid data being thrown out.
The reconstruction, for the most part, was close to the original signal. Some large portions of noise that correlated well were removed. Figure 4a highlights a segment of the signal in which some components were removed due to high correlation with the motion signal while the subject was talking.
In contrast, Figure 4b depicts two segments where no components were removed due to low correlation. The left segment has noise in the EEG signal but no movement (the subject’s movement was not in range of the camera). The right segment has some motion but little correlated noise.
In both cases, relevant EEG data seems to have been preserved. While no re-synthesis from a component analysis can provide a perfect reconstruction, this reconstruction is close.
Power Spectral Density
Power Spectral Density (PSD) is a well-known method of analysis for EEG signals. The PSD shows the amount of each frequency present in the signal. Muscle spasms and eye twitches should appear in the 5–200 Hz range. Therefore, if the method is successful, the cleaned data should have a PSD similar to the original data, perhaps with less power in those ranges.
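A PSD estimate of the kind compared in the figures can be sketched with a windowed periodogram; this is a generic numpy sketch, not the authors' exact analysis (Welch averaging over segments would reduce the estimate's variance).

```python
import numpy as np

def psd(x, fs):
    """Windowed periodogram: a simple one-shot PSD estimate for a
    1-D signal x sampled at fs Hz. Returns (frequencies, power)."""
    n = len(x)
    w = np.hanning(n)  # taper to reduce spectral leakage
    X = np.fft.rfft(x * w)
    p = (np.abs(X) ** 2) / (fs * np.sum(w ** 2))
    f = np.fft.rfftfreq(n, d=1.0 / fs)
    return f, p
```

Comparing such estimates for the original and reconstructed signals is what reveals whether power in the 5–200 Hz range decreased as intended.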
However, comparing the power of the boxed ranges in Figure 5a to Figure 5b reveals that the power in these areas increased. The method seems to have, in effect, normalized those particular frequencies.
While more analysis must be done before the definitive cause of these artifacts can be stated, it is possible that the small segmentation windows cause new frequencies to be introduced when components are removed.
The authors recognize that the results are unsatisfying and could be vastly improved. Many avenues for enhancement have been identified, but they unfortunately could not be implemented due to time constraints. One potentially impactful use of this work would be a way to train a neural network or classifier to automatically identify movements and associate them with EEG signals.
The analysis of the results is done by visual inspection of the time and frequency domains, without an obvious success metric. Typical methods to remove artifacts with blind source separation have experts code the artifacts by spatial location using heuristics. Construction of a comprehensive dataset with annotated artifacts and associated video would be fantastic and would allow for comparison against other methods on the same subjects. It would also allow this method to be evaluated objectively by the ratio of artifacts removed and the ratio of desired signal preserved.
Sliding Filter to Replace Segmentation
Segmentation into discrete chunks can lead to alignment issues between the motion and EEG signals or make artifact features unrecognizable. Allowing for shift invariance across the entire signal would be ideal.
Additionally, segmentation is believed to be the cause of the artifacts introduced in the power spectral density of the reconstruction. Hopefully a sliding filter with overlapping windows would fix this.
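The overlapping-window idea can be sketched as follows; the per-window cleaning step is omitted, and the tapered overlap-add recombination is an assumption about how such a sliding filter might avoid boundary discontinuities.

```python
import numpy as np

def windows(x, win, hop):
    """Split a 1-D signal into overlapping windows of length `win`
    with hop size `hop` (hop < win gives overlap)."""
    starts = list(range(0, len(x) - win + 1, hop))
    return np.stack([x[s:s + win] for s in starts]), starts

def overlap_add(segs, starts, n, win):
    """Recombine (possibly modified) windows with a tapered cross-fade
    so segment boundaries do not introduce discontinuities."""
    out = np.zeros(n)
    norm = np.zeros(n)
    w = np.hanning(win)
    for seg, s in zip(segs, starts):
        out[s:s + win] += seg * w
        norm[s:s + win] += w
    return out / np.maximum(norm, 1e-12)
```

Each window would be cleaned with the ICA procedure before recombination; with unmodified windows, the overlap-add step reconstructs the covered portion of the signal exactly.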
Alternative Source Separation Algorithms
While FastICA was compared to another implementation of ICA in EEGLab, few other source separation techniques were attempted. Analysis of wavelet decomposition and non-negative matrix factorization as possible algorithms would be interesting.
Cross-correlation was also the only similarity measure used between ICA components and motion signals, and further work should be done to study other measures.
The current system relies on a lot of pre-processing and cannot be run online. With modifications to both the motion detection and ICA reconstruction portions of the algorithm, a system could be built to process each frame in under a second.
Instead of computing the Haar-Cascade twice for each frame, the motion detection method would be modified to initially compute SIFT features and then track optical flow using the Lucas-Kanade algorithm.
The method presented here shows preliminary results at being able to remove ICA components correlated with facial motion detected through a web-cam. While significant improvements must be made before the method is robust (see Results section), there are a few benefits offered by the proposed method over similar ones.
Namely, this method uses external information (video) that should be available in a nonmedical environment using the subject’s camera. It also requires no classification training by an expert, which allows subjects to perform EEG tests at home. This could be used for remote diagnosis kits, Neurofeedback, or enhanced learning. Finally, this method may alleviate some concerns over data quality in consumer headsets by allowing software post-processing to make up for the lower quality electrodes.
References

Jung, Tzyy-Ping, et al. “Removing electroencephalographic artifacts by blind source separation.” Psychophysiology 37.02 (2000): 163-178.

Jung, Tzyy-Ping, et al. “Removal of eye activity artifacts from visual event-related potentials in normal and clinical subjects.” Clinical Neurophysiology 111.10 (2000): 1745-1758.

Akhtar, Aadeel, et al. “Playing checkers with your mind: An interactive multiplayer hardware game platform for brain-computer interfaces.” Engineering in Medicine and Biology Society (EMBC), 2014 36th Annual International Conference of the IEEE. IEEE, 2014.

Viola, Paul, and Michael Jones. “Rapid object detection using a boosted cascade of simple features.” Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on. Vol. 1. IEEE, 2001.

Lipton, Alan, et al. A System for Video Surveillance and Monitoring. Vol. 2. Pittsburgh: Carnegie Mellon University, the Robotics Institute, 2000.

Duvinage, Matthieu, et al. “Performance of the Emotiv Epoc headset for P300-based applications.” BioMedical Engineering OnLine 12.1 (2013): 56.

Delorme, Arnaud, and Scott Makeig. “EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis.” Journal of Neuroscience Methods 134.1 (2004): 9-21.

Hyvarinen, Aapo. “Fast and robust fixed-point algorithms for independent component analysis.” IEEE Transactions on Neural Networks 10.3 (1999): 626-634.

Schwilden, Helmut. “Concepts of EEG processing: from power spectrum to bispectrum, fractals, entropies and all that.” Best Practice & Research Clinical Anaesthesiology 20.1 (2006): 31-48.

Lowe, David G. “Distinctive image features from scale-invariant keypoints.” International Journal of Computer Vision 60.2 (2004): 91-110.

Lucas, Bruce D., and Takeo Kanade. “An iterative image registration technique with an application to stereo vision.” IJCAI. Vol. 81. 1981.