Electroencephalogram (EEG) Based Imagined Speech Decoding and Recognition

1 State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin 300130, China; 2 Biomedical Engineering Department, University of Ilorin, P.M.B. 1515, Ilorin, Nigeria; 3 Electrical and Electronics Department, Taraba State University, Jalingo, Nigeria; 4 Electrical Engineering Department, Ahmadu Bello University, Zaria, Nigeria


INTRODUCTION
Neuroimaging techniques have made a significant contribution to decoding brain physiological phenomena as signals to control a BCI system. These phenomena include the P300 evoked potential, slow cortical potentials (SCP), visual evoked potentials (VEP), and sensorimotor rhythms (SMR), and they have been used to restore lost verbal communication for people with an intact language system but a deficit in verbal communication due to disease or injury [1,2]. Loss of verbal communication can be due to neurodegenerative disorders that affect speech articulation and motor production, such as aphasia and its variants [3,4]. In speech comprehension and production, one of the aims of a neural prosthetic device is to restore communication to affected patients by characterizing their neural activity [5]. Over the years, various neuroimaging methods have been used; they can be classified into invasive and noninvasive techniques. Invasive methods require microelectrode arrays to be implanted through the skull into the brain, so they involve surgery that must be performed with high precision by expert surgeons. Although this method provides a good signal-to-noise ratio, the formation of scar tissue over the device due to the reaction to foreign matter, as well as the complex surgery that leaves a permanent hole in the skull, limits its application and poses a health risk to the patient that may not be acceptable [6]. Electrocorticography (ECoG), or intracranial EEG (iEEG), is a partially invasive method in which electrode arrays are implanted over the brain but inside the skull; this method avoids the problem of scar tissue formation even though the signal strength is weaker. Noninvasive techniques are the most widely used, with EEG being the most commonly accepted neuroimaging technique. Other noninvasive methods are magnetoencephalography (MEG) [7], near-infrared spectroscopy (NIRS), and functional magnetic resonance imaging (fMRI). The advantages of the EEG neuroimaging method include high temporal resolution, high portability, low cost, and low risk to users.
Various BCI applications have made it possible for people to communicate directly between the brain and a computer, transferring messages from one's thoughts to the outside world, enabling an individual to communicate in a non-muscular way and to control his or her surroundings. When we perform a task, the brain generates a signal corresponding to the activity pattern [8,9]. Exploring and identifying these patterns is a challenging task and the key to a successful BCI system. Over the past decade, various BCI techniques have been developed to help people with severe communication deficits restore their communication [10]. These neurotechnological devices range from speller devices and virtual keyboards to moving a cursor on a screen, to name a few [11,12]. Even though these techniques show promising performance, the need for patients to learn to adjust their brain activity in an artificial and trained manner, such as rapidly detecting letters presented on a screen, rotating a cube, or imagining motions in order to operate an interface, limits their application. Therefore, to improve these techniques and to provide alternatives, a system that allows people to communicate more naturally by directly translating or decoding inner speech from brain signals is desirable [13,14].
Imagined speech, or covert speech, is the ability to produce an internal representation of speech without any external speech stimulation or self-generated overt speech. Understanding its underlying mechanism remains a great challenge for researchers, and the inner neuronal processes are difficult to investigate because of the absence of behavioral output and the difficulty of time-locking exact events to neural activity during imagined speech [15,16]. Considerable effort has been devoted to understanding neural representations during imagined speech, to improving neuroprosthetic devices, and to developing alternative approaches for analyzing neural signal features during imagined speech. An example is the work reported in [17,18], where imagined speech was used to identify different subjects, demonstrating the variability of EEG signals when the same task is performed by different subjects. EEG responses have also been decoded during the imagination and perception of music [19].
This review paper presents recent progress in decoding, recognizing, and monitoring neural processes during imagined speech for improving neuroprosthetic assistive communication devices. The paper reviews studies that have used the EEG neuroimaging approach, as this method allows brain activity to be monitored with high temporal resolution and is highly portable, low cost, and safer compared with other approaches. We highlight various imagined speech decoding and experimental paradigms to explore some of the challenges encountered during covert speech decoding. To develop natural-speech and realistic neuroprosthetic devices, future directions and new trends in tackling the technical challenges are explored and considered.
The remainder of this paper is organized as follows: section two briefly describes the properties of EEG signals; section three introduces the general decoding models for characterizing brain activity and reviews studies that employed EEG signals in decoding imagined speech; section four highlights future directions, challenges, and possible solutions for a successful and realistic brain-machine interface system; and section five concludes with our discussion and findings.

Imagined Speech Decoding System
This section provides a general overview of an imagined speech decoding system. The system consists of the following stages, as shown in Fig. 1: 1. data acquisition, 2. preprocessing, 3. feature extraction, 4. classification, and 5. performance analysis and evaluation. Preprocessing is the process of eliminating artifacts and noise from the recorded EEG data, which are contaminated by external and internal factors such as environmental noise, power line interference, muscle artifacts (EMG), eye blinking (oculogram), etc. Preprocessing in imagined speech is an important step that influences classifier performance by removing unwanted signals. Relevant and significant features are then extracted; the most commonly applied methods for BCI systems are common spatial patterns [20], autoregressive coefficients [21], and spectro-temporal features [22]. In the classification stage, many classifiers have been employed for decoding imagined speech; the most popular ones are support vector machine (SVM), linear discriminant analysis (LDA), and random forest (RF) classifiers. Summaries of the various techniques for preprocessing, feature extraction, and classification, with their advantages and disadvantages, are provided in section 3 and appendices I, II, and III, respectively.
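As a concrete illustration of these stages, the sketch below runs a toy version of the pipeline on synthetic data, using per-channel log-variance as a stand-in feature and a linear SVM as the classifier. The array shapes, the random data, and the feature choice are illustrative assumptions, not a prescription from the reviewed studies.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
epochs = rng.standard_normal((200, 64, 256))   # 200 trials x 64 channels x 1 s at 256 Hz
labels = rng.integers(0, 2, size=200)          # two imagined classes (e.g., two words)

# Feature extraction: per-channel log-variance of each trial
features = np.log(epochs.var(axis=2))

# Classification and evaluation
X_tr, X_te, y_tr, y_te = train_test_split(features, labels, test_size=0.25, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```

With random data the accuracy stays near chance; the point of the sketch is only the order of the stages, not the numbers.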

Data Acquisition
Data acquisition in a speech decoding system is the process of collecting and recording neural data from the subjects participating in an experiment. Imagined speech datasets can be recorded either invasively or noninvasively. The invasive recording method is electrocorticography (ECoG), while the noninvasive method uses scalp EEG.
The noninvasive method is the most commonly applied, as it carries low risk and is safer for the subjects [23]. During the experiments, the subjects imagine a chosen set of phonemes, syllables, phrases, or sentences while their EEG signals are recorded. The experimental protocol involves a pre-trial period in which the subjects are prepared and acquainted with the experimental target. Following the pre-trial period, the subjects are asked to imagine the set of stimuli within a pre-defined time interval. An example of the time-locked experiment conducted in [7] is shown in Fig. 2. Some datasets used in imagined speech decoding are publicly available to interested researchers, such as [24], while others can be obtained upon request from their owners.
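To make the time-locked structure concrete, the sketch below cuts a continuous recording into fixed-length trials around each imagination cue. The sampling rate, cue spacing, and window limits are arbitrary assumptions chosen for illustration; real protocols define these values in their experimental paradigm.

```python
import numpy as np

fs = 256                                   # sampling rate in Hz (assumed)
raw = np.random.randn(64, 5 * 60 * fs)     # 5 minutes of continuous 64-channel EEG (placeholder)
cue_samples = np.arange(2 * fs, raw.shape[1] - 3 * fs, 5 * fs)  # one cue every 5 s

def epoch(raw, onsets, fs, tmin=-0.5, tmax=2.0):
    """Cut fixed-length windows (tmin..tmax seconds) around each cue onset."""
    pre, post = int(-tmin * fs), int(tmax * fs)
    return np.stack([raw[:, o - pre:o + post] for o in onsets])

trials = epoch(raw, cue_samples, fs)       # trials x channels x samples
print(trials.shape)
```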

Preprocessing
To improve the efficiency of a classifier and to reduce its computational complexity, the imagined speech data need to be preprocessed, because not all of the recorded data are useful at the classification stage: artifacts and noise are induced and contaminate the EEG data during the acquisition process. These artifacts include heartbeat artifacts (ECG), eye blinks (EOG), muscle movements (EMG), and those caused by electrode faults, power lines, and interference from equipment and devices [25][26][27]. The artifacts related to EEG signals can be divided into two types depending on their source: those originating internally from the biological activity of the body are called interior artifacts, while those originating externally are called exterior artifacts. A summary of the types of artifacts is listed in Table 1. In addition, the EEG signal has a low signal-to-noise ratio; therefore, it is highly important to preprocess the data, which involves steps such as down-sampling, windowing, and filtering.
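The sketch below illustrates these basic steps on synthetic data with SciPy: a zero-phase band-pass filter, a notch around the power-line frequency, and down-sampling. The cut-off frequencies, original sampling rate, and decimation factor are assumptions chosen for the example, not values taken from any of the reviewed studies.

```python
import numpy as np
from scipy.signal import butter, filtfilt, decimate

fs = 1024                                    # original sampling rate (assumed)
eeg = np.random.randn(64, 10 * fs)           # 10 s of 64-channel EEG (placeholder)

# Zero-phase band-pass filter, 0.5-70 Hz, to suppress drift and high-frequency noise
b, a = butter(4, [0.5, 70], btype="bandpass", fs=fs)
filtered = filtfilt(b, a, eeg, axis=1)

# Band-stop filter around 50 Hz to attenuate power-line interference
b50, a50 = butter(2, [48, 52], btype="bandstop", fs=fs)
clean = filtfilt(b50, a50, filtered, axis=1)

# Downsample to 256 Hz (factor 4); decimate applies its own anti-aliasing filter
downsampled = decimate(clean, 4, axis=1)
print(downsampled.shape)                     # (64, 2560)
```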
After preprocessing, relevant features are extracted from the cleaned signals using various feature extraction techniques. Details of the feature extraction techniques are discussed in section 3 and appendix II.
Fig. 3. Functional brain network areas involved in the production, planning, and perception of speech [32].

Neural Correlates of Language
Early studies of speech recognition focused on identifying the areas responsible for language processing in the human brain, using patients undergoing neurosurgery and those with neurological damage [28]. These studies began as early as 1861 with Paul Broca, a surgeon who investigated the human brain using nine patients with lesions and found that the language areas in humans are located in the posterior frontal gyrus of the left hemisphere, in the region called Broca's area. Some years later, Carl Wernicke identified an area, now known as Wernicke's area, and hypothesized that the posterior part of the left temporal lobe is also involved in language comprehension. Several studies have since explored the neural regions for speech and language processing and developed functional models of their significance [29][30][31].
These models help in identifying the areas that are involved in the production, planning, and perception of speech. Fig. 3 depicts these areas, such as the primary motor cortex, pre-motor cortex, Broca's area, Wernicke's area, and primary auditory cortex. Fig. 4 shows how EEG is recorded noninvasively using electrodes placed on the scalp, with the signals displayed on a computer to depict the electrical activity of the brain detected by the electrodes. However, in some specific applications, invasive electrodes can be used, termed intracranial EEG. These electrical recordings from the surface of the brain, or even from the outer surface of the head, reveal continuous electrical activity in the brain [33]. The frequencies and amplitudes of the brain signals change with the state of the brain, such as during sleep, wakefulness, mental activity, or in disease states such as dementia, epilepsy, and sleep disorders. Fig. 5 shows an example of normal EEG signals. The EEG signal is measured as the potential difference over time between an active electrode and a reference electrode. The international 10-20 system, accessed from Brain Master Technologies Inc. [34], is shown in Fig. 6. Multichannel EEG sets contain up to 128 or 256 active electrodes. These electrodes are typically made of silver/silver chloride (Ag/AgCl). A gel is used to create a conductive path between the skin and the electrode for the flow of current. Electrodes that do not use gels, called 'dry' electrodes, are made of materials such as titanium and stainless steel.
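Because each channel is a potential difference against a reference electrode, recordings are often re-referenced offline. The snippet below shows one common choice, a common average reference, on placeholder data; the channel count and the use of an average reference are illustrative assumptions rather than a recommendation from the reviewed studies.

```python
import numpy as np

eeg = np.random.randn(32, 2560)              # channels x samples (placeholder recording)
# Common average reference: subtract the instantaneous mean across all channels
car = eeg - eeg.mean(axis=0, keepdims=True)
```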

Characteristics of EEG Signals
One of the most important measures in clinical EEG for evaluating deficits, and in cognitive research, is frequency. A recorded EEG has frequency content within roughly the 0.01-100 Hz range. This frequency content can be divided into five major bands, known as delta, theta, alpha, beta, and gamma. Details of the frequencies associated with these bands are provided in Table 2.
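As a simple illustration of how these bands are used in practice, the sketch below estimates the power spectral density of a single channel with Welch's method and averages it within each band. The band edges are typical textbook values used here as assumptions, and the data are synthetic.

```python
import numpy as np
from scipy.signal import welch

fs = 256
signal = np.random.randn(30 * fs)            # 30 s of single-channel EEG (placeholder)
bands = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 100)}

freqs, psd = welch(signal, fs=fs, nperseg=2 * fs)   # 2 s Welch segments
for name, (lo, hi) in bands.items():
    mask = (freqs >= lo) & (freqs < hi)
    print(f"{name}: mean PSD = {psd[mask].mean():.4f}")
```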

GENERAL DECODING MODELS
Sophisticated predictive models are required for BCI applications that decode cognitive functions in real time, allowing researchers to use multivariate neural features in complex and rich behavioral conditions [35,36]. A regression framework is a widely used modelling approach for linking neural processes, mental states, and stimulus features. For example, the stimulus feature at a particular instant can be modelled as a weighted sum of the neural processes, as in equation (1):

Y(t) = Σ_p w(p) X(t, p)        (1)

where Y(t) is the stimulus feature at time t, w(p) is the weight for a given neural feature p, and X(t, p) is the neural process at time instant t for feature p.
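The toy example below fits equation (1) on synthetic data: a stimulus feature is generated as a weighted sum of neural features plus noise, and ridge regression is used to recover the weights. The feature dimensions, the ridge penalty, and the synthetic data are all assumptions for illustration; the reviewed studies each define their own neural features and estimation method.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 30))                 # time points x neural features X(t, p)
w_true = rng.standard_normal(30)                    # hidden weights w(p)
Y = X @ w_true + 0.1 * rng.standard_normal(1000)    # stimulus feature Y(t) with noise

model = Ridge(alpha=1.0).fit(X, Y)                  # estimate the weights from data
print("correlation between predicted and true Y:", np.corrcoef(model.predict(X), Y)[0, 1])
```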
Classification can also be used as a decoding model, in which neural activity is recognized as belonging to one of a finite set of discrete event types. Several machine learning algorithms can be used in both models, such as support vector machines, neural networks, hidden Markov models, and simple regression methods, among others [37]. Some studies have focused on summarizing the perception and imagination of speech and music into various models that relate neuronal responses to an auditory stimulus.
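For comparison with the regression framing, the sketch below cross-validates two common classifiers on synthetic trial features, treating each trial as one of five discrete imagined-speech classes. The feature dimensionality, class count, and choice of LDA and SVM are illustrative assumptions only.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.standard_normal((300, 40))           # trials x features (placeholder)
y = rng.integers(0, 5, size=300)             # five discrete imagined-speech classes

for clf in (LinearDiscriminantAnalysis(), SVC(kernel="rbf")):
    scores = cross_val_score(clf, X, y, cv=5)          # 5-fold cross-validated accuracy
    print(type(clf).__name__, round(scores.mean(), 3))
```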

Decoding of imagined speech based on EEG
To understand the neural representation of imagined speech, from low-level acoustic features to higher-level speech representations, evaluating the relationship between the imagined speech stimulus and the neural response is a great challenge. In view of this, various studies have demonstrated and highlighted the benefit of EEG recordings for classifying imagined speech representations. Early work in this area is that of [20,43], in which five different words were classified using a hidden Markov model (HMM) classifier. [20,44] used common spatial filtering to decode silent vowel speech with a support vector machine (SVM).
[24] classified vowels using a random forest after down-sampling the data and reported an accuracy of 22.32%. Discrimination of imagined speech in EEG was proposed in [45] using tensor decomposition. Hilbert transform and Hilbert spectrum methods were used by [46], [47], and [48] to decode imagined speech from EEG signals; these studies used two different syllables during their experiments but with four and seven subjects, respectively. The wavelet transform was used for feature extraction, together with alternating least squares approximation and down-sampling of the data, for vowel classification, obtaining an accuracy of 59.70% with an SVM classifier [45]. Multi-class classification of words using connectivity features was proposed in [49].
Subject recognition from imagined speech using autoregressive (AR) coefficients with k-nearest neighbor (k-NN) and linear SVM classifiers was performed in [50][51][52]. EEG signals were used to characterize 10 imagined English phonemes [52,53], with naive Bayes and linear discriminant analysis for feature extraction and classification. Decoding of Chinese characters based on EEG speech imagery was proposed by [54,55]; common spatial patterns (CSP) and an SVM approach were employed for preprocessing and classification, respectively. Japanese vowels were classified from EEG recordings using SVM [56]. Imagined speech classification based on the Riemannian distance of correntropy spectral density was proposed in [57]. Word classification was performed in [58,59], English vowels in [60][61][62][63][64], phonemic decoding in [22], and Spanish vowels in [65], with different approaches to preprocessing, feature extraction, and classification. In [66], an extreme learning machine (ELM) was trained and tested on raw EEG data classification and compared with several machine learning classifiers; the ELM showed promising results by outperforming the other classifiers. Marthe et al. [19] proposed an imagined music decoding and recognition model and hypothesized that the same linear neural decoding models used in imagined speech decoding can also be used to decode imagined music. Their proposed model achieved an accuracy of 69.2%.
Recent studies in the area of imagined speech decoding from EEG recordings are shifting their attention towards new methods and recent machine learning techniques such as deep learning [67,68].
These include classification of imagined speech using a regularized neural network [69], the use of an artificial neural network (ANN) to classify bilingual unspoken speech [70], and online classification of imagined speech for EEG-based BCIs. Machine learning algorithms have also been applied to analyze the similarities and differences among perception, production, and imagination using EEG patterns [71]. Deep learning approaches have been applied for many years in other areas such as vision recognition, image processing, and motor imagery, but only recently have researchers begun to explore the benefits of this technique for imagined speech processing, decoding, and analysis from EEG [72][73][74][75]. Convolutional neural network (CNN) algorithms with both deep and shallow architectures were evaluated for classifying word pairs in an EEG dataset, with average accuracies of 62.37% and 60.88% for the deep and shallow CNNs, respectively [66]. In another study, five vowels were classified after down-sampling the data to 128 Hz; independent component analysis (ICA) with a Hessian approximation was deployed for artifact removal during preprocessing, and classification was performed using a deep CNN with 32 layers. Five main vowels (a, e, i, o, u) and six different words were classified by Tamm et al. [76]; they proposed a computationally light CNN model with a small number of layers and achieved an accuracy of 23.98%. To improve CNN models, Cooney et al. [77] proposed an optimized structure, optimizing the input layers to decode imagined speech using transfer learning.
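As a rough illustration of the shallow CNN idea, the sketch below defines a small temporal-then-spatial convolution network over single-trial EEG epochs in PyTorch. The layer sizes, kernel lengths, and input shape are arbitrary assumptions for the example and do not reproduce the architectures of [66], [76], or [77].

```python
import torch
import torch.nn as nn

class ShallowEEGNet(nn.Module):
    def __init__(self, n_channels=64, n_samples=256, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(1, 25), padding=(0, 12)),  # temporal filters
            nn.Conv2d(16, 16, kernel_size=(n_channels, 1)),          # spatial filters over channels
            nn.BatchNorm2d(16),
            nn.ELU(),
            nn.AvgPool2d(kernel_size=(1, 4)),
            nn.Dropout(0.5),
        )
        self.classify = nn.Linear(16 * (n_samples // 4), n_classes)

    def forward(self, x):                    # x: (batch, 1, channels, samples)
        z = self.features(x)
        return self.classify(z.flatten(start_dim=1))

model = ShallowEEGNet()
logits = model(torch.randn(8, 1, 64, 256))   # 8 trials of 64-channel, 1 s epochs
print(logits.shape)                          # torch.Size([8, 2])
```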

FUTURE CHALLENGES
For the successful implementation of a BCI system using imagined speech based on EEG recordings, future research must focus on a key and challenging step, which is to apply the different levels of speech processing and representation to imagined speech despite the absence of behavioral output and the difficulty of monitoring the spectrotemporal structure. Time-locking brain processes to a behavioral state or a quantifiable stimulus is a complex task, as the experimenter cannot directly monitor the imagination. Therefore, standard techniques and models that match input-output data have to be employed. Also, several factors such as age, emotion, gender, dialect, and pronunciation affect natural speech, which results in temporal irregularities. Other challenges identified in our study, together with their remedies, include those associated with designing a good experimental task, properly training participants, generating a sufficient amount of data, using effective electrodes and improving their design, and employing unsupervised machine learning techniques, among others. Recently, most researchers have focused their attention on applying deep learning algorithms to the decoding and recognition of imagined speech; therefore, a large amount of data must be generated to train the networks effectively. Also, the high computational resource requirements of deep learning models must be addressed to enable other researchers to experiment with and evaluate their models, so as to realize precise, practical, and reliable non-invasive models. Finally, homogeneous performance comparison among the developed techniques is difficult, as there is a lack of standardization for evaluating their performance, owing to different datasets with different sampling frequencies, numbers of electrodes, and other parameters.

CONCLUSIONS
In conclusion, this paper has highlighted the progress and explored the potential of various decoding models and algorithms through a collection of studies that used EEG recordings to identify the neural mechanisms related to complex speech production and function. These models are also capable of recognizing and characterizing the important components of natural communication, that is, speech and language perception and production, directly from brain activity. Several studies reviewed in this paper report promising results for the classification of phonemes, vowels, and words from EEG brain signals, but they also show that much work remains to be done to provide realistic and efficient brain-machine interfaces and neuroprosthetic devices. Recent machine learning techniques such as deep learning need to be investigated further, and improvements in the experimental paradigm, preprocessing, and feature extraction are paramount for better recognition, as these are key aspects in the development of new communication interfaces.

This article is licensed under a Creative Commons Attribution 4.0 International License.