DETERMINED BLIND SIGNAL DE-NOISING

FOR ENHANCED COMMUNICATION USING

MACHINE INTELLIGENCE

DENIS OMBATI OMWERI

MASTER OF SCIENCE

Telecommunication Engineering

JOMO KENYATTA UNIVERSITY OF

AGRICULTURE AND TECHNOLOGY

2015

Determined Blind Signal De-Noising for Enhanced

Communication Using Machine Intelligence

Denis Ombati Omweri

A Thesis Submitted in Partial Fulfilment for the Degree of Master of Science in Telecommunication Engineering in the Jomo Kenyatta University of Agriculture and Technology

2015

DECLARATION

This thesis is my original work and has not been presented for a degree in any other University.

Signature ………………………………………. Date ……………………..

Denis Ombati Omweri

Declaration by Supervisors

This thesis has been submitted for examination with our approval as University supervisors:

Signature ………………………………………. Date ……………………..

Dr. Edward N. Ndungu

JKUAT, Kenya.

Signature ………………………………………. Date ……………………..

Dr. Livingstone M. Ngoo

Multimedia University, Kenya.

DEDICATION

This thesis is dedicated to my parents and siblings.

ACKNOWLEDGEMENT

First, I would like to thank the Almighty God for abundant grace throughout this journey called life. I would also like to express my deepest gratitude and appreciation to my two supervisors, Dr. E. N. Ndungu and Dr. L. M. Ngoo, for their enthusiastic encouragement, excellent guidance and kind support during this research and the development of this thesis, throughout my tenure at Jomo Kenyatta University of Agriculture and Technology.

I would also like to acknowledge the Department of Telecommunication and Information Engineering for offering me the opportunity to study, the financial support and the research facilities. Special thanks also go to my colleagues and staff at the School of Electrical, Electronics and Information Engineering for their help and support.

I express special thanks to all my friends and colleagues who inspired me during the course of the research work, and to all those who provided free resources over the internet for researching this project. Finally, I would like to sincerely thank my family for their continued support and encouragement. They rendered me enormous support during the course of this project work, and without their support this educational opportunity would not have been possible.

TABLE OF CONTENTS

DECLARATION ....................................................................................................... II

DEDICATION .......................................................................................................... III

ACKNOWLEDGEMENT ....................................................................................... IV

TABLE OF CONTENTS .......................................................................................... V

LIST OF TABLES .................................................................................................... IX

LIST OF FIGURES .................................................................................................. X

APPENDICES ....................................................................................................... XIII

LIST OF ABBREVIATIONS ............................................................................... XIV

ABSTRACT ........................................................................................................... XVI

CHAPTER ONE ........................................................................................................ 1
1. INTRODUCTION .................................................................................................. 1
1.1 Introduction to Blind Source Signal Processing ................................................. 1
1.2 Blind Source Separation Background ................................................................. 2
1.3 The Cocktail Party Problem ................................................................................ 2
1.4 Justification of the Research ............................................................................... 3
1.5 Objectives and the Scope of Work ...................................................................... 4
1.5.1 Main Objective ................................................................................................ 4
1.5.2 Specific Objectives .......................................................................................... 4
1.5.3 The Scope of Work .......................................................................................... 5
1.6 Thesis Layout ...................................................................................................... 5

CHAPTER TWO ........................................................................................................ 7
2. BLIND SOURCE SEPARATION OVERVIEW .................................................... 7
2.1 Introduction ......................................................................................................... 7
2.2 Instantaneous Blind Source Separation ............................................................. 11
2.3 Adaptive Algorithm ........................................................................................... 13
2.3.1 Independent Component Analysis (ICA) ...................................................... 13
2.3.2 Infomax: Entropy and Independence ............................................................. 14
2.3.3 Independence of the Extracted Signals .......................................................... 14
2.4 Information Theory ........................................................................................... 17
2.4.1 Entropy of a Single Event .............................................................................. 17
2.4.2 Exploring a Single Signal Row ...................................................................... 20
2.4.3 Entropy of Multiple Variables ....................................................................... 23
2.5 The pdf of the Extracted Signals ....................................................................... 25
2.6 Infomax Expression for Entropy ....................................................................... 26
2.7 Evaluating the Gradient of Entropy .................................................................. 27

CHAPTER THREE .................................................................................................. 31
3. RADIAL BASIS FUNCTION NETWORKS ....................................................... 31
3.1 Introduction ....................................................................................................... 31
3.2 Interpolation Problem ....................................................................................... 32
3.2.1 Solution to the Interpolation Problem ........................................................... 38
3.2.2 Regularization Networks ............................................................................... 40
3.3 Generalized Radial Basis Function Networks .................................................. 43
3.4 Weighted Norm ................................................................................................. 46
3.5 Adaptive Radial Basis Function ........................................................................ 48
3.6 Learning Strategies in RBF ............................................................................... 49
3.7 Self-Organized Centres ..................................................................................... 50
3.8 K-Means Clustering Algorithm ........................................................................ 51
3.9 Supervised Learning Strategy ........................................................................... 53
3.9.1 Error Minimization Algorithm ...................................................................... 54
3.9.2 Merits of Generalized Radial Basis Function for Blind Source Separation .. 55

CHAPTER FOUR ..................................................................................................... 57
4. METHODOLOGY ............................................................................................... 57
4.1 Adopting the Infomax Algorithm ..................................................................... 58
4.2 ICA Infomax MATLAB Implementation ......................................................... 60
4.3 Performance Measurement ............................................................................... 60
4.4 Observations Made ........................................................................................... 61
4.4.1 Results for M = 2 Source Signals .................................................................. 62
4.4.2 Results for M = 3 Source Signals .................................................................. 64
4.4.3 Results for M = 4 Source Signals .................................................................. 66
4.5 Summary on ICA Algorithm ............................................................................ 68

CHAPTER FIVE ...................................................................................................... 70
5. SIMULATION AND RESULTS .......................................................................... 70
5.1 Radial Basis Network Modelling of Data ......................................................... 70
5.2 Natural Gradient Descent Algorithm ................................................................ 72
5.3 A Sample Communication Signal .................................................................... 76
5.4 Infomax Algorithm for Polar NRZ Signal ........................................................ 76
5.4.1 The Triangular Model .................................................................................... 78
5.4.2 The Hyperbolic Tangent Model .................................................................... 82
5.5 Radial Basis Function Network Simulations .................................................... 84
5.6 The Polar Non-Return to Zero Simulations ...................................................... 88
5.6.1 Results for M = 2 Source Signals .................................................................. 88
5.6.2 Results for M = 3 NRZ Source Signals ......................................................... 89
5.6.3 Results for M = 3 Nonlinear Mixing Function .............................................. 91
5.7 Performance Measurement in Blind Signal De-Noising .................................. 92
5.7.1 Results of Using RBFN in Nonlinear Non-Return to Zero Source Mixtures 98
5.7.2 Summary of the RBFN Performance as Compared to That of ICA ............ 100

CHAPTER SIX ....................................................................................................... 103
6. CONCLUSION AND RECOMMENDATION .................................................. 103
6.1 Conclusions ..................................................................................................... 103
6.2 Recommendations ........................................................................................... 104
6.3 Extensions of Information Maximization and Supervised Radial Basis Function Network ................................................................................................. 104

REFERENCES ....................................................................................................... 107
APPENDICES ........................................................................................................ 111
LIST OF PUBLICATIONS .................................................................................... 127

LIST OF TABLES

Table 4.1: Algorithm variable values for M = 3. .................................................... 62
Table 4.2: Algorithm variable values for M = 3. .................................................... 64
Table 4.3: Algorithm variable values for M = 4. .................................................... 66
Table 4.4: Algorithm variable values for M = 4. .................................................... 67
Table 5.1: Algorithm variable values for M = 2. .................................................... 84
Table 5.2: Algorithm variable values for M = 4. .................................................... 87
Table 5.3: Case 1 - Two Source Signals and Case 2 - Four Source Signals. ....... 101
Table 5.4: Case 3 - Three NRZ Source Signal Estimation with Linear Mixture. 101
Table 5.5: Case 4 - Three NRZ Source Signal Estimation with Non-linear Mixture. .................................................................................................................. 102

LIST OF FIGURES

Figure 1-1: The "Cocktail Party” Problem [1]. ....................................................... 3

Figure 2-1: The block illustrating blind signal processing or blind source identification, general schematic [4]. ................................................... 8

Figure 2-2: The nonlinear model with additive noise [6]. ...................................... 9

Figure 2-3: Adaptive System Identification [7]. ................................................... 10

Figure 2-4: Block diagram of basic linear instantaneous blind source separation problem represented by vectors and matrices [6]. ............................. 12

Figure 2-5: Infomax Strategy. ............................................................................... 17

Figure 2-6: Entropy of a two-event example with equal probability. ................... 19

Figure 2-7:

Transformation of y to Y , From Ref. [3]. .......................................... 21

Figure 3-1: Three examples of separable dichotomies of different sets of five points in two dimensions: (a) linearly separable dichotomy; (b) spherically separable dichotomy; (c) quadratically separable dichotomy [3]

[21]. .................................................................................................... 34

Figure 3-2: The Gaussian radial basis function with centre and radius [19]. ....... 37

Figure 3-3: Regularization network. ..................................................................... 41

Figure 3-4: Generalized radial basis function network [24]. ................................ 46

Figure 3-5: The RBFN Structure. ......................................................................... 55

Figure 4-1: ICA-RBFN flow chart. ....................................................................... 57

Figure 4-2: A sample audio signal. ....................................................................... 58

Figure 4-3: Hyperbolic tangent function and hyperbolic tangent derivative. ....... 59

Figure 4-4: An approximate of two sources through ICA. ................................... 63

Figure 4-5: h ( Y )

,gradient

๏ƒ‘ h and ๐œ† values for M

๏€ฝ

2 . ........................................ 64

Figure 4-6: ICA Infomax for M = 3. ..................................................................... 65

Figure 4-7: h ( Y )

, gradient ๏ƒ‘ h

and ๐œ† values for ๐‘€ = 3 . ...................................... 66

Figure 4-8:

An approximate of the sources through ICA. .................................... 67

Figure 5-1: A sample NRZ signal. ........................................................................ 76

Figure 5-2: Theoretical polar NRZ cdf. ................................................................ 77

Figure 5-3: Theoretical polar NRZ pdf. ................................................................ 77

Figure 5-4: Approximate polar NRZ pdf. ............................................................. 78

x

Figure 5-5: Approximate polar NRZ cdf ๏Š

( y )

, pdf ๏Š

' ( y )

, and ๏Š

' ' ( y )

for ๏ง ๏€ฝ

0 .

1 .

............................................................................................................ 81

Figure 5-6: Approximate polar NRZ cdf ๏Š

( y )

, and pdf ๏Š

'' ( y )

with ๏ณ ๏€ฝ

10

. ........ 83

Figure 5-7: RBFN approximation of the sources. ................................................. 85

Figure 5-8: SDR comparison between ICA and RBFN Using two source signals. ............................................................................................... 86

Figure 5-9: SIR comparison between ICA and RBFN Using two source signals. 87

Figure 5-10: Radial Basis Function Approximation of four sources signals. ......... 88

Figure 5-11: RBFN approximation of two sources. ............................................... 89

Figure 5-12: An approximate of two sources through ICA. ................................... 89

Figure 5-13:

An approximate of four sources through ICA. .................................. 90

Figure 5-14: An approximate of four sources by RBFN. ....................................... 91

Figure 5-15:

An approximate of the sources through ICA. .................................... 91

Figure 5-16: RBFN approximation of three sources. ............................................. 92

Figure 5-17: SDR comparison between ICA and RBFN Using two source signals. ............................................................................................... 93

Figure 5-18: SIR comparison between ICA and RBFN Using two source signals. 94

Figure 5-19: SIR comparison between ICA and RBFN Using four source signals. 95

Figure 5-20: SDR comparison between ICA and RBFN Using four source signals. ............................................................................................... 95

Figure 5-21: SAR comparison between ICA and RBFN Using four source signals. ............................................................................................... 96

Figure 5-22: SIR comparison between ICA and RBFN Using three source signals. ............................................................................................... 97

Figure 5-23: SDR comparison between ICA and RBFN Using three source signals. ............................................................................................... 97

Figure 5-24: SAR comparison between ICA and RBFN Using three source signals. ............................................................................................... 98

Figure 5-25: SIR comparison between ICA and RBFN Using three source signals. ............................................................................................... 99

xi

Figure 5-26: SDR comparison between ICA and RBFN Using three source signals. ............................................................................................... 99

Figure 5-27:

SAR comparison between ICA and RBFN Using three source signals .............................................................................................. 100

xii

APPENDICES

Appendix A: The Relationship Between the Jacobian Matrix and the Unmixing Matrix. ................................................................................................................... 111
Appendix B: Infomax Expression for Entropy. .................................................... 112
Appendix C: MATLAB Codes. ............................................................................. 113
Appendix D: Radial Basis Function Codes. ......................................................... 120
Appendix E: List of Publications. ......................................................................... 127

LIST OF ABBREVIATIONS

BSE - Blind Signal Extraction
BSP - Blind Signal Processing
BSS - Blind Source Separation
cdf - Cumulative Density Function
dpdf - Derivative of Probability Density Function
MEG - Magnetoencephalography
ECG - Electrocardiogram
EEG - Electroencephalogram
EM - Expectation Maximization
FD - Frequency-Domain
FHSS - Frequency-Hopped Spread Signals
GRBFN - Generalized Radial Basis Function Network
G - Green's Function
ICA - Independent Component Analysis
Infomax - Information Maximization
GMM - Gaussian Mixture Model
IF - Intermediate Frequency
LMS - Least Mean-Square
MATLAB - Matrix Laboratory
MIMO - Multiple-Input Multiple-Output
MBD - Multichannel Blind Deconvolution
ME - Maximum Entropy
MMI - Minimization of Mutual Information
ML - Maximum Likelihood
MLP - Multilayer Perceptron Networks
NRZ - Non-Return to Zero
pdf - Probability Density Function
RBFN - Radial Basis Function Network
RZ - Return to Zero
SAR - Signal to Artifacts Ratio
SDR - Signal to Distortion Ratio
SIR - Signal to Interference Ratio
SNR - Signal to Noise Ratio
SVD - Singular Value Decomposition
TDICA - Time-Domain ICA
WLAN - Wireless Local Area Network

ABSTRACT

A hands-free speech recognition system and a hands-free telecommunication system are essential for realizing an intuitive, unconstrained and stress-free human-machine interface. In real acoustic environments, however, speech recognition and speech recording are significantly degraded because the user's speech signal cannot be detected with a high signal-to-noise ratio (SNR) owing to interfering signals such as noise. In this thesis, a blind source separation (BSS) algorithm and Artificial Neural Networks (ANN) are applied to overcome this problem. Artificial intelligence (AI) is the intelligent behaviour exhibited by machines or software; an intelligent machine, therefore, is a system that perceives its environment and takes actions that maximize its chances of success. A Radial Basis Function network is used in this research work. Independent Component Analysis (ICA), a statistical signal processing technique with emerging practical application areas such as blind signal separation and feature extraction from several types of data, is used as a pre-process to the ANN. In blind separation, the ICA algorithm separates the independent sources from their mixtures by measuring the non-Gaussianity of the data. Blind ICA is a common method used to identify artefacts and interference from their mixtures and is applied in fields such as electroencephalography (EEG), magnetoencephalography (MEG) and electrocardiography (ECG). Based on these valuable applications, ICA is implemented here for real-time signal processing such as hands-free communication systems. ICA-based BSS can be classified into two groups in terms of the processing domain, i.e., frequency-domain ICA (FDICA) and time-domain ICA (TDICA). This thesis implements time-domain ICA to separate signal mixtures. Blind ICA also acts as a pre-process for the RBF network so that the network complexity is reduced.

This thesis therefore presents an ICA-Radial Basis Function (ICA-RBF) method based on maximum entropy which performs separation of mixed signals and generalization of input signals. The proposed blind source separation algorithm also maximizes the Signal-to-Interference Ratio (SIR) and Signal-to-Distortion Ratio (SDR) of the extracted signals. This research work specifically emphasizes the information-theoretic approach, filtering and the associated adaptive nonlinear learning algorithm, focusing on two classes of signals: audio signals and digital communication signals (polar non-return-to-zero signals). The results show that an RBF network with ICA as an input pre-process not only has better generalization ability than one without pre-processing, but also converges much faster. To verify the proposed algorithm, MATLAB simulations performed for both offline and real-time signal processing show that the proposed method gives better Signal-to-Interference Ratio and Signal-to-Distortion Ratio than earlier ICA techniques that do not involve neural networks. MATLAB implementation codes are included as appendices.

CHAPTER ONE

1. INTRODUCTION

1.1 Introduction to Blind Source Signal Processing

In this chapter, the background of the problem is described. The objectives of this research as well as the importance of the study are outlined.

As the use of wireless communication expands, more signals are picked up from the environment; their pervasiveness results in overcrowding of the spectrum and an increasing number of overlapping signals. These signals cause co-channel interference when they overlap in their time- and frequency-domain components. When such superimposed signals are received, they are generally difficult to demodulate. This is due to the influence of the interfering signal, which leads to inaccurate statistical decisions at the receiver and hence inaccurate demodulation [1]. In military applications, for example, the ability to correctly demodulate received signals affects friendly communications capabilities as well as hostile threat assessments. Co-channel signals are often received as signal mixtures, although the nature of the source signals and of the mixing process is usually unknown. The problem of finding the original signals from a mixture of signals is called blind source separation [2].

1.2 Blind Source Separation Background

Blind Source Separation (BSS) is a term used to describe the extraction of underlying source signals from a set of observed signal mixtures with little or no information as to the nature of those source signals. This can further be formulated as the problem of separating or estimating the waveforms of the original sources from an array of sensor or transducer signals without knowing the characteristics of the transmission channels. Blind source separation has a variety of applications, including neural imaging, economic analysis and signal processing. A classic example of blind source separation is the cocktail party problem [3], which is explained in the following section.

1.3 The Cocktail Party Problem

The cocktail party problem considers the example of a room full of people speaking simultaneously, with microphones scattered throughout the room; each microphone records a mixture of all the voices in the room. The problem, then, is to separate the voices of the individual speakers using only the recorded mixtures of their voices. A simplified version of the cocktail party problem [1] is shown in Figure 1.1.

Figure 1-1: The "Cocktail Party” Problem [1] .

This version of the "Cocktail Party" problem is simplified, but as the number of sources increases the problem becomes more complicated, creating a dire need for a more intelligent system for source separation. Independent Component Analysis (ICA) is one of the many methods used to address the problem of blind source separation. In this thesis, ICA is the main method used, together with the RBFN.

1.4 Justification of the Research

Blind separation of source signals has received wide attention in various fields such as biomedical signal analysis, data mining, speech enhancement, image recognition and wireless communications. In recently expanding wireless communication, for example, more signals are introduced into the environment. The pervasiveness of these signals results in overcrowding of the spectrum and an increasing number of overlapping signals. Multiple signals overlapping in frequency and time create co-channel interference. When these superimposed signals are received, they are generally difficult to demodulate. This is brought about by the influence of the interfering signal on the decision system at the receiver, resulting in inaccurate demodulation and unacceptable indeterminacies. Therefore, there remains a prime objective of processing the observed signals, using an adaptive system in such an acoustic environment, to facilitate communication.

1.5 Objectives and the Scope of Work

The research objectives are as follows:

1.5.1 Main Objective

The main objective of this thesis is to simulate a blind source separation system model using Independent Component Analysis and a Radial Basis Function network (ICA-RBF). The system will be based on the output sensor (mixed) signals and the extracted signals.

1.5.2 Specific Objectives

The specific objectives of this research are:

(i) To analyse ICA methods based on information theory, as viewed by James Stone, for implementation in any application software.

(ii) To obtain a better generalization of the filtered signals using artificial neural networks of the radial basis function (RBF) type in application software, and hence determine the suitability of the proposed ICA-RBF method for a blind source separation system.

(iii) To compare the performance of the proposed technique, based on its merits and demerits, with reference to an already popular blind source separation method, and then draw conclusions on ICA-RBF networks based on the attained performance measures.

1.5.3 The Scope of Work

In this work, statistical signal analysis is carried out, featuring information maximization and independence maximization using entropy. The time-domain simulations are performed using the MATLAB/Simulink simulation package. Thereafter, an independent component algorithm and "machine intelligence" of the neural network type are incorporated to perform a correct separation of the mixed source signals. An Artificial Neural Network (ANN) of the Radial Basis Function type is used; this type of ANN, with both supervised and unsupervised hidden neurons, connected with the Infomax algorithm, achieved better separation of the sources. Finally, a performance analysis is carried out to validate the filtering performance of the method used.

1.6 Thesis Layout

This thesis consists of six chapters and has six appendices. In Chapter One, an introduction to the subject of research is given, that is, the "Cocktail Party" problem; the problem statement and the objectives are also provided. Chapter Two contains the concept of Blind Signal Processing (BSP) and Independent Component Analysis (ICA). Chapter Three introduces the artificial intelligence of the radial basis function type and discusses how these networks can be used to increase input-output space dimensionality for efficient decomposition and hence easier de-noising. The radial basis function network learning strategies adopted are also discussed, as is the k-means clustering algorithm for unsupervised learning. Chapter Four contains the methodology; it gives the modelling and simulation adopted for the input signals. Chapter Five gives the model communication signal (the non-return-to-zero signal) and all the results obtained by MATLAB simulation; in summary, Chapter Five shows the exclusive work done in this research. Chapter Six presents the conclusions drawn on the achievements and some recommendations on further research work that may be considered in this area. The last part contains a list of selected publications related to the work done during the time of this research.

CHAPTER TWO

2. BLIND SOURCE SEPARATION OVERVIEW

2.1 Introduction

This chapter deals mainly with the concept of information theory in blind source processing. It also gives an overview of the existing ICA algorithm.

A fairly general blind signal processing (BSP) problem can be formulated as follows [2]. The mixed or sensor signals \( \mathbf{x}(t) = [x_1(t), x_2(t), \ldots, x_M(t)]^T \) are recorded from a multiple-input multiple-output (MIMO) system. After recording these observed signals, the objective is to find an inverse system, termed a reconstruction system, with the aid of a neural network or an adaptive inverse system. This inverse system, if it exists, should be stable, in order to estimate the primary unknown source signals \( \mathbf{s}(t) = [s_1(t), s_2(t), \ldots, s_N(t)]^T \). This problem is referred to as BSS, whereby the random vector \( \mathbf{x} \) is transformed by finding a full-rank transforming or separating matrix \( \mathbf{W} \) of dimension \( N \times M \) such that the output signal vector \( \mathbf{y}(t) = [y_1(t), y_2(t), \ldots, y_M(t)]^T \) is defined as \( \mathbf{y} = \mathbf{W}\mathbf{x} \).

The output vector \( \mathbf{y} \) is an approximation of the vector \( \mathbf{s} \), hence \( \tilde{\mathbf{s}}(t) = \mathbf{y}(t) \), and it contains source components that are as independent as possible, as measured by an information-theoretic cost function known as Infomax. That is to say, the system is required to adapt the weights \( w_{ij} \) of the matrix \( \mathbf{W} \) of the linear system to combine the mixed signals \( \mathbf{x} \) and generate estimates of the source signals. The optimal weights will correspond to the statistical independence of the output signals \( \mathbf{y}(t) \). This inverse system, as in Figure 2.1, should be adaptive in such a way that it has some tracking capability in non-stationary environments [4]. In many cases, source signals are simultaneously linearly mixed and filtered, and therefore the aim is to process the observations in such a way that the original source signals are extracted by the adaptive system.

Figure 2-1: Block schematic illustrating blind signal processing or blind source identification [4].

The problem of separating and estimating the original source waveforms from the sensor array, without knowing the transmission channel characteristics and the sources, is generally referred to in the literature as a problem of Independent Component Analysis (ICA), Blind Source Separation (BSS), Blind Signal Extraction (BSE) or Multichannel Blind Deconvolution (MBD). The nonlinear model with additive noise \( \mathbf{v}(t) \) for performing blind source separation is shown in Figure 2.2.

Roughly speaking, these solutions can be formulated as the problem of separating or estimating the waveforms of the original sources from an array of sensors or transducers without exactly knowing the characteristics of the transmission channels [2]. There appears to be something magical about blind signal processing, namely estimating the original source signals without knowing the parameters of the mixing and/or filtering processes; it is difficult to imagine that one can estimate this at all. In fact, without some a priori knowledge, it is not possible to uniquely estimate the original source signals. However, one can usually estimate them up to some level of indeterminacy [2]; these indeterminacies and ambiguities can be expressed as arbitrary scaling, permutation and delay of the estimated source signals [5]. These indeterminacies, however, preserve the waveforms of the original source signals.

Figure 2-2: The nonlinear model with additive noise [6].

Although these indeterminacies seem to impose severe limitations, in a great number of applications these limitations are not essential. This is because the most relevant information about the source signals is contained in the waveforms of the sources, and not in their amplitudes or the order in which they are arranged at the output of the system [4]. Therefore, it is generally expected that in a dynamic system there is no guarantee that the estimated or extracted signals have exactly the same waveforms as the source signals, and the requirements must then sometimes be relaxed further, to the extent that the extracted waveforms are distorted or filtered versions of the primary source signals.

At this point, emphasis is placed on the essential difference between the standard direct inverse identification problem and blind signal processing. In basic linear identification or inverse systems, we have access to the input source signals [7], as in Figure 2.3. Here the objective is normally to estimate the input by minimizing the mean square error between the delayed or model source signals and the output signals. In the BSS problem, on the other hand, one does not have access to the source signals, which are assumed to be statistically independent, hence the need to design an appropriate non-linear filter to estimate the desired signals \( \mathbf{s}(t) \).

Figure 2-3: Adaptive System Identification [7].

2.2 Instantaneous Blind Source Separation

In the blind source separation problem, the mixing and filtering processes of the unknown input sources \( s_j(t)\ (j = 1, 2, \ldots, N) \) may have different mathematical or physical models depending on the specific application. In the simplest case, the M mixed signals \( x_i(t)\ (i = 1, 2, \ldots, M) \), with typically \( M \geq N \), are linear combinations of \( \mathbf{s}(t) = [s_1(t), s_2(t), \ldots, s_N(t)]^T \), the unknown, mutually statistically independent, zero-mean source signals, contaminated by noise as in Figure 2.1. This can be written as

\[ x_i(t) = \sum_{j=1}^{N} H_{ij}\, s_j(t) + v_i(t), \quad (i = 1, 2, \ldots, M) \]    (2.1)

Bold letters will be used for multichannel variables, such as the vector of observations \( \mathbf{x} \), the vector of sources \( \mathbf{s} \), or the mixing system \( \mathbf{H} \), and plain letters for mono-channel variables such as the j-th source \( s_j \). The matrix \( \mathbf{H} \) is called the mixing matrix, of dimension \( M \times N \) [4].

Therefore, in vector notation, equation (2.1) is

\[ \mathbf{x}(t) = \mathbf{H}\mathbf{s}(t) + \mathbf{v}(t), \]    (2.2)

where \( \mathbf{x}(t) = [x_1(t), x_2(t), \ldots, x_M(t)]^T \) is the vector of sensor signals, \( \mathbf{s}(t) = [s_1(t), s_2(t), \ldots, s_N(t)]^T \) is the vector of sources, \( \mathbf{v}(t) = [v_1(t), v_2(t), \ldots, v_M(t)]^T \) is the vector of additive noise, and \( \mathbf{H} \) is an unknown full-rank \( M \times N \) mixing matrix. This basically means that the signals received by an array of sensors (microphones, antennas, transducers) are weighted sums, or rather linear mixtures, of the primary sources [4] [8]. These sources are time-varying, zero-mean, mutually statistically independent and totally unknown, as is the case for arrays of sensors receiving communication or speech signals.
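As a minimal numerical sketch of the mixing model of equation (2.2) (the sources, mixing matrix and noise level below are illustrative assumptions, not the signals used in the simulations of Chapters Four and Five), the following MATLAB fragment forms noisy linear mixtures of two independent zero-mean sources:

    % Sketch of the linear mixing model x(t) = H*s(t) + v(t), eq. (2.2).
    % The sources, mixing matrix H and noise level are illustrative assumptions.
    N = 1000;                          % number of time samples
    t = (0:N-1)/8000;                  % nominal 8 kHz sampling grid
    s = [sin(2*pi*440*t);              % source 1: a 440 Hz tone
         sign(sin(2*pi*100*t))];       % source 2: a 100 Hz square wave, zero mean
    H = [0.8 0.3; 0.4 0.9];            % unknown full-rank 2 x 2 mixing matrix
    v = 0.01*randn(2, N);              % additive sensor noise v(t)
    x = H*s + v;                       % observed mixtures, one row per sensor

Each row of x is what one sensor would record; the BSS task is to recover the rows of s from x alone.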

It is assumed that the number of source signals N is unknown unless stated otherwise, and that, with only the sensor vector \( \mathbf{x}(t) \) available, it is necessary and possible to design a neural network [9] such as the one shown in Figure 2.4. The system will be associated with an adaptive learning algorithm that enables estimation of the sources and identification of the mixing matrix \( \mathbf{H} \) and the separating matrix \( \mathbf{W} \), with good tracking abilities. The output of this system will then contain components that are as independent as possible, as measured by the information-theoretic cost function called Infomax. The weights are adapted to generate estimates of the source signals \( \tilde{\mathbf{s}} \):

\[ \tilde{s}_j(t) = \sum_{i=1}^{M} w_{ji}\, x_i(t), \quad (j = 1, 2, \ldots, N) \]    (2.3)

Figure 2-4: Block diagram of the basic linear instantaneous blind source separation problem represented by vectors and matrices [6].

2.3 Adaptive Algorithm

2.3.1 Independent Component Analysis (ICA)

Multidimensional data analysis plays a major role in solving BSS problems. Proper representation of the multivariate data commonly encountered in signal processing, pattern recognition, neural networks and statistical analysis is essential for visualization of the underlying geometry of the data structure. Linear transformations are used to exploit possible dependencies and to reduce the dimensionality of multivariate data sets. Principal Component Analysis (PCA) and Independent Component Analysis (ICA) are the two prominent methods of linear transformation. ICA in particular has been very effective in separating independent sources from mixed signals; hence the principles of ICA are proposed for use in the BSS problem. ICA is a method applied to find unknown source signals from signal mixtures, and is one of many approaches to the BSS problem. Just as ICA is one among many methods of resolving signal mixtures into their original signals, many approaches have been developed to perform ICA [10]. These approaches include Maximum Entropy (ME), Minimization of Mutual Information (MMI), Maximum Likelihood (ML) and Fixed Point (FP) [10]. This research concentrates on information maximization, hence the term "Infomax" algorithm, which finds a number of independent sources from the same number of signal mixtures by maximizing the entropy of the signals.

2.3.2 Infomax: Entropy and Independence

Entropy is basically the average information obtained when the value of a random variable is revealed, and Infomax is based on the fact that the maximum entropy of jointly continuous variables occurs only when the random variables are statistically independent. Therefore, if entropy is maximized, the resulting signals must be independent, and if the contributing signals are independent, then these signals must be the original source signals. "Infomax" is thus a method of finding mutually independent signals by maximizing information flow or entropy, or otherwise minimizing mutual information, hence Minimal Mutual Information (MMI) [8].

The Infomax algorithm achieves the maximum entropy of a function using gradient ascent, an iterative process that takes steps in the direction of the maximum gradient until a local maximum is reached; if repeated from different starting points, a global maximum will be found. When the global maximum of entropy is found using gradient ascent, entropy is maximized and the resulting signals are the source signals, up to a level of indeterminacy [8]. This then invokes a filtering process with the help of Radial Basis Function Network (RBFN) type neural networks.
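A one-dimensional toy example makes the gradient ascent iteration concrete (the function, step size and stopping tolerance here are illustrative assumptions, not the entropy surface itself):

    % Toy gradient ascent: maximize f(w) = -(w - 3)^2, whose gradient is -2(w - 3).
    % The function, step size and tolerance are illustrative assumptions.
    w   = 0;                               % starting point
    eta = 0.1;                             % learning rate (step size)
    for k = 1:1000
        grad = -2*(w - 3);                 % gradient of f at the current point
        w    = w + eta*grad;               % step uphill along the gradient
        if abs(grad) < 1e-8, break; end    % stop near a (local) maximum
    end
    % w is now close to 3, the maximizer of f

In the Infomax algorithm, the scalar w is replaced by the unmixing matrix W and f by the entropy function h developed in Section 2.6.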

2.3.3 Independence of the Extracted Signals

ICA is proposed as a pre-processing network to the RBFN because it is multivariate, parallel and more robust compared with projection pursuit and Principal Component Analysis (PCA), which are also applied to BSS problems. PCA extracts source signals with correlation as its basis, whereas matching pursuit and ICA depend on the statistical independence of the source data. The advantage of ICA over pursuit techniques is that ICA can be applied to realize a MIMO system, as opposed to the MISO system of the pursuit methods. Statistical independence also lies at the core of the ICA method [10] [11]. ICA approximately extracts a set of independent signals from their mixtures, so that the extracted signals would be a set of single voices.

The main advantage of relying on independence as opposed to mere uncorrelatedness is that extraction based on uncorrelatedness alone yields a new set of mixed signals, whereas independence yields a set of single voices. The signals obtained by ICA rely on independence, which implies uncorrelatedness, but lack of correlation does not imply independence; lack of correlation is therefore a weaker property than independence [8].

Two variables \( s_{1t} \) and \( s_{2t} \) of a vector \( \mathbf{s}_t \) are statistically independent [11] if and only if

\[ P_s(\mathbf{s}_t) = P_s(s_{1t}, s_{2t}) = P_{s_1}(s_{1t})\, P_{s_2}(s_{2t}), \]    (2.4)

where, for any joint probability density function (pdf) \( P_s \), the functions \( P_{s_1} \) and \( P_{s_2} \) are called the marginal probability density functions of the joint pdf \( P_s \). The two variables or signals are independent when their joint pdf \( P_s \) can be obtained exactly from the product of the respective marginal pdfs \( P_{s_1} \) and \( P_{s_2} \), as implied by equation (2.4) above.

If \( s_{1t} \) and \( s_{2t} \) are independent, it further implies that

\[ E\left[s_1^p\, s_2^q\right] = E\left[s_1^p\right] E\left[s_2^q\right], \]    (2.5)

whereby for correlation \( p = q = 1 \), while independence involves all positive integer values of p and q. Thus independence places many more constraints on the joint pdf \( P_s \) than correlation does; whereas correlation is a measure of the amount of covariation between \( s_1 \) and \( s_2 \) and depends only on the first moments of the pdf, independence is a measure of the covariation between \( s_1 \) raised to the power p and \( s_2 \) raised to the power q, and depends on all moments of the joint pdf \( P_s \) [11].
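Equation (2.5) can be checked numerically. The following sketch (sample size, distributions and the chosen powers are illustrative assumptions) compares the joint moment of two independently drawn signals with the product of their marginal moments:

    % Monte-Carlo check of E[s1^p * s2^q] = E[s1^p]*E[s2^q] for independent s1, s2.
    % The sample size, distributions and (p, q) pair are illustrative assumptions.
    N  = 1e6;
    s1 = randn(1, N);                     % independent Gaussian samples
    s2 = rand(1, N) - 0.5;                % independent uniform samples, zero mean
    p = 2; q = 3;
    lhs = mean((s1.^p).*(s2.^q));         % joint moment
    rhs = mean(s1.^p)*mean(s2.^q);        % product of marginal moments
    % lhs and rhs agree up to Monte-Carlo error; the special case p = q = 1
    % recovers the weaker statement that independent signals are uncorrelated.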

Therefore independence, being a more powerful tool than correlation, or rather a generalized measure of correlation, is used as an approach to BSS to extract M signals \( \mathbf{y} \) from M signal mixtures \( \mathbf{x} \) by optimizing the unmixing matrix \( \mathbf{W} \), where

\[ \mathbf{y} = \mathbf{W}\mathbf{x} \]    (2.6)

In vector-matrix notation, this equation can be expanded as

\[ \begin{bmatrix} y_1^1 & y_1^2 & y_1^3 & \cdots & y_1^N \\ y_2^1 & y_2^2 & y_2^3 & \cdots & y_2^N \\ \vdots & \vdots & \vdots & & \vdots \\ y_M^1 & y_M^2 & y_M^3 & \cdots & y_M^N \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1M} \\ w_{21} & w_{22} & \cdots & w_{2M} \\ \vdots & \vdots & & \vdots \\ w_{M1} & w_{M2} & \cdots & w_{MM} \end{bmatrix} \begin{bmatrix} x_1^1 & x_1^2 & x_1^3 & \cdots & x_1^N \\ x_2^1 & x_2^2 & x_2^3 & \cdots & x_2^N \\ \vdots & \vdots & \vdots & & \vdots \\ x_M^1 & x_M^2 & x_M^3 & \cdots & x_M^N \end{bmatrix}, \]    (2.7)

where the subscripts in \( \mathbf{x} \) and \( \mathbf{y} \) indicate the signal number and the superscripts the time index. The notation \( \mathbf{y} \) represents a vector function of time t, and \( \mathbf{y}^t \) represents the vector at a specific time.

Having defined the research problem, the Infomax strategy used is summarized as follows. It was found that independence of signals cannot be measured directly, but their entropy can be. Entropy is related to independence in that maximum entropy implies independent signals. The entropy of the signal mixtures \( \mathbf{x} \) is constant, but the change in entropy can be maximized by mapping and transforming the signals \( \mathbf{y} = \mathbf{W}\mathbf{x} \) to an alternative set of signals \( \mathbf{Y} = g(\mathbf{y}) = g(\mathbf{W}\mathbf{x}) \). This mapping, as shown in Figure 2.5, spreads out \( \mathbf{Y} \) so that the change in entropy is maximized. When the entropy is maximized, the resulting signals are independent. The inverse \( \mathbf{y} = g^{-1}(\mathbf{Y}) \) is then taken, resulting in extracted signals \( \mathbf{y} \) that are independent [8]. And since the extracted set of signals \( \mathbf{y} \) is independent, they must be the original source signals \( \mathbf{s} \). This is, in summary, the Infomax algorithm.

Figure 2-5: Infomax Strategy.

2.4 Information Theory

In this section, information theory is used to develop the "Infomax" algorithm introduced earlier. Infomax is a method of ICA which aims at finding independent source signals by maximizing entropy.

2.4.1 Entropy of a Single Event

Considering a univariate event, its entropy is the amount of surprise associated with that event, and the surprise, in turn, is the information in the event. The information associated with the occurrence of an event A is defined as [8] [10]

\[ I(A) = \ln\left(\frac{1}{\Pr[A]}\right) = -\ln\left(\Pr[A]\right), \]    (2.8)

When ๐ผ(๐ด) is expressed in natural logarithms the units of information are then in nats.

It is worth noting here that the base of the algorithm is arbitrary, but the natural

17

logarithm is used for mathematical convenience. For a surprise, say

A t

๏€ฝ

1

to occur at a particular stated time t it would be a surprise indeed, but its probability is very low.

On the other hand if the probability of an event occurring is

Pr( A )

๏‚ป

1

, then it contains very little information. The precise amount of surprise associated with the outcome

A t

๏€ฝ

1 , is defined as shown below [8] where

I ( A t

)

is large for small values of

Pr and small for large values of

Pr

I ( A )

๏€ฝ ๏€ญ ln Pr( A )

๏‚ป ๏€ญ ln( 1 )

๏€ฝ

0 ,

(2.9)

If the probability of an event A occurring is \( \Pr(A) \approx 0 \), then it contains infinite information:

\[ I(A) = -\ln \Pr(A) = -\ln(0) = \infty, \]    (2.10)

Entropy is average information and can be obtained from the expectation, where the expectation is a weighted average defined as [11]

\[ E\{X\} = \sum_s X(s)\Pr(s) \]    (2.11)

Therefore, the entropy H, or expected information, or average amount of surprise associated with an event, is

\[ H(A) = E\{I(A)\} = \sum_i I(A_i)\Pr(A_i), \]    (2.12)

where i indexes an arbitrary number of events. For this arbitrary number of events, the entropy H(A) is obtained by substituting equation (2.8) into equation (2.12):

\[ H(A) = E\{-\ln \Pr(A_i)\} = -\sum_i \ln\left(\Pr(A_i)\right)\Pr(A_i), \]    (2.13)

which can be rearranged to the form below for a set of events:

\[ H(A) = -\sum_i \Pr(A_i)\ln\left(\Pr(A_i)\right), \]    (2.14)

With two possible outcomes, say n = 2, for events like [Yes or No, Head or Tail, 0 or 1], the probabilities of the two events sum to one:

\[ \Pr(A_1) + \Pr(A_2) = 1, \]    (2.15)

Defining

\[ \Pr(A_1) = p \quad \text{and} \quad \Pr(A_2) = 1 - p, \]    (2.16)

and expanding equation (2.12), one arrives at

\[ H(A) = -\left\{\Pr(A_1)\ln \Pr(A_1) + \Pr(A_2)\ln \Pr(A_2)\right\}, \]    (2.17)

and by substituting equations (2.15) and (2.16),

\[ H(p) = -p\ln(p) - (1 - p)\ln(1 - p), \]    (2.18)

where p is a probability ranging from 0 to 1. The graphical representation in Figure 2.6 shows the entropy of two events with equal probability.

Figure 2-6: Entropy of a two-event example with equal probability.
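The curve of Figure 2.6 follows directly from equation (2.18); a short MATLAB sketch reproduces it (the grid spacing is an illustrative choice):

    % Entropy of a two-outcome event, H(p) = -p*ln(p) - (1-p)*ln(1-p), eq. (2.18).
    p = linspace(0.001, 0.999, 999);       % avoid ln(0) at the endpoints
    H = -p.*log(p) - (1 - p).*log(1 - p);  % entropy in nats
    plot(p, H); xlabel('p'); ylabel('H(p) in nats');
    % H peaks at p = 0.5 with H = ln(2), about 0.693 nats: equal probabilities
    % carry the maximum average surprise.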

For N trials,

\[ H(A) = E\{-\ln P_A(A)\} \]    (2.19)

\[ H(A) = -\frac{1}{N}\sum_{t}^{N}\ln P_A(A^t), \]    (2.20)

where t is the time sample and N is the number of time samples. Now, since the Infomax method obtains mutually independent signals by maximizing entropy, and the entropy of the signal mixtures \( \mathbf{x} \) is constant, an expression for the entropy of a transformed signal Y is necessary so that the change in entropy can be maximized. In the entropy of a signal-dependent variable \( \mathbf{x} \), each element in \( \mathbf{x} \) is a different signal sampled at the same time t. From equation (2.20),

\[ H(Y) = -\frac{1}{N}\sum_{t}^{N}\ln P_Y(Y^t), \]    (2.21)

where \( Y = g(y) \) and y is a scalar function of time, \( y = y(t) = y^t \); the superscript indicates the value of y at time t. The function g(y) is the cumulative density function of the desired signal y, referred to here as the "model cdf" of the source signals, as it is chosen to extract the desired type of source signals.
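Equation (2.21) can be evaluated from samples once an estimate of the pdf is available. A minimal sketch using a histogram-based pdf estimate (the test signal and bin count are illustrative assumptions):

    % Sample-based entropy estimate H = -(1/N)*sum_t ln p(Y^t), eq. (2.21).
    % The test signal and the histogram bin count are illustrative assumptions.
    N = 1e5;
    Y = rand(1, N);                          % uniform on [0,1]: maximum-entropy case
    nbins = 50;
    [counts, edges] = histcounts(Y, nbins);  % histogram-based pdf estimate
    binw = edges(2) - edges(1);
    pdfY = counts/(N*binw);                  % normalize so the pdf integrates to 1
    idx  = discretize(Y, edges);             % bin index of each sample Y^t
    H    = -mean(log(pdfY(idx)));            % close to 0 nats for a uniform [0,1] pdf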

2.4.2 Exploring a Single Signal Row

Recalling equation (2.6), \( \mathbf{y} = \mathbf{W}\mathbf{x} \), consider a single row of the unmixing matrix \( \mathbf{W} \) acting on \( \mathbf{x} \), a vector representing a sample of the M signals at time t, where

\[ \mathbf{x} = \begin{bmatrix} x_1^t \\ x_2^t \\ \vdots \\ x_M^t \end{bmatrix} \quad \text{and} \quad \mathbf{W} = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1M} \\ w_{21} & w_{22} & \cdots & w_{2M} \\ \vdots & \vdots & & \vdots \\ w_{M1} & w_{M2} & \cdots & w_{MM} \end{bmatrix}, \]    (2.22)

The extracted signal y at time instant t, taking the second row as an example, can be obtained as

\[ y = \begin{bmatrix} w_{21} & w_{22} & \cdots & w_{2M} \end{bmatrix} \begin{bmatrix} x_1^t \\ x_2^t \\ \vdots \\ x_M^t \end{bmatrix}, \]    (2.23)

After the vector multiplication,

\[ y^t = w_{21}x_1^t + w_{22}x_2^t + \cdots + w_{2M}x_M^t, \]    (2.24)

where \( y^t \) is the signal sample at time t. Transforming \( y^t \) through the model cdf \( g(y^t) \) yields the mapped value Y, a random variable on the range from zero to one, i.e.

\[ Y = g(y^t) = g\left(w_{21}x_1^t + w_{22}x_2^t + \cdots + w_{2M}x_M^t\right), \]    (2.25)

Then the probability density function (pdf) is related to the cdf as shown in Figure 2.7.

Figure 2-7: Transformation of y to Y, from Ref. [3].

Figure 2.7 illustrates how a signal \( \mathbf{y} = (y^1, \ldots, y^{100}) \) (C) can be used to approximate its pdf (D). The integral of \( P_y \) yields the cdf of y, and transformation of y through this cdf (B) yields a uniform distribution (A) [1].

\[ P_Y(Y^t)\,\Delta Y = P_y(y^t)\,\Delta y \]    (2.26)

By rearranging equation (2.26), one gets

\[ P_Y(Y^t) = P_y(y^t)\,\frac{\Delta y}{\Delta Y} = \frac{P_y(y^t)}{\Delta Y/\Delta y}, \]    (2.27)

and in the limit as \( \Delta y \to 0 \), \( \Delta Y/\Delta y = dY/dy \), hence equation (2.27) becomes

\[ P_Y(Y^t) = \frac{P_y(y^t)}{dY/dy} \]    (2.28)

Now, since \( Y = g(y^t) \), where \( g(y^t) \) is the model cdf of the source signal, \( dY/dy = g'(y) \), and \( g'(y) \) is the pdf of the source signal. Equation (2.28) becomes

\[ P_Y(Y^t) = \frac{P_y(y^t)}{g'(y)} = \frac{P_y(y^t)}{P_s(y^t)}, \]    (2.29)

Substituting equation (2.29) into equation (2.21),

\[ H(Y) = -\frac{1}{N}\sum_{t}^{N}\ln\frac{P_y(y^t)}{P_s(y^t)}, \]    (2.30)
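The chain of equations (2.25)–(2.30) says that when the model cdf g matches the true cdf of y, the mapped variable Y = g(y) is uniformly distributed and H(Y) is maximal. A small sketch makes this visible (the Gaussian test signal is an illustrative assumption):

    % Mapping a signal through its own cdf yields a uniform variable, eqs (2.25)-(2.29).
    % The Gaussian test signal is an illustrative assumption.
    N = 1e5;
    y = randn(1, N);                       % signal whose true cdf is the Gaussian cdf
    g = @(u) 0.5*(1 + erf(u/sqrt(2)));     % matching model cdf g(y)
    Y = g(y);                              % mapped signal Y = g(y), eq. (2.25)
    histogram(Y, 25);                      % approximately flat: P_Y = P_y/g'(y) = 1
    % With a mismatched model cdf, e.g. Y = g(2*y), the histogram is no longer
    % flat and the entropy H(Y) of eq. (2.30) drops.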

2.4.3 Entropy of Multiple Variables

The univariate model can be extended to the general case in which there is more than one random variable. From equation (2.19), the univariate expression can be written as

\[ H(A) = E\{-\ln P_A(a)\}, \]    (2.31)

This can be extended to a vector notation for multiple variables, where \( \mathbf{A} = \{A_1, A_2, \ldots, A_M\} \) and \( \mathbf{a} = \{a_1, a_2, \ldots, a_M\} \). The resulting multivariate expression for entropy is

\[ H(\mathbf{A}) = E\{-\ln P_{\mathbf{A}}(\mathbf{a})\}, \]    (2.32)

where \( P_{\mathbf{A}}(\mathbf{a}) \) is the multivariate pdf of the random vector \( \mathbf{A} \). If each \( a_i \) is independent and identically distributed, then

\[ P_{\mathbf{A}}(\mathbf{a}) = P_A(a_1)\,P_A(a_2)\cdots P_A(a_M) = \prod_{i=1}^{M} P_A(a_i), \]    (2.33)

The natural log of the multivariate pdf is

\[ \ln\left(P_{\mathbf{A}}(\mathbf{a})\right) = \ln\left(P_A(a_1)\,P_A(a_2)\cdots P_A(a_M)\right) = \ln\left(\prod_{i=1}^{M} P_A(a_i)\right), \]    (2.34)

Using the logarithmic property that the log of a product equals the sum of the logs,

\[ \ln\left(P_A(a_1)\,P_A(a_2)\cdots P_A(a_M)\right) = \ln\left(P_A(a_1)\right) + \ln\left(P_A(a_2)\right) + \cdots + \ln\left(P_A(a_M)\right) \]    (2.35a)

\[ = \sum_{i=1}^{M}\ln\left(P_A(a_i)\right), \]    (2.35b)

Substituting equation (2.35b) into (2.32), the resulting entropy is given by

\[ H(\mathbf{A}) = E\left\{-\sum_{i=1}^{M}\ln\left(P_A(a_i)\right)\right\} \]    (2.36)

The expectation can be estimated by taking an average, to yield an expression similar to the univariate result of equation (2.20):

\[ H(\mathbf{A}) = -\frac{1}{N}\sum_{t=1}^{N}\sum_{i=1}^{M}\ln\left(P_A(a_i^t)\right) = -\frac{1}{N}\sum_{t=1}^{N}\ln\left(P_{\mathbf{A}}(\mathbf{a}^t)\right) \]    (2.37)

If equation (2.37) is applied to the mapped signal \( \mathbf{Y} \), transformed through the model cdf as \( \mathbf{Y} = g(\mathbf{y}) = g(\mathbf{W}\mathbf{x}) \), the multivariate expression for the entropy of the signals \( \mathbf{Y} \) becomes

\[ H(\mathbf{Y}) = -\frac{1}{N}\sum_{t=1}^{N}\ln\left(P_Y(\mathbf{Y}^t)\right) \]    (2.38)

Following the same line of reasoning as was applied for the variable Y above, the entropy \( H(\mathbf{Y}) \) of a multivariate pdf \( P_Y(\mathbf{Y}) \) is maximized if it is a uniform joint pdf. Given the mapped signal \( \mathbf{Y} \), the joint pdf \( P_Y \) can be shown to be related to the joint pdf \( P_y \) of the extracted signals by the multivariate form of equation (2.28) as

\[ P_Y(\mathbf{Y}) = \frac{P_y(\mathbf{y})}{\left|\partial \mathbf{Y}/\partial \mathbf{y}\right|}, \]    (2.39)

where \( \partial \mathbf{Y}/\partial \mathbf{y} \) is a Jacobian matrix [3] and |·| denotes the absolute value of the determinant of this matrix; this is examined in detail in Appendix A. Following the univariate logic, \( \mathbf{Y} = g(\mathbf{y}) = g(\mathbf{W}\mathbf{x}) \), where g(y) is called the model cdf of the source signals, and \( \left|\partial \mathbf{Y}/\partial \mathbf{y}\right| \) is the pdf \( g'(\mathbf{y}) \) of the source signals, \( P_s(\mathbf{y}) \), so that

\[ P_Y(\mathbf{Y}) = \frac{P_y(\mathbf{y})}{P_s(\mathbf{y})}, \]    (2.40)

Substituting equation (2.40) into equation (2.38) leads to the multivariate expression for entropy in terms of the source signal pdf \( P_s(\mathbf{y}) \) and the extracted signal pdf \( P_y(\mathbf{y}) \):

\[ H(\mathbf{Y}) = -\frac{1}{N}\sum_{t=1}^{N}\ln\frac{P_y(\mathbf{y}^t)}{P_s(\mathbf{y}^t)} \]    (2.41)

2.5 The pdf of the Extracted Signals

As in the univariate case, this multivariate expression of entropy requires an expression for the pdf \( P_y(\mathbf{y}) \) of the extracted signals. To obtain this expression, the relationship in equation (2.39), which is true for any invertible function, is taken into account. The pdf \( P_y(\mathbf{y}) \) of the extracted signal \( \mathbf{y} = \mathbf{W}\mathbf{x} \) is expressed as

\[ P_y(\mathbf{y}) = \frac{P_x(\mathbf{x})}{\left|\partial \mathbf{y}/\partial \mathbf{x}\right|}, \]    (2.42)

where the denominator is the absolute value of the determinant of a Jacobian matrix \( \mathbf{J} \). The elements of this matrix are evaluated as \( \left|\partial \mathbf{y}/\partial \mathbf{x}\right| = |\mathbf{J}| = |\mathbf{W}| \) [8], or, for the mapping through the model cdf, \( \left|\partial \mathbf{Y}/\partial \mathbf{y}\right| = |\mathbf{J}| = g'(\mathbf{y}) \) [8]. These relations are important in drawing a connection between the mixed or observed signals \( \mathbf{x} \) and the estimated signals \( \mathbf{y} \) when deriving the entropy, as shown in Appendix A.

The pdf of the output is thus defined in terms of the mixed sources, and equation (2.42) can be rewritten as

\[ P_y(\mathbf{y}) = \frac{P_x(\mathbf{x})}{|\mathbf{W}|}, \]    (2.43)

This expression leads to the expression of entropy used in the Infomax algorithm for blind source separation.

2.6 Infomax Expression for Entropy

If M unknown signals s = (s_1, s_2, ..., s_M)^T have a common cumulative density function (cdf) g and pdf p_s, then given an unmixing matrix W which extracts M signals y = (y_1, y_2, ..., y_M)^T from a set of observed signal mixtures x, the entropy of the signals Y = g(y) is

$$H(\mathbf{Y}) = H(\mathbf{x}) + E\left[\sum_{i=1}^{M}\ln p_s(y_i)\right] + \ln|\mathbf{W}| \qquad (2.44)$$

where y_i = w_i^T x is the i-th signal, extracted by the i-th row of the unmixing matrix W. This expression is derived in Appendix B. The expected value is computed using N sampled values of the mixtures x. By definition, the pdf p_s of a variable is the derivative of that variable's cdf g:

$$p_s(y) = \frac{\partial g(y)}{\partial y} \qquad (2.45)$$

where this derivative is denoted g′(y) = p_s(y) [8] [11], so that equation (2.44) becomes

$$H(\mathbf{Y}) = H(\mathbf{x}) + E\left[\sum_{i=1}^{M}\ln g'(y_i)\right] + \ln|\mathbf{W}| \qquad (2.46)$$

The entropy H(x) of the observed mixtures x is unaffected by W; its contribution to H(Y) is constant and can therefore be ignored. This means that the unmixing matrix that maximizes equation (2.46) also maximizes equation (2.47) below:

$$h(\mathbf{Y}) = E\left[\sum_{i=1}^{M}\ln g'(y_i)\right] + \ln|\mathbf{W}| \qquad (2.47)$$

An optimal matrix W* maximizes the function h, and hence the entropy of Y, using a gradient ascent algorithm. At the maximum the rows of Y are independent; since y = g^{-1}(Y), the rows of y are also independent, which implies that W is the unmixing matrix that yields the original signals. Therefore, in order to perform gradient ascent efficiently, an expression for the gradient of h with respect to the matrix W is required.
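As a concrete illustration, the objective in equation (2.47) can be evaluated on sampled mixtures. The following is a minimal MATLAB sketch under stated assumptions: x is an M-by-N matrix of mixtures, W a candidate unmixing matrix, the model cdf is the tanh adopted later in Chapter 4 (so g′(y) = 1 − tanh²(y)), and the expectation is replaced by a sample mean.

% Minimal sketch: evaluating h(Y) of equation (2.47) on N samples.
% Assumptions: x is M-by-N, W is M-by-M, model cdf g(y) = tanh(y).
y = W * x;                                        % extracted signals
h = mean(sum(log(1 - tanh(y).^2 + eps), 1)) ...   % E[sum_i ln g'(y_i)]
    + log(abs(det(W)));                           % + ln|W|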

2.7 Evaluating the Gradient of Entropy

The entropy being maximized here is, strictly, the differential entropy [3]. The gradient of h is found by taking its partial derivative with respect to one scalar W_ij of W, where W_ij is the element in the i-th row and j-th column of W. The weight W_ij determines the proportion of the j-th mixture in the extracted signal y_i. Given that y = Wx, and that every source signal has the same pdf g′, the partial derivative of h with respect to the ij-th element is

$$\frac{\partial h}{\partial W_{ij}} = E\left[\sum_{i=1}^{M}\frac{\partial \ln g'(y_i)}{\partial W_{ij}}\right] + \frac{\partial \ln|\mathbf{W}|}{\partial W_{ij}} \qquad (2.48)$$

For simplification, the partial derivatives on the right-hand side are evaluated in turn. The summation part of the first term,

$$E\left[\sum_{i=1}^{M}\frac{\partial \ln g'(y_i)}{\partial W_{ij}}\right], \qquad (2.49)$$

can be expanded using

$$\frac{\partial \ln g'(y_i)}{\partial W_{ij}} = \frac{1}{g'(y_i)}\,\frac{\partial g'(y_i)}{\partial W_{ij}} \qquad (2.50)$$

Using the chain rule,

$$\frac{\partial g'(y_i)}{\partial W_{ij}} = \frac{dg'(y_i)}{dy_i}\,\frac{\partial y_i}{\partial W_{ij}} \qquad (2.51)$$

The derivative on the right-hand side is the second derivative of g with respect to y_i,

$$\frac{dg'(y_i)}{dy_i} = g''(y_i) \qquad (2.52)$$

and, since y_i = Σ_j W_ij x_j,

$$\frac{\partial y_i}{\partial W_{ij}} = x_j \qquad (2.53)$$

Substituting equations (2.52) and (2.53) into equation (2.51) gives

$$\frac{\partial g'(y_i)}{\partial W_{ij}} = g''(y_i)\,x_j \qquad (2.54)$$

Substituting equation (2.54) into equation (2.50) yields

$$\frac{\partial \ln g'(y_i)}{\partial W_{ij}} = \frac{g''(y_i)}{g'(y_i)}\,x_j \qquad (2.55)$$

and substituting equation (2.55) into equation (2.49) yields

$$E\left[\sum_{i=1}^{M}\frac{\partial \ln g'(y_i)}{\partial W_{ij}}\right] = E\left[\sum_{i=1}^{M}\frac{g''(y_i)}{g'(y_i)}\,x_j\right] \qquad (2.56)$$

For notational convenience, define

$$\psi(y_i) = \frac{g''(y_i)}{g'(y_i)},$$

which yields

$$E\left[\sum_{i=1}^{M}\frac{\partial \ln g'(y_i)}{\partial W_{ij}}\right] = E\left[\sum_{i=1}^{M}\psi(y_i)\,x_j\right] \qquad (2.57)$$

Turning now to the second term on the right-hand side of equation (2.48), and with the help of the proof in Appendix A,

$$\frac{\partial \ln|\mathbf{W}|}{\partial W_{ij}} = \left[\mathbf{W}^{-T}\right]_{ij} \qquad (2.58)$$

where [W^{-T}]_{ij} is the ij-th element of the inverse of the transposed unmixing matrix W. Substituting equations (2.57) and (2.58) into equation (2.48) leads to

$$\frac{\partial h}{\partial W_{ij}} = E\left[\sum_{i=1}^{M}\psi(y_i)\,x_j\right] + \left[\mathbf{W}^{-T}\right]_{ij} \qquad (2.59)$$

Considering all the elements of W together,

$$\nabla h = E\left\{\psi(\mathbf{y})\,\mathbf{x}^{T}\right\} + \mathbf{W}^{-T} \qquad (2.60)$$

where ∇h is an M × M (Jacobian) matrix of derivatives whose ij-th element is ∂h/∂W_ij.

Given a finite sample of N observed mixture values x^t, for t = 1, ..., N, and an unmixing matrix W, the expectation E[·] can be estimated as the mean, assuming ergodic (stationary) signals:

$$E\left[\psi(\mathbf{y})\,\mathbf{x}^T\right] \approx \frac{1}{N}\sum_{t=1}^{N}\psi(\mathbf{y}^t)\,[\mathbf{x}^t]^T \qquad (2.61)$$

so that

$$\nabla h = \frac{1}{N}\sum_{t=1}^{N}\psi(\mathbf{y}^t)\,[\mathbf{x}^t]^T + \mathbf{W}^{-T} \qquad (2.62)$$

For the infomax algorithm, an optimal unmixing matrix W is found by maximizing entropy, iteratively following the gradient ∇h until a local maximum is reached. This is accomplished by the following weight-update rule:

$$\mathbf{W}_{new} = \mathbf{W}_{old} + \lambda\,\nabla h \qquad (2.63)$$

$$\mathbf{W}_{new} = \mathbf{W}_{old} + \lambda\left(\frac{1}{N}\sum_{t=1}^{N}\psi(\mathbf{y}^t)\,[\mathbf{x}^t]^T + \mathbf{W}^{-T}\right) \qquad (2.64)$$
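As an illustration, the update rule of equation (2.64) can be sketched in MATLAB as below. This is a minimal sketch under stated assumptions, not the full implementation of Appendix C1: the mixtures are assumed stored column-wise in x, and the score function ψ is written for the tanh model cdf used in Chapter 4.

% Minimal sketch of the infomax gradient ascent of equation (2.64).
% Assumptions: x is an M-by-N matrix of mixtures, lambda a small
% learning rate, psi the score function g''/g' of the model pdf.
[M, N] = size(x);
W = eye(M);                       % initial unmixing matrix
lambda = 0.25;                    % learning rate
maxIter = 100;
psi = @(y) -2*tanh(y);            % score for the tanh model cdf (Chapter 4)
for iter = 1:maxIter
    y = W * x;                            % current estimate of the sources
    gradh = (psi(y) * x') / N + inv(W)';  % gradient of entropy, eq. (2.62)
    W = W + lambda * gradh;               % ascent step, eq. (2.64)
end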

By updating W with learning rate λ towards the optimal W*, maximum entropy in Y = g(y) is obtained. However, the assumption that separation depends only on statistical independence is not sufficient to separate source signals from their mixtures, especially in nonlinear cases. Therefore, further constraints on the mixing function and on statistical independence should be imposed so that the sources can be easily and completely recovered, even though the meaning of "blind" here differs somewhat from the earlier case because of the required statistical properties of the sources. These reasons necessitated the review of radial basis function (RBF) neural networks outlined in Chapters Three and Four below.


CHAPTER THREE

3. RADIAL BASIS FUNCTION NETWORKS

3.1 Introduction

In this chapter the radial basis function (RBF) neural network is reviewed. The design of an RBF network is approached by viewing it as a curve-fitting or approximation problem in a high-dimensional space. This is the motivation behind the RBF method, as it draws upon research on traditional strict interpolation in a multidimensional space. Learning is equivalent to finding a multidimensional function that provides a best fit to the training data, with the "best fit" criterion measured in some statistical sense. In a neural network, the hidden units form a set of "functions" that compose a "basis" for the input patterns or vectors. These functions are called radial basis functions [12].

Radial basis functions were first introduced by Powell to solve the real multivariate interpolation problem in numerical analysis [8]. In neural networks, radial basis functions were first used by Broomhead and Lowe [13]. Other contributions to the theory, design and applications of RBFNs can be found in the papers by Renals, Moody and Darken, and Poggio and Girosi [14] [15] [16] [17]. Regularization theory is applied to this class of neural networks as a method for improved generalization to new data [18].

The RBFN in its most basic form consists of three separate layers. The input layer consists of a set of nodes, also called sensory units. The second layer is a hidden layer of high dimension. The output layer gives the response of the network to the activation patterns applied to the input layer. The transformation from the input nodes to the hidden layer is nonlinear, while that from the hidden space to the output space is linear [19]. A mathematical justification is found in [20]: Cover's theorem states that a pattern classification problem cast in a nonlinear, higher-dimensional space is more likely to be linearly separable than in a low-dimensional space. This basic and fundamental theorem on separability provides the reason for making the dimension of the hidden-unit space high in an RBFN.

This chapter then reviews the separability of patterns, the interpolation problem, supervised learning as an ill-posed hypersurface reconstruction problem, and regularization theory and its networks. The generalized radial basis function network and its learning strategy are also discussed.

3.2 Interpolation Problem

Cover's theorem on the separability of patterns, as considered by [19], states that a radial basis function network tackles a complex pattern-classification problem by transforming it nonlinearly into a high-dimensional space [11]. The theorem is stated as follows:

Consider a family of surfaces, each of which naturally divides an input space into two regions. Let X denote a set of N patterns (points) x_1, x_2, ..., x_N, each of which is assigned to one of two classes, X_1 and X_0. This binary partition of the points is said to be separable with respect to the family of surfaces if a surface exists in the family that separates the points in class X_1 from those in class X_0.

For each pattern x ∈ X, define a vector φ(x) = [φ_1(x), φ_2(x), ..., φ_M(x)]^T through a set of real functions {φ_i(x) | i = 1, 2, ..., M}. Supposing the pattern x is a vector in an m_0-dimensional input space, the vector φ(x) then maps points into a new m_1-dimensional space. The function φ_i(x) is referred to as a hidden function because it plays a role similar to that of a hidden unit in a feedforward neural network. A dichotomy {X_1, X_0} of X is said to be φ-separable if there exists an m_1-dimensional vector w such that we may write

$$\mathbf{w}^T\boldsymbol{\varphi}(\mathbf{x}) \geq 0, \quad \text{for } \mathbf{x} \in X_1 \qquad (3.1)$$

and

$$\mathbf{w}^T\boldsymbol{\varphi}(\mathbf{x}) \leq 0, \quad \text{for } \mathbf{x} \in X_0 \qquad (3.2)$$

The hyperplane defined by equation (3.3) then describes the separating surface in the φ-space:

$$\mathbf{w}^T\boldsymbol{\varphi}(\mathbf{x}) = 0 \qquad (3.3)$$

Examples of separating surfaces are hyperplanes (first order), quadrics (second order) and hyperspheres (quadrics with certain linear constraints on the coefficients), as shown in Figure 3.1.

Figure 3-1: Three examples of separable dichotomies of different sets of five points in two dimensions: (a) linearly separable dichotomy; (b) spherically separable dichotomy; (c) quadratically separable dichotomy [3] [21] .

The interpolation problem allows one to solve a nonlinearly separable pattern-classification problem by finding the linear weight vector of the network, as shown in the following sections. A nonlinearly separable classification problem is generally solved by mapping the input space into a new space of sufficiently high dimension. Considering a feedforward network with an input layer, a single hidden layer and an output layer with a single unit, the network can be designed to perform a nonlinear mapping from the input space to the hidden space, and a linear mapping from the hidden space to the output space. The network maps an m_0-dimensional input space to the one-dimensional output space:

$$F: \mathbb{R}^{m_0} \rightarrow \mathbb{R}^{1} \qquad (3.4)$$

The interpolation problem can therefore be stated as: given a set of N distinct points {x_i ∈ R^{m_0} | i = 1, 2, ..., N} and a corresponding set of N real numbers {d_i ∈ R^1 | i = 1, 2, ..., N}, find a function F: R^{m_0} → R^1 that satisfies the interpolation condition

$$F(\mathbf{x}_i) = d_i, \quad i = 1, 2, \ldots, N \qquad (3.5)$$

For strict interpolation as defined above, the interpolating surface, that is, the function F, must pass through all the training data points. The radial basis function (RBF) technique therefore consists of choosing a function F of the following form [4]:

$$F(\mathbf{x}) = \sum_{i=1}^{N} w_i\,\varphi\big(\|\mathbf{x} - \mathbf{x}_i\|\big) \qquad (3.6)$$

where {φ(||x − x_i||) | i = 1, 2, ..., N} is a set of N nonlinear functions, known as radial basis functions, and ||·|| denotes a norm, generally the Euclidean norm.

The known data points x_i ∈ R^{m_0}, i = 1, 2, ..., N, are the centres of the radial basis functions. Inserting (3.6) into (3.5) gives the following set of linear equations for the unknown coefficients (weights) w_i:

$$\begin{bmatrix} \varphi_{11} & \varphi_{12} & \cdots & \varphi_{1N} \\ \varphi_{21} & \varphi_{22} & \cdots & \varphi_{2N} \\ \vdots & \vdots & & \vdots \\ \varphi_{N1} & \varphi_{N2} & \cdots & \varphi_{NN} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_N \end{bmatrix} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_N \end{bmatrix} \qquad (3.7)$$

where , ๏ช ji

๏€ฝ ๏ช

๏€จ x j

๏€ญ x i

๏€ฉ j, i = 1, 2, ...N, showing that we have N

๏€ญ number of basis functions and not

N

square number of ๏ช basis functions. In compact mathematical form; d

๏€ฝ

[ d

1

, d

2

,..., d

N

], w

๏€ฝ

๏› w

1

, w

2

,...

w

N

,

๏

๏† ๏€ฝ ๏ช ij

, i , j ,

๏€ฝ

1 , 2 ,...

N ,

(3.8)

(3.9)

(3.9)

The vectors d and w represent the desired response vector and the linear weight vector, respectively. Φ denotes the N × N matrix with elements φ_ji, called the interpolation matrix. In compact form, equation (3.7) is written as

$$\boldsymbol{\Phi}\mathbf{w} = \mathbf{d} \qquad (3.10)$$

If the matrix Φ is positive definite, the interpolation problem can always be solved [19]. Common examples of this class of radial basis functions from the literature are:

i. Thin-plate spline function:
$$\varphi(r) = r^2\log(r) \qquad (3.11)$$

ii. Multiquadric function:
$$\varphi(r) = \sqrt{r^2 + \sigma^2} \qquad (3.15)$$

iii. Inverse multiquadric function:
$$\varphi(r) = \frac{1}{\sqrt{r^2 + \sigma^2}} \qquad (3.16)$$

iv. Gaussian function:
$$\varphi(r) = \exp\left(-\frac{r^2}{\sigma^2}\right), \quad \text{for } \sigma \geq 0 \text{ and } r \geq 0 \qquad (3.17)$$

The Gaussian radial basis function monotonically decreases with distance from the centre, as shown in Figure 3.2.

Figure 3-2: The Gaussian radial basis function with centre and radius [19] .

Considering that all the data points are distinct, the weight vector w can be found as

$$\mathbf{w} = \boldsymbol{\Phi}^{-1}\mathbf{d} \qquad (3.18)$$

where Φ^{-1} is the inverse of the interpolation matrix Φ. Even though in theory a solution to the strict interpolation problem exists, in practice equation (3.18) cannot be solved reliably when the matrix Φ is arbitrarily close to singular. This brings about an ill-posed hypersurface reconstruction problem. As opposed to a well-posed problem, the ill-posed surface does not meet the three conditions of existence, continuity and uniqueness during training. Considering these conditions, a well-posed surface is needed, since pure interpolation constrains the surface to pass through all the training points, including noise. The regularization solution developed below invokes smoothness and flexibility conditions on the surface, making it less affected by noise and better for generalization.

3.2.1 Solution to the Interpolation Problem

An ill-posed surface, as mentioned earlier, violates the three conditions of existence, continuity and uniqueness. Tikhonov's regularization [9] takes the input-output data available for approximation to be described as:

Input signal: x_i ∈ R^{m_0}, i = 1, 2, ..., N (3.19)

Desired signal: d_i ∈ R^1, i = 1, 2, ..., N (3.20)

For the approximating (interpolating) function F(x), Tikhonov takes the following two terms:

i) Standard error term:

$$E_s(F) = \frac{1}{2}\sum_{i=1}^{N}(d_i - y_i)^2 \qquad (3.21)$$

where d_i is the desired output and y_i is the actual output (basically we do not want to force d_i = y_i at every point), so that

$$E_s(F) = \frac{1}{2}\sum_{i=1}^{N}\big(d_i - F(\mathbf{x}_i)\big)^2 \qquad (3.22)$$

ii) Regularizing term:

$$E_R(F) = \frac{1}{2}\big\|\mathbf{D}F\big\|^2 \qquad (3.23)$$

where D is a linear differential operator.

The regularizing term gives radial basis function (RBF) networks an edge over ordinary multi-layer perceptron (MLP) networks, because the smoothing introduced imposes less constraining, more flexible conditions on the surface, making it less affected by noise and better for generalization. The combined measure is given by

$$E(F) = E_s(F) + \lambda E_R(F) \qquad (3.24)$$

where λ is a regularization parameter adjusted in order to achieve smoothness. The corresponding regularization principle, via the Fréchet differential, yields a superposition of Green's functions [19] [22] of the form

$$F_\lambda(\mathbf{x}) = \frac{1}{\lambda}\sum_{i=1}^{N}\big[d_i - F(\mathbf{x}_i)\big]\,G(\mathbf{x}; \boldsymbol{\varepsilon}_i) \qquad (3.25)$$

The minimizing solution F_λ(x) is a linear superposition of Green's functions G(x; ε_i), i = 1, 2, ..., N, which yields the Green's matrix shown below; each ε_i is a centre acting as a basis function, giving N basis functions in total.

The coefficient of the expansion is

$$\frac{1}{\lambda}\big[d_i - F(\mathbf{x}_i)\big] \qquad (3.26)$$

with the Green's function acting as a basis function, so that

$$w_i = \frac{1}{\lambda}\big[d_i - F(\mathbf{x}_i)\big], \quad i = 1, 2, \ldots, N \qquad (3.27)$$

$$F_\lambda(\mathbf{x}) = \sum_{i=1}^{N} w_i\,G(\mathbf{x}; \boldsymbol{\varepsilon}_i) \qquad (3.28)$$

Taking the Green's functions to act as radial basis functions, the centres become ε_i = x_i, and evaluating at the data points x_j, j = 1, 2, ..., N, gives

$$F_\lambda(\mathbf{x}_j) = \sum_{i=1}^{N} w_i\,G(\mathbf{x}_j; \mathbf{x}_i), \quad i, j = 1, 2, \ldots, N \qquad (3.29)$$

The entire minimization equation is expressed in matrix form as

$$\mathbf{F}_\lambda = \big[F_\lambda(\mathbf{x}_1)\ F_\lambda(\mathbf{x}_2)\ \cdots\ F_\lambda(\mathbf{x}_N)\big]^T, \qquad (3.30)$$

$$\mathbf{d} = [d_1\ d_2\ \cdots\ d_N]^T, \qquad (3.31)$$

$$\mathbf{w} = [w_1\ w_2\ \cdots\ w_N]^T, \qquad (3.32)$$

$$\mathbf{G} = \begin{bmatrix} G(\mathbf{x}_1;\mathbf{x}_1) & G(\mathbf{x}_1;\mathbf{x}_2) & \cdots & G(\mathbf{x}_1;\mathbf{x}_N) \\ G(\mathbf{x}_2;\mathbf{x}_1) & G(\mathbf{x}_2;\mathbf{x}_2) & \cdots & G(\mathbf{x}_2;\mathbf{x}_N) \\ \vdots & \vdots & & \vdots \\ G(\mathbf{x}_N;\mathbf{x}_1) & G(\mathbf{x}_N;\mathbf{x}_2) & \cdots & G(\mathbf{x}_N;\mathbf{x}_N) \end{bmatrix} \qquad (3.33)$$

From equations (3.27) and (3.28), a vector-matrix form is obtained:

$$\mathbf{w} = \frac{1}{\lambda}\big[\mathbf{d} - \mathbf{F}_\lambda\big], \qquad (3.34)$$

$$\mathbf{F}_\lambda = \mathbf{G}\mathbf{w} \qquad (3.35)$$

Eliminating F_λ between equations (3.34) and (3.35),

$$(\mathbf{G} + \lambda\mathbf{I})\,\mathbf{w} = \mathbf{d} \qquad (3.36)$$

$$\mathbf{w} = (\mathbf{G} + \lambda\mathbf{I})^{-1}\mathbf{d} \qquad (3.37)$$

G is the Green's matrix realized from the Green's function; by Micchelli's criterion [24] it may take radial basis functions such as the Gaussian, inverse multiquadric and multiquadric functions.

The solution in equation (3.28) with RBFs is then rewritten as

$$F_\lambda(\mathbf{x}) = \sum_{i=1}^{N} w_i\,G\big(\|\mathbf{x} - \mathbf{x}_i\|\big) \qquad (3.38)$$
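A minimal MATLAB sketch of this regularized fit, assuming a Gaussian Green's function and hypothetical names x, d, sigma and lambda, is:

% Minimal sketch of regularized RBF interpolation, eqs. (3.33)-(3.38).
% Assumptions: x is an N-by-m0 matrix of data points, d an N-by-1 target
% vector; sigma and lambda are user-chosen width and regularization values.
sigma = 1.0; lambda = 0.01;
N = size(x, 1);
D = zeros(N);                         % pairwise squared distances
for j = 1:N
    for i = 1:N
        D(j, i) = sum((x(j, :) - x(i, :)).^2);
    end
end
G = exp(-D / sigma^2);                % Gaussian Green's matrix, eq. (3.33)
w = (G + lambda * eye(N)) \ d;        % regularized weights, eq. (3.37)
Fhat = G * w;                         % fitted values at the training points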

3.2.2 Regularization Networks

Figure 3.3 shows the network structure established by considering the expansion of the approximating function F(x) given in equation (3.29) in terms of the Green's functions G(x; x_i) centred at x_i. This network is called a regularization network [23], implementing the solution to the ill-posed interpolation problem developed in Section 3.2.1. The Green's function G(x_j; x_i) has to be positive definite for all i.

Figure 3-3: Regularization network.

The network has three layers. The first layer consists of source nodes whose number equals the dimension m_0 of the input vector x, that is, the number of independent variables of the problem. The second layer is a hidden layer made up of nonlinear units connected directly to all of the nodes in the input layer. There is one hidden unit for each data vector x_i, i = 1, 2, ..., N, where N is the number of training samples. The activation functions of the individual hidden units are described by the Green's functions [22]; correspondingly, G(x; x_i) represents the output of the i-th hidden unit. The output layer has a single linear unit fully connected to the hidden layer. The term "linearity" is used because the output of the network is a linearly weighted sum of the outputs of the hidden units. The weights of the output layer are the unknown coefficients of the expansion described in equation (3.37), in terms of the Green's functions G(x; x_i) and the regularization parameter λ. Such a network structure can obviously be extended to any number of outputs desired [9].

The Green's function G(x; x_i) is assumed to be positive definite for all i in the regularization network of Figure 3.3. When this condition is satisfied, as it is when the Green's functions have the form of Gaussian functions, the network produces an "optimal" interpolant in the sense that it minimizes the functional E(F). Three properties of the regularization network from the viewpoint of approximation theory are as follows [19]:

- The regularization network is a universal approximator, in that it can approximate arbitrarily well any multivariate continuous function on a compact subset of R^{m_0}, given a sufficiently large number of hidden units.
- Since the approximation scheme derived from regularization theory is linear in the unknown coefficients, the regularization network has the best-approximation property: given an unknown nonlinear function F, there always exists a choice of coefficients that approximates F better than all other possible choices.
- The solution computed by the regularization network is optimal, in the sense that the network minimizes a functional measuring how much the solution deviates from its true value as represented by the training data [16].


When the transforming dimension m_1 is greater than the input dimension, that is m_0 < m_1 with m_1 < N, the Green's functions G can be replaced by basis functions φ. The difference between the G's and the φ's in an RBF network is that strict interpolation forces the surface through all the given points, whereas in smooth regularization the surface reconstruction takes into account the cost function and the variation in smoothness.

3.3 Generalized Radial Basis Function Networks

The regularization network developed above has one hidden unit per data point, which becomes difficult to manage: as N increases, the system typically builds itself into N × N such centres, posing computational challenges, especially when the data set is large; moreover, the probability of ill-conditioning is higher for such large matrices. The generalized RBFN therefore takes m_1 hidden neurons, m_1 ≤ N, to obtain an approximate regularized function, as shown:

$$F^*(\mathbf{x}) = \sum_{i=1}^{m_1} w_i\,\varphi_i(\mathbf{x}) \qquad (3.39)$$

where {φ_i(x) | i = 1, 2, ..., m_1} forms the new set of basis functions, assumed linearly independent without loss of generality. This shows that the number of basis functions is less than the number of data points, m_1 ≤ N, and w_i represents a new set of weights. Taking the radial basis function as in [24] gives

$$\varphi_i(\mathbf{x}) = G\big(\|\mathbf{x} - \mathbf{t}_i\|\big), \quad i = 1, 2, \ldots, m_1 \qquad (3.40)$$

where {t_i | i = 1, 2, ..., m_1} is a set of new centres.

Substituting equation (3.40) into equation (3.39) [17] [19], an approximate solution F*(x) is obtained with a finite set of basis functions:

$$F^*(\mathbf{x}) = \sum_{i=1}^{m_1} w_i\,G(\mathbf{x}, \mathbf{t}_i) = \sum_{i=1}^{m_1} w_i\,G\big(\|\mathbf{x} - \mathbf{t}_i\|\big) \qquad (3.41)$$

From the approximating function of equation (3.41), the two Tikhonov terms ξ(F*) = ξ_s(F*) + λξ_c(F*) [25] can be obtained as

$$\xi(F^*) = \sum_{j=1}^{N}\left(d_j - \sum_{i=1}^{m_1} w_i\,G\big(\|\mathbf{x}_j - \mathbf{t}_i\|\big)\right)^2 + \lambda\left\|\mathbf{D}F^*\right\|^2 \qquad (3.42)$$

In vector-matrix form, the right-hand-side parameters can be rewritten as

$$\mathbf{d} = [d_1\ d_2\ \cdots\ d_N]^T, \qquad (3.43)$$

$$\mathbf{w} = [w_1\ w_2\ \cdots\ w_{m_1}]^T, \qquad (3.44)$$

$$\mathbf{G} = \begin{bmatrix} G(\mathbf{x}_1;\mathbf{t}_1) & G(\mathbf{x}_1;\mathbf{t}_2) & \cdots & G(\mathbf{x}_1;\mathbf{t}_{m_1}) \\ G(\mathbf{x}_2;\mathbf{t}_1) & G(\mathbf{x}_2;\mathbf{t}_2) & \cdots & G(\mathbf{x}_2;\mathbf{t}_{m_1}) \\ \vdots & \vdots & & \vdots \\ G(\mathbf{x}_N;\mathbf{t}_1) & G(\mathbf{x}_N;\mathbf{t}_2) & \cdots & G(\mathbf{x}_N;\mathbf{t}_{m_1}) \end{bmatrix} \qquad (3.45)$$

Hence these parameters can be phrased as the squared Euclidean norm ||d − Gw||², where the desired response d has the same dimension N as before. The matrix G of Green's functions and the weight vector w, however, now have different dimensions: G has N rows and m_1 columns and is thus non-square (the linear system is overdetermined when N > m_1), while w has m_1 rows and one column.

The second term, λ||DF*||² [25] [24], can be represented by an inner product in a Hilbert space, with F* as the approximating function:

$$\lambda\left\|\mathbf{D}F^*\right\|^2 = \lambda\,\big(\mathbf{D}F^*, \mathbf{D}F^*\big)_H \qquad (3.46)$$

$$\left\|\mathbf{D}F^*\right\|^2 = \left(\mathbf{D}\sum_{j=1}^{m_1} w_j\,G(\mathbf{x}, \mathbf{t}_j),\ \mathbf{D}\sum_{i=1}^{m_1} w_i\,G(\mathbf{x}, \mathbf{t}_i)\right)_H = \sum_{j=1}^{m_1}\sum_{i=1}^{m_1} w_j w_i\,G(\mathbf{t}_j, \mathbf{t}_i) \qquad (3.47)$$

$$= \mathbf{w}^T\mathbf{G}_0\,\mathbf{w} \qquad (3.48)$$

where the matrix G_0 is a symmetric m_1 × m_1 matrix, given by

$$\mathbf{G}_0 = \begin{bmatrix} G(\mathbf{t}_1;\mathbf{t}_1) & G(\mathbf{t}_1;\mathbf{t}_2) & \cdots & G(\mathbf{t}_1;\mathbf{t}_{m_1}) \\ G(\mathbf{t}_2;\mathbf{t}_1) & G(\mathbf{t}_2;\mathbf{t}_2) & \cdots & G(\mathbf{t}_2;\mathbf{t}_{m_1}) \\ \vdots & \vdots & & \vdots \\ G(\mathbf{t}_{m_1};\mathbf{t}_1) & G(\mathbf{t}_{m_1};\mathbf{t}_2) & \cdots & G(\mathbf{t}_{m_1};\mathbf{t}_{m_1}) \end{bmatrix}$$

Broomhead and Lowe [14] show that

$$\mathbf{w} = \mathbf{G}^{+}\mathbf{d} \qquad (3.49)$$

Minimizing the approximating functional ξ(F*) with respect to the weight vector produces

$$(\mathbf{G}^T\mathbf{G} + \lambda\mathbf{G}_0)\,\mathbf{w} = \mathbf{G}^T\mathbf{d} \qquad (3.50)$$

$$\mathbf{w} = (\mathbf{G}^T\mathbf{G} + \lambda\mathbf{G}_0)^{-1}\mathbf{G}^T\mathbf{d} \qquad (3.51)$$

where G^+ = (G^T G + λG_0)^{-1} G^T, and as λ → 0, G^+ = (G^T G)^{-1} G^T, which is the pseudo-inverse matrix.

Figure 3-4: Generalized radial basis function network [24] .

The framework for the generalized radial basis function (RBF) network shown in Figure 3.4 is provided by the solution to the approximation problem defined in equation (3.41). In this network, a bias (i.e., a data-independent variable) is applied to the output unit: one of the linear weights in the output layer is set equal to the bias, and the associated radial basis function is treated as a constant equal to +1.

3.4 Weighted Norm

For a more optimal generalized radial basis function, the Euclidean norm is replaced with a weighted norm [11], especially when the input vector x belongs to different classes; such data need some transformation:

$$\|\mathbf{x}\|_C^2 = (\mathbf{C}\mathbf{x})^T(\mathbf{C}\mathbf{x}) \qquad (3.52a)$$

$$= \mathbf{x}^T\mathbf{C}^T\mathbf{C}\,\mathbf{x} \qquad (3.52b)$$

where x is of dimension m_0 and C is an m_1 × m_0 weighting matrix, which transforms x from the input space to the m_1-dimensional weighted space Cx; the resulting norm is therefore a weighted norm. The corresponding generalized radial basis function equation is given below, with the Euclidean norm replaced by the weighted norm and a Gaussian used as the Green's function:

$$F^*(\mathbf{x}) = \sum_{i=1}^{m_1} w_i\,G\big(\|\mathbf{x} - \mathbf{t}_i\|_C\big) \qquad (3.53)$$

$$G\big(\|\mathbf{x} - \mathbf{t}_i\|_C\big) = \exp\big[-(\mathbf{x} - \mathbf{t}_i)^T\mathbf{C}_i^T\mathbf{C}_i(\mathbf{x} - \mathbf{t}_i)\big] \qquad (3.53a)$$

$$= \exp\left[-\tfrac{1}{2}(\mathbf{x} - \mathbf{t}_i)^T\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \mathbf{t}_i)\right] \qquad (3.53b)$$

$$F^*(\mathbf{x}) = \sum_{i=1}^{m_1} w_i\,\exp\left[-\tfrac{1}{2}(\mathbf{x} - \mathbf{t}_i)^T\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \mathbf{t}_i)\right] \qquad (3.54)$$

The inverse matrix Σ^{-1} is defined by ½Σ^{-1} = C_i^T C_i, where the original matrix Σ is a covariance matrix.
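As a small illustration, the weighted-norm Gaussian of equation (3.53b) can be evaluated in MATLAB as below, assuming column vectors x and t and a covariance matrix Sigma (all hypothetical names):

% Sketch of the weighted-norm Gaussian of equation (3.53b).
% Assumptions: x and t are m0-by-1 column vectors, Sigma an m0-by-m0
% covariance matrix (Sigma \ v solves Sigma * u = v without inversion).
gC = @(x, t, Sigma) exp(-0.5 * (x - t)' * (Sigma \ (x - t)));
g = gC([1; 2], [0; 0], eye(2));   % example evaluation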

The differences between the regularization and generalized networks, which favour the latter for use in our blind source approximation, are as follows:

- The number of nodes in the hidden layer of the generalized RBF network of Figure 3.4 is M, where M is ordinarily smaller than the number N of examples available for training. In contrast, the number of hidden nodes in the regularization RBF network of Figure 3.3 is exactly N, the number of data points.
- In the generalized RBF network of Figure 3.4, the linear weights of the output layer, the positions of the centres of the radial basis functions, and the norm weighting matrix of the hidden layer are all unknown parameters that have to be learned. In the regularization RBF network of Figure 3.3, the activation functions of the hidden layer are known, being defined by a set of Green's functions centred at the training data points; the linear weights of the output layer are the only unknown parameters of the network [17] [25].

3.5 Adaptive Radial Basis Function

This section deals with the generalized radial basis function network used in this research. As indicated earlier, our practical BSS system has multiple inputs and multiple outputs (MIMO), and the applicability of neural networks as practical adaptive identifiers and filters will ultimately be judged by their success in multivariable problems [26]. In this section, we design a model-following adaptive filter for a class of discrete-time multivariable linear and nonlinear blind source systems, with ICA as a pre-processor of the input. A radial basis function (RBF) neural network with recursive k-means and least mean square (LMS) training algorithms is used for off-line stable identification, and a stable model-following adaptive filter is implemented from the identification results. Simulation results demonstrate that the resulting controller can drive unknown MIMO nonlinear BSS systems to follow the desired trajectory very well [27] [28] [26] [29].

The utilization of the RBF neural network as an intelligent filter is also emphasized. RBF neural networks differ from multi-layered neural networks in that they have a simple structure and fast learning algorithms, and they perform local representations. Moreover, the basis functions in the RBF network structure ensure that only the weights located in the vicinity of the input need adjustment during training [26]. This feature makes RBF neural networks attractive as off-line and on-line controllers, where there is often no control over the order of presentation of the data samples used during training [29] [30].

3.6 Learning Strategies in RBF

The main consideration in a radial basis function network is the type of mapping to be approximated; in our problem it is a nonlinear dynamic mapping approximated by a sum of several functions, each one with its own prior. Data of dimension m_0 are fed into m_1 different radial basis function nodes, which exist as computational nodes. The nodes compute the norm and evaluate the response of each basis function [27]:

$$G\big(\|\mathbf{x} - \mathbf{t}_i\|_C^2\big) = \exp\left[-\frac{1}{2\sigma^2}\|\mathbf{x} - \mathbf{t}_i\|^2\right], \quad i = 1, 2, \ldots, m_1 \qquad (3.55)$$

where σ is the standard deviation. To avoid too small or too large a spread of the Gaussian function [31] [3], an appropriate, judiciously chosen value of σ is

$$\sigma = \frac{d_{max}}{\sqrt{2m_1}} \qquad (3.56)$$

where d_max is the maximum distance between the chosen centres, so that

$$G\big(\|\mathbf{x} - \mathbf{t}_i\|_C^2\big) = \exp\left[-\frac{m_1}{d_{max}^2}\|\mathbf{x} - \mathbf{t}_i\|^2\right], \quad i = 1, 2, \ldots, m_1 \qquad (3.57)$$
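The width heuristic of equation (3.56) can be sketched in MATLAB as below, assuming the chosen centres are stored row-wise in a hypothetical matrix T:

% Sketch of the width heuristic of equation (3.56).
% Assumptions: T is an m1-by-m0 matrix of chosen centres.
m1 = size(T, 1);
dmax = 0;
for i = 1:m1
    for j = 1:m1
        dmax = max(dmax, norm(T(i, :) - T(j, :)));
    end
end
sigma = dmax / sqrt(2 * m1);   % common spread for all Gaussian units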

Learning is pegged to the distribution of the data to be mapped. For unevenly distributed data, the centres have to be adapted so as to discriminate the pattern: a more populated cluster of data calls for more centres than sparsely populated ones, hence the adaptability of the RBF centres, invoking efficient utilization of the defined centres. For uniformly distributed data, on the other hand, there is no need to adjust the centres; only the synaptic weights are adjusted. This is achieved by solving the pseudo-inverse matrix with the singular value decomposition (SVD) algorithm [30].

Training an RBF neural network consists of determining the locations of the centres and the widths for the hidden layer, and the weights of the output layer. Training uses a two-phase approach: in the first phase, unsupervised learning occurs, whose main objective is to optimize the locations of the centres and the widths; in the second phase, the output layer is trained in a supervised mode using the least mean square (LMS) algorithm to adjust the weights so as to obtain the minimum mean square error at the output [19] [3]. In general, the hybrid learning method for an RBF neural network consists of the following three steps:

a) Find the cluster centres of the radial basis functions, using the k-means clustering algorithm.
b) Find the widths of the radial basis functions.
c) Find the weights, using the least mean square (LMS) algorithm.

3.7 Self-Organized Centres

The self-organized learning procedure allocates network resources in a meaningful way by placing the centres of the radial basis functions only in those regions of the input space where important data exist [19]. Considered here is the character of the distribution density, that is, whether the data are evenly, sparsely or densely populated. The mechanism is unsupervised learning by means of the k-means clustering algorithm [31] [3] [32].

3.8 K-Means Clustering Algorithm

To calculate the centres of the radial basis functions, the k-means clustering algorithm is adopted. The purpose of applying the k-means clustering algorithm is to find a set of cluster centres and a partition of the training data into subclasses [2]. The centre of each cluster is initialized to a randomly chosen input datum, and each training datum is then assigned to the cluster nearest to it. After the training data have been assigned to a new cluster unit, the new centre of the cluster is the average of the training data associated with that cluster unit. After all the new centres have been calculated, the process is repeated until it converges [16] [33] [34]. The recursive k-means algorithm is as follows:

1) Choose a set of centres (t_1, t_2, ..., t_{m_1}) arbitrarily and set the initial learning rate,

$$\lambda(0) = 1 \qquad (3.58)$$

where the value in parentheses is the iteration number n = 0, 1, 2, ...

2) Compute the minimum Euclidean distance

$$K_k(\mathbf{x}) = \|\mathbf{x}(n) - \mathbf{t}_k(n)\|, \quad k = 1, 2, \ldots, m_1 \qquad (3.59)$$

$$r = \arg\min_k K_k(\mathbf{x})$$

where the t_k(n) are the centres of the RBF neurons (all competing centres) at iteration (epoch) n. The argmin gives the index r of the winning centre for the pattern x.

3) Sampling: draw a sample vector x from the input space H; for the n-th iteration, the n-th sample x(n) is picked.

4) Adjust the positions of the centres, updating the winning centre towards the pattern:

$$\mathbf{t}_k(n+1) = \begin{cases} \mathbf{t}_k(n) + \lambda\,[\mathbf{x}(n) - \mathbf{t}_k(n)], & k = r \\ \mathbf{t}_k(n), & k \neq r \end{cases} \qquad (3.60)$$

where 0 < λ ≤ 1 is the learning rate. By finding the distance between x(n) and t_k(n), the winning centre is given a push, changing its position towards the respective data distribution.

5) Continuation: increment n by 1, n_new = n_old + 1, decay the learning rate as λ(n+1) = 0.9998 λ(n), and go back to step 2 until no noticeable change is observed in the centres t_k. This is a competitive learning mechanism: the centres organize themselves without supervision or a teacher, hence a self-organized competitive learning technique (a minimal MATLAB sketch of these steps follows below).
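% Minimal sketch of the recursive k-means centre update, steps 1)-5).
% Assumptions: X is a hypothetical m0-by-N matrix of input vectors and
% m1 is the chosen number of RBF centres.
m1 = 10;
[m0, N] = size(X);
T = X(:, randperm(N, m1));            % step 1: random initial centres
lambda = 1;                           % initial learning rate, eq. (3.58)
for n = 1:5000
    x = X(:, mod(n-1, N) + 1);        % step 3: draw the n-th sample
    d = sum((T - x).^2, 1);           % step 2: squared distances to centres
    [~, r] = min(d);                  % index of the winning centre
    T(:, r) = T(:, r) + lambda * (x - T(:, r));  % step 4: update winner
    lambda = 0.9998 * lambda;         % step 5: decay the learning rate
end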


3.9 Supervised Learning Strategy

In supervised learning, the centres of the RBF and all the other free parameters of the network are obtained for the best generalization [11]. Error correction is the natural candidate for such a process, and it is most conveniently implemented using a gradient descent procedure that generalizes the LMS algorithm. The instantaneous value of the cost function is

$$\xi = \frac{1}{2}\sum_{j=1}^{N} e_j^2 \qquad (3.61)$$

where N is the number of training samples (an epoch) and e_j is the error signal, given by

$$e_j = d_j - F^*(\mathbf{x}_j) \qquad (3.62)$$

$$= d_j - \sum_{i=1}^{m_1} w_i\,G\big(\|\mathbf{x}_j - \mathbf{t}_i\|_{C_i}\big) \qquad (3.63)$$

In this approach the free parameters w_i, t_i and Σ_i^{-1} have to be calculated so as to minimize ξ. The covariance matrix Σ_i is related to the norm weighting, indicating the spread of the Green's function, in order to minimize the cost function. In addition to the number and centres of the radial basis functions in the hidden layer, their spread, and the method used for learning the input-output mapping, there is also on-line training, since the network parameters are dynamically adjusted at every sample step [31] [34]. The gradient algorithm is used to update the network's free parameters.

3.9.1 Error Minimization Algorithm

1) Updating the linear weights w_i at the output node: the derivative of the cost function ξ with respect to the weight w_i is

$$\frac{\partial \xi(n)}{\partial w_i(n)} = \sum_{j=1}^{N} e_j(n)\,G\big(\|\mathbf{x}_j - \mathbf{t}_i\|_{C_i}\big) \qquad (3.64a)$$

$$w_i(n+1) = w_i(n) - \eta_a\,\frac{\partial \xi(n)}{\partial w_i(n)} \qquad (3.64b)$$

2) Updating the positions of the centres. The positions of the centres are adjusted by supervised adjustment, which shows how the hidden layer is organized:

$$\frac{\partial \xi(n)}{\partial \mathbf{t}_i(n)} = 2\,w_i(n)\sum_{j=1}^{N} e_j(n)\,G'\big(\|\mathbf{x}_j - \mathbf{t}_i(n)\|_{C_i}^2\big)\,\boldsymbol{\Sigma}_i^{-1}\,[\mathbf{x}_j - \mathbf{t}_i(n)] \qquad (3.66a)$$

$$\mathbf{t}_i(n+1) = \mathbf{t}_i(n) - \eta_b\,\frac{\partial \xi(n)}{\partial \mathbf{t}_i(n)} \qquad (3.66b)$$

3) Updating the centre widths. The spreads of the centres are adjusted according to

$$\frac{\partial \xi(n)}{\partial \boldsymbol{\Sigma}_i^{-1}(n)} = -w_i(n)\sum_{j=1}^{N} e_j(n)\,G'\big(\|\mathbf{x}_j - \mathbf{t}_i(n)\|_{C_i}^2\big)\,\mathbf{Q}_j(n), \qquad \mathbf{Q}_j(n) = [\mathbf{x}_j - \mathbf{t}_i(n)][\mathbf{x}_j - \mathbf{t}_i(n)]^T \qquad (3.67a)$$

where Q_j(n) is the outer-product matrix; then

$$\boldsymbol{\Sigma}_i^{-1}(n+1) = \boldsymbol{\Sigma}_i^{-1}(n) - \eta_c\,\frac{\partial \xi(n)}{\partial \boldsymbol{\Sigma}_i^{-1}(n)} \qquad (3.67b)$$

The adaptive learning-rate coefficients in the gradient descent algorithm determine the size of the weight, centre-position and centre-width adjustments made at each iteration, and hence influence the rate of convergence.

In summary, the three parameters below form the arguments of the adaptive radial basis function, as already discussed and as shown in the RBFN structure of Figure 3.5 for blind source separation:

- the weights,
- the centre positions, and
- the centre spreads.

Figure 3-5: The RBFN Structure.

3.9.2 Merits of Generalized Radial Basis Function Networks for Blind Source Separation

The advantages of radial basis function networks over traditional multilayer perceptrons (MLPs) are:

- Locality of the radial basis functions and feature extraction in the hidden neurons, which allows the use of clustering algorithms and independent tuning of the RBFN parameters.
- Sufficiency of one layer of nonlinear elements for establishing an arbitrary input-output mapping.
- The clustering problem can be solved independently of the weights in the output layer.
- The RBFN output in scarcely trained areas of the input space is not random, but depends on the density of the pairs in the training data set.
- The simple architecture with linear weights and the LMS adaptation rule amounts to gradient descent on a quadratic surface, and thus has a unique solution for the weights.

Radial basis function (RBF) neural networks therefore form a popular class of networks used in many recent applications, owing to their ability to approximate complex nonlinear mappings directly from input-output data with a simple topological structure, and the ease of implementing dynamic and adaptive network architectures. The same advantages are borrowed into this research for improved blind signal processing (BSP) applications.

CHAPTER FOUR

4. METHODOLOGY

This chapter builds on the process described in Chapter Three. The whole approach takes independent component analysis (ICA) as a pre-process for the radial basis function network (RBFN), as shown in Figure 4.1.

Figure 4-1: ICA-RBFN flow chart.

The probabilistic evaluation of the signal x ∈ R^{m_0} is done first, and a nonlinear regression is then performed through a neural network of the radial basis function type. Secondly, the signals used as input are captured from the statistical nature of signals derived from [11] and the entropy principles given by [8]. Since audio signals spend the majority of their time around zero [31] [3], they typically have super-Gaussian probability density functions; the Infomax algorithm therefore uses a super-Gaussian pdf to model high-kurtosis signals of this nature. Figure 4.2 shows a sample audio signal of ten thousand samples of the played excerpt from Handel's "Hallelujah Chorus" from the MATLAB data directory.

Figure 4-2: A sample audio signal.

4.1 Adopting the Infomax Algorithm

A hyperbolic tangent function and its derivative are used in the simulations of the Infomax algorithm as the model cdf and pdf. The tanh cdf extracts super-Gaussian signals successfully and is found to be an accurate representation of the cdf and pdf, providing a valid, implementable model for the proposed BSS system:

$$\mathbf{Y} = g(\mathbf{y}) = \tanh(\mathbf{y}) \qquad (4.1)$$

This implies that the pdf g′(y) is given by the first derivative of tanh:

$$g'(y^t) = \frac{d}{dy}\tanh(y^t) = \mathrm{sech}^2(y^t) = 1 - \tanh^2(y^t) \qquad (4.2)$$

The hyperbolic tangent function and its derivative for the audio signal are shown in Figure 4.3. The second derivative of tanh is given by

$$g''(y^t) = \frac{dg'(y^t)}{dy^t} = \frac{d\big(1 - \tanh^2(y^t)\big)}{dy^t} \qquad (4.3),\ (4.4)$$

$$= -2\tanh(y^t)\,\frac{d\tanh(y^t)}{dy^t} = -2\tanh(y^t)\,g'(y^t) \qquad (4.5),\ (4.6)$$

Figure 4-3: Hyperbolic tangent function and hyperbolic tangent derivative.


For the simplified notation ψ(y_i) = g″(y_i)/g′(y_i) of equation (2.57), this gives

$$\psi(y_i) = \frac{g''(y_i)}{g'(y_i)} = \frac{-2\tanh(y^t)\,g'(y^t)}{g'(y^t)} = -2\tanh(y^t) \qquad (4.7)$$

Substituting equation (4.7) into the entropy gradient equation (2.62) gives

$$\nabla h = -\frac{2}{N}\sum_{t=1}^{N}\tanh(\mathbf{y}^t)\,[\mathbf{x}^t]^T + \mathbf{W}^{-T} \qquad (4.8)$$

The algorithm for updating W in order to maximize the entropy of Y = g(y) = tanh(y) is therefore

$$\mathbf{W}_{new} = \mathbf{W}_{old} + \lambda\left(\mathbf{W}^{-T} - \frac{2}{N}\sum_{t=1}^{N}\tanh(\mathbf{y}^t)\,[\mathbf{x}^t]^T\right) \qquad (4.9)$$

where λ is a small constant called the learning rate.

Based on the ICA algorithm of [8] [35], an implementation in MATLAB is achieved, as shown in the next section.

4.2 ICA Infomax MATLAB Implementation

Appendix C1 shows a sample code for Infomax on high-kurtosis signals using gradient ascent. Stone's default step size, for example λ = 0.25, and a maximum of 100 iterations were adopted, and the algorithm converged with good performance. Improving the gradient ascent algorithm to meet the objectives means increasing the rate of convergence. The unmixing matrix W was initially set to the identity matrix. However, rather than taking the same size step in the direction of maximum increase, larger steps are taken as long as the entropy h(Y) continues to increase. If entropy decreases, it is assumed that the algorithm missed the maximum; rather than continuing to take steps, the algorithm regresses to the last "good" values of W and h(Y) and begins taking smaller steps, which gradually grow again as long as entropy increases.
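This adaptive step-size scheme can be sketched as below. It is a simplified illustration rather than the Appendix C1 listing: gradH and entropyY are hypothetical helpers returning the gradient of equation (4.8) and the entropy h(Y) of equation (2.47), while the growth and shrink factors 1.2 and 0.1 match the values reported in Tables 4.1 and 4.2.

% Sketch of the adaptive step-size gradient ascent described above.
% Assumptions: x is M-by-N; gradH(W, x) and entropyY(W, x) are
% hypothetical helpers for the gradient (4.8) and entropy (2.47).
W = eye(size(x, 1)); lambda = 0.25;
hOld = entropyY(W, x);
for iter = 1:100
    Wtry = W + lambda * gradH(W, x);      % candidate ascent step
    hNew = entropyY(Wtry, x);
    if hNew > hOld                        % entropy increased: accept and
        W = Wtry; hOld = hNew;            % grow the step size
        lambda = 1.2 * lambda;
    else                                  % overshot the maximum: keep the
        lambda = 0.1 * lambda;            % last good W, shrink the step
    end
end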

4.3 Performance Measurement

The performance of the blind separation algorithm discussed was evaluated using the BSS-EVAL toolbox, which can be applied to all usual blind source separation problems [2] [36] [37]. The following assumptions were taken into consideration while evaluating the performance index of the proposed algorithm [38]:

i) The true source signals, and the noise signal if any, are known.

ii) A family of allowed distortions was chosen according to the application, but independently of the kind of mixture or the algorithm used.

iii) The mixing and de-mixing techniques do not need to be known.

The measure of separation was computed for each estimated source ŝ_j by comparing it to a given source s_j. The estimated sources ŝ_j were then compared with all the sources s_j so as to give an index of the performance attained by the technique. This computation involves two successive steps. In the first step, ŝ_j is decomposed as

$$\hat{s}_j = s_{target} + e_{interf} + e_{noise} + e_{artif} \qquad (4.10)$$

where s_target = f(s_j) is a version of s_j modified by an allowed distortion, and e_interf, e_noise and e_artif are error terms for the interference, noise and artifacts, respectively [39]. These terms represent the parts of ŝ_j perceived as coming from the wanted source, from other unwanted sources, from sensor noise, and from other causes such as forbidden distortions of the sources or burbling artifacts. In the second and final step, after estimating the sources, energy ratios are computed to evaluate the relative amounts of these three error terms over the whole signal duration.
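For instance, one such energy ratio is the signal-to-distortion ratio; a minimal sketch, assuming the four decomposition terms of equation (4.10) are available as equal-length vectors, is:

% Sketch of an energy-ratio performance measure from the decomposition
% in equation (4.10); shown for the signal-to-distortion ratio (SDR).
% Assumptions: s_target, e_interf, e_noise, e_artif are equal-length vectors.
SDR = 10 * log10(sum(s_target.^2) / sum((e_interf + e_noise + e_artif).^2));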

4.4 Observations Made

The algorithm above was applied for M source signals with M = 2, 3 and 4. Sets of 2, 3 and 4 signal mixtures x = As were obtained using randomly generated 2 × 2, 3 × 3 and 4 × 4 mixing matrices A. Adding to the complexity of the code, there was also increased run time; the best results, in terms of faster convergence, occurred for M = 2 signals, where the algorithm converged at or slightly below 15 iterations. Figure 4.5 shows the gradient ascent values of the entropy h(Y), the gradient of entropy ∇h, and the step size λ.

4.4.1 Results for M = 2 Source Signals

High-kurtosis signals were imported from MATLAB's "datafun" directory and are shown as Signal 1 (a bird chirp) and Signal 2 (a gong) in the first row of Figure 4.4. Random mixtures of the two source signals are shown in the second row of the same figure. The Infomax algorithm was applied with the values shown in Table 4.1.

Table 4.1: Algorithm variable values for M = 2.

Infomax algorithm iterations      100
Initial step increase factor      0.1
Step size increase factor         1.2
Step size decrease factor         0.1
Gradient ascent repetitions       5

These values produced consistent results. Row three of Figure 4.4 shows the signals extracted by the algorithm; their similarity to the original source signals is evident. It should be noted that Infomax may not preserve the ordering of the signals: the results would sometimes be reversed, with the gong extracted as "Extracted Signal 1" and the chirp extracted as "Extracted Signal 2".


Figure 4-4: An approximate of two sources through ICA.

Figure 4.5 shows the gradient ascent values of the entropy h(Y), the gradient of entropy ∇h, and the step size λ. The results show that entropy converged after approximately 15 iterations, although, in general, the results converged after 15 to 80 iterations. The middle plot depicts ||∇h||, the magnitude of the entropy gradient: as entropy is maximized, the magnitude of the gradient decreases, indicating that the maximum is very near the current value. The bottom plot shows how the step size λ changes with the "learning" gradient ascent algorithm.

Figure 4-5: h(Y), gradient ∇h and λ values for M = 2.

4.4.2 Results for M = 3 Source Signals

A third signal, the "splat" sound of spilling paint, was added from the MATLAB "datafun" directory. This introduced a third signal mixture to the algorithm, which, as noted, slightly increases the complexity as well as the computation time. The values used in the algorithm are shown in Table 4.2.

Table 4.2: Algorithm variable values for M = 3.

Infomax algorithm iterations      100
Initial step increase factor      0.1
Step size increase factor         1.2
Step size decrease factor         0.1
Gradient ascent repetitions       5

Figure 4-6: ICA Infomax for M = 3.

With the same number of iterations and the same number of gradient ascent repetitions performed, the results were fairly consistent with those obtained for M = 2, although convergence occurred less often: approximately fifty percent of the runs converged. The results converged more frequently when the number of gradient ascent repetitions was increased, but that also increased the computation time, a property that naturally needs to be eliminated in many systems. The results of a sample run of a "chirp", a "gong" and a "splat" are shown in Figure 4.6. As in the two-signal case of M = 2, the extracted signals are clearly close matches of the source signals. It is worth noting also that in Figure 4.6 the general signal ordering is not preserved. Over three iterations of the gradient ascent function, it was clear that similarity between the source signals and the extracted signals was attained, and the entropy value was maximized, as shown in the first plot of Figure 4.7.

4.4.3 Results for M = 4 Source Signals

The Infomax method was applied to four source signals and their mixtures from MATLAB's "datafun" directory, with the type of each source signal shown in Table 4.3.

Figure 4-7: h(Y), gradient ∇h and λ values for M = 3.

Table 4.3: Source signal types for M = 4.

Signal 1    Signal 2    Signal 3    Signal 4
Chirp       Gong        Splat       Hallelujah Song

Table 4.4: Algorithm variable values for M = 4.

Infomax algorithm iterations      100
Initial step increase factor      0.1
Step size increase factor         1.2
Step size decrease factor         0.1
Gradient ascent repetitions       5, 10, 20 ...

Figure 4-8: An approximate of the sources through ICA.

4.5 Summary on ICA Algorithm

For high-kurtosis signals, the Infomax algorithm with multiple iterations of a "learning" gradient ascent function proved quite useful in extracting small numbers of signals from signal mixtures. However, as the number of source signals increased, the algorithm was not able to extract the source signals quickly and consistently. This was likely because random mixtures of an increased number of source signals s result in signal mixtures x with "bumpier" entropy functions, that is, entropy h(Y) = h(g(Wx)) with many local maxima. The increased occurrence of local maxima required more iterations of the gradient ascent function with different initial values of the unmixing matrix W, in search of a starting point from which the global maximum could be reached, since the Infomax algorithm only converges and extracts the source signals at the global maximum of the entropy h(Y).

In concluding this chapter, it was noted that the assumption that separation depends only on statistical independence is not sufficient to separate source signals from their mixtures, especially for random sources in a kaleidoscopic environment and in nonlinear cases. Therefore, further constraints on the mixing function and on statistical independence were imposed so that the sources could be easily and completely recovered, even though the meaning of "blind" here differs somewhat from the earlier case because of the required statistical properties of the sources. These reasons necessitated the application of artificial intelligence (AI) for better approximation of the source signals. For example, to avoid making many iterations as the number of source signals increases, which makes the Infomax entropy surface "bumpier", an artificial neural network of radial basis functions is invoked for better filtering. From Chapter Three, Cover's theorem states that a pattern classification problem cast in a nonlinear, higher-dimensional space is more likely to be linearly separable than in a low-dimensional space; this basic and fundamental theorem on separability provides the reason for applying RBF networks. Their inherent properties, such as the regularization process, bring smoothness and flexibility conditions to the separating surface, which is less affected by noise yet generalizes better.

In the following chapter, the last phase of source separation in Figure 4.1 is considered. A further sample signal, a non-return-to-zero (NRZ) waveform commonly used in line coding for mobile communication, is used to verify the applicability of the proposed technique.

CHAPTER FIVE

5. SIMULATION AND RESULTS

5.1 Radial Basis Network Modelling of Data

A radial basis function network (RBFN) is a linear model for a function f(x) of the form

$$f(\mathbf{x}) = \sum_i w_i\,g_i(\mathbf{x}) \qquad (5.1)$$

where w_i is a weight and g_i(x) is a basis function. The radial basis used here is the normal (Gaussian) function, which has a response decreasing monotonically with distance from a central point:

$$g(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\;e^{-\frac{(x-\mu)^2}{2\sigma^2}} \qquad (5.2)$$

Representing a function with radial basis functions allows exact interpolation of a data set, where every input is mapped exactly onto a specific target vector. When the approach is applied to the target data sets here, the input vector corresponds to the time index and the target vector corresponds to the original signal vector of the sources. The signals are represented as a weighted sum of Gaussian-shaped pulses, and the means, standard deviations and weights can be evaluated to see whether they are unique for each class and suitable for use as features in a classifier.

Several methods are available to automatically model a signal as a sum of weighted Gaussians. For example, the MATLAB functions "newrb.m" and "newrbe.m" automatically design an RBFN that approximates a given function, but for these functions the width of the Gaussians used to represent the signal must be specified. Had the class of the signal been known, widths could have been determined by feature analysis, but this is not suitable in a situation where the class of the target is unknown, as in the blind source separation problem at hand. The algorithm that fits the Gaussian functions to the target signal should be able to make a best fit to the input signal with no knowledge of the class type.

The NETLAB MATLAB toolbox [36] [40] provides such an algorithm. It performs simple regression using a radial basis function network. The network is supplied with the input data, the target data and the number of Gaussians used to approximate the target data. The network is designed using a Gaussian Mixture Model (GMM) trained with the Expectation Maximization (EM) algorithm to find the centres μ_i of the pulses. After the centres are found, the standard deviation (width) of each Gaussian is set to the average of the distances between the centres times a user-specified scale. The problem is reduced to a simple design matrix:

$$\begin{bmatrix} t_{i1} \\ \vdots \\ t_{iN} \end{bmatrix} = \begin{bmatrix} \phi_1(x_1) & \cdots & \phi_M(x_1) \\ \vdots & & \vdots \\ \phi_1(x_N) & \cdots & \phi_M(x_N) \end{bmatrix} \begin{bmatrix} w_{i1} \\ \vdots \\ w_{iM} \end{bmatrix}, \quad i = 1, 2, \ldots, K \qquad (5.3a)$$

$$\mathbf{t} = \boldsymbol{\Phi}\mathbf{w} \qquad (5.3b)$$

where t is the target (desired) matrix, Φ is the Gaussian radial basis design matrix defined by equation (5.2), and w are the weights. The means and the variances are found by the GMM from the input during training of the network. The next step is to find the weight of each Gaussian, which is accomplished using the pseudo-inverse of the design matrix:

$$\mathbf{w} = (\boldsymbol{\Phi}^T\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^T\mathbf{t} \qquad (5.4)$$

To adapt the RBFN, all the free parameters w_i, μ_i and φ_i, i = 1, 2, ..., M, must be estimated to make it more adaptive and optimal for effective clustering. The specific functions used from the NETLAB MATLAB code to generate the results are 'rbf.m', 'rbftrain.m' and 'rbffwd.m', and they are included in Appendix D1.
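The core of this fit, leaving the GMM/EM centre search to NETLAB, reduces to a few lines. The following is a minimal sketch under assumed shapes (x an N-by-1 time index, t the N-by-1 target signal, mu the M centres and sigma the common width, all hypothetical names):

% Minimal sketch of the design-matrix fit of equations (5.3)-(5.4).
% Assumptions: x is N-by-1, t is N-by-1, mu is an M-by-1 vector of
% centres (e.g. from a GMM), sigma a user-chosen width.
Phi = exp(-(x - mu').^2 / (2 * sigma^2));   % N-by-M Gaussian design matrix
w = pinv(Phi) * t;                          % pseudo-inverse weights, eq. (5.4)
that = Phi * w;                             % reconstructed (de-noised) signal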

5.2 Natural Gradient Descent Algorithm

This section deals with supervised learning of the RBF network. The drawback of a purely unsupervised method, such as k-means clustering, is that the clustering may not be relevant for the target function. Improving the parameters using gradient descent is therefore imported for the control process shown in the simulation. The network was trained by back-propagation using the gradient descent method, in which improved values of the basis function centres and widths, together with the output unit weights, are determined. This is also called supervised learning of the RBF network. From the sample artificial network of Figure 5.1, the developed RBF algorithm can easily be visualized.

Figure 5-1: A 2-3-2 RBFN artificial network.

For simplicity, this network is used to illustrate the RBF algorithm. A training data set can be represented as a 100 × 4 matrix:

$$\begin{pmatrix} x_{1,1} & x_{2,1} & t_{1,1} & t_{2,1} \\ \vdots & \vdots & \vdots & \vdots \\ x_{1,p} & x_{2,p} & t_{1,p} & t_{2,p} \\ \vdots & \vdots & \vdots & \vdots \\ x_{1,100} & x_{2,100} & t_{1,100} & t_{2,100} \end{pmatrix} \qquad (5.5)$$

The input part, denoted \(\mathbf{X}^0\), is the first two columns, and the output (target), denoted \(\mathbf{T}\), is the last two columns. The outputs of layers 1 and 2 are represented by \(\mathbf{X}^1 = \{x_{3,p}, x_{4,p}, x_{5,p}\}\) and \(\mathbf{X}^2 = \{x_{6,p}, x_{7,p}\}\), where \(p = 1, 2, \ldots, 100\).

The centres of the Gaussian functions in the first (hidden) layer can be expressed as shown below, with each row representing a centre point in two-dimensional space:

\[
\mathbf{C} = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \\ c_{31} & c_{32} \end{pmatrix} \tag{5.6}
\]


The standard deviations of the Gaussian functions in the first (hidden) layer can be expressed as shown below, where each element represents the width of one Gaussian:

\[
\boldsymbol{\sigma} = \begin{pmatrix} \sigma_1 \\ \sigma_2 \\ \sigma_3 \end{pmatrix} \tag{5.7}
\]

The parameters \(\mathbf{W}\) for the second layer can be represented as follows:

\[
\mathbf{W} = \begin{pmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \\ W_{31} & W_{32} \end{pmatrix} \tag{5.8}
\]

The equations for computing the outputs of the first layer are

\[
x_3 = \exp\!\left(-\frac{\lVert x - c_1\rVert^2}{2\sigma_1^2}\right),\qquad
x_4 = \exp\!\left(-\frac{\lVert x - c_2\rVert^2}{2\sigma_2^2}\right),\qquad
x_5 = \exp\!\left(-\frac{\lVert x - c_3\rVert^2}{2\sigma_3^2}\right). \tag{5.9}
\]

The Euclidean distance forms a \(P \times 3\) matrix between the \(P\) input vectors and the 3 Gaussian centres. The equation for computing the output of the second layer is

๐—

๐Ÿ

= ๐—

๐Ÿ

∗ ๐–

5.10
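For illustration, the forward pass of the 2-3-2 network can be written directly from equations 5.9 and 5.10; the sketch below assumes X0 (P x 2 inputs), C, sigma and W are the parameter arrays defined above:

% Forward pass of the 2-3-2 RBF network (equations 5.9 and 5.10).
P = size(X0, 1);
X1 = zeros(P, 3);
for j = 1:3
    d2 = sum((X0 - repmat(C(j,:), P, 1)).^2, 2); % squared distance to centre j
    X1(:, j) = exp(-d2 / (2 * sigma(j)^2));      % hidden output (eq 5.9)
end
X2 = X1 * W;                                     % network output (eq 5.10)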

The instantaneous error measure for the \(p\)-th data pair, due to outputs \(x_{6,p}\) and \(x_{7,p}\) and targets \(t_{6,p}\) and \(t_{7,p}\), is defined as

\[
E_p = (t_{6,p} - x_{6,p})^2 + (t_{7,p} - x_{7,p})^2, \tag{5.11}
\]

\[
\frac{\partial E_p}{\partial \mathbf{X}^2} = \big[-2(t_{6,p} - x_{6,p}),\; -2(t_{7,p} - x_{7,p})\big]. \tag{5.12}
\]


The derivative of the error \(E_p\) with respect to the second or output layer weights is given by

\[
\frac{\partial E_p}{\partial \mathbf{W}} = \left(\mathbf{X}^1\right)^{T}\frac{\partial E_p}{\partial \mathbf{X}^2}. \tag{5.13}
\]

The accumulated derivatives with respect to \(\mathbf{X}^1\) are

\[
\frac{\partial E_p}{\partial \mathbf{X}^1} = \frac{\partial E_p}{\partial \mathbf{X}^2}\,\mathbf{W}^{T}. \tag{5.14}
\]

The derivatives of the first layer with respect to the standard deviations \(\sigma\) are

\[
\frac{\partial x_3}{\partial \sigma_1}
= \exp\!\left(-\frac{\lVert x - c_1\rVert^2}{2\sigma_1^2}\right)\frac{\lVert x - c_1\rVert^2}{\sigma_1^{3}}
= x_3\,\frac{\lVert x - c_1\rVert^2}{\sigma_1^{3}},\qquad
\frac{\partial x_4}{\partial \sigma_2} = x_4\,\frac{\lVert x - c_2\rVert^2}{\sigma_2^{3}},\qquad
\frac{\partial x_5}{\partial \sigma_3} = x_5\,\frac{\lVert x - c_3\rVert^2}{\sigma_3^{3}}. \tag{5.15}
\]

Hence

\[
\frac{\partial \mathbf{X}^1}{\partial \boldsymbol{\sigma}} = \mathbf{X}^1\,\frac{\lVert x - c\rVert^2}{\sigma^{3}}, \tag{5.16}
\]

\[
\frac{\partial E}{\partial \boldsymbol{\sigma}} = \sum_{p=1}^{P}\left(\frac{\partial E_p}{\partial \mathbf{X}^1}\,\frac{\partial \mathbf{X}^1}{\partial \boldsymbol{\sigma}}\right). \tag{5.17}
\]

The derivative of the error with respect to the centres \(\mathbf{C}\) is

๐œ•๐ธ ๐‘

๐œ•๐‚

= ๐‘‘๐‘–๐‘Ž๐‘” (๐œŽ −2 ) ∗ {(

๐œ•๐ธ

๐œ•๐‘‹

1

∗ ๐‘‹

1

)

๐‘‡

∗ ๐‘‹

1

− ๐‘‘๐‘–๐‘Ž๐‘” (๐‘ ๐‘ข๐‘š (

๐œ•๐ธ

๐œ•๐‘‹

1

∗ ๐‘‹

1

)) ∗ ๐‚}

The corresponding modifier formulas for steepest descent are as follows:

\[
W_j(n+1) = W_j(n) + \eta\, e(n)\,\frac{\partial y(n+1)}{\partial W_j(n)}, \tag{5.19}
\]

\[
c_{ji}(n+1) = c_{ji}(n) + \eta\, e(n)\,\frac{\partial y(n+1)}{\partial c_{ji}(n)}, \tag{5.20}
\]

\[
\sigma_j(n+1) = \sigma_j(n) + \eta\, e(n)\,\frac{\partial y(n+1)}{\partial \sigma_j(n)}. \tag{5.21}
\]
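A compact sketch of how these updates might be iterated in MATLAB; the helper rbf_error_grads is a hypothetical placeholder standing in for the derivative expressions of equations 5.13, 5.17 and 5.18:

% Steepest-descent refinement of all free RBF parameters (eqs 5.19 - 5.21).
% rbf_error_grads is a hypothetical helper returning the accumulated error
% gradients with respect to the weights, centres and widths.
eta = 0.05;                                % step size
for n = 1:100
    [E, gW, gC, gsigma] = rbf_error_grads(X0, T, C, sigma, W);
    W     = W     - eta * gW;              % weight update (eq 5.19)
    C     = C     - eta * gC;              % centre update (eq 5.20)
    sigma = sigma - eta * gsigma;          % width update  (eq 5.21)
end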

5.3 A Sample Communication Signal

Once it was determined that neural network analysis through the Infomax algorithm was fairly reliable for a number of signals, it was tested on a simple communication signal, the polar non-return-to-zero signal [37]. Digital baseband signals often use line codes to provide particular spectral characteristics of the pulse train [41]. The most common code used for mobile communication is polar non-return-to-zero (NRZ) because it offers simple synchronization capabilities.

5.4 Infomax Algorithm for Polar NRZ Signal

To implement a polar NRZ signal in the Infomax algorithm, a function that creates a polar NRZ signal in MATLAB is first required. Attempting to extract polar NRZ signals using the high-kurtosis model cdf \(g(Y) = \tanh(Y)\) and model pdf \(g'(Y) = 1 - \tanh^2(Y)\) is unsuccessful, and it is therefore necessary to model the cdf and pdf of a polar NRZ signal. A sample polar NRZ signal is shown in Figure 5-1, its cdf in Figure 5-2 and the associated derivative in Figure 5-3.

Figure 5-1: A sample NRZ signal.


Figure 5-2: Theoretical polar NRZ cdf.

Figure 5-3: Theoretical polar NRZ pdf.
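For reference, a sample signal like that of Figure 5-1 can be generated with the polarNRZ function listed in Appendix C2; the duration, bit rate and sample rate below are illustrative:

% One second of polar NRZ at 100 bits/s sampled at 10 kHz; amplitudes are
% +/-1 and the start is shifted by a random fraction of a bit period.
s = polarNRZ(1, 100, 10000);
plot(s(1:1000)); axis([0 1000 -1.5 1.5]);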

The equation representing the cdf \(g(y)\) of a polar NRZ signal, as shown in Figure 5-2, is

\[
g(y) = 0.5\,u(y+1) + 0.5\,u(y-1), \tag{5.22}
\]

where \(u(y)\) is the unit step function. The corresponding derivative \(g'(y)\) is

\[
g'(y) = 0.5\,\delta(y+1) + 0.5\,\delta(y-1). \tag{5.23}
\]

The theoretical cdf and pdf for the polar NRZ signal cannot be implemented in MATLAB due to the delta functions in the pdf, which creates the need to approximate the cdf and pdf in a form that can be implemented in MATLAB. Therefore, before executing the Infomax algorithm, the model cdf \(g(Y)\), the pdf \(g'(Y)\) and the derivative of the pdf \(g''(Y)\), implementable in the Infomax using MATLAB, are created. The first approach models the delta functions in the pdf as triangles with arbitrarily narrow bases. The second approach uses a modified version of the hyperbolic tangent function to more closely model the cdf and pdf of a polar NRZ signal [35].

5.4.1 The Triangular Model

This approach involves approximating each delta function in the pdf of Figure 5-3 with a triangle of base \(2\gamma\), where \(\gamma\) is an appropriately chosen arbitrary small value, as shown in Figure 5-4, so as to model the Dirac delta functions of the digital NRZ signal in an analysable form.

Figure 5-4: Approximate polar NRZ pdf.

The resulting approximate pdf \(q'(y)\) for a polar NRZ signal is

\[
q'(y) =
\begin{cases}
0, & y < -(1+\gamma) \\[4pt]
\dfrac{1}{2\gamma^2}\,y + \dfrac{1+\gamma}{2\gamma^2}, & -(1+\gamma) \le y < -1 \\[4pt]
-\dfrac{1}{2\gamma^2}\,y + \dfrac{\gamma-1}{2\gamma^2}, & -1 \le y < \gamma-1 \\[4pt]
0, & \gamma-1 \le y < 1-\gamma \\[4pt]
\dfrac{1}{2\gamma^2}\,y + \dfrac{\gamma-1}{2\gamma^2}, & 1-\gamma \le y < 1 \\[4pt]
-\dfrac{1}{2\gamma^2}\,y + \dfrac{1+\gamma}{2\gamma^2}, & 1 \le y < 1+\gamma \\[4pt]
0, & y \ge 1+\gamma
\end{cases}
\tag{5.24}
\]

The approximate expression for the cdf \(q(y)\) is found by integrating the pdf \(q'(y)\) from the above expression. With \(m = 1/(2\gamma^2)\), \(b_1 = (1+\gamma)/(2\gamma^2)\) and \(b_2 = (\gamma-1)/(2\gamma^2)\), it is given as

\[
q(y) =
\begin{cases}
0, & y < -(1+\gamma) \\[4pt]
\tfrac{1}{2}m\left(y^2 - (1+\gamma)^2\right) + b_1\,(y + 1 + \gamma), & -(1+\gamma) \le y < -1 \\[4pt]
0.25 + \tfrac{1}{2}m\left(1 - y^2\right) + b_2\,(1 + y), & -1 \le y < \gamma-1 \\[4pt]
0.5, & \gamma-1 \le y < 1-\gamma \\[4pt]
0.5 + \tfrac{1}{2}m\left(y^2 - (1-\gamma)^2\right) + b_2\,\big(y - (1-\gamma)\big), & 1-\gamma \le y < 1 \\[4pt]
0.75 + \tfrac{1}{2}m\left(1 - y^2\right) + b_1\,(y - 1), & 1 \le y < 1+\gamma \\[4pt]
1.0, & y \ge 1+\gamma
\end{cases}
\tag{5.25}
\]

The derivative of the pdf, \(q''(y)\), is constructed by simply taking the slope values from the appropriate regions of the pdf, resulting in

\[
q''(y) =
\begin{cases}
0, & y < -(1+\gamma) \\[4pt]
\dfrac{1}{2\gamma^2}, & -(1+\gamma) \le y < -1 \\[4pt]
-\dfrac{1}{2\gamma^2}, & -1 \le y < \gamma-1 \\[4pt]
0, & \gamma-1 \le y < 1-\gamma \\[4pt]
\dfrac{1}{2\gamma^2}, & 1-\gamma \le y < 1 \\[4pt]
-\dfrac{1}{2\gamma^2}, & 1 \le y < 1+\gamma \\[4pt]
0, & y \ge 1+\gamma
\end{cases}
\tag{5.26}
\]


The functions were implemented in MATLAB as "polarNRZpdf", "polarNRZcdf" and "polarNRZdpdf" (Appendix C2). The MATLAB implementations of the approximate model cdf \(q(y)\), pdf \(q'(y)\) and pdf derivative \(q''(y)\) are plotted in Figure 5-5. The approximate functions \(q(y)\) and \(q'(y)\) bear a striking resemblance to the theoretical functions \(g(y)\) and \(g'(y)\) shown in Figures 5-2 and 5-3, with the added benefit that they can be implemented in MATLAB so that the Infomax algorithm can be adapted for the polar NRZ signal.


Figure 5-5: Approximate polar NRZ cdf \(q(y)\), pdf \(q'(y)\) and pdf derivative \(q''(y)\) for \(\gamma = 0.1\).


5.4.2 The Hyperbolic Tangent Model

In this second approach, a modified version of the hyperbolic tangent function is used to more closely model the cdf and pdf of a polar NRZ signal in a form that can be loaded into the Infomax algorithm. The second approximate polar NRZ cdf is created by duplicating and shifting the hyperbolic tangent function so as to closely approximate the theoretical cdf. A good approximation of the cdf is found to be

\[
\vartheta(y) =
\begin{cases}
\tfrac{1}{4}\tanh\!\big(\sigma(y+1)\big) + \tfrac{1}{4}, & y < 0 \\[4pt]
\tfrac{1}{4}\tanh\!\big(\sigma(y-1)\big) + \tfrac{3}{4}, & y \ge 0
\end{cases}
\tag{5.27}
\]

where \(\sigma\) is a compression factor that controls the slope of the function. The approximate pdf \(\vartheta'(y)\) is found by taking the derivative of the cdf \(\vartheta(y)\):

\[
\vartheta'(y) =
\begin{cases}
\dfrac{\sigma}{4}\Big(1 - \tanh^2\!\big(\sigma(y+1)\big)\Big), & y < 0 \\[6pt]
\dfrac{\sigma}{4}\Big(1 - \tanh^2\!\big(\sigma(y-1)\big)\Big), & y \ge 0
\end{cases}
\tag{5.28}
\]

which can be written compactly as

\[
\vartheta'(y) = \frac{\sigma}{4}\Big(1 - \tanh^2\!\big(\sigma(y \pm 1)\big)\Big).
\]

An expression for \(\vartheta''(y)\) is found by taking the derivative of the pdf \(\vartheta'(y)\):

\[
\vartheta''(y) = \frac{d}{dy}\,\vartheta'(y)
= \frac{d}{dy}\,\frac{\sigma}{4}\Big(1 - \tanh^2\!\big(\sigma(y \pm 1)\big)\Big)
= 0 - \frac{d}{dy}\,\frac{\sigma}{4}\tanh^2\!\big(\sigma(y \pm 1)\big). \tag{5.29}
\]

Using the chain rule, we get

\[
\vartheta''(y) = -\frac{\sigma}{4}\Big(1 - \tanh^2\!\big(\sigma(y \pm 1)\big)\Big)\cdot 2\tanh\!\big(\sigma(y \pm 1)\big)\cdot \sigma. \tag{5.30}
\]

Equation 5.30 simplifies to

\[
\vartheta''(y) = -2\sigma\,\tanh\!\big(\sigma(y \pm 1)\big)\,\vartheta'(y). \tag{5.31}
\]


Therefore, equation 5.31 can further be represented as shown for the pdf derivative \(\vartheta''(y)\):

\[
\vartheta''(y) =
\begin{cases}
-2\sigma\,\tanh\!\big(\sigma(y+1)\big)\,\vartheta'(y), & y < 0 \\[4pt]
-2\sigma\,\tanh\!\big(\sigma(y-1)\big)\,\vartheta'(y), & y \ge 0
\end{cases}
\tag{5.32}
\]

The approximate polar NRZ cdf \(\vartheta(y)\) and pdf \(\vartheta'(y)\) using the hyperbolic tangent function are shown in Figure 5-6. Like the triangular-model functions, they closely resemble the theoretical versions in Figures 5-2 and 5-3.

Figure 5-6: Approximate polar NRZ cdf \(\vartheta(y)\) and pdf \(\vartheta'(y)\) with \(\sigma = 10\).
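As with the triangular model, the hyperbolic tangent approximations of Appendix C2 can be evaluated directly, for example:

% Evaluate the hyperbolic tangent model with compression factor sigma = 10.
sigma = 10;
y = linspace(-1.5, 1.5, 1000);
plot(y, polarNRZcdf2(y, sigma), y, polarNRZpdf2(y, sigma));
legend('cdf', 'pdf');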

When this model was used for radial basis function network blind source separation, it produced better results than the Infomax algorithm with a hyperbolic tangent model with \(\sigma = 2\). This is because the hyperbolic tangent model provided a more continuous pdf, which was differentiable at more points than the triangular-model pdf.

5.5 Radial Basis Function Network Simulations

The simulations using the radial basis function network produced more consistent results, which converged faster and coped with a greater number of signals and signal amplitudes, largely because the hyperbolic tangent model gave a better approximation of the polar NRZ cdf and pdf than the triangular model.

a) Results for M = 2 source signals

The Infomax algorithm was run with two source signals of randomly generated amplitude, bit rate and time shift, using the values shown in Table 5.1.

Table 5.1: Algorithm variable values for M = 2.

Infomax algorithm iterations: 100
Initial step size λ: 0.05
Step size increase factor α: 1.2
Step size decrease factor β: 0.1
Gradient descent repetitions: 50
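In the simulation code (Appendix C1 shows the core Infomax loop), these table values correspond to simple parameter assignments, roughly as sketched below; the variable names are illustrative:

% Parameter settings of Table 5.1 for the Infomax run with M = 2.
maxiter = 100;    % Infomax algorithm iterations
lambda  = 0.05;   % initial step size
alpha   = 1.2;    % step size increase factor
beta    = 0.1;    % step size decrease factor
nreps   = 50;     % gradient descent repetitions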

Figure 5.7 shows the results of a sample trial, where the top row shows the randomly generated source signals, the middle row shows the random mixtures of the signals, and the bottom row shows the extracted signals. It is clear from Figure 5.7 that Extracted Signal 1 shares the same bit pattern as Source Signal 2, but with a different magnitude, and the case is the same with Source Signal 1 and Extracted Signal 2. In fact, every single successful run of the Infomax algorithm extracted signals with a magnitude of approximately one. This result is due to the tendency of the Infomax algorithm to match the pdf of the extracted signal to the model pdf supplied to the algorithm. Since the model pdf was developed around a polar NRZ signal with an amplitude of 1.0, the extracted signal also had an amplitude of 1.0. In addition to not preserving the magnitude of the original signal, repeated trials showed that the algorithm did not necessarily preserve the ordering, as seen earlier.

Figure 5-7: RBFN approximation of the sources.


The sources s1 and s2 were estimated using both algorithms. No artifacts affected the mixing samples, as SAR ≈ ∞. The estimated error was dominated by interference and distortion, as SDR ≈ SIR, as shown in the plots below. The SIR and SDR of input signal s1 are higher than those obtained for s2, showing that the de-mixing matrix estimated for s1 is closer to the true de-mixing matrix than that for s2 in both ICA and RBFN.

Figure 5-8: SDR comparison between ICA and RBFN using two source signals.


Figure 5-9: SIR comparison between ICA and RBFN using two source signals.

b) Results for M = 4 Source Signals

The Infomax algorithm was again run, this time with four source signals of randomly selected amplitude, bit rate and time shift, using the values shown in Table 5.2 below.

Table 5.2: Algorithm variable values for M = 4.

Infomax algorithm iterations: 100
Initial step size λ: 0.05
Step size increase factor α: 1.2
Step size decrease factor β: 0.1
Gradient ascent repetitions: 100

For the given values, no signals of significant nature were extracted. Successful implementation with reasonably fast convergence of the algorithm for more than three source signals requires either a more advanced gradient ascent scheme or a more exhaustive search for the optimal unmixing matrix W, resulting in a significantly increased simulation time. By channelling the signals to the RBFN system, the following estimates of the source signals were extracted:

Figure 5-10: Radial Basis Function approximation of four source signals.

5.6 The Polar Non-Return to Zero Simulations

5.6.1 Results for M = 2 Source Signals

The Infomax RBF algorithm was run with two source signals of randomly selected amplitude, bit rate, time shift and number of neurons, using the values shown in Table 5.5. The Infomax was run for between one and five iterations, and just once after training the RBF network. Figures 5.11 and 5.12 show the output signals at the output neuron of the RBF network for M = 2 and M = 4 source signals.

Table 5.5: Algorithm variable values for M = 2.

Infomax algorithm iterations: 10
Initial step size λ: 0.05
Step size increase factor α: 1.2
Step size decrease factor β: 0.1
Gradient ascent repetitions: 100
Number of RBF neurons: 9

Figure 5-11: RBFN approximation of two sources.

Figure 5-12: An approximation of two sources through ICA.

5.6.2 Results for M = 3 NRZ Source Signals

To investigate the performance of the proposed RBF network in comparison with the ICA network with the Infomax algorithm in function approximation, we used a set of three independent uniform series {s1(t)}, {s2(t)} and {s3(t)}, t = 1, ..., N, with size N = 5000, whereby the observations were obtained with the mixing matrix w below. The corresponding outputs {y1(t)}, {y2(t)} and {y3(t)}, N = 5000, were also available for the radial basis network. The first 1,000 data points were taken as the training set, and the remaining 500 data points as the testing set.

\[
w = \begin{bmatrix} 0.3803 & 0.7890 & -0.8721 \\ 0.6624 & 1.0228 & 0.8450 \\ 0.4924 & 1.2619 & 2.6604 \end{bmatrix},\qquad
W = \begin{bmatrix} -2.4070 & -4.4595 & -5.6743 \\ 1.0941 & 2.5292 & 0.8294 \\ 3.5528 & 0.5043 & 2.6604 \end{bmatrix},\qquad
I = \begin{bmatrix} 0.9373 & 0.7395 & 0.4921 \\ 0.0465 & 0.0590 & 2.3592 \\ 1.7640 & 0.7136 & 0.0541 \end{bmatrix}.
\]
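A sketch of how such an instantaneous mixture is formed and separated (compare the demonstration script in Appendix C1; the uniform sources here are placeholders):

% Three independent zero-mean uniform sources, N = 5000 samples each.
N = 5000;
s = rand(N, 3) - 0.5;   % source matrix, one signal per column
x = s * w;              % observed mixtures via the mixing matrix w above
% The unmixing matrix W is then estimated blindly and applied:
y = x * W;              % estimated (separated) sources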

Figure 5-13: An approximation of three sources through ICA.


Figure 5-14: An approximation of three sources by RBFN.

5.6.3 Results for M = 3 Nonlinear Mixing Function

In this experiment the technique was applied to approximate the sources from a nonlinearly mixed function. Once it was determined that independent component analysis through the radial basis function network was very reliable for a number of signals, it was tested on a nonlinear sample communication signal, the polar non-return-to-zero signal of the model seen above.

Figure 5-15: An approximation of the sources through ICA.


Figure 5-16: RBFN approximation of three sources.

5.7 Performance Measurement in Blind Signal De-noising

In this section the performance of the proposed algorithm, the IC-RBF network proposed in this thesis, is evaluated and compared with the conventionally used ICA algorithm. The performance criteria were examined for the case where the only allowed distortions on \(\hat{s}_j = y_j\) are time-invariant gains. First, each estimated source is decomposed into three terms:

\[
\hat{s}_j = s_{\mathrm{target}} + e_{\mathrm{interf}} + e_{\mathrm{artif}}, \tag{5.33}
\]

and relevant energy ratios in decibels (dB) are then defined between these terms. The main metrics considered here are the Signal to Interference Ratio (SIR) and the Signal to Artifacts Ratio (SAR), as plotted in the figures below; they were evaluated using the BSS-EVAL toolbox in MATLAB.
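A sketch of how these ratios are obtained, assuming the BSS-EVAL toolbox function bss_eval_sources is available on the path (se and s hold the estimated and true sources, one signal per row):

% Energy-ratio criteria of equation 5.33 via the BSS-EVAL toolbox.
% se: M x T estimated sources; s: M x T true sources.
[SDR, SIR, SAR, perm] = bss_eval_sources(se, s);
fprintf('s1: SDR %.1f dB, SIR %.1f dB, SAR %.1f dB\n', SDR(1), SIR(1), SAR(1));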

In order to assess the relevance of the performance measurement, tests were made on signals obtained from MATLAB's "datafun" directory, shown as Signal 1 (a bird chirp), Signal 2 (a gong), Signal 3 (the "splat" sound of spilling paint) and Signal 4 (Handel's "Hallelujah Chorus"). Sources were estimated from various instantaneous mixtures using both the ICA and RBFN algorithms and thereafter tested by first listening at a sampling frequency Fs = 10000 Hz (8192 Hz for the "Hallelujah Chorus"), after which the above measures were evaluated. The best way to evaluate the meaningfulness of these results was taken to be listening to the extracted sounds and comparing them with the related performance indices. The results for the different scenarios are shown below.

a) Instantaneous 2 × 2 mixture

Figure 5-17: SDR comparison between ICA and RBFN using two source signals.


Figure 5-18: SIR comparison between ICA and RBFN using two source signals.

b) Instantaneous 4 × 4 mixture

The sources s1, s2, s3 and s4 were estimated using both algorithms. In this example also, no artifacts affected the mixing samples, as SAR ≈ ∞. The estimated error was dominated by interference and distortion, as SDR ≈ SIR, as shown in Figures 5.19 and 5.20. The SIR and SDR of input signal s1 are higher, while those obtained for signals s2 and s3 are almost equal, showing that the de-mixing matrix is close to the true de-mixing matrix in both ICA and RBFN.


Figure 5-19: SIR comparison between ICA and RBFN using four source signals.

Figure 5-20: SDR comparison between ICA and RBFN using four source signals.


Figure 5-21: SAR comparison between ICA and RBFN using four source signals.

c) Instantaneous 3 × 3 mixture for the NRZ sample signal

The sources s1, s2 and s3 were estimated using both algorithms. In this example also, no artifacts affected the mixing samples, as SAR ≈ ∞, with SAR closely related in both the ICA and RBFN algorithms. The estimated error was dominated by interference and distortion, as SDR ≈ SIR, as shown in Figure 5.22, Figure 5.23 and Figure 5.24 below. The SIR and SDR attained for the input signals s1, s2 and s3 are higher and better, showing that the estimated sources are close to the inputs and that the de-mixing matrix is closer to the true de-mixing matrix in RBFN than in ICA. From the performance measurement indices of SIR, SDR and SAR, it can be seen that RBFN generally gives a better approximation of the sources.


Figure 5-22: SIR comparison between ICA and RBFN using three source signals.

Figure 5-23: SDR comparison between ICA and RBFN using three source signals.


Figure 5-24: SAR comparison between ICA and RBFN using three source signals.

5.7.1 Results of Using RBFN on Nonlinear Non-Return-to-Zero Source Mixtures

In this section an instantaneous 3 × 3 mixture is used for the nonlinear NRZ sample signal. The sources s1, s2 and s3 were estimated using both algorithms. In this example also, no artifacts affected the mixing samples, as SAR ≈ ∞, with SAR closely related in both the ICA and RBFN algorithms. The estimated error was dominated by interference and distortion, as SDR ≈ SIR, as shown in Figure 5.25, Figure 5.26 and Figure 5.27. The SIR and SDR attained for the input signals s1, s2 and s3 achieve better results, as shown by the higher ratios. This shows that the estimated sources are close to the inputs and that the de-mixing matrix is closer to the true de-mixing matrix in RBFN than in ICA. From the performance measurement indices of SIR, SDR and SAR, it can be seen that RBFN generally gives a better approximation of the sources, but it is also worth noting that the level of separation attained when the mixing matrix is cubed is poorer than when it is linear, as shown by Figures 5.22-5.24 and Figures 5.25-5.27.

Figure 5-25: SIR comparison between ICA and RBFN using three source signals.

Figure 5-26: SDR comparison between ICA and RBFN using three source signals.


Figure 5-27: SAR comparison between ICA and RBFN using three source signals.

5.7.2 Summary of the RBFN Performance as Compared to ICA

The acoustic application considered comprised signals from several sources tapped by microphones for processing. The outputs extracted correspond to the separate, primary and independent quality source signals.

The results obtained by the Independent Component Analysis technique incorporating the artificial intelligence method, shown in Tables 5.3, 5.4 and 5.5, demonstrate that the proposed method improves the performance of a blind source separation system. The performance measurement obtained by comparing the output signals to the input sources is also compared with the ICA method used in the same field. The simulation results show an overall improvement compared to earlier systems that were optimized using only Independent Component Analysis. Therefore, the proposed ICA-RBFN algorithm can be used to solve a blind source problem whose objective is to recover sources that are linearly and nonlinearly mixed during transmission.


Table 5.3: Case 1 - Two Source Signals and Case 2 - Four Source Signals.

Case 1 (2 × 2 mixture), values given as ICA / RBFN:
Source   SDR (dB)       SIR (dB)
S1       46.7 / 47.7    45.4 / 45.4
S2       42.1 / 42.1    41.0 / 41.0

Case 2 (4 × 4 mixture), values given as ICA / RBFN:
Source   SAR (dB)         SDR (dB)       SIR (dB)
S1       258.3 / 229.0    26.3 / 26.3    26.3 / 23.9
S2       244.8 / 232.6    6.9 / 6.9      6.8 / 7.0
S3       239.4 / 221.2    4.8 / 4.9      4.8 / 4.9
S4       242.3 / 227.6    15.5 / 15.6    15.3 / 15.0

Table 5.4: Case 3 - Three NRZ Source Signal Estimation with Linear Mixture (values given as ICA / RBFN).

Source   SAR (dB)         SDR (dB)       SIR (dB)
S1       244.4 / 248.9    21.6 / 56.1    25.6 / 52.1
S2       247.3 / 239.6    41.3 / 34.5    44.8 / 37.5
S3       228.0 / 233.7    19.6 / 68.4    39.4 / 68.4


Table 5.5: Case 4 - Three NRZ Source Signal Estimation with Non-linear Mixture (values given as ICA / RBFN).

Source   SAR (dB)         SDR (dB)      SIR (dB)
S1       234.1 / 233.1    4.1 / 6.7     4.1 / 6.7
S2       221.1 / 225.5    7.2 / 7.3     7.2 / 7.4
S3       232.1 / 218.8    28.9 / 57.4   25.7 / 57.4


CHAPTER SIX

6. CONCLUSION AND RECOMMENDATION

6.1 Conclusions

In this research it has been shown that:

i) The source signals can be recovered after interference. The main result is that the radial basis function network, acting as a canceller, achieves a better approximation of the interference signal in comparison with standard independent component analysis. In this formulation, independent component analysis acted as the pre-processor of the inputs to the basis functions. The reason is the use of a non-linear mapping between mixtures and sources, approximated by a linear combination of specialized Green's functions, in this case Gaussian functions, in the network.

ii) The RBFN was made adaptive by training or learning, which involved the weightings and the basis centres.

iii) The simulations and the performance indices of SDR and SIR have shown the applicability and efficiency of the proposed canceller. However, it is worth noting that the SAR performance measurement was volatile and effectively infinite, and as a result it was not used to compare the ICA-RBF algorithm with ICA.


6.2 Recommendations

While the application of RBFN in BSS is a relatively new research field, its popularity is steadily increasing and the opportunities for future work in this subject matter are abundant. The research opportunities range from extensions of this research to the exploration of other linear and nonlinear BSS-based systems.

6.3 Extensions of Information Maximization and Supervised Radial Basis Function Network

a. Signals with Identical Bit Rates
In Chapter Five we considered polar NRZ signals with both identical and random amplitudes, but only randomly selected bit rates were analyzed. Further work could be done to determine the ability of the Infomax algorithm to distinguish between source signals with identical bit rates.

b. Gradient Optimization
The Infomax algorithm was found to converge consistently only for small numbers (M ≤ 4) of source signals and signal mixtures. With M = 6 source signals and mixtures, some similarities between the source and extracted signals were found for audio signals, but little similarity was found between sources and extracted polar NRZ signals. Further investigation of an optimal gradient ascent algorithm in Infomax would likely improve the efficiency of the RBFN algorithm as well as extend its capability beyond just a few signal mixtures.

c. Additional Signals
While this research focused on audio signals and the polar NRZ communications signal, ICA-RBF could be applied to a variety of other communications signals, such as Manchester NRZ, Unipolar NRZ and Bipolar RZ, and to robust and nonlinear cases. Additional communication signal types, such as signals at intermediate frequency (IF) or radio frequency (RF) with different probability density functions, should be considered, as well as wireless local area network (WLAN), cellular (especially CDMA) and frequency-hopped spread spectrum (FHSS) signals.

The proposed research can be extended in the following dimensions: developing real-time sequential mixed-signal processing, and extending the implementation of the RBFN algorithm based on the proposed modules to higher dimensions, that is, more than two sources and mixtures of the above nature.

d. Use of Other Neural Network Techniques
The radial basis function network is just one of the neural network techniques; one could extend this research into Self-Organizing Map (SOM) forms of neural nets.

e. Other ICA Pre-processing Methods
Infomax is just one of many methods of ICA. Numerous other methods could be developed, tested and refined to determine their relative applicability and efficiency with various signals. These methods range from sphering or whitening (separating the signals from additive white Gaussian noise) to the more advanced methods described in Chapter II. All of these possibilities for additional research paint an exciting future for breakthroughs in ICA as a method of Blind Source Separation.


REFERENCES

[1] D. P. J, Z. Liu and C. A. Philip, "Blind Source Separation in a Distributed Microphone Meeting Environment for Improved Teleconferencing," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2008.

[2] B. V. Gowreesunker and H. T. Ahmed, "Blind Source Separation Using Monochannel Overcomplete Dictionaries," in IEEE International Conference on Acoustics, Speech and Signal Processing, Minneapolis, 2008.

[3] L. Yongman and L. Tusheng, "A RBF Neural Network Algorithm for Blind Source Separation," in Proc. IEEE International Conference on Intelligent Systems Design and Application, Jian, 2006.

[4] A. Cichoki and S. Amari, Adaptive Blind Signal and Image Processing, West Sussex: John Wiley and Sons Ltd, 2002, pp. 1-40.

[5] A. Aissa-El-Bey, K. Abed-Meraim and Y. Grenier, "Underdetermined Blind Separation of Audio Sources from the Time-Frequency Representation of their Convolutive Mixtures," in IEEE International Conference on Signal Processing, 2007.

[6] C. Seungjin, A. Cichoki, M. P. Hyung, Y. Soo and Lee, "Blind Source Separation and Independent Component Analysis: A Review," in Neural Information Processing, 2005.

[7] K. L. Shreeja, "Adaptive Channel Equalization Using Radial Basis Function Networks and MLP," National Institute of Technology, 2009.

[8] V. S. James, Independent Component Analysis: A Tutorial Introduction, Massachusetts: Massachusetts Institute of Technology, 2004.

[9] K. J, H. A, V. R, H. J and O. E, "Applications of Neural Blind Separation to Signal and Image Processing," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP'97), Munich, 1997.

[10] D. Niva, R. Aurobinda and K. D. Pradipta, "ICA Methods for Blind Source Separation of Instantaneous Mixtures: A Case Study," Neural Information Processing - Letters and Reviews, vol. 11, no. 11, pp. 225-247, 2007.

[11] R. C. George and D. M. Clare, Probability Methods of Signals and Systems, 3rd ed., C. D. MacGillem, Ed., Oxford University Press, September 1986.

[12] J. P. Pandey and D. Singh, "Application of Radial Basis Neural Network for State Estimation of Power System Networks," International Journal of Engineering, Science and Technology, vol. 2, no. 3, pp. 19-28, 2010.

[13] D. S. Broomhead and D. Lowe, "Multivariable Functional Interpolation and Adaptive Networks," Complex Systems, vol. 2, pp. 321-355, 1988.

[14] S. Renals, "Radial Basis Function Network for Speech Pattern Classification," 1989.

[15] S. Renals, D. McKelvie and F. McInnes, "A Comparative Study of Continuous Speech Recognition Using Neural Networks and Hidden Markov Models," in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, 1991.

[16] J. Moody and J. D. Christian, "Fast Learning in Networks of Locally-Tuned Processing Units," Neural Computation, vol. 1, pp. 282-294, 1989.

[17] T. Poggio and F. Girosi, "Regularization Algorithms for Learning that are Equivalent to Multilayer Networks," Science, vol. 247, pp. 978-982, 1990.

[18] B. Z. Massoud and J. Christian, "Mutual Information Minimization: Application to Blind Source Separation," Signal Processing, vol. 85, pp. 975-995, 2005.

[19] H. S, Neural Networks: A Comprehensive Foundation, Ontario: Prentice Hall International Inc, 1990.

[20] M. C. T, "Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition," IEEE Transactions on Electronic Computers, vols. EC-14, pp. 326-334, 1965.

[21] R. Steve, M. David and M. Fergus, "A Comparative Study of Continuous Speech Recognition Using Neural Networks and Hidden Markov Models," Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 1, pp. 369-372, 1991.

[22] G. M and L. J. N. D, Linear Robust Control, Englewood Cliffs: Prentice Hall, 1995.

[23] W. L. A, "Some Aspects of Radial Basis Function Approximation," in Approximation Theory, Spline Functions and Applications, Boston: Kluwer Academic Publishers, 1992, pp. 163-190.

[24] T. A. N and A. Y. V, Solutions of Ill-posed Problems, Washington DC: Winston and Sons, 1977.

[25] T. A. N, On Regularization of Ill-posed Problems, vol. 153, Doklady Akademii Nauk, 1973, pp. 49-52.

[26] A. G. Bors, "Introduction of Radial Basis Function (RBF) Networks," York, 2001.

[27] M. M. Y, Hybrid Algorithm for RBF Network, Perak, 1999.

[28] S. A. Vorobyov and C. Andrzej, "Hyper Radial Basis Function Neural Network for Interference Cancellation with Non-linear Processing of Reference Signal," Digital Signal Processing, vol. 11, pp. 204-221, 2002.

[29] T. A. Al-Zohairy, "Adaptive Control of Non-Linear Multivariable Dynamical Systems Using RBF Neural Network," Canadian Journal on Automation, Control Engineering and Intelligent Systems, vol. 2, no. 1, 2011.

[30] T. A. Al-Zohairy, "Direct Adaptive Control of Unknown Non-Linear Multivariable Systems Using Radial Basis Function Neural Networks with Gradient and K-Means," International Journal of Computer Theory and Engineering, vol. 3, no. 6, 2011.

[31] F. M and P. G, "An On-line Training Radial Basis Function Neural Network for Optimum Operation of the UPFC," European Transactions on Electrical Power, vol. 21, no. 1, pp. 27-39, 2011.

[32] L. Y, B. A and G. R, "An Algorithm for Vector Quantizer Design," IEEE Trans. Commun., vol. 28, pp. 84-96, 1980.

[33] C. S, M. L. S and M. B, "Complex-Valued Radial Basis Function Network, Part 1: Network Architecture and Learning Algorithm," Signal Processing, vol. 35, pp. 19-34, 1994.

[34] K. A, L. M and S. J, "On Radial Basis Function Network Equalization in the GSM System," ESANN Proceedings - European Symposium on Artificial Neural Networks, Bruges, pp. 179-184, 2003.

[35] J. H. Garvey, "Independent Component Analysis by Entropy Maximization (Infomax)," Institutional Archive of the Naval Postgraduate School, California, 2007.

[36] N. I, "NETLAB: Neural Network Software," August 2000. [Online]. Available: http://www.ncrg.aston.ac.uk/netlab/netlab. [Accessed November 2013].

[37] S. Haykin, An Introduction to Analog and Digital Communication, New York: John Wiley & Sons, 1989, pp. 197-200.

[38] V. E, G. R and F. C, "Performance Measurement in Blind Audio Source Separation," IEEE Transactions on Speech and Audio Processing, vol. 14, no. 4, pp. 1462-1469, 2006.

[39] V. E, G. R and F. C, "Proposal for Performance Measurement in Source Separation," in Proc. Int. Symposium on ICA and BSS, pp. 763-768, 2003.

[40] "Netlab Neural Network Software," Neural Computing Research Group, Division of Electric Engineering and Computer Science, 2002. [Online]. Available: http://www.ncrg.aston.ac.uk/ and http://www.koders.com/matlab/gaussian+fitting. [Accessed November 2013].

[41] T. S. Rappaport, Wireless Communications: Principles and Practice, 2nd ed., Prentice Hall Engineering and Emerging Technologies Series, 2002, pp. 281-295.


APPENDICES

APPENDIX A. THE RELATIONSHIP BETWEEN THE JACOBIAN MATRIX AND THE UNMIXING MATRIX

The Jacobian \(\mathbf{J}\) is the \(M \times M\) matrix of partial derivatives \(\partial y / \partial x\), and its determinant \(J\) is a scalar. If \(M = 2\),

\[
\mathbf{J} = \frac{\partial y}{\partial x} =
\begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} \\[8pt]
\dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2}
\end{bmatrix}, \tag{A1}
\]

and \(J\) is the determinant of \(\mathbf{J}\):

\[
J = |\mathbf{J}| =
\begin{vmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} \\[8pt]
\dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2}
\end{vmatrix}
= \frac{\partial y_1}{\partial x_1}\frac{\partial y_2}{\partial x_2}
- \frac{\partial y_1}{\partial x_2}\frac{\partial y_2}{\partial x_1}. \tag{A2}
\]

The relationship between \(\mathbf{J}\) and the unmixing matrix \(\mathbf{W}\) can then be seen as follows. From equation (1), \(y = \mathbf{W}x\), with

\[
y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix},\qquad
\mathbf{W} = \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix},\qquad
x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \tag{A3}
\]

\[
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
= \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \tag{A4}
\]

so that \(y_1 = w_{11}x_1 + w_{12}x_2\) and \(y_2 = w_{21}x_1 + w_{22}x_2\). Equation A1 then requires that

\[
\frac{\partial y_1}{\partial x_1} = w_{11},\quad
\frac{\partial y_1}{\partial x_2} = w_{12},\quad
\frac{\partial y_2}{\partial x_1} = w_{21},\quad
\frac{\partial y_2}{\partial x_2} = w_{22},
\]

so that

\[
\mathbf{J} = \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix} = \mathbf{W}, \tag{A5}
\]

and similarly \(\det \mathbf{J} = \det \mathbf{W}\), i.e.

\[
J = |\mathbf{J}| = |\mathbf{W}|. \tag{A6}
\]

This expression is borrowed into equation (2.42).

APPENDIX B. INFOMAX EXPRESSION FOR ENTROPY

Substituting equation (41) into (38),

\[
H(Y) = -\frac{1}{N}\sum_{t=1}^{N} \ln\!\left(\frac{P_x(x_t)}{|\mathbf{W}|\,P_s(y_t)}\right), \tag{B1}
\]

\[
H(Y) = -\frac{1}{N}\sum_{t=1}^{N} \big(\ln P_x(x_t) - \ln|\mathbf{W}| - \ln P_s(y_t)\big), \tag{B2}
\]

\[
H(Y) = -\frac{1}{N}\sum_{t=1}^{N} \ln P_x(x_t)
+ \frac{1}{N}\sum_{t=1}^{N} \big(\ln P_s(y_t) + \ln|\mathbf{W}|\big). \tag{B3}
\]

The first part of expression B3 is the entropy due to the observed signal,

\[
H(X) = -\frac{1}{N}\sum_{t=1}^{N} \ln P_x(x_t), \tag{B4}
\]

so that

\[
H(Y) = H(X) + \frac{1}{N}\sum_{t=1}^{N} \big(\ln P_s(y_t) + \ln|\mathbf{W}|\big). \tag{B5}
\]


APPENDIX C: MATLAB CODES

APPENDIX C1:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Demonstration code for "Independent component analysis: A Tutorial Introduction"

% D. O OMBATI, SEEIE, September 2015.

% MAY 2015, DENIS OMBATI TAMARO, TIE Department, JKUAT, KENYA.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Basic ICA algorithm demonstrated on 2 speech signals.
% The default value of each parameter is given in [] brackets.

% [0] Set to 1 to hear signals.
listen = 1;   % set to 0 if you have to mute the audio.

% [1] Set random number seed.
seed = 9; rand('seed', seed); randn('seed', seed);

% [2] M = number of source signals and signal mixtures.
M = 2;

% [1e4] N = number of data points per signal.
N = 1e4;

% Load data, each of M=2 columns contains a different source signal.
% Each column has N rows (signal values).
% Load standard MATLAB sounds (from MATLAB's datafun directory).
% Set variance of each source to unity.
load chirp; s1 = y(1:N); s1 = s1/std(s1);
load gong;  s2 = y(1:N); s2 = s2/std(s2);

% Combine sources into vector variable s.
s = [s1, s2];

% Make new mixing matrix.
A = randn(M, M);

% Listen to speech signals ...
% [10000] Fs = sample rate of speech.
Fs = 10000;
if listen, soundsc(s(:,1), Fs); soundsc(s(:,2), Fs); end

% Plot histogram of each source signal -
% this approximates the pdf of each source.
figure(1); hist(s(:,1), 50); drawnow;
figure(2); hist(s(:,2), 50); drawnow;

% Make M mixtures x from M source signals s.
x = s*A;

% Listen to signal mixtures ...
if listen, soundsc(x(:,1), Fs); soundsc(x(:,2), Fs); end

% Initialise unmixing matrix W to identity matrix.
W = eye(M, M);

% Initialise y, the estimated source signals.
y = x*W;

% Print out initial correlations between
% each estimated source y and every source signal s.
r = corrcoef([y s]);
fprintf('Initial correlations of source and extracted signals\n');
rinitial = abs(r(M+1:2*M, 1:M))

maxiter = 100;   % [100] Maximum number of iterations.
eta = 1;         % [0.25] Step size for gradient ascent.

% Make arrays hs and gs to store values of the function and gradient magnitude.
hs = zeros(maxiter, 1); gs = zeros(maxiter, 1);

% Begin gradient ascent on h ...
for iter = 1:maxiter
    % Get estimated source signals, y.
    y = x*W;   % weight vector in each column of W.
    % Get estimated maximum entropy signals Y = cdf(y).
    Y = tanh(y);
    % Find value of function h.
    % h = log(abs(det(W))) + sum( log(eps+1-Y(:).^2) )/N;
    detW = abs(det(W));
    h = ((1/N)*sum(sum(Y)) + 0.5*log(detW));
    % Find matrix of gradients @h/@W_ji ...
    g = inv(W') - (2/N)*x'*Y;
    % Update W to increase h ...
    W = W + eta*g;
    % Record h and magnitude of gradient ...
    hs(iter) = h; gs(iter) = norm(g(:));
end

% Plot change in h and gradient magnitude during optimisation.
figure(1); plot(hs); title('Function values - Entropy');
xlabel('Iteration'); ylabel('h(Y)');
figure(2); plot(gs); title('Magnitude of Entropy Gradient');
xlabel('Iteration'); ylabel('Gradient Magnitude');

% Print out final correlations ...
r = corrcoef([y s]);
fprintf('Final correlations between source and extracted signals ...\n');
rfinal = abs(r(M+1:2*M, 1:M))

% Listen to extracted signals ...
if listen, soundsc(y(:,1), Fs); soundsc(y(:,2), Fs); end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%


APPENDIX C2: The Polar NRZ Signal Functions

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% POLAR NRZ SIGNAL                                                  %
% This function returns a vector representing a polar NRZ signal.   %
% s = polarNRZ(totaltime, bitrate, samplerate)                      %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function s = polarNRZ(totaltime, bitrate, samplerate)
% STEP 1 - Generate random bit sequence
bits = rand(totaltime * ceil(bitrate), 1) < 0.5;
bits = 2 * (bits - .5);               % converts bits from 0,1 to -1,1
% STEP 2 - Duplicate bit sequence n times, where n = samplerate / bitrate
n = ceil(samplerate/bitrate);
x = [];
for i = 1:n
    x = [x, bits];                    % duplicates column vector
end
% STEP 3 - Concatenate rows of x
y = [];
for i = 1:length(bits)
    y = [y x(i,:)];
end
actualsamples = totaltime * samplerate;
s = y(1:actualsamples);
% STEP 4 - change first bit to be random (i.e. not always at the beginning of a bit)
shift = ceil(rand*samplerate/bitrate);
s = [s(shift + 1:actualsamples) s(1:shift)]';

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% POLAR NRZ CDF                                                     %
% This function returns an approximate cdf of a polar NRZ signal    %
% using the triangular model with gamma as the small base constant. %
% z = polarNRZcdf(y, gamma)                                         %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function z = polarNRZcdf(y, gamma)
m = 1 / (2 * gamma^2);       % slope (pos or neg)
bpos = 1 + gamma;            % numerator of y-intercept (pos)
bneg = gamma - 1;            % numerator of y-intercept (neg)
bdenom = 2 * gamma^2;        % denominator of y-intercept b
b1 = bpos / bdenom;          % y-intercept (pos)
b2 = bneg / bdenom;          % y-intercept (neg)
z = zeros(size(y));          % creates array to store values for cdf
% Define "critical points" - where equations change
critPoints = [-(1+gamma) -1 (-1+gamma) (1-gamma) 1 (1+gamma)];
% Region 1 equation (z < -1-gamma)
% z = 0 - do nothing!
% Region 2 equation (-1-gamma < z <= -1)
z2 = (0.5 * m * (y.^2 - (1 + gamma)^2) + b1 * (y + 1 + gamma)) .* ...
    (y >= critPoints(1) & y < critPoints(2));
% Region 3 equation (-1 < z <= -1+gamma)
z3 = (.25 + (.5 * m * (1 - y.^2) + b2 * (1 + y))) .* ...
    (y >= critPoints(2) & y < critPoints(3));
% Region 4 equation (-1+gamma < z <= 1-gamma)
z4 = .5 .* (y >= critPoints(3) & y < critPoints(4));
% Region 5 equation (1-gamma < z <= 1)
z5 = (.5 + (.5 * m * (y.^2 - (1 - gamma)^2) + b2 * (y - (1-gamma)))) .* ...
    (y >= critPoints(4) & y < critPoints(5));
% Region 6 equation (1 < z <= 1+gamma)
z6 = (.75 + (.5 * m * (1 - y.^2) + b1 * (y - 1))) .* ...
    (y >= critPoints(5) & y < critPoints(6));
% Region 7 equation (z > 1+gamma)
z7 = (y >= critPoints(6));   % always 1
% Combine values of each region for output cdf
z = z2 + z3 + z4 + z5 + z6 + z7;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% POLAR NRZ CDF2                                                    %
% This function returns an approximate cdf of a polar NRZ signal    %
% using the modified hyperbolic tangent model with compression      %
% factor sigma.                                                     %
% z = polarNRZcdf2(y, sigma)                                        %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function z = polarNRZcdf2(y, sigma)
z = zeros(size(y));          % creates array to store values
% Define "critical point" - where equation changes
critPoint = 0;
% Equation for z < 0
z1 = (.25 * (tanh(sigma * (y + 1)) + 1)) .* (y < critPoint);
% Equation for z >= 0
z2 = (.25 * (tanh(sigma * (y - 1)) + 3)) .* (y >= critPoint);
% Combine values of each region for output cdf
z = z1 + z2;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% POLAR NRZ DPDF                                                    %
% This function returns an approximate dpdf of a polar NRZ signal   %
% using the triangular model with gamma as the small base constant. %
% z = polarNRZdpdf(y, gamma)                                        %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function z = polarNRZdpdf(y, gamma)
m = 1 / (2 * gamma^2);       % slope (pos or neg)
z = zeros(size(y));          % create array to store values
% Define "critical points" - where equations change
critPoints = [-(1+gamma) -1 (-1+gamma) (1-gamma) 1 (1+gamma)];
% Region 1 (z < -1-gamma) - always 0
% Region 2 equation (-1-gamma < z <= -1)
z2 = (m) .* (y >= critPoints(1) & y < critPoints(2));
% Region 3 equation (-1 < z <= -1+gamma)
z3 = (-m) .* (y >= critPoints(2) & y < critPoints(3));
% Region 4 (-1+gamma < z <= 1-gamma) - always 0
% Region 5 equation (1-gamma < z <= 1)
z5 = (m) .* (y >= critPoints(4) & y < critPoints(5));
% Region 6 equation (1 < z <= 1+gamma)
z6 = (-m) .* (y >= critPoints(5) & y < critPoints(6));
% Region 7 (z > 1+gamma) - always 0
% Combine values of each region for output dpdf
z = z2 + z3 + z5 + z6;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% POLAR NRZ DPDF2                                                   %
% This function returns an approximate pdf derivative of a polar    %
% NRZ signal using the modified hyperbolic tangent model with       %
% compression factor sigma (equation 5.32).                         %
% z = polarNRZdpdf2(y, sigma)                                       %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function z = polarNRZdpdf2(y, sigma)
z = zeros(size(y));          % creates array to store values
% Define "critical point" - where equation changes
critPoint = 0;
% Equation for z < 0 (uses the tanh-model pdf, polarNRZpdf2)
z1 = -((sigma^2) / 2) .* (tanh(sigma * (y + 1)) .* polarNRZpdf2(y, sigma)) .* (y < critPoint);
% Equation for z >= 0
z2 = -((sigma^2) / 2) .* (tanh(sigma * (y - 1)) .* polarNRZpdf2(y, sigma)) .* (y >= critPoint);
% Combine values of each region for output dpdf
z = z1 + z2;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% POLAR NRZ PDF                                                     %
% This function returns an approximate pdf of a polar NRZ signal    %
% using the triangular model with gamma as the small base constant. %
% z = polarNRZpdf(y, gamma)                                         %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function z = polarNRZpdf(y, gamma)
m = 1 / (2 * gamma^2);       % slope (pos or neg)
bpos = 1 + gamma;            % numerator of y-intercept (pos)
bneg = gamma - 1;            % numerator of y-intercept (neg)
bdenom = 2 * gamma^2;        % denominator of y-intercept b
b1 = bpos / bdenom;          % y-intercept (pos)
b2 = bneg / bdenom;          % y-intercept (neg)
z = zeros(size(y));          % creates array to store values
% Define "critical points" - where equations change
critPoints = [-(1+gamma) -1 (-1+gamma) (1-gamma) 1 (1+gamma)];
% Region 1 (z < -1-gamma) - always 0
% Region 2 equation (-1-gamma < z <= -1)
z2 = (m * y + b1) .* (y >= critPoints(1) & y < critPoints(2));
% Region 3 equation (-1 < z <= -1+gamma)
z3 = (-m * y + b2) .* (y >= critPoints(2) & y < critPoints(3));
% Region 4 (-1+gamma < z <= 1-gamma) - always 0
% Region 5 equation (1-gamma < z <= 1)
z5 = (m * y + b2) .* (y >= critPoints(4) & y < critPoints(5));
% Region 6 equation (1 < z <= 1+gamma)
z6 = (-m * y + b1) .* (y >= critPoints(5) & y < critPoints(6));
% Region 7 (z > 1+gamma) - always 0
% Combine values of each region for output pdf
z = z2 + z3 + z5 + z6;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% POLAR NRZ PDF2                                                    %
% This function returns an approximate pdf of a polar NRZ signal    %
% using the modified hyperbolic tangent model with compression      %
% factor sigma.                                                     %
% z = polarNRZpdf2(y, sigma)                                        %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function z = polarNRZpdf2(y, sigma)
z = zeros(size(y));          % creates array to store values
% Define "critical point" - where equation changes
critPoint = 0;
% Equation for z < 0
z1 = ((sigma / 4) * (1 - tanh(sigma * (y + 1)).^2)) .* (y < critPoint);
% Equation for z >= 0
z2 = ((sigma / 4) * (1 - tanh(sigma * (y - 1)).^2)) .* (y >= critPoint);
% Combine values of each region for output pdf
z = z1 + z2;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


APPENDIX D. RADIAL BASIS FUNCTION MODELING CODE

APPENDIX D1. NETLAB TOOLBOX CODE [36] rbf.m USED TO CREATE AND INITIALIZE THE NETWORK

function net = rbf(nin, nhidden, nout, rbfunc, outfunc, prior, beta)

%RBF Creates an RBF network with specified architecture

%

% Description

% NET = RBF(NIN, NHIDDEN, NOUT, RBFUNC) constructs and initialises a

% radial basis function network returning a data structure NET. The

% weights are all initialised with a zero mean, unit variance normal

% distribution, with the exception of the variances, which are set to

% one. This makes use of the Matlab function RANDN and so the seed for

% the random weight initialization can be set using RANDN('STATE', S)

% where S is the seed value. The activation functions are defined in

% terms of the distance between the data point and the corresponding

% centre. Note that the functions are computed to a convenient

% constant multiple: for example, the Gaussian is not normalised.

% (Normalisation is not needed as the function outputs are linearly

% combined in the next layer.)

%

% The fields in NET are

% type = 'rbf'

% nin = number of inputs

% nhidden = number of hidden units

% nout = number of outputs

% nwts = total number of weights and biases

% actfn = string defining hidden unit activation function:

% 'gaussian' for a radially symmetric Gaussian function.

% 'tps' for r^2 log r, the thin plate spline function.

% 'r4logr' for r^4 log r.

% outfn = string defining output error function:

% 'linear' for linear outputs (default) and SoS error.

% 'neuroscale' for Sammon stress measure.

% c = centres

% wi = squared widths (null for rlogr and tps)

% w2 = second layer weight matrix

% b2 = second layer bias vector

%

% NET = RBF(NIN, NHIDDEN, NOUT, RBFUND, OUTFUNC) allows the user to

% specify the type of error function to be used. The field OUTFN is

% set to the value of this string. Linear outputs (for regression

% problems) and Neuroscale outputs (for topographic mappings) are

% supported.

%

% NET = RBF(NIN, NHIDDEN, NOUT, RBFUNC, OUTFUNC, PRIOR, BETA), in which

% PRIOR is a scalar, allows the field NET.ALPHA in the data structure

% NET to be set, corresponding to a zero-mean isotropic Gaussian prior

% with inverse variance with value PRIOR. Alternatively, PRIOR can

% consist of a data structure with fields ALPHA and INDEX, allowing

% individual Gaussian priors to be set over groups of weights in the

% network. Here ALPHA is a column vector in which each element

% corresponds to a separate group of weights, which need not be

% mutually exclusive. The membership of the groups is defined by the

% matrix INDX in which the columns correspond to the elements of ALPHA.

% Each column has one element for each weight in the matrix, in the


% order defined by the function MLPPAK, and each element is 1 or 0

% according to whether the weight is a member of the corresponding

% group or not. A utility function RBFPRIOR is provided to help in

% setting up the PRIOR data structure.

%

% NET = RBF(NIN, NHIDDEN, NOUT, FUNC, PRIOR, BETA) also sets the

% additional field NET.BETA in the data structure NET, where beta

% corresponds to the inverse noise variance.

%

% See also

% RBFERR, RBFFWD, RBFGRAD, RBFPAK, RBFTRAIN, RBFUNPAK

%

net.type = 'rbf';
net.nin = nin;
net.nhidden = nhidden;
net.nout = nout;

% Check that function is an allowed type
actfns = {'gaussian', 'tps', 'r4logr'};
outfns = {'linear', 'neuroscale'};
if (strcmp(rbfunc, actfns)) == 0
  error('Undefined activation function.')
else
  net.actfn = rbfunc;
end
if nargin <= 4
  net.outfn = outfns{1};
elseif (strcmp(outfunc, outfns) == 0)
  error('Undefined output function.')
else
  net.outfn = outfunc;
end

% Assume each function has a centre and a single width parameter, and that
% hidden layer to output weights include a bias. Only the Gaussian function
% requires a width.
net.nwts = nin*nhidden + (nhidden + 1)*nout;
if strcmp(rbfunc, 'gaussian')
  % Extra weights for width parameters
  net.nwts = net.nwts + nhidden;
end
if strcmp(net.outfn, 'neuroscale')
  net.mask = rbfprior(rbfunc, nin, nhidden, nout);
end
if nargin > 5
  if isstruct(prior)
    net.alpha = prior.alpha;
    net.index = prior.index;
  elseif size(prior) == [1 1]
    net.alpha = prior;
  else
    error('prior must be a scalar or a structure');
  end
end
w = randn(1, net.nwts);
outfunc = net.outfn;
net.outfn = 'linear';
net = rbfunpak(net, w);
net.outfn = outfunc;

% Make widths equal to one
if strcmp(rbfunc, 'gaussian')
  net.wi = ones(1, nhidden);
end

APPENDIX D2. NETLAB TOOLBOX CODE [36] rbftrain.m USED TO TRAIN THE NETWORK USING THE EM ALGORITHM AND A GMM MODEL

function [net, options] = rbftrain(net, options, x, t)

%RBFTRAIN Two stage training of RBF network.
%
% Description
% NET = RBFTRAIN(NET, OPTIONS, X, T) uses a two stage training
% algorithm to set the weights in the RBF model structure NET. Each row
% of X corresponds to one input vector and each row of T contains the
% corresponding target vector. The centres are determined by fitting a
% Gaussian mixture model with circular covariances using the EM
% algorithm through a call to RBFSETBF. (The mixture model is
% initialised using a small number of iterations of the K-means
% algorithm.) If the activation functions are Gaussians, then the basis
% function widths are set to the maximum inter-centre squared
% distance.
%
% For linear outputs, the hidden to output weights that give rise to
% the least squares solution can then be determined using the pseudo-
% inverse. For neuroscale outputs, the hidden to output weights are
% determined using the iterative shadow targets algorithm. Although
% this two stage procedure may not give solutions with as low an error
% as using general purpose non-linear optimisers, it is much faster.
%
% The options vector may have two rows: if this is the case, then the
% second row is passed to RBFSETBF, which allows the user to specify a
% different number of iterations for RBF and GMM training. The optional
% parameters to RBFTRAIN have the following interpretations.
%
% OPTIONS(1) is set to 1 to display error values during EM training.
%
% OPTIONS(2) is a measure of the precision required for the value of
% the weights W at the solution.
%
% OPTIONS(3) is a measure of the precision required of the objective
% function at the solution. Both this and the previous condition must
% be satisfied for termination.
%
% OPTIONS(14) is the maximum number of iterations for the shadow
% targets algorithm; default 100.
%
% See also
% RBF, RBFERR, RBFFWD, RBFGRAD, RBFPAK, RBFUNPAK, RBFSETBF
%

% Check arguments for consistency
switch net.outfn
  case 'linear'
    errstring = consist(net, 'rbf', x, t);
  case 'neuroscale'
    errstring = consist(net, 'rbf', x);
  otherwise
    error(['Unknown output function ', net.outfn]);
end
if ~isempty(errstring)
  error(errstring);
end

% Allow options to have two rows: if this is the case, then the second row
% is passed to rbfsetbf
if size(options, 1) == 2
  setbfoptions = options(2, :);
  options = options(1, :);
else
  setbfoptions = options;
end
if (~options(14))
  options(14) = 100;
end

% Do we need to test for termination?
test = (options(2) | options(3));

% Set up the basis function parameters to model the input data density
net = rbfsetbf(net, setbfoptions, x);

% Compute the design (or activations) matrix
[y, act] = rbffwd(net, x);
ndata = size(x, 1);

switch net.outfn
  case 'linear'
    % Sum of squares error function in regression model
    % Solve for the weights and biases using pseudo-inverse from activations
    temp = pinv([act ones(ndata, 1)]) * t;
    net.w2 = temp(1:net.nhidden, :);
    net.b2 = temp(net.nhidden+1, :);

  case 'neuroscale'
    % Use the shadow targets training algorithm
    if nargin < 4
      % If optional input distances not passed in, then use
      % Euclidean distance
      x_dist = sqrt(dist2(x, x));
    else
      x_dist = t;
    end
    Phi = [act, ones(ndata, 1)];
    % Compute the pseudo-inverse of Phi
    PhiDag = pinv(Phi);
    % Compute y_dist, distances between image points
    y_dist = sqrt(dist2(y, y));

    % Save old weights so that we can check the termination criterion
    wold = netpak(net);
    % Compute initial error (stress) value
    errold = 0.5*(sum(sum((x_dist - y_dist).^2)));

    % Initial value for eta
    eta = 0.1;
    k_up = 1.2;
    k_down = 0.1;
    success = 1;   % Force initial gradient calculation
    for j = 1:options(14)
      if success
        % Compute the negative error gradient with respect to network outputs
        D = (x_dist - y_dist)./(y_dist+(y_dist==0));
        temp = y';
        neg_gradient = 2.*sum(kron(D, ones(1, net.nout)) .* ...
          (repmat(y, 1, ndata) - repmat((temp(:))', ndata, 1)), 1);
        neg_gradient = (reshape(neg_gradient, net.nout, ndata))';
      end
      % Compute the shadow targets
      t = y - eta*neg_gradient;
      % Solve for the weights and biases
      temp = PhiDag * t;
      net.w2 = temp(1:net.nhidden, :);
      net.b2 = temp(net.nhidden+1, :);
      % Do housekeeping and test for convergence
      ynew = rbffwd(net, x);
      y_distnew = sqrt(dist2(ynew, ynew));
      err = 0.5.*(sum(sum((x_dist-y_distnew).^2)));
      if err > errold
        success = 0;
        % Restore previous weights
        net = netunpak(net, wold);
        err = errold;
        eta = eta * k_down;
      else
        success = 1;
        eta = eta * k_up;
        errold = err;
        y = ynew;
        y_dist = y_distnew;
        if test & j > 1
          w = netpak(net);
          if (max(abs(w - wold)) < options(2) & abs(err-errold) < options(3))
            options(8) = err;
            return;
          end
        end
        wold = netpak(net);
      end
      if options(1)
        fprintf(1, 'Cycle %4d Error %11.6f\n', j, err)
      end
      if nargout >= 3
        errlog(j) = err;
      end
    end
    options(8) = errold;
    if (options(1) >= 0)
      disp('Warning: Maximum number of iterations has been exceeded');
    end

  otherwise
    error(['Unknown output function ', net.outfn]);
end
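
For orientation, a brief hypothetical training call is sketched below. The toy data, network size and option settings are assumptions chosen to illustrate the two-row options mechanism described in the help text, not values used in this work.

% Illustrative sketch (assumed data and settings):
x = randn(100, 2);                    % 100 two-dimensional input vectors
t = sin(x(:,1)) + 0.1*randn(100, 1);  % corresponding noisy targets
net = rbf(2, 5, 1, 'gaussian');       % output function defaults to 'linear'
options = zeros(1, 18);               % first row: options for rbftrain itself
setbfoptions = zeros(1, 18);          % second row: options passed to RBFSETBF
setbfoptions(1) = 1;                  % display error values during EM training
setbfoptions(14) = 10;                % EM iterations for fitting the GMM centres
% The second row of the options matrix is passed on to RBFSETBF, as documented.
net = rbftrain(net, [options; setbfoptions], x, t);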

APPENDIX D3. NETLAB TOOLBOX CODE [37] rbffwd.m USED TO SIMULATE OUTPUT OF TRAINED RBF NETWORK.

function [a, z, n2] = rbffwd(net, x)
%RBFFWD Forward propagation through RBF network with linear outputs.
%
% Description
% A = RBFFWD(NET, X) takes a network data structure NET and a matrix X
% of input vectors and forward propagates the inputs through the
% network to generate a matrix A of output vectors. Each row of X
% corresponds to one input vector and each row of A contains the
% corresponding output vector. The activation function that is used is
% determined by NET.ACTFN.
%
% [A, Z, N2] = RBFFWD(NET, X) also generates a matrix Z of the hidden
% unit activations where each row corresponds to one pattern. These
% hidden unit activations represent the design matrix for the RBF. The
% matrix N2 contains the squared distances between each basis function
% centre and each pattern, with each row corresponding to a data point.
%
% See also
% RBF, RBFERR, RBFGRAD, RBFPAK, RBFTRAIN, RBFUNPAK
%

% Check arguments for consistency
errstring = consist(net, 'rbf', x);
if ~isempty(errstring)
  error(errstring);
end

[ndata, data_dim] = size(x);

% Calculate squared norm matrix, of dimension (ndata, ncentres)
n2 = dist2(x, net.c);

% Switch on activation function type
switch net.actfn
  case 'gaussian'   % Gaussian
    % Calculate width factors: net.wi contains squared widths
    wi2 = ones(ndata, 1) * (2 .* net.wi);
    % Now compute the activations
    z = exp(-(n2./wi2));
  case 'tps'        % Thin plate spline
    z = n2.*log(n2+(n2==0));
  case 'r4logr'     % r^4 log r
    z = n2.*n2.*log(n2+(n2==0));
  otherwise
    error('Unknown activation function in rbffwd')
end

a = z*net.w2 + ones(ndata, 1)*net.b2;
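
Continuing the hypothetical example sketched after Appendix D2, the trained network can be simulated on assumed test inputs and the optional outputs inspected as follows.

% Illustrative sketch (assumed test data), continuing the example above:
xtest = randn(10, 2);               % 10 assumed test vectors
[a, z, n2] = rbffwd(net, xtest);    % outputs, design matrix, squared distances
% a is 10-by-1 (one output per pattern); z and n2 are 10-by-5, with one
% column per basis function centre.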

APPENDIX E. LIST OF PUBLICATIONS

1) Ombati D., E. N. Ndungu and L. M. Ngoo, "Optimization Of Underdetermined Blind Speech De-noising For Enhanced Teleconferencing By Interpolated Fastica_25", KSEEE Conference, 2011.

2) Ombati D., E. N. Ndungu and L. M. Ngoo, "Determined Blind Speech Denoising for Enhanced Teleconferencing Using Novel Machine Intelligence", KSEEE Conference, 2013.

3) Ombati D., E. N. Ndungu and L. M. Ngoo, "Multichannel Determined Blind Speech De-noising Using Artificial Machine Intelligence", KSEEE Conference, 2013.
