Vision-Based Autonomous Navigation Approach for a Tracked Robot Using Deep Reinforcement Learning

Muhammad Mudassir Ejaz, Tong Boon Tang, Senior Member, IEEE, and Cheng-Kai Lu, Senior Member, IEEE
Abstract—Tracked robots need to achieve safe autonomous steering in various changing environments. In this article, a novel end-to-end network architecture is proposed for tracked robots to learn collision-free autonomous navigation through deep reinforcement learning. Specifically, this research improved the learning time and exploratory nature of the robot by normalizing the input data and injecting parametric noise into the network parameters. Features were extracted from four consecutive depth images by deep convolutional neural networks and used to drive the tracked robot. In addition, a comparison was made with three Q-variant models in terms of average reward, variance, and dispersion across episodes. A detailed statistical analysis was also performed to measure the reliability of all the models. The proposed model was superior in all the environments. It is worth noting that our proposed model, the layer normalisation dueling double deep Q-network (LND3QN), could be directly transferred to a real robot without any fine-tuning after being trained in a simulation environment. The proposed model also demonstrated outstanding performance in several cluttered real-world environments considering both static and dynamic obstacles.

Index Terms—Autonomous navigation, deep learning, reinforcement learning, obstacle avoidance.
Manuscript received August 5, 2020; accepted August 8, 2020. Date of publication August 13, 2020; date of current version December 16, 2020. This work was supported in part by the YUTP-Fundamental Research Grant (YUTP-FRG) under Grant 015LC0-002. The associate editor coordinating the review of this article and approving it for publication was Dr. Ioannis Raptis. (Corresponding author: Cheng-Kai Lu.)
Muhammad Mudassir Ejaz is with the Department of Electrical and Electronics Engineering, Universiti Teknologi PETRONAS (UTP), Seri Iskandar 32610, Malaysia.
Tong Boon Tang and Cheng-Kai Lu are with the Institute of Health Analytics (IHA), Universiti Teknologi PETRONAS (UTP), Seri Iskandar 32610, Malaysia (e-mail: chengkai.lu@utp.edu.my).
Digital Object Identifier 10.1109/JSEN.2020.3016299

I. INTRODUCTION

ROBOTS are used in various applications, such as data collection, surveillance [1], exploration, rescue services, and inspection [2]. These applications require navigation, but achieving collision-free and safe navigation is a challenging task. Autonomous navigation has been studied for a long time, and many well-developed methods have been proposed for safe autonomous navigation in various environments [3], [4]. However, these conventional methods rely on assumptions that are not suitable for large environments [5]. For large areas, visual simultaneous localization and mapping (V-SLAM) [6] is used, where the motions of the robot are estimated from pixel information. However, this method is susceptible to changing light conditions and performs poorly in low-textured environments [7].
Recently, deep learning approaches for autonomous navigation have highlighted the bottlenecks of traditional methods and have proposed solutions to address their limitations [8]–[11]. Deep reinforcement learning (DRL)-based methods have gained massive popularity in the field of autonomous navigation due to their promising performance and because no labelled data is required for training [12]–[15]. Laser range sensors were widely used for DRL-based autonomous navigation in the past, but owing to their limited ability to describe the 3D world, the vision sensor has become a good choice, since it provides more information, generalizes the environment better, and is cheaper.
In particular, depth images are more appropriate than RGB images because of their texture-less nature [16]. Also, the depth images from the simulation environment and the real-world environment are quite similar, so model transfer is easy. In DRL-based methods, the agent learns from its behaviour through trial and error, so it is impractical to allow robots to train in the real world. Hence, a simulation environment is required for training.
1558-1748 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
It takes DRL-based methods longer to learn a problem
than supervised learning methods since there is no labelled
data and no supervision is provided to the network. One way
to reduce the learning time is to improve the exploratory
nature of the agent. Exploration is the process by which an agent explores new regions to acquire more information from the environment, regardless of the rewards. In contrast, exploitation is the opposite of exploration, where the agent takes steps to increase the cumulative reward. This trade-off between exploration and exploitation is a long-standing problem. The computational cost is also a significant issue when working with deep learning models: the distribution of the data in each layer changes abruptly, which slows down training.
In this article, we present a novel end-to-end network architecture that relieves the burden of the computational cost by normalising the input data before each convolutional layer using layer normalisation. It also improves the exploratory nature of the agent by adding parametric noise to the fully connected layers of a deep Q-network. The resulting model, the layer normalisation dueling double deep Q-network (LND3QN), was trained in three virtual environments, and the results were compared with Q-variant models. The results also demonstrate the outstanding performance of the model on a tracked robot in various cluttered real-world environments, considering both static and dynamic obstacles. To the best of our knowledge, no reports have focused on these issues together.
The rest of this article is organized as follows. Related work is discussed in Section II, and the architecture and implementation of LND3QN with noise injection are described in Section III. In Section IV, the experimental results are discussed. Lastly, Section V concludes this article.
II. RELATED WORK
Autonomous navigation using DRL-based methods has
become a popular choice since labelled data is not required
for training. In addition, the transferability of a model is high
if depth images are used as input for a network. In 2013,
Mnih et al. proposed a DQN algorithm where actions were
classified using raw pixels and tested on ATARI games [17].
The same approach was implemented on ViZDoom, a first-person shooting (FPS) game environment, by Lample and Chaplot [18]. In the FPS game environment, an agent navigates different regions to increase its score. This idea opened the way for using DRL-based methods for autonomous navigation from raw images. Tai and Liu [19] and Zhang et al. [20] used depth images as the input to the network, where DQN and successor-feature-based DRL were adopted for obstacle-avoidance navigation, respectively. Xie et al. [21] acquired depth predictions from RGB images by converting them through the fully convolutional residual network (FCRN) of Laina et al. [22], and a dueling deep Q-network architecture was used for action prediction. The same DRL method was used by Wu et al. [23], where noise was added to the fully connected layers through noisy networks for better exploration, following Fortunato et al. [24]. Wu et al. [25] proposed a novel method in which two data streams were merged and fed to the network as input. The methods mentioned above used DQN [26] and its variants as the DRL method for network training.
Another main attribute of DRL-based methods is the exploration technique. The purpose of exploration is to ensure that the agent's policy does not converge prematurely to a local optimum. Several exploration methods have been proposed in the literature to deal with the exploration-exploitation trade-off, such as counting tables [27], learned dynamics models [28], [29], self-supervised curiosity [30], and state-space density modelling [31]. To improve the exploratory behaviour of these algorithms, a bootstrapped DQN was proposed by Osband et al. [32], where temporally correlated noise was added to the parameters. It has been found that adding noise improves the exploratory behaviour of the agent [33]. Fortunato et al. [24] proposed NoisyNets for DQN, which enhance the agent's exploration capability by adding noise to the network parameters.
III. ARCHITECTURE AND IMPLEMENTATION OF LND3QN WITH NOISE INJECTION
A. Problem Definition
The objective of this work was to empower tracked robots to learn autonomous navigation effectively using DRL. We formulated this autonomous navigation problem as a Markov decision process (MDP), defined by the tuple M = (S, A, R, P, γ), where S is the state space, A is the action space, R is the immediate reward, P represents the transition probability, and γ ∈ [0, 1] is the discount factor [34]. Reinforcement learning (RL) is a closed-loop learning process in which an agent performs an action a_t ∈ A in a given state s_t ∈ S, moves to the next state s_{t+1}, and receives an immediate reward (r_t)
from the environment. A policy, π (a|s), defines the mapping
from state to action. The goal of an agent is to maximize
the cumulative reward from the environment through the Q-value. The Q-value is defined as the value of the best action, i.e., the return obtained by following the optimal policy, and can be formulated as follows:

Q(s, a) = E[ Σ_{t=0}^{∞} γ^t R(s_t, a_t) | s_0 = s, a_0 = a ]        (1)
In the above equation, γ represents the discount factor that controls how strongly future rewards are weighted. Bellman's equation [35] can be used to formulate the RL problem that returns the maximum reward from the environment, as shown in Equation 2:

Q*(s, a) = R(s_t, a_t) + γ max_{a_{t+1}} Q(s_{t+1}, a_{t+1})        (2)
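To make the Bellman backup in Equation 2 concrete, the following minimal tabular Q-learning sketch shows how such an update is typically applied in practice; the state and action encodings, the learning rate, and the toy example at the end are illustrative assumptions rather than part of the proposed method.

from collections import defaultdict

# Minimal tabular sketch of the Bellman backup in Eq. (2).
# The environment, state/action encoding, and hyper-parameters are hypothetical.
def q_learning_step(Q, s, a, r, s_next, actions, gamma=0.99, lr=0.1):
    # Bellman target: immediate reward plus the discounted best next-state value.
    target = r + gamma * max(Q[(s_next, a_next)] for a_next in actions)
    # Move the current estimate towards the target.
    Q[(s, a)] += lr * (target - Q[(s, a)])
    return Q

Q = defaultdict(float)      # Q-table with a default value of 0.0
actions = [0, 1, 2]         # e.g., turn left, go straight, turn right
Q = q_learning_step(Q, s=0, a=1, r=1.0, s_next=3, actions=actions)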
Fig. 1. Layer normalisation duelling double deep Q-network with noise injection (LND3QN) network architecture. The input state is a series of four depth images that pass through three convolutional layers, which extract features from the input data. A duelling architecture divides the flattened layer into value and advantage streams.

To enable the agent to train itself, the reward is the only learning signal given to the agent by the environment. Considering the learning speed, we designed a dense-reward function rather than a sparse one. With a sparse-reward function, the agent requires more experience to learn, which slows down the learning process. A dense-reward function, in contrast, provides feedback at every step and guides the agent towards actions that give a maximum return. Therefore, we designed an informative reward function that allows the model to learn fast, safe, and smooth steering with high efficiency, as expressed by Equation 3:

r(s_t, a_t) = −10,                            if collision
              c_1 ϑ² cos(c_2 ϑω) − c_3,        otherwise        (3)

where c_1, c_2, and c_3 are constants with values of 3, 3, and 0.1, respectively. Here, c_1 acts as a scaling factor, c_2 acts as a bias, and c_3 acts as a regulariser. The linear and angular velocities are represented by ϑ and ω, respectively. This reward
function helps the robot to move straight as far as possible
for maximum reward until angular action is required to avoid
a collision. When the robot is close to the obstacle, its linear
velocity decreases, and its angular velocity increases, which
changes the orientation of the robot. In the case of a collision,
a robot receives −10 as a penalty; otherwise, it receives a total
reward calculated by Equation 3 after 500 steps.
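As a concrete illustration, the snippet below implements the reward of Equation 3 as reconstructed above; the ϑ² cos(·) form and the constants c1 = 3, c2 = 3, and c3 = 0.1 follow the text, but the exact functional form should be treated as our reading of the original rather than an authoritative reference implementation.

import math

C1, C2, C3 = 3.0, 3.0, 0.1    # scaling factor, bias, and regulariser from the text

def step_reward(linear_v, angular_w, collided):
    # Dense reward of Eq. (3): -10 on collision; otherwise a term that favours
    # fast, straight motion and penalises large angular velocities.
    if collided:
        return -10.0
    return C1 * linear_v ** 2 * math.cos(C2 * linear_v * angular_w) - C3

# Example: moving straight at 0.4 m/s without a collision
print(step_reward(0.4, 0.0, collided=False))   # 3 * 0.16 * 1 - 0.1 = 0.38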
B. Network Architecture
To attain the objectives mentioned above, we selected the
dueling double deep Q-network (D3QN) as an RL method.
D3QN is a model-free, value-based method built on the dueling architecture proposed by Wang et al. [36]. D3QN addresses the overestimation issue faced by DQN and DDQN, which arises from estimating the value of every action in every state. The novelty of D3QN is that it splits the Q-value into two streams: one computes the value of the state, and the other calculates the advantage of each action in that state. This architecture helps to generalize across actions without imposing any change on the underlying learning algorithm. After the state value and the advantage values are calculated, they are combined into one stream of Q-values, formulated as Q^π(s, a) = V^π(s) + A^π(s, a), where V represents the value function and A the advantage function. The benefit of the dueling network is that every time a Q-value is updated, the shared state value is updated as well, which also benefits the estimates of the other actions. Thereby, state-value learning becomes more efficient.
We modified the D3QN model and proposed a novel method
named layer normalisation dueling double deep Q-network
with noise injection (LND3QN). This method reduces the
computational cost by normalizing the state space through
layer normalization and improves the exploratory nature of
the agent by injecting the noise into the network parameters.
The network architecture of the proposed method is illustrated
in Fig. 1.
Four consecutive depth images captured from the camera are stacked together as the input to the network. The reason for stacking the images is to preserve the temporal information of the environment. In a feed-forward neural network, a non-linear mapping is computed from an input, x, to an output vector, y. Suppose x^m is the vector of summed inputs to the m-th hidden layer of the network. The summed inputs are calculated from a weight-matrix projection of the bottom-up inputs h^m, as follows:

x_i^m = (w_i^m)^T h^m,    h_i^{m+1} = f(x_i^m + b_i^m)        (4)

where f(·) is an element-wise non-linear function, w_i^m denotes the incoming weights of the i-th unit, T denotes the matrix transpose, and b_i^m is the bias. These parameters are learned by the
optimisation algorithms, with the gradients being computed by
back-propagation. In a feed-forward neural network, the output
of the first layer becomes the input of the second layer, which
greatly changes the summed inputs of that layer, especially
when the rectified linear unit (ReLU) activation function is
used. To reduce this covariate shift, the layer normalisation
technique is applied before each of the convolutional layers.
Layer normalization computes the mean and variance of the summed inputs within a layer, as follows:

μ^m = (1/H) Σ_{i=1}^{H} x_i^m,    σ^m = √( (1/H) Σ_{i=1}^{H} (x_i^m − μ^m)² )        (5)

where H represents the number of hidden units in the layer, x^m is the summed input vector, and μ^m and σ^m are the mean and standard deviation,
respectively. Three convolution layers with different filters and
kernels are used to extract the features from the input data. Layer normalisation with the identity activation function is applied to the input data, and the processed data is then fed into the first convolutional layer. A kernel of size 10 × 14 with a stride of 8 converts the image into a 20 × 16 map with 32 features. In the second and third convolutional layers, layer normalisation is used with the ReLU activation function, and the kernel sizes are 4 × 4 and 3 × 3, respectively. The last convolutional layer transforms the image to a size of 10 × 8 with 64 features, which is then flattened into a 5120-dimensional array. The covariance was measured after each convolutional layer, and we found that with layer normalisation it was reduced to 32%, 55%, and 69%, respectively. This reduction not only sped up training but also reduced the computational load. Layer normalisation [37] is preferable to batch and weight normalisation because it does not introduce any dependencies between training cases, and it normalises the input data simply by calculating the mean and variance.
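The statistics of Equation 5 translate directly into code; the NumPy sketch below is a minimal illustration of the normalisation step, where the small ε and the absence of learned gain and bias parameters are simplifying assumptions.

import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalise the summed inputs of one layer (Eq. 5): subtract the per-example
    # mean and divide by the per-example standard deviation, computed over the
    # H units of that layer (the last axis).
    mu = x.mean(axis=-1, keepdims=True)                              # μ^m
    sigma = np.sqrt(((x - mu) ** 2).mean(axis=-1, keepdims=True))    # σ^m
    return (x - mu) / (sigma + eps)

features = np.random.randn(4, 5120)    # e.g., a small batch of flattened conv outputs
normalised = layer_norm(features)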
After the extraction of features, the network is divided into two branches, each consisting of two dense layers. One branch is used to estimate the state value, while the other is used to estimate the advantage function, which corresponds to the action commands. The extracted features are passed into two fully connected layers with 512 units and 1 unit to determine the state value. Meanwhile, the same features are fed into the second branch, with two fully connected layers of 512 units and N units, to calculate the advantages, where N represents the number of actions. After obtaining the state value, V(s), and the advantage values, A(s, a), from the fully connected layers, the Q-values are calculated using the following equation:

Q_i(s, a) = LeakyReLU( V(s) + A_i(s, a) − (1/N) Σ_{a'} A_i(s, a') )        (6)

where N is the number of actions. Leaky ReLU was used instead of ReLU to avoid the problem of dying neurons, which occurred due to layer normalisation.
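The aggregation in Equation 6 can be sketched as follows, assuming the value and advantage heads have already produced their outputs; the leaky-ReLU slope is an assumption, since the text does not specify it.

import numpy as np

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def dueling_q(value, advantages):
    # Combine V(s) and A(s, a) into Q-values as in Eq. (6): subtract the mean
    # advantage so the decomposition is identifiable, then apply the leaky ReLU
    # used by the proposed network.
    q = value + advantages - advantages.mean(axis=-1, keepdims=True)
    return leaky_relu(q)

value = np.array([[1.2]])                                        # V(s), shape (batch, 1)
advantages = np.array([[0.3, -0.1, 0.5, 0.0, -0.4, 0.2, 0.1]])   # N = 7 actions
q_values = dueling_q(value, advantages)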
Generally, the epsilon-greedy (ε-greedy) method is used for exploration in Q-variant methods. The initial value of ε is usually large and decreases gradually until it reaches its final value. Entropy-based exploration works in the opposite manner, since the probability value in an explored region is small and increases as the agent moves towards an unexplored region [38]. However, these probability-based exploration methods take more steps to achieve good exploration, which affects training. For better exploration, we introduced noise into the network parameters, which enhances the agent's exploratory nature. We used NoisyNets, proposed by Fortunato et al. [24], which work by perturbing the weights and biases of a linear layer, y = wx + b, where x, w, and b represent the input, weight, and bias, respectively. Gaussian noise is added as uncertainty in the network parameters, as shown in Equation 7:

y = (μ_w + σ_w ⊙ ε_w) x + (μ_b + σ_b ⊙ ε_b)        (7)

where μ_w, σ_w, μ_b, and σ_b are the learnable network parameters, ε_w and ε_b are random noise variables, and ⊙ denotes element-wise multiplication.
Adding independent noise to every parameter makes the network heavier, which increases the computational cost. To overcome this issue, we used factorised Gaussian noise rather than independent noise. The weight and bias entries can then be expressed as follows:

ω_{i,j} = μ^ω_{i,j} + σ^ω_{i,j} f(ε_i) f(ε_j);    b_j = μ^b_j + σ^b_j f(ε_j)        (8)

where f is defined as f(ε) = sgn(ε)√|ε|. We sampled μ^ω and μ^b randomly from a uniform distribution on the interval [−1/√N, 1/√N], where N is the input layer size, and σ^ω and σ^b were initialised to 0.4/√N. The advantage of using noise for exploration compared to epsilon-greedy and entropy-based methods is that it does not need any hyperparameter tuning; the amount of noise in the network is updated automatically during training.
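The following NumPy sketch illustrates a linear layer with factorised Gaussian noise in the spirit of Equations 7 and 8. The initialisation constants follow the text as reconstructed above (uniform μ on ±1/√N and σ initialised to 0.4/√N), and the class itself is our illustrative implementation rather than the authors' code.

import numpy as np

def f(eps):
    # f(ε) = sgn(ε)√|ε|, used to factorise the noise (Eq. 8)
    return np.sign(eps) * np.sqrt(np.abs(eps))

class NoisyLinear:
    # Linear layer y = (μ_w + σ_w ⊙ ε_w) x + (μ_b + σ_b ⊙ ε_b), as in Eq. (7),
    # with factorised noise ε_w = f(ε_out) f(ε_in)ᵀ and ε_b = f(ε_out).
    def __init__(self, n_in, n_out, sigma0=0.4):
        bound = 1.0 / np.sqrt(n_in)
        self.mu_w = np.random.uniform(-bound, bound, (n_out, n_in))
        self.mu_b = np.random.uniform(-bound, bound, n_out)
        self.sigma_w = np.full((n_out, n_in), sigma0 / np.sqrt(n_in))
        self.sigma_b = np.full(n_out, sigma0 / np.sqrt(n_in))
        self.n_in, self.n_out = n_in, n_out

    def forward(self, x):
        eps_in = f(np.random.randn(self.n_in))     # one noise vector per input unit
        eps_out = f(np.random.randn(self.n_out))   # one noise vector per output unit
        w = self.mu_w + self.sigma_w * np.outer(eps_out, eps_in)
        b = self.mu_b + self.sigma_b * eps_out
        return w @ x + b

layer = NoisyLinear(n_in=512, n_out=7)    # e.g., 512 features, 7 discrete actions
q = layer.forward(np.random.randn(512))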
C. Training Framework
The training framework of the proposed model consists of
two networks, as depicted in Fig. 2. The state information goes to the online network, which estimates the Q-value Q(s_t; θ) for the corresponding action; combining the selected action with the target network output gives Q(s_t, a_t; θ−). Meanwhile, the information of the next state goes to both the online and target networks simultaneously to compute the target Q-value, Q(s_{t+1}; θ−). The target value (y_i) is calculated using Equation 9, in which γ represents the discount factor, set to 0.99, and r_t is the instantaneous reward given at each time step. Here, θ and θ− represent the online and target network parameters, respectively.

y_i = r_t,                                                                    if the episode ends at step t+1
y_i = r_t + γ Q(s_{t+1}, argmax_{a_{t+1}} Q(s_{t+1}, a_{t+1}; θ); θ−),        otherwise        (9)
In deep learning, a model is optimised using a loss function. Given the Q-values from the online and target networks, the loss is calculated using the following equation:

L(θ) = E[(y_i − Q(s_t, a_t; θ))²]        (10)

The parameters of the online network are initialised randomly, while the parameters of the target network are duplicated from θ. Afterwards, at every time step, back-propagation is applied to the loss using the Adam optimiser with a learning rate (α) of 0.0001 to update the online network's parameters. For efficient training of the online network, the gradients of the convolutional layers were divided by 2 during back-propagation. The target network is just a duplicate of the online network, and hence its parameters are not trained directly; the rate (τ) at which the online parameters were copied to the target network was 0.001. Algorithm 1 shows the pseudo-code for the proposed method.
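A minimal NumPy sketch of the target computation in Equation 9 and the loss in Equation 10 is given below; the two networks are assumed to have already produced batched Q-values, and all names and shapes are illustrative.

import numpy as np

def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    # Eq. (9): choose the next action with the online network, evaluate it with
    # the target network, and bootstrap only if the episode did not end.
    best_actions = np.argmax(q_online_next, axis=1)
    next_values = q_target_next[np.arange(len(rewards)), best_actions]
    return rewards + gamma * next_values * (1.0 - dones)

def dqn_loss(q_online, actions, targets):
    # Eq. (10): mean squared error between the target and Q(s_t, a_t; θ).
    chosen = q_online[np.arange(len(actions)), actions]
    return np.mean((targets - chosen) ** 2)

# Toy batch of 2 transitions with 7 actions
q_online_next = np.random.randn(2, 7)
q_target_next = np.random.randn(2, 7)
y = double_dqn_targets(np.array([1.0, -10.0]), np.array([0.0, 1.0]),
                       q_online_next, q_target_next)
loss = dqn_loss(np.random.randn(2, 7), np.array([3, 0]), y)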
Fig. 2. Training framework of the proposed LND3QN model. The input state is fed into the online network to perform an action, and the estimated Q-value is calculated after taking the action. The target value (y) is calculated from the immediate reward (rt), the discount factor (γ), and the Q-values coming from both networks at each iteration. The online network (θ) is responsible for the optimal actions, while the target network (θ−) estimates the target values. The loss is back-propagated to the online network to update its parameters. The target network is just a duplicate of the online network and is updated periodically.
IV. EXPERIMENTS AND DISCUSSIONS
A. Experiments in a Virtual Environment
Gazebo is an open-source 3D simulator with a graphical interface, high-quality graphics, and a physics engine for training and testing algorithms in elaborate indoor and outdoor environments. It is a well-known simulator that works with the robot operating system (ROS). The ROS allows us to simulate the virtual tracked robot for training and then transfer the trained model to a physical robot. Taking advantage of this, we designed different environments for training
in Gazebo, and each environment was different in terms of
complexity and number of obstacles, as shown in Fig. 3. The
first environment was a small 10 × 10 m world with few
obstacles. Willow Garage’s office, which comes with Gazebo,
was chosen as a second environment. In the last environment,
we introduced more walking persons in the cafe environment,
which resembles a real-world scenario.
Communication in the ROS is done via topics between different nodes, as shown in Fig. 4. Messages are subscribed to and published via topics, indicated by rectangular boxes, by four main nodes, represented by ovals. The depth image is published to Gazebo World (GW) by the camera_controller node via the camera/depth/image_raw topic. GW processes the data and then sends it to the motor_controller node using the cmd_vel topic. The odometry node exchanges messages with GW via the odom topic to move the tracked robot.
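The node wiring described above can be sketched as a minimal rospy node. The topic names follow the text, while the policy call and the velocity values are placeholders for the trained LND3QN model, so this should be read as an assumption-laden sketch rather than the actual controller.

#!/usr/bin/env python
import rospy
from sensor_msgs.msg import Image
from geometry_msgs.msg import Twist

def on_depth(msg):
    # Placeholder: the trained LND3QN policy would map the current depth image
    # (stacked with the three previous frames) to one of the discrete actions.
    linear_v, angular_w = 0.4, 0.0          # hypothetical "go straight" action
    cmd = Twist()
    cmd.linear.x = linear_v
    cmd.angular.z = angular_w
    cmd_pub.publish(cmd)

rospy.init_node('lnd3qn_controller')
cmd_pub = rospy.Publisher('cmd_vel', Twist, queue_size=1)
rospy.Subscriber('camera/depth/image_raw', Image, on_depth)
rospy.spin()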
Training was performed on an Intel i7 CPU, 16 GB
RAM, and NVIDIA RTX 2060 GPU desktop system with
Tensorflow [39], and each iteration took approximately 0.22
seconds as the average training time.

Algorithm 1: LND3QN with noise injection
Input: Initialise the batch size (N_B), replay memory (M), replay memory size (N_M), discount factor (γ), exploration frames (E), observation steps (N_O), target-network update rate (τ), parameters of the online (θ) and target (θ−) networks, and the number of episodes (N_E)
1:  for episode = 1 to N_E do
2:      Observe the state s_t
3:      Select an action a* = argmax_a Q(s_t, a; θ)
4:      Execute the action, receive a reward r_t, and move to the next state s_{t+1}
5:      Store the transition (s_t, a*, r_t, s_{t+1}) in M
6:      if |M| > N_M then
7:          Remove the oldest transition from M
8:      end if
9:      Apply layer normalisation (LN) in each convolutional layer
10:     Insert factorised Gaussian noise in the fully connected layers
11:     Sample a mini-batch of N_B transitions (s_t, a_t, r_t, s_{t+1}) from M
12:     Sample noise variables ε_i, ε_j, ε_i−, ε_j− ∼ N(0, 1)
13:     Evaluate Q1(s_{t+1}, a; θ) and Q2(s_{t+1}, a; θ−)
14:     if the episode terminates at s_{t+1} then
15:         Set y = r_t
16:     else
17:         Set y = r_t + γ Q2(s_{t+1}, argmax_{a_{t+1}} Q1(s_{t+1}, a_{t+1}; θ); θ−)
18:     end if
19:     Compute the loss L(θ) = E[(y − Q(s_t, a_t; θ))²]
20:     Perform an Adam optimiser step with respect to the parameters of the online network θ
21:     Update the target network θ− ← θ
22: end for

Hyper-parameters such as the learning rate (α) and discount factor (γ) were set to
0.0001 and 0.99, respectively. The experience replay buffer
size was 50000 for the first environment and reduced to 30000
for the second and third environments.

Fig. 3. 3D environments built in the Gazebo simulator. The first environment is a small 10 × 10 m square world, while the second environment is the Willow Garage office with narrow paths and a large area. The third environment is a café world, which resembles a real-world scenario.

Fig. 4. ROS communication flowchart. Nodes and topics are represented by ellipses and rectangular boxes, respectively. Arrows indicate the nodes to which each message is published.

Fig. 5. Rewards received by all the models in the first environment. (a) represents the reward at each episode, (b) shows the smoothed reward over each episode, while (c) and (d) depict the average reward and variance, respectively.

In every experiment,
seven velocities were considered, including linear and angular
velocities, such as 0.4 m/s, 1.2 m/s, π/12 rad/s, π/6 rad/s, 0
rad/s, −π/6 rad/s, and −π/12 rad/s. In the first environment,
the total episodes for the training set were 1500. However,
in the second and third environments, we reduced the total
episodes to 150. In each environment, the proposed model
LND3QN was compared to three baseline models, namely
DQN, DDQN, and D3QN. In all the baseline models, the ε-greedy method was used for exploration, where the initial and final epsilon values were set to 0.1 and 0.001, respectively.
All the models were evaluated by the reward obtained by
the robot in all three environments. For better demonstration,
we divided the reward graph into four different plots. The
first plot shows the reward at each episode. The second plot depicts the smoothed value of the reward calculated by the exponential moving average (EMA) method, formulated as follows:

S_t = X_1,                             t = 1
S_t = α X_t + (1 − α) S_{t−1},         t > 1        (11)

where the coefficient α represents the smoothing factor, which lies between 0 and 1, X is the reward in each episode, and S is the smoothed value at a given episode. For a smoother curve, we kept α = 0.99. The third plot illustrates the average reward taken over windows of 10 episodes. Variance is shown in the fourth plot, where a model with low variance is considered well trained.
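The smoothing of Equation 11 amounts to a one-pass filter over the episodic rewards; the sketch below implements the rule exactly as written above, with the reward list being a made-up example.

def ema(rewards, alpha):
    # Exponential moving average of Eq. (11):
    # S_1 = X_1 and S_t = alpha * X_t + (1 - alpha) * S_{t-1} for t > 1.
    smoothed = [rewards[0]]
    for x in rewards[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

episode_rewards = [12.0, 30.0, 25.0, 48.0, 60.0]   # hypothetical per-episode rewards
smoothed_rewards = ema(episode_rewards, alpha=0.99)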
Fig. 5 illustrates the results from the first environment, where training was started from scratch. In Fig. 5(a), the proposed model is the most prominent, which means it received the maximum reward in most episodes. Fig. 5(b) depicts the smoothed reward: the proposed model performs best among all the models at the end of training and received the maximum reward at the 1100th episode. The curve remained stable after the 1100th episode, which indicates that the proposed model was trained faster than the other models. In the variance plot, depicted in Fig. 5(d), the spikes represent the deviation of an episodic reward from the mean. In the beginning, all the models have approximately the same variance; a peak found between the 26th and 30th episodes suggests that the training of the proposed model tended towards over-fitting, but as training continued its variance decreased and reached a minimum at the end of training.
In the second environment, the trained weights from the first environment were used to initialise training. Fig. 6 shows the reward pattern of all the models. It can be clearly seen in Fig. 6(b) that the proposed model received higher rewards from the beginning until the end of the training. The reward values of DQN and DDQN are similar at the end of the training, while D3QN performed better than both. The stability of LND3QN is clearly seen from the variance plot in Fig. 6(d): at the start of training the D3QN model tends towards overfitting, as it exhibits high variance, but from the 7th interval onwards the model starts learning.
Fig. 6. Performance curves of all the Q-variant models and the proposed model in the second environment. The reward at the end of each episode and its smoothed values are depicted in (a) and (b), respectively; (c) and (d) show the average reward and variance.

Fig. 7. Comparison of all the models in terms of the reward received in the third environment. (a) portrays the reward at each episode, (b) shows the smoothed reward over each episode, and the average reward and variance are plotted in (c) and (d), respectively.

TABLE I. Comparison between the proposed LND3QN model and three baseline models with respect to the interquartile range in various virtual environments.

Fig. 7 illustrates the reward graph obtained from the third environment for all the models, where the second environment's trained weights were used. Compared to the three baseline models, the LND3QN model outperformed them and yielded higher episodic rewards due to a better exploration strategy, as shown in Fig. 7(b). From the figure, it can be seen that the D3QN
model did not perform well in this environment and obtained the lowest reward, whereas the DQN and DDQN models performed similarly at the end of the training. Once again, the proposed model yielded the lowest variance, as depicted in Fig. 7(d), while DQN exhibited the highest variance at the start and end of training.
DRL-based methods have a high level of variability in performance and are susceptible to different factors, such as hyper-parameters, the environment, and implementation details [40]. This variability causes issues when the algorithm is deployed in a real-world scenario. To account for this, the reliability of all the models was measured. Dispersion across runs (DR) is one way to quantify the variability of a model. It can be measured by either the variance or the standard deviation, but we measured it using the interquartile range (IQR), since it is a robust statistic.
The IQR is the difference between the third quartile value and the first quartile value. We applied the analysis of
variance (ANOVA) to the reward values. Table I lists the IQR
value of each model. In the first environment, LND3QN had an
IQR of 81, while DDQN had the lowest IQR of 41. However,
DQN had a slightly better IQR in comparison to D3QN.
In environment 2, D3QN performed better with an IQR of 63,
while the other models had lower IQR values. In the third
environment, the proposed model performed outstandingly,
with an IQR of 45, which is 68% more than the baseline
models. On the contrary, all the baseline models had IQR
values below 20. For better visualisation of reliability and
more statistical analysis, boxplots show how the rewards are
dispersed across the episodes in all the scenarios, as shown in
Fig. 8.
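The dispersion-across-runs measure used here is straightforward to compute from the per-episode rewards; the NumPy sketch below shows the IQR calculation on a made-up reward sequence.

import numpy as np

def interquartile_range(episode_rewards):
    # IQR = Q3 - Q1, a robust measure of how the episodic rewards are dispersed.
    q1, q3 = np.percentile(episode_rewards, [25, 75])
    return q3 - q1

rewards = np.array([42, 110, 95, 130, 60, 125, 88, 140])   # hypothetical episodic rewards
print(interquartile_range(rewards))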
The results were evaluated in terms of range, median, and
outliers. Fig. 8(a) demonstrates the results of the first environment. The figure shows that the proposed model yielded higher
rewards in most of the episodes, with a median value of 120,
while DQN obtained the lowest median value. DDQN had a
smaller IQR with one outlier; however, the median reward was
higher than DQN and D3QN models. In the second environment, the median achieved by the DQN and DDQN models
plunged to below 55. However, the suggested LND3QN model
attained the highest median value, as shown in Fig. 8(b). In the
last environment, the proposed model performed very well
from the start of the training compared to the others, as shown
in Fig. 8(c). LND3QN achieved the highest median value,
while the baseline models had median values below 50. If we look at the third quartile of the proposed model in Fig. 8(a–c), we notice that it is higher than that of all the baseline models, which shows that LND3QN attained the highest rewards and that its success rate is also high. An episode is considered successful if it lasts for 500 steps without a collision during training.
A further statistical comparison was made in terms of the average reward and the standard deviation (SD) of the return, as presented in Table II. This statistical analysis further supports the stability and reliability of the proposed model. The average reward of all the baseline models in the first environment was below 90, whereas the LND3QN average reward was 98.02. Similarly, the SD of LND3QN was higher than that of the other models.
In the second environment, LND3QN obtained a lower SD
value than D3QN, but its average reward was higher than that
of others. The superiority of the proposed model was shown in
the last environment, where the SD achieved by DQN, DDQN,
and D3QN models was below 11, and the SD attained by
LND3QN was 22. Also, the average reward of the proposed
model was 76, while the others were below 55.
B. Experiments in Real-World Scenarios
Real-world experiments were conducted in an indoor environment with three different scenarios. A tracked robot
equipped with an Intel Realsense camera and Lidar was
used throughout the experiments, as shown in Fig. 9. The
dimensions of the robot were 43 × 37 × 18 cm, and it weighed 35 kg. The tracks were driven by four servo motors, each of which communicated over an RS232 cable. In addition, two small motors were attached to the camera to turn it left, right, up, and down. All motors were connected to a UDOO x86 computer, while the sensors were powered by an Intel Core i5 mini PC. The ROS was installed on both
systems, with the UDOO x86 serving as the ROS master and a wireless router used for communication between them.

Fig. 8. Dispersion across runs in all three environments. Better reliability is indicated by more positive values. The y-axis represents the range of reward values.

TABLE II. Results and descriptions of the values reported by the DQN variants and the proposed model. "Length (iters)" denotes algorithm iterations and "Length (episodes)" denotes the number of episodes.

Fig. 9. Tracked robot equipped with an Intel Realsense depth camera and a Lidar sensor. (a) and (b) show the front and side views of the tracked robot, respectively.
Real-world experiments were conducted in an indoor environment with different scenarios. In the first scenario, two
boxes were placed, as shown in Fig. 10(a). The goal of the
robot was to avoid both of them and to navigate autonomously.
Depth images were captured from the Intel Realsense camera
and passed to the proposed model, LND3QN, to determine
the actions. In the beginning, the robot moved forward with a linear velocity and then changed its direction to the right by selecting a larger angular velocity to avoid the first obstacle. As it crossed the first obstacle, the second obstacle was nearby, so it again changed its direction to the right by increasing the angular velocity and minimising the linear velocity. Once it crossed the second obstacle, it went forward again and approached a wall. Similarly, the robot turned left by receiving a larger angular velocity command to avoid a collision with the wall. During the experiments, the angular velocities were recorded, and graphs of the selected actions were plotted.

Fig. 10. Different scenarios in an indoor environment used to evaluate the validity and adaptability of our proposed model.
Fig. 11 illustrates six intermediate time-step images, with the respective angular velocities shown at the bottom. All the action commands were provided by the proposed model, which was trained only in the simulation environment. In addition to the complex scenarios, the suggested model was evaluated in other unseen environments, as shown in Fig. 10(b), where more obstacles were placed. As shown in Fig. 12, the robot first turned left to avoid a collision and travelled straight until it came closer to the obstacle. Once it reached the obstacle, the robot changed its direction by decreasing the linear velocity and increasing the angular velocity to turn left.

Fig. 11. Autonomous navigation in the second real-world indoor scenario. The images show six intermediate time steps, with the action selections by the robot illustrated below them.

Fig. 12. Action selection by the robot in the first indoor scenario. The curve below the images shows the steering action selected by the tracked robot at each step.

Fig. 13. Action choices by the robot in the third real-world scenario. The graph shows the angular actions taken by the tracked robot at each time step.
Furthermore, we evaluated the proposed model in more
complex scenarios by placing more obstacles, which resembles
a cluttered environment, as shown in Fig. 10(c). In the beginning, the robot was placed near a wall. In order not to bump into the wall, it changed its direction to the left and moved forward until it approached the next wall. Similarly, the action commands from the learned policy enabled the robot to change its direction once again to avoid the obstacle, as shown in Fig. 13.
Experiments were also conducted in a dynamic
environment, where a person was moving in a 4 × 4 m²
area, as presented in Fig. 14. The goal was to avoid the
moving obstacle (human) and the static obstacles placed
in the testing environment. Fig. 14 (a) shows the layout of
the testing environment, with the two obstacles indicated by a green and a yellow box. The black box indicates the robot, with an arrow that represents the starting point. In Fig. 14(b), a complete trajectory is shown in red. Fig. 14(c–f) shows the path covered by the robot at each time step with both static and moving obstacles, where the red colour shows the path in the current scenario and the light red colour indicates the path taken by the robot in the previous scenario. The blue arrow indicates the movement of the person in the testing environment. For a better understanding of the person's movement, the starting point is shown as an opaque figure and the final position as a solid figure. We also changed the locations of the static obstacles, as shown in Fig. 14(d–f).

Fig. 14. Experiments conducted in a real-world dynamic environment. The path covered by the robot in the current scenario is indicated in red, whereas light red shows the path taken by the robot in the previous scenario. Obstacles are represented by the green and yellow boxes. The black box depicts the robot's starting location, and the blue arrow shows the movement of a person. (a) Layout of the testing environment. (b) Experiment performed in a static environment. (c–f) Experiments performed in a dynamic environment, with relocation of the static obstacles.
V. CONCLUSION
In this article, we presented a novel DRL-based method for autonomous navigation of a tracked robot. DRL-based methods require a large amount of training, which slows down the training of neural networks due to the computational cost. Similarly, the exploration-exploitation trade-off also affects learning. We focused on these two issues and proposed the LND3QN method, which accelerates the training of neural networks by applying layer normalisation before each convolutional layer. The injection of noise into the fully connected layers improved the exploratory nature of the tracked robot. The proposed method derives the action commands directly from depth images through a carefully designed network architecture: a CNN was used to extract features from four consecutive depth images, and Q-values were calculated from these features.
The proposed model, LND3QN, was simulated, analyzed,
and compared with three baseline models, namely DQN,
DDQN, and D3QN. It is worth noting that the proposed model outperformed all the baseline models in terms of average reward and variance. Furthermore, statistical analysis was performed to assess the variability of the models in terms of DR, and the results show that LND3QN has better reliability than the other Q-variant models. Real-world experiments were conducted using a tracked robot in different scenarios, and the results illustrate that the proposed model generalizes well to unseen environments and is capable of determining the steering actions. In future work, we will use prioritised experience replay, which may further reduce the learning time and increase performance by preferentially replaying the most informative transitions. The model may also become more efficient if more past transitions are included to better understand the surrounding environment and if the reward function is modified.
REFERENCES
[1] R. Hasan, S. Asif Hussain, S. Azeemuddin Nizamuddin, and S. Mahmood, “An autonomous robot for intelligent security systems,” in Proc.
9th IEEE Control Syst. Graduate Res. Colloq. (ICSGRC), Shah Alam,
Malaysia, Aug. 2018, pp. 201–206.
[2] F. Ingrand and M. Ghallab, “Deliberation for autonomous robots: A
survey,” Artif. Intell., vol. 247, pp. 10–44, Jun. 2017.
[3] A. Pandey, “Mobile robot navigation and obstacle avoidance techniques:
A review,” Int. Robot. Autom. J., vol. 2, no. 3, May 2017, Art. no. 00022.
[4] H. Isakhani, N. Aouf, O. K. Stamatis, and J. F. Whidborne, “A furcated
visual collision avoidance system for an autonomous micro robot,” IEEE
Trans. Cogn. Develop. Syst., vol. 12, no. 1, pp. 1–11, Mar. 2020.
[5] A. Al-Kaff, F. García, D. Martín, A. De La Escalera, and J. Armingol,
“Obstacle detection and avoidance system based on monocular camera
and size expansion algorithm for UAVs,” Sensors, vol. 17, no. 5, p. 1061,
May 2017.
[6] C.-H. Chien, C.-C.-J. Hsu, W.-Y. Wang, and H.-H. Chiang, “Indirect
visual simultaneous localization and mapping based on linear models,”
IEEE Sensors J., vol. 20, no. 5, pp. 2738–2747, Mar. 2020.
[7] C. Debeunne and D. Vivet, “A review of visual-LiDAR fusion based
simultaneous localization and mapping,” Sensors, vol. 20, no. 7, p. 2068,
2020.
[8] J. Wang, V. A. Shim, R. Yan, H. Tang, and F. Sun, “Automatic
object searching and behavior learning for mobile robots in unstructured
environment by deep belief networks,” IEEE Trans. Cogn. Develop.
Syst., vol. 11, no. 3, pp. 395–404, Sep. 2019.
[9] S. Stevsic, T. Nageli, J. Alonso-Mora, and O. Hilliges, “Sample efficient learning of path following and obstacle avoidance behavior for
quadrotors,” IEEE Robot. Autom. Lett., vol. 3, no. 4, pp. 3852–3859,
Oct. 2018.
[10] F. Codevilla, M. Muller, A. Lopez, V. Koltun, and A. Dosovitskiy,
“End-to-end driving via conditional imitation learning,” in Proc. IEEE
Int. Conf. Robot. Autom. (ICRA), Brisbane, QLD, Australia, May 2018,
pp. 1–9.
[11] K. Wu, M. Abolfazli Esfahani, S. Yuan, and H. Wang, “TDPP-net:
Achieving three-dimensional path planning via a deep neural network
architecture,” Neurocomputing, vol. 357, pp. 151–162, Sep. 2019.
[12] F. Sadeghi and S. Levine, “CAD2RL: Real single-image flight without
a single real image,” presented at the Robot., Sci. Syst. XIII (RSS),
Massachusetts Institute of Technology (MIT), Cambridge, MA, USA,
Jul. 2017.
[13] P. Long, T. Fanl, X. Liao, W. Liu, H. Zhang, and J. Pan, “Towards
optimally decentralized multi-robot collision avoidance via deep reinforcement learning,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA),
May 2018, pp. 6252–6259.
[14] J. Bruce, N. Suenderhauf, P. Mirowski, R. Hadsell, and
M. Milford, “One-shot reinforcement learning for robot navigation
with interactive replay,” 2017, arXiv:1711.10137. [Online]. Available:
https://arxiv.org/abs/1711.10137
[15] M. Pfeiffer et al., “Reinforced imitation: Sample efficient deep reinforcement learning for mapless navigation by leveraging prior demonstrations,” IEEE Robot. Autom. Lett., vol. 3, no. 4, pp. 4423–4430,
Oct. 2018.
2239
[16] L. Tai, J. Zhang, M. Liu, and W. Burgard, “Socially compliant navigation through raw depth inputs with generative adversarial imitation
learning,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), Brisbane,
QLD, Australia, May 2018, pp. 1111–1117.
[17] V. Mnih et al., “Playing atari with deep reinforcement learning,” 2013,
arXiv:1312.5602. [Online]. Available: https://arxiv.org/abs/1312.5602
[18] G. Lample and D. S. Chaplot, “Playing FPS games with deep reinforcement learning,” presented at the 31st AAAI Conf. Artif. Intell. (AAAI),
California, CA, USA, Feb. 2017.
[19] L. Tai and M. Liu, “Towards cognitive exploration through deep
reinforcement learning for mobile robots,” 2016, arXiv:1610.01733.
[Online]. Available: http://arxiv.org/abs/1610.01733
[20] J. Zhang, J. T. Springenberg, J. Boedecker, and W. Burgard, “Deep
reinforcement learning with successor features for navigation across
similar environments,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst.
(IROS), Vancouver, BC, Canada, Sep. 2017, pp. 2371–2378.
[21] L. Xie, S. Wang, A. Markham, and N. Trigoni, “Towards
monocular vision based obstacle avoidance through deep reinforcement learning,” 2017, arXiv:1706.09829. [Online]. Available:
http://arxiv.org/abs/1706.09829
[22] I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, and N. Navab,
“Deeper depth prediction with fully convolutional residual networks,”
in Proc. 4th Int. Conf. 3D Vis. (3DV), Stanford, CA, USA, Oct. 2016,
pp. 239–248.
[23] K. Wu, M. Esfahani, S. Yuan, and H. Wang, “Learn to steer through deep
reinforcement learning,” Sensors, vol. 18, no. 11, p. 3650, Oct. 2018.
[24] M. Fortunato et al., “Noisy networks for exploration,” in Proc. Int. Conf.
Learn. Represent., 2018, pp. 1–18.
[25] K. Wu, H. Wang, M. A. Esfahani, and S. Yuan, “BND*-DDQN:
Learn to steer autonomously through deep reinforcement learning,”
IEEE Trans. Cognit. Develop. Syst., early access, Jul. 16, 2020,
doi: 10.1109/TCDS.2019.2928820.
[26] V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, Feb. 2015.
[27] H. Tang et al., “Exploration: A study of count-based exploration for
deep reinforcement learning,” in Proc. Adv. Neural Inf. Process. Syst.,
2017, pp. 2753–2762.
[28] R. Houthooft, X. Chen, Y. Duan, J. Schulman, F. DeTurck, and
P. Abbeel, “VIME: Variational information maximizing exploration,” in
Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 1109–1117.
[29] J. Achiam and S. Sastry, “Surprise-based intrinsic motivation for deep
reinforcement learning,” presented at the Int. Conf. Learn. Represent.
(ICLR), Toulon, France, Apr. 2017.
[30] D. Pathak, P. Agrawal, A. A. Efros, and T. Darrell, “Curiosity-driven
exploration by self-supervised prediction,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Honolulu, HI, USA,
Jul. 2017, pp. 16–17.
[31] G. Ostrovski, M. G. Bellemare, A. Oord, and R. Munos, “Count-based
exploration with neural density models,” in Proc. 34th Int. Conf. Mach.
Learn., vol. 70, 2017, pp. 2721–2730.
[32] I. Osband, C. Blundell, A. Pritzel, and B. V. Roy, “Deep exploration
via bootstrapped DQN,” in Proc. Adv. Neural Inf. Process. Syst., 2016,
pp. 4026–4034.
[33] T. Salimans and D. P. Kingma, “Weight normalization: A simple
reparameterization to accelerate training of deep neural networks,” in
Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 901–909.
[34] M. Zimmer and S. Doncieux, “Bootstrapping Q-learning for robotics
from neuro-evolution results,” IEEE Trans. Cogn. Develop. Syst., vol. 10,
no. 1, pp. 102–119, Mar. 2018.
[35] R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning,
vol. 135. Cambridge, MA, USA: MIT Press, 1998.
[36] Z. Wang, T. Schaul, M. Hessel, H. V. Hasselt, M. Lanctot, and
N. D. Freitas, “Dueling network architectures for deep reinforcement
learning,” in Proc. 33rd Int. Conf. Mach. Learn. (PMLR), vol. 48, 2016,
pp. 1995–2003.
[37] J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” 2016,
arXiv:1607.06450. [Online]. Available: https://arxiv.org/abs/1607.06450
[38] M. Usama and D. Eui Chang, “Learning-driven exploration for
reinforcement learning,” 2019, arXiv:1906.06890. [Online]. Available:
http://arxiv.org/abs/1906.06890
[39] M. Abadi et al., “TensorFlow: A system for large-scale machine learning,” in Proc. 12th USENIX Symp. Operating Syst. Design Implement.
(OSDI), vol. 16, 2016, pp. 265–283.
[40] L. J. Lin, “Self-improvement based on reinforcement learning, planning
and teaching,” in Proc. 8th Int. Workshop, San Mateo, CA, USA: Morgan
Kaufmann, 1991.
Muhammad Mudassir Ejaz received the B.Eng.
degree in biomedical engineering from the
NED University of Engineering and Technology, Pakistan, in 2015. He is currently pursuing
the master’s degree with Universiti Teknologi
PETRONAS (UTP), Malaysia. Since 2018,
he has been with the Smart Assistive and Rehabilitative Technology (SMART), Department of
Electrical and Electronics Engineering, UTP.
His research interests include image and video
processing, robot vision, deep reinforcement
learning, and deep learning.
Tong Boon Tang (Senior Member, IEEE)
received the B.Eng. (Hons.) and Ph.D. degrees
from The University of Edinburgh. He is currently
the Director of the Institute of Health and Analytics for Personalized Care, UTP, Malaysia. His
research interests include biomedical instrumentation, from device and measurement to data
fusion. He serves as the Secretary of the HICoE
Council and the Chair of the IEEE Circuits and
Systems Society Malaysia Chapter.
Cheng-Kai Lu (Senior Member, IEEE) received
the B.S. and M.S. degrees in electronics engineering from Fu Jen Catholic University, Taipei,
Taiwan, in 2001 and 2003, respectively, and the
Ph.D. degree in engineering from The University of Edinburgh, U.K., in 2012. After graduation, he worked as the Director of the Research
and Development Division, Chyao Shiunn Electronic Industrial Company Ltd., Shanghai, China,
before he joined the National Applied Research
Laboratories, Science and Technology Policy
Research and Information Centre, Taiwan. He is currently a Faculty Member of the Electrical and Electronic Engineering Department, Universiti
Teknologi PETRONAS (UTP), Malaysia. His research interests include
medical imaging, embedded systems, and artificial intelligence, and their
applications and clinical decision support systems. Apart from academic
experience, he has more than eight years of industrial work experience.
He has not only published his research works on peer-reviewed articles
(book chapters, journal articles, conferences papers, and reports) but
also has filed a couple of patents. The most significant technical contributions to date by him are on production line automation to significantly
reduce labor costs in manufacturing while he served as the Director of the
Research and Development Division, Chyao Shiunn Electronic Industrial
Company Ltd., from 2012 to 2014. His patents have also been licensed
out successfully, and he receives partial royalties from patent licensing
agreements. It is worthwhile to mention that one of his inventions (TW
Patent: PS keyboard system, 2006) has been adopted by the Republic
of China Air Force and has further been applied to light aircraft and the
specific long-haul flight plane. He has served as an Executive Member
for the IEEE EMBS Malaysia Chapter and the Penang Chapter from
January 2017 to February 2018 and since 2018.