This article has been accepted for publication in IEEE Journal of Biomedical and Health Informatics. This is the author's version, which has not been fully edited; content may change prior to final publication. Citation information: DOI 10.1109/JBHI.2023.3265857

Contactless Blood Pressure Measurement via Remote Photoplethysmography with Synthetic Data Generation Using Generative Adversarial Networks

Bing-Fei Wu, Fellow, IEEE, Li-Wen Chiu, Student Member, IEEE, Yi-Chiao Wu, Student Member, IEEE, Chun-Chih Lai, Hao-Min Cheng, Pao-Hsien Chu

Abstract— Remote photoplethysmography (rPPG) has been used to measure vital signs such as heart rate, heart rate variability, blood pressure (BP), and blood oxygen. Recent studies adopt features developed for photoplethysmography (PPG) to achieve contactless BP measurement via rPPG. These features fall into two groups: time or phase differences between multiple signals, and waveform features of a single signal. Here we devise a solution to extract time difference information from rPPG signals captured at 30 FPS, and we propose a deep learning model architecture to estimate BP from the extracted features. To prevent overfitting and compensate for the lack of data, we leverage a multi-model design and generate synthetic data. We also use BP-related subject information to assist model learning. For real-world usage, the subject information is replaced with values estimated from face images, with performance that still surpasses the state-of-the-art. To the best of our knowledge, the improvements stem from (1) model selection with estimated subject information, (2) replacing the estimated subject information with real values, (3) InfoGAN-assisted training (synthetic data generation), and (4) the time difference features as model input. To evaluate the proposed method, we conduct a series of experiments, including dynamic BP measurement for many single subjects and nighttime BP measurement under infrared lighting. Our approach reduces the MAE from 15.49 to 8.78 mmHg for systolic blood pressure (SBP) and from 10.56 to 6.16 mmHg for diastolic blood pressure (DBP) on a self-constructed rPPG dataset. On the Taipei Veterans General Hospital (TVGH) dataset for nighttime applications, the MAE is reduced from 21.58 to 11.12 mmHg for SBP and from 9.74 to 7.59 mmHg for DBP, improvement ratios of 48.47% and 22.07% respectively.

Index Terms— contactless blood pressure measurement, deep learning, nighttime blood pressure, remote photoplethysmography, synthetic data generation

This work was supported in part by the National Science and Technology Council under Grant MOST 111-2221-E-A49-166-MY3. B.-F. Wu, L.-W. Chiu, Y.-C. Wu, and C.-C. Lai are with the Institute of Electrical and Control Engineering, National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan. Hao-Min Cheng is with the Department of Medical Research and Education, Taipei Veterans General Hospital. Pao-Hsien Chu is with Chang Gung Memorial Hospital, Chang Gung University.

I. INTRODUCTION

Blood pressure (BP) is a meaningful vital sign. High BP is an important factor in health issues such as stroke, other cardiovascular diseases, and kidney disease [1], [2]. Recent work [3] has shown that about 626 million women and 652 million men suffer from hypertension, over 40% of whom have never been diagnosed.
A convenient BP measurement approach is necessary for the early detection of hypertension. One common non-invasive method is the BP cuff [4]–[6]. To overcome the limitations of contact measurement, recent research has largely focused on cuffless BP measurement through electrocardiography (ECG) and photoplethysmography (PPG) signals. The time or phase difference, e.g., the pulse transit time (PTT), is a common approach to deriving the BP value from ECG or PPG signals. In broad terms, PTT is the traveling-time difference of the arterial pulse wave between two sites, generally a pair of waveform signals from a proximal and a distal observation site. The proximal waveform is commonly the central-body ECG and the distal waveform a fingertip or ear-lobe PPG signal; the starting point is set as the Q or R wave of the ECG signal, and the endpoint at approximately 50% of the maximum height of the PPG signal [7]–[9]. Alternatively, the signal sources can be multiple PPGs rather than ECG and PPG signals [10]–[13].

Instead of extracting PTT from multiple signals, waveform feature analysis, such as the derivatives or morphology of the blood volume waveform, has been used to measure BP from a single signal. Slapničar et al. [14] use a neural network with the first and second derivatives of the PPG signal as inputs to predict BP. Chakraborty et al. [15] analyze the PPG waveform to extract features containing pulse wave velocity information and regress the BP value. Haddad et al. [16] predict BP via multi-linear regression with the first and second derivatives of single PPG signals as input. With the development of deep learning, several end-to-end approaches have been devised to map the PPG waveform to BP values in a single stage. Deep learning approaches such as convolutional (CNN) and recurrent neural networks bring waveform feature analysis to a higher level. Han et al. [17] train a multi-task CNN model to extract PPG features, which are concatenated with body mass index (BMI) information to predict both hypertension classification and BP values.

In contactless measurement, pulse signals can be extracted from serial RGB images. This method is termed remote photoplethysmography (rPPG) and has been widely leveraged for heart rate measurement [18]–[20]. Recently, rPPG signals have also been applied to BP measurement with both conventional waveform feature analysis and deep learning-based approaches. For conventional waveform features, Zhou et al. [21] extract valid peaks and valleys from rPPG signals and adopt their averages together with BMI as features to fit BP using linear regression. Rong et al. [22] adopt additional rPPG features, including area, slope, and energy, as input to a neural network that predicts BP values.
For high-level waveform features via deep learning, Schrumpf et al. [23] obtain better-quality PPG or rPPG signals by filtering and computing signal-to-noise ratios, and predict BP with classical networks such as AlexNet, ResNet, and long short-term memory. In [24], multi-channel rPPG signals, heart rate values, and BMI are fed into a CNN modified from ResNet18 [25]; a training loop is then applied to fine-tune the model, fitting signals filtered with varying band-pass filters.

The main aim of this study is to propose a convenient BP measurement method for healthcare applications using rPPG signals. The morphological properties of rPPG differ greatly from those of PPG, because the arterial pulse waveform changes along the arterial tree [26]. We propose an encoder-decoder (ENC-DEC) model with symmetric skip connections for high-level feature extraction and noise filtering. However, given their data-driven nature, deep learning models are prone to overfitting, and it is difficult to collect enough data to prevent this. Models therefore tend to output the most common value in the training dataset, so large errors occur disproportionately in the hypertension group, which is impractical for real applications. We address this problem with a multi-model structure and synthetic data generation, and we evaluate the model under a cross-dataset testing protocol to verify its efficiency.

To account for time or phase differences, the BP-related handcrafted feature, we use multi-channel rPPG signals as the input. These signals are upsampled to resolve the FPS limitation on the time difference feature. Liu et al. [13] applied PTT-based BP measurement to multi-site PPG from the wrist and finger at a 125 Hz sampling frequency. The wrist-to-finger distance is almost as short as the distance between the upper and lower face, while 125 Hz is much higher than 30 FPS imaging. Motivated by this, we upsample the RGB channels to 150 and 180 FPS to provide a more reasonable resolution for time difference feature extraction. Furthermore, as handcrafted PTT depends on the selected sites [27], we extract several phase difference signals to emphasize the time difference features carried in the rPPG signals. These phase difference signals are expected to form a comprehensive representation of the time difference characteristics and to reduce the negative influence of impulse noise. Note that the time difference feature represents only changes in BP, not its absolute value; this necessitates a calibration procedure correlated with the distance between the two observation sites [28]. In this study, subject information, namely age [26] and BMI [29], is used for model selection as an approximate calibration procedure. We further generate synthetic data that fluctuates according to the subject information, enhancing prediction in the hypertension group. We evaluate the proposed method on the TVGH dataset, a nighttime dataset constructed for this purpose; since subject information was not recorded in this dataset, we use age and BMI estimated from facial images.

The contributions of this study are as follows:
1. We propose a multi-model structure to mitigate overfitting in the deep learning model, selecting models according to subject information as an approximate calibration procedure.
2. The proposed method is a single-camera, image-only implementation.
Signal pre-processing is leveraged to overcome the low sampling rate of a normal camera, and subject information estimated from facial images enables contactless BP estimation purely from images.
3. For a thorough assessment, we construct three datasets with various properties, including large sample sizes with diverse diagnoses, long-term dynamic BP changes, and nighttime scenarios. In addition, we apply cross-dataset, cross-domain evaluation for the dynamic and nighttime applications.

The remainder of this paper is structured as follows. Section II describes the proposed method and the implementation details. Section III introduces the assessments, including the datasets and metrics. Section IV presents the experimental results, and Section V concludes.

II. PROPOSED METHOD

A. Overall Structure
The overall structure is shown in Fig. 1. The proposed method includes modules for signal processing, model selection with subject information, and BP estimation with a multi-model structure. Given serial face images as the system input, the images are fed to both the signal processing and subject information estimation modules. Signal processing includes rPPG signal extraction and feature extraction; subject information estimation includes age recognition and BMI estimation. The estimated age and BMI index a model selection table built from the training data, which decides which BP model to use. The extracted rPPG signals, or the time difference features f_td, are fed to the systolic blood pressure (SBP) and diastolic blood pressure (DBP) models selected according to the subject information. Below we describe the implementation, including face rPPG signal extraction, time difference feature extraction, the deep learning model architecture, model selection with subject information, and synthetic data generation with InfoGAN [30].

Fig. 1: Overall structure of the proposed method.

B. Signal Processing
1) Face rPPG Signal Extraction: We use the Multitask Cascaded Convolutional Neural Network (MTCNN) [31] to detect the facial bounding box and five facial landmarks: the left eye, the right eye, the nose, and the left and right corners of the mouth. We then convert the face image to the YCbCr color space and employ a skin detector to filter out non-skin parts. The regions of interest (ROIs) of the upper and lower face are segmented at the nose landmark. As 30 FPS is too low a sampling rate to capture the time difference information in the rPPG, we upsample the R, G, and B channel traces to 180 FPS before computing the rPPG. The upsampling process, interpolation with polynomial fitting applied to the color channel signals, enhances the physiological information already carried in the pulse signal: it reconstructs the periodic pulse signals contained in the color channels and provides more points for rPPG construction. After chrominance-based (CHROM) [18] rPPG extraction, an 8th-order Chebyshev filter with a 0.5–7 Hz passband is applied to filter out noise and reduce interference in subsequent feature extraction.
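The paper does not publish pre-processing code, so the following is a minimal NumPy/SciPy sketch of the steps just described. The cubic-spline interpolation (standing in for the unspecified "polynomial fitting") and the Chebyshev type-II design with 30 dB stop-band attenuation are assumptions; the paper states only an 8th-order Chebyshev filter with a 0.5–7 Hz passband.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import cheby2, filtfilt

def upsample_channels(rgb, fs_in=30.0, fs_out=180.0):
    """Upsample per-frame R/G/B mean traces (shape [3, N]) by spline
    interpolation, as in Sec. II-B1, before rPPG construction."""
    n = rgb.shape[1]
    t_in = np.arange(n) / fs_in
    # uniform grid at the target rate over the same time span
    t_out = np.linspace(0.0, t_in[-1], int(n * fs_out / fs_in))
    return np.stack([CubicSpline(t_in, ch)(t_out) for ch in rgb])

def bandpass_rppg(rppg, fs=180.0, low=0.5, high=7.0, rs=30.0):
    """0.5-7 Hz band-pass; a band-pass cheby2 of design order 4 has filter
    order 8. Type-II and the 30 dB attenuation are assumptions; filtfilt
    is used here for zero-phase output."""
    b, a = cheby2(4, rs, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, rppg)
```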
2) Feature Extraction: The previous stage yields the rPPGs y = [y_u, y_l] from the upper and lower face. We extract the time difference features f_td from the phase information by

    f_{td} = \frac{1000 \cdot 60}{2\pi \cdot HR} \, \phi,    (1)

where f_td is the time difference feature in milliseconds, HR is the heart rate, estimated as the maximum component of the magnitude spectrum within the 0.5–3.3 Hz rPPG band, and \phi \triangleq [\phi_e, \phi_p, \phi_c] are phase differences in the range [0, 2\pi) between the two rPPGs. \phi_e, \phi_p, and \phi_c are computed from the energy spectral density (ESD), the power spectral density (PSD), and the cross-correlation (X-corr), respectively.

ESD: The phase difference \phi_e is defined as

    \phi_e \triangleq \angle Y_u(\omega_e^u) - \angle Y_l(\omega_e^l),    (2)

where \omega_e^u and \omega_e^l are the frequency points of maximum ESD for the upper and lower face,

    \omega_e = \arg\max_\omega E(\omega),    (3)

and the energy spectral density is

    E(\omega) = |Y(\omega)|^2,    (4)

with Y(\omega) the discrete Fourier transform of the rPPG signal.

PSD: The phase difference \phi_p is defined as

    \phi_p \triangleq \angle Y_u(\omega_p^u) - \angle Y_l(\omega_p^l),    (5)

where \omega_p^u and \omega_p^l are the frequency points of maximum PSD P_u and P_l,

    \omega_p = \arg\max_\omega P(\omega),    (6)

and the PSD is the discrete Fourier transform of the auto-correlation of the signal:

    P(\omega) = \mathcal{F}\{\mathrm{corr}(y(t), y(t))\}.    (7)

X-corr: The phase difference \phi_c is derived from the time delay of maximum cross-correlation:

    \phi_c \triangleq \frac{2\pi}{T} \cdot \arg\max_\tau C_{u,l}(\tau),    (8)

where T is the pulse period and \tau refers to the time lag. C_{u,l}(\tau) is the cross-correlation of the two rPPGs,

    C_{u,l}(\tau) = \sum_{t=0}^{L-1} y_u(t)\, y_l(t+\tau), \quad -\frac{L}{2} \le \tau \le \frac{L}{2},    (9)

where L is the signal length.
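As a concrete reading of Eqs. (1)-(9), the sketch below computes the three phase differences and scales them to milliseconds. It assumes the pulse period T in Eq. (8) is the heart period expressed in samples, and wraps all phases to [0, 2π) as the text specifies; the function and variable names are illustrative, not from the authors' code.

```python
import numpy as np

def time_difference_features(y_u, y_l, fs, hr_bpm):
    """f_td = 1000*60/(2*pi*HR) * [phi_e, phi_p, phi_c]  (Eqs. 1-9)."""
    L = len(y_u)
    Yu, Yl = np.fft.rfft(y_u), np.fft.rfft(y_l)

    # ESD phase (Eqs. 2-4): phase at each signal's own max-|Y|^2 bin
    phi_e = np.angle(Yu[np.argmax(np.abs(Yu) ** 2)]) \
          - np.angle(Yl[np.argmax(np.abs(Yl) ** 2)])

    # PSD phase (Eqs. 5-7): bins picked from the DFT of the auto-correlation
    Pu = np.abs(np.fft.rfft(np.correlate(y_u, y_u, "full")[L - 1:]))
    Pl = np.abs(np.fft.rfft(np.correlate(y_l, y_l, "full")[L - 1:]))
    phi_p = np.angle(Yu[np.argmax(Pu)]) - np.angle(Yl[np.argmax(Pl)])

    # X-corr phase (Eqs. 8-9): lag of max cross-correlation with |tau| <= L/2,
    # expressed as a fraction of the pulse period T (heart period in samples)
    full = np.correlate(y_u, y_l, "full")             # lags -(L-1) .. (L-1)
    lags = np.arange(-(L - 1), L)
    keep = np.abs(lags) <= L // 2
    tau = lags[keep][np.argmax(full[keep])]
    T = fs * 60.0 / hr_bpm
    phi_c = 2.0 * np.pi * tau / T

    phi = np.mod([phi_e, phi_p, phi_c], 2.0 * np.pi)   # wrap to [0, 2*pi)
    return 1000.0 * 60.0 / (2.0 * np.pi * hr_bpm) * phi  # milliseconds
```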
C. Deep Learning Model
To let the model extract more refined time difference information from the model input (rPPGs y or features f_td), an ENC-DEC architecture is adopted as the backbone model. With symmetric skip connections, redundancy in the features f_td can be filtered, and important information does not vanish as the model depth increases. Fig. 2 is a diagram of the backbone model: it consists of a 5-layer 1D-convolution encoder, a 5-layer 1D transposed-convolution (de-convolution) decoder, and PReLU activation functions. Each model in the multi-model structure is composed of the ENC-DEC architecture (F) and a fully connected layer (B); SBP and DBP are predicted by separate models. The implementation, the corresponding output sizes, and the number of parameters of each model are listed in Table I.

Fig. 2: Architecture of the backbone model.

TABLE I: Model implementation (PReLU activations; sizes are [batch, channels, length])
| Model | Layers | Output sizes | Parameters |
|---|---|---|---|
| F | Conv1D 1–5 (encoder) | [1,3,512] → [1,8,449] → [1,16,396] → [1,24,349] → [1,32,338] → [1,40,333] | 87.96K (total) |
|   | ConvTranspose1D 1–5 (decoder) | → [1,32,338] → [1,24,349] → [1,16,396] → [1,8,449] → [1,3,512] | |
| G | ConvTranspose1D 1–5 | [1,1024,1] → [1,128,25] → [1,128,116] → [1,64,248] → [1,2,512] | 3.95M |
| Q | Linear 1 / Linear mean / Linear var | [1,256] / [1,1] / [1,1] | 393.73K |
| D | Linear | [1,1] | 1.54K |
| B | Linear | [1,1] | 1.54K |

D. Model Selection with Subject Information
The multi-model structure ensures that each model focuses on a certain BP range, yielding more precise predictions. BMI and age have a great impact on BP [26], [29]. In this study, these two pieces of subject information determine which model is selected; they can be manually input real values or values estimated from the facial images by the age recognizer and the BMI estimator. Removing the need for manual input makes the proposed method better suited to real-world applications. We analyze the relationship between the estimated age or BMI and the SBP and DBP of all subjects in the training dataset to create an SBP mapping table M_SBP(Age_est, BMI_est) and a DBP mapping table M_DBP(Age_est, BMI_est), using interpolation to populate age-BMI combinations missing from the collection. The SBP and DBP mapping tables are shown in Fig. 3(a) and (c); the horizontal axis is BMI, from 16 to 34, the vertical axis is age, from 18 to 85, and BP values are represented by color. The corresponding model ID M_ID for SBP or DBP is

    M_{ID} = \left\lfloor \frac{M(Age_{est}, BMI_{est}) - R_L}{(R_H - R_L)/N_m} \right\rfloor,    (10)

where M(Age_est, BMI_est) is the average (or interpolated) BP value in the training set for the specific age-BMI pair, R_L and R_H are the lower and upper bounds of the range handled by the proposed method, and N_m is the number of models; the division is floored to an integer index, matching the ranges in Table IIb. The detailed settings are listed in Table IIa. Based on the number of models, the range is evenly divided among the models for model ID conversion, as presented in Table IIb. The constructed model tables are shown in Fig. 3: sub-figures (a) and (c) are the statistical results of the training set, and (b) and (d) are the final SBP and DBP model tables. For the image-only implementation, we also apply age recognition and BMI estimation to the face images.

TABLE II: Multi-model operation ranges
(a) Hyper-parameter settings
| Target | Models (N_m) | Lower bound R_L (mmHg) | Upper bound R_H (mmHg) |
|---|---|---|---|
| SBP | 10 | 90 | 160 |
| DBP | 5 | 65 | 95 |
(b) BP range for model ID conversion
| SBP range | ID | SBP range | ID | DBP range | ID |
|---|---|---|---|---|---|
| [90, 97) | 0 | [125, 132) | 5 | [65, 71) | 0 |
| [97, 104) | 1 | [132, 139) | 6 | [71, 77) | 1 |
| [104, 111) | 2 | [139, 146) | 7 | [77, 83) | 2 |
| [111, 118) | 3 | [146, 153) | 8 | [83, 89) | 3 |
| [118, 125) | 4 | [153, 160] | 9 | [89, 95] | 4 |

Fig. 3: BP mapping tables for model selection: (a) SBP mapping table, (b) SBP model table, (c) DBP mapping table, (d) DBP model table.
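Under the assumption that Eq. (10) floors the ratio to an integer index, which is what reproduces Table IIb, the selection logic amounts to a few lines; the names below are illustrative.

```python
import math

# Hyper-parameters from Table IIa: (R_L, R_H, N_m)
RANGES = {"SBP": (90.0, 160.0, 10), "DBP": (65.0, 95.0, 5)}

def model_id(bp_from_table, target="SBP"):
    """Eq. (10): map the mapping-table value M(age, BMI) to a model index."""
    r_lo, r_hi, n_m = RANGES[target]
    bp = min(max(bp_from_table, r_lo), r_hi - 1e-9)  # clamp to handled range
    return math.floor((bp - r_lo) / ((r_hi - r_lo) / n_m))

# e.g. an average SBP of 120 mmHg for the subject's (age, BMI) cell selects
# SBP model 4, consistent with the [118, 125) row of Table IIb.
assert model_id(120.0, "SBP") == 4
```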
1) Age Recognition: We train an age classifier with 11 age ranges: (0–3), (4–7), (8–12), (13–18), (19–24), (25–30), (31–39), (40–49), (50–59), (60–75), and (75+). Since the classes are ordered, making this an ordinal classification task, soft labels with channel encoding [32] are introduced to improve classifier performance. The final estimated age is the weighted sum of each age range's median and its predicted probability.

2) BMI Estimation: Following [33], we first compute the Pearson correlation coefficients between seven facial geometric ratios and real BMI over all training data, and select the three ratios with the highest correlation as features: the cheekbone-to-jaw-width ratio (CJWR), the width-to-upper-facial-height ratio (WHR), and the perimeter-to-area ratio (PAR). Fig. 4 shows a sample of the features for BMI estimation. CJWR is the ratio of the cheekbone width (red line) to the jaw width (green line), computed as |P1P17| / |P5P13|. WHR is the ratio of the cheekbone width (red line) to the upper facial height (blue line), computed as |P1P17| / |P67Pc|, where Pc is the center between P20 and P25. PAR is the perimeter-to-area ratio of the polygon enclosed by the red and yellow lines, where the area is computed as the sum of three triangle areas, P1P5P13, P5P13P9, and P1P13P17. For landmark detection, we adopt our previous work [34] to inhibit landmark shaking. To locate the facial landmarks more reliably, we apply Gamma correction to facial images with low intensity and a horizontal calibration that brings the right and left eyes to the same height, and we skip images whose confidence scores for the landmarks above are low. Finally, support vector regression (SVR) [35] maps the three ratios to a BMI value.

Fig. 4: Sample geometric attributes for BMI estimation: cheekbone width is marked with the red line, jaw width with the green line, upper facial height with the blue line, and the face perimeter with the yellow line.
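A sketch of the geometry, using the landmark naming of Fig. 4 (P1, P5, P9, P13, P17, the upper-facial-height anchor P67, and Pc, the center between P20 and P25). The exact traversal of the red/yellow perimeter polygon is our assumption, and the SVR kernel and penalty are unspecified in the paper, so scikit-learn defaults stand in.

```python
import numpy as np
from sklearn.svm import SVR

def bmi_features(p):
    """CJWR, WHR, PAR from Sec. II-D2; `p` maps landmark names to (x, y)."""
    pt = lambda k: np.asarray(p[k], dtype=float)
    dist = lambda a, b: float(np.linalg.norm(pt(a) - pt(b)))
    tri = lambda a, b, c: 0.5 * abs(float(np.cross(pt(b) - pt(a), pt(c) - pt(a))))

    cjwr = dist("P1", "P17") / dist("P5", "P13")        # cheekbone / jaw width
    whr = dist("P1", "P17") / dist("P67", "Pc")         # cheekbone / upper facial height
    area = tri("P1", "P5", "P13") + tri("P5", "P13", "P9") + tri("P1", "P13", "P17")
    perim = (dist("P1", "P5") + dist("P5", "P9") + dist("P9", "P13")
             + dist("P13", "P17") + dist("P17", "P1"))  # assumed traversal
    return [cjwr, whr, perim / area]                    # PAR

# SVR maps the three ratios to BMI; illustrative fit on placeholder data
# (real training uses the rPPG dataset ratios and ground-truth BMI).
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 2.0, size=(32, 3))
bmi = rng.uniform(18.0, 32.0, size=32)
svr = SVR().fit(X, bmi)
```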
E. Synthetic Data Generation
Training data for specific age and BMI combinations is lacking. We enhance model training by means of InfoGAN [30], which generates specified data by learning mutual information between latent noise and observations. Fig. 5 shows each training stage. Each BP model comprises a feature extractor F and a regression model B; the input is the time difference feature f_td, and the output is the estimated SBP e_SBP or DBP e_DBP. The generator is G, with input noise z composed of an incompressible part and a semantic part. It is difficult for training to converge if the generator generates the time difference features directly; thus, the target output of the generator is fake rPPG data, denoted ỹ_u and ỹ_l, from which the three fake time difference features f̃_td are computed. The discriminator consists of F and D, and its output prediction p indicates whether the input feature is fake or real. F and Q together act as the auxiliary discriminator that extracts the mutual information between the latent code and the generated features; their output c is expected to learn age and BMI characteristics. This helps generate data that is lacking in the collected dataset, and thus helps the model handle rare cases.

Given \tilde{D} = D \circ F, \tilde{Q} = Q \circ F, and \tilde{B} = B \circ F, the objective function of the original GAN is

    \min_G \max_{\tilde{D}} L_{GAN}(\tilde{D}, G).    (11)

For InfoGAN, the objective function is

    \min_{G,\tilde{Q}} \max_{\tilde{D}} L_{GAN}(\tilde{D}, G) - \lambda_1 L_{Info}(G, \tilde{Q}).    (12)

Hence, the final objective function of the overall training is

    \min_{G,\tilde{Q},\tilde{B}} \max_{\tilde{D}} L_{GAN}(\tilde{D}, G) - \lambda_1 L_{Info}(G, \tilde{Q}) - \lambda_2 L_{BP},    (13)

with hyperparameters \lambda_1 and \lambda_2 both set to 1. L_{GAN}(\tilde{D}, G) is given by

    L_{GAN}(\tilde{D}, G) = \mathbb{E}_{f_{td} \sim P_{real}}[\log \tilde{D}(f_{td})] + \mathbb{E}_{z \sim P_z}[\log(1 - \tilde{D}(\tilde{f}_{td} \mid G(z)))],    (14)

where P_real is the real data distribution and P_z the noise distribution. L_{Info}(G, \tilde{Q}) is given by

    L_{Info}(G, \tilde{Q}) = \mathbb{E}_{c \sim P(c),\, f_{td} \sim G(z,c)}[\log \tilde{Q}(c \mid f_{td})] + H(c),    (15)

where c denotes the latent code, \tilde{Q}(c \mid f_{td}) approximates P(c \mid f_{td}), and H(c) is the entropy of the latent code, which can be treated as a constant by fixing the latent code distribution. L_{BP} is defined as

    L_{BP} = \| e_{BP} - \hat{t}_{BP} \|_2^2,    (16)

where \hat{t}_{BP} is the target value of the SBP or DBP.

In the implementation, models F, B, and G are pretrained with the training structures shown in Fig. 5(b) and (c). This pretraining yields a generator that generates fake rPPG signals and a series of F and B that estimate BP; models fine-tuned from the pretrained weights tend to converge better. Note that G, D, and Q are used only in the training procedure. F and B perform BP estimation, and the inference process is shown as the blue path in Fig. 5(a); the output is the SBP or DBP.

Fig. 5: Training structure for synthetic data generation. The inference path is marked with blue lines; the black paths are applied only in the training stage as assistance.
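Fig. 5 and Eqs. (13)-(16) translate into an alternating optimization step roughly like the sketch below. Several details are our assumptions: the split of the 1024-dimensional noise into incompressible and code parts, treating the age/BMI code as continuous with an MSE surrogate for the log-likelihood in Eq. (15) (the mean/variance heads of Q in Table I suggest a Gaussian posterior), the non-saturating form of the generator's GAN term, the `to_ftd` helper that converts fake rPPG pairs to time difference features, and the grouping of parameters between the two optimizers.

```python
import torch
import torch.nn.functional as Fn

def train_step(nets, opts, to_ftd, ftd_real, bp_target, lam1=1.0, lam2=1.0):
    """One step of the Eq. (13) objective; all names are illustrative."""
    F_net, G, D, Q, B = nets       # networks of Table I
    opt_d, opt_g = opts            # opt_d: D + F_net; opt_g: G + Q + B (assumed)
    bsz = ftd_real.size(0)
    z_inc = torch.randn(bsz, 1022)                   # incompressible noise
    c = torch.rand(bsz, 2)                           # latent code ~ (age, BMI)
    z = torch.cat([z_inc, c], dim=1).unsqueeze(-1)   # [B, 1024, 1] as in Table I

    # Discriminator side of Eq. (14): max log D(real) + log(1 - D(fake))
    opt_d.zero_grad()
    with torch.no_grad():
        ftd_fake = to_ftd(G(z))                      # fake rPPG -> fake f_td
    p_real = torch.sigmoid(D(F_net(ftd_real)))
    p_fake = torch.sigmoid(D(F_net(ftd_fake)))
    d_loss = -(torch.log(p_real + 1e-8) + torch.log(1.0 - p_fake + 1e-8)).mean()
    d_loss.backward()
    opt_d.step()

    # Generator / Q / B side of Eq. (13): GAN term + info term + BP term
    opt_g.zero_grad()
    feat_fake = F_net(to_ftd(G(z)))                  # fresh forward pass
    g_gan = -torch.log(torch.sigmoid(D(feat_fake)) + 1e-8).mean()
    c_hat = Q(feat_fake)                             # approximates P(c | f_td)
    l_info = Fn.mse_loss(c_hat, c)                   # MSE surrogate for Eq. (15)
    l_bp = Fn.mse_loss(B(F_net(ftd_real)), bp_target)  # Eq. (16)
    (g_gan + lam1 * l_info + lam2 * l_bp).backward()
    opt_g.step()
```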
III. ASSESSMENT

A. Datasets
The information for each dataset is shown in Table III; the training and testing splits of the rPPG dataset are listed separately. The listed information includes the recorded image types, the number of subjects, and statistics of the subject information (age, gender, BMI), SBP, and DBP.

TABLE III: Datasets
| | rPPG (training) | rPPG (testing) | Dynamic | TVGH |
|---|---|---|---|---|
| Subjects | 961 | 177 | 30 / 1278 tuples | 30 / 393 tuples |
| Age (years) | 56.69 ± 14.81 | 55.03 ± 15.63 | 30.8 ± 12.42 | – |
| Male/female | 734 (76%) / 227 (24%) | 126 (71%) / 51 (29%) | 23 (77%) / 7 (23%) | 29 (97%) / 1 (3%) |
| BMI (kg/m²) | 25.66 ± 4.93 | 25.56 ± 4.07 | 23.63 ± 4.07 | – |
| SBP (mmHg) | 125.97 ± 21.88 | 126.72 ± 17.91 | 117.77 ± 12.61 | 112.48 ± 15.31 |
| DBP (mmHg) | 74.65 ± 13.5 | 75.93 ± 12.39 | 72.47 ± 8.48 | 71.25 ± 11.6 |
| Signal | Face | Face | Face + palm | Face, infrared light (nighttime) |
| Reference | Self-constructed | Self-constructed | Wu et al. [24] | Self-constructed |

The age range in the rPPG dataset is wider than in the Dynamic dataset: the former was collected in a hospital, while the participants of the latter are laboratory staff, mostly young and healthy. Age and BMI were not collected in the TVGH dataset, so estimated age and BMI are used in the corresponding experiments. The SBP and DBP distributions of each dataset are shown in Fig. 6. It is worth noting that the rPPG dataset contains more subjects with hypertension, allowing a more comprehensive evaluation of the proposed method. The Dynamic dataset captures long-term individual BP changes, providing an observation of the ability to reflect them. Below we describe how each dataset was compiled.

Fig. 6: BP distributions of each dataset: (a) rPPG dataset, (b) Dynamic dataset, (c) TVGH dataset. In each figure, the left side is the SBP distribution and the right side is the DBP distribution.

1) rPPG Dataset: This dataset was collected in cooperation with the Chang Gung Medical Foundation of Taiwan for camera-based BP estimation. It includes 1,138 patients with various diagnoses, including hypertension, diabetes, and cardiac disease; there are 860 males and 278 females, aged 18 to 92. Images are recorded with a Logitech C920 webcam at 30 FPS and VGA resolution (640×480), with auto white balance, auto gain, and autofocus disabled. Ground-truth BP is measured with a mercury sphygmomanometer. Each subject sits about 60 cm from the webcam, with ambient illuminance maintained between 100 and 300 lux. Each participant first rests for 5 minutes before the first BP measurement, after which an 80-second lossless facial image sequence is recorded; a second BP value is then measured, and the ground-truth BP is the average of the two.

2) Dynamic Dataset: The Dynamic dataset [24] contains 30 subjects from laboratory staff, 23 males and 7 females. The 80-second image sequences include face and palm images recorded synchronously by two cameras, a Logitech C920 and a Point Grey. Samples of the facial and palm images are shown in Fig. 7. The ambient illuminance is above 300 lux. BP is measured with a cuff-based electronic sphygmomanometer (Omron HEM-7121). Data collection spans a month, with each participant performing trials 1-3 times a week. Each trial includes one static stage and two rise stages, yielding 5 tuples in total; each tuple contains an 80-second facial image sequence and a ground-truth BP value averaged from two measurements taken before and after image collection. The ground-truth collection thus matches that of the rPPG dataset. During the rise stages, participants put their feet up on a stool and apply force to induce BP increments, with a 5-minute break after each rise stage.

Fig. 7: Facial and palm image samples of the Dynamic dataset.

3) TVGH Dataset: To evaluate real-world applications of the proposed method in nighttime situations, we worked with Taipei Veterans General Hospital to collect and create this dataset. Fig. 8 shows the experimental settings. Each subject lies supine and sleeps from 9:00 p.m. to 5:00 a.m. the next morning. The recording device is an RGB camera with its IR-cut filter removed, able to capture RGB images under a 940 nm infrared light source; image sequences are recorded in RGB format with only a 940 nm infrared LED as extra lighting and no other visible light sources. The ground-truth BP is automatically measured every half-hour using a WatchBP O3 monitor. Tuples with facial occlusion caused by the sleeping pose are excluded from the experiment.

Fig. 8: Experimental settings for the TVGH dataset.

B. Evaluation
The performance of all experiments is evaluated by the mean absolute error (MAE), the standard deviation of the absolute error (Std(AE)), the mean error (ME), and the standard deviation of the error (Std(E)). The evaluation protocols are shown in Table IV. First, in Section IV-A, to address the effectiveness of the proposed time difference features, we compare features obtained from two distant ROIs (face and palm) with those from two closer ROIs (upper and lower face). Since palm images are available only in the Dynamic dataset, Leave-One-Out Cross-Validation (LOOCV) is used to provide conclusive results. After that, we experiment with different upsampling rates on the RGB channels and extract the time difference features as input to train different groups of models, verifying that upsampling makes the time difference information between the upper- and lower-face rPPG observable. Afterward, we adopt the single-model structure as a baseline to verify the performance of the proposed backbone model, compared against the multi-model approach and the InfoGAN-assisted training strategy; the results of state-of-the-art approaches are presented alongside in Section IV-B. In Section IV-C, the age and BMI estimators are evaluated on the rPPG and Dynamic datasets (this validation is unavailable for the TVGH dataset, where ground-truth age and BMI were not recorded), and the effect of using real versus estimated age and BMI for model selection is also compared. For nighttime applications, we verify the performance of the proposed method on the TVGH dataset in Section IV-D. Finally, we perform an ablation study in Section IV-E to analyze the effectiveness of the remaining factors, including model selection with subject information and synthetic data generation.

TABLE IV: Evaluation protocol
| Signal | Training data | Testing data |
|---|---|---|
| Face + palm | Leave-One-Out Cross-Validation (LOOCV) on the Dynamic dataset | |
| Upper / lower face | rPPG dataset: 961 subjects | rPPG dataset: 177 subjects; Dynamic dataset: 30 subjects / 1278 tuples; TVGH dataset: 30 subjects / 393 tuples |
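The four error metrics above are standard; for reference, this is how they are computed (a small helper of ours, not from the paper's code):

```python
import numpy as np

def bp_metrics(pred, truth):
    """MAE, Std(AE), ME, Std(E) as reported in all tables (mmHg)."""
    err = np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float)
    ae = np.abs(err)
    return {"MAE": ae.mean(), "Std(AE)": ae.std(),
            "ME": err.mean(), "Std(E)": err.std()}

# e.g. bp_metrics([118, 131], [120, 128]) gives MAE 2.5 and ME 0.5
```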
IV. EXPERIMENTAL RESULTS

In this section, the results are presented in the five parts outlined in Section III-B. All CNN models in the proposed method are implemented with the deep learning framework PyTorch [36]. The weights of all models are optimized with the Adam optimizer at a learning rate of 2 × 10⁻³.

A. Effectiveness of Time Difference Features
PTT-based measurement, which extracts the time delay between selected sites of two signals, is widely applied to ECG and PPG signals; the selected sites can be the Q or R wave of the ECG signal and the peak or foot of the PPG signal. In this study, the PTT-based approach is carried over to rPPG signals. We use the phase difference of two rPPGs to avoid PTT estimates that fluctuate with pulse-peak dispersion and other artifact noise.
The proposed time difference features f_td consist of three phase differences rather than the delay time with respect to a single site; these phase differences are expected to be more robust to noise in the rPPG signals. To verify the existence of time delay information at 30 FPS, f_td is evaluated on the original 30-FPS signals from different ROIs. Compared with the distance between the upper and lower face, the distance between face and palm is expected to carry more observable time delay information at 30 FPS. As shown in Table Va, the MAE improves from 5.15 to 3.60 mmHg for SBP and from 5.08 to 3.18 mmHg for DBP.

TABLE V: Effectiveness of the proposed features f_td
(a) Different ROIs on the Dynamic dataset (30 FPS signals)
| ROI type | Input | SBP MAE | Std(AE) | ME | Std(E) | DBP MAE | Std(AE) | ME | Std(E) |
|---|---|---|---|---|---|---|---|---|---|
| Upper and lower face | rPPG | 5.08 | 3.82 | -0.36 | 5.70 | 4.80 | 3.36 | 0.22 | 4.86 |
| Upper and lower face | f_td | 5.42 | 3.74 | 0.24 | 5.71 | 5.09 | 3.37 | 0.52 | 4.93 |
| Whole face and palm | rPPG | 5.15 | 3.84 | -0.71 | 5.78 | 5.08 | 3.41 | 0.42 | 4.93 |
| Whole face and palm | f_td | 3.60 | 3.45 | 0.02 | 4.06 | 3.18 | 3.32 | -0.11 | 4.82 |
(b) Proposed f_td and rPPG signals after signal upsampling of the color channels
rPPG dataset:
| Input | Upsampling (FPS) | SBP MAE | Std(AE) | ME | Std(E) | DBP MAE | Std(AE) | ME | Std(E) |
|---|---|---|---|---|---|---|---|---|---|
| rPPG | 30 | 10.15 | 8.75 | -2.49 | 13.11 | 7.13 | 6.06 | -0.63 | 9.25 |
| rPPG | 150 | 9.97 | 8.47 | -0.47 | 12.28 | 6.85 | 5.23 | -0.03 | 8.87 |
| rPPG | 180 | 9.87 | 8.51 | -0.67 | 12.29 | 6.81 | 5.20 | 0.71 | 8.93 |
| f_td | 30 | 11.10 | 10.46 | -1.39 | 13.93 | 8.70 | 7.70 | -0.15 | 9.47 |
| f_td | 150 | 9.53 | 8.95 | -1.63 | 12.55 | 6.54 | 5.87 | -0.03 | 8.89 |
| f_td | 180 | 9.48 | 8.97 | -0.89 | 12.29 | 6.52 | 5.89 | -0.19 | 9.76 |
Dynamic dataset:
| Input | Upsampling (FPS) | SBP MAE | Std(AE) | ME | Std(E) | DBP MAE | Std(AE) | ME | Std(E) |
|---|---|---|---|---|---|---|---|---|---|
| rPPG | 30 | 5.82 | 6.54 | -2.55 | 6.92 | 5.78 | 4.47 | -2.10 | 7.27 |
| rPPG | 150 | 5.72 | 4.90 | -1.31 | 6.75 | 5.63 | 4.16 | -1.47 | 6.58 |
| rPPG | 180 | 5.65 | 4.88 | -1.17 | 6.65 | 5.55 | 4.15 | -0.90 | 6.56 |
| f_td | 30 | 6.48 | 6.33 | -1.97 | 8.35 | 6.07 | 4.80 | -1.55 | 7.02 |
| f_td | 150 | 5.35 | 4.84 | -3.44 | 7.45 | 5.44 | 4.75 | -2.21 | 6.83 |
| f_td | 180 | 5.31 | 4.85 | -0.81 | 7.31 | 5.42 | 4.74 | -0.77 | 6.82 |

The RGB channels are upsampled to 150 and 180 FPS to retrieve the time delay information undetectable at 30 FPS. Fig. 9 demonstrates the effect of the upsampling process. The rPPGs of the upper and lower face computed from the RGB signal at the original 30 FPS are plotted in Fig. 9a, and those from the upsampled RGB in Fig. 9b. The blue waves are the rPPGs of the upper face; the orange ones are the rPPGs of the lower face.

Fig. 9: Effect of signal upsampling on the RGB channels: (a) filtered rPPGs without upsampling (30 FPS); (b) filtered rPPGs with the RGB channels upsampled to 180 FPS. The pulse peaks, feet, and upstrokes are selected as the observation sites. The time shifts between the peaks marked with circles in (b) are 27.8, 22.2, and 22.2 ms sequentially, which cannot be resolved at the 30 FPS resolution of (a).

To observe the shifts intuitively, we select the peaks, the troughs, and the upstrokes of the pulse as observation sites, marked with circles, rectangles, and triangles respectively; red and blue vertical lines are added for convenient observation. In Fig. 9a, the red and blue lines around the peaks overlap completely: no time shift between the peaks is visible in the 30-FPS signals from the upper and lower face. Upsampling the RGB channels makes the time shifts between the peaks observable, as shown in Fig. 9b; the shifts between the marked peaks are 27.8, 22.2, and 22.2 ms, too small to be observed at the 30 FPS resolution of Fig. 9a. For the troughs and upstrokes in Fig. 9b, time shifts between these sites are likewise recovered after upsampling. From the time shifts shown in Fig. 9, the upsampling process helps retrieve the time difference information at several observation sites. With these findings, we apply the upsampling process before f_td extraction to derive a more accurate BP measurement. The results in Table Vb show improved SBP and DBP MAEs on both the rPPG and Dynamic datasets. From the experimental results, f_td makes the time delay information explicit, helping the model focus on the important features rather than on the raw rPPG inputs; the time delay information between the upper- and lower-face rPPG can likewise be extracted after signal upsampling.
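The reported peak shifts are multiples of the 180 FPS sample period (1/180 s ≈ 5.6 ms): 27.8 ms is five samples and 22.2 ms is four, both below the 33.3 ms single-frame resolution at 30 FPS. A small sketch of how such per-beat shifts can be read off the two signals follows; the nearest-peak matching is an illustrative choice, not the paper's stated procedure.

```python
import numpy as np
from scipy.signal import find_peaks

def peak_time_shifts_ms(y_u, y_l, fs=180.0):
    """Per-beat peak time shifts (ms) between upper- and lower-face rPPG."""
    pu, _ = find_peaks(y_u, distance=int(0.4 * fs))  # >= 0.4 s between beats
    pl, _ = find_peaks(y_l, distance=int(0.4 * fs))
    # match each upper-face peak to the nearest lower-face peak
    return [(pl[np.argmin(np.abs(pl - i))] - i) * 1000.0 / fs for i in pu]
```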
B. Comparison to State-of-the-Art Methods
Table VI presents the overall results and the comparison to state-of-the-art methods. Our approach includes two kinds of model inputs: the rPPG signals y and the time difference features f_td. Three results are listed for each input signal: the baseline, a single model with the ENC-DEC architecture; the multi-model structure, which includes models responsible for the various BP ranges with the age/BMI subject information used to select the BP model; and the multi-model trained with synthetic data generation. The latter models are trained with the assistance of InfoGAN, for which the generator output is set to fake rPPGs and the latent code to age and BMI, which is expected to assist the training of F and B.

TABLE VI: Comparison results (* see [38] and [24] for the rPPG and Dynamic datasets, respectively)
rPPG dataset:
| Approach | SBP MAE | Std(AE) | ME | Std(E) | DBP MAE | Std(AE) | ME | Std(E) |
|---|---|---|---|---|---|---|---|---|
| Baek et al. [37] | 17.71 | 13.67 | -7.81 | 21.06 | 11.27 | 8.01 | -0.70 | 13.67 |
| Rong et al. [22] | 16.75 | 13.56 | 6.97 | 20.42 | 11.21 | 8.35 | 2.43 | 13.80 |
| Zhou et al. [21] | 16.31 | 12.71 | 6.10 | 19.79 | 11.17 | 8.36 | 0.52 | 13.97 |
| AlexNet | 18.17 | 14.47 | -9.82 | 21.05 | 11.50 | 8.03 | 0.11 | 14.02 |
| ResNet-50 | 17.07 | 13.40 | -5.22 | 21.21 | 11.83 | 9.06 | 4.99 | 14.21 |
| SVR | 17.30 | 14.19 | -7.36 | 21.13 | 10.93 | 8.13 | -1.77 | 13.50 |
| S2-Net | 16.27 | 12.88 | -2.67 | 20.28 | 11.83 | 7.97 | 1.35 | 14.02 |
| FS2-Net | 16.01 | 12.90 | -2.71 | 20.38 | 10.97 | 8.57 | -0.09 | 13.92 |
| Ours, rPPG (baseline) | 15.59 | 10.63 | -4.13 | 18.67 | 10.77 | 7.41 | -2.71 | 12.67 |
| Ours, rPPG (w/ M(·)) | 10.15 | 8.75 | -2.49 | 13.11 | 7.13 | 6.06 | -0.63 | 9.25 |
| Ours, rPPG (w/ InfoGAN) | 9.13 | 8.18 | -0.85 | 12.20 | 6.89 | 5.62 | 0.19 | 9.46 |
| Ours, f_td (baseline) | 15.49 | 10.26 | -5.76 | 17.73 | 10.56 | 7.28 | -2.24 | 12.65 |
| Ours, f_td (w/ M(·)) | 9.48 | 8.97 | -0.89 | 12.29 | 6.52 | 5.89 | -0.19 | 9.76 |
| Ours, f_td (w/ InfoGAN) | 8.78 | 8.02 | -0.27 | 12.13 | 6.16 | 5.47 | 0.82 | 9.11 |
Dynamic dataset:
| Approach | SBP MAE | Std(AE) | ME | Std(E) | DBP MAE | Std(AE) | ME | Std(E) |
|---|---|---|---|---|---|---|---|---|
| Baek et al. [37] | 9.96 | 6.00 | -5.79 | 10.09 | 7.42 | 5.07 | 3.34 | 8.35 |
| Rong et al. [22] | 11.25 | 7.75 | 2.76 | 11.26 | 7.18 | 5.10 | 0.87 | 8.77 |
| Zhou et al. [21] | 10.18 | 7.38 | 4.22 | 10.13 | 6.93 | 4.77 | 0.67 | 8.38 |
| AlexNet | 10.67 | 7.36 | 4.85 | 9.34 | 5.65 | 4.46 | 0.15 | 6.92 |
| ResNet-50 | 8.74 | 6.69 | -1.25 | 10.15 | 5.94 | 4.92 | -1.09 | 7.64 |
| SVR | 9.15 | 6.43 | 3.65 | 10.57 | 5.16 | 4.27 | 1.61 | 6.44 |
| S2-Net | 8.25 | 6.27 | 1.47 | 9.44 | 6.07 | 4.32 | 0.33 | 7.44 |
| FS2-Net | 7.40 | 6.25 | 0.50 | 8.98 | 5.91 | 4.48 | -0.48 | 7.40 |
| Ours, rPPG (baseline) | 10.46 | 7.89 | -2.50 | 12.48 | 6.94 | 4.74 | -2.08 | 8.14 |
| Ours, rPPG (w/ M(·)) | 5.82 | 6.54 | -2.55 | 7.31 | 5.78 | 4.47 | -2.10 | 7.27 |
| Ours, rPPG (w/ InfoGAN) | 5.60 | 4.41 | -1.07 | 7.33 | 5.55 | 4.23 | -2.08 | 7.30 |
| Ours, f_td (baseline) | 9.09 | 7.27 | 2.28 | 12.02 | 6.42 | 4.40 | -0.11 | 8.14 |
| Ours, f_td (w/ M(·)) | 5.31 | 4.85 | -0.81 | 6.92 | 5.42 | 4.74 | -0.77 | 6.82 |
| Ours, f_td (w/ InfoGAN) | 5.22 | 4.24 | -0.74 | 6.74 | 5.31 | 4.11 | -0.27 | 6.91 |

For the rPPG dataset, the baseline MAEs of our approach are 15.59 and 10.77 mmHg for SBP and DBP, already better than the state-of-the-art. By replacing the model inputs with the time difference features, taking subject information into consideration, and generating synthetic data, the MAEs are reduced to 8.78 and 6.16 mmHg for SBP and DBP, and the Std(AE) is reduced from 10.63 to 8.02 mmHg for SBP and from 7.41 to 5.47 mmHg for DBP. For the Dynamic dataset, a cross-dataset evaluation in this study, the MAEs are 5.22 and 5.31 mmHg for SBP and DBP respectively, with Std(AE) of 4.24 and 4.11 mmHg. The proposed method also outperforms the state-of-the-art methods on this dataset: the best prior results are an SBP MAE of 7.40 mmHg from [24] and a DBP MAE of 5.16 mmHg, obtained under an intra-dataset protocol, whereas our approach is tested under the cross-dataset protocol. That is, model selection and synthetic data generation effectively reach a comparable result under a more challenging situation.

Besides, Fig. 10 shows a sample in which our proposed method follows the trend of the BP changes. The BP ground truths are marked with blue dots, the results of the baseline BP estimation with the ENC-DEC backbone with orange stars, and those of the proposed measuring flow with yellow triangles. As shown in the figure, the ground-truth BP increases in the rise stages (1st Rise and 2nd Rise) for both SBP and DBP and decreases after short breaks (1st Break and 2nd Break). For the SBP in Fig. 10a, the estimates of the baseline model express the changes only slightly in the first break and second rise stages, while those of the proposed method reflect the changes of every stage. For the DBP in Fig. 10b, both the baseline and the proposed estimates rise and fall with most stages; however, the proposed method reflects the changes more delicately, with smaller errors. This sample shows that the proposed method follows the trend of the BP changes better than the baseline models.

Fig. 10: Sample SBP and DBP changes for each stage in the Dynamic dataset: (a) SBP change, (b) DBP change.

Figures 11 and 12 show the Bland-Altman plots of BP estimation on the rPPG and Dynamic datasets, respectively. These subfigures are plotted with our best results in Table VI, i.e., the multi-model structure using the time difference features as input, fine-tuned by InfoGAN-assisted training, with real age and BMI used to select the model. The correlations between estimated and ground-truth values are above 0.7 for SBP and above 0.6 for DBP.

Fig. 11: Bland-Altman plots of (a) SBP and (b) DBP on the rPPG dataset.
Fig. 12: Bland-Altman plots of (a) SBP and (b) DBP on the Dynamic dataset.
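For completeness, a Bland-Altman plot of the kind shown in Figs. 11-13 can be produced as below, with the bias line plus 95% limits of agreement at bias ± 1.96 SD; the styling choices are ours.

```python
import numpy as np
import matplotlib.pyplot as plt

def bland_altman(pred, truth, ax=None):
    """Difference-vs-mean plot with bias and 95% limits of agreement."""
    pred, truth = np.asarray(pred, float), np.asarray(truth, float)
    mean, diff = (pred + truth) / 2.0, pred - truth
    bias, sd = diff.mean(), diff.std()
    if ax is None:
        ax = plt.gca()
    ax.scatter(mean, diff, s=8)
    for y in (bias, bias + 1.96 * sd, bias - 1.96 * sd):
        ax.axhline(y, linestyle="--")
    ax.set_xlabel("Mean of estimate and ground truth (mmHg)")
    ax.set_ylabel("Estimate - ground truth (mmHg)")
    return bias, sd
```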
C. Comparison of Estimated Age and BMI
1) Performance of Age Recognition: The evaluation results of the age recognition are presented in Table VIIa. The MAEs are 4.69 ± 2.98 years on the rPPG dataset, with ages ranging from 18 to 92, and 3.01 ± 3.23 years on the Dynamic dataset, with an age range of 21 to 62.

TABLE VII: Comparison of estimated age and BMI
(a) Performance of age estimation (years)
| Dataset | MAE | Std(AE) | ME | Std(E) |
|---|---|---|---|---|
| rPPG | 4.69 | 2.98 | -4.09 | 3.76 |
| Dynamic | 3.01 | 3.23 | -1.54 | 4.13 |

2) Performance of BMI Estimation: Table VIIb shows the performance of the BMI estimation. With image pre-processing, the MAE is 2.15 and 2.52 kg/m² on the rPPG and Dynamic datasets respectively, improvements of 1.53 and 0.82 kg/m² over the pipeline without pre-processing. The age and BMI estimators thus provide feasible results for purely facial-image-based BP measurement.

(b) Performance of BMI estimation (kg/m²)
| Dataset | Image pre-processing | MAE | Std(AE) | ME | Std(E) |
|---|---|---|---|---|---|
| rPPG | w/o | 3.68 | 2.93 | -0.11 | 4.72 |
| rPPG | w/ | 2.15 | 1.73 | -0.21 | 2.75 |
| Dynamic | w/o | 3.34 | 2.42 | 0.75 | 4.05 |
| Dynamic | w/ | 2.52 | 1.83 | 0.14 | 3.11 |

3) Real and Estimated Subject Information: We compare model selection with real versus estimated subject information in Table VIIc. In most cases, the MAEs increase by approximately 2 mmHg; however, the performance with the estimated subject information is still better than that of the state-of-the-art methods.

(c) Real and estimated subject information with the proposed f_td and rPPG signals
rPPG dataset:
| Input | Subject info | SBP MAE | Std(AE) | ME | Std(E) | DBP MAE | Std(AE) | ME | Std(E) |
|---|---|---|---|---|---|---|---|---|---|
| rPPG | Real | 10.15 | 8.75 | -2.49 | 13.11 | 7.13 | 6.06 | -0.63 | 9.25 |
| rPPG | Estimated | 13.25 | 10.78 | -1.90 | 17.49 | 7.75 | 6.65 | -1.48 | 10.63 |
| f_td | Real | 9.48 | 8.97 | -0.89 | 12.29 | 6.52 | 5.89 | -0.19 | 9.76 |
| f_td | Estimated | 11.49 | 9.69 | -0.82 | 15.55 | 7.59 | 6.40 | 0.57 | 10.28 |
Dynamic dataset:
| Input | Subject info | SBP MAE | Std(AE) | ME | Std(E) | DBP MAE | Std(AE) | ME | Std(E) |
|---|---|---|---|---|---|---|---|---|---|
| rPPG | Real | 5.82 | 6.54 | -2.55 | 7.31 | 5.78 | 4.47 | -2.10 | 7.27 |
| rPPG | Estimated | 8.79 | 7.14 | 0.92 | 15.07 | 7.16 | 6.59 | 0.49 | 11.23 |
| f_td | Real | 5.31 | 4.85 | -0.81 | 6.92 | 5.42 | 4.74 | -0.77 | 6.82 |
| f_td | Estimated | 7.07 | 6.43 | 1.81 | 13.17 | 5.92 | 4.34 | 2.12 | 10.72 |
D. Nighttime Application with Estimated Subject Information
For real-world usage in nighttime conditions such as the TVGH dataset, the subject information is replaced with values estimated from the facial images, which were recorded by a de-filtered RGB camera with extra infrared light sources. The results are shown in Table VIII; the best results come from the model with the time difference features as input, trained with synthetic data. The MAEs are 11.12 and 7.59 mmHg, giving improvement ratios of 48.47% and 22.07% for SBP and DBP respectively. Bland-Altman plots of the SBP and DBP estimates are shown in Figs. 13a and 13b. Although there is still room for improvement in nighttime BP measurement using only IR light, given its cross-domain nature, the MAEs are significantly improved by the proposed method.

TABLE VIII: TVGH dataset results
| Input | Approach | SBP MAE | Std(AE) | ME | Std(E) | DBP MAE | Std(AE) | ME | Std(E) |
|---|---|---|---|---|---|---|---|---|---|
| rPPG | Ours (baseline) | 21.58 | 12.95 | -0.71 | 24.96 | 9.74 | 7.20 | 1.84 | 11.63 |
| rPPG | Ours (w/ M(·)) | 15.83 | 12.02 | -6.02 | 18.72 | 9.21 | 6.81 | 1.13 | 11.85 |
| rPPG | Ours (w/ InfoGAN) | 14.70 | 10.78 | 3.83 | 16.12 | 8.88 | 6.62 | 1.36 | 11.33 |
| f_td | Ours (baseline) | 16.45 | 11.96 | 8.39 | 17.61 | 9.71 | 8.05 | 4.46 | 11.66 |
| f_td | Ours (w/ M(·)) | 12.23 | 10.71 | 2.96 | 15.75 | 8.32 | 7.07 | 3.95 | 11.18 |
| f_td | Ours (w/ InfoGAN) | 11.12 | 9.62 | 3.44 | 15.74 | 7.59 | 5.88 | 4.04 | 11.37 |

Fig. 13: Bland-Altman plots of (a) SBP and (b) DBP on the TVGH dataset.

E. Ablation Study
1) Model Selection with Subject Information: Here we compare the results with and without model selection using subject information. The subject information (age and BMI) and the multi-model structure are leveraged as a calibration mechanism: the multi-model ensures that each model focuses on a certain BP range, mitigating overfitting and yielding more precise results.
The effect of this model selection with subject information is presented in Table IXa. The multi-model structure with subject information improves the MAE and Std(AE) for both the rPPG inputs y and the time difference features f_td, and both the rPPG and Dynamic datasets achieve the best performance with f_td as the model input. These results demonstrate that the multi-model structure and model selection with subject information indeed improve performance.

TABLE IX: Ablation study
(a) Model selection with subject information
rPPG dataset:
| Input | M(·) | SBP MAE | Std(AE) | ME | Std(E) | DBP MAE | Std(AE) | ME | Std(E) |
|---|---|---|---|---|---|---|---|---|---|
| rPPG | w/o | 15.59 | 10.63 | -4.13 | 18.67 | 10.77 | 7.41 | -2.71 | 12.67 |
| rPPG | w/ | 10.15 | 8.75 | -2.49 | 13.11 | 7.13 | 6.06 | -0.63 | 9.25 |
| f_td | w/o | 15.49 | 10.26 | -5.76 | 17.73 | 10.56 | 7.28 | -2.24 | 12.65 |
| f_td | w/ | 9.48 | 8.97 | -0.89 | 12.29 | 6.52 | 5.89 | -0.19 | 9.76 |
Dynamic dataset:
| Input | M(·) | SBP MAE | Std(AE) | ME | Std(E) | DBP MAE | Std(AE) | ME | Std(E) |
|---|---|---|---|---|---|---|---|---|---|
| rPPG | w/o | 10.46 | 7.89 | -2.50 | 12.48 | 6.94 | 4.74 | -2.08 | 8.14 |
| rPPG | w/ | 5.82 | 6.54 | -2.55 | 7.31 | 5.78 | 4.47 | -2.10 | 7.27 |
| f_td | w/o | 9.09 | 7.27 | 2.28 | 12.02 | 6.42 | 4.40 | -0.11 | 8.14 |
| f_td | w/ | 5.31 | 4.85 | -0.81 | 6.92 | 5.42 | 4.74 | -0.77 | 6.82 |

2) Synthetic Data Generation: We generate synthetic data to compensate for the lack of data. To take the subject information into account, we adapt InfoGAN to generate data representing age-BMI pairs that are missing from the datasets. We evaluate both the rPPG signals and the time difference features; note that in both cases the generated data are rPPG signals, which are transformed into time difference features during model training when needed. We evaluate the InfoGAN-assisted synthetic data generation in Table IXb. The rPPG signals and the time difference features both exhibit improved performance with synthetic data generation, and the model with the time difference features as input plus synthetic data generation yields the best result on all metrics for both the rPPG and Dynamic datasets.

(b) InfoGAN-assisted synthetic data generation
rPPG dataset:
| Input | InfoGAN | SBP MAE | Std(AE) | ME | Std(E) | DBP MAE | Std(AE) | ME | Std(E) |
|---|---|---|---|---|---|---|---|---|---|
| rPPG | w/o | 10.15 | 8.75 | -2.49 | 13.11 | 7.13 | 6.06 | -0.63 | 9.25 |
| rPPG | w/ | 9.13 | 8.18 | -0.85 | 12.20 | 6.89 | 5.62 | 0.19 | 9.46 |
| f_td | w/o | 9.48 | 8.97 | -0.89 | 12.29 | 6.52 | 5.89 | -0.19 | 9.76 |
| f_td | w/ | 8.78 | 8.02 | -0.27 | 12.13 | 6.16 | 5.47 | 0.82 | 9.11 |
Dynamic dataset:
| Input | InfoGAN | SBP MAE | Std(AE) | ME | Std(E) | DBP MAE | Std(AE) | ME | Std(E) |
|---|---|---|---|---|---|---|---|---|---|
| rPPG | w/o | 5.82 | 6.54 | -2.55 | 7.31 | 5.78 | 4.47 | -2.10 | 7.27 |
| rPPG | w/ | 5.60 | 4.41 | -1.07 | 7.33 | 5.55 | 4.23 | -2.08 | 7.30 |
| f_td | w/o | 5.31 | 4.85 | -0.81 | 6.92 | 5.42 | 4.74 | -0.77 | 6.82 |
| f_td | w/ | 5.22 | 4.24 | -0.74 | 6.74 | 5.31 | 4.11 | -0.27 | 6.91 |

3) Physiological Inputs: Table IXc compares different physiological inputs, including serial heart rate values, rPPGs, and f_td, all with model selection using real age and BMI. Substituting the rPPG signals for the serial heart rate values decreases the MAEs on the rPPG dataset from 11.36 to 10.15 mmHg for SBP and from 7.69 to 7.13 mmHg for DBP, and improves the Std(E) by 2.25 and 1.06 mmHg for SBP and DBP respectively; heart rate, an extracted hand-crafted feature, does not perform better than serial rPPGs. Replacing the input rPPG signals with the proposed time difference features f_td further improves the MAEs by 0.67 mmHg and 0.61 mmHg for SBP and DBP, respectively.

(c) Comparison of different physiological inputs (model selection with real age and BMI)
rPPG dataset:
| Input | SBP MAE | Std(AE) | ME | Std(E) | DBP MAE | Std(AE) | ME | Std(E) |
|---|---|---|---|---|---|---|---|---|
| HR | 11.36 | 10.35 | 0.30 | 15.36 | 7.69 | 6.87 | 0.09 | 10.31 |
| rPPG | 10.15 | 8.75 | -2.49 | 13.11 | 7.13 | 6.06 | -0.63 | 9.25 |
| f_td | 9.48 | 8.97 | -0.89 | 12.29 | 6.52 | 5.89 | -0.19 | 9.76 |
Dynamic dataset:
| Input | SBP MAE | Std(AE) | ME | Std(E) | DBP MAE | Std(AE) | ME | Std(E) |
|---|---|---|---|---|---|---|---|---|
| HR | 5.92 | 6.63 | -1.14 | 7.71 | 5.93 | 4.50 | -0.49 | 6.66 |
| rPPG | 5.82 | 6.54 | -2.55 | 7.31 | 5.78 | 4.47 | -2.10 | 7.27 |
| f_td | 5.31 | 4.85 | -0.81 | 6.92 | 5.42 | 4.74 | -0.77 | 6.82 |

From the experimental results, f_td achieves the best performance among these three physiological inputs.

V. CONCLUSION

Overall, we assess the efficiency of every component proposed in this study; the factors that influence the performance of the BP estimation and the corresponding MAE improvements are listed in Table X. The summary is as follows:
(a) Replacing the model inputs with the proposed time difference features: in the case of the baseline model, the MAE and Std(AE) of SBP and DBP are improved when our proposed features f_td are applied.

TABLE X: Degree of impact of each factor of the proposed method
(a) rPPG dataset
| Input | Model selection | Subject info | InfoGAN | SBP MAE | Improvement | DBP MAE | Improvement | Avg. improvement (mmHg) |
|---|---|---|---|---|---|---|---|---|
| rPPG | not applied | not applied | not applied | 15.59 | – | 10.77 | – | – |
| f_td | not applied | not applied | not applied | 15.49 | 0.10 | 10.56 | 0.21 | 0.15 |
| f_td | applied | estimated | not applied | 11.49 | 4.00 | 7.59 | 2.97 | 3.49 |
| f_td | applied | real | not applied | 9.48 | 2.01 | 6.52 | 1.07 | 1.54 |
| f_td | applied | real | applied | 8.78 | 0.70 | 6.16 | 0.36 | 0.53 |
(b) Dynamic dataset
| Input | Model selection | Subject info | InfoGAN | SBP MAE | Improvement | DBP MAE | Improvement | Avg. improvement (mmHg) |
|---|---|---|---|---|---|---|---|---|
| rPPG | not applied | not applied | not applied | 10.46 | – | 6.94 | – | – |
| f_td | not applied | not applied | not applied | 9.09 | 1.37 | 6.42 | 0.52 | 0.95 |
| f_td | applied | estimated | not applied | 7.07 | 2.02 | 5.92 | 0.50 | 1.26 |
| f_td | applied | real | not applied | 5.31 | 1.76 | 5.42 | 0.50 | 1.13 |
| f_td | applied | real | applied | 5.22 | 0.09 | 5.31 | 0.11 | 0.10 |
V. CONCLUSION

Here we summarize the effectiveness of each component proposed in this study; the factors that influence BP estimation performance, and the corresponding MAE improvements, are listed in Table X.

TABLE X: Degree of the impact of each factor of the proposed method
    (columns: SBP MAE / improvement, DBP MAE / improvement, average improvement; all in mmHg)

(a) rPPG dataset
  rPPG, no model selection, no subject info, no InfoGAN:      15.59 / --    10.77 / --    avg --
  f_td, no model selection, no subject info, no InfoGAN:      15.49 / 0.10  10.56 / 0.21  avg 0.15
  f_td, model selection, estimated subject info, no InfoGAN:  11.49 / 4.00   7.59 / 2.97  avg 3.49
  f_td, model selection, real subject info, no InfoGAN:        9.48 / 2.01   6.52 / 1.07  avg 1.54
  f_td, model selection, real subject info, InfoGAN:           8.78 / 0.70   6.16 / 0.36  avg 0.53

(b) Dynamic dataset
  rPPG, no model selection, no subject info, no InfoGAN:      10.46 / --     6.94 / --    avg --
  f_td, no model selection, no subject info, no InfoGAN:       9.09 / 1.37   6.42 / 0.52  avg 0.95
  f_td, model selection, estimated subject info, no InfoGAN:   7.07 / 2.02   5.92 / 0.50  avg 1.26
  f_td, model selection, real subject info, no InfoGAN:        5.31 / 1.76   5.42 / 0.50  avg 1.13
  f_td, model selection, real subject info, InfoGAN:           5.22 / 0.09   5.31 / 0.11  avg 0.10

The summary is as follows:
(a) Replacing the model inputs with the proposed time difference features: relative to the baseline model, the MAE and Std(AE) of SBP and DBP improve when the proposed features f_td are applied.
(b) Model selection with estimated subject information: when model selection is introduced, the MAE of SBP on the rPPG dataset improves from 15.49 to 11.49 mmHg, a reduction of 4.00 mmHg, and the MAE of DBP improves from 10.56 to 7.59 mmHg, a reduction of 2.97 mmHg.
(c) Model selection with real subject information: when the estimated age and BMI are replaced with the real values, the MAEs further improve from 11.49 to 9.48 mmHg for SBP and from 7.59 to 6.52 mmHg for DBP, reductions of 2.01 and 1.07 mmHg, respectively.
(d) InfoGAN-assisted training (synthetic data generation): the MAEs of SBP and DBP are reduced by a further 0.70 and 0.36 mmHg, respectively.
According to these MAE improvements, the factors rank as follows, from greatest to least impact: 1. model selection with estimated subject information; 2. replacing the estimated subject information with the real values; 3. InfoGAN-assisted training (synthetic data generation) and the proposed time difference features f_td as model inputs, which yield similar degrees of improvement.
It is worth noting that the subject information contributes the most to the performance gain, whereas the proposed time difference features f_td yield the least improvement; moreover, the differences in performance among the physiological inputs (HR, rPPG signals, and the time difference features) are small. For BP measurement from facial video there is thus still room for improvement, for example by searching for other efficient physiological features more closely related to BP.
In general, we propose CNN-based contactless BP measurement via rPPG signals. The proposed time difference features, used as CNN model inputs, capture the time delay between the upper- and lower-face rPPG signals after signal upsampling. In addition, we account for age and BMI, which influence BP, via a multi-model structure in which models are selected based on the subject information, and show that this improves performance even in cross-dataset evaluation.
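As a concrete illustration of this selection step, the sketch below routes a subject to one of several pre-trained BP regressors by thresholding age and BMI. The bin edges, the 2x2 grid, and the stub regressors are invented for illustration; the actual partitioning and models used in this work are not specified here.

```python
# Minimal sketch of the model-selection idea: route each subject to one of
# several pre-trained BP regressors according to (age, BMI). The bin edges
# and the two-bin-per-attribute grid are assumptions for illustration only.
from typing import Callable, Dict, Tuple
import numpy as np

Model = Callable[[np.ndarray], Tuple[float, float]]  # features -> (SBP, DBP)

def make_stub(sbp: float, dbp: float) -> Model:
    return lambda f: (sbp, dbp)                      # placeholder regressors

AGE_EDGE, BMI_EDGE = 45.0, 25.0                      # assumed split points
models: Dict[Tuple[int, int], Model] = {
    (a, b): make_stub(110 + 10 * a + 5 * b, 70 + 5 * a + 3 * b)
    for a in (0, 1) for b in (0, 1)
}

def select_model(age: float, bmi: float) -> Model:
    """Pick the regressor trained on this subject's (age, BMI) region;
    age and bmi may be real values or ones estimated from face images."""
    return models[(int(age >= AGE_EDGE), int(bmi >= BMI_EDGE))]

f_td = np.zeros(128)                                 # stand-in feature vector
print(select_model(age=52, bmi=23.4)(f_td))          # routed to model (1, 0)
```

In deployment, the same lookup applies whether age and BMI are reported by the subject or estimated from face images, which is what keeps the measurement contactless.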
To further compensate for the lack of data for some subject-information (age, BMI) combinations, we adopt InfoGAN, which learns the characteristics of age and BMI, with a training procedure that improves model convergence and yields better results. The experimental results, including a comparison with state-of-the-art approaches, an evaluation of the estimated subject information, and an ablation study, provide strong evidence that the proposed method is robust in cross-dataset evaluation and achieves the best performance in all tests.
In the future, for nighttime applications, infrared-light rPPG construction should be considered more comprehensively to improve the performance of the proposed BP measurement. In addition, to refine the algorithm, ground-truth values collected invasively should be introduced to produce more precise measurements.

VI. ACKNOWLEDGEMENT

This work was supported by the National Science and Technology Council under MOST 111-2221-E-A49-166-MY3. The rPPG dataset used in this work was collected in cooperation with Chang Gung Medical Foundation, Taiwan, and approved by the institutional review board under no. 201900668B0C502. The TVGH dataset used in this work was collected in cooperation with Taipei Veterans General Hospital and approved by the institutional review board under no. 2019-12-016BC.

REFERENCES

[1] B. Zhou, P. Perel, G. A. Mensah, and M. Ezzati, “Global epidemiology, health burden and effective interventions for elevated blood pressure and hypertension,” Nature Reviews Cardiology, vol. 18, no. 11, pp. 785–802, 2021.
[2] M. H. Olsen, S. Y. Angell, S. Asma, P. Boutouyrie, D. Burger, J. A. Chirinos, A. Damasceno, C. Delles, A.-P. Gimenez-Roqueplo, D. Hering et al., “A call to action and a lifecourse strategy to address the global burden of raised blood pressure on current and future generations: the Lancet Commission on Hypertension,” The Lancet, vol. 388, no. 10060, pp. 2665–2712, 2016.
[3] B. Zhou, R. M. Carrillo-Larco, G. Danaei, L. M. Riley, C. J. Paciorek, G. A. Stevens, E. W. Gregg, J. E. Bennett, B. Solomon, R. K. Singleton et al., “Worldwide trends in hypertension prevalence and progress in treatment and control from 1990 to 2019: a pooled analysis of 1201 population-representative studies with 104 million participants,” The Lancet, vol. 398, no. 10304, pp. 957–980, 2021.
[4] J. Liu, C. G. Sodini, Y. Ou, B. Yan, Y.-T. Zhang, and N. Zhao, “Feasibility of fingertip oscillometric blood pressure measurement: Model-based analysis and experimental validation,” IEEE Journal of Biomedical and Health Informatics, vol. 24, no. 2, pp. 533–542, 2020.
[5] P. Zhang, C. Liu, H. Chen, and J. Liu, “Reconstruction of continuous brachial arterial pressure from continuous finger arterial pressure using a two-level optimization strategy,” IEEE Transactions on Biomedical Engineering, vol. 67, no. 11, pp. 3173–3184, 2020.
[6] A. Argha, B. G. Celler, and N. H. Lovell, “A novel automated blood pressure estimation algorithm using sequences of Korotkoff sounds,” IEEE Journal of Biomedical and Health Informatics, vol. 25, no. 4, pp. 1257–1264, 2021.
[7] P. Su, X.-R. Ding, Y.-T. Zhang, J. Liu, F. Miao, and N. Zhao, “Long-term blood pressure prediction with deep recurrent neural networks,” in 2018 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI), 2018, pp. 323–328.
[8] S. Yang, J. Sohn, S. Lee, J. Lee, and H. C. Kim, “Estimation and validation of arterial blood pressure using photoplethysmogram morphology features in conjunction with pulse arrival time in large open databases,” IEEE Journal of Biomedical and Health Informatics, vol. 25, no. 4, pp. 1018–1030, 2021.
[9] D. U. Jeong and K. M. Lim, “Combined deep CNN-LSTM network-based multitasking learning architecture for noninvasive continuous blood pressure estimation using difference in ECG-PPG features,” Scientific Reports, vol. 11, no. 1, pp. 1–8, 2021.
[10] C. Landry, E. T. Hedge, R. L. Hughson, S. D. Peterson, and A. Arami, “Accurate blood pressure estimation during activities of daily living: A wearable cuffless solution,” IEEE Journal of Biomedical and Health Informatics, vol. 25, no. 7, pp. 2510–2520, 2021.
[11] A.-G. Pielmuş, M. Pflugradt, T. Tigges, M. Klum, A. Feldheiser, O. Hunsicker, and R. Orglmeister, “Novel computation of pulse transit time from multi-channel PPG signals by wavelet transform: Towards continuous, non-invasive blood pressure estimation,” Current Directions in Biomedical Engineering, vol. 2, no. 1, pp. 209–213, 2016. [Online]. Available: https://doi.org/10.1515/cdbme-2016-0047
[12] J. Liu, B. P. Yan, Y.-T. Zhang, X.-R. Ding, P. Su, and N. Zhao, “Multi-wavelength photoplethysmography enabling continuous blood pressure measurement with compact wearable electronics,” IEEE Transactions on Biomedical Engineering, vol. 66, no. 6, pp. 1514–1525, 2019.
[13] J. Liu, S. Qiu, N. Luo, S.-K. Lau, H. Yu, T. Kwok, Y.-T. Zhang, and N. Zhao, “PCA-based multi-wavelength photoplethysmography algorithm for cuffless blood pressure measurement on elderly subjects,” IEEE Journal of Biomedical and Health Informatics, vol. 25, no. 3, pp. 663–673, 2021.
[14] G. Slapničar, N. Mlakar, and M. Luštrek, “Blood pressure estimation from photoplethysmogram using a spectro-temporal deep neural network,” Sensors, vol. 19, no. 15, p. 3420, 2019.
[15] A. Chakraborty, D. Goswami, J. Mukhopadhyay, and S. Chakrabarti, “Measurement of arterial blood pressure through single-site acquisition of photoplethysmograph signal,” IEEE Transactions on Instrumentation and Measurement, vol. 70, pp. 1–10, 2021.
[16] S. Haddad, A. Boukhayma, and A. Caizzone, “Continuous PPG-based blood pressure monitoring using multi-linear regression,” IEEE Journal of Biomedical and Health Informatics, pp. 1–1, 2021.
[17] C. Han, M. Gu, F. Yu, R. Huang, X. Huang, and L. Cui, “Calibration-free blood pressure assessment using an integrated deep learning method,” in 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2020, pp. 1001–1005.
[18] G. de Haan and V. Jeanne, “Robust Pulse Rate From Chrominance-Based rPPG,” IEEE Transactions on Biomedical Engineering, vol. 60, no. 10, pp. 2878–2886, 2013.
[19] W. Wang, A. C. den Brinker, S. Stuijk, and G. de Haan, “Algorithmic Principles of Remote PPG,” IEEE Transactions on Biomedical Engineering, vol. 64, no. 7, pp. 1479–1491, 2017.
[20] B.-F. Wu, Y.-C. Wu, and Y.-W. Chou, “A compensation network with error mapping for robust remote photoplethysmography in noise-heavy conditions,” IEEE Transactions on Instrumentation and Measurement, pp. 1–1, 2022.
[21] Y. Zhou, H. Ni, Q. Zhang, and Q. Wu, “The noninvasive blood pressure measurement based on facial images processing,” IEEE Sensors Journal, vol. 19, no. 22, pp. 10624–10634, 2019.
[22] M. Rong and K. Li, “A blood pressure prediction method based on imaging photoplethysmography in combination with machine learning,” Biomedical Signal Processing and Control, vol. 64, p. 102328, 2021.
[23] F. Schrumpf, P. Frenzel, C. Aust, G. Osterhoff, and M. Fuchs, “Assessment of non-invasive blood pressure prediction from PPG and rPPG signals using deep learning,” Sensors, vol. 21, no. 18, p. 6022, 2021.
[24] B.-F. Wu, B.-J. Wu, B.-R. Tsai, and C.-P. Hsu, “A facial-image-based blood pressure measurement system without calibration,” IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1–13, 2022.
[25] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv preprint arXiv:1512.03385, 2015.
[26] M. F. O’Rourke and J. Hashimoto, “Mechanical factors in arterial aging: A clinical perspective,” Journal of the American College of Cardiology, vol. 50, no. 1, pp. 1–13, 2007. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0735109707012636
[27] M. Gao, H.-M. Cheng, S.-H. Sung, C.-H. Chen, N. B. Olivier, and R. Mukkamala, “Estimation of pulse transit time as a function of blood pressure using a nonlinear arterial tube-load model,” IEEE Transactions on Biomedical Engineering, vol. 64, no. 7, pp. 1524–1534, 2017.
[28] D. Barvik, M. Cerny, M. Penhaker, and N. Noury, “Noninvasive continuous blood pressure estimation from pulse transit time: A review of the calibration models,” IEEE Reviews in Biomedical Engineering, vol. 15, pp. 138–151, 2022.
[29] W. Drøyvold, K. Midthjell, T. Nilsen, and J. Holmen, “Change in body mass index and its impact on blood pressure: a prospective population study,” International Journal of Obesity, vol. 29, no. 6, pp. 650–655, 2005.
[30] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel, “InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets,” Advances in Neural Information Processing Systems, vol. 29, 2016.
[31] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, “Joint face detection and alignment using multitask cascaded convolutional networks,” IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499–1503, 2016.
[32] B.-F. Wu, Y.-C. Wu, L.-W. Chiu, and H.-P. Liu, “Soft label with channel encoding for dependent facial image classification,” IEEE Access, vol. 10, pp. 10661–10672, 2022.
[33] L. Wen and G. Guo, “A computational approach to body mass index prediction from face images,” Image and Vision Computing, vol. 31, no. 5, pp. 392–400, 2013.
[34] B.-F. Wu, B.-R. Chen, and C.-F. Hsu, “Design of a facial landmark detection system using a dynamic optical flow approach,” IEEE Access, vol. 9, pp. 68737–68745, 2021.
[35] H. Drucker, C. J. Burges, L. Kaufman, A. Smola, and V. Vapnik, “Support vector regression machines,” in Advances in Neural Information Processing Systems, vol. 9, 1996.
[36] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., “PyTorch: An imperative style, high-performance deep learning library,” Advances in Neural Information Processing Systems, vol. 32, pp. 8026–8037, 2019.
[37] S. Baek, J. Jang, and S. Yoon, “End-to-end blood pressure prediction via fully convolutional networks,” IEEE Access, vol. 7, pp. 185458–185468, 2019.
[38] B.-R. Tsai and B.-F. Wu, “A calibration-free and facial-image-based blood pressure measurement system,” [Unpublished master’s thesis], National Yang Ming Chiao Tung University, 2021. [Online]. Available: https://etd.lib.nctu.edu.tw/cgi-bin/gs32/tugsweb.cgi?o=dnctucdr&s=id=%22GT070860003%22.&searchmode=basic