Chinese Journal of Electronics Vol.26, No.2, Mar. 2017

Adaptive Data Wiping Scheme with Adjustable Parameters for Ext4 File System∗

ZHANG Peng1,2, NIU Shaozhang1, HUANG Zhenpeng1 and QIN Xiaohua1
(1. Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China)
(2. School of Physics Electrical Engineering, Ningxia University, Yinchuan 750021, China)

Abstract — Data wiping is a useful technology that can prevent possible data recovery in a file system, but growth in the amount of data usually leads to a decline in wiping efficiency. To improve this, a novel data wiping scheme for the Ext4 file system, DWSE, is proposed. It includes two algorithms for wiping files and free space adaptively. According to a user-specified rate of remaining blocks, the file wiping algorithm WFile cleans only part of a selected file to save time. The free space wiping algorithm WFree speeds up the cleaning of dirty blocks by employing a random sampling and hypothesis testing method with two adjustable rates that represent the status and content of a block group. A journal cleaning algorithm, CleanJ, is also proposed; it flushes old records by creating and deleting temporary files, preventing data recovery from the journal file. With the help of these parameters, users can wipe their data with a balance between security and efficiency. Several experiments are performed on our scheme. The results show that it can wipe files and free space at different levels of security and efficiency depending on the parameters, and that it achieves higher security and efficiency than two other data wiping schemes.

Key words — Data wiping, Adjustable parameters, Ext4, Random sampling, Hypothesis testing.

I. Introduction

In July 2014, Avast (a well-known security software vendor) claimed to have recovered an abundance of personal data from 20 used smartphones[1]. To prevent the possible recovery of sensitive data, it is necessary to clean files with a data wiping scheme. The key to data wiping is to overwrite the original data with garbage data: once the original data has been replaced, any information it carried is lost forever.

Peter Gutmann proposed a wiping method that overwrites the original data 35 times with various garbage patterns, such as all 0s, all 1s and random data[2]. Recovering the original data after so many overwriting passes is practically impossible, but the method is very time-consuming. Boneh and Lipton proposed another wiping method that encrypts the original data with a key[3]. Since the only thing that needs to be overwritten is the key, data can be wiped in a short time. The disadvantage is that security relies heavily on the selected encryption algorithm and on safeguarding the key: anyone able to break the algorithm or obtain the key can easily decrypt the data. Seung-Hoon Kang et al. proposed a novel wiping method that does not employ any encryption algorithm[4]. It wipes a file by overwriting several consecutive bytes and then skipping a fixed interval. According to their experimental results, this method makes a wiped file unidentifiable even if it can be recovered. However, their method has two shortcomings.
First, although part of the security is sacrificed to improve efficiency, it would be better to offer choices that let users wipe their data according to their actual needs, rather than have the method decide by itself. Second, it mainly focuses on wiping the existing files of a file system, and offers no appropriate solution for wiping deleted files. Considering these shortcomings, a novel data wiping scheme for the Ext4 file system, DWSE, is proposed in this paper. It provides algorithms for wiping not only existing files but also deleted files. With the help of DWSE's adjustable parameters, users can also wipe their data with a balance between security and efficiency.

The rest of this paper is organized as follows: Section II introduces related work, including two data recovery methods for an Ext4 file system. Section III describes our scheme in detail. Section IV presents the experimental results. The conclusion is given in Section V.

∗ Manuscript Received Jan. 29, 2015; Accepted Apr. 19, 2015. This work is supported by the National Natural Science Foundation of China (No.61070207, No.61370195) and the Beijing Natural Science Foundation (No.4132060). © 2017 Chinese Institute of Electronics. DOI:10.1049/cje.2016.06.016

II. Related Work

In an Ext4 file system, data is organized and managed by an extent tree[5,6]. When a file is deleted, the file system changes the information of its index node (inode for short) so that the file can no longer be indexed, but the content of the deleted file may remain intact and may possibly be recovered. As far as we know, there are two main methods that can recover deleted files from an Ext4 file system.

The first recovers them with the help of a special file, the so-called journal file, which records recent file operations such as writing and deleting. The inode of a deleted file can possibly be found in the journal file; following the information in a recovered inode, the content of the deleted file can possibly be recovered. Fairbanks introduced several recovery methods that take the journal file as "a source of previous data or metadata"[6]. D. Kim et al. developed a tool based on the journal file for data recovery and analysis of user actions[7]. An open-source tool, Extundelete[8], can also recover deleted files based on the journal file. A disadvantage of this first method is the fixed size of the journal file: it can only recover recently deleted files, since new journal records continuously replace old ones.

The second recovery method is so-called file carving. It can recover deleted files without the help of the file system: based on the characteristic information in a file's header and footer, partial (or even full) content of a deleted file can possibly be found and reassembled after the entire storage medium is scanned. Richard has proposed a useful tool, Scalpel[9], which can recover files based on a large set of predefined headers and footers for many different file types. Lee et al. have proposed a file carving approach that recovers deleted files more efficiently, based on an analysis of the Ext2/3 file system[10]. A disadvantage of this method is its reliance on a file's header and footer: once they are damaged, the number of recoverable files is greatly reduced.

III. Proposed Data Wiping Scheme

To prevent data recovery by the two methods mentioned above and to help users wipe data more efficiently, DWSE is proposed.
It has three processing stages: object selection, parameter configuration and object wiping. The framework of DWSE is shown in Fig.1. In the first stage, DWSE allows users to select what they want to wipe: if a file is selected, DWSE wipes it as a "File" object; if a disk device (such as /dev/sda1) is selected, DWSE wipes it as a "Free Space" object. In the second stage, DWSE allows users to configure the adjustable parameters of WFile, WFree and CleanJ according to their requirements. In the last stage, DWSE performs a wiping algorithm based on the previous selections. The details of the algorithms are described in the rest of this section.

Fig. 1. Framework of DWSE

1. The algorithm for cleaning a journal file

To prevent possible recovery from the journal file, we need a reliable way to clean its content. Allison Henderson has proposed a patch for cleaning the journal file of an Ext4 file system[11]. The patch modifies the source code of Ext4 and JBD2 (Journaling block device 2) to add several functions that keep track of the journal file and flush all journal blocks related to a deleted file. But it is not a good idea for ordinary users to change the source code of a file system, and modifying the journal file is a dangerous operation that may leave an Ext4 file system unable to be mounted. Therefore, we exploit one feature of the journal file, its fixed size, as a cleaning mechanism. The steps of CleanJ are as follows:

1) Create temporary files, the number of which is specified by an adjustable parameter Ct;
2) Call "fallocate" to expand the size of every temporary file to the size of one block;
3) Call "sync" to flush the write buffer of the file system;
4) Delete the temporary files one by one, calling "sync" after each deletion.

"fallocate" and "sync" are two system calls supported by an Ext4 file system; we use them to create the temporary files and flush the file system buffer. The creation and deletion of the temporary files generate many new records in the journal file, and since the journal file has a fixed size, old records are removed. To determine the value range of Ct, we performed CleanJ on several Ext4 file systems of different sizes and found that the maximum of Ct can be calculated by the following formula:

max(Ct) ≈ sizeof(JournalFile) / (sizeof(block) × 8)

Here sizeof(·) denotes the size in bytes of the journal file or of a file system block; for example, a 32MB journal on a file system with 4KB blocks gives max(Ct) ≈ 1024. If users want to leave less data remnant in the journal file, a higher Ct should be configured; otherwise, a lower Ct can be set to save time.
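The following is a minimal sketch of CleanJ in Python, assuming a mounted, writable Ext4 volume; the mount point, block size and function name are illustrative, not part of the original scheme. Both `posix_fallocate` and `sync` map directly to the system calls named in the steps above.

```python
import os
import tempfile

def clean_journal(mount_point, ct, block_size=4096):
    """Churn the fixed-size journal so that old records are overwritten.

    Creates `ct` one-block temporary files, syncs, then deletes them
    one by one, syncing after each deletion. Every create/delete pair
    appends fresh records to the journal; because the journal has a
    fixed size, the oldest records are pushed out.
    """
    paths = []
    for _ in range(ct):                        # step 1: create Ct temp files
        fd, path = tempfile.mkstemp(dir=mount_point)
        os.posix_fallocate(fd, 0, block_size)  # step 2: expand to one block
        os.close(fd)
        paths.append(path)
    os.sync()                                  # step 3: flush the write buffer
    for path in paths:                         # step 4: delete and sync
        os.remove(path)
        os.sync()

# Example: clean_journal("/mnt/data", ct=1024)
```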
2. The algorithm for wiping a "File" object

To prevent data recovery by a file carving method, WFile is proposed for wiping a selected file more efficiently. The steps of WFile are as follows:

1) Obtain the inode of the file and get its range of blocks from the inode;
2) Use Op to overwrite the first and the last block Ot times;
3) If (Or > 0), use Op to overwrite the share of the remaining blocks specified by Or, Ot times;
4) Unlink the file and release its blocks through the inode;
5) Clean and free the inode;
6) Call CleanJ(Ct) to clean the journal file.

Or is the rate of remaining blocks, i.e. all blocks except the first and the last block of the file; Op and Ot are the garbage data pattern and the number of overwriting passes. In the first three steps, WFile destroys the header and footer information of the file; part of the remaining blocks can be skipped to improve efficiency, depending on the value of Or. After overwriting is finished, WFile removes the file in the next two steps. Finally, CleanJ is performed to clean the journal file.

The values of Op and Ot can be configured according to Ref.[2] and similar data wiping standards, but Or should be set according to the file format. If a file is not in a plain encoding format (such as JPG, MP3, etc.), it is not necessary to overwrite all remaining blocks, because they can hardly be distinguished and reorganized without the information in the file's header and footer. On the contrary, if a file is in a plain encoding format (such as a text file), all blocks should be overwritten to maintain security. We believe the value of Or should be set according to the following rule:

Or = 100%,        if the file is in a plain encoding format
Or = 10% ∼ 100%,  otherwise
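Below is a minimal sketch of WFile's overwriting stage (steps 1–3), assuming the file's block numbers have already been read from its inode and the device or image is open for binary read/write. Choosing the overwritten remainder blocks at random is our own assumption; the paper does not specify the selection rule.

```python
import random

def wipe_file_blocks(dev, blocks, block_size, o_r, o_p=b"\x00", o_t=1):
    """Overwriting stage of WFile (steps 1-3).

    `blocks` holds the file's block numbers (obtained from the inode's
    extent tree in the real scheme). The first and last block, which
    carry header/footer information, are always overwritten; of the
    remaining blocks, a fraction `o_r` is overwritten as well.
    """
    pattern = o_p * block_size
    targets = {blocks[0], blocks[-1]}          # step 2: header and footer
    rest = blocks[1:-1]
    if o_r > 0:                                # step 3: a share Or of the rest
        targets.update(random.sample(rest, int(len(rest) * o_r)))
    for _ in range(o_t):                       # Ot overwriting passes
        for blk in sorted(targets):
            dev.seek(blk * block_size)
            dev.write(pattern)                 # garbage pattern Op
    dev.flush()

# Example (hypothetical image file and block list):
# with open("disk.img", "rb+") as dev:
#     wipe_file_blocks(dev, [100, 101, 102, 103], 4096, o_r=0.5)
```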
3. The algorithm for wiping a "Free Space" object

Since free space can also contain recoverable sensitive data, a wiping scheme that can only wipe files is not enough. We call a block dirty if it contains non-zero bytes while its status is unallocated. Although filling free space with a big garbage file might seem a possible way to clean dirty blocks, Garfinkel has shown that this is an unreliable method[12]. To find and overwrite dirty blocks quickly and precisely, a novel free space wiping algorithm, WFree, is proposed. It takes advantage of a random sampling and hypothesis testing method. The steps of WFree are as follows:

1) If (ub0 ≠ 0) do steps 2)–7); otherwise do step 8);
2) Divide all blocks of the disk device into block groups;
3) For every block group, do steps 4)–7);
4) Take a random sample of block statuses in the current group and calculate the test statistic under a given significance level;
5) If the null hypothesis ub ≤ ub0 is rejected, do steps 6)–7);
6) Take a random sample of block contents in the current group and calculate the test statistic under a given significance level;
7) If the null hypothesis nz ≤ nz0 is rejected, overwrite every dirty block in the current group with Op, Ot times;
8) Overwrite every dirty block of the disk device with Op, Ot times;
9) Call CleanJ(Ct) to clean the journal file.

ub is the rate of unallocated blocks in a block group and ub0 is a threshold on that rate. If ub0 is zero, WFree scans all blocks to make sure every dirty block is wiped; otherwise, it tries to wipe them more efficiently. Since the status of every block in an Ext4 file system is either allocated or unallocated, we can set up the hypotheses "H0: ub ≤ ub0, H1: ub > ub0" for every block group, where H0 is the null hypothesis and H1 is the alternative hypothesis. According to the following formula, WFree calculates the test statistic u of a block group under a given significance level α (α = 0.05):

u = (X̄ − ub0) / (σ/√n)

The variance is σ² = ub0(1 − ub0), since the distribution of block status follows the 0–1 (Bernoulli) distribution. The rejection region is u ≥ uα. X̄ is calculated as X̄ = m/n, where n is the number of sampled blocks and m is the number of unallocated blocks in the sample. If the null hypothesis is accepted, there are only a few unallocated blocks in the block group and it is not necessary to handle it. Otherwise, WFree goes on to estimate the rate of non-zero bytes in the block group. The calculation is similar to the previous one, with hypotheses "H0: nz ≤ nz0, H1: nz > nz0". nz is the rate of non-zero bytes contained in a block group and nz0 is a threshold on that rate. If the null hypothesis is accepted, only a few non-zero bytes remain in the block group and it is not necessary to clean it. Otherwise, WFree overwrites every dirty block with the garbage data Op, Ot times. After the dirty blocks are cleaned, CleanJ is performed at the end of WFree.

Since WFree cannot obtain any information about deleted files, some dirty blocks may be skipped if higher values of ub0 and nz0 are set, which gives a file carving method more opportunity to recover deleted files. Therefore, it is better to set lower values to maintain acceptable security. The value ranges of ub0 and nz0 are:

0 ≤ ub0 ≤ 20%,  0 ≤ nz0 ≤ 20%
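A minimal sketch of WFree's per-group decision (steps 4–7) follows, assuming α = 0.05 (so uα ≈ 1.645). The `group` object with its `block_count`, `is_unallocated` and `sampled_byte_nonzero` probes is hypothetical, as is the sample size n; the last probe reads one random byte of a block and reports whether it is non-zero.

```python
import math
import random

U_ALPHA = 1.645  # critical value u_alpha for significance level alpha = 0.05

def reject_null(sample, p0):
    """One-sided test of H0: p <= p0 against H1: p > p0.

    `sample` is a list of 0/1 observations. The statistic is
    u = (X_bar - p0) / (sigma / sqrt(n)) with sigma^2 = p0(1 - p0),
    and H0 is rejected when u falls in the rejection region u >= u_alpha.
    """
    n = len(sample)
    x_bar = sum(sample) / n                    # X_bar = m / n
    sigma = math.sqrt(p0 * (1 - p0))           # 0-1 (Bernoulli) distribution
    u = (x_bar - p0) / (sigma / math.sqrt(n))
    return u >= U_ALPHA

def group_needs_wiping(group, ub0, nz0, n=100):
    """Per-group decision of WFree (steps 4-7); assumes ub0 > 0,
    since ub0 == 0 triggers a full scan (step 8) instead."""
    blocks = random.sample(range(group.block_count), n)
    # Steps 4-5: is the rate of unallocated blocks significantly above ub0?
    status = [int(group.is_unallocated(b)) for b in blocks]
    if not reject_null(status, ub0):
        return False                           # few unallocated blocks: skip
    # Steps 6-7: is the rate of non-zero bytes significantly above nz0?
    content = [int(group.sampled_byte_nonzero(b)) for b in blocks]
    return reject_null(content, nz0)
```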
IV. Experimental Results

To verify DWSE, several experiments were performed on Ubuntu 14.04. The files used in the experiments are listed in Table 1.

Table 1. List of files
Group     Type   Number
Group 1   TXT    100
Group 2   DOCX   300
          PDF    70
Group 3   JPG    200
          MP3    50
Group 4   ZIP    20
          AVI    6

As shown in Table 1, TXT denotes text files in a plain encoding format; the other types are not plain encoding formats. All files are organized into four groups. To evaluate the data remnant of the TXT files, we filled them with the hexadecimal byte 0x41. To keep every experiment consistent, we created a disk image and copied all files into it. For each file group, we performed the following experiment:

1) Make a copy of the disk image and mount it on a loop device;
2) Wipe a file group on the loop device with DWSE;
3) Unmount the image and analyze it with the recovery tools;
4) Same as 1);
5) Delete a file group with the "rm" command;
6) Wipe the free space of the loop device with DWSE;
7) Same as 3).

The parameters of DWSE are as follows: Or = 10%, 50%, 100%; Op = 0; Ot = 1; ub0 = 1%, 5%, 20%; nz0 = 1%, 10%, 20%; Ct = 0, 256, 512, 1024. The two recovery tools are WinHEX (version 17.8) and Extundelete (version 0.2.4). We defined several indicators for evaluating the data remnant left by DWSE:

CR = CT1/CT0 × 100% (TXT files),  CR = HD1/HD0 × 100% (other files)
JR = JR1/JR0 × 100%
JH = JH1/JH0 × 100%
RR = (2CR + JR + JH)/4

CR (Carving recovery rate) and JR (Journal recovery rate) represent the recovery rates of WinHEX and Extundelete, respectively. CT0 (or CT1) is the number of runs of three consecutive 0x41 bytes that can be found in the disk image before (or after) an experiment, over all TXT files. HD0 (or HD1) is the number of file headers found before (or after) an experiment, over all files of other types. JR0 is the number of files in a group; JR1 is the number of files recovered by Extundelete after an experiment. JH (Journal history) represents the rate of residual file names in the journal file: JH0 (or JH1) is the number of file names that can be found in the journal file before (or after) an experiment. RR (Recovery rate) is an overall recovery rate used to compare DWSE with other wiping schemes.
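As a small worked example of these definitions (our own illustration, not part of the paper), the sketch below computes the four indicators from the raw counts:

```python
def indicators(before, after, jr0, jr1, jh0, jh1):
    """Compute CR, JR, JH and RR (all in percent) from raw counts.

    For TXT files, `before`/`after` are CT0/CT1 (counts of the 0x41
    test pattern in the disk image); for other file types they are
    HD0/HD1 (counts of recognizable file headers).
    """
    cr = after / before * 100     # carving recovery rate
    jr = jr1 / jr0 * 100          # journal recovery rate
    jh = jh1 / jh0 * 100          # journal history rate
    rr = (2 * cr + jr + jh) / 4   # overall recovery rate
    return cr, jr, jh, rr

# Example: headers drop 70 -> 0, 10 of 100 files recovered by
# Extundelete, journal file names drop 100 -> 25:
# indicators(70, 0, 100, 10, 100, 25)  ->  (0.0, 10.0, 25.0, 8.75)
```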
1. Results of WFile

The experimental results of WFile are shown in Fig.2. The x-axis represents various combinations of Or and Ct. The left y-axis represents the run time (RT) of WFile and the right one represents RR; the three stacked area charts represent CR, JR and JH. The RR result for "Or = 100%, Ct = 0" is also shown in Fig.5.

Fig. 2. Experimental results of WFile

As shown in Fig.2, RT increases gradually with Or, but CR behaves differently across file groups: the CR of the first file group decreases gradually while that of the other groups stays zero. That is because WinHEX can easily find the content of TXT files in a disk image, but can hardly recover files of other types once their headers and footers are damaged. Although Ct takes various values, the JR of all file groups stays zero. The reason is that WFile removes a file not by calling a "remove" function but by the two steps described in the previous section. Although Extundelete fails to recover files in this situation, this does not mean there is no residual file information in the journal file: according to JH, some file names remain there. JH decreases gradually as Ct increases, which means that old records are replaced by the new records of the temporary files. For the first file group, the RR of the combination "Or = 50%, Ct = 1024" differs from that of "Or = 100%, Ct = 1024", but they are the same for the other file groups. Therefore, users should wipe all content of a plain encoding file to maintain security, or skip part of a non-plain encoding file to improve efficiency. By adjusting the values of Or and Ct, data can be wiped with a balance between efficiency and security according to the user's actual needs.

2. Results of WFree

The experimental results of WFree are shown in Fig.3. The x-axis represents various combinations of ub0, nz0 and Ct; the y-axes and the three stacked area charts have the same meaning as in Fig.2. The RR result for "ub0 = 1%, nz0 = 1%, Ct = 0" is also shown in Fig.6.

Fig. 3. Experimental results of WFree

As shown in Fig.3, RT decreases gradually as ub0 and nz0 increase, because the amount of overwritten data is reduced when WFree skips some dirty blocks at higher values of ub0 and nz0. But this brings an elevated CR, because the skipped dirty blocks might contain header or footer information of a deleted file. In the experiments, we found that a smaller file has more opportunity to "survive" the wiping process when ub0 and nz0 are higher than 5%. When the two parameters are lower than 5%, almost all dirty blocks are wiped by WFree, and it is hard to recover deleted files by a file carving method.

Unlike the results of WFile, the JR of WFree decreases gradually. This is because the files are deleted by a system command, so Extundelete can recover residual information about a deleted file such as its name and size; but the content of a deleted file is still lost when ub0 and nz0 are set to lower values. We developed a MATLAB program to display a disk image as a byte plot before and after a WFree experiment on the fourth file group with the parameter combination "ub0 = 1%, nz0 = 1%, Ct = 0". The comparison of the two byte plots is shown in Fig.4: the left one shows the image after all files are deleted, and the right one shows that the dirty blocks have been cleaned by WFree.

Fig. 4. Comparison of two byte plots

As shown in Fig.4, all dirty blocks are wiped by WFree. If there is no sensitive information in a file name, it is hard to obtain any further information. According to the above analysis, users can adjust the values of ub0, nz0 and Ct to balance the efficiency and security of WFree.

3. Comparative results

To evaluate the security and efficiency of DWSE, we compare it with two other wiping schemes. One is srm[13], an open-source tool for wiping files. We wipe all files in Table 1 with "srm" using the command-line parameters "-ll -z", which means that srm wipes files with all 0s in one pass. Since srm wipes a file 100% and has no algorithm similar to CleanJ, we compare it with the parameter combination "Or = 100%, Ct = 0"; the combination "Or = 100%, Ct = 1024" is also added as a reference. The comparison is shown in Fig.5.

Fig. 5. Comparison of WFile and srm

As shown in Fig.5, the RT and RR of "Or = 100%, Ct = 0" are both lower than those of srm. This is because the wiping process of srm adds an abundance of records related to the wiped files to the journal file, which greatly increases the JH of srm. On the contrary, WFile wipes the blocks of a file directly according to the block range obtained from its inode, so it only adds block records to the journal file instead of file records. Therefore, there is no growth of JH when Ct = 0. However, creation information of a file (such as its file name) may still remain in the journal file, which is why the RR of "Or = 100%, Ct = 0" is not zero. If users want to clean all data remnant, "Or = 100%, Ct = 1024" is the best choice, at the cost of the larger run time shown in Fig.5.

The other wiping scheme is sfill[14], an open-source tool for wiping free space. We delete all files in Table 1 with the rm command and then wipe the free space with "sfill" using the command-line parameters "-ll -z", which means that sfill wipes free space with all 0s in one pass. Since sfill has an algorithm similar to CleanJ, we compare it with the parameter combination "ub0 = 1%, nz0 = 1%, Ct = 1024"; the combination "ub0 = 1%, nz0 = 1%, Ct = 0" is added as a reference, as for WFile. The comparison is shown in Fig.6.

Fig. 6. Comparison of WFree and sfill

As shown in Fig.6, the RT and RR of "ub0 = 1%, nz0 = 1%, Ct = 1024" are both lower than those of sfill. This is because sfill wipes free space by creating a giant file and continuously writing garbage data into it until no free space is available. This brings a higher CR, since a garbage file cannot clean all dirty blocks, whereas WFree can clean almost all dirty blocks when ub0 and nz0 are lower than 5%. If Ct is set to its maximum, WFree cleans the journal file as well, which is why the RR of WFree is lower than that of sfill. As also shown in Fig.6, the RR of the combination "ub0 = 1%, nz0 = 1%, Ct = 0" is higher than that of sfill, since the journal file remains unchanged. If there is no sensitive information in the journal file, users can choose this combination to save more time.

V. Conclusion

Data wiping is an important branch of information security. A novel data wiping scheme for the Ext4 file system is proposed in this paper. It can wipe files or free space adaptively according to the user's selections. The three main advantages of our scheme are: 1) wiping files and free space adaptively to prevent data recovery by a file carving method and by a recovery method based on the journal file; 2) cleaning the data remnant of free space with a novel wiping algorithm that speeds up the cleaning of dirty blocks by random sampling and hypothesis testing; 3) offering several adjustable parameters that help users wipe their data with a balance between efficiency and security.
The experimental results show that our scheme can wipe files or free space at different levels of security and efficiency depending on the parameters. Moreover, DWSE achieves higher security and efficiency than the two other data wiping schemes.

References

[1] J. McColgan, "Tens of thousands of Americans sell themselves online every day", https://blog.avast.com/2014/07/08/tens-of-thousands-of-americans-sell-themselves-online-everyday/, 2014-7-8.
[2] P. Gutmann, "Secure deletion of data from magnetic and solid-state memory", Proceedings of the Sixth USENIX Security Symposium, San Jose, CA, USA, Vol.14, 1996.
[3] D. Boneh and R.J. Lipton, "A revocable backup system", USENIX Security, pp.91–96, 1996.
[4] S.-H. Kang, K.-Y. Park and J. Kim, "Cost effective data wiping methods for mobile phone", Multimedia Tools and Applications, Vol.71, No.2, pp.643–655, 2014.
[5] A. Mathur, M. Cao, S. Bhattacharya, et al., "The new ext4 filesystem: Current status and future plans", Proceedings of the Linux Symposium, Vol.2, pp.21–33, 2007.
[6] K.D. Fairbanks, "An analysis of Ext4 for digital forensics", Digital Investigation, Vol.9, pp.S118–S130, 2012.
[7] D. Kim, J. Park, K. Lee, et al., "Forensic analysis of android phone using ext4 file system journal log", Future Information Technology, Application, and Service, Vol.1, pp.435–446, 2012.
[8] N.E. Case, "Extundelete", http://extundelete.sourceforge.net, 2014-12-8.
[9] G.G. Richard III and V. Roussev, "Scalpel: A frugal, high performance file carver", Digital Forensic Research Workshop, New Orleans, LA, USA, 2005.
[10] S. Lee and T. Shon, "Improved deleted file recovery technique for Ext2/3 filesystem", The Journal of Supercomputing, Vol.70, No.1, pp.20–30, 2014.
[11] A. Henderson, "Ext4 secure delete", http://lwn.net/Articles/462440/, 2014-12-8.
[12] S.L. Garfinkel and D.J. Malan, "One big file is not enough: A critical evaluation of the dominant free-space sanitization technique", Privacy Enhancing Technologies, pp.135–151, 2006.
[13] Van Hauser, "srm - Linux man page", http://manpages.ubuntu.com/manpages/hardy/man1/srm.1.html, 2014-12-8.
[14] Van Hauser, "sfill - Linux man page", http://manpages.ubuntu.com/manpages/hardy/man1/sfill.1.html, 2014-12-8.

ZHANG Peng was born in 1980. He is currently pursuing the Ph.D. degree in information security at the School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, China. His research interests include secure deletion and software protection. (Email: longbow27@163.com)

NIU Shaozhang (corresponding author) was born in 1963. He is a professor at the School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, China. His research interests include network information security, steganography, steganalysis and digital forensics. (Email: szniu@bupt.edu.cn)

HUANG Zhenpeng was born in 1988. He is currently pursuing the Master's degree in information security at the School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, China. (Email: huangzhp@vip.qq.com)

QIN Xiaohua was born in 1990. She is currently pursuing the Master's degree in information security at the School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, China. (Email: qinxiaohua@163.com)