H-SWD: Incorporating Hot Data Identification into Shingled Write Disks

Chung-I Lin, Dongchul Park, Weiping He and David H.C. Du
Department of Computer Science and Engineering
University of Minnesota–Twin Cities
Minneapolis, MN 55455, USA
Email: {chungli, park, weihe, du}@cs.umn.edu

Abstract—Shingled write disk (SWD) is a magnetic hard disk drive that adopts the shingled magnetic recording (SMR) technology to overcome the areal density limit faced by conventional hard disk drives (HDDs). The SMR design enables SWDs to achieve two to three times higher areal density than HDDs can reach, but it also prevents SWDs from supporting random writes/in-place updates without a performance penalty. In particular, a SWD must deal with random write/update interference: writing to one track overwrites the data previously stored on the subsequent tracks. Some research has proposed serving random writes/updates out of place to alleviate the performance degradation, at the cost of introducing garbage collection. However, none of these studies evaluates SWDs based on garbage collection performance. In this paper, we propose a SWD design called Hot data identification-based Shingled Write Disk (H-SWD). H-SWD adopts a window-based hot data identification to effectively manage data in hot bands and cold bands so that it can significantly reduce the garbage collection overhead while preventing random write/update interference. Experimental results with various realistic workloads demonstrate that H-SWD outperforms the Indirection System. Specifically, incorporating a simple hot data identification empowers the H-SWD design to remarkably improve garbage collection performance.

Keywords—Shingled Magnetic Recording, SMR, Shingled Write Disk, SWD, Garbage collection, H-SWD.

I. Introduction

Digital data has grown dramatically from 281 EB (an exabyte equals 10^9 GB) in 2007 to 1.8 ZB (a zettabyte equals 10^12 GB) in 2011 [1]. This digital data explosion has stimulated research on continuously increasing hard disk drive (HDD) capacity. As a result, HDD capacity has steadily increased over the past few decades. However, magnetic HDDs face an areal density upper limit due to the super-paramagnetic effect [2], which is a critical challenge in magnetic recording. The super-paramagnetic effect requires that the magnetic grain volume be large enough to satisfy the media signal-to-noise ratio, write-ability, and thermal stability properties, which is the main reason that conventional magnetic recording fails to extend areal density beyond 1 Tb/in^2 [3].

HDDs are the most popular storage devices due to their large capacity and low cost. Although NAND flash-based Solid State Drives (SSDs) have attracted considerable attention for their ability to replace HDDs in many applications [4], [5], their price is still five to ten times higher than that of HDDs and grows substantially as storage capacity increases. In addition, the digital data explosion demands a huge amount of storage space. These facts ensure that HDDs retain their own merits against SSDs and serve as an important component in diverse applications. Consequently, the storage capacity of HDDs has grown continuously with the help of advanced technologies, but they now face the areal density upper limit due to the super-paramagnetic effect [6].
To overcome the HDD areal density limitation, Microwave Assisted Magnetic Recording (MAMR) [7], Heat Assisted Magnetic Recording (HAMR) [8], [9], [10], Bit-Patterned Media Recording (BPR) [11], [12] and Shingled Magnetic Recording (SMR) [13], [14], [15] have been proposed. Among these advanced magnetic recording technologies, SMR is the most promising because it does not require a significant change to the existing magnetic recording HDD makeup and write head [16]. In particular, SMR breaks the tie between write head width and track width to increase track density with no impact on cost or written-bit stability.

A Shingled Write Disk (SWD) is a magnetic hard disk drive that adopts SMR technology, so it also inherits the most critical issue of SMR: writing to a track affects the data previously stored on the shingled tracks. This issue results from the wider write head and the partially overlapped track layout, and it limits SWDs to archive/backup systems. However, users might still be interested in purchasing SWDs for general applications if they prefer much larger capacity. As a result, to both maximize SWD benefits and broaden SWD applicability, a proper treatment of the random write/update issue is mandatory to protect data integrity. One naive method is the read-modify-write operation. As discussed in [17], this method significantly degrades SWD performance in that every random write/update might incur multiple read and write operations. Therefore, the challenge is to propose a SWD design that can support random writes/updates under the limitation of SMR's distinct write behavior (i.e., it overwrites subsequent shingled tracks).

To mask the possible performance degradation due to random writes/updates, current SWD designs typically adopt out-of-place updates [17], [6]. Specifically, when an update is issued, the SWD invalidates the old data blocks and writes the updated data to other clean (i.e., free) blocks. Although doing so avoids the in-place update performance impact, it requires an address mapping table to store the physical block address (PBA) and logical block address (LBA) mapping information. Moreover, the out-of-place update method generates numerous invalid data blocks on the SWD and reduces the number of clean blocks over time. Therefore, a special reclamation mechanism (so-called garbage collection) is needed to reuse the invalid blocks. However, none of the existing works considers garbage collection (GC, hereafter) efficiency even though it is critical to the overall SWD performance.
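To make the out-of-place update path concrete, the following minimal Python sketch (our own illustration; the class and variable names are not from any existing SWD design) shows how an update can be redirected to a clean block at the log head while an LBA-to-PBA mapping table tracks the redirection and the superseded block is marked invalid for later garbage collection.

# Minimal sketch of out-of-place updates with an LBA-to-PBA mapping table.
# All names are illustrative only; this is not code from the paper.
FREE, VALID, INVALID = 0, 1, 2

class OutOfPlaceLog:
    def __init__(self, num_blocks):
        self.state = [FREE] * num_blocks   # per-PBA block state
        self.mapping = {}                  # LBA -> PBA
        self.head = 0                      # next clean PBA (circular-log head)

    def write(self, lba):
        old_pba = self.mapping.get(lba)
        if old_pba is not None:
            self.state[old_pba] = INVALID  # the old copy becomes garbage
        pba = self.head                    # append at the head; no shingled track is overwritten
        self.state[pba] = VALID
        self.mapping[lba] = pba
        self.head += 1                     # GC must eventually reclaim INVALID blocks

log = OutOfPlaceLog(num_blocks=16)
log.write(lba=7)   # first write of LBA 7 lands at PBA 0
log.write(lba=7)   # update: PBA 0 becomes invalid, new data lands at PBA 1

The accumulation of INVALID blocks in such a log is exactly what the GC mechanisms discussed in this paper must reclaim.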
In this paper, we propose a SWD design called Hot data identification-based Shingled Write Disk (H-SWD) to both mask the random write/update performance penalty and improve garbage collection efficiency. We utilize a window-based hot data identification to distribute data to hot bands or cold bands accordingly. The main contributions of this paper are as follows:

• An effective data placement policy: Effective placement of incoming data can have a considerable impact on SWD performance, and our design is inspired by this intuition. H-SWD employs a hot and cold data identification algorithm to effectively distribute incoming data across the SWD, which remarkably reduces valid data block movement in the hot bands.

• New garbage collection algorithms: H-SWD utilizes our hot data identification scheme for its GC algorithm and retains three basic GC policies. Unlike other typical GC algorithms, H-SWD tries to reclaim not only invalid data blocks but also valid blocks. Furthermore, we found that there is a sharing effect among different regions of a SWD. By employing shared regions to serve incoming requests, we can dynamically choose the best GC candidates with the help of our dynamic GC algorithm.

• A hot data identification for SWD: We incorporate a simple window-based hot data identification scheme that records information about past requests to the SWD. Both our data placement policy and our GC algorithms use this scheme.

The remainder of this paper is organized as follows. Section II gives an overview of magnetic recording technologies and existing shingled write disk designs. Section III explains the design and operations of our proposed H-SWD. Section IV provides a variety of experimental results and analyses. Section V concludes the discussion.

II. Background and Related Work

In this section, we give an overview of SMR and discuss related issues. We also review the Indirection System, which was proposed by Hitachi Global Storage Technologies [17].

A. Shingled Magnetic Recording

Shingled Magnetic Recording (SMR) can provide two to three times higher areal density than traditional HDDs without dramatically changing the makeup and the write head of current HDD designs. As revealed in [17], [16], [6], SMR adopts a shingled track design to increase the areal density.

Fig. 1. Shingled writes: shingled writes overwrite subsequent tracks [16].

Figure 1 [16] demonstrates that writing data to the (N+1)-th track overwrites data stored on the (N+2)-th and (N+3)-th tracks at the same time because the write head covers multiple tracks. For simplicity, this special writing behavior is referred to as write interference, which is the main reason that SMR performance suffers from random writes/in-place updates.

A shingled write disk (SWD) integrates a traditional HDD with the SMR technology. As a result, it inherits the SMR characteristics, including the write interference, which limits the applicability of SWDs. A naive method, discussed by Yuval et al. in [17], is the read-modify-write operation under a multiple shingled regions layout. Although this method prevents the write interference, it introduces a potentially high performance overhead and creates a trade-off between performance impact and space overhead [17], which might be worth further investigation. Rather than serving random writes/updates in place, out-of-place methods have been proposed [17], [16], [6]. With this design, which is analogous to SSD designs, the write interference is prevented and the performance impact is reduced. However, the design requires a mapping table to store the mapping between physical block addresses (PBAs) and logical block addresses (LBAs), and a GC algorithm is necessary since out-of-place methods create invalid blocks. Furthermore, most of the existing SWD layouts adopt a circular log structure to manage each region [17], [6].
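As a rough illustration of the read-modify-write cost mentioned above, the sketch below (our own simplification; the overlap factor k and the function name are hypothetical) counts the extra reads and writes an in-place update of a single track would incur when the write head overlaps k downstream tracks.

# Back-of-the-envelope cost of an in-place track update on a shingled region.
# This is our simplification; k models how many downstream tracks the head overlaps.
def read_modify_write_cost(track, region_last_track, k):
    """Return (reads, writes) needed to update `track` in place."""
    affected = min(k, region_last_track - track)  # downstream tracks that would be overwritten
    reads = affected                              # save the victim tracks first
    writes = 1 + affected                         # rewrite the target plus the victims
    return reads, writes

# Updating one track of a 100-track region with a head overlapping 2 tracks
# costs 2 extra reads and 2 extra writes on top of the single intended write.
print(read_modify_write_cost(track=10, region_last_track=100, k=2))  # -> (2, 3)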
Amer et al. [16] explored general issues in designing shingled write disks and focused on how to integrate SWDs into storage systems and applications. In their paper, a SWD is divided into log access zones (LAZs) and random access zones (RAZs). A LAZ contains shingled tracks while a RAZ contains single tracks. In particular, RAZs store metadata while LAZs store other data. Zones are separated by inter-band gaps, and an intra-band gap is also maintained within each zone. The authors also provide a high-level description of hot/cold data concepts along with hierarchical circular logs; however, they do not provide concrete GC algorithms.

B. Indirection System

Yuval et al. [17] proposed two Indirection Systems. We mainly review their second design as it provides smoother throughput; for simplicity, we refer to it as the Indirection System hereafter.

The Indirection System [17] serves random writes/updates out of place. It divides the entire disk space into multiple sections. Each section is in charge of a fixed range of consecutive LBAs and contains a cache buffer and an S-Block buffer, which are separated by an inter-band gap (to prevent write interference between the buffers). Cache buffers and S-Block buffers all use the storage space of the SWD. Each buffer is managed as a circular log with a head and a tail pointer. The distance from head to tail is defined as the number of PBAs that will be affected by the SWD write interference. Having briefly described the Indirection System block layout, we review its data placement and GC algorithms as follows.

• Data placement: The Indirection System always stores an incoming write at the head of a cache buffer; no data can be written to any S-Block buffer directly. When a cache buffer is filled up with valid blocks (i.e., contains no invalid block), the Indirection System migrates a full S-Block to an S-Block buffer. In particular, if an S-Block contains enough invalid blocks, it is updated with the cached data and moved to the head of the S-Block buffer. The associated data in the cache buffer is invalidated accordingly.

• Garbage collection algorithms: The Indirection System proposes three GC algorithms: the cache buffer defrag, the group destage, and the S-Block buffer defrag. One of the three is invoked depending on the utilization of a cache buffer or an S-Block buffer; the details are given in their paper [17]. Furthermore, a circular log is maintained for each buffer. If the tail points to a valid block, a GC operation moves the pointed block to the head position. Otherwise, the tail-pointed block is invalid and can simply be freed without any explicit performance overhead.

The Indirection System provides a detailed SWD design including the block layout, the data placement policy and the GC algorithms. However, due to its data placement and GC algorithms, it suffers from significant GC overhead, which degrades the overall SWD performance.
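To summarize the data path just reviewed, here is a highly simplified Python sketch of a section with a cache buffer and an S-Block buffer. The class names, the destage trigger, and the S-Block size are our own simplifications and should not be read as the authors' implementation in [17].

# Highly simplified sketch of the Indirection System data path reviewed above.
# Names, the destage trigger, and the S-Block size are our simplifications.
SBLOCK_SIZE = 4   # blocks per S-Block (hypothetical toy value)

class Section:
    def __init__(self, cache_capacity):
        self.cache = []           # cache buffer, used here as a simple append-only log of LBAs
        self.cache_capacity = cache_capacity
        self.sblock_buffer = []   # destaged S-Block IDs end up here

    def write(self, lba):
        self.cache.append(lba)    # every incoming write goes to the cache buffer first
        if len(self.cache) >= self.cache_capacity:
            self.group_destage()

    def group_destage(self):
        # "largest-number-of-cached-blocks" destage: pick the S-Block with the most cached blocks
        cached_per_sblock = {}
        for lba in self.cache:
            cached_per_sblock.setdefault(lba // SBLOCK_SIZE, []).append(lba)
        victim = max(cached_per_sblock, key=lambda s: len(cached_per_sblock[s]))
        self.sblock_buffer.append(victim)   # stand-in for merging and rewriting the S-Block
        self.cache = [lba for lba in self.cache if lba // SBLOCK_SIZE != victim]

sec = Section(cache_capacity=8)
for lba in [0, 1, 2, 9, 3, 17, 25, 33]:
    sec.write(lba)   # the eighth write fills the cache and triggers a group destage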
Based on this observation, we propose a SWD design, H-SWD, that improves SWD performance by incorporating hot data identification into both the data placement policy and the GC algorithms.

III. H-SWD Design

This section describes our hot data identification-based shingled write disk (H-SWD) design.

A. The block layout

Since shingled write disks (SWDs) cannot efficiently support random writes/updates in place at the hardware level, a data block layout at the system software level becomes necessary. There are a few possible SWD block layouts [17], [16], [6], including non-shingled regions (RAZs), circular logs, and flexible regions. Out of these existing block layout designs, we simply adopt a circular log block layout, identical to what has been proposed in [17], [16], [6].

Fig. 2. The block layout of the H-SWD. To support random writes/updates, the H-SWD adopts a circular log layout [16].

Figure 2(a) gives a SWD block layout with a circular log, while Figure 2(b) displays a logical view of the block layout for convenience. A region contains multiple shingled tracks, which are separated by intra-/inter-band gaps to prevent the write interference. Two pointers, head and tail, are maintained in each band: any incoming data is written at the head, while data removal for a GC operation starts at the tail. Specifically, H-SWD uses fixed-size bands, and the number of bands is configurable based on the entire SWD disk space as well as the ratio of hot bands to cold bands.

B. Architecture

Fig. 3. Logical view of the H-SWD architecture (address mapping, garbage collection algorithms, and hot data identification on top of sections of hot and cold bands with head/tail pointers, separated by intra-band and inter-band gaps). Each section works independently, which means it can be treated as multiple independent drives.

Figure 3 gives an overview of our H-SWD design. H-SWD divides the entire disk space (including over-provisioning) into hot bands and cold bands. Each band consists of multiple data tracks. We initially assign 1% over-provisioning to the hot bands, and the whole SWD storage capacity is served as cold bands (identical to the Indirection System). A hot band can be associated with multiple cold bands, and this unit is called a section. Similar to the Indirection System, the full range of LBAs in the disk is divided among the sections of the H-SWD, and each section can be managed independently. H-SWD maintains two types of guard gaps, inter-band gaps and intra-band gaps, to meet the write interference requirement: each band is separated by a specific number of tracks (called a guard gap), as proposed in [16], [6]. As a result, writing to the last track in a band does not corrupt the data in the first track of the neighboring band. Both hot bands and cold bands are logically managed as a circular log with a tail and a head pointer. In addition, both pointers move forward while satisfying the head-tail proximity (i.e., block interference profile) requirement [17]. Each hot band is associated with its corresponding multiple cold bands.

The main idea of H-SWD is to classify incoming data into hot data and cold data by employing a hot data identification scheme; thus, the hot data identification system is one of the main components of the architecture. Unlike the Indirection System, in which all incoming data must be stored in a cache buffer, our H-SWD first classifies the data and assigns it to hot or cold bands accordingly. Furthermore, since we adopt an out-of-place update design, we need to map LBAs to PBAs in the SWD. We assume block-level (i.e., sector-level) mapping, and the mapping table can reside in the SWD or in Non-Volatile Memory (NVRAM). Lastly, GC algorithms are very important components of a SWD design because the SWD must effectively reclaim invalidated data blocks to accommodate incoming writes, and this GC overhead severely affects the overall performance.
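The band abstraction described above can be captured in a few lines. The following sketch (our simplification; field and method names are ours) shows a fixed-size band managed as a circular log with head and tail pointers and per-block free/valid/invalid states, which is the structure that the GC policies in the next subsection operate on.

# Minimal circular-log band (our simplification of the layout in Figures 2 and 3).
FREE, VALID, INVALID = "F", "V", "I"

class Band:
    def __init__(self, size):
        self.blocks = [FREE] * size   # block states within the band
        self.head = 0                 # where the next write lands
        self.tail = 0                 # where GC reclaims blocks
        self.size = size

    def append(self):
        """Write at the head of the circular log and advance the head pointer."""
        pba = self.head
        self.blocks[pba] = VALID
        self.head = (self.head + 1) % self.size
        return pba

    def invalidate(self, pba):
        """Mark a block as superseded by an out-of-place update."""
        self.blocks[pba] = INVALID

    def utilization(self):
        """(valid + invalid) / total blocks, the quantity the GC trigger checks."""
        return sum(1 for b in self.blocks if b != FREE) / self.size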
Algorithm 1 An Intra-Band GC Algorithm
Function Intra-Band GC()
1: if (Block utilization ≥ GC THRESHOLD) then
2:   while Free space < RECLAIM THRESHOLD do
3:     if (Valid(tail)) then
4:       Move the data to the head
5:       head = head + 1
6:     else
7:       Free the block in the tail
8:       Increase the Free space
9:     end if
10:    tail = tail + 1
11:  end while
12: end if

Algorithm 2 A Normal GC Algorithm
Function Normal GC()
1: if (Block utilization ≥ GC THRESHOLD) then
2:   while Free space < RECLAIM THRESHOLD do
3:     if (Valid(tail)) then
4:       if (IsCold(tail)) then
5:         Move the data to a cold band
6:         Free the block
7:         Increase the Free space
8:       else
9:         Move the data to the head
10:        head = head + 1
11:      end if
12:    else
13:      Free the block in the tail
14:      Increase the Free space
15:    end if
16:    tail = tail + 1
17:  end while
18: end if

C. System Algorithms

• Hot data identification: A hot data identification scheme is adopted by our GC algorithms as well as by the data placement policy in H-SWD. Since the definition of hot data can differ across applications, we design our own hot and cold data identification scheme instead of adopting existing algorithms. Our scheme employs a window concept. This window stores past LBA requests and therefore captures temporal locality: if a data access displays high temporal locality, it is highly likely that the data will be accessed again in the near future. If incoming data appears in this window at least once, we define the data as hot; otherwise, it is classified as cold. This static scheme can be replaced with a dynamic scheme that changes the hot data definition based on workload characteristics. The dynamic scheme might improve H-SWD performance further; however, it definitely requires more resources for its learning process.

• Data placement: One of our key ideas lies in the data placement policy. Unlike other SWD designs, on receiving a write request, the H-SWD first attempts to classify the incoming data as hot or cold. If the data is identified as hot (i.e., it appears in the window), it is stored at the head position of a hot band. Otherwise, it is assigned to the head of a cold band. The main objective of our incoming data placement policy is to collect frequently re-accessed data (hot data) in the same bands to increase GC efficiency. Much of the data in the hot bands is expected to be accessed again soon, which produces a higher number of invalid data blocks in the hot bands. As a result, the number of valid data block movements from the tail position to the head position during the GC process can be significantly reduced. Consequently, this simple policy dramatically improves the GC performance.

• Garbage collection algorithms: A GC algorithm is important to the shingled magnetic recording (SMR) technology. This reclamation process is mostly executed on hot bands due to frequent data updates and limited space. H-SWD adopts our hot data identification for GC and applies different policies to hot bands and cold bands. Unlike typical GC algorithms in other designs, H-SWD tries to reclaim not only the invalid blocks but also valid blocks if they become cold. This special policy helps H-SWD reduce unnecessary cold block movements. The H-SWD maintains three basic GC policies: an intra-band GC, a normal GC, and a forced GC. These fundamental policies are exploited by both hot bands and cold bands.
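Before turning to the individual GC policies in detail, the following sketch (our own rendering, not the simulator's code; helper names are ours) shows the window-based classifier and the placement path just described: an LBA is hot if it appears in a FIFO window of recently written LBAs, and hot data is appended to a hot band while cold data goes to a cold band.

# Sketch of the window-based hot data identifier and placement path (our rendering).
from collections import deque

class WindowHotDataIdentifier:
    def __init__(self, window_size=4096):
        self.window = deque(maxlen=window_size)   # FIFO of recently written LBAs

    def is_hot(self, lba):
        """Classify `lba`, then record it so the oldest entry falls out of the window."""
        # Until the window is warm, every request is treated as hot (see Section IV).
        hot = len(self.window) < self.window.maxlen or lba in self.window
        self.window.append(lba)
        return hot

def place_write(lba, identifier, hot_band, cold_band):
    """Append the write to the hot band if the LBA is hot, otherwise to the cold band."""
    target = hot_band if identifier.is_hot(lba) else cold_band
    target.append(lba)   # stand-in for writing at the band's head pointer
    return target

identifier = WindowHotDataIdentifier(window_size=4)
hot_band, cold_band = [], []
for lba in [1, 2, 3, 4, 9, 2, 2, 50]:
    place_write(lba, identifier, hot_band, cold_band)
# After warm-up, LBA 2 is re-written while still in the window, so it lands in hot_band;
# LBAs 9 and 50 are first-time writes after warm-up and land in cold_band.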
The intra-band GC reads the block at the tail position of a band and checks whether it is valid (i.e., live) or invalid (i.e., outdated). If it is a valid block, it is moved to the head of the band; otherwise, it is freed. Lastly, both pointers (head and tail) move forward by one accordingly (Algorithm 1).

The normal GC is a fundamental GC policy in the H-SWD design. It includes the aforementioned intra-band GC and adds one more policy: it also tries to reclaim valid data blocks. In other words, when the tail pointer reaches a valid block, the H-SWD checks whether the corresponding data is hot. If the data is identified as hot, it moves the block to the head and both the head and tail proceed to the next position (as in the intra-band GC). Otherwise, it moves the block from the hot band to a cold band and frees the pointed block (Algorithm 2).

The forced GC is a more aggressive GC policy, which is invoked only when the H-SWD cannot prepare a specified amount of free space even after the normal GC processes. For instance, when a band does not contain enough invalid blocks and all (or almost all) valid blocks hold hot data, the forced GC is triggered as follows: the H-SWD chooses the required number of blocks starting from the tail and migrates them to a cold band whether they hold hot data or not (Algorithm 3).

Algorithm 3 A Forced GC Algorithm
Function Forced GC()
1: while Free space < RECLAIM THRESHOLD do
2:   if (Valid(tail)) then
3:     Move the data to a cold band
4:   end if
5:   Free the block
6:   Increase the Free space
7:   tail = tail + 1
8: end while

Algorithm 4 A GC Algorithm for Hot Bands
Function Hot Band GC()
1: GC THRESHOLD = 80%
2: DELAY GC THRESHOLD = 95%
3: RECLAIM THRESHOLD = 10%
4: if (Block utilization ≥ GC THRESHOLD) then
5:   if (Invalid blocks ≥ RECLAIM THRESHOLD) then
6:     while Free space < RECLAIM THRESHOLD do
7:       Normal GC()
8:     end while
9:   else
10:    if (Block utilization ≥ DELAY GC THRESHOLD) then
11:      while Free space < RECLAIM THRESHOLD do
12:        if (Invalid blocks > 0) then
13:          Normal GC()
14:        else
15:          Forced GC()
16:        end if
17:      end while
18:    end if
19:  end if
20: end if
However, since this is an extreme case, the H-SWD very rarely invokes this GC in realistic environments. In fact, we did not observe this situation in any of our experiments with diverse realistic workloads.

Based on these basic GC policies, the GC for hot bands invokes the normal GC and the forced GC algorithms. The hot bands maintain both a high watermark and a low watermark indicating when to invoke and when to stop the algorithm, respectively. Overall, the GC policy is as follows: if a hot band's block utilization reaches 80%, the H-SWD starts to reclaim invalid blocks by triggering the normal GC algorithm until it provides a specific amount of free space (initially 10%). The block utilization of a band is defined as (Valid block # + Invalid block #) / (Valid block # + Invalid block # + Free block #).

Algorithm 4 describes our main GC policy for a hot band in detail. When the block utilization of a hot band reaches 80%, the H-SWD invokes this GC algorithm. If the hot band retains enough invalid blocks (10%), the H-SWD performs a normal GC because this guarantees that the H-SWD can obtain the predefined amount of free space after a GC operation. Otherwise, it cannot guarantee that the H-SWD will obtain the predefined amount of free space, since all valid data blocks may still be identified as hot during the GC process. Even though the H-SWD could achieve the required amount of free space by executing the normal GC, doing so would necessarily move some valid blocks from the hot band to the cold band. Therefore, in this case, the H-SWD delays the hot band GC even if the hot band's block utilization reaches or exceeds the utilization limit, since some of the hot blocks in the hot band are likely to be updated soon (which would increase the number of invalid blocks in the band). If the hot blocks finally cannot meet the requirement (in the worst case), the H-SWD triggers the forced GC process. Unlike the Indirection System, our GC algorithm obtains multiple (initially 10% of a band size) free blocks whenever a GC is invoked.

Unlike the hot bands, the cold bands have enough space to accommodate the incoming data and contain (mostly) cold data; thus, in most cases, they do not frequently invoke a GC. When necessary, the cold bands perform the intra-band GC.
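To tie the thresholds and the delayed-GC rule together, here is a compact Python rendering of the hot-band policy in Algorithm 4. The threshold values follow the listing; the band object and the normal/forced GC steps are placeholders we assume, not the simulator's interfaces.

# Compact rendering of the hot-band GC policy in Algorithm 4 (helpers are assumed).
GC_THRESHOLD = 0.80          # start reclaiming at 80% band utilization
DELAY_GC_THRESHOLD = 0.95    # only force cleaning beyond 95% utilization
RECLAIM_THRESHOLD = 0.10     # stop once 10% of the band is free again

def hot_band_gc(band, normal_gc_step, forced_gc_step):
    """One GC pass over a hot band.

    `band` is assumed to expose utilization(), invalid_ratio() and free_ratio();
    `normal_gc_step` / `forced_gc_step` free blocks at the tail as in Algorithms 2 and 3.
    """
    if band.utilization() < GC_THRESHOLD:
        return                                   # below the high watermark: nothing to do
    if band.invalid_ratio() >= RECLAIM_THRESHOLD:
        # enough garbage: the normal GC alone is guaranteed to reach the target
        while band.free_ratio() < RECLAIM_THRESHOLD:
            normal_gc_step(band)
    elif band.utilization() >= DELAY_GC_THRESHOLD:
        # band is almost full of hot data: the delay ends and cleaning is forced
        while band.free_ratio() < RECLAIM_THRESHOLD:
            if band.invalid_ratio() > 0:
                normal_gc_step(band)
            else:
                forced_gc_step(band)
    # otherwise the GC is delayed, waiting for hot blocks to be invalidated by updates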
IV. Experimental Results

This section provides diverse experimental results and comparative analyses.

A. Evaluation Setup

We choose the Financial-1 trace and three Microsoft Research (MSR) traces: prxy volume 0, rsrch volume 0 and web volume 0. The Financial-1 trace is an On-line Transaction Processing (OLTP) application trace collected at the University of Massachusetts Amherst Storage Repository [18]. The MSR traces were collected by the Microsoft Research Cambridge Lab [19], and each trace demonstrates different workload characteristics. In particular, we consider only write requests, since only write requests can increase the number of invalid blocks and trigger a GC operation. Furthermore, if a request size is greater than 1, we split the request into multiple requests of size 1, as adopted in [20], [21]. For example, a request writing 7 blocks is split into 7 requests, each of which writes a distinct block. Table I shows that the update ratio of the first 1,800K write requests in each trace is higher than 55% (each request has been split into requests of one block).

TABLE I
Workload Characteristics (K = 1,000 Requests)

Traces       Number of writes   Write size (# of sectors)   Update %
Financial1   1,800K             1,800K                      55.4%
Prxy0        1,800K             1,800K                      85.6%
Rsrch0       1,800K             1,800K                      84.8%
Web0         1,800K             1,800K                      61.1%

We implemented both the Indirection System and the H-SWD simulators. The underlying storage capacity contains 10^8 blocks (a block equals 512 bytes), which equals 51.2 GB. Although 51.2 GB does not represent a real SWD capacity (current HDDs can already provide 3 TB of storage space), the simulations still proportionally demonstrate SWD performance under the various traces, considering the ratio of the total write size to the underlying storage capacity. The same 2% over-provisioning is provided in the H-SWD design and distributed equally to the hot bands and cold bands (the assignment is identical to the Indirection System). An S-Block contains 2,000 blocks, as simulated in [17]. Furthermore, we maintain the intra-band gaps and inter-band gaps for guarding purposes, as proposed in [16], [6]. Therefore, the block layout of the H-SWD matches that of the Indirection System, which saves us from comparing the space overhead between the Indirection System and our design. For each band, one track is assumed to serve as an intra-band gap (a track contains 500 sectors).

The window size for the hot data identification is initially 4,096 (4K). The H-SWD uses this window to store past logical block address (LBA) information (only the LBA is recorded). The first 4,096 requests are considered hot since the window does not yet contain enough LBA information for identification. Then, if an incoming request appears in the window at least once, it is identified as hot; otherwise, it is cold. After a request has been identified, the oldest LBA is removed from the window and the latest LBA is stored in it.

More importantly, we need to consider the GC parameters, namely when to start (the starting threshold, hereafter) and when to stop (the stopping threshold, hereafter) a GC operation. We first vary the starting threshold of band utilization from 70% to 90% while fixing the stopping threshold at 10%, which means a GC operation stops after it frees 10% of a band's size. We then vary the stopping threshold from 10% to 30% while fixing the starting threshold at 80%. In this simulation, the H-SWD divides the underlying storage space into 20 equal sections. Since the simulation results of the other traces show very similar patterns, we only present the Financial-1 trace results (see Figure 4). As shown in Figure 4(a), there is no significant difference in block movements under different starting thresholds; adjusting the starting threshold only changes when the block movement peaks (peaks, hereafter) occur, since GC operations are initiated at different occasions. Figure 4(b) shows the block movements of the Financial-1 trace under various GC stopping thresholds. We observed that when the stopping threshold increases, the peaks become higher while the number of peaks decreases, which fits the intuition. Furthermore, from the SWD performance point of view, flattening the peaks is preferred since a higher peak implies a longer delay that a SWD must wait for GC operations.

Fig. 4. The number of block movements under various (a) starting and (b) stopping GC thresholds for the Financial-1 trace (x-axis: write requests, unit: 1,500K; y-axis: block movements, unit: 10K).

B. Performance Metrics

GC performance is very important to the overall system performance. We measure the number of block movements. A block movement means that a valid block is moved to a different position in the SWD due to a GC operation, which requires a read, a seek, and a write to complete. If a SWD design requires more block movements than other designs, it tends to suffer from performance degradation due to the GC overhead. In this respect, the number of block movements can be used as a good proxy for the overall SWD performance as well as the GC overhead. To further understand the GC efficiency, we periodically measure the valid block ratio every 900K write requests (K equals 1,000). We compute the valid block ratio as Valid block # / (Valid block # + Invalid block #). The hot false identification counts and cold false identification counts are used to relate the prediction accuracy of the window-based hot data identification to the H-SWD performance. A false identification means that there is a discrepancy between the previous and the current identification of an LBA. Specifically, a hot false identification indicates that an LBA was identified as hot previously but becomes cold in the current identification; in this case, we count the previous identification as a hot false identification. The cold false identification count is obtained in the same way.
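The metrics above are simple to tally from simulator state. The sketch below (our own bookkeeping code, not the simulator's) shows how the block-movement counter, the valid block ratio, and the two false identification counts could be maintained.

# Bookkeeping sketch for the metrics of Section IV-B (our own code).
class Metrics:
    def __init__(self):
        self.block_movements = 0   # valid blocks relocated by GC (each costs a read, seek, write)
        self.hot_false = 0         # previously hot, classified cold now
        self.cold_false = 0        # previously cold, classified hot now
        self.last_decision = {}    # LBA -> last hot/cold decision

    def on_gc_move(self, count=1):
        self.block_movements += count

    def on_classification(self, lba, is_hot):
        previous = self.last_decision.get(lba)
        if previous is not None and previous != is_hot:
            if previous:           # was hot, now cold: count against the earlier decision
                self.hot_false += 1
            else:                  # was cold, now hot
                self.cold_false += 1
        self.last_decision[lba] = is_hot

def valid_block_ratio(valid_blocks, invalid_blocks):
    """Valid block # / (Valid block # + Invalid block #)."""
    total = valid_blocks + invalid_blocks
    return valid_blocks / total if total else 0.0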
C. Results and Analysis

We discuss our experimental results from several aspects.

• Indirection System Performance: We first investigate the Indirection System performance by varying the number of sections. Each section equally shares the underlying storage capacity (51.2 GB) and the 2% over-provisioning (1% for cache buffers and 1% for S-Block buffers). For simplicity, each section contains one cache buffer and one S-Block buffer; therefore, the more sections we simulate, the smaller the space each section has. Furthermore, the underlying storage is initialized as empty. The largest-number-of-cached-blocks policy proposed in [17] is adopted for group destage; it migrates the S-Block that contains the largest number of cached blocks from a cache buffer to an S-Block buffer.

Fig. 5. The number of block movements of the Indirection System under various numbers of sections (5 to 25) and traces: (a) Financial 1, (b) Prxy 0, (c) Rsrch 0, (d) Web 0 (x-axis: write requests, unit: 900K).

Figure 5(a) demonstrates that the Financial-1 trace presents a similar GC overhead under the various section configurations. Figures 5(b), 5(c) and 5(d) demonstrate that the overall block movements decrease as we increase the number of sections. The experimental results show that a small section size can give an upper bound to the GC operation. However, we also observed that the block movements increase when the number of sections reaches 25. This pattern occurs because the cache buffer becomes too small to buffer the write-intensive sections.

• H-SWD vs. Indirection System: We compare our H-SWD and the Indirection System. We use 20 sections as the Indirection System layout since it provides better GC performance.
For fairness, the H-SWD utilizes the same section configuration (i.e., 20 sections). Each section contains a pair of a hot band and a cold band, whose sizes are identical to the cache buffer size and the S-Block buffer size, respectively. We set the window size to 4,096. Furthermore, a GC operation starts when a band reaches 80% utilization and stops after it frees 10% of the band size.

Fig. 6. The number of block movements of the Indirection System (IS) and H-SWD under various traces: (a) Financial 1, (b) Prxy 0, (c) Rsrch 0, (d) Web 0 (x-axis: write requests, unit: 900K).

Figure 6 demonstrates that our H-SWD significantly reduces the block movements required by GC operations. Because H-SWD selectively stores data to hot bands and cold bands based on the hot data identification result, data in the hot bands is prone to be updated frequently. Although grouping hot data into hot bands increases the GC frequency, H-SWD does not introduce higher GC overhead since a SWD does not require an explicit operation to free an invalid block.

Fig. 7. Valid block ratio in the Indirection System cache buffers and the H-SWD hot bands under various traces.

Figure 7 illustrates the valid block ratio of the Indirection System cache buffers and the H-SWD hot bands. We found that the Indirection System has a higher valid block ratio than the H-SWD. Although a higher valid block ratio helps storage utilization, it also critically impacts the GC efficiency, given that every valid block pointed to by the tail pointer costs one block movement during a GC operation. From Figures 6 and 7, we can conclude that as the valid block ratio in the cache buffers grows, the total block movements of the Indirection System increase dramatically. Because the Indirection System always stores any incoming request in a cache buffer and only moves data from a cache buffer to an S-Block buffer when the cache buffer is full of valid blocks, some cold data remains in the cache buffers and continues to degrade the overall GC efficiency.

Fig. 8. Valid block ratio in the Indirection System S-Block buffers and the H-SWD cold bands under various traces.

Figure 8 illustrates the valid block ratios of the Indirection System S-Block buffers and the H-SWD cold bands. At the beginning, both H-SWD and the Indirection System provide high valid block ratios. However, the valid block ratio of the Indirection System decreases at a faster rate and eventually falls below that of H-SWD. Note that the reasons behind the decreasing rates of the two designs are not the same. For the Indirection System, the decrease results from the group destage selection, which only considers the S-Block filling ratio without using the hot/cold data concept; thus, some S-Blocks moved to the S-Block buffer might still experience frequent updates, which leads the valid block ratio to decrease. For H-SWD, the decreasing trend is due to cold data false identification. Nevertheless, performing a GC operation on a band with a low valid block ratio is preferred (even though the GC frequency might increase) since invalid blocks can easily be freed without a performance penalty.
Fig. 9. The number of block movements of H-SWD under various window sizes (2K, 4K, 8K, 16K) and traces: (a) Financial 1, (b) Prxy 0, (c) Rsrch 0, (d) Web 0 (x-axis: write requests, unit: 900K).

• Impact of Window Size: Another important factor to investigate is the window size of the hot data identification. Figure 9 shows the number of block movements of the H-SWD under various window size configurations and traces. As shown in Figure 9, the number of block movements grows as we increase the window size. Intuitively, a larger window size implies that more requests are considered hot. Since the number of block movements under the current simulation setup is dominated by hot band GC operations, the H-SWD introduces a higher GC overhead under larger window size configurations. Therefore, to better understand the impact of different window sizes, we make use of the hot data false identification count and the cold data false identification count.

Fig. 10. Hot data false identification counts under various window sizes and traces.

Fig. 11. Cold data false identification counts under various window sizes and traces.

Figure 10 and Figure 11 plot the hot data and cold data false identification counts, respectively. In the Prxy 0 and Rsrch 0 traces, both hot data and cold data false identification counts decrease as we increase the window size. However, in the Web 0 trace, the window size of 4K provides better identification accuracy than the other configurations; this is because the Web 0 trace contains more random write requests than the other traces. In the Financial 1 trace, only a slight difference can be observed. More importantly, we can observe hot data and cold data false identification peaks since there are access transitions in these traces (i.e., accesses move to a different LBA range).

From Figure 9 and Figure 10, we observed that a hot data false identification peak usually causes a hot block movement peak in a later write request unit. This is because our H-SWD GC algorithm starts to clean old hot data (which has become cold due to an access transition) from hot bands to cold bands. Based on this observation, we conclude that when the cold band GC starts to dominate block movements, a larger window size gives better overall SWD performance since it provides better identification accuracy (except for traces containing frequent write access transitions).

• Sharing Effect: Recall that each section is in charge of a fixed range of LBAs, so each section can be viewed as an independent device. This design may lead to a skewed data distribution across sections unless the mapping scheme has a judicious load balancing algorithm. Based on this observation, it might be beneficial to break the association between hot bands and cold bands in each section, as such a layout is more flexible than the original section layout. Therefore, we group the hot bands into a hot band pool and the cold bands into a cold band pool.
As a result, there is no section concept in this band sharing layout, although the design still needs to cope with the physical layout. For simplicity, we refer to this band sharing layout H-SWD as the pool-based H-SWD. With the help of this band sharing layout, we only need to switch the write head to the next band when the current band is full, while in the section-based layout the incoming requests might belong to different sections (recall that we only consider write requests). Furthermore, the pool-based H-SWD enables us to design a more intelligent GC algorithm that chooses the best GC candidate and further reduces the required block movements. We briefly discuss this GC algorithm, used specifically for the pool-based H-SWD, below.

Algorithm 5 A Pool-based GC Algorithm
Function Pool-based GC()
1: temp_i = tail_i, where i is the band ID
2: while Free space < RECLAIM THRESHOLD do
3:   Choose one band with the smallest distance value (D_i)
4:   Free the (D_i + 1) blocks in Band_i starting from tail_i
5:   Free space = Free space + (D_i + 1)
6:   Move tail_i forward by (D_i + 1) positions
7:   Update the distance value (D_i) accordingly
8: end while
9: tail_i = temp_i

Algorithm 5 describes the overall process of this GC algorithm. The new GC algorithm minimizes the required valid block movements by carefully choosing a victim band. To make the best candidate decision, we use the distance of the first invalid block from the tail position of a band. The overall process is as follows. We assume there is a predefined number (say M) of free blocks required for cleaning (i.e., the free space requirement). Suppose that the hot band pool GC is invoked. Our pool-based H-SWD first chooses the band with the smallest distance from the tail to its first invalid block, and then relocates all the consecutive blocks between them to other hot or cold bands accordingly. Assuming the distance is K and the number of hot blocks within this range is H, this cleaning process produces K − H + 1 free blocks in the hot band pool and also creates K − H + 1 valid blocks in the cold band pool. The cold band pool GC is analogous to the hot band pool GC. The tail pointer moves forward accordingly, similar to the three GC algorithms in H-SWD. Next, it examines the distance values of all bands, including the just-cleaned band, and again chooses the best one (i.e., the band with the smallest distance value, say L). Then it reclaims these L + 1 consecutive blocks and moves the valid data to a hot or cold band, respectively. Lastly, it moves the tail pointer forward from its initial position by L + 1. These steps are iterated until the total of M free blocks is reached. This GC algorithm enables our pool-based H-SWD to minimize data block movement.
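A minimal sketch of the distance-based victim selection behind Algorithm 5 follows; it is our own rendering under simplified assumptions (each band is reduced to a list of block states starting at its tail, and the relocation of the valid blocks is left implicit).

# Sketch of the distance-based victim selection in Algorithm 5 (our rendering).
VALID, INVALID = "V", "I"

def distance_to_first_invalid(band):
    """Number of valid blocks between the tail and the first invalid block."""
    for distance, state in enumerate(band):
        if state == INVALID:
            return distance
    return len(band)   # no invalid block at all: worst possible candidate

def pool_based_gc(bands, reclaim_target):
    """Repeatedly clean the band whose first invalid block is closest to its tail."""
    freed = 0
    while freed < reclaim_target:
        i = min(range(len(bands)), key=lambda b: distance_to_first_invalid(bands[b]))
        d = distance_to_first_invalid(bands[i])
        # free the (d + 1) blocks from the tail; the d valid ones are relocated first
        freed += d + 1
        bands[i] = bands[i][d + 1:]
    return freed

# Toy bands with the same distance values (D1..D5) as the Figure 12 example:
bands = [
    [VALID] * 4 + [INVALID, VALID],   # D1 = 4
    [INVALID, VALID, VALID],          # D2 = 0  -> chosen first
    [VALID] * 3 + [INVALID],          # D3 = 3
    [VALID, INVALID, VALID],          # D4 = 1  -> chosen second
    [VALID] * 4 + [INVALID],          # D5 = 4
]
print(pool_based_gc(bands, reclaim_target=3))   # frees 1 block, then 2 more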
Different options might exist; for example, we could clean the band that has the highest GC efficiency (i.e., the largest number of invalid blocks). However, we observed that this alternative triggers a much higher number of valid block movements than our proposed GC algorithm since, in general, such a band still contains many valid blocks.

Fig. 12. New GC algorithm for the pool-based H-SWD: (a) during the GC process and (b) after the GC process. Here, I, V, and F stand for Invalid, Valid, and Free blocks respectively. Numbers in (a) represent the GC process sequence.

Figure 12 demonstrates an example of how our new GC algorithm operates. We assume that the distance value of each band is denoted by D_i, where i is the band ID, and that the required space for GC is 10 free blocks. First, the algorithm chooses Band 2 since the distance of the first invalid block from the tail (here, D2 = 0) is the smallest (D1 = 4, D2 = 0, D3 = 3, D4 = 1, and D5 = 4), which corresponds to step 1 in Figure 12(a). It then cleans one block, obtains one free block, and moves the tail pointer of Band 2 to position 1. After these steps, the band sharing H-SWD proceeds to the next iteration: first, it selects Band 4 (D1 = 4, D2 = 4, D3 = 3, D4 = 1, D5 = 4); second, it cleans and obtains 2 free blocks (for a running total of 3 free blocks); third, it moves the tail pointer of Band 4 from position 0 to position 2. The GC proceeds similarly and stops once it finally reaches 10 free blocks.

Fig. 13. The number of block movements of the pool-based H-SWD and H-SWD under various traces.

We finally compare the GC overhead of the pool-based H-SWD and the H-SWD designs. Figure 13 shows that the pool-based H-SWD further reduces the required number of block movements and also flattens the block movement peaks. This further improvement can be attributed to our dynamic GC algorithm, which helps the pool-based H-SWD dynamically choose the best candidate in each GC process.

V. Conclusion

In this paper, we proposed a Shingled Write Disk (SWD) design named Hot data identification-based Shingled Write Disk (H-SWD). Unlike other SWD designs that do not control the initial placement of incoming data, the H-SWD judiciously assigns incoming data to hot bands or cold bands with the help of a window-based hot data identification. Therefore, hot data (likely to be updated soon) is collected in the hot bands, while cold data is stored in the cold bands from the beginning. This effective data placement policy increases the garbage collection (GC) efficiency so that it can significantly reduce the GC overhead. Furthermore, we extended the H-SWD to the pool-based H-SWD by grouping the hot bands into a hot band pool and the cold bands into a cold band pool. We developed a dynamic GC algorithm, which dynamically chooses the best candidate (i.e., a band) for each GC operation to further improve the GC performance. Our experiments demonstrate that both our designs outperform the Indirection System by reducing the required number of block movements.
Acknowledgment

We would like to thank Dr. Yuval Cassuto (formerly at Hitachi GST and a primary author of the Indirection System) for his valuable advice. This work is partially supported by NSF awards IIP-1127829, IIP-0934396, and CNS-1115471.

References

[1] Y. Shiroishi, K. Fukuda, I. Tagawa, H. Iwasaki, S. Takenoiri, H. Tanaka, H. Mutoh, and N. Yoshikawa, “Future Options for HDD Storage,” IEEE Transactions on Magnetics, vol. 45, no. 10, pp. 3816–3822, 2009.
[2] R. Wood, “The Feasibility of Magnetic Recording at 1 Terabit per Square Inch,” IEEE Transactions on Magnetics, vol. 36, no. 1, pp. 36–42, 2000.
[3] K. S. Chan, R. Radhakrishnan, K. Eason, M. R. Elidrissi, J. J. Miles, B. Vasic, and A. R. Krishnan, “Channel models and detectors for two-dimensional magnetic recording,” IEEE Transactions on Magnetics, vol. 46, no. 3, pp. 804–811, 2010.
[4] A. Caulfield, L. Grupp, and S. Swanson, “Gordon: using flash memory to build fast, power-efficient clusters for data-intensive applications,” in ASPLOS, 2009.
[5] D. Park, B. Debnath, and D. Du, “CFTL: A Convertible Flash Translation Layer Adaptive to Data Access Patterns,” in SIGMETRICS. New York, NY, USA: ACM, 2010.
[6] A. Amer, J. Holliday, D. D. Long, E. Miller, J.-F. Paris, and T. Schwarz, “Data Management and Layout for Shingled Magnetic Recording,” IEEE Transactions on Magnetics, vol. 47, no. 10, pp. 3691–3697, 2011.
[7] J.-G. Zhu, X. Zhu, and Y. Tang, “Microwave Assisted Magnetic Recording,” IEEE Transactions on Magnetics, vol. 44, no. 1, pp. 125–131, 2008.
[8] M. Kryder, E. Gage, T. McDaniel, W. Challener, R. Rottmayer, J. Ganping, H. Yiao-Tee, and M. Erden, “Heat assisted magnetic recording,” in Proceedings of the IEEE: Advances in Magnetic Data Storage Technologies, 2008, pp. 1810–1835.
[9] W. A. Challenger, C. Peng, A. Itagi, D. Karns, Y. Peng, X. Yang, X. Zhu, N. Gokemeijer, Y. Hsia, G. Yu, R. E. Rottmayer, M. Seigler, and E. C. Gage, “The road to HAMR,” in Proceedings of the Asia-Pacific Magnetic Recording Conference (APMRC), 2009.
[10] R. E. Rottmeyer, S. Batra, D. Buechel, W. A. Challener, J. Hohlfeld, Y. Kubota, L. Li, B. Lu, C. Mihalcea, K. Mountfiled, K. Pelhos, P. Chubing, T. Rausch, M. A. Seigler, D. Weller, and Y. Xiaomin, “Heat-assisted magnetic recording,” IEEE Transactions on Magnetics, vol. 42, no. 10, pp. 2417–2421, 2006.
[11] E. Dobisz, Z. Bandic, T. Wu, and T. Albrecht, “Patterned media: nanofabrication challenges of future disk drives,” in Proceedings of the IEEE: Advances in Magnetic Data Storage Technologies, 2008, pp. 1836–1846.
[12] A. Kikitsu, Y. Kamata, M. Sakurai, and K. Naito, “Recent progress of patterned media,” IEEE Transactions on Magnetics, vol. 43, no. 9, pp. 3685–3688, 2007.
[13] I. Tagawa and M. Williams, “High density data-storage using shingled-write,” in Proceedings of the IEEE International Magnetics Conference (INTERMAG), 2009.
[14] P. Kasiraj, R. New, J. de Souza, and M. Williams, “System and method for writing data to dedicated bands of a hard disk drive,” US patent 7490212, February 2009.
[15] G. Gibson and M. Polte, “Directions for shingled-write and two-dimensional magnetic recording system architectures: synergies with solid-state disks,” Carnegie Mellon University Parallel Data Lab Technical Report, CMU-PDL-09-104, 2009.
[16] A. Amer, D. D. E. Long, E. L. Miller, J.-F. Paris, and S. J. T. Schwarz, “Design issues for a shingled write disk system,” in Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010.
[17] Y. Cassuto, M. A. A. Sanvido, C. Guyot, D. R. Hall, and Z. Z. Bandic, “Indirection systems for shingled-recording disk drives,” in Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010.
[18] “University of Massachusetts Amherst Storage Traces,” http://traces.cs.umass.edu/index.php/Storage/Storage.
[19] “SNIA IOTTA Repository: MSR Cambridge Block I/O Traces,” http://iotta.snia.org/traces/list/BlockIO.
[20] D. Park, B. Debnath, Y. Nam, D. Du, Y. Kim, and Y. Kim, “HotDataTrap: A Sampling-based Hot Data Identification Scheme for Flash Memory,” in Proceedings of the 27th ACM Symposium on Applied Computing (SAC ’12), March 2012, pp. 759–767.
[21] D. Park and D. Du, “Hot Data Identification for Flash-based Storage Systems using Multiple Bloom Filters,” in Proceedings of the 27th IEEE Symposium on Mass Storage Systems and Technologies (MSST), May 2011, pp. 1–11.