Pattern Matcher for Locating Areas in Phase-Shift Masks Sensitive to Aberrations

by Frank Gennari

Research Project

Submitted to the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, in partial satisfaction of the requirements for the degree of Master of Science, Plan II.

Approval for the Report and Comprehensive Examination:

Committee:

Professor Andrew R. Neureuther, Research Advisor (Date)

Professor Sangiovanni-Vincentelli, Second Reader (Date)

ABSTRACT

This paper describes a prototype CAD system for rapidly determining locations in large layouts that are most impacted by aberrations in projection printing. Aberrations are accurately modeled as producing spillover between mask openings with a localized pattern that is the inverse Fourier transform (IFT) of the optical path difference (OPD) function in the pupil. The novel function in the CAD system then quickly rank-orders all mask layout edges, corners, and other user-specified geometries according to the degree of similarity of their surrounding geometry to the IFT function. The prototype is based on the Cadence Design Framework II CAD system and adds procedures for evaluating the spillover function, fast pattern matching, and extraction of local layout regions for further user aerial image simulation with SPLAT. An extensive set of Cadence SKILL utility scripts is incorporated into the process flow to perform tasks such as file conversion and geometric operations. Speed and memory limitations prompted the creation of a new binary in C++ that incorporates the core data structures and algorithms for pattern matching. Optimizations such as partitioning, prefiltering, and compression led to significant improvements in both speed and memory efficiency, allowing the matcher to process large mask designs. A pattern-matching sweep of a large mask can now be accomplished in roughly the time it takes to flatten and merge the mask layout in Cadence.
Results are presented in current technologies using both binary and phase-shifting masks.

1. INTRODUCTION

While the quality metric (Strehl ratio) of today’s projection printers is within a few percent of unity [1], residual aberrations still contribute significant spillover of signals from one mask opening to another. These spillover effects degrade the image quality with respect to position within the field of the die, as has been illustrated by Garza et al. [2], who observed good correlation of measured aberrations with the difference in horizontal and vertical linewidths along the slit in a scanning system. Such aberration-based linewidth variations are themselves partially mitigated by higher slopes through OPC. Yet residual cross-chip linewidth variations suggest that residual aberrations continue to contribute a level of degradation that is about half as large as the level of improvement gained through applying OPC. The impact of these aberration-based spillover effects will clearly become more important with phase-shifting masks due to the inherent use of more coherent illumination as well as the presence of both phases to more efficiently direct energy to different locations in the lens pupil. The goal of this paper is to demonstrate the possibility of bringing together the knowledge of the exposure tool from the fab with the electronic design of the layout patterns. Since the physical nature of residual aberrations across the field of a lens can be and often is independently determined, this advanced knowledge of the exposure tool can and should be fed back upstream into the overall mask design and compensation process. This paper describes a physically based model and a computationally tractable method in a prototype system for identifying local regions of layout likely to have problematic interactions with residual lens aberrations. The method, as illustrated in Figure 1, is based on finding similarities in the layout to aberration test patterns.
The prototype indicates that it is possible to explicitly include the lens data in the OPC process with only a fractional increase in the data preparation time.

Figure 1: Two simple coma (sin) patterns matched to a hand-drawn example PSM layout. The donut-shaped boxes represent the best match of a vertical coma pattern to an edge and a corner in the layout.

Such a system will be useful in identifying potential ‘showstoppers’ in designs at tape-out, compensating for common tool-set characteristics, and creating a new infrastructure for communication that spans from tool manufacturing to circuit design. An important insight for this project was the work of Robins, Neureuther, and Adam [3] who, in designing pattern-and-probe based aberration monitors, developed the strategy of first finding the incident field at the pupil that maximizes spillover and then taking its inverse Fourier transform (IFT) to find the spatial pattern on the mask. The fact that the aberrations are small allows a perturbational approach for evaluating the aberration effects in imaging [4]. A second perturbational approach based on image slope that was developed for estimating defect interactions with features [5] is used for estimating the resulting linewidth change. The prototype work in this project is based on the Cadence Design Framework II CAD system, which is available to students at UC Berkeley. Background on aberrations, effects on imaging, and measurement techniques can be found in [6-8]. This paper begins with a brief discussion of the physical foundation for quantifying aberration effects through pattern matching. The overall system architecture for assessment of aberration effects through pattern matching is next outlined. The following four sections then consider the details of the Cadence interface, the pattern generator, the core binary matching algorithm, and local region extraction and linkage to SPLAT [9] for further user investigation.
Results including current technologies for binary and phase-shifting masks are then considered. The final section presents performance data on both memory usage and processing speed for sizeable layouts.

2. PHYSICAL FOUNDATION FOR QUANTIFYING ABERRATION EFFECTS

When aberrations are small, the exponential factor containing the optical path difference (OPD) in the integral over the lens pupil used to find the image can be linearized [4] as e^(jOPD) ≈ 1 + jOPD. This linearization into two terms can be physically interpreted as producing two additive electric fields at the image plane. The constant gives the electric field for an unaberrated image and the jOPD term gives a perturbation or spillover proportional to the strength and influenced by the azimuthal and radial characteristics of the OPD. For a Strehl ratio of 0.975, a two-term approximation is reasonable as the total RMS aberration is 0.025 waves, the peak OPD function values are about 0.05 waves, and a third term is at most only 10% as large as the first term. The goal is to determine the additive electric field of the jOPD term from a collection of mask openings in a neighborhood of a central observation point. One approach is to compute the contribution to the electric field from each of the surrounding pixels and then sum them up. A more interesting alternative is to first view the problem in the pupil of the lens and attempt to maximize the spillover from the jOPD term onto the unaberrated image term. In this view the additive field will be largest when the incident electric field is uniform in magnitude and exactly cancels the phase of the OPD. That is, the field in the pupil is proportional to e^(-jOPD) ≈ 1 - jOPD. The inverse Fourier transform (IFT) of this function in the pupil can be used to determine the pattern on the mask that will create this maximized spillover onto the unaberrated image of the central pixel.
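The adequacy of dropping the quadratic term for peak OPD values near 0.05 waves can be checked numerically. The sketch below is illustrative (the function name is not from the prototype); it compares e^(j*phi) against 1 + j*phi for a phase phi in radians, where phi = 2*pi times the OPD in waves.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Absolute error of the two-term linearization e^{j*phi} ~ 1 + j*phi
// for a phase phi given in radians (phi = 2*pi * OPD_in_waves).
double linearization_error(double phi) {
    std::complex<double> exact = std::exp(std::complex<double>(0.0, phi));
    std::complex<double> approx(1.0, phi);
    return std::abs(exact - approx);
}
```

For a peak OPD of 0.05 waves (phi ≈ 0.314 rad) the error stays below 0.05, consistent with the claim that the neglected third term is small.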
The IFT of the constant term corresponds to a fixed, strength-enhanced infinitesimal pinhole at the pattern center. The effect of this pinhole is independent of the level of aberrations and so it may be disregarded in studying the additive perturbation due to aberrations. The IFT of the second term yields the desired composite pattern centered at the central observation point on the mask that will produce the greatest spillover onto the observation point for the given set of aberrations making up the OPD. This pattern is zero at the observation point itself because the Zernike functions other than the zeroth that are included in aberration measurements individually have zero area when integrated over the pupil. The zeroth order term can be viewed as producing the unaberrated image complete with proximity effects. The additive aberration-induced electric field EA that the IFT test pattern contributes at the central observation point on the wafer can be calibrated as follows. First, compute the IFT for a given jOPD and digitize it into a pattern surrounding the central observation point. Then, simulate the aerial image of this pattern in the presence of the aberrations and take the square root to convert intensity at the central observation peak to electric field. Here EA is a complex quantity and its imaginary part comes from even aberrations such as defocus, spherical, and astigmatism while its real part comes from odd aberrations such as coma and trifoil. Simulating the image of this pattern under the illumination conditions utilized in printing the wafer is believed to also help account for the reduction in sensitivity with partial coherence. The theory above implicitly assumes coherent illumination rather than the partial coherence used in various illumination schemes in projection printing.
Once the perturbation of the complex electric field at the observation point due to aberrations, EA, has been found, the impact on the image can be evaluated. A very important consideration is the phase of EA relative to that of the electric field in the image of the unaberrated feature. This can be assumed to be the phase of the feature on which the observation point is located. The component EAO orthogonal to the feature, usually due to even aberrations, simply produces an additive intensity effect IO = |EAO|^2. The component EAC co-linear with the feature, usually due to odd aberrations, creates a composite intensity that is the square of the sum, including sign, of the electric fields. The composite intensity is

IC = IF + 2 FA sqrt(IF) |EAC| + |EAC|^2.

Here IF is the feature intensity at the observation point. For example, at the edge of a line, IF is 0.3 of the clear field and the unaberrated electric field is sqrt(IF) = EF = 0.55. The parameter FA is the mutual coherence between EF and EAC and is negative when EAC and EF are in opposite directions. The total intensity perturbation is thus

ΔI = |EAO|^2 + 2 FA sqrt(IF) |EAC| + |EAC|^2.

The resulting linewidth change ΔL can be found by dividing ΔI by the intensity slope at the feature edge [5].

3. ABERRATION PATTERN MATCHING SYSTEM OVERVIEW

The first prototype of the pattern matching system was written entirely in version 97A of Cadence Design Framework II’s interpreted SKILL programming language, which is similar to both C and LISP. The built-in SKILL hash tables and other data structures are very convenient, but SKILL does not permit the low-level data types and bit operations required for efficient pattern matching operations. Also, SKILL uses a garbage collection method to deal with memory allocation and freeing, while a more explicit memory management system was needed for the memory-intensive matrix-based algorithms.
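The intensity perturbation and linewidth estimate reduce to a few lines of arithmetic. This is a minimal sketch with hypothetical function names, treating |EAC| and |EAO| as real magnitudes and letting the sign convention ride on FA, as in the text.

```cpp
#include <cassert>
#include <cmath>

// Total intensity perturbation: dI = |EAO|^2 + 2*FA*sqrt(IF)*|EAC| + |EAC|^2.
// FA is the mutual coherence and is negative when EAC opposes the feature field EF.
double intensity_perturbation(double e_ao, double e_ac, double f_a, double i_f) {
    return e_ao * e_ao + 2.0 * f_a * std::sqrt(i_f) * e_ac + e_ac * e_ac;
}

// Linewidth change: divide the intensity perturbation by the image
// intensity slope at the feature edge [5].
double linewidth_change(double delta_i, double slope) {
    return delta_i / slope;
}
```

With IF = 0.3 (so EF ≈ 0.55), an in-phase co-linear spillover of |EAC| = 0.05 gives a perturbation of roughly 0.057 in clear-field units.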
A simple matching run took several minutes and up to 100MB of memory, and the use of large hash tables of complex objects led to problems in Cadence. Since the Cadence SKILL pattern matcher was not as fast as expected and did not support low-level operations, the core matching, polygon processing, and extraction code was written in C++ as a separate binary. Cadence continues to control software flow, provide a GUI, and convert to/from CIF, GDS, or other layout formats. In addition, Cadence is used to flatten hierarchy, perform geometric transformations, determine overlap, and merge shapes on various layers because of its efficient internal implementations of complex geometric operations. The first working attempt at an external matching algorithm was slow and memory inefficient, although redesigned and significantly better than the original SKILL version. Further work on the code involving multi-level matrix compression, edge and corner data structures, partitioning, prefiltering, and polygon operations resulted in greatly improved speed and controllable memory requirements. Eventually, new features and options were added to the core binary, making it more powerful and the original Cadence version obsolete. This software package consists of four main components, shown in the block diagram of Figure 2: the GUI and I/O code, pattern generator, core pattern matcher, and SPLAT extractor. The GUI and extractor code were written in the SKILL language of Cadence’s DFII and are configurable to meet the needs of individual systems and designers. The SKILL scripts read a mask layout file, set of pattern files, and configuration file from disk. The mask layout can be a CIF file, GDS file, Cadence DFII cellview, or any other supported format. The configuration file specifies the patterns to use, physical and optical parameters, matching types, and a variety of other options.

Figure 2: Block diagram of data flow through the entire pattern matching software package.
Each pattern file can be either written by hand or generated by providing a set of Zernike polynomials to the pattern generator binary. Cadence flattens and merges the layout and creates a large intermediate file consisting of rectangles, polygons, patterns, and parameters required for the matching algorithm. Then Cadence executes the core pattern matcher, which reads the intermediate file, runs the matching algorithm, and produces two results files. The first results file contains the ordered locations, scores, types, and underlying layer weights for the highest scoring matches of each pattern. This is read by another SKILL script and the results are displayed graphically in the layout window. The other results file contains extracted rectangles that can be converted into SPLAT file format by the final SKILL script.

4. CADENCE DESIGN FRAMEWORK II INTERFACE

The process begins with a possibly hierarchical 0/180 phase-shift mask layout in a standard format such as CIF or GDS. The Cadence Design Framework II is used to stream in the layout, flatten the hierarchy, and merge the overlapping shapes into non-overlapping ones. The flattened and merged Cadence mask layout consists of many rectangles, polygons, and paths on a set of drawing layers. All paths are converted to polygons inside of Cadence and all polygons are then converted to rectangles in either Cadence or the pattern matcher binary, depending on user preference. Since the binary is separate from Cadence, every rectangle and polygon must be written to an intermediate I/O file along with the patterns, optical and matching parameters, and other information. This intermediate file may also include other files, such as pattern matrices, so that more general matching data can be incorporated into a run without being regenerated for each matching run. Rectangles consist of a set of integers representing x1, y1, x2, y2, and the layer ID.
Polygons consist of the number of points, a list of points as integer pairs, and a layer ID. The drawing layers typically consist of 0-, 180-, and sometimes 90-degree phase areas, in addition to an unlimited number of temporary Boolean layers. The Boolean layers serve several purposes, including restricting matching areas and enhancing visualization of the layout. Extra layers increase the size of the intermediate files, but do not have a significant effect on the runtime or memory requirements of the core matcher. Once the pattern-matching phase is complete, the results are written to file and read by Cadence for graphical display purposes. The specified number and type of patterns are drawn at the locations with the highest correlation between the pattern and underlying layout geometry. Each pattern is shown as a bitmap color coded for phase along with a text string specifying match type, normalized score value, pattern ID, and underlying layer phase. Pattern requirement locations and optionally extracted Boolean layers are shown in different colors. Figure 3 shows the results of a sample run with the astigmatism pattern, while Figure 4 demonstrates that the pattern matcher can process many matches of several different patterns in a single run.

Figure 3: The pattern matcher has matched the astigmatism pattern to the best location on this 0/180 PSM layout.

Figure 4: The matching software is able to perform matching runs involving many different patterns. Here, a single run shows the best matches for aberrations such as trifoil, spherical, and astigmatism on a 0/180 PSM.

5. PATTERN GENERATOR

A separate pattern generator binary was written in C++ to create 2D pattern matrices from sets of Zernike polynomials. This code incorporates a publicly available two-dimensional FFT/IFFT package [10] to compute the inverse Fourier transform of each pupil function.
The pattern generator first reads a set of weighted aberrated pupil functions, each in the form of a list of coefficients representing powers of rho, sine and cosine coefficients, and coefficients for phi in both the sine and cosine terms. Any weighted combination of common aberrations or arbitrary Zernike polynomials can be converted into a pattern matrix. Refer to Figure 5 for an example of a matching run involving several simultaneous aberrations. This allows a set of aberrations present in the lens of a particular stepper or other printing system to be summed into a single pattern matrix. In fact, designers can maintain a collection of matching patterns, one for each machine, and select for the matching run the pattern corresponding to the machine(s) intended to be used in printing the design.

Figure 5: Examples of patterns for simultaneous aberrations of coma (cos) + random, coma (sin) + coma (cos), and spherical + defocus.

All matrices used in the pattern generator represent uniform grids of some fixed dimension in terms of λ/NA. A large background matrix of zeros is constructed around the pupil function to provide isolation for the IFFT. A circular pupil area is filled with numbers calculated by evaluating the Zernike polynomial within the pupil radius, and the area of the pupil is calculated for normalization purposes. The 2D IFFT of each function is taken, and the results are summed into a final matrix and written to file in pattern matrix format. The pixel size of the pupil function, the background matrix, and the resulting power-of-two pattern are specified in the input file. The sizes are constrained by several rules, but otherwise the user is free to choose pupil sizes so as to scale the pattern properly. The user can also specify the power-of-two size of the background matrix, with larger sizes ensuring higher accuracy. Background sizes are typically chosen as the maximum feasible size given available memory and acceptable processing time.
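The pupil-fill and transform steps can be sketched with a naive inverse DFT standing in for the FFT package [10]; the function names here are illustrative, not the generator's actual code. For a constant pupil, the pattern value at the transform origin equals the pupil area divided by the number of grid points, which is the basis of the area normalization mentioned above.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Naive inverse 2D DFT, O(N^4); the real generator uses an FFT package [10].
// The spatial origin of the result is at index (0, 0), i.e., unshifted.
std::vector<std::vector<std::complex<double>>>
inverse_dft_2d(const std::vector<std::vector<std::complex<double>>>& pupil) {
    const int n = (int)pupil.size();
    const double two_pi = 6.283185307179586;
    std::vector<std::vector<std::complex<double>>> out(n, std::vector<std::complex<double>>(n));
    for (int x = 0; x < n; ++x)
        for (int y = 0; y < n; ++y) {
            std::complex<double> sum(0.0, 0.0);
            for (int u = 0; u < n; ++u)
                for (int v = 0; v < n; ++v) {
                    double ang = two_pi * (double(u) * x + double(v) * y) / n;
                    sum += pupil[u][v] * std::exp(std::complex<double>(0.0, ang));
                }
            out[x][y] = sum / double(n * n);
        }
    return out;
}

// Fill a circular pupil of the given radius (in pixels), centered in the grid,
// with a constant value; return the pupil area (pixel count) for normalization.
int fill_pupil(std::vector<std::vector<std::complex<double>>>& grid, double radius) {
    int n = (int)grid.size(), area = 0;
    double c = (n - 1) / 2.0;
    for (int u = 0; u < n; ++u)
        for (int v = 0; v < n; ++v)
            if ((u - c) * (u - c) + (v - c) * (v - c) <= radius * radius) {
                grid[u][v] = 1.0;
                ++area;
            }
    return area;
}
```

A Zernike-weighted pupil would fill the circle with evaluated polynomial values instead of a constant, but the normalization and transform steps are the same.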
Pattern sizes of 32x32, 64x64, and 128x128 pixels provide both adequate accuracy and reasonable matching times. Several other SKILL scripts are used to resize, recenter, and trim the pattern matrices before the core pattern matcher binary uses them. Another SKILL script can be invoked to perform a conversion from Cadence layout to a pattern file. This allows the user to design a custom pattern that the pattern generator is incapable of producing. For example, the user can create a pattern layout that meets design rules for a particular technology and then print that pattern on a wafer for testing purposes. The layout to pattern converter was created to allow the designer of such a custom target to run the pattern matching software so that the target pattern matches the hand-crafted layout exactly. This converter can also be used to transform an arbitrary section of mask geometry into a pattern so that the pattern matcher can be used to match non-aberration based patterns to layout, effectively implementing a generalized geometric search procedure.

6. BINARY FOR CORE MATCHING ALGORITHMS

The pattern matcher binary, written in C++, is a standalone, platform-independent executable that reads and writes a variety of files to disk. It can be compiled for and used in any operating system. This binary is intended to be called from inside Cadence, but can be run independently if the input files are present. Depending on the verbosity level specified on the command line, various warning, error, progress, and statistics messages are sent to stdout. Important errors or warnings of problems such as missing data are also shown. The strict input parser and extensive internal error checking cause the binary to quit on any error and return one of a number of error codes recognized by the Cadence SKILL script. A flowchart showing the construction of several major data structures is shown in Figure 6.
The main algorithms that create and process these data structures are outlined in the sections below.

Figure 6: The data structures of the core matching procedure. Polygons are read from file, split into rectangles, added to an array of Boolean layer maps/bitmaps, and merged into a weighted cell matrix prior to executing the inner matching loops.

6.1 Data Structures and Algorithms

The input file to the pattern matcher consists of rectangles and polygons on a Manhattan grid, a set of pattern matrices to match, optical parameters, performance parameters, included file lists, and pre-calculated values. Rectangles and polygons off grid or with non-orthogonal edges are forced onto the nearest grid points, and the diagonal lines are broken into short step-like edges, which adds some error to the matching procedure. However, most layouts are created on fixed Manhattan grids to begin with so as to make the job of automated CAD tools easier. Since polygons are more difficult to partition, sort, and store than rectangles, each polygon is split into a number of rectangles as it is read from disk. In some cases, the polygon splitting procedure may locate a defective polygon due to the application of grid snapping and line splitting. Defective or abnormal polygons include those that have consecutive points at the same grid location, three or more collinear points, edges of length zero, diagonal lines, self-intersections, or overlapping edges. If the polygon is found to be defective, it is fixed, if possible; otherwise, it is discarded and a warning is issued to the user. However, before the polygon is split, its edges, inside corners, and outside corners are added to their respective data structures. The splitting algorithm proceeds by scanning through the points and locating the set of unique x and y values, or divisions, that, if used as cutlines, will partition the polygon into a large but nearly optimal number of smaller rectangles.
Next, a binary edge matrix is built, where a value of ‘1’ represents the presence of a vertical edge along that cutline segment. An in_poly binary flag is initialized to 0 and toggles each time a 1 is encountered in the edge matrix. Each x value of each row of the edge matrix is iterated through, and horizontal rectangles are extracted from the polygon for each consecutive y value. The starting x value of the rectangle results from the location where the in_poly flag toggles from a 0 to a 1, and the ending x value results from a 1 to 0 toggle in the same row. Each rectangle is stored in an STL vector and added to the layer map in a later step. Edges, line ends, and corners are either extracted by the Cadence script upon rectangle and polygon generation and included in the intermediate file, or extracted from the polygons in the input file by the pattern matcher binary in the following manner. Each polygon side is added to the set of edges, the set of line ends, or both, depending on the length of the side and other input parameters relating to minimum feature size and target geometry. A clockwise polygon point traversal is assumed, and separate sets of inside and outside corners are built based on the directions of consecutive polygon edge vectors. Since a polygon must have four more outside corners than inside corners, the corner sets are swapped if their sizes are incorrect. This case results when the polygon points were actually specified in a counterclockwise direction. The geometry vectors of rectangles, edges, line ends, inside corners, and outside corners are sorted based on matrix position so that they can be partitioned more efficiently. Replacing the standard 2D floating-point matrices with single-dimension integer arrays reduces file size and memory requirements by substituting two floating-point coordinates with a single integer. This does limit the range of x and y values to around 16-bit integers each, but so far that has not been a problem.
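The cutline-and-toggle decomposition can be sketched as follows. This is a simplified illustration using hypothetical names, not the matcher's actual code: for each horizontal strip between consecutive y divisions, it scans the x cutlines left to right, toggling in_poly at each vertical polygon edge that spans the strip, and emits a rectangle for every 0-to-1 / 1-to-0 toggle pair.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

struct Rect { int x1, y1, x2, y2; };

// Split a Manhattan polygon (closed point list) into rectangles using the
// cutline / in_poly-toggle scheme described in the text.
std::vector<Rect> split_polygon(const std::vector<std::array<int,2>>& pts) {
    std::vector<int> xs, ys;
    for (auto& p : pts) { xs.push_back(p[0]); ys.push_back(p[1]); }
    std::sort(xs.begin(), xs.end()); xs.erase(std::unique(xs.begin(), xs.end()), xs.end());
    std::sort(ys.begin(), ys.end()); ys.erase(std::unique(ys.begin(), ys.end()), ys.end());

    std::vector<Rect> rects;
    const int n = (int)pts.size();
    for (size_t r = 0; r + 1 < ys.size(); ++r) {      // one horizontal strip per y division
        int y0 = ys[r], y1 = ys[r + 1];
        double ymid = 0.5 * (y0 + y1);
        bool in_poly = false;
        int start_x = 0;
        for (size_t c = 0; c < xs.size(); ++c) {      // scan cutlines left to right
            bool edge_here = false;                   // vertical edge at x = xs[c] spanning this strip?
            for (int i = 0; i < n; ++i) {
                auto a = pts[i], b = pts[(i + 1) % n];
                if (a[0] == b[0] && a[0] == xs[c] &&
                    ymid > std::min(a[1], b[1]) && ymid < std::max(a[1], b[1]))
                    edge_here = true;
            }
            if (!edge_here) continue;
            if (!in_poly) { in_poly = true; start_x = xs[c]; }                    // 0 -> 1: rectangle starts
            else { in_poly = false; rects.push_back({start_x, y0, xs[c], y1}); }  // 1 -> 0: rectangle ends
        }
    }
    return rects;
}
```

An L-shaped polygon, for instance, decomposes into two rectangles, one per horizontal strip.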
The unique coordinate is constructed by adding together the x_value and x_size*y_value. Dividing the layout and corresponding matrices into partitions further reduces memory requirements since only a single partition of the layer map and cell matrix is constructed in memory at any given time. Partitions consist of horizontal strips separated by horizontal cutlines at evenly spaced vertical intervals. An overlap equal to plus and minus half the size of the largest pattern is added to ensure continuity when matching along the edge of the partition. The geometry vectors are also partitioned into regions prior to creating the matrix partitions so that the geometry related to a partition can be processed independently of the other partitions. There are two types of regions: those corresponding to a particular partition, and those corresponding to the overlap area between partitions. Edges, line ends, and rectangles crossing horizontal cutlines are split into multiple segments and added to each region. A single matching iteration must process one partition of each matrix and the corresponding region along with the two adjacent overlap regions (one if at the top or bottom of the design) of the geometry vectors. This partitioning could allow the matching process to be run on many processors simultaneously with or without shared memory, although this feature has not yet been implemented. In any case, partitions need not be processed in any particular order. All rectangles are then added to the layer map, a set of bitmaps, one for each layer. Each partition of the layer map is converted into the 2D cell matrix prior to calculating the match values at each location. The weights for each of the up to 256 possible combinations of overlapping layers in each set of eight layers (each byte in the layer map) are pre-calculated. If there are more than eight layers present, then each set of eight layers is compressed into a separate character array.
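The coordinate packing and strip partitioning amount to a few lines each. This sketch uses illustrative names and assumes the layout height divides evenly into the requested number of strips; the overlap is half the largest pattern size on each side, clamped at the top and bottom of the design.

```cpp
#include <cassert>
#include <utility>

// Pack an (x, y) matrix coordinate into one integer: index = x + x_size * y.
// With roughly 16-bit x and y this fits comfortably in a 32-bit int.
int pack_coord(int x, int y, int x_size) { return x + x_size * y; }

std::pair<int,int> unpack_coord(int index, int x_size) {
    return { index % x_size, index / x_size };
}

// y-range [lo, hi) of horizontal strip r out of num_parts over height y_size,
// widened by half the largest pattern size on each side so that matches along
// the partition edge remain continuous.
std::pair<int,int> partition_bounds(int r, int num_parts, int y_size, int max_pattern) {
    int step = y_size / num_parts;
    int lo = r * step - max_pattern / 2;
    int hi = (r + 1) * step + max_pattern / 2;
    if (lo < 0) lo = 0;
    if (hi > y_size) hi = y_size;
    return { lo, hi };
}
```

Because each strip plus its overlap is self-contained, strips could in principle be matched in any order or on separate processors, as noted above.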
All bytes and all pixels in the layer map are iterated through, and the byte at each location is used to reference a weight from the previously calculated table, which is added to the corresponding pixel in the cell matrix. This leads to a cell matrix of floating-point pixel values equal to the sum of the weights of every layer present at that pixel. The match value of a pattern at a specific location is the sum of the products of each pixel in the pattern with the matching pixel at the match location in the cell matrix. The calculation of match values takes up to 90% of the matcher’s runtime. Details on this inner matching loop are presented in section 6.3. The core matching binary is capable of processing an arbitrary number of mask layers and temporary layers. Since the layer map is stored as a character-based bit vector, eight layers can be processed in a single operation. The matcher is capable of evaluating a set of equations describing Boolean operations to be performed on the layers. These operations include layer AND, OR, XOR, NOT, 2D derivative, and algebraic combinations of these, including parenthesized sub-expressions. If the number of interrelated layers is eight or less, a lookup table is constructed from the expression and each byte in the layer map is transformed by a simple table lookup, resulting in a very fast layer Boolean with up to eight parallel computations. If the number of interrelated layers is greater than eight, the expression is converted to prefix notation and evaluated recursively for each layer to be assigned to. Patterns can be constrained to only match certain real or temporary layers, such as where poly overlaps active region, forming a transistor gate. Any of these layers can be extracted into rectangle format and imported back into Cadence as a form of external layer Boolean. In some cases this process is faster than the internal Cadence layer Booleans due to the parallel nature of the layer processing operations.
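The byte-to-weight table and the inner product it feeds can be sketched as below. This is an illustrative simplification for a single partition with at most eight layers; the names are hypothetical, not the matcher's.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Pre-compute the weight of every possible layer-map byte (up to 256
// combinations of eight overlapping layers): each table entry is the sum of
// the weights of the layers whose bits are set in that byte.
std::array<double,256> build_weight_table(const std::array<double,8>& layer_weights) {
    std::array<double,256> table{};
    for (int byte = 0; byte < 256; ++byte)
        for (int bit = 0; bit < 8; ++bit)
            if (byte & (1 << bit)) table[byte] += layer_weights[bit];
    return table;
}

// Convert the character-based layer map into the floating-point cell matrix
// with one table lookup per pixel.
std::vector<double> layer_map_to_cells(const std::vector<unsigned char>& layer_map,
                                       const std::array<double,256>& table) {
    std::vector<double> cells(layer_map.size());
    for (std::size_t i = 0; i < layer_map.size(); ++i) cells[i] = table[layer_map[i]];
    return cells;
}

// Match value: sum of products of pattern pixels with cell-matrix pixels.
double match_value(const std::vector<double>& pattern, const std::vector<double>& cells) {
    double score = 0.0;
    for (std::size_t i = 0; i < pattern.size(); ++i) score += pattern[i] * cells[i];
    return score;
}
```

The same table-lookup idea underlies the fast layer Booleans: an expression over up to eight layers compiles into a 256-entry table applied byte by byte.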
Layout locations to be matched are determined by intersecting user-defined match geometry requirements with the previously computed geometry vectors. For instance, if the user has selected to perform matching at line ends, then a match is evaluated at each location along each valid line end in the two to three regions corresponding to the current matrix partition. The use of prefiltering eliminates false line ends, corners, etc. that originated from internal polygon edges, overlapping shapes, or a similar situation in the input file. If the validity of a coordinate has not been verified, the geometric query function is called for that type to check it against the cell matrix and ensure it is actually a coordinate of that type. If so, the coordinate is flagged as “good”; otherwise, it is flagged as “bad”. If all coordinates of an object are tagged as “bad,” then the object itself becomes “bad” and in some cases is removed from its data structure. The inner loop is called on each line end that has been flagged as “good,” returning a score for that match location. If the score is higher than any previous score, it becomes the “best” match. If the score is lower than the best match but above some cutoff, then it is added to a sorted set of match values for that pattern and partition. If the set reaches a heuristically determined maximum size, then the match with the lowest score is removed. This size constraint ensures that the memory requirements do not explode if the layout consists of many repetitive shapes with similar match values. The cutoff is dynamically calculated using a heuristic algorithm and user-defined parameters, and is usually specified as a percentage of the current best score. The cutoff increases with increasing “best” scores. High-scoring matches for different partitions are either added to the same global match set or are added to individual partition match sets and later merged if multiple processes are invoked.
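The bounded, cutoff-gated match set can be sketched as follows. The names are illustrative, and the real matcher stores full match records (location, type, pattern ID) rather than bare scores, but the gating and eviction logic is the same in spirit.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Keep only high-scoring matches: scores above a dynamic cutoff (a fraction of
// the current best) enter a sorted set of bounded size; the lowest entry is
// evicted when the set is full, so memory cannot explode on repetitive layouts.
struct MatchSet {
    std::multiset<double> scores;
    double best = 0.0;
    double cutoff_fraction;   // e.g. 0.5: keep scores above 50% of the best
    std::size_t max_size;

    MatchSet(double frac, std::size_t cap) : cutoff_fraction(frac), max_size(cap) {}

    void add(double score) {
        if (score > best) best = score;                   // track the "best" match
        if (score < cutoff_fraction * best) return;       // below dynamic cutoff: drop
        scores.insert(score);
        if (scores.size() > max_size)
            scores.erase(scores.begin());                 // evict the lowest score
    }
};
```

Note that the cutoff rises as better matches arrive, which is the behavior described above: the cutoff increases with increasing "best" scores.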
In theory, each process must read the same matrix and vector data from shared memory and write to different data structures to avoid write conflicts. However, this multi-process setup has never been tested. When the matching phase is finished, the final match set is converted back into the results vector, which has the same form as the geometry vectors. If this is the last matching phase, then the vector is written out to the results file and the mask layout region around the best match is extracted. Duplicate or invalid matches are marked as “bad” and are not written to the results file.

6.2 Compression

The single most important factor in the speed of this code is matrix compression. In many cases, both the mask layout and the pattern matrix can be compressed by averaging a block of 2x2 pixels into a single pixel. This 4X pixel compression reduces all dimensions by a factor of two and matrix size by a factor of four. Since the innermost nested loops iterate through both the mask matrix elements and the pattern matrix elements, a compressed match requires up to 4x4 = 16 times fewer matrix element multiplies. Furthermore, the compression can be repeated recursively for higher compression levels of 16:1, 64:1, …, 4^n:1, until the pattern becomes too small to compress further. The maximum number of compression levels is usually limited to three by compression overhead and error. As in any form of lossy compression, accumulated error can eventually lead to incorrect results. Once the best matches are found, the matching algorithm is run on the higher resolution, uncompressed matrices to determine the exact match value, and only the top few of these are kept. A complex set of equations is used to dynamically calculate the worst-case error for each pattern and adjust the number of “best” matches to be preserved for the high resolution filtering. The maximum 4X compression error is computed as follows.
Given a cell matrix block of four pixels with values a, b, c, and d that are to be compressed into a single pixel of value e, the compression function is a simple averaging of the pixels. Similarly, a block of four pattern pixels with values f, g, h, and i is compressed into a single pixel with value j, resulting in a compressed multiplication value of m.

e = (a + b + c + d)/4
j = (f + g + h + i)/4
m = ej = (a + b + c + d)(f + g + h + i)/16
       = (af + ag + ah + ai + bf + bg + bh + bi + cf + cg + ch + ci + df + dg + dh + di)/16

The exact multiplication value is computed as:

M = af + bg + ch + di

The compression error E is defined as the difference between the exact and compressed values:

E = M - m = M - ej = (af + bg + ch + di) - (a + b + c + d)(f + g + h + i)/16

The maximum error can be determined by choosing the values of a through d and f through i as either the most positive or most negative extreme so as to maximize the difference between M and m. Two solutions exist: maximize M and force the product of e and j to be as negative as possible (max positive error E+), or minimize M and force the product of e and j to be as positive as possible (max negative error E-). The error estimation algorithm assigns extreme values to the variables and chooses an error margin equal to E+ - E-. The relative error, (E+ - E-)/M_expected, is the parameter that determines cutoff values and limits the usefulness of compression. At first, this error compensation procedure led to an increase in runtime because the worst-case error cutoff resulted in most matches being run in both low-resolution and high-resolution mode. This is because the worst-case error results when both the layout and pattern pixels alternate between the largest and smallest possible values. Real layouts consist of large blocks of similar pixel values, while real aberration patterns are relatively smooth in their transitions from one extreme to the other. 
Since the worst-case error was an order of magnitude higher than the average observed error, probability theory was used to estimate the maximum expected error and to produce a cutoff value for defining the “best” matches based on the distribution of scores. The compression algorithm has been further refined several times to produce a version with user parameters to adjust the tradeoff between speed and accuracy. It is important to update the locations in the geometry vectors (edges, corners, etc.) to match the compressed matrix scale. Instead of duplicating each geometry vector, it is more memory efficient to temporarily condense line and point values. Also, the original coordinates of low resolution matches will need to be extracted for matching at higher resolutions. This requires the addition of an extra byte containing flag bits to the elements of the vectors, which is used to reconstruct the original coordinates in time for the higher resolution matches. The matcher first compresses all matrices and vectors at increasingly higher compression levels, using the data at level i-1 to compute the compressed data at level i. Then, data at the highest compression level is used to generate a set of match results stored in the aforementioned results vector. The results vector coordinates are expanded and elements not flagged as “bad” are used as the input geometry vectors for matching at the next-highest compression level. This continues until level 0, the uncompressed phase, is reached. The level 0 matching phase verifies that each match meets the geometry constraints imposed by the user, runs the highest resolution matching algorithm, and then writes the final results vector to disk as the output file. 
The logic behind compression is that each level of compression reduces the number of matching locations evaluated in the next higher resolution matching phase, so that by the time the expensive level 0 matching phase is executed there are very few match locations left to consider. Although this is an estimation of the best matches, it is accurate in most cases for lower compression levels. 6.3 The Inner Loop The innermost loop of the matching code performs a floating-point pixel multiplication between a 2D pattern and an equally sized area of the underlying cell matrix at a specific match location to determine the match score. More specifically, the pattern match value is the sum of all multiplications between each pixel in the pattern with the corresponding pixel in the cell matrix, calculated using the equation:

score = Σ(i=0..ysize-1) Σ(j=0..xsize-1) pattern[i][j] * cell_matrix[i][j]

This multiplication is O(x*y), where x and y are the width and height of the pattern matrix, respectively. This loop must be called once for every potentially high scoring match. Since the inner loop is typically responsible for around 90% of the runtime of the matcher, it is important that this loop is called as few times as possible and the code for the loop is extremely efficient. The compression algorithm explained above reduces the number of calls to the inner loop as well as the loop bounds. For this reason, compression leads to a significant speed improvement despite the overhead of generating the compressed data structures. Numerical results of performance analysis are presented in section 9. Input pattern x and y sizes are required to be divisible by eight because of loop unrolling in the inner loop and are required to be a power of two for compatibility with bitwise operations. If the raw pattern does not meet these sizing constraints, a SKILL function is available to resize the pattern using 2D interpolation. 
A further speedup of the inner loop is attained through early termination of the loop when the score predicted on the fly is lower than a predefined cutoff point, in this case the same cutoff as used in the match filtering. The predicted score is computed by adding to the current running score a prediction based on the assumption that the portions of the pattern and cell matrices yet to be multiplied are an exact match. This prediction is always optimistic, and therefore will never result in premature termination of a high scoring match calculation. Predicted values for each matrix section of each pattern are calculated prior to performing the actual matching, and this check is only performed once per row after the multiplication is halfway done. 6.4 Debugging It was difficult to verify the correctness of the algorithm with such complex input file syntax, dozens of available options, and several compression parameters. After many failed attempts at testing the correctness of the code, the use of geometric transforms on both the mask and the target finally provided an easy way to locate several major bugs. In this process, the results of a simple matching run were first recorded. Then both the layout and the pattern were transformed through identical rotations, translations, mirroring, and scaling and rerun while the results were again recorded. Since the pattern and layout underwent the same transformations, the results should have also been the same regardless of the transform used. Cases of observed deviation in results were investigated with smaller designs or isolated areas until the bug was found. In addition, matching runs were performed with various combinations of options and debugged until they produced identical results. Some other errors in the match results are not obvious to the user, especially when a large set of match values are generated. 
The errors may result from an incorrect implementation of one of the algorithms or possibly some special case that was not anticipated. A simple pm_comp program has been developed to help detect errors in the output of the matcher by comparing results files from different runs on the same mask layout. Not only does this program check for syntax errors, incorrect result sizes, invalid numerical ranges, and other simple single-token errors, but it also checks for the more difficult to detect cases. One particular example is when two matching runs produce matches of the same type and at the same location but with different match values or different underlying mask weights, which implies that one or more of the results are incorrect. Although the core matcher verifies its highest match value by performing the full bitmap matrix multiply on the best match, there are many ways in which the values can become corrupted between and even within matching runs. This same results processing program is also used to quantify the accuracy of approximate matching methods, such as the current implementation of multi-level compression. The only observed negative effect of multi-level compression is that some of the valid high scoring match locations fail to appear in the output file due to a misjudgment of the match values calculated using the compressed data structures. Given a known good output file, such as one produced without compression, pm_comp determines the accuracy of the output of another run involving the same input data but more aggressive algorithmic parameters. The pm_comp binary compares the results pattern by pattern, producing various statistics along the way and a final accuracy percentage based on the number of valid identical results between the two files. This percentage determines the speed versus accuracy tradeoff in the compression algorithm, which was found to be highly dependent on both the layout and the patterns. 
6.5 Pattern Matching on Large Designs Larger chips such as today’s microprocessors can attain sizes of several centimeters on an edge, translating into billions of pixels on a high-resolution sub-micron grid. Completely flattening designs this large for use in pattern matching is impractical due to memory limitations, among other problems. Even if a CAD tool was capable of flattening the entire chip, simply loading the pattern matcher input file would exceed available memory long before partitioning. Even if the partitioning steps were reached, the thousands of resulting partitions and the time and memory overhead of partitioning would prevent the pattern matcher from finishing in any reasonable amount of time since memory operations would likely dominate the runtime. The solution to this problem is to partition the design before flattening, while it is still represented as a hierarchical layout in the CAD tool. This partitioning must be done manually or with the assistance of an external tool. Many modern integrated circuits are designed using a hierarchy of macrocells and possibly underlying clock and power grids. Large arrays of regular structures such as on-chip RAM, cache, clock and power grids, and tiled gate arrays are poor candidates for pattern matching because the large number of repetitive structures leads to many matches with identical values, producing inflated sets of match results. In extreme cases, the ordered set of match results can become so large that it dominates the memory while insertion and removal from it dominate the runtime of the matcher. These areas of the design are best tested by running the matching algorithm on a single tile of the array. Because of the problem with regular arrays, custom or hierarchical macrocells are the best candidates for hierarchical partitioning. In fact, partitioning is most efficient on macrocells with little or no overlap. 
In hierarchical partitioning, a single large macrocell or collection of adjacent macrocells is selected and flattened into a new mask layout representing a portion of the chip. To preserve continuity between partitions, any overlapping cells, as well as a perimeter strip as wide as the largest pattern, must be extracted, flattened, and added to the flat design. Underlying cells such as power distribution lines can be added if necessary, assuming they share common mask layers with the macrocells. Each macrocell or group of cells can be independently extracted, flattened, and run through the pattern matcher as if it were a complete chip. The results of each sequential run are saved into separate files, and afterwards the files are merged with a special program to determine the best matches over the entire layout. This merge binary reads two sets of results, and if they are of compatible sizes combines the matches into a sorted pool, skimming the best matches off the top and storing them in a merged file. Each partition is merged into the current results pool, and at the end the final set of matching results has been determined. This approach is possible because the score of a match is independent of all other match scores. In fact, the partitions can be processed in any order, possibly utilizing more than one processor for a parallel implementation. Note that a flat design can also be partitioned in this way if the user prefers external partitioning to the internal partitioning of the pattern matcher. This version of partitioning has not been tested in Cadence DFII for lack of sufficiently large designs as well as the server quota space to store them. The actual hierarchical partitioning process is highly dependent on the available CAD tools, system, and layout representation. 
As a side note, a custom flattening algorithm was attempted in SKILL but never finished due to the complexity of the hierarchical representation and geometry transforms supported in Cadence. It is doubtful that the Cadence flattening algorithm can be improved significantly, if at all. 7. RECTANGLE EXTRACTION AND SPLAT SIMULATION SPLAT is an aerial image simulator that produces image intensity plots along cutlines and contour plots of intensity over pattern areas from an input file representing a portion of a mask layout [9]. The SPLAT file format consists of a header defining variables such as σ, λ, NA, and simulation area followed by a list of rectangles and finally plot commands. Rectangles are represented in the form <x1> <y1> <length> <width> <transmittance> <phase>. The SKILL extractor and SPLAT file converter can only produce rectangles with transmittance of 1.0 and phase of 0, 90, or 180 degrees, which is sufficient for most standard PSM layouts. The layout geometry underneath the best match of each pattern is extracted into a rectangle file and afterwards converted into SPLAT format. Three rectangle extraction algorithms have been written in C++ and SKILL, each having a different speed vs. quality tradeoff. The original extractor, which is still used in the polygon splitting algorithm, locates rectangle edges and extracts horizontal slices from left to right between pairs of edges in a form of one dimensional rectangle expansion. The resulting rectangles are long, thin, and not necessarily optimal. The grid used need not be of fixed size. This algorithm is less efficient than the others, but is required when dealing with the overlapping or self-intersecting polygons that may appear in the input file (see section 6.1). The second algorithm involves searching for the bottom left corner of a rectangle and expanding up and to the right while removing the rectangle pixel-by-pixel from the bitmap matrix. 
The algorithm proceeds from the lower left corner to the upper right corner of the extraction region, extracting rectangles until the region is empty. Only a single bit is used to represent each pixel of the layer map. This algorithm is linear in the size of the extraction region and very efficient, possibly taking less time than actually creating the bitmap, but the set of extracted rectangles is not always minimal. The third and final extraction algorithm expands in all four directions, decrementing a pixel counter for each pixel in the extracted rectangle. After one pass, the previously described two-dimensional extractor is called to extract the remaining rectangles. This extractor is believed to produce the exact minimum number of rectangles, but the extraction takes at least twice as long and several times the memory (for the counters) as the two-dimensional extractor. Also, this four-way expansion algorithm can produce overlapping rectangles, which are not supported in SPLAT. “Negative rectangles” can be generated to cancel out the overlaps, but this procedure is unnecessarily complex for internal pattern matcher usage. Thus, the original two-dimensional extractor was chosen to convert the layout bitmaps into rectangles. The extraction and SPLAT simulation procedure is shown in Figures 7-10. Figure 7 is a close-up of a coma pattern above the layout geometry to be extracted. After rectangle extraction, the output file is converted to SPLAT format through the use of a SKILL script. This procedure simply formats the rectangles correctly for SPLAT. The drawmask plot of this SPLAT file, which matches the geometry in the original layout, is shown in Figure 8. Figures 9 and 10 are contour plots of SPLAT simulations with and without 0.1 waves of coma. Notice how coma makes the printed line in the center of the contour plot narrower (contour lines closer together) and shifted to the left. Figure 7. 
A close-up of the match location shows the high correlation of about 0.7 between the coma pattern and the underlying mask geometry. Figure 8. Mask layout geometry near the match location is automatically extracted and made available for simulation. Shown here is a Drawmask plot of a SPLAT input file. Figure 9: SPLAT simulation contour plot of the previous coma pattern match without aberrations. Figure 10: SPLAT plot of coma match area with 0.1 waves of simple coma added. Notice the distortion in the contours. A separate SKILL procedure was written to convert a Cadence mask layout to a SPLAT file as well. The algorithm involves splitting each polygon in the layout, creating a variable width bitmap, extracting rectangles from the bitmap using the two-way expansion extractor, and converting the rectangles into SPLAT format using the SKILL script mentioned above. A second SKILL procedure converts SPLAT files back into layout, thus closing the data conversion loop. 8. RESULTS FOR VARIOUS LAYOUTS Figure 11 illustrates a handcrafted layout of test structures designed to be sensitive to the trifoil, coma, and spherical aberrations. This layout consists of arrays of targets of varied dimensions that are common in real layouts and likely to have a high degree of similarity with the test patterns for the aberrations indicated. This layout was used to initially test the matching theory as well as the software. The normalized pattern-matching factor for these shapes is 0.362 for coma, 0.419 for trifoil, and 0.470 for spherical. Figure 11: Handcrafted test cases include layouts likely to be affected by trifoil, coma, and spherical. The match factors of these test structures are relatively low compared to those achieved with more complex geometries. However, the test structures were designed to represent common layout geometries without much complexity. 
Including both 0- and 180-degree phase regions in the test structures while keeping them simple increases the match factor by approximately 50%. A test structure exactly matching the aberration pattern of interest can of course achieve a match factor of 1.0, which is the basis for some other work being done to quantify the amount of aberrations in real printing systems. Figures 12 and 13 show a small piece of a clear field binary mask layout and a print of a similar layout at 193 nm [11]. The challenge for the lithographer is to be able to read the resist image on the wafer and distinguish issues associated with diffraction limited imaging of simple features, optical proximity, aberrations, and resist performance. The most difficult feature to control in this layout is the small gap between the vertical resist lines. The diffraction limited imaging of the narrow opening on the mask is likely a rather nonlinear function of the opening width. The local roughness observed in this gap in the print occurs over a distance much smaller than the resolution and is clearly an indication that the resist performance is also an issue. The optical proximity effects are clearly evident in that the large vertical lines bulge toward the small horizontal lines. The pattern matching software allows the potential effects of aberrations to be tested. The most sensitive edge location within the center rectangle is the 0.51 match value for astigmatism (cos) as shown in Figure 12. Figure 12: Binary mask tested for astigmatism (cos) aberration in dark field with constraining box. Figure 13: A clear field print of a mask similar to Figure 12. We were able to acquire a set of 0/180 industry PSM layouts, pieces of which are shown in Figures 14 and 15. 
Not only does the pattern matcher locate areas in the mask layout susceptible to aberration effects, but Figure 14 demonstrates that it can also locate the points along the edge of the mask regions with the highest light intensity using the IFFT of the unaberrated pupil function. All locations marked in Figure 14 are inside corners, even the small notches in the vertical strips that cannot easily be seen in the picture. The match scores and other identifying text symbols are displayed next to their respective matched patterns. Other aberrations were run over this same layout with patterns such as the coma target shown in Figure 15. Notice how the coma pattern closely resembles the mask layout underneath it. Figure 14: Using the unaberrated pupil function as a pattern, the mask edge regions with the highest light intensity are found to be at the inside corners. Figure 15: Here is one result of matching the coma pattern to a real 0/180 PSM design. This coma pattern has a slightly larger radius than the ones in the above figures, resulting in the partial third ring. One unexpected result was the difference between positive and negative match values for the same pattern. Most of the work on pattern matching was done with the assumption that the “best” match was the one with the highest (most positive) score, S+. However, the lowest (most negative) score, S-, can also result in a large aberration effect on the layout. Without loss of generality, assume that a large positive score for some aberration pattern will cause a line to shift right in the presence of that aberration, while a large negative score results in the line moving to the left. The full effect of the aberration must take into account which score, S+ or S-, causes the line to deviate the most from its unaberrated position. Therefore, the “best” score would be S = max(S+, -S-). 
As an example, consider the case where the mask layout consists of only 0-degree regions and the pattern pixel values are almost entirely positive. Clearly, the match value will always be positive or zero, meaning that the effect of the aberration will always shift the line to the right. The inverse of this situation will always shift the line to the left. The relative magnitudes of S+ and S- were found to depend on both the relative amounts of 0- and 180-degree phase areas in the mask layout and the sum of all pixel values in the pattern. The odd aberrations such as coma are anti-symmetric about some straight line y=mx; their pixel values are equal and opposite in sign across this line, and therefore their pixel sum is zero. Thus S = S+ = -S-, and only one of {S+, S-} must be calculated. However, the even, rotationally symmetric aberrations such as spherical and defocus have pixel sums that vary with radius. This observation leads to the requirement that two matching runs be used for even aberrations: one to calculate S+, and another with layer weights negated to determine S-. It would be advantageous to choose a pattern radius such that the pixel sum was zero to avoid having to perform two matching runs, but this is not possible for all aberrations and the required radius might not be convenient. 9. PERFORMANCE The pattern matching software package, especially the core matching binary, was designed for the fastest possible matching runs using as little memory as possible. Each major algorithm was individually timed and profiled, optimized, and recompiled dozens of times. Some of the algorithms were changed several times to increase speed and reduce memory requirements. The combination of partitioning, prefiltering, compression, and conditional code execution led to a very efficient implementation of the matcher. Typical performance values recorded before the final profile-based optimizations are shown in Table 1. 
Recent improvements since this table was produced have increased the efficiency by an estimated 25% or more. One level of compression was used with each of the test runs listed in Table 1. As shown in the table, all test design runs completed in only seconds and required only a few tens of MB of memory, even designs as large as 87Mpixels.

Performance on generated PSM, fastest options, 16MB partition size, on a 440MHz HP system:

Layout Size (um)   Resolution (um)   Matrix Size (x1E6)   Partitions   Pattern Size   Patterns   Time (s)   Memory (MB)
152.1 x 179.9      0.1               2.7                  2            64 x 64        8          6.51       30.6
149.7 x 175.9      0.1               2.6                  2            32 x 32        8          39.79      54.2
149.7 x 175.9      0.1               2.6                  2            32 x 32        3          6.54       13.5
149.7 x 175.9      0.1               2.6                  2            44 x 44        1          0.95       18.8
149.7 x 175.9      0.05              10.4                 8            44 x 44        1          2.11       23.1
887.1 x 977.3      0.1               86.7                 60           44 x 44        1          22.76      47.3

Table 1: Typical speed and memory performance for the pattern matching software on a 0.35um technology node FPGA interconnect custom layout.

Other experimental results showed that a single level of compression reduces the runtime by a factor of two to four, two levels of compression by a factor of up to seven, and three levels of compression by a factor of up to ten. Further levels of compression do not provide a significant speedup due to compression overhead. One level of compression rarely results in missed matches, while more than one level of compression can result in errors if the compression adjustment and correlation factors are set incorrectly. This code can find the effects of residual aberrations in less time than it takes Cadence DFII to flatten the layout for a typical layer in a chip and a full set of 36 Zernike coefficients. It has been estimated that a typical 1cm square chip can be processed with adequate physical accuracy in about an hour if sufficient memory is available. Partitioning reduces the memory required to represent the layer map and cell matrix, allowing a standard workstation to be used. 
A typical 16MB partition contains about 1.3 million matrix elements, or 1.3Mpixels, which is promptly freed after it’s used in the matching loop. In most cases, single level compression can lead to a significant speedup, while multi-level compression can have even more dramatic results if care is taken to provide the proper parameters. Matching time scales with resolution for line ends and edges, scales with resolution squared for unrestrained areas, and is independent of resolution for corners represented by single points. 10. CONCLUSION A system has been prototyped that links the residual aberrations of the exposure tool back to the design process. This software is extensible to multiple mask levels in either binary or phase-shifting masks. A collection of Cadence SKILL scripts were created to assist in various aspects of the matching process flow. Custom data structures and algorithms were developed and implemented in C++ that allow a good-sized chip to be processed in about one hour. An exploratory 2D recursive compression algorithm gives a further speedup of a factor of four to ten without much loss of accuracy. The data representation and partitioning reduce memory significantly, permitting a large mask to be processed on a standard desktop computer. Even larger chips can be processed by partitioning prior to flattening, running the matcher, and merging the results together. This pattern matching system should make a useful addition to a mask designer’s CAD toolset. Currently, this software is limited to flat layouts on a Manhattan grid of fixed x and y resolution. Layouts not meeting these criteria are snapped to a grid and discretized into vertical and horizontal line segments and rectangles. The compression theory and algorithm are still in progress as the results of compression can be unpredictable. Specifically, an algorithm has yet to be found that consistently achieves a 10X speedup with three levels of compression and near zero error on large designs. 
In addition, the software could be extended to take advantage of multiprocessor or distributed systems, although this may not be necessary because the overall runtime is short in general. The real limit to design size will probably be either available memory or coordinates exceeding a 32-bit integer. We hope to eventually create or acquire a chip layout large enough to determine which of these limits is hit first, and correct the problem if it is the integer size limit. Additional future work includes quantifying the effects of aberrations in terms of pattern match factors. Simulation through programs such as SPLAT will help in verifying the correctness of the pattern matching theory, but the real work involves printing a mask with a stepper with known lens aberrations and examining the results to determine if they match the results predicted using the pattern matcher. ACKNOWLEDGEMENTS I would like to thank my research advisor, Prof. Andy Neureuther, for his help on this project and for explaining the background optics and physics to me. I would also like to thank Garth Robins and Kostas Adam for their help with the pattern generation and theory, and Matt Moskewicz for his assistance with the coding involved in the project. This research was jointly sponsored under SRC contract 01-MC-460 and DARPA grant MDA972-01-1-0021. REFERENCES 1. Mark Terry, Private Communication, March 2001. 2. C.M. Garza, et al., “Ring test aberration determination and device lithography correlation,” Proc. SPIE vol. 4345, pp. 36-44, 2000. 3. G. Robins, A.R. Neureuther, K. Adam, “Measuring optical image aberrations with pattern and probe based targets,” J. Vac. Sci. Technol. B, to appear, Nov./Dec., 2001. 4. H. Fukuda, K. Hayano and S. Shirai, “Determination of high-order lens aberrations using phase/amplitude linear algebra,” J. Vac. Sci. Technol. B 17(6), pp. 3318-3321, Nov/Dec 1999. 5. A.R. Neureuther, P. Flanner III and S. 
Shen, “Coherence of defect interactions with features in optical imaging,” J. Vac. Sci. Technol. B, pp. 308-312, Jan/Feb 1987. 6. M. Born and E. Wolf, Principles of Optics, 7th ed., Section 9.4, Cambridge University Press, 1999. 7. T.A. Brunner, IBM J. Res. Develop., vol. 41, p. 57, 1997. 8. J. Kirk, “Review of photoresist-based lens evaluation methods,” Proc. SPIE vol. 4000, p. 2, 2000. 9. K.H. Toh and A.R. Neureuther, Proc. SPIE vol. 772, p. 202, 1987. 10. http://momonga.t.u-tokyo.ac.jp/~ooura/fft.html 11. SEM courtesy of the photolithography section of M. Hanratty.