A Test Strategy for Nanoscale Wafer Level Packaged Circuits

D.C. Keezer (1), J.S. Davis (1), S. Ang (2), M. Rotaru (2)
(1) Georgia Institute of Technology
(2) National University of Singapore

Abstract

This paper presents a strategy for testing future generations of wafer-level packaged logic devices that have nanoscale I/O structures. The strategy assumes that the devices incorporate built-in self test (BIST) features so that only a subset of the functional I/O needs to be directly accessed during testing. A miniature tester is described that provides test control, pattern sequencing, and critical timing for the test. An interposer is used for electro-mechanical connection between the miniature tester and the device I/O nanostructures. Prototypes of the miniature tester are presented that demonstrate the fundamental ability to control logic transitions with 20ps or better accuracy, as is required to meet the test needs for 5 Gbps signals. A complete 5 Gbps miniature tester is under development using the results obtained in the prototyping phases of the project.

1. Introduction

There is no argument as to the general direction of electronics packaging development. Only the pace of this future development is undetermined. Future packages will continue to provide higher density of I/Os between the logic devices and the next level of system integration. Incorporation of nanoscale structures within these I/Os will lead to pin densities of several thousand per cm^2 in the short term, and perhaps to ~100,000 per cm^2 in the not-too-distant future. Likewise, relentless advances in logic performance continue to follow Moore's law, resulting in ICs with more functionality and higher speeds in each generation. Today on-chip clocks routinely exceed 2 GHz, and I/O rates are measured in hundreds of megabits per second on wide data busses. Serial communication channels are operating above 3 Gbps. Clearly even higher rates are expected.
These properties of future devices provide many opportunities for electronic applications. However, at the same time they create tremendous challenges for testing. Pin densities of even 1000 per cm^2 stress existing electrical probe capabilities. The highest performance automated test equipment (ATE) available today supports data rates of only 1.6 Gbps (multiplexed to 3.2 Gbps), barely meeting the needs of current devices. Furthermore these testers come at a price on the order of several thousand US dollars per channel, so a 1000 channel ATE costs several million dollars. How then can electronics testing possibly keep up with the scaling trends expected in the future?

The question is appropriate for future electronics in general, but is even more critical when applied to nano wafer level packaged devices. The small I/O structures enable tremendous advances in I/O density. Furthermore, the smaller geometries tend to reduce many of the electrical parasitics that have limited I/O rates in the past. In fact, one goal of nano wafer level packaging is to provide an almost seamless transition between on-chip and off-chip signal speeds.

So, for testing nanoscale wafer level packaged devices, we are faced with a problem on at least three fronts: (1) higher pin densities, (2) increased signal speeds, and (3) the trend to higher ATE costs. These problems must be solved in order to realize the performance and economic advantages promised by these new packaging technologies. This paper proposes a strategy for achieving this.

In section 2, the concept of a miniature tester is introduced. Unlike general-purpose ATE, the mini-tester is customized for a particular application. Basically a trade-off is made between flexibility and size/cost to allow the mini-tester to be much smaller and lower-cost than conventional ATE. The mini-tester is also designed to specifically support on-chip built-in self test (BIST) features.
It is specifically assumed that the device under test will incorporate extensive BIST structures (an assumption not made by conventional ATE). In section 3 we illustrate a staged-development effort to produce such a mini-tester. This begins with the design and characterization of a "Digital Test Core" (DTC). The DTC is a reusable programmable logic structure that provides most of the common interface, sequencing and control functions for the mini-tester. The next step is to add the necessary support electronics to customize the mini-tester for a specific application. One example, shown in section 3, provides several channels of test stimulus and response-capture with timing resolution of 10ps to 20ps. Section 3 concludes by describing the current effort to produce a 5 Gbps miniature tester.

2. Miniature Tester Concept

As introduced in the previous section, a miniature tester is being developed to provide near-autonomous testing of high-performance nanoscale wafer level packaged devices. The miniature tester is customized to specifically support the BIST features contained within a particular device type. Such features include boundary scan, internal scan, internal pattern generators, and internal response analyzers, among others. When these features are accessed using the IEEE 1149.1 or 1149.4 standards, only a very limited number of external pins are required. Furthermore, in most cases these few test pins operate at much lower frequencies than the normal device clock rate. By supplementing these with timing reference signals, precise at-speed testing of both the internal logic and external interfaces is possible.

The detailed BIST approach will vary from one device to another. However, the need for interface signals supporting the standard test busses, and the general need for precise timing reference signals, are common to most BIST strategies. The application of such BIST strategies has become commonplace for system-on-chip (SOC) devices today.
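To make the "very limited number of external pins" point concrete, the following is a minimal sketch of the IEEE 1149.1 TAP controller state machine, which is stepped by a single mode-select pin (TMS) on each test clock. The state names and transitions follow the standard; the names `TAP_NEXT` and `walk` are illustrative assumptions and do not come from the mini-tester firmware described in this paper.

```python
# Sketch of the IEEE 1149.1 TAP controller state machine.
# Each entry maps a state to (next state if TMS=0, next state if TMS=1).
# TAP_NEXT and walk are hypothetical names used only for illustration.
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def walk(state, tms_bits):
    """Apply a sequence of TMS values (one per TCK cycle) and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


if __name__ == "__main__":
    # From reset, the TMS sequence 0,1,0,0 reaches the data-register shift state.
    print(walk("Test-Logic-Reset", [0, 1, 0, 0]))
    # Holding TMS high for five clocks returns to reset from any state.
    print(all(walk(s, [1] * 5) == "Test-Logic-Reset" for s in TAP_NEXT))
```

Because the whole controller is driven by TMS and the test clock, with data moving serially on two further pins, a tester that speaks this protocol needs only a handful of low-speed connections regardless of how many functional I/Os the device has.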
In large SOCs, major blocks of logic are often assembled from multiple design sources. When this is done, each design group provides test patterns and sequences for the subcircuits. These tests can be applied to the embedded logic blocks if the global SOC design is partitioned (usually with a scan-design approach). The individual tests can be initiated with standard commands through the IEEE 1149.x standard test busses. This approach is becoming a de facto standard design-for-test (DFT) practice in large SOCs today. It is expected to be applied even more universally for nanoscale wafer-level packaged devices.

If the number of independent I/O signals needed for testing is minimized, then a much smaller-scale test system can be used (as compared with traditional ATE). Providing logic to support the IEEE 1149.x standard is not difficult, and can be accomplished using readily available programmable logic devices. However, if accurate high-speed tests are required, then at least one (and usually several) timing reference signal is also required. The generation and control of programmable, multi-gigahertz signals is possible, but not trivial.

Figure 1 illustrates a test configuration for nanoscale wafer-level packaged devices. The setup is similar to conventional wafer-probing. Here the wafer carrying the wafer-level packaged devices is placed on a movable wafer "chuck". A particular die site is then positioned beneath an "interposer." The interposer is designed to mechanically align the leads of the wafer-level package and to provide a temporary pressure-contact for each electrical signal, including power and ground pins. Within the interposer, the signals are distributed and fanned-out to a larger array on the scale of a micro BGA. The interposer is CTE-matched to the silicon wafer, and is mounted to a conventional multilayer PCB which further distributes the signals to pins or connectors that mate to the mini-tester.
The mini-tester controls and globally sequences the various self-tests supported by BIST within the IC. Communications to the external world are provided through the mini-tester using the standard Universal Serial Bus (USB). In operation, the various BIST sequences are applied within the IC, results are communicated back to the mini-tester, and a final test decision (pass or fail) is sent through the USB to a controlling computer. The wafer chuck is then repositioned to a new die site, and the process repeats (step-and-repeat).

[Figure 1. Mini-tester applied to testing at the wafer-level. Labeled elements: mini-tester (DTC and support logic), power and USB connections, PCB, interposer, wafer, chuck; 5-10 Gbps per signal.]

Another perspective view of this test configuration is shown in Figure 2. Here, the 2-dimensional array of wafer-level packaged devices is illustrated. The simple interface between the mini-tester and the external world is shown, including the USB communications port, a single (low-noise, continuous) timing reference, and power supply connections.

[Figure 2. Mini-tester overhead view. Labeled elements: RF reference (~2.5 GHz), Test Support Processor (mini-tester, TSP-N), PC via USB, power (PWR), interposer, wafer, device-under-test (DUT) with compliant leads; 5-10 Gbps signals.]

Given the simple external connections needed to the mini-tester, it is easy to imagine replicating the whole approach of Figures 1 and 2 to form an array of mini-testers. Such a parallel test configuration is illustrated in Figure 3. This would permit full at-speed functional testing of many devices in parallel, and greatly increase the test throughput. For nanoscale wafer-level packages the parallel expansion of the probe setup may be limited by the total force necessary to achieve reliable contact to tens or hundreds of thousands of device leads. Part of the process design development for nanoscale leads therefore includes limitations on the force per lead required for reliable pressure contact.
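The force limitation can be illustrated with simple arithmetic: the total probing force scales as the number of leads per die, times the number of dies contacted in parallel, times the force per lead. The sketch below assumes a uniform per-lead force; the function names and the example values (10 mN per lead, a 2000 N budget) are hypothetical and are not taken from this work.

```python
# Sketch: total probe force versus degree of parallelism.
# All names and example numbers are illustrative assumptions.

def total_contact_force_n(leads_per_die, dies_in_parallel, force_per_lead_mn):
    """Total force in newtons needed to contact every lead at once.

    force_per_lead_mn is the (assumed uniform) force per lead in millinewtons.
    """
    return leads_per_die * dies_in_parallel * force_per_lead_mn / 1000.0


def max_parallel_dies(force_budget_n, leads_per_die, force_per_lead_mn):
    """Largest whole number of dies that fits within a total force budget."""
    force_per_die_mn = leads_per_die * force_per_lead_mn
    return int(force_budget_n * 1000 // force_per_die_mn)


if __name__ == "__main__":
    # Hypothetical case: 100,000 leads per die at 10 mN per lead, 4 dies at once.
    print(total_contact_force_n(100_000, 4, 10))
    # With a hypothetical 2000 N budget, how many such dies can be probed together?
    print(max_parallel_dies(2000, 100_000, 10))
```

Under these assumed numbers a single die already requires on the order of a kilonewton, which is why reducing the force per lead directly raises the achievable number of parallel test sites.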
In order for the step-and-repeat test probe process to be successful, the mechanical alignment and reliable electrical pressure contact require careful attention to the device lead design (on the wafer) as well as the lead capture mechanism (within the interposer). The solution for this is still being developed. However, some general features are illustrated in Figure 4, including the mechanical alignment structure of the lead capture mechanism.

[Figure 3. Parallel (array) testing using multiple mini-testers. Labeled elements: RF reference (~2.5 GHz), TSP-N-1, TSP-N, TSP-N+1, PC via USB, power (PWR), wafer; 5-10 Gbps signals.]

[Figure 4. Self-alignment and capture of nanoscale wafer-level packaging leads. Labeled elements: PCB, TSP, interposer, self-alignment features, compliant leads, ICs, wafer.]

3. Miniature Tester Implementation

The miniature tester is based on previous research using a Digital Test Core (DTC). The DTC is a reusable, reconfigurable circuit that can be incorporated into various testing scenarios to augment current test capabilities or to apply new test functions. The DTC includes programmable logic to generate test stimuli and capture output responses. The programmable logic is coupled with programmable I/Os and has provisions for a large data memory. When coupled with additional logic to provide multi-gigahertz speeds and a Universal Serial Bus bridge to a controlling computer, a miniature tester is created to interface to the nano-wafer level packaged circuits.

A block diagram for the digital test core is shown in Figure 5. The programmable chip at the center of the digital test core is a Xilinx Virtex-E series field programmable gate array (FPGA). A highlight of the Virtex series is its configurable I/Os that can interface to a wide range of different logic types including low-voltage differential PECL. In the current design, the specific chip used is an XCV300E in a 256-pin fine-pitch ball grid array package. The programmable nature of the chip allows test engineers to update and customize tests for future technologies, such as the miniature nano-wafer level tester.

[Figure 5. Digital Test Core Block Diagram. Labeled elements: FPGA (XCV300E-FG256), USB microcontroller (Cypress CY7C64613) to PC, 12 MHz crystal, optional 8Mb SRAM (Micron MT58L256L36P, 3.3ns), PROM (XC17V02), MultiLINX and optional IEEE 1149.1 connections, user-defined CLK, ~90 single-ended I/O or ~45 differential I/O to the test application.]

A computer is needed to provide test data and analyze results, but the critical testing logic is contained within the miniature tester. A Cypress microcontroller is used to decode and encode the USB signals and provide the data to and from the Xilinx FPGA. Current versions of the DTC use the USB 1.1 standard, which operates at 12 Mbps. Future versions will use the USB 2.0 standard, which operates at a rate of 480 Mbps.

A Digital Test Core was constructed in an evaluation board for verification of concept, as shown in Figure 6. The I/Os of the Virtex FPGA were verified to operate at speeds up to 622 Mbps, as specified by the manufacturer. Newer Virtex series FPGAs have I/O speeds up to 840 Mbps. While these speeds are high enough to compete with currently available ATE, higher speeds are necessary for some current and future devices.

[Figure 6. Digital Test Core Photograph [3].]

The next version of the DTC was coupled with external high-speed PECL logic, which times and formats the signals, enabling operation at multi-gigahertz speeds. In a specific application, the four-bit parallel output data is transmitted through coaxial connections to four different wavelength lasers, optically combined, and then transmitted over an optical fiber. Incoming data from the fiber optics is converted to four-bit parallel electrical pulses, sampled by the PECL circuits, and transmitted to the DTC. A logic block diagram is shown in Figure 7.

[Figure 7. Multi-gigahertz Tx/Rx Block Diagram. Labeled elements: Digital Test Core, USB, clock distribution (1.244 GHz, divide-by-2 stages to 622 MHz and 311 MHz, fanout buffers), programmable delays 1-4, PECL word formatter logic, PECL sampling circuit, data, presence, trigger and control signals.]

Four-bit words are transmitted during each clock cycle along with a presence bit which signals that the data is valid. A trigger bit is included for output to an oscilloscope for characterization of the board. Four digitally-programmable delay PECL chips set variable pulse widths and delays. The delay range is 10ns, and the resolution is 10ps. The four-bit data words are synchronized, while the presence bit is offset to signal when the data is valid. Four high-speed input channels are also provided with an input for a high-speed clock. The receiver channels are designed to operate independently or in synchronous operation with the transmitter. The data is captured using a PECL register and returned to the DTC. The DTC can perform simple analysis on the data, and can transmit the raw data back to the controlling computer for further analysis.

The multi-gigahertz tester was constructed as shown in Figure 8. The DTC was used without the optional SRAM since the on-chip memory in the Virtex FPGA was sufficient for this application. Separate power connections are included to isolate the relatively noisy CMOS and the high-speed PECL logic. A socket for a PROM was also implemented for automatic programming of the FPGA upon board power-up. The optical component can be removed and the circuit can be used as an electronic multi-gigahertz tester.

[Figure 8. Multi-gigahertz Tx/Rx Photograph [3]. Labeled elements include the Digital Test Core, clock distribution, Tx delay generator, Tx data formatter, Tx outputs, Rx logic, Rx inputs, and ATE, oscilloscope and power connectors.]

After construction of the multi-gigahertz miniature tester (shown in Figure 8), the Virtex firmware was programmed to output a series of four-bit words. An oscilloscope was used to measure the outputs of the four channels, shown in Figure 9. The output circuitry is set up to transmit the pulses in short bursts. Even though the maximum data rate of the Virtex chips is 622 Mbps, the pulse widths can be shortened by the PECL chips because of their faster rise and fall times. The measured rise and fall times are 150-200ps, which resulted in a minimum pulse width of about 300ps. This indicates that the data rate of the PECL chips can exceed 3 Gbps per channel.

[Figure 9. Four Data Outputs from the Multi-gigahertz Tx/Rx.]

Equally important as the short pulse width of the outputs is accurate edge placement. The more precisely the data outputs are placed relative to the presence output, the faster the circuit can operate. If the presence bit operates as a clock to a flip-flop, the setup-and-hold time is minimized with well-timed clock placement. The delay circuits on the board have a resolution of 10ps per step. This means pulse widths can be set with a resolution of 10ps. However, other factors such as jitter can affect the accuracy of the signals. Figure 10 shows the rising edge of the data output being changed in 20ps increments.

[Figure 10. 20ps Edge Placement Resolution.]

[Figure 11. Miniature Tester Block Diagram. Labeled elements: logic core with FPGA (Xilinx), USB (Cypress) to PC, SRAM (Micron), EPROM (Xilinx), power (PWR), RF reference (~2.5 GHz), customized high-speed PECL logic (clock distribution, programmable delays, mux/demux), 311 Mbps channels from the logic core, 5 Gbps test channels.]

The previous instances of the DTC, along with support logic, have shown that it is well suited for standalone operation as a miniature tester.
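The timing figures quoted above (10ns delay range, 10ps steps, ~300ps minimum pulse width) can be related with a short calculation. This is a sketch under stated assumptions: the delay is modeled as an ideal linear step count clamped to the quoted range, jitter is ignored, and one bit is assumed to occupy one minimum pulse width. The function and constant names are hypothetical.

```python
# Sketch: relating delay range, step size, and pulse width to usable data rate.
# Range and step come from the text; the linear, jitter-free model is an assumption.

DELAY_RANGE_PS = 10_000  # 10 ns programmable range
DELAY_STEP_PS = 10       # 10 ps per step


def delay_code(target_ps):
    """Nearest step count for a requested delay, clamped to the programmable range."""
    if target_ps < 0:
        raise ValueError("delay must be non-negative")
    max_code = DELAY_RANGE_PS // DELAY_STEP_PS
    return min(round(target_ps / DELAY_STEP_PS), max_code)


def quantization_error_ps(target_ps):
    """Difference between the programmed delay and the requested delay."""
    return delay_code(target_ps) * DELAY_STEP_PS - target_ps


def max_rate_gbps(min_pulse_width_ps):
    """Upper bound on data rate if each bit occupies one minimum pulse width."""
    return 1000.0 / min_pulse_width_ps


if __name__ == "__main__":
    print(delay_code(127), quantization_error_ps(127))  # nearest step and its error
    print(max_rate_gbps(300))  # rate implied by a 300 ps minimum pulse
    print(max_rate_gbps(200))  # a 200 ps bit period corresponds to 5 Gbps
```

Under this model the 10ns range spans about 1000 steps, so a 10-bit control word would be enough to address it. At 5 Gbps the bit period is 200ps, so a 20ps edge placement accuracy corresponds to one tenth of a bit period.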
With a small physical layout and high performance, multiple instances of the miniature tester can be implemented to test multiple chips at the nano wafer level. Using techniques similar to those described above, many outputs can be driven from one mini-tester. For lower speed testing (<620 Mbps) the data can be driven directly from the DTC. For higher speed testing (upwards of 4.8 Gbps) the DTC can interface with PECL circuits for signal generation and capture. The PECL logic is customized to provide the highest throughput, including self-calibration logic, similar to techniques described in [2]. Using precise timing measurements, multiple lower speed outputs from the DTC are multiplexed using the PECL logic to generate higher speed outputs. The data can then be recovered using a precisely timed clock. As newer technologies such as SiGe develop, data rates can exceed 10 Gbps using similar techniques. With projected rise and fall times of 40ps, coupled with edge placement resolution of 10ps or less, the miniature tester can be easily upgraded to test the next generation of wafer level packaged circuits.

4. Conclusions

In this paper we have introduced the concept of a miniature tester that can be embedded within the probing arrangement for nanoscale wafer-level packaged devices. The mini-tester provides an interface to BIST structures in the IC through standard IEEE 1149.x test busses. The mini-tester also provides programmable precision timing references for at-speed test and characterization of the wafer-level packaged IC. The relatively simple nature of the mini-tester external interface suggests the possibility for parallel testing of arrays of ICs to increase test throughput in production settings.

Acknowledgments

This work was supported in part by the National Science Foundation under contract EEC-9402723. The National Science Foundation supports the Georgia Tech Packaging Research Center (PRC) as an NSF Engineering Research Center.

References

1. D.C. Keezer, Q. Zhou, "Test Support Processors for Enhanced Testability of High-Performance Circuits," Proc. of the Int'l Test Conf., Baltimore, MD, October 1999, pp. 801-809.
2. D.C. Keezer, Q. Zhou, C. Bair, J. Kuan, B. Poole, "Terabit per Second Automated Digital Testing," Proc. of the Int'l Test Conf., October 2001, pp. 1143-1151.
3. J.S. Davis, D.C. Keezer, "Multi-Purpose Digital Test Core Utilizing Programmable Logic," Proc. of the Int'l Test Conf., Baltimore, MD, October 2002, pp. 438-445.