Design Considerations for 3D Heterogeneous Integration Driven Analog Processing-in-Pixel for Extreme-Edge Intelligence

Zihan Yin (Electrical & Computer Engineering, University of Wisconsin-Madison, Madison, USA, zyin83@wisc.edu); Gourav Datta (Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, USA, gdatta@usc.edu); Md Abdullah-Al Kaiser (Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, USA, mdabdull@usc.edu); Peter Beerel (Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, USA, pabeerel@usc.edu); Ajey Jacob (Information Sciences Institute, University of Southern California, Marina Del Rey, USA, ajey@isi.edu); Akhilesh Jaiswal (Electrical & Computer Engineering, University of Wisconsin-Madison, Madison, USA, akhilesh.jaiswal@wisc.edu)

Abstract—Given the progress in computer vision, image sensors are broadening their capabilities, which requires adding data processing close to or within the pixel chips. In this context, in-pixel computing has emerged as a notable paradigm, offering the capability to process data within the pixel unit itself. Interestingly, state-of-the-art in-pixel paradigms rely on high-density 3D heterogeneous integration to establish a per-pixel connection with vertically aligned analog processing units. This article provides a comprehensive review of the most recent developments in in-pixel computing and its relation to 3D heterogeneous integration. It offers an in-depth examination of innovative circuit design, adaptations in algorithms, and the challenges in 3D integration technology for sensor chips, thereby presenting a holistic perspective on the future trajectory of in-pixel computing driven by advances in 3D integration.

Index Terms—3D integration, in-pixel computing, edge computing, CMOS image sensors, Cu-Cu connections.

I. INTRODUCTION

High-resolution, high-frame-rate cameras contain millions of fast-response pixels that produce and relay substantial data to back-end processors over energy- and bandwidth-constrained channels. Further, today's prevalent AI-enabled computer vision (CV) applications [1], [2] require fast processing of the image data. This spatial division between sensors and processors creates energy and bandwidth bottlenecks in existing CV systems. To address this concern, efforts have been made to process and compress data nearer to the sensor, decreasing the volume of data that must be transferred. 3D heterogeneous integration has enabled higher levels of computational integration while reducing area overhead in camera technology.

3D heterogeneous integration involves vertically stacking disparate materials and devices in a three-dimensional configuration, allowing close coupling of diverse components such as sensors, processors, and memory modules. For camera systems, this means a potential leap in performance and functionality. By integrating the CMOS image sensor (CIS) with other essential components in a 3D stack, data can be processed more rapidly, reducing latency. Additionally, this layered approach can lead to cameras with smaller footprints and greater energy efficiency. As a result, 3D heterogeneous integration holds promise for compact cameras with unprecedented computational power, making real-time advanced image processing and analysis more feasible and accessible. In recent years, advancements in 3D integration technology for image sensor chips have been significant, as depicted in Fig. 1.
The evolution from initially integrating all components on a single 2D chip, which incurred considerable area overhead, progressed to chip stacking, wafer-on-wafer, and chip-on-wafer methodologies. Among these, the Through-Silicon Via (TSV) emerged as one of the most area-efficient techniques, offering notably higher interconnect and device density and, consequently, shorter connection lengths. However, the advent of Copper-Copper (Cu-Cu) bonding has revealed even greater potential for area conservation. Unlike TSVs, which must be placed at the circuit's periphery, Cu-Cu bonding enables direct connections under the pixel circuit itself. This not only reduces the distance between the pixel and peripheral circuits but also allows each pixel circuit unit to connect directly to a circuit block immediately beneath it. Noteworthy advancements have been made by Sony and Samsung, who have leveraged Cu-Cu connection technology to further expand the boundaries of 3D integration [3], [4].

Fig. 1: Evolution of 3D integration technology for image sensors in recent years.

To process and compress data closer to the sensor, three approaches have typically been developed, distinguished by the proximity of the processing unit to the point of data generation within the sensor:

1) Near-Sensor Processing: The processor for the sensor data is situated near the CMOS image sensor (CIS) chip, enhancing energy and bandwidth efficiency by reducing data transfer costs between the sensor and processor [5], [6]. However, a significant distance between the sensor and the processor still exists, as they reside on separate chips.

2) In-Sensor Processing: An analog or digital signal processor is integrated within the sensor chip's periphery circuit, thereby shortening the distance between the processor and the sensor [7]. Although more efficient than conventional systems, the data transfer bottlenecks between the sensor and the periphery or the back-end processor remain noteworthy.

3) In-Pixel Processing: The pixel circuit within the sensor is augmented to enable basic computations along each row or column of the pixel array, enabling early processing before any data transmission. This significantly reduces the bandwidth, and the associated energy and latency, of the data sent from the sensor to the processor (a rough estimate of this reduction is sketched at the end of this section).

This article focuses on in-pixel processing for advancing extreme-edge intelligence, a domain in which a range of works has recently been documented [8]-[11]. Complex machine learning tasks demand functionalities such as multi-bit, multi-channel convolution, batch normalization, and ReLU activation. A recent study [12] introduced a multi-bit, multi-channel, weight-embedded in-pixel methodology, reporting a significant improvement in energy-delay product (EDP) on the high-resolution Visual Wake Words (VWW) dataset. Another work [13] evaluated in-pixel processing solutions on the large-scale BDD100K self-driving dataset. Works [14]-[16] have also introduced reconfigurability of weights for diverse applications by using non-volatile memory (NVM), such as RRAM, for weight implementation. These works show great potential for further development and rely on 3D integration to enable massively parallel processing-in-pixel. In this paper, we focus on this new 3D-integrated analog in-pixel computing paradigm for extreme-edge computing.
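To make the bandwidth argument for in-pixel processing concrete, the following minimal Python sketch estimates the per-frame data-volume reduction when the first convolutional layer is computed inside the pixel array. All resolution, precision, kernel, stride, and channel numbers are illustrative assumptions, not figures from the cited works.

```python
# Rough estimate of the data-volume reduction from in-pixel first-layer
# processing. All numbers below are illustrative assumptions.

def conv_out_dim(in_dim: int, kernel: int, stride: int) -> int:
    """Output spatial dimension of a valid (no-padding) convolution."""
    return (in_dim - kernel) // stride + 1

# Hypothetical sensor and first-layer configuration.
H, W = 1080, 1920          # pixel array resolution
bits_per_pixel = 10        # raw ADC precision
kernel, stride = 5, 5      # large kernel, non-overlapping stride
out_channels = 8           # small channel count to limit serial readout
bits_per_activation = 8    # quantized first-layer output

raw_bits = H * W * bits_per_pixel
out_h = conv_out_dim(H, kernel, stride)
out_w = conv_out_dim(W, kernel, stride)
out_bits = out_h * out_w * out_channels * bits_per_activation

print(f"raw readout:     {raw_bits / 8 / 1024:.0f} KiB per frame")
print(f"in-pixel output: {out_bits / 8 / 1024:.0f} KiB per frame")
print(f"compression:     {raw_bits / out_bits:.1f}x")
```

Under these assumed numbers the sensor ships roughly 4x less data per frame; larger kernels, coarser strides, or fewer channels push the ratio higher, which is exactly the circuit-algorithm trade-off revisited in Section III.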
In-pixel processing requires careful consideration at the technology, circuit, and algorithm levels. Incorporating computations such as matrix-vector multiplication within the pixel array could decrease pixel density; however, heterogeneous 3D integration permits vertical alignment of logic or memory layers with the CIS. To maintain algorithmic accuracy, a large number of channels and a small stride are desirable, which can increase the number of weights per pixel and limit the CIS resolution. While advanced nodes with smaller device areas may satisfy these demands, the large pitch of 3D integration connections might offset any area benefits. Additionally, algorithm specifics and hardware configuration significantly impact the latency and energy of the in-pixel approach. Successfully creating a chip for in-pixel computing therefore involves integrating innovative layout and circuit design with a tailored algorithm. This paper provides a detailed examination of these three aspects, focusing on circuit design, algorithm optimization, and novel 3D chip design.

II. EDGE COMPUTING: CIRCUIT DESIGN

For in-pixel computing, the P2M circuit was proposed in [14]-[16], as shown in Fig. 2. In convolutional neural networks (CNNs), the first layer entails multiplying pixel outputs from cameras with multi-bit weight values [17]. The P2M framework embeds weights within the pixel array through various 3D integration technologies, such as those delineated in [18] and [19], enabling vertical stacking of weights. These weights can be encoded either in the geometry of a CMOS transistor or in the resistance state of a non-volatile memory (NVM) device, including, but not limited to, Resistive Random Access Memory (RRAM), Phase Change Memory (PCM), and Magnetic Random Access Memory (MRAM) [20].

From a circuit perspective, as depicted in Fig. 3(a), the weight values are encoded through either NVM resistance states or varied transistor gate widths. Multiple RRAMs (MWi, where i ranges from 1 to NC) at each pixel circuit's "S" node represent the different kernel weights of the output feature map of the first convolutional layer. Each RRAM-based weight can be individually activated by the series-connected transistor MEN. The convolution operations are carried out by sequentially activating multiple rows and connecting the corresponding bitlines inside the pixel array for each channel, depending on the layer configuration. Further details on the read operation and the BN and ReLU implementations using a Single-Slope (SS) ADC are documented in [12].

Fig. 2: Overall P2M-enabled CIS system. (a) Back-side-illuminated CMOS image sensor (BI-CIS) die, (b) weight-containing die, (c) pixel circuit, (d) multi-bit multi-channel positive and negative weight banks (mapped to a transistor's width (CMOS) or a resistance state (NVM)), (e) SS-ADC performing the ReLU and part of the BN operations.

Fig. 3: RRAM-based weight circuit technique and simulated output for P2M-enabled CIS. (a) Weight-embedded pixel circuit, and (b) a scatter plot comparing the simulated convolutional results (normalized VOUT) with ideal convolutional results (normalized weight×input, W×I).
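As a reference point for the non-ideality analysis that follows, the sketch below gives a minimal, idealized behavioral model of one P2M inner product, assuming a purely linear weight-times-photocurrent response and the separate positive and negative weight banks of Fig. 2(d). All names, values, and the linear model itself are illustrative simplifications of the circuit described above.

```python
import numpy as np

# Idealized behavioral model of one P2M output: each pixel's photodiode
# current is scaled by an embedded weight (an RRAM conductance state or a
# weight-transistor width), and the bitline accumulates the products for
# one output channel. The real circuit response is non-linear, as the
# scatter plot in Fig. 3(b) shows.

rng = np.random.default_rng(0)

n_pixels = 3 * 3 * 3                               # 3x3 kernel, 3 input channels
photo_current = rng.uniform(0.0, 1.0, n_pixels)    # normalized light intensity
weights = rng.uniform(-1.0, 1.0, n_pixels)         # normalized signed weights

# Positive and negative weights live in separate banks (Fig. 2(d));
# each bank contributes its own accumulated value.
pos_acc = np.sum(np.where(weights > 0, weights, 0.0) * photo_current)
neg_acc = np.sum(np.where(weights < 0, -weights, 0.0) * photo_current)

# Ideal (linear) convolution output before the ADC resolves the sign.
v_out_ideal = pos_acc - neg_acc
print(f"ideal normalized output: {v_out_ideal:.3f}")
```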
Fig. 3(b) compares normalized simulated convolutional outputs against the ideal results, using the GF 22nm FD-SOI node. The scatter plot is generated by sweeping a diverse array of weight (varied gate width) and input light intensity (photodiode current) values for a 3×3×3 kernel. A discernible deviation between the simulated results (summarized by a fitted solid line) and the ideal linear response can be identified, attributed to the circuit's inherent non-linearity. Integrating this non-linearity into the training algorithm is essential, calling for the substitution of the ideal convolution function with a non-linear custom convolution function, a topic expanded upon in the algorithm section of this paper.

For the realization of multi-channel convolution operations, associating multiple weights with each pixel is fundamental. Additionally, ensuring high test accuracy requires supporting both positive and negative weight values, calling for specialized circuit techniques to effectively interpret and process them. Expanding on this, [12] introduces an inventive strategy in which the peripheral Single-Slope (SS) ADC accumulates the MAC results corresponding to both positive and negative weights. The approach is characterized by a distinctive mechanism: 'up-counting' for positive weights and 'down-counting' for negative weights determines the final convolution output. Furthermore, the correlated double sampling (CDS) circuit inherent to CISs, in conjunction with the SS ADC, is used to perform the Rectified Linear Unit (ReLU) operation. This ensures that the non-linear activation, a fundamental aspect of CNNs, yields final count values from the ADC that remain either positive or zero (realizing the ReLU operation), following the sequential up-and-down counting process facilitated by the CDS operation.
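The up/down-counting scheme can be summarized behaviorally as in the following sketch. The single-slope quantization model, counter width, and full-scale voltage are illustrative assumptions; the zero-clamp models the final count never dropping below zero, which is what realizes ReLU.

```python
# Behavioral sketch of the signed accumulation and ReLU described above:
# the SS ADC up-counts while converting the positive-weight bank's
# accumulation and down-counts for the negative-weight bank; clamping the
# final count at zero yields ReLU with no extra hardware.

def ss_adc_counts(v: float, full_scale: float = 1.0, n_bits: int = 8) -> int:
    """Number of ramp steps a single-slope ADC takes to cross voltage v."""
    levels = 2 ** n_bits
    return min(levels - 1, max(0, round(v / full_scale * (levels - 1))))

def in_pixel_relu(pos_acc: float, neg_acc: float) -> int:
    count = ss_adc_counts(pos_acc)      # up-count phase (positive weights)
    count -= ss_adc_counts(neg_acc)     # down-count phase (negative weights)
    return max(0, count)                # count never goes below zero -> ReLU

print(in_pixel_relu(0.62, 0.20))   # positive MAC result -> nonzero code (107)
print(in_pixel_relu(0.15, 0.40))   # negative MAC result -> clipped to 0
```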
III. PROCESSING-IN-PIXEL: ALGORITHM

From an algorithmic perspective, a convolutional neural network (CNN) begins with a linear convolution operation, followed by batch normalization (BN) and a non-linear (ReLU) activation. To integrate the in-pixel computing paradigm, in which the processing unit receives the accumulated output of the image data rather than the raw pixels, this procedure can be divided into two primary segments: 1) simulation of the hardware computation through a mathematical model, and 2) integration of the remaining CNN algorithm with the newly acquired input data.

The P2M circuit scheme, as explained in the previous section, implements the convolution operation in the analog domain using modified memory-embedded (NVM such as RRAM, or CMOS transistor) pixels. The fundamental components of these pixels are transistors, which are inherently non-linear devices. Consequently, any analog convolution circuit composed of transistor devices will, in general, display non-ideal, non-linear characteristics with respect to the convolution operation. Many existing works, specifically in the domain of memristive analog dot-product operation, ignore the non-idealities arising from non-linear transistor devices [21], [22]. However, to evaluate the performance of the new hardware computation model, the non-linearity introduced by the hardware implementation must be taken into account. To capture these non-linearities, the P2M papers [14]-[16] performed extensive simulations of the novel pixel circuit spanning a wide range of circuit parameters, such as the width of the weight transistors and the photodiode current, based on the commercial 22nm GlobalFoundries technology node. The resulting SPICE data, i.e., the pixel output voltages corresponding to a range of weights and photodiode currents, are modeled using a behavioral curve-fitting function. The generated function is then included in the algorithmic framework, replacing the convolution operation in the first layer of the network. Specifically, within the simulation model of P2M, the algorithm accumulates the output of the curve-fitting function for every pixel in the receptive field. To illustrate, given 3 input channels and a kernel size of 5×5, the resultant receptive field size is 75. This framework models each inner product computed by the in-pixel convolutional layer. Subsequently, this algorithmic structure has been employed to optimize the training of CNNs on datasets such as VWW and BDD100K.
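The following minimal sketch illustrates this curve-fitting flow, with a synthetic saturating response standing in for the SPICE data (the published models in [14]-[16] are fit to simulated pixel output voltages; the tanh form, polynomial degree, and all values here are illustrative assumptions).

```python
import numpy as np

# Sketch of the behavioral curve-fitting flow: fit a function to
# (weight x input) -> pixel output voltage data, then use it in place of
# the ideal first-layer convolution.

rng = np.random.default_rng(1)

# Stand-in for SPICE sweeps: pixel output voltage vs weight*input product.
wi = rng.uniform(-1.0, 1.0, 2000)                 # weight x input samples
v_out = np.tanh(1.5 * wi) + 0.01 * rng.standard_normal(wi.shape)

coeffs = np.polyfit(wi, v_out, deg=5)             # behavioral curve fit
pixel_model = np.poly1d(coeffs)

def p2m_inner_product(weights: np.ndarray, inputs: np.ndarray) -> float:
    """First-layer inner product with the per-pixel non-linearity applied:
    the fitted function's output is accumulated for every pixel in the
    receptive field (75 terms for 3 channels and a 5x5 kernel)."""
    return float(np.sum(pixel_model(weights * inputs)))

w = rng.uniform(-1.0, 1.0, (3, 5, 5))             # 3 input channels, 5x5 kernel
x = rng.uniform(0.0, 1.0, (3, 5, 5))              # receptive-field intensities
print(f"non-ideal accumulated output: {p2m_inner_product(w, x):.3f}")
print(f"ideal linear output:          {np.sum(w * x):.3f}")
```

In the published framework the fitted function is differentiable, so it can be dropped into the first layer of the network and trained through directly; the sketch above only captures the forward accumulation.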
A. Algorithmic Adjustments for Optimizing P2M Circuit Scheme Performance

The P2M circuit scheme enhances parallelism and reduces data bandwidth by simultaneously activating multiple pixels and executing several parallel analog convolution operations for a specific channel in the output feature map. This process is repeated serially for each channel, boosting parallelism, minimizing bandwidth, and improving both energy efficiency and speed. However, expanding the number of channels in the first layer intensifies the serial nature of the convolution, which in turn erodes the aforementioned benefits. This situation introduces a complex circuit-algorithm trade-off. To address it, the backbone CNN needs to be optimized for larger kernel sizes, which increase the number of concurrently activated pixels and thus parallelism, and for non-overlapping strides, which reduce the dimensionality of subsequent CNN layers, thereby lowering the number of operations and the peak memory usage. Additionally, keeping the number of channels small is essential to reduce the serial operations per channel while preserving competitive classification accuracy and accounting for the imperfections of analog convolution. Reducing the number of channels also decreases the number of weight transistors within each pixel, improving both area and power consumption. The resulting smaller output activation map, a consequence of fewer channels and larger kernels with non-overlapping strides, cuts down the energy used to transmit data from the CIS to the downstream CNN processing unit, and also reduces the number of floating-point operations, and hence the energy consumption, in the downstream layers.

IV. IMPROVING STACKED DEVICE TECHNOLOGY

3D integration is the critical technology driver for enabling in-pixel computing without degrading pixel density. With respect to camera technology, 3D integration has already been exploited for various photography use cases; with the progress in in-pixel computing, it is set to enable disruptive advances in intelligence inside camera pixels. In earlier work such as [23], TSVs reduced the area relative to a traditional 2D chip; however, considerable overhead remains because TSVs can only be placed outside the pixel array. Recently, advanced stacking technology has begun to use fine-pitch Copper-Copper (Cu-Cu) connections, which enable pixel-parallel architectures. In the TSV configuration, signal lines are routed to the logic layer at the periphery of the pixel array. In contrast, Cu-Cu connections can be integrated directly beneath the pixels, facilitating a large increase in the number of links, as shown in Fig. 1. This distinction highlights the inherent versatility and adaptability of Cu-Cu connections in enhancing interconnectivity within the stack.

Fig. 4: Misalignment reference pattern between two Cu pads.

Samsung has recently presented a design [3] that uses Cu-Cu bonding, in which each pixel has two small-pitch Cu-to-Cu interconnections for wafer-level stacking, connecting to a pixel-wise ADC and an in-pixel digital memory, with a unit pixel pitch of less than 5 µm. Sony [4] has likewise successfully launched image sensors featuring roughly 3M Cu-Cu connections at a 3 µm pitch, demonstrating that ultra-fine-pitch Cu-Cu connections offer sufficient electrical performance and reliability for industry use. With Cu-Cu bonding, the weights of the P2M circuit can thus be implemented in a separate die heterogeneously integrated with the back-side-illuminated (BI) CMOS image sensor (CIS) die. The weights of each pixel must be stacked vertically and aligned with the pixel pitch to achieve zero area overhead.

A. Challenges and Considerations in Pixel Unit Design with Cu-Cu Bonding

When designing the pixel unit with Cu-Cu bonding in mind, several factors require attention. The kernel size, stride, and number of output channels can constrain the minimum pixel pitch. Additionally, the 3D integration technology imposes its own minimum bond pitch. As noted in [15], the minimum pixel pitch of a P2M-enabled CIS may be limited either by the area of the weight transistors (when the number of channels and the kernel size are large and the stride is small) or by the bond pitch, whichever is larger (a back-of-the-envelope version of this check is sketched at the end of this subsection).

Cu-Cu interconnects, integral to every pixel unit, are susceptible to numerous structural defects arising from the fabrication process, including misalignment, as illustrated in Fig. 4. These defects require thorough testing to ascertain the performance of 3D-ICs. Misalignment, caused by factors such as translation, rotation, and the run-out effect (wafer expansion due to thermal stress), can significantly increase the resistance and capacitance between nodes. However, as detailed in [24], advances in commercial bond alignment tools are minimizing these effects. Moreover, recent image sensor chips from Sony [4] and Samsung [3], which employ pixel-unit Cu-Cu bonding for under-pixel ADCs and DRAMs, suggest that a certain degree of misalignment is tolerable, resulting in acceptable variations in capacitance and resistance. Thus, misalignment is not anticipated to be a bottleneck for in-pixel computing. Given the high density of both pixels and Cu-Cu bonds, thermal modeling is also crucial. However, since metal lines and vias act as heat spreaders, distributing heat in alternate directions, and as discussed in [25], managing thermal effects in 3D ICs appears feasible.
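As a back-of-the-envelope illustration of the pitch constraint noted in [15], the sketch below compares a hypothetical per-pixel weight-stack footprint against the bond pitch. Every numeric value is an illustrative assumption, not a foundry or published figure.

```python
import math

# Pitch-feasibility check: the minimum pixel pitch is set by whichever is
# larger, the footprint of the per-pixel weight devices or the Cu-Cu bond
# pitch. All numbers are illustrative assumptions.

n_channels = 8                 # weights stacked per pixel (one per channel)
weight_area_um2 = 0.35         # assumed area of one weight device + access tx
bond_pitch_um = 3.0            # ultra-fine Cu-Cu pitch reported in [4]

# Pitch needed just to fit the weight devices under one pixel.
weight_pitch_um = math.sqrt(n_channels * weight_area_um2)

min_pixel_pitch_um = max(weight_pitch_um, bond_pitch_um)
limiter = "weight area" if weight_pitch_um > bond_pitch_um else "bond pitch"
print(f"weight-limited pitch: {weight_pitch_um:.2f} um")
print(f"minimum pixel pitch:  {min_pixel_pitch_um:.2f} um (limited by {limiter})")
```

Under these assumed numbers the bond pitch, not the weight area, sets the floor; a deeper first layer (more channels or larger kernels) shifts the limiter toward the weight devices, tying the technology constraint back to the algorithmic choices of Section III.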
V. CONCLUSION

This article has provided an extensive review of the latest advancements in extreme-edge computing, focusing particularly on in-pixel processing and innovations in stacked chip design. The pioneering P2M circuit has demonstrated significant promise, showcasing its potential to decrease bandwidth and enhance processing speed thanks to its ability to perform parallel matrix multiplication within the pixel array. Looking forward, the implementation of Cu-Cu bonding is pivotal for the future deployment of the P2M circuit, as it facilitates the incorporation of pixel-wise weight blocks, thereby enabling computation within the pixel itself. This development underscores the continual evolution and potential of in-pixel computing in addressing the challenges and demands of advanced imaging technologies.

REFERENCES

[1] Y. Chen, H. Dai, and Y. Ding, "Pseudo-stereo for monocular 3D object detection in autonomous driving," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 887-897, 2022.
[2] L. Jiao, R. Zhang, F. Liu, S. Yang, B. Hou, L. Li, and X. Tang, "New generation deep learning for video object detection: A survey," IEEE Transactions on Neural Networks and Learning Systems, 2021.
[3] M.-W. Seo, M. Chu, H.-Y. Jung, S. Kim, J. Song, J. Lee, S.-Y. Kim, J. Lee, S.-J. Byun, D. Bae, M. Kim, G.-D. Lee, H. Shim, C. Um, C. Kim, I.-G. Baek, D. Kwon, H. Kim, H. Choi, J. Go, J. Ahn, J. Lee, C. Moon, K. Lee, and H.-S. Kim, "A 2.6 e-rms low-random-noise, 116.2 mW low-power 2-Mp global shutter CMOS image sensor with pixel-level ADC and in-pixel memory," in 2021 Symposium on VLSI Circuits, pp. 1-2, 2021.
[4] Y. Kagawa, S. Hida, Y. Kobayashi, K. Takahashi, S. Miyanomae, M. Kawamura, H. Kawashima, H. Yamagishi, T. Hirano, K. Tatani, H. Nakayama, K. Ohno, H. Iwamoto, and S. Kadomura, "The scaling of Cu-Cu hybrid bonding for high density 3D chip stacking," in 2019 Electron Devices Technology and Manufacturing Conference (EDTM), pp. 297-299, 2019.
[5] R. Eki, S. Yamada, H. Ozawa, H. Kai, K. Okuike, H. Gowtham, H. Nakanishi, E. Almog, Y. Livne, G. Yuval, et al., "9.6 A 1/2.3-inch 12.3-Mpixel with on-chip 4.97 TOPS/W CNN processor back-illuminated stacked CMOS image sensor," in 2021 IEEE International Solid-State Circuits Conference (ISSCC), vol. 64, pp. 154-156, IEEE, 2021.
[6] F. Zhou and Y. Chai, "Near-sensor and in-sensor computing," Nature Electronics, vol. 3, no. 11, pp. 664-671, 2020.
[7] M. Lefebvre, L. Moreau, R. Dekimpe, and D. Bol, "7.7 A 0.2-to-3.6 TOPS/W programmable convolutional imager SoC with in-sensor current-domain ternary-weighted MAC operations for feature extraction and region-of-interest detection," in 2021 IEEE International Solid-State Circuits Conference (ISSCC), vol. 64, pp. 118-120, IEEE, 2021.
[8] H. Xu, N. Lin, L. Luo, Q. Wei, R. Wang, C. Zhuo, X. Yin, F. Qiao, and H. Yang, "Senputing: An ultra-low-power always-on vision perception chip featuring the deep fusion of sensing and computing," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 69, no. 1, pp. 232-243, 2021.
[9] L. Bose, J. Chen, S. J. Carey, P. Dudek, and W. W. Mayol-Cuevas, "Fully embedding fast convolutional networks on pixel processor arrays," in Computer Vision - ECCV 2020: 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XXIX, pp. 488-503, Springer, 2020.
[10] T. Hsu, Y. Chen, R. Liu, C. Lo, K. Tang, M. Chang, and C. Hsieh, "A 0.5V real-time computational CMOS image sensor with programmable kernel for feature extraction," IEEE Journal of Solid-State Circuits, vol. 56, no. 5, pp. 1588-1596, 2020.
[11] R. Song, K. Huang, Z. Wang, and H. Shen, "A reconfigurable convolution-in-pixel CMOS image sensor architecture," IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 10, pp. 7212-7225, 2022.
[12] G. Datta, S. Kundu, Z. Yin, R. T. Lakkireddy, J. Mathai, A. P. Jacob, P. A. Beerel, and A. R. Jaiswal, "A processing-in-pixel-in-memory paradigm for resource-constrained TinyML applications," Scientific Reports, vol. 12, no. 1, p. 14396, 2022.
[13] G. Datta, S. Kundu, Z. Yin, J. Mathai, Z. Liu, Z. Wang, M. Tian, S. Lu, R. T. Lakkireddy, et al., "P2M-DeTrack: Processing-in-pixel-in-memory for energy-efficient and real-time multi-object detection and tracking," in 2022 IFIP/IEEE 30th International Conference on Very Large Scale Integration (VLSI-SoC), pp. 1-6, IEEE, 2022.
[14] M. A.-A. Kaiser, G. Datta, Z. Wang, A. P. Jacob, P. A. Beerel, and A. R. Jaiswal, "Neuromorphic-P2M: Processing-in-pixel-in-memory paradigm for neuromorphic image sensors," Frontiers in Neuroinformatics, vol. 17, p. 1144301, 2023.
[15] M. A.-A. Kaiser, G. Datta, S. Sarkar, S. Kundu, Z. Yin, M. Garg, A. P. Jacob, P. A. Beerel, and A. R. Jaiswal, "Technology-circuit-algorithm tri-design for processing-in-pixel-in-memory (P2M)," arXiv preprint arXiv:2304.02968, 2023.
[16] G. Datta, S. Kundu, Z. Yin, R. T. Lakkireddy, J. Mathai, A. P. Jacob, P. A. Beerel, and A. R. Jaiswal, "A processing-in-pixel-in-memory paradigm for resource-constrained TinyML applications," Scientific Reports, vol. 12, no. 1, p. 14396, 2022.
[17] H. Nam and B. Han, "Learning multi-domain convolutional neural networks for visual tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4293-4302, 2016.
[18] M. Seo, M. Chu, H. Jung, S. Kim, J. Song, J. Lee, S. Kim, J. Lee, S. Byun, D. Bae, et al., "A 2.6 e-rms low-random-noise, 116.2 mW low-power 2-Mp global shutter CMOS image sensor with pixel-level ADC and in-pixel memory," in 2021 Symposium on VLSI Circuits, pp. 1-2, IEEE, 2021.
[19] Y. Kagawa, N. Fujii, K. Aoyagi, Y. Kobayashi, S. Nishi, N. Todaka, S. Takeshita, J. Taura, H. Takahashi, Y. Nishimura, et al., "Novel stacked CMOS image sensor with advanced Cu2Cu hybrid bonding," in 2016 IEEE International Electron Devices Meeting (IEDM), pp. 8.4.1-8.4.4, IEEE, 2016.
[20] S. Tabrizchi, A. Nezhadi, S. Angizi, and A. Roohi, "AppCiP: Energy-efficient approximate convolution-in-pixel scheme for neural network acceleration," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2023.
[21] S. Jain, A. Sengupta, K. Roy, and A. Raghunathan, "RxNN: A framework for evaluating deep neural networks on resistive crossbars," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 40, pp. 326-338, Feb. 2021.
[22] C. Lammie and M. R. Azghadi, "MemTorch: A simulation framework for deep memristive cross-bar architectures," in 2020 IEEE International Symposium on Circuits and Systems (ISCAS), vol. 1, pp. 1-5, 2020.
[23] S. Sukegawa, T. Umebayashi, T. Nakajima, H. Kawanobe, K. Koseki, I. Hirota, T. Haruta, M. Kasai, K. Fukumoto, T. Wakano, K. Inoue, H. Takahashi, T. Nagano, Y. Nitta, T. Hirayama, and N. Fukushima, "A 1/4-inch 8-Mpixel back-illuminated stacked CMOS image sensor," in 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 484-485, 2013.
[24] I. Jani, D. Lattard, P. Vivet, L. Arnaud, and E. Beigné, "Misalignment analysis and electrical performance of high density 3D-IC interconnects," in 2019 International 3D Systems Integration Conference (3DIC), pp. 1-4, 2019.
[25] P. Leduc, F. de Crecy, M. Fayolle, B. Charlet, T. Enot, M. Zussy, B. Jones, J.-C. Barbe, N. Kernevez, N. Sillon, S. Maitrejean, D. Louis, and G. Passemard, "Challenges for 3D IC integration: Bonding quality and thermal management," in 2007 IEEE International Interconnect Technology Conference, pp. 210-212, 2007.