

Design Considerations for 3D Heterogeneous
Integration Driven Analog Processing-in-Pixel for
Extreme-Edge Intelligence
Zihan Yin
Electrical & Computer Engineering, University of Wisconsin-Madison, Madison, USA
zyin83@wisc.edu

Gourav Datta
Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, USA
gdatta@usc.edu

Md Abdullah-Al Kaiser
Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, USA
mdabdull@usc.edu

Peter Beerel
Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, USA
pabeerel@usc.edu

Ajey Jacob
Information Sciences Institute, University of Southern California, Marina Del Rey, USA
ajey@isi.edu

Akhilesh Jaiswal
Electrical & Computer Engineering, University of Wisconsin-Madison, Madison, USA
akhilesh.jaiswal@wisc.edu
Abstract—Given the progress in computer vision, image sensors are broadening their capabilities, which requires adding data processing close to or within the pixel chips. In this context, in-pixel computing has emerged as a notable paradigm, offering the capability to process data within the pixel unit itself. Interestingly, state-of-the-art in-pixel paradigms rely on high-density 3D heterogeneous integration to establish a per-pixel connection with vertically aligned analog processing units. This article provides a comprehensive review of the most recent developments in in-pixel computing and its relation to 3D heterogeneous integration. It offers an in-depth examination of innovative circuit design, adaptations in algorithms, and the challenges in 3D integration technology for sensor chips, thereby presenting a holistic perspective on the future trajectory of in-pixel computing driven by advances in 3D integration.
Index Terms—3D integration, in-pixel computing, edge computing, CMOS image sensors, Cu–Cu connections.
I. INTRODUCTION
High-resolution, high-frame-rate cameras have millions of fast-response pixels that produce and relay substantial data to back-end processors over energy- and bandwidth-constrained channels. Further, prevalent AI-enabled computer vision (CV) applications today [1], [2] require fast processing of the image data. This spatial division between sensors and processors results in energy and bandwidth bottlenecks in existing CV systems. Addressing this concern, efforts have been made to process and compress data nearer to the sensor to decrease the volume of data that must be transferred. 3D heterogeneous integration has enabled higher levels of computational integration while reducing area overhead in camera technology.
3D heterogeneous integration involves vertically stacking
disparate materials and devices in a three-dimensional config-
uration, allowing for the close coupling of diverse components
such as sensors, processors, and memory modules. For camera
systems, this means a potential leap in performance and
functionality. By integrating the CMOS image sensor (CIS)
with other essential components in a 3D stack, data can be
processed more rapidly, reducing latency. Additionally, this
layered approach can lead to cameras with smaller footprints
and greater energy efficiency. As a result, 3D heterogeneous
integration holds promise for compact cameras with unprecedented computational power, making real-time advanced image processing and analysis more feasible and accessible. In
recent years, advancements in 3D integration technology for
image sensor chips have been significant, as depicted in Fig.
1. The evolution from initially integrating all components
on a single 2D chip, incurring considerable area overhead, progressed to chip stacking, wafer-on-wafer, and chip-on-wafer methodologies. Among these, the Through-Silicon Via
(TSV) emerges as one of the most area-efficient techniques,
offering notably higher interconnect and device density, and
consequently, shorter connection lengths. However, the advent
and application of Copper-Copper (Cu-Cu) bonding have
revealed even greater potential for area conservation. Unlike
TSVs, which require establishing connections on the circuit’s
periphery, Cu-Cu connection enables direct connections under
the pixel circuit itself. This not only reduces the distance
between the pixel and peripheral circuits but also facilitates
each pixel circuit unit in establishing a direct connection to
a circuit block directly beneath it. Noteworthy advancements
have been made by Sony and Samsung, who have leveraged
Cu-Cu connection technology to further expand the boundaries
of 3D integration [3], [4].
Fig. 1: Evolution of 3D integration technology in recent years.
To process and compress data closer to the sensor, typically
three approaches have been developed, distinguished by the
proximity of the processing unit to the data generation site
within the sensor:
1) Near-Sensor Processing: Here, the processor for the data
generated by the sensor is situated near the CMOS image
sensor (CIS) chip, enhancing energy and bandwidth efficiency by reducing data transfer costs between the sensor
and processor [5], [6]. However, in this approach, a
significant distance between the sensor and the processor
still exists as they are on separate chips.
2) In-Sensor Processing: This approach integrates an analog or digital signal processor within the sensor chip’s
periphery circuit, thereby shortening the distance between the processor and the sensor [7]. Although more
efficient than conventional systems, the data transfer
bottlenecks between the sensor and periphery or the
back-end processor remain noteworthy.
3) In-Pixel Processing: This strategy augments the pixel circuit within the sensor to enable basic computations along each row or column of the pixel array, so that early processing occurs before any data is transmitted off the array. This significantly reduces the bandwidth, and the associated energy and latency, of the data sent from the sensor to the processor.
This article is particularly focused on in-pixel processing for
advancing extreme-edge intelligence, a domain where several
pieces of research have recently been documented [8]–[11].
Complex machine learning tasks demand functionalities like multi-bit, multi-channel convolution, batch normalization, and ReLU. A recent study [12] introduced a multi-bit, multi-channel, weight-embedded in-pixel methodology, reporting a significant improvement in energy-delay product (EDP) on the high-resolution Visual Wake Words (VWW) dataset. Another study [13] evaluated in-pixel processing solutions on the large-scale BDD100K self-driving car dataset. Works [14]–[16] have also introduced reconfigurability of weights for diverse applications by using non-volatile memory (NVM), such as RRAM, for the weight implementation. These works show great potential for further development and rely on 3D integration to enable massively parallel processing-in-pixel. In this paper, we focus on this new 3D-integrated analog pixel-computing paradigm for extreme-edge computing.
In-pixel processing requires careful consideration at the
technology, circuit, and algorithm levels. Incorporating computations such as matrix-vector multiplication within the pixel
array could decrease pixel density. However, heterogeneous
3D integration permits vertical alignment of logic or memory
layers with CIS. To maintain algorithmic accuracy, a high
number of channels and a small stride are essential, potentially
increasing the weights per pixel and limiting the CIS resolution. While advanced nodes with lower areas may fulfill these
demands, the large pitch of 3D integration connections might
offset any area benefits. Additionally, algorithm specifics and
hardware setups significantly impact the latency and energy
of the in-pixel approach. Successfully creating a chip for in-pixel computing involves integrating innovative layout and
circuit design with a tailored algorithm. This paper provides a
detailed examination of these three aspects, focusing on circuit
design, algorithm optimization, and novel 3D chip design.
II. EDGE COMPUTING: CIRCUIT DESIGN
For in-pixel computing, the P2M circuit was proposed in [14]–[16], as shown in Fig. 2. In convolutional neural networks (CNNs), the first layer entails multiplying the pixel outputs from the camera with multi-bit weight values [17].
The P2M framework embeds weights within the pixel array
through various 3D integration technologies, such as those
delineated by [18] and [19], enabling vertical stacking of
weights. Additionally, these weights can be allocated to either
the geometry of the CMOS transistor or the resistance state
of specific non-volatile memory (NVM) devices, including but
not limited to, Resistive Random Access Memory (RRAM),
Phase Change Memory (PCM), and Magnetic Random Access
Memory (MRAM) [20].
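As a rough illustration of this weight-to-device mapping, the Python sketch below quantizes signed, trained first-layer weights onto a small menu of discrete device states; the level count and conductance range are assumptions made for illustration, not values from the P2M literature.

```python
import numpy as np

# Hypothetical sketch: mapping signed, trained first-layer weights onto a
# small set of discrete device states (e.g., RRAM conductance levels or a
# menu of transistor widths). Level count and ranges are assumed.
LEVELS = 8                                  # assumed states per device
g_min, g_max = 1e-6, 1e-4                   # assumed conductance range (S)

def map_weights(w):
    mags = np.abs(w) / np.abs(w).max()      # normalize magnitudes to [0, 1]
    codes = np.round(mags * (LEVELS - 1))   # quantize to device levels
    g = g_min + codes / (LEVELS - 1) * (g_max - g_min)
    sign = np.sign(w)                       # sign selects pos./neg. weight bank
    return g, sign

w = np.random.randn(8, 3, 5, 5)             # 8-channel, 3-input, 5x5 kernels
g, sign = map_weights(w)
print(g.min(), g.max(), sign.shape)
```

In a transistor-geometry implementation, the quantized code would select a gate width rather than a conductance level.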
From a circuit perspective, as depicted in Fig. 3(a), the
weight values are encoded through either the NVM resistance states or varied gate widths of transistors. Multiple RRAMs (MWi, where i ranges from 1 to NC) at each pixel circuit's "S" node represent the different kernel weights in the output feature map of the first convolutional layer. Each RRAM-based weight can be individually activated by the series-connected transistor MEN. The convolution operations are carried out by sequentially activating multiple rows and connecting the corresponding bitlines inside the pixel array for each channel, depending on the layer configuration. Further insights into the read operation and the BN and ReLU implementations utilizing the Single-Slope (SS) ADC are documented in [12].
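To make the dataflow concrete, here is a minimal behavioral sketch, under idealized linear assumptions, of a single 3×3 MAC: bitline currents for the positive and negative weight banks are summed, then digitized by an SS-ADC-style up/down count whose floor at zero mimics ReLU. The conductance ranges and ADC step are hypothetical, and the real circuit response is non-linear, as discussed next.

```python
import numpy as np

# Behavioral sketch (not the authors' implementation) of the in-pixel MAC:
# each activated pixel contributes a current set by its stored weight
# (conductance) and its photodiode-controlled voltage; the bitline sums
# these currents, and a single-slope ADC digitizes the result.
rng = np.random.default_rng(0)

G_pos = rng.uniform(0.0, 1.0, 9)   # positive-bank conductances (assumed units)
G_neg = rng.uniform(0.0, 1.0, 9)   # negative-bank conductances
V_pix = rng.uniform(0.0, 1.0, 9)   # pixel voltages from photodiode currents

i_pos = np.dot(G_pos, V_pix)       # summed bitline current, positive weights
i_neg = np.dot(G_neg, V_pix)       # summed bitline current, negative weights

lsb = 0.05                         # assumed ADC step
count = int(i_pos / lsb) - int(i_neg / lsb)   # up-, then down-counting
relu_out = max(count, 0)           # counter floor at zero acts as ReLU
print(relu_out)
```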
Fig. 3(b) offers a comparison between normalized simulated
convolutional outputs and the ideal results, conducted using the
GF 22nm FD-SOI node. A scatter plot, also demonstrated in
the figure, is generated by examining a diverse array of weight (varied gate width) and input light intensity (photodiode current) values for a 3×3×3 kernel size. A discernible deviation between the simulated results (represented by a fitted solid red line) and the ideal results can be identified, attributed to the circuit's inherent non-linearity. Integrating this non-linearity into the training algorithm is essential, calling for the substitution of the ideal convolution function with a non-linear custom convolution function, a topic expanded upon in the algorithm section of this paper.

For the realization of multi-channel convolution operations, associating multiple weights with each pixel is fundamental. Additionally, ensuring high test accuracy requires addressing both positive and negative weight values, calling for specialized circuit techniques to effectively interpret and process these values. Expanding on this, [12] introduces an inventive strategy in which the peripheral Single-Slope (SS) ADC is used to accumulate the MAC results corresponding to both positive and negative weights. The approach is characterized by a distinctive mechanism, 'up-counting' for positive weights and 'down-counting' for negative weights, to determine the final convolution output. Furthermore, the correlated double sampling (CDS) circuit inherent in CISs, in conjunction with the SS ADC, is used to perform the Rectified Linear Unit (ReLU) operation. This ensures that the non-linear activation, a fundamental aspect of CNNs, yields final count values from the ADC that remain either positive or zero (representing the ReLU operation), following the sequential up-and-down counting process facilitated by the CDS operation.

Fig. 2: Overall P2M-enabled CIS system. (a) Back-side-illuminated CMOS image sensor (BI-CIS) die, (b) weight-containing die, (c) pixel circuit, (d) multi-bit, multi-channel positive and negative weight banks (mapped into the transistor's width (CMOS) or the resistance state (NVM)), and (e) SS-ADC performing the ReLU and part of the BN operations.

Fig. 3: RRAM-based weight circuit techniques and simulated output for P2M-enabled CIS. (a) Weight-embedded pixel circuit, and (b) a scatter plot comparing the simulated convolutional results (normalized VOUT) with the ideal convolutional results (normalized weight×input, W×I).

III. PROCESSING-IN-PIXEL: ALGORITHM
From an algorithmic perspective, a Convolutional Neural
Network (CNN) initiates with a linear convolution operation,
succeeded by batch normalization (BN) and non-linear (ReLU)
activation. To integrate the in-pixel computing paradigm, in which the downstream processing unit receives the accumulated outputs of the in-pixel computation rather than the raw image data, this procedure can be divided into two primary segments:
1) Simulation of hardware computation through a mathematical model.
2) Integration of the remaining CNN algorithm with the
newly acquired input data.
The P2M circuit scheme, as explained in the previous section, implements the convolution operation in the analog domain using modified memory-embedded pixels (using NVM such as RRAM, or CMOS transistors). The fundamental components of
these pixels are transistors, which are inherently non-linear
devices. Consequently, generally speaking, any analog convolution circuit composed of transistor devices will display non-ideal, non-linear characteristics with respect to the convolution operation. Many existing works, specifically in the domain of memristive analog dot-product operations, ignore the non-idealities arising from non-linear transistor devices [21], [22]. However, to evaluate the performance of the new hardware computation model, the non-linearity introduced by the hardware implementation needs to be taken into account.
To capture these non-linearities, the P2M papers [14]–[16] performed extensive simulations of the novel pixel circuit spanning a wide range of circuit parameters, such as the width of the weight transistors and the photodiode current, based on the commercial 22nm GlobalFoundries transistor technology node. The resulting SPICE outputs, i.e., the pixel output voltages corresponding to a range of weights and photodiode currents, are modeled using a behavioral curve-fitting function. The generated function is then included in the algorithmic framework, replacing the convolution operation in the first layer of the network. Specifically, within the simulation model of P2M, the algorithm accumulates the output of the curve-fitting function designated for every pixel in the receptive field.
To illustrate, given 3 input channels and a kernel size of 5×5,
the resultant receptive field size is 75. This framework models
each inner product accumulated by the in-pixel convolutional
layer. Subsequently, this algorithmic structure has been employed to optimize the training of CNNs for datasets such as
VWW and BDD100K.
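A minimal PyTorch sketch of such a custom first layer is shown below; the tanh-based `pixel_response` is a hypothetical placeholder for the actual fitted curve-fitting function, and the channel counts, kernel size, and input resolution are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pixel_response(w, i_ph):
    # Hypothetical stand-in for the fitted behavioral function: a smooth,
    # saturating non-linearity applied to each weight-input pair.
    return torch.tanh(w * i_ph)

class P2MConv2d(nn.Module):
    """First-layer convolution in which every weight-input product is passed
    through the fitted pixel response before accumulation over the receptive
    field (e.g., 3 channels x 5x5 kernel = 75 terms). Assumes a square input
    whose side is compatible with the kernel and stride, and no padding."""
    def __init__(self, in_ch, out_ch, k, stride):
        super().__init__()
        self.weight = nn.Parameter(0.1 * torch.randn(out_ch, in_ch, k, k))
        self.k, self.stride = k, stride

    def forward(self, x):
        # (B, C*k*k, L): one column per receptive field
        patches = F.unfold(x, self.k, stride=self.stride)
        w = self.weight.view(self.weight.size(0), -1)          # (O, C*k*k)
        # Non-linear per-term response, then sum over the receptive field
        out = pixel_response(w[None, :, :, None],
                             patches[:, None, :, :]).sum(dim=2)  # (B, O, L)
        side = (x.shape[-1] - self.k) // self.stride + 1
        return out.view(x.shape[0], -1, side, side)

# Example: 3-channel input, 5x5 kernel, non-overlapping stride
layer = P2MConv2d(3, 8, k=5, stride=5)
y = layer(torch.rand(1, 3, 560, 560))        # -> (1, 8, 112, 112)
```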
A. Algorithmic Adjustments for Optimizing P2M Circuit Scheme Performance
The P2M circuit scheme enhances parallelism and reduces data bandwidth by simultaneously activating multiple pixels and executing several parallel analog convolution operations for a specific channel in the output feature map. This process is serially repeated for each channel, boosting parallelism, minimizing bandwidth, and improving both energy efficiency and speed. However, expanding the number of channels in the first layer intensifies the serial nature of the convolution, which in turn erodes the aforementioned benefits. This situation introduces a complex circuit-algorithm trade-off. To address this, the backbone CNN needs to be optimized for larger kernel sizes, which increase the number of concurrently activated pixels and thereby aid parallelism, and for non-overlapping strides, which diminish the dimensionality of subsequent CNN layers, reducing the number of operations and the peak memory usage. Additionally, maintaining a smaller number of channels is essential to lessen the serial operations per channel, while preserving competitive classification accuracy and accounting for the imperfections linked with analog convolution operations.
Reducing the number of channels also decreases the number of
weight transistors within each pixel, leading to improvements
in both area and power consumption. The subsequent smaller
output activation map, resulting from the decreased number of
channels and larger kernel sizes with non-overlapping strides,
cuts down the energy used in transmitting data from the CIS
to the downstream CNN processing unit. This also reduces the
number of floating-point operations in the downstream layers,
consequently lowering energy consumption.
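A back-of-the-envelope calculation illustrates the bandwidth effect of these choices; the frame resolution, kernel size, stride, and channel count below are assumed for illustration and are not taken from the cited works.

```python
# Illustrative (hypothetical) numbers: a 560x560 RGB frame processed by the
# in-pixel first layer with an 8-channel, 5x5 kernel and non-overlapping
# (stride-5) convolution.
H = W = 560            # input resolution (assumed)
k = stride = 5         # large kernel, non-overlapping stride
c_in, c_out = 3, 8     # input/output channels (assumed)

raw_values = H * W * c_in                  # values sent off-chip by a plain CIS
out_side = (H - k) // stride + 1           # 112
act_values = out_side * out_side * c_out   # values sent by the P2M-enabled CIS

print(raw_values, act_values, raw_values / act_values)
# 940800 100352 9.375 -> roughly 9.4x fewer values transmitted
```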
IV. IMPROVING STACKED DEVICE TECHNOLOGY
3D integration is the critical technology driver for enabling in-pixel computing without degrading pixel density. 3D integration in camera technology has already been exploited for various photography use cases. With the progress in in-pixel computing, 3D integration is set to enable disruptive advances in bringing intelligence inside camera pixels. In earlier work such as [23], TSVs reduced the area overhead relative to a traditional 2D chip; however, a substantial overhead remains because TSVs can only be placed at the periphery of the pixel array. Recently, advanced stacking technology has begun to use fine-pitch Copper-Copper (Cu–Cu) connections, which enable pixel-parallel architectures. In the TSV configuration, signal lines are
routed to the logic layer situated at the periphery of the pixel
array. Contrarily, Cu–Cu connections can be integrated directly
beneath the pixels, facilitating an increment in the number of
linkages, as shown in Fig. 1. This distinction highlights the
inherent versatility and adaptability of Cu–Cu connections in enhancing interconnectivity within the configuration. Samsung has recently presented a design [3] using Cu-Cu bonding in which each pixel in the chip has two small-pitch Cu-to-Cu interconnections for wafer-level stacking, connecting it to a pixel-wise ADC and an in-pixel digital memory, with a unit pixel pitch of less than 5 µm. Sony [4] has likewise successfully launched an image sensor with 3 µm pitch, 3M Cu-Cu bonded chips, demonstrating that industrial ultra-fine-pitch Cu-Cu connections offer sufficient electrical properties and reliability. With Cu-Cu bonding, the weights of the P2M circuit are implemented in a separate die heterogeneously integrated with the back-side-illuminated (BI) CMOS image sensor (CIS) die. The weights of each pixel need to be stacked vertically and aligned with the pixel pitch to achieve zero area overhead.

Fig. 4: Misalignment reference pattern between two Cu pads.
A. Challenges and Considerations in Pixel Unit Design with
Cu-Cu Bonding
When designing the pixel unit with Cu-Cu bonding in mind, several factors require attention. The kernel size, stride, and number of output channels can constrain the minimum pixel pitch. Additionally, 3D integration technology imposes its own minimum bond pitch requirements. As noted in [15], the minimum pixel pitch of a P2M-enabled CIS may be limited either by the area of the weight transistors (when the number of channels and kernel size are large, and the stride is small) or by the bond pitch, whichever is larger.
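This constraint can be expressed as a one-line estimate, sketched below with entirely hypothetical numbers for the per-device area, weight count, and bond pitch.

```python
import math

# Illustrative pixel-pitch estimate under stated assumptions: N_w weight
# devices per pixel of area A_w each, and a given Cu-Cu bond pitch.
# All numbers below are hypothetical placeholders.
N_w = 2 * 8            # positive + negative banks for 8 channels (assumed)
A_w = 0.05             # um^2 per weight device at an advanced node (assumed)
bond_pitch = 1.0       # um, fine-pitch hybrid bonding (assumed)

transistor_limited = math.sqrt(N_w * A_w)   # side of the square weight block
pixel_pitch = max(transistor_limited, bond_pitch)
print(f"weight-area-limited pitch: {transistor_limited:.2f} um; "
      f"effective minimum pitch: {pixel_pitch:.2f} um")
```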
Cu-Cu interconnects, integral to every pixel unit, are susceptible to numerous structural defects due to the fabrication process, including misalignment, as illustrated in Fig.
4. These defects require thorough testing to ascertain the
performance of 3D-ICs. Misalignments, caused by factors such
as translation, rotation, and the run-out effect (related to wafer
expansion due to thermal stress), can significantly increase resistance and capacitance between nodes. However, as detailed
in [24], advancements in commercial bond alignment tools
are minimizing these effects. Moreover, recent developments
in image sensor chips by Sony [4] and Samsung [3], which
employ pixel-unit Cu-Cu bonding for under-pixel-unit ADC
and DRAMs, suggest that a certain degree of misalignment
is tolerable, resulting in acceptable variations in capacitance
and resistance. Thus, misalignment is not anticipated to be a
bottleneck in achieving in-pixel computing in this regard.
Given the high density of both pixels and Cu-Cu bonds,
thermal modeling is also crucial. However, as metal lines
and vias serve as heat spreaders, distributing heat in alternate
directions, and as discussed in [25], managing thermal effects
in 3D ICs appears to be feasible.
V. CONCLUSION
This article has provided an extensive review of the latest
advancements in extreme edge computing, focusing particularly on in-pixel processing and innovations in stacking chip
design. The pioneering P2M circuit has demonstrated significant promise, showcasing its potential to decrease bandwidth
and enhance processing speed, attributed to its capability to
perform parallel matrix multiplication within the pixel array.
Looking forward, the implementation of Cu–Cu bonding is pivotal for the future deployment of the P2M circuit, as it facilitates the incorporation of pixel-wise weight blocks, thereby enabling computation within the pixel itself. This development underscores the continual evolution and potential of
in-pixel computing in addressing the challenges and demands
of advanced imaging technologies.
REFERENCES
[1] Y. Chen, H. Dai, and Y. Ding, “Pseudo-stereo for monocular 3d object
detection in autonomous driving,” in Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, pp. 887–897,
2022.
[2] L. Jiao, R. Zhang, F. Liu, S. Yang, B. Hou, L. Li, and X. Tang, “New
generation deep learning for video object detection: A survey,” IEEE
Transactions on Neural Networks and Learning Systems, 2021.
[3] M.-W. Seo, M. Chu, H.-Y. Jung, S. Kim, J. Song, J. Lee, S.-Y. Kim,
J. Lee, S.-J. Byun, D. Bae, M. Kim, G.-D. Lee, H. Shim, C. Um, C. Kim,
I.-G. Baek, D. Kwon, H. Kim, H. Choi, J. Go, J. Ahn, J. Lee, C. Moon,
K. Lee, and H.-S. Kim, “A 2.6 e-rms low-random-noise, 116.2 mw low-power 2-mp global shutter cmos image sensor with pixel-level adc and
in-pixel memory,” in 2021 Symposium on VLSI Circuits, pp. 1–2, 2021.
[4] Y. Kagawa, S. Hida, Y. Kobayashi, K. Takahashi, S. Miyanomae,
M. Kawamura, H. Kawashima, H. Yamagishi, T. Hirano, K. Tatani,
H. Nakayama, K. Ohno, H. Iwamoto, and S. Kadomura, “The scaling
of cu-cu hybrid bonding for high density 3d chip stacking,” in 2019
Electron Devices Technology and Manufacturing Conference (EDTM),
pp. 297–299, 2019.
[5] R. Eki, S. Yamada, H. Ozawa, H. Kai, K. Okuike, H. Gowtham,
H. Nakanishi, E. Almog, Y. Livne, G. Yuval, et al., “9.6 a 1/2.3 inch
12.3 mpixel with on-chip 4.97 tops/w cnn processor back-illuminated
stacked cmos image sensor,” in 2021 IEEE International Solid-State
Circuits Conference (ISSCC), vol. 64, pp. 154–156, IEEE, 2021.
[6] F. Zhou and Y. Chai, “Near-sensor and in-sensor computing,” Nature
Electronics, vol. 3, no. 11, pp. 664–671, 2020.
[7] M. Lefebvre, L. Moreau, R. Dekimpe, and D. Bol, “7.7 a 0.2-to-3.6
tops/w programmable convolutional imager soc with in-sensor currentdomain ternary-weighted mac operations for feature extraction and
region-of-interest detection,” in 2021 IEEE International Solid-State
Circuits Conference (ISSCC), vol. 64, pp. 118–120, IEEE, 2021.
[8] H. Xu, N. Lin, L. Luo, Q. Wei, R. Wang, C. Zhuo, X. Yin, F. Qiao,
and H. Yang, “Senputing: An ultra-low-power always-on vision perception chip featuring the deep fusion of sensing and computing,” IEEE
Transactions on Circuits and Systems I: Regular Papers, vol. 69, no. 1,
pp. 232–243, 2021.
[9] L. Bose, J. Chen, S. J. Carey, P. Dudek, and W. W. Mayol-Cuevas, “Fully
embedding fast convolutional networks on pixel processor arrays,” in
Computer Vision–ECCV 2020: 16th European Conference, Glasgow,
UK, August 23–28, 2020, Proceedings, Part XXIX 16, pp. 488–503,
Springer, 2020.
[10] T. Hsu, Y. Chen, R. Liu, C. Lo, K. Tang, M. Chang, and C. Hsieh, “A 0.5v real-time computational cmos image sensor with programmable kernel
for feature extraction,” IEEE Journal of Solid-State Circuits, vol. 56,
no. 5, pp. 1588–1596, 2020.
[11] R. Song, K. Huang, Z. Wang, and H. Shen, “A reconfigurable
convolution-in-pixel cmos image sensor architecture,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 10,
pp. 7212–7225, 2022.
[12] G. Datta, S. Kundu, Z. Yin, R. T. Lakkireddy, J. Mathai, A. P. Jacob,
P. A. Beerel, and A. R. Jaiswal, “A processing-in-pixel-in-memory paradigm
for resource-constrained tinyml applications,” Scientific Reports, vol. 12,
no. 1, p. 14396, 2022.
[13] G. Datta, S. Kundu, Z. Yin, J. Mathai, Z. Liu, Z. Wang, M. Tian, S. Lu,
R. Lakkireddy, et al., “P2M-DeTrack: processing-in-pixel-in-memory
for energy-efficient and real-time multi-object detection and tracking,”
in 2022 IFIP/IEEE 30th International Conference on Very Large Scale
Integration (VLSI-SoC), pp. 1–6, IEEE, 2022.
[14] M. A.-A. Kaiser, G. Datta, Z. Wang, A. P. Jacob, P. A. Beerel, and A. R.
Jaiswal, “Neuromorphic-p2m: processing-in-pixel-in-memory paradigm
for neuromorphic image sensors,” Frontiers in Neuroinformatics, vol. 17,
p. 1144301, 2023.
[15] M. A.-A. Kaiser, G. Datta, S. Sarkar, S. Kundu, Z. Yin, M. Garg,
A. P. Jacob, P. A. Beerel, and A. R. Jaiswal, “Technology-circuit-algorithm tri-design for processing-in-pixel-in-memory (p2m),” arXiv
preprint arXiv:2304.02968, 2023.
[16] G. Datta, S. Kundu, Z. Yin, R. T. Lakkireddy, J. Mathai, A. P. Jacob,
P. A. Beerel, and A. R. Jaiswal, “A processing-in-pixel-in-memory
paradigm for resource-constrained tinyml applications,” Scientific Reports, vol. 12, no. 1, p. 14396, 2022.
[17] H. Nam and B. Han, “Learning multi-domain convolutional neural
networks for visual tracking,” in Proceedings of the IEEE conference
on computer vision and pattern recognition, pp. 4293–4302, 2016.
[18] M. Seo, M. Chu, H. Jung, S. Kim, J. Song, J. Lee, S. Kim, J. Lee,
S. Byun, D. Bae, et al., “A 2.6 e-rms low-random-noise, 116.2 mw low-power 2-mp global shutter cmos image sensor with pixel-level adc and
in-pixel memory,” in 2021 Symposium on VLSI Technology, pp. 1–2,
IEEE, 2021.
[19] Y. Kagawa, N. Fujii, K. Aoyagi, Y. Kobayashi, S. Nishi, N. Todaka,
S. Takeshita, J. Taura, H. Takahashi, Y. Nishimura, et al., “Novel stacked
cmos image sensor with advanced cu2cu hybrid bonding,” in 2016 IEEE
International Electron Devices Meeting (IEDM), pp. 8–4, IEEE, 2016.
[20] S. Tabrizchi, A. Nezhadi, S. Angizi, and A. Roohi, “AppCiP: Energy-efficient approximate convolution-in-pixel scheme for neural network
acceleration,” IEEE Journal on Emerging and Selected Topics in Circuits
and Systems, 2023.
[21] S. Jain, A. Sengupta, K. Roy, and A. Raghunathan, “RxNN: A framework for evaluating deep neural networks on resistive crossbars,” Trans.
Comp.-Aided Des. Integ. Cir. Sys., vol. 40, p. 326–338, feb 2021.
[22] C. Lammie and M. R. Azghadi, “Memtorch: A simulation framework
for deep memristive cross-bar architectures,” in 2020 IEEE International
Symposium on Circuits and Systems (ISCAS), vol. 1, pp. 1–5, 2020.
[23] S. Sukegawa, T. Umebayashi, T. Nakajima, H. Kawanobe, K. Koseki,
I. Hirota, T. Haruta, M. Kasai, K. Fukumoto, T. Wakano, K. Inoue,
H. Takahashi, T. Nagano, Y. Nitta, T. Hirayama, and N. Fukushima, “A
1/4-inch 8mpixel back-illuminated stacked cmos image sensor,” in 2013
IEEE International Solid-State Circuits Conference Digest of Technical
Papers, pp. 484–485, 2013.
[24] I. Jani, D. Lattard, P. Vivet, L. Arnaud, and E. Beigné, “Misalignment
analysis and electrical performance of high density 3d-ic interconnects,”
in 2019 International 3D Systems Integration Conference (3DIC), pp. 1–
4, 2019.
[25] P. Leduc, F. de Crecy, M. Fayolle, B. Charlet, T. Enot, M. Zussy,
B. Jones, J.-C. Barbe, N. Kernevez, N. Sillon, S. Maitrejean, D. Louis,
and G. Passemard, “Challenges for 3d ic integration: bonding quality
and thermal management,” in 2007 IEEE International Interconnect
Technology Conference, pp. 210–212, 2007.