
Machine Learning Processing Unit Effectiveness

__________________________________
Computer Science
Extended Essay
Higher Level
Candidate number: khq293
Word Count: 3999
Title:
Assessing the effectiveness of processing units used in machine learning.
RESEARCH QUESTION:
To what extent does the use of the Apple M1 chip's Neural Engine, GPUs, and CPUs as processing units alter the
efficiency and effectiveness of running machine learning models and training neural networks?
__________________________________
Acknowledgements:
I would like to thank my EE supervisor, Ms. Natalia for the meritorious, professional, and academic
support she provided during the process of writing this research paper.
I would like to thank Mr. Mateusz, for his equally nerdy excitement, support, and engagement in the
model's training process.
I would like to thank my brother Maksymilian and classmate Mikołaj for enabling me to use their
computing devices in order to complete the experiments.
I would also like to thank Mr. Bartłomiej (Cortland salon manager) for providing me with articles and
insights regarding the Apple M1 Pro SoC (neural processing unit).
Table of contents:
1. Introduction
2. Background information
2.1. Biological Neural Networks
2.2. Artificial Neural Networks
2.3. Mathematics as the backend of neural networks
2.4. CPUs
2.5. GPUs
2.6. Apple's SoC - 16 core Neural Engine
2.7. Software used in Machine Learning
2.8. Terminology
3. Methodology
3.1. Computing devices used in benchmarks
3.2. Inference benchmarking - OsiriX lite - AI segmentation
3.3. Benchmarking - neural networks training
4. Experiment Results
5. Study Limitations and research opportunities
6. Conclusion
7. Bibliography
8. Appendix
1. Introduction
AI is evolving at an astounding rate: producing creative images, strengthening our privacy through
technologies such as facial or iris recognition, and augmenting game frames. But what components perform
all the logic operations that free us from rigid "if-else" code structures? Which processing units produce
certain results faster or with greater accuracy? I have always wondered how it is possible to type the
object or person I am looking for into the "images" app and have all the results from a large gallery of
5,000 pictures appear within milliseconds. This research paper is focused on exploring the variables and
factors affecting the efficiency and effectiveness of multiple processing units in the training and
utilization of neural network models. By running artificial-intelligence-accelerated segmentation of
computed tomography images on Apple MacBook computers from various Intel, ARM, and AMD-based generations,
experiments were conducted that showed the definitive superiority of the M1 Pro SoC in running pretrained
models. However, to evaluate the processing units' ability to train the models, three different "PyTorch"
machine learning projects concerned with handwritten-digit recognition, image processing, and
classification were used in order to obtain numerical, time, efficiency, and effectiveness-related
outcomes. By combining the results of both experiments, I was able to reach a conclusion that not only
identified the best processing unit for those two functions but also explained how specific computational
components, such as the number of FPUs, cache memory, CUDA cores, or software processing schemes, and
their metrics determine one unit's performance superiority over another.
2. Background Information
2.1. Biological Neural Networks
(Figure: ANN vs Biological NN 1)
To understand the artificial neural network, it is necessary to understand how the biological brain
learns and processes information. In the figure above, we can see both biological and artificial
neurons. Dendrites are inputs, nuclei are nodes, and axons are the output, while the cell body relates to
all the hidden layers. However, unlike an artificial neural network, the brain is designed to be
multi-tasking and capable of performing a variety of tasks. Additionally, it becomes confused when several
things are learned in a short span of time, making them difficult to classify and place in long-term
memory; in the brain, that consolidation is achieved through intense emotions or habitual repetition.
1 (Frumusanu, Apple announces M1 Pro & M1 Max: Giant new arm socs with all-out performance 2021)
2.2. Artificial Neural Networks
The concept of artificial neural networks (ANNs) is part of the broader umbrella term "AI," or artificial
intelligence, which is a field of computer science researching and developing ways in which computer
algorithms can automate tasks that would otherwise be performed by humans. AI is commonly
subdivided into three nested areas: machine learning, deep learning, and artificial neural networks.
(Image 2) 2
Machine learning (ML) is a type of AI that uses past data about a solution to a problem to create an
algorithm-based model that solves the problem more accurately. Deep learning is a subset of machine
learning that employs multiple artificial neural network architectures and layers to process past data
and produce the most accurate results. It is commonly used when automation is difficult and more
sophisticated algorithms are required. However, there is no set number of layers or algorithms that
distinguishes regular machine learning from deep learning; the distinction is rather conventional.
2 Own Graphics
ANNs, which belong to the lowest subcategory, resemble biological neural networks: the nodes, or
neurons, correspond to information in the form of inputs, weights, biases, and activation functions
(such as "sigmoid" and "tanh") that act as solution-modelling functions for the whole network.
Inputs refer to data factors that together formulate a certain outcome; for example, a certain
placement of lit pixels on the screen results in a larger picture, which is an outcome. Weights describe
the "importance" of that placement before the changed input value is passed to the activation
function; such an activation function corresponds to the chemical reactions in the brain that determine
whether that placement is "important enough" for the configuration to be activated and the picture
recognized as a certain object. The whole process of checking whether the function has identified the
configuration of pixels as the correct corresponding object, together with the alteration of the
prediction process, is called training.
From birth, the brain receives inputs as information from the outside world; however, unlike machines,
where everything operates on the basis of 1s and 0s, it is impossible to program the brain to pick only
the relevant information and ignore all other data. Training of a neural network is based on thousands of
repetitions of that process, ultimately altering the weights, biases, and activation function in
such a way that the network becomes more accurate with every repetition, or epoch, on the dataset. The
more inputs, data, and layers, the longer it takes to train the neural network. The stage after the
training of the ANN is referred to as "inference," which means running the trained model on completely
new data in order to obtain valid output.
2.3. Mathematics as the backend of neural networks
In every case, terminology such as biases, weights, activation functions, or models has a
mathematical foundation. As the very first layer of the neural network is the input layer, it is important
to note that every input variable acts as an input node, which is directly connected to the following
layers. Every node (except the first nodes, as they are input nodes) has a weighted
connection with the next layer, acting as an important variable in determining the output. The larger
the value that the weight carries, the bigger the significance or multiplier. Similarly, biases serve as
the error-correction variable in the neural network equation, which takes the form:
$y = f\left(\sum_{i} w_i x_i + b\right)$ 3
where $x_i$ are the previous layer's node values, $w_i$ the connection weights, $b$ the bias, and $f$ the activation function.
It works in such a way that each node of the next layer sums all the nodes from the previous
layer multiplied by the weights of their connections and, at the end, adds the bias variable, which
shifts the function depending on its value. The result then has to go through the activation function that
the developer decided is the best fit for the problem in order to calculate the value of the next node.
Activation functions divide into two major groups: linear and non-linear functions.
3 (Neural Network)
(Graphs drawn on a Casio FG-CG50: the linear function, and the non-linear functions sigmoid, tanh (hyperbolic tangent), and ReLU 4)
Such an activation function will not return an output to the next hidden layer unless
the inputs, altered by the weights and biases, fall within the range of the activation function's values.
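The three non-linear functions graphed above can be written in a few lines of Python; this is a sketch of the standard textbook definitions, not code taken from the benchmark projects:

import math

def sigmoid(z):
    # Maps any real input into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    # Hyperbolic tangent: maps any real input into (-1, 1)
    return math.tanh(z)

def relu(z):
    # Rectified Linear Unit: passes positives through, zeroes out negatives
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z))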
Matrix multiplications 5 are the programming part of the calculation that makes running neural network
training and models particularly demanding. Matrices are rectangular multidimensional arrays that
can store information such as numbers and characters; in programming, they multiply in
such a way that the number of columns in the first matrix must equal the number of rows in the second
matrix, and they output a third matrix with the number of rows of the first matrix and the number of
columns of the second.
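This shape rule can be checked with a small NumPy sketch (illustrative values only): a 2x3 matrix times a 3x4 matrix yields a 2x4 matrix, while incompatible shapes raise an error.

import numpy as np

A = np.arange(6).reshape(2, 3)   # 2 rows, 3 columns
B = np.arange(12).reshape(3, 4)  # 3 rows, 4 columns

# Valid: columns of A (3) == rows of B (3); the product is 2x4
C = A @ B
print(C.shape)  # (2, 4)

# Invalid: columns of B (4) != rows of B (3), so NumPy raises ValueError
try:
    B @ B
except ValueError as e:
    print(e)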
4 (ReLu activation Function)
5 (dishashree26, Activation functions: Fundamentals of deep learning 2022)
2.4. CPUs 6
The CPU 7 (Central Processing Unit) is the computer's brain; it follows the instruction cycle to
perform large numbers of uniform, simple computations. The most common instruction cycle is
fetch-decode-execute-store. For instance, when we turn on the computer, depending on the
configuration, instructions for the CPU come from the ROM (read-only memory, used for the BIOS, the
computer's essential firmware). They are later passed forward, either straight to the random access
memory (RAM) or to the CPU's superfast L1 cache memory, to be passed to RAM anyway. After
the instructions are provided to primary memory (the cache and RAM), the Control Unit (CU)
receives the instructions and stores them in the CIR (Current Instruction Register) and later the SCR
(Sequence Control Register) to coordinate the order of the calculations and logic operations performed in
the Arithmetic Logic Unit (ALU). The instructions and data are passed along paths called the address and
data buses from the CU to the ALU, which performs the calculations and outputs the data back over the
buses to be stored in the immediate access store (the L1 CPU cache) and passed forward to RAM so that
another set of instructions can be fetched.
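A toy Python sketch of the fetch-decode-execute-store cycle described above (the instruction set and memory layout are invented purely for illustration):

# Toy machine: memory holds (opcode, operand) pairs; ACC is an accumulator register
memory = [("LOAD", 5), ("ADD", 3), ("STORE", 0), ("HALT", None)]
data = [0] * 8          # data memory
acc, pc = 0, 0          # accumulator and program counter

while True:
    opcode, operand = memory[pc]   # fetch the instruction at the program counter
    pc += 1                        # advance the sequence control
    if opcode == "LOAD":           # decode + execute
        acc = operand
    elif opcode == "ADD":
        acc += operand
    elif opcode == "STORE":        # store the result back to data memory
        data[operand] = acc
    elif opcode == "HALT":
        break

print(acc, data)  # 8 [8, 0, 0, 0, 0, 0, 0, 0]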
6 (Does bios need to be loaded into main memory to be executed by CPU? 1966)
7 (Syed, AI vs. ML vs. DL (vs. NN) 2020)
(Diagram: own graphics 8)
Depending on the computer we need and use, CPUs may use the Complex
Instruction Set Computer (CISC) architecture, which is common for x86 Intel and AMD CPUs, or the
Reduced Instruction Set Computer (RISC) architecture, which is more commonly used with ARM
technology. ARM focuses on developing chipsets meant for one specific OS or task, such as the FSD
(Full Self-Driving) chip made for Tesla, a reduced instruction set computer meant to run an ML
model faster and with multiple calculations in parallel. Two major factors describing the computing
power of a CPU are its clock speed and its cache memory size. Clock speed refers to how many of the
earlier-mentioned cycles are executed per second, usually expressed in GHz (billions of cycles per
second).
8 Own Graphics
2.5. GPUs 9
A graphics processing unit (GPU) is a different kind of processing unit, used for tasks
requiring an immense number of operations that need to be performed and delivered very quickly. As
its name suggests, its initial and major purpose is graphics processing, which needs to be continuous
and performed in real time. In contrast to CPUs, where we can work with up to 8-16 cores in
commercial-level and 64 cores in industrial-level CPUs, GPUs consist of hundreds or thousands of Compute
Unified Device Architecture (CUDA) cores, which are also often referred to as "floating point" cores.
GPUs are characterized by higher memory bandwidth than CPUs, which describes the maximum
amount of data transferred in a given period of time. GPU cores are less sophisticated and hence smaller,
which enables them to be implemented in larger quantities and to process more calculations in parallel.
This means that GPUs are used for machine learning most often because of the hundreds of available cores
that can either work together with their own level-1 cache memory or in groups of 32, called
warps, which enable the GPU to process even larger parallel computations and allocate even more
memory to those tasks. Such GPU architecture enables it to perform the calculations nearly
exclusively used in machine learning, such as matrix multiplication.
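As a hedged sketch of why this matters for machine learning, the same matrix multiplication from earlier can be dispatched to a CUDA-capable GPU in PyTorch with one device argument; the timings and availability depend on the hardware (Apple silicon would use the "mps" backend instead):

import torch

# Fall back to the CPU when no CUDA device is present
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Two large random matrices; matmul is exactly the workload GPUs parallelize well
a = torch.randn(2048, 2048, device=device)
b = torch.randn(2048, 2048, device=device)

c = a @ b                 # thousands of multiply-accumulates run in parallel
print(c.shape, c.device)  # torch.Size([2048, 2048]) cuda:0 (or cpu)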
9 (Typical Nvidia GPU architecture. the GPU is comprised of a set of ...)
(Figure: GPU structure 10)
An example of a GPU is shown above, where each of the two major units has its own
GDDR5 RAM, which is controlled by the memory controller and joined with level-2
high-speed cache memory. The GPU grid, at the highest level of the hierarchy, is built of
multiprocessor blocks (clusters), which consist of streaming multiprocessors characterized by
"streams" of data processed with the aim of real-time computation. These multiprocessors in
turn consist of processors with their own higher-speed (primary) cache, consisting (in this example) of
64 cores each.
10 (Exploring the GPU architecture: Vmware)
2.6. Apple's SoC - 16 Core Neural Engine 11
(Image of the M1 Pro SoC 12)
A System on a Chip (SoC) is an integrated circuit that houses all of the computer's major
processing units, including the GPU, CPU, RAM, cache memory, and, in the case of the Apple M1
SoC chipsets, even a 16-core 13 neural network accelerator engine 14. The fact that it is all put together
on a single silicon die reduces the latency caused by the distance between the components placed on the
standard architecture of a motherboard. With that, Apple gives us the option to choose how
much memory we want on the SoC; this unified memory acts as an astoundingly fast memory for the
integrated central, graphics, and neural engine units, and can be accessed more quickly, is more power
efficient, and supports more calculations. This particular SoC is built on the Advanced RISC Machines
(ARM) architecture, which, combined with Apple's operating system (OS), written in Apple's own
higher-level "Swift" programming language along with C and C++ (the first-layer languages above
assembly), results in less power-hungry computations.
11 (admin_mirabilis, Apple Neural Processor 2021)
12 (Frumusanu, Apple announces M1 Pro & M1 Max: Giant new arm socs with all-out performance 2021)
13 (Hollance, Hollance/neural-engine: Everything we actually know about the Apple Neural Engine (ANE))
14 (Apple M1 chip. everything you wanted to know about it 2021)
2.7. Software used in Machine Learning
Keras is a high-level Python library, built on top of the Theano and TensorFlow libraries, used for deep
learning and neural networks. It is more user-friendly for programmers because it does not involve
low-level, detailed, and more complex code structures.
(Source: Own Graphics)
It allows the creation of so-called Keras models, which may consist of multiple neural layers,
activation functions, optimizers, and other neural-network model features such as initialization
and regularization schemes. Regularization is a type of regression that looks for relationships between
dependent and independent variables and makes sure that no overfitting occurs; overfitting is
the issue of a dataset-specific trained ML model that would struggle to perform well during the
inference stage. Theano is a numerical library mainly used for matrix multiplications and neural
network-related calculations, while TensorFlow is a broadly used ML and DL library focusing on
training and inference of models.
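A minimal sketch of what such a Keras model can look like (the layer sizes, optimizer, and regularization strength are illustrative assumptions, not values taken from the benchmark plugin):

from tensorflow import keras
from tensorflow.keras import layers

# A small fully connected model: 784 inputs (a 28x28 image flattened),
# one hidden layer, and 10 output classes
model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(128, activation="relu",
                 kernel_regularizer=keras.regularizers.l2(1e-4)),  # regularization against overfitting
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()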
2.8. Terminology
The key terms most often used during the development and production of software
based on ANNs and ML overall are: epochs, batches, loss, validation, train, evaluation, and accuracy.
An epoch is one iteration over the ANN's training dataset, and the epoch count refers to the number
of times the ANN was trained on that particular dataset. Epochs are often divided into batches (smaller
collections of training-data samples), since during training, memory must be allocated for storing
the naturally occurring losses. Losses are penalties for bad predictions during the run of a batch and
are later used to update the ANN model in the form of modified weights and biases. As a result, the
smaller the batch size, the less memory must be allocated to store the losses, making training faster
and more efficient. The final stage in the development of a finished ML model is evaluation, which
refers to the methods used to assess the accuracy and effectiveness of the model.
The three most commonly used evaluation methods are accuracy, precision, and recall. Accuracy can
be expressed by the formula:
$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
where $TP$, $TN$, $FP$, and $FN$ denote true positives, true negatives, false positives, and false negatives.
Precision refers to the ratio of correct positive predictions within all samples predicted to be
positive, which can be expressed as:
$\text{Precision} = \frac{TP}{TP + FP}$
Precision, as opposed to accuracy, measures how relevant the predictions are within the expected
outcomes, i.e., the share of truly positive values among everything evaluated as positive, whether
correctly or falsely. That method enables the algorithm and developer to exclude negatives falsely
evaluated as positive from the range of positives.
And the third evaluation method is recall, which is expressed as:
$\text{Recall} = \frac{TP}{TP + FN}$
It helps identify the positives that are falsely evaluated as negatives.
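These three formulas translate directly into code; a sketch using made-up confusion-matrix counts:

def evaluate(tp, tn, fp, fn):
    accuracy  = (tp + tn) / (tp + tn + fp + fn)  # correct predictions over all predictions
    precision = tp / (tp + fp)                   # how many predicted positives were real
    recall    = tp / (tp + fn)                   # how many real positives were found
    return accuracy, precision, recall

# Hypothetical counts for illustration
print(evaluate(tp=90, tn=80, fp=10, fn=20))  # (0.85, 0.9, 0.818...)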
Train, in turn, refers to the accuracy on the training data rather than of the whole model, and finally,
validation is based on running the model on a test dataset derived from the training data.
3. Methodology
3.1. Computing devices used in benchmarks
Gaming Rig PC 15
CPU: AMD Ryzen 7 3800X, 8-core, 3.90 GHz, CISC
GPU: NVIDIA GeForce RTX 3060 Ti, 1.67 GHz, 8 GB GDDR6 16
CUDA Cores: 4864
Motherboard: ROG STRIX X570-E GAMING
Power Supply: 650 W BeQuiet
RAM: 16 GB 3200 MHz
System: Windows 10 Pro 22H2
15 Own Picture
16 (image of geforce rtx 3060ti)
MacBook Air Retina 13-inch 2018 17
CPU: 1.6 GHz dual-core Intel Core i5, CISC
GPU: Intel UHD Graphics 617, 1536 MB
RAM: 8 GB 2133 MHz LPDDR3
Power Supply: Apple 30 watt
System: MacOS Ventura 13.0.1 / 13.2
MacBook Pro 14-inch 2021 18
17 (Apple specifications)
18 (Apple specifications)
Apple M1 Pro SoC 19
SoC CPU: 8-core CPU, peak 3.2 GHz, RISC
SoC GPU: 14-core GPU, peak 3.2 GHz
SoC Neural Engine: 16-core Neural Engine
Unified Memory: 16 GB 256-bit LPDDR5 SDRAM, bandwidth 200 GB/s
Power Supply: Apple 67 watt
System: MacOS Ventura 13.1
MacBook Pro 16-inch 2019 20
CPU: 2.6 GHz - 4.5 GHz 6-core Intel Core i7, 12 MB level-3 cache, CISC
GPU: 2.6 GHz AMD Radeon Pro 5300M, 4 GB GDDR6 + Intel UHD Graphics 630, 1536 MB
RAM: 16 GB 2667 MHz DDR4
Power Supply: Apple 96 watt
System: MacOS Ventura 13.2
19 (M1 Pro motherboard)
20 (Apple specifications)
3.2. Inference benchmarking - OsiriX lite - AI segmentation
The first stage of evaluating the performance, efficiency, and effectiveness of CPUs, GPUs,
and SoCs, with an explicit division for neural-engine acceleration, was performed by running a pretrained
Keras neural network model. Following the lead of Takashi Shirakawa (cardiovascular surgeon, master of
mechanical engineering, programmer), I used the "OsiriX Lite" software for viewing "digital
imaging and communication in medicine" (DICOM) information object definition (IOD) medical
images (with support for 64-bit and multithreaded computing), together with its AI segmentation plugin,
to measure the capabilities of Apple computers' processing units. The author of the plugin stated that it
was "specially constructed for this performance test," and that the "AI core," also known as the model, "has
been trained on more than 90,000 CT images for semantic segmentation of the aorta." Additionally,
thanks to the explicit support of the new Apple silicon lineup by the "OsiriX Lite" computed
tomography (CT) software, the model was converted to Core ML format (BENCHMARK.mlmodel) using
Apple's coremltools for the macOS platform. The conversion was required to gain access to the neural
engine management through Apple's machine learning framework "Core ML," written in Swift. It is
not possible to run such a benchmark on a Windows-based machine. However, this is not a
limitation, as the three chosen laptops are equipped with Intel's i7-series processor, Intel's integrated
graphics processor, and AMD's GPU, which ultimately increases the representativeness of the benchmark in
terms of mobile processing units.
Each device (updated to MacOS "Ventura") performed segmentation of abdomen and skull images in
samples, or slices, of 100, 200, and 500, where all processed slices add up to 1000, and efficiency is
expressed in milliseconds per slice.
“OsiriX Lite” - Software - MacBook Pro 14’ - M1 Pro 21
21 Appendix C
Input (whole picture): Abdomen 22
Output (ROI area): Abdomen ROI 23
AI Segmentation plugin for "OsiriX lite":
# Single pretrained AI core, or model
# Selection of the number of segmentation slices
# ROI (region of interest) drawing and rendering
# Antialias (smoothing) of borders
# Exporting the outcome was disabled, as there is no need for it when benchmarking (the input and output of the computation are listed above)
# Choice of the processing units used
# Apple M1 Pro as SoC
22 (Softneta, Dicom Library - Anonymize, share, view dicom files online)
23 (Softneta, Dicom Library - Anonymize, share, view dicom files online)
GitHub repository with the source code for the plugin:
(Tkshirakawa, TKSHIRAKAWA/AISEGMENTATION_V141: The public source code of
A.I.Segmentation (AIS) version 1.4.1. AIS is a plugin for OsiriX on macOS enabling semantic
segmentation of medical images using artificial intelligence (the Core ML framework).)
3.3. Benchmarking - neural networks training
The second stage of evaluating the previously specified processing units, in terms of machine learning
training efficiency, was to perform three distinct trainings of three different neural networks, each
more computationally intensive than the last. To conduct the training, an appropriate environment
had to be prepared and installed, beginning with downloading and installing the most recent Python
version globally from the computer's terminal (MacOS) or command line (Windows).
Anaconda was installed to create a clean environment containing only the necessary packages, with
the goal of increasing benchmark credibility and preventing stray packages from affecting the final
results. The PyTorch deep learning framework for Python was installed in the Anaconda environment so
that the image recognition and number detection training could be executed. Finally, the source code
from Sebastian Raschka and Alexander Ziskin's GitHub repository 24 was downloaded and placed in the
top-level computer directory.
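The benchmark scripts select the processing unit through a command-line argument; a hedged sketch of how such a device choice typically looks in PyTorch (the flag name is an assumption, not necessarily the repository's exact argument):

import argparse
import torch

# Hypothetical CLI flag; the benchmark repository may name it differently
parser = argparse.ArgumentParser()
parser.add_argument("--device", default="cpu", choices=["cpu", "cuda", "mps"])
args = parser.parse_args()

device = torch.device(args.device)
print(f"PyTorch {torch.__version__}, training on {device}")

# Any model or tensor is then moved to the chosen unit before training
x = torch.randn(128, 784).to(device)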
(Screenshot from terminal: example of lenet-mnist.py training on the M1 Pro CPU)
4. Experiment Results
4.1. Inference Benchmarking
After performing the AI segmentation of DICOM images on all the MacOS devices, I inserted
the data into an Excel file, summed the total computation and Core ML neural network
times, divided them by the number of rendered slices, and finally rounded them to whole numbers for
clarity. The findings are presented as follows:
24 Appendix A
Consistent with the theory of parallel computing, the CPUs took the most time per slice in every case
(with a visible relationship between processing-unit speed and number of cores); the MacBook Air
performed the worst, and the only computer containing a neural engine (ANE), the 14-inch MacBook Pro
with the M1 Pro, performed the best. The part of the AI segmentation accelerated by Core ML, used for
the predictions regarding the ROI, took on average only 26.7% of the whole computation, while in the
second category, "CPU+GPU," Core ML time took over half of the whole computation time for the
same computer. What is surprising, but explainable, is that in the "CPU" category, Core ML and
computation times were nearly equal, indicating that Core ML was actively running for most of the
computation. This is likely because, as in the first, second, and third categories, 22, 14, and 17
milliseconds, respectively, are spent on operations not related to the AI segmentation itself but rather
on preparation to run the plugin. The same situation can be seen in the cases of the "Mac Air" and the
"16-inch Mac" for both categories. However, while speed and performance are one factor, efficiency per
watt is another.
Similarly to the first test, the M1 Pro SoC scored the best in terms of efficiency during inference.
4.2. Benchmarking - neural networks training 25
25 Appendix B
Starting with the MacBook Pro 14-inch SoC and the GeForce RTX 3060 Ti, results vary depending on
the neural network being trained. In the case of mlp-mnist.py, the SoC chip was in total only
2 seconds behind Nvidia's powerful GPU. However, in the case of lenet-mnist.py, the RTX 3060 Ti
managed to perform the training in a time equal to the M1 Pro chip running mlp-mnist; moreover,
the gap in performance grew even larger, by 6 seconds, with the RTX performing the very same task on the
less optimized Windows OS. On the hardest-to-train neural network, vgg16-cifar10.py, with more pixels
and hence more input data, the M1 Pro was beaten by a factor of more than three.
Carrying on with the analysis: it was already confirmed in the "inference" benchmark that
CPUs are not the best processing units for running the models, and we can conclude the same about
performing the training. The AMD Ryzen 7 3800X (an overclockable CPU) did the best in all three
trainings, with the M1 Pro in second place and the Intel Core i7 in third. Unfortunately, the Intel Core
i5 from the MacBook Air could not complete the last training due to its exceptionally poor performance:
it had not managed to finish the fourth logged batch interval after running for over 9.85 hours, which
made it clear that it would not be able to finish the training in under 30 hours across 14 batch
intervals of 100 samples.
To figure out why one CPU performs better than another, I examined the relationship between CPU
clock speed and training time. According to the diagram, AMD Ryzen 7 had the best performance,
followed by the M1 Pro and the Intel Core i7, with the Intel Core i5 in last place. An
additional factor affecting the final efficiency of a CPU is its cache memory;
however, it is difficult to show a direct relationship between these CPUs and the M1 Pro's CPU, since
the SoC has unified memory acting as RAM and cache for all the components of the chipset, while the rest
of these CPUs have external RAM and the CPU's internal cache, which would certainly affect the
outcome.
The last crucial factors determining the processing units' effectiveness in training are the train,
validation, and test accuracy (defined in the background information). The data have shown that despite
the disproportionately faster training speed of the GPUs and of the whole SoC, CPUs were actually
more effective.
5. Study Limitations and research opportunities
All of the experiments and research that have been conducted could be expanded to include further
evaluation of other processing units, such as Google's TPUs (tensor processing units) for machine
learning, and of the logic architecture of processing units. While it became clear that CPUs, GPUs, etc.
vary in terms of performance, in my experiments I focused on the "PyTorch" library and "Anaconda"
software, which limited me to a certain range of supported computing devices. For example, I
could not run the AI segmentation on a Windows device, which would likely perform worse than the
M1 Pro with its explicit neural-engine model acceleration for things such as voice recognition in the
form of Apple's "Siri" or "FaceID." Furthermore, the "Conda" environment limited the supported GPUs to
Nvidia's, with CUDA cores only, which are not much different from FPUs (floating-point units). Another
limitation of the experiment was access to data from Apple, as it was not possible to get adequate data
about the SoC's power consumption, which could bias the final result. The main research opportunity
associated with the experiment is testing the M1 Pro against other, rarer ARM SoCs with RISC
architecture and comparing them with new Nvidia GPUs using software that supports more computing
devices.
6. Conclusion
This research has shown the significant power of the mobile M1 Pro SoC in comparison to
one of the most powerful GPUs available today. Despite losing against both the desktop Ryzen 7 3800X
and the desktop RTX 3060 Ti, which had access to six times the power available to the M1 Pro, the
performance packed into the small and thin size of a laptop turned out to be astonishing. Compared
to other Intel-based MacBooks with CISC architecture and the ability to run both mobile
and desktop versions of software, it is clear that mobile processing units are becoming increasingly
efficient with the advancement of ARM-architecture-based processing units and software
optimization. Nvidia's GPU, with 4864 CUDA cores and the possibility of greater parallel computing
(especially when combined with another GPU over Nvidia's SLI link), and AMD's 5300M, one of the two most
powerful GPUs of the Intel generations of macOS machines, could revolutionize the AI world, the former
limited by the already too computationally intensive Windows OS and the latter by a lack of graphics
memory on a great OS. To address the research question: the type of processing unit has a significant
impact on efficiency, in terms of time and power consumption, as well as on effectiveness, in terms of
model accuracy and validation. The experiments have shown that CPUs are more effective but less
efficient, taking disproportionately more time to train and run the models. However, the differences in
effectiveness are likely due to varying internal logic architectures among those units, which cause them
to produce slightly different outcomes, for example in double-precision floating-point operations. When
it comes to fast and effective neural network training, GPUs remain the best choice. Moreover, running
pretrained models on processing units designed specifically for the model is far more efficient,
whereas general-purpose computing devices such as a GPU or CPU will simply be inefficient.
7. Bibliography:
[1] SoC M1PRO (no date). Available at: https://www.anandtech.com/show/17019/apple-announced-m1-pro-m1-max-giant-new-socs-with-allout-performance (Accessed: February 14, 2023).
[2] GPU structure (no date). Available at: https://core.vmware.com/resource/exploring-gpu-architecture#section3 (Accessed: February 14, 2023).
[3] Neural Network (no date) learnopencv. Available at: https://learnopencv.com/understanding-activation-functions-in-deep-learning/ (Accessed: February 14, 2023).
[4] Neurons vs Artificial Neurons (no date). Available at: https://towardsdatascience.com/the-concept-of-artificial-neurons-perceptrons-in-neural-networks-fab22249cbfc (Accessed: February 14, 2023).
[5] dishashree26 (2022) Activation functions: Fundamentals of deep learning, Analytics Vidhya. Available at: https://www.analyticsvidhya.com/blog/2020/01/fundamentals-deep-learning-activation-functions-when-to-use-them/ (Accessed: February 14, 2023).
[6] Syed, A. (2020) AI vs. ML vs. DL (vs. NN), Medium. A Coder's Guide to AI. Available at: https://medium.com/a-coders-guide-to-ai/ai-vs-ml-vs-dl-vs-nn-f6968db769d1 (Accessed: February 14, 2023).
[7] Typical Nvidia GPU architecture. the GPU is comprised of a set of ... (no date). Available at: https://www.researchgate.net/figure/Typical-NVIDIA-GPU-architecture-The-GPU-is-comprised-of-aset-of-Streaming_fig1_236666656 (Accessed: February 14, 2023).
[8] Does bios need to be loaded into main memory to be executed by CPU? (1966) Super User. Available at: https://superuser.com/questions/1407254/does-bios-need-to-be-loaded-into-main-memory-to-be-executed-by-cpu (Accessed: February 14, 2023).
[9] Apple Core ML (no date) Apple Developer Documentation. Available at: https://developer.apple.com/documentation/coreml (Accessed: February 14, 2023).
[10] admin_mirabilis (2021) Apple Neural Processor, Mirabilis Design. Available at: https://www.mirabilisdesign.com/apple-neural-processor/ (Accessed: February 14, 2023).
[11] Hollance (no date) Hollance/neural-engine: Everything we actually know about the Apple Neural Engine (ANE), GitHub. Available at: https://github.com/hollance/neural-engine (Accessed: February 14, 2023).
[12] Apple M1 chip. everything you wanted to know about it (2021) Logidots. Available at: https://logidots.com/insights/apple-m1-chip-everything-you-wanted-to-know-about-it/ (Accessed: February 14, 2023).
[13] Tkshirakawa (no date) Tkshirakawa/AIS_TRAINING_CODESET: Python code to train neural network models with your original dataset for semantic segmentation. This codeset also includes a converter to create macOS Core ML models from trained Keras models for A.I.Segmentation, GitHub. Available at: https://github.com/tkshirakawa/AIS_Training_Codeset (Accessed: February 14, 2023).
[14] Softneta (no date) Dicom Library - Anonymize, share, view dicom files online, DICOMLibrary. Available at: https://www.dicomlibrary.com/ (Accessed: February 14, 2023).
[15] Tkshirakawa (no date) TKSHIRAKAWA/AISEGMENTATION_V141: The public source code of A.I.Segmentation (AIS) version 1.4.1. AIS is a plugin of OsiriX for macOS enabling semantic segmentation of medical images using artificial intelligence (Core ML framework), GitHub. Available at: https://github.com/tkshirakawa/AISegmentation_v141 (Accessed: February 14, 2023).
[16] M1 Pro motherboard (no date). Available at: https://www.ifixit.com/Guide/MacBook+Pro+14-Inch+2021+Chip+ID/145718 (Accessed: February 14, 2023).
[17] Apple specifications (no date) (UK). Available at: https://support.apple.com/kb/SP783?locale=en_GB (Accessed: February 14, 2023).
[18] Image of GeForce RTX 3060 Ti (no date). Available at: https://www.techpowerup.com/review/gigabyte-geforce-rtx-3060-ti-gaming-oc-pro/3.html (Accessed: February 14, 2023).
[19] Exploring the GPU architecture: VMware (no date) The Cloud Platform Tech Zone. Available at: https://core.vmware.com/resource/exploring-gpu-architecture#section3 (Accessed: February 14, 2023).
[20] Torch.utils.data (no date) torch.utils.data - PyTorch 1.13 documentation. Available at: https://pytorch.org/docs/stable/data.html (Accessed: February 14, 2023).
[21] Deep Network Designer (no date) VGG-16 convolutional neural network - MATLAB. Available at: https://www.mathworks.com/help/deeplearning/ref/vgg16.html (Accessed: February 14, 2023).
[22] 7.6. Convolutional Neural Networks (LeNet) (no date) Dive into Deep Learning 1.0.0-beta0 documentation. Available at: https://d2l.ai/chapter_convolutional-neural-networks/lenet.html (Accessed: February 14, 2023).
[23] Furkan Gulsen (no date) What is a Tensor? Available at: https://furkangulsen.medium.com/what-is-a-tensor-ce8e78835d08 (Accessed: February 14, 2023).
[24] Zheng, H. (1970) Model validation, Machine Learning, SpringerLink. Springer New York. Available at: https://link.springer.com/referenceworkentry/10.1007/978-1-4419-9863-7_233 (Accessed: February 14, 2023).
[25] ReLu activation Function (no date) researchgate. Available at: https://www.researchgate.net/figure/ReLU-activation-function_fig7_333411007 (Accessed: February 14, 2023).
8. Appendix:
A - Machine learning projects
Mlp-mnist.py is a linear and ReLU activation-function-based machine learning program for
handwritten number recognition, creating 10 classes, one for each digit, from the "MNIST"
database. The Modified National Institute of Standards and Technology (MNIST) dataset is made up of
60,000 handwritten 28x28-pixel training images (70,000 including the test set):
Crucial pieces of code included only*
# The "argparse" module makes it easy to write user-friendly command-line interfaces (Mac: Terminal, Windows: Command Line).
# Setting a random seed as an image-transformation parameter.
# Parsing the argument specifying which processing unit is used.
# Displaying the library version and the device used.
# Setting the number of epochs.
# Determining the batch size in a single epoch.
# Transforming the images by resizing them.
# Transforming the images by converting them to tensors. (Tensors are data structures where the data can be a scalar as a single number, a vector as a one-dimensional array, a matrix with rows and columns, or a higher-order tensor whose rows and columns contain matrices.) 26
# Defining the model structure, with 784 inputs etc.
# Assigning a linear activation function to the first layer and every following hidden layer, with the ReLU activation function for use in the model.
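Putting those pieces together, here is a hedged reconstruction of what the steps described above amount to; the hyperparameters and layer sizes are assumptions for illustration, and the exact source is in the cited repository:

import argparse
import torch
from torch import nn
from torchvision import datasets, transforms

# Hypothetical CLI flag for the processing unit; the repository may differ
parser = argparse.ArgumentParser()
parser.add_argument("--device", default="cpu")
args = parser.parse_args()
device = torch.device(args.device)
print(torch.__version__, device)     # library version and device used

torch.manual_seed(123)               # random seed for reproducibility
num_epochs, batch_size = 1, 128      # assumed values

# Resize each image, then convert it to a tensor
transform = transforms.Compose([transforms.Resize((28, 28)), transforms.ToTensor()])
train_set = datasets.MNIST(root="data", train=True, download=True, transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True)

# 784 inputs (28x28 flattened), linear layers with ReLU between them, 10 classes
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 100), nn.ReLU(),
    nn.Linear(100, 10),
).to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(loader):
        images, labels = images.to(device), labels.to(device)
        loss = loss_fn(model(images), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if i % 100 == 0:  # logging interval of 100 batches, as in the appendix tables
            print(f"Epoch {epoch+1:03d} | Batch {i:04d}/{len(loader):04d} | Loss {loss.item():.4f}")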
LeNet-mnist.py is a project similar to mlp-mnist, except that it uses the LeNet-5 network architecture,
which consists of 2 convolutional layers connected by pooling layers and a dense block
consisting of 3 layers, giving a total of 7 layers. Additionally, it takes the same 28x28 pictures and
creates 784-dimensional vectors, which can be understood as arrays of 784 pixels per
picture, that have to be transformed, resulting in 10 final classes.
26 (What is a Tensor?)
(Figure: LeNet-5 architecture 27)
Crucial pieces of code included only*
# Defining the type of network; in this case it is a convolutional neural network, which relates to the mathematical convolution of functions, resulting in a third function with certain traits of the two initial functions.
# A classifier, as in mlp-mnist, is used to classify the images; here, linear and tanh activation and transformation functions are used.
# Defining the model as LeNet5, a 7-layer convolutional neural network inputting 28x28-pixel greyscale images and dividing them into 10 classes.
# Assigning the functions to one model function, later calling all the functions and setting logging (saving data of the processed batch) to intervals of 100 samples of pictures.
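A hedged PyTorch sketch of the LeNet-5 structure described above (2 convolutional layers with pooling, then a 3-layer dense classifier with tanh, for 10 greyscale classes); the exact benchmark code lives in the cited repository:

import torch
from torch import nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Two convolutional layers, each followed by pooling
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Tanh(), nn.AvgPool2d(2),
            nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),
        )
        # Dense block of three linear layers (the classifier)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = LeNet5()
print(model(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 10])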
27 (7.6. Convolutional Neural Networks (lenet))
Vgg16-cifar10.py 28 is another convolutional neural network, made up of 16 layers and able to classify
up to 1000 different objects such as animals, devices, and office accessories. The input pictures are
224x224 pixels, giving vectors of 50,176 values.
Crucial pieces of code included only*
# "Data loader. It combines a dataset and a sampler and provides an iterable over the given dataset. The DataLoader supports both map-style and iterable-style datasets with single- or multi-process loading."
# CIFAR-10 data retrieval function.
# Downloading VGG16.
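A hedged sketch of the three steps in those comments, the DataLoader, CIFAR-10 retrieval, and downloading VGG16 via torchvision; the resize to 224x224 is an assumption to match the input size stated above:

import torch
from torchvision import datasets, models, transforms

# CIFAR-10 retrieval: 32x32 images upscaled to VGG16's expected 224x224 input
transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_set = datasets.CIFAR10(root="data", train=True, download=True, transform=transform)

# "Data loader. It combines a dataset and a sampler and provides an
# iterable over the given dataset."
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True, num_workers=2)

# Downloading VGG16 (16 weight layers) from torchvision's model zoo
vgg16 = models.vgg16(weights=None)  # or pretrained weights, depending on the setup
print(vgg16.classifier[-1])         # final layer; 1000 outputs by default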
B - Machine learning training benchmark
28 (Deep Network designer)
ROG GAMING Ryzen 7 CPU
Benchmark File - mlp-mnist.py
Epoch counter | Batch | Loss
001/001 | 0000/0421 | 2.3063
001/001 | 0100/0421 | 0.3429
001/001 | 0200/0421 | 0.3083
001/001 | 0300/0421 | 0.3685
001/001 | 0400/0421 | 0.3488
Train: 91.51% | Validation: 93.50% | Test accuracy: 92.02%
Time/epoch without evaluation: 0.10 min | Total training time: 0.23 min | Total time: 0.27 min
ROG GAMING NVIDIA GeForce RTX 3060 Ti CUDA
Benchmark File - mlp-mnist.py
Epoch counter | Batch | Loss
001/001 | 0000/0421 | 2.3063
001/001 | 0100/0421 | 0.3429
001/001 | 0200/0421 | 0.3083
001/001 | 0300/0421 | 0.3685
001/001 | 0400/0421 | 0.3482
Train: 91.43% | Validation: 93.38% | Test accuracy: 92.15%
Time/epoch without evaluation: 0.11 min | Total training time: 0.24 min | Total time: 0.28 min
ROG GAMING Ryzen 7 CPU
Benchmark File - lenet-mnist.py
Epoch counter | Batch | Loss
001/001 | 0000/0421 | 2.3098
001/001 | 0100/0421 | 0.2646
001/001 | 0200/0421 | 0.1437
001/001 | 0300/0421 | 0.1009
001/001 | 0400/0421 | 0.0734
Train: 97.32% | Validation: 97.77% | Test accuracy: 97.40%
Time/epoch without evaluation: 0.11 min | Total training time: 0.24 min | Total time: 0.28 min
ROG GAMING NVIDIA GeForce RTX 3060 Ti CUDA
Benchmark File - lenet-mnist.py
Epoch counter | Batch | Loss
001/001 | 0000/0421 | 2.3098
001/001 | 0100/0421 | 0.2646
001/001 | 0200/0421 | 0.1437
001/001 | 0300/0421 | 0.1012
001/001 | 0400/0421 | 0.0734
Train: 97.32% | Validation: 97.78% | Test accuracy: 97.40%
Time/epoch without evaluation: 0.14 min | Total training time: 0.27 min | Total time: 0.31 min
ROG GAMING Ryzen 7 CPU
Benchmark File - vgg16-cifar10.py
Epoch counter | Batch | Loss
001/001 | 0000/1406 | 2.6287
001/001 | 0100/1406 | 2.1928
001/001 | 0200/1406 | 1.9123
001/001 | 0300/1406 | 2.0286
001/001 | 0400/1406 | 2.0359
001/001 | 0500/1406 | 1.9385
001/001 | 0600/1406 | 1.9830
001/001 | 0700/1406 | 1.8315
001/001 | 0800/1406 | 2.0363
001/001 | 0900/1406 | 1.8601
001/001 | 1000/1406 | 1.7345
001/001 | 1100/1406 | 1.8575
001/001 | 1200/1406 | 1.8636
001/001 | 1300/1406 | 2.0912
001/001 | 1400/1406 | 1.8696
Train: 36.01% | Validation: 35.04% | Test accuracy: 36.22%
Time/epoch without evaluation: 235.61 min | Total training time: 313.68 min | Total time: 329.47 min
ROG GAMING NVIDIA GeForce RTX 3060 Ti CUDA
Benchmark File - vgg16-cifar10.py
Epoch counter | Batch | Loss
001/001 | 0000/1406 | 2.8048
001/001 | 0100/1406 | 2.2773
001/001 | 0200/1406 | 2.3926
001/001 | 0300/1406 | 2.2269
001/001 | 0400/1406 | 2.1138
001/001 | 0500/1406 | 2.1860
001/001 | 0600/1406 | 2.1008
001/001 | 0700/1406 | 2.1855
001/001 | 0800/1406 | 2.0561
001/001 | 0900/1406 | 2.1550
001/001 | 1000/1406 | 2.1875
001/001 | 1100/1406 | 2.0554
001/001 | 1200/1406 | 2.1677
001/001 | 1300/1406 | 1.9814
001/001 | 1400/1406 | 2.2570
Train: 24.56% | Validation: 24.50% | Test accuracy: 25.22%
Time/epoch without evaluation: 8.01 min | Total training time: 11.05 min | Total time: 11.68 min
APPLE M1 series MPS
Benchmark File - lenet-mnist.py
Epoch counter | Batch | Loss
001/001 | 0000/0421 | 2.3098
001/001 | 0100/0421 | 0.2646
001/001 | 0200/0421 | 0.1437
001/001 | 0300/0421 | 0.1010
001/001 | 0400/0421 | 0.0734
Train: 97.33% | Validation: 97.75% | Test accuracy: 97.39%
Time/epoch without evaluation: 0.22 min | Total training time: 0.36 min | Total time: 0.41 min
APPLE M1 series CPU
Benchmark File - lenet-mnist.py
Epoch counter | Batch | Loss
001/001 | 0000/0421 | 2.3098
001/001 | 0100/0421 | 0.2646
001/001 | 0200/0421 | 0.1437
001/001 | 0300/0421 | 0.1010
001/001 | 0400/0421 | 0.0734
Train: 97.32% | Validation: 97.75% | Test accuracy: 97.40%
Time/epoch without evaluation: 2.29 min | Total training time: 3.49 min | Total time: 3.81 min
APPLE M1 series MPS
Benchmark File - vgg16-cifar10.py
Epoch counter | Batch | Loss
001/001 | 0000/1406 | 2.7701
001/001 | 0100/1406 | 2.3483
001/001 | 0200/1406 | 2.2327
001/001 | 0300/1406 | 2.2476
001/001 | 0400/1406 | 2.3149
001/001 | 0500/1406 | 2.2989
001/001 | 0600/1406 | 2.2574
001/001 | 0700/1406 | 2.1690
001/001 | 0800/1406 | 2.1377
001/001 | 0900/1406 | 1.0730
001/001 | 1000/1406 | 1.9288
001/001 | 1100/1406 | 1.1098
001/001 | 1200/1406 | 2.2131
001/001 | 1300/1406 | 2.1121
001/001 | 1400/1406 | 2.1564
Train: 18.19% | Validation: 18.26% | Test accuracy: 18.52%
Time/epoch without evaluation: 30.40 min | Total training time: 38.32 min | Total time: 40.11 min
APPLE M1 series CPU
Benchmark File - vgg16-cifar10.py
Epoch counter | Batch | Loss
001/001 | 0000/1406 | 2.6052
001/001 | 0100/1406 | 2.4348
001/001 | 0200/1406 | 1.9956
001/001 | 0300/1406 | 1.8892
001/001 | 0400/1406 | 2.1870
001/001 | 0500/1406 | 1.9244
001/001 | 0600/1406 | 2.0415
001/001 | 0700/1406 | 2.0132
001/001 | 0800/1406 | 2.0168
001/001 | 0900/1406 | 2.0304
001/001 | 1000/1406 | 1.7992
001/001 | 1100/1406 | 1.8867
001/001 | 1200/1406 | 1.7387
001/001 | 1300/1406 | 1.6586
001/001 | 1400/1406 | 1.8780
Train: 31.06% | Validation: 31.38% | Test accuracy: 31.61%
Time/epoch without evaluation: 418.48 min | Total training time: 550.32 min | Total time: 576.87 min
APPLE M1 series MPS
Benchmark File - mlp-mnist.py
Epoch counter | Batch | Loss
001/001 | 0000/0421 | 2.3063
001/001 | 0100/0421 | 0.3429
001/001 | 0200/0421 | 0.3103
001/001 | 0300/0421 | 0.3708
001/001 | 0400/0421 | 0.3499
Train: 91.63% | Validation: 93.48% | Test accuracy: 92.15%
Time/epoch without evaluation: 0.13 min | Total training time: 0.27 min | Total time: 0.32 min
APPLE M1 series CPU
Benchmark File - mlp-mnist.py
Epoch counter | Batch | Loss
001/001 | 0000/0421 | 2.3063
001/001 | 0100/0421 | 0.3429
001/001 | 0200/0421 | 0.3083
001/001 | 0300/0421 | 0.3685
001/001 | 0400/0421 | 0.3488
Train: 91.51% | Validation: 93.50% | Test accuracy: 92.02%
Time/epoch without evaluation: 0.13 min | Total training time: 0.27 min | Total time: 0.32 min
Apple MacBook Air Intel(R) UHD 617 - CPU
Benchmark File - mlp-mnist.py
Epoch counter | Batch | Loss
001/001 | 0000/0421 | 2.3063
001/001 | 0100/0421 | 0.3429
001/001 | 0200/0421 | 0.3083
001/001 | 0300/0421 | 0.3685
001/001 | 0400/0421 | 0.3488
Train: 91.51% | Validation: 93.50% | Test accuracy: 92.02%
Time/epoch without evaluation: 0.26 min | Total training time: 0.56 min | Total time: 0.65 min
Apple MacBook Air Intel(R) UHD 617 - CPU
Benchmark File - lenet-mnist.py
Epoch counter | Batch | Loss
001/001 | 0000/0421 | 2.3098
001/001 | 0100/0421 | 0.2646
001/001 | 0200/0421 | 0.1437
001/001 | 0300/0421 | 0.1009
001/001 | 0400/0421 | 0.0732
Train: 97.33% | Validation: 97.77% | Test accuracy: 97.39%
Time/epoch without evaluation: 0.58 min | Total training time: 0.98 min | Total time: 1.08 min
(Run started 21:25; at 7:16 still not done; stopped at 9:51.)
It was not possible to run on CUDA due to the lack of support for non-Nvidia GPUs, nor was it possible to run on the whole system with that exact same software setup.
Apple MacBook Air Intel(R) UHD 617 - CPU
Benchmark File - vgg16-cifar10.py
Epoch counter | Batch | Loss
001/001 | 0000/1406 | 2.4554
001/001 | 0100/1406 | 2.4443
001/001 | 0200/1406 | 2.3538
001/001 | 0300/1406 | 1.9436
001/001 | 0400/1406 | 2.026
Train / Validation / Test accuracy: not reached - the run had lasted 591 min at the time the test was shut down
MacBook Pro 16-inch 2019
Benchmark File - vgg16-cifar10.py
Epoch counter | Batch | Loss
001/001 | 0000/1406 | 2.7701
001/001 | 0100/1406 | 2.3483
001/001 | 0200/1406 | 2.2327
001/001 | 0300/1406 | 2.2476
001/001 | 0400/1406 | 2.3149
001/001 | 0500/1406 | 2.2989
001/001 | 0600/1406 | 2.2574
001/001 | 0700/1406 | 2.1690
001/001 | 0800/1406 | 2.1377
001/001 | 0900/1406 | 1.0730
001/001 | 1000/1406 | 1.9288
001/001 | 1100/1406 | 1.1098
001/001 | 1200/1406 | 2.2131
001/001 | 1300/1406 | 2.1121
001/001 | 1400/1406 | 2.1564
Train: 36.88% | Validation: 38.10% | Test accuracy: 38.29%
Time/epoch without evaluation: 727 min | Total training time: 824.72 min | Total time: 843.63 min
MacBook Pro 16-inch 2019
Benchmark File - lenet-mnist.py
Epoch counter | Batch | Loss
001/001 | 0000/0421 | 2.3098
001/001 | 0100/0421 | 0.2646
001/001 | 0200/0421 | 0.1438
001/001 | 0300/0421 | 0.1011
001/001 | 0400/0421 | 0.0733
Train: 97.33% | Validation: 97.77% | Test accuracy: 97.39%
Time/epoch without evaluation: 0.30 min | Total training time: 0.52 min | Total time: 0.58 min
MacBook Pro 16-inch 2019
Benchmark File - mlp-mnist.py
Epoch counter | Batch | Loss
001/001 | 0000/0421 | 2.3063
001/001 | 0100/0421 | 0.3437
001/001 | 0200/0421 | 0.3072
001/001 | 0300/0421 | 0.3702
001/001 | 0400/0421 | 0.3527
Train: 91.36% | Validation: 93.23% | Test accuracy: 91.89%
Time/epoch without evaluation: 0.16 min | Total training time: 0.39 min | Total time: 0.45 min
C - DICOM AI Segmentation
Apple MacBook Pro 16-inch 2019 (Without Neural Engine)
DICOM IMAGE | SLICES | CPU-GPU-ANE* Core ML time | CPU-GPU-ANE* Computation time | CPU-GPU Core ML time | CPU-GPU Computation time | CPU Core ML time | CPU Computation time
Abdomen | 100 | 3.016968 | 7.427609 | 2.177623 | 6.317432 | 41.707537 | 46.019530
Abdomen | 200 | 4.286666 | 9.709627 | 4.305234 | 9.714994 | 82.297845 | 87.810433
Skull front | 100 | 2.200989 | 6.386208 | 2.163848 | 6.472658 | 42.210295 | 46.583630
Skull benchmark | 500 | 10.762831 | 20.998852 | 10.804305 | 21.386295 | 203.332255 | 214.022462
Skull benchmark | 100 | 2.165628 | 6.621599 | 2.146345 | 6.221807 | 41.742562 | 46.255953
SUM and avg time per slice in milliseconds | 1000 | 22.433082 | 51.143895 | 21.597355 | 50.113186 | 411.290494 | 440.692008
Apple MacBook Air Intel(R) UHD 617 (Without Neural Engine)
DICOM IMAGE | SLICES | CPU-GPU-ANE Core ML time | CPU-GPU-ANE Computation time | CPU-GPU Core ML time | CPU-GPU Computation time | CPU Core ML time | CPU Computation time
Abdomen | 100 | 45.635162 | 56.491021 | 48.924698 | 57.097142 | 79.046955 | 88.070429
Abdomen | 200 | 110.270818 | 127.340924 | 121.010379 | 142.164006 | 334.180414 | 355.794979
Skull front | 100 | 49.684985 | 61.518044 | 58.319121 | 70.014875 | 166.957688 | 178.849785
Skull benchmark | 500 | 259.655509 | 289.046455 | 254.888490 | 285.395843 | 404.962819 | 442.389722
Skull benchmark | 100 | 51.275046 | 59.284041 | 49.904857 | 58.538970 | 82.092756 | 91.148546
SUM and avg time per slice in milliseconds | 1000 | 516.52152 | 593.680485 | 533.047545 | 613.210836 | 1067.240632 | 1156.253461
Apple MacBook Pro 14’ 2021 - M1 Pro
DICOM IMAGE | SLICES | CPU+GPU+ANE Core ML time | CPU+GPU+ANE Computation time | CPU+GPU Core ML time | CPU+GPU Computation time | CPU Core ML time | CPU Computation time
Abdomen | 100 | 0.770511 | 4.125617 | 1.994017 | 3.661932 | 16.190581 | 18.097454
Abdomen | 200 | 1.532248 | 5.807359 | 3.181870 | 5.719136 | 32.383714 | 35.460405
Skull front | 100 | 0.763828 | 3.776746 | 1.590075 | 3.089768 | 15.826232 | 17.507798
Skull benchmark | 100 | 0.767479 | 4.001889 | 1.615191 | 3.342085 | 15.463963 | 17.476744
Skull benchmark | 500 | 3.847994 | 12.429732 | 7.918357 | 14.038464 | 80.216171 | 87.976234
SUM and avg time per slice in milliseconds | 1000 | 7.68206 | 30.141343 | 16.29951 | 29.851385 | 160.080661 | 176.518635