Measuring eBPF Tracing Overhead on Ubuntu Servers
Jawand Singh
Thomas Jefferson High School for Science and Technology
Abstract—This project quantified the cost of extended Berkeley Packet Filter (eBPF)
tracing on the throughput performance of a standardized workload. During the project, the
relationships between throughput performance and the number of probes, the number of
events recorded, and the type of probe were captured. Several standardized experiments
were created: varying the number of context switch probes used for tracing, varying the
type of probe used for tracing, and varying the number of cores being used during other
experiments. These experiments were then run on three computers with varying hardware
configurations. Generally, it was found that increasing the number of context switch probes
caused logarithmic decay in throughput performance. Notably, it was found that using one
core instead of two when running the standard workload without tracing consistently
improved throughput performance, regardless of machine. One machine saw a
performance increase of approximately nine percent when restricted from two cores to one.
This unexpected result was validated by a simple experiment, demonstrating that one core
outperformed two cores regardless of the number of threads the workload used. However,
when tracing is added, increasing the number of cores almost always improves the
computer’s performance. Finally, this project found that experiments run on Amazon Web
Services occasionally demonstrate periods of instability, where performance would fall
dramatically. As a result of this work, researchers and system administrators can better
understand the impact of tracing on their workloads. Additionally, system administrators
can consider selecting a single-core instance on cloud platforms such as Amazon Web
Services, as it may in fact improve performance compared to a more powerful and
expensive instance. Finally, this project indicates the possibility that using non-virtualized
machines may provide greater stability than machines hosted on the cloud.
I. Background
A. Introduction
Tracing is used by researchers, developers, and system administrators to better understand highly
complex operating systems. However, tracing has an associated resource cost, known as tracing
overhead, which not only reduces performance but can also interfere with the results of tracing.
Especially for researchers, who need high-accuracy data to investigate the operating system,
tracing overhead can potentially compromise their results. Therefore, measuring the impact of
tracing overhead provides the data needed to model overhead for different hardware
configurations. With a high-accuracy model of tracing overhead, system administrators and
developers can factor the cost of tracing into their work, while researchers can adjust their data to
account for the precise tracing overhead.
B. The Operating System and Kernel
The operating system (OS) is software that serves as an interface between hardware resources
and the user of a computer. Instead of having to interact with a computer’s hardware directly, the
OS abstracts many functions, significantly improving the experience of using a computer. As a
result, a well designed OS allows the user to execute programs efficiently and conveniently.
Modern programming languages rely heavily on OS abstractions, increasing the efficiency and
ease of creating programs. Therefore, the design of an OS is critical for both standard users and
developers, in terms of the efficiency of applications run on the OS and the user’s overall
experience [1].
One of the most important tasks of the OS is managing the limited resources in a
computer. Typically, an OS will manage the central processing unit (CPU), memory, file storage,
input/output (I/O), devices, and network connections [2]. At the center of the OS is the kernel,
with four primary tasks: memory management, process management, device drivers, and system
calls [3]. When done properly, these tasks result in an efficient computer. For example, process
management is implemented through the scheduler, a component of the kernel. Effective scheduling and
process execution ensures that applications and processes do not interfere with each other. Based
on factors such as process priority, waiting time, and available resources, the scheduler
determines which process to execute next and which CPU to assign it to. Depending on the goals
of the operating system, different scheduling policies are implemented, each optimizing for different
results [4]. However, while the scheduler automates the majority of process management, the
Linux OS allows users to manually set some values. For example, the taskset command sets the
CPU affinity of running processes, binding them to the set of CPUs defined by the user
[5]. The nice command can set the priority of processes, although this value is adjusted dynamically
by the scheduler over time [6]. The CPU frequency governor, or simply the governor, allows users to adjust the
CPU clock frequency: the performance governor sets the CPU to the highest frequency, resulting in the
greatest performance and power consumption, while powersave does the opposite [7].
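For illustration, the commands below show how these controls might be exercised on a typical Ubuntu system; the program name ./workload and the nice value are placeholders, and the sysfs path assumes a cpufreq driver that exposes per-core governors:
    # Pin a program to CPU 0 only (CPU affinity)
    taskset -c 0 ./workload
    # Start a program with a lower scheduling priority (higher nice value)
    nice -n 10 ./workload
    # Switch CPU 0 to the performance governor (requires root)
    echo performance | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor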
When applications need to access hardware resources or other kernel functionality the
kernel provides standardized methods to do so, called system calls. By blocking standard
applications from accessing hardware resources, the OS can protect the computer against rogue
applications while also abstracting the process of accessing hardware resources [1]. If the OS
follows the Portable Operating System Interface (POSIX) standard, then system calls will be standardized
regardless of the manufacturer [8]. Therefore, developers will find it easier to adapt their programs to
various operating systems.
Among the many OSs available, Linux is one of the most popular due to its open-source
nature and strong development community. Linux is used everywhere from personal computers to
businesses as an enterprise-ready OS [9]. Since the Linux OS is open-source, there are a variety of
distributions available for multiple purposes. One widespread distribution is Ubuntu, which is
often used for cloud computing, enterprise, and personal use. While Linux and its variations,
such as Ubuntu, are common and effective OSs, they are also highly complex pieces of software
that can be difficult to understand and improve. This complexity can be seen from the fact that
thousands of developers have worked on the Linux kernel since 2005 and the kernel itself is
incredibly large, at more than 23 million lines of code, one of the largest software projects in the
world [10]. Therefore, trying to modify or understand kernel behavior can be incredibly difficult
without the use of tools such as tracing and profiling.
C. Tracing and Profiling
Tracing and profiling are critical tools in computer system analysis, widely utilized in software
development, system administration, and research. Due to the complexity of modern operating
systems, these tools are essential in comprehending kernel behavior. They offer valuable insights
into the system's performance, identifying program problems, optimizing application efficiency,
and pinpointing weaknesses in the operating system. In short, tracing and profiling enable
developers to diagnose and improve system performance, leading to more reliable and efficient
computer systems [11].
Tracing allows for detailed analysis by recording system events during the execution of a
program on a computer. Events such as a context switch can help users see where CPU time is
being allocated. Other events, such as system calls, can indicate how many times a process
interacts with the kernel. When an event occurs, secondary data can be captured, including the
process that caused the event and the timestamp. To maintain this data, a buffer is used and
intermittently written to stable storage [11]. To achieve kernel tracing, different tools can use
tracepoints, which are markers that provide tools with a simple callback mechanism [12].
When a tracepoint is turned on, the associated function will be executed when the tracepoint is
triggered by its associated event [13]. The final data can be used for detailed analysis of the
program run while tracing occurs. However, it is essential to note that the resource cost incurred
from tracing may impact the captured results.
Profiling is used to summarize performance metrics such as CPU time used for a process,
the memory/time complexity of a program, and the frequency of function calls. There are two
primary methods of profiling: sampling and measuring. With sampling, the state of the computer is
recorded at a set interval; compared to tracing, this significantly reduces the resources required,
since events are not recorded each time they occur. Measuring-based profiling, on the other hand, is
closer to tracing in that it continuously captures data but does not retain temporal data. While
measuring-based profiling may have a higher overhead than sampling-based profiling, its cost is
still less than tracing [11].
D. Tools for Tracing and Profiling
Tracing and profiling are both powerful techniques, and a variety of tools are used depending on
the specific need. For example, both tracing and profiling can be done at the hardware, kernel,
and user levels. There are many tools available for various purposes on Linux and its
distributions. Some of the most common include perf, function trace (ftrace), gprof, and bpftrace.
Perf (also known as perf_events or perf tools) allows for tracing and profiling in both user
and kernel spaces. Unlike some older tools, perf is built into the kernel and is relatively easy to
use. Various subcommands allow for data collection. For example, “stat” measures event counts,
“top” provides a dynamic list of the most resource-intensive functions, and “record” samples data
for a program [14].
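As a brief illustration of these subcommands, the following commands use the harmless placeholder target sleep 5; exact event names and options vary by system:
    # Count software and hardware events (e.g., context switches and cycles) for one run
    sudo perf stat -e context-switches,cycles -- sleep 5
    # Show a live, dynamically updated list of the most CPU-intensive functions
    sudo perf top
    # Sample stack traces for a program and save them to perf.data for later analysis
    sudo perf record -g -- sleep 5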
Ftrace offers various tools to provide visibility into the kernel and is often used to trace
kernel-space events. The gathered data can assist users in analyzing performance issues
occurring outside of the user space and debugging issues not otherwise easily accessible. Outside
of event tracing, ftrace can also be used for latency tracing [15].
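As a minimal sketch of event tracing through the tracefs interface that ftrace exposes (the mount point is assumed to be /sys/kernel/tracing; older kernels use /sys/kernel/debug/tracing):
    # Enable tracing of scheduler context switch events
    echo 1 | sudo tee /sys/kernel/tracing/events/sched/sched_switch/enable
    # Read the first recorded events from the trace buffer
    sudo cat /sys/kernel/tracing/trace | head -n 20
    # Disable the event again
    echo 0 | sudo tee /sys/kernel/tracing/events/sched/sched_switch/enable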
Gprof allows users to build an execution profile for programs written in low-level
languages, such as C and Pascal. Within the flat profile, function time and calls are easy to
access. The call graph shows which functions are calling other functions and how many times
they are called [16]. Overall, gprof is included with Linux and is a standard tool used to profile
programs.
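For example, a C program might be profiled with gprof roughly as follows (prog.c is a placeholder source file):
    # Compile with profiling instrumentation enabled
    gcc -pg -o prog prog.c
    # Run the program; profiling data is written to gmon.out
    ./prog
    # Produce the flat profile and call graph from the collected data
    gprof prog gmon.out > analysis.txt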
Bpftrace is a high-level language used for tracing and profiling. It is built on top of the
extended Berkeley Packet Filter (eBPF), a virtual machine used to run user-defined
code in the kernel. As a result, bpftrace is a low-overhead solution for users looking to trace or
profile on Linux. There are a variety of capabilities and tools written with bpftrace available
online [17].
E. eBPF and bpftrace
eBPF is a low-level virtual machine that has been integrated into the Linux kernel since 2014.
With eBPF, users can create sandboxed programs and run them in the kernel safely and
efficiently. As a result, eBPF has allowed users to expedite experimentation and evolution of the
kernel, which has normally been highly restricted due to security and stability concerns. While
eBPF was initially used primarily for network purposes, it has since evolved to include other
functions such as tracing, security, and performance analysis for the Linux kernel [18].
Over time, other tools have been built with eBPF, such as bpftrace, a high-level language
used for tracing and profiling. Bpftrace simplifies the process of leveraging some eBPF capabilities
and is compiled to BPF bytecode, making it a low-overhead solution. Bpftrace has a large set of
capabilities, such as attaching code to a tracepoint, which will then execute when the associated
events occur. Additional tools have been built with bpftrace and are available online on GitHub.
Some examples include visualizing I/O latency as a histogram, printing bash commands entered
across the system, and tracing when new processes are created with exec() [17]. Overall, bpftrace
offers tracing and profiling using eBPF with a rich set of features and straightforward syntax.
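For instance, a single bpftrace one-liner can attach to the sched:sched_switch tracepoint and count context switches per process, which is similar in spirit to the context switch probes used in this project (the exact probe scripts used are in Scripts/A.bt in the project repository):
    # Count context switches per process name until interrupted with Ctrl-C
    sudo bpftrace -e 'tracepoint:sched:sched_switch { @switches[comm] = count(); }'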
F. Overhead and Tracing Overhead
In computing, overhead refers to indirect or excess resources consumed to complete a task.
Depending on the application in question, there can be many different types of overhead. Some
examples are: protocol overhead (bandwidth spent on headers instead of data), data structure
memory overhead (such as memory spent on pointers instead of data), and method call overhead
(CPU cycles spent on setting up a stack frame, copying parameters, and setting a return address).
Generally, overhead can be considered resources spent on tasks that do not directly contribute to
the final result, but are still necessary due to the technology being used [19].
When attempting to understand a program’s behavior with tracing, there is an associated
cost known as tracing overhead [20]. The tracing overhead can impact both the performance of
the program being analyzed and the accuracy of data collection. This resource cost can be
incurred as higher CPU cycles, memory usage, disk I/O, and disk space. Depending on the level
of tracing overhead, the computer’s behavior may change.
Tracing overhead has two large components: instrumentation overhead and measurement
overhead. The cost of inserting and executing probes is referred to as instrumentation overhead.
Depending on the complexity, type and placement of the probe, the resulting resource costs can
vary significantly. The cost of collecting and processing data from tracing is referred to as
measurement overhead, which can vary depending on the frequency of the event being traced.
Instrumentation overhead will primarily appear as additional memory and CPU cycles while
measurement overhead can comprise additional CPU cycles, memory usage, disk I/O, and disk
storage [21].
Solutions built with eBPF are considered low-overhead tools, but tools such as
bpftrace can still affect program performance [17][18]. By understanding what factors
affect the overhead when using bpftrace, end-users and researchers can better account for tracing
overhead caused by bpftrace.
II. Methodology
A. Hardware
Three different sources were used for hardware during experimentation, each running Ubuntu
20.04. One source was hosted on Amazon Web Services (AWS) Elastic Compute Cloud (EC2)
(referred to as “aws” during experimentation). The other two computers were physically accessed
(“home” and “school”). All computers possessed an Intel processor. On AWS, three different
instances were used, each of the t2 type, which provides a consistent baseline performance with
burstable CPU capacity [22]. The default CPU governor, as dictated by the Ubuntu OS, is powersave, but
it can be changed to performance on the physically accessed machines. Additionally, on the
physically accessed machines the number of cores used by an experiment can be adjusted
with the taskset command, noted by appending the core selection to the machine name
(e.g., school0 indicates the school machine restricted to core 0).
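For example, restricting the context switch experiment to core 0 of the school machine might be invoked as follows (a hypothetical invocation; the run id and governor arguments are described in Section II.C):
    # Run the context switch experiment pinned to core 0 with the performance governor
    sudo taskset -c 0 ./csExp.sh school0_1_per performance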
Official name | Machine name | Number of CPUs/cores | vCPU?
t2.micro      | aws1         | One                  | Yes
t2.medium     | aws2         | Two                  | Yes
t2.2xlarge    | aws8         | Eight                | Yes
N/A           | school       | Twelve               | No
N/A           | home         | Eight                | No
Fig. 1: Description of Hardware Used
B. Machine Setup
In order to standardize the software environment, the same setup script was run on each machine
prior to any experiments being run. The commands, in order, with explanations, are listed below:
1. Update the system (sudo apt update; sudo apt upgrade -y; sudo apt autoremove -y)
2. Install bpftrace for eBPF tracing and pip for Python packages (sudo apt install bpftrace pip -y)
3. Install Python packages used for data processing and visualization (sudo pip install matplotlib seaborn pandas scikit-learn)
4. Retrieve the experiment scripts through git (git clone https://github.com/JawandS/Overhead-Research.git)
5. Change to the new directory (cd Overhead-Research)
6. Enable script execution (sudo chmod u+x csExp.sh; sudo chmod u+x probesExp.sh)
The complete script was run in two commands as follows:
1. sudo apt update; sudo apt upgrade -y; sudo apt autoremove -y; sudo apt install bpftrace pip -y && sudo pip install matplotlib seaborn pandas scikit-learn; git clone https://github.com/JawandS/Overhead-Research.git
2. cd Overhead-Research; sudo chmod u+x csExp.sh; sudo chmod u+x probesExp.sh
C. Overview of Experiments
Two experiments were executed to understand tracing overhead in the course of the project. The
first, referred to as context switch experiment (found in csExp.sh), sought to understand the
relationship between the number of probes and overhead. The second, referred to as the probes
experiment (found in probesExp.sh), sought to understand the relationship between different
types of probes and the associated tracing overhead. Combined, both experiments allowed us to
better understand tracing overhead, as each captures measurement and instrumentation overhead
in different ways. To account for the number of cores present in various machines, a secondary
script, the variable cores test (found in coresExp.sh), was developed and run on the school
machine. In the cores experiment, the context switch experiment was run with 1, 2, 4, and 8 cores
using the taskset command. Finally, a simple experiment was run where the relationship between
throughput performance and the number of cores was measured (found in zeroCoreText.sh).
Both primary experiment scripts have similar structures and the same two arguments: run
id and governor. The run id helps identify the particular experiment run, and the resulting file is named
machine_run#_governor. If the computer was physically accessed, the governor was selected
between performance and powersave. When the computer was hosted on the cloud, governor
management was not exposed to end users and therefore could not be adjusted.
D. Context Switch and Probe Experiments Pseudocode
Run 10 iterations
- Begin with a 5 second warmup phase
- Start an iteration (0-10 probes)
- Kill all python and eBPF processes
- Clear the log file (raw.txt)
- Sleep for 1 second
- Start tracing (depending on the experiment) and output to the log files
- Context switch experiment: start the appropriate number of probes (0-10)
(Scripts/A.bt)
- Probes experiment: start one of the probes - no tracing, rcu:rcu_utilization,
syscalls:sys_enter_nanosleep, sched:sched_switch, sched:sched_wakeup,
timer:timer_start, cpuhp:cpuhp_enter, syscalls:sys_enter_getcpu
(ProbesScripts/[A-G].bt)
- Set the end time to be 20 seconds ahead
- Execute as many jobs as possible during the 20 second window and count the
number of jobs completed
- Arguments: counter (any value) | number of threads (any number) | scale of job (any number)
- Command used in the experiment: python3 job.py $counter threads=500 scale=1500 (see the bash sketch after this pseudocode)
- Process: create 500 threads, each performing 1500 square root operations on a random number generated each time (applied with the exponentiation operator, ** 2)
- Record the number of jobs completed and the number of events recorded in the
log file
- Reset the job counter and clear log file
- While time remains run the python job
- Increment counter with completed job
- Kill all eBPF processes
- Log the number of jobs and events to console/file
Finish running all 10 iterations
Send data to python processing script
Update the git repository with raw log and processed information
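A minimal bash sketch of the timed inner loop described above, assuming the job.py interface shown in the pseudocode (argument names follow the pseudocode, not necessarily the exact repository scripts):
    #!/bin/bash
    # Sketch: run the workload repeatedly inside a 20 second window and count completions
    counter=0
    end_time=$(( $(date +%s) + 20 ))
    while [ "$(date +%s)" -lt "$end_time" ]; do
        python3 job.py "$counter" threads=500 scale=1500
        counter=$(( counter + 1 ))
    done
    echo "jobs completed: $counter"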
E. Experiment Output & Processing
In the primary experiments, job.py is run in a loop for 20 seconds, and the number of times
job.py completes execution is recorded as the performance metric. During the experiment, each
independent variable is run over ten iterations so that job.py is allowed to run for 200 seconds in
aggregate for each condition (such as the number of context switch probes). Therefore, the
context switch experiment produces 110 data points (ten iterations times eleven probe counts) per
run, while the probes experiment generates 80 data points (ten iterations times eight probe types) per
run.
F. Examples of Experiment Script Execution
- sudo ./csExp.sh aws_1 X
- sudo ./csExp.sh school_1_per performance
- sudo ./probesExp.sh cloudlab_1 X
III. Results
A. School Computer
Fig. 2: Jobs vs context switch probes, school machine restricted to core 0 (n=220; 110 each with the powersave and performance governors)
Fig. 3: Jobs vs number of events recorded by context switch probe(s), school machine restricted to core 0
Fig. 4: Events vs probes, school machine restricted to core 0
Fig. 5: Jobs vs context switch probes for all core configurations of the school machine (n=1760)
Cores | Includes 1st data point? | Fitted curve (y = a * exp(b * x) + c) | Standard deviations (a, b, c) | Change in performance, 0 to 1 probes (%) | Average jobs for 0 probes
1     | Y   | y = 317.58 * exp(-0.91 * x) + 48.97  | 2.18, 0.01, 0.84 | -59.13 | 373
2     | Y   | y = 306.09 * exp(-0.32 * x) + 37.99  | 1.91, 0.01, 1.65 | -24.84 | 341
4     | N   | y = 187.75 * exp(-0.48 * x) + 150.88 | 3.54, 0.01, 0.90 | -20.92 | 347
8     | N/A | N/A                                  | N/A              | -2.06  | 349
Fig. 6: Table with summary of information for the context switch probe experiments run on the school machine with varying number of cores
When the school machine is restricted to one core, increasing the number of context switch
probes has a strong logarithmic effect on the number of jobs as seen from fig. 2. Adding a single
context switch probe has a significant effect on performance with an approximately 59%
decrease in performance. Similarly, a correlation between the number of events and the number
of jobs is apparent in fig. 3. Finally, increasing context switch probes does not linearly increase
the number of events as might be expected (fig. 4). Fig. 5 demonstrates that without probes a
single-core run outperforms runs with two, four, or eight cores. Generally, once probes are added,
more cores improve performance, with one exception at two probes, where two cores
outperform four. Among the multi-core configurations, even at zero probes performance is ranked eight, four, then two cores.
These results are summarized and confirmed by numerical analysis found in fig. 6.
Notably, reducing the number of cores from two to one increases performance by approximately
nine percent on average when there are no probes. However, the decrease in performance from
zero to one context switch probe is significantly reduced when more cores are used. Exponential
regressions could be fitted to the one- and two-core experiments, as well as to the four-core experiment
once the first data point was removed. However, an exponential function could not be fitted to
the data generated from the eight-core experiments. While there is significant error in the
coefficient (a) and offset (c), the exponent (b) has a consistently low error.
B. AWS Machines
Fig. 7: Jobs vs probes for aws1 (n=660)
Fig. 8: Jobs vs probes for aws2 (n=330)
Fig. 9: Probes vs jobs for aws8 (n=330)
Fig. 10: Probes vs jobs for all aws experiments
In figures 7-9, two consistent patterns emerge: first, probes and performance have an inverse
relationship, and secondly, instability appears where some data points fall significantly below the
overall pattern. One exception in fig. 7 is an uptick in performance at two probes.
Additionally, in fig. 7, the orange/blue data points were generated from experiments run during
normal working hours while the brown/purple/green/red runs were executed outside of normal
working hours. In fig. 10, one core outperforms two and eight without tracing, corroborating results
from the school machine. When viewing all curves together, performance is generally ranked
eight > two > one cores once tracing is added, with the exception of noisy points diverging from the patterns.
C. Multi Core Experiment
Fig. 11: Jobs vs threads for school machine
In order to validate the observation of one core outperforming more cores on the aws and school
machines, the results from the multi-core experiment are included (fig. 11). Regardless of the
number of threads used by the standardized job, without the presence of tracing one core
outperforms two and four cores. However, four cores consistently outperform two cores.
IV. Conclusion
A. Results and Implications
Three major findings resulted from this project. Firstly, without the presence of tracing,
restricting the number of cores to one can increase performance. However, when tracing is
added, more cores result in greater performance. This result was observed both on AWS vCPUs
and on a physical CPU by adjusting the number of cores. A major implication of
this finding is that, for some workloads, businesses may be able to select a lower-tier cloud
instance, decreasing their cost while increasing their performance. It is possible that this
performance difference is caused by the overhead of managing more than one core. This
is supported by the data from fig. 11, where four cores outperform two while still underperforming
significantly compared to a single core.
Secondly, when experiments were run on AWS, a pattern emerged from the data, but many data
points also diverged from this pattern. This may indicate that some source is
interfering with the workload being run on the cloud, possibly another user’s workload
interfering with the ones run in this project. Alternatively, the instances used could be
experiencing performance throttling. This instability is especially visible when compared to
non-virtualized hardware, where little to no instability in results was observed.
Finally, while the number of probes and performance have a generally inverse relationship,
there are some exceptions and variations in the exact pattern of the data. At certain probe counts,
there is a performance uptick that is an outlier compared to the rest of the pattern.
When the number of cores is increased, exponential functions sometimes can no longer be fitted
to the data.
To better validate these results, the workload selected in future experiments should avoid
any elements of randomness. Before being used in experimentation, the file
containing the workload program should also have extraneous code removed. The experimental
design should also better isolate instrumentation versus measurement overhead. Finally, the
time per job should be recorded, not just the number of jobs completed during each interval.
B. Further Investigation
Based on the results of this project, future work should examine the effect of other hardware
resources on the level of tracing overhead. Understanding why performance upticks occur could
provide insights into OS or CPU design. Running experiments at different times of day and
comparing performance may reveal whether the instability observed on AWS correlates with
time of day. Finally, validating why one core outperforms a higher number of cores
with more complex workloads may prove valuable for some business cases.
V. Acknowledgements
This project benefited extensively from the support, insight, and mentorship of Dr. Songqing Chen, a
professor at George Mason University, and Yuqi Fu, a PhD student at the University of Virginia.
VI. Appendix
All data and scripts can be found in full at https://github.com/JawandS/Overhead-Research.
VII. References
[1] R. Arpaci-Dusseau and A. Arpaci-Dusseau, Introduction to Operating Systems.
Arpaci-Dusseau Books, 2019. Accessed: Feb. 5, 2023. [Online]. Available:
https://pages.cs.wisc.edu/~remzi/OSTEP/intro.pdf.
[2] D. Hemmendinger, Operating System. Encyclopedia Britannica, 2022. Accessed: Mar.
15, 2023. [Online]. Available: https://www.britannica.com/technology/operating-system.
[3] Red Hat, What is the Linux Kernel? Red Hat, 2019. Accessed: Mar. 15, 2023. [Online].
Available: https://www.redhat.com/en/topics/linux/what-is-the-linux-kernel.
[4] R. Arpaci-Dusseau and A. Arpaci-Dusseau, Scheduling: Introduction. Arpaci-Dusseau
Books, 2019. Accessed: Mar. 15, 2023. [Online]. Available:
https://pages.cs.wisc.edu/~remzi/OSTEP/cpu-sched.pdf.
[5] M. Kerrisk, taskset(1) - Linux manual page. Man7.org, 2022. Accessed: Mar. 15, 2023.
[Online]. Available: https://man7.org/linux/man-pages/man1/taskset.1.html.
[6] M. Kerrisk, nice(1) - Linux manual page. Man7.org, 2022. Accessed: Mar. 15, 2023.
[Online]. Available: https://man7.org/linux/man-pages/man1/nice.1.html.
[7] D. Brodowski, CPU frequency and voltage scaling code in the Linux(TM) kernel.
Accessed: Feb. 5, 2023. [Online]. Available:
https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt.
[8] Techopedia, Portable Operating System Interface. Accessed: Mar. 15, 2023. [Online].
Available:
https://www.techopedia.com/definition/24541/portable-operating-system-interface-posix.
[9] The Linux Foundation, About. Accessed: Feb. 5, 2023. [Online]. Available:
https://www.linuxfoundation.org/about.
[10] SUSE, Linux Kernel. Accessed: Feb. 5, 2023. [Online]. Available:
https://www.suse.com/suse-defines/definition/linux-kernel/.
[11] J. Whitham, Profiling versus tracing. Dr. Jack Whitham, 2016. Accessed: Feb. 5,
2023. [Online]. Available:
https://www.jwhitham.org/2016/02/profiling-versus-tracing.html.
[12] J. Baron and W. Cohen, The Linux Kernel Tracepoint API. Kernel Docs. Accessed:
Mar. 15, 2023. [Online]. Available: https://docs.kernel.org/core-api/tracepoint.html.
[13] M. Desnoyers, Using the Linux Kernel Tracepoints. Kernel Docs. Accessed: Mar. 15,
2023. [Online]. Available: https://docs.kernel.org/core-api/tracepoint.html.
[14] M. Kerrisk, perf(1) - Linux manual page. Man7.org, 2022. Accessed: Feb. 6, 2023.
[Online]. Available: https://man7.org/linux/man-pages/man1/perf.1.html.
[15] S. Rostedt, ftrace - Function Tracer. Red Hat Inc., 2008. Accessed: Feb. 5, 2023.
[Online]. Available: https://www.kernel.org/doc/html/v5.0/trace/ftrace.html.
[16] M. Kerrisk, gprof(1) - Linux manual page. Man7.org, 2022. Accessed: Feb. 6, 2023.
[Online]. Available: https://man7.org/linux/man-pages/man1/gprof.1.html.
[17] Iovisor, bpftrace. Iovisor, 2019. Accessed: Nov. 9, 2022. [Online]. Available:
https://github.com/iovisor/bpftrace.
[18] eBPF, eBPF. eBPF, 2022. Accessed: Feb. 8, 2023. [Online]. Available:
https://ebpf.io/.
[19] C. C. Eglantine, Overhead- Computing. ACM, 2012. Accessed: Mar. 15, 2023.
[Online]. Available: https://dl.acm.org/doi/book/10.5555/2378395.
[20] Google, Tracing Overhead. Accessed: Mar. 15, 2023. [Online]. Available:
https://github.com/google/tracing-framework/blob/master/docs/overhead.md.
[21] S. Shende, Measurement Overhead and Instrumentation Control. University of
Oregon, 2003. Accessed: Feb. 9, 2023. [Online]. Available:
https://www.cs.uoregon.edu/research/paracomp/papers/padc03/html/node3.html.
[22] Amazon Web Services, Amazon EC2 Instance Types. Accessed: Apr. 10, 2023.
[Online]. Available: https://aws.amazon.com/ec2/instance-types/.