Kexec - Horms Solutions

advertisement
Kexec: Soft-Reboot and Crash-Dump Analysis
for Linux and Xen
Linux.Conf.Au, Sydney, Australia
http://www.vergenet.net/linux/kexec/
Simon Horman (Horms) <simon@valinux.co.jp>
Magnus Damm <magnus@valinux.co.jp>
VA Linux Systems Japan K.K.
http://www.valinux.co.jp/en/
18th January 2007
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
1 / 34
A Tale in Three Parts
Part I: Crash-Dump Analysis
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
2 / 34
A Tale in Three Parts
Part I: Crash-Dump Analysis
Part II: Kexec
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
2 / 34
A Tale in Three Parts
Part I: Crash-Dump Analysis
Part II: Kexec
Part III: Port of Kexec to Xen
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
2 / 34
Part I
Crash-Dump Analysis
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
3 / 34
Crash-Dump Analysis
User-Space Core-Dump
When an application crashes it is useful to analyse the state of the
programme at the time of the crash
Operating systems provide a facility to dump the memory core of
applications
The core can be analysed using tools such as gdb
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
4 / 34
Crash-Dump Analysis
User-Space Core-Dump
When an application crashes it is useful to analyse the state of the
programme at the time of the crash
Operating systems provide a facility to dump the memory core of
applications
The core can be analysed using tools such as gdb
Operating System Crash-Dump
When an operating system crashes it is also useful to analyse the state
of the system at the time of the crash
The difficulty arises because it is the operating system that would
naturally collect the memory dump, however it has crashed
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
4 / 34
Who is Interested In It?
Enterprise Users
Want deep analysis of crashes
Are willing to go to third parties for analysis
But they want to use vendor-supplied kernels
So a standardised solution is needed
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
5 / 34
Who is Interested In It?
Enterprise Users
Want deep analysis of crashes
Are willing to go to third parties for analysis
But they want to use vendor-supplied kernels
So a standardised solution is needed
Kernel Developers
If end-users could collect crash dumps, it could be useful for kernel
developers to isolate bugs.
The facility would need to be provided by distributions.
Crash-Dump size may be a problem when transferring between
end-users and kernel-developers
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
5 / 34
Some Crash-Dump Solutions for Linux
LKCD — Linux Kernel Crash Dump
http://lkcd.sourceforge.net/
netdump
http://www.redhat.com/support/wpapers/redhat/netdump/
diskdump
http://sourceforge.net/projects/lkdump/
Mini Kernel Dump
http://mkdump.sourceforge.net/
Kexec/Kdump
Full Disclosure: Mini Kernel Dump was developed in part by my employer, VA Linux Systems Japan K.K.
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
6 / 34
Part II
Kexec
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
7 / 34
Kexec
Kexec is a feature of the Linux kernel that allows a new kernel to run
in place of the running kernel
Kernel
kexec
Second
Kernel
It is named after exec(2), the system call which allows a process to
replace a running process
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
8 / 34
Motivation for Kexec
Allows rebooting to skip the BIOS and hardware initialisation
Some hardware has buggy BIOSes that do not allow reboot to occur
reliably
Some hardware takes a really long time to reboot
Rebooting using kexec can be really fast
Can be very useful for bootstrapping and rebooting in development
environments.
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
9 / 34
Performing Kexec
The new kernel is loaded into the running kernel from user-space
using kexec-tool
# kexec -l /boot/vmlinuz
The new kernel can then be kexeced using the same tool
# kexec -e
Kexec is a method of warm or soft-booting machines
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
10 / 34
Kexec Flow
1. Kexec Load
file system
Load
kexec-tool
Files
Kernel Image
Initramfs / Initrd
Horms (VA Linux Systems Japan K.K.)
Purgatory
Kernel
Segments
Kernel Image
Initramfs / Initrd
Command Line
Purgatory
Kexec
18th January 2007
11 / 34
Kexec Flow
1. Kexec Load
file system
Load
kexec-tool
Files
Kernel Image
Initramfs / Initrd
Purgatory
Kernel
Segments
Kernel Image
Initramfs / Initrd
Command Line
Purgatory
2. Kexec Execute
kexec−tool
Purgatory
Kernel
Execute
Point of
No Return
Horms (VA Linux Systems Japan K.K.)
Segments
Kernel Image
Initramfs / Initrd
Command Line
Purgatory
Kexec
Boot Param
Kernel Image
Initramfs / Initrd
Command Line
Second
Kernel
18th January 2007
12 / 34
Kdump
Kdump is an extension of kexec with allows a crash kernel to be
executed in place of a running kernel if the running kernel panics
crash
Kernel
kdump
Crash Kernel
Kdump and kexec use separate kernel images,
which may be loaded independently of each other
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
13 / 34
Kexec vs Kdump
Kexec
Uses (all) system memory as any booting kernel would
The second kernel runs when triggered by kexec-tool
Kdump
Runs the crash kernel in a reserved area of memory,
so as to leave the panicked kernel’s memory untouched
for forensic analysis
The crash kernel runs when the running kernel panics
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
14 / 34
Kdump Flow
1. Kdump Load
file system
Load
kexec-tool
Files
Kernel Image
Initramfs / Initrd
Horms (VA Linux Systems Japan K.K.)
Purgatory
Kernel
Segments
Kernel Image
Initramfs / Initrd
Command Line
Purgatory
Kexec
18th January 2007
15 / 34
Kdump Flow
1. Kdump Load
file system
kexec-tool
Load
Files
Kernel Image
Initramfs / Initrd
Purgatory
Kernel
Segments
Kernel Image
Initramfs / Initrd
Command Line
Purgatory
2. Kdump Execute
Panic!
Kernel
Purgatory
Kernel
Execute
Point of
No Return
Horms (VA Linux Systems Japan K.K.)
Segments
Kernel Image
Initramfs / Initrd
Command Line
Purgatory
Kexec
Boot Param
Kernel Image
Initramfs / Initrd
Command Line
Crash
Kernel
18th January 2007
16 / 34
Triggering Kdump
Usually triggered by the kernel on panic
But may be triggered from user-space (for testing purposes? :-)
echo c > /proc/sysrq-trigger
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
17 / 34
Triggering Kdump
Usually triggered by the kernel on panic
But may be triggered from user-space (for testing purposes? :-)
echo c > /proc/sysrq-trigger
sysrq−trigger
Purgatory
Kernel
Execute
Point of
No Return
Horms (VA Linux Systems Japan K.K.)
Segments
Kernel Image
Initramfs / Initrd
Command Line
Purgatory
Kexec
Boot Param
Kernel Image
Initramfs / Initrd
Command Line
Crash
Kernel
18th January 2007
17 / 34
VMCore
Visible as /proc/vmcore
from within a crash kernel run by kdump
Includes the entire physical memory,
as visible to the panicked kernel
crash
Kernel
kdump
Crash Kernel
vmcore
Preamble
Crash Notes
Physical
Memory
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
18 / 34
Core Files
Linux core files are generally ELF format
Preamble
Crash Notes
Memory
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
19 / 34
Core Files
Linux core files are generally ELF format
They start with an ELF preamble
Preamble
Crash Notes
Memory
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
19 / 34
Core Files
Linux core files are generally ELF format
They start with an ELF preamble
Then there is a crash notes section which
includes the pid of the crashing process,
cpu registers, and the like
Horms (VA Linux Systems Japan K.K.)
Kexec
Preamble
Crash Notes
Memory
18th January 2007
19 / 34
Core Files
Linux core files are generally ELF format
They start with an ELF preamble
Then there is a crash notes section which
includes the pid of the crashing process,
cpu registers, and the like
Preamble
Crash Notes
Memory
And then there is the memory core of the
crashed process
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
19 / 34
Core Files
Linux core files are generally ELF format
They start with an ELF preamble
Then there is a crash notes section which
includes the pid of the crashing process,
cpu registers, and the like
Preamble
Crash Notes
Memory
And then there is the memory core of the
crashed process
Core files may be analysed using tools such
as gdb
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
19 / 34
VMCore Files
Kexec’s vmcore is also ELF format
Preamble
Crash Notes
Physical
Memory
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
20 / 34
VMCore Files
Kexec’s vmcore is also ELF format
Preamble
Crash Notes
Preamble
Physical
Memory
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
20 / 34
VMCore Files
Kexec’s vmcore is also ELF format
Crash Notes
Preamble
Then per-CPU crash notes
Horms (VA Linux Systems Japan K.K.)
Preamble
Physical
Memory
Kexec
18th January 2007
20 / 34
VMCore Files
Kexec’s vmcore is also ELF format
Crash Notes
Preamble
Then per-CPU crash notes
And then the entire physical memory of the
crashed machine
Horms (VA Linux Systems Japan K.K.)
Preamble
Kexec
Physical
Memory
18th January 2007
20 / 34
Using Kexec for Crash-Dump Analysis
Kdump can be used to capture the state of a system when a panic
occurs
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
21 / 34
Using Kexec for Crash-Dump Analysis
Kdump can be used to capture the state of a system when a panic
occurs
VMcore can be used to save this state to disk or a remote host
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
21 / 34
Using Kexec for Crash-Dump Analysis
Kdump can be used to capture the state of a system when a panic
occurs
VMcore can be used to save this state to disk or a remote host
The resulting image than then be analysed off-line using tools like
crash
http://people.redhat.com/anderson/crash whitepaper/
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
21 / 34
Part III
Port of Kexec to Xen
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
22 / 34
Motivation for Porting Kexec to Xen
Provide a crash-dump facility for both domain 0
and the hypervisor
kexec for domain U is being developed separately by Gerd Hoffmann
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
23 / 34
Motivation for Porting Kexec to Xen
Provide a crash-dump facility for both domain 0
and the hypervisor
kexec for domain U is being developed separately by Gerd Hoffmann
Kexec is a strong solution for hypervisor crash-dump
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
23 / 34
Motivation for Porting Kexec to Xen
Provide a crash-dump facility for both domain 0
and the hypervisor
kexec for domain U is being developed separately by Gerd Hoffmann
Kexec is a strong solution for hypervisor crash-dump
And for free it can also provide crash-dump for domain 0
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
23 / 34
Motivation for Porting Kexec to Xen
Provide a crash-dump facility for both domain 0
and the hypervisor
kexec for domain U is being developed separately by Gerd Hoffmann
Kexec is a strong solution for hypervisor crash-dump
And for free it can also provide crash-dump for domain 0
It is has an active developer community (for Linux)
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
23 / 34
Motivation for Porting Kexec to Xen
Provide a crash-dump facility for both domain 0
and the hypervisor
kexec for domain U is being developed separately by Gerd Hoffmann
Kexec is a strong solution for hypervisor crash-dump
And for free it can also provide crash-dump for domain 0
It is has an active developer community (for Linux)
And as a further bonus, provides a method of soft-booting
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
23 / 34
Implementation in Brief
Linux’s kexec code has a generic core and architectures implement
several hooks
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
24 / 34
Implementation in Brief
Linux’s kexec code has a generic core and architectures implement
several hooks
The implementation of these hooks have been moved into the
hypervisor
Accessed via a hypercall
Only the hypervisor has access to the machine’s physical memory
which is a requirement for implementing kexec
Allows the hypervisor to trigger kdump if it panics
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
24 / 34
Implementation in Brief
Linux’s kexec code has a generic core and architectures implement
several hooks
The implementation of these hooks have been moved into the
hypervisor
Accessed via a hypercall
Only the hypervisor has access to the machine’s physical memory
which is a requirement for implementing kexec
Allows the hypervisor to trigger kdump if it panics
Additional hooks were added to load and unload kernels into the
hypervisor
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
24 / 34
Xen Kexec Flow
1. Kexec Load
file system
Load
Files
Hypervisor Image*
Kernel Image
Initramfs / Initrd
kexec-tool
Purgatory
Segments
Hypervisor Image*
Kernel Image
Initramfs / Initrd
Command Line
Purgatory
Domain 0
Kernel
Hypervisor
Segments
Hypervisor Image*
Kernel Image
Initramfs / Initrd
Command Line
Purgatory
* Hypervisor Image is only needed if kexecing into Xen, it should be omitted when kexecing into Linux
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
25 / 34
Kexec Flow
1. Kexec Load
file system
Load
kexec-tool
Files
Hypervisor Image*
Kernel Image
Initramfs / Initrd
Purgatory
2. Kexec Execute
kexec−tool
Segments
Hypervisor Image*
Kernel Image
Initramfs / Initrd
Command Line
Purgatory
Domain 0
Kernel
Hypervisor
Segments
Hypervisor Image*
Kernel Image
Initramfs / Initrd
Command Line
Purgatory
* Hypervisor Image is only needed if kexecing into Xen, it should be omitted when kexecing into Linux
Execute
Domain 0
Kernel
Segments
Hypervisor Image*
Kernel Image
Initramfs / Initrd
Command Line
Purgatory
Point of
No Return
Horms (VA Linux Systems Japan K.K.)
Purgatory
Hypervsior
Kexec
Boot Param
Hypervisor Image*
Kernel Image
Initramfs / Initrd
Command Line
18th January 2007
Linux
or
Xen
26 / 34
Xen Kdump Flow
1. Kdump Load
file system
kexec-tool
Files
Kernel Image
Initramfs / Initrd
Purgatory
Horms (VA Linux Systems Japan K.K.)
Segments
Kernel Image
Initramfs / Initrd
Command Line
Purgatory
Domain 0
Kernel
Kexec
Hypervisor
Segments
Kernel Image
Initramfs / Initrd
Command Line
Purgatory
18th January 2007
27 / 34
Xen Kdump Flow
1. Kdump Load
file system
kexec-tool
Files
Kernel Image
Initramfs / Initrd
Purgatory
Segments
Kernel Image
Initramfs / Initrd
Command Line
Purgatory
Domain 0
Kernel
Hypervisor
Segments
Kernel Image
Initramfs / Initrd
Command Line
Purgatory
2a. Kdump Execute - Domain 0 Panics
Panic!
Domain 0
Kernel
Execute
Domain 0
Kernel
Segments
Kernel Image
Initramfs / Initrd
Command Line
Purgatory
Point of
No Return
Horms (VA Linux Systems Japan K.K.)
Purgatory
Hypervsior
Kexec
Boot Param
Kernel Image
Initramfs / Initrd
Command Line
18th January 2007
Crash
Kernel
28 / 34
Xen Kdump Flow
1. Kdump Load
file system
kexec-tool
Files
Kernel Image
Initramfs / Initrd
Purgatory
Segments
Kernel Image
Initramfs / Initrd
Command Line
Purgatory
Domain 0
Kernel
Hypervisor
Segments
Kernel Image
Initramfs / Initrd
Command Line
Purgatory
2a. Kdump Execute - Domain 0 Panics
Panic!
Domain 0
Kernel
Execute
Domain 0
Kernel
Purgatory
Hypervsior
Segments
Kernel Image
Initramfs / Initrd
Command Line
Purgatory
Point of
No Return
Boot Param
Kernel Image
Initramfs / Initrd
Command Line
Crash
Kernel
2b. Kdump Execute - Hypervisor Panics
Panic!
Hypervisor
Execute
Point of
No Return
Horms (VA Linux Systems Japan K.K.)
Purgatory
Hypervsior
Kexec
Segments
Kernel Image
Initramfs / Initrd
Command Line
Purgatory
Boot Param
Kernel Image
Initramfs / Initrd
Command Line
18th January 2007
Crash
Kernel
29 / 34
VMCore Files for Xen
The same as Linux’s (for now)
memory includes the hypervisor,
domain 0 and domain U
Kazuo Moriwaka is working on
dumpread, which allows the
memory of the hypervisor, or
any of the domains to be
extracted from the vmcore
Preamble
Crash Notes
Physical
Memory
Hypervisor
Domain 0
Domain 1
Domain N
Note that domain memory is not linear
Itsuro Oda is working on
xencrash, a fork of crash for
handling xen crash-dumps.
* Full Disclosure: Both Kazuo Moriwaka and Itsuro Oda are colleagues of mine
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
30 / 34
Implementation Problems: Constants
Some constants differ between Linux and Xen
e.g. PAGE OFFSET (ia64)
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
31 / 34
Implementation Problems: Constants
Some constants differ between Linux and Xen
e.g. PAGE OFFSET (ia64)
Compile time constants in kexec-tool: purgatory
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
31 / 34
Implementation Problems: Constants
Some constants differ between Linux and Xen
e.g. PAGE OFFSET (ia64)
Compile time constants in kexec-tool: purgatory
Compile time constants in the Kernel: relocate kernel()
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
31 / 34
Implementation Problems: Constants
Some constants differ between Linux and Xen
e.g. PAGE OFFSET (ia64)
Compile time constants in kexec-tool: purgatory
Compile time constants in the Kernel: relocate kernel()
Globals the Kernel: relocate kernel()
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
31 / 34
Implementation Problems: Constants
Some constants differ between Linux and Xen
e.g. PAGE OFFSET (ia64)
Compile time constants in kexec-tool: purgatory
Compile time constants in the Kernel: relocate kernel()
Globals the Kernel: relocate kernel()
In practice this has not been a problem for the i386 and x86 64
implementations, but it does manifest in ia64
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
31 / 34
Implementation Problems: Crash Notes
Problem
Per-cpu memory reserved at kernel boot time
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
32 / 34
Implementation Problems: Crash Notes
Problem
Per-cpu memory reserved at kernel boot time
Crash note mapping for VMcore is set up at kexec-load time
/sys/devices/system/cpu/cpu*/crash notes
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
32 / 34
Implementation Problems: Crash Notes
Problem
Per-cpu memory reserved at kernel boot time
Crash note mapping for VMcore is set up at kexec-load time
/sys/devices/system/cpu/cpu*/crash notes
Written to at crash-time for each cpu
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
32 / 34
Implementation Problems: Crash Notes
Problem
Per-cpu memory reserved at kernel boot time
Crash note mapping for VMcore is set up at kexec-load time
/sys/devices/system/cpu/cpu*/crash notes
Written to at crash-time for each cpu
When using xen, the domain U kernels see virtual cpus
May only be a subset of physical cpus
Physical to virtual mapping may change at any time
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
32 / 34
Implementation Problems: Crash Notes
Problem
Per-cpu memory reserved at kernel boot time
Crash note mapping for VMcore is set up at kexec-load time
/sys/devices/system/cpu/cpu*/crash notes
Written to at crash-time for each cpu
When using xen, the domain U kernels see virtual cpus
May only be a subset of physical cpus
Physical to virtual mapping may change at any time
Solution
Leave crash notes as-is in the domain 0 kernel
The virtual cpus at the time of kexec-load will be visible in the VMCore
The virtual cpus at the time of crash will be saved, if the domain 0
kernel crashes
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
32 / 34
Implementation Problems: Crash Notes
Problem
Per-cpu memory reserved at kernel boot time
Crash note mapping for VMcore is set up at kexec-load time
/sys/devices/system/cpu/cpu*/crash notes
Written to at crash-time for each cpu
When using xen, the domain U kernels see virtual cpus
May only be a subset of physical cpus
Physical to virtual mapping may change at any time
Solution
Leave crash notes as-is in the domain 0 kernel
The virtual cpus at the time of kexec-load will be visible in the VMCore
The virtual cpus at the time of crash will be saved, if the domain 0
kernel crashes
A separate area is reserved by the hypervisor to save per-cpu
crash-note-like information
Saved regardless of if it is the hypervisor or domain 0 kernel that crash
Visible as ELF sections in the VMCore
But a customised tool is needed anyway
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
32 / 34
Implementation Problems: Crash Notes
Problem
Per-cpu memory reserved at kernel boot time
Crash note mapping for VMcore is set up at kexec-load time
/sys/devices/system/cpu/cpu*/crash notes
Written to at crash-time for each cpu
When using xen, the domain U kernels see virtual cpus
May only be a subset of physical cpus
Physical to virtual mapping may change at any time
Solution
Leave crash notes as-is in the domain 0 kernel
The virtual cpus at the time of kexec-load will be visible in the VMCore
The virtual cpus at the time of crash will be saved, if the domain 0
kernel crashes
A separate area is reserved by the hypervisor to save per-cpu
crash-note-like information
Saved regardless of if it is the hypervisor or domain 0 kernel that crash
Visible as ELF sections in the VMCore
But a customised tool is needed anyway
N.B: This is probably broken if cpus are hotpluged, regardless of Xen
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
32 / 34
Status of the Xen Port
Working with the port, all combinations of
Xen/Linux→Xen/Linux transitions possible
Without the port, only
Linux→Xen/Linux transitions
are possible
Status
x86 32
x86 64
ia64
Horms (VA Linux Systems Japan K.K.)
included in Xen 3.0.4
included in Xen 3.0.4
work in progress
Kexec
18th January 2007
33 / 34
Questions?
Horms (VA Linux Systems Japan K.K.)
Kexec
18th January 2007
34 / 34
Download