HIGIS 3/プレゼンテーション資料/J_GrayA

Development Status of Troubleshooting Features,
Tracing, Message Logging in Linux Kernel
5/20/2014
Seiji Aguchi
Information & Telecommunication Systems Company
IT Platform Division Group, IT Platform R&D Management Division
Hitachi, Ltd,.
© Hitachi, Ltd. 2014. All rights reserved.
Contents
1.Introduction
2.Case Study (Trace)
3.Case Study (Message log)
4. Persistent Store
5. Remaining message log issue
6. Summary
© Hitachi, Ltd. 2014. All rights reserved.
1
1.Introduction
© Hitachi, Ltd. 2014. All rights reserved.
2
1-1. Introduction
Background
 Users in enterprise area have migrated Linux from commercial
UNIX/mainframe .
 Recently, they try to apply Linux to in virtualized environment as well.
Expectation and goal
 Linux system is expected to have same level of troubleshooting
features as commercial UNIX/mainframe.
 Our goal is enhancing troubleshooting features of Linux, including
virtualized environment.
© Hitachi, Ltd. 2014. All rights reserved.
3
1-2. Troubleshooting Process
Step 1
Failure
happens
•
•
•
•
Troubleshooting
features
save materials
to disk
Step 2
Users
send materials to
vendors
Step 4
Step 3
Vendors analyze
materials
Vendors deliver
a fix to users
Step 1: To get sufficient materials to diagnose
Step 2: To send materials quickly
Step 3: To detect a root cause of failure quickly
Step 4: To create a fix and deliver it to user quickly
It is important for vendors/users that Linux has troubleshooting
features enough to diagnose a failure.
© Hitachi, Ltd. 2014. All rights reserved.
4
1-3. How to diagnose
• Detailed analysis
• Detect root cause of issues like performance and system down.
• Tracing, memory dump are used.
• Overall analysis
• diagnose overview of the current state of the system.
• Massage Log is used.
© Hitachi, Ltd. 2014. All rights reserved.
5
1-4. Troubleshooting Feature Overview
• Trace (ftrace, perf, SystemTap, LTTng, uprobes)
– Logging system’s behavior in detail.
• Message Log (pstore, kmsg_dump, syslog)
– All components, both kernel and applications, output their messages
to let users know what happens in a system.
• Memory Dump (kdump, coredump)
– Snapshot of memory to analyze a root cause of failure
• Performance Monitoring Tool (sar, iostat, netstat, oprofile, perf)
– Gathering statistics information related to performance
• Configuration Acquisition (sosreport)
– Gathering user’s system configuration
© Hitachi, Ltd. 2014. All rights reserved.
6
1-4. Troubleshooting Feature Overview
• Trace (ftrace, perf, SystemTap, LTTng, uprobes)
– Logging system’s behavior in detail.
• Message Log (pstore, serial console, syslog)
– All components, both kernel and applications, output their messages
to let users know what happens in a system.
• Memory Dump (kdump, coredump)
– Snapshot of memory to analyze a root cause of failure
• Performance Monitoring Tool (sar, iostat, netstat, oprofile, perf)
– Gathering statistics information related to performance
• Configuration Acquisition (sosreport)
– Gathering user’s system configuration
© Hitachi, Ltd. 2014. All rights reserved.
7
2. Case Study (Trace)
© Hitachi, Ltd. 2014. All rights reserved.
8
2-1. Problem: Delay of user process
Problem
• A user observed that some was delayed, very
infrequently.
Investigation
(1) Analyze with kernel trace by enabling system calls,
context switches, process wakeups, and local timer
interrupts.
(2) Analyze with kernel trace by enabling page fault
events.
© Hitachi, Ltd. 2014. All rights reserved.
9
2-2. Analyze with context switch events.
#
Process cpu
1
dd-20968 [002] 5410.836234
sys_read(fd: 0, buf: a79000, count: 200)
2
dd-20968 [002] 5410.836235
sys_enter: NR 0 (0, a79000, 200, 0, 0, 0)
3
dd-20968 [002] 5410.836237
sched_switch: prev_comm=dd prev_pid=20968 prev_prio=120
prev_state=D|W ==> next_comm=kswapd next_pid=55 next_prio=120
4
dd-22237 [002] 5410.838105
sched_wakeup: comm=dd pid=20968 prio=120 success=1
target_cpu=002
5
6
•
•
•
•
timestamp
delay
trace information
kswapd-55 [003] 5410.884150
sched_switch: prev_comm=kswapd prev_pid=55 prev_prio=100
prev_state=S ==> next_comm=dd next_pid=20968 next_prio=120
dd-20968 [003] 5410.884152
sys_read -> 0x200
#1 - #2: Process 20968 runs on cpu 2.
#3: Context switch happens.
#4 - #5: Process 20968 moves to cpu 3 and some delay happens.
#6: Process 20968 reruns on cpu 3.
Workaround
• Separating the CPUs by CPUSET was effective.
© Hitachi, Ltd. 2014. All rights reserved.
10
2-3. Analyze with page fault event.
#
Process
cpu
timestamp
trace information
1
app-1915 [003] 309.604958
page_fault_user: address=0x3216e01a94 ip=0x3216f2f927
error_code=0x4
2
app-1915 [003] 309.604960
page_fault_user: address=0x3216e04006 ip=0x3216f2f8ba
error_code=0x4
3
app-1915 [003] 309.604964
page_fault_user: address=0x3216e0500e ip=0x3216f2f8ba
error_code=0x4
4
app-1915 [003] 309.604967
page_fault_user: address=0x3216e02000 ip=0x3216f2f927
error_code=0x4
5
app-1915 [003] 309.604969
page_fault_user: address=0x3216e06000 ip=0x3216f2f8db
error_code=0x4
6
app-1915 [003] 309.604973
page_fault_user: address=0x3216e07006 ip=0x3216f2f8ba
error_code=0x4
• While process1915 is running, some delay happens because of
page fault events.
Workaround
• To avoid page fault, pinning memory area with mlock() is effective.
© Hitachi, Ltd. 2014. All rights reserved.
11
3. Case Study (Message Log)
© Hitachi, Ltd. 2014. All rights reserved.
12
3-1. Boot up issue
Problem
• Linux system sometimes fail to boot up for some reason.
Ex) Kernel panics before kdump service enables.
Investigation
• In these cases, administrators check kernel message.
• If serial console is usable, kernel messages can be logged.
• But, disk/serial console are not available, or trusted in the case of
a crash, persistent store(pstore) is useful.
© Hitachi, Ltd. 2014. All rights reserved.
13
4. Persistent Store
4.1 Overview
4.2 Basice usage
4.3 Advanced usage
© Hitachi, Ltd. 2014. All rights reserved.
14
4. Persistent Store
4.1 Overview
4.2 Basic usage
4.3 Advanced usage
© Hitachi, Ltd. 2014. All rights reserved.
15
4-1-1. Persistent Store overview
• When Linux systems crash, generally we rely on writing to log files on
disk, or serial console.
• In some case, disk/serial console may not be available, or trusted in
the case of a crash.
• Pstore logs messages into a platform-specific place, NVRAM.
• And it reads the data by a subsequent boot.
user
User Space
Persistent store
Kernel Space
/sys/fs/pstore
Panic/Oops/shutdown
kick
kmsg_dump
Kernel Msg
write callback
NVRAM driver(EFI/APEI)
NVRAM
erase callback
Write function call
read callback
Read function call
Kernel Msg
© Hitachi, Ltd. 2014. All rights reserved.
16
4-1-3. Persistent Store (Virtualized environment)
• Pstore is usable in kvm-virtualized environment.
kmsg_dump
Kernel Msg
Pstore file system
Guest
kernel
Qemu
NVRAM driver
Host kernel
kmsg_dump
Pstore file system
Kernel Msg
NVRAM driver
NVRAM
Kernel
Msg(host)
disk
© Hitachi, Ltd. 2014. All rights reserved.
17
4. Persistent Store
4.1 Overview
4.2 Basic usage
4.3 Advanced usage
© Hitachi, Ltd. 2014. All rights reserved.
18
4-2-1. Basic usage
1. Set kernel parameter to enable pstore functionality.
2. Kick trigger to log kernel message, and reboot system.
3. Check the message files via pstore filesystem.
© Hitachi, Ltd. 2014. All rights reserved.
19
4-2-2. Step 1: Set kernel parameter
• Boot system up with adding kernel parameter
• EFI driver
pstore.backend=efi efi_pstore.disable=N
• ERST driver
pstore.backend=erst
© Hitachi, Ltd. 2014. All rights reserved.
20
4-2-3. Step 2: Kick trigger
• The following triggers are suported.
• kernel panic, oops, emergency_restart, reboot, power off, halt
• If you want to try “kernel panic”, execute sysrq-trigger.
echo c > /proc/sysrq-trigger
• And then, system boots up again.
© Hitachi, Ltd. 2014. All rights reserved.
21
4-2-4. Step 3: Check message files
• Mount pstore filesystem.
# mount –t pstore - /sys/fs/pstore
• Check message files.
10kB messages are logged.
# ls –l /sys/fs/pstore
total 0
-r--r--r--.
-r--r--r--.
-r--r--r--.
-r--r--r--.
-r--r--r--.
-r--r--r--.
-r--r--r--.
-r--r--r--.
-r--r--r--.
-r--r--r--.
-r--r--r--.
1
1
1
1
1
1
1
1
1
1
1
root
root
root
root
root
root
root
root
root
root
root
root
root
root
root
root
root
root
root
root
root
root
Timestamp of trigger is shown.
1016
1012
948
943
677
993
1010
999
976
1006
949
May
May
May
May
May
May
May
May
May
May
May
13
13
13
13
13
13
13
13
13
13
13
07:46
07:46
07:46
07:46
07:46
07:46
07:46
07:46
07:46
07:46
07:46
dmesg-efi-1
dmesg-efi-10
dmesg-efi-11
dmesg-efi-2
dmesg-efi-3
dmesg-efi-4
dmesg-efi-5
dmesg-efi-6
dmesg-efi-7
dmesg-efi-8
dmesg-efi-9
© Hitachi, Ltd. 2014. All rights reserved.
22
4-2-5. Step 3: Check message files (Contd.)
• Check message.
# cat dmesg-efi-4
Sysrq is triggered in this panic.
cat /sys/fs/pstore/dmesg-efi-4
Panic#2 Part4
<1>[ 306.271891] IP: [<ffffffff813ba3e6>] sysrq_handle_crash+0x16/0x20
<4>[ 306.271917] PGD 80a98c067 PUD 807e8e067 PMD 0
<4>[ 306.271937] Oops: 0002 [#1] SMP
<4>[ 306.271952] Modules linked in: tcp_lp rfcomm fuse xt_CHECKSUM
nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT
xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter
ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
nf_conntrack iptable_mangle iptable_security iptable_raw bnep arc4
snd_hda_codec_realtek snd_hda_codec_hdmi iTCO_wdt iTCO_vendor_support vfat
fat x86_pkg_temp_thermal coretemp kvm iwlmvm crc32_pclmul mac80211
crc32c_intel ghash_clmulni_intel microcode joydev serio_raw iwlwifi
i2c_i801 cfg80211 sdhci_pci sdhci lpc_ich mmc_core mfd_core snd_hda_intel
snd_hda_codec btusb snd_hwdep bluetooth snd_seq
© Hitachi, Ltd. 2014. All rights reserved.
23
4. Persistent Store
4.1 Overview
4.2 Basic usage
4.3 Advanced usage
© Hitachi, Ltd. 2014. All rights reserved.
24
4-3-1. Advanced usage (Tune log size)
Usage
10KB messages are logged by default. And the size is tunable by
specifying “kmsg_byte” when mounting pstore filesystem.
# mount –t pstore –o kmsg_byte=102400 - /sys/fs/pstore
Usecase
• In case of kernel panic, the default 10KB messages are enough to
save the panic log.
• In case of disk failure, a lot of I/O failure messages are shown, so
more than 10KB messages are needed to diagnose the root cause.
© Hitachi, Ltd. 2014. All rights reserved.
25
4-3-2. Advanced usage (unexpected reboot)
Usage
By adding “printk.always_kmsg_dump” to a kernel parameter, pstore
logs more than panic log.
printk.always_kmsg_dump=Y pstore.backend=efi efi_pstore.disable=N
Usecase
•
•
•
Linux system sometimes reboot unexpectedly. One of the
candidates is “emergency_restart” event.
“emergency_restart” event is kicked by following triggers.
• watchdog timer
• hangcheck-timer
• echo c > /proc/sysrq-trigger
By logging the kernel message, we understand the reason why a
system reboots.
© Hitachi, Ltd. 2014. All rights reserved.
26
4-3-3. Advanced usage (normal reboot)
Usage
Also, by adding “printk.always_kmsg_dump”, normal reboot events can
be logged.
printk.always_kmsg_dump=Y pstore.backend=efi efi_pstore.disable=N
Usecase
•
•
Currently, device names, sdX, may change every boot on Linux
system.
So, some command to get performance statistics like sar and iostat,
may not work when this issue happens, because they access to the
device name directly.
# iostat
…
Device:
sda
“sda” may change ever boot
tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
31.29
1160.34
430.39 2181836 809280
27
© Hitachi, Ltd. 2014. All rights reserved.
4-3-4. Advanced usage (Diagnose inconsistent device name)
Usecase (Contd.)
•
•
•
In this case, the statistics data cannot get correctly, but
administrators cannot predict when this issue happens.
To detect when the inconsistent device naming issue happened,
they have to check kernel massages back past months.
To do this, message logs need to be saved to log disk every boot as
follows.
mv /sys/fs/pstore/* /var/log/pstore
© Hitachi, Ltd. 2014. All rights reserved.
28
5. Remaining message log issue
© Hitachi, Ltd. 2014. All rights reserved.
29
5-1. Remaining message log issue
Current kernel message
•
•
[
[
[
[
[
[
[
[
[
[
In a following pseudo SCSI error test, the device information and the
detail error are divided.
This implementation may have some issues.
• When other messages are inserted, it may be difficult for users
to match device information and the detail error.
• When user tools handle the error messages, those divided
messages will create some inconveniences.
17.842110]
18.859098]
18.859103]
18.859108]
18.859110]
18.859114]
18.859116]
18.859119]
18.859122]
18.859124]
sd 2:0:0:0: [sdb] Attached SCSI disk
sd 2:0:0:0: [sdb] Unhandled sense code
sd 2:0:0:0: [sdb] [ 18.859106] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 2:0:0:0: [sdb]
Sense Key : Medium Error [current]
Info fld=0x1234
sd 2:0:0:0: [sdb]
Add. Sense: Unrecovered read error
sd 2:0:0:0: [sdb] CDB:
Read(10): 28 00 00 00 11 e0 00 01 00 00
© Hitachi, Ltd. 2014. All rights reserved.
30
5-2. Remaining message log issue
Expected kernel message
•
[
[
[
[
[
[
[
The device information and the detail error should be displayed in a
single line to fix the issues.
17.842110]
18.859098]
18.859103]
18.859108]
18.859114]
18.859116]
18.859122]
sd 2:0:0:0: [sdb]
sd 2:0:0:0: [sdb]
sd 2:0:0:0: [sdb]
sd 2:0:0:0: [sdb]
Info fld=0x1234
sd 2:0:0:0: [sdb]
sd 2:0:0:0: [sdb]
Attached SCSI disk
Unhandled sense code
[ 18.859106] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sense Key : Medium Error [current]
Add. Sense: Unrecovered read error
CDB: Read(10): 28 00 00 00 11 e0 00 01 00 00
© Hitachi, Ltd. 2014. All rights reserved.
31
6. Summary
© Hitachi, Ltd. 2014. All rights reserved.
32
6-1. Summary
• Hitachi has experienced troubleshooting thorough support service in
Linux system.
• Trace is useful for diagnosing performance issue.
• Pstore is useful for troubleshooting in boot up/shutdown failure
cases.
• Kernel panics before kdump service is enabled.
• Disk failure
• Unexpected reboot.
• Normal reboot.
• There is a remaining issue on kernel message and should be fixed
near future.
© Hitachi, Ltd. 2014. All rights reserved.
33
END
Development Status of Troubleshooting Features,
Tracing, Message Logging in Linux Kernel
5/20/2014
Seiji Aguchi
Information & Telecommunication Systems Company
IT Platform Division Group, IT Platform R&D Management Division
Hitachi, Ltd,.
© Hitachi, Ltd. 2014. All rights reserved.
34
Trademarks
•Linux is a trademark of Linus Torvalds in the United
States, other countries or both.
•UNIX is a registered trademark of The Open Group in
the United States and other countries.
•Other company, product or service names may be
trademarks or service mark of others.
© Hitachi, Ltd. 2014. All rights reserved.
35