SF 2009

System & Application Software Performance Tuning For Devices Powered by the Intel® Atom™ Processor
Wise Chen, Intel Software and Services Group, Developer Products Division

Agenda
• Market Segment and Tools Overview
• System Software Development
• Application Performance Tuning
• Q&A

Intel® Tools Cover All These Device Categories
• Device categories: Consumer electronics, Mobile Internet Devices, Netbooks/Nettops, Embedded
• Processors: Intel® Atom™ processor CE4100, Intel® Media processor CE3100, Intel® Atom™ processor Zxx series, Intel® Atom™ processor Nxx series
• Operating systems: Windows*, Linux*, Moblin/Linux*, RTOS
• Intel® Software Development Tools available
Intel® Software Development Products fully support Intel® Atom™ processors running Moblin, Windows* and RTOS

Intel® Software Development Tools Coverage
• Windows*: Intel® C++ Compiler for Windows*, Intel® Integrated Performance Primitives Library (IPP), Intel® VTune™ Performance Analyzer, Intel® Parallel Studio, Intel® Threading Building Blocks
• Moblin/Linux*: Intel® Embedded Software Development Tool Suite, Intel® Application Software Development Tool Suite
• RTOS: Intel® C++ Compiler Professional Edition for QNX* Neutrino* RTOS
“Application Suite“
• For ISVs and Moblin Community – tune Moblin applications for more performance and extend battery life of Intel® Atom™ processor powered devices
“Embedded Suite“
• For OEM/ODMs (+ their key ISVs) and OSVs – use a complete tools solution with a sophisticated JTAG debug solution for embedded system and application software design
• http://software.intel.com/software/products/atomtools

Intel® Software Development Tools For Intel® Atom™ Processors
• Outstanding performance
– Increased application software performance can help to extend battery life
• Intel® architecture customization increases productivity & efficiency
– Find and fix issues faster with full GUI driven system-level JTAG and application debugging tools
• Technology alignment
– Latest Intel® Atom™ Processor and chipset support
– NDA Tools BETA programs for next generation silicon
• Excellent customer support

Moblin Software Development Tools
Moblin: Open Source Linux* SW Platform for Mobile & Embedded Devices including Mobile Internet Devices (MIDs), Netbooks, Automotive In-Vehicle Infotainment Systems
The Moblin SDK
• Development guides, tutorials, sample code, API references
• Compliance Tools
• Project generator
• GNU Tools
• Moblin Image Creator 2
• PowerTop
Intel® Software Development Tool Suite (Intel® Embedded Software Development Tool Suite, Intel® Application Software Development Tool Suite)
• Intel® C++ Compiler
• Intel® Integrated Performance Primitives Library
• Intel® JTAG Debugger
• Intel® Application Debugger
• Intel® VTune™ Performance Analyzer
Intel® Tool Suites complement the open source Moblin SDK

Intel® C++ Software Development Tools
Intel® Tools – a complete solution with more performance, and latest technology alignment
*Other names and brands may be claimed as the property of others

Intel® Tools for System Development
Cross Development
• Different host and target hardware
• Cross compile on host
• Download and debug with JTAG Debugger
Intel® C++ Compiler
• Build performance critical OS components and drivers
• Optimize for fast execution and fast OS switch into low power mode
Intel® JTAG Debugger
• Debug and identify issues in bootloader
• Debug and identify issues in OS kernel
• Debug and identify issues in device drivers

Using Intel® C++ Compiler for OS Kernel Development
• Install Intel® C++ Compiler into build environment
– Use protected OS image build environment like Moblin Image Creator 2
• Modify component makefiles to use ICC instead of GCC for parts that
– Are multimedia or data volume, or data stream driven
– Have a lot of direct interaction with user interface
Note: OS kernels are highly optimized code. Recompiling them with a different compiler is “hard work with limited benefit”.
• Improve overall OS responsiveness and end-user experience
Use Intel® C++ Compiler for spot optimizations in System Software, e.g. performance critical drivers, codecs, etc.

Intel® JTAG Debugger - Target Connection
• System Software Development == bare metal programming
• JTAG based debugging is the only solution
• JTAG connector on the target HW required to access
– CPU registers
– SoC components / peripheral registers
• An intelligent probe, e.g. the Intel® XDP3 JTAG I/F probe, connects the host system with the target JTAG interface
[Block diagram labels: Host, USB, JTAG, Intel® Atom™ Processor Z510/Z530, 400/533 MHz FSB, System Controller Hub US15W, 24bit LVDS, SDVO, DDR2 400/533 (mem down), 2 PCIe* x1 Lanes, GPIO, 8 USB 2.0 Host Ports, SMBus, LPC, SDIO/MMC, 1 IDE Channel, Intel® High Definition Audio, FWH, SIO, RCP]

Eclipse GUI based JTAG Debugger
Intel® JTAG Debugger offers:
• Full C++/C/ASM debugging
• Full platform support with unique hardware insight
• On-Chip Trace Support
• Linux* host support and Linux* target OS awareness
• Flash Memory support

Debug Linux* OS kernel
• Ensure OS image is on the target
• Connect JTAG Debugger to Intel® Atom™ Processor
    $ ./xdb.sh
• Load OS image symbol information into debugger
• Set HW breakpoint at label “start_kernel“
– some memory locations may not be mapped as valid yet or may be read-only:
    XDB> set opt /hard=on
• Run target platform until basic platform initialization through firmware/BIOS is complete
– System stops at “start_kernel”
• Step through kernel initialization and single step as you please
• Run to &mwait_idle to debug fully initialized OS
[Diagram: boot sequence with the labels Firmware/BIOS, start_kernel, Kernel, mwait_idle]
Intel® JTAG Debugger is recommended for OEM/OSVs who need to customize, debug and validate OS kernels.

Trace Support
• Hardware feature of Intel® Atom™ Processor
• Enables viewing of execution history
• Identify the root cause for exceptions
[Diagram: the executed kernel or application sends branch trace information to the Branch Trace Buffer (on chip for the Intel® JTAG Debugger, memory allocated for the Intel® Application Debugger) and from there to the debugger, which maps it to the application source code]

Localize Configuration Issues with Instruction Trace
[Screenshot labels: C/C++ Source Window, Trace Window, Assembler Window, Stop at specific OS signal]

Chipset Peripheral Registers
Kernel Module/Audio Driver - Init Code

    #include <hdaudioregisters.h>

    /* Base address of the HD Audio register block (value as shown on the slide) */
    #define HD_AUDIO_REG_BASE 0x00FF0000

    /* volatile: every access has to reach the hardware register */
    volatile uint32 *hdaudioregbase = (volatile uint32 *)HD_AUDIO_REG_BASE;

    void init(void)
    {
        hdaudioregbase[D27FO_IHDACR] = 0x01;
        …
    }

Intel® System Controller Hub US15W
• ~400 Peripheral Registers
• Validating Peripheral Register Settings Can Be Quite Complex

CPU & Chipset Specific Register Access
Show and change the content of all processor & chipset registers
• Convenient access to architectural registers - analyze register changes after instruction execution
• Chipset Registers Bitfield Editor: graphical representation of peripheral registers and bit fields with online documentation
Easy and fully documented access to all processor registers and peripherals.
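The driver init code on the Chipset Peripheral Registers slide writes a whole register through a base pointer. In practice a driver often has to change only one bit field and leave the rest of the register alone. The following is a minimal sketch of that read-modify-write pattern, not the actual HD Audio driver: the register index, the bit-field layout and all names are hypothetical and are not taken from the US15W documentation, and a plain array stands in for the memory-mapped register block so that the code runs on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout for illustration only;
   not taken from the US15W datasheet. */
#define NUM_REGS        16u
#define REG_CTRL        4u                         /* word index of a control register */
#define CTRL_ENABLE     (1u << 0)                  /* bit 0: enable */
#define CTRL_MODE_SHIFT 4u
#define CTRL_MODE_MASK  (0x7u << CTRL_MODE_SHIFT)  /* bits 6:4: mode */

/* A plain array stands in for the memory-mapped register block so the
   sketch runs on a host; a driver would map the real base address instead. */
static volatile uint32_t fake_regs[NUM_REGS];

static uint32_t reg_read(volatile uint32_t *base, unsigned idx)
{
    assert(idx < NUM_REGS);
    return base[idx];
}

static void reg_write(volatile uint32_t *base, unsigned idx, uint32_t value)
{
    assert(idx < NUM_REGS);
    base[idx] = value;
}

/* Read-modify-write: change only the bits selected by mask. */
static void reg_update(volatile uint32_t *base, unsigned idx,
                       uint32_t mask, uint32_t value)
{
    uint32_t v = reg_read(base, idx);
    v = (v & ~mask) | (value & mask);
    reg_write(base, idx, v);
}

/* Example init sequence: set the enable bit and select mode 5,
   leaving all other bits of the control register untouched.
   Returns the resulting register value. */
uint32_t demo_init(void)
{
    reg_write(fake_regs, REG_CTRL, 0xFFFF0000u);  /* pretend reset value */
    reg_update(fake_regs, REG_CTRL, CTRL_ENABLE, CTRL_ENABLE);
    reg_update(fake_regs, REG_CTRL, CTRL_MODE_MASK, 5u << CTRL_MODE_SHIFT);
    return reg_read(fake_regs, REG_CTRL);
}
```

Calling demo_init() from a small main() and printing the result shows the upper half of the register preserved and only the two fields changed. This is the kind of setting that a register view with a bit-field editor lets you check on the target without recompiling.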
Change register contents on the fly, without re-compilation
Note: Intel® JTAG Debugger requires the XDP3 JTAG hardware interface from Intel

Linux* OS Awareness – System Debug
• Monitor kernel modules and system threads
• Access Linux* status information
• Be aware of all relevant platform software stack interactions
• Debugging memory images

System Software Debugging Recipe
• GCC for kernel build, ICC for performance critical code
• Compile kernel with debug info
• Connect target through JTAG I/F
• Set hardware breakpoint @ “start_kernel“
• Run target to complete firmware/BIOS init
• Debug kernel
– Execution trace to find errors that are hard to detect
– Use translation table feature to resolve segmentation faults
• Inspect SoC/chipset peripheral registers to validate low-level drivers
• Use Flash feature to burn image into Flash memory
Use Intel® JTAG Debugger for in-depth system software debugging with full Si/SoC/chipset awareness

Performance Optimization Principles
Compiler: Re-compile
• -xSSE3_ATOM (Atom switch / in-order scheduler)
• IPO (interprocedural optimization)
• PGO (profile-guided optimization)
• OpenMP (works on multicore/HT only) – source modification
IPP: Implement library functions
• Highly optimized multimedia/math library functions
• OpenMP compiled (works on multicore/HT only)
• Update application source code & build environment
VTune: Modify source code
• Identify C and ASM source spot optimization opportunities
• Analyse results – update sources, rebuild, analyze again
[Diagram: the three approaches are arranged along an axis from “Less effort” to “Better results”]
Compiler: Intel® C++ Compiler
IPP: Intel® Integrated Performance Primitives
VTune: Intel® VTune™ Performance Analyzer
Intel® Tool Suites provide a complete spectrum of performance optimization methodologies

Identify Optimization Opportunities
Get the best performance out of an application by
• Identifying optimization opportunities using the Intel® VTune™ Performance Analyzer
Questions to ask
• Where do I spend most of my execution time?
• Where do small optimizations have the biggest impact?
• What hardware bottlenecks and dependency stalls can be easily avoided?

Intel® VTune™ Performance Analyzer
Identifies hard to find performance bottlenecks
Features
• Low overhead sampling
• No instrumentation required
• Monitor processor events like cache misses etc.
• View results in source or assembly
Usage Model
• Two components
– Intel® VTune™ Performance Analyzer on host
– Sampling Collector on the target
• Collect data on target and analyze it on the host
[Diagram: the Sampling Collector produces a .TB5 file that the Intel VTune Analyzer reads]

Sampling - How To Find Hotspots
• Pick an event to sample and configure PMU
– Cache misses, branch mis-predictions, dependency/pipeline stalls
• Start SEP sampling routine and application
• Performance Monitoring Unit (PMU) periodically interrupts the processor
– SEP == ISR
• Numbers in counters define sampling rate
• Collect
– Execution address in memory (CS:IP)
– OS process and thread ID
– Executable module loaded at that address
• Write information into *.TB5 file
[Diagram: PMU counter registers for Event 1 to Event 5 (general purpose event registers and dedicated event registers) raise an IRQ]

Take Advantage of Sampling Data
• The Intel® VTune™ Performance Analyzer tells you which module, function or routine could use some improvement.
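To make the step from raw samples to a hotspot concrete, here is a minimal sketch of attributing sampled execution addresses to functions and picking the one with the most hits. It only illustrates the idea; it is not how the Sampling Collector or the analyzer is implemented, it does not read the .TB5 format, and the symbol structure, function names and addresses are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol: a function name and its [start, end) address range. */
struct symbol {
    const char *name;
    uint32_t    start;
    uint32_t    end;
    unsigned    hits;   /* number of samples that fell inside the range */
};

/* Attribute each sampled instruction pointer to the symbol containing it.
   Returns the number of samples that matched no symbol. */
size_t attribute_samples(struct symbol *syms, size_t nsyms,
                         const uint32_t *ips, size_t nips)
{
    size_t unmatched = 0;

    assert(syms != NULL);
    assert(ips != NULL);

    for (size_t i = 0; i < nips; i++) {
        int found = 0;
        for (size_t s = 0; s < nsyms; s++) {
            if (ips[i] >= syms[s].start && ips[i] < syms[s].end) {
                syms[s].hits++;
                found = 1;
                break;
            }
        }
        if (!found)
            unmatched++;
    }
    return unmatched;
}

/* Return the symbol with the most hits (the hotspot), or NULL if nsyms == 0.
   On a tie the first symbol in the table wins. */
const struct symbol *hottest(const struct symbol *syms, size_t nsyms)
{
    const struct symbol *best = NULL;

    for (size_t s = 0; s < nsyms; s++)
        if (best == NULL || syms[s].hits > best->hits)
            best = &syms[s];
    return best;
}
```

A caller would fill the symbol table from the module's debug or symbol information and pass in the collected addresses. The linear search keeps the sketch short; a sorted table with binary search would be the obvious choice for real sample volumes.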
Focus your application optimization efforts where it counts – Intel® VTune™ Performance Analyzer helps to analyze applications without source and binary instrumentation

Intel® C++ Compiler
Compiler features and benefits
• Performance
– Feature: Significantly faster than GCC
– Benefit: High performing code maps directly into application quality and battery lifetime
• In-order scheduler
– Feature: Compiler optimization switch that rearranges/optimizes application code to be executed with best performance on Intel’s Low-power Intel® Architecture technology
– Benefit: Better performance of system- and application software helps to reduce power consumption of a mobile device
• Profile Guided Optimization
– Feature: Multi-stage optimization method with feedback loop
– Benefit: Improves application performance by reducing instruction-cache thrashing, reorganizing code layout, shrinking code size, and reducing branch mispredictions
• GCC Compatibility
– Feature: Intel Compiler provides GCC language extensions and is source and binary code compatible with GCC
– Benefit: Saves efforts in porting/re-using existing code

Need For In-order Scheduler Support - avoid dependency stalls
Consider code sequence:
    a = b * 7;
    c = d * 7;
Representative assembly (processor cycles 1 to 6):
    1  movl  b,%eax
    2  imull $7,%eax
    3  movl  %eax,a
    4  movl  d,%edx
    5  imull $7,%edx
    6  movl  %edx,c
[Diagram annotations: memory load dependency stalls after instructions 1 and 4; further dependencies between the multiply and store instructions]
• In some cases assembly code causes delays and dependency stalls which decrease the performance of application and performance critical code

Need For In-order Scheduler Support - avoid dependency stalls
The same code sequence compiled with the compiler switch -xSSE3_ATOM (in-order scheduler); the numbers refer to the original instruction order:
    1  movl  b,%eax
    4  movl  d,%edx
    2  imull $7,%eax
    5  imull $7,%edx
    3  movl  %eax,a
    6  movl  %edx,c
• Compiler switch -xSSE3_ATOM enables the in-order scheduler, which may improve application’s performance behavior
Model instruction pipeline and avoid dependency stalls by using the in-order-scheduler feature

C/C++ Compiler Benchmark
Intel® C++ Compiler 11.1 for Linux* vs. GCC 4.5.0, based on SPEC* CPU2000 estimated results – September 2009
[Chart: Estimated Relative Performance To GCC 4.5.0 (GCC 4.5.0 = 1.0); bars labeled 81% faster, 45% faster, 10% faster and 23% faster]
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products or call (U.S.) 1-800-628-8686 or 1-916-356-3104.
Intel does not control or audit the design or implementation of third party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase.
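The effect shown on the two in-order scheduler slides above can be reproduced with a toy cycle count. The sketch below models a single-issue, in-order pipeline in which an instruction cannot issue before its source register is ready. It is an illustration only: the latencies (3 cycles for a load, 2 for a multiply, 1 for a store) are made-up values rather than measured Intel® Atom™ timings, the real pipeline is more complex than this model, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGS 2
#define NO_REG   (-1)

/* One instruction in the toy model. Latencies are made-up values,
   not measured Intel Atom timings. */
struct insn {
    int dst;      /* register written, or NO_REG (e.g. a store) */
    int src;      /* register read, or NO_REG (e.g. a load from memory) */
    int latency;  /* cycles from issue until the result is available */
};

/* Simulate a single-issue, in-order pipeline: an instruction issues one
   cycle after its predecessor at the earliest, and never before its
   source register is ready. Returns the cycle in which the last result
   becomes available. */
int simulate_in_order(const struct insn *code, size_t n)
{
    int ready[NUM_REGS] = { 0 };  /* cycle at which each register is ready */
    int next_issue = 0;           /* earliest cycle the next insn may issue */
    int finish = 0;

    for (size_t i = 0; i < n; i++) {
        int issue = next_issue;
        int done;

        if (code[i].src != NO_REG) {
            assert(code[i].src >= 0 && code[i].src < NUM_REGS);
            if (ready[code[i].src] > issue)
                issue = ready[code[i].src];   /* dependency stall */
        }
        done = issue + code[i].latency;
        if (code[i].dst != NO_REG) {
            assert(code[i].dst >= 0 && code[i].dst < NUM_REGS);
            ready[code[i].dst] = done;
        }
        if (done > finish)
            finish = done;
        next_issue = issue + 1;
    }
    return finish;
}

enum { EAX = 0, EDX = 1 };
#define LOAD(r)  { (r), NO_REG, 3 }   /* movl  mem,%reg */
#define IMUL(r)  { (r), (r),    2 }   /* imull $7,%reg  */
#define STORE(r) { NO_REG, (r), 1 }   /* movl  %reg,mem */

/* a = b * 7; c = d * 7; in source order */
const struct insn source_order[6] = {
    LOAD(EAX), IMUL(EAX), STORE(EAX), LOAD(EDX), IMUL(EDX), STORE(EDX)
};

/* The same work with the two independent chains interleaved */
const struct insn scheduled[6] = {
    LOAD(EAX), LOAD(EDX), IMUL(EAX), IMUL(EDX), STORE(EAX), STORE(EDX)
};
```

Under these assumed latencies, simulate_in_order(source_order, 6) returns 12 and simulate_in_order(scheduled, 6) returns 7. The instructions are identical in both cases; interleaving the two independent chains lets the second load and multiply fill cycles that would otherwise be spent waiting.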
*Other brands and names are the property of their respective owners

Use Intel® C++ Compiler for higher performance on Intel® Atom™ processors
Estimated by measurement on internal systems based on the following configuration assumptions:
• Source: Intel estimates as of September 9, 2009
• Basis of comparison: Intel estimates for current version of Intel and GCC compilers as of September 9, 2009
Compilers:
• Intel® C++ Compiler 11.1 for Linux* (icc)
• GCC 4.5.0
Hardware:
• Form factor: Mini-ITX / micro-ATX compatible
• Integrated Intel® Atom™ Processor 330 (1.6 GHz / 1MB L2 Cache / 533 System Bus), v6.12.2
• Memory: 2GB, Harddisk: 40GB
• Chipset: Intel® 945GC and ICH7
• Audio: Realtek ALC662 audio codec (5.1 channel HD audio)
• Video: Intel® Graphics Media Accelerator 950 & S-video output support
• I/O Control: SMSC LPC47M997 based Legacy I/O controller for serial, parallel, and PS/2 ports
• LAN control: 10/100/1000 Mbits/sec LAN subsystem using the Realtek LAN adapter device
Operating System:
• Linux nsticlxl284 2.6.18-8.el5PAE #1 SMP Fri Jan 26 14:28:43 EST 2007 i686 i686 i386 GNU/Linux
SPECint*_base2000 and SPECfp*_base2000 from SPEC CPU2000 V1.3
• SPEC and SPECint, SPECfp are trademarks of the Standard Performance Evaluation Corporation. For more information see www.spec.org
• SPEC has retired SPEC CPU2000 and is no longer publishing results on its website
Compiler switches used for estimates:
“-O2“
• icc/ifort: -O2
• GCC: -O2 -m32
“Advanced“
• icc/ifort: -O3 -ipo -no-prec-div -prof_use -xSSE3_ATOM
• GCC: -O3 -ffast-math -funroll-all-loops -m32
Note:
• 178.galgel: GCC 4.5.0: Assumes use of -fno-strict-aliasing
• 252.eon: GCC 4.5.0: Assumes use of src.alt
• 255.vortex: GCC 4.5.0: Assumes use of: -mpc64 -ffixed-form -ffixed-line-length-132 EXTRA_LDFLAGS = -mpc64
• Compiler benchmarks based on SPECfp are based on C/C++ applications only (177.mesa, 179.art, 183.equake, 188.ammp)

Intel® Integrated Performance Primitives (Intel® IPP) Library
• Highly optimized multimedia functions
– Images & video
– Communication & signal processing
– Data processing
• Fully utilizing
– Intel® MMX™ technology
– SSE2, SSE3
– Multi-core / HT technology
• Rapid application development
• Cross-platform compatibility & code re-use
• Outstanding performance
Optimized for Intel® Atom™ Processor
Use Intel® IPP libraries to concentrate on new features rather than optimizing application performance
Intel® IPP Library Example: MP3 Decoder
Decoder stages and the Intel® IPP functions used in each:
• Bitstream in
• Bitstream Unpacking: ippsUnpackFrameHeader, ippsUnpackSideInfo, ippsUnpackScaleFactors
• Huffman Decoder: ippsHuffmanDecode
• Requantization, Stereo Processing: ippsReQuantize
• Synthesis Filter Bank: ippsMDCTInv, ippsSynthPQMF
• PCM audio out

Summary
• Intel Software Development Tool Suites for OEMs, OSVs (“Embedded Suite“) and ISVs (“Application Suite“) cover the entire cycle of SW development
• Intel® Tool Suites for Intel® Atom™ Processors complement the open source Moblin SDK
• Intel Tool Suites provide a complete spectrum of performance optimization methodologies (compiler switches, IPP multimedia libs, performance bottleneck analysis with VTune)
• Intel® C++ Compiler for spot optimizations in System Software, e.g. performance critical drivers, codecs, and applications in general
• Intel JTAG debugger for in-depth system software debugging with full Si/SoC/chipset awareness

Call to Action
• Check the web for more details on both the “Embedded” and “Application” tool suites
– www.intel.com/software/products/atomtools
• Download your 30-day try-and-buy evaluation version
• Let us know if you need BETA tools for the next generation Intel® Atom™ processor platform code-named “Moorestown“ (CNDA required)
– http://software.intel.com/en-us/articles/intel-embedded-tool-suite-beta/
• Contact us if you have any further questions
– tccEMEA@intel.com
Thank you!
Support
• Intel Premier Support: https://premier.intel.com
• Public Forum: http://software.intel.com/en-us/forums/software-development-toolsuite-atom/
• Articles and Documentation:
– http://software.intel.com/en-us/articles/intel-application-tool-suite-documentation/
– http://software.intel.com/en-us/articles/intel-embedded-tool-suite-documentation/
• Knowledge Base Articles:
– http://software.intel.com/en-us/articles/software-development-toolsuite-atom-kb/all/1/
– http://software.intel.com/en-us/articles/installing-compiler-into-kvm-atom/
– http://software.intel.com/en-us/articles/moblin-integration-software-development-tool-suite-atom/
– http://software.intel.com/en-us/articles/intel-development-tools-for-mids-faqs

Q&A

Legal Disclaimer
• INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.
• Intel may make changes to specifications and product descriptions at any time, without notice.
• All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
• Intel, processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.
• Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance.
• Intel, VTune and the Intel logo are trademarks of Intel Corporation in the United States and other countries.
• *Other names and brands may be claimed as the property of others.
• Copyright © 2009 Intel Corporation.

Risk Factors
The above statements and any others in this document that refer to plans and expectations for the third quarter, the year and the future are forward-looking statements that involve a number of risks and uncertainties. Many factors could affect Intel’s actual results, and variances from Intel’s current expectations regarding such factors could cause actual results to differ materially from those expressed in these forward-looking statements. Intel presently considers the following to be the important factors that could cause actual results to differ materially from the corporation’s expectations. Ongoing uncertainty in global economic conditions poses a risk to the overall economy as consumers and businesses may defer purchases in response to tighter credit and negative financial news, which could negatively affect product demand and other related matters. Consequently, demand could be different from Intel's expectations due to factors including changes in business and economic conditions, including conditions in the credit market that could affect consumer confidence; customer acceptance of Intel’s and competitors’ products; changes in customer order patterns including order cancellations; and changes in the level of inventory at customers.
Intel operates in intensely competitive industries that are characterized by a high percentage of costs that are fixed or difficult to reduce in the short term and product demand that is highly variable and difficult to forecast. Additionally, Intel is in the process of transitioning to its next generation of products on 32nm process technology, and there could be execution issues associated with these changes, including product defects and errata along with lower than anticipated manufacturing yields. Revenue and the gross margin percentage are affected by the timing of new Intel product introductions and the demand for and market acceptance of Intel's products; actions taken by Intel's competitors, including product offerings and introductions, marketing programs and pricing pressures and Intel’s response to such actions; and Intel’s ability to respond quickly to technological developments and to incorporate new features into its products. The gross margin percentage could vary significantly from expectations based on changes in revenue levels; capacity utilization; start-up costs, including costs associated with the new 32nm process technology; variations in inventory valuation, including variations related to the timing of qualifying products for sale; excess or obsolete inventory; product mix and pricing; manufacturing yields; changes in unit costs; impairments of long-lived assets, including manufacturing, assembly/test and intangible assets; and the timing and execution of the manufacturing ramp and associated costs. Expenses, particularly certain marketing and compensation expenses, as well as restructuring and asset impairment charges, vary depending on the level of demand for Intel's products and the level of revenue and profits. 
The current financial stress affecting the banking system and financial markets and the going concern threats to investment banks and other financial institutions have resulted in a tightening in the credit markets, a reduced level of liquidity in many financial markets, and heightened volatility in fixed income, credit and equity markets. There could be a number of follow-on effects from the credit crisis on Intel’s business, including insolvency of key suppliers resulting in product delays; inability of customers to obtain credit to finance purchases of our products and/or customer insolvencies; counterparty failures negatively impacting our treasury operations; increased expense or inability to obtain short-term financing of Intel’s operations from the issuance of commercial paper; and increased impairments from the inability of investee companies to obtain financing. The majority of our non-marketable equity investment portfolio balance is concentrated in companies in the flash memory market segment, and declines in this market segment or changes in management’s plans with respect to our investments in this market segment could result in significant impairment charges, impacting restructuring charges as well as gains/losses on equity investments and interest and other. Intel's results could be impacted by adverse economic, social, political and physical/infrastructure conditions in countries where Intel, its customers or its suppliers operate, including military conflict and other security risks, natural disasters, infrastructure disruptions, health concerns and fluctuations in currency exchange rates. Intel's results could be affected by adverse effects associated with product defects and errata (deviations from published specifications), and by litigation or regulatory matters involving intellectual property, stockholder, consumer, antitrust and other issues, such as the litigation and regulatory matters described in Intel's SEC reports. 
A detailed discussion of these and other risk factors that could affect Intel’s results is included in Intel’s SEC filings, including the report on Form 10-Q for the quarter ended June 27, 2009.
Rev. 7/27/09

Additional sources of information on this topic:
• Other Sessions
– MOBL001: ”Accelerate Performance-Critical Applications and Code Under Moblin with Intel Software Products”, Sep 22 (day 1) 10:15-12:05
– MOBL002: ”Developing for Moblin Hands-On Lab”, Sep 22 (day 1) 15:10-17:10
• Demos in the showcase
– Intel Tools @ Moblin Pavilion
• Additional info in the Moblin community
– www.moblin.org
– www.moblinzone.com

Session Presentations - PDFs
The PDF for this Session presentation is available from our IDF Content Catalog at the end of the day at: intel.com/go/idfsessions

Please Fill out the Session Evaluation Form
Give the completed form to the room monitors as you exit!
Thank You for your input, we use it to improve future Intel Developer Forum events