Special workshop KNIME: Visual Programming for Metabolomics

KNIME
Visual Programming for Metabolomics
Stephan Beisken
Visual Programming
• “Visual programming languages enable physicians
and other computer users with little knowledge of
programming to develop computer software. The
physician uses a visual paradigm to "draw" the
computer interface and then attaches short
segments of computer code to buttons,
menus, and list boxes.”
Ebell, M. H. (1993). Visual programming languages. M.D.
Computing : Computers in Medical Practice, 10(5), 305–11.
Motivation
• Simplify your (working) life
• Data processing and analysis require several
different tools to work together in sequence
• Data input and output
• Spreadsheets
• Data transformation
• Transposition, aggregation, string manipulation
• ISAcreator
• Formatting of tables
Agenda
• Introduction
• Tutorial
• Installation and Extensions
• Overview of the Workbench
• Nodes and Table Models
• Exercises
• Introductory Examples
• MassCascade
• OpenMS
• XCMS
• Slides, software, workflows, and data for takeaway
Disclaimer
• Workflows are great
• It does not have to be KNIME,
there are many other solutions
• Every method that captures information in a consistent
manner and enables reproducibility is great
• Transparency
• Ability to share data and ‘everything’ that was done to the data
Who is already a KNIME user?
Introduction
• KNIME: Konstanz Information Miner
• http://www.knime.org/
• Developed at University of Konstanz in Germany
• Desktop version available free of charge (open source)
• Modular platform for building and executing workflows using
predefined components: nodes
• Core functionality available for tasks such as data mining,
analysis, and manipulation
• Extra features and functionality available in KNIME through
extensions from various groups (community) and vendors
• Written in Java based on the Eclipse SDK platform
Workflow Concepts
• Workflow execution
• Can execute complex, multi-step operations on input data
• Can be run by “non-experts” using predefined parameter
templates ensuring optimal results
• Can be set up for specific measurement systems
• Can be shared across researchers
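As a rough illustration of the points above, a workflow can be pictured as a fixed sequence of steps whose settings come from a parameter template. This is a toy sketch in plain Python, not KNIME code; the step functions, `TEMPLATE`, `STEPS`, `run_workflow`, and the peak values are all hypothetical.

```python
# Toy sketch of a workflow: a fixed sequence of steps driven by a
# parameter template. Not KNIME code; all names and values are hypothetical.

def filter_intensity(rows, min_intensity):
    """Keep rows whose intensity reaches the threshold."""
    return [r for r in rows if r["intensity"] >= min_intensity]

def round_mz(rows, decimals):
    """Round the m/z value of every row."""
    return [{**r, "mz": round(r["mz"], decimals)} for r in rows]

# A predefined template: an expert chooses the values once,
# other users run the workflow without changing them.
TEMPLATE = {
    "filter_intensity": {"min_intensity": 1000.0},
    "round_mz": {"decimals": 2},
}

# The order of the steps is part of the workflow definition.
STEPS = [filter_intensity, round_mz]

def run_workflow(rows, template=TEMPLATE):
    """Apply every step in order, taking its settings from the template."""
    for step in STEPS:
        rows = step(rows, **template[step.__name__])
    return rows

peaks = [
    {"mz": 180.06339, "intensity": 5200.0},
    {"mz": 181.06674, "intensity": 310.0},
]
result = run_workflow(peaks)
print(result)
```

Sharing such a workflow means sharing both the step order and the template, so a colleague running it on the same input should obtain the same result.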
Functionality
• Data manipulation and analysis
• File & database I/O, sorting, filtering, grouping, joining, pivoting
• Data mining and machine learning
• R, WEKA, KNIME, interactive plotting
• Cheminformatics
• Conversions, similarity, clustering, (Q)SAR analysis, etc.
• Scripting integration
• R, Perl, Python, Matlab, Octave, Groovy
• Reporting and much more
• Bioinformatics, HTS & image analysis, network & text mining
• Marketing, big data and business analytics
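For readers who think in scripts, the data manipulation operations listed above (filtering, sorting, grouping, pivoting) correspond roughly to the following plain Python. This is only a sketch of what such steps do to a table, not of how KNIME implements them; the sample rows are made up.

```python
from collections import defaultdict

# Hypothetical measurements: (sample, compound, peak area)
rows = [
    ("s1", "glucose", 10.0),
    ("s1", "citrate", 4.0),
    ("s2", "glucose", 12.0),
    ("s2", "citrate", 6.0),
    ("s2", "glucose", 2.0),
]

# Filtering: keep rows with a peak area of at least 4
filtered = [r for r in rows if r[2] >= 4.0]

# Sorting: by compound, then by sample
ordered = sorted(filtered, key=lambda r: (r[1], r[0]))

# Grouping: sum the peak area per compound
area_per_compound = defaultdict(float)
for sample, compound, area in filtered:
    area_per_compound[compound] += area

# Pivoting: one row per sample, one column per compound
pivot = defaultdict(dict)
for sample, compound, area in filtered:
    pivot[sample][compound] = area

print(ordered)
print(dict(area_per_compound))
print({sample: cols for sample, cols in pivot.items()})
```

In a workflow, each of these four blocks would typically be a separate node, with the table passed from one to the next.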
Modules (Community Extensions)
• http://tech.knime.org/community
• Cheminformatics
• CDK (EMBL-EBI), RDKit (Novartis), Indigo (GGA)
• ErlWood (Eli Lilly), Enalos (NovaMechanics)
• ChEMBL and ChEBI (EMBL-EBI)
• Bioinformatics
• OpenMS (Tübingen, ETH Zurich)
• MassCascade (EMBL-EBI)
• HCS (MPI), NGS (Konstanz), Image analysis
• Integration
• Python, Perl, R, Groovy, Matlab (MPI), PDB web services client
(Vernalis), REST and SOAP web service support
Workflow Platforms
Applications
[Figure slides not reproduced; recoverable panel titles: Calibration, Regression]
Advantages
• Intuitive to use
• No or little programming experience required
• Good for prototyping
• Lots of functionality
• Very modular and flexible
• Active community
• Extensible
• Visual feedback
Disadvantages
• Steep learning curve
• Resource greedy
• No (free) server edition
• Slower execution than standalone scripts
Installation
• Download and unzip KNIME
• No further setup required
• ./knime.ini contains arguments for launch
• Install new modules (nodes) from update sites
• Explorer and installation wizard provided
• Workflows and data are stored in a workspace
• ~/<user>/knime/workspace
• C:\Users\<user>\knime\workspace
• Preferences in: File → Preferences → KNIME
Workbench
[Screenshot of the workbench with labelled areas: toolbar buttons (Auto-layout, Execute, Execute all nodes), tabs, workflow projects, favorite nodes, public server, workflow editor, node repository, node description, outline, console]
Nodes
• Node: Basic processing unit of a workflow
• performs a particular task
• Title
• Icon
• Input port(s): on the left of the icon
• Output port(s): on the right of the icon
• Sequence number
• Status display (‘traffic lights’)
• Red (not ready)
• Amber (ready)
• Green (executed)
• Blue bar during execution (with percentage or flashing)
• Right-click menu
• To configure and execute the node, display the output views, edit the node, and display data for the ports
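The traffic-light states described on this slide can be pictured as a small state machine. The sketch below is a toy model in plain Python, not the KNIME node API; `Status`, `Node`, and the example task are hypothetical, and the title "Row Filter" is used only as a label.

```python
from enum import Enum

class Status(Enum):
    NOT_READY = "red"    # not yet configured
    READY = "amber"      # configured, waiting to be executed
    EXECUTED = "green"   # output is available at the output port

class Node:
    """Toy model of a node: a titled task with a traffic-light status."""

    def __init__(self, title, task):
        self.title = title
        self.task = task            # function: input table -> output table
        self.status = Status.NOT_READY
        self.output = None

    def configure(self):
        """Mark the node as ready (in KNIME this is done in the dialog)."""
        self.status = Status.READY

    def execute(self, table):
        """Run the task on the input table and keep the result."""
        if self.status is Status.NOT_READY:
            raise RuntimeError(f"{self.title}: configure the node first")
        self.output = self.task(table)
        self.status = Status.EXECUTED
        return self.output

node = Node("Row Filter", lambda table: [row for row in table if row > 0])
before = node.status
node.configure()
ready = node.status
out = node.execute([3, -1, 5])
print(before.value, ready.value, node.status.value, out)
```

Executing an unconfigured node raises an error in this model, which mirrors the idea that a red node cannot be run until it has been configured.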
Dialogs
• Double-click opens configuration dialogs
• Explicit column types
Tables
Table rows
Column specifications
Various renderers
Column types
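One way to picture a table with explicit column types is as a column specification plus rows that must match it. The following is a minimal sketch in plain Python, not KNIME's table model; `SPEC`, `check_row`, and the column names are assumptions made for illustration.

```python
# Toy sketch: a table is a column specification plus rows.
# Column names and types are hypothetical examples.
SPEC = [("RowID", str), ("m/z", float), ("Charge", int)]

def check_row(row, spec=SPEC):
    """Return True if the row has one cell of the right type per column."""
    return len(row) == len(spec) and all(
        isinstance(cell, col_type) for cell, (_, col_type) in zip(row, spec)
    )

good = ("Row0", 180.0634, 1)
bad = ("Row1", "180.0634", 1)   # m/z given as text instead of a number

print(check_row(good))
print(check_row(bad))
```

Because every column declares its type up front, a mismatch such as a number stored as text can be caught before a later step tries to compute with it.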
Exercises: Preliminaries
• Pre-installed KNIME Desktop 2.9.1
• Workflows
• starters, xcms, openms, masscascade
• Data
• FAAH knockout LC/MS data
• ESB tomato LC/MS QC data
• ChEBI SDFile, KEGG SDFile
• Plug-Ins (more in About KNIME → Installation Details)
• R (interactive)
• Erl Wood, CDK
• OpenMS, MassCascade
Exercises: Installation
• Open your KNIME directory
• ~/Desktop/knime_2.9.1
• ./knime.exe
• Memory allocation
• ./knime.ini
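The memory setting can be edited by hand in a text editor. For completeness, here is a hedged sketch of the same change done programmatically, assuming knime.ini follows the usual Eclipse launcher layout in which JVM options such as `-Xmx` appear one per line after `-vmargs`. The function name `set_max_heap` and the sample file content are hypothetical; check your own knime.ini before relying on this.

```python
import re

def set_max_heap(ini_text, size):
    """Return ini_text with the -Xmx line set to the given size, e.g. '2048m'."""
    lines = ini_text.splitlines()
    new_line = f"-Xmx{size}"
    for i, line in enumerate(lines):
        if re.fullmatch(r"-Xmx\d+[kKmMgG]?", line.strip()):
            lines[i] = new_line
            break
    else:
        # No -Xmx entry found: append one. JVM options belong after -vmargs.
        if "-vmargs" not in [entry.strip() for entry in lines]:
            lines.append("-vmargs")
        lines.append(new_line)
    return "\n".join(lines) + "\n"

# Hypothetical file content, for illustration only.
sample = "-startup\nplugins/launcher.jar\n-vmargs\n-Xmx512m\n-XX:MaxPermSize=256m\n"
updated = set_max_heap(sample, "2048m")
print(updated)
```

The function works on text rather than on the file itself, so the result can be inspected before anything is written back. KNIME has to be restarted for a changed knime.ini to take effect.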
Exercises: Starters
• More examples available
from the Examples repository
Exercises: MassCascade
https://bitbucket.org/sbeisken/masscascadeknime/wiki/ExampleWorkflows
Exercises: XCMS
http://www.bioconductor.org/packages/devel/data/experiment/manuals/faahKO/man/faahKO.pdf
Exercises: OpenMS
http://ftp.mi.fu-berlin.de/OpenMS/release-documentation/OpenMS_tutorial.pdf
Final Remarks
• Workflows can make exploratory or repetitive data tasks
easier and save time
• Extensive data pre-processing functionality
• Extensions for statistics, machine learning, bio-, and
cheminformatics
• Integration of R (XCMS) and spectrometry extensions can
help you to build elaborate pipelines and share work
• Can help to organize one’s thoughts.
• It’s actually quite a bit of fun.
Resources
• KNIME Forum
• http://www.knime.org/
• KNIME Learning Hub
• http://www.knime.org/learning-hub
• Quickstart Guide
• http://tech.knime.org/files/KNIME_quickstart.pdf
• Happy to Help
• beisken@ebi.ac.uk
Q&A