Accelerating discovery research_180215.cdr

advertisement
Life Sciences
White Paper
Adopting Future-Forward Discovery Research
Strategies: High Performance Computing in the
Pharmaceutical Industry
About the Authors
Dr Arundhati Saraph
Lead—High Performance Computing (HPC), Life Sciences
At Tata Consultancy Services (TCS), Arundhati works on leveraging HPC to address
scientific problems in niche areas of Life Sciences. These include discovery research,
genomics, metagenomics, predictive toxicology, and bioinformatics, all of which find
application in the pharmaceutical and healthcare sectors. Arundhati holds a PhD from
the National Chemical Laboratory, Pune, and has previously worked with several
research and academic institutions.
Anita Suri
Domain Consultant —HPC, Life Sciences
Anita has expertise in bioinformatics applications and various Life Sciences tools in the
HPC environment. She has over 10 years of experience in areas like molecular docking,
molecular dynamics, high throughput virtual screening, Quantitative Structure Activity
Relationships (QSAR), and pharmacophore modeling. Anita holds a master's degree in
bioinformatics from the University of Pune.
Vivek Sharma
Research Analyst—HPC, Life Sciences
Vivek has over six years of experience in using HPC tools in the Life Sciences domain.
He has worked with different categories of bioinformatics applications such as in silico
drug discovery, genomics data analysis, biomolecular simulation, and molecular
visualization. Vivek holds a master's degree in biomedical engineering from the Indian
Institute of Technology Bombay.
Abstract
High drug discovery and development costs and limited patent cycles are known challenges
of the pharmaceutical industry. An economical drug discovery and development process
which successfully leverages research and development (R&D) can act as a game-changer in
this competitive space. Reduced time-to-market plays a key role in driving performance.
It typically takes about 10 years for a potential new drug to be registered. Today, the R&D
productivity in terms of numbers of new chemical entities (NCEs) registered per unit amount
of investment is declining due to various factors. Also, there is a high iteration rate at the
clinical trial stage because of the stringent Food and Drug Administration (FDA) regulations.
This is resulting in slower drug approvals, and reinforcing the need for increased R&D
productivity.
One way this can be done is by simulating several steps in the drug discovery process.
The simulations, which generally leverage HPC, have demonstrated reduction in product
development lifecycles, as well as in product development costs and improved overall
productivity. In this paper, we discuss how HPC can enable pharmaceutical companies to
accelerate biomedical research and the discovery process.
Contents
1. Introduction
5
2. Application of HPC in Discovery Research
5
Challenges Encountered with HPC platforms
7
HPC enabled Bio-computation and simulation platforms
8
3. Benefits of Leveraging HPC
8
4. Conclusion
9
Introduction
The pharmaceutical industry is at a challenging juncture. Increasing drug development and marketing costs,
shrinking pipelines, expiring patents, demanding regulatory requirements, changing healthcare policies, as well as
the reducing popularity of the blockbuster drug model call for new adaptation strategies. In order to have sufficient
new molecules in the pipeline, there is a need to increase R&D productivity.
This can be achieved by applying innovative and cost-effective predictive research methods using in silico or virtual
studies. Simulations carried out prior to lengthy laboratory experiments and trials can help analyze a lot of 'what if'
scenarios. For instance, the experimental medicinal chemist and biochemist can use in silico screens to choose the
most promising drug candidates and avoid experimental dead-ends early in the new drug discovery cycle. HPC is
the practice of aggregating computing power to deliver much higher performance than one could achieve with
regular computing infrastructure. It helps perform complex simulations at various stages of the new drug discovery
and development process.
Application of HPC in Discovery Research
Trends in biomedical research show a shift towards translational research which requires better understanding of
biology. Today, simulation studies can be used to explore new research avenues, throw up novel insights, and
address problems upfront on the basis of the simulation assessment. These results can potentially enable a
medicinal chemist to select the appropriate molecules that can be taken ahead in the pipeline at various stages of
the discovery research process. Also, disciplines such as bioinformatics, systems biology, and next generation
sequencing (NGS) are data and compute-intense applications. HPC can be leveraged to carry out such biosimulations as well as for bio-computational analysis.
Some examples that show how HPC can be leveraged at different stages in a product life cycle are given below:
Example#1: HPC enables rapid response to the H1N1 virus
A group of scientists from the University of Illinois, Urbana- Champaign, carried out in silico studies to assess
mutation or change in the H1N1 virus that had rendered a potent drug Tamiflu ineffective.1 By appropriately using
the HPC platform, the experiment could be completed in just over an hour. If the simulations had been conducted
on a CPU this would have taken days to complete. The complete study was fairly extensive and required multiple
simulations to be carried out for a system of 35,000 atoms. The study concluded that the mutation that led to the
resistance to Tamiflu was due to the disruption of the ‘binding funnel’, shedding light on the mechanism behind
drug resistance.
[1] NVIDIA UK, University of Illinois: Accelerated Molecular Modelling Enables Rapid Response to H1N1, http://www.nvidia.co.uk/object/illinois-university-uk.html
5
Example#2 High throughput virtual screening using HPC
In silico methods such as high throughput virtual screening (HTVS) can be used in hit identification. This allows
covering a large chemical space and reducing the time, effort, and cost compared to the traditional high
2
throughput screening (HTS). Conventionally, HTS allowed a researcher to test millions of compounds in vitro using
robotics. As the number of molecules that are tested in the chemical space run into millions, their acquisition (or
chemical synthesis) and efficient testing by sophisticated robots is generally cost prohibitive and time consuming.
On the other hand, HTVS takes advantage of fast algorithms to filter the chemical space and successfully select
potential drug candidates. This reduces the number of probable hits to be further screened (in vitro and in vivo)
from the large chemical space of 10⁶⁰ conceivable compounds to a manageable number that can be synthesized,
purchased, and tested.3
A significant reduction in time is also achieved by using computational processes. In a study conducted using a
50,000-core cluster (Amazon Cloud), the computational chemistry outfit Schrödinger analyzed 21 million drug
compounds in just 3 hours for less than USD 4,900. Using the traditional approach, this would have taken years to
4
complete and involved a huge sum of money. The use of an HPC environment for HTVS reduces both the time and
cost involved in the hit identification stage of the new drug discovery process. This approach can also be used at
the lead optimization stage and for drug repositioning.
Example #3 Use of HPC in NGS and translational research
Techniques like NGS can help understand the genetic origin of diseases, assist in developing more potent regimes
for treating disorders, and offer alternate strategies to the standard blockbuster drug approach. The Translational
Genomics Research Institute is leveraging the high throughput gene sequencing technology for precision therapy
trials for children and adults with lethal cancers. It uses this information for diagnosis and treatment tailored to
individuals, for rapidly advancing child cancers.5
HPC, complex data analytics platforms, and high throughput storage systems are some of the key technologies
driving disciplines like NGS and bioinformatics forward. A robust HPC platform that hosts all the data analytics
tools, various bio-computational and bio-simulation tools, as well as databases that can be accessed by multiple
users, can drive research outcomes.
Research on biomarker analysis using NGS relies on data and compute-intense activities. To derive meaningful
analysis from the raw results thrown up by the sequencing machines or platforms, a number of computational and
data reorganization steps are required. A lot of information stored in databases pertaining to research data,
published work, omics, and clinical trials can be used to derive these insights. Such analysis will aid in better target
selection, research on the development of biomarkers for diagnostics, and in the long run reduce phase II and III
attrition. Studies on binding between potential drugs and proteins in the body can help identify cross binders that
can be eliminated ahead of lengthy and expensive experimental stages of development. Figure 1 depicts how
information from various sources can be used to conduct bio-simulations and computations upfront before
conducting lengthy experimental procedures to test several 'what if' scenarios.
[2] Supercomputing facility for Bioinformatics and Computational Biology, IIT Delhi, Virtual high throughput screening: will it help for lead identification?, accessed on 10 February
2015, www.scfbio-iitd.res.in/seminar/incob/IndiraGhosh.doc
[3] International Journal of Advanced Research in Computer Science and Software Engineering, Insilico Methods in Drug Discovery-A Review, May 2013, accessed in 11 February
2015, http://www.ijarcsse.com/docs/papers/Volume_3/5_May2013/V3I5-0282.pdf
{4} Giagaom,Cycle Computing spins up 50K core Amazon cluster, April 2012, accessed on 4 February 2015 https://gigaom.com/2012/04/19/cycle-computing-spins-up-50k-coreamazon-cluster/
[5] DataDirect Networks, Genomics Research (DDN case study) 2011, accessed on 28 January 2015 http://www.ddn.com/pdfs/Tgen_Case_Study.pdf?982861
6
Pharmacogenomics approach linking biomarkers to therapeutics
Successful outcome
in select population
Hit to
lead
Target
to hit
Lead
optimization
Preclinical
Preclinic
al
Phase
1
Phase
II
Bio-simulation platforms
Bio-informatics platform
Genomics platform
Model based drug
development
n Novel target identification
n
n
n
Phase III
Submission
to launch
Launch
Information repository
n
n
n
n
Research
Published
Clinical
NGS
Figure 1: Use of the In silico Approach at Various Stages in the Product Life Cycle of a New Chemical Entity
Challenges encountered with HPC platforms
Computational analysis in each area of discovery research requires a different set of tools, each with a specific
compute, input output (I/O), and throughput requirement. For instance, the amount of compute, I/O, and memory
requirement for optimal performance on a bio-simulation run could be different from resource requirement for
genomics tools carrying out sequence alignment. Suboptimal use of resources such as memory or I/O can even
lead to poor performance or application failures.
Furthermore, to carry out accurate and speedy analysis of the clinical data, the data has to be accessed, processed,
analyzed, and visualized. Often, this requires movement of data from the storage device to cluster nodes for further
analysis, which becomes challenging as the data size increases. This research data generated through microarray
and cell analyzer systems is image based and generally requires both short and long term storage. Also, the data
and results generated at the discovery research step as well as various steps in the new drug discovery cycle have to
be studied and analyzed by various groups working across the organization. This increased movement of data back
and forth between different types of temporary storage can create bottlenecks and lead to suboptimal
performance. Storage thus becomes a critical resource requirement for all data-driven research, especially in NGS
analysis, as there is a huge need for data access and data storage—both short term and for data archival purposes.
Considering these challenges, it is difficult to provide optimal HPC architecture in scenarios with the following
diverse requirements:
n
Algorithmic complexity and high compute-intensive platform requirements
n
Distributed computing solutions needing low latency network-centric HPC
7
n
High throughput file systems which can access and synchronize data at multiple gigabytes
n
Architectures providing large-scale memory
n
Large data analysis and information extraction requiring data mining
n
Acquiring information in multiple data formats from a variety of sources requiring low latency and high
throughput systems
n
Graphics and visualization demands asking for near real time solutions
Complete knowledge and understanding of the resource requirement for each class of applications used in the
industry is required to address the above needs. HPC enabled platforms that host workflows to carry out various
analyses in areas of genomics, bio-simulations, or bioinformatics enable scientists to focus on analyzing results and
address domain related problems without spending time on resolving software or infrastructure-related issues.
HPC enabled Bio-computation and simulation platforms
A bio-computation and simulation service through the use of workflows or pipelines can be used as a prepackaged solution to address problems in discovery research. The main focus would be to see how computations
go a long way in supporting the traditional R&D activity. Recent advances in software techniques have made it
possible to combine software tools and information bases, making it possible to offer predefined software
solutions to domain problems. The scientific workflow system is a specialized form of a workflow management
system designed specifically to compose and execute a series of computational or data manipulation steps.
Essential requirements of a HPC-enabled work-flow or pipeline are as follows:
1. Strong domain knowledge, both breadth and depth
2. A set of software technologies that can create a user friendly and expressive interface
3. An assortment of well-trusted software tools
4. A software suite that allows the combination of the tools and takes care of the integration
5. A cluster management suite that manages the computer hardware—usually a network of clusters
A robust HPC infrastructure should cater to variable work load requirements. It should use optimal resources
through cluster and data management tools and job scheduling software to efficiently run the simulations.
Benefits of Leveraging HPC
By applying the HPC approach pharmaceutical enterprises can:
n
Accelerate drug discovery: HPC can be used to accelerate several steps like target identification and validation,
hit identification, cross binding studies, and drug repositioning in the new drug discovery process.
8
n
Explore new research avenues: HPC technology can successfully execute large simulations which otherwise
take longer to complete or are abandoned due to computing resource shortfalls, thus enabling new research
opportunities.
n
Enhance organizational focus: HPC enabled platforms that host work-flows to carry out various analyses in the
areas of genomics, run bio-simulations, or conduct bioinformatics analysis allow scientists to focus on analyzing
results and address domain-related problems without spending time on resolving software and IT related issues
n
Execute tasks faster: Use of HPC enabled workflows and platforms simplify solution usage and execute tasks
faster by reducing the simulation runtime
Conclusion
Complex, compute-intense application areas in the early stages of drug discovery such as target identification and
validation, hit or lead identification, cross binding studies, and NGS analysis can benefit from the use of HPC.
With the constant growth in data volumes and the requirement to carry out multiple analyses there arises a need to
move large volumes of data from storage device to cluster nodes. To achieve this, there is a need for high
performance as well as scalable I/O, as performance bottlenecks at any of these steps result in delays in the
downstream analysis. To avoid this, there should be an efficient access of data across servers as well as hassle-free
data movement between different tiers of storage to support variable work load. This can be realized with the use
of an HPC system that enables optimal utilization of the infrastructure through cluster and data management tools
providing optimal performance enhancement.
HPC enabled platforms or work-flows are instrumental in addressing problems through virtual or in silico studies at
various steps in the discovery research. Use of HPC enabled platforms in other areas in downstream drug discovery
including clinical trial analysis, supply chain management, drug safety analysis, and health economics can also be
beneficial. An HPC platform that hosts end-to-end applications and conducts seamless integration of workflows
linking the discovery, development, and clinical trial analysis to the downstream supply chain can enable a
pharmaceutical company to efficiently carry out new drug discovery and ultimately reduce the time-to-market.
9
About TCS Life Sciences
With over two decades of experience in the life sciences domain, TCS offers a comprehensive
portfolio in IT, Consulting, KPO, Infrastructure and Engineering services as well as new-age business
solutions including mobility and big data catering to companies in the pharma, biotech, medical
devices, and diagnostics industries. Our offerings help clients accelerate drug discovery, advance
clinical trial efficiencies, maximize manufacturing productivity, and improve sales and marketing
effectiveness.
We draw on our experience of having worked with 7 of the top 10 global pharmaceutical companies
and 8 of the top 10 medical device manufacturers. Our commitment towards developing next
generation innovative solutions and facilitating cutting-edge research - through our Life Sciences
Innovation Lab, research collaborations, multiple centers of excellence and Co-Innovation Network
(COINTM) - have made us a preferred partner for the world's leading life sciences companies.
Contact
For more information, contact lshcip.pmo@tcs.com
Subscribe to TCS White Papers
TCS.com RSS: http://www.tcs.com/rss_feeds/Pages/feed.aspx?f=w
Feedburner: http://feeds2.feedburner.com/tcswhitepapers
About Tata Consultancy Services (TCS)
Tata Consultancy Services is an IT services, consulting and business solutions organization that
delivers real results to global business, ensuring a level of certainty no other firm can match.
TCS offers a consulting-led, integrated portfolio of IT and IT-enabled infrastructure, engineering and
assurance services. This is delivered through its unique Global Network Delivery ModelTM,
recognized as the benchmark of excellence in software development. A part of the Tata Group,
India’s largest industrial conglomerate, TCS has a global footprint and is listed on the National Stock
Exchange and Bombay Stock Exchange in India.
IT Services
Business Solutions
Consulting
All content / information present here is the exclusive property of Tata Consultancy Services Limited (TCS). The content / information contained here is correct at
the time of publishing. No material from here may be copied, modified, reproduced, republished, uploaded, transmitted, posted or distributed in any form
without prior written permission from TCS. Unauthorized use of the content / information appearing here may violate copyright, trademark and other applicable
laws, and could result in criminal or civil penalties. Copyright © 2015 Tata Consultancy Services Limited
TCS Design Services I M I 03 I 15
For more information, visit us at www.tcs.com
Download