
High Throughput Screening Informatics
Running title: HTS data handling
Xuefeng Bruce Ling
Stanford Medical Center
Stanford University, CA
Email: xuefeng_ling@yahoo.com
Supplementary materials: http://hts.stanford.edu
Abbreviations: HTS, high throughput screening; QC, quality control; SOP, standard operating procedure; API, application programming interface; GLP, good laboratory practice; GPCR, G-protein-coupled receptor; PKDM, pharmacokinetics and drug metabolism
Key words:
ABSTRACT
High throughput screening (HTS), an industrial effort to leverage
developments in the areas of modern robotics, data analysis and control
software, liquid handling devices, and sensitive detectors, has played a pivotal
role in the current drug discovery process, allowing researchers to efficiently
screen millions of compounds to identify tractable small molecule modulators of a
given biological process or disease state and transform these into high content
lead series. As increases in HTS throughput have greatly expanded the volume and complexity of data and the information content of screening output, discovery research demands a clear corporate strategy for scientific computing and the subsequent establishment of robust, enterprise-wide (and usually globally accessible) informatics platforms that enable complicated HTS workflows, facilitate HTS data mining, and drive effective decision-making. The purpose of this review is,
from the data analysis and handling perspective, to examine key elements in
HTS operations and some essential data-related activities supporting or
interfacing the screening process, outline properties that various enabling
software solutions should have, and offer some general advice for corporate
managers with system procurement responsibilities.
INTRODUCTION
The completion of the Human Genome Project has significantly advanced
our understanding of human biology and the nature of many diseases, resulting
in a plethora of novel therapeutic targets. High throughput screening (HTS), an
industrial effort to leverage developments in the areas of modern robotics,
scientific computing and control software, liquid handling devices, and sensitive
detectors, has played a crucial role in the current drug discovery process, allowing
researchers to effectively and efficiently screen millions of compounds to identify
tractable chemical series (HTS hits) with the requisite biological activity against
the target of choice. Subsequently, HTS hits are followed up as starting points for drug design, transformed into high content lead series, and used as tool compounds to probe the interaction or role of a particular biochemical process in biology.
As a brute-force approach and a complex industrial process, HTS
“manufactures” an unprecedented amount of experimental data -- usually
observations about how some biological entity, either proteins or cells, reacts to
exposure to various chemical compounds -- in a relatively short time. Given the
cost, technical specialization, and operational sophistication involved, the application
of small molecule screening and discovery to the development of chemical probe
research tools has largely been confined to pharmaceutical/biotech companies. There is,
however, a recent trend in academia to set up HTS facilities with the capacity to rapidly
screen millions of compounds on a routine basis, a capacity that previously only
industry had. The scale and complexity of current HTS campaigns quickly outstrip
the capabilities of desktop, spreadsheet-like data analysis software. As HTS
technology moves from the industrial sector to the academic and small biotech
sectors, strategies for delivering cost-effective and scalable computing
solutions on a limited budget become important. The purpose of this review is,
from the informatics perspective, to examine the key elements in HTS operations
and some essential data-related activities supporting or interfacing the screening
process, outline requirements and properties that various enabling scientific
computing systems should have, and offer some general advice for managers
with system procurement responsibilities.
MAIN TEXT
High-throughput screening is a key link in the chain comprising the
industrialized lead discovery paradigm. Figure 1 diagrams the key operational
elements and necessary supporting databases in HTS. To enable complex HTS
workflows and strive for the best possible decisions for various HTS scenarios,
development, deployment and maintenance of enterprise data handling systems
are essential. Data management requirements include the effective tracking of
various types of information transfer, robust storage of large data sets of
compound structures, biological samples, various types of containers, registered
assay catalogs, primary/secondary screenings, dose-response assays, specificity
assays, assay/automation workflow SOPs, and efficient interface to other
operational and scientific data sets, including PKDM (pharmacokinetics and drug metabolism). The integration of various
information resources and software applications can significantly promote
multidisciplinary communications and teamwork between HTS and other
corporate functions, including biology, laboratory automation, sample bank,
analytical/computational/medicinal chemistry, scientific computing, and project
teams.
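As an illustration of the relational backbone such a platform needs, the following minimal sketch shows how compounds, containers, assay catalogs, and plate-level results might be tracked; the table and column names are hypothetical and are not drawn from any particular vendor schema.

import sqlite3

# Minimal, hypothetical schema sketch for core HTS tracking entities.
conn = sqlite3.connect("hts_demo.db")
conn.executescript("""
CREATE TABLE compound (
    compound_id   TEXT PRIMARY KEY,
    smiles        TEXT,
    registered_on DATE);
CREATE TABLE container (
    container_id     TEXT PRIMARY KEY,
    container_type   TEXT,               -- vial, 384-well plate, ...
    compound_id      TEXT REFERENCES compound(compound_id),
    concentration_um REAL,
    volume_ul        REAL);
CREATE TABLE assay (
    assay_id     TEXT PRIMARY KEY,
    target       TEXT,
    assay_format TEXT,
    sop_url      TEXT);
CREATE TABLE plate_result (
    assay_id       TEXT REFERENCES assay(assay_id),
    plate_barcode  TEXT,
    well           TEXT,
    compound_id    TEXT REFERENCES compound(compound_id),
    raw_signal     REAL,
    pct_inhibition REAL,
    qc_pass        INTEGER,              -- 1 = passed plate/well QC
    PRIMARY KEY (assay_id, plate_barcode, well));
""")
conn.commit()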
“Buy versus Build” has been the constant dilemma for corporate managers
implementing HTS and supporting HTS data analysis. Off-the-shelf software packages are
currently available that can be acquired to "plug and play" in support of HTS. They
are mass-produced, containing generic content not developed specifically for one
organization or user population. Pieces of the HTS process may therefore be
well served by a given vendor and its marketed packages, but a single vendor
delivering a complete solution, as vendors usually claim, is unlikely.
The various elements in this case will require front- and back-end application
support via their programming interfaces or APIs (Application Programming
Interfaces). While high throughput technologies have dramatically increased the
pace of research, they have also dramatically increased the complexity of HTS
and HTS-related processes. In reality, it is practically impossible to “build”, or develop, a
complete custom enterprise solution from the ground up. The dilemma
therefore becomes “what to buy” and “what to build”.
Whatever the final “build versus buy” decision, one has to thoroughly digest
the business analysis and afterwards integrate both the business processes and the
retained software. As HTS processes have grown more complex, they have also grown
more sensitive to even small process changes, making process optimization and process
yield management essential to remaining competitive. Commercial offerings such as
BergenShaw International's Focus product, for example, aim to help high throughput
laboratories respond rapidly to change by quickly identifying the process factors,
and combinations of factors, associated with yield loss or yield gain. Developing a
complete custom package for HTS is a formidable task requiring high levels of
cooperation between the various teams that produce and consume information and those
that develop the data management tools.
Buy Versus Build: A Battle of Needs
By Laura M. Francis with Randy Emelo
It's the question that haunts the dreams of every training professional, the one that gives adults nervous stomach butterflies as though it were the first day of school. Buy versus build, the perpetual dilemma for people who
are implementing e-learning.
Build typically means creating an e-learning product from the ground up. That includes determining learning
objectives, writing content, creating graphics, and crafting the design. Building also must include testing the product,
by such means as pilot sessions, to ensure that it functions as anticipated.
In addition to buying off-the-shelf e-learning products or building them from the ground up, you can also
customize e-learning by making cosmetic changes to a program's content and graphics, so it appears to be built just
for you. I put that process under the build umbrella since the product must be modified, but customizing typically
requires fewer resources and a smaller degree of commitment than starting from scratch.
Although the decision to buy or build may seem daunting, it can be boiled down into three factors to consider:
needs, resources, and uniqueness.
Needs
Begin by identifying your needs, which are critical to helping you narrow your focus and determine the features
most important to you. To pinpoint your needs, ask yourself these questions:
* What organizational objectives must I meet?
* What skills do I want to build?
* What information do I want to pass along in order to improve knowledge?
* What behaviors do I want to support and enhance?
You'll rarely be able to meet all of the needs you discover; therefore, you must prioritize them. Match your most
pressing needs to the learning objectives of the program. For example, if you have a strong need for improved
communication skills, look for a product with one or more learning objectives that meet that need. Such objectives
might read, "discover three types of basic communication" or "develop a communication-based action plan for
addressing an employee's poor work habits."
Resources
This factor is one that most people wish they could avoid thinking about. However, examining your resources is
critical. If resources means just money to you, think again. Although money does play an important role, two other
pieces of the puzzle need to be considered as well: time and personnel.
Time. When considered in terms of buying or building e-learning, time takes into account the following:
* how long you have to make your decision
* how long you have to develop an e-learning product, including testing time if necessary
* how long you have to roll out or implement the product within the organization.
To understand the importance of time resources in the buy versus build decision, consider this scenario. You're
given unlimited funds for your e-learning purchase, but only a three- to six-month timeframe to review, purchase, and
roll out the e-learning product. In this case, time will be the deciding factor in your choice of buying or building. By
analyzing your own situation against the three variables above, you can accurately determine the importance of time
in your decision.
Personnel. Another often-overlooked resource factor is personnel. That includes people needed for both
implementation and support of e-learning; each group plays an integral role in your decision. As you analyze your
resources, you must determine whether you have not only the personnel to implement the e-learning product within
the organization (you'll need an advocate or champion to push the initiative within the company), but also whether
you have the needed number of technical support staff available (for example, IT and administrative support people).
If you don't have the personnel you anticipate needing, you must identify where you can obtain the required support
as well as how much that support will cost, which brings us to the third resource.
Money. Both a limited and an unlimited budget can influence your choice of buying versus building. When
considering your budget, you must look at the short-term and long-term benefits of the overall investment--that
includes analyzing the effects of not implementing any e-learning and in turn not developing the skills, knowledge,
and behaviors you identified as needs. When building, you may incur a large, up-front, one-time fee, but the
investment may be lower over time. (The product will be your exclusive property once it's paid for in full.) A built
product will typically need to serve at least 500 users in order to qualify as a good investment.
When buying e-learning, you must calculate how much money you anticipate paying over the life of the product,
which is typically 12 to 24 months for off-the-shelf products. For example, you may think you're spending less for off-the-shelf e-learning, but you need to take into account annual renewal and maintenance fees as well as costs for
upgrades or add-ons that may become available. Those hidden expenses may end up costing you more over the life
of the product than you had anticipated or budgeted.
To combat rising costs, identify and document both your company's and the supplier's role and responsibilities.
That can help you pinpoint added fees up front. For example, identify who will provide technical support. If it's the
supplier, determine whether that's included in the purchase price, how much it will cost if it's not included (is it a flat
monthly or yearly rate or do you have to pay per request?), and how long the support will be available.
Uniqueness
The third factor to consider when deciding whether to buy or build e-learning is uniqueness. Are you using e-learning to teach a proprietary business process or skill? Do you need the e-learning product to cover general skills
or subjects (for example, conducting annual reviews) or more targeted information (for example, giving effective
feedback to hostile, resistant, and ambivalent employees)? Can you find distinct connections between seemingly
mismatched off-the-shelf products and your needs (such as an off-the-shelf communication product that you can pair
with your effective feedback initiative)? Does either your industry or corporate culture (or perhaps both) preclude a
generic e-learning product? Answering those questions can help you determine the degree of uniqueness you need
from e-learning products.
If you're transferring a proprietary business process or skill (such as patented information for a pharmaceutical
company) into an e-learning program for employees, you'll typically need to custom-build a product. You may
consider customizing one that already exists, but generally a proprietary business process is so unique that no
existing products will meet your needs.
If your content needs are more generic, you can still customize an off-the-shelf product to give it the feel and
flair of your organization. That could include making such simple changes as putting the company's logo in the
header or slightly revising the wording to match organizational language. Ultimately, you must decide how willing you
are to forego originality for quicker delivery time.
Drumming up support
Once you've analyzed needs, resources, and uniqueness, you should also consider a supplemental decision
factor: the support or buy-in you can expect from people within the organization. Although this factor may not affect
your decision to buy or build directly, it will certainly affect how people within the organization receive your decision.
Necessary support falls into three categories:
* support of upper-level management and executives
* support of like-level colleagues
* support of end users.
Management. The need for support from upper-level management and executives is easily apparent; they have
the power to feed your idea to the lions or cut you a check with lots of zeros on it. Because this group has so much
influence, people often spend all their time "selling" them and forget the other two important groups whose support
they need.
Colleagues. This group constitutes your peers, those people you must interact with on a daily basis. Their basic
level of influence with upper-level management and executives can affect the outcome of your initiative, so it's
important to gain their support. People in this group need to understand your goals for the e-learning product, how
you plan to pay for it (especially since your budget may include funds they sought for their own uses), who will use it,
and how their own jobs will be affected by the initiative.
End Users. This group is the oft-forgotten low man on the totem pole. They have the hidden power to scuttle
your entire plan by refusing to use the product, which will convince your colleagues and executives that the
investment was wasted. End users may not intentionally sabotage the initiative; their resistance may stem from not
understanding why they have to use the product or resenting not being part of the decision-making process (whether
or not their opinion would've influenced your decision). Or they may resist because the product doesn't meet their
needs. Don't downplay the importance of this group--their support is critical.
Although it's not always possible to get the support of all those individuals, open and direct communication with
each of them can help your e-learning implementation succeed, whether you buy or build. Armed with the
information in this article you should be able to choose wisely--and sleep at night.
For the build option, data storage models or objects need to be developed for
each of the subsystems, and APIs will have to be developed to permit
communication between them. There are many factors to be weighed in choosing
the best tools and approaches for developing HTS systems. In our own efforts, we
have benefited greatly from the use of object-oriented programming methods.
These methods emphasize a modular design that produces systems that are
extensible, easy to maintain, and code-efficient. Every consideration should be
given to ease of use and flexibility, but in the end the flexibility desired by the
users must be carefully balanced against both the flexibility allowed by the
automated assay systems and the IT cost of maintaining overly complex software.
These design and flexibility choices have far-reaching consequences for the
success of later obtaining tractable, high quality lead series.
An effective and efficient HTS campaign obviously depends significantly
on the successful procurement and management of compound structures,
samples, containers, and the requests coming from the various HTS stages.
Although the assay developer always comes up with a method to consistently measure an experimental
outcome, the assay characteristics are different depending on where one sits within the spectrum of drug discovery.
At one end of the spectrum are assays whose primary purpose is to identify chemical leads from the thousands
of compounds found in a chemical library. These assays are typically for high-throughput primary screening (HTS)
and are usually multiwell and very robust.
On the other side of the spectrum are assays for clinical diagnostics, which are robust and well characterized
and have to go through a rigorous analysis of sample lots, reagent shelf life, intra- and interassay variability
performed in a good laboratory practice (GLP)-compliant manner to achieve FDA approval for marketing. Because I
have been fortunate enough in my tenure in the pharmaceutical industry to develop assays for both HTS and
therapeutic target teams, I can highlight both the differences and similarities between assays in these two areas.
Assay development for HTS means developing an assay that is target-specific, multiwell, robust, and capable of
automation. It can be cell-based or biochemical. The assay is sometimes given to the HTS team by the therapeutic
group but often in a format that is not suitable for an HTS and needs to be optimized or redeveloped from scratch.
Although there are exceptions, these assays are homogeneous and low volume. They can be read by fluorescence,
luminescence, absorbance, radioactivity, or any other output that is available on plate reading systems.
The assays are suited for the running of thousands of compounds per run, so each plate must be designed with
its own controls, to ensure quality of results. The samples are typically run in singlet, and therefore the assay must
perform so well that one can tell the difference between an active compound or "hit" and noise in the assay. This is
validated using the z-factor analysis of variability. The assays are developed only upon request by a therapeutic area
when a target is ready to enter screening and the target itself has been validated. Depending upon the throughput or
the number of compounds screened, these assays can be completed in a matter of days to weeks so there is no
need to be prepared to run these assays on an ongoing basis. These assays are run as in a factory where
production output is important and the assay itself must fit into the process.
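For reference, the Z'-factor mentioned here is conventionally computed from the means and standard deviations of the positive and negative control wells,

Z' = 1 - \frac{3\,(\sigma_{+} + \sigma_{-})}{\lvert \mu_{+} - \mu_{-} \rvert},

with values above roughly 0.5 generally taken to indicate an assay robust enough for single-point screening.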
Given the various challenges and complexities of any HTS operation, it is
imperative that all interested parties remain thoroughly engaged and involved
throughout the entire HTS campaign. From the HTS function's point of view,
screening scientists should, and usually do, integrate well within the corporate
drug discovery environment. For example, the operational integration of various
screening activities such as compound supply, assay design and execution, data
analysis and tracking, and compound quality control can obviously have a profound
impact on the efficiency and effectiveness of HTS campaigns. Striving for the best
possible decisions for various HTS scenarios, such as screen prioritization,
timing for assay to screen transition, choices of assay format, compound input
format (e.g. number, type, and concentration), choices of secondary and
selectivity assays, and when to rescreen, can be daunting tasks without
assistance of comprehensive real time information.

HTS campaign and business process standardizations

Importance of a controlled vocabulary
A multidisciplinary team will typically advance a specification to formally describe interoperable business processes and
business interaction protocols for Web services orchestration. Such a specification allows users to describe business
process activities as Web services and to define how they can be connected to accomplish specific tasks.
The co-authors rightfully view customer adoption as the most important hurdle in making a business
process standard meaningful--and that means ubiquitous ISV support.
"To solve real-life business problems, companies may need to invoke multiple Web services applications
inside their firewalls and across networks to communicate with their customers, partners, and suppliers,"
said Diane Jordan of IBM, co-chair of the OASIS WSBPEL Technical Committee. "BPEL4WS allows you to
sequence and coordinate internal and external Web services to accomplish your business tasks. Thus, the
result of one Web service can influence which Web service gets called next, and successful completion of
multiple Web services in a process can be coordinated."
John Evdemon of Microsoft, co-chair of the OASIS WSBPEL Technical Committee, added, "The participants
in this Technical Committee are committed to building and delivering standards-based interoperable Web
services solutions to meet customer requirements. Business processes are potentially very complex and
require a long series of time- and data-dependent interactions. However, BPEL4WS allows companies to
describe sequential interactions and exception handling in a standard, interoperable way that can be shared
across platforms, applications, transports and protocols."
"Through OASIS, a large group of organizations are joining together to further the evolution of BPEL4WS
from specification to standard--within the context of an open, publicly vetted process. Active participation
from the OASIS membership at-large, which includes many business process solution vendors as well as
customers, will provide valuable input on usage cases and implementation scenarios that will result in the
broadest possible industry adoption," commented Karl Best, vice president of OASIS. "We plan to work
closely with organizations such as W3C, UN/CEFACT, and others in completing the 'big picture' of Web
services."
"W3C's members believe coordination is vital to ensure the delivery of timely and thorough technical
solutions that truly meet the needs of customers, especially in the area of Web services," explained Steve
Bratt, Chief Operating Officer for W3C. "To that end, W3C's Web Services Choreography Working Group has
invited representatives of the OASIS WSBPEL Technical Committee to attend its second face-to-face
meeting in June. We look forward to building on the technical coordination already established between
OASIS Technical Committees and W3C Working Groups."

HTS automation support and interface with data analysis pipelines
Assay data for HTS is generated on a wide variety of instruments or “readers”.
Virtually all these readers are controlled via software provided by their
manufacturer. Instrument control software varies widely in complexity but usually
has menu options allowing some customization of its exported (e.g., ASCII text)
data files. At present, there are no standard data formats adopted by the
instrument makers, so developers of an HTS calculating system need to write a
number of routines or “data filters” that parse these text files into a format or formats
compatible with their system. To limit the complexity and variety of the data
filters, the automation team can implement process controls to standardize the
formats used by the various readers.
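As a minimal sketch of such a data filter, the routine below parses a hypothetical tab-delimited reader export into well-keyed records; the column names 'Well' and 'Signal' are assumptions for illustration, since every instrument vendor defines its own export layout.

import csv

def parse_reader_export(path, plate_barcode):
    """Parse a tab-delimited plate reader export into well-level records.

    Assumes one header row with columns named 'Well' and 'Signal'; real
    instrument exports vary, and each format needs its own filter.
    """
    records = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            records.append({
                "plate": plate_barcode,
                "well": row["Well"].strip(),        # e.g. "A01"
                "signal": float(row["Signal"]),     # raw reader counts
            })
    return records

# Example usage: rows = parse_reader_export("plate_000123.txt", "000123")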

HTS data analysis, storage and QC requirements
Challenge: automation
In an overly simplified scenario, a screen essentially brings a compound plate and an
assay plate together, from which a readout is taken. Of course, the compound plate
typically needs to be diluted and/or transferred, which requires a pipetting station.
Another station is needed to start the biochemical plate or remove media and replace
buffer from a cell plate. The order of reagents added to the plate needs to be decided in
advance because it is typically the last reagent that should start the reaction. After mixing
the diluted compounds and reagents or cells, the plates may need to be incubated, before
finally going through the detection devices.
There are two common approaches to accomplishing this workflow, which are subject to
yet "another debate topic," says Roche's Garippa. Screening facilities may either use a
robotic system or a series of workstations. "Robotic systems produce less variability," he
says. "You also need dedicated personnel and a tight partnership with your service
provider since down-time on a robotics platform can be very expensive." The high
capacities on robotic platforms are very attractive because the primary screen can be
finished very quickly.
On the other hand, workstations yield more flexibility to work around technical problems,
and if they do occur, typically there are easily available substitutions with backup
instrumentation. "If the primary pipettor breaks, you just use another. Each of the
workstations now have stackers, so you can load a stack of plates and have walk-away
capability, but you can't walk away for many hours at a time as with robotic systems,"
says Garippa.
BMS's Cacace says that the choice depends on the assay and the flexibility of the robotic
system. "Some robotic systems do not have the flexibility to deal with varying assay
formats," she says. "It's depends on what equipment is on automated system and how
flexible it is, as well as how dynamic is the scheduler [informatics platform that controls
the movement of the robot]. Long incubation steps, for example, may cause a lot of
down-time on the robot, which decreases throughput and warrants a switch to a
workstation system."
Merck takes an "industrialized approach" to HTS assays and relies on robotic platforms.
"The problem with semi-automated workstations is that scientists are burdened with
unnecessary repetition; automation just makes our life easier," says Peter Hodder, PhD,
head of HTS robotics, Merck Research Laboratories, North Wales, Pa. "Our robotic
systems are flexible and modular, so we can move detectors or liquid handlers on and off
depending on the needs of the assay."
Merck runs their robots continuously, either validating an HTS assay or running the
screen itself. The trick is to develop an assay that is scalable to a robotic system. "We
make sure that the assay is robust and that assay development scientists anticipate how
the protocol will run on the robotic platform," continues Hodder. "They are aware that the
robot may not be able to reproduce what can be done by hand, and work with automation
scientists to design the HTS protocol accordingly. When the protocol is designed with
this end-point in mind, the scale-up to HTS is a gradual and straightforward process."
Challenge: quality control
Running a perfect screen is nearly impossible. The trick to success is detecting any
problems that may have arisen along the way. Taking multi-parameter measurements
during the screen, using appropriate controls, and monitoring the data in real time during
the screen are all part of the quality control (QC) steps necessary to ensure meaningful
data. Analyzing QC data in real time while the screen is still running allows researchers
to correct any obvious equipment problems.
"We need quick visualization tools or automatic cut-off points to detect pipetting
problems, clogged tips, consistency in liquid volume in all wells, tip carryover from plate
to plate, compound autofluorescence, bubbles in wells, and many more," says Garippa.
"The QC software is key: it allows us to find a multitude of problems in the assay, from
pipetting to washing to incubation. I wish I could say [QC] was all automated. But
because it's so important, we end up taking a quick glance at each plate to see if there are
any obvious problems. We are not yet at a point where we can completely trust in silico
QC."
The popular software packages for HTS data management and QC include CyBi-SIENA
from CyBio AG, Jena, Germany; Screener from Genedata, Basel, Switzerland;
ActivityBase suite from IDBS, Surrey, UK; and DecisionSite for Lead Discovery from
Spotfire Inc., Somerville, Mass. Many pharmaceutical companies also rely on databases
and informatics tools developed in-house.
"We have a custom database for viewing results and QC," says Merck's Hodder. "It's a
large and essential IT component in our operation, responsible for acquiring data from the
robots and for data management. We define statistical parameters to determine whether
the plate is 'healthy' without human intervention," he says. "The screen data is also linked
to compound information."

Business decisions: home-made vs. commercial software
There are several commercially available software packages for HTS (e.g.,
ActivityBase, and offerings from MDL Information Systems, Accelrys, and Tripos). These
packages vary in their capabilities and should be examined with the point of view that
pieces of the process may be best served by a given package, but that supplying a
complete solution is unlikely. The various elements in this case will require front- and
back-end application support via their programming interfaces or APIs (Application
Programming Interfaces).
Developing a complete custom package for HTS is a formidable task requiring
high levels of cooperation between the various teams that produce and consume
information and those that develop the data management tools. Data storage
models or objects need to be developed for each of the subsystems, and APIs
will have to be developed to permit communication between them. (In this
discussion, the term “object” refers to the representation of a set of data as a
series of text and numerical variables in computer memory).
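To make the notion of such an object concrete, here is a minimal sketch of a plate-result object together with one API routine that other subsystems could call; the class and field names are purely illustrative.

from dataclasses import dataclass, field
from typing import Dict

@dataclass
class PlateResult:
    """One screened plate held in memory as text and numerical variables."""
    plate_barcode: str
    assay_id: str
    raw_signals: Dict[str, float] = field(default_factory=dict)     # well -> reader counts
    pct_inhibition: Dict[str, float] = field(default_factory=dict)  # well -> normalized value

def normalize(plate: PlateResult, high_mean: float, low_mean: float) -> None:
    """Fill in percent inhibition from raw signals using the plate control means
    (assumes high_mean and low_mean differ)."""
    window = high_mean - low_mean
    for well, signal in plate.raw_signals.items():
        plate.pct_inhibition[well] = 100.0 * (high_mean - signal) / window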
Some critically important properties that HTS calculating and review software
should have include: (1) flexibility – end users should be able to place various control
types and choose assay mappings for the calculators from a simple interface.
Other specifications needed for proper calculations should be supplied to the
calculator from predefined prototypes as well. Caution should be exercised in
permitting too much flexibility in the calculating software, as the automated
processes themselves are usually the primary reason for limiting flexibility. (2)
graphical user interface – numerical trends are best perceived using graphs of
the data. If possible, incorporate interactive graphical tools for labeling points,
navigating through the plates and subsetting the data. (3) speed and ease of use
– the HTS laboratory is a hectic place. Any tool used by screening personnel
should be fast, intuitive to use, and robust. Wherever possible, precompute
variables and store them in RAM to allow efficient random navigation through the
data. Choices from the interface should be from list controls rather than edit
fields. Storing the configuration of the calculator settings with the data object
generated by the calculator allows users to correct for minor changes in the
protocol (e.g., the location of the controls was accidentally reversed) at run time.
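A minimal sketch of such a stored calculator configuration, with hypothetical field names, and of a one-line correction for the reversed-controls example might look like this:

from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class CalculatorConfig:
    """Settings saved alongside each calculated data object, so that minor protocol
    deviations can be corrected by editing the configuration and recalculating."""
    assay_id: str
    plate_format: int                              # 96, 384, or 1536 wells
    control_wells: Dict[str, Tuple[str, ...]]      # e.g. {"high": ("A23", "B23"), "low": ("A24", "B24")}
    normalization: str = "percent_inhibition"

def swap_controls(cfg: CalculatorConfig) -> CalculatorConfig:
    """Correct a plate map in which the high and low controls were accidentally reversed."""
    swapped = dict(cfg.control_wells)
    swapped["high"], swapped["low"] = cfg.control_wells["low"], cfg.control_wells["high"]
    return CalculatorConfig(cfg.assay_id, cfg.plate_format, swapped, cfg.normalization)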

Computing algorithm overview (single dose vs multiple dose), high
content screening
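Although the details are assay dependent, the core calculations reduce to two standard forms (given here as a general reference, not as a prescription for any particular platform). Single-dose (primary screening) results are typically normalized to the plate controls as percent inhibition,

\%\text{inhibition} = 100 \times \frac{\mu_{\text{high}} - x}{\mu_{\text{high}} - \mu_{\text{low}}},

where x is the well signal and \mu_{\text{high}}, \mu_{\text{low}} are the means of the uninhibited and fully inhibited control wells. Multiple-dose (dose-response) data are usually fit with a four-parameter logistic model, in one common parameterization

y(c) = \text{bottom} + \frac{\text{top} - \text{bottom}}{1 + (c/\mathrm{IC}_{50})^{h}},

where c is the compound concentration and h the Hill slope. High content screening follows the same pattern but produces multiple such readouts per well, each of which can be normalized and fit independently or combined into multivariate scores.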

Quality control algorithms and data visualization
Statistical Quality Control throughout the High-Throughput Screening Process
Monitoring and analyzing the quality of data in High-Throughput Screening (HTS) is becoming
ever more complex as volume increases, and turn-around times decrease. Over the last 2 years
GSK have been developing a Statistical Quality Control system to meet this challenge. Key
features of this system are: a modular rules-based approach to defining statistical measures of
quality, provision of rules for both the screening process and for individual plates, provision of
tools for both real-time and offline analysis of the quality data, early warning and alerting to
process problems, and low impact integration with GSK's existing Activity Base screening
environment. This system, developed with Tessella in 2005, is now being used routinely by
screening scientists to analyze screening plates worldwide at GSK. It has facilitated the
application of common business rules for QC across sites for passing or failing plates before
publishing HTS data. Through the application of a modular rules-based Statistical Quality Control
system coupled with a sophisticated visualization and data analysis tool significant improvements
in the efficiency of the Molecular Screening process can be realized.
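As an illustration only, and not a description of GSK's actual business rules, a modular rules-based plate QC check along these lines might be sketched as follows; the rule names and thresholds are hypothetical.

from statistics import mean, stdev

def zprime(pos, neg):
    """Z'-factor computed from positive- and negative-control well signals."""
    return 1 - 3 * (stdev(pos) + stdev(neg)) / abs(mean(pos) - mean(neg))

# Each rule maps a name to a pass/fail predicate over one plate's control wells.
PLATE_RULES = {
    "zprime_ok":     lambda pos, neg: zprime(pos, neg) >= 0.5,
    "signal_window": lambda pos, neg: mean(pos) / max(mean(neg), 1e-9) >= 3.0,
    "neg_cv_ok":     lambda pos, neg: stdev(neg) / mean(neg) <= 0.15,
}

def qc_plate(pos, neg):
    """Evaluate every rule; the plate passes only if all rules pass."""
    results = {name: rule(pos, neg) for name, rule in PLATE_RULES.items()}
    return all(results.values()), results

# Example usage:
# ok, detail = qc_plate(pos=[9800, 10100, 9950], neg=[1200, 1100, 1300])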

Data warehousing and data mining
The basic question a compound screen seeks to answer is: what are the true
biological effects of a compound? The challenge for current data analysis
tools is analyzing multiple readouts for signs of compound activity, specificity,
and cross reactivity in order to distinguish real effects from artifacts. Common
questions include:
* Hits -- Has this compound been considered a hit in other assays? Was it confirmed?
* Statistics -- How often has this compound been tested? How often was it a hit?
* Details -- Show me the structure and other detailed information about the compound.
* Screens -- Show me every screening result for the compound.
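As a sketch of how such questions might be answered against a results warehouse (reusing the hypothetical plate_result table sketched earlier, not any specific vendor product), a compound hit-history query could be as simple as:

import sqlite3

def compound_hit_history(conn, compound_id, hit_threshold=50.0):
    """Summarize how often a compound was tested and how often it scored as a hit."""
    tested, hits = conn.execute(
        """SELECT COUNT(*),
                  SUM(CASE WHEN pct_inhibition >= ? THEN 1 ELSE 0 END)
           FROM plate_result WHERE compound_id = ?""",
        (hit_threshold, compound_id),
    ).fetchone()
    best_by_assay = dict(conn.execute(
        """SELECT assay_id, MAX(pct_inhibition)
           FROM plate_result WHERE compound_id = ?
           GROUP BY assay_id""",
        (compound_id,),
    ).fetchall())
    return {"times_tested": tested, "times_hit": hits or 0, "best_by_assay": best_by_assay}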
Challenge: target selectivity
With the growing size of compound libraries, it is not uncommon for a screen
to yield in excess of 10,000 active hits for a given target. Target selectivity is
one of the main criteria in narrowing the leads.
Most targets currently in screening are members of larger protein families,
such as isozymes or receptor subtypes. Despite the high degree of homology,
these protein family members have different functions, making it important for
a compound to selectively affect the chosen target.
"Very early on, we would like to know how much of a liability target
nonselectivity is," says Ralph Garippa, PhD, research leader, cell-based HTS
and robotics at Hoffmann-La Roche Inc., Nutley, N.J. "For some targets, you
want to develop a pan-inhibitor; you don't care if it also knocks out other
members of the protein family. In other cases, the primary side-effect profile
arises from the activity of a closely related receptor or enzyme. The project
team has to decide before developing the assay whether to bring other family
members forward in cloning, expression, and purification or in parallel cell
lines," says Garippa.
In most screening facilities, it is standard follow-up procedure to run counter
screens to determine if the active compounds from the primary screen hit
other related targets. This would eliminate "promiscuous" compounds. For
example, when screening G-protein-coupled receptor (GPCR) targets, it is
common for the disease team to consider fold-selectivity over the "nearest
neighbor" that may cause side effects. In functional cell-based GPCR agonist
assays, where endogenous GPCRs are present in the given cell line, the
specificity of the compound to the target is determined by running the parental
cell line in the same assay and eliminating compounds that show activity at
endogenous GPCRs in the parental cell line screen.
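Fold-selectivity in this context is usually expressed as a simple potency ratio,

\text{fold-selectivity} = \frac{\mathrm{IC}_{50}(\text{nearest neighbor})}{\mathrm{IC}_{50}(\text{target})},

so that larger values indicate a compound that is more selective for the intended target.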
In addition to running counter screens against the target family members,
compounds are also profiled across 10 to 50 receptors to determine any other
activity the compound may exhibit. This activity would need to be eliminated,
either by dropping the hit or optimizing at the medicinal chemistry stage, in
order to avoid the toxic liability of the clinical candidate. The more screens are
run on the same library, the more historical data on compound profiling is
available, indicating each compound's activity in a variety of assays.

Knowledge sharing and “Google”
Using Web 2.0 Technologies for Effective Knowledge Management and Real-Time
Collaboration
In 2006 Nastech launched a project-based, user-driven collaborative resource on our
intranet to meet the growing needs of capturing our Nastech knowledge in a structured
format. This presentation will demonstrate how we developed a standardized ontology for
the description of Nastech assays and how Web 2.0 development methodologies were
advantageous to the rapid development of a user-driven collaborative workspace.
* Brief overview of the Nastech informatics infrastructure
* The challenges to developing and maintaining a standardized ontology
* How Nastech utilized Web 2.0 development methodologies to allow scientists to dynamically interact with their experimental data in real time
* Using a decentralized approach to the storing and management of the various internal and external data silos within an adaptive framework to allow disparate data sources to be seamlessly integrated
* Highlights with a few examples
Jeremy Thompson, Development Manager, Research Information Services, Nastech
Pharmaceutical Company, Inc.
Enterprise-Wide Management of Scientific Data
Effective scientific collaboration is essential for enhancing R&D productivity, and is as
important within a company as it is between strategic partners. It requires the alignment
of data handling and project management systems on a global scale, yet with minimal
complexity. It also needs to bridge the inherent differences between individual R&D
components or “silos”, an aspect identified by the FDA’s Critical Path. Teranode is
addressing this with technologies based on the next generation of the Web, which is able
to address the scale and diversity within pharmaceutical organizations. This presentation
will address these issues in the context of Translational Research.
* Universal access to all information resources
* Creation of annotations between any set of data and documents
* Mapping to legacy data systems
* Dynamic inclusion of new data types, including biomarkers and genotypic profiles
* Security and access control
* Ability to add and utilize controlled vocabulary across all groups
* Support for powerful query engines and inference agents
Eric Neumann, Ph.D., Senior Director Product Strategy, Teranode Corporation
Information Management
Data Acquisition
Data Analysis
Expectations
Strategies
Error Detection
Error Correction
Normalization and Data Condensing
Data Standardization
Statistical Analysis
Binning and Pooling
Statistics
General
Strategies
Random and Systematic Error
Random Error
Systematic Error
Bias
Type 1 and Type 2 Error
Sample Number (N)
Signal-To-Noise and Signal-To-Background Ratios
Signal-To-Noise
Signal-To-Background
Limit Of Detection
Precision and Accuracy
Standard Deviation
Coefficient of Variance
Resolution
Residual Analysis
Ordinary Least Squares
Residual Analysis
Z' Factor
Software and Automated Data Analysis
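For the quantities named in this outline, definitions vary somewhat between groups, but one common set, with subscripts + and - denoting positive and negative control wells, is

\text{S/B} = \frac{\mu_{+}}{\mu_{-}}, \qquad \text{S/N} = \frac{\mu_{+} - \mu_{-}}{\sigma_{-}}, \qquad \text{CV} = \frac{\sigma}{\mu} \times 100\%,

together with the Z'-factor given earlier in the assay development discussion.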