NewsGator - SQL Server 2008 Case Study

RSS Aggregator NewsGator Manages 2.5
Billion Articles with SQL Server 2008
Overview
Country or Region: United States
Industry: High Tech and Electronics - IT
Customer Profile
Based in Denver, Colorado, NewsGator
Technologies develops and markets
solutions for the aggregation and
viewing of Really Simple Syndication
(RSS) feeds.
Business Situation
NewsGator needed to enhance the
relational database infrastructure it uses
to support 2.5 billion RSS articles totaling
4 terabytes, as part of its RSS
aggregation and custom delivery
solutions.
Solution
NewsGator is upgrading to Microsoft®
SQL Server® 2008 Enterprise Edition
(64-bit) database software running on the
Windows Server® 2008 for 64-Bit Systems
operating system.
Benefits
• High availability with Database
Mirroring
• Reduced backup storage needs with
Backup Compression
• Better control with Resource Governor
• Scalability
• Easier data management
“When dealing with terabytes of data, backup
becomes a big issue. The Backup Compression
feature in SQL Server 2008 should reduce our storage
needs by at least half.”
Greg Reinacker, Chief Technology Officer and Founder, NewsGator Technologies
NewsGator makes life easier for individuals and companies by
aggregating Really Simple Syndication (RSS) data feeds from
across the Web to provide users with customized content
delivery, enabling everyone to essentially create their own
electronic newspaper. The company, which also provides
Software as a Service to more than 50 media outlets including
CNN and USA Today, stores some 2.5 billion RSS articles
totaling about 4 terabytes on clustered databases running
Microsoft® SQL Server® database software. NewsGator is
upgrading its database infrastructure to SQL Server 2008
Enterprise Edition (64-bit) running on the Windows Server® 2008
for 64-Bit Systems operating system to take advantage of a
number of new features, including enhanced Database Mirroring
for high availability, Backup Compression to reduce storage
needs, and Resource Governor for allocating processing
resources.
Fast Facts
Data Stored on SQL Server: 4 terabytes
RSS Articles Stored: 2.5 billion
Average SQL Server I/O operations: 6,000 per second
Peak SQL Server I/O operations: 25,000 per second
Application Platform Capabilities
Data Management, Service-Oriented
Architecture and Business Process
Situation
While the content of the World Wide Web
might be unimaginably vast,
NewsGator Technologies has earned a loyal
and growing following by helping to ensure
that it is not unmanageably vast.
NewsGator helps individuals and
companies tame the Web’s vast realm of
information to gain a more personalized
and convenient Web experience. The
company aggregates Really Simple
Syndication (RSS) feeds so users can have
their own personally chosen collection of
Web-based news, information, podcasts,
and other relevant content always at the
ready whenever they care to go browsing,
from whatever Web-enabled device is
handy.
Check your feeds from Hong Kong on a
mobile phone one day, from your brother’s
Apple laptop the next day, and from your
desktop computer running the Windows
Vista operating system the next. Each time
you log on, NewsGator will remember what
articles you’ve already seen, regardless of
the device or operating system you were
using, while keeping all of your RSS
subscriptions up to date.
“We maintain what we call read states so
no matter what device or software you are
using, we keep all of that synchronized so
you don’t see the same articles that you’ve
already read,” explains Glenn Berry,
Database Architect at NewsGator
Technologies.
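
The read-state idea Berry describes can be illustrated with a minimal Python sketch. It assumes read states are keyed by user rather than by device, so every client sees the same answer. The names `ReadStateStore`, `mark_read`, and `unread` are hypothetical and are not taken from NewsGator's implementation.

```python
from collections import defaultdict


class ReadStateStore:
    """Toy server-side store of which articles each user has read.

    Hypothetical illustration: state is keyed by user, not by device,
    so every client that asks sees the same answer.
    """

    def __init__(self):
        self._read = defaultdict(set)  # user_id -> set of article ids

    def mark_read(self, user_id, article_id):
        self._read[user_id].add(article_id)

    def unread(self, user_id, article_ids):
        """Return the given articles the user has not read, in order."""
        seen = self._read[user_id]
        return [a for a in article_ids if a not in seen]


store = ReadStateStore()
feed = ["a1", "a2", "a3"]
store.mark_read("alice", "a2")       # read on a mobile phone
print(store.unread("alice", feed))   # asked from a desktop: ['a1', 'a3']
```

Because the device never appears in the key, a new client needs no extra synchronization step in this simplified model.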
Providing service like this requires effective
data management. The company stores
about 2.5 billion RSS articles, totaling some
4 terabytes, to support the solutions it
offers, which include providing free RSS
readers for individuals, in addition to
creating enterprise-grade applications to
support large organizations, including a
number of Fortune 500 companies. The
company also acts as a service provider
with its Software as a Service (SaaS)
business.
If the ability to choose what news you want
delivered to you each day in the form of
RSS feeds sounds a lot like creating your
own personalized newspaper, it shouldn’t
be surprising that NewsGator’s SaaS
business has more than 50 media and
online publishing customers, including
CNN, Discovery Communications, Media
General, and USA Today.
The company’s customers value its SaaS
service because the applications and data
are hosted by NewsGator, meaning an
organization can take advantage of the
company’s services without deploying
supporting RSS infrastructure. Small blocks
of embedded code, called widgets, can be
dragged and dropped onto a Web page or
blog to support automated RSS delivery.
Using NewsGator’s Widget Framework,
online content providers can extend their
content and brands while positively
impacting important online metrics that
enhance their profitability.
The company currently stores about 4
terabytes of RSS feeds to support its
subscribing customers. That figure is
expected to keep climbing, so database
management is a major operational factor
for NewsGator. The company has been
impressed with the performance it has
enjoyed using Microsoft® SQL Server®
2005 database software running on the
Windows Server® 2003 R2 operating
system.
However, NewsGator was eager to take
advantage of several new features in SQL
Server 2008, and decided to begin
upgrading its databases even before the
software was released.

Solution
NewsGator Technologies has begun
upgrading its database infrastructure to
SQL Server 2008 Enterprise Edition (64-bit)
running on Windows Server 2008
Enterprise Edition for 64-Bit Systems. The
company was eager to upgrade to SQL
Server 2008 to take advantage of a number
of new features and enhancements,
including:
• Database Mirroring Enhancements.
Database Mirroring is a technology for
increasing database availability by
transferring transaction log records from
one server to another, allowing quick
failover to the standby server. SQL Server
2008 includes enhancements to Database
Mirroring, including automated Torn
Page (data corruption) Detection and
Correction and Database Mirroring Log
Compression.
• Policy-based Management. New for
SQL Server 2008, Policy-based
Management helps organizations set and
enforce compliance with policies for
system configuration, SQL Server
databases, and other SQL Server objects.
Administered from the SQL Server
Management Studio, Policy-based
Management can be used to set and
enforce policy for internal and external
database developers and administrators.
• Resource Governor. SQL Server 2008
enables organizations to provide a
consistent and predictable response to
end users with the introduction of
Resource Governor. Organizations can
use Resource Governor to define
resource limits and priorities for different
workloads, and to help ensure resources
can’t be unduly impacted by poorly
constructed queries or other unusual
workloads.
• Backup Compression. With SQL Server
2008 Backup Compression, the
compression is performed in memory
before the data is transferred to disk.
Backups run significantly faster since less
disk I/O is required. Backup Compression
reduces the storage required to keep
backups online, reducing the overall cost
of keeping disk-based backups.
• Performance Data Collection. SQL
Server 2008 provides Performance
Studio, an integrated framework that
organizations can use to collect, analyze,
troubleshoot, and store SQL Server
diagnostics information.
• Integrated Full-Text Search. SQL Server
2008 introduces Integrated Full-Text
Search, which makes the transition
between full-text search and relational
data seamless while enabling users to
employ the full-text indexes to perform
high-speed text searches on large text
columns.
• MERGE SQL Statement. The MERGE SQL
statement, new for SQL Server 2008,
enables developers to more effectively
handle common database administration
tasks such as checking whether a row
exists and then executing an insert or
update.
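
The "check whether a row exists, then insert or update" pattern mentioned in the last item can be sketched in Python. This is an illustration only: it uses the standard-library `sqlite3` module, and because SQLite has no MERGE statement, the single-statement version uses SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` clause (SQLite 3.24 or later) as a stand-in. The table `article_stats` and both function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article_stats (article_id TEXT PRIMARY KEY, hits INTEGER)"
)


def record_hit_two_step(article_id):
    """Check whether the row exists, then insert or update (two statements)."""
    found = conn.execute(
        "SELECT 1 FROM article_stats WHERE article_id = ?", (article_id,)
    ).fetchone()
    if found is None:
        conn.execute("INSERT INTO article_stats VALUES (?, 1)", (article_id,))
    else:
        conn.execute(
            "UPDATE article_stats SET hits = hits + 1 WHERE article_id = ?",
            (article_id,),
        )


def record_hit_single(article_id):
    """Same effect in one statement, using SQLite's upsert clause."""
    conn.execute(
        "INSERT INTO article_stats VALUES (?, 1) "
        "ON CONFLICT (article_id) DO UPDATE SET hits = hits + 1",
        (article_id,),
    )


record_hit_two_step("a1")   # inserts
record_hit_two_step("a1")   # updates
record_hit_single("a2")     # inserts
record_hit_single("a2")     # updates
rows = conn.execute(
    "SELECT article_id, hits FROM article_stats ORDER BY article_id"
).fetchall()
print(rows)
```

Both functions leave the table in the same state. The single-statement form simply moves the existence check into the database engine, which is the convenience the MERGE statement is described as offering.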
NewsGator supports its SaaS business with
a multi-tier architecture that includes:

• Web Servers Tier. NewsGator has many
load-balanced Web servers running the
Windows Server 2008 for 64-Bit Systems
operating system and Internet
Information Services 7 (IIS7). The Web
servers also store NewsGator Web
services and widgets.
• Content Servers Tier. NewsGator has 10
content servers that run the company’s
internally developed aggregation
applications and retrieve content from
the 2 million feeds that the company
pulls daily for its customers. Content
servers store feeds on the database tier,
and depending on content demand, on
caching servers. During peak activity the
content servers process more than 700
articles a second. The applications were
developed using earlier versions of the
Microsoft Visual Studio development
system and the Microsoft .NET
Framework. The company now uses
Visual Studio 2008 and the .NET
Framework 3.5.
• Caching Servers Tier. RSS feeds are
cached to speed response times in
retrieving items for customers. The
caching servers reduce calls to the
database servers.
• Database Tier. The company’s 4
terabytes of compressed RSS data are
hosted on two mirrored clusters running
SQL Server 2008 Enterprise Edition
(64-bit). Each cluster has 3 nodes in an
active/active/passive configuration. The
two clusters are synchronized using SQL
Server 2008 Database Mirroring. The 4
terabytes are hosted across four SQL
Server instances.
• Indexing Tier. As RSS feeds are
aggregated to the database clusters,
NewsGator’s indexing servers index
articles and other content as it arrives.
The index servers are hosted on
dedicated instances of SQL Server. The
index servers will be upgraded to take
advantage of the Full-Text Search feature
of SQL Server 2008.
• Storage Tier. Storage is on a 3PAR
storage area network (SAN).
The solution is hosted on Dell PowerEdge
server computers with 4-way, 64-bit,
dual-core processors and 32 gigabytes (GB)
of RAM.

[Figure: High Availability - Using SQL
Server 2008 Database Mirroring for high
availability. A principal SQL Server 2008
Failover Cluster (primary data on 3PAR
SAN) is mirrored asynchronously to a
mirror SQL Server 2008 Failover Cluster
(mirrored data on EMC SAN). Remaining
labels: SQL Server 2008 Failover Cluster
(Archive); 3PAR SAN; EMC SAN.]

Benefits
NewsGator Technologies is benefiting from
the enhanced Database Mirroring in SQL
Server 2008, which it uses for high
availability and as a powerful database
management tool. The company is also
benefiting from reduced backup storage
needs by using Backup Compression, better
control of processing allocation using
Resource Governor, scalability, and easier
database management.
High Availability with Enhanced
Database Mirroring
NewsGator was an early adopter of
Database Mirroring when it was introduced
as part of SQL Server 2005. The company
has found that in addition to supporting
high availability, Database Mirroring also
can be used to reduce downtime from
hours to just seconds when performing
scheduled maintenance.
“Last year we deployed a new SAN and
needed to move our 4 terabytes of data
from the old SAN to the new one,” Berry
says. “Normally this process would require
a service outage of several hours. Using
SQL Server Database Mirroring, we simply
ran a backup from the existing SAN, loaded
it onto the new SAN, and brought it up as a
mirror. After the new SAN had
synchronized, we flipped our service over
to it and took the old SAN offline,
physically moved it to a new location, and
then brought it back up and synchronized
it as the second half of the mirrored set.
Our total outage for this was 15 seconds.”
The company uses Database Mirroring in
the same way when applying service packs
and other scheduled maintenance, with the
same results. “Service pack updates that
used to require 2 hours of downtime are
now accomplished with the same kind of
15-second breaks,” says Berry. “A 15-second
outage, compared to hours of
downtime, represents a huge benefit for
our operations.”
Berry sees similar reductions in scheduled
downtime coming from SQL Server 2008
enhancements to Database Mirroring,
including the Automatic Torn Page
Detection and Repair.
“Should you ever get a torn page because
of a hiccup in your I/O subsystem, or some
other cause, SQL Server 2008 Database
Mirroring now has the ability to
automatically detect the problem and
request the page from the other side of the
mirror,” says Berry. “This is a really powerful
enhancement because prior to this if you
detected corruption you would have to run
DBCC CHECKDB to try to repair the data,
and that would likely mean taking
downtime, because this is a fairly intense
operation. With SQL Server 2008 Database
Mirroring you can avoid the effort and
downtime.”
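
The detect-and-fetch behavior Berry describes can be pictured with a toy Python model. It is an analogy only, not SQL Server's internal mechanism: each page carries a CRC32 checksum, and a read that finds a mismatch copies the page from the other side of the mirror. `PageStore` and `read_with_repair` are hypothetical names.

```python
import zlib


class PageStore:
    """Toy page store: each page keeps its data and a CRC32 checksum."""

    def __init__(self):
        self.pages = {}  # page_id -> (data bytes, checksum)

    def write(self, page_id, data):
        self.pages[page_id] = (data, zlib.crc32(data))

    def is_intact(self, page_id):
        data, checksum = self.pages[page_id]
        return zlib.crc32(data) == checksum


def read_with_repair(principal, mirror, page_id):
    """Read a page; if its checksum fails, copy the page from the mirror."""
    if not principal.is_intact(page_id):
        data, checksum = mirror.pages[page_id]
        if zlib.crc32(data) != checksum:
            raise IOError("page is damaged on both copies")
        principal.pages[page_id] = (data, checksum)
    return principal.pages[page_id][0]


principal, mirror = PageStore(), PageStore()
for side in (principal, mirror):
    side.write(7, b"article body")

# Simulate a torn write: the data changes but the stored checksum does not.
principal.pages[7] = (b"article bo\x00\x00", principal.pages[7][1])

repaired = read_with_repair(principal, mirror, 7)
print(repaired)
```

In this model the repair happens as part of an ordinary read, which is why no separate maintenance window is needed.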
Reduced Backup Storage Needs with
Backup Compression
NewsGator expects to gain longer life from
its existing storage infrastructure by
deploying the Backup Compression feature
of SQL Server 2008. “When dealing with
terabytes of data, backup becomes a big
issue,” says Greg Reinacker, Chief
Technology Officer and Founder of
NewsGator Technologies. “Backup
Compression should reduce our space
needs by at least half, which will provide a
longer life for our existing backup storage.”
Darryl Dreiling, Director of Platform
Development at NewsGator Technologies,
adds: “With multi-terabytes of data,
backups can be problematic. You have to
deal with time constraints, I/O constraints,
and just plain running out of space. From
our testing we can see that the Backup
Compression feature in SQL Server 2008 is
going to be a big help for us.”
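
As a rough worked example of what "at least half" could mean for backup storage, the Python sketch below uses the 4 terabytes cited in this case study and reads "half" as a 2:1 compression ratio. It assumes, for simplicity, that an uncompressed full backup is about as large as the data itself. The 16-terabyte capacity figure is hypothetical and chosen only to make the arithmetic visible.

```python
def compressed_size_tb(data_tb, compression_ratio):
    """Size of a backup after compression, in terabytes."""
    return data_tb / compression_ratio


def backups_that_fit(capacity_tb, backup_tb):
    """Whole number of full backups that fit in the given capacity."""
    return int(capacity_tb // backup_tb)


DATA_TB = 4.0        # from the case study
RATIO = 2.0          # "at least half" read as a 2:1 ratio
CAPACITY_TB = 16.0   # hypothetical backup storage, for illustration only

plain = backups_that_fit(CAPACITY_TB, DATA_TB)
compressed = backups_that_fit(CAPACITY_TB, compressed_size_tb(DATA_TB, RATIO))
print(plain, compressed)  # 4 8
```

Under these assumptions the same storage holds twice as many full backups, which is one way to read the "longer life" Reinacker expects from the existing backup storage.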
Better Control of Processing Allocation
with Resource Governor
The Resource Governor feature of SQL
Server 2008 will greatly enhance
NewsGator’s ability to control how system
resources are allocated across its
infrastructure. The ability to govern the use of
resources is so important to its operations
that the company has built some throttling
technology into its applications, so it
welcomes the additional tools it gains for
this with SQL Server 2008.
“We pride ourselves in providing an
exceptional user experience, which includes
split-second delivery of feeds to our users,”
says Dreiling. “Some of our processes need
to run with maximum performance. But we
have other operations that don’t require
that level of performance. For example, if
our content engines update feeds every
five seconds, that is great. The content
upgrades don’t have to be within a
millisecond. Governing how much of the
CPU, memory, or other resources we
allocate for content upgrades may help us
meet our service level agreements on other
metrics.”
Berry is also happy to have Resource
Governor to protect operations.
“Resource Governor protects our
operations from being dominated by a
particular Web server, or a specific SQL
Server log-in, or application that could
otherwise monopolize resources,” says
Berry. “We can now limit a Web server or
group of users to a certain amount of
memory or CPUs. Resource Governor helps
protect our operations.”
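
The idea of limiting a particular Web server, login, or application can be sketched as a classification step followed by a lookup of limits. The Python below only mimics that decision logic. In SQL Server the equivalent pieces are a classifier function and workload groups defined in Transact-SQL, which are not shown here. Every group name, limit, host name, and application name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    login: str
    app_name: str
    host: str


# Hypothetical workload groups and limits, for illustration only.
GROUP_LIMITS = {
    "feed_delivery": {"max_cpu_percent": 100, "max_memory_percent": 100},
    "content_updates": {"max_cpu_percent": 30, "max_memory_percent": 25},
    "default": {"max_cpu_percent": 50, "max_memory_percent": 50},
}


def classify(session):
    """Map an incoming session to a workload group name."""
    if session.app_name == "ContentEngine":
        return "content_updates"
    if session.host.startswith("web"):
        return "feed_delivery"
    return "default"


s = Session(login="svc_content", app_name="ContentEngine", host="content03")
group = classify(s)
print(group, GROUP_LIMITS[group])
```

The sketch follows Dreiling's example: content updates that do not need millisecond latency land in a group with lower limits, leaving headroom for feed delivery.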
Scalability
The scalability of SQL Server helps
NewsGator keep pace with the growing
demands of its global users. NewsGator
uses several techniques for scalability. One
goal was to stay with commodity server
computers, so when its volume started
getting too big for one database server,
the company decided to scale out.
First, the company used a Services Oriented
Data Architecture (SODA) technique to
move some of the independent parts of the
database to their own servers. Next, it
split the older RSS feeds off to an archive
server and used Data Dependent Routing
logic in the application layer to locate the
content.
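
Data Dependent Routing in the application layer can be sketched as a function that picks a server from a property of the data being requested. The case study does not say which property NewsGator routes on, so the Python below assumes the article's publish date. The server names and cutoff date are hypothetical.

```python
from datetime import date

# Hypothetical server names and cutoff, for illustration only.
CURRENT_SERVER = "sql-current"
ARCHIVE_SERVER = "sql-archive"
ARCHIVE_CUTOFF = date(2007, 1, 1)


def route(published):
    """Pick the server that holds an article, based on its publish date."""
    return ARCHIVE_SERVER if published < ARCHIVE_CUTOFF else CURRENT_SERVER


print(route(date(2006, 5, 1)))   # sql-archive
print(route(date(2008, 2, 1)))   # sql-current
```

Keeping the rule in one function means callers never need to know that the data is split across servers.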
The company’s SQL Server database
averages 6,000 I/O operations a second in
serving up content to NewsGator
customers. “That figure goes up to 25,000
SQL Server I/O events per second during
peak usage,” says Reinacker. “We store
about 2.5 billion articles in SQL Server, and
it continues to scale to meet our needs.
With SQL Server 2008 and the rest of the
Microsoft Application Platform we don’t
see any limits to our ability to grow.”
Easier Data Management
With terabytes of data to handle,
NewsGator appreciates management
features new to SQL Server 2008 including
Policy-based Management, the MERGE SQL
Statement, Full-Text Indexing, and
Performance Data Collection.
“Microsoft has realized that a lot of DBAs
have hundreds if not thousands of servers
to manage, with lots of databases on each
one,” says Berry. “Policy-based
Management gives us the ability to enforce
naming standards, security settings,
memory settings, and other elements to
simplify database management. You can
have one policy for your development
servers, and another policy for your
production servers, to help ensure that
whenever you stand up another server it is
set up correctly to match all of your other
servers.”
The SQL Server 2008 MERGE SQL
Statement is “very interesting to us because
it will reduce the number of I/O writes we
do when logging, and whatever we can do
to minimize I/O writes is of interest,” says
Dreiling. “MERGE SQL gives us the ability to
take a bunch of data and copy it in with
minimal logging and that's huge to us
because without something like this I know
I've got to pay the overhead price of I/O
writes. This enables us to greatly reduce the
overhead of logging, and that’s good for
our overall operations.”
Integrated Full-Text Search, new for SQL
Server 2008, is important to NewsGator
because it enables the company to
combine relational and full-text searching
to support richer content searches. “Our
customers want to be able to search on
metadata plus full text at the same time,”
says Reinacker. “For example, rather than
just matching all the articles that mention
Katmai, our customers want to be able to
search for all articles that mention Katmai
from a specific list of RSS feeds. Customers
will be able to search for a term across their
own specified white list of, say, 700 feeds.
This ability to search across feeds is difficult
without Integrated Full-Text Search.”
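
The combined search Reinacker describes, a text term restricted to a white list of feeds, can be illustrated with a small in-memory Python sketch. It is not how SQL Server's Integrated Full-Text Search works internally. It only shows a text match and a metadata filter being applied in a single query. The sample articles and feed names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: (article_id, feed_id, text).
ARTICLES = [
    (1, "feed-a", "Katmai preview ships"),
    (2, "feed-b", "Katmai benchmark results"),
    (3, "feed-a", "Unrelated product news"),
]

# Build a tiny inverted index: lowercase word -> set of article ids.
index = defaultdict(set)
feed_of = {}
for article_id, feed_id, text in ARTICLES:
    feed_of[article_id] = feed_id
    for word in text.lower().split():
        index[word].add(article_id)


def search(term, feed_whitelist):
    """Articles containing the term, restricted to whitelisted feeds."""
    hits = index.get(term.lower(), set())
    return sorted(a for a in hits if feed_of[a] in feed_whitelist)


print(search("Katmai", {"feed-a"}))  # [1]
```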
NewsGator likes SQL Server 2008
Performance Data Collection for the same
reason it values the Dynamic Management
Views (DMVs) feature that was introduced
with SQL Server 2005: the greater
visibility into operational performance
helps the company tune its systems to
provide faster response times for its
customers. “Prior to DMVs, I couldn't tell
you—without a lot of extra work—what
were our most expensive operations in
terms of resource utilization,” says Dreiling.
“Now, with DMVs I can run a quick query
and identify the top 10 or 25 stored
procedures in terms of taking time to
execute. This enables us to very precisely
tune our operations, and then to re-test to
verify that our changes are improving
processing times. The Performance Data
Collection feature of SQL Server 2008,
combined with the Dynamic Management
Views we gained with the earlier release,
give us the tools to precisely see what is
happening so we can make our operations
ever more efficient in supporting our
customers.”
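
The kind of ranking Dreiling mentions, finding the top stored procedures by execution time, can be sketched in Python over sample rows. In SQL Server the figures would come from Dynamic Management Views, and that query is not shown here. The procedure names and numbers below are invented for illustration.

```python
import heapq

# Hypothetical sample rows: (procedure name, total elapsed ms, executions).
PROC_STATS = [
    ("GetUnreadItems", 90_000, 30_000),
    ("UpdateReadState", 120_000, 60_000),
    ("ArchiveOldPosts", 45_000, 15),
    ("GetFeedList", 10_000, 20_000),
]


def top_by_total_time(stats, n):
    """The n procedures with the largest total elapsed time."""
    return heapq.nlargest(n, stats, key=lambda row: row[1])


def top_by_average_time(stats, n):
    """The n procedures with the largest average time per execution."""
    return heapq.nlargest(n, stats, key=lambda row: row[1] / row[2])


print([name for name, _, _ in top_by_total_time(PROC_STATS, 2)])
print([name for name, _, _ in top_by_average_time(PROC_STATS, 2)])
```

Ranking by total time and by average time can surface different procedures, which is one reason to re-test after each tuning change as the quote suggests.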
3PAR Utility Storage
NewsGator’s previous storage environment
lacked flexibility and was difficult to
manage and scale in response to business
changes and rapid growth. With 3PAR
Utility Storage, NewsGator has scaled easily
within a single, autonomically load-balanced
storage system. The decision to
deploy 3PAR has allowed NewsGator to
enhance its RSS database infrastructure
without time-consuming planning activities
or costly professional services. After
evaluating alternatives from traditional SAN
vendors, NewsGator selected 3PAR Utility
Storage to support its solutions, all of
which are built on multiple Microsoft SQL
Server databases. NewsGator selected a
3PAR InServ® S400 Storage Server with 50
terabytes of storage and several 3PAR
software offerings including 3PAR Dynamic
Optimization, 3PAR Virtual Copy, and 3PAR
System Reporter.
3PAR Dynamic Optimization software is
designed to enable users to change data
service levels with a single command,
online and non-disruptively. This provides
NewsGator the ability to convert between
RAID levels, drive types (Fibre Channel and
Enterprise-class Serial ATA), and data
placement to provide high-capacity
utilization while preserving high
performance levels.
3PAR Virtual Copy is an innovative thin
snapshot technology that requires no
upfront space reservations and is designed
to provide customers such as NewsGator
with efficient copies of their Microsoft SQL
databases and to allow them to revert to
any previously created snapshot with a
single command. This functionality is
important to recovering Microsoft SQL
databases quickly and minimizing company
recovery time objectives.
3PAR System Reporter is a simple-to-use,
Web-based performance and capacity
management tool designed to aggregate
historical system data for one or more
3PAR InServ Storage Servers. System
Reporter is ideal for troubleshooting,
planning, monitoring and providing
information required for Service Level
Agreements and chargeback support.
NewsGator uses the performance statistics
available through System Reporter to plan
for and justify needed upgrades.
“3PAR already provides a reliable and
easy-to-use consolidated storage platform for
our Windows Server environment,” says
Glenn Berry, Database Architect for
NewsGator. "With 3PAR's support of
Windows Server 2008 and SQL Server 2008,
we can now combine Microsoft's most
advanced operating system and database
technology with our 3PAR Utility Storage
platform to run our business with complete
confidence.”
Summary
In summary, NewsGator Technologies is
using SQL Server 2008 to enhance its
impressive RSS database infrastructure to
provide individuals, self-hosting
corporations, and SaaS customers with
scalable solutions for pulling value from the
Web through customized delivery of RSS
content.
For More Information
For more information about Microsoft
products and services, call the Microsoft
Sales Information Center at (800) 426-9400.
In Canada, call the Microsoft
Canada Information Centre at (877) 568-2495.
Customers who are deaf or hard-of-hearing
can reach Microsoft text
telephone (TTY/TDD) services at (800)
892-5234 in the United States or (905)
568-9641 in Canada. Outside the 50
United States and Canada, please contact
your local Microsoft subsidiary. To access
information using the World Wide Web,
go to: www.microsoft.com
For more information about NewsGator
products and services, call (800) 608-4597
or visit the Web site at:
www.newsgator.com
For more information about 3PAR
products and services, visit the Web site
at: www.3par.com
Windows Server 2008, SQL
Server 2008, and Visual Studio
2008
Windows Server 2008, SQL Server 2008,
and Visual Studio 2008 provide a secure
and trusted foundation for creating and
running your most demanding applications.
Combined, the products offer advanced
security technology, developer support for
the latest platforms, improved
management and Web tools, flexible
virtualization technology to optimize your
infrastructure, and access to relevant
information throughout your organization.
For more information about Windows
Server 2008, go to:
www.microsoft.com/windowsserver2008
For more information about SQL Server
2008, go to:
www.microsoft.com/sql/2008/default.mspx
For more information about Visual Studio
2008, go to:
www.microsoft.com/vstudio.
Software and Services
• Microsoft Servers
− Windows Server 2008 for 64-Bit
Systems
− SQL Server 2008 Enterprise (64-bit)
• 3rd Party Software
− 3PAR Utility Storage Software
− 3PAR Dynamic Optimization
− 3PAR System Reporter
− 3PAR Virtual Copy
Hardware
• Dell PowerEdge server computers with
4-way, 64-bit, dual-core processors and 32
GB of RAM
Partner
• 3PAR

This case study is for informational purposes only.
MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED,
IN THIS SUMMARY.
Document published February 2008