RSS Aggregator NewsGator Manages 2.5 Billion Articles with SQL Server 2008

Overview

Country or Region: United States
Industry: High Tech and Electronics - IT

Customer Profile
Based in Denver, Colorado, NewsGator Technologies develops and markets solutions for the aggregation and viewing of Really Simple Syndication (RSS) feeds.

Business Situation
NewsGator needed to enhance the relational database infrastructure it uses to support 2.5 billion RSS articles totaling 4 terabytes, as part of its RSS aggregation and custom delivery solutions.

Solution
NewsGator is upgrading to Microsoft® SQL Server® 2008 Enterprise Edition (64-bit) database software running on the Windows Server® 2008 for 64-Bit Systems operating system.

Benefits
High availability with Database Mirroring
Reduced backup storage needs with Backup Compression
Better control with Resource Governor
Scalability
Easier data management

“When dealing with terabytes of data, backup becomes a big issue. The Backup Compression feature in SQL Server 2008 should reduce our storage needs by at least half.”
Greg Reinacker, Chief Technology Officer and Founder, NewsGator Technologies

NewsGator makes life easier for individuals and companies by aggregating Really Simple Syndication (RSS) data feeds from across the Web to provide users with customized content delivery, enabling everyone to essentially create their own electronic newspaper. The company, which also provides Software as a Service to more than 50 media outlets including CNN and USA Today, stores some 2.5 billion RSS articles totaling about 4 terabytes on clustered databases running Microsoft® SQL Server® database software.
NewsGator is upgrading its database infrastructure to SQL Server 2008 Enterprise Edition (64-bit) running on the Windows Server® 2008 for 64-Bit Systems operating system to take advantage of a number of new features, including enhanced Database Mirroring for high availability, Backup Compression to reduce storage needs, and Resource Governor for allocating processing resources.

Fast Facts
Data Stored on SQL Server: 4 terabytes
RSS Articles Stored: 2.5 billion
Average SQL Server I/O operations: 6,000 per second
Peak SQL Server I/O operations: 25,000 per second

Application Platform Capabilities
Data Management, Service-Oriented Architecture and Business Process

Situation
While the total content of the World Wide Web might be unimaginably vast, NewsGator Technologies has earned a loyal and growing following by helping to ensure that it’s not unmanageably vast. NewsGator helps individuals and companies tame the Web’s vast realm of information to gain a more personalized and convenient Web experience. The company aggregates Really Simple Syndication (RSS) feeds so users can have their own personally chosen collection of Web-based news, information, podcasts, and other relevant content always at the ready whenever they care to go browsing, from whatever Web-enabled device is handy. Check your feeds from Hong Kong on a mobile phone one day, from your brother’s Apple laptop the next day, and from your desktop computer running the Windows Vista operating system the day after. Each time you log on, NewsGator will remember what articles you’ve already seen, regardless of the device or operating system you were using, while keeping all of your RSS subscriptions up to date. “We maintain what we call read states so no matter what device or software you are using, we keep all of that synchronized so you don’t see the same articles that you’ve already read,” explains Glenn Berry, Database Architect at NewsGator Technologies.
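The read-state idea Berry describes can be illustrated with a small sketch. This is not NewsGator's implementation, which the case study does not show; the class name `ReadStateStore`, its methods, and the sample user and article identifiers are all hypothetical. The sketch assumes only what the quote states: read state is kept per user rather than per device, so every device sees the same unread list.

```python
from collections import defaultdict


class ReadStateStore:
    """Hypothetical store that tracks read articles per user, not per device."""

    def __init__(self):
        # user_id -> set of article ids the user has read on any device
        self._read = defaultdict(set)

    def mark_read(self, user_id, article_id, device):
        # The device is accepted for illustration only. State is keyed by
        # user, which is what keeps it identical across devices.
        self._read[user_id].add(article_id)

    def unread(self, user_id, feed_article_ids):
        # Keep the feed order and drop anything the user has already read.
        seen = self._read.get(user_id, set())
        return [a for a in feed_article_ids if a not in seen]


store = ReadStateStore()
feed = ["a1", "a2", "a3"]

# The same user reads articles on two different devices.
store.mark_read("alice", "a1", device="mobile phone")
store.mark_read("alice", "a3", device="laptop")

# A third device sees only what is still unread.
desktop_view = store.unread("alice", feed)
print(desktop_view)
```

In a production system the per-user sets would live in the database tier rather than in memory, but the lookup has the same shape: filter the feed by the user's read state, regardless of which client is asking.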
Providing service like this requires effective data management. The company stores about 2.5 billion RSS articles, totaling some 4 terabytes, to support the solutions it offers, which include providing free RSS readers for individuals, in addition to creating enterprise-grade applications to support large organizations, including a number of Fortune 500 companies. The company also acts as a service provider with its Software as a Service (SaaS) business. If the ability to choose what news you want delivered to you each day in the form of RSS feeds sounds a lot like creating your own personalized newspaper, it shouldn’t be surprising that NewsGator’s SaaS business has more than 50 media and online publishing customers, including CNN, Discovery Communications, Media General, and USA Today. The company’s customers value its SaaS service because the applications and data are hosted by NewsGator, meaning an organization can take advantage of the company’s services without deploying supporting RSS infrastructure. Small blocks of embedded code, called widgets, can be dragged and dropped onto a Web page or blog to support automated RSS delivery. Using NewsGator’s Widget Framework, online content providers can extend their content and brands while positively impacting important online metrics that enhance their profitability.

The company currently stores about 4 terabytes of RSS feeds to support its subscribing customers. That figure is expected only to climb, so database management is a major operational factor for NewsGator. The company has been impressed with the performance it has enjoyed using Microsoft® SQL Server® 2005 database software running on the Windows Server® 2003 R2 operating system. However, NewsGator was eager to take advantage of several new features in SQL Server 2008, and decided to begin upgrading its databases even before the software was released.

“We store about 2.5 billion articles in SQL Server, and it continues to scale to meet our needs. With SQL Server 2008 and the rest of the Microsoft Application Platform we don’t see any limits to our ability to grow.”
Greg Reinacker, Chief Technology Officer and Founder, NewsGator Technologies

Solution
NewsGator Technologies has begun upgrading its database infrastructure to SQL Server 2008 Enterprise Edition (64-bit) running on Windows Server 2008 Enterprise Edition for 64-Bit Systems. The company was eager to upgrade to SQL Server 2008 to take advantage of a number of new features and enhancements, including:

Database Mirroring Enhancements. Database Mirroring is a technology for increasing database availability by transferring transaction log records from one server to another, allowing quick failover to the standby server. SQL Server 2008 includes enhancements to Database Mirroring, including automated Torn Page (data corruption) Detection and Correction and Database Mirroring Log Compression.

Policy-based Management. New for SQL Server 2008, Policy-based Management helps organizations set and enforce compliance with policies for system configuration, SQL Server databases, and other SQL Server objects. Administered from SQL Server Management Studio, Policy-based Management can be used to set and enforce policy for internal and external database developers and administrators.

Resource Governor. SQL Server 2008 enables organizations to provide a consistent and predictable response to end users with the introduction of Resource Governor.
Organizations can use Resource Governor to define resource limits and priorities for different workloads, and to help ensure that poorly constructed queries or other unusual workloads cannot unduly consume resources.

Backup compression. With SQL Server 2008 backup compression, the compression is performed in memory before the data is transferred to disk. Backups run significantly faster since less disk I/O is required. Backup compression reduces the storage required to keep backups online, reducing the overall cost of keeping disk-based backups.

Performance Data Collection. SQL Server 2008 provides Performance Studio, an integrated framework that organizations can use to collect, analyze, troubleshoot, and store SQL Server diagnostics information.

Integrated Full-Text Search. SQL Server 2008 introduces Integrated Full-Text Search, which makes the transition between full-text search and relational data seamless while enabling users to employ the full-text indexes to perform high-speed text searches on large text columns.

MERGE SQL Statement. The MERGE SQL statement, new for SQL Server 2008, enables developers to more effectively handle common database administration tasks such as checking whether a row exists and then executing an insert or update.

NewsGator supports its SaaS business with a multi-tier architecture that includes:

Web Servers Tier. NewsGator has many load-balanced Web servers running the Windows Server 2008 for 64-Bit Systems operating system and Internet Information Services 7 (IIS7). The Web servers also host NewsGator Web services and widgets.

Content Servers Tier. NewsGator has 10 content servers that run the company’s internally developed aggregation applications and retrieve content from the 2 million feeds that the company pulls daily for its customers. Content servers store feeds on the database tier, and depending on content demand, on caching servers. During peak activity the content servers process more than 700 articles a second. The applications were developed using earlier versions of the Microsoft Visual Studio development system and the Microsoft .NET Framework. The company now uses Visual Studio 2008 and the .NET Framework 3.5.

Caching Servers Tier. RSS feeds are cached to speed response times in retrieving items for customers. The caching servers reduce calls to the database servers.

Database Tier. The company’s 4 terabytes of compressed RSS data is hosted on two mirrored clusters running SQL Server 2008 Enterprise Edition (64-bit). Each cluster has three nodes in an active/active/passive configuration. The two clusters are synchronized using SQL Server 2008 Database Mirroring. The 4 terabytes are hosted across four SQL Server instances.

Indexing Tier. As RSS feeds are aggregated to the database clusters, NewsGator’s indexing servers index articles and other content as it arrives. The index servers are hosted on dedicated instances of SQL Server. The index servers will be upgraded to take advantage of the Full-Text Search feature of SQL Server 2008.

Storage Tier. Storage is on a 3PAR storage area network (SAN).

The solution is hosted on Dell PowerEdge server computers with 4-way, 64-bit, dual-core processors and 32 gigabytes (GB) of RAM.

[Figure: High Availability - Using SQL Server 2008 Database Mirroring for high availability. Diagram labels: SQL Server 2008 Failover Cluster (Principal), primary data on 3PAR SAN; SQL Server 2008 Failover Cluster (Mirror), mirrored data on EMC SAN; SQL Server 2008 Database Mirroring (asynchronous); SQL Server 2008 Failover Cluster (Archive); 3PAR SAN; EMC SAN.]

Benefits
NewsGator Technologies is benefiting from the enhanced Database Mirroring in SQL Server 2008, which it uses for high availability and as a powerful database management tool.
The company is also benefiting from reduced backup storage needs by using Backup Compression, better control of processing allocation using Resource Governor, scalability, and easier database management.

“The Performance Data Collection feature of SQL Server 2008, combined with the Dynamic Management Views we gained with the earlier release, give us the tools to precisely see what is happening so we can make our operations ever more efficient in supporting our customers.”
Darryl Dreiling, Director of Platform Development, NewsGator Technologies

High Availability with Enhanced Database Mirroring
NewsGator was an early adopter of Database Mirroring when it was introduced as part of SQL Server 2005. The company has found that in addition to supporting high availability, Database Mirroring also can be used to reduce downtime from hours to just seconds when performing scheduled maintenance. “Last year we deployed a new SAN and needed to move our 4 terabytes of data from the old SAN to the new one,” Berry says. “Normally this process would require a service outage of several hours. Using SQL Server Database Mirroring, we simply ran a backup from the existing SAN, loaded it onto the new SAN, and brought it up as a mirror. After the new SAN had synchronized, we flipped our service over to it and took the old SAN offline, physically moved it to a new location, and then brought it back up and synchronized it as the second half of the mirrored set. Our total outage for this was 15 seconds.”

The company uses Database Mirroring in the same way when applying service packs and other scheduled maintenance, with the same results. “Service pack updates that used to require 2 hours of downtime are now accomplished with the same kind of 15-second breaks,” says Berry.
“A 15-second outage, compared to hours of downtime, represents a huge benefit for our operations.”

Berry sees similar reductions in scheduled downtime coming from SQL Server 2008 enhancements to Database Mirroring, including the Automatic Torn Page Detection and Repair. “Should you ever get a torn page because of a hiccup in your I/O subsystem, or some other cause, SQL Server 2008 Database Mirroring now has the ability to automatically detect the problem and request the page from the other side of the mirror,” says Berry. “This is a really powerful enhancement because prior to this, if you detected corruption, you would have to run DBCC CHECKDB to try to repair the data, and that would likely mean taking downtime, because this is a fairly intense operation. With SQL Server 2008 Database Mirroring you can avoid the effort and downtime.”

Reduced Backup Storage Needs with Backup Compression
NewsGator expects to gain longer life from its existing storage infrastructure by deploying the Backup Compression feature of SQL Server 2008. “When dealing with terabytes of data, backup becomes a big issue,” says Greg Reinacker, Chief Technology Officer and Founder of NewsGator Technologies. “Backup Compression should reduce our space needs by at least half, which will provide a longer life for our existing backup storage.”

Darryl Dreiling, Director of Platform Development at NewsGator Technologies, adds: “With multi-terabytes of data, backups can be problematic. You have to deal with time constraints, I/O constraints, and just plain running out of space. From our testing we can see that the Backup Compression feature in SQL Server 2008 is going to be a big help for us.”

Better Control of Processing Allocation with Resource Governor
The Resource Governor feature of SQL Server 2008 will greatly enhance NewsGator’s ability to control how system resources are allocated across its infrastructure.
The ability to govern the use of resources is so important to its operations that the company has built some throttling technology into its applications, so it welcomes the additional tools it gains for this with SQL Server 2008. “We pride ourselves in providing an exceptional user experience, which includes split-second delivery of feeds to our users,” says Dreiling. “Some of our processes need to run with maximum performance. But we have other operations that don’t require that level of performance. For example, if our content engines update feeds every five seconds, that is great. The content upgrades don’t have to be within a millisecond. Governing how much of the CPU, memory, or other resources we allocate for content upgrades may help us meet our service level agreements on other metrics.”

Berry is also happy to have Resource Governor to protect operations. “Resource Governor protects our operations from being dominated by a particular Web server, or a specific SQL Server log-in, or application that could otherwise monopolize resources,” says Berry. “We can now limit a Web server or group of users to a certain amount of memory or CPUs. Resource Governor helps protect our operations.”

“Policy-based Management gives us the ability to enforce naming standards, security settings, memory settings, and other elements to simplify database management.”
Glenn Berry, Database Architect, NewsGator Technologies

Scalability
The scalability of SQL Server helps NewsGator keep pace with the growing demands of its global users. NewsGator uses several techniques for scalability. One goal was to stay with commodity server computers, so when the volume started getting too big for one database server the company decided to scale out. First, the company used a Services Oriented Data Architecture (SODA) technique to move some of the independent parts of the database to their own servers.
Next, the company split the older RSS feeds off to an archive server and used Data Dependent Routing logic in the application layer to locate the content.

The company’s SQL Server database averages 6,000 I/O operations a second in serving up content to NewsGator customers. “That figure goes up to 25,000 SQL Server I/O events per second during peak usage,” says Reinacker. “We store about 2.5 billion articles in SQL Server, and it continues to scale to meet our needs. With SQL Server 2008 and the rest of the Microsoft Application Platform we don’t see any limits to our ability to grow.”

Easier Data Management
With terabytes of data to handle, NewsGator appreciates management features new to SQL Server 2008 including Policy-based Management, the MERGE SQL Statement, Integrated Full-Text Search, and Performance Data Collection.

“Microsoft has realized that a lot of DBAs have hundreds if not thousands of servers to manage, with lots of databases on each one,” says Berry. “Policy-based Management gives us the ability to enforce naming standards, security settings, memory settings, and other elements to simplify database management. You can have one policy for your development servers, and another policy for your production servers, to help ensure that whenever you stand up another server it is set up correctly to match all of your other servers.”

The SQL Server 2008 MERGE SQL Statement is “very interesting to us because it will reduce the number of I/O writes we do when logging, and whatever we can do to minimize I/O writes is of interest,” says Dreiling. “MERGE SQL gives us the ability to take a bunch of data and copy it in with minimal logging and that's huge to us because without something like this I know I've got to pay the overhead price of I/O writes.
This enables us to greatly reduce the overhead of logging, and that’s good for our overall operations.”

Integrated Full-Text Search, new for SQL Server 2008, is important to NewsGator because it enables the company to combine relational and full-text searching to support richer content searches. “Our customers want to be able to search on metadata plus full text at the same time,” says Reinacker. “For example, rather than just matching all the articles that mention Katmai, our customers want to be able to search for all articles that mention Katmai from a specific list of RSS feeds. Customers will be able to search for a term across their own specified white list of, say, 700 feeds. This ability to search across feeds is difficult without Integrated Full-Text Search.”

NewsGator likes SQL Server 2008 Performance Data Collection for the same reason it values the Dynamic Management Views (DMVs) feature that was introduced with SQL Server 2005: the greater visibility into operational performance helps the company tune its systems to provide faster response times for its customers. “Prior to DMVs, I couldn't tell you, without a lot of extra work, what were our most expensive operations in terms of resource utilization,” says Dreiling. “Now, with DMVs I can run a quick query and identify the top 10 or 25 stored procedures in terms of taking time to execute. This enables us to very precisely tune our operations, and then to re-test to verify that our changes are improving processing times. The Performance Data Collection feature of SQL Server 2008, combined with the Dynamic Management Views we gained with the earlier release, give us the tools to precisely see what is happening so we can make our operations ever more efficient in supporting our customers.”

3PAR Utility Storage
NewsGator’s previous storage environment lacked flexibility and was difficult to manage and scale in response to business changes and rapid growth.
With 3PAR Utility Storage, NewsGator has scaled easily within a single, autonomically load-balanced storage system. The decision to deploy 3PAR has allowed NewsGator to enhance its RSS database infrastructure without time-consuming planning activities or costly professional services.

After evaluating alternatives from traditional SAN vendors, NewsGator selected 3PAR Utility Storage to support its solutions, all of which are built on multiple Microsoft SQL Server databases. NewsGator selected a 3PAR InServ® S400 Storage Server with 50 terabytes of storage and several 3PAR software offerings including 3PAR Dynamic Optimization, 3PAR Virtual Copy, and 3PAR System Reporter.

3PAR Dynamic Optimization software is designed to enable users to change data service levels with a single command, online and non-disruptively. This provides NewsGator the ability to convert between RAID levels, drive types (Fibre Channel and Enterprise-class Serial ATA), and data placement to provide high-capacity utilization while preserving high performance levels.

3PAR Virtual Copy is an innovative thin snapshot technology that requires no upfront space reservations and is designed to provide customers such as NewsGator with efficient copies of their Microsoft SQL Server databases and to allow them to revert to any previously created snapshot with a single command. This functionality is important to recovering Microsoft SQL Server databases quickly and to meeting the company's recovery time objectives.

3PAR System Reporter is a simple-to-use, Web-based performance and capacity management tool designed to aggregate historical system data for one or more 3PAR InServ Storage Servers. System Reporter is ideal for troubleshooting, planning, monitoring and providing information required for Service Level Agreements and chargeback support. NewsGator uses the performance statistics available through System Reporter to plan for and justify needed upgrades.
“3PAR already provides a reliable and easy-to-use consolidated storage platform for our Windows Server environment,” says Glenn Berry, Database Architect for NewsGator. “With 3PAR's support of Windows Server 2008 and SQL Server 2008, we can now combine Microsoft's most advanced operating system and database technology with our 3PAR Utility Storage platform to run our business with complete confidence.”

Summary
In summary, NewsGator Technologies is using SQL Server 2008 to enhance its impressive RSS database infrastructure to provide individuals, self-hosting corporations, and SaaS customers with scalable solutions for pulling value from the Web through customized delivery of RSS content.

For More Information
For more information about Microsoft products and services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information Centre at (877) 568-2495. Customers who are deaf or hard-of-hearing can reach Microsoft text telephone (TTY/TDD) services at (800) 892-5234 in the United States or (905) 568-9641 in Canada. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information using the World Wide Web, go to: www.microsoft.com

For more information about NewsGator products and services, call (800) 608-4597 or visit the Web site at: www.newsgator.com

For more information about 3PAR products and services, visit the Web site at: www.3par.com

Windows Server 2008, SQL Server 2008, and Visual Studio 2008
Windows Server 2008, SQL Server 2008, and Visual Studio 2008 provide a secure and trusted foundation for creating and running your most demanding applications. Combined, the products offer advanced security technology, developer support for the latest platforms, improved management and Web tools, flexible virtualization technology to optimize your infrastructure, and access to relevant information throughout your organization.
For more information about Windows Server 2008, go to: www.microsoft.com/windowsserver2008
For more information about SQL Server 2008, go to: www.microsoft.com/sql/2008/default.mspx
For more information about Visual Studio 2008, go to: www.microsoft.com/vstudio

Software and Services
Microsoft Servers
− Windows Server 2008 for 64-Bit Systems
− SQL Server 2008 Enterprise (64-bit)
3rd Party Software
− 3PAR Utility Storage Software
− 3PAR Dynamic Optimization
− 3PAR System Reporter
− 3PAR Virtual Copy

Hardware
Dell PowerEdge server computers with 4-way, 64-bit, dual-core processors and 32 GB of RAM

Partner
3PAR

This case study is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.
Document published February 2008