NGA Public Web Site Redesign – CQ5 Lessons Learned
November 2012 (David Beaudet, NGA Lead Developer / Architect)

Topics
1. Technical Architecture
2. Art Data and Image Integration
3. Other Useful Nuggets

NGA Web Site Redesign – March 2013

NGA CQ5 Technical Architecture

Installation / Upgrade Lessons Learned
• CQ 5.4 to 5.5 is a major change – everything is OSGi – some configs are buried and some features are no longer supported, e.g. RMI no longer seems to be an option.
• Java 1.7 is not supported by CQ 5.5 – unsure about 5.6 – Adobe claims Oracle has a problem in 1.7 that needs fixing.
• Always disable the virus scanner prior to a CQ5 install, particularly McAfee, as the initial CQ5 install will not complete if McAfee on-access protection is enabled. Maybe run DEV, QA, and Prod on Linux to avoid having to install a virus scanner?
• WebDAV doesn’t seem to be completely supported; this is compounded by Windows 7 shipping with a broken WebDAV client.
• The Lucene index occasionally misses nodes – limited to DEV / QA systems? Reindexing may be an option – we have successfully reindexed.
• Start / stop scripts need modification to work properly on Linux.
• 5.6 seems to have some user interface screens missing, e.g. under Tools. We’re sticking with 5.5 SP 2.1 for now – any opinions about 5.6?
• After a fresh install, one quickly finds that the default memory allocation is insufficient.

CQ5 Server Configuration and Operational Concerns
• one RHEL 6 VM guest for each of DEV and QA (16 GB RAM / 4 CPUs)
• one RHEL 6 VM guest each for prod author and prod publish (16 GB RAM / 4 CPUs)
• Lessons Learned
• allocate plenty of memory to the JVM for each instance (we use between 4 and 8 GB depending on the instance) – use a 64-bit operating system, otherwise processes are limited to 4 GB each
• allocate plenty of storage up front (100 GB minimum) – CQ doesn’t perform JCR garbage collection by default, and CQ5 halts itself when insufficient storage space is detected
• change the default temp directory of CQ5 in the start script configuration to a volume with plenty of space
• tar persistence is highly I/O intensive – use the fastest reliable storage – anyone using SSD? We are using iSCSI on a 1 Gbps network
• Java patches need to be managed – system library links are severed as part of RHN updates, which can lead to CQ5 errors if CQ5 is left running during the patch
• schedule datastore garbage collection and use incremental backups to minimize the impact of backups

CQ5 Clustering
• Setup is easy, but the result is risky (CQ5 admin class) and expensive
• When clustering, understand the types of clustering available and make the right choice based on your requirements (e.g. shared JCR files or synchronized copies).
• Conclusions
• Frequently, the complexities inherent in software clusters create more downtime than clusters protect against. For small shops especially, that’s almost always the case.
• Active / active requires two CQ5 licenses; active / passive might not (ask Adobe).

Web Server Configurations
• NGA uses CQ5 behind Apache – HTTPS for author, both schemes for publish – dispatcher only used for publish.
• Lessons Learned
• Configure mod_proxy to prevent Apache from becoming an open proxy server
• SSL configuration in CQ5 requires installing a server certificate into the Java keystore. Unfortunately, Oracle doesn’t include the full set of CA certs that browsers do, so your CA’s cert might have to be added to the Java configuration. Moreover, Java updates can replace the keystore, which means you have to reinstall the CA cert every time Java is updated. When getting your server’s CSR signed, select a vendor whose CA cert is included out of the box in Java.
• Consider creating Apache and dispatcher configuration file templates and automating the creation and distribution of the configs to avoid differences between environments.
• Use Apache rather than CQ5 for index page redirection; generally speaking, minimize the number of requests that have to be sent to the CQ5 publish instance
• Urge developers to test through the DEV dispatcher before requesting code promotion, as filter rejects are easy to identify; create a CGI to expose a long tail of the DEV author, DEV publish, and DEV dispatcher logs so developers can see errors.

Dispatcher Configuration
• NGA is using version 4.1.2, but 4.1.3 is available for download
• Lessons Learned
• Dispatcher is an application firewall, so it’s probably best to filter out by default and include only the URL patterns that are needed
• Check for the text “reject” in dispatcher.log to identify requests that were rejected by the dispatcher filters
• Differentiating between the scheme (http vs. https) of the original browser request is impossible when proxying the connection to CQ (either with mod_proxy or dispatcher) because the scheme reported by JSP is the scheme of the proxy, not the browser.
• Latest dispatcher (4.1.3?) claims to support native HTTPS without requiring stunnel.
Even so, the latest dispatcher still might not support multiple configurations within a single Apache instance, so you might have to install two separate Apache instances on the same machine to detect the scheme properly.
• The dispatcher options for flushing the cache are quite limited – you might have to write your own CGI to intercept dispatcher flush requests and handle them yourself – or go with a very small number of stat file levels – we’re still figuring this out.

Author Instance Authentication
• NGA looked briefly at SSO with Apache / mod_auth* modules, which we use for other web-based systems; the security model (trusted credential) has been deprecated due to potentially serious security problems
• NGA opted for LDAP “out of the box”; we will be transitioning to LDAPS now that it’s working over unsecured LDAP.
• Lessons Learned
• The CQ5 trusted credentials attribute is being phased out, as indicated in the Jackrabbit log files – don’t depend on it – it is unclear what, if any, mechanism will replace it.
• That said, Kerberos (or most any other kind of Apache-supported authentication) can be integrated through Apache simply by setting a certain HTTP header in the request after Apache authentication is performed. However, there are security issues with this since headers can be spoofed easily – under current SSO, you must not expose CQ5 instances outside of the local machine.
• LDAP (prior to CQ 5.6, I believe) has a bug related to case sensitivity of user names – it’s fixed by a patch – OR – users need to kill the session cookies that are created after their initial login using LDAP.
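To make the header-spoofing concern concrete, here is a minimal sketch of the rule "only trust the SSO header when the request comes from the local machine". It is an illustration in Python, not CQ5 code: the header name `X-Remote-User` and the function names are hypothetical, since the slides do not name the header CQ5 expects.

```python
import ipaddress

# Hypothetical header name; the slides do not say which header carries the user.
SSO_HEADER = "X-Remote-User"


def is_loopback(remote_addr):
    """Return True only when the peer address is a loopback address."""
    try:
        return ipaddress.ip_address(remote_addr).is_loopback
    except ValueError:
        # An unparseable peer address is treated as untrusted.
        return False


def trusted_user(remote_addr, headers):
    """Return the SSO user name, or None when the header cannot be trusted.

    `headers` is a plain dict keyed by the exact header name; a real server
    would normalise header case before the lookup.
    """
    if not is_loopback(remote_addr):
        # Any remote client can forge the header, so ignore it entirely.
        return None
    user = headers.get(SSO_HEADER, "").strip()
    return user or None


if __name__ == "__main__":
    print(trusted_user("127.0.0.1", {SSO_HEADER: "jdoe"}))  # local Apache
    print(trusted_user("10.1.2.3", {SSO_HEADER: "admin"}))  # remote client
```

The design choice is to fail closed: a missing header, an empty value, or a non-loopback peer all yield no authenticated user.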
Publish Instance Authentication
• We support native CQ5 users and social users for public authentication
• Native CQ5 users are the same users used by the WCM tools, so be sure the default group assignments do not include WCM permissions
• oAuth uses the deprecated trusted credentials attribute, so the error log fills with warnings about its deprecation
• Lessons Learned
• oAuth is not a trivial endeavor – there are lots of details and configurations to keep up with, plus you’ll be writing some custom code to secure it anyway.
• Create a specific logging configuration for the oAuth module to send repetitive deprecation warnings to a separate log file or to /dev/null
• CQ 5.5 SP 2.1 breaks oAuth due to the ACL on /etc/cloudservices – you have to adjust it to permit read access to the /etc/cloudservices/facebook and twitter directories.
• HTTPS vs. HTTP and cookie security – we might have to get Apache involved to log out users if unsecured cookie transmission is detected – we’re still working on this.
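The last point is still an open question in these slides, so the following is only one possible shape for the check, written as a Python sketch. It assumes two things the slides do not confirm: that the front-end Apache virtual hosts record the browser's original scheme in an `X-Forwarded-Proto` request header, and that the session cookie is named `login-token`. Both names are placeholders to adapt.

```python
from http.cookies import SimpleCookie

# Assumptions, not taken from the slides: the header the front-end web server
# uses to pass on the original scheme, and the name of the session cookie.
SCHEME_HEADER = "X-Forwarded-Proto"
SESSION_COOKIE = "login-token"


def session_sent_insecurely(headers):
    """Return True when the session cookie arrived on a plain-HTTP request.

    `headers` is a plain dict keyed by exact header name. A missing scheme
    header is treated as "http", so the check fails closed.
    """
    scheme = headers.get(SCHEME_HEADER, "http").lower()
    if scheme == "https":
        return False
    cookies = SimpleCookie()
    cookies.load(headers.get("Cookie", ""))
    return SESSION_COOKIE in cookies


if __name__ == "__main__":
    request = {SCHEME_HEADER: "http", "Cookie": "login-token=abc123; theme=dark"}
    if session_sent_insecurely(request):
        print("session cookie seen over HTTP: force a logout")
```

A caller would respond to a True result by expiring the cookie and invalidating the session; how that is wired into Apache or CQ5 is left open here, as it is in the slides.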
Secure URLs
• Need HTTPS for certain form submissions
• Avoid abuse of forms that generate e-mail
• Lessons Learned
• Force HTTPS using Apache and / or with JSPs (if possible) – still figuring this out
• Use captcha to avoid getting on an e-mail blacklist

User Form Submissions
• Need removal of data from publish instances due to government privacy requirements
• Lessons Learned
• Relatively straightforward workflows and workflow launchers can achieve this (auto-reverse-replication and content deactivation); form authors need training
• Potential security issue with the out-of-the-box form handler (path specification for form element name) – need to lock down permissions for the anonymous user and public authenticated users

NGA CQ5 Art Data and Image Integration

NGA Art Data – key content for the web site
• 110k+ art objects, 20k artists, other entities – highly complex relationships stored in external relational database tables
• Redesigned web site has complex art data searches, sort orders, and faceting of art data search results
• Data must be synchronized – art object data changes regularly and new objects are added nearly every day

Images of Art Objects – key to a visually rich web site
• Images of the art – zoom files are very large, multiple sizes are required, color profile management, specific algorithms required for resizing, must support addition of manually cropped renditions
• Image associations with art objects are managed in a different system
• Images must be synchronized – images change regularly and new images are added frequently

For art object data, we attempted to build a prototype that would feature:
• automatic synchronization of data between relational database and JCR
• a JCR content hierarchy able to accommodate 1 million+ art data entities and relationships between them without
sacrificing data richness or overloading the JCR
• efficient JCR SQL2 queries that return distinct result sets in a highly customized sort order along with custom facets

Findings
• Relational database to JCR mapping isn’t straightforward. We realized that in order for JCR queries to perform well and to avoid duplicate search results, we would have to store multiple copies of the art object data in multiple JCR hierarchies, possibly many depending on the specific queries.
• Custom facet extractors in CQ5 must use the QueryBuilder API – none of the other search APIs support them – and the QueryBuilder implementation requires that all nodes of the result set be visited to accumulate facets. This might actually be a requirement of all CQ5 searches, since CQ5 permissions do not seem to be indexed in Lucene – so the larger the result set, the slower the query – other customers have reported similar findings.
• A solution involving duplication of data into an unknown number of content hierarchies was deemed to be impractical.

What we did
• As the complexity of the art object requirements grew, we studied the feasibility of loading all art data and image metadata into memory. We optimized the memory footprint of those structures and found that it was feasible. The art data bundle loads and caches approximately 1 GB of data in RAM. Searching, faceting, and sorting of this data are lightning fast.
• What about free text search? We realized that re-implementing the features of Lucene to support free-text search was not something we wanted to do, so we also built a small module to replicate a significantly flattened set of art data in the JCR for Lucene to perform free-text searches on.
This data set supports the site search as well as some of the searches on our “advanced collection search” page

For art object imagery, we attempted to:
• Use CQ5 DAM to create renditions, including zoom images
• Synchronize imagery with the CQ5 DAM using an independent Java program invoking JCR APIs over RMI.

Findings
• Using DAM for renditions ballooned the JCR to over 250 GB very quickly – datastore GC might have helped with that, but we didn’t know about it
• Insufficient control options over renditions – color profiles are not retained during resizing operations and no out-of-the-box control over resizing algorithms exists
• rsync over WebDAV didn’t work, so a separate program or OSGi bundle would have to be developed to synchronize imagery
• Custom workflow processes for images would have to be developed for creating image renditions

What we did
• Enhanced existing image processing jobs to produce additional sizes; rsync images and serve them from a dedicated image and image zoom server
• Enhanced the Art Object JCR bundle to include image data

Lessons Learned
• Let CQ do what CQ is good at – authoring, storing, and rendering web content. Resist the temptation to feed a lot of other data sources into the JCR.
• Spend the money to consult with an Adobe Architect – it’s worth it. Just be sure they understand how many hours you’ve allocated to each topic so you don’t blow the budget.
• Consider using a search engine such as SOLR to power your search features if you have requirements bordering on the complex or if you find yourself struggling with CQ5 JCR queries to get what you need.
• The CQ5 DAM is useful for assisting content authors, but until it evolves into something larger, limit its use to web content authoring.
• Perform data store garbage collection on a regular basis if you find your JCR is growing too rapidly.
Remember that CQ backups double your disk space requirement.
• Adobe still has a lot of work to do with respect to JCR search. Hopefully the next major version of CQ5 will expose a much richer and better performing search engine to developers.

NGA CQ5 Other Useful Nuggets

NGA Developer Workstation Configuration
• Java 1.6, latest patch
• CRXDE rather than Maven + Eclipse – recommended by Adobe since we are a small team
• CRXDE can be slow and its SVN integration is barely sufficient
• Missing CRXDE libraries
• Disable virus scanner, particularly during install
• 4 fast cores + 16 GB RAM + fast disk (SSD?) so a developer can easily run two instances + tools efficiently – some laptops have only 8 GB, which becomes problematic
• A local Apache + dispatcher would have been helpful

Builds
• Scripted build process on shared DEV / build server
• Uses curl for all operations
• Delete and recreate approach taken
• SVN export of apps, design, and environment-specific sling:osgiConfig nodes
• Bundle build to create JAR files
• Package created of all apps, JARs, and OSGi configs; package deployed to targets

In hindsight and assuming availability of resources and time, I would have:
1. Specified SOLR as our search solution from the start
2. Sent a few people to CQ5 advanced developer training earlier in the process
3. Sent a few people to CQ5 admin training earlier in the process
4. Committed to Maven + Eclipse rather than using CRXDE for development
5. Looked more seriously at a custom oAuth effort rather than using CQ5’s social collaboration module for authentication

Questions?
d-beaudet at nga.gov
202.312.2755

• Multiple run modes: Configure your CQ5 instances with multiple run modes (e.g.
dev,author) and use sling:osgiConfig nodes to target run-mode-specific directories (e.g. config.dev.author, config.dev, config.author)
• Avoid the Felix console: don’t apply configurations outside of your DEV activities using the Felix console – it’s better to define those in OSGi configs and push them as part of your build. Also, don’t mix modes of applying these configurations – it gets confusing, as changes are persisted differently by each.
• Avoid memory caching of JCR content if possible (for performance reasons, it’s not always possible) and focus instead on the dispatcher to cache your slowest running pages
• Use AJAX to avoid cache flushing: For content shared across the site via base page templates (e.g. site-wide alerts), consider using a separate AJAX call in order to avoid having to flush all of your pages from cache. Instead, just cache the individual component instance.
• Use reference components and train content authors how to create and use reference component instances to avoid having to duplicate a lot of identical content across your site. If you’re programmatically including fixed-path reference components in a template, consider keeping that content isolated to a specific directory that requires a developer to change that content; e.g. we use reference components outside of the context of a cq:page for drawers on our event pages.
• Use client libraries to consolidate and minify your CSS and JavaScript. Separate your JS (under /apps/appname/component path) and your CSS (under /etc/designs/appname) and use categories and embeds (and possibly dependencies) to roll up all of your client libraries into a few consolidated files at publish time. Also, separate your publish components from your author components by checking the WCM mode before including client libs.
Otherwise, you might end up (like we did) finding a 7.3 MB widgets.js file included for no reason on your home page. Remember that /etc/designs/yourapp/jcr:content contains the list of components permitted for a given template – you can either store it in SVN or let it live on your production authoring instance – but you should choose one approach over the other and adapt build scripts accordingly.
• Use web browser tools and free on-line web site performance analysis tools to suggest ways of improving the load times of your pages and minimizing the number of requests.
• Consolidated Search: we had a requirement to spider and show results from non-CQ5 managed web sites within our general site search, so we wrote a spider that stores the content of select pages from those sites into the JCR. The web site source is used as a facet so users can opt to select / deselect based on the source web site. SOLR (or another external search engine) could be used for this purpose as well, and CQ5 templates would simply wrap the SOLR RESTful APIs to render results.
• Use the latest dispatcher
• If at first Adobe support isn’t helpful, be persistent and request a phone call – the quality of response is highly variable – I’ve also heard that specifying “Sling” as the component leads to the best customer support personnel, but I haven’t confirmed that.
• Avoid using custom namespaces if possible – they cannot be removed from the JCR once created.
• Avoid creating custom node types for the purpose of simplifying JCR queries – our performance tests show that using attribute / value pairs is usually even faster than a query specifying a particular node type.
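To illustrate the two query styles compared in the last point, the sketch below builds both as JCR-SQL2 strings in Python. It is an illustration only: the path `/content/art`, the property `recordType`, and the node type `nga:artObject` are hypothetical, and the slides do not say which query language the performance tests used.

```python
def escape_literal(value):
    """Escape a string for use inside a single-quoted JCR-SQL2 literal."""
    return value.replace("'", "''")


def query_by_property(root, prop, value):
    """Attribute / value style: generic nodes filtered by a property."""
    return (
        "SELECT * FROM [nt:unstructured] AS n "
        f"WHERE ISDESCENDANTNODE(n, '{escape_literal(root)}') "
        f"AND n.[{prop}] = '{escape_literal(value)}'"
    )


def query_by_node_type(root, node_type):
    """Node type style: relies on a custom node type being registered."""
    return (
        f"SELECT * FROM [{node_type}] AS n "
        f"WHERE ISDESCENDANTNODE(n, '{escape_literal(root)}')"
    )


if __name__ == "__main__":
    # Hypothetical names, used only to show the shape of each query.
    print(query_by_property("/content/art", "recordType", "artObject"))
    print(query_by_node_type("/content/art", "nga:artObject"))
```

The first form needs no custom node type or namespace, which also fits the advice about avoiding custom namespaces.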