Visualizing Health: Enhancing Public Health through Weave Data Analysis and Visualization

Carol A. Gotway Crawford*, Michael Smyser†, Georges Grinstein‡, Jim Ribble*, Soyoun Park*, Robb Chapman*, Shweta Purushe‡, Patrick Ryan‡, Franck Kamayou‡, Ekaterina Galkina‡

*Centers for Disease Control and Prevention, Division of Behavioral Surveillance, Public Health Surveillance and Informatics Program Office.
†Public Health Seattle & King County and Department of Epidemiology at the University of Washington.
‡University of Massachusetts Lowell

ABSTRACT
We present a sophisticated web-based analytical tool for use in public health and community decision support systems. This tool is based on Weave, a Web-based Analysis and Visualization Environment developed by a collaboration of computer scientists at the University of Massachusetts Lowell and members of the Open Indicators Consortium. In this presentation, we describe a new Weave component, the Analyst Workstation (AWS), being developed to enhance the data warehousing and analytical capabilities of the Weave environment. This work is being led by a collaboration of Weave developers and scientists, surveillance scientists at the U.S. Centers for Disease Control and Prevention and public health practitioners at Public Health – Seattle & King County. Our work focuses on adding strong and flexible analytic capacity to the existing Weave visualization software for purposes of community health assessment. The entire system is supported through other freely available open-source packages, but will also work with proprietary tools. Examples of Weave output and use of the AWS prototype will also be presented, as well as how the program design will be able to meet needs for automated health data assessment and rapid response for web-visualized health data information.

Keywords: Health informatics, information visualization, visualization systems and tools, open source software.
Index Terms: H.1.2 [Information Systems]: User/Machine Systems – Human information processing; J.3.2 [Life and Medical Sciences]: Health

1 INTRODUCTION
Sources of public health data have rapidly increased with improved local and federal government commitments to make data available to the public [1, 2]. Such data have been very beneficial to local public health organizations for local community health assessment, research into specific health trends or disease patterns and evaluation of local public health programs or interventions. The availability of so many new data sources presents many challenges in analysis and exploration, as well as presentation and dissemination. Improvements in Internet technology now permit the public user as well as the analyst to explore the data in creative ways for understanding and assessment using dynamic displays that help a wide audience quickly grasp patterns, trends, or complex empirical information.

1.1 WHAT IS WEAVE?
Weave is a Web-based Analysis and Visualization Environment. It is an open-source computational and visualization package developed at the University of Massachusetts Lowell’s Institute for Visualization and Perception Research (IVPR) in partnership with the Open Indicators Consortium (OIC), a collaborative of public and nonprofit organizations working to improve access to more and higher quality data [3, 4]. Weave is freely available for download on GitHub [5] and may be used by any organization such as health departments or community health organizations that typically may not have resources to purchase or license proprietary software. These organizations include hospitals, clinics, medical offices, as well as state and local governments.
Currently the OIC participates as a governing and advisory body that includes members from 18 community and governmental agencies. The Weave Users’ Forum provides assistance to any user who has questions about the package. Weave allows for many types of interactive presentation formats which include standard displays of charts, graphs, mapping tools, as well as novel data mining tools such as RadViz [6] and data-driven document retrieval such as InfoMaps [7]. All visualizations and analyses can be embedded in non-web presentations. Perhaps the most important feature of Weave is its creation of “session states”: an architecture that can record every event or interaction, including those with external tools having an Application Programming Interface (API). This allows users to quickly replicate all tasks leading to a particular visualization, and allows users to easily share the commands.

1.2 CHALLENGES: REALIZING WEAVE’S FULL POTENTIAL FOR ANALYZING AND VISUALIZING HEALTH DATA
Weave easily loads data that are ready for visualization and presentation. Although Weave can analyze data through an interface with the R-project package [8] or through exploratory visualizations, it did not, in its original incarnation, handle complex data processing routines. Consequently, some datasets had to be pre-processed before visualization with Weave, and this processing and analysis required using other analytical software. As users may also require rapid data-to-visualization output, it became important to add enhanced analytical capability to Weave that would allow the use of customized analysis routines, processing and linking of large, complex data systems, and easy storage and access of data and analytical routines.

2 METHODS: ADDRESSING THE CHALLENGE
Recognizing the need for a tool like Weave for clinical decision support and community health needs assessment, CDC’s Division of Behavioral Surveillance (DBS) became an active member in the collaborative Weave development process.
DBS manages the Behavioral Risk Factor Surveillance System (BRFSS), the world's largest, on-going telephone health survey system [9].¹ Thus, an additional goal of CDC participation in the Weave development process was to allow flexible analysis and visualization of BRFSS data in a cloud-based environment. Open-source was deemed a critical component in order to leverage costs and expertise.

At the same time, Public Health – Seattle & King County (PHSKC) needed to revise their analysis, visualization and dissemination system that provides an evidence base for health protection (tracking and preventing disease and other threats; regulating dangerous environmental and workplace exposures; and ensuring the safety of water, air and food), health promotion (leading efforts to promote health and prevent chronic conditions and injuries) and health provision (helping assure access to high quality health care for all populations) [10]. PHSKC reports on nearly 100 community health indicators which are presented on the departmental web pages. These indicators are analyzed for trends over time as well as for differences within demographic subgroups (e.g., age, sex, race and ethnicity) and geography [11]. In addition, PHSKC receives hundreds of requests every year for customized health assessment reports to meet departmental funding or evaluation needs. Requests also come from local community health and other organizations, generally to support grant proposals or other funding requests. Much of this information is produced using legacy software created by PHSKC called VistaPHw, which performs data analysis of vital record and population data but does not produce output in formats that are ready for display or immediate use. Updates and revisions to community health indicator information are time-consuming, but need to be performed at least annually as new data are received.

Thus, in response to CDC and PHSKC needs and requirements, the IVPR team has begun development of the Analyst Workstation (AWS).
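The kind of indicator analysis described in this section (trends over time and differences across demographic subgroups) can be illustrated with a minimal sketch. This is not the routine used by PHSKC, VistaPHw, or the AWS; the record layout, the field names (`year`, `sex`, `events`, `population`), and all of the numbers are hypothetical and chosen only to show the shape of the computation.

```python
from collections import defaultdict

# Hypothetical aggregated records: event counts and population at risk.
# All values are invented for illustration only.
records = [
    {"year": 2010, "sex": "F", "events": 30, "population": 10000},
    {"year": 2010, "sex": "M", "events": 45, "population": 10000},
    {"year": 2011, "sex": "F", "events": 33, "population": 11000},
    {"year": 2011, "sex": "M", "events": 40, "population": 10000},
]


def rates_per_100k(rows, keys):
    """Crude rate per 100,000 population, grouped by the given field names."""
    events = defaultdict(int)
    population = defaultdict(int)
    for row in rows:
        group = tuple(row[k] for k in keys)
        events[group] += row["events"]
        population[group] += row["population"]
    return {g: 100000 * events[g] / population[g] for g in events}


# Trend over time: one crude rate per year.
by_year = rates_per_100k(records, ["year"])

# Subgroup differences: one crude rate per (year, sex) combination.
by_year_sex = rates_per_100k(records, ["year", "sex"])

for group, rate in sorted(by_year_sex.items()):
    print(group, round(rate, 1))
```

A production routine would typically add age adjustment, confidence intervals, and survey weights where applicable; the sketch only shows the grouping step that feeds a trend chart or subgroup comparison.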
3 RESULTS: THE ANALYST WORKSTATION (AWS)
With the CDC and PHSKC needs and requirements in hand, the IVPR team began development of the Analyst Workstation (AWS). As Weave is open-source, this workstation is now being developed through active engagement by members of the Open Indicators Consortium (OIC) through the Agile software development process [12], where the project is driven by member needs and participation. All worked together to develop an architecture extending Weave and a prototype to meet the collective and diverse needs of the CDC and PHSKC as well as other local health departments. The AWS is initially being developed to use data from the Behavioral Risk Factor Surveillance System since it is the largest source of state and local health data, but later releases will include enhancements to work with virtually any dataset with accompanying metadata. Analytical developments have focused on the use of the R statistical package, but additional modifications will allow the use of Stata and SAS. The system is being designed to include a comprehensive data analysis and presentation system which includes data warehouse capabilities or online data access, cloud-based use and user-defined analysis routines and visualizations.

4 IMPACT
The Analyst Workstation will greatly enhance public health decision-making capability by allowing users to fully analyze, visualize and report data in an integrated framework. It can be used to assist public health professionals in determining disease impact, recognizing disease clusters, and identifying populations and areas most affected or at risk. The result is a fully integrated data analysis and visualization system which can also be used to help prioritize and direct critical resources, and develop effective community-based initiatives to improve the public’s health. Perhaps most importantly, the entire Weave platform has been driven by the needs of, and in collaboration with, state and local public health practitioners. The end result of this collaboration is an open-source data analysis and visualization environment that is sophisticated, intuitive and adaptable to local public health needs.
5 CONCLUSION
The collaborative efforts of Weave developers and OIC members, especially the CDC and Public Health – Seattle & King County, have been successful in the creation of designs and initial programming for adding strong and flexible analytic capacity to the existing Weave visualization software. The new Weave component, the Analyst Workstation (AWS), is being developed to analyze and dynamically display in a web-based format any public health or other data that can be loaded into the software. The entire system is integrated with other open-source packages including MySQL (data storage and management) [13], Apache Tomcat (web server) [14] and R (data analysis) [15], but will also work with proprietary tools such as Stata and SAS. The open-source Weave software with the addition of the AWS has potential for meeting health assessment needs of community and governmental organizations worldwide, including automated health data assessment and rapid response for web-visualized health data.

REFERENCES
[1] U.S. Department of Health & Human Services. HealthData.gov. [Online]. Available: http://www.healthdata.gov.
[2] U.S. Centers for Disease Control and Prevention. National Center for Health Statistics. Public-Use Data Files and Documentation. [Online]. Available: http://www.cdc.gov/nchs/data_access/ftp_data.htm.
[3] Weave. [Online]. Available: http://info.oicweave.org/projects/weave.
[4] Open Indicators Consortium. [Online]. Available: http://oicweave.org.
[5] GitHub. [Online]. Available: https://github.com/about.
[6] Sharko J, Grinstein G, Marx KA. Vectorized Radviz and its application to multiple cluster datasets. IEEE Trans Vis Comput Graph. 2008 Nov-Dec;14(6):1444-51.
[7] Kolman S, Dufilie AS, Anbalagan SK, Grinstein G, "InfoMaps: A Session Based Document Visualization and Analysis Tool," iv, pp. 274-282, 2012 16th International Conference on Information Visualisation, 2012. [Online]. Available: http://www.computer.org/csdl/proceedings/iv/2012/4771/00/4771a274-abs.html.
[8] The R Project for Statistical Computing. [Online]. Available: http://www.r-project.org.
[9] U.S. Centers for Disease Control and Prevention. Behavioral Risk Factor Surveillance System. [Online]. Available: http://www.cdc.gov/brfss.
[10] Public Health – Seattle & King County. Statement of Mission. [Online]. Available: http://www.kingcounty.gov/healthservices/health/healthofficer/mission.aspx.
[11] Public Health – Seattle & King County. Community Health Indicators. [Online]. Available: http://www.kingcounty.gov/healthservices/health/data/indicators.aspx.
[12] Beck, Kent; et al. (2001). "Manifesto for Agile Software Development". Agile Alliance (http://agilemanifesto.org/).
[13] MySQL. [Online]. Available: http://www.mysql.com.
[14] Apache Tomcat. [Online]. Available: http://tomcat.apache.org.
[15] The R Project for Statistical Computing. [Online]. Available: http://www.r-project.org.

¹ U.S. Centers for Disease Control and Prevention. Behavioral Risk Factor Surveillance System (http://www.cdc.gov/brfss/).