Visualizing Health: Enhancing Public Health through Weave Data Analysis and Visualization

Carol A. Gotway Crawford*, Michael Smyser†, Georges Grinstein‡, Jim Ribble*, Soyoun Park*, Robb
Chapman*, Shweta Purushe‡, Patrick Ryan‡, Franck Kamayou‡, Ekaterina Galkina‡
*Centers for Disease Control and Prevention, Division of Behavioral Surveillance, Public Health Surveillance and Informatics Program
Office. †Public Health Seattle & King County and Department of Epidemiology at the University of Washington. ‡University of Massachusetts
Lowell
ABSTRACT
We present a sophisticated web-based analytical tool for use in public health and community decision support systems. This tool is based on Weave, a Web-based Analysis and Visualization Environment developed by a collaboration of computer scientists at the University of Massachusetts Lowell and members of the Open Indicators Consortium. In this presentation, we describe a new Weave component, the Analyst Workstation (AWS), being developed to enhance the data warehousing and analytical capabilities of the Weave environment. This work is being led by a collaboration of Weave developers and scientists, surveillance scientists at the U.S. Centers for Disease Control and Prevention and public health practitioners at Public Health – Seattle & King County. Our work focuses on adding strong and flexible analytic capacity to the existing Weave visualization software for purposes of community health assessment. The entire system is supported through other freely available open-source packages, but will also work with proprietary tools. Examples of Weave output and use of the AWS prototype will also be presented, as well as how the program design will be able to meet needs for automated health data assessment and rapid response for web-visualized health data information.
Keywords: Health informatics, information visualization, visualization systems and tools, open source software.
Index Terms: H.1.2 [Information Systems]:
User/Machine Systems – Human information
processing; J.3.2 [Life and Medical Sciences]: Health
1 INTRODUCTION
Sources of public health data have rapidly increased with improved local and federal government commitments to make data available to the public [1, 2]. Such data have been very beneficial to local public health organizations for local community health assessment, research into specific health trends or disease patterns and evaluation of local public health programs or interventions. The availability of so many new data sources presents many challenges in analysis and exploration, as well as presentation and dissemination. Improvements in Internet technology now permit the public user as well as the analyst to explore the data in creative ways for understanding and assessment using dynamic displays that help a wide audience quickly grasp patterns, trends, or complex empirical information.
1.1 WHAT IS WEAVE?
Weave is a Web-based Analysis and Visualization Environment. It is an open-source computational and visualization package developed at the University of Massachusetts Lowell’s Institute for Visualization and Perception Research (IVPR) in partnership with the Open Indicators Consortium (OIC), a collaborative of public and nonprofit organizations working to improve access to more and higher quality data [3, 4]. Weave is freely available for download on GitHub [5] and may be used by any organization, such as health departments or community health organizations, that typically may not have resources to purchase or license proprietary software. These organizations include hospitals, clinics, medical offices, as well as state and local governments. Currently the OIC participates as a governing and advisory body that includes members from 18 community and governmental agencies. The Weave Users’ Forum provides assistance to any user who has questions about the package. Weave allows for many types of interactive presentation formats, which include standard displays of charts, graphs, mapping tools, as well as novel data mining tools such as RadViz [6] and data-driven document retrieval such as InfoMaps [7]. All visualizations and analyses can be embedded in non-web presentations. Perhaps the most important feature of Weave is its creation of “session states”: an architecture that can record every event or interaction, including those with external tools having an Application Programming Interface (API). This allows users to quickly replicate all tasks leading to a particular visualization and to easily share the commands.
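The idea of recording interactions so they can be replayed and shared can be illustrated with a minimal Python sketch. It is not Weave's implementation or API: the SessionLog class, the handler names (set_column, set_filter) and the column names are all hypothetical, chosen only to show how an ordered event log can rebuild the same view.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class SessionLog:
    """Hypothetical stand-in for a session state: an ordered list of recorded events."""
    events: List[Dict[str, Any]] = field(default_factory=list)

    def record(self, action: str, **params: Any) -> None:
        # Store each interaction as plain data so it can be saved or shared.
        self.events.append({"action": action, "params": params})

    def replay(self, handlers: Dict[str, Callable[..., None]]) -> None:
        # Re-apply every recorded event, in order, against a fresh target.
        for event in self.events:
            handlers[event["action"]](**event["params"])


def make_handlers(view: Dict[str, Any]) -> Dict[str, Callable[..., None]]:
    """Build handlers that mutate a dictionary standing in for a visualization."""
    def set_column(axis: str, column: str) -> None:
        view[axis] = column

    def set_filter(column: str, value: Any) -> None:
        view.setdefault("filters", {})[column] = value

    return {"set_column": set_column, "set_filter": set_filter}


# Record three interactions (all names are illustrative assumptions).
log = SessionLog()
log.record("set_column", axis="x", column="year")
log.record("set_column", axis="y", column="smoking_prevalence")
log.record("set_filter", column="county", value="King")

# Replaying the log against an empty view reproduces the same configuration.
rebuilt_view: Dict[str, Any] = {}
log.replay(make_handlers(rebuilt_view))
```

Because the events are plain data, a log of this kind could be serialized and handed to another user, which is the sharing property the text describes.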
1.2 CHALLENGES: REALIZING WEAVE’S FULL POTENTIAL FOR ANALYZING AND VISUALIZING HEALTH DATA
Weave easily loads data that are ready for visualization and presentation. Although Weave can analyze data through an interface with the R-project package [8] or through exploratory visualizations, it did not, in its original incarnation, handle complex data processing routines. Consequently, some datasets had to be pre-processed before visualization with Weave, and this processing and analysis required using other analytical software. As users may also require rapid data-to-visualization output, it became important to add enhanced analytical capability to Weave that would allow the use of customized analysis routines, processing and linking of large, complex data systems, and easy storage and access of data and analytical routines.
2 METHODS: ADDRESSING THE CHALLENGE
Recognizing the need for a tool like Weave for clinical decision support and community health needs assessment, CDC’s Division of Behavioral Surveillance (DBS) became an active member in the collaborative Weave development process. DBS manages the Behavioral Risk Factor Surveillance System (BRFSS), the world's largest, on-going telephone health survey system [9].¹ Thus, an additional goal of CDC participation in the Weave development process was to allow flexible analysis and visualization of BRFSS data in a cloud-based environment. Open-source was deemed a critical component in order to leverage costs and expertise. At the same time, Public Health – Seattle & King County (PHSKC) needed to revise their analysis, visualization and dissemination system that provides an evidence base for health protection (tracking and preventing disease and other threats; regulating dangerous environmental and workplace exposures; and ensuring the safety of water, air and food), health promotion (leading efforts to promote health and prevent chronic conditions and injuries) and health provision (helping assure access to high quality healthcare for all populations) [10]. PHSKC reports on nearly 100 community health indicators which are presented on the departmental web pages. These indicators are analyzed for trends over time as well as for differences within demographic subgroups (e.g., age, sex, race and ethnicity) and geography [11]. In addition, PHSKC receives hundreds of requests every year for customized health assessment reports to meet department funding or evaluation needs. Requests also come from local community health and other organizations, generally to support grant proposals or other funding requests. Much of this information is produced using legacy software created by PHSKC called VistaPHw, which performs data analysis of vital record and population data but does not produce output in formats that are ready for display or immediate use. Updates and revisions to community health indicator information are time-consuming, but need to be performed at least annually as new data are received.

Thus, in response to CDC and PHSKC needs and requirements, the IVPR team has begun development of the Analyst Workstation (AWS).
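The kind of trend and subgroup comparison described above for community health indicators can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (year, subgroup, condition flag, survey weight) and the sample values are hypothetical, and the sketch computes a simple weighted proportion without the variance estimation or survey-design adjustments that a real BRFSS or PHSKC analysis would need.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical respondent record: (year, subgroup, has_condition, survey_weight).
Record = Tuple[int, str, bool, float]


def weighted_prevalence(records: Iterable[Record]) -> Dict[Tuple[int, str], float]:
    """Return the weighted share of respondents with the condition per (year, subgroup)."""
    weight_total: Dict[Tuple[int, str], float] = defaultdict(float)
    weight_with_condition: Dict[Tuple[int, str], float] = defaultdict(float)
    for year, subgroup, has_condition, weight in records:
        key = (year, subgroup)
        weight_total[key] += weight
        if has_condition:
            weight_with_condition[key] += weight
    # Skip groups whose weights sum to zero to avoid dividing by zero.
    return {
        key: weight_with_condition[key] / total
        for key, total in weight_total.items()
        if total > 0
    }


# Made-up sample data, for illustration only.
sample = [
    (2011, "female", True, 2.0),
    (2011, "female", False, 6.0),
    (2011, "male", True, 3.0),
    (2011, "male", False, 3.0),
    (2012, "female", True, 1.0),
    (2012, "female", False, 3.0),
]
prevalence = weighted_prevalence(sample)
```

Keying the result by (year, subgroup) keeps both comparisons available from one pass: reading across years for a fixed subgroup gives a trend, and reading across subgroups for a fixed year gives a demographic comparison.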
3 RESULTS: THE ANALYST WORKSTATION (AWS)
With the CDC and PHSKC needs and requirements in hand, the IVPR team began development of the Analyst Workstation (AWS). As Weave is open-source, this workstation is now being developed through active engagement by members of the Open Indicators Consortium (OIC) through the Agile software development process [12], where the project is driven by member needs and participation. All worked together to develop an architecture extending Weave and a prototype to meet the collective and diverse needs of the CDC and PHSKC as well as other local health departments. The AWS is initially being developed to use data from the Behavioral Risk Factor Surveillance System since it is the largest source of state and local health data, but later releases will include enhancements to work with virtually any dataset with accompanying metadata. Analytical developments have focused on the use of the R statistical package, but additional modifications will allow the use of Stata and SAS. The system is being designed as a comprehensive data analysis and presentation system that includes data warehouse capabilities or online data access, cloud-based use and user-defined analysis routines and visualizations.
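One way to picture user-defined analysis routines applied to a dataset with accompanying metadata is a small registry of named functions. The sketch below is a hypothetical Python illustration, not the AWS design: the ROUTINES registry, the register decorator, the run_routine helper and the obesity_pct column are all assumptions made for the example, and the actual AWS work described here centers on R.

```python
from statistics import mean
from typing import Callable, Dict, List

Routine = Callable[[List[float]], float]

# Hypothetical registry mapping a routine name to a function over a numeric column.
ROUTINES: Dict[str, Routine] = {}


def register(name: str) -> Callable[[Routine], Routine]:
    """Decorator that stores a user-defined routine under a name."""
    def decorator(func: Routine) -> Routine:
        ROUTINES[name] = func
        return func
    return decorator


@register("mean")
def column_mean(values: List[float]) -> float:
    return mean(values)


@register("range")
def column_range(values: List[float]) -> float:
    return max(values) - min(values)


def run_routine(
    name: str,
    dataset: Dict[str, List[float]],
    metadata: Dict[str, Dict[str, str]],
    column: str,
) -> Dict[str, object]:
    """Run a registered routine on one column and attach that column's metadata."""
    value = ROUTINES[name](dataset[column])
    return {
        "routine": name,
        "column": column,
        "value": value,
        "metadata": metadata.get(column, {}),
    }


# Made-up dataset and metadata, for illustration only.
dataset = {"obesity_pct": [24.0, 26.0, 28.0]}
metadata = {"obesity_pct": {"title": "Adult obesity", "unit": "percent"}}
output = run_routine("mean", dataset, metadata, "obesity_pct")
```

Carrying the metadata alongside the computed value means a downstream visualization can label its axes and units without a separate lookup.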
4 IMPACT
The Analyst Workstation will greatly enhance public health decision-making capability by allowing users to now fully analyze, visualize and report data in an integrated framework. It can be used to assist public health professionals in determining disease impact, recognizing disease clusters, and identifying populations and areas most affected or at risk. The result is a fully integrated data analysis and visualization system which can also be used to help prioritize and direct critical resources, and develop effective community-based initiatives to improve the public’s health. Perhaps most importantly, the entire Weave platform has been driven by the needs of, and in collaboration with, state and local public health practitioners. The end result of this collaboration is an open-source data analysis and visualization environment that is sophisticated, intuitive and adaptable to local public health needs.
5 CONCLUSION
The collaborative efforts of Weave developers and OIC members, especially the CDC and Public Health – Seattle & King County, have been successful in the creation of designs and initial programming for adding strong and flexible analytic capacity to the existing Weave visualization software. The new Weave component, the Analyst Workstation (AWS), is being developed to analyze and dynamically display in a web-based format any public health or other data that can be loaded into the software. The entire system is integrated with other open-source packages including MySQL (data storage and management) [13], Apache Tomcat (web server) [14] and R (data analysis) [15], but will also work with proprietary tools such as Stata and SAS. The open-source Weave software with the addition of the AWS has potential for meeting health assessment needs of community and governmental organizations worldwide, including automated health data assessment and rapid response for web-visualized health data.
REFERENCES
[1] U.S. Department of Health & Human Services. HealthData.gov. [Online]. Available: http://www.healthdata.gov.
[2] U.S. Centers for Disease Control and Prevention. National Center for Health Statistics. Public-Use Data Files and Documentation. [Online]. Available: http://www.cdc.gov/nchs/data_access/ftp_data.htm.
[3] Weave. [Online]. Available: http://info.oicweave.org/projects/weave.
[4] Open Indicators Consortium. [Online]. Available: http://oicweave.org.
[5] GitHub. [Online]. Available: https://github.com/about.
[6] Sharko J, Grinstein G, Marx KA. Vectorized Radviz and its application to multiple cluster datasets. IEEE Trans Vis Comput Graph. 2008 Nov-Dec; 14(6): 1444-51.
[7] Kolman S, Dufilie AS, Anbalagan SK, Grinstein G, "InfoMaps: A Session Based Document Visualization and Analysis Tool," iv, pp. 274-282, 2012 16th International Conference on Information Visualisation, 2012. [Online]. Available: http://www.computer.org/csdl/proceedings/iv/2012/4771/00/4771a274-abs.html.
[8] The R Project for Statistical Computing. [Online]. Available: http://www.r-project.org.
[9] U.S. Centers for Disease Control and Prevention. Behavioral Risk Factor Surveillance System. [Online]. Available: http://www.cdc.gov/brfss.
[10] Public Health – Seattle & King County. Statement of Mission. [Online]. Available: http://www.kingcounty.gov/healthservices/health/healthofficer/mission.aspx.
[11] Public Health – Seattle & King County. Community Health Indicators. [Online]. Available: http://www.kingcounty.gov/healthservices/health/data/indicators.aspx.
[12] Beck, Kent; et al. (2001). "Manifesto for Agile Software Development". Agile Alliance (http://agilemanifesto.org/).
[13] MySQL. [Online]. Available: http://www.mysql.com.
[14] Apache Tomcat. [Online]. Available: http://tomcat.apache.org.
[15] The R Project for Statistical Computing. [Online]. Available: http://www.r-project.org.
¹ U.S. Centers for Disease Control and Prevention. Behavioral Risk Factor Surveillance System (http://www.cdc.gov/brfss/).