SAS Mapping Functionality to Measure and Present the Veracity of Location Data

Richard J Self, Senior Lecturer in Analytics and Governance
Vishal Patel, Final Year Student, University of Derby
Daniel Corah, Final Year Student, University of Derby
Viktor Horecny, Final Year Student, University of Derby

University of Derby, UK
r.j.self@derby.ac.uk
http://computing.derby.ac.uk/wordpress/people-2/richard-j-self/

Objectives
• SAS – Exploring Mapping Functionality to Visualise Veracity of Location Data
• Lessons Learned about Location Data Veracity and SAS Visualisations

Context
• Smart device location services are seen as reliable
• This may not be true, and the consequences are many:
  • Retail LBS-based marketing
  • Social network apps
  • Photo locations in social media and Google Maps
  • Forensics
  • Criminal justice system
• Research question: to what extent is A-GPS reliable, and in what circumstances?

Triggers to Research Project

Final Year Student Project
• 12 students researching
• 3 are co-authors, contributing valuable analyses
• 7 students contributed data to this presentation (2460 data points):
  • Daniel Corah
  • Vishal Patel
  • Amna Almutawa
  • Ishwa Khadka
  • Viktor Horecny
  • Shehzaad Kashmiri
  • Farondeep Bains

Critical Questions
• Levels of accuracy in different conditions
  • Indoors / outdoors
  • Rural / residential / urban
  • Weather conditions
• Stability of indicated location
• Differences between devices (make / model / operating system)

V Patel – Key Insight – Models Vary

phone        N    Mean       Std Dev   Std Err
Nexus        54   41.5629    24.1146   3.2816
iPhone       58   85.5101    113.8     14.9403
Diff (1-2)        -43.9472   83.5987   15.8088

Method         Variances   DF       t Value   Pr > |t|
Pooled         Equal       110      -2.78     0.0064
Satterthwaite  Unequal     62.476   -2.87     0.0055

• Proc Univariate – Histogram issues

D Corah – Key Insight – Stone Built Houses
• Proc SGPLOT

V Horecny – Key Insight – Chipsets
• HTC-M8 (blue): modern chipset
• HTC-Desire S (pink): early version chipset
• Uses XL/JMP®

Other Insights
• Cloud conditions affect accuracy
• Accuracy variable with time

Overall Accuracy of LBS
• 85% <= 25 metres
• 2364 out of 2420 <= 500 m

Accuracy Variable with Time
• Start-up of Location Services (LS): max error 360 m
• Uses Annotate coding and macros

Accuracy Variable with Time

Consolidated Data – 2420 points
• Red = error > 300 m

Annotate for Time Based Accuracy
Challenges:
• Auto-scaling and boundaries
• Data System
• ANNOMAC coding for labels

Raw Data

Lat_True_Deg  Long_True_Deg  Lat_Xif_Deg  Long_Xif_Deg  Loc_Ind    Image_Path     Date_Time_Stamp      Phone_type
53.962118     -1.308214      53.960864    -1.308306     Open       IMG_0464.JPG   06/09/2014 07:24:24  iPhone 5C
53.962118     -1.308214      53.958194    -1.311589     Open       IMG_0466.JPG   17/08/2014 13:56:51  iPhone 5C
53.962118     -1.308214      53.960864    -1.308306     Open       IMG_0465.JPG   17/08/2014 13:52:47  iPhone 5C
52.911537     -1.484403      52.909194    -1.486745     circuit 1  IMG_01102.jpg  23/03/2015 08:25:46  iPhone 5C
52.911537     -1.484403      52.909194    -1.486742     circuit 1  IMG_01103.jpg  23/03/2015 08:25:47  iPhone 5C
52.911537     -1.484403      52.909194    -1.486742     circuit 1  IMG_01104.jpg  23/03/2015 08:25:48  iPhone 5C
52.911537     -1.484403      52.909194    -1.486742     circuit 1  IMG_01105.jpg  23/03/2015 08:25:49  iPhone 5C
52.911537     -1.484403      52.909194    -1.486742     circuit 1  IMG_01106.jpg  23/03/2015 08:25:50  iPhone 5C
52.911537     -1.484403      52.909194    -1.486742     circuit 1  IMG_01107.jpg  23/03/2015 08:25:51  iPhone 5C
52.911537     -1.484403      52.909194    -1.486742     circuit 1  IMG_01108.jpg  23/03/2015 08:25:52  iPhone 5C
52.911537     -1.484403      52.909194    -1.486742     circuit 1  IMG_01109.jpg  23/03/2015 08:25:53  iPhone 5C

• Lat_True_Deg and Long_True_Deg found through Google Maps
• Lat_Xif_Deg, Long_Xif_Deg and Date_Time_Stamp read from images using IrfanView

Boundaries

    proc means data=work_derby min max noprint;
        output out=means_derby;
        var x y;
    run;

    /* deduce and output corner coordinates (in Lat (Y) / Long Degrees (X)) and output using symput */
    data _null_;
        set means_derby;
        if _stat_ = 'MIN' then do;
            call symput('min_x', x);
            call symput('min_y', y);
        end;
        if _stat_ = 'MAX' then do;
            call symput('max_x', x);
            call symput('max_y', y);
        end;
    run;
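The slides do not show how the error value used in the later dot-generation step is derived from the raw data. A minimal sketch of one way to do it, assuming the error is the great-circle distance in metres between the true and the EXIF coordinates. The data set names raw_sample and error_sample are hypothetical, and the two input rows are copied from the Raw Data slide:

```sas
/* Hypothetical sketch: data set names are assumptions, not from the slides. */
data raw_sample;
    input Lat_True_Deg Long_True_Deg Lat_Xif_Deg Long_Xif_Deg;
    datalines;
53.962118 -1.308214 53.960864 -1.308306
52.911537 -1.484403 52.909194 -1.486742
;
run;

data error_sample;
    set raw_sample;
    /* GEODIST with 'DK': inputs in degrees, result in kilometres; */
    /* multiplying by 1000 expresses the error in metres.          */
    error = geodist(Lat_True_Deg, Long_True_Deg,
                    Lat_Xif_Deg,  Long_Xif_Deg, 'DK') * 1000;
run;

proc print data=error_sample noobs;
run;
```

Whether the project computed the error this way is not stated on the slides; any distance formula that yields metres would feed the same error thresholds.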
Auto-Scaling

    xsys = '1';      /* using Frame area */
    ysys = '1';
    hsys = '3';
    dotsize = 0.5;   /* basic size of plotted error dot % of frame */

    /* plot data in centered 90% of Frame Area */
    /* min_x etc set from previous section */
    x = (90 - (x - symget('min_x')) * 90 / (symget('max_x') - symget('min_x'))) + 5;
    y = (y - symget('min_y')) * 90 / (symget('max_y') - symget('min_y')) + 5;

Dot Generation – using annomac macros

    /* different colors for different errors */
    if error < 1 then do;
        dotsize = dotsize * 1;      /* small dot for high accuracy */
        %slice(x,y,0,360,dotsize,darkgreen,solid,3);
    end;
    else if error >= 1 and error < 10 then do;
        dotsize = dotsize * 1.5;
        %slice(x,y,0,360,dotsize,mediumgreen,solid,3);
    end;
    else if error >= 10 and error < 100 then do;
        dotsize = dotsize * 2;
        %slice(x,y,0,360,dotsize,mediumyellow,solid,3);
    end;
    else if error >= 100 and error < 200 then do;
        dotsize = dotsize * 2.5;    /* large dot for big error */
        %slice(x,y,0,360,dotsize,darkyellow,solid,3);
    end;

Adding Sequence Labels

    length color $64 number $4 posn 8. ;
    retain posn;
    if _n_ = 1 then posn = 0;
    . .
    posn = posn + 1;
    if posn = 10 then posn = 1;   /* similar to using the MOD( ) function base 9 */

    /* Position cannot be added from a variable in the label macro (last macro parameter) */
    if posn=1 then do;
        %label(x,y,number,white,0,0,3,times new roman,1);
    end;
    else if posn=2 then do;
        %label(x,y,number,white,0,0,3,times new roman,2);
    end;
    /* etc. */

Final Output – Using Proc GANNO

    goptions reset=all border cback=black ctitle=white;
    proc ganno annotate=workanno;   /* from previous Data Step */
    run;

Conclusions
• Mapping relies on using Annotate
• Can be displayed in Proc GMAP or GANNO
• GANNO allows simple scaling

Session ID #3202
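The Adding Sequence Labels code cycles posn through 1 to 9 with a retained counter and notes that this is similar to using MOD( ). A minimal sketch of the MOD form, for comparison only. The data set name labels_demo and the loop variable point are hypothetical, and 20 generated rows stand in for the real data points:

```sas
/* Hypothetical sketch: labels_demo and point are assumed names, not from the slides. */
data labels_demo;
    do point = 1 to 20;
        /* MOD(point - 1, 9) runs 0..8, so adding 1 gives 1..9 and then wraps back to 1 */
        posn = mod(point - 1, 9) + 1;
        output;
    end;
run;

proc print data=labels_demo noobs;
run;
```

In the real Data Step, point would be replaced by _n_. This removes the retain and reset logic, but the if/else chain around %label is still needed, because the slide notes that the position cannot be passed to the macro from a variable.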