SAS Mapping functionality to measure and present the Veracity of Location Data
Richard J Self, Senior Lecturer in Analytics and Governance
Vishal Patel, Final Year Student, University of Derby
Daniel Corah, Final Year Student, University of Derby
Viktor Horecny, Final Year Student, University of Derby
University of Derby, UK
r.j.self@derby.ac.uk
http://computing.derby.ac.uk/wordpress/people-2/richard-j-self/
2
Objectives
• SAS – Exploring Mapping Functionality to Visualise Veracity of Location Data
• Lessons Learned about Location Data Veracity and SAS Visualisations
3
Context
• Smart-device location services are seen as reliable
• This may not be true, and the consequences are many:
  • Retail LBS-based marketing
  • Social network apps
  • Photo locations in social media and Google Maps
  • Forensics
  • Criminal justice system
• Research question:
  • To what extent is A-GPS reliable, and in what circumstances?
4
Triggers to Research Project
5
Final Year Student Project
• 12 students researching
• 3 are co-authors, contributing valuable analyses
• 7 students contributed data to this presentation (2460 data points)
  • Daniel Corah
  • Vishal Patel
  • Amna Almutawa
  • Ishwa Khadka
  • Viktor Horecny
  • Shehzaad Kashmiri
  • Farondeep Bains
6
Critical Questions
• Levels of accuracy in different conditions
  • Indoors / outdoors
  • Rural / residential / urban
  • Weather conditions
• Stability of indicated location
• Differences between devices (make / model / operating system)
7
V Patel – Key Insight – Models Vary
phone        N    Mean       Std Dev   Std Err
Nexus        54   41.5629    24.1146   3.2816
iPhone       58   85.5101    113.8     14.9403
Diff (1-2)        -43.9472   83.5987   15.8088

Method          Variances   DF       t Value   Pr > |t|
Pooled          Equal       110      -2.78     0.0064
Satterthwaite   Unequal     62.476   -2.87     0.0055
Proc Univariate – Histogram issues
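The t statistics in the table can be recomputed from the group summaries alone. The sketch below is not part of the original deck: it applies the standard pooled and Satterthwaite two-sample formulas in a DATA step. The dataset name ttest_check and all variable names are made up for illustration. The iPhone standard deviation is used as rounded on the slide, so the results should agree with the table only to within rounding.

```sas
/* Illustrative check: two-sample t statistics from summary values */
data ttest_check;
   n1 = 54; mean1 = 41.5629; sd1 = 24.1146;   /* Nexus  */
   n2 = 58; mean2 = 85.5101; sd2 = 113.8;     /* iPhone */
   diff = mean1 - mean2;

   /* Pooled method (assumes equal variances) */
   df_pooled = n1 + n2 - 2;
   sd_pooled = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df_pooled);
   t_pooled  = diff / (sd_pooled * sqrt(1/n1 + 1/n2));
   p_pooled  = 2 * (1 - probt(abs(t_pooled), df_pooled));

   /* Satterthwaite method (allows unequal variances) */
   v1 = sd1**2 / n1;
   v2 = sd2**2 / n2;
   t_satt  = diff / sqrt(v1 + v2);
   df_satt = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1));
   p_satt  = 2 * (1 - probt(abs(t_satt), df_satt));
run;

proc print data=ttest_check noobs;
   var t_pooled df_pooled p_pooled t_satt df_satt p_satt;
run;
```

The large gap between the two standard deviations (24.1 against 113.8) suggests the Satterthwaite row is the safer one to quote, although both methods lead to the same conclusion here.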
8
D Corah – Key Insight – Stone Built Houses
Proc SGPLOT
9
V Horecny – Key Insight – Chipsets
HTC-M8 (blue): modern chipset
HTC-Desire S (pink): early version chipset
Uses XL/JMP®
10
Other Insights
• Cloud conditions affect accuracy
• Accuracy varies with time
11
Overall Accuracy of LBS
85% <= 25 metres
2364 out of 2420 <= 500 m
12
Accuracy Variable with Time
• Start-up of LS: max error 360 m
• Uses Annotate coding and macros
13
Accuracy Variable with Time
14
Consolidated Data – 2420 points
Red = > 300 m
15
Annotate for Time Based Accuracy
• Challenges
  • Auto-scaling and boundaries
  • Data System
  • ANNOMAC coding for labels
16
Raw Data
Lat_True_Deg  Long_True_Deg  Lat_Xif_Deg  Long_Xif_Deg  Loc_Ind    Image_Path     Date_Time_Stamp      Phone_type
53.962118     -1.308214      53.960864    -1.308306     Open       IMG_0464.JPG   06/09/2014 07:24:24  iPhone 5C
53.962118     -1.308214      53.958194    -1.311589     Open       IMG_0466.JPG   17/08/2014 13:56:51  iPhone 5C
53.962118     -1.308214      53.960864    -1.308306     Open       IMG_0465.JPG   17/08/2014 13:52:47  iPhone 5C
52.911537     -1.484403      52.909194    -1.486745     circuit 1  IMG_01102.jpg  23/03/2015 08:25:46  iPhone 5C
52.911537     -1.484403      52.909194    -1.486742     circuit 1  IMG_01103.jpg  23/03/2015 08:25:47  iPhone 5C
52.911537     -1.484403      52.909194    -1.486742     circuit 1  IMG_01104.jpg  23/03/2015 08:25:48  iPhone 5C
52.911537     -1.484403      52.909194    -1.486742     circuit 1  IMG_01105.jpg  23/03/2015 08:25:49  iPhone 5C
52.911537     -1.484403      52.909194    -1.486742     circuit 1  IMG_01106.jpg  23/03/2015 08:25:50  iPhone 5C
52.911537     -1.484403      52.909194    -1.486742     circuit 1  IMG_01107.jpg  23/03/2015 08:25:51  iPhone 5C
52.911537     -1.484403      52.909194    -1.486742     circuit 1  IMG_01108.jpg  23/03/2015 08:25:52  iPhone 5C
52.911537     -1.484403      52.909194    -1.486742     circuit 1  IMG_01109.jpg  23/03/2015 08:25:53  iPhone 5C
Lat_True_Deg and Long_True_Deg found through Google Maps
Lat_Xif_Deg, Long_Xif_Deg and Date_Time_Stamp read from images using IrfanView
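The deck does not say how the positional error in metres was derived from these columns. One plausible approach, shown here purely as an assumption, is the geodetic distance between the true and image-recorded coordinates using the SAS GEODIST function. The dataset name error_demo and the variable error are illustrative; the three input rows are copied from the raw data above.

```sas
/* Illustrative sketch: error as distance between true and recorded positions */
data error_demo;
   input Lat_True_Deg Long_True_Deg Lat_Xif_Deg Long_Xif_Deg;
   /* Assumption: error = geodetic distance between the two positions.
      GEODIST expects degrees by default; option 'K' returns kilometres,
      so multiply by 1000 to obtain metres. */
   error = 1000 * geodist(Lat_True_Deg, Long_True_Deg,
                          Lat_Xif_Deg,  Long_Xif_Deg, 'K');
   datalines;
53.962118 -1.308214 53.960864 -1.308306
53.962118 -1.308214 53.958194 -1.311589
52.911537 -1.484403 52.909194 -1.486742
;
run;

proc print data=error_demo noobs;
run;
```

An error variable in metres of this kind is what the later dot-generation step tests against its thresholds.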
17
Boundaries
proc means data=work_derby min max noprint;
   var x y;
   output out=means_derby;
run;

/* Derive the corner coordinates (x = longitude, y = latitude, in degrees)
   and store them in macro variables */
data _null_;
   set means_derby;
   if _stat_ = 'MIN' then do;
      call symputx('min_x', x);
      call symputx('min_y', y);
   end;
   if _stat_ = 'MAX' then do;
      call symputx('max_x', x);
      call symputx('max_y', y);
   end;
run;
18
Auto-Scaling
xsys = '1';    /* using Frame area */
ysys = '1';
hsys = '3';
dotsize = 0.5; /* basic size of plotted error dot, % of frame */

/* plot data in centered 90% of Frame Area */
/* min_x etc. set from previous section */
x = (90 - (x - symget('min_x')) * 90 / (symget('max_x') - symget('min_x'))) + 5;
y = (y - symget('min_y')) * 90 / (symget('max_y') - symget('min_y')) + 5;
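To see what the two scaling expressions do, the sketch below evaluates them at the minimum, midpoint and maximum of a range. The bounds are hypothetical values chosen only for illustration, and they are held in ordinary DATA step variables rather than macro variables so the step runs on its own. The dataset and variable names are also made up.

```sas
/* Illustrative sketch: behaviour of the 5%-95% frame scaling */
data scale_demo;
   /* Hypothetical bounds, not taken from the project data */
   min_v = -1.49;
   max_v = -1.48;
   do v = min_v, (min_v + max_v) / 2, max_v;
      /* Form used for x on the slide: direction is reversed */
      pct_reversed = (90 - (v - min_v) * 90 / (max_v - min_v)) + 5;
      /* Form used for y on the slide: direction is preserved */
      pct_direct = (v - min_v) * 90 / (max_v - min_v) + 5;
      output;
   end;
run;

proc print data=scale_demo noobs;
run;
```

The minimum maps to about 95 in the reversed form and 5 in the direct form, the midpoint to about 50 in both, and the maximum to about 5 and 95 respectively, so every point lands inside the central 90% of the frame.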
19
Dot Generation – using annomac macros
if error < 1 then do;
   dotsize = dotsize * 1;     /* small dot for high accuracy */
   %slice(x,y,0,360,dotsize,darkgreen,solid,3);   /* different colors for different errors */
end;
else if error >= 1 and error < 10 then do;
   dotsize = dotsize * 1.5;
   %slice(x,y,0,360,dotsize,mediumgreen,solid,3);
end;
else if error >= 10 and error < 100 then do;
   dotsize = dotsize * 2;
   %slice(x,y,0,360,dotsize,mediumyellow,solid,3);
end;
else if error >= 100 and error < 200 then do;
   dotsize = dotsize * 2.5;   /* large dot for big error */
   %slice(x,y,0,360,dotsize,darkyellow,solid,3);
end;
20
Adding Sequence Labels
length color $64
       number $4
       posn 8;
retain posn;
if _n_ = 1 then posn = 0;
.
.
posn = posn + 1;
if posn = 10 then posn = 1;   /* similar to using the MOD( ) function base 9 */
/* Position (last macro parameter) cannot be added from a variable in the label macro */
if posn = 1 then do;
   %label(x,y,number,white,0,0,3,times new roman,1);
end;
else if posn = 2 then do;
   %label(x,y,number,white,0,0,3,times new roman,2);
end; /* etc. */
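The comment on the counter mentions the MOD function as an alternative. The sketch below compares the two approaches over 20 iterations. It is an illustration only: the dataset name posn_demo and the variables i, posn_mod and same are made up, and a DO loop index stands in for the observation number.

```sas
/* Illustrative sketch: cycling 1..9 with a counter versus MOD */
data posn_demo;
   posn = 0;
   do i = 1 to 20;
      /* Counter logic as on the slide */
      posn = posn + 1;
      if posn = 10 then posn = 1;
      /* MOD-based equivalent, driven by the row index */
      posn_mod = mod(i - 1, 9) + 1;
      same = (posn = posn_mod);   /* 1 when both agree */
      output;
   end;
run;

proc print data=posn_demo noobs;
run;
```

Both columns cycle through 1 to 9 and then restart at 1, so in a DATA step that reads one observation per iteration, mod(_n_ - 1, 9) + 1 could replace the retained counter.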
21
Final Output – Using Proc GANNO
goptions reset=all border cback=black ctitle=white;
proc ganno annotate=workanno; /* from previous Data Step */
run;
22
Conclusions
• Mapping relies on using Annotate
• Can be displayed in Proc GMAP or GANNO
• GANNO allows simple scaling
23
Session ID #3202