Visual Link Analysis
Christopher R. Westphal
Visual Analytics Inc
www.visualanalytics.com

Visual Analytics Inc. Copyright © 2006 – All Rights Reserved 301-407-2200 • www.visualanalytics.com

Christopher R. Westphal
CEO of Visual Analytics, Inc (VAI)
Over 22 years of experience
Experienced with a wide range of domains:
• Financial Crimes
• Money Laundering
• Frauds (corporate/insurance)
• Law Enforcement (RMS)
• Intelligence

Prerequisites for Analysis
It’s not rocket science… It’s not nuclear science… …but it does require some intelligence
• Need to use your noodle
• Need to think “outside the box” (1+1=a)
• Need to know your data!
• Need to learn new techniques

There are no roadmaps to follow... …or existing references to use... …you have to “make-it-up” as you go along.

What does…
• …. a money launderer look like?
• …. a criminal look like?
• …. an insider trader look like?
• …. a genetic disorder in DNA look like?
• …. a manufacturing defect look like?
• …. a terrorist look like?

Interpretation Methods
Typing: 1
Reading: 40
Hearing: 60
Visual: 12,500
Leverages human facility to process visual information
312 times more efficient than reading text

Want to Expose Patterns & Trends
• Need to use right visual presentation for exposing patterns.
• Many times the pattern is not obvious – and using alternative presentations can help expose the anomaly.
• How will business processes change once the patterns are found?
It’s like trying to find a needle in a haystack

Finding Patterns in the Data
[Figure: Pattern #1, Pattern #2, Pattern #3]
The method of data presentation is key to exposing hidden patterns

Detecting Patterns
Placement Influences Interpretation
[Figure: records with Field-X, Field-Y, Field-Z (values X1–X5, Y1–Y3, Z1–Z4) shown as a database table and as alternative node placements. Labels: Databases, Table]

Example: Unexpected Commonality
• Certain “entities” should never be shared (e.g., SSNs)
• Data prone to typos and misspellings
• Possible misrepresentation and/or falsifying data on forms
• Appearance of avoidance by varying information

Example: Too Much Commonality
• Many patterns are exposed due to repeating behaviors
• Too many commonalities may indicate organized behaviors
• Subjects perpetrate the same crime at different financial institutions
• Only minor changes in their underlying Modus Operandi (MO)

Example: Accumulated Behaviors
• Each unique filing looks valid – need to see it collectively (all at once)
• Large numbers of discrete actions form the bigger pattern
• Easy to avoid detection if each transaction appears legitimate
• Individual may be using mules to move money in/out of the accounts

What are Data?
Organizations, Money, People, Phones, Accounts, Meetings, Weapons, Facilities, Vehicles, Transfers, Claims, Events, Addresses, Drugs, Comms, Narcotics, Passports, Email, ID Numbers, Equipment, Vessels, Cases, Aircraft, Travel
Which of the following are Real-World Objects and which are Conceptual Objects?

Be Consistent with Defining Types
Define the type for the entity – not the role
Entity types: Phone, People, Vehicle, Address
Would not want to create:
• Caller/Callee
• Deposit/Withdrawal
• From/To
• Arrival/Destination
• Shipper/Consignee
• Seller/Buyer
• Prime/Sub
• Payor/Payee
• Sender/Receiver
• Owner/Renter

Technologies vs. Methodologies
• Link Visualization is a tool just as Microsoft Word is a tool
• Link Analysis does not replace the knowledge of the user
  – Improves efficiency
  – Produces better and higher-quality results
• Link Analysis does exactly what it is told to do
• Link Analysis makes data explicit
• Methodology drives the technology
• Need to fully understand your data
• Need to have an expectation of what you want to see

Connect The Dots….
Simple…Easy…Straightforward…????
[Figure: connect-the-dots puzzle with dots labeled a–z and 1–6]
What happens if you…
• Don’t know the dots?
• Have missing/extra dots?
• Mess-up the sequence?
• Don’t recognize the threat?
Pattern #1
• Enters country on student visa
• Attends flight-training school
• Indirect connections to known terrorist
Pattern #2
• Commercial driver’s license
• Apply for chemical-hauling permits
• Purchase storage containers
• Rent transport trucks
…Pattern #X…

Is this an Important Pattern?
• Depends on Context
• Depends on Content
• Depends on Data Quality
• Depends on Interpretation
• Depends on Sources
• Depends on Importance
[Figure labels: UNKNOWN, NOT DEFINED, 999-99-9999, 111-11-1111, 000-00-0000, 123-45-6789; Betty, Ronald, John, Ronny, Roger, Ronnie, Pam, Ron; 480-07-7456 (shown twice); The Gipper, Mary, Dutch-Boy, George]

Is this an Important Pattern?
What about this pattern? JOHN SMITH
Common values can short circuit a network and potentially lead to inconsistencies. Therefore, it is important to decide how to represent your entities and try to manage the “lowest common denominator”
Also need to factor the degree of transpositions to determine if it reflects an intentional misrepresentation of the facts.

Is this a Reliable Pattern?
[Figure: Pattern #1, Pattern #2, Pattern #3. Node labels: PHONE, ADDRESS, SUBJECT (two), and EVENT nodes dated 01/01/02, 01/05/02, 01/11/02, 05/16/05, 06/03/05, 06/27/05, 07/01/05]

Do these Patterns Make Sense?
[Figure: Pattern #1, Pattern #2. Node labels: ID NUMBER, SSN, SUBJECT, REPORT, ORG, ADDRESS, PHONE. Legend: marked SSN = SS Death Master Hit]

Is this an Important Pattern?
222334444: INVALID

Which Pattern Is More Valuable?
[Figure: Pattern #1, Pattern #2. Node labels: REPORT, SUBJECT, ADDRESS, PHONE, SSN, ID NUMBER]

What Does this Pattern Tell Us?

Who is the Most Important Person?

Methodologies – What’s Important?!?!
• A single SAR with a large number of SUBJECTS typically indicates some type of fraud-scheme.
• SUBJECT has numerous SAR filings utilizing the same ACCOUNT number
[Figure label: Data Result Sets]

What Are We Looking For?
Multiple Sources / Single Source
[Figure labels: TELEPHONE, CRIMINAL, TERRORISM]

Source Integration Patterns
[Figure: overlap of the Property and Income sources]
• Income < $10k, Property > $500k (non-compliant)
• No Income, Any Property (non-filers)

Entity Resolution
Are they the same entity?
Source A: Jonathan Q. Adams; Boston, MA; 05/29/1968; DL:54321-123
Source B: John Quincy Adams; 123 Main Street; Boston, MA; 774-207-0000
Source C: Quincy Adams; Bedford, MA; 774-207-0000; 12/05/1965
Techniques: Tokenizing, Standardization, Normalizing, Aliasing, Value-add, Permutation, Anonymous Resolution
md5_128bit = d35ecc61e4cc6810913e5de7fcb5931c

Sample Network - Unchanged
Multiple references to the same people based on spelling variations in their name. Different color-boxes show the like/similar entities
[Figure labels: MARIA, DAVID, EDISON]

Same Network - Consolidated
• Same exact data and information being displayed from previous diagram
• Reduced network from 14 entities to 3 entities.
• Much more readable and comprehensible.
• Proper data-cleaning is important for highly variable data entry processes.
• Larger frequency between Edison and David
• Bi-directional flow between Edison and David
• Transfers flow only from Edison to Maria
• No transfer between David and Maria

Network Structures – Interpretation
Highly centric network – shows either source/sink behavior and strong influence/control over the network. Vulnerable and easy to monitor and seize assets. Network may be alien smuggling or various fraudulent activities.
More interconnected nodes provide less overall control over the network. Multiple players act in a distributed fashion, which makes the network more complex to monitor or disrupt because there are multiple targets of interest. Network may be narcotics trafficking or gambling operations.
Highly distributed structure shows limited control or oversight across the network. No single control point and network can easily reconstitute using alternative entities. Hard to track and trace. Network may be terrorist financing.
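
One rough way to put a number on the "highly centric" versus "highly distributed" distinction above is degree centralization. The sketch below is an illustration only, not the method used in the slides: the function name and the sample edge lists are assumptions. A score near 1.0 suggests a single hub controls the network, and a score near 0.0 suggests no single control point.

```python
from collections import defaultdict


def degree_centralization(edges):
    """Freeman degree centralization for an undirected edge list.

    1.0 means a pure hub-and-spoke (star) network; 0.0 means every
    node has the same number of neighbors.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-links
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 3:
        return 0.0  # the measure is undefined for fewer than 3 nodes
    degrees = [len(v) for v in neighbors.values()]
    max_degree = max(degrees)
    return sum(max_degree - d for d in degrees) / ((n - 1) * (n - 2))


# Hypothetical sample networks (not taken from the slides).
star = [("HUB", f"S{i}") for i in range(5)]           # one controller
ring = [(f"S{i}", f"S{(i + 1) % 6}") for i in range(6)]  # no controller

print(degree_centralization(star))
print(degree_centralization(ring))
```

The score only describes shape. Whether a centric network is, say, a fraud scheme still depends on the context and content of the data.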

Data Quality Impacts Analyses
City values as entered (first column):
MANHA HAN; MANHANHATTAN; MANHANTAN; MANHANTHAN; MANHANTTAN; MANHATAN; MANHATTAN; MANHATTAN K; MANHATTAN N Y; MANHATTAN NY; MANHATTEN; MANHATTON; MEW YORK NY; NEW Y ORK; NEW Y ORK; NEW YO RK; NEW YOEK; NEW YOIK; NEW YOK; NEW YOKR; NEW YOORK; NEW YOR; NEW YOR K; NEW YOR,; NEW YORJ; NEW YORK; NEW YORK 10017-1011; NEW YORK 10031; NEW YORK 725; NEW YORK 806; NEW YORK 806; NEW YORK 987; NEW YORK BK; NEW YORK CITY; NEW YORK N; NEW YORK NEW YORK; NEW YORK NY; NEW YORK NY; NEW YORK NY 10001; NEW YORK NY 10002; NEW YORK NY 10009
City values as entered (second column):
NEW YORK NY 10016; NEW YORK NY 10018; NEW YORK NY 10019; NEW YORK NY 10022; NEW YORK NY 10023; NEW YORK NY 10028; NEW YORK NY 10029; NEW YORK NY 10036; NEW YORK NY 10036-3619; NEW YORK QUEENS; NEW YORK ROOSEVELT ISLAND; NEW YORK STATE; NEW YORK Y; NEW YORK,; NEW YORK,; NEW YORK, NEW YORK; NEW YORK, NY; NEW YORK, NY 10017; NEW YORKCITY; NEW YORKD; NEW YORKE; NEW YORKJ; NEW YORKK; NEW YORKQ; NEW YORKS; NEW YORKY; NEW YORK|; NEW YORL; NEW YORY; NEW YOTK; NEW YOUR; NEW YOURK; NEW YOYK; NEW YRK; NEW YROK; NEWYORK; NY; NY NY; NY PLAZA; NYC; Y
Lower Manhattan: 10004, 10005, 10006, 10007, 10038, 10280

Question?
What country is represented by the code SA ?
What country is represented by the code ZA ?

Example – Structuring
[Figure label: Dentist]

Example – Structuring
[Figure label: Same Address]

Example – Structuring
[Figure label: Dental Practice]
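
The city-name variants on the Data Quality slide could be collapsed toward a canonical value before link analysis. The sketch below is only one possible approach and is not described in the slides: the canonical list, the 0.8 similarity cutoff, and the function name are all assumptions, and it relies on the standard-library `difflib.get_close_matches`.

```python
import re
from difflib import get_close_matches

# Assumed reference list; a real one would come from a validated gazetteer.
CANONICAL_CITIES = ["NEW YORK", "MANHATTAN"]


def normalize_city(raw, cutoff=0.8):
    """Return the closest canonical city name, or None if nothing is close."""
    # Keep letters only (drops ZIP codes, commas, stray '|' characters).
    cleaned = re.sub(r"[^A-Z ]", " ", raw.upper())
    cleaned = " ".join(cleaned.split())  # collapse repeated spaces
    if not cleaned:
        return None
    matches = get_close_matches(cleaned, CANONICAL_CITIES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for value in ["NEW YOEK", "NEWYORK", "NEW YORK|", "MANHATTEN", "NY PLAZA"]:
    print(repr(value), "->", normalize_city(value))
```

Values that fall below the cutoff come back as `None` and are left for manual review rather than being forced into a match. Abbreviations such as NYC would need an explicit alias table, which is not shown here.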

Example – Structuring
[Figure label: More Structuring Date (2004)]

Example – Structuring
[Figure label: Original Filing (over $10k)]

Elderly Abuse Pattern…
SAR-MSB where SUBJECT DOB < 1930
Notice anything in common among these SAR-MSBs?

Analysis of the Warranty Data
Show all vehicles with fewer than 100 miles
• brand new cars
• many will still be on the dealership lot
• not realistic mileage for general service repairs
Show labor only
• entire cost is based on mechanic time
• no parts replaced (not traceable)
• no work was outsourced (not external)
Steps: Extracted Set, Repair Type, Review Details
[Figure: repair entries for Cigarette Lighters: 1 hour - $45, 1 hour - $60, 1 hour - $50, 1 hour - $65, 1 hour - $45]

Example – Geospatial Filings
• All SAR forms filed by banks in Howard County, Maryland
• Filing Years 2004 – 2005
• Approximately 300 filings
• Group by CITY/STATE of the subject

Example – Geospatial Filings
• Geo-encoded the Centroid of the Zipcode
• Centroid = approx middle of region
• Populated GIS viewer with results of encoded addresses

Example – Geospatial Filings
• Filtered out any addresses associated with SAR transactions below $100k
• Heavy concentration along I-95 corridor
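
The geospatial steps above (encode each subject address to its ZIP centroid, then drop filings below $100k) can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout, the ZIP labels, and the coordinates are hypothetical placeholders, and a real centroid table would come from a GIS data source.

```python
from collections import Counter

# Hypothetical ZIP -> (latitude, longitude) centroid table.
ZIP_CENTROIDS = {
    "ZIP-A": (39.2, -76.8),
    "ZIP-B": (39.1, -76.9),
}

# Hypothetical filings; only the fields this sketch needs.
FILINGS = [
    {"zip": "ZIP-A", "amount": 250_000},
    {"zip": "ZIP-A", "amount": 120_000},
    {"zip": "ZIP-B", "amount": 40_000},   # below the threshold, dropped
    {"zip": "ZIP-C", "amount": 500_000},  # no centroid known, skipped
]


def points_to_plot(filings, zip_centroids, min_amount=100_000):
    """Count qualifying filings per ZIP centroid for display in a GIS viewer."""
    counts = Counter(
        f["zip"]
        for f in filings
        if f["amount"] >= min_amount and f["zip"] in zip_centroids
    )
    return {zip_centroids[z]: n for z, n in counts.items()}


print(points_to_plot(FILINGS, ZIP_CENTROIDS))
```

Because every address in a ZIP code collapses to one centroid, the counts show concentration by region, not exact locations.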

Example – Geospatial Filings
• Zoom-in to the map
• Highlight the boundaries for Howard County
• Notice: all but a few of the addresses fall outside the county

I-94 - Arrival-Departure Record
1. Family Name
2. First (Given) Name
3. Birth Date (Day/Month/Year)
4. Country of Citizenship
5. Sex (Male or Female)
6. Passport Number
7. Airline and Flight Number
8. Country Where You Live
9. City Where You Boarded
10. City Where Visa Was Issued
11. Date Visa Issued (Day/Month/Year)
12. Address While in the United States
13. City and State
14. Family Name
15. First (Given) Name
16. Birth Date (Day/Month/Year)
17. Country of Citizenship
[Figure labels: EVENT, FLIGHT, PASSPORT, SUBJECT, ADDRESS]

I-94 – Multiple Passport Numbers
• Identified a courier with over 50 different passport numbers for over 200 travel events
• Generated a timeline to show the number was changed in July (Mexican Passport)

I-94 – Multiple Passport Numbers
1) Reverse look-up on address
2) Identified a courier business
3) Expanded to show other I-94 targets

What Type of Data are in a Date?
JULY 4, 1976: 4th DoM, Sunday, 2nd DoW, July, Summer, 2nd WoM, Leap Year, 186 DoY, 3rd QTR-c, Banks Closed, Holiday-US, 1976, 7th MoY, 28 WoY
Dates:
2000 March 7
2001 February 27
2002 February 12
2003 March 4
2004 February 24
2005 February 8
2006 February 28
2007 February 20
2008 February 5
2009 February 24
2010 February 16
What do these dates have in common?
4th QTR-f (fiscal quarter of July 4, 1976)
Legend:
DOW - Day of Week
DOM - Day of Month
DOY - Day of Year
DOQ - Day of Quarter
WOQ - Week of Quarter
WOY - Week of Year
WOM - Week of Month
MOY - Month of Year
QTR-f - Quarter Fiscal
QTR-c - Quarter Calendar
Season/Holiday/Leap Year

Temporal – SAR Filings
[Figure axes: Week of Year, Day of Week]
• Represents a very consistent filing behavior
• Dates for first ½ year occurred more often on Mondays
• Very regular filing (1 per month – except Dec)
• Unusual change for remainder of year – jumps around a bit
• Reflects filing behavior of financial institution

Temporal – SARC Filings
SAR-CASINO filings that clearly show the individual tends to prefer weekend and holiday gambling time frames.
Tells us he is “employed” since he is working during the week. Inactive times correspond to known work periods
[Figure annotations: Long weekend; Period of inactivity; Holiday – 4th July; Long weekend (Labor Day); Period of inactivity; Holiday break timeframe is quite active and includes 12/25]

Convenience Store Deposits
Cash Transaction Reports (CTR) filings for a convenience store owner. Business offers “check cashing” for clients.
All transactions represent over $10,000.
Very consistent Monday & Friday filings
All transactions represented are cash DEPOSITS – which is inconsistent for a check-cashing business (the events should be WITHDRAWALS)
[Figure annotations: Mondays / Fridays; Period of inactivity (2 week vacation); Pattern abruptly stops]

Temporal Grid – HOD / WOY
Border Crossing
[Figure annotations: After hours; Mostly afternoon; After hours; 12:00]
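
The date attributes in the legend, and the hour-of-day by week-of-year grid, can be sketched in a few lines. This is a minimal illustration, not the tool's implementation. Two assumptions are made so that the result agrees with the slide's values for July 4, 1976: weeks start on Sunday with the first partial week counted as week 1, and the fiscal year starts on October 1. Function names are hypothetical.

```python
import calendar
from collections import Counter
from datetime import date, datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def _sunday_offset(d):
    # date.weekday() is Monday=0..Sunday=6; shift so Sunday=0..Saturday=6.
    return (d.weekday() + 1) % 7


def date_attributes(d):
    """Break one date into the attributes named in the legend."""
    doy = d.timetuple().tm_yday
    jan1 = date(d.year, 1, 1)
    first_of_month = d.replace(day=1)
    return {
        "DOW": DAY_NAMES[d.weekday()],
        "DOM": d.day,
        "DOY": doy,
        # Assumption: Sunday-start weeks, partial first week is week 1.
        "WOY": (doy + _sunday_offset(jan1) - 1) // 7 + 1,
        "WOM": (d.day + _sunday_offset(first_of_month) - 1) // 7 + 1,
        "MOY": d.month,
        "QTR-c": (d.month - 1) // 3 + 1,
        # Assumption: fiscal year begins October 1.
        "QTR-f": ((d.month - 10) % 12) // 3 + 1,
        "LeapYear": calendar.isleap(d.year),
    }


def hod_woy_grid(timestamps):
    """Count events per (week of year, hour of day) cell."""
    return Counter(
        (date_attributes(t.date())["WOY"], t.hour) for t in timestamps
    )


print(date_attributes(date(1976, 7, 4)))
print(hod_woy_grid([datetime(1976, 7, 4, 14, 30), datetime(1976, 7, 4, 14, 55)]))
```

Other week conventions (for example ISO 8601 weeks, which start on Monday) give different week numbers for the same date, so the convention should be fixed before comparing grids.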

Temporal Grid – HOD / WOY
Border Crossing

Little Old Man…
• Use of SAR-MSB
• DOB between 1920-1930
• Same use of ID Number
• 55 unique filings
• Narratives state:
  – “CUSTOMER PURCHASES MONEY ORDERS TOTALING LARGE AMOUNTS VERY FREQUENTLY.”
  – “CUSTOMER NEEDS THEM FOR PAYROLL FOR EMPLOYEES”
  – “OPERATES AN INSURANCE BUSINESS”
• Over $400k of MSB
• Primarily filed in 2006
Heavy filings on Mondays and Wednesdays
Averages 2-3 transactions per week

Reference Sources
• Address Validation
• Death Master Match
• SSN Validity Check
• ITIN Check
• Watch list / PIP Check
• Public Records
• Case Records
• Public / Common Phone Match
• Critical Infrastructure
• Important Dates
• Sex Offenders
• Criteria Countries

A Basic Network Diagram
[Figure: network of SUBJECT, SSN, ADDRESS, PHONE, and EVENT nodes]

A Value-Added Network Diagram
[Figure: network of SUBJECT, ADDRESS, PHONE, and EVENT nodes annotated with Pay Phone, SSDM, Sex Offender, Embassy, Prior Case, Bad Address, Watch List]

Some Simple Questions…
What do these companies have in common?
CARIBBEAN HAPPY LINES
MODERN ELECTRONIC COMPANY
SACKS FACTORY
Where can this telephone number be found?
202-289-9313
What is located at this address?
801 Mount Vernon Place, NW Washington, DC 20001
What is the vehicle make/model of this VIN?
ZA9BC10U13LA12551
Who owns this Social Security Number?
078-05-1120
Answers:
1) OFAC list
2) Greyhound Bus Station
3) DC Convention Center
4) Lamborghini
5) The Woolworth Card

Some Simple Questions…
What happened on 03/11/2004 ?
Where is 38.898748° -77.037684° ?
Who owns this IP address: 198.81.129.100 ?
Where is ZIP code 96616-2876 found ?
Answers:
1) Usama Bin Laden
2) Madrid Bombings
3) White House
4) CIA
5) USS Ronald Reagan

A Simple Question…
Who is the FBI’s most wanted fugitive ?

What Do Terrorists Look Like?
• White / Black / Asian / Arabic?
• Man / Woman?
• Domestic / Foreign?
• Poor / Rich?
• Illiterate / Educated?
• Adult / Teenager?
It’s based on their actions, behaviors, and relationships …
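
The "SSN Validity Check" listed among the reference sources can be approximated with a structural test. The sketch below is an illustration only, with a hypothetical function name. It applies the published structural rules (area 000, 666, and 900–999 are never assigned; group 00 and serial 0000 are never used) plus a small denylist of widely reused numbers taken from the slides. The denylist is an assumption and is far from complete.

```python
import re

# Assumed denylist seeded from values shown in the slides
# (the Woolworth card number and obvious filler values).
KNOWN_MISUSED = {"078051120", "123456789", "111111111"}


def ssn_issue(raw):
    """Return a short reason the value is suspect, or None if it passes."""
    digits = re.sub(r"\D", "", raw)  # drop dashes, spaces, letters
    if len(digits) != 9:
        return "not 9 digits"
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    if area in ("000", "666") or area >= "900":
        return "area number never assigned"
    if group == "00":
        return "group 00 never used"
    if serial == "0000":
        return "serial 0000 never used"
    if digits in KNOWN_MISUSED:
        return "known misused or filler number"
    return None


for value in ["UNKNOWN", "999-99-9999", "000-00-0000",
              "078-05-1120", "480-07-7456"]:
    print(value, "->", ssn_issue(value))
```

It checks only structure, so it says nothing about whether a number was actually issued or belongs to a deceased person. Those questions need the Death Master match and the other reference sources.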