Data Mining as Pre-EDD Investigatory Tool
Team 9

Data Mining Overview
• Use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets
  – E.g., statistical models, mathematical algorithms, machine learning methods
• Can be performed on many types of data, including those in structured, textual, spatial, Web, or multimedia forms

Data Mining Overview cont.
• Government and Industry
• Most Common Purposes
  – Improving service or performance
  – Detecting fraud, waste, abuse
  – Analyzing scientific and research information
  – Managing human resources
  – Detecting criminal activities or patterns
  – Analyzing intelligence and detecting terrorist activities

Advantages as Pre-EDD Tool
• Assists researchers by speeding up their data analysis, allowing them more time to work on other projects.
• Improves effectiveness by identifying patterns and relationships that may otherwise go unnoticed.
• Advances in technology are allowing for more efficient techniques.

Data Mining Initiatives
• Able Danger
  – The Department of Defense characterized Able Danger as a demonstration project to test analytical methods and technology on very large amounts of data.
• National Security Agency (NSA)
  – Speculation about NSA terrorist surveillance dating back to at least 2002, involving the domestic collection, analysis, and sharing of information.

Data Mining Initiatives cont.
• The Novel Intelligence from Massive Data (NIMD) Program
  – The NIMD program focuses on the development of data mining and analysis tools to be used in working with massive data.

Factors Affecting Use of Data Mining as Pre-EDD Tool….
• Data Quality
  – In the wake of ChoicePoint, Lexis-Nexis, etc., fully aware of the commercial risks of privacy breaches, bad data quality, inaccuracy, etc.
  – Access vs. Accuracy
• Interoperability
  – What use is data without proper context and resources?
    (Collaboration on sharing data is needed; data mining alone might not be enough to stop criminals and terrorists, or to be meaningful.)
  – Quantity vs. Meaningfulness
• Mission/Purpose
  – Limiting privacy laws may be useful, but abuse or other uses of data may occur outside of the original intention.
  – Authenticity vs. Illegitimacy

Limiting Privacy Laws Equals Increasing Oversight?
• H.R. 1502, the Civil Liberties Restoration Act of 2005
  – Any department or agency engaged in any activity to use or develop data-mining technology would submit a public report to Congress
  – A list and analysis of the laws and regulations that would apply to the data mining activity
  – Laws and regulations that would need to be modified to allow the data mining activity to be implemented
  – Information on how individuals whose information is being used in the data mining activity will be notified of the use of their information
  – These reports would be due to Congress no later than 90 days after the enactment of H.R. 1502, and would be required to be updated annually to include “any new data-mining technologies.”

Sources
• http://www.gao.gov/new.items/d04548.pdf
• http://www.gao.gov/new.items/d05866.pdf
• http://www.fas.org/sgp/crs/intel/RL31789.pdf
• http://www.contentanalyst.com
• http://en.wikipedia.org/wiki/Data_mining