Query Size Restriction: The Database Tracker Problem
EECS710: Information Security and Assurance
Professor H. Saiedian
From: Denning et al., "The Tracker: A Threat to Statistical Database Security," ACM TODS, 1978

A statistical database
• Construction of a characteristic formula C
  – A logical formula over attribute values, with operators AND, OR, NOT (~)
• Common queries
  – count (C): number of records satisfying C
  – sum (C; j): sum of attribute j over the records satisfying C
• Examples
  – count (M AND CS) = 3, short for count (Sex = 'M' AND Dept = 'CS')
  – sum (M OR ~CS; Salary) = $176K
  – sum (Salary <= $15K; Contributions) = $180

Compromise
• Compromise occurs when confidential information is deduced
  – Positive: deduce a value
  – Negative: learn that a value is not in a given field (e.g., Baker did not contribute $200)
• A database is secure if no compromise is possible
• Example: a person knows that Dodd is a female CS professor
  – count (F AND CS AND Prof) = 1
  – count (F AND CS AND Prof AND Salary <= $15K) = 1, so Dodd's salary is <= $15K
  – If this second count were 0, Dodd's salary would not be <= $15K

Setting a lower bound?
• Setting a lower bound on the query-set size helps, but not always
  – Even if count (C) = 1 is refused, the complement can be queried: count (~C) = n - count (C)
  – Ask a tautology: count (Prof OR ~Prof) = 12
  – count (~(F AND CS AND Prof)) = 11
  – 12 - 11 = 1 female CS professor
  – sum (Prof OR ~Prof; Salary) = $194K
  – sum (~(F AND CS AND Prof); Salary) = $179K
  – Dodd's salary = $194K - $179K = $15K

Need an upper bound also
• Respond to query (C) if k ≤ count (C) ≤ n - k; reject otherwise
• Note: k ≤ n/2 (otherwise n - k < k and all queries are unanswerable)

What value for k?
• If a questioner knows (from external sources) that individual I is uniquely characterized by C, the questioner will try to learn whether I has characteristic α
• Assume k = 2
• Because count (C AND α) ≤ count (C) = 1 < k, the questioner cannot use the attacks above: count (C AND α) is refused, and so is the complement, since count (~C) = n - 1 > n - k
• The questioner may instead divide C into two parts to calculate count (C AND α)

The database tracker
• How?
  – Divide C into C = C1 AND C2 such that count (C1) and count (C1 AND ~C2) are both answerable
• T = C1 AND ~C2 is called a tracker of I: it tracks down additional characteristics of I

Calculating the tracker
• C = C1 AND C2
• T = C1 AND ~C2
• count (C) = count (C1) - count (T)
  – C1 splits into the disjoint sets C1 AND C2 = C and C1 AND ~C2 = T
• count (C AND α) = count (T OR (C1 AND α)) - count (T)
  – T OR (C1 AND α) is the disjoint union of T and C AND α
• If count (C AND α) = 0: negative compromise (I does not have α)
• If count (C AND α) = count (C): positive compromise (I has α)
• If count (C) = 1, arbitrary statistics about I can be computed from query (C) = query (C1) - query (T)

A tracker example
• Suppose k = 2 (with n = 12, so n - k = 10)
• Query (C) is answerable if 2 <= count (C) <= 10
• The questioner believes C = F AND CS AND Prof identifies Dodd
• Constructs T = C1 AND ~C2, where C1 = F and C2 = CS AND Prof, i.e., T = F AND ~(CS AND Prof)

To verify the tracker
• count (F AND CS AND Prof) = count (F) - count (F AND ~(CS AND Prof)) = 5 - 4 = 1
• To find Dodd's salary, apply query (C) = query (C1) - query (T):
  sum (F AND CS AND Prof; Salary) = sum (F; Salary) - sum (F AND ~(CS AND Prof); Salary) = $90K - $75K = $15K

Negative compromise also possible
• With α = Salary > $15K:
  count (F AND CS AND Prof AND Salary > $15K)
  = count ((F AND ~(CS AND Prof)) OR (F AND Salary > $15K)) - count (F AND ~(CS AND Prof))
  = 4 - 4 = 0
• So Dodd's salary is not > $15K
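The slides work the attack by hand against the paper's table, which is not reproduced here. The following is a minimal Python sketch under stated assumptions: characteristic formulas are modeled as Python predicates, the restriction k ≤ count (C) ≤ n - k is enforced by a hypothetical `RestrictedDB` class that returns None for refused queries, and the eight-record `TABLE` is invented for illustration (it is not Denning et al.'s data). The helper names `attr`, `AND`, `OR`, `NOT`, `tracker_query`, and `count_c_and` are likewise hypothetical. It shows that the direct query, the tautology, and the complement are all refused, while the tracker T = C1 AND ~C2 still recovers count (C), the target's salary, and a negative compromise.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Dict[str, Any]
Formula = Callable[[Record], bool]  # a characteristic formula C, modeled as a predicate

# Hypothetical toy table, invented for illustration (NOT the table in Denning et al.).
# Salary is in $K. The first record is the target: the only female CS professor.
TABLE: List[Record] = [
    {"sex": "F", "dept": "CS", "pos": "Prof", "salary": 15},
    {"sex": "F", "dept": "CS", "pos": "Sec", "salary": 8},
    {"sex": "F", "dept": "EE", "pos": "Prof", "salary": 22},
    {"sex": "F", "dept": "Math", "pos": "Prof", "salary": 20},
    {"sex": "M", "dept": "CS", "pos": "Prof", "salary": 25},
    {"sex": "M", "dept": "CS", "pos": "Prof", "salary": 21},
    {"sex": "M", "dept": "EE", "pos": "Sec", "salary": 9},
    {"sex": "M", "dept": "Math", "pos": "Prof", "salary": 18},
]


def attr(field: str, value: Any) -> Formula:
    """Atomic formula: field = value."""
    return lambda r: r[field] == value


def AND(*fs: Formula) -> Formula:
    return lambda r: all(f(r) for f in fs)


def OR(*fs: Formula) -> Formula:
    return lambda r: any(f(r) for f in fs)


def NOT(f: Formula) -> Formula:
    return lambda r: not f(r)


class RestrictedDB:
    """Hypothetical interface enforcing k <= count(C) <= n - k; refusals return None."""

    def __init__(self, records: List[Record], k: int) -> None:
        if not 0 <= k <= len(records) / 2:
            raise ValueError("need k <= n/2, otherwise no query is answerable")
        self._records = records
        self.k = k

    def _query_set(self, c: Formula) -> Optional[List[Record]]:
        qs = [r for r in self._records if c(r)]
        n = len(self._records)
        return qs if self.k <= len(qs) <= n - self.k else None

    def count(self, c: Formula) -> Optional[int]:
        qs = self._query_set(c)
        return None if qs is None else len(qs)

    def total(self, c: Formula, field: str) -> Optional[float]:
        """sum (C; field) in the slides' notation."""
        qs = self._query_set(c)
        return None if qs is None else sum(r[field] for r in qs)


def tracker_query(db: RestrictedDB, c1: Formula, c2: Formula,
                  field: str) -> Tuple[int, float]:
    """count(C) and sum(C; field) for C = C1 AND C2, via tracker T = C1 AND ~C2."""
    t = AND(c1, NOT(c2))
    answers = (db.count(c1), db.count(t), db.total(c1, field), db.total(t, field))
    if None in answers:
        raise ValueError("C1 or T was refused, so this split is not a tracker")
    n_c1, n_t, s_c1, s_t = answers
    return n_c1 - n_t, s_c1 - s_t


def count_c_and(db: RestrictedDB, c1: Formula, c2: Formula,
                alpha: Formula) -> Optional[int]:
    """count(C AND alpha) = count(T OR (C1 AND alpha)) - count(T)."""
    t = AND(c1, NOT(c2))
    u, v = db.count(OR(t, AND(c1, alpha))), db.count(t)
    return None if u is None or v is None else u - v


if __name__ == "__main__":
    db = RestrictedDB(TABLE, k=2)  # n = 8, so answerable iff 2 <= count <= 6
    female = attr("sex", "F")
    cs_prof = AND(attr("dept", "CS"), attr("pos", "Prof"))
    target = AND(female, cs_prof)
    prof = attr("pos", "Prof")

    print(db.count(target))                 # None: count is 1 < k
    print(db.count(OR(prof, NOT(prof))))    # None: tautology, count is n > n - k
    print(db.count(NOT(target)))            # None: complement, count is n - 1 > n - k
    print(tracker_query(db, female, cs_prof, "salary"))  # (1, 15)
    # 0: negative compromise, the target's salary is not > 15
    print(count_c_and(db, female, cs_prof, lambda r: r["salary"] > 15))
```

Returning None for a refused query mirrors "reject otherwise" on the upper-bound slide. Note that the tracker only ever issues queries on C1, T, and T OR (C1 AND α), each of which falls inside the allowed range, so the size restriction alone does not stop it.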