YOU ARE BEING TRACKED
By MW. Renald

Table of Contents

Chapter 1. Tracking You and Your Data: Geolocation
1.1 Introduction
1.2 User-Volunteered Geolocation Data Collection
1.3 Collateral Information Data Collection
1.4 Surreptitious Geolocation Data Collection
Chapter 2. Biometrics Tracking
2.1 Introduction
2.2 Biometrics Identification and Verification
2.3 Forms of Biometrics
Chapter 3. Internet Tracking: You Are Being Followed
3.1 Introduction
3.2 How the Web and the Internet Work
3.3 Tracking with Cookies and JavaScript
3.4 Tracking with Browser Information
3.5 Tracking by Your ISP
3.6 What Can You Do?
Chapter 4. Data Collection Methods
4.1 Introduction
4.2 How Much Information Is Collected and How?
4.3 Predictive Algorithms
4.4 Shadow Profiling
4.5 What Can I Do?
Chapter 5. Nowhere to Hide?
5.1 Health-Related Data Collection
5.2 Facial Recognition Technology
5.3 Tattoo Recognition
5.4 Advertising Kiosks
5.5 What Can You Do?
Chapter 6. The Dark Web
6.1 Introduction
6.2 Tor Browser
6.3 The Dark Web Anatomy
6.4 Dark Web Activities
6.5 What Can You Do?
Chapter 7. The Future of Personal Data
7.1 Introduction
7.2 DNA as Personal Data
7.3 DNA Profiles
7.4 The Future of Privacy Regulations
7.5 The Legislative Future of Personal Data

Chapter 1. Tracking You and Your Data: Geolocation

1.1 Introduction

What if your local police department had the technology to individually track and follow you and any of your neighbours as you walked around town? What if advertisers could do the same? What if your spouse's divorce lawyer could? You are where you go, and for that purpose there is a tracking phenomenon known as geolocation, which identifies where a person is physically and draws certain inferences from that information.
The Supreme Court, in a case captioned United States v. Jones, addressed some of the law and privacy issues of geolocation in a matter involving the decision of FBI agents to put a global positioning system, or GPS, tracking device on a vehicle belonging to a man named Jones. A GPS system works by interacting with satellites that orbit the Earth. It was developed in the 1970s for military use and was opened for commercial development only in 1995. It's the basis for the electronic navigator in your car and on your smartphone. Without it, Uber and Google Maps just wouldn't work. Interestingly, the key to geolocation isn't geography. It's time. That's why GPS satellites carry atomic clocks, which keep time by measuring the oscillations of atoms such as caesium. The clocks are synchronised to each other and to clocks on the ground. To the extent that small errors creep into the clocks, they are corrected every day; GPS depends on accurately knowing what time it is. GPS satellites broadcast their time and their location continuously, and a GPS receiver listens for these signals. With the signals from four different satellites, it can calculate exactly where it is, and exactly what time it is, based on small differences in how long each signal takes to reach the receiver. The general accuracy of a commercial GPS system falls within about 3 metres, or 10 feet. But in practice it can be even more precise, since the receiver is continuously calculating its location and averaging the measurements it's making. GPS tracking is used for a wide variety of commercial and military applications. It's the main way that we navigate; almost nobody uses paper maps anymore. It's at the heart of futuristic concepts like driverless cars and, of course, GPS is the core idea behind precision-guided munitions that hit military targets with great accuracy and specificity.
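The receiver-side calculation can be illustrated with a simplified example. The sketch below works in two dimensions with three satellites and ignores receiver clock error (real GPS solves in three dimensions and uses a fourth satellite to cancel the clock term); all positions and times here are invented for illustration.

```python
import math

C = 299_792_458.0  # speed of light, metres per second

def locate(sats, times):
    """Given three satellite (x, y) positions and the signal travel times,
    solve for the receiver position by linearising the circle equations."""
    # Distance to each satellite = travel time x speed of light
    d = [C * t for t in times]
    (x1, y1), (x2, y2), (x3, y3) = sats
    # Subtracting the first circle equation from the other two
    # leaves a 2x2 linear system A [x, y]^T = b.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d[0]**2 - d[1]**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d[0]**2 - d[2]**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    x = (b1 * a22 - b2 * a12) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y

# The receiver is actually at (3000, 4000); compute the travel
# times it would observe, then recover its position from them.
true_pos = (3000.0, 4000.0)
sats = [(0.0, 20_000_000.0),
        (15_000_000.0, 10_000_000.0),
        (-10_000_000.0, 12_000_000.0)]
times = [math.dist(true_pos, s) / C for s in sats]
x, y = locate(sats, times)
print(round(x), round(y))  # 3000 4000
```

The point of the sketch is the one made in the text: position falls out of nothing but timing differences and known satellite locations.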
GPS also has surveillance applications, as should come as no surprise to you, since this is a book on tracking and we have the example of the criminal suspect Mr. Jones. His name is Antoine Jones and he was the owner and operator of a Washington, DC nightclub. FBI agents attached a GPS tracker to a Jeep registered in the name of his wife. The vehicle was parked in a public lot at the time. Authorities suspected Jones was a drug dealer, so they tracked him to a location that they came to believe was a drug stash house, a place outside the home where dealers often hide their narcotics. The house was raided, drugs were found, Jones was arrested, and he was convicted. But the FBI had made a mistake: they hadn't gotten a warrant to put the tracker on the car. The government's argument was simple: cars travel on public roads, public roads are in public view, so Jones had no expectation of privacy in his travels and therefore no warrant was needed. The police lost and Jones won; the Supreme Court decided the case nine to nothing. Five justices thought that Jones should prevail for a very narrow and limited reason: the federal agents had trespassed on his property in placing the GPS tracker on his car, and they required a warrant to legally do so. Four other justices would have decided the case on a broader ground. They said that the collection of a large volume of data, what we've called the big data problem, raises constitutional issues, because it allows for the creation of a so-called mosaic picture of Jones. In other words, by combining many snippets of information, authorities could piece together a much more comprehensive picture of Jones's life than was revealed by any individual piece. In this case, the picture suggested Jones was a drug dealer, but in other settings it might have been used to determine whether he was a Democrat or a Republican. Thus, Jones illustrates two points.
First, the case demonstrates something about the revealing power of geography. Law enforcement authorities concluded that they knew what Jones was because of where he went. That's very useful analytics, and it's also very spooky. Second, we can infer that the mosaic distinction requires some line drawing, but nobody knows exactly where the line is. How many snippets of information are enough to create a mosaic? Nobody knows, and the majority of the court didn't answer the question at all. But we can recognise that the Supreme Court is thinking hard about the Fourth Amendment protections against unreasonable searches and seizures and applying them in an era of big data, where the search might consist of digital scraps of information. After the Supreme Court vacated Mr. Jones's conviction, the government offered him a deal. He refused and went to trial. The jury did not reach a verdict, maybe because there was no GPS evidence. But rather than face yet another trial, Jones reached a plea agreement and was sentenced to 15 years in prison. As we survey the field of geolocation, I want to identify three separate concepts that form a useful framework for our discussion. They relate, broadly speaking, to the manner in which geolocation information is collected. First, in some instances we volunteer that information to the world around us. In others, geolocation data are collateral information, necessarily collected as part of some other process, like making a phone call. And third, we can talk about collecting geolocation data through surreptitious means.

1.2 User-Volunteered Geolocation Data Collection

Aaron Schock is a former Republican congressman from Illinois who seemed to have it all: a safe district, good ideas, solid political prospects and a great social media footprint. He was much better known than many of his peers on Capitol Hill and had been marked with the label of rising star.
Once dubbed the ripped representative and the fittest man in Congress by Men's Health magazine, Schock also learned about the dangers of geolocation in an era of selfies and Instagram. Schock spent taxpayer and campaign funds on flights aboard private planes owned by some of his key donors. The Associated Press identified at least a dozen flights, valued at more than $40,000, on donors' planes. He also reportedly enjoyed other expensive travel and rang up significant personal entertainment charges, including massages and tickets to music concerts. How did the AP know all this? They tracked Schock's reliance on the aircraft partially through the congressman's penchant for uploading pictures and videos of himself to his Instagram account. They extracted location data associated with each image and correlated it with flight records showing airport stopovers, and with expenses billed for air travel against Schock's office and campaign records. If you take pictures with your smartphone, you are creating location data about yourself. Your camera stores a bunch of data about every picture you take. It records the aperture, shutter speed, ISO speed, camera mode, focal distance, and sometimes even more than that. All of this is stored in the EXIF data, an extra piece of information attached to every picture file your camera creates; EXIF stands for exchangeable image file. EXIF data has been around since the early days of digital photography. Today, one of the things that the phone puts in the EXIF is your geolocation. Almost all smartphone cameras geotag the photos they take, and once you put the picture up on Instagram, or most any other photo collection programme, it's a simple process to download the photo, select its properties and find the EXIF data for the picture. That's how The Associated Press knew where Congressman Schock was when he took all those pictures.
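What the AP extracted is not exotic. EXIF stores the geotag as degrees, minutes and seconds plus a hemisphere reference, and converting that raw form into the signed decimal coordinates a mapping service expects takes a few lines. A minimal sketch with invented sample values, not data from any real photo:

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert an EXIF-style degrees/minutes/seconds triple, plus a
    hemisphere reference ("N"/"S"/"E"/"W"), to signed decimal degrees."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # South latitudes and west longitudes are negative.
    return -value if ref in ("S", "W") else value

# A hypothetical geotag as it might appear in a photo's EXIF block.
lat = dms_to_decimal(38, 53, 23.0, "N")
lon = dms_to_decimal(77, 0, 32.0, "W")
print(f"{lat:.4f}, {lon:.4f}")  # 38.8897, -77.0089
```

Those two decimal numbers, pasted into any mapping site, place the photographer on a map, which is exactly the correlation the AP performed at scale.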
As shocked as the former congressman might have been after the AP revealed that it had been tracking his whereabouts, some of us advertise the same information. We run around formally tagging our location and checking in at various places. If you use an app like Foursquare Swarm, you are purposefully broadcasting where you are, and it's pretty easy to accumulate that data and use it to draw a picture of an individual's activities. For example, Raytheon has developed something it calls the Rapid Information Overlay Technology (RIOT). RIOT uses only publicly available data from social media programmes like Instagram, Facebook, Foursquare and TikTok. With that information, you can draw a detailed picture of a person based on where he goes. Raytheon understands the power of this sort of analytic tool, and the peril. That's why it describes the RIOT tool as privacy protective. Jared Adams, a spokesman for Raytheon's intelligence information systems department, says this: "RIOT is a big data analytics system design. We are working along with industry, national labs, and commercial partners to help turn massive amounts of data into usable information to help meet our nation's rapidly changing security needs. Its innovative privacy features are the most robust that we're aware of, enabling the sharing and analysis of data without personally identifiable information, such as social security numbers, bank or other financial account information, being disclosed." Here we see one of the common tools that systems integrators often use as a means of ameliorating privacy and civil liberties concerns: the tool of partial masking, or pseudonymity. By scrubbing the data of personally identifiable information, but still making it capable of being correlated and analysed, you can create a two-step process that is thought to be more robust in protecting privacy. At the first step, data that is scrubbed of identity markers is linked together in patterns.
Only when those patterns meet some threshold of concern, and typically when some third party or supervisor verifies that the threshold has been exceeded, only then is the anonymity of the data removed and identifying information added back in. In this way, large volumes of innocent collateral data can be collected and sifted in an automated fashion without, it is said, threats to privacy. Of course, to rely on that system, you have to trust the process. Former Congressman Schock's Instagram account revealed what we might think of as deliberate but inadvertent geolocation sharing. He had probably never heard of the EXIF file. Now that he has, he can turn off the identifying information and still use his camera and his Instagram account just like he used to. The geotag is not an essential function. But what about when it is essential?

1.3 Collateral Information Data Collection

Geolocation is essential to navigation functions such as Google Maps, for example. That's the type of functionality you can't really turn off and still navigate, and so the only way to avoid exposing your location data is not to use that function at all. For years, some people refused to get an E-ZPass to travel on toll roads. Their problem with this electronic system of capturing your payment and path as you pass by was that, as you rack up tolls, the E-ZPass creates a record of where you are and where you've been. That's a record, a geolocation record, that law enforcement and others can collect and analyse just like any other geo record. We have seen E-ZPass-type toll records used in everything from criminal investigations to divorce proceedings. Those who refused accepted the inconvenience of longer wait times in exchange for a little bit of personal obscurity. But some other geolocation functions are, for all intents and purposes, an essential component of modern-day life.
When that happens, the sort of surveillance that in other contexts might seem only a bit creepy can begin to become pretty scary, and even downright authoritarian. Think, for example, about your cell phone. Not the super-sophisticated location apps that you could do without; rather, think of the phone itself and the voice and text communications that are probably at the core of your personal mobility and your personal connectivity. These features also allow others to know exactly where you are, all the time. Your cell phone is constantly reporting your location to the nearest cell towers. That's how the telephone system knows where you are so it can connect a call to you; otherwise, cell phone service just wouldn't work. The phone company keeps those records of where your cell phone is, or was. That means that they know where you are right now and also where you've been. The German politician Malte Spitz used his cell phone records and Google Maps to create a video log of all his movements over a six-month period. He did this to make a point, because most of where we go is innocuous. But if I have six months of your travel logs, I also know a great deal about you. Maybe you are not worried about what your phone company knows. But what if they sell it to some commercial advertiser? Or what if the government issues a subpoena and collects all those records about you? The issue is highly contentious, but the law does not protect information you share with a third party. When you voluntarily broadcast your location to the cell phone company or Facebook, that means there's no constitutional rule that prevents them, in turn, from giving the information to the government. That is a pretty broad definition of voluntary; that kind of implied consent has a very forced feel to it. We can't turn the geolocation part of the cell phone off, at least not if we want our cell phones to work, and we can't really quit society. Our consent is, in effect, coerced.
That's why a few courts in the USA are starting to take a different view and extending the law governing warrants to cover cell tower records. They're saying that in the absence of a warrant based on probable cause, the government can't secure these historical records. And that extension, which limits police methods, naturally brings with it problems of a different sort. Sometimes geolocation cell tower data can be powerful evidence of criminality. If you are a fan of Forensic Files, this will sound familiar: in one case, for example, cell tower data located the defendant's mobile phone in close proximity to six different armed robberies. Federal investigators also used cell tower location records to establish that a State Department whistleblower was in the same place as a TV reporter who later published leaked classified information. We need, as a society, to choose how much or how little access we want to give the government to geolocation data. The basis for this choice comes down to a rough form of cost-benefit analysis. If we think the value of the positive uses of a technology is great enough, then we will deploy it while trying to manage its use through warrant requirements, data retention rules and the like. But if we are concerned that a technology provides too much surveillance power to the government, we often consider banning the technology altogether. That step isn't always possible, as we are too dependent on the technology. And that, I think, explains why some groups fight so hard to prevent a new technology from coming online in the first instance: not because initial uses are so abusive, but because a step down that technological path can't easily be untaken. Think of Social Security: once, your identifying number had a sole purpose and no other; now it's a ubiquitous personal identifier.
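To see how revealing routinely retained tower records are, consider how little code it takes to turn them into a movement log of the kind Malte Spitz built. A minimal sketch; the tower IDs, coordinates and timestamps below are all invented:

```python
# Each tower's location stands in as a coarse proxy for the phone's
# location whenever the phone registered with that tower.
towers = {
    "T1": (52.5200, 13.4050),  # illustrative coordinates, central Berlin
    "T2": (52.5310, 13.3840),
    "T3": (52.4970, 13.4440),
}

# (timestamp, tower id) pairs as a carrier might retain them.
records = [
    ("2010-03-01 08:05", "T1"),
    ("2010-03-01 08:45", "T2"),
    ("2010-03-01 18:30", "T3"),
]

# Join the two to produce a day's travel log.
log = [(ts, *towers[tid]) for ts, tid in records]
for ts, lat, lon in log:
    print(f"{ts}: near ({lat:.4f}, {lon:.4f})")
```

The join is trivial; the sensitivity comes entirely from the data's volume and retention, which is exactly the point of the policy question that follows.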
Knowing what we know now, what might be the right way to structure government access to cell tower geolocation data? Would you require it to be deleted upon collection, or stored only for a few hours so that it could be used in emergency cases like an ongoing kidnapping? Or held for longer periods, but available to the government only with a warrant? Or never available to the government, but usable by the commercial sector? You don't need to answer that right now. But you do need to be thinking about it.

1.4 Surreptitious Geolocation Data Collection

Let's turn to the collection of geolocation data by the government without your knowledge or consent. At least in the cell phone example, we could point to your implicit agreement through your use of the company's geolocation technology. But what happens when the government starts tracking you without your knowledge, as in the case of Mr. Jones? One important example is a tracking system known as Stingray. When a Stingray tracking device is turned on, it pretends to be a cell phone tower. It simulates the call-out from the tower to nearby phones, even when they're not in use. Those phones, in turn, respond to the Stingray by reporting their phone number and a unique electronic serial ID number. According to the non-profit civil liberties organisation in Washington known as the Electronic Privacy Information Centre (EPIC), government investigators and private individuals alike can use Stingray and other cell-site simulators to locate, interfere with, and even intercept communications from cell phones and other wireless devices. EPIC tells us that the FBI has used these simulators to track and locate phones and users since at least 1995. How powerful is this technology? According to The Wall Street Journal, the US Marshals Service flies planes carrying devices that mimic cell phone towers in order to scan the identifying information of Americans' phones.
It does so as it searches for criminal suspects and fugitives. Under this programme, the government collects data from thousands of mobile phones; along the way, it also collects, and then it says discards, data on a large number of innocent Americans. The Justice Department justifies the phone records collection programme by arguing that it is minimally invasive and an essential way to track terrorists and criminals. Its main virtue, from the government's perspective, is that the programme eliminates the need to go to phone companies as an intermediary in searching for suspects. Rather than asking a company for cell tower information to help locate a suspect, a process that law enforcement has criticised as slow and inaccurate, the government can get that information itself. Naturally, others see that as problematic. Christopher Soghoian, Chief Technologist of the American Civil Liberties Union, characterises it as dragnet surveillance. He says Stingray in the air is "inexcusable and it's likely, to the extent that judges are authorising it, that they have no idea of the scale of it." Now, one recurring theme in contemporary issues of law and technology is the balance to be sought between secrecy and transparency. It's important for citizens to know what their government is doing. On the other hand, it's clear that the disclosure of certain surveillance techniques can significantly diminish their utility. Most of what you now know about Stingray is a matter of public record, derived from enterprising journalists and from litigants who have run into the programme as part of ongoing criminal proceedings, for example. But the US government is anxious, some might say desperate, to keep the technology out of the hands of non-government actors. To that end, whenever prosecutors are asked to produce the equipment in court, they dismiss the case rather than disclose how the Stingray actually works.
The Washington Post brought us the story of Tadrae Mackenzie. Mackenzie was a small-time criminal living in Florida. He and two of his friends robbed another small-time crook, a marijuana dealer, armed only with BB guns. At first, the police had no leads on this crime. But the robbers had stolen the victim's cell phone as well, so the Tampa police got a court order directing the local telephone company to give them the cell tower location data, which would allow them to track the device. The cell tower records helped, but they were not specific enough. The records gave the police a general location for where the stolen cell phone was, which helped them narrow their search to a general neighbourhood, but no more. Then the police used a Stingray device to more narrowly focus their investigation. Eventually, using the Stingray, the police were able to narrow down the location of the stolen cell phone to a single specific house, which they put under surveillance, and when Mackenzie left one morning, he was arrested. Fast forward to the preliminary hearing in Mackenzie's case: the lead police officer, upon being asked how he had been able to identify the specific house where the stolen phone was, declined to answer. He said that he had used a device that was subject to a non-disclosure agreement with the FBI, and that therefore he wasn't allowed to tell the defence counsel how the machine operated. The Florida judge handling the matter was not amused. He ordered the government to produce details of the Stingray and how it operated. Rather than do that, the state prosecutor offered Mackenzie a deal, and so Mackenzie agreed to plead guilty to a second-degree misdemeanour and received six months' probation. Because of the secrecy surrounding how the Stingray works, Mackenzie got off easy. It's almost as if he won the lottery. And that should give all of us pause. If you are a law and order advocate, you should be concerned that a guilty man went essentially free.
If you're a champion of civil liberties, you should be concerned that the Stingray and its technical details remain secret. We are, I'm afraid, in a very unsteady and unstable place, where neither answer satisfies. More to the point, such an unreconciled conflict is simply untenable in the long run. Either the government's use of new technologies in the public sphere will have to be fully disclosed and made subject to adequate oversight, or the police are going to have to give up such surveillance and tracking tools if they can't withstand the scrutiny that comes with their use in a free society. One small step in the direction of reconciling the conflict: the Department of Justice has now said that, as a matter of policy but not legal obligation, federal law enforcement officers will seek warrants before using a Stingray. That's probably a sound result, but note that the Department of Justice doesn't bind the state and local law enforcement agencies who also use Stingrays. Here's a final thought from Christopher Soghoian, Chief Technologist of the American Civil Liberties Union. He points out that there is yet one more reason why the secrecy surrounding Stingrays is problematic. The idea is that if the FBI can use Stingrays, then so can our enemies. Soghoian says: "Our government is sitting on a security flaw that impacts every phone in the country. If we don't talk about Stingray-style tools and the flaws that they exploit, we can't defend ourselves against foreign governments and criminals using this equipment too."

Chapter 2. Biometrics Tracking

2.1 Introduction

At one point in the movie Minority Report, the lead character, Chief John Anderton, is on the run. He needs to change his identity. But he can't, at least not easily, because the central government has everybody registered by a unique identifying characteristic: the pattern of their eyes. This technique, known as iris recognition, is actually in its growth stage today.
The future imagined in Minority Report marries the capability of uniquely identifying people through their eye patterns to small universal scanners that scurry like spiders, taking a picture of every individual's eyes in order to identify them. Because your eye pattern is unique and immutable, the government sees this as a way of conclusively identifying malefactors. Citizens see it as a way of exercising control. This is not the stuff of science fiction. During the wars in Iraq and Afghanistan, US forces used iris scanning technology as a sort of digital filing system on civilians and others they encountered. Anderton's fiction is today's reality. The only way for John Anderton, played by the actor Tom Cruise, to avoid this surveillance is to change his eyes. In a rather gruesome scene, he goes through an operation in which eyeballs harvested from a cadaver are transplanted into his eye sockets. That seems pretty extreme, and it's beyond the realm of the possible today. But it gives you a sense of both the power and the peril of biometric identification. In this chapter, I want to ask: why do we care? What is it about biometrics that makes them useful? Then I want to tell you about the technology itself, what it is and how it works, and finally we'll close with some thoughts about the dark side of the technology, how it might threaten civil liberties.

2.2 Biometrics Identification and Verification

Why biometrics are interesting comes down to the problem of establishing one's identity. If I say to you, "My name is Eric Trump," how do you know that I am not someone else pretending to be Eric? How do we verify my identity? The problem came into stark highlight after the September 11 attacks. The government's comprehensive review identified a number of gaps in America's security infrastructure, including the inability to know who was who.
The Florida driver's licence picture of Mohamed Atta has become an iconic symbol of the insecurity of the identification apparatus. Misidentification is a critical and endemic problem, and that's why biometrics are increasingly important. In a post-9/11 world, the USA wants to link the biographic information it has available about risks associated with an individual, be it risk of financial fraud, abuse of eligibility for benefits or being a potential terrorist, to a verifiable biometric characteristic, that is, a physical characteristic that is impossible to change unless you are Tom Cruise in Minority Report. In every walk of life, as a basic building block of risk assessment, we think it's imperative that we have confidence that people are who they say they are. Consider some of the uses to which a verified biometric identity can be put: getting through the airport more easily for trusted travellers; establishing access control checkpoints to let people into buildings and computer systems, or to keep them out; verifying credit and other consumer behaviour, thereby pinpointing or streamlining retail transactions, reducing fraud and resulting in lower fees; eliminating voter fraud and ending the voter ID debate; and verifying age and legal authorisation to drive or vote or drink alcohol, and so on. Biometrics are actually among the oldest of new technologies. They began with fingerprints early in the 20th century and today include more novel ideas like gait recognition, the ability to identify an individual by his physical movement, how he or she walks. Now, biometrics can be used in two distinct ways: for verification or for identification. When a biometric system is used to verify whether a person is who he or she claims to be, that verification is frequently referred to as one-to-one matching. Almost all systems can determine whether there is a match between a person's presented biometric and a biometric template in a database in less than a second.
Identification, by contrast, is known as one-to-many matching. In a one-to-many matching framework, a person's biometric signal, whether it's an iris or a fingerprint, is compared with all the biometric templates within a database. There are also two different types of identification systems in this framework: positive and negative. Positive systems expect there to be a match between the biometric presented and the template; these systems are designed to make sure that a person is in the database. Negative systems are set up the opposite way, to make sure that a person is not in the system. Negative identification can also take the form of a watch list, where a match triggers a notice to the appropriate authority for exclusionary action. Neither system generates perfect matches or exclusionary filters. Instead, each comparison generates a score of how close the presented biometric is to the stored template. The system then compares that score with a predefined number, or with algorithms, to determine whether the presented biometric and the template are sufficiently close to be considered a match. Most biometric systems therefore require an enrollment process, in which a sample biometric is captured, extracted and encoded as a biometric template. This template is typically then stored in a database against which future comparisons will be made. When the biometric is used for verification, for example access control, the biometric system confirms the validity of the claimed identity. When used for identification, a biometric technology compares a specific person's biometric with all the stored biometric records to see if there's a match. For biometric technology to be effective, the database has to be accurate and reasonably comprehensive. The process of enrollment, creation of a database and comparison between the template and the sample is common to all biometrics.

2.3 Forms of Biometrics

There are many different forms of biometrics.
We are going to talk about four of the most common: fingerprints, iris recognition, facial recognition and voice recognition. Then we will mention two other forms of biometrics, hand geometry and gait recognition, and we will end our description of biometrics with DNA analysis. Fingerprint recognition is probably the most widely used and well-known biometric. It relies on features found in the impressions made by the distinct ridges on the fingertips. There are two types of fingerprints: flat and rolled. It used to be that fingerprint comparisons were made by hand, with experienced examiners making judgments about matches. Today, fingerprint images are captured using optical, silicon or ultrasound scanners, enhanced, and then converted into templates, which are saved in the database for future comparisons. In Pakistan, the government requires everyone with a cell phone and SIM card to register with their fingerprints, saying it's an anti-terror initiative, since untraceable unregistered SIM cards were proliferating as a means of terrorist communication. Another area where fingerprint biometrics have been used is identity and access management in healthcare, for example in VA or teaching hospitals. The biometric technology is used to solve the challenge of how hospitals can give access to users and yet maintain security levels that provide confidence and comfort to their patients. This is a critical challenge, since greater security usually decreases access. Using fingerprints has seemed to work as a way of protecting patient privacy without too much inconvenience for the doctors. Iris recognition technology relies on the distinctly coloured ring that surrounds the pupil of the eye. Irises have approximately 266 distinctive characteristics, including things like a trabecular meshwork, striations, rings, furrows, a corona and freckles.
Retinal scanning, by contrast, looks at the blood vessel patterns in the retina; it's the same idea implemented in a slightly different form. For iris recognition, typically more than 170 of the distinctive characteristics are used in creating a template. Irises form during the eighth month of pregnancy and are thought to remain stable throughout an individual's life, barring injury. Iris recognition systems usually start with a small camera that takes a picture of the iris. The picture is then analysed to identify the boundaries of the iris and create a coordinate grid over the image. Then the 170 or so characteristics found in each different zone are identified and stored in a database as the individual's biometric template. Iris recognition technology is relatively easy to use and can process a large number of people quickly. It's also only minimally intrusive in a physical sense. However, coloured or bifocal contact lenses might hinder the effectiveness of an iris recognition system, as can strong eyeglasses; glare or reflection can also be problematic for the cameras. In addition, people with poor eyesight occasionally have difficulty aligning their eyes correctly with the camera, and people who have glaucoma or cataracts might not be suitable for screening using iris recognition technology. But it is useful. The United Arab Emirates has found iris recognition to be an effective security means for preventing expelled foreigners from re-entering the country. The UAE faced a situation in which an expelled foreigner would return to his or her home country and then legally change his or her name, date of birth and address, all descriptors traditionally used to screen individuals entering the country. Since the new identity would not be in any of the traditionally maintained name-dependent lists, government agents would then admit the banned individual to the UAE when he returned.
To counter this problem, the small Arab country began developing a biometric system that could be used to scan all individuals arriving in the country and determine whether the person was banned from entering. The UAE specifications for the system included using a biometric that didn't change over time, could be quickly acquired, was easy to use, could be used in real time, was safe and non-invasive, and which could be scaled into the millions. The Emirates determined that iris recognition technology was the only technology that produced a single-person match in a sufficiently short period of time to meet its needs. According to the country's self-report, the system is remarkably effective. After the first 10 years, the use of iris scans has, they say, prevented the re-entry of 347,019 deportees. A statistical analysis of the programme suggests that the likelihood of a false positive match, that is, the system misidentifying someone as banned when they are not, is less than one in 80 billion. Face recognition technology identifies individuals by analysing certain features on their face; it may look at the width of the nose, the eye sockets or the mouth. Typically, facial recognition compares a live person with a stored template, but it's also been used for comparison between photographs and templates. This technology works for verification, and also for identification. MasterCard is now in the process of trialling a new facial recognition app for your smartphone that will let you use your face as a way of verifying your identity and approving a credit card transaction. Amusingly, in order to prove that it's a real face taken as a selfie, not a picture, you actually have to blink while the picture is being processed to prove you are alive. In addition, facial recognition is the biometric system that can best be routinely used covertly, since a person's face can often be captured by video technology.
In other words, you may never know if a photo is being taken of you and compared to some database. DeepFace, the facial recognition technology developed by Facebook, is said to be 97% accurate, making it competitive with human distinguishing capabilities. Voice recognition technology identifies people based on vocal differences that are caused either by differences in their physical characteristics, like the shape of their mouth, or by speaking habits, like an accent. Such systems capture samples of a person's speech as a scripted phrase is recorded multiple times into a host record-keeping system. That phrase is known as the passphrase. This passphrase is then converted to a digital form, and distinctive characteristics like the pitch, cadence and tone are extracted to create a template for the speaker. Voice recognition technology can be used for both identification and verification. The use of the technology requires minimal training for those involved. It's also fairly inexpensive and very non-intrusive. The biggest disadvantage with the technology is that it can be unreliable. For instance, it doesn't work well in noisy environments like airports or border entry points. Another form of physical recognition is a measurement based on the human hand: the width, height and length of the fingers, distances between the joints and the shape of the knuckles. It's called hand geometry. Using optical cameras and light-emitting diodes with mirrors and reflectors, orthogonal two-dimensional images of the back and the sides of the hand are taken. Based on these images, 96 measurements are calculated and a template is created. Most hand readers have pins to help position the hand properly. These pins help with consistent hand placement and template repeatability, so there is a low false positive rate and a low failure-to-match rate as well.
Hand geometry is actually a mature technology, primarily used for high-volume time-and-attendance and access controls. Hand geometry works particularly well when many people need to be processed in a short period of time, so long as it's one-to-one matching. Although people's hands differ, they're not really individually distinct. As a result, hand geometry technology cannot be used for the one-to-many matching procedure we discussed a while ago. Hand geometry is perceived as very accurate and has been used in a variety of industries to regulate access controls for more than 30 years. It is useful in identifying who's permitted somewhere, or to do something, and who is not. It's really very difficult to spoof someone's hand shape without the person's cooperation. The main advances in the technology over the years have been cost reductions. Today, a wide variety of places rely on hand geometry for access. The San Francisco Airport uses it for access to the tarmac; the port of Rotterdam, Scott Air Force Base and a sorority at the University of Oklahoma all rely on it. By contrast, gait recognition, which I mentioned earlier, is an emerging biometric technology. It's one that involves people being identified purely through the analysis of the way they walk. According to Homeland Security news, scientists in Japan have developed a system measuring how the foot hits and leaves the ground during walking. They then use 3D image processing and a technique called image extraction to analyse the heel strike, the roll to the forefoot and the push-off by the toes. Some say that recognition accuracy is up to 90%, with the caveat, of course, that if you know you're being watched, you can change your gait. The idea, however, has attracted interest because it's non-invasive and doesn't require the subject's cooperation. Gait recognition can be used from a distance, making it well suited to identifying perpetrators at a crime scene.
Or imagine if the US Army had been able to see inside the compound where bin Laden was hiding. Perhaps they could have identified him pacing on the rooftop just by his gait. Researchers also envision medical applications for the technology. For example, recognising changes in walking patterns early on can help identify conditions such as Parkinson's disease and multiple sclerosis in their earliest stages. DNA analysis is perhaps the most accurate biometric method of one-to-one identity verification. You will likely recall what happened to Bill Clinton after Monica Lewinsky turned over a navy blue dress that she said she had worn during a romantic encounter with the President. Investigators compared the DNA in a stain on that dress to a blood sample from the president. By conducting the two standard DNA comparisons, the FBI laboratory concluded that Bill Clinton was the source of the DNA obtained from Monica Lewinsky's dress. According to the more sensitive RFLP test (restriction fragment length polymorphism, a technique used by molecular biologists to follow a particular sequence of DNA as it's passed on to other cells), the genetic markers contained in Mr. Clinton's DNA are characteristic of one out of 7.87 trillion Caucasians. On the flip side, DNA evidence has increasingly come to be used to exonerate the wrongfully accused and convicted. Hundreds of such cases have been overturned, at least 20 of which involved people who had served time on death row. Biometrics are great; what could possibly go wrong? The answer rests on whether or not we're comfortable with the government having an immutable record of who we are and what we do. One development in recent years that troubles some civil liberties advocates was the case of Maryland versus King, which was decided by the Supreme Court in 2013. The case asked the question of whether and when the government could forcibly collect your DNA from you. In general, authorities can collect DNA from people convicted of crimes.
But what if you are merely arrested and not yet convicted? The Supreme Court, by a narrow five-to-four majority, said that the administrative collection of DNA from all arrestees was permissible even in the absence of a warrant or probable cause. What happened to the rule of innocent until proven guilty? Of course, your DNA is everywhere you are, and remains through shedding after you go. With the result in the King case, the government is now free to assemble a template DNA national database of anyone who's ever been arrested for a crime. The best estimate I've seen suggests that the database may in the end contain the DNA of one in four Americans, with a significantly higher rate for African Americans. Not all the samples were collected for criminal reasons, of course, but many were. And all of this suggests that the use of biometric technologies poses a host of interrelated policy questions. Some of the questions one might ask are: Can the biometric system be narrowly tailored to its task? Who oversees the programme? What alternatives are there to biometric technologies? What information will be stored, and in what form? To what facility or location will the biometric give access? Will the original biometric material be retained? Will biometric data be kept separately from other identifying personal information? Who will have access to the information? How will access to the information be controlled? How will the system ensure accuracy? Will data be aggregated across databases? If data is stored in a database, how will it be protected? Who makes sure that the programme administrators are responsive to privacy concerns? Can people remove themselves from a database voluntarily? In effect, can they unenroll? How will consistency between data collected at multiple sites be maintained? If there's a choice, will people be informed of optional versus mandatory enrollment alternatives?
Some of the fears surrounding biometric information include that the data could be gathered without permission, knowledge or clearly defined reasons; used for a multitude of purposes other than the one for which it was initially gathered (function creep); disseminated to others without explicit permission; or used to help create a complete picture about people for surveillance or social control purposes. There are also concerns about tracking, which is real-time or near-real-time surveillance of an individual, and profiling, where a person's past activities are reconstructed. Both of these would effectively destroy a person's anonymity. Here are some ideas about biometrics to consider: Enrollment in biometric systems should generally be overt instead of covert; before one is enrolled in a biometric programme, one should be made aware of that enrollment. Thus, we should be more sceptical of government-run biometric programmes, such as public facial recognition, that permit the surreptitious capture of biometric data. Biometric systems are better used for verification than identification. In general, they are better suited for a one-to-one match, assuring that the individual in question is who he says he is and has the requisite authorization to engage in the activity in question. Biometrics are both less practically useful and more problematic as a matter of policy when they're used in a one-to-many fashion to pierce an individual's anonymity without the justification inherent in, for example, seeking access to a particular location. We should prefer biometric systems that are opt-in and require a person's consent rather than those that are mandatory. By this, we do not mean that opting in cannot be made a condition of participation. For example, if you want to enter the United States, you must provide a biometric, since participation is ultimately voluntary in some way.
We also recognise that certain biometric applications, like DNA for convicted criminals, may need to be mandatory. However, this should be an exception to the general rule of voluntariness. Any biometric system we build should have a strong audit and oversight programme to prevent misuse. Someone must, as we've said before, watch the watchers. And finally, we need to be concerned about the security of a biometric database. After all, if your password or credit card number gets hacked, you can change it. It's inconvenient and costly, to be sure, but it can be done. If your biometric data gets hacked, as happened to many government employees in the breach of the Office of Personnel Management security database, there is much more trouble afoot. You can't, after all, change your fingerprint. Centralised storage of biometric data also raises privacy concerns by tending to enable easier mission creep. Clearly, for some technologies and applications, local storage won't be feasible, but to the extent practicable, local storage should be preferred. But all this pales next to the larger question of who gets to decide. Should citizens have a right to control their extremely sensitive biometric data? Should, for example, the collection of facial biometrics on a public way be impermissible? In one sense, the answer seems like it should be obvious: if I can take a picture of you on the street without your permission, which I can, why can't the government? On the other hand, it's the government. Today, however, the decision to move forward with biometrics is not really the subject of wide public debate. In 2014, the FBI started to use a Next Generation Identification biometric database with 14 million face images. Current plans are to increase that number to 52 million images by 2015, with more images to be collected in the future. Some communities are even issuing mobile biometric readers to their governmental staff.
The staff, usually police officers but sometimes other regulatory agents, can take pictures of people on the street or in their homes and immediately identify them and enrol them in face recognition databases. Biometric technologies are likely to be of great value in creating secure identification, but to be useful and acceptable, they need to be privacy and civil liberties neutral. They can and should be designed with appropriate protocols to ensure privacy before they are implemented; on that, perhaps, we all can agree. Chapter 3. Internet tracking, you are being followed 3.1 Introduction In our previous chapter, we discovered some of the many ways your personal data is harvested through different websites and across digital platforms. We've seen how a lot of this happens in expected ways, but sometimes with unexpected consequences. By now you're getting an idea of the scope of data collection and the wide variety of ways it can be used, both good and bad. And you're developing a sense of where you fall on the privacy spectrum. Now it's time to go deeper into how this works from a technology standpoint, because the more fluent you are in how this happens, the more nuanced and effective you can be in your decision making. In this chapter, we'll take a look under the hood of web tracking: how websites keep track of you and your actions and the data you produce. To do this, we need to begin with a primer on how the web and the internet work, because this will help us better understand the tracking technologies built on top of them. 3.2 How the Web and the Internet Work The web, the internet: these terms are often used interchangeably, but they're actually different things. The internet is the global network of computers. The web is a network of websites that operate on the internet. To use an old metaphor, you can think of the internet like a highway system, the roads and interchanges and intersections.
The web would be something like a national and local public bus network. It uses the highway system, but it's not the same as the highway system; it's a way to transport things over that network. Information is transmitted over the internet and the web using protocols, which are basically rules about how to format, send and receive information. The internet uses the Internet Protocol (IP), and the web uses the Hypertext Transfer Protocol (HTTP). You're probably familiar with HTTP because that's what goes at the beginning of a web address. Web addresses start with http://, which tells your browser that you're using the HTTP protocol. Even if you don't type that part, your browser fills it in, because it's a necessary part of requesting webpages. You may also be familiar with the IP acronym from the term IP address. An IP address is a unique address assigned to every computer and device that's connected to the internet. Anytime you send a request for something online, like a web page, your IP address is included so the recipient knows where to send the information back to. When you make a request for information online, there's a series of steps that get followed. Say you're requesting this video so you can watch it on your device. Your computer puts together a request for this information, and it goes from your computer to your internet service provider, your ISP, like the cable company that brings internet to your house. From there, it may go to a couple of other servers owned by your ISP (servers are just computers connected to the internet), then it gets passed out to the internet backbone. This is like the interstate system: a series of very fast core connections and interchanges that get traffic close to its destination. From there it connects to the server you're trying to reach. That server processes the request and sends a response back to your IP address along a reverse route. The exact path your request takes may vary each time.
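As a concrete look at the HTTP layer described above, here is the on-the-wire shape of a minimal HTTP/1.1 request. The host and path are examples, and a real browser adds many more headers:

```python
# A minimal HTTP/1.1 GET request, as it travels inside IP packets.
request = (
    "GET /index.html HTTP/1.1\r\n"   # method, path, protocol version
    "Host: example.com\r\n"          # which site on the server we want
    "Connection: close\r\n"          # close the connection after replying
    "\r\n"                           # blank line marks the end of headers
)
print(request)
```

The server logs mentioned below record exactly these requests, together with the IP address they came from.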
The internet was intentionally set up to have many possible paths between computers. This is because it was originally a project of the US Department of Defence, and they wanted to ensure the network would continue to work even if some sites were taken out by bombs, attacks or other failures. Servers keep a log of the requests they get and the responses they send. This allows them to analyse their own performance, and your IP address is included in the log. This is the first way you can be tracked online. All requests are routed with IP addresses, but you probably don't interact with IP addresses on a daily basis. When you try to go somewhere online, you usually use a domain name like google.com. When you enter that, the first step is that it has to get turned into an IP address. There's a large distributed public database that maps domain names to IP addresses, called DNS (the domain name system, served by domain name servers). The first step your computer takes is to take the domain you entered, look it up via DNS, and get the IP address to route the request to. These lookups are another way you can be tracked, which we'll discuss more in a bit. The web operates on top of the Internet Protocol. It uses the same processes that run everything on the internet, and then adds a layer to make webpages, images and other data appear in your web browser. Think of it like this: the Internet Protocol is a truck moving data around, and webpages are the cargo. There's a lot more tracking that can happen on webpages, but we'll focus on that later, since it's not part of the internet and web's core functionality. But just from the core way the web works, there are logs of your IP address and every web page you visit. This is nothing new, and it isn't insidious. Since the earliest days of the web, this information has been recorded. But it wasn't used to track people across the web or to do much to personalise their experience.
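The domain-to-IP lookup step described above can be seen directly with a few lines of Python. "localhost" is used here because it resolves locally without a network round trip; a real domain would go out to your configured domain name servers:

```python
import socket

# DNS resolution: turn a human-readable hostname into the numeric
# IP address that the Internet Protocol actually routes with.
ip = socket.gethostbyname("localhost")
print(ip)  # an IPv4 address, typically 127.0.0.1
```

Whoever answers this lookup, usually your ISP's name servers, learns which sites you are about to visit.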
3.3 Tracking with Cookies and JavaScript On the web, there are a few technologies that have made their way into tracking infrastructure, though they were not originally intended to be used that way. Cookies are one of these technologies. You've probably seen this term, because a lot of websites now show you an alert that they use cookies and ask you to agree. Cookies are little pieces of code or identifiers that a website places on your computer. There are lots of benign uses of cookies. For example, a website might use cookies to keep track of the fact that you're logged in, or to remember your username for a login screen. It might keep track of what products you viewed with a cookie. In fact, you can disable cookies in your browser, but most modern websites will not work without them. Based on their original use, a cookie could tell a website that you had visited it before. However, there are now more modern cookies that can track you across many websites and aggregate that information to follow your visits more broadly. JavaScript is another technology used in tracking. This is a programming language that's used for all modern interactive functionality on the web. But it's so powerful that it can be used to monitor everything you do, down to the individual keystrokes you type in forms, even if you don't submit the data. With those terms in mind, let's look at some specific tracking examples to see how these technologies work. You have probably had the experience where you've been looking at a product and then ads for that product show up on other websites, even when you know there's no partnership between the two sites. So how does that happen? This is something called ad retargeting. I like to call this the phantom toilet phenomenon. I first encountered this personally when I was remodelling my bathroom. I was on a home improvement website looking at a new tub, sink and toilet.
Then that exact same toilet showed up on my Facebook page, and on a cooking blog, and on another website. When I've talked to people about this phenomenon, some of them are very upset and have said they stopped shopping with companies whose products follow them around, because they're upset that those companies are tracking their browsing behaviour. However, that's not exactly what's happening. If Emily's Home Improvement uses ad retargeting, they partner with a web advertising company. The home improvement site puts a little code in their website, and the ad company uses that to track what products you've looked at. Then, other websites that partner with that ad company have a little code that tells the advertiser to find products that this person has looked at and show them. This uses a combination of JavaScript and cookies. Only the ad company tracks you across the web, not the individual sites. That said, it can be really upsetting that any company is tracking you. In this case, the cookies that you agree to when you engage with a website keep a note of products you viewed. The ad company then retrieves those cookies to decide what ads to show you on other websites. This is a simplified version of what's happening, but it gives you the general gist of how it works. 3.4 Tracking with Browser Information One thing you may not have noticed, though, is that these ads appear across devices. You may shop for that toilet on your home computer, but the retargeted ads may appear on your phone or tablet. How do companies know it's you when you're using different devices? They do it through browser fingerprinting. A browser is the application you use to access the web. Internet Explorer, Firefox, Google Chrome and Safari are all browsers. They know how to send requests for webpages, like when you type in a search or a web address, and how to display the web page's code in a nice way.
Browser fingerprinting is a technology that can uniquely identify your browser by its characteristics, just like a fingerprint can uniquely identify you by yours. That's your browser, the app on your device that lets you access the web, not your device itself. So how does this work? Imagine that there were only two people in the world, and one of us had an iPhone and one of us had an Android phone. If an advertiser wanted to tell us apart, they could just look at what type of phone we have. This is information that gets transmitted anytime you come to a web page. You've actually seen that in practice before, because some web pages format themselves differently depending on whether you're looking at them on your phone or on a computer. This is because information about the system you're using is sent along with your request for a web page. Now, if there were four people, and some had iPhones and some had Androids, the phone type would not be enough to uniquely identify someone. So the advertiser could look at other system information. For example, if two people have iPhones and two people have Androids, and one of each uses Google Chrome as their browser and the other uses a different browser, then the combination of what type of phone you have and what type of browser you use also identifies you uniquely. The Android Chrome user would look different than the Android Opera user, and the iPhone Safari user would look different than the iPhone Chrome user. But if we have eight people and some of them have the same combinations, again this won't work. What else could be used to uniquely identify people? In fact, hundreds of pieces of information about your system setup are available anytime you visit a webpage. That includes things like what version of the operating system you're running, what fonts you have installed on your system, the dimensions of the window you have open, what extensions are installed in your browser, and the list goes on.
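The multiplication at work in the paragraphs above is easy to see: each extra observable attribute multiplies the number of distinguishable configurations. The attribute counts below are illustrative guesses, not measured values:

```python
# How attribute combinations multiply (counts are hypothetical).
attribute_counts = {
    "phone_model": 50,
    "browser": 6,
    "os_version": 20,
    "window_dimensions": 30,
    "font_set": 1000,
}

combinations = 1
for count in attribute_counts.values():
    combinations *= count

print(combinations)  # 180000000 distinguishable configurations
```

With hundreds of real attributes rather than five, the space of configurations dwarfs the number of internet users, which is why most setups end up unique.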
If an advertiser collects all of that information for every person, that information essentially becomes a fingerprint. It's not necessarily unique for every person, because two people could have the exact same system configuration, but it's unique in the vast majority of cases. About 80% of people can be uniquely identified by the configuration of their system, with no personal or otherwise identifying information. This allows an advertiser to know that you're the same person visiting different websites. Even if you don't have any cookies or other stored information, they can see that you're the person with that browser fingerprint, and so they know what other webpages you visited. Essentially, they have a fingerprint for one of your fingers, and they can detect it in a bunch of places. But how is this used to track you across devices? If you have an identifiable configuration on your phone and your computer, how does an advertiser know to link those two profiles together? They can do this with account information. If you're logged into an account on a website and visit it on your desktop computer, the advertiser doesn't need to know any personal information about your account. They just need to know something like a username or user ID number that was logged in on that computer. That's often easy to identify. Then if you log into the same account on your phone, the advertiser can say: this account was logged in from a desktop computer with this fingerprint, and it was also logged in on a mobile device with this other fingerprint, so the identifier tells us it's the same person who owns these two fingerprints. Essentially, they now have prints for two fingers instead of just one. They store all this information in a database. So when you come to a website, they grab the information provided about your system configuration and reference that against their database. This lets them know exactly who's visiting.
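A minimal sketch of how those pieces of configuration information can be boiled down to a single comparable fingerprint. The attribute names and values are illustrative, and real fingerprinting code draws on far more signals:

```python
import hashlib

def fingerprint(attrs):
    # Sort keys so the same configuration always yields the same hash.
    canonical = "|".join(f"{k}={v}" for k, v in sorted(attrs.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()

# Hypothetical configuration transmitted with a page request.
phone = {
    "user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X)",
    "screen": "390x844",
    "timezone": "America/New_York",
    "fonts": "Arial,Helvetica,Times",
}

print(fingerprint(phone)[:16])
```

Seeing the same hash turn up on two different sites is what lets an ad network link the visits, with no cookie involved.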
You can't block this system configuration information from being transmitted, so this is a technique that can be used on you regardless of what privacy settings or other privacy techniques you might implement. It's a very powerful way that your behaviour can be tracked across the web, especially when these advertisers are large. Since lots of commercial websites, even personal blogs, have advertising, large advertisers may have information embedded on the majority of web pages that you visit. That means that not only do they know that you've come to a particular webpage at a particular time, they may actually be able to track the series of websites that you visit, because every one of those pages has some of their code embedded in it. Even if they miss a few of those websites, because their advertising is not used there, they get a pretty thorough picture of your behaviour on the web. If you want to know if your browser fingerprint is unique, there's a tool that can help you with that. The Electronic Frontier Foundation runs a tool that will analyse your configuration information and let you know if it's likely to identify you. And how prevalent are these kinds of trackers? Let's take a look through an extension that blocks them. This is an extension called Ghostery that can be used to block trackers. We'll talk a little bit later in this chapter about how you might use this yourself. One of the interesting things it does is show you exactly which trackers are installed on the web pages that you visit. If we visit a major news website, Ghostery blocks 26 different trackers and shows how many were blocked. And if we look a little further, it shows a list of where each is from and what category it belongs in. So that's tracking with browser information and cookies. 3.5 Tracking by Your ISP Beyond all the tracking we've just discussed, you can also be tracked by your internet service provider (ISP).
They know all the websites you're going to because they provide the internet connection to your house. Thus, any request that you make has to go through them. They see where that request is going and they can keep a log of it. Until fairly recently, there was not much they could do with that data; they were not supposed to use it to target ads to you. In the United States, the Obama administration introduced rules in 2015 to make this illegal. However, one of the first pieces of legislation passed under the Trump administration allowed this kind of tracking: it allowed ISPs to use this data to target you with ads. These regulations direct what internet service providers can collect and what they can share. If your ISP is able to see what you do online with their service, they may decide to use this only internally to target you with advertising, or they could sell that information to other people. The law I just mentioned allows them to use it with outside advertisers. But internally, internet service providers have been using that information for even longer. Lots of people get their internet service through their cable providers, and for a long time, cable providers have been telling advertisers that they can quite specifically target people based on their interests. This is not just with web advertising; this is with the actual commercials you see on your television. By combining some of your web activity with your viewing activity, your internet service provider can create a profile of your interests. Then they can use this to show you different television ads than they show your neighbour, who may be watching the same programme. It's like the kind of personalised advertising we've grown used to on the web, but many people don't know that it's also happening with the content they see on TV.
If you're watching TV at night, your cable company may show you different commercials than your neighbours based on the demographic profile they have of you. If you're a retiree, you might see a different commercial than the family with kids next door, even if you're watching the same show. They can combine data from your account with third-party data about you and your interests obtained from companies that collect this kind of data about people. Now they can also legally analyse your web traffic to enhance those profiles. Beyond your cable company using your internet data to show you ads, they now have permission to use this information with advertisers online. Exactly how this will be used is still playing out, but certainly it opens up a whole new source of personal data for advertisers to access and use. So that's a rough guide to how tracking works across websites, across devices, and even across technologies. Tracking is a complicated and evolving phenomenon that can surprise us in its reach and its power. But it's not inescapable, and that's good news for those who are uncomfortable with it. 3.6 What Can You Do? What if you want to block this kind of monitoring? You have several options. The first is to install browser extensions that block much of this kind of cookie and tracking activity. There are many options available, Ghostery and other tracker blockers among them. You can install these in your browser with one click. Then, when you go to a page, they block all the trackers that they know about. Many will also create a report for you so you know what was blocked. If there's a website that needs these kinds of trackers, and that you trust, you can create an exception so that site can use them. Not only does this protect you from a lot of tracking and improve your privacy, it can also increase the speed at which you browse pages online, because it prevents all kinds of code from running in your browser.
And it stops a lot of places from putting code onto the webpage, so there's less data to load and less processing happening in the background that's not necessary for your web experience. If you do a quick search in your web browser's extension library (just search for Firefox, Chrome, or whatever your preferred browser is, plus "extensions"), you'll find lots of extensions to block tracking. One thing to keep in mind, though, is that some pages rely heavily on these kinds of trackers and simply will not function if they're blocked. To deal with that, you can either add exceptions in the blockers to allow the trackers to be used on those pages, or install a second browser and use that one on the rare occasion that you need to visit a webpage that requires the trackers. This is my personal strategy: I have one browser that I basically only use to order things, because the ecommerce website will not work with all the tracker blocking that I have installed. What about your internet service provider tracking you? There are two main ways that your ISP can see what you're doing online. The first is that they can see the actual webpages you go to, because they bring those web pages into your home. The second is that they know what pages you want to visit, because they turn the domain name (the thing that ends in .com or .net) into the IP address, the number that the internet actually uses to send you to a web page. As we discussed earlier, looking up a domain on a domain name server to get an IP address is a necessary step to get information online. Usually, your ISP runs its own domain name servers. This means they can log all the lookups done for you, and then they know what pages you're visiting. Those are two separate steps, and both are ways that you can be tracked. If you want to stop your internet service provider from tracking you, you need to stop them from seeing information in both steps.
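To make that lookup step concrete, here is a minimal sketch, in Python, of what an ISP-run domain name server is in a position to record. The class, domain names and addresses are invented for illustration; a real resolver queries the global DNS hierarchy rather than a hard-coded table.

```python
from datetime import datetime, timezone

class LoggingResolver:
    """Toy stand-in for a domain name server run by an ISP."""

    def __init__(self):
        # A real resolver looks these up recursively; we fake it.
        self.table = {
            "example.com": "93.184.216.34",
            "news.example.net": "203.0.113.7",
        }
        self.log = []  # every lookup gets recorded here

    def resolve(self, domain: str) -> str:
        # The resolver must see the domain to answer the query --
        # which is exactly what makes the lookup trackable.
        self.log.append((datetime.now(timezone.utc), domain))
        return self.table.get(domain, "0.0.0.0")

isp_dns = LoggingResolver()
isp_dns.resolve("news.example.net")
isp_dns.resolve("example.com")

# The ISP now holds a timestamped history of the sites you visited.
for when, domain in isp_dns.log:
    print(when.isoformat(), domain)
```

Note that pointing your computer at an independent resolver doesn't eliminate this kind of log; it just moves it to a party you presumably trust more than your ISP.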
To stop them from seeing your domain name lookups, you can have someone else do that for you. In the network settings on your computer, you have the option to specify the IP address of the domain name server you'll use. There are lots of free, open domain name servers out there provided by independent entities; you can easily find them with a web search. When you put those in as the domain name servers to use, your computer sends the domain names to those servers instead of to your internet service provider to get the IP address. That blocks your internet service provider from seeing where you're going based on that lookup. The next step is to stop your internet service provider from bringing the actual web pages to your device. That may seem impossible, since they're providing your internet service and you need to get those pages to your device, but there's a clever way around this. You may remember me mentioning VPNs in the first chapter. VPN stands for virtual private network. The way this works is that it essentially hides your web traffic from anyone who might be looking. Most often, it's used to prevent hackers from seeing where you're going on the web and to hide your content from them. But it can also be used to stop your internet service provider from seeing the pages that you're browsing. So say you want to visit Google: you type in google.com. Using the domain name server you specified, this gets turned into an IP address. Next, your computer sends a request to go to that IP address. Without a VPN, your internet service provider will send that request and return the page to your computer. That allows them to see the content and know the page that you visited. With a VPN, instead of your internet service provider fetching the page, your VPN host does it. Your computer establishes a secure connection with your VPN server, and your requests always go from your computer to the VPN service provider.
They're encrypted, as is the information that comes back, so your home internet service provider is unable to see any information about what page you visited or the content that was on that page. It's essentially like a tunnel that goes from your house to someone else's house: anything that goes out to the web goes through that tunnel, and then out through your friend's house. The encryption that happens here prevents anyone except your VPN service provider from knowing what pages you visited. Your home ISP just connects you to the tunnel, and they can't see anything else. A VPN is a great way to increase your overall security online and to protect the privacy of your data, and the resources for this chapter will provide you with some reviews of VPN companies. There are free VPN service providers, but those are not a good choice for privacy, because they can see all the pages you're visiting too, and they make money by selling information about you. Paid VPNs are very affordable, just a few dollars a month, and give you a lot of security and additional privacy. By using an alternative domain name server and a VPN, your browsing activity will be totally hidden from your internet service provider, and you get the added benefit of keeping things much more secure. Right now we're essentially at a high point for surveillance capitalism. Lots of companies base their business on monitoring your every move and monetising it. The privacy universe this is creating is troubling, but fortunately, there's something you can do about it. Blocking trackers, using VPNs and hiding your web activity are all pretty easy after initial setup, and they let you maintain some privacy in a world that's working hard to know everything you're doing. Chapter 4. Data collection methods 4.1 Introduction When we talk about being careful with our personal data, especially online, we tend to think about what we explicitly choose to share.
For example: don't post your address, phone number or other sensitive personal information. Think twice about what photos you share. Don't tell all your intimate personal details to people on Facebook. While this is all good advice, data is collected about us in a lot of other ways. Some of these we know about: our phone companies keep track of who we call, when, and for how long. They know who we text on our mobile devices. Our credit card companies keep track of where we shop and what we spend. If we use loyalty cards or numbers, stores know what we've bought. There's a reasonable discussion to be had about how much of that information should be collected and stored, but in this chapter, we're going to focus on the vast amounts of data being collected that you might not know about. 4.2 How Much Information Is Collected and How? Websites, your devices, and the apps you have installed are regularly collecting huge amounts of information about you, including sensitive personal information that they probably don't need to operate. Let's start by taking a look at how, and how much of, this information is collected, and then talk about some of the ways you can prevent its collection. Let's start with an older but quite simple example. Facebook wants to know details of how people use their platform. Their goal is to keep people engaged with Facebook: they want us to use it as much as possible, and to know what kinds of activities keep us engaged. For example, if you're commenting on your friends' posts, that's good from Facebook's perspective, because it keeps you on Facebook and encourages your friends to engage. And of course, when we comment, Facebook knows that we've done it, so it encourages our friends to respond. But what if you start to post something, and then reconsider before sending it? We're not talking about posting and deleting; we're talking about just typing text in a box on Facebook and never posting it at all.
Facebook wants to know when this happens, so they have code that collects information about when you do this. We don't know if they're collecting the text you type, or just data like how much you typed, but the technology exists that would let them grab your comment, store it and analyse it, even if you never actively post it. When this was first reported, Facebook claimed they were not collecting what people typed in the box, just how much they typed there. But if their policies changed, or if other services used this technology, they could store that information and use it in a variety of ways. The technology to collect text like this is very straightforward to use. Thus, it's safe to assume that a lot of websites, whether they're social media, ecommerce or communications based, are harvesting information that you type on their platforms, even if you choose not to send it. Your phone is also a goldmine for companies who want to collect information about you. Geoffrey Fowler at the Washington Post wondered about his iPhone: just how much data was it sending out that he didn't know about? He worked with a company called Disconnect to monitor the data that was sent from his phone, and he was shocked by the results. In one week, 5,400 hidden apps and trackers received data from his phone. Here's a quick quote from his story: "On a recent Monday night, a dozen marketing companies, research firms and other personal data guzzlers got reports from my iPhone. At 11:43 p.m., a company called Amplitude learned my phone number, email and exact location. At 3:58 a.m., another called Appboy got a digital fingerprint of my phone. At 6:25 a.m., a tracker called Demdex received a way to identify my phone and sent back a list of other trackers to pair up with. And all night long, there was some startling behaviour by a household name: Yelp. It was receiving a message that included my IP address once every five minutes."
Apps, trackers, and your phone itself can track your location, your patterns of movement, who you call and how long you talk, who you text with and how often, what other apps you have installed, your phone number, your contacts, and sometimes your photos. And in most cases, we don't know who those companies are or what they're doing with our data. Some of them are tracking us in especially surprising ways. Have you had this experience? You're out to dinner, your phone is on the table, untouched. In the course of the conversation, you say something like, "you know, next year for spring break, maybe we'll go to Costa Rica." You don't search for it, don't note it down, don't touch your phone at all. But the next day, you start seeing ads for Costa Rican tourism all across the web. This happens because some apps turn on the microphone and passively listen in on what's happening in the background. They can use this to pull out keywords you say, or to identify TV shows, songs or commercials you're hearing. All of this feeds into a profile about you, your interests and your activities. One widely publicised scandal involving this behaviour happened during the Women's World Cup. La Liga, the national professional football league in Spain, used their app to spy on users: they turned on the microphone to listen in and hear if the user was in a bar or other establishment that had the game on TV. Then, using location services, they could identify exactly which establishment that was. If the bar didn't have a licence to show the game, La Liga could come after them. Essentially, users' phones were hijacked to catch bars showing the game without having paid for the rights. They were able to do this because users allowed microphone and location access to the app when they installed it, but the app did not say what it planned to do with that access. Even without access to the microphone, though, it's possible to listen in on you.
Phones have accelerometers, which can tell how fast you're moving in three dimensions. This is used for things like the compass, fitness apps that count steps, and games you control by tilting your phone. There's no special privacy permission that controls access to the accelerometer. Researchers have shown that the sound from talking causes vibrations that the accelerometer picks up, and an app could analyse those vibrations and convert them back into speech. Even when we do know our devices are listening, they may capture more than we expect. Consider the case of Timothy Verrill, who was accused of murdering Christine Sullivan and Jenna Marie Pellegrini in January 2017. Verrill allegedly believed that Pellegrini was a drug informant. On the night of the murders, he broke into Sullivan's home and brutally beat and stabbed the two women to death. There was an Amazon Echo personal assistant device in the home, and prosecutors took the device, believing there may be recordings of the actual murder on it. A judge in the case ruled that Amazon had to turn over any recordings they had. That's not the only instance where an Echo has been involved in a murder trial. In 2015, Victor Collins was found floating facedown in the pool of his friend James Bates. Bates was initially arrested for murder, but the charges were later dropped because the evidence did not support them. However, in the midst of the investigation, Amazon turned over recordings after Bates said he would voluntarily supply them. These types of personal assistants work by listening for a wake word like "Hey Siri", "Hey Google", or "Alexa". They record what follows and upload that audio to the host, such as Apple, Amazon or Google, where it's processed and analysed, and a response is generated and sent back. If someone were to activate an Echo while something criminal was happening, Amazon may well have a recording of it.
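The wake-word flow just described can be sketched in Python. This toy version operates on already transcribed words rather than audio, and the wake-word list and five-word capture window are arbitrary choices for illustration (it also ignores multi-word wake phrases like "Hey Siri").

```python
WAKE_WORDS = {"alexa", "echo", "computer"}

def capture_after_wake_word(transcript, window=5):
    """Scan a stream of words; whenever a wake word appears,
    capture the words that follow it -- a stand-in for the
    audio clip a smart speaker would upload for processing."""
    captured = []
    for i, word in enumerate(transcript):
        if word.lower() in WAKE_WORDS:
            captured.append(transcript[i + 1 : i + 1 + window])
    return captured

speech = "so anyway Alexa what is the weather tomorrow thanks".split()
print(capture_after_wake_word(speech))
# [['what', 'is', 'the', 'weather', 'tomorrow']]
```

The privacy problem in the stories above is everything outside this happy path: false triggers, recordings retained indefinitely on the vendor's servers, and recordings handed over to the wrong people.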
Now, perhaps you're not criminally inclined and therefore are not particularly worried about your personal assistant device incriminating you at trial. But the fact that these devices can and do make recordings in your home without your knowledge should set off alarm bells. Just how much does Amazon collect from devices like this? A lot. 4.3 Predictive algorithms In late 2018, a German Amazon user requested an archive of all the data the company held about him, a right he has under the European General Data Protection Regulation (GDPR). Amazon sent him 1,700 recordings from someone else, including some recordings the person had made while he was in the shower. The recipient contacted Amazon but never heard back, so eventually he went to the magazine publisher Heise with what he had received. The journalists found that by listening to recordings that asked about the weather, that mentioned people's first and last names, and that included friends' information, they were easily able to identify the voice on the recordings, and even the man's girlfriend. Amazon says releasing this data to the wrong person was a mistake. But the fact that they save thousands of recordings from people is an interesting fact on its own. These recordings contain deeply personal data that, when aggregated, can reveal a tremendous amount about a user. Why collect all that data? Because it can be used to profile you. The realm of things that can be understood from simple data is vast and growing. In the first chapter, we talked about how liking the page for curly fries was a strong indicator of high intelligence, and how analysing likes could reveal your race, religion, gender, sexual orientation, drinking and drug habits, intelligence, and much more. But that technology is relatively dated; the range of things we can discover about people from their personal data has expanded. Researchers were able to find out what your political leanings were based on who you follow on Twitter. That might not be surprising.
Conservatives are more likely to follow other conservatives, and liberals are likely to follow other liberals. However, the researchers were able to detect this even looking at neutral data, like which national parks you follow. That is surprising, but not as surprising as this: more recent work has been able to predict things that will be true about you in the future, before you even know them yet. In the first chapter, I described a study from Cornell that set out to identify someone's spouse or significant other on Facebook by looking only at which of their friends knew one another. As the researchers analysed their data, they accidentally discovered that they could predict whether someone's current relationship was likely to last or fall apart in the near term. A lot of us have been looking at building algorithms to predict future behaviour and attributes. One interesting piece of work investigated whether an algorithm could identify women who were at risk for developing postpartum depression by analysing their social media feeds. The research followed women on Twitter over the course of their pregnancies. The researchers collected data on their interactions, frequency of posts and the language they used. They then followed up to see which women developed postpartum depression and which didn't, and used this to train an algorithm to identify women at risk. All the women changed the way they used Twitter during their pregnancies, but the women who developed postpartum depression changed in opposite ways from those who didn't. For example, those who did become depressed increased the number of questions they asked over the course of their pregnancies, while it decreased for women who did not become depressed. The research didn't look at what those questions were. Maybe they were pregnancy related, but they could have been questions about anything: TV shows, sports, or just life. The use of verbs, adverbs and pronouns also increased in one group, but decreased in the other.
Overall, the algorithm used these clues very effectively. On the day a woman gives birth, it can predict with high accuracy whether she'll develop postpartum depression. And if the data extends a few days after giving birth, it's almost 85% accurate. On one hand, this is incredibly promising work. Postpartum depression is an insidious condition; women often don't report it because they believe they're expected to be happy and joyful having just given birth. A tool that can accurately predict it would allow a woman's doctor to push a button when she comes in to deliver her baby and know if she should be monitored more closely. However, this could also be misused: someone's boss or insurer could run it on them and deny them coverage or opportunities as a result. It highlights a lot of the concerns that come with this technology, especially when it operates on public data. It can really help people and the organisations they work with. At the same time, it can be incredibly intrusive and used in unfair ways. We don't yet know how to strike the balance. I mentioned earlier a study from the University of Maryland that looked at alcoholism recovery; let's talk about that one a little bit more. Researchers from the University of Maryland went onto Twitter and found everyone who had announced that they were going to their first Alcoholics Anonymous meeting. Now, of course, this takes the anonymous part out of it; the researchers aren't sure how that impacted their results. But they made sure to filter out jokes and people who were going to support someone else, and they were left with a dataset of hundreds of people who clearly were drinking too much and felt like they needed to get it under control. They then followed what those people tweeted after that to determine if they were sober 90 days later, which is a good indicator of early addiction recovery.
They made sure people said explicitly what their status was. It could be that six months later they were celebrating six months of sobriety, and then the researchers knew they had also been sober at 90 days. Or it could be that a week later they were complaining about being hungover at work, and so the researchers knew they hadn't made it 90 days without drinking again. So they had this explicit data for hundreds of people. They then gathered everything those people had done or posted on Twitter up until announcing they went to that first Alcoholics Anonymous meeting. And with that data, they built a model that would predict, based on everything a person did up until announcing they were going to AA, whether or not they would be sober 90 days later. So essentially, once you decide to go, you can push a button, analyse your tweets, and know if the programme will work. The model works exceptionally well; it's right about 85% of the time. So is this good or bad? A lot of artificial intelligence technology works in a black-box manner: you pour the data in, you get a good answer out, but you don't get any insight into why it's true. The researchers wanted to build this algorithm to give insight, so they looked at things that addiction researchers might consider, like whether you have a social circle full of people who drink a lot, or whether you have poor mechanisms for coping with stress, which is common among people with addictions. And they modelled those characteristics on Twitter for their algorithm. So if the algorithm says it looks like you won't make it 90 days, they could tell you that you might need to change up your social circle, because your friends talk about drinking a lot, or that you might want to get some cognitive behavioural therapy to help you deal with stress in a more productive way. So in that sense, the research is good and could help people.
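As a toy sketch of this kind of feature-based prediction: the features below echo the ones the studies mention (questions asked, drinking-related language), but the word lists, weights and threshold are all made up for illustration; the real systems learn them from training data.

```python
def extract_features(tweets):
    """Hand-built linguistic features of the kind such studies use."""
    words = [w.lower().strip(".,!?") for t in tweets for w in t.split()]
    return {
        # average number of question marks per tweet
        "questions_per_tweet": sum(t.count("?") for t in tweets) / max(len(tweets), 1),
        # count of words from a (tiny, invented) drinking lexicon
        "drinking_mentions": sum(w in {"drink", "drinking", "bar", "hungover"} for w in words),
    }

def predict_at_risk(features):
    """Toy linear model with invented weights and threshold."""
    score = 2.0 * features["questions_per_tweet"] + 0.5 * features["drinking_mentions"]
    return score > 1.0

tweets = ["who else is hungover at work today?", "heading to the bar after this"]
print(predict_at_risk(extract_features(tweets)))  # True
```

The appeal of this style over a black box is exactly what the paragraph above describes: each feature corresponds to something an addiction counsellor could act on.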
On the other hand, there are lots of ways these algorithms could be misused, whether it's employers deciding to fire you from your job, or the justice system using the results to determine whether you go to jail for a DUI. For all of these kinds of algorithms, it's worth considering what's the worst thing a person could do with them, because we've seen lots of data scandals, such as Cambridge Analytica, arise out of the misuse of these algorithms. We'll also talk about those specifically later on. The point of these stories, though, is that there's a lot of information that can be uncovered about you from background data that you may be unaware is being collected. On top of that, you're even less likely to be aware of the algorithms being applied to that data, and the power they may have to understand things about you. And basically, you can't hide from them. 4.4 Shadow Profiling Let's use Facebook as an example here. Whenever I talk to groups about this, I ask people to raise their hand if they don't have a Facebook account. In a large audience, there may be five or ten people who don't have Facebook. But the correct answer is really that everyone has a Facebook profile. Some people have made it themselves, and for other people, Facebook has made it for them. If you have not created a Facebook profile, Facebook still knows a lot about you. For many people who are not on the site, Facebook creates something they call a shadow profile. Basically, it's very easy to know when a person is missing from a social network; there's a hole where a person should be, and Facebook can easily figure out who the person that the hole represents is. So if you don't have a Facebook account, and you've never shared any information with Facebook, how did they build a profile about you? And what could they come up with?
I want to mention here that Facebook admits they have these shadow profiles, but we don't know a lot about how much data is in each one, or how they compute it. So what I'm going to tell you about is straightforward technology that could be used to build one of these shadow profiles. We don't know if it's exactly how Facebook does it; this is an educated guess. If I were running a social network like Facebook, and I wanted to build profiles of people who hadn't signed up, this is how I would do it. Also, we're focusing on Facebook, but most other big social media platforms can, and may, do this too. The most obvious information to use is other people's contact lists. If you have a friend who has a social media account, and they use the app, they have likely given it access to their contact list. Many platforms ask for this because it allows them to pair you up with other people who are using the app. It does that by downloading a list of your contacts along with their data: their phone numbers, their email addresses, their street addresses, maybe photos. If the platform has another user in the system with the same email address or phone number, it can suggest that you become friends with that person. Essentially, a phone number or an email address is a unique identifier. If you're not on Facebook, but you are in someone's contact list, then when Facebook downloads your friend's contact list, they now know that you exist, what your name is, what your phone number is, what your email address is, and maybe things like your street address or your website or what you look like. Getting that from one person is useful. But the vast majority of Americans who have internet access also have a Facebook account. So if you're not on Facebook, most of your friends still are, and you're likely in many of their contact lists. So when they give permission for Facebook to access their contacts, Facebook retrieves your information from a lot of different people.
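A minimal sketch of the contact-list matching just described, in Python. Every name, number, group and data structure here is invented for illustration; as noted above, we don't know Facebook's actual implementation, only that a phone number or email address works as the join key.

```python
from collections import defaultdict, Counter

def build_shadow_profiles(uploaded_contact_lists):
    """Merge contact lists uploaded by users into profiles of
    people who never signed up, keyed by phone number."""
    shadow = defaultdict(lambda: {"names": set(), "known_by": set()})
    for uploader, contacts in uploaded_contact_lists.items():
        for contact in contacts:
            entry = shadow[contact["phone"]]  # phone number as unique identifier
            entry["names"].add(contact["name"])
            entry["known_by"].add(uploader)
    return dict(shadow)

# Three users upload their contacts; "Sam" has no account.
uploads = {
    "alice": [{"name": "Sam Doe", "phone": "+1-555-0100"}],
    "bob":   [{"name": "Sam",     "phone": "+1-555-0100"}],
    "carol": [{"name": "Sam Doe", "phone": "+1-555-0100"}],
}
sam = build_shadow_profiles(uploads)["+1-555-0100"]
print(sorted(sam["known_by"]))  # ['alice', 'bob', 'carol']

# Inference step: guess Sam's interests from the groups his
# contacts belong to (memberships invented as well).
groups = {"alice": {"Maple St Neighbours"},
          "bob": {"Maple St Neighbours"},
          "carol": {"FC Fans"}}
votes = Counter(g for user in sam["known_by"] for g in groups[user])
print(votes.most_common(1))  # [('Maple St Neighbours', 2)]
```

The second print is the kind of majority-vote inference that makes these profiles powerful: the more of your contacts share an attribute, the more confidently it can be pinned on you.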
So now Facebook doesn't just know that you exist; they also know who a bunch of your friends are. And since these friends have profiles, some of them are likely friends with each other, while others are from different social circles. This can reveal your interests. For example, several people may be in your neighbourhood's Facebook group. If you have three or four people who you know in real life, who have you in their contact list, who are on Facebook, and who are also part of the neighbourhood group, Facebook may infer that you're likely to live in that same neighbourhood, especially if they have your street address, which can show that you live nearby. Similarly, if you go to a church or temple or mosque, and you have friends on Facebook who have you in their contact list, who go to that same religious institution, and who are friends with each other, Facebook may be able to infer your religion from that. The same thing applies if a lot of your friends share an interest in a sports team, or list the same employer. From this information that your friends have provided, Facebook knows your name, your location, your contact information, a bunch of your friends, and many of your interests. This information can also reveal other traits. For example, there's research showing that, especially for men, it's quite easy to determine their sexual orientation based solely on information about their friends. So even if you keep your sexual orientation private online, if you have friends who do not, Facebook may be able to tell your sexual orientation just from information that other people provide. As you can imagine, these details about people who have opted out of the system can be used in a variety of ways, with both good and potentially dangerous consequences. 4.5 What can I do? The first thing to do is to check how much of your data is being shared. There are apps that will allow you to do this.
The Privacy Pro app, created by the Disconnect team mentioned earlier that helped with the Washington Post investigation, has a free option that will let you monitor trackers and information being shared in the background on your phone. It will also block them, which speeds up the performance of your phone and protects data that would otherwise be shared. On social media, delete old posts; when less information is available about you, algorithms can discover less about you as well. Check your privacy preferences: on your phone, turn off apps' permission to access your location, contacts and other information unless it's really critical. To stop background data collection, select the options that prevent apps from running in the background, and if they do not need to contact the internet, as is the case with a lot of games, turn off their right to use cellular data. For example, I have a couple of games that I play on my phone. I'm not playing them with anyone else, so I don't need internet connectivity to play them. The games work fine, for example, when I have the phone in aeroplane mode and I'm not connected to the internet. Thus, there's no real reason those apps should be using any data at all. Occasionally, a game may need to be updated, but that's a standard app update that you do from your phone's app store, not something the app would do on its own. If you block apps from using data, it prevents them from being able to send any information about you out to the world. It may also stop them from downloading and showing ads to you. Now, this can disable some features. For example, the game I like to play most has an option where you can get a bonus feature in the app by watching a 30-second video. Downloading that video requires internet access, and so if I block the app from using data, it's unable to get those video ads, and thus I can't use the feature to get the bonuses for watching the video. I'm fine with this trade-off.
I would rather skip bonus points in a game and protect my data than give away a lot of information for a few in-game trinkets. But ultimately, you need to decide which option feels best for you. For each individual app, you can go into the controls for using data and turn it on or off. You'll need to leave this on for apps that require the internet, like social media, maps, web browsers and other apps that use online information. Technical steps alone cannot stop the mass collection of data about us, but we can make it a lot harder. To get true protection, though, we will need better legal protections. Chapter 5. Nowhere to Hide? 5.1 Health-Related Data collection We have talked a lot about surveillance through digital channels that we probably expect. We know that social media companies are recording everything we post and sharing it. We also know that companies we buy things from probably keep track of the patterns in what we buy to try to offer us other products. But surveillance is out in the world in a lot of ways that we may not suspect. We will talk in other chapters about how our devices collect data about us. But what about our interactions when we are off our devices and out in the world? Just how pervasive is that sort of surveillance? One story that highlights how difficult it is to hide from this kind of mass surveillance was shared in 2014 on salon.com. Sarah Gray wrote an article about one woman, Janet Vertesi, who was an associate professor of sociology at Princeton at the time. She was pregnant and wanted to keep that private. She decided that for the full nine months, she would not share any information about her pregnancy in any digital form. Obviously, that means she was not posting about it on social media, but she was also avoiding any other real-world digital interactions that would reveal she was pregnant. That meant she was only calling people to tell them about her pregnancy.
She asked her family members and friends not to post about it on Facebook. One article about her efforts reported that she had an uncle who sent her a congratulatory message on Facebook, and she unfriended him in response. She wanted to do some research about her baby and baby products on the internet, but she didn't want that to be tied back to her. As we will see, browser fingerprinting and other technologies can be used to uniquely identify you when you're on the web. Your internet service provider can also track what websites you're visiting and use that to advertise to you. To prevent that kind of tracking, she used the Tor Browser. We'll discuss that more in our chapter on the dark web; traditionally, it's associated with people who are investigating or carrying out nefarious deeds online. In this case, it was just used to keep news about an impending baby from being digitally tracked. Her efforts also meant that she would not register for baby gifts at stores that had online registries. When she was buying things for her baby, she wouldn't use a credit card. We know retailers keep track of what we buy and analyse it to offer us new products. Even brick-and-mortar stores track our purchases through our credit card use, and any of these companies may be selling information about our purchase histories in ways that we don't know about. If you buy items with a credit card, those purchases are easily linked to your identity and can be tied into large marketing databases. This means one single purchase of a baby item could affect your profile, so you're marked as pregnant and start receiving marketing material about parenting. Avoiding that means paying in cash in offline stores and getting creative to shop online. She created a new Amazon account with a new email address that would have packages delivered to a locker, not to her home address. This made it very difficult to associate that Amazon account with her specifically.
However, as we mentioned, she was not using credit cards to buy anything, so how do you shop online like that? Her solution was to use cash to buy prepaid Amazon gift cards, which she would then load into her profile. One really interesting story from her efforts highlights the problems that can arise here. She and her husband wanted to buy a stroller on Amazon. The stroller was expensive, and so they needed a lot of gift cards. Her husband took $500 in cash to a local pharmacy where the gift cards were sold and tried to buy enough to cover the price of the stroller. When he went to check out, the pharmacy told him that they had to report the transaction, because excessive cash spending on gift cards is suspicious. That's because this is how terrorists do a lot of their business: for exactly the same reason, they don't want to be tracked and analysed in their digital activity, so they don't use credit cards. Still, they may want to do a lot of things online, and so they use cash to buy prepaid gift cards. Essentially, if you don't want to be tracked, you look like a terrorist. With normal behaviour, we're tracked, constantly monitored and marketed to, and if one were to opt out of that type of pervasive tracking, it looks suspicious and possibly even illegal. Janet's story serves as a cautionary tale of just how difficult it can be to keep very personal information about yourself to yourself. Companies are also trying to collect more information about us with a veneer of consent, even though we may not know exactly what's going on behind the scenes. Health insurance companies, for example, offer some people discounts or gift cards if they link their fitness tracker with their insurance account. Of course, if you take a lot of steps, it makes sense that you might get rewarded for that. Auto insurance companies are taking similar steps by giving people tracking devices that can monitor their speed and driving habits.
In exchange for the discount, people may give away their privacy in that domain. But what about when we don't know? The Houston Chronicle shared a story in 2018 about a man who had sleep apnea and used a CPAP machine to help him breathe at night. These machines need replacement parts, like filters and hoses, that insurance will pay for. When the man got a new machine, he registered it and opted out of receiving communication. However, after the first night, he woke up to an email congratulating him on his use the night before. Later, he talked to someone at the company, who mentioned that the device was working well at keeping his airway open. She knew that because she had a report of his usage. This was something his old machine did too, but that data was recorded on a removable card that he would bring to his doctor's office. This machine, without his knowledge, was transmitting data about his usage. Not only that, it was sending it much more widely. It wasn't just going to his doctor; it was going to the company who made the machine and, to his shock, to his insurance company. And insurers use this data to deny coverage to patients who aren't using the machine enough. Even with strong federal protections for health-related data, this type of monitoring seems to be legal when patients agree to the terms that come with their devices.

5.2 Facial Recognition Technology

Outside the home, facial recognition technology is another space where real-world surveillance is becoming more sophisticated. We all know that surveillance cameras are everywhere when we are moving around in public. Private businesses have them, some municipalities have them, and devices like ATMs also have built-in cameras. As a result, our movements can often be tracked. But privacy is preserved in a way, because so many people walk past these cameras. The images from the cameras are owned and controlled by lots of different people.
As a result, it's difficult to aggregate all this together to follow a single person's movement. But that may change in the future as technology and integration develop. You only need to look to examples of police trying to track the movements of a victim or a suspect on cameras to see just how difficult this can be. It requires going into businesses and asking for copies of their video footage, which sometimes isn't working or is of poor quality or blurry. It requires watching hours of footage to try to identify exactly the right person and the time that they walked past, and to reconcile that with what other cameras show. This, of course, makes things difficult for police, but for the average person who's just moving about, it also means it's very difficult for any large organisation to keep track of our movements. China is a counter-example to this: there's massive state surveillance that can indeed be used to track the movements of people on a large scale. Part of the way China is able to do that is with facial recognition technology. Facial recognition algorithms can identify an individual person by analysing the pattern of their facial features. It's a technology that many large corporations are working on. Facebook has a good facial recognition algorithm. You may have noticed this working if you upload a picture and it automatically identifies the people who are in that photograph. However, not everyone has access to such a huge database of people's photos, and so there are only a handful of companies with large and accurate facial recognition systems. Some of these companies, like Amazon, are selling that technology to third parties. There's been a lot of controversy around this. First, the technology is not extremely accurate: as we'll see, it works much better for white men than it does for women and people of colour. That means that when errors are made, they're more likely to be made for those groups.
This was highlighted in August of 2018, when the American Civil Liberties Union did an experiment using Amazon's facial recognition software. They compared 120 California lawmakers' images to a database of 25,000 mug shots. The algorithm incorrectly identified 28 state legislators as criminals, even though none of them had ever been in jail and they were not the people matched in the mug shots. That's a pretty high error rate for an algorithm that's being deployed and used by police forces or other organisations. The way that this technology might be used makes that even more troubling. For example, there was a plan, since rolled back, to link facial recognition technology and criminal databases with video doorbells. When someone came to your door, the video doorbell would pick them up, run their face against the set of databases, and identify if someone with a criminal record was at your door. However, we know that there's a lot of inaccuracy in these algorithms, and they tend to be more inaccurate and make more mistakes on people with darker skin. This means it's likely to reinforce existing social biases. Furthermore, in neighbourhoods where there are higher densities of people who have been in jail, it means that people's criminal records will be constantly at the forefront of everyone's mind. Friends and family members will be reminded that the people they spend time with have been in jail. There are real social implications to doing this kind of thing, even if the algorithms are right all the time. There's a lot of debate over the right way to use these algorithms. The inaccuracy and the potential for them to create a variety of social problems have led to bans on the use of facial recognition technology by government departments, including police agencies, in some cities. However, we're in the early days of this technology, and it's possible that going forward facial recognition may become more integrated into applications.
It will require close monitoring if it's to be used in a fair way. In fact, even if the accuracy problems are solved, it's hard to say if there even is a fair way for this to be used; it would constitute a dramatic escalation in the way people are monitored through their everyday movements. This is an area where I personally have a lot of concern. I think this technology should drive the development of privacy legislation globally, as it's one of the greatest threats to personal and civil liberties that we face.

5.3 Tattoo Recognition

Beyond facial recognition, technology exists to individually monitor people and their associations in other ways. Consider tattoo recognition. Facial recognition looks at the biometrics of your face to uniquely identify you. Tattoo recognition does a similar thing, scanning an image of a tattoo to distinguish it from any other. However, tattoos may be nearly identical between two people. If we both have a star or a flag or a logo tattooed on our forearms, an algorithm may have a hard time telling them apart. But the fact that we have the same tattoo is still interesting. It may reveal that we're part of the same group. Maybe that's the same branch of the military. Maybe it's that we're in the same gang. Data collection about tattoos is already quite advanced. NIST, the US National Institute of Standards and Technology, provides government and law enforcement with a list of characteristics to note about tattoos, including type, location, colour and imagery. Law enforcement has long used tattoo imagery to identify members of gangs and hate groups, but tattoo recognition technology allows this to be carried to a new level. People on streets that are monitored with cameras, even by existing surveillance systems, could have their tattoos automatically scanned, cross-referenced and flagged as potentially gang-related. Essentially, an otherwise anonymous person can be labelled as a gang member without any other action.
We have seen this kind of analysis go wrong before. Daniel Ramirez Medina, a 25-year-old immigrant who had been granted Dreamer status, was arrested in 2017. The government tried to strip him of his protected status, alleging he was a gang member because of a tattoo he had. The tattoo was actually the name of the place his family was from in Mexico. Eventually, he was released, and a federal judge restored his status and barred the government from asserting he was a gang member. However, the process took a year to sort out and could still drag on. Now, this wasn't a case of using automated tattoo identification, but it shows the consequences of mistaken tattoo association. Imagine this scaled up and automated, and the potential for tremendously impactful mistakes is clear.

5.4 Advertising Kiosks

Taking pictures of us and monitoring us is not just limited to identification for law enforcement purposes. There are now advertising kiosks that will analyse your face as well. The Wall Street Journal reported that some shopping malls in South Korea had installed kiosks that have maps of the mall with lists of the stores. I'm sure you've seen these before. But in this case, each kiosk had a set of cameras and a motion detector. When someone came up to look at the map or browse the stores on the screen, those cameras and detectors used facial-recognition-type systems to analyse the face of the person using the map. Why would they do this? They weren't trying to uniquely identify that person, but rather to estimate their gender and age. From there, the kiosk could drive them to different stores or show them ads for other products. A young woman may see ads for something different than an older man. This is not just a science fiction technology; it's something that's actually been used in shopping malls already. Do we want this kind of processing to happen? It respects privacy more than facial recognition, but it's still invasive.
It blurs the line between surveillance cameras that we've become somewhat used to, which monitor us in stores, presumably for public safety and to deter theft, and facial recognition technology that's monitoring and recording our movements as unique, identifiable people. When we are in public spaces, we know that we can be seen by other people who are there, and we know we can be monitored in different ways. But we may not expect that the way we look, act or move through those spaces will result in personalised advertising directed specifically towards us. Our reactions to this kind of technology should also consider how our data is handled. For example, in the kiosk situation, what's being stored? Is a person's age and gender being recorded, or just used in the moment? Could the kiosk owner analyse the demographics of people who used it? Are copies of people's photos being stored, or being shared with third parties? The fact is, when we walk up to a kiosk like this, we generally have no idea that a camera is present or that it's finding out information about us. Because of their privacy laws, a system like this is unlikely to be able to operate in Europe. Collecting this kind of personal data about a person would require explicit consent and obvious transparency. And because European laws require an explicit opt-in, people would have to essentially push a button that says yes, they're willing to have their picture taken and personal information analysed in order to be shown ads. That really defeats the purpose of passively analysing people with a system like this. Strong privacy rights mean you're unlikely to see these kinds of kiosks in Europe. This kind of surveillance in the world highlights the need for legislation that will clarify what kind of privacy we should be able to expect and what kind of monitoring we can avoid. Digital surveillance traces can also make their way into the offline world.
In the summer of 2017, the news outlet The Intercept published an article about Russian attacks on US voting systems. Russian military intelligence launched a cyber-attack against one manufacturer of US voting hardware. They also executed a spear-phishing campaign, which sent targeted emails to over 100 election officials trying to get them to download a Microsoft Word document infected with malware that would give the Russians full control over the officials' computers. This scoop was the result of a top-secret NSA report that had been shared anonymously with The Intercept. The person who shared it knew better than to email it: NSA systems are closely monitored, and personal electronics are not even allowed in the building. Instead, the source printed the report, carried it out and mailed it to The Intercept. The NSA tracks who prints every document and when, and when the FBI investigated the leak, they claimed only one person who printed the document had had email contact with The Intercept. However, even if there had been no email contact, the person who shared the document could have been easily caught. Many colour printers include an almost invisible series of dots on each page they print. These encode the date and the serial number of the printer. The images of the NSA report that The Intercept included in their article included these dots, though the news outlet certainly didn't know it at the time. And the source was indeed caught. Twenty-five-year-old NSA contractor Reality Winner was arrested and eventually sentenced to five years and three months in federal prison for violating the Espionage Act. While there are many interesting aspects to this story, the fact that printers are including surveillance material without permission or disclosure is surprising.

5.5 What Can You Do?

We have already talked about ways to avoid being tracked digitally. But as the story of Janet Vertesi shows, actually doing so can be almost impossibly difficult.
In terms of digital surveillance, it's really important to think about your comfort zone and what sort of effort you want to expend. Offline surveillance is even more difficult to control, since we often don't know when we are being watched, and to what end. Some surveillance in public is inevitable, and has benefits for public safety and security. But too much can threaten individual liberties and freedoms. The difficulty of analysing that data has protected most of us from the most troubling consequences so far, but the technology and the algorithms are improving every day. What can we do? This is a case where individual efforts at control might not be very effective. Surveillance and its consequences can only really be controlled through regulation and policy. If you feel strongly about surveillance and its impact on you, I would encourage you to get to know the privacy laws in place in your country or community and become active in trying to improve those laws. Guidelines that bestow rights on each of us to decide how much we share about ourselves, especially with profit-driven surveillance systems, are likely our best hope for a future with less monitoring. But we have a ways to go before these structures are in place.

Chapter 6. The Dark Web

6.1 Introduction

You've probably heard of the dark web, and likely in a rather nefarious context. What is it? The dark web is a place where illegal activity happens on the internet, from drugs and weapons dealing to trading software viruses. But it's also a place where plenty of people go for legitimate reasons, including because they want privacy. This is especially true if they're living in an environment where free speech is suppressed and speaking against a government or religion is punished. In this chapter, we are going to talk about how the dark web works, what you can find there and the ways it connects to issues around your personal data online.
The dark web is called dark because it's not accessible from regular browsers and it's not indexed by search engines. It uses the same technology as the web and operates with browsers and all the things that you're used to online. But to get there, you need to be able to access that part of the web network. This is done using the Tor Browser.

6.2 Tor Browser

Tor stands for The Onion Router and was originally developed by the US Navy. The Tor Browser is built on top of Firefox, so it will look very familiar. You can download it for free, and you can use it just like you would use any web browser. The key differences are: 1. it can access sites on the dark web, and 2. it protects your web traffic from snooping. In a previous chapter, we talked about how your web usage can be monitored as a way of tracking you online. Your internet service provider, advertisers or entities who are looking in from the outside can see all the sites that you go to and build an understanding of what you're interested in and what you're doing. If that's something that you want to keep private, Tor protects that as well. Let's start with the technical fundamentals of how the Tor Browser works. As I mentioned, Tor is built on top of Firefox, and so it basically works like the Firefox browser. However, it's designed to protect your web browsing by routing it differently. Instead of connecting you directly to the webpage you want to access, Tor routes your traffic through a series of intermediate servers. For example, if you were sitting at home and you wanted to access Google to do a web search, any standard browser would just connect you directly to Google. Your request would go from your computer to your internet service provider, then hop across the internet backbone until it finds Google. Google then passes the webpage back to you along that path. Every website works this way, and it'll have a log that a request was received from the IP address of your home computer.
Instead of finding a quick path from you to the page you want to visit, Tor passes your request through several intermediate servers. It may take your initial request and route it to an intermediate server that's, say, in Belarus. That server knows how to get the request back to you at home, and it routes your traffic to another location, say in South Korea. The Korean server does not know your home location, but it knows that it needs to send information back to Belarus. Then it will forward the request on to yet another location, say Brazil. Again, Brazil only knows that it has to pass information back to the last stop, South Korea; it doesn't know where you're actually connecting from. There are a series of these steps, and any server in that series only knows the server that came directly before it in the chain. Thus, even if your traffic were intercepted at one of those servers, no one would be able to track it back to your computer. Eventually, your request will reach Google. They'll then return the page to the last server that requested it. That server will pass it back along the chain, and this repeats until the page finally reaches you. This gives you a great deal of privacy with respect to your web searching habits, since no one can trace a request back to you. Your home IP address is only known by the first server in the chain. Servers on the Tor network do not log this information, so no one can piece together that chain to track a request back to you. If you are concerned about privacy, Tor is an additional technique that you could use if you wanted to hide your web browsing from your home internet service provider, or from anyone else who might be snooping. It works extremely well. If that's the case, why didn't I mention it when we talked about ways to protect the privacy of your web browsing? Well, routing your traffic around like this dramatically slows down your web experience.
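The relay chain described above can be sketched in a few lines of code. This is a toy illustration only: nested Python dictionaries stand in for the layers of encryption, and the relay names are the hypothetical ones from the example. Real Tor encrypts each layer with a separate key negotiated with each relay, so a relay can decrypt only its own layer.

```python
def build_onion(message, relays, destination):
    """Wrap a message in one 'layer' per relay (toy model, no real crypto).

    Each layer names only the next hop; the payload underneath stays
    opaque to the relay that peels that layer.
    """
    layer = {"next": destination, "data": message}   # read by the last relay
    for relay in reversed(relays[1:]):               # wrap outward
        layer = {"next": relay, "data": layer}
    return layer                                     # handed to relays[0]

# The hypothetical path from the example: home -> Belarus -> South Korea -> Brazil -> Google
onion = build_onion("GET /search", ["Belarus", "SouthKorea", "Brazil"], "google.com")

# Belarus peels its layer and learns only the next hop, not the destination:
assert onion["next"] == "SouthKorea"
# Only the innermost layer, peeled at the final relay, names the real destination:
assert onion["data"]["data"]["next"] == "google.com"
```

Because each relay sees one layer and nothing more, compromising a single server reveals neither the sender nor the destination, which is exactly the property described above.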
If you try to go to Google from your home computer in a regular browser, you hardly notice a delay before the page appears on your screen. On a slow day, it may take one second before it appears. With the Tor Browser, accessing that same website in the same way may take five or ten seconds. Sometimes it even fails, and you have to try to connect to the website again. That's because if you route your request around the world a few times, every server has to wait and respond, and that really slows down what's happening. Ultimately, whether you think that kind of delay is acceptable or not comes down to personal preference. There are other inconveniences that come with this traffic routing. The website at the end may be using basic information about where you appear to be coming from to determine what to show you. If it looks like you're coming from another country, some websites may not work. For example, when I was trying out the Tor Browser, I tried to order a pizza with it. But when I went to the pizza website, it told me I couldn't order because they didn't currently deliver to Belarus. Even though I was not there, it didn't matter that I told them I was not in Belarus; they looked at where my traffic was being routed and used that to determine my location. Now I may use the Tor Browser, but I can't use it for pizza. Like many things, there are trade-offs between privacy and convenience. How much you want to protect your privacy, and how much inconvenience you're willing to put up with for that protection, is a personal decision. People may want this kind of protection simply if they're privacy conscious. But it becomes more important if you live in places where you know your web traffic is being monitored. In many countries that do this kind of monitoring, VPNs (virtual private networks), which encrypt data coming from your computer, are banned, so traffic can't be hidden that way. Tor provides a way around this.
Of course, people who are engaging in illegal activities also want this kind of privacy. But don't worry: it's perfectly legal to use the Tor Browser, and lots of people use it for legitimate purposes. So there's nothing wrong with downloading it and giving it a try. Protecting the privacy of your web activity is one of the main features of the Tor Browser. The other is that it's able to access the dark web, which is what we're here for. Let's talk a little bit about the dark web and its anatomy.

6.3 The Dark Web Anatomy

The dark web is not a different technology from the regular web. The main way that you can tell the difference between a dark website and a regular website is that dark websites all end with the .onion top-level domain instead of things you're familiar with like .com or .net. If you try to access a .onion website with your regular browser, your browser will just think that you've put in an incorrect address, and it will not be able to get to the site. The Tor Browser, on the other hand, can access these sites. The domain names of .onion sites look different than what you would expect on the regular web. Instead of being able to choose your own domain name, like cnn.com or google.com or ilovepizza.net, every dark web domain name is 16 characters followed by .onion. Just like with the regular web, anyone who's connected to the dark web can set up a server and host a website if they have the technical skills to do so. On the regular web, if you want a domain name like Obama.org, you need to register it with a domain name registration service. That maps your domain name to the IP address of the computer that hosts your website. A distributed database of these mappings is kept on lots of computers: the domain name servers that we talked about in an earlier chapter.
On the dark web, there's a similar domain name service, but instead of just choosing the word that you want for your domain name, you basically pick from a list of all available 16-character strings. This means that almost every website on the dark web has a meaningless domain name (e.g. 5fghkl54752dmnd4.onion) that's just a bunch of letters and numbers followed by .onion. That means that it's pretty much impractical to memorise domain names on the dark web like you do on the regular web. Some websites you're familiar with exist on the dark web. For example, Facebook is on the dark web. And while it's facebook.com on the regular web, it has to follow the same rules and have a 16-character domain name ending in .onion to be on the dark web. That means it can't be facebook.onion. Instead, it is facebookcorewwwi.onion on the dark web. Since domain names essentially can't be memorised, it would be useful to have good search engines for the dark web. However, that's not the case. There are dark web search engines, but they're more like using a search engine from back in the 1990s on the regular web. The results are often irrelevant, lead to broken webpages and are missing a lot of relevant information. This is not because there are no professionals building the search engines, but rather that things change at a much faster rate on the dark web. There's a lot of nefarious activity going on there. Popular sites will get attacked by hackers who bring the service down, or they're targeted by law enforcement because illegal activities are taking place there, and then they'll shut down. Once they're gone, it's very easy for them to simply reopen with a new dark web domain. And since those domain names are not meaningful or memorised, it's not like they're losing important branding. But frequently changing domain names means that search engines can't rely on sites being in the same place for very long.
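Where do those 16 random-looking characters come from? In the legacy ("v2") scheme that matches the addresses described in this chapter, they're derived from the site's cryptographic public key rather than picked freely, which is why they look like gibberish. A minimal sketch of that derivation, assuming the public key is available as raw bytes (newer "v3" addresses use a different scheme and are 56 characters long):

```python
import base64
import hashlib

def v2_onion_address(public_key_bytes: bytes) -> str:
    """Derive a legacy (v2) .onion name: base32-encode the first 80 bits
    (10 bytes) of the SHA-1 hash of the service's public key."""
    digest = hashlib.sha1(public_key_bytes).digest()
    return base64.b32encode(digest[:10]).decode("ascii").lower() + ".onion"

# Any 10 bytes base32-encode to exactly 16 characters, hence the fixed length.
addr = v2_onion_address(b"hypothetical public key bytes")
# addr is 16 random-looking letters and digits followed by ".onion"
```

This also explains how Facebook got a readable name: services can keep generating keys until the resulting hash happens to start with characters they like, which is how a vanity prefix such as facebookcorewwwi.onion was obtained.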
Thus, a lot of the way that you find things on the dark web is by word of mouth. This is not the only way that the dark web feels like a throwback to the regular web of the 1990s. Looking at websites on the dark web also feels very old-school. They tend to have very simple interfaces, no fancy scripts or graphics, and little tracking code, because no one on the dark web wants to be tracked and the Tor Browser prevents meaningful tracking anyway. Thus they are simpler and less professional-looking, but they load very quickly.

6.4 Dark Web Activities

If you want to get on the dark web, you need to get the Tor Browser and then find the domain of a site you want to go to. As an example of just how to get on, we can look at DuckDuckGo. This is a reputable search engine that's available on the regular web at duckduckgo.com and also operates on the dark web. If we go there, we see something that looks pretty much like a regular website and works like a regular website, because it is a regular website; we're just accessing it in a slightly unusual way. First, launch the Tor Browser (torproject.org). You can access any web page from Tor, because it's a regular browser, but you can also access pages that end with .onion. Having a dark web presence allows people in countries with oppressive governments and restrictive internet to access the site while covering their tracks. There are also online marketplaces on the dark web that sell legal and illegal things. So far, I haven't made a very compelling case for why you would want to use the dark web: you have to use a browser that's much slower than a regular browser, search engines don't exist in the same way so it's very hard to find what you want, and there are a lot of sketchy things going on there. Why do people use it? In the context of your personal data, there are two things that are relevant. First, the dark web is where personal data that's been stolen can be found.
The other is that it's a way to keep your activities more private. Let's start with that good one. How do you keep things more private by using the dark web? Remember that you can do lots of normal things on the dark web, so we're not necessarily talking about keeping criminal activities private. There are plenty of perfectly legitimate activities going on on the dark web, including people playing games, having political debates or sharing news. You can find the full text of popular books, along with pirated content that, while illegal, is of interest to a lot of people. If you're discussing sensitive topics, being able to do that anonymously, in a way that can't be tracked, is attractive. Certainly, if you live in a country where you know web use is closely monitored, the dark web is very attractive. It's a place where you cannot be tracked, your information is encrypted, and you can speak anonymously and discuss important civic issues without fear of governmental retribution. In this way, the dark web embodies a lot of the ethics of the early web, which focused on freedom of expression and on that freedom as a tool for improving people's lives. Even if you aren't engaging in something sensitive, the ability to discuss and interact without being monitored or monetised is very attractive to a lot of people. That said, the dark web is also a place where a lot of illicit things happen. A study of over 2,500 dark websites found that well over half of them included some kind of illicit or illegal content. One of the major activities that you can do on the dark web is to buy things, and you can buy pretty much anything. You can buy stolen credit card numbers, stolen login information, drugs, guns, pornography, computer viruses, and the services of people who will help you do more of those illegal things. You can hire hackers, currency traders and hitmen. The marketplaces where this happens look a lot like a stripped-down version of eBay (grymktgwyxu3sikl.onion/market).
People can create listings with photos, other people can bid on or directly buy the products, and money is held in escrow, as often happened with online auction websites before PayPal was popular. Once the products are delivered, the money goes to the seller. How is it that people can buy a kilo of cocaine online and not get caught? In addition to the anonymity offered by the dark web, the rise of Bitcoin and other cryptocurrencies has enabled these sorts of transactions to take place anonymously and securely. Bitcoin and cryptocurrency are terms you've probably heard, but you may not know what they are. Essentially, they're invented digital currencies, not tied to any government or company. Transactions with cryptocurrencies are recorded in public ledgers maintained by volunteers. The transactions are pseudonymous, with each person identified only by a string of random-looking letters and numbers, and the data within them is encrypted, so it remains secret to everyone except the two people in the transaction. Ideally, you can convert cryptocurrency into any other currency, but it's highly volatile, and whether conversion works reliably and safely is still up for debate; it sometimes involves meeting strangers in fast-food parking lots to do the exchange. You can easily buy Bitcoin or other cryptocurrencies with regular money, trade them on exchanges, and use them to buy things on the dark web. The deeper details of how cryptocurrencies work are relatively complicated and we won't get into them here. But the important feature is that two people can exchange money securely without knowing any personal information about each other. Cryptocurrencies and dark web marketplaces have really evolved together, because the marketplaces make cryptocurrencies like Bitcoin more useful, and Bitcoin enables those kinds of transactions to take place securely and privately. I mentioned that personal information can be bought on the dark web.
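The "public ledger" idea can be illustrated with a toy sketch. This is not how Bitcoin actually works internally, and the addresses are made up; the point is simply that each entry records a transfer between pseudonyms and includes a hash of the previous entry, so the history is public and tamper-evident even though no real names appear.

```python
# Toy ledger (not real Bitcoin): a hash chain of transfers between
# pseudonymous "addresses". Editing any earlier entry breaks every
# later link, so the public record is tamper-evident.
import hashlib
import json

def block_hash(block: dict) -> str:
    # Hash a canonical (sorted-keys) serialisation of the entry.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append(chain: list, sender: str, receiver: str, amount: float) -> None:
    # Each new entry commits to the hash of the previous one.
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"from": sender, "to": receiver, "amount": amount, "prev": prev})

def verify(chain: list) -> bool:
    # Recompute every link; any mismatch means history was altered.
    for i in range(1, len(chain)):
        if chain[i]["prev"] != block_hash(chain[i - 1]):
            return False
    return True

ledger: list = []
append(ledger, "1A72x...", "9Qf3k...", 0.5)  # made-up pseudonyms
append(ledger, "9Qf3k...", "4Lm8d...", 0.2)
print(verify(ledger))        # True: the chain checks out
ledger[0]["amount"] = 500    # someone tries to rewrite history...
print(verify(ledger))        # False: the chain no longer verifies
```

Real cryptocurrencies add digital signatures and distributed consensus on top of this chaining idea, but the tamper-evidence mechanism is the same in spirit.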
But what does that involve? It's basically limitless. You can buy the account number, login and password for a United States bank account with a $50,000 balance for around $500 on the dark web. That's very sensitive personal information, and using it comes with a huge risk, but it can be had for a price. When you hear about large data breaches of username and password information from big websites, that data tends to end up on the dark web as well. It's not as valuable as bank account information, but it can be used in a lot of ways. For example, there was a large breach of Yahoo login and password information. Even if everyone changed their password on the Yahoo website afterwards, anyone who obtained the hacked information would know your username, your Yahoo email address and a password you were known to have used. If you use that same username and password combination anywhere else on the web, they could try it out and possibly get access to different websites. This is why people are often encouraged to use different passwords on different websites. Though we know it can be impractical given the number of places where we have passwords, the suggestion is designed to protect us against attacks like this. Hacked personal information can also be aggregated, so there are places on the dark web where you can find a person's collected email addresses and login names, along with other information that may have been obtained illegally, like hacked passwords, credit card numbers and social security numbers: really sensitive information. This is all available at a relatively low price to anyone who wants to buy it. In 2017, Experian did a study of personal information that could be bought on the dark web and how much it cost. A social security number went for just $1. A credit card number with the code on the back was $5. A debit card number with associated bank information was $15. And a driver's licence was $20.
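To make the password-reuse attack concrete, here is a toy illustration. All the names and passwords are invented, and real sites should store passwords with salted, slow hashes like bcrypt or Argon2; bare SHA-256 is used here only to keep the sketch short.

```python
# Toy illustration of "credential stuffing": credentials leaked from
# one site are simply replayed against another. All data is made up.
import hashlib

def store(password: str) -> str:
    # Stand-in for a password hash. Real sites should use a salted,
    # deliberately slow hash (bcrypt/argon2), not plain SHA-256.
    return hashlib.sha256(password.encode()).hexdigest()

# Site B's (hashed) credential store.
site_b = {
    "dana@example.com": store("correct horse"),
    "lee@example.com": store("hunter2"),
}

# Credentials leaked in plain text from an unrelated Site A.
leaked = [
    ("dana@example.com", "correct horse"),
    ("lee@example.com", "totally-different"),
]

# The attack: replay each leaked pair against Site B and see what works.
reused = [email for email, pw in leaked if site_b.get(email) == store(pw)]
print(reused)  # ['dana@example.com']: Dana reused her password
```

Lee is safe here only because his leaked password differs from his Site B password, which is exactly the protection that unique per-site passwords provide.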
If you have any kind of normal online presence, there's probably information about you for sale on the dark web. Unfortunately, there's not a whole lot that you can do about that. Because of all of the privacy and security elements of the dark web that we've already discussed, it's quite difficult to shut these repositories down. They just pop up someplace else, and they're not traceable to the individuals who are running them. That said, you still may want to know what's there.

6.5 What Can You Do?

You could get on the dark web yourself and start searching. But plenty of places like credit bureaus and credit card companies now offer dark web monitoring that looks for your personal information on the dark web and alerts you to it. If they find your credit card number, or a password that you were still using, they can alert you that this is something to change in order to keep other accounts secure. However, these services come at a cost. Sometimes that cost is fees, and often it means you give up your right to sue the people monitoring you, even if they are the reason your information ended up on the dark web in the first place. For example, if a credit bureau is hacked and you take them up on an offer to monitor the dark web for your information, accepting it may let them off the hook for being hacked and letting your information out there. The only real steps you can take to protect yourself are good security practices. Using two-factor authentication will alert you if someone tries to get into your accounts, and it makes it harder for them to access them. You can set up a credit freeze or monitoring with a credit bureau, but be sure to read the fine print. Beyond that, the dark web and the dark marketplaces of information that exist there are just an unfortunate reality of our modern digital life right now.
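As an aside on why two-factor authentication helps: the common authenticator-app codes (TOTP, standardised in RFC 6238) are derived from a shared secret plus the current time, so a thief who has only your stolen password cannot produce them. A minimal sketch using only Python's standard library follows; the secret is the RFC's published test key, not a real one.

```python
# Minimal TOTP (RFC 6238) sketch: your phone and the website each hold
# the same secret and derive the same short-lived 6-digit code from the
# clock, so a stolen password alone is not enough to log in.
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32, at=None, step=30, digits=6):
    key = base64.b32decode(secret_b32)
    # Time is divided into 30-second steps; both sides count the steps.
    counter = int((time.time() if at is None else at) // step)
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                              # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)

# RFC 6238's test secret "12345678901234567890", base32-encoded:
SECRET = "GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ"
print(totp(SECRET, at=59))  # "287082", matching the RFC's test vector
```

Because each code is valid for only about 30 seconds, even a code phished from you today is useless tomorrow, which is why these apps are preferred over SMS codes.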
Hackers are going to hack, and until there are stiffer penalties that encourage companies to do much more to protect our information, that information will fall into the hands of criminals and end up on a low-rent version of eBay. The freedom of the dark web, useful as it is, comes at a cost to us all.

Chapter 7. The Future of Personal Data

7.1 Introduction

What's the future of personal data and privacy? The answer is hard to come by, not only because there are so many paths, but because technology drives this, and it's evolving quickly. Nonetheless, in this last chapter, we'll ask this question and look at a variety of ways that it might be answered in the coming decades. We'll start with DNA as an example of how technologies change and privacy assurances and rights shift with them. Then we'll look at the legal landscape and what kinds of privacy regulations may be coming in the future.

7.2 DNA as Personal Data

DNA represents an interesting crossroads of technological, legal and ethical issues with personal data. We know that DNA is now in the toolbox of every law enforcement organisation to catch criminals. If you bleed or sweat or cry at a crime scene, you leave a piece of yourself behind that can be uniquely matched to you. But in the 30 years since we started hearing about DNA in the courtroom, we've come a long way in being able to identify and link people with their genetic profiles, and this serves as an example of how technological advances can undo privacy guarantees that were made in the past and create new challenges going forward. If you're interested in true crime, you might know the stories that I'm about to tell you. The first begins with a woman, Lisa, who did not know where she came from. As a child she had been taken away from her family, kidnapped by a man who would eventually turn out to be a serial killer. He had taken her and moved around the country, but eventually tired of her when she was only five years old.
At the RV park where they were staying, he gave her away to a couple who were his neighbours. They took her in but eventually went to the police, knowing something wasn't right. From there, she went into protective custody, was adopted, and grew up not knowing who her birth family was, or even what her birth name was. The man who claimed to be her father was eventually caught, tried and convicted of abandonment. But for some reason, the paternity test was never finished. If it had been, they would have discovered he was not her biological father. Fourteen years later, when Lisa was an adult, a detective finished that test on a hunch. When she discovered that the man was not Lisa's father, the case became more complicated. Who was Lisa? How did he come to have her with him? And where did she come from? For the next 10 years, there were no answers. They tried searching for a parent or sibling using Lisa's DNA, but had no luck. Finally, Lisa herself suggested that they start looking into genealogical databases on sites like ancestry.com, 23andme and other open DNA databases, where people can upload their profiles to try to find distant relatives. She thought her identity might lie somewhere in those databases. Detectives uploaded her profile and started finding distant cousins. We all have a lot of distant cousins; for Lisa, there were 25,000 relatives to sift through, and tracking down an immediate family member would require more work. Barbara Rae-Venter, a genetic genealogist, joined the search. After a year of work, Barbara and her team narrowed down Lisa's mother to one person. The detective handling her case reached out to the family, and Lisa learned her real name was Dawn. This was the first use of familial DNA in this way in a criminal case, and it got detectives thinking about other ways they could use this strategy. Essentially, they take the profile of a person and look for whatever distant relatives they can find.
Working with other public records and genealogical data brings them increasingly closer to their near relatives. In Lisa's case, we had a known person with an unknown identity. But what if you had an unknown person, like a murderer or a rapist, and you wanted to identify them using the DNA of their family members? It turns out that this works, and it helped catch an infamous serial rapist and serial killer who had eluded police for decades. The Golden State Killer, profiled most prominently in Michelle McNamara's book "I'll Be Gone in the Dark", committed dozens of rapes in the Bay Area before escalating to home invasions and murder in the Los Angeles area in the 1970s and 80s. The police had DNA but had been unable to match it to a known person in all that time. Eventually, they turned to familial DNA, using public open databases where people voluntarily upload their DNA profiles to try to find relatives. They were searching for relatives of a murderer. Using a similar process to Lisa's, they were able to eventually narrow it down to one man who they thought was the killer they were looking for. Of course, they couldn't rely just on these familial records for an arrest. But once those records gave them a suspect, they were able to surreptitiously collect a DNA sample from the door handle of the suspect's car, and it matched. The man known as the Golden State Killer, the East Area Rapist, and the Original Night Stalker had been caught. It was 72-year-old Joseph DeAngelo, and he was arrested in 2018. These stories show the power that lies in DNA. The Golden State Killer was just the first prominent example in what became a string of cold cases that have been solved with this technology. Catching killers and rapists is good. But there are a number of questions that arise about this use of personal data that we have to consider.
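The matching idea these searches rely on can be caricatured in a few lines. This is a deliberately crude toy: real genealogy services compare long shared chromosome segments, measured in centimorgans, not single-marker overlap, and every marker and genotype below is invented.

```python
# Crude toy of relative matching: score how often two DNA profiles
# agree at the markers (SNPs) they both report. Real services instead
# look for long shared segments measured in centimorgans.
def shared_fraction(profile_a: dict, profile_b: dict) -> float:
    common = set(profile_a) & set(profile_b)   # markers both profiles have
    if not common:
        return 0.0
    same = sum(profile_a[snp] == profile_b[snp] for snp in common)
    return same / len(common)

# Made-up marker data: SNP IDs mapped to genotypes.
query    = {"rs1": "AG", "rs2": "CC", "rs3": "TT", "rs4": "AG"}
cousin   = {"rs1": "AG", "rs2": "CT", "rs3": "TT", "rs4": "AG"}
stranger = {"rs1": "GG", "rs2": "TT", "rs3": "CC", "rs4": "AA"}

print(shared_fraction(query, cousin))    # 0.75: a candidate relative
print(shared_fraction(query, stranger))  # 0.0
```

Scaled up to hundreds of thousands of markers and a database of millions of profiles, even quite distant cousins stand out from strangers, which is why a relative's upload can expose someone who never took a test themselves.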
In these cases, people had voluntarily uploaded their profiles into public databases, where one could reasonably presume law enforcement would also have access to them. Still, they may be surprised to learn their profiles could be used to catch their distant relatives for committing crimes. When we're talking about catching murderers and rapists, it feels like we're firmly on the good side in using this information. But what if it starts being used for more petty crimes: simple assault, breaking and entering, a drug crime where DNA is left behind? What if it eventually becomes so cheap that it's used to catch people for committing relatively minor crimes? Do people want their DNA profiles used to catch distant cousins for littering? And there are other applications where the territory becomes much more troubling. For example, consider the case of anonymous sperm or egg donation. Most of the time, when men and women choose to donate anonymously in this way, they sign legal contracts that protect their identity. They surrender parental rights, and the contracts keep their identities private and unknown to the families that receive the donated sperm and eggs. If donors knew they could be identified, it's likely that many would refuse to participate. Some simply want to keep their donations to themselves. A woman who donates her eggs in her 20s because she wants to help an infertile couple and make some extra money may be very reluctant to donate if the resulting children may track her down 20 years later. Maybe she just wants to keep her reproductive choices private and be left alone. A reasonable precaution donors might take is to keep their DNA records out of public databases. They may even choose to never get a DNA test at all. However, even with basic ancestral DNA searches, if that donor has a brother, sister or parent who uploads their DNA, the child that resulted from the donation would find that immediate relative quite quickly in their search.
Then, if they're working in a system that allows contact with genetic matches, the child could reach out to the donor's family. If the donor never told anyone about their donation, the child has revealed a deeply personal and private piece of reproductive health information to family members. This is especially troublesome if the donation violates the family's ethical or religious norms. This kind of revelation is severe enough that, for some people, it could destroy family relationships, and the donor has a right to keep that information private. Furthermore, a donor likely does not want a relationship with the child that they anonymously donated to produce. They chose to donate anonymously in the first place for a reason. Yet a donor's family could decide to pursue a relationship after finding a DNA match. The donor's right to control the outcome of their donation is taken away from them. Sperm banks and reproductive health facilities are now reconsidering how to discuss with potential donors the way that their anonymity will be preserved. They may simply be unable to guarantee anonymity in the current world of DNA testing. But for people who donated in the 90s and 2000s, who were assured that their identity would be kept private, familial DNA search is now potentially taking that away, and not for any real social good, but potentially for the whims and curiosities of other people. And of course, the problems go deeper than this.

7.3 DNA Profiles

There's only a small amount of legal protection with respect to DNA profiles at this point. When George W. Bush was president, he signed legislation that prohibited health insurance companies from discriminating based on DNA. That's an important law; however, it's very narrow. There is a case of a child being barred from enrolling in his local school because he was a genetic carrier for cystic fibrosis. The school had a policy against two students with cystic fibrosis both attending, to reduce the risk of cross-infection.
There was already a student with cystic fibrosis at the school, so the boy was barred from enrolling, even though he did not actually have cystic fibrosis; he was merely a genetic carrier. The school's ignorance of what these genetic tests meant led them to take away the rights of this student. This speaks to a long line of discrimination based on medical misunderstandings by lay people. You may remember the case of Ryan White, a student barred from attending public school in the 1980s because he was HIV positive. If genetic testing becomes cheap and easy, there are no current laws that prevent employers, schools and other organisations from discriminating against people based solely on their genetic profiles. If you're genetically predisposed towards heart disease, or Alzheimer's, or schizophrenia, even if you're taking all the behavioural steps that help prevent them, you could still be barred from getting a job based on discrimination against that factor. The lack of scientific sophistication among a general public that is not trained in interpreting and understanding genetic testing means that the opportunities for unfairness are rampant, and we're likely to continue to see this sort of discrimination based on DNA. Genetic privacy is a complex topic, and it's unlikely that a single law could be put in place that protects people from unfairness, discrimination, and having their privacy compromised. We want to allow for reasonable law enforcement and healthcare use of DNA, but protect people as we start moving towards more regulation in this space. Just how to do that is uncertain. Thinking about DNA specifically, it's worth considering whether you even want to have your DNA tested by one of these companies. If you do want it for your own genetic insights, or if you have already had it tested, you can think about strictly controlling the privacy settings on your DNA in the system.
Having your DNA profile deleted after you've obtained the information you want may also help.

7.4 The Future of Privacy Regulations

DNA is only one example of many technologies that are evolving to provide more insight into, and invasion of, our lives. Artificial intelligence, data integration and massive data collection all promise to lead towards new tech that can uncover identities, attributes and connections that we never expected. As a result, we likely need to think about fundamental privacy rights that we want to establish, rather than piecing together domain-specific regulations. That may sound familiar, as it's the basis of European privacy protections. But are we anywhere close to getting that in the US, where the big tech companies are based? Federal regulations are still largely up in the air, but there are interesting developments happening at the state level, especially in California, that might offer some insight. The California Consumer Privacy Act, a state law that some people refer to as "GDPR light", went into effect in January 2020. It gives citizens of California many of the rights that Europeans have. It governs large businesses and businesses that make most of their money by sharing or processing personal data. The law offers a number of protections. It requires that companies be transparent about what data is collected and how they use it. Citizens have a right to control the data about themselves: they have a right to see the data that companies hold, and they have the right to request that it be deleted. While this will be very beneficial for citizens of California, it also raises the prospect of a GDPR-like federal law in the United States. That's because California is likely not going to be alone in passing a consumer privacy act.
Many other states have their own privacy laws, and several are considering bills that would grant similar protections within their borders. Having a bunch of different state laws that regulate consumer privacy, especially when those regulations are not the same across states, can make it very difficult for a company working with personal data to operate in the United States. You potentially have to handle data differently and offer different features across 50 states. Depending on how these laws are written, you may even have to offer different protections to people who are simply visiting a state versus living there. This scenario makes it more likely that we'll see a federal law come into place in the next few years that offers similar protections. This would allow companies to operate under a single United States privacy law, as opposed to operating differently in each state. If this happens, the US will be following Europe's lead. The current situation is very similar to what the EU faced: Europe had a privacy directive that was implemented differently in each country, and that led to some of the same difficulties we see in the United States. When GDPR came into effect in May 2018, it harmonised those laws, making it much easier for companies to comply. And there are already precedents for federal laws that protect privacy in the US. The Children's Online Privacy Protection Act, or COPPA, is one example of a federal privacy law; it governs the data of children under 13 and has very strong consent protections in place. You're likely familiar with the Health Insurance Portability and Accountability Act, HIPAA, which governs the privacy of health information; you have probably encountered its consent requirements through forms in your doctor's office and pharmacy. And there's also FERPA, the Family Educational Rights and Privacy Act, which grants privacy protections over education records.
These are all federal laws that allow for consistent implementation of privacy policies around the country. There's not currently a GDPR-like privacy law in the United States. But if Congress were to move this way, it would likely be in the interest of offering more protections for consumers to control their data, and bringing a more European-like privacy legal standard to the United States. Indeed, the US is already benefiting from Europe's leadership on privacy: American lawmakers are able to see what has and hasn't worked with GDPR so far, and to make changes as they develop their own laws. Without the European law leading the way, it would likely have been much harder to even get state privacy laws passed within the US. But now that California has the privacy ball rolling, so to speak, there's hope that Americans will all eventually gain much more appropriate control over their own data.

7.5 The Legislative Future of Personal Data

The legislative future is not limited to privacy protection laws; we need a robust set of laws that cover many different aspects of the personal data problem. As we discussed in the chapter on data scandals, cybersecurity is a real issue that connects with data privacy. We've all been the victims of multiple data hacks, whether it's our credit cards being stolen from major retailers, or social media companies and email providers being hacked. Sometimes these breaches don't yield much useful information; it could just be our email addresses released, which isn't especially valuable or private. However, as evidenced by the hack of the Office of Personnel Management, which revealed background check records for federal employees and contractors, deeply sensitive information can also be released. How do we prevent people from hacking this kind of data? The bad guys are always going to try to get it, and that means we need a strong defence against their attacks.
That requires comprehensive cybersecurity and strong incentives for companies to follow best practices and the latest guidelines. Right now, the penalties for poor security practices are relatively weak, even when these companies are gathering tremendously sensitive information about people that can cause major disruptions in their lives. The penalties for weak security tend to be small enough that their business is not disrupted. Contrast this with laws in Europe that can fine companies a significant percentage of their annual global turnover for irresponsible data security. We need better cybersecurity laws that create very harsh penalties for companies that do not protect our data. We also need to understand and discourage massive data collection without purpose. One interesting proposal that's been floated in the US Senate is to require publicly traded companies to disclose the value of, and liability associated with, the personal data that they hold about people. Now, we don't really know how to put a value on that data at this point, so if legislation like that were to become law, it would require techniques for valuing the data. However, if that problem is solved, and companies have to report that they potentially hold billions of dollars' worth of personal data, along with the liability associated with it being stolen, that serves as a financial deterrent to saving unnecessary information. For example, if Amazon had to disclose that they have, say, $10 billion worth of personal data wrapped up in all the recordings that they've stored from people who have an Amazon Echo, they might decide to keep fewer of those recordings. That then improves privacy, because there's less data for them to analyse, and improves security, because if that data is leaked, there's less information there that could be exploited. There's not going to be any omnibus bill that addresses all of the issues surrounding personal data.
Instead, we're going to have to look at the different parts of this very complicated problem and come up with steps that improve each. The focus on corporations also carries into the discussion of privacy and surveillance. We've talked a lot about the rights of consumers and what legislation may do to protect them from the companies who have their data. But what about the relationship between people and companies when the people are employees? Companies have very few limitations when it comes to monitoring their employees. We likely expect that companies can monitor our work emails, even though we hope they're not reading them regularly just to monitor us. But monitoring technology often extends out of the office and out of the bounds of work-related activities. Consider the story of Myrna Arias, who worked for a money transfer firm. Her company had an app that tracked employees when they were working out of the office. Many companies have this, and in most cases it's a reasonable way to monitor work. If workers are making deliveries, the app lets employers know where they are in the process. It can track whether they're actually working and help provide better service to customers. But should employees keep these apps on when they're not working? Myrna's company said yes. When she objected, she was told she had to have her phone on and the app launched at all times. Her boss used it to monitor her when she was off duty, and bragged that he knew how fast she was driving at certain times when she was not working, because he monitored her in the app. When she uninstalled the app, she was fired. She sued the company and settled out of court. Would she have won a suit for wrongful termination? It's not clear, and the waters get even murkier when workplace programmes gather information that's protected. For example, in April 2019, The Washington Post reported that the pregnancy-tracking app Ovia was sharing data with employers.
Women use this app to track their periods, bodily functions, sex drive and more. The app also partners with workplace wellness programmes. As part of that, they provide information to employers, for a fee, that shares aggregated information about the women using the app. This includes whether they're pregnant, when they might return to work, and whether they lost a pregnancy. It's illegal to use pregnancy status in hiring decisions, and while Ovia isn't sharing names directly with employers, it can be easy to identify women. Consider a small company in a male-dominated field with relatively few female employees. Some may be old enough that they aren't likely to consider having children; others may be younger. It could be a very small number who may be considering pregnancy, and they could become identifiable quite easily. If an employer finds out this kind of information by partnering with the app, they may make employment decisions based on it. And if they do, how can a woman prove that she was discriminated against? More importantly, why should companies get this information at all? We've become so used to surveillance in workplace situations that we may need to step back and really ask why an employer has any legitimate right to know about its employees' fertility issues. There's really no need for it. And yet companies continue to push employee tracking with a variety of apps and devices. They may require it for insurance coverage or discounts, and they end up with a lot of very personal data for no legitimate reason. A lot of the discussion around privacy rights is centred on companies and their users' data, and this is an important topic, but there should also be a serious and critical analysis of companies monitoring employees, and legislation that enshrines protections for workers as well. Of course, monitoring within the workplace may be important and legitimate, and should be allowed.
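The re-identification problem just described, where "aggregated" numbers about a small group effectively name its members, can be sketched as a simple check. This is a k-anonymity-style test on invented data; the roster and the threshold k=5 are illustrative assumptions.

```python
# Sketch of why "aggregated" data can still identify people: if a
# reported group is small enough, its members are effectively named.
from collections import Counter

def risky_groups(records, keys, k=5):
    # Count people per combination of attributes; any combination shared
    # by fewer than k people fails this k-anonymity-style check.
    counts = Counter(tuple(r[key] for key in keys) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}

# Made-up roster at a small, male-dominated company.
staff = [{"dept": "eng", "sex": "M", "age_band": "30s"}] * 40 + [
    {"dept": "eng", "sex": "F", "age_band": "20s"},
    {"dept": "eng", "sex": "F", "age_band": "30s"},
]

print(risky_groups(staff, ["dept", "sex"]))
# {('eng', 'F'): 2}: a report about "women in engineering" here is
# really a report about two identifiable people
```

This is why a wellness vendor's promise that it shares "only aggregates" offers little protection at a company where the relevant group is a handful of people.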
But as employers cross into monitoring the private lives of workers, we shift from a space of desired productivity to desired power. The technology that can collect, analyse and derive insights from our data is growing very fast, and as a society we've not figured out how to apply our ethics, values and protections in this domain. When we think we've caught up, the technology has sped ahead. DNA profiling shows just how disruptive new technologies can be in the face of old privacy guarantees: it can help catch criminals, but also identify anonymous donors. To address this legally, we need to think about fundamental rights people have over their data. There's change happening here as well. The future is uncertain, but trends suggest we may see more protections coming. This brings us to the end of our quick but critical look at the world of personal data and what you can do to control how you operate in it. What have we learned? Well, we've learned that you can't control everything. Part of living in a data-driven world is coming to terms with the fact that you, and the data you create intentionally or unintentionally, are valuable commodities, and that people, corporations and governments can and do go to great lengths to access that data. Sometimes this can make your life better in meaningful ways. But sometimes it can result in really alarming outcomes. We've also learned that you have some power over the situation. You have the power to know where you stand when it comes to your privacy preferences. You have the power to take targeted steps to create and maintain a level of data privacy that works for you. And you have the power to speak up intelligently and demand change when you feel more protections are needed.