AJ Guardado
3/2/08
Computer Science 49S
Millions of clicks are made on Google every day, whether on links, on action buttons on the Google page, or on any number of other things. One important kind of click that most people don’t give much thought to is the click on an advertisement: ads circulate throughout Google and are frequently clicked by internet users, whether purposefully or not. Google has taken it upon itself to filter through all of the clicks made on the advertisements it runs during a given day and to find out which ones are valid or invalid, and intentional or unintentional. This job is assigned to a group, the Google Click Quality Team, which goes to great lengths to decide whether a given click should be passed as valid and intentional.
In order for Google to actively monitor programs such as AdWords and AdSense, charge correctly under the pay-per-click model, and keep track of all the clicks made by users, it must first collect a good amount of data throughout the day describing the querying and clicking activity on its site. Google accumulates this data, which contains all the information about visitors’ activities while on the Google network. Advertisers can aid Google in this endeavor by providing the “post-click” data that they receive from conversion actions on their own websites (when a person clicks on a link and it takes them to another portion of the site). These actions make up a large part of the data that Google needs.
If the advertiser formally agrees to provide this information, Google can see which of the pages that the advertiser has marked as “conversion” pages (check-out page, form-filling page, etc.) a user visited on the advertiser’s site. However, Google receives only as much data as advertisers choose to release. Some advertisers may choose not to give Google this information at all, in which case Google has no basis for judging the validity of clicks on their ads.
When this data is given to Google, however, Google is able to clean the “raw” data, preprocess it, and store it in various logs on its systems for different types of analysis, should information need to be extracted from it later. One weakness of this process is that Google cannot get full access to all of the clicking activities of its visitors. The conversion data it collects is only part of a visitor’s activity on an advertiser’s site, yet it is crucial for Google to even start its monitoring process. It is also used to detect invalid clicks, and because Google cannot gain full access to this data, its job is more difficult. So while advertisers blame Google for doing a poor job of detecting invalid clicks, they should know that this isn’t Google’s fault, but rather a limitation of the types of data that advertisers have made available to Google.
If an advertiser chooses to give Google information about its clicking activities, that information is processed and filtered, and then returned to the advertiser in the form of click and billing activity reports. These reports are coarse, since the smallest unit of measure is one day: the most they can tell the advertiser is that a click occurred and on which day, and Google does not say whether a click was valid or invalid. Advertisers feel they have a right to know this information, but Google chooses to withhold it because it would give advertisers hints about how Google’s click detection works.
What an invalid click actually is remains a matter of debate. One popular definition is: “When a person, automated script, or computer program imitates a legitimate user of a web browser clicking on an ad, for the purpose of generating an improper charge per click”. This definition points to a widely known fact about invalid clicks: they can be caused both by humans and by computer programs that humans designed specifically for this task. For Google to determine whether a click is invalid, it has to be able to determine the intent of the person who clicked the ad.
Google follows a list of guidelines that help it determine whether a click is valid or invalid, and whether it was generated artificially. Many invalid clicks can be detected by this system, but some cases are hard to judge, such as a person who looks back at an ad a second time to make sure they saw everything in it correctly.
Double clicks are also sometimes disputed as valid or invalid. Google uses a rule in which p is the time difference between the two clicks; if p is a relatively large span of time (perhaps only 2 or 3 seconds), then the second click the user made is counted as valid.
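The double-click rule can be pictured with a short sketch. This is only an illustration under stated assumptions: the function name, the use of seconds as the unit, and the 2-second default threshold are hypothetical choices, not Google’s actual rule or values.

```python
# Hypothetical threshold: the essay only suggests "2 or 3 seconds".
MIN_GAP_SECONDS = 2.0


def second_click_is_valid(first_click_time: float,
                          second_click_time: float,
                          min_gap: float = MIN_GAP_SECONDS) -> bool:
    """Return True when the gap p between two clicks is at least min_gap.

    Both times are timestamps in seconds for clicks on the same ad
    by the same user.
    """
    p = second_click_time - first_click_time  # time difference between clicks
    return p >= min_gap
```

Under these assumptions, a second click 0.3 seconds after the first would be treated as an accidental double click, while one made 5 seconds later would be counted as valid.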
Some invalid clicks arise from a user’s malicious intent to make an advertiser pay for clicks that are unwanted. Such clicks are called fraudulent clicks: invalid clicks made with malicious intent. This is a major concern that the Google Click Quality Team is working to address, but it is having a very hard time doing so.
Google uses three methods to detect invalid clicks. The first is anomaly-based: the Click Quality Team notices too many clicks, which could have been made on purpose, within a given amount of time. The second is rule-based, an example of which might be: “if there was a significant amount of time between the first two clicks, then the second click is valid”. The last method is classifier-based, where a member of the Click Quality Team learns to recognize an invalid click from past experience. Google often uses the first two methods, but the third is too unreliable to use consistently.
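As a rough illustration of the anomaly-based idea, the sketch below flags any source that produces more than a set number of clicks within a sliding time window. Everything specific here is an assumption made for the example: the function name, the (timestamp, source) input format, the 60-second window, and the limit of 5 clicks are hypothetical and do not describe Google’s real detection.

```python
from collections import defaultdict, deque


def flag_anomalous_sources(clicks, window_seconds=60.0, max_clicks=5):
    """Return the set of sources with too many clicks in any time window.

    clicks: iterable of (timestamp_in_seconds, source_id) pairs, ordered
    by timestamp. The window size and limit are hypothetical values.
    """
    recent = defaultdict(deque)  # source_id -> timestamps still in the window
    flagged = set()
    for timestamp, source in clicks:
        window = recent[source]
        window.append(timestamp)
        # Drop clicks that are too old to count toward the current window.
        while timestamp - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_clicks:
            flagged.add(source)
    return flagged
```

A source that clicks ten times in ten seconds would be flagged under these settings, while a source that clicks once every few minutes would not.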
Google filters and detects problems on several levels within its data. It has pre-filtering, online filtering, and post-filtering, as well as automated monitoring and manual reviews done by workers. Google started with only three filters, but that number grew steadily over time.
Google prioritizes its filters, applying them in a set order to check for invalid clicks, and it tests every filter before putting it to use; even the ones that pass the test, however, need careful and frequent maintenance. If Google ever does miss an invalid click, it gives credit to the advertiser and goes about fixing its filters.
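Running filters in a priority order can be sketched as a simple chain in which each click is passed to the filters one at a time and the first filter that rejects it decides the outcome. The filter names, the click fields, and the thresholds below are all hypothetical, chosen only to make the example runnable; Google’s actual filters are not public.

```python
from typing import Callable, Dict, List, Optional, Tuple

Click = Dict[str, float]
Filter = Tuple[str, Callable[[Click], bool]]

# Hypothetical filters, listed in the order they are applied.
EXAMPLE_FILTERS: List[Filter] = [
    ("double_click", lambda c: c["seconds_since_last_click"] < 2.0),
    ("too_many_clicks", lambda c: c["clicks_in_last_minute"] > 5),
]


def classify_click(click: Click, filters: List[Filter]) -> Optional[str]:
    """Return the name of the first filter that marks the click invalid.

    Returns None when every filter lets the click through, meaning the
    click is treated as valid.
    """
    for name, is_invalid in filters:
        if is_invalid(click):
            return name
    return None
```

Because the filters run in order, a later filter only ever sees the clicks that the earlier ones let through, which is one way to read the idea that each filter catches what the previous one missed.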
Since Google can’t be sure about the validity of every click on its advertisements, it is hard to keep the filters working properly. Each filter detects only about 2-3% of the data that isn’t detected by the filter before it. Offline methods of detecting invalid clicks catch few invalid clicks compared to the filters, and they too can be faulty.