AJ Guardado Comp. Sci. 49S



In order to correctly manage its advertising programs
(AdSense, AdWords), charge properly under the
pay-per-click (PPC) revenue model, and detect invalid clicks,
Google must collect a great deal of data about
querying and clicking activities.
All of this data is accumulated by Google and
contains information about a visitor’s activities
on the Google Network.




The “post-clicking” data about conversion actions
on the advertiser’s website makes up a large piece
of this collected data.
If the advertiser formally agrees to provide this
information, Google collects data on which pages
of the advertised site marked as “conversion” pages
(checkout pages, form-filling pages, etc.) the user visited.
This data is limited to what the ADVERTISER
chooses to provide to GOOGLE. Some advertisers
opt out of providing this conversion data.
This “raw” data is cleaned, preprocessed and
stored in various internal logs by Google for
different types of analysis.
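To make the partial nature of conversion data concrete, here is a minimal Python sketch of a click-log record. The `ClickRecord` type, its field names, and the `conversion_coverage` helper are hypothetical illustrations, not Google's actual log schema; the only idea taken from the text is that conversion data exists only when the advertiser agrees to provide it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClickRecord:
    """Hypothetical log entry for one ad click (field names are illustrative)."""
    click_id: str
    query: str
    timestamp: float  # seconds since the Unix epoch
    # Conversion pages the user reached on the advertiser's site.
    # None means the advertiser opted out of sharing conversion data;
    # an empty list means data was shared but no conversion page was visited.
    conversion_pages: Optional[List[str]] = None

    def has_conversion_data(self) -> bool:
        return self.conversion_pages is not None


def conversion_coverage(clicks: List[ClickRecord]) -> float:
    """Fraction of clicks for which the advertiser supplied conversion data."""
    if not clicks:
        return 0.0
    return sum(c.has_conversion_data() for c in clicks) / len(clicks)


clicks = [
    ClickRecord("c1", "running shoes", 1000.0, ["/checkout"]),
    ClickRecord("c2", "running shoes", 1005.0, []),  # shared, no conversion
    ClickRecord("c3", "running shoes", 1010.0),      # advertiser opted out
]
print(conversion_coverage(clicks))
```

Distinguishing `None` from an empty list keeps "no data shared" separate from "data shared, but the visitor did not convert".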




A weakness of Google’s data collection effort is
its inability to get full access to all clicking
activities of visitors.
The conversion data it collects is only part of
all the activity of a visitor on the advertised
site.
This data is important for detecting invalid
clicks, but Google and many other search
engines don’t have full access to it.
This isn’t Google’s fault; it is a limitation of the
types of data available to Google.



Advertisers get reports from Google describing
clicking and billing activities.
These reports are coarse: the smallest unit
of analysis is one day, so advertisers can’t tell whether
an individual click was marked as valid or invalid by Google,
and Google won’t give them this information.
Advertisers feel they have the right to know it,
but if Google shared it, Google would open itself
up to click fraud, because it would be giving
hints about how invalid click
detection works.
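The daily granularity of advertiser reports can be illustrated with a small Python sketch. It assumes a hypothetical per-click tuple `(timestamp, cost, marked_invalid)` and assumes invalid clicks are not billed; `daily_report` is an illustrative name. Once clicks are rolled up into per-day totals, the per-click validity flag is no longer visible to the advertiser.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# Hypothetical per-click record: (unix_timestamp, cost, marked_invalid).
# Only Google sees the marked_invalid flag.
Click = Tuple[float, float, bool]


def daily_report(clicks: List[Click]) -> Dict[str, Dict[str, float]]:
    """Roll clicks up into per-day totals (UTC), discarding per-click validity."""
    report: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"clicks": 0, "billed": 0.0}
    )
    for ts, cost, marked_invalid in clicks:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
        report[day]["clicks"] += 1
        if not marked_invalid:
            report[day]["billed"] += cost  # assumption: invalid clicks are not charged
    return dict(report)


clicks = [(0.0, 0.50, False), (60.0, 0.50, True), (86_400.0, 0.25, False)]
for day, totals in daily_report(clicks).items():
    print(day, totals)
```

For the first day the report shows two clicks and the amount billed, but not which of the two clicks was marked invalid.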



One definition of invalid clicks: “When a
person, automated script or computer program
imitates a legitimate user of a web browser
clicking on an ad, for the purpose of generating
an improper charge per click”.
Invalid clicks can be made by humans or
computer programs.
To evaluate whether a click is valid, you have to
understand the intent behind clicking the ad.



Google must determine whether a click was generated
“artificially” or not, using a list of
“prohibited means” that it follows:
(https://www.google.com/adsense/policies?sourceid=asos&subid=ww-ww-etHC_entry&medium=link)
Many such clicks can be detected, but some elude Google,
like a person revisiting an ad a second time to
make sure of what the ad entailed.
Double-clicks are also sometimes disputed as
valid or invalid. Let p be the time difference between
the two clicks; if p is relatively large, the second click is
considered valid.
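The double-click rule can be written as a short Python function. The threshold value is a hypothetical placeholder (the text only says p must be "relatively large", and no number is given), and `label_repeat_clicks` is an illustrative name. Each click by one user on one ad is compared with the click immediately before it.

```python
from typing import List, Optional

# Hypothetical threshold in seconds; the real value is not public.
THRESHOLD_SECONDS = 10.0


def label_repeat_clicks(times: List[float],
                        threshold: float = THRESHOLD_SECONDS) -> List[bool]:
    """Label clicks by one user on one ad as valid (True) or invalid (False).

    Timestamps are sorted first, so labels follow ascending time order.
    The first click is valid; each later click is valid only if
    p = (this click time - previous click time) >= threshold.
    """
    labels: List[bool] = []
    prev: Optional[float] = None
    for t in sorted(times):
        if prev is None:
            labels.append(True)
        else:
            p = t - prev
            labels.append(p >= threshold)
        prev = t
    return labels


print(label_repeat_clicks([0.0, 0.4, 30.0]))  # [True, False, True]
```

Measuring p from the immediately preceding click (rather than from the last click counted as valid) is one possible reading; with it, a rapid burst of clicks stays invalid until a long enough pause occurs.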





Fraudulent clicks are invalid clicks made with
malicious intent: the goal is to make an
advertiser pay for unnecessary clicks.
Not every invalid click is fraudulent; an example is
a person double-clicking an ad out of habit.
Fraudulent clicks may come from software or “bots” designed to
click on ads, people manipulating pages,
advertisers clicking on the ads of their competitors,
or AdSense publishers operating multiple accounts.
The goal of the Click Quality team is to identify all
invalid clicks regardless of their nature, but it is not
there yet.




3 approaches to detecting invalid clicks:
Anomaly-based: flag unusual activity, such as too many clicks
in a given amount of time (e.g., 100 clicks a day).
Rule-based: apply established IF-THEN rules.
Classifier-based: a system learns to recognize
invalid clicks from past examples of
invalid clicks.
Google uses the first two often and rarely uses
the third.
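The first two approaches can be sketched in Python as follows. The click fields (`source`, `day`, `user_agent`, `seconds_on_page`) and both IF-THEN rules are hypothetical examples chosen for illustration; only the idea of a daily click limit (here 100, following the example above) comes from the text. The classifier-based approach is left out because it needs labeled training data.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

# Hypothetical click representation with illustrative keys.
Click = Dict[str, Any]


def anomaly_flags(clicks: List[Click], limit: int = 100) -> List[bool]:
    """Anomaly-based: flag every click from a source that reaches `limit` clicks on one day."""
    counts = Counter((c["source"], c["day"]) for c in clicks)
    return [counts[(c["source"], c["day"])] >= limit for c in clicks]


# Rule-based: each rule is the IF part; THEN the click is flagged as invalid.
Rule = Callable[[Click], bool]

EXAMPLE_RULES: List[Rule] = [
    lambda c: c["user_agent"] == "",       # IF no browser user agent THEN invalid
    lambda c: c["seconds_on_page"] < 0.5,  # IF the visitor left almost instantly THEN invalid
]


def rule_flags(clicks: List[Click], rules: List[Rule] = EXAMPLE_RULES) -> List[bool]:
    """Flag a click if any rule's condition holds."""
    return [any(rule(c) for rule in rules) for c in clicks]


normal = {"source": "b", "day": 1, "user_agent": "Mozilla/5.0", "seconds_on_page": 42.0}
burst = [dict(normal, source="a") for _ in range(100)]
bot = dict(normal, user_agent="")

clicks = burst + [normal, bot]
print(sum(anomaly_flags(clicks)), sum(rule_flags(clicks)))  # 100 1
```

The anomaly check catches the burst from source "a" but not the single bot click, while the rules catch the bot click but not the burst, which is one reason the two approaches are used together.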


There is no single operational definition of invalid clicks, and a
detailed definition can’t be given to the public because
unethical users would take advantage of it.
Search engines must either assure advertisers
that they are doing everything possible, or use
independent third-party vendors to solve the
problem.




Google’s Click Quality team tries to protect Google’s
advertising and provide customer service.
It does this through prevention and detection.
Filtering and detection on several levels help
solve the problem:
pre-filtering, online filtering, post-filtering,
automated monitoring, and manual reviews
(proactive and reactive).



Google started with only 3 filters, and the number has grown
steadily over the years. Filters are prioritized by the order in which
they are applied when checking for invalid clicks.
New filters are tested before they are put into use; those
that pass require constant tuning and
maintenance to keep performing well.
When Google finds that its filters missed invalid
clicks, it gives credits to the affected advertisers and
tries to fix its filters.
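An ordered chain of filters, with missed clicks credited back afterwards, might look like the following Python sketch. The filter names, the `apply_filters` and `credits_owed` helpers, and the click fields are hypothetical; the sketch only illustrates two ideas from the text: filters are checked in priority order, and advertisers are credited for invalid clicks the filters missed.

```python
from typing import Any, Callable, Dict, List, Optional, Set, Tuple

Click = Dict[str, Any]
# A filter is (name, predicate); predicates return True for clicks judged invalid.
Filter = Tuple[str, Callable[[Click], bool]]

# Hypothetical filters, listed in priority order.
FILTERS: List[Filter] = [
    ("blocked_source", lambda c: c["source"] in {"known-bot"}),
    ("too_fast", lambda c: c["gap_seconds"] < 1.0),
]


def apply_filters(click: Click, filters: List[Filter] = FILTERS) -> Optional[str]:
    """Return the name of the first filter that flags the click, or None if it passes."""
    for name, is_invalid in filters:
        if is_invalid(click):
            return name
    return None


def credits_owed(clicks: List[Click], later_found_invalid: Set[str]) -> float:
    """Sum the cost of clicks that passed every filter but were later found invalid."""
    return sum(
        c["cost"]
        for c in clicks
        if apply_filters(c) is None and c["id"] in later_found_invalid
    )


clicks = [
    {"id": "1", "source": "known-bot", "gap_seconds": 5.0, "cost": 0.50},
    {"id": "2", "source": "x", "gap_seconds": 0.2, "cost": 0.50},
    {"id": "3", "source": "y", "gap_seconds": 30.0, "cost": 0.75},
]
print([apply_filters(c) for c in clicks])
print(credits_owed(clicks, {"3"}))
```

Returning the first matching filter's name also records which filter caught each click, which is the kind of information needed when tuning or reordering filters.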

4 types of clicks:
True Positive (TP): invalid, correctly identified as invalid.
True Negative (TN): valid, correctly identified as valid.
False Positive (FP): valid, incorrectly identified as invalid.
False Negative (FN): invalid, incorrectly identified as valid.



TP+TN+FP+FN=N (total number of clicks).
The accuracy rate of a filter equals (TP+TN)/N,
and its error rate equals (FP+FN)/N.
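These formulas can be checked with a short Python function that tallies the four outcomes and computes both rates. The function name and argument layout are illustrative, and the sketch assumes the true validity of each click is known, which is hard to obtain in practice.

```python
from typing import Dict, List


def filter_rates(actually_invalid: List[bool], flagged: List[bool]) -> Dict[str, float]:
    """Compute TP/TN/FP/FN counts plus accuracy and error rates for one filter.

    actually_invalid[i] is True if click i is truly invalid;
    flagged[i] is True if the filter marked click i as invalid.
    """
    if len(actually_invalid) != len(flagged) or not flagged:
        raise ValueError("need two non-empty lists of equal length")
    pairs = list(zip(actually_invalid, flagged))
    tp = sum(1 for a, f in pairs if a and f)
    tn = sum(1 for a, f in pairs if not a and not f)
    fp = sum(1 for a, f in pairs if not a and f)
    fn = sum(1 for a, f in pairs if a and not f)
    n = tp + tn + fp + fn  # equals the total number of clicks
    return {
        "TP": tp, "TN": tn, "FP": fp, "FN": fn,
        "accuracy": (tp + tn) / n,
        "error": (fp + fn) / n,
    }


print(filter_rates([True, True, False, False, False],
                   [True, False, True, False, False]))
```

For these five clicks, TP = 1, TN = 2, FP = 1 and FN = 1, so the accuracy rate is 3/5 = 0.6 and the error rate is 2/5 = 0.4; the two rates always sum to 1.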



These rates are hard for Google to measure, because it doesn’t know
the actual validity of every click.
Each filter detects only 2-3% beyond what the
other filters already detect.
Offline invalid click detection methods catch few invalid
clicks in comparison to the filters.
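The small marginal contribution of each filter can be measured by comparing its detections against the union of all the other filters' detections. In this Python sketch the filter names and click IDs are made up, and the denominator (all clicks flagged by any filter) is an assumption, since the text does not say what the 2-3% is a percentage of.

```python
from typing import Dict, Set


def unique_share(detected: Dict[str, Set[str]]) -> Dict[str, float]:
    """For each filter, the fraction of all flagged clicks that only it detected."""
    all_flagged: Set[str] = set().union(*detected.values())
    if not all_flagged:
        return {name: 0.0 for name in detected}
    shares: Dict[str, float] = {}
    for name, ids in detected.items():
        others = set().union(*(s for other, s in detected.items() if other != name))
        shares[name] = len(ids - others) / len(all_flagged)
    return shares


print(unique_share({
    "filter_a": {"c1", "c2", "c3"},
    "filter_b": {"c3", "c4"},
    "filter_c": {"c2"},
}))
```

Here four distinct clicks are flagged in total; filter_a and filter_b each catch one click nobody else does (a share of 0.25), while filter_c adds nothing new (0.0), even though it flags a click.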