Research on the Methods of Micro-blog Centralization for Online-Advertising Targeting
Qian Yuan1, Shuqin Cai1,2, Lei Duan1, Feng Liang1
1 School of Management, Huazhong University of Science and Technology, Wuhan, China
2 Weibo, Sina
caishuqin@sina.com, fliang76@gmail.com, hustyq@hust.edu.cn
Abstract: Online-advertising targeting provides a segmentation strategy that is critical in marketing. However, decentralized UGC on micro-blog platforms fragments content across space and time. Prior research predicts customers' advertising preferences only from a statistical perspective and cannot meet marketing needs because it does not consider the features of UGC. This paper focuses on the micro-blog centralization process to improve advertising effectiveness: it treats different types of micro-blog information as raw material and explores effective information-processing methods for advertising targeting by micro-blog service providers. Our work is shown to be valuable in the micro-blog context with decentralized UGC.
Keywords: UGC, Targeted Advertising
Supported by projects of the National Natural Science Foundation of China (71071066, 7073101) and a project of the Ministry of Education Research of Social Sciences of China (11YJA630098).
Introduction
With the boom of micro-blogging, micro-blogs have become the most important platform on which companies design targeted online advertising as a marketing strategy, because micro-blogs have great advantages in transmission, interactivity, and accuracy. Advertising has become the major revenue source of micro-blog websites. The huge amount of UGC on micro-blogs, which includes customers' preferences, behavior, and context information, creates a data context for online-advertising targeting, which has developed from the individual level (property-centric) to the personal level (customer-centric) and the contextual level (context-centric). Nevertheless, the vast majority of websites are dissatisfied with their advertising revenue, and so are audiences [1]. Fragmented and decentralized micro-blogs impede the use of their content in advertising strategies. From a marketing perspective, extracting information from fragmented, decentralized micro-blog UGC is therefore both critical and urgent for targeted online advertising.
Micro-blog websites are representative UGC (user-generated content) websites. There are two main advertising approaches for advertisers in the UGC context: publishing professional ads alongside user-generated content, or asking users to create ads for the company's brand [2]. This paper focuses on the former. Targeted advertising can be categorized into content-oriented and behavior-oriented advertising. From the content-oriented perspective, research mainly focuses on keyword extraction [3], short-text similarity calculation [4], text-matching strategies [5], and so on. From the behavior-oriented perspective, the main solution is to collect feedback on users' click actions and combine it with content-oriented targeting to match ads with search keywords [6, 7]. To improve the level of Web services, Goethals et al. [8] suggest paying more attention to the centralization and decentralization of data and processes. The work of Paltoglou et al. [9], which merges the results of search engines such as Google from a centralization perspective, is valuable for increasing the accuracy and relevance of search results. Goldszmidt et al. propose four indicators by comparing the centralization and decentralization paradigms, and suggest that the two paradigms can and should coexist in online marketing [10].
The research mentioned above provides insights into centralization; however, some problems remain: 1) prior research mainly studies traditional online content and does not consider the new characteristics of micro-blogs; 2) research on centralization is mainly positioned at a qualitative, macroscopic level, and its limited operability makes it difficult to meet micro-level micro-blog information-processing needs; 3) there is much research on measurement algorithms, but few systematic studies from the perspective of marketing that consider customers' demands.
From the perspective of marketing, studies that consider companies' marketing needs, customers' specific demands, and the requirements of online-advertising targeting are urgently needed to develop systematic centralization methods that combine information/knowledge management with marketing science. Therefore, taking micro-blog service websites as the background, we propose a UGC centralization method to meet companies' online-advertising targeting needs on micro-blog platforms. We summarize advertising targeting needs in terms of location, time, behavior, and content; extract centralization features in terms of user, content, and trends; and present a three-layer (person level, group level, and platform level) framework for centralizing UGC.
1. Space and Features of Centralization for Advertising Targeting
In our study, online-advertising targeting is based on the marketing demands of advertising targeting and the features of UGC on micro-blog platforms. It learns from customers' personal information and context to deliver the right ads with the right content to the right micro-blog users at the right time, in the right position on the right page. Business marketing generally targets customers based on product- or service-related demographics, geographic location, customer psychographics, and behavioral factors [11]. The new UGC context, however, offers advertisers new opportunities to realize such targeting. We categorize targeting needs as location-oriented, time-oriented, behavior-oriented, and content-oriented.
We propose a micro-blog centralization space to support the marketing aim of online-advertising targeting, as illustrated in Fig 1. Although micro-blog websites attract huge amounts of content, they also fragment the information about users and content, which in turn results in the decentralization of micro-blogs. Micro-blog centralization is a user-centric procedure that treats micro-blog content as a corpus from which to obtain users' space-time-related, behavior-related, and subject-term-related information, driven by the demands of advertising targeting.

Fig 1 Centralization space for micro-blog advertising targeting (dimensions: behavior-oriented targeting, with behavior strength and behavior frequency; content-oriented targeting, with number and frequency of subject terms; space-time-oriented targeting, with time and space)
Based on the centralization space above, we analyze three kinds of features here. A micro-blog website contains different kinds of micro-blog content, but not all of them are suitable for targeted advertising. For example, the homepage is not a good place for ads, because its content is not convergent and cannot represent customers' preferences. For this reason, we estimate customers' potential preferences by integrating different levels of UGC as corpora. The corpora are split into three levels: personal micro-blogs (X1), micro-group blogs (X2), and whole-level micro-blogs (X3), as illustrated in Fig 2. X1 consists of the micro-blogs posted by users themselves on their personal pages, including the set of original micro-blog content, the user's profile information, the time sequence of the user's posting actions, and the set of locations from which the user posts. X2 consists of the micro-blogs posted by users in a micro-group and shown on the micro-group page, including the original content generated by users; the features of micro-group blogs are the group's topics and categories. X3 consists of all micro-blogs on the website, including micro-blog content, temporal information, and emerging topics. Based on the marketing requirements of online-advertising targeting, these three levels can be transformed into user features, content features, and trend features, respectively.

Fig 2 The levels of micro-blogs: personal micro-blog (X1), micro-group blog (X2), and whole-level micro-blog (X3)
• User features
User features (Y1) are derived from personal micro-blogs and relate to the user's behavior. Online user behavior is divided into short-term and long-term behavior. Occasional, unpredictable short-term behaviors occur frequently within a specific period, because the user's interest in a particular entity or event is relatively unstable. Long-term behaviors arise from the user's stable interests, so related behaviors recur over a long period.
We characterize user features as Y1 = {(K, F, S), CT, CL}, where K is the set of ad-related keyword terms in the micro-blogs, F is the keywords' frequency, S is the keywords' strength, CT is the set of the micro-blogs' centralized timestamps, and CL is the set of the micro-blogs' centralized locations.
• Content features
Content features (Y2) reflect the features mined from micro-group content related to ad themes. The topics in a micro-group blog are convergent: joining a particular group implies potential interest in the content posted to that group. The type and domain of a micro-group can be recognized from the group topics and the group profile information on the micro-group page. Delivering ads to a user performs well only if he or she comments on a product mainly with positive sentiment [12], which accelerates the realization of marketing strategies.
We define content features as Y2 = {K, N, E}, where K is the collection of ad-related keywords from micro-group blogs, N is the number of times the keywords appear, and E is the sentiment polarity of the related keywords.
• Trend features
Trend features (Y3) are the trending topics appearing in whole-level micro-blogs. Micro-blog service providers usually present hot topics to users in ranked order. Ad-related topics can draw users' attention and encourage them to join the discussion, which is an excellent opportunity to enhance the interaction between companies and customers. In Twitter's profit model, the "promoted trends" strategy commands the highest charge, placing brand promotion information directly at the top of the hot-topic list in the sidebar. The trend centralization method proposed here can combine ad delivery with current hot topics and guide users into topic discussions to promote companies' marketing strategies, because hot topics are noticed by many users and exposure rate is critical for companies.
Trend features can be defined as Y3 = {K, N, ST}, where K is the collection of keywords related to the ad theme in whole-level micro-blogs, N is the number of times the keywords appear, and ST represents the stage of the keywords' life cycle within a hot topic.
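The three feature sets Y1, Y2, and Y3 can be written down as plain record types. The sketch below is a minimal Python rendering under our own assumptions: keywords are strings, frequencies and counts are integers, strength, polarity, and life-cycle stage are the categorical labels described in this section, and centralized timestamps are stored as (weekday, hour) slots. All class and field names are hypothetical and are not part of the original method.

```python
from dataclasses import dataclass, field
from enum import Enum


class Strength(Enum):
    # Behavior strength categories used in Y1.
    STRONG = "strong"
    NEUTRAL = "neutral"
    WEAK = "weak"


class Polarity(Enum):
    # Sentiment polarity E of a keyword in Y2.
    POSITIVE = 1
    NEGATIVE = -1


class Stage(Enum):
    # Life-cycle stage ST of a keyword within a hot topic in Y3.
    BIRTH = "birth"
    GROWTH = "growth"
    MATURITY = "maturity"
    DEATH = "death"


@dataclass
class UserFeatures:
    """Y1 = {(K, F, S), CT, CL}."""
    # K -> (F, S): frequency and strength of each ad-related keyword.
    keywords: dict[str, tuple[int, Strength]] = field(default_factory=dict)
    # CT: centralized posting times, stored here (an assumption) as (weekday, hour) slots.
    times: set[tuple[str, int]] = field(default_factory=set)
    # CL: centralized posting locations.
    locations: set[str] = field(default_factory=set)


@dataclass
class ContentFeatures:
    """Y2 = {K, N, E}: keyword -> (number of occurrences N, polarity E)."""
    keywords: dict[str, tuple[int, Polarity]] = field(default_factory=dict)


@dataclass
class TrendFeatures:
    """Y3 = {K, N, ST}: keyword -> (number of occurrences N, life-cycle stage ST)."""
    keywords: dict[str, tuple[int, Stage]] = field(default_factory=dict)
```

Keeping the three feature sets as separate types mirrors the three centralization processes, so each can be filled independently from its own corpus level.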
2. Centralization Processing Procedure and Methods of Micro-blogs
2.1 Centralization Processing Procedure of Micro-blogs
Micro-blog centralization processing is a procedure that derives different centralized results from personal micro-blogs, micro-group blogs, and whole-level micro-blogs and, using a suitable advertising-targeting approach, matches them with different customer demands. The centralized results are matched to customers' advertising-targeting needs according to the matching degree, as illustrated in Fig 3. Personal micro-blogs (X1) are turned into user features (Y1) by the user-centric centralization process (P1); micro-group blogs (X2) are turned into content features (Y2) by the content-centric centralization process (P2); and whole-level micro-blogs (X3) are turned into trend features (Y3) by the trends-centric centralization process (P3).
Fig 3 Centralization processing of micro-blogs (input: decentralized micro-blogs at the personal level (X1), group level (X2), and whole level (X3); centralized processing procedures P1, P2, and P3; output: user feature (Y1), content feature (Y2), and trend feature (Y3), mapped onto the targeting-advertising space of space-time (A), behavior (B), and content (C))
2.2 Centralization Processing Methods of Micro-blogs
2.2.1 User-Centric Processing Methods
• Centralization based on behavior frequency and strength features
Frequency features are mainly reflected in the freshness of short-term behavior and the dispersion of long-term behavior. Freshness is the average normalized posting time in an effective cluster. Dispersion is the mean-square deviation of the normalized posting times in an effective cluster. Micro-blogs can be clustered based on the similarity between their keywords: when the similarity between two micro-blogs is greater than a pre-given threshold α, they are placed in the same cluster. A cluster can reflect the user's behavior characteristics only if its share of the user's total micro-blogs reaches a certain level. This share is defined as the behavior factor, denoted ξ. Thus, only if the number of micro-blogs in a cluster is greater than ξ×m, where m is the total number of micro-blogs, can the cluster be considered an effective cluster that reflects the user's behavior characteristics; otherwise it is a noise cluster.
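The clustering and noise-filtering rule above can be sketched as follows. This is a minimal Python illustration under stated assumptions: each micro-blog is represented by its keyword set, similarity is Jaccard similarity (the paper does not name a measure), clusters are formed greedily by comparing each micro-blog with a cluster's first member, and posting times are already normalized to [0, 1]. Dispersion is computed as the population standard deviation, which is one reading of "mean-square deviation". All function names are hypothetical.

```python
import statistics
from dataclasses import dataclass


@dataclass
class MicroBlog:
    keywords: frozenset[str]
    t: float  # posting time, assumed already normalized to [0, 1]


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    # Keyword-set similarity; an assumed stand-in for the paper's similarity measure.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(blogs: list[MicroBlog], alpha: float) -> list[list[MicroBlog]]:
    # Greedy single pass: join the first cluster whose seed (first member)
    # has similarity > alpha with the micro-blog, otherwise start a new cluster.
    clusters: list[list[MicroBlog]] = []
    for blog in blogs:
        for c in clusters:
            if jaccard(blog.keywords, c[0].keywords) > alpha:
                c.append(blog)
                break
        else:
            clusters.append([blog])
    return clusters


def effective_clusters(blogs: list[MicroBlog], alpha: float, xi: float) -> list[list[MicroBlog]]:
    # A cluster is effective only if it holds more than xi * m micro-blogs,
    # where m is the user's total number of micro-blogs; the rest are noise.
    m = len(blogs)
    return [c for c in cluster(blogs, alpha) if len(c) > xi * m]


def freshness(c: list[MicroBlog]) -> float:
    # Average normalized posting time of the cluster.
    return statistics.fmean(b.t for b in c)


def dispersion(c: list[MicroBlog]) -> float:
    # Spread of normalized posting times (population standard deviation).
    return statistics.pstdev(b.t for b in c)
```

Raising ξ in this sketch removes more small clusters, which is the behavior the case study reports for Table 5.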
Strength features depend on how often actions are repeated, which is reflected in the number of times keywords appear. Behavior strength is divided into three categories: strong, neutral, and weak. The minimum and maximum occurrence counts are taken as the two ends of an interval, which is trisected equally to give the range of each category. Strength is a necessary attribute of every keyword.
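The trisection rule can be sketched in a few lines of Python. How boundary values are assigned is not specified in the text, so here each band is assumed to be closed at its lower end, with the top band also containing the maximum; treating an all-equal input as "neutral" is likewise an assumption. Function names are hypothetical.

```python
def strength_category(count: int, min_count: int, max_count: int) -> str:
    """Map a keyword's occurrence count to weak / neutral / strong by trisecting [min, max]."""
    if max_count == min_count:
        # Degenerate interval: every keyword is equally frequent (assumed neutral).
        return "neutral"
    width = (max_count - min_count) / 3
    if count < min_count + width:
        return "weak"
    if count < min_count + 2 * width:
        return "neutral"
    return "strong"


def categorize(counts: dict[str, int]) -> dict[str, str]:
    """Assign a strength category to every keyword from its occurrence count."""
    lo, hi = min(counts.values()), max(counts.values())
    return {k: strength_category(v, lo, hi) for k, v in counts.items()}
```

For example, with counts of 10, 4, and 1 the interval [1, 10] splits at 4 and 7, giving strong, neutral, and weak respectively.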
• Centralization based on behavior space-time features
The space and time features of user behavior, which are often used in marketing practice, are obtained by centralizing the location and time attributes of all micro-blogs posted by a user. The purpose is to choose the right time slot in which to deliver an advertiser's ads, based on the user's login and posting actions and on the location and time of posting.
Location features are the result of centralizing the location information attached to the micro-blogs; keywords related to the current location are recognized by NLP (Natural Language Processing). Time features are obtained by dividing the posting times of the micro-blogs related to a specific keyword by day of the week and hour of the day, and then determining which section each micro-blog falls into, as illustrated in Fig 4.

Fig 4 Centralization of User's Time Feature (grid of days of the week, Mon to Sun, by hours of the day, 1 to 24)
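The weekday-by-hour division can be sketched as a 7 × 24 count grid. This minimal Python version assumes the posting times of the keyword-related micro-blogs are already available as `datetime` values, numbers hour cells 1 to 24 as in Fig 4 (cell h covers clock hour h-1), and treats the busiest cells as candidate delivery slots; that last step and the function names are our own illustration.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def time_grid(post_times: list[datetime]) -> Counter:
    """Count posts in each (weekday, hour) cell of a 7 x 24 grid like Fig 4.

    Hour cells are numbered 1..24 as in Fig 4, so cell h covers clock hour h-1.
    """
    return Counter((DAYS[t.weekday()], t.hour + 1) for t in post_times)


def peak_slots(post_times: list[datetime], top_n: int = 3) -> list[tuple[str, int]]:
    """The top_n busiest cells, as candidate time slots for delivering ads."""
    return [cell for cell, _ in time_grid(post_times).most_common(top_n)]
```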
2.2.2 Content-Centric Processing Methods
Feature extraction is the first step in content-centric processing; the next is constructing a keyword polarity dictionary for a specific domain to recognize users' opinion trends. Finally, the centralized content characteristics of the micro-blogs can be mined.
• Feature extraction based on micro-blog content
Feature extraction is defined as automatically extracting related content features from large numbers of micro-blogs through machine learning. Such extraction techniques are widely used in opinion analysis of customer product reviews. Here, micro-blogs are treated as a kind of product review, except that they are less focused. The main steps of association-rule mining based on content features are: 1) tagging keyword polarity; 2) composing a transaction file of nouns and noun phrases; 3) extracting frequent keywords by association-rule mining; 4) pruning feature rules based on adjacent rules; 5) pruning feature rules based on dependent support; 6) forming the micro-blog characteristics collection from the frequent items; and 7) finally, complementing the domain or product characteristics with the infrequent items in the micro-blogs.
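Steps 2 and 3 above can be illustrated with a level-wise, Apriori-style search for frequent itemsets. The paper does not name its association-rule algorithm, so Apriori is an assumption here, and the sketch covers only candidate generation and support counting: polarity tagging, both pruning steps, and infrequent-feature completion are not shown. Each transaction is assumed to be the set of nouns and noun phrases already extracted from one micro-blog; the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_features(transactions: list[set[str]], min_support: float,
                      max_len: int = 2) -> dict[frozenset[str], float]:
    """Level-wise (Apriori-style) mining of frequent noun / noun-phrase itemsets.

    Returns each frequent itemset of size <= max_len with its support.
    """
    n = len(transactions)
    result: dict[frozenset[str], float] = {}
    # Level 1: single items that reach the minimum support.
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]) for i, c in counts.items() if c / n >= min_support}
    k = 1
    while current and k <= max_len:
        for itemset in current:
            result[itemset] = sum(itemset <= t for t in transactions) / n
        k += 1
        # Candidates of size k are unions of frequent (k-1)-itemsets; keep only those
        # whose every (k-1)-subset is frequent (the Apriori property).
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates
                   if sum(c <= t for t in transactions) / n >= min_support}
    return result
```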
• Sentiment analysis based on micro-blog content
Sentiment analysis further examines the sentiment trends toward the product characteristics mentioned in micro-blogs: it is defined as classifying a user's sentiment polarity, positive or negative, toward a characteristic obtained by feature mining. Users' sentiment trends can then be summarized by classifying the sentiment of a huge number of micro-blogs. Polarity dictionary construction and sentiment classification are central to sentiment analysis. The authors manually selected 445 positive and 337 negative adjective keywords from the dataset used; these keywords are meaningful in marketing. Because Chinese is a high-context language, some polarity keywords reverse their polarity when describing certain features in a particular context; these are defined as abnormal polarity keywords. Additional work has been done to collect polarity keywords in the specific domain and complement the polarity keyword dictionary.
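A dictionary-based polarity decision with abnormal polarity keywords can be sketched as below. The tiny dictionaries are hypothetical placeholders, not the authors' 445/337-word lists. The abnormal entries follow the common Chinese example in which 高 (high) is positive for 性能 (performance) but negative for 油耗 (fuel consumption). We assume the feature and the adjectives modifying it have already been paired (for example by a parser, not shown) and that the sentence's polarity is the sign of the summed scores; both are our assumptions.

```python
# Hypothetical mini-dictionaries; the paper's full lists are not reproduced here.
POSITIVE = {"漂亮", "舒服"}  # beautiful, comfortable
NEGATIVE = {"难看", "差"}    # ugly, poor
# Abnormal polarity keywords: polarity depends on the feature being described.
# 高 = high, 低 = low, 性能 = performance, 油耗 = fuel consumption.
ABNORMAL = {("高", "性能"): 1, ("高", "油耗"): -1, ("低", "油耗"): 1, ("低", "性能"): -1}


def polarity(feature: str, adjectives: list[str]) -> int:
    """Return +1 / -1 / 0 for the sentiment the adjectives express toward the feature."""
    score = 0
    for adj in adjectives:
        if (adj, feature) in ABNORMAL:
            # Context-dependent keyword: look up its polarity for this feature.
            score += ABNORMAL[(adj, feature)]
        elif adj in POSITIVE:
            score += 1
        elif adj in NEGATIVE:
            score -= 1
    # Sign of the summed score: 1 positive, -1 negative, 0 undecided.
    return (score > 0) - (score < 0)
```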
Table 1 Formulas for calculating the text sentiment classification indexes
Index | Positive | Negative
Precision | PP = a1 / b1 × 100% | PN = a2 / b2 × 100%
Recall | RP = a1 / c1 × 100% | RN = a2 / c2 × 100%
Sentiment classification of micro-blogs is measured by precision and recall, indexes borrowed from text topic classification. Let a1 be the number of correctly classified texts among those the classifier labels positive, a2 the number of correctly classified texts among those the classifier labels negative, b1 the number of texts the classifier labels positive, b2 the number of texts the classifier labels negative, c1 the actual number of positive texts, and c2 the actual number of negative texts. The formulas are listed in Table 1.
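The Table 1 formulas translate directly into code. The sketch below also derives a1 through c2 from a 2 × 2 confusion matrix (rows: predicted, columns: actual) and adds an overall index, (a1 + a2) / (b1 + b2), which we assume is what the case study reports as "total precision". Function names are hypothetical.

```python
def sentiment_indexes(a1: int, a2: int, b1: int, b2: int, c1: int, c2: int) -> dict[str, float]:
    """Precision and recall (in %) per class, following the formulas in Table 1."""
    return {
        "PP": a1 * 100 / b1,                    # positive precision
        "PN": a2 * 100 / b2,                    # negative precision
        "RP": a1 * 100 / c1,                    # positive recall
        "RN": a2 * 100 / c2,                    # negative recall
        "overall": (a1 + a2) * 100 / (b1 + b2),  # share of all texts classified correctly
    }


def from_confusion(pp_ap: int, pp_an: int, pn_ap: int, pn_an: int) -> dict[str, float]:
    """Derive a1..c2 from a confusion matrix: pp_an = predicted positive, actually negative, etc."""
    a1, a2 = pp_ap, pn_an                   # correctly classified positive / negative texts
    b1, b2 = pp_ap + pp_an, pn_ap + pn_an   # texts labeled positive / negative by the classifier
    c1, c2 = pp_ap + pn_ap, pp_an + pn_an   # actual positive / negative texts
    return sentiment_indexes(a1, a2, b1, b2, c1, c2)
```

Feeding in the Table 4 counts (32, 9, 18, 41) gives PP ≈ 78.0%, RP = 64%, PN ≈ 69.5%, RN = 82%, and an overall value of 73%, which agrees with the figures reported in the case study.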
2.2.3 Trends-Centric Processing Methods
Trends-centric processing is divided into three stages: short-text clustering, evolution-cycle filtering, and hot-topic cluster ranking. The key task is to filter out the clusters that are related to the advertiser's needs and are in the rising stage.
A set of hot-topic micro-blog clusters can be obtained by micro-blog text clustering. It is reasonable to assume that each hot topic follows an evolutionary cycle, as illustrated in Fig 5. Over its life cycle, a hot topic passes through birth, growth, maturity, and finally death: it rises quickly from valley to peak and then declines slowly back to the valley. Evolution-cycle filtering keeps the clusters that fall in the rectangle where the hotness h lies in (h1, h2) and the time t lies in (t1, t2). The value of h1 should depend on the actual situation. If h1 is set too small, the content may not be convergent enough; if h1 is set too large, too little content passes the filter, and advertising opportunities are missed from a marketing perspective. Based on the shape of the evolution curve of hot-topic events, clusters that do not fit the curve can be removed even if their similarity scores are high enough.

Fig 5 Evolution Cycle of Hot Topic Clusters (hotness versus time t, with hotness levels h1, h2 and times t1, t2 marked)
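The rectangle filter and the rising-stage requirement can be sketched as follows. We assume each cluster's hotness is available as a numeric series sampled at consecutive time steps (for example, posts per hour), and we approximate "in the up-going stage" by strictly increasing hotness over the last few steps. That approximation, the `window` parameter, and all names are hypothetical simplifications of the evolution-curve fit described above.

```python
from dataclasses import dataclass


@dataclass
class TopicCluster:
    name: str
    hotness: list[float]  # hotness h sampled at consecutive time steps t = 0, 1, 2, ...


def in_window(c: TopicCluster, t: int, h1: float, h2: float, t1: int, t2: int) -> bool:
    # Keep the cluster only if its point (t, h) lies inside the rectangle (t1, t2) x (h1, h2).
    return t1 < t < t2 and h1 < c.hotness[t] < h2


def is_rising(c: TopicCluster, t: int, window: int = 3) -> bool:
    # Simplified up-going test (an assumption): hotness strictly increased
    # at each of the last `window` steps up to time t.
    if t < window:
        return False
    recent = c.hotness[t - window : t + 1]
    return all(a < b for a, b in zip(recent, recent[1:]))


def filter_clusters(clusters: list[TopicCluster], t: int, h1: float, h2: float,
                    t1: int, t2: int) -> list[TopicCluster]:
    """Clusters that are inside the hotness/time window and still rising at time t."""
    return [c for c in clusters if in_window(c, t, h1, h2, t1, t2) and is_rising(c, t)]
```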
After evolution-cycle filtering, the candidate hot-topic clusters can be ranked. Clusters with more corpus items and higher weights in their feature vectors are more likely to be hot corpus clusters. The clusters are then filtered by a pre-given threshold on their ranking position. The selected clusters that fit the evolutionary rules are re-ranked by the weights of their feature items: the larger a feature item's weight, the more valuable it is for delivering related ads.
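The two ranking passes can be sketched as below. The paper does not give a scoring formula, so scoring a cluster by its size times the sum of its feature weights is purely an assumption, as is reading the ranking threshold as "keep the top_k clusters". Feature items are then re-ranked by their largest weight across the kept clusters. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    size: int                  # number of micro-blogs (corpus items) in the cluster
    weights: dict[str, float]  # feature item -> weight in the cluster's feature vector


def rank_clusters(clusters: list[Cluster], top_k: int) -> list[Cluster]:
    # Assumed score: cluster size times the sum of its feature weights,
    # so larger clusters with heavier feature vectors rank higher.
    ranked = sorted(clusters, key=lambda c: c.size * sum(c.weights.values()), reverse=True)
    return ranked[:top_k]


def ad_keywords(clusters: list[Cluster], top_k: int) -> list[str]:
    # Re-rank the feature items of the kept clusters by weight, heaviest first.
    best: dict[str, float] = {}
    for c in rank_clusters(clusters, top_k):
        for item, w in c.weights.items():
            best[item] = max(w, best.get(item, 0.0))
    return sorted(best, key=lambda item: best[item], reverse=True)
```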
3. Cases and Conclusions
3.1 Cases
Weibo (http://www.weibo.com), which has a high level of openness and user activity and is regarded as Twitter's alternative in mainland China, is chosen as the source of the original corpus. The content-centric and user-centric processing procedures described above are executed to realize advertising targeting as a marketing strategy. The micro-group named “car”, which has the most users among car-related micro-groups, is chosen as the information source. As of November 1, 2012, “car” had 39,255 members and 23,096 micro-blogs, and about 150 micro-blogs were posted per day on average in our dataset.
• Content-centric processing procedure
Based on the feature extraction results for car-related product micro-blogs, the top 8 keywords related to car attributes are identified by attention ranking. Sentiment trends are recognized at the sentence level using the classification training model for product categories. The results of sentiment analysis, obtained by combining the polarity dictionary with a sentiment analysis tool, are displayed in Table 2.
Table 2 Attributes and polarity of car-related micro-blogs
Rank | Attribute | Positive micro-blogs | Negative micro-blogs | Total micro-blogs | Rate of positive micro-blogs (%)
1 | 新款 (new model) | 893 | 490 | 1383 | 65
2 | 兰博基尼 (Lamborghini) | 605 | 137 | 742 | 82
3 | 改装 (modification) | 557 | 160 | 717 | 78
4 | 宝马 (BMW) | 487 | 207 | 694 | 70
5 | 奥迪 (Audi) | 433 | 237 | 670 | 65
6 | 跑车 (sports car) | 391 | 227 | 618 | 63
7 | 发动机 (engine) | 311 | 200 | 511 | 61
8 | 车型 (car model) | 288 | 213 | 501 | 57
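Each row of Tables 2 and 3 is an aggregate over sentence-level polarity labels: positive count, negative count, their total, and the positive share. The sketch below assumes the classifier output is a list of (attribute, polarity) pairs with polarity +1 or -1, rounds the rate half up (which reproduces the published percentages), and orders attributes by total mentions as a stand-in for the "attention ranking"; that ordering rule and the function name are assumptions.

```python
from collections import defaultdict


def summarize(labeled: list[tuple[str, int]]) -> list[tuple[str, int, int, int, int]]:
    """Aggregate (attribute, polarity) pairs, polarity in {+1, -1}, into table rows.

    Each row is (attribute, positive, negative, total, positive rate in %).
    """
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # attribute -> [pos, neg]
    for attr, pol in labeled:
        counts[attr][0 if pol > 0 else 1] += 1
    rows = []
    for attr, (pos, neg) in counts.items():
        total = pos + neg
        rate = int(pos * 100 / total + 0.5)  # positive rate in %, rounded half up
        rows.append((attr, pos, neg, total, rate))
    # Rank attributes by total mentions (assumed proxy for attention ranking).
    return sorted(rows, key=lambda r: r[3], reverse=True)
```

With the BMW and Audi counts from Table 2 as input, this returns the rows (宝马, 487, 207, 694, 70) and (奥迪, 433, 237, 670, 65).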
• User-centric processing
Users interested in cars can be found through the tag “car”. A user with a high rate of original posts, named “疯狂的石头”, is chosen randomly. He is a VIP and is authenticated as a “well-known micro-blogger” on Weibo. His profile shows 293 followees, 49,388 followers, and 1,270 micro-blogs, and he tags himself “After 80s”, “Car”, “Subaru”, “Sport”, “Car racer”, and “Essay”. He is clearly a typical car enthusiast, and according to his micro-blog content, his profile involves no business promotion.
Table 3 Attributes and polarity of car-related micro-blogs posted by the user
Rank | Attribute | Positive micro-blogs | Negative micro-blogs | Total micro-blogs | Rate of positive micro-blogs (%)
1 | 奥迪 (Audi) | 55 | 12 | 67 | 82
2 | 赛车 (racing car) | 41 | 9 | 50 | 82
3 | 大众 (Volkswagen) | 35 | 14 | 49 | 71
4 | 沙漠 (desert) | 33 | 16 | 49 | 67
5 | 宝马 (BMW) | 23 | 17 | 40 | 58
6 | 法拉利 (Ferrari) | 29 | 5 | 34 | 85
7 | 德国 (Germany) | 14 | 14 | 28 | 50
8 | 动力 (power) | 6 | 14 | 20 | 30
898 micro-blogs were collected through LocoySpider. After preprocessing, car-related feature keywords whose frequency exceeds ten were extracted, and some synonyms were merged, such as “沙漠” (desert) and “戈壁滩” (Gobi), “法拉利” (Ferrari) and “F430”, “宝马” and “BMW”, “赛车” (racing car) and “赛道” (race track), and “奥迪” (Audi) and “A4L”. The top 8 characteristic keywords were obtained and their sentiment polarity recognized. The results are displayed in Table 3.
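The synonym merging and frequency cut-off can be sketched as follows, using the synonym pairs listed in the case study. We assume the micro-blogs are already segmented into tokens and that frequency means raw token count across all posts; Chinese word segmentation and the car-relatedness filter are not shown, and the function name is hypothetical.

```python
from collections import Counter

# Synonym pairs merged in the case study (variant -> canonical keyword).
SYNONYMS = {"戈壁滩": "沙漠", "F430": "法拉利", "BMW": "宝马", "赛道": "赛车", "A4L": "奥迪"}


def feature_keywords(tokenized_posts: list[list[str]], min_freq: int = 10,
                     top_n: int = 8) -> list[tuple[str, int]]:
    """Merge synonyms, keep keywords whose frequency exceeds min_freq, return the top_n."""
    counts = Counter(SYNONYMS.get(tok, tok) for post in tokenized_posts for tok in post)
    frequent = [(k, c) for k, c in counts.most_common() if c > min_freq]
    return frequent[:top_n]
```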
Because the data volume is huge, it is too difficult to check every sentiment recognition result. Therefore, 100 sentences were chosen as the training set and 50 sentences as the testing set; the results are listed in Table 4. The recall of positive comment sentences is 64% and their precision is 78%; the recall of negative comment sentences is 82% and their precision is 69%. The total precision is 73%.
Table 4 Frequency statistics of sentiment classification results
Predicted \ Actual | APCM | ANCM
PPCM | 32 | 9
PNCM | 18 | 41
In Table 4, APCM: Actual Positive Comment Micro-blogs; ANCM: Actual Negative Comment Micro-blogs; PPCM: Predicted Positive Comment Micro-blogs; PNCM: Predicted Negative Comment Micro-blogs.
The influence of the behavior factor ξ on ad delivery is displayed in Table 5. A larger ξ removes more noise clusters, leaving the effective clusters that reflect users' short-term and long-term behavior, so the pertinence of ads is improved.
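Table 5 The influence of the behavior factor on ad delivery
ξ | Themes of effective clusters
4% | 奥迪, 赛车, 大众, 沙漠, 宝马, 法拉利, 德国
5% | 奥迪, 赛车, 大众, 沙漠, 德国
6% | 奥迪, 赛车, 德国
7% | 奥迪, 德国
(奥迪: Audi; 赛车: racing car; 大众: Volkswagen; 沙漠: desert; 宝马: BMW; 法拉利: Ferrari; 德国: Germany)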
This case combines product feature mining in Chinese micro-blogs with sentiment analysis and validates that domain features can be summarized from a huge amount of micro-blog content. Centralized information has been extracted successfully, such as the number of comment sentences, the numbers of positive and negative opinions, the rate of positive opinions, and the domain-related features of the micro-blogs. The feature mining methods proposed in the case perform well and provide decision support for delivering targeted advertising.
3.2 Conclusions
Based on the framework of information centralization, this paper takes Weibo as the research background and the delivery of targeted ads as its entry point, and studies methods and models of micro-blog centralization based on web micro-content information processing theory at the intersection of marketing and information systems. This research constructs theories and methods aimed at solving the marketing problem of online-advertising targeting caused by information fragmentation on micro-blog websites, and seeks breakthroughs in information science and marketing for Internet companies. Since this study involves marketing strategy, knowledge management, information technology, and other research fields, some problems remain to be solved in actual marketing practice and in future research: weighting each user's micro-blogs based on the relationships between users, extending the centralization model to mine micro-blogs from a social-network perspective, increasing the use of real-time processing, and exploring more effective characteristic mining and sentiment analysis methods.
References:
[1] Beuscart J.S., Mellet K. Business Models of the Web 2.0: Advertising or the Tale of Two Stories. Communications & Strategies (2009)
[2] Krishnamurthy S., Dou W. Note from Special Issue Editors: Advertising with User-Generated Content: A Framework and Research Agenda. Journal of Interactive Advertising, Vol 8, pp. 1-4 (2008)
[3] Yih W., Goodman J., Carvalho V.R. Finding Advertising Keywords on Web Pages. In: WWW, pp. 213-222 (2006)
[4] Sahami M., Heilman T. A Web-based Kernel Function for Matching Short Text Snippets. In: Proceedings of the Workshop on Learning in Web Search at the 22nd International Conference on Machine Learning, pp. 377-386 (2005)
[5] Ribeiro-Neto B., Cristo M., Golgher P.B., de Moura E.S. Impedance Coupling in Content-targeted Advertising. In: SIGIR, pp. 15-19 (2005)
[6] Chakrabarti D., Agarwal D., Josifovski V. Contextual Advertising by Combining Relevance with Click Feedback. In: International World Wide Web Conference, pp. 417-426 (2008)
[7] Ciaramita M., Murdock V., Plachouras V. Online Learning from Click Data for Sponsored Search. In: International World Wide Web Conference, pp. 227-236 (2008)
[8] Goethals R.G., Snoeck M., Lemahieu W., et al. Considering (de)centralization in a Web Services World. In: Second International Conference on Internet and Web Applications and Services, Morne, 22 (2007)
[9] Salampasis M., Satratzemi M. A Comparison of Centralized and Distributed Information Retrieval Approaches. In: Panhellenic Conference on Informatics, Samos, pp. 21-25 (2008)
[10] Goldszmidt G., Yemini Y. Distributed Management by Delegation. In: 15th IEEE International Conference on Distributed Computing Systems (ICDCS), p. 333 (1995)
[11] Chandra A. Targeted Advertising: The Role of Subscriber Characteristics in Media Markets. The Journal of Industrial Economics, Vol 57(1), pp. 58-84 (2009)
[12] Chatterjee P., Hoffman D.L., Novak T.P. Modeling the Clickstream: Implications for Web-Based Advertising Efforts. Marketing Science, Vol 22(4), pp. 520-541 (2003)