Research Journal of Applied Sciences, Engineering and Technology 4(21): 4423-4428, 2012
ISSN: 2040-7467
© Maxwell Scientific Organization, 2012
Submitted: May 01, 2012
Accepted: June 01, 2012
Published: November 01, 2012
A Free-Rider Forecasting Model Based on Gray System Theory in P2P Networks
1,3,4He Xu, 1Zhao-xiong Zhou, 2Suo-ping Wang and 1,3,4Ru-chuan Wang
1College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
2College of Automation
3Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Jiangsu, Nanjing 210003, China
4Key Lab of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications), Ministry of Education Jiangsu Province, Nanjing 210003, China
Abstract: The aim of this study is to forecast the number of free-riders in P2P networks, so that network managers can learn the status of the networks in advance and take appropriate measures against free-riding behavior. Free-riding is common in P2P networks and has a negative impact on their robustness, availability and stability; severe free-riding may even crash the whole P2P application system. Based on research into free-riding behavior in P2P networks, this paper constructs a free-rider forecasting model (GST model) using Gray System Theory. Simulation experiments show that the model is feasible and produces reasonable predictions of the number of free-riders in P2P networks.
Keywords: Free-riding, gray system theory, P2P networks
INTRODUCTION
Since their birth, Peer-to-Peer (P2P) networks have been built on the idea of information sharing and service (Xu et al., 2010). File sharing systems based on P2P networking technology, such as Gnutella, eDonkey and BitTorrent, are very popular, and the number of their online users worldwide sometimes exceeds one million (Liao et al., 2006). However, whether each node in a P2P network actually puts the idea of information sharing into practice remains to be proven by flow measurements and statistical analysis.
Adar and Huberman (2000) pointed out that nodes in Gnutella differ greatly in how much they contribute to information sharing and network maintenance. Most nodes share no files or only a few, and some share only files that other users hardly ever access. Many nodes join P2P networks to obtain the services provided by other nodes, not to contribute to the networks willingly (Ramaswamy and Liu, 2003). This phenomenon, which is inconsistent with the collaborative sharing promoted by the P2P communication mode, is called free-riding behavior. For P2P networks to play their due role, research on free-riding behavior is imperative.
Gray System Theory holds that, in spite of the obscurity of system behavior and the complexity of the data, a system has its own order and overall function. Before the gray forecasting model is set up, the original sequence must be preprocessed; the preprocessed data sequence is called the generated column. The purpose of preprocessing is not to find the statistical rule or probability distribution of the original data, but to turn chaotic data into a regular sequence by a certain approach. A dynamic model is then established (Deng, 1990). This study first introduces free-riding behavior and Gray System Theory (gray forecasting), then gives the algorithm flow and steps of the GST model, and finally tests the model with a group of simulation experiments and analyzes the experimental results.
Free-riding behavior:
Definition 1, free-riding (Yu and Jin, 2008): The behavior of nodes in P2P networks that only enjoy information resource services without contributing to the system is called free-riding.
Definition 2, free-rider (Yu and Jin, 2008): A node that exhibits free-riding behavior is called a free-rider.
To ensure the healthy, secure and reliable operation of P2P networks, it is necessary to predict the number of free-riders in advance, so that the current and future states of the networks are known in time and appropriate measures can be adopted to inhibit severe free-riding behavior.
Corresponding Author: He Xu, College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Gray system theory (gray forecasting): The gray model has a strict theoretical basis, namely Gray System Theory, and its biggest advantage is its utility. The results forecasted by the gray model are relatively stable. The model is suitable for forecasting from a large amount of data, and its forecasts remain accurate when the amount of data is small (Deng, 1990).
The basics of gray forecasting: Gray forecasting is built on Gray System Theory and forecasts the future development trends of things as follows: firstly, it identifies the differences in development trends between system factors; secondly, it searches for the law governing how the system changes by generation processing on the original data; thirdly, it generates a data sequence with strong regularity; lastly, it sets up the relevant differential equation model. The gray forecast is obtained by reverse processing of the values forecasted by the generated data model (Audun et al., 2007).
The basic steps of gray forecasting: The single-sequence first-order linear differential equation model GM (1,1) is the most commonly used of the numerous gray models. Taking this model as an example, this subsection introduces the basic steps of gray forecasting (Deng, 1990).
Suppose the original data column is $s^{(0)} = (s^{(0)}(1), s^{(0)}(2), \ldots, s^{(0)}(n))$, where n is the number of data. The basic steps of the algorithm that sets up GM (1,1) for forecasting are as follows:
Step 1: Accumulating the original data gives a new data sequence:
$$s^{(1)} = (s^{(1)}(1), s^{(1)}(2), \ldots, s^{(1)}(n)) \tag{1}$$
where each element of $s^{(1)}$ is the accumulation of the corresponding first several data, that is:
$$s^{(1)}(t) = \sum_{k=1}^{t} s^{(0)}(k), \quad t = 1, 2, \ldots, n \tag{2}$$
$$s^{(1)}(t+1) = \sum_{k=1}^{t+1} s^{(0)}(k), \quad t = 0, 1, \ldots, n-1 \tag{3}$$
Step 2: Setting up the first-order linear differential equation of $s^{(1)}(t)$:
$$\frac{ds^{(1)}}{dt} + a s^{(1)} = u \tag{4}$$
where a and u are undetermined coefficients, called the development factor and the gray actuating quantity, respectively. The effective range of a is (−2, 2), and the vector composed of a and u is $\hat{a} = \begin{pmatrix} a \\ u \end{pmatrix}$. As long as a and u are calculated, $s^{(1)}(t)$ can be obtained and the future predicted values of $s^{(0)}$ can be calculated.
Step 3: The accumulation matrix B and the constant vector $C_n$ are generated by taking the means of adjacent accumulated data, that is:
$$B = \begin{bmatrix} -0.5\,(s^{(1)}(1) + s^{(1)}(2)) & 1 \\ -0.5\,(s^{(1)}(2) + s^{(1)}(3)) & 1 \\ \vdots & \vdots \\ -0.5\,(s^{(1)}(n-1) + s^{(1)}(n)) & 1 \end{bmatrix} \tag{5}$$
$$C_n = (s^{(0)}(2), s^{(0)}(3), \ldots, s^{(0)}(n))^T \tag{6}$$
The negative sign and the column of ones follow from discretizing Eq. (4) as $s^{(0)}(k) = -a \cdot 0.5\,(s^{(1)}(k-1) + s^{(1)}(k)) + u$ for k = 2, ..., n, so that $C_n = B\hat{a}$ up to the fitting error.
Step 4: The gray parameter $\hat{a}$ is calculated using the least-squares method, that is:
$$\hat{a} = \begin{pmatrix} a \\ u \end{pmatrix} = (B^T B)^{-1} B^T C_n \tag{7}$$
Step 5: The gray parameter $\hat{a}$ is substituted into $\frac{ds^{(1)}}{dt} + a s^{(1)} = u$, then:
$$\hat{s}^{(1)}(t+1) = \left(s^{(0)}(1) - \frac{u}{a}\right) e^{-at} + \frac{u}{a} \tag{8}$$
Since $\hat{a}$ is an approximation calculated by the least-squares method, $\hat{s}^{(1)}(t+1)$ is an approximate expression.
Step 6: Discretizing the function expressions of $\hat{s}^{(1)}(t+1)$ and $\hat{s}^{(1)}(t)$, the $s^{(0)}$ sequence is restored and the approximate data sequence $\hat{s}^{(0)}(t+1)$ is obtained:
$$\hat{s}^{(0)}(t+1) = \hat{s}^{(1)}(t+1) - \hat{s}^{(1)}(t) \tag{9}$$
Step 7: Using this model to forecast:
$$\hat{s}^{(0)} = \Big[\underbrace{\hat{s}^{(0)}(1), \hat{s}^{(0)}(2), \ldots, \hat{s}^{(0)}(n)}_{\text{simulation of the original series}},\ \underbrace{\hat{s}^{(0)}(n+1), \hat{s}^{(0)}(n+2), \ldots, \hat{s}^{(0)}(n+m)}_{\text{forecasting of the future series}}\Big] \tag{10}$$
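The seven steps of GM (1,1) can be sketched in a few lines of Python. This is only an illustrative sketch, not the MATLAB code used by the authors: the function names `gm11_fit` and `gm11_series` are hypothetical, the 2x2 least-squares problem of Eq. (7) is solved in closed form, and the sample series is the Networks 1 row of Table 2.

```python
import math

def gm11_fit(s0):
    """Steps 1-4: estimate the development factor a and gray actuating quantity u."""
    # Step 1: accumulated sequence s1(t) = s0(1) + ... + s0(t), Eq. (2)
    s1, total = [], 0.0
    for value in s0:
        total += value
        s1.append(total)
    # Step 3: each row of B is (-z_k, 1), z_k being the mean of adjacent s1 values;
    # the constant vector holds s0(2), ..., s0(n)
    z = [0.5 * (s1[k - 1] + s1[k]) for k in range(1, len(s1))]
    c = list(s0[1:])
    # Step 4: (B^T B)^(-1) B^T C written out for the two unknowns a and u
    m = len(z)
    sz, szz = sum(z), sum(v * v for v in z)
    sc, szc = sum(c), sum(v * w for v, w in zip(z, c))
    det = m * szz - sz * sz
    a = (sz * sc - m * szc) / det
    u = (szz * sc - sz * szc) / det
    return a, u

def gm11_series(s0, a, u, horizon):
    """Steps 5-7: fitted values for the n inputs followed by `horizon` forecasts."""
    n = len(s0)
    # Eq. (8): s1_hat[t] is the accumulated estimate for index t + 1
    s1_hat = [(s0[0] - u / a) * math.exp(-a * t) + u / a for t in range(n + horizon)]
    # Eq. (9): difference adjacent accumulated estimates to restore the series
    return [s1_hat[0]] + [s1_hat[t] - s1_hat[t - 1] for t in range(1, n + horizon)]

# Networks 1 row of Table 2 (unit: 10000 free-riders), years 2002-2011
network1 = [1.5, 2.0, 2.5, 3.0, 3.4, 4.0, 4.6, 5.0, 5.6, 6.0]
a, u = gm11_fit(network1)
fitted = gm11_series(network1, a, u, horizon=10)
print("a = %.4f, u = %.4f" % (a, u))
print("forecast for 2012: %.3f" % fitted[10])  # compare with Table 5
```

The closed-form solve avoids a linear algebra dependency; with a matrix library, the same estimate would come from a generic least-squares routine applied to B and the constant vector. Note that the sketch divides by a, so it assumes a is not zero.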
The approximation is calculated by the least-squares method, so some deviation in this model is inevitable. The established gray model is tested as follows.
Calculating the residual $e^{(0)}(t)$ and the relative error $q(s)$ between $s^{(0)}(t)$ and $\hat{s}^{(0)}(t)$:
$$e^{(0)}(t) = s^{(0)}(t) - \hat{s}^{(0)}(t) \tag{11}$$
$$q(s) = \frac{e^{(0)}(t)}{s^{(0)}(t)} \tag{12}$$
• Calculating the average and the variance $f_1$ of the original data $s^{(0)}$
• Calculating the average $\bar{q}$ of $e^{(0)}(t)$ and the variance $f_2$ of the residual
• Calculating the variance ratio $D = f_2 / f_1$
• Calculating the small error probability $P = P\{e(t) < 0.6745 f_1\}$
Testing the results according to the gray model accuracy testing table (Table 1):
Table 1: Gray model accuracy testing table
Rank       Relative error q   Small error probability P   Variance ratio D
Level Ⅰ    <0.01              >0.95                       <0.35
Level Ⅱ    <0.05              <0.80                       <0.50
Level Ⅲ    <0.10              <0.70                       <0.65
Level Ⅳ    >0.20              <0.60                       >0.80
In practical applications, the method of testing the accuracy of the model is not unique. The above approach can be used to test the gray model, and the validity of the model can be judged by combining the relative error q(s) with a comparison between the actual data and the forecasted data.
Fig. 1: The algorithm flow of GST model
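The accuracy indicators can be computed with a short helper such as the one below. It is a sketch under several assumptions that the text does not settle: the relative error is taken as a magnitude (the tabulated relative errors are all non-negative), f1 and f2 are taken as population standard deviations as in the commonly used posterior-error check, and the small error probability is evaluated on the deviation of each residual from the mean residual. The function name `accuracy_indicators` is hypothetical, and with other conventions the values of D and P will differ.

```python
import math

def _std(values):
    """Population standard deviation (assumed convention for f1 and f2)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def accuracy_indicators(actual, fitted):
    """Return (mean relative error, variance ratio D, small error probability P)."""
    residuals = [s - f for s, f in zip(actual, fitted)]           # Eq. (11)
    rel_errors = [abs(e) / s for e, s in zip(residuals, actual)]  # Eq. (12), as magnitude
    mean_rel = sum(rel_errors) / len(rel_errors)
    f1 = _std(actual)      # spread of the original data
    f2 = _std(residuals)   # spread of the residuals
    d = f2 / f1
    mean_e = sum(residuals) / len(residuals)
    # share of residuals whose deviation from the mean residual is below 0.6745 * f1
    small = sum(1 for e in residuals if abs(e - mean_e) < 0.6745 * f1)
    p = small / len(residuals)
    return mean_rel, d, p

# Tiny hypothetical example: two observations and their fitted values
mean_rel, d, p = accuracy_indicators([10.0, 20.0], [9.0, 22.0])
print(mean_rel, d, p)
```

The helper assumes that no original value is zero and that the original data are not constant, since both cases would lead to a division by zero.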
GST model: The GST model uses the past numbers of free-riders in a P2P network as the original data and applies the gray model to calculate the number of free-riders in the future, so that the development of the network is known in advance and suitable measures can be adopted to inhibit severe free-riding behavior.
The algorithm flow of GST model: The algorithm flow of the GST model is shown in Fig. 1.
The solution steps of GST model: According to Fig. 1, the solution steps of the GST model are as follows:
Step 1: Using proactive or passive measurement to measure the past numbers of free-riders in the P2P network and using them as the original data.
Step 2: Accumulating the input original data to obtain a new data sequence.
Step 3: Constructing the accumulation matrix B and the constant vector $C_n$.
Step 4: Using the least-squares method to calculate the gray parameter $\hat{a}$.
Step 5: Substituting the gray parameter into the forecasting model to forecast the data.
Step 6: Outputting the forecasted data and comparing them with the original data.
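Step 1 presupposes a rule for deciding which measured nodes count as free-riders. The sketch below shows one way the original data series could be assembled from per-period measurement snapshots, following Definitions 1 and 2. Everything in it is hypothetical: the `PeerRecord` fields, the `min_shared` threshold and the function names are illustrative assumptions, not part of the paper's measurement method.

```python
from dataclasses import dataclass

@dataclass
class PeerRecord:
    peer_id: str
    files_shared: int       # hypothetical measure of contribution
    files_downloaded: int   # hypothetical measure of consumption

def is_free_rider(peer, min_shared=1):
    """A peer that consumes resources while sharing fewer than min_shared files."""
    return peer.files_downloaded > 0 and peer.files_shared < min_shared

def count_free_riders(snapshot, min_shared=1):
    """Number of free-riders among the peers observed in one measurement period."""
    return sum(1 for peer in snapshot if is_free_rider(peer, min_shared))

def build_original_series(snapshots_by_period, min_shared=1):
    """Free-rider counts ordered by period, ready to serve as the original data."""
    return [count_free_riders(snapshots_by_period[period], min_shared)
            for period in sorted(snapshots_by_period)]

# Hypothetical measurements for two periods
snapshots = {
    2003: [PeerRecord("a", 0, 5), PeerRecord("c", 0, 1), PeerRecord("d", 0, 0)],
    2002: [PeerRecord("a", 0, 5), PeerRecord("b", 3, 2)],
}
series = build_original_series(snapshots)
print(series)
```

A peer that neither shares nor downloads (peer "d" above) is not counted here, because Definition 1 ties free-riding to enjoying resources without contributing; a different reading of the definition would change that condition.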
SIMULATION EXPERIMENTS AND RESULTS ANALYSIS
In order to verify the GST model, this section uses the number of free-riders over the past 10 years (2002-2011) as the original data. In this group of simulation experiments, the number of free-riders in each P2P network is increasing, which is characteristic of normal networks. The operating conditions of the model are verified using two simulation experiments, and the accuracy of the model is tested. The simulation experiments are based on MATLAB 7.0.
Given the increasing popularity of P2P applications, the number of free-riders in most P2P networks will show a growing trend. The group of simulation experiments includes two normal P2P networks. The purposes of the simulation experiments are to verify whether the GST model is effective in normal P2P networks of different scales and to calculate the accuracy of the model.
The numbers of free-riders in the two P2P networks (P2P networks 1 and 2) from 2002 to 2011 are presented in Table 2.
Table 2: The number of free-riders in P2P networks
Time/y             2002   2003   2004   2005   2006   2007   2008   2009   2010   2011
Networks 1/10000   1.5    2.0    2.5    3.0    3.4    4.0    4.6    5.0    5.6    6.0
Networks 2/10000   20.0   26.0   31.0   34.0   40.0   45.0   51.0   55.0   61.0   65.0
Results analysis of P2P networks 1: The purposes of this simulation are to verify the applicability of the GST model in small-scale P2P networks and to calculate its accuracy. Feeding the original data of P2P networks 1 into the GST model gives the comparison of the original data and the forecasted data shown in Fig. 2.
Fig. 2: Gray forecasting the number of free-riders in P2P networks 1 (x-axis: Year/y; y-axis: The number of free-riders/10000; series: the original data, the forecasted data)
As can be seen from Fig. 2, the number of free-riders in this P2P network is increasing. Figure 3 is a bar chart comparing the original data and the forecasted data from 2002 to 2011.
Fig. 3: The comparison of the original data and the forecasted data from year 2002 to year 2011 in P2P networks 1 (x-axis: Year/y; y-axis: The number of free-riders/10000)
The relative error and the accuracy level, calculated according to formulas 11 and 12, are shown in Table 3.
Table 3: The accuracy level of P2P networks 1
Time/y             2002   2003     2004     2005     2006     2007     2008     2009     2010     2011
Relative error q   0      0.1659   0.0560   0.0036   0.0045   0.0419   0.0567   0.0174   0.0066   0.0498
Average relative error $\bar{q}$: 0.0402
Accuracy level: Level Ⅱ
As can be seen from Table 3, when the GST model is applied to P2P networks 1, the average relative error $\bar{q}$ is 0.0402, the variance ratio D is 0.3213 and the small error probability P is 0.6754. Compared with Table 1, the accuracy belongs to level Ⅱ, indicating that the forecasted results of this model are relatively accurate. At the same time, the model is easy to implement and has high practical value.
Results analysis of P2P networks 2: The purposes of this simulation are to verify the applicability of the GST model in large-scale P2P networks and to calculate its accuracy. Feeding the original data of P2P networks 2 into the GST model gives the comparison of the original data and the forecasted data shown in Fig. 4.
Fig. 4: Gray forecasting the number of free-riders in P2P networks 2 (x-axis: Year/y; y-axis: The number of free-riders/10000; series: the original data, the forecasted data)
As can be seen from Fig. 4, the number of free-riders in this P2P network is also increasing. Figure 5 is a bar chart comparing the original data and the forecasted data from 2002 to 2011.
Fig. 5: The comparison of the original data and the forecasted data from year 2002 to year 2011 in P2P networks 2 (x-axis: Year/y; y-axis: The number of free-riders/10000)
The relative error and the accuracy level, calculated according to formulas 11 and 12, are shown in Table 4.
Table 4: The accuracy level of P2P networks 2
Time/y             2002   2003     2004     2005     2006     2007     2008     2009     2010     2011
Relative error q   0      0.0882   0.0174   0.0340   0.0202   0.0292   0.0451   0.0129   0.0079   0.0378
Average relative error $\bar{q}$: 0.0293
Accuracy level: Level Ⅱ
As can be seen from Table 4, when the GST model is applied to P2P networks 2, the average relative error $\bar{q}$ is 0.0293, the variance ratio D is 0.2519 and the small error probability P is 0.5431. The accuracy belongs to level Ⅱ, which indicates that the forecasted results of this model are relatively accurate and that the model has high application value.
This group of simulation experiments shows that the GST model is capable of forecasting the number of free-riders in normal P2P networks with relatively high accuracy.
Table 5 gives the forecasted number of free-riders in the two P2P networks for the coming years. The numbers of free-riders in both networks are increasing, so the operators of the P2P networks need to take appropriate measures to cope with severe free-riding behavior.
Table 5: The forecasted values of the number of free-riders in P2P networks
Time/y             2012     2013     2014     2015      2016      2017      2018      2019      2020      2021
Networks 1/10000   7.132    8.076    9.144    10.353    11.722    13.273    15.029    17.016    19.267    21.816
Networks 2/10000   75.200   83.830   93.450   104.170   116.120   129.440   144.290   160.850   179.300   199.880
In this section, simulation experiments were set up to verify the correctness and feasibility of the GST model. The experiments cover two normal P2P networks, that is, networks in which the number of free-riders is increasing and free-riding behavior is worsening. For such networks, the number of free-riders can be forecasted using the GST model. The comparison of the original data and the forecasted data shows that the deviations of the forecasted data are relatively small and that the accuracy of the model is within the acceptable range. The GST model is therefore effective and highly feasible, and can be used to forecast the number of free-riders in normal P2P networks.
CONCLUSION
This study first analyzes free-riding behavior in P2P networks, including the definitions of the concepts related to free-riding, the measurement of free-riding behavior and the impact of free-riding on P2P networks. Then the gray model (Gray System Theory) is described in detail, and a free-rider forecasting model for P2P networks, the GST model, is constructed based on Gray System Theory. Finally, a group of simulation experiments is built to verify the correctness and feasibility of the GST model, and the experimental results are analyzed and compared. The GST model is effective and highly feasible, and can be used to forecast the number of free-riders in normal P2P networks. The results also show that the model is simple to operate and practical, so it can be used to forecast the number of free-riders in future P2P networks.
ACKNOWLEDGMENT
The subject is sponsored by the National Natural Science Foundation of P. R. China (No. 60973139, 61170065, 61171053, 61003039, 61003236, 61103195), the Natural Science Foundation of Jiangsu Province (BK2011755), Scientific and Technological Support Project (Industry) of Jiangsu Province (No. BE2010197, BE2010198, BE2011844, BE2011189), Natural Science Key Fund for Colleges and Universities in Jiangsu Province (11KJA520001), Project sponsored by Jiangsu provincial research scheme of natural science for higher education institutions (10KJB520013, 11KJB520014, 11KJB520016), Scientific Research and Industry Promotion Project for Higher Education Institutions (JH2010-14, JHB2011-9), Postdoctoral Foundation (20100480048), Science and Technology Innovation Fund for higher education institutions of Jiangsu Province (CX10B-196Z, CX10B-199Z, CX10B-200Z, CXZZ11-0405, CXZZ11-0406), Doctoral Fund of Ministry of Education of China (20103223120007, 20113223110002), Key Laboratory Foundation of Information Technology Processing of Jiangsu Province (KJS1022) and a Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD). The authors would like to thank the editors and the anonymous reviewers, who provided insightful and constructive comments for improving this study.
REFERENCES
Adar, E. and B. Huberman, 2000. Free riding on Gnutella. First Monday, 5(10): 32-35.
Audun, J., I. Roslan and B. Colin, 2007. A survey of trust and reputation systems for online service provision. Decis. Support Syst., 43(2): 618-644.
Deng, J.L., 1990. Grey System Theory Tutorial. Huazhong University Press, Wuhan, China.
Liao, X.F., H. Jin, Y.H. Liu, L.M. Ni and D.F. Deng, 2006. AnySee: Peer-to-Peer Live Streaming. IEEE Infocom, pp: 1-10.
Ramaswamy, L. and L. Liu, 2003. Free riding: A new challenge to peer-to-peer file sharing systems. Proceedings of the 36th Hawaii International Conference on System Sciences, Hawaii, pp: 220-229.
Xu, H., S.P. Wang, R.C. Wang, Y. Rao and X. Shao, 2010. Improving QoS in peer-to-peer streaming media system. J. Comput. Inform. Syst., 6(5): 1387-1395.
Yu, Y.J. and H. Jin, 2008. A survey on overcoming free riding in peer-to-peer networks. Chinese J. Comput., 31(1): 1-15.