Research Journal of Applied Sciences, Engineering and Technology 4(21): 4423-4428, 2012
ISSN: 2040-7467
© Maxwell Scientific Organization, 2012
Submitted: May 01, 2012 Accepted: June 01, 2012 Published: November 01, 2012

A Free-Rider Forecasting Model Based on Gray System Theory in P2P Networks

He Xu (1, 3, 4), Zhao-xiong Zhou (1), Suo-ping Wang (2) and Ru-chuan Wang (1, 3, 4)
(1) College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
(2) College of Automation
(3) Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Jiangsu, Nanjing 210003, China
(4) Key Lab of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications), Ministry of Education Jiangsu Province, Nanjing 210003, China

Abstract: The aim of this study is to forecast the number of free-riders in P2P networks, which helps network managers know the status of the networks in advance and take appropriate measures to cope with free-riding behavior. Free-riding behavior is common in P2P networks and has a negative impact on their robustness, availability and stability. Severe free-riding behavior may cause the whole P2P application system to crash. Based on research into free-riding behavior in P2P networks, this paper constructs a free-rider forecasting model (the GST model) using Gray System Theory. Simulation experiments show that the model is highly feasible and produces reasonable predictions of the number of free-riders in P2P networks.

Keywords: Free-riding, gray system theory, P2P networks

INTRODUCTION

Since their birth, Peer-to-Peer (P2P) networks have embodied the idea of information sharing and service (Xu et al., 2010). File sharing systems based on P2P networking technology, such as Gnutella, eDonkey and BitTorrent, are very popular. The number of their online users is sometimes more than one million worldwide (Liao et al., 2006).
However, whether each node in a P2P network actually puts the idea of information sharing into practice has to be established by flow measurements and statistical analysis. Adar and other researchers pointed out that nodes in Gnutella differ hugely in how much information they share and how much they contribute to network maintenance (Adar and Huberman, 2000). Most nodes share no files or only a few, and some share only files that other users hardly ever access. Many nodes join P2P networks to obtain the services provided by other nodes, not to contribute willingly to the networks (Ramaswamy and Liu, 2003). This phenomenon, which is inconsistent with the collaborative sharing promoted by the P2P communication mode, is called free-riding behavior. For P2P networks to play their due role, research on free-riding behavior is imperative.

Gray System Theory holds that, however obscure the behavior of a system and however complex its data, the system still has order and an overall function. Before the gray forecasting model is set up, the original sequence must first be processed; the preprocessed data sequence is called the generated column. The purpose of preprocessing is not to find the statistical rule or probability distribution of the original data, but to turn chaotic data into a regular sequence using a certain approach. A dynamic model is then established (Deng, 1990).

This study first introduces free-riding behavior and Gray System Theory (gray forecasting), then gives the algorithm flow and steps of the GST model, and finally tests the model with a group of simulation experiments and analyzes the experimental results.

Free-riding behavior:

Definition 1 (free-riding; Yu and Jin, 2008): The behavior of nodes in P2P networks that only enjoy the information resource services without contributing to the system is called free-riding.
Definition 2 (free-rider; Yu and Jin, 2008): A node that exhibits free-riding behavior is called a free-rider.

To ensure the healthy, secure and reliable operation of P2P networks, it is necessary to predict the number of free-riders in advance, so that the current and future states of the networks are known in time and appropriate measures can be adopted to inhibit severe free-riding behavior.

Corresponding Author: He Xu, College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, China

Gray system theory (gray forecasting): The gray model has a strict theoretical basis (that is, Gray System Theory), and its biggest advantage is utility. The results forecasted by the gray model are relatively stable. It is suitable for forecasting from a large amount of data, and its results are also accurate when the amount of data is small (Deng, 1990).

The basics of gray forecasting: Gray forecasting is built on Gray System Theory and forecasts the future development trend of things as follows: first, identify the differences in development trends between system factors; second, search for the law of change of the system by generation processing of the original data; third, generate a data sequence with strong regularity; last, set up the corresponding differential equation model. The gray forecast is obtained by reverse processing of the values predicted by the model of the generated data (Audun et al., 2007).

The basic steps of gray forecasting: The single-sequence first-order linear differential equation model GM(1,1) is the most commonly used of the numerous gray models. Taking this model as an example, this subsection introduces the basic steps of gray forecasting (Deng, 1990).

Suppose the original data column is $s^{(0)} = (s^{(0)}(1), s^{(0)}(2), \ldots, s^{(0)}(n))$, where n is the number of data. The basic steps for setting up GM(1,1) and forecasting with it are as follows:

Step 1: Accumulate the original data to obtain a new data sequence:

$$s^{(1)} = (s^{(1)}(1), s^{(1)}(2), \ldots, s^{(1)}(n)) \tag{1}$$

where each element of $s^{(1)}$ is the sum of the corresponding first several original data, that is:

$$s^{(1)}(t) = \sum_{k=1}^{t} s^{(0)}(k), \quad t = 1, 2, \ldots, n \tag{2}$$

$$s^{(1)}(t+1) = \sum_{k=1}^{t+1} s^{(0)}(k), \quad t = 1, 2, \ldots, n-1 \tag{3}$$

Step 2: Set up the first-order linear differential equation of $s^{(1)}(t)$:

$$\frac{ds^{(1)}}{dt} + a s^{(1)} = u \tag{4}$$

where a and u are undetermined coefficients, called the development factor and the gray actuating quantity, respectively. The effective range of a is (−2, 2), and the vector composed of a and u is $\hat{a} = (a, u)^T$. Once a and u are calculated, $s^{(1)}(t)$ can be obtained and the future predicted values of $s^{(0)}$ can be calculated.

Step 3: Generate the accumulation matrix B and the constant vector $C_n$ by taking the means of adjacent accumulated data, that is:

$$B = \begin{bmatrix} -0.5\,(s^{(1)}(1) + s^{(1)}(2)) & 1 \\ -0.5\,(s^{(1)}(2) + s^{(1)}(3)) & 1 \\ \vdots & \vdots \\ -0.5\,(s^{(1)}(n-1) + s^{(1)}(n)) & 1 \end{bmatrix} \tag{5}$$

$$C_n = (s^{(0)}(2), s^{(0)}(3), \ldots, s^{(0)}(n))^T \tag{6}$$

Step 4: Calculate the gray parameter $\hat{a}$ using the least-squares method, that is:

$$\hat{a} = \begin{pmatrix} a \\ u \end{pmatrix} = (B^T B)^{-1} B^T C_n \tag{7}$$

Step 5: Substitute the gray parameter $\hat{a}$ into $\frac{ds^{(1)}}{dt} + a s^{(1)} = u$, then:

$$\hat{s}^{(1)}(t+1) = \left(s^{(0)}(1) - \frac{u}{a}\right) e^{-at} + \frac{u}{a} \tag{8}$$

Since $\hat{a}$ is an approximation calculated by the least-squares method, $\hat{s}^{(1)}(t+1)$ is an approximate expression.

Step 6: Discretize the expressions of $\hat{s}^{(1)}(t+1)$ and $\hat{s}^{(1)}(t)$ and restore the $s^{(0)}$ sequence, which gives the approximate data sequence $\hat{s}^{(0)}(t+1)$:

$$\hat{s}^{(0)}(t+1) = \hat{s}^{(1)}(t+1) - \hat{s}^{(1)}(t) \tag{9}$$

Step 7: Use this model to forecast:

$$\hat{s}^{(0)} = \Big[\underbrace{\hat{s}^{(0)}(1), \hat{s}^{(0)}(2), \ldots, \hat{s}^{(0)}(n)}_{\text{Simulation of the original series}},\ \underbrace{\hat{s}^{(0)}(n+1), \hat{s}^{(0)}(n+2), \ldots, \hat{s}^{(0)}(n+m)}_{\text{The forecasting of future series}}\Big] \tag{10}$$

Because the parameters are calculated by the least-squares method, some deviation in this model is inevitable. The steps for testing the established gray model are as follows.

Calculate the residual $e^{(0)}(t)$ and the relative error $q(s)$ between $s^{(0)}$ and $\hat{s}^{(0)}$:

$$e^{(0)}(t) = s^{(0)}(t) - \hat{s}^{(0)}(t) \tag{11}$$

$$q(s) = \frac{e^{(0)}(t)}{s^{(0)}(t)} \tag{12}$$

• Calculate the average and variance $f_1$ of the original data $s^{(0)}$
• Calculate the average $\bar{e}$ of $e^{(0)}(t)$ and the variance of the residuals $f_2$
• Calculate the variance ratio $D = f_2 / f_1$
• Calculate the small error probability $P = P\{\,|e^{(0)}(t) - \bar{e}| < 0.6745\, f_1\,\}$

Testing the results according to the gray model accuracy testing table (Table 1): In practical application, the method of testing the accuracy of the model is not unique. The above approach can be used to test the gray model, and the soundness of the model can be judged by combining the relative error q(s) with a comparison between the actual data and the forecasted data.

Table 1: Gray model accuracy testing table

| Rank | Relative error q | Small error probability P | Variance ratio D |
|---|---|---|---|
| Level Ⅰ | <0.01 | >0.95 | <0.35 |
| Level Ⅱ | <0.05 | <0.80 | <0.50 |
| Level Ⅲ | <0.10 | <0.70 | <0.65 |
| Level Ⅳ | >0.20 | <0.60 | >0.80 |

Fig. 1: The algorithm flow of GST model

GST model: The GST model uses past numbers of free-riders in P2P networks as the original data and uses the gray model to calculate the number of free-riders in the future, so that the development of the networks is known in advance and suitable measures can be adopted to inhibit severe free-riding behavior.

The algorithm flow of GST model: The algorithm flow of GST model is shown in Fig. 1.
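As a rough illustration of the GM(1,1) steps and the accuracy test described above, the following Python sketch fits the model to the Networks 1 series of Table 2. It is not the authors' code (the experiments in this paper are based on MATLAB 7.0). The function names `gm11_fit`, `gm11_predict` and `gm11_accuracy` are hypothetical. The least-squares solution of Eq. (7) is written out in closed form for the two-column matrix B instead of using a matrix library. The sketch also assumes that the relative error is taken as an absolute value and that f1 and f2 are population standard deviations; the text does not state either choice explicitly.

```python
import math
from statistics import mean, pstdev


def gm11_fit(s0):
    """Estimate (a, u) of GM(1,1) from the original series s0 (Steps 1-4)."""
    n = len(s0)
    if n < 3:
        raise ValueError("GM(1,1) needs at least three observations")
    # Step 1: accumulated sequence s1(t) = s0(1) + ... + s0(t)
    s1, total = [], 0.0
    for x in s0:
        total += x
        s1.append(total)
    # Step 3: each row of B is (-z, 1), where z is the mean of adjacent
    # s1 values; C_n holds s0(2), ..., s0(n)
    z = [0.5 * (s1[t - 1] + s1[t]) for t in range(1, n)]
    c = s0[1:]
    # Step 4: (B^T B)^{-1} B^T C_n written out for the two-column B
    m = len(z)
    sz, sc = sum(z), sum(c)
    szz = sum(v * v for v in z)
    szc = sum(v * w for v, w in zip(z, c))
    det = m * szz - sz * sz
    a = (sz * sc - m * szc) / det
    u = (szz * sc - sz * szc) / det
    return a, u


def gm11_predict(s0, a, u, horizon):
    """Return (fitted values for s0, forecasts for `horizon` future steps)."""
    n = len(s0)

    def s1_hat(t):
        # Eq. (8) shifted to a 1-based index t, so s1_hat(1) equals s0(1)
        return (s0[0] - u / a) * math.exp(-a * (t - 1)) + u / a

    # Eq. (9): restore the original scale by differencing s1_hat
    series = [s0[0]] + [s1_hat(t) - s1_hat(t - 1)
                        for t in range(2, n + horizon + 1)]
    return series[:n], series[n:]


def gm11_accuracy(s0, fitted):
    """Return (mean relative error, variance ratio D, small error probability P)."""
    resid = [x - f for x, f in zip(s0, fitted)]      # Eq. (11)
    rel = [abs(e) / x for e, x in zip(resid, s0)]    # Eq. (12), absolute value assumed
    f1, f2 = pstdev(s0), pstdev(resid)               # assumption: population std devs
    e_bar = mean(resid)
    p = sum(abs(e - e_bar) < 0.6745 * f1 for e in resid) / len(resid)
    return mean(rel), f2 / f1, p


# Networks 1 series from Table 2 (units of 10000 free-riders), 2002-2011
net1 = [1.5, 2.0, 2.5, 3.0, 3.4, 4.0, 4.6, 5.0, 5.6, 6.0]
a, u = gm11_fit(net1)
fitted, future = gm11_predict(net1, a, u, horizon=3)
q_bar, d_ratio, p_small = gm11_accuracy(net1, fitted)
print(f"a = {a:.4f}, u = {u:.4f}")
print("forecast 2012-2014:", [round(v, 3) for v in future])
print(f"mean relative error = {q_bar:.4f}")
```

Under these assumptions the first forecast for Networks 1 comes out at about 7.13 and the mean relative error at about 0.040, in line with Tables 5 and 3. The values of D and P depend on how f1 and f2 are defined, so the ones returned here need not match those reported in the results analysis.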
The solution steps of GST model: According to Fig. 1, the solution steps of GST model are as follows:

Step 1: Use proactive or passive measurement to measure past numbers of free-riders in the P2P networks and use them as the original data.
Step 2: Accumulate the input original data to obtain a new data sequence.
Step 3: Construct the accumulation matrix B and the constant vector $C_n$.
Step 4: Use the least-squares method to calculate the gray parameter $\hat{a}$.
Step 5: Substitute the gray parameter into the forecasting model to forecast the data.
Step 6: Output the forecasted data and compare them with the original data.

SIMULATION EXPERIMENTS AND RESULTS ANALYSIS

In order to verify GST model, this section uses the number of free-riders of the past 10 years (2002-2011) as the original data. In this group of simulation experiments, the number of free-riders in each P2P network is increasing, which is the behavior of a normal network. The operating conditions of the model are verified using two groups of simulation experiments, and the accuracy of the model is tested. The simulation experiments are based on MATLAB 7.0.

Given the increasing popularity of P2P applications, the number of free-riders in most P2P networks will show a growing trend. The group of simulation experiments includes two normal P2P networks. The purposes of the simulation experiments are to verify whether GST model is effective in normal P2P networks of different scales and to calculate the accuracy of the model. The numbers of free-riders in the two P2P networks (P2P networks 1 and 2) from 2002 to 2011 are presented in Table 2.

Table 2: The number of free-riders in P2P networks

| Time/y | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
|---|---|---|---|---|---|---|---|---|---|---|
| Networks 1/10000 | 1.5 | 2.0 | 2.5 | 3.0 | 3.4 | 4.0 | 4.6 | 5.0 | 5.6 | 6.0 |
| Networks 2/10000 | 20.0 | 26.0 | 31.0 | 34.0 | 40.0 | 45.0 | 51.0 | 55.0 | 61.0 | 65.0 |

Results analysis of P2P networks 1: The purposes of this simulation are to verify the applicability of GST model in small-scale P2P networks and to calculate its accuracy. Feeding the original data of P2P networks 1 into GST model gives the comparison of the original data and the forecasted data shown in Fig. 2.

Fig. 2: Gray forecasting the number of free-riders in P2P networks 1 (x-axis: Year/y; y-axis: The number of free-riders/10000; series: the original data, the forecasted data)

As can be seen from Fig. 2, the number of free-riders in this P2P network is increasing. Figure 3 is a bar chart comparing the original data and the forecasted data from 2002 to 2011. The relative error and the accuracy level, calculated according to formulas 11 and 12, are shown in Table 3.

Fig. 3: The comparison of the original data and the forecasted data from year 2002 to year 2011 in P2P networks 1 (x-axis: Year/y; y-axis: The number of free-riders/10000)

Table 3: The accuracy level of P2P networks 1

| Time/y | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
|---|---|---|---|---|---|---|---|---|---|---|
| Relative error q | 0 | 0.1659 | 0.0560 | 0.0036 | 0.0045 | 0.0419 | 0.0567 | 0.0174 | 0.0066 | 0.0498 |

Average relative error $\bar{q}$: 0.0402. Accuracy level: Level Ⅱ.

As can be seen from Table 3, when GST model is applied to P2P networks 1, the average relative error $\bar{q}$ is 0.0402. Moreover, the variance ratio D is 0.3213 and the small error probability P is 0.6754. Compared with Table 1, the accuracy belongs to level Ⅱ, indicating that the forecasted results of this model are relatively accurate. At the same time, the model is easy to implement and has high practical value.

Results analysis of P2P networks 2: The purposes of this simulation are to verify the applicability of GST model in large-scale P2P networks and to calculate its accuracy. Feeding the original data of P2P networks 2 into GST model gives the comparison of the original data and the forecasted data shown in Fig. 4.

Fig. 4: Gray forecasting the number of free-riders in P2P networks 2 (x-axis: Year/y; y-axis: The number of free-riders/10000; series: the original data, the forecasted data)

As can be seen from Fig. 4, the number of free-riders in this P2P network is also increasing. Figure 5 is a bar chart comparing the original data and the forecasted data from 2002 to 2011. The relative error and the accuracy level, calculated according to formulas 11 and 12, are shown in Table 4.

Fig. 5: The comparison of the original data and the forecasted data from year 2002 to year 2011 in P2P networks 2 (x-axis: Year/y; y-axis: The number of free-riders/10000)

Table 4: The accuracy level of P2P networks 2

| Time/y | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
|---|---|---|---|---|---|---|---|---|---|---|
| Relative error q | 0 | 0.0882 | 0.0174 | 0.0340 | 0.0202 | 0.0292 | 0.0451 | 0.0129 | 0.0079 | 0.0378 |

Average relative error $\bar{q}$: 0.0293. Accuracy level: Level Ⅱ.

As can be seen from Table 4, when GST model is applied to P2P networks 2, the average relative error $\bar{q}$ is 0.0293. Moreover, the variance ratio D is 0.2519 and the small error probability P is 0.5431. The accuracy belongs to level Ⅱ, which indicates that the forecasted results of this model are relatively accurate and that the model has high application value.

This group of simulation experiments shows that GST model is capable of forecasting the number of free-riders in normal P2P networks, and that the accuracy is relatively high. Table 5 gives the forecasted number of free-riders in these two P2P networks for the coming years. The numbers of free-riders in both networks are increasing, so the operators of P2P networks need to take appropriate measures to cope with severe free-riding behavior.

Table 5: The forecasted values of the number of free-riders in P2P networks

| Time/y | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|
| Networks 1/10000 | 7.132 | 8.076 | 9.144 | 10.353 | 11.722 | 13.273 | 15.029 | 17.016 | 19.267 | 21.816 |
| Networks 2/10000 | 75.200 | 83.830 | 93.450 | 104.170 | 116.120 | 129.440 | 144.290 | 160.850 | 179.300 | 199.880 |

In this section, simulation experiments are set up to verify the correctness and feasibility of GST model. The experiments contain two normal P2P networks, that is, networks in which the number of free-riders is increasing and the free-riding behavior is worsening. For such P2P networks, the number of free-riders can be forecasted using GST model. The comparison of the original data and the forecasted data shows that the deviations of the forecasted data are relatively small and that the accuracy of the model is within the acceptable range. So, GST model is effective and highly feasible, and can be used to forecast the number of free-riders in normal P2P networks.

CONCLUSION
This study first analyzes free-riding behavior in P2P networks, including the definitions of the concepts related to free-riding, the measurement of free-riding behavior and the impact of free-riding on P2P networks. Then, the gray model (Gray System Theory) is described in detail, and GST model, a free-rider forecasting model for P2P networks, is constructed based on Gray System Theory. Finally, simulation experiments are built to verify the correctness and feasibility of GST model, and the experimental results are analyzed and compared. GST model is effective and highly feasible, and can be used to forecast the number of free-riders in normal P2P networks. The results show that the GST model in this paper has the advantages of simple operation and strong practicability, and can be used to forecast the number of free-riders in future P2P networks.

ACKNOWLEDGMENT

The subject is sponsored by the National Natural Science Foundation of P. R. China (No. 60973139, 61170065, 61171053, 61003039, 61003236, 61103195), the Natural Science Foundation of Jiangsu Province (BK2011755), Scientific and Technological Support Project (Industry) of Jiangsu Province (No. BE2010197, BE2010198, BE2011844, BE2011189), Natural Science Key Fund for Colleges and Universities in Jiangsu Province (11KJA520001), Project sponsored by Jiangsu provincial research scheme of natural science for higher education institutions (10KJB520013, 11KJB520014, 11KJB520016), Scientific Research and Industry Promotion Project for Higher Education Institutions (JH2010-14, JHB2011-9), Postdoctoral Foundation (20100480048), Science and Technology Innovation Fund for higher education institutions of Jiangsu Province (CX10B-196Z, CX10B-199Z, CX10B-200Z, CXZZ11-0405, CXZZ11-0406), Doctoral Fund of Ministry of Education of China (20103223120007, 20113223110002), Key Laboratory Foundation of Information Technology Processing of Jiangsu Province (KJS1022), and A Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD). The authors would like to thank the editors and the anonymous reviewers, who provided insightful and constructive comments for improving this study.

REFERENCES

Adar, E. and B. Huberman, 2000. Free riding on Gnutella. First Monday, 5(10): 32-35.
Audun, J., I. Roslan and B. Colin, 2007. A survey of trust and reputation systems for online service provision. Decis. Support Syst., 43(2): 618-644.
Deng, J.L., 1990. Grey System Theory Tutorial. Huazhong University Press, Wuhan, China.
Liao, X.F., H. Jin, Y.H. Liu, L.M. Ni and D.F. Deng, 2006. AnySee: Peer-to-peer live streaming. IEEE Infocom, pp: 1-10.
Ramaswamy, L. and L. Liu, 2003. Free riding: A new challenge to peer-to-peer file sharing systems. Proceedings of the 36th Hawaii International Conference on System Sciences, Hawaii, pp: 220-229.
Xu, H., S.P. Wang, R.C. Wang, Y. Rao and X. Shao, 2010. Improving QoS in peer-to-peer streaming media system. J. Comput. Inform. Syst., 6(5): 1387-1395.
Yu, Y.J. and H. Jin, 2008. A survey on overcoming free riding in peer-to-peer networks. Chinese J. Comput., 31(1): 1-15.