Exploring Community Simulation Methods

John Stevens

Abstract

Analysing a complete social network of a community provides a large amount of information about that community. As collecting complete network information on all but micro communities is virtually impossible, computer modelling techniques are used to model communities. The community features in simulations that I chose to investigate were the physical geography of a community, the passing of time in the model, the degree of links when compared with the real world, the representation of the strength and type of those links in the community, and the enabling of different characteristics of layers within a community. The modelling methods I investigated were probability modelling, several types of random modelling, multi level modelling and agent based modelling. With this investigation I confirm that agent based modelling is the best of these methods at modelling communities.

1. Introduction

This paper explores several computer based simulation methods. Mapping a complete social network would facilitate a range of complex analyses. In practice, however, the lack of interest of some people and the difficulty of contacting all members, even of a small network, to capture the entire set of relations between them make such a complete mapping almost impossible. This paper explores the potential simulation of complete networks by using several simulations written in C++. The methods I explore in the rest of the paper are probability modelling, random modelling, multi level modelling and agent based modelling.

2. Literature Review

There are many problems to be faced when collecting complete social network data. The main one I encountered was a lack of interest among individuals in completing questionnaires, for various reasons. In spite of my range of efforts (described in more detail in Stevens 2008), the level of non-response created problems for my analysis.
For practical and cost reasons, complete network studies had to be focused on small populations. One solution that helped to alleviate this problem was to simulate the network data instead of collecting real data. This raised the obvious problem that simulated network data is not truly representative of the social situation that real data would present. However, the approach remains useful as long as the network data is created by a simulation program that accounts for parameters from the social situation that it is modelling.

There are several methods for simulating and modelling social networks. These range from historic methods, such as Markov probability modelling (1931) and the differing types of random modelling described in Holland and Leinhardt (1970) and Holland and Leinhardt (1979), to more recent methods, such as multi level modelling, detailed by Snijders (2001), and agent based modelling, used in Doran and Gilbert (1995). These are discussed further in the sections below.

2.1 Probability modelling

One of the first methods proposed for modelling networks was to use the mathematical technique of probability. Andrei Markov proposed this in his seminal paper on Markov chains (1931). A Markov chain is a discrete time stochastic process with the Markov property: the past is irrelevant for predicting the future, given knowledge of the present. This technique was later modified by Markov to provide a continuous feedback loop, taking the current result of the Markov equation as an input into the equation; this is known as the continuous time Markov chain. Markov random graphs and the evolution of statistical models for social networks are discussed in Frank and Strauss (1986).

2.2 Random modelling

There are three main forms of random network models.
The first is described by Holland and Leinhardt (1970) as P*, P1 and P2, and later described in Holland and Leinhardt (1979) as types of exponential random graphs. This was further developed to use logit regression by Wasserman and Pattison (1996). The random P* model was implemented unwittingly by Erdos-Renyi (1959), Watts-Strogatz (1998) and Kleinberg (1999). In each of these models, points correspond to persons and edges simulate social links, with the connections distributed at random. Random network models serve as a base from which we can build more sophisticated models of existing social networks.

None of the random models adequately accounts for the dynamic changes that occur in social networks. In these models, a given number of points results in a fixed structure, whereas real social links can fade or strengthen over time and may be reinforced through repeated social contact.

I found that only the Kleinberg model can account for geography. The Kleinberg approach embeds individuals within a two-dimensional space. Although it assumes that any individual is more likely to have friends living nearby, the uniform lattice of the Kleinberg structure does not reflect the far from uniform distribution of people into houses, streets, villages, city neighbourhoods and countries.

The random network models assume that all links are equally strong, while the links in social networks vary in intensity. For example, family ties often differ markedly from ties to co-workers. None of the models has link strength or link type. Just as all links are equal, all points are equal: there is no provision for age, gender, job type, and so forth, in these models. A real social network, however, has many "layers", each with a different distance function, with points having links of varying degrees of closeness within each.
While the Kleinberg model might adequately simulate neighbourhood level links, it cannot simultaneously account for links at the workplace or links based on other organizational affiliations. For simplicity, let us consider only housing and work-based links. If we think of each individual as embedded within two distinct planes (that is, where you work and where you live) and as making connections to other points in each plane, we have a two-dimensional Kleinberg-type model (without the far links). What we lack is a means to explain the connections between work and home locations.

P* networks have been used by Robins, Pattison and Woolcock (2004) to investigate missing data and by Albert and Barabási (2002) to investigate the mechanics of complex networks.

None of the random models I looked into could account for any of these fundamental properties of social networks without significant modification. Of the three, the Kleinberg model offers the most relevant starting point. Yet, while it is possible that we can model the small-world properties of a social network accurately without explicitly modelling all the points, we will not know whether this is the case without more complex, socially aware models.

P1 modelling deals with statistical models for triadic structure in networks: transitivity, closure, intransitivity, openness and local triadic structures. Using the triad census, different forces such as transitivity, closure and openness are considered. I look at how these different triadic forces can be tested, as proposed by Holland and Leinhardt (1981).

With P2 modelling, Fienberg and Wasserman explore log linear models for analysing dyadic independence in a network. They explore how these models can also incorporate characteristics of the actors and dyadic covariates (such as other ties). This is described in Fienberg and Wasserman (1985).
2.3 Multi level modelling

An alternative way of modelling networks is to represent several differing networks in layers. Multi level modelling is also referred to as stochastic modelling in the literature. It is implemented in a computer program called Siena, detailed in Snijders (2001). Siena is very good at dealing with the facets of a network's nodes, but one limitation is that it has no representation of the network's environment. More recently, Snijders and Baerveldt (2003) carried out a multilevel network study of the effects of delinquent behaviour on friendship evolution. Nosh, in his book Theories of Communication Networks (2003), details his research interests, which include applications of systems theories of complexity to communication, the role of emergent networks within and between organizations, collaboration technologies in the workplace, and multilevel modelling.

2.4 Agent based modelling

Others have developed agent based modelling packages, such as "Eos", "Life" and "Swarm". Swarm simulates the behaviour of insect colonies by representing each insect in its own right, with its own set of rules for how it should function in its environment, in this case a colony. One advantage of this method of simulation over multi level modelling is that it adds features of the natural surroundings or built environment.

Gilbert and Doran pioneered this simulation method with the EOS project. Their initial work is summarized in Gilbert and Doran (1995). The area is further described in Gilbert and Troitzsch (1999), which covers computer simulation of societies in general, its uses and its methods. Methods described include queuing models, multilevel models and multi agent models. Examples of multi agent models are the EOS project from the University of Essex (Gilbert and Doran, 1995) and Swarm. A more detailed description of the "tools and techniques" of simulation can be found in Suleiman, Troitzsch and Gilbert (2000).
This method of multi agent modelling has the advantage of accounting for individual agency in the formation and continuation of links to other individuals in the community, but it does not represent a community as a network. My own implementation, reported in the later part of chapter 7, deals with the protocols for agent interaction, collaboration, communication and languages.

3. Research Question and Simulation Judging Criteria

The specific research question I wish to answer in this chapter is: which method of computer modelling (random modelling, multi level modelling or agent based modelling) is the best to use when modelling communities?

The criterion used to judge the performance of a computer simulation program at producing a model of a community is how well the simulation program fulfils the following requirements:

Representing the physical geography of a community
Representing the passing of time
Representing the degree of links in the real world
Representing the strength of those links in the community
Representing the types of link within a community
Enabling the different characteristics of layers within a community

4. Methods

The six methods I describe in this section include four that can loosely be described as random network modelling methods: Erdos-Renyi graphs, Barabási-Albert graphs, Watts-Strogatz graphs and Kleinberg graphs. The fifth method is a multi-layered network modelling method which I developed, and the final method is an agent based network modelling method.

4.1 Erdos-Renyi graphs

The first method of social network simulation I investigate is Erdos-Renyi graphs. A description of this method is given in Erdos, P. & Renyi, A. (1959). The method creates a graph with n points and independently fills each of the possible n(n-1)/2 undirected edges with probability p = m / (n-1). Each point has n-1 possible edges, so m is the expected mean degree.
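The construction in Section 4.1 can be sketched in a few lines of C++. This is a minimal illustration only, not the program used for the results in this paper: the names erdos_renyi, mean_degree and EdgeList, the seed parameter and the use of std::mt19937 are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Undirected edges stored as (i, j) pairs with i < j. Hypothetical type name.
using EdgeList = std::vector<std::pair<std::size_t, std::size_t>>;

// Sketch of the Erdos-Renyi construction: each of the n(n-1)/2 possible
// undirected edges is included independently with probability p = m/(n-1),
// so that m is the expected mean degree.
EdgeList erdos_renyi(std::size_t n, double m, unsigned seed) {
    assert(m >= 0.0);
    EdgeList edges;
    if (n < 2) return edges;
    // Clamp so that p stays a valid probability when m exceeds n-1.
    const double p = std::min(1.0, m / static_cast<double>(n - 1));
    std::mt19937 rng(seed);
    std::bernoulli_distribution include_edge(p);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            if (include_edge(rng)) edges.emplace_back(i, j);
    return edges;
}

// Mean degree of an undirected graph: every edge adds one to two degrees.
double mean_degree(const EdgeList& edges, std::size_t n) {
    return n == 0 ? 0.0
                  : 2.0 * static_cast<double>(edges.size()) / static_cast<double>(n);
}
```

Calling erdos_renyi(n, 50.0, seed) and passing the result to mean_degree should give a value close to 50 for large n. The double loop visits every pair of points, which is simple but slow for the largest graph sizes considered here.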
4.2 Barabási-Albert graphs

The next modelling method I examine is the one developed by Barabási and Albert, described in Barabási, Albert-László (2002). This method is also known as "Scale Free Networking". I take a totally disconnected graph of m points 1,...,m. We inductively build up the simple graph by supposing we already have points 1,...,r-1. We then add point r and simultaneously create m simple edges between r and points sampled (with repetitions re-sampled) from 1,...,r-1, such that the probability of choosing a point s is proportional to the degree of s. This is the degree in the graph before adding any of the m links from r; that is, we only update the sampling distribution after adding all m links from each point r. In particular, for the point m+1 we create edges to all other existing points 1,...,m. We stop when we have added the point r = n. The number of edges after adding the point r is m(r-m), so the mean degree of the final graph is 2m(n-m)/n = 2m(1 - m/n).

4.3 Watts-Strogatz graphs

The next modelling method I consider is the one developed by Watts and Strogatz, described in Watts, D. & Strogatz, S. (1998). I take points {0,1,...,n-1} arranged in a clockwise fashion around a ring. Initially each point is connected by an undirected edge to k points: the k/2 nearest to it clockwise around the ring and the k/2 nearest anticlockwise (in particular, k has to be even). We then move a fraction of these edges using the following algorithm. We start with the point 0 (the "base point") and the edge between 0 and the next point around in a clockwise fashion, i.e. the edge (0, 1). With probability p we replace ("rewire") this edge with the still undirected edge (0, q), where q is chosen uniformly from the ring elements such that q != 0 and (0, q) was not an edge in the graph before the rewiring; otherwise we leave the edge in place. We then consider base point 1 and the edge (1, 2), which we rewire with probability p as before.
We continue until all edges between points and their nearest neighbours clockwise have been considered. Then we return to base point 0 and the second-nearest point clockwise from 0 (i.e. we test the edge (0, 2)) and proceed round the ring as before. This continues until all edges have been considered, i.e. after k/2 passes around the ring. The final graph still contains nk/2 edges, so the mean degree is k. The expected "mean close degree" is k(1-p) (i.e. the expected number of edges from each point that were not rewired) and the expected "mean far degree" is kp.

4.4 Kleinberg 2D graphs

The next modelling method I chose to look into introduces the concept of physical geography into the model. This model was developed by Kleinberg and is described in Kleinberg, J. (1999). The method positions all nodes individually in a square lattice, without overlapping each other. We take an n x n square lattice of points (so each point is represented as (x, y) for 1 <= x, y <= n). Define the lattice distance d((x, y), (u, v)) = |x-u| + |y-v|. From each (x, y) we create:

Directed vectors to all (u, v) with (u, v) != (x, y) and d((x, y), (u, v)) <= p, the "close" vectors.
q directed vectors to points (u, v) with d((x, y), (u, v)) > p, the "far" links. These are sampled independently (with repetitions re-sampled) from a distribution in which each (u, v) has probability proportional to d((x, y), (u, v))^(-r).

For a point in the "centre" of the lattice (i.e. with x, y such that p < x, y < n-p) we get 2p(p+1) close vectors, which are in fact reciprocated. The reason for the somewhat odd degree distribution is that the points not in the centre have fewer close links because, unlike the Watts-Strogatz model, we do not have a periodic boundary condition. The mean out degree is approximately 2p(p+1) + q; the more points, the better the approximation. The "close degree" of each point (i.e. the number of close vectors) is approximately 2p(p+1) and each "far degree" is q (i.e. the number of vectors per point which were chosen at random, whether or not they actually happen to be far away in a lattice-distance sense).

4.5 Multi-Layered Network Modelling Method

This work simulates multiple tiers of lattices representing acquaintance and friendship links. Each individual is represented by a point that has a home location (its position in the tier 1 lattice), a work location (tier 2), and then social and family locations. The non-home locations are derived from the home location based on a distribution.

4.6 Agent Based Network Modelling Method

I implement an agent based computer simulation. Of the many problems to be faced when attempting to collect complete social network data, the most difficult was the lack of interest among individuals in the population being studied in completing the questionnaires. For various reasons, and in spite of my range of efforts (described in more detail in Stevens, 2008), the level of non-response created problems for this analysis. For practical and cost reasons, my complete network studies had to be focused on small populations. I therefore chose to simulate the network data instead of collecting real data. This raised the obvious problem that the network data was not truly representative of the social situation that real data would present.

The method I implemented was a new agent based simulation using parameters taken from census reports detailing changes in population and in physical and social parameters over the past 100 years. The simulation ran for a community with a population of approximately 10,000 individuals, for a simulated period of 100 years. This approach allowed all individuals within the network to behave independently of each other. The method was chosen because it is agent based and represents the individual within the model, rather than looking at the behaviour of a community as a whole. The simulation program was written in C++ for processing speed. Dr. Mark Boddington wrote the program to my specification.

In this simulation, times are in days and distances in metres. The agent based model represents overlapping community and social networks, made up of links arising from family, relationships, work and local housing, and centred on differing physical locations. The model has a bitmap (i.e. a 2D array, such as Housing Density [x][y]) determining how many people can live in each small "square". Initially I only made this a fixed "n" for squares within 2 miles. This was still flexible and allowed the simulation of a cluster of villages, or a town surrounded by a cluster of villages, etc., without changing the code. A second bitmap, which represented "workplace density", enabled people's jobs to be distributed in a similar fashion within this map. A third bitmap represented family locations. Originally I ignored the case of an agent moving to another location within the area, but I did not exclude the possibility of taking this type of case into account later.

When a new person was to be created, I simply placed them randomly on the bitmap, each square at (x, y) having a probability of being picked proportional to housing density [x][y] (with a check that I did not overcrowd any square). Over time, I expected that the population density would match the housing density map. This bitmap was also expected to change over time, which should give a lot of potential for change. All agents within the model had (x, y) location co-ordinates that represented their housing location.

Links between agents are made and also broken within the model at different rates for males and females and for different types of association between individuals. All links between individuals had a variable strength from 0 to 10; each link was one of the 5 types of link described in Table 1.
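The placement of a new person on the housing density bitmap can be sketched as follows. This is an illustrative reconstruction, not the code of the simulation program: the names place_person, Grid, Square and occupancy are hypothetical, and the overcrowding check is assumed to mean that a square is full once its occupancy reaches its housing density value.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

using Grid = std::vector<std::vector<int>>;          // hypothetical bitmap type
using Square = std::pair<std::size_t, std::size_t>;  // (x, y) of a square

// Pick a square with probability proportional to housing_density[x][y],
// considering only squares that still have room (assumed overcrowding rule).
// Returns false when every square is full; otherwise stores the square in
// `chosen` and increments its occupancy.
bool place_person(const Grid& housing_density, Grid& occupancy,
                  std::mt19937& rng, Square& chosen) {
    assert(housing_density.size() == occupancy.size());
    std::vector<Square> squares;
    std::vector<double> weights;
    for (std::size_t x = 0; x < housing_density.size(); ++x) {
        assert(housing_density[x].size() == occupancy[x].size());
        for (std::size_t y = 0; y < housing_density[x].size(); ++y) {
            if (occupancy[x][y] < housing_density[x][y]) {
                squares.emplace_back(x, y);
                weights.push_back(static_cast<double>(housing_density[x][y]));
            }
        }
    }
    if (squares.empty()) return false;
    // Weighted choice among the squares that still have room.
    std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());
    chosen = squares[pick(rng)];
    ++occupancy[chosen.first][chosen.second];
    return true;
}
```

Rebuilding the weight list on every call keeps the sketch short; a simulation placing many people per day would more plausibly keep the list of free squares up to date incrementally. The same routine could be reused with the workplace density bitmap to distribute jobs.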
I anticipated that the number and types of links would be extremely unpredictable. I assumed that some individuals would have retired and therefore not have any job links, and that a small number of individuals would not have a family social network. I also assumed that all children and a large number of adults do not have a relationship (e.g. single parents). These social networks can change noticeably and very quickly over time, as events such as moving location, relationship break-ups and changing jobs have been modelled. The link strength is a variable whose value eventually stops increasing over time. However, new links are continuously made as people's lives take different courses. Table 1 shows how agents make or break links in the simulation.

Type of link | Making of links | Breaking of links
Housing social network | Gradual making of links. The chance of making an acquaintance is relative to distance. | Virtually instant break when moving house.
Education and job social network | Fast making of links. | Virtually instant break when moving job, but job links are over longer distance than other links.
Friendship social network | Gradual making of friends. Possibly other types of links become friendship links with time. | Gradual breaking of friendships.
Family social network | Made at birth; a permanent link but of varying strength. | Break at death.
Relationship | Gradual making over 2 years, then becoming the strongest type of link, with one strong link at a time. Relationship links are age dependent, with most marriages in the late 20s and early 30s. | Instant break. Marriages last on average 8 years, with 30% never splitting.
Table 1: Making and breaking of links within the social network (Source: My own interpretation of computational agents' types and roles)

Each model looped over 100 years, one step per day. Agents joined and left the model at set average rates to represent individuals' births and deaths.
These rates were varied over the 'years' to simulate the effect of wars, increasing life expectancy and birth rate variation such as occurred in the 1960s and 1990s. Agents representing individuals also joined and left the model at set average rates to represent individuals who move into and out of an area. Initially the model started with nobody having any friends; the model then allowed the acquaintance-making code to build the network. Per day, the simulation model does the following:

We create the required number of houses and jobs for a simulated year.
Each unfilled job is filled with a fixed probability. If we fill a job, a link-less newcomer to the town moves in, moving into a house centred on the job location.
Each friendship is broken with a fixed probability when a person moves, and each person moves with a fixed probability.
Similarly, the person switches jobs with a fixed probability.
The person makes new neighbours within a period of time, and random meetings occur.

5. Probability Modelling

As previously discussed in the section on probability modelling, the most commonly referenced type is Markov chains (1971), which have been rejected by most, if not all, academic disciplines. Apart from including this reference, I shall move on to the next section on random modelling.

6. Erdos-Renyi graphs

The first method of social network simulation I investigate is Erdos-Renyi graphs. The table below shows the numerical output of the simulation program for simulations of four different sizes, ranging from 10000 to 150000 simulated individuals.

Graph size | 10000 | 50000 | 100000 | 150000
Mean degree | 49.9907 | 50.01 | 49.9873 | 49.9942
S.d. of degrees | 7.02816 | 7.06434 | 7.05878 | 7.07504
Min degree | 25.6667 | 24.3333 | 22.3333 | 21
Max degree | 77.3333 | 83.3333 | 83.6667 | 82.3333
Small-world number | 2.77005 | 3.03656 | 3.26262 | 3.41715
Mean clustering c. | 0.00500806 | 0.000989636 | 0.000508889 | 0.000329489
Table 2 - Erdos-Renyi simulation output (Source: output of my own implementation of Erdos-Renyi social network simulation computer program averaged over 3 trials)

I next present in the following figure the resultant network representation of a community from this simulation program.

Figure 1 - Erdos-Renyi simulation resultant network (Source: http://en.wikipedia.org/wiki/ErdosRenyi_model)

As can be seen, the resultant graph is unlike a geographic map. Next is a cumulative graph of the degree of a node (y axis) plotted against the number of nodes within that count (x axis) in the resultant simulated network.

Figure 2 - Erdos-Renyi simulation output (Source: output of my own implementation of Erdos-Renyi social network simulation computer program using the following parameters: m=50.00, mean of 3 trial(s), 2m00 per trial)

The following table summarises the Erdos-Renyi modelling technique in relation to using this technique to model communities:

Geography | The Erdos-Renyi model does not represent the geography of a community
Time | Fixed structure
Degree of links | Bell curve is ok
Link strength | No link strength
Link type | There is no provision for age, gender, job type etc. in the model
Layers | This model has no representation of layers
Table 3 - Erdos-Renyi simulation summary (Source: My own interpretation of Erdos-Renyi simulation models)

The Erdos-Renyi model was not suitable for modelling all the areas required to model a community. For this reason I explore other types of random network modelling.

7. Barabási-Albert graphs

The next modelling method I investigate is the one developed by Barabási and Albert, also known as "Scale Free Networking". The statistical output of the modelling program for the network is given below for networks of four different sizes ranging from 10000 to 150000 simulated individuals.
Graph size | 10000 | 50000 | 100000 | 150000
Mean degree | 49.875 | 49.975 | 49.9875 | 49.9917
S.d. of degrees | 51.4088 | 60.4992 | 63.9836 | 66.2624
Min degree | 22 | 25 | 21.6667 | 25
Max degree | 819.333 | 1949 | 2676 | 3621.33
Small-world number | 2.66601 | 2.89743 | 2.97433 | 3.02736
Mean clustering c. | 0.0211512 | 0.00600621 | 0.00356949 | 0.00276578
Table 4 - Barabási-Albert simulation output (Source: output of my own implementation of Barabási-Albert social network simulation computer program averaged over 3 trials)

I next present in the following figure the resultant network representation of a community from this simulation program.

Figure 3 - Barabási-Albert simulation resultant network (Source: http://en.wikipedia.org/wiki/Scale-free_network)

As can be seen, the resultant graph is unlike a geographic map. Next is a cumulative graph of the degree of a node (y axis) plotted against the number of nodes within that count (x axis) in the resultant simulated network.

Figure 4 - Barabási-Albert simulation output (Source: output of my own implementation of Barabási-Albert social network simulation computer program averaged over 3 trials)

The following table summarizes the Barabási-Albert modelling technique in relation to using this technique to model communities:

Geography | The Barabási-Albert model does not represent the geography of a community
Time | Each random model results in a fixed structure
Degree of links | Unnatural
Link strength | None of the models has link strength
Link type | There is no provision for age, gender, job type etc. in the random models
Layers | This model has no representation of layers
Table 5 - Barabási-Albert simulation summary (Source: My own interpretation of Barabási-Albert simulation models)

The Barabási-Albert model was poor at modelling all the areas required to model a community. For this reason I chose to explore other types of random network modelling.

8. Watts-Strogatz graphs

The next modelling method I investigate is the one developed by Watts and Strogatz.
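The ring construction and rewiring passes described in Section 4.3 can be sketched as below. This is a minimal illustration under stated assumptions, not the program that produced the results reported here: the function name watts_strogatz, the Adjacency type, the seed parameter and the use of rejection sampling to choose the new end point q are all choices made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <set>
#include <vector>

// adj[i] holds the neighbours of point i. Hypothetical type name.
using Adjacency = std::vector<std::set<std::size_t>>;

// Ring of n points, each joined to its k/2 nearest neighbours on either side,
// followed by k/2 rewiring passes: in pass j the edge (i, i+j) is replaced,
// with probability p, by (i, q) for a uniformly chosen q that is neither i
// nor already a neighbour of i.
Adjacency watts_strogatz(std::size_t n, std::size_t k, double p, unsigned seed) {
    assert(k % 2 == 0 && k < n);
    Adjacency adj(n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 1; j <= k / 2; ++j) {
            adj[i].insert((i + j) % n);
            adj[(i + j) % n].insert(i);
        }
    std::mt19937 rng(seed);
    std::bernoulli_distribution rewire(p);
    std::uniform_int_distribution<std::size_t> any_point(0, n - 1);
    for (std::size_t j = 1; j <= k / 2; ++j) {      // one pass per ring offset
        for (std::size_t i = 0; i < n; ++i) {       // base point
            if (!rewire(rng)) continue;
            if (adj[i].size() >= n - 1) continue;   // no free end point left
            const std::size_t old_target = (i + j) % n;
            std::size_t q;
            do {
                q = any_point(rng);                 // rejection sampling
            } while (q == i || adj[i].count(q) > 0);
            adj[i].erase(old_target);
            adj[old_target].erase(i);
            adj[i].insert(q);
            adj[q].insert(i);
        }
    }
    return adj;
}
```

Because every rewiring removes one edge and adds one, the graph keeps nk/2 edges and a mean degree of k, in line with the description in Section 4.3. With p = 0 the function returns the unmodified ring.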
The statistical output of the modelling program for the network is given below for networks of four different sizes ranging from 10000 to 150000 simulated individuals.

Graph size | 10000 | 50000 | 100000 | 150000
Mean degree | 50 | 50 | 50 | 50
S.d. of degrees | 4.82667 | 4.85487 | 4.84881 | 4.86355
Min degree | 34 | 33 | 32 | 31.3333
Max degree | 70 | 74 | 74 | 76.3333
Small-world number | 2.77781 | 3.05266 | 3.2898 | 3.44409
Mean clustering c. | 0.0146803 | 0.0110893 | 0.0104561 | 0.0107154
Table 6 - Watts-Strogatz simulation output (Source: output of my own implementation of Watts-Strogatz social network simulation computer program averaged over 3 trials)

I next present in the following figure the resultant network representation of a community from this simulation program.

Figure 5 - Watts-Strogatz simulation resultant network (Source: http://en.wikipedia.org/wiki/File:Network_Community_Structure.png)

As can be seen, the resultant graph is unlike a geographic map. It does at least have a physical structure, but that structure can be said to be unnatural. Next is a cumulative graph of the degree of a node (y axis) plotted against the number of nodes within that count (x axis) in the resultant simulated network.

Figure 6 - Watts-Strogatz simulation output (Source: output of my own implementation of Watts-Strogatz social network simulation computer program averaged over 3 trials)

The following table summarizes the Watts-Strogatz modelling technique in relation to using this technique to model communities:

Geography | The Watts-Strogatz model is ok but unnatural
Time | The random model results in a fixed, static structure
Degree of links | Bell curve is ok
Link strength | None of the random models has link strength
Link type | There is no provision for age, gender, job type etc. in the random models
Layers | This model has no representation of layers
Table 7 - Watts-Strogatz simulation summary (Source: My own interpretation of Watts-Strogatz simulation models)

The Watts-Strogatz model offered an improvement, albeit an unnatural one, for modelling community geography, but it was unsuitable for modelling all the other areas required to model a community. For this reason I chose to explore other types of random network modelling.

9. Kleinberg 2D graphs

The next modelling method I chose to investigate introduces the concept of physical geography into the model. This model was developed by Kleinberg. The statistical output of the modelling program for the network is given below for networks of four different sizes ranging from 10000 to 150000 simulated individuals.

Graph size* | 10000 | 49729 | 99856 | 149769
Mean degree (out) | 49.8004 | 49.9104 | 49.9367 | 49.9483
S.d. of degrees (out) | 0.808801 | 0.547466 | 0.461076 | 0.417106
Min degree (out) | 43 | 43 | 43 | 43
Max degree (out) | 50 | 50 | 50 | 50
(Out) small-world number | 2.89756 | 3.39194 | 3.59781 | 3.70002
Mean (out) clustering c. | 0.0933704 | 0.0710919 | 0.0651955 | 0.0623456
Table 8 - Kleinberg simulation output (Source: output of my own implementation of Kleinberg social network simulation computer program averaged over 3 trials)

* Note that the graph sizes differ slightly from those in the other trials. This is because the Kleinberg model is based on a square lattice, so the number of points has to be the square of an integer.

I next present in the following figure the resultant network representation of a community from this simulation program.

Figure 7 - Kleinberg simulation resultant network (Source: http://en.wikipedia.org)

As can be seen, the resultant graph is unlike a geographic map, but each cluster in the model can represent a community or village.
Next is a cumulative graph of the degree of a node (y axis) plotted against the number of nodes within that count (x axis) in the resultant simulated network.

Figure 8 - Kleinberg simulation output (Source: output of my own implementation of Kleinberg social network simulation computer program averaged over 3 trials)

The following table summarizes the Kleinberg modelling technique in relation to using this technique to model communities:

Geography | Kleinberg is the best random model
Time | Results are static with a fixed structure
Degree of links | Unnatural
Link strength | No link strength
Link type | There is no provision for age, gender, job type etc. in the random models
Layers | This model has no representation of layers
Table 9 - Kleinberg simulation summary (Source: My own interpretation of Kleinberg simulation models)

The Kleinberg model was the best of the random models for modelling geography in a community, using a matrix as a map, but it was poor at modelling all the other areas required to model a community.

10. Multi Layered Network Modelling

The next modelling method I investigate is the multi layered network model that I developed myself. The statistical output of the modelling program for the network is given below for networks of four different sizes ranging from 10000 to 150000 simulated individuals.

Graph size | 10000 | 50000 | 100000 | 150000
Mean degree | 48.8404 | 50.0464 | 50.3359 | 50.5092
S.d. of degrees | 3.36403 | 2.96345 | 2.86358 | 2.80794
Min degree | 35 | 35 | 35 | 35
Max degree | 60 | 62 | 63 | 62
Small-world number | 2.97411 | 3.48605 | 3.66708 | 3.76461
Mean clustering c. | 0.171855 | 0.150188 | 0.142396 | 0.141109
Table 10 - Multi Layered simulation output (Source: output of my own implementation of Multi Layered social network simulation computer program averaged over 3 trials)

Next, I present in the following figure the resultant network representation of a community from this simulation program.
Figure 9 – Multi Layered simulation resultant network (Source: Output of my own implementation of Multi Layered Social network simulation computer program)

As can be seen, the resultant graph is unlike a geographic map. It does at least have a physical structure, although that structure can be said to be unnatural. Next is a cumulative graph of the degree of a node (y axis) plotted against the number of nodes within that count (x axis) in the resultant simulated network. The multi-layered model, in contrast, gives a more natural distribution for each node (individual) with an average of 21.93 friends per person (standard deviation = 3.31).

Figure 10 – Out degree distribution of Multi-Tiered social network simulation (Source: output of my own implementation of Multi-Tiered social network simulation computer program averaged over 3 trials)

The following list summarizes the Multi-layered modelling technique in relation to using this technique to model communities:

Geography          Multi-Tiered is OK
Time               The model results in a fixed structure
Degree of links    Bell curve is OK
Link strength      None of the models has link strength
Link type          There is no provision for age, gender, job type, etc. in the model
Layers             The Multi-Tiered model is OK

Table 11 – Multi-layered simulation summary (Source: My own interpretation of Multi Layered simulation models)

The multi-layered model gives a better representation of a community than the random models: it has a limited representation of geography, a bell curve distribution of the degree of links and a method of representing layers within the model. On the negative side, the multi-layered model has no method of representing time, link strength or link type.

11 Agent Based Network Modelling

The next modelling method I consider is the agent based model developed by myself. The statistical output of the modelling program for the network is given below for networks of four different sizes ranging from 10000 to 150000 simulated individuals.
Graph size            10000       50000       100000      150000
Mean degree           54.1893     54.3514     54.3872     54.4092
S.d. of degrees       9.73575     9.51811     9.42792     9.41752
Min degree            10          9.66667     9.33333     9.33333
Max degree            91          93.6667     104.333     101.667
Small-world number    2.74614     3.01872     3.22805     3.37854
Mean clustering c.    0.0179433   0.0132505   0.012583    0.0123686

Table 12 – Agent Based simulation output (Source: output of my own implementation of Agent Based Social network simulation computer program averaged over 3 trials)

Next I present in the following figure the resultant network representation of a community from this simulation program.

Figure 11 – Agent Based simulation resultant network (Source: Output of my own implementation of Agent Based Social network simulation computer program)

As can be seen, the resultant graph is a geographic map with a physical structure, although it can be said to be unnatural. Next is a cumulative graph of the degree of a node (y axis) plotted against the number of nodes within that count (x axis) in the resultant simulated network.

Figure 12 – Out degree distribution of Agent Based social network simulation (Source: output of my own implementation of Agent Based social network simulation computer program averaged over 3 trials)

The following list summarizes the Agent-Based modelling technique in relation to using this technique to model communities:

Geography          This model has a community map
Time               This model results in an incremental structure
Link strength      This model represents link strength
Link type          This model represents link type
Layers             This model represents multiple layers

Table 13 – Agent based simulation summary (Source: My own interpretation of Agent Based simulation models)

Agent based simulation fulfils all of my requirements for the simulation of a community. These are the representation in the model of a map of the community, the passing of time, link strength, link type and multiple layers.
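The degree and clustering statistics reported in the simulation output tables are standard graph measures. The following minimal Python sketch shows one way they could be computed; it is not the simulation program used for this paper. It assumes an undirected network stored as a dictionary mapping each individual to the set of its neighbours, and it uses the population standard deviation, which is an assumption about the convention. The names `local_clustering` and `degree_summary` are hypothetical, and the small-world number is omitted from this sketch.

```python
from itertools import combinations
from statistics import mean, pstdev


def local_clustering(graph, node):
    """Fraction of pairs of a node's neighbours that are themselves linked."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs to check
    linked_pairs = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2.0 * linked_pairs / (k * (k - 1))


def degree_summary(graph):
    """Mean, s.d., min and max of node degree, plus mean local clustering."""
    degrees = [len(neighbours) for neighbours in graph.values()]
    return {
        "mean_degree": mean(degrees),
        "sd_degree": pstdev(degrees),  # assumption: population s.d.
        "min_degree": min(degrees),
        "max_degree": max(degrees),
        "mean_clustering": mean(local_clustering(graph, n) for n in graph),
    }


# Toy network: a triangle of individuals 0, 1, 2 plus individual 3 linked only to 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
summary = degree_summary(toy)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In the toy network the degrees are 2, 2, 3 and 1, so the mean degree is 2, and the mean clustering coefficient is (1 + 1 + 1/3 + 0) / 4 = 7/12. Checking every pair of neighbours in this way is quadratic in a node's degree, which is manageable when degrees are of the order of 50 as in the tables above.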
12 Summary

The applicability of all six models to social network analysis, as applied to modelling a community, is discussed below. We equate points within a graph with individuals within our population and edges with social links. First I summarise how each of the models fulfils the community features identified above.

Geography -- Erdos-Renyi and Barabási-Albert are bad; Watts-Strogatz is OK but unnatural; Kleinberg is better. The Kleinberg and Multi-Layered models have the individuals embedded within a 2D map with a set of friends living close by, which is a good start. However, Kleinberg has a uniform lattice structure, which is unlike the natural aggregations of people into houses, streets, hamlets, villages, towns, cities, and countries. The Agent based model does represent a map of a community.

Time -- A social network is a constantly evolving dynamic object, whereas, for a given number of points, each model other than the Agent based model results in a fixed structure. Social links wax and wane, e.g. they fade with time or can be strengthened by repeated social contact. Among the fixed-structure models, Barabási-Albert is the only one which has any kind of evolution, but links are permanent once made and the preferential attachment results in a scale-free degree distribution. This is undesirable because it means that the degree distribution is highly skewed to the right, with some individuals of very high degree. The Agent based model does represent the passing of time.

Link strength -- Not all links are equal within a social network. Family ties are very different to ties with people you are linked to by virtue of sharing an office. None of the models has link strength or link type except the Agent based model, which does represent link strength.

Link type (individual properties) -- Just as all links are treated as equal, all points are treated as equal.
There is no provision for age, gender, job type, etc. in the models, except the Agent based model, which does represent link type.

Layers -- All of the random models are bad. A real social network has various "layers", each with a different distance function and with points having close links within each layer. While Kleinberg might model the housing links, it has no representation of the other layers. The Agent based model does represent layers.

Table 14: Computer modelling summary by judging criteria (Source: My own interpretation of simulation models fulfilling judging criteria)

Alternatively, these models can be summarized by type of model. The suitability of each of the models of community I investigated is summarized in the following list:

Markov chains                    No geography, No time, No link type, No link strength, No layers
Random Erdos-Renyi models        No geography, No time, No link type, No link strength, No layers
Barabási-Albert random models    No geography, No time, No link type, No link strength, No layers
Watts-Strogatz random models     Very limited geography, No time, No link type, No link strength, No layers
Kleinberg random models          Grid geography, No time, No link type, No link strength, No layers
Multi layered models             Grid geography, No time, No link type, No link strength, Representation of layers
Agent based models               Stylized map of the geography, Representation of time, Representation of link type, Representation of link strength, Representation of layers

Table 15 – Computer modelling summary by technique (Source: My own interpretation of simulation models fulfilling judging criteria)

As can be seen from this table, I conclude that agent based simulation, although not perfect, is by far the best way of simulating a community.

Bibliography

Albert, R. & Barabási, A.-L. (2002). "Statistical mechanics of complex networks", Reviews of Modern Physics, 74, pp. 47-97.
Doran, J. & Gilbert, N. (1995). Simulating Societies, UCL Press.
Erdos, P. & Renyi, A. (1959). On Random Graphs. Publicationes Mathematicae, 6, 290-297.
Fienberg and Wasserman (1985).
Journal of the American Statistical Association, Vol. 80, No. 389 (Mar., 1985), pp. 51-67.
Frank and Strauss (1986). Journal of the American Statistical Association, Vol. 81, No. 395 (Sep., 1986), pp. 832-842.
Gilbert, N. & Troitzsch, K. (1999). Simulation for the Social Scientist, Open University Press.
Holland, P. & Leinhardt, S. (1970). A Method for Detecting Structure in Sociometric Data. The American Journal of Sociology, Vol. 3 (Nov 1970), pp. 492-513.
Holland, P. & Leinhardt, S. (1979). Structural Sociometry. Academic Press.
Kleinberg, J. (1999). "The Small-World Phenomenon: An Algorithmic Perspective", Cornell Computer Science Technical Report 99-1776.
Nosh, Contractor. (2003). Theories of Communication Networks (co-authored with Peter Monge). Oxford University Press.
Markov, A.A. (1931). Theory of algorithms / A.A. Markov; translated from Russian by Jacques J. Schorr-Kon and IPST staff, Israel Program for Scientific Translations.
Robins, G., Pattison, P. & Woolcock, J. (2004). Missing data in networks: exponential random graph (p*) models for networks with non-respondents. Social Networks, 26, 257-283.
Snijders, T.A.B. (2001). The statistical evaluation of social network dynamics. M.E. Sobel & M.P. Becker (eds.), Sociological Methodology-2001, 361-395. Boston and London: Basil.
Snijders, T. & Baerveldt, C. (2003). A Multilevel Network Study of the Effects of Delinquent Behaviour on Friendship Evolution. Taylor & Francis.
Stevens, J.K. (2008). Analysing the experience of the 1996 intake of postgraduates acquaintances are made, annual (non residential) Graduate Conference, Sociology Department, University of Essex.
Suleiman, R., Troitzsch, R. & Gilbert, N. (2000). Tools and Techniques for Social Science Simulation, Physica-Verlag.
Wasserman, S. & Pattison, P. (1996). Logit models and logistic regressions for social networks. Psychometrika.
Watts, D. & Strogatz, S. (1998). "Collective dynamics of small-world networks", Nature, 393.