Microsoft PowerPoint - NCRM EPrints Repository


How to do… social network analysis

Mike Savage

(with the help of Nick Crossley)

University of Manchester

What is social network analysis?

• A set of mathematical and statistical techniques for handling relational data.

• For identifying the structural properties of sets of relations (i.e. of networks).

• And for visualising and describing networks.

• Social scientific origins in: sociology, anthropology, social psychology.

• Mathematical bases in: graph theory, matrix algebra and (increasingly) statistics.

• There is an increasing dialogue with physics, maths, computer science, informatics.

The challenge of social networks

• Relational thinking challenges powerful approaches which focus on (a) individual attributes and (b) variable-centred perspectives.

• Emphasises that it is connections (and disconnections) that matter

• Although highly formalised and quantitative, it often depends on qualitative, rather than survey-based, research

• Survey measures can often provide data on ego-networks (an individual’s relations) but qualitative work is often necessary to gather information on whole networks

What is relational data?

• Data concerning relations between things (e.g. agents) rather than the individual properties/attributes of those things.

• We might be interested in relations between: people, organisations, nations, towns …anything.

• We might be interested in relations based upon: knowledge, emotion, exchange, infection/contamination …anything.

What are structural properties of networks?

• Defining characteristics/patterns of the relationships comprising a network taken as a whole.

• SNA is not focused on individual properties of nodes or types of relations but on (i) the ‘whole’ these ‘parts’ comprise, (ii) the position of nodes within it (e.g. central or marginal) and (iii) the (relational) properties they acquire from it.

• SNA is formal (i.e. not focused on content/ meaning of relationships). That is a strength and a weakness.

Why bother with SNA?

• It is difficult to describe/analyse complex networks discursively.

• SNA organises relational data and makes it (via the software) easily ‘searchable’.

• SNA identifies genuine, meaningful, measurable and potentially sociologically relevant properties of networks.

• It shows that networks have properties which make a difference.

Basic Matrices

A simple relational matrix in which presence/absence of a relation is indicated by a 1 or 0 respectively: who drinks with whom?

        Tom  Dick  Sally  Fred  Alice
Tom      0    0     1      1     0
Dick     0    0     1      1     0
Sally    1    1     0      0     1
Fred     1    1     0      0     0
Alice    0    0     1      0     0

Note that in this case there is unnecessary repetition of information: if Tom drinks with Sally then Sally drinks with Tom.

        Tom  Sally
Tom           1
Sally    1

• We need to be mindful of this in any calculations we may make. We have one relationship here, not two.
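The point about double counting can be sketched in a few lines of Python (an illustration only; the slides themselves use Pajek, not Python). The cell values follow our reading of the ‘who drinks with whom?’ matrix above, and the function names are made up for this example.

```python
# Undirected "who drinks with whom?" matrix: 1 = relation present, 0 = absent.
# Cell layout is our reading of the slide; function names are illustrative.
names = ["Tom", "Dick", "Sally", "Fred", "Alice"]
drinks = [
    [0, 0, 1, 1, 0],  # Tom
    [0, 0, 1, 1, 0],  # Dick
    [1, 1, 0, 0, 1],  # Sally
    [1, 1, 0, 0, 0],  # Fred
    [0, 0, 1, 0, 0],  # Alice
]


def is_symmetric(matrix):
    """True if every cell (i, j) equals its mirror cell (j, i)."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


def undirected_edges(matrix, labels):
    """List each relationship once by reading only the cells above the diagonal."""
    n = len(matrix)
    return [(labels[i], labels[j])
            for i in range(n) for j in range(i + 1, n) if matrix[i][j]]


edges = undirected_edges(drinks, names)
print(is_symmetric(drinks))      # True: the relation is undirected
print(sum(map(sum, drinks)))     # 10 cells hold a 1 ...
print(len(edges))                # ... but there are only 5 relationships
```

Counting every 1 in the matrix would report ten relationships; reading one triangle of the matrix gives the correct five.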

Some relations are ‘directed’ (e.g. liking) and thus not necessarily reciprocated. This is recorded in the matrix. Tom likes Sally but she doesn’t like him. She likes Alice.

        Tom  Sally  Alice
Tom           1
Sally                1
Alice

• In this case Tom’s relation to Sally and hers to him are distinct and should be treated independently

• It is often not meaningful to ask whether a person is related to their self.

• But it may be (e.g. organisations send memos internally and externally, individuals like/dislike themselves).

• The matrix convention, when it is not meaningful, is to indicate that a person is not related to their self.

• Most software packages, including Pajek, use forms of data entry (short of a full matrix) where relations to self are not indicated unless meaningful.
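A directed relation can be sketched in the same way. The Python below is an illustration under two assumptions: empty cells in the ‘liking’ example are read as 0, and the diagonal (relation to self) is ignored. The function names are invented for this sketch.

```python
# Directed "liking" matrix: row = who does the liking, column = who is liked.
# Assumption: blank cells on the slide are treated as 0.
names = ["Tom", "Sally", "Alice"]
likes = [
    [0, 1, 0],  # Tom likes Sally
    [0, 0, 1],  # Sally likes Alice
    [0, 0, 0],  # Alice: no choices recorded
]


def arcs(matrix, labels):
    """List every directed tie (sender, receiver), skipping the diagonal."""
    n = len(matrix)
    return [(labels[i], labels[j])
            for i in range(n) for j in range(n) if i != j and matrix[i][j]]


def out_degree(matrix, i):
    """Number of ties sent by vertex i (row total, diagonal excluded)."""
    return sum(matrix[i][j] for j in range(len(matrix)) if j != i)


def in_degree(matrix, j):
    """Number of ties received by vertex j (column total, diagonal excluded)."""
    return sum(matrix[i][j] for i in range(len(matrix)) if i != j)


def reciprocated(matrix, i, j):
    """True only if i chooses j and j chooses i."""
    return bool(matrix[i][j] and matrix[j][i])


print(arcs(likes, names))         # [('Tom', 'Sally'), ('Sally', 'Alice')]
print(out_degree(likes, 0))       # 1: Tom sends one tie
print(in_degree(likes, 0))        # 0: nobody chooses Tom
print(reciprocated(likes, 0, 1))  # False: Sally does not like Tom back
```

Here the two halves of the matrix carry different information, so both must be read.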

Relations may be weighted in ordinal/interval manner: e.g. 0 = ‘Don’t like’, 1=‘like’, 2=‘really like’; or telephones n times per week.

[Slide table: a 5 × 5 weighted matrix with rows and columns Tom, Dick, Sally, Fred and Alice; the order of the cell values cannot be recovered from the extracted text.]

Basic sociograms

Graphs represent the same information as matrices in a different way.

• Relationships are indicated by connecting lines.

• If undirected these lines are referred to as ‘edges’.

• If directed these lines are referred to as ‘arcs’ and direction is indicated by an arrow head (potentially at both ends)

• The objects or nodes (e.g. people) connected by arcs/edges are referred to as vertices (sing. = vertex).

• These terms derive from graph theory.

• (Examples to follow from Savage, Tampubolon and Warde, 2008).

Figure 8.1: Communication Networks within the Labour Party

[Sociogram drawn in Pajek; vertex numbers omitted.]

Figure 8.2: Communication Networks within the Conservation Group

[Sociogram drawn in Pajek; vertex numbers omitted.]

Note: Respondents were asked ‘with whom do you discuss things to do with the organisation (for example, activities, issues, strategy)?’

Figure 8.3.1: Obtaining Information Network in the Labour Party

[Sociogram; vertex numbers omitted.]

Figure 8.3.3: Obtaining Information Network in the Conservation Group

[Sociogram; vertex numbers omitted.]

Nice pictures, but so what?

• SNA is often applied to intrinsically interesting groups (elites, policy makers, medical diseases) but it is not always clear how it applies to ‘mundane’ groups

• The value of linking sociograms to qualitative evidence. We used in-depth interviews to show how the structure of networks affected people’s narratives

• This shows that the hub and spoke structure (conservation group) is more clientelist and less energising than the inner wheel structure (Labour Party)

More formal measures

SNA involves measurement of relational properties: e.g.

• Degree (in-degree, out-degree, average degree).

• Density.

• Components, cliques and cores.

• Centrality.

• Centralisation.

• Erdős/Bacon number(s).

• Diameter.

• (see John Scott’s handbook for an excellent intro to these and more)

Density =

• The actual number of connections in a network expressed as a proportion of the total possible number of connections.

• A figure between 0 and 1.

• Not easy to make meaningful comparisons of density across networks of different sizes (or involving different types of relations).

• High density should generate greater: trust, cultural homogeneity and diffusion speed.
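As a worked illustration of the definition: with n vertices there are n(n-1)/2 possible undirected connections, or n(n-1) possible directed ones, and density is the observed number divided by that total. The short Python sketch below assumes a simple network with no relations to self; the function name and arguments are our own.

```python
def density(n_vertices, n_lines, directed=False):
    """Observed connections as a proportion of all possible connections.

    Assumes no relations to self and at most one line per pair
    (per direction, if directed).
    """
    if n_vertices < 2:
        raise ValueError("density needs at least two vertices")
    possible = n_vertices * (n_vertices - 1)   # ordered pairs
    if not directed:
        possible //= 2                         # each pair counted once
    return n_lines / possible


# Five people and five drinking relationships: 5 / 10 possible pairs.
print(density(5, 5))                  # 0.5
# Three people and two directed 'liking' ties: 2 / 6 possible arcs.
print(density(3, 2, directed=True))   # 0.333...
```

The same five lines in a network of 50 people would give a density of about 0.004, which is one reason densities are hard to compare across networks of different sizes.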

Components = sub-graphs within a graph, any of whose vertices can be reached from any other (by whatever route).

• Important because the holes that separate different components within a graph prevent the flow of resources and might indicate tribal or ideological divides.

• Cliques = a subset of vertices within a component, all of whose members are related to one another (density = 1).

• K-cores = a subset of vertices within a component, all of whose members enjoy at least a specified number of relations (= ‘k’) with the others: e.g. a 6-core is a subset, all of whose members enjoy relations with at least 6 of the other members.

• Cliques and cores are also important because of their relations to those outside of their grouping and their relative power.
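The three definitions can be made concrete with a small sketch in plain Python. The network below is hypothetical (seven invented vertices, A to G), and the k-core routine uses the common ‘pruning’ approach of repeatedly dropping vertices with fewer than k remaining ties; it is one way of finding cores, not necessarily how Pajek does it.

```python
from collections import deque

# Hypothetical undirected network: A-D all know each other, E knows only D,
# and F and G know only each other.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C", "E"},
    "E": {"D"},
    "F": {"G"},
    "G": {"F"},
}


def components(adj):
    """Return the connected components as a list of vertex sets."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in comp:
                    comp.add(w)
                    queue.append(w)
        seen |= comp
        result.append(comp)
    return result


def is_clique(adj, vertices):
    """True if every pair of the given vertices is directly related."""
    vs = list(vertices)
    return all(vs[j] in adj[vs[i]]
               for i in range(len(vs)) for j in range(i + 1, len(vs)))


def k_core(adj, k):
    """Vertices left after repeatedly removing any with fewer than k ties."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if len(adj[v] & alive) < k:
                alive.discard(v)
                changed = True
    return alive


print(len(components(adj)))                 # 2 components
print(is_clique(adj, {"A", "B", "C", "D"}))  # True
print(sorted(k_core(adj, 3)))               # ['A', 'B', 'C', 'D']
```

E belongs to the larger component but not to the clique or the 3-core, which shows how the three notions pick out progressively tighter groupings.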


Varieties of (vertex) centrality

[Diagram: vertices labelled by type of centrality: degree (Deg), closeness (Close) and betweenness (Bet).]

• Central vertices enjoy both advantages and disadvantages (potentially)

• There are also ‘centralisation’ measures focused upon the network as a whole.

• How centralised is it?

• The most basic of these take variance measures for one of the centrality scores (or something very similar)

• Centralised networks may be easier to organise.

• They are often robust in relation to random attack/damage.

• But vulnerable to targeted attack.

e.g. remove any ‘spoke’ from this ‘wheel’ and damage is minimal but remove the central hub and the whole thing collapses (atomises).
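The wheel example can be sketched in Python. This is an illustration under stated assumptions: the ‘wheel’ is modelled as one hub tied to five spokes, a six-person ring is added as a decentralised contrast, and centralisation is approximated by the variance of the degree scores (one simple variance-style measure; the vertex names are invented).

```python
from statistics import pvariance


def degrees(adj):
    """Degree (number of ties) of every vertex."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def remove_vertex(adj, gone):
    """Copy of the network with one vertex and all its ties deleted."""
    return {v: nbrs - {gone} for v, nbrs in adj.items() if v != gone}


def count_components(adj):
    """Number of connected components (isolated vertices count as one each)."""
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return count


# Hypothetical hub-and-spoke 'wheel': every spoke is tied only to the hub.
spokes = ["s1", "s2", "s3", "s4", "s5"]
wheel = {"hub": set(spokes), **{s: {"hub"} for s in spokes}}
# Hypothetical ring: everyone is tied to exactly two neighbours.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}

print(pvariance(list(degrees(wheel).values())))      # about 2.22: centralised
print(pvariance(list(degrees(ring).values())))       # 0: no vertex stands out
print(count_components(remove_vertex(wheel, "s1")))  # 1: losing a spoke is harmless
print(count_components(remove_vertex(wheel, "hub"))) # 5: losing the hub atomises it
```

Random damage is five times more likely to hit a spoke than the hub, which is why such a network is robust to random attack but fragile under targeted attack.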

The concept of ‘degree’ is also used in this context (and the concept of ‘walk’).

An individual with a Bacon number of 4 is separated from Bacon by ‘four degrees of separation’.

[Diagram: Bacon and vertices labelled 1, 2, 3 and 4.]

• The ‘small world’ phenomenon is sometimes referred to as ‘six degrees of separation’.
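A Bacon number is simply the length of the shortest path to Bacon, which can be found with a breadth-first search. The Python sketch below uses a hypothetical co-appearance network (the vertices a to e and z are invented) and an illustrative function name.

```python
from collections import deque


def distances_from(adj, source):
    """Shortest path length (in steps) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:          # first visit is along a shortest path
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


# Hypothetical network: a chain Bacon - a - b - c - d, plus e (who worked
# with Bacon directly) and z (who is not connected to anyone).
costars = {
    "Bacon": {"a", "e"},
    "a": {"Bacon", "b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
    "e": {"Bacon"},
    "z": set(),
}

bacon_number = distances_from(costars, "Bacon")
print(bacon_number["e"])       # 1
print(bacon_number["d"])       # 4: 'four degrees of separation'
print("z" in bacon_number)     # False: no path, so no Bacon number
```

Vertices in a different component from Bacon have no Bacon number at all, which ties this measure back to the idea of components.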

Diameter of a component =

• The shortest possible path between the two most distant vertices (measured in degrees).

• Or the (mean) average path distance between all possible pairs of vertices.

• Relevant because resources, viruses etc. will take longer to travel across large diameters.

• Information/gossip will be more liable to distortion (Chinese whispers) the more mouths it passes through.
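Both readings of ‘diameter’ given above can be computed from the shortest path lengths between all pairs of vertices. The Python sketch below assumes a single connected component and uses two hypothetical five-vertex networks (a chain and a hub with four spokes); the function names are our own.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Shortest path length from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def all_pair_distances(adj):
    """Shortest path length for each pair (assumes one connected component)."""
    dist = {v: bfs_distances(adj, v) for v in adj}
    return [dist[u][v] for u, v in combinations(list(adj), 2)]


def diameter(adj):
    """Shortest path between the two most distant vertices."""
    return max(all_pair_distances(adj))


def mean_path_length(adj):
    """Average shortest path length over all pairs of vertices."""
    lengths = all_pair_distances(adj)
    return sum(lengths) / len(lengths)


# Hypothetical chain a - b - c - d - e.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
         "d": {"c", "e"}, "e": {"d"}}
# Hypothetical hub with four spokes.
star = {"hub": {"p", "q", "r", "s"},
        "p": {"hub"}, "q": {"hub"}, "r": {"hub"}, "s": {"hub"}}

print(diameter(chain), mean_path_length(chain))  # 4 2.0
print(diameter(star), mean_path_length(star))    # 2 1.6
```

With the same number of people, gossip crosses the hub-and-spoke network in at most two steps but may need four in the chain.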

• For a very clear exposition of these and other ideas see: Scott, J. (2000) Social Network Analysis: A Handbook, London, Sage.

For a longer, more detailed and more challenging (but very clear) intro, see: Wasserman, S. and Faust, K. (1994) Social Network Analysis, Cambridge, CUP.

Pajek software

• Free to download

• Easy to use

• Good book available on it.

• Good on Visualisation.

• Compatible with other packages (e.g. netdraw, SPSS)

• Able to handle very big data sets.

Getting Started

• Preparing a data file for Pajek.

• Visualising with Pajek.

Preparing a data file 1:

• You cannot prepare data files within Pajek itself. You must prepare them separately.

• Pajek doesn’t read Word files.

• Best to use ‘Notepad’ or a similar program.

• Notepad is a standard feature on W4W (look in ‘programs’).

• Notepad saves files in ‘textfile’ format by default. Pajek can read textfiles.

• There are many ways of constructing datafiles for Pajek; here are two:

Simply open Notepad and type (replacing n with the number of vertices):

*vertices n
1 Tom
2 Dick
3 Harry
*edges
1 2
1 3
2 3
Etc.

Or, for directed relations:

*vertices n
1 Tom
2 Dick
3 Harry
*arcs
1 2
1 3
2 3
Etc.

You can also weight relationships:

*vertices n
1 name
2 name
3 name
*edges
1 2 2
1 3 1
2 3 4
Etc.
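For larger networks the same file can be generated rather than typed. The Python sketch below is an illustration only: the function name and its arguments are invented, it follows the layout shown above, and it assumes vertex labels contain no spaces (labels with spaces would need quoting).

```python
def pajek_network(names, lines, directed=False):
    """Return the text of a Pajek-style network file.

    names -- vertex labels, numbered from 1 in the order given
    lines -- tuples (from, to) or (from, to, weight), using the labels
    """
    number = {name: i for i, name in enumerate(names, start=1)}
    out = [f"*vertices {len(names)}"]
    out += [f"{i} {name}" for name, i in number.items()]
    out.append("*arcs" if directed else "*edges")
    for line in lines:
        a, b, *weight = line                      # weight is optional
        fields = [str(number[a]), str(number[b])] + [str(w) for w in weight]
        out.append(" ".join(fields))
    return "\n".join(out) + "\n"


text = pajek_network(
    ["Tom", "Dick", "Harry"],
    [("Tom", "Dick", 2), ("Tom", "Harry", 1), ("Dick", "Harry", 4)],
)
print(text)

# To save the file for Pajek (the file name here is hypothetical):
# with open("friends.net", "w") as handle:
#     handle.write(text)
```

Passing directed=True writes an *arcs section instead of *edges, and leaving out the third item of each tuple gives the unweighted form.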

The Pajek book

• De Nooy, W., Mrvar, A. and Batagelj, V. (2005) Exploratory Social Network Analysis with Pajek. Cambridge: Cambridge University Press, £22.99 pbk, 0-521-60262-9.
