Week 11
Dragomir R. Radev
Wednesdays, 6:10-8 PM
325 Pupin Terrace
Fall 2010
(29) Bibliometrics
• The Science Citation Index (1960)
– More than 8,700 journals in the natural and social sciences
• Eugene Garfield
• de Solla Price – study of networks of papers and citation patterns
• Citeseer
• Rexa
• Google Scholar
• ACL Anthology Network
• Journal citation reports
• Impact factor:
– Computed over a three-year period as B/A, where
• First two years: A = number of citable items
• Third year: B = the number of citations to them
• In science (2006)
– Science (30.03)
– Nature (26.68)
– PNAS (9.64)
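The B/A definition above can be sketched in a few lines. The function name and the journal figures are hypothetical, chosen only to illustrate the arithmetic.

```python
def impact_factor(citable_items: int, citations: int) -> float:
    """Return B/A: citations received in year 3 (B) divided by the
    number of citable items published in years 1 and 2 (A)."""
    if citable_items <= 0:
        raise ValueError("need at least one citable item")
    return citations / citable_items

# Hypothetical journal: 200 citable items in years 1-2, cited 1500 times in year 3.
print(impact_factor(200, 1500))  # 7.5
```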
• Favors certain fields and types of research
• Absolute value is meaningless
• Ignores certain types of scholarly work (e.g., books, software, conference papers)
• Possible to manipulate
• Self-citations
• Ignores citation type (this applies to all other metrics!)
[Weinstock 1971]
• In a given year, about 35% of all existing papers are not cited at all. Another 49% are cited only once. The rest are cited an average of 3.2 times each.
• Power-law exponent of the degree distribution is about 2.5-3.0
• 7% annual growth
• Most papers are obsolete after 10 years
De Solla Price 1965
• Citation count
• Impact factor
• Pagerank (e.g., http://www.eigenfactor.org/)
• H-index
• Proposed by Jorge Hirsch of UCSD in 2005
• Equals the largest number h such that h of your papers have each been cited at least h times.
• For physicists, 12 = tenure, 18 = full prof, 45 = NAS (statement by Hirsch)
• See demo (ACL Anthology Network)
• also: PoP (guess what it means?) h papers
• Galois’s is 2 (short career)
• Hard to compare two people with the same score but very different distribution
• Hugely different based on the underlying database
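The h-index definition can be sketched as follows. The citation counts are made up, and the second pair of calls illustrates the point about two people with the same score but very different distributions.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank      # the top `rank` papers all have >= rank citations
        else:
            break
    return h

# Hypothetical citation counts per paper.
print(h_index([10, 8, 5, 4, 3]))   # 4
print(h_index([3, 3, 3]))          # 3
print(h_index([100, 50, 3]))       # 3 (same h, very different distribution)
```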
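h-index by data source: AAN vs. Google Scholar (row alignment of the values is not recoverable, apart from Ken Church's Google Scholar value)
• Names: Ken Church, Kevin Knight, Ralph Grishman, Aravind Joshi, Hermann Ney, Fernando Pereira, David Yarowsky, Michael Collins, Chris Manning, Daniel Marcu, Kathy McKeown, Robert Mercer, Franz Och, Yves Schabes, Stuart Shieber, Eric Brill, Eugene Charniak, Ido Dagan, Mark Johnson, Philip Resnik
• AAN values: 12, 12, 12, 12, 12, 12, 12, 15, 14, 14, 14, 14, 13, 12, 11, 11, 11, 11, 11, 16
• Google Scholar values: 38 (Ken Church); 35, 25, 25, 34, 32, 32, 39, 32, 30, 33, 45, 45, 30, 24, 23, 37, 24, 25, 30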
88 Hector Garcia-Molina (Stanford), ACM Fellow, Member of the National Academy of Engineering
81 Jeffrey D. Ullman (Stanford), ACM Fellow, Member of the National Academy of Engineering
76 Robert Tarjan (Princeton), Turing Award, ACM Fellow, Member of the National Academy of Engineering
75 Deborah Estrin (UCLA), ACM Fellow, IEEE Fellow
75 Don Towsley (U Mass, Amherst), ACM Fellow, IEEE Fellow
73 Ian Foster (Argonne National Laboratory & U Chicago)
71 Scott Shenker (Berkeley), ACM Fellow, IEEE Fellow
70 David Culler (Berkeley), ACM Fellow, Member of the National Academy of Engineering
68 Takeo Kanade (CMU), ACM Fellow, IEEE Fellow, Member of the National Academy of Engineering
61 Mario Gerla (UCLA), IEEE Fellow
61 Nick Jennings (U Southampton), Fellow of the Royal Academy of Engineering
58 Anil K. Jain (Michigan State U), ACM Fellow, IEEE Fellow
57 Demetri Terzopoulos (UCLA), ACM Fellow, IEEE Fellow, Member of the European Academy of Sciences
56 Randy H. Katz (Berkeley), ACM Fellow, IEEE Fellow, Member of the National Academy of Engineering
56 Steven Salzberg (U Maryland)
55 Jennifer Widom (Stanford), ACM Fellow, Member of the National Academy of Engineering
54 Jack Dongarra (U Tennessee), ACM Fellow, IEEE Fellow, Member of the National Academy of Engineering
54 David E. Goldberg (UIUC)
54 Ken Kennedy (Rice), ACM Fellow, IEEE Fellow, Member of the National Academy of Engineering
54 Amir Pnueli (Weizmann and New York University), Turing Award, ACM Fellow, Member of the National Academy of Engineering
54 Herbert A. Simon (CMU), Turing Award, ACM Fellow, Nobel Laureate
53 Sally Floyd (ICSI), ACM Fellow
53 Tomaso Poggio (MIT)
53 Eduardo Sontag (Rutgers), IEEE Fellow
52 Rakesh Agrawal (Microsoft), ACM Fellow, IEEE Fellow, Member of the National Academy of Engineering
52 Stanley Osher (UCLA), Member of the National Academy of Sciences
52 Christos H. Papadimitriou (Berkeley), ACM Fellow, Member of the National Academy of Engineering
51 Jiawei Han (UIUC), ACM Fellow
51 Richard Karp (Berkeley), Turing Award, ACM Fellow, Member of the National Academy of Engineering
51 Alex Pentland (MIT)
[using PoP; collected by Jens Palsberg (UCLA)] http://www.cs.ucla.edu/~palsberg/h-number.html
• 31.5% of the papers have been cited.
• In-degree power law coefficient 1.71
• Diameters:
– Neural networks (n=23,371) d=24, ud=18
– Automata (n=28,168) d=33, ud=19
– Software eng (n=19,018) d=22, ud=16
• Largest connected components:
– NN WCC=79.6%
– Automata WCC=92%
– SE WCC=87.9%
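Two of the statistics above, the fraction of papers that are cited and the size of the largest weakly connected component (WCC), can be computed as in this minimal sketch. The toy edge list and function names are hypothetical; an edge (a, b) means paper a cites paper b.

```python
from collections import Counter

def cited_fraction(n_papers, edges):
    """Fraction of papers that receive at least one citation."""
    cited = {dst for _, dst in edges}
    return len(cited) / n_papers

def largest_wcc_fraction(n_papers, edges):
    """Fraction of papers in the largest weakly connected component,
    found with union-find while ignoring edge direction."""
    parent = list(range(n_papers))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for src, dst in edges:
        root_a, root_b = find(src), find(dst)
        if root_a != root_b:
            parent[root_a] = root_b
    sizes = Counter(find(v) for v in range(n_papers))
    return max(sizes.values()) / n_papers

# Toy network of 6 papers; paper 5 neither cites nor is cited.
edges = [(1, 0), (2, 0), (2, 1), (4, 3)]
print(cited_fraction(6, edges))        # 0.5
print(largest_wcc_fraction(6, edges))  # 0.5
```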
Many reasons why people collaborate:
[Beaver 2001; Glaenzel 2003]
[Paul Erdos]
(23) The Ising model
(24) Percolation on graphs
• Will water flow through a porous stone?
• Let p be the probability that an edge is open.
• This process is called “bond percolation”
• Paths (percolation) appear at p = 0.5059. This is a quintessential example of a phase transition.
[Figure: q(p) plotted against p, with the point (1,1) marked]
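Bond percolation can be explored numerically. The sketch below assumes an n x n square grid in which each bond is open independently with probability p, and estimates how often open bonds connect the left edge to the right edge. The function names, grid size, and trial count are illustrative choices, not taken from the slides.

```python
import random

def spans(n, p, rng):
    """True if open bonds connect the left and right columns of an n x n grid."""
    parent = list(range(n * n + 2))
    left, right = n * n, n * n + 1   # two virtual nodes for the grid edges

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for r in range(n):
        union(left, r * n)               # tie left column to virtual node
        union(right, r * n + n - 1)      # tie right column to virtual node
        for c in range(n):
            v = r * n + c
            if c + 1 < n and rng.random() < p:   # bond to the right neighbor
                union(v, v + 1)
            if r + 1 < n and rng.random() < p:   # bond to the neighbor below
                union(v, v + n)
    return find(left) == find(right)

def spanning_probability(n, p, trials=200, seed=0):
    """Monte Carlo estimate of the probability that a spanning path exists."""
    rng = random.Random(seed)
    return sum(spans(n, p, rng) for _ in range(trials)) / trials

for p in (0.3, 0.4, 0.5, 0.6, 0.7):
    print(p, spanning_probability(20, p))
```

On a finite grid the estimate rises smoothly from near 0 to near 1 rather than jumping; the transition sharpens as n grows.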
• Example: ferromagnetism. The Curie point is the temperature above which there is no longer spontaneous magnetization
• Generic example of a magnetic field:
[http://ibiblio.org/e-notes/Perc/ising.htm]
• Given a lattice in D-dimensional space.
• Each vertex can be -1 or 1.
• Configurations: specific assignments of -1 and 1
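• The energy of a configuration S is $E(S) = -J \sum_{\langle i,j \rangle} S_i S_j - H \sum_i S_i$, where the first sum runs over nearest-neighbor pairs, J is the coupling, and H is the external field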
• In statistical physics: $P(S) \sim e^{-\beta E}$
[http://ibiblio.org/e-notes/Perc/trans.htm]
• http://webphysics.davidson.edu/applets/ising/default.html
• http://stp.clarku.edu/simulations/ising/ising2d.html
• http://www.phy.syr.edu/courses/ijmp_c/Ising.html
• Ferromagnetic alignment (J>0)
• Temperature tends to break the alignment: it causes the spins to change their values at random
• External magnetic field tends to support the alignment
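The competition between alignment, temperature, and external field can be sketched with a Metropolis simulation. This is a minimal sketch, assuming a 2D square lattice with periodic boundaries, the standard nearest-neighbor energy E = -J Σ s_i s_j - H Σ s_i, and Boltzmann weights P(S) ~ exp(-βE). The function names and parameter values are illustrative.

```python
import math
import random

def energy(spins, J=1.0, H=0.0):
    """Total energy; each nearest-neighbor bond is counted once
    (right and down neighbors, with periodic wrap-around)."""
    n = len(spins)
    e = 0.0
    for i in range(n):
        for j in range(n):
            s = spins[i][j]
            e -= J * s * (spins[i][(j + 1) % n] + spins[(i + 1) % n][j])
            e -= H * s
    return e

def metropolis_sweep(spins, beta, rng, J=1.0, H=0.0):
    """Propose n*n single-spin flips, accepting each with the Metropolis rule."""
    n = len(spins)
    for _ in range(n * n):
        i, j = rng.randrange(n), rng.randrange(n)
        s = spins[i][j]
        neighbors = (spins[(i + 1) % n][j] + spins[(i - 1) % n][j]
                     + spins[i][(j + 1) % n] + spins[i][(j - 1) % n])
        delta = 2.0 * s * (J * neighbors + H)   # energy change if s flips
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            spins[i][j] = -s

def magnetization(spins):
    """Mean spin value, between -1 and 1."""
    n = len(spins)
    return sum(sum(row) for row in spins) / (n * n)

rng = random.Random(0)
for beta in (0.2, 0.4, 0.6):           # beta = 1 / temperature
    spins = [[1] * 16 for _ in range(16)]
    for _ in range(200):
        metropolis_sweep(spins, beta, rng)
    print(beta, magnetization(spins))
```

Starting from the fully aligned state, small beta (high temperature) should drive the magnetization toward zero, while large beta should leave it close to 1.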
• The critical value is around 0.59 but has not been derived analytically.
• http://theorie.physik.uni-wuerzburg.de/~reents/ComputationalPhysics/percgr.html
• http://ibiblio.org/e-notes/Perc/perc.htm
• http://ibiblio.org/e-notes/Perc/distr.htm
• http://stp.clarku.edu/simulations/
(15) Diffusion on graphs
• Epidemic = in the limit of a large graph, a nonzero fraction is infected.
• Fully mixed networks – everyone is connected to everyone the same way.
• In real life this is not true.
• Let f = average number of shortcuts per vertex.
• Let k = 1: every vertex is connected to at least its one nearest neighbor.
• For large L (#vertices), the prob. that two random vertices have a shortcut is:
$1 - \left[ 1 - \frac{2}{L^2} \right]^{kfL} \simeq \frac{2kf}{L}$
Moore and Newman 2000. Epidemics and percolation in small-world networks.
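The large-L approximation can be checked numerically. This sketch assumes the expression reads 1 - [1 - 2/L^2]^(kfL) ≈ 2kf/L, as in Moore and Newman (2000); the function names and the value of f are illustrative.

```python
def shortcut_prob_exact(L, k, f):
    """1 - (1 - 2/L^2)^(k f L): chance that two random vertices share a shortcut."""
    return 1.0 - (1.0 - 2.0 / L**2) ** (k * f * L)

def shortcut_prob_approx(L, k, f):
    """Large-L approximation 2 k f / L."""
    return 2.0 * k * f / L

k, f = 1, 0.5   # f = 0.5 is an arbitrary illustrative value
for L in (100, 1_000, 10_000):
    exact = shortcut_prob_exact(L, k, f)
    approx = shortcut_prob_approx(L, k, f)
    print(L, exact, approx, abs(exact - approx) / approx)
```

The relative gap between the two expressions shrinks as L grows, which is why the simpler 2kf/L form is used for large networks.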
• Newman 2002
– Outbreak size distribution
– Degree of infected individuals
– Bipartite graphs