Examining Psychokinesis:
The Interaction of Human Intention with Random Number Generators.
A Meta-Analysis

Submitted: August 19, 2004

Acknowledgments
(removed for blind review)
Abstract
Séance room phenomena and apports have fascinated mankind for decades. Experimental research has reduced these phenomena to attempts to influence (i) the fall of dice and, later, (ii) the output of random number generators (RNGs). (This overlooks dozens of other PK experiments. It also overlooks the fact that most of the impetus for developing RNG experiments was not to simulate séance phenomena, but to test quantum observational theories, to tighten methodologies, and to reduce the need for special subjects.) The meta-analysis presented here combined 357 studies that assessed whether human intention could correlate with RNG output. The studies yielded a significant, but very small, effect. Study size was strongly and inversely related to effect size (why the authors prefer to ignore the voluminous literature on this issue, I cannot fathom); this finding was consistent across all examined moderator and safeguard variables. A (not well specified) Monte Carlo simulation revealed that the small effect size, the relation between study size and effect size, and the extreme effect size heterogeneity might all be a result of publication bias.

The idea that individuals can influence inanimate objects by the power of their own mind is a relatively recent concept. (Huh? Isn't sympathetic magic one of the most ancient beliefs?) During the 1970s, Uri Geller reawakened mass interest in this putative ability through his demonstrations of spoon bending using his alleged psychic powers (Targ & Puthoff, 1977; Wilson, 1976) and lays claim to this ability even now (e.g., Geller, 1998). Belief in this phenomenon is widespread. In 1991 (Gallup & Newport), 17 percent of American adults believed in "the ability of the mind to move or bend objects using just mental energy" (p. 138) and seven percent even claimed that they had "seen somebody moving or bending an object using mental energy" (p. 141).
Unknown to most academics, a large amount of experimental data has accrued testing the hypothesis of a direct connection between the human mind and the physical world. It is one of the very few lines of research where replication is the main and central target (initially, perhaps, but surely not for the past 20 years), a commitment that some methodologists wish were the commitment of experimental psychologists in general (e.g., Cohen, 1994; Rosenthal & Rosnow, 1991). This article will trace the development of the empirical evaluation of this alleged phenomenon and will present a new meta-analysis of a large set of studies examining the interaction between human intention and random number generators.
Psi research
Psi phenomena (Thouless, 1942; Thouless & Wiesner, 1946) can be split into two main categories. Psychokinesis (PK) is the common label for the apparent ability of humans to affect objects solely by the power of the mind. Extrasensory perception (ESP), on the other hand, refers to the apparent ability of humans to acquire information without the mediation of the recognized senses or logical inference. Many researchers believe that PK and ESP phenomena are idiosyncratic (from context I'm guessing they mean something like "identical," not idiosyncratic) (e.g., Pratt, 1949; J. B. Rhine, 1946; Schmeidler, 1982; Stanford, 1978; Thouless & Wiesner, 1946). Nevertheless, the two phenomena have been treated very differently right from the start of their scientific examination. For instance, whereas J. B. Rhine and his colleagues at the Psychology Department at Duke University published the results of their first ESP card experiments right after they had been conducted (Pratt, 1937; Price & Pegram, 1937; J. B. Rhine, 1934, 1936, 1937; L. E. Rhine, 1937), they withheld the results of their first PK experiments for nine years (L. E. Rhine & J. B. Rhine, 1943) even though they had been carried out at the same time as the ESP experiments: Rhine and his colleagues did not want to undermine the scientific credibility that they had gained through their pioneering monograph on ESP (Pratt, J. B. Rhine, Smith, Stuart & Greenwood, 1940).
When L. E. Rhine & J. B. Rhine (1943) went public with their early dice experiments, the evidence was based not only on above-chance results, but primarily on a particular scoring pattern. In those early PK experiments, the participants' task was to obtain combinations of given die faces. The researchers discovered a decline of "success" during longer series of experiments, a pattern suggestive of mental fatigue (Reeves & Rhine, 1943; J. B. Rhine & Humphrey, 1944, 1945). This psychologically plausible pattern of decline seemed to eliminate several counterhypotheses for the positive results obtained, such as die bias or trickery, because such artifacts would not lead to a systematic decline. However, as experimental evidence grew, the decline pattern lost its impact in the chain of evidence.
Verifying psi
Today, in order to verify the existence of psi phenomena, one of two meta-analytic approaches is generally undertaken: either the "proof-oriented" or the "process-oriented" approach. The proof-oriented meta-analytic approach tries to verify the existence of psi phenomena by establishing an overall effect. The process-oriented meta-analytic approach tries to verify the existence of psi by establishing a connection between results and moderator variables.
Alleged [probably the intended meaning here and elsewhere is "potential," since the legalistic term "alleged" is inappropriate] moderators of PK, such as the distance between the participant and the target, and various psychological variables, have never been investigated as systematically as alleged moderators of ESP. So far, there have not been any meta-analyses of PK moderators, and the three main literature reviews of PK moderators (Gissurarson, 1992, 1997; Gissurarson & Morris, 1991; Schmeidler, 1977) have come up with inconclusive results.
On the other hand, the three meta-analyses on ESP moderators established significant correlations between ESP and extraversion (Honorton, Ferrari & Bem, 1998), ESP and belief in ESP (Lawrence, 1998), and ESP and defensiveness (Watt, 1994). The imbalance between systematic reviews of PK and ESP moderators reflects the general disparity between the experimental investigations of the two categories of psi. From the very beginning of experimental investigation into psi, researchers have focused on ESP.
The imbalance between research in ESP and PK is also evident from the proof-oriented meta-analytic approach. Only three (Radin & Ferrari, 1991; Radin & Nelson, 1989, 2002) of the 13 meta-analyses on psi data (Bem & Honorton, 1994; Honorton, 1985; Honorton & Ferrari, 1989; Milton, 1993, 1997; Milton & Wiseman, 1999a, 1999b; Radin & Ferrari, 1991; Radin & Nelson, 1989, 2002; Stanford & Stein, 1994; Steinkamp, Milton & Morris, 1998; Storm & Ertel, 2001) address research on PK. Only two of the 13 provide no evidence for psi (Milton & Wiseman, 1999a, 1999b). (The point of discussing/claiming an "imbalance between research in ESP and PK" is not clear. It also may be incorrect in more recent years, depending on how one counts RV and Ganzfeld, neither of which are classical ESP, and both of which have a relatively small number of datapoints. In any case, if this issue is to be pursued, ESP should be defined -- the next section shows evidence of confounding very different approaches, e.g., experimental research, social observation, and anecdote.)
Psychology and psi
Psychological approaches to psi have also almost exclusively focused on ESP. For example, there is a large amount of research (I disagree: certainly there is ample rhetoric, but systematic research, no) supporting the hypothesis that alleged ESP experiences are the result of delusions and misinterpretations (e.g., Alcock, 1981; Blackmore, 1992; Persinger, 2001). Personality-oriented research established connections between belief in ESP and several personality variables (Irwin, 1993; see also Dudley, 2000; McGarry & Newberry, 1981; Musch & Ehrenberg, 2002). Experience-oriented approaches to paranormal beliefs, which stress the connection between paranormal belief and paranormal experiences (e.g., Alcock, 1981; Blackmore, 1992; Schouten, 1983), and media-oriented approaches, which examine the connection between paranormal belief and depictions of paranormal events in the media (e.g., Sparks, 1998; Sparks, Hansen & Shah, 1994; Sparks, Nelson & Campbell, 1997), both focus on ESP, although the paranormal belief scale most frequently used in those studies also has some items on PK (Thalbourne, 1995).
The beginning of the experimental approach to Psychokinesis
Reports of séance room sessions during the late 19th century are filled with claims of extraordinary movements of objects (e.g., Crookes, Horsley, Bull, & Meyers, 1885), prompting some outstanding researchers of the time to devote at least part of their career to determining whether the alleged phenomena were real (e.g., Crookes, 1889; James, 1896; Richet, 1923). In these early days, as in psychology, case studies and field investigations predominated. Hence, it is not surprising that in this era experimental approaches and statistical analyses were used only occasionally (e.g., Edgeworth, 1885, 1886; Fisher, 1924; Sanger, 1895; Taylor, 1890). Even J. B. Rhine, the founder of the experimental study of psi phenomena, abandoned case studies and field investigations as a means of obtaining scientific proof only after he had exposed several mediums as frauds (e.g., J. B. Rhine & L. E. Rhine, 1927). However, after a period of several years when he and his colleagues focused almost solely on ESP research, their interest in PK was reawakened in 1937 when a gambler visited the laboratory at Duke University and casually mentioned that many gamblers believed they could mentally influence the outcome of a throw of dice. This inspired J. B. Rhine to perform a series of informal experiments using dice. Very soon, experiments with dice became the standard approach for investigating PK.
Difficulties in devising an appropriate methodology soon became apparent, and improvements in the experimental procedures were quickly implemented. Standardized methods for throwing the dice were developed. Dice-throwing machines were used to prevent participants from manipulating their throw of the dice. Recording errors were minimized by having experimenters either photograph the outcome of each throw or having a second experimenter independently record the results. Commercial, pipped dice were found to have sides of unequal weight, with the sides with the larger number of excavated pips, such as the 6, being lighter and hence more likely to land uppermost than lower numbers, such as the 1. Consequently, studies required participants to attempt to score seven with two dice, or used a balanced design in which the target face alternated from one side of the die (e.g., 6) to the opposite side (e.g., 1).
In 1962, Girden (1962a) published a comprehensive critique of dice experiments in the Psychological Bulletin. Among other things, he criticized the experimenters for pooling data as it suited them, and for changing the experimental design once it appeared that results were not going in a favorable direction. He concluded that the results from the early experiments were largely due to bias in the dice and that the later, better-controlled studies were progressively tending toward nonsignificant results. Although Murphy (1962) disagreed with Girden's conclusion, he did concede that no "ideal" experiment had yet been published that met all six quality criteria, namely one with (i) a sufficiently large sample size; (ii) a standardized method of throwing the dice; (iii) a balanced design; (iv) an objective record of the outcome of the throw; (v) the hypothesis stated in advance; and (vi) a prespecified end point.
The controversy about the validity of the dice experiments continued (e.g., Girden, 1962b; Girden & Girden, 1985; Rush, 1977). Over time, experimental and statistical methods improved and, in 1991, Radin & Ferrari undertook a meta-analysis of the dice experiments.
Dice Meta-Analysis
The dice meta-analysis comprised 148 experimental studies and 31 control studies published between 1935 and 1987. In the experimental studies, 2,569 participants tried to mentally influence 2,592,817 die casts. In the control studies, a total of 153,288 die casts were made without any attempt to mentally influence the dice. The experimental studies were coded for various quality measures, including a number of those mentioned by Girden (1962a).
Table 1 provides the main meta-analytic results.¹ (Given the importance of these calculations,

¹ To compare the meta-analytic findings from the dice and previous RNG meta-analyses with those from our RNG meta-analysis, we converted all effect size measures to the proportion index π (Rosenthal & Rubin, 1989), which we use throughout the paper. This one-sample effect size ranges from 0 to 1, with .50 representing the null value (mean chance expectation, MCE). For two equally likely outcomes, e.g. when tossing a coin, π represents the proportion of "hits". For example, if the hit rate for heads is 50%, the effect size π = .50 indicates that heads and tails came down equally often; if the hit rate for heads were 75%, the effect size would be π = .75. The most important property of π is that it converts all cases with more than two equally likely outcomes, such as tossing a die, to the proportional hit rate as if there were just two alternatives.
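The conversion just described can be sketched in a few lines of Python. The formula below is Rosenthal & Rubin's (1989) published proportion index, π = P(k − 1) / (1 + P(k − 2)), where P is the raw hit rate and k the number of equally likely alternatives; this is an illustration of the conversion, not the code used in the present meta-analysis.

```python
def proportion_index(p_hit, k):
    """Rosenthal & Rubin's (1989) proportion index (pi).

    Converts a raw hit rate p_hit obtained with k equally likely
    alternatives into the equivalent two-alternative hit proportion,
    so that .50 always represents mean chance expectation (MCE).
    """
    return p_hit * (k - 1) / (1 + p_hit * (k - 2))

# With two alternatives (a coin), pi is simply the raw hit rate:
print(proportion_index(0.75, 2))  # 0.75

# With six alternatives (a die), the chance hit rate 1/6 maps to the
# null value .50, so dice and coin studies become directly comparable:
print(round(proportion_index(1 / 6, 6), 4))
```

Note how the index pins mean chance expectation at .50 regardless of the number of alternatives, which is what makes the dice and RNG literatures commensurable.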
it seems odd to relegate all this to a gigantic footnote.) The overall effect size, weighted by the inverse of the variance, is small but highly significant (π̄_t = .50610, z = 19.68). Radin & Ferrari calculated that approximately 18,000 null-effect studies would be required to reduce the result to a nonsignificant level (Rosenthal, 1979). When the studies were weighted for quality, the effect size decreased considerably (z = 5.27, p = 1.34 × 10⁻⁷), but was still significantly above chance.

The authors found that there were indeed problems regarding die bias, with the effect size of the target face 6 being significantly larger than the effect size of any other target face. They concluded that this bias was sufficient to cast doubt on the whole database. They subsequently
In order to combine effect sizes from independent studies, we used a fixed-effect model, weighted by the inverse of the variance (how is the variance per study calculated?) (e.g., Shadish & Haddock, 1994). Because Dean Radin kindly provided us with the basic data files of the dice meta-analysis, we were able to compute the overall effect size π̄. However, although we were able to calculate the overall effect sizes π̄_o for all meta-analyses on the basis of the original data (see Table 2), the die data provided did not enable us to carry out the specific subgroup analyses presented in the meta-analysis and summarized in Table 1. Consequently, in order to provide this information we transformed the published results, which used the effect size r = z/sqrt(n), using π̄_t = .5r̄ + .5. This transformation is accurate as long as the z values of the individual studies are based on two equally likely alternatives (p = q = .5). However, the z scores of most dice experiments are based on six equally likely alternatives (p = 1/6 and q = 5/6). (What about the position studies?) Consequently, π̄_o as computed on the basis of the original data and π̄_t as computed on the basis of the transformation formula diverge slightly, because r no longer remains within the limits of ±1. However, the difference between π̄_o and π̄_t is very small (< .05%) as long as the z values are not extreme (z > 10, p < 1 × 10⁻¹⁰). The difference is smaller the closer the value is to the null value of .50, which is the case for all effect sizes presented here. The difference between the two approaches can be seen when the results of the overall dice meta-analysis that are presented in Table 1 are compared with the results presented in Table 2. The differences between the two estimates are determined using z_Δ = (π̄_o − π̄_t) / sqrt(SE_o² + SE_t²). Although the difference is statistically significant (z_Δ = 4.12, p = 3.72 × 10⁻⁵, two-tailed), the order of magnitude is the same.
reduced their database to only those 69 studies that had correctly controlled for die bias (the "balanced database"). As shown in Table 1, the resultant overall effect size remained statistically highly significant. However, the effect sizes of the studies in the balanced database were statistically heterogeneous. When Radin & Ferrari trimmed the sample until the effect sizes in the balanced database became homogeneous, the effect size was reduced to only π̄_t = .50158, and fell yet further to π̄_t = .50147 when the 59 studies were weighted for quality. Only 60 unpublished null-effect studies (our calculation (explain)) are required to bring the balanced, homogeneous and quality-weighted studies down to a nonsignificant level. Ultimately, the dice meta-analysis did not advance the controversy over the putative PK effect beyond the verdict of "not proven", as mooted by Girden (1962b, p. 530) almost 30 years earlier.
Moreover, the meta-analysis has several limitations. Radin & Ferrari neither examined the source(s) of heterogeneity in their meta-analysis, nor addressed whether the strong correlation between effect size and target face disappeared when they trimmed the 79 studies not using a balanced design from the overall sample. The authors did not analyze potential moderator variables and did not specify inclusion and exclusion criteria. The studies included varied considerably regarding the type of feedback given to participants; some studies were even carried out entirely without feedback. The studies also differed substantially regarding the participants recruited: some participants were psychic claimants, whereas others made no claims to having any "psychic powers" at all. However, fundamentally as well as psychologically, the studies differ most in respect of the experimental instructions participants received and the time window in which they had to try to influence the dice. Although most experiments were real-time, with the participant's task being mentally to influence the dice as they were thrown, some experiments were "precognition experiments" in which participants were asked to predict which die face would land uppermost in a future die cast thrown by someone other than the participant.
From Dice to Random Number Generator
With the arrival of computers, dice experiments were slowly replaced by a new approach.
Beloff & Evans (1961) were the first experimenters to use radioactive decay as a source of randomness to be influenced in a PK study. In the initial experiments, participants would try mentally to slow down or speed up the rate of decay of a radioactive source. The mean disintegration rate subjected to influence was then compared with that of a control condition in which there was no attempt at human influence.
Soon after this, experiments were devised in which the output from the radioactive source was transformed into bits (1s or 0s) that could be stored on a computer. These devices were known as random number generators (RNGs). Later, RNGs used electronic noise or other truly random processes as the source of randomness.
This line of PK research was, and continues to be, pursued by many experimenters, but predominantly by Schmidt (e.g., 1969), and later by the Princeton Engineering Anomalies Research (PEAR) group at Princeton University (e.g., Jahn, Dunne & Nelson, 1980).
RNG Experiments
In a typical PK RNG experiment, a participant presses a button to start the accumulation of experimental RNG data. The participant's task is mentally to influence the RNG to produce, say, more 1s than 0s for a predefined number of bits. Participants are generally given real-time feedback of their ongoing performance. The feedback can take a variety of forms. For example, it may consist of lamps "moving" in a clockwise or counterclockwise direction, or of clicks presented to the right or left ear, depending on whether the RNG produces a 1 or a 0. Today, feedback is generally software implemented and is primarily visual. If the RNG is based on a truly random source, it should generate 1s and 0s an equal number of times. However, because small drifts cannot be totally eliminated, experimental precautions such as the use of an XOR filter or a balanced experimental design are still required.
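The XOR filter mentioned above can be illustrated with a short sketch (an illustration of the principle, not any particular laboratory's implementation). XORing the raw bit stream with a deterministic, exactly balanced template, here an alternating 0, 1, 0, 1, ... sequence, inverts every second bit, so any constant bias in the source contributes equally toward 1s and 0s and the long-run mean is pinned at .50:

```python
import random

def xor_filter(bits):
    """XOR each raw bit with an alternating 0/1 template. Because the
    template is exactly balanced, a constant bias in the source is
    inverted on every second bit and cancels out of the mean."""
    return [b ^ (i % 2) for i, b in enumerate(bits)]

random.seed(0)
# Simulate a drifting physical source biased toward 1s:
raw = [1 if random.random() < 0.6 else 0 for _ in range(10_000)]
filtered = xor_filter(raw)

print(sum(raw) / len(raw))            # close to 0.6: the raw stream is biased
print(sum(filtered) / len(filtered))  # close to 0.5: the mean bias is removed
```

Note that such a filter removes only mean bias; it is no substitute for a balanced experimental design, which is why, as the text notes, both precautions remained in use.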
The RNG studies have many advantages over the earlier dice experiments, making it much easier to perform quality research with much less effort. Computerization alone meant that many of Girden's (1962a) and Murphy's (1962) concerns about methodological quality could be overcome. Returning to Murphy's list of six methodological criteria: (i) unlike with manual throws of dice, RNGs made it possible to conduct studies with large sample sizes (reflects a fundamental assumption for which there is no sound evidence: the bit is the sample) in a short space of time; (ii) the RNG was completely impersonal and, unlike the dice, not open to any classical (normal human) biasing of its output; (iii) balanced designs were still necessary due to potential drifts in the RNG; (iv) the output of the RNG could be stored automatically by computer, thus eliminating recording errors that may have been present in the dice studies; (v) like the dice studies, the hypotheses still had to be formulated in advance; and (vi) like the dice studies, optional stopping could still be a potential problem. Thus, RNG research meant that, in practical terms, researchers no longer had to be concerned about alleged weak points (i), (ii) and (iv).
New Limits
From a methodological point of view, RNG experiments have many advantages over the older dice studies. However, in respect of ecological validity, the RNG studies have some failings. Originally, the PK effect to be assessed was macroscopic and visual. Experimentalists then reduced séance room PK, first to PK on dice, and then to PK on a random source in an RNG. (I don't think this historical sequence is correct. I doubt that dice and RNG tests were created to simulate macro effects.) But PK may not be reducible to a microscopic level (e.g., Braude, 1997). Moreover, a dice experiment is psychologically very different from an RNG study. Most people have played with dice, but few have had prior experience with RNGs. Additionally, an RNG is a technical gadget whose output must be computed before feedback can be presented. Nevertheless, the ease with which PK data can be accumulated using an RNG has led to PK RNG experiments forming a substantial proportion of the available data. Three related meta-analyses of these data have already been published.
Previous RNG Meta-Analyses
The first RNG meta-analysis was published by Radin & Nelson (1989) in Foundations of Physics. This meta-analysis of 597 experimental studies published between 1959 and 1987 found a small but significant effect of π̄_o = .50018 (SE = .00003, z = 6.53, p < 1 × 10⁻¹⁰).² The size of the effect did not diminish when the studies were weighted for quality or when they were trimmed by 101 studies to render the database homogeneous.
The limitations of this meta-analysis are very similar to those of the dice meta-analysis. The authors did not examine the source(s) of heterogeneity and did not specify inclusion and exclusion criteria. (From our FoP paper: "Experiments selected for review examined the following hypothesis: The statistical output of an electronic RNG is correlated with observer intention in accordance with pre-specified instructions, as indicated by the directional shift of distribution parameters (usually the mean) from expected values." That seems pretty well specified to me!) Consequently, participants varied from humans to cockroaches, and the feedback ranged from no feedback at all to the administration of an
² The meta-analysis provided the overall effect size only in a figure (Fig. 3, p. 1506). Because its first author kindly provided us with the original data, we were able to calculate the overall effect size and the proper statistics.
electric shock. The meta-analysis included not only studies using true RNGs, which are RNGs based on truly random sources such as electronic noise or radioactive decay, but also studies using pseudo-RNGs (e.g., Radin, 1982), which are based on deterministic algorithms (with truly random starting points). It might be argued that the authors simply took a very inclusive approach. However, the authors did not discuss the extreme variance in the distribution of the studies' z scores (Again, from the FoP article: "Finally, following the practice of reviewers in the physical sciences (23,24), we deleted potential "outlier" studies to obtain a homogeneous distribution of effect sizes and to reduce the possibility that the calculated mean effect size may have been spuriously enlarged by extreme values." Certainly this is a rationale for what we did, although perhaps it is not a discussion.) and did not assess any potential moderator variables; these were also two weaknesses of the dice meta-analysis. Nevertheless, this first RNG meta-analysis served to justify further experimentation and analyses with the PK RNG approach.
Almost 10 years later, in his book aimed at a popular audience, Radin (1997) recalculated the effect size of the first RNG meta-analysis, claiming that the "overall experimental effect, calculated per study, was about 51 percent" (p. 141). However, this newly calculated effect size is two orders of magnitude larger than the effect size of the first RNG meta-analysis (50.018%). (Doesn't this reflect the different aggregation unit, bit vs. experiment?) The increase has two sources. First, Radin removed the 258 PEAR studies included in the first meta-analysis (without discussing why, because the whole purpose of that piece of the chapter was to address whether PEAR had been replicated, so it didn't make any sense to include it in the mix!), and second, he presented simple mean values instead of the weighted means presented 10 years earlier. The use of simple mean values in meta-analyses is generally discredited (e.g., Shadish & Haddock, 1994), because it does not take into account that larger studies provide more accurate estimates of effect size. (ONLY assuming the effect size is independent of N.) In this case, the difference between computing an overall effect size using mean values rather than weighted mean values is dramatic. The removal of the PEAR studies effectively increased the impact of other small studies that had very large effect sizes. The effect of small studies on the overall outcome will be a very important topic in the current meta-analysis.
Recently, Radin & Nelson (2003) published an update of their earlier (1989) RNG meta-analysis, adding a further 176 studies to their database. In this update, the PEAR data were collapsed into a new, single datapoint. The authors reported a simple mean effect size of 50.7%. Presented as such, the data appear to suggest that this updated effect size replicates that found in the first RNG meta-analysis. However, when the weighted fixed-effect model is applied to the data, as was used in the first RNG meta-analysis, the effect size of the updated database becomes π̄_o = .50005, which is significantly smaller than the effect size of the original RNG meta-analysis (z_Δ = 4.27, p = 1.99 × 10⁻⁵; see Table 2 for comparison). One reason for the difference is the increase in sample size of the more recent experiments, which also show a concomitant decline in effect size. (The effect size vs. N issue again.)
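The gap between a simple mean and a weighted fixed-effect estimate is easy to reproduce with made-up numbers. In the sketch below (hypothetical studies, not data from any of the meta-analyses), small studies have large effect sizes and large studies have near-null ones, the inverse relation described above; for binary bits the variance of a proportion is roughly .25/n, so the inverse-variance weight is proportional to n:

```python
# Hypothetical studies: (number of bits, observed hit proportion).
# Small studies show large effects, large studies near-null effects.
studies = [
    (100,        0.560),
    (1_000,      0.520),
    (100_000,    0.502),
    (10_000_000, 0.5001),
]

# Simple mean: every study counts equally, however small.
simple_mean = sum(p for _, p in studies) / len(studies)

# Inverse-variance weights: var(p_hat) ~ 0.25 / n, so weight ~ n.
weighted_mean = sum(n * p for n, p in studies) / sum(n for n, _ in studies)

print(f"simple mean:   {simple_mean:.4f}")    # dominated by the small studies
print(f"weighted mean: {weighted_mean:.4f}")  # dominated by the large studies
```

With effect size inversely related to study size, the simple mean sits well above the weighted estimate, which is exactly the pattern at issue in the 50.7% versus π̄_o = .50005 comparison.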
Like the other meta-analyses, the updated 2002 meta-analysis did not investigate any potential moderator variables, and no inclusion and exclusion criteria were specified (I don't understand this latter business about not specifying the inclusion criteria); it also did not include a heterogeneity test of the database. All three meta-analyses were conducted by related research teams, and thus an independent replication of their findings is lacking. The need for a more thoroughgoing meta-analysis of PK RNG experiments is clear. (Fair enough.)
Human Intention Interacting with Random Number Generators:
A New Meta-Analysis
The meta-analysis presented here was part of a five-year consortium project on RNG experiments. The consortium comprised research groups from PEAR, USA; the University of Giessen, Germany; and the Institut für Grenzgebiete der Psychologie und Psychohygiene [Institute for Border Areas of Psychology and Mental Hygiene] in Freiburg, Germany. After all three groups in the consortium failed in their appropriately powered (what was beta?) experiments (assuming an effect-per-bit model!!!!!) to replicate the mean shift of the PEAR group data (Jahn et al., 2000), which form one of the strongest and most influential datasets in psi research, the question about possible moderating variables in RNG experiments rose to the forefront (historically, and ironically, doing a M-A of the REG literature focusing on the moderating variable question was a task I assigned to my Freiburg team -- Boesch and Boller -- in 1996 and 1997). Consequently, a meta-analysis was conducted to determine whether the existence of an anomalous interaction could be established between direct human intention and the concurrent output of a true RNG, and if so, whether there were moderators or other explanations that influenced the apparent connection.
Method
Literature Search
The meta-analysis began with a search for any experimental studies that examined the possibility of an anomalous connection between the output of an RNG and the presence of a living being. This search was designed to be as comprehensive as possible in the first instance, and to be trimmed later in accordance with our prespecified inclusion and exclusion criteria.
Both published and unpublished manuscripts were sought.
Manual searches were undertaken at the library and archives of the Institut für Grenzgebiete der Psychologie und Psychohygiene in Freiburg, Germany. They included searches of the following journals: Proceedings of the Parapsychological Association Annual Convention (1968, 1977-1981, 1983-1999), Research in Parapsychology (1969-1976, 1982, 1984, 1985, 1988), Journal of Parapsychology (1959-1998), Journal of the Society for Psychical Research (1959-1999), European Journal of Parapsychology (1975-1998), The Journal of the American Society for Psychical Research (1959-1998), Journal of Scientific Exploration (1987-1998), Subtle Energies (1991-1997), Journal of Indian Psychology (1978-1999), Tijdschrift voor Parapsychologie (1959-1999), International Journal of Parapsychology (1959-1968), Cuadernos de Parapsicologia (1963-1996), Revue Métapsychique (1960-1983), Australian Parapsychological Review (1983-1991), Research Letter of the Parapsychological Division of the Psychological Laboratory of Utrecht (1971-1984), Bulletin PSILOG (1981-1983), Journal of the Southern California Society for Psychical Research (1979-1985), and the Arbeitsberichte Parapsychologie der Technischen Universität Berlin (1971-1980). (Why so many missing years in these journals?)
Electronic searches were conducted on the Psiline Database System (Vers. 1999), a continuously updated specialized electronic resource of parapsychologically relevant writings
(White, 1991). The key words used to identify relevant articles in this specialized database were
Random Number Generator, RNG, Random Event Generator and REG. Electronic searches were also conducted on six CDs of Dissertation Abstracts Ondisc (Jan. 1961-Sep. 1999) using four different search strategies. First, the key words random number generator, RNG, random event generator, REG, randomness, radioactive, parapsychology, perturbation, psychokinesis, PK, extrasensory perception, telepathy, precognition and calibration were used. Second, the key words random and experiment were combined with event, number, noise, anomalous, anomaly, influence, generator, apparatus or binary. Third, the key word machine was combined with man or mind. Fourth, the key word zener was combined with diode.
To obtain as many relevant unpublished manuscripts as possible, visits were made to three other prolific parapsychology research institutes: the Rhine Research Center, Durham NC;
PEAR at Princeton University; and the Koestler Parapsychology Unit at Edinburgh University.
Furthermore, a request for unpublished studies was placed on an electronic mailing list for professional parapsychologists (Parapsychology Research Forum - PRF). Is this really a professional forum? I haven't read it for many years, and it seems to be open to anyone? It is, or was, indeed open to anyone.
Finally, the reference sections of all retrieved journal articles, conference proceedings, reports and theses/dissertations were searched. The search covered a broad range of languages, including items in Dutch, English, French, German, Italian and Spanish, and was otherwise limited only by the lack of further available linguistic expertise.
Inclusion and Exclusion Criteria
The final database included only studies that examined the correlation between direct human
intention and the concurrent output of true RNGs. Thus, after the comprehensive literature search was conducted we excluded studies that: (a) involved, implicitly or explicitly, only an
indirect intention toward the RNG. For example, telepathy studies, in which a receiver attempts to gain impressions about the sender's viewing of a target that had been randomly selected by a true RNG, were excluded (e.g., Tart, 1976). Here, the receiver's intention is presumably directed to gaining knowledge about what the sender is viewing, rather than on influencing the
RNG; (b) used animals or plants as participants (e.g., Schmidt, 1970); (c) assessed the possibility of a non-intentional, or only ambiguously intentional, effect. For instance, studies evaluating whether hidden RNGs could be influenced when the participant's intention was directed to another task or another RNG (e.g., Varvoglis & McCarthy, 1986) or studies that used babies as participants (e.g., Bierman, 1985); (d) looked for an effect backwards in time or, similarly, in which participants observed the same bits a number of times (e.g., Morris, 1982; Schmidt, 1985); (e) evaluated whether there was an effect of human intention on a pseudo-RNG (e.g., Radin, 1982). (This list seems to exclude all Field REG studies.)
Additionally, studies were excluded if their outcome could not be transformed into the effect size π̄ that was prespecified for this meta-analysis (if the tool we have is a hammer, we will only look at nails). As a result, studies that compared the rate of radioactive decay in the presence of attempted human influence with that of the same element in the absence of human intention (e.g., Beloff & Evans, 1961) were excluded. The cut-off date for inclusion of studies in the meta-analysis was prespecified as 30th August 2000. Then why do none of the searched literature sources go up to this date?
Defining Studies
Some studies were reported in both published and unpublished forms, or both as a full journal article and elsewhere as an abstract. In these cases, all reports of the same study were used to obtain information for the coding, but the report with the most details was classified as the "main report". The main reports often contained more than one "study". A study was the smallest experimental unit described that did not overlap with other data in the report. This enabled the maximum amount of information to be included. In cases where the same data could be split up in two different ways (e.g., men vs. women or morning sessions vs. afternoon sessions), the split was used that appeared to reflect the author's greatest interest in designing the study.
Many studies performed unattended randomness checks of the RNG to ensure that the apparatus was functioning properly. These control runs were coded in a separate "control" database. Data for these control runs, like the experimental database, were split based on the smallest unit described. In some experiments, data were gathered in the presence of a participant with an instruction to the participant "not to influence" the RNG (e.g., Jahn et al.,
2000). These data were excluded from both experimental and control databases due to the inherent ambiguity of whether intention is playing an influential role. (So something called a control condition is ambiguous and can't be included in control? In fact, the PEAR lab did a huge amount of "unattended randomness checks" that are not included in this M-A. Now we see that even the baseline condition, which was explicitly designed to be compared against the bi-polar high-vs-low intention conditions, is not included in controls?)
Moderator Variables
To identify potential moderators, all variables suggested by previous literature reviews were coded [blindly? No -- this coding was certainly not blind] (Gissurarson, 1992, 1997; Gissurarson & Morris, 1991; Schmeidler, 1977). Additionally, several descriptive variables and variables explicitly or implicitly held responsible for the presence or absence of an anomalous correlation were placed on the internet and discussed by researchers interested in the topic.
After considering the feedback and making any requisite revisions to the coding form, 20 papers were randomly selected and independently pilot coded by FS and EB. Afterwards, the two sets of coding were compared, coder disagreements were discussed, ambiguities in the coding descriptions were clarified, and the coding form was finalized.
The variables coded covered six main areas: (i) Basic information, such as year of publication and study status (i.e., formal, pilot, mixed, control); (ii) Participant information, such as selection criteria (e.g., none, psychic claimant, prior success in psi experiment, ...); (iii) Experimenter information, such as whether the experimenter also acted as a participant (e.g., no, yes, partially); (iv) Experimental setting, such as type of feedback (visual, auditory, ...); (v) Statistical information variables, such as total number of bits (sample size) [what about bits/effort, bits/subject, etc.?]; and (vi) Safeguard variables.
The final coding form contained 67 variables. The comprehensive coding was applied because, prior to coding the studies, it was not clear which variables would provide enough data for a sensible moderator variable analysis. However, because of the importance of the safeguard variables, i.e., the moderators of quality, we prespecified that the impact of the three safeguard variables would be examined independently of their frequency distribution. The safeguards were: (1) RNG control, which recorded whether malfunction of the RNG had been ruled out by the study, either by using a balanced design or by performing control runs of the RNG (but note that real control data were excluded in some cases); (2) all data reported, which addressed whether the final study size matched the planned size of the study or whether optional stopping or selective reporting may have occurred (note that PEAR REG studies do not report a planned size, but that optional stopping CANNOT have an effect on the outcome); (3) split of data (define), which noted whether the split of data reported was explicitly planned or was potentially post hoc. All safeguards were ranked on a three-point scale (yes [2], earlier 3/unclear [1], no [0]), with the intermediate value being coded either when it was unclear whether the study actually took the safeguard into account or where it was only partially taken into account. Because summary scores of safeguard variables are problematic if considered exclusively (e.g., Jüni, Witschi, Bloch, & Egger, 1999), we examined the influence of the safeguard variables separately and in conjunction.
The main coding was undertaken by FS [blindly?]. For any potentially controversial or difficult decisions, FS consulted with EB. If FS and EB could not agree, the final decision fell to HB, who was blind as to who held which opinion. Over time, HB's decisions generally supported FS and EB equally, thus suggesting that HB served well as mediator.
Analyses
All analyses were performed using SPSS (Vers. 11.5) software. The effect sizes of individual studies were combined into composite mean weighted effect size measures as described in Footnote 1. To determine whether the π̄ from each subsample (class) significantly differed from MCE, the standard error based on the within-study variance was calculated (Shadish & Haddock, 1994). (But in this database and M-A, the between-study variance within subsamples is at issue. The chosen SE is arguably not appropriate. It could only be correct if the subsamples are indeed homogeneous -- did the authors really identify all the moderators? One that is apparently not identified is subject differences, what PEAR called "operators" and showed to be a powerful moderator.) The resulting z score indicates whether π̄ differs from MCE. To determine whether each subsample of π̄s shared a common effect size (i.e., was consistent across studies), a homogeneity statistic Q was calculated, which has an approximately χ² distribution with k - 1 degrees of freedom, where k is the number of effect sizes (Shadish & Haddock, 1994). Given the importance of this, the method for calculating Q should be better specified. I'm guessing it is the sum of z scores. The difference between two effect size estimates was determined using z_Δ = (π̄₁ - π̄₂) / sqrt(SE₁² + SE₂²).
3 When authors referred to previous studies in which the RNG was tested, studies were coded as controlled "earlier".
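Since the combination procedure is only cited (Shadish & Haddock, 1994) rather than spelled out, a minimal sketch may help. It assumes the standard fixed-effect inverse-variance form; the function names are mine, not the authors':

```python
import math

def combine(effects, ses, mce=0.5):
    """Fixed-effect inverse-variance combination (standard Shadish & Haddock form).

    effects: per-study effect sizes (proportion of hits, MCE = .5)
    ses:     per-study standard errors
    Returns the pooled effect, its SE, the z against MCE, and Cochran's Q.
    """
    w = [1.0 / se ** 2 for se in ses]                      # inverse-variance weights
    pi_bar = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    se_bar = math.sqrt(1.0 / sum(w))
    z = (pi_bar - mce) / se_bar                            # does pi_bar differ from MCE?
    q = sum(wi * (e - pi_bar) ** 2 for wi, e in zip(w, effects))  # ~ chi2(k - 1)
    return pi_bar, se_bar, z, q

def z_delta(pi1, se1, pi2, se2):
    """Difference between two effect size estimates, as in the text."""
    return (pi1 - pi2) / math.sqrt(se1 ** 2 + se2 ** 2)
```

Note that Q here is not literally a "sum of z scores" as guessed above, but the weighted sum of squared deviations from the pooled effect, i.e., the sum of each study's squared standardized deviation from π̄.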
As an initial, straightforward sensitivity approach to estimating the overall effect size, we trimmed the overall experimental sample until it became homogeneous. This was conducted by applying an algorithm to the data which successively excluded the study that contributed most to the heterogeneity of the sample. The procedure stopped when the χ² heterogeneity statistic became non-significant (Hedges & Olkin, 1985). A comparison between the resulting homogeneous sample and the studies that had to be trimmed from the overall database (the "trimmed studies" sample) allows one to assess the reliability of the estimated effect size and to estimate the impact of aberrant values on the overall result.
In an attempt to explore the putative impact of moderator and safeguard variables on the effect size and to determine the source(s) of heterogeneity, a meta-regression analysis was carried out. Meta-regression is a multivariate regression analysis with independent studies as the unit of observation (e.g., Thompson & Higgins, 2002; Thompson & Sharp, 1999). This analysis determines how the variables in the model account for the heterogeneity of effect size. We applied a weighted stepwise multiple regression analysis with the moderators as predictors and effect size as the dependent variable.
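The weighting scheme can be illustrated with a one-moderator sketch; the paper's actual analysis is stepwise and multivariate, so the single-predictor simplification and the function name are mine:

```python
def weighted_regression(effects, ses, x):
    """Weighted simple regression of effect size on one study-level moderator,
    with weights 1/SE^2 (the inverse within-study variance, as in the text)."""
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw          # weighted means
    my = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, effects))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope
```

Because the weights are 1/SE², larger studies dominate the fit; this is the same property that, as Footnote 4 later notes, keeps sample size itself from entering the weighted model.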
In the absence of homogeneity, and as a general sensitivity measure, we additionally calculated a random-effects model (Shadish & Haddock, 1994), which takes into account the variance between studies (i.e., heterogeneity on the basis of the Q homogeneity statistic). This should be better described. I'm not entirely clear on the difference between fixed and random effect models. Because, generally, the standard error using a random-effects model is larger, the test statistic is consequently more conservative than the test statistic of the fixed-effect model. The z score (rnd) indicates whether π̄ differs from MCE using a random-effects approach. However, even in the absence of homogeneity, the fixed-effect model is particularly appropriate in the context of the studies collected here because, although the impact of some alleged moderator variables will be examined, no moderator has been established yet and the overall effect remains a matter of contention.
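For readers unsure of the fixed/random distinction, here is a sketch of one common random-effects estimator, DerSimonian-Laird. The paper cites Shadish and Haddock (1994) without naming its estimator, so this particular choice is an assumption:

```python
import math

def random_effects(effects, ses):
    """DerSimonian-Laird random-effects combination: adds an estimated
    between-study variance tau^2 to each study's within-study variance,
    which widens the pooled SE and makes the z test more conservative."""
    k = len(effects)
    w = [1.0 / se ** 2 for se in ses]
    pi_fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - pi_fixed) ** 2 for wi, e in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                 # between-study variance
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]    # inflated variances
    pi_rnd = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se_rnd = math.sqrt(1.0 / sum(w_star))
    return pi_rnd, se_rnd, tau2
```

When the studies are homogeneous, tau² estimates to zero and the model collapses to the fixed-effect result; with heterogeneous studies, the pooled SE grows, exactly the conservatism described above.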
Results
Study Characteristics
The literature search retrieved 155 main reports containing 712 experimental studies and
158 control studies. After applying the inclusion and exclusion criteria, the meta-analysis included 114 reports containing 357 experimental studies and 142 control studies (see
Appendix).
The basic study characteristics are summarized in Table 3. The heyday of RNG experimentation was in the 1970s, when more than half of the studies were published. A quarter of the studies were published in conference proceedings and reports, but most of the studies were published in journals. The number of participants in the studies varied considerably. Approximately one quarter of studies were conducted with a sole participant and another quarter with up to 10 participants. There were only a few studies with more than 100 participants. The sample size of the average study is 6,095,359 bits. However, most studies were much smaller, as indicated by a median sample size of 6,400 bits (see Table 4). The few very large studies considerably increased the average sample size and resulted in an extremely right-skewed distribution of sample size. This variable was therefore log10-transformed. Consequently, a significant linear correlation or regression coefficient of sample size with another variable would indicate an underlying exponential relationship.
Overall Analyses
When combined, the 357 experimental studies yielded a small, but statistically significant effect (π̄ = .500029, SE = .000011, z = 2.73, p = .003, one-tailed). The 142 control studies yielded a non-significant effect (π̄ = .500026, SE = .000015, z = 1.76, p = .08) that was nevertheless comparable in size to the effect demonstrated in the experimental studies (z_Δ = 0.15, p = .87). However, because RNG experiments do not follow a classical control group design, this comparison is merely descriptive. The control studies had a much larger median sample size than the experimental studies (50,000 bits vs. 6,400 bits, respectively).
The two samples differed considerably in respect of their effect size distribution. Whereas the control data distributed homogeneously (χ²(141) = 138.16, p = .55), the effect size distribution of the experimental studies was extremely heterogeneous (χ²(356) = 1442.89, p = 1.45×10⁻¹³⁰).
We therefore conducted several sensitivity analyses on the intentional data in an attempt to determine the source(s) of the heterogeneity.
Trimmed Sample
As can be seen in Table 4, 70 studies had to be excluded before the χ² heterogeneity statistic became non-significant, a value (20%) that, although at the higher end of the span (what span?), is not uncommon in meta-analyses on psychological topics (Hedges, 1987). The homogeneous sample of 287 studies had very similar characteristics to the original overall sample. Both samples had comparable mean and median sample sizes (numbers of bits per study), and comparable mean and median numbers of participants (see Table 4). Their effect sizes were also similar (z_Δ = .80, p = .42), although the effect in the smaller, homogenized sample did not reach significance.
The heterogeneous sample that had been removed from the experimental database differed considerably from both the overall and the homogenized sample. The removed studies had generally been published earlier and had been carried out with fewer participants than the other two samples; they also contained fewer studies with large sample sizes than the homogeneous subsample (see Table 4). (Note the convergence with the observation that ES is anticorrelated with N.) The effect size of the removed studies was almost one order of magnitude larger than that of the overall sample (z_Δ = 3.33, p = 8.68×10⁻⁴) and of the homogenized sample (z_Δ = 2.99, p = 2.77×10⁻³).
Safeguard Variable Analyses
The majority of studies had adequately implemented the specified safeguards (see Table 5).
Almost 40% of the studies (n = 138) were given the highest rating for each of the three safeguards.
Generally, inadequate safeguards did not appear to be a reason for the significant effect found in the experimental studies. Studies implementing the RNG control or all data reported safeguards did not have significantly lower effect sizes than studies that did not implement these safeguards (z_Δ = .14, p = .89; z_Δ = 1.13, p = .19; respectively). However, studies with a post hoc split of data had a significantly larger effect size than studies that had preplanned their split of data (z_Δ = 3.30, p = 9.55×10⁻⁴).
The safeguard sum-score revealed that the mean effect size of the 138 studies with the maximum safeguard rating was tenfold larger than that in the overall experimental database (z_Δ = 2.59, p = 9.50×10⁻³). These high-quality studies had a smaller mean sample size and were conducted slightly more recently than the average study in the overall database. (Later studies are smaller?) However, it is not possible to draw any conclusions about the impact of study quality on effect size because of the uneven distribution of studies across the summary scale. Nevertheless, the trend that study quality is connected with study size and publication date is suggestive. As can be seen from Table 5, smaller and older studies seem to be of lower quality. (And have smaller effect size?)
In summary, the heterogeneity in the database is not primarily due to the contribution of misleadingly significant results from badly-designed studies.
Moderator Variable Analyses
Besides sample size and year of publication, which are generally highly underrated potential moderators, only very few of the variables coded provided enough entries for us to be able to carry out sensible moderator variable analyses. For instance, we were interested in whether participants filled in psychological questionnaires. Although this was the case in 96 studies, only 12 used an established measure. Therefore, besides sample size and year of publication, we focused on five primary variables for RNG experiments which provided enough data for a sensible moderator variable analysis.
The summary given in Table 4 compares the mean effect sizes associated with the five potential moderators with those from the overall and the trimmed sample. It is quite obvious that sample size is the most important moderator of effect size. (A repeated refrain.) The studies in the quartile comprising the smallest studies (Q1) have an effect size which is three orders of magnitude bigger than the effect size in the quartile with the largest studies (Q4). The difference is highly significant (z_Δ = 8.84, p < 1×10⁻¹⁰). The trend is continuous: the smaller the sample size, the bigger the effect size; a connection which Sterne, Gavaghan, & Egger (2000) called the "small-study effect". The funnel plot (Figure 1) illustrates the effect. Whereas the bigger studies distribute symmetrically around the overall effect size, the distribution of studies below 10,000 bits is increasingly asymmetrical. In respect of the mean year of publication, the largest studies (Q4) stand out from the other three, smaller-study quartiles. The largest studies are, on average, published 4-5 years later than the smaller studies. Most of the big studies, with very small effect sizes, have been published only recently (e.g., Jahn, Dunne, Dobyns, Nelson, & Bradish, 1999; Jahn et al., 2000; Nelson, 1994).
The year of publication underpins the importance of sample size for the outcome of the studies (see Table 4). The oldest studies (Q1), which have the smallest mean sample size, have an effect size which is two orders of magnitude bigger than the effect size of the newest studies (z_Δ = 13.53, p < 1×10⁻¹⁰), which have the largest mean sample size. (This seems to differ from statements on the preceding page.) However, the impact of sample size is not evident for the two middle quartiles. (Note that the largest studies, PEAR, use only unselected subjects, who evidently have a smaller average ES. Is this accounted for in the present modeling and M-A?)
The effect size of the old studies (Q2) is significantly larger (z_Δ = 2.26, p = 0.02) than the effect size of the new studies (Q3), although the sample size of the old studies is larger (old study N is larger?). Thus, time might play a role on its own. The experiments might have changed in ways other than an increase in sample size. (Isn't that what a moderator variable study is supposed to assess?)
Although causal connections are difficult to establish in meta-analyses, we examined the interaction of sample size and year of publication and their impact on effect size in order to understand how the two moderators are linked to one another. The oldest studies (Q1) are particularly interesting because their effect size is considerably bigger than the effect size of the subsequent studies (Q2-Q4). The z value of the oldest studies is also the highest of the four quartiles and thus indicates that they were the most successful (see Table 4). A median split of the oldest studies according to sample size revealed that the effect sizes of the two halves differ significantly from each other (z_Δ = 5.62, p = 5.62×10⁻⁸). The half with the smaller sample size (n = 45, M = 982, Mdn = 500) has an effect size of π̄ = .519137 (SE = .002484, z = 7.71, p < 1×10⁻¹⁰), whereas the half with the bigger sample size (n = 45, M = 36123, Mdn = 9600) has an effect size of π̄ = .505000 (SE = .000399, z = 12.53, p < 1×10⁻¹⁰). The mean year of publication in both subsamples is the same (M = 1971) and not different from the mean year of publication of the whole quartile. The analysis suggests that sample size is the deciding moderator and not year of publication, a finding that the pure magnitude of the effect size in the quartiles of sample size and year of publication also suggests (see Table 4). (It would be good to report the same analysis on the newer studies.)
The number of participants in RNG experiments is clearly linked to effect size (see Table 4). Studies with a sole participant (Q1) have an effect size that is at least one order of magnitude bigger than studies with more than one participant. However, this finding, too, is confounded by sample size (see Table 4). Applying the same method to the quartile of studies with one participant (Q1) as used before, we found that the studies with smaller sample sizes (n = 47, M = 1817, Mdn = 960) have a bigger effect size (π̄ = .510062, SE = .001784, z = 5.64, p = 1.70×10⁻⁸) than the studies with larger sample sizes (n = 44, M = 239197, Mdn = 24124, π̄ = .500368, SE = .00017, z = 2.19, p = 0.03). The difference is highly significant (z_Δ = 5.41, p = 6.32×10⁻⁸). This analysis puts the apparent superiority of studies with a sole participant into question and identifies sample size as an important confounder. However, it can be argued that small studies in general, and small one-participant studies in particular, are fundamentally different from larger studies - an argument that cannot easily be dismissed. It is actually one of the potential sources accounting for the small-study effect, which will be discussed later. (But the preceding analysis apparently discounts this with an explicit and relevant comparison.)
The current meta-analysis also seems to support the claim that selected participants perform better than non-selected participants. The claim has already been confirmed by an earlier meta-analysis (Honorton & Ferrari, 1989). As can be seen in Table 4, the effect size of studies with selected participants is one order of magnitude bigger than the effect size of studies that did not select their participants on, e.g., grounds of prior success in a psi experiment or for being a psychic claimant. Studies with selected participants are predominantly carried out with only one or very few participants, as indicated by the mean and median number of participants in Table 4. Studies with unselected populations are regularly carried out with more participants than studies with selected participants. However, this finding is confounded by sample size (and by number of participants). Studies with unselected populations also have a larger sample size than studies with selected participants. The systematic selection of participants might play an important role in RNG experiments, and it is certainly not implausible that longer experiments (is this an implicit equation of length of experiment with N of samples? It is not so simple.) are tiring for participants and therefore might produce different results. The argument is similar to that regarding the number of participants in RNG experiments, where experiments with fewer participants may be shorter and/or make participants feel more involved.
Study status is an important moderator in meta-analyses that include both formal and pilot experiments. Pilot experiments are likely to comprise a selective sample insofar as they tend to be published if they yield significant results (and hence larger-than-usual effect sizes) and not published if they yield unpromising directions for further study. However, pilot and formal studies in this sample did not differ in respect of effect size (z_Δ = 0.68, p = 0.50). Although the effect size of the pilot studies was bigger, it is not significantly different from the null value because its SE is more than four times larger (see Table 4). Pilot experiments are, as one would expect, smaller than formal experiments.
The type of feedback to the participant in RNG studies has been regarded as an important issue in psi research from its very inception. The majority of RNG experiments provide participants with visual feedback and some with auditory feedback. Besides the two main categories, the coding resulted in a large "other" category with 101 studies, which used, for example, alternating visual and auditory feedback, or no feedback at all. The result is clear-cut: studies providing exclusively auditory feedback outperform not only the studies using visual feedback (z_Δ = 6.12, p = 9.24×10⁻¹⁰), but also the studies in the "other" category (z_Δ = 5.93, p = 2.01×10⁻⁹). However, this finding is based on a very small and very heterogeneous sample of large studies (see Table 4), although the studies using visual feedback are, on average, even larger. Nevertheless, the auditory feedback studies were surprisingly comparable to the large sample size studies (Q3) in terms of their mean numbers of participants, year of study, and other variables (see Table 4).
The core (this isn't quite the right word) of all RNG studies is the random source. Although the participants' intention is generally directed (by the instructions given to them) to the feedback and not to the technical details of the RNG, it is the sequence of random numbers that is compared with the theoretical expectation (e.g., binomial distribution) and that is, therefore, allegedly influenced. RNGs are based on truly random radioactive decay, Zener noise, or occasionally thermal noise. As shown in Table 4, the effect size of studies with RNGs based on radioactive decay is two orders of magnitude larger than the effect size of studies using noise (z_Δ = 4.28, p = 1.87×10⁻⁵). However, this variable, too, is confounded by sample size. Studies using radioactive decay are much smaller than studies using noise (see Table 4). Chronologically, studies with RNGs based on radioactive decay predominated in the very early years of RNG experimentation, as indicated by their mean year of publication, which is just two years above the mean year of publication of the oldest studies in our sample (see Table 4).
Meta-Regression Analysis
The meta-regression analysis included the seven moderator variables (see Table 4) and the three safeguard variables (see Table 5) discussed above, using effect size as the dependent variable and the moderators and the safeguards as predictors. The three variables sample size, year of publication, and number of participants, which were split into quartiles for the previous moderator variable analyses (see Table 4), went into the regression analysis with their nominal values. All other variables were dummy coded. The analysis was weighted by the inverse of the within-study variances (this again assumes that all important moderators have been identified, and that probably is not the case, as noted before). From the 10 predictors, only two, year of publication and auditory feedback, entered the model (see Table 6). 4 However, the model accounts for only 5% of the variance. This suggests that none of the variables we put into the regression analysis, nor any combination of the variables, accounts for the great variability in effect size we found in the data.
4 The moderator variable analyses already clearly indicated that sample size is the most important predictor of effect size. The importance of the predictor was not confirmed by our weighted meta-regression analyses because the analysis is weighted by the inverse of the within-study variance, which almost perfectly correlates with sample size. Therefore, sample size cannot enter the regression model. An unweighted stepwise multiple regression analysis with the same predictor variables clearly stresses the importance of sample size in this meta-analysis. Sample size enters the model first and accounts for 9% of the variance. After three more steps, the model accounts for 20% of the variance with sample size (β = .26), RNG control earlier (β = .28), random source radioactive (β = .14) and split of data preplanned (β = -.11) as predictors. However, an unweighted regression analysis is difficult to interpret, especially when the effect size is so strongly connected with the study variance. Larger studies provide better estimates and should have more weight; otherwise, the regression is dominated by the impact of smaller studies, which might be more prone to publication bias and other effects which will be discussed under the heading of the small-study effect in the discussion section.
Random Effect Model
As can be seen in Table 4, the z score (rnd) for the effect size based on a random-effects model for all heterogeneous subsamples of studies becomes non-significant. This measure also indicates, as all previous analyses have shown, that there is no simple overall effect. If there is an effect of human intention on the concurrent output of true RNGs, then at least one moderator must be involved. From all the sensitivity analyses presented here, sample size seems to be the most promising candidate. However, the connection between sample size and effect size can have many different causes, as will be discussed in the next section.
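One candidate cause, selective publication, can be illustrated with a toy simulation. This is not the paper's own Monte Carlo (which, as noted in the abstract annotations, is not well specified); every parameter below (study-size range, significance filter, file-drawer rate) is invented purely for illustration:

```python
import math
import random

def simulate_publication_bias(n_studies=5000, seed=7):
    """Toy file-drawer model: every study is a true-null coin experiment
    (pi = .5 exactly); a study is 'published' if it reaches one-tailed
    significance (z >= 1.645) or otherwise with probability 1/3.
    All parameters are hypothetical, chosen only to illustrate the
    small-study effect."""
    rng = random.Random(seed)
    published = []
    for _ in range(n_studies):
        n_bits = int(10 ** rng.uniform(2, 6))   # study sizes from 10^2 to 10^6 bits
        se = 0.5 / math.sqrt(n_bits)            # SE of pi under the null
        pi = rng.gauss(0.5, se)                 # normal approximation to binomial
        z = (pi - 0.5) / se
        if z >= 1.645 or rng.random() < 1 / 3:  # selective publication filter
            published.append((n_bits, pi))
    small = [pi for n, pi in published if n < 10_000]
    large = [pi for n, pi in published if n >= 10_000]
    return sum(small) / len(small), sum(large) / len(large)
```

Even though every simulated study is a true null, the published small studies show an inflated mean effect while the large ones sit near π = .5, reproducing the inverse relation between study size and effect size without any real effect.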
Discussion
Altogether, the meta-analysis divulged (an awkward word; perhaps "indicated" is better) three main findings: (i) a very small but statistically significant overall effect, (ii) a tremendous variability of effect size, and (iii) a highly visible (I'd say "suspected") small-study effect. (The tremendous variability points to an inadequate model and incomplete specification of moderators. It cannot be appropriate to simply lay a random-effects model on such data. Moreover, this also points to an inadequate fixed-effects model.)
Statistical Significance
The meta-analysis replicated the finding of the previous meta-analyses in the sense that the very small overall effect was significantly different from the expected value. The mean effect size of the control studies did not differ significantly from MCE, although the size of the effect was comparable to that of the experimental studies. This does not necessarily imply that the effect found in the experimental studies is spurious; RNG experiments do not follow a standard test-control design to determine an effect; rather, the test is always (not always) against MCE. Control data in RNG experiments are simply used to demonstrate that the RNG output fits the theoretical premise (e.g., binomial distribution). (This is not true, or it is an idiosyncratic and inappropriate definition of control, which I believe is in fact presented earlier in this paper.) If the control data do not fit the theoretical premise, the RNG will be revised or a different RNG will be used. As a consequence, published control data are unequivocally (how do they know?) subject to a positive selection process; that is, divergent control data will generally not be published because they would cast doubt on the experiment as a whole. (This seems to imply that studies published without citing control data have withheld that data, which I doubt is always, or even often, the case.) The fact that the experimental studies reached statistical significance and the control studies did not is a matter of statistical power: the sample size of the control studies is only half as large as the sample size of the experimental studies (1.07 × 10⁹ bits vs. 2.17 × 10⁹ bits, respectively). (As noted earlier, by the authors' definition, huge quantities of "control" data are not included in this analysis, so any discussion of the effect size is misleading.) Moreover, the p value for the control studies was based on two-sided testing, whereas the p value for the intentional studies was one-sided.
The safeguard analyses demonstrated that the significance of the experimental studies is not the result of low-quality studies. Nevertheless, the statistical significance, as well as the overall effect size, of the combined experimental studies has dropped continuously from the first meta-analysis to the one reported here. This is partially the result of the more recent meta-analyses including newer, larger studies. However, another difference between the current and the previous meta-analyses lies in the application of unequivocal inclusion and exclusion criteria. We focused exclusively on studies examining the alleged concurrent interaction between direct human intention and RNGs. All previous meta-analyses also included non-intentional (in the sense of FieldREG studies, no; but we did include studies with implied intention) and non-human studies. Although this difference might explain the reduction in effect size and significance level, it cannot explain the extreme statistical heterogeneity of the database. This topic was neglected in the previous RNG meta-analyses. (Not really; we did look at the issue of homo- and heterogeneity. We just didn't try to figure out where it came from, as that wasn't the purpose of those MAs.) We believe that the overall statistical significance found in our meta-analysis cannot be unequivocally interpreted in favor of an anomalous interaction as long as the tremendous variability of effect size remains unexplained. (Well, yeah -- so isn't that the purpose here?)
Variability of Effect Size
The variability of effect size in this meta-analysis is tremendous. We took several approaches to address this variability. For instance, trimming 20% of the studies from the overall sample resulted in a homogeneous subsample of studies. (But your other analyses indicated there was in fact no justification for the trimming on the basis of, say, quality. Therefore, given the time and N vs. ES correlations, this trivially must weaken the database.) Although the effect size of the homogeneous sample did not differ significantly from that of the overall sample, the outcome did not differ significantly from MCE. However, although this indicates how vulnerable the overall result is in terms of statistical significance, this approach cannot explain what variables or (selection) processes are responsible for the variability. The extreme variability does not seem to be the result of any of the moderator variables examined: none of the moderator-variable subsamples was independently homogeneous, not even sample size.
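One common recipe for such trimming (a sketch of the general idea, not necessarily the authors' exact procedure) iteratively drops the study contributing most to Cochran's Q until the remainder is homogeneous at the 5% level:

```python
import math

def chi2_crit_95(df):
    """Wilson-Hilferty approximation to the chi-square 95th percentile."""
    z = 1.6449  # standard normal 95th percentile
    return df * (1 - 2 / (9 * df) + z * math.sqrt(2 / (9 * df))) ** 3

def trim_to_homogeneity(es, var):
    """Drop the study contributing most to Cochran's Q, repeatedly,
    until the remaining set is homogeneous (sketch)."""
    studies = list(zip(es, var))
    while len(studies) > 2:
        w = [1 / v for _, v in studies]
        mean = sum(wi * e for wi, (e, _) in zip(w, studies)) / sum(w)
        contrib = [wi * (e - mean) ** 2 for wi, (e, _) in zip(w, studies)]
        q = sum(contrib)                               # Cochran's Q
        if q <= chi2_crit_95(len(studies) - 1):        # homogeneous: stop
            break
        studies.pop(contrib.index(max(contrib)))       # trim worst outlier
    return studies
```

As the text notes, a trimmed sample obtained this way says nothing about *why* the removed studies deviated; it only manufactures homogeneity.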
The moderator-variable analyses demonstrated that sample size was a consistent and prominent moderator of effect size (variability). All of the moderator variables we analyzed were confounded by sample size. The Monte Carlo simulation of publication bias at the end of the next section will demonstrate that even though variability of effect size and the small-study effect are two separate concepts, they are in fact connected.
Small-Study Effect
For a similar class of studies (what sorts of studies are similar to these?) it is generally assumed that effect size is independent of sample size. (yes) However, from the sensitivity analyses it is evident that the effect sizes in this meta-analysis strongly depend on sample size. (yes!) The asymmetric distribution of effect sizes in the funnel plot (see Figure 1), as well as the continuous decline of effect size with increasing sample size across the sample-size quartiles, illustrate this. How can this be explained?
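Funnel-plot asymmetry of this kind is commonly quantified with Egger's regression test (Sterne & Egger, 2001). A bare-bones sketch (illustrative only; the `es`/`se` inputs are hypothetical, and perfectly collinear inputs are degenerate):

```python
import math

def egger_test(es, se):
    """Egger's regression test for funnel-plot asymmetry (sketch).

    Regresses the standardized effect (es/se) on precision (1/se);
    a nonzero intercept signals a small-study effect."""
    y = [e / s for e, s in zip(es, se)]   # standard normal deviates
    x = [1.0 / s for s in se]             # precision
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (n - 2)          # residual variance
    se_int = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    return intercept, intercept / se_int               # intercept, t-value
```

When small studies (large standard errors) carry inflated effects, the intercept is pushed away from zero, mirroring the asymmetry visible in Figure 1.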
Table 7 provides a list of potential sources for the small-study effect. The sources fall into three main categories: (1) true heterogeneity, (2) data irregularities, and (3) selection biases. (Is true heterogeneity the representation of a better model? If so, it is sensible, but needs to be spelled out.) Chance, another possible explanation for a small-study effect, seems very unlikely because of the magnitude of the effect and the sample size of the meta-analysis.
True heterogeneity
The higher effect sizes of the smaller studies may be due to specific differences in experimental design or setting in the smaller compared with the larger studies. In other words, the small-study effect is seen as the result of certain moderator variable(s). For instance, smaller studies might be more successful because the participant-experimenter relationship is more intense. The routine of longer experimental series may make it difficult for the experimenter to maintain enthusiasm in the study. However, explanations such as these remain speculative as long as they are not systematically investigated and meta-analyzed. (There is another general category of potential explanation that is consistent with the N vs. ES finding: the mechanism of the effect may not be correctly modeled by assuming bit-wise effects.)
Of the moderator variables investigated in this meta-analysis, the hypotheses that smaller studies on average tested a different type of participant and used a different form of feedback are the most interesting. However, the moderator-variable analyses showed that these two variables, as well as all other variables examined, are linked to sample size. For almost all moderator variables, the subsamples ("class" in Table 4) with the smallest mean sample sizes had the biggest effect. This suggests (but does not demonstrate; given other issues, such as the model assumptions, this conclusion should not be accepted) that sample size, and not the moderator variables, was the prime factor driving the heterogeneity of the database. This view is also supported by the heterogeneity of effect size observed in all of the moderator subsamples.
Empirically, true heterogeneity among studies cannot be eliminated as a causal factor for the small-study effect, especially regarding complex interactions, which we have disregarded here. However, the heterogeneity of the moderator-variable subsamples and the outstanding importance of sample size at all levels of analysis exclude true heterogeneity as the main source accounting for the small-study effect. (This ignores goal-directedness, DAT, effort per time, etc. It also ignores the possibility that PK operates on the sample-size distribution, which would predict a perfect sqrt(N) dependency.)
Data irregularities
A small-study effect may be due to data irregularities threatening the validity of the data. For example, smaller studies might be of poorer methodological quality, thereby artificially raising their effect size compared with that of larger studies. However, methodological quality improves only marginally with increasing sample size (r(357) = .18, p = 8.86 × 10⁻⁴) and therefore cannot explain the small-study effect in this meta-analysis. Another form of data irregularity, namely inadequate analysis, is based on the assumption that smaller trials are generally analyzed with less methodological rigor and are therefore more likely to report "false positive" results. However, this potential source is excluded here due to the straightforward and simple effect size measure used. A general source of data irregularity might be that smaller studies are more easily manipulated by fraud than larger studies because, for example, fewer people are involved. However, the number of researchers that would have to be implicated over the years renders this hypothesis very unlikely. In general, none of the data-irregularity hypotheses considered can explain the small-study effect.
Selection biases
When the inclusion of studies in a meta-analysis is systematically biased in such a way that smaller studies with smaller (original may be correct?) p values, i.e., larger effect sizes, are more likely to be included than larger studies with smaller p values, i.e., smaller effect sizes, a small-study effect may be the result. Several well-known biases, such as publication bias, selective reporting bias, foreign-language bias, citation bias, and time-lag bias, may be responsible for a small-study effect (e.g., Egger, Dickersin, & Smith, 2001; Mahoney, 1985).
Biased inclusion criteria refer to biases on the side of the meta-analyst. In this particular domain, the two most prominent biases are foreign-language bias and citation bias. Foreign-language bias occurs when significant results are published in well-circulated, high-impact journals in English, whereas nonsignificant findings are published in small journals in the authors' native language. Therefore a meta-analysis including studies solely from journals in English may include a disproportionately large number of significant studies. Citation bias refers to selective quoting. Studies with significant p values are quoted more often and are more likely to be retrieved by the meta-analyst. However, the small-study effect in this meta-analysis is probably not due to these biases, given the inclusion of non-English publications and a very comprehensive search strategy. (The authors claim to have used a very comprehensive search to retrieve all known studies, including unpublished studies. In which case, where are all these missing studies coming from? With, say, only 20 different groups of researchers ever having done these sorts of studies, each group would have had to hide on average 70 file-drawer studies, given the authors' later estimate of 1400 file-drawer studies. This is implausible.)
One of the most important selection biases to consider in any meta-analysis is publication bias. Publication bias refers to the fact that the probability of a study being published depends to some extent on its p value. Several independent factors affect the publication of a study. Rosenthal's term "file drawer problem" (Rosenthal, 1979) focuses on the author as the main source of publication bias, but there are other issues too. Editors' and reviewers' decisions also affect whether a study is published. The time lag from the completion of a study to its publication might also depend on the p value of the study (e.g., Ioannidis, 1998) and additionally contribute to the selection of studies available. Since the development of Rosenthal's "file drawer" calculation (1979), numerous other methods have been developed to examine the impact of publication bias on meta-analyses (e.g., Dear & Begg, 1992; Duval & Tweedie, 2000; Hedges, 1992; Iyengar & Greenhouse, 1988; Sterne & Egger, 2001).
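Rosenthal's calculation itself is one line: for k studies with z scores z_i, the fail-safe N is (Σz_i)²/z_α² − k, the number of null-result studies that would drag the Stouffer combined z below the one-sided 5% criterion. A sketch (illustrative values only):

```python
def failsafe_n(z_scores, z_alpha=1.645):
    """Rosenthal's (1979) fail-safe N (sketch): how many unpublished
    null-result studies would reduce the Stouffer combined z below
    the one-sided 5% criterion."""
    s = sum(z_scores)
    return max(0, int((s / z_alpha) ** 2 - len(z_scores)))
```

The later methods cited (Hedges' weight functions, trim-and-fill) model the selection process itself rather than asking how many hidden nulls would be needed.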
In an attempt to examine publication bias, we ran a Monte Carlo simulation based on Hedges' (1992) stepped weight-function model and simulated a simple selection process. According to this model, the authors', reviewers', and editors' perceived conclusiveness of a p value is subject to certain "cliff effects" (Hedges, 1992), and this affects the likelihood of a study getting published. Hedges (1992) estimates the weights of the step function from the available meta-analytic data. However, unlike Hedges, we used a predefined step-weight function, because we were primarily interested in seeing whether a simple selection model might in principle account for the small-study effect present in our meta-analytic data.
We assumed that 100% of studies with a p value ≤ .01, 80% of studies with a p value between .01 and .05, 50% of studies with a p value between .05 and .10, 20% of studies with a p value between .10 and .50, and 10% of studies with a p value > .50 (one-sided) are published.⁵ From this assumption, we randomly generated uniformly distributed p values (Surely this doesn't mean p = .01 is just as likely as p = .5. I think they must mean normally distributed around chance expectation.) and calculated the effect sizes for all "published" studies and counted the number of "not published" studies. That is, on the basis of the sample size for each of the 357 studies, we simulated a selective null-effect publication process (meaning, I think, that the distribution of studies averages at chance, or p = .5). The averaged results of the simulation of 1000 meta-analyses are shown in Table 8. As can be seen, the overall effect size based on the Monte Carlo simulation perfectly matches the overall effect size found in our meta-analysis (see Table 4). The simulated data clearly replicated the small-study effect (see Table 8). The simulation also shows that, in order for these results to emerge, a total of 1453 studies had to be unpublished, i.e., for every study published, four (nonsignificant) studies had to remain unpublished. (That's awfully close to Rosenthal's criterion of "robust." I wonder how long they worked with their model parameters to produce these results. Also, the result for the small sample sizes is nearly significantly different from their simulation. I'll see if I can replicate their model to see how insensitive it is to the choice of parameters.) (Also -- is the Hedges work on selective reporting in ordinary psychology a good model for parapsychology?)
⁵ The term "published" is used here very broadly to include publications of conference proceedings and reports which, in terms of our literature search, were considered unpublished. Importantly, in our discussion of the Monte Carlo simulation, the term "published" also refers to studies obtained by splitting reports into studies. For simplicity, we assumed in the Monte Carlo simulation that the splitting of the 114 reports into 357 experimental studies was subject to the same selection process as the published reports themselves.
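The step-weight selection process described in this section can be sketched as follows (my reconstruction from the stated weights, not the authors' simulation, which also computed per-study effect sizes). Under the null, one-sided p values are uniform on (0, 1), so the expected publication rate under these weights is 0.01·1.0 + 0.04·0.8 + 0.05·0.5 + 0.40·0.2 + 0.50·0.1 ≈ 0.197, which by itself implies roughly four unpublished studies per published one, in line with the 1453 figure:

```python
import random

def publish_prob(p):
    """Step-weight selection model from the text: probability that a
    study with one-sided p-value p gets 'published'."""
    if p <= 0.01:
        return 1.0
    if p <= 0.05:
        return 0.8
    if p <= 0.10:
        return 0.5
    if p <= 0.50:
        return 0.2
    return 0.1

def simulate_file_drawer(target_published=357, seed=0):
    """Draw null-effect studies (p uniform) until `target_published`
    pass the selection filter; return the unpublished count."""
    rng = random.Random(seed)
    published = hidden = 0
    while published < target_published:
        p = rng.random()                     # p-value under the null
        if rng.random() < publish_prob(p):   # stochastic publication
            published += 1
        else:
            hidden += 1
    return hidden
```

For 357 published studies, the expected number of hidden studies is about 357 × 0.803 / 0.197 ≈ 1455, close to the 1453 reported.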
A secondary finding, which additionally confirms the value of the simulation, is that publication bias might be responsible not only for the small overall effect and the small-study effect found, but also for a large proportion of the effect size variability. The simulated overall sample, as well as the 4th quartile of the largest studies, shows highly significant effect size variability, replicating what was found in our meta-analytic data. The effect size variability in the first three quartiles of the simulation is certainly different from the effect size variability in our meta-analytic data. However, this might be due to the highly idealized boundary conditions of the simulation.
Conclusion
Altogether, the simulation results are in very good agreement with the meta-analytic data (except for three quarters of it), especially regarding the overall effect size and the small-study effect, which was the primary objective of the simulation. The simulation must be considered at least very suggestive, especially as it accounts for all three main findings divulged in the meta-analysis: (i) a very small but statistically significant overall effect, (ii) a tremendous variability of effect size (only in the large-study quartile), and (iii) a highly visible small-study effect. In comparison with all other sources potentially accounting for the small-study effect discussed here, publication bias is clearly the most far-reaching explanation regarding the main findings of this meta-analysis. (This Monte Carlo sounds like a simple case of carefully designed inputs to yield the expected outputs, and even then glossing over the misfitting.)
Nevertheless, whether the simulation of publication bias is considered conclusive evidence for publication bias in this meta-analysis depends strongly on how many unpublished studies one believes it is reasonable to assume there are. However, the number of unpublished studies must be seen against the background of the enormous pressure to publish significant results. It is clear that not all results get published and that journals are filled with a selective sample of statistically significant studies (e.g., Sterling, 1959; Sterling, Rosenbaum, & Weinkam, 1995; Rosenthal, 1979) (all references to non-parapsychology fields). Given that the majority of published studies are underpowered, it is surprising that 95% of articles in psychological journals and more than 85% of articles in medical journals report statistically significant results (Sterling, Rosenbaum, & Weinkam, 1995). Authors, reviewers, and editors are all involved in the selection process.
J. B. Rhine was the first editor of the Journal of Parapsychology (inception in 1937), the leading journal for experimental work in parapsychology. He initially believed "that little can be learned from a report of an experiment that failed to find psi" (Broughton, 1987, p. 27). More than 25% (n = 96) of the studies included in the current meta-analysis were published in this journal. However, from 1975, the Council of the Parapsychological Association rejected the policy of suppressing nonsignificant studies in parapsychological journals (Broughton, 1987; Honorton, 1985). Whereas 48% of the studies in Q1 (1959-1973) were statistically significant, this rate dropped to 19% (Q2), 8% (Q3), and 14% (Q4) in the subsequent quartiles, indicating that the policy was implemented. (Interesting to consider the likelihood of these percentages being significant -- as a simple study-based model to compare with the bit-based model.) This demonstrates not only that the publication rate of significant studies in this domain is very different from the rate in conventional fields, but also, and more importantly, that, at least in the early period of RNG experimentation, the publication process was moderately selective in favor of statistically significant studies. (Huh? If, as they state, 95% of studies in standard psych journals are significant, yet for psi the figures range from 8% to 48%, then surely this leads us away from publication bias as a viable explanation, not towards it!?)
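The reviewer's aside about whether these percentages differ can be checked with a pooled two-proportion z-test. Note the quartile sizes here (~89 studies each, from 357/4) are my assumption, not figures from the paper:

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test: z statistic for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                     # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se
```

With roughly 89 studies per quartile, 48% significant in Q1 (≈43 studies) against 19% in Q2 (≈17 studies) gives z ≈ 4, so the drop after the 1975 policy change is far larger than chance would produce.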
Statistical significance is a matter of power but, when conducting a meta-analysis, it can also be a matter of artifacts like publication bias. From this perspective, not only is the early period of RNG experimentation in great danger of being a highly selective sample, but so is the database as a whole. A power analysis based on the overall effect size found in our meta-analysis shows that, for an RNG study to have a power of 80%, its sample size would have to be greater than 1,800,000,000 bits. (OK, is this just obstinacy, or are they wearing blinders, or what?) (I am afraid it is "or what"!) None of the studies included in our meta-analysis comes even close to this sample size. Therefore the number of significant studies is highly questionable. (Sigh.)
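The power claim can be checked with the standard normal-approximation sample-size formula for a one-sided binomial test. The hit rate π used below is a hypothetical illustration, not the exact value the authors used:

```python
import math

def required_bits(pi, z_alpha=1.6449, z_power=0.8416):
    """Bits needed for 80% power at one-sided alpha = .05 when the
    hit rate under the alternative is pi (MCE = 0.5)."""
    delta = pi - 0.5
    return math.ceil((z_alpha + z_power) ** 2 * pi * (1 - pi) / delta ** 2)
```

For illustration, a hit rate of π = .50003 already demands about 1.7 × 10⁹ bits, the order of magnitude quoted in the text, whereas a classic dice-era effect of π = .51 would need only about 15,000 trials.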
The studies published and collected here are probably a highly selective sample. The Monte Carlo simulation indicated that there would have to be 1400 unpublished studies to account for the main findings presented here. However, "real world" publication decisions do not simply fit a single, consistent set of threshold values. Publication decisions in respect of the smallest studies might follow completely different and more complex patterns. Consequently, far fewer (or far more!) studies might be needed in order to replicate the main findings of our meta-analysis than indicated by the Monte Carlo simulation. We believe that the 1400 unpublished studies mark the upper limit of the range of unpublished studies (why?). As a result, we doubt that the RNG database provides good evidence for an anomalous connection between direct human intention and the concurrent output of true RNGs. (What on earth justifies this conclusion?)
The limits of this conclusion are, of course, the assumptions that have to be made. (Well, yeah.) One of the main assumptions in undertaking a meta-analysis is that effect size is independent of sample size (finally!). Independence from sample size actually defines effect size measures. (Not necessarily; there are all kinds of effect sizes.) However, there might be effects where sample size is related to effect size. In such a case, the p values would be independent of sample size and, for example, be constant across studies. Although our data do not confirm this simple model, because the z-score distribution is far too heterogeneous (explain how this conclusion is justifiable; no such calculation is reported in this paper), other more complex models are conceivable. However, so far, meta-analysts in RNG research have not argued along these lines; they have argued that there is a small but replicable constant effect.
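The claim that the z-score distribution is "far too heterogeneous" could be made explicit with a one-line check: under a constant-z model, study z scores scatter around their mean with unit variance, so their squared deviations sum to a chi-square statistic with k - 1 df. A sketch (the 357 actual z scores are not reproduced here):

```python
def z_homogeneity(z_scores):
    """Homogeneity statistic for study z-scores: under a constant-z
    model the statistic is ~chi-square with k-1 df, so values far
    above k-1 speak against the simple model."""
    k = len(z_scores)
    mean = sum(z_scores) / k
    return sum((z - mean) ** 2 for z in z_scores), k - 1
```

Reporting this statistic would substantiate the heterogeneity claim that the reviewer notes is currently asserted without calculation.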
Another assumption that is generally made is that intention affects the mean value of the random sequence. (That is not an assumption, but the dependent variable in most experimental designs.) Although other outcomes have been suggested (e.g., Atmanspacher, Bösch, Boller, Nelson & Scheingraber, 1999; Pallikari & Boller, 1999; Radin, 1989), they have been used only occasionally. (Ignores virtually all of Schmidt's and Kennedy's papers on this topic. Also, on p. 45 of our RNG MA paper in the Jonas book, we say: "This means the statistical effects observed in these experiments are effectively independent of sample size, and cannot be explained as simple, linear, force-like mechanisms. Further indication that a novel approach will be required to explain these effects are experiments strongly resembling RNG studies, but involving pre-recorded random bits rather than bits generated in real-time. Those studies show significant cumulative results similar to those reported here (Bierman, 1998). This implies that some MMI effects, perhaps including those claimed for distant healing, may involve acausal processes.")
Although we question the conclusions of our predecessors, we would like to remind the reader that these experiments are highly refined operationalizations of phenomena that have challenged mankind for a very long period of time. The dramatic anomalous PK effects reported in séance rooms shrank to experiments with electronic noise during a 100-year history of PK experiments. This achievement is certainly humbling. However, further experiments will be conducted. They should be registered. This is the most straightforward solution for determining with any accuracy the rate of publication bias (e.g., Chalmers, 2001). It would allow subsequent meta-analysts to resolve more firmly the question of whether the overall effect in RNG experiments is an artifact of publication bias, as we suspect. The PK effect in general is of great fundamental importance -- if genuine. However, until we know with any certainty just how many nonsignificant, unpublished studies there are likely to be, we doubt that this unique experimental approach will gain the status of being of scientific value. (Of course, such a pessimistic statement only serves to diminish interest in this line of work, thus guaranteeing that no one will ever know "the answer.")
A quick look through their MA references doesn't list the four studies below, all of which fit their inclusion criteria. This doesn't persuade me that they were as comprehensive as they claim. What else might they have missed? They also do not include the Jahn et al. publication of the 12-year database, and other PEAR publications that would fit, e.g., Ibison. And they do include something like the Nelson 1994 time-normalization paper. Go figure.
Radin, D. I., & Utts, J. M. (1989). Experiments investigating the influence of intention on random and pseudorandom events. Journal of Scientific Exploration, 3, 65-79.
Radin, D. I. (1990). Testing the plausibility of psi-mediated computer system failures. Journal of Parapsychology, 54, 1-19.
Radin, D. I. (1992). Beyond belief: Exploring interactions among mind, body and environment. Subtle Energies, 2(3), 1-40.
Radin, D. I. (1993). Environmental modulation and statistical equilibrium in mind-matter interaction. Subtle Energies, 4(1), 1-30.
Human Intention Interacting with Random Number Generators (Printed: 27 August 2004)
References
Alcock, J. E. (1981). Parapsychology: Science or magic? A psychological perspective. Oxford: Pergamon.
Atmanspacher, H., Bösch, H., Boller, E., Nelson, R. D., & Scheingraber, H. (1999). Deviations from physical randomness due to human agent intention? Chaos, Solitons & Fractals, 10, 935-952.
Beloff, J., & Evans, L. (1961). A radioactivity test of psycho-kinesis. Journal of the Society for Psychical Research, 41, 41-46.
Bem, D. J., & Honorton, C. (1994). Does psi exist? Replicable evidence for an anomalous process of information transfer. Psychological Bulletin, 115(1), 4-18.
Bierman, D. J. (1985). A retro and direct PK test for babies with the manipulation of feedback: A first trial of independent replication using software exchange. European Journal of Parapsychology, 5, 373-390.
Blackmore, S. J. (1992). Psychic experiences: Psychic illusions. Skeptical Inquirer, 16, 367-376.
Braude, S. E. (1997). The limits of influence: Psychokinesis and the philosophy of science. Lanham: University Press of America.
Broughton, R. S. (1987). Publication policy and the Journal of Parapsychology. Journal of Parapsychology, 51, 21-32.
Chalmers, I. (2001). Using systematic reviews and registers of ongoing trials for scientific and ethical trial design, monitoring, and reporting. In M. Egger, G. D. Smith, & D. Altman (Eds.), Systematic reviews in health care: Meta-analysis in context (pp. 429-443). London: BMJ Books.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.
Crookes, W. (1889). Notes of seances with D. D. Home. Proceedings of the Society for Psychical Research, 6, 98-127.
Crookes, W., Horsley, V., Bull, W. C., & Myers, A. T. (1885). Report on an alleged physical phenomenon. Proceedings of the Society for Psychical Research, 3, 460-463.
Dear, K. B. G., & Begg, C. B. (1992). An approach for assessing publication bias prior to performing a meta-analysis. Statistical Science, 7, 237-245.
Dudley, R. T. (2000). The relationship between negative affect and paranormal belief. Personality and Individual Differences, 28, 315-321.
Duval, S., & Tweedie, R. (2000). A nonparametric "trim and fill" method of accounting for publication bias in meta-analysis. Journal of the American Statistical Association, 95, 89-98.
Edgeworth, F. Y. (1885). The calculus of probabilities applied to psychical research. Proceedings of the Society for Psychical Research, 3, 190-199.
Edgeworth, F. Y. (1886). The calculus of probabilities applied to psychical research. II. Proceedings of the Society for Psychical Research, 4, 189-208.
Egger, M., Dickersin, K., & Smith, G. D. (2001). Problems and limitations in conducting systematic reviews. In M. Egger, G. D. Smith, & D. Altman (Eds.), Systematic reviews in health care: Meta-analysis in context (pp. 43-68). London: BMJ Books.
Fisher, R. A. (1924). A method of scoring coincidences in tests with playing cards. Proceedings of the Society for Psychical Research, 34, 181-185.
Gallup, G., & Newport, F. (1991). Belief in paranormal phenomena among adult Americans. Skeptical Inquirer, 15, 137-146.
Geller, U. (1998). Uri Geller's little book of mind-power. London: Robson.
Girden, E. (1962a). A review of psychokinesis (PK). Psychological Bulletin, 59, 353-388.
Girden, E. (1962b). A postscript to "A review of psychokinesis (PK)". Psychological Bulletin, 59, 529-531.
Girden, E., & Girden, E. (1985). Psychokinesis: Fifty years afterward. In P. Kurtz (Ed.), A skeptic's handbook of parapsychology (pp. 129-146). Buffalo, NY: Prometheus Books.
Gissurarson, L. R. (1992). Studies of methods of enhancing and potentially training psychokinesis: A review. Journal of the American Society for Psychical Research, 86, 303-346.
Gissurarson, L. R. (1997). Methods of enhancing PK task performance. In S. Krippner (Ed.), Advances in parapsychological research 8 (pp. 88-125). Jefferson, NC: McFarland & Company.
Gissurarson, L. R., & Morris, R. L. (1991). Examination of six questionnaires as predictors of psychokinesis performance. Journal of Parapsychology, 55, 119-145.
Hedges, L. V. (1987). How hard is hard science, how soft is soft science? The empirical cumulativeness of research. American Psychologist, 42, 443-455.
Hedges, L. V. (1992). Modeling publication selection effects in meta-analysis. Statistical Science, 7, 246-255.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando: Academic Press.
Honorton, C. (1985). Meta-analysis of psi ganzfeld research: A response to Hyman. Journal of Parapsychology, 49, 51-91.
Honorton, C., & Ferrari, D. C. (1989). "Future telling": A meta-analysis of forced-choice precognition experiments, 1935-1987. Journal of Parapsychology, 53, 281-308.
Honorton, C., Ferrari, D. C., & Bem, D. J. (1998). Extraversion and ESP performance: A meta-analysis and a new confirmation. Journal of Parapsychology, 62, 255-276.
Ioannidis, J. P. (1998). Effect of the statistical significance of results on the time to completion and publication of randomized efficacy trials. The Journal of the American Medical Association, 279, 281-286.
Irwin, H. J. (1993). Belief in the paranormal: A review of the empirical literature. Journal of the American Society for Psychical Research, 87, 1-39.
Iyengar, S., & Greenhouse, J. B. (1988). Selection models and the file drawer problem. Statistical Science, 3, 109-117.
Jahn, R. G., Dunne, B. J., Dobyns, Y. H., Nelson, R. D., & Bradish, G. J. (1999). ArtREG: A random event experiment utilizing picture-preference feedback. (Technical Note PEAR 99003). Princeton NJ 08544: PEAR Laboratory, School of Engineering and Applied Science.
Jahn, R. G., Dunne, B. J., & Nelson, R. D. (1980). Princeton Engineering Anomalies Research. Program Statement. (Technical Report). Princeton, New Jersey: Princeton University, Princeton Engineering Anomalies Research, School of Engineering/Applied Science.
Jahn, R. G., Mischo, J., Vaitl, D., Dunne, B. J., Bradish, G. J., Dobyns, Y. H., Lettieri, A., Nelson, R. D., Boller, E., Bösch, H., Houtkooper, J. M., & Walter, B. (2000). Mind/Machine interaction consortium: PortREG replication experiments. Journal of Scientific Exploration, 14, 499-555.
James, W. (1896). Psychical research. Psychological Review, 3, 649-652.
Jüni, P., Witschi, A., Bloch, R., & Egger, M. (1999). The hazards of scoring the quality of clinical trials for meta-analysis. The Journal of the American Medical Association, 282, 1054-1060.
Lawrence, T. R. (1998). Gathering in the sheep and goats... A meta-analysis of forced-choice sheep-goat ESP studies, 1947-1993. In N. L. Zingrone, M. J. Schlitz, C. S. Alvarado, & J. Milton (Eds.), Research in Parapsychology 1993 (pp. 27-31). Lanham, MD: Scarecrow Press.
Mahoney, M. J. (1985). Open exchange and epistemic progress. American Psychologist, 40, 29-39.
McGarry, J. J., & Newberry, B. H. (1981). Beliefs in paranormal phenomena and locus of control: A field study. Journal of Personality and Social Psychology, 41, 725-736.
Milton, J. (1993). A meta-analysis of waking state of consciousness, free response ESP studies. Proceedings of Presented Papers: The Parapsychological Association 36th Annual Convention, 87-104.
Milton, J. (1997). Meta-analysis of free-response ESP studies without altered states of consciousness. Journal of Parapsychology, 61, 279-319.
Milton, J., & Wiseman, R. (1999a). A meta-analysis of mass-media tests of extrasensory perception. British Journal of Psychology, 90, 235-240.
Milton, J., & Wiseman, R. (1999b). Does psi exist? Lack of replication of an anomalous process of information transfer. Psychological Bulletin, 125, 387-391.
Morris, R. L. (1982). Assessing experimental support for true precognition. Journal of Parapsychology, 46, 321-336.
Murphy, G. (1962). Report on paper by Edward Girden on psychokinesis. Psychological Bulletin, 59, 638-641.
Musch, J., & Ehrenberg, K. (2002). Probability misjudgement, cognitive ability, and belief in the paranormal. British Journal of Psychology, 93, 177
Nelson, R. D. (1994). Effect size per hour: A natural unit for interpreting anomalies experiments. (Technical Note PEAR 94003). Princeton University, Princeton, NJ 08544: Princeton Engineering Anomalies Research, School of Engineering/Applied Science.
Pallikari, F., & Boller, E. (1999). A rescaled range analysis of random events. Journal of Scientific Exploration, 13, 35-40.
Persinger, M. A. (2001). The neuropsychiatry of paranormal experiences. Journal of Neuropsychiatry & Clinical Neurosciences, 13, 515-523.
Pratt, J. G. (1937). Clairvoyant blind matching. Journal of Parapsychology, 1, 10-17.
Pratt, J. G. (1949). The meaning of performance curves in ESP and PK test data. Journal of Parapsychology, 13, 9-22.
Pratt, J. G., Rhine, J. B., Smith, B. M., Stuart, C. E., & Greenwood, J. A. (1940). Extra-sensory perception after sixty years: A critical appraisal of the research in extra-sensory perception. New York: Henry Holt and Company.
Price, M. M., & Pegram, M. H. (1937). Extra-sensory perception among the blind. Journal of Parapsychology, 1, 143-155.
Radin, D. I. (1982). Experimental attempts to influence pseudorandom number sequences. Journal of the American Society for Psychical Research, 76, 359-374.
Radin, D. I. (1989). Searching for "signatures" in anomalous human-machine interaction data: A neural network approach. Journal of Scientific Exploration, 3, 185-200.
Radin, D. I. (1997). The conscious universe. San Francisco: Harper Edge.
Radin, D. I., & Ferrari, D. C. (1991). Effects of consciousness on the fall of dice: A meta-analysis. Journal of Scientific Exploration, 5, 61-83.
Radin, D. I., & Nelson, R. D. (1989). Evidence for consciousness-related anomalies in random physical systems. Foundations of Physics, 19, 1499-1514.
Radin, D. I., & Nelson, R. D. (2002). Meta-analysis of mind-matter interaction experiments: 1959 to 2000. In W. B. Jonas (Eds.), Spiritual Healing, Energy Medicine and Intentionality: Research and Clinical Implications. Edinburgh: Harcourt Health Sciences.
Reeves, M. P., & Rhine, J. B. (1943). The PK effect: II. A study in declines. Journal of Parapsychology, 7, 76-93.
Rhine, J. B. (1934). Extrasensory perception. Boston: Boston Society for Psychic Research.
Rhine, J. B. (1936). Some selected experiments in extra-sensory perception. Journal of Abnormal and Social Psychology, 29, 151-171.
Rhine, J. B. (1937). The effect of distance in ESP tests. Journal of Parapsychology, 1, 172-184.
Rhine, J. B. (1946). Editorial: ESP and PK as "psi phenomena". Journal of Parapsychology, 10, 74-75.
Rhine, J. B., & Humphrey, B. M. (1944). The PK effect: Special evidence from hit patterns. I. Quarter distribution of the page. Journal of Parapsychology, 8, 18-60.
Rhine, J. B., & Humphrey, B. M. (1945). The PK effect with sixty dice per throw. Journal of Parapsychology, 9, 203-218.
Rhine, J. B., & Rhine, L. E. (1927). One evening's observation on the Margery mediumship. Journal of Abnormal and Social Psychology, 21, 421.
Rhine, L. E. (1937). Some stimulus variations in extra-sensory perception with child subjects. Journal of Parapsychology, 1, 102-113.
Rhine, L. E., & Rhine, J. B. (1943). The psychokinetic effect: I. The first experiment. Journal of Parapsychology, 7, 20-43.
Richet, C. (1923). Thirty years of psychical research. A treatise on metapsychics. New York: MacMillan Company.
Rosenthal, R. (1979). The "file drawer problem" and tolerance for null results. Psychological Bulletin, 86, 638-641.
Rosenthal, R., & Rosnow, R. L. (1991). Essentials of behavioral research. Methods and data analysis. (2nd ed.). New York: McGraw-Hill Publishing.
Rosenthal, R., & Rubin, D. B. (1989). Effect size estimation for one-sample multiple-choice type data: Design, analysis, and meta-analysis. Psychological Bulletin, 106, 332-337.
Rush, J. H. (1977). Problems and methods in psychokinesis research. In S. Krippner (Eds.), Advances in Parapsychological Research 1. Psychokinesis (pp. 15-78). New York: Plenum.
Sanger, C. P. (1895). Analysis of Mrs. Verrall's card experiments. Proceedings of the Society for Psychical Research, 11, 193-197.
Schmeidler, G. R. (1977). Research findings in psychokinesis. In S. Krippner (Eds.), Advances in Parapsychological Research 1. Psychokinesis (pp. 79-132). New York: Plenum.
Schmeidler, G. R. (1982). PK research: Findings and theories. In S. Krippner (Eds.), Advances in Parapsychological Research 3 (pp. 115-146). New York: Plenum Press.
Schmidt, H. (1969). Anomalous prediction of quantum processes by some human subjects. Seattle, WA: Plasma Physics Laboratory. D1-82-0821:1-38.
Schmidt, H. (1970). PK experiments with animals as subjects. Journal of Parapsychology, 34, 255-261.
Schmidt, H. (1985). Addition effect for PK on pre-recorded targets. Journal of Parapsychology, 49, 229-244.
Schouten, S. A. (1983). Personal experience and belief in ESP. The Journal of Psychology, 114, 219-222.
Shadish, W. R., & Haddock, K. C. (1994). Combining estimates of effect size. In L. V. Hedges & H. Cooper (Eds.), The handbook of research synthesis (pp. 261-281). New York: Russell Sage Foundation.
Sparks, G. G. (1998). Paranormal depictions in the media: How do they affect what people believe? Skeptical Inquirer, 22, 35-39.
Sparks, G. G., Hansen, T., & Shah, R. (1994). Do televised depictions of paranormal events influence viewers' beliefs? Skeptical Inquirer, 18, 386-395.
Sparks, G. G., Nelson, C. L., & Campbell, R. G. (1997). The relationship between exposure to televised messages about paranormal phenomena and paranormal beliefs. Journal of Broadcasting & Electronic Media, 41, 345-359.
Stanford, R. G. (1978). Toward reinterpreting psi events. Journal of the American Society for Psychical Research, 72, 197-214.
Stanford, R. G., & Stein, A. G. (1994). A meta-analysis of ESP studies contrasting hypnosis and a comparison condition. Journal of Parapsychology, 58, 235-269.
Steinkamp, F., Milton, J., & Morris, R. L. (1998). A meta-analysis of forced-choice experiments comparing clairvoyance and precognition. Journal of Parapsychology, 62, 193-218.
Sterling, T. D. (1959). Publication decisions and their possible effects on inferences drawn from tests of significance - or vice versa. Journal of the American Statistical Association, 54, 34
Sterling, T. D., Rosenbaum, W. L., & Weinkam, J. J. (1995). Publication decisions revisited: The effect of the outcome of statistical tests on the decision to publish and vice versa. American Statistician, 49, 112
Sterne, J. A. C., & Egger, M. (2001). Funnel plots for detecting bias in meta-analysis: Guidelines on choice of axis. Journal of Clinical Epidemiology, 54, 1046-1055.
Sterne, J. A. C., Egger, M., & Smith, G. D. (2001). Investigating and dealing with publication and other biases. In M. Egger, G. D. Smith, & D. Altman (Eds.), Systematic reviews in health care: Meta-analysis in context (pp. 189-208). London: BMJ Books.
Sterne, J. A. C., Gavaghan, D., & Egger, M. (2000). Publication and related bias in meta-analysis: Power of statistical tests and prevalence in the literature. Journal of Clinical Epidemiology, 53, 1119-1129.
Storm, L., & Ertel, S. (2001). Does psi exist? Comments on Milton and Wiseman's (1999) meta-analysis of ganzfeld research. Psychological Bulletin, 127, 424-433.
Targ, R., & Puthoff, H. E. (1977). Mind-reach: Scientists look at psychic ability. Delacorte Press.
Tart, C. T. (1976). Effects of immediate feedback on ESP performance. Research in Parapsychology 1975, 80-82.
Taylor, G. Le M. (1890). Experimental comparison between chance and thought-transference in correspondence of diagrams. Proceedings of the Society for Psychical Research, 6, 398-405.
Thalbourne, M. A. (1995). Further studies of the measurement and correlates of belief in the paranormal. Journal of the American Society for Psychical Research, 89, 233-247.
Thompson, S. G., & Higgins, J. P. T. (2002). How should meta-regression analyses be undertaken and interpreted? Statistics in Medicine, 21, 1559-1574.
Thompson, S. G., & Sharp, S. J. (1999). Explaining heterogeneity in meta-analysis: A comparison of methods. Statistics in Medicine, 18, 2693-2708.
Thouless, R. H. (1942). The present position of experimental research into telepathy and related phenomena. Proceedings of the Society for Psychical Research, 47, 1-19.
Thouless, R. H., & Wiesner, B. P. (1946). The psi processes in normal and "paranormal" psychology. Proceedings of the Society for Psychical Research, 48, 177-196.
Varvoglis, M. P., & McCarthy, D. (1986). Conscious-purposive focus and PK: RNG activity in relation to awareness, task-orientation, and feedback. Journal of the American Society for Psychical Research, 80, 1-29.
Watt, C. A. (1994). Meta-analysis of DMT-ESP studies and an experimental investigation of perceptual defense/vigilance and extrasensory perception. In E. W. Cook & D. L. Delanoy (Eds.), Research in Parapsychology 1991 (pp. 64-68). Metuchen, NJ: Scarecrow Press.
White, R. A. (1991). The psiline database system. Exceptional Human Experience, 9, 163-167.
Wilson, C. (1976). The Geller phenomenon. London: Aldus Books.
Appendix
Studies Used in the Meta-Analysis
André, E. (1972). Confirmation of PK action on electronic equipment. Journal of Parapsychology, 36, 283-293.
Berger, R. E. (1986). Psi effects without real-time feedback using a PsiLab ][ video game experiment. The Parapsychological Association 29th Annual Convention, 111-128.
Bierman, D. J., & Houtkooper, J. M. (1975). Exploratory PK tests with a programmable high speed random number generator. European Journal of Parapsychology, 1, 3-14.
Bierman, D. J., De Diana, I. P. F., & Houtkooper, J. M. (1976). Preliminary report on the Amsterdam experiments with Matthew Manning. European Journal of Parapsychology, 1, 6-16.
Bierman, D. J., & Noortje, V. T. (1977). The performance of healers in PK tests with different RNG feedback algorithms. Research in Parapsychology 1976, 131-133.
Bierman, D. J., & Weiner, D. H. (1980). A preliminary study of the effect of data destruction on the influence of future observers. Journal of Parapsychology, 44, 233-243.
Bierman, D. J., & Houtkooper, J. M. (1981). The potential observer effect or the mystery of irreproducibility. European Journal of Parapsychology, 3, 345-371.
Bierman, D. J. (1987). Explorations of some theoretical frameworks using a PK-test environment. Proceedings of Presented Papers: The Parapsychological Association 30th Annual Convention, 33-40.
Bierman, D. J. (1988). Testing the IDS model with a gifted subject. Theoretical Parapsychology, 6, 31-36.
Bierman, D. J., & Van Gelderen, W. J. M. (1994). Geomagnetic activity and PK on a low and high trial-rate RNG. Proceedings of Presented Papers: The Parapsychological Association 37th Annual Convention, 50-56.
Braud, L., & Braud, W. (1977). Psychokinetic effects upon a random event generator under conditions of limited feedback to volunteers and experimenter. Proceedings of Presented Papers: The Parapsychological Association 20th Annual Convention.
Braud, W., & Kirk, J. (1977). Attempt to observe psychokinetic influences upon a random event generator by person-fish teams. European Journal of Parapsychology, 2, 228-237.
Braud, W. (1981). Psychokinesis experiments with infants and young children. Research in Parapsychology 1980, 30-31.
Braud, W. G., & Hartgrove, J. (1976). Clairvoyance and psychokinesis in transcendental meditators and matched control subjects: A preliminary study. European Journal of Parapsychology, 1, 6-16.
Braud, W. G. (1978). Recent investigations of microdynamic psychokinesis, with special emphasis on the roles of feedback, effort and awareness. European Journal of Parapsychology, 2, 137-162.
Braud, W. G. (1983). Prolonged visualization practice and psychokinesis: A pilot study (RB). Research in Parapsychology 1982, 187-189.
Breederveld, H. (1988). Towards reproducible experiments in psychokinesis IV. Experiments with an electronic random number generator. Theoretical Parapsychology, 6, 43-51.
Breederveld, H. (1989). The Michels experiments: An attempted replication. Journal of the Society for Psychical Research, 55, 360-363.
Breederveld, H. (2001). De Optimal Stopping Strategie XL. PK-experimenten met een random number generator. [The optimal stopping strategy XL. PK experiments with a random number generator]. SRU-Bulletin, 13, 22-23.
Broughton, R. S., & Millar, B. (1977). A PK experiment with a covert release-of-effort test. Research in Parapsychology 1976, 28-30.
Broughton, R. S. (1979). An experiment with the head of Jut. European Journal of Parapsychology, 2, 337-357.
Broughton, R. S., Millar, B., & Johnson, M. (1981). An investigation into the use of aversion therapy techniques for the operant control of PK production in humans. European Journal of Parapsychology, 3, 317-344.
Broughton, R. S., & Higgins, C. A. (1994). An investigation of micro-PK and geomagnetism. Proceedings of Presented Papers: The Parapsychological Association 37th Annual Convention, 87-94.
Broughton, R. S., & Alexander, C. H. (1997). Destruction testing DAT. Proceedings of Presented Papers: The Parapsychological Association 40th Annual Convention, 100-104.
Crandall, J. E. (1993). Effects of extrinsic motivation on PK performance and its relations to state anxiety and extraversion. Proceedings of Presented Papers: The Parapsychological Association 36th Annual Convention, 372-377.
Curry, C. K. (1978). A modularized random number generator: Engineering design and psychic experimentation. Unpublished senior thesis, Department of Electrical Engineering and Computer Science, School of Engineering/Applied Science, Princeton University.
Dalton, K. S. (1994). Remotely influenced ESP performance in a computer task: A preliminary study. Proceedings of Presented Papers: The Parapsychological Association 37th Annual Convention, 95-103.
Davis, J. W., & Morrison, M. D. (1978). A test of the Schmidt model's prediction concerning multiple feedback in a PK test. Research in Parapsychology 1977, 163-168.
Debes, J., & Morris, R. L. (1982). Comparison of striving and nonstriving instructional sets in a PK study. Journal of Parapsychology, 46, 297-312.
Gerding, J. L. F., Wezelman, R., & Bierman, D. J. (1997). The Druten disturbances - Exploratory RSPK research. Proceedings of Presented Papers: The Parapsychological Association 40th Annual Convention, 146-161.
Giesler, P. V. (1985). Differential micro-PK effects among Afro-Brazilian cultists: Three studies using trance-significant symbols as targets. Journal of Parapsychology, 49, 329-366.
Gissurarson, L. R. (1986). RNG-PK microcomputer "games" overviewed: An experiment with the videogame "PSI INVADERS". European Journal of Parapsychology, 6, 199-215.
Gissurarson, L. R., & Morris, R. L. (1990). Volition and psychokinesis: Attempts to enhance PK performance through the practice of imagery strategies. Journal of Parapsychology, 54, 331-370.
Gissurarson, L. R. (1990). Some PK attitudes as determinants of PK performance. European Journal of Parapsychology, 8, 112-122.
Gissurarson, L. R., & Morris, R. L. (1991). Examination of six questionnaires as predictors of psychokinesis performance. Journal of Parapsychology, 55, 119-145.
Haraldsson, E. (1970). Subject selection in a machine precognition test. Journal of Parapsychology, 34, 182-191.
Heseltine, G. L. (1977). Electronic random number generator operation associated with EEG activity. Journal of Parapsychology, 41, 103-118.
Heseltine, G. L., & Mayer-Oakes, S. A. (1978). Electronic random generator operation and EEG activity: Further studies. Journal of Parapsychology, 42, 123-136.
Heseltine, G. L., & Kirk, J. (1980). Examination of a majority-vote technique. Journal of Parapsychology, 44, 167-176.
Hill, S. (1977). PK effects by a single subject on a binary random number generator based on electronic noise. Research in Parapsychology 1976, 26-28.
Honorton, C. (1971). Group PK performance with waking suggestions for muscle tension/relaxation and active/passive concentration. Proceedings of the Parapsychological Association, 8, 14-15.
Honorton, C. (1971). Automated forced-choice precognition tests with a "sensitive". Journal of the American Society for Psychical Research, 65, 476-481.
Honorton, C., & Barksdale, W. (1972). PK performance with waking suggestions for muscle tension vs. relaxation. Journal of the American Society for Psychical Research, 66, 208-214.
Honorton, C., Ramsey, M., & Cabibbo, C. (1975). Experimenter effects in ESP research. Journal of the American Society for Psychical Research, 69, 135-139.
Honorton, C., & May, E. C. (1976). Volitional control in a psychokinetic task with auditory and visual feedback. Research in Parapsychology 1975, 90-91.
Honorton, C. (1977). Effects of meditation and feedback on psychokinetic performance: A pilot study with an instructor of Transcendental Meditation. Research in Parapsychology 1976, 95-97.
Honorton, C., & Tremmel, L. (1980). Psitrek: A preliminary effort toward development of psi-conducive computer software. Research in Parapsychology 1979, 159-161.
Honorton, C., Barker, P., & Sondow, N. (1983). Feedback and participant-selection parameters in a computer RNG study (RB). Research in Parapsychology 1982, 157-159.
Honorton, C. (1987). Precognition and real-time ESP performance in a computer task with an exceptional subject. Journal of Parapsychology, 51, 291-320.
Houtkooper, J. M. (1976). Psychokinesis, clairvoyance and personality factors. Proceedings of Presented Papers: The Parapsychological Association 19th Annual Convention, 1-15.
Houtkooper, J. M. (1977). A study of repeated retroactive psychokinesis in relation to direct and random PK effects. European Journal of Parapsychology, 1, 1-20.
Houtkooper, J. M., Schienle, A., Vaitl, D., & Stark, R. (1999). Atmospheric electromagnetism: An attempt at replicating the correlation between natural sferics and ESP. Proceedings of Presented Papers: The Parapsychological Association 42nd Annual Convention, 123-135.
Jacobs, J. C., Michels, J. A. G., Millar, B., & Millar-De Bruyne, M.-L. F. L. (1987). Building a PK trap: The adaptive trial speed method. Proceedings of Presented Papers: The Parapsychological Association 30th Annual Convention, 348-370.
Jahn, R. G., Dunne, B. J., Dobyns, Y. H., Nelson, R. D., & Bradish, G. J. (1999). ArtREG: A random event experiment utilizing picture-preference feedback. (Technical Note PEAR 99003). Princeton NJ 08544: PEAR Laboratory, School of Engineering and Applied Science.
Jahn, R. G., Mischo, J., Vaitl, D., Dunne, B. J., Bradish, G. J., Dobyns, Y. H., Lettieri, A., Nelson, R. D., Boller, E., Bösch, H., Houtkooper, J. M., & Walter, B. (2000). Mind/Machine interaction consortium: PortREG replication experiments. Journal of Scientific Exploration, 14, 499-555.
Kelly, E. F., & Kanthamani, B. K. (1972). A subject's efforts toward voluntary control. Journal of Parapsychology, 36, 185-197.
Kelly, E. F., & Lenz, J. (1976). EEG correlates of trial-by-trial performance in a two-choice clairvoyance task: A preliminary study. Research in Parapsychology 1975, 22-25.
Kugel, W., Bauer, B., & Bock, W. (1979). Versuchsreihe Telbin. [Experimental series Telbin]. (Arbeitsbericht 7). Berlin: Technische Universität Berlin.
Kugel, W. (1999). Amplifying precognition: Four experiments with roulette. Proceedings of Presented Papers: The Parapsychological Association 42nd Annual Convention, 136-146.
Levi, A. (1979). The influence of imagery and feedback on PK effects. Journal of Parapsychology, 43, 275-289.
Lignon, Y., & Faton, L. (1977). Le facteur psi s'exerce sur un appareil électronique. [The psi factor affects an electronic apparatus]. Psi Réalité, 54-62.
Lounds, P. (1993). The influence of psychokinesis on the randomly-generated order of emotive and non-emotive slides. Journal of the Society for Psychical Research, 59, 187-193.
Lucadou, W. v. (1991). Locating Psi-bursts - correlations between psychological characteristics of observers and observed quantum physical fluctuations. Proceedings of Presented Papers: The Parapsychological Association 34th Annual Convention, 265-281.
Mabilleau, P. (1982). Electronic dice: A new way for experimentation in "psiology". Le Bulletin PSILOG, 2, 13-14.
Matas, F., & Pantas, L. (1971). A PK experiment comparing meditating vs. nonmeditating subjects. Proceedings of the Parapsychological Association, 8, 12-13.
May, E. C., & Honorton, C. (1976). A dynamic PK experiment with Ingo Swann. Research in Parapsychology 1975, 88-89.
Michels, J. A. G. (1987). Consistent high scoring in self-test PK experiments using a stopping strategy. Journal of the Society for Psychical Research, 54, 119-129.
Millar, B., & Broughton, R. (1976). A preliminary PK experiment with a novel computer-linked high speed random number generator. Research in Parapsychology 1975, 83-84.
Millar, B., & Mackenzie, P. (1977). A test of intentional vs unintentional PK. Research in Parapsychology 1976, 32-35.
Millar, B., & Broughton, R. S. (1977). An investigation of the psi enhancement paradigm of Schmidt. Research in Parapsychology 1976, 23-25.
Millar, B. (1983). Random bit generator experimenten. Millar-replicatie. [Random bit generator experiments. Millar's replication]. SRU-Bulletin, 8, 119-123.
Morris, R., Nanko, M., & Phillips, D. (1978). Intentional observer influence upon measurements of a quantum mechanical system: A comparison of two imagery strategies. Program and Presented Papers: Parapsychological Association 21st Annual Convention, 266-275.
Morris, R. L., & Reilly, V. (1980). A failure to obtain results with goal-oriented imagery PK and a random event generator with varying hit probability. Research in Parapsychology 1979, 166-167.
Morris, R. L., & Harnaday, J. (1981). An attempt to employ mental practice to facilitate PK. Research in Parapsychology 1980, 103-104.
Morris, R. L., & Garcia-Noriega, C. (1982). Variations in feedback characteristics and PK success. Research in Parapsychology 1981, 138-140.
Morrison, M. D., & Davis, J. W. (1978). PK with immediate, delayed, and multiple feedback: A test of the Schmidt model's predictions. Program and Presented Papers: Parapsychological Association 21st Annual Convention, 97-117.
Nanko, M. (1981). Use of goal-oriented imagery strategy on a psychokinetic task with "selected" subjects. Journal of the Southern California Society for Psychical Research, 2, 1-5.
Nelson, R. D. (1994). Effect size per hour: A natural unit for interpreting anomalies experiments. (Technical Note PEAR 94003). Princeton University, Princeton, NJ 08544: Princeton Engineering Anomalies Research, School of Engineering/Applied Science.
Palmer, J., & Perlstrom, J. R. (1986). Random event generator PK in relation to task instructions: A case of "motivated" error? The Parapsychological Association 29th Annual Convention, 131-147.
Palmer, J. (1995). External psi influence of ESP task performance. Proceedings of Presented Papers: The Parapsychological Association 38th Convention, 270-282.
Palmer, J., & Broughton, R. S. (1995). Performance in a computer task with an exceptional subject: A failure to replicate. Proceedings of Presented Papers: The Parapsychological Association 38th Convention, 289-294.
Palmer, J. (1998). ESP and REG PK with Sean Harribance: Three new studies. Proceedings of Presented Papers: The Parapsychological Association 41st Annual Convention, 124-134.
Pantas, L. (1971). PK scoring under preferred and nonpreferred conditions. Proceedings of the Parapsychological Association, 8, 47-49.
Pare, R. (1983). Random bit generator experimenten. Pare-replicatie. [Random bit generator experiments. Pare's replication]. SRU-Bulletin, 8, 123-128.
Randall, J. L. (1974). An extended series of ESP and PK tests with three English schoolboys. Journal of the Society for Psychical Research, 47, 485-494.
Reinsel, R. (1987). PK performance as a function of prior stage of sleep and time of night. Proceedings of Presented Papers: The Parapsychological Association 30th Annual Convention, 332-347.
Schechter, E. I., Barker, P., & Varvoglis, M. P. (1983). A preliminary study with a PK game involving distraction from the Psi task (RB). Research in Parapsychology 1982, 152-154.
Schechter, E. I., Honorton, C., Barker, P., & Varvoglis, M. P. (1984). Relationships between participant traits and scores on two computer-controlled RNG-PK games. Research in Parapsychology 1983, 32-33.
Schmeidler, G. R., & Borchardt, R. (1981). Psi-scores with random and pseudo-random targets. Research in Parapsychology 1980, 45-47.
Schmidt, H. (1969). Anomalous prediction of quantum processes by some human subjects. (D1-82-0821). Plasma Physics Laboratory.
Schmidt, H. (1970). A PK test with electronic equipment. Journal of Parapsychology, 34, 175-181.
Schmidt, H., & Pantas, L. (1972). Psi tests with internally different machines. Journal of Parapsychology, 36, 222-232.
Schmidt, H. (1972). An attempt to increase the efficiency of PK testing by an increase in the generation speed. Proceedings of Presented Papers: Parapsychological Association 15th Annual Convention.
Schmidt, H. (1973). PK tests with a high-speed random number generator. Journal of Parapsychology, 37, 105-118.
Schmidt, H. (1974/a). PK effect on random time intervals. Research in Parapsychology 1973, 46-48.
Schmidt, H. (1974/b). Comparison of PK action on two different random number generators. Journal of Parapsychology, 38, 47-55.
Schmidt, H. (1975). PK experiment with repeated, time displaced feedback. Proceedings of Presented Papers: The Parapsychological Association 18th Annual Convention.
Schmidt, H. (1976). PK effects on pre-recorded targets. Journal of the American Society for Psychical Research, 70, 267-291.
Schmidt, H., & Terry, J. (1977). Search for a relationship between brainwaves and PK performance. Research in Parapsychology 1976, 30-32.
Schmidt, H. (1978). Use of stroboscopic light as rewarding feedback in a PK test with pre-recorded and momentarily generated random events. Program and Presented Papers: Parapsychological Association 21st Annual Convention, 85-96.
Schmidt, H. (1990). Correlation between mental processes and external random events. Journal of Scientific Exploration, 4, 233-241.
Schouten, S. A. (1977). Testing some implications of a PK observational theory. European Journal of Parapsychology, 1, 21-31.
Stanford, R. G. (1981). "Associative activation of the unconscious" and "visualization" as methods for influencing the PK target: A second study. Journal of the American Society for Psychical Research, 75, 229-240.
Stanford, R. G., & Kottoor, T. M. (1985). Disruption of attention and PK-task performance. Proceedings of Presented Papers: The Parapsychological Association 28th Annual Convention, 117-132.
Stewart, W. C. (1959). Three new ESP test machines and some preliminary results. Journal of Parapsychology, 23, 44-48.
Talbert, R., & Debes, J. (1981). Time-displacement psychokinetic effects on a random number generator using varying amounts of feedback. Program and Presented Papers: Parapsychological Association 24th Annual Convention.
Tedder, W., & Braud, W. (1981). Long-distance, nocturnal psychokinesis. Research in Parapsychology 1980, 100-101.
Tedder, W. (1984). Computer-based long distance ESP: An exploratory examination. Research in Parapsychology 1983, 100-101.
Thouless, R. H. (1971). Experiments on psi self-training with Dr. Schmidt's pre-cognitive apparatus. Journal of the Society for Psychical Research, 46, 15-21.
Tremmel, L., & Honorton, C. (1980). Directional PK effects with a computer based random generator system: A preliminary study. Research in Parapsychology 1979, 69-71.
Varvoglis, M. P. (1988). A "psychic contest" using a computer-RNG task in a non-laboratory setting. Proceedings of Presented Papers: The Parapsychological Association 31st Annual Convention, 36-52.
Verbraak, A. (1981). Onafhankelijke random bit generator experimenten - Verbraak replicatie [Independent random bit generator experiments - Verbraak's replication]. SRU-Bulletin, 6, 134-139.
Weiner, D. H., & Bierman, D. J. (1979). An observer effect in data analysis? Research in Parapsychology 1978, 57-58.
Winnett, R. (1977). Effects of meditation and feedback on psychokinetic performance: Results with practitioners of Ajapa Yoga. Research in Parapsychology 1976, 97-98.
Table 1
Main Results of Radin & Ferrari's (1991) Dice Meta-Analysis

                                                 N     t̄        SE       z
Dice-casts "influenced"
  All studies                                    148   .50610   .00031   19.68***
  All studies, quality weighted                  148   .50362   .00036   10.18***
  Balanced studies                               69    .50431   .00055   7.83***
  Balanced studies, homogeneous                  59    .50158   .00061   2.60**
  Balanced studies, homogeneous, quality
    weighted                                     59    .50147   .00063   2.33*
Dice-casts control
  All studies                                    31    .50047   .00128   0.36

Note. The z value is based on the null value of t̄ = .5.
* p < .05. ** p < .01. *** p < .001
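The z scores in Tables 1 and 2 follow directly from the mean hit proportion and its standard error under the chance baseline of .5. A minimal sketch (an illustrative helper, not code from the paper):

```python
def z_from_proportion(p_bar: float, se: float, null: float = 0.5) -> float:
    """z score of a mean hit proportion against the chance baseline."""
    return (p_bar - null) / se

# "All studies" row of Table 1: t = .50610, SE = .00031
z_all = z_from_proportion(0.50610, 0.00031)  # ~19.7, matching the tabled 19.68
```

The same computation, applied row by row, reproduces the z columns of Tables 1 and 2 up to rounding of the printed standard errors.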
Table 2
Previous PK Meta-Analyses - Total Samples

                                     N     π̄        SE       z          Mean
Dice
  1991 Meta-analysis                 148   .50822   .00041   20.23***   .51105
RNG
  1989 First meta-analysis           597   .50018   .00003   6.53***    .50414
  1997 First MA without PEAR data    339   .50061   .00009   6.41***    .50701
  2000 Second meta-analysis          515   .50005   .00001   3.81***    .50568

Note. Mean = the unweighted averaged effect size of studies.
*** p < .001
Table 3
Basic Study Characteristics - Intentional Studies

                                      Studies (n)
Source of studies
  Journal                             276
  Conference proceeding               63
  Report                              16
  Thesis/Dissertation                 2
Number of participants
  1                                   91
  >1 - 10                             101
  >10 - 20                            52
  >20 - 30                            33
  >30 - 40                            13
  >40 - 50                            12
  >50 - 60                            12
  >60 - 70                            2
  >70 - 80                            2
  >80 - 90                            7
  >90 - 100                           2
  >100                                3
Year of publication
  1960                                2
  1961 - 1970                         14
  1971 - 1980                         202
  1981 - 1990                         101
  1991 - 2000                         37
  >2000                               1
Sample size (bit)
  >10 - 100                           8
  >100 - 1,000                        74
  >1,000 - 10,000                     123
  >10,000 - 100,000                   91
  >100,000 - 1,000,000                42
  >1,000,000 - 10,000,000             8
  >10,000,000 - 100,000,000           7
  >100,000,000 - 1,000,000,000        3
  >1,000,000,000 - 10,000,000,000     1
Table 4
Overall, Trimmed Sample and Moderator Variables Summary Statistics

Variable and class        n     π̄         SE        z          M bit      Mdn bit   M py    χ²
Intentional overall       357   .500029   .000011   2.73**     6095359    6400      1979    1442.89***
Trimmed
  Homogeneous             287   .500017   .000011   1.54       6824190    6400      1981    323.96
  Heterogeneous           70    .500136   .000034   4.02***    3107153    6593      1976    1107.88***
Sample size (bit)
  (Q1) Smallest           89    .523500   .002655   8.85***    446        320       1979    318.92***
  (Q2) Small              91    .505519   .000914   6.03***    3683       4000      1978    269.52***
  (Q3) Large              91    .503249   .000421   7.71***    16726      15360     1979    419.04***
  (Q4) Largest            86    .500026   .000011   2.43*      25280771   200000    1983    262.79***
Year of publication
  (Q1) Oldest             90    .505356   .000394   13.59***   18553      4364      1971    636.32***
  (Q2) Old                97    .500178   .000147   1.21       120378     8000      1977    255.24***
  (Q3) New                85    .500292   .000247   1.18       52993      7500      1981    122.17**
  (Q4) Newest             85    .500024   .000011   2.23*      25390500   18000     1991    244.02***
Number of participants
  (Q1) One                91    .500453   .000168   2.70**     116550     5000      1980    632.14***
  (Q2) Few                101   .500043   .000045   .96        1232867    4800      1978    330.26***
  (Q3) Several            57    .499986   .000039   -.35       2863035    12288     1980    168.60***
  (Q4) Many               81    .500024   .000012   2.10*      22955969   18000     1982    134.18***
  Unknown                 27    .500627   .000117   5.35***    677453     7500      1979    143.80***
Participants
  Selected                55    .500628   .000185   3.40***    134727     5000      1977    579.62***
  Unselected              244   .500027   .000011   2.53*      8881236    10321     1980    675.64***
  Other                   58    .500237   .000408   .58        27790      1280      1980    176.85***
Study status
  Formal                  192   .500027   .000011   2.46*      10744544   7727      1982    616.39***
  Pilot                   152   .500061   .000049   1.26       698876     6400      1978    801.66***
  Other                   13    .500202   .000191   1.06       527819     5000      1977    23.55*
Feedback
  Visual                  221   .500017   .000012   1.49       8370435    5000      1979    814.63***
  Auditory                35    .502357   .000382   6.18***    50385      17000     1976    254.83***
  Other                   101   .500085   .000028   3.06**     3212016    10000     1983    331.14***
Random sources
  Noise                   194   .500028   .000011   2.60**     11202736   11456     1983    852.46***
  Radioactive             96    .502357   .000544   4.33***    10442      1600      1973    459.53***
  Other                   67    .500766   .000389   1.97*      25523      10800     1979    109.01***

Note. M bit = mean sample size in bits; Mdn bit = median sample size in bits; M py = mean publication year.
* p < .05. ** p < .01. *** p < .001
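The χ² column in Tables 4 and 5 is a heterogeneity statistic of the Cochran type: with each study's sampling variance under the null taken as .25/n bits, the inverse-variance-weighted squared deviations from the pooled proportion are summed and referred to a χ² distribution with k - 1 degrees of freedom. A sketch under that assumption (not the authors' code):

```python
import numpy as np

def cochran_q(pi: np.ndarray, n_bits: np.ndarray) -> float:
    """Cochran-style heterogeneity statistic for per-study hit proportions.

    Each study's variance under the null (p = .5) is 0.25 / n_bits, so the
    weights are the reciprocals, n_bits / 0.25.
    """
    w = n_bits / 0.25                       # inverse null variances
    pooled = np.sum(w * pi) / np.sum(w)     # fixed-effect pooled proportion
    return float(np.sum(w * (pi - pooled) ** 2))

# Two equally sized studies straddling chance:
q = cochran_q(np.array([0.51, 0.49]), np.array([1000.0, 1000.0]))  # 0.8
```

A value far above its degrees of freedom, as in nearly every row of Table 4, signals the extreme effect size heterogeneity discussed in the abstract.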
Table 5
Safeguard Variables Summary Statistics

Variable and class        n     π̄         SE        z         Mdn bit   M py    χ²
RNG control
  Yes (2)                 233   .500031   .000011   2.82**    10000     1981    866.46***
  Earlier (1)             45    .499996   .000052   -.08      1000      1982    286.75***
  No (0)                  79    .499979   .000377   -.06      4048      1977    289.22***
All data reported
  Yes (2)                 246   .500028   .000011   2.58**    6400      1980    1352.94***
  Unclear (1)             7     .501074   .000537   2.00*     37000     1976    16.75*
  No (0)                  104   .500204   .000134   1.52      7200      1978    67.71
Split of data
  Preplanned (2)          289   .500064   .000043   1.48      6400      1980    734.71***
  Unclear (1)             11    .500026   .000011   2.32*     25600     1983    173.27***
  Not preplanned (0)      57    .501111   .000314   3.53***   4400      1977    522.33***
Safeguard sum-score
  Sum = 6 (highest)       138   .500285   .000098   2.90**    6144      1982    455.05***
  Sum = 5                 47    .500025   .000011   2.27*     48000     1983    210.59***
  Sum = 4                 104   .500178   .000137   1.30      6400      1979    391.64***
  Sum = 3                 46    .500303   .000339   .89       1600      1976    105.72***
  Sum = 2                 8     .517376   .002755   6.31***   1472      1978    220.96***
  Sum = 1                 4     .500000   .026352   .00       120       1977    .00
  Sum = 0 (lowest)        10    .503252   .001344   2.42*     16000     1975    4.75

Note. Mdn bit = median sample size in bits; M py = mean publication year.
* p < .05. ** p < .01. *** p < .001
Table 6
Summary of Weighted Stepwise Linear Meta-Regression Analysis for Variables Predicting Effect Size of RNG Studies

Step and variable        B          SE B       β       t           R²
Step 1                                                             .032
  Year of publication    -.000025   .000007    -.180   -3.45***
Step 2                                                             .048
  Year of publication    -.000021   .000007    -.153   -2.88**
  Auditory feedback      .001856    .000770    .128    2.41*

* p < .05. ** p < .01. *** p < .001
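Table 6's model is a weighted linear regression of study effect size on study-level predictors (publication year in step 1, auditory feedback added in step 2). A minimal weighted-least-squares sketch; the variable names and data are illustrative, and in a fixed-effect meta-regression the weights would be the studies' inverse sampling variances:

```python
import numpy as np

def weighted_regression(y: np.ndarray, X: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Solve the weighted least-squares normal equations.

    y : per-study effect sizes, X : predictor matrix (one column per
    moderator), w : per-study weights. Returns [intercept, slopes...].
    """
    X1 = np.column_stack([np.ones(len(y)), X])
    WX = X1 * w[:, None]                     # row-scale the design by the weights
    return np.linalg.solve(X1.T @ WX, X1.T @ (w * y))

# Exact linear data recovers the generating coefficients:
year = np.array([1970.0, 1980.0, 1990.0, 2000.0])
es = 0.502 - 0.00002 * (year - 1970.0)
beta = weighted_regression(es, (year - 1970.0)[:, None], np.array([1.0, 2.0, 3.0, 4.0]))
```

The negative slope on publication year in the sketch mirrors the table's finding that reported effect sizes shrink in later studies.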
Table 7
Potential Sources of the Small-Study Effect

True heterogeneity
  Different intensity/quality
  Different participants
  Different feedback
  Other moderator(s)
Data irregularities
  Poor methodological design
  Effect intrinsically irreplicable
  Inadequate analysis
  Fraud
Selection biases
  Biased inclusion criteria
  Publication bias
Chance

Note. The table is adapted from Sterne, Egger, & Smith, 2001, p. 193.
Table 8
Stepped Weight Function Monte Carlo Simulation of Publication Bias

                   n     π̄          SE         z         Stud   z_Δ     χ²
Overall            357   0.500027   0.000021   2.48*     1453   0.36    592.19***
Sample size
  (Q1) Smallest    89    0.514658   0.004908   5.80***   364    1.58    116.52*
  (Q2) Small       91    0.505118   0.001692   5.92***   367    0.21    115.21*
  (Q3) Large       91    0.502462   0.000794   6.07***   372    0.88    113.27*
  (Q4) Largest     86    0.500024   0.000021   2.22*     350    0.08    138.64***

Note. Stud = number of studies that did not get published (simulated); z_Δ = difference between the effect size of the simulated and the experimental data.
* p < .05. ** p < .01. *** p < .001
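The simulation behind Table 8 is described only loosely in the text; the sketch below shows the general idea of a stepped weight function under a true null: every "significant" study is published, a nonsignificant one is published only with some fixed probability, and the discarded studies are counted. All parameters here (the study sizes, the 0.25 retention probability, the one-tailed 1.645 cutoff) are illustrative assumptions, not the authors' values:

```python
import numpy as np

rng = np.random.default_rng(2004)

def stepped_weight_simulation(k_published: int, sizes, keep_ns: float = 0.25):
    """Generate null RNG 'studies' (p = .5) and filter them with a stepped
    weight function. Returns the published (n, pi) pairs and the number of
    studies left unpublished along the way."""
    published, unpublished = [], 0
    while len(published) < k_published:
        n = int(rng.choice(sizes))                 # draw a study size
        pi = rng.binomial(n, 0.5) / n              # true null: p = .5
        z = (pi - 0.5) / np.sqrt(0.25 / n)
        if z >= 1.645 or rng.random() < keep_ns:   # stepped weight function
            published.append((n, pi))
        else:
            unpublished += 1
    return published, unpublished

pub, unpub = stepped_weight_simulation(100, [100, 1000, 10000])
```

With a filter of this kind the published record shows an inflated mean effect concentrated in the small studies, while the unpublished count plays the role of the table's "Stud" column.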
Figure 1
Funnel Plot Intentional Studies

[Funnel plot: sample size (10 to 10,000,000,000 bits, logarithmic scale) on the vertical axis plotted against effect size (π), ranging from 0.35 to 0.65, on the horizontal axis.]

Note. Three very extreme values (π > .70; n < 400) are omitted from the figure for the sake of better representation of all other values.
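The asymmetry visible in Figure 1 can also be quantified without the plot: rank-correlate study size with effect size, and a clearly negative coefficient reflects the small-study effect the text describes. A minimal sketch with illustrative data (not values from the meta-analysis):

```python
import numpy as np

def rank_correlation(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman rank correlation via double argsort (assumes no ties)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])

# Shrinking effects with growing sample size give a strongly negative value:
sizes = np.array([100.0, 1000.0, 10000.0, 100000.0])
effects = np.array([0.60, 0.53, 0.505, 0.500])
r = rank_correlation(sizes, effects)  # -1.0 for this perfectly ordered example
```

Applied to the 357 intentional studies, a strongly negative coefficient would correspond to the inverse relation between study size and effect size reported in the abstract.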