meta-analysis.SBB-DIR.RDN.doc



Examining Psychokinesis:

The Interaction of Human Intention with Random Number Generators.

A Meta-Analysis


Submitted: August 19, 2004

Acknowledgments

(removed for blind review)

Radin comments inserted with change-tracking


Abstract

Séance-room phenomena and apports have fascinated mankind for decades. Experimental research has reduced these phenomena to attempts to influence (i) the fall of dice and, later, (ii) the output of random number generators (RNGs). [Radin: This overlooks dozens of other PK experiments. It also overlooks the fact that most of the impetus for developing RNG experiments was not to simulate séance phenomena, but to test quantum observational theories, to tighten methodologies, and to reduce the need for special subjects.] The meta-analysis presented here combined 357 studies that assessed whether human intention could correlate with RNG output. The studies yielded a significant, but very small effect. Study size was strongly and inversely related to effect size; [Radin: Why the authors prefer to ignore the voluminous literature on this issue, I cannot fathom.] this finding was consistent across all examined moderator and safeguard variables. A (not well specified) Monte Carlo simulation revealed that the small effect size, the relation between study size and effect size, and the extreme effect size heterogeneity might all be a result of publication bias.

The idea that individuals can influence inanimate objects by the power of their own mind is a relatively recent concept.

[Radin: Huh? Isn't sympathetic magic one of the most ancient beliefs?] During the 1970s, Uri Geller reawakened mass interest in this putative ability through his demonstrations of spoon bending using his alleged psychic powers (Targ & Puthoff, 1977; Wilson, 1976) and lays claim to this ability even now (e.g., Geller, 1998). Belief in this phenomenon is widespread. In 1991 (Gallup & Newport), 17 percent of American adults believed in "the ability of the mind to move or bend objects using just mental energy" (p. 138) and seven percent even claimed that they had "seen somebody moving or bending an object using mental energy" (p. 141).

Unknown to most academics, a large amount of experimental data has accrued testing the hypothesis of a direct connection between the human mind and the physical world. It is one of the very few lines of research where replication is the main and central target (initially perhaps, but surely not for the past 20 years), a commitment that some methodologists wish experimental psychologists in general would adopt (e.g., Cohen, 1994; Rosenthal & Rosnow, 1991). This article will trace the development of the empirical evaluation of this alleged phenomenon and will present a new meta-analysis of a large set of studies examining the interaction between human intention and random number generators.

Psi research

Psi phenomena (Thouless, 1942; Thouless & Wiesner, 1946) can be split into two main categories. Psychokinesis (PK) is the common label for the apparent ability of humans to affect objects solely by the power of the mind. Extrasensory perception (ESP), on the other hand, refers to the apparent ability of humans to acquire information without the mediation of the recognized senses or logical inference. Many researchers believe that PK and ESP phenomena are idiosyncratic (from context I'm guessing they mean something like "identical," not idiosyncratic) (e.g., Pratt, 1949; J. B. Rhine, 1946; Schmeidler, 1982; Stanford, 1978; Thouless & Wiesner, 1946). Nevertheless, the two phenomena have been treated very differently right from the start of their scientific examination. For instance, whereas J. B. Rhine and his colleagues at the Psychology Department at Duke University published the results of their first ESP card experiments right after they had been conducted (Pratt, 1937; Price & Pegram, 1937; J. B. Rhine, 1934, 1936, 1937; L. E. Rhine, 1937), they withheld the results of their first PK experiments for nine years (L. E. Rhine & J. B. Rhine, 1943) even though they had been carried out at the same time as the ESP experiments: Rhine and his colleagues did not want to undermine the scientific credibility that they had gained through their pioneering monograph on ESP (Pratt, J. B. Rhine, Smith, Stuart & Greenwood, 1940).

When L. E. Rhine & J. B. Rhine (1943) went public with their early dice experiments, the evidence was based not only on above-chance results, but primarily on a particular scoring pattern. In those early PK experiments, the participants' task was to obtain combinations of given die faces. The researchers discovered a decline of "success" during longer series of experiments, a pattern suggestive of mental fatigue (Reeves & Rhine, 1943; J. B. Rhine & Humphrey, 1944, 1945). This psychologically plausible pattern of decline seemed to eliminate several counter-hypotheses for the positive results obtained, such as die bias or trickery, because they would not lead to such a systematic decline. However, as experimental evidence grew, the decline pattern lost its impact in the chain of evidence.

Verifying psi

Today, in order to verify the existence of psi phenomena, one of two meta-analytic approaches is generally undertaken: either the "proof-oriented" or the "process-oriented" meta-analytical approach. The proof-oriented meta-analytical approach tries to verify the existence of psi phenomena by establishing an overall effect. The process-oriented meta-analytical approach tries to verify the existence of psi by establishing a connection between results and moderator variables.

Alleged [probably the intended meaning here and elsewhere is "potential" since the legalistic term "alleged" is inappropriate] moderators of PK, such as the distance between the participant and the target, and various psychological variables, have never been investigated as systematically as alleged moderators of ESP. So far, there have not been any meta-analyses of PK moderators, and the three main literature reviews of PK moderators (Gissurarson, 1992, 1997; Gissurarson & Morris, 1991; Schmeidler, 1977) have come up with inconclusive results.

On the other hand, the three meta-analyses on ESP moderators established significant correlations between ESP and extraversion (Honorton, Ferrari & Bem, 1998), ESP and belief in ESP (Lawrence, 1998), and ESP and defensiveness (Watt, 1994). The imbalance between systematic reviews of PK and ESP moderators reflects the general disparity between the experimental investigations of the two categories of psi. From the very beginning of experimental investigation into psi, researchers have focused on ESP.

The imbalance between research in ESP and PK is also evident from the proof-oriented meta-analytical approach. Only three (Radin & Ferrari, 1991; Radin & Nelson, 1989, 2002) of the 13 (Bem & Honorton, 1994; Honorton, 1985; Honorton & Ferrari, 1989; Milton, 1993, 1997; Milton & Wiseman, 1999a, 1999b; Radin & Ferrari, 1991; Radin & Nelson, 1989, 2002; Stanford & Stein, 1994; Steinkamp, Milton & Morris, 1998; Storm & Ertel, 2001) meta-analyses on psi data address research on PK. Only two of these provide no evidence for psi (Milton & Wiseman, 1999a, 1999b). [Radin: The point of discussing/claiming an "imbalance between research in ESP and PK" is not clear. It also may be incorrect in more recent years depending on how one counts RV and Ganzfeld, neither of which are classical ESP, and both of which have a relatively small number of datapoints. In any case, if this issue is to be pursued, ESP should be defined -- the next section shows evidence of confounding very different approaches, e.g., experimental research, social observation, and anecdote.]

Psychology and psi

Psychological approaches to psi have also almost exclusively focused on ESP. For example, there is a large amount of research (I disagree: certainly there is ample rhetoric, but systematic research, no) supporting the hypothesis that alleged ESP experiences are the result of delusions and misinterpretations (e.g., Alcock, 1981; Blackmore, 1992; Persinger, 2001). Personality-oriented research established connections between belief in ESP and several personality variables (Irwin, 1993; see also Dudley, 2000; McGarry & Newberry, 1981; Musch & Ehrenberg, 2002). Experience-oriented approaches to paranormal beliefs, which stress the connection between paranormal belief and paranormal experiences (e.g., Alcock, 1981; Blackmore, 1992; Schouten, 1983), and media-oriented approaches, which examine the connection between paranormal belief and depictions of paranormal events in the media (e.g., Sparks, 1998; Sparks, Hansen & Shah, 1994; Sparks, Nelson & Campbell, 1997), both focus on ESP, although the paranormal belief scale most frequently used in those studies also has some items on PK (Thalbourne, 1995).

The beginning of the experimental approach to Psychokinesis

Reports of séance room sessions during the late 19th century are filled with claims of extraordinary movements of objects (e.g., Crookes, Horsley, Bull, & Meyers, 1885), prompting some outstanding researchers of the time to devote at least part of their career to determining whether the alleged phenomena were real (e.g., Crookes, 1889; James, 1896; Richet, 1923). In these early days, as in psychology, case studies and field investigations predominated. Hence, it is not surprising that in this era experimental approaches and statistical analyses were used only occasionally (e.g., Edgeworth, 1885, 1886; Fisher, 1924; Sanger, 1895; Taylor, 1890). Even J. B. Rhine, the founder of the experimental study of psi phenomena, abandoned case studies and field investigations as a means of obtaining scientific proof only after he had exposed several mediums as frauds (e.g., J. B. Rhine & L. E. Rhine, 1927). However, after a period of several years when he and his colleagues focused almost solely on ESP research, their interest in PK was reawakened in 1937 when a gambler visited the laboratory at Duke University and casually mentioned that many gamblers believed they could mentally influence the outcome of a throw of dice. This inspired J. B. Rhine to perform a series of informal experiments using dice. Very soon, experiments with dice became the standard approach for investigating PK.

Difficulties in devising an appropriate methodology soon became apparent, and improvements in the experimental procedures were quickly implemented. Standardized methods for throwing the dice were developed. Dice-throwing machines were used to prevent participants from manipulating their throw of the dice. Recording errors were minimized by having experimenters either photograph the outcome of each throw or having a second experimenter independently record the results. Commercial, pipped dice were found to have sides of unequal weight, with the sides with the larger number of excavated pips, such as the 6, being lighter and hence more likely to land uppermost than lower numbers, such as the 1. Consequently, studies required participants to attempt to score seven with two dice, or used a balanced design in which the target face alternated from one side of the die (e.g., 6) to the opposite side (e.g., 1).

In 1962, Girden (1962a) published a comprehensive critique of dice experiments in the Psychological Bulletin. Among other things, he criticized the experimenters for pooling data as it suited them, and for changing the experimental design once it appeared that results were not going in a favorable direction. He concluded that the results from the early experiments were largely due to the bias in the dice and that the later, better-controlled studies were progressively tending toward non-significant results. Although Murphy (1962) disagreed with Girden's conclusion, he did concede that no "ideal" experiment had yet been published that met all six quality criteria - namely one with (i) a sufficiently large sample size; (ii) a standardized method of throwing the dice; (iii) a balanced design; (iv) an objective record of the outcome of the throw; (v) the hypothesis stated in advance; and (vi) a prespecified end point.

The controversy about the validity of the dice experiments continued (e.g., Girden, 1962b; Girden & Girden, 1985; Rush, 1977). Over time, experimental and statistical methods improved and, in 1991, Radin & Ferrari undertook a meta-analysis of the dice experiments.

Dice Meta-Analysis

The dice meta-analysis comprised 148 experimental studies and 31 control studies published between 1935 and 1987. In the experimental studies, 2,569 participants tried to mentally influence 2,592,817 die casts. In the control studies, a total of 153,288 die casts were made without any attempt mentally to influence the dice. The experimental studies were coded for various quality measures, including a number of those mentioned by Girden (1962a).

Table 1 provides the main meta-analytic results.¹ (Given the importance of these calculations, it seems odd to relegate all this to a gigantic footnote.)

¹ To compare the meta-analytic findings from the dice and previous RNG meta-analyses with those from our RNG meta-analysis, we converted all effect size measures to the proportion index π (Rosenthal & Rubin, 1989), which we use throughout the paper. This one-sample effect size ranges from 0 to 1, with .50 representing the null value (MCE). For two equally likely outcomes, e.g. when tossing a coin, π represents the proportion of "hits". For example, if heads win at a hit rate of 50%, the effect size π = .50 indicates that heads and tails came down equally often; if the hit rate for heads were 75%, the effect size would be π = .75. The most important property of π is that it converts all cases with more than two equally likely outcomes, e.g. tossing a die, to the proportional hit rate as if there were just two alternatives.
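As a minimal sketch of this conversion (our code, not the paper's; it assumes Rosenthal & Rubin's published formula π = p(k − 1) / (1 + p(k − 2)) for a raw hit rate p on a task with k equally likely alternatives, and the function names are ours):

```python
import math

def pi_from_hit_rate(p, k):
    """Proportion index (Rosenthal & Rubin, 1989): maps a raw hit rate p
    on a task with k equally likely alternatives to an equivalent
    two-alternative hit rate (.50 at chance, 1.0 for perfect scoring)."""
    return p * (k - 1) / (1 + p * (k - 2))

def pi_t_from_z(z, n):
    """Transformation used for the published dice results: r = z/sqrt(n),
    then pi_t = .5*r + .5. Exact only for two equally likely alternatives
    (p = q = .5); see the footnote's caveats about dice z scores."""
    return 0.5 * (z / math.sqrt(n)) + 0.5

print(pi_from_hit_rate(1 / 6, 6))  # chance on a die-face task -> 0.5
print(pi_from_hit_rate(0.75, 2))   # two alternatives: pi equals p -> 0.75
```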

The overall effect size, weighted by the inverse of the variance, is small but highly significant (π̄_t = .50610, z = 19.68). Radin & Ferrari calculated that approximately 18,000 null effect studies would be required to reduce the result to a non-significant level (Rosenthal, 1979). When the studies were weighted for quality, the effect size decreased considerably (z_Δ = 5.27, p = 1.34×10⁻⁷), but was still significantly above chance.
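The 18,000-study figure is a Rosenthal (1979) "file-drawer" estimate. A sketch of the standard calculation under the Stouffer method (function name ours; we have not verified Radin & Ferrari's exact inputs):

```python
def failsafe_n(z_scores, z_crit=1.645):
    """Rosenthal's (1979) fail-safe N: the number of unpublished null
    (z = 0) studies needed to pull the combined Stouffer z,
    sum(z)/sqrt(k + X), below the one-tailed .05 criterion."""
    k = len(z_scores)
    return (sum(z_scores) / z_crit) ** 2 - k
```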

The authors found that there were indeed problems regarding die bias, with the effect size of the target face 6 being significantly larger than the effect size of any other target face. They concluded that this bias was sufficient to cast doubt on the whole database.

¹ (continued) In order to combine effect sizes from independent studies we used a fixed-effect model, weighted by the inverse of the variance (how is the variance per study calculated?) (e.g., Shadish & Haddock, 1994). Because Dean Radin kindly provided us with the basic data files of the dice meta-analysis, we were able to compute the overall effect size π̄. However, although we were able to calculate the overall effect sizes π̄_o for all meta-analyses on the basis of the original data (see Table 2), the dice data provided did not enable us to carry out the specific subgroup analyses presented in the meta-analysis and summarized in Table 1. Consequently, in order to provide this information we transformed the published results, which used the effect size r = z/√n, using π̄_t = .5r̄ + .5. This transformation is accurate as long as the z values of the individual studies are based on two equally likely alternatives (p = q = .5). However, the z scores of most dice experiments are based on six equally likely alternatives (p = 1/6 and q = 5/6). (what about the position studies?) Consequently, π̄_o as computed on the basis of the original data and π̄_t as computed on the basis of the transformation formula diverge slightly because r no longer remains within the limits of ±1. However, the difference between π̄_o and π̄_t is very small (< .05%) as long as the z values are not extreme (z > 10, p < 1×10⁻¹⁰). The difference is smaller the closer the value is to the null value of .50, which is the case for all effect sizes presented here. The difference between the two approaches can be seen when the results of the overall dice meta-analysis presented in Table 1 are compared with the results presented in Table 2. The difference between the two estimates is determined using z_Δ = (π̄_o − π̄_t) / √(SE_o² + SE_t²). Although the difference is statistically significant (z_Δ = 4.12, p = 3.72×10⁻⁵, two-tailed), the order of magnitude is the same.
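A sketch of the inverse-variance, fixed-effect combination the footnote describes. The footnote does not print the per-study variance (the marginal query above asks about this); the binomial variance of a proportion, π(1 − π)/n, is one natural choice and is assumed here purely for illustration:

```python
import math

def fixed_effect(pis, ns):
    """Fixed-effect pooling: weight each study's pi by the inverse of an
    assumed binomial variance pi*(1 - pi)/n, then test the pooled value
    against the null of .50 (MCE)."""
    weights = [n / (p * (1 - p)) for p, n in zip(pis, ns)]  # 1/variance
    pooled = sum(w * p for w, p in zip(weights, pis)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return pooled, se, (pooled - 0.5) / se  # pi-bar, SE, z
```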

They subsequently reduced their database to only those 69 studies that had correctly controlled for die bias (the "balanced database"). As shown in Table 1, the resultant overall effect size remained statistically highly significant. However, the effect sizes of the studies in the balanced database were statistically heterogeneous. When Radin & Ferrari trimmed the sample until the effect sizes in the balanced database became homogeneous, the effect size was reduced to only π̄_t = .50158 and fell yet further to π̄_t = .50147 when the 59 studies were weighted for quality. Only 60 unpublished null effect studies (our calculation) (explain) are required to bring the balanced, homogeneous and quality-weighted studies down to a non-significant level. Ultimately, the dice meta-analysis did not advance the controversy over the putative PK effect beyond the verdict of "not proven", as mooted by Girden (1962b, p. 530) almost 30 years earlier.

Moreover, the meta-analysis has several limitations. Radin & Ferrari neither examined the source(s) of heterogeneity in their meta-analysis, nor addressed whether the strong correlation between effect size and target face disappeared when they trimmed the 79 studies not using a balanced design from the overall sample. The authors did not analyze potential moderator variables and did not specify inclusion and exclusion criteria. The studies included varied considerably regarding the type of feedback given to participants. Some studies were even carried out totally without feedback. The studies also differed substantially regarding the participants who were recruited; some participants were psychic claimants and others made no claims to having any "psychic powers" at all. However, fundamentally as well as psychologically, the studies differ most in respect of the experimental instructions participants received and the time window in which participants had to try to influence the dice. Although most experiments were real-time, with the participant's task being mentally to influence the dice as they were thrown, some experiments were "precognition experiments" in which participants were asked to predict which die face would land uppermost in a future die cast thrown by someone other than the participant.

From Dice to Random Number Generator

With the arrival of computers, dice experiments were slowly replaced by a new approach. Beloff & Evans (1961) were the first experimenters to use radioactive decay as a source of randomness to be influenced in a PK study. In the initial experiments, participants would try mentally to slow down or speed up the rate of decay of a radioactive source. The mean disintegration rate subjected to influence was then compared with that of a control condition in which there was no attempt at human influence.

Soon after this, experiments were devised in which the output from the radioactive source was transformed into bits (1s or 0s) that could be stored on a computer. These devices were known as random number generators (RNGs). Later, RNGs used electronic noise or other truly random origins as the source of randomness.

This line of PK research was, and continues to be, pursued by many experimenters, but predominantly by Schmidt (e.g., 1969), and later by the Princeton Engineering Anomalies Research (PEAR) group at Princeton University (e.g., Jahn, Dunne & Nelson, 1980).

RNG Experiments

In a typical PK RNG experiment, a participant presses a button to start the accumulation of experimental RNG data. The participant's task is mentally to influence the RNG to produce, say, more 1s than 0s for a predefined number of bits. Participants are generally given real-time feedback of their ongoing performance. The feedback can take a variety of forms. For example, it may consist of the lighting of lamps "moving" in a clockwise or counter-clockwise direction, or of clicks provided to the right or left ear, depending on whether the RNG produces a 1 or a 0. Today, feedback is generally software-implemented and is primarily visual. If the RNG is based on a truly random source, it should generate 1s and 0s an equal number of times. However, because small drifts cannot be totally eliminated, experimental precautions such as the use of an XOR filter, or a balanced experimental design, are still required.
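A toy illustration of the XOR precaution (the principle only; actual devices differ): XORing the raw stream with an alternating 0/1 template inverts every second bit, so a constant drift toward 1s or 0s cancels on average while genuine randomness is preserved.

```python
def xor_filter(raw_bits):
    """XOR each raw bit against an alternating 0,1,0,1,... template.
    A source stuck near 1 then yields ~50% 1s after filtering."""
    return [b ^ (i % 2) for i, b in enumerate(raw_bits)]

print(xor_filter([1] * 8))  # [1, 0, 1, 0, 1, 0, 1, 0] -> mean back to 0.5
```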

The RNG studies have many advantages over the earlier dice experiments, making it much easier to perform quality research with much less effort. Computerization alone meant that many of Girden's (1962a) and Murphy's (1962) concerns about methodological quality could be overcome. If we return to Murphy's list of six methodological criteria, then (i) unlike with manual throws of dice, RNGs made it possible to conduct studies with large sample sizes (reflects a fundamental assumption for which there is no sound evidence: the bit is the sample) in a short space of time; (ii) the RNG was completely impersonal - unlike the dice, it was not open to any classical (normal human) biasing of its output; (iii) balanced designs were still necessary due to potential drifts in the RNG; (iv) the output of the RNG could be stored automatically by computer, thus eliminating recording errors that may have been present in the dice studies; (v) like the dice studies, the hypotheses still had to be formulated in advance; and (vi) like the dice studies, optional stopping could still be a potential problem. Thus, RNG research meant that, in practical terms, researchers no longer had to be concerned about alleged weak points (i), (ii) and (iv).

New Limits

From a methodological point of view, RNG experiments have many advantages over the older dice studies. However, in respect of ecological validity, the RNG studies have some failings. Originally, the PK effect to be assessed was macroscopic and visual. Experimentalists then reduced séance room PK, first to PK on dice, and then to PK on a random source in an RNG. [Radin: I don't think this historical sequence is correct. I doubt that dice and RNG tests were created to simulate macro effects.] But PK may not be reducible to a microscopic level (e.g., Braude, 1997). Moreover, a dice experiment is psychologically very different from an RNG study. Most people have played with dice, but few have had prior experience with RNGs. Additionally, an RNG is a technical gadget from which the output must be computed before feedback can be presented. Nevertheless, the ease with which PK data can be accumulated using an RNG has led to PK RNG experiments forming a substantial proportion of available data. Three related meta-analyses of these data have already been published.

Previous RNG Meta-Analyses

The first RNG meta-analysis was published by Radin & Nelson (1989) in Foundations of Physics. This meta-analysis of 597 experimental studies published between 1959 and 1987 found a small but significant effect of π̄_o = .50018 (SE = .00003, z = 6.53, p < 1×10⁻¹⁰).² The size of the effect did not diminish when the studies were weighted for quality or when they were trimmed by 101 studies to render the database homogeneous.

The limitations of this meta-analysis are very similar to the limitations of the dice meta-analysis. The authors did not examine the source(s) of heterogeneity and did not specify inclusion and exclusion criteria. (From our FoP paper: "Experiments selected for review examined the following hypothesis: The statistical output of an electronic RNG is correlated with observer intention in accordance with pre-specified instructions, as indicated by the directional shift of distribution parameters (usually the mean) from expected values." That seems pretty well specified to me!) Consequently, participants varied from humans to cockroaches, and the feedback ranged from no feedback at all to the administration of an electric shock. The meta-analysis included not only studies using true RNGs, which are RNGs based on true random sources such as electronic noise or radioactive decay, but also studies using pseudo-RNGs (e.g., Radin, 1982), which are based on deterministic algorithms (with truly random starting points). It might be argued that the authors simply took a very inclusive approach. However, the authors did not discuss the extreme variance in the distribution of the studies' z scores (Again, from the FoP article: "Finally, following the practice of reviewers in the physical sciences (23,24), we deleted potential 'outlier' studies to obtain a homogeneous distribution of effect sizes and to reduce the possibility that the calculated mean effect size may have been spuriously enlarged by extreme values." Certainly this is a rationale for what we did, although perhaps it is not a discussion.) and did not assess any potential moderator variables, which were also two weaknesses in the dice meta-analysis. Nevertheless, this first RNG meta-analysis served to justify further experimentation and analyses with the PK RNG approach.

² The meta-analysis provided the overall effect size only in a figure (Fig. 3, p. 1506). Because its first author kindly provided us with the original data, we were able to calculate the overall effect size and the proper statistics.

Almost 10 years later, in his book aimed at a popular audience, Radin (1997) recalculated the effect size of the first RNG meta-analysis, claiming that the "overall experimental effect, calculated per study, was about 51 percent" (p. 141). However, this newly calculated effect size is two orders of magnitude larger than the effect size of the first RNG meta-analysis (50.018%). (Doesn't this reflect the different aggregation unit, bit vs experiment?) The increase has two sources. First, Radin removed the 258 PEAR studies included in the first meta-analysis (without discussing why, because the whole purpose of that piece of the chapter was to address whether PEAR had been replicated, so it didn't make any sense to include it in the mix!) and second, he presented simple mean values instead of the weighted means presented 10 years earlier. The use of simple mean values in meta-analyses is generally discredited (e.g., Shadish & Haddock, 1994), because it does not take into account that larger studies provide more accurate estimates of effect size. (ONLY assuming es is independent of N) In this case, the difference between computing an overall effect size using mean values rather than weighted mean values is dramatic. The removal of the PEAR studies effectively increased the impact of other small studies that had very large effect sizes. The effect of small studies on the overall outcome will be a very important topic in the current meta-analysis.

Recently, Radin & Nelson (2003) published an update of their earlier (1989) RNG meta-analysis, adding a further 176 studies to their database. In this update, the PEAR data were collapsed into a new, single datapoint. The authors reported a simple mean effect size of 50.7%. Presented as such, the data appear to suggest that this updated effect size replicates that found in the first RNG meta-analysis. However, when the weighted fixed-effect model is applied to the data, as was used in the first RNG meta-analysis, the effect size of the updated database becomes π̄_o = .50005, which is significantly smaller than the effect size of the original RNG meta-analysis (z_Δ = 4.27, p = 1.99×10⁻⁵; see Table 2 for comparison). One reason for the difference is the increase in sample size of the more recent experiments, which also have a concomitant decline in effect size. (es vs. N issue again)

Like the other meta-analyses, the updated 2002 meta-analysis did not investigate any potential moderator variables, and no inclusion and exclusion criteria were specified (I don't understand this latter business about not specifying the inclusion criteria); it also did not include a heterogeneity test of the database. All three meta-analyses were conducted by related research teams, and thus an independent replication of their findings is lacking. The need for a more thoroughgoing meta-analysis of PK RNG experiments is clear. (fair enough)

Human Intention Interacting with Random Number Generators:

A New Meta-Analysis

The meta-analysis presented here was part of a five-year consortium project on RNG experiments. The consortium comprised research groups from PEAR, USA; the University of Giessen, Germany; and the Institut für Grenzgebiete der Psychologie und Psychohygiene [Institute for Border Areas of Psychology and Mental Hygiene] in Freiburg, Germany. After all three groups in the consortium failed in their appropriately powered (what was beta?) experiments (assuming an effect per bit model!!!!!) to replicate the mean shift of the PEAR group data (Jahn et al., 2000), which form one of the strongest and most influential datasets in psi research, the question about possible moderating variables in RNG experiments rose to the forefront (historically, and ironically, doing a M-A of the REG literature focusing on the moderating variable question was a task I assigned to my Freiburg team -- Boesch and Boller -- in 1996 and 1997). Consequently, a meta-analysis was conducted to determine whether the existence of an anomalous interaction could be established between direct human intention and the concurrent output of a true RNG, and, if so, whether there were moderators or other explanations that influenced the apparent connection.

Method

Literature Search

The meta-analysis began with a search for any experimental studies that examined the possibility of an anomalous connection between the output of an RNG and the presence of a living being. This search was designed to be as comprehensive as possible in the first instance, and to be trimmed later in accordance with our prespecified inclusion and exclusion criteria. Both published and unpublished manuscripts were sought.

Manual searches were undertaken at the library and archives of the Institut für Grenzgebiete der Psychologie und Psychohygiene in Freiburg, Germany. They included searches of the following journals: Proceedings of the Parapsychological Association Annual Convention (1968, 1977-1981, 1983-1999), Research in Parapsychology (1969-1976, 1982, 1984, 1985, 1988), Journal of Parapsychology (1959-1998), Journal of the Society for Psychical Research (1959-1999), European Journal of Parapsychology (1975-1998), The Journal of the American Society for Psychical Research (1959-1998), Journal of Scientific Exploration (1987-1998), Subtle Energies (1991-1997), Journal of Indian Psychology (1978-1999), Tijdschrift voor Parapsychologie (1959-1999), International Journal of Parapsychology (1959-1968), Cuadernos de Parapsicologia (1963-1996), Revue Métapsychique (1960-1983), Australian Parapsychological Review (1983-1991), Research Letter of the Parapsychological Division of the Psychological Laboratory of Utrecht (1971-1984), Bulletin PSILOG (1981-1983), Journal of the Southern California Society for Psychical Research (1979-1985), and the Arbeitsberichte Parapsychologie der Technischen Universität Berlin (1971-1980). (Why so many missing years in these journals?)

Electronic searches were conducted on the Psiline Database System (Vers. 1999), a continuously updated specialized electronic resource of parapsychologically relevant writings (White, 1991). The key words used to identify relevant articles in this specialized database were Random Number Generator, RNG, Random Event Generator and REG. Electronic searches were also conducted on six CDs of Dissertation Abstracts Ondisc (Jan. 1961 - Sep. 1999) using four different search strategies. First, the key words random number generator, RNG, random event generator, REG, randomness, radioactive, parapsychology, perturbation, psychokinesis, PK, extrasensory perception, telepathy, precognition and calibration were used. Second, the key words random and experiment were combined with event, number, noise, anomalous, anomaly, influence, generator, apparatus or binary. Third, the key word machine was combined with man or mind. Fourth, the key word zener was combined with diode.

To obtain as many relevant unpublished manuscripts as possible, visits were made to three other prolific parapsychology research institutes: the Rhine Research Center, Durham, NC; PEAR at Princeton University; and the Koestler Parapsychology Unit at Edinburgh University. Furthermore, a request for unpublished studies was placed on an electronic mailing list for professional parapsychologists (Parapsychology Research Forum - PRF). [Radin: Is this really a professional forum? I haven't read it for many years, and it seems to be open to anyone? -- It is, or was, indeed open to anyone.] Finally, the reference sections of all retrieved journal articles, conference proceedings, reports and theses/dissertations were searched. The search covered a broad range of languages, including items in Dutch, English, French, German, Italian and Spanish, and was otherwise limited only by the lack of further available linguistic expertise.

Inclusion and Exclusion Criteria

The final database included only studies that examined the correlation between direct human intention and the concurrent output of true RNGs. Thus, after the comprehensive literature search was conducted, we excluded studies that: (a) involved, implicitly or explicitly, only an indirect intention toward the RNG. For example, telepathy studies, in which a receiver attempts to gain impressions about the sender's viewing of a target that had been randomly selected by a true RNG, were excluded (e.g., Tart, 1976). Here, the receiver's intention is presumably directed to gaining knowledge about what the sender is viewing, rather than to influencing the RNG; (b) used animals or plants as participants (e.g., Schmidt, 1970); (c) assessed the possibility of a non-intentional, or only ambiguously intentional, effect. For instance, studies evaluating whether hidden RNGs could be influenced when the participant's intention was directed to another task or another RNG (e.g., Varvoglis & McCarthy, 1986) or studies that used babies as participants (e.g., Bierman, 1985); (d) looked for an effect backwards in time or, similarly, in which participants observed the same bits a number of times (e.g., Morris, 1982; Schmidt, 1985); (e) evaluated whether there was an effect of human intention on a pseudo-RNG (e.g., Radin, 1982). (This list seems to exclude all Field REG studies.)


Additionally, studies were excluded if their outcome could not be transformed into the effect size π̄_o that was prespecified for this meta-analysis (if the tool we have is a hammer, we will only look at nails). As a result, studies that compared the rate of radioactive decay in the presence of attempted human influence with that of the same element in the absence of human intention (e.g., Beloff & Evans, 1961) were excluded. The cut-off date for inclusion of studies in the meta-analysis was prespecified as 30th August 2000. [Radin: Then why do none of the searched literature sources go up to this date?]

Defining Studies

Some studies were reported in both published and unpublished forms, or both as a full journal article and elsewhere as an abstract. In these cases, all reports of the same study were used to obtain information for the coding, but the report with the most details was classified as the "main report". The main reports often contained more than one "study". A study was the smallest experimental unit described that did not overlap with other data in the report. This enabled the maximum amount of information to be included. In cases where the same data could be split up in two different ways (e.g., men vs. women, or morning sessions vs. afternoon sessions), the split used was the one that appeared to reflect the author's greatest interest in designing the study.

Many studies performed unattended randomness checks of the RNG to ensure that the apparatus was functioning properly. These control runs were coded in a separate "control" database. Data for these control runs, like the experimental database, were split based on the smallest unit described. In some experiments, data were gathered in the presence of a participant with an instruction to the participant "not to influence" the RNG (e.g., Jahn et al., 2000). These data were excluded from both the experimental and control databases due to the inherent ambiguity of whether intention played an influential role. (So something called a control condition is ambiguous and can't be included in control? In fact, the PEAR lab did a huge amount of "unattended randomness checks" that are not included in this M-A. Now we see that even the baseline condition, which was explicitly designed to be compared against the bi-polar high-vs-low intention conditions, is not included in controls?)

Moderator Variables

To identify potential moderators, all variables suggested by previous literature reviews were coded [blindly? No -- this coding was certainly not blind] (Gissurarson, 1992, 1997; Gissurarson & Morris, 1991; Schmeidler, 1977). Additionally, several descriptive variables, and variables explicitly or implicitly held responsible for the presence or absence of an anomalous correlation, were placed on the internet and discussed by researchers interested in the topic. After considering the feedback and making any requisite revisions to the coding form, 20 papers were randomly selected and independently pilot-coded by FS and EB. Afterwards, the two sets of coding were compared, coder disagreements were discussed, ambiguities in the coding descriptions were clarified, and the coding form was finalized.

The variables coded covered six main areas: (i) basic information, such as year of publication and study status (i.e., formal, pilot, mixed, control); (ii) participant information, such as selection criteria (e.g., none, psychic claimant, prior success in psi experiment, …); (iii) experimenter information, such as whether the experimenter also acted as a participant (e.g., no, yes, partially); (iv) experimental setting, such as type of feedback (visual, auditory, …); (v) statistical information, such as total number of bits (sample size) [what about bits/effort, bits/subject, etc.?]; and (vi) safeguard variables.

The final coding form contained 67 variables. This comprehensive coding was applied because, prior to coding the studies, it was not clear which variables would provide enough data for a sensible moderator variable analysis. However, because of the importance of the safeguard variables, i.e., the moderators of quality, we prespecified that the impact of the three safeguard variables would be examined independently of their frequency distribution. The safeguards were: (1) RNG control, which recorded whether malfunction of the RNG had been ruled out by the study, either by using a balanced design or by performing control runs of the RNG (but note that real control data were excluded in some cases); (2) all data reported, which addressed whether the final study size matched the planned size of the study or whether optional stopping or selective reporting may have occurred (note that PEAR REG studies do not report a planned size, but that optional stopping CANNOT have an effect on the outcome); (3) split of data (define), which noted whether the split of data reported was explicitly planned or was potentially post hoc. All safeguards were ranked on a three-point scale (yes [2], earlier³/unclear [1], no [0]), with the intermediate value being coded either when it was unclear whether the study actually took the safeguard into account or where it was only partially taken into account. Because summary scores of safeguard variables are problematic if considered exclusively (e.g., Jüni, Witschi, Bloch, & Egger, 1999), we examined the influence of the safeguard variables separately and in conjunction.

The main coding was undertaken by FS [blindly?]. For any potentially controversial or difficult decisions, FS consulted with EB. If FS and EB could not agree, the final decision fell to HB, who was blind as to who held which opinion. Over time, HB's decisions generally supported FS and EB equally, thus suggesting that HB served well as mediator.

Analyses

All analyses were performed using SPSS (Vers. 11.5) software. The effect sizes of individual studies were combined into composite mean weighted effect size measures as described in Footnote 1.

³ When authors referred to previous studies in which the RNG was tested, studies were coded as controlled "earlier".

To determine whether the π̄ from each subsample (class) significantly differed from MCE, the standard error based on the within-study variance was calculated (Shadish & Haddock, 1994). (but in this database and M-A, the between-study variance within subsamples is at issue. The chosen SE is arguably not appropriate. It could only be correct if the subsamples are indeed homogeneous -- did the authors really identify all the moderators? One that is apparently not identified is subject differences, what PEAR called "operators" and showed to be a powerful moderator.) The resulting z score indicates whether π̄_o differs from MCE. To determine whether each subsample of πs shared a common effect size (i.e., was consistent across studies), a homogeneity statistic Q was calculated, which has an approximately χ² distribution with k − 1 degrees of freedom, where k is the number of effect sizes (Shadish & Haddock, 1994). [Radin: Given the importance of this, the method for calculating Q should be better specified. I'm guessing it is the sum of z scores.] The difference between two effect size estimates was determined using z_Δ = (π̄₁ − π̄₂) / √(SE₁² + SE₂²).
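The marginal query about Q can be answered with the textbook fixed-effect definition (Shadish & Haddock, 1994), sketched here together with the z_Δ test quoted in the text; the paper itself does not print the Q formula, so this is our reading, not a quotation:

```python
import math

def q_statistic(pis, ses):
    """Homogeneity Q: inverse-variance-weighted squared deviations from
    the pooled mean; referred to chi-square with k - 1 df."""
    w = [1 / se ** 2 for se in ses]
    pooled = sum(wi * p for wi, p in zip(w, pis)) / sum(w)
    return sum(wi * (p - pooled) ** 2 for wi, p in zip(w, pis))

def z_delta(pi1, se1, pi2, se2):
    """Difference between two effect size estimates, as given in the text."""
    return (pi1 - pi2) / math.sqrt(se1 ** 2 + se2 ** 2)
```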

As an initial, straightforward sensitivity approach to estimating the overall effect size, we trimmed the overall experimental sample until it became homogeneous. This was done by applying an algorithm to the data which successively excluded the study that contributed most to the heterogeneity of the sample. The procedure stopped when the χ² heterogeneity statistic became non-significant (Hedges & Olkin, 1985). A comparison between the resulting homogeneous sample and the studies that had to be trimmed from the overall database (the "trimmed studies" sample) allows one to assess the reliability of the estimated effect size and to estimate the impact of aberrant values on the overall result.

In an attempt to explore the putative impact of moderator and safeguard variables on the effect size, and to determine the source(s) of heterogeneity, a meta-regression analysis was carried out. Meta-regression is a multivariate regression analysis with independent studies as the unit of observation (e.g., Thompson & Higgins, 2002; Thompson & Sharp, 1999). This analysis determines how the variables in the model account for the heterogeneity of effect size. We applied a weighted stepwise multiple regression analysis with the moderators as predictors and effect size as the dependent variable.
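In outline, such a weighted meta-regression can be run as weighted least squares with inverse-variance weights; the snippet below uses hypothetical numbers and statsmodels (the paper used SPSS, and the stepwise selection step is omitted):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical inputs: per-study effect size, its SE, and two moderators.
pi = np.array([0.519, 0.505, 0.5003, 0.50003])
se = np.array([0.0025, 0.0004, 0.0002, 0.00001])
moderators = np.column_stack([
    np.log10([1e3, 1e4, 1e5, 1e7]),  # log10 sample size in bits
    [1971, 1975, 1985, 1996],        # year of publication
])
X = sm.add_constant(moderators)
fit = sm.WLS(pi, X, weights=1 / se ** 2).fit()  # inverse-variance weights
print(fit.params)
```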

In the absence of homogeneity, and as a general sensitivity measure, we additionally calculated a random-effects model (Shadish & Haddock, 1994), which takes into account the variance between studies (i.e., heterogeneity on the basis of the Q homogeneity statistic). [Radin: This should be better described. I'm not entirely clear on the difference between fixed and random effect models.] Because the standard error is generally larger under a random-effects model, the test statistic is consequently more conservative than the test statistic of the fixed-effect model. The z score (rnd) indicates whether π̄_o differs from MCE using a random-effects approach. However, even in the absence of homogeneity, the fixed-effect model is particularly appropriate in the context of the studies collected here because, although the impact of some alleged moderator variables will be examined, no moderator has been established yet and the overall effect remains a matter of contention.
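For readers who share the marginal uncertainty about the fixed/random distinction: a random-effects model adds an estimated between-study variance τ² to each study's own variance, inflating the pooled SE. A DerSimonian-Laird-style sketch (our choice of estimator; the paper only cites Shadish & Haddock, 1994):

```python
def random_effects(pis, ses):
    """DerSimonian-Laird pooling: estimate tau^2 from Q, add it to each
    study's variance, and re-pool; SE grows, so z(rnd) is conservative."""
    w = [1 / se ** 2 for se in ses]
    pooled = sum(wi * p for wi, p in zip(w, pis)) / sum(w)
    q = sum(wi * (p - pooled) ** 2 for wi, p in zip(w, pis))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(pis) - 1)) / c)   # between-study variance
    w_re = [1 / (se ** 2 + tau2) for se in ses]
    pooled_re = sum(wi * p for wi, p in zip(w_re, pis)) / sum(w_re)
    return pooled_re, (1 / sum(w_re)) ** 0.5    # pi-bar(rnd), SE(rnd)
```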

Results

Study Characteristics

The literature search retrieved 155 main reports containing 712 experimental studies and 158 control studies. After applying the inclusion and exclusion criteria, the meta-analysis included 114 reports containing 357 experimental studies and 142 control studies (see Appendix).

The basic study characteristics are summarized in Table 3. The heyday of RNG experimentation was in the 1970s, when more than half of the studies were published. A quarter of the studies were published in conference proceedings and reports, but most of the studies were published in journals. The number of participants in the studies varied considerably. Approximately one quarter of the studies were conducted with a sole participant and another quarter with up to 10 participants. There were only a few studies with more than 100 participants. The sample size of the average study is 6,095,359 bits. However, most studies were much smaller, as indicated by a median sample size of 6,400 bits (see Table 4). The few very large studies considerably increased the average sample size and resulted in an extremely right-skewed distribution of sample size. This variable was therefore log10-transformed. Consequently, a significant linear correlation or regression coefficient of sample size with another variable would indicate an underlying exponential relationship.

Overall Analyses

When combined, the 357 experimental studies yielded a small, but statistically significant effect (π̄_o = .500029, SE = .000011, z = 2.73, p = .003, one-tailed). The 142 control studies yielded a non-significant effect (π̄_o = .500026, SE = .000015, z = 1.76, p = .08) that was nevertheless comparable in size to the effect demonstrated in the experimental studies (z_Δ = 0.15, p = .87). However, because RNG experiments do not follow a classical control group design, this comparison is merely descriptive. The control studies had a much larger median sample size than the experimental studies (50,000 bits vs. 6,400 bits, respectively).

The two samples differed considerably in respect of their effect size distribution. Whereas the control data were distributed homogeneously (χ²(141) = 138.16, p = .55), the effect size distribution of the experimental studies was extremely heterogeneous (χ²(356) = 1442.89, p = 1.45×10⁻¹³⁰). We therefore conducted several sensitivity analyses on the intentional data in an attempt to determine the source(s) of the heterogeneity.


Trimmed Sample

As can be seen in Table 4, 70 studies had to be excluded before the χ² heterogeneity statistic became non-significant, a value (20%) that, although at the higher end of the span (what span?), is not uncommon in meta-analyses on psychological topics (Hedges, 1987). The homogeneous sample of 287 studies had very similar characteristics to the original overall sample. Both samples had comparable mean and median sample sizes (numbers of bits per study), and comparable mean and median numbers of participants (see Table 4). Their effect sizes were also similar (z_Δ = .80, p = .42), although the effect in the smaller, homogenized sample did not reach significance.

The heterogeneous sample that had been removed from the experimental database differed considerably from both the overall and the homogenized samples. The removed studies had generally been published earlier and had been carried out with fewer participants than those in the other two samples; they also contained fewer studies with large sample sizes than the homogeneous subsample (see Table 4). (note the convergence with the observation that es is anticorrelated with N) The effect size, too, of the studies that had been removed was almost one order of magnitude larger than that of the overall sample (z_Δ = 3.33, p = 8.68×10⁻⁴) and of the homogenized sample (z_Δ = 2.99, p = 2.77×10⁻³).

Safeguard Variable Analyses

The majority of studies had adequately implemented the specified safeguards (see Table 5). Almost 40% of the studies (n = 138) were given the highest rating for each of the three safeguards.

Generally, inadequate safeguards did not appear to be a reason for the significant effect found in the experimental studies. Studies implementing the RNG control or all data reported safeguards did not have significantly lower effect sizes than studies that did not implement these safeguards (z_Δ = .14, p = .89; z_Δ = 1.13, p = 0.19, respectively). However, studies with a post hoc split of data had a significantly larger effect size than studies that had preplanned their split of data (z_Δ = 3.30, p = 9.55×10⁻⁴).

The safeguard sum-score revealed that the mean effect size of the 138 studies with the maximum safeguard rating was tenfold larger than that in the overall experimental database (z_Δ = 2.59, p = 9.50×10⁻³). These high-quality studies had a smaller mean sample size and were conducted slightly more recently than the average study in the overall database. (later studies are smaller?) However, it is not possible to draw any conclusions about the impact of study quality on effect size because of the uneven distribution of studies across the summary scale. Nevertheless, the trend that study quality is connected with study size and publication date is suggestive. As can be seen from Table 5, smaller and older studies seem to be of lower quality. (and have smaller effect size?)

In summary, the heterogeneity in the database is not primarily due to the contribution of misleadingly significant results from badly designed studies.

Moderator Variable Analyses

Besides sample size and year of publication, which are generally highly underrated potential moderators, only very few of the variables coded provided enough entries for us to be able to carry out sensible moderator variable analyses. For instance, we were interested in whether participants filled in psychological questionnaires. Although this was the case in 96 studies, only 12 used an established measure. Therefore, besides sample size and year of publication, we focused on five primary variables for RNG experiments which provided enough data for a sensible moderator variable analysis.

The summary given in Table 4 compares the mean effect sizes associated with the five potential moderators with those from the overall and the trimmed samples. It is quite obvious that sample size is the most important moderator of effect size. (a repeated refrain) The studies in the quartile comprising the smallest studies (Q1) have an effect size that is three orders of magnitude bigger than the effect size in the quartile with the largest studies (Q4). The difference is highly significant (z_Δ = 8.84, p < 1×10⁻¹⁰). The trend is continuous: the smaller the sample size, the bigger the effect size; a connection which Sterne, Gavaghan, & Egger (2000) called the "small-study effect". The funnel plot (Figure 1) illustrates the effect. Whereas the bigger studies distribute symmetrically around the overall effect size, the distribution of studies below 10,000 bits is increasingly asymmetrical. In respect of the mean year of publication, the largest studies (Q4) stand out from the other three, smaller-study quartiles. The largest studies are, on average, published 4-5 years later than the smaller studies. Most of the big studies, with very small effect sizes, have been published only recently (e.g., Jahn, Dunne, Dobyns, Nelson, & Bradish, 1999; Jahn et al., 2000; Nelson, 1994).
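The small-study asymmetry the funnel plot shows can be reproduced with a toy selection model (entirely simulated data, not the study set): generate null studies across a range of sizes, publish every "significant" one but only some of the rest, and the surviving small studies skew toward inflated effect sizes.

```python
import numpy as np

rng = np.random.default_rng(1)
n = (10 ** rng.uniform(2, 7, 5000)).astype(int)   # bits per study
pi = 0.5 + rng.normal(0.0, 0.5 / np.sqrt(n))      # pure-chance results
z = (pi - 0.5) * np.sqrt(n) / 0.5
published = (z > 1.645) | (rng.random(z.size) < 0.3)  # selective publication

small = published & (n < 10_000)
large = published & (n >= 10_000)
print(pi[small].mean(), pi[large].mean())  # small studies inflated above .50
```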

The year of publication underpins the importance of sample size for the outcome of the studies (see Table 4). The oldest studies (Q1), which have the smallest mean sample size, have an effect size that is two orders of magnitude bigger than the effect size of the newest studies (z_Δ = 13.53, p < 1×10⁻¹⁰), which have the largest mean sample size. (This seems to differ from statements on the preceding page) However, the impact of sample size is not evident for the two middle quartiles. (Note that the largest studies, PEAR, use only unselected subjects, who evidently have a smaller average es. Is this accounted for in the present modeling and M-A?) The effect size of the old studies (Q2) is significantly larger (z_Δ = 2.26, p = 0.02) than the effect size of the new studies (Q3), although the sample size of the old studies is larger (old study N is larger?). Thus, time might play a role on its own. The experiments might have changed in ways other than the increase in sample size. (Isn't that what a moderator variable study is supposed to assess?)


Although causal connections are difficult to establish in meta-analyses, we examined the interaction of sample size and year of publication and their impact on effect size in order to understand how the two moderators are linked to one another. The oldest studies (Q1) are particularly interesting because their effect size is considerably bigger than the effect size of the subsequent studies (Q2-Q4). The z value of the oldest studies is also the highest of the four quartiles and thus indicates that they were the most successful (see Table 4). A median split of the oldest studies according to sample size revealed that the effect sizes of the two halves differ significantly from each other (z_Δ = 5.62, p = 5.62×10⁻⁸). The half with the smaller sample size (n = 45, M = 982, Mdn = 500) has an effect size of π̄_o = .519137 (SE = .002484, z = 7.71, p < 1×10⁻¹⁰), whereas the half with the bigger sample size (n = 45, M = 36,123, Mdn = 9,600) has an effect size of π̄_o = .505000 (SE = .000399, z = 12.53, p < 1×10⁻¹⁰). The mean year of publication in both subsamples is the same (M = 1971) and does not differ from the mean year of publication of the whole quartile. The analysis suggests that sample size is the deciding moderator and not year of publication, a finding that the pure magnitude of the effect size in the quartiles of sample size and year of publication also suggests (see Table 4). (It would be good to report the same analysis on the newer studies)

The number of participants in RNG experiments is clearly linked to effect size (see Table 4). Studies with a sole participant (Q1) have an effect size that is at least one order of magnitude bigger than studies with more than one participant. However, this finding, too, is confounded by sample size (see Table 4). Applying the same method to the quartile of studies with one participant (Q1) as used before, we found that the studies with smaller sample sizes (n = 47, M = 1,817, Mdn = 960) have a bigger effect size (π̄_o = .510062, SE = .001784, z = 5.64, p = 1.70×10⁻⁸) than the studies with larger sample sizes (n = 44, M = 239,197, Mdn = 24,124, π̄_o = .500368, SE = .00017, z = 2.19, p = 0.03). The difference is highly significant (z_Δ = 5.41, p = 6.32×10⁻⁸). This analysis puts the apparent superiority of studies with a sole participant into question and identifies sample size as an important confounder. However, it can be argued that small studies in general, and small one-participant studies in particular, are fundamentally different from larger studies - an argument that cannot easily be dismissed. It is in fact one of the potential sources accounting for the small-study effect, which will be discussed later. (But the preceding analysis apparently discounts this with an explicit and relevant comparison.)

The current meta-analysis also seems to support the claim that selected participants perform better than non-selected participants, a claim already confirmed by an earlier meta-analysis (Honorton & Ferrari, 1989). As can be seen in Table 4, the effect size of studies with selected participants is one order of magnitude bigger than the effect size of studies that did not select their participants on, for example, grounds of prior success in a psi experiment or of being a psychic claimant. Studies with selected participants are predominantly carried out with only one or very few participants, as indicated by the mean and median number of participants in Table 4. Studies with unselected populations are regularly carried out with more participants than studies with selected participants. However, this finding is confounded by sample size (and by number of participants). Studies with unselected populations also have a larger sample size than studies with selected participants. The systematic selection of participants might play an important role in RNG experiments, and it is certainly not implausible that longer experiments (is this an implicit equation of length of experiment with N of samples? It is not so simple.) are tiring for participants and therefore might produce different results. The argument is similar to that regarding the number of participants in RNG experiments, where experiments with fewer participants may be shorter and/or make participants feel more involved.

Study status is an important moderator in meta-analyses that include both formal and pilot experiments. Pilot experiments are likely to comprise a selective sample insofar as they tend to be published if they yield significant results (and hence larger-than-usual effect sizes) and not published if they yield unpromising directions for further study. However, pilot and formal studies in this sample did not differ in respect of effect size (z_diff = 0.68, p = .50). Although the effect size of the pilot studies was bigger, it is not significantly different from the null value because its SE is more than four times larger (see Table 4). Pilot experiments are, as one would expect, smaller than formal experiments.

The type of feedback given to the participant in RNG studies has been regarded as an important issue in psi research from its very inception. The majority of RNG experiments provide participants with visual feedback, and some with auditory feedback. Besides the two main categories, the coding resulted in a large "other" category of 101 studies, which used, for example, alternating visual and auditory feedback, or no feedback at all. The result is clear-cut: studies providing exclusively auditory feedback outperform not only the studies using visual feedback (z_diff = 6.12, p = 9.24×10⁻¹⁰) but also the studies in the "other" category (z_diff = 5.93, p = 2.01×10⁻⁹). However, this finding is based on a very small and very heterogeneous sample of large studies (see Table 4), although the studies using visual feedback are, on average, even larger. Nevertheless, the auditory feedback studies were surprisingly comparable to the large sample size studies (Q3) in terms of their mean numbers of participants, year of study, and other variables (see Table 4).

The core (this isn't quite the right word) of all RNG studies is the random source. Although the participants' intention is generally directed (by the instructions given to them) at the feedback and not at the technical details of the RNG, it is the sequence of random numbers that is compared with the theoretical expectation (e.g., the binomial distribution) and that is, therefore, allegedly influenced. RNGs are based on truly random radioactive decay, Zener diode noise, or occasionally thermal noise. As shown in Table 4, the effect size of studies with RNGs based on radioactive decay is two orders of magnitude larger than the effect size of studies using noise (z_diff = 4.28, p = 1.87×10⁻⁵). However, this variable, too, is confounded by sample size. Studies using radioactive decay are much smaller than studies using noise (see Table 4). Chronologically, studies with RNGs based on radioactive decay predominated in the very early years of RNG experimentation, as indicated by their mean year of publication, which is just two years above the mean year of publication of the oldest studies in our sample (see Table 4).

Meta-Regression Analysis

The meta-regression analysis included the seven moderator variables (see Table 4) and the three safeguard variables (see Table 5) discussed above, using effect size as the dependent variable and the moderators and safeguards as predictors. The three variables sample size, year of publication, and number of participants, which were split into quartiles for the previous moderator variable analyses (see Table 4), entered the regression analysis with their nominal values. All other variables were dummy coded. The analysis was weighted by the inverse of the within-study variances. (This again assumes that all important moderators have been identified, and that probably is not the case, as noted before.) Of the 10 predictors, only two, year of publication and auditory feedback, entered the model (see Table 6).⁴ However, the model accounts for only 5% of the variance. This suggests that none of the variables we put into the regression analysis, nor any combination of them, accounts for the great variability in effect size we found in the data.

⁴ The moderator variable analyses already clearly indicated that sample size is the most important predictor of effect size. The importance of this predictor was not confirmed by our weighted meta-regression analysis because the analysis is weighted by the inverse of the within-study variance, which correlates almost perfectly with sample size. Therefore, sample size cannot enter the regression model. An unweighted stepwise multiple regression analysis with the same predictor variables clearly stresses the importance of sample size in this meta-analysis. Sample size enters the model first and accounts for 9% of the variance. After three more steps the model accounts for 20% of the variance, with sample size (β = .26), RNG control earlier (β = .28), random source radioactive (β = .14), and split of data preplanned (β = -.11) as predictors. However, an unweighted regression analysis is difficult to interpret, especially when effect size is so strongly connected with study variance: larger studies provide better estimates and should have more weight; otherwise the regression is dominated by the impact of smaller studies, which might be more prone to publication bias and other effects that will be discussed under the heading of the small-study effect in the Discussion section.
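The collinearity point in Footnote 4 can be made concrete: for binary bits the within-study variance of π is approximately .25/N, so the inverse-variance weight of a study is simply 4N, a deterministic function of sample size. A minimal sketch (the 357 sample sizes below are randomly generated stand-ins, not the actual data):

import numpy as np

rng = np.random.default_rng(0)
n = rng.integers(10**2, 10**7, size=357).astype(float)  # stand-in bits per study
var = 0.25 / n                                          # binomial variance of pi under MCE
w = 1.0 / var                                           # inverse-variance weight = 4 * N
print(np.corrcoef(w, n)[0, 1])                          # -> 1.0: weights and N are collinear

Because the weights are an exact linear function of N, a weighted model has no independent information left with which to estimate a sample-size coefficient.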

Random Effects Model

As can be seen in Table 4, the z score (rnd) for the effect size based on a random effects model becomes non-significant for all heterogeneous subsamples of studies. This measure also indicates, as all previous analyses have shown, that there is no simple overall effect. If there is an effect of human intention on the concurrent output of true RNGs, then at least one moderator must be involved. Of all the sensitivity analyses presented here, sample size seems to be the most promising candidate. However, the connection between sample size and effect size can have many different causes, as will be discussed in the next section.
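The paper does not state which random-effects estimator underlies the z score (rnd); as one plausible reading, here is a sketch using the common DerSimonian-Laird estimate of the between-study variance (an assumption, not necessarily the authors' exact method; all names are illustrative):

import numpy as np

def random_effects_z(pis, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimate.
    pis: per-study hit proportions; variances: their within-study variances."""
    pis, v = np.asarray(pis, float), np.asarray(variances, float)
    w = 1.0 / v
    fixed = np.sum(w * pis) / np.sum(w)        # fixed-effect weighted mean
    q = np.sum(w * (pis - fixed) ** 2)         # Cochran's Q (heterogeneity statistic)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(pis) - 1)) / c)  # between-study variance estimate
    w_star = 1.0 / (v + tau2)                  # random-effects weights
    pi_rnd = np.sum(w_star * pis) / np.sum(w_star)
    se_rnd = np.sqrt(1.0 / np.sum(w_star))
    return (pi_rnd - 0.5) / se_rnd             # z(rnd) against MCE

Heterogeneous data inflate tau², which widens the pooled standard error; that is why a z score that is significant under a fixed-effect model can become non-significant under a random effects model, the pattern described above.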

Discussion

Altogether, the meta-analysis divulged (an awkward word; perhaps "indicated" is better) three main findings: (i) a very small but statistically significant overall effect, (ii) a tremendous variability of effect size, and (iii) a highly visible (I'd say "suspected") small-study effect. (The tremendous variability points to an inadequate model and an incomplete specification of moderators. It cannot be appropriate to simply lay a random effects model on such data. Moreover, this also points to an inadequate fixed effects model.)

Statistical Significance

The meta-analysis replicated the finding of the previous meta-analyses in the sense that the very small overall effect was significantly different from the expected value. The mean effect size of the control studies did not differ significantly from MCE, although the size of the effect was comparable to that of the experimental studies. This does not necessarily imply that the effect found in the experimental studies is spurious; RNG experiments do not follow a standard test-control design to determine an effect; rather, the test is always (not always) against MCE.

Control data in RNG experiments are simply used to demonstrate that the RNG output fits the theoretical premise (e.g., the binomial distribution). (This is not true, or it is an idiosyncratic and inappropriate definition of control - which I believe is in fact presented early in this paper.) If the control data do not fit the theoretical premise, the RNG will be revised or a different RNG will be used. As a consequence, published control data are unequivocally (how do they know?) subject to a positive selection process - that is, divergent control data will generally not be published because they would cast doubt on the experiment as a whole. (This seems to imply that studies published without citing control data have withheld that data, which I doubt is always, or even often, the case.) The fact that the experimental studies reached statistical significance and the control studies did not is a matter of statistical power - the sample size of the control studies is only half as large as the sample size of the experimental studies (1.07×10⁹ bits vs. 2.17×10⁹ bits, respectively). (As noted earlier, by the authors' definition, huge quantities of "control" data are not included in this analysis, so any discussion of the effect size is misleading.) Moreover, the p value for the control studies was based on two-sided testing, whereas the p value for the intentional studies was one-sided.

The safeguard analyses demonstrated that the significance of the experimental studies is not the result of low-quality studies. Nevertheless, the statistical significance, as well as the overall effect size, of the combined experimental studies has dropped continuously from the first meta-analysis to the one reported here. This is partially the result of the more recent meta-analyses including newer, larger studies. However, another difference between the current and the previous meta-analyses lies in the application of unequivocal inclusion and exclusion criteria.

We focused exclusively on studies examining the alleged concurrent interaction between direct human intention and RNGs. All previous meta-analyses also included non-intentional (in the sense of FieldREG studies, no; but we did include studies with implied intention) and non-human studies. Although this difference might explain the reduction in effect size and significance level, it cannot explain the extreme statistical heterogeneity of the database. This topic was neglected in the previous RNG meta-analyses. (Not really; we did look at the issue of homo- and heterogeneity; we just didn't try to figure out where it came from, as that wasn't the purpose of those MAs.) We believe that the overall statistical significance found in our meta-analysis cannot be unequivocally interpreted in favor of an anomalous interaction as long as the tremendous variability of effect size remains unexplained. (Well, yeah -- so isn't that the purpose here?)

Variability of Effect Size

The variability of effect size in this meta-analysis is tremendous. We took several approaches to address this variability. For instance, trimming 20% of the studies from the overall sample resulted in a homogeneous subsample of studies. (But your other analyses indicated there was in fact no justification for the trimming on the basis of, say, quality. Therefore, given the time and N vs. es correlations, this trivially must weaken the database.) Although the effect size of the homogeneous sample did not differ significantly from that of the overall sample, the outcome did not differ significantly from MCE. However, although this indicates how vulnerable the overall result is in terms of statistical significance, this approach cannot explain which variables or (selection) processes are responsible for the variability. The extreme variability does not seem to be the result of any of the moderator variables examined - none of the moderator variable subsamples was independently homogeneous, not even sample size.


The moderator variable analyses demonstrated that sample size was a consistent and prominent moderator of effect size (variability). All of the moderator variables we analyzed were confounded by sample size. The Monte Carlo simulation of publication bias at the end of the next section will demonstrate that even though variability of effect size and the small-study effect are two separate concepts, they are in fact connected.

Small-Study Effect

For a similar class of studies (what sorts of studies are similar to these?) it is generally assumed that effect size is independent of sample size. (Yes.) However, from the sensitivity analyses it is evident that the effect sizes in this meta-analysis strongly depend on sample size. (Yes!) The asymmetric distribution of effect sizes in the funnel plot (see Figure 1), as well as the continuous decline of effect size with increasing sample size across the sample size quartiles, illustrates this. How can this be explained?

Table 7 provides a list of potential sources of the small-study effect. The sources fall into three main categories: (1) true heterogeneity, (2) data irregularities, and (3) selection biases. (Is true heterogeneity the representation of a better model? If so it is sensible, but needs to be spelled out.) Chance, another possible explanation for a small-study effect, seems very unlikely because of the magnitude of the effect and the sample size of the meta-analysis.

True heterogeneity

The higher effect sizes of the smaller studies may be due to specific differences in experimental design or setting between the smaller and the larger studies. In other words, the small-study effect is seen as the result of certain moderator variable(s). For instance, smaller studies might be more successful because the participant-experimenter relationship is more intense. The routine of longer experimental series may make it difficult for the experimenter to maintain enthusiasm in the study. However, explanations such as these remain speculative as long as they are not systematically investigated and meta-analyzed. (There is another general category of potential explanation that is consistent with the N vs. es finding -- the mechanism of the effect may not be correctly modeled by assuming bit-wise effects.)

Of the moderator variables investigated in this meta-analysis, the hypotheses that smaller studies on average tested a different type of participant and used a different form of feedback are the most interesting. However, the moderator variable analyses showed that these two variables, as well as all other variables examined, are linked to sample size. For almost all moderator variables, the subsamples ("class" in Table 4) with the smallest mean sample sizes had the biggest effect. This suggests (but does not demonstrate. Given other issues, such as the model assumptions, this conclusion should not be accepted.) that sample size, and not the moderator variables, was the prime factor driving the heterogeneity of the database. This view is also supported by the heterogeneity of effect size observed in all of the moderator subsamples.

Empirically, true heterogeneity among studies cannot be eliminated as a causal factor for the small-study effect, especially regarding complex interactions, which we have disregarded here. However, the heterogeneity of the moderator-variable subsamples and the outstanding importance of sample size at all levels of analysis exclude true heterogeneity as the main source accounting for the small-study effect. (This ignores goal-directedness, DAT, effort per time, etc. It also ignores the possibility that PK operates on the sample size distribution, which would predict a perfect sqrt(N) dependency.)

Data irregularities

A small-study effect may be due to data irregularities threatening the validity of the data. For example, smaller studies might be of poorer methodological quality, thereby artificially raising their effect size compared with that of larger studies. However, methodological quality improves only marginally with increasing sample size (r(357) = .18, p = 8.86×10⁻⁴) and therefore cannot explain the small-study effect in this meta-analysis. Another form of data irregularity, namely inadequate analysis, is based on the assumption that smaller trials are generally analyzed with less methodological rigor and are therefore more likely to report "false-positive" results. However, this potential source is excluded here because of the straightforward and simple effect size measure used. A general source of data irregularity might be that smaller studies are more easily manipulated by fraud than larger studies because, for example, fewer people are involved. However, the number of researchers that would have to be implicated over the years renders this hypothesis very unlikely. In general, none of the data irregularity hypotheses considered can explain the small-study effect.

Selection biases

When the inclusion of studies in a meta-analysis is systematically biased in such a way that smaller studies with smaller (original may be correct?) p values, i.e., larger effect sizes, are more likely to be included than larger studies with smaller p values, i.e., smaller effect sizes, a small-study effect may be the result. Several well-known biases such as publication bias, selective reporting bias, foreign language bias, citation bias, and time lag bias may be responsible for a small-study effect (e.g., Egger, Dickersin, & Smith, 2001; Mahoney, 1985).

Biased inclusion criteria refer to biases on the side of the meta-analyst. In this particular domain, the two most prominent biases are foreign language bias and citation bias. Foreign language bias occurs when significant results are published in well-circulated, high-impact journals in English, whereas non-significant findings are published in small journals in the authors' native language. A meta-analysis including studies solely from journals in English may therefore include a disproportionately large number of significant studies. Citation bias refers to selective quoting: studies with significant p values are quoted more often and are more likely to be retrieved by the meta-analyst. However, the small-study effect in this meta-analysis is probably not due to these biases, given the inclusion of non-English publications and a very comprehensive search strategy. (The authors claimed to use a very comprehensive search to retrieve all known studies, including unpublished studies. In which case, where are all these missing studies coming from? With, say, only 20 different groups of researchers ever having done these sorts of studies, each group would have had to hide on average 70 file-drawer studies, given the authors' later estimate of 1400 file-drawer studies. This is implausible.)

One of the most important selection biases to consider in any meta-analysis is publication bias. Publication bias refers to the fact that the probability of a study being published depends to some extent on its p value. Several independent factors affect the publication of a study. Rosenthal's term "file drawer problem" (Rosenthal, 1979) focuses on the author as the main source of publication bias, but there are other issues too. Editors' and reviewers' decisions also affect whether a study is published. The time lag from the completion of a study to its publication might also depend on the p value of the study (e.g., Ioannidis, 1998) and additionally contribute to the selection of studies available. Since the development of Rosenthal's "file drawer" calculation (1979), numerous other methods have been developed to examine the impact of publication bias on meta-analyses (e.g., Dear & Begg, 1992; Duval & Tweedie, 2000; Hedges, 1992; Iyengar & Greenhouse, 1988; Sterne & Egger, 2001).

In an attempt to examine publication bias, we ran a Monte Carlo simulation based on Hedges' (1992) stepped weight function model and simulated a simple selection process. According to this model, the conclusiveness of a p value as perceived by authors, reviewers, and editors is subject to certain "cliff effects" (Hedges, 1992), and this affects the likelihood of a study getting published. Hedges (1992) estimates the weights of the step function from the available meta-analytic data. Different from Hedges, however, we used a predefined step weight function model, because we were primarily interested in seeing whether a simple selection model may in principle account for the small-study effect present in our meta-analytic data.

We assumed that 100% of studies with a p value ≤ .01, 80% of studies with a p value between p ≤ .05 and p > .01, 50% of studies with a p value between p ≤ .10 and p > .05, 20% of studies with a p value between p ≤ .50 and p > .10, and 10% of studies with a p value > .50 (one-sided) are published.⁵ From this assumption, we randomly generated uniformly distributed p values (Surely this doesn't mean p = .01 is just as likely as p = .5? I think they must mean normally distributed around chance expectation.) and calculated the effect sizes for all "published" studies and counted the number of "not published" studies. That is, on the basis of the sample size for each of the 357 studies, we simulated a selective null-effect publication process (meaning, I think, that the distribution of studies averages at chance, or p = .5).
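A minimal sketch of the selection process as described (the authors' code is not given; the per-study rejection loop and the names below are assumptions; note that under the null hypothesis, one-sided p values are indeed uniform on (0, 1)):

import numpy as np
from scipy.stats import norm

def publish_prob(p):
    """Predefined step-weight function, as described in the text."""
    if p <= 0.01: return 1.0
    if p <= 0.05: return 0.8
    if p <= 0.10: return 0.5
    if p <= 0.50: return 0.2
    return 0.1

def simulate_meta(sample_sizes, rng):
    """One simulated meta-analysis under a selective null-effect process."""
    published_pi, unpublished = [], 0
    for n in sample_sizes:
        while True:                      # draw null studies until one is "published"
            p = rng.uniform()            # one-sided p, uniform under the null
            if rng.uniform() < publish_prob(p):
                break
            unpublished += 1
        z = norm.ppf(1.0 - p)            # z score of the published study
        published_pi.append(0.5 + z * 0.5 / np.sqrt(n))  # back out pi from z and N
    return np.array(published_pi), unpublished

With these weights the expected publication rate is .01(1.0) + .04(.8) + .05(.5) + .40(.2) + .50(.1) ≈ .197, so roughly (1 − .197)/.197 ≈ 4.1 null studies remain unpublished for every study published; over 357 published studies that is about 1,460, closely matching the figure reported next.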

The averaged results of the simulation of 1000 meta-analyses are shown in Table 8. As can be seen, the overall effect size based on the Monte Carlo simulation perfectly matches the overall effect size found in our meta-analysis (see Table 4). The simulated data clearly replicated the small-study effect (see Table 8). The simulation also shows that, in order for these results to emerge, a total of 1453 studies had to be unpublished, i.e., for every study published, four (nonsignificant) studies had to remain unpublished. (That's awfully close to Rosenthal's criterion of "robust." I wonder how long they worked with their model parameters to produce these results. Also, the results for the smallest sample sizes are nearly significantly different from their simulation. I'll see if I can replicate their model to see how insensitive it is to the choice of parameters. Also -- is the Hedges work on selective reporting in ordinary psychology a good model for parapsychology?)

⁵ The term published is used here very broadly to include publications of conference proceedings and reports, which in terms of our literature search were considered unpublished. Importantly, in our discussion of the Monte Carlo simulation, the term "published" also refers to studies obtained by splitting reports into studies. For simplicity, we assumed in the Monte Carlo simulation that the splitting of the 114 reports into 357 experimental studies was subject to the same selection process as the published reports themselves.

A secondary finding, which additionally confirms the value of the simulation, is that publication bias might be responsible not only for the small overall effect and the small-study effect found, but also for a large proportion of the effect size variability. The simulated overall sample as well as the 4th quartile of the largest studies show a highly significant effect size variability, replicating what was found in our meta-analytic data. The effect size variability in the first three quartiles of the simulation is certainly different from the effect size variability in our meta-analytic data. However, this might be due to the highly idealized boundary conditions of the simulation.

Conclusion

Altogether, the simulation results are in very good agreement with the meta-analytic data (except for 3/4ths of it), especially regarding the overall effect size and the small-study effect, which was the primary objective of the simulation. The simulation must be considered at least very suggestive, especially as it accounts for all three main findings divulged in the meta-analysis: (i) a very small but statistically significant overall effect, (ii) a tremendous variability of effect size (only in the large study quartile), and (iii) a highly visible small-study effect. In comparison with all other sources potentially accounting for the small-study effect discussed here, publication bias clearly is the most far-reaching explanation regarding the main findings of this meta-analysis. (This Monte Carlo sounds like a simple case of carefully designed inputs to yield the expected outputs, and even then glossing over the mis-fitting.)

Nevertheless, whether the simulation of publication bias is considered conclusive evidence for publication bias in this meta-analysis strongly depends on how many unpublished studies one believes it is reasonable to assume there are. However, the number of unpublished studies must be seen against the background of the enormous pressure to publish significant results. It is clear that not all results get published and that journals are filled with a selective sample of statistically significant studies (e.g., Sterling, 1959; Sterling, Rosenbaum, & Weinkam, 1995; Rosenthal, 1979). (All references to non-parapsychology fields.) Given that the majority of published studies are underpowered, it is surprising that 95% of articles in psychological journals and more than 85% of articles in medical journals report statistically significant results (Sterling, Rosenbaum, & Weinkam, 1995). Authors, reviewers, and editors are all involved in the selection process.

J. B. Rhine was the first editor of the Journal of Parapsychology (inception in 1937), the leading journal for experimental work in parapsychology. He initially believed "that little can be learned from a report of an experiment that failed to find psi" (Broughton, 1987, p. 27). More than 25% (n = 96) of the studies included in the current meta-analysis were published in this journal. However, from 1975, the Council of the Parapsychological Association rejected the policy of suppressing non-significant studies in parapsychological journals (Broughton, 1987; Honorton, 1985). Whereas 48% of the studies in Q1 (1959-1973) were statistically significant, this rate dropped to 19% (Q2), 8% (Q3), and 14% (Q4) in the subsequent quartiles, indicating that the policy was implemented. (Interesting to consider the likelihood of these percentages being significant -- as a simple study-based model to compare with the bit-based model.) This demonstrates not only that the publication rate of significant studies in this domain is very different from the rate in conventional fields but also, and more importantly, that, at least in the early period of RNG experimentation, the publication process was moderately selective in favor of statistically significant studies. (Huh? If, as they state, 95% of studies in standard psych journals are significant, yet for psi the figures range from 8% to 48%, then surely this leads us away from publication bias as a viable explanation, not towards it!?)

Statistically, significance is a matter of power, but, when conducting a meta-analysis, it can also be a matter of artifacts like publication bias. From this perspective, not only is the early period of RNG experimentation in great danger of being a highly selective sample, but so is the database as a whole. A power analysis based on the overall effect size found in our meta-analysis shows that, for an RNG study to have a power of 80%, its sample size would have to be greater than 1,800,000,000 bits. (OK, is this just obstinacy, or are they wearing blinders, or what?) (I am afraid it is "or what"!) None of the studies included in our meta-analysis comes even close to this sample size. Therefore the number of significant studies is highly questionable. (Sigh.)
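As a check on the order of magnitude of this power calculation: for binary bits the per-bit standard deviation is .5, so a one-sided test at α = .05 with power 1 − β = .80 requires, assuming for illustration an overall effect of about ε = π − .5 = 3 × 10⁻⁵ (the pooled value itself is reported in the tables, not restated in this passage):

$$
N \;\ge\; \left(\frac{(z_{1-\alpha} + z_{1-\beta}) \cdot 0.5}{\varepsilon}\right)^{2}
       \;=\; \left(\frac{(1.645 + 0.842) \cdot 0.5}{3\times10^{-5}}\right)^{2}
       \;\approx\; 1.7\times10^{9}\ \text{bits},
$$

the same order as the 1,800,000,000-bit figure quoted above.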

The studies published and collected here are probably a highly selective sample. The Monte Carlo simulation indicated that there would have to be 1400 unpublished studies to account for the main findings presented here. However, "real world" publication decisions do not simply follow a single, consistent set of threshold values. Publication decisions in respect of the smallest studies might follow completely different and more complex patterns. Consequently, far fewer (or far more!) studies might be needed to replicate the main findings of our meta-analysis than indicated by the Monte Carlo simulation. We believe that the 1400 unpublished studies mark the upper limit of the range of unpublished studies (why?). As a result, we doubt that the RNG database provides good evidence for an anomalous connection between direct human intention and the concurrent output of true RNGs. (What on earth justifies this conclusion?)

The limits of this conclusion are, of course, set by the assumptions made. (Well, yeah.) One of the main assumptions in undertaking a meta-analysis is that effect size is independent of sample size (finally!). This independence actually defines effect size measures. (Not necessarily; there are all kinds of effect sizes.) However, there might be effects where sample size is related to effect size. In such a case, the p values would be independent of sample size and, for example, be constant across studies. Although our data do not confirm this simple model, because the z score distribution is far too heterogeneous (explain how this conclusion is justifiable; no such calculation is reported in this paper), other, more complex models are conceivable. However, so far, meta-analysts in RNG research have not argued along these lines; they have argued that there is a small but replicable constant effect.
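To spell out the alternative model this paragraph alludes to: if the z score, rather than the effect size, were constant across studies, then for binary bits

$$
z \;=\; \frac{(\pi - 0.5)\sqrt{N}}{0.5} \;=\; \text{const}
\quad\Longrightarrow\quad
\pi - 0.5 \;\propto\; N^{-1/2},
$$

which is exactly the inverse relation between study size and effect size reported in this meta-analysis.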

Another assumption that is generally made is that intention affects the mean value of the random sequence. (That is not an assumption, but the dependent variable in most experimental designs.) Although other outcomes have been suggested (e.g., Atmanspacher, Bösch, Boller, Nelson & Scheingraber, 1999; Pallikari & Boller, 1999; Radin, 1989), they have been used only occasionally. (Ignores virtually all of Schmidt's and Kennedy's papers on this topic. Also, on p. 45 of our RNG MA paper in the Jonas book, we say: "This means the statistical effects observed in these experiments are effectively independent of sample size, and cannot be explained as simple, linear, force-like mechanisms... Further indication that a novel approach will be required to explain these effects are experiments strongly resembling RNG studies, but involving pre-recorded random bits rather than bits generated in real-time. Those studies show significant cumulative results similar to those reported here (Bierman, 1998). This implies that some MMI effects, perhaps including those claimed for distant healing, may involve acausal processes.")

Although we question the conclusions of our predecessors, we would like to remind the reader that these experiments are highly refined operationalizations of phenomena which have challenged mankind for a very long period of time. Over a 100-year history of PK experiments, the dramatic anomalous PK effects reported in séance rooms shrank to experiments with electronic noise. This development is certainly humbling. However, further experiments will be conducted, and they should be registered. Registration is the most straightforward solution for determining with any accuracy the rate of publication bias (e.g., Chalmers, 2001). It allows subsequent meta-analysts to resolve more firmly the question of whether the overall effect in RNG experiments is an artifact of publication bias, as we suspect. The PK effect in general is of great fundamental importance - if genuine. However, until we know with any certainty just how many non-significant, unpublished studies there are likely to be, we doubt that this unique experimental approach will gain the status of being of scientific value. (Of course, such a pessimistic statement only serves to diminish interest in this line of work, thus guaranteeing that no one will ever know "the answer.")

A quick look through their MA references doesn't list the four studies below, all of which fit their inclusion criteria. This doesn't persuade me that they were as comprehensive as they claim. What else might they have missed? They also do not include the Jahn et al. publication of the 12-year database, and other PEAR publications that would fit, e.g., Ibison. And they do include something like the Nelson 1994 time normalization paper. Go figure.

Radin, D. I., & Utts, J. M. (1989). Experiments investigating the influence of intention on random and pseudorandom events. Journal of Scientific Exploration, 3, 65-79.

Radin, D. I. (1993). Environmental modulation and statistical equilibrium in mind-matter interaction. Subtle Energies, 4(1), 1-30.

Radin, D. I. (1992). Beyond belief: Exploring interactions among mind, body and environment. Subtle Energies, 2(3), 1-40.

Radin, D. I. (1990). Testing the plausibility of psi-mediated computer system failures. Journal of Parapsychology, 54, 1-19.


References

Alcock, J. E. (1981). Parapsychology: Science or magic? A psychological perspective. Oxford: Pergamon.

Atmanspacher, H., Bösch, H., Boller, E., Nelson, R. D., & Scheingraber, H. (1999). Deviations from physical randomness due to human agent intention? Chaos, Solitons & Fractals, 10, 935-952.

Beloff, J., & Evans, L. (1961). A radioactivity test of psycho-kinesis. Journal of the Society for Psychical Research, 41, 41-46.

Bem, D. J., & Honorton, C. (1994). Does psi exist? Replicable evidence for an anomalous process of information transfer. Psychological Bulletin, 115(1), 4-18.

Bierman, D. J. (1985). A retro and direct PK test for babies with the manipulation of feedback: A first trial of independent replication using software exchange. European Journal of Parapsychology, 5, 373-390.

Blackmore, S. J. (1992). Psychic experiences: Psychic illusions. Skeptical Inquirer, 16, 367-376.

Braude, S. E. (1997). The limits of influence: Psychokinesis and the philosophy of science. Lanham: University Press of America.

Broughton, R. S. (1987). Publication policy and the Journal of Parapsychology. Journal of Parapsychology, 51, 21-32.

Chalmers, I. (2001). Using systematic reviews and registers of ongoing trials for scientific and ethical trial design, monitoring, and reporting. In M. Egger, G. D. Smith, & D. Altman (Eds.), Systematic reviews in health care: Meta-analysis in context (pp. 429-443). London: BMJ Books.

Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.

Crookes, W. (1889). Notes of seances with D. D. Home. Proceedings of the Society for Psychical Research, 6, 98-127.

Crookes, W., Horsley, V., Bull, W. C., & Myers, A. T. (1885). Report on an alleged physical phenomenon. Proceedings of the Society for Psychical Research, 3, 460-463.

Dear, K. B. G., & Begg, C. B. (1992). An approach for assessing publication bias prior to performing a meta-analysis. Statistical Science, 7, 237-245.

Dudley, R. T. (2000). The relationship between negative affect and paranormal belief. Personality and Individual Differences, 28, 315-321.

Duval, S., & Tweedie, R. (2000). A nonparametric "trim and fill" method of accounting for publication bias in meta-analysis. Journal of the American Statistical Association, 95, 89-98.

Edgeworth, F. Y. (1885). The calculus of probabilities applied to psychical research. Proceedings of the Society for Psychical Research, 3, 190-199.

Edgeworth, F. Y. (1886). The calculus of probabilities applied to psychical research. II. Proceedings of the Society for Psychical Research, 4, 189-208.

Egger, M., Dickersin, K., & Smith, G. D. (2001). Problems and limitations in conducting systematic reviews. In M. Egger, G. D. Smith, & D. Altman (Eds.), Systematic reviews in health care: Meta-analysis in context (pp. 43-68). London: BMJ Books.

Fisher, R. A. (1924). A method of scoring coincidences in tests with playing cards. Proceedings of the Society for Psychical Research, 34, 181-185.

Gallup, G., & Newport, F. (1991). Belief in paranormal phenomena among adult Americans. Skeptical Inquirer, 15, 137-146.

Geller, U. (1998). Uri Geller's little book of mind-power. London: Robson.

Girden, E. (1962a). A review of psychokinesis (PK). Psychological Bulletin, 59, 353-388.

Girden, E. (1962b). A postscript to "A review of psychokinesis (PK)". Psychological Bulletin, 59, 529-531.

Girden, E., & Girden, E. (1985). Psychokinesis: Fifty years afterward. In P. Kurtz (Ed.), A skeptic's handbook of parapsychology (pp. 129-146). Buffalo, NY: Prometheus Books.

Gissurarson, L. R. (1992). Studies of methods of enhancing and potentially training psychokinesis: A review. Journal of the American Society for Psychical Research, 86, 303-346.

Gissurarson, L. R. (1997). Methods of enhancing PK task performance. In S. Krippner (Ed.), Advances in parapsychological research 8 (pp. 88-125). Jefferson, NC: McFarland & Company.

Gissurarson, L. R., & Morris, R. L. (1991). Examination of six questionnaires as predictors of psychokinesis performance. Journal of Parapsychology, 55, 119-145.

Hedges, L. V. (1987). How hard is hard science, how soft is soft science? The empirical cumulativeness of research. American Psychologist, 42, 443-455.

Hedges, L. V. (1992). Modeling publication selection effects in meta-analysis. Statistical Science, 7, 246-255.

Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando: Academic Press.

Honorton, C. (1985). Meta-analysis of psi ganzfeld research: A response to Hyman. Journal of Parapsychology, 49, 51-91.

Honorton, C., & Ferrari, D. C. (1989). "Future telling": A meta-analysis of forced-choice precognition experiments, 1935-1987. Journal of Parapsychology, 53, 281-308.

Honorton, C., Ferrari, D. C., & Bem, D. J. (1998). Extraversion and ESP performance: A meta-analysis and a new confirmation. Journal of Parapsychology, 62, 255-276.

Irwin, H. J. (1993). Belief in the paranormal: A review of the empirical literature. Journal of the American Society for Psychical Research, 87, 1-39.

Iyengar, S., & Greenhouse, J. B. (1988). Selection models and the file drawer problem. Statistical Science, 3, 109-117.

Ioannidis, J. P. (1998). Effect of the statistical significance of results on the time to completion and publication of randomized efficacy trials. The Journal of the American Medical Association, 279, 281-286.

Jahn, R. G., Dunne, B. J., Dobyns, Y. H., Nelson, R. D., & Bradish, G. J. (1999). ArtREG: A random event experiment utilizing picture-preference feedback (Technical Note PEAR 99003). Princeton, NJ: PEAR Laboratory, School of Engineering and Applied Science.

Jahn, R. G., Dunne, B. J., & Nelson, R. D. (1980). Princeton Engineering Anomalies Research: Program statement (Technical Report). Princeton, NJ: Princeton University, Princeton Engineering Anomalies Research, School of Engineering/Applied Science.

Jahn, R. G., Mischo, J., Vaitl, D., Dunne, B. J., Bradish, G. J., Dobyns, Y. H., Lettieri, A., Nelson, R. D., Boller, E., Bösch, H., Houtkooper, J. M., & Walter, B. (2000). Mind/machine interaction consortium: PortREG replication experiments. Journal of Scientific Exploration, 14, 499-555.

James, W. (1896). Psychical research. Psychological Review, 3, 649-652.

Jüni, P., Witschi, A., Bloch, R., & Egger, M. (1999). The hazards of scoring the quality of clinical trials for meta-analysis. The Journal of the American Medical Association, 282, 1054-1060.

Lawrence, T. R. (1998). Gathering in the sheep and goats... A meta-analysis of forced-choice sheep-goat ESP studies, 1947-1993. In N. L. Zingrone, M. J. Schlitz, C. S. Alvarado, & J. Milton (Eds.), Research in parapsychology 1993 (pp. 27-31). Lanham, MD: Scarecrow Press.

Mahoney, M. J. (1985). Open exchange and epistemic progress. American Psychologist, 40, 29-39.

McGarry, J. J., & Newberry, B. H. (1981). Beliefs in paranormal phenomena and locus of control: A field study. Journal of Personality and Social Psychology, 41, 725-736.

Milton, J. (1993). A meta-analysis of waking state of consciousness, free response ESP studies. Proceedings of Presented Papers: The Parapsychological Association 36th Annual Convention, 87-104.

Milton, J. (1997). Meta-analysis of free-response ESP studies without altered states of consciousness. Journal of Parapsychology, 61, 279-319.

Milton, J., & Wiseman, R. (1999a). A meta-analysis of mass-media tests of extrasensory perception. British Journal of Psychology, 90, 235-240.

Milton, J., & Wiseman, R. (1999b). Does psi exist? Lack of replication of an anomalous process of information transfer. Psychological Bulletin, 125, 387-391.

Morris, R. L. (1982). Assessing experimental support for true precognition. Journal of Parapsychology, 46, 321-336.

Murphy, G. (1962). Report on paper by Edward Girden on psychokinesis. Psychological Bulletin, 59, 638-641.

Musch, J., & Ehrenberg, K. (2002). Probability misjudgement, cognitive ability, and belief in the paranormal. British Journal of Psychology, 93, 177.

Nelson, R. D. (1994). Effect size per hour: A natural unit for interpreting anomalies experiments (Technical Note PEAR 94003). Princeton, NJ: Princeton University, Princeton Engineering Anomalies Research, School of Engineering/Applied Science.

Pallikari, F., & Boller, E. (1999). A rescaled range analysis of random events. Journal of Scientific Exploration, 13, 35-40.

Persinger, M. A. (2001). The neuropsychiatry of paranormal experiences. Journal of Neuropsychiatry & Clinical Neurosciences, 13, 515-523.

Pratt, J. G. (1937). Clairvoyant blind matching. Journal of Parapsychology, 1, 10-17.

Pratt, J. G. (1949). The meaning of performance curves in ESP and PK test data. Journal of Parapsychology, 13, 9-22.

Pratt, J. G., Rhine, J. B., Smith, B. M., Stuart, C. E., & Greenwood, J. A. (1940). Extra-sensory perception after sixty years: A critical appraisal of the research in extra-sensory perception. New York: Henry Holt and Company.

Price, M. M., & Pegram, M. H. (1937). Extra-sensory perception among the blind. Journal of Parapsychology, 1, 143-155.

Radin, D. I. (1982). Experimental attempts to influence pseudorandom number sequences. Journal of the American Society for Psychical Research, 76, 359-374.

Radin, D. I. (1989). Searching for "signatures" in anomalous human-machine interaction data: A neural network approach. Journal of Scientific Exploration, 3, 185-200.

Radin, D. I. (1997). The conscious universe. San Francisco: Harper Edge.

Radin, D. I., & Ferrari, D. C. (1991). Effects of consciousness on the fall of dice: A meta-analysis. Journal of Scientific Exploration, 5, 61-83.

Radin, D. I., & Nelson, R. D. (1989). Evidence for consciousness-related anomalies in random physical systems. Foundations of Physics, 19, 1499-1514.

Radin, D. I., & Nelson, R. D. (2002). Meta-analysis of mind-matter interaction experiments: 1959 to 2000. In W. B. Jonas (Ed.), Spiritual healing, energy medicine and intentionality: Research and clinical implications. Edinburgh: Harcourt Health Sciences.

Reeves, M. P., & Rhine, J. B. (1943). The PK effect: II. A study in declines. Journal of Parapsychology, 7, 76-93.

Rhine, J. B. (1934). Extrasensory perception. Boston: Boston Society for Psychic Research.

Rhine, J. B. (1936). Some selected experiments in extra-sensory perception. Journal of Abnormal and Social Psychology, 29, 151-171.

Rhine, J. B. (1937). The effect of distance in ESP tests. Journal of Parapsychology, 1, 172-184.

Rhine, J. B. (1946). Editorial: ESP and PK as "psi phenomena". Journal of Parapsychology, 10, 74-75.

Rhine, J. B., & Humphrey, B. M. (1944). The PK effect: Special evidence from hit patterns. I. Quarter distribution of the page. Journal of Parapsychology, 8, 18-60.

Rhine, J. B., & Humphrey, B. M. (1945). The PK effect with sixty dice per throw. Journal of Parapsychology, 9, 203-218.

Rhine, J. B., & Rhine, L. E. (1927). One evening's observation on the Margery mediumship. Journal of Abnormal and Social Psychology, 21, 421.

Rhine, L. E. (1937). Some stimulus variations in extra-sensory perception with child subjects. Journal of Parapsychology, 1, 102-113.

Rhine, L. E., & Rhine, J. B. (1943). The psychokinetic effect: I. The first experiment. Journal of Parapsychology, 7, 20-43.

Richet, C. (1923). Thirty years of psychical research: A treatise on metapsychics. New York: Macmillan Company.

Rosenthal, R. (1979). The "file drawer problem" and tolerance for null results. Psychological Bulletin, 86, 638-641.

Rosenthal, R., & Rosnow, R. L. (1991). Essentials of behavioral research: Methods and data analysis (2nd ed.). New York: McGraw-Hill.

Rosenthal, R., & Rubin, D. B. (1989). Effect size estimation for one-sample multiple-choice-type data: Design, analysis, and meta-analysis. Psychological Bulletin, 106, 332-337.

Rush, J. H. (1977). Problems and methods in psychokinesis research. In S. Krippner (Ed.), Advances in parapsychological research 1: Psychokinesis (pp. 15-78). New York: Plenum.

Sanger, C. P. (1895). Analysis of Mrs. Verrall's card experiments. Proceedings of the Society for Psychical Research, 11, 193-197.

Schmeidler, G. R. (1977). Research findings in psychokinesis. In S. Krippner (Ed.), Advances in parapsychological research 1: Psychokinesis (pp. 79-132). New York: Plenum.

Schmeidler, G. R. (1982). PK research: Findings and theories. In S. Krippner (Ed.), Advances in parapsychological research 3 (pp. 115-146). New York: Plenum Press.

Schmidt, H. (1969). Anomalous prediction of quantum processes by some human subjects (Document D1-82-0821, pp. 1-38). Seattle, WA: Plasma Physics Laboratory.

Schmidt, H. (1970). PK experiments with animals as subjects. Journal of Parapsychology, 34, 255-261.

Schmidt, H. (1985). Addition effect for PK on pre-recorded targets. Journal of Parapsychology, 49, 229-244.

Schouten, S. A. (1983). Personal experience and belief in ESP. The Journal of Psychology, 114, 219-222.

Shadish, W. R., & Haddock, K. C. (1994). Combining estimates of effect size. In L. V. Hedges & H. Cooper (Eds.), The handbook of research synthesis (pp. 261-281). New York: Russell Sage Foundation.

Sparks, G. G. (1998). Paranormal depictions in the media: How do they affect what people believe? Skeptical Inquirer, 22, 35-39.

Sparks, G. G., Hansen, T., & Shah, R. (1994). Do televised depictions of paranormal events influence viewers' beliefs? Skeptical Inquirer, 18, 386-395.

Sparks, G. G., Nelson, C. L., & Campbell, R. G. (1997). The relationship between exposure to televised messages about paranormal phenomena and paranormal beliefs. Journal of Broadcasting & Electronic Media, 41, 345-359.

Stanford, R. G. (1978). Toward reinterpreting psi events. Journal of the American Society for Psychical Research, 72, 197-214.

Stanford, R. G., & Stein, A. G. (1994). A meta-analysis of ESP studies contrasting hypnosis and a comparison condition. Journal of Parapsychology, 58, 235-269.

Steinkamp, F., Milton, J., & Morris, R. L. (1998). A meta-analysis of forced-choice experiments comparing clairvoyance and precognition. Journal of Parapsychology, 62, 193-218.

Sterling, T. D. (1959). Publication decisions and their possible effects on inferences drawn from tests of significance - or vice versa. Journal of the American Statistical Association, 54, 34.

Sterling, T. D., Rosenbaum, W. L., & Weinkam, J. J. (1995). Publication decisions revisited: The effect of the outcome of statistical tests on the decision to publish and vice versa. American Statistician, 49, 112.

Sterne, J. A. C., & Egger, M. (2001). Funnel plots for detecting bias in meta-analysis: Guidelines on choice of axis. Journal of Clinical Epidemiology, 54, 1046-1055.

Sterne, J. A. C., Egger, M., & Smith, G. D. (2001). Investigating and dealing with publication and other biases. In M. Egger, G. D. Smith, & D. Altman (Eds.), Systematic reviews in health care: Meta-analysis in context (pp. 189-208). London: BMJ Books.

Sterne, J. A. C., Gavaghan, D., & Egger, M. (2000). Publication and related bias in meta-analysis: Power of statistical tests and prevalence in the literature. Journal of Clinical Epidemiology, 53, 1119-1129.

Storm, L., & Ertel, S. (2001). Does psi exist? Comments on Milton and Wiseman's (1999) meta-analysis of ganzfeld research. Psychological Bulletin, 127, 424-433.

Targ, R., & Puthoff, H. E. (1977). Mind-reach: Scientists look at psychic ability. Delacorte Press.

Tart, C. T. (1976). Effects of immediate feedback on ESP performance. Research in Parapsychology 1975, 80-82.

Taylor, G. Le M. (1890). Experimental comparison between chance and thought-transference in correspondence of diagrams. Proceedings of the Society for Psychical Research, 6, 398-405.

Thalbourne, M. A. (1995). Further studies of the measurement and correlates of belief in the paranormal. Journal of the American Society for Psychical Research, 89, 233-247.

Thompson, S. G., & Higgins, J. P. T. (2002). How should meta-regression analyses be undertaken and interpreted? Statistics in Medicine, 21, 1559-1574.

Thompson, S. G., & Sharp, S. J. (1999). Explaining heterogeneity in meta-analysis: A comparison of methods. Statistics in Medicine, 18, 2693-2708.

Thouless, R. H. (1942). The present position of experimental research into telepathy and related phenomena. Proceedings of the Society for Psychical Research, 47, 1-19.

Thouless, R. H., & Wiesner, B. P. (1946). The psi processes in normal and "paranormal" psychology. Proceedings of the Society for Psychical Research, 48, 177-196.

Varvoglis, M. P., & McCarthy, D. (1986). Conscious-purposive focus and PK: RNG activity in relation to awareness, task-orientation, and feedback. Journal of the American Society for Psychical Research, 80, 1-29.

Watt, C. A. (1994). Meta-analysis of DMT-ESP studies and an experimental investigation of perceptual defense/vigilance and extrasensory perception. In E. W. Cook & D. L. Delanoy (Eds.), Research in parapsychology 1991 (pp. 64-68). Metuchen, NJ: Scarecrow Press.

White, R. A. (1991). The psiline database system. Exceptional Human Experience, 9, 163-167.

Wilson, C. (1976). The Geller phenomenon. London: Aldus Books.


Appendix

Studies Used in the Meta-Analysis

André, E. (1972). Confirmation of PK action on electronic equipment. Journal of Parapsychology,

36, 283

-

293.

Berger, R. E. (1986). Psi effects without real

time feedback using a PsiLab ][ video game experiment. The Parapsychological Association 29th Annual Convention, 111

-

128.

Bierman, D. J., & Houtkooper, J. M. (1975). Exploratory PK tests with a programmable high speed random number generator. European Journal of Parapsychology, 1, 3

-

14.

Bierman, D. J., De Diana, I. P. F., & Houtkooper, J. M. (1976). Preliminary report on the

Amsterdam experiments with Matthew Manning. European Journal of Parapsychology, 1, 6 16.

Bierman, D. J., & Noortje, V. T. (1977). The performance of healers in PK tests with different

RNG feedback algorithms. Research in Parapsychology 1976, 131

-

133.

Bierman, D. J., & Weiner, D. H. (1980). A preliminary study of the effect of data destruction on the influence of future observers. Journal of Parapsychology, 44, 233

-

243.

Bierman, D. J., & Houtkooper, J. M. (1981). The potential observer effect or the mystery of irreproducibility. European Journal of Parapsychology, 3, 345

-

371.

Bierman, D. J. (1987). Explorations of some theoretical frameworks using a PK

test environment. Proceedings of Presented Papers: The Parapsychological Association 30th Annual

Convention, 33

-

40.

Human Intention Interacting with Random Number Generators (Printed:

27 August 2004) 57

Bierman, D. J. (1988). Testing the IDS model with a gifted subject. Theoretical Parapsychology, 6,

31 36.

Bierman, D. J., & Van Gelderen, W. J. M. (1994). Geomagnetic activity and PK on a low and high trial

rate RNG. Proceedings of Presented Papers: The Parapsychological Association 37th

Annual Convention, 50

-

56.

Braud, L., & Braud, W. (1977). Psychokinetic effects upon a random event generator under conditions of limited feedback to volunteers and experimenter. Proceedings of Presented Papers:

The Parapsychological Association 20th Annual Convention.

Braud, W., & Kirk, J. (1977). Attempt to observe psychokinetic influences upon a random event generator by person

fish teams. European Journal of Parapsychology, 2, 228

-

237.

Braud, W. (1981). Psychokinesis experiments with infants and young children. Research in

Parapsychology 1980, 30

-

31.

Braud, W. G., & Hartgrove, J. (1976). Clairvoyance and psychokinesis in transcendental meditators and matched control subjects: A preliminary study. European Journal of

Parapsychology, 1, 6

-

16.

Braud, W. G. (1978). Recent investigations of microdynamic psychokinesis, with special emphasis on the roles of feedback, effort and awareness. European Journal of Parapsychology, 2,

137

-

162.

Braud, W. G. (1983). Prolonged visualization practice and psychokinesis: A pilot study (RB).

Research in Parapsychology 1982, 187

-

189.

Human Intention Interacting with Random Number Generators (Printed:

27 August 2004) 58

Breederveld, H. (1988). Towards reproducible experiments in psychokinesis IV. Experiments with an electronic random number generator. Theoretical Parapsychology, 6, 43-51.

Breederveld, H. (1989). The Michels experiments: An attempted replication. Journal of the Society for Psychical Research, 55, 360-363.

Breederveld, H. (2001). De optimal stopping strategie XL. PK-experimenten met een random number generator [The optimal stopping strategy XL: PK experiments with a random number generator]. SRU-Bulletin, 13, 22-23.

Broughton, R. S., & Millar, B. (1977). A PK experiment with a covert release-of-effort test. Research in Parapsychology 1976, 28-30.

Broughton, R. S. (1979). An experiment with the head of Jut. European Journal of Parapsychology, 2, 337-357.

Broughton, R. S., Millar, B., & Johnson, M. (1981). An investigation into the use of aversion therapy techniques for the operant control of PK production in humans. European Journal of Parapsychology, 3, 317-344.

Broughton, R. S., & Higgins, C. A. (1994). An investigation of micro-PK and geomagnetism. Proceedings of Presented Papers: The Parapsychological Association 37th Annual Convention, 87-94.

Broughton, R. S., & Alexander, C. H. (1997). Destruction testing DAT. Proceedings of Presented Papers: The Parapsychological Association 40th Annual Convention, 100-104.


Crandall, J. E. (1993). Effects of extrinsic motivation on PK performance and its relations to state anxiety and extraversion. Proceedings of Presented Papers: The Parapsychological Association 36th Annual Convention, 372-377.

Curry, C. K. (1978). A modularized random number generator: Engineering design and psychic experimentation. Unpublished senior thesis, Department of Electrical Engineering and Computer Science, School of Engineering/Applied Science, Princeton University.

Dalton, K. S. (1994). Remotely influenced ESP performance in a computer task: A preliminary study. Proceedings of Presented Papers: The Parapsychological Association 37th Annual Convention, 95-103.

Davis, J. W., & Morrison, M. D. (1978). A test of the Schmidt model's prediction concerning multiple feedback in a PK test. Research in Parapsychology 1977, 163-168.

Debes, J., & Morris, R. L. (1982). Comparison of striving and nonstriving instructional sets in a PK study. Journal of Parapsychology, 46, 297-312.

Gerding, J. L. F., Wezelman, R., & Bierman, D. J. (1997). The Druten disturbances: Exploratory RSPK research. Proceedings of Presented Papers: The Parapsychological Association 40th Annual Convention, 146-161.

Giesler, P. V. (1985). Differential micro-PK effects among Afro-Brazilian cultists: Three studies using trance-significant symbols as targets. Journal of Parapsychology, 49, 329-366.

Gissurarson, L. R. (1986). RNG-PK microcomputer "games" overviewed: An experiment with the videogame "PSI INVADERS". European Journal of Parapsychology, 6, 199-215.


Gissurarson, L. R., & Morris, R. L. (1990). Volition and psychokinesis: Attempts to enhance PK performance through the practice of imagery strategies. Journal of Parapsychology, 54, 331-370.

Gissurarson, L. R. (1990). Some PK attitudes as determinants of PK performance. European Journal of Parapsychology, 8, 112-122.

Gissurarson, L. R., & Morris, R. L. (1991). Examination of six questionnaires as predictors of psychokinesis performance. Journal of Parapsychology, 55, 119-145.

Haraldsson, E. (1970). Subject selection in a machine precognition test. Journal of Parapsychology, 34, 182-191.

Heseltine, G. L. (1977). Electronic random number generator operation associated with EEG activity. Journal of Parapsychology, 41, 103-118.

Heseltine, G. L., & Mayer Oakes, S. A. (1978). Electronic random generator operation and EEG activity: Further studies. Journal of Parapsychology, 42, 123-136.

Heseltine, G. L., & Kirk, J. (1980). Examination of a majority-vote technique. Journal of Parapsychology, 44, 167-176.

Hill, S. (1977). PK effects by a single subject on a binary random number generator based on electronic noise. Research in Parapsychology 1976, 26-28.

Honorton, C. (1971). Group PK performance with waking suggestions for muscle tension/relaxation and active/passive concentration. Proceedings of the Parapsychological Association, 8, 14-15.


Honorton, C. (1971). Automated forced-choice precognition tests with a "sensitive". Journal of the American Society for Psychical Research, 65, 476-481.

Honorton, C., & Barksdale, W. (1972). PK performance with waking suggestions for muscle tension vs. relaxation. Journal of the American Society for Psychical Research, 66, 208-214.

Honorton, C., Ramsey, M., & Cabibbo, C. (1975). Experimenter effects in ESP research. Journal of the American Society for Psychical Research, 69, 135-139.

Honorton, C., & May, E. C. (1976). Volitional control in a psychokinetic task with auditory and visual feedback. Research in Parapsychology 1975, 90-91.

Honorton, C. (1977). Effects of meditation and feedback on psychokinetic performance: A pilot study with an instructor of Transcendental Meditation. Research in Parapsychology 1976, 95-97.

Honorton, C., & Tremmel, L. (1980). Psitrek: A preliminary effort toward development of psi-conducive computer software. Research in Parapsychology 1979, 159-161.

Honorton, C., Barker, P., & Sondow, N. (1983). Feedback and participant-selection parameters in a computer RNG study (RB). Research in Parapsychology 1982, 157-159.

Honorton, C. (1987). Precognition and real-time ESP performance in a computer task with an exceptional subject. Journal of Parapsychology, 51, 291-320.

Houtkooper, J. M. (1976). Psychokinesis, clairvoyance and personality factors. Proceedings of Presented Papers: The Parapsychological Association 19th Annual Convention, 1-15.


Houtkooper, J. M. (1977). A study of repeated retroactive psychokinesis in relation to direct and random PK effects. European Journal of Parapsychology, 1, 1-20.

Houtkooper, J. M., Schienle, A., Vaitl, D., & Stark, R. (1999). Atmospheric electromagnetism: An attempt at replicating the correlation between natural sferics and ESP. Proceedings of Presented Papers: The Parapsychological Association 42nd Annual Convention, 123-135.

Jacobs, J. C., Michels, J. A. G., Millar, B., & Millar-De Bruyne, M.-L. F. L. (1987). Building a PK trap: The adaptive trial speed method. Proceedings of Presented Papers: The Parapsychological Association 30th Annual Convention, 348-370.

Jahn, R. G., Dunne, B. J., Dobyns, Y. H., Nelson, R. D., & Bradish, G. J. (1999). ArtREG: A random event experiment utilizing picture-preference feedback (Technical Note PEAR 99003). Princeton, NJ: PEAR Laboratory, School of Engineering and Applied Science.

Jahn, R. G., Mischo, J., Vaitl, D., Dunne, B. J., Bradish, G. J., Dobyns, Y. H., Lettieri, A., Nelson, R. D., Boller, E., Bösch, H., Houtkooper, J. M., & Walter, B. (2000). Mind/machine interaction consortium: PortREG replication experiments. Journal of Scientific Exploration, 14, 499-555.

Kelly, E. F., & Kanthamani, B. K. (1972). A subject's efforts toward voluntary control. Journal of Parapsychology, 36, 185-197.

Kelly, E. F., & Lenz, J. (1976). EEG correlates of trial-by-trial performance in a two-choice clairvoyance task: A preliminary study. Research in Parapsychology 1975, 22-25.


Kugel, W., Bauer, B., & Bock, W. (1979). Versuchsreihe Telbin [Experimental series Telbin] (Arbeitsbericht 7). Berlin: Technische Universität Berlin.

Kugel, W. (1999). Amplifying precognition: Four experiments with roulette. Proceedings of Presented Papers: The Parapsychological Association 42nd Annual Convention, 136-146.

Levi, A. (1979). The influence of imagery and feedback on PK effects. Journal of Parapsychology, 43, 275-289.

Lignon, Y., & Faton, L. (1977). Le facteur psi s'exerce sur un appareil électronique [The psi factor affects an electronic apparatus]. Psi Réalité, 54-62.

Lounds, P. (1993). The influence of psychokinesis on the randomly generated order of emotive and non-emotive slides. Journal of the Society for Psychical Research, 59, 187-193.

Lucadou, W. v. (1991). Locating psi-bursts: Correlations between psychological characteristics of observers and observed quantum physical fluctuations. Proceedings of Presented Papers: The Parapsychological Association 34th Annual Convention, 265-281.

Mabilleau, P. (1982). Electronic dice: A new way for experimentation in "psiology". Le Bulletin PSILOG, 2, 13-14.

Matas, F., & Pantas, L. (1971). A PK experiment comparing meditating vs. nonmeditating subjects. Proceedings of the Parapsychological Association, 8, 12-13.

May, E. C., & Honorton, C. (1976). A dynamic PK experiment with Ingo Swann. Research in Parapsychology 1975, 88-89.


Michels, J. A. G. (1987). Consistent high scoring in self-test PK experiments using a stopping strategy. Journal of the Society for Psychical Research, 54, 119-129.

Millar, B., & Broughton, R. (1976). A preliminary PK experiment with a novel computer-linked high speed random number generator. Research in Parapsychology 1975, 83-84.

Millar, B., & Mackenzie, P. (1977). A test of intentional vs unintentional PK. Research in Parapsychology 1976, 32-35.

Millar, B., & Broughton, R. S. (1977). An investigation of the psi enhancement paradigm of Schmidt. Research in Parapsychology 1976, 23-25.

Millar, B. (1983). Random bit generator experimenten. Millar-replicatie [Random bit generator experiments: Millar's replication]. SRU-Bulletin, 8, 119-123.

Morris, R., Nanko, M., & Phillips, D. (1978). Intentional observer influence upon measurements of a quantum mechanical system: A comparison of two imagery strategies. Program and Presented Papers: Parapsychological Association 21st Annual Convention, 266-275.

Morris, R. L., & Reilly, V. (1980). A failure to obtain results with goal-oriented imagery PK and a random event generator with varying hit probability. Research in Parapsychology 1979, 166-167.

Morris, R. L., & Harnaday, J. (1981). An attempt to employ mental practice to facilitate PK. Research in Parapsychology 1980, 103-104.

Morris, R. L., & Garcia-Noriega, C. (1982). Variations in feedback characteristics and PK success. Research in Parapsychology 1981, 138-140.


Morrison, M. D., & Davis, J. W. (1978). PK with immediate, delayed, and multiple feedback: A test of the Schmidt model's predictions. Program and Presented Papers: Parapsychological Association 21st Annual Convention, 97-117.

Nanko, M. (1981). Use of goal-oriented imagery strategy on a psychokinetic task with "selected" subjects. Journal of the Southern California Society for Psychical Research, 2, 1-5.

Nelson, R. D. (1994). Effect size per hour: A natural unit for interpreting anomalies experiments (Technical Note PEAR 94003). Princeton, NJ: Princeton Engineering Anomalies Research, School of Engineering/Applied Science, Princeton University.

Palmer, J., & Perlstrom, J. R. (1986). Random event generator PK in relation to task instructions: A case of "motivated" error? Proceedings of Presented Papers: The Parapsychological Association 29th Annual Convention, 131-147.

Palmer, J. (1995). External psi influence of ESP task performance. Proceedings of Presented Papers: The Parapsychological Association 38th Annual Convention, 270-282.

Palmer, J., & Broughton, R. S. (1995). Performance in a computer task with an exceptional subject: A failure to replicate. Proceedings of Presented Papers: The Parapsychological Association 38th Annual Convention, 289-294.

Palmer, J. (1998). ESP and REG PK with Sean Harribance: Three new studies. Proceedings of Presented Papers: The Parapsychological Association 41st Annual Convention, 124-134.

Pantas, L. (1971). PK scoring under preferred and nonpreferred conditions. Proceedings of the Parapsychological Association, 8, 47-49.


Pare, R. (1983). Random bit generator experimenten. Pare-replicatie [Random bit generator experiments: Pare's replication]. SRU-Bulletin, 8, 123-128.

Randall, J. L. (1974). An extended series of ESP and PK tests with three English schoolboys. Journal of the Society for Psychical Research, 47, 485-494.

Reinsel, R. (1987). PK performance as a function of prior stage of sleep and time of night. Proceedings of Presented Papers: The Parapsychological Association 30th Annual Convention, 332-347.

Schechter, E. I., Barker, P., & Varvoglis, M. P. (1983). A preliminary study with a PK game involving distraction from the psi task (RB). Research in Parapsychology 1982, 152-154.

Schechter, E. I., Honorton, C., Barker, P., & Varvoglis, M. P. (1984). Relationships between participant traits and scores on two computer-controlled RNG-PK games. Research in Parapsychology 1983, 32-33.

Schmeidler, G. R., & Borchardt, R. (1981). Psi-scores with random and pseudo-random targets. Research in Parapsychology 1980, 45-47.

Schmidt, H. (1969). Anomalous prediction of quantum processes by some human subjects (Document D1-82-0821). Plasma Physics Laboratory.

Schmidt, H. (1970). A PK test with electronic equipment. Journal of Parapsychology, 34, 175-181.

Schmidt, H., & Pantas, L. (1972). Psi tests with internally different machines. Journal of Parapsychology, 36, 222-232.


Schmidt, H. (1972). An attempt to increase the efficiency of PK testing by an increase in the generation speed. Proceedings of Presented Papers: Parapsychological Association 15th Annual Convention.

Schmidt, H. (1973). PK tests with a high-speed random number generator. Journal of Parapsychology, 37, 105-118.

Schmidt, H. (1974a). PK effect on random time intervals. Research in Parapsychology 1973, 46-48.

Schmidt, H. (1974b). Comparison of PK action on two different random number generators. Journal of Parapsychology, 38, 47-55.

Schmidt, H. (1975). PK experiment with repeated, time-displaced feedback. Proceedings of Presented Papers: The Parapsychological Association 18th Annual Convention.

Schmidt, H. (1976). PK effects on pre-recorded targets. Journal of the American Society for Psychical Research, 70, 267-291.

Schmidt, H., & Terry, J. (1977). Search for a relationship between brainwaves and PK performance. Research in Parapsychology 1976, 30-32.

Schmidt, H. (1978). Use of stroboscopic light as rewarding feedback in a PK test with pre-recorded and momentarily generated random events. Program and Presented Papers: Parapsychological Association 21st Annual Convention, 85-96.

Schmidt, H. (1990). Correlation between mental processes and external random events. Journal of Scientific Exploration, 4, 233-241.


Schouten, S. A. (1977). Testing some implications of a PK observational theory. European Journal of Parapsychology, 1, 21-31.

Stanford, R. G. (1981). "Associative activation of the unconscious" and "visualization" as methods for influencing the PK target: A second study. Journal of the American Society for Psychical Research, 75, 229-240.

Stanford, R. G., & Kottoor, T. M. (1985). Disruption of attention and PK-task performance. Proceedings of Presented Papers: The Parapsychological Association 28th Annual Convention, 117-132.

Stewart, W. C. (1959). Three new ESP test machines and some preliminary results. Journal of Parapsychology, 23, 44-48.

Talbert, R., & Debes, J. (1981). Time-displacement psychokinetic effects on a random number generator using varying amounts of feedback. Program and Presented Papers: Parapsychological Association 24th Annual Convention.

Tedder, W., & Braud, W. (1981). Long-distance, nocturnal psychokinesis. Research in Parapsychology 1980, 100-101.

Tedder, W. (1984). Computer-based long distance ESP: An exploratory examination. Research in Parapsychology 1983, 100-101.

Thouless, R. H. (1971). Experiments on psi self-training with Dr. Schmidt's precognitive apparatus. Journal of the Society for Psychical Research, 46, 15-21.

Tremmel, L., & Honorton, C. (1980). Directional PK effects with a computer-based random generator system: A preliminary study. Research in Parapsychology 1979, 69-71.


Varvoglis, M. P. (1988). A "psychic contest" using a computer-RNG task in a non-laboratory setting. Proceedings of Presented Papers: The Parapsychological Association 31st Annual Convention, 36-52.

Verbraak, A. (1981). Onafhankelijke random bit generator experimenten - Verbraak-replicatie [Independent random bit generator experiments: Verbraak's replication]. SRU-Bulletin, 6, 134-139.

Weiner, D. H., & Bierman, D. J. (1979). An observer effect in data analysis? Research in Parapsychology 1978, 57-58.

Winnett, R. (1977). Effects of meditation and feedback on psychokinetic performance: Results with practitioners of Ajapa Yoga. Research in Parapsychology 1976, 97-98.


Table 1
Main Results of Radin & Ferrari's (1991) Dice Meta-Analysis

                                         N    t̄        SE      z
Dice-casts "influenced"
  All studies                            148  0.50610  .00031  19.68***
  All studies, quality weighted          148  0.50362  .00036  10.18***
  Balanced studies                       69   0.50431  .00055  7.83***
  Balanced studies, homogeneous          59   0.50158  .00061  2.60**
  Balanced studies, homogeneous,
    quality weighted                     59   0.50147  .00063  2.33*
Dice-casts control
  All studies                            31   0.50047  .00128  0.36

Note. The z-value is based on the null value of t̄ = .5.
* p < .05. ** p < .01. *** p < .001
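The z statistics in Tables 1 and 2 follow directly from a mean effect size and its standard error, tested against the null value of .5. A minimal sketch in Python (our illustration, not the original analysis code):

# Minimal sketch: recovering a z statistic from a mean hit proportion
# and its standard error, against the null value of .5.
def z_score(mean_hit_rate, se, null_value=0.5):
    """z = (observed mean - null value) / standard error."""
    return (mean_hit_rate - null_value) / se

# Example: the "all studies" row of Table 1.
print(round(z_score(0.50610, 0.00031), 2))  # 19.68, as tabled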


Table 2
Previous PK Meta-Analyses - Total Samples

                                    N    π_o     SE      z         π_mean
Dice
  1991 Meta-analysis                148  .50822  .00041  20.23***  .51105
RNG
  1989 First meta-analysis          597  .50018  .00003  6.53***   .50414
  1997 First MA without PEAR data   339  .50061  .00009  6.41***   .50701
  2000 Second meta-analysis         515  .50005  .00001  3.81***   .50568

Note. π_mean = the unweighted averaged effect size of studies.
*** p < .001


Table 3
Basic Study Characteristics - Intentional Studies

Variable and class                Studies (n)
Source of studies
  Journal                         276
  Conference proceeding           63
  Report                          16
  Thesis/Dissertation             2
Year of publication
  ≤ 1960                          1
  1961 - 1970                     2
  1971 - 1980                     202
  1981 - 1990                     101
  1991 - 2000                     37
  > 2000                          14
Number of participants
  1                               91
  >1 - 10                         101
  >10 - 20                        52
  >20 - 30                        33
  >30 - 40                        13
  >40 - 50                        12
  >50 - 60                        12
  >60 - 70                        2
  >70 - 80                        2
  >80 - 90                        7
  >90 - 100                       2
  >100                            3
Sample size (bit)
  >10 - 100                       8
  >100 - 1,000                    74
  >1,000 - 10,000                 123
  >10,000 - 100,000               91
  >100,000 - 1,000,000            42
  >1,000,000 - 10,000,000         8
  >10,000,000 - 100,000,000       7
  >100,000,000 - 1,000,000,000    3
  >1,000,000,000 - 10,000,000,000 1

Note. The number of participants was unknown in 27 studies (cf. Table 4); the participant categories therefore sum to 330 of the 357 studies.


Table 4
Overall, Trimmed Sample and Moderator Variables Summary Statistics

Variable and class       n    π_o      SE       z         M bits     Mdn bits  M year  χ²
Intentional overall      357  .500029  .000011  2.73**    6095359    6400      1979    1442.89***
Trimmed
  Homogeneous            287  .500017  .000011  1.54      6824190    6400      1981    323.96
  Heterogeneous          70   .500136  .000034  4.02***   3107153    6593      1976    1107.88***
Sample size (bit)
  (Q1) Smallest          89   .523500  .002655  8.85***   446        320       1979    318.92***
  (Q2) Small             91   .505519  .000914  6.03***   3683       4000      1978    269.52***
  (Q3) Large             91   .503249  .000421  7.71***   16726      15360     1979    419.04***
  (Q4) Largest           86   .500026  .000011  2.43*     25280771   200000    1983    262.79***
Year of publication
  (Q1) Oldest            90   .505356  .000394  13.59***  18553      4364      1971    636.32***
  (Q2) Old               97   .500178  .000147  1.21      120378     8000      1977    255.24***
  (Q3) New               85   .500292  .000247  1.18      52993      7500      1981    122.17**
  (Q4) Newest            85   .500024  .000011  2.23*     25390500   18000     1991    244.02***
Number of participants
  (Q1) One               91   .500453  .000168  2.70**    116550     5000      1980    632.14***
  (Q2) Few               101  .500043  .000045  .96       1232867    4800      1978    330.26***
  (Q3) Several           57   .499986  .000039  -.35      2863035    12288     1980    168.60***
  (Q4) Many              81   .500024  .000012  2.10*     22955969   18000     1982    134.18***
  Unknown                27   .500627  .000117  5.35***   677453     7500      1979    143.80***
Participants
  Selected               55   .500628  .000185  3.40***   134727     5000      1977    579.62***
  Unselected             244  .500027  .000011  2.53*     8881236    10321     1980    675.64***
  Other                  58   .500237  .000408  .58       27790      1280      1980    176.85***
Study status
  Formal                 192  .500027  .000011  2.46*     10744544   7727      1982    616.39***
  Pilot                  152  .500061  .000049  1.26      698876     6400      1978    801.66***
  Other                  13   .500202  .000191  1.06      527819     5000      1977    23.55*
Feedback
  Visual                 221  .500017  .000012  1.49      8370435    5000      1979    814.63***
  Auditory               35   .502357  .000382  6.18***   50385      17000     1976    254.83***
  Other                  101  .500085  .000028  3.06**    3212016    10000     1983    331.14***
Random sources
  Noise                  194  .500028  .000011  2.60**    11202736   11456     1983    852.46***
  Radioactive            96   .502357  .000544  4.33***   10442      1600      1973    459.53***
  Other                  67   .500766  .000389  1.97*     25523      10800     1979    109.01***

Note. π_o = combined (weighted) effect size; M bits/Mdn bits = mean/median study sample size in bits; M year = mean year of publication; χ² = test of effect size homogeneity within the class (df = n - 1).
* p < .05. ** p < .01. *** p < .001
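The χ² column reports a test of effect size homogeneity within each class. A standard statistic of this kind is Cochran's Q, which is χ²-distributed with n - 1 degrees of freedom under homogeneity; the sketch below (our illustration with made-up numbers, not the authors' code) shows the computation from per-study effect sizes and standard errors:

# Illustrative sketch: Cochran's Q homogeneity test, the kind of chi-square
# statistic reported per class in Tables 4 and 5. Each study i is summarized
# by a hit proportion p_i and a standard error se_i.
def cochran_q(proportions, standard_errors):
    """Q = sum of w_i * (p_i - pooled)^2 with inverse-variance weights w_i."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * p for w, p in zip(weights, proportions)) / sum(weights)
    return sum(w * (p - pooled) ** 2 for w, p in zip(weights, proportions))

# Hypothetical three-study class; compare the result against a chi-square
# distribution with 2 degrees of freedom.
print(round(cochran_q([0.52, 0.50, 0.501], [0.01, 0.002, 0.001]), 2))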


Table 5
Safeguard Variables Summary Statistics

Variable and class      n    π_o      SE       z        M bits    Mdn bits  M year  M sub  Mdn sub  χ²
RNG control
  Yes (2)               233  .500064  .000043  1.48     583819    6400      1980    29     17       734.71***
  Earlier (1)           45   .500026  .000011  2.32*    45275647  25600     1983    18     10       173.27***
  No (0)                79   .501111  .000314  3.53***  33029     4400      1977    3      2        522.33***
All data reported
  Yes (2)               246  .500031  .000011  2.82**   8448064   10000     1981    21     10       866.46***
  Unclear (1)           7    .499996  .000052  -.08     13471208  1000      1982    18     8        286.75***
  No (0)                104  .499979  .000377  -.06     33856     4048      1977    12     10       289.22***
Split of data
  Preplanned (2)        289  .500028  .000011  2.58**   7477090   6400      1980    43     10       1352.94***
  Unclear (1)           11   .501074  .000537  2.00*    80726     37000     1976    19     10       16.75
  Not preplanned (0)    57   .500204  .000134  1.52     250459    7200      1978    14     6        67.71
Safeguard sum-score
  Sum = 6 (highest)     138  .500285  .000098  2.90**   187695    6144      1982    26     10       455.05***
  Sum = 5               47   .500025  .000011  2.27*    45373113  48000     1983    21     10       210.59***
  Sum = 4               104  .500178  .000137  1.30     143255    6400      1979    30     10       391.64***
  Sum = 3               46   .500303  .000339  .89      55091     1600      1976    18     10       105.72***
  Sum = 2               8    .517376  .002755  6.31***  4185      1472      1978    2      1        220.96***
  Sum = 1               4    .500000  .026352  .00      120       120       1977    10     10       .00
  Sum = 0 (lowest)      10   .503252  .001344  2.42*    13840     16000     1975    28     10       4.75

Note. Safeguard codings are scored 0-2 (values in parentheses); the sum-score is the total of the three codings. M sub/Mdn sub = mean/median number of subjects; other columns as in Table 4.
* p < .05. ** p < .01. *** p < .001
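The safeguard sum-score of Table 5 is simply the total of the three safeguard codings, each scored 0-2 as shown in parentheses in the row labels. A trivial sketch:

# Sketch of the 0-6 safeguard sum-score implied by Table 5's row labels:
# each safeguard (RNG control, all data reported, split of data) is coded 0-2.
def safeguard_sum(rng_control, all_data_reported, split_of_data):
    for score in (rng_control, all_data_reported, split_of_data):
        assert score in (0, 1, 2)
    return rng_control + all_data_reported + split_of_data

# A study with an RNG control (2), all data reported (2), and an unclear
# split of data (1) scores 5.
print(safeguard_sum(2, 2, 1))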


Table 6
Summary of Weighted Stepwise Linear Meta-Regression Analysis for Variables Predicting Effect Size of RNG Studies

Step and variable        B         SE B     β      t          R²
Step 1                                                        .032
  Year of publication    -.000025  .000007  -.180  -3.45***
Step 2                                                        .048
  Year of publication    -.000021  .000007  -.153  -2.88**
  Auditory feedback      .001856   .000770  .128   2.41*

* p < .05. ** p < .01. *** p < .001
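A weighted meta-regression of this kind can be reproduced as weighted least squares with inverse-variance weights. The sketch below uses statsmodels' WLS on four hypothetical data points (the study-level data are not reprinted here); it is one possible implementation, not the authors' procedure:

# Illustrative weighted meta-regression: effect size (pi) regressed on year
# of publication, weighting each study by the inverse of its squared SE.
import statsmodels.api as sm

effect_sizes = [0.5235, 0.5055, 0.5032, 0.5000]  # hypothetical pi values
years = [1972, 1978, 1984, 1996]                 # hypothetical years
ses = [0.0027, 0.0009, 0.0004, 0.0001]           # hypothetical SEs

X = sm.add_constant(years)                       # intercept + predictor
weights = [1.0 / se ** 2 for se in ses]          # inverse-variance weights
fit = sm.WLS(effect_sizes, X, weights=weights).fit()
print(fit.params)                                # slope = change in pi per year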


Table 7
Potential Sources of the Small-Study Effect

True heterogeneity
  Different intensity/quality
  Different participants
  Different feedback
  Other moderator(s)
Data irregularities
  Poor methodological design
  Effect intrinsically irreplicable
  Inadequate analysis
  Fraud
Selection biases
  Biased inclusion criteria
  Publication bias
Chance

Note. The table is adapted from Sterne, Egger, & Smith, 2001, p. 193.


Table 8
Stepped Weight Function Monte Carlo Simulation of Publication Bias

                   n    π_o       SE        z        Stud  z_Δ   χ²
Overall            357  0.500027  0.000021  2.48*    1453  0.36  592.19***
Sample size
  (Q1) Smallest    89   0.514658  0.004908  5.80***  364   1.58  116.52*
  (Q2) Small       91   0.505118  0.001692  5.92***  367   0.21  115.21*
  (Q3) Large       91   0.502462  0.000794  6.07***  372   0.88  113.27*
  (Q4) Largest     86   0.500024  0.000021  2.22*    350   0.08  138.64***

Note. Stud = number of studies which did not get published (simulated); z_Δ = difference between the effect sizes of the simulated and the experimental data.
* p < .05. ** p < .01. *** p < .001
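The paper's stepped weight function simulation is, as noted in the abstract, not fully specified. The sketch below is therefore only a guess at its general shape: studies are generated under the null hypothesis (π = .5), and each study is "published" with a probability that steps up with its z score. All step values and study counts here are assumptions.

# Guessed sketch of a stepped-weight-function publication bias simulation:
# under the null, selective publication alone inflates the mean published
# effect size, most strongly for small studies.
import math
import random

def simulate_published(n_studies, bits_per_study):
    published = []
    for _ in range(n_studies):
        hits = sum(random.random() < 0.5 for _ in range(bits_per_study))
        pi = hits / bits_per_study
        z = (pi - 0.5) / (0.5 / math.sqrt(bits_per_study))
        # Stepped weight function (hypothetical publication probabilities).
        weight = 1.0 if z > 1.96 else (0.5 if z > 0.0 else 0.1)
        if random.random() < weight:
            published.append(pi)
    return published

pis = simulate_published(2000, 100)
print(len(pis), sum(pis) / len(pis))  # published count and inflated mean pi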


Figure 1
Funnel Plot of the Intentional Studies

[Funnel plot: each study's effect size (π, x-axis, 0.35 to 0.65) plotted against its sample size in bits (y-axis, logarithmic, 10 to 10,000,000,000).]

Note. Three very extreme values (π > .70; n < 400) are omitted from the figure for the sake of a better representation of all other values.
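A funnel plot of this kind is produced by plotting each study's effect size against its sample size on a logarithmic axis; under the null hypothesis, small studies scatter widely while large studies converge on π = .5. A sketch with simulated data (not the study data):

# Sketch of a funnel plot like Figure 1, using simulated null-model studies.
import math
import random
import matplotlib.pyplot as plt

sizes = [int(10 ** random.uniform(2, 9)) for _ in range(300)]
pis = [0.5 + random.gauss(0, 0.5 / math.sqrt(n)) for n in sizes]

plt.scatter(pis, sizes, s=8)
plt.yscale("log")                  # sample size (bits), log scale
plt.axvline(0.5, linestyle="--")   # null value
plt.xlabel("Effect size (pi)")
plt.ylabel("Sample size (bits)")
plt.show()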