26 April 2010

Why We'll Never Enforce The Immigration Laws

This article and video at CNN is biased and completely unfair to the pro-enforcement position. It also demonstrates why the US will never strictly enforce its immigration laws (and, from my point of view, never should).

25 April 2010

Peer Review

I thought you might be interested in this announcement I've received:
[O]nly 8% members of the Scientific Research Society agreed that 'peer review works well as it is.' (Chubin and Hackett, 1990; p.192)

"A recent U.S. Supreme Court decision and an analysis of the peer review system substantiate complaints about this fundamental aspect of scientific research." (Horrobin, 2001)

Horrobin concludes that peer review "is a non-validated charade whose processes generate results little better than does chance." (Horrobin, 2001) This has been statistically proven and reported by an increasing number of journal editors.

But, "Peer Review is one of the sacred pillars of the scientific edifice" (Goodstein, 2000), it is a necessary condition in quality assurance for Scientific/Engineering publications, and "Peer Review is central to the organization of modern science…why not apply scientific [and engineering] methods to the peer review process" (Horrobin, 2001).

This is the purpose of The 2nd International Symposium on Peer Reviewing: ISPR 2010 (http://www.sysconfer.org/ispr) being organized in the context of The SUMMER 4th International Conference on Knowledge Generation, Communication and Management: KGCM 2010 (http://www.sysconfer.org/kgcm), which will be held on June 29th - July 2nd, in Orlando, Florida, USA.

I agree that peer review is a problem, but I think it's mostly a problem because we lie to ourselves and to the public about what purpose peer review serves. Peer review is about protecting what Kuhn called "normal science," by which he meant incrementalist, paradigmatic, non-revolutionary progress in understanding the world. Every once in a while, the paradigm shifts and normal science becomes impossible; the old understanding of the world is dead and a new paradigm is established. Kuhn says that scientists who worked in the old paradigm can't even work as scientists under the new paradigm, as their entire way of understanding the world has been undermined.

Peer review is meant to paper over the cracks and preserve the old paradigm as long as possible. Reviewers are gatekeepers, who allow into our best journals only those papers that sustain the current paradigm. In this way, scientists are trained only to propose and test incremental contributions to our understanding. Eventually, the cracks become too large and the old paradigm crumbles.

Peer review also promotes good methods and good analysis, although not best methods and best analysis. Moreover, methods and analysis are only two values among many. If the theory is interesting or the data is unusual, reviewers will let defects in methods and/or analysis slide. The real problem, though, is that peer review is in no way an audit of the paper or data despite our letting people assume that it is. If data is bad, no reviewer will be able to ferret that out, nor is review of the raw data a routine part of peer review. We assume that the authors did what they say they did, and dealt honestly with the data they found.

24 April 2010

Not A Triple Tautology

In explaining the phrase épater le bourgeois, Wikipedia uses the wonderful phrase "French Decadent poets." This suggests worlds I had no idea of: poets who aren't decadent, Frenchmen who aren't poets, and even French poets who aren't decadent. Who'd a thought it.

If You:

1. Are in the left-most lane;

2. Are traveling slower than the speed limit; and

3. Have no one to your right,

you are doing it wrong.

23 April 2010

Statistics III

The third installment of our little series on statistics is perhaps the most counter-intuitive lesson of all. The lesson is two-fold:

1. A sample represents the population of interest as a whole only if every member of that population had the same chance of being sampled, which is what "random sample" means. This holds regardless of how large the sample is.

2. Given a truly random sample, the accuracy of population estimates based on the sample depends only on the size of the sample, not on the size of the population.

("Sample" is used here to mean a subset of the population available for study, and thus not the population itself.)
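The second point is easy to check by simulation. What follows is a minimal sketch, not anything from a real study: the population is imaginary, the 30% share is an arbitrary assumption, and the function name spread_of_estimates is made up for illustration. It draws repeated random samples from a small and a large population and measures how much the estimates bounce around.

```python
import random
import statistics

def spread_of_estimates(population_size, sample_size, rng,
                        repeats=500, true_share=0.3):
    """Standard deviation of the sample estimate across repeated random samples.

    Hypothetical population: members 0..cutoff-1 have the trait of interest,
    so the true share is `true_share` (an assumed value).
    """
    cutoff = int(true_share * population_size)
    estimates = []
    for _ in range(repeats):
        # Every member has the same chance of being drawn (no replacement).
        sample = rng.sample(range(population_size), sample_size)
        estimates.append(sum(i < cutoff for i in sample) / sample_size)
    return statistics.stdev(estimates)

rng = random.Random(42)  # fixed seed so the run is repeatable
results = {
    (pop, n): spread_of_estimates(pop, n, rng)
    for pop in (10_000, 1_000_000)
    for n in (100, 400)
}
for (pop, n), spread in results.items():
    print(f"population {pop:>9,}  sample {n:>3}  spread of estimate {spread:.4f}")
```

The spread should be nearly the same for both populations at a given sample size, and should shrink as the sample grows. One caveat: this holds as long as the sample is a small fraction of the population; once the sample is a large share of the whole, the estimates get somewhat tighter than sample size alone would suggest.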

16 April 2010

Statistics II

Why don't statistics work backwards? There are actually many reasons, but three are key. First, statistics requires data and data requires theory. Second, theory is about causation and statistics are about correlation. Correlation does not imply causation. Third, it just can't work backwards.

1. You can't do statistics unless you have data, so data precedes statistics. But you can't collect data without some theory that tells you which data is relevant. There is lots of unstructured information available about the world. It can't be analyzed unless it's been sorted and classified, and since that necessarily must come before statistical analysis, it can only be sorted through theory. (Even with archival data, it only exists before it's relevant.) No one is ever going to regress corporate performance against CEO hair color because, even though the information is available, it makes no theoretical sense.

2. Theory is a proposed causal relationship between two constructs. Statistics depends upon the correlation between one or more independent variables (the possible cause(s)) and the dependent variable (the supposed effect). You can't get to theory from statistics because correlation does not imply causation, though causation requires correlation. There are lots of pairs of things that correlate, even very strongly, without being causally related. Sometimes that's because both are caused by some third construct. Consider, for example, this graph of the relationship between the number of lemons imported from Mexico and US highway fatalities:



(I stole this graph from Derek Lowe, who in turn stole it from a letter to the editor in the Journal of Chemical Information and Modeling (Johnson, 2008).)

"R2 = 0.97" means that Mexican lemons explain approximately 97% of the variability in US highway fatalities over time, an almost unheard of result in the social sciences and much better than lots of studies that have caused otherwise rational people to upset their way of life. Why shouldn't we, then, repeal the traffic laws and just start importing lemons willy-nilly? Because theory tells us that lemons don't cause a reduction in highway deaths, no matter what statistics tells us.

What's really going on in this graph is that we've gotten richer over time, mostly as a result of increased productivity resulting from advanced technology. Richer means that we're importing more luxuries, like Mexican lemons. Advanced technology means that cars can be safer and richer means that safer cars are affordable. (For all I know, it might also be that technology advances have made importing lemons from Mexico easier and cheaper, increasing demand.) In any event, imported lemons and highway fatalities only correlate with each other because they both have causal relationships with a missing factor or two. Only theory can tell us whether a factor is missing and what it might be. Even then, it usually doesn't.
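The missing-factor story can be sketched with made-up numbers. Nothing below is the actual lemon or highway data: the variable names, the noise levels, and the idea of a single "wealth" score are all assumptions chosen for illustration. Two series are each driven by the same third factor and never by each other.

```python
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residuals(y, x):
    """What is left of y after removing its straight-line fit on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

rng = random.Random(1)  # fixed seed so the run is repeatable
# Synthetic third factor; both other series depend on it and not on each other.
wealth = [rng.uniform(0, 10) for _ in range(200)]
lemons = [w + rng.gauss(0, 0.5) for w in wealth]
fatalities = [-w + rng.gauss(0, 0.5) for w in wealth]

r_raw = pearson_r(lemons, fatalities)
r_partial = pearson_r(residuals(lemons, wealth), residuals(fatalities, wealth))
print(f"R squared, lemons vs fatalities:      {r_raw ** 2:.2f}")
print(f"correlation after removing wealth:    {r_partial:.2f}")
```

The raw R squared comes out very high even though neither series touches the other, and the correlation largely disappears once the shared factor is taken out. Of course, the code only knows to remove "wealth" because I told it to, which is the point: the statistics can't name the missing factor for you.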

3. The third reason we can't work backward is that that's just not how it works. This has to do with our acceptance of false positives, and with null hypothesis testing, so it's a little more complicated.

Statistics works by trying to figure out, if our two data sets really were random rather than systematically different based on the independent variable of interest, how likely it would be that we would get that distribution. Let's say we suspected, for example, that altitude affects whether a coin lands heads or tails. To test our theory, we flip a coin at ground level and at the summit of Mt. Everest. On the ground, we get 48 heads and 52 tails. At altitude, we get 62 heads and 38 tails. How likely is it that our distribution could happen at random? As it happens, I can test that: the chance of getting that distribution randomly is 0.047, or just under 5%. If I had gotten 61 heads on Everest, the chance would have risen to 6.5%.
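The post doesn't say which test produced those numbers. As a sketch, assuming a plain chi-square test on the 2x2 table of heads and tails (one degree of freedom, no continuity correction), the figures come out close to the ones quoted. The function name chi_square_p is just a label for this example.

```python
import math

def chi_square_p(heads_a, tails_a, heads_b, tails_b):
    """p-value of a 2x2 chi-square test (1 degree of freedom, no correction).

    Assumption: this is one common test for comparing two sets of coin flips;
    the original text does not name the test it used.
    """
    n = heads_a + tails_a + heads_b + tails_b
    row_a = heads_a + tails_a
    row_b = heads_b + tails_b
    col_heads = heads_a + heads_b
    col_tails = tails_a + tails_b
    chi2 = (n * (heads_a * tails_b - tails_a * heads_b) ** 2
            / (row_a * row_b * col_heads * col_tails))
    # For one degree of freedom, the upper tail of chi-square is erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

p_62_heads = chi_square_p(48, 52, 62, 38)  # ground vs. 62 heads at altitude
p_61_heads = chi_square_p(48, 52, 61, 39)  # ground vs. 61 heads at altitude
print(f"62 heads on Everest: p = {p_62_heads:.3f}")
print(f"61 heads on Everest: p = {p_61_heads:.3f}")
```

A single flip landing the other way moves the result from one side of the 5% line to the other.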

As most of you know, the convention in science is that if the probability of a particular distribution being random is less than 5%, we can claim that the two sets of numbers are significantly different. The first thing to note is that this is only a convention. It is entirely arbitrary; there is no theory that makes 0.05 better than 0.04 or 0.06. If it were higher, we'd have more false positives; if it were lower, we'd have more false negatives. Just like deciding whether the speed limit should be 55 or 65, in the end all you can do is pick one.

The second thing to note is that this is not the same as saying that the chance that my result is random (and thus wrong) is 0.047. Implicitly, statistical tests compare my actual hypothesis (altitude causes a change in coin flipping results) to an opposite "null" hypothesis (altitude doesn't cause a change in coin flipping results). Generically, the hypothesis being tested is always "these two sets of numbers are significantly different" and the null hypothesis is always "these two sets of numbers are not significantly different." So the 0.047 really means, "the chance that my results are a false positive is 4.7%, if the null hypothesis is true." Of course, if the null hypothesis is false, then my theory is true and my results are necessarily not a false positive. In the real world, we can't know whether my theory is true separate from statistics, but strong theory, well founded on prior results, should be true. In other words, if my theory is convincing, the total chance that my results are a false positive is much less than 5%. (In some fields, the chance that my results are a false positive will be further reduced by replication, but in management we don't do replication.)
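To put rough numbers on "much less than 5%": a false positive can only happen when the null hypothesis is actually true, so the total chance is the chance the theory is wrong times the 5% threshold. The prior probabilities below are pure assumptions for illustration; nobody can actually look them up.

```python
ALPHA = 0.05  # the conventional significance threshold

def total_false_positive_rate(prob_theory_true, alpha=ALPHA):
    """Overall chance of a significant-but-false result for one a priori test.

    A false positive requires the null to be true (theory wrong), and then
    it occurs with probability alpha.
    """
    return (1 - prob_theory_true) * alpha

# Hypothetical degrees of confidence in the theory before testing it.
for prior in (0.0, 0.5, 0.8, 0.95):
    rate = total_false_positive_rate(prior)
    print(f"chance theory is true {prior:.2f} -> total false positive chance {rate:.4f}")
```

With a theory I'd give four-to-one odds on, the total chance of a false positive is about 1%, not 5%.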

Now, what if we try to work backwards? The problem is that, for every hundred random regressions I do, I'll get (by definition) an average of five "significant" but false results even if there's no actual relationship in my data. Once, simply doing the regressions was onerous and something of a brake on this kind of fishing, but now, if I have the data on my computer, I can easily do 100 regressions in an hour. If I do a thousand regressions, I'll have about 50 that are significant, and about 10 that look really significant (p < 0.01). (One of the annoying things about popular science reporting is the focus on really small values of p. For reasons not worth going into here, if the probability of the null hypothesis being true is less than 5%, it really doesn't matter how small it is.) It is much, much, much more likely (actually, all but certain) that I'll find significance when fishing around than when testing theory. If we work backwards, we have no way of knowing what the chance of a false positive is.

[Because I want to make sure this point is clear, I'm going to beat it over the head a bit. If I come up with a theory and then test it, I know that the chance that my results are random (a false positive) is no more than 5%. Because my theory is strong and based on prior research -- and to be published it has to be vetted by experienced scholars in the field -- I know that the real chance of a false positive is actually quite a bit less than 5%. But if I work backwards, I have no idea what the chance of a false positive is. The particular relationship I found is less than 5% likely to be random, but I know -- by definition -- that if I do 100 regressions on completely random data, I should expect about five of them to be sufficiently unlikely (p < 0.05). In other words, when I'm fishing, the chance of at least one false positive is all but 100% (about 99.4% for 100 independent regressions), even though it is still true that the chance of any particular significant result being false is less than 5%. If I take that result and then fit a theory to it, I have no idea what the real chance of that result being a false positive is. Inherent in the math behind statistics is the assumption that I am testing a relationship I theorized a priori. That assumption is necessarily violated if I find the relationship and then develop the theory.]
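Here is what fishing looks like on data with nothing in it. This is a sketch under stated assumptions: instead of regressions it uses a simple two-sample z-test on made-up normal data with a known spread, and the function name null_test_p is hypothetical. Every comparison is between two sets of numbers drawn from exactly the same source, so any "significant" result is false by construction.

```python
import math
import random

def null_test_p(n, rng):
    """p-value from comparing two samples that come from the same distribution.

    Uses a two-sided z-test with known standard deviation 1 (an assumption
    made to keep the example small), so the null hypothesis is true.
    """
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)
    return math.erfc(abs(z) / math.sqrt(2))

rng = random.Random(0)  # fixed seed so the run is repeatable
p_values = [null_test_p(50, rng) for _ in range(1000)]

count_05 = sum(p < 0.05 for p in p_values)
count_01 = sum(p < 0.01 for p in p_values)
print(f"'significant' at 0.05: {count_05} of 1000")
print(f"'significant' at 0.01: {count_01} of 1000")

# Chance that 100 independent tests on pure noise yield at least one p < 0.05.
prob_at_least_one = 1 - 0.95 ** 100
print(f"chance of at least one false positive in 100 tests: {prob_at_least_one:.3f}")
```

The counts should land somewhere near 50 and 10, and every one of those results would look perfectly publishable if I reported it alone and wrote the theory afterward.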

Why can't we just go find significance and then go see if we can develop strong theory? Because we can always find a theory to fit if we know the end point.

07 April 2010

Statistics I

Since it's become clear that I'm not going to write one long post on statistics, I've decided to write an infinite series of short posts on statistics.

Don't say I didn't warn you.

In this post, I'll start with the purpose of statistics: the purpose of statistics is to tell us whether two sets of numbers, that differ based on some characteristic that we suspect might be important, are really different. Although it can get very sophisticated (for example, we sometimes don't know the second set of numbers), that's really all statistics does or can do.

In particular, statistics can't work backwards. We can't see that two sets of numbers are different, and then work our way back to figure out what the difference is. Theory must drive statistics, but statistics can't drive theory.

Next time, what alpha means, and what what alpha means means.