February 21, 2012
With respect to both planetary position and to precession of the equinoxes, predictions made with Ptolemy’s system never quite conformed with the best available observations. Further reduction of those minor discrepancies constituted many of the principal problems of normal astronomical research for many of Ptolemy’s successors… Given a particular discrepancy, astronomers were invariably able to eliminate it by making some particular adjustment in Ptolemy’s system of compounded circles. But as time went on, a man looking at the net result of the normal research effort of many astronomers could observe that astronomy’s complexity was increasing far more rapidly than its accuracy and that a discrepancy corrected in one place was likely to show up in another.
Say you have a mathematical model that predicts some system you have no control over. All models and all measurements are wrong, so you’ll never get complete agreement between theory and observations. At some point, though, you’re sure enough about your measurements that you think the inaccuracy is in the model. You can always add ad hoc terms to a model. If you’re doing regression, for example, you can keep adding terms until you get a sufficiently good fit. Given enough degrees of freedom, you can fit any finite number of observations exactly, though I guess they didn’t have enough computing power to do this in the 15th century. Why is this kind of model complexity bad? It defeats the point of falsification, for one thing: a model that can be adjusted to agree with any observation can never be contradicted by one.
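A minimal sketch of the "enough degrees of freedom" point, using made-up observations and Lagrange interpolation as a stand-in for adding regression terms (the function name `lagrange_fit` and the data are my own, purely for illustration). With as many coefficients as data points, the fit is perfect no matter what the data are, which is exactly why a perfect fit tells you nothing.

```python
def lagrange_fit(xs, ys):
    """Return the unique polynomial of degree < len(xs) through every (x, y)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    # Each basis factor is 1 at xi and 0 at every other node.
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


# Hypothetical observations with no underlying law at all.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, -1.0, 4.0, 0.5, -3.0, 7.0]

# Six points, six coefficients: the model agrees with every observation.
model = lagrange_fit(xs, ys)
max_residual = max(abs(model(x) - y) for x, y in zip(xs, ys))

# The fit says nothing about how far to trust a prediction at a new point.
new_prediction = model(6.0)

print("largest residual on the fitted points:", max_residual)
print("prediction at x = 6:", new_prediction)
```

Swap in any other `ys` and the largest residual stays at zero (up to rounding), so no observation could ever count against a model built this way.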
Even Copernicus’ more elaborate proposal was neither simpler nor more accurate than Ptolemy’s system. Available observational tests, as we shall see more clearly below, provided no basis for a choice between them. Under these circumstances, one of the factors that led astronomers to Copernicus… was the recognized crisis that had been responsible for innovation in the first place. Ptolemaic astronomy had failed to solve its problems; the time had come to give a competitor a chance.
The trouble with falsification is not just that every theory is false, it’s that just about every theory is demonstrably false. You can show that a particular method falls down and practitioners will no-sell this. Instead your burden is to show that the theory is no longer useful, which is a lot harder. On the other hand, falsification does win (usually, maybe) in the very long run: heliocentrism eventually triumphed over Ptolemy’s system because it fits observation better, even if that wasn’t clear in Copernicus’ time. Crisis generates theories, but in itself, it doesn’t determine which theory wins.
February 13, 2012
Often, viewing all [scientific] fields together, it seems instead a rather ramshackle structure with little coherence among its various parts… [S]ubstituting paradigms for rules should make the diversity of scientific fields and specialties easier to understand. Explicit rules, when they exist, are usually common to a very broad scientific group, but paradigms need not be.
Kuhn’s use of “rules” is broad, encompassing everything from Newton’s laws to standards for measurement. The idea that one has to show statistical significance at level 0.05 is a rule adopted by a disparate range of fields, but this doesn’t mean these fields share a paradigm. To ponder: in what ways are rules used to legitimise new paradigms?
The scientific enterprise as a whole does from time to time prove useful, open up new territory, display order, and test long-accepted belief. Nevertheless, the individual engaged on a normal research problem is almost never doing any one of these things… What then challenges him is the conviction that, if only he is skillful enough, he will succeed in solving a puzzle that no one before has solved or solved so well.
This again relies on a narrow definition of the “scientific enterprise”, excluding medicine, for instance, which is often useful. As for testing “long-accepted belief”, there’s a whole industry of contrarians, sometimes including me, who do this. While we would love for those within the paradigm to listen to us, we kind of doubt they will, and instead seek acclamation from those in other paradigms, or more perniciously, the media. Do we fall within Kuhn’s “scientific enterprise” or not?
February 10, 2012
In the sciences (though not in fields like medicine, technology, and law, of which the principal raison d’être is an external social need), the formation of specialized journals, the foundation of specialists’ societies, and the claim for a special place in the curriculum have usually been associated with a group’s first reception of a single paradigm.
The part of this that bears further thought is the parenthesised aside. Consider macroeconomics. There’s a bunch of journals that are incomprehensible to those who haven’t learned the math and the jargon. Yet instead of a dominant paradigm, there remain a number of competing candidates that are almost, but not quite, incomprehensible to adherents of their rivals. Has macro failed to elevate one paradigm because social science data doesn’t allow a definitive winner? Or is it because the policy stakes are high?
When the individual scientist can take a paradigm for granted, he need no longer, in his major works, attempt to build his field anew, starting from first principles and justifying the use of each concept introduced. That can be left to the writer of textbooks. Given a textbook, however, the creative scientist can begin his research where it leaves off and thus concentrate exclusively upon the subtlest and most esoteric aspects of the natural phenomena that concern his group.
Does this mean that if you want to have influence on a field, you should write a text? Depends on the field. A textbook writer in physics doesn’t have much latitude as to what to prioritise and what to downplay: the paradigm is settled. A textbook writer in (frequentist) statistics has more room to manoeuvre. You have to talk about t-tests, because the field expects you to, but you can downplay them in a way you can’t downplay F = ma.
December 22, 2011
I, by contrast, propose that the first thing to be taken into account should be the severity of tests… And I hold that what ultimately decides the fate of a theory is the result of a test, i.e. an agreement about basic statements… for me the choice is decisively influenced by the application of the theory and the acceptance of the basic statements in connection with this application…
This is in opposition to preferring the simple on an aesthetic basis. More importantly, Popper suggests we agree on the basic statements, and not universals.
He draws a long analogy to trial by jury:
The verdict is reached in accordance with a procedure which is governed by rules. These rules are based on certain fundamental principles which are chiefly, if not solely, designed to result in the discovery of objective truth. They sometimes leave room not only for subjective convictions but even for subjective bias.
The ideal these days, I guess, is that everyone can play juror if data are made available. Of course, taking data as basic (or near-basic) statements requires a decision.
The empirical basis of objective science has thus nothing ‘absolute’ about it. Science does not rest upon solid bedrock. The bold structure of its theories rises, as it were, above a swamp.
I see no reason not to believe this. The question is, then, to what extent can the theories built upon the swamp be objective — in particular, when most measurements have an associated error? We need to get into Popper’s treatment of probability before we can deal with this question.
December 16, 2011
[The scientist's] aim is to find explanatory theories (if possible, true explanatory theories); that is to say, theories which describe certain structural properties of the world, and which permit us to deduce, with the help of initial conditions, the effects to be explained.
“Initial conditions” are singular statements that apply to a specific event in question. Combining these with universal laws produces predictions. Popper doesn’t require that every event can be deductively predicted from universal laws. But science has to search for such laws that causally explain events. Popper contends that while scientific laws are not verifiable, they are falsifiable.
One angle from which the primacy of falsification might be challenged is instrumentalism. Berkeley suggested abstract theories are instruments for the prediction of observable phenomena, and not genuine assertions about the world. The difference is that between “all models are wrong” and “all models are falsifiable”.
Popper rejects instrumentalism because everybody uses abstract properties in ordinary speech.
There is no sharp dividing line between an ‘empirical language’ and a ‘theoretical language’: we are theorizing all the time, even when we make the most trivial singular statement.
We are always using models, so we’re always wrong. Personally, I can live with this. Under instrumentalism, the crucial question becomes “how wrong”. As long as measurements are taken to be real features of the world, the answer to this can be used in falsificationism.
But what if measurements are dependent on assumptions? This is an implication of conventionalism. Duhem held that universal laws are merely human conventions. Since measurements depend on these laws, a conventionalist might argue that theoretical systems are not only unverifiable but also unfalsifiable. Popper makes a value judgement against conventionalism, not because it’s demonstrably wrong but because it allows explaining away, rendering it useless for science. He quotes Joseph Black:
A nice adaptation of conditions will make almost any hypothesis agree with the phenomena. This will please the imagination but does not advance our knowledge.
Statistics makes such adaptation even easier: the phenomena were merely improbable. The rise of probabilistic models makes it even more valuable to guard against ad hoc adaptations.
December 14, 2011
I’m finding important contrasts between The Logic of Scientific Discovery and my fourth-hand preconceptions of the book. Popper differentiates between four kinds of tests:
- “the logical comparison of the conclusions among themselves, by which the internal consistency of the system is tested”
- “the investigation of the logical form of the theory, with the object of determining whether it has the character of an empirical or scientific theory”
- “the comparison with other theories, chiefly with the aim of determining whether the theory would constitute a scientific advance should it survive our various tests”
- “the testing of the theory by way of empirical applications of the conclusions which can be derived from it”
The demarcation problem (“finding a criterion which would enable us to distinguish between the empirical sciences on the one hand, and mathematics and logic as well as ‘metaphysical’ systems on the other”) is something I think about a lot. I hadn’t previously connected this to the induction problem, and will have to think about whether accepting a convention for demarcation lets us build science without induction.
Popper says that scientific statements are objective in the sense that they can be criticised “inter-subjectively”. In practice this seems to mean that other scientists can test the statements. This means “there can be no ultimate statements in science”, which I am satisfied with.
November 30, 2011
(riffing on this course announcement)
There are oceans of data out there. The human ability to think in a million dimensions is limited, and that’s where models come in. All of statistics could, if you wished, be reframed in terms of models. An average is just the result of a model where every individual is assigned the same value. This is reductionist, of course, and we strive for a useful combination of simplicity and accuracy.
But all models are wrong (otherwise they’re not models), and we need to know how wrong they are. This is where statistics comes into its own: answering the question “how wrong”. When we’re modelling a particular data set, we can state exactly how wrong by calculating residuals. Usually, however, we also want to summarise how wrong in a single number. So we have measures like the standard deviation and the root mean squared error. It can also be useful to examine how accurately the RMS error represents the residuals, but quantifying this can easily lead to a sinkhole.
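To make "an average is a model" and "how wrong" concrete, here is a small sketch on made-up numbers (the data and variable names are assumptions for illustration only). The model assigns every individual the mean; the residuals say exactly how wrong it is for each one; the RMS error boils that down to one number, which for this particular model coincides with the population standard deviation.

```python
import math
import statistics

# Hypothetical data set.
data = [4.0, 7.0, 5.0, 9.0, 5.0]

# The model: every individual is assigned the same value, the mean.
prediction = statistics.fmean(data)

# Exactly how wrong, one residual per observation.
residuals = [y - prediction for y in data]

# A one-number summary of how wrong: root mean squared error.
rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))

# For the mean model, the RMS error is the population standard deviation.
pop_sd = statistics.pstdev(data)

print("prediction:", prediction)
print("residuals:", residuals)
print("RMS error:", rmse)
print("population SD:", pop_sd)
```

The summary throws information away: very different sets of residuals can share the same RMS error, which is one reason to look at the residuals themselves as well.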
Assessing the accuracy of predictions made by a model is a different matter. Consider the case where you have data from a nice stationary process. The most reliable way of dealing with this is splitting data into training and test sets, though there are shortcuts that may work well in the right circumstances.
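A rough sketch of the training/test idea, assuming a simulated stationary process (independent draws around a fixed level; the seed, sample size, and 70/30 split are arbitrary choices of mine). The model is fitted on the training set only, and its error is then measured on data it has never seen.

```python
import math
import random


def rmse(errors):
    """Root mean squared error of a list of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


rng = random.Random(0)

# Hypothetical stationary process: independent noise around a fixed level.
data = [10.0 + rng.gauss(0.0, 2.0) for _ in range(200)]

# Random split. Shuffling is only reasonable because the draws are
# independent; for time-ordered data you would hold out the later part.
indices = list(range(len(data)))
rng.shuffle(indices)
cut = len(data) * 7 // 10
train = [data[i] for i in indices[:cut]]
test = [data[i] for i in indices[cut:]]

# Fit the simplest possible model (a constant level) on the training set only.
level = sum(train) / len(train)

# In-sample error versus error on data the model never saw.
train_rmse = rmse([y - level for y in train])
test_rmse = rmse([y - level for y in test])

print("fitted level:", level)
print("training RMS error:", train_rmse)
print("test RMS error:", test_rmse)
```

With a model this simple the two errors should be close; the gap between them is what grows when a model has been tuned too closely to the training data.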
In many cases, the data aren’t so nice. This is where you need to be very careful about how much you trust not just your models, but also the models underlying your error assessments. It’s not enough, for instance, to select between a null and an alternative model if neither is particularly close to the truth. Subject matter knowledge is crucial here. Statisticians should do a better job of helping subject matter experts out.
Conclusion: The value of models seems self-evident. If I were teaching a course on models, I would be tempted for it to consist entirely of repetitions of “all models are wrong”. Would have to think hard about how to make it more constructive than that.