Kuhn vs Copernicus
February 21st, 2012
With respect to both planetary position and to precession of the equinoxes, predictions made with Ptolemy’s system never quite conformed with the best available observations. Further reduction of those minor discrepancies constituted many of the principal problems of normal astronomical research for many of Ptolemy’s successors… Given a particular discrepancy, astronomers were invariably able to eliminate it by making some particular adjustment in Ptolemy’s system of compounded circles. But as time went on, a man looking at the net result of the normal research effort of many astronomers could observe that astronomy’s complexity was increasing far more rapidly than its accuracy and that a discrepancy corrected in one place was likely to show up in another.
Say you have a mathematical model that predicts some system you have no control over. All models and all measurements are wrong, so you’ll never get complete agreement between theory and observations. At some point, though, you’re sure enough about your measurements that you think the inaccuracy is in the model. You can always add ad hoc terms to a model. If you’re doing regression, for example, you can add terms until you get a sufficiently good fit. Given enough degrees of freedom, you can fit any finite number of observations, though I guess they didn’t have enough computing power to do this in the 15th century. Why is this kind of model complexity bad? It defeats the point of falsification, for one thing.
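Here is a minimal sketch of the "enough degrees of freedom" point. The six observations are made up (roughly y = x with small errors), and `lagrange_fit` is a hypothetical helper name; it builds the unique polynomial of degree five through all six points, so the in-sample fit is perfect by construction.

```python
from fractions import Fraction


def lagrange_fit(xs, ys):
    """Return a function evaluating the unique polynomial of degree < len(xs)
    that passes through every (x, y) pair (Lagrange interpolation)."""
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]

    def predict(x):
        x = Fraction(x)
        total = Fraction(0)
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total

    return predict


# Made-up observations: roughly y = x, with small "measurement errors".
xs = [0, 1, 2, 3, 4, 5]
ys = ["0.1", "0.9", "2.1", "2.9", "4.1", "4.9"]

model = lagrange_fit(xs, ys)

# One free coefficient per observation, so every residual is exactly zero.
residuals = [model(x) - Fraction(y) for x, y in zip(xs, ys)]
print(residuals)

# Outside the fitted range the same model lands far from the value of
# about 7 that a straight line through the data would suggest.
print(float(model(7)))
```

The perfect fit tells you nothing: no set of six observations could have embarrassed this model, which is the sense in which the added complexity defeats falsification.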
Even Copernicus’ more elaborate proposal was neither simpler nor more accurate than Ptolemy’s system. Available observational tests, as we shall see more clearly below, provided no basis for a choice between them. Under these circumstances, one of the factors that led astronomers to Copernicus… was the recognized crisis that had been responsible for innovation in the first place. Ptolemaic astronomy had failed to solve its problems; the time had come to give a competitor a chance.
The trouble with falsification is not just that every theory is false, it’s that just about every theory is demonstrably false. You can show that a particular method falls down and practitioners will no-sell this. Instead your burden is to show that the theory is no longer useful, which is a lot harder. On the other hand, falsification does win (usually, maybe) in the very long run: Copernicus eventually triumphed over Ptolemy because his system fits observation better, even if that wasn’t clear in Copernicus’ time. Crisis generates theories, but in itself, it doesn’t determine which theory wins.
Kuhn: Rules and fouls
February 13th, 2012
Often, viewing all [scientific] fields together, it seems instead a rather ramshackle structure with little coherence among its various parts… [S]ubstituting paradigms for rules should make the diversity of scientific fields and specialties easier to understand. Explicit rules, when they exist, are usually common to a very broad scientific group, but paradigms need not be.
Kuhn’s use of “rules” is broad, encompassing everything from Newton’s laws to standards for measurement. The idea that one has to show statistical significance at level 0.05 is a rule adopted by a disparate range of fields, but this doesn’t mean these fields share a paradigm. To ponder: in what ways are rules used to legitimise new paradigms?
The scientific enterprise as a whole does from time to time prove useful, open up new territory, display order, and test long-accepted belief. Nevertheless, the individual engaged on a normal research problem is almost never doing any one of these things… What then challenges him is the conviction that, if only he is skillful enough, he will succeed in solving a puzzle that no one before has solved or solved so well.
This again relies on a narrow definition of the “scientific enterprise”, excluding medicine, for instance, which is often useful. As for testing “long-accepted belief”, there’s a whole industry of contrarians, sometimes including me, that do this. While we would love for those within the paradigm to listen to us, we kind of doubt they will, and instead seek acclamation from those in other paradigms, or more perniciously, the media. Do we fall within Kuhn’s “scientific enterprise” or not?
Kuhn: Detours en route to normal science
February 10th, 2012
In the sciences (though not in fields like medicine, technology, and law, of which the principal raison d’être is an external social need), the formation of specialized journals, the foundation of specialists’ societies, and the claim for a special place in the curriculum have usually been associated with a group’s first reception of a single paradigm.
The part of this that bears further thought is the parenthesised aside. Consider macroeconomics. There’s a bunch of journals that are incomprehensible to those who haven’t learned the math and the jargon. Yet instead of a dominant paradigm, there remain a number of competing candidates that are almost, but not quite, incomprehensible to adherents of their rivals. Has macro failed to elevate one paradigm because social science data doesn’t allow a definitive winner? Or is it because the policy stakes are high?
When the individual scientist can take a paradigm for granted, he need no longer, in his major works, attempt to build his field anew, starting from first principles and justifying the use of each concept introduced. That can be left to the writer of textbooks. Given a textbook, however, the creative scientist can begin his research where it leaves off and thus concentrate exclusively upon the subtlest and most esoteric aspects of the natural phenomena that concern his group.
Does this mean that if you want to have influence on a field, you should write a text? Depends on the field. A textbook writer in physics doesn’t have much latitude as to what to prioritise and what to downplay: the paradigm is settled. A textbook writer in (frequentist) statistics has more room to manoeuvre. You have to talk about t-tests, because the field expects you to, but you can downplay them in a way you can’t downplay F = ma.
Popper on choosing between theories
December 22nd, 2011
I, by contrast, propose that the first thing to be taken into account should be the severity of tests… And I hold that what ultimately decides the fate of a theory is the result of a test, i.e. an agreement about basic statements… for me the choice is decisively influenced by the application of the theory and the acceptance of the basic statements in connection with this application…
This is in opposition to preferring the simple on an aesthetic basis. More importantly, he suggests we agree on the basic statements, and not universals.
He draws a long analogy to trial by jury:
The verdict is reached in accordance with a procedure which is governed by rules. These rules are based on certain fundamental principles which are chiefly, if not solely, designed to result in the discovery of objective truth. They sometimes leave room not only for subjective convictions but even for subjective bias.
The ideal these days, I guess, is that everyone can play juror if data are made available. Of course, taking data as basic (or near-basic) statements requires a decision.
The empirical basis of objective science has thus nothing ‘absolute’ about it. Science does not rest upon solid bedrock. The bold structure of its theories rises, as it were, above a swamp.
I see no reason not to believe this. The question is, then, to what extent can the theories built upon the swamp be objective — in particular, when most measurements have an associated error? We need to get into Popper’s treatment of probability before we can deal with this question.
Popper on instrumentalism and conventionalism
December 16th, 2011
[The scientist's] aim is to find explanatory theories (if possible, true explanatory theories); that is to say, theories which describe certain structural properties of the world, and which permit us to deduce, with the help of initial conditions, the effects to be explained.
“Initial conditions” are singular statements that apply to a specific event in question. Combining these with universal laws produces predictions. Popper doesn’t require that every event can be deductively predicted from universal laws. But science has to search for such laws that causally explain events. Popper contends that while scientific laws are not verifiable, they are falsifiable.
One angle from which the primacy of falsification might be challenged is instrumentalism. Berkeley suggested abstract theories are instruments for the prediction of observable phenomena, and not genuine assertions about the world. The difference is that between “all models are wrong” and “all models are falsifiable”.
Popper rejects instrumentalism because everybody uses abstract properties in ordinary speech.
There is no sharp dividing line between an ‘empirical language’ and a ‘theoretical language’: we are theorizing all the time, even when we make the most trivial singular statement.
We are always using models, so we’re always wrong. Personally, I can live with this. Under instrumentalism, the crucial question becomes “how wrong”. As long as measurements are taken to be real features of the world, the answer to this can be used in falsificationism.
But what if measurements are dependent on assumptions? This is an implication of conventionalism. Duhem held that universal laws are merely human conventions. Since measurements depend on these laws, a conventionalist might argue that theoretical systems are not only unverifiable but also unfalsifiable. Popper makes a value judgement against conventionalism, not because it’s demonstrably wrong but because it allows explaining away, rendering it useless for science. He quotes Joseph Black:
A nice adaptation of conditions will make almost any hypothesis agree with the phenomena. This will please the imagination but does not advance our knowledge.
Statistics makes such adaptation even easier: the phenomena were merely improbable. The rise of probabilistic models makes it even more valuable to guard against ad hoc adaptations.
Brad reads Popper
December 14th, 2011
I’m finding important contrasts between The Logic of Scientific Discovery and my fourth-hand preconceptions of the book. Popper differentiates between four kinds of tests:
- “the logical comparison of the conclusions among themselves, by which the internal consistency of the system is tested”
- “the investigation of the logical form of the theory, with the object of determining whether it has the character of an empirical or scientific theory”
- “the comparison with other theories, chiefly with the aim of determining whether the theory would constitute a scientific advance should it survive our various tests”
- “the testing of the theory by way of empirical applications of the conclusions which can be derived from it”
The demarcation problem (“finding a criterion which would enable us to distinguish between the empirical sciences on the one hand, and mathematics and logic as well as ‘metaphysical’ systems on the other”) is something I think about a lot. I hadn’t previously connected this to the induction problem, and will have to think about whether accepting a convention for demarcation lets us build science without induction.
Popper says that scientific statements are objective in the sense that they can be criticised “inter-subjectively”. In practice this seems to mean that other scientists can test the statements. This means “there can be no ultimate statements in science”, which I am satisfied with.
Models, inc.
November 30th, 2011
(riffing on this course announcement)
There are oceans of data out there. The human ability to think in a million dimensions is limited, and that’s where models come in. All of statistics could, if you wished, be reframed in terms of models. An average is just the result of a model where every individual is assigned the same value. This is reductionist, of course, and we strive for a useful combination of simplicity and accuracy.
But all models are wrong (otherwise they’re not models), and we need to know how wrong they are. This is where statistics comes into its own: answering the question “how wrong”. When we’re modelling a particular data set, we can state exactly how wrong by calculating residuals. Usually, however, we also want to summarise how wrong. So we have measures like the standard deviation and the root mean squared error. It can also be useful to examine how accurately the RMS error represents the residuals, but quantifying this can easily lead to a sinkhole.
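To make "the average is a model" concrete, here is a small sketch with made-up numbers. The model assigns every individual the same value (the mean), the residuals say exactly how wrong it is for each observation, and the root mean squared error compresses them into one number. For this particular model the RMS error coincides with the population standard deviation.

```python
import math
import statistics

# Made-up observations, purely for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# The model: every individual is assigned the same value.
prediction = sum(data) / len(data)

# Exactly how wrong, observation by observation.
residuals = [y - prediction for y in data]

# A one-number summary of how wrong.
rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))

print(prediction, residuals, rmse)

# For the constant-mean model, RMSE equals the population standard deviation.
print(math.isclose(rmse, statistics.pstdev(data)))
```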
Assessing the accuracy of predictions made by a model is a different matter. Consider the case where you have data from a nice stationary process. The most reliable way of dealing with this is splitting data into training and test sets, though there are shortcuts that may work well in the right circumstances.
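A sketch of the training/test idea, under loud assumptions: the data are simulated from a straight line plus Gaussian noise (a "nice" process by construction), the 150/50 split is arbitrary, and `fit_line` and `rmse` are hypothetical helper names rather than anything from a library.

```python
import math
import random


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(points, slope, intercept):
    """Root mean squared prediction error of the fitted line on `points`."""
    errors = [y - (intercept + slope * x) for x, y in points]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Simulated data: y = 2 + 0.5 x + noise with standard deviation 1.
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    data.append((x, 2.0 + 0.5 * x + rng.gauss(0, 1)))

# Shuffle, then hold out a quarter of the data as the test set.
rng.shuffle(data)
train, test = data[:150], data[150:]

# Fit on the training set only; the test set plays no part in estimation.
slope, intercept = fit_line(train)

train_rmse = rmse(train, slope, intercept)
test_rmse = rmse(test, slope, intercept)
print(train_rmse, test_rmse)
```

The test-set figure is the one to report as prediction error, since the training-set figure comes from data the model has already seen.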
In many cases, the data aren’t so nice. This is where you need to be very careful about how much you trust not just your models, but also the models underlying your error assessments. It’s not enough, for instance, to select between a null and an alternative model if neither is particularly close to the truth. Subject matter knowledge is crucial here, and statisticians should do a better job of helping subject matter experts out.
Conclusion: The value of models seems self-evident. If I were teaching a course on models, I would be tempted for it to consist entirely of repetitions of “all models are wrong”. Would have to think hard about how to make it more constructive than that.
Guest post: BAYESBOT 3000 explains Bayesianism to me and Less Wrong
November 27th, 2011
Greetings, humanoids! I am BAYESBOT 3000, the Bayesian robot. I am here to discuss some ideas that humanoids hold about Bayesians.
Specifically, here are what are claimed by humanoids of the website “Less Wrong” to be core tenets of Bayesianism:
Core tenet 1: Any given observation has many different possible causes.
Core tenet 2: How we interpret any event, and the new information we get from anything, depends on information we already had.
Core tenet 3: We can use the concept of probability to measure our subjective belief in something. Furthermore, we can apply the mathematical laws regarding probability to choosing between different beliefs. If we want our beliefs to be correct, we must do so.
Core tenet 1 is trivially true. In fact, it could be strengthened by deleting “possible” or changing “many” to “an infinite number of”, though the latter may be unwise, as humanoids have difficulty with the concept of infinity.
Core tenet 2 is either also trivially true, or meaningless. A humanoid’s evaluation of an event will depend on the knowledge of that humanoid. A BAYESBOT switched on for the first time will evaluate events based on its programming, which depends on the knowledge of humanoid programmers.
Core tenet 3 comprises three different tenets. Humanoids and BAYESBOTs can use probability to measure belief, just as they could use cubits to measure the length of a manatee. They could use probability to choose between beliefs: for instance, by rolling a die. Where BAYESBOT has a problem is with “If we want our beliefs to be correct, we must do so.” Firstly, BAYESBOT robo-LOLs at the idea of humanoids having correct beliefs. Secondly, if humanoids wish to be, as the website’s name says, less wrong, in the long run BAYESBOTS and their friendly rivals FREQUENTOBOTS both achieve this. Humanoids are compost in the long run, so they may be interested in the short run instead. There is no guarantee that a BAYESBOT beats a FREQUENTOBOT, or vice versa, on any time scale. The more important matter is that BAYESBOTS and FREQUENTOBOTS use inputs efficiently. But humanoids experience a wider range of inputs than we bots. It is not clear to bots or humanoids how to mathematically combine observations of the Sun with Newtonian physics to arrive at a probability that the Sun will rise tomorrow. Using all inputs, it is impossible to arrive at an uncontroversial probability that anthropogenic global warming has occurred. BAYESBOT differs from STRICT FREQUENTOBOT in that BAYESBOT will calculate a probability for this hypothesis given a prior and a set of data. However, the prior will not be perfectly specified, so it is up to humanoids to decide how literally to take such probabilities. BAYESBOTS take such probabilities literally if and only if they are programmed to.
BAYESBOT’s empiricism is as good as its programming. How good that is is best determined through the empiricism of those other than BAYESBOT.
Evaluating hypotheses: We are all Bayesians now (for appropriate definitions of “Bayesians”)
November 24th, 2011
(lifted and edited from Phil Birnbaum’s comment section)
You should consider all relevant evidence when evaluating hypotheses. This seems an uncontroversial statement, even among journal editors. Is this necessarily Bayesian? Depends on one’s definition of Bayesianism, but to me the term implies something quantitative: the use of Bayes’ theorem. If we consider any argument that goes outside the data Bayesian, the term seems too broad to be useful. In particular, if “Bayesianism” is used as an umbrella for any use of subjectivity, well, philosophers have been pointing out for centuries that science can’t be entirely objective. It’s necessary, however, to make clear what’s objective and what isn’t; for scientists to use subjective priors (which, to be clear, few Bayesians endorse) obfuscates the difference. On the other hand, I’m totally on board with broadening the definition of “evidence”, though informal evidence should be used informally.
One thing that may or may not be relevant is that it doesn’t matter what order you do the conditioning in. That is, in theory summarising all available evidence in a prior and then adjusting for the result of a new experiment gives the same posterior as starting with the experiment result then adjusting for all other evidence. Since there’s rarely an objective prior, you should post all the data and let anyone who wants to update their posterior do so. In practice, humans have all kinds of cognitive biases, not to mention they’re generally not great at integration. You should post the data, but you should help your readers out by providing informative and honest summaries of the data. Hypothesis tests can be nice, but graphs are often more useful.
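The order-of-conditioning claim can be checked on a toy problem. Everything here is an assumption for illustration: nine candidate values for a coin's bias, a uniform starting belief, and two made-up batches of tosses that are independent given the bias. `update` is a hypothetical helper that applies Bayes' theorem on the grid; exact fractions are used so the two posteriors can be compared without rounding.

```python
from fractions import Fraction


def update(belief, heads, tails):
    """Apply Bayes' theorem on a discrete grid of hypotheses.

    `belief` maps each candidate bias theta to its current probability;
    the likelihood of the batch is theta**heads * (1 - theta)**tails.
    """
    unnormalised = {
        theta: p * theta**heads * (1 - theta) ** tails
        for theta, p in belief.items()
    }
    total = sum(unnormalised.values())
    return {theta: w / total for theta, w in unnormalised.items()}


# Candidate biases 0.1, 0.2, ..., 0.9 with a uniform starting belief.
thetas = [Fraction(k, 10) for k in range(1, 10)]
start = {theta: Fraction(1, len(thetas)) for theta in thetas}

earlier_evidence = (7, 3)  # made-up: 7 heads, 3 tails
new_experiment = (2, 8)    # made-up: 2 heads, 8 tails

# "All other evidence" first, then the new experiment...
posterior_ab = update(update(start, *earlier_evidence), *new_experiment)
# ...or the new experiment first, then everything else.
posterior_ba = update(update(start, *new_experiment), *earlier_evidence)

print(posterior_ab == posterior_ba)
```

The equality relies on the batches being conditionally independent given the hypothesis, which is where real evidence tends to be less cooperative.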
Out-of-sample validation with 16 data points
November 21st, 2011
It’s worth a try. Let me say that I think it’s a useful thing to do before I start nitpicking: the prediction interval widths are amusing. But:
- If you only allow data prior to an election to be used when fitting a model, you end up using 1948 data to inform parameter estimates a lot, and 2008 data not at all. But if you’re trying to guess how well a model will do in 2012, isn’t this the opposite of what you want?
- Preferring the model with the lowest prediction error isn’t necessarily the right thing to do: it rewards overfitting. All the models are designed after looking at past data. So even if the test only allows past data in the estimation, it’s not entirely prospective because the variables have been chosen to give a good fit both in the past and in the future. You could fit a large set of high-order polynomials that give almost no prediction error according to this test, and their predictions for 2012 would be garbage. Parsimony still matters.
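The first point is easy to see by counting. This sketch assumes the 16 data points are the four-yearly elections from 1948 to 2008 and that at least four earlier elections are required before a forecast is attempted; that minimum is my invention, not something taken from the analysis being discussed. It only tallies how often each election's data would enter a fit under "past data only" validation.

```python
# Assumed data points: one election every four years, 1948 through 2008.
years = list(range(1948, 2012, 4))

# Hypothetical minimum number of past elections needed to fit anything.
min_train = 4

# Count how many fitted models each election's data contributes to.
uses = {year: 0 for year in years}
for target_index in range(min_train, len(years)):
    training_years = years[:target_index]  # strictly before the target
    for year in training_years:
        uses[year] += 1

for year in years:
    print(year, uses[year])
```

Under these assumptions the 1948 data inform every one of the 12 fits, 2004 informs one, and 2008 informs none, even though the most recent elections are presumably the most relevant to 2012.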