III. Popper's Falsificationist Theory of Science Noretta Koertge
Let us now turn from Popper's general account of the growth of knowledge through problem solving to his earlier and better known falsificationist theory of science. In his intellectual autobiography (Unended Quest) and in Chapter 1 of Conjectures and Refutations, Popper describes the demarcation problem which his theory is intended to solve. But like any interesting theoretical solution it solves other problems as well, and I will present it as a comprehensive methodology and not focus on the demarcation question. Also I will not raise the historical question of how Popper's theory differs from other hypothetico-deductive accounts, such as that proposed in Whewell's Novum Organon Renovatum.
a. Scientific Problems
As we have seen, according to Popper, no inquiry begins in a vacuum. Regardless of what the topic may be, the scientist, like all of us, begins with a motley collection of ideas, some clear, some confused, some true, some false. Puzzlement arises when there are inconsistencies or gaps within existing bodies of knowledge. But how are scientific problems different from those of ordinary life? Or are they different? Let us begin by surveying some typical kinds of scientific problems and then we will comment on their special characteristics.
(i) Problems arising from violated expectations. A common sort of scientific problem arises when something surprising or unexpected occurs and we wonder how or why it happened.
An important problem for early astronomers was the following: In general, celestial bodies, such as the sun, moon and stars, move across the sky in smooth arcs. However, it was discovered that the planets wander around the sky irregularly. Can one describe precisely how the planets move and explain why they move differently from the other heavenly bodies? Plato called this the problem of the planets. Ptolemy, Copernicus, and Kepler each offered a different solution to it.
Here is another example of a scientific problem caused by violated expectations: In 1896 Becquerel found that a batch of photographic plates which had been carefully stored in black paper were fogged. According to the best scientific knowledge available at the time, only visible light or x-rays could expose photographic plates. What could have happened? Becquerel finally began to suspect that the fogging was caused by an unusual rock he had used as a paper weight. And it was thus that he discovered radioactivity. Later Madame Curie showed that the rock contained radium.
(ii) Problems arising from a quest for deep explanations. Even if the scientist is lucky enough to discover a generalization which seems to have no exceptions, he or she is still faced with a problem: What causes the regularity? Why do things happen just that way?
For example, early astronomers asked why the sun rose every day in the east. Some said it was because the sun moved in a circle around the earth. Later this geocentric theory was replaced with a heliocentric theory. In either case, a further question arose: What caused the sun (or earth) to move? According to Aristotle, there was a Prime Mover. Later people suggested a law of circular inertia, saying a wheel would move forever if there were no friction. Newton explained the regular motion in terms of linear inertia and the force of gravity.
There are many other cases in which the problem is to explain a regularity. Bohr wondered why the wavelengths of the spectral lines of hydrogen should fit the simple mathematical formula discovered by Balmer. Mendeleev and other chemists of the late 19th century wondered why the elements should arrange themselves so nicely into a Periodic Table.
By the end of the 18th century, after the work of Boyle and Charles, everyone knew that gases expanded on heating. But why? Caloric theorists said that heat was a fluid which flowed into gases and as a result they took up more room. Kinetic theorists said heat was kinetic energy and hot gases expanded because their molecules moved faster. Both sides agreed on the regularity to be explained, but they offered competing explanations of it.
(iii) Problems arising from a quest for unity. As a science develops, a new sort of problem often arises: Can one find a unified theory which covers two or more domains which have previously been treated separately?
For example, for a long time organic chemistry (which deals primarily with covalent compounds) and inorganic chemistry (which is mainly concerned with ionic compounds) were considered to be quite distinct fields. At this time people believed that naturally occurring organic compounds, such as urea, could not be synthesized in the laboratory because they contained a vital life force. However, today's theories of chemical bonding apply equally well to inorganic and organic materials.
Before Galileo, it was held that terrestrial bodies and celestial bodies obeyed different laws. Galileo (and later Newton) gave a unified account of the motions of all bodies.
A pressing problem in physics today is the search for a unified field theory--a theory which would successfully combine relativity theory and quantum mechanics. Psychologists are looking for a unified theory of learning. Behaviorists can account for some kinds of learning; cognitive psychology provides explanations for other types of learning. But one would like to find a single theory which covers all instances of learning.
(iv) Problems of conflict between theories. Often, problems of finding a unifying explanation are exacerbated because of inconsistencies between the component theories. And contradictions can also arise between theories which appear to cover quite different domains. For example, the biggest objection to Copernicus' astronomical theory was its conflict with Aristotelian physics, according to which nothing could continue to move without a mover. And a strong contemporary objection to Darwin's theory of biological evolution was Kelvin's geophysical calculation of the age of the earth. (It turned out later that Kelvin's thermal estimates were wrong because they did not include the heat generated by radioactive decay.)
Each of the four types of scientific problems discussed above arises out of a rich background of information and expectations. New scientific theories are invented when scientists are faced with a problem: Why did my old theory or set of unconscious expectations fail? What causes this regularity which I have observed? Can I unify these two branches of science? Or resolve the inconsistencies between them?
None of the problem types are unique to science. Myth-makers are looking for deep explanations and try to give unified pictures of the world we live in. And many of our practical problems of existence arise because the common-sense generalizations we make about the world, including other people, are violated.
But although there is no sharp demarcation of scientific problems, there are some obvious differences in degree. In a well developed scientific field, problems arise within a body of knowledge which is generally more extensive, more detailed, and better systematized than that of other domains. (This is not always the case - both folk mythologies and craft technical lore may be of comparable sophistication.) Furthermore, the scientific tradition for the most part actively rewards people who expose contradictions or gaps within the body of science. Folklore and religious systems, by contrast, are often embedded within conservative institutions which discourage criticism or revision of the traditional beliefs.
To summarize, to the extent to which scientific knowledge is well-articulated it is relatively easy to discover flaws in it, and scientific traditions encourage us to take these problems seriously.
b. Scientific Theories
We have described the various sorts of problems which trigger scientific inquiry. Our next task is to characterize the sorts of problem solutions which count as scientific. This is the core of the demarcation problem with which Popper began.
However, let me digress a moment to point out that we have skipped over the process by which these tentative solutions are dreamt up in the first place, namely, the problem of whether there is a logic of discovery.
Early philosophers were optimistic about the prospects of describing a method for discovering true theories. Bacon and other inductivists thought that through careful observation and systematic use of his tables one could easily arrive at the solution to scientific problems. Descartes and other rationalists thought that a systematic analysis of our clear and distinct ideas would provide the answers.
Popper argues that there is no recipe for discovery. All the scientist can do is guess at the answer. Some conjectures will be "happy guesses" as Whewell described them; others will turn out to be dead wrong. It's all a matter of trial and error. In biological evolution, mutations occur by chance--we can't predict what new variations will occur. But natural selection will filter out those which are not adapted to the environment. Likewise for science. People make up all sorts of crazy hypotheses. But tests will weed out those which do not match reality. Quality control is ensured by careful testing procedures, not by censorship of new ideas. The pattern of reasoning which leads to a new hypothesis is not important--it may be based on dreams, mystical experiences, weak analogies or what have you. The origins of the idea are irrelevant; what is crucial is how well the scientist's hunch stands up to testing. But not all speculative theories are even capable of being tested. So let us now return to the question of what makes a conjecture scientific.
As our account so far makes clear, the solutions to problems which scientists propose start out being mere hypotheses or conjectures. When they are first proposed, we have no particular reason to believe them true. Furthermore, these hypotheses tend to be rather bold and far-reaching. This is because typical scientific problems all require as solutions theories of high content. Consider Problem Type 1: To explain why our expectations are violated, we need a theory which accounts both for the exceptions and the normal states of affairs we had expected. For example, a good answer to the problem of the planets' irregular motions would also explain the sun's regular motion.
To turn to Problem Type 2: Trying to give a deep explanation of a regularity (such as the Balmer formula for hydrogen spectral lines) generally results in a conjecture which has many other consequences as well (such as a formula for the spectral lines of sodium). As for Problem Type 3, it is clear that a unified theory will have more content than either of the separate fields. And generally such a theory will have lots of new consequences as well. (For example, the unified theory of chemical bonding covered not only traditional organic and inorganic compounds, but a whole new domain of organic-metallic compounds, such as hemoglobin.)
Although they are bold conjectures, Popper argues that conjectures do have one very important property in their favor: they can be tested by means of experiments. If one of our conjectures is false, it is realistic to hope that we will eventually discover its erroneous nature.
Let us now discuss the precise requirements that a theory must satisfy in order to be falsifiable.
(i) The Logical Requirement. Statements of the form "Some A's are B's" cannot be refuted by any report involving a finite number of instances, but universal generalizations, be they affirmative or negative, can be.
A necessary condition for a theory to be falsifiable is that it be logically possible to contradict it by a finite conjunction of sentences which describe particular instances.
Popper used the logical requirement to argue for the unfalsifiable status of many Marxist doctrines. Statements about the "inevitability" of the downfall of capitalism fail the logical requirement if no time limit is given. "Light has a maximum velocity" also fails unless a value is specified.
Many claims which at first appear to be universal generalizations also fail. For example, "Every metal has a melting point" or "every action is rational" may be better analyzed as what Watkins called "all-some" statements, i.e. as saying that for every metal there is some temperature above which it will melt, and for every action, there is some description of the agent's problem situation such that the action was appropriate to it.
On the other hand, the claim "some copper is brittle" looks like it is not open to refutation by a finite observation report; however, if it is accompanied by a recipe, "To make copper brittle, place a thin sheet of it for three days in a nuclear reactor where the neutron flux is..." it becomes testable.
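The asymmetry behind the logical requirement can be put in a few lines of Python. This is only an illustrative sketch: the function names and the encoding of an observation as a pair (is_A, is_B) are my own assumptions, and the domain of A's is taken to be open-ended, so that unexamined instances always remain.

```python
from typing import Iterable, Tuple

# One observed instance: (it is an A, it is a B). Hypothetical encoding.
Observation = Tuple[bool, bool]


def appraise_universal(report: Iterable[Observation]) -> str:
    """Appraise 'All A's are B's' against a finite observation report."""
    # A single A that is not a B contradicts the generalization.
    if any(is_a and not is_b for is_a, is_b in report):
        return "refuted"
    # No finite report can cover the unexamined A's, so it is never verified.
    return "undecided"


def appraise_existential(report: Iterable[Observation]) -> str:
    """Appraise 'Some A's are B's' against a finite observation report."""
    # A single A that is a B verifies the claim.
    if any(is_a and is_b for is_a, is_b in report):
        return "verified"
    # Failing to find one so far does not show that none exists.
    return "undecided"


report = [(True, False), (True, False), (False, True)]
print(appraise_universal(report))    # the universal claim can be refuted
print(appraise_existential(report))  # the existential claim cannot be
```

Neither function ever returns "verified" for the universal claim or "refuted" for the existential one. That is the sense in which only the universal generalization satisfies the logical requirement.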
(ii) The Empirical Requirement. Having the proper logical form is not sufficient to ensure that a hypothesis is scientifically testable. "All repressions are seated in the libido" satisfies the logical requirement but, as it stands, it is not subject to experimental test. How exactly are we to recognize a repression? And even if we could, how could we tell whether or not it is seated in the libido?
Contrast the following sentence which has the same logical form:
"All samples of iron have a melting point less than 2000° C."
This universal generalization is subject to test. We can easily determine whether a sample is iron or not through chemical analysis. (We might use the potassium thiocyanate test, for example.) And there are also a variety of reliable procedures for measuring melting points.
The contrast in the above two cases suggests the following requirement:
A falsifiable theory is one which is inconsistent with at least one finite conjunction of observation test reports. Popper's discussion of test reports, or 'basic' statements, as he called them in the Logic of Scientific Discovery, is traditional in many respects: they describe observable events occurring in an individual region of space and time (p. 103); they are inter-subjectively testable, i.e. they describe experimental arrangements in such a way that anyone who has learned the relevant technique can check on their validity (p. 99).
But Popper departs from the logical positivist or other standard empiricist accounts by not claiming that the 'basic' statements are infallible, nor are they picked out by any psychological criteria. The store of 'basic' statements and hence whether or not a theory is testable depends on the technology and state of scientific development available at the time. Before the invention of the mass spectrograph, "All atoms of an element have the same weight" would not have been considered testable because as yet there was no way to determine the weights of individual atoms. What counts as an observation sentence also changes with the development of instrumentation and with new theoretical developments. For modern scientists, "This sample is oxygen" and "This is an electron track" are considered to be observation statements. In an earlier era they would not have been. "This sample is a gas which supports combustion" and "This track in a cloud chamber curves towards the positive plate" might have been used instead, if the identity of the gas or of the particle was still in question. The truth of observation statements cannot be decided with certainty; even so, members of the scientific community can tentatively agree in their judgments about the truth of observation statements.
Although Popper originally proposed his falsifiability doctrine as a demarcation between science and pseudo-science, it is probably wiser to view it as a regulative principle to guide the development of good scientific theories, not as a sharp criterion. We can increase the degree of falsifiability of a conjecture by increasing the domain of phenomena to which it applies, by making more precise the descriptive claims about the domain, and by inventing less and less controversial observational procedures for evaluating those claims. More important than the question of whether Freud's theory has any potential falsifiers whatsoever is the question of how we might increase its degree of falsifiability, either by making its claims more precise or by using detection methods such as plethysmography for detecting patterns of sexual arousal instead of relying solely on dreams or other traditional psychoanalytic techniques.
c. Scientific Testing
In his account of the empirical appraisal of scientific theories, Popper once again inverts the positivists' rhetoric. Rather than trying to collect data which will confirm our conjectures, we should instead conduct those tests which seem most likely to refute them.
Popper's central point is nicely illustrated by an anecdote recounted by Francis Bacon:
...it was a good answer that was made by one who, when they showed him hanging in a temple a picture of those who had paid their vows as having escaped shipwreck, and would have him say whether he did not now acknowledge the power of the gods--"Aye," asked he again, "but where are they painted that were drowned after their vows?" And such is the way of all superstition...(The New Organon, BK I, Aphorism XLVI.)
It is obvious that Bacon is criticizing the way data is being used to argue for the "power of the gods." But we need to spell out the objection in detail.
First of all, what exactly is the claim about the power of the gods which is under discussion? It would appear that the basic thesis which can be directly tested is the following: "If one makes a vow during a storm at sea, then one will survive." We can abbreviate the conjecture as: "If V, then S."* The proposed method for collecting data which will either support or refute the conjecture is as follows: Go to churches and record instances of people who paid their vows as thanks for having escaped drowning. Using our abbreviations, we can describe the instances so collected as cases of V and S.
At first glance, it may appear that these data do indeed tend to confirm the conjecture because they are positive instances of the generalization. But let us look more carefully. What kind of evidence would refute the conjecture? The answer is a case of someone who made a solemn vow, but drowned at sea nevertheless, i.e., a case of V and not-S. But given our method of collecting data, it is logically impossible that we would ever find such a refuting instance. By looking only at pictures of survivors (i.e., cases of S) we will never come across an instance of V and not-S, even if there be millions of such cases. One of the basic principles of scientific testing can be stated roughly as follows: The outcome of a certain test procedure cannot confirm a theory unless it is logically possible that there could have been another outcome which would have disconfirmed the theory.
*This is probably somewhat oversimplified. The proponents of the power-of-the-gods theory may have only wished to defend a weaker claim: "If one prays, one is less likely to be drowned." We will postpone the discussion of the testing of probabilistic generalizations until later.
In order to test "If V, then S", we should sample the domain of V and find out whether any of them drowned. As Bacon says, "Where are they painted that were drowned after their vows?" In addition, we should also look at examples of people who in fact drowned and find out if any of them had made vows. (This might be difficult to do in practice, but we could check their diaries, ask their mates, etc.) It is useless to look at cases already known to be S or not-V. Such "tests" are irrelevant to the conjecture under consideration because it is logically impossible that they could ever yield a refuting case.
We might label the procedure described by Bacon as "no-risk data collecting" because the way in which the data is collected makes it logically impossible for a refutation to appear. Once pointed out, the methodological error is blatant; nevertheless it can be seductive.
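The point about sampling domains can be checked mechanically. The following Python sketch uses a tiny population that I have invented purely for illustration; the names Sailor, refuters and domains are hypothetical. It asks, for each way of choosing whom to examine, whether a refuting case of V and not-S could ever turn up.

```python
from typing import List, NamedTuple


class Sailor(NamedTuple):
    vowed: bool     # V: made a vow during the storm
    survived: bool  # S: survived the storm


# Invented population containing one of each kind of case.
population: List[Sailor] = [
    Sailor(vowed=True, survived=True),
    Sailor(vowed=True, survived=False),   # the refuting kind: V and not-S
    Sailor(vowed=False, survived=True),
    Sailor(vowed=False, survived=False),
]


def refuters(sample: List[Sailor]) -> List[Sailor]:
    """Return the cases of V and not-S, which alone contradict 'If V, then S'."""
    return [p for p in sample if p.vowed and not p.survived]


# Four ways of deciding whom to look at.
domains = {
    "known survivors (S)": [p for p in population if p.survived],
    "known non-vowers (not-V)": [p for p in population if not p.vowed],
    "all who vowed (V)": [p for p in population if p.vowed],
    "all who drowned (not-S)": [p for p in population if not p.survived],
}

for name, sample in domains.items():
    print(f"{name}: {len(refuters(sample))} refuting case(s) found")
```

However large the population is made, the first two domains can never yield a refuter, because every member is already known to be S or not-V. Only the last two put the conjecture at risk.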
For example, after teaching scientific method for a number of years, I once caught myself reasoning as follows: I observed that all of my close friends who blinked a lot and tipped their heads back when looking at me wore contact lenses. I then started investigating other people who behaved similarly and sure enough I nearly always found independent evidence that they were wearing contacts. Sometimes I asked them. Other times I would see a lens holder in their purse or bathroom, etc. I soon jumped to the following conclusion:
"All people who wear contact lenses blink a lot and peer down their noses when they look at you."
This conclusion was obviously too strong, given that I had done only an informal study on a very small sample. But I did think that my experience justified a more modest statement:
"All contact lens wearers whom I have met blink a lot, etc."
What was not clear to me for quite some time is that none of my observations had served as a test for either conjecture. For I had always begun my observations with people who blinked! Given this choice of sample domain, I could have investigated all the blinkers and peerers in the world and never found a counter-example to my conjecture--not because there weren't any, but simply because it was logically impossible for my method of data collection to uncover them.
Popper adds to Bacon's point by stressing that good scientific tests should be severe ones, that is they should be deliberately designed, using our general background knowledge to probe the conjecture at its weakest point, i.e., to find a refutation if one does in fact exist.
For example, when Kohlberg put forward a theory about the development of moral reasoning in children, he was well advised to test it on children from Turkey and Taiwan. We might expect a theory developed on the basis of experience with kids in Boston to fail when applied to children from quite different cultures and religions. (As it turned out, the Kohlberg theory passed this severe test.)
Similarly, theories about the universality of the Oedipal complex should be tested on aborigines, and theories about language learning on deaf and blind children. Theories about geological change and biological evolution should be tested, where possible, by data from other planets. Physicists know that theories often fail under conditions of high energy or high velocity; and often processes at the micro level violate generalizations which work well with medium-sized objects. For this reason physicists want to build ever bigger accelerators for smaller and smaller particles.
The general procedure for designing a severe test is as follows: The hypothesis under test always makes a series of claims. For example, the claim "All arsenic compounds are poisonous" says that both soluble and insoluble arsenic compounds are poisonous. It also says that both yellow and green non-poisonous substances are free of arsenic. (Don't forget the contrapositive!)
According to our background information, some of these claims sound less plausible than others. For example, since we know that many poisons have to be digested in order to act, we may decide that insoluble arsenic compounds are less likely to be poisonous than soluble ones.
A severe test is one which tests the least plausible claims of a theory. In our example, given our background theories about the relationship between solubility and poisonous character, we should start testing by looking at insoluble arsenic compounds. If the conjecture passes this severe test, we will then look at the class of soluble arsenic compounds. Other things being equal, severe tests, i.e., tests of the least plausible claims of a conjecture, are more stringent than less severe ones.
Note that our appraisal of the severity of tests depends on the background information available at the time. Consider the two claims: (a) "All yellow non-poisonous substances are free of arsenic" and (b) "All green non-poisonous substances are free of arsenic." Which domain should be investigated first if one wishes to perform a severe test of the original conjecture? Recall that a counter-example to the original conjecture would be a non-poisonous arsenic compound. So if we think green substances are more likely to contain arsenic than yellow ones, we should sample the domain of non-poisonous green substances. If we know nothing about the typical color of arsenic compounds, however, or if we have reason to believe that color is not correlated to chemical composition, we would judge the tests to be equally severe. (As a matter of fact, many arsenic materials are yellow or black, so there may be a slight preference for a test of yellow non-poisonous substances.)
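The procedure just described amounts to ranking the sub-claims of a conjecture by how plausible they seem on our background knowledge and testing the least plausible first. Here is a rough Python sketch. The plausibility scores are numbers I have made up for illustration; nothing in Popper's account supplies such figures, and the function name order_tests is hypothetical.

```python
from typing import Dict, List

# Hypothetical background plausibility of each sub-claim of
# "All arsenic compounds are poisonous" (0 = very doubtful, 1 = near certain).
plausibility: Dict[str, float] = {
    "soluble arsenic compounds are poisonous": 0.9,
    "insoluble arsenic compounds are poisonous": 0.4,
}


def order_tests(scores: Dict[str, float]) -> List[str]:
    """Order sub-claims from least to most plausible, i.e. most severe test first."""
    return sorted(scores, key=scores.get)


for rank, claim in enumerate(order_tests(plausibility), start=1):
    print(rank, claim)
```

If the background information changes, the scores change and so does the ordering. That is all that is meant by saying that severity is relative to background knowledge.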
A special case of severe testing is what Bacon called a "crucial experiment." Here one probes the vulnerability of a hypothesis by comparing its predictions with those of a plausible rival conjecture. If hypothesis A predicts P and rival hypothesis B predicts not-P, checking on whether P or not-P is the case will allow us immediately to eliminate one alternative. Contrary to what its name may imply, a crucial experiment does not prove the truth of the undefeated hypothesis because there may exist more alternatives which we have not yet thought of.
For example, according to the Copernican theory, Venus should wax and wane like the moon. The Ptolemaic system, on the other hand, predicted that Venus should not exhibit extremely different phases at different times. This conflict between the predictions of the rival cosmological systems was noted by Copernicus in 1543. However, it was not possible to conduct a crucial experiment without a telescope. In 1610, Galileo observed that Venus did have phases and so the Ptolemaic system was refuted.
This crucial experiment in no way established the truth of the Copernican heliocentric theory for in 1588 Tycho Brahe had proposed a geocentric system which also gave the correct predictions concerning Venus. The next order of business was to design a crucial experiment between the Tychonic and Copernican system.
Crucial tests are only stringent when the rival hypothesis is a fairly plausible one (as judged against background knowledge). The more plausible the rival conjecture to the hypothesis in question, the more stringent is a crucial test between them. For example, no one would have thought it necessary to design a crucial test if the only rival were an ad hoc hypothesis to the effect that Venus shone by its own light but periodically varied its luminous area from crescent shaped to circular!
Checking on the truth of the least plausible consequences of a conjecture is the most efficient way of trying to falsify it, and hence Popper recommends tests with samples which are in a sense biased against the conjecture! How can this be reconciled with the standard statistical practices of using random samples or stratified samples? Or can it be?
To develop a full-fledged Popperian approach to statistics is beyond the scope of this course, but I will make a few preliminary remarks. First of all, many statistical studies are not really tests at all, but simply demographic measurements. If Kinsey wishes to make descriptive claims about overall American sexual practices, clearly a non-biased sample is desirable. However, if one is testing the claim that the half-life of radium is always 1600 years or that the M/F ratio of neonates is always 0.51 (regardless of conditions), then it makes sense to focus our inquiry on samples of radium or births in extraordinary circumstances, namely those which on our background knowledge are most likely to violate the general claim.
In the case of evaluating causal claims by means of controlled tests, the Popperian approach once more exhorts us to put most effort into controlling for those factors which are most likely to be alternatives to the causes described by our hypothesis. Of course, since our background hunches about the weaknesses of our conjectures are always fallible, our assessments of the severity of a test are also fallible and this is a good reason for eventually performing a wide variety of tests whether they appear to be severe or not.
d. The Ambiguity of Falsification
As described so far, the logic of testing is simple and clear-cut: (1) We derive a prediction from our conjecture which can be subjected to experimental check. (2) We do the experiment. (3) If the prediction is wrong, the theory is refuted. Period. Or so it would seem. In the typical scientific case, however, the situation is more complicated and the decision as to exactly which premise is to be given up is less straightforward.
Let us illustrate the problem with a famous scientific example, the case of stellar parallax. After Copernicus put forward his theory that the earth revolved around the sun, astronomers noted that if his theory were true, one should be able to detect stellar parallax. If one is moving with respect to an object, then the direction in which the object appears changes. This phenomenon is known as parallax. As a race driver moves past the pit stop, at first it is ahead of him/her. Later it is behind. The angle a in the diagram below is called the angle of parallax.
A similar diagram could be used to illustrate Copernicus' theory of the earth's annual movement with respect to a particular star. But when 17th-century observers looked for stellar parallax, they couldn't detect any. Didn't this mean the theory was false? The supporters of Copernicus' theory decided to blame an auxiliary assumption instead. Their argument can be illustrated with the race-car analogy. Suppose the driver sights on a distant radio tower instead of on the pit stop. Now the angle of parallax may become too small to be easily noticeable. As the ratio of D to R increases, a gets smaller. At very large values of D it will become too small to detect.
According to estimates of the distance between the earth and the stars available at the time, stellar parallax should have been observable. But the Copernicans argued that these estimates were wrong and claimed that the universe was about 1,000 times bigger than had previously been imagined. This bold move turned out to be correct, but 200 years passed before stellar parallax was detected experimentally.
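The dependence of the parallax angle on distance is easy to compute. In the Python sketch below I assume a particular geometry: R is the sideways displacement of the observer, D is the distance to the object measured at right angles to that displacement, and tan a = R/D. This is my own reading of the quantities R, D and a, and the function name parallax_angle_deg is hypothetical.

```python
import math


def parallax_angle_deg(baseline_r: float, distance_d: float) -> float:
    """Angle a (in degrees) under the assumed geometry tan(a) = R / D."""
    return math.degrees(math.atan2(baseline_r, distance_d))


# Keep the baseline fixed and push the object further and further away.
baseline = 1.0
for ratio in (1, 10, 1_000, 1_000_000):
    angle = parallax_angle_deg(baseline, baseline * ratio)
    print(f"D/R = {ratio:>9}: a = {angle:.6f} degrees")
```

For small angles a is nearly proportional to R/D, so making the stars about 1,000 times more distant shrinks the expected parallax by about the same factor. That is why the Copernican revision of the distance estimate was enough to put the effect beyond the instruments of the day.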
The logic of the testing situation was as follows:
Copernican theory: The earth revolves around the sun, which is stationary relative to the stars.
Auxiliary hypothesis: The distance between the earth and the stars is about 20,000 earth radii.
Experimental Prediction: (Therefore) Stellar parallax should be easily observable with the apparatus available.
Experimental Finding: No stellar parallax is observable with the available apparatus.
Since the prediction failed, one of the premises had to be wrong. Copernicus blamed the auxiliary hypothesis; anti-Copernicans defended it and blamed the theory instead. With no good way at the time to test the auxiliary hypothesis, the status of the Copernican theory was left open.
The philosopher who first stressed that almost all tests involve a lot of auxiliary assumptions was Pierre Duhem, an early 20th-century philosopher, physicist, and historian of science. Hence, we will call the following the Duhemian problem: When an experimental prediction turns out to be false, should the scientist blame the theory under test or the auxiliary assumptions (or both)?
There is no simple solution to the Duhemian problem, but a few guidelines can be laid down. First of all, one should not use the Duhemian problem as a general excuse for one's pet theory. It is not good methodology to say, "My theory's prediction failed? Well, not to worry. I probably made a false auxiliary assumption somewhere along the line." If one wants to keep the theory despite the prediction failure, one must point to a specific auxiliary assumption and then design tests of that auxiliary assumption. If the auxiliary assumption passes the tests, then we should conclude that our theory and not the auxiliary was false.
Sometimes, however, it is neither possible nor practical to test auxiliary hypotheses. (We saw an example of this in the Copernican case.) In such instances, we can draw no firm conclusions about the original test situation. If a theory in conjunction with a variety of auxiliary assumptions makes a lot of false experimental predictions, though, we tend to decide that the theory is false, even though we can't conclusively test each auxiliary.
In the history of science, it is fairly rare to find a case where a theory is refuted by a single, decisive experiment. More often theories come to be rejected through a variety of prediction failures. Theories are rarely struck down by a blow from one type of crucial experiment, no matter how many times that experiment is repeated. Rather they are eroded away by an accumulation of anomalous results.
The Duhemian problem situation can be analyzed as follows:
The theory under test (T) when conjoined with one or more auxiliary hypotheses (A) makes a prediction (p).
Experiments show that p is not the case. By modus tollens we know that either T or A (or both) must be false, but logic doesn't tell us which.
(T & A) -> p
~p
(Therefore) ~T, or ~A, or ~T & ~A
Note that in the pure Duhemian problem situation there is no controversy about the experimental result, ~p. Furthermore, all parties agree that T & A imply p. The disagreement arises about whether to revise A or to revise T.
Of course, there are also cases in which people cannot agree on experimental results or on what exactly the implications of the theory are. These latter disagreements can usually be settled either through further experimentation or by means of logical analysis. The Duhemian problem is often more recalcitrant.
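The modus tollens schema given above for the Duhemian problem can be checked mechanically. The following is a minimal sketch in Python, not part of Popper's or Duhem's own presentation: it enumerates every truth assignment to T, A and p, keeps those consistent with the two premises, and then asks what follows. The function and variable names are illustrative assumptions.

```python
from itertools import product


def implies(antecedent, consequent):
    """Material implication: false only when antecedent is true and consequent false."""
    return (not antecedent) or consequent


def surviving_assignments():
    """Return every (T, A, p) assignment consistent with both premises."""
    rows = []
    for T, A, p in product([True, False], repeat=3):
        premise_1 = implies(T and A, p)  # (T & A) -> p
        premise_2 = not p                # ~p
        if premise_1 and premise_2:
            rows.append((T, A, p))
    return rows


rows = surviving_assignments()
for T, A, p in rows:
    print(f"T={T!s:5} A={A!s:5} p={p}")

# A conclusion is entailed only if it holds in every surviving row.
conjunction_refuted = all(not (T and A) for T, A, _ in rows)
theory_refuted = all(not T for T, _, _ in rows)
auxiliary_refuted = all(not A for _, A, _ in rows)

print("~(T & A) follows:", conjunction_refuted)
print("~T follows:", theory_refuted)
print("~A follows:", auxiliary_refuted)
```

Three assignments survive: T true with A false, T false with A true, and both false. The premises therefore rule out the conjunction T & A, but they entail neither ~T alone nor ~A alone, which is exactly the point at which logic leaves the scientist without guidance.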
e. The Status of Corroborated Theories
We have discussed what happens when our theory's prediction is refuted: either we revise the theory or we adjust an auxiliary hypothesis. What happens if our theory passes with flying colors the most severe experimental tests we can devise? Can we then declare it proven true, or at least highly probable? It is perhaps on this issue that Popper's disagreement with the positivists is deepest.
First of all, the history of science strongly suggests that we should never feel completely certain about any scientific generalization, no matter how frequently or stringently it has been tested. Newton's theory of classical mechanics had perhaps the best track record ever; yet it was superseded by Einstein's relativistic mechanics. Here are a few other examples of well-established claims which eventually had to be corrected or rejected:
(i) Matter cannot be created or destroyed. (Not true in nuclear fission or fusion processes.)
(ii) The sun rises once every twenty-four hours. (Not true at the North Pole.)
(iii) All molecules of water are made of the same stuff. (Not true for heavy water, deuterium oxide.)
(iv) The major difference between Homo sapiens and the lower animals is that man can use language. (Not true for chimpanzees, which can use sign language.)
(v) Living matter can only come from living matter; it cannot be formed from inanimate substances. (Not true--amino acids can be synthesized from ammonia, methane, hydrogen, etc.)
So the history of science warns us that any scientific claim is fallible. Logic and philosophy of science can help us understand why this is so. Here are some of the reasons:
(i) Generalizations cover a potential infinity of cases. But we can only check on a finite number of predictions. We can never be sure that the next case won't violate the rule (e.g., a black swan may turn up in Australia).
(ii) Scientific theories make infinitely precise claims. But we can only make measurements of finite accuracy. (For example, Newton's law of gravitation says the force of gravity varies inversely with the square of the distance, i.e., the exponent of r is exactly 2.00000..., but our measurements cannot discriminate between r^2 and r^2.0000000001.)
(iii) Many of our scientific laws only hold under idealized conditions--to give two very simple examples, the law of the lever assumes no friction at the fulcrum, and the law of the pendulum assumes there is no air resistance. Of course, we can try to minimize such interferences when we conduct tests, e.g., by resting our lever on a point or setting up a pendulum in a vacuum, but our experiments never achieve the perfect conditions which are assumed in our ideal laws.
(iv) There may be alternative theories which we have not even dreamt of yet which account for all of the data we have in hand.
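Point (ii) above can be made concrete with a little arithmetic. The sketch below, in Python, compares two force laws, one with exponent 2 and one with exponent 2.0000000001, by asking how differently they predict the change in force when the distance is multiplied by ten. The factor of ten and the measurement precision of one part in a million are hypothetical values chosen only for illustration; they are not taken from any actual experiment.

```python
import math


def force_ratio_discrepancy(distance_ratio, delta):
    """Relative discrepancy between the force ratios predicted by
    exponents 2 and 2 + delta when distance changes by distance_ratio.

    For F proportional to r**(-n), F(r1) / F(r2) = (r2 / r1)**n, so the
    two predictions differ by the factor distance_ratio**delta.
    expm1 keeps precision when that factor is extremely close to 1.
    """
    return math.expm1(delta * math.log(distance_ratio))


def can_discriminate(distance_ratio, delta, relative_precision):
    """True if the discrepancy exceeds the stated measurement precision."""
    return abs(force_ratio_discrepancy(distance_ratio, delta)) > relative_precision


DELTA = 1e-10          # exponent 2.0000000001 versus exactly 2
DISTANCE_RATIO = 10.0  # hypothetical: compare forces at r and at 10 r
PRECISION = 1e-6       # hypothetical measurement precision

discrepancy = force_ratio_discrepancy(DISTANCE_RATIO, DELTA)
print(f"relative discrepancy between the two laws: {discrepancy:.3e}")
print("detectable at this precision:",
      can_discriminate(DISTANCE_RATIO, DELTA, PRECISION))
```

Under these assumptions the two laws disagree by only a few parts in ten billion, far below the assumed precision, so no run of such measurements could tell them apart. Any finite improvement in precision simply moves the problem to a smaller value of the difference in exponents.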
For all these reasons, theories are underdetermined by our observational results and can never be proved through any amount of observation and experiments. There are no rules for deciding when to accept a theory (for the time being) and move on to new problems, but what we can do is to answer each of the above sources of fallibility as best we can.
(i) By testing in widely scattered domains, we guard ourselves against parochialism, e.g., the black swans in Australia.
(ii) By making our tests as precise and ideal as possible, we can approach the infinite precision and perfection of our theories.
(iii) And the best way to rule out alternative explanations is to deliberately try to imagine radically different ways of explaining our results. If we can devise a new alternative, we can then set up a crucial experiment between the two competing accounts.
But what is the exact epistemological status of theories which have survived critical scrutiny? What positive claims can we make about them? Popper introduced the term corroboration to describe the severity of the tests passed by a hypothesis, but he emphatically denies that the degree of corroboration is to be interpreted as a degree of reasonable belief in the hypothesis or the probability that it is true. However, he does say that for purposes of practical action, it is rational to base our behavior on our most highly corroborated theories.
And for purposes of scientific inquiry we should use the degree of corroboration of various claims as guides to criticism and revision of our scientific systems. The Duhemian problem would become completely intractable if we had no way of at least tentatively assigning the blame for prediction failures. And the whole mechanism of falsification rests on the existence of 'basic' statements, i.e., statements which all observers can test and presumably corroborate for themselves.