The Chronicle of Higher Education’s report on a leaked memo in Harvard’s misconduct investigation of Marc Hauser paints an ugly picture. If the allegations in the memo are accurate, it appears Hauser may have fabricated data or, at best, repeatedly defended a nasty and unnecessary case of coding bias. And unless I’m missing something, it appears he may have strayed from the study design in a way that put him, wearing slick-soled shoes, on a very steep and slippery slope.
[Note: important update at bottom. It will make more sense after you’ve read the rest; but you should make sure you read it too.]
I excerpt the Chronicle story at length because of a point I want to make about technique.
An internal document … sheds light on what was going on in Mr. Hauser’s lab.… A copy of the document was provided to The Chronicle by a former research assistant in the lab who has since left psychology. The document is the statement he gave to Harvard investigators in 2007.
The former research assistant, who provided the document on condition of anonymity, said his motivation in coming forward was to make it clear that it was solely Mr. Hauser who was responsible for the problems he observed. The former research assistant also hoped that more information might help other researchers make sense of the allegations.
That’s the context, and good for CHE for providing it. It’s important to note that this is just one source so far; the account is damning but needs corroboration. Yet it should certainly be published, if for no other reason than to push Harvard to release more specifics.
The specifics offered here, meanwhile, portray a corruption of what can be a marvelously rigorous experimental approach. Again, at length, because it’s all important:
It was one experiment in particular that led members of Mr. Hauser’s lab to become suspicious of his research and, in the end, to report their concerns about the professor to Harvard administrators.
The experiment tested the ability of rhesus monkeys to recognize sound patterns. Researchers played a series of three tones (in a pattern like A-B-A) over a sound system. After establishing the pattern, they would vary it (for instance, A-B-B) and see whether the monkeys were aware of the change. If a monkey looked at the speaker, this was taken as an indication that a difference was noticed.
The method has been used in experiments on primates and human infants. Mr. Hauser has long worked on studies that seemed to show that primates, like rhesus monkeys or cotton-top tamarins, can recognize patterns as well as human infants do. Such pattern recognition is thought to be a component of language acquisition.
Researchers watched videotapes of the experiments and “coded” the results, meaning that they wrote down how the monkeys reacted. As was common practice, two researchers independently coded the results so that their findings could later be compared to eliminate errors or bias.
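The comparison step described here, two coders scoring the same trials independently and then reconciling disagreements, can be sketched in a few lines. To be clear, this is a minimal illustration and not the lab's actual procedure: the binary looked/did-not-look coding scheme and the use of Cohen's kappa as the agreement statistic are my assumptions about how such scores might be compared.

```python
# Sketch: comparing two coders' independent scores for the same trials.
# The binary looked (1) / did-not-look (0) coding is my assumption;
# the lab's actual scheme may have recorded gaze durations or time points.

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two binary coders."""
    assert len(coder_a) == len(coder_b) and coder_a
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, given each coder's base rate of "looked"
    p_a = sum(coder_a) / n
    p_b = sum(coder_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)

# Hypothetical scores for ten trials
coder_1 = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
coder_2 = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

# Trials where the coders disagree would be flagged for a third coder
disagreements = [i for i, (a, b) in enumerate(zip(coder_1, coder_2)) if a != b]
print("disagreeing trials:", disagreements)
print("kappa: %.2f" % cohens_kappa(coder_1, coder_2))
```

The point of the design is that disagreements surface mechanically, trial by trial, rather than depending on anyone's willingness to look for them.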
According to the document that was provided to The Chronicle, the experiment in question was coded by Mr. Hauser and a research assistant in his laboratory. A second research assistant was asked by Mr. Hauser to analyze the results. When the second research assistant analyzed the first research assistant’s codes, he found that the monkeys didn’t seem to notice the change in pattern. In fact, they looked at the speaker more often when the pattern was the same. In other words, the experiment was a bust.
But Mr. Hauser’s coding showed something else entirely: He found that the monkeys did notice the change in pattern—and, according to his numbers, the results were statistically significant. If his coding was right, the experiment was a big success.
It gets worse. Reportedly the second research assistant suggested, rather sensibly, that a third researcher score the results — and Hauser reportedly resisted, repeatedly, in an email exchange that is said to be part of the record in the Harvard investigation. From the Chronicle story:
“i am getting a bit pissed here,” Mr. Hauser wrote in an e-mail to one research assistant. “there were no inconsistencies! let me repeat what happened. i coded everything. then [a research assistant] coded all the trials highlighted in yellow. we only had one trial that didn’t agree. i then mistakenly told [another research assistant] to look at column B when he should have looked at column D. … we need to resolve this because i am not sure why we are going in circles.”
Eventually the research assistant and an equally troubled lab member, a grad student, reviewed and coded the trial themselves. Each coded the monkey’s responses separately — and each got scores matching those of the first assistant, contradicting Hauser’s.
Now comes the part that’s hard to watch:
They then reviewed Mr. Hauser’s coding and, according to the research assistant’s statement, discovered that what he had written down bore little relation to what they had actually observed on the videotapes. He would, for instance, mark that a monkey had turned its head when the monkey didn’t so much as flinch. It wasn’t simply a case of differing interpretations, they believed: His data were just completely wrong.
As word of the problem with the experiment spread, several other lab members revealed they had had similar run-ins with Mr. Hauser, the former research assistant says. This wasn’t the first time something like this had happened. There was, several researchers in the lab believed, a pattern in which Mr. Hauser reported false data and then insisted that it be used.
I think it’s clear to everyone that this looks really bad. If this account is accurate, Hauser either saw things that weren’t there — a spectacular case of expectancy bias — or reported things he did not see. The latter is known as data fabrication, and it is a huge sin.
Very troubling. But I wanted to make a point about technique here. If the Chronicle got this right, and if my understanding of these procedures is as correct as I think it is, this memo describes not just bias but — ouch — a protocol that provides invitations to bias (or fraud) that shouldn’t even exist.
Let me explain. I gained some familiarity with this basic experimental model a few years ago when I profiled Liz Spelke, a wonderful Harvard researcher of infant cognition, for Scientific American Mind. Spelke has done beautiful work plumbing the limits of child cognition by using experiments roughly like those Hauser is using here. (She is a co-author with Hauser on some papers, though, as far as I know, not on any under suspicion.) For the profile, besides talking with her at length and reading many of her papers, I toured her lab, saw some trials run, and watched students and assistants code some of the trial videos. And I remember admiring how rigorously she boxed out the possibility of coder bias among those scoring the videos.
As the Chronicle story notes, the core of this experimental model is to expose a monkey or infant to some stimulus, then change the stimulus and see if the subject notices — that is, looks up suddenly, or looks at something longer. As I described it in my piece:
At the heart of Spelke’s method is the observation of “attentional persistence,” the tendency of infants and children to gaze longer at something that is new, surprising, or different. Show a baby a toy bunny over and over again, and the baby will give it a shorter gaze each time. Give the bunny four ears on its tenth appearance, and if the baby looks longer, you know the baby can discern two from four. The method neatly bypasses infants’ deficiencies in speech or directed movement and instead makes the most of the one thing they control well: how long they look at an object.
Elizabeth Spelke did not invent the method of studying attentional persistence; that credit falls to Robert Fantz, a psychologist at Case Western Reserve who in the 1950s and early 1960s discovered that chimps and infants look longer at things they perceive as new, changed, or unexpected. A researcher could thus gauge an infant’s discriminatory and perceptual powers by showing him different, highly controlled scenarios, usually within a stagelike box directly in front of the infant, and observing what changes in the scenarios the infant would perceive as novel.
To do this rigorously, the coder should not know what the subject is being exposed to at any given moment. In Spelke’s lab, for instance, the babies sat on their moms’ laps in a quiet room facing a small table. The stimuli (patterns of dots, for instance) would be presented on a little curtained stage on the table before them. The webcam filming them, which was mounted over the little stage facing the babies, showed just the babies. It did not show what the babies were watching. (Spelke even had the moms wear blacked-out glasses so they couldn’t see the stimulus and somehow influence the baby’s reaction.)
This meant the coders watching the film saw just the babies and did not know what the babies were watching. They merely noted, within each little trial of a few minutes’ duration, when and for how long the baby’s gaze shifted from left to right, or wandered offstage, or returned to the stimuli.
In the Hauser experiment described in the Chronicle, the equivalent would seem to be to simply watch the monkeys, with no soundtrack playing and no idea what the monkeys were hearing, and note the points in time when they looked toward the loudspeaker and how long they did so. Only later would you compare those time points against those at which the sound pattern changed. In short, the coders should be blind — or deaf, as it were — to the monkeys’ stimulus, just as diagnostic coders in drug trials should be blind to which patients get the drug and which placebo. [Note: Later in the day after I posted this, I was informed that the original design protocol did indeed call for such blinding. To what extent or just how that broke down is unclear. See note at bottom for more.]
Yet by the Chronicle’s description, Hauser — and perhaps his other coders as well — knew quite well what the stimuli were, either because he was listening to the soundtrack or because he knew the patterns so well, having designed them, that he had them in his head when he coded the monkeys’ reactions.
Perhaps I’m missing some constraint here. But there seems to be no good reason that the coder should hear the soundtrack or know when the patterns change — and plenty of reason for the coders not to know these things.
If I’m missing something and someone in the know can lend perspective, please chime in. (You can comment below or write me at davidadobbs [at] gmail.com.) I think it’s important to mention this and clarify it as much as possible — partly so we know what went amiss, and partly to protect the rigorously won gains of an ingenious and effective experimental model in a field that is difficult but highly important.
These attentional studies can yield great results when used rigorously. But failing to blind the coders opens a world of temptation that clearly should stay closed.
I’d love to know more. We should know more. Harvard should out the report. Hauser could hardly look worse at this point. And an entire field is taking a horrific beating right now. I’m a little stunned that Harvard doesn’t have a more fluid, open mechanism to deal with cases like this.
NB: The experiment described in the memo mentioned above was never published, but these allegations are obviously relevant.
PS: Mind Hacks had a post a couple years ago on Spelke’s work. And Tinker Ready has a post at Nature Networks on what it was like to take an infant to one of Spelke’s trials.
IMPORTANT UPDATE 21 AUG 2010:
Late yesterday, some 12 hours after I published the post above, I was given further information about the protocol in question by someone with knowledge of it. The person provided credible i.d. but wishes to remain anonymous. The gist of the information is that, as appropriate to good practice, the protocol was originally designed to blind (or deafen) coders to the monkeys’ stimulus, so that the coder would merely observe a monkey in each trial, with the sound off and no knowledge of which pattern was being played, and score the monkey’s changes in behavior.
Obviously this doesn’t jibe with the coding approach that the memo described Hauser himself taking. And the Chronicle’s description leaves it unclear whether other lab members were following a fully blinded protocol during the stretch of time the memo describes. Hard at this point, if not impossible, to account for the discrepancy. Either of the anonymously sourced memos could be erroneous; the Chronicle description might have got some things wrong (easy to do); the protocol may have drifted a bit in the lab, loosening up (a serious problem); and/or the protocol might have been intentionally violated (even more serious problem).
So while the Chronicle memo certainly leaves the impression that Hauser knew the stimuli while he was coding, it never states specifically that was the case (or excerpts the memo with enough detail to know). There’s enough mud in the water to leave some doubt about that.
Does Hauser get the benefit of that doubt in light of the statement Harvard just released? Tough call. I’m not sure we have to or should make that call at this point. It’s not exactly a moot point, because we may be talking about the difference between intentional fabrication and something less deliberate. That’s why it’s important to get the whole record out at some not-too-distant point. I don’t think the information at hand — at least, as far as I’ve seen — gives us enough to judge those most serious questions completely.
Related posts at NC:
Science bloggers diversify the news – w Hauser affair as case study
Watchdogs, sniff this: What investigative science journalism can investigate
More fraud — or more light?
Errors, publishing, and power