A revised computer analysis incorporates suggestions from the embattled climate denier
On Valentine's Day, an unknown whistleblower leaked to the press several secret internal documents belonging to the climate denial outfit The Heartland Institute, including a controversial plan to teach antiscience climate denial in a national K-12 science curriculum.
Heartland remained surprisingly quiet about the leak for about twenty-four hours. Then the next day, the organization issued a statement acknowledging the leak but claiming that one of the leaked documents was a "fake."
This was surprising, since the document, titled "Confidential Memo: 2012 Heartland Climate Strategy," (PDF) simply recapitulates the information contained in much more incriminating detail in Heartland's undisputed Fundraising Plan (PDF). Why all the fuss over the strategy memo? Did Heartland have something to hide?
A clairvoyant claim
In the days that followed, Heartland-aligned blogger Steven Mosher quickly identified the most likely whistleblower as renowned climate scientist Peter Gleick. This was a stunningly specific claim and potentially libelous, particularly on the basis of the limited evidence Mosher offered: that the strategy memo, unlike the other documents, had been scanned in; that its metadata indicated a Pacific coast time zone; and that it had some stylistic similarities to the way Peter Gleick writes, especially in the use of parentheses and commas - similarities that others noted were also common in the undisputed leaked documents, seemingly invalidating them as criteria for such a bold and public conclusion.
It seemed to me that to make such a certain and specific public claim, which even commenters on Heartland-aligned blogger Anthony Watts' blog later called "near-clearvoyant" [sic], Mosher must have had more solid evidence than he was publicly indicating.
Then Gleick stepped forward and admitted that Mosher was right: he had, in fact, been the person who duped the Heartland Institute into sending him their documents, by posing as a board member.
Wow! Mosher must be a genius! How did he do that!!?
Something rotten in Denmark
In his post identifying himself as the whistleblower, Gleick offered a possible clue. He said someone had mailed the climate strategy memo to him anonymously, and in trying to authenticate its details, he had gone on a fishing expedition in Lake Heartland and landed a lunker - the large cache of far more detailed, incriminating, and undisputed documents.
Could it be that someone aligned with Heartland had sent Gleick the climate strategy memo but didn't expect him to turn the tables on them? Could that be why Mosher and others so confidently fingered Gleick so quickly - because they already knew beyond any doubt? After all, there are probably millions of scanners owned by tree huggers in the Pacific time zone. Why immediately, publicly, clairvoyantly, even recklessly pick Gleick, at the risk of a lawsuit? Mosher's explanations didn't add up.
A stylometric computer analysis
Several days ago, Heartland Institute-aligned climate denial blogger Anthony Watts made an interesting suggestion. To determine whether or not the document was authored by Gleick, people could use stylometry and textometry. Watts even helpfully suggested a well-regarded open source Java app called JGAAP that purports to do this, and directed people to give it a try.
I decided to take him up on it.
I reported on this experiment some days ago. I used the program to perform analyses of documents written by Peter Gleick, Heartland Staff, and Heartland Institute president Joe Bast, and to compare their writing styles to the allegedly fake climate strategy memo. Surprisingly, the program indicated that of those three options, by far the most likely author of the allegedly forged Heartland climate strategy memo was not Peter Gleick, not Heartland staff, but Heartland Institute president Joe Bast.
Now, before anyone gets too excited, I want to say I am not pulling a Steven Mosher here. The program and my methodology may be subject to flaws. I may have typographical errors in my documents that could influence the results. I may not have chosen the best methods of analysis. The documents I selected may not be a large enough representative sample of the respective writings of the various authors. I may not have chosen a broad enough selection of authors. The program may contain logical or mathematical errors. I would encourage others to attempt to replicate, critique, and perform other analyses.
A surprising response
Because of the uncertainties, I supplied links to the program and the documents I used so others could replicate my work, and I invited readers to criticize my methodology.
Then I got a surprise. One of the most compelling criticisms came from none other than Heartland Institute president Joe Bast, who posted a very lengthy explanation on the Heartland Institute website, accompanied by a statement (PDF) explaining why we should agree that the memo is a fake despite its recapitulating the other documents in lesser detail. Further, Bast argued, he was not the author, and offered this critique:
Computerized Text Analysis
Efforts apparently are underway to use authorship analysis software to find the true author or authors of the memo. Since the memo contains so much material copied and pasted from, or paraphrases of, my own writing, such a comparison of the content and writing style of the forged memo and the stolen documents wouldn't rule me out as a possible author of the memo. I hope persons conducting such analyses will use the text highlighted in the forged memo attached to this current essay, rather than the entire memo, so that their investigation is limited to the actual words of the forger rather than my own.
Bast's high level of concern that he be "ruled out" as an author of the memo again seems striking, but his critique contains what seems like a reasonable suggestion. It's possible that the analyses identified Bast as the most likely author because so much of the strategy memo appears to be cut and pasted from other Heartland documents that he apparently wrote himself.
Bast provided a new version of the strategy memo in which he highlighted the areas that he said had not been cut and pasted from his other writings (PDF) but were rather the work of "the forger" and suggested that only those areas should be used in any stylometric analysis.
To be as fair and objective as possible, I decided to rerun my stylometric analyses only on those sections of writing Bast himself identified.
I began by copying only the text in the disputed climate strategy memo that Bast highlighted as not being cut and pasted from the other Heartland documents. I pasted it into a Word document, which you can download to replicate my work (DOCX). You can compare the Word document to Bast's PDF to confirm that I only copied the text he identified.
I then ran the analysis precisely as before, using the exact same parameters. From the JGAAP documentation:
Among the simplest to understand conceptually is the so-called "nearest neighbor" algorithms; in this method, we "embed" each document into a high level abstract space of events. Each test document will be examined to see which of the training documents it is "closest" to; for example, if a document is 0.05 units away from a poem by Shakespeare, but 1.75 units away from Spencer, it's more likely to be by Shakespeare.
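To make the "nearest neighbor" idea concrete, here is a minimal sketch in Python of the kind of comparison described above. It is not JGAAP's code: the function names, the toy texts, and the choice to normalize character-pair counts into relative frequencies are my own assumptions for illustration, and JGAAP may preprocess and weight text differently. The distance used is the standard Canberra distance, which I take to be what the program output labels "Camberra Distance."

```python
from collections import Counter


def char_bigrams(text):
    """Relative frequency of each overlapping two-character sequence.

    Assumption: lowercase the text and collapse whitespace first.
    JGAAP's own canonicalization options may differ.
    """
    text = " ".join(text.lower().split())
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()}


def canberra(p, q):
    """Canberra distance: sum of |a - b| / (a + b) over all events seen."""
    total = 0.0
    for gram in set(p) | set(q):
        a, b = p.get(gram, 0.0), q.get(gram, 0.0)
        total += abs(a - b) / (a + b)  # a + b > 0: gram occurs in p or q
    return total


def rank_authors(unknown_text, known_texts):
    """Return (label, distance) pairs, closest known document first."""
    target = char_bigrams(unknown_text)
    scores = {
        label: canberra(target, char_bigrams(text))
        for label, text in known_texts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical toy samples, not the documents used in this article.
    known = {
        "Author A": "We will fund the project and report to our donors.",
        "Author B": "Water policy demands careful, peer-reviewed science.",
    }
    for label, distance in rank_authors("We report to donors who fund us.", known):
        print(label, distance)
```

A smaller distance means the two documents use character pairs in more similar proportions, which is why the lowest score is listed as the most likely author.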
The six known author choices were the Original Climate Strategy Memo (as a control), Peter Gleick 1 & 2, Joe Bast 1 & 2, and Heartland Staff. Here are the scores JGAAP assigned for the most likely authorship of the climate strategy memo "forger's" language Bast identified, ranked from most likely author to least likely author (a smaller distance means a closer match) under the same three analyses as before:
Forger's strategy memo language per Bast.docx
Analyzed by Nearest Neighbor Driver with metric Camberra Distance using Character 2Grams as events
1. Original Climate Strategy Memo 2.383187766675455
2. Joe Bast 2 2.9475177474704637
3. Peter Gleick 2 5.897504106656031
4. Heartland Staff 6.248560726914315
5. Joe Bast 10.685914472504823
6. Peter Gleick 11.393316484623435
Analyzed by Nearest Neighbor Driver with metric Camberra Distance using Word 2Grams as events
1. Joe Bast 2 3.6909680918553427
2. Heartland Staff 4.538336544898377
3. Original Climate Strategy Memo 5.194939408160381
4. Peter Gleick 2 10.459679587471603
5. Joe Bast 12.951829867316958
6. Peter Gleick 16.192515938693514
Analyzed by Nearest Neighbor Driver with metric Camberra Distance using Word stems as events
1. Joe Bast 2 7.000819201427195
2. Heartland Staff 7.637334302162223
3. Original Climate Strategy Memo 8.045029041744858
4. Peter Gleick 2 11.391513804534949
5. Peter Gleick 17.094372831175303
6. Joe Bast 17.512155036801435
According to the above analyses by the JGAAP software, which as I caution above may contain unknown errors, and considering only the "forger's" language not cut and pasted from other Heartland documents as identified by Heartland Institute president Joe Bast, the most likely author of the climate strategy memo is - still - Heartland Institute president Joe Bast.
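The reading of the rankings above can be checked with a few lines of Python. This sketch only re-reads the scores reported in this article; it does not rerun JGAAP, and the variable and function names are my own. It sets aside the Original Climate Strategy Memo, which was included as a control, and reports the closest remaining sample in each analysis.

```python
# Distances copied from the three JGAAP rankings reported in this article.
scores = {
    "Character 2Grams": {
        "Original Climate Strategy Memo": 2.383187766675455,
        "Joe Bast 2": 2.9475177474704637,
        "Peter Gleick 2": 5.897504106656031,
        "Heartland Staff": 6.248560726914315,
        "Joe Bast": 10.685914472504823,
        "Peter Gleick": 11.393316484623435,
    },
    "Word 2Grams": {
        "Joe Bast 2": 3.6909680918553427,
        "Heartland Staff": 4.538336544898377,
        "Original Climate Strategy Memo": 5.194939408160381,
        "Peter Gleick 2": 10.459679587471603,
        "Joe Bast": 12.951829867316958,
        "Peter Gleick": 16.192515938693514,
    },
    "Word stems": {
        "Joe Bast 2": 7.000819201427195,
        "Heartland Staff": 7.637334302162223,
        "Original Climate Strategy Memo": 8.045029041744858,
        "Peter Gleick 2": 11.391513804534949,
        "Peter Gleick": 17.094372831175303,
        "Joe Bast": 17.512155036801435,
    },
}

CONTROL = "Original Climate Strategy Memo"


def closest_candidate(ranking, exclude=CONTROL):
    """Label with the smallest distance, ignoring the control document."""
    candidates = {label: d for label, d in ranking.items() if label != exclude}
    return min(candidates, key=candidates.get)


for events, ranking in scores.items():
    print(events, "->", closest_candidate(ranking))
```

In all three analyses the closest non-control sample is the one labeled Joe Bast 2. The other Bast sample ranks near the bottom each time, which is one more reason to treat these results with the caution described earlier.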
Which leads me back to my earlier questions. How did Mosher and other Heartland-aligned climate denial bloggers know immediately, and with such high confidence, that the whistleblower was Peter Gleick when, as we see, this is tricky stuff? And why is Joe Bast going to such great lengths to disavow the memo?
Get Shawn Lawrence Otto's new book: Fool Me Twice: Fighting the Assault on Science in America, Starred Kirkus Review; Starred Publishers Weekly review. Visit him at http://www.shawnotto.com. Like him on Facebook. Join ScienceDebate.org to get the presidential candidates to debate science.