(C) Alec Muffett's DropSafe blog.

Author Name: Alec Muffett
This story was originally published on alecmuffett.com. [1]
License: CC BY-SA 3.0. [2]

No data? No problem! Undisclosed tinkering in Excel behind economics paper

2024-02-05 00:00:00

Almas Heshmati

Last year, a new study on green innovations and patents in 27 countries left one reader slack-jawed. The findings were no surprise. What was baffling was how the authors, two professors of economics in Europe, had pulled off the research in the first place.

The reader, a PhD student in economics, was working with the same data described in the paper. He knew they were riddled with holes – sometimes big ones: For several countries, observations for some of the variables the study tracked were completely absent. The authors made no mention of how they dealt with this problem. On the contrary, they wrote they had “balanced panel data,” which in economic parlance means a dataset with no gaps.

“I was dumbstruck for a week,” said the student, who requested anonymity for fear of harming his career. (His identity is known to Retraction Watch.)

The student wrote a polite email to the paper’s first author, Almas Heshmati, a professor of economics at Jönköping University in Sweden, asking how he dealt with the missing data.

In email correspondence seen by Retraction Watch and a follow-up Zoom call, Heshmati told the student he had used Excel’s autofill function to mend the data. He had marked anywhere from two to four observations before or after the missing values and dragged the selected cells down or up, depending on the case. The program then filled in the blanks. If the new numbers turned negative, Heshmati replaced them with the last positive value Excel had spit out.
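To make the procedure concrete, here is a minimal sketch of what such a drag might compute. It assumes the fill extends a straight line fitted by least squares to the selected cells, which is how a spreadsheet fill handle typically treats a selection of several numbers; Heshmati's exact spreadsheet steps may have differed. The function name autofill_like is hypothetical, and the fallback to the nearest observed value (used only when the very first filled number is negative) is an assumption the article does not address.

```python
def autofill_like(known, n_missing, backward=False):
    """Hypothetical sketch: extend `known` by `n_missing` cells along a least-squares line.

    known     -- the 2-4 observed values next to the gap, in chronological order
    backward  -- False: the gap follows `known` (drag down);
                 True: the gap precedes `known` (drag up)
    Returns the filled values in chronological order.
    """
    if len(known) < 2:
        raise ValueError("need at least two observed values to fit a trend")

    # Ordinary least-squares line through the points (0, known[0]), (1, known[1]), ...
    n = len(known)
    mean_x = (n - 1) / 2
    mean_y = sum(known) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(range(n), known))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Positions in the order a drag produces them: moving away from the selected block.
    if backward:
        positions = [-(k + 1) for k in range(n_missing)]
    else:
        positions = [n + k for k in range(n_missing)]

    # Rule reported in the article: a negative result is replaced by the last positive
    # value produced so far. Assumption: if no positive value has been produced yet,
    # fall back to the observed value nearest the gap.
    last_positive = known[0] if backward else known[-1]
    generated = []
    for x in positions:
        value = intercept + slope * x
        if value < 0:
            value = last_positive
        elif value > 0:
            last_positive = value
        generated.append(value)

    # Backward fills were generated right-to-left; return them in time order.
    return generated[::-1] if backward else generated


# A declining series filled four steps forward: the fourth value would be -2,
# so it is replaced by the last positive value, 2.
print(autofill_like([10, 8, 6], 4))
print(autofill_like([6, 8, 10], 2, backward=True))
```

Even this toy case shows why experts objected: the filled values are pure extrapolation from two to four points, and the negative-value rule silently produces a flat segment that no real observation supports.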

The student was shocked. Replacing missing observations with substitute values – an operation known in statistics as imputation – is a common but controversial technique in economics that allows certain types of analyses to be carried out on incomplete data. Researchers have established methods for the practice; each comes with its own drawbacks that affect how the results are interpreted. As far as the student knew, Excel’s autofill function was not among these methods, especially not when applied in a haphazard way without clear justification.

But it got worse. Heshmati’s data, which the student convinced him to share, showed that in several instances where there were no observations to use for the autofill operation, the professor had taken the values from an adjacent country in the spreadsheet. New Zealand’s data had been copied from the Netherlands, for example, and the United States’ data from the United Kingdom.

This way, Heshmati had filled in thousands of empty cells in the dataset – well over one in 10 – including missing values for the study’s outcome variables. A table listing descriptive statistics for the study’s 25 variables referred to “783 observations” of each variable, but did not mention that many of these “observations” were in fact imputations.
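The “783 observations” figure counts filled cells and genuine data alike. As a hedged illustration only, the sketch below assumes the raw, pre-imputation panel is kept with None marking each gap (the variable names and numbers are invented), and tallies observed versus imputed cells per variable. That is the kind of breakdown a descriptive-statistics table could report next to the total.

```python
def observation_counts(raw_panel):
    """Per variable, count genuine observations versus cells that had to be imputed.

    raw_panel -- hypothetical dict mapping variable name to its raw values,
                 with None marking a gap in the original data
    """
    counts = {}
    for variable, values in raw_panel.items():
        observed = sum(v is not None for v in values)
        counts[variable] = {
            "observed": observed,
            "imputed": len(values) - observed,
            "total": len(values),
        }
    return counts


def imputed_share(raw_panel):
    """Fraction of all cells in the panel that were missing before imputation."""
    total = sum(len(values) for values in raw_panel.values())
    missing = sum(v is None for values in raw_panel.values() for v in values)
    return missing / total


# Invented toy panel: two variables, four country-years each.
toy = {
    "patents": [5, None, 7, None],
    "rd_spending": [1.2, 1.3, None, 1.5],
}
print(observation_counts(toy))
print(imputed_share(toy))
```

Reporting only the "total" column, as the paper's table effectively did, hides the distinction the other two columns make visible.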

“This fellow, he imputed everything,” the student said. “He is a professor, he should know that if you do so much imputation then your data will be entirely fabricated.”

Other experts echoed the student’s concerns when told of the Excel operations underlying the paper.

“That sounds rather horrendous,” said Andrew Harvey, a professor of econometrics at the University of Cambridge, in England. “If you fill in lots of data points in this way it will invalidate a lot of the statistics and associated tests. There are ways of dealing with these problems correctly but they do require some effort.

“Interpolating data is bad practice but lots of people do it and it’s not dishonest so long as it’s mentioned,” Harvey added. “The other point about copying data from one country to another sounds much worse.”

Søren Johansen, an econometrician and professor emeritus at the University of Copenhagen, in Denmark, characterized what Heshmati did as “cheating.”

“The reason it’s cheating isn’t that he’s done it, but that he hasn’t written it down,” Johansen said. “It’s pretty egregious.”

The paper, “Green innovations and patents in OECD countries,” was published in the Journal of Cleaner Production, a highly ranked title from Elsevier. It has been cited just once, according to Clarivate’s Web of Science.

Neither the publisher nor the journal’s editors, whom the student said he alerted to his concerns, have responded to our requests for comment.

Heshmati’s coauthor, Mike Tsionas, a professor of economics at Lancaster University in the UK, died recently. In a eulogy posted on LinkedIn in January, the International Finance and Banking Society hailed Tsionas as “a true luminary in the field of econometrics.”

In a series of emails to Retraction Watch, Heshmati, who, according to the paper, was responsible for data curation, first said Tsionas had been aware of how Heshmati dealt with the missing data.

“If we do not use imputation, such data is almost useless,” Heshmati said. He added that the description of the data in the paper as “balanced” referred to “the final data” – that is, the mended dataset.

Referring to the imputation, Heshmati wrote in a subsequent email:

Of course, the procedure must be acknowledged and explained. I have missed to explain the imputation procedure in the data section unintentionally in the writing stage of the paper. I am fully responsible for imputations and missing to acknowledge it.

He added that when he was approached by the PhD student:

I offered him a zoom meeting to explain to him the procedure and even gave him the data. If I had other intensions [sic] and did not believe in my imputation approach, I would not share the data with him. If I had to start over again, I would have managed the data in the same way as the alternative would mean dropping several countries and years.

Gary Smith, a professor of economics at Pomona College in Claremont, California, said the copying of data between countries was “beyond concerning.” He reviewed Heshmati’s spreadsheet for Retraction Watch and found five cases where more than two dozen data points had been copied from one country to another.

Marco Hafner, a senior economist at the RAND Corporation, a nonprofit think tank, said “using the autofill function may not be the best of ideas in the first place as I can imagine it is not directly evident to what conditions missing values have been determined/imputed.”

Hafner, who is research leader at RAND Europe, added that “under reasonable assumptions and if it’s really necessary for analytical reasons, one could fill in data gaps for one country with data from another country.” But, he said, the impact of those assumptions would need to be reported in a sensitivity analysis – something Heshmati said he had not done.

“At the bare minimum,” Hafner said, the paper should have stated the assumptions underlying the imputation and how it was done – something that, he added, would have reduced the chances of the work getting published should the reviewers find the methods inappropriate.

[END]

[1] URL: https://retractionwatch.com/2024/02/05/no-data-no-problem-undisclosed-tinkering-in-excel-behind-economics-paper/
[2] URL: https://creativecommons.org/licenses/by-sa/3.0/

DropSafe Blog via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/alecmuffett/