Inside the Race to Save Climate Data in the Age of Trump
A day with a team of researchers from UCLA
Photo by Lukas Schulze/Getty
On Inauguration Day, a grim rain hammered Los Angeles for most of the morning and into the afternoon, part of a record-setting series of storms that would cause flooding, mudslides, and evacuations across southern California. Meanwhile, in a nondescript Lego-block building housing the department of Graduate Education and Information Studies on the UCLA campus, a diverse group gathered over their laptops for reasons practically as grim.
Inspired by and in conjunction with other “hackathons” at the University of Pennsylvania and the University of Toronto, the group’s mission was to preserve and protect precious scientific data related to climate change and environmental regulation by scraping as much information from the Department of Energy website as time would allow. They called it “a guerrilla archiving event.”
“We worked on the DOE data sets because we didn’t want to focus on anything that would be redundant with DataRefuge at UPenn or Toronto,” said Jennifer Pierre, a PhD student in the Department of Information Studies and one of the event’s organizers. “They had already done a lot of work on NASA, NOAA, and EPA, so when we talked to them they said, ‘Here’s everything we’ve scraped so far.’ It’s just a challenge to tackle this huge amount of data and prioritize what data sets should be examined and culled.”
The project speaks to the surreal nature of this political moment. As the Trump administration takes over with characters like Scott Pruitt at the Environmental Protection Agency and Rick Perry at the Department of Energy, no one’s kidding themselves that these men are tasked with anything besides degrading environmental protections and making the country and the world as vulnerable as possible to extractive industries, particularly coal, oil, and gas. For anyone who thought the global community had turned a corner on the battle to arrest runaway climate change, the Trump administration presents an existential threat, not only to taking action but to the very research that investigates what we’re doing to the planet. One of the first things the Trump administration did when it got control of whitehouse.gov was erase any and all references to climate change on the website.
“It’s not so much that we’re worried all these data sets will be erased, but with data the key is that it’s available and accessible,” said Britt Paris, another organizer and PhD in the Department of Information Studies. “What’s almost certain to happen is that these organizations will get defunded, and this can throw the management of data sets into disarray. Then they are made irrelevant over time and can become lost for all intents and purposes. It’s a degrading of the science that backs up our knowledge of what’s happening.”
If you’re like me, and your understanding of all things computer is limited to opening Word documents and streaming janky NBA feeds from reddit, what the UCLA group did that day was, in some cases, actually fairly simple. After assessing a website’s properties and the complexity of its source code, if a site was easily “crawlable,” they simply took the Internet Archive nomination tool and used it to “crawl” the website (meaning, to analyze and download the material in code form). That information is then stored at the Internet Archive, a nonprofit digital library and one of the few places that can hold all these terabytes and terabytes of data. If a site was deemed not crawlable, a subset of people spent the day working on the more technically difficult process of writing a script for alternative scraping. In conjunction with DataRefuge and the University of Toronto, the UCLA group also discussed best practices for maintaining the integrity of these data sets.
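For the curious, the sorting step described above can be pictured with a toy sketch in Python. This is not the group’s actual tooling: the volunteers judged each site by hand, and the names `PageSurvey`, `triage`, and `DATA_EXTENSIONS` are invented here for illustration. The sketch assumes, very roughly, that a page whose data sits behind ordinary links can be left to a crawler, while anything hidden behind a search form needs a hand-written script.

```python
from html.parser import HTMLParser

# Hypothetical list of file types treated as "data" for this sketch.
DATA_EXTENSIONS = (".csv", ".zip", ".xls", ".xlsx", ".json", ".pdf")


class PageSurvey(HTMLParser):
    """Collects plain links to data files and counts forms on a page."""

    def __init__(self):
        super().__init__()
        self.data_links = []
        self.forms = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            href = attrs.get("href") or ""
            if href.lower().endswith(DATA_EXTENSIONS):
                self.data_links.append(href)
        elif tag == "form":
            self.forms += 1


def triage(html):
    """Return 'crawl' or 'custom script' for one page of HTML."""
    survey = PageSurvey()
    survey.feed(html)
    # Assumption: data reachable through ordinary links can be crawled;
    # everything else is handed to the people writing scrapers.
    if survey.data_links and survey.forms == 0:
        return "crawl"
    return "custom script"
```

Fed a page that is little more than a list of downloadable files, `triage` answers “crawl”; fed a page built around a query form, it answers “custom script.” Real sites are messier than that, which is why the judgment calls stayed with the humans in the room.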
“Data can’t just be saved. It has to be usable, findable and preserved in a way that’s cohesive,” said Pierre. “Especially with environmental data that uses longitudinal systems of observation like species growth or weather patterns or sea ice extent or temperature gradations. You have to have proper metadata around this data. There’s a reasoning and logic to how the info is organized. Policy has to be backed up and informed by data that really represents a body of evidence in a readable and understandable manner.”
In this post-truth moment, that almost sounds radical, doesn’t it?
As Donald Trump delivered his “American carnage” speech, I spent the afternoon listening to this group brainstorm methods and ideas for preserving an indispensable archive of knowledge. It was that fabled assembly of a few dedicated individuals chipping in to change the world, and it had the surprisingly jovial mood created by the sense of like-minded people linking arms. Pierre, Paris, and their fellow organizers ordered too much pizza, and the boxes had to be piled beneath folding tables. Junk food and quick calories were abundant. The carpet of this institutional classroom, the kind you hated from Introduction to Statistics class, was a disaster of crumbs and soda stains. Professors, undergrads, and over-tattooed grad students pecked away at laptops. Jason Scott, the representative from the Internet Archive, oversaw the proceedings in an intriguing black top hat and a button-up shirt of vines and skulls. Everyone’s mind seemed to work in overdrive. No one spoke at less than a mile a minute.
After all, as this data comes under political risk, there are unseen consequences. Several people discussed the downstream impacts losing this data could have on the private sector or the military. This data is important not just to scientists or environmental activists but to everyone from farmers to commercial shippers to the Army Corps of Engineers.
Steve Diggs, a researcher at UC San Diego, had an outlook that could be described as either bleak or hard-headedly realistic. Diggs’s work involves archiving and managing hydrographic data, basically deep ocean physics and chemistry. His work has been used in the Intergovernmental Panel on Climate Change’s Fifth Assessment Report. He saw the threat as not just a loss of data but a loss of funding for science, period.