Preprints have been shared in the physics community since the early 1950s, but mostly among well-established professors. Physicist Paul Ginsparg, who receives the Einstein Foundation’s Individual Award, set out to democratize access to scientific results. Today, his preprint server arXiv has spread to many other fields and has made scientific progress more efficient and fairer.
By Andrew Curry
In August 1991, Paul Ginsparg was a mid-career physicist at Los Alamos National Laboratory in the U.S. He had a desktop computer of his own for the first time, but most of his colleagues still measured e-mail storage space in the hundreds of kilobytes, meaning they often had no room to store work-in-progress papers sent by colleagues.
Harnessing the fledgling internet—then still mostly the province of academics and computer experts—Ginsparg set up an e-mail server that stored articles remotely, allowing anyone with an e-mail account access to the latest research in high-energy physics before it was published in journals and available in university libraries. At the time, Ginsparg had no idea that his solution to the problem would one day change the face of science.
Now known as arXiv, Ginsparg’s invention was the first preprint server. In the more than 30 years since it went online, arXiv has revolutionized the way research results are shared. Put simply, Ginsparg’s key innovation was to make sharing early versions of research results possible online, something that may seem obvious today but was decidedly not in the late ’80s and early ’90s.
Iulia Georgescu, chief editor of Nature Reviews Physics, who nominated Ginsparg for the Einstein Foundation’s award, calls arXiv the first digital platform for scholarly communication. “What Ginsparg did changed the way we work,” Georgescu says. “ArXiv is a tool for collaboration and communication, and that makes science better.”
When he started thinking about creating a server to distribute early copies of unpublished articles, Ginsparg says, he was drawing on a much older tradition. By the late ’50s, sharing preprints (experimental results or theoretical insights that had not yet been published in a peer-reviewed journal) was a common practice in the rarefied high-energy physics community. “Long before I was a grad student in the late ’70s, there was already an organized preprint distribution system in high-energy physics,” Ginsparg says. But the system was decidedly analog. More prominent professors got photocopies and passed them to friends and colleagues, whose grad students saw them first. Photocopies eventually went out in the mail, to be shipped across the country or across the Atlantic at surface-mail rates to save costs. “Every week,” Ginsparg recalls, “libraries would put up the new collection of preprints and you would go and have a coffee and peruse preprint stacks.” For Ginsparg, who worked at Harvard before heading to Los Alamos in 1991, the system worked fine. But for people at smaller institutions, waiting on a physical copy published in a scholarly journal might mean a delay of up to a year before the latest research made it to their libraries.
In the fast-moving field of high-energy physics, where theories about how the world worked could be upended by the discovery of a new subatomic particle, the lag could be a career-killer. “When there were rumors of new experiments that showed a new particle had been discovered, you couldn’t wait six months to a year to start thinking about new, amazing results,” Ginsparg says now. “It was unfair: There were privileged people who had advance access to research information.” It was also an inefficient way to move science forward. “It meant multiple people plugging away at the same problems, not realizing they may have already been solved,” Ginsparg says.
The inequality of the situation gnawed at Ginsparg and many of his colleagues. “There was this feeling that if you do great work, you want to think you worked harder or were more inspired than your competition,” Ginsparg says, “not that you succeeded because you had privileged access to information.” But in the ’70s and ’80s, with e-mail still in its infancy and data at a costly premium, there weren’t any obvious solutions. “We wouldn’t have done it this way if we had the technology to disseminate it differently,” Ginsparg says. “Given the technology of the time, there was no way of doing it that was intrinsically fair.”
Then technology began to change. By the ’80s, most people in the physics community had access to e-mail, a near-instantaneous, near-free means of communicating. In the summer of 1991, Ginsparg spent a few hours talking with colleagues about a central server for physics preprints. Although a central server that could store papers until a researcher requested them had logistical advantages—many e-mail accounts at the time counted their storage capacity in kilobytes—Ginsparg also hoped to make the distribution of research results more democratic. “This fairness issue was paramount,” he says.
It was the right idea at the right time: 1991, the first year of what eventually became the arXiv, coincided with the collapse of the Soviet Union, a difficult time for scientists in former Soviet republics whose universities lost access to expensive journals. “I received e-mails from people everywhere from former Soviet republics saying their libraries were no longer able to subscribe to journals and it didn’t matter as long as they had a modem,” Ginsparg says. “That was really gratifying.” Georgescu, too, relied on the arXiv to access the latest results when she was an undergraduate student at the University of Bucharest in Romania. And when she began publishing preprints of her own, comments from other users helped her improve them. The preprint’s ability to democratize access to scientific results—and science itself—only grew stronger over the decades that followed.
Ginsparg and Georgescu agree that the attitude towards preprints is closely tied to the culture of high-energy physics, where experiments in the ’60s and ’70s involved hundreds of co-authors and massive facilities. As a result, researchers, both experimentalists and the theoreticians who dominated arXiv’s early submissions, tended to approach problems in a collaborative way.
Yet over the past decade, the preprint concept has spread from physics to conquer other fields in science as well. One major stumbling block was the way some researchers emphasized publication in a journal. In many fields, credit for discoveries goes to whoever publishes a final paper first. “To us in physics, this was nonsensical,” Ginsparg says.
Slowly, however, researchers began to realize that time-stamped preprints were an alternative way of establishing that you arrived at a key insight or result first. “Things are really starting to change,” Georgescu says. “Today, researchers from life and clinical sciences are more open to the idea of preprints because a large part of the community consists of younger scientists who grew up with the ideas of sharing and of having easy digital access to scholarly content.” That, Georgescu says, helped the model spread to other fields of science. BioRxiv, a preprint server for biological sciences, was founded in 2013; medRxiv, focused on health sciences, followed in 2019. There is an AfricArxiv, a NutriXiv and a techRxiv, plus dozens more servers devoted to other scientific or geographic niches.
Slowly at first, and then like a tidal wave, the preprint server concept that Ginsparg pioneered has profoundly changed the way science is done. Thirty years later, arXiv handles 200,000 submissions each year, and automated algorithms written by Ginsparg or his collaborators step in hundreds of times a day to address issues with submissions or to weed out problematic papers.
In its early years, arXiv was controversial—distributing preprints for free was seen as a threat to established journals, with their infrastructure, editors and anonymous peer reviewers. “Initially, I thought those two systems couldn’t co-exist,” Ginsparg says. But the journals didn’t disappear: Journal publications continue to serve as the peer-reviewed “version of record” for science, and they remain important for early-career researchers, who at the same time have access to early results via arXiv and other preprint servers.
Early concerns that preprints would disseminate erroneous results, or put bad science into the public sphere, have proven not just unfounded but incorrect. During the COVID-19 pandemic, peer-reviewed journals published a string of papers that had to be retracted, “things that never should have made it through peer review,” Ginsparg says. Meanwhile, preprints put out in the early days of the pandemic were retracted or improved after feedback from colleagues. Others communicated new results about treatments to doctors at a time when waiting for peer review could cost lives.
Preprints, it turns out, are the perfect way to put information in front of a wide audience of experts, who can weigh in early on any issues. And journalists are learning to report on preprints with appropriate qualifiers: identifying information drawn from preprints as “not yet peer reviewed,” for example.
The future of arXiv looks bright. Ginsparg continues to tinker, working on artificial intelligence tools that could further streamline the preprint process. “I want the processes of navigation, information and discovery to be improved,” Ginsparg says. “We want a system that keeps us informed. Ideally, it will have personalized highlighting of the latest results and breaking developments.”