
Artificial intelligence is transforming analog archives into digital data at the Boston Public Library. The library launched the initiative this summer, a big step toward making old documents more accessible.
As a federal depository, the Boston Public Library boasts one of the largest collections in the country. There are thousands of items you can only view by physically coming in, but with the help of technology, the library is hoping to change that.
“It will make these collections that the BPL holds accessible to anyone in the world,” said Jessica Chapel, chief of digital and online services at the Boston Public Library.
The library is launching a large-scale project to digitize its vast historical collections dating back hundreds of years. Chapel said the goal is to enhance the metadata of documents to make them more searchable online.
“We need this to be sort of as good as or better than human generated metadata,” Chapel said.
That’s where Harvard Law School’s Institutional Data Initiative comes in. IDI works with knowledge institutions around the world to refine and publish their collections as data.
Greg Leppert, executive director of the Institutional Data Initiative, describes this as a win-win partnership.
“Institutions have a lot of data that can be highly impactful and could guide the creation of AI across different phases of its development,” he said. “Let’s make the AI models better at extracting info for libraries, let’s make the models better at citing back to material libraries have, and let’s make info on the whole more reliable as a result.”
Most of the heavy lifting happens at the library, where documents are scanned page by page.
It’s a labor-intensive process, made possible through a grant from OpenAI.
Every document undergoes a conservation and imaging review before being scanned.
“The goal is always no damage,” Chapel said. “We always want things to come out of digitization in as good a shape as they arrived down here.”
The project remains in its pilot phase through the end of the year. The plan is to scale up the digitization process while keeping the library’s values at the forefront.
“We’re not automating the creation of metadata and just putting it out there without human review,” Chapel said. “We’re being very careful in terms of what is generated.”
It takes about an hour to scan 500 pages, so it’s a very time-consuming process. But the library hopes to be able to digitize nearly 250,000 documents over the next three years.