Derek Martin on Nostr: Regarding AI training data & book piracy, in 2011 Google had a project whose goal was ...
Regarding AI training data & book piracy, in 2011 Google had a project whose goal was to scan and index 25 million library books. They ran into legal issues, and the judge said they could keep scanning, but couldn't put them online for everyone to read. They could, however, allow people to query them to figure out what book had the info they needed. They could also use excerpts in search results, but only a page or so. I bet that's a huge set of AI training data now.
https://www.edsurge.com/news/2017-08-10-what-happened-to-google-s-effort-to-scan-millions-of-university-library-books
Published at 2025-03-21 13:15:10
Event JSON
{
"id": "6567c46cab8ec2eb4afd713159bd98d665bc414c5e2e773e3398d6e129d78ce5",
"pubkey": "765d163b0d0ef6f653412b4775f89ec81b323a307fcf7c67d60b27bea869fca2",
"created_at": 1742562910,
"kind": 1,
"tags": [
[
"proxy",
"https://mastodon.cloud/users/lo_fye/statuses/114200602905310771",
"activitypub"
]
],
"content": "Regarding AI training data \u0026 book piracy, in 2011 Google had a project whose goal was to scan and index 25 million library books. They ran into legal issues, and the judge said they could keep scanning, but couldn't put them online for everyone to read. They could, however, allow people to query them to figure out what book had the info they needed. They could also use excerpts in search results, but only a page or so. I bet that's a huge set of AI training data now. https://www.edsurge.com/news/2017-08-10-what-happened-to-google-s-effort-to-scan-millions-of-university-library-books",
"sig": "725b9369a8ac7fc2f3aea34e6ffa5dcc08e26a40d11380ccef4dfb8cbb2e503fcfeb9b85850a44f9010db9b31dcc4721272385a56ef1361c729c8a5f1d726919"
}