A search for "Internet Archive rumors" returns a copy of Fleetwood Mac "Rumours" on my first page of results. Playable in browser and downloadable in high-quality lossless format.
The book lawsuit was over current titles (not really archival and preservation), and the record lawsuit wasn't really about the rare 78s; it was about the modern Jimi Hendrix and Paul McCartney records that somehow slipped in, and about the Archive's refusal to follow the modern law, one they themselves celebrated, that made what they're trying to do (including downloads) explicitly legal. But that law prohibited fundraising, and they couldn't resist tweeting out links to Frank Sinatra records with a big banner on top asking for money.
In both lawsuits the discovery revealed tech debt and sloppy process at the Archive that made it impossible for them to argue on behalf of the future we all want.
The Archive purchased 78s likely destined to be destroyed. These were digitized. [0]
"The focus of the lawsuit was the Internet Archive’s Great 78 Project, which officially started in 2017 and aimed to digitize the shellac discs that were the dominant medium for recorded music from the 1890s until the 1940s and 1950s, when vinyl arrived. With the help of audio preservationist George Blood (who was also named as a defendant in the suit), the Archive said it has digitized more than 400,000 of these old recordings." [1]
Benn Jordan's discussion of piracy in general. [2]
[0] https://great78.archive.org/
[1] https://www.rollingstone.com/music/music-news/internet-archi...
[2] https://www.youtube.com/watch?v=L7EHRpnJICQ&t=952s
> The Archive purchased 78s likely destined to be destroyed.
It's worse than that.
Many large collections were donated to the Archive under the agreement that they would be made available to the public. That is part of what influenced Brewster to double down on his original error.
> The Joe Terino Collection, a collection of 70,000 78 rpm singles stored in a warehouse for 40 years.
> The Barrie H. Thorpe Collection, which had been deposited at the Batavia Public Library in Batavia, Illinois, in 2007 by Barrie H. Thorpe (1925–2012). It contains 48,000 singles.
> The Daniel McNeil Collection, with 22,359 singles.
Many more listed at Wikipedia: https://en.wikipedia.org/wiki/The_Great_78_Project
At the same time Internet Archive also launched the "Unlocked Recordings" collection of modern "out of print" LPs, as if that somehow made them free to distribute. The inclusion of Jimi Hendrix, Paul McCartney, and Nina Simone records is called out in the lawsuit.
This is how you ruin the reputation of an organization.
If a copyright holder refuses to make a work available to the public in any way (without a really good reason -- I don't agree with it, but I see Warner Bros.' point in withholding certain WWII cartoons, for example) then to me at least they have no ethical claim against someone else who will.
Again, the Music Modernization Act (2018), which the Archive itself celebrated, was meant to put a dent in this problem. You prepare a list of works, then send a spreadsheet to the copyright office. Copyright holders then have 90 days to state that they're using the material commercially. If not, the library or archive (or individual) is free to make these works downloadable.
Orphaned works are a problem, but here was a way to move things a tiny bit forward. Unfortunately, just like Controlled Digital Lending it was left in the hands of some particularly careless people. I'd imagine the settlement also prohibits them from submitting the list (and thus causing the parties to do an enormous amount of research work).
I'm always up for a good ethics debate and likely wholeheartedly agree with your position. But this is a very different issue that resulted in clear damage to the culture we share and the future we all want.
"Legal" vs "ethical" summarizes everything that is wrong about the current state of copyright.
Yeah, I chose that word on purpose. :-(
> tech debt and sloppy process at the Archive
It’s a good time to remind everyone that the budget of the Internet Archive, the largest library in the world, is smaller than that of the SF public library. It has a small fraction of the budget of Wikipedia.
A ton of Wikipedia citations go to URLs that can now only be found in the Internet Archive Wayback Machine. Since they are so dependent on it, they should support it financially.
Probably the best thing they could do would be to fund the creation of a full mirror in a non-US jurisdiction.
The Internet Archive is not the largest library in the world -- the Library of Congress is, with 170 million+ actual items on shelves.
And the SF public library has 28 buildings, 6x the employee count, and 6x the budget.
Not sure what your point is, as they provide completely different services. As the judge said, "The Internet Archive does not perform the traditional functions of a library."
Or an archive, for that matter. No grants, explicitly no research services, no oversight and no long-term plan.
Librarians have a formal code of ethics and the Internet Archive fails more than half.
https://www.ala.org/tools/ethics
> Librarians have a formal code of ethics and the Internet Archive fails more than half.
I am guessing 4, 7 and 9, but which others?
> 1. We provide the highest level of service to all library users through appropriate and usefully organized resources; equitable service policies; equitable access; and accurate, unbiased, and courteous responses to all requests.
Whether it's people suing them, people begging to have personal information removed, or people letting them know credentials have been sitting in the open on Gitlab for over two years, they often do not respond at all. As for "usefully organized resources", well, they're overwhelmed. But digitizing 400,000 old records and starting a bank shouldn't be a priority until they get the basics sorted. Like serving torrents that aren't corrupt and missing files.
> 3. We protect each library user's right to privacy and confidentiality...
They leaked 30 million messages from patrons, including driver's licenses and passports.
> 6. We do not advance private interests at the expense of library users, colleagues, or our employing institutions.
They've always played favorites with access. Not only read access but also write access (!) to their databases. They run an entirely separate version of the site for paying institutional clients. Conflict of interest that borders on grifting by employees is not hard to spot.
As a scholar, I use libraries to find and access materials. I think that is the traditional function?
If I want to access a book (or frankly anything else), chances are I can do so in minutes on the Internet Archive. I can’t say the same for my university library. Or the Library of Congress.
The Internet Archive has more materials (and more kinds of materials) and serves more than 10x the number of people (digitally).
I recognize that a community library and the LOC have different purposes from the IA. But the scale is huge — and it is extremely useful in scholarship.
Also: did you use to work there? You know a lot about it, but maybe had a bad experience?
Exactly, I find it a weird name.
In my country the government has an organisation that archives everything ever released in the country in climate controlled vaults. If somebody wants to read a 17th century newspaper they can.
Link to the tech debt aspect? I knew that was the case but want to know specifics.
Also the book lawsuit wasn't over old or new titles, it was loaning them 1:N instead of 1:1 because "pandemic". I didn't think it was a great idea at the time and everything in that lawsuit has pointed towards it just being an outright foolhardy effort. They were on a great path towards expanding digital lending boundaries (by letting any library add their books to the IA's lending circulation) and screwed it all up.
> it was loaning them 1:N instead of 1:1 because "pandemic"
It was over loaning them 1:1, the pandemic actions were barely mentioned as part of the lawsuit and the result is that 1:1 loaning was ruled illegal. The only harm the pandemic actions did was to public opinion.
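For anyone unsure what the 1:1 vs 1:N distinction means mechanically, here is a minimal sketch. It assumes "1:1" means never having more digital loans out than physical copies owned, and "1:N" means dropping that cap. The class, field and method names are hypothetical illustrations, not anything taken from the Archive's actual lending system.

```python
from dataclasses import dataclass, field


@dataclass
class Title:
    """One book title in a hypothetical digital lending system."""
    owned_copies: int
    enforce_ratio: bool = True  # False models uncapped "1:N" lending
    borrowers: set[str] = field(default_factory=set)

    def checkout(self, patron: str) -> bool:
        """Return True if the loan is granted, False otherwise."""
        if patron in self.borrowers:
            return False  # this patron already has the title out
        if self.enforce_ratio and len(self.borrowers) >= self.owned_copies:
            return False  # every owned copy is on loan: patron must wait
        self.borrowers.add(patron)
        return True

    def checkin(self, patron: str) -> None:
        """Return a loan, freeing a copy for the next patron."""
        self.borrowers.discard(patron)
```

With `enforce_ratio=True`, a second patron is refused until the first returns the book; with `enforce_ratio=False`, every request is granted regardless of how many copies are owned.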
Or, the 1:1 lending was probably okay until Kahle showed his willingness to abandon copyright entirely with the emergency library, and publishers decided it was worth putting down CDL as a whole.
The book publishers had been building a case against the CDL for a decade. They saw an opportunity to control the narrative and took it.
That's what I said? The pandemic excuse was IA's reason for doing it at first.
> not really archival and preservation
The trick is you want them to be archived now when they're readily available not years from now when they're hard or impossible to find. The difficulty is justifying holding on to them that long when they can't be accessed and deciding when they should be exposed.
Also environmental conditions (e.g. fires) can ruin physical archives in the long run.
No joke: the Internet Archive's physical archives are stored within the blast radius of an oil refinery. Also in a part of town where the charter warns not to expect prompt emergency services in the event of a natural disaster.
Yep. Historically, some of the hardest stuff to find are the things that were common and people assumed would always be available and known.
What's the third shaker for?!
https://nowiknow.com/the-mystery-of-the-third-shaker/
TL;DR: There's an unknown 3rd shaker in many 19th century table settings and no one really knows what went in it as far as I can tell. There's some indication it might have been for dried mustard powder but that's the only guess anyone has and it wasn't that popular as a condiment as far as anyone can tell outside of the tiny references in late 19th century catalogs for the 3 shaker sets including a 'mustard bottle'.
"…you want them to be archived now when they're readily available not years from now when they're hard or impossible to find."
Exactly. Over the centuries humans have used various techniques to record information they've wanted to keep for posterity and almost every newer innovation they've adopted has resulted in information being stored for a shorter time than the older technology that preceded it.
That's a sweeping and overly general comment but I'll justify it with several examples from history. First, there are many reasons why people would want to replace a time-honored way of documenting information by using alternative methods. My purpose here is to draw attention to how, with each new technical innovation, storage longevity ends up being shorter than that of the previous generation.
Examples:
1. Messages chiseled on stone have a longevity of many millennia. Examples: Egyptian hieroglyphics, the Rosetta Stone.
2. Information written on parchment and vellum can, with reasonable care, be in good condition after 1000 years or more. Examples include the Domesday Book, Magna Carta and the Lindisfarne Gospels (it's ~1300 years old).
3. Paper has the advantage of higher storage density especially if it's the substrate for printed text. Longevity can be high for high quality paper, flax types etc can last over 500 years, whereas cheap paper as in paperbacks and newsprint has a much shorter lifespan, likely 50 - 75 years at most.
4. Painting (oil-on-canvas) has, with care, good longevity, at least 500-plus years. Works of the great masters, Caravaggio, da Vinci (Mona Lisa), et al, are nowadays still in reasonable to good condition.
5. Photography has several advantages over art, it's capable of rendering an image accurately and it's much easier to reproduce multiple images than drawing them all by hand. Longevity and quality of photographic images depends on the technology that's employed. Here are several examples.
-- Wet-collodion process (negative on glass plate). Introduced 1851. Advantage: high resolution. Disadvantage: not orthochromatic/panchromatic, sensitive only to blue light. Longevity: if stored with care images will last 200-plus years. Examples: Civil War photos of Mathew Brady, Alexander Gardner and others.
-- Kodachrome color transparency film. Introduced 1935. Advantage high resolution, high color accuracy. Longevity: ~200 years under ideal storage conditions.
-- Kodak Kodacolor negative (late 1940s to ~1960). Longevity: only a few years, image fades even when stored under ideal conditions. This tech was an unmitigated disaster, many families lost treasured wedding photos etc. because of fading.
-- Eastman color print film (theater release stock). Subject to considerable fading when stored, film cannot be used for archiving, it's useless if stored for several decades.
-- B&W film stock on acetate base. Longevity: 200-plus years under proper storage conditions.
6. Sound Recordings:
-- 78-RPM shellac disks. ca 1905 - ~1955 (cylinders were earlier, first produced in late 1880s). Sound quality poor to fair. Longevity: if stored under ideal conditions and handled with care then lifetime is >200 years (there are good, well kept 78 recordings that are now about 120 years old and they're still in excellent condition).
Note: that figure is the archive lifetime. Unfortunately, the original instruments (Edison phonographs and the like) that 78s were designed to be played on cause considerable wear and damage to 78 RPM disks.
I cringe every time I see YouTube videos of well-meaning owners of treasured acoustic gramophones playing 78 recordings on them. They seem completely oblivious to the damage they're doing to their recordings.
(Just because 78s were designed to be played on these players it doesn't mean they're not damaged by them. In those days pickups were without electrical amplification and to get maximum sound level considerable weight was applied to the stylus. Moreover, to improve stylus (steel needle) tracking an abrasive was added to the shellac which abraded the stylus to best fit the recorded groove. It worked both ways, both the record and stylus were worn during playback).
-- Vinyl LP High Fidelity recordings. 1948 to ~1980 (now in limited revival). Longevity: ~200 years or perhaps longer for a recording kept for archival purposes.
7. Magnetic Recordings: Audiotape, Videotape and Hard Disks. ca 1940s to present, although its use has declined in recent years. Magnetic tape was often used to make master recordings for vinyl records and videos (VHS etc.) and in television production, and it's still used for data storage (QIC cartridges, etc.). Also, magnetic recording is still the underpinning technology in rotary hard disks.
Longevity: That said, magnetic tape and similar media, hard disks, etc. suffer from loss of magnetic remanence. Simply, the magnetic intensity on storage media decays over time, thus so does the recorded information.
Many factors contribute to the decay of information on magnetic media (which I cannot cover here) but in comparison with the older media (e.g., vinyl, 78s) its lifespan is almost pathetically short. One risks one's data if one uses a hard disk much past its short lifespan of about five years. Moreover, just archiving one's data on a hard disk and assuming it'll be OK because the drive is not actually being used is a risky business. The only way to guarantee the integrity of one's data is to copy it, that is, to rewrite it to fresh media before its remanence falls below the threshold where data cannot be recovered; ideally this should be done within the drive's specified lifetime.
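Knowing when to recopy is easier if you can tell whether an archived copy has already started to rot. A minimal sketch of one common approach, assuming you record a SHA-256 checksum for every file when the archive is created and re-check it periodically. The function names here are my own, not from any particular backup tool, and a mismatch only tells you a file is damaged; you still need a second good copy to restore from.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_damaged(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose contents no longer match."""
    damaged = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            damaged.append(rel)
    return damaged
```

Store the manifest somewhere other than the archived drive (and ideally in more than one place), then run `find_damaged` every time the drive is powered up for its periodic refresh.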
With analog recordings loss of remanence shows up as loss of signal-to-noise ratio. I've personally known people who've dug out VHS videos of their wedding to show to their kids on their 21st birthday only to find the tapes not viewable.
8. NAND Memory. These days just about everyone has moved to NAND flash memory because of its speed and convenience. It's hard to buy a PC without either an SSD or NVMe storage, and everyone uses USB thumb drives. NAND storage is great stuff, we all love it.
However, if one is not careful and proactive NAND is a time bomb waiting to destroy your data. First, NAND deteriorates and wears out with usage, second—like magnetic remanence—its electronic storage decays with age even when it's only being used for archival storage.
I speak from experience here, some years ago I created an archival backup of all my important photos and stored them on a new 500GB SSD which I put in a safe place for storage. Several years later when I checked the drive it was completely unresponsive and I thought it dead. As luck would have it, about a half hour later the SSD eventually came back to life.
What happened was the drive's controller locked the drive whilst it refreshed its floating-gate charges. Luckily, charges had not decayed below the threshold where my data would have been irrecoverable. If say I'd not checked the drive for another year then chances are that I'd not have been so lucky. (That's just one incident, I'll refrain from discussing other recent NAND failures.)
Yes, it was a backup and my working copies were OK, but the point is obvious, one cannot put modern NAND storage aside and simply just forget about it indefinitely and hope one's data will still be viable. Unfortunately, nothing could be further from the truth.
We shouldn't be surprised by what happened here when we care to look at how NAND flash actually works. Frankly, it's amazing that it works at all let alone the fact that it works so well, NAND is truly a masterpiece of modern semiconductor fabrication.
NAND's functionality depends on quantum tunneling and that we use the fact to force a few hundred electrons across an 'insulated' barrier where (hopefully) they'll stay stored in the FET's floating gate for an indefinite length of time.
What I find amazing is not that quantum tunneling works and that we can store electrons in a floating gate or charge trap (and that's amazing enough of itself) but that the insulation is so good (stray resistance is so high) that it can take some years for this almost infinitesimal charge to leak away and dissipate.
Whilst we can view quantum tunneling and charge storage as a state of stable equilibrium (in that in an ideal transistor the charge would be stored indefinitely), that cannot be said for a real-world device, eventually the charge will dissipate and with it goes your data. NAND storage has many advantages but in comparison with other storage tech it's both ephemeral and short lived.
We have a world now running on NAND technology and yet we've no immediate technology in the wings that'd be a better replacement—one that's intrinsically or inherently stable by design, and that's a pretty unsatisfactory and unnerving situation. Moreover, the situation is made worse by our blasé attitude and headlong rush to adopt multilayered 3D NAND which further reduces the electron count in a floating gate. By design, we're deliberately making NAND more unreliable because of our demand for even more storage.
When one thinks about it, it's pretty outrageous that the world is having to rely on dozens of data centers so as to achieve some degree or guarantee of longterm data reliability.
It's hard enough now to find and keep track of information without revengeful moneygrubbing bastards screwing the Internet Archive. We need every bit of reliable archival storage now and their actions are selfish and counterproductive. If this nonsense continues then heaven knows what the future holds for longterm data storage.
Thanks for writing this particularly detailed post; I enjoyed it. Incidentally, you've reminded me I really need to upgrade one of my NAS drives soon...
What's the URL? Curious if it's still valid and if it were uploaded by some random user or one of the archiving projects.
For extra awesome use their Winamp clone. It really whips the llama's ass.
https://archive.org/details/fleetwood_mac-1977-rumours?webam...
Uploaded by user "Ultra Lo Fi Experimental" who appears to be putting lots of big name releases on there, I can't understand why they would do this.
The Reddit support groups are pretty enlightening. Many people think they're just adding things into a public library and somehow this is all perfectly legal.
It's charming and reminiscent of the best old-school Wikipedia energy. People on the fringe (and probably some OCD people) finding something to do. Curation and contribution feels good, man.
But yeah, holy shit. Brewster Kahle and Jason Scott have said "upload away, we'll figure it all out later" -- then themselves uploaded hundreds of thousands of items to set an example.
"Please don't upload copyrighted material" would go a long way at the top of that PHP upload form. Better yet a checkbox: "This is copyrighted. Archive it but don't republish it." But I suppose where's the fun in that.
I think you overestimate normal people's understanding of copyright. There are tons of videos on YouTube with a description like "I don't claim to own this. No copyright intended". If you put such a form most people would probably think they're "not copyrighting" it or whatever confused idea they've got about how the law works or what the words mean.
Here's one: https://archive.org/details/FleetwoodMacRumours
What modern law(s) are you referring to? (Serious question, interested in learning more)
Music Modernization Act (2018). A long overdue tiny step into common sense copyright reform.
https://www.copyright.gov/music-modernization/pre1972-soundr...
"The legislation also establishes a process for lawfully engaging in noncommercial uses of Pre-1972 Sound Recordings that are not being commercially exploited. To qualify for this exemption, a user must file a notice of noncommercial use after conducting a good faith, reasonable search, and the rights owner of the sound recording must not object to the use within 90 days."
Basically Internet Archive could've sent a spreadsheet of works to the Copyright Office and anyone claiming commercial use of these old records had to respond within 90 days. In the lawsuit Brewster is quoted saying that it makes pre-1972 works "Library Fair Use."
The law does not allow people to make commercial use of these recordings (take three seconds to consider why). But they could've archived old recordings all day and then made the ones that cleared the list immediately available for unlimited download. And provided excerpts of the others for research purposes.
Instead they managed to get sued for $696 million. For a side project that nobody cared about (40,000+ downloads) that managed to put the whole org (and several participants' personal assets) at risk for two years.
As an aside, 78 RPM records are not particularly fragile. As a consumer product, yes. By preservation standards, no.
"With proper care and storage, this durable resource can last for centuries" https://www.tandfonline.com/doi/abs/10.1300/J116v08n02_04
"Somehow slipped in" - are you fr rn?
I don't wonder anything about that, was very convenient.
Are you claiming the copyright holders put them there?
You might want to be specific about which ones were some kind of false flag conspiracy plot (is it just "Rumours"?) because there are thousands and thousands of pirated pieces of media on archive.org. I am behind there being some kind of archive project but as things stand the site was/is just Mega with a veneer of respectability.
Mega isn't indexed and decrypts content in JavaScript on the client. If people are using it to share your stuff, you have to find where they're sharing it to even find out that it's on there. Now, Hollywood did this to themselves: back when it was Megaupload, the fact that they had a search index was used as an excuse to shut down the whole service, because a couple of employees used it to access infringing stuff without taking it down. So Mega got rid of the index as a means to prevent that from happening again.
Internet Archive is searchable and executes DMCA takedown notices. If something of yours is on there and you don't want it to be, it's not because they're making it hard to change that.
Internet Archive continues to leak the verified email address of the uploader of each item, so conspiracy guy can run a spot check should he so desire.
Fair if true, though it's not like someone uploading things to IA is likely to be using an address like michael.scott @ dundermifflin.com or something. More likely it's something anonymous. Certainly so if they were planting things to sue over, you can bet they'd use an account like dsfhakeij@ mail.ru or something.