Final edit:

This ambiguity is documented at least as far back as 1984, by IBM, the pre-eminent computer company of the time.

In 1972 IBM started selling the IBM 3333 magnetic disk drive. This product catalog [0] from 1979 shows them marketing the corresponding disks as "100 million bytes" or "200 million bytes" (3336 mdl 1 and 3336 mdl 11, respectively). By 1984, those same disks were marketed in the "IBM Input/Output Device Summary" [1] (which was intended for a customer audience) as "100MB" and "200MB".

0: (PDF page 281) "IBM 3330 DISK STORAGE" http://electronicsandbooks.com/edt/manual/Hardware/I/IBM%20w...

1: (PDF page 38, labeled page 2-7, Fig 2-4) http://electronicsandbooks.com/edt/manual/Hardware/I/IBM%20w...

Also, hats off to http://electronicsandbooks.com/ for keeping such incredible records available for the internet to browse.

-------

Edit: The below is wrong. Older experience has corrected me - there has always been ambiguity (perhaps bifurcated between CPU/OS and storage domains). "And that with such great confidence!", indeed.

-------

The article presents wishful thinking. The wish is for "kilobyte" to have one meaning. For the majority of its existence, it had only one meaning - 1024 bytes. Now it has an ambiguous meaning. People wish for an unambiguous term for 1000 bytes, however that word does not exist. People also might wish that others use kibibyte any time they reference 1024 bytes, but that is also wishful thinking.

The author's wishful thinking is falsely presented as fact.

I think kilobyte was the wrong word to ever use for 1024 bytes, and I'd love to go back in time to tell computer scientists that they needed to invent a new prefix to mean "1,024" / "2^10" of something, which kilo- never meant before kilobit / kilobyte were invented. Kibi- is fine, the phonetics sound slightly silly to native English speakers, but the 'bi' indicates binary and I think that's reasonable.

I'm just not going to fool myself with wishful thinking. If, in arrogance or self-righteousness, one simply assumes that every time they see "kilobyte" it means 1,000 bytes - then they will make many, many failures. We will always have to take care to verify whether "kilobyte" means 1,000 or 1,024 bytes before implementing something which relies on that for correctness.

You've got it exactly the wrong way around. And that with such great confidence!

There was always confusion about whether a kilobyte was 1000 or 1024 bytes. Early diskettes always used 1000; only when the 8-bit home computer era started was the 1024 convention firmly established.

Before that it made no sense to talk about kilo as 1024. Earlier computers measured space in records and words, and I guess you can see how in 1960, no one would use kilo to mean 1024 for a 13-bit computer with 40-byte records. A kiloword was, naturally, 1000 words, so why would a kilobyte be 1024?

1024 being near-ubiquitous was only the case in the 90s or so - except for drive manufacturing and signal processing. Binary prefixes didn't invent the confusion, they were a partial solution. As you point out, while it's possible to clearly indicate binary prefixes, we have no unambiguous notation for decimal bytes.

> Early diskettes always used 1000

Even worse, the 3.5" HD floppy disk format used a confusing combination of the two. Its true capacity (when formatted as FAT12) is 1,474,560 bytes. Divide that by 1024 and you get 1440KB; divide that 1440 by 1000 and you get the oft-quoted (and often printed on the disk itself) "1.44MB", which is inaccurate no matter how you look at it.
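
Just to make the mixed-base bookkeeping explicit, here is that arithmetic as a few lines of Python (a minimal sketch, assuming the standard 80-track, double-sided, 18-sectors-per-track, 512-bytes-per-sector FAT12 geometry):

    capacity = 80 * 2 * 18 * 512       # formatted capacity in bytes
    assert capacity == 1_474_560

    kib = capacity / 1024              # 1440.0 -> the "1440KB" figure
    print(kib / 1000)                  # 1.44   -> the label printed on the disk

    print(capacity / 1_000_000)        # 1.47456 (decimal megabytes)
    print(capacity / (1024 * 1024))    # 1.40625 (binary mebibytes)

The "1.44" on the label matches neither the purely decimal nor the purely binary reading of the same number.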

I'm not seeing evidence for a 1970s 1000-byte kilobyte. Wikipedia's floppy disk page mentions the IBM Diskette 1 at 242944 bytes (a multiple of 256), and then 5¼-inch disks at 368640 bytes and 1228800 bytes, both multiples of 1024. Those capacities are multiples of power-of-two sector sizes. Nobody had a 1000-byte sector, I'll assert.

The wiki page agrees with the parent: "The double-sided, high-density 1.44 MB (actually 1440 KiB = 1.41 MiB or 1.47 MB) disk drive, which would become the most popular, first shipped in 1986".

To make things even more confusing, the high-density floppy introduced on the Amiga 3000 stored 1760 KiB.

At least there it stored exactly 3,520 512-byte sectors, or 1,760 KB. They didn't describe them as 1.76MB floppies.

[deleted]

Human history is full of cases where silly mistakes became precedent. HTTP "referer" is just another example.

I wonder if there's a wikipedia article listing these...

It's "referer" in the HTTP standard, but "referrer" when correctly spelled in English. https://en.wikipedia.org/wiki/HTTP_referer

It's way older than the 1990s! In computing, "K" has always meant 1024, at least since the 1970s.

Example: in 1972, the DEC PDP-11/40 handbook [0] said on its first page: "16-bit word (two 8-bit bytes), direct addressing of 32K 16-bit words or 64K 8-bit bytes (K = 1024)". Same with Intel - in 1977 [1], they proudly said "Static 1K RAMs" on the first page.

[0] https://pdos.csail.mit.edu/6.828/2005/readings/pdp11-40.pdf

[1] https://deramp.com/downloads/mfe_archive/050-Component%20Spe...

It was exactly this - and nobody cared until the disks (the only thing that used decimal K) started getting so big that it was noticeable. With a 64K system you're talking 1,536 "extra" bytes of memory - or 1,536 bytes of memory lost when transferring to disk.

But it was once hard drives started hitting about a gigabyte that everyone started noticing and howling.
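
A back-of-the-envelope sketch of that scaling (plain Python, purely illustrative): the absolute gap between the binary and decimal readings of the same label grows with the unit, so what was a rounding error at 64K becomes tens of megabytes per gigabyte.

    sizes = {"64 KB": (64 * 1024, 64 * 1000),
             "1 MB":  (1024**2, 1000**2),
             "1 GB":  (1024**3, 1000**3)}
    for label, (binary, decimal) in sizes.items():
        gap = binary - decimal
        print(f"{label}: {gap:,} bytes apart ({gap / binary:.1%})")
    # 64 KB: 1,536 bytes apart (2.3%)
    # 1 MB: 48,576 bytes apart (4.6%)
    # 1 GB: 73,741,824 bytes apart (6.9%)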

It was earlier than the 90s, and came with popular 8-bit CPUs in the 80s. The Z-80 microprocessor could address 64KB (which was 65,536 bytes) on its 16-bit address bus.

Similarly, the 4104 chip was a "4K x 1 bit" RAM chip and stored 4096 bits. You'd see this in the whole 41xx series, and beyond.

> The Z-80 microprocessor could address 64KB (which was 65,536 bytes) on its 16-bit address bus.

I was going to say that what it could address and what they called what it could address is an important distinction, but found this fun ad from 1976[1].

"16K Bytes of RAM Memory, expandable to 60K Bytes", "4K Bytes of ROM/RAM Monitor software", seems pretty unambiguous that you're correct.

Interestingly, Wikipedia at least implies the IBM System/360 popularized the base-2 prefixes [2], citing their 1964 documentation, but I can't find any use of it in there for the main core storage docs they cite [3]. Amusingly, the only use of "kb" I can find in the PDF is for data rate off magnetic tape, which is explicitly defined as "kb = thousands of bytes per second", and the only reference to "kilo-" is for "kilobaud", which would have again been base-10. If we give them the benefit of the doubt on this, presumably it was from later System/360 publications where they would have had enough storage to need prefixes to describe it.

[1] https://commons.wikimedia.org/wiki/File:Zilog_Z-80_Microproc...

[2] https://en.wikipedia.org/wiki/Byte#Units_based_on_powers_of_...

[3] http://www.bitsavers.org/pdf/ibm/360/systemSummary/A22-6810-...

Even then it was not universal. For example, that Apple I ad that got posted a few days ago mentioned that "the system is expandable to 65K". https://upload.wikimedia.org/wikipedia/commons/4/48/Apple_1_...

Someone here the other day said that it could accept 64KB of RAM plus 1KB of ROM, for 65KB total memory.

I don't know if that's correct, but at least it'd explain the mismatch.

Seems like a typo given that the ad contains many mentions of K (8K, 32K) and they're all of the 1024 variety.

If you're using base 10, you can get "8K" and "32K" by dividing by 1000 and rounding down. The 1024/1000 distinction only becomes significant at 65536.
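
A minimal illustration of that rounding (truncating toward zero; whether the ad's "65K" really came from decimal division or from 64KB RAM plus 1KB ROM is the question above):

    def k_label(n_bytes, k):
        return f"{n_bytes // k}K"

    for n in (8 * 1024, 32 * 1024, 64 * 1024):
        print(n, k_label(n, 1000), k_label(n, 1024))
    # 8192  8K  8K   <- identical either way
    # 32768 32K 32K  <- identical either way
    # 65536 65K 64K  <- the first of the ad's figures where the two diverge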

Still, the advertisement is filled with details like the number of chips, the number of pins, etc. If you're dealing with chips and pins, it's always going to be base-2.

> only when the 8-bit home computer era started was the 1024 convention firmly established.

That's the microcomputer era that has defined the vast majority of our relationship with computers.

IMO, having lived through this era, the only people pushing 1,000 byte kilobytes were storage manufacturers, because it allows them to bump their numbers up.

https://www.latimes.com/archives/la-xpm-2007-nov-03-fi-seaga...

> 1024 being near-ubiquitous was only the case in the 90s or so

More like late 60s. In fact, in the 70s and 80s, I remember the storage vendors being excoriated for "lying" by following the SI standard.

There were two proposals to fix things in the late 60s, by Donald Morrison and Donald Knuth. Neither was accepted.

Another article suggesting we just roll over and accept the decimal versions is here:

https://cacm.acm.org/opinion/si-and-binary-prefixes-clearing...

This article helpfully explains that decimal KB has been "standard" since the very late 90s.

But when such an august personality as Donald Knuth declares the proposal DOA, I have no heartburn using binary KB.

https://www-cs-faculty.stanford.edu/~knuth/news99.html

Good lord, arrogance and self-righteousness? You're blowing the article out of proportion. They don't say anything non-factual or unreasonable - why inject hostility where none is called for?

In fact, they practically say the same exact thing you have said: in a nutshell, base-10 prefixes were used for base-2 numbers, and now it's hard to undo that standard in practice. They didn't say anything about making assumptions. The only difference is that the author wants to keep trying, and you don't think it's possible? Which is perfectly fine. It's just not as dramatic as your tone implies.

I'm not calling the author arrogant or self-righteous. I stated that if a hypothetical person simply assumes that every "kilobyte" they come across is 1,000 bytes, that they are doomed to frequent failures. I implied that for someone to hypothetically adhere to that internal dogma even in the face of impending failures, the primary reasons would be either arrogance or self-righteousness.

I don't read any drama or hostility, just a discussion about names. OP says that kilobyte means one thing, the commenter says that it means two things and just saying it doesn't can't make that true. I agree, after all, we don't get to choose the names for things that we would like.

> The article presents wishful thinking. The wish is for "kilobyte" to have one meaning. For the majority of its existence, it had only one meaning - 1024 bytes. Now it has an ambiguous meaning. People wish for an unambiguous term for 1000 bytes, however that word does not exist. People also might wish that others use kibibyte any time they reference 1024 bytes, but that is also wishful thinking.

> The author's wishful thinking is falsely presented as fact.

There's good reason why the meanings of SI prefixes aren't set by convention or by common usage or by immemorial tradition, but by the SI. We had several thousand years of setting weights and measures by local and trade tradition and it was a nightmare, which is how we ended up with the SI. It's not a good show for computing to come along and immediately recreate the long and short ton.

> setting weights and measures by local and trade tradition and it was a nightmare

Adding to your point, it is human nature to create industry- or context-specific units and refuse to play with others.

In the non-metric world, I see examples like: Paper publishing uses points (1/72 inch), metal machinists use thousandths of an inch, woodworkers use feet and inches and binary fractions, land surveyors use decimal feet (unusual!), waist circumference is in inches, body height is in feet and inches, but you buy fabric by the yard, airplane altitudes are in hundreds to tens of thousands of feet instead of decimal miles. Crude oil is traded in barrels but gasoline is dispensed in gallons. Everyone thinks their usage of units and numbers is intuitive and optimal, and everyone refuses to change.

In the metric(ish) world, I still see many tensions. The micron is a common alternate name for the micrometre, yet why don't we have a millin or nanon or picon? The solution is to eliminate the micron. I've seen the angstrom (0.1 nm) in spectroscopy and in the discussion of CPU transistor sizes, yet it diverts attention away from the picometre. The bar (100 kPa) is popular in talking about things like tire pressure because it's nearly 1 atmosphere. The mmHg is a unit of pressure that sounds metric but is not; the correct unit is pascal. No one in astronomy uses mega/giga/tera/peta/etc.-metres; instead they use AU and parsec and (thousand, million, billion) light-years. Particle physics uses eV/keV/MeV instead of some units around the picojoule.

Having a grab bag of units and domains that don't talk to each other is indeed the natural state of things. To put your foot down and say no, your industry does not get its own special snowflake unit, stop that nonsense and use the standardized unit - that takes real effort to achieve.

The SI should just have set kilobyte to 1024 in acquiescence to the established standard, instead of being defensive about keeping a strict meaning of the prefix.

It goes back way further than that. The first IBM hard drive was the IBM 350 for the IBM 305 RAMDAC. It was 5 million characters. Not bytes - bytes weren't "a thing" yet. 5,000,000 characters. The very first hard drive was base-10.

Here's my theory. In the beginning, everything was base10. Because humans.

Binary addressing made sense for RAM. Especially since it makes decoding address lines into chip selects (or slabs of core, or whatever) a piece of cake, having chips be a round number in binary made life easier for everyone.

Then early DOS systems (CP/M comes to mind particularly) mapped disk sectors to RAM regions, so to enable this shortcut, disk sectors became RAM-shaped. The 512-byte sector was born. File sizes can be written in bytes, but what actually matters is how many sectors they take up. So file sizing inherited this shortcut.

But these shortcuts never affected "real computers", only the hamstrung crap people were running at home.

So today we have multiple ecosystems. Some born out of real computers, some with a heavy DOS inheritance. Some of us were taught DOS's limitations as truth, and some of us weren't.

RAMAC, not RAMDAC: https://en.wikipedia.org/wiki/History_of_IBM_magnetic_disk_d...

However, it doesn't seem to have been divided into sectors at all; each track was more like a loop of magnetic tape. In that context it makes a bit more sense to use decimal units, measuring in bits per second like for serial comms.

Or maybe there were some extra characters used for ECC? 5 million / 100 / 100 = 500 characters per track, which leaves 12 six-bit characters (72 bits) over for that purpose if the actual track size was 512.

The first floppy disks - also from IBM - had 128-byte sectors. IIRC, that size was chosen because it was the smallest power of two that could store an 80-column line of text (made standard by IBM punched cards).

Disk controllers need to know how many bytes to read for each sector, and the easiest way to do this is by detecting overflow of an n-bit counter. Comparing with 80 or 100 would take more circuitry.

Almost all computers have used power-of-2 sized sectors. The alternative would involve wasted bits (e.g. you can't store as much information in 256 1000-byte units as in 256 1024-byte units, so you lose address space) or writing multiplies and divides and modulos in filesystem code running on machines that don't have opcodes for any of those.

You can get away with those on machines with 64 bit address spaces and TFLOPs of math capacity. You can't on anything older or smaller.
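
A rough sketch of the second point, with illustrative constants: splitting a byte offset into (sector, offset-within-sector) is a shift and a mask when the sector size is a power of two, but a genuine divide and modulo when it is not - expensive or simply unavailable as opcodes on many early CPUs.

    POW2_SECTOR = 512      # 2**9
    ODD_SECTOR  = 1000     # hypothetical non-power-of-two sector size

    def split_pow2(byte_offset):
        # shift and mask -- cheap even on CPUs with no divide instruction
        return byte_offset >> 9, byte_offset & (POW2_SECTOR - 1)

    def split_odd(byte_offset):
        # needs a real divide and a real modulo
        return byte_offset // ODD_SECTOR, byte_offset % ODD_SECTOR

    assert split_pow2(1_474_560) == (2880, 0)
    assert split_odd(1_474_560)  == (1474, 560)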

At least it's not a total bizarro unit like "Floppy Disk Megabyte", equal to 1,024,000 bytes.

Are you talking about imperial or metric kilobyte?

> Edit: I'm wrong.

You need character to admit that. I bow to you.

> Edit: I'm wrong. Older experience has corrected me - there has always been ambiguity "And that with such great confidence!", indeed.

Kudos for getting back. (and closing the tap of "you are wrong" comments :))

At this point I just wish 2^10 didn't end up so close to 1000.

To avoid confusion, I always use "kilobyte" to refer to exactly 512 bytes.

Not to be confused with a kilonibble, which is 500 bytes.

People were using metric words for binary numbers since at least the late 1950s: https://en.wikipedia.org/wiki/Timeline_of_binary_prefixes#19...

Which doesn't make it more correct, of course, even though I strongly believe that it is (where appropriate, for things like memory sizes). Just saying, it goes much further back than 1984.

And networking - we've almost always used standard SI prefixes for, e.g., bandwidth. 1 gigabit per second == 1 * 10^9 bits per second.

Which makes it really @#ing annoying when you have things like "I want to transmit 8 gigabytes (meaning gibibytes, 2^30) over a 1 gigabit/s link, how long will it take?". Welcome to every networking class in the 90s.
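
Worked through with those numbers (idealized link, no protocol overhead; the 64-second figure is the tempting wrong answer):

    link_bps = 1 * 10**9              # 1 gigabit/s -- networking prefixes are decimal

    payload_gib = 8 * 2**30           # 8 GiB = 8,589,934,592 bytes
    payload_gb  = 8 * 10**9           # 8 GB (decimal)

    print(payload_gib * 8 / link_bps) # ~68.7 seconds
    print(payload_gb  * 8 / link_bps) # 64.0 seconds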

We should continue moving towards a world where 2^k prefixes have separate names and we use SI prefixes only for their precise base-10 meanings. The past is polluted but we hopefully have hundreds of years ahead of us to do things better.

> The wish is for "kilobyte" to have one meaning.

Which is the reality. "kilobyte" means "1000 bytes". There's no possible discussion over this fact.

Many people have been using it wrong for decades, but its literal value did not change.

That is a prescriptivist way of thinking about language, which is useful if you enjoy feeling righteous about correctness, but not so helpful for understanding how communication actually works. In reality-reality, "kilobyte" may mean either "1000 bytes" or "1024 bytes", depending on who is saying it, whom they are saying it to, and what they are saying it about.

You are free to intend only one meaning in your own communication, but you may sometimes find yourself being misunderstood: that, too, is reality.

It's not even really prescriptivist thinking… "Kilobyte" to mean both 1,000 B & 1,024 B is well-established usage, particularly dependent on context (with the context mostly being HDD manufacturers who want to inflate their drive sizes, and … the abomination that is the 1.44 MB diskette…). But a word can be dependent on context, even in prescriptivist settings.

E.g., M-W lists both, with even the 1,024 B definition being listed first. Wiktionary lists the 1,024 B definition, though it is tagged as "informal".

As a prescriptivist myself I would love if the world could standardize on kilo = 1000, kibi = 1024, but that'll likely take some time … and the introduction of the word to the wider public, who I do not think is generally aware of the binary prefixes, and some large companies deciding to use the term, which they likely won't do, since companies are apt to prefer low-grade perpetual confusion over some short-term confusion during the switch.

Does anyone, other than HDD manufacturers who want to inflate their drive sizes, actually want a 1000-based kilobyte? What would such a unit be useful for? I suspect that a world which standardized on kibi = 1024 would be a world which abandoned the word "kilobyte" altogether.

> with the context mostly being HDD manufacturers who want to inflate their drive sizes

This is a myth. The first IBM hard drive was 5,000,000 characters in 1956 - before bytes were even in common usage. Drives have always been base10, it's not a conspiracy.

Drives are base10, lines are base10, clocks are base10, pretty much everything but RAM is base10. Base2 is the exception, not the rule.

I understand the usual meaning, but I use the correct meaning when precision is required.

How can there be both a "usual meaning" and a "correct meaning" when you assert that there is only one meaning and that "There's no possible discussion over this fact"?

You can say that one meaning is more correct than the other, but that doesn't vanish the other meaning from existence.

When precision is required, you either use kibibytes or define your kilobytes explicitly. Otherwise there is a real risk that the other party does not share your understanding of what a kilobyte should mean in that context. Then the numbers you use have at most one significant figure.

The correct meaning has always been 1024 bytes where I’m from. Then I worked with more people like you.

Now, it depends.

In computers, "kilobyte" has a context dependent meaning. It has been thus for decades. It does not only mean 1000 bytes.

> I understand the usual meaning, but I use the correct meaning when precision is required.

That's funny. If I used the "correct" meaning when precision was required then I'd be wrong every time I need to use it. In computers, bytes are almost always measured in base-2 increments.

When dealing with microcontrollers and datasheets and talking to other designers, yes precision is required, and, e.g. 8KB means, unequivocally and unambiguously, 8192 bytes.

Ummm, should we tell him?

That I can't type worth shit?

Yeah, I already knew that, lol.

But thanks for bringing it to my attention. :-)

I kid good-naturedly. I'm always horrified at what autocorrect has done to my words after it's too late to edit or un-send them. I swear I write words goodly, for realtime!

The line between "literal" and "colloquial" becomes blurred when a word consisting of strongly-defined parts ("kilo") gets used in official, standardized contexts with a different meaning.

In fact, this is the only case I can think of where that has ever happened.

"colloquial" has no place in official contexts. I'll happily talk about kB and MB without considering the small difference between 1000 and 1024, but on a contract "kilo" will unequivocally mean 1000, unless explicitely defined as 1024 for the sake of that document.

> on a contract "kilo" will unequivocally mean 1000, unless explicitly defined as 1024 for the sake of that document.

If we are talking about kilobytes, it could just as easily be the opposite.

Unless you were referring to only contracts which you yourself draft, in which case it'd be whatever you personally want.

Knuth thought the naming promulgated by the international standard (kibibyte) was DOA.

https://www-cs-faculty.stanford.edu/~knuth/news99.html

And he was right.

Context is important.

"K" is an excellent prefix for 1024 bytes when working with small computers, and a metric shit ton of time has been saved by standardizing on that.

When you get to bigger units, marketing intervenes, and, as other commenters have pointed out, we have the storage standard of MB == 1000 * 1024.

But why is that? Certainly it's because of the marketing, but also it's because KB has been standardized for bytes.

> Which is the reality. "kilobyte" means "1000 bytes". There's no possible discussion over this fact.

You couldn't be more wrong. Absolutely nobody talks about 8K bytes of memory and means 8000.