Re: [OT] UTF8-BOM and text encoding detection (was: UTF8-BOM not disregarded in CSV import)


Warren Young
On Jun 29, 2017, at 11:18 AM, Simon Slavin <[hidden email]> wrote:
>
> On 29 Jun 2017, at 5:39pm, Warren Young <[hidden email]> wrote:
>
>> Before roughly the mid 1970s, the size of a byte was whatever the computer or communications system designer said it was.
>
> You mean the size of a word.

That, too.  Again I give the example of a 12-bit PDP-8 storing 6-bit packed ASCII text.  The word size is 12, and the byte size is 6.

The same machine could instead store 7-bit ASCII from the ASR-33 in its 12-bit words, and we could then speak of 7-bit bytes and 12-bit words.  This, too, was a thing in the PDP-8 world, though rarer, since the core memory field size was 4k words, and the base machine config only had the one field, so 5 wasted bits per character was a painful hit.
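To make the packing trade-off concrete, here is a minimal sketch in Python. The character mapping is an assumption for illustration (keep the low 6 bits of the ASCII code, for characters 0x20 through 0x5F only); actual PDP-8 software conventions varied, and the function names are made up for this example.

```python
WORD_BITS = 12          # PDP-8 word size
CHAR_BITS = 6           # "byte" size when packing two characters per word
FIELD_WORDS = 4096      # one 4k-word core memory field


def to_sixbit(ch):
    """Map a character to a 6-bit code (assumed mapping: low 6 bits of ASCII)."""
    code = ord(ch)
    if not 0x20 <= code <= 0x5F:
        raise ValueError(f"{ch!r} has no 6-bit code in this sketch")
    return code & 0o77


def from_sixbit(code):
    """Invert to_sixbit: codes below 0x20 came from '@'..'_'."""
    return chr(code + 0x40 if code < 0x20 else code)


def pack(text):
    """Pack two 6-bit characters into each 12-bit word, padding with a space."""
    if len(text) % 2:
        text += " "
    return [
        (to_sixbit(text[i]) << CHAR_BITS) | to_sixbit(text[i + 1])
        for i in range(0, len(text), 2)
    ]


def unpack(words):
    """Recover the text from a list of packed 12-bit words."""
    return "".join(
        from_sixbit(w >> CHAR_BITS) + from_sixbit(w & 0o77) for w in words
    )


# Capacity of one field under each scheme.
packed_chars_per_field = FIELD_WORDS * (WORD_BITS // CHAR_BITS)  # 6-bit bytes
unpacked_chars_per_field = FIELD_WORDS                           # one 7-bit byte per word
wasted_bits_per_char = WORD_BITS - 7
```

Under these assumptions, one field holds twice as many characters when packed, which is the price paid for the 5 unused bits per word in the 7-bit scheme.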

> The word "byte" means "by eight".

I failed to find that in an English corpus search.[1]  A search for “by eight” turns up hundreds of results (apparently limited to 600 by the search engine) but none of the matches is near “byte.”  A search for “by-eight” turns up only one result, also irrelevant.

I suspect the earliest print reference to that definition would be much later than the actual coinage of the word in 1956 by Werner Buchholz, making it a back-formation.  I’d expect to find that definition in print only after the microcomputer revolution that nailed the 8-bit byte into place.

Further counter-citations:

   https://stackoverflow.com/questions/13615764/
   https://en.wikipedia.org/wiki/Byte#History
   https://en.wikipedia.org/wiki/Talk:Byte#Byte_.3D_By-Eight.3F
   https://english.stackexchange.com/questions/121127/etymology-of-byte

I wish I could find a copy of

   Buchholz, W., January 1981:
       "Origin of the Word 'Byte.'"
       IEEE Annals of the History of Computing, 3, 1: p. 72

that is not behind a paywall, as Buchholz is the man who coined the word for the IBM 7030 “Stretch,” which had a variable byte size.  It used 8-bit bytes for I/O, but it had variable-width bytes internally.

We wouldn’t have needed the term “octet” if “byte” always meant “8 bits”.


[1]: http://corpus.byu.edu/coca/

> With each bit of storage costing around 100,000 times what they do now

A bit of trivia I dropped during editing from the prior post: a 5 MB RK05 disk drive cost about the same as a luxury car.  (About US $40,000 today after CPI adjustment.)

Cadillac with all the options or RK05?  Let me think…RK05!
_______________________________________________
sqlite-users mailing list
[hidden email]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [OT] UTF8-BOM and text encoding detection (was: UTF8-BOM not disregarded in CSV import)

Simon Slavin-3


On 29 Jun 2017, at 8:01pm, Warren Young <[hidden email]> wrote:

> We wouldn’t have needed the term “octet” if “byte” always meant “8 bits”.

The terms "octet" and "decade" were prevalent across Europe in the 1970s.  Certainly we used them both when I was first learning about computers.  My impression back then was that the term "byte" was the American word for "octet".

The web in general seems to agree with you, not me.  It seems that a word was made of bytes, and bytes were made of nybbles, and nybbles were made of bits, and that how many of which went into what depended on which platform you were talking about.  This contradicts my computing teacher who taught me the "by eight" definition of "byte".

Simon.

Re: [OT] UTF8-BOM and text encoding detection (was: UTF8-BOM not disregarded in CSV import)

Peter da Silva
In reply to this post by Warren Young
I always saw byte as something that was relevant for systems that could address objects smaller than words... “byte addressed” machines. The term was mnemonic for something bigger than a bit and smaller than a word. It was usually 8 bits, but there were 36-bit machines that were byte addressable 9 bits at a time. The DECsystem 10 guys also sometimes referred to the other subdivisions of their 36-bit words as bytes; they could be 6, 7, 8, or 9 bits long. I think they had special instructions for operating on them, but they weren’t directly addressable.

There was also a “nibble”, smaller than a “byte”, which was always 4 bits (one hex digit). I don’t think any of the octal people used the word for their three bit digits.

Re: [OT] UTF8-BOM and text encoding detection (was: UTF8-BOM not disregarded in CSV import)

Niall O'Reilly
On 29 Jun 2017, at 20:19, Peter da Silva wrote:

> The DECsystem 10 guys also referred to the other subdivisions of their
> 36 bit words as bytes, sometimes, they could be 6, 7, 8, or 9 bits
> long. I think they had special instructions for operating on them, but
> they weren’t directly addressable.

   A byte could be 1..36 bits long.

   The special instructions used a data structure called a byte pointer
   to reference the field within a word where the byte was to be placed
   or retrieved.  Four different formats of byte pointer existed, not all
   supporting the full range of possible byte sizes.
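A rough sketch of the idea in Python, for illustration only. It models a byte pointer as just a (position, size) pair, where position is the number of bits to the right of the byte within the 36-bit word. It does not reproduce any of the real pointer formats or the address part of a pointer, and the function names are hypothetical.

```python
WORD_BITS = 36
WORD_MASK = (1 << WORD_BITS) - 1


def check_field(position, size):
    """Reject fields that do not fit inside one 36-bit word."""
    if not (1 <= size <= WORD_BITS and 0 <= position <= WORD_BITS - size):
        raise ValueError("byte field does not fit in a 36-bit word")


def load_byte(word, position, size):
    """Fetch the size-bit byte that has `position` bits to its right."""
    check_field(position, size)
    return (word >> position) & ((1 << size) - 1)


def deposit_byte(word, position, size, value):
    """Return `word` with the selected field replaced by `value`."""
    check_field(position, size)
    mask = ((1 << size) - 1) << position
    return (word & ~mask & WORD_MASK) | ((value << position) & mask)


def pack_ascii7(text):
    """Store up to five 7-bit characters left-justified in one word.

    Five 7-bit bytes use 35 bits, so the rightmost bit is left over.
    """
    word = 0
    for i, ch in enumerate(text[:5]):
        word = deposit_byte(word, 29 - 7 * i, 7, ord(ch) & 0x7F)
    return word


word = pack_ascii7("HELLO")
chars = [chr(load_byte(word, 29 - 7 * i, 7)) for i in range(5)]
```

The same two functions handle any byte size from 1 to 36 bits, so 6-bit, 9-bit, or odder fields are just different (position, size) pairs applied to the same word.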

   One of these days, when I really have too much free time, I must run
   up a VM with the Panda TOPS-20 distro and find some examples of
   interesting byte sizes which were actually used for something. 8-)

   /Niall