The player piano made significant advances over its predecessor, the
music box.
A music box could play only one song, and for a voice it had only little
metal strips, plucked by pins on a rotating cylinder to make a tune. More
advanced music boxes came out later that could play more than one song.
People didn't stop making music boxes.
A player piano was an output device for playing back human behavior.
That's right: those little paper rolls contained holes that transcribed the
pianist's keyboard articulations into something that you could hold in your
hand.
And you could recreate the pianist's behavior later, on another instrument,
over and over, as many times as you liked. Once the recording was made, the
pianist wasn't needed again.
The roll of paper played more than just strips of metal; it had an entire
piano at its disposal! It was a form of software, in the sense that it
programmed the player piano. And these piano rolls could be bought and sold,
just like any other commodity.
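To make the "roll as software" idea concrete, here is a minimal sketch of a piano roll as data. It assumes a simplified model in which each hole names a key, where along the roll it starts, and how long it stays open (the key is held while the hole passes over the reader). The names `Hole` and `play`, and the 0 to 87 key numbering, are hypothetical; real rolls also carry tempo and expression information that this ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hole:
    """One punched hole in a hypothetical, simplified piano roll."""
    key: int     # piano key, numbered 0-87 from the lowest A (assumed numbering)
    start: int   # step along the roll where the hole begins
    length: int  # number of steps the hole stays open, i.e. the key is held


def play(roll: list[Hole], steps: int) -> list[frozenset[int]]:
    """Replay the roll: for each step, return the set of keys held down."""
    frames = []
    for t in range(steps):
        held = frozenset(h.key for h in roll if h.start <= t < h.start + h.length)
        frames.append(held)
    return frames


# A tiny roll: a C major chord (middle C, E, G) held for two steps,
# then the C an octave above for one step.
roll = [Hole(39, 0, 2), Hole(43, 0, 2), Hole(46, 0, 2), Hole(51, 2, 1)]
for t, keys in enumerate(play(roll, 4)):
    print(t, sorted(keys))
# 0 [39, 43, 46]
# 1 [39, 43, 46]
# 2 [51]
# 3 []
```

The same roll replays identically every time and on any instrument that reads the format, which is exactly the property that made rolls tradeable.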
And then sound recording took over. Edison's phonograph did not merely
record some representation of the pianist's behavior; it reproduced the
original sound heard at the recording session. That sound could come from
pianos, flutes, trombones, and so on, or simply the human voice: anything
that could make a sound.
People stopped making player pianos.
Computers have historically been limited by the amount of audio and video
they could store. Uncompressed CD-quality audio takes roughly 10 million
bytes per minute; the exact figure depends on the sample rate, the number of
bits per sample, and the number of channels. By the old standards of computer
storage, audio and video recordings make huge files.
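The figure of roughly 10 million bytes per minute is what uncompressed CD-quality stereo audio works out to: 44,100 samples per second, 16 bits (2 bytes) per sample, 2 channels. A quick check, with `pcm_bytes_per_minute` as a hypothetical helper name:

```python
def pcm_bytes_per_minute(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Size in bytes of one minute of uncompressed PCM audio."""
    bytes_per_sample = bits_per_sample // 8
    return sample_rate_hz * bytes_per_sample * channels * 60


# CD audio: 44,100 Hz, 16-bit samples, stereo.
print(pcm_bytes_per_minute(44_100, 16, 2))  # 10584000

# Telephone-grade audio: 8,000 Hz, 8-bit samples, mono.
print(pcm_bytes_per_minute(8_000, 8, 1))    # 480000
```

Lowering the sample rate, the bit depth, or the channel count shrinks the file proportionally, which is what "depending on several things" comes down to for uncompressed audio.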
Yet hardware keeps getting cheaper and cheaper while, at the same time,
getting better and better and more capable.
If people's typing behavior can be recorded, why not their audiovisual
behavior as well? If typing behavior can be simulated, why not audiovisual
behavior?
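Recording typing behavior can work just like a piano roll: store each keystroke along with the pause before it, then replay the keystrokes with the same pauses. The sketch below assumes a hypothetical (delay, key) format; `record` and `replay` are illustrative names, not an existing library API. The `sleep` function is a parameter so playback can be sped up or tested without real waiting.

```python
import time
from typing import Callable

# Hypothetical recording format: (seconds paused before the key, key).
Recording = list[tuple[float, str]]


def record(events: list[tuple[float, str]]) -> Recording:
    """Convert absolute (timestamp, key) events into (delay, key) pairs."""
    recording = []
    previous = events[0][0] if events else 0.0
    for timestamp, key in events:
        recording.append((timestamp - previous, key))
        previous = timestamp
    return recording


def replay(recording: Recording,
           emit: Callable[[str], None],
           sleep: Callable[[float], None] = time.sleep) -> None:
    """Play the keystrokes back, pausing as the original typist did."""
    for delay, key in recording:
        sleep(delay)
        emit(key)


# Keystrokes captured at these (made-up) timestamps, in seconds.
events = [(10.0, "h"), (10.25, "i"), (11.0, "!")]
typed: list[str] = []
replay(record(events), typed.append)  # takes about one second in real time
print("".join(typed))  # hi!
```

Swapping `emit` for something that drives a display, or `sleep` for a no-op, changes the output device without touching the recording, much as one roll could drive any player piano.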
The threshold comes when we can compare two audio segments and determine
whether they are "equivalent." Perhaps that threshold has already been
crossed.
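One very crude way to call two audio segments "equivalent" is to treat them as vectors of samples and measure their cosine similarity, ignoring volume. The sketch below assumes equal-length, time-aligned segments and an arbitrary 0.95 threshold; `similarity`, `equivalent`, and `tone` are hypothetical names. It would not recognize the same sentence spoken twice, since any shift in timing or pitch breaks it. Practical comparisons usually work on spectral features and allow for time alignment (for example, dynamic time warping).

```python
import math


def similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length sample sequences, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("segments must have the same number of samples")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def equivalent(a: list[float], b: list[float], threshold: float = 0.95) -> bool:
    """Hypothetical rule: segments are 'equivalent' above an arbitrary threshold."""
    return similarity(a, b) >= threshold


def tone(freq_hz: float, seconds: float = 0.05, rate: int = 8000,
         gain: float = 1.0) -> list[float]:
    """Generate a pure sine tone as a list of samples (test signal only)."""
    n_samples = int(round(seconds * rate))
    return [gain * math.sin(2 * math.pi * freq_hz * n / rate) for n in range(n_samples)]


print(equivalent(tone(440), tone(440, gain=0.5)))  # True: same waveform, quieter
print(equivalent(tone(440), tone(880)))            # False: an octave apart
```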
In some ways it makes sense to keep recording text and working with that,
using converters to go from voice to text and from text to voice. But then
again, if a recording of behavior is to be played back, wouldn't it be nice
to keep the inflections, tonality, and "voice" of the original speaker?
People are still making chatterbots.