Friday, January 25, 2013

lossless compression

Seems to me there can't really be such a thing as lossless compression of data. Any system of compression must involve some assumption or prior knowledge of the information that is important in order to discard the rest. This is supported to my mind both philosophically: there is no redundant information in the universe; and empirically: no system of audio or video capture and replay ever looks or sounds like the original and they all differ from each other. There is not even a general movement towards 'better' as more advanced technology is involved. This is because one person's background noise is another person's atmosphere.


Rowan Collins said...

I think you're referring here to "digitisation" rather than "compression". There is certainly philosophical backing to the notion that no digital representation of an analogue phenomenon can ever be "lossless", because in order to have a perfect digital representation, the analogue original would itself need to be perfect, in the sense of a Platonic Ideal. So a truly perfect circle can be described mathematically, and therefore digitally, but no circle in reality is perfect. But then again, no analogue recording method is really "lossless" in that sense either - however carefully manufactured, the groove on a piece of vinyl could never faithfully represent all the subtleties of the sound in the recording studio, and nor would two copies of it be precisely the same.

However, once digitised, there are many ways of representing the same data, some shorter than others. For instance, if you have an image of a blue square... in nature there's no such thing as a perfectly blue square, as there's no such colour as "pure blue", and no area of colour would be uniformly one shade if you inspected it closely enough. But once you accept that first approximation, a raster-based digital image might store "pure blue" as the hex sequence "0000FF", and repeat it 10000 times to represent a large area of plain blue. Storing that as instructions stating "repeat 0000FF 10000 times" will clearly take less space. Obviously, you also need to have instructions stored somewhere to interpret the compressed file, but if you can come up with instructions that work well across lots of files, you still come out ahead, and you've lost nothing of the "original" digital data because you can convert back to the uncompressed form whenever you want.
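The blue-square trick above is usually called run-length encoding. Here is a minimal sketch of it in Python, assuming 3-byte RGB pixels; the function names `rle_encode` and `rle_decode` are made up for illustration and do not correspond to any particular image format.

```python
def rle_encode(data: bytes, unit: int = 3) -> list[tuple[bytes, int]]:
    """Collapse consecutive identical `unit`-byte chunks into (chunk, count) pairs."""
    if len(data) % unit:
        raise ValueError("data length must be a multiple of the chunk size")
    runs: list[tuple[bytes, int]] = []
    for i in range(0, len(data), unit):
        chunk = data[i:i + unit]
        if runs and runs[-1][0] == chunk:
            # Same pixel as the previous one: extend the current run.
            runs[-1] = (chunk, runs[-1][1] + 1)
        else:
            # A different pixel: start a new run.
            runs.append((chunk, 1))
    return runs


def rle_decode(runs: list[tuple[bytes, int]]) -> bytes:
    """Expand (chunk, count) pairs back into the original byte string."""
    return b"".join(chunk * count for chunk, count in runs)


blue = bytes.fromhex("0000FF")   # the "pure blue" pixel from the example
image = blue * 10000             # 10000 pixels of plain blue, uncompressed
runs = rle_encode(image)         # "repeat 0000FF 10000 times"
restored = rle_decode(runs)      # back to the uncompressed form
```

The 30000 bytes of pixel data collapse to a single pair, and decoding gives back exactly the bytes you started with, which is all "lossless" means here.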

A "lossless" audio compression algorithm is just doing this same trick, but tuned to spot the kind of patterns that show up when you digitise an audio signal. Nothing is filtered out by the compression, so the compressed file contains exactly what was recorded by the digital microphone, however (in)accurate that is as a representation of the original sound. A "lossy" compression, in contrast, makes assumptions about which bits of information are "important" and throws some away; once thrown away, that information can never be retrieved from the compressed file.
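To make the distinction concrete, here is a rough sketch using a synthetic tone in place of a real recording. It assumes 16-bit samples at 44.1 kHz, uses Python's general-purpose `zlib` as a stand-in for an audio-tuned lossless codec, and uses crude bit-dropping as a stand-in for a lossy one. Real lossy codecs choose what to discard far more cleverly than this.

```python
import math
import zlib
from array import array

# One second of a 440 Hz tone as signed 16-bit samples (illustrative values).
samples = array(
    "h",
    (int(12000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(44100)),
)
raw = samples.tobytes()

# Lossless: compress, then expand again. Every byte comes back.
packed = zlib.compress(raw, 9)
restored = zlib.decompress(packed)
lossless_ok = restored == raw

# Lossy (crude): decide the low 8 bits of each sample are "unimportant"
# and zero them before storing.
coarse = array("h", ((s >> 8) << 8 for s in samples))
lossy_packed = zlib.compress(coarse.tobytes(), 9)
lossy_restored = zlib.decompress(lossy_packed)

# The lossy file decompresses perfectly to the coarse samples, but the
# discarded low bits cannot be recovered from it.
lossy_matches_original = lossy_restored == raw
```

Here `lossless_ok` is true and `lossy_matches_original` is false: the first route returns exactly what the "digital microphone" recorded, and the second returns only what was kept after the decision about which bits mattered.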

1:04 PM  
Seb said...

Yup, good point well made. So the loss is at the point of recording/digitising/sensing, and is particular to the sensitivities and intentions of the recording (or digitising, or sensing) device. However, the two processes (digitisation at input and preparation for storage) are not mutually independent. The digitisation of a blue square is surely driven by a-posteriori knowledge of how it will be compressed, stored, expanded and used. The intention of the original is the key point if we are to avoid the digitiser adding their own personal embellishments (or deleting them). Take a typeface as an example (chosen randomly - not): you can very efficiently represent the intention using mathematical curves, and then printing will take care of the idiosyncrasies of different inks, materials etc., rather than trying to digitise every nobble and smudge on every medium. The equivalent in music would be something like a player-piano, which digitally records the way a piece of music is played by punching holes in paper, and then the nuances of the piano sound are reproduced by the piano. When an analogue recording is digitally remastered there is loss, and when it is stored (your point notwithstanding) there is further loss - otherwise every copy would be a 'master' copy - and when it is played there is further loss.

9:30 AM  
