Musings on a lossy text compressor
A few years ago, I had a silly idea: to apply JPEG or MPEG technology to text. I didn’t fully understand how it would work at the time. I envisioned a compressor that encoded text, and could decode it legibly, but slightly mangled – just like a JPEG or an MP3. Shut up. I thought it was funny.
Further reflection made me realize that converting an ASCII stream to binary, converting that to a PCM waveform, and running an MP3 encoder on it would result in random ASCII gobbledygook. But I still dreamed about the original idea for a while, anyway.
Until I thought of a very practical application.
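To see why the byte-level approach falls apart, here is a minimal sketch in Python. It is not a real MP3 encoder: it assumes that throwing away the low bits of each byte is a fair stand-in for what a lossy audio codec does to a sample. The function name `lossy_roundtrip` and the `dropped_bits` parameter are made up for this illustration.

```python
def lossy_roundtrip(text: str, dropped_bits: int = 3) -> str:
    """Treat each ASCII byte as an 8-bit sample and zero its lowest bits.

    Assumption: coarse quantization stands in for a real lossy codec.
    """
    # Build a mask that keeps the high bits and clears the low ones.
    mask = 0xFF & ~((1 << dropped_bits) - 1)
    samples = text.encode("ascii")
    degraded = bytes(b & mask for b in samples)
    # Masking never sets the high bit, so the result is still valid ASCII.
    return degraded.decode("ascii")


print(lossy_roundtrip("Hello"))  # prints: H`hhh
```

A tiny error in a sound sample is inaudible. The same tiny error in a character code gives you a different letter, so the text turns to mush instead of getting slightly fuzzy.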
Compressors remove what we won’t miss… and so do good editors
Think for a moment about how JPEG and MP3 work. They both look for data that can be removed from the file with little chance of the user noticing its absence. When I thought about it this way, I realized that a lossy text compressor would be an entirely different beast from a lossy sound or image compressor – it would be a good editor.
Think about it. We all default to passive voice and other wordy constructions, which bog sentences down with more words than they need. Individual paragraphs take up more lines than they should. Readers stop reading prematurely.
For example, a phrase like “I would have liked to” gets shortened to “I wanted to.” Or, to edit my own excerpt from my previous article, “One thing I noticed about music reviews is that they seem less fixated on getting a 9.0+ score” gets stripped down to “Music reviews seem less fixated on a 9.0+ score.” That second version says the same thing in half the words.
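The crudest possible version of this idea is a lookup table of wordy phrases and their tighter replacements. Here is a rough sketch in Python, not a real editor. Only the first rule comes from the example above; the other two rules, and the names `RULES`, `compress_text`, and `savings`, are assumptions made up for illustration.

```python
# Hypothetical rule table: wordy phrase -> tighter replacement.
# Only the first rule comes from the article; the rest are illustrative.
RULES = [
    ("I would have liked to", "I wanted to"),
    ("in order to", "to"),
    ("due to the fact that", "because"),
]


def compress_text(text: str) -> str:
    """Apply each rewrite rule in order (case-sensitive, literal match)."""
    for wordy, tight in RULES:
        text = text.replace(wordy, tight)
    return text


def savings(original: str, compressed: str) -> float:
    """Fraction of characters removed by the rewrite."""
    return 1 - len(compressed) / len(original)


before = "I would have liked to stay in order to help."
after = compress_text(before)
print(after)  # prints: I wanted to stay to help.
print(f"{savings(before, after):.0%} shorter")
```

Literal string replacement can't handle the second example, which restructures the whole sentence. Doing that would take some understanding of grammar, which is where the English major comes in.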
Imagine if bloggers were edited by their own blog
In this sense, a lossy text compressor could be a brilliant feature to add to blog software. Sure, it wouldn’t replace the talent and insight of a professional editor. However, amateur writers – who lack the budget and inclination to pay editors – could benefit greatly from such a feature.
If I were both an application developer and an English major, I’d totally make it.
Drake
Dan Hiester