Distantly Yours: Web Design and Photos in Bloomington, IN, by Dan Hiester


Musings on a lossy text compressor

A few years ago, I had a silly idea: to apply JPEG or MPEG technology to text. I didn’t fully understand how it would work at the time. I envisioned a compressor that encoded text and could decode it legibly, but slightly mangled – just like a JPEG or an MP3. Shut up. I thought it was funny.

Further reflection made me realize that treating an ASCII stream as raw binary data, converting that to a PCM waveform, and running an MP3 encoder on it would result in random ASCII gobbledygook. The encoder throws away what an ear won’t miss, but to a text decoder every lost bit is a garbled character. But I still dreamed about the original idea for a while, anyway.

Until I thought of a very practical application.

Compressors remove what we won’t miss… and so do good editors

Think for a moment about how JPEG and MP3 work. Both look for data that can be removed from the file with little chance of the user noticing its absence. When I thought about it this way, I realized that a lossy text compressor would be an entirely different beast from a lossy sound or image compressor – it would be a good editor.

Think about it. Most of us default to wordy constructions like the passive voice, so our sentences get bogged down with more words than they need. Paragraphs take up more lines than they should, and readers stop reading before the end.

For example, a phrase like “I would have liked to” would get shortened to “I wanted to.” Or, to edit an excerpt from my previous article, “One thing I noticed about music reviews is that they seem less fixated on getting a 9.0+ score” would get stripped down to “Music reviews seem less fixated on a 9.0+ score.” That cuts the sentence from 18 words to 9 without losing its meaning.

Imagine if bloggers were edited by their own blog

In this sense, a lossy text compressor could be a brilliant feature to add to blog software. Sure, it wouldn’t replace the talent and insight of a professional editor. However, amateur writers – who lack the budget or inclination to pay editors – could benefit greatly from such a feature.

If I were both an application developer and an English major, I’d totally make it.

Comments

Drake

If you could computerize the process of condensing thoughts into concise words – à la Abraham Lincoln – you could make a fortune.

Though I think many English scholars would die a little inside.

Dan Hiester

English scholars would only die a little inside because a lame communications major like me beat them to it.
