I just spent an hour writing a utility function that I thought was going to take me 2 minutes, tops. Now I'm kinda wondering whether I saved time or wasted it.

The goal was to automate a task that normally takes me about 15 seconds, taking it down to about 5 seconds. Saving 10 seconds might not seem like it's worth any effort, but over the past few years, I've probably done this particular task a few hundred times, and it was looking like I'd be doing it more, so I figured what the heck.

I thought I could write the function in 2 minutes, which would have paid for itself over the next 12 times I did this task. But it took an hour, so now it won't pay for itself until I've done the task a whopping 720 times. Yow!

So now I have to convince myself that I didn't waste the time writing the function, by writing a blog entry that will probably take an hour, rather than the 10 minutes I originally estimated. *sigh*

My new function is called fix-amazon-url -- an Emacs command that takes an Amazon.com detail-page URL, copied out of my browser's address field after visiting the detail page, and converts it into the minimal/canonical form.

I do this a lot as I'm writing documentation. Whenever I reference a book, I look it up on Amazon so I can give the readers a nice hyperlink to the book. But I want the link to be short, or it messes up my beautiful doc.

Example: If I want to include a link to The Seasoned Schemer, then grabbing the URL gives me this:

http://www.amazon.com/exec/obidos/tg/detail/-/026256100X/ref=pd_bxgy_text_1/104-2794524-3459925?v=glance&s=books&n=507846&st=*

Yuck. I'd prefer it to look like this:

http://www.amazon.com/o/asin/026256100X

(I just used my new Emacs command to do that. Only 719 uses to go! It feels like making house payments now.)
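For anyone curious what the rewrite boils down to, here's a minimal sketch of it as a pure string function. It's not the command from this post: my-canonical-amazon-url is a made-up name, and it assumes the ASIN is the first run of exactly ten upper-case letters or digits sitting between slashes (or at the very end of the URL). That holds for the example above but may not hold for every Amazon URL format.

```elisp
;; Sketch only: a pure-string version of the URL rewrite.
;; `my-canonical-amazon-url' is a hypothetical name, not the post's command.
(defun my-canonical-amazon-url (url)
  "Return the short /o/asin/ form of the Amazon URL, or nil if no ASIN is found.
Assumes the ASIN is ten upper-case letters or digits between slashes,
or at the very end of URL."
  (let ((case-fold-search nil))         ; ASINs are upper-case, so match strictly
    (when (string-match "/\\([A-Z0-9]\\{10\\}\\)\\(?:/\\|$\\)" url)
      (concat "http://www.amazon.com/o/asin/" (match-string 1 url)))))
```

Because it takes a string and returns a string, you can try it with M-: on a few URLs before wiring it into a command that edits the buffer.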

Before I had this function, I had to do all this "stuff" that took 15 to 20 seconds. Selecting the URL field in Internet Explorer scrolls it to the end of the URL, so then I have to hit the Home key, carefully double-click the ASIN part of the URL, copy, paste into Emacs, and then type out the front of the URL. Annoying, especially if you do it a lot, which I do.

So I thought, "Hey, I'll write an Emacs command that converts the URL after I copy it directly from IE's URL field into my document." Easy, right?

Well, actually it turned out to be a total pain. The final function I came up with doesn't look too bad:

(defun fix-amazon-url ()
  "Minimizes the Amazon URL under the point. You can paste an Amazon
URL out of your browser, put the cursor in it somewhere, and invoke
this method to convert it."
  (interactive)
  (and (search-backward "http://www.amazon.com" (point-at-bol) t)
       (search-forward-regexp
        ".+/\\([A-Z0-9]\\{10\\}\\)/[^[:space:]\"]+" (point-at-eol) t)
       (replace-match
        (concat "http://www.amazon.com/o/asin/" (match-string 1) (match-string 3)))))

It's almost exactly what I'd envisioned initially. But search-backward-regexp was behaving oddly, and I don't have my Friedl book handy here (Sweet! 718 more bottles of beer on the wall...), so I remembered that it was different, but not how.

So I changed it to use search-backward, since Amazon URLs always start with the same prefix, and then used search-forward-regexp to simultaneously find the bounds of the URL (so I could delete it) and extract the ASIN (so I could replace it).

But wouldn't you know it, my search wasn't doing what I wanted. In Perl, which is to say, Ruby, I would have probably gotten it done quite zippily, since that's the regex syntax I'm most familiar with. But Emacs regular expressions are, well, irregular, which is to say they suck. But they're a heck of a lot better than nothing for parsing strings, so I kept after it.
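To make the irregularity concrete, here's one pattern (the ASIN between slashes) written three ways. This is an illustration, not code from the post: the my- names are hypothetical, and the rx macro it uses is part of Emacs but isn't used anywhere else in this entry.

```elisp
(require 'rx)

(defconst my-sample-url
  "http://www.amazon.com/exec/obidos/tg/detail/-/026256100X/ref=pd_bxgy_text_1"
  "Hypothetical test input.")

;; Perl/Ruby habits: in an Emacs regexp, bare ( ) { } are ordinary
;; characters, so this pattern looks for literal parentheses and braces.
(defconst my-perl-style "/([A-Z0-9]{10})/")

;; Emacs syntax: groups and intervals need a backslash, and every
;; backslash has to be doubled inside a Lisp string literal.
(defconst my-emacs-style "/\\([A-Z0-9]\\{10\\}\\)/")

;; The rx macro builds an equivalent regexp from s-expressions,
;; so there are no backslashes to count.
(defconst my-rx-style (rx "/" (group (= 10 (any "A-Z" "0-9"))) "/"))

(defun my-first-group (regexp string)
  "Return group 1 of REGEXP's first match in STRING, matching case exactly."
  (let ((case-fold-search nil))
    (and (string-match regexp string)
         (match-string 1 string))))
```

Evaluating (my-first-group my-perl-style my-sample-url) returns nil rather than signaling an error, which is exactly the kind of silent failure that can eat an hour.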

Most of the wasted hour was my own fault, just me being dumb. I decided that if one big regexp wasn't going to work, I'd split the function into multiple searches, store the various pieces in variables, and replace everything at the end. I got that working, and it took about 35 minutes, mostly since each of the little pieces wasn't working for me, either. Ouch.

Why looky, my original (working) function is still in my *scratch* buffer, so here, take a look:

(defun fix-amazon-url ()
  "Minimizes the Amazon URL under the point. You can
paste an Amazon URL out of your browser, put the cursor
in it somewhere, and invoke this method to convert it."
  (interactive)
  (let ((case-fold-search t))
    (or
     (let (url start end asin)
       (and
        ;; search-backward-regexp acts a bit oddly, so split the search
        (setq
         start
         (search-backward-regexp "http://www.amazon.com" (point-at-bol) t))

        (> (skip-syntax-forward "^\\s-" (point-at-eol)) 0)
        (setq end
              (if (eq (char-before (point)) ?\")
                  (1- (point))
                (point)))

        (setq url (buffer-substring-no-properties start end))

        (string-match "/\\([A-Z0-9]\\{10\\}\\)/" url)
        (setq asin (match-string 1 url))

        (progn (delete-region start end)
               (backward-char 1)
               (insert (concat "http://www.amazon.com/o/asin/" asin))
               (if (string-match "\"$" url) (insert "\""))
               t)))
     (message "Sorry, didn't find an Amazon URL under the point."))))

This worked, but made me feel rather cowardly, since I knew, deep down, that my first approach was the right one, and it should have worked. This was just gross.

Point Approaching

Which brings us to the Second Significant Point of this impromptu blog essay: most people stop after they get something working, even if it's ugly.

I understand, I've been there. The temptation is almost overwhelming; after all, it works, and who can argue with that? You've got other stuff to do.

My first point, incidentally, since I guess I forgot to mention it, is that you should get in the habit of automating tedious, repetitive tasks. Doesn't matter if they're in your editor, or your shell, or your desktop environment, or your debugger. If you don't get into this habit, then you're slow.

Yep, you. Sloooooooowwww. Like molasses in January. Of course, when I say "you", I don't mean YOU, since you've obviously completed all your work early, by virtue of having automated away all the tedium, and you have free time to read blogs. Good for you!

But there are gadzillions of engineers who are out there, typing away, cranking out boilerplate code by hand when they could clearly be cranking all that boilerplate code out via automation. (I hope your Irony Detector just went off.)

Enough About You, Back To Me

So what was wrong with the 32-line version above? Nothing! Well, nothing except for the fact that it's 32 lines, and it should have been 1/3 that.

Oops, just had an idea, so I'm going to leave you for a moment. Well, I'm guessing this will be more like an hour, which means it'll probably take me a day, but it's pretty nifty, so I'll at least go write it down... [9:43pm]

[10:19pm] ...OK, I'm back. I just spent the last 36 minutes doing something WAY cool -- another Emacs automation task, but this one's gonna pay for the whole kit and kaboodle, whatever that means. I'm not sure if I'll have time to talk about what it is, but maybe at the end if we're not too tired, 'k?

Anyway, that 32 lines was nagging me. A big blob of working code. Argh.

I decided to go back and try again. I've written big systems with tons of code, and only realized later that they were big because I wrote tons of code. If you can do it with less code, it'll be a better system. Fewer bugs, easier to understand, easier to modify. Big systems suck.

One of the requirements for writing a good system is that you refactor it constantly. One important kind of refactoring is finding the cleanest way to express something. It doesn't matter that it already works -- that's the whole point of refactoring: taking code that works, and making it better, little by little.

You might say I "rewrote" the 32-line function, and sure, I did. But some refactorings are fairly intrusive; e.g. the one where you take a method, turn it into an entire new class, and then split it up. I'd argue that taking a function that works and writing a newer, smaller version of it is a kind of refactoring, particularly if you do it before you call the function "finished" and check it in.

Whatever. Regardless of whether it's refactoring or not, taking code you just wrote and squeezing it down to its simplest computational form is good software engineering. It's a good habit to be in.

Yep, that was another Point

The Third Major Point of this essay is that you need to develop good coding habits. Think of it as being just like hygiene. You have to shower, brush your teeth, pluck your nose hairs, etc. If you don't, you will start to stink. And if you don't develop good code-hygiene habits, your code will start to stink.

And if you don't develop good automation habits, then your productivity will start to stink, although that's technically Point Number One, so I won't harp on it.

So I went back to my 32-line function and prevailed this time. Emacs is a pretty nice environment. It has incremental-development features that you'd just die for if you knew about them. For instance, one of them is that you never have to restart Emacs as you write enhancements for it.

Happiness is never having to restart.

Your whole compile/debug/stop-app/start-app cycle, which usually includes a "get app back to the state it was in on the last compile cycle" step, simply doesn't exist in Emacs. The difference is like producing a document using a word processor instead of using pen and paper. With pen and paper, if you make a mistake, you're screwed. But word processors have a backspace key, and that makes life a whole lot better. Emacs is the word processor of development environments.

With most development environments, if you need to make a change to your code and test it, you have to shut your application down and restart it. Over and over and over, every change you make. Some debuggers sometimes allow you to tweak your code while the process is running, but it's not very common, and it's extremely core-dump prone in C and C++. They've finally added limited support for it in the JVM, although it's not a first-class language feature.

Languages that facilitate rapid prototyping, e.g. Ruby, Python and (maybe?) Perl, allow you to develop the application incrementally, usually using some sort of interactive command prompt where you enter in expressions and definitions. This is nice.

JVM scripting languages let you do this too, by working around the JVM's limitations using a trick where they create a new proxy class with a different name, each time you change and reload your code. Double-indirection. Works, but it's expensive. Not that you'd notice, normally, since you're usually only doing incremental development on your local box, and it all gets frozen into bytecode for production deployments. But I digress.

Emacs is better than ALL of those environments, because it has lisp-interaction-mode, and there's nothing quite like it in the world. You have this working area called the *scratch* buffer, which works sort of like a command shell, except there's no prompt. Instead, you type in some code using normal editing, since it's just like editing in a file. When you're happy, you hit a special key to evaluate it. You can have broken bits of code lying around, just like in a mechanic's workshop, and evaluate any little piece of it you like.

For instance, you can write:

(defun my-function ()
"This is gonna be cool..."

and then just leave it sitting there. In other language environments, there's a prompt that's expecting more input from you. You have to finish the function definition right then and there.

But in lisp-interaction-mode, no big deal, it's just text until you try to evaluate it. So you can move somewhere else in the buffer, write another expression, or part of one, and come back and finish my-function later.

This is the ultimate form of unit testing. You can literally test every single component of a function as you go. You can extract pieces and go off and evaluate them separately, ensuring they work right then and there, rather than waiting until you run the whole thing to see if you guessed right. This process is called bottom-up programming. You grow your code a little bit at a time, assuring yourself that the components work before you start using them to make bigger pieces.
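Here's roughly what that looks like for the two searches inside the short fix-amazon-url shown earlier. A sketch under assumptions: the sample line and the my-piece-* helper names are hypothetical, and each helper builds a throwaway buffer so that one step can be evaluated and checked by itself before the steps are chained together with and.

```elisp
;; Hypothetical sample line; the URL starts at buffer position 5.
(defconst my-sample-line
  "See http://www.amazon.com/exec/obidos/tg/detail/-/026256100X/ref=pd_bxgy_text_1 for details.")

(defun my-piece-find-prefix (pos)
  "Step 1: from POS, search backward on the line for the Amazon prefix.
Returns the position where the URL starts, or nil."
  (with-temp-buffer
    (insert my-sample-line)
    (goto-char pos)
    (search-backward "http://www.amazon.com" (line-beginning-position) t)))

(defun my-piece-match-url (start)
  "Step 2: from START, match the whole URL and capture the ASIN.
Returns (ASIN URL), or nil if the regexp fails."
  (with-temp-buffer
    (insert my-sample-line)
    (goto-char start)
    (when (search-forward-regexp
           ".+/\\([A-Z0-9]\\{10\\}\\)/[^[:space:]\"]+" (line-end-position) t)
      (list (match-string 1) (match-string 0)))))

;; In *scratch*, put point after each form and press C-j to see its value:
;; (my-piece-find-prefix 40)
;; (my-piece-match-url 5)
```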

Once you learn about this, you really Really REALLY start to miss it in other environments. People have even found ways to create interaction-modes for some languages (in Emacs, of course) by sending the code off to a subprocess, fetching the result, and then printing it. Doesn't work that well, though, because in most languages you can't really tell where the expression is supposed to begin and end. Well you can, but it requires complex parsing. Lisp's s-expressions really shine here.

Anyway, this probably sounds like babbling to you if you haven't seen it, and I think you have to see it in action to fully appreciate it. So I'll summarize with a slightly different point, which is...

Unit Testing is a Great Habit

Not just a good one: a great one. I recommend, nay, strenuously recommend getting hold of one of the new Pragmatic Unit Testing books from the Pragmatic Programmer guys, such as Pragmatic Unit Testing in Java with JUnit. (Score! 717 bottles of beer on the wall...)

Not only is it a great habit, but it's one that almost nobody practices because it's a hard habit to get into. Emacs makes it easy when you're developing Emacs-Lisp code, but Emacs is less helpful if you're using it for doing, say, Java coding. You just have to be disciplined.
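For Emacs-Lisp, modern Emacs also ships with ERT (Emacs Lisp Regression Testing), which turns one-off scratch-buffer checks into tests you can rerun after every change. A minimal sketch, assuming a hypothetical helper called my-asin-on-line that isn't part of this entry; with-temp-buffer gives each case its own throwaway buffer, so the test never touches a real document.

```elisp
(require 'ert)

(defun my-asin-on-line ()
  "Return the ASIN in an Amazon URL on the current line, or nil.
Hypothetical helper: assumes the ASIN is ten upper-case letters or
digits between slashes, or at the end of the line."
  (save-excursion
    (goto-char (line-beginning-position))
    (let ((case-fold-search nil))
      (when (re-search-forward "/\\([A-Z0-9]\\{10\\}\\)\\(?:/\\|$\\)"
                               (line-end-position) t)
        (match-string-no-properties 1)))))

(ert-deftest my-asin-on-line-test ()
  "Each case runs in its own temporary buffer."
  (with-temp-buffer
    (insert "Read http://www.amazon.com/exec/obidos/tg/detail/-/026256100X/ref=x")
    (should (equal (my-asin-on-line) "026256100X")))
  (with-temp-buffer
    (insert "See http://www.amazon.com/o/asin/026256100X")
    (should (equal (my-asin-on-line) "026256100X")))
  (with-temp-buffer
    (insert "No link on this line")
    (should-not (my-asin-on-line))))
```

Run it interactively with M-x ert, or from a shell with emacs -batch -l yourfile.el -f ert-run-tests-batch-and-exit, where yourfile.el is whatever file holds the code.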

Unit Testing is more like going to the gym than like brushing your teeth. Most people just can't bring themselves to spend the time.

OK, let's wrap up, since I'm obviously turning my 10-minute estimate for this blog entry into a complete joke; it's been more like 2 hours.

The point I set out to make here, which is evidently Point Number Five (what with Unit Testing being Number Four), is that it's OK to "waste" some time writing productivity-enhancing tools, scripts, and editor customizations, even if it doesn't seem likely that you'll get the time back.

To be sure, you should try to exercise intelligent judgement when deciding what things to try to automate, and as you're estimating how long the automation part will take. But if you wind up spending a bit too much time on it, like I did for the URL-munger function today, that's sometimes OK anyway.

Why?

Because you're maintaining your good automation habits, and your good refactoring habits. And you're gaining experience with your automation environment. Next time I write an elisp function I'll be better prepared to deal with Emacs's regular expression syntax (and the myriad other little details I wrestled with during that hour.)

Don't let lack of experience discourage you. I knew essentially nothing about automating my Windows desktop a few months ago, other than that it's possible to do it. But a little at a time, an hour here, an hour there, I've been figuring it out. At this rate it'll be years before I'm halfway decent at it. But I plan on being around years from now, don't you?

So give it a try! Write a little plug-in for your editor, whether it's Eclipse, or JEdit, or Emacs, or VIM, it doesn't matter. All good IDEs are extensible. Don't try to make something big -- everyone always has some big plug-in they want to write, to learn how to do it, but they can never find time to get around to it. Duh. That's because you never WILL have time. Write small stuff, and eventually it will grow into big stuff.

Point Number Five

Oh wait, we just did that one. Cool! So I guess we're done. Now I can tell you what the amazing idea was that made me run off for 36 minutes, leaving you there just waiting. How rude of me.

See all the pretty syntax-highlighted Lisp code in this blog entry? It looks exactly the way it does in my Emacs buffer. In particular, the colors are just right, and that's a first for me. Until just now, I'd been using GNU source-highlight for syntax-coloring my code -- for blogs, documentation, tutorials, etc. But the GNU program generates fairly evil HTML output, and you have to hand-modify it anyway.

What I've always wanted is the ability to take a section of an Emacs buffer, and turn it into HTML -- basically, put in the right font-tags wherever Emacs has colored the text. As I was writing this blog entry, it occurred to me that I actually know how to do this now, and I went off for a few minutes and did it.

Here's the code, which I've run on itself to syntax-highlight itself:

(defun syntax-highlight-region (start end)
  "Adds <font> tags into the region that correspond to the
current color of the text. Throws the result into a temp
buffer, so you don't dork the original."
  (interactive "r")
  (let ((text (buffer-substring start end)))
    (with-output-to-temp-buffer "*html-syntax*"
      (set-buffer standard-output)
      (insert "<pre>")
      (save-excursion (insert text))
      (save-excursion (syntax-html-escape-text))
      (while (not (eobp))
        (let ((plist (text-properties-at (point)))
              (next-change
               (or (next-single-property-change
                    (point) 'face (current-buffer))
                   (point-max))))
          (syntax-add-font-tags (point) next-change)
          (goto-char next-change)))
      (insert "\n</pre>"))))

(defun syntax-add-font-tags (start end)
  "Puts <font> tag around text between START and END."
  (let (face color rgb name r g b)
    (and
     (setq face (get-text-property start 'face))
     (or (if (listp face) (setq face (car face))) t)
     (setq color (face-attribute face :foreground))
     (setq rgb (assoc (downcase color) color-name-rgb-alist))
     (destructuring-bind (name r g b) rgb
       (let ((text (buffer-substring-no-properties start end)))
         (delete-region start end)
         (insert (format "<font color=#%.2x%.2x%.2x>" r g b))
         (insert text)
         (insert "</font>"))))))

(defun syntax-html-escape-text ()
  "HTML-escapes all the text in the current buffer,
starting at (point)."
  (save-excursion (replace-string "<" "&lt;"))
  (save-excursion (replace-string ">" "&gt;")))

Man that's cool. Except for the fact that Emacs colors symbol names pink, I guess.
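The same idea can also be sketched on a propertized string instead of a buffer, which makes each step easy to check in *scratch*. This is an illustration under stated assumptions, not the code above: my-face-rgb-alist is a hypothetical face-to-RGB table standing in for the face-attribute and color-name-rgb-alist lookup, and unlike syntax-html-escape-text it also escapes &, which has to happen before < and > so the new entities don't get escaped a second time.

```elisp
(defvar my-face-rgb-alist
  '((font-lock-keyword-face 160 32 240)
    (font-lock-function-name-face 0 0 255))
  "Hypothetical FACE -> (R G B) table with 8-bit components.")

(defun my-html-escape (text)
  "Escape &, < and > in TEXT.  & goes first so entities aren't re-escaped."
  (let ((s (replace-regexp-in-string "&" "&amp;" text t t)))
    (setq s (replace-regexp-in-string "<" "&lt;" s t t))
    (replace-regexp-in-string ">" "&gt;" s t t)))

(defun my-string-to-html (string)
  "Return STRING as HTML, with a <font> tag around each run of a known face."
  (let ((pos 0)
        (len (length string))
        (out '()))
    (while (< pos len)
      ;; The run ends where `face' next changes, or at the end of STRING.
      (let* ((next (or (next-single-property-change pos 'face string) len))
             (rgb (cdr (assq (get-text-property pos 'face string)
                             my-face-rgb-alist)))
             (text (my-html-escape (substring-no-properties string pos next))))
        ;; Runs with no known face are emitted as plain escaped text.
        (push (if rgb
                  (format "<font color=#%02x%02x%02x>%s</font>"
                          (nth 0 rgb) (nth 1 rgb) (nth 2 rgb) text)
                text)
              out)
        (setq pos next)))
    (concat "<pre>" (apply #'concat (nreverse out)) "</pre>")))
```

Building the result from a list of strings, rather than deleting and inserting in place, sidesteps any bookkeeping about positions shifting as the tags go in.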

Let's try it on some Java code:

    /**
     * Compresses a byte array with gzip.
     *
     * @param b the array of bytes to compress
     *
     * @param useZLIB true to use ZLIB, false to use GZIP.
     *
     * @return the compressed array
     */
    public static byte[] compress ( byte[] b, boolean useZLIB )
    {
        try {
            ByteArrayOutputStream outBuffer = new ByteArrayOutputStream();
            DeflaterOutputStream gzip;
            if (useZLIB) {
                gzip = new DeflaterOutputStream ( outBuffer );
            } else {
                gzip = new GZIPOutputStream ( outBuffer );
            }

            gzip.write ( b );
            gzip.close();
            byte[] zipped = outBuffer.toByteArray();
            outBuffer.reset();
            return zipped;
        }
        catch ( Exception xc ) {
            xc.printStackTrace();
            return null;
        }
    }

Sweet. Works like a charm! How about some Perl code, then. Continuing in the self-referential vein, I'll use some Perl code from a script I wrote to syntax-highlight Python code:

    #!/usr/bin/perl
    #
    # Renders a jython class for putting into an HTML page.
    # Author: Steve Yegge, Feb 18, 2003
    #
    if ( !@ARGV ) {
        print "Usage: py2html <pythonfile+>\n";
        exit(0)
    }

    my $keyword = "0000FF";
    my $string = "4169FF";
    my $function = "CC0000";
    my $comment = "008800";
    my $self = "777777";

    my $in_comment = 0;
    my $in_docstring = 0;

    for my $file (@ARGV) {
        &process_file($file);
    }

    exit(0);

Awesome. I can't believe I didn't do this before. Doubtless it will need improvements. It barfed on the Perl code above because perl-mode had underlined the @ARGV, which resulted in the face name coming back as a list. But I added this line:

     (or (if (listp face) (setq face (car face))) t)

and it fixed it. Never had to restart Emacs. I just added the line, re-evaluated the function, switched back to the Perl buffer, and ran my command.
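The reason the list case bites is that the face text property can take several shapes: a face name, a list of faces, or an anonymous face written as a property list like (:foreground "red"). Here's a sketch of a slightly more defensive version, assuming only those shapes; my-primary-face is a hypothetical name, not part of the code above.

```elisp
(defun my-primary-face (face)
  "Return the first named face in FACE, a `face' property value, or nil.
FACE may be a face symbol, a list of faces, or an anonymous face
written as a property list such as (:foreground \"red\")."
  (cond ((null face) nil)
        ((symbolp face) face)               ; a single named face
        ((keywordp (car-safe face)) nil)    ; anonymous plist: no face name
        ((consp face)                       ; list of faces: first name wins
         (let ((found nil))
           (while (and face (not found))
             (let ((item (car face)))
               ;; Skip nested plists such as (:underline t).
               (when (and (symbolp item) (not (keywordp item)))
                 (setq found item)))
             (setq face (cdr face)))
           found))))
```

The one-line (car face) fix covers the common list case; on an anonymous plist, though, car would hand face-attribute the keyword :foreground, which isn't a face.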

The beauty of it is: I can HTML-ify anything in an Emacs buffer, not just source code. Here's what my buffer-list looks like right now:

 MR Buffer                    Size  Mode              File
 -- ------                    ----  ----              ----
.*  blog-function-2hours.txt  24801 Text              c:/work/adj/articles/blog-function-2hours.txt
 *  *scratch*                 2989  Lisp Interaction
  % *html-syntax*             143   Help
    efuncs.el                 28313 Emacs-Lisp        c:/home/stevey/emacs/lisp/efuncs.el
    py2html                   3884  Perl              c:/misc/tools/bin/py2html
    Foo.java                  549   JDE               c:/tmp/Foo.java
    foo.html                  903   HTML helper       c:/tmp/foo.html
  % *Apropos*                 2223  Apropos
 *% *Calc Trail*              234   Calc Trail
 *% *Calculator*              51    Calculator
 *  froo                      545   Fundamental
  % *info*                    57751 Info
  % *Help*                    1291  Help
    typing-test.el            38224 Emacs-Lisp        c:/home/stevey/emacs/lisp/ttest/typing-test.el
 *  foo                       5895  Lisp Interaction
    myfont.el                 16160 Emacs-Lisp        c:/home/stevey/emacs/lisp/myfont.el
    opus.rb                   80    Ruby              c:/tmp/opus.rb
 *  *ruby*                    4110  Inferior Ruby
    blog-whirlwind-tour.txt   32489 Text              c:/work/adj/articles/blog-whirlwind-tour.txt
 *  *shell*                   2     Shell             c:/home/stevey/
 *  *Messages*                1362  Fundamental
  % *Completions*             172   Completion List
 *  *vc*                      130   Fundamental

Well, close enough. The columns line up better in Emacs. But still. Pretty cool, eh?

Hope you enjoyed today's blog entry! I enjoyed writing it. Had no idea it was gonna turn out this way, but hey, that's why I write them.

(Published Sep 24th 2004)


Comments

Excellent. Also note that saving time is not the only reason to automate repetitive tasks. You can save thought as well.

In the amazon-url-fixing example above (which, as detail-page QA, I can appreciate and will probably steal), you don't have to stop thinking about what you're writing. You don't have to "remember to fix them all up later."

You let software (in this case, emacs) handle much of the depth of thought. You save context-switching and it makes your other work faster.

Can you measure that as easily as time? No, not really. But for me personally it's an order-of-magnitude greater effect than the time effect.

BTW if all of your documentation is for folks at Amazon, you can eliminate .amazon.com as well.

Sorry if any of this repeats what you said -- I had to go through this entry pretty fast, even though it was really good.

Posted by: Raif at September 24, 2004 02:13 PM


OK, that part about order-of-magnitude wasn't very clear. Convert the timestamp to Seattle time and you'll see why.

What I was trying to say was this. In my experience, the direct time savings from automating a task is far outweighed by the indirect time savings -- avoiding context switching and letting software do the thinking for you.

Or put another way, the really big advantage to automating tasks like this is not that it's faster per se, but that it makes you more productive in the task at hand.

Posted by: Raif at September 24, 2004 05:53 PM


Here you go, you can track the value of your function with this:

(defadvice fix-amazon-url (around fix-amazon-url-value-counter activate)
  "Keeps track of how many times you've used this function"
  ad-do-it
  (customize-save-variable
   'fix-amazon-url-invocation-counter
   (1+ (or (and (boundp 'fix-amazon-url-invocation-counter)
                fix-amazon-url-invocation-counter)
           0)))
  (message "Please use me %d more times"
           (- 720 fix-amazon-url-invocation-counter)))

Posted by: Chris T. at September 24, 2004 06:39 PM


OK, now that I've read the rest of the blog:

My irony detector went off here:

"Emacs makes it easy when you're developing Emacs-Lisp code, but Emacs is less helpful if you're using it for doing, say, Java coding."

Hmm. A development environment that's best suited for developing the development environment... It's like a perpetual motion machine.

Dumping a highlighted buffer as html:

There's an 'htmlize' feature that someone wrote, which will turn a buffer into html. But kudos to you for writing it.

http://fly.srk.fer.hr/~hniksic/emacs/htmlize.el

Posted by: Chris Thomas at September 24, 2004 06:53 PM


Thanks - htmlize.el looks really, really cool. Thanks for not telling me about it before. ;-)

This is why we need to productize Emacs, by the way. All the elisp-archive listings suck, as none of them are complete.

My 35-minute version is a little baby version of htmlize.el. I had fleeting thoughts of multi-version compatibility, overlay support, working on files as well as buffers, properly handling HTML escapes... just about all the things that are fully supported in htmlize.el. I just figured Derek would do it, once I'd done the proof of concept. :-)

Interesting how similar my basic approach is to theirs. Scan the text properties at change boundaries, look up the face name in the rgb alist, format as hex, insert font tags, escape the HTML... I guess I had it *basically* right.

Regarding your irony filter - point taken, but the various Lisp-y folks out there (e.g. Richard Gabriel, JWZ, Paul Graham) would say that you drew the wrong conclusion. You concluded that Emacs should be better at helping you develop things other than itself. They would conclude that you should write everything in Lisp (since you can get essentially the same level of interaction if you're doing Scheme or Common Lisp development.)

Posted by: Steve Yegge at September 24, 2004 10:25 PM


Cool -- I've always wanted that feature somewhere, but usually ended up googling for the term concat Amazon to get a canonical link. I always imagined that there'd be some internal or external service to do it, but never found it.

Posted by: Andrew W. at September 24, 2004 11:23 PM


Even if you never "gain the time back" by executing the function enough times, you've increased your work satisfaction greatly. You got to spend a happy hour programming, whereas you were probably getting slightly annoyed every time you had to convert a url.

Sometimes automating a task is more about keeping things interesting than saving time in the long run.

Posted by: Jon S. at September 27, 2004 01:03 AM


BTW you could also factor the amount of time /other/ people save into your cost-benefit equation :)

Posted by: Andrew W. at September 27, 2004 10:40 PM


Yes, but I'd have to subtract out all the time they waste reading my blog, so I think I'll just keep it simple. :)

Posted by: Steve Yegge at September 28, 2004 12:20 AM


"Emacs is the word processor of development environments."

That's the Velvet Elvis of metaphors.

Posted by: Brian R. at October 2, 2004 02:06 AM


Several things: One, automating things is cool and useful. For example, and in a similar vein, I once spent half an hour or so muttering perl incantations, to get a script that would transform a url from the onlines, devo, or master, to one for my desktop, including adding in the right port and such. This made it really easy to find a page of interest elsewhere, and dissect on my desktop.

I too believe in the Rule of Three, id est, that any task I do more than twice ought to be automated. Browse my ~/bin directory on my home box (or even my work one) to see the things I have done to automate my life. For example, anyone not using command line completions for our common commands (like, for example, /apollo/bin/runCommand or pubsublisten) is doing WAY too much typing.

Posted by: Timothy K. at October 13, 2004 11:34 PM