After a long hiatus (I have been battling cancer, and am now a proud leukemia survivor), I recently started coding seriously again. Well, I never really stopped, but I started a bigger project than usual, and I am learning enough from it that I want to blog about it here.
I am actually writing a book on my fight against leukemia, and to do this, I am using asciidoc. I documented a lot of my life during that period on a dedicated blog, and of course I wanted to use that as a base for the book, so I decided to write a tool to slurp the blog's RSS feed and save each post in a separate text file. The posts' content then has to be converted from HTML to asciidoc, the images have to be downloaded, and so on.
The first thing, obviously, is parsing the RSS feed. While it is fairly simple to write a parser using my favorite language, Perl, with XML::Simple for example, I checked whether, by chance, the work had already been done. And it had: there is a Ruby gem, aptly called "simple-rss", that makes parsing RSS a breeze. The code was so simple, concise and elegant (more on that in another post) that I decided to write the whole thing in Ruby: as an added benefit, I would learn a new language. And the more I worked on the project, the more I liked the language. But the thing that made me fall in love with it happened when I started writing my first test suite.
At first, I did not write tests. I did not really know where I was going, and I just hacked together a quick prototype of what I wanted. However, the further the project advanced, the more I realized that this was going to be more than just a 100-line script, and that I would go way faster, not to mention being safer, if I actually applied TDD to the whole project. More on that later. So I looked for a good Ruby test framework. I am growing tired of the whole Test::Unit, JUnit norm, and so, just to learn something new, I chose rspec. Now, to explain what my first test was, I have to explain a few things about my project.
To convert the HTML to asciidoc, I chose a simple approach that I am familiar with: simple regexes that translate common HTML constructs into their asciidoc equivalents. The reason is the following: the HTML that makes up the content of a post on Blogger is, most of the time, super simple, so I don't need a complex parser, and coming from a Perl background, I am very familiar with regexes. So I have quite a lot of regexes that run one after another, and my test suite is basically very simple: for each regex, I need to test its validity with a few different strings, and then test a few more detailed strings in order to exercise the whole chain.
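To make the idea concrete, here is a minimal sketch of what such a chain of regexes could look like. The `RULES` table, the `html_to_asciidoc` helper, and the three rules themselves are hypothetical names and assumptions chosen for illustration, not the project's actual code:

```ruby
# Hypothetical rule table: each entry is [pattern, replacement].
# These three rules are illustrative assumptions only.
RULES = [
  [%r{<br\s*/?>}i,                 "\n"],   # line breaks
  [%r{<(b|strong)>(.*?)</\1>}im,   '*\2*'], # bold
  [%r{<(i|em)>(.*?)</\1>}im,       '_\2_']  # italics
].freeze

# Run every rule over the input, in order, and return the converted text.
def html_to_asciidoc(html)
  RULES.reduce(html) do |text, (pattern, replacement)|
    text.gsub(pattern, replacement)
  end
end

puts html_to_asciidoc('Hello<br /><b>bold</b> and <i>italic</i>')
```

Keeping the rules in an ordered table means each regex can be tested on its own, and the order in which they run stays explicit when testing the whole chain.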
So the idea is very simple: have a file which contains the HTML strings and how they should be transformed to asciidoc, read it, and run the test for each entry, and that's it. That way, I never need to touch the code of my test again; I just have to add more strings to my file as the need arises. Actually, this is TDD at its finest: as I encounter a particular HTML construct I want to transform to asciidoc, I add it to my suite, and when the test ends up passing, I know I'm done. I did not know how to (de)serialize data to a file in Ruby, but yaml seemed to be a popular approach, so I just chose that and started writing my first rspec test ever.
15 minutes later, I had the following.
# encoding: utf-8
require 'yaml'

describe "html-to-asciidoc" do
  before :all do
    # load all the patterns and the expected transformations
    @test_map = YAML::load(File.read("data/tests-html-to-asciidoc.yaml"))
  end

  describe "convert!" do
    it "transforms html correctly to asciidoc" do
      @test_map.each_pair do |html_source,asciidoc|
        html = Html2Asciidoc.new(html_source)
        html.should eql asciidoc
      end
    end
  end
end
And the following yaml file:
# Normal text
# br tags
'<br />': "\n"
'<br /><br />': "\n\n"
# bold, italics, underlines
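The quoting in that file matters: in YAML, single-quoted strings are taken literally, while double-quoted strings process escapes such as `\n`. The following sketch shows how the two `br` entries load. It uses an inline copy of those entries (the `yaml_source` and `test_map` names are assumptions for illustration) so that it runs without the data file on disk:

```ruby
require 'yaml'

# Inline copy of the two br entries; the quoted heredoc keeps the
# backslashes untouched so that the YAML parser sees them.
yaml_source = <<~'YAML'
  # br tags
  '<br />': "\n"
  '<br /><br />': "\n\n"
YAML

# Comments are skipped; each remaining line becomes one key/value pair.
test_map = YAML.load(yaml_source)

test_map.each_pair do |html_source, asciidoc|
  puts "#{html_source.inspect} => #{asciidoc.inspect}"
end
```

The keys come back as the literal HTML strings and the values as real newline characters, which is exactly what the spec compares against.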
And it works, out of the box. It fits my need perfectly, and it is dead simple and easy to maintain. Again, without any previous experience in yaml or rspec, it took me 15 minutes total to put together. It is so simple that it actually encourages good coding practices and TDD, and even compared to Perl, which I am more familiar with, I don't think the equivalent would be as concise or as quick to write and put together. Actually, for most things, I increasingly tend to favor Ruby, even if I have to take some time learning things in the process, just because the code is so much leaner in the end, and because of the simplicity of putting together a test suite, which is paramount for me. I'm just totally in love with Ruby.