Standup 12/01/2008: Fun with libxml

Posted by David Goudreau on Monday December 01, 2008 at 10:03PM

Interesting Things

  • Libxml has been giving us some more strange behavior on Linux. If you do the following, the second parse fails, but only on Linux:
parser = XML::Parser.new
parser.string = '<foo></foo>'
document = parser.parse

# Now watch me fail, but only on Linux!
parser.string = '<bar></bar>'
document = parser.parse
  • We're hosting a MagLev tech talk today compliments of Martin McClure.

  • Joseph Palermo has won the annual Pivotal Labs Mustache Competition. Granted, he was the only entry. But don't let that affect your admiration of his work. Photo to follow.

Standup 11/26/2008: Assorted Incompatibilities

Posted by brianj on Wednesday November 26, 2008 at 06:42PM

Interesting Things

The garbage collection bug we encountered on Monday has popped up in new places:

  • Rspec version < 1.1.11 + Linux ruby 1.8.6 patchlevel > 114 = (small explosion)

  • libxml version 0.5.4 (11 versions out of date) + Linux ruby 1.8.6 patchlevel > 114 = (another small explosion)
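A spec suite could at least warn when it runs on the combination described in the two bullets above. The following is a minimal sketch, not something we have shipped: `gc_bug_suspect?` is a hypothetical helper, and its thresholds simply restate the bullets (Linux, Ruby 1.8.6, patchlevel above 114).

```ruby
# Hypothetical helper: true when the interpreter matches the combination
# from the standup notes (Linux + Ruby 1.8.6 + patchlevel above 114).
# The arguments default to the running interpreter's own constants.
def gc_bug_suspect?(version = RUBY_VERSION,
                    patchlevel = RUBY_PATCHLEVEL,
                    platform = RUBY_PLATFORM)
  version == "1.8.6" && patchlevel > 114 && platform.include?("linux")
end

warn "This Ruby may hit the GC allocation bug" if gc_bug_suspect?
```

Passing the values in explicitly makes it easy to check the predicate against the `ruby -v` output from other machines.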

Also, speaking of libxml:

  • ActsAsSolr was incompatible with more recent versions of libxml, but David has submitted a fix which has been incorporated into mattmatt's github fork.

Ask for Help

"Turning off cache-busting for test environment replaces cache-busting junk with ??. Is there a way to make it replace it with nothing instead?"

With cache busting on, you would have to test like this:

 admin_button.should have_tag("a > img[src ='/images/v3/admin/image_name.png?12121212?']")

which is awkward.

We turned it off by adding the following to config/environments/test.rb (Google said to do it):

 ENV['RAILS_ASSET_ID'] = ''

But, then we had to do this:

 admin_button.should have_tag("a > img[src ='/images/v3/admin/image_name.png??']")

Is there a way to turn off cache-busting that doesn't put weird/ugly question marks at the end of your image names?
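One way to sidestep the question is to compare only the path and ignore whatever follows the first question mark. This is a sketch under that assumption; `asset_path_without_id` is a hypothetical helper, not part of Rails or RSpec.

```ruby
# Hypothetical helper: drop everything from the first "?" onward, so
# "image_name.png?12121212" and "image_name.png??" both reduce to the
# bare path.
def asset_path_without_id(src)
  src.to_s.sub(/\?.*\z/m, "")
end
```

A spec would then extract the src attribute from the rendered markup and assert on the stripped value, which stays the same whether cache busting is on or off.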

"with_tag & have_tag don't work in non-controller (e.g. model) rspecs?"

Correct, they don't. Use Hpricot instead, or maybe assert_elements.

Tracker 101 Screencast

Posted by Dan Podsedly on Wednesday November 26, 2008 at 03:56AM

If you're new to Tracker, or are considering trying it out but haven't signed up yet, check out the Tracker 101 screencast. Thanks to Ian McFarland for recording it.

Better View Testing with Elementor

Posted by Pat Nakajima on Tuesday November 25, 2008 at 02:18AM

We've got a few mantras at Pivotal. One of them has to do with testing all the time. It's a Good Thing, for sure. Until recently though, I had always inserted a tacit "except for views" to the end of it. The reason for my reservations wasn't the fact that view tests can be brittle. Any test can be brittle. I didn't like testing views because it seemed like the test I was writing never really described the code I was writing. Let's look at a typical view test to see what I mean:

describe "/posts/index.html.erb" do
  def render_view
    render "/posts/index.html.erb"
    response.body
  end

  before(:each) do
    assigns[:posts] = [
      stub_model(Post, :name => "First!", :body => "first body."),
      stub_model(Post, :name => "Second!", :body => "second body.")
    ]
  end

  describe "assertions using have_tag" do
    it "renders posts" do
      render_view
      response.should have_tag(".post", 2)
    end

    it "renders post headers" do
      render_view
      response.should have_tag(".post .post-name", "First!", 1)
      response.should have_tag(".post .post-name", "Second!", 1)
    end

    it "renders post bodies" do
      render_view
      response.should have_tag(".post .post-body", "first body.", 1)
      response.should have_tag(".post .post-body", "second body.", 1)
    end
  end
end

This snippet uses the have_tag helper. It's somewhat slow, and to my eyes, expresses intent about as well as an apple floating in a top-hat filled with perfume. The "tag" is a selector? The content filter is just the second argument? And the last argument is the amount of "tag" the response should have? You can test like this, but why would you?

I've also seen a more manual pattern, using a library like Hpricot or Nokogiri to parse the response body, then asserting on the results of that:

describe "assertions using Nokogiri" do
  def doc
    @doc ||= Nokogiri(render_view)
  end

  it "renders posts" do
    doc.search('.post').should have(2).nodes
  end

  it "renders post headers" do
    headers = doc.search('.post .post-name')
    headers.should have(2).elements
    headers.detect { |element| element.text == "First!" }.should_not be_nil
    headers.detect { |element| element.text == "Second!" }.should_not be_nil
  end

  it "renders post bodies" do
    bodies = doc.search('.post .post-body')
    bodies.should have(2).elements
    bodies.detect { |element| element.text == "first body." }.should_not be_nil
    bodies.detect { |element| element.text == "second body." }.should_not be_nil
  end
end

It's faster, since it's not using have_tag, but still not very expressive. CSS selectors are still littered across the it statements, but at least it's only once per test. Still, using detect to find content is no good. And I don't think CSS selectors have any business in it statements at all. That seems like asserting on the name of a method being called, not its behavior.

The solution!

Given my problems with the above approaches, I created a gem that allows the following assertion syntax:

it "renders posts" do
  result.should have(2).posts
end

it "renders post headers" do
  result.should have(2).post_headers
  result.should have(1).post_header.with_text("First!")
  result.should have(1).post_header.with_text("Second!")
end

it "renders post bodies" do
  result.should have(2).post_bodies
  result.should have(1).post_body.with_text("first body.")
  result.should have(1).post_body.with_text("second body.")
end

What's a result? And how does it know how many posts, post_headers, and post_bodies it has? The result is defined in a before block like so:

require 'elementor'
require 'elementor/spec'

include Elementor

attr_reader :result

before(:each) do
  @result = elements(:from => :render_view) do |tag|
    tag.posts         ".post"
    tag.post_headers  ".post .post-name"
    tag.post_bodies   ".post .post-body"
  end
end

The elements method allows you to name your CSS selectors using the tag block argument. The tag object uses method_missing to register your names. The :from option specifies a method to be called that will return some raw markup.
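To illustrate the method_missing idea in isolation, here is a minimal sketch of a registry that records names this way. `SelectorRegistry` and `named_selectors` are hypothetical names made up for this example; this is not Elementor's actual implementation.

```ruby
# Hypothetical stand-in for the `tag` block argument: any unknown method
# called with a single String argument is recorded as a named selector.
class SelectorRegistry
  attr_reader :selectors

  def initialize
    @selectors = {}
  end

  def method_missing(name, *args)
    if args.length == 1 && args.first.is_a?(String)
      @selectors[name] = args.first
    else
      super # anything else is still a NoMethodError
    end
  end
end

# Yields a fresh registry to the block and returns the collected names.
def named_selectors
  registry = SelectorRegistry.new
  yield registry
  registry.selectors
end

names = named_selectors do |tag|
  tag.posts        ".post"
  tag.post_headers ".post .post-name"
end
```

Here `names` ends up as a hash from the chosen names to their CSS selectors, which is all a matcher needs in order to look up `.post` when a spec asks for `posts`.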

Naming selectors alone was a huge win for me, but there are a few other cool bits about the @result object. First, you get to use the with_text helper for filtering content. You'll also get a with_attrs helper for filtering based on a hash of attribute values.

The project is called Elementor, and you can install it like so:

[sudo] gem install elementor

The code is on GitHub here: github.com/nakajima/elementor (and you can see the CI build here). Take a look at the specs for all of the examples of what you can do. Hopefully, you'll find it as useful as I have. If not, please share your reasons in the comments!

Hide .svn files in changes view of RubyMine

Posted by Jonathan Barnes on Monday November 24, 2008 at 10:20PM

You have probably noticed that the changes view is unusable in RubyMine because of all the .svn files showing.

Well, if you refresh the changes view (using its reload icon), they go away... Hurray!

Standup 11/24/2008: Changes to GC in Ruby 1.8.6?

Posted by brianj on Monday November 24, 2008 at 06:54PM

Ask for Help

"Does anyone know about recent changes to Ruby 1.8.6 garbage collection?"

We're seeing a strange test failure on one of our CI boxes:

[BUG] object allocation during garbage collection phase ruby 1.8.6 (2008-08-11) [i686-linux]

ruby -v gives this:

ruby 1.8.6 (2008-08-11 patchlevel 287) [i686-linux]

On the developer's OSX machine, which doesn't experience this bug, ruby -v gives:

ruby 1.8.6 (2008-03-03 patchlevel 114) [universal-darwin9.0]

There have apparently been changes to garbage collection in 1.8.7 and 1.9 -- does anyone know if these have been backported to 1.8.6 somewhere between patchlevels 114 and 287?

New York Standup 11/24/2008

Posted by Mike Grafton on Monday November 24, 2008 at 03:05PM

Interesting

  • Rails 2.2.2 is released!

  • Even Rails 2.2.2 isn't always threadsafe. I found this out by running a script with JRuby from the command line. The script loaded the Rails environment and then launched two threads that simply tried to resolve an ActiveRecord class constant. Fireworks (in the form of LoadError) ensued deep inside of const_missing. I'll post the full example later today.

  • Tsearch2 is now built into Postgres (as of 8.3). This means you must remove the metadata from your tables, since Postgres now stores it in a separate place.

Unicode Transliteration to Ascii

Posted by Chris Heisterkamp on Monday November 24, 2008 at 12:28AM

Matthew O'Connor and I recently worked on a project that sent SMS messages to mobile customers. Unfortunately, the SMS aggregator we used on the project rejected messages with non-ASCII characters.

One approach we considered was to strip our messages of any characters that were not ASCII and send them as is. After looking through some of the rejected messages, we realized most of the problems occurred with Unicode punctuation. Instead of simply deleting the characters, we tried transliterating them to their ASCII equivalents.
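For comparison, the strip-only approach might look like the sketch below. It assumes Ruby 1.9 or later, where String#encode is available, and `strip_non_ascii` is a hypothetical name used only for this example.

```ruby
# Drop every character that has no US-ASCII representation.
# Nothing is transliterated: curly quotes and accented letters vanish.
def strip_non_ascii(utf8_text)
  utf8_text.to_s.encode("US-ASCII",
                        :invalid => :replace,
                        :undef   => :replace,
                        :replace => "")
end

# Curly quotes around "foo" and an accented "e" in the input.
stripped = strip_non_ascii("\u201Cfoo\u201D caf\u00E9")
```

The quotation marks around foo disappear entirely here, which is exactly the loss that made transliteration more attractive.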

Our first approach used Iconv:

require 'iconv'

module SmsEncoder
  def self.convert(utf8_text)
    text = Iconv.iconv("US-ASCII//TRANSLIT", "UTF-8", utf8_text).first
    text.gsub(/`/, "'")
  rescue Iconv::Failure
    ""
  end
end

For some reason the backtick ` also caused problems so we converted that after using Iconv.

This approach worked perfectly on OS X but as soon as we moved to the Linux servers the libiconv characteristics changed and most untranslatable characters became question marks instead of empty strings.

Instead of wrestling with libiconv, we looked for a solution entirely in Ruby. We found unidecode, which got us most of the way there. Unidecode did a little more than we wanted, though, and translated Chinese and Japanese characters to their approximate sounds, e.g. 今年1月 gets transliterated to Jin Nian 1Yue.

We decided to only transliterate extended Latin characters, punctuation and money symbols.
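The rule itself is just a code-point range check. As a minimal standalone sketch, assuming plain Ranges instead of Sets, and using the hypothetical names `TRANSLITERABLE_RANGES` and `transliterable?`:

```ruby
# Code-point blocks we are willing to transliterate.
TRANSLITERABLE_RANGES = [
  32..126,     # Basic Latin (printable characters only)
  160..591,    # Latin-1 Supplement through Latin Extended-B
  8192..8303,  # General Punctuation
  8352..8399   # Currency Symbols
]

# True when the first character of the string falls in one of the blocks.
def transliterable?(character)
  codepoint = character.unpack("U").first
  TRANSLITERABLE_RANGES.any? { |range| range.include?(codepoint) }
end
```

Anything outside these blocks, such as CJK characters or control characters, is simply dropped rather than handed to unidecode.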

Here is the final code with the unidecode monkey patch:

require 'set'
require 'unidecode'

module SmsEncoder
  def self.convert(utf8_text)
    Unidecoder.decode(utf8_text.to_s).gsub("[?]", "").gsub(/`/, "'").strip
  end
end

module Unidecoder
  class << self
    def decode(string)
      string.gsub(/[^\x20-\x7e]/u) do |character|
        codepoint = character.unpack("U").first
        if should_transliterate?(codepoint)
          CODEPOINTS[code_group(character)][grouped_point(character)] rescue ""
        else
          ""
        end
      end
    end

    private

    # c.f. http://unicode.org/roadmaps/bmp/
    CODE_POINT_RANGES = {
      :basic_latin => Set.new(32 .. 126),
      :latin1_supplement => Set.new(160 .. 255),
      :latin1_extended_a => Set.new(256 .. 383),
      :latin1_extended_b => Set.new(384 .. 591),
      :general_punctuation => Set.new(8192 .. 8303),
      :currency_symbols => Set.new(8352 .. 8399),
    }

    def should_transliterate?(codepoint)
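      # Union of all the sets; inject avoids relying on ActiveSupport's Enumerable#sum.
      @all_ranges ||= CODE_POINT_RANGES.values.inject { |all, range| all | range }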
      @all_ranges.include? codepoint
    end
  end
end

and tests:

class SmsEncoderTest < Test::Unit::TestCase
  def test_transliteration_of_blank
    assert_equal "", SmsEncoder.convert(nil)
    assert_equal "", SmsEncoder.convert("")
  end

  def test_transliteration_of_whitespace
    assert_equal "", SmsEncoder.convert(" \t\n")
  end

  def test_transliteration_of_text_surrounded_by_space
    assert_equal "abc", SmsEncoder.convert("  abc  ")
  end

  def test_transliteration_of_ascii
    orig_text = "!\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
    conv_text = SmsEncoder.convert(orig_text)
    assert_equal orig_text.gsub(/`/, "'"), conv_text
  end

  def test_transliteration_of_unicode_punctuation
    utf8_text = "“foo” ‹foo› ‘foo’ ,foo, –foo— {foo} (foo) `foo`"
    ascii_text = SmsEncoder.convert(utf8_text)
    assert_equal "\"foo\" <foo> 'foo' ,foo, foo-- {foo} (foo) 'foo'", ascii_text
  end

  def test_transliteration_of_common_latin1_characters
    utf8_text = "ñ ò ^ ¡ ¿ Æ æ ß Ç §"
    ascii_text = SmsEncoder.convert(utf8_text)
    assert_equal "n o ^ ! ? AE ae ss C SS", ascii_text
  end

  def test_transliteration_of_money_characters
    utf8_text = "€ £ $ ¥"
    ascii_text = SmsEncoder.convert(utf8_text)
    assert_equal "EU PS $ Y=", ascii_text
  end

  def test_untransliterable_characters
    utf8_text = "ɏ \x1f \x01 \x00 Ʌ \x7f"
    ascii_text = SmsEncoder.convert(utf8_text)
    assert_equal "", ascii_text

  end

  def test_transliteration_of_japanese_characters
    utf8_text = "ウェブ全体から検索"
    ascii_text = SmsEncoder.convert(utf8_text)
    assert_equal "", ascii_text
  end
end

Standup 11/21/2008: Pro Bono Airwaves

Posted by Joe Moore on Saturday November 22, 2008 at 12:07AM

Interesting Things

  • Rails reminder: flash[:notice] = "Good Job" will survive a redirect, while flash.now[:notice] = "Good Job" will not. In general, flash.now is used when you render a template without a redirect, such as when a form submit has validation errors.
  • Good Books: Several folks have recommended JavaScript: The Good Parts.
  • Pro bono: Would anyone like to help out KUSF for free? Their new website project has been stalled for a year.
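The flash reminder above can be modelled in a few lines of plain Ruby. This is a toy sketch of the lifetime rules only: `ToyFlash` and its `sweep` method are hypothetical names for this example and do not reflect how Rails implements its flash.

```ruby
# Toy model: values set with []= live through the next request,
# values set through #now are gone once the current request ends.
class ToyFlash
  # Proxy returned by #now; writes are marked for removal right away.
  Now = Struct.new(:flash) do
    def []=(key, value)
      flash.set_now(key, value)
    end
  end

  def initialize
    @values  = {}
    @discard = [] # keys to drop when the current request ends
  end

  def []=(key, value)
    @discard.delete(key)
    @values[key] = value
  end

  def [](key)
    @values[key]
  end

  def now
    Now.new(self)
  end

  def set_now(key, value)
    @values[key] = value
    @discard << key unless @discard.include?(key)
  end

  # Called between requests: drop expired keys, then mark the
  # survivors so they expire after the next request.
  def sweep
    @discard.each { |key| @values.delete(key) }
    @discard = @values.keys
  end
end

flash = ToyFlash.new
flash[:notice]    = "Good Job" # set before a redirect
flash.now[:error] = "Oops"     # set before rendering directly
flash.sweep                    # the redirect starts a new request
```

After the sweep, `flash[:notice]` is still readable while `flash[:error]` is gone; a second sweep removes the notice as well.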

Ask for Help

"How do you get Selenium to work with Firefox 3?"

If you know how, pull the jar files out of a later release and use those. Good luck!

Why I test.

Posted by Abhijit Hiremagalur on Friday November 21, 2008 at 05:58AM

This is a cross post from my personal blog, because I'd like to hear from other Pivots about why they test.

First about unit and integration tests.

  • I write unit tests for focused feedback; i.e. tell me exactly what broke. To keep them focused I try to keep them orthogonal which usually means using fakes of any collaborators.
  • I write integration tests where I need more safety than a unit test will offer. They seem even more important when I’m stubbing and mocking a lot in a dynamically typed language like Ruby.
  • I write both types of tests to help convey intent and understand the problem better. I TDD with either a unit test or integration test, whichever feels natural.

Then about interaction and state based testing.

  • I pick the approach that feels natural at the time, favouring neither by default. I struggle with rules about when to use which.
  • I dislike interaction tests that look suspiciously similar/symmetrical/coupled to the code they refer to. I expect a test to earn its right to exist, and therefore add to the size of the codebase and build’s time, by either conveying intent that is difficult to express in the code itself (which is why I love the term ‘example‘) or addressing some other consciously identified risk.

So, now it's your turn - why do you test?
