Nokogiri
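XPath queries are reportedly fast, far faster than with REXML.
Sure enough, a parse that took quite a while under Hpricot and then crashed with a bug finished instantly and without errors.
Confirmed that 1.4.4 works on ruby 1.8.7 on Tiger and ruby 1.9.2 on Snow Leopard.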
Hpricot
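I installed it on ruby 1.9.2/Snow Leopard and on ruby 1.8.7/Tiger, but it crashes with a Segmentation Fault on Snow Leopard and a Bus Error on Tiger. Cause unknown... → Switching to Nokogiri avoided the problem.
For example, here is the crash output from the ruby 1.9.2 run. Both environments fail at parse.rb:33. On Tiger I could not capture a detailed error, so the details there are unknown...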
/opt/local/lib/ruby1.9/gems/1.9.1/gems/hpricot-0.8.4/lib/hpricot/parse.rb:33: [BUG] Segmentation fault
ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-darwin10]
-- control frame ----------
c:0010 p:---- s:0056 b:0056 l:000055 d:000055 CFUNC :scan
c:0009 p:0074 s:0051 b:0051 l:000050 d:000050 METHOD /opt/local/lib/ruby1.9/gems/1.9.1/gems/hpricot-0.8.4/lib/hpricot/parse.rb:33
c:0008 p:0030 s:0044 b:0044 l:000043 d:000043 METHOD /opt/local/lib/ruby1.9/gems/1.9.1/gems/hpricot-0.8.4/lib/hpricot/parse.rb:4
c:0007 p:0055 s:0038 b:0038 l:000037 d:000037 METHOD parse_genecard.1_9_2.rb:21
c:0006 p:0062 s:0029 b:0029 l:0017c8 d:000028 BLOCK parse_genecard.1_9_2.rb:54
c:0005 p:---- s:0026 b:0026 l:000025 d:000025 FINISH
c:0004 p:---- s:0024 b:0024 l:000023 d:000023 CFUNC :each
c:0003 p:0237 s:0021 b:0021 l:0017c8 d:000d40 EVAL parse_genecard.1_9_2.rb:50
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:0017c8 d:0017c8 TOP
---------------------------
-- Ruby level backtrace information ----------------------------------------
parse_genecard.1_9_2.rb:50:in `<main>'
parse_genecard.1_9_2.rb:50:in `each'
parse_genecard.1_9_2.rb:54:in `block in <main>'
parse_genecard.1_9_2.rb:21:in `parse_geneinfo'
/opt/local/lib/ruby1.9/gems/1.9.1/gems/hpricot-0.8.4/lib/hpricot/parse.rb:4:in `Hpricot'
/opt/local/lib/ruby1.9/gems/1.9.1/gems/hpricot-0.8.4/lib/hpricot/parse.rb:33:in `make'
/opt/local/lib/ruby1.9/gems/1.9.1/gems/hpricot-0.8.4/lib/hpricot/parse.rb:33:in `scan'
-- C level backtrace information -------------------------------------------
[NOTE]
You may have encountered a bug in the Ruby interpreter or extension libraries.
Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html