Hackfoofery

Alson Kemp

Ruby vs. JRuby: URI.parse fails with percent signs

I ran into a nasty URL-parsing issue a while ago in a Rails app.  I was pulling URLs in from Amazon’s shopping APIs and started receiving occasional InvalidURIError exceptions, which I tracked down to percent signs in the URLs Amazon passed back.  Percent signs are reserved for escaping/encoding characters in URLs, and I’m not sure that the Amazon URLs are truly legal, but the parsing function shouldn’t fail on nonstandard usage.  After all, the web is a messy place, and failing safe when parsing messy data seems like better behavior…  Examples:

irb(main):001:0> require 'uri'
=> true
irb(main):002:0> u = URI.parse("http://www.yahoo.com/test?string=thing%ding") # questionable
URI::InvalidURIError: bad URI(is not URI?): http://www.yahoo.com/test?string=thing%ding
 from /home/alson/bin/../apps/jruby-1.4.0/lib/ruby/1.8/uri/common.rb:436:in `split'
 from /home/alson/bin/../apps/jruby-1.4.0/lib/ruby/1.8/uri/common.rb:484:in `parse'
 from (irb):3

irb(main):003:0> u = URI.parse("http://www.yahoo.com/abba?test%20text")  # legal usage
=> #<URI::HTTP:0x164cbde URL:http://www.yahoo.com/abba?test%20text>

irb(main):004:0> u = URI.parse("http://www.yahoo.com/abba?test%u2020text") # quasi-legal usage
URI::InvalidURIError: bad URI(is not URI?): http://www.yahoo.com/abba?test%u2020text
 from /home/alson/bin/../apps/jruby-1.4.0/lib/ruby/1.8/uri/common.rb:436:in `split'
 from /home/alson/bin/../apps/jruby-1.4.0/lib/ruby/1.8/uri/common.rb:484:in `parse'
 from (irb):5
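
One way to fail safe in plain Ruby, as a minimal sketch: rescue the exception and retry once after escaping any percent sign that isn't followed by two hex digits. The `lenient_parse` helper and `STRAY_PERCENT` constant are names made up for illustration, not part of Ruby's URI library, and rewriting a stray `%` as `%25` is an assumption about what the sender meant.

```ruby
require 'uri'

# Matches a '%' that is NOT followed by two hex digits, i.e. a stray
# percent sign rather than a valid percent-escape such as %20.
STRAY_PERCENT = /%(?![0-9A-Fa-f]{2})/

# Hypothetical helper (not part of Ruby's URI library): try a strict
# parse first; if that fails, escape stray percent signs as %25 and
# retry once. If the retry also fails, the exception propagates.
def lenient_parse(url)
  URI.parse(url)
rescue URI::InvalidURIError
  URI.parse(url.gsub(STRAY_PERCENT, '%25'))
end

# The three URLs from the irb session above.
[
  "http://www.yahoo.com/test?string=thing%ding",
  "http://www.yahoo.com/abba?test%20text",
  "http://www.yahoo.com/abba?test%u2020text"
].each do |url|
  puts lenient_parse(url).query
end
```

Legal escapes such as `%20` pass through untouched, since the strict parse succeeds on the first try. The cost of this approach is that the parsed URL is no longer byte-for-byte what the sender supplied, so keep the original string around if you need to echo it back.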

Enter JRuby

I’d been thinking about checking out JRuby, since I would then have access to the universe of Java libraries.  Ruby’s URI.parse method is not very robust, and I should have caught the exception, tried an alternate parser, etc.  But what if I used Java’s URL class (java.net.URL) on these URLs?  It works as expected (the session below is abridged, as the prompt numbers show):

irb(main):001:0> include_class "java.net.URLDecoder"
=> ["java.net.URLDecoder"]

irb(main):014:0> u = URL.new("http://www.yahoo.com/test?string=thing%ding")
=> #<Java::JavaNet::URL:0x13e4a5a>
irb(main):015:0> u.getQuery
=> "string=thing%ding"

irb(main):013:0> u = URL.new("http://www.yahoo.com/test%u2020tring")
=> #<Java::JavaNet::URL:0x3aef16>
irb(main):014:0> u.path
=> "/test%u2020tring"

Lesson

Ruby’s a great little language; Java’s a great big library => JRuby’s a great little language with a great big library.

Also, an interesting article on URL parsing from DaringFireball.

Written by alson

November 27th, 2009 at 4:21 pm

Posted in Turbinado

with 7 comments

7 Responses to 'Ruby vs. JRuby: URI.parse fails with percent signs'

  1. Is Turbinado dead?

    Gour

    28 Nov 09 at 7:17 am

  2. Gour,

    I’m not sure whether or not Turbinado is dead. I haven’t had much time to work on it since I’ve been focused on bringing some production websites online (and Turbinado isn’t production ready yet). I’ve got a post with some reflections on Turbinado and Haskell 50% written, so finishing that should help me think through my future plans for both.

    alson

    28 Nov 09 at 12:47 pm

  3. But did you report this to Ruby core? Of course not… you blogged instead. To what end?

    Ryan Davis

    30 Nov 09 at 8:51 am

  4. Ryan,

    Nice point. I have not reported this to Ruby Core, because I did not know if this was the desired behavior or not. If the Ruby URI parser intends to be strict then it’s acting appropriately (and annoyingly)… That said, per your suggestion, I will report it.

    Blogged about it: I had a hell of a time with this problem and, whether or not it is fixed in Ruby, there’s a decent chance that someone else will have the same problem. I’ve also got a workaround if someone wants to handle problematic URLs in Ruby and I’ll add that to the post.

    alson

    30 Nov 09 at 12:12 pm

  5. addressable gem accepts % in urls – you could use that

    hosiawak

    1 Dec 09 at 10:05 am

  6. hosiawak,

    Thanks for the pointer. That said, I’d prefer to have effective URI parsing be part of the core libraries. Also, I’d been pondering using JRuby, so it was nice to find a simple example of how I could benefit from JRuby.

    alson

    2 Dec 09 at 8:46 pm

  7. Grrr… and this causes traces when using URI::parse with an RVM gemset. RVM uses ‘%’ as a delimiter between VM version and the gemset name.

    Evan Light

    21 Mar 10 at 5:33 pm
