Hackfoofery

Alson Kemp

Ruby vs. JRuby: URI.parse fails with percent signs

Written by alson

November 27th, 2009 at 4:21 pm

Posted in Turbinado

Ran into a nasty issue with URL parsing a while ago in a Rails app.  I was pulling URLs in from Amazon’s shopping APIs and started receiving occasional InvalidURIError exceptions.  Tracked it down to percent signs in the URLs passed from Amazon.  Percent signs are kinda reserved for escaping/encoding some characters in URLs, but, while I’m not sure that the Amazon URLs are truly legal, the parsing function shouldn’t fail on a non-standard usages.  After all, the web is a messy place and failing-safe on parsing messy data seems a better behavior…    Examples:

irb(main):001:0> require 'uri'
=> true
irb(main):002:0> u = URI.parse("http://www.yahoo.com/test?string=thing%ding") # questionable
URI::InvalidURIError: bad URI(is not URI?): http://www.yahoo.com/test?string=thing%ding
 from /home/alson/bin/../apps/jruby-1.4.0/lib/ruby/1.8/uri/common.rb:436:in `split'
 from /home/alson/bin/../apps/jruby-1.4.0/lib/ruby/1.8/uri/common.rb:484:in `parse'
 from (irb):3

irb(main):003:0> u = URI.parse("http://www.yahoo.com/abba?test%20text")  # legal usage
=> #<URI::HTTP:0x164cbde URL:http://www.yahoo.com/abba?test%20text>

irb(main):004:0> u = URI.parse("http://www.yahoo.com/abba?test%u2020text") # quasi-legal usage
URI::InvalidURIError: bad URI(is not URI?): http://www.yahoo.com/abba?test%u2020text
 from /home/alson/bin/../apps/jruby-1.4.0/lib/ruby/1.8/uri/common.rb:436:in `split'
 from /home/alson/bin/../apps/jruby-1.4.0/lib/ruby/1.8/uri/common.rb:484:in `parse'
 from (irb):5

Enter JRuby

Been thinking about checking out JRuby since I would then have access to the universe of Java libraries.  So Ruby’s URI.parse method is not very robust and I should have caught the exception, tried an alternate parser, etc, but what if I used the Java URL decoder on these URLs?  Works as expected:

irb(main):001:0> include_class "java.net.URLDecoder"
=> ["java.net.URLDecoder"]

irb(main):014:0> u = URL.new("http://www.yahoo.com/test?string=thing%ding")                       
=> #<Java::JavaNet::URL:0x13e4a5a>                                                                
irb(main):015:0> u.getQuery                                                                       
=> "string=thing%ding"                            

irb(main):013:0> u = URL.new("http://www.yahoo.com/test%u2020tring")
=> #<Java::JavaNet::URL:0x3aef16>
irb(main):014:0> u.path
=> "/test%u2020tring"

Lesson

Ruby’s a great little language; Java’s a great big library => JRuby’s a great little language with a great big library.

Also, an interesting article on URL parsing from DaringFireball.

with 7 comments

Announce: Turbinado V0.7

Written by alson

June 14th, 2009 at 12:15 am

Posted in Turbinado

Turbinado has been making solid progress.  See the GitHub repo.

Added a couple of big new features:

HAML and XHTML Templates

While Turbinado used XHTML for templates before, the HSX library (an amazing piece of work) was a real pain to use and threw very strange errors. 

HSX (and its associated libraries) has been torn out of Turbinado and has been replaced by a custom preprocessor called ‘trturbinado’. The preprocessor is run by the webserver over any templates (or views) before they’re compiled and loaded. The preprocessor handles XHTML and a limited version of HAML (to be enhanced).

See the example below. After preprocessing, the ‘someHtml’ and ‘someHAML’ functions will yield the exact sasme code as the ‘someTextXHTML’ function.

Note: I’m not great at writing parsers, so I wouldn’t doubt that there are bugs in the preprocessor!

markup :: VHtml
markup=
  <div>
    <% someTextXHtml %>
    <% someHtml %>
    <% someHAML %>
  </div>
 
-- | These are the raw Turbinado.View.Html style tags
someTextXHtml :: VHtml
someTextXHtml = do s <- getViewDataValue_u "sample_value" :: View String
                   ((tag "div") (
                    ((tag "i") (stringToVHtml s)) +++
                    ((tag "b") (stringToVHtml "some text in Turbinado.View.Html style"))
                    ))
                    
someHtml :: VHtml
someHtml = do s <- getViewDataValue_u "sample_value" :: View String
              <div>
                <i><%= s %></i>
                <b>some text in XHtml style</b>
              </div>
 
someHAML :: VHtml
someHAML = do s <- getViewDataValue_u "sample_value" :: View String
              %div
                %i= s
                %b some text in HAML style
 

FastCGI Support

FastCGI has been an often requested feature, so I rewrote the CGI handling and added FastCGI support, too. Much as with CGI, using FastCGI is very simple. Just tell Apache to use ‘dispatch.fcgi’ to handle requests.

<VirtualHost *>
  DocumentRoot /home/alson/projects/turbinado/static
  <Directory "/home/alson/projects/turbinado/static">
    Options +FollowSymLinks +ExecCGI +Includes
    AddHandler cgi-script cgi
    AddHandler fastcgi-script fcgi
    AllowOverride None
    Order allow,deny
    Allow from all
  </Directory>
  RewriteEngine On
  RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f
  RewriteRule (.*) $1 [L]
  # Use FCGI
  RewriteRule ^(.*)$ %{DOCUMENT_ROOT}/dispatch.fcgi [QSA,L]
  # Use CGI
  # RewriteRule ^(.*)$ %{DOCUMENT_ROOT}/dispatch.fcgi [QSA,L]
</VirtualHost>

Performance of Turbinado’s FastCGI interface is pretty good, but is well short of Turbinado’s HTTP interface, so, if you can live without the benefits of FastCGI, HTTP, which is simpler and faster, is the way to go.

without comments

Turbinado V0.6.5

Written by alson

June 3rd, 2009 at 2:24 am

Posted in Haskell,Turbinado

Been a long silent few months (blog cliche, anyone?), but Turbinado is cranking along again. Bodo (http://www.naumann.cc/) is using Turbinado for a school project so we’ve been working together to advance Turbinado. So what’s new? Mostly touch ups:

HTTP 4k Support

The HTTP 3000 package seems to be slowing down while HTTP 4000 seems to be taking over.  Makes sense to update Turbinado to the latest version.  Done.

Better Forms Support

Added in support for multipart forms, so now you can upload files to Turbinado.  Note: there is currently no limit on the size of the upload, though.  (See Turbinado.Environment.Files for how to access uploaded files.)

Improved ORM

The PostgreSQL ORM is pretty sweet and now it’s even better.  Improved error handling, quoting of tables names, etc.  Next step is to build the MySQL ORM.

Summary

We’re moving again.  Note: turbinado-website is currently updated, but turbinado isn’t (it’ll be fixed by June 4th).

Would love your feedback on what could be done to improve Turbinado.  Although I made a point of running turbinado.org on Turbinado, it may be time to switch to a CMS that allows people to contribute to the project more easily.

with 3 comments

SERPEngine : Charting SERP rankings

Written by alson

March 31st, 2009 at 10:13 pm

Posted in SEO

Taking a bit of a break from Haskell stuff for the moment. 1000 pardons.

——–

Had a bit of spare time on my hands and had an observation that at Angie’s List we have a peon who’s job it is to do daily Google searches for key search terms and then record the rank of each term into a spreadsheet. It’s just as efficient and as career-enhancing as you imagine it might be.

Gotta be a better way to do it… we have machines for this, right?

So I pulled Rails and PostgreSQL out of the tool belt.  Started hacking.  Added a bit of OpenFlashChart.  Hacked a bit more.  et voila!

SERP Engine is pretty simple.  If you have a website, you’ve probably thought a bit about fighting for some natural search terms.  Throw those search terms into SERP Engine and chart your website’s SERP rank for those terms.

Sound useless?  It’s actually pretty cool.  Check out the results for:

Would love feedback.  alson at alsonkemp dot com.

without comments

ANNOUNCE: Turbinado V0.6

Written by alson

March 12th, 2009 at 10:20 pm

Posted in Haskell,Turbinado

It’s been long enough since Turbinado V0.4 that I figured I’d skip V0.5 and go straight to announcing Turbinado V0.6.  Lots of new excellent features:

  • By popular demand, support for CGI serving. Apparently some web hosts don’t support HTTP proxying, so some folks requested CGI support.
  • Statically compiled Layouts, Views, Controllers.
  • Support for “.format” in routes. If a request path is “/User/List.xml”, then the following View will be called: /App/Views/User/ListXml.hs.
  • Lower case paths.
  • Support for cookies (see here for examples).
  • Encrypted cookie sessions (see here to see how to use them).
  • Much easier installs using cabal-install.
  • Support for GHC 6.10.  GHC 6.8 is no longer supported.

Turbinado V0.7 will be all about:

  • Documentation. (seriously.)
  • User authentication.
  • Tutorials.

Installation

Installation is pretty painless if you use cabal-install, so make sure that you have cabal-install installed first.  See here: http://hackage.haskell.org/trac/hackage/wiki/CabalInstall.

To install Turbinado:

git clone git@github.com:alsonkemp/turbinado-website.git
cd turbinado-website
cabal install
[This might fail, saying that "trhsx" can't be found.  However, "trhsx" was built during the install and is probably at ~/.cabal/bin/trhsx, so copy "trhsx" to your path and re-run "cabal install".]

CGI Configuration

[Note: you don’t want to use CGI without statically compiling in some Controllers, Layouts and Views.  See below.]

Usually, Turbinado is called with “-p” to specify the port the process should listen on (e.g. “turbinado -p 8080”).  When called with the “-c” flag, Turbinado will handle CGI requests.  However, because of the process setup and tear down times, responding to CGI requests takes about 250ms, which is considerably slower than responding to HTTP requests (about 1ms).

Again following Rails, Turbinado includes a CGI script called “dispatch.cgi” in the /static directory.

Apache Configuration

In order to use Turbinado’s CGI functionality with Apache, you’ll need to something like the following in order to tell Apache to allow CGI scripts in your Turbinado /static directory and to send all requests (e.g. “^(.\*)$”) to the “dispatch.cgi” script.

  DocumentRoot /home/alson/turbinado-website/static
  &lt;Directory "/home/alson/turbinado-website/static"&gt;
     Options +FollowSymLinks +ExecCGI +Includes
     AddHandler cgi-script .cgi
     AllowOverride None
     Order allow,deny
     Allow from all
  &lt;/Directory&gt;
  RewriteRule ^(.*)$ %{DOCUMENT_ROOT}/dispatch.cgi [QSA,L]

Static Compilation Of Resources

Turbinado is designed to dynamically compile and link in the various resources (Controllers, Layouts, Views) needed to serve a request. However, it can take up to 15 seconds to complete that process the first time a particular page is requested (subsequent requests are very fast). With CGI, the server only ever sees the first request, so Turbinado would never be able to serve a CGI request faster than 10-20 seconds.

To fix this, you can now compile into the server particular resources. See Config/Routes.hs here. Turbinado stores a function along with the file path and function name in a tuple, so you just give Turbinado that information in the Routes.hs file and it’ll load those functions into the CodeStore at startup:

staticLayouts =
    [ ("App/Layouts/Default.hs", "markup", App.Layouts.Default.markup)
    ]

Support for “formats”

Rails has great support for file formats. Turbinado is trying to follow that lead. The system will try to figure out the MIME type based on the extension. According to the standard routes (Config/Routes.hs), the following path=>View mappings will occur:

/abba/ding => /App/Views/Abba/Ding.hs
/abba/ding.xml => /App/Views/Abba/DingXml.hs
/bloof/snort/1.csv => /App/Views/Bloof/SnortCsv.hs

The same Controller handles all formats; only the View will change. Also, usage of a format causes a blank Layout to be used (on the assumption that you don’t want a Layout used with a CSV, XML, etc output).

Lower case paths

Turbinado now defaults to using lower case paths (configured in Config/App.hs), so the following paths get mapped to Controllers and Views as follows:

/abba/ding_fling => /App/Controllers/Abba.hs : dingFling
                 => /App/Views/Abba/DingFling.hs

Cookies

Cookies are now supported. Examples here.

setCookie $ mkCookie "counter" (show 0)
v <- getCookieValue "counter"
deleteCookie "counter"

Cookie Sessions

Session data is encrypted and stuffed into a cookie. The encryption key is set in Config/App.hs. Session usage examples are here.

setSessionValue "counter" "0"
v <- setSessionValue "counter"
deleteSessionKey "counter"
abandonSession

with 14 comments

PS3 + Linux Media Serving (MythTV? GMediaServer? MediaTomb?)

Written by alson

March 9th, 2009 at 3:19 pm

Posted in Geekery

Tagged with

I recently got a PS3 (which is a lovely piece of hardware and software) and, given our collections of MP3s, WMAs and AVIs (a bunch of ripped children’s DVDs so that my son is free to physically shred up the actual DVDs), have been trying to figure out how to serve media to it over the network.

Serving from Windows is ludicrously simple using Windows Media Player 11 or TVersity, but I have a small, super-old-school Linux file server used for backups-and-such and I wanted to use it to serve media.  Turns out to be very simple to do so.

MythTV is the big guy in this space, but it had a number of issues for me:

  • Not particularly straightforward to set up (considering the very simple use case I had for it).
  • Tons of functionality that I didn’t need, including a heavy front-end app (though there is a lighter weight web app I could have installed).
  • I couldn’t figure out how to get it to serve WMAs and, since I’ve ripped a bunch of CDs to WMA, this was a killer.

None of this is to say that MythTV is not a great piece of software; it was just way more than I needed.

GMediaServer is a GNU uPNP media server and it looked pretty good, but I wasn’t sure that it would serve video.  Documentation is a also bit lacking.

Enter MediaTomb.  Simple, lightweight, basic media serving.  “apt-get install mediatomb” and I was pretty much there.  A slight, very well documented modification to the configuration file and I popped open a web browser, browsed to the built-in web interface and told MediaTomb to server my /share/media directory.  Walked over to my PS3 (on which the MediaTomb server was already listed) and started browsing my media files.

The only issue I had was when I updated the media files on the server.  Sometimes MediaTomb wouldn’t see the modifications and would send to the PS3 out-of-date data, probably because I was using inotify rather than just time-based refresh.  I switched to the time-based refresh and deleted the MediaTomb SQLite database and all was right with the world.

Useful links:

  • http://www.mediatomb.cc
  • http://ubuntuforums.org/showthread.php?t=650020
  • http://www.freesoftwaremagazine.com/columns/upnp_mediatomb_ps3_and_me

without comments

Kudos to Bodo

Written by alson

February 10th, 2009 at 2:28 pm

Posted in Haskell,Turbinado

Bodo Nauman (http://www.naumann.cc/) got Turbinado to build with GHC 6.10 (after forcing me to fix a bunch of recently added bugs…).  Details are here.  There’s currently an issue with the crypto Hackage package which doesn’t build with GHC 6.10**.  Fortunately, the darcs version of crypto builds fine.

Bodo beat me by a couple of days to an updated “How to install Turbinado” post.  Since v0.5 is freshly minted (ahem… and debugged) (and includes cookies, cookie sessions (lifted from Michael Snoyman’s hweb), a Rails-like respondTo function, …), I’ll post an update in a day or two.

Adolfo Builes also has started to replicate a Rails tutorial in Turbinado, so I hope I can point you to his results soon.

** The crypto build fails when building some test applications.  What’s weird is that “cabal install” installs the *test* applications into the bin directory.  Why would I want test applications in /usr/local/bin?  I don’t see a Cabal flag for “just build the libraries”…

with 3 comments

Turbinado is “not nearly as innovative”

Written by alson

February 3rd, 2009 at 2:28 pm

Posted in Haskell,Turbinado

From Happstack:

“[If HApps doesn’t do anything…] the project will eventually be superceded by other Haskell frameworks like Turbinado which are not nearly as innovative”

Woohoo!  People are talking about Turbinado!

err… Wait! Turbinado is being dissed!

Turbinado vs. HApps

Each project is valid and valuable, so I’m hesitant to get to far into a X vs. Y discussion, but here are some aspects that I thought about when evaluating HApps and developing Turbinado:

  • Turbinado is built on top of Nicklas Broberg‘s HSP/HSPR, which I consider to be an innovative-port of ASP to Haskell…
  • HApps is a much bigger project with many more features; Turbinado is small and needs your help [shameless plug: help here].
  • HApps seems more like a library of really useful functions which is very flexible; Turbinado is more like a web app framework and provides a web server along with defined mechanisms for adding Controller, Views and ComponentsConvention over Configuration = different styles, not better styles.
  • HApps seems less than simple (see here); Turbinado is all about simple (see here)
  • HApps has an aversion to relational databases; Turbinado observes that lots of people use relational databases and supports them.

Turbinado vs. HApps, again

To me, it seems as though the choice between HApps and Turbinado depends more on orientation than on functionality.  If you are interested in: opposing RDMBSs, but are into something like Sinatra, then HApps is probably the best choice; if you’re interested in writing web apps in the style of Ruby On Rails, then Turbinado (though young and pretty) may be the best choice, especially given the simplicity of building Turbinado (newly cabal installable!  to be described in a forthcoming post with tutorials, install details, singing, dancing, high kicks, etc).

Here’s To Success

Most successful languages have more than one web framework, so I hope for success for both HApps and for Turbinado. They’re very different frameworks and serve very different needs. The Haskell community would be well served if both frameworks survived and thrived.  As Merb and Rails have demonstrated, cross-pollination is excellent; the stronger HApps and Turbinado are, the more they can benefit each other.

with 26 comments

Cabal-install is awesome

Written by alson

January 31st, 2009 at 9:15 pm

Posted in Haskell

See also: 2009-The Year Of Hackage and A Plea For Cabal-install

I’m finally getting Turbinado updated for GHC 6.10 and I owe a huge debt to the cabal-install team (my preferred currency for debt payment is beer).  Turbinado is a bitch to install by hand because it depends on specific version of various packages.  One minor rev off and everything explodes.

So I’ve been bouncing between 3 machines working on Turbinado and trying to get the build cleaned up.  Once cabal-install is installed, it’s a simple “cabal install” to get all dependencies downloaded and built.  Much better than setting up a local packages directory, downloading 10 packages, trying to get the install order figured out and trying to remember which of the 10 packages is installed or not.

It’s fantastic that GHC is finally getting a package and build manager.

If you haven’t played with cabal-install, you should:

darcs get --partial http://darcs.haskell.org/cabal-install/
cd cabal-install
sh bootstrap.sh
ln -s ~/.cabal/bin/cabal ~/bin/cabal  ### or somewhere else useful
cabal update

My only want for cabal-install is for it to install executables somewhere more useful than ~/.cabal.  But that’s a minor gripe…

with 13 comments

ANNOUNCE: Turbinado V0.4

Written by alson

January 18th, 2009 at 2:07 pm

Posted in Haskell,Turbinado

Turbinado continues to evolve:

Turbinado (http://www.turbinado.org) is an easy 
to use Model-View-Controller-ish web framework for Haskell.

The source for the framework can be found at:
http://github.com/alsonkemp/turbinado

The source for the website turbinado.org can be found at:
http://github.com/alsonkemp/turbinado-website
(see the /App directory for the code for www.turbinado.org)

Release 0.4 contains:
* A dramatically improved ORM (or Type-Relation Mapper) 
   which handles foreign keys.  Still PostgreSQL only at this point.
* All dependencies in tmp/dependencies to ease building the application.
* In code documentation (not complete, but starting).
* Documentation (see http://turbinado.org/Architecture).

Release 0.5 will focus on:
* Ease of installation!
* Moving to GHC 6.10 (whenever Debian shifts).  Diego Eche 
   provided a port from plugins to ghc-api.
* Additional functionality (e.g. sessions, authentication, etc).
* Tutorials.

without comments