Hackfoofery

Alson Kemp

A HAML parser for Haskell

HAML‘s lovely. As I’ve been working with Turbinado, I’ve been having some issues with HSP. HSP is an insanely impressive piece of software, but its error messages can be a bit unclear. So I started playing around a bit with HAML. Got me wondering “How easy would be to write a HAML parser in Haskell?”

So I tried. Here’s a first-pass, to-be-updated [and somewhat incomplete] HAML parser for Haskell. Not all of the features are implemented, but it’s a start.  It generates HTML bits suitable for compilation by GHC.

Input

f = content
 
page = #content
         .left.column
         %h2 Welcome to our site!
         %p = print_information
         %p
           = print_inline
         .right.column
           = render
         [abba = ding, ding = abba] dinger

Output

f =
    (stringToHtml "content")
page =
       ((tag "div"![strAttr "id" "content"])
         ((tag "div"![strAttr "class" "left column"])
         ((tag "h2")
             (stringToHtml "Welcome to our site!")
         )
         +++
         ((tag "p")
            (stringToHtml print_information)
         )
         +++
         ((tag "p")
           (stringToHtml print_inline)
         )
         +++
         ((tag "div"![strAttr "class" "right column"])
           (stringToHtml render)
         )
         +++
         ((tag "div"![strAttr "abba" "ding", strAttr "ding" "abba"])
                                    (stringToHtml "dinger")
         )
         )
       )

le Code

module Main where
 
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Language
import Text.ParserCombinators.Parsec.Pos
import qualified Text.ParserCombinators.Parsec.Token as T
import Data.Char
import Data.List
import Data.Maybe
import System.IO.Unsafe
 
main = do s <- getContents
          case (parse mainParser "stdin" s) of
            Left  err -> putStrLn "Error: " >> print err
            Right hs  -> putStrLn hs
 
-- Try to parse HAML, otherwise re-output raw lines
 
mainParser = do whiteSpace
                ls <- many1 (hamlCode <|> tilEOL)
                return $ unlines ls
--
-- * HAML lexer
--
hamlLexer = T.makeTokenParser emptyDef
whiteSpace= T.whiteSpace hamlLexer
lexeme    = T.lexeme hamlLexer
symbol    = T.symbol hamlLexer
natural   = T.natural hamlLexer
parens    = T.parens hamlLexer
semi      = T.semi hamlLexer
squares   = T.squares hamlLexer
stringLiteral= T.stringLiteral hamlLexer
identifier= T.identifier hamlLexer
reserved  = T.reserved hamlLexer
reservedOp= T.reservedOp hamlLexer
commaSep1 = T.commaSep1 hamlLexer
--
-- * Main HAML parsers
--
 
-- hamlCode is just many identifiers (e.g. 'func a b c' followed by '=' followed by a hamlBlock
-- func a b c = %somehaml
hamlCode = try ( do is <- many1 identifier
                    symbol "="
                    currentPos <- getPosition
                    x <- manyTill1
                          (lexeme $ hamlBlock)
                          (notSameIndent currentPos)
                    return $ (concat $ intersperse " " is) ++
                             " = \n" ++
                             (concat $ (intersperse (indent currentPos ++ "+++\n")  $ filter (not . null) $ x))
                  )
 
-- A Block may start with some whitespace, then has a valid bit of data
hamlBlock   = do currentPos <- getPosition
                 bs <- manyTill1
                      (pTag <|> pText)
                      (notSameIndent currentPos)
                 return $ intercalate (indent currentPos ++ "+++\n") bs
 
pTag    = do    currentPos <- getPosition
                try
                    (do t  <- lexeme tagParser
                        ts <- (isInline currentPos >> char '/' >> return []) <|>
                              (hamlBlock)
                        return $ intercalate "\n" $ filter (not . null) $
                          [ (indent currentPos) ++ "((" ++ (if (null ts) then "i" else "") ++ t  ++ ")"
                          , if null ts then [] else ts
                          , (indent currentPos) ++ ")\n"]
                    )
 
pText = lexeme stringParser
 
notSameIndent p = (eof >> return []) <|>
                  (do innerPos <- getPosition
                      case (sourceColumn p) == (sourceColumn innerPos) of
                                True  -> pzero
                                False -> return []
                  )
 
--
-- * Various little parsers
--
 
tagParser :: CharParser () String
tagParser = do     t <- optionMaybe tagParser'
                   i <- optionMaybe idParser
                   c <- optionMaybe (many1 classParser)
                   a <- optionMaybe attributesParser
                   if (isJust t || isJust i || isJust c || isJust a)
                     then
                       do return $ "tag \"" ++ (fromMaybe "div" t) ++ "\"" ++
                           (if not (isJust i || isJust c || isJust a) then "" else
                              concat $
                               [ "!["
                               , intercalate ", " $ filter (not . null)
                                   [ (maybe "" (\i' -> "strAttr \"id\" \"" ++ i' ++ "\"") i)
                                   , (maybe "" (\c' -> "strAttr \"class\" \"" ++ (intercalate " " c') ++ "\"") c)
                                   , (maybe "" (\kv -> intercalate ", " $ map (\(k,v) -> "strAttr \"" ++ k ++ "\" \"" ++ v ++ "\"") kv) a)
                                   ]
                               , "]"]
                           )
                     else pzero
 
tagParser' :: CharParser () String
tagParser' =  do char '%'
                 many1 termChar
 
idParser :: CharParser () String
idParser = do char '#'
              many1 termChar
 
classParser :: CharParser () String
classParser = do char '.'
                 many1 termChar
 
attributesParser :: CharParser () [(String, String)]
attributesParser = squares (commaSep1 attributeParser)
 
attributeParser :: CharParser () (String, String)
attributeParser = do k <- identifier
                     symbol "="
                     cs <- many1 identifier
                     return (k, intercalate " " cs)
 
stringParser :: CharParser () String
stringParser = do   currentPos <- getPosition
                    modifier <- optionMaybe (char '=' <|> char '-')
                    whiteSpace
                    c <- alphaNum
                    cs<- tilEOL
                    case modifier of
                      Just '-' -> return $ (indent currentPos) ++ "-" ++ c:cs
                      Just '=' -> return $ (indent currentPos) ++ "(stringToHtml " ++ c:cs ++ ")"
                      Nothing  -> return $ (indent currentPos) ++ "(stringToHtml \"" ++ c:cs ++ "\")"
 
--
-- * Utility functions
--
 
isInline     p = do p2 <- getPosition
                    case (sourceLine p  ) == (sourceLine p2) of
                      True -> return []
                      False -> pzero
isSameIndent p1 p2 = (sourceColumn p1) == (sourceColumn p2)
 
tilEOL = manyTill1 (noneOf "\n") eol
eol = newline <|> (eof >> return '\n')
 
termChar = satisfy (\c -> (isAlphaNum c) || (c `elem` termPunctuation) )
termPunctuation = "-_"
indent p = take (sourceColumn (p) - 1) (repeat ' ')
 
manyTill1 p e =  do ms <- manyTill p e
                    case (null ms) of
                      True  -> pzero
                      False -> return ms

Golly, but I wish that I’d cleaned up the code, but there it is in all of its raw, un-thought-through glory…

Written by alson

December 11th, 2008 at 1:56 am

Posted in Geekery,Programming

Tagged with , ,

with 2 comments

2 Responses to 'A HAML parser for Haskell'

Subscribe to comments with RSS or TrackBack to 'A HAML parser for Haskell'.

  1. Nice!

    The possibility of having portable templating syntax (Markdown, HAML, etc.) is intriguing. Yet another good thing that can come to Haskell thanks to having support for parsing built into the standard library.

    Paul Brown

    11 Dec 08 at 8:36 pm

  2. Thanks for the kudos! It was fun to write. Not a full implementation, but enough to serve as a starting point for further work.

    Parsec is a pretty sweet library to have built into the standard set of libraries, though I do wish that Parsec had more usage examples. Sometimes the error messages can be a bit terse or misleading, but it’s a pretty nice library.

    alson

    12 Dec 08 at 12:02 am

Leave a Reply