Archive for the ‘Haskell’ tag
A HAML parser for Haskell
HAML‘s lovely. As I’ve been working with Turbinado, I’ve been having some issues with HSP. HSP is an insanely impressive piece of software, but its error messages can be a bit unclear. So I started playing around a bit with HAML. Got me wondering “How easy would be to write a HAML parser in Haskell?”
So I tried. Here’s a first-pass, to-be-updated [and somewhat incomplete] HAML parser for Haskell. Not all of the features are implemented, but it’s a start. It generates HTML bits suitable for compilation by GHC.
Input
f = content page = #content .left.column %h2 Welcome to our site! %p = print_information %p = print_inline .right.column = render [abba = ding, ding = abba] dinger |
Output
f = (stringToHtml "content") page = ((tag "div"![strAttr "id" "content"]) ((tag "div"![strAttr "class" "left column"]) ((tag "h2") (stringToHtml "Welcome to our site!") ) +++ ((tag "p") (stringToHtml print_information) ) +++ ((tag "p") (stringToHtml print_inline) ) +++ ((tag "div"![strAttr "class" "right column"]) (stringToHtml render) ) +++ ((tag "div"![strAttr "abba" "ding", strAttr "ding" "abba"]) (stringToHtml "dinger") ) ) ) |
le Code
module Main where import Text.ParserCombinators.Parsec import Text.ParserCombinators.Parsec.Language import Text.ParserCombinators.Parsec.Pos import qualified Text.ParserCombinators.Parsec.Token as T import Data.Char import Data.List import Data.Maybe import System.IO.Unsafe main = do s <- getContents case (parse mainParser "stdin" s) of Left err -> putStrLn "Error: " >> print err Right hs -> putStrLn hs -- Try to parse HAML, otherwise re-output raw lines mainParser = do whiteSpace ls <- many1 (hamlCode <|> tilEOL) return $ unlines ls -- -- * HAML lexer -- hamlLexer = T.makeTokenParser emptyDef whiteSpace= T.whiteSpace hamlLexer lexeme = T.lexeme hamlLexer symbol = T.symbol hamlLexer natural = T.natural hamlLexer parens = T.parens hamlLexer semi = T.semi hamlLexer squares = T.squares hamlLexer stringLiteral= T.stringLiteral hamlLexer identifier= T.identifier hamlLexer reserved = T.reserved hamlLexer reservedOp= T.reservedOp hamlLexer commaSep1 = T.commaSep1 hamlLexer -- -- * Main HAML parsers -- -- hamlCode is just many identifiers (e.g. 'func a b c' followed by '=' followed by a hamlBlock -- func a b c = %somehaml hamlCode = try ( do is <- many1 identifier symbol "=" currentPos <- getPosition x <- manyTill1 (lexeme $ hamlBlock) (notSameIndent currentPos) return $ (concat $ intersperse " " is) ++ " = \n" ++ (concat $ (intersperse (indent currentPos ++ "+++\n") $ filter (not . null) $ x)) ) -- A Block may start with some whitespace, then has a valid bit of data hamlBlock = do currentPos <- getPosition bs <- manyTill1 (pTag <|> pText) (notSameIndent currentPos) return $ intercalate (indent currentPos ++ "+++\n") bs pTag = do currentPos <- getPosition try (do t <- lexeme tagParser ts <- (isInline currentPos >> char '/' >> return []) <|> (hamlBlock) return $ intercalate "\n" $ filter (not . null) $ [ (indent currentPos) ++ "((" ++ (if (null ts) then "i" else "") ++ t ++ ")" , if null ts then [] else ts , (indent currentPos) ++ ")\n"] ) pText = lexeme stringParser notSameIndent p = (eof >> return []) <|> (do innerPos <- getPosition case (sourceColumn p) == (sourceColumn innerPos) of True -> pzero False -> return [] ) -- -- * Various little parsers -- tagParser :: CharParser () String tagParser = do t <- optionMaybe tagParser' i <- optionMaybe idParser c <- optionMaybe (many1 classParser) a <- optionMaybe attributesParser if (isJust t || isJust i || isJust c || isJust a) then do return $ "tag \"" ++ (fromMaybe "div" t) ++ "\"" ++ (if not (isJust i || isJust c || isJust a) then "" else concat $ [ "![" , intercalate ", " $ filter (not . null) [ (maybe "" (\i' -> "strAttr \"id\" \"" ++ i' ++ "\"") i) , (maybe "" (\c' -> "strAttr \"class\" \"" ++ (intercalate " " c') ++ "\"") c) , (maybe "" (\kv -> intercalate ", " $ map (\(k,v) -> "strAttr \"" ++ k ++ "\" \"" ++ v ++ "\"") kv) a) ] , "]"] ) else pzero tagParser' :: CharParser () String tagParser' = do char '%' many1 termChar idParser :: CharParser () String idParser = do char '#' many1 termChar classParser :: CharParser () String classParser = do char '.' many1 termChar attributesParser :: CharParser () [(String, String)] attributesParser = squares (commaSep1 attributeParser) attributeParser :: CharParser () (String, String) attributeParser = do k <- identifier symbol "=" cs <- many1 identifier return (k, intercalate " " cs) stringParser :: CharParser () String stringParser = do currentPos <- getPosition modifier <- optionMaybe (char '=' <|> char '-') whiteSpace c <- alphaNum cs<- tilEOL case modifier of Just '-' -> return $ (indent currentPos) ++ "-" ++ c:cs Just '=' -> return $ (indent currentPos) ++ "(stringToHtml " ++ c:cs ++ ")" Nothing -> return $ (indent currentPos) ++ "(stringToHtml \"" ++ c:cs ++ "\")" -- -- * Utility functions -- isInline p = do p2 <- getPosition case (sourceLine p ) == (sourceLine p2) of True -> return [] False -> pzero isSameIndent p1 p2 = (sourceColumn p1) == (sourceColumn p2) tilEOL = manyTill1 (noneOf "\n") eol eol = newline <|> (eof >> return '\n') termChar = satisfy (\c -> (isAlphaNum c) || (c `elem` termPunctuation) ) termPunctuation = "-_" indent p = take (sourceColumn (p) - 1) (repeat ' ') manyTill1 p e = do ms <- manyTill p e case (null ms) of True -> pzero False -> return ms |
Golly, but I wish that I’d cleaned up the code, but there it is in all of its raw, un-thought-through glory…