Monadic Parsing with User State

Posted on August 26, 2016
{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses #-}
module StatefulParsing where
import Control.Monad.State.Class
import Control.Applicative

In this post I propose an extension to the monadic parser framework introduced in a previous post, You could have invented Parsec, that extends the parser to also support embedded user state in your parsing.

This could be used, for example, for parsing a language with user-extensible operators: The precedences and fixidities of operators would be kept in a hashmap threaded along the bind chain.

Instead of posing these changes as diffs, we will rewrite the parser framework from scratch with the updated type.


Parser newtype

Our new parser is polymorphic in both the return type and the user state that, so we have to update the newtype declaration to match.

newtype Parser state result
   = Parser { runParser :: String
                        -> state
                        -> Either String (result, state, String) }

Our tuple now contains the result of the parsing operation and the new user state, along with the stream. We still need to supply a stream to parse, and now also supply the initial state. This will be reflected in our functions.

For convenience, we also make a Parser' a type alias for parsers with no user state.

type Parser' a = Parser () a

Seeing as type constructors are also curried, we can apply η-reduction to get the following, which is what we’ll go with.

type Parser' = Parser ()

Functor instance

instance Functor (Parser st) where

The functor instance remains mostly the same, except now we have to thread the user state around, too.

The instance head also changes to fit the kind signature of the Functor typeclass. Since user state can not change from fmapping, this is fine.

  fn `fmap` (Parser p) = Parser go where
     go st us = case p st us of
          Left e -> Left e
          Right (r, us', st') -> Right (fn r, us', st')

As you can see, the new user state (us') is just returned as is.

Applicative instance

instance Applicative (Parser st) where

The new implementations of pure and <*> need to correctly manipulate the user state. In the case of pure, it’s just passed as-is to the Right constructor.

  pure ret = Parser go where
    go st us = Right (ret, us, st)

Since (<*>) needs to evaluate both sides before applying the function, we need to pass the right-hand side’s generated user state to the right-hand side for evaluation.

  (Parser f) <*> (Parser v) = Parser go where
    go st us = case f st us of
      Left e -> Left e
      Right (fn, us', st') -> case v st' us' of
        Left e -> Left e
        Right (vl, us'', st'') -> Right (fn vl, us'', st'')

Monad instance

instance Monad (Parser st) where

Since we already have an implementation of pure from the Applicative instance, we don’t need to worry about an implementation of return.

  return = pure

The monad instance is much like the existing monad instance, except now we have to give the updated parser state to the new computation.

  (Parser p) >>= f = Parser go where
    go s u = case p s u of
        Left e -> Left e
        Right (x, u', s') -> runParser (f x) s' u'

MonadState instance

instance MonadState st (Parser st) where

Since we now have a state transformer in the parser, we can make it an instance of the MTL’s MonadState class.

The implementation of put must return () (the unit value), and needs to replace the existing state with the supplied one. This operation can not fail.

Since this is a parsing framework, we also need to define how the stream is going to be affected: In this case, it isn’t.

  put us' = Parser go where
    go st _ = Right ((), us', st)

The get function returns the current user state, and leaves it untouched. This operation also does not fail.

  get = Parser go where
    go st us = Right (us, us, st)

Since we’re an instance of MonadState, we needn’t an implementation of modify and friends - They’re given by the MTL.

Alternative instance

instance Alternative (Parser st) where

The Alternative instance uses the same state as it was given for trying the next parse.

The empty parser just fails unconditionally.

  empty = Parser go where
    go _ _ = Left "empty parser"

(<|>) will try both parsers in order, reusing both the state and the stream.

  (Parser p) <|> (Parser q) = Parser go where
    go st us = case p st us of
      Left e -> q st us
      Right v -> Right v

Conclusion

This was a relatively short post. This is because many of the convenience functions defined in the previous post also work with this parser framework, if you replace Parser with Parser'. You can now use get, put and modify to work on the parser’s user state. As a closing note, a convenience function for running parsers with no state is given.

parse :: Parser' a -> String -> Either String a
parse str = case runParser str () of
  Left e -> Left e
  Right (x, _, _) -> x