Name: matryoshka
Owner: SlamData, Inc.
Description: Generalized recursion schemes and traversals for Scala.
Created: 2016-01-19 12:52:07.0
Updated: 2018-05-20 05:50:00.0
Pushed: 2018-05-08 06:22:22.0
Homepage: null
Size: 6719
Language: Scala
GitHub Committers
User | Most Recent Commit | # Commits |
---|
Other Committers
User | Most Recent Commit | # Commits |
---|
Generalized folds, unfolds, and traversals for fixed point data structures in Scala.
Add a dependency
aryDependencies += "com.slamdata" %% "matryoshka-core" % "0.18.3"
Optionally, you can also depend on matryoshka-scalacheck
to get Arbitrary
/Cogen
/Shrink
instances for a bunch of pattern functors and fixed points.
Apply some fix for SI-2712. Prior to 2.12, use @milessabin?s compiler plugin. As of 2.12, you can simply add scalacOptions += "-Ypartial-unification"
to your build.sbt.
Add imports as needed. Usually the following should suffice
rt matryoshka._
rt matryoshka.implicits._
but if you need some of our pattern functors, then matryoshka.patterns._
should be added. Also, there will be cases where you need to specify explicit types (although we generally recommend abstracting over {Bir|Cor|R}ecursive
type classes), so you may need matryoshka.data._
(for Fix
, Mu
, and Nu
) and/or matryoshka.instances.fixedpoint._
for things like Nat
, List
, Cofree
, etc. defined in terms of Mu
/Nu
.
This library is predicated on the idea of rewriting your recursive data structures, replacing the recursive type reference with a fresh type parameter.
ed abstract class Expr
l case class Num(value: Long) extends Expr
l case class Mul(l: Expr, r: Expr) extends Expr
could be rewritten as
ed abstract class Expr[A]
l case class Num[A](value: Long) extends Expr[A]
l case class Mul[A](l: A, r: A) extends Expr[A]
This abstract class generally allows a Traverse
instance (or at least a Functor
instance). Then you use one of the fixed point type constructors below to regain your recursive type.
You may also want instances for Delay[Equal, ?]
, Delay[Order, ?]
, and Delay[Show, ?]
(which are very similar to their non-Delay
equivalents) to get instances for fixed points of your functor.
These types take a one-arg type constructor and provide a recursive form of it.
All of these types have instances for Recursive
, Corecursive
, FunctorT
, TraverseT
, Equal
, Show
, and Arbitrary
type classes unless otherwise noted.
Fix
? This is the simplest fixpoint type, implemented with general recursion.Mu
? This is for inductive (finite) recursive structures, models the concept of ?data?, aka, the ?least fixed point?.Nu
? This is for coinductive (potentially infinite) recursive structures, models the concept of ?codata?, aka, the ?greatest fixed point?.Cofree[?[_], A]
? Only has a Corecursive
instance if there?s a Monoid
for A
. This represents a structure with some metadata attached to each node. In addition to the usual operations, it can also be folded using an Elgot algebra.Free[?[_], A]
? Does not have a Recursive
instance. In addition to the usual operations, it can also be created by unfolding with an Elgot coalgebra.So a type like Mu[Expr]
is now isomorphic to the original recursive type. However, the point is to avoid operating on recursive types directly ?
A structure like this makes it possible to separate recursion from your operations. You can now write transformations that operate on only a single node of your structure at a time.
This diagram covers the major classes of transformations. The most basic ones are in the center and the arrows show how they can be generalized in various ways.
Here is a very simple example of an algebra (eval
) and how to apply it to a recursive structure.
e will need a Functor[Expr] in order to call embed bellow
icit val exprFunctor = new scalaz.Functor[Expr] {
erride def map[A, B](fa: Expr[A])(f: (A) => B) = fa match{
case Num(value) => Num[B](value)
case Mul(l, r) => Mul(f(l), f(r))
eval: Algebra[Expr, Long] = { // i.e. Expr[Long] => Long
se Num(x) => x
se Mul(x, y) => x * y
someExpr[T](implicit T: Corecursive.Aux[T, Expr]): T =
l(Num[T](2).embed, Mul(Num[T](3).embed,
Num[T](4).embed).embed).embed
rt matryoshka.data.Mu
Expr[Mu[Expr]].cata(eval) // ? 24
The .embed
calls in someExpr
wrap the nodes in the fixed point type. embed
is generic, and we abstract someExpr
over the fixed point type (only requiring that it has an instance of Corecursive
), so we can postpone the choice of the fixed point as long as possible.
Here is a cheat-sheet (also available in PDF) for some of them.
Those algebras can be applied recursively to your structures using many different folds. cata
in the example above is the simplest fold. It traverses the structure bottom-up, applying the algebra to each node. That is the general behavior of a fold, but more complex ones allow for various comonads and monads to affect the result.
These are the dual of folds ? using coalgebras to deconstruct values into parts, top-down. They are defined in the Corecursive
type class.
Refolds compose an unfold with a fold, never actually constructing the intermediate fixed-point structure. Therefore, they are available on any value, and are not part of a type class.
The structure of these type classes is similar to Recursive
and Corecursive
, but rather than separating them between bottom-up and top-down traversals, FunctorT
has both bottom-up and top-down traversals (and refold), while TraverseT
has all the Kleisli variants (paralleling how Traverse
extends Functor
). A fixed-point type that has both Recursive
and Corecursive
instances has an implied TraverseT
instance.
The benefits of these classes is that it is possible to define the required map
and traverse
operations on fixed-point types that lack either a project
or an embed
(e.g., Cofree[?[_], A]
lacks embed
unless A
has a Monoid
instance, but can easily be map
ped over).
The tradeoff is that these operations can only transform between one fixed-point functor and another (or, in some cases, need to maintain the same functor).
The names of these operations are the same as those in Recursive
and Corecursive
, but prefixed with trans
.
There is an additional (restricted) set of operations that also have a T
suffix (e.g., transCataT
). These only generalize in ?the Elgot position? and require you to maintain the same functor. However, it can be the most natural way to write certain transformations, like matryoshka.algebras.substitute
.
There are generalized forms of most recursion schemes. From the basic cata
(and its dual, ana
), we can generalize in a few ways. We name them using either a prefix or suffix, depending on how they?re generalized.
Most well known (in fact, even referred to as ?generalized recursion schemes?) is generalizing over a Comonad
(or Monad
), converting an algebra like F[A] => A
to F[W[A]] => A
. Many of the other named folds are instances of this ?
W[A] = (T[F], A)
, it?s para
,W[A] = (B, A)
, it?s zygo
, andW[A] = Cofree[F, A]
, it?s histo
.These specializations can give rise to other generalizations. zygoT
uses EnvT[B, ?[_], A]
and ghisto
uses Cofree[?[_], A]
.
Less unique to recursion schemes, there are Kleisli variants that return the result in any monad.
This generalization, stolen from the ?Elgot algebra?, is similar to standard generalization, except it uses W[F[A]] => A
rather than F[W[A]] => A
, with the Comonad
outside the functor. Not all of the forms seem to be as useful as the G
variants, but in some cases, like elgotZygo
, it offers benefits of its own.
Any of these generalizations can be combined, so you can have an algebra that is generalized along two or three dimensions. A fold like cofPara
takes an algebra that?s generalized like zygo
((B, ?)
) in the ?Elgot? dimension and like para
((T[F], ?)
) in the ?G? dimension, which looks like (B, F[(T[F], A)]) => A
. It?s honestly useful. I swear.
Since we can actually derive almost everything from a fairly small number of operations, why don?t we? Well, there are a few reasons, enumerated here in descending order of how valid I think they are:
para
, using gcata(distPara, ?)
would introduce a Corecursive
constraint, and all of the Kleisli variants require Traverse
for the functor, not just Functor
.cata
implemented directly (presumably) performs better than gcata[Id, ?]
. We should have some benchmarks added eventually to actually determine when this is worth doing.gcata
generally requires all the type parameters to be specified, while, say, zygo
doesn?t. You can notice these instances because their definition actually is just to call the generalized version, rather than being implemented directly.Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.