spotify/featran

Name: featran

Owner: Spotify

Description: A Scala feature transformation library for data science and machine learning

Created: 2017-05-08 17:20:27.0

Updated: 2018-05-24 15:31:22.0

Pushed: 2018-05-24 15:31:20.0

Homepage: https://spotify.github.io/featran

Size: 936

Language: Scala

README

featran


Featran, also known as Featran77 or F77 (get it?), is a Scala library for feature transformation. It aims to simplify the time-consuming task of feature engineering in data science and machine learning processes. It supports various collection types for feature extraction and various output formats for feature representation.

Introduction

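Most feature transformation logic requires two steps: one global aggregation to summarize the data, followed by one element-wise mapping to transform each record. For example, min-max scaling first needs the global minimum and maximum and then rescales each value, while one-hot encoding first needs the set of distinct labels and then converts each label to a binary vector.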

We can implement this in a naive way using reduce and map.

case class Point(score: Double, label: String)
val data = Seq(Point(1.0, "a"), Point(2.0, "b"), Point(3.0, "c"))

// Global aggregation: (min score, max score, set of distinct labels)
val a = data
  .map(p => (p.score, p.score, Set(p.label)))
  .reduce((x, y) => (math.min(x._1, y._1), math.max(x._2, y._2), x._3 ++ y._3))

// Element-wise mapping: scaled score followed by the one-hot encoded label
val features = data.map { p =>
  (p.score - a._1) / (a._2 - a._1) :: a._3.toList.sorted.map(s => if (s == p.label) 1.0 else 0.0)
}

But this quickly becomes unmanageable for complex feature sets. The same logic can be expressed concisely in Featran.

import com.spotify.featran._
import com.spotify.featran.transformers._

val fs = FeatureSpec.of[Point]
  .required(_.score)(MinMaxScaler("min-max"))
  .required(_.label)(OneHotEncoder("one-hot"))

val fe = fs.extract(data)
val names = fe.featureNames
val features = fe.featureValues[Seq[Double]]
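
To show why a specification-style API helps, here is a minimal plain-Scala sketch of the two-step pattern (global aggregation, then element-wise mapping) factored into a reusable abstraction. It is an illustration only and not Featran's implementation: the names `TwoStepSketch`, `Feature`, `prepare`, `combine`, `transform`, `minMax`, `oneHot` and `extract` are all hypothetical, and the sketch assumes a non-empty data set whose maximum score is larger than its minimum.

```scala
// Hypothetical sketch: none of these names come from Featran's API.
object TwoStepSketch {
  final case class Point(score: Double, label: String)

  // A feature pairs an aggregation step (summarize the whole data set)
  // with a mapping step (turn one record into numbers using that summary).
  trait Feature[T, A] {
    def prepare(t: T): A                    // lift one record into a summary
    def combine(x: A, y: A): A              // merge two summaries
    def transform(t: T, a: A): List[Double] // element-wise mapping
  }

  // Min-max scaling: the summary is the (min, max) pair.
  // Assumes max > min; otherwise the division yields NaN.
  def minMax[T](f: T => Double): Feature[T, (Double, Double)] =
    new Feature[T, (Double, Double)] {
      def prepare(t: T): (Double, Double) = (f(t), f(t))
      def combine(x: (Double, Double), y: (Double, Double)): (Double, Double) =
        (math.min(x._1, y._1), math.max(x._2, y._2))
      def transform(t: T, a: (Double, Double)): List[Double] =
        List((f(t) - a._1) / (a._2 - a._1))
    }

  // One-hot encoding: the summary is the set of distinct labels.
  def oneHot[T](f: T => String): Feature[T, Set[String]] =
    new Feature[T, Set[String]] {
      def prepare(t: T): Set[String] = Set(f(t))
      def combine(x: Set[String], y: Set[String]): Set[String] = x ++ y
      def transform(t: T, a: Set[String]): List[Double] =
        a.toList.sorted.map(s => if (s == f(t)) 1.0 else 0.0)
    }

  // Run both steps for one feature; `reduce` requires a non-empty data set.
  def extract[T, A](data: Seq[T], feature: Feature[T, A]): Seq[List[Double]] = {
    val summary = data.map(t => feature.prepare(t)).reduce((x, y) => feature.combine(x, y))
    data.map(t => feature.transform(t, summary))
  }

  val data: Seq[Point] = Seq(Point(1.0, "a"), Point(2.0, "b"), Point(3.0, "c"))
  val scores: Seq[List[Double]] = extract(data, minMax[Point](_.score))
  val labels: Seq[List[Double]] = extract(data, oneHot[Point](_.label))

  // Concatenate the per-feature vectors into one row per record.
  val features: Seq[List[Double]] = scores.zip(labels).map { case (s, l) => s ++ l }
}
```

Each feature in this sketch carries its own aggregation and mapping, so adding a feature no longer means editing a shared tuple and a shared `reduce`. The sketch makes one pass over the data per feature, which a real library would likely avoid.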

Featran also supports these additional features.

See Examples (source) for detailed examples. See transformers package for a complete list of available feature transformers.

See ScalaDocs for current API documentation.

Artifacts

Featran includes the following artifacts:

License

Copyright 2016-2017 Spotify AB.

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

