ballet.eng package

class ballet.eng.BaseTransformer[source]

Bases: ballet.eng.base.NoFitMixin, sklearn.base.TransformerMixin, sklearn.base.BaseEstimator

Base transformer class for developing new transformers

class ballet.eng.BoxCoxTransformer(threshold, lmbda=0.0)[source]

Bases: ballet.eng.base.ConditionalTransformer

Conditionally apply the Box-Cox transformation

In the fit stage, determines which variables (columns) have absolute skew above threshold. In the transform stage, applies the Box-Cox transformation of 1+x to each variable selected previously.

Parameters
  • threshold (float) – skew threshold.

  • lmbda (float) – power parameter of the Box-Cox transform. Defaults to 0.0, which corresponds to the log transform log(1 + x).
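The two stages described above can be sketched in plain pandas and NumPy. This is an illustration only, not the library's implementation: it assumes skew is measured with pandas' sample skewness (DataFrame.skew), and boxcox1p is a helper written out by hand for this example.

```python
import numpy as np
import pandas as pd


def boxcox1p(x, lmbda=0.0):
    """Box-Cox transformation of 1 + x (helper written for this sketch)."""
    if lmbda == 0:
        return np.log1p(x)
    return (np.power(1.0 + x, lmbda) - 1.0) / lmbda


# Training data: 'a' is symmetric, 'b' is strongly right-skewed.
X_train = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 4.0, 5.0],
    'b': [0.0, 0.0, 1.0, 1.0, 100.0],
})
threshold = 1.0

# "Fit": remember which columns exceed the skew threshold on the training data.
skew = X_train.skew()
selected = skew.index[skew.abs() > threshold].tolist()

# "Transform": apply the transformation only to the remembered columns.
X_new = pd.DataFrame({'a': [2.0, 3.0], 'b': [0.0, 9.0]})
X_out = X_new.copy()
X_out[selected] = boxcox1p(X_new[selected])
```

Because the column selection is made once on the training data, new data is transformed the same way regardless of its own skew.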

class ballet.eng.ColumnSelector(cols)[source]

Bases: ballet.eng.base.BaseTransformer

Select one or more columns from a DataFrame

Parameters

cols (Union[str, List[str]]) – column or columns to select

transform(X, **transform_kwargs)[source]

class ballet.eng.ComputedValueTransformer(func, pass_y=False)[source]

Bases: ballet.eng.base.BaseTransformer

Compute a value on the training data, then output that value as a constant

For example, compute the mean of a column of the training data, then transform any input array by producing an output of the same shape but filled with the computed mean.

Parameters
  • func (Callable) – function to apply during fit

  • pass_y (bool) – whether to pass y to the function during fit
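The mean example above can be sketched with NumPy alone. The helper names fit_computed_value and transform_to_constant are hypothetical and exist only for this illustration; they are not part of ballet.

```python
import numpy as np


def fit_computed_value(func, X, y=None, pass_y=False):
    """Compute a single value from the training data."""
    return func(X, y) if pass_y else func(X)


def transform_to_constant(value, X):
    """Return an array shaped like X, filled with the stored value."""
    return np.full(np.shape(X), value)


# "Fit": the mean is computed once, on the training data.
X_train = np.array([1.0, 2.0, 3.0, 6.0])
value = fit_computed_value(np.mean, X_train)

# "Transform": any input is mapped to an array of the same shape
# filled with the training-set mean, whatever the input's own values.
X_new = np.array([10.0, 20.0, 30.0])
X_out = transform_to_constant(value, X_new)
```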

fit(X, y=None, **fit_kwargs)[source]
transform(X, **transform_kwargs)[source]

class ballet.eng.ConditionalTransformer(condition, satisfy_transform, unsatisfy_transform=None)[source]

Bases: ballet.eng.base.BaseTransformer

Transform columns that satisfy a condition during training

In the fit stage, determines which variables (columns) satisfy the condition. In the transform stage, applies the given transformation to the satisfied columns. If a second transformation is given, applies the second transformation to the complement of the satisfied columns (i.e. the columns that fail to satisfy the condition). Otherwise, these unsatisfied columns are passed through unchanged.

Parameters
  • condition (Callable) – condition function

  • satisfy_transform (Callable) – transform function for satisfied columns

  • unsatisfy_transform (Optional[Callable]) – transform function for unsatisfied columns (defaults to identity)
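A minimal sketch of the fit and transform stages described above, under the assumption that the condition and both transform functions each receive a single column. The helper names fit_condition and apply_conditional are hypothetical and not part of ballet.

```python
import pandas as pd


def fit_condition(X, condition):
    """Return the names of the columns that satisfy the condition."""
    return [col for col in X.columns if condition(X[col])]


def apply_conditional(X, satisfied, satisfy_transform,
                      unsatisfy_transform=None):
    """Transform satisfied columns; optionally transform the rest."""
    X_out = X.copy()
    for col in X.columns:
        if col in satisfied:
            X_out[col] = satisfy_transform(X[col])
        elif unsatisfy_transform is not None:
            X_out[col] = unsatisfy_transform(X[col])
        # Otherwise the column is passed through unchanged.
    return X_out


# "Fit": only column 'b' has a training mean above 10.
X_train = pd.DataFrame({'a': [1, 2, 3], 'b': [100, 200, 300]})
satisfied = fit_condition(X_train, lambda s: s.mean() > 10)

# "Transform": 'b' is rescaled even though its mean in the new data is
# small, because the decision was made during fit.
X_new = pd.DataFrame({'a': [4, 5], 'b': [1, 2]})
X_out = apply_conditional(X_new, satisfied, lambda s: s / 100)
```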

fit(X, y=None, **fit_args)[source]
transform(X, **transform_args)[source]

class ballet.eng.GroupedFunctionTransformer(func, func_kwargs=None, groupby_kwargs=None)[source]

Bases: sklearn.preprocessing._function_transformer.FunctionTransformer

Transformer that applies a callable to each group of a groupby

Parameters
  • func (Callable) – callable to apply

  • func_kwargs (Optional[dict]) – keyword arguments to pass

  • groupby_kwargs (Optional[dict]) – keyword arguments to pd.DataFrame.groupby. If omitted, no grouping is performed and the function is called on the entire DataFrame.
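The grouped and ungrouped behavior can be sketched in plain pandas as follows. The helper grouped_apply and the function scaled_sum are hypothetical names used only for this illustration.

```python
import pandas as pd


def grouped_apply(X, func, func_kwargs=None, groupby_kwargs=None):
    """Call func on each group, or on all of X if no grouping is given."""
    func_kwargs = func_kwargs or {}
    if groupby_kwargs:
        return X.groupby(**groupby_kwargs).apply(func, **func_kwargs)
    return func(X, **func_kwargs)


def scaled_sum(df, factor=1):
    """Column sums multiplied by a constant factor."""
    return df.sum() * factor


X = pd.DataFrame(
    {'value': [1, 2, 3, 4]},
    index=pd.Index(['a', 'a', 'b', 'b'], name='name'),
)

# One result row per group of the 'name' index level.
grouped = grouped_apply(
    X, scaled_sum,
    func_kwargs={'factor': 10},
    groupby_kwargs={'level': 'name'},
)

# Without groupby_kwargs, the function sees the entire DataFrame.
ungrouped = grouped_apply(X, scaled_sum, func_kwargs={'factor': 10})
```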

transform(X, **transform_kwargs)[source]

Transform X using the forward function.

Parameters

X (array-like, shape (n_samples, n_features)) – Input array.

Returns

X_out – Transformed input.

Return type

array-like, shape (n_samples, n_features)

class ballet.eng.GroupwiseTransformer(transformer, groupby_kwargs=None, column_selection=None, handle_unknown='error', handle_error='error')[source]

Bases: ballet.eng.base.BaseTransformer

Transformer that fits and applies a separate transformer for each group

For each group identified in the training set by the groupby operation, a separate transformer is cloned and fit. This is useful to learn group-wise transformers that do not leak data between the training and test sets. Consider the case of imputing missing values with the mean of some group. A normal, pure-pandas implementation, such as X_te.groupby(by='foo').apply('mean'), would leak information about the test set means, which might differ from the training set means.

Parameters
  • transformer (Union[Callable, BaseTransformer, None]) – the transformer to apply to each group. If transformer is a transformer-like instance (i.e. has fit, transform methods etc.), then it is cloned for each group. If transformer is a transformer-like class (i.e. instances of the class are transformer-like), then it is initialized with no arguments for each group. If it is a callable, then it is called with no arguments for each group.

  • groupby_kwargs (Optional[dict]) – keyword arguments to pd.DataFrame.groupby

  • column_selection (Union[str, List[str], None]) – column, or list of columns, to select after the groupby. Equivalent to df.groupby(...)[column_selection]. Defaults to None, i.e. no column selection is performed.

  • handle_unknown (str) – 'error' or 'ignore', default='error'. Whether to raise an error or ignore it if an unknown group is encountered during transform. When this parameter is set to 'ignore' and an unknown group is encountered during transform, the group’s values will be passed through unchanged.

  • handle_error (str) – 'error' or 'ignore', default='error'. Whether to raise an error or ignore it if an error is raised while transforming an individual group. When this parameter is set to 'ignore' and an error is raised when calling the transformer’s transform method on an individual group, the group’s values will be passed through unchanged.

Example usage:

In this example, we create a groupwise transformer that fits a separate imputer for each group encountered. For new data points, each missing value is imputed with the training-set mean of its group, avoiding any data leakage.

>>> from sklearn.impute import SimpleImputer
>>> from ballet.eng import GroupwiseTransformer
>>> transformer = GroupwiseTransformer(
...     SimpleImputer(strategy='mean'),
...     groupby_kwargs={'level': 'name'}
... )

Raises

ballet.exc.BalletError – if handle_unknown == 'error' and an unknown group is encountered at transform-time.
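To make the leakage argument concrete, the following pure-pandas sketch learns group means on the training set only and reuses them on the test set. It illustrates the idea rather than the class itself; the column names foo and x and the data are made up for this example. The unknown group 'c' is left unchanged, which mirrors the handle_unknown='ignore' behavior described above.

```python
import pandas as pd

nan = float('nan')
X_tr = pd.DataFrame({
    'foo': ['a', 'a', 'b', 'b'],
    'x': [1.0, 3.0, 10.0, 30.0],
})
X_te = pd.DataFrame({
    'foo': ['a', 'b', 'c'],
    'x': [nan, 5.0, nan],
})

# "Fit": learn one statistic per group from the training set only.
group_means = X_tr.groupby('foo')['x'].mean()

# "Transform": fill test-set gaps with the training-set mean of each
# row's group. Group 'c' was never seen in training, so its missing
# value has no learned mean and stays missing.
X_out = X_te.copy()
X_out['x'] = X_te['x'].fillna(X_te['foo'].map(group_means))
```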

fit(X, y=None, **fit_kwargs)[source]
transform(X, **transform_kwargs)[source]

class ballet.eng.IdentityTransformer[source]

Bases: sklearn.preprocessing._function_transformer.FunctionTransformer

Simple transformer that passes through its input

class ballet.eng.LagImputer(groupby_kwargs=None)[source]

Bases: ballet.eng.base.GroupedFunctionTransformer

Fill missing values using group-specific lags

class ballet.eng.NamedFramer(name)[source]

Bases: ballet.eng.base.BaseTransformer

Convert object to named 1d DataFrame

If transformation is successful, the resulting object is a DataFrame with a name attribute as given.

Parameters

name (str) – name for resulting DataFrame

transform(X, **transform_kwargs)[source]

class ballet.eng.NoFitMixin[source]

Bases: object

Mix-in class for transformations that do not require a fit stage

fit(X, y=None, **fit_kwargs)[source]

class ballet.eng.NullFiller(replacement=0.0, isnull=<function isna>)[source]

Bases: ballet.eng.base.BaseTransformer

Fill values passing a filter with a given replacement

Parameters
  • replacement – replacement for each null value

  • isnull (callable) – vectorized test of whether a value is considered null. Defaults to pandas.isna.
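A minimal pandas sketch of the filter-and-replace behavior, including a custom isnull test. The helper fill_nulls is a hypothetical name for this illustration and is not part of ballet.

```python
import pandas as pd


def fill_nulls(X, replacement=0.0, isnull=pd.isna):
    """Replace every value flagged by isnull with the replacement."""
    return X.mask(isnull(X), replacement)


nan = float('nan')
X = pd.DataFrame({'a': [1.0, nan], 'b': [nan, 4.0]})

# Default test: missing values are replaced with 0.0.
filled = fill_nulls(X)

# Custom vectorized test: treat negative values as "null".
X_neg = pd.DataFrame({'a': [-1.0, 2.0]})
clipped = fill_nulls(X_neg, replacement=0.0, isnull=lambda df: df < 0)
```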

transform(X, **transform_kwargs)[source]

class ballet.eng.NullIndicator[source]

Bases: ballet.eng.base.BaseTransformer

Indicate whether values are null or not

transform(X, **tranform_kwargs)[source]

class ballet.eng.SimpleFunctionTransformer(func, func_kwargs=None)[source]

Bases: sklearn.preprocessing._function_transformer.FunctionTransformer

Transformer that applies a callable to its input

The callable will be called on the input X in the transform stage, optionally with additional keyword arguments.

A simple wrapper around FunctionTransformer.

Parameters
  • func (Callable) – callable to apply

  • func_kwargs (Optional[dict]) – keyword arguments to pass

class ballet.eng.SingleLagger(lag, groupby_kwargs=None)[source]

Bases: ballet.eng.base.GroupedFunctionTransformer

Transformer that applies a lag operator to each group

Parameters
  • lag (int) – lag to apply

  • groupby_kwargs (Optional[dict]) – keyword arguments to pd.DataFrame.groupby
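The effect of a per-group lag can be sketched with pandas' groupby and shift. The data and the index level name are made up for this illustration.

```python
import pandas as pd

X = pd.DataFrame(
    {'value': [1, 2, 3, 10, 20]},
    index=pd.Index(['a', 'a', 'a', 'b', 'b'], name='name'),
)

# Lag of 1 within each group: the first row of every group has no
# predecessor in its own group, so it becomes missing.
lagged = X.groupby(level='name').shift(1)

# For contrast, an ungrouped shift carries the last value of group 'a'
# into the first row of group 'b'.
ungrouped = X.shift(1)
```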

class ballet.eng.SubsetTransformer(input, transformer, alias=None)[source]

Bases: sklearn_pandas.dataframe_mapper.DataFrameMapper

Transform a subset of columns with another transformer

Parameters
  • input (Union[str, List[str]]) –

  • transformer (Union[Callable, BaseTransformer, None]) –

  • alias (Optional[str]) –

class ballet.eng.ValueReplacer(value, replacement)[source]

Bases: ballet.eng.base.BaseTransformer

Replace instances of some value with some replacement

Parameters
  • value – value to replace (checked by equality testing)

  • replacement – replacement
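Replacement by equality testing can be sketched in pandas as below. The helper replace_value is a hypothetical name for this illustration. One consequence of equality testing is that NaN cannot be matched this way, since NaN does not compare equal to itself.

```python
import pandas as pd


def replace_value(X, value, replacement):
    """Replace entries equal to value with the replacement."""
    return X.mask(X == value, replacement)


# -999 is used here as a made-up sentinel for "missing".
X = pd.DataFrame({'a': [-999, 1, 2], 'b': [3, -999, 4]})
X_out = replace_value(X, -999, 0)
```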

transform(X, **transform_kwargs)[source]

ballet.eng.make_multi_lagger(lags, groupby_kwargs=None)[source]

Return a union of transformers that apply different lags

Parameters
  • lags (Iterable[int]) – collection of lags to apply

  • groupby_kwargs (Optional[dict]) – keyword arguments to pd.DataFrame.groupby

Return type

FeatureUnion
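Conceptually, the union places one lagged copy of the input next to another, one per requested lag. The pandas sketch below shows that shape. It is an illustration only: the helper multi_lag and the _lagN column suffixes are made up for this example, and a FeatureUnion concatenates the transformer outputs side by side rather than naming columns this way.

```python
import pandas as pd


def multi_lag(X, lags, groupby_kwargs=None):
    """Concatenate one lagged copy of X per requested lag, column-wise."""
    source = X.groupby(**groupby_kwargs) if groupby_kwargs else X
    pieces = [source.shift(lag).add_suffix(f'_lag{lag}') for lag in lags]
    return pd.concat(pieces, axis=1)


X = pd.DataFrame({
    'name': ['a', 'a', 'a', 'b', 'b'],
    'value': [1, 2, 3, 10, 20],
})

# Lags 1 and 2 of 'value', computed separately within each 'name' group.
out = multi_lag(X, lags=[1, 2], groupby_kwargs={'by': 'name'})
```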