ballet.eng.base module

class ballet.eng.base.BaseTransformer[source]

Bases: ballet.eng.base.NoFitMixin, sklearn.base.TransformerMixin, sklearn.base.BaseEstimator

Base transformer class for developing new transformers

class ballet.eng.base.ConditionalTransformer(condition, satisfy_transform, unsatisfy_transform=None)[source]

Bases: ballet.eng.base.BaseTransformer

Transform columns that satisfy a condition during training

In the fit stage, determines which variables (columns) satisfy the condition. In the transform stage, applies the given transformation to the satisfied columns. If a second transformation is given, applies the second transformation to the complement of the satisfied columns (i.e. the columns that fail to satisfy the condition). Otherwise, these unsatisfied columns are passed through unchanged.

Parameters
  • condition (Callable) – condition function

  • satisfy_transform (Callable) – transform function for satisfied columns

  • unsatisfy_transform (Optional[Callable]) – transform function for unsatisfied columns (defaults to identity)
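The fit/transform split described above can be sketched in plain pandas. This is a minimal illustration, not ballet's implementation: the class name `ConditionalSketch`, the attribute `satisfied_`, and the assumption that the condition and both transforms each receive a single column (a `pd.Series`) are all hypothetical.

```python
import pandas as pd


class ConditionalSketch:
    """Illustrative stand-in for the behavior described above."""

    def __init__(self, condition, satisfy_transform, unsatisfy_transform=None):
        self.condition = condition
        self.satisfy_transform = satisfy_transform
        self.unsatisfy_transform = unsatisfy_transform

    def fit(self, X, y=None):
        # Decide, on the training data only, which columns satisfy the condition.
        self.satisfied_ = [c for c in X.columns if self.condition(X[c])]
        return self

    def transform(self, X):
        out = X.copy()
        for c in X.columns:
            if c in self.satisfied_:
                out[c] = self.satisfy_transform(X[c])
            elif self.unsatisfy_transform is not None:
                out[c] = self.unsatisfy_transform(X[c])
            # Otherwise the column is passed through unchanged.
        return out


nan = float('nan')
X_tr = pd.DataFrame({'a': [1.0, nan], 'b': [1.0, 2.0]})
X_te = pd.DataFrame({'a': [nan, 3.0], 'b': [nan, 4.0]})

# Fill missing values only in columns that had missing values during training.
sketch = ConditionalSketch(
    condition=lambda col: col.isna().any(),
    satisfy_transform=lambda col: col.fillna(0.0),
)
out = sketch.fit(X_tr).transform(X_te)
```

Column `b` has a missing value in `X_te` but not in `X_tr`, so it is left untouched: the set of satisfied columns is fixed at fit time.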

fit(X, y=None, **fit_args)[source]
transform(X, **transform_args)[source]

class ballet.eng.base.GroupedFunctionTransformer(func, func_kwargs=None, groupby_kwargs=None)[source]

Bases: sklearn.preprocessing._function_transformer.FunctionTransformer

Transformer that applies a callable to each group of a groupby

Parameters
  • func (Callable) – callable to apply

  • func_kwargs (Optional[dict]) – keyword arguments to pass to func

  • groupby_kwargs (Optional[dict]) – keyword arguments to pd.DataFrame.groupby. If omitted, no grouping is performed and the function is called on the entire DataFrame.

transform(X, **transform_kwargs)[source]

Transform X using the forward function.

Parameters

X (array-like, shape (n_samples, n_features)) – Input array.

Returns

X_out – Transformed input.

Return type

array-like, shape (n_samples, n_features)
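What "applies a callable to each group" means can be sketched in plain pandas. The helper `apply_per_group` and the example function `demean` are hypothetical names used only for illustration; this is not ballet's code, and the real transformer may assemble its output differently.

```python
import pandas as pd


def apply_per_group(df, func, func_kwargs=None, groupby_kwargs=None):
    """Call func on each group, or on the whole frame if no grouping is given."""
    func_kwargs = func_kwargs or {}
    if groupby_kwargs is None:
        return func(df, **func_kwargs)
    pieces = [
        func(group, **func_kwargs)
        for _, group in df.groupby(**groupby_kwargs)
    ]
    # Reassemble the per-group results and restore the original row order.
    return pd.concat(pieces).sort_index()


def demean(group):
    # Subtract the mean of 'score' computed within the rows passed in.
    return group[['score']] - group[['score']].mean()


df = pd.DataFrame({
    'team': ['a', 'a', 'b', 'b'],
    'score': [1.0, 3.0, 10.0, 20.0],
})

grouped = apply_per_group(df, demean, groupby_kwargs={'by': 'team'})
whole = apply_per_group(df, demean)
```

With grouping, each score is centered on its own team's mean; without it, every score is centered on the overall mean.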

class ballet.eng.base.GroupwiseTransformer(transformer, groupby_kwargs=None, column_selection=None, handle_unknown='error', handle_error='error')[source]

Bases: ballet.eng.base.BaseTransformer

Transformer that fits and applies a separate transformer for every group

For each group identified in the training set by the groupby operation, a separate transformer is cloned and fit. This is useful for learning group-wise transformers that do not leak data between the training and test sets. Consider the case of imputing missing values with the mean of some group. A normal, pure-pandas implementation, such as X_te.groupby(by='foo').apply('mean'), would leak information about the test set means, which might differ from the training set means.
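The leakage problem can be shown with a small pandas example. The data and the variable names (`X_tr`, `X_te`, `leaky`, `leak_free`) are made up for illustration; the leak-free version simply looks up group means learned from the training set.

```python
import pandas as pd

nan = float('nan')
X_tr = pd.DataFrame({'foo': ['a', 'a', 'b', 'b'], 'x': [1.0, 3.0, 10.0, 20.0]})
X_te = pd.DataFrame({'foo': ['a', 'a', 'b'], 'x': [nan, 5.0, nan]})

# Leaky: group means are computed from the test set itself.
leaky = X_te['x'].fillna(X_te.groupby('foo')['x'].transform('mean'))

# Leak-free: group means are learned on the training set and reused.
train_means = X_tr.groupby('foo')['x'].mean()
leak_free = X_te['x'].fillna(X_te['foo'].map(train_means))
```

The leaky version fills group `a` with a mean taken from the test set and cannot fill group `b` at all, because `b` has no observed values there. The leak-free version fills both from the training means.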

Parameters
  • transformer (Union[Callable, BaseTransformer, None]) – the transformer to apply to each group. If transformer is a transformer-like instance (i.e. has fit, transform methods etc.), then it is cloned for each group. If transformer is a transformer-like class (i.e. instances of the class are transformer-like), then it is initialized with no arguments for each group. If it is a callable, then it is called with no arguments for each group.

  • groupby_kwargs (Optional[dict]) – keyword arguments to pd.DataFrame.groupby

  • column_selection (Union[str, List[str], None]) – column, or list of columns, to select after the groupby. Equivalent to df.groupby(...)[column_selection]. Defaults to None, i.e. no column selection is performed.

  • handle_unknown (str) – 'error' or 'ignore', default='error'. Whether to raise an error or ignore it if an unknown group is encountered during transform. When this parameter is set to 'ignore' and an unknown group is encountered during transform, the group's values will be passed through unchanged.

  • handle_error (str) – 'error' or 'ignore', default='error'. Whether to raise an error or ignore it if an error is raised while transforming an individual group. When this parameter is set to 'ignore' and an error is raised when calling the transformer's transform method on an individual group, the group's values will be passed through unchanged.

Example usage:

In this example, we create a groupwise transformer that fits a separate imputer for each group encountered. For new data points, values will be imputed according to the mean of their group in the training set, avoiding any data leakage.

>>> from sklearn.impute import SimpleImputer
>>> from ballet.eng.base import GroupwiseTransformer
>>> transformer = GroupwiseTransformer(
...     SimpleImputer(strategy='mean'),
...     groupby_kwargs={'level': 'name'}
... )

Raises

ballet.exc.BalletError – if handle_unknown=='error' and an unknown group is encountered at transform time.
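The clone-and-fit-per-group idea, together with the handle_unknown behavior, can be sketched without ballet. Everything here is a hypothetical stand-in: `GroupwiseSketch` and `MeanImputer` are illustrative classes, `copy.deepcopy` substitutes for scikit-learn's `clone`, and `ValueError` substitutes for `ballet.exc.BalletError`. Unlike the real transformer, this sketch returns rows ordered by group.

```python
import copy

import pandas as pd


class MeanImputer:
    """Toy transformer: fill missing values with the means seen during fit."""

    def fit(self, X, y=None):
        self.mean_ = X.mean()
        return self

    def transform(self, X):
        return X.fillna(self.mean_)


class GroupwiseSketch:
    def __init__(self, transformer, groupby_kwargs, handle_unknown='error'):
        self.transformer = transformer
        self.groupby_kwargs = groupby_kwargs
        self.handle_unknown = handle_unknown

    def fit(self, X, y=None):
        # One independent copy of the transformer per training group.
        self.transformers_ = {
            key: copy.deepcopy(self.transformer).fit(group)
            for key, group in X.groupby(**self.groupby_kwargs)
        }
        return self

    def transform(self, X):
        pieces = []
        for key, group in X.groupby(**self.groupby_kwargs):
            if key in self.transformers_:
                pieces.append(self.transformers_[key].transform(group))
            elif self.handle_unknown == 'ignore':
                pieces.append(group)  # unknown group passes through unchanged
            else:
                raise ValueError(f'unknown group: {key!r}')
        return pd.concat(pieces)  # rows come back ordered by group


nan = float('nan')
X_tr = pd.DataFrame(
    {'x': [1.0, 3.0, 10.0, 20.0]},
    index=pd.Index(['a', 'a', 'b', 'b'], name='name'),
)
X_te = pd.DataFrame(
    {'x': [nan, nan, 7.0]},
    index=pd.Index(['a', 'b', 'c'], name='name'),
)

sketch = GroupwiseSketch(MeanImputer(), {'level': 'name'}, handle_unknown='ignore')
out = sketch.fit(X_tr).transform(X_te)
```

Groups `a` and `b` are imputed with their training means, while group `c`, which was never seen during fit, is passed through. With `handle_unknown='error'` the same call raises instead.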

fit(X, y=None, **fit_kwargs)[source]
transform(X, **transform_kwargs)[source]

class ballet.eng.base.NoFitMixin[source]

Bases: object

Mix-in class for transformations that do not require a fit stage
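A mix-in of this kind typically looks like the sketch below. It assumes that fit does nothing and returns self, following the scikit-learn convention; the names `NoFitSketch` and `Doubler` are hypothetical and this is not ballet's source.

```python
class NoFitSketch:
    """Illustrative mix-in: fit learns nothing and returns self."""

    def fit(self, X, y=None, **fit_kwargs):
        return self


class Doubler(NoFitSketch):
    """Stateless transformation that needs no fit stage."""

    def transform(self, X):
        return [2 * x for x in X]


doubled = Doubler().fit([1, 2, 3]).transform([1, 2, 3])
```

Returning self from fit keeps the usual `fit(...).transform(...)` chaining available even though nothing is learned.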

fit(X, y=None, **fit_kwargs)[source]

class ballet.eng.base.SimpleFunctionTransformer(func, func_kwargs=None)[source]

Bases: sklearn.preprocessing._function_transformer.FunctionTransformer

Transformer that applies a callable to its input

The callable will be called on the input X in the transform stage, optionally with additional keyword arguments.

A simple wrapper around FunctionTransformer.

Parameters
  • func (Callable) – callable to apply

  • func_kwargs (Optional[dict]) – keyword arguments to pass to func
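Since this class wraps scikit-learn's FunctionTransformer, the underlying behavior can be illustrated with FunctionTransformer directly. The sketch assumes func_kwargs plays the role of FunctionTransformer's `kw_args` parameter; the function `scale` is a made-up example.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer


def scale(X, factor=1.0):
    # Example callable: multiply every entry by a constant factor.
    return X * factor


scaler = FunctionTransformer(scale, kw_args={'factor': 10.0})
X = np.array([[1.0, 2.0], [3.0, 4.0]])
X_out = scaler.fit_transform(X)
```

The fit step learns nothing here; the callable is applied to X at transform time with the given keyword arguments.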

class ballet.eng.base.SubsetTransformer(input, transformer, alias=None)[source]

Bases: sklearn_pandas.dataframe_mapper.DataFrameMapper

Transform a subset of columns with another transformer

Parameters
  • input (Union[str, List[str]]) –

  • transformer (Union[Callable, BaseTransformer, None]) –

  • alias (Optional[str]) –