ballet.eng.base module
-
class ballet.eng.base.BaseTransformer
Bases: ballet.eng.base.NoFitMixin, sklearn.base.TransformerMixin, sklearn.base.BaseEstimator
Base transformer class for developing new transformers
-
class ballet.eng.base.ConditionalTransformer(condition, satisfy_transform, unsatisfy_transform=None)
Bases: ballet.eng.base.BaseTransformer
Transform columns that satisfy a condition during training
In the fit stage, determines which variables (columns) satisfy the condition. In the transform stage, applies the given transformation to the satisfied columns. If a second transformation is given, applies the second transformation to the complement of the satisfied columns (i.e. the columns that fail to satisfy the condition). Otherwise, these unsatisfied columns are passed through unchanged.
- Parameters
condition (Callable) – condition function
satisfy_transform (Callable) – transform function for satisfied columns
unsatisfy_transform (Optional[Callable]) – transform function for unsatisfied columns (defaults to identity)
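The fit/transform split described above can be illustrated with a dependency-free sketch. This is not ballet's implementation: the table is modeled as a dict mapping column names to lists, the names `ConditionalSketch`, `has_missing`, and `fill_zero` are hypothetical, and the sketch assumes the condition is evaluated on one column at a time.

```python
class ConditionalSketch:
    """Illustrative stand-in; not ballet's ConditionalTransformer."""

    def __init__(self, condition, satisfy_transform, unsatisfy_transform=None):
        self.condition = condition
        self.satisfy_transform = satisfy_transform
        # Assumption: a missing second transform means "pass through unchanged".
        self.unsatisfy_transform = unsatisfy_transform or (lambda col: col)

    def fit(self, X):
        # Decide once, on the training data, which columns satisfy the condition.
        self.satisfied_ = {name for name, col in X.items() if self.condition(col)}
        return self

    def transform(self, X):
        # Reuse the training-time decision, whatever the new data looks like.
        return {
            name: (self.satisfy_transform(col) if name in self.satisfied_
                   else self.unsatisfy_transform(col))
            for name, col in X.items()
        }


def has_missing(col):
    return any(v is None for v in col)


def fill_zero(col):
    return [0 if v is None else v for v in col]


X_train = {"a": [1, None, 3], "b": [4, 5, 6]}
X_test = {"a": [None, 2], "b": [None, 8]}

sketch = ConditionalSketch(has_missing, fill_zero).fit(X_train)
result = sketch.transform(X_test)
```

In this sketch, column `b` has a missing value at transform time but is passed through unchanged, because it did not satisfy the condition during fit.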
-
class ballet.eng.base.GroupedFunctionTransformer(func, func_kwargs=None, groupby_kwargs=None)
Bases: sklearn.preprocessing._function_transformer.FunctionTransformer
Transformer that applies a callable to each group of a groupby
- Parameters
func (Callable) – callable to apply
func_kwargs (Optional[dict]) – keyword arguments to pass to func
groupby_kwargs (Optional[dict]) – keyword arguments to pd.DataFrame.groupby. If omitted, no grouping is performed and the function is called on the entire DataFrame.
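The two modes described above (apply per group, or apply once when no grouping is given) can be sketched in plain Python. This is only an illustration under stated assumptions: rows are dicts, the hypothetical `group_key` argument stands in for `groupby_kwargs`, and the helper names `grouped_apply` and `total` are not part of ballet. The real class delegates grouping to `pd.DataFrame.groupby`.

```python
from collections import defaultdict


def grouped_apply(rows, func, func_kwargs=None, group_key=None):
    """Apply func to each group of rows, or to all rows if group_key is None."""
    func_kwargs = func_kwargs or {}
    if group_key is None:
        # No grouping requested: call the function once on the whole input.
        return func(rows, **func_kwargs)
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    # One call per group, each receiving the same keyword arguments.
    return {key: func(group, **func_kwargs) for key, group in groups.items()}


def total(rows, column):
    return sum(row[column] for row in rows)


rows = [
    {"city": "A", "sales": 10},
    {"city": "B", "sales": 5},
    {"city": "A", "sales": 7},
]

per_city = grouped_apply(rows, total, {"column": "sales"}, group_key="city")
overall = grouped_apply(rows, total, {"column": "sales"})
```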
-
class ballet.eng.base.GroupwiseTransformer(transformer, groupby_kwargs=None, column_selection=None, handle_unknown='error', handle_error='error')
Bases: ballet.eng.base.BaseTransformer
Transformer that fits and applies a separate transformer for every group
For each group identified in the training set by the groupby operation, a separate transformer is cloned and fit. This is useful to learn group-wise transformers that do not leak data between the training and test sets. Consider the case of imputing missing values with the mean of some group. A normal, pure-pandas implementation, such as
X_te.groupby(by='foo').apply('mean')
would leak information about the test set means, which might differ from the training set means.
- Parameters
transformer (Union[Callable, BaseTransformer, None]) – the transformer to apply to each group. If transformer is a transformer-like instance (i.e. has fit, transform methods etc.), then it is cloned for each group. If transformer is a transformer-like class (i.e. instances of the class are transformer-like), then it is initialized with no arguments for each group. If it is a callable, then it is called with no arguments for each group.
groupby_kwargs (Optional[dict]) – keyword arguments to pd.DataFrame.groupby
column_selection (Union[str, List[str], None]) – column, or list of columns, to select after the groupby. Equivalent to df.groupby(...)[column_selection]. Defaults to None, i.e. no column selection is performed.
handle_unknown (str) – 'error' or 'ignore', default='error'. Whether to raise an error or continue when an unknown group is encountered during transform. When this parameter is set to 'ignore' and an unknown group is encountered during transform, the group's values are passed through unchanged.
handle_error (str) – 'error' or 'ignore', default='error'. Whether to raise an error or continue when transforming an individual group fails. When this parameter is set to 'ignore' and an error is raised when calling the transformer's transform method on an individual group, the group's values are passed through unchanged.
Example usage:
In this example, we create a groupwise transformer that fits a separate imputer for each group encountered. For each new data point, missing values are imputed with the mean of its group on the training set, avoiding any data leakage.
>>> from sklearn.impute import SimpleImputer
>>> from ballet.eng.base import GroupwiseTransformer
>>> transformer = GroupwiseTransformer(
...     SimpleImputer(strategy='mean'),
...     groupby_kwargs={'level': 'name'}
... )
- Raises
ballet.exc.BalletError – if handle_unknown=='error' and an unknown group is encountered at transform-time.
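The per-group fitting and the handle_unknown behavior described above can be sketched without pandas or scikit-learn. This is a simplified illustration, not ballet's code. The names `GroupwiseSketch` and `MeanImputer` are hypothetical, `copy.deepcopy` stands in for cloning the transformer, group labels are passed as a parallel list instead of a groupby, and a plain `ValueError` is raised where the real class raises `ballet.exc.BalletError`.

```python
import copy
from collections import defaultdict


class MeanImputer:
    """Toy transformer: replaces None with the mean seen during fit."""

    def fit(self, values):
        known = [v for v in values if v is not None]
        self.mean_ = sum(known) / len(known)
        return self

    def transform(self, values):
        return [self.mean_ if v is None else v for v in values]


class GroupwiseSketch:
    """Illustrative stand-in; not ballet's GroupwiseTransformer."""

    def __init__(self, transformer, handle_unknown="error"):
        self.transformer = transformer
        self.handle_unknown = handle_unknown

    def fit(self, groups, values):
        by_group = defaultdict(list)
        for g, v in zip(groups, values):
            by_group[g].append(v)
        # Fit an independent copy of the prototype for each training group.
        self.fitted_ = {
            g: copy.deepcopy(self.transformer).fit(vs)
            for g, vs in by_group.items()
        }
        return self

    def transform(self, groups, values):
        out = []
        for g, v in zip(groups, values):
            if g in self.fitted_:
                out.append(self.fitted_[g].transform([v])[0])
            elif self.handle_unknown == "ignore":
                out.append(v)  # unknown group: pass the value through unchanged
            else:
                raise ValueError(f"unknown group: {g!r}")
        return out


sketch = GroupwiseSketch(MeanImputer(), handle_unknown="ignore")
sketch.fit(["x", "x", "y", "y"], [1.0, 3.0, 10.0, None])
imputed = sketch.transform(["x", "y", "z"], [None, None, None])
```

Here the missing values for groups `x` and `y` are filled from training-set means only, and the value for the unseen group `z` is left as it was because handle_unknown is 'ignore'.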
-
class ballet.eng.base.NoFitMixin
Bases: object
Mix-in class for transformations that do not require a fit stage
-
class ballet.eng.base.SimpleFunctionTransformer(func, func_kwargs=None)
Bases: sklearn.preprocessing._function_transformer.FunctionTransformer
Transformer that applies a callable to its input
The callable will be called on the input X in the transform stage, optionally with additional keyword arguments.
A simple wrapper around FunctionTransformer.
- Parameters
func (Callable) – callable to apply
func_kwargs (Optional[dict]) – keyword arguments to pass to func
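As a rough illustration of the behavior described above, the following dependency-free sketch stores a callable and its keyword arguments and applies them in the transform stage. The names `FunctionSketch` and `clip` are hypothetical, and the sketch does not reproduce the scikit-learn `FunctionTransformer` that the real class wraps.

```python
class FunctionSketch:
    """Illustrative stand-in; not ballet's SimpleFunctionTransformer."""

    def __init__(self, func, func_kwargs=None):
        self.func = func
        self.func_kwargs = func_kwargs or {}

    def fit(self, X, y=None):
        return self  # stateless: nothing is learned from the data

    def transform(self, X):
        # Call the wrapped function on the input with the stored keyword arguments.
        return self.func(X, **self.func_kwargs)


def clip(values, lower, upper):
    return [min(max(v, lower), upper) for v in values]


clipper = FunctionSketch(clip, {"lower": 0, "upper": 10})
clipped = clipper.fit([]).transform([-5, 3, 42])
```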
-
class ballet.eng.base.SubsetTransformer(input, transformer, alias=None)
Bases: sklearn_pandas.dataframe_mapper.DataFrameMapper
Transform a subset of columns with another transformer
- Parameters
input (Union[str, List[str]]) –
transformer (Union[Callable, BaseTransformer, None]) –
alias (Optional[str]) –