ballet.eng package
- class ballet.eng.BaseTransformer
Bases: ballet.eng.base.NoFitMixin, sklearn.base.TransformerMixin, sklearn.base.BaseEstimator
Base transformer class for developing new transformers
- class ballet.eng.BoxCoxTransformer(threshold, lmbda=0.0)
Bases: ballet.eng.base.ConditionalTransformer
Conditionally apply the Box-Cox transformation
In the fit stage, determines which variables (columns) have absolute skew above threshold. In the transform stage, applies the Box-Cox transformation of 1+x to each variable selected previously.
- Parameters
threshold (float) – skew threshold
lmbda (float) – power parameter of the Box-Cox transform. Defaults to 0.0
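The fit and transform stages described above can be sketched with plain pandas and NumPy. This is an illustration only, not the library's implementation: the helper names boxcox1p and fit_skewed_columns are hypothetical, and the sketch assumes skew is measured with pandas' sample skew.

```python
import numpy as np
import pandas as pd


def boxcox1p(x, lmbda=0.0):
    """Box-Cox transform of 1 + x; reduces to log(1 + x) when lmbda == 0."""
    if lmbda == 0.0:
        return np.log1p(x)
    return (np.power(1.0 + x, lmbda) - 1.0) / lmbda


def fit_skewed_columns(X, threshold):
    """Fit stage (hypothetical helper): columns whose |skew| exceeds threshold."""
    skew = X.skew()
    return list(skew[skew.abs() > threshold].index)


X_train = pd.DataFrame({
    "symmetric": [1.0, 2.0, 3.0, 4.0, 5.0],
    "skewed": [0.0, 0.0, 0.0, 1.0, 100.0],
})

# Fit: decide once, on the training data, which columns to transform.
selected = fit_skewed_columns(X_train, threshold=1.0)

# Transform: only the selected columns are changed.
X_out = X_train.copy()
X_out[selected] = boxcox1p(X_train[selected], lmbda=0.0)
```

With the default lmbda of 0.0 the transformation is simply log(1 + x), which is why it is safe for columns containing zeros.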
- class ballet.eng.ColumnSelector(cols)
Bases: ballet.eng.base.BaseTransformer
Select one or more columns from a DataFrame
- Parameters
cols (Union[str, List[str]]) – column or columns to select
- class ballet.eng.ComputedValueTransformer(func, pass_y=False)
Bases: ballet.eng.base.BaseTransformer
Compute a value on the training data and transform to a constant
For example, compute the mean of a column of the training data, then transform any input array by producing an output of the same shape but filled with the computed mean.
- Parameters
func (Callable) – function to apply during fit
pass_y (bool) – whether to pass y to the function during fit
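The mean example above can be sketched as a small stand-alone class. ConstantFromTraining is a hypothetical stand-in written for illustration; it mirrors the documented parameters but is not the library's code.

```python
import numpy as np


class ConstantFromTraining:
    """Hypothetical stand-in: compute a value at fit time, broadcast it at transform time."""

    def __init__(self, func, pass_y=False):
        self.func = func
        self.pass_y = pass_y

    def fit(self, X, y=None):
        # The value is computed once, from the training data only.
        self.value_ = self.func(X, y) if self.pass_y else self.func(X)
        return self

    def transform(self, X):
        # Output has the same shape as the input, filled with the stored value.
        return np.full_like(np.asarray(X), self.value_, dtype=float)


t = ConstantFromTraining(np.mean).fit(np.array([1.0, 2.0, 3.0, 6.0]))
out = t.transform(np.array([[10.0, 20.0], [30.0, 40.0]]))
```

Here the training mean is 3.0, so out is a 2x2 array of 3.0 regardless of the values passed to transform.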
- class ballet.eng.ConditionalTransformer(condition, satisfy_transform, unsatisfy_transform=None)
Bases: ballet.eng.base.BaseTransformer
Transform columns that satisfy a condition during training
In the fit stage, determines which variables (columns) satisfy the condition. In the transform stage, applies the given transformation to the satisfied columns. If a second transformation is given, applies the second transformation to the complement of the satisfied columns (i.e. the columns that fail to satisfy the condition). Otherwise, these unsatisfied columns are passed through unchanged.
- Parameters
condition (Callable) – condition function
satisfy_transform (Callable) – transform function for satisfied columns
unsatisfy_transform (Optional[Callable]) – transform function for unsatisfied columns (defaults to identity)
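A minimal sketch of the two stages follows, assuming the condition is evaluated one column at a time on the training data. The helpers fit_condition and apply_conditional are hypothetical names used only for this illustration.

```python
import numpy as np
import pandas as pd


def fit_condition(X, condition):
    """Fit stage: record which columns satisfy the condition on training data."""
    return [col for col in X.columns if condition(X[col])]


def apply_conditional(X, satisfied, satisfy_transform, unsatisfy_transform=None):
    """Transform stage: transform satisfied columns, optionally the others too."""
    out = X.copy()
    for col in X.columns:
        if col in satisfied:
            out[col] = satisfy_transform(X[col])
        elif unsatisfy_transform is not None:
            out[col] = unsatisfy_transform(X[col])
        # Otherwise the column is passed through unchanged.
    return out


X_train = pd.DataFrame({"a": [1.0, 4.0, 9.0], "b": [-1.0, 0.0, 1.0]})
X_test = pd.DataFrame({"a": [4.0, 9.0], "b": [-4.0, 9.0]})

# Condition: the column is non-negative everywhere in the training data.
satisfied = fit_condition(X_train, lambda s: (s >= 0).all())
result = apply_conditional(X_test, satisfied, np.sqrt)
```

Only column a is selected during fit, so only a is square-rooted at transform time; b is returned as given.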
- class ballet.eng.GroupedFunctionTransformer(func, func_kwargs=None, groupby_kwargs=None)
Bases: sklearn.preprocessing._function_transformer.FunctionTransformer
Transformer that applies a callable to each group of a groupby
- Parameters
func (Callable) – callable to apply
func_kwargs (Optional[dict]) – keyword arguments to pass to func
groupby_kwargs (Optional[dict]) – keyword arguments to pd.DataFrame.groupby. If omitted, no grouping is performed and the function is called on the entire DataFrame.
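The two modes described by groupby_kwargs can be sketched in pandas as follows. grouped_apply and total are hypothetical helpers for illustration and make no claim about how the transformer is implemented internally.

```python
import pandas as pd


def grouped_apply(X, func, func_kwargs=None, groupby_kwargs=None):
    """Apply func per group, or to the whole frame when no grouping is given."""
    func_kwargs = func_kwargs or {}
    if groupby_kwargs is None:
        return func(X, **func_kwargs)
    return X.groupby(**groupby_kwargs).apply(func, **func_kwargs)


def total(frame, column):
    """Example callable: sum one column of whatever frame it receives."""
    return frame[column].sum()


X = pd.DataFrame(
    {"x": [1, 2, 3, 4]},
    index=pd.Index(["a", "a", "b", "b"], name="name"),
)

per_group = grouped_apply(
    X, total, func_kwargs={"column": "x"}, groupby_kwargs={"level": "name"}
)
whole = grouped_apply(X, total, func_kwargs={"column": "x"})
```

per_group holds one sum per value of the name index level, while whole is a single sum over the entire DataFrame.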
- class ballet.eng.GroupwiseTransformer(transformer, groupby_kwargs=None, column_selection=None, handle_unknown='error', handle_error='error')
Bases: ballet.eng.base.BaseTransformer
Transformer that does something different for every group
For each group identified in the training set by the groupby operation, a separate transformer is cloned and fit. This is useful to learn group-wise transformers that do not leak data between the training and test sets. Consider the case of imputing missing values with the mean of some group. A normal, pure-pandas implementation, such as
X_te.groupby(by='foo').apply('mean')
would leak information about the test set means, which might differ from the training set means.
- Parameters
transformer (Union[Callable, BaseTransformer, None]) – the transformer to apply to each group. If transformer is a transformer-like instance (i.e. has fit, transform methods etc.), then it is cloned for each group. If transformer is a transformer-like class (i.e. instances of the class are transformer-like), then it is initialized with no arguments for each group. If it is a callable, then it is called with no arguments for each group.
groupby_kwargs (Optional[dict]) – keyword arguments to pd.DataFrame.groupby
column_selection (Union[str, List[str], None]) – column, or list of columns, to select after the groupby. Equivalent to df.groupby(...)[column_selection]. Defaults to None, i.e. no column selection is performed.
handle_unknown (str) – 'error' or 'ignore', default='error'. Whether to raise an error or ignore it when an unknown group is encountered during transform. When this parameter is set to 'ignore' and an unknown group is encountered during transform, the group's values will be passed through unchanged.
handle_error (str) – 'error' or 'ignore', default='error'. Whether to raise an error or ignore it when an error is raised while transforming an individual group. When this parameter is set to 'ignore' and an error is raised when calling the transformer's transform method on an individual group, the group's values will be passed through unchanged.
Example usage:
In this example, we create a groupwise transformer that fits a separate imputer for each group encountered. For new data points, each missing value is imputed with the mean of its group on the training set, avoiding any data leakage.
>>> from sklearn.impute import SimpleImputer
>>> from ballet.eng import GroupwiseTransformer
>>> transformer = GroupwiseTransformer(
...     SimpleImputer(strategy='mean'),
...     groupby_kwargs={'level': 'name'}
... )
- Raises
ballet.exc.BalletError – if handle_unknown == 'error' and an unknown group is encountered at transform-time.
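To make the leakage argument concrete, the sketch below contrasts group means learned on the training set with group means computed on the test set itself. It uses pandas only; the frames X_tr and X_te and their columns are made-up illustration data, and the code stands in for the idea rather than for the transformer's implementation.

```python
import numpy as np
import pandas as pd

X_tr = pd.DataFrame({"foo": ["a", "a", "b", "b"], "val": [1.0, 3.0, 10.0, 30.0]})
X_te = pd.DataFrame({"foo": ["a", "b", "b"], "val": [np.nan, 100.0, np.nan]})

# Fit: learn one mean per group from the training set only.
train_means = X_tr.groupby("foo")["val"].mean()

# Transform: fill test-set gaps with the training mean of the matching group.
imputed = X_te["val"].fillna(X_te["foo"].map(train_means))

# Leaky alternative, for contrast: means computed on the test set itself.
leaky = X_te["val"].fillna(X_te.groupby("foo")["val"].transform("mean"))
```

The leak-free version fills the missing b value with 20.0, the training mean, whereas the leaky version fills it with 100.0, a statistic of the test set. The leaky version also cannot fill group a at all, because that group has no observed values in the test set.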
- class ballet.eng.IdentityTransformer
Bases: sklearn.preprocessing._function_transformer.FunctionTransformer
Simple transformer that passes through its input
- class ballet.eng.LagImputer(groupby_kwargs=None)
Bases: ballet.eng.base.GroupedFunctionTransformer
Fill missing values using group-specific lags
- class ballet.eng.NamedFramer(name)
Bases: ballet.eng.base.BaseTransformer
Convert object to named 1d DataFrame
If transformation is successful, the resulting object is a DataFrame with a name attribute as given.
- Parameters
name (str) – name for resulting DataFrame
- class ballet.eng.NoFitMixin
Bases: object
Mix-in class for transformations that do not require a fit stage
- class ballet.eng.NullFiller(replacement=0.0, isnull=<function isna>)
Bases: ballet.eng.base.BaseTransformer
Fill values passing a filter with a given replacement
- Parameters
replacement – replacement for each null value
isnull (callable) – vectorized test of whether a value is considered null. Defaults to pandas.isnull.
- class ballet.eng.NullIndicator
Bases: ballet.eng.base.BaseTransformer
Indicate whether values are null or not
- class ballet.eng.SimpleFunctionTransformer(func, func_kwargs=None)
Bases: sklearn.preprocessing._function_transformer.FunctionTransformer
Transformer that applies a callable to its input
The callable will be called on the input X in the transform stage, optionally with additional arguments and keyword arguments.
A simple wrapper around FunctionTransformer.
- Parameters
func (Callable) – callable to apply
func_kwargs (Optional[dict]) – keyword arguments to pass to func
- class ballet.eng.SingleLagger(lag, groupby_kwargs=None)
Bases: ballet.eng.base.GroupedFunctionTransformer
Transformer that applies a lag operator to each group
- Parameters
lag (int) – lag to apply
groupby_kwargs (Optional[dict]) – keyword arguments to pd.DataFrame.groupby
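As a rough illustration of a per-group lag, the sketch below uses pandas' groupby followed by shift. That pairing is an assumption about what "lag operator" means here, not a statement of the class's internals, and the sample frame is made up.

```python
import pandas as pd

X = pd.DataFrame(
    {"value": [1, 2, 3, 10, 20]},
    index=pd.Index(["a", "a", "a", "b", "b"], name="name"),
)

# Lag of 1 within each group: the first row of every group has no predecessor.
lagged = X.groupby(level="name").shift(1)

# Ungrouped lag, for contrast: the first row of group b inherits a value from group a.
naive = X["value"].shift(1)
```

In lagged, the first row of group b is missing; in naive it is 3.0, carried over from the last row of group a, which is the cross-group contamination that grouping avoids.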
- class ballet.eng.SubsetTransformer(input, transformer, alias=None)
Bases: sklearn_pandas.dataframe_mapper.DataFrameMapper
Transform a subset of columns with another transformer
- Parameters
input (Union[str, List[str]]) –
transformer (Union[Callable, BaseTransformer, None]) –
alias (Optional[str]) –
- class ballet.eng.ValueReplacer(value, replacement)
Bases: ballet.eng.base.BaseTransformer
Replace instances of some value with some replacement
- Parameters
value – value to replace (checked by equality testing)
replacement – replacement
- ballet.eng.make_multi_lagger(lags, groupby_kwargs=None)
Return a union of transformers that apply different lags
- Parameters
lags (Iterable[int]) – collection of lags to apply
groupby_kwargs (Optional[dict]) – keyword arguments to pd.DataFrame.groupby
- Return type
FeatureUnion
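Conceptually, a union of lag transformers places the output of each lag side by side. The pandas sketch below imitates that result; multi_lag is a hypothetical helper, and the suffixed column names are an assumption made for readability, since a FeatureUnion itself concatenates outputs without naming them.

```python
import pandas as pd


def multi_lag(X, lags, groupby_kwargs=None):
    """Concatenate one lagged copy of X per requested lag, column-wise."""
    grouped = X.groupby(**groupby_kwargs) if groupby_kwargs else X
    pieces = [grouped.shift(lag).add_suffix(f"_lag{lag}") for lag in lags]
    return pd.concat(pieces, axis=1)


X = pd.DataFrame(
    {"value": [1, 2, 3, 10, 20]},
    index=pd.Index(["a", "a", "a", "b", "b"], name="name"),
)

out = multi_lag(X, lags=[1, 2], groupby_kwargs={"level": "name"})
```

Each requested lag contributes one block of columns, so two lags applied to a one-column frame give a two-column result.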