ballet.eng package
class ballet.eng.BaseTransformer
Bases: ballet.eng.base.NoFitMixin, sklearn.base.TransformerMixin, sklearn.base.BaseEstimator
Base transformer class for developing new transformers.
class ballet.eng.BoxCoxTransformer(threshold, lmbda=0.0)
Bases: ballet.eng.base.ConditionalTransformer
Conditionally apply the Box-Cox transformation.
In the fit stage, determines which variables (columns) have absolute skew above threshold. In the transform stage, applies the Box-Cox transformation of 1+x to each variable selected previously.
Parameters:
  threshold (float) – skew threshold.
  lmbda (float) – power parameter of the Box-Cox transform. Defaults to 0.0.
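The fit/transform split described above can be sketched in plain pandas and NumPy. This is an illustrative re-implementation under stated assumptions, not ballet's source: skew is measured with `DataFrame.skew()`, and the Box-Cox transform of 1+x is written out by hand (it reduces to `log1p` when `lmbda` is 0; `scipy.special.boxcox1p` computes the same quantity). The class name `BoxCoxSketch` is hypothetical.

```python
import numpy as np
import pandas as pd


def boxcox1p(x, lmbda):
    # Box-Cox transform of 1 + x; with lmbda == 0 it reduces to log(1 + x)
    if lmbda == 0:
        return np.log1p(x)
    return ((1 + x) ** lmbda - 1) / lmbda


class BoxCoxSketch:
    """Hypothetical stand-in for BoxCoxTransformer (not ballet's code)."""

    def __init__(self, threshold, lmbda=0.0):
        self.threshold = threshold
        self.lmbda = lmbda

    def fit(self, X, y=None):
        # fit stage: remember the columns whose absolute skew exceeds the threshold
        skew = X.skew()
        self.selected_ = list(skew[skew.abs() > self.threshold].index)
        return self

    def transform(self, X):
        # transform stage: only the columns selected during fit are transformed
        out = X.copy()
        for col in self.selected_:
            out[col] = boxcox1p(out[col], self.lmbda)
        return out


X = pd.DataFrame({
    "skewed": [0.0, 0.0, 0.0, 1.0, 100.0],
    "symmetric": [1.0, 2.0, 3.0, 4.0, 5.0],
})
bc = BoxCoxSketch(threshold=1.0).fit(X)
Xt = bc.transform(X)
print(bc.selected_)  # ['skewed']
```

Because the column set is fixed at fit time, new data is transformed consistently even if its own skew differs from the training data.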
class ballet.eng.ColumnSelector(cols)
Bases: ballet.eng.base.BaseTransformer
Select one or more columns from a DataFrame.
Parameters:
  cols (Union[str, List[str]]) – column or columns to select.
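A minimal sketch of the selection, assuming it amounts to plain `X[cols]` indexing (an assumption; ballet's exact return types are not stated here). Under that assumption a single `str` yields a Series and a list of `str` yields a DataFrame. `ColumnSelectorSketch` is a hypothetical name.

```python
import pandas as pd


class ColumnSelectorSketch:
    """Hypothetical stand-in for ColumnSelector (not ballet's code)."""

    def __init__(self, cols):
        self.cols = cols

    def fit(self, X, y=None):
        # nothing to learn: selection depends only on the column names
        return self

    def transform(self, X):
        # plain pandas indexing: a str yields a Series, a list of str a DataFrame
        return X[self.cols]


df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
one = ColumnSelectorSketch("a").fit(df).transform(df)
many = ColumnSelectorSketch(["a", "c"]).fit(df).transform(df)
```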
class ballet.eng.ComputedValueTransformer(func, pass_y=False)
Bases: ballet.eng.base.BaseTransformer
Compute a value on the training data and transform to a constant.
For example, compute the mean of a column of the training data, then transform any input array by producing an output of the same shape but filled with the computed mean.
Parameters:
  func (Callable) – function to apply to the training data during fit; its result is the constant used in transform.
  pass_y (bool) – whether to also pass y to func during fit.
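The mean example above can be sketched as follows. This is a hypothetical re-implementation, assuming `func(X)` (or `func(X, y)` when `pass_y` is true) is evaluated once in fit and the result broadcast with `np.full` in transform; `ComputedValueSketch` is not a ballet name.

```python
import numpy as np
import pandas as pd


class ComputedValueSketch:
    """Hypothetical stand-in for ComputedValueTransformer (not ballet's code)."""

    def __init__(self, func, pass_y=False):
        self.func = func
        self.pass_y = pass_y

    def fit(self, X, y=None):
        # the value is computed once, from the training data only
        self.value_ = self.func(X, y) if self.pass_y else self.func(X)
        return self

    def transform(self, X):
        # output has the input's shape, filled with the value learned in fit
        return np.full(np.shape(X), self.value_)


train = pd.Series([1.0, 2.0, 3.0])
test = pd.Series([10.0, 20.0])
filled = ComputedValueSketch(np.mean).fit(train).transform(test)
print(filled)  # [2. 2.]

# with pass_y=True, func also receives the training targets
target_mean = ComputedValueSketch(lambda X, y: y.mean(), pass_y=True)
filled_y = target_mean.fit(train, pd.Series([0.0, 4.0, 8.0])).transform(test)
```

The test values 10 and 20 never influence the output; only the training mean does.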
class ballet.eng.ConditionalTransformer(condition, satisfy_transform, unsatisfy_transform=None)
Bases: ballet.eng.base.BaseTransformer
Transform columns that satisfy a condition during training.
In the fit stage, determines which variables (columns) satisfy the condition. In the transform stage, applies the given transformation to the satisfied columns. If a second transformation is given, applies it to the complement of the satisfied columns (i.e. the columns that fail to satisfy the condition). Otherwise, these unsatisfied columns are passed through unchanged.
Parameters:
  condition (Callable) – condition function, evaluated on each column during fit.
  satisfy_transform (Callable) – transform function for satisfied columns.
  unsatisfy_transform (Optional[Callable]) – transform function for unsatisfied columns (defaults to identity).
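A sketch of the two stages, assuming the condition is called on each column as a Series and returns a boolean (an assumption about the calling convention). `ConditionalSketch` and the example condition are hypothetical.

```python
import pandas as pd


class ConditionalSketch:
    """Hypothetical stand-in for ConditionalTransformer (not ballet's code)."""

    def __init__(self, condition, satisfy_transform, unsatisfy_transform=None):
        self.condition = condition
        self.satisfy_transform = satisfy_transform
        self.unsatisfy_transform = unsatisfy_transform

    def fit(self, X, y=None):
        # the condition is evaluated per column, on the training data only
        self.satisfied_ = [col for col in X.columns if self.condition(X[col])]
        return self

    def transform(self, X):
        out = X.copy()
        for col in X.columns:
            if col in self.satisfied_:
                out[col] = self.satisfy_transform(X[col])
            elif self.unsatisfy_transform is not None:
                out[col] = self.unsatisfy_transform(X[col])
            # otherwise the column passes through unchanged (identity)
        return out


train = pd.DataFrame({"small": [1, 2], "large": [50, 200]})
test = pd.DataFrame({"small": [30, 40], "large": [100, 300]})
ct = ConditionalSketch(lambda s: s.max() > 10, lambda s: s / 100).fit(train)
out = ct.transform(test)
```

Note that `small` is left alone at transform time even though its test values exceed 10: the set of satisfied columns is fixed during fit.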
class ballet.eng.GroupedFunctionTransformer(func, func_kwargs=None, groupby_kwargs=None)
Bases: sklearn.preprocessing._function_transformer.FunctionTransformer
Transformer that applies a callable to each group of a groupby.
Parameters:
  func (Callable) – callable to apply.
  func_kwargs (Optional[dict]) – keyword arguments to pass to func.
  groupby_kwargs (Optional[dict]) – keyword arguments to pd.DataFrame.groupby. If omitted, no grouping is performed and the function is called on the entire DataFrame.
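The grouped versus ungrouped behavior can be sketched with pandas alone. This assumes each group is handed to `func` through `groupby(...).transform`; ballet may instead call `func` on the GroupBy object directly, so treat the hypothetical `grouped_apply` helper as an illustration of the semantics only.

```python
import pandas as pd


def grouped_apply(X, func, func_kwargs=None, groupby_kwargs=None):
    """Hypothetical helper mirroring the documented parameters (not ballet's code)."""
    func_kwargs = func_kwargs or {}
    if groupby_kwargs is None:
        # no grouping: call func on the whole DataFrame
        return func(X, **func_kwargs)
    # grouping: apply func to each group and reassemble in the original row order
    return X.groupby(**groupby_kwargs).transform(func, **func_kwargs)


index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1), ("b", 2)], names=["name", "t"]
)
X = pd.DataFrame({"x": [1.0, 3.0, 10.0, 20.0]}, index=index)


def demean(df):
    # subtract the mean of whatever frame func receives
    return df - df.mean()


per_group = grouped_apply(X, demean, groupby_kwargs={"level": "name"})
whole = grouped_apply(X, demean)
```

With grouping, each name is centered on its own mean; without it, every row is centered on the global mean of 8.5.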
class ballet.eng.GroupwiseTransformer(transformer, groupby_kwargs=None, column_selection=None, handle_unknown='error', handle_error='error')
Bases: ballet.eng.base.BaseTransformer
Transformer that fits a separate copy of a transformer for every group.
For each group identified in the training set by the groupby operation, a separate transformer is cloned and fit. This is useful to learn group-wise transformers that do not leak data between the training and test sets. Consider the case of imputing missing values with the mean of some group. A normal, pure-pandas implementation, such as X_te.groupby(by='foo').apply('mean'), would leak information about the test set means, which might differ from the training set means.
Parameters:
  transformer (Union[Callable, BaseTransformer, None]) – the transformer to apply to each group. If transformer is a transformer-like instance (i.e. has fit and transform methods), then it is cloned for each group. If transformer is a transformer-like class (i.e. instances of the class are transformer-like), then it is initialized with no arguments for each group. If it is a callable, then it is called with no arguments for each group.
  groupby_kwargs (Optional[dict]) – keyword arguments to pd.DataFrame.groupby.
  column_selection (Union[str, List[str], None]) – column, or list of columns, to select after the groupby. Equivalent to df.groupby(...)[column_selection]. Defaults to None, i.e. no column selection is performed.
  handle_unknown (str) – 'error' or 'ignore', default='error'. Whether to raise an error or ignore if an unknown group is encountered during transform. When this parameter is set to 'ignore' and an unknown group is encountered during transform, the group's values will be passed through unchanged.
  handle_error (str) – 'error' or 'ignore', default='error'. Whether to raise an error or ignore if an error is raised while transforming an individual group. When this parameter is set to 'ignore' and an error is raised when calling the transformer's transform method on an individual group, the group's values will be passed through unchanged.
Example usage:
In this example, we create a groupwise transformer that fits a separate imputer for each group encountered. For new data points, values will be imputed according to the mean of their group on the training set, avoiding any data leakage.
>>> from sklearn.impute import SimpleImputer
>>> from ballet.eng import GroupwiseTransformer
>>> transformer = GroupwiseTransformer(
...     SimpleImputer(strategy='mean'),
...     groupby_kwargs={'level': 'name'}
... )
Raises:
  ballet.exc.BalletError – if handle_unknown == 'error' and an unknown group is encountered at transform-time.
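The per-group fitting and the `handle_unknown` switch can be sketched without scikit-learn. Several stand-ins are assumptions here: the hypothetical `MeanImputer` plays the role of `SimpleImputer(strategy='mean')`, `copy.deepcopy` plays the role of cloning, and `KeyError` replaces `ballet.exc.BalletError`. `GroupwiseSketch` omits `column_selection` and `handle_error`.

```python
import copy

import numpy as np
import pandas as pd


class MeanImputer:
    """Hypothetical minimal imputer: fills NaN with the column means seen in fit."""

    def fit(self, X, y=None):
        self.means_ = X.mean()
        return self

    def transform(self, X):
        return X.fillna(self.means_)


class GroupwiseSketch:
    """Hypothetical stand-in for GroupwiseTransformer (not ballet's code)."""

    def __init__(self, transformer, groupby_kwargs, handle_unknown="error"):
        self.transformer = transformer
        self.groupby_kwargs = groupby_kwargs
        self.handle_unknown = handle_unknown

    def fit(self, X, y=None):
        # one independent copy of the transformer per training group
        self.transformers_ = {
            key: copy.deepcopy(self.transformer).fit(group)
            for key, group in X.groupby(**self.groupby_kwargs)
        }
        return self

    def transform(self, X):
        parts = []
        for key, group in X.groupby(**self.groupby_kwargs):
            if key in self.transformers_:
                parts.append(self.transformers_[key].transform(group))
            elif self.handle_unknown == "ignore":
                parts.append(group)  # unknown group: pass values through
            else:
                raise KeyError(f"unknown group: {key!r}")
        # restore the original row order
        return pd.concat(parts).reindex(X.index)


def frame(rows):
    # build a frame indexed by (name, t) with a single column x
    index = pd.MultiIndex.from_tuples([r[:2] for r in rows], names=["name", "t"])
    return pd.DataFrame({"x": [r[2] for r in rows]}, index=index)


train = frame([("a", 1, 1.0), ("a", 2, 3.0), ("b", 1, 10.0), ("b", 2, 30.0)])
test = frame([("a", 3, np.nan), ("b", 3, np.nan), ("c", 1, np.nan)])

gt = GroupwiseSketch(MeanImputer(), {"level": "name"}, handle_unknown="ignore")
imputed = gt.fit(train).transform(test)
```

Group `a` is filled with 2.0 and group `b` with 20.0, both training-set means; group `c` was never seen in training, so with `handle_unknown="ignore"` its value stays missing.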
class ballet.eng.IdentityTransformer
Bases: sklearn.preprocessing._function_transformer.FunctionTransformer
Simple transformer that passes through its input.
class ballet.eng.LagImputer(groupby_kwargs=None)
Bases: ballet.eng.base.GroupedFunctionTransformer
Fill missing values using group-specific lags.
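One plausible reading of "group-specific lags" is a forward fill within each group, so that a missing value takes the most recent earlier value from the same group. The sketch below assumes that reading; `lag_impute` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def lag_impute(X, groupby_kwargs=None):
    """Hypothetical helper: fill each NaN with the most recent earlier value in its group."""
    if groupby_kwargs is None:
        return X.ffill()
    return X.groupby(**groupby_kwargs).ffill()


index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1), ("b", 2)], names=["name", "t"]
)
X = pd.DataFrame({"x": [1.0, np.nan, np.nan, 5.0]}, index=index)

grouped = lag_impute(X, {"level": "name"})
ungrouped = lag_impute(X)
```

The ungrouped call fills `("b", 1)` with a value from group `a`; the grouped call leaves it missing because group `b` has no earlier observation.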
class ballet.eng.NamedFramer(name)
Bases: ballet.eng.base.BaseTransformer
Convert an object to a named 1d DataFrame.
If transformation is successful, the resulting object is a DataFrame with a name attribute as given.
Parameters:
  name (str) – name for the resulting DataFrame.
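A sketch of the conversion, assuming "named" means the single column carries `name` and that inputs which are not one-dimensional are rejected (both assumptions; the exact error type ballet raises is not stated). `to_named_frame` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def to_named_frame(X, name):
    """Hypothetical helper: coerce 1d-like input to a one-column DataFrame labeled `name`."""
    if isinstance(X, pd.Series):
        return X.to_frame(name=name)
    if isinstance(X, pd.DataFrame):
        if X.shape[1] != 1:
            raise TypeError("expected exactly one column")
        out = X.copy()
        out.columns = [name]
        return out
    arr = np.asarray(X)
    if arr.ndim != 1:
        raise TypeError(f"cannot convert array of shape {arr.shape} to a 1d DataFrame")
    return pd.DataFrame({name: arr})


from_series = to_named_frame(pd.Series([1, 2, 3]), "age")
from_list = to_named_frame([4, 5], "age")
```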
class ballet.eng.NoFitMixin
Bases: object
Mix-in class for transformations that do not require a fit stage.
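A sketch of the mix-in pattern, assuming its job is to supply a `fit` that learns nothing and returns `self`, so that a subclass only has to define `transform` to fit the fit/transform protocol. The `Doubler` class is hypothetical.

```python
class NoFitMixin:
    """Sketch of a no-fit mix-in: fit learns nothing and returns self."""

    def fit(self, X, y=None, **fit_kwargs):
        return self


class Doubler(NoFitMixin):
    """Hypothetical transformer that only needs to define transform."""

    def transform(self, X):
        return [2 * x for x in X]


doubled = Doubler().fit([1, 2, 3]).transform([4, 5])
```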
class ballet.eng.NullFiller(replacement=0.0, isnull=<function isna>)
Bases: ballet.eng.base.BaseTransformer
Fill values passing a filter with a given replacement.
Parameters:
  replacement – replacement for each null value.
  isnull (callable) – vectorized test of whether a value is considered null. Defaults to pandas.isnull.
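A sketch of the filter-and-replace behavior using `Series.mask`, assuming `isnull` returns a boolean mask of the same shape as its input. `null_fill` is a hypothetical helper; the second call shows that any vectorized predicate can serve as the filter.

```python
import numpy as np
import pandas as pd


def null_fill(X, replacement=0.0, isnull=pd.isnull):
    """Hypothetical helper: replace every value that passes `isnull` with `replacement`."""
    return X.mask(isnull(X), replacement)


s = pd.Series([1.0, np.nan, 3.0])
default = null_fill(s)
# any vectorized predicate can act as the filter, e.g. negative values as sentinels
custom = null_fill(pd.Series([-1.0, 2.0]), replacement=0.0, isnull=lambda x: x < 0)
```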
class ballet.eng.NullIndicator
Bases: ballet.eng.base.BaseTransformer
Indicate whether values are null or not.
class ballet.eng.SimpleFunctionTransformer(func, func_kwargs=None)
Bases: sklearn.preprocessing._function_transformer.FunctionTransformer
Transformer that applies a callable to its input.
The callable will be called on the input X in the transform stage, optionally with additional arguments and keyword arguments.
A simple wrapper around FunctionTransformer.
Parameters:
  func (Callable) – callable to apply.
  func_kwargs (Optional[dict]) – keyword arguments to pass to func.
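A dependency-light sketch of the wrapper, assuming `func_kwargs` plays the role that `kw_args` plays in scikit-learn's `FunctionTransformer` (an assumption about the mapping). `SimpleFunctionSketch` is hypothetical; the example rounds values with `np.round`.

```python
import numpy as np
import pandas as pd


class SimpleFunctionSketch:
    """Hypothetical stand-in for SimpleFunctionTransformer (not ballet's code)."""

    def __init__(self, func, func_kwargs=None):
        self.func = func
        self.func_kwargs = func_kwargs

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        # func is called on the input at transform time, with the stored keyword arguments
        return self.func(X, **(self.func_kwargs or {}))


rounder = SimpleFunctionSketch(np.round, func_kwargs={"decimals": 1})
rounded = rounder.fit(None).transform(pd.Series([1.234, 5.678]))
```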
class ballet.eng.SingleLagger(lag, groupby_kwargs=None)
Bases: ballet.eng.base.GroupedFunctionTransformer
Transformer that applies a lag operator to each group.
Parameters:
  lag (int) – lag to apply.
  groupby_kwargs (Optional[dict]) – keyword arguments to pd.DataFrame.groupby.
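A sketch of a per-group lag with pandas `shift`, assuming the lag operator shifts rows by `lag` positions within each group (an assumption about the exact implementation). `single_lag` is a hypothetical helper.

```python
import pandas as pd


def single_lag(X, lag, groupby_kwargs=None):
    """Hypothetical helper: shift values by `lag` rows within each group."""
    if groupby_kwargs is None:
        return X.shift(lag)
    return X.groupby(**groupby_kwargs).shift(lag)


index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1), ("b", 2)], names=["name", "t"]
)
X = pd.DataFrame({"x": [1.0, 3.0, 10.0, 20.0]}, index=index)
lagged = single_lag(X, 1, {"level": "name"})
```

The first row of each group becomes missing rather than inheriting the last value of the previous group.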
class ballet.eng.SubsetTransformer(input, transformer, alias=None)
Bases: sklearn_pandas.dataframe_mapper.DataFrameMapper
Transform a subset of columns with another transformer.
Parameters:
  input (Union[str, List[str]]) – column or columns to transform.
  transformer (Union[Callable, BaseTransformer, None]) – transformer to apply to the selected columns.
  alias (Optional[str])
class ballet.eng.ValueReplacer(value, replacement)
Bases: ballet.eng.base.BaseTransformer
Replace instances of some value with some replacement.
Parameters:
  value – value to replace (checked by equality testing).
  replacement – replacement.
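A sketch of equality-based replacement with `Series.mask`; `replace_value` is a hypothetical helper, and the sentinel -999 is an illustrative choice.

```python
import numpy as np
import pandas as pd


def replace_value(X, value, replacement):
    """Hypothetical helper: swap entries equal to `value` for `replacement`."""
    return X.mask(X == value, replacement)


raw = pd.Series([12.0, -999.0, 7.0])
cleaned = replace_value(raw, -999.0, np.nan)
```

Because the match uses `==`, a NaN `value` would never match in this sketch, since NaN compares unequal to itself; filling nulls is the job of NullFiller.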
ballet.eng.make_multi_lagger(lags, groupby_kwargs=None)
Return a union of transformers that apply different lags.
Parameters:
  lags (Iterable[int]) – collection of lags to apply.
  groupby_kwargs (Optional[dict]) – keyword arguments to pd.DataFrame.groupby.
Return type:
  FeatureUnion
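scikit-learn's `FeatureUnion` stacks the outputs of its transformers side by side. The sketch below mimics that with `pd.concat(axis=1)` over one per-group lag per entry in `lags`. The `multi_lag` helper and the `_lag{n}` column suffixes are hypothetical, added only to keep the output readable.

```python
import pandas as pd


def multi_lag(X, lags, groupby_kwargs=None):
    """Hypothetical helper: one block of lagged columns per lag, joined side by side."""
    blocks = []
    for lag in lags:
        if groupby_kwargs is None:
            shifted = X.shift(lag)
        else:
            shifted = X.groupby(**groupby_kwargs).shift(lag)
        blocks.append(shifted.add_suffix(f"_lag{lag}"))
    return pd.concat(blocks, axis=1)


index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("a", 3), ("b", 1)], names=["name", "t"]
)
X = pd.DataFrame({"x": [1.0, 2.0, 3.0, 10.0]}, index=index)
lags = multi_lag(X, [1, 2], {"level": "name"})
```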