ballet.feature module

class ballet.feature.Feature(input, transformer, name=None, description=None, output=None, source=None, options=None)[source]

Bases: object

A feature definition

Conceptually, a feature definition is a learned function that maps raw variables in one data instance to a vector of feature values. A feature definition can produce either a scalar feature value for each instance or a vector of feature values, as in the case of an embedding technique like PCA or the one-hot encoding of a categorical variable.
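To make the distinction between scalar and vector feature values concrete, here is a minimal sketch in plain Python. It does not use ballet; the names OneHotSketch and scalar_feature are hypothetical and exist only for this illustration. The first is a learned function (it must be fit before it can transform) that yields a vector per instance; the second is stateless and yields one scalar per instance.

```python
class OneHotSketch:
    """Toy learned function: fit learns the categories, transform encodes them."""

    def fit(self, values):
        # Learn the sorted set of categories seen in the training data.
        self.categories_ = sorted(set(values))
        return self

    def transform(self, values):
        # Map each instance to a vector with one slot per learned category.
        return [[1 if v == c else 0 for c in self.categories_] for v in values]


def scalar_feature(values):
    # A stateless feature that yields exactly one scalar per instance.
    return [len(v) for v in values]


colors = ["red", "green", "red", "blue"]
encoder = OneHotSketch().fit(colors)
one_hot = encoder.transform(colors)   # one vector of length 3 per instance
lengths = scalar_feature(colors)      # one scalar per instance
```

In both cases the number of output rows equals the number of input instances; only the width of each row differs.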

Parameters
  • input (Union[str, List[str], Callable[…, Union[str, List[str]]]]) – columns of the input data frame that the transformation requires. There is also preliminary support for other kinds of pandas indexing, such as selection by callable: if you pass a callable, the entities data frame will be indexed using that callable. This is not officially supported by the underlying sklearn-pandas library, so please report any issues you experience.

  • transformer (Union[Callable, BaseTransformer, None, List[Union[Callable, BaseTransformer, None]]]) – transformer, sequence of transformers, or None. A “transformer” is an instance of a class that provides a fit/transform-style learned transformation. Alternatively, a callable can be provided, either by itself or in a list, in which case it will be converted into a FunctionTransformer for convenience. If None is provided, it will be replaced with the IdentityTransformer.

  • name (Optional[str]) – name of the feature

  • description (Optional[str]) – description of the feature

  • output (Union[str, List[str], None]) – base name or ordered sequence of names of feature values produced by this transformer

  • source (Optional[str]) – the module in which this feature was defined

  • options (Optional[dict]) – options
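The handling of the transformer parameter described above (callables wrapped, None replaced by an identity step, lists processed element by element) can be sketched in plain Python. This is an illustration under stated assumptions, not ballet's implementation: IdentitySketch, FunctionSketch, normalize_transformers, and run_steps are hypothetical stand-ins for the IdentityTransformer and FunctionTransformer behavior the description names.

```python
class IdentitySketch:
    """Hypothetical stand-in for an identity transformer."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


class FunctionSketch:
    """Hypothetical stand-in that wraps a plain callable as a transformer."""

    def __init__(self, func):
        self.func = func

    def fit(self, X, y=None):
        return self  # nothing to learn for a stateless function

    def transform(self, X):
        return self.func(X)


def normalize_transformers(transformer):
    # Accept a single item or a list, and always return a list of
    # objects that expose fit and transform.
    steps = transformer if isinstance(transformer, list) else [transformer]
    normalized = []
    for step in steps:
        if step is None:
            normalized.append(IdentitySketch())
        elif hasattr(step, "fit") and hasattr(step, "transform"):
            normalized.append(step)
        elif callable(step):
            normalized.append(FunctionSketch(step))
        else:
            raise TypeError(f"unsupported transformer: {step!r}")
    return normalized


def run_steps(steps, X):
    # Fit and apply each step in order, feeding the output forward.
    for step in steps:
        X = step.fit(X).transform(X)
    return X


steps = normalize_transformers([None, lambda xs: [x * 2 for x in xs]])
result = run_steps(steps, [1, 2, 3])
```

Checking for fit and transform before checking callable keeps a transformer object that happens to define __call__ from being wrapped a second time.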

as_feature_engineering_pipeline()[source]

Return a standalone FeatureEngineeringPipeline containing only this feature

Return type

FeatureEngineeringPipeline

as_input_transformer_tuple()[source]

Return a tuple to pass to the DataFrameMapper constructor

Return type

Tuple[Union[str, List[str], Callable[…, Union[str, List[str]]]], Union[TransformerPipeline, ForwardRef], dict]

property author

The author of this feature if it can be inferred from its source

The author can be inferred if the module the feature was defined in follows the pattern package.subpackage.user_username.feature_featurename. Otherwise, returns None.

Return type

Optional[str]
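The module-name convention above can be illustrated with a short regular-expression sketch. This is an assumption about how such a pattern could be matched, not the library's actual code; infer_author and _AUTHOR_PATTERN are hypothetical names introduced for this example.

```python
import re

# Matches a trailing "...user_<username>.feature_<featurename>" in a dotted
# module path. The exact expression is an assumption for illustration.
_AUTHOR_PATTERN = re.compile(r"(?:^|\.)user_(?P<username>[^.]+)\.feature_[^.]+$")


def infer_author(source):
    """Return the username embedded in a module path, or None if absent."""
    if source is None:
        return None
    match = _AUTHOR_PATTERN.search(source)
    return match.group("username") if match else None


author = infer_author("package.subpackage.user_alice.feature_age")
missing = infer_author("package.subpackage.helpers")
```

Here author is "alice", while missing is None because the module path does not follow the convention.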

fit(X, y=None)[source]

Fit feature.pipeline

fit_transform(X, y=None)[source]

Fit feature.pipeline and then transform data

property pipeline

A feature engineering pipeline containing just this feature

Return type

FeatureEngineeringPipeline

transform(X)[source]

Transform data using feature.pipeline