ballet.client module
-
class ballet.client.Client(prj=None)
Bases: object
User client for validating features
Provides a simple interface for validating a given feature.
- Parameters
prj (Union[Project, module, str, PathLike, None]) – a way to create a ballet Project, either with an already-created object, the project’s top-level module, or a path within the project. If none of these is provided, the client will attempt to detect the project by ascending from the current working directory.
-
property api
Access the feature engineering API of this project
- Return type
-
discover(input=None, primitive=None, expensive_stats=False)
Discover existing features
Display information about existing features including summary statistics on the development dataset. If the feature extracts multiple feature values, then the summary statistics (e.g. mean, std, nunique) are computed for each feature value and then averaged. If the development dataset cannot be loaded, computation of summary statistics is skipped.
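The averaging rule for vector-valued features can be illustrated with a minimal sketch that uses only the standard library. The helper `averaged_stat` and the sample values are hypothetical and are not part of ballet; the sketch only shows the arithmetic described above (compute the statistic per feature value, then average).

```python
import statistics


def averaged_stat(columns, stat):
    """Apply `stat` to each feature value (column), then average the results.

    Hypothetical helper for illustration only; not ballet's implementation.
    """
    per_column = [stat(col) for col in columns]
    return statistics.fmean(per_column)


# Hypothetical feature that extracts two feature values over four rows.
columns = [
    [1.0, 2.0, 3.0, 4.0],      # first feature value
    [10.0, 10.0, 20.0, 20.0],  # second feature value
]

# Mean of each column (2.5 and 15.0), averaged across columns.
mean_avg = averaged_stat(columns, statistics.fmean)

# Number of unique values in each column (4 and 2), averaged across columns.
nunique_avg = averaged_stat(columns, lambda col: len(set(col)))
```

For a scalar-valued feature there is a single column, so the averaged statistic is just the statistic itself.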
The following information is shown:
- name: the name of the feature
- description: the description of the feature
- input: the variables that are used as input to the feature
- transformer: the transformer/transformer pipeline
- output: the output columns of the feature (not usually specified)
- author: the GitHub username of the feature’s author
- source: the fully-qualified name of the Python module that contains the feature
- mutual_information: estimated mutual information between the feature (or averaged over feature values) and the target on the development dataset split
- conditional_mutual_information: estimated conditional mutual information between the feature (or averaged over feature values) and the target conditional on all other features on the development dataset split
- ninputs: the number of input columns to the feature
- nvalues: the number of feature values this feature extracts (i.e. 1 for a scalar-valued feature and >1 for a vector-valued feature)
- ncontinuous: the number of feature values this feature extracts that are continuous-valued
- ndiscrete: the number of feature values this feature extracts that are discrete-valued
- mean: mean of the feature on the development dataset split
- std: standard deviation of the feature (or averaged over feature values) on the development dataset split
- var: variance of the feature (or averaged over feature values) on the development dataset split
- min: minimum of the feature on the development dataset split
- median: median of the feature (or median over feature values) on the development dataset split
- max: maximum of the feature on the development dataset split
- nunique: number of unique values of the feature (or averaged over feature values) on the development dataset split
The following query operators are supported:
- input (str): filter to only features that have the given input in their input or list of inputs
- primitive (str): filter to only features that use the given primitive (i.e. a class with the name given by primitive) in the transformer or transformer pipeline
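As a rough sketch of what these two filters mean, the following plain-Python example applies the same matching rules to a hand-built list of feature records. Everything here is hypothetical: the `Identity` and `Scaler` classes, the record layout, and the `matches` helper are stand-ins for illustration, and combining both filters with a logical AND is an assumption rather than documented behavior.

```python
class Identity:
    """Hypothetical stand-in for a transformer primitive."""


class Scaler:
    """Hypothetical stand-in for another transformer primitive."""


# Hypothetical feature records; real features are ballet objects, not dicts.
features = [
    {"name": "f1", "input": "A", "transformer": [Identity()]},
    {"name": "f2", "input": ["A", "B"], "transformer": [Identity(), Scaler()]},
    {"name": "f3", "input": ["C"], "transformer": [Scaler()]},
]


def as_list(value):
    """Normalize a single input name or a list of input names to a list."""
    return [value] if isinstance(value, str) else list(value)


def matches(feature, input=None, primitive=None):
    """Return True if the feature passes every filter that was supplied."""
    if input is not None and input not in as_list(feature["input"]):
        return False
    if primitive is not None:
        # Compare against the class names of the steps in the pipeline.
        names = {type(step).__name__ for step in feature["transformer"]}
        if primitive not in names:
            return False
    return True


by_input = [f["name"] for f in features if matches(f, input="A")]
by_primitive = [f["name"] for f in features if matches(f, primitive="Scaler")]
by_both = [f["name"] for f in features
           if matches(f, input="A", primitive="Scaler")]
```

Matching on the class name mirrors the description of primitive as "a class with the name given by primitive"; the actual lookup in ballet may differ.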
For other queries, you should just use normal DataFrame indexing:
>>> features_df[features_df['author'] == 'jane']
>>> features_df[features_df['name'].str.contains('married')]
>>> features_df[features_df['mutual_information'] > 0.05]
>>> features_df[features_df['input'].apply(
...     lambda input: 'A' in input and 'B' in input)]
- Return type
DataFrame
- Returns
data frame with features on the row index and columns as described above
-
project
Access ballet-specific project info