ballet.project module

class ballet.project.FeatureEngineeringProject(*, package, encoder, load_data, extra_features=None, engineer_features=None)[source]

Bases: object

CACHE_TIMEOUT = 600
engineer_features(*args, **kwargs)[source]

Engineer features

Return type

EngineerFeaturesResult

property features

Get all features from the project

Collects all contrib features from the project and also includes any extra features provided by the API author.

Return type

List[Feature]

load_data(*args, cache=True, **kwargs)[source]

Call the project’s load_data function, caching the dataset

The dataset is cached for FeatureEngineeringProject.CACHE_TIMEOUT seconds. To invalidate the cache and force the data to be reloaded from its source, pass cache=False.

Typically, the project’s load_data function has this signature and description:

load_data(split='train', input_dir=None)

If `input_dir` is not None, then load whatever dataset appears in
`input_dir`. Otherwise, load the data split indicated by `split`.

Return type

Tuple[DataFrame, DataFrame]
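
The caching behavior described above can be sketched roughly as follows. This is a minimal illustration of a time-limited cache with a cache=False bypass, not ballet's actual implementation; TimedCache and fake_load_data are hypothetical names introduced only for this example.

```python
import time


class TimedCache:
    """Hypothetical helper: cache a function's results for `timeout` seconds."""

    def __init__(self, func, timeout=600):
        self.func = func
        self.timeout = timeout
        self._entries = {}  # maps call arguments -> (timestamp, value)

    def __call__(self, *args, cache=True, **kwargs):
        key = (args, tuple(sorted(kwargs.items())))
        now = time.monotonic()
        if cache and key in self._entries:
            stamp, value = self._entries[key]
            if now - stamp < self.timeout:
                return value  # still fresh, so reuse the cached dataset
        # Either caching was bypassed or the entry is missing/expired.
        value = self.func(*args, **kwargs)
        self._entries[key] = (now, value)
        return value


calls = []  # records each time the underlying loader really runs


def fake_load_data(split='train', input_dir=None):
    # Stand-in for a project's load_data; returns placeholder values.
    calls.append(split)
    return ('X_' + split, 'y_' + split)


load_data = TimedCache(fake_load_data, timeout=600)
load_data(split='train')               # loads the data
load_data(split='train')               # served from the cache
load_data(split='train', cache=False)  # forces a reload
```

In this sketch the underlying loader runs only twice: once for the first call and once for the call that passes cache=False.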

property pipeline

Get the feature engineering pipeline from the existing features

Return type

FeatureEngineeringPipeline

project

Get the Project object representing this project.

class ballet.project.Project(package)[source]

Bases: object

Encapsulate information on a ballet project

This is a utility class mostly useful for easy access to the project’s information from within the ballet.validation package.

Parameters

package (ModuleType) – python package representing imported ballet project

property api

Return type

FeatureEngineeringProject

property branch

Return current git branch according to git tree or CI environment

Return type

Optional[str]

config

classmethod from_cwd()[source]

Create a Project instance by searching up from cwd

Recursively searches for the ballet configuration file in the current working directory and then in its parent directories, stopping when it reaches a file system boundary.

Raises

ConfigurationError – couldn’t find the configuration file
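
The upward search can be pictured with the sketch below. It is an illustration under stated assumptions, not the library's code: the config file name ballet.yml is assumed, find_config_dir is a hypothetical helper, FileNotFoundError stands in for ConfigurationError, and the loop simply walks to the root of the path rather than detecting mount points.

```python
from pathlib import Path

CONFIG_NAME = 'ballet.yml'  # assumed name of the configuration file


def find_config_dir(start, config_name=CONFIG_NAME):
    """Return the nearest directory at or above `start` containing the config."""
    current = Path(start).resolve()
    # Check the starting directory first, then each ancestor in turn.
    for candidate in (current, *current.parents):
        if (candidate / config_name).is_file():
            return candidate
    # Stand-in for the ConfigurationError mentioned in the documentation.
    raise FileNotFoundError(
        'could not find {} at or above {}'.format(config_name, current))
```

Calling find_config_dir(Path.cwd()) would then mimic the search that from_cwd is described as performing.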

classmethod from_path(path, ascend=False)[source]

Create a Project instance from a file system path to the containing directory

Parameters
  • path (Union[str, PathLike]) – path to directory that contains the project

  • ascend (bool) – if the config file is not found in the given directory, then search in parent directories, stopping at a file system boundary

property on_master

Return type

bool

path

Return the project path (aka project root)

If package.__file__ is /foo/src/foo/__init__.py, then project.path should be /foo.
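
As a small worked example of the relationship above, the following sketch derives the root from the package file path. It assumes the src layout shown in the example (root/src/package/__init__.py); project_root is a hypothetical helper, and the real property may compute the path differently.

```python
from pathlib import PurePosixPath


def project_root(package_file):
    """Go up three levels: __init__.py -> package dir -> src -> project root."""
    # parents[0] is /foo/src/foo, parents[1] is /foo/src, parents[2] is /foo
    return PurePosixPath(package_file).parents[2]


root = project_root('/foo/src/foo/__init__.py')
```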

repo

Return a git.Repo object corresponding to this project

resolve(modname, attr=None)[source]

Import module or attribute from project

Parameters
  • modname (str) – dotted module name relative to the top-level package, with the leading dot omitted; to import the top-level package itself, use the empty string '' (or just access self.package)

  • attr (Optional[str]) – attribute to get from the imported module

Example

>>> project.resolve('', '__version__')
# return __version__ attribute from top-level package
>>> project.resolve('api')
# return myproject.api module
>>> project.resolve('api', attr='api')
# return api object from myproject.api module
>>> project.resolve('foo.bar')
# return myproject.foo.bar module

Return type

Any
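
The lookup that resolve describes can be approximated with importlib. The sketch below is a standalone function, not the method itself, and it uses the standard library json package as a stand-in for an imported ballet project so that it runs anywhere.

```python
import importlib
import json  # stand-in for the imported project package


def resolve(package, modname, attr=None):
    """Import `modname` relative to `package`, optionally returning an attribute."""
    # An empty modname refers to the top-level package itself.
    name = package.__name__ + ('.' + modname if modname else '')
    module = importlib.import_module(name)
    if attr is None:
        return module
    return getattr(module, attr)


top_level = resolve(json, '')                         # the json package
decoder_module = resolve(json, 'decoder')             # the json.decoder module
decoder_class = resolve(json, 'decoder', attr='JSONDecoder')
```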

property version

Some version identifier for the current project

Implementation is to return the abbreviated SHA1 of git HEAD.

Return type

str

ballet.project.detect_github_username(project)[source]

Detect github username

Looks in the following order:

  1. github.user git config variable
  2. git remote origin
  3. $USER
  4. ‘username’

Return type

str
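
The fallback order above can be sketched as a chain of sources tried in turn. This is an illustration only: first_available and username_from_remote are hypothetical helpers, the git config lookup is replaced by a stand-in that returns None, and the remote URL is a made-up example.

```python
import os
import re


def username_from_remote(url):
    """Extract the owner from a GitHub remote URL (SSH or HTTPS form)."""
    match = re.search(r'github\.com[:/]([^/]+)/', url)
    return match.group(1) if match else None


def first_available(sources, default='username'):
    """Return the first non-empty value produced by `sources`, else `default`."""
    for source in sources:
        try:
            value = source()
        except Exception:
            value = None  # a failing source just falls through to the next one
        if value:
            return value
    return default


sources = [
    lambda: None,  # stand-in for the github.user git config variable (unset)
    lambda: username_from_remote('git@github.com:alice/myproject.git'),
    lambda: os.environ.get('USER'),
]
detected = first_available(sources)
```

Here the first source yields nothing, so the value comes from the remote URL; if every source were empty, the literal default 'username' would be returned.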

ballet.project.load_config(path=None, ascend=True)[source]

User-facing function to load config from project code

When no arguments are provided, the default behavior is to detect the calling code using introspection and load a config object by ascending from the directory of the calling code. If this does not succeed, pass path explicitly.

Return type

LazySettings

ballet.project.load_config_at_path(path)[source]

Load config at exact path

Parameters

path (Union[str, PathLike]) – path to config file

Returns

config dict

Return type

dict

ballet.project.load_config_in_dir(path)[source]

Load config in containing directory

Parameters

path (Union[str, PathLike]) – path to containing directory of config file

Returns

config dict

Return type

LazySettings

ballet.project.make_feature_path(contrib_dir, username, featurename)[source]

Return type

Path

ballet.project.relative_to_contrib(diff, project)[source]

Compute relative path of changed file to contrib dir

Parameters
  • diff (Diff) – file diff

  • project (Project) – project

Return type

Path
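
The path computation can be illustrated with pathlib alone. The sketch below works on plain path strings rather than a Diff and a Project, so relative_to_dir and the example paths are hypothetical and only show the relative-path step.

```python
from pathlib import PurePosixPath


def relative_to_dir(changed_path, contrib_dir):
    """Return `changed_path` expressed relative to `contrib_dir`.

    Raises ValueError if the changed file is not inside `contrib_dir`.
    """
    return PurePosixPath(changed_path).relative_to(contrib_dir)


rel = relative_to_dir(
    'src/myproject/features/contrib/user_alice/feature_age.py',
    'src/myproject/features/contrib',
)
```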