ballet.project module
-
class ballet.project.FeatureEngineeringProject(*, package, encoder, load_data, extra_features=None, engineer_features=None)
Bases: object
-
CACHE_TIMEOUT = 600
-
property features
Get all features from the project
Collects all contrib features from the project and also allows the API author to provide extra features.
- Return type
List[Feature]
-
load_data(*args, cache=True, **kwargs)
Call the project’s load_data function, caching the dataset
The dataset is cached for FeatureEngineeringProject.CACHE_TIMEOUT seconds. To invalidate the cache and force the data to be reloaded from its source, pass cache=False.
Typically, the project’s load_data function has this signature and description:
load_data(split='train', input_dir=None)
If input_dir is not None, then load whatever dataset appears in `input_dir`. Otherwise, load the data split indicated by `split`.
- Return type
Tuple[DataFrame, DataFrame]
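The caching behaviour described for load_data can be sketched roughly as follows. This is a minimal illustration, not ballet's actual implementation: the CachedLoader class, its clock parameter, and the decision to keep a single cached entry regardless of arguments are all assumptions made for the example.

```python
import time
from typing import Any, Callable, Optional, Tuple

CACHE_TIMEOUT = 600  # seconds, mirroring FeatureEngineeringProject.CACHE_TIMEOUT


class CachedLoader:
    """Hypothetical wrapper around a project's load_data function."""

    def __init__(
        self,
        load_data: Callable[..., Any],
        timeout: float = CACHE_TIMEOUT,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._load_data = load_data
        self._timeout = timeout
        self._clock = clock  # injectable so the timeout can be tested
        self._entry: Optional[Tuple[float, Any]] = None  # (stored_at, dataset)

    def load(self, *args, cache: bool = True, **kwargs):
        now = self._clock()
        if cache and self._entry is not None:
            stored_at, dataset = self._entry
            if now - stored_at < self._timeout:
                return dataset  # entry is still fresh, so reuse it
        # Either cache=False, nothing is cached yet, or the entry expired.
        dataset = self._load_data(*args, **kwargs)
        self._entry = (now, dataset)
        return dataset
```

Passing cache=False skips the lookup but still stores the freshly loaded dataset, so later calls with the default cache=True see the new data.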
-
property pipeline
Get the feature engineering pipeline from the existing features
- Return type
-
project
Get the Project object representing this project.
-
-
class ballet.project.Project(package)
Bases: object
Encapsulate information on a ballet project
This is a utility class, mostly useful for easy access to the project’s information from within the ballet.validation package.
- Parameters
package (ModuleType) – Python package representing the imported ballet project
-
property api
- Return type
-
property branch
Return the current git branch according to the git tree or CI environment
- Return type
Optional[str]
-
config
-
classmethod from_cwd()
Create a Project instance by searching up from cwd
Searches for the ballet configuration file in the current working directory and then in each parent directory in turn, stopping when it reaches a file system boundary.
- Raises
ConfigurationError – couldn’t find the configuration file
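The upward search described for from_cwd might look something like the sketch below. The function name find_config_upward is hypothetical, and the configuration file name is passed in by the caller because this section does not state it. The sketch simply walks to the file system root and returns None instead of raising ConfigurationError. It does not detect mount-point boundaries.

```python
from pathlib import Path
from typing import Optional


def find_config_upward(start: Path, filename: str) -> Optional[Path]:
    """Return the first `filename` found in `start` or a parent, else None."""
    start = start.resolve()
    # Check the starting directory first, then each ancestor up to the root.
    for directory in (start, *start.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None
```

Calling find_config_upward(Path.cwd(), name) corresponds to the from_cwd behaviour. Checking only the starting directory would correspond to from_path with ascend=False.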
-
classmethod from_path(path, ascend=False)
Create a Project instance from a file system path to the containing directory
- Parameters
path (Union[str, PathLike]) – path to directory that contains the project
ascend (bool) – if the config file is not found in the given directory, then search in parent directories, stopping at a file system boundary
-
property on_master
- Return type
bool
-
path
Return the project path (aka project root)
If package.__file__ is /foo/src/foo/__init__.py, then project.path should be /foo.
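The relationship between package.__file__ and the project root in this example can be illustrated with pathlib. This is a sketch that assumes the src/<package>/__init__.py layout shown above. The helper name is hypothetical, and ballet may determine the root differently.

```python
from pathlib import PurePosixPath


def project_root_from_package_file(package_file: str) -> PurePosixPath:
    # Assumed layout: <root>/src/<package>/__init__.py
    # parents[0] is <root>/src/<package>, parents[1] is <root>/src,
    # and parents[2] is <root>.
    return PurePosixPath(package_file).parents[2]


root = project_root_from_package_file("/foo/src/foo/__init__.py")
print(root)  # /foo
```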
-
repo
Return a git.Repo object corresponding to this project
-
resolve(modname, attr=None)
Import module or attribute from project
- Parameters
modname (str) – dotted module name relative to the top-level package, with the leading dot omitted; to import the top-level package itself, use ‘’ (can also just access self.package)
attr (Optional[str]) – attribute to get from the imported module
Example
>>> project.resolve('', '__version__')  # return __version__ attribute from top-level package
>>> project.resolve('api')  # return myproject.api module
>>> project.resolve('api', attr='api')  # return api object from myproject.api module
>>> project.resolve('foo.bar')  # return myproject.foo.bar module
- Return type
Any
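One way such a lookup could be written with the standard library is sketched below. This is an assumption about the general approach, not ballet's source: the free function takes the package explicitly where the method would use self.package, and the standard json package stands in for a ballet project.

```python
import importlib
import json
from types import ModuleType
from typing import Any, Optional


def resolve(package: ModuleType, modname: str, attr: Optional[str] = None) -> Any:
    # An empty modname refers to the top-level package itself.
    if modname:
        name = f"{package.__name__}.{modname}"
    else:
        name = package.__name__
    module = importlib.import_module(name)
    # Without attr, return the module; otherwise return the named attribute.
    return module if attr is None else getattr(module, attr)


decoder_module = resolve(json, "decoder")                 # the json.decoder module
decoder_class = resolve(json, "decoder", "JSONDecoder")   # the JSONDecoder class
package_name = resolve(json, "", "__name__")              # attribute of the top-level package
```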
-
property version
Some version identifier for the current project
The implementation returns the abbreviated SHA1 of git HEAD.
- Return type
str
-
ballet.project.detect_github_username(project)
Detect GitHub username
Looks in the following order:
1. github.user git config variable
2. git remote origin
3. $USER
4. ‘username’
- Return type
str
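The lookup order above amounts to a fallback chain, which can be sketched as follows. The function names are hypothetical, and the git config value and origin URL are passed in as plain strings so the example does not need a repository. The way the username is parsed out of the remote URL is also an assumption, not ballet's actual parsing.

```python
import os
import re
from typing import Mapping, Optional


def username_from_remote(url: Optional[str]) -> Optional[str]:
    # Assumed URL shapes: https://github.com/<user>/<repo>
    # and git@github.com:<user>/<repo>
    if not url:
        return None
    match = re.search(r"github\.com[:/]([^/]+)/", url)
    return match.group(1) if match else None


def detect_username(
    config_user: Optional[str],
    origin_url: Optional[str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    env = os.environ if env is None else env
    candidates = (
        config_user,                       # 1. github.user git config variable
        username_from_remote(origin_url),  # 2. git remote origin
        env.get("USER"),                   # 3. $USER
    )
    for candidate in candidates:
        if candidate:
            return candidate
    return "username"                      # 4. literal fallback
```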
-
ballet.project.load_config(path=None, ascend=True)
User-facing function to load config from project code
When no arguments are provided, the default behavior is to detect the calling code using introspection and load a config object by ascending from the directory of the calling code. If this does not succeed, pass path directly.
- Return type
LazySettings
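Detecting the calling code by introspection can be done with the standard inspect module. The sketch below shows only that general technique. The helper name calling_directory is hypothetical, and whether ballet inspects the stack in this way is an assumption.

```python
import inspect
from pathlib import Path


def calling_directory() -> Path:
    # inspect.stack()[0] is this function's own frame;
    # inspect.stack()[1] is the frame of whoever called it.
    caller = inspect.stack()[1]
    return Path(caller.filename).resolve().parent
```

The returned directory would be the starting point for an upward search for the config file. When introspection cannot identify a real source file, for example in an interactive session, passing path explicitly avoids the problem.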
-
ballet.project.load_config_at_path(path)
Load config at exact path
- Parameters
path (Union[str, PathLike]) – path to config file
- Returns
config dict
- Return type
dict