ballet.project module
class ballet.project.FeatureEngineeringProject(*, package, encoder, load_data, extra_features=None, engineer_features=None)
Bases: object
CACHE_TIMEOUT = 600
property features
Get all features from the project.
Collects all contrib features from the project and also allows the API author to provide extra features.
Return type
List[Feature]
load_data(*args, cache=True, **kwargs)
Call the project’s load_data function, caching the dataset.
The dataset is cached for FeatureEngineeringProject.CACHE_TIMEOUT seconds. To invalidate the cache and force the data to be reloaded from its source, pass cache=False.
Typically, the project’s load_data function has this signature and description:
load_data(split='train', input_dir=None)
If `input_dir` is not None, load whatever dataset appears in `input_dir`. Otherwise, load the data split indicated by `split`.
Return type
Tuple[DataFrame, DataFrame]
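The caching behavior described above can be pictured as a small time-to-live cache placed in front of the project's loader. The sketch below illustrates the idea and is not ballet's implementation. The CachedLoader class, its clock parameter, and the choice to key the cache on the call arguments are all assumptions.

```python
import time
from typing import Any, Callable, Dict, Tuple

CACHE_TIMEOUT = 600  # seconds, the value of FeatureEngineeringProject.CACHE_TIMEOUT


class CachedLoader:
    """Hypothetical wrapper that caches a load_data-style callable for `timeout` seconds."""

    def __init__(
        self,
        load_data: Callable[..., Any],
        timeout: float = CACHE_TIMEOUT,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load_data = load_data
        self._timeout = timeout
        self._clock = clock
        # Maps call arguments to (time stored, loaded value)
        self._cache: Dict[Tuple[Any, ...], Tuple[float, Any]] = {}

    def __call__(self, *args: Any, cache: bool = True, **kwargs: Any) -> Any:
        key = (args, tuple(sorted(kwargs.items())))
        now = self._clock()
        if cache and key in self._cache:
            stored_at, value = self._cache[key]
            if now - stored_at < self._timeout:
                return value  # fresh enough: skip reloading
        # Cache miss, expired entry, or cache=False: reload and store the result
        value = self._load_data(*args, **kwargs)
        self._cache[key] = (now, value)
        return value
```

In this sketch, passing cache=False both reloads the data and refreshes the stored copy. That is one reasonable reading of "invalidate cache".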
property pipeline
Get the feature engineering pipeline from the existing features.
Return type
project
Get the Project object representing this project.
class ballet.project.Project(package)
Bases: object
Encapsulate information on a ballet project.
This is a utility class mostly useful for easy access to the project’s information from within the ballet.validation package.
Parameters
package (ModuleType) – Python package representing the imported ballet project
property api
Return type
property branch
Return the current git branch according to the git tree or the CI environment.
Return type
Optional[str]
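One way to combine the two sources is to ask git first and fall back to CI variables when HEAD is detached, as it often is on CI. This is a hypothetical sketch: current_branch, the git_branch callable, and the lookup order are assumptions. TRAVIS_BRANCH, GITHUB_HEAD_REF (set only for pull requests on GitHub Actions), and CI_COMMIT_REF_NAME (GitLab CI) are real CI variables, but this section does not say which ones ballet reads.

```python
import os
from typing import Callable, Mapping, Optional

# Real CI variables; which ones ballet consults is an assumption.
CI_BRANCH_VARIABLES = ("TRAVIS_BRANCH", "GITHUB_HEAD_REF", "CI_COMMIT_REF_NAME")


def current_branch(
    git_branch: Callable[[], Optional[str]],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the git branch name, falling back to CI variables if HEAD is detached."""
    name = git_branch()  # expected to return None for a detached HEAD
    if name:
        return name
    env = os.environ if env is None else env
    for variable in CI_BRANCH_VARIABLES:
        value = env.get(variable)
        if value:
            return value
    return None
```

With GitPython, git_branch could be something like `lambda: None if repo.head.is_detached else repo.active_branch.name`.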
config
classmethod from_cwd()
Create a Project instance by searching up from the current working directory.
Recursively searches for the ballet configuration file in the current working directory and its parent directories, stopping when it reaches a file system boundary.
Raises
ConfigurationError – if the configuration file cannot be found
classmethod from_path(path, ascend=False)
Create a Project instance from a file system path to the containing directory.
Parameters
path (Union[str, PathLike]) – path to the directory that contains the project
ascend (bool) – if the config file is not found in the given directory, also search parent directories, stopping at a file system boundary
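Both from_cwd and from_path (with ascend=True) describe an upward search for the configuration file. Below is a minimal sketch of such a search. It assumes that "file system boundary" means either the root directory or a change of device (a mount point). The function name find_config and the default file name are hypothetical. Where from_cwd would raise ConfigurationError, this sketch returns None.

```python
import os
from pathlib import Path
from typing import Optional, Union

# Assumed file name; this section does not name the configuration file.
CONFIG_FILENAME = "ballet.yml"


def find_config(
    path: Union[str, os.PathLike],
    ascend: bool = False,
    filename: str = CONFIG_FILENAME,
) -> Optional[Path]:
    """Return the config file in `path` (or, if ascend, in a parent), else None."""
    current = Path(path).resolve()
    start_device = current.stat().st_dev
    while True:
        candidate = current / filename
        if candidate.is_file():
            return candidate
        if not ascend:
            return None
        parent = current.parent
        # Stop at the root directory, where parent == current ...
        if parent == current:
            return None
        # ... or when the parent lives on a different device (a mount point).
        if parent.stat().st_dev != start_device:
            return None
        current = parent
```

A from_cwd-style lookup would then be `find_config(Path.cwd(), ascend=True)`.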
property on_master
Return type
bool
path
Return the project path (aka project root).
If package.__file__ is /foo/src/foo/__init__.py, then project.path should be /foo.
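For the src layout in this example, the project root sits three levels above the package's __init__.py. The following is a minimal sketch that assumes this layout always holds. project_root is a hypothetical helper, and a flat layout such as /foo/foo/__init__.py would need parents[1] instead.

```python
from pathlib import Path


def project_root(package_file: str) -> Path:
    """Map <root>/src/<pkg>/__init__.py to <root> (assumes a src layout)."""
    # parents[0] = <root>/src/<pkg>, parents[1] = <root>/src, parents[2] = <root>
    return Path(package_file).parents[2]
```

Called as `project_root(package.__file__)`, this returns /foo for the path in the example above.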
repo
Return a git.Repo object corresponding to this project.
resolve(modname, attr=None)
Import a module or attribute from the project.
Parameters
modname (str) – dotted module name relative to the top-level package, with the leading dot omitted; to import the top-level package itself, use '' (or just access self.package)
attr (Optional[str]) – attribute to get from the imported module
Example
>>> project.resolve('', '__version__')  # return __version__ attribute from top-level package
>>> project.resolve('api')  # return myproject.api module
>>> project.resolve('api', attr='api')  # return api object from myproject.api module
>>> project.resolve('foo.bar')  # return myproject.foo.bar module
Return type
Any
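The lookup rules above map naturally onto importlib's relative imports. The sketch below is a standalone approximation, not ballet's code. It takes the top-level package name as a string, as a hypothetical stand-in for the project's package attribute, so it can be tried on standard-library packages such as json.

```python
import importlib
from typing import Any, Optional


def resolve(package: str, modname: str, attr: Optional[str] = None) -> Any:
    """Import `modname` relative to `package`, optionally returning one attribute."""
    if modname:
        # '.decoder' relative to 'json' imports 'json.decoder'
        module = importlib.import_module("." + modname, package=package)
    else:
        # '' means the top-level package itself
        module = importlib.import_module(package)
    return module if attr is None else getattr(module, attr)
```

For example, `resolve("json", "decoder", attr="JSONDecoder")` returns the json.decoder.JSONDecoder class.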
property version
Some version identifier for the current project.
The implementation returns the abbreviated SHA1 of git HEAD.
Return type
str
ballet.project.detect_github_username(project)
Detect the GitHub username.
Looks in the following order:
1. github.user git config variable
2. git remote origin
3. $USER
4. ‘username’
Return type
str
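The fallback order can be sketched as a chain of checks. This illustration is hypothetical in several ways. detect_username takes the already-fetched values as arguments so it stays testable. The regular expression that extracts the owner from a GitHub remote URL is an assumption about what "git remote origin" means here. In practice, the inputs might come from `git config github.user`, `git remote get-url origin`, and os.environ.

```python
import re
from typing import Mapping, Optional

# Assumed pattern: matches both git@github.com:owner/repo.git and https://github.com/owner/repo.git
_GITHUB_REMOTE = re.compile(r"github\.com[:/](?P<user>[^/]+)/")


def detect_username(
    github_user_config: Optional[str],
    origin_url: Optional[str],
    env: Mapping[str, str],
) -> str:
    """Return the first available username, following the documented order."""
    if github_user_config:  # 1. github.user git config variable
        return github_user_config
    if origin_url:  # 2. owner part of the origin remote URL
        match = _GITHUB_REMOTE.search(origin_url)
        if match:
            return match.group("user")
    user = env.get("USER")  # 3. $USER
    if user:
        return user
    return "username"  # 4. literal fallback
```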
ballet.project.load_config(path=None, ascend=True)
User-facing function to load config from project code.
When no arguments are provided, the default behavior is to detect the calling code using introspection and to load a config object by ascending from the directory of the calling code. If this does not succeed, pass path directly.
Return type
LazySettings
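The introspection step amounts to finding the file of the calling frame and using its directory as the starting point for the upward search. Below is a minimal sketch using the standard inspect module. search_start is a hypothetical name, and the sketch only approximates what load_config does.

```python
import inspect
import os
from pathlib import Path
from typing import Optional, Union


def search_start(path: Optional[Union[str, os.PathLike]] = None) -> Path:
    """Return `path` if given, else the directory of the file that called us."""
    if path is not None:
        return Path(path)
    frame = inspect.currentframe()  # CPython provides frame objects
    try:
        # f_back is the caller's frame; co_filename is the file its code came from
        caller = frame.f_back
        return Path(caller.f_code.co_filename).resolve().parent
    finally:
        del frame  # avoid keeping a reference cycle through the frame
```

In an interactive session, co_filename is a pseudo-name such as '<stdin>'. That is one situation where passing path explicitly is the safer choice.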
ballet.project.load_config_at_path(path)
Load config at the exact path.
Parameters
path (Union[str, PathLike]) – path to config file
Returns
config dict
Return type
dict