Transformation
Transformations are designed to add analytical features to a query DataFrame. Think of it as post-query augmentation: you define metrics and queries as usual, then apply transformers to compute things like moving averages, z-scores, Pareto flags, ranks, forecasts, or k-means clusters on the returned table.
We apply a transformation to the DataFrame, which (ideally) enriches it by adding one or more new analytical features. After the transformation, the original DataFrame might be different from what it was before the transformation (e.g., it has a new column or a column was updated).
When to use it
-
You want quick analytical features derived from the queried data. Applying them through Derived Metrics is infeasible or difficult to implement.
-
You want pluggable, composable transformations that can be reused across queries.
With transformers you can easily implement standard analyses such as:
-
moving average
-
zscore
-
cumulative
-
pareto
-
linear_regression
-
kmeans
-
etc.
Interface design
Designed for extensibility as a small registry-based interface:
-
Registry: register transformers by name using
register_transformer(name, transformer). -
Definition: attach a transformer to a query via
define_transformation(query_name, transformer, params). -
Execution: apply at runtime with
transform(df, transformer, **params)or use stored configs to transform a query result. -
Configuration: store/retrieve per-query transformation configs with
list_transformations,get_transformation_config, anddelete_transformation.
A transformer can be either:
-
An object implementing
Transformerclass (ABC) with a methodtransform(df: pd.DataFrame, **params) -> pd.DataFrame. -
A plain callable
fn(df: pd.DataFrame, **params) -> pd.DataFrame.
Default transformers
Cube Alchemy ships with a few defaults that are auto-registered:
-
moving_average
-
zscore
You can inspect these under cube_alchemy/transformation/default_transformers. You can use them directly, or register your own custom transformers.
Examples
Register a custom transformer and attach it to a query:
def add_simple_rank(df, by: str, ascending: bool = False, col: str = 'Simple Rank'):
df = df
df[col] = df[by].rank(ascending=ascending, method='dense').astype(int)
return df
cube.register_transformer('rank', add_simple_rank)
cube.define_transformation('Sales by Country and Category', transformer='rank', params={'by': 'Revenue'})
# Later during execution, the newly generated column will be available
cube.query('Sales by Country and Category')
# This is a very simple case which can also be achieved using expressions
cube.define_derived_metric(
name="Revenue Rank",
expression='@pd.Series([Revenue]).rank(ascending=True, method="dense")'
)
# But they operate in a different way and also on different steps on the query execution pipeline.
# Derived Metrics create a new column on the DataFrame based on Dimensions and Metrics. Transformers transform the DataFrame using an arbitrary transformation function (it can delete rows, rename columns, add columns, change order, fetch external data through an api and merge it, etcetc.).
Notes:
-
Each query can configure at most one instance of the same transformer.
-
When designing custom transformers, bear in mind that they may modify the query result, use with discretion.