Hypercube API

The Hypercube class is the central component of Cube Alchemy, providing methods for creating, querying, and analyzing multidimensional data.

Initialization

The Hypercube can be initialized in two ways:

# Option 1: Initialize with data
Hypercube(
  tables: Dict[str, pd.DataFrame] = None,
  rename_original_shared_columns: bool = True,
  apply_composite: bool = True,
  validate: bool = True,
  to_be_stored: bool = False
)

# Option 2: Initialize empty and load data later
Hypercube()
load_data(
  tables: Dict[str, pd.DataFrame],
  rename_original_shared_columns: bool = True,
  apply_composite: bool = True,
  validate: bool = True,
  to_be_stored: bool = False,
  reset_all: bool = False
)

The load_data() method can also be used to reload or update data in an existing hypercube.

Parameters:

  • tables: Dictionary mapping table names to pandas DataFrames
  • rename_original_shared_columns: Controls what happens to shared columns in source tables.

    • True (default): keep them, renamed as <column> (<table_name>). Enables per‑table counts/aggregations.
    • False: drop them from source tables (values remain in link tables). Saves time and memory if per‑table analysis isn’t needed.
  • apply_composite: Whether to automatically create composite keys for multi-column relationships
  • validate: Whether to validate schema and build trajectory cache during initialization
  • to_be_stored: Set to True if the hypercube will be serialized/stored (skips Default context state creation)
  • reset_all (load_data only): Whether to reset metric and query definitions, as well as registered functions, when reloading data

Examples:

import pandas as pd
from cube_alchemy import Hypercube

# Option 1: Initialize with data (keep renamed shared columns)
cube1 = Hypercube({
    'Product': products_df,
    'Customer': customers_df,
    'Sales': sales_df
}, rename_original_shared_columns=True)

# Option 2: Initialize empty first, then load data
cube2 = Hypercube()
cube2.load_data({
    'Product': products_df,
    'Customer': customers_df,
    'Sales': sales_df
}, rename_original_shared_columns=False)

# Reload data in an existing hypercube (e.g., when data is updated)
cube1.load_data({
    'Product': updated_products_df,
    'Customer': updated_customers_df,
    'Sales': updated_sales_df
})

# Reset metrics and queries when loading new data schema
cube2.load_data(new_data, reset_all=True)

Core Methods

visualize_graph

visualize_graph(layout_type: str = 'spring', w: int = 12, h: int = 8, full_column_names: bool = True) -> None

Visualize the relationships between tables as a network graph.

Parameters:

  • layout_type: Algorithm for graph layout. Options include:

    • 'spring' (default)
    • 'circular'
    • 'shell'
    • 'random'
    • 'kamada_kawai'
    • 'spectral'
    • 'planar'
    • 'spiral'
  • w: Width of the plot

  • h: Height of the plot

  • full_column_names: Whether to show renamed columns with table reference (e.g., column <table_name>) or just the original column names

Example:

# Visualize the data model relationships
cube.visualize_graph()

# Hide renamed column format for cleaner display
cube.visualize_graph('spring', w=20, h=12, full_column_names=False)

Note: If the rendered graph doesn't look right, rerun the visualization a couple of times or adjust the plot size.

set_context_state

set_context_state(context_state_name: str, base_context_state_name: str = 'Unfiltered') -> bool

Create a new context state for independent filtering environments.

Parameters:

  • context_state_name: Name for the new context state
  • base_context_state_name: Name of the base context state; the new context state is created as a copy of it.

Returns:

  • Boolean indicating success

Example:

# Create a new context state
cube.set_context_state('Marketing Analysis') # Creates a copy from the unfiltered context state --> self.context_states['Marketing Analysis'] = self.context_states['Unfiltered']

# Apply filters specific to this context
cube.filter({'channel': ['Email', 'Social']}, context_state_name='Marketing Analysis')
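
You can also branch a new state from one that already carries filters. The sketch below uses only the parameters documented above; the state names are illustrative:

# Branch from the already-filtered 'Marketing Analysis' state
cube.set_context_state('Email Campaigns', base_context_state_name='Marketing Analysis')

# Narrow the new state further; 'Marketing Analysis' and 'Unfiltered' are unaffected
cube.filter({'channel': ['Email']}, context_state_name='Email Campaigns')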

Shared columns: counts and distincts

When multiple tables share a column (for example, customer_id), Cube Alchemy builds a link table containing the distinct values of that column across all participating tables. This has two practical implications:

  • Counting on the shared column name (e.g., customer_id) uses the link table and therefore reflects distinct values in the current filtered context across all tables that share it.
  • Counting on a per-table renamed column (e.g., customer_id <orders> or customer_id <customers>) uses that table’s own column values. The result can differ from the shared-column count because it’s scoped to that single table's values and is not the cross-table distinct set.

Example idea:

  • Count distinct customer_id (shared) → distinct customers across all linked tables.
  • Count distinct customer_id <orders> → distinct customers present in the Orders table specifically.

Choose the one that matches your analytical intent: cross-table distincts via the shared column, or table-specific distincts via the renamed columns.

Note: per-table renamed columns are available only when rename_original_shared_columns=True; set it to False to drop them and reduce memory/processing if you don’t need that analysis.
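
To make the example idea above concrete, here is a minimal plain-pandas sketch of the two counting behaviors. It does not use the cube's query API, and the table contents are made up:

import pandas as pd

customers = pd.DataFrame({'customer_id': [1, 2, 3, 4]})
orders = pd.DataFrame({'customer_id': [1, 1, 2, 5]})

# Cross-table distincts: a link table holds the union of the shared column's values
link_values = pd.concat([customers['customer_id'], orders['customer_id']]).drop_duplicates()
print(link_values.nunique())             # 5 -> customers 1-4 plus 5, which appears only in Orders

# Table-scoped distincts: what counting on the renamed per-table column reflects
print(orders['customer_id'].nunique())   # 3 -> only customers present in the Orders table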