Stats


core.py

Core statistical utilities: bootstrapped metrics, return calculations, correlation estimation, and performance scorecards.

bootstrap_metric(returns: np.ndarray, metric: str | Callable = 'sharpe_ratio', n_bootstraps: int = 1000, n_jobs: int = 2, min_length: int = 5, rng: None | Generator = None, **kwargs: dict) -> np.ndarray

Compute a metric on bootstrap resamples of the returns

Parameters:
  • returns (ndarray) –

    a vector-like object of returns

  • metric (str | Callable, default: 'sharpe_ratio' ) –

    input metric, either the name of a metric function (without 'compute_' prefix) defined in the 'stats' module or a callable that accepts returns as the first argument.

  • n_bootstraps (int, default: 1000 ) –

    number of bootstrap samples

  • n_jobs (int, default: 2 ) –

    number of parallel jobs in the computation

  • min_length (int, default: 5 ) –

    minimum size of bootstrap sample

  • rng (None | Generator, default: None ) –

    numpy random Generator

  • kwargs (dict, default: {} ) –

    additional arguments passed to the metric function

Returns:
  • ndarray

    the array containing the bootstrapped results

Example:

>>> import numpy as np
>>> returns = np.random.default_rng().normal(loc=0, scale=0.01, size=100)
>>> results = bootstrap_metric(returns, metric="sharpe_ratio", n_bootstraps=100)
>>> print(results.shape)
(100,)

Using a custom metric function:
>>> def mean_return(x):
...     return np.mean(x)
>>> results = bootstrap_metric(returns, metric=mean_return, n_bootstraps=100)
>>> np.mean(results)
0.0005  # (example output)
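
The resampling idea can be sketched without the library. The block below is a minimal illustration, not the implementation of bootstrap_metric: the names simple_bootstrap and sharpe_ratio are hypothetical, it uses plain i.i.d. resampling with replacement, and it ignores min_length and n_jobs, whose exact semantics are not described here. It also shows one way to turn the bootstrapped values into a percentile interval.

```python
import numpy as np


def sharpe_ratio(returns):
    """Unannualized Sharpe ratio (hypothetical stand-in for the library metric)."""
    std = np.std(returns, ddof=1)
    return np.mean(returns) / std if std > 0 else np.nan


def simple_bootstrap(returns, metric, n_bootstraps=1000, rng=None, **kwargs):
    """Evaluate `metric` on i.i.d. resamples of `returns` (assumed scheme)."""
    rng = np.random.default_rng() if rng is None else rng
    returns = np.asarray(returns, dtype=float)
    results = np.empty(n_bootstraps)
    for i in range(n_bootstraps):
        # Draw a sample of the same length, with replacement.
        sample = rng.choice(returns, size=returns.size, replace=True)
        results[i] = metric(sample, **kwargs)
    return results


rng = np.random.default_rng(42)  # seeded so the run is reproducible
returns = rng.normal(loc=0.0, scale=0.01, size=100)
results = simple_bootstrap(returns, sharpe_ratio, n_bootstraps=200, rng=rng)

# Percentile interval covering the central 95% of bootstrapped values.
lower, upper = np.percentile(results, [2.5, 97.5])
```

Passing a seeded Generator through rng makes repeated runs return identical results, which is the usual reason to expose such a parameter.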

compute_pct_returns(x: pd.Series) -> float

Compute the percentage return of a series

Parameters:
  • x (Series) –

    Input pandas Series representing prices or values over time

Returns:
  • float

    The percentage return computed as (last / first) - 1, or NaN if the first value is zero

Example:

>>> import pandas as pd
>>> s = pd.Series([100, 110])
>>> compute_pct_returns(s)
0.1  # (approximately)

>>> s_zero = pd.Series([0, 110])
>>> compute_pct_returns(s_zero)
nan
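
The formula and its zero guard can be written out directly. This is a sketch of the behaviour described above under the name pct_return_sketch (hypothetical), not the library's source.

```python
import pandas as pd


def pct_return_sketch(x: pd.Series) -> float:
    """Return (last / first) - 1, or NaN when the first value is zero."""
    first, last = x.iloc[0], x.iloc[-1]
    if first == 0:
        # Division by zero is undefined, so signal it with NaN.
        return float("nan")
    return float(last / first - 1)


up = pct_return_sketch(pd.Series([100, 110]))
undefined = pct_return_sketch(pd.Series([0, 110]))
```

The first result equals 0.1 up to floating-point rounding, so comparisons against it are safer with a tolerance than with ==.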

compute_returns(x: pd.Series) -> float

Compute the absolute return of a series (last value minus first value)

Parameters:
  • x (Series) –

    Input pandas Series representing prices or values over time

Returns:
  • float

    The absolute return computed as last value minus first value

Example:

>>> import pandas as pd
>>> s = pd.Series([100, 105, 110])
>>> compute_returns(s)
10.0
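
Because the function maps a Series to a single float, it fits naturally as a per-group aggregator. The sketch below uses a hypothetical stand-in, abs_return_sketch, and made-up prices to show the absolute return per calendar year.

```python
import pandas as pd


def abs_return_sketch(x: pd.Series) -> float:
    """Last value minus first value (stand-in for compute_returns)."""
    return float(x.iloc[-1] - x.iloc[0])


# Illustrative prices spanning two calendar years.
prices = pd.Series(
    [100.0, 105.0, 110.0, 108.0, 120.0],
    index=pd.to_datetime(
        ["2020-03-02", "2020-06-01", "2020-12-31", "2021-01-04", "2021-12-31"]
    ),
)

# One absolute return per year: group rows by the year of their timestamp.
per_year = prices.groupby(prices.index.year).apply(abs_return_sketch)
```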

compute_robust_distance(corr: pd.DataFrame) -> pd.DataFrame

Compute a robust distance metric from a correlation matrix

Parameters:
  • corr (DataFrame) –

    input correlation matrix

Returns:
  • DataFrame

    robust distance

Example:

>>> import pandas as pd
>>> import numpy as np
>>> corr = pd.DataFrame([[1, 0.5], [0.5, 1]], columns=["A", "B"], index=["A", "B"])
>>> dist = compute_robust_distance(corr)
>>> dist.loc["A", "B"]
0.7071067811865476
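
The formula behind the robust distance is not stated here. One construction that reproduces the example output is the "distance of distances": convert each correlation to d = sqrt(0.5 * (1 - corr)), then take the Euclidean distance between columns of that matrix. The sketch below assumes this construction, and robust_distance_sketch is a hypothetical name.

```python
import numpy as np
import pandas as pd


def robust_distance_sketch(corr: pd.DataFrame) -> pd.DataFrame:
    """Euclidean distance between columns of the correlation-distance matrix.

    Assumed construction; it matches the documented example but may not be
    the library's exact formula.
    """
    # Step 1: correlation distance, clipped to guard against tiny negatives.
    dist = np.sqrt(np.clip(0.5 * (1.0 - corr.to_numpy(dtype=float)), 0.0, None))
    # Step 2: diff[k, i, j] = dist[k, i] - dist[k, j]; sum squares over k.
    diff = dist[:, :, None] - dist[:, None, :]
    robust = np.sqrt((diff ** 2).sum(axis=0))
    return pd.DataFrame(robust, index=corr.index, columns=corr.columns)


corr = pd.DataFrame([[1, 0.5], [0.5, 1]], columns=["A", "B"], index=["A", "B"])
dist = robust_distance_sketch(corr)
```

For a correlation of 0.5, step 1 gives 0.5 and step 2 gives sqrt(0.5^2 + 0.5^2), about 0.7071, in line with the value shown in the example.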

estimate_correlation(returns: pd.DataFrame, method: str = 'empyrical', rolling_window: int = 5, n_bootstraps: int = 100, n_jobs: int = 2, min_length: int = 5, rng: None | Generator = None) -> pd.DataFrame

Estimate a correlation matrix using a rolling window and bootstrap procedure.

Parameters:
  • returns (DataFrame) –

    DataFrame of returns with assets as columns

  • method (str, default: 'empyrical' ) –

    estimation method: one of 'empyrical', 'glassocv', or 'ledoit_wolf'

  • rolling_window (int, default: 5 ) –

    Window size for rolling returns computation.

  • n_bootstraps (int, default: 100 ) –

    Number of bootstrap samples.

  • n_jobs (int, default: 2 ) –

    Number of parallel jobs.

  • min_length (int, default: 5 ) –

    Minimum block length for bootstrap.

  • rng (None | Generator, default: None ) –

    Random generator for reproducibility.

Returns:
  • DataFrame

    DataFrame containing the estimated correlation matrix

Example:

>>> import pandas as pd
>>> import numpy as np
>>> returns = pd.DataFrame(np.random.normal(0, 0.01, (100, 3)),
...                        columns=["A", "B", "C"])
>>> corr = estimate_correlation(returns, method="ledoit_wolf", n_bootstraps=10)
>>> corr.shape
(3, 3)
>>> corr.columns.tolist()
['A', 'B', 'C']
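
The combination of a rolling window and bootstrapping can be sketched as follows. This is an assumption-laden illustration, not the library's algorithm: bootstrap_correlation_sketch is a hypothetical name, rolling returns are taken as rolling sums, rows are resampled i.i.d. (which ignores the serial dependence that overlapping windows introduce), and only the plain sample correlation is covered.

```python
import numpy as np
import pandas as pd


def bootstrap_correlation_sketch(returns, rolling_window=5, n_bootstraps=100, rng=None):
    """Average the sample correlation over bootstrap resamples of rolling returns."""
    rng = np.random.default_rng() if rng is None else rng
    # Rolling sums smooth the returns; the first window - 1 rows are incomplete.
    rolled = returns.rolling(rolling_window).sum().dropna().to_numpy()
    n_obs, n_assets = rolled.shape
    total = np.zeros((n_assets, n_assets))
    for _ in range(n_bootstraps):
        # Resample whole rows so cross-asset relationships stay intact.
        idx = rng.integers(0, n_obs, size=n_obs)
        total += np.corrcoef(rolled[idx], rowvar=False)
    return pd.DataFrame(
        total / n_bootstraps, index=returns.columns, columns=returns.columns
    )


rng = np.random.default_rng(0)
sample_returns = pd.DataFrame(rng.normal(0, 0.01, (100, 3)), columns=["A", "B", "C"])
sample_corr = bootstrap_correlation_sketch(sample_returns, n_bootstraps=10, rng=rng)
```

Averaging over resamples reduces the influence of any single unusual stretch of observations on the estimate.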

get_scorecard(portfolio: pd.DataFrame, freq: str = 'Y') -> pd.DataFrame

Generate a performance scorecard of portfolio metrics aggregated by period.

Parameters:
  • portfolio (DataFrame) –

    DataFrame containing at least one of the columns 'returns' or 'pnl'. If one is missing, it is computed internally

  • freq (str, default: 'Y' ) –

    Resampling frequency: 'Y' (year), 'Q' (quarter), or 'M' (month)

Returns:
  • DataFrame

    DataFrame with metrics such as Sharpe Ratio, Sortino Ratio, Max Drawdown, VaR, CVaR, and Final P&L for each period plus a total summary.

Example:

>>> import pandas as pd
>>> import numpy as np
>>> dates = pd.date_range("2020-01-01", periods=60, freq="D")  # January and February 2020
>>> pnl = np.cumsum(np.random.normal(0, 1, size=60))
>>> df = pd.DataFrame({"pnl": pnl}, index=dates)
>>> scorecard = get_scorecard(df, freq="M")
>>> print(scorecard)  # (example output; values vary with the random draw)
Period         2020-M1   2020-M2  Total
Sharpe-Ratio    0.10      0.12     0.11
Sortino-Ratio   0.15      0.18     0.16
MaxDD          -0.25     -0.30    -0.28
VaR            -0.05     -0.04    -0.045
CVaR           -0.07     -0.06    -0.065
FinalP&L       12.34     14.56    26.90
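
The per-period aggregation can be sketched with pandas alone. Everything in this block is an assumption made for illustration: scorecard_sketch is a hypothetical name, 'pnl' is treated as cumulative P&L, the Sharpe ratio is unannualized, VaR and CVaR use the 5% level, and the Sortino ratio is omitted. The library's definitions and labels may differ.

```python
import numpy as np
import pandas as pd


def summarize(x: pd.Series) -> pd.Series:
    """Metrics for one block of per-day P&L changes (assumed definitions)."""
    std = x.std(ddof=1)
    cum = x.cumsum()
    var_5 = x.quantile(0.05)
    return pd.Series(
        {
            "Sharpe-Ratio": x.mean() / std if std > 0 else np.nan,
            "MaxDD": (cum - cum.cummax()).min(),  # worst drop from a running peak
            "VaR": var_5,
            "CVaR": x[x <= var_5].mean(),  # mean of the worst 5% of days
            "FinalP&L": x.sum(),
        }
    )


def scorecard_sketch(pnl: pd.Series) -> pd.DataFrame:
    """One column per calendar month plus a 'Total' column."""
    daily = pnl.diff().dropna()  # cumulative P&L -> per-day changes
    months = daily.index.to_period("M")
    columns = {str(m): summarize(chunk) for m, chunk in daily.groupby(months)}
    columns["Total"] = summarize(daily)
    return pd.DataFrame(columns)


rng = np.random.default_rng(1)
dates = pd.date_range("2020-01-01", periods=60, freq="D")
pnl = pd.Series(np.cumsum(rng.normal(0, 1, size=60)), index=dates)
card = scorecard_sketch(pnl)
```

The "Total" column is computed from the full series rather than by combining the monthly columns, since ratios and drawdowns do not add up across periods.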