Skip to content

Conversation

JeanVanDyk
Copy link

@JeanVanDyk JeanVanDyk commented May 28, 2025

New Feature: InterventionTimeEstimator for Unknown Treatment Timing

This PR introduces a new model, InterventionTimeEstimator, designed to estimate when an intervention has an effect in a time series — especially in cases where the exact treatment time is unknown.


Use Case

  • Enhances the Interrupted Time Series (ITS) feature by providing a way to infer the likely time of intervention
  • Supports scenarios where the treatment onset is uncertain or delayed
  • Helps identify lagged effects between intervention and observable outcomes

This addition gives users a flexible, Bayesian approach to model treatment timing uncertainty directly within the CausalPy framework.


Notes / Open Questions

  • Where should this model fit into the CausalPy workflow?
    I’m unsure whether InterventionTimeEstimator should be integrated within the InterruptedTimeSeries (ITS) feature, or used as a standalone tool.
    This affects how a user-defined model could be supported.
    Depending on the intended usage, I can propose a solution to allow users to inject their own custom models.

  • Custom model usage — base vs. intervention
    Should users be able to:

    • Provide a custom model to represent the base time series (e.g. intercept, trend, seasonality)?
    • Provide a custom model to capture the intervention effect (e.g. shape or dynamics of the post-switch impact)?
    • Or support both?
  • Covariates
    I considered adding time-varying covariates to improve the fit. Would that be useful or out of scope?

  • Multivariate Time Series
    It's relatively easy to extend the model for multivariate input. Let me know if this is something you'd like to see.


Model Summary

  • Inputs:

    • t: 1D array of time points
    • y: 1D array of observed values
    • Optional span: restricts the window for switchpoint detection
    • Optional coords: can include seasons for modeling periodic effects
    • effect: list of components, e.g. "trend", "level", "impulse"
    • grain_season: number of time steps per season
  • Model Components:

    • Time series is modeled as:
      intercept + trend + seasonal
    • A Uniform prior is used to place a switchpoint
    • A sigmoid curve models the onset of the effect after the switchpoint, applied to the selected effect components

Feel free to share any feedback or suggestions! I'm happy to refine the model or explore extensions based on your input.


📚 Documentation preview 📚: https://causalpy--480.org.readthedocs.build/en/480/

@JeanVanDyk JeanVanDyk added enhancement New feature or request help wanted Extra attention is needed labels May 28, 2025
Copy link

codecov bot commented May 30, 2025

Codecov Report

❌ Patch coverage is 95.79439% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.19%. Comparing base (77f8d59) to head (07ce191).

Files with missing lines Patch % Lines
causalpy/pymc_models.py 93.90% 5 Missing ⚠️
causalpy/experiments/interrupted_time_series.py 96.33% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #480      +/-   ##
==========================================
- Coverage   95.19%   95.19%   -0.01%     
==========================================
  Files          28       28              
  Lines        2457     2643     +186     
==========================================
+ Hits         2339     2516     +177     
- Misses        118      127       +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@JeanVanDyk JeanVanDyk force-pushed the intervention-time-estimator branch from b7b91de to ee701f2 Compare May 30, 2025 09:50
@JeanVanDyk
Copy link
Author

Note

In my last PR, I added functionality allowing users to specify priors for the effect of the intervention. This provides more flexibility for Bayesian modeling and allows users to incorporate domain knowledge directly into the inference process.

@drbenvincent drbenvincent marked this pull request as draft May 31, 2025 05:28
Copy link
Collaborator

@drbenvincent drbenvincent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @JeanVanDyk. I'm very excited about the new addition, thanks for putting it together. Bear with me if I'm sometimes slow to reply - busy dad life!

At this point, could I request that you put together an ipynb file that showcases the functionality. I'm assuming this PR will evolve and go through a couple of iterations, so this notebook will end up being a new docs page to help users understand the new functionality. If you edit the ./docs/source/notebooks/index.md file, you can add the notebook name under the interrupted time series section. That way the notebook will render if you either build the docs locally, but it should also render in the remote docs preview build.

Did you still want feedback on the GitHub Discussion at this point?

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@JeanVanDyk
Copy link
Author

Simplified .fit() API

The InterventionTimeEstimator model has been streamlined. The .fit() method now only takes:

  • X: covariates
  • y: observed values
  • coords: coordinate labels for dimensions

Cleaner and more straightforward to use.


Optional Prior Specification

You can now pass a priors dictionary at init time to control intervention effects:

  • "level" and "trend"[mu, sigma] (Normal prior)
  • "impulse"[mu, sigma1, sigma2]
    • mu: amplitude mean (Normal)
    • sigma1: amplitude std (Normal)
    • sigma2: decay rate std (HalfNormal)

If an effect is omitted or an empty list is provided, the model uses default priors.


Support for Inferred Intervention Time

InterruptedTimeSeries now works with models that infer the treatment time.

  • Before fit():

    • If treatment_time is None or a Tuple, the model uses the full dataset to infer the switchpoint.
    • treatment_time is passed to the model via set_time_range().
  • After fit():

    • The inferred switchpoint is retrieved.
    • data_pre is reset accordingly for prediction, as in the standard flow.

Timeline Requirement

The time column t must be the last column of X (excluding y).
This is how the model locates the timeline internally.


Extras and Ongoing Work

  • Included a demo notebook for quick experimentation.
  • Currently working on:
    • More robust input_validation() inside InterruptedTimeSeries.
    • Supporting timelines expressed as datetime objects.

Copy link
Contributor

@NathanielF NathanielF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good work! Thanks so much for adding it.

I've added a comment or two but consider them optional. Approving now. cc @drbenvincent



class ModelException(Exception):
"""Exception raised given when there is some error in user-provided model"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's fine. I guess the docstring just threw me a bit, seemed redundant, but i get the idea of it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

It's a nit but you can add ";" after plots to prevent the axes being printed.

Optional - but it might be nice to see (or just discuss) a "failure mode" of the model e.g. when applied to a timeseries with no change point.... i'm guessing it'll show uncertainty? That might be as instructive as highlighting the usage.

@drbenvincent
Copy link
Collaborator

Aiming on reviewing this today. Sorry about the delays!

Copy link
Collaborator

@drbenvincent drbenvincent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So a few high level things. Sorry if this comes across as incredibly annoying, but I want to make sure we are adding new features so they are set up well for future maintainability.

Re-packaging this as interrupted time series and change point detection

At the moment we're packaging this as two flavours of interrupted time series, and that is how we've been thinking about it. But I wonder if it might be simpler, more understandable, and potentially more accurate if we package up as a) traditional ITS that we have before, b) new change point detection functionality. What do you think?

(Tagging @NathanielF for thoughts)

Perfecting the handler architecture

If we stick with the current approach, I think there are some improvements to the handler architecture.

Current Flow:

  • Handler preprocessing → splits data
  • Main init → builds matrices from split data, fits model
  • Handler postprocessing → splits data again + builds post-treatment matrices

Better Flow Would Be:

  • Handler preprocessing → prepares training data appropriately
  • Main init → builds matrices, fits model
  • Handler postprocessing → uses the already established data splits

So we could do a refactor... the handlers should probably own the entire data management lifecycle:

[Optional] Support for Prior class

Now that we've merged support for the custom Prior class, it would be really cool to use that here. That would also need some updates to the docs. However I'm also fine for that to be a separate follow up PR that we do in the near future.

LEGEND_FONT_SIZE = 12


class TreatmentTimeHandler(ABC):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy any scikit-learn support to be a new issue. And it can be a low priority one because it does make sense to focus our finite time on the Bayesian side of things.

This model implements three types of changepoints: level shift, trend change, and impulse response.
In words:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a new line before the bullet points in order for the rendered docs to show properly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see the division of ITS and a more general changepoint detection module being conceptually quite appealing.

If it's an easy lift to do, maybe do so now. If it's a big push, I'd say let's merge this one and do a follow on PR.

If i understand you correctly it's just restructuring Jean's existing code so the ITS class calls an independent CP clasd. Not adding new functionality, right?

Copy link
Collaborator

@drbenvincent drbenvincent Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I could tell it would involve:

  • Leaving the InterruptedTimeSeries experiment as it was.
  • Creating a new ChangePointDetection class. This would basically be a near duplication of the InterruptedTimeSeries class but have your new functionality in it
  • Keeping the InterventionTimeEstimator model class, but perhaps renaming it to focus more on change point detection.
  • The TreatmentTimeHandler class hierarchy would disappear. For example, the KnownTreatmentTimeHandler would be irrelevant because it's all handled by the InterruptedTimeSeries experiment class. And the UnknownTreatmentTimeHandler would be merged into the new ChangePointDetection class.
  • So this would in a way be winding the clock back to when we had two different interrupted time series classes but the difference is that the new one is 'sold' as change point detection. I do feel a bit bad for going back and forth with these suggestions, but my feeling is:
    • This increased separation will be better for maintainability
    • It would be clearer to end-users
    • The term 'change point detection' is appealing and people just know what it means.
  • Extracting out the new content in the interrupted time series example notebook into a new change point detection notebook. Slight updates to the new notebook to push the 'change point' aspect. Fine to mention the connections to interrupted time series. And adding a new Change Point sub-section of the "how to" docs page.
  • Updating the homepage to highlight the new feature - but I can do that in a different PR.

I'm aware that's not a trivial amount of work. But conceptually it's easy in that it doesn't involve solving any new problems. It can probably be facilitated by use of an LLM.

Does that seems reasonable @JeanVanDyk, or onerous. If you don't have capacity then I could give it a stab. I'm aware that I've been a bit demanding.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @drbenvincent @NathanielF,

Thanks for your reviews! I also think it would be a nice experiment to include in CausalPy’s package. I’ll start working on it this week, though I can’t promise I’ll have enough time to finish before moving on to another project. I’ll keep you posted on my progress.

Also, if I do get enough time, it might be worth looking into more sophisticated models. I could start digging into the literature to get a sense of the state of the art in Bayesian changepoint modeling.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are on board and happy with the change in 'packaging', then I'm happy to give it a go if you are capacity limited. Let me know @JeanVanDyk

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants