-
Notifications
You must be signed in to change notification settings - Fork 82
Draft (new feature) : Model to estimate when a intervention had effect #480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Draft (new feature) : Model to estimate when a intervention had effect #480
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #480 +/- ##
==========================================
- Coverage 95.19% 95.19% -0.01%
==========================================
Files 28 28
Lines 2457 2643 +186
==========================================
+ Hits 2339 2516 +177
- Misses 118 127 +9 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…over intervention's distributions
b7b91de
to
ee701f2
Compare
NoteIn my last PR, I added functionality allowing users to specify priors for the effect of the intervention. This provides more flexibility for Bayesian modeling and allows users to incorporate domain knowledge directly into the inference process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @JeanVanDyk. I'm very excited about the new addition, thanks for putting it together. Bear with me if I'm sometimes slow to reply - busy dad life!
At this point, could I request that you put together an ipynb
file that showcases the functionality. I'm assuming this PR will evolve and go through a couple of iterations, so this notebook will end up being a new docs page to help users understand the new functionality. If you edit the ./docs/source/notebooks/index.md
file, you can add the notebook name under the interrupted time series section. That way the notebook will render if you either build the docs locally, but it should also render in the remote docs preview build.
Did you still want feedback on the GitHub Discussion at this point?
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Simplified
|
…hout given time range
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good work! Thanks so much for adding it.
I've added a comment or two but consider them optional. Approving now. cc @drbenvincent
|
||
|
||
class ModelException(Exception): | ||
"""Exception raised given when there is some error in user-provided model""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's fine. I guess the docstring just threw me a bit, seemed redundant, but i get the idea of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
It's a nit but you can add ";" after plots to prevent the axes being printed.
Optional - but it might be nice to see (or just discuss) a "failure mode" of the model e.g. when applied to a timeseries with no change point.... i'm guessing it'll show uncertainty? That might be as instructive as highlighting the usage.
Aiming on reviewing this today. Sorry about the delays! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So a few high level things. Sorry if this comes across as incredibly annoying, but I want to make sure we are adding new features so they are set up well for future maintainability.
Re-packaging this as interrupted time series and change point detection
At the moment we're packaging this as two flavours of interrupted time series, and that is how we've been thinking about it. But I wonder if it might be simpler, more understandable, and potentially more accurate if we package up as a) traditional ITS that we have before, b) new change point detection functionality. What do you think?
(Tagging @NathanielF for thoughts)
Perfecting the handler architecture
If we stick with the current approach, I think there are some improvements to the handler architecture.
Current Flow:
- Handler preprocessing → splits data
- Main init → builds matrices from split data, fits model
- Handler postprocessing → splits data again + builds post-treatment matrices
Better Flow Would Be:
- Handler preprocessing → prepares training data appropriately
- Main init → builds matrices, fits model
- Handler postprocessing → uses the already established data splits
So we could do a refactor... the handlers should probably own the entire data management lifecycle:
[Optional] Support for Prior
class
Now that we've merged support for the custom Prior
class, it would be really cool to use that here. That would also need some updates to the docs. However I'm also fine for that to be a separate follow up PR that we do in the near future.
LEGEND_FONT_SIZE = 12 | ||
|
||
|
||
class TreatmentTimeHandler(ABC): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy any scikit-learn support to be a new issue. And it can be a low priority one because it does make sense to focus our finite time on the Bayesian side of things.
This model implements three types of changepoints: level shift, trend change, and impulse response. | ||
In words: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a new line before the bullet points in order for the rendered docs to show properly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could see the division of ITS and a more general changepoint detection module being conceptually quite appealing.
If it's an easy lift to do, maybe do so now. If it's a big push, I'd say let's merge this one and do a follow on PR.
If i understand you correctly it's just restructuring Jean's existing code so the ITS class calls an independent CP clasd. Not adding new functionality, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I could tell it would involve:
- Leaving the
InterruptedTimeSeries
experiment as it was. - Creating a new
ChangePointDetection
class. This would basically be a near duplication of theInterruptedTimeSeries
class but have your new functionality in it - Keeping the
InterventionTimeEstimator
model class, but perhaps renaming it to focus more on change point detection. - The
TreatmentTimeHandler
class hierarchy would disappear. For example, theKnownTreatmentTimeHandler
would be irrelevant because it's all handled by theInterruptedTimeSeries
experiment class. And theUnknownTreatmentTimeHandler
would be merged into the newChangePointDetection
class. - So this would in a way be winding the clock back to when we had two different interrupted time series classes but the difference is that the new one is 'sold' as change point detection. I do feel a bit bad for going back and forth with these suggestions, but my feeling is:
- This increased separation will be better for maintainability
- It would be clearer to end-users
- The term 'change point detection' is appealing and people just know what it means.
Extracting out the new content in the interrupted time series example notebook into a new change point detection notebook. Slight updates to the new notebook to push the 'change point' aspect. Fine to mention the connections to interrupted time series. And adding a new Change Point sub-section of the "how to" docs page.- Updating the homepage to highlight the new feature - but I can do that in a different PR.
I'm aware that's not a trivial amount of work. But conceptually it's easy in that it doesn't involve solving any new problems. It can probably be facilitated by use of an LLM.
Does that seems reasonable @JeanVanDyk, or onerous. If you don't have capacity then I could give it a stab. I'm aware that I've been a bit demanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your reviews! I also think it would be a nice experiment to include in CausalPy’s package. I’ll start working on it this week, though I can’t promise I’ll have enough time to finish before moving on to another project. I’ll keep you posted on my progress.
Also, if I do get enough time, it might be worth looking into more sophisticated models. I could start digging into the literature to get a sense of the state of the art in Bayesian changepoint modeling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are on board and happy with the change in 'packaging', then I'm happy to give it a go if you are capacity limited. Let me know @JeanVanDyk
New Feature:
InterventionTimeEstimator
for Unknown Treatment TimingThis PR introduces a new model,
InterventionTimeEstimator
, designed to estimate when an intervention has an effect in a time series — especially in cases where the exact treatment time is unknown.Use Case
This addition gives users a flexible, Bayesian approach to model treatment timing uncertainty directly within the CausalPy framework.
Notes / Open Questions
Where should this model fit into the CausalPy workflow?
I’m unsure whether
InterventionTimeEstimator
should be integrated within theInterruptedTimeSeries
(ITS) feature, or used as a standalone tool.This affects how a user-defined model could be supported.
Depending on the intended usage, I can propose a solution to allow users to inject their own custom models.
Custom model usage — base vs. intervention
Should users be able to:
Covariates
I considered adding time-varying covariates to improve the fit. Would that be useful or out of scope?
Multivariate Time Series
It's relatively easy to extend the model for multivariate input. Let me know if this is something you'd like to see.
Model Summary
Inputs:
t
: 1D array of time pointsy
: 1D array of observed valuesspan
: restricts the window for switchpoint detectioncoords
: can includeseasons
for modeling periodic effectseffect
: list of components, e.g."trend"
,"level"
,"impulse"
grain_season
: number of time steps per seasonModel Components:
intercept + trend + seasonal
effect
componentsFeel free to share any feedback or suggestions! I'm happy to refine the model or explore extensions based on your input.
📚 Documentation preview 📚: https://causalpy--480.org.readthedocs.build/en/480/