My project “Better tools to interpret complex Bambi regression models” was completed under the NumFOCUS organization, with Tomás Capretto and Osvaldo Martin as mentors. Before I describe the project, objectives, and work completed, I would like to thank Tomás and Osvaldo for their precious time and support throughout the summer. They were always available and timely in communicating over Slack and GitHub, and provided valuable feedback during code reviews. Additionally, I would like to thank NumFOCUS and the Google Summer of Code (GSoC) program for providing the opportunity to work on such an open source project over the summer. It has been an invaluable experience, and I look forward to contributing to open source projects in the future.
Project Description
The use of Bayesian modeling has increased significantly in academia and industry over the past years thanks to the development of high-quality, user-friendly open source probabilistic programming languages (PPLs) in Python and R. One of these tools is Bambi, a Python library built on top of the PyMC PPL that makes it easy to specify complex generalized linear multilevel models using a formula notation similar to that found in R. However, while the model-building portion of the Bayesian workflow has become easier, the interpretation of these models has not.
To aid in model interpretability, Bambi (before this project) had a sub-package, `plots`, that supported conditional adjusted predictions plots.
The original objective was to extend the existing plotting functionality for conditional adjusted predictions by supporting the plotting of posterior predictive samples, and to provide two additional plotting functions for predictive comparisons and predictive slopes. However, after discussion with my mentors, and taking inspiration from marginaleffects, it was decided that in addition to the plotting functions, the `plots` sub-package should also have functions that return the dataframe used for plotting. For example, calling `plot_slopes()` would plot the slopes, and calling `slopes()` would return the dataframe used to plot the slopes. To this end, the `plots` sub-package was renamed to `interpret` to better reflect all of the supported functionality. These three features allow Bambi modelers to compute and interpret three different “quantities of interest” in a more automatic and effective manner.
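To make the three quantities of interest concrete, here is a minimal pure-Python sketch. It is not Bambi's implementation or API: the posterior draws are made up, the model is a toy logistic regression with one covariate, and every function name (`predict`, `toy_prediction`, `toy_comparison`, `toy_slope`, `toy_slopes_table`) is hypothetical. It only illustrates the idea of computing a summary table that a plotting function could then draw.

```python
import math
from statistics import mean

# Made-up posterior draws (intercept, slope) for a toy logistic regression.
# These numbers are illustrative only and do not come from any fitted model.
DRAWS = [(-1.0, 0.8), (-1.2, 1.0), (-0.9, 0.9), (-1.1, 1.1)]


def predict(x, draws=DRAWS):
    """Predicted probability at covariate value x, one value per posterior draw."""
    return [1.0 / (1.0 + math.exp(-(a + b * x))) for a, b in draws]


def toy_prediction(x):
    """Adjusted prediction: posterior mean of the expected response at x."""
    return mean(predict(x))


def toy_comparison(x_low, x_high):
    """Predictive comparison: posterior mean of the difference in predictions."""
    return mean(hi - lo for lo, hi in zip(predict(x_low), predict(x_high)))


def toy_slope(x, eps=1e-4):
    """Predictive slope: finite-difference derivative of the prediction at x."""
    return mean((hi - lo) / eps for lo, hi in zip(predict(x), predict(x + eps)))


def toy_slopes_table(xs):
    """Return the rows a plotting function would draw, one row per x value."""
    return [{"x": x, "estimate": toy_slope(x)} for x in xs]
```

Separating the function that builds the table from the one that would plot it mirrors the `slopes()` / `plot_slopes()` pairing described above: the same rows can be inspected directly or handed to a plotting routine.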
Thus, the three main deliverables of this project were to add (or enhance) the following functions in the `interpret` sub-package. Additionally, for each feature, tests and documentation needed to be added:
- Support posterior predictive samples (pps) in `plot_predictions()`
  - Write tests
  - Add documentation
- Add `comparisons()` and `plot_comparisons()`
  - Write tests
  - Add documentation
- Add `slopes()` and `plot_slopes()`
  - Write tests
  - Add documentation
Work Completed
All three main deliverables (and their associated sub-deliverables) were completed (merged into the main branch of the Bambi repository) on time. In the table below, the name of the deliverable, link to the feature’s pull request (PR), and link to the documentation are provided.
Deliverable | Feature PR | Documentation PR |
---|---|---|
Allow plot_predictions to plot posterior predictive samples | PR #668 | PR #670 |
Add comparisons and plot_comparisons | PR #684 | PR #695 |
Add slopes and plot_slopes | PR #699 | PR #701 |
To quickly get a sense of the work completed over the GSoC program, it is probably best to view the documentation PRs. Each documentation PR consists of a Jupyter notebook that provides: (1) an explanation of why the new feature is useful in interpreting GLMs, (2) a brief overview of how the function computes the quantity of interest based on the user’s inputs, and (3) a demonstration of the function using different GLMs and datasets.
Future Work
As the PRs have been merged upstream, some Bambi users have already been utilizing the new sub-package. The feedback has been positive, and a GitHub issue has been opened to request additional functionality. Going forward, I will continue to contribute to Bambi, maintain the `interpret` sub-package, and interact with the Bambi community.