changes to default EBM parameters

paulbkoch committed Oct 21, 2024
1 parent d98b806 commit 1dd13a4
Showing 3 changed files with 40 additions and 36 deletions.
4 changes: 4 additions & 0 deletions docs/benchmarks/ebm-benchmark.ipynb
@@ -963,9 +963,13 @@
"#results_df = results_df[results_df['task'] != 'har']\n",
"#results_df = results_df[results_df['task'] != 'cnae-9']\n",
"#results_df = results_df[results_df['task'] != 'MiceProtein']\n",
"#\n",
"#results_df = results_df[results_df['type'] != 'binary']\n",
"#results_df = results_df[results_df['type'] != 'multiclass']\n",
"#results_df = results_df[results_df['type'] != 'regression']\n",
"#\n",
"#results_df = results_df[(results_df['method'] != 'ebm') | (results_df['meta'] == '{}')]\n",
"#\n",
"#results_df = results_df[\n",
"# (results_df['task'] == 'CIFAR_10') | \n",
"# (results_df['task'] == 'Fashion-MNIST') | \n",
32 changes: 16 additions & 16 deletions docs/interpret/hyperparameters.md
@@ -6,11 +6,11 @@ The parameters below are ordered by tuning importance, with the most important h


## smoothing_rounds
-default: 100
+default: 75 (classification), 500 (regression)

-hyperparameters: [0, 50, 100, 200, 500, 1000]
+hyperparameters: [0, 25, 50, 75, 100, 150, 200, 350, 500, 750, 1000, 1500, 2000, 4000]

-guidance: This is an important hyperparameter to tune. The optimal smoothing_rounds value will vary depending on the dataset's characteristics. Adjust based on the prevalence of smooth feature response curves.
+guidance: This is an important hyperparameter to tune. Classification seems to prefer a dataset-dependent number centered around 75. Regression seems to prefer more smoothing_rounds; the default of 500 was chosen with fitting time in mind, and even higher values seem to improve model performance.
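
A minimal sketch of acting on this guidance (assuming standard `interpret` imports; the values are taken from the ranges above):

```python
from interpret.glassbox import ExplainableBoostingClassifier, ExplainableBoostingRegressor

# Classification: a dataset-dependent value centered around 75.
clf = ExplainableBoostingClassifier(smoothing_rounds=75)

# Regression: more smoothing tends to help; 500 is the new default, and even
# higher values may improve accuracy at the cost of longer fitting time.
reg = ExplainableBoostingRegressor(smoothing_rounds=1000)
```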

## learning_rate
default: 0.015 (classification), 0.04 (regression)
@@ -22,11 +22,11 @@ guidance: This is an important hyperparameter to tune. The conventional wisdom i
## interactions
default: 0.9

-ideal: As many as possible
+ideal: As many as possible within interpretability limits.

hyperparameters: [0.0, 0.9, 0.95, 0.99, 100, 250, 1000]

-guidance: Introducing more interactions tends to improve model accuracy. Values between 0 and LESS than 1.0 are interpreted as percentages of the number of features. For example, a dataset with 100 features and an interactions value of 0.7 will automatically detect and use 70 interactions. Values of 1 or higher indicate the exact number of interactions to be detected, so for example 1 would create 1 interaction term, and 50 would create 50.
+guidance: Generally, this parameter should be chosen based on interpretability considerations, as having too many interactions makes the model less interpretable. A reasonable strategy is to initially include more interactions than desired, then drop the less important ones in post-processing after fitting; see the [remove_terms](./python/api/ExplainableBoostingClassifier.html#interpret.glassbox.ExplainableBoostingClassifier.remove_terms) function. In terms of model performance, introducing more interactions tends to improve accuracy. Values between 0 and LESS than 1.0 are interpreted as percentages of the number of features: for example, a dataset with 100 features and an interactions value of 0.7 will automatically detect and use 70 interactions. Values of 1 or higher indicate the exact number of interactions to be detected; for example, 1 would create 1 interaction term, and 50 would create 50.
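
A sketch of the fit-then-prune strategy described above (`X_train`/`y_train` are placeholders; `term_features_`, `term_importances`, and `remove_terms` are the EBM attributes/methods documented in this repo):

```python
from interpret.glassbox import ExplainableBoostingClassifier

# Fit with more interactions than we ultimately intend to keep.
ebm = ExplainableBoostingClassifier(interactions=100)
ebm.fit(X_train, y_train)

# Identify interaction terms (terms built from 2 features) and rank them.
importances = ebm.term_importances()
pair_idx = [i for i, feats in enumerate(ebm.term_features_) if len(feats) == 2]
weakest = sorted(pair_idx, key=lambda i: importances[i])[:50]

# Drop the 50 least important interactions in post-processing.
ebm.remove_terms(weakest)
```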

## inner_bags
default: 0
Expand Down Expand Up @@ -56,9 +56,9 @@ hyperparameters: [64]
guidance: For max_interaction_bins, more is typically better in terms of model performance; however, fitting times go up significantly above 64 bins for very little benefit. We recommend using 64 as the default for this reason. If your fitting times are acceptable, setting max_interaction_bins to 256 or even more might improve the model slightly.
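
If fitting time allows, the tradeoff can be explored directly (a sketch; `max_interaction_bins` is the repo's own parameter):

```python
from interpret.glassbox import ExplainableBoostingClassifier

fast = ExplainableBoostingClassifier(max_interaction_bins=64)    # recommended default
rich = ExplainableBoostingClassifier(max_interaction_bins=256)   # slower, possibly slightly better
```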

## greedy_ratio
-default: 12.0
+default: 10.0

-hyperparameters: [0.0, 1.0, 2.0, 5.0, 10.0, 12.0, 20.0]
+hyperparameters: [0.0, 1.0, 2.0, 5.0, 10.0, 20.0]

guidance: Values of greedy_ratio above 5.0 seem to result in similar model performance.
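
Since EBMs follow the scikit-learn estimator API, a sweep over the candidate values can be run with standard tooling (a sketch; `X_train`/`y_train` are placeholders):

```python
from sklearn.model_selection import GridSearchCV
from interpret.glassbox import ExplainableBoostingClassifier

# Candidate values taken from the hyperparameters list above.
param_grid = {"greedy_ratio": [0.0, 1.0, 2.0, 5.0, 10.0, 20.0]}
search = GridSearchCV(ExplainableBoostingClassifier(), param_grid, cv=3)
search.fit(X_train, y_train)
print(search.best_params_)  # values >= 5.0 are expected to score similarly
```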

@@ -79,9 +79,9 @@ hyperparameters: [14]
guidance: Increasing outer bags beyond 14 provides no observable benefit. Reducing outer_bags below 14 might improve fitting time on machines with less than 14 cores. Setting outer_bags to 8 is reasonable on many datasets, and can improve fitting time.

## interaction_smoothing_rounds
-default: 100
+default: 75 (classification), 100 (regression)

-hyperparameters: [0, 50, 100, 200, 500, 1000]
+hyperparameters: [0, 25, 50, 75, 100, 200, 500, 1000]

guidance: interaction_smoothing_rounds appears to have only a minor impact on model accuracy. The defaults are good choices, but it might be worth trying other values when optimizing a model.

@@ -100,14 +100,14 @@ hyperparameters: [2, 3, 4, 5, 10, 20, 50]
guidance: The default value usually works well, however experimenting with slightly higher values could potentially enhance generalization on certain datasets. For smaller datasets, having a low value might be better. On larger datasets this parameter seems to have little effect.

## min_hessian
-default: 0.0
+default: 1e-4 (classification), 0.0 (regression)

-hyperparameters: [0.0, 1e-4]
+hyperparameters: [0.0, 1e-6, 1e-4, 1e-2]

-guidance: Generally 0.0 is close to the best choice for min_hessian, but on some datasets it might be useful to set min_hessian to a small non-zero value.
+guidance: For RMSE regression, each sample contributes a hessian of 1.0, so any min_hessian below the min_samples_leaf value has no effect. For classification, min_hessian has minimal impact provided it is kept small.

## max_rounds
-default: 25000
+default: 9000

ideal: 1000000000 (early stopping should stop long before this point)

@@ -125,11 +125,11 @@ hyperparameters: [100, 200]
guidance: Having 200 early_stopping_rounds results in a slightly better model than the default of 100, but it requires significantly more time to fit in some cases. early_stopping_rounds beyond 200 does not seem to improve the model.

## early_stopping_tolerance
-default: 0.0
+default: 1e-5

-hyperparameters: [0.0]
+hyperparameters: [0.0, 1e-5]

-guidance: Setting early_stopping_tolerance to a small positive value in the range of 1e-4 can help reduce fitting times on some datasets with minimal degradation in model performance. Setting it to a negative value sometimes yields slightly better models. EBMs are a bagged ensemble model, so overfitting each individual bag a little can be beneficial because after the models are averaged together in the ensemble averaging decreases the variance due to overfitting. Using a negative value for early_stopping_tolerance allows the individual models to be overfit.
+guidance: early_stopping_tolerance is set to 1e-5 by default due to fitting time considerations; however, setting it to 0.0, or even a negative value, sometimes yields slightly higher accuracy. EBMs are a bagged ensemble model, so overfitting each individual bag a little can be beneficial: averaging the models in the ensemble reduces the variance introduced by overfitting. Using a negative value for early_stopping_tolerance allows the individual models to be overfit.
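
A sketch of the tradeoff described above (parameter names from this repo's API):

```python
from interpret.glassbox import ExplainableBoostingRegressor

# The default (1e-5) favors fitting time; 0.0 or a small negative tolerance
# lets each bagged model overfit slightly, which the ensemble average smooths out.
ebm = ExplainableBoostingRegressor(
    early_stopping_rounds=200,       # slightly better than 100, slower to fit
    early_stopping_tolerance=-1e-5,  # negative: allow mild per-bag overfitting
)
```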

## validation_size
default: 0.15
40 changes: 20 additions & 20 deletions python/interpret-core/interpret/glassbox/_ebm/_ebm.py
@@ -2450,7 +2450,7 @@ class ExplainableBoostingClassifier(EBMModel, ClassifierMixin, ExplainerMixin):
Number of inner bags. 0 turns off inner bagging.
learning_rate : float, default=0.015
Learning rate for boosting.
-greedy_ratio : float, default=12.0
+greedy_ratio : float, default=10.0
The proportion of greedy boosting steps relative to cyclic boosting steps.
A value of 0 disables greedy boosting, effectively turning it off.
cyclic_progress : bool or float, default=False
@@ -2462,16 +2462,16 @@ class ExplainableBoostingClassifier(EBMModel, ClassifierMixin, ExplainerMixin):
it will be used to update internal gain calculations related to how effective
each feature is in predicting the target variable. Setting this parameter
to a value less than 1.0 can be useful for preventing overfitting.
-smoothing_rounds : int, default=100
+smoothing_rounds : int, default=75
Number of initial highly regularized rounds to set the basic shape of the main effect feature graphs.
-interaction_smoothing_rounds : int, default=100
+interaction_smoothing_rounds : int, default=75
Number of initial highly regularized rounds to set the basic shape of the interaction effect feature graphs during fitting.
-max_rounds : int, default=25000
+max_rounds : int, default=9000
Total number of boosting rounds with n_terms boosting steps per round.
early_stopping_rounds : int, default=100
Number of rounds with no improvement to trigger early stopping. 0 turns off
early stopping and boosting will occur for exactly max_rounds.
-early_stopping_tolerance : float, default=0.0
+early_stopping_tolerance : float, default=1e-5
Tolerance that dictates the smallest delta required to be considered an
improvement which prevents the algorithm from early stopping.
early_stopping_tolerance is expressed as a percentage of the early
@@ -2489,7 +2489,7 @@ class ExplainableBoostingClassifier(EBMModel, ClassifierMixin, ExplainerMixin):
the ensemble as a whole.
min_samples_leaf : int, default=4
Minimum number of samples allowed in the leaves.
-min_hessian : float, default=0.0
+min_hessian : float, default=1e-4
Minimum hessian required to consider a potential split valid.
reg_alpha : float, default=0.0
L1 regularization.
@@ -2643,16 +2643,16 @@ def __init__(
inner_bags: Optional[int] = 0,
# Boosting
learning_rate: float = 0.015,
-greedy_ratio: Optional[float] = 12.0,
+greedy_ratio: Optional[float] = 10.0,
cyclic_progress: Union[bool, float, int] = False, # noqa: PYI041
-smoothing_rounds: Optional[int] = 100,
-interaction_smoothing_rounds: Optional[int] = 100,
-max_rounds: Optional[int] = 25000,
+smoothing_rounds: Optional[int] = 75,
+interaction_smoothing_rounds: Optional[int] = 75,
+max_rounds: Optional[int] = 9000,
early_stopping_rounds: Optional[int] = 100,
-early_stopping_tolerance: Optional[float] = 0.0,
+early_stopping_tolerance: Optional[float] = 1e-5,
# Trees
min_samples_leaf: Optional[int] = 4,
-min_hessian: Optional[float] = 0.0,
+min_hessian: Optional[float] = 1e-4,
reg_alpha: Optional[float] = 0.0,
reg_lambda: Optional[float] = 0.0,
max_delta_step: Optional[float] = 0.0,
@@ -2796,7 +2796,7 @@ class ExplainableBoostingRegressor(EBMModel, RegressorMixin, ExplainerMixin):
Number of inner bags. 0 turns off inner bagging.
learning_rate : float, default=0.04
Learning rate for boosting.
-greedy_ratio : float, default=12.0
+greedy_ratio : float, default=10.0
The proportion of greedy boosting steps relative to cyclic boosting steps.
A value of 0 disables greedy boosting, effectively turning it off.
cyclic_progress : bool or float, default=False
@@ -2808,16 +2808,16 @@ class ExplainableBoostingRegressor(EBMModel, RegressorMixin, ExplainerMixin):
it will be used to update internal gain calculations related to how effective
each feature is in predicting the target variable. Setting this parameter
to a value less than 1.0 can be useful for preventing overfitting.
-smoothing_rounds : int, default=100
+smoothing_rounds : int, default=500
Number of initial highly regularized rounds to set the basic shape of the main effect feature graphs.
interaction_smoothing_rounds : int, default=100
Number of initial highly regularized rounds to set the basic shape of the interaction effect feature graphs during fitting.
-max_rounds : int, default=25000
+max_rounds : int, default=9000
Total number of boosting rounds with n_terms boosting steps per round.
early_stopping_rounds : int, default=100
Number of rounds with no improvement to trigger early stopping. 0 turns off
early stopping and boosting will occur for exactly max_rounds.
-early_stopping_tolerance : float, default=0.0
+early_stopping_tolerance : float, default=1e-5
Tolerance that dictates the smallest delta required to be considered an
improvement which prevents the algorithm from early stopping.
early_stopping_tolerance is expressed as a percentage of the early
Expand Down Expand Up @@ -2989,13 +2989,13 @@ def __init__(
inner_bags: Optional[int] = 0,
# Boosting
learning_rate: float = 0.04,
-greedy_ratio: Optional[float] = 12.0,
+greedy_ratio: Optional[float] = 10.0,
cyclic_progress: Union[bool, float, int] = False, # noqa: PYI041
-smoothing_rounds: Optional[int] = 100,
+smoothing_rounds: Optional[int] = 500,
interaction_smoothing_rounds: Optional[int] = 100,
-max_rounds: Optional[int] = 25000,
+max_rounds: Optional[int] = 9000,
early_stopping_rounds: Optional[int] = 100,
-early_stopping_tolerance: Optional[float] = 0.0,
+early_stopping_tolerance: Optional[float] = 1e-5,
# Trees
min_samples_leaf: Optional[int] = 4,
min_hessian: Optional[float] = 0.0,
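
A quick sanity check of the new defaults after installing a build that includes this commit (a sketch; attribute access follows the scikit-learn convention of storing constructor arguments):

```python
from interpret.glassbox import ExplainableBoostingClassifier, ExplainableBoostingRegressor

clf = ExplainableBoostingClassifier()
reg = ExplainableBoostingRegressor()

print(clf.smoothing_rounds, clf.max_rounds, clf.early_stopping_tolerance, clf.min_hessian)
# expected: 75 9000 1e-05 0.0001
print(reg.smoothing_rounds, reg.max_rounds, reg.early_stopping_tolerance, reg.min_hessian)
# expected: 500 9000 1e-05 0.0
```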
