Model Explainability — A Clear and Practical Guide
Dataset used for demonstration later: Titanic dataset
1️⃣ What is Model Explainability?
Simple Definition
Model explainability is the ability to understand why a machine learning model made a specific prediction.
If a model predicts:
“Passenger will survive with 82% probability”
Explainability answers:
- Why 82%?
- Which features contributed most?
- What pushed the prediction up?
- What pushed it down?
- Would the prediction change if something changed?
Intuitive Analogy
Think of a machine learning model like a committee voting system.
Each feature (age, fare, gender, class, etc.) casts a vote:
- Some votes push toward “Survive”
- Some push toward “Not Survive”
Explainability tells us:
- Who voted?
- How strongly?
- Who had the most influence?
Without explainability, you only see the final vote count — not how the decision was made.
2️⃣ Why Model Explainability Matters
1. Trust
Stakeholders will not trust black-box predictions.
2. Debugging
You detect:
- Data leakage
- Bias
- Wrong feature behavior
- Spurious correlations
3. Compliance
Industries like:
- Banking
- Healthcare
- Government
often require explainability.
4. Fairness
You can detect whether sensitive attributes influence predictions unfairly.
3️⃣ Two Types of Explainability
A) Global Explainability
Answers:
“How does the model behave overall?”
Examples:
- Which features are most important?
- Does age increase or decrease survival probability?
- Is fare positively correlated with survival?
B) Local Explainability
Answers:
“Why did the model predict THIS specific case like this?”
Example:
Passenger A:
- Age = 60
- Male
- 3rd class
- Fare = 7
Why did this specific passenger get 12% survival probability?
4️⃣ Categories of Explainability Techniques
Explainability techniques fall into 3 major groups:
1️⃣ Intrinsic Explainability (Model is Naturally Interpretable)
These models are transparent by design.
Examples:
- Linear Regression
- Logistic Regression
- Decision Trees (small ones)
Example (Logistic Regression)
log(odds) = β₀ + β₁·X₁ + β₂·X₂
Interpretation:
- Positive coefficient → increases probability
- Negative coefficient → decreases probability
Very easy to explain.
Limitation:
- Often less powerful than complex models
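To make the coefficient reading concrete, here is a minimal, self-contained sketch that fits a logistic regression on a simplified slice of the Titanic data. The column selection and preprocessing are illustrative choices only, not the pipeline used in the hands-on section later.

```python
# Minimal sketch: an intrinsically interpretable model on the Titanic data.
# Column choice and preprocessing here are illustrative only.
import seaborn as sns
from sklearn.linear_model import LogisticRegression

df = sns.load_dataset("titanic")[["survived", "pclass", "sex", "age", "fare"]]
df["age"] = df["age"].fillna(df["age"].median())   # impute missing ages
df["sex"] = (df["sex"] == "male").astype(int)      # male=1, female=0

X, y = df.drop("survived", axis=1), df["survived"]
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Each coefficient is the change in log-odds per unit of the feature:
# positive -> pushes toward "Survive", negative -> pushes toward "Not Survive".
for name, coef in zip(X.columns, clf.coef_[0]):
    print(f"{name:>6}: {coef:+.3f}")
```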
2️⃣ Post-hoc Explainability (Explain After Training)
Used for black-box models like:
- Random Forest
- XGBoost
- Neural Networks
These models are powerful but not naturally interpretable.
We apply explanation techniques after training.
This includes:
- SHAP
- LIME
- Permutation Importance
- PDP
- ICE
- Counterfactual explanations
3️⃣ Model-Agnostic vs Model-Specific
Model-Agnostic
Works with any model.
Examples:
- LIME
- Permutation Importance
- PDP
- SHAP (KernelExplainer)
Model-Specific
Optimized for certain models.
Examples:
- TreeSHAP (for tree models)
- DeepSHAP (for neural networks)
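The difference shows up directly in the SHAP API. The sketch below is an illustration, assuming the Random Forest `model`, `X_train`, and `X_test` built in the hands-on code at the end of this guide: TreeExplainer (TreeSHAP) is tied to tree models, while KernelExplainer only needs a prediction function and a background sample.

```python
import shap

# Model-specific: TreeSHAP is fast and exact, but only for tree ensembles.
tree_explainer = shap.TreeExplainer(model)
tree_shap_values = tree_explainer.shap_values(X_test)

# Model-agnostic: KernelSHAP works with any prediction function,
# but is much slower because it repeatedly perturbs the inputs.
background = shap.sample(X_train, 50)   # small background set for speed
kernel_explainer = shap.KernelExplainer(model.predict_proba, background)
kernel_shap_values = kernel_explainer.shap_values(X_test.iloc[:5])  # explain a few rows only
```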
5️⃣ Major Explainability Techniques (Simple Interpretation)
🔷 SHAP (SHapley Additive exPlanations)
Core idea:
Each feature gets a fair contribution value toward the final prediction.
Think of it like:
If the total prediction is 0.82:
0.82 = Base Value + Contribution(sex) + Contribution(fare) + …
Interpretation in simple terms:
- Red bar → pushes prediction higher
- Blue bar → pushes prediction lower
- Larger bar → stronger influence
SHAP gives:
- Global explanations
- Local explanations
- Consistent feature importance
Strength:
Mathematically grounded in game theory.
🔷 LIME (Local Interpretable Model-Agnostic Explanations)
Core idea:
Zoom into one prediction and approximate it using a simple linear model locally.
Think of it like:
“Around this passenger, the complex model behaves approximately like this simple equation.”
Interpretation:
- Shows top features influencing THIS prediction
- Only local — not global
Strength:
Works with any model.
Limitation:
Approximate, not exact.
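A minimal LIME sketch, assuming the Random Forest `model`, `X_train`, and `X_test` from the hands-on code at the end of this guide:

```python
from lime.lime_tabular import LimeTabularExplainer

# LIME needs the training data (to learn feature distributions for perturbation).
lime_explainer = LimeTabularExplainer(
    training_data=X_train.values,
    feature_names=X_train.columns.tolist(),
    class_names=["Not Survive", "Survive"],
    mode="classification",
)

# Explain ONE passenger with a local linear approximation of the model.
exp = lime_explainer.explain_instance(
    X_test.iloc[0].values,    # the instance to explain
    model.predict_proba,      # the black-box prediction function
    num_features=6,           # show the top contributing features
)
print(exp.as_list())          # (feature condition, local weight) pairs
# In a notebook, exp.show_in_notebook() renders the visual explanation.
```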
🔷 Permutation Feature Importance
Core idea:
Shuffle one feature.
See how much performance drops.
If performance drops significantly → feature is important.
Interpretation:
- Larger drop = more important feature
Simple and intuitive.
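A short sketch using scikit-learn's built-in implementation, assuming the trained `model`, `X_test`, and `y_test` from the hands-on code at the end of this guide:

```python
from sklearn.inspection import permutation_importance

# Shuffle each feature n_repeats times and measure the drop in test accuracy.
result = permutation_importance(
    model, X_test, y_test,
    n_repeats=10,
    random_state=42,
)

# Larger mean drop in score => more important feature.
for idx in result.importances_mean.argsort()[::-1]:
    print(f"{X_test.columns[idx]:>6}: "
          f"{result.importances_mean[idx]:.3f} ± {result.importances_std[idx]:.3f}")
```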
🔷 Partial Dependence Plot (PDP)
Core idea:
“How does prediction change when ONE feature changes?”
Interpretation:
- X-axis → feature value
- Y-axis → predicted probability
Example:
If fare increases → survival probability increases.
This shows the direction and shape of the relationship.
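A minimal PDP sketch with scikit-learn, assuming the trained `model` and `X_train` from the hands-on code at the end of this guide:

```python
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# Average predicted survival probability as 'fare' and 'age' vary,
# with the other features left at their observed values.
PartialDependenceDisplay.from_estimator(
    model, X_train,
    features=["fare", "age"],
    kind="average",           # "average" = classic PDP
)
plt.show()
```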
🔷 ICE (Individual Conditional Expectation)
Like PDP but for individual samples.
Interpretation:
- Each line = one passenger
- Shows variability across individuals
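ICE curves use the same scikit-learn API as the PDP sketch above; `kind="both"` overlays the individual lines on the average curve (again assuming `model` and `X_train` from the hands-on section):

```python
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# kind="both" draws one thin ICE line per passenger plus the bold PDP average.
PartialDependenceDisplay.from_estimator(
    model, X_train,
    features=["fare"],
    kind="both",
)
plt.show()
```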
🔷 Counterfactual Explanations
Answers:
“What minimal change would flip the prediction?”
Example:
“If this passenger were female instead of male, survival probability would increase to 70%.”
Extremely intuitive for stakeholders.
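Dedicated libraries (e.g., DiCE) search for minimal changes automatically, but the idea can be sketched by hand: take one passenger, change a single feature, and re-score. This assumes the trained `model` and `X_test` from the hands-on code, with 'sex' encoded as male=1, female=0:

```python
# Manual counterfactual probe: flip one feature and re-score the model.
passenger = X_test.iloc[[0]].copy()                 # one passenger as a 1-row DataFrame
original_prob = model.predict_proba(passenger)[0, 1]

counterfactual = passenger.copy()
counterfactual["sex"] = 1 - counterfactual["sex"]   # flip the gender encoding
new_prob = model.predict_proba(counterfactual)[0, 1]

print(f"Original survival probability:        {original_prob:.2f}")
print(f"With gender flipped (all else equal): {new_prob:.2f}")
```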
6️⃣ How to Interpret Explainability in Very Simple Terms
Imagine explaining to a non-technical person:
Instead of saying:
“SHAP value for feature 3 is 0.21”
You say:
“Being female significantly increased survival probability.”
Instead of:
“Permutation importance score = 0.12”
You say:
“If we remove gender information, model accuracy drops a lot — so gender is very important.”
7️⃣ When to Use Which Method
| Goal | Recommended Method |
|---|---|
| Feature ranking | Permutation / SHAP |
| Single prediction explanation | SHAP / LIME |
| Understand feature behavior | PDP |
| Check fairness | SHAP + subgroup analysis |
| Regulatory environment | SHAP + Counterfactuals |
8️⃣ What Explainability Is NOT
Important clarity:
- It does NOT make the model causal.
- It does NOT prove real-world causation.
- It only explains how the model behaves.
Example:
If model says:
“Higher fare → higher survival probability”
That does NOT mean fare causes survival.
Fare may simply correlate with passenger class.
9️⃣ Big Picture Summary
Explainability answers three questions:
- What features matter most?
- How do they affect predictions?
- Why did this specific prediction happen?
Think of it as turning:
🔒 Black Box
into
🔎 Glass Box
Hands-on Code for SHAP
```python
# ==========================================================
# SECTION 0: INSTALL REQUIRED LIBRARIES
# Run this cell once (skip if already installed)
# ==========================================================
!pip install shap lime eli5 seaborn

# ==========================================================
# SECTION 1: IMPORT LIBRARIES
# ==========================================================
# Data handling
import pandas as pd
import numpy as np
import seaborn as sns

# Visualization
import matplotlib.pyplot as plt

# Model building
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score

# Explainability tools
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.inspection import permutation_importance
from sklearn.inspection import PartialDependenceDisplay
import eli5
from eli5.sklearn import PermutationImportance

# Improve plot readability
plt.rcParams["figure.figsize"] = (8, 6)

print("Libraries imported successfully.")

# ==========================================================
# SECTION 2: LOAD AND PREPROCESS DATA
# ==========================================================
# Load Titanic dataset from seaborn
df = sns.load_dataset("titanic")

# Select relevant columns
df = df[['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'fare']]

# Fill missing age values with median
df['age'] = df['age'].fillna(df['age'].median())

# Encode 'sex' (male=1, female=0)
le = LabelEncoder()
df['sex'] = le.fit_transform(df['sex'])

# Separate features and target
X = df.drop('survived', axis=1)
y = df['survived']

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Data prepared successfully.")
print("Training samples:", X_train.shape)
print("Testing samples:", X_test.shape)

# ==========================================================
# SECTION 3: MODEL TRAINING
# ==========================================================
# Initialize Random Forest
model = RandomForestClassifier(
    n_estimators=200,   # Number of trees
    random_state=42
)

# Train model
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print("Model trained successfully.")
print("Model Accuracy:", accuracy)

# ==========================================================
# SECTION 4: SHAP
# ==========================================================
import shap

# Create explainer (auto-detects correct type)
explainer = shap.Explainer(model)

# Calculate SHAP values
shap_values = explainer(X_test)

print("SHAP values shape:", shap_values.values.shape)
print(shap_values)

# If 3D (binary classification), select class 1
if len(shap_values.values.shape) == 3:
    shap_values = shap_values[:, :, 1]

# Beeswarm plot (best global visualization)
shap.plots.beeswarm(shap_values)
shap.plots.bar(shap_values)

# Pick one passenger
sample_index = 0
shap.plots.waterfall(shap_values[sample_index])
```

Reading the beeswarm plot:
- Overall purpose: the beeswarm plot gives a comprehensive summary of model behavior across the entire dataset, showing the distribution of feature impacts rather than just average importance.
- Vertical axis (features): features are ranked in descending order of importance (mean absolute SHAP value), with the most influential features at the top.
- Horizontal axis (SHAP value): the horizontal position of each dot shows that feature's impact on the model output for one sample. Dots to the right (positive SHAP values) increase the prediction; dots to the left (negative SHAP values) decrease it.
- Color-coded feature values: dot color encodes the original feature value for that sample. Typically red represents a high feature value and blue a low one.

Reading the bar plot:
- Global importance: each bar is the mean absolute SHAP value for a feature. It answers the question: "On average, how much does this feature change the model output (in either direction)?"
- Feature ranking: features are listed on the y-axis, sorted from top to bottom; the feature at the very top has the highest average impact on the model's decisions.
- Magnitude (x-axis): unlike the beeswarm plot, the x-axis shows only positive values, representing the average magnitude of the effect.

Reading the waterfall plot:
- Local interpretation: unlike the global bar and beeswarm plots, the waterfall plot is designed for a single sample (local interpretability). Each observation in the dataset has its own waterfall plot.
- Starting point (base value): the plot begins at the expected value E[f(X)] at the bottom of the y-axis, which is the average model output across the training dataset.
- Incremental "steps": features are listed on the y-axis, usually sorted by the magnitude of their impact for that specific prediction. Each bar is a step that either increases or decreases the prediction.
- Direction and color: red bars (+) mark features that increased the prediction relative to the average; blue bars (-) mark features that decreased it.
- Actual feature values: next to each feature name on the y-axis, the plot shows the actual value that feature held for this sample (e.g., age = 25).
- Final prediction f(x): the top of the plot shows the final model output for this instance. The base value plus the sum of all the steps equals this final value f(x).
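As a quick sanity check of this additivity, continuing from the SHAP cell above (with `shap_values` already reduced to class 1), the base value plus the contributions should reproduce the model's predicted probability:

```python
# Additivity check, continuing from the SHAP cell above.
i = sample_index
reconstructed = shap_values[i].base_values + shap_values[i].values.sum()
predicted = model.predict_proba(X_test.iloc[[i]])[0, 1]

print("base value + sum of contributions:", round(float(reconstructed), 4))
print("model predicted probability:      ", round(float(predicted), 4))
```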