Data Drift Example: Code, Part 2
import numpy as np
from sklearn.metrics import r2_score, mean_absolute_error

# `model` is assumed to be the fitted regressor carried over from Part 1.
# Clean set: km centred on 25,000, clipped to 5,000-60,000.
km_clean = np.random.normal(25000, 10000, 200).clip(5000, 60000)
price_clean = 8 * np.exp(-km_clean / 50000) + np.random.normal(0, 0.3, 200)

# Drifted set: same price formula and noise, but km centred on 80,000.
km_drift = np.random.normal(80000, 20000, 200).clip(30000, 150000)
price_drift = 8 * np.exp(-km_drift / 50000) + np.random.normal(0, 0.3, 200)

# scikit-learn expects a 2-D feature array, hence reshape(-1, 1).
pred_clean = model.predict(km_clean.reshape(-1, 1))
pred_drift = model.predict(km_drift.reshape(-1, 1))

print("Clean R^2:", r2_score(price_clean, pred_clean))
print("Drift R^2:", r2_score(price_drift, pred_drift))
print("Clean MAE:", mean_absolute_error(price_clean, pred_clean))
print("Drift MAE:", mean_absolute_error(price_drift, pred_drift))
Output from one run (no random seed is set in this snippet, so exact values may differ):
Clean R^2: 0.908
Drift R^2: -10.675
Clean MAE: 0.241
Drift MAE: 2.366
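A negative R^2 can look like a bug, but it follows from the definition: R^2 = 1 - SS_res / SS_tot, so it drops below zero whenever the predictions are worse than always predicting the mean of the targets. A minimal sketch with made-up numbers (the helper r2_by_hand and the toy arrays are illustrative and not part of the original example):

```python
import numpy as np

def r2_by_hand(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, the same definition r2_score uses."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # error of the predictions
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of predicting the mean
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0])

# Predicting the mean everywhere gives exactly 0.
r2_mean = r2_by_hand(y_true, np.full(3, 2.0))

# Predictions that are all off by +3: SS_res = 27, SS_tot = 2, so R^2 = 1 - 13.5.
r2_bad = r2_by_hand(y_true, np.array([4.0, 5.0, 6.0]))

print("Mean baseline R^2:", r2_mean)  # 0.0
print("Biased model R^2:", r2_bad)    # -12.5
```

Read this way, a Drift R^2 near -10 says the squared error on the drifted set is roughly eleven to twelve times what a constant mean prediction would give.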
Only one thing changed between the two evaluations:
the distribution of km values. The model, the price formula, and the noise level are identical.
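Because the change sits entirely in the input, it can in principle be spotted before any prices are known. The sketch below is one simple way to do that, not the method used in this example: it compares the incoming km values against a reference sample using a standardized mean shift and the share of values outside the reference range. The function name shift_report, the variable names, and the fixed seed are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed only so the sketch is repeatable

# Same two km distributions as in the example above.
km_ref = rng.normal(25000, 10000, 200).clip(5000, 60000)
km_new = rng.normal(80000, 20000, 200).clip(30000, 150000)

def shift_report(reference, incoming):
    """Return (mean shift in reference std units, fraction outside reference range)."""
    mean_shift = (incoming.mean() - reference.mean()) / reference.std()
    outside = np.mean((incoming < reference.min()) | (incoming > reference.max()))
    return mean_shift, outside

mean_shift, outside = shift_report(km_ref, km_new)
print(f"Mean shift: {mean_shift:.1f} reference standard deviations")
print(f"Outside reference range: {outside:.0%} of incoming values")
```

A large mean shift together with most incoming values falling outside the reference range means the model is being asked to extrapolate, which is consistent with the collapse in R^2 above. Any alert thresholds would have to be chosen per feature.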