Training-Serving Skew

Training-serving skew is a difference between how a model is trained and how it is used in production that causes it to perform worse in the real world than its offline evaluation suggested. The model looks good in testing because the test data resembles the training data, but in serving it encounters data prepared differently, computed by different code, or drawn from a different distribution, and its accuracy quietly suffers.

Google’s documentation for TensorFlow Data Validation, a tool built to detect exactly this problem, describes training-serving skew as arising from issues such as biased sampling or the use of different data sources in the training and serving pipelines. Common causes include feature-engineering code that differs between the training pipeline and the live service, features that are available at training time but not at serving time, and subtle changes in how raw inputs are encoded. A closely related failure is data drift, where the serving distribution gradually moves away from the training distribution over time. Tools like TensorFlow Data Validation compute summary statistics for the training and serving data, validate them against a schema, and flag anomalies and distribution shifts between the two.
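
The comparison of training and serving statistics can be illustrated with a minimal sketch in plain Python. It is not TensorFlow Data Validation's API: the function names, the sample data, and the 0.1 threshold are all assumptions made for illustration. It measures the largest gap in category share between a training sample and a serving sample of one categorical feature (an L-infinity distance) and flags skew when that gap exceeds the threshold.

```python
from collections import Counter


def category_frequencies(values):
    """Return each category's share of the sample as a dict."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute frequencies of an empty sample")
    return {category: n / total for category, n in counts.items()}


def linf_distance(train_values, serving_values):
    """Largest absolute difference in category share between two samples."""
    train = category_frequencies(train_values)
    serving = category_frequencies(serving_values)
    # A category missing from one side counts as a share of 0.0 there.
    categories = set(train) | set(serving)
    return max(abs(train.get(c, 0.0) - serving.get(c, 0.0)) for c in categories)


# Hypothetical samples of a "device" feature (made-up data).
train_device = ["mobile"] * 50 + ["desktop"] * 50
serving_device = ["mobile"] * 80 + ["desktop"] * 15 + ["tablet"] * 5

# Illustrative threshold; a real value would be tuned per feature.
SKEW_THRESHOLD = 0.1

distance = linf_distance(train_device, serving_device)
skew_detected = distance > SKEW_THRESHOLD
print(f"L-infinity distance: {distance:.2f}, skew detected: {skew_detected}")
```

In this made-up example the share of desktop traffic falls from 50% in training to 15% in serving, so the check flags the feature. A single largest-gap number is easy to explain and to alert on, but it applies only to categorical features; numeric features need a different measure of distributional distance.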

For a business reader, training-serving skew explains a frustratingly common experience: a model that aced its tests but underdelivers once it is actually deployed, usually because of a mismatch in the data plumbing rather than the model itself.
