Matrix factorization works beautifully for plain user-item ratings, but real recommendation problems carry many more features: time of day, device, context, demographics. Encoding those features produces extremely sparse data, on which general-purpose models such as support vector machines with nonlinear kernels struggle to learn reliable feature interactions. In a 2010 paper at the IEEE International Conference on Data Mining, Steffen Rendle introduced Factorization Machines (FMs) to bridge the gap between such general-purpose predictors and specialized factorization models.
An FM predicts an outcome from a set of input features, like a linear model, but it also models the interaction between every pair of features. The trick is that instead of learning a separate weight for each pair, which would be hopeless on sparse data, it learns a small latent vector for each feature and models each pairwise interaction as the dot product of the two features' vectors. This factorized form lets the model estimate interactions even between feature combinations it has rarely or never seen together, and the prediction can be computed in time linear in the number of features and the length of the latent vectors, rather than quadratic in the number of features. Rendle showed that by choosing the input features appropriately, FMs generalize several specialized factorization models, including matrix factorization, SVD++, and sequential models, within one framework usable by non-experts.
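To make the factorized interaction concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not Rendle's implementation or any library's API: the function names (fm_predict_naive, fm_predict_fast), the parameter names (w0 for the bias, w for the per-feature weights, V for the latent vectors), and the toy sizes are all chosen here for the example, and the parameters are random rather than learned. The first function sums over every pair of features directly; the second uses an algebraic rearrangement of the same sum that avoids the double loop.

```python
import random


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


def fm_predict_naive(x, w0, w, V):
    """Bias + linear terms + sum over pairs i<j of <V[i], V[j]> * x[i] * x[j]."""
    n = len(x)
    score = w0 + dot(w, x)
    for i in range(n):
        for j in range(i + 1, n):
            score += dot(V[i], V[j]) * x[i] * x[j]
    return score


def fm_predict_fast(x, w0, w, V):
    """Same score without the pair loop, using the identity
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ].
    """
    n, k = len(x), len(V[0])
    score = w0 + dot(w, x)
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        score += 0.5 * (s * s - s_sq)
    return score


# Toy setup (hypothetical sizes): 3 users and 4 items, one-hot encoded
# into a single feature vector of length 7, with latent vectors of length 2.
rng = random.Random(0)
n_users, n_items, k = 3, 4, 2
n = n_users + n_items
w0 = rng.uniform(-1, 1)
w = [rng.uniform(-1, 1) for _ in range(n)]
V = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n)]

# Encode "user 1 rates item 2": exactly two features are switched on.
user, item = 1, 2
x = [0.0] * n
x[user] = 1.0
x[n_users + item] = 1.0

naive = fm_predict_naive(x, w0, w, V)
fast = fm_predict_fast(x, w0, w, V)

# With only a user indicator and an item indicator active, the FM score
# takes the form of matrix factorization with bias terms.
mf_style = w0 + w[user] + w[n_users + item] + dot(V[user], V[n_users + item])

print(naive, fast, mf_style)
```

The three printed values should agree up to floating-point rounding. The agreement between the first two illustrates the linear-time computation; the agreement with the third illustrates, for this one encoding, how a choice of input features makes an FM behave like matrix factorization. Adding further columns to x (time of day, device, and so on) would bring context into the same formula without changing the code.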
Factorization Machines became a popular and competitive method for click-through-rate prediction and context-aware recommendation, frequently appearing in winning solutions in prediction competitions, and inspired neural successors such as DeepFM.
For a business reader, FMs are why a recommender can use far more than just “who bought what”: they made it practical to fold in all the surrounding context and still get reliable predictions from sparse data.