Predictive analytics may be new to industry, but they aren’t that new to science. That means there are already best practices ready to go. Combine those with business realities, and you get the list that everyone should know, whether you’re measuring gamers, shoppers, gamblers or future users of health care.
First, the nuts & bolts, then the business cases.
Nuts & Bolts
Probability is exactly what you think it is: the likelihood that the event is going to happen, given as a percentage or a number from 0 to 1. You can replace “the event” with any of the items in the second list below, or substitute your own.
Confidence is not the same thing as probability. If you look back at the probability, you won’t see any indication of whether or not it’s accurate. It’s just a guess, and some guesses are pretty bad. As the saying goes, “even a broken clock is right twice a day,” which is a polite way of saying that even a bad model will get some things right. A good model outperforms chance and has a high confidence rating. You can measure confidence several ways.
The F-Score is only one of several ways of conveying confidence in the model, but it’s the best. Consider a score that rewards you for catching all of the real cases (recall). Well, you can just flag every case, and you’re going to catch 100% of the real ones. Or consider a score that rewards you only for being right about the cases you flag (precision). Flag just the one case you are surest of, and you can be 100% right there. What an F-Score does is combine those two measures using their harmonic mean, so a poor result on either one drags the whole score down. That makes it very hard to cheat, and the score, while it looks like a percentage, tells you more than a simple percent-correct figure. You can read more about the details here.
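To make the idea concrete, here is a minimal sketch of the standard F1 calculation in plain Python. In standard terms, the two approaches above are recall (how many of the real cases you caught) and precision (how many of the cases you flagged were real). The function name and the ten-user data set are made up for illustration.

```python
def precision_recall_f1(predicted, actual):
    """Return (precision, recall, F1) for two equal-length lists of booleans."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)

    flagged = true_pos + false_pos      # everything the model said "yes" to
    real = true_pos + false_neg         # everything that really happened
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / real if real else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean: a low value on either side pulls the score down.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: 2 of 10 users actually churn.
actual = [True, True] + [False] * 8
flag_everyone = [True] * 10          # catches every churner, but is mostly wrong
flag_one = [True] + [False] * 9      # never wrong, but misses half the churners

print(precision_recall_f1(flag_everyone, actual))
print(precision_recall_f1(flag_one, actual))
```

Flagging everyone gets perfect recall but an F1 of only about 0.33, and flagging a single sure thing gets perfect precision but an F1 of about 0.67. Neither shortcut earns a top score.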
The best practice is to provide both probability and confidence, and to use F-Score as the confidence measure.
Business Predictive Metrics
With the basics in hand, it’s now all about what you want to predict, and this varies widely by industry and business model. In modeling terms, these are all “dependent variables,” which, if you’ve purposely blocked out high school algebra, just means the “thing you want to predict.” So, your business model will use some subset of this list:
Churn is the granddaddy of all predictives because it has been around the longest and has been studied the most, primarily in telecommunications. That’s not surprising because that industry works on a subscription-and-usage model, so people leaving is a big deal. It’s less essential when the purchase is one-time or rare, like, say, a BMW. The basic definition of churn is a user who stops paying for a service. However, with many business models now being free-to-try/use (“freemium”), definitions of churn have to adapt to reflect activity rather than pure spending. In that case, exactly when a user quits is harder to determine. So, more powerful models use the intervals in a given user’s activity to predict when they have “churned out.”
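One simple way to picture the interval idea is a rule of thumb rather than a real predictive model: compare how long a user has been silent with how long they usually go between visits. The sketch below assumes you have a list of activity timestamps per user. The function name, the median-gap rule and the 3x multiplier are all illustrative assumptions, not a method from any particular product.

```python
from datetime import datetime, timedelta
from statistics import median


def has_churned(activity_times, now, multiplier=3.0):
    """Flag a user whose current silence far exceeds their usual gap.

    activity_times: datetimes of the user's past activity (any order).
    multiplier: hypothetical threshold; how many "usual gaps" of silence
    we tolerate before calling the user churned.
    """
    times = sorted(activity_times)
    if len(times) < 2:
        return False  # not enough history to estimate a usual gap
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
    usual_gap = median(gaps)
    silence = (now - times[-1]).total_seconds()
    return silence > multiplier * usual_gap


# Hypothetical users: one visits daily, one visits weekly.
start = datetime(2024, 1, 1)
daily_user = [start + timedelta(days=d) for d in range(5)]
weekly_user = [start + timedelta(days=7 * w) for w in range(3)]
today = start + timedelta(days=19)

print(has_churned(daily_user, today))   # silent 15 days, usually 1 day apart
print(has_churned(weekly_user, today))  # silent 5 days, usually 7 days apart
```

The point of the per-user interval is that the same five days of silence means very different things for a daily visitor and a weekly one.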
Speaking of new business models, many new apps, games and even some older industries are allowing users to try their service for free in the hopes of enticing them to pay. For example, Skype and TurboTax have free versions, but they’d really like you more if you paid for the premium services. This is common in “free to play” games, and in many other apps and industries. Conversion is simply the move from “not paying” to “paying.” It’s binary, so it doesn’t matter how much they pay, only that they move from 0 to something.
Monetization is pretty obvious and important. It’s how much people will spend over time before they stop spending (churn). In a predictive model, you can set the time horizon to be spending next month or next year. If you make it forever, expect the confidence in the number to drop off for users who are long-lived in the system. A full monetization model for a user’s lifetime is the same thing as their lifetime value, commonly called LTV. In other words, a good predictive model can improve your LTV accuracy, which leads to smarter CRM.
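As a back-of-the-envelope illustration of the time horizon, the sketch below adds up expected spend month by month, weighting each month by the chance the user is still around. It assumes a constant monthly spend and a constant monthly retention rate, which is a big simplification made here for clarity; the function and parameter names are hypothetical.

```python
def expected_spend(monthly_spend, monthly_retention, horizon_months):
    """Expected total spend over a fixed horizon.

    monthly_spend: assumed constant spend per active month.
    monthly_retention: assumed constant chance (0 to 1) that an active
    user is still active the following month.
    """
    total = 0.0
    still_active = 1.0  # the user is active in the first month
    for _ in range(horizon_months):
        total += still_active * monthly_spend
        still_active *= monthly_retention
    return total


# Hypothetical user: spends 10 per month, 50% chance of staying each month.
print(expected_spend(10.0, 0.5, 1))    # next month only
print(expected_spend(10.0, 0.5, 3))    # next quarter
print(expected_spend(10.0, 0.5, 120))  # close to "forever"
```

With these made-up numbers the three-month figure is 10 + 5 + 2.5 = 17.5. Stretching the horizon adds less and less, and every extra month leans harder on the retention assumption, which is one way to see why confidence drops for long horizons.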
There’s a large branch of science devoted to the study of social networks, in which people are trying to understand the impact of one person on others, and their friends, and so on (and so on...). “Virality” is an unfortunate but accurate metaphor, but it doesn’t actually quantify much. A more powerful measure like Social Value tells you which people influence others, and how much. Some people use talking as a proxy for influence, as in you posted to Twitter then 5 people followed you, so your score is a 5. No matter how intricate the scoring system, talking is not the same thing as doing. You want to measure actions as they spread across friendship networks, which we invented and automated because science taught us that inferring is a lot less valuable than knowing.
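To show the difference between measuring talk and measuring action, here is a deliberately crude sketch that counts how many of a user’s friends took an action after that user did. It is not the Social Value measure described above, whose details are not given here; the function name, the data layout and the sample data are all assumptions for illustration. Timing alone does not prove influence, so treat the count as a starting point only.

```python
def friends_who_followed(user, friends, action_times):
    """Count friends who took the action after `user` took it.

    friends: dict mapping each user to a list of their friends.
    action_times: dict mapping users who acted to the time they acted
    (any comparable value, such as a day number or a datetime).
    """
    if user not in action_times:
        return 0  # the user never acted, so nothing could have spread
    user_time = action_times[user]
    return sum(
        1
        for friend in friends.get(user, ())
        if friend in action_times and action_times[friend] > user_time
    )


# Hypothetical network: ann has three friends.
friends = {"ann": ["bob", "cat", "dan"]}
# Day on which each person made a purchase; dan never did.
action_times = {"ann": 1, "bob": 3, "cat": 0}

print(friends_who_followed("ann", friends, action_times))
```

Here only bob counts, because cat acted before ann and dan never acted at all.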
Engagement varies a lot in how it’s defined, but the idea is that you want to know how interested and committed your users were, and will be. Unless you are asking them via surveys, you won’t know the truth, and you’ll be using a proxy for it. The typical proxy is some measure of time spent on the site/service/game/app. Raw time can be used, as can session length, frequency of sessions, time between sessions, and several other flavors. Your choice should make sense in the context of your service.
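The time-based proxies listed above are easy to compute once you have session start and end times. The sketch below is one possible way to do it, assuming each user’s sessions arrive as (start, end) pairs of datetimes; the function name and the output keys are made up for illustration.

```python
from datetime import datetime


def engagement_proxies(sessions):
    """Summarize one user's sessions, given (start, end) datetime pairs."""
    ordered = sorted(sessions)
    # Length of each session, in minutes.
    lengths = [(end - start).total_seconds() / 60 for start, end in ordered]
    # Idle time between the end of one session and the start of the next, in hours.
    gaps = [
        (nxt[0] - cur[1]).total_seconds() / 3600
        for cur, nxt in zip(ordered, ordered[1:])
    ]
    return {
        "session_count": len(ordered),
        "total_minutes": sum(lengths),
        "mean_session_minutes": sum(lengths) / len(lengths) if lengths else 0.0,
        "mean_gap_hours": sum(gaps) / len(gaps) if gaps else None,
    }


# Hypothetical user with two sessions on consecutive days.
sessions = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 30)),
    (datetime(2024, 1, 2, 10, 0), datetime(2024, 1, 2, 10, 10)),
]
print(engagement_proxies(sessions))
```

Which of these numbers matters depends on the service: long sessions are good news for a game, but may signal confusion on a checkout page.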
Fairly self-evident. If your service is ad-driven, you want to know all of the usual ecommerce metrics around hover-overs, clicks and click-throughs, plus anything you can learn about follow-through, purchases or other actions. But since you can model the predictive version of anything, there’s no reason not to turn those past metrics into predictives, giving you a window into the quantity, probability and confidence of future clicks, purchases, etc.