Building and deploying predictive models
Once you have a solid data foundation that gives you accurate, timely data, you're ready to dig deeper and extract insights from it. If you want to better understand what drives the patterns you see in the data, data analysis is a good starting point. The natural next step is using your data to classify or predict outcomes before they occur, something both traditional statistics and machine learning (ML) can help with.
How can a predictive model be helpful? ✍️
Training and applying predictive models can be useful in a wide variety of contexts, spanning from identifying and fixing errors in your data to classifying events into different categories and forecasting outcomes well ahead of time.
Here are some cases in which you could apply a predictive model in practice:
- Predicting demand for different products and optimising inventory levels to minimise cost
- Assessing the likelihood a customer will default on a payment or return a product
- Analysing customer reviews on various websites and/or social media to gauge public sentiment about a product or brand and inform future marketing decisions
- Analysing job candidate data to predict the best fit for job openings and improve the hiring process
- Forecasting when a piece of equipment might fail and taking preemptive action to reduce downtime
- Forecasting asset prices based on macroeconomic indicators and making investment decisions such as buying and selling at the most opportune time
- Filling in missing data based on hidden relationships between different variables, thereby increasing the quality of your reporting
What kind of models are out there? ⚙️
A model is a simplified representation of the real world, designed to help us understand, analyse, and make predictions about it. When it comes to data, it's important to distinguish between three kinds of models.
Database models
These models define how data is connected, stored, and retrieved in databases and information systems. Data models ensure that data is consistent, organised, and accurately reflects the information it is intended to represent. In the example below, you can see how a database model can be used to connect information on products, customers and sales, where different columns serve as the links between the three tables:
Data models of this type are essential for reporting purposes (e.g. in software such as Power BI) and serve as the foundation for building actual predictive models.
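To make the idea of linked tables more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names follow the products, customers and sales example above, but every column name and value is a made-up assumption for illustration, not a recommended schema.

```python
import sqlite3

# In-memory database; all table contents and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products  (product_id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales (
        sale_id     INTEGER PRIMARY KEY,
        product_id  INTEGER REFERENCES products(product_id),
        customer_id INTEGER REFERENCES customers(customer_id),
        quantity    INTEGER
    );
    INSERT INTO products  VALUES (1, 'Kettle', 30.0), (2, 'Toaster', 45.0);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO sales     VALUES (1, 1, 1, 2), (2, 2, 1, 1), (3, 1, 2, 1);
""")

# The shared id columns link the three tables, so revenue per customer
# can be computed with two joins.
revenue_by_customer = conn.execute("""
    SELECT c.name, SUM(s.quantity * p.price) AS revenue
    FROM sales AS s
    JOIN products  AS p ON p.product_id  = s.product_id
    JOIN customers AS c ON c.customer_id = s.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

print(revenue_by_customer)
```

The sales table holds only the ids of the product and the customer, so each product or customer detail is stored once and looked up through the link when needed.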
Statistical models
These models can be used either to explain and measure relationships between different variables in the data, or to generate predictions for a certain outcome variable (say, number of sales) based on a set of input variables (say, number of potential customers, number of competing products and sales price). Statistical models usually involve some form of regression analysis.
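As a rough illustration of regression, the sketch below fits a straight line through a handful of invented price and sales figures using ordinary least squares. It uses a single input variable to keep the arithmetic visible; a real model would normally use several inputs and a dedicated statistics library, and all numbers here are assumptions.

```python
from statistics import mean

# Invented data: sales price and units sold at that price.
prices = [10, 12, 14, 16, 18]
units_sold = [200, 180, 165, 140, 120]

# Ordinary least squares for a single input variable:
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
#   intercept = mean_y - slope * mean_x
mean_x, mean_y = mean(prices), mean(units_sold)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(prices, units_sold)) / sum(
    (x - mean_x) ** 2 for x in prices
)
intercept = mean_y - slope * mean_x


def predict_units(price):
    """Predicted units sold at a given price, according to the fitted line."""
    return intercept + slope * price


print(f"Each extra unit of price changes sales by about {slope:.1f} units")
print(f"Predicted sales at a price of 15: {predict_units(15):.0f}")
```

The slope is the "explaining" part (how strongly sales respond to price), while predict_units is the "predicting" part for prices not seen in the data.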
Machine learning (ML) models
These models are mainly used for predictive purposes. Behind the scenes, these models can take multiple forms: some are more like decision trees on steroids (e.g. random forest, gradient boosting), while others are designed to resemble the way the human brain works (e.g. neural networks). In the example below, you can see how a decision-tree-based model may predict the likelihood of a customer returning a particular product:
Machine learning models are usually trained on one set of data and then evaluated on a separate set, to make sure they generalise well beyond the data they were originally trained on. This in turn allows us to apply them to entirely new data.
When we refer to predictive models, we mean either a statistical or a machine learning model.
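The train-then-evaluate idea can be sketched in a few lines of plain Python. The example below learns a single-split rule (a "decision stump", the simplest possible decision tree) for predicting product returns, then checks it on records it has not seen. The feature (days until delivery) and all records are invented assumptions; a real project would more likely use an ML library and many more features.

```python
import random

# Made-up records: (days_until_delivery, product_was_returned).
# Both the feature and the values are illustrative assumptions.
records = [
    (1, False), (2, False), (2, False), (3, False), (3, True), (4, False),
    (5, False), (5, True), (6, False), (6, True), (7, True), (8, False),
    (8, True), (9, True), (10, True), (11, True), (12, True), (14, True),
    (15, True), (16, True),
]

# Shuffle with a fixed seed, then hold out 25% of the records for evaluation.
rng = random.Random(42)
shuffled = records[:]
rng.shuffle(shuffled)
split = int(len(shuffled) * 0.75)
train, test = shuffled[:split], shuffled[split:]


def accuracy(threshold, rows):
    """Share of rows where the rule 'returned if days > threshold' matches the label."""
    return sum((days > threshold) == returned for days, returned in rows) / len(rows)


# "Training": pick the threshold that best separates the training rows.
# Only rules of the form 'days > threshold' are considered.
candidates = sorted({days for days, _ in train})
best_threshold = max(candidates, key=lambda t: accuracy(t, train))

print(f"Learned rule: predict a return if delivery takes more than {best_threshold} days")
print(f"Accuracy on training data: {accuracy(best_threshold, train):.2f}")
print(f"Accuracy on held-out data: {accuracy(best_threshold, test):.2f}")
```

If the held-out accuracy is much lower than the training accuracy, the model has likely memorised quirks of the training data rather than a pattern that carries over to new data.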
How does machine learning compare to traditional statistics? 🤖
Compared to traditional statistics, machine learning models tend to scale better and are easier to apply to entirely new datasets (provided that the model has been tested and optimised accordingly). ML models can also typically handle many more input variables at the same time, and they usually perform better on large datasets.
Below is a quick comparison of machine learning and statistics, highlighting the things the two approaches are best at:
Ready to build your first model? 🔮
If you're considering whether you can employ the power of statistics and machine learning to analyse, classify or predict outcomes, feel free to reach out for a free, no-strings-attached 30-minute meeting where we can discuss your needs in more detail:
