Traditional Predictive Analytics vs Machine Learning Methods

What is the difference between traditional predictive analytics, done with statistics and its tools, and predictive analytics done with machine learning and deep learning? How are we leveraging machine learning and deep learning to make predictive models better? How do we decide the boundary for when to use traditional models and when to use ML and DL? We have a lot of efficient statistical tools available, such as Stata and SPSS. They are easier to use and computationally efficient.

Thanks

I am unable to find a satisfactory and convincing answer with strong points on Google.

Topic deep-learning statistics predictive-modeling machine-learning

Category Data Science


In addition to the existing answers, the core reasons to leverage Deep Learning as opposed to statistics or more traditional ML techniques are:

  • Scale of data: Deep Learning algorithms work efficiently on large amounts of data (both structured and unstructured) and are best suited for unstructured data such as images, video, speech, and natural language.
  • Scale of computation: Deep Learning algorithms require high computational power for complex operations such as large matrix multiplications, typically run on GPUs.
  • Feature Extraction: Unlike traditional machine learning, in DL we do not need to manually engineer features from the dataset based on domain knowledge and expertise.
  • Data Augmentation: Creating new data by making reasonable modifications to existing data is called data augmentation. We can use it when a model is overfitting because of limited data (see the sketch after this list).
  • Testing Time: Deep Learning algorithms often take much less testing (inference) time than traditional machine learning algorithms.
  • Dimensionality: As the dimensionality of the data increases, the efficiency of classical machine learning algorithms degrades. Machine learning does offer dimensionality reduction techniques such as PCA, t-SNE, SVD, and MDS, but deep learning handles high-dimensional data particularly well.
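To make the data augmentation bullet concrete, here is a minimal NumPy sketch. The array of random 28x28 "images" is purely illustrative; a real pipeline would load actual images and might use library-specific augmentation utilities instead.

    import numpy as np

    rng = np.random.default_rng(0)
    images = rng.random((100, 28, 28))  # stand-in for 100 grayscale training images

    # Simple, label-preserving modifications create "new" training examples
    flipped = images[:, :, ::-1]                # mirror each image left-to-right
    shifted = np.roll(images, shift=2, axis=2)  # shift each image 2 pixels to the right

    augmented = np.concatenate([images, flipped, shifted], axis=0)
    print(augmented.shape)  # (300, 28, 28) -- three times as much training data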

However,

  • Training Data: Deep Learning algorithms usually require more training data than machine learning algorithms.
  • Training Time: Deep Learning algorithms usually take longer to train than machine learning algorithms.
  • Interpretability: Machine Learning algorithms are more interpretable than deep learning algorithms.

Sources:
https://machinelearningmastery.com/
https://www.analyticsvidhya.com/
https://datascience.stackexchange.com/
https://reddit.com/r/machinelearning/
https://builtin.com/data-science/


It's an interesting question. In my understanding, first of all, it would be incorrect to say that a statistical approach and machine learning are the same thing (even though they might look alike).

--Statistics plays a major role in building any machine learning model. But machine learning aims to predict something from a training set and evaluate that prediction on a test set, whereas statistics helps us establish a relationship between variables.

--Take the basic example of linear regression, where it seems as if the statistical method and ML are the same (they are not). Linear regression is a statistical method; we can train a linear regressor and obtain the same outcome as a statistical regression model that minimizes the squared error between data points. But in ML, you divide your dataset into a training set, build your model on it, and then evaluate it on a test set to see how well it works. Statistics, by contrast, focuses on the underlying relationship between the variables. This can still be used for the predictive analysis you mention in the question, but the evaluation will not be based on train-vs-test results; rather, it is done by assessing the significance and robustness of the model parameters (see the sketch after these points).

--ML can be used to build both supervised and unsupervised models, whereas statistical methods mainly help you forecast continuous variables and perform regression and classification. ML certainly uses statistics, but it has broader objectives than just finding a relationship.
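A rough sketch of the contrast described above, on synthetic data: the statistical workflow fits once and inspects coefficient significance, while the ML workflow holds out a test set and judges predictive error. The statsmodels and scikit-learn calls are standard; the data and shapes are made up for illustration.

    import numpy as np
    import statsmodels.api as sm
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.5, -2.0, 0.0]) + rng.normal(scale=0.5, size=200)

    # Statistical view: fit on all data, then inspect coefficients and p-values
    ols = sm.OLS(y, sm.add_constant(X)).fit()
    print(ols.summary())  # significance and robustness of the parameters

    # ML view: hold out a test set and judge the model by out-of-sample error
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    reg = LinearRegression().fit(X_tr, y_tr)
    print(mean_squared_error(y_te, reg.predict(X_te)))  # test-set error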

I hope the answer gives you something to think about.


This question is very hard to answer because ML is relatively new and not as semantically and academically well defined as statistics.

However, it helps to look at the problem a bit differently, focusing on tools and methods as well as on goals and use cases.

Per se, machine learning uses statistical and mathematical algorithms to solve computational problems, specifically problems of prediction. This is true for traditional predictive analytics and its tools as well.

A random forest model is not intrinsically traditional or ML; indeed, I can fit one in SPSS as well as in Python.
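For instance, here is a minimal scikit-learn sketch of exactly that, fitting a random forest in Python on a built-in toy dataset; whether you run this or the equivalent SPSS procedure, the underlying model is the same.

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    print(cross_val_score(clf, X, y, cv=5).mean())  # cross-validated accuracy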

So what, if any, are the differences?

First, we have to understand that ML comes from a different domain than classical statistical analysis. In many cases there will be no real difference, but a CS student may call it ML while a sociologist will call it predictive analysis.

So let's look at your more specific questions:

  1. Why not use STATA and SPSS?

You can! Indeed, STATA and SPSS have ML capabilities themselves and can be used to fit modern ML algorithms. However, for many specific use cases in the ML domain, STATA and SPSS lack the computational power and the ability to handle large data sets efficiently.

  2. Why not use traditional models like linear regression, etc. all the time?

Not every problem and every type of data fits classical methods. Linear regression, for example, cannot cope with sparse data and large numbers of NAs, it cannot capture nonlinear relations, and it degrades quickly when we use many predictors (a single small 28x28 image alone converts to 784 predictors, for example).
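The predictor-count point can be seen in a couple of lines of NumPy; the 28x28 size is just an assumed example (the classic small digit image), not anything special.

    import numpy as np

    image = np.zeros((28, 28))      # one small grayscale image
    predictors = image.reshape(-1)  # flattened into a single row of features
    print(predictors.shape)         # (784,) -- hundreds of correlated columns from one image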

Additionally, ML methods often just update classical predictive analytics by implementing it with state-of-the-art technology. ML differs from traditional statistics mainly in usage, with workflow features like incremental learning and self-optimization (a small sketch follows below).
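As one concrete example of that workflow difference, scikit-learn's SGDRegressor can be updated incrementally with partial_fit as new batches of data arrive, instead of being refit on the full dataset each time. The batch loop and synthetic data below are illustrative only.

    import numpy as np
    from sklearn.linear_model import SGDRegressor

    rng = np.random.default_rng(0)
    model = SGDRegressor(random_state=0)

    for _ in range(10):  # data arriving in chunks over time
        X_batch = rng.normal(size=(50, 4))
        y_batch = X_batch @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)
        model.partial_fit(X_batch, y_batch)  # update the model without full retraining

    print(model.coef_)  # coefficients refined after each batch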

Also, please understand that there is a huge difference between classical ML algorithms like boosted trees, random forests, and Naive Bayes, and deep learning, which is really quite different.
