This question is hard to answer because ML is still young and not as semantically and academically well defined as statistics.
However, it helps to look at the problem a bit differently, focusing on tools and methods as well as goals and use cases.
At its core, machine learning uses statistical and mathematical algorithms to solve computational problems, specifically problems of prediction. The same is true of traditional predictive analytics and its tools.
A random forest model is not intrinsically traditional or ML; indeed, I can fit one in SPSS as well as in Python.
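For illustration, here is a minimal sketch of fitting a random forest in Python with scikit-learn; the dataset and hyperparameters are arbitrary choices, and the same model family is available through SPSS's tree-ensemble procedures.

```python
# A minimal sketch: fitting a random forest in Python via scikit-learn.
# Dataset and hyperparameters are illustrative, not prescriptive.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```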
So what, if any, are the differences?
First, we have to understand that ML comes from a different domain than classical statistical analysis. In many cases there will be no real difference, but a CS student may call it ML while a sociologist will call it predictive analysis.
So let's look at your more specific questions:
- Why not use STATA and SPSS?
You can! Indeed, STATA and SPSS have ML capabilities of their own and can be used to fit modern ML algorithms. However, for many use cases in the ML domain, STATA and SPSS lack the computational power and the ability to handle large data sets efficiently.
- Why not use traditional models like linear regression, etc. all the time?
Not every problem and every type of data fits classical methods. Linear regression, for example, copes poorly with sparse data and large numbers of NAs, it cannot capture nonlinear relations, and it breaks down quickly as the number of predictors grows (a small 28x28 grayscale image alone yields 784 predictors, for example), as the sketch below illustrates.
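A hedged demonstration of the nonlinearity point: ordinary least squares fits a straight line through a quadratic signal and misses it entirely, while a tree ensemble adapts to the curve. The synthetic data and the two models are illustrative choices, not a benchmark.

```python
# Linear regression vs. a tree ensemble on a deliberately nonlinear relation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=500)  # quadratic signal + noise

linear = LinearRegression().fit(X, y)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

print("Linear regression R^2:", linear.score(X, y))  # near 0: misses the curve
print("Random forest R^2:   ", forest.score(X, y))   # near 1: captures it
```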
Additionally, ML methods often just update classical predictive analytics by implementing them with state-of-the-art technology. ML differs from traditional statistics mainly in usage, with practices like incremental learning and self-optimization; a sketch of incremental learning follows below.
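As one concrete example of that usage difference, scikit-learn's `partial_fit` lets a model learn from data arriving in chunks, so the full data set never has to sit in memory at once. The streamed batches below are simulated; a real pipeline would read them from disk or a feed.

```python
# A minimal sketch of incremental learning with a linear classifier.
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # all labels must be declared up front for partial_fit

rng = np.random.default_rng(0)
for _ in range(10):  # ten simulated mini-batches arriving over time
    X_batch = rng.normal(size=(100, 5))
    y_batch = (X_batch[:, 0] + X_batch[:, 1] > 0).astype(int)
    model.partial_fit(X_batch, y_batch, classes=classes)

print("Coefficients after streaming updates:", model.coef_)
```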
Also, please understand that there is a large gap between classical ML algorithms like boosted trees, random forests, and Naive Bayes on the one hand, and deep learning, which is a genuinely different beast, on the other.