This is not an easy question because there is no common agreement on what
“Data Mining” means. That said, I disagree with the answer from Wikipedia that
Yuvraj Singla points to. I don’t think it is accurate to say that machine
learning focuses on prediction, although I mostly agree with the definition of
Data Mining as focusing on the discovery of properties of the data. So, let’s
start with that: Data Mining is a cross-disciplinary field that focuses on
discovering properties of data sets. (Forget about it being the analysis step
of “knowledge discovery in databases”, or KDD; this may have been true years
ago, but it is not anymore.)
On the other hand, Machine Learning is a sub-field of data science that focuses
on designing algorithms that can learn from and make predictions on the data.
Machine learning includes Supervised Learning and Unsupervised Learning
methods. Unsupervised methods start off from unlabeled data sets, so, in a way,
they are directly related to finding out unknown properties in them (e.g.
clusters or rules). It is clear, then, that machine learning can be used for
data mining. However, data mining can also use other techniques besides or on
top of machine learning.
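To make the "unsupervised methods find unknown properties" point concrete, here
is a minimal sketch of k-means clustering in plain Python. It is only an
illustration under stated assumptions: the data points are made up, the
function name kmeans is my own, and the centroids are seeded with the first k
points to keep the run deterministic (real implementations usually seed more
carefully).

```python
import math

def kmeans(points, k, iterations=20):
    """Group 2-D points into k clusters with the basic k-means loop."""
    # Assumption: seed centroids with the first k points (deterministic, naive).
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, assignments

# Hypothetical unlabeled data: two groups, with no labels supplied.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1), (0.9, 1.1), (9.2, 8.9)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
```

Nobody told the algorithm which point belongs to which group; the grouping is a
property it discovered in the data, which is exactly the data mining use of
machine learning described above.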
Btw, to make things even more complicated, we now have a new term, Data
Science, that is competing for attention, especially with Data Mining and KDD.
Even the SIGKDD group at ACM is slowly moving towards using Data Science. On
their website, they now describe themselves as “The community for data
mining, data science and analytics”.
My bet is that KDD will disappear as a term pretty soon and data mining will
simply merge into data science.
Data mining isn’t a new invention that came with the digital age. The
concept has been around for over a century, but came into greater public focus
in the 1930s. According to Hacker Bits, one of the first modern moments of data
mining occurred in 1936, when Alan Turing introduced the idea of a universal
machine that could perform computations similar to those of modern-day
computers.
Forbes also reported on Turing’s development of the “Turing Test” in
1950 to determine whether a computer has real intelligence. To pass his test,
a computer needed to fool a human into believing it was also human. Just two
years later, Arthur Samuel created the Samuel Checkers-playing Program, which
appears to be the world’s first self-learning program. It learned as it played
and got better at winning by studying the best moves. We’ve come a
long way since then. Businesses are now harnessing data mining and machine
learning to improve everything from their sales processes to the way they
interpret financials for investment purposes. As a result, data scientists have
become vital employees at organizations all over the world as companies seek to
achieve bigger goals with data science than ever before.
With big data becoming so prevalent in the business world, a lot of
data terms tend to be thrown around, with many not quite understanding what
they mean. What is data mining? Is there a difference between machine learning
vs. data science? How do they connect to each other? Isn’t machine learning
just artificial intelligence? All of these are good questions, and discovering
their answers can provide a deeper, more rewarding understanding of data
science and analytics and how they can benefit a company.
Both data mining and machine learning are rooted in data science and generally
fall under that umbrella. They often intersect or are confused with each other,
but there are a few key distinctions between the two. Here’s a look at some
differences between data mining and machine learning and how they can be used.
One key difference is how they are used and applied in our everyday lives. For
example, machine learning often relies on data mining to surface the
relationships in a data set. Uber uses machine learning to calculate ETAs for
rides or meal delivery times for UberEATS.
Data mining can be used for a variety of purposes, including financial research.
Investors might use data mining and web scraping to look at a start-up’s
financials and help determine if they want to offer funding. A company may also
use data mining to help collect data on sales trends to better inform
everything from marketing to inventory needs, as well as to secure new leads.
Data mining can be used to comb through social media profiles, websites, and
digital assets to compile information on a company’s ideal leads to start an
outreach campaign. By some accounts, data mining can turn up as many as 10,000
leads in 10 minutes. With this much information, a data scientist can even
predict future trends that will help a company prepare for what customers may
want in the months and years to come.
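As a rough illustration of mining sales data for trends, the sketch below
counts which pairs of products are bought together and keeps the pairs that
appear in a large enough share of purchases. This is a simplified form of
frequent-itemset mining, not a full association-rule algorithm. The function
name frequent_pairs, the min_support threshold, and the baskets are all
hypothetical and chosen only for the example.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Return item pairs that co-occur in at least min_support of the baskets."""
    pair_counts = Counter()
    for basket in baskets:
        # Count each unordered pair at most once per basket.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    total = len(baskets)
    # Support = fraction of baskets that contain both items of the pair.
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_support
    }

# Hypothetical purchase records, one list of items per transaction.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]
pairs = frequent_pairs(baskets, min_support=0.5)
print(pairs)
```

Here a person still chooses the threshold and decides what to do with the
pairs that come back; the code only surfaces the pattern.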
Machine learning embodies the principles of data mining, but it can also find
correlations automatically and learn from them so that they can be applied to
new data. It’s the technology behind self-driving cars that can quickly adjust
to new conditions while driving. Machine learning also provides instant
recommendations when a buyer purchases a product from Amazon. These algorithms
are meant to keep improving, so the results should become more accurate over
time. Machine learning isn’t the same thing as artificial intelligence (it is
usually treated as one branch of it), but the ability to learn and improve is
still an impressive feat.
Unlike data mining, machine learning can actually learn from the
existing data and provide the foundation necessary for a machine to teach
itself. Zebra Medical Vision developed a machine learning algorithm to predict
cardiovascular conditions and events, which lead to the death of over 500,000
Americans each year. Machine learning can look at patterns and learn from them
to adapt behavior for future incidents, while data mining is typically used as
an information source for machine learning to pull from. Although data
scientists can set up data mining to automatically look for specific types of
data and parameters, it doesn’t learn and apply knowledge on its own without
human interaction. Data mining also can’t automatically see the relationship
between existing pieces of data with the same depth that machine learning can.
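The idea of learning from existing data and then applying that knowledge to a
new case can be sketched with one of the simplest learning methods, a
least-squares line fit. The trip data below is invented, and the function names
fit_line and predict are assumptions made for this example; it is loosely in
the spirit of the ETA example mentioned earlier and does not reflect how any
real service computes its estimates.

```python
def fit_line(xs, ys):
    """Learn slope and intercept from past observations (ordinary least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply the learned model to an input it has not seen before."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical history: trip distance in km and observed duration in minutes.
distances_km = [1.0, 2.0, 3.0, 4.0]
minutes = [5.0, 7.0, 9.0, 11.0]

model = fit_line(distances_km, minutes)
estimate = predict(model, 5.0)  # a 5 km trip was never in the training data
print(estimate)
```

The model's parameters come from the data rather than from a hand-written
rule, and refitting on more trips would update them without anyone changing
the code.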


