What is Bayes Theorem?¶
Bayes theorem is what allows us to go from a sampling (or likelihood) distribution and a prior distribution to a posterior distribution.
What is a Sampling Distribution?¶
A sampling distribution is the probability of seeing our data (X) given our parameters ($\theta$). This is written as $p(X|\theta)$.
For example, we might have data on 1,000 coin flips. Where 1 indicates a head. This can be represented in python as
Logistic Regression and Gradient Descent¶
Logistic regression is an excellent tool to know for classification problems. Classification problems are problems where you are trying to classify observations into groups. To make our examples more concrete, we will consider the Iris dataset. The iris dataset contains 4 attributes for 3 types of iris plants. The purpose is to classify which plant you have just based on the attributes. To simplify things, we will only consider 2 attributes and 2 classes. Here are the data visually:
INTRODUCTION TO PYTHON FOR DATA MINING¶
Python is a great language for data mining. It has a lot of great libraries for exploring, modeling, and visualizing data. To get started I would recommend downloading the Anaconda Package. It comes with most of the libraries you will need and provides and IDE and package manager.