A Naive Bayes Implementation
Naive Bayes is a method for constructing classifiers: that is, computer programs that assign class labels to problem instances.
Theoretical Explanation
The naive Bayes learner, often called the naive Bayes classifier, has a performance comparable to that of neural network and decision tree learning. Pretty good, isn’t it? But what is the problem framework in which naive Bayes applies?
The naive Bayes classifier applies to learning tasks where each instance x is described by a conjunction of attribute values and where the target function \(f(x)\) can take on any value from some finite set V. A set of training examples of the target function is provided, and a new instance is presented, described by the tuple of attribute values \(\langle a_1, a_2, ..., a_n \rangle\). The learner is asked to predict the target value, or classification, for this new instance. - Mitchell, p.177
Good. But how are we going to classify new instances? Well, the most probable target value \(v_{map}\), given the attribute values that describe the instance, is the one that maximizes the posterior probability:
\[v_{map} = \argmax_{v_j \in V} P(v_j \vert a_1,...,a_n)\]
How do we compute such a posterior probability? We can use Bayes' theorem to rewrite the expression:
\[v_{map} = \argmax_{v_j \in V} \frac{P(a_1,...,a_n \vert v_j)P(v_j)}{P(a_1,a_2,...,a_n)}\]
and, since the denominator does not depend on \(v_j\), it does not affect the argmax:
\[v_{map} = \argmax_{v_j \in V} P(a_1,...,a_n \vert v_j)P(v_j)\]
We compute \(P(v_j)\) by counting the frequency with which each target value \(v_j\) occurs in the training data. Estimating \(P(a_1,...,a_n \vert v_j)\) in the same way is not feasible, though: the number of distinct attribute tuples grows exponentially with the number of attributes, so we would need an enormous training set to observe each combination enough times. So we are going to make some assumptions. Actually, we are going to make a naive assumption.
(…) given the target value of the instance, the probability of observing the conjunction \(a_1, a_2, ..., a_n\), is just the product of the probabilities for the individual attributes: \(P(a_1, a_2, ..., a_n \vert v_j) = \prod _i P(a_i \vert v_j)\). - Mitchell, p.177
Making this substitution, we get the Naive Bayes Classifier:
\[v_{nb} = \argmax_{v_j \in V} P(v_j) \prod _i P(a_i \vert v_j)\]
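A practical aside: the product of many small probabilities can underflow floating point, so implementations often maximize the logarithm instead (the argmax is unchanged because the log is monotonic):
\[v_{nb} = \argmax_{v_j \in V} \left[ \log P(v_j) + \sum_i \log P(a_i \vert v_j) \right]\]
The code below keeps the direct product, exactly as in the formula above, since the Play Tennis example is small.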
Code
The first step is to import some modules:
import numpy as np
import pandas as pd
The function will receive two parameters:
- dataset: a pandas dataframe, with the classes in the last column. Each of the other columns is an attribute, and each row is a training example
- instance: a Python list with the instance's attribute values, in the same order as the dataframe's attribute columns
and then it returns \(v_{nb}\).
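For instance, here is a made-up toy layout (the column names and values are purely illustrative, using the pandas imported above; the Play Tennis dataframe later in this post follows the same pattern):
toy = pd.DataFrame({'Color': ['Red', 'Red', 'Blue'],   # attribute column
                    'Size':  ['Big', 'Small', 'Big'],   # attribute column
                    'Label': ['A',   'A',     'B']})    # classes go in the last column
toy_instance = ['Red', 'Big']                           # one value per attribute column, in order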
def naive_bayes(dataset, instance):
    """
    Parameters: the dataset (a pandas dataframe, with classes as the last column)
    and the instance to be classified (a list)
    Returns: v_nb (the most probable class of the instance, given its attributes)
    """
    existing_classes = list(set(dataset.iloc[:, -1].tolist()))  # these are the v_j in V
    classes_probability = []
    for v in existing_classes:
        # P(v_j): relative frequency of class v in the training data
        P_vj = dataset.iloc[:, -1].tolist().count(v) / dataset.iloc[:, -1].shape[0]
        prod = 1
        # subset of the training examples whose class is v
        subset = dataset[dataset[dataset.columns[-1]] == v]
        for it in range(len(instance)):
            # P(a_i | v_j): relative frequency of the attribute value within the subset
            counting_occurrence = subset.iloc[:, it].tolist().count(instance[it])
            prod = prod * (counting_occurrence / subset.shape[0])
        classes_probability.append(P_vj * prod)
    v_nb = existing_classes[np.argmax(classes_probability)]
    return v_nb
The first step of the code is to collect all the distinct classes from the last column (a pandas Series). Then, we allocate an empty list to collect the probability of each class.
For each possible class v we then perform the following steps:
- compute \(P(v_j)\) as the fraction of training examples whose class is v
- get the subset of the dataframe for which the class is v, and initialize a running product at 1
- loop over the columns of the subset, counting how many times the corresponding attribute value of the instance appears; dividing by the size of the subset gives \(P(a_i \vert v_j)\), which is multiplied into the running product (see the note on zero counts after this list)
- append the resulting value \(P(v_j)\prod_i P(a_i \vert v_j)\) to the classes_probability list
At the end, we return v_nb as the class v with the greatest respective probability.
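One caveat: if some attribute value never occurs together with class v in the training data, its count is zero and the whole product for that class collapses to zero. Mitchell (p.179) recommends the m-estimate of probability to handle this. A minimal sketch of that adjustment, assuming we factor the estimate into a small helper (m_estimate is an illustrative name, not part of the function above):
def m_estimate(count, subset_size, m, p):
    # Mitchell's m-estimate: (n_c + m*p) / (n + m), where p is a prior estimate
    # of the probability (e.g. 1/k for an attribute with k possible values)
    # and m is the equivalent sample size
    return (count + m * p) / (subset_size + m)
Inside the inner loop, prod would then be multiplied by m_estimate(counting_occurrence, subset.shape[0], m, p) instead of the plain ratio; the rest of the function stays the same.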
Example
In Mitchell, p.71, we see a table related to the “Play Tennis” problem. On p.190 we are given the instance ['Sunny', 'Cool', 'High', 'Strong']. We are then asked to predict whether to play tennis or not, that is, to classify the instance.
First of all, we code the table from page 71 as a dataframe:
import pandas as pd
d = {'Outlook':['Sunny','Sunny','Overcast','Rain','Rain','Rain','Overcast','Sunny','Sunny','Rain','Sunny','Overcast','Overcast','Rain'],
'Temperature':['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild'],
'Humidity':['High','High','High','High','Normal','Normal','Normal','High','Normal','Normal','Normal','High','Normal','High',],
'Wind':['Weak','Strong','Weak','Weak','Weak','Strong','Strong','Weak','Weak','Weak','Strong','Strong','Weak','Strong'],
'PlayTennis':['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']}
df = pd.DataFrame(data= d)
df
The resulting dataframe can be seen below:
Outlook | Temperature | Humidity | Wind | PlayTennis |
---|---|---|---|---|
Sunny | Hot | High | Weak | No |
Sunny | Hot | High | Strong | No |
Overcast | Hot | High | Weak | Yes |
Rain | Mild | High | Weak | Yes |
Rain | Cool | Normal | Weak | Yes |
Rain | Cool | Normal | Strong | No |
Overcast | Cool | Normal | Strong | Yes |
Sunny | Mild | High | Weak | No |
Sunny | Cool | Normal | Weak | Yes |
Rain | Mild | Normal | Weak | Yes |
Sunny | Mild | Normal | Strong | Yes |
Overcast | Mild | High | Strong | Yes |
Overcast | Hot | Normal | Weak | Yes |
Rain | Mild | High | Strong | No |
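As a quick sanity check of the prior term \(P(v_j)\), we can count the classes directly on this dataframe (the expected output is shown as comments):
df['PlayTennis'].value_counts(normalize=True)
# Yes    0.642857   (= 9/14)
# No     0.357143   (= 5/14)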
The instance can be coded as:
instance = ['Sunny', 'Cool','High','Strong']
Since we have already coded the Naive Bayes Classifier, we just need to apply it:
classificacao = naive_bayes(df, instance)
Printing classificacao shows that the answer is 'No'.
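This agrees with the hand computation in Mitchell, p.190. Counting in the table above gives \(P(Yes) = 9/14\), \(P(No) = 5/14\), \(P(Sunny \vert Yes) = 2/9\), \(P(Sunny \vert No) = 3/5\), and so on, so that
\[P(Yes)\,P(Sunny \vert Yes)\,P(Cool \vert Yes)\,P(High \vert Yes)\,P(Strong \vert Yes) = \frac{9}{14}\cdot\frac{2}{9}\cdot\frac{3}{9}\cdot\frac{3}{9}\cdot\frac{3}{9} \approx 0.0053\]
\[P(No)\,P(Sunny \vert No)\,P(Cool \vert No)\,P(High \vert No)\,P(Strong \vert No) = \frac{5}{14}\cdot\frac{3}{5}\cdot\frac{1}{5}\cdot\frac{4}{5}\cdot\frac{3}{5} \approx 0.0206\]
and the larger value indeed corresponds to 'No'.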