朴素贝叶斯文本分类器 Naive Bayes Classifier on Text Classification

1 Brief Introduction

(An introductory slide deck is embedded here in the original post; click the left/right buttons or use the left/right arrow keys to flip through it, or click for full screen.)


2 Naive Bayes Classifier on Text Classification

How the Naive Bayes text classifier works


2.1 In English

In plain language

  • Step 1, build the model:
    • Collect all the words that appear in the sentences of the training data;
    • Compute the frequency of each word within each class, i.e. $\hat{P}(w_i\mid c_j)$;
    • Compute the prior probability of each class, i.e. $\hat{P}(c_j)$.
  • Step 2, apply the model:
    • Compute the probability of the given sentence under each class;
    • The probability of sentence $t$ under class $c$ is $P(c\mid t) \propto P(c)\prod_{i} P(x_i\mid c)$;
    • The class with the largest probability is the predicted class (see the sketch after this list).
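Below is a minimal Python sketch of these two steps, using a tiny made-up training set; the data, the `predict` function and the variable names are my own illustration, not from the original slides, and it already uses the add-1 (Laplace) smoothing introduced later in this post so that unseen words do not zero out the product:

```python
from collections import Counter, defaultdict

# Hypothetical toy training data: (sentence, class) pairs.
train_data = [
    ("this movie is great", "positive"),
    ("what a great day", "positive"),
    ("this movie is terrible", "negative"),
    ("a terrible waste of time", "negative"),
]

# Step 1: build the model -- count words per class and count class frequencies.
word_counts = defaultdict(Counter)   # word_counts[c][w] = count(w, c)
class_counts = Counter()             # class_counts[c]   = number of training sentences of class c
vocab = set()
for sentence, c in train_data:
    class_counts[c] += 1
    for w in sentence.split():
        word_counts[c][w] += 1
        vocab.add(w)

# Step 2: apply the model -- score a new sentence under each class and take the maximum.
def predict(sentence):
    scores = {}
    for c in class_counts:
        prob = class_counts[c] / len(train_data)          # P(c)
        total = sum(word_counts[c].values())              # total word count in class c
        for w in sentence.split():
            # add-1 (Laplace) smoothing, explained later in the post
            prob *= (word_counts[c][w] + 1) / (total + len(vocab))
        scores[c] = prob
    return max(scores, key=scores.get), scores

print(predict("a great movie"))
```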

2.2 In Computer

Less talk; just read on


The Basics: Bayes’ Formula

For inverting the conditioning

Suppose that $B_1,B_2,\dots,B_n$ are $n$ mutually exclusive and exhaustive events; then:

$$P(B_k\mid A) = \frac{P(B_k)\,P(A\mid B_k)}{\sum_{i=1}^{n} P(B_i)\,P(A\mid B_i)}$$

$\because P(B_k\cap A) = P(B_k)\cdot P(A\mid B_k)$, based on conditional probability,

and $P(A)=P(B_1)P(A\mid B_1)+\dots+P(B_n)P(A\mid B_n)$, based on the law of total probability.
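As a quick, made-up numeric illustration (the numbers are not from the slides): suppose 20% of messages are spam, the word “free” appears in 40% of spam messages, and it appears in 5% of normal messages. Then, by the formula above,

$$P(\text{spam}\mid \text{free}) = \frac{0.2\times 0.4}{0.2\times 0.4 + 0.8\times 0.05} = \frac{0.08}{0.12} \approx 0.67,$$

so observing “free” raises the probability of spam from the prior 0.2 to about 0.67.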


Detailed Explanation



1 Our Goal


Based on the trained model, given a document (or a sentence/text), we can predict the class of the document.

$d$: a given document, or sentence/text;
$C$: set of all possible classes, e.g. (positive, negative, neutral);
$c$: the final result we want, i.e. the predicted class of $d$.

The goal is to get the maximum value of $P(c\mid d),\ c \in C$, which means: given the document $d$, find its class with the maximum probability:

$$\hat{c} = \underset{c\in C}{\arg\max}\; P(c\mid d)$$

Based on Bayes’ Formula, we have

$$P(c\mid d) = \frac{P(d\mid c)\,P(c)}{P(d)}$$

So our final class $c$ is

$$c_{MAP} = \underset{c\in C}{\arg\max}\; \frac{P(d\mid c)\,P(c)}{P(d)} = \underset{c\in C}{\arg\max}\; P(d\mid c)\,P(c) = \underset{c\in C}{\arg\max}\; P(x_1,x_2,\dots,x_n\mid c)\,P(c)$$

(MAP means maximum a posteriori, i.e. the most likely class,)

where we ignore $P(d)$, because it does not depend on $c$ and behaves like a constant.

  • $x_1,x_2,\dots,x_n$ are all the words in $d$, e.g. all the words in a message;
  • $P(c),c\in C$ is the prior probability of the class, obtained by counting relative frequencies, e.g. the frequencies of normal messages and spam messages.

For $P(x_1,x_2,…,x_n|c)$, we have two assumptions to simplify the prediction:

  • Bag of Words assumption: assume word positions don’t matter;
  • Conditional Independence: assume the $P(x_i\mid c_j)$ are independent, which means each word in a message is independent of the other words in the message.

Based on these assumptions, we have:

$$P(x_1,x_2,\dots,x_n\mid c) = P(x_1\mid c)\cdot P(x_2\mid c)\cdots P(x_n\mid c)$$

Therefore, we can simplify the prediction $c_{MAP}$ above to the Naive Bayes prediction:

$$c_{NB} = \underset{c\in C}{\arg\max}\; P(c)\prod_{i=1}^{n} P(x_i\mid c)$$

which is the same as:

$$c_{NB} = \underset{c_j\in C}{\arg\max}\; P(c_j)\prod_{i\in positions} P(x_i\mid c_j)$$

$positions$ = all word positions in the test document.

Multiplying many small floating-point numbers may cause underflow,
so, based on

$$\log(xy) = \log x + \log y,$$

we instead maximize

$$c_{NB} = \underset{c_j\in C}{\arg\max}\; \Big[\log P(c_j) + \sum_{i\in positions}\log P(x_i\mid c_j)\Big]$$

This is our final goal; next we need to get the value of each term.
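A tiny Python check of why the log form matters (an illustrative sketch, not part of the original post): multiplying a hundred small word probabilities underflows to 0.0 in double precision, while the equivalent sum of logs stays perfectly usable for comparing classes:

```python
import math

p = 1e-5    # a typical small per-word probability
n = 100     # number of words in a longish document

product = 1.0
for _ in range(n):
    product *= p             # repeated multiplication underflows
log_sum = n * math.log(p)    # the equivalent sum of log-probabilities

print(product)   # 0.0  (1e-500 is below the smallest positive double, ~5e-324)
print(log_sum)   # -1151.29..., still easy to compare between classes
```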


2 Our Model Building Process


  • For the maximum likelihood estimate $\hat{P}(c_j)$:

$$\hat{P}(c_j) = \frac{N_{c_j}}{N_{total}}$$

Get the frequency with which class $c_j$ appears in the dataset, i.e. the prior probability of the class ($N_{c_j}$ is the number of training documents of class $c_j$ and $N_{total}$ is the total number of training documents).


  • For the parameter estimate $\hat{P}(w_i\mid c_j)$:

$$\hat{P}(w_i\mid c_j) = \frac{count(w_i, c_j)}{\sum_{w\in V} count(w, c_j)}$$

($V$ is the vocabulary containing all the words used for classification in the training dataset.)

Get the frequency with which the word $w_i$ appears among all the words in the documents of class $c_j$.
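For instance (a made-up illustration, not numbers from the slides): if the word “great” occurs 10 times among the 200 word tokens of the positive-class training documents, then

$$\hat{P}(\text{great}\mid\text{positive}) = \frac{10}{200} = 0.05,$$

while a word that never appears in the positive training data would get $0/200 = 0$, which is exactly the problem discussed next.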


Problem:

Words that never appear in the training data of a class make the whole product zero directly, which is improper.

Solution:

Laplace (add-1) Smoothing for Naive Bayes: pretend every word appears one extra time, i.e.

$$\hat{P}(w_i\mid c_j) = \frac{count(w_i, c_j) + 1}{\sum_{w\in V}\big(count(w, c_j) + 1\big)} = \frac{count(w_i, c_j) + 1}{\Big(\sum_{w\in V} count(w, c_j)\Big) + |V|}$$
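Continuing the made-up numbers above: with a vocabulary of $|V| = 500$ words, the unseen word now gets

$$\hat{P} = \frac{0 + 1}{200 + 500} \approx 0.0014$$

instead of $0$, and “great” gets $(10 + 1)/(200 + 500) \approx 0.0157$.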


3 Result of the model

The trained model consists of the estimated class priors $\hat{P}(c_j)$ and the smoothed word likelihoods $\hat{P}(w_i\mid c_j)$ for every word $w_i$ in the vocabulary $V$.


4 Applying the model


[Example]

We have 3 classes: Positive, Negative, Neutral.
then $C=\{\text{positive},\ \text{negative},\ \text{neutral}\}$.

With the trained model we can compute $c_{\text{positive}}$, $c_{\text{negative}}$ and $c_{\text{neutral}}$, the scores of the tested document under each class.

Then we pick the class with the maximum value, e.g. if $c_{\text{neutral}}$ is the largest one, then the class of the tested document is Neutral.
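In code, this last step is just an argmax over the per-class scores. A minimal sketch, where the log-scores are placeholder numbers rather than values computed in the post:

```python
# Hypothetical log-scores log P(c_j) + sum_i log P(x_i | c_j) for the tested document.
log_scores = {
    "positive": -42.7,
    "negative": -45.1,
    "neutral": -40.3,
}

predicted = max(log_scores, key=log_scores.get)
print(predicted)   # "neutral", the class with the largest (least negative) score
```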


References

Slides of COMP1433 Introduction to Data Analytics, The Hong Kong Polytechnic University.
Naive Bayes, Clearly Explained!!! - YouTube


Final Words

This is my first encounter with Naive Bayes; I will keep learning the related topics and keep updating this post.
Finally, I hope we can all discuss, share, and point out problems together. Thank you!


This is an original article; please credit the source when reposting.
Made by Mike_Zhang




Thank you for your support.
