# Classification of Text

Last Updated on September 30, 2022 by David Vause

## Text Classification

Supervised learning: machines learn from past instances.

Training phase: information is gathered from labeled inputs (instances whose labels are known), and a classification model is built.

Inference phase: the trained model is applied to new, unlabeled data.

unlabeled input -> classification model -> labeled outputs

Learn a classification model on properties (“features”) and their importance (“weights”) from labeled instances.

• X: the set of attributes or features, {x1, x2, …, xn}. The input.
• y: a “class” label from the label set Y = {y1, y2, …, yk}

Apply the model to instances to predict the label.

Validation set: the portion of the training data held out to evaluate the model during training.
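
A minimal sketch of this train/validate workflow, assuming scikit-learn; the feature vectors and labels are invented for illustration:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy labeled instances: X holds feature vectors, y holds class labels.
X = [[1, 0], [0, 1], [1, 1], [0, 0], [2, 1], [1, 2]]
y = ["spam", "ham", "spam", "ham", "spam", "ham"]

# Hold out part of the training data as a validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.33, random_state=0)

model = LogisticRegression().fit(X_train, y_train)  # training phase
print(model.predict(X_val))                         # inference on held-out data
print(model.score(X_val, y_val))                    # validation accuracy
```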

• binary classification: the number of possible classes is two. |Y| = 2
• multi-class classification: the number of possible classes is greater than two. |Y| > 2
• multi-label classification: each instance can carry two or more labels at once (see the sketch below)
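
A short sketch of the multi-label case using scikit-learn’s MultiLabelBinarizer; the tag sets are invented:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Multi-label: each instance may carry more than one label at once.
tags = [{"CS", "entertainment"}, {"zoology"}, {"CS"}]
mlb = MultiLabelBinarizer()
print(mlb.fit_transform(tags))  # one indicator column per label
print(mlb.classes_)             # ['CS' 'entertainment' 'zoology']
```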

Training phase:

• What are the features, and how do you represent them?
• What is the classification model or algorithm?
• What are the model parameters?

Inference phase:

• What is the expected performance? What is a good measure?
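
Putting the two phases together, a hedged end-to-end sketch with scikit-learn (documents and labels are invented): the features are word counts, the model is multinomial Naive Bayes, and its parameters are the per-class word probabilities it fits.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_docs = ["the movie was fun", "python code review", "lions in the savanna",
              "a new blockbuster film", "debugging python scripts"]
train_labels = ["entertainment", "CS", "zoology", "entertainment", "CS"]

# Training phase: choose a feature representation (word counts)
# and fit the model parameters from labeled instances.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)
clf = MultinomialNB().fit(X_train, train_labels)

# Inference phase: apply the model to unlabeled input.
unlabeled = ["a python tutorial", "film awards tonight"]
print(clf.predict(vectorizer.transform(unlabeled)))
```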

## Identifying Features from Text

Types of textual features:

• words
• stop words: commonly occurring words, often removed
• normalization: e.g., lowercasing all tokens
• stemming/lemmatizing: map inflected forms to a base form, so plurals match their singulars
• case: White House vs. white house
• parts of speech: whether vs. weather
• grammatical structure, sentence parsing
• semantics: one feature for a particular group of words
• honorifics, numbers, dates
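
A sketch of several of these feature decisions with NLTK (assumes the punkt, stopwords, and tagger resources have been downloaded):

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

# One-time setup, if needed:
# nltk.download("punkt"); nltk.download("stopwords")
# nltk.download("averaged_perceptron_tagger")

text = "Whether the White House cats chase mice depends on the weather"
tokens = nltk.word_tokenize(text)

# Normalization: lowercasing loses the White House / white house distinction.
lowered = [t.lower() for t in tokens]

# Stop words: drop commonly occurring words.
content = [t for t in lowered if t not in stopwords.words("english")]

# Stemming: map plurals and other inflected forms to a common stem.
stemmer = PorterStemmer()
print([stemmer.stem(t) for t in content])

# Parts of speech: distinguishes, e.g., 'whether' (conjunction)
# from 'weather' (noun).
print(nltk.pos_tag(tokens))
```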

## Naive Bayes Classifiers

• prior probability: the belief in each class before seeing any evidence, e.g. Pr(y=entertainment), Pr(y=CS), Pr(y=zoology); the priors sum to 1
• posterior probability: the updated belief in a class given new information, e.g. Pr(y=entertainment | x=“Python”)
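
A hedged sketch of how a fitted Naive Bayes classifier exposes these quantities in scikit-learn; the toy documents and labels are invented:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["python web framework", "monty python sketch", "snake python habitat",
        "comedy python show", "java python code"]
labels = ["CS", "entertainment", "zoology", "entertainment", "CS"]

vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(docs), labels)

# Prior probabilities Pr(y), estimated from class frequencies; they sum to 1.
print(dict(zip(clf.classes_, np.exp(clf.class_log_prior_))))

# Posterior Pr(y | x="Python"): the belief updated after seeing the word.
print(dict(zip(clf.classes_, clf.predict_proba(vec.transform(["python"]))[0])))
```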

Bayes’ Rule

• $$\text{posterior probability} = \frac{\text{prior probability} \times \text{likelihood}}{\text{evidence}}$$
• $$Pr(y|X) = \frac {Pr(y) \times Pr(X|y)} {Pr(X)}$$
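
A worked numeric instance of the rule; all probabilities here are invented for illustration:

```python
# Invented numbers: priors Pr(y) and likelihoods Pr(x="Python" | y).
prior      = {"CS": 0.6, "zoology": 0.4}
likelihood = {"CS": 0.9, "zoology": 0.5}

# Evidence Pr(x), via the law of total probability over the classes.
evidence = sum(prior[y] * likelihood[y] for y in prior)

# Posterior Pr(y | x) = prior * likelihood / evidence; posteriors sum to 1.
posterior = {y: prior[y] * likelihood[y] / evidence for y in prior}
print(posterior)  # {'CS': 0.7297..., 'zoology': 0.2702...}
```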