Last Updated on November 20, 2022 by David Vause
The DataFrame apply() Function
An example using the Natural Language Toolkit (NLTK)
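moby_raw holds the raw text of Moby Dick, read from the file moby.txt.
The first assignment to sentences uses nltk.sent_tokenize() to split moby_raw into a list of sentence strings.
The second assignment converts that list into a DataFrame with a single column, sentence.
The next statement adds a second column, count, to hold the token counts.
The apply() call with axis=1 passes each row of sentences to count_t(), which tokenizes the row's sentence with nltk.word_tokenize() and stores the number of tokens in count.
Overall, the code splits moby_raw into sentences and computes the average number of tokens (words and punctuation) per sentence.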
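```python
import nltk
import pandas as pd

# sent_tokenize() and word_tokenize() need NLTK's Punkt tokenizer data,
# e.g. nltk.download('punkt') ('punkt_tab' in recent NLTK releases)

# If you would like to work with the raw text you can use 'moby_raw'
with open('moby.txt', 'r') as f:
    moby_raw = f.read()

def count_t(row):
    # Store the number of tokens (words and punctuation) in this row's sentence
    row['count'] = len(nltk.word_tokenize(row['sentence']))
    return row

sentences = nltk.sent_tokenize(moby_raw)                    # list of sentence strings
sentences = pd.DataFrame(sentences, columns=['sentence'])   # one column: sentence
sentences['count'] = 0                                      # numeric placeholder column
sentences = sentences.apply(count_t, axis=1)                # call count_t() once per row
mean = sentences['count'].astype(int).mean()                # average tokens per sentence
```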
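To see what apply() with axis=1 does without downloading any NLTK data, here is a minimal sketch on a toy DataFrame. It assumes a plain whitespace split (str.split()) as a stand-in for nltk.word_tokenize(), so its counts will differ from NLTK's, which also counts punctuation as separate tokens. The names count_words, df and count2 are hypothetical, chosen for this illustration.

```python
import pandas as pd

def count_words(row):
    # Stand-in tokenizer: whitespace split, NOT nltk.word_tokenize()
    row["count"] = len(row["sentence"].split())
    return row

df = pd.DataFrame({"sentence": ["Call me Ishmael.",
                                "Some years ago, never mind how long precisely."]})
df["count"] = 0

# axis=1 passes each row (a Series indexed by column name) to count_words
df = df.apply(count_words, axis=1)
print(df["count"].astype(int).tolist())   # [3, 8]

# Same result computed on a single column with Series.apply, no row function needed
df["count2"] = df["sentence"].apply(lambda s: len(s.split()))
print(df["count"].astype(int).mean())     # 5.5
```

The row-wise version mirrors count_t() above; when only one column is needed, Series.apply on that column is usually simpler than building and returning whole rows.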
Unpacking a List of Tuples
Another example using the Natural Language Toolkit
moby_tokens contains a list of tokens in the text of Moby Dick:
['[', 'Moby', 'Dick', 'by', 'Herman', 'Melville', '1851', ...]
nltk.pos_tag() returns a list of (word, tag) tuples, where tag is an abbreviation for the word's part of speech from the Penn Treebank tag set (for example, NNP for a singular proper noun).
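```python
import nltk

# Assumes moby_raw from the previous example.
# nltk.pos_tag() needs NLTK's averaged perceptron tagger data (see nltk.download())
moby_tokens = nltk.word_tokenize(moby_raw)
pos_lst = nltk.pos_tag(moby_tokens)   # list of (word, tag) tuples

# Unpack each (word, tag) tuple and keep only the part-of-speech tag
tags = [tag for word, tag in pos_lst]
```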
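The sketch below shows a few common ways to unpack a list of (word, tag) tuples, using a small hand-written sample shaped like nltk.pos_tag() output so it runs without the tagger. The tags in the sample are illustrative, not actual tagger output, and the names words, tags2, tag_counts and df are hypothetical.

```python
from collections import Counter

import pandas as pd

# Hand-written sample in the shape nltk.pos_tag() returns (not real tagger output)
pos_lst = [("Call", "VB"), ("me", "PRP"), ("Ishmael", "NNP"), (".", "."),
           ("Some", "DT"), ("years", "NNS"), ("ago", "RB"), (".", ".")]

# 1. Unpack each tuple inside a list comprehension
tags = [tag for word, tag in pos_lst]

# 2. Unpack the whole list at once: zip(*...) transposes the pairs into two tuples
words, tags2 = zip(*pos_lst)

# 3. Count how often each tag occurs
tag_counts = Counter(tags)

# 4. Or load the tuples straight into a two-column DataFrame
df = pd.DataFrame(pos_lst, columns=["word", "tag"])

print(words)                      # ('Call', 'me', 'Ishmael', '.', 'Some', 'years', 'ago', '.')
print(tag_counts.most_common(1))  # [('.', 2)]
```

zip(*pos_lst) is convenient when both the words and the tags are needed; the comprehension is clearer when only one of them is.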