Emoji: Extract, Analyze, and Get Insights

An emoji is worth a thousand words! This module provides regular expressions and helper functions for extracting and finding emoji in text.

EMOJI_ENTRIES

A dictionary of the full emoji list together with Unicode code points, textual name, group, and sub-group. Based on Emoji v15.1: https://unicode.org/Public/emoji/15.1/emoji-test.txt

emoji_df

The same dictionary as a pandas DataFrame.

extract_emoji()

A function for extracting and summarizing emoji in a text list, with statistics about frequencies and usage.

emoji_search()

A function for searching across names, groups, and sub-groups to find emoji based on your keywords of choice.

EMOJI_RAW

A regular expression that matches the full emoji list. For details on how it was developed, see https://www.kaggle.com/eliasdabbas/how-to-create-a-python-regex-to-extract-emoji
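As a rough illustration of how such a pattern can be assembled, the sketch below escapes each emoji string and joins them with `|`, longest first, so that multi-code-point sequences such as "service dog" match as a whole rather than as their parts. This is not the actual definition of EMOJI_RAW; the sample list and the name `build_emoji_regex` are assumptions made for this example.

```python
import re

# Hypothetical sample; a real list would cover every entry in emoji-test.txt.
SAMPLE_EMOJI = [
    "\U0001F415",                  # dog
    "\U0001F9BA",                  # safety vest
    "\U0001F415\u200D\U0001F9BA",  # service dog (dog + zero-width joiner + safety vest)
    "\U0001F600",                  # grinning face
]


def build_emoji_regex(emoji_list):
    """Compile an alternation of the given emoji, longest sequences first."""
    ordered = sorted(set(emoji_list), key=len, reverse=True)
    return re.compile("|".join(re.escape(e) for e in ordered))


emoji_regex = build_emoji_regex(SAMPLE_EMOJI)
text = "walk \U0001F415\u200D\U0001F9BA then \U0001F415 \U0001F600"
found = emoji_regex.findall(text)
# The service dog sequence is returned as one match, not as dog + safety vest.
```

Sorting by length matters because regex alternation tries branches left to right: if the single dog emoji came first, it would match the start of the service dog sequence and split it.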

Extract Emoji from Text

You will often have social media posts, or any other text containing emoji, that you want to analyze. The extract_emoji() function extracts the emoji and returns summary statistics about them. Try it on the following sample text list, modify it, and explore the resulting statistics:

import advertools as adv

text_list = ['I feel like playing basketball 🏀',
             'I like playing football ⚽⚽',
             'Not feeling like sports today']

emoji_summary = adv.extract_emoji(text_list)
print(emoji_summary.keys())

emoji_search(regex)

Return a DataFrame of all emoji entries that match regex.

The search is run on the name of the emoji, its group, and sub-group.

Parameters:

regex (str) -- regular expression (case insensitive)

>>> import advertools as adv
>>> adv.emoji_search("dog")
          codepoint           status  emoji          name             group        sub_group
0             1F436  fully-qualified     🐶      dog face  Animals & Nature    animal-mammal
1             1F415  fully-qualified     🐕           dog  Animals & Nature    animal-mammal
2             1F9AE  fully-qualified     🦮     guide dog  Animals & Nature    animal-mammal
3  1F415 200D 1F9BA  fully-qualified     🐕‍🦺   service dog  Animals & Nature    animal-mammal
4             1F32D  fully-qualified     🌭       hot dog      Food & Drink    food-prepared
>>> blue = adv.emoji_search("blue")
>>> blue
  codepoint           status emoji                name               group     sub_group
0     1F499  fully-qualified     💙          blue heart  Smileys & Emotion       emotion
1     1FAD0  fully-qualified     🫐         blueberries       Food & Drink    food-fruit
2     1F4D8  fully-qualified     📘           blue book            Objects    book-paper
3     1F535  fully-qualified     🔵         blue circle            Symbols     geometric
4     1F7E6  fully-qualified     🟦         blue square            Symbols     geometric
5     1F537  fully-qualified     🔷  large blue diamond            Symbols     geometric
6     1F539  fully-qualified     🔹  small blue diamond            Symbols     geometric
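To make the matching rule concrete, here is a minimal sketch of a case-insensitive regex search across the name, group, and sub-group fields. It uses a plain list of dictionaries rather than the library's DataFrame; `SAMPLE_ENTRIES` and `search_entries` are hypothetical names for this example and not part of advertools.

```python
import re

# Hypothetical three-row sample with the same columns as the tables above.
SAMPLE_ENTRIES = [
    {"emoji": "\U0001F436", "name": "dog face",
     "group": "Animals & Nature", "sub_group": "animal-mammal"},
    {"emoji": "\U0001F32D", "name": "hot dog",
     "group": "Food & Drink", "sub_group": "food-prepared"},
    {"emoji": "\U0001F499", "name": "blue heart",
     "group": "Smileys & Emotion", "sub_group": "emotion"},
]


def search_entries(regex, entries=SAMPLE_ENTRIES):
    """Return entries whose name, group, or sub_group matches regex."""
    pattern = re.compile(regex, flags=re.IGNORECASE)
    fields = ("name", "group", "sub_group")
    return [e for e in entries if any(pattern.search(e[f]) for f in fields)]


dog_names = [e["name"] for e in search_entries("DOG")]    # matched on name
food_names = [e["name"] for e in search_entries("food")]  # matched on group and sub_group
```

Because all three fields are searched, a keyword such as "food" finds "hot dog" even though the word does not appear in the emoji's name.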
extract_emoji(text_list)

Return a summary dictionary about emoji in text_list.

Get a summary of the number of emoji, their frequency, the top ones, and more.

Parameters:

text_list (list) -- A list of text strings.

Returns summary:

A dictionary with various stats about emoji.

>>> from advertools import extract_emoji
>>> posts = [
...     "I am grinning 😀",
...     "A grinning cat 😺",
...     "hello! 😀😀😀 💛💛",
...     "Just text",
... ]
>>> emoji_summary = extract_emoji(posts)
>>> emoji_summary.keys()
dict_keys(['emoji', 'emoji_text', 'emoji_flat', 'emoji_flat_text',
'emoji_counts', 'emoji_freq', 'top_emoji', 'top_emoji_text',
'top_emoji_groups', 'top_emoji_sub_groups', 'overview'])
>>> emoji_summary["emoji"]
[['😀'], ['😺'], ['😀', '😀', '😀', '💛', '💛'], []]
>>> emoji_summary["emoji_text"]
[['grinning face'], ['grinning cat'], ['grinning face', 'grinning face',
  'grinning face', 'yellow heart', 'yellow heart'], []]

The emoji extracted from each post, as characters and as names. A post without emoji yields an empty list.

>>> emoji_summary["emoji_flat"]
['😀', '😺', '😀', '😀', '😀', '💛', '💛']
>>> emoji_summary["emoji_flat_text"]
['grinning face', 'grinning cat', 'grinning face', 'grinning face',
'grinning face', 'yellow heart', 'yellow heart']

All emoji in one flat list.

>>> emoji_summary["emoji_counts"]
[1, 1, 5, 0]

The count of emoji per post.

>>> emoji_summary["emoji_freq"]
[(0, 1), (1, 2), (5, 1)]

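Shows how many posts contain a given number of emoji, as (number_of_emoji, number_of_posts) tuples. Here one post has no emoji, two posts have one emoji each, and one post has five.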
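The relationship between the per-post lists, the flat list, the counts, and the frequency table can be reproduced with the standard library. This is a sketch of one way to derive these values, not necessarily how extract_emoji() computes them; the variable names are chosen for this example.

```python
from collections import Counter
from itertools import chain

# Per-post emoji, as in emoji_summary["emoji"] above.
emoji_per_post = [["😀"], ["😺"], ["😀", "😀", "😀", "💛", "💛"], []]

# Flatten the nested lists into one list of emoji.
emoji_flat = list(chain.from_iterable(emoji_per_post))

# Number of emoji in each post.
emoji_counts = [len(post) for post in emoji_per_post]

# (number_of_emoji, number_of_posts) pairs, sorted by number of emoji.
emoji_freq = sorted(Counter(emoji_counts).items())
```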

>>> emoji_summary["top_emoji"]
[('😀', 4), ('💛', 2), ('😺', 1)]
>>> emoji_summary["top_emoji_text"]
[('grinning face', 4), ('yellow heart', 2),
 ('grinning cat', 1)]
>>> emoji_summary["top_emoji_groups"]
[('Smileys & Emotion', 7)]
>>> emoji_summary["top_emoji_sub_groups"]
[('face-smiling', 4), ('emotion', 2), ('cat-face', 1)]
>>> emoji_summary["overview"]
{'num_posts': 4,
 'num_emoji': 7,
 'emoji_per_post': 1.75,
 'unique_emoji': 3}
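The ranking and overview figures follow directly from the flat emoji list and the number of posts. The sketch below derives them with collections.Counter; it illustrates the arithmetic only and is an assumption about the computation, not the library's source.

```python
from collections import Counter

# Flat list of emoji, as in emoji_summary["emoji_flat"] above.
emoji_flat = ["😀", "😺", "😀", "😀", "😀", "💛", "💛"]
num_posts = 4  # includes the post without any emoji

# Emoji ranked by number of occurrences, most frequent first.
top_emoji = Counter(emoji_flat).most_common()

overview = {
    "num_posts": num_posts,
    "num_emoji": len(emoji_flat),
    "emoji_per_post": len(emoji_flat) / num_posts,  # 7 / 4
    "unique_emoji": len(set(emoji_flat)),
}
```

Note that emoji_per_post averages over all posts, including those with no emoji.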