Emoji: Extract, Analyze, and Get Insights

An emoji is worth a thousand words! This module provides regular expressions and helper functions for extracting and finding emoji in text.

EMOJI_ENTRIES

A dictionary of the full emoji list, with each emoji's Unicode code points, textual name, group, and sub-group. Based on Unicode Emoji v13.0: https://unicode.org/Public/emoji/13.0/emoji-test.txt

emoji_df

The same data as a pandas DataFrame.

extract_emoji()

A function for extracting and summarizing emoji in a text list, with statistics about frequencies and usage.

emoji_search()

A function for searching across names, groups, and sub-groups to find emoji based on your keywords of choice.

EMOJI_RAW

A regular expression that matches every emoji in the full list. For details on how it was developed, see: https://www.kaggle.com/eliasdabbas/how-to-create-a-python-regex-to-extract-emoji
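As a rough illustration of the idea behind such a pattern (not the actual EMOJI_RAW, which covers the full list), the sketch below builds an alternation from a handful of emoji that appear in the examples on this page. The names SAMPLE_EMOJI, SAMPLE_REGEX, and find_emoji are hypothetical. Sorting longest-first is an assumption about one workable approach: it lets a multi-code-point sequence such as "service dog" match as a whole before its leading "dog" character does.

```python
import re

# A tiny hand-picked sample; the real list has thousands of entries.
SAMPLE_EMOJI = [
    "\U0001F415",                  # dog
    "\U0001F415\u200D\U0001F9BA",  # service dog (dog + ZWJ + safety vest)
    "\U0001F600",                  # grinning face
    "\U0001F63A",                  # grinning cat
    "\U0001F49B",                  # yellow heart
]

# Longest sequences first, so a ZWJ sequence wins over its own prefix.
_ordered = sorted(SAMPLE_EMOJI, key=len, reverse=True)
SAMPLE_REGEX = re.compile("|".join(re.escape(e) for e in _ordered))


def find_emoji(text):
    """Return every sample emoji found in text, in order of appearance."""
    return SAMPLE_REGEX.findall(text)
```

With this sample, a string containing the service dog sequence yields one match rather than a lone dog followed by unmatched characters.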

emoji_search(regex)

Return a DataFrame of all emoji entries where any description matches regex.

A “description” is the emoji's name, its group, or its sub-group.

Parameters

regex (str) – regular expression (case insensitive)

>>> emoji_search('dog')
          codepoint           status  emoji          name             group      sub_group
0             1F436  fully-qualified     🐶      dog face  Animals & Nature   animal-mammal
1             1F415  fully-qualified     🐕           dog  Animals & Nature   animal-mammal
2             1F9AE  fully-qualified     🦮     guide dog  Animals & Nature   animal-mammal
3  1F415 200D 1F9BA  fully-qualified   🐕‍🦺   service dog  Animals & Nature    animal-mammal
4             1F32D  fully-qualified     🌭       hot dog      Food & Drink    food-prepared
>>> emoji_search('blue')
  codepoint           status emoji                name               group     sub_group
0     1F499  fully-qualified     💙          blue heart  Smileys & Emotion       emotion
1     1FAD0  fully-qualified     🫐         blueberries       Food & Drink    food-fruit
2     1F4D8  fully-qualified     📘           blue book            Objects    book-paper
3     1F535  fully-qualified     🔵         blue circle            Symbols     geometric
4     1F7E6  fully-qualified     🟦         blue square            Symbols     geometric
5     1F537  fully-qualified     🔷  large blue diamond            Symbols     geometric
6     1F539  fully-qualified     🔹  small blue diamond            Symbols     geometric
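To make the matching rule concrete, here is a minimal sketch of a case-insensitive search across the three description fields. It uses plain dictionaries instead of a DataFrame so that it runs with the standard library alone; SAMPLE_ENTRIES and search_entries are hypothetical names, and this is not the library's implementation.

```python
import re

# Hypothetical stand-in for a few rows of the emoji table.
SAMPLE_ENTRIES = [
    {"emoji": "\U0001F436", "name": "dog face",
     "group": "Animals & Nature", "sub_group": "animal-mammal"},
    {"emoji": "\U0001F32D", "name": "hot dog",
     "group": "Food & Drink", "sub_group": "food-prepared"},
    {"emoji": "\U0001F499", "name": "blue heart",
     "group": "Smileys & Emotion", "sub_group": "emotion"},
]


def search_entries(regex, entries=SAMPLE_ENTRIES):
    """Return entries whose name, group, or sub_group matches regex."""
    pattern = re.compile(regex, re.IGNORECASE)
    fields = ("name", "group", "sub_group")
    return [
        entry for entry in entries
        if any(pattern.search(entry[field]) for field in fields)
    ]
```

Because every field is checked, a keyword such as 'food' finds "hot dog" through its group and sub-group even though the name itself does not contain it.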
extract_emoji(text_list)

Return a dictionary summarizing the emoji in text_list.

Get a summary of the number of emoji, their frequency, the top ones, and more.

Parameters

text_list (list) – A list of text strings.

Returns

summary (dict) – A dictionary with various statistics about emoji.

>>> posts = ['I am grinning 😀','A grinning cat 😺',
...          'hello! 😀😀😀 💛💛', 'Just text']
>>> emoji_summary = extract_emoji(posts)
>>> emoji_summary.keys()
dict_keys(['emoji', 'emoji_text', 'emoji_flat', 'emoji_flat_text',
'emoji_counts', 'emoji_freq', 'top_emoji', 'top_emoji_text',
'top_emoji_groups', 'top_emoji_sub_groups', 'overview'])
>>> emoji_summary['emoji']
[['😀'], ['😺'], ['😀', '😀', '😀', '💛', '💛'], []]
>>> emoji_summary['emoji_text']
[['grinning face'], ['grinning cat'], ['grinning face', 'grinning face',
  'grinning face', 'yellow heart', 'yellow heart'], []]

The emoji extracted from each post, with an empty list for posts that contain none.

>>> emoji_summary['emoji_flat']
['😀', '😺', '😀', '😀', '😀', '💛', '💛']
>>> emoji_summary['emoji_flat_text']
['grinning face', 'grinning cat', 'grinning face', 'grinning face',
'grinning face', 'yellow heart', 'yellow heart']

All emoji in one flat list.

>>> emoji_summary['emoji_counts']
[1, 1, 5, 0]

The count of emoji per post.

>>> emoji_summary['emoji_freq']
[(0, 1), (1, 2), (5, 1)]

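Shows how many posts contained each number of emoji, as (number_of_emoji, number_of_posts) pairs. Here, one post has no emoji, two posts have one emoji each, and one post has five.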
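The frequency list can be derived from the per-post counts with a simple tally. The following is a minimal sketch using collections.Counter, not necessarily how the library computes it:

```python
from collections import Counter

emoji_counts = [1, 1, 5, 0]  # the per-post counts shown above

# Tally how often each count occurs, then sort by number of emoji.
emoji_freq = sorted(Counter(emoji_counts).items())
print(emoji_freq)  # [(0, 1), (1, 2), (5, 1)]
```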

>>> emoji_summary['top_emoji']
[('😀', 4), ('💛', 2), ('😺', 1)]
>>> emoji_summary['top_emoji_text']
[('grinning face', 4), ('yellow heart', 2),
 ('grinning cat', 1)]
>>> emoji_summary['top_emoji_groups']
[('Smileys & Emotion', 7)]
>>> emoji_summary['top_emoji_sub_groups']
[('face-smiling', 4), ('emotion', 2), ('cat-face', 1)]
>>> emoji_summary['overview']
{'num_posts': 4,
 'num_emoji': 7,
 'emoji_per_post': 1.75,
 'unique_emoji': 3}
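
The ranking and overview figures follow directly from the per-post emoji lists. As a sketch of the arithmetic (the variable names are illustrative, and the library may compute these differently):

```python
from collections import Counter

# Per-post emoji lists, matching emoji_summary['emoji'] above.
emoji_lists = [
    ["\U0001F600"],                                            # grinning face
    ["\U0001F63A"],                                            # grinning cat
    ["\U0001F600"] * 3 + ["\U0001F49B"] * 2,                   # 3 faces, 2 yellow hearts
    [],                                                        # 'Just text'
]

# Flatten, then rank by frequency (most common first).
emoji_flat = [e for post in emoji_lists for e in post]
top_emoji = Counter(emoji_flat).most_common()

overview = {
    "num_posts": len(emoji_lists),
    "num_emoji": len(emoji_flat),
    "emoji_per_post": len(emoji_flat) / len(emoji_lists),
    "unique_emoji": len(set(emoji_flat)),
}
```

Note that emoji_per_post averages over all posts, including those with no emoji: 7 emoji across 4 posts gives 1.75.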