Emoji: Extract, Analyze, and Get Insights
An emoji is worth a thousand words! This module provides regular expressions and helper functions for extracting and finding emoji in text.
- A dictionary of the full emoji list together with Unicode code points, textual name, group, and sub-group. Based on v13.0: https://unicode.org/Public/emoji/13.0/emoji-test.txt
- The same dictionary as a pandas DataFrame.
- extract_emoji: a function for extracting and summarizing emoji in a text list, with statistics about frequencies and usage.
- emoji_search: a function for searching across names, groups, and sub-groups to find emoji based on your keywords of choice.
- A regular expression to extract the full list. See here on how it was developed: https://www.kaggle.com/eliasdabbas/how-to-create-a-python-regex-to-extract-emoji
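The code points mentioned above are written as space-separated hexadecimal values (for example `1F415 200D 1F9BA`). As a minimal sketch of how such a string maps to the emoji character and back, the two helpers below use only the standard library. The function names `codepoints_to_emoji` and `emoji_to_codepoints` are hypothetical and are not part of the module documented here.

```python
def codepoints_to_emoji(codepoints: str) -> str:
    """Convert space-separated hex code points into the emoji string.

    Hypothetical helper for illustration, not part of the documented module.
    """
    return "".join(chr(int(cp, 16)) for cp in codepoints.split())


def emoji_to_codepoints(emoji: str) -> str:
    """Convert an emoji string into space-separated upper-case hex code points."""
    # At least four hex digits per code point, as in emoji-test.txt.
    return " ".join(f"{ord(ch):04X}" for ch in emoji)


dog_face = codepoints_to_emoji("1F436")
# A sequence joined by U+200D (zero width joiner) is still a single emoji entry.
service_dog = codepoints_to_emoji("1F415 200D 1F9BA")
```

A multi-code-point entry such as the service dog is one emoji made of three characters, which is why a list of code points per entry is needed rather than a single value.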
emoji_search(regex)
Return a DataFrame of all emoji entries where any description contains regex. "Description" can be the name of the emoji, its group, or sub-group.
- Parameters: regex (str) – regular expression (case insensitive)
>>> emoji_search('dog')
          codepoint           status  emoji         name             group      sub_group
0             1F436  fully-qualified     🐶     dog face  Animals & Nature  animal-mammal
1             1F415  fully-qualified     🐕          dog  Animals & Nature  animal-mammal
2             1F9AE  fully-qualified     🦮    guide dog  Animals & Nature  animal-mammal
3  1F415 200D 1F9BA  fully-qualified     🐕‍🦺  service dog  Animals & Nature  animal-mammal
4             1F32D  fully-qualified     🌭      hot dog      Food & Drink  food-prepared
>>> blue = emoji_search('blue')
>>> blue
  codepoint           status  emoji                name              group   sub_group
0     1F499  fully-qualified     💙          blue heart  Smileys & Emotion     emotion
1     1FAD0  fully-qualified     🫐         blueberries       Food & Drink  food-fruit
2     1F4D8  fully-qualified     📘           blue book            Objects  book-paper
3     1F535  fully-qualified     🔵         blue circle            Symbols   geometric
4     1F7E6  fully-qualified     🟦         blue square            Symbols   geometric
5     1F537  fully-qualified     🔷  large blue diamond            Symbols   geometric
6     1F539  fully-qualified     🔹  small blue diamond            Symbols   geometric
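To make the matching rule concrete, here is a minimal standard-library sketch of a case-insensitive regex search across the name, group, and sub-group of each entry. It is not the library's implementation: `SAMPLE_ENTRIES` is a tiny hand-written sample whose rows are copied from the example outputs above, and `search_entries` is a hypothetical name that returns a plain list rather than a DataFrame.

```python
import re

# Hand-written sample; rows copied from the example outputs above.
SAMPLE_ENTRIES = [
    {"codepoint": "1F436", "emoji": "🐶", "name": "dog face",
     "group": "Animals & Nature", "sub_group": "animal-mammal"},
    {"codepoint": "1F32D", "emoji": "🌭", "name": "hot dog",
     "group": "Food & Drink", "sub_group": "food-prepared"},
    {"codepoint": "1F499", "emoji": "💙", "name": "blue heart",
     "group": "Smileys & Emotion", "sub_group": "emotion"},
    {"codepoint": "1F4D8", "emoji": "📘", "name": "blue book",
     "group": "Objects", "sub_group": "book-paper"},
]

SEARCH_FIELDS = ("name", "group", "sub_group")


def search_entries(regex, entries=SAMPLE_ENTRIES):
    """Return entries where the regex matches any of the description fields.

    Hypothetical stand-in for illustration; matching is case insensitive.
    """
    pattern = re.compile(regex, flags=re.IGNORECASE)
    return [
        entry for entry in entries
        if any(pattern.search(entry[field]) for field in SEARCH_FIELDS)
    ]


dog_names = [entry["name"] for entry in search_entries("DOG")]
food_names = [entry["name"] for entry in search_entries("food")]
```

Because the group and sub-group are searched as well, a keyword such as `food` finds the hot dog entry even though the word does not appear in its name.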
extract_emoji(text_list)
Return a summary dictionary about emoji in text_list.
Get a summary of the number of emoji, their frequency, the top ones, and more.
- Parameters: text_list (list) – A list of text strings.
- Returns: summary – A dictionary with various stats about emoji.
>>> posts = ['I am grinning 😀','A grinning cat 😺',
...          'hello! 😀😀😀 💛💛', 'Just text']
>>> emoji_summary = extract_emoji(posts)
>>> emoji_summary.keys()
dict_keys(['emoji', 'emoji_text', 'emoji_flat', 'emoji_flat_text',
           'emoji_counts', 'emoji_freq', 'top_emoji', 'top_emoji_text',
           'top_emoji_groups', 'top_emoji_sub_groups', 'overview'])
>>> emoji_summary['emoji']
[['😀'], ['😺'], ['😀', '😀', '😀', '💛', '💛'], []]
>>> emoji_summary['emoji_text']
[['grinning face'], ['grinning cat'], ['grinning face', 'grinning face', 'grinning face', 'yellow heart', 'yellow heart'], []]
A simple extract of emoji from each of the posts. An empty list if none exist.
>>> emoji_summary['emoji_flat']
['😀', '😺', '😀', '😀', '😀', '💛', '💛']
>>> emoji_summary['emoji_flat_text']
['grinning face', 'grinning cat', 'grinning face', 'grinning face', 'grinning face', 'yellow heart', 'yellow heart']
All emoji in one flat list.
>>> emoji_summary['emoji_counts']
[1, 1, 5, 0]
The count of emoji per post.
>>> emoji_summary['emoji_freq']
[(0, 1), (1, 2), (5, 1)]
Shows how many posts had 0, 1, 2, 3, etc. emoji, as (number_of_emoji, count) pairs. Here one post has no emoji, two posts have one emoji each, and one post has five.
>>> emoji_summary['top_emoji']
[('😀', 4), ('💛', 2), ('😺', 1)]
>>> emoji_summary['top_emoji_text']
[('grinning face', 4), ('yellow heart', 2), ('grinning cat', 1)]
>>> emoji_summary['top_emoji_groups']
[('Smileys & Emotion', 7)]
>>> emoji_summary['top_emoji_sub_groups']
[('face-smiling', 4), ('emotion', 2), ('cat-face', 1)]
>>> emoji_summary['overview']
{'num_posts': 4, 'num_emoji': 7, 'emoji_per_post': 1.75, 'unique_emoji': 3}
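The counting keys in the summary can be derived from the per-post emoji lists with `collections.Counter`. The sketch below is an approximation under stated assumptions, not the library's code: `SIMPLE_EMOJI` is a deliberately narrow character range that happens to cover the emoji in the example posts (it is not the full regular expression the module provides), and `summarize_emoji` is a hypothetical name. The text, group, and sub-group keys are omitted because they need the full emoji dictionary.

```python
import re
from collections import Counter

# Assumption: a narrow stand-in pattern, NOT the module's full emoji regex.
# It matches single code points only, so multi-code-point emoji would be split.
SIMPLE_EMOJI = re.compile("[\U0001F300-\U0001FAFF]")


def summarize_emoji(text_list):
    """Hypothetical sketch of the counting part of an emoji summary."""
    per_post = [SIMPLE_EMOJI.findall(text) for text in text_list]
    flat = [emoji for post in per_post for emoji in post]
    counts = [len(post) for post in per_post]
    return {
        "emoji": per_post,
        "emoji_flat": flat,
        "emoji_counts": counts,
        # (number_of_emoji, number_of_posts) pairs, sorted by number_of_emoji.
        "emoji_freq": sorted(Counter(counts).items()),
        # (emoji, occurrences) pairs, most frequent first.
        "top_emoji": Counter(flat).most_common(),
        "overview": {
            "num_posts": len(text_list),
            "num_emoji": len(flat),
            "emoji_per_post": len(flat) / len(text_list) if text_list else 0,
            "unique_emoji": len(set(flat)),
        },
    }


posts = ['I am grinning 😀', 'A grinning cat 😺',
         'hello! 😀😀😀 💛💛', 'Just text']
summary = summarize_emoji(posts)
```

For these four posts the sketch yields the same `emoji_counts`, `emoji_freq`, `top_emoji`, and `overview` values as the outputs shown above.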