Emoji: Extract, Analyze, and Get Insights
An emoji is worth a thousand words! This module provides regular expressions and helper functions for extracting and finding emoji in text.
- A dictionary of the full emoji list together with Unicode code points, textual name, group, and sub-group. Based on v15.1: https://unicode.org/Public/emoji/15.1/emoji-test.txt
- The same dictionary as a pandas DataFrame.
- extract_emoji(): a function for extracting and summarizing emoji in a text list, with statistics about frequencies and usage.
- emoji_search(): a function for searching across names, groups, and sub-groups to find emoji based on your keywords of choice.
- A regular expression to extract the full list. See here on how it was developed: https://www.kaggle.com/eliasdabbas/how-to-create-a-python-regex-to-extract-emoji
Emoji Search
You can search the whole emoji database with the emoji_search() function:
import advertools as adv
vegetable_emoji = adv.emoji_search('vegetable')
vegetable_emoji.head()
| | codepoint | status | emoji | name | group | sub_group |
|---|---|---|---|---|---|---|
| 0 | 1F951 | fully-qualified | 🥑 | avocado | Food & Drink | food-vegetable |
| 1 | 1F346 | fully-qualified | 🍆 | eggplant | Food & Drink | food-vegetable |
| 2 | 1F954 | fully-qualified | 🥔 | potato | Food & Drink | food-vegetable |
| 3 | 1F955 | fully-qualified | 🥕 | carrot | Food & Drink | food-vegetable |
| 4 | 1F33D | fully-qualified | 🌽 | ear of corn | Food & Drink | food-vegetable |
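Keep in mind that the search uses regular expressions, so a pattern matches anywhere inside a name, group, or sub-group, and the results might not be exactly what you expect. Searching for "love", for example, also returns "four leaf clover", "gloves", and "flag: Slovenia":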
love_emoji = adv.emoji_search('love')
love_emoji
| | codepoint | status | emoji | name | group | sub_group |
|---|---|---|---|---|---|---|
| 0 | 1F48C | fully-qualified | 💌 | love letter | Smileys & Emotion | emotion |
| 1 | 1F91F | fully-qualified | 🤟 | love-you gesture | People & Body | hand-fingers-partial |
| 2 | 1F91F 1F3FB | fully-qualified | 🤟🏻 | love-you gesture: light skin tone | People & Body | hand-fingers-partial |
| 3 | 1F91F 1F3FC | fully-qualified | 🤟🏼 | love-you gesture: medium-light skin tone | People & Body | hand-fingers-partial |
| 4 | 1F91F 1F3FD | fully-qualified | 🤟🏽 | love-you gesture: medium skin tone | People & Body | hand-fingers-partial |
| 5 | 1F91F 1F3FE | fully-qualified | 🤟🏾 | love-you gesture: medium-dark skin tone | People & Body | hand-fingers-partial |
| 6 | 1F91F 1F3FF | fully-qualified | 🤟🏿 | love-you gesture: dark skin tone | People & Body | hand-fingers-partial |
| 7 | 1F340 | fully-qualified | 🍀 | four leaf clover | Animals & Nature | plant-other |
| 8 | 1F3E9 | fully-qualified | 🏩 | love hotel | Travel & Places | place-building |
| 9 | 1F94A | fully-qualified | 🥊 | boxing glove | Activities | sport |
| 10 | 1F9E4 | fully-qualified | 🧤 | gloves | Objects | clothing |
| 11 | 1F1F8 1F1EE | fully-qualified | 🇸🇮 | flag: Slovenia | Flags | country-flag |
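One way to narrow such a search is to anchor the pattern with word boundaries (\b). The sketch below uses only Python's re module on a handful of names copied from the results above. It assumes that emoji_search() treats its argument as an ordinary re-style, case-insensitive pattern, in which case passing r"\blove\b" to it should narrow the results in the same way; match_names is a hypothetical helper, not part of advertools.

```python
import re

# A few emoji names copied from the "love" search results.
names = [
    "love letter",
    "love-you gesture",
    "four leaf clover",
    "love hotel",
    "boxing glove",
    "gloves",
    "flag: Slovenia",
]


def match_names(pattern, names):
    """Hypothetical helper: names containing a case-insensitive match."""
    regex = re.compile(pattern, flags=re.IGNORECASE)
    return [name for name in names if regex.search(name)]


# The bare pattern matches every name, because "love" appears inside each one.
loose = match_names("love", names)

# \b requires a word boundary on both sides, so "clover", "glove(s)",
# and "Slovenia" no longer match.
strict = match_names(r"\blove\b", names)

print(loose)
print(strict)
```

The hyphen in "love-you gesture" counts as a word boundary, so that name survives the stricter pattern.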
Extract Emoji from Text
You will often have social media posts, or any other text containing emoji, that you want to analyze. The extract_emoji() function extracts the emoji and returns useful information about them. You can play around with the following sample text list, modify it, and explore the different stats and information about the extracted emoji:
text_list = ['I feel like playing basketball 🏀',
             'I like playing football ⚽⚽',
             'Not feeling like sports today']
emoji_summary = adv.extract_emoji(text_list)
print(emoji_summary.keys())
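To get a feel for what the extraction step involves, here is a deliberately simplified sketch that pulls emoji out of the same sample texts with a hand-written character class. The pattern SIMPLE_EMOJI is an assumption made for illustration and is not the regular expression advertools uses: it covers only a couple of Unicode ranges and does not keep multi-code-point sequences (flags, skin tones, ZWJ sequences) together, which the library's full regex is designed to handle.

```python
import re

# Simplified, hand-written character class (an illustrative assumption,
# NOT the regex shipped with advertools): miscellaneous symbols and
# dingbats, plus the main pictograph ranges.
SIMPLE_EMOJI = re.compile("[\u2600-\u27BF\U0001F300-\U0001FAFF]")

text_list = ['I feel like playing basketball 🏀',
             'I like playing football ⚽⚽',
             'Not feeling like sports today']

# One list of matches per text; texts without emoji give an empty list.
emoji_per_text = [SIMPLE_EMOJI.findall(text) for text in text_list]
print(emoji_per_text)
```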
- emoji_search(regex)
Return a DataFrame of all emoji entries that match regex.
The search is run on the name of the emoji, its group, and sub-group.
- Parameters:
regex (str) -- regular expression (case insensitive)
>>> import advertools as adv
>>> adv.emoji_search("dog")
        codepoint           status emoji         name             group      sub_group
0           1F436  fully-qualified    🐶     dog face  Animals & Nature  animal-mammal
1           1F415  fully-qualified    🐕          dog  Animals & Nature  animal-mammal
2           1F9AE  fully-qualified    🦮    guide dog  Animals & Nature  animal-mammal
3 1F415 200D 1F9BA  fully-qualified   🐕‍🦺  service dog  Animals & Nature  animal-mammal
4           1F32D  fully-qualified    🌭      hot dog      Food & Drink  food-prepared

>>> blue = adv.emoji_search("blue")
>>> blue
  codepoint           status emoji                name              group   sub_group
0     1F499  fully-qualified    💙          blue heart  Smileys & Emotion     emotion
1     1FAD0  fully-qualified    🫐         blueberries       Food & Drink  food-fruit
2     1F4D8  fully-qualified    📘           blue book            Objects  book-paper
3     1F535  fully-qualified    🔵         blue circle            Symbols   geometric
4     1F7E6  fully-qualified    🟦         blue square            Symbols   geometric
5     1F537  fully-qualified    🔷  large blue diamond            Symbols   geometric
6     1F539  fully-qualified    🔹  small blue diamond            Symbols   geometric
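The description above says the search runs on the name, group, and sub-group. A minimal sketch of that idea, using plain dictionaries and Python's re module, could look like the following. The helper search_entries and the entries list are hypothetical and only illustrate the matching logic; they are not how advertools implements emoji_search(), which returns a DataFrame.

```python
import re

# A few rows copied from the examples above (name, group, sub_group only).
entries = [
    {"name": "dog face", "group": "Animals & Nature", "sub_group": "animal-mammal"},
    {"name": "hot dog", "group": "Food & Drink", "sub_group": "food-prepared"},
    {"name": "blue heart", "group": "Smileys & Emotion", "sub_group": "emotion"},
    {"name": "blueberries", "group": "Food & Drink", "sub_group": "food-fruit"},
]


def search_entries(pattern, entries):
    """Hypothetical helper: names of entries where any field matches."""
    regex = re.compile(pattern, flags=re.IGNORECASE)
    fields = ("name", "group", "sub_group")
    return [
        entry["name"]
        for entry in entries
        if any(regex.search(entry[field]) for field in fields)
    ]


by_name = search_entries("dog", entries)          # matched via the name
by_group = search_entries("FOOD", entries)        # matched via group, ignoring case
by_sub_group = search_entries("mammal", entries)  # matched via the sub-group

print(by_name, by_group, by_sub_group)
```

This also shows why a keyword can return entries whose names do not contain it: "blueberries" matches "FOOD" only through its group and sub-group.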
- extract_emoji(text_list)
Return a summary dictionary about emoji in text_list.
Get a summary of the number of emoji, their frequency, the top ones, and more.
- Parameters:
text_list (list) -- A list of text strings.
- Returns summary:
A dictionary with various stats about emoji
>>> posts = [
...     "I am grinning 😀",
...     "A grinning cat 😺",
...     "hello! 😀😀😀 💛💛",
...     "Just text",
... ]
>>> emoji_summary = adv.extract_emoji(posts)
>>> emoji_summary.keys()
dict_keys(['emoji', 'emoji_text', 'emoji_flat', 'emoji_flat_text', 'emoji_counts', 'emoji_freq', 'top_emoji', 'top_emoji_text', 'top_emoji_groups', 'top_emoji_sub_groups', 'overview'])
>>> emoji_summary["emoji"]
[['😀'], ['😺'], ['😀', '😀', '😀', '💛', '💛'], []]
>>> emoji_summary["emoji_text"]
[['grinning face'], ['grinning cat'], ['grinning face', 'grinning face', 'grinning face', 'yellow heart', 'yellow heart'], []]
The emoji extracted from each post, and their names. A post without emoji gives an empty list.
>>> emoji_summary["emoji_flat"]
['😀', '😺', '😀', '😀', '😀', '💛', '💛']
>>> emoji_summary["emoji_flat_text"]
['grinning face', 'grinning cat', 'grinning face', 'grinning face', 'grinning face', 'yellow heart', 'yellow heart']
All emoji in one flat list.
>>> emoji_summary["emoji_counts"]
[1, 1, 5, 0]
The count of emoji per post.
>>> emoji_summary["emoji_freq"]
[(0, 1), (1, 2), (5, 1)]
How many posts had 0, 1, 2, 3, etc. emoji, as (number_of_emoji, count) pairs. Here, one post had no emoji, two posts had one each, and one post had five.
>>> emoji_summary["top_emoji"]
[('😀', 4), ('💛', 2), ('😺', 1)]
>>> emoji_summary["top_emoji_text"]
[('grinning face', 4), ('yellow heart', 2), ('grinning cat', 1)]
>>> emoji_summary["top_emoji_groups"]
[('Smileys & Emotion', 7)]
>>> emoji_summary["top_emoji_sub_groups"]
[('face-smiling', 4), ('emotion', 2), ('cat-face', 1)]
>>> emoji_summary["overview"]
{'num_posts': 4, 'num_emoji': 7, 'emoji_per_post': 1.75, 'unique_emoji': 3}
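To see how these summary values relate to one another, the sketch below recomputes several of them from the per-post emoji lists using only the standard library. It is an illustration of the arithmetic behind the numbers shown above, not the implementation used by extract_emoji(); the variable names simply mirror the dictionary keys.

```python
from collections import Counter

# Per-post emoji lists, as returned under the "emoji" key in the example.
posts_emoji = [["😀"], ["😺"], ["😀", "😀", "😀", "💛", "💛"], []]

# Number of emoji in each post.
emoji_counts = [len(post) for post in posts_emoji]

# How many posts contain a given number of emoji: (number_of_emoji, count).
emoji_freq = sorted(Counter(emoji_counts).items())

# All emoji in one flat list, then ranked by number of occurrences.
emoji_flat = [emoji for post in posts_emoji for emoji in post]
top_emoji = Counter(emoji_flat).most_common()

overview = {
    "num_posts": len(posts_emoji),
    "num_emoji": len(emoji_flat),
    "emoji_per_post": len(emoji_flat) / len(posts_emoji),
    "unique_emoji": len(set(emoji_flat)),
}

print(emoji_counts, emoji_freq, top_emoji, overview, sep="\n")
```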