Stopwords in Several Languages
Lists of stopwords from the spaCy package [1], useful in text mining and in analyzing the content of social media posts, tweets, web pages, keywords, etc.
Each list is accessible through the stopwords dictionary, a normal Python dictionary keyed by lowercase language name.
Available Languages
Arabic
Azerbaijani
Bengali
Catalan
Chinese
Croatian
Danish
Dutch
English
Finnish
French
German
Greek
Hebrew
Hindi
Hungarian
Indonesian
Irish
Italian
Japanese
Kazakh
Nepali
Norwegian
Persian
Polish
Portuguese
Romanian
Russian
Sinhala
Spanish
Swedish
Tagalog
Tamil
Tatar
Telugu
Thai
Turkish
Ukrainian
Urdu
Vietnamese
>>> import advertools as adv
>>> sorted(adv.stopwords['english'])[:5]
['a', 'about', 'above', 'across', 'after']
>>> sorted(adv.stopwords['german'])[:5]
['a', 'ab', 'aber', 'ach', 'acht']
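A common use of these lists is filtering stopwords out of a piece of text. The following is a minimal sketch, not part of advertools: remove_stopwords is a hypothetical helper, and sample_stopwords is a small stand-in for adv.stopwords['english'] so that the example runs without the package installed. The tokenization (lowercasing and splitting on word characters) is also an assumption and may not suit every language.

```python
import re


def remove_stopwords(text, stopwords):
    """Return the lowercase word tokens of `text` that are not in `stopwords`.

    Hypothetical helper for illustration; not an advertools function.
    """
    stopword_set = set(stopwords)  # accepts a list, set, or frozenset
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stopword_set]


# Stand-in for adv.stopwords['english']; these are the five words
# shown in the English example above.
sample_stopwords = {"a", "about", "above", "across", "after"}

print(remove_stopwords("A story about life after the storm", sample_stopwords))
# ['story', 'life', 'the', 'storm']
```

With advertools installed, passing adv.stopwords['english'] in place of sample_stopwords should work the same way, since the helper only needs an iterable of words.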
To get a list of all available languages, run:
>>> adv.stopwords.keys()
dict_keys(['arabic', 'azerbaijani', 'bengali', 'catalan', 'chinese',
'croatian', 'danish', 'dutch', 'english', 'finnish', 'french',
'german', 'greek', 'hebrew', 'hindi', 'hungarian', 'indonesian',
'irish', 'italian', 'japanese', 'kazakh', 'nepali', 'norwegian',
'persian', 'polish', 'portuguese', 'romanian', 'russian', 'sinhala',
'spanish', 'swedish', 'tagalog', 'tamil', 'tatar', 'telugu', 'thai',
'turkish', 'ukrainian', 'urdu', 'vietnamese'])
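Because the keys are plain strings, stopwords for multilingual text can be combined with ordinary dictionary operations. The sketch below assumes a hypothetical helper, combined_stopwords, and uses a tiny stand-in dictionary (sample) in place of adv.stopwords so that it runs on its own; neither name comes from advertools.

```python
def combined_stopwords(stopwords_by_language, languages):
    """Union the stopword lists of several languages into one set.

    Hypothetical helper for illustration; raises KeyError if a requested
    language is not a key of `stopwords_by_language`.
    """
    missing = [lang for lang in languages if lang not in stopwords_by_language]
    if missing:
        available = ", ".join(sorted(stopwords_by_language))
        raise KeyError(f"Unknown language(s) {missing}; available: {available}")
    combined = set()
    for lang in languages:
        combined.update(stopwords_by_language[lang])
    return combined


# Tiny stand-in for adv.stopwords, holding only a few words per language.
sample = {
    "english": ["a", "about", "above"],
    "german": ["a", "ab", "aber"],
}

print(sorted(combined_stopwords(sample, ["english", "german"])))
# ['a', 'ab', 'aber', 'about', 'above']
```

Checking the requested names up front gives a clearer error than a bare KeyError, which helps when a language is misspelled or capitalized differently from the keys listed above.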
Footnotes
[1] Copyright (C) 2016 ExplosionAI UG (haftungsbeschränkt), 2016 spaCy GmbH, 2015 Matthew Honnibal