Stopwords in Several Languages

Lists of stopwords from the spaCy [1] package, useful in text mining and in analyzing the content of social media posts, tweets, web pages, keywords, etc.

Each list is available under its lowercase language name in the stopwords dictionary, which is a regular Python dictionary.

Available Languages

  • Arabic

  • Azerbaijani

  • Bengali

  • Catalan

  • Chinese

  • Croatian

  • Danish

  • Dutch

  • English

  • Finnish

  • French

  • German

  • Greek

  • Hebrew

  • Hindi

  • Hungarian

  • Indonesian

  • Irish

  • Italian

  • Japanese

  • Kazakh

  • Nepali

  • Norwegian

  • Persian

  • Polish

  • Portuguese

  • Romanian

  • Russian

  • Sinhala

  • Spanish

  • Swedish

  • Tagalog

  • Tamil

  • Tatar

  • Telugu

  • Thai

  • Turkish

  • Ukrainian

  • Urdu

  • Vietnamese

>>> import advertools as adv
>>> sorted(adv.stopwords['english'])[:5]
['a', 'about', 'above', 'across', 'after']
>>> sorted(adv.stopwords['german'])[:5]
['a', 'ab', 'aber', 'ach', 'acht']
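
As a rough illustration of the text-mining use case mentioned above, the sketch below drops stopwords from a sentence. The remove_stopwords helper, its simple regex tokenizer, and the sample_stopwords set are assumptions made for this example and are not part of advertools; the small set merely stands in for adv.stopwords['english'] so the snippet runs on its own.

```python
import re


def remove_stopwords(text, stopword_set):
    """Return the words in `text` that are not in `stopword_set`."""
    # Lowercase the text and pull out runs of word characters.
    # This is a deliberately simple tokenizer, not a full NLP pipeline.
    words = re.findall(r"\w+", text.lower())
    return [word for word in words if word not in stopword_set]


# Hypothetical stand-in for adv.stopwords['english'], kept tiny so the
# sketch runs without advertools installed.
sample_stopwords = {"a", "about", "above", "across", "after", "the", "is", "on"}

filtered = remove_stopwords("The cat is on a mat", sample_stopwords)
print(filtered)
```

With advertools installed, passing adv.stopwords['english'] as the second argument should work the same way, since the helper only needs membership tests.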

To get a list of all available languages, run:

>>> adv.stopwords.keys()
dict_keys(['arabic', 'azerbaijani', 'bengali', 'catalan', 'chinese',
'croatian', 'danish', 'dutch', 'english', 'finnish', 'french',
'german', 'greek', 'hebrew', 'hindi', 'hungarian', 'indonesian',
'irish', 'italian', 'japanese', 'kazakh', 'nepali', 'norwegian',
'persian', 'polish', 'portuguese', 'romanian', 'russian', 'sinhala',
'spanish', 'swedish', 'tagalog', 'tamil', 'tatar', 'telugu', 'thai',
'turkish', 'ukrainian', 'urdu', 'vietnamese'])
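
Because the keys are plain strings, it can help to check that a language is present before looking it up, for example when merging stopwords for multilingual text. The following is a minimal sketch under that assumption; combined_stopwords and the sample_stopwords_by_language mapping are hypothetical names used only for illustration, with the mapping standing in for adv.stopwords.

```python
def combined_stopwords(stopwords_by_language, languages):
    """Return the union of the stopwords for the requested languages."""
    # Fail early, and name every unknown language, instead of raising on the first one.
    missing = [lang for lang in languages if lang not in stopwords_by_language]
    if missing:
        raise KeyError(f"Unsupported language(s): {missing}")
    combined = set()
    for lang in languages:
        combined.update(stopwords_by_language[lang])
    return combined


# Hypothetical stand-in for adv.stopwords with only a few entries per language.
sample_stopwords_by_language = {
    "english": {"a", "about"},
    "german": {"a", "ab", "aber"},
}

merged = combined_stopwords(sample_stopwords_by_language, ["english", "german"])
print(sorted(merged))
```

Requesting a language that is not among the keys, such as a misspelled name, raises a KeyError that lists the unknown names.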

Footnotes

[1] Copyright (C) 2016 ExplosionAI UG (haftungsbeschränkt), 2016 spaCy GmbH, 2015 Matthew Honnibal