crawllogs_to_df: a function for analyzing the logs of the crawling process
robotstxt_to_df: can download multiple robots.txt files in one go
advertools: productivity & analysis tools to scale your online marketing
You might be doing basic stuff, like copying and pasting text into spreadsheets, you might be running large-scale automated platforms with sophisticated algorithms, or you might be somewhere in between. In any case, your job is all about working with data.
As a data scientist, you don't spend most of your time producing cool visualizations or finding great insights. The majority of your time is spent wrangling URLs, figuring out how to stitch together two tables, hoping the dates won't break without you knowing, or trying to generate the next 124,538 keywords for an upcoming campaign by the end of the week!
advertools is a Python package that can hopefully make that part of your job a little easier.
pip install advertools # OR: pip3 install advertools
The most important thing to achieve in SEM is a proper mapping between the three main elements of a search campaign:
Keywords (the intention) -> Ads (your promise) -> Landing Pages (your delivery of the promise)

Once you have this done, you can focus on management and analysis. More importantly, once you know you can set this up in an easy way, you can focus on more strategic issues. In practical terms, you need two main tables to get started:
Keywords: You can generate keywords (note I didn't say research) with the kw_generate function (a short sketch follows this list).
Ads: There are two approaches that you can use:
Bottom-up: You can create text ads for a large number of products by simply replacing product names, and providing a placeholder in case your text is too long. Check out the ad_create function for more details.
Top-down: Sometimes you have a long description text that you want to split into headlines, descriptions and whatever slots you want to split them into. ad_from_string helps you accomplish that.
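To make this concrete, here is a minimal sketch of both steps, assuming the documented kw_generate and ad_create signatures; the products, words, and templates are made-up example values:

import advertools as adv

# generate keyword permutations: every product combined with every word,
# in the requested match types, returned as a DataFrame ready for upload
keywords = adv.kw_generate(
    products=['barcelona', 'madrid'],            # example product / ad group names
    words=['hotel', 'cheap hotel', 'flights'],   # example words describing intent
    match_types=['Exact', 'Phrase'],
)

# bottom-up ads: insert each product name into a template, and fall back
# to a default text whenever the result exceeds the length limit
headlines = adv.ad_create(
    template='Book Your Trip to {}',
    replacements=['Barcelona', 'Madrid'],
    fallback='Great Destinations',               # used if the text is too long
    max_len=30,
)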
Tutorials and additional resources
SEO is probably the most comprehensive online marketing area, covering both technical work (crawling, indexing, rendering, redirects, etc.) and non-technical work (content creation, link building, outreach, etc.). Here are some tools that can help with your SEO:
SEO crawler: A generic, customizable SEO crawler built with Scrapy, with several features (a short example follows this list):
Standard SEO elements extracted by default (title, header tags, body text, status code, response and request headers, etc.)
CSS and XPath selectors: You probably have more specific needs in mind, so you can easily pass any selectors to be extracted in addition to the standard elements
Custom settings: Full access to Scrapy's settings, allowing you to better control the crawling behavior (set custom headers or a user agent; stop the spider after a certain number of pages, seconds, or megabytes; save crawl logs; run jobs at intervals where you can stop and resume your crawls, which is ideal for large crawls or for continuous monitoring; and many more options)
Following links: option to only crawl a set of specified pages or to follow and discover all pages through links
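Here is a minimal crawl sketch, assuming the documented crawl signature; the start URL, the extra CSS selector, and the settings values are illustrative only:

import advertools as adv
import pandas as pd

adv.crawl(
    url_list='https://example.com',                       # start URL(s)
    output_file='example_crawl.jl',                       # results saved as JSON lines
    follow_links=True,                                    # discover pages through links
    css_selectors={'price': '.product-price::text'},      # hypothetical extra element
    custom_settings={
        'CLOSESPIDER_PAGECOUNT': 500,                     # stop the spider after 500 pages
        'USER_AGENT': 'my-seo-bot',                       # custom user agent
        'LOG_FILE': 'example_crawl.log',                  # save crawl logs
    },
)

crawl_df = pd.read_json('example_crawl.jl', lines=True)   # one row per crawled URL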
robots.txt downloader: A simple downloader of robots.txt files into a DataFrame, so you can keep track of changes across crawls (if any) and check the rules, sitemaps, etc.
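A quick sketch, assuming the robotstxt_to_df signature; the URL is a placeholder:

import advertools as adv

# download one (or a list of) robots.txt file(s) into a DataFrame;
# rows typically hold each directive and its content, plus fetch metadata
robots_df = adv.robotstxt_to_df('https://www.example.com/robots.txt')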
XML sitemaps downloader / parser: An essential part of any SEO analysis is to check XML sitemaps. This is a simple function with which you can download one or more sitemaps (by providing the URL of a robots.txt file, a sitemap file, or a sitemap index).
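For example, a hedged sketch using sitemap_to_df; the URL is a placeholder and could equally be a robots.txt or sitemap index URL:

import advertools as adv

# sitemap indexes are retrieved recursively; the resulting DataFrame
# contains one row per URL, with columns such as 'loc' and, when the
# sitemap provides it, 'lastmod'
sitemap_df = adv.sitemap_to_df('https://www.example.com/sitemap.xml')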
SERP importer and parser for Google & YouTube: Connect to Google's API and get the search data you want. Multiple search parameters are supported, all in one function call, and all results are returned in a DataFrame.
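A minimal sketch of a Google SERP request, assuming the serp_goog parameters q, cx, key, and gl; the queries and IDs are placeholders, and you need your own custom search engine ID and API key:

import advertools as adv

serp_df = adv.serp_goog(
    q=['best hotels in barcelona', 'cheap flights'],  # several queries in one call
    cx='YOUR_CX_ID',                                  # custom search engine ID (placeholder)
    key='YOUR_API_KEY',                               # Google API key (placeholder)
    gl=['us', 'uk'],                                  # optional: one request per query/country pair
)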
Tutorials and additional resources
A visual tool built with the serp_goog function to get SERP rankings on Google
A tutorial on analyzing SERPs on a large scale with Python on SEMrush
SERP datasets on Kaggle for practicing on different industries and use cases
SERP notebooks on Kaggle: some examples of how you might tackle such data
Function names mostly start with the object you are working on, so you can use autocomplete to discover other options:
kw_: for keywords-related functions
ad_: for ad-related functions
url_: URL tracking and generation
extract_: for extracting entities from social media posts (mentions, hashtags, emoji, etc.)
emoji_: emoji related functions and objects
youtube: a module for querying the YouTube Data API and getting results in a DataFrame
serp_: get search engine results pages in a DataFrame, currently available: Google and YouTube
crawl: a function you will probably use a lot if you do SEO
*_to_df: a set of convenience functions for converting to DataFrames (XML sitemaps, robots.txt files, and lists of URLs)
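A few illustrative calls following the naming convention above (a sketch with made-up inputs):

import advertools as adv

adv.extract_hashtags(['Loving #python and #seo!'])         # extract_: entities from social posts
adv.emoji_search('heart')                                  # emoji_: look up emoji by name
adv.url_to_df(['https://example.com/page?utm_source=x'])   # *_to_df: split URLs into a DataFrame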