advertools

advertools.code_recipes package

Submodules

  • 🕷 SEO Crawling & Scraping: Strategies & Recipes
    • How to crawl a list of pages, and those pages only (list mode)?
    • How can I crawl a website including its sub-domains?
    • How can I save a copy of the logs of my crawl for auditing them later?
    • How can I automatically stop my crawl based on a certain condition?
    • How can I (dis)obey robots.txt rules?
    • How do I set my User-agent while crawling?
    • How can I control the number of concurrent requests while crawling?
    • How can I slow down the crawling so I don't hit the websites' servers too hard?
    • How can I set multiple settings to the same crawl job?
    • I want to crawl a list of pages, follow links from those pages, but only to a certain specified depth
    • How do I pause/resume crawling, while making sure I don't crawl the same page twice?
    • How do I use a proxy while crawling?
    • How can I change the default request headers?
    • XPath expressions for custom extraction
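Several of the recipes listed above (robots.txt handling, User-agent, concurrency, throttling, depth limits, logs, pause/resume, automatic stopping) come down to passing Scrapy settings to the crawler. The sketch below shows one way this might look, assuming the settings go to `advertools.crawl` through its `custom_settings` parameter. The helper names `build_crawl_settings` and `run_crawl`, the file names, and all values are hypothetical and only for illustration. The setting keys are standard Scrapy setting names.

```python
def build_crawl_settings(user_agent, log_file="crawl.log", max_depth=2):
    """Return a dict of Scrapy settings covering several recipes at once.

    All values here are illustrative assumptions, not recommended defaults.
    """
    return {
        "ROBOTSTXT_OBEY": True,               # obey robots.txt rules
        "USER_AGENT": user_agent,             # identify the crawler
        "CONCURRENT_REQUESTS_PER_DOMAIN": 4,  # limit concurrent requests
        "DOWNLOAD_DELAY": 1.0,                # seconds to wait between requests
        "DEPTH_LIMIT": max_depth,             # follow links only this deep
        "CLOSESPIDER_PAGECOUNT": 500,         # stop after this many pages
        "LOG_FILE": log_file,                 # keep a copy of the crawl logs
        "JOBDIR": "crawl_job",                # state folder for pause/resume
    }


def run_crawl(urls, output_file, settings, follow_links=True):
    """Start a crawl with the given settings (hypothetical wrapper).

    advertools is imported lazily so that the settings helper above can be
    used and inspected without the package installed.
    """
    import advertools as adv

    adv.crawl(
        urls,
        output_file,  # expected to end in ".jl" (jsonlines)
        follow_links=follow_links,
        custom_settings=settings,
    )


settings = build_crawl_settings("my-crawler/1.0")
```

To crawl in list mode (the given pages and nothing else), the same wrapper could be called with `follow_links=False`, for example `run_crawl(["https://example.com"], "output.jl", settings, follow_links=False)`.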

Module contents

© Copyright 2022, Elias Dabbas. Revision dee48b7c.