Reverse DNS Lookup in Bulk

Getting the host names for a list of IP addresses is useful for verifying the authenticity of those addresses. You typically do this as part of a log file analysis pipeline, where your server has received requests that claim to come from a certain user agent or bot. Running reverse_dns_lookup() on those IP addresses gets you the host name registered for each one, which you can then compare with the identity the request claimed.
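A reverse lookup on its own only tells you what the PTR record for an address says, and that record is controlled by whoever operates the IP block. Google's documentation for verifying Googlebot recommends a second, forward lookup on the returned host name to confirm that it maps back to the same IP. The following is a minimal standard-library sketch of that forward-confirmed check. is_verified_bot() is a hypothetical helper (not part of advertools), the allowed domain suffixes are assumptions you would choose per bot, and the resolvers are parameters so you can swap in stubs when testing.

```python
import socket
from typing import Callable, List, Tuple

# Same shape as socket.gethostbyaddr() and socket.gethostbyname_ex():
# (hostname, aliaslist, ipaddrlist)
HostInfo = Tuple[str, List[str], List[str]]


def is_verified_bot(
    ip: str,
    allowed_suffixes: Tuple[str, ...],
    reverse: Callable[[str], HostInfo] = socket.gethostbyaddr,
    forward: Callable[[str], HostInfo] = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS check (hypothetical helper, not advertools).

    1. Reverse-resolve the IP to a host name.
    2. Require the host name to end with an allowed domain suffix,
       e.g. ".googlebot.com" (the leading dot rejects "evilgooglebot.com").
    3. Forward-resolve that host name and require it to map back to the IP.
    """
    try:
        hostname, _, _ = reverse(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(allowed_suffixes):
        return False
    try:
        _, _, forward_ips = forward(hostname)
    except OSError:
        return False
    return ip in forward_ips
```

Called with the default resolvers, for example is_verified_bot("66.249.66.194", (".googlebot.com", ".google.com")), it performs two live DNS queries, so the result depends on your network and resolver.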

The reverse_dns_lookup() function works like running the host command from the command line, but on a much larger scale:

$ host 66.249.80.0
0.80.249.66.in-addr.arpa domain name pointer google-proxy-66-249-80-0.google.com.
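In Python's standard library, the closest single-address equivalent of this command is socket.gethostbyaddr(), which returns a (hostname, aliaslist, ipaddrlist) triple and raises socket.herror when no name is found. Those three names match the columns in the advertools output, which suggests (though this page does not confirm it) that the function builds on this call. lookup_one() is a hypothetical helper sketching a single lookup, with the resolver passed in so it can be exercised offline.

```python
import socket
from typing import Callable, Dict, List, Tuple

HostInfo = Tuple[str, List[str], List[str]]


def lookup_one(
    ip: str,
    resolver: Callable[[str], HostInfo] = socket.gethostbyaddr,
) -> Dict[str, object]:
    """Reverse-resolve one IP, roughly what `host <ip>` prints (hypothetical helper)."""
    try:
        hostname, aliaslist, ipaddrlist = resolver(ip)
    except OSError as exc:  # socket.herror and socket.gaierror subclass OSError
        # Keep the failure as text instead of raising, so one bad IP
        # does not abort a bulk run.
        return {"ip_address": ip, "hostname": None, "aliaslist": None,
                "ipaddrlist": None, "errors": str(exc)}
    return {"ip_address": ip, "hostname": hostname, "aliaslist": aliaslist,
            "ipaddrlist": ipaddrlist, "errors": None}
```

A socket.herror with errno 1 and the message "Unknown host" converts to the string "[Errno 1] Unknown host", the same text that appears in the errors column of the output.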

Because log files usually contain many duplicated IP addresses to check, this function makes the process practical and efficient compared with running the command thousands of times from the command line.
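The saving from duplicates is easy to see in a small sketch: resolve each distinct address once and reuse the answer. resolve_unique() is a hypothetical helper, and its resolver argument stands in for any lookup function, such as socket.gethostbyaddr.

```python
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def resolve_unique(ips: Iterable[str], resolver: Callable[[str], T]) -> Dict[str, T]:
    """Call `resolver` once per distinct IP address (hypothetical helper)."""
    unique_ips = dict.fromkeys(ips)  # removes duplicates, keeps first-seen order
    return {ip: resolver(ip) for ip in unique_ips}
```

With the eight addresses used in the example further down, only five lookups are needed; to get an answer per original request, index the result with each IP again.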

Running the function is simple: supply the list of IP addresses you have. Keep the duplicates, because the function handles them for you and also uses them to provide counts and some statistics on the frequency of each IP:

import advertools as adv
ip_list = ['66.249.66.194', '66.249.66.194', '66.249.66.194',
        '66.249.66.91', '66.249.66.91', '130.185.74.243',
        '31.56.96.51', '5.211.97.39']

host_df = adv.reverse_dns_lookup(ip_list)
host_df

   ip_address      count  cum_count  perc   cum_perc  hostname                           aliaslist                    ipaddrlist      errors
0  66.249.66.194   3      3          0.375  0.375     crawl-66-249-66-194.googlebot.com  194.66.249.66.in-addr.arpa   66.249.66.194
1  66.249.66.91    2      5          0.25   0.625     crawl-66-249-66-91.googlebot.com   91.66.249.66.in-addr.arpa    66.249.66.91
2  130.185.74.243  1      6          0.125  0.75      mail.garda.ir                      243.74.185.130.in-addr.arpa  130.185.74.243
3  31.56.96.51     1      7          0.125  0.875     31-56-96-51.shatel.ir              51.96.56.31.in-addr.arpa     31.56.96.51
4  5.211.97.39     1      8          0.125  1                                                                                         [Errno 1] Unknown host

As you can see, in addition to the hostname, aliaslist, and ipaddrlist of each IP you supplied, you also get counts (absolute and cumulative) and percentages (absolute and cumulative, expressed as fractions of the total). These give you a quick overview of the relative importance of each IP and help you decide which ones deserve attention.
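To make the count columns concrete, here is a standard-library sketch that reproduces count, cum_count, perc, and cum_perc for a list of IPs, plus a hypothetical top_share() helper that takes the most frequent IPs until they cover a given share of all requests. advertools itself returns a pandas DataFrame; neither function here is part of its API.

```python
from collections import Counter
from typing import Dict, List, Union

Row = Dict[str, Union[str, int, float]]


def ip_frequency_report(ip_list: List[str]) -> List[Row]:
    """Per-IP count, cumulative count, share, and cumulative share (hypothetical helper).

    Rows are ordered by count, descending; ties keep first-seen order
    because Counter.most_common() sorts stably.
    """
    total = len(ip_list)
    rows: List[Row] = []
    cum_count = 0
    for ip, count in Counter(ip_list).most_common():
        cum_count += count
        rows.append({
            "ip_address": ip,
            "count": count,
            "cum_count": cum_count,
            "perc": count / total,
            "cum_perc": cum_count / total,  # divide once instead of summing floats
        })
    return rows


def top_share(rows: List[Row], threshold: float) -> List[str]:
    """Most frequent IPs needed to cover at least `threshold` of requests."""
    selected = []
    for row in rows:
        selected.append(row["ip_address"])
        if row["cum_perc"] >= threshold:
            break
    return selected
```

For the eight-address example above, top_share(ip_frequency_report(ip_list), 0.6) returns the two Googlebot addresses, since together they already account for 0.625 of the requests.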

reverse_dns_lookup(ip_list, max_workers=600)

Return the hostname, aliaslist, and ipaddrlist for a list of IP addresses.

This is mainly useful for a long list of IP addresses that typically contains duplicates, and it retrieves the information quickly. It is essentially the equivalent of running the host command on the command line many times:

$ host advertools.readthedocs.io
advertools.readthedocs.io has address 104.17.32.82
Parameters:
  • ip_list (list) -- A list of IP addresses.

  • max_workers (int) -- The maximum number of workers to use for multiprocessing.

You also get a simple report on how often each IP appears, giving you an overview of the top ones.
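The max_workers parameter caps how many lookups can run at the same time. The parameter description mentions multi processing; this sketch deliberately swaps in a thread pool (concurrent.futures.ThreadPoolExecutor) instead, because each lookup spends most of its time waiting on DNS. bulk_lookup() is a hypothetical helper, not the advertools implementation, and the resolver argument is there so it can be tested without network access.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Optional, Tuple

HostInfo = Tuple[str, List[str], List[str]]
Result = Tuple[Optional[HostInfo], Optional[str]]  # (answer, error message)


def _safe_lookup(ip: str, resolver: Callable[[str], HostInfo]) -> Result:
    """Run one lookup, turning resolver errors into a message instead of raising."""
    try:
        return resolver(ip), None
    except OSError as exc:
        return None, str(exc)


def bulk_lookup(
    ip_list: Iterable[str],
    max_workers: int = 600,
    resolver: Callable[[str], HostInfo] = socket.gethostbyaddr,
) -> Dict[str, Result]:
    """Resolve each distinct IP on up to `max_workers` threads (hypothetical helper)."""
    unique_ips = list(dict.fromkeys(ip_list))  # one lookup per distinct IP
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in input order, so zip pairs each IP with its answer
        results = executor.map(lambda ip: _safe_lookup(ip, resolver), unique_ips)
        return dict(zip(unique_ips, results))
```

Threads start lazily, so a large max_workers value only matters when there are that many distinct addresses to resolve.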

Examples

>>> import advertools as adv
>>> ip_list = [
...     "66.249.66.194",
...     "66.249.66.194",
...     "66.249.66.194",
...     "66.249.66.91",
...     "66.249.66.91",
...     "130.185.74.243",
...     "31.56.96.51",
...     "5.211.97.39",
... ]
>>> adv.reverse_dns_lookup(ip_list)

   ip_address      count  cum_count  perc   cum_perc  hostname                           aliaslist                    ipaddrlist      errors
0  66.249.66.194   3      3          0.375  0.375     crawl-66-249-66-194.googlebot.com  194.66.249.66.in-addr.arpa   66.249.66.194
1  66.249.66.91    2      5          0.25   0.625     crawl-66-249-66-91.googlebot.com   91.66.249.66.in-addr.arpa    66.249.66.91
2  130.185.74.243  1      6          0.125  0.75      mail.garda.ir                      243.74.185.130.in-addr.arpa  130.185.74.243
3  31.56.96.51     1      7          0.125  0.875     31-56-96-51.shatel.ir              51.96.56.31.in-addr.arpa     31.56.96.51
4  5.211.97.39     1      8          0.125  1                                                                                         [Errno 1] Unknown host