Reverse DNS Lookup in Bulk
Getting the host names of a list of IP addresses can be useful for verifying
the authenticity of those IP addresses. You typically want to do this as part
of a log file analysis pipeline, where you have requests made to your server
that claim to come from a certain user agent or bot. Performing a
reverse_dns_lookup() on those IP addresses will get you the actual host names
that they belong to.
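To make the verification idea concrete, here is a minimal sketch of checking a single IP in plain Python. It assumes the two-step procedure Google documents for Googlebot: reverse-resolve the IP, check that the hostname ends in an expected domain, then forward-resolve that hostname and confirm it maps back to the same IP. The function name `is_verified_host` and the suffix allow-list are hypothetical illustrations, not part of advertools. The resolvers are passed in as arguments so the logic can be tested offline.

```python
import socket
from typing import Callable, List, Tuple

# (hostname, aliaslist, ipaddrlist), the triple returned by the socket module
HostTriple = Tuple[str, List[str], List[str]]


def is_verified_host(
    ip: str,
    # Hypothetical allow-list; check each crawler operator's own documentation.
    allowed_suffixes: Tuple[str, ...] = (".googlebot.com", ".google.com"),
    reverse: Callable[[str], HostTriple] = socket.gethostbyaddr,
    forward: Callable[[str], HostTriple] = socket.gethostbyname_ex,
) -> bool:
    """Return True if `ip` reverse-resolves to an allowed domain and that
    hostname forward-resolves back to the same IP."""
    try:
        hostname, _, _ = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror both subclass OSError
        return False
    # str.endswith accepts a tuple of suffixes
    if not hostname.rstrip(".").endswith(allowed_suffixes):
        return False
    try:
        _, _, addresses = forward(hostname)
    except OSError:
        return False
    # Forward confirmation: the claimed hostname must point back to the IP
    return ip in addresses
```

The forward check matters because whoever controls an IP range also controls its reverse DNS records, so a PTR record alone can be set to any name.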
The reverse_dns_lookup() function is essentially like running the
host command from the command line, but on a massive scale:
$ host 66.249.80.0
0.80.249.66.in-addr.arpa domain name pointer google-proxy-66-249-80-0.google.com.
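In Python's standard library, the single-IP equivalent of this `host` call is `socket.gethostbyaddr`, which returns a `(hostname, aliaslist, ipaddrlist)` triple. Those are the same names as three of the columns in the function's output. The sketch below wraps one lookup and records failures instead of raising them; the helper name `lookup_one` is hypothetical, and this is not claimed to be how advertools is implemented.

```python
import socket
from typing import Dict, List, Optional, Union

Row = Dict[str, Union[str, List[str], None]]


def lookup_one(ip: str) -> Row:
    """Reverse-resolve a single IP, similar to `host <ip>` on the command line.

    socket.gethostbyaddr returns (hostname, aliaslist, ipaddrlist);
    a failed lookup is captured as an error message instead of being raised.
    """
    try:
        hostname, aliaslist, ipaddrlist = socket.gethostbyaddr(ip)
        return {"ip_address": ip, "hostname": hostname,
                "aliaslist": aliaslist, "ipaddrlist": ipaddrlist,
                "errors": None}
    except OSError as exc:  # e.g. socket.herror when no PTR record exists
        return {"ip_address": ip, "hostname": None, "aliaslist": None,
                "ipaddrlist": None, "errors": str(exc)}
```

Calling `lookup_one("66.249.66.194")` needs access to a DNS resolver. When an address has no PTR record, the string form of the raised `socket.herror` typically reads `[Errno 1] Unknown host`.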
Because you usually have a large number of duplicated IP addresses to check, this function makes the process practical and efficient compared to running the command thousands of times from the command line.
Running the function is simple: supply the list of IP addresses that you have. Keep the duplicates, because the function handles them for you and also provides counts and some statistics on the frequency of the IPs:
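The efficiency comes from two ideas: resolve each distinct IP only once, and run the lookups concurrently because each one spends most of its time waiting on the network. The following sketch shows both ideas using `concurrent.futures.ThreadPoolExecutor`. The function name `bulk_reverse_lookup`, the injectable `resolver` argument, and the default of 50 workers are assumptions for illustration; advertools' own implementation may differ.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple, Union

HostTriple = Tuple[str, List[str], List[str]]


def bulk_reverse_lookup(
    ips: Iterable[str],
    resolver: Callable[[str], HostTriple] = socket.gethostbyaddr,
    max_workers: int = 50,  # hypothetical default, not advertools' value
) -> Dict[str, Union[HostTriple, OSError]]:
    """Resolve each *unique* IP once, using a pool of threads.

    Returns {ip: (hostname, aliaslist, ipaddrlist)} for successful lookups
    and {ip: <OSError instance>} for failed ones.
    """
    # De-duplicate while keeping first-seen order
    unique_ips = list(dict.fromkeys(ips))

    def safe_resolve(ip: str) -> Union[HostTriple, OSError]:
        try:
            return resolver(ip)
        except OSError as exc:
            return exc

    # DNS lookups are I/O-bound, so threads overlap the waiting time
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(safe_resolve, unique_ips))
    return dict(zip(unique_ips, results))
```

With eight input IPs of which five are distinct, only five lookups are made, no matter how many times each IP repeats.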
import advertools as adv
ip_list = ['66.249.66.194', '66.249.66.194', '66.249.66.194',
'66.249.66.91', '66.249.66.91', '130.185.74.243',
'31.56.96.51', '5.211.97.39']
host_df = adv.reverse_dns_lookup(ip_list)
host_df
|   | ip_address | count | cum_count | perc | cum_perc | hostname | aliaslist | ipaddrlist | errors |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 66.249.66.194 | 3 | 3 | 0.375 | 0.375 | crawl-66-249-66-194.googlebot.com | 194.66.249.66.in-addr.arpa | 66.249.66.194 |  |
| 1 | 66.249.66.91 | 2 | 5 | 0.25 | 0.625 | crawl-66-249-66-91.googlebot.com | 91.66.249.66.in-addr.arpa | 66.249.66.91 |  |
| 2 | 130.185.74.243 | 1 | 6 | 0.125 | 0.75 | mail.garda.ir | 243.74.185.130.in-addr.arpa | 130.185.74.243 |  |
| 3 | 31.56.96.51 | 1 | 7 | 0.125 | 0.875 | 31-56-96-51.shatel.ir | 51.96.56.31.in-addr.arpa | 31.56.96.51 |  |
| 4 | 5.211.97.39 | 1 | 8 | 0.125 | 1 |  |  |  | [Errno 1] Unknown host |
As you can see, in addition to the hostname, aliaslist, and ipaddrlist of each IP you supplied, you also get absolute and cumulative counts, as well as absolute and cumulative percentages. Lookups that fail, like the last one, report their error message in the errors column. This gives you an overview of the relative importance of each IP and helps you focus your attention where needed.
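The count and percentage columns are straightforward to derive. As a sketch (pure Python with `collections.Counter` rather than pandas, and with a hypothetical function name), the following sorts IPs by frequency and accumulates the running totals. For the `ip_list` above, it yields the same count, cum_count, perc, and cum_perc values as the table.

```python
from collections import Counter
from typing import Dict, List, Union


def ip_frequency_table(ips: List[str]) -> List[Dict[str, Union[str, int, float]]]:
    """Count each IP and add cumulative counts and percentages.

    Rows are ordered most frequent first; ties keep first-seen order,
    because Counter.most_common() uses a stable sort.
    """
    total = len(ips)
    rows = []
    cum_count = 0
    for ip, count in Counter(ips).most_common():
        cum_count += count
        rows.append({
            "ip_address": ip,
            "count": count,
            "cum_count": cum_count,
            "perc": count / total,          # share of all requests
            "cum_perc": cum_count / total,  # running share, ends at 1.0
        })
    return rows
```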
- reverse_dns_lookup(ip_list, max_workers=600)
Return the hostname, aliaslist, and ipaddrlist for a list of IP addresses.
This is mainly useful for a long list of typically duplicated IP addresses, and it retrieves the information very quickly. It is basically the equivalent of running the host command on the command line many times:
$ host advertools.readthedocs.io
advertools.readthedocs.io has address 104.17.32.82
- Parameters:
ip_list (list) -- A list of IP addresses.
max_workers (int) -- The maximum number of workers to use for multiprocessing.
You also get a simple report of the IP counts, giving an overview of the most frequent ones.
Examples
>>> import advertools as adv
>>> ip_list = [
...     "66.249.66.194",
...     "66.249.66.194",
...     "66.249.66.194",
...     "66.249.66.91",
...     "66.249.66.91",
...     "130.185.74.243",
...     "31.56.96.51",
...     "5.211.97.39",
... ]
>>> adv.reverse_dns_lookup(ip_list)
|   | ip_address | count | cum_count | perc | cum_perc | hostname | aliaslist | ipaddrlist | errors |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 66.249.66.194 | 3 | 3 | 0.375 | 0.375 | crawl-66-249-66-194.googlebot.com | 194.66.249.66.in-addr.arpa | 66.249.66.194 |  |
| 1 | 66.249.66.91 | 2 | 5 | 0.25 | 0.625 | crawl-66-249-66-91.googlebot.com | 91.66.249.66.in-addr.arpa | 66.249.66.91 |  |
| 2 | 130.185.74.243 | 1 | 6 | 0.125 | 0.75 | mail.garda.ir | 243.74.185.130.in-addr.arpa | 130.185.74.243 |  |
| 3 | 31.56.96.51 | 1 | 7 | 0.125 | 0.875 | 31-56-96-51.shatel.ir | 51.96.56.31.in-addr.arpa | 31.56.96.51 |  |
| 4 | 5.211.97.39 | 1 | 8 | 0.125 | 1 |  |  |  | [Errno 1] Unknown host |