IP Address to Country

From CodeCodex

It can often be useful to determine in which country an IP address appears to be located. Commercial services can narrow this down to the city, but country-level information is available at no charge from the regional Internet registries (RIRs, also called regional NICs). The scripts below use the published data of four of them: APNIC, ARIN, LACNIC and RIPE NCC.

Note that, for various reasons, this information is not absolutely reliable. Therefore, if you use it to conclude something about a user, give them an option to override that conclusion if it turns out to be wrong.

Also, the following scripts only work with IPv4 addresses. The NIC database files include IPv6 allocation information, but using that is left as an exercise for the reader. ☻
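
As a starting point for that exercise: in the delegated-statistics files, an IPv6 record carries a prefix length in the field where an IPv4 record carries an address count. The following is only a minimal Perl sketch under that assumption. The helper name parse_ipv6_record is hypothetical, and the sample record is made up (it uses the documentation prefix 2001:db8::/32 and a placeholder country code XX).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the scripts below: extract
# (country, prefix in CIDR notation) from an IPv6 record of a
# delegated-statistics file. Returns an empty list for other lines.
sub parse_ipv6_record # ($line)
  {
    my($line) = @_;
    chomp $line;
    # fields: registry|country|type|start|value|date|status
    my @items = split(/\|/, $line);
    return () unless @items >= 7 && $items[2] eq "ipv6";
    # for IPv6 the fifth field is a prefix length, not an address count
    return ($items[1], $items[3] . "/" . $items[4]);
  } # parse_ipv6_record

# made-up sample record, for illustration only
my($country, $prefix) =
    parse_ipv6_record("example|XX|ipv6|2001:db8::|32|20000101|assigned\n");
print "$country $prefix\n"; # prints XX 2001:db8::/32
```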

Perl

First, a script, get_ip, that fetches the latest versions of the NIC IP address allocation files. These are saved as apnic.ip, arin.ip, lacnic.ip and ripe.ip in the current directory; if a file already exists, it is replaced only when the freshly downloaded copy differs from it. The IPv4 allocation information is then extracted and written in a sorted, simplified format to a combined file, all.ip, where it can be used by the second script.

This script relies on the very useful GNU wget utility to do the actual downloading of the database files.

use strict;

sub false {0;}
sub true {1;}

sub todottedquad # ($ip)
  # given an IP address as a decimal integer, returns its
  # standard formatted representation as a dotted quad.
  {
    my($ip) = @_;
    sprintf
      (
        "%d.%d.%d.%d",
        $ip >> 24 & 255,
        $ip >> 16 & 255,
        $ip >> 8 & 255,
        $ip & 255
      );
  } # todottedquad

sub fromdottedquad # ($ip)
  # given an IP address formatted in standard dotted-quad
  # notation, converts it to a decimal integer.
  {
    my @items = split(/\./, $_[0]);
        $items[0] << 24
    |
        $items[1] << 16
    |
        $items[2] << 8
    |
        $items[3];
  } # fromdottedquad

sub unknown_country {"??";}
    # guaranteed not to match any assigned country code

my %src_urls =
  # where to get IP allocation databases for all the regions of the world
  (
    "lacnic" => "ftp://ftp.lacnic.net/pub/stats/lacnic/delegated-lacnic-latest",
    "ripe" => "ftp://ftp.ripe.net/ripe/stats/delegated-ripencc-latest",
    "apnic" => "ftp://ftp.apnic.net/public/stats/apnic/delegated-apnic-latest",
    "arin" => "ftp://ftp.arin.net/pub/stats/arin/delegated-arin-latest",
  );
my $check_updates = true;
  # true to check for updated IP listings
my $always_rebuild_list = false;
  # false to only rebuild all.ip if IP listings have been updated
my $include_compression_info = false;
  # true to include extra fields in all.ip which may be useful
  # in compressing its contents
my $verbose = false; # whether to display messages

my($changed);
$changed = false;
if ($check_updates)
  {
    for my $name (keys %src_urls)
      {
        my($url, $cmd, $curfile, $newfile, $file_changed);
        my($signal, $status);
        $url = $src_urls{$name};
        $curfile = $name . ".ip";
        $newfile = $curfile . "-new";
        $cmd = "wget " . ($verbose ? "" : " -q ") . $url . " -O " . $newfile;
        if ($verbose)
          {
            print $cmd . "\n";
          } # if
        $status = system($cmd);
        $signal = $status & 127;
        $status >>= 8;
        if ($status == 0 && $signal == 0)
          {
            local(*Curfile, *Newfile);
            if (open(Curfile, $curfile))
              {
                open(Newfile, $newfile) || die "$! opening $newfile";
                $file_changed = false;
                for (;;)
                  {
                    my($cur, $new);
                    $cur = <Curfile>;
                    $new = <Newfile>;
                    if (!defined($cur) && !defined($new))
                      {
                        last; # files match right to the end
                      }
                    elsif (!defined($cur) || !defined($new) || $cur ne $new)
                      {
                        $file_changed = true;
                        last;
                      } # if
                  } # for
                close(Newfile);
                close(Curfile);
              }
            else
              {
                $file_changed = true;
              } # if
            if ($file_changed)
              {
                if ($verbose)
                  {
                    print "replace $curfile with $newfile\n";
                  } # if
                unlink($curfile);
                rename($newfile, $curfile) ||
                    die "$! renaming $newfile to $curfile";
                $changed = true;
              }
            else
              {
                if ($verbose)
                  {
                    print "$newfile unchanged, delete\n";
                  } # if
                unlink($newfile);
              } # if
          }
        else
          {
            print STDERR
                "? " . $url . " status = " . $status .
                ", signal = " . $signal . "\n";
          } # if
      } # for
  } # if $check_updates

if ($changed || $always_rebuild_list)
  {
    local(*File);
    my(%nraddrs, %country, %shifts, %increments);
    my($lastbaseaddr, $nextaddr, $lastcountry);
    %nraddrs = ();
    %country = ();
    for my $name (keys %src_urls)
      {
        my($line, @Items, $country, $baseaddr, $nraddrs, @items);
        open(File, $name . ".ip") || die "$! opening $name.ip";
        while ($line = <File>)
          {
            chomp $line;
            @items = split(/\|/, $line);
            if (@items == 7 && $items[2] eq "ipv4")
              {
                $country = $items[1];
                $baseaddr = fromdottedquad($items[3]);
                $nraddrs = $items[4];
                if ($nraddrs != 0)
                  {
                    $nraddrs{$baseaddr} = $nraddrs;
                    $country{$baseaddr} = $country;
                  } # if
              } # if
          } # while
        close(File);
      } # for each src url
    if (false)
      {
        # fill in allocated lowest and highest addresses with
        # unknown country code
        my(@addrs, $highest);
        @addrs = sort {$a <=> $b} keys %nraddrs;
        if (@addrs == 0 || $addrs[0] != 0)
          {
            $nraddrs{0} = $addrs[0];
            $country{0} = unknown_country;
            unshift(@addrs, 0);
          } # if
        $highest =
                $addrs[@addrs - 1] + $nraddrs{$addrs[@addrs - 1]}
            &
                0xFFFFFFFF;
        if ($highest != 0)
          {
            $nraddrs{$highest} = 0xFFFFFFFF - $highest + 1;
            $country{$highest} = unknown_country;
          } # if
      }
    $lastbaseaddr = 0;
    $nextaddr = 0;
    $lastcountry = unknown_country;
    for my $baseaddr ((sort {$a <=> $b} keys %nraddrs), 0)
      {
        # fill in unallocated ranges with unknown country code, and
        # collapse adjacent contiguous ranges with the same country code.
        # Note final wraparound of $baseaddr to 0 to fill out unallocated
        # addresses at high end.
        my($thisnraddrs, $thiscountry);
        $thisnraddrs = $nraddrs{$baseaddr};
        $thiscountry = $country{$baseaddr};
        if (!defined($thiscountry)) # will only happen on last iteration
          {
            $thisnraddrs = 0;
            $thiscountry = unknown_country;
          } # if
        # print "\$nraddrs{" . todottedquad($baseaddr) . "} = $thisnraddrs\n"; # debug
        if ($baseaddr == $nextaddr)
          {
            # this range contiguous to previous range
            # print "contiguous\n"; # debug
            if ($thiscountry eq $lastcountry)
              {
                # merge adjacent ranges for same country
                # print "merge \$nraddrs{" . todottedquad($lastbaseaddr) . "} = $nraddrs{$lastbaseaddr} += $thisnraddrs\n"; # debug
                $nraddrs{$lastbaseaddr} += $thisnraddrs;
                # print " \$nextaddr = " . todottedquad($nextaddr) . " => "; # debug
                $nextaddr = $nextaddr + $thisnraddrs & 0xFFFFFFFF;
                # print todottedquad($nextaddr) . "\n"; # debug
                delete $nraddrs{$baseaddr};
                delete $country{$baseaddr};
              }
            else
              {
                # print "nomerge\n"; # debug
                $lastbaseaddr = $baseaddr;
                $nextaddr = $baseaddr + $thisnraddrs & 0xFFFFFFFF;
                $lastcountry = $thiscountry;
              } # if
          }
        elsif ($baseaddr != 0 && $baseaddr < $nextaddr)
          {
            # overlap!
            # Assume it's the same country allocation
            # recorded multiple times, just delete this allocation
            # print "overlap!\n"; # debug
            delete $nraddrs{$baseaddr};
            delete $country{$baseaddr};
          }
        else # $baseaddr = 0 or $baseaddr > $nextaddr
          {
            # not contiguous to previous range, insert an
            # unknown-country range in-between (assumes neither
            # is for unknown country if actually present)
            if ($baseaddr < $nextaddr)
              {
                # wraparound on final range
                $nraddrs{$nextaddr} = (0xFFFFFFFF - $nextaddr) + 1;
              }
            else
              {
                $nraddrs{$nextaddr} = $baseaddr - $nextaddr;
              } # if
            # print "gap, " . todottedquad($baseaddr) . ", \$nraddrs{" . todottedquad($nextaddr) . "} = $nraddrs{$nextaddr}\n"; # debug
            $country{$nextaddr} = unknown_country;
            if ($baseaddr >= $nextaddr)
              {
                $lastbaseaddr = $baseaddr;
                $nextaddr = $baseaddr + $thisnraddrs & 0xFFFFFFFF;
                # print " " . todottedquad($baseaddr) . " + $thisnraddrs => " . todottedquad($nextaddr) . "\n"; # debug
                $lastcountry = $thiscountry;
              } # if
          } # if
      } # for
    open(File, ">all.ip") || die "$! opening all.ip for writing";
    for my $baseaddr (sort {$a <=> $b} keys %nraddrs)
      {
        my($highaddr, $lowaddrstr, $highaddrstr);
        $highaddr = $baseaddr + $nraddrs{$baseaddr} - 1;
        $lowaddrstr = todottedquad($baseaddr);
        $highaddrstr = todottedquad($highaddr);
        print File join("|", $country{$baseaddr}, $lowaddrstr, $highaddrstr);
        if ($include_compression_info)
          {
            # include mask and increment information
            my($mask, $shift, $increment);
            $shift = 0;
            for (;;)
              {
                if ($shift == 32)
                  {
                    last;
                  } # if
                ++$shift;
                $mask = (1 << $shift) - 1;
                if (($baseaddr & $mask) != 0 || ($highaddr & $mask) != $mask)
                  {
                    --$shift;
                    last;
                  } # if
              } # for
            if (defined($shifts{$shift}))
              {
                ++$shifts{$shift};
              }
            else
              {
                $shifts{$shift} = 1;
              } # if
            $increment = ($highaddr >> $shift) - ($baseaddr >> $shift);
            if (defined($increments{$increment}))
              {
                ++$increments{$increment};
              }
            else
              {
                $increments{$increment} = 1;
              } # if
            print File join("|", "", $shift, sprintf("%08x", (1 << $shift) - 1), $increment);
          } # if
        print File "\n";
      } # for
    if ($include_compression_info)
      {
        print File "Nr lines: " . scalar(keys %nraddrs) . "\n";
        print File "Shifts: " . join(", ", map {$_ . " => " . $shifts{$_}} sort {$a <=> $b} keys %shifts) . "\n";
        print File "Increments:\n";
        for my $increment (sort {$a <=> $b} keys %increments)
          {
            print File "\t" . $increment . " => " . $increments{$increment} . "\n";
          } # for
      } # if
    close(File);
  } # if $changed || $always_rebuild_list

The above script could be run, say, once every week or two to keep your IP address database up to date.
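
To make the record handling concrete, here is a minimal sketch of how one line of a delegated-statistics file maps to one line of all.ip. The helper names parse_delegated_line and dotted are hypothetical, and the sample record is made up (it uses the documentation range 192.0.2.0/24 and a placeholder country code XX). The field layout is the one the script above relies on.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of get_ip: turn one record of a NIC
# delegated-statistics file into (country, low address, high address),
# or return an empty list if it is not a usable IPv4 record.
sub parse_delegated_line # ($line)
  {
    my($line) = @_;
    chomp $line;
    # fields: registry|country|type|start|count|date|status
    my @items = split(/\|/, $line);
    return () unless @items == 7 && $items[2] eq "ipv4" && $items[4] > 0;
    my @quad = split(/\./, $items[3]);
    my $low = $quad[0] << 24 | $quad[1] << 16 | $quad[2] << 8 | $quad[3];
    my $high = $low + $items[4] - 1; # count is a number of addresses
    return ($items[1], $low, $high);
  } # parse_delegated_line

sub dotted # ($ip)
  # formats a 32-bit integer as a dotted quad
  {
    my($ip) = @_;
    return join(".", $ip >> 24 & 255, $ip >> 16 & 255, $ip >> 8 & 255, $ip & 255);
  } # dotted

# made-up sample record, for illustration only
my($country, $low, $high) =
    parse_delegated_line("example|XX|ipv4|192.0.2.0|256|20000101|assigned\n");
print join("|", $country, dotted($low), dotted($high)) . "\n";
# prints XX|192.0.2.0|192.0.2.255
```

Because the count field is a plain number of addresses rather than a prefix length, the range is stored as a low and high address instead of in CIDR form.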

Next, a script, map_ip, which takes one or more IPv4 addresses in the usual dotted-quad form and prints the corresponding country codes.

*unknown_country = \"??";
%addr = (0 => $unknown_country);
@addr = (0);
# @addr = sort keys %addr; # will be true at end

sub find_addr_index # ($ip)
  # returns the low index in @addr of the range of IP addresses
  # containing $ip.
  {
    my($ip) = @_;
    my($low, $high, $mid);
    $low = 0;
    $high = @addr;
    for (;;)
      {
        # assert $addr[$low] <= $ip < $addr[$high]
        if ($low == $high - 1)
          {
            last;
          } # if
        $mid = int(($low + $high) / 2); # assert $addr[$mid] always defined
        if ($addr[$mid] <= $ip)
          {
            $low = $mid;
          }
        else
          {
            $high = $mid;
          } # if
      } # for
    $low;
  } # find_addr_index

$last_highaddr = 1;
open(IP, "all.ip") || die "$! trying to open all.ip";
while (<IP>) # assume sorted by $lowaddr and no overlapping ranges
  {
    my(@items, $country, $lowaddr, $highaddr, $place);
    chomp; # strip the newline so the last field is a clean dotted quad
    @items = split(/\|/);
    $country = $items[0];
    $lowaddr = $items[1];
    $highaddr = $items[2];
    @items = split(/\./, $lowaddr);
    $lowaddr =
            $items[0] << 24
        |
            $items[1] << 16
        |
            $items[2] << 8
        |
            $items[3];
    @items = split(/\./, $highaddr);
    $highaddr =
            (
                $items[0] << 24
            |
                $items[1] << 16
            |
                $items[2] << 8
            |
                $items[3]
            )
        +
            1;
    if ($lowaddr > $last_highaddr)
      {
        $addr{$last_highaddr} = $unknown_country;
        push(@addr, $last_highaddr);
      } # if
    if ($addr{$addr[@addr - 1]} ne $country)
      {
        $addr{$lowaddr} = $country;
        push(@addr, $lowaddr);
      } # if
    $last_highaddr = $highaddr;
  } # while
close(IP);
$addr{$last_highaddr} = $unknown_country;
push(@addr, $last_highaddr);

if (@ARGV == 0)
  {
    die "\nUsage:\n\t$0 addr [addr...]\n";
  } # if
for my $ip (@ARGV)
  {
    print $ip . " => ";
    @items = split(/\./, $ip);
    $index = find_addr_index
      (
            $items[0] << 24
        |
            $items[1] << 16
        |
            $items[2] << 8
        |
            $items[3]
      );
    if ($index < @addr)
      {
        print $addr{$addr[$index]};
      }
    else
      {
        print $unknown_country;
      } # if
    print "\n";
  } # for

For instance, the command

./map_ip 67.15.172.20

returns the output

67.15.172.20 => US
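
The lookup in map_ip is a binary search over the sorted list of range start addresses. The following self-contained sketch shows the same idea on a tiny in-memory table. The table contents and the names to_int and lookup_country are invented for illustration and do not reflect real allocations.

```perl
use strict;
use warnings;

# Each entry marks where a range begins; the range runs up to the start
# of the next entry. Made-up data: 192.0.2.0/24 and 198.51.100.0/24 are
# documentation ranges, "AA" and "BB" are placeholder country codes.
my @starts = (0, 3221225984, 3221226240, 3325256704, 3325256960);
my @codes = ("??", "AA", "??", "BB", "??");

sub to_int # ($dotted_quad)
  {
    my @q = split(/\./, $_[0]);
    return $q[0] << 24 | $q[1] << 16 | $q[2] << 8 | $q[3];
  } # to_int

sub lookup_country # ($dotted_quad)
  {
    my $ip = to_int($_[0]);
    my($low, $high) = (0, scalar(@starts));
    # invariant: $starts[$low] <= $ip, and $ip < $starts[$high]
    # (treating the position past the end of the list as infinity)
    while ($high - $low > 1)
      {
        my $mid = int(($low + $high) / 2);
        if ($starts[$mid] <= $ip)
          {
            $low = $mid;
          }
        else
          {
            $high = $mid;
          } # if
      } # while
    return $codes[$low];
  } # lookup_country

print lookup_country("192.0.2.77") . "\n";   # prints AA
print lookup_country("198.51.100.1") . "\n"; # prints BB
print lookup_country("10.1.2.3") . "\n";     # prints ??
```

Storing only the start of each range, with explicit "??" entries for the gaps, keeps the search to a single sorted array and makes every address fall into exactly one range.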