This module contains two classes that allow you to look up the Geolocation of IP Addresses.
You must have msticpy installed to run this notebook:
%pip install --upgrade msticpy
This product includes GeoLite2 data created by MaxMind, available from https://www.maxmind.com.
This uses a local database that is downloaded the first time the class is instantiated. It gives very fast lookups, but you need to download updates regularly. MaxMind offers a free tier of this database, updated monthly. For greater accuracy and more detailed information, they offer several levels of paid service. Please check out their site for more details.
The geoip module uses the official MaxMind PyPI package, geoip2, and also has options to customize the behavior of the local MaxMind database.
db_folder
: Specify a custom path containing the local MaxMind city database. If not specified, the database is downloaded to the .msticpy directory under the user's home directory.
force_update
: Can be set to True/False to force an update regardless of the age check.
This library uses services provided by ipstack. https://ipstack.com
IPStack is an online service and also offers a free tier of their service. Again, the paid tiers offer greater accuracy, more detailed information and higher throughput. Please check out their site for more details.
# Imports
import sys
MIN_REQ_PYTHON = (3, 6)
if sys.version_info < MIN_REQ_PYTHON:
    print('Check the Kernel->Change Kernel menu and ensure that Python 3.6')
    print('or later is selected as the active kernel.')
    sys.exit("Python %s.%s or later is required.\n" % MIN_REQ_PYTHON)
from IPython.display import display
import pandas as pd
import msticpy.sectools as sectools
from msticpy.nbtools import *
from msticpy.nbtools.entityschema import IpAddress, GeoLocation
from msticpy.sectools.geoip import GeoLiteLookup, IPStackLookup
Signature:
iplocation.lookup_ip(ip_address: str = None,
ip_addr_list: collections.abc.Iterable = None,
ip_entity: msticpy.nbtools.entityschema.IpAddress = None)
Docstring:
Lookup IP location from GeoLite2 data created by MaxMind.
Keyword Arguments:
ip_address {str} -- a single address to look up (default: {None})
ip_addr_list {Iterable} -- a collection of addresses to lookup (default: {None})
ip_entity {IpAddress} -- an IpAddress entity
Returns:
tuple(list{dict}, list{entity}) -- returns raw geolocation results and
same results as IP/Geolocation entities
iplocation = GeoLiteLookup()
loc_result, ip_entity = iplocation.lookup_ip(ip_address='90.156.201.97')
print('Raw result')
display(loc_result)
print('IP Address Entity')
display(ip_entity[0])
Raw result
[{'continent': {'code': 'EU', 'geoname_id': 6255148, 'names': {'de': 'Europa', 'en': 'Europe', 'es': 'Europa', 'fr': 'Europe', 'ja': 'ヨーロッパ', 'pt-BR': 'Europa', 'ru': 'Европа', 'zh-CN': '欧洲'}}, 'country': {'geoname_id': 2017370, 'iso_code': 'RU', 'names': {'de': 'Russland', 'en': 'Russia', 'es': 'Rusia', 'fr': 'Russie', 'ja': 'ロシア', 'pt-BR': 'Rússia', 'ru': 'Россия', 'zh-CN': '俄罗斯联邦'}}, 'location': {'accuracy_radius': 1000, 'latitude': 55.7386, 'longitude': 37.6068, 'time_zone': 'Europe/Moscow'}, 'registered_country': {'geoname_id': 2017370, 'iso_code': 'RU', 'names': {'de': 'Russland', 'en': 'Russia', 'es': 'Rusia', 'fr': 'Russie', 'ja': 'ロシア', 'pt-BR': 'Rússia', 'ru': 'Россия', 'zh-CN': '俄罗斯联邦'}}, 'traits': {'ip_address': '90.156.201.97', 'prefix_len': 17}}]
IP Address Entity
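The raw result is a list of nested dictionaries, one per address. As a minimal sketch, the fields visible in the output above could be flattened as follows. The helper name `summarize_geolite` is hypothetical (it is not part of msticpy), and the sample record is a trimmed copy of the output shown above.

```python
from typing import Any, Dict, Optional, Tuple

Summary = Tuple[Optional[str], Optional[str], Optional[float], Optional[float]]


def summarize_geolite(record: Dict[str, Any]) -> Summary:
    """Return (ISO country code, English country name, latitude, longitude).

    Hypothetical helper: missing keys yield None instead of raising.
    """
    country = record.get("country", {})
    location = record.get("location", {})
    return (
        country.get("iso_code"),
        country.get("names", {}).get("en"),
        location.get("latitude"),
        location.get("longitude"),
    )


# Trimmed copy of the raw result shown above.
sample = {
    "country": {"geoname_id": 2017370, "iso_code": "RU", "names": {"en": "Russia"}},
    "location": {
        "accuracy_radius": 1000,
        "latitude": 55.7386,
        "longitude": 37.6068,
        "time_zone": "Europe/Moscow",
    },
    "traits": {"ip_address": "90.156.201.97", "prefix_len": 17},
}

summary = summarize_geolite(sample)
print(summary)
```

Using `dict.get` throughout means a record without city or country detail still produces a tuple, which is convenient when the free database has no entry for part of the hierarchy.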
import tempfile
from pathlib import Path
tmp_folder = tempfile.gettempdir()
iplocation = GeoLiteLookup(db_folder=str(Path(tmp_folder).joinpath('geolite')))
loc_result, ip_entity = iplocation.lookup_ip(ip_address='90.156.201.97')
print('Raw result')
display(loc_result)
print('IP Address Entity')
display(ip_entity[0])
Raw result
[{'continent': {'code': 'EU', 'geoname_id': 6255148, 'names': {'de': 'Europa', 'en': 'Europe', 'es': 'Europa', 'fr': 'Europe', 'ja': 'ヨーロッパ', 'pt-BR': 'Europa', 'ru': 'Европа', 'zh-CN': '欧洲'}}, 'country': {'geoname_id': 2017370, 'iso_code': 'RU', 'names': {'de': 'Russland', 'en': 'Russia', 'es': 'Rusia', 'fr': 'Russie', 'ja': 'ロシア', 'pt-BR': 'Rússia', 'ru': 'Россия', 'zh-CN': '俄罗斯联邦'}}, 'location': {'accuracy_radius': 1000, 'latitude': 55.7386, 'longitude': 37.6068, 'time_zone': 'Europe/Moscow'}, 'registered_country': {'geoname_id': 2017370, 'iso_code': 'RU', 'names': {'de': 'Russland', 'en': 'Russia', 'es': 'Rusia', 'fr': 'Russie', 'ja': 'ロシア', 'pt-BR': 'Rússia', 'ru': 'Россия', 'zh-CN': '俄罗斯联邦'}}, 'traits': {'ip_address': '90.156.201.97', 'prefix_len': 17}}]
IP Address Entity
iplocation = GeoLiteLookup(force_update=True)
loc_result, ip_entity = iplocation.lookup_ip(ip_address='90.156.201.97')
print('Raw result')
display(loc_result)
print('IP Address Entity')
display(ip_entity[0])
force_update is set to True. Attempting to download new database to C:\Users\Ian\.msticpy\GeoLite2
Downloading and extracting GeoLite DB archive from MaxMind....
Raw result
e:\src\microsoft\msticpy\msticpy\sectools\geoip.py:609: UserWarning: Error writing GeoIP DB file: C:\Users\Ian\.msticpy\GeoLite2\GeoLite2-City.mmdb - [Errno 22] Invalid argument: 'C:\\Users\\Ian\\.msticpy\\GeoLite2\\GeoLite2-City.mmdb'
warnings.warn(f"Error writing GeoIP DB file: {db_file_path} - {err}")
e:\src\microsoft\msticpy\msticpy\sectools\geoip.py:536: UserWarning: DB download failed
warnings.warn("DB download failed")
e:\src\microsoft\msticpy\msticpy\sectools\geoip.py:540: UserWarning: Continuing with cached database. Results may inaccurate.
"Continuing with cached database. Results may inaccurate."
[{'continent': {'code': 'EU', 'geoname_id': 6255148, 'names': {'de': 'Europa', 'en': 'Europe', 'es': 'Europa', 'fr': 'Europe', 'ja': 'ヨーロッパ', 'pt-BR': 'Europa', 'ru': 'Европа', 'zh-CN': '欧洲'}}, 'country': {'geoname_id': 2017370, 'iso_code': 'RU', 'names': {'de': 'Russland', 'en': 'Russia', 'es': 'Rusia', 'fr': 'Russie', 'ja': 'ロシア', 'pt-BR': 'Rússia', 'ru': 'Россия', 'zh-CN': '俄罗斯联邦'}}, 'location': {'accuracy_radius': 1000, 'latitude': 55.7386, 'longitude': 37.6068, 'time_zone': 'Europe/Moscow'}, 'registered_country': {'geoname_id': 2017370, 'iso_code': 'RU', 'names': {'de': 'Russland', 'en': 'Russia', 'es': 'Rusia', 'fr': 'Russie', 'ja': 'ロシア', 'pt-BR': 'Rússia', 'ru': 'Россия', 'zh-CN': '俄罗斯联邦'}}, 'traits': {'ip_address': '90.156.201.97', 'prefix_len': 17}}]
IP Address Entity
iplocation = GeoLiteLookup(auto_update=False)
loc_result, ip_entity = iplocation.lookup_ip(ip_address='90.156.201.97')
print('Raw result')
display(loc_result)
print('IP Address Entity')
display(ip_entity[0])
Raw result
[{'continent': {'code': 'EU', 'geoname_id': 6255148, 'names': {'de': 'Europa', 'en': 'Europe', 'es': 'Europa', 'fr': 'Europe', 'ja': 'ヨーロッパ', 'pt-BR': 'Europa', 'ru': 'Европа', 'zh-CN': '欧洲'}}, 'country': {'geoname_id': 2017370, 'iso_code': 'RU', 'names': {'de': 'Russland', 'en': 'Russia', 'es': 'Rusia', 'fr': 'Russie', 'ja': 'ロシア', 'pt-BR': 'Rússia', 'ru': 'Россия', 'zh-CN': '俄罗斯联邦'}}, 'location': {'accuracy_radius': 1000, 'latitude': 55.7386, 'longitude': 37.6068, 'time_zone': 'Europe/Moscow'}, 'registered_country': {'geoname_id': 2017370, 'iso_code': 'RU', 'names': {'de': 'Russland', 'en': 'Russia', 'es': 'Rusia', 'fr': 'Russie', 'ja': 'ロシア', 'pt-BR': 'Rússia', 'ru': 'Россия', 'zh-CN': '俄罗斯联邦'}}, 'traits': {'ip_address': '90.156.201.97', 'prefix_len': 17}}]
IP Address Entity
import socket
socket_info = socket.getaddrinfo("pypi.org",0,0,0,0)
ips = [res[4][0] for res in socket_info]
print(ips)
_, ip_entities = iplocation.lookup_ip(ip_addr_list=ips)
display(ip_entities)
['151.101.128.223', '151.101.192.223', '151.101.0.223', '151.101.64.223']
[IpAddress(Address=151.101.128.223, Location={ 'AdditionalData': {}, 'CountryCode': 'US',...), IpAddress(Address=151.101.192.223, Location={ 'AdditionalData': {}, 'CountryCode': 'US',...), IpAddress(Address=151.101.0.223, Location={ 'AdditionalData': {}, 'CountryCode': 'US', ...), IpAddress(Address=151.101.64.223, Location={ 'AdditionalData': {}, 'CountryCode': 'US', ...)]
Note: this requires an IPStack API key. The optional parameter bulk_lookup allows multiple IPs in a single request. This is only available with the paid Professional tier and above.
Init signature: IPStackLookup(api_key: str, bulk_lookup: bool = False)
Docstring:
GeoIP Lookup using IPStack web service.
Raises:
ConnectionError -- Invalid status returned from http request
PermissionError -- Service refused request (e.g. requesting batch of addresses
on free tier API key)
Init docstring:
Create a new instance of IPStackLookup.
Arguments:
api_key {str} -- API Key from IPStack - see https://ipstack.com
bulk_lookup {bool} -- For Professional and above tiers allowing you to
submit multiple IPs in a single request.
Signature:
iplocation.lookup_ip(ip_address: str = None,
ip_addr_list: collections.abc.Iterable = None,
ip_entity: msticpy.nbtools.entityschema.IpAddress = None) -> tuple
Docstring:
Lookup IP location from IPStack web service.
Keyword Arguments:
ip_address {str} -- a single address to look up (default: {None})
ip_addr_list {Iterable} -- a collection of addresses to lookup (default: {None})
ip_entity {IpAddress} -- an IpAddress entity
Raises:
ConnectionError -- Invalid status returned from http request
PermissionError -- Service refused request (e.g. requesting batch of addresses
on free tier API key)
Returns:
tuple(list{dict}, list{entity}) -- returns raw geolocation results and
same results as IP/Geolocation entities
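The docstring above says that PermissionError is raised when the service refuses a request, for example a batch of addresses on a free-tier key. One way to cope, sketched here under stated assumptions, is to retry the addresses one at a time. `lookup_with_fallback` is a hypothetical helper, not a msticpy function; it accepts any callable with the lookup_ip keyword signature shown above. The stub `fake_free_tier_lookup` only imitates the refusal so that the example runs without an API key.

```python
from typing import Callable, Iterable, List, Tuple


def lookup_with_fallback(lookup_ip: Callable[..., Tuple[list, list]],
                         ips: Iterable[str]) -> Tuple[list, list]:
    """Try one batch call; on PermissionError, look up each address singly."""
    ip_list = list(ips)
    try:
        return lookup_ip(ip_addr_list=ip_list)
    except PermissionError:
        raw_results: List = []
        entities: List = []
        for ip in ip_list:
            raw, ents = lookup_ip(ip_address=ip)
            raw_results.extend(raw)
            entities.extend(ents)
        return raw_results, entities


def fake_free_tier_lookup(ip_address=None, ip_addr_list=None, ip_entity=None):
    """Stub (assumed behaviour): refuses batches, answers single addresses."""
    if ip_addr_list is not None:
        raise PermissionError("batch lookup refused")
    return [{"ip": ip_address}], [f"entity:{ip_address}"]


raw, entities = lookup_with_fallback(
    fake_free_tier_lookup, ["151.101.0.223", "151.101.64.223"]
)
print(raw)
print(entities)
```

With a real lookup object you would pass its bound method, for example `lookup_with_fallback(iplocation.lookup_ip, ips)`.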
iplocation = IPStackLookup()
# Enter your IPStack Key here (if not set in msticpyconfig.yaml)
ips_key = nbwidgets.GetEnvironmentKey(env_var='IPSTACK_AUTH',
                                      help_str='To obtain an API key sign up here https://www.ipstack.com/',
                                      prompt='IPStack API key:')
if not iplocation.settings.args.get("AuthKey"):
    ips_key.display()
HTML(value='To obtain an API key sign up here https://www.ipstack.com/')
import os
if not iplocation.settings.args.get("AuthKey") and not ips_key.value:
    raise ValueError("No Authentication key in config/environment or supplied by user.")
if ips_key.value:
    iplocation = IPStackLookup(api_key=ips_key.value)
if "MSTICPY_SKIP_IPSTACK_TEST" not in os.environ:
    loc_result, ip_entity = iplocation.lookup_ip(ip_address='90.156.201.97')
    print('Raw result')
    display(loc_result)
    print('IP Address Entity')
    display(ip_entity[0])
Raw result
[({'ip': '90.156.201.97', 'type': 'ipv4', 'continent_code': 'AS', 'continent_name': 'Asia', 'country_code': 'RU', 'country_name': 'Russia', 'region_code': 'MOW', 'region_name': 'Moscow', 'city': 'Moscow', 'zip': '115088', 'latitude': 55.712608337402344, 'longitude': 37.68056869506836, 'location': {'geoname_id': 524901, 'capital': 'Moscow', 'languages': [{'code': 'ru', 'name': 'Russian', 'native': 'Русский'}], 'country_flag': 'http://assets.ipstack.com/flags/ru.svg', 'country_flag_emoji': '🇷🇺', 'country_flag_emoji_unicode': 'U+1F1F7 U+1F1FA', 'calling_code': '7', 'is_eu': False}}, 200)]
IP Address Entity
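Each IPStack raw result in the output above is a (payload, status) tuple, where the second element appears to be the HTTP status code. A minimal sketch for keeping only the successful responses follows. The helper name `successful_payloads` is hypothetical, the first sample entry is a trimmed copy of the output above, and the second entry is an invented failure added purely for illustration.

```python
from typing import Any, Dict, List, Tuple


def successful_payloads(
    results: List[Tuple[Dict[str, Any], int]]
) -> List[Dict[str, Any]]:
    """Keep the payloads whose status element equals 200."""
    return [payload for payload, status in results if status == 200]


sample_results = [
    # Trimmed copy of the raw result shown above.
    ({"ip": "90.156.201.97", "country_code": "RU", "city": "Moscow"}, 200),
    # Hypothetical failed response, not taken from the output above.
    ({}, 403),
]

payloads = successful_payloads(sample_results)
cities = {p["ip"]: p["city"] for p in payloads}
print(cities)
```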
if "MSTICPY_SKIP_IPSTACK_TEST" not in os.environ:
    loc_result, ip_entities = iplocation.lookup_ip(ip_addr_list=ips)
    print('Raw results')
    display(loc_result)
    print('IP Address Entities')
    display(ip_entities)
Raw results
[({'ip': '151.101.128.223', 'type': 'ipv4', 'continent_code': 'NA', 'continent_name': 'North America', 'country_code': 'US', 'country_name': 'United States', 'region_code': 'CA', 'region_name': 'California', 'city': 'San Francisco', 'zip': '94107', 'latitude': 37.76784896850586, 'longitude': -122.39286041259766, 'location': {'geoname_id': 5391959, 'capital': 'Washington D.C.', 'languages': [{'code': 'en', 'name': 'English', 'native': 'English'}], 'country_flag': 'http://assets.ipstack.com/flags/us.svg', 'country_flag_emoji': '🇺🇸', 'country_flag_emoji_unicode': 'U+1F1FA U+1F1F8', 'calling_code': '1', 'is_eu': False}}, 200), ({'ip': '151.101.192.223', 'type': 'ipv4', 'continent_code': 'NA', 'continent_name': 'North America', 'country_code': 'US', 'country_name': 'United States', 'region_code': 'CA', 'region_name': 'California', 'city': 'San Francisco', 'zip': '94107', 'latitude': 37.76784896850586, 'longitude': -122.39286041259766, 'location': {'geoname_id': 5391959, 'capital': 'Washington D.C.', 'languages': [{'code': 'en', 'name': 'English', 'native': 'English'}], 'country_flag': 'http://assets.ipstack.com/flags/us.svg', 'country_flag_emoji': '🇺🇸', 'country_flag_emoji_unicode': 'U+1F1FA U+1F1F8', 'calling_code': '1', 'is_eu': False}}, 200), ({'ip': '151.101.0.223', 'type': 'ipv4', 'continent_code': 'NA', 'continent_name': 'North America', 'country_code': 'US', 'country_name': 'United States', 'region_code': 'CA', 'region_name': 'California', 'city': 'San Francisco', 'zip': '94107', 'latitude': 37.76784896850586, 'longitude': -122.39286041259766, 'location': {'geoname_id': 5391959, 'capital': 'Washington D.C.', 'languages': [{'code': 'en', 'name': 'English', 'native': 'English'}], 'country_flag': 'http://assets.ipstack.com/flags/us.svg', 'country_flag_emoji': '🇺🇸', 'country_flag_emoji_unicode': 'U+1F1FA U+1F1F8', 'calling_code': '1', 'is_eu': False}}, 200), ({'ip': '151.101.64.223', 'type': 'ipv4', 'continent_code': 'NA', 'continent_name': 'North America', 
'country_code': 'US', 'country_name': 'United States', 'region_code': 'CA', 'region_name': 'California', 'city': 'San Francisco', 'zip': '94107', 'latitude': 37.76784896850586, 'longitude': -122.39286041259766, 'location': {'geoname_id': 5391959, 'capital': 'Washington D.C.', 'languages': [{'code': 'en', 'name': 'English', 'native': 'English'}], 'country_flag': 'http://assets.ipstack.com/flags/us.svg', 'country_flag_emoji': '🇺🇸', 'country_flag_emoji_unicode': 'U+1F1FA U+1F1F8', 'calling_code': '1', 'is_eu': False}}, 200)]
IP Address Entities
[IpAddress(Address=151.101.128.223, Location={ 'AdditionalData': {}, 'City': 'San Francis...), IpAddress(Address=151.101.192.223, Location={ 'AdditionalData': {}, 'City': 'San Francis...), IpAddress(Address=151.101.0.223, Location={ 'AdditionalData': {}, 'City': 'San Francisco...), IpAddress(Address=151.101.64.223, Location={ 'AdditionalData': {}, 'City': 'San Francisc...)]
The base class for both implementations has a method that reads IP addresses from a DataFrame column and returns a new DataFrame with the location information merged into the input frame.
Signature: iplocation.df_lookup_ip(data: pandas.core.frame.DataFrame, column: str)
Docstring:
Lookup Geolocation data from a pandas Dataframe.
Keyword Arguments:
data {pd.DataFrame} -- pandas dataframe containing IpAddress column
column {str} -- the name of the dataframe column to use as a source
import pandas as pd
netflow_df = pd.read_csv("data/az_net_flows.csv")
netflow_df = netflow_df[["AllExtIPs"]].drop_duplicates()
iplocation = GeoLiteLookup()
iplocation.df_lookup_ip(netflow_df, column="AllExtIPs")
| | AllExtIPs | CountryCode | CountryName | State | City | Longitude | Latitude | Asn | edges | Type | AdditionalData | IpAddress |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 65.55.44.109 | US | United States | Virginia | Boydton | -78.3750 | 36.6534 | None | {} | geolocation | {} | 65.55.44.109 |
1 | 13.71.172.128 | CA | Canada | Ontario | Toronto | -79.4195 | 43.6644 | None | {} | geolocation | {} | 13.71.172.128 |
2 | 13.71.172.130 | CA | Canada | Ontario | Toronto | -79.4195 | 43.6644 | None | {} | geolocation | {} | 13.71.172.130 |
3 | 40.124.45.19 | US | United States | Texas | San Antonio | -98.4926 | 29.4221 | None | {} | geolocation | {} | 40.124.45.19 |
4 | 104.43.212.12 | US | United States | Iowa | Des Moines | -93.6127 | 41.6015 | None | {} | geolocation | {} | 104.43.212.12 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
82 | 20.41.41.23 | US | United States | Virginia | Boydton | -78.3750 | 36.6534 | None | {} | geolocation | {} | 20.41.41.23 |
83 | 52.179.17.38 | US | United States | Virginia | Washington | -78.1539 | 38.7095 | None | {} | geolocation | {} | 52.179.17.38 |
84 | 157.55.134.142 | US | United States | Virginia | Washington | -78.1539 | 38.7095 | None | {} | geolocation | {} | 157.55.134.142 |
85 | 172.217.15.110 | US | United States | None | None | -97.8220 | 37.7510 | None | {} | geolocation | {} | 172.217.15.110 |
86 | 40.91.75.5 | US | United States | Washington | None | -122.3412 | 47.6032 | None | {} | geolocation | {} | 40.91.75.5 |
87 rows × 12 columns
You can derive a class that implements the same operations to use with a different GeoIP service.
The class signature is as follows:
class GeoIpLookup(ABC):
    """Abstract base class for GeoIP Lookup classes."""

    @abstractmethod
    def lookup_ip(self, ip_address: str = None, ip_addr_list: Iterable = None,
                  ip_entity: IpAddress = None):
        """
        Lookup IP location.

        Keyword Arguments:
            ip_address {str} -- a single address to look up (default: {None})
            ip_addr_list {Iterable} -- a collection of addresses to lookup (default: {None})
            ip_entity {IpAddress} -- an IpAddress entity

        Returns:
            tuple(list{dict}, list{entity}) -- returns raw geolocation results and
                same results as IP/Geolocation entities

        """
You should override the lookup_ip method with your own implementation of the GeoIP lookup.
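As a rough sketch of what such a derived class might look like, the example below answers lookups from an in-memory table. Several things here are assumptions made so that the example runs without msticpy installed: the `GeoIpLookup` class is a stand-in reduced to the single abstract method shown above (the real base class has more, such as df_lookup_ip), `StaticTableLookup` is a hypothetical provider, and plain dictionaries take the place of the IpAddress and GeoLocation entities.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class GeoIpLookup(ABC):
    """Stand-in for the msticpy base class (assumption: one method only)."""

    @abstractmethod
    def lookup_ip(self, ip_address: str = None, ip_addr_list: Iterable = None,
                  ip_entity=None):
        """Lookup IP location."""


class StaticTableLookup(GeoIpLookup):
    """Hypothetical provider that answers from an in-memory table."""

    def __init__(self, table: Dict[str, dict]):
        self._table = table

    def lookup_ip(self, ip_address: str = None, ip_addr_list: Iterable = None,
                  ip_entity=None):
        # Gather addresses from whichever arguments were supplied.
        ips: List[str] = []
        if ip_address:
            ips.append(ip_address)
        if ip_addr_list:
            ips.extend(ip_addr_list)
        if ip_entity is not None:
            # A plain dict stands in for an IpAddress entity here.
            ips.append(ip_entity["Address"])

        known = [ip for ip in ips if ip in self._table]
        raw = [self._table[ip] for ip in known]
        # Plain dicts stand in for the IpAddress/GeoLocation entities.
        entities = [{"Address": ip, "Location": self._table[ip]} for ip in known]
        return raw, entities


lookup = StaticTableLookup(
    {"90.156.201.97": {"CountryCode": "RU", "Latitude": 55.7386, "Longitude": 37.6068}}
)
raw, entities = lookup.lookup_ip(ip_addr_list=["90.156.201.97", "10.0.0.1"])
print(entities)
```

Returning the same (raw results, entities) pair as the built-in classes keeps a custom provider interchangeable with them in calling code.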
Use the geo_distance function from msticpy.sectools.geoip to calculate the distance between two locations. I am indebted to Martin Thoma, who posted this solution (which I've modified slightly) on Stack Overflow.
Signature: geo_distance(origin: Tuple[float, float], destination: Tuple[float, float]) -> float
Docstring:
Calculate the Haversine distance.
Author: Martin Thoma - stackoverflow
Parameters
----------
origin : tuple of float
(lat, long)
destination : tuple of float
(lat, long)
Returns
-------
distance_in_km : float
Or where you have source and destination IpAddress entities, you can use the wrapper entity_distance.
Signature:
entity_distance(ip_src: msticpy.nbtools.entityschema.IpAddress,
ip_dest: msticpy.nbtools.entityschema.IpAddress) -> float
Docstring:
Return distance between two IP Entities.
Arguments:
ip_src {IpAddress} -- Source IpAddress Entity
ip_dest {IpAddress} -- Destination IpAddress Entity
Raises:
AttributeError -- if either entity has no location information
Returns:
float -- Distance in kilometers.
from msticpy.sectools.geoip import geo_distance
_, ip_entity1 = iplocation.lookup_ip(ip_address='90.156.201.97')
_, ip_entity2 = iplocation.lookup_ip(ip_address='151.101.64.223')
print(ip_entity1[0])
print(ip_entity2[0])
dist = geo_distance(origin=(ip_entity1[0].Location.Latitude, ip_entity1[0].Location.Longitude),
destination=(ip_entity2[0].Location.Latitude, ip_entity2[0].Location.Longitude))
print(f'\nDistance between IP Locations = {round(dist, 1)}km')
{ 'AdditionalData': {}, 'Address': '90.156.201.97', 'Location': { 'AdditionalData': {}, 'CountryCode': 'RU', 'CountryName': 'Russia', 'Latitude': 55.7386, 'Longitude': 37.6068, 'Type': 'geolocation', 'edges': set()}, 'ThreatIntelligence': [], 'Type': 'ipaddress', 'edges': set()}
{ 'AdditionalData': {}, 'Address': '151.101.64.223', 'Location': { 'AdditionalData': {}, 'CountryCode': 'US', 'CountryName': 'United States', 'Latitude': 37.751, 'Longitude': -97.822, 'Type': 'geolocation', 'edges': set()}, 'ThreatIntelligence': [], 'Type': 'ipaddress', 'edges': set()}
Distance between IP Locations = 8796.8km
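For reference, the Haversine distance named in the geo_distance docstring can be sketched with the standard library alone. This is an independent sketch and not the msticpy source: the function name `haversine_km` and the mean Earth radius of 6371 km are assumptions, so results may differ slightly from geo_distance.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(origin: Tuple[float, float],
                 destination: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, long) points in degrees."""
    lat1, lon1 = origin
    lat2, lon2 = destination
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)

    # Haversine of the central angle between the two points.
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    central_angle = 2 * math.asin(min(1.0, math.sqrt(a)))
    return EARTH_RADIUS_KM * central_angle


# Coordinates taken from the two entities printed above.
moscow = (55.7386, 37.6068)
us_centroid = (37.751, -97.822)
print(f"{haversine_km(moscow, us_centroid):.1f} km")
```

The function takes (lat, long) tuples in the same order as geo_distance, so the Location.Latitude and Location.Longitude values of two entities can be passed directly.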