Examples by type
from datetime import datetime
import numpy as np
import pandas as pd
LatLong variables store a tuple containing the latitude and longitude of a point on the globe. These primitives transform that location (e.g. which country it is in) and compare multiple LatLong variables (e.g. the distance between them).
Calculates the distance between points in a city road grid.
from featuretools.primitives import CityblockDistance
cityblock_distance = CityblockDistance()
DC = (38, -77)
Boston = (43, -71)
NYC = (40, -74)
cityblock_distance([DC, DC], [NYC, Boston]) # DC -> NYC, DC -> Boston
0    301.518836
1    672.088624
dtype: float64
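For intuition, here is a minimal sketch of one way to measure a city-block distance on a sphere; it is not the featuretools implementation. It walks east-west along the starting latitude, then north-south along the destination longitude, and adds the two great-circle legs. The helper names and the 3958.8-mile Earth radius are assumptions; with them, the sketch lands close to the two values above.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius; the primitive may use another value


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def cityblock_miles(a, b):
    """East-west leg along a's latitude, then north-south leg along b's longitude."""
    corner = (a[0], b[1])
    return haversine_miles(a, corner) + haversine_miles(corner, b)


DC, NYC = (38, -77), (40, -74)
print(round(cityblock_miles(DC, NYC), 2))
```

Because the two legs form a path between the points, the result is never shorter than the direct great-circle distance.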
Determines the length of a path defined by a series of coordinates.
from featuretools.primitives import PathLength
path_length_km = PathLength(unit='kilometers')
path_length_km([(41.881832, -87.623177), (38.6270, -90.1994), (39.0997, -94.5786)])
805.5203180792812
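A path length can be sketched as the sum of great-circle (haversine) distances between consecutive points. This is an illustration under assumptions (a 6371 km mean Earth radius, hypothetical helper names), not the primitive's source; on the three points above it comes out within a few kilometers of the value shown.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius in kilometers


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def path_length_km(points):
    """Sum the distances of each consecutive pair; a single point gives 0."""
    return sum(haversine_km(p, q) for p, q in zip(points, points[1:]))


chicago_stl_kc = [(41.881832, -87.623177), (38.6270, -90.1994), (39.0997, -94.5786)]
print(round(path_length_km(chicago_stl_kc), 1))
```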
Determines city/town corresponding to given Latitude and Longitude coordinates.
from featuretools.primitives import LatLongToCity
latlong_to_city = LatLongToCity()
latlong_to_city([(51.52, -0.17), (9.93, 76.25), (37.38, -122.08), (np.nan, np.nan)])
0        Bayswater
1           Cochin
2    Mountain View
3             None
Name: results, dtype: object
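Scores the sentiment polarity of each text: positive values indicate positive sentiment, negative values indicate negative sentiment, and 0.0 indicates neutral or empty text.
from featuretools.primitives import PolarityScore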
x = ['He loves dogs', 'She hates cats', 'There is a dog', '']
polarity_score = PolarityScore()
polarity_score(x)
0    0.677
1   -0.649
2    0.000
3    0.000
dtype: float64
Calculates the number of occurrences of each part of speech.
from featuretools.primitives import PartOfSpeechCount
x = ['He was eating cheese', '']
part_of_speech_count = PartOfSpeechCount()
part_of_speech_count(x)
0     [0.0, 0.0]
1     [0.0, 0.0]
2     [0.0, 0.0]
3     [0.0, 0.0]
4     [0.0, 0.0]
5     [1.0, 0.0]
6     [0.0, 0.0]
7     [0.0, 0.0]
8     [0.0, 0.0]
9     [1.0, 0.0]
10    [0.0, 0.0]
11    [0.0, 0.0]
12    [0.0, 0.0]
13    [1.0, 0.0]
14    [0.0, 0.0]
dtype: object
These primitives transform DateOfBirth variables. They use the time at which the feature is calculated, which is set with a cutoff time, to extrapolate a person's current age.
Calculates the age in years as a floating point number given a date of birth.
from featuretools.primitives import Age
reference_date = pd.to_datetime("01-01-2019")
age = Age()
input_ages = [pd.to_datetime("01-01-2000"),
              pd.to_datetime("05-30-1983"),
              pd.to_datetime("10-17-1997")]
age(input_ages, time=reference_date)
0    19.013699
1    35.616438
2    21.221918
dtype: float64
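The values above are consistent with a simple convention: elapsed days divided by 365, with no special handling of leap days. A minimal stdlib sketch of that convention (the function name is hypothetical, and this is inferred from the outputs rather than taken from the primitive's source) reproduces all three ages:

```python
from datetime import date


def age_in_years(date_of_birth, reference):
    """Elapsed days between the two dates divided by 365 (leap days not special-cased)."""
    return (reference - date_of_birth).days / 365


reference = date(2019, 1, 1)
for dob in [date(2000, 1, 1), date(1983, 5, 30), date(1997, 10, 17)]:
    print(round(age_in_years(dob, reference), 6))
# 19.013699
# 35.616438
# 21.221918
```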
There are also primitives to check whether a birth date falls within a given age range.
Determines whether a person is over 18 years old given their date of birth.
from featuretools.primitives import AgeOver18
over18 = AgeOver18()
over18(input_ages, time=reference_date)
0    True
1    True
2    True
dtype: bool
Determines whether a person is under 65 years old given their date of birth.
from featuretools.primitives import AgeUnder65
under65 = AgeUnder65()
under65(input_ages, time=reference_date)
0    True
1    True
2    True
dtype: bool
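Both range checks can be read as thresholds on the computed age. A minimal sketch, assuming the elapsed-days / 365 age convention and strict comparisons at the boundary; both are assumptions, since the primitives' behavior on an exact birthday is not stated here, and the helper names are hypothetical:

```python
from datetime import date


def age_in_years(date_of_birth, reference):
    return (reference - date_of_birth).days / 365


def age_over(date_of_birth, reference, years):
    """True if the person is strictly older than `years` at `reference`."""
    return age_in_years(date_of_birth, reference) > years


def age_under(date_of_birth, reference, years):
    """True if the person is strictly younger than `years` at `reference`."""
    return age_in_years(date_of_birth, reference) < years


reference = date(2019, 1, 1)
dobs = [date(2000, 1, 1), date(1983, 5, 30), date(1997, 10, 17)]
print([age_over(d, reference, 18) for d in dobs])   # [True, True, True]
print([age_under(d, reference, 65) for d in dobs])  # [True, True, True]
```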
Transforms a date into the name of the holiday it falls on, or NaN if it is not a holiday.
from featuretools.primitives import DateToHoliday
date_to_holiday = DateToHoliday()
dates = pd.Series([datetime(2016, 1, 1),
                   datetime(2016, 2, 27),
                   datetime(2017, 5, 29, 10, 30, 5),
                   datetime(2018, 7, 4)])
date_to_holiday(dates)
array(["New Year's Day", nan, 'Memorial Day', 'Independence Day'], dtype=object)
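A minimal sketch of the underlying idea, using a hypothetical and deliberately tiny US holiday table rather than the primitive's own data: fixed-date holidays are a (month, day) lookup, while floating holidays such as Memorial Day (the last Monday of May) need a weekday rule. Weekend "observed" shifts are ignored here.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical table for illustration only; a real holiday calendar is much larger.
FIXED_DATE_HOLIDAYS = {
    (1, 1): "New Year's Day",
    (7, 4): "Independence Day",
    (12, 25): "Christmas Day",
}


def holiday_name(d: date) -> Optional[str]:
    """Return the holiday falling on d (a date or datetime), or None."""
    name = FIXED_DATE_HOLIDAYS.get((d.month, d.day))
    if name is not None:
        return name
    # Memorial Day: a Monday in May with no further Monday left in the month.
    if d.month == 5 and d.weekday() == 0 and d.day + 7 > 31:
        return "Memorial Day"
    return None


print(holiday_name(datetime(2017, 5, 29, 10, 30, 5)))  # Memorial Day
```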
Determines the number of unique calendar days, ignoring the year (February 1 of 2018 and 2019 count once).
from featuretools.primitives import NUniqueDaysOfCalendarYear
n_unique_days_of_calendar_year = NUniqueDaysOfCalendarYear()
times = [datetime(2019, 2, 1),
         datetime(2019, 2, 1),
         datetime(2018, 2, 1),
         datetime(2019, 1, 1)]
n_unique_days_of_calendar_year(times)
2
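The year-agnostic count can be sketched by collecting distinct (month, day) pairs. This is an illustrative stdlib version with a hypothetical name, not the primitive's source:

```python
from datetime import datetime


def n_unique_days_of_calendar_year(times):
    """Count distinct (month, day) pairs, so the same date in different years counts once."""
    return len({(t.month, t.day) for t in times})


times = [datetime(2019, 2, 1), datetime(2019, 2, 1),
         datetime(2018, 2, 1), datetime(2019, 1, 1)]
print(n_unique_days_of_calendar_year(times))  # 2
```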
Determines the country of a phone number.
from featuretools.primitives import PhoneNumberToCountry
phone_number_to_country = PhoneNumberToCountry()
phone_number_to_country(['+55 85 5555555', '+81 55-555-5555', '+1-541-754-3010'])
0    BR
1    JP
2    US
dtype: object
Determines the state of a ZIP Code (5-digit or ZIP+4).
from featuretools.primitives import ZIPCodeToState
zipcode_to_state = ZIPCodeToState()
zipcode_to_state(['60622', '94120', '02111-1253'])
0    IL
1    CA
2    MA
dtype: object
Determines the median household income for a ZIP Code.
from featuretools.primitives import ZIPCodeToHouseholdIncome
zipcode_to_household_income = ZIPCodeToHouseholdIncome()
zipcode_to_household_income(["82838", "02116", "02116-3899"])
array([ 59000., 103422., 103422.])
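Both ZIP Code primitives accept ZIP+4 input such as '02116-3899' and return the same result as the 5-digit base. A minimal sketch of that normalization step before a lookup; the table below is hypothetical and only holds the two values from the example above:

```python
# Hypothetical lookup keyed by 5-digit ZIP; values copied from the example output above.
MEDIAN_INCOME_BY_ZIP = {"82838": 59000.0, "02116": 103422.0}


def normalize_zip(zip_code: str) -> str:
    """Reduce ZIP or ZIP+4 to its 5-digit base, restoring leading zeros lost to int conversion."""
    return zip_code.strip().split("-")[0].zfill(5)


def zip_to_income(zip_code: str):
    """Look up the normalized ZIP; returns None when the ZIP is not in the table."""
    return MEDIAN_INCOME_BY_ZIP.get(normalize_zip(zip_code))


print([zip_to_income(z) for z in ["82838", "02116", "02116-3899"]])
# [59000.0, 103422.0, 103422.0]
```

The `zfill(5)` step matters for New England ZIP Codes like 02116, which become 2116 if the column was ever stored as an integer.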
The premium primitives include additional numeric primitives that add mathematical transformations and aggregations not present in the open-source library. They are frequently useful in time-series analysis.
Determines the number of peaks in a list of numbers.
from featuretools.primitives import NumPeaks
num_peaks = NumPeaks()
num_peaks([-5, 0, 10, 0, 10, -5, -4, -5, 10, 0])
4
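One peak definition that reproduces the count above is a strict local maximum: a value greater than both of its immediate neighbors, with the first and last elements never counted. A minimal sketch under that assumption (plateaus are not counted; the primitive's exact edge-case rules are not documented here):

```python
def num_peaks(values):
    """Count points strictly greater than both immediate neighbours."""
    return sum(
        1
        for prev, cur, nxt in zip(values, values[1:], values[2:])
        if cur > prev and cur > nxt
    )


print(num_peaks([-5, 0, 10, 0, 10, -5, -4, -5, 10, 0]))  # 4
```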
Determines the number of times a list crosses 0.
from featuretools.primitives import NumZeroCrossings
num_zero_crossings = NumZeroCrossings()
num_zero_crossings([1, -1, 2, -2, 3, -3])
5
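A zero crossing can be sketched as a pair of adjacent values with strictly opposite signs, which gives 5 for the alternating list above. How the primitive treats values exactly equal to 0 is not stated; this sketch assumes such values never count as a crossing.

```python
def num_zero_crossings(values):
    """Count adjacent pairs whose product is negative (strict sign change)."""
    return sum(1 for a, b in zip(values, values[1:]) if a * b < 0)


print(num_zero_crossings([1, -1, 2, -2, 3, -3]))  # 5
```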
Computes the (Pearson) correlation between two columns of values.
from featuretools.primitives import Correlation
correlation = Correlation()
array_1 = [1, 4, 6, 7]
array_2 = [1, 5, 9, 7]
correlation(array_1, array_2)
0.9221388919541468
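The value above matches the Pearson correlation coefficient: the covariance of the two columns divided by the product of their standard deviations. For the inputs shown, the centered cross-product sum is 25 and the squared deviation sums are 21 and 35, so r = 25 / sqrt(735). A minimal stdlib sketch (hypothetical function name):

```python
import math


def pearson_correlation(xs, ys):
    """Pearson's r for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


print(pearson_correlation([1, 4, 6, 7], [1, 5, 9, 7]))
```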
Determines the number of values that fall outside a certain range.
from featuretools.primitives import CountOutsideRange
count_outside_range = CountOutsideRange(lower=1.5, upper=3.6)
count_outside_range([1, 2, 3, 4, 5])
3
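With lower=1.5 and upper=3.6, the values 1, 4, and 5 fall outside the range, giving 3. A minimal sketch, assuming the bounds themselves count as inside the range (the example does not exercise the boundary, so this is an assumption):

```python
def count_outside_range(values, lower, upper):
    """Count values strictly below `lower` or strictly above `upper`."""
    return sum(1 for v in values if v < lower or v > upper)


print(count_outside_range([1, 2, 3, 4, 5], lower=1.5, upper=3.6))  # 3
```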
Determines the last name from a person's full name.
from featuretools.primitives import FullNameToLastName
full_name_to_last_name = FullNameToLastName()
names = ['Woolf Spector', 'Oliva y Ocana, Dona. Fermina',
         'Ware, Mr. Frederick', 'Peter, Michael J', 'Mr. Brown']
full_name_to_last_name(names)
0          Spector
1    Oliva y Ocana
2             Ware
3            Peter
4            Brown
Name: last_name, dtype: object
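The examples mix two name orders: "Last, Title. First" and "First Last" (or "Title. Last"). A simple heuristic, sketched here as an illustration rather than the primitive's actual parser, handles all five: take everything before the first comma if there is one, else the final token. Suffixes such as "Jr." would need extra rules.

```python
def last_name(full_name: str) -> str:
    name = full_name.strip()
    if "," in name:
        # "Last, Title. First" order: the surname is everything before the first comma.
        return name.split(",", 1)[0].strip()
    # "First Last" or "Title. Last" order: the surname is the final token.
    return name.split()[-1]


names = ['Woolf Spector', 'Oliva y Ocana, Dona. Fermina',
         'Ware, Mr. Frederick', 'Peter, Michael J', 'Mr. Brown']
print([last_name(n) for n in names])
# ['Spector', 'Oliva y Ocana', 'Ware', 'Peter', 'Brown']
```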
Determines if an email address is from a free email domain.
from featuretools.primitives import IsFreeEmailDomain
is_free_email_domain = IsFreeEmailDomain()
is_free_email_domain(['name@gmail.com', 'name@featuretools.com'])
array([ True, False])
Determines the domain of a URL.
from featuretools.primitives import URLToDomain
url_to_domain = URLToDomain()
urls = ['https://play.google.com', 'http://www.google.co.in', 'www.facebook.com']
url_to_domain(urls)
0    play.google.com
1       google.co.in
2       facebook.com
dtype: object
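The third example has no scheme, which trips up naive parsing: Python's `urllib.parse.urlparse` only fills in the host when the string contains `//`. A minimal sketch (hypothetical function name, not the primitive's source) that handles this and drops a leading `www.` as the output above does:

```python
from urllib.parse import urlparse


def url_to_domain(url: str) -> str:
    # Without a scheme or leading "//", urlparse treats the whole string as a path.
    if "//" not in url:
        url = "//" + url
    host = urlparse(url).hostname or ""  # hostname is lowercased and excludes any port
    if host.startswith("www."):
        host = host[len("www."):]
    return host


urls = ['https://play.google.com', 'http://www.google.co.in', 'www.facebook.com']
print([url_to_domain(u) for u in urls])
# ['play.google.com', 'google.co.in', 'facebook.com']
```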
Transforms a 2-letter or 3-letter ISO 3166-1 country code into Gross National Income (GNI) per capita.
from featuretools.primitives import CountryCodeToIncome
country_code_to_income = CountryCodeToIncome()
country_code_to_income(['USA', 'AM', 'EC'])
array([58270., 3990., 5920.])
Determines the median household income of a US sub-region.
from featuretools.primitives import SubRegionCodeToMedianHouseholdIncome
sub_region_code_to_median_household_income = SubRegionCodeToMedianHouseholdIncome()
subregions = ["US-AL", "US-IA", "US-VT", "US-DC", "US-MI", "US-NY"]
sub_region_code_to_median_household_income(subregions)
array([51113, 63481, 63805, 83382, 57700, 62447])
Determines the extension of a filepath.
from featuretools.primitives import FileExtension
file_extension = FileExtension()
file_extension(['doc.txt', '~/documents/data.json', 'file'])
0     .txt
1    .json
2      NaN
dtype: object
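Python's standard `os.path.splitext` gives the same three results as the example, returning an empty string when there is no extension. A minimal sketch that maps that empty string to None, mirroring the NaN above (the function name is hypothetical):

```python
import os.path
from typing import Optional


def file_extension(path: str) -> Optional[str]:
    """Return the extension including the dot, or None if the path has none."""
    ext = os.path.splitext(path)[1]
    return ext or None


print([file_extension(p) for p in ['doc.txt', '~/documents/data.json', 'file']])
# ['.txt', '.json', None]
```

Note that `splitext` treats a leading dot as part of the name, so a dotfile such as `.bashrc` has no extension.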