Getting started with DeepMatcher

Note: you can run this notebook live in Google Colab and use free GPUs provided by Google.

This tutorial describes how to effortlessly perform entity matching using deep neural networks. Specifically, we will see how to match pairs of tuples (also called data records or table rows) to determine if they refer to the same real-world entity. To do so, we will need labeled examples as input, i.e., tuple pairs which have been annotated as matches or non-matches. These will be used to train our neural network using supervised learning. At the end of this tutorial, you will have a trained neural network as output, which you can easily apply to unlabeled tuple pairs to make predictions.

As an overview, here are the steps to use deepmatcher which we will go through in this tutorial, starting with a short setup step (Step 0):

  0. Setup
  1. Process labeled data
  2. Define neural network model
  3. Train model
  4. Apply model to new data

Let's begin!

Step 0. Setup

If you are running this notebook inside Colab, you will first need to install necessary packages by running the code below:

In [1]:
try:
    import deepmatcher
except ImportError:
    !pip install -qqq deepmatcher

Now let's import deepmatcher which will do all the heavy lifting to build and train neural network models for entity matching.

In [2]:
import deepmatcher as dm

We recommend having a GPU available for the training in Step 3. If a GPU is not available, deepmatcher will use all available CPU cores instead. You can run the following command to determine if a GPU is available and will be used for training:

In [3]:
import torch
torch.cuda.is_available()
Out[3]:
True

Download sample data for entity matching

Now let's get some sample data to play with in this tutorial. We will need three sets of labeled data and one set of unlabeled data:

  1. Training Data: This is used for training our neural network model.
  2. Validation Data: This is used for determining the configuration (i.e., hyperparameters) of our model in such a way that the model does not overfit to the training set.
  3. Test Data: This is used to estimate the performance of our trained model on unlabeled data.
  4. Unlabeled Data: The trained model is applied on this data to obtain predictions, which can then be used for downstream tasks in practical application scenarios.

We download these four data sets to the sample_data directory:

In [4]:
!mkdir -p sample_data/itunes-amazon
!wget -qnc -P sample_data/itunes-amazon https://raw.githubusercontent.com/anhaidgroup/deepmatcher/master/examples/sample_data/itunes-amazon/train.csv
!wget -qnc -P sample_data/itunes-amazon https://raw.githubusercontent.com/anhaidgroup/deepmatcher/master/examples/sample_data/itunes-amazon/validation.csv
!wget -qnc -P sample_data/itunes-amazon https://raw.githubusercontent.com/anhaidgroup/deepmatcher/master/examples/sample_data/itunes-amazon/test.csv
!wget -qnc -P sample_data/itunes-amazon https://raw.githubusercontent.com/anhaidgroup/deepmatcher/master/examples/sample_data/itunes-amazon/unlabeled.csv

To get an idea of what our data looks like, let's take a peek at the training dataset:

In [5]:
import pandas as pd
pd.read_csv('sample_data/itunes-amazon/train.csv').head()
Out[5]:
id label left_Song_Name left_Artist_Name left_Album_Name left_Genre left_Price left_CopyRight left_Time left_Released right_Song_Name right_Artist_Name right_Album_Name right_Genre right_Price right_CopyRight right_Time right_Released
0 448 0 Baby When the Light ( David Guetta & Fred Rist... David Guetta Pop Life ( Extended Version ) [ Bonus Version ] Dance , Music , Rock , Pop , House , Electroni... $ 1.29 ‰ ãÑ 2007 Gum Records 6:17 18-Sep-07 Revolver ( Madonna Vs. David Guetta Feat . Lil... David Guetta One Love ( Deluxe Version ) Dance & Electronic $ 1.29 ( C ) 2014 Swedish House Mafia Holdings Ltd ( ... 3:18 August 21 , 2009
1 287 1 Outversion Mark Ronson Version Pop , Music , R&B / Soul,Soul,Dance,Rock,Jazz,... $ 0.99 2007 Mark Ronson under exclusive license to SO... 1:50 10-Jul-07 Outversion Mark Ronson Version [ Explicit ] Pop $ 0.99 ( c ) 2011 J'adore Records 1:50 July 10 , 2007
2 534 0 Peer Pressure ( feat . Traci Nelson ) Snoop Dogg Doggumentary Hip-Hop/Rap , Music , Rock , Gangsta Rap , Wes... $ 1.29 ‰ ãÑ 2011 Capitol Records , LLC . All rights r... 4:07 29-Mar-11 Boom ( ( Feat . T-Pain ) [ Edited ] ) Snoop Dogg Doggumentary [ Edited ] Rap & Hip-Hop , West Coast $ 1.29 ( C ) 2011 Capitol Records , LLC 3:50 March 29 , 2011
3 181 1 Stars Come Out ( Tim Mason Remix ) Zedd Stars Come Out ( Remixes ) - EP Dance , Music , Electronic , House $ 1.29 2012 Dim Mak Inc. 5:49 20-May-14 Stars Come Out ( Dillon Francis Remix ) Zedd Stars Come Out [ Dillon Francis Remix ] Dance & Electronic $ 1.29 2012 Dim Mak Inc. 4:08 May 20 , 2014
4 485 0 Jump ( feat . Nelly Furtado ) Flo Rida R.O.O.T.S. ( Deluxe Version ) Hip-Hop/Rap , Music $ 1.29 ‰ ãÑ 2009 Atlantic Recording Corporation for t... 3:28 30-Mar-09 Yayo [ Feat . Brisco , Billy Blue , Ball Greez... Flo Rida R.O.O.T.S. ( Route Of Overcoming The Struggle ... Rap & Hip-Hop $ 1.29 ( C ) 2012 Motown Records , a Division of UMG ... 7:53 March 30 , 2009

Step 1. Process labeled data

Before we can use our data for training, deepmatcher needs to first load and process it in order to prepare it for neural network training. Currently deepmatcher only supports processing CSV files. Each CSV file is assumed to have the following kinds of columns:

  • "Left" attributes (required): Our goal is to match tuple pairs. "Left" attributes are columns that correspond to the "left" tuple or the first tuple in the tuple pair. These column names are expected to be prefixed with "left_" by default.
  • "Right" attributes (required): "Right" attributes are columns that correspond to the "right" tuple or the second tuple in the tuple pair. These column names are expected to be prefixed with "right_" by default.
  • Label column (required for train, validation, test): Column containing the labels (match or non-match) for each tuple pair. Expected to be named "label" by default.
  • ID column (required): Column containing a unique ID for each tuple pair. This is for evaluation convenience. Expected to be named "id" by default.
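
The column conventions above can be checked before handing a file to deepmatcher. The following is a minimal sketch in plain Python, not part of the deepmatcher API: the function name check_columns and its parameters are hypothetical, and the check that every "left_" attribute has a "right_" counterpart is an assumption based on the sample data, where the two sides mirror each other.

```python
def check_columns(header, labeled=True, left_prefix="left_",
                  right_prefix="right_", label_attr="label", id_attr="id"):
    """Return a list of problems found in a CSV header (empty if none).

    Hypothetical helper for illustration; it only mirrors the default
    naming conventions described in this tutorial.
    """
    problems = []
    if id_attr not in header:
        problems.append(f"missing ID column '{id_attr}'")
    if labeled and label_attr not in header:
        problems.append(f"missing label column '{label_attr}'")

    # Strip the prefixes so that left and right attribute names can be compared.
    left = {c[len(left_prefix):] for c in header if c.startswith(left_prefix)}
    right = {c[len(right_prefix):] for c in header if c.startswith(right_prefix)}
    if not left:
        problems.append(f"no columns prefixed with '{left_prefix}'")
    if not right:
        problems.append(f"no columns prefixed with '{right_prefix}'")

    # Assumption: each attribute should appear on both sides of the pair.
    for name in sorted(left ^ right):
        problems.append(f"attribute '{name}' is not present on both sides")
    return problems


header = ["id", "label", "left_Song_Name", "left_Price",
          "right_Song_Name", "right_Price"]
print(check_columns(header))  # prints []
```

For an unlabeled file, pass labeled=False so that a missing label column is not reported.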

More details on what data processing involves and ways to customize it are described in this notebook.

Processing labeled data

To process our train, validation and test CSV files, we call dm.data.process in the following code snippet. It loads and processes the CSV files and returns three processed MatchingDataset objects, one for each file. These dataset objects will later be used for training and evaluation. The basic parameters to dm.data.process are as follows:

  • path (required): The path where all data is stored. This includes train, validation and test. deepmatcher may create new files in this directory to store information about these data sets. This allows subsequent dm.data.process calls to be much faster.
  • train (required): File name of training data in path directory.
  • validation (required): File name of validation data in path directory.
  • test (optional): File name of test data in path directory.
  • ignore_columns (optional): Any columns in the CSV files that you may want to ignore for the purposes of training. These should be included here.

Note that the train, validation and test CSVs must all share the same schema, i.e., they should have the same columns. Processing data involves several steps and can take several minutes to complete, especially if this is the first time you are running the deepmatcher package.
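
As a quick sanity check of the shared-schema requirement, the headers of the CSV files can be compared before processing. This is a small sketch using only the standard library; read_header and schema_mismatches are hypothetical helper names, and treating column order as irrelevant is an assumption of this sketch.

```python
import csv


def read_header(path):
    """Read the first row (the column names) of a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f))


def schema_mismatches(headers):
    """Given a dict of name -> list of columns, return the names whose
    set of columns differs from that of the first entry."""
    names = list(headers)
    reference = set(headers[names[0]])
    return [name for name in names[1:] if set(headers[name]) != reference]


# Example with in-memory headers; with real files one could build the dict
# via read_header('sample_data/itunes-amazon/train.csv') and so on.
headers = {
    "train": ["id", "label", "left_Song_Name", "right_Song_Name"],
    "validation": ["id", "label", "left_Song_Name", "right_Song_Name"],
    "test": ["id", "left_Song_Name", "right_Song_Name"],
}
print(schema_mismatches(headers))  # prints ['test']
```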

NOTE: If you are running this in Colab, you may get a message saying 'Memory usage is close to the limit.' You can safely ignore it for now. We are working on reducing the memory footprint.

In [6]:
train, validation, test = dm.data.process(
    path='sample_data/itunes-amazon',
    train='train.csv',
    validation='validation.csv',
    test='test.csv')

Peeking at processed data

Let's take a look at what the processed data looks like. To do this, we get the raw pandas table corresponding to the processed training dataset object.

In [7]:
train_table = train.get_raw_table()
train_table.head()
Out[7]:
id label left_Song_Name left_Artist_Name left_Album_Name left_Genre left_Price left_CopyRight left_Time left_Released right_Song_Name right_Artist_Name right_Album_Name right_Genre right_Price right_CopyRight right_Time right_Released
0 448 0 baby when the light ( david guetta & fred rist... david guetta pop life ( extended version ) [ bonus version ] dance , music , rock , pop , house , electroni... $ 1.29 ‰ ãñ 2007 gum records 6:17 18-sep-07 revolver ( madonna vs. david guetta feat . lil... david guetta one love ( deluxe version ) dance & electronic $ 1.29 ( c ) 2014 swedish house mafia holdings ltd ( ... 3:18 august 21 , 2009
1 287 1 outversion mark ronson version pop , music , r & b / soul , soul , dance , ro... $ 0.99 2007 mark ronson under exclusive license to so... 1:50 10-jul-07 outversion mark ronson version [ explicit ] pop $ 0.99 ( c ) 2011 j'adore records 1:50 july 10 , 2007
2 534 0 peer pressure ( feat . traci nelson ) snoop dogg doggumentary hip-hop/rap , music , rock , gangsta rap , wes... $ 1.29 ‰ ãñ 2011 capitol records , llc . all rights r... 4:07 29-mar-11 boom ( ( feat . t-pain ) [ edited ] ) snoop dogg doggumentary [ edited ] rap & hip-hop , west coast $ 1.29 ( c ) 2011 capitol records , llc 3:50 march 29 , 2011
3 181 1 stars come out ( tim mason remix ) zedd stars come out ( remixes ) - ep dance , music , electronic , house $ 1.29 2012 dim mak inc . 5:49 20-may-14 stars come out ( dillon francis remix ) zedd stars come out [ dillon francis remix ] dance & electronic $ 1.29 2012 dim mak inc . 4:08 may 20 , 2014
4 485 0 jump ( feat . nelly furtado ) flo rida r.o.o.t.s . ( deluxe version ) hip-hop/rap , music $ 1.29 ‰ ãñ 2009 atlantic recording corporation for t... 3:28 30-mar-09 yayo [ feat . brisco , billy blue , ball greez... flo rida r.o.o.t.s . ( route of overcoming the struggle... rap & hip-hop $ 1.29 ( c ) 2012 motown records , a division of umg ... 7:53 march 30 , 2009

The processed attribute values have been tokenized and lowercased so they may not look exactly the same as the input training data. These modifications help the neural network generalize better, i.e., perform better on data not trained on.
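
To make the effect of tokenization and lowercasing concrete, here is a rough approximation in plain Python. It is not the tokenizer deepmatcher uses: the regular expression below is an assumption chosen for illustration, and it splits more aggressively than the output above (for example, it would break "hip-hop/rap" and "j'adore" into several tokens, which the processed table does not).

```python
import re


def simple_tokenize(text):
    """Lowercase the text, then split it into runs of word characters
    and individual punctuation symbols."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


print(" ".join(simple_tokenize("Dim Mak Inc.")))     # prints: dim mak inc .
print(" ".join(simple_tokenize("R&B / Soul,Soul")))  # prints: r & b / soul , soul
```

Both results agree with the corresponding values in the processed table, where "Inc." became "inc ." and "R&B / Soul,Soul" became "r & b / soul , soul".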

Step 2. Define neural network model

In this step you tell deepmatcher what kind of neural network you would like to use for entity matching. The easiest way to do this is to use one of the several kinds of neural network models that come built in with deepmatcher. To use a built-in network, construct a dm.MatchingModel as follows:

model = dm.MatchingModel(attr_summarizer='<TYPE>')

where <TYPE> is one of sif, rnn, attention or hybrid. If you are not familiar with what these mean, we strongly recommend taking a look at either slides from our talk on deepmatcher for a high-level overview, or our paper for a more detailed explanation. Here we briefly describe the intuition behind these four model types:

  • sif: This model considers the words present in each attribute value pair to determine a match or non-match. It does not take word order into account.
  • rnn: This model considers the sequences of words present in each attribute value pair to determine a match or non-match.
  • attention: This model considers the alignment of words present in each attribute value pair to determine a match or non-match. It does not take word order into account.
  • hybrid: This model considers the alignment of sequences of words present in each attribute value pair to determine a match or non-match. This is the default.

deepmatcher is highly customizable and allows you to tune almost every aspect of the neural network model for your application scenario. This tutorial discusses the structure of MatchingModels and how they can be customized.

For this tutorial, let's create a hybrid model for entity matching:

In [8]:
model = dm.MatchingModel(attr_summarizer='hybrid')

Step 3. Train model

Next, we train the defined neural network model using the processed training and validation data. To do so, we call the run_train method which takes the following basic parameters:

  • train: The processed training dataset object (of type MatchingDataset).
  • validation: The processed validation dataset object (of type MatchingDataset).
  • epochs: Number of times to go over the entire train data for training the model.
  • batch_size: Number of labeled examples (tuple pairs) to use for each training step. This value may be increased if you have a lot of training data and would like to speed up training. The optimal value is dataset dependent.
  • best_save_path: Path to save the best model.
  • pos_neg_ratio: The ratio of the weight of positive examples (matches) to weight of negative examples (non-matches). This value should be increased if you have fewer matches than non-matches in your data. The optimal value is dataset dependent.
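
One possible starting point for pos_neg_ratio is the ratio of non-matches to matches in the training labels. This is a common heuristic for imbalanced data, not a rule stated by deepmatcher; the function name suggest_pos_neg_ratio is hypothetical, and since the optimal value is dataset dependent, the result should be treated as an initial guess to tune against validation data.

```python
from collections import Counter


def suggest_pos_neg_ratio(labels):
    """Heuristic: weight matches by (#non-matches / #matches), and never
    weight them less than non-matches."""
    counts = Counter(labels)
    positives, negatives = counts[1], counts[0]
    if positives == 0:
        raise ValueError("the labels contain no matches")
    return max(1.0, negatives / positives)


# Three non-matches for every match suggests a ratio of 3.
print(suggest_pos_neg_ratio([1, 0, 0, 0, 1, 0, 0, 0]))  # prints 3.0
```

With a pandas table, the labels could be passed as a list, e.g. the "label" column of the training CSV converted with tolist().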

Many other aspects of the training algorithm can be customized. For details on this, please refer to the API documentation for run_train.

In [9]:
model.run_train(
    train,
    validation,
    epochs=10,
    batch_size=16,
    best_save_path='hybrid_model.pth',
    pos_neg_ratio=3)
* Number of trainable parameters: 17757810
===>  TRAIN Epoch 1
0% [████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04
Finished Epoch 1 || Run Time:    4.1 | Load Time:    0.5 || F1:  46.15 | Prec:  35.42 | Rec:  66.23 || Ex/s:  70.08

===>  EVAL Epoch 1
0% [█] 100% | ETA: 00:00:00
Total time elapsed: 00:00:00
Finished Epoch 1 || Run Time:    0.6 | Load Time:    0.2 || F1:  66.67 | Prec:  60.00 | Rec:  75.00 || Ex/s: 141.47

* Best F1: 66.66666666666667
Saving best model...
Done.
---------------------

===>  TRAIN Epoch 2
0% [████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04
Finished Epoch 2 || Run Time:    3.8 | Load Time:    0.5 || F1:  65.64 | Prec:  54.24 | Rec:  83.12 || Ex/s:  75.19

===>  EVAL Epoch 2
0% [█] 100% | ETA: 00:00:00
Total time elapsed: 00:00:00
Finished Epoch 2 || Run Time:    0.6 | Load Time:    0.2 || F1:  60.32 | Prec:  48.72 | Rec:  79.17 || Ex/s: 143.34

---------------------

===>  TRAIN Epoch 3
0% [████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04
Finished Epoch 3 || Run Time:    3.9 | Load Time:    0.5 || F1:  80.23 | Prec:  71.00 | Rec:  92.21 || Ex/s:  73.89

===>  EVAL Epoch 3
0% [█] 100% | ETA: 00:00:00
Total time elapsed: 00:00:00
Finished Epoch 3 || Run Time:    0.6 | Load Time:    0.2 || F1:  75.47 | Prec:  68.97 | Rec:  83.33 || Ex/s: 143.36

* Best F1: 75.47169811320755
Saving best model...
Done.
---------------------

===>  TRAIN Epoch 4
0% [████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04
Finished Epoch 4 || Run Time:    3.8 | Load Time:    0.5 || F1:  93.25 | Prec:  88.37 | Rec:  98.70 || Ex/s:  74.89

===>  EVAL Epoch 4
0% [█] 100% | ETA: 00:00:00
Total time elapsed: 00:00:00
Finished Epoch 4 || Run Time:    0.6 | Load Time:    0.2 || F1:  84.00 | Prec:  80.77 | Rec:  87.50 || Ex/s: 143.36

* Best F1: 84.0
Saving best model...
Done.
---------------------

===>  TRAIN Epoch 5
0% [████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04
Finished Epoch 5 || Run Time:    3.9 | Load Time:    0.5 || F1:  95.00 | Prec:  91.57 | Rec:  98.70 || Ex/s:  73.68

===>  EVAL Epoch 5
0% [█] 100% | ETA: 00:00:00
Total time elapsed: 00:00:00
Finished Epoch 5 || Run Time:    0.6 | Load Time:    0.2 || F1:  86.27 | Prec:  81.48 | Rec:  91.67 || Ex/s: 142.36

* Best F1: 86.27450980392157
Saving best model...
Done.
---------------------

===>  TRAIN Epoch 6
0% [████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04
Finished Epoch 6 || Run Time:    3.8 | Load Time:    0.5 || F1:  98.06 | Prec:  97.44 | Rec:  98.70 || Ex/s:  74.76

===>  EVAL Epoch 6
0% [█] 100% | ETA: 00:00:00
Total time elapsed: 00:00:00
Finished Epoch 6 || Run Time:    0.6 | Load Time:    0.2 || F1:  82.14 | Prec:  71.88 | Rec:  95.83 || Ex/s: 142.86

---------------------

===>  TRAIN Epoch 7
0% [████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04
Finished Epoch 7 || Run Time:    3.9 | Load Time:    0.5 || F1:  99.35 | Prec:  98.72 | Rec: 100.00 || Ex/s:  74.46

===>  EVAL Epoch 7
0% [█] 100% | ETA: 00:00:00
Total time elapsed: 00:00:00
Finished Epoch 7 || Run Time:    0.6 | Load Time:    0.2 || F1:  86.79 | Prec:  79.31 | Rec:  95.83 || Ex/s: 142.64

* Best F1: 86.79245283018868
Saving best model...
Done.
---------------------

===>  TRAIN Epoch 8
0% [████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04
Finished Epoch 8 || Run Time:    3.8 | Load Time:    0.5 || F1: 100.00 | Prec: 100.00 | Rec: 100.00 || Ex/s:  74.91

===>  EVAL Epoch 8
0% [█] 100% | ETA: 00:00:00
Total time elapsed: 00:00:00
Finished Epoch 8 || Run Time:    0.6 | Load Time:    0.2 || F1:  88.46 | Prec:  82.14 | Rec:  95.83 || Ex/s: 142.52

* Best F1: 88.46153846153845
Saving best model...
Done.
---------------------

===>  TRAIN Epoch 9
0% [████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04
Finished Epoch 9 || Run Time:    3.8 | Load Time:    0.5 || F1: 100.00 | Prec: 100.00 | Rec: 100.00 || Ex/s:  74.72

===>  EVAL Epoch 9
0% [█] 100% | ETA: 00:00:00
Total time elapsed: 00:00:00
Finished Epoch 9 || Run Time:    0.6 | Load Time:    0.2 || F1:  88.46 | Prec:  82.14 | Rec:  95.83 || Ex/s: 136.41

---------------------

===>  TRAIN Epoch 10
0% [████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:03
Finished Epoch 10 || Run Time:    3.8 | Load Time:    0.5 || F1: 100.00 | Prec: 100.00 | Rec: 100.00 || Ex/s:  76.23

===>  EVAL Epoch 10
0% [█] 100% | ETA: 00:00:00
Total time elapsed: 00:00:00
Finished Epoch 10 || Run Time:    0.6 | Load Time:    0.2 || F1:  88.46 | Prec:  82.14 | Rec:  95.83 || Ex/s: 135.69

---------------------

Loading best model...
Training done.
Out[9]:
88.46153846153845
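
The F1 values in the log are the harmonic mean of the reported precision and recall. As a short worked check, using the standard F1 definition (the helper name f1_score is our own), the Epoch 1 evaluation line with Prec 60.00 and Rec 75.00 gives:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Epoch 1 EVAL line: Prec 60.00, Rec 75.00
print(round(f1_score(60.00, 75.00), 2))  # prints 66.67
```

This matches the "Best F1: 66.66666666666667" line printed after the first epoch.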

Step 4. Apply model to new data

Evaluating on test data

Now that we have a trained model for entity matching, we can evaluate its accuracy on test data to estimate the performance of the model on unlabeled data.

In [10]:
# Compute F1 on test set
model.run_eval(test)
===>  EVAL Epoch 8
Finished Epoch 8 || Run Time:    0.3 | Load Time:    0.2 || F1:  87.10 | Prec:  87.10 | Rec:  87.10 || Ex/s: 199.09

Out[10]:
87.09677419354838
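
The evaluation log above reports "Epoch 8" even though training ran for 10 epochs. This is consistent with the training log, where the last "Saving best model..." message appears after epoch 8 and the best model is loaded when training ends. The sketch below replays the validation F1 values from the training log, assuming the model is saved only when the F1 strictly improves; best_epoch is a hypothetical helper, not a deepmatcher function.

```python
# Validation (EVAL) F1 per epoch, copied from the training log.
eval_f1 = [66.67, 60.32, 75.47, 84.00, 86.27, 82.14, 86.79, 88.46, 88.46, 88.46]


def best_epoch(scores):
    """Return (epoch, score) of the first epoch that reaches the highest
    score, counting epochs from 1. Ties do not replace the current best."""
    best_score, best = float("-inf"), None
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            best_score, best = score, epoch
    return best, best_score


print(best_epoch(eval_f1))  # prints (8, 88.46)
```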

Evaluating on unlabeled data

We finally apply the trained model to unlabeled data to get predictions. To do this, we need to first process the unlabeled data.

Processing unlabeled data

To process unlabeled data, we use dm.data.process_unlabeled, as shown in the code snippet below. The basic parameters for this call are as follows:

  • path (required): The full path to the unlabeled data file (not just the directory).
  • trained_model (required): The trained model. The model is aware of the configuration of the training data on which it was trained, and so deepmatcher reuses the same configuration for the unlabeled data.
  • ignore_columns (optional): Any columns in the unlabeled CSV file that you may want to ignore for the purposes of evaluation. If not specified, the columns that were ignored while processing the training set will also be ignored while processing the unlabeled data.

Note that the unlabeled CSV file must have the same schema as the train, validation and test CSVs.

In [11]:
unlabeled = dm.data.process_unlabeled(
    path='sample_data/itunes-amazon/unlabeled.csv',
    trained_model=model)

Obtaining predictions

Next, we call the run_prediction method which takes a processed data set object and returns a pandas dataframe containing tuple pair IDs (id column) and the corresponding match score predictions (match_score column). match_scores are in [0, 1] and a score above 0.5 indicates a match prediction.

In [12]:
predictions = model.run_prediction(unlabeled)
predictions.head()
===>  PREDICT Epoch 8
0% [██] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01
Finished Epoch 8 || Run Time:    0.9 | Load Time:    0.6 || F1:   0.00 | Prec:   0.00 | Rec:   0.00 || Ex/s:   0.00

Out[12]:
match_score
id
141999 0.165365
302034 0.349144
126354 0.698185
714676 0.186895
659997 0.770332
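
To turn match scores into match decisions, apply the 0.5 threshold described above. Here is a minimal sketch in plain Python using the five scores shown in the output; predicted_matches is a hypothetical helper name.

```python
# Tuple pair IDs and match scores copied from the output above.
scores = {
    141999: 0.165365,
    302034: 0.349144,
    126354: 0.698185,
    714676: 0.186895,
    659997: 0.770332,
}


def predicted_matches(scores, threshold=0.5):
    """Return the IDs whose match score is above the threshold."""
    return [pair_id for pair_id, score in scores.items() if score > threshold]


print(predicted_matches(scores))  # prints [126354, 659997]
```

On the pandas dataframe itself, the same filter can be written as predictions[predictions['match_score'] > 0.5].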

You may optionally set the output_attributes parameter to also include all attributes present in the original input table. As mentioned earlier, the processed attribute values will likely look a bit different from the attribute values in the input CSV files due to modifications such as tokenization and lowercasing.

In [13]:
predictions = model.run_prediction(unlabeled, output_attributes=True)
predictions.head()
===>  PREDICT Epoch 8
0% [██] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01
Finished Epoch 8 || Run Time:    0.9 | Load Time:    0.6 || F1:   0.00 | Prec:   0.00 | Rec:   0.00 || Ex/s:   0.00

Out[13]:
match_score left_Song_Name left_Artist_Name left_Album_Name left_Genre left_Price left_CopyRight left_Time left_Released right_Song_Name right_Artist_Name right_Album_Name right_Genre right_Price right_CopyRight right_Time right_Released
id
141999 0.165365 Erase ( Samantha Ronson Remix ) [ feat . Priya... The Chainsmokers Erase ( Samantha Ronson Remix ) [ feat . Priya... Pop , Music $ 1.29 2012 Interscope Records 4:09 4-Dec-12 Auuh Remix ( feat . Pacho , Cirilo & Angel Doze ) Javy the Flow Auuh Remix ( feat . Pacho , Cirilo & Angel Doze ) R&B , Dance & Electronic , Rap & Hip-Hop , Con... $ 0.99 2015 Javy the Flow 5:05 June 13 , 2015
302034 0.349144 True Colors ( feat . Nicki Minaj ) Wiz Khalifa Blacc Hollywood Hip-Hop/Rap , Music , Rap , East Coast Rap , H... $ 1.29 2014 Atlantic Recording Corporation for the Un... 4:15 19-Aug-14 House In The Hills ( feat . Curren $ y ) [ Cle... Wiz Khalifa Blacc Hollywood [ Clean ] Rap & Hip-Hop $ 0.99 2014 Atlantic Recording Corporation for the Un... 4:52 August 19 , 2014
126354 0.698185 Biggest Man in Los Angeles Andy Grammer Andy Grammer Pop , Music , Rock , Adult Alternative $ 1.29 ‰ ãÑ 2011 S-Curve Records 3:55 14-Jun-11 Biggest Man In Los Angeles Andy Grammer Andy Grammer [ + Video ] [ + Digital Booklet ] Pop $ 0.99 ( C ) 2011 S-Curve Records 3:54 June 14 , 2011
714676 0.186895 Good for You ( feat . A$ AP Rocky ) [ KASBO Re... Selena Gomez Good for You ( feat . A$ AP Rocky ) [ Remixes ... Pop , Music $ 1.29 2015 Interscope Records 3:41 4-Sep-15 Forget Forever ( STå # FAN Remix ) Selena Gomez For You Pop $ 1.29 ( C ) 2014 Hollywood Records , Inc. 3:46 November 24 , 2014
659997 0.770332 Rewind ( feat . Monty ) Fetty Wap Fetty Wap Hip-Hop/Rap , Music , Rap Album Only 2015 300 Entertainment/RGF Productions 5:36 NaN Rewind ( feat . Monty ) [ Clean ] Fetty Wap Fetty Wap ( Deluxe ) [ Clean ] Rap & Hip-Hop $ 1.29 ( C ) 2000 Mute Records Ltd. , a BMG Company ,... 5:36 September 25 , 2015

You can then save these predictions to CSV and use them for downstream tasks.

In [14]:
predictions.to_csv('sample_data/itunes-amazon/unlabeled_predictions.csv')

Getting predictions on labeled data

You can also get predictions for labeled data such as validation data. To do so, you can simply call the run_prediction method passing the validation data as argument.

In [15]:
valid_predictions = model.run_prediction(validation, output_attributes=True)
valid_predictions.head()
===>  PREDICT Epoch 8
Finished Epoch 8 || Run Time:    0.4 | Load Time:    0.2 || F1:  88.46 | Prec:  82.14 | Rec:  95.83 || Ex/s: 192.18

Out[15]:
match_score label left_Song_Name left_Artist_Name left_Album_Name left_Genre left_Price left_CopyRight left_Time left_Released right_Song_Name right_Artist_Name right_Album_Name right_Genre right_Price right_CopyRight right_Time right_Released
id
299 0.210376 0 Blame ( feat . John Newman ) [ BURNS Remix ] Calvin Harris Blame ( Remixes ) [ feat . John Newman ] - EP Dance , Music $ 1.29 2014 Sony Music Entertainment UK Limited 4:17 10-Nov-14 Pray to God ( R3hab Remix ) Calvin Harris feat . HAIM Pray to God ( Remixes ) Dance & Electronic $ 1.29 ( C ) 2015 Third Pardee Records , LLC under ex... 4:31 September 25 , 2015
48 0.991802 1 Afire Love Ed Sheeran x Singer/Songwriter , Music , Pop , Rock , Conte... $ 1.29 2014 Asylum Records UK , a Warner Music UK Com... 5:14 20-Jun-14 Afire Love Ed Sheeran x Pop $ 1.29 ( C ) 2014 mau5trap Recordings Ltd 5:14 June 20 , 2014
34 0.187756 0 Lifted ( feat . Emeli Sand ' © & Professor Gre... Naughty Boy Hotel Cabana ( Deluxe Version ) Pop , Music , Rock , R&B / Soul , Contemporary... $ 1.29 2013 Naughty Boy Recordings Ltd under exclusiv... 4:15 6-May-14 Never Been The Same [ feat . Thabo ] Naughty Boy Hotel Cabana ( Deluxe Version ) [ Explicit ] Pop $ 1.29 ( C ) 2014 Naughty Boy Recordings Ltd under ex... 3:29 May 6 , 2014
215 0.986806 1 Jack Daniels and Jesus Chase Rice Ignite the Night Country , Music , Urban Cowboy , Contemporary ... $ 1.29 2014 Dack Janiels Records 3:55 19-Aug-14 Jack Daniels and Jesus Chase Rice Ignite the Night ( Party Edition ) [ Explicit ] Country $ 0.89 ( C ) 2014 Dack Janiels Records , under exclus... 3:55 August 19 , 2014
376 0.172195 0 Turn Around ( 5,4,3,2,1 ) [ AK Remix ] Flo Rida Turn Around ( 5,4,3,2,1 ) [ Remixes ] Dance,Music,Hip-Hop / Rap , Dirty South , Rap ... $ 1.29 2011 Atlantic Recording Corporation for the Un... 5:09 19-Apr-11 Good Feeling ( Jaywalker Remix ) Flo Rida Good Feeling ( Remixes ) Dance & Electronic $ 1.29 ( C ) 2009 MPL Tours Inc. under exclusive lice... 4:51 November 15 , 2011
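
Because predictions on labeled data include the label column, predicted and true labels can be compared directly. The sketch below does this in plain Python for the five rows shown above only, so it does not reproduce the precision and recall over the full validation set reported in the log; precision_recall is a hypothetical helper, and the 0.5 threshold is the one described earlier.

```python
# (id, match_score, label) for the five rows shown above.
rows = [
    (299, 0.210376, 0),
    (48, 0.991802, 1),
    (34, 0.187756, 0),
    (215, 0.986806, 1),
    (376, 0.172195, 0),
]


def precision_recall(rows, threshold=0.5):
    """Compute precision and recall of thresholded match scores."""
    tp = fp = fn = 0
    for _, score, label in rows:
        predicted = score > threshold
        if predicted and label == 1:
            tp += 1  # correctly predicted match
        elif predicted and label == 0:
            fp += 1  # predicted match, labeled non-match
        elif not predicted and label == 1:
            fn += 1  # missed match
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


print(precision_recall(rows))  # prints (1.0, 1.0)
```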