First, we pin the Azure Storage SDK to version 0.20.0 so that we can use the BlobService class, which is no longer available in the latest version, to download data from the cloud in this notebook.
Note: Restart the kernel after updating the package.
# Pin azure-storage to 0.20.0 so that BlobService is available
# Restart the kernel after each package update
!pip install azure-storage==0.20.0
Requirement already satisfied: azure-storage==0.20.0 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (0.20.0)
Requirement already satisfied: requests in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from azure-storage==0.20.0) (2.19.1)
Requirement already satisfied: azure-common in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from azure-storage==0.20.0) (1.1.16)
Requirement already satisfied: python-dateutil in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from azure-storage==0.20.0) (2.7.3)
Requirement already satisfied: azure-nspkg in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from azure-storage==0.20.0) (2.0.0)
Requirement already satisfied: urllib3<1.24,>=1.21.1 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from requests->azure-storage==0.20.0) (1.23)
Requirement already satisfied: idna<2.8,>=2.5 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from requests->azure-storage==0.20.0) (2.6)
Requirement already satisfied: certifi>=2017.4.17 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from requests->azure-storage==0.20.0) (2017.7.27.1)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from requests->azure-storage==0.20.0) (3.0.4)
Requirement already satisfied: six>=1.5 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from python-dateutil->azure-storage==0.20.0) (1.11.0)
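Because the rest of the notebook depends on the legacy API, it can help to fail fast when the kernel has not been restarted or the version pin did not take effect. A minimal sketch, assuming only that BlobService is importable from azure.storage.blob in 0.20.0 (as the import later in this notebook does); the helper name legacy_blob_service_available is hypothetical:

```python
import importlib


def legacy_blob_service_available() -> bool:
    """Return True if the legacy BlobService class can be imported."""
    try:
        # Import dynamically so that a missing package is reported, not raised
        blob_module = importlib.import_module("azure.storage.blob")
    except ImportError:
        return False
    # Newer azure-storage-blob releases no longer define BlobService
    return hasattr(blob_module, "BlobService")


if not legacy_blob_service_available():
    print("BlobService not found: check the azure-storage==0.20.0 pin and restart the kernel")
```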
import os
import numpy as np
import pandas as pd
import azureml.core
from azureml.core import Workspace, Run
print("Azure ML SDK Version: ", azureml.core.VERSION)
Azure ML SDK Version: 0.1.65
# azure storage settings
azure_storage_account_name = "xiangzhestorage"
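# Read the account key from an environment variable rather than hard-coding the secret
# (the variable name AZURE_STORAGE_ACCOUNT_KEY is an assumption; set it before starting the kernel)
azure_storage_account_key = os.environ["AZURE_STORAGE_ACCOUNT_KEY"]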
from azure.storage.blob import BlobService

# Connect to the storage account with the legacy (0.20.0) BlobService client
blob_service = BlobService(account_name=azure_storage_account_name, account_key=azure_storage_account_key)

# Create a local data folder and download the preprocessed DataFrame into it
data_folder = './data'
os.makedirs(data_folder, exist_ok=True)
data_file_path = os.path.join(data_folder, 'data_after_prep.pkl')
blob_service.get_blob_to_path("xiangzhe-container", "data_after_prep.pkl", data_file_path)
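BlobService exists only in the legacy SDK. For reference, here is a hedged sketch of the same download written against the current azure-storage-blob (v12) client, assuming its `BlobServiceClient.get_blob_client(container=..., blob=...)` and `download_blob().readinto(...)` methods. The helper name download_blob_if_missing is hypothetical; it also skips the download when the file is already cached locally, so re-running the cell does not fetch the data again.

```python
import os


def download_blob_if_missing(service_client, container_name, blob_name, file_path):
    """Download a blob to file_path unless that file already exists.

    service_client is expected to behave like azure.storage.blob.BlobServiceClient (v12).
    Returns True if a download happened, False if the cached file was reused.
    """
    if os.path.exists(file_path):
        return False
    # Create the parent directory if the path has one
    parent = os.path.dirname(file_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    blob_client = service_client.get_blob_client(container=container_name, blob=blob_name)
    with open(file_path, "wb") as f:
        # Stream the blob contents into the local file
        blob_client.download_blob().readinto(f)
    return True
```

With the v12 package installed, the client could be built as `BlobServiceClient(account_url=f"https://{azure_storage_account_name}.blob.core.windows.net", credential=azure_storage_account_key)` and passed in together with "xiangzhe-container", "data_after_prep.pkl" and data_file_path.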
data_file_path = "./data/data_after_prep.pkl"
pd_dataframe = pd.read_pickle(data_file_path)
pd_dataframe.head()
| | vendor_id | pickup_year | pickup_month | pickup_monthday | pickup_weekday | pickup_hour | pickup_minute | pickup_second | dropoff_year | dropoff_month | ... | dropoff_minute | dropoff_second | passenger_count | pickup_longitude | pickup_latitude | dropoff_longitude | dropoff_latitude | store_and_fwd_flag | trip_duration | distance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2016 | 03 | 14 | Monday | 17 | 24 | 55 | 2016 | 03 | ... | 32 | 30 | 1 | -73.982155 | 40.767937 | -73.964630 | 40.765602 | N | 455 | 1.498521 |
| 1 | 1 | 2016 | 06 | 12 | Sunday | 00 | 43 | 35 | 2016 | 06 | ... | 54 | 38 | 1 | -73.980415 | 40.738564 | -73.999481 | 40.731152 | N | 663 | 1.805507 |
| 2 | 2 | 2016 | 01 | 19 | Tuesday | 11 | 35 | 24 | 2016 | 01 | ... | 10 | 48 | 1 | -73.979027 | 40.763939 | -74.005333 | 40.710087 | N | 2124 | 6.385098 |
| 3 | 2 | 2016 | 04 | 06 | Wednesday | 19 | 32 | 31 | 2016 | 04 | ... | 39 | 40 | 1 | -74.010040 | 40.719971 | -74.012268 | 40.706718 | N | 429 | 1.485498 |
| 4 | 2 | 2016 | 03 | 26 | Saturday | 13 | 30 | 55 | 2016 | 03 | ... | 38 | 10 | 1 | -73.973053 | 40.793209 | -73.972923 | 40.782520 | N | 435 | 1.188588 |
5 rows × 23 columns
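The notebook does not say how the distance column was computed, but the rows shown appear consistent with the haversine great-circle distance in kilometres between the pickup and dropoff coordinates. Likewise, trip_duration appears to be in seconds (row 0 is picked up at 17:24:55 and dropped off at minute 32, second 30, which is 455 s later if the hour is unchanged). The sketch below is one way to check the distance column. It assumes a mean Earth radius of 6371 km; the function name haversine_km is hypothetical, and it is not necessarily how the original column was produced.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (longitude, latitude) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Compare with distance = 1.498521 in row 0 of the table above
print(haversine_km(-73.982155, 40.767937, -73.964630, 40.765602))
```

To apply the same formula to whole pd_dataframe columns at once, swap math.radians, math.sin, math.cos and math.asin for their NumPy counterparts (np.radians, np.sin, np.cos, np.arcsin).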