To use Python with Mongo we need the pymongo package. It can be installed with
pip install pymongo
or via the Anaconda application.
To connect to our database we need to instantiate a client connection. To do this we need the host, port, username, and password for the database.
In addition, we may sometimes need to provide an authSource. This simply tells Mongo which database holds the information on our user.
from pymongo import MongoClient
client = MongoClient(host='18.219.151.47',     # host is the hostname for the database
                     port=27017,               # port is the port number that mongo is running on
                     username='student',       # username for the db
                     password='emse6992pass',  # password for the db
                     authSource='emse6992')    # since our user only exists for the emse6992 db, we need to specify this
*NOTE: NEVER hard-code your password!!!*
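One common way to follow the note above is to read the credentials from environment variables. The sketch below is only an illustration: the helper name `mongo_settings_from_env`, the variable names (`MONGO_HOST`, `MONGO_PORT`, `MONGO_USER`, `MONGO_PASS`, `MONGO_AUTH_DB`), and the fallback defaults are all assumptions, not part of pymongo or of this course's setup.

```python
import os

def mongo_settings_from_env(env=None):
    """Collect MongoClient keyword arguments from environment variables.

    The variable names used here are hypothetical; pick whatever fits your setup.
    """
    if env is None:
        env = os.environ
    return {
        "host": env.get("MONGO_HOST", "localhost"),       # assumed default
        "port": int(env.get("MONGO_PORT", "27017")),      # env values are strings
        "username": env["MONGO_USER"],                    # required: KeyError if missing
        "password": env["MONGO_PASS"],                    # required: KeyError if missing
        "authSource": env.get("MONGO_AUTH_DB", "admin"),  # assumed default
    }

# Demonstration with a plain dict standing in for the real environment
example_env = {"MONGO_USER": "student", "MONGO_PASS": "not-a-real-password"}
settings = mongo_settings_from_env(example_env)
print(settings["host"], settings["port"])
```

With pymongo installed and the variables set, the result could then be passed along as `MongoClient(**mongo_settings_from_env())`, keeping the password out of the notebook itself.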
Verify the connection is working:
client.server_info()
{'version': '3.2.22', 'gitVersion': '105acca0d443f9a47c1a5bd608fd7133840a58dd', 'modules': [], 'allocator': 'tcmalloc', 'javascriptEngine': 'mozjs', 'sysInfo': 'deprecated', 'versionArray': [3, 2, 22, 0], 'openssl': {'running': 'OpenSSL 1.0.2g 1 Mar 2016', 'compiled': 'OpenSSL 1.0.2g 1 Mar 2016'}, 'buildEnvironment': {'distmod': 'ubuntu1604', 'distarch': 'x86_64', 'cc': '/opt/mongodbtoolchain/v2/bin/gcc: gcc (GCC) 5.4.0', 'ccflags': '-fno-omit-frame-pointer -fPIC -fno-strict-aliasing -ggdb -pthread -Wall -Wsign-compare -Wno-unknown-pragmas -Winvalid-pch -Werror -O2 -Wno-unused-local-typedefs -Wno-unused-function -Wno-deprecated-declarations -Wno-unused-but-set-variable -Wno-missing-braces -fno-builtin-memcmp', 'cxx': '/opt/mongodbtoolchain/v2/bin/g++: g++ (GCC) 5.4.0', 'cxxflags': '-Wnon-virtual-dtor -Woverloaded-virtual -Wno-maybe-uninitialized -std=c++11', 'linkflags': '-fPIC -pthread -Wl,-z,now -rdynamic -fuse-ld=gold -Wl,-z,noexecstack -Wl,--warn-execstack', 'target_arch': 'x86_64', 'target_os': 'linux'}, 'bits': 64, 'debug': False, 'maxBsonObjectSize': 16777216, 'storageEngines': ['devnull', 'ephemeralForTest', 'mmapv1', 'wiredTiger'], 'ok': 1.0}
Even if we have authenticated ourselves, we still need to tell Mongo which database and collections we are interested in. Once connected, those attributes are name-addressable:
conn['database_name']
or conn.database_name
database['coll_name']
or database.coll_name
Connecting to the Database:
db = client.emse6992
# db = client['emse6992'] - Alternative method
Proof we're connected:
db.list_collection_names()
['places', 'twitter_lists', 'twitter_retweets', 'moviesdata', 'housingdata', 'restaurants', 'twitter_friends', 'twitter_favorites', 'test_collection', 'twitter_statuses']
Connecting to the Collections:
favs_coll = db.twitter_favorites
# favs_coll = db['twitter_favorites']
Proof this works:
doc = favs_coll.find_one({})
doc
{'_id': ObjectId('60064b31e991a9c376547e89'), 'created_at': datetime.datetime(2021, 1, 7, 20, 27, 25), 'favorite_count': 152, 'hashtags': [], 'id': 1347278689000513536, 'id_str': '1347278689000513536', 'in_reply_to_screen_name': 'elonmusk', 'in_reply_to_status_id': 1347278232077312000, 'in_reply_to_user_id': 44196397, 'lang': 'en', 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'text': '@elonmusk @4thFromOurStar I think it’s Mars that is playing hard to get or hard to get to. 😉', 'urls': [], 'user': {'created_at': 'Sat Jun 22 20:20:09 +0000 2019', 'default_profile': True, 'description': 'Just a 17 year old Tesla Shareholder🚗🔋Everything Elon🧠❤️Astrophotography📸🚀#TeslaTeens ⚔️ FUTURE Cybertruck owner', 'favourites_count': 20585, 'followers_count': 2390, 'friends_count': 526, 'geo_enabled': True, 'id': 1142527715670519808, 'id_str': '1142527715670519808', 'listed_count': 27, 'location': 'Phoenix, AZ', 'name': 'jordan🚀', 'profile_background_color': 'F5F8FA', 'profile_banner_url': 'https://pbs.twimg.com/profile_banners/1142527715670519808/1603392699', 'profile_image_url': 'http://pbs.twimg.com/profile_images/1347136422466129923/j1jTp9My_normal.jpg', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1347136422466129923/j1jTp9My_normal.jpg', 'profile_link_color': '1DA1F2', 'profile_sidebar_border_color': 'C0DEED', 'profile_sidebar_fill_color': 'DDEEF6', 'profile_text_color': '333333', 'profile_use_background_image': True, 'screen_name': 'AstroJordy', 'statuses_count': 7532, 'url': 'https://t.co/lg6SohofDL'}, 'user_mentions': [{'id': 44196397, 'id_str': '44196397', 'name': 'Elon Musk', 'screen_name': 'elonmusk'}, {'id': 1091459141397180416, 'id_str': '1091459141397180416', 'name': 'Mars', 'screen_name': '4thFromOurStar'}], 'favorited_by_screen_name': '4thFromOurStar'}
doc['favorited_by_screen_name']
Once connected, we are ready to start querying the database.
The great thing about Python is its integration with both JSON and Mongo, meaning that the Python Mongo API is almost exactly the same as Mongo's own query API.
find_one() works exactly the same as the Mongo equivalent. In addition, the query logic maps 1-to-1 onto Mongo's.
doc = favs_coll.find_one({"favorited_by_screen_name": "elonmusk"})
doc
Using the twitter_favorites collection, find a single status with a tesla hashtag
#Room for in-class work
doc = favs_coll.find_one({"hashtags.text": "tesla"},
                         {'hashtags': 1, 'user.screen_name': 1, 'user.description': 1})
print(doc)
Likewise, pymongo's find() works exactly like the mongo console's find() command. One thing to note: find({}) returns a cursor (an iterable), not an actual document.
In Class Questions:
find_one() vs a list of documents find()?
docs = favs_coll.find({})
print(docs) # notice this is a cursor, not actual data
print(docs[600]) # By indexing we can extract results from the query
We can prove the query executed correctly by iterating through all of the documents
# Our query
docs = favs_coll.find({"favorited_by_screen_name": "elonmusk"})
# Variable to store the state of the test
worked = True
# Iterate through each of the docs looking for an invalid state
for doc in docs:
    if doc['favorited_by_screen_name'] != 'elonmusk':
        worked = False
        break
# If worked is still True, then our query worked (or at least passed this evaluation)
if worked:
    print("Worked!!")
else:
    print("Failed!")
Instead of iterating through the documents, we can also extract all of the documents at once by calling list(docs). This approach, though, comes with some drawbacks.
docs = favs_coll.find({"favorited_by_screen_name": "elonmusk"})
doc_lst = list(docs)
print(len(doc_lst))
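# Cursor.count() was deprecated in PyMongo 3.7 and removed in 4.0; count on the collection instead
favs_coll.count_documents({"favorited_by_screen_name": "elonmusk"})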
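Two drawbacks of list(docs) are worth spelling out: every matching document is held in memory at once, and the cursor is consumed in the process. The sketch below uses a plain Python generator, `fake_cursor` (a hypothetical stand-in, not a pymongo object), so it runs without a database; a real cursor behaves similarly when iterated, though it also offers methods such as rewind() that a generator lacks.

```python
def fake_cursor(n):
    """Hypothetical stand-in for a pymongo cursor: yields documents lazily."""
    for i in range(n):
        yield {"favorite_count": i}

# Streaming: only one document is held at a time while summing
total = sum(doc["favorite_count"] for doc in fake_cursor(1000))

# Materializing: all 1000 documents now live in a list at once
docs = fake_cursor(1000)
doc_lst = list(docs)

# The iterator has been consumed, so a second pass yields nothing
leftover = list(docs)

print(total, len(doc_lst), len(leftover))
```

For small result sets the list is convenient (it supports len() and repeated passes); for large ones, iterating keeps memory use flat.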
Using the twitter_statuses collection, calculate the total number of favorites that elonmusk has received
stats_coll = db.twitter_statuses
#Room for in-class work
docs = stats_coll.find({'user.screen_name': 'elonmusk'})
tot = sum([doc.get('favorite_count', 0) for doc in docs])
print(tot)
Would we get the same result if we ran this process against the twitter_favorites collection?
While pymongo's pattern system effectively parallels the mongo shell, there is one key exception:
In mongo shell the following is valid:
db.coll_name.find({"attr": {$exists: true}})
However, in pymongo this would be phrased as:
db.coll_name.find({"attr": {"$exists": True}})
Since $ isn't valid in a Python name, operators such as $exists need to be wrapped as strings. Likewise, the shell's true and false become Python's True and False.
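A quick way to see the correspondence is to build the pattern as a Python dict and serialize it with the standard json module. This is only an illustration of the syntax difference: pymongo itself sends BSON to the server, not JSON text.

```python
import json

# In Python, operator names such as "$exists" are ordinary string keys
query = {"attr": {"$exists": True}}

# Serializing to JSON gives text close to what we would type in the mongo shell:
# Python's True is written out as true
as_json = json.dumps(query)
print(as_json)
```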
Using a mixture of mongo queries and python, determine if the person who has the most favorited tweet (*favorites collection*) in 2021 is a friend of Elon Musk's (screen_name - 'elonmusk').
Note: Sorting with pymongo is slightly different - .sort([("field1", 1), ("field2", -1)])
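To make the sort specification concrete, here is a pure-Python sketch of what a list of (field, direction) pairs means: sort by the first field, breaking ties with the second, where 1 is ascending and -1 is descending. The helper `sort_like_mongo` is hypothetical and only mimics the ordering on an in-memory list of dicts; in pymongo the sorting is done by the server.

```python
def sort_like_mongo(docs, spec):
    """Order a list of dicts the way a [(field, direction), ...] spec describes.

    Hypothetical helper for illustration; assumes every doc has every field.
    """
    result = list(docs)
    # Python's sort is stable, so applying the keys from last to first
    # makes the first key the primary one and later keys tie-breakers.
    for field, direction in reversed(spec):
        result.sort(key=lambda d: d[field], reverse=(direction == -1))
    return result

sample = [
    {"field1": 1, "field2": 1},
    {"field1": 1, "field2": 2},
    {"field1": 0, "field2": 5},
]
ordered = sort_like_mongo(sample, [("field1", 1), ("field2", -1)])
print(ordered)
```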
# Space for work
from datetime import datetime
date = datetime(2021, 1, 1)
docs = favs_coll.find({"created_at": {"$gte": date}}).sort([('favorite_count', -1)])
user = docs[0].get('user').get('screen_name')
friends_coll = db.twitter_friends
doc = friends_coll.find_one({
    "$and": [
        {"screen_name": user},
        {"friend_of_screen_name": 'elonmusk'}
    ]
})
if doc:
    print("friends")
else:
    print("not friends")
not friends
The insert_one() and insert_many() methods enable us to insert one or more documents into the collection.
Do not run the following sections!
Question: Will the following cell cause an error?
test_coll = db.test_collection
doc = test_coll.find_one({"test": "passed!"})
print(doc)
None
We can insert any valid object by simply calling:
coll_name.insert_one(doc)
Note: If we do not provide an _id field in the document, Mongo will automatically create one. This means that there is nothing stopping us from inserting duplicate records.
doc = {"test": "passed!"}
result = test_coll.insert_one(doc)
result.inserted_id
We can verify on the python side by querying for the record
doc = test_coll.find_one({"test": "passed!"})
print(doc)
We can also insert many documents at once:
coll_name.insert_many(docs)
#Don't run this - just for demonstration
docs = [{'test': 'passed-' + str(x)} for x in range(5)]
test_coll.insert_many(docs)
Verification:
# Since it's a sample collection it only has our inserted docs
docs = test_coll.find({})
docs_lst = list(docs)
for doc in docs_lst:
    # This will simply help the formatting on the output
    print(doc)
As discussed in the slides, the update methods are used to modify an existing record.
While they are a bit more complex than the other methods, I did want to provide a little example.
coll_name.update_one(find_pattern, update_pattern)
# Here we will be adding an attribute that indicates the document has been updated
test_coll.update_one({"test": "passed!"}, {"$set": {"updated": True}})
# Query by the unchanged "test" value; the document should now also carry "updated": True
doc = test_coll.find_one({"test": "passed!"})
print(doc)
Works the same way for coll_name.update_many(find_pattern, update_pattern)
test_coll.update_many({"test": {"$exists": True}}, {"$set": {"updated": True}})
docs = test_coll.find({})
for doc in docs:
    # This will simply help the formatting on the output
    print(doc)
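The difference between the two update methods can be sketched in plain Python: both take a find pattern and an update pattern, but one stops at the first match while the other touches every match. The helpers `matches` and `update_docs` below are hypothetical and handle only equality patterns and the $set operator; real Mongo patterns support far more.

```python
def matches(doc, find_pattern):
    """Equality-only matching (a simplification of Mongo's find patterns)."""
    return all(doc.get(key) == value for key, value in find_pattern.items())

def update_docs(docs, find_pattern, update_pattern, many=False):
    """Apply a "$set" update to the first match, or to all matches if many=True.

    Returns the number of documents that matched and were written to.
    """
    matched = 0
    for doc in docs:
        if matches(doc, find_pattern):
            doc.update(update_pattern["$set"])  # add or overwrite the given fields
            matched += 1
            if not many:
                break
    return matched

docs = [{"test": "a"}, {"test": "a"}, {"test": "b"}]

# Like update_one: only the first matching document changes
first = update_docs(docs, {"test": "a"}, {"$set": {"updated": True}})

# Like update_many: every matching document changes
every = update_docs(docs, {"test": "a"}, {"$set": {"updated": True}}, many=True)

print(first, every, docs)
```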
Deleting records works almost the same way as updating, except we only provide a find_pattern to the method.
coll_name.delete_one(find_pattern)
result = test_coll.delete_one({"test": "passed!"})
Now we shouldn't be able to find that document:
doc = test_coll.find_one({"test": "passed!"})
print(doc)
We can also inspect the DeleteResult from the command:
print(result.raw_result)
print(result.deleted_count)
print(result.acknowledged)
Small example using coll_name.delete_many()
def num_field(field):
    # Count the documents in which the given field exists
    docs = test_coll.find({field: {"$exists": True}})
    count = sum(1 for x in docs)
    return count
print(num_field('test'))
test_coll.delete_many({'test': {"$exists": True}})
print(num_field('test'))
{
    "name": `your_name`,
    "favorite_movie": `movie_name`,
    "favorite_bands": [
        `band_name_1`,
        `band_name_2`,
        `etc.`
    ]
}
# Space for work
resp = test_coll.insert_one(
    {
        "name": "Joel",
        "favorite_movie": 'Big Fish',
        "favorite_bands": [
            'Jon Bellion',
            'Blink-182'
        ]
    }
)
if resp.acknowledged:
    print("Inserted")
Inserted
_id = resp.inserted_id
test_coll.find_one({"_id": _id})
{'_id': ObjectId('6019c687124fdc8e5d62b3ca'), 'name': 'Joel', 'favorite_movie': 'Big Fish', 'favorite_bands': ['Jon Bellion', 'Blink-182']}
resp = test_coll.delete_one({"_id": _id})
if resp.acknowledged:
    print(f'{resp.deleted_count} documents removed')
1 documents removed