MongoDB with Python - PyMongo - Intermediate I

MongoDB Cursor Iteration in Python

In [1]:
from pymongo import MongoClient, ASCENDING, DESCENDING
import pprint
import random

# <password> is a placeholder for the account's password
con = MongoClient("mongodb://mdbuser:<password>@cluster0-shard-00-00-mswpe.mongodb.net:27017,\
cluster0-shard-00-01-mswpe.mongodb.net:27017,\
cluster0-shard-00-02-mswpe.mongodb.net:27017/Cluster0?ssl=true&replicaSet=Cluster0-shard-0&authSource=admin&retryWrites=true")
print(con.list_database_names())

db = con.Cluster0
print(db.list_collection_names())
['Cluster0', 'admin', 'local']
['user', 'movies_scratch', 'testidx', 'movies_initial']

Iterating through result cursor

In [2]:
list(db.user.find({'State':'UP'}))
Out[2]:
[{'_id': ObjectId('5c26fb7932245fbaaa24e47e'),
  'Name': 'Ashley',
  'Age': 26,
  'City': 'Kanpur',
  'grades': [15, 15, 17, 15],
  'details': [{'grade': 10, 'mean': 15}, {'grade': 13, 'mean': 17}],
  'State': 'UP',
  'Marital_Status': None,
  'Subject': ['French', 'English', 'Art']},
 {'_id': ObjectId('5c27f207802bd99ce0af5151'),
  'Name': 'Rashi',
  'Age': 28,
  'City': 'Lucknow',
  'grades': [18, 17, 19, 15],
  'details': [{'grade': 18, 'mean': 17}, {'grade': 17, 'mean': 18}],
  'State': 'UP',
  'Marital_Status': None,
  'Subject': ['Hindi', 'English', 'Math']}]
In [3]:
# a cursor can be converted to a list, but doing so consumes it, so a second list() returns []
cur = db.user.find({'State':'UP'})
for doc in list(cur):
    print(doc)
    
print(list(cur))
{'_id': ObjectId('5c26fb7932245fbaaa24e47e'), 'Name': 'Ashley', 'Age': 26, 'City': 'Kanpur', 'grades': [15, 15, 17, 15], 'details': [{'grade': 10, 'mean': 15}, {'grade': 13, 'mean': 17}], 'State': 'UP', 'Marital_Status': None, 'Subject': ['French', 'English', 'Art']}
{'_id': ObjectId('5c27f207802bd99ce0af5151'), 'Name': 'Rashi', 'Age': 28, 'City': 'Lucknow', 'grades': [18, 17, 19, 15], 'details': [{'grade': 18, 'mean': 17}, {'grade': 17, 'mean': 18}], 'State': 'UP', 'Marital_Status': None, 'Subject': ['Hindi', 'English', 'Math']}
[]
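A cursor behaves like any single-pass Python iterator: once its documents have been read, iterating it again yields nothing. The plain-Python analogy below involves no PyMongo or server; `find_up` and the three sample documents are hypothetical. It reproduces the empty `[]` printed above.

```python
# Plain-Python analogy, not PyMongo: a generator is single-pass, like a cursor.
docs = [
    {'Name': 'Ashley', 'State': 'UP'},
    {'Name': 'Rashi', 'State': 'UP'},
    {'Name': 'Ravi', 'State': 'MH'},
]

def find_up(collection):
    """Hypothetical stand-in for find({'State': 'UP'}): yields matches lazily."""
    for doc in collection:
        if doc.get('State') == 'UP':
            yield doc

cur = find_up(docs)
first_pass = list(cur)    # drains the generator
second_pass = list(cur)   # already exhausted, so this is empty
print(len(first_pass), second_pass)  # 2 []
```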

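cursor.clone() returns a new, unevaluated cursor with the same query and options, so keep a clone if you need to iterate the same results a second time. The clone re-runs the query on the server; it does not copy the documents already fetched.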

In [4]:
# you can clone a cursor
cur = db.user.find({'State':'UP'})
cur_cloned = cur.clone()

for doc in cur:
    print(doc)
print('')
    
for doc in cur_cloned:
    print(doc)
    
print('')    
print(list(cur))
print(list(cur_cloned))
{'_id': ObjectId('5c26fb7932245fbaaa24e47e'), 'Name': 'Ashley', 'Age': 26, 'City': 'Kanpur', 'grades': [15, 15, 17, 15], 'details': [{'grade': 10, 'mean': 15}, {'grade': 13, 'mean': 17}], 'State': 'UP', 'Marital_Status': None, 'Subject': ['French', 'English', 'Art']}
{'_id': ObjectId('5c27f207802bd99ce0af5151'), 'Name': 'Rashi', 'Age': 28, 'City': 'Lucknow', 'grades': [18, 17, 19, 15], 'details': [{'grade': 18, 'mean': 17}, {'grade': 17, 'mean': 18}], 'State': 'UP', 'Marital_Status': None, 'Subject': ['Hindi', 'English', 'Math']}

{'_id': ObjectId('5c26fb7932245fbaaa24e47e'), 'Name': 'Ashley', 'Age': 26, 'City': 'Kanpur', 'grades': [15, 15, 17, 15], 'details': [{'grade': 10, 'mean': 15}, {'grade': 13, 'mean': 17}], 'State': 'UP', 'Marital_Status': None, 'Subject': ['French', 'English', 'Art']}
{'_id': ObjectId('5c27f207802bd99ce0af5151'), 'Name': 'Rashi', 'Age': 28, 'City': 'Lucknow', 'grades': [18, 17, 19, 15], 'details': [{'grade': 18, 'mean': 17}, {'grade': 17, 'mean': 18}], 'State': 'UP', 'Marital_Status': None, 'Subject': ['Hindi', 'English', 'Math']}

[]
[]
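The key point is that `clone()` copies the query and its options, not the iteration state or the fetched documents. The toy class below is a hypothetical plain-Python model of that behaviour (it is not PyMongo's implementation): the clone starts unevaluated and runs the query again on its first read, even if the original cursor was already partly consumed.

```python
class ToyCursor:
    """Hypothetical cursor model: keeps the query and runs it lazily on first read."""

    def __init__(self, source, predicate):
        self._source = source        # plays the role of the collection
        self._predicate = predicate  # plays the role of the filter
        self._it = None              # None means "query not sent yet"

    def __iter__(self):
        return self

    def __next__(self):
        if self._it is None:         # first read: run the query
            self._it = (d for d in self._source if self._predicate(d))
        return next(self._it)        # StopIteration ends the loop when exhausted

    def clone(self):
        # Same query, fresh (unevaluated) state; nothing already read is copied.
        return ToyCursor(self._source, self._predicate)


docs = [{'Name': 'Ashley', 'State': 'UP'}, {'Name': 'Rashi', 'State': 'UP'}]
cur = ToyCursor(docs, lambda d: d['State'] == 'UP')
cur_cloned = cur.clone()

names = [d['Name'] for d in cur]
cloned_names = [d['Name'] for d in cur_cloned]
print(names, cloned_names, list(cur), list(cur_cloned))
```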

Rewind the cursor to reset it to its initial (unevaluated) state

In [5]:
cur = db.user.find({'State':'UP'})
for doc in cur:
    print(doc)
cur.rewind()
print('')
for doc in cur:
    print(doc)
{'_id': ObjectId('5c26fb7932245fbaaa24e47e'), 'Name': 'Ashley', 'Age': 26, 'City': 'Kanpur', 'grades': [15, 15, 17, 15], 'details': [{'grade': 10, 'mean': 15}, {'grade': 13, 'mean': 17}], 'State': 'UP', 'Marital_Status': None, 'Subject': ['French', 'English', 'Art']}
{'_id': ObjectId('5c27f207802bd99ce0af5151'), 'Name': 'Rashi', 'Age': 28, 'City': 'Lucknow', 'grades': [18, 17, 19, 15], 'details': [{'grade': 18, 'mean': 17}, {'grade': 17, 'mean': 18}], 'State': 'UP', 'Marital_Status': None, 'Subject': ['Hindi', 'English', 'Math']}

{'_id': ObjectId('5c26fb7932245fbaaa24e47e'), 'Name': 'Ashley', 'Age': 26, 'City': 'Kanpur', 'grades': [15, 15, 17, 15], 'details': [{'grade': 10, 'mean': 15}, {'grade': 13, 'mean': 17}], 'State': 'UP', 'Marital_Status': None, 'Subject': ['French', 'English', 'Art']}
{'_id': ObjectId('5c27f207802bd99ce0af5151'), 'Name': 'Rashi', 'Age': 28, 'City': 'Lucknow', 'grades': [18, 17, 19, 15], 'details': [{'grade': 18, 'mean': 17}, {'grade': 17, 'mean': 18}], 'State': 'UP', 'Marital_Status': None, 'Subject': ['Hindi', 'English', 'Math']}
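`rewind()` returns the same cursor object to its unevaluated state, so the next iteration sends the query to the server again rather than replaying documents it already holds. A hypothetical plain-Python model, with a counter standing in for the queries sent:

```python
class RewindableCursor:
    """Hypothetical model of rewind(): reset iteration state so the query re-runs."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable standing in for a server query
        self._it = None              # None means "unevaluated"
        self.queries_sent = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._it is None:
            self._it = iter(self._run_query())
            self.queries_sent += 1   # each evaluation counts as one query
        return next(self._it)

    def rewind(self):
        self._it = None              # back to the unevaluated state
        return self                  # returned so calls can be chained


docs = [{'Name': 'Ashley', 'State': 'UP'}, {'Name': 'Rashi', 'State': 'UP'}]
cur = RewindableCursor(lambda: [d for d in docs if d['State'] == 'UP'])
first = [d['Name'] for d in cur]
cur.rewind()
second = [d['Name'] for d in cur]
print(first, second, cur.queries_sent)  # ['Ashley', 'Rashi'] ['Ashley', 'Rashi'] 2
```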
In [6]:
# document count for the same filter (Cursor.count() is deprecated and was removed in PyMongo 4.0)
db.user.count_documents({'State':'UP'})
Out[6]:
2

batch_size(n) - limits the number of documents the server returns per round trip (batch)
Useful when the MongoDB server or network is resource-constrained, or when you want only a fixed number of documents per call. The cursor still returns every matching document, but it needs more round trips to the server to do so.
cursor.retrieved - the number of documents the cursor has fetched from the server so far
cursor.distinct(key) - returns the distinct values of key across all documents matching the cursor's query

In [7]:
query = {'Age':{'$gt':21, '$lt':35}}
cur = db.user.find(query).batch_size(5)                         # fetch 5 documents per round trip
print(db.user.count_documents(query))                           # total number of matching documents

for doc in cur:
    print(cur.retrieved, doc['_id'])                            # documents fetched so far, and this document's _id
print(cur.distinct('Job'))                                      # distinct Job values across all 12 matching documents
12
5 5c16e869817810ed3fc5e5fa
5 5c16ea1c817810ed3fc5e5fc
5 5c16ea1d817810ed3fc5e5fe
5 5c26b381802bd99ce0af5112
5 5c26b381802bd99ce0af5114
10 5c26b381802bd99ce0af511a
10 5c26b386802bd99ce0af5143
10 5c26b386802bd99ce0af514a
10 5c26fb7932245fbaaa24e47e
10 5c27f207802bd99ce0af5151
12 5c2824cc802bd95eb47c5035
12 5c2980c6802bd982906c7133
['DA', 'Doctor', 'Singer', 'Student', 'Physicist']
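The `cur.retrieved` column above steps 5, 10, 12 because the 12 matching documents arrive in batches of at most 5, i.e. ceil(12 / 5) = 3 round trips. The plain-Python simulation below reproduces that pattern. `iter_in_batches`, `distinct`, and the 12 sample documents (with made-up `Job` values) are hypothetical, and `distinct` is simplified: MongoDB's `distinct` also unwinds array values, which this sketch does not.

```python
import math

def iter_in_batches(docs, batch_size):
    """Yield (retrieved_so_far, doc), fetching batch_size docs per simulated round trip."""
    retrieved = 0
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]  # one simulated round trip
        retrieved += len(batch)
        for doc in batch:
            yield retrieved, doc

def distinct(docs, key):
    """Distinct values of key in first-seen order; documents lacking key are skipped."""
    values = []
    for doc in docs:
        if key in doc and doc[key] not in values:
            values.append(doc[key])
    return values

jobs = ['DA', 'Doctor', 'Singer', 'Student'] * 3   # made-up Job values
docs = [{'_id': i, 'Job': job} for i, job in enumerate(jobs)]

retrieved_seq = [r for r, _ in iter_in_batches(docs, 5)]
print(retrieved_seq)                  # [5, 5, 5, 5, 5, 10, 10, 10, 10, 10, 12, 12]
print(math.ceil(len(docs) / 5))       # 3 round trips
print(distinct(docs, 'Job'))          # ['DA', 'Doctor', 'Singer', 'Student']
```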

Cursors also support limit(), skip() and sort(), which are commonly chained for pagination and ordered results.
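When these are chained in PyMongo, for example `db.user.find({}).sort('Age', ASCENDING).skip(4).limit(2)`, the server applies the sort first, then the skip, then the limit, regardless of the order in which the methods are called, and a limit of 0 means no limit. The hypothetical helper `find_page` below mimics those semantics on an in-memory list (the ages are made up), which makes the pagination rule easy to check: page `n` of size `k` is `skip(n * k).limit(k)`.

```python
def find_page(docs, sort_key, skip=0, limit=0, descending=False):
    """Hypothetical in-memory model of find().sort().skip().limit()."""
    ordered = sorted(docs, key=lambda d: d[sort_key], reverse=descending)  # 1. sort
    remaining = ordered[skip:]                                             # 2. skip
    return remaining[:limit] if limit else remaining                       # 3. limit (0 = no limit)

docs = [{'Age': a} for a in (26, 28, 22, 31, 24, 30)]
page_size = 2
for page in range(3):
    ages = [d['Age'] for d in find_page(docs, 'Age', skip=page * page_size, limit=page_size)]
    print(page, ages)
# 0 [22, 24]
# 1 [26, 28]
# 2 [30, 31]
```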

Set an Fname on every document that has none of Fname, Name or name

In [8]:
missing_name = {'$and':[{'Fname':{'$exists':False}}, {'Name':{'$exists':False}}, {'name':{'$exists':False}}]}
cur = db.user.find(missing_name)
for i, doc in enumerate(cur):
    # update_one requires an update operator such as $set
    db.user.update_one({'_id':doc['_id']}, {'$set':{'Fname':"User"+str(i)}})
db.user.count_documents(missing_name)                           # documents still missing a name
Out[8]:
0
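`update_one()` only accepts update-operator documents such as `{'$set': {...}}`; passing a plain field document raises `ValueError`. The hypothetical in-memory sketch below spells out what the cell does to each document: a document matches only if all three name fields are absent (`$exists: False` checks presence, so a field holding `None` still counts as existing), and each match receives a numbered `Fname`. `assign_placeholder_names` and the sample documents are illustrative, not part of PyMongo.

```python
NAME_KEYS = ('Fname', 'Name', 'name')

def assign_placeholder_names(docs):
    """Give each doc lacking all NAME_KEYS an Fname of 'User<i>'; return how many changed."""
    i = 0
    for doc in docs:
        if not any(key in doc for key in NAME_KEYS):  # mirrors the three $exists: False clauses
            doc['Fname'] = 'User' + str(i)            # mirrors {'$set': {'Fname': ...}}
            i += 1
    return i

docs = [{'Name': 'Ashley'}, {'Age': 30}, {'name': None}, {'City': 'Pune'}]
print(assign_placeholder_names(docs))                         # 2
print([d.get('Fname') for d in docs])                         # [None, 'User0', None, 'User1']
print(sum(not any(k in d for k in NAME_KEYS) for d in docs))  # 0, like count_documents above
```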