Chris IJ Hwang

I am a Quantitative Analyst/Developer and Data Scientist with a background in the finance, education, and IT industries. This site contains some exercises, projects, and studies that I have worked on. If you have any questions, feel free to contact me at ih138 at columbia dot edu.


MongoDB

Installation

  1. sudo apt-get install -y mongodb-org
  2. Start MongoDB (if it hasn't started yet)
    sudo service mongod start
  3. Enable remote access from my server. Edit /etc/mongod.conf
    # Listen to local interface only. Comment out to listen on all interfaces.
    #bind_ip = 127.0.0.1
  4. Restart the service so the change takes effect
    $ sudo service mongod stop
    mongod stop/waiting
    $ sudo service mongod start
    mongod start/running, process 14087

Quick reference

1. Import exercise file

https://docs.mongodb.org/getting-started/shell/import-data/

https://raw.githubusercontent.com/mongodb/docs-assets/primer-dataset/dataset.json

Save the dataset from the second URL to a file named primer-dataset.json.

2. Import into MongoDB

Now import the data into the collection from the bash shell (not the mongo shell).

db name: test

collection name: restaurants

If a collection with the same name already exists, the --drop option drops it first.

$ mongoimport --db test --collection restaurants --drop --file primer-dataset.json

3. Insert in mongo shell

From the bash shell, start the mongo shell:
$ mongo
Then, in the mongo shell, switch to the test database:
use test

db.restaurants.insert(
   {
      "address" : {
         "street" : "2 Avenue",
         "zipcode" : "10075",
         "building" : "1480",
         "coord" : [ -73.9557413, 40.7720266 ],
      },
      "borough" : "Manhattan",
      "cuisine" : "Italian",
      "grades" : [
         {
            "date" : ISODate("2014-10-01T00:00:00Z"),
            "grade" : "A",
            "score" : 11
         },
         {
            "date" : ISODate("2014-01-16T00:00:00Z"),
            "grade" : "B",
            "score" : 17
         }
      ],
      "name" : "Vella",
      "restaurant_id" : "41704620"
   }
)

4. Find in mongo shell

db.restaurants.find( { "borough": "Manhattan" } )
db.restaurants.find( { "address.zipcode": "10075" } )
db.restaurants.find( { "grades.grade": "B" } )
db.restaurants.find( { "grades.score": { $gt: 30 } } )
db.restaurants.find( { "grades.score": { $lt: 10 } } )
db.restaurants.find( { "cuisine": "Italian", "address.zipcode": "10075" } )
db.restaurants.find(
   { $or: [ { "cuisine": "Italian" }, { "address.zipcode": "10075" } ] }
)
db.restaurants.find().sort( { "borough": 1, "address.zipcode": 1 } )
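The dotted field names in the queries above ("address.zipcode", "grades.score") reach into embedded documents and into arrays of embedded documents, and a condition on an array field matches when any element satisfies it. The sketch below is a rough plain-JavaScript model of that behaviour, not MongoDB's actual matcher; valuesAtPath and hasScoreAbove are hypothetical helper names, and the document is a trimmed copy of the "Vella" example.

```javascript
// Simplified model (an assumption, not MongoDB's real matcher): collect every
// value reachable through a dotted path, fanning out over arrays on the way.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const item of current) {
      // An array of embedded documents contributes each of its elements.
      const candidates = Array.isArray(item) ? item : [item];
      for (const c of candidates) {
        if (c !== null && typeof c === "object" && key in c) {
          next.push(c[key]);
        }
      }
    }
    current = next;
  }
  // A final array value is also compared element by element.
  return current.flatMap((v) => (Array.isArray(v) ? v : [v]));
}

// Trimmed copy of the "Vella" document inserted earlier.
const vella = {
  address: { zipcode: "10075" },
  grades: [
    { grade: "A", score: 11 },
    { grade: "B", score: 17 },
  ],
};

// Model of { "grades.score": { $gt: n } }: true if ANY score exceeds n.
const hasScoreAbove = (doc, n) =>
  valuesAtPath(doc, "grades.score").some((s) => s > n);

console.log(valuesAtPath(vella, "address.zipcode"));
console.log(hasScoreAbove(vella, 30));
console.log(hasScoreAbove(vella, 15));
```

In this model the "Vella" document would not match the $gt: 30 query, but would match a $gt: 15 query because one of its two grades has a score of 17.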

Read Operation

The db.collection.find() method in the mongo shell returns a cursor to the matching documents.

Cursor

  • If the returned cursor is not assigned to a variable with the var keyword, the shell automatically iterates it up to 20 times, so it shows the first 20 matching documents.
  • Query results are returned in batches. A batch never exceeds the maximum BSON document size. For most queries, the first batch returns 101 documents or just enough documents to exceed 1 megabyte. Subsequent batches are 4 megabytes. (See batchSize() and limit().)
var myCursor = db.inventory.find();

var myFirstDocument = myCursor.hasNext() ? myCursor.next() : null;

// Number of documents remaining in the current batch.
myCursor.objsLeftInBatch();

// Iterate manually with hasNext() and next().
var myCursor = db.inventory.find( { type: 'food' } );

while (myCursor.hasNext()) {
   print(tojson(myCursor.next()));
}

// printjson() is shorthand for print(tojson()).
var myCursor = db.inventory.find( { type: 'food' } );

while (myCursor.hasNext()) {
   printjson(myCursor.next());
}

// forEach() applies a function to each document.
var myCursor = db.inventory.find( { type: 'food' } );

myCursor.forEach(printjson);

The toArray() method loads all documents returned by the cursor into RAM.

var myCursor = db.inventory.find( { type: 'food' } );
var documentArray = myCursor.toArray();
var myDocument = documentArray[3];

// Indexing the cursor directly also works.
var myCursor = db.inventory.find( { type: 'food' } );
var myDocument = myCursor[3];
// The last line is the same as:
myCursor.toArray()[3];
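
To make the "first 20 documents" behaviour concrete, here is a small plain-JavaScript model that runs without a database. ArrayCursor and showBatch are hypothetical names invented for this sketch; only hasNext(), next(), and toArray() mirror the shell methods used above, and the in-memory array stands in for a real result set.

```javascript
// Hypothetical stand-in for a shell cursor: wraps an in-memory array.
class ArrayCursor {
  constructor(docs) {
    this.docs = docs;
    this.pos = 0;
  }
  hasNext() {
    return this.pos < this.docs.length;
  }
  next() {
    return this.docs[this.pos++];
  }
  // Exhausts the cursor and returns everything that was left, all at once.
  toArray() {
    const rest = this.docs.slice(this.pos);
    this.pos = this.docs.length;
    return rest;
  }
}

// Mimics what the shell does with an unassigned cursor: take at most
// `limit` documents and leave the rest on the cursor.
function showBatch(cursor, limit = 20) {
  const shown = [];
  while (shown.length < limit && cursor.hasNext()) {
    shown.push(cursor.next());
  }
  return shown;
}

const docs = Array.from({ length: 45 }, (_, i) => ({ _id: i }));
const cursor = new ArrayCursor(docs);

const first = showBatch(cursor);  // what the shell would print first
const second = showBatch(cursor); // what typing "it" would print next
const rest = cursor.toArray();    // whatever remains, held in memory together

console.log(first.length, second.length, rest.length);
```

With 45 documents the three steps see 20, 20, and 5 documents, which is why toArray() on a large result set can use a lot of memory.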

5. Update in mongo shell

// Updates only the first matching document. $currentDate sets lastModified to the current date.
db.restaurants.update(
    { "name" : "Juni" },
    {
      $set: { "cuisine": "American (New)" },
      $currentDate: { "lastModified": true }
    }
)


db.restaurants.update(
  { "restaurant_id" : "41156888" },
  { $set: { "address.street": "East 31st Street" } }
)

// Multiple documents. With multi: true, ALL matching documents are changed.
db.restaurants.update(
  { "address.zipcode": "10016", cuisine: "Other" },
  {
    $set: { cuisine: "Category To Be Determined" },
    $currentDate: { "lastModified": true }
  },
  { multi: true}
)

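// Replace a document. Without update operators, the second argument replaces
// the whole document except _id, so fields not listed here are lost.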
db.restaurants.update(
   { "restaurant_id" : "41704620" },
   {
     "name" : "Vella 2",
     "address" : {
              "coord" : [ -73.9557413, 40.7720266 ],
              "building" : "1480",
              "street" : "2 Avenue",
              "zipcode" : "10075"
     }
   }
)
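
The difference between a $set update and a replacement update is easy to miss, so here is a plain-JavaScript model of the two. It is a sketch under simplifying assumptions (top-level fields only, no dotted paths), not how the server implements updates; applySet and applyReplacement are hypothetical helper names and the _id value is made up.

```javascript
// A trimmed, illustrative document; the _id value is made up.
const original = {
  _id: 1,
  name: "Vella",
  borough: "Manhattan",
  cuisine: "Italian",
  restaurant_id: "41704620",
};

// $set-style update (top-level fields only in this sketch): the named
// fields change, everything else is kept.
function applySet(doc, fields) {
  return { ...doc, ...fields };
}

// Replacement-style update: the new document takes over; only _id survives.
function applyReplacement(doc, replacement) {
  return { ...replacement, _id: doc._id };
}

const afterSet = applySet(original, { name: "Vella 2" });
const afterReplace = applyReplacement(original, { name: "Vella 2" });

console.log(Object.keys(afterSet));     // all five fields are still present
console.log(Object.keys(afterReplace)); // only name and _id remain
```

In the replacement example above, this is why the resulting document keeps only "name" and "address" (plus _id): "borough", "cuisine", "grades", and "restaurant_id" are not in the replacement document.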

5. Remove in mongo shell

// By default, remove() deletes ALL matching documents.
db.restaurants.remove( { "borough": "Manhattan" } )

// Remove only the first matching document.
db.restaurants.remove( { "borough": "Queens" }, { justOne: true } )

// Remove all documents in the collection.
db.restaurants.remove( { } )

// Drop the collection.
db.restaurants.drop()

6. Data Aggregation in mongo shell

// Count documents per borough using the $group stage and the $sum accumulator.
db.restaurants.aggregate(
   [
     { $group: { "_id": "$borough", "count": { $sum: 1 } } }
   ]
);

// The result will be:
{ "_id" : "Staten Island", "count" : 969 }
{ "_id" : "Brooklyn", "count" : 6086 }
{ "_id" : "Manhattan", "count" : 10259 }
{ "_id" : "Queens", "count" : 5656 }
{ "_id" : "Bronx", "count" : 2338 }
{ "_id" : "Missing", "count" : 51 }

// Filter with $match first, then group the remaining documents by zipcode.
db.restaurants.aggregate(
   [
     { $match: { "borough": "Queens", "cuisine": "Brazilian" } },
     { $group: { "_id": "$address.zipcode" , "count": { $sum: 1 } } }
   ]
);

// The result will be:
{ "_id" : "11368", "count" : 1 }
{ "_id" : "11106", "count" : 3 }
{ "_id" : "11377", "count" : 1 }
{ "_id" : "11103", "count" : 1 }
{ "_id" : "11101", "count" : 2 }
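
For intuition, the $match then $group pipeline above behaves roughly like a filter followed by a count per key. The plain-JavaScript sketch below models that on a tiny in-memory array; countBy is a hypothetical helper and sampleDocs is made-up data, not taken from the real dataset.

```javascript
// Made-up sample documents, shaped like the restaurants collection.
const sampleDocs = [
  { borough: "Queens", cuisine: "Brazilian", address: { zipcode: "11106" } },
  { borough: "Queens", cuisine: "Brazilian", address: { zipcode: "11106" } },
  { borough: "Queens", cuisine: "Brazilian", address: { zipcode: "11101" } },
  { borough: "Queens", cuisine: "Italian", address: { zipcode: "11106" } },
  { borough: "Bronx", cuisine: "Brazilian", address: { zipcode: "10451" } },
];

// Model of { $group: { _id: <key>, count: { $sum: 1 } } }:
// add 1 to the running total of each key.
function countBy(docs, keyFn) {
  const counts = new Map();
  for (const doc of docs) {
    const key = keyFn(doc);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([_id, count]) => ({ _id, count }));
}

// Model of the $match stage: keep only Brazilian restaurants in Queens.
const matched = sampleDocs.filter(
  (d) => d.borough === "Queens" && d.cuisine === "Brazilian"
);

const grouped = countBy(matched, (d) => d.address.zipcode);
console.log(grouped);
```

Filtering before grouping keeps the grouping step small, which is the same reason $match is placed first in the pipeline.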

7. Indexes with the mongo shell

// Create an ascending index on the "cuisine" field.
db.restaurants.createIndex( { "cuisine": 1 } )


// Create a compound index: "cuisine" ascending, then "address.zipcode" descending.
db.restaurants.createIndex( { "cuisine": 1, "address.zipcode": -1 } )
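
The 1 and -1 in a compound index set the sort direction of each key. As a rough illustration, the plain-JavaScript sketch below orders a few made-up key pairs the way { "cuisine": 1, "address.zipcode": -1 } would: cuisine ascending first, then zipcode descending within each cuisine. compareKeys is a hypothetical name, and the real index is a B-tree rather than a sorted array.

```javascript
// Made-up (cuisine, zipcode) key pairs.
const entries = [
  { cuisine: "Italian", zipcode: "10016" },
  { cuisine: "American", zipcode: "10075" },
  { cuisine: "Italian", zipcode: "10075" },
  { cuisine: "American", zipcode: "10016" },
];

// cuisine ascending (1), then zipcode descending (-1).
function compareKeys(a, b) {
  if (a.cuisine !== b.cuisine) return a.cuisine < b.cuisine ? -1 : 1;
  if (a.zipcode !== b.zipcode) return a.zipcode < b.zipcode ? 1 : -1;
  return 0;
}

const ordered = [...entries].sort(compareKeys);
console.log(ordered.map((e) => `${e.cuisine} ${e.zipcode}`));
```

The output lists American 10075, American 10016, Italian 10075, Italian 10016, which matches a sort of { "cuisine": 1, "address.zipcode": -1 }.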

MongoDB-php driver

In shared hosting

Download php.ini and put it in "~/www", then run phpinfo() again and check the "Loaded Configuration File" entry.

MongoDB project plan

1. MongoDB remote access enable

  1. Check the MongoDB settings

    In /etc/mongod.conf

    bind_ip = 127.0.0.1,<MongoDB's private IP>,<MongoDB's public IP>
    # Caution: no spaces around the commas

    if you got this error,
  2. Then, restart the service
    sudo service mongod stop
    sudo service mongod start

  3. Then, check the log
    tail -f /var/log/mongodb/mongod.log

    It should show
    I NETWORK  [initandlisten] waiting for connections on port 27017
  4. If an error occurs, comment out the bind_ip line above and let AWS handle incoming security. On the AWS server, make sure the security group opens ports 27017 and 28017.

Examples with mongo shell

use nota

db.mstudents.insert(
    {
        "name": {
            "first":"Finst",
            "last" :"goodstud",
        },
        "sql_id": 143294,
        "username":"Finst"
    }
 )
    
db.mstudents.insert(
    {
        "name": {
            "first":"Kimmer",
            "last" :"Goods",
        },
        "sql_id": 143295,
        "username":"Kimmer"
    }  
)

// Multiple documents: pass an array.
db.mstudents.insert(
  [
    {
        "name": {
            "first":"Finst",
            "last" :"goodstud",
        },
        "sql_id": 143294,
        "username":"Finst"
    },
    {
        "name": {
            "first":"Kimmer",
            "last" :"Goods",
        },
        "sql_id": 143295,
        "username":"Kimmer"
    }
  ]  
)

    
db.mstudents.find({"sql_id": 143295 })
 
// Add new fields.
db.mstudents.update(
    {"sql_id": 143295},
    {
        $set: { "temp": "temp_value" },
        $currentDate: { "lastModified": true }
    }
)

// Remove fields one by one.
db.mstudents.update(
    {"sql_id": 143295},
    {
        $unset: { "temp": "" }
    }
)
    
db.mstudents.update(
    {"sql_id": 143295},
    {
        $unset: { "lastModified":""}
    }
)