A few new MongoDB performance blogs

Most of my recent blogging has been done on the dbKoda website, where I've been writing mainly about MongoDB performance tuning. If you are interested, here are links to a few of the more interesting posts:


Effective MongoDB indexing (pt 1)

An index is an object with its own unique storage that provides a fast access path into a collection. Indexes exist primarily to enhance performance, so understanding and using them effectively is of paramount importance when optimizing MongoDB performance.

B-tree Indexes

The B-tree (“Balanced Tree”) index is MongoDB’s default index structure.
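By way of illustration, here's a minimal sketch of creating and exercising a B-tree index from the MongoDB shell (the customers collection and country field are invented for the example):

    // Create an ascending B-tree index on the "country" field
    db.customers.createIndex({ country: 1 });

    // A query on the indexed field can now walk the index
    // rather than scanning the entire collection:
    db.customers.find({ country: "Australia" }).explain();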

Read the rest of this post at https://medium.com/dbkoda/getting-started-with-mongodb-indexes-part-1-793667b252e8

Announcing dbKoda 0.7

0.7.0 is the second public release of dbKoda and our first post-MVP release.  With the MVP (Minimum Viable Product) we definitely nailed the "M" criterion, and in this release we've been pushing harder on the "V" side of the equation.

As with 0.6, dbKoda is a free, open source, Vegan product made by groovy people in Melbourne, Australia.  It's licensed under the AGPL 3.0.

As well as many bug fixes, brand new bugs and performance improvements, we added the following features:

Aggregation Builder

At this year's MongoDB conference, we spotted one of the MongoDB engineers wearing an "Aggregate() is the new Find()" T-shirt.  It's funny because it's true: almost every non-trivial MongoDB data retrieval operation requires an aggregation pipeline.  Features such as joins and graph lookups can only be done through the aggregation framework, and under-the-hood features such as the BI connector depend on aggregation as well.

As anybody who has ever written an aggregation framework pipeline knows, the process is tedious and error-prone - matching braces and getting the syntax exactly right is difficult.  So in dbKoda 0.7 our aggregation builder lets you drag and drop pipeline elements and use fill-in-the-blank forms to construct complex pipelines.  It's amazing how quickly you can build up a complex pipeline using the builder - a video on our YouTube channel shows me building a non-trivial pipeline in under 60 seconds, so try it out!
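If you haven't met the aggregation framework, here's a hedged sketch of the sort of pipeline the builder assembles (the orders collection and its fields are invented for the example):

    db.orders.aggregate([
      { $match: { status: "shipped" } },   // filter early
      { $group: { _id: "$customerId",      // then group by customer
                  total: { $sum: "$amount" } } },
      { $sort: { total: -1 } },            // biggest spenders first
      { $limit: 10 }
    ]);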


Storage Drilldown 

MongoDB can tell you how much space is used in databases, collections and indexes, but it is not so good at breaking down space within a document. Because MongoDB's document model supports nested arrays of documents, it’s often the space used within the documents of a collection that is the most important thing to identify. For instance, a typical cause of space blowouts in MongoDB is unbounded arrays of nested documents.
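To see where the built-in helpers stop, consider this sketch (the customers collection is invented for the example); both commands report sizes, but neither breaks down space within documents:

    db.customers.stats();   // storageSize, totalIndexSize, avgObjSize, ...
    db.stats();             // the same numbers rolled up to the database level
    // Neither shows, for example, how much of each document is consumed
    // by a growing nested array.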

dbKoda's storage drilldown breaks down the space used by databases, collections and indexes, and shows you how storage is used within a collection. It does this in an intuitive graphical presentation that allows you to drill in and out of nested documents.


SSH tunneling connections

We were all horrified at the explosion of ransomware attacks on MongoDB databases early in 2017. The root cause of the security vulnerabilities in these databases was the failure to correctly create authenticated users, but it is also true that you take your life in your hands whenever you expose a database port to the public Internet. For this reason it’s often best practice to leave database ports open only within a walled garden. If you want to perform day-to-day administration using tools such as dbKoda, you can use SSH tunneling to establish a connection.

This FAQ entry describes SSH tunneling and how it is used in dbKoda. Put simply, you can now specify an intermediate host which offers you SSH connectivity, and use that host to forward database requests to the secured MongoDB server.
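Under the hood this is standard SSH port forwarding. Done by hand, it would look something like the following (host names invented for the example):

    # Forward local port 27017, via the bastion host, to the hidden MongoDB server
    ssh -L 27017:mongo.internal:27017 user@bastion.example.com

    # dbKoda (or the mongo shell) then simply connects to localhost:27017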

Enhanced JSON viewer

Complex JSON documents can be difficult to read. By default dbKoda will display JSON output as it would appear in the MongoDB shell - we aspire to complete shell compatibility, after all!  In 0.7, we offer an enhanced JSON viewer that allows you to examine JSON documents at multiple levels of detail, expanding and collapsing subdocuments and long strings. This facility is available wherever JSON output is displayed in the product, by right clicking and choosing “Enhanced JSON output”.


Export/Import

0.7 allows you to load or unload data to or from your MongoDB server. This facility provides GUI access to the `mongodump`, `mongorestore`, `mongoexport` and `mongoimport` commands. 
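For reference, these are the kinds of commands the GUI drives for you (database and collection names invented for the example):

    mongoexport --db sakila --collection films --out films.json
    mongoimport --db sakila --collection films --file films.json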


Enhanced performance on Windows

In 0.6, there were some performance issues when very large amounts of data were displayed in our output panel. We've worked hard to resolve these, and now believe that performance on Windows will match performance on Linux and Mac.

Summing up

We added a lot of cool functionality in this release - we hope you'll try it out. Download dbKoda from www.dbkoda.com and let us know what you think at our support site.

Announcing dbKoda!

I'm very excited to announce the release of dbKoda - a next generation database development and administration tool now available for MongoDB.

Those who've been following me know that I've been working with databases since the early Mesozoic period and I've worked in database tooling for almost two decades.

Working with next generation databases like MongoDB has been a lot of fun, but it did make me realise how much need there is for a strong tooling ecosystem around these new databases.  I like to think that I made significant contributions to tooling for relational databases, and I had a strong desire to build something for post-relational systems.

Consequently, late last year I founded the company Southbank Software and this week we launched our first product - dbKoda (www.dbKoda.com).

dbKoda is a modern, open source database development tool.  The first release targets MongoDB.  It is a 100% JavaScript application which runs on Linux, Mac or Windows.  It features a rich editing environment with syntax highlighting, code completion and formatting.  It also offers easy graphical access to common MongoDB administration and configuration tasks.


I'm really excited about dbKoda - I hope that it will become the foundation for a product family that will support modern database development across a wide range of environments.   And working closely with the small team of brilliant dbKoda developers has been an absolute privilege.

Check out the dbKoda website and download dbKoda here.  You can also check out an introductory video on dbKoda.  Please also follow dbKoda on https://twitter.com/db_Koda.


Optimizing the order of MongoDB aggregation steps

An updated version of this blog post can be found at https://www.dbkoda.com/blog/2017/10/14/Optimizing-the-order-of-aggregation-pipelines


MongoDB does have a query optimizer, and in most cases it's effective at picking the best of multiple possible plans.  However, it's worth remembering that in the case of the aggregate function, the sequence in which the various steps are executed is completely under your control.  The optimizer won't reorder steps into the optimal sequence to get you out of trouble.


Optimizing the order of steps comes down mainly to reducing the amount of data in the pipeline as early as possible – this reduces the amount of work that has to be done by each successive step.  The corollary is that steps that perform a lot of work on data should be placed after any filter steps.
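As a sketch of the principle (the orders collection and its fields are invented for the example), compare two orderings of the same pipeline:

    // Wasteful: $unwind every document, then throw most of the output away
    db.orders.aggregate([
      { $unwind: "$items" },
      { $match: { status: "shipped" } }
    ]);

    // Better: filter first, so $unwind handles far fewer documents
    db.orders.aggregate([
      { $match: { status: "shipped" } },
      { $unwind: "$items" }
    ]);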


Please go to https://medium.com/dbkoda/optimizing-the-order-of-aggregation-pipelines-44c7e3f4d5dd to read the rest of this post 


Bulk inserts in MongoDB

Like most database systems, MongoDB provides API calls that allow multiple documents to be inserted or retrieved in a single operation.

These “array” or “bulk” interfaces improve database performance markedly by reducing the number of round trips between the client and the database. To realise how fundamental an optimisation this is, imagine that you have a group of people to take across a river. You have a boat that can take 100 people at a time, but for some reason you are taking only one person across on each trip – not smart, right? Failing to take advantage of array inserts is very similar: you are sending network packets that could carry hundreds of documents with only a single document in each packet.
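In mongo shell terms, the difference looks something like this (the people collection is invented for the example):

    // One network round trip per document - one person per boat trip
    for (var i = 0; i < 1000; i++) {
      db.people.insertOne({ _id: i, name: "person" + i });
    }

    // One round trip per batch - fill the boat
    var docs = [];
    for (var i = 0; i < 1000; i++) {
      docs.push({ _id: i, name: "person" + i });
    }
    db.people.insertMany(docs);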

Optimizing bulk reads using .batchSize()

Read the rest of this post at https://medium.com/dbkoda/bulk-operations-in-mongodb-ed49c109d280


Graph Lookup in MongoDB 3.3

Specialized graph databases such as Neo4j excel at traversing graphs of relationships – such as those you might find in a social network.  Many non-graph databases have been incorporating graph compute engines to perform similar tasks.  In the MongoDB 3.3 release, we now have the ability to perform simple graph traversal using the $graphLookup aggregation framework function.  This will become a production feature in the 3.4 release.

The new feature is documented in MongoDB Jira SERVER-23725.  The basic syntax is shown here:

    { $graphLookup: {
          from: <name of collection to look up into>,
          startWith: <expression>,
          connectFromField: <name of field in document from "from">,
          connectToField: <name of field in document from "from">,
          as: <name of field in output document>,
          maxDepth: <optional - non-negative integer>,
          depthField: <optional - name of field in output documents>
    } }
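As an illustrative example (the employees collection and its fields are invented, in the style of the MongoDB documentation), the following traverses a reporting hierarchy to return each employee's full management chain:

    db.employees.aggregate([
      { $graphLookup: {
          from: "employees",          // traverse within the same collection
          startWith: "$reportsTo",    // begin from each employee's manager
          connectFromField: "reportsTo",
          connectToField: "name",
          as: "managementChain"
      } }
    ]);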

Please go to https://medium.com/dbkoda/optimising-graph-lookups-in-mongodb-49483afb55c8 to read the rest of this article (updated for MongoDB 3.6)

Join performance in MongoDB 3.2 using $lookup


One of the key tenets of MongoDB schema design is to account for the absence of server-side joins.  Data is joined all the time inside application code of course, but traditionally there’s been no way to perform joins within the server itself.

This changed in 3.2 with the introduction of the $lookup operator within the aggregation framework.  $lookup performs the equivalent of a left outer join – that is, it retrieves matching data from another collection and returns null data if no match is found.

The full post works through an example using the MongoDB version of the Sakila dataset that I converted from MySQL back in this post; the basic shape of a $lookup stage is sketched below.
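A minimal sketch of the stage itself (collection and field names invented; this is not the actual Sakila example):

    // For each customer, pull in any matching orders (a left outer join)
    db.customers.aggregate([
      { $lookup: {
          from: "orders",              // the collection to join to
          localField: "_id",           // field in customers
          foreignField: "customerId",  // field in orders
          as: "orders"                 // output array (empty when no match)
      } }
    ]);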

Please go to https://medium.com/dbkoda/coding-efficient-mongodb-joins-97fe0627751a to read the rest of this post (it has been migrated to a medium.com article and updated). 

Sakila sample schema in MongoDB

2018 Update:  You can download this and other sample schemas we use in dbKoda from https://medium.com/dbkoda/mongodb-sample-collections-52d6a7745908.

I wanted to do some experimenting with MongoDB, but I wasn’t really happy with any of the sample data I could find on the web.  So I decided to translate the MySQL “Sakila” schema into MongoDB collections as part of the learning process.

For those that don’t know, Sakila is a MySQL sample schema that was published about 8 years ago.  It’s based on a DVD rental system.  OK, not the most modern data ever, but DVDs are still a thing, aren’t they?

You can get the MongoDB version of Sakila here.  To load it, unpack using tar zxvf sakilia.tgz, then use mongoimport to load the resulting JSON documents.  On Windows you should be able to double click on the file to get to the JSON.

The Sakila database schema is shown below.  There are 16 tables representing a fairly easy to understand inventory of films, staff, customers and stores.

Database diagram

When modelling MongoDB schemas, we partially ignore our relational modelling experience – “normalization” is not the desired end state.  Instead of basing our decisions on the nature of the data, we base them on the nature of the operations.  The biggest decision is which “entities” get embedded within documents, and which get linked.  I’m not the best person to articulate these principles – the O’Reilly book “MongoDB Applied Design Patterns” does a pretty good job, and this presentation is also useful.

My first shot at mapping the data – which may prove to be flawed as I play with MongoDB queries – collapsed the 16 tables into just 3 collections: FILMS, STORES and CUSTOMERS.  ACTORS became a nested document in FILMS; STAFF and INVENTORY were nested into STORES, while RENTALS and PAYMENTS were nested into CUSTOMERS.  Whether these nestings turn out to be good design decisions will depend somewhat on the application.  Some operations are going to be awkward while others will be expedited.

The original post includes screenshots of the FILMS, STORES and CUSTOMERS collections.
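As an abridged, illustrative sketch of the shape (the field values are invented), a FILMS document looks roughly like this, with actors embedded as an array of subdocuments:

    {
      "_id": 1,
      "Title": "ACADEMY DINOSAUR",
      "Actors": [
        { "actorId": 1, "Name": "PENELOPE GUINESS" },
        { "actorId": 10, "Name": "CHRISTIAN GABLE" }
      ]
    }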

Looks like I have to fix some float rounding issues on customers.rentals.payments.amount.

The code that generates the schema is here.   It’s pretty slow, mainly because of the very high number of lookups on rentals and payments.  It would be better to bulk collect everything and scan through it, but that would make the code pretty ugly.  If this were Oracle I’m pretty sure I could make it run faster, but with MySQL, SQL tuning is much harder.

The code is pretty straightforward.  To insert a MongoDB document we get the DBCollection, then create BasicDBObjects which we insert into the DBCollection.  To nest a document, we create a BasicDBList and insert BasicDBObjects into it; then we add the BasicDBList to the parent BasicDBObject.  The following snippet illustrates that sequence.  It's mostly boilerplate code, with the only human decision being the nesting structure.

    DBCollection filmCollection = mongoDb.getCollection(mongoCollection);

    while (fileRs.next()) { // For each film
        // Create the film document
        BasicDBObject filmDoc = new BasicDBObject();
        Integer filmId = fileRs.getInt("FILM_ID");
        filmDoc.put("_id", filmId);
        filmDoc.put("Title", fileRs.getString("TITLE"));
        // ... other attributes ...
        // Build the nested list of actors for this film
        BasicDBList actorList = getActors(mysqlConn, filmId);
        // Put the actor list into the film document
        filmDoc.put("Actors", actorList);
        filmCollection.insert(filmDoc); // Insert the film
    }

Anyway, hopefully this might be of some use to those moving from MySQL to MongoDB.  Comments welcome!

MongoDB, Oracle and Toad for Cloud Databases

We recently added support for MongoDB in Toad for Cloud Databases, so I took the opportunity of writing my first MongoDB Java program and taking the Toad functionality for a test drive.

MongoDB is a non-relational, document oriented database that is extremely popular with developers (see for instance this Hacker News poll).   Toad for Cloud Databases allows you to work with non-relational data using SQL, by normalizing the data structures and converting SQL to the non-relational calls.

I wanted to get started by creating some MongoDB collections with familiar data, so I wrote a Java program that takes data out of the Oracle sample schema and loads it into Mongo as documents.  The program is here.


The key parts of the code are shown here:

   1: while (custRs.next()) { // For each customer
   2:     String custId = custRs.getString("CUST_ID");
   3:     String custFirstName = custRs.getString("CUST_FIRST_NAME");
   4:     String custLastName = custRs.getString("CUST_LAST_NAME");
   5:  
   6:     //Create the customer document 
   7:     BasicDBObject custDoc = new BasicDBObject();
   8:     custDoc.put("_id", custId);
   9:     custDoc.put("CustomerFirstName", custFirstName);
  10:     custDoc.put("CustomerLastName", custLastName);
  11:     // Create the product sales document 
  12:     BasicDBObject customerProducts = new BasicDBObject();
  13:     custSalesQry.setString(1, custId);
  14:     ResultSet prodRs = custSalesQry.executeQuery();
  15:     Integer prodCount = 0;
  16:     while (prodRs.next()) { //For each product sale 
  17:         String  timeId=prodRs.getString("TIME_ID"); 
  18:         Integer prodId = prodRs.getInt("PROD_ID");
  19:         String prodName = prodRs.getString("PROD_NAME");
  20:         Float Amount = prodRs.getFloat("AMOUNT_SOLD");
  21:         Float Quantity = prodRs.getFloat("QUANTITY_SOLD");
  22:         // Create the line item document 
  23:         BasicDBObject productItem = new BasicDBObject();            
  24:         productItem.put("prodId", prodId);
  25:         productItem.put("prodName", prodName);
  26:         productItem.put("Amount", Amount);
  27:         productItem.put("Quantity", Quantity);
  28:         // Put the line item in the salesforcustomer document 
  29:         customerProducts.put(timeId, productItem);
  30:         if (prodCount++ > 4) { // Just 5 for this demo
  31:             prodCount = 0;
  32:             break;
  33:         }
  34:     }
  35:     // put the salesforcustomer document in the customer document 
  36:     custDoc.put("SalesForCustomer", customerProducts);
  37:  
  38:     System.out.println(custDoc);
  39:     custColl.insert(custDoc);  //insert the customer 
  40:     custCount++;
  41:  
  42: }

Here’s how it works:

Lines   Description
1-4     We loop through each customer, retrieving the key customer details
7-10    We create a basic MongoDB document that contains the customer details
12      We create another MongoDB document that will contain all the product sales for the customer
16-21   We fetch the data for an individual sale for that customer from Oracle
23-27   We create a document for that single sale
29      We add the sale to the document containing all the sales
36      We add all the sales to the customer document
39      We add the customer document to the collection


The MongoDB API is very straightforward – much easier than the equivalent APIs for HBase or Cassandra.

When we run the program, we create JSON documents in MongoDB that look like this:

    { "_id" : "7", "CustomerFirstName" : "Linette", "CustomerLastName" : "Ingram",
        "SalesForCustomer" : {
            "2001-05-30 00:00:00" : { "prodId" : 28, "prodName" : "Unix/Windows 1-user pack", "Amount" : 205.76, "Quantity" : 1.0 },
            "1998-04-18 00:00:00" : { "prodId" : 129, "prodName" : "Model NM500X High Yield Toner Cartridge", "Amount" : 205.48, "Quantity" : 1.0 }
        }
    }
    { "_id" : "8", "CustomerFirstName" : "Vida", "CustomerLastName" : "Puleo",
        "SalesForCustomer" : {
            "1999-01-27 00:00:00" : { "prodId" : 18, "prodName" : "Envoy Ambassador", "Amount" : 1726.83, "Quantity" : 1.0 },
            "1999-01-28 00:00:00" : { "prodId" : 18, "prodName" : "Envoy Ambassador", "Amount" : 1726.83, "Quantity" : 1.0 },
            "1998-04-26 00:00:00" : { "prodId" : 20, "prodName" : "Home Theatre Package with DVD-Audio/Video Play", "Amount" : 608.39, "Quantity" : 1.0 },
            "1998-01-19 00:00:00" : { "prodId" : 28, "prodName" : "Unix/Windows 1-user pack", "Amount" : 216.99, "Quantity" : 1.0 },
            "1998-03-19 00:00:00" : { "prodId" : 28, "prodName" : "Unix/Windows 1-user pack", "Amount" : 216.99, "Quantity" : 1.0 }
        }
    }


Toad for Cloud “renormalizes” the documents so that they resemble something that we might use in a more conventional database.  In this case, Toad creates two tables from the Mongo collection: one for customers, and one containing the sales for each customer.  You can rename the auto-generated foreign keys and the sub-table name to make this a bit clearer.


We can see the relationships more clearly in the .NET client by using Toad’s visual query builder (or we could have used the database diagram tool).


MongoDB has a pretty rich query language, but it’s fairly mysterious to those of us who are used to SQL, and it’s certainly not as rich as the SQL language.  Using Toad for Cloud, you can issue ANSI-standard SQL against your MongoDB tables and quickly browse or perform complex queries.  Later this year, this Mongo support will emerge in some of our commercial data management tools, such as Toad for Data Analysts and our soon-to-be-announced BI tools.
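To give a flavour of the difference, here's the same simple query in MongoDB's native language and in SQL (using the customer fields from the documents above; the table name Toad generates may differ):

    // MongoDB's native query language...
    db.customers.find(
      { CustomerLastName: "Ingram" },
      { CustomerFirstName: 1, CustomerLastName: 1 }
    );

    // ...versus the equivalent SQL issued through Toad for Cloud:
    // SELECT CustomerFirstName, CustomerLastName
    //   FROM customers
    //  WHERE CustomerLastName = 'Ingram';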
