Sakila sample schema in MongoDB

2018 Update:  You can download this and other sample schemas we use in dbKoda from https://medium.com/dbkoda/mongodb-sample-collections-52d6a7745908.

I wanted to do some experimenting with MongoDB, but I wasn’t really happy with any of the sample data I could find in the web.  So I decided that I would translate the MySQL “Sakila” schema into MongoDB collections as part of the learning process.   

For those that don’t know, Sakila is a MySQL sample schema that was published about 8 years ago.  It’s based on a DVD rental system.   OK, not the most modern data ever, but DVDs are still a thing aren’t they??

You can get the MongoDB version of Sakilia here.  To load, use unpack using tar zxvf sakilia.tgz then use mongoimport to load the resulting JSON documents.  On windows you should be able to double click on the file to get to the JSON.  

The Sakila database schema is shown below.  There are 16 tables representing a fairly easy to understand inventory of films, staff, customers and stores.

Database diagram

When modelling MongoDB schemas, we partially ignore our relational modelling experience – “normalization” is not the desired end state.   Instead of driving our decision on the nature of the data, we drive it on the nature of operations.  The biggest decision is which “entities” get embedded within documents, and which get linked.  I’m not the best person to articulate these principles – the O’Reilly book “MongoDB Applied Design Patterns” does a pretty good job and this presentation is also useful.

My first shot at mapping the data – which may prove to be flawed as I play with MongoDB queries – collapsed the 16 tables into just 3 documents:  FILMS, STORES and CUSTOMERS.   ACTORS became a nested document in FILMS, STAFF and INVENTORY were nested into STORES, while RENTALS and PAYMENTS nested into CUSTOMERS.   Whether these nestings turn out to be good design decisions will depend somewhat on the application.  Some operations are going to be awkward while others will be expedited.

Here’s a look at the FILMS collection:

image

Here is STORES:

image

And here is CUSTOMERS:

image

Looks like I have to fix some float rounding issues on customers.rentals.payments.amount Smile.

The code that generates the schema is here.   It’s pretty slow, mainly because of the very high number of lookups on rentals and payments.  It would be better to bulk collect everything and scan through it but it would make the code pretty ugly.   If this were Oracle I’m pretty sure I could make it run faster but with MySQL SQL tuning is much harder.

Code is pretty straight forward.  To insert a MongoDB document we get the DBCollection, then create BasicDBObjects which we insert into the DBCollection.  To nest a documnet we create a BasicDBList and insert BasicDBObjects into it.  Then we add the BasicDBList to the parent BasicDBObject.  The following snippit illustrates that sequence.  It's mostly boilerplate code, with the only human decision being the nesting structure. 

   1: DBCollection filmCollection = mongoDb.getCollection(mongoCollection);
   2:  
   3: while (fileRs.next()) { // For each film
   4:  
   5:         // Create the actors document
   6:         BasicDBObject filmDoc = new BasicDBObject();
   7:         Integer filmId = fileRs.getInt("FILM_ID");
   8:         filmDoc.put("_id", filmId);
   9:         filmDoc.put("Title", fileRs.getString("TITLE"));
  10:         // Other attributes
  11:         BasicDBList actorList = getActors(mysqlConn, filmId);
  12:         // put the actor list into the film document
  13:         filmDoc.put("Actors", actorList);
  14:         filmCollection.insert(filmDoc); // insert the film
  15:  
  16:     }

Anyway, hopefully this might be of some use to those moving from MySQL to MongoDB.  Comments welcome!

D.I.Y. MySQL 5.1 monitoring

I wrote recently about using events and the new processlist table in MySQL 5.1 to keep track of the number of connected processes.  Although having the PROCESSLIST available as an INFORMATION SCHEMA table is usefull, it seemed to me that having SHOW GLOBAL STATUS exposed in a similar fashion would be far more useful.  So at the MySQL UC last year, I asked Brian Aker if that would be possible.  I know how many enhancement requests MySQL has to deal with, so I was really happy to see that table appear in the latest 5.1 build (5.1.14 beta). 

This table, together with the EVENT scheduler, lets us keep track of the values of status variables over time without having to have any external deamon running.  This won't come anywhere near to matching what MySQL have made avalable in Merlin, but still could be fairly useful.  So lets build a simple system using events to keep track of "interesting" status variables....

Read More

Unit testing stored procedures with Junit

Anyone who has used an automated unit testing framework such as Junit knows just how life-changing an automated test suite can be.   Once you've  experienced validating that recent changes have not broken old code, or discovering subtle bugs via junit that would otherwise have remained undetected , you naturally want to have this capability in all your programming environments.

Guisseppe Maxia has written a few stored procedure snippets to assist with automated unit testing of MySQL routines.  Unfortunately, the MySQL stored procedure language itself does not have the necessary abilities to fully implement the sort of unit testing we would like.  In particular, the inability for a stored procedure to capture the result sets generated by another stored procedure prevents a stored procedure from fully unit testing another. 

So I decided that - for me - Junit offered the best solution.  I created an extension to the Junit class that contains some assertions useful when unit testing stored procedure and extended my PlainSql JDBC wrapper classes to allow a stored procedure to return a single object that contains all its result sets and the values of OUT or INOUT parameters.   This object can be made the target of the various assertions.

If you're not familiar with Java and/or, you might feel that this solution is not for you.  However, the amount of java programming you need to do is very minimal and GUI environments such as Eclipse make it very easy to set up.  The rest of this article is available here;  it contains instructions, my Java classes and examples for setting up Junit test cases for MySQL stored procedures.

Read More

MySQL 5.1 events

I finally got around to working with the 5.1 scheduler.  I wanted to have a simple but non-trivial example, and when I saw Brian Akers post on the new processlist table, I thought of a useful little application:  I would submit an event that would summarize the users and their statuses at regular intervals so I could track user trends.

First off,  I needed to enable the scheduler by adding the following line to my configuration file:

event_scheduler=1

Now the scheduler is ready for action.  So I created a table to hold my process list details:

CREATE TABLE processhistory (h_timestamp DATETIME,
                             processcount INTEGER,
                             activecount INTEGER,
                             lockedcount INTEGER)$$

And then the DML to create an event:

CREATE EVENT evt_process_history
     ON SCHEDULE EVERY 2 MINUTE
     DO
BEGIN
    INSERT INTO processhistory (h_timestamp,processcount,
            activecount,lockedcount)
     SELECT NOW() AS h_timestamp,COUNT(*) AS processcount,
            SUM(active) AS activecount  ,
            SUM(locked) AS lockedcount
       FROM (SELECT CASE command WHEN 'Sleep' THEN 0 ELSE 1
                     END AS active ,
                    CASE state WHEN 'Locked' THEN 1 ELSE 0
                     END AS locked
               FROM information_schema.`PROCESSLIST` P) Q;
END$$

Every two minutes, the event summarizes the status of the sessions currently connected and stores them to the table.  I could use various tools to analze this data, but for convenience I used Excel with the ODBC driver to create a chart of activity:

Chart_1

Cool! Now I can keep track of active sessions over time, which could be useful. On the test database, there is a little ruby program that locks up a table needed by my java TP simulation, so we see those spikes of lock activity. I'm hoping that MySQL expose the SHOW STATUS command as a table as well, since we can't get at the contents of SHOW STATUS from within the stored program language.

Read More