Sakila sample schema in MongoDB

2018 Update:  You can download this and other sample schemas we use in dbKoda from https://medium.com/dbkoda/mongodb-sample-collections-52d6a7745908.

I wanted to do some experimenting with MongoDB, but I wasn’t really happy with any of the sample data I could find on the web.  So I decided that I would translate the MySQL “Sakila” schema into MongoDB collections as part of the learning process.

For those that don’t know, Sakila is a MySQL sample schema that was published about 8 years ago.  It’s based on a DVD rental system.   OK, not the most modern data ever, but DVDs are still a thing aren’t they??

You can get the MongoDB version of Sakila here.  To load it, unpack the archive using tar zxvf sakilia.tgz, then use mongoimport to load the resulting JSON documents.  On Windows you should be able to double-click on the file to get to the JSON.

The Sakila database schema is shown below.  There are 16 tables representing a fairly easy to understand inventory of films, staff, customers and stores.

Database diagram

When modelling MongoDB schemas, we partially ignore our relational modelling experience – “normalization” is not the desired end state.   Instead of basing our decisions on the nature of the data, we base them on the nature of the operations.  The biggest decision is which “entities” get embedded within documents, and which get linked.  I’m not the best person to articulate these principles – the O’Reilly book “MongoDB Applied Design Patterns” does a pretty good job and this presentation is also useful.

My first shot at mapping the data – which may prove to be flawed as I play with MongoDB queries – collapsed the 16 tables into just 3 documents:  FILMS, STORES and CUSTOMERS.   ACTORS became a nested document in FILMS, STAFF and INVENTORY were nested into STORES, while RENTALS and PAYMENTS nested into CUSTOMERS.   Whether these nestings turn out to be good design decisions will depend somewhat on the application.  Some operations are going to be awkward while others will be expedited.
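
To make the nesting concrete, here is a minimal Ruby sketch of how one FILMS document might be assembled from relational rows. The _id, Title and Actors keys follow the loader snippet later in this post; the actor field names and all the sample values are invented purely for illustration.

```ruby
require 'json'

# Hypothetical rows as they might come back from the relational tables.
film_row   = { film_id: 1, title: 'EXAMPLE FILM' }
actor_rows = [
  { actor_id: 10, first_name: 'ALICE', last_name: 'EXAMPLE' },
  { actor_id: 20, first_name: 'BOB',   last_name: 'SAMPLE' }
]

# Build one document per film, embedding the actors as a nested array
# instead of linking to them through a join table.
def build_film_doc(film_row, actor_rows)
  actors = actor_rows.map do |a|
    { 'actorId'   => a[:actor_id],     # assumed field names
      'FirstName' => a[:first_name],
      'LastName'  => a[:last_name] }
  end
  { '_id' => film_row[:film_id], 'Title' => film_row[:title], 'Actors' => actors }
end

film_doc = build_film_doc(film_row, actor_rows)
puts JSON.pretty_generate(film_doc)
```

The trade-off is visible even at this size: reading a film with its cast is one lookup, but finding every film for a given actor means scanning the nested arrays.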

Here’s a look at the FILMS collection:

image

Here is STORES:

image

And here is CUSTOMERS:

image

Looks like I have to fix some float rounding issues on customers.rentals.payments.amount.
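
One possible way to sidestep that kind of rounding noise is to avoid binary floats for money altogether, for instance by storing amounts as integer cents. The Ruby sketch below only illustrates the idea; the sample amounts are made up and this is not how the loader currently works.

```ruby
# Binary floats cannot represent most decimal fractions exactly.
float_sum = 0.1 + 0.2
puts(float_sum == 0.3)                     # false

# Parse the decimal text exactly (as a Rational) and convert to
# integer cents, which can be stored and summed without drift.
def to_cents(amount_text)
  (amount_text.to_r * 100).to_i
end

amounts = ['2.99', '0.99', '4.99']         # made-up payment amounts
total_cents = amounts.sum { |a| to_cents(a) }
puts total_cents                           # 897
puts format('%.2f', total_cents / 100.0)   # 8.97
```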

The code that generates the schema is here.   It’s pretty slow, mainly because of the very high number of lookups on rentals and payments.  It would be better to bulk collect everything and scan through it, but that would make the code pretty ugly.   If this were Oracle I’m pretty sure I could make it run faster, but with MySQL, SQL tuning is much harder.
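
The "bulk collect and scan" alternative could look roughly like this: fetch all payment rows once, index them by rental in memory, then attach them while walking the rentals. This is a Ruby sketch with made-up rows and column names, not the actual loader code.

```ruby
# Made-up rows standing in for two full-table SELECTs.
rentals = [
  { rental_id: 1, customer_id: 7 },
  { rental_id: 2, customer_id: 7 },
  { rental_id: 3, customer_id: 9 }
]
payments = [
  { payment_id: 100, rental_id: 1, amount_cents: 299 },
  { payment_id: 101, rental_id: 1, amount_cents: 99 },
  { payment_id: 102, rental_id: 3, amount_cents: 499 }
]

# One pass over payments builds an index keyed by rental_id, replacing
# a separate lookup query for every rental.
payments_by_rental = payments.group_by { |p| p[:rental_id] }

# Nest each rental (with its payments) under its customer.
rentals_by_customer = Hash.new { |h, k| h[k] = [] }
rentals.each do |r|
  nested = r.merge(payments: payments_by_rental.fetch(r[:rental_id], []))
  rentals_by_customer[r[:customer_id]] << nested
end

rentals_by_customer.each do |customer_id, list|
  puts "customer #{customer_id}: #{list.size} rentals"
end
```

The cost is memory: everything has to fit in the client at once, which is fine for Sakila but would need batching for a larger schema.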

The code is pretty straightforward.  To insert a MongoDB document we get the DBCollection, then create BasicDBObjects which we insert into the DBCollection.  To nest a document we create a BasicDBList and insert BasicDBObjects into it.  Then we add the BasicDBList to the parent BasicDBObject.  The following snippet illustrates that sequence.  It's mostly boilerplate code, with the only human decision being the nesting structure.

    DBCollection filmCollection = mongoDb.getCollection(mongoCollection);

    while (fileRs.next()) { // For each film

        // Create the film document
        BasicDBObject filmDoc = new BasicDBObject();
        Integer filmId = fileRs.getInt("FILM_ID");
        filmDoc.put("_id", filmId);
        filmDoc.put("Title", fileRs.getString("TITLE"));
        // Other attributes
        // Build the nested list of actors for this film
        BasicDBList actorList = getActors(mysqlConn, filmId);
        // put the actor list into the film document
        filmDoc.put("Actors", actorList);
        filmCollection.insert(filmDoc); // insert the film

    }

Anyway, hopefully this might be of some use to those moving from MySQL to MongoDB.  Comments welcome!

Using GET DIAGNOSTICS in MySQL 5.6

When Steven and I wrote MySQL Stored Procedure Programming, our biggest reservation about the new stored procedure language was the lack of support for proper error handling.  The lack of the SIGNAL and RESIGNAL clauses prevented a programmer from raising an error that could be propagated properly through a call stack, and the lack of a general purpose exception handler that could examine error codes at run time led to awkward exception handling code at best, and poorly implemented error handling at worst.

In 5.4 MySQL implemented the SIGNAL and RESIGNAL clauses (see http://guyharrison.squarespace.com/blog/2009/7/13/signal-and-resignal-in-mysql-54-and-60.html), which corrected half of the problem.  Now finally, MySQL 5.6 implements the ANSI GET DIAGNOSTICS clause and we can write a general catch-all exception handler.

Here’s an example:

image

The exception handler is on lines 10-27.  It catches any SQL exception, then uses the GET DIAGNOSTICS clause to fetch the SQLSTATE, MySQL error code and message into local variables.  We then decide what to do for anticipated errors (duplicate or badly formed product codes) and SIGNAL a more meaningful application error.  Unexpected errors are RESIGNALed on line 24.
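
Since the procedure itself is only shown as an image, here is the same control flow sketched in Ruby rather than in stored procedure syntax: catch everything, inspect the error code, translate the anticipated errors and pass the rest up unchanged. The DbError and ProductError classes, the add_product method and the message text are all hypothetical; 1062 is MySQL's duplicate-key error code.

```ruby
# Hypothetical stand-in for a driver error carrying MySQL diagnostics.
class DbError < StandardError
  attr_reader :errno, :sqlstate

  def initialize(msg, errno, sqlstate)
    super(msg)
    @errno = errno
    @sqlstate = sqlstate
  end
end

# Hypothetical application-level error for anticipated failures.
class ProductError < StandardError; end

DUPLICATE_KEY = 1062 # MySQL error code for a duplicate key (ER_DUP_ENTRY)

def add_product(code)
  yield code                 # the statement that may fail
rescue DbError => e          # catch-all handler
  case e.errno
  when DUPLICATE_KEY
    # Anticipated error: raise something more meaningful (like SIGNAL).
    raise ProductError, "Product #{code} already exists"
  else
    raise                    # unexpected: pass it up as is (like RESIGNAL)
  end
end

begin
  add_product('P100') { raise DbError.new('Duplicate entry', 1062, '23000') }
rescue ProductError => e
  puts e.message
end
```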

This is a great step forward in maturity for MySQL stored procedures: until now, the lack of a means to programmatically examine error codes made proper error handling difficult or impossible.

Thanks to Ernst Bonat of www.evisualwww.com for helping me work through the usage of GET DIAGNOSTICS.

SIGNAL and RESIGNAL in MySQL 5.4 and 6.0

One of the most glaring omissions in the MySQL stored procedure implementation was the lack of the ANSI standard SIGNAL and RESIGNAL clauses.  These allow a stored procedure to conditionally return an error to the calling program.

When Steven and I wrote MySQL Stored Procedure Programming we lamented this omission, and proposed an admittedly clumsy workaround.  Our workaround involved creating a procedure that executed dynamic SQL in which the error message was embedded in the name of a non-existent table.  When the procedure was executed, the non-existent table name at least allowed the user to see the error.  So for instance, here is the my_signal procedure:

Read More

MySQL stored procedures with Ruby

Ruby's getting an incredible amount of attention recently, largely as the result of Ruby on Rails.  I've played a little with Ruby on Rails and it certainly is the easiest way I've seen so far to develop  web interfaces to a back-end database.

At the same time, I've been shifting from Perl to Java as my language of choice for any serious database utility development.  But I still feel the need for something dynamic and hyper-productive when I'm writing something one-off or for my own use.  I've been playing with Python, but if Ruby has the upper hand as a web platform then maybe I should try Ruby.

So seeing as how I've just finished the MySQL stored procedure book, first thing is to see if I can use Ruby for MySQL stored procedures.

Database - and MySQL - support for Ruby is kind of all over the place.  There's a DBI option (similar to Perl's) which provides a consistent interface, and there are also native drivers.  For MySQL there are pure-Ruby native drivers and drivers written in C.  Since the DBI is based on the native driver, I thought I'd try the native driver first.  The pure-Ruby driver gave me some problems so I started with the C driver on Linux (RHAS4).

Retrieving multiple result sets

The main trick with stored procedures is that they might return multiple result sets. OUT or INOUT parameters can be an issue too, but you can always work around that using session variables. 

If you try to call a stored procedure that returns a result set, you'll at first get a "procedure foo() can't return a result set in the given context" error.  This is because the CLIENT_MULTI_RESULTS flag is not set by default when the connection is created.  Luckily we can set that in our own code:

dbh=Mysql.init
dbh.real_connect("127.0.0.1", "root", "secret", "prod",3306,nil,Mysql::CLIENT_MULTI_RESULTS)

The "query" method returns a result set as soon as it is called, but I found it easier to retrieve each result set manually, so I set the query_with_result attribute to false:

dbh.query_with_result=false

The next_result and more_results methods are implemented in the Ruby MySQL driver, but there are some weird things about the more_results C API call that cause problems in Python and PHP.  In Ruby, the more_results call returns true whether or not there is an additional result.   The only reliable way I found to determine if there is another result set is to try to grab the results and bail out if an exception fires (the exception doesn't generate an error code, btw):
      
    dbh.query("CALL foo()")
    begin
      rs=dbh.use_result
    rescue Mysql::Error => e 
      no_more_results=true
    end

We can then call next_result at the end of each rowset loop.  So here's a method that dumps all the result sets from a stored procedure call as XML using this approach (I know the Ruby is probably crap; it's like my 3rd Ruby program):

def procXML(dbh,sql)
  connect(dbh)                  # connect is a helper that is not shown here
  no_more_results=false
  dbh.query(sql)
  printf("<?xml version='1.0'?>\n")
  printf("<proc sql=\"%s\">\n",sql)
  result_no=0
  until no_more_results
    begin
      rs=dbh.use_result         # raises Mysql::Error once no result set is left
    rescue Mysql::Error => e
      no_more_results=true
    end
    if no_more_results==false
      result_no+=1
      fields=rs.fetch_fields    # column metadata, fetched once per result set
      rowno=0
      # attribute values are quoted so that the output is well-formed XML
      printf("\t<resultset id=\"%d\" columns=\"%d\">\n",result_no,fields.size)
      rs.each do |row|
        rowno+=1
        printf("\t\t<row no=\"%d\">\n",rowno)
        fields.each_with_index do |col,i|
          printf("\t\t\t<colvalue column=\"%s\">%s</colvalue>\n",col.name,row[i])
        end
        printf("\t\t</row>\n")
      end
      printf("\t</resultset>\n")
      rs.free
      dbh.next_result           # move on to the next result set, if any
    end
  end
  printf("</proc>\n")
end
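
One thing the method above does not do is escape the SQL text or the column values, so a value containing < or & would produce malformed XML. A minimal way to handle that might be a small helper like the following; the name xml_escape is my own and is not part of the driver.

```ruby
# Characters that must be replaced inside XML text and attribute values.
XML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
  '"' => '&quot;', "'" => '&apos;'
}.freeze

# Hypothetical helper: returns a copy of value that is safe to embed in
# XML. nil (a SQL NULL) becomes an empty string.
def xml_escape(value)
  value.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

printf("<colvalue column=\"%s\">%s</colvalue>\n",
       xml_escape('name'), xml_escape('Tom & "Jerry" <cat>'))
```

In procXML this would mean wrapping sql, col.name and row[i] in xml_escape before they are passed to printf.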

No C programming required!

Whew!  No need to hack into the C code.  So you can use MySQL stored procedures in Ruby with the existing native C driver. The problem is that the C driver is not yet available as a binary on Windows, and trying to compile it turned out to be beyond my old brain (and yes, I used MinGW and all the other "right" things).   Hopefully a binary copy of the MySQL driver will be available in the one-click Ruby installer eventually.

The above code doesn't work using the pure-Ruby driver on Windows, by the way - there's an "out of sequence" error when trying to execute the stored proc.  I might hack around on that later (at the moment I'm at 35,000 ft with 15 minutes of battery left on the way to the MySQL UC).  For now, if you want to use MySQL stored procedures in a Ruby program on Windows, I can't help.

Note that Ruby seems to hit a bug that causes MySQL to go away if there are two calls to the same stored proc in the same session and the stored proc is created using server-side prepared statements.  Fixed soon hopefully, but for now if you get a "MySQL server has gone away" error you might be hitting the same problem.   Wez posted on this problem here.

I suppose the end of this investigation will probably be to see if there's any way to use stored procedure calls to maintain a Rails ActiveRecord object.  Not that I think you'd necessarily want to, but it would probably be a good learning exercise.


Building ruby with Oracle and MySQL support on windows

If you did the setup necessary to compile Perl with MySQL and Oracle support, you are well set up to do the same for Ruby.  Why this should be so hard I don't know:  Python produces very easy to install Windows binaries, but if you want anything beyond the basics in Perl and Ruby you need to try and turn Windows into Unix first. Sigh.

http://www.rubygarden.org/ruby?HowToBuildOnWindows explains the Ruby build procedure.   I'm really just adding instructions for getting the Ruby DBI modules for MySQL and Oracle.

Make sure MinGW and MSYS are first in your path.

Enter the mingw shell:  sh

sh-2.04$ ./configure --prefix=/c/tools/myruby

sh-2.04$ make

sh-2.04$ make test

sh-2.04$ make install

Now, let's do RubyGems:

sh-2.04$ cd /tmp/rubygems
sh: cd: /tmp/rubygems: No such file or directory
sh-2.04$ cd /c/tmp
sh-2.04$ cd rubygems
sh-2.04$ export PATH=/c/tools/myruby/bin:$PATH
sh-2.04$ which ruby.exe
/c/tools/myruby/bin/ruby.exe
sh-2.04$ ls
rubygems-0.8.11
sh-2.04$ cd rubygems-0.8.11

sh-2.04$ unset RUBYOPT  #If you have cygwin this might be set
sh-2.04$ ruby ./setup.rb
c:\tools\myruby\bin\ruby.exe: no such file to load -- ubygems (LoadError)

Read More

Compiling DBD::mysql and DBD::Oracle on windows

Last week my laptop crashed and while installing the new one I decided to update my Perl versions. I mainly use the DBD::mysql and DBD::Oracle modules and although I'm comfortable building them on Linux/Unix, like most people I use the ActiveState binaries on Windows.

However it turns out that Oracle licensing changes now prevent ActiveState from distributing an Oracle binary, so I was forced to build them from source. It wasn't easy, but now both the Oracle and MySQL modules are working. Here's the procedure in case it helps anyone.

Install Pxperl

Firstly, you probably want to move to the Pxperl Windows binaries. Pxperl supports the familiar CPAN system for updates. Get Pxperl at www.pxperl.com. The installation should be straightforward.

I installed into c:\tools\pxperl

Install MinGW

You'll need a C compiler capable of building native Windows binaries. I used the MinGW system. You can't use Cygwin's own compiler, although I believe that Cygwin might be capable of installing MinGW. Anyway, I got the MinGW system from http://www.mingw.org/. I couldn't use the auto-installer for firewall reasons, so I did a manual download and install.

Firstly, I unpacked the following .gz files into c:\tools\mingw:

  • gcc-java-3.4.2-20040916-1.tar.gz
  • gcc-objc-3.4.2-20040916-1.tar.gz
  • mingw-runtime-3.9.tar.gz
  • w32api-3.5.tar.gz
  • binutils-2.15.91-20040904-1.tar.gz
  • mingw-utils-0.3.tar.gz
  • gcc-core-3.4.2-20040916-1.tar.gz
  • gcc-g++-3.4.2-20040916-1.tar.gz

You probably don't need all of these, and of course the version numbers might be different by the time you read this.

Then I ran the following two executables

  • MSYS-1.0.10.exe
  • msysDTK-1.0.1.exe

...installing both into c:\tools\msys. You must make sure you provide the correct location for MinGW when prompted. Finally, MinGW installs its own version of perl, so I removed that, as well as its make.exe, which is inferior.

I added both the bin directories to my path, which now starts something like this:  c:\mysql;c:\tools\msys\1.0\bin; c:\tools\mingw\bin; C:\tools\PXPerl\parrot\bin; C:\tools\PXPerl\bin
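
Because the order of directories in the path decides which make or perl actually gets run, it can be worth checking what will be picked up. Here is a small Ruby sketch that lists every match for a program name on the path, in search order; the method name find_on_path and the list of extensions tried are my own assumptions.

```ruby
# List, in search order, every file on the path that matches the given
# program name. The first entry is the one the shell would normally run.
def find_on_path(program, path = ENV['PATH'].to_s, exts = ['', '.exe', '.bat'])
  # Split on the platform separator (";" on Windows, ":" on Unix) and
  # tolerate stray spaces or empty entries between separators.
  dirs = path.split(File::PATH_SEPARATOR).map(&:strip).reject(&:empty?)
  dirs.flat_map do |dir|
    exts.map { |ext| File.join(dir, program + ext) }
        .select { |candidate| File.file?(candidate) }
  end
end

puts find_on_path('make')
```

If more than one make or perl shows up, the first one listed wins, which is why the extra copies were removed above.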

Installing DBD::Oracle

Now you can go into cpan (just type CPAN at the command line) and run "install DBI".  That worked OK for me.

Then I ran "install DBD::Oracle".  That failed.  I can't remember the exact error, but it turns out that a trailing backslash in the include directory for the DBI doesn't work on Windows.  To fix that, run "configure_pxperl" and add an include for that directory in the "Include Directories" section.  For me, the directory was /tools/PXPerl/site/lib/auto/DBI , since I installed pxperl into the tools directory.

Installing DBD::mysql

For some reason I thought this would be the easy part, but it actually was really difficult.

In the end, it turns out you need to create your own version of mysqlclient.lib and manually link to that. Check out MySQL Bugs: #8906 for some more details.  Here are the steps that worked for me:

  1. Run "install DBD::mysql" from the CPAN prompt.
  2. You will get a whole lot of undefined symbol errors which will include the names of the normal mysql client API calls, suffixed with '@4' , '@0' , etc. Make a list of all of these.
  3. Add the missing symbols to the file include/libmysql.def.
  4. Build your own libmysqlclient library with the following command (from the directory just above your include directory): dlltool --input-def include/libmySQL.def --dllname lib/libmySQL.dll --output-lib lib/libmysqlclient2.a -k
  5. Go to the CPAN build area for DBD-mysql; for me that was: cd \tools\PXPerl\.cpan\build\DBD-mysql-3.0002
  6. nmake realclean
  7. perl Makefile.PL --libs="-L/mysql/lib -lmysqlclient2 -lz -lm -lcrypt -lnsl"
  8. nmake install
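
Steps 2 and 3 are tedious to do by hand. As a rough sketch, a few lines of Ruby could pull the decorated symbol names out of a saved build log. I'm assuming here that the linker reports them in the usual GNU ld form (undefined reference to `name@N'); the sample log lines are invented, and the exact format you see may differ.

```ruby
# Invented sample of linker output; in practice, read the saved build log.
log = <<~LOG
  dbdimp.o: undefined reference to `mysql_init@4'
  dbdimp.o: undefined reference to `mysql_close@4'
  mysql.o: undefined reference to `mysql_init@4'
  mysql.o: undefined reference to `mysql_server_end@0'
LOG

# Collect each decorated symbol (name plus @N suffix) exactly as the
# linker reported it, once each, in sorted order.
def missing_symbols(log_text)
  log_text.scan(/undefined reference to [`'](\w+@\d+)'/)
          .flatten.uniq.sort
end

puts missing_symbols(log)
```

The resulting list still has to be merged into the .def file by hand, and it is worth checking it against the errors before trusting it.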

And - voila! - you should be OK. The only thing you might need to do now is add the top level MySQL directory to your path.  DBD::mysql wants to find "lib/mysql.dll" so you need to add the directory above that to your path.  I moved all the libraries to c:\mysql\lib and include files to c:\mysql\include, so I added to my path like this:

set PATH=c:\mysql;%PATH%

All done!

Seems to be working OK now for both Oracle and MySQL.  It was much more difficult than installing the ActiveState binaries, but at least now that I'm working from source I can potentially fix bugs. Having done that on Linux, though, I can say it's not for the faint-hearted (or the incompetent in C++!).

Hopefully pxperl will gain in popularity and as it matures things will work as easily as on Linux.  That would be great.
