Sunday, April 10, 2016

Mapping a continuous range to a discrete value in Cassandra

In mathematics and computer programs, discretization concerns the process of transferring continuous functions, models, and equations into discrete counterparts.

In my case I want to map a continuous integer range to a discrete value.
For example, consider the following mapping:
[100 .. 299] --> 100
[300 .. 799] --> 300
[800 .. 999] --> 800
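
To make the mapping concrete outside Cassandra, here is a minimal in-memory Java sketch of the same lookup. It is only an illustration, not part of the Cassandra solution: the class RangeMapper and its methods are made-up names, and it assumes that the ranges do not overlap.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: maps an integer to the discrete value of the range containing it.
class RangeMapper {
    // key = lower bound of the range, value = { upper bound, mapped value }
    private final TreeMap<Integer, int[]> ranges = new TreeMap<Integer, int[]>();

    void addRange(int lower, int upper, int mappedValue) {
        ranges.put(lower, new int[] { upper, mappedValue });
    }

    // Returns the mapped value, or null when no range contains the input.
    Integer map(int value) {
        // the only candidate is the range with the greatest lower bound <= value
        Map.Entry<Integer, int[]> candidate = ranges.floorEntry(value);
        if (candidate == null || value > candidate.getValue()[0]) {
            return null;
        }
        return candidate.getValue()[1];
    }

    public static void main(String[] args) {
        RangeMapper mapper = new RangeMapper();
        mapper.addRange(100, 299, 100);
        mapper.addRange(300, 799, 300);
        mapper.addRange(800, 999, 800);
        System.out.println(mapper.map(150)); // 100
        System.out.println(mapper.map(799)); // 300
        System.out.println(mapper.map(50));  // null, no range contains 50
    }
}
```

The lookup needs both conditions, lower <= value and value <= upper, which is exactly the pair of restrictions that is awkward to express in a single Cassandra query.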

I created the following discretization table in Cassandra:

CREATE TABLE range_mapping (
   k int,
   lower int,
   upper int,
   mapped_value int,
   PRIMARY KEY (k, lower, upper)
);

The problem is that I can't use a query like
select mapped_value from range_mapping where k = 0 and ? between lower and upper;
in Cassandra, since there is no support for non-EQ relations on two clustering columns in the same query.
Issuing a query like
select * from range_mapping where k = 0 and lower <= 150 and upper >= 150 allow filtering;
returns an error stating "Clustering column "upper" cannot be restricted (preceding column "lower" is restricted by a non-EQ relation)".

The solution I found was using a combination of a clustering column and a secondary index.
I was inspired by this answer on SO.
I removed the ‘upper’ column from the PK so it is no longer a clustering column, and I added a secondary index on it. This assumes the continuous ranges do not overlap, so having only the 'lower' column in the PK is enough for uniqueness.
I also had to add a ‘dummy’ column with a constant value, and an index on it, to be able to use a non-EQ operator on the ‘upper’ column. As far as I can tell, a query on indexed columns needs at least one of them restricted by an EQ relation.
Now that ‘upper’ is not a clustering column and has an index, I can use the following table to map continuous ranges to discrete values.

CREATE TABLE range_mapping (
   k int,
   lower int,
   upper int,
   dummy int,
   mapped_value int,
   PRIMARY KEY (k, lower)
);
CREATE INDEX upper_index on range_mapping(upper);
CREATE INDEX dummy_index on range_mapping(dummy);

Put in some data:

INSERT INTO range_mapping (k, dummy, lower, upper, mapped_value) VALUES (0, 0, 0, 99, 0);
INSERT INTO range_mapping (k, dummy, lower, upper, mapped_value) VALUES (0, 0, 100, 199, 100);
INSERT INTO range_mapping (k, dummy, lower, upper, mapped_value) VALUES (0, 0, 200, 299, 200);

Now my updated query works as expected:

select mapped_value from range_mapping where k = 0 and dummy = 0 and lower <= 150 and upper >= 150 allow filtering;

returns the value '100'.

Tuesday, February 18, 2014

Cassandra SSTableSimpleUnsortedWriter and Non-Compact Storage

Patrick Callaghan of DataStax created a sample project showing how to bulk load data into Cassandra (thanks Patrick!).
His code adds a 'marker' to every row. The 'marker' is a cell with values only for the clustering column(s) of the primary key and an empty string for the column value.
I asked him
Why do you need the cql3 row marker in I haven't seen any reference to this pattern before.
Is it mandatory? What happens if you don't add it?
He replied
The row marker is an important part of the difference between compact and non-compact tables. I am creating a non-compact table and this requires a marker for the different clustering columns.
Q: How much space do I save using Compact Storage?
A: Non-Compact Storage adds 2 bytes of overhead per internal cell. The comparator used for these cells is a CompositeType instead of a single-component comparator like UTF8Type.
Q: When can I use Compact storage?
A: You can use Compact Storage if your table uses compound primary keys (more than one column in the PK) and you have only one data column, or if you have a table with a single-column primary key.
Q: Is using Compact Storage recommended?
A: No. Non-Compact is the default option for new tables, for these reasons:
   1. The overhead is further diminished by sstable compression, which is enabled by default since Cassandra 1.1.0.
   2. Collections require CompositeType comparators, so Non-Compact Storage is highly recommended if you want to be able to evolve your table with collections in the future.
   3. If your table uses a compound primary key, Compact Storage prevents you from evolving the table to have more than one data column.
Q: What's the risk of not marking rows with the cql3 marker in SSTable files?
A: I don't know. I wouldn't want to be the first to find out :)
I did bulk load rows without the marker into a Non-Compact Storage table in a PoC and it worked well but I wouldn't want to try it out in production.

The bottom line, which I found surprising since it's almost undocumented, is that if you use a table with a compound primary key it is best practice (maybe even required) to add an empty cell 'marker' to every row when using SSTableSimpleUnsortedWriter.

Monday, July 1, 2013

Cassandra: Using compound keys with SSTableSimpleUnsortedWriter and sstableloader

I started using Cassandra 1.2.5. I created a keyspace and a table with a compound key using CQL3.
   create keyspace test_keyspace with replication = {'class': 'SimpleStrategy', 'replication_factor':1};
   create table test_table ( k1 bigint, k2 bigint, created timestamp, PRIMARY KEY (k1, k2) ) with compaction = { 'class' : 'LeveledCompactionStrategy' };
My next task was to populate the table with a lot of data. I used sstableloader for the task, which takes input created via SSTableSimpleUnsortedWriter. The code sample uses a simple key, not a compound key. I looked at the classes in the org.apache.cassandra.db.marshal package and found CompositeType, which looked like what I should be using. Intuitively I thought that since my key is a compound key, the row key is a CompositeType and the rest works as in the simple example, so I tried the following code:
   List<AbstractType<?>> compositeList = new ArrayList<AbstractType<?>>();
   compositeList.add( LongType.instance );
   compositeList.add( LongType.instance );
   CompositeType compositeType = CompositeType.getInstance( compositeList );
   SSTableSimpleUnsortedWriter sstableWriter = new SSTableSimpleUnsortedWriter(
      new File( System.getProperty( "output" ) ),
      new Murmur3Partitioner(),
      64 );
   long timestamp = System.currentTimeMillis();
   long nanotimestamp = timestamp * 1000;
   long k1 = 5L;
   long k2 = 10L;
   sstableWriter.newRow( compositeType.builder().add( bytes( k1 ) ).add( bytes( k2 ) ).build() );
   sstableWriter.addColumn( bytes( "created" ), bytes( timestamp ), nanotimestamp );
I then loaded the sstable files to Cassandra using the command "sstableloader -v -debug test_keyspace/test_table/" The command ends without any indication of a problem, but the table remains empty. I went over the node log file and saw this cryptic exception:
java.lang.RuntimeException: java.lang.IllegalArgumentException
        at org.apache.cassandra.service.RangeSliceVerbHandler.doVerb(
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(
        at java.util.concurrent.ThreadPoolExecutor$
Caused by: java.lang.IllegalArgumentException
        at java.nio.Buffer.limit(
        at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(
        at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(
        at org.apache.cassandra.db.marshal.AbstractCompositeType.split(
        at org.apache.cassandra.db.filter.ColumnCounter$GroupByPrefix.count(
        at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(
        at org.apache.cassandra.db.filter.QueryFilter.collateColumns(
        at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(
        at org.apache.cassandra.db.RowIteratorFactory$2.getReduced(
        at org.apache.cassandra.db.RowIteratorFactory$2.getReduced(
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(
        at org.apache.cassandra.db.ColumnFamilyStore$3.computeNext(
        at org.apache.cassandra.db.ColumnFamilyStore$3.computeNext(
        at org.apache.cassandra.db.ColumnFamilyStore.filter(
        at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(
        at org.apache.cassandra.service.RangeSliceVerbHandler.executeLocally(
        at org.apache.cassandra.service.RangeSliceVerbHandler.doVerb(
        ... 4 more
I sent a question to the Cassandra user mailing list and got a reply from Aaron Morton which pointed me in the right direction. I inserted a row manually and used cassandra-cli to see what the data looks like:
RowKey: 5
=> (column=10:created, value=0000013f84be6288, timestamp=1372321637000000)
From this example you can see that the row key is a single Long value "5", and that the row has one composite column "10:created" with a timestamp value. Thus the code should look like this:
   List<AbstractType<?>> compositeList = new ArrayList<AbstractType<?>>();
   compositeList.add( LongType.instance );
   compositeList.add( LongType.instance );
   CompositeType compositeType = CompositeType.getInstance( compositeList );
   SSTableSimpleUnsortedWriter sstableWriter = new SSTableSimpleUnsortedWriter(
      new File( System.getProperty( "output" ) ),
      new Murmur3Partitioner(),
      64 );
   long timestamp = System.currentTimeMillis();
   long nanotimestamp = timestamp * 1000;
   long k1 = 5L;
   long k2 = 10L;
   sstableWriter.newRow( bytes( k1 ) );
   sstableWriter.addColumn( compositeType.builder().add( bytes( k2 ) ).add( bytes( "created" ) ).build(), bytes( timestamp ), nanotimestamp );
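
The exception from the first attempt makes more sense once you look at how a composite column name is laid out in bytes. As far as I understand the CompositeType format, each component is written as a 2-byte length, the component bytes, and a 1-byte end-of-component marker. The sketch below builds the name for (10, "created") by hand with plain java.nio and no Cassandra classes. CompositeNameSketch and its methods are hypothetical names used only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hand-rolled illustration of the composite layout; not a replacement for CompositeType.
class CompositeNameSketch {
    // Encodes every component as <2-byte length><bytes><1-byte end-of-component>.
    static ByteBuffer encode(byte[]... components) {
        int size = 0;
        for (byte[] component : components) {
            size += 2 + component.length + 1;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (byte[] component : components) {
            out.putShort((short) component.length);
            out.put(component);
            out.put((byte) 0); // end-of-component marker
        }
        out.flip();
        return out;
    }

    static byte[] longBytes(long value) {
        return ByteBuffer.allocate(8).putLong(value).array();
    }

    // The length a composite reader would take from the first two bytes of a name.
    static int firstComponentLength(byte[] name) {
        return ByteBuffer.wrap(name).getShort() & 0xFFFF;
    }

    public static void main(String[] args) {
        byte[] created = "created".getBytes(StandardCharsets.UTF_8);
        ByteBuffer name = encode(longBytes(10L), created);
        System.out.println(name.remaining());              // 21 = (2 + 8 + 1) + (2 + 7 + 1)
        System.out.println(firstComponentLength(created)); // 25458 = 0x6372, the bytes of "cr"
    }
}
```

A plain column name such as "created", when read with a composite comparator, looks like a first component of 25458 bytes while only 7 bytes are available. That would be consistent with the IllegalArgumentException thrown from Buffer.limit in the stack trace above.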

Tuesday, June 5, 2012

Generating sender-receiver pairs from a list of names (Secret Santa)

A colleague asked me to create a program that generates sender-receiver pairs from a list of names, for a game we are playing in the office (it resembles "Secret Santa"). Each person should be both a sender and a receiver.
Since the colleague is not a developer and works with Windows, I chose to implement it in JavaScript, so all she'd need is to open a browser.

You can see the code below and give it a try here.

<html>
<head>
<script type="text/javascript">
function makePairs() {
  var namesTextArea = document.getElementsByName("listOfNames")[0].value;
  var names = namesTextArea.split('\n');
  if ( names.length % 2 != 0 ) {
    alert("there is an odd number of names, please make sure the number of names is even");
    return false;
  }
  var html = "";
  var senders = [];
  var recipients = [];
  // randomly split the names into sender:recipient pairs
  while ( names.length > 0 ) {
    var first_index = Math.floor( Math.random() * names.length );
    var sender = names.splice(first_index, 1)[0];
    var second_index = Math.floor( Math.random() * names.length );
    var recipient = names.splice(second_index, 1)[0];
    html += sender + ":" + recipient + "\n";
    senders.push( sender );
    recipients.push( recipient );
  }
  // make each recipient a sender and vice versa
  while ( senders.length > 0 ) {
    var sender_index = Math.floor( Math.random() * senders.length );
    var newRecipient = senders.splice(sender_index, 1)[0];
    var recipient_index = Math.floor( Math.random() * recipients.length );
    // use recipient_index here, not the stale second_index from the first loop
    var newSender = recipients.splice(recipient_index, 1)[0];
    html += newSender + ":" + newRecipient + "\n";
  }
  document.getElementsByName("pairs")[0].value = html;
  return false;
}
</script>
</head>
<body>
<table>
<tr>
<td valign="top">Put a list with an even number of names here:</td>
<td valign="top"><textarea id="listOfNames" name="listOfNames" cols="30" rows="50"></textarea></td>
<td valign="top"><input type="button" value="Create pairs" onclick="javascript:return makePairs()"></td>
<td valign="top"><textarea id="pairs" name="pairs" cols="30" rows="50"></textarea></td>
</tr>
</table>
</body>
</html>
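
A different approach, which is not the one used in the page above, is to shuffle the list and let every person send to the next one, wrapping around at the end. Everybody is still both a sender and a receiver, an odd number of names is fine, and nobody is paired with themselves as long as the names are unique and there are at least two of them. Here is a minimal Java sketch of that idea; the class name GiftCycle is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical alternative: one random cycle through all the names.
class GiftCycle {
    // Returns sender -> receiver; each name appears once as sender and once as receiver.
    static Map<String, String> makePairs(List<String> names, Random random) {
        List<String> shuffled = new ArrayList<String>(names);
        Collections.shuffle(shuffled, random);
        Map<String, String> pairs = new LinkedHashMap<String, String>();
        for (int i = 0; i < shuffled.size(); i++) {
            String sender = shuffled.get(i);
            // the last person in the shuffled list sends to the first one
            String receiver = shuffled.get((i + 1) % shuffled.size());
            pairs.put(sender, receiver);
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ann", "Bob", "Carol", "Dave", "Eve");
        for (Map.Entry<String, String> pair : makePairs(names, new Random()).entrySet()) {
            System.out.println(pair.getKey() + ":" + pair.getValue());
        }
    }
}
```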

Monday, March 5, 2012

Setting up a multi tenant environment using the Spring Framework

In this post I'll describe how to set up a multi tenant application using the Spring Framework (I am using Spring 3).
I'll use a pure Java application, but the concepts work well in enterprise apps.

The requirements are simple:
  • Single application to serve all the tenants
  • Every tenant uses the same wiring, each tenant has different properties 
  • Have common properties and wiring that can be shared by all the tenants
    • Allow each tenant to override the common settings
The solution:
  • Create a Spring ApplicationContext for every tenant
    • Set the PropertyConfigurer in runtime
Let's see this in action:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.springframework.context.support.ClassPathXmlApplicationContext;

public class MultiTenantSpringExample {
 // use ClassPathXmlApplicationContext instead of ApplicationContext so we can destroy them
 private ClassPathXmlApplicationContext commonCtx;
 private List<ClassPathXmlApplicationContext> tenantContexts;

 public void init( List<String> tenants ) {
  tenantContexts = new ArrayList<ClassPathXmlApplicationContext>( tenants.size() );
  // create a common application context, shared among all the tenants
  commonCtx = new ClassPathXmlApplicationContext( "/commonContext.xml" );
  // set up all the tenants
  for ( String tenant : tenants ) {
   // for each tenant create a Spring ApplicationContext
   ClassPathXmlApplicationContext tenantCtx = new ClassPathXmlApplicationContext();
   tenantCtx.setParent( commonCtx );
   tenantCtx.setConfigLocation( "/tenantContext.xml" );
   TenantPropertyPlaceholderConfigurer beanFactoryPostProcessor = new TenantPropertyPlaceholderConfigurer( tenant );
   tenantCtx.addBeanFactoryPostProcessor( beanFactoryPostProcessor );
   // a context created with the no-arg constructor has to be refreshed explicitly
   tenantCtx.refresh();
   tenantContexts.add( tenantCtx );
  }
 }

 public void destroy() {
  // destroy the tenant contexts
  for ( ClassPathXmlApplicationContext tenantContext : tenantContexts ) {
   tenantContext.close();
  }
  // destroy the common context
  commonCtx.close();
 }

 public static void main( String[] args ) {
  MultiTenantSpringExample example = new MultiTenantSpringExample();
  example.init( Arrays.asList( "a", "b" ) );
  example.destroy();
 }
}

In init() I create the common wiring (commonCtx), which is shared among tenants, and set it as the parent of every tenant application context with setParent().
I then inject the PropertyPlaceholderConfigurer, which is created differently for every tenant, with addBeanFactoryPostProcessor().

import org.springframework.beans.factory.config.PropertyPlaceholderConfigurer;
import org.springframework.core.io.ClassPathResource;
import org.springframework.core.io.Resource;

public class TenantPropertyPlaceholderConfigurer extends PropertyPlaceholderConfigurer {
 public TenantPropertyPlaceholderConfigurer( String tenant ) {
  setIgnoreResourceNotFound( true ); // this makes the common file and tenant file optional
  // prepare the default properties
  String defaultPropertiesResourcePath = "/";
  Resource defaultPropertiesResource = new ClassPathResource( defaultPropertiesResourcePath );
  // prepare tenant properties
  String tenantPropertiesResourcePath = '/' + tenant + ".properties";
  Resource tenantPropertiesResource = new ClassPathResource( tenantPropertiesResourcePath );
  // set the locations; properties in later locations override earlier ones
  Resource[] locations = new Resource[] { defaultPropertiesResource, tenantPropertiesResource };
  setLocations( locations );
 }
}

The TenantPropertyPlaceholderConfigurer uses classpath resources, using a common properties file, shared for all the tenants, and a per-tenant properties file.
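
The override order can be illustrated without Spring. In the sketch below plain java.util.Properties stands in for the two files: the tenant values are loaded after the common ones, so a tenant key replaces the common key with the same name, which mirrors how later locations override earlier ones in the configurer. The class PropertyOverrideSketch, the 'timeout' key and the inline file contents are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical stand-in for the common + per-tenant properties files.
class PropertyOverrideSketch {
    // Loads the common text first and the tenant text second, so tenant keys win.
    static Properties merge(String commonText, String tenantText) {
        Properties merged = new Properties();
        try {
            merged.load(new StringReader(commonText));
            merged.load(new StringReader(tenantText));
        } catch (IOException e) {
            // cannot really happen with in-memory readers
            throw new IllegalStateException(e);
        }
        return merged;
    }

    public static void main(String[] args) {
        String common = "timeout = 30\ntenant-i = 0\n";
        String tenantA = "tenant-i = 1\n";
        Properties a = merge(common, tenantA);
        System.out.println(a.getProperty("tenant-i")); // 1, overridden by the tenant
        System.out.println(a.getProperty("timeout"));  // 30, inherited from the common settings
    }
}
```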

To complete the example I created a simple class, A, holding an int, and printing the int in the print() method.
public class A {
 private int i;

 public void setI( int i ) {
  this.i = i;
 }

 public int getI() {
  return i;
 }

 public void print() {
  System.out.println( "Example property: " + i );
 }
}

And the resource files: commonContext.xml, empty in the example but can be used for sharing wiring among tenants

tenantContext.xml creates an instance of A for every tenant, each one with different property values, and calls the print() method after constructing the object to print the value.
 <bean class="A" id="a" init-method="print">
  <property name="i" value="${tenant-i}" />
 </bean>

And two matching properties files, one per tenant.
a.properties:
tenant-i = 1
b.properties:
tenant-i = 2

When executing the 'main' method, the program prints "Example property: 1" for tenant 'a' and "Example property: 2" for tenant 'b'.

Tuesday, February 21, 2012

Perforce and diff emails

After working with Subversion, and getting used to the colorful post-commit diff email, I wanted to set up a similar setting in Perforce.
Unfortunately I couldn't find post-commit hooks in Perforce.
So I came up with the following steps to generate the diff on the client side:
  1. Create a "Review" trigger in Perforce for the branches you want to monitor.
    In the visual Perforce client click "Connection" > "Edit current user" > "Reviews" tab, now right click and "Include" every branch you want to be notified about.
    This sends an email about every changelist submitted to Perforce, without specifying the changelist content
  2. Get the jar, or source code, that creates the HTML diff.

    I am aware that there are tools to convert a diff to HTML, but since my team is using Java (thus they have a JRE), mostly on Windows (so no Python), I wanted a Java converter.
  3. Use a VBA script in Outlook to convert the triggered email to a diff of the changelist content.
    The script fires up the local Perforce client and uses the command "p4 describe <changelist>" to create the diff, uses the Java class to convert the diff to HTML, and replaces the email body with the HTML diff.
    Open the script below in a text editor:

    Sub P4Diff(MyMail As MailItem)
        Dim strSplit As Variant
        Dim changelist As String
        Dim sOutput As String
        Dim sOutputErr As String
        Dim sP4Cmd As String
        Dim sJavaCmd As String
        Dim sJarPath As String
        Dim sP4Port As String
        Dim sP4User As String
        Dim sP4Password As String
        ' error handling directive
        On Error GoTo errMyErrorHandler
        ' Variables that need to be set per environment
        sP4Cmd = "C:\Progra~1\Perforce\p4.exe"
        sJavaCmd = "C:\Java\jre6\bin\java.exe"
        sJarPath = "C:\SomePath\p4diff.jar"
        sP4Port = "yourperforce:1667"
        sP4User = "youruser"
        sP4Password = "yourpassword"
        ' parse the changelist from the email subject
        strSplit = Split(MyMail.Subject, " ")
        changelist = strSplit(2)
        ' create a shell
        Set wshShell = VBA.CreateObject("WScript.Shell")
        ' set the p4 environment variables
        Set processEnvVars = wshShell.Environment("PROCESS")
        processEnvVars("P4PORT") = sP4Port
        processEnvVars("P4USER") = sP4User
        processEnvVars("P4PASSWD") = sP4Password
        ' execute the diff and wait for it to finish
        Set oExec = wshShell.Exec("%COMSPEC% /c " & sP4Cmd & " describe " & changelist & " | " & sJavaCmd & " -jar " & sJarPath)
        ' WshExecStatus value of a process that is still running (not predefined with late binding)
        Const WshRunning = 0
        Do While oExec.Status = WshRunning
            If Not oExec.StdOut.AtEndOfStream Then
                sOutput = sOutput & oExec.StdOut.ReadLine()
            End If
        Loop
        ' read whatever is left in the output after the process has finished
        Do While Not oExec.StdOut.AtEndOfStream
            sOutput = sOutput & oExec.StdOut.ReadLine()
        Loop
        ' Read the diff result and set in the email body
        sOutputErr = oExec.StdErr.ReadAll()
        MyMail.HTMLBody = sOutput & sOutputErr
        Exit Sub
    errMyErrorHandler:
        MsgBox Err.Description, _
            vbExclamation + vbOKCancel, _
            "Error: " & CStr(Err.Number)
    End Sub

    Change the script variables to match your environment.

    Install the VBA script:
    1.  In Outlook click "Tools" > "Macro" > "Visual Basic Editor"
    2. Open "ThisOutlookSession"
    3. Paste the edited script from your text editor into the VB editor and save
    4. Close the VBA editor and text editor

    Change the Outlook Macro security so it will be able to run the script

  4. Create a rule in Outlook to run the script when the email subject contains "PERFORCE change".
    For some reason this doesn't work well if you run the script AND move the email to a different folder

Sunday, November 20, 2011

Devoxx 2011

I attended the Devoxx 2011 conference in Antwerp, Belgium (thank you RSA Security).
This is the second large conference I've attended, the first being JavaOne in 2007, and I had a great time.

Stephan Janssen and the Devoxx team did an excellent job organizing the conference.
The conference was packed with over 3,500 participants, 95% men, situated in Metropolis Antwerp Business Center.
The lecture halls were actually cinema theatres. Even though I haven't seen a movie in them (I missed the "Tintin 3D" feature film) they are hands-down the best cinema theatres I've been to, with more than enough leg room, extremely comfortable seats and arm rests which you don't need to fight over :)
Between lectures the screens showed the twitter wall, which was a brilliant idea I liked very much.

Every participant got a wrist band, very similar to the one I got a few months ago from the maternity ward when my son was born.
The thing is, it has to remain on your wrist until the conference is over. I can see why some people found this annoying, but it actually worked in my favor: while I was sitting in the hotel lobby waiting for a taxi to the conference, the guy sitting next to me was wearing the same wristband, so we ended up talking and sharing the ride.

Day 1: The Java SE keynote by Henrik Stahl was good. There was no big announcement, but I liked the message "Java will always be there for you".
I got a little bored in Cameron Purdy's Java EE keynote, where he promoted WebLogic and Oracle servers, and I left before it ended.
The "Play 2.0" talk was very good. I was impressed with the creativeness of Play 1.0 when I started using it, but these guys don't rest. They added innovative features to Play 2.0 continuing to make web development easier and faster.
"7 reasons to love JBoss AS 7" sounded promising but lacked technical details. It felt more like a marketing pitch. But it sounds promising and I'll need to check it out.
I then moved to the "JRuby enhancing Java developers' lives" talk, but it started off showing how to write web applications in Ruby, so I left and joined the "PhoneGap" talk, which was very amusing. I even managed to pick up a thing or two about mobile development, which is something I haven't tried (yet).
Next was "NoSQL for Java developers", which showed what a restaurant directory application would look like in an RDBMS (MySQL), a key-value store (Redis), a document store (MongoDB) and a multicolumn DB (Cassandra).
I think it's wrong to demonstrate all of them using the same use case. Each DB was built to solve a specific set of problems. Assuming the goal of the lecture was to show how non-relational DBs differ from relational DBs, you need a different use case to show the strengths of every DB. E.g. show how your web application sessions would work with a KV store as opposed to an RDBMS, how document stores have flexible schemas as opposed to the rigid schemas of an RDBMS, and how query performance differs as you scale out.
My day ended with an excellent talk from Brian Goetz about "Language/Library co-evolution in Java SE 8" which focused on Lambda and Closures, and how the existing JDK libraries might be enhanced to use these new features.

Day 2: The keynote from Tim Bray convinced me I should write mobile apps. I'm pretty sure I'm not gonna save the world, but I'm sure it will be fun.
In the "Introducing Akka" talk I understood the concept of "Actors" and it sure sounds like a good tool to have in your developer toolbox.
"JMS 2.0" showed that there's not much to change in the JMS spec. If the spec wasn't 10 years old I would name the new spec version 1.2 :)
I then went to "Why we shouldn't target women", which was about the state of female developers in the IT world. I was surprised to hear that in France and the UK only 15% of CS graduates are women. It was an interesting discussion without any conclusions.
"Java Posse live" was a comic relief. I got a beer and sat on the stairs to watch the show. I'm not sure this Posse podcast will provide much value to the listeners :)
"Having fun with Java and Home Automation" was a peek at the future. You'll be able to control and follow your house on twitter.
I ended the day with the "Code Generation" talk which got me thinking about the need for code generation in enterprise projects in the annotations era, see what Play! does in this area. I'm not sure it's very useful these days. I did learn a bunch of new stuff from the Xtext, Xtend and Spring Roo code generation demos.

Day 3: The final day began with a technical discussion panel which was interesting mostly because of the cynical remarks from Oracle and Google people.
I decided I had to go to one HTML5 talk, so I entered "HTML5 Game Development", which was very good. Animation always makes me feel I should brush up on my mathematics.
The last talk was about "Shazam", which is an app that identifies songs by hearing them. Roy van Rijn did a fantastic job explaining how he built a similar application in Java, including a good explanation about Fourier Transformation using a yellow stick. His talk ended with a lively discussion about software patents and violation after getting intimidating emails from Landmark (the company holding the Shazam patents).

Notes for next year:
  • Don't stay at Novotel Antwerp. It has no public transportation and no entertainment in its vicinity.
  • Set aside more time to see Antwerp. I have no idea what the city looks like.
  • Get a larger suitcase. My small trolley almost didn't have room for all the marketing items handed out at the booths.
See you again next year!