Technical stuff: 2016

In mathematics and computer programs, discretization concerns the process of transferring continuous functions, models, and equations into discrete counterparts.

In my case I want to map a continuous integer range to a discrete value.
For example consider the following maping
[100 .. 299] --> 100

[300 .. 799] --> 300

[800 .. 999] --> 800

I created the following discretization table in Cassandra:

CREATE TABLE range_mapping (
k int,
lower int,
upper int,
mapped_value int,
PRIMARY KEY (k, lower, upper)
);

Problem is that I can't use a query like select mapped_value from range_mapping where k=0 and ? between lower and upper in Cassandra since there is no support for non-EQ relations on two clustering columns in the same query.
Issuing a query like
select * from range_mapping where k = 0 and lower <= 150 and upper >= 150 allow filtering;
returns an error stating "Clustering column "upper" cannot be restricted (preceding column "lower" is restricted by a non-EQ relation)"

The solution I found was using a combination of a clustering column and a secondary index.
I was inspired by this answer on SO.

I removed the ‘upper’ column from the PK so it is no longer a clustering column, and I added a secondary index over it. This assumes there is no overlap in the continuous ranges so having only 'lower' column in the PK provides uniqueness.

I had to add a ‘dummy’ column with a constant value to be able to use an non-eq operator on the ‘upper’ column.

Now that it’s not a clustering column and it has an index I can use the following table to map continuous ranges to a discrete values.

CREATE TABLE range_mapping (
k int,
lower int,
upper int,
dummy int,
mapped_value int,
PRIMARY KEY (k, lower)

);

CREATE INDEX upper_index on range_mapping(upper);

CREATE INDEX dummy_index on range_mapping(dummy);

Put in some data:

INSERT INTO range_mapping (k, dummy, lower, upper, mapped_value) VALUES (0, 0, 0, 99, 0);

INSERT INTO range_mapping (k, dummy, lower, upper, mapped_value) VALUES (0, 0, 100, 199, 100);

INSERT INTO range_mapping (k, dummy, lower, upper, mapped_value) VALUES (0, 0, 200, 299, 200);

Now my updated query works as expected:

select mapped_value from range_mapping where k = 0 and dummy = 0 and lower <= 150 and upper >= 150 allow filtering;

returns the value '100'.

Technical stuff

Sunday, April 10, 2016

Mapping a continuous range to a discrete value in Cassandra

Followers

Blog Archive

About Me