Fluent Interfaces without Recursive Generics

Fluent interfaces are an API design style that allows you to be expressive while being terse. Consider the following example pseudo-API to split a string.

Splitter splitter = new Splitter(",");
splitter.setOmitEmptyStrings(true);
splitter.setTrimOutput(true);
List<String> parts = splitter.splitToList(input);

That is four lines packed with a lot of noise. The two setters could have been written as parameter-less methods, but I wanted to stick to the JavaBean conventions here. Now compare this with the fluent version of Splitter from Google Guava.

List<String> parts = Splitter.on(",") //
	.omitEmptyStrings() //
	.trimResults() //
	.splitToList(input);

Still four lines, but now each line clearly conveys intent in a very “fluent”, dialogue-like manner. So, armed with an understanding of the elegance that fluent interfaces bring to the table, you set out to create a fluent version of your new animal API. You have written the following classes with a fluent API:

public class Animal {

    public Animal eat(Food food) {
        food.consumedBy(this);
        return this;
    }
    
    
    public Animal run(int distance) {
        // some logic here
        return this;
    }
}

public class Dog extends Animal {
    
    public Dog bark() {
        // logic
        return this;
    }
    
    
    public Dog chaseTail() {
        // logic here
        return this;
    }
}

Now you attempt to use your newly created fluent API and write up the following client code:

public void someMethod() {
	Animal a = new Dog().eat(bone).bark(); // Cannot resolve method bark ???
}

You are suddenly perplexed. Why can’t you call bark()? It is because eat() returns an instance of Animal, not Dog, so even though the runtime instance is a Dog, static type checking only knows that you have an Animal. This particular situation is easily resolved by swapping the order of the two calls:

public void someMethod() {
	Animal a = new Dog().bark().eat(bone);
}

In a real-world API, we may not have the luxury of reordering calls like this (what if the dog really only likes to bark after eating!), and requiring a particular ordering of calls makes for an ugly API. The standard solution in such a case is to introduce generics with a particular twist - recursive generics.

public class Animal<A extends Animal<A>> {

    public A eat(int food) {
        //food.consumedBy(this);
        return (A)this;
    }


    public A run(int distance) {
        // some logic here
        return (A)this;
    }
}

public class Dog extends Animal<Dog> {

    public Dog bark() {
        // logic
        return this;
    }


    public Dog chaseTail() {
        // logic here
        return this;
    }
}

// in some method ...
Animal<Dog> dog = new Dog().eat(3).bark();

Problem solved? Well, almost. Until you realize that if you now include Animal as a member of any class, you bring along baggage in the form of recursive generics that you have to apply to the containing class (or method), and this can easily mushroom into very complex generic definitions. There is an alternative way of building fluent interfaces with inheritance but without generics, by accepting a slight compromise. We do this by introducing a very specific typing method that allows us to switch the static type.

public class Animal {

    public Animal eat(int food) {
        //food.consumedBy(this);
        return this;
    }


    public Animal run(int distance) {
        // some logic here
        return this;
    }


    // A way to cast our return type into a specific sub-type.
    public <T extends Animal> T typed() {
        return (T)this;
    }
}

public class Dog extends Animal {
	... // bark() and chaseTail() as before, each returning Dog
}

// caller
Animal dog = new Dog().eat(3).<Dog>typed().bark();

While not perfect, this technique allows us to switch between methods of the parent and child classes without incurring the added design-time overhead of generics.
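To see the generics baggage concretely, here is a hypothetical container class under each approach (the Kennel classes are illustrative and not part of the API above):

// With recursive generics, any class that holds an Animal must carry the type parameter along.
public class Kennel<A extends Animal<A>> {

    private A occupant;

    public void house(A animal) {
        this.occupant = animal;
    }
}

// With the typed() approach, the containing class stays free of generics.
public class SimpleKennel {

    private Animal occupant;

    public void house(Animal animal) {
        this.occupant = animal;
    }
}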


Curve Fitting

Here is a simple technique to map linear data points onto a curve of your choice. Suppose your data is linear, or has an inverse linear form, and you want to instead map it to a curve that accentuates the initial values and suppresses the tail, or one with an exponential drop at the end points. Cubic Bezier curves can be used to achieve the desired transformation. A Cubic Bezier curve is given by:

B(t) = (1 - t)³P0 + 3(1 - t)²tP1 + 3(1 - t)t²P2 + t³P3

Here P0 and P3 are the end-points of the curve, and P1 and P2 are control points. At the starting point, a Cubic Bezier curve is always tangential to the line segment P0-P1 and heads in the direction of P0 to P1. At the end point, the curve is tangential to P2-P3 and heads in the direction of P2 to P3. You can play around with various Cubic Bezier shapes at desmos.com.

When implementing a Cubic Bezier curve, the first oddity you will encounter is that, unlike polynomial equations, the Bezier does not give you y-coordinates as a function of x-coordinates. Instead, it independently gives you the x and y coordinates as functions of a parameter t which varies from 0 to 1. When t = 0, the curve is at point P0 and when t = 1, the curve is at point P3. Worse, when you vary t linearly, you do not get linear increments of x; you get points spaced along the curve itself. In fact, it is this property of Bezier curves that makes them suitable as easing functions for animations. By constructing a Cubic Bezier with steep starting and ending slopes, you can get a slow start along the x axis, followed by an acceleration in the middle and a gradual slowdown towards the end. But for mapping a traditional function of the form y = f(x), we would like to transform the Bezier into a similar function.

There is no simple mathematical way to achieve this given expressions like t² and t³ in the equation. Instead, one can approximate a solution by doing a binary search along t, computing the corresponding x until we are within a tolerance limit. At that point, the value of y can be computed. The gist below shows a Java program to do that. In addition, the program allows you to pre-compute y = Bezier(x) for x = (start, end) in specified increments.
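Since the embedded gist is not reproduced here, the following is a minimal sketch of the binary-search approach (the class name, tolerance and iteration cap are illustrative, and it assumes the control points are chosen so that x increases monotonically with t):

// Sketch: maps y = Bezier(x) by binary searching the parameter t.
public class CubicBezier {

    private static final double TOLERANCE = 1e-6;

    private final double x0, y0, x1, y1, x2, y2, x3, y3;

    public CubicBezier(double x0, double y0, double x1, double y1,
                       double x2, double y2, double x3, double y3) {
        this.x0 = x0; this.y0 = y0;
        this.x1 = x1; this.y1 = y1;
        this.x2 = x2; this.y2 = y2;
        this.x3 = x3; this.y3 = y3;
    }

    // B(t) for a single coordinate: (1 - t)³p0 + 3(1 - t)²t·p1 + 3(1 - t)t²·p2 + t³p3
    private static double bezier(double t, double p0, double p1, double p2, double p3) {
        double u = 1 - t;
        return u * u * u * p0 + 3 * u * u * t * p1 + 3 * u * t * t * p2 + t * t * t * p3;
    }

    // Binary-search t until the curve's x is within tolerance of the requested x, then compute y.
    public double yAt(double x) {
        double lo = 0, hi = 1, t = 0.5;
        for (int i = 0; i < 64; i++) {
            t = (lo + hi) / 2;
            double xt = bezier(t, x0, x1, x2, x3);
            if (Math.abs(xt - x) < TOLERANCE) {
                break;
            } else if (xt < x) {
                lo = t;
            } else {
                hi = t;
            }
        }
        return bezier(t, y0, y1, y2, y3);
    }
}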

The class above can be extended to create a composite Cubic Bezier curve fitting algorithm that combines multiple Cubic Bezier curves to model any shape you want to create.


Groovy Oddities

Groovy is a popular language for the JVM with a syntax that includes and extends Java’s. Groovy’s appeal comes from the extended syntax - optional semicolons, optional typing (by default, Groovy is dynamically typed), literals for lists and maps, closures, operator overloading, etc. However, these extra features are not always intuitive in their interactions, leading to some syntactic oddities. Here is a list of oddities that I’ve encountered. This is not an exhaustive list; if you have run into other issues, please do leave a comment below.

Null safety in every step

The null-safe operator allows the in-lining of nullability checks. So instead of:

if (student.name != null) {
	lower = student.name.toLowerCase()
}

You can write

lower = student.name?.toLowerCase()

The variable lower will be null if name is null. Armed with this powerful operator, you set out to show off your newly acquired skill. You are writing a function that accepts Account objects. You know that the account may sometimes be null (if a spurious username is provided). However, if an account is non-null, you know that it has a non-null createdOn field. Your task is to get the month from this field. You write something like this:

void accumulateMonths(Account account) {
   def month = account?.createdOn.month
   if (month) {
	  // Do something with month
   }
}

You run the program and upon receiving a null Account object, you see an exception!

Exception in thread "main" java.lang.NullPointerException: Cannot get property 'month' on null object
at org.codehaus.groovy.runtime.NullObject.getProperty(NullObject.java:57)

Why did Groovy not stop when it evaluated account?. and found a null? Why is it complaining about not being able to get the month property? The exception was thrown because we misinterpreted the semantics of the ?. operator. The null-safe operator does not, in fact, abort the current chain of calls on encountering a null. It is better translated to the construct below.

def a = account == null ? null : account.createdOn
def month = a.month
if (month) {
}

The null-safe operator is a close-cousin of the ternary operator, not the if condition!
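The fix, assuming createdOn really is non-null whenever the account exists, is to chain the null-safe operator at every step so that each dereference is guarded:

void accumulateMonths(Account account) {
   // Each ?. only guards its own dereference, so it has to be repeated along the chain.
   def month = account?.createdOn?.month
   if (month) {
      // Do something with month
   }
}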

Applying null-safe retrieval to map entries.

In Groovy, you can access map entries as map[key] and list entries as list[index]. So if we had a Map<String, List<Integer>>, and we want to make sure a map entry is not null before we access the list, should we be able to use the null-safe operator?

class NullSafeWithMap {

    static void main(def args) {     
        Map<String, List<Integer>> myMap = ['a': [1, 2, 3], 'b': null, 'c': [4, 5, 6]]
        println (myMap['b']?[0]) // Syntax error here.
    }
}

Unfortunately you will get a syntax error, since Groovy thinks the ? is part of the ternary operator. This is because the null-safe operator is ?., not just ?. Putting a period after the question mark will not help either.

myMap['b']?.[0] // Still a syntax error.

If you replace the list subscript operator with the get() method, the code will work:

println (myMap['b']?.get(0))

An equation to get NPE

Here is a brain teaser. What is the output of the program below?

static void main(def args) {
     println 4 + 5 * 3
}

Alright, that’s not much of a brain-teaser. The answer is obviously 19. It turns out that what I really wanted was (4 + 5) * 3, so that I get 27. That’s easy:

static void main(def args) {
	println (4 + 5) * 3
}

Think the answer will be 27? Think again! Lost in the focus on the math is the fact that you were able to write println 4 + 5 * 3 in the first place because Groovy lets you drop parentheses when a method call has a single argument and is unambiguous. So instead of printing the product of 4 + 5 and 3, Groovy attempts to multiply the result of println(4 + 5) by 3. println has a return type of void, which when forced into a return value becomes null. So at runtime, Groovy throws an exception stating that it cannot call multiply() on a null object.
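The way to actually get 27 is to parenthesize the entire argument so the multiplication happens before println is called:

static void main(def args) {
	// Explicit parentheses around the full expression: prints 27.
	println((4 + 5) * 3)
}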

Operator precedence rules

Groovy supports operator overloading and provides several pre-defined operators to make your code succinct, such as the << operator to add to lists.

List fruits = []
fruits << 'Apples' // fruits = ['Apples']

While writing your code, you come across a case where if citrus is true, you want to add ‘Oranges’ but otherwise you want to add ‘Apples’. Eager to show off your newly acquired Groovy skill, you write the following:

List fruits = []
boolean citrus = ... 
fruits << citrus ? 'Oranges' : 'Apples'

println fruits

As you extol the virtues of Groovy to your colleagues, you look at the output and are horrified to find: [true]

What just happened? Well, it turns out that operator precedence puts the left-shift operator higher than the ternary operator. So the expression you really wrote is:

(fruits << citrus) ? 'Oranges' : 'Apples'

In Groovy, a list has a true value if it is non-null and non-empty. So ‘Oranges’ was selected as the value of the expression and sent to … /dev/null? To fix the issue, you have to write

fruits << (citrus ? 'Oranges' : 'Apples')

Block scoped local variables

Seasoned programmers develop habits to prevent certain types of errors. One of those habits is to use blocks to scope local variables so that copy-paste of the code (hopefully confined to some hastily written unit test) doesn’t result in redefinition of the variables, which invariably leads to an attempt to rename the variables, which occasionally results in one rename being left off, which results in a hard-to-find bug. So we write

{
   int a = 5;
   callSomeMethod(a);
}
{
   int a = 10;
   callSomeMethod(a);
}

There are more legitimate uses for block scopes in methods, such as ensuring that the logic that follows does not rely on some previous variable. When translated to Groovy, Java code with bare blocks fails because Groovy thinks you have defined closures and therefore skips over them. The workaround?

1.times {
   int a = 5;
   callSomeMethod(a);
}
1.times {
   int a = 10;
   callSomeMethod(a);
}

The above construct is arguably a hack, but it works nevertheless.

Map literal oddities

Groovy supports map literals. This allows you to create a map with a compact and readable syntax.

Map<String, Integer> myMap = ['a': 25]
myMap['b'] = 35
int d = myMap['a']

Armed with this new-found knowledge, you proceed to construct a map as follows.

String name = 'joe'
Map<String, Integer> myMap = [name: 25]
...
int value = myMap[name]

You run your program only to discover a runtime exception saying that null cannot be converted to int! Upon further inspection with a debugger, you discover that your map is ['name': 25], not ['joe': 25]. What happened? It turns out you can’t use a variable as a key in a map literal; an unquoted key is treated as a literal String. To achieve the intent above, you have to use a GString key or enclose the variable in parentheses.

String name = 'joe'
Map<String, Integer> myMap = ["$name": 25] // or [(name): 25]

Map literal keys cannot be the return values of functions or enum constants either; these also need to be turned into GString expressions or wrapped in parentheses.
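For example (Status is an illustrative enum), wrapping the key in parentheses forces Groovy to evaluate it:

enum Status { ACTIVE, INACTIVE }

String name = 'joe'
def byEnum = [(Status.ACTIVE): 1]        // key is the enum constant, not the String 'Status.ACTIVE'
def byCall = [(name.toUpperCase()): 25]  // key is 'JOE'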

Primitive arrays

If you argue that Java’s primitive arrays are an aberration, you win. Now let’s deal with the real-world problem presented by the fact that many Java APIs have been written to accept primitive arrays. Since Groovy uses [] to represent lists, how do you define a primitive array instead?

Given the Java method

public class SomeClass {

    public static void soStuff(int[] values) {
        for (int i : values) {
            System.out.println(i);
        }
    }
}

the following Groovy code will fail.

def values = [1, 2, 3]
SomeClass.soStuff(values) 

To coerce values into a primitive array, use the as operator with <type>[] type definition.

def values = [1, 2, 3]
SomeClass.soStuff(values as int[])
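Declaring the variable with an array type also works, since the list literal is coerced at the point of assignment:

int[] values = [1, 2, 3]
SomeClass.soStuff(values)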

Type-lessness

Groovy supports duck typing. Yes, but what kind of duck allows this?

static int abc = new AtomicInteger()

Remember that in Groovy, type declarations like this are not checked at compile time, so the compiler will happily compile the above. Groovy 2.0 introduced specific annotations, @TypeChecked and @CompileStatic, to alleviate some of these bugs. There are some caveats to be aware of when using these annotations; refer to this presentation for a good overview.

BigDecimal everywhere

Sometimes features built with good intent can have unintended consequences. The next three topics present examples of such features in Groovy. Groovy performs division using BigDecimal, which is especially useful in financial services apps. However, it applies to all division, as I found out. I was writing code to determine the number of threads to use to process a list of objects. I wanted the thread count to be one-tenth of the list size, but with a minimum of 1 and a maximum of 50.

int threads = Math.max(1, Math.min(list.size() / 10, 50))

When I ran the program, I got an exception stating, Cannot resolve which method to invoke for [class java.math.BigDecimal, class java.lang.Integer] due to overlapping prototypes between ...

Huh? I was expecting integer division, but Groovy instead produced a BigDecimal and couldn’t find an appropriate Math.min() method. The code needs to be rewritten as

int threads = Math.max(1, Math.min((list.size() / 10).intValue(), 50))
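An alternative, if integer division is what you wanted in the first place, is Groovy's intdiv() method:

int threads = Math.max(1, Math.min(list.size().intdiv(10), 50))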

Groovy Truth

Groovy coerces many things to a truth value - null objects are false, empty lists are false, zero is false. It takes you back to the days of C/C++. People bemoaned the unexpected evaluation of clever expressions in C/C++. Java came along and eschewed all cleverness. People bemoaned the verbose conditional expressions in Java. Groovy came back with C/C++-style truth evaluation.

In a payment app, we want to make sure the amount has been defined. So we write a test of nullability.

if (!amount) {
	throw new IllegalArgumentException(...
}

Except that somehow the amount ends up being exactly zero (let’s say this is the amount to be charged and a coupon made the amount zero). The check unexpectedly fires and the legitimate zero amount is rejected.
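If zero is a legitimate amount, the guard has to test for null explicitly rather than rely on Groovy truth:

if (amount == null) {
	throw new IllegalArgumentException('amount must be provided')
}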

Safe-navigation operator

Groovy allows you to test for null and continue an operation by using the safe-navigation operator ?. It is an operator that is frequently abused (its use masquerading as thoughtful analysis of nullability) and one that sometimes results in side-effects too.

Consider the following condition

if (!someObject.someProperty?.taxable) {
}

The author’s intent was that if taxable is false, some piece of code should be executed. However, if someProperty itself is null, the code should not be executed in this particular case. The author knew that someProperty is a nullable field and therefore came up with the above construct. But if someProperty is null, the expression evaluates to true! The actual condition required was

if (someObject.someProperty && someObject.someProperty.taxable) {
}

Language philosophy

Groovy introduced a number of new language constructs to the JVM and offers a terse and powerful language to program in. Groovy is particularly suited to unit tests, DSLs and frameworks such as Grails. What is not apparent in the power of Groovy is the trade-off that has been made between discipline and power. Programmers can abuse the power of Groovy with greater ease than Java, and the result is bad code that is tersely written rather than bad code that is verbosely written. The ease of use of maps and lists in Groovy leads to their excessive use, often at the expense of proper object orientation.

It’s been an interesting journey since the days of C and C++. C and C++ allowed all kinds of clever programming tricks and shortcuts. Developers grew tired of the side-effects of the tricks and, unfortunately, at least with Java, the language designers pivoted to the other extreme. Groovy in some respects represents a swing back to the C/C++ paradigm. Language designers are now attempting to strike the right balance between concise expression and correctness, static typing and verbosity, terseness and comprehensibility. Some of the newer languages like Dart, Swift, Kotlin and Ceylon exemplify this design trend. Ceylon in particular is an exciting JVM language based on the philosophy that code is read more often than it is written. Ceylon has one of the most exciting type systems and I for one am eagerly looking forward to using it.


High Performance Postgresql in Java with Pedal

Overview

Postgresql is one of the most popular open-source databases in use. It has good standards support and an impressive feature set - rich data types and advanced runtime management. The Pedal framework for Java enables fast inserts (we are talking orders of magnitude faster) into Postgresql through the Copy command directly at the JPA entity level. In this article, we provide an overview of the Pedal framework and show how to use the Copy command.

The Pedal framework consists of three libraries.

  1. pedal-dialect
  2. pedal-tx
  3. pedal-loader

pedal-dialect enables dialect (i.e., database) and provider (e.g., Hibernate) level features such as retrieval of the schema name, the mapped table name for a given JPA entity, user-types (arrays, bit strings), etc. It also provides support for the Copy command in Postgresql. pedal-tx is a Java 8-only framework that allows for transaction demarcation using Java 8 lambdas, transaction-attached storage and transaction-attached pre/post-commit lambdas. In addition, it provides a DAO abstraction layer with support for fluent JQL/HQL and native queries. With pedal-loader, data population scripts for db-unit tests can be written in a Groovy DSL while working at the JPA entity level (with mapped column types, object-level foreign keys, etc.).

Using the Copy Command

To enable copy support, first create an instance of com.eclecticlogic.pedal.dialect.postgresql.CopyCommand, ideally set up as a Spring bean. It requires access to com.eclecticlogic.pedal.provider.ProviderAccessSpi, which can be configured by creating a com.eclecticlogic.pedal.provider.hibernate.HibernateProviderAccessSpiImpl and passing it a reference to the EntityManagerFactory. Using Spring with Java-based configuration, the code would look like this:

    @Bean
    HibernateProviderAccessSpiImpl hibernateProvider(EntityManagerFactory factory) {
        HibernateProviderAccessSpiImpl impl = new HibernateProviderAccessSpiImpl();
        impl.setEntityManagerFactory(factory);
        return impl;
    }

    @Bean
    public CopyCommand copyCommand(ProviderAccessSpi provider) {
        CopyCommand command = new CopyCommand();
        command.setProviderAccessSpi(provider);
        command.setConnectionAccessor(new TomcatJdbcConnectionAccessor());
        return command;
    }

The connection accessor is specific to the connection pool you are using. It is used to get a handle to the underlying JDBC-4 compliant Postgresql native connection. Pedal-dialect ships with support for the following connection pools:

  1. BoneCP
  2. Commons DBCP 2
  3. Hikari
  4. Tomcat JDBC

To create one for an unsupported connection pool, implement the com.eclecticlogic.pedal.connection.ConnectionAccessor interface.

So what does the code to actually insert rows using the CopyCommand look like? It’s simple: create instances of your entities and add them to a CopyList<T>, then pass the CopyList instance to the copy command. Here is the test code in pedal-dialect:

        CopyList<ExoticTypes> list = new CopyList<>();

        // The copy-command can insert 100k of these per second.
        for (int i = 0; i < 10; i++) {
            ExoticTypes et = new ExoticTypes();
            et.setLogin("copyCommand" + i);
            BitSet bs = new BitSet(7);
            bs.set(1);
            bs.set(3);
            bs.set(4);
            et.setCountries(bs);
            et.setAuthorizations(Sets.newHashSet("a", "b", "b", "c"));
            if (i != 9) {
                et.setScores(Lists.newArrayList(1L, 2L, 3L));
            } else {
                et.setScores(Lists.<Long> newArrayList());
            }
            et.setStatus(Status.ACTIVE);
            et.setCustom("this will be made uppercase");
            list.add(et);
        }

        copyCommand.insert(entityManager, list);

The CopyCommand supports a subset of JPA features. Check here for the current supported feature set.

For a moderate row size (100 to 200 bytes), 100k inserts using EntityManager.persist() took 34 seconds while the CopyCommand took 1.4 seconds! That is a 24x speed-up. Make sure you read up on the limitations of the copy feature in Postgresql. In addition, the Copy command will bypass JPA/Hibernate interceptors. So don’t expect Envers to work with it.


SMTP-Bit - An email protocol to fight spam using Bitcoin

Core Idea

Email spam and spoofing are a constant nuisance for everyone. While filters grow in sophistication every year, the fraud perpetrators are always devising new ways around them. So here is a simple proposal to put an end to spam and scam artists (or at least make spamming an expensive proposition). Proof-of-work (e.g., Hashcash) and Bitcoin-based spam filters have been proposed before. This protocol is simply a refinement of the latter.

Email is perhaps the most pervasively used protocol on the Internet, with approximately 200 billion messages sent per day. It is unrealistic to expect the installed infrastructure supporting email to be upgraded to a new protocol easily. Therefore any solution must work with the existing SMTP protocol and offer an option to upgrade to a newer protocol gradually.

The SMTP-Bit protocol associates wallets with email addresses to protect against spam. An email message is required to identify the wallet associated with the sender’s email, a signature to prove that the sender owns the wallet and the public key for the wallet. Before accepting an email, the server checks the public ledger for a transaction from the sender’s wallet to the recipient’s wallet (the transaction needs to transfer a minimum of 1 Satoshi). If it finds such a transaction, the email is accepted. Otherwise it is rejected. To make the verification of the transaction from the public ledger more efficient, the email message can include the transaction id or the receiver’s mail server can store a cache of transactions involving the recipient’s wallet. The transactions can be specially tagged to make them easier to locate from the public ledger.
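As a purely illustrative example, the required information could travel as message headers along these lines (the header names are hypothetical; the protocol only specifies what must be conveyed, not how):

X-SMTPBit-Wallet: <sender's wallet address>
X-SMTPBit-PublicKey: <public key for the sender's wallet>
X-SMTPBit-Signature: <signature proving the sender owns the wallet>
X-SMTPBit-TxId: <optional id of the trust-establishing transaction>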

With such a setup, the bitcoin transaction from party A to party B serves as a trust identifier. If party A uses a different email address, they can continue to communicate with party B based on the previously established trust relationship. Spam filters based on explicit trust (called permission-based filters) have been tried before; ChoiceMail is one such example. The downfall of these filters has always been the need to trust a third party with confidential information. The Bitcoin protocol, with its trustless blockchain, removes this hurdle.

Bootstrapping

For such a protocol to work, we need to provide an efficient mechanism for the recipient’s wallet to be discovered. We would ideally like the registry to be publicly accessible and not under any single party’s control. The Bitcoin transaction ledger can serve as such a registry. To associate a wallet with an email address, the user could be required to tag a transaction from the wallet to any other wallet with the email address (or multiple addresses). Such a transaction could be part of the process of setting up an email account in the first place. It may seem that having a publicly accessible email address would be folly. However, if the protocol is successful at stopping spam, such a public registry would not prove detrimental and in fact could promote legitimate communication.

The protocol will face a chicken-and-egg hurdle to get going. One will not have much incentive to set up an SMTP-Bit enabled email address while others haven’t; it would be rather lonely. However, if the value proposition of not being spammed is worthwhile, it may prove to be sufficient incentive for multitudes to join.

Spammers and Spoofers

A spammer will need to send you a Satoshi before he/she can send you an unsolicited email. This makes it somewhat expensive to send out bulk emails. Upon receiving the spam, the recipient can simply add the sender’s wallet to a black-list thereby prohibiting future spam.

Domain level wallets

The protocol can be extended to allow the association of a domain-level wallet (perhaps as a TXT record in the DNS entry). This will allow corporations to establish trust with individuals or between corporations without having to enable it on a per-email address basis.

Subscriptions

Marketing departments wanting to send you emails would need to establish a transaction to your wallet. You can unsubscribe by simply blocking their wallet.

High Profile Emails

The SMTP-Bit protocol could also open up celebrity emails to the general public. Let’s say - I’m going to pick a dead, not-generally-well-known person - Geerhardus Vos wanted his email to be publicly available but wanted to keep people from sending him a flood of emails. He could set a higher transaction threshold for unknown wallets and create a white-list of wallets of his known friends and contacts (folks would then want to give a celebrity their wallet address when making an acquaintance). Nothing prevents an average Joe from setting a higher threshold either - chances are folks will not be willing to pay a higher premium to email your average Joe (or Jane!).

Loss or compromise of Wallet

If your wallet is compromised, the situation is not too different from your email address being compromised. You simply have to ask others to black-list your old wallet and re-establish trust against your new wallet.

Other Considerations

When all 21 million bitcoins have been issued, miners will need to be supported by transaction fees. In such a setup, 1 Satoshi may not suffice. A simple solution then is to use 2 Satoshis and pay out 1 Satoshi as the transaction fee.
