Java API for ØMQ

Caution: This document refers to an old version of ØMQ. From version 0.3.1 onwards the Java extension is an integral part of ØMQ and thus no longer has to be downloaded and built separately! Also note that the Java API has changed since!

Introduction

This whitepaper describes the first version of the Java extension for ØMQ. It is a simplified version of the ØMQ interface, exposed in the form of a Java object. The Java extension is not yet part of the ØMQ package; you have to download it separately (see below) and build it by hand. Any feedback on the Java extension is welcome on the ØMQ developers' mailing list.

Download

Download Java extension for ØMQ here.

Building it

Download and build ØMQ package:

$ tar -xzf zmq-0.3.tar.gz
$ cd zmq-0.3
$ ./configure
$ make
$ sudo make install

Download and unpack the Java extension for ØMQ:

$ tar -xzf Jzmq.tar.gz
$ cd Jzmq

Compile the Jzmq class:

$ javac Jzmq.java

Generate the JNI headers for the class:

$ javah Jzmq

Compile the extension:

$ g++ -c -fPIC Jzmq.cpp
$ g++ -shared -pthread -o libJzmq.so Jzmq.o libzmq.so

Copy the shared library to a directory on the library path:

$ cp libJzmq.so /usr/lib

Compile the test programs:

$ javac LocalLat.java 
$ javac RemoteLat.java 
$ javac LocalThr.java
$ javac RemoteThr.java

Using it

The Java extension's API is currently much simpler than the original C++ API. The main difference is that the Java extension doesn't allow full control over ØMQ threading the way C++ does. Instead, it creates a single I/O thread that can be accessed from a single application thread, which prevents seamless scaling on multicore boxes. However, it is our intent to expose the full ØMQ API via Java in the future.

To instantiate ØMQ:

Jzmq obj = new Jzmq (hostname);

Where hostname is the name or IP address of the box where zmq_server is running.

To create the wiring, the createExchange, createQueue and bind methods can be used. For a detailed description of how the wiring mechanism works, have a look here.

int eid = obj.createExchange ("E", Jzmq.SCOPE_GLOBAL, "10.0.0.1:5555");
obj.createQueue ("Q", Jzmq.SCOPE_GLOBAL, "10.0.0.1:5556");
obj.bind ("E", "Q");

Sending a message is pretty straightforward. The message is supplied in the form of a byte array:

byte msg [] = {1, 2, 3, 4, 5, 6};
obj.send (eid, msg);

Receiving a message is even simpler:

byte [] msg = obj.receive ();
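
Putting the calls above together, a minimal sender might look like the following sketch. It uses only the methods documented in this section (the Jzmq constructor, createExchange, bind and send); the host name, network addresses and exchange/queue names are placeholders, and it assumes the receiving application has already created the global queue "Q":

```java
//  Sketch of a minimal sending application. Assumes zmq_server runs on
//  "server001" and that the peer application has created queue "Q".
public class Sender
{
    public static void main (String [] args)
    {
        //  Connect to the zmq_server instance.
        Jzmq obj = new Jzmq ("server001");

        //  Create a globally visible exchange on this box and bind it
        //  to the queue created by the receiving application.
        int eid = obj.createExchange ("E", Jzmq.SCOPE_GLOBAL,
            "10.0.0.1:5555");
        obj.bind ("E", "Q");

        //  Send a single message.
        byte msg [] = {1, 2, 3, 4, 5, 6};
        obj.send (eid, msg);
    }
}
```

Remember that libJzmq.so has to be on the library path when running the program, as described in the build section above.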

Test results

Tests were performed on two quadcore boxes (Intel Xeon CPU, E5440, 2.83 GHz) connected via direct 1Gb Ethernet link (Intel PRO/1000, PCI Express:2.5GB/s:Width x4). Operating system used was Debian Linux 4.0 (kernel version 2.6.24.7, CONFIG_PREEMPT_VOLUNTARY=y, CONFIG_PREEMPT_BKL=y, CONFIG_HZ=1000).

Latency

End-to-end latency - as measured by LocalLat and RemoteLat - is quite good. For small messages Java is just a couple of microseconds slower than the raw C++ program:

Message size    C++          Java
1 B             32.7 us      35.62 us
16 B            34.54 us     37.17 us
256 B           42.21 us     43.37 us
4096 B          85.63 us     102.31 us
65536 B         612.99 us    769.7 us

The same values charted on a graph (black line is C++, red line is Java):

java_lat.png

The main performance bottleneck in the Java extension is that message data has to be physically copied between the Java heap and the JNI heap - the copying happens on both the send and the receive side. As far as we are aware there's no way to avoid it. In any case, the bottleneck becomes significant only for large messages (i.e. messages over 512 bytes long). For smaller messages you don't have to worry - the copying overhead is almost unmeasurable.
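
To illustrate what this bottleneck amounts to (this is not the extension's actual JNI code, just a pure-Java picture of the data movement), each message payload is copied twice - once out of the Java heap on send and once back into a fresh Java array on receive:

```java
import java.util.Arrays;

public class CopyOverhead
{
    public static void main (String [] args)
    {
        //  A 64kB payload, the largest size used in the tests below.
        byte [] javaHeap = new byte [65536];
        Arrays.fill (javaHeap, (byte) 0x7f);

        //  On send, the JNI layer has to copy the data out of the
        //  Java heap into a native buffer ...
        byte [] nativeBuf = new byte [javaHeap.length];
        System.arraycopy (javaHeap, 0, nativeBuf, 0, javaHeap.length);

        //  ... and on receive, back from a native buffer into a
        //  freshly allocated Java array.
        byte [] received = new byte [nativeBuf.length];
        System.arraycopy (nativeBuf, 0, received, 0, nativeBuf.length);

        //  The payload survives intact; the cost is the two copies.
        System.out.println (Arrays.equals (javaHeap, received));
    }
}
```

The cost of the two copies grows linearly with message size, which matches the widening gap between the C++ and Java figures in the latency table above.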

Throughput

As expected, Java is somewhat less efficient than raw C++. Until the network limit (1 Gb/sec) is reached, throughput is approximately 50% of the C++ throughput. However, once the messages are large enough to saturate the network (~256 bytes), the throughputs of C++ and Java are exactly the same.

Message size    C++                   Java
1 B             2,435,820 msgs/sec    1,453,288 msgs/sec
16 B            2,976,623 msgs/sec    1,262,061 msgs/sec
256 B           447,126 msgs/sec      447,494 msgs/sec
4096 B          28,896 msgs/sec       28,907 msgs/sec
65536 B         1,810 msgs/sec        1,810 msgs/sec
java_thr.png
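
The claim that ~256-byte messages saturate the link can be verified with a quick back-of-the-envelope computation using the 256 B row of the table above: 447,126 messages per second of 256 bytes each amount to roughly 0.92 Gb/s of payload, i.e. the 1 Gb/s link is effectively full (framing and protocol overhead account for the rest):

```java
public class WireLoad
{
    public static void main (String [] args)
    {
        //  Figures taken from the 256 B row of the throughput table.
        long msgsPerSec = 447126;
        int msgSize = 256;

        //  Payload bandwidth in gigabits per second:
        //  messages/sec * bytes/message * 8 bits/byte.
        double gbps = msgsPerSec * msgSize * 8 / 1e9;
        System.out.printf ("%.2f Gb/s%n", gbps);  //  prints 0.92 Gb/s
    }
}
```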

Conclusion

Although the Java extension doesn't provide the full ØMQ functionality at the moment, the performance figures are quite convincing. Latency is only a few microseconds above the C++ latency (~36 us), and throughput, although worse than in C++, would still allow for decent handling of an OPRA feed.