Monitoring Support

Please comment and possibly extend this page!

Why ØMQ needs some introspection capabilities

First, some reading on logging and monitoring large (and distributed) systems, to lay the groundwork for the discussion. This is the article that started it all for me: http://highscalability.com/log-everything-all-time

Other sources (books, articles, stories) suggest the same; see:

  • Scalable Internet Architectures, Theo Schlossnagle, ISBN 0-672-32699-X
  • Web Operations, John Allspaw & Jesse Robbins, ISBN 978-1-449-37744-1

There are many more sources on the web showing how important good monitoring is. Most published accounts of how a system survived an incident start by mentioning the monitoring that detected something fishy and notified the ops team.

Executive summary:

Gather as much information as possible on what the system does.

To reach this goal we need a way to see what ØMQ's idea of the current state is. The following attributes should be made accessible for this purpose:

  • ØMQ socket state (blocked, over HWM …)
  • ØMQ queue lengths on a socket
  • Messages sent/received since last query (needs a defined starting point)
  • Number of underlying OS level sockets/connections

Also frequently requested are the following bits of information:

  • Endpoints (ip, port) of the underlying OS level sockets
  • Buffer sizes of the OS level sockets
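
To make the wish list above more concrete, here is a minimal sketch of what such a per-socket snapshot could look like, and how "since last query" counters could get a defined starting point. This is not an existing ØMQ API: the names SocketSnapshot and SocketCounters, all field names, and the state strings are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SocketSnapshot:
    """Hypothetical read-only view of one socket (not a real ØMQ type)."""
    state: str                      # assumed values, e.g. "ready", "blocked", "over_hwm"
    queue_length: int               # messages currently queued on the socket
    sent_since_last_query: int
    received_since_last_query: int
    connections: int                # number of underlying OS-level connections
    endpoints: List[Tuple[str, int]] = field(default_factory=list)  # (ip, port)


class SocketCounters:
    """Keeps running totals and reports deltas since the previous query.

    The defined start is the creation of the counter object; after that,
    every call to query() becomes the start of the next interval.
    """

    def __init__(self) -> None:
        self._sent = 0
        self._received = 0
        self._sent_at_last_query = 0
        self._received_at_last_query = 0

    def on_send(self, count: int = 1) -> None:
        self._sent += count

    def on_receive(self, count: int = 1) -> None:
        self._received += count

    def query(self) -> Tuple[int, int]:
        """Return (sent, received) since the last query and reset the baseline."""
        sent_delta = self._sent - self._sent_at_last_query
        received_delta = self._received - self._received_at_last_query
        self._sent_at_last_query = self._sent
        self._received_at_last_query = self._received
        return sent_delta, received_delta


counters = SocketCounters()
counters.on_send(3)
counters.on_receive()
first = counters.query()    # (3, 1): everything since creation
counters.on_send(2)
second = counters.query()   # (2, 0): only what happened after the first query

snapshot = SocketSnapshot(
    state="ready",
    queue_length=0,
    sent_since_last_query=second[0],
    received_since_last_query=second[1],
    connections=1,
    endpoints=[("127.0.0.1", 5555)],
)
```

Resetting the baseline on every query keeps the monitoring side stateless, but it also means two independent observers would steal each other's deltas. Exposing monotonically increasing totals and letting each observer compute its own differences would avoid that, and is one design question this page should settle.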

As messaging becomes more widespread, so does the need to monitor the flow of messages. It makes no sense to watch the database queries closely while ignoring the path over which the requests and results are transmitted.

Another important point is the angst of admins. If they can't see what something is doing, they reject it.
