Under the hood: logging and start-script

This is the first post in a series I’ll call ‘Under the hood’, explaining work that happened rather in the inside of Loklak and probably got rarely noticed.

It’ll be mostly about how we solved issues, so people can learn from it when they face similar tasks, and what configuration options people should know about.

I’ll start with two rather light toppics: the new logging infrastructure and the changes to the start-script.

Logging

Until recently, all logging in Loklak happened by simply letting loklak write to stdout, which is normally the console output. It then got piped to a text file by the start-script.

cmdline=$cmdline -server -classpath $CLASSPATH org.loklak.LoklakServer >> data/loklak.log 2>&1 & echo $! > data/loklak.pid &;

That is very simple and reliable way to do logging, but it has some major drawbacks:

  • As there’s a lot of logging happening, Loklak would constantly do IO-Operations (that is writing to disk). On modern computers, that is usually a bottleneck which costs alot of CPU-time and constantly triggers the harddrive, which is often very slow. Even with modern SSDs, it’s good to avoid to many IO-Operations.
  • It doesn’t give us any control about the log-files. We can’t set a maximum size of the file. In one case it triggered my system to fill up the whole drive, therefor creating stability problems for the whole system

The interesting part was, that in most places, the logging was already done via a function call which is actually ment to log to a real logging backend. So I searched for some solution which would allow us to use our logging function with a real backend that would handle it smartly, without having to change to much code and tell everyone about it.

Log.getLog().info(“some stuff”);

After some research, I found that logging function we use, actually the one from Jetty, allows to plug in different logging-backends via an abstraction layer called slf4j. All I’d need to do was put the slf4j-jar into the classpath and then add the logging backend I wanted.

After some comparison, I settled with log4j2, a quite mature and efficient system which again proved to be quite easy to use. Put the jars into the classpath, create a config-file and add the path to the config file into java execution line:

cmdline=”$cmdline -server -classpath $CLASSPATH -Dlog4j.configurationFile=$LOGCONFIG org.loklak.LoklakServer >> /dev/null 2>&1 & echo $! > data/loklak.pid &”;

Note that we now pipe output not using the logging system to /dev/null, into the void.

 

So now was the time to see what the logging-backend would offer us. A brief look into our current logging-config:

conf/logs/log-to-file.properties

property.logPath = data
appenders = file

appender.file.type = RollingFile
appender.file.name = LOGFILE
appender.file.fileName = ${logPath}/loklak.log
appender.file.filePattern = ${logPath}/loklak.log-%i.gz
appender.file.layout.type = PatternLayout
appender.file.layout.pattern = %d{yyyy-MM-dd HH:mm:ss.SSS}:%-5level:%C{1}:%t: %msg%n
appender.file.bufferedIO = true
appender.file.bufferSize = 8192
appender.file.immediateFlush = true
appender.file.policies.type=Policies
appender.file.policies.size.type=SizeBasedTriggeringPolicy
appender.file.policies.size.size=10MB
appender.file.strategy.type=DefaultRolloverStrategy
appender.file.strategy.max=3

rootLogger.level = info
rootLogger.appenderRefs = file
rootLogger.appenderRef.stdout.ref = LOGFILE

What we see is, that we specify an appender, a RollingFile appender, and use it to log all output by Loklak by making it the rootLogger.

If we needed to, we could also have different loggers for different java classes and write them to multiple different places. For example we could write to a Database instead of a file.

So what are the important points a Loklak admin should know about? Well it’s the following lines:

rootLogger.level = info

appender.file.bufferedIO = true
appender.file.bufferSize = 8192
appender.file.immediateFlush = false

appender.file.filePattern = ${logPath}/loklak.log-%i.gz
appender.file.policies.size.size=10MB
appender.file.strategy.max=3

What we here specify is to:

  • The logging level: if you are debugging something, you can set it to ‘debug’, if you only care for errors, set it to ‘warn’
  • Use a cache for the log. By default 8KB, it might make sence to even enlarge it on big servers. This makes logging output appear slower in the log.
  • Only write when the cache is full. immediateFlush can be toggled on to always write to file on each logging call. This is very handy for debugging, when you need fast output. Caching with immediate flush toggled on is still faster then writing without caching!
  • When the log-file reaches 10MB, compress it with gzip and rename it to e.g. loklak.log-1.gz. Keep up to three gzipped files around. Here again, it might be desireable to have bigger log-files or more old logs on some system.

Some one-click-deployment hosts like scalingo have their own logging infrastructure, so they can show the output in a web-console. For them we have a second conf-file, making the log print to sdtout again.

To remember: if you want to write something to the log, use either of these:

Log.getLog().info(“some stuff”);

Log.getLog().warn(“some stuff”);

Log.getLog().debug(“some stuff”);

If you want to print a stacktrace, simply add the exception to it:

catch(Exception e){

Log.getLog().warn(“some stuff”, e);

}

That’s about the logging. While not being much work, it should improve performance and save energy on many systems.

Start-script

A second, for me very confusing thing, was that the start-script always reported a successful boot of Loklak, even if it failed directly. This was especially nasty when I started one Loklak installation from one folder, forgot to stop it and tried to start another from a different folder. Of course I wouldn’t see what I expected until looking into the log-file.

The reason for that was that the start-script (start.sh) simple called

cmdline=$cmdline -server -classpath $CLASSPATH org.loklak.LoklakServer >> data/loklak.log 2>&1 & echo $! > data/loklak.pid &;

eval $cmdline

echo loklak server started at port $CUSTOMPORT, open your browser at $LOCALHOST

So to make the script check for a succesful start, I borrowed a trick from the stop-script (stop.sh).

It uses a so called PID-file. Note that in the cmdline, the process ID of Loklak gets written to /data/loklak.pid

Internally, one of the first things Loklak does on startup is

File pid = new File(dataFile, “loklak.pid”);
if (pid.exists()) pid.deleteOnExit();

deleteOnExit tells the java virtual mashine to remove the file when it shuts down. So whenever Loklak shuts down, for whatever reason, the file is gone. (Exception: when the jvm failes unexpectedly. Very rare)

So what the stop-script does is:

  • look into the PID-file for the process ID
  • tell the operating-system to send a shut-down signal to it
  • wait until the PID-file disappears, so we know it shut down

For the startup check I just added another file, startup.tmp, and handle it as follows:

  • Like on the PID-file, Loklak calls deleteOnExit on it
  • When Loklak fails to start up correctly, the file would just disappear
  • When the startup finished correctly, Loklak writes ‘done’ into it

So what the startup-script does is just:

  • Wait until startup.tmp either disappears (and print an error message)
  • Or until it has ‘done’ as content, telling us everything is right

Simple story but very convenient for the user.

A very nice side effect of this is, that our travis build system now checks if the server starts up successfully. If we have some error in the code or the config, we know when creating a push request.

TL;DR

We now have a propper logging backend and startup-script.

If you want to log something in the java code, use

Log.getLog().info(“some stuff”);

or the other variations as mentioned above.

That’s it for the first post. The next one will be about the login system or the new peer features.

Edit

As of today, June 12th 2016, Elasticsearch, whos output wasn’t logged before at all, logs now to the general Loklak.log file.

See https://github.com/loklak/loklak_server/pull/572 for details

Under the hood: logging and start-script