Writing Simple Unit-Tests with JUnit

In the Loklak Server project, we use a number of automation tools like the build testing tool ‘TravisCI’, automated code reviewing tool ‘Codacy’, and ‘Gemnasium’. We are also using JUnit, a java-based unit-testing framework for writing automated Unit-Tests for java projects. It can be used to test methods to check their behaviour whenever there is any change in implementation. These unit-tests are handy and are coded specifically for the project. In the Loklak Server project it is used to test the web-scrapers. Generally JUnit is used to check if there is no change in behaviour of the methods, but in this project, it also helps in keeping check if the website code has been modified, affecting the data that is scraped.

Let’s start with basics, first by setting up, writing a simple Unit-Tests and then Test-Runners. Here we will refer how unit tests have been implemented in Loklak Server to familiarize with the JUnit Framework.


Setting up JUnit with gradle is easy, You have to do just 2 things:-

1) Add JUnit dependency in build.gradle

Dependencies {

. . .

. . .<other compile groups>. . .

compile group: 'com.twitter', name: 'jsr166e', version: '1.1.0'

compile group: 'com.vividsolutions', name: 'jts', version: '1.13'

compile group: 'junit', name: 'junit', version: '4.12'

compile group: 'org.apache.logging.log4j', name: 'log4j-1.2-api', version: '2.6.2'

compile group: 'org.apache.logging.log4j', name: 'log4j-api', version: '2.6.2'

. . .

. . .



2) Add source for ‘test’ task from where tests are built (like here).

Save all tests in test directory and keep its internal directory structure identical to src directory structure. Now set the path in build.gradle so that they can be compiled.

sourceSets.test.java.srcDirs = ['test']


Writing Unit-Tests

In JUnit FrameWork a Unit-Test is a method that tests a particular behaviour of a section of code. Test methods are identified by annotation @Test.

Unit-Test implements methods of source files to test their behaviour. This can be done by fetching the output and comparing it with expected outputs.

The following test tests if twitter url that is created is valid or not that is to be scraped.


 * This unit-test tests twitter url creation



public void testPrepareSearchURL() {

String url;

String[] query = {"fossasia", "from:loklak_test",

"spacex since:2017-04-03 until:2017-04-05"};

String[] filter = {"video", "image", "video,image", "abc,video"};

String[] out_url = {



"and other output url strings to be matched…..."


// checking simple urls

for (int i = 0; i < query.length; i++) {

url = TwitterScraper.prepareSearchURL(query[i], "");

//compare urls with urls created

assertThat(out_url[i], is(url));


// checking urls having filters

for (int i = 0; i < filter.length; i++) {

url = TwitterScraper.prepareSearchURL(query[0], filter[i]);

//compare urls with urls created

assertThat(out_url[i+3], is(url));




Testing the implementation of code is useless as it will either make code more difficult to change or tests useless  . So be cautious while writing tests and keep difference between Implementation and Behaviour in mind.

This is the perfect example for a simple Unit-Test. As we see there are some points, which needs to be observed like:-

1) There is a annotation @Test .

2) Input array of query which is fed to the method TwitterScraper.prepareSearchURL() .

3) Array of urls out_url[], which are the expected urls to output.

4) asserThat() to compare the expected url (in array out_url[]) and the output url (in variable ‘url’).

NOTE: assertEquals() could also be used here, but we prefer to use assert methods to get error message that is readable (We will discuss about this some time later)

And the TestRunner

When we are working on a project, It is not feasible to run tests using gradle as they are first built  (else verified whether tests are build-ready) and then executed. gradle test shall be used only for building and testing the tests. For testing the project, one shall set-up TestRunner. It allows to run specific set of tests, one wants to run.

TestRunners are built once using gradle (with other tests) and can be run whenever you want. Also it is easy to stack up the test classes you want to run in SuiteClasses and @RunWith to run SuiteClasses with the TestRunner.

In loklak server, TestRunner runs the web-scraper tests. They are used by developers to test the changes they have made.

This is a sample TestRunner, code link here .

package org.loklak;

// Library classes imported

import org.junit.runner.RunWith;

import org.junit.runners.Suite;

// Source files to be tested

import org.loklak.harvester.TwitterScraperTest;

import org.loklak.harvester.YoutubeScraperTest;


* TestRunner for harvesters







public class TestRunner {



You can also add TestRunners for different sections of the project. Like here it is initialized only to test harvesters.

To run the TestRunner

Add classpath of the jar file of the project and run ‘JUnitCore’ with TestRunner to get output on terminal.

java -classpath .:build/libs/<yourProject>.jar:build/classes/test org.junit.runner.JUnitCore org.loklak.TestRunner

In the project we have set up a shell script to run the tests.

Few points

1) Build the project and tests separately. Build tests only when changed as they take time to be built and executed.

2) Whenever you are done with the coding part, run the tests using TestRunner.

3) Write unit-tests whenever you add a new feature to the project to keep it up-to-date.

Now lets end up here.

So for now, Code it, Test it and Repeat.


Writing Simple Unit-Tests with JUnit

Displaying error notifications in whatsTrending? app

The issue I am solving in the whatsTrending app is to display error notifications when the date fields and the count field are not validated and when a user enters invalid data. Specifically we want to display error notifications for junk values and dates with formats other than YYYY-MM-DD and any other invalid data in the whatsTrending app’s filter option.

The whatsTrending app is a web app that shows the top trending hashtags of twitter messages in a given date range using tweets collected by the loklak search engine. Users can also limit the number of top hash tags they want to see and use filters with start and end dates.

App to know trending hashtags on twitter

What is the problem? The date fields and the count field are not validated which means junk values and date with formats other than YYYY-MM-DD do not show any error.

So how can the problem be solved? Well the format (pattern) of the date can be verified by regular expression. A regular expression describes a pattern in a given text.So the format checking problem can be described as finding the pattern YYYY-MM-DD in the input date where Y, M and D are numbers.The Regex should specify that the pattern should be present at the beginning of the text.

More detailed information about regex can be found here.

The regex for this pattern is :


The pattern says there should be 4 numbers followed by ‘-’ then two numbers then again ‘-’ and then again two numbers.

This can be implemented the following way :

$scope.isValidDate = function(dateString) {
        var regEx = /^\d{4}-\d{2}-\d{2}$/;
        if (dateString.match(regEx) === null) {
            return false;

        dateComp = dateString.split('-');
        var i=0;
        for (i=0; i<dateComp.length; i++) { dateComp[i] = parseInt(dateComp[i]); } if (dateComp.length > 3) {
            return false;

        if (dateComp[1] > 12 || dateComp[1] <= 0) { return false; } if (dateComp[2] > 31 || dateComp[2] <= 0) { return false; } if (((dateComp[1] === 4) || (dateComp[1] === 6) || (dateComp[1] === 9) || (dateComp[1] === 11)) && (dateComp[2] > 30)) {
            return false;

        if (dateComp[1] ===2) {
            if (((dateComp[0] % 4 === 0) && (dateComp[0] % 100 !== 0)) || (dateComp[0] % 400 === 0)) {
                if (dateComp[2] > 29) {
                    return false;
            } else {
                if (dateComp[2] > 28) {
                    return false;

        return true;

So the first part of the code checks for the above mentioned pattern in the input. If not found it returns false.If found then we split the entire date into a list containing year, month and day and the remaining part if any is removed.Each component is converted to integer.Then further validation is done on the month and day as can be seen from the code above.The range of the month and date is checked.Also leap year checking is done.

In the same way the count field is also validated. The regex for this field is much simpler. We just need to check that the input consists only of numbers and nothing else.
So the regex for this is :


This means repetition of digits in the range 0-9.We search for this pattern in the text. If found we return true else false.The function for this is as follows:

$scope.isNumber = function(numString) {
        var regEx = /^[0-9]+$/;
        return String(numString).match(regEx) != null;

Next we need to call these function and see if their is any error. If there is an error we need to display it.This can be done using a modal. Bootstrap has got an inbuilt modal which can be invoked using javascript.

Showing error using modal

So at first we need to define the modal and its content (empty if necessary as in this case)using HTML.The HTML code for this can be found here.

A small yet nice tutorial on Bootstrap modal can be found here
Next we need to set the content of the modal and invoke it from our JS file on encountering an error.

$scope.displayErrorModal = function(val, type) {
        if (type === 0) {
            if (!$scope.isValidDate(val)) {
                $scope.loading = false;
                $('.modal-body').html('Please enter valid date in YYYY-MM-DD format'); 
                return false; 
         } else { 
             if (!$scope.isNumber(val)) { 
                 $scope.loading = false; 
                 $('.modal-body').html('Please enter a valid number'); 
                 return false; 
         return true; 

The above function accepts a parameter val and another parameter type.The parameter type tells what validation needs to be performed, date validation or number validation and calls previous two methods accordingly and passes val which is the value to validated.If any of the validation fails then it sets the content of the modal using : $(‘.modal-body’).html(“your content”) and then invokes it using : $(‘#modalID’).modal(‘show’). This displays a nice modal on the page and the user is notified about the error.

So this is it for this post.Thanks for reading it.My next post will be on fixing the design of the boilerplate app.

Displaying error notifications in whatsTrending? app

Generating a documentation site from markup documents with Sphinx and Pandoc

Generating a fully fledged website from a set of markup documents is no easy feat. But thanks to the wonderful tool sphinx, it certainly makes the task easier. Sphinx does the heavy lifting of generating a website with built in javascript based search. But sometimes it’s not enough.

This week we were faced with two issues related to documentation generation on loklak_server and susi_server. First let me give you some context. Now sphinx requires an index.rst file within /docs/  which it uses to generate the first page of the site. A very obvious way to fill it which helps us avoid unnecessary duplication is to use the include directive of reStructuredText to include the README file from the root of the repository.

This leads to the following two problems:

  • Include directive can only properly include a reStructuredText, not a markdown document. Given a markdown document, it tries to parse the markdown as  reStructuredText which leads to errors.
  • Any relative links in README break when it is included in another folder.

To fix the first issue, I used pypandoc, a thin wrapper around Pandoc. Pandoc is a wonderful command line tool which allows us to convert documents from one markup format to another. From the official Pandoc website itself,

If you need to convert files from one markup format into another, pandoc is your swiss-army knife.

pypandoc requires a working installation of Pandoc, which can be downloaded and installed automatically using a single line of code.


This gives us a cross-platform way to download pandoc without worrying about the current platform. Now, pypandoc leaves the installer in the current working directory after download, which is fine locally, but creates a problem when run on remote systems like Travis. The installer could get committed accidently to the repository. To solve this, I had to take a look at source code for pypandoc and call an internal method, which pypandoc basically uses to set the name of the installer. I use that method to find out the name of the file and then delete it after installation is over. This is one of many benefits of open-source projects. Had pypandoc not been open source, I would not have been able to do that.

url = pypandoc.pandoc_download._get_pandoc_urls()[0][pf]
filename = url.split(‘/’)[-1]

Here pf is the current platform which can be one of ‘win32’, ‘linux’, or ‘darwin’.

Now let’s take a look at our second issue. To solve that, I used regular expressions to capture any relative links. Capturing links were easy. All links in reStructuredText are in the same following format.

`Title <url>`__

Similarly links in markdown are in the following format


Regular expressions were the perfect candidate to solve this. To detect which links was relative and need to be fixed, I checked which links start with the \docs\ directory and then all I had to do was remove the \docs prefix from those links.

A note about loklak and susi server project

Loklak is a server application which is able to collect messages from various sources, including twitter.

SUSI AI is an intelligent Open Source personal assistant. It is capable of chat and voice interaction and by using APIs to perform actions such as music playback, making to-do lists, setting alarms, streaming podcasts, playing audiobooks, and providing weather, traffic, and other real time information

Generating a documentation site from markup documents with Sphinx and Pandoc

Using NodeBuilder to instantiate node based Elasticsearch client and Visualising data

As elastic.io mentions, Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. But in many setups, it is not possible to manually install an Elasticsearch node on a machine. To handle these type of scenarios, Elasticsearch provides the NodeBuilder module, which can be used to spawn Elasticsearch node programmatically. Let’s see how.

Getting Dependencies

In order to get the ES Java API, we need to add the following line to dependencies.

compile group: 'org.elasticsearch', name: 'securesm', version: '1.0'

The required packages will be fetched the next time we gradle build.

Configuring Settings

In the Elasticsearch Java API, Settings are used to configure the node(s). To create a node, we first need to define its properties.

Settings.Builder settings = new Settings.Builder();

settings.put("cluster.name", "cluster_name");  // The name of the cluster

// Configuring HTTP details
settings.put("http.enabled", "true");
settings.put("http.cors.enabled", "true");
settings.put("http.cors.allow-origin", "https?:\/\/localhost(:[0-9]+)?/");  // Allow requests from localhost
settings.put("http.port", "9200");

// Configuring TCP and host
settings.put("transport.tcp.port", "9300");
settings.put("network.host", "localhost");

// Configuring node details
settings.put("node.data", "true");
settings.put("node.master", "true");

// Configuring index
settings.put("index.number_of_shards", "8");
settings.put("index.number_of_replicas", "2");
settings.put("index.refresh_interval", "10s");
settings.put("index.max_result_window", "10000");

// Defining paths
settings.put("path.conf", "/path/to/conf/");
settings.put("path.data", "/path/to/data/");
settings.put("path.home", "/path/to/data/");

settings.build();  // Buid with the assigned configurations

There are many more settings that can be tuned in order to get desired node configuration.

Building the Node and Getting Clients

The Java API makes it very simple to launch an Elasticsearch node. This example will make use of settings that we just built.

Node elasticsearchNode = NodeBuilder.nodeBuilder().local(false).settings(settings).node();

A piece of cake. Isn’t it? Let’s get a client now, on which we can execute our queries.

Client elasticsearhClient = elasticsearchNode.client();

Shutting Down the Node


A nice implementation of using the module can be seen at ElasticsearchClient.java in the loklak project. It uses the settings from a configuration file and builds the node using it.

Visualisation using elasticsearch-head

So by now, we have an Elasticsearch client which is capable of doing all sorts of operations on the node. But how do we visualise the data that is being stored? Writing code and running it every time to check results is a lengthy thing to do and significantly slows down development/debugging cycle.

To overcome this, we have a web frontend called elasticsearch-head which lets us execute Elasticsearch queries and monitor the cluster.

To run elasticsearch-head, we first need to have grunt-cli installed –

$ sudo npm install -g grunt-cli

Next, we will clone the repository using git and install dependencies –

$ git clone git://github.com/mobz/elasticsearch-head.git
$ cd elasticsearch-head
$ npm install

Next, we simply need to run the server and go to indicated address on a web browser –

$ grunt server

At the top, enter the location at which elasticsearch-head can interact with the cluster and Connect.

Upon connecting, the dashboard appears telling about the status of cluster –

The dashboard shown above is from the loklak project (will talk more about it).

There are 5 major sections in the UI –
1. Overview: The above screenshot, gives details about the indices and shards of the cluster.
2. Index: Gives an overview of all the indices. Also allows to add new from the UI.
3. Browser: Gives a browser window for all the documents in the cluster. It looks something like this –

The left pane allows us to set the filter (index, type and field). The table listed is sortable. But we don’t always get what we are looking for manually. So, we have the following two sections.
4. Structured Query: Gives a dead simple UI that can be used to make a well structured request to Elasticsearch. This is what we need to search for to get Tweets from @gsoc that are indexed –

5. Any Request: Gives an advance console that allows executing any query allowable by Elasticsearch API.

A little about the loklak project and Elasticsearch

loklak is a server application which is able to collect messages from various sources, including twitter. The server contains a search index and a peer-to-peer index sharing interface. All messages are stored in an elasticsearch index.

Source: github/loklak/loklak_server

The project uses Elasticsearch to index all the data that it collects. It uses NodeBuilder to create Elasticsearch node and process the index. It is flexible enough to join an existing cluster instead of creating a new one, just by changing the configuration file.


This blog post tries to explain how NodeBuilder can be used to create Elasticsearch nodes and how they can be configured using Elasticsearch Settings.

It also demonstrates the installation and basic usage of elasticsearch-head, which is a great library to visualise and check queries against an Elasticsearch cluster.

The official Elasticsearch documentation is a good source of reference for its Java API and all other aspects.

Using NodeBuilder to instantiate node based Elasticsearch client and Visualising data

This API or that Library – which one?

Last week, I was playing with a scraper program in Loklak Server project when I came across a library Boilerpipe. There were some issues in the program related to it’s implementation. It worked well. I implemented it, pulled a request but was rejected due to it’s maintenance issues. This wasn’t the first time an API(or a library) has let me down, but this added one more point to my ‘Linear Selection Algorithm’ to select one.

Once Libraries revolutionized the Software Projects and now API‘s are taking abstraction to a greater level. One can find many API’s and libraries on GitHub or on their respective websites, but they may be buggy. This may lead to waste of one’s time and work. I am not blogging to suggest which one to choose between the two, but what to check before getting them into use in development.

So let us select a bunch of these and give score +1 if it satisfies the point, 0 for Don’t care condition and -1 , a BIG NO.

Now initialize the variable score to zero and lets begin.

1. First thing first. is it easy to understand

Does this library code belongs to your knowledge domain? Can you use it without any issue? Also consider your project’s platform compatibility with the library. If you are developing a prototype or a small software(like for an event like Hackathon), you shall choose easy-to-read tutorial as higher priority and score++. But if you are working on a project, you shouldn’t shy going an extra mile and retain the value of score.

2. Does it have any documentation or examples of implementation

It shall have to be well written, well maintained documentation. If it doesn’t, I am ok with examples. Choose well according to your comfort. If none, at least code shall be easy to understand.

3. Does it fulfill all my needs?

Test and try to implement all the methods/ API calls needed for the project. Sometimes it may not have all the methods you need for your application or may be some methods are buggy. Take care of this point, a faulty library can ruin all your hard work.

4. Efficiency and performance (BONUS POINT for this one)

Really important for projects with high capacity/performance issues.

5. See for the Apps where they are implemented

If you are in a hackathon or a dev sprint, Checking for applications working on this API shall work. Just skip the rest of the steps (except the first).

6. Can you find blogs, Stack Overflow questions and tutorials?

If yes, This is a score++

7. An Active Community, a Super GO!

Yaay! An extra plus with the previous point.

8. Don’t tell me it isn’t maintained

This is important as if the library isn’t maintained, you are prone to bugs that may pop up in  future and couldn’t be solved. Also it’s performance can never be improved. If there is no option, It is better to use it’s parts in your code so that you can work on it, if needed.

Now calculate the scores, choose the fittest one and get to work.

So with the deserving library in your hand, my first blog post here ends.

This API or that Library – which one?