Architectural design for supporting Susi on multiple messaging services

Susi has been evolving and learning more every single day, leveraging the billion+ tweets that the loklak server has indexed. The next important step is to hook up Susi's capabilities in a fashion that the world can easily use: a best friend powered by the data that's scraped from every single platform available. With this in mind, we first dug deep into Facebook Messenger, potentially exposing Susi's capabilities to more than a billion people on the planet. But as we scale and move to other agents like Telegram, Slack, etc., we needed some architectural changes to minimize code duplication as well as the number of resources that we consume. In this blog post, I'll walk you through the design decisions behind the architecture planned to expose Susi to the world.

This is a detailed architecture for running all the different messaging services that we wish to support in the near future. Chat apps and messengers are becoming very important, often the very first app that one opens on a smartphone. It's very important that the data in loklak is made sense of for the people out there, and that Susi learns intelligently. Susi is a great step towards using the Twitter data, along with data from other scrapers and data sources, so that information can be given to people querying for it. Running a lot of services is really simple when we set up each one individually on a separate server, but running the same code on multiple servers just to cater to one single messenger-like platform? Nah, not a great idea.

Almost all of the messenger platforms, be it Facebook Messenger, Telegram, Slack or anything else, work the same way: they are event driven and use webhooks. The idea here is to have multiple of these webhooks and to create validation endpoints for the platforms that use GET request validation, the way Facebook verifies a webhook before activating it. At the same time, many of them need SSL certificates before the service can be set up. This part is simplified by Heroku hosting and the default SSL that it provides for every application URL.
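The GET verification handshake mentioned above can be sketched as a small Python function. The parameter names (hub.mode, hub.verify_token, hub.challenge) are the ones Facebook Messenger sends during webhook verification; the verify token itself is a placeholder you configure for your own app.

```python
# Sketch of the webhook GET verification handshake used by Facebook
# Messenger. VERIFY_TOKEN is an assumed placeholder, not a real secret.
VERIFY_TOKEN = "my-secret-token"

def verify_webhook(params):
    """Return the challenge string to echo back if the token matches,
    otherwise None (the request should then be rejected)."""
    if (params.get("hub.mode") == "subscribe"
            and params.get("hub.verify_token") == VERIFY_TOKEN):
        return params.get("hub.challenge")
    return None

# A valid verification request echoes the challenge back:
print(verify_webhook({"hub.mode": "subscribe",
                      "hub.verify_token": "my-secret-token",
                      "hub.challenge": "42"}))
```

The same handler, mounted per platform, lets one deployment validate several messenger webhooks.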

All the services residing in the same server host/application can share a common query library, i.e. one that makes requests to /api/susi.json and returns the corresponding JSON, or just the answer entry, which is available at body.answers[0].actions[0].expression. A more modular architecture can be targeted during the cleanup by splitting each of these services into its own folder and using routing from index.js. In such a system, index.js behaves as a proxy layer forwarding requests to the corresponding service agent, rather than the single monolithic index.js file we have now. The application structure over time would look like this:
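The shared query helper boils down to extracting the answer at body.answers[0].actions[0].expression from a parsed /api/susi.json response. A minimal sketch (the sample response below is illustrative, not a real server reply):

```python
# Extract Susi's answer from a parsed /api/susi.json response body.
def extract_expression(body):
    return body["answers"][0]["actions"][0]["expression"]

# Illustrative sample shaped like the path described above:
sample = {"answers": [{"actions": [{"expression": "Hello, I am Susi!"}]}]}
print(extract_expression(sample))
```

Each messenger agent can then call this one helper instead of duplicating the parsing logic.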

Messenger Architecture Diagram

|- Common/QueryBuilder.js (Common library to be used across services)
|- Facebook/
|- Slack/
|- Telegram/
|- Susi's Chat Interface/
|- Other Services ...
|- index.js (Route to required agent)

User Information via Loklak API

While working on "adding loklak API support to WordPress plugins", I found that a lot of plugins require Twitter user information, e.g. profile_pic, followers_count, tweets by the user, etc.

One can get exhaustive user information using the loklak Search and User APIs. In this blog post I will show how one can combine the results from loklak's Search and User APIs to provide all kinds of data required by a WordPress plugin.

Starting with the Search API!


Entering the search query in the URL would give you results like this:
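As a sketch, such a search request for a user's tweets can be assembled like this (assuming the public loklak.org server; the helper name is mine):

```python
from urllib.parse import urlencode

# Build a Search API request URL for all tweets from a given user.
# "loklak.org" is assumed as the server; swap in your own deployment.
def search_url(username, count=10):
    qs = urlencode({"q": "from:" + username, "count": count})
    return "http://loklak.org/api/search.json?" + qs

print(search_url("fossasia"))
# http://loklak.org/api/search.json?q=from%3Afossasia&count=10
```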


As you can see, it provides all the information related to the tweets (tweeted by the user) and a small user profile like this:


But what if you require the user's location, followers_count, friends_count, the number of tweets posted till now, etc.? For that you would have to take help from the loklak User API. Sample results for a user query

are given below:


But what if you also require the information of the user's followers and the people they are following? How do you achieve that? Well, the loklak API has a provision for that as well. All you need to do is add followers=&lt;count&gt; or following=&lt;count&gt; as needed.
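Putting the pieces together, a User API request with those extra parameters can be sketched like this (loklak.org and the helper name are assumptions for illustration):

```python
from urllib.parse import urlencode

# Build a User API request, optionally pulling follower/following
# profiles via the followers=<count> / following=<count> parameters.
def user_url(screen_name, followers=0, following=0):
    params = {"screen_name": screen_name}
    if followers:
        params["followers"] = followers
    if following:
        params["following"] = following
    return "http://loklak.org/api/user.json?" + urlencode(params)

print(user_url("fossasia", followers=10, following=10))
```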

The following data would be added to your user.json results.


It gives you the user information of the Twitter users followed by that user. Recursively, you can get their tweet data, their followers, and so on!

In order to get a complete user profile, you can merge the results from the Search and User APIs and create wonderful applications.

To do all this using Loklak PHP API, refer to the code snippet below.

$connection = new Loklak();
$tweets = $connection->search('', null, null, $username, $no_of_tweets);

$tweets = json_decode($tweets, true);
$tweets = json_decode($tweets['body'], true);

$user = $connection->user($username);
$user = json_decode($user, true);
$user = json_decode($user['body'], true);

$tweet_content = $tweets['statuses'];
for ($i = 0; $i < sizeof($tweet_content); $i++) {
    $tweet_content[$i] = array_merge($tweet_content[$i], $user);
}

Loklak ShuoShuo: Another feather in the cap!

Work is still going on for Loklak Weibo to extract the information as desired, and there shall be another post up soon explaining the intricacies of the implementation.

Currently, an attempt was made to parse the HTML page of QQ Shuoshuo (another Chinese Twitter-like service).

Just like last time, the major challenge is to understand the Chinese annotations, especially coming from a non-Chinese background. Google Translate aids in testing the retrieved data by helping me match each phrase and line.

I have made use of JSoup, the Java HTML parser library, which assists in extracting and manipulating data and in scraping and parsing HTML from a URL. JSoup is designed to deal with all varieties of HTML, hence as of now it is being considered a suitable choice.


/**
 *  Shuoshuo Crawler
 *  By Jigyasa Grover, @jig08
 */

package org.loklak.harvester;

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ShuoshuoCrawler {
    public static void main(String args[]) {

        Document shuoshuoHTML = null;
        Element recommendedTalkBox = null;
        Elements recommendedTalksList = null;
        String recommendedTalksResult[] = new String[100];
        Integer numberOfRecommendedTalks = 0;
        Integer i = 0;

        try {
            shuoshuoHTML = Jsoup.connect("").get();

            // The recommended talks live in the element with id "list2",
            // one <li> per talk.
            recommendedTalkBox = shuoshuoHTML.getElementById("list2");
            recommendedTalksList = recommendedTalkBox.getElementsByTag("li");

            for (Element recommendedTalk : recommendedTalksList) {
                recommendedTalksResult[i] = recommendedTalk.text();
                i++;
            }
            numberOfRecommendedTalks = i;
            System.out.println("Total Recommended Talks: " + numberOfRecommendedTalks);
            for (int k = 0; k < numberOfRecommendedTalks; k++) {
                System.out.println("Recommended Talk " + k + ": " + recommendedTalksResult[k]);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

The QQ Recommended Talks are now stored as an array of Strings.

Total Recommended Talks: 10
Recommended Talk 0: 不会在意无视我的人,不会忘记帮助过我的人,不会去恨真心爱过我的人。
Recommended Talk 1: 喜欢一个人是一种感觉,不喜欢一个人却是事实。事实容易解释,感觉却难以言喻。
Recommended Talk 2: 一个人容易从别人的世界走出来却走不出自己的沙漠
Recommended Talk 3: 有什么了不起,不就是幸福在左边,我站在了右边?
Recommended Talk 4: 希望我跟你的爱,就像新闻联播一样没有大结局
Recommended Talk 5: 你会遇到别的女子和她举案齐眉,而我自会有别的男子与我白首相携。
Recommended Talk 6: 既然爱,为什么不说出口,有些东西失去了,就再也回不来了!
Recommended Talk 7: 凡事都有可能,永远别说永远。
Recommended Talk 8: 都是因为爱,而喜欢上了怀旧;都是因为你,而喜欢上了怀念。
Recommended Talk 9: 爱是老去,爱是新生,爱是一切,爱是你。

A similar approach can now be used to do the same for the Latest QQ Talks and the QQ Talks Leaderboard.

Check out this space for upcoming details on implementing this technique to parse the entire page and get the desired results…

Feedback and Suggestions welcome.


Loklak API SDK Now supports golang

The Go programming language is quite a recent language. It's a statically typed and compiled language, unlike Python or other scripting languages. It has greater memory safety compared to C, supports garbage collection, and has inbuilt support for making HTTP and network requests via the "net/http" and "net/url" packages. Go is scalable to very large systems like Java and C++. It also makes things productive and is easily readable because of the far smaller number of keywords that it has.


One of the key things we notice with Golang, coming from a C/C++ background, is that there's nothing called a class. Wait, what? Exactly, you read it right: what would be a class elsewhere is a struct in Go. So the loklak object data structure, which we will be using throughout Golang's support for the various API requests, is as follows:

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/url"

	// Third-party pretty-print JSON library (assumed import path)
	prettyjson "github.com/hokaccha/go-prettyjson"
)

// Loklak packages the server URL and the search parameters
// (fields reconstructed from their usage in the code below).
type Loklak struct {
	baseUrl   string
	query     string
	since     string
	until     string
	from_user string
	count     string
	source    string
}

In Golang, it's recommended to list the built-in packages first, followed by the libraries you're fetching remotely; here that's the pretty-print JSON library. Since we follow DRY (Don't Repeat Yourself), we write a public function called getJSON as follows:

func getJSON(route string) (string, error) {
	r, err := http.Get(route)
	if err != nil {
		return "", err
	}
	defer r.Body.Close()

	var b interface{}
	if err := json.NewDecoder(r.Body).Decode(&b); err != nil {
		return "", err
	}
	out, err := prettyjson.Marshal(b)
	return string(out), err
}
If you're coming from a C++/C background you'd notice something really odd about this: the return types are mentioned in the function header itself. So a function func getJSON(route string) (string, error) takes a string in the variable route as input and returns two values of types (string, error). The above code takes the request URL and returns the corresponding JSON response.

Methods are generally not the preferred style in REST-based API development like loklak's, hence most of the queries can be made directly using functions. But we initially have a method where we can set the loklak server URL.

// Initiation of the loklak object
func (l *Loklak) Connect(urlString string) {
	_, err := url.Parse(urlString)
	if err != nil {
		log.Fatal(err)
	} else {
		l.baseUrl = urlString
	}
}
This takes a string urlString as a parameter and creates a method called Connect() on the Loklak object, which updates the base URL field of the loklak object. It is used as follows in the main package:

loklakObject := new(Loklak)
loklakObject.Connect("http://loklak.org/")

The Go language has built-in facilities, as well as library support, for writing concurrent programs. Concurrency refers not only to CPU parallelism, but also to asynchrony: letting slow operations like a database or network read run while the program does other work, as is common in event-based servers. This can be very useful for developers building highly parallel applications on top of loklak data.

One very interesting thing about Golang is that it doesn't support function overloading or default parameters, hence the search API, which was implemented using default parameters in PHP and Python, can't be implemented that way in Go. This has been tackled by using a search() function and prepackaging the request parameters into a loklak object.

// Search is implemented as a function and not as a method.
// Package the parameters required in the loklak object and pass accordingly.
func search(l *Loklak) string {
	apiQuery := l.baseUrl + "api/search.json"
	req, _ := http.NewRequest("GET", apiQuery, nil)

	q := req.URL.Query()
	// Query construction
	if l.query != "" {
		constructString := l.query
		if l.since != "" {
			constructString += " since:" + l.since
		}
		if l.until != "" {
			constructString += " until:" + l.until
		}
		if l.from_user != "" {
			constructString += " from:" + l.from_user
		}
		q.Add("q", constructString)
	}
	if l.count != "" {
		q.Add("count", l.count)
	}
	if l.source != "" {
		q.Add("source", l.source)
	}
	req.URL.RawQuery = q.Encode()
	queryURL := req.URL.String()
	out, err := getJSON(queryURL)
	if err != nil {
		log.Fatal(err)
	}
	return out
}
To use this search capability, one needs to create and package the request parameters and then call this function with the loklak object.

func main() {
	loklakObject := new(Loklak)
	loklakObject.Connect("http://loklak.org/")
	loklakObject.query = "fossasia"
	loklakObject.since = "2016-05-12"
	loklakObject.until = "2016-06-02"
	loklakObject.count = "10"
	loklakObject.source = "cache"
	searchResponse := search(loklakObject)
	fmt.Println(searchResponse)
}
The Golang API has great potential in aiding the loklak server, and can be used by developers running Golang who are looking to build highly scalable, high performance applications using loklak. The API is available on GitHub; feel free to open an issue in case you find a bug or need an enhancement.


Search vs Aggregations

The search aggregations, also known as facets (field aggregations), are often ignored while using the Search API provided by loklak. This post aims at clearing up the difference and bringing out the clear usage. It extends the previous blog post, titled "Know who tweeted the most!", where we discussed the Search API. Using the same example, the results were viewed using the search aggregations. The following two pictures illustrate the difference.



The above results are achieved by scraping the aggregation field called mentions, which contains the list of user names who mentioned the search hashtag in their tweets. The result is from the one billion tweets (~1,377,367,341) which loklak has scraped and indexed. The following screenshots show the JSON object of the search aggregations.


The above search metadata object shows some important fields which can alter the result. The count is set to zero and the cache is disabled so that the aggregations do not consider the remote search results. The query field shows the constraints on which the query must be performed, i.e. the hashtag for which it must query. Since the search aggregations are going to query the whole loklak index (over one billion tweets), we can limit the result payload by giving time constraints such as the "since" and "until" fields. We can even limit the result display by giving a value to the limit field. This helps you form a search query which gives a much more accurate result. The following is a sample query where you can check out the JSON result; edit the "q" variable and the time fields to your search requirement: since:2016-01-01 until:2016-03-01&source=cache&count=0&fields=mentions,hashtags
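The sample query above can be assembled programmatically. A sketch (the helper name is mine; the public loklak.org server is assumed):

```python
from urllib.parse import urlencode

# Build an aggregation-only search query: count=0 and source=cache keep
# remote results out, and fields= selects the facets to aggregate.
def aggregation_url(hashtag, since, until, fields=("mentions", "hashtags")):
    q = "%s since:%s until:%s" % (hashtag, since, until)
    params = {"q": q, "source": "cache", "count": 0,
              "fields": ",".join(fields)}
    return "http://loklak.org/api/search.json?" + urlencode(params)

print(aggregation_url("#fossasia", "2016-01-01", "2016-03-01"))
```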

The following shows the aggregations field and the benefits you can get out of it.


The aggregations field consists of the hashtags and the mentions. The mentions field contains all user names denoted by a '@' prefix in the message, listed without the leading '@'. The hashtags field contains all hashtags appearing in the message, without the leading '#'. There are more fields which you can mention in the query according to your need, such as limit, source, timezoneOffset, minified, etc. For more info on the search aggregations visit the docs page.

You can even visit the app "knowTheDiff", which guides you through the difference with a step by step intro.


The app is still under construction and needs to get more fields into the picture, but you can try out the basic difference. Here are some sample queries you can try out.


Developer Tools: Build your query using LQL

Writing request queries is definitely a hard job for developers trying to use any API. Sometimes the query strings go wrong, and sometimes you don't get the output you've been expecting. We understood a similar problem in the loklak API for developers and built the Loklak Query Language (LQL).

This tool takes in the fields and the type of query you want to make and dynamically creates the request URL in front of you. You can even test this URL and see the pretty-printed JSON responses returned by the server. Here's the best part: you get to use this with the custom URL of any loklak server that you deploy.


The team has put in quite some effort in scripting easy deployment buttons for Heroku, Bluemix, Scalingo, Docker Cloud, etc., and tools like this help developers build queries and look for the data they want.


There are a lot of features in store, and the tool already shows the query for a lot of API endpoints. It's a great tool to play with. In the future, we'd also integrate the way queries have to be made for the different programming language APIs that we support, so that you can directly access the required code segments and use them with the supporting library.

It's a lightweight application: every time a change is made in any of the fields in the form, the query gets generated completely on the client side, and at the same time the fields change based on the API call that has been chosen.

Have a look at the LQL here or head over to our github and give us feedback or open an issue.


Loklak Tweets – Tweets and much, much more!

As of the last update, the tweets functionality was able to post a regular tweet from the loklak webclient application. Since then it has undergone massive changes and multiple code refactors, so that there are now more features beyond the usual ones implemented by Twitter:

  1. The ability to post a map as an attachment to the tweet.
  2. The ability to post a large text attachment or markdown to Twitter.

Once the user is logged in, they can click on the tweet functionality, where they'd be provided with a lot of options: post a regular tweet, a tweet with an attached image, a tweet with a map attachment, or a tweet with markdown.



Ability to compose new tweets with map attachments


Ability to write markdown code as attachment to the tweets.


Live view of the markdown text rendered on the screen, so that the user can see the large text content they are attaching as markdown text to Twitter.

The user can also post their location via loklak to Twitter, similar to how Twitter uses its geolocation service: the navigator request takes the coordinates and reverse looks them up in the database so that the location name and the nearby areas can be retrieved.

Large text tweet with markdown content

Map attachment tweets

Image upload tweet

All the tweets that are posted over here are cross-posted to loklak to make it easier for the server to harvest them. We've moved servers after a lot of testing to make sure that the application performs well under heavy load. We have collected more than 42 million tweets so far on the main server.
