Keeping alive the messenger architecture on free heroku dynos

Heroku is a cloud application platform, a PaaS for building and deploying web apps in various programming languages and frameworks. The entire messenger architecture has been written in Node.js and deployed to Heroku as its production environment for one main reason: every deployment comes with a verified, signed SSL certificate, which the Facebook Messenger integration of Susi requires and which Telegram needs in order to trigger webhooks over SSL.

A major problem with free dynos (app deployments) is that they automatically go to sleep after a short period of inactivity, wake up only when an endpoint is hit, and are limited to roughly 18 hours of uptime per day. So the best strategy is to make maximum use of the available resources and keep the dyno awake as and when required. This solves itself with Facebook Messenger, because incoming event webhooks hit the server and wake it up whenever it sleeps. But what about Slack? The Slack server doesn't send a notification event to our server when a user mentions Susi; the bot is just like every other user that can be added to a channel.

To fix this problem we've taken multiple approaches. The first is to ensure that the server pings itself at a fixed interval so that it stays alive, accomplished with this fragment of code:

setInterval(function() {
	http.get(heroku_deploy_url);
}, 1800000); // pings the deployment URL every 30 minutes

where heroku_deploy_url is an environment variable that the user sets to the URL of their deployment.
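For completeness, here is a small sketch of how that variable could be wired up; the config-var name HEROKU_DEPLOY_URL is an assumption for illustration, not taken from the post:

var http = require('http');

// set on Heroku with: heroku config:set HEROKU_DEPLOY_URL=http://yourapp.herokuapp.com/
var heroku_deploy_url = process.env.HEROKU_DEPLOY_URL;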

Another option was to use New Relic and its availability monitoring to keep sending requests at a fixed interval and thereby keep the server alive. This can be set up from the Heroku toolbelt:

heroku addons:add newrelic:standard
heroku addons:open newrelic

then using the following Ruby rake task, with PING_URL set via heroku config:add PING_URL=http://longweburl.herokuapp.com

desc "Pings PING_URL to keep a dyno alive"
    task :dyno_ping do
      require "net/http"

      if ENV['PING_URL']
        uri = URI(ENV['PING_URL'])
        Net::HTTP.get_response(uri)
      end
    end

and then scheduling the task for execution:

heroku addons:add scheduler:standard
heroku addons:open scheduler
rake dyno_ping

The last option was to use the existing loklak.net server and set up a cronjob on it to query the Heroku instance periodically, so that the instance quota isn't exhausted while still getting as much uptime as needed. The best option, however, would be to upgrade to a hobby plan and purchase a dyno to host the resource.
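A sketch of what such a cron entry could look like (the interval and deployment URL below are placeholders, not values from the post):

# ping the Heroku deployment every 25 minutes so it never idles long enough to sleep
*/25 * * * * curl -s https://yourapp.herokuapp.com/ > /dev/null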


Setting up Susi for access from Telegram Messenger

Telegram is one of the most popular communication applications in the open source community and was one of the first apps to ship end-to-end encryption, soon followed by WhatsApp and other messengers. Telegram is used by a large number of people, so we folks at loklak put on our thinking hats and decided to provide Susi's capabilities to those using Telegram too, and the Telegram integration of the Ask Susi messengers was born.

Consuming the Susi API from Telegram is fairly straightforward. Telegram's BotFather makes creating bots easy: log into Telegram with your user account, search for BotFather and start a conversation. BotFather asks a few questions and then provides the required token. Save this token, and the bot powered by Susi is ready to be set up.

[Screenshots: creating the bot with BotFather and receiving the access token]

This sets up the bot and provides the token. The next step is to use the token and define how the bot responds. The token is kept as an environment variable:

var telegramToken = process.env.TELEGRAM_ACCESS_TOKEN;
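The post's snippets use bot and request without showing their setup; a minimal sketch, assuming the node-telegram-bot-api and request packages (whose APIs match the calls below):

var TelegramBot = require('node-telegram-bot-api');
var request = require('request');

// long polling keeps the bot working without exposing a webhook URL
var bot = new TelegramBot(telegramToken, { polling: true });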

With the token in place, the response system is built around the message events received from Telegram. The standard entry point to the bot is the /start message sent to the Telegram service:

bot.onText(/\/start/, function (msg, match) {
	var fromId = msg.from.id;
	var resp = 'Hi, I am Susi, You can ask me anything !';
	bot.sendMessage(fromId, resp);
});

This greets the user the first time they start the bot. From here on, every message event is read, processed by Susi, and the response is returned:

bot.on('message', function (msg) {
	var chatId = msg.chat.id;
	var queryUrl = 'http://loklak.org/api/susi.json?q='+encodeURI(msg.text);
	var message = '';
	// Wait until done and reply
	if (msg.text !== '/start') {
		request({
			url: queryUrl,
			json: true
		}, function (error, response, body) {
			if (!error && response.statusCode === 200) {
				message = body.answers[0].actions[0].expression;
				bot.sendMessage(chatId, message);
			} else {
				message = 'Oops, Looks like Susi is taking a break, She will be back soon';
				bot.sendMessage(chatId, message);
			}
		});
	}
});

And with that, Susi's capabilities are now available to all those users on Telegram.


Rolling out Freifunk Router data IOT to Loklak

Freifunk is a non-commercial initiative for free wireless networks. The vision of Freifunk is to spread free networks, democratize communication infrastructure and promote local social structures. A large number of routers on the Freifunk network are available across Germany and a few other countries. Previously, an IOT integration pushed data about each of the available Freifunk nodes to loklak.

This time we're stretching it a little further: each collected node is packaged into an object and given back to the user in JSON format, so that the user can use the information for visualizations or other tasks. This was done using the fetch servlet, and the data looks somewhat like this:

"communities": {
    "aachen": {
      "name": "Freifunk Aachen",
      "url": "http://www.Freifunk-Aachen.de",
      "meta": "Freifunk Regio Aachen"
    }...,
}

"allTheRouters": [
    {
      "id": "60e327366bfe",
      "lat": "50.564485",
      "long": "6.359705",
      "name": "ffac-FeWo-Zum-Sonnenschein-2",
      "community": "aachen",
      "status": "online",
      "clients": 1
    }...,
}

The complete JSON dumps can be read by querying the Freifunk network and used to populate loklak with the locations stored on the router network. This information can be harvested every 24 hours to fetch updates for the entire network and refresh the results accordingly.

This data is available at the /api/freifunkfetch.json endpoint; the upstream JSON is fetched as follows:

	private static String readAll(Reader rd) throws IOException {
		StringBuilder sb = new StringBuilder();
		int cp;
		while ((cp = rd.read()) != -1) {
			sb.append((char) cp);
		}
		return sb.toString();
	}

	public static JSONObject readJsonFromUrl(String url) throws IOException, JSONException {
		InputStream is = new URL(url).openStream();
		try {
			BufferedReader rd = new BufferedReader(new InputStreamReader(is, Charset.forName("UTF-8")));
			String jsonText = readAll(rd);
			JSONObject json = new JSONObject(jsonText);
			return json;
		} finally {
			is.close();
		}
	}

In this way, the data for each Freifunk node is available on the loklak server and harvested regularly, adding one more IOT service to the loklak server and harvester.
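As a hypothetical usage sketch, the endpoint can be queried from Node.js like any other loklak API (the host and the request package here are assumptions):

var request = require('request');

request({
	url: 'http://loklak.org/api/freifunkfetch.json',
	json: true
}, function (error, response, body) {
	if (!error && response.statusCode === 200) {
		// body.communities and body.allTheRouters as shown above
		console.log(Object.keys(body.communities).length + ' communities fetched');
	}
});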


Scraping and Feeding IOT data sets into Loklak – Part 2

As the integrations for the IOT services began, there were challenges, especially with scraping multiple pages at once. Such was the case with the NOAA alerts and weather information of the US government. To scrape this information for the live updates that happen every 5 minutes, the process had to be simplified so that a complex web scraper didn't have to run every time, taking up precious cycles. What's really interesting about the website is that the data on any given page can be modeled as XML. Leveraging this, and the XML and data conversion logic implemented previously for such tasks, I started digging deeper into the workings of the website and realized that appending &y=0 to the alerts URL results in XML output. Here's an example of how this works:
https://alerts.weather.gov/cap/wwaatmget.php?x=AKC013&y=0
and
https://alerts.weather.gov/cap/wwaatmget.php?x=AKC013

[Screenshots: the HTML county alert page and the equivalent XML returned when &y=0 is appended]

Extracting this poses two different challenges: how to efficiently retrieve the list of counties, and how to construct the alert URL for each county. Perl to the rescue!

sub process_statelist {
    my $html = `wget -O- -q https://alerts.weather.gov/`;
    # NOTE: the substitution patterns in this listing were mangled by the
    # blog platform's HTML/email filters; the regexes below are approximate
    # reconstructions that trim the page down to the table of state links.
    $html =~ s@.*summary="Table@@s;    # drop everything before the states table
    $html =~ s@</table>.*@@s;          # drop everything after it
    %seen = ();

    while ( $html =~ m@/(\w+?)\.php\?x=1">([^<]+)@sg ) {
        my $code = $1;
        my $name = $2;
        $name =~ s/'/\\'/g;
        $name =~ s@\s+@ @g;            # collapse whitespace (reconstructed)
        if (!exists($seen{$code})) {
            push @states_entries, $name;
            push @states_entryValues, $code;
        }
        $seen{$code} = 1;
    }
    open STATE, ">", "states.xml";
    # The literal XML inside the heredocs was stripped in publishing; this is
    # a plausible reconstruction of an Android-style resource file.
    print STATE <<EOF1;
<?xml version="1.0" encoding="utf-8"?>
<resources>
    <string-array name="states_entries">
EOF1
    foreach my $entry (@states_entries) {
        my $temp = $entry;
        $temp =~ s/'/\\'/g;
        $temp = escapeHTML($temp);
        print STATE "        <item>$temp</item>\n";
    }
    print STATE <<EOF2;
    </string-array>
    <string-array name="states_entryValues">
EOF2
    foreach my $entryValue (@states_entryValues) {
        my $temp = $entryValue;
        print STATE "        <item>$temp</item>\n";
    }
    print STATE <<EOF3;
    </string-array>
</resources>
EOF3
    close STATE;
    print "Wrote states.xml.\n";
}

This makes a request to the website and builds the list of all the states in the USA. Now it's time to construct the counties for each state:

sub process_state {
    my $state = shift @_;
    if ( $state !~ /^[a-z]+$/ ) {
        print "Invalid state code: $state (skipped)\n";
        return;
    }

    my $html = `wget -O- -q https://alerts.weather.gov/cap/${state}.php?x=3`;

    my @entries     = ();
    my @entryValues = ();

    # As above, the original patterns were mangled in publishing; these are
    # approximate reconstructions that pull the county code ($1) and the
    # county name ($2) out of the county links on the page.
    $html =~ s@.*summary=@@s;   # skip ahead to the counties table
    while ( $html =~ m@wwaatmget\.php\?x=([^"&]+)[^>]*>([^<]+)@mg ) {
        push @entries,     $2;
        push @entryValues, $1;
    }
    my $unittype = "Entire State";
    if ($state =~ /^mz/) {
        $unittype = "Entire Marine Zone";
    }
    if ($state eq "dc") {
        $unittype = "Entire District";
    }
    if (grep { $_ eq $state } qw(as gu mp um vi) ) {
        $unittype = "Entire Territory";
    }
    if ($state eq "us") {
        $unittype = "Entire Country";
    }
    if ($state eq "mzus") {
        $unittype = "All Marine Zones";
    }
    # The heredoc XML below is likewise a plausible reconstruction.
    print COUNTIES <<EOF1;
    <string-array name="${state}_entries">
        <item>$unittype</item>
EOF1
    foreach my $entry (@entries) {
        my $temp = $entry;
        $temp =~ s/'/\\'/g;
        $temp = escapeHTML($temp);
        print COUNTIES "        <item>$temp</item>\n";
    }
    print COUNTIES <<EOF2;
    </string-array>
    <string-array name="${state}_entryValues">
        <item>https://alerts.weather.gov/cap/$state.php?x=0</item>
EOF2
    foreach my $entryValue (@entryValues) {
        my $temp = $entryValue;
        $temp =~ s/'/\\'/g;
        $temp = escapeHTML($temp);
        print COUNTIES "        <item>https://alerts.weather.gov/cap/wwaatmget.php?x=$temp&y=0</item>\n";
    }
    print COUNTIES <<EOF3;
    </string-array>
EOF3
    print "Processed counties from $state.\n";
}

And voila, we now have a perfect mapping between every single county and the alert URL for that county. The NOAA scraper and parser were quite a challenge, but they now provide real-time data through the loklak server. The information can be passed through the XML parser exposed as a service at /api/xml2json.json, so developers can receive the information in whatever format they require.
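As an illustration only, such a conversion call might look like the sketch below; the data query parameter is hypothetical, since the post doesn't show the servlet's exact signature:

var request = require('request');

// hypothetical parameter name; the idea is to hand the service an XML feed
// and get JSON back
var xmlFeed = 'https://alerts.weather.gov/cap/wwaatmget.php?x=AKC013&y=0';
request('http://loklak.org/api/xml2json.json?data=' + encodeURIComponent(xmlFeed),
	function (error, response, body) {
		if (!error && response.statusCode === 200) {
			console.log(body); // JSON rendering of the XML alert feed
		}
	});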


Scraping and Feeding IOT datasets into Loklak – Part 1

There's a lot of open data available online, be it on government open-data portals or other portals holding information in various formats. Many data portals and IOT devices support XML, CSV and JSON data queries, so type conversion support was previously integrated into loklak, making each source-to-destination conversion a simple method call. This also lets other parts of the code reuse these components. For example, converting a well-structured XML document to JSONML is a single method call:

XML.toJSONObject(xmlDataString);

Since all the required data conversion logic was in place, it was time to start scraping and fetching information from the targeted data sources and IOT devices. In the previous weeks, support for tracking GPS datasets from different GPS devices to render locations was completed; this time we looked ahead and started with the earthquake datasets available from the government. The data is classified by duration and magnitude, with these fixed values possible for each:

duration : hour, day, week, month
magnitude: significant, 1.0, 2.5, 4.5

So different sets of queries can be constructed, which roughly translate as follows:
1. Data of significant earthquakes in the last hour
2. Data of significant earthquakes in the last day
3. Data of significant earthquakes in the last week
4. Data of significant earthquakes in the last month
5. Data of earthquakes of magnitude 1.0 and above in the last hour
6. Data of earthquakes of magnitude 1.0 and above in the last day
7. Data of earthquakes of magnitude 1.0 and above in the last week
8. Data of earthquakes of magnitude 1.0 and above in the last month
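These duration/magnitude pairs map directly onto feed URLs. As a sketch, a query URL can be composed like this (the USGS summary-feed URL pattern below is an assumption for illustration, not taken from the post):

var duration = 'hour';         // one of: hour, day, week, month
var magnitude = 'significant'; // one of: significant, 1.0, 2.5, 4.5
var feedUrl = 'https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/'
	+ magnitude + '_' + duration + '.geojson';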

Similarly, other queries can be constructed. All of this data is real-time and refreshes at 5-minute intervals, enabling the loklak server to harvest it for use with Susi or to provide it to researchers, scientists and data visualization experts.

Once this stream was implemented, it was time to look at similar data structures and integrate them. Most IOT devices sending out weather-related information use a similar structure, so the next target was to integrate Yahi, the haze index; these devices monitor air quality in Singapore. The data looks like this:

[
  {
    "hardwareid": "hardwareid",
    "centerLat": "1.3132409",
    "centerLong": "103.8878271"
  },
  {
    "hardwareid": "48ff73065067555017271587",
    "centerLat": "1.348852",
    "centerLong": "103.926314"
  },
  {
    "hardwareid": "53ff6f066667574829482467",
    "centerLat": "1.3734711",
    "centerLong": "103.9950669"
  },
  {
    "hardwareid": "53ff72065075535141071387",
    "centerLat": "1.3028249",
    "centerLong": "103.762174"
  },
  {
    "hardwareid": "55ff6a065075555332151787",
    "centerLat": "1.2982054",
    "centerLong": "103.8335754"
  },
  {
    "hardwareid": "55ff6b065075555351381887",
    "centerLat": "1.296721",
    "centerLong": "103.787217",
    "lastUpdate": "2015-10-14T16:00:25.550Z"
  },
  {
    "hardwareid": "55ff6b065075555340221787",
    "centerLat": "1.3444644",
    "centerLong": "103.7046901",
    "lastUpdate": "2016-05-19T16:43:03.704Z"
  },
  {
    "hardwareid": "53ff72065075535133531587",
    "centerLat": "1.324921",
    "centerLong": "103.838749",
    "lastUpdate": "2015-11-29T01:45:44.985Z"
  },
  {
    "hardwareid": "53ff72065075535122521387",
    "centerLat": "1.317937",
    "centerLong": "103.911654",
    "lastUpdate": "2015-12-04T09:23:48.912Z"
  },
  {
    "hardwareid": "53ff75065075535117181487",
    "centerLat": "1.372952",
    "centerLong": "103.856987",
    "lastUpdate": "2015-01-22T02:06:23.470Z"
  },
  {
    "hardwareid": "55ff71065075555323451487",
    "centerLat": "1.3132409",
    "fillColor": "green",
    "centerLong": "103.8878271",
    "lastUpdate": "2016-08-21T13:39:01.047Z"
  },
  {
    "hardwareid": "53ff7b065075535156261587",
    "centerLat": "1.289199",
    "fillColor": "blue",
    "centerLong": "103.848112",
    "lastUpdate": "2016-08-21T13:39:06.981Z"
  },
  {
    "hardwareid": "55ff6c065075555332381787",
    "centerLat": "1.2854769",
    "centerLong": "103.8481097",
    "lastUpdate": "2015-03-19T02:31:18.738Z"
  },
  {
    "hardwareid": "55ff70065075555333491887",
    "centerLat": "1.308429",
    "centerLong": "103.796707",
    "lastUpdate": "2015-03-31T00:48:49.772Z"
  },
  {
    "hardwareid": "55ff6d065075555312471787",
    "centerLat": "1.4399071",
    "centerLong": "103.8030919",
    "lastUpdate": "2015-11-15T04:04:41.907Z"
  },
  {
    "hardwareid": "53ff6a065075535139311587",
    "centerLat": "1.310398",
    "fillColor": "green",
    "centerLong": "103.862517",
    "lastUpdate": "2016-08-21T13:38:56.147Z"
  }
]

There's more that happened with IOT, and more interesting scenarios; I'll detail them in the next follow-up blog post on IOT.


Releasing the loklak Python SDK 1.7

Python is one of the most popular languages among developers in the open source community and at startups, and what makes a library take off is how easy it is for developers to use. We noticed the same here at loklak: with a library, the data on the loklak server and the new Susi integration can each be leveraged with a single line of code, instead of developers writing complex reusable components to integrate loklak into their applications.


In the v1.7 release, major changes have been made to the library SDK, including direct parsing and conversion logic from one format to another, i.e. XML => JSON, JSON => XML and so on. In addition, the ability for developers to leverage Susi's capabilities has been integrated into this release. As the library matured, it now supports Python 2 and Python 3 simultaneously, so it's very simple for a developer to tap into Susi's capabilities.

To install the library, run pip install python-loklak-api; it works with both pip2 and pip3. Once the library is installed, it's very simple to make queries to loklak and to Susi with just a few lines of code. Here's an example that shows the modularity and robustness with which the library has been built:

>>> from loklak import Loklak
>>> from pprint import pprint
>>> l = Loklak() # Uses the domain loklak.org
>>> susi_result = l.susi('Hi I am Sudheesh')
>>> pprint(susi_result)
{'answer_date': '2016-08-20T04:56:17.371Z',
 'answer_time': 11,
 'answers': [{'actions': [{'expression': 'Hi sudheesh.', 'type': 'answer'}],
              'data': [{'0': 'i am sudheesh', '1': 'sudheesh'}],
              'metadata': {'count': 1, 'hits': 1, 'offset': 0}}],
 'client_id': 'aG9zdF8xODMuODMuMTIuNzY=',
 'count': 1,
 'query': 'Hi I am Sudheesh',
 'query_date': '2016-08-20T04:56:17.360Z',
 'session': {'identity': {'anonymous': True,
                          'name': '183.83.12.76',
                          'type': 'host'}}}

Similarly, fetching the information for a search or a user is equally easy:

>>> l.search('rio')
>>> l.user('sudheesh001')

This makes it possible for hundreds of developers and plugins in Python to leverage this library in frameworks like Django, Flask and Pyramid, or even from the command line. Head over to our GitHub repository for more details and documentation.


Architectural design for supporting susi on multiple messaging services

Susi has been evolving and learning more every single day, leveraging the billion+ tweets that the loklak server has indexed. The next important step is to hook up Susi's capabilities in a fashion that the world can easily use: a best friend powered by the data scraped from every available platform. With this in mind, we first dug deep into Facebook Messenger, potentially exposing Susi's capabilities to more than a billion people on the planet. But as we scale and move to other agents like Telegram, Slack etc., we need architectural changes that minimize code duplication as well as the number of resources we consume. In this blog post, I'll walk you through the design decisions behind the architecture planned to expose Susi to the world.

This is a detailed architecture for running all the different messaging services we wish to support in the near future. Chat and messenger apps are becoming very important, often the very first app one opens on a smartphone. It's important that the data in loklak is made sense of for the people out there, and Susi is a great step towards using the Twitter data, along with data from the other scrapers and sources, to answer people's queries. Running a lot of services is really simple when each one is set up on a separate server, but running the same code on multiple servers just to cater to one single messenger-like platform? Nah, not a great idea.

Almost all messenger platforms, be it Facebook Messenger, Telegram, Slack or anything else, work the same way: event driven, using webhooks. The idea here is to have multiple webhooks, and to create validation endpoints for platforms that, like Facebook, verify the server with a GET request before confirming it. At the same time, many of them require SSL certificates before the service can be set up; this part is simplified by Heroku hosting and the default SSL it provides for every application URL.

All the services residing in the same server host/application can share a common query library, i.e. one that makes requests to /api/susi.json and returns the corresponding JSON, or just the answer entry available at body.answers[0].actions[0].expression. A more modular architecture can be achieved by cleaning each of these services up into its own folder and routing to it from index.js. In such a system, index.js behaves as a proxy layer forwarding requests to the corresponding service agent, rather than being one monolithic file as it is now. The application structure over time would look like this:

[Image: messenger architecture diagram]

|- Common\QueryBuilder.js (Common Library to be used across)
|- Facebook/
|--------\facebook.js
|--------\supportFiles.js
|- Slack/
|- Telegram/
|- Susi's Chat Interface/
|- Other Services ...,
|- index.js (Route to required agent)
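A minimal routing sketch for that layout (the mount points and per-agent routers below are assumptions; file names follow the tree above):

var express = require('express');
var app = express();

// each agent folder exports an express router handling its own webhooks
app.use('/facebook', require('./Facebook/facebook'));
app.use('/telegram', require('./Telegram/telegram'));
app.use('/slack', require('./Slack/slack'));

app.listen(process.env.PORT || 5000);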

Setting up Susi’s capabilities on Facebook Messenger

Facebook's Messenger platform is a great way to reach out to a lot of people from a page one owns on Facebook. The Messenger service reaches almost 900 million people; that's a humongous audience to which Susi's capabilities could be brought by integrating with Facebook Messenger, and that's exactly what has been done. Susi is the AI system running in loklak which contains rules to run the required scrapers or fetch the information directly. To set this up, we created a new repository deployed on Heroku called asksusi_messengers, which is going to be a collection of such messenger integrations, i.e. Facebook, Slack, WhatsApp, Telegram etc., so that the power of mobile and messaging services can be used to make Susi smarter and to give people more ways to consume Susi's capabilities.

Considering the real-time nature of people and messages, we used Node.js to take requests through a webhook on Facebook which subscribes to the changes and events triggered from the asksusisu Facebook page. Here's how it really works: a Messenger bot uses a web server to process the messages it receives and to figure out what messages to send. The bot also needs to be authenticated to speak with the web server, and approved by Facebook to speak with the public. This means we create the messenger service as a Facebook app and then register the endpoint, so that Facebook can trigger that URL, much like a webhook, and the messages sent by users can be forwarded to Susi's AI service.

Using Express this is pretty simple and can be accomplished by the following piece of code, which listens on the root of the application:

app.get('/', function (req, res) {
	res.send('Susi says Hello.');
});

This ensures the application responds when a user hits the GET endpoint of the service. As a security measure, the rest of the application's endpoints are POST endpoints, so that the application's responses stay secure and nobody can send requests without the required tokens.

Facebook requires an SSL-enabled hostname to which the service can be bound. The fastest way to spin this up is a Heroku deployment, hence we use a Procfile with the contents:

web: node index.js

Then configure the application on Facebook Developers with the given name and set up a Messenger webhook there to send events to the Heroku server you just deployed. In addition, add your own verify token, which you need to remember and configure on Heroku too. After this you will receive a page access token, which you can save temporarily and later push to Heroku as a configuration variable. You can read this configuration as shown below, and Facebook's verification works:


var token = process.env.FB_PAGE_ACCESS_TOKEN;

// for facebook verification
app.get('/webhook/', function (req, res) {
	if (req.query['hub.verify_token'] === 'this_is_my_top_secret_token_that_i_know') {
		return res.send(req.query['hub.challenge']);
	}
	res.send('Error, wrong token');
});

You can then receive the events from Facebook on a POST endpoint for the webhook:

// to post data
app.post('/webhook/', function (req, res) {
	var messaging_events = req.body.entry[0].messaging;
	// each event in messaging_events is handled here
});
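For req.body to be populated in that handler, the app needs a JSON body parser; a minimal sketch of the assumed bootstrap (these lines aren't shown in the post):

var express = require('express');
var bodyParser = require('body-parser');
var request = require('request');

var app = express();
app.use(bodyParser.json()); // parse webhook POST bodies into req.body

app.listen(process.env.PORT || 5000, function () {
	console.log('Susi messenger service is up');
});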

Each triggered webhook event carries the incoming message in event.message, with its raw text in event.message.text. The query to Susi is then constructed:


// Construct the query for susi
var queryUrl = 'http://loklak.org/api/susi.json?q=' + encodeURI(text);
var message = '';
// Wait until done and reply
request({
	url: queryUrl,
	json: true
}, function (error, response, body) {
	if (!error && response.statusCode === 200) {
		message = body.answers[0].actions[0].expression;
		sendTextMessage(sender, message);
	} else {
		message = 'Oops, Looks like Susi is taking a break, She will be back soon';
		sendTextMessage(sender, message);
	}
});
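The sendTextMessage helper used above isn't shown in the post; presumably it posts the reply back through the Messenger Send API, roughly like this sketch (the Graph API version is an assumption):

function sendTextMessage(sender, text) {
	request({
		url: 'https://graph.facebook.com/v2.6/me/messages',
		qs: { access_token: token },
		method: 'POST',
		json: {
			recipient: { id: sender },
			message: { text: text }
		}
	});
}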

And voila, we have the Susi Facebook page automatically replying, powered by loklak's Susi capabilities. Currently Susi can reply with text messages as well as image responses.

[Screenshot: Susi's Facebook Messenger integration]