Time across seven seas…

It has been rightly said:

Time is of your own making
Its clock ticks in your head.
The moment you stop thought
Time too stops dead.


Hence to keep up with evolving times, Loklak has now introduced a new service for “time”.

The recently developed API provides the current time and day at the location queried by the user.

The /api/locationwisetime.json API scrapes the results from timeanddate.com using our favourite JSoup, which provides a very convenient API for extracting and manipulating data, and for scraping and parsing HTML from a given URL.

In case of multiple locations with the same name, the countries are also provided, along with the corresponding day and time, wrapped up as a JSONObject.

A sample query could then be something like: http://loklak.org/api/locationwisetime.json?query=london
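Based on the servlet code later in this post (which puts location/time pairs into the SusiThought data array), the response shape looks roughly like the following; the values here are made up for illustration, and the envelope may carry additional metadata fields:

```json
{
  "data": [
    { "location": "London, United Kingdom", "time": "Wed 14:28" },
    { "location": "London, Canada (Ontario)", "time": "Wed 9:28" }
  ]
}
```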


When implemented as a console service, this API can be used along with our dear SUSI by utilising API endpoints like: http://loklak.org/api/console.json?q=SELECT * FROM locationwisetime WHERE query='berlin';


LocationWiseTimeService.java for reference:


/**
 *  Location Wise Time
 *  timeanddate.com scraper
 *  Copyright 27.07.2016 by Jigyasa Grover, @jig08
 *
 *  This library is free software; you can redistribute it and/or
 *  modify it under the terms of the GNU Lesser General Public
 *  License as published by the Free Software Foundation; either
 *  version 2.1 of the License, or (at your option) any later version.
 *  
 *  This library is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 *  Lesser General Public License for more details.
 *  
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program in the file lgpl21.txt
 *  If not, see <http://www.gnu.org/licenses/>.
 */

package org.loklak.api.search;

import java.io.IOException;

import javax.servlet.http.HttpServletResponse;

import org.json.JSONArray;
import org.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.loklak.server.APIException;
import org.loklak.server.APIHandler;
import org.loklak.server.AbstractAPIHandler;
import org.loklak.server.Authorization;
import org.loklak.server.BaseUserRole;
import org.loklak.server.Query;
import org.loklak.susi.SusiThought;
import org.loklak.tools.storage.JSONObjectWithDefault;

public class LocationWiseTimeService extends AbstractAPIHandler implements APIHandler {

	private static final long serialVersionUID = -1495493690406247295L;

	@Override
	public String getAPIPath() {
		return "/api/locationwisetime.json";
	}

	@Override
	public BaseUserRole getMinimalBaseUserRole() {
		return BaseUserRole.ANONYMOUS;

	}

	@Override
	public JSONObject getDefaultPermissions(BaseUserRole baseUserRole) {
		return null;
	}

	@Override
	public JSONObject serviceImpl(Query call, HttpServletResponse response, Authorization rights,
			JSONObjectWithDefault permissions) throws APIException {
		String query = call.get("query", "");
		return locationWiseTime(query);
	}

	public static SusiThought locationWiseTime(String query) {

		JSONArray arr = new JSONArray();
		Document html = null;

		try {
			html = Jsoup.connect("http://www.timeanddate.com/worldclock/results.html?query=" + query).get();
		} catch (IOException e) {
			// if timeanddate.com cannot be reached, return an empty result
			// instead of running into a NullPointerException below
			e.printStackTrace();
			SusiThought empty = new SusiThought();
			empty.setData(arr);
			return empty;
		}

		// the results table alternates cells: even <td> cells hold the
		// location link, the following cell holds the corresponding time
		Elements locations = html.select("td");
		int i = 0;
		for (Element e : locations) {
			if (i % 2 == 0) {
				JSONObject obj = new JSONObject();
				String l = e.getElementsByTag("a").text();
				obj.put("location", l);
				String t = e.nextElementSibling().text();
				obj.put("time", t);
				arr.put(obj);
			}
			i++;
		}
		
		SusiThought json = new SusiThought();
		json.setData(arr);
		return json;
	}

}
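One small caveat about the snippet above: the query string is concatenated into the URL as-is, so a multi-word place name would produce an invalid URL. A minimal sketch of encoding the query first with the standard URLEncoder (the class and variable names below are illustrative, not part of the servlet):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeQuery {
    public static void main(String[] args) throws Exception {
        // a multi-word place name would break the URL if left unencoded
        String query = "new york";
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8.name());
        System.out.println("http://www.timeanddate.com/worldclock/results.html?query=" + encoded);
        // prints ...?query=new+york
    }
}
```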

 

Hope this helps, and is worth the “time” 😛

Feel free to ask questions regarding the above code snippet; I shall be happy to assist.

Feedback and Suggestions welcome 🙂


Bot integrations of Susi on Online Social Media: Slack

In my past few posts, I have explained the use of Susi in detail. We have come to see Susi as an intelligent chat bot cum search engine, which answers Natural Language queries, and has a large dataset to support it thanks to the various sites we scrape from, the APIs we integrate, and also, the additional services that we make (like the TwitterAnalysisService I talked about). All of these make Susi an excellent chat service.

So now the question comes up: how do we increase its reach?

This is where bot integration comes in. Services like Messenger (Facebook), Google Hangouts, Slack, Gitter etc. have a large number of users chatting on their platforms, and in addition they offer bot users. These users, when messaged with related queries, answer them for the user. We recently saw a very interesting example of this, when the White House used FB Messenger bots for people to reach out to President Obama (link). Bots give users quick, instant replies to specific queries, and bot integrations on these big platforms make more and more people connect with the bot and its maintainers too.

That is why we believed it would be amazing if Susi were integrated onto these platforms as a bot, so that people realise all the things it is able to do. Now, we need to implement these.

As Sudheesh must have spoken about, we are following a system of maintaining all the bots on one index.js file, all of the bots post to different routes, and we deploy this file and the npm requirements in the package.json so that all run concurrently. Keeping this in mind, I developed the Slack Bot for Susi.

I actually developed the bot in both Python and node, but we will only be using node because of its ease of deployment. For those who wish to check out the Python code, head over here. The Slack API usage remains the same though.

The main part of our bot will be the Slack RTM (Real Time Messaging) API. It basically reads all the conversation going on, and reports every message in a specified format, like:


{
    "id": 1,
    "type": "message",
    "channel": "C024BE91L",
    "text": "Hello world"
}

There are other parameters also included, like username. More info on all the parameters can be found here.

So this is the API schema. We will be using an npm package called slackbots for implementing our bot. Slackbots gives an easy way of interfacing with the RTM API, so that we can focus on implementing the Susi API in the bot without having to worry much about the RTM. You could read Slackbots’ documentation here.

For making the bot, first go here, register your bot, and get the access token. We will need this token to make authorised requests to the API. For keeping it in a secure place, store it as an environment variable:

export SLACK_TOKEN=<access token>

Now comes the main code. Create a new node project using npm init. Once the package.json is created, execute the following commands:


npm install --save request
npm install --save slackbots

This installs the slackbots and request packages in our project. We will need request to make a connection with the Susi API at http://loklak.org/api/susi.json.

Now we are all set to use slackbots and write our code. Make a new file index.js (add this to the package.json as well). Here’s the code for our slackbot.


'use strict';
/* global require, process, console */
var request = require('request');
var SlackBot = require('slackbots');
var slack_token = process.env.SLACK_TOKEN; //accessing the slack token from environment
var slack_bot = new SlackBot({
	token: slack_token, 
	name: 'susi'
});

slack_bot.on('message', function(data){
	var slackdata = data;
	var msg, channel, output, user;
	if(Object.keys(slackdata).length > 0){
		if('text' in slackdata && slackdata['username'] != 'susi'){
			msg = data['text'];
			channel = data['channel']
		}
		else {
			msg = null;
			channel = null;
		}
	}
	if(msg != null && channel !=null){
		var botid = ':' 
		if (msg.split(" ")[0] != botid){
			//do nothing
		} else{
			var apiurl = 'http://loklak.org/api/susi.json?q=' + msg;
			request(apiurl, function (error, response, body) {
				if (!error && response.statusCode === 200) {
					var data = JSON.parse(body);
					if(data.answers[0].actions.length == 1){
						var susiresponse = data.answers[0].actions[0].expression;
						slack_bot.postMessage(channel, susiresponse);
					} else if(data.answers[0].actions.length == 2 && data.answers[0].actions[1].type == "table"){
						slack_bot.postMessage(channel, data.answers[0].actions[0].expression + " (" + data.answers[0].data.length + " results)");
						for(var i = 0; i < data.answers[0].data.length; ++i){
							var response = data.answers[0].data[i];
							var ansstring = "";
							for(var resp in response){
								ansstring += (resp + ": " + response[resp] + ", ");
							}
							slack_bot.postMessage(channel, ansstring);
						}
					}
				}
			});
		}
	}
});

Let’s go over this code bit by bit. We instantiate SlackBots using our token first. Then, the line slack_bot.on('message', function(data) triggers the RTM API. We first get the message in the conversation and check whether its JSON is empty or not. Also, our bot should only reply when the user asks; it should not reply to its own messages (the RTM API continuously reads input, so even the bot’s replies come through it, and we don’t want the bot reacting to its own replies lest we get an infinite loop). This check is done through:


if(Object.keys(slackdata).length > 0){
		if('text' in slackdata && slackdata['username'] != 'susi'){
			msg = data['text'];
			channel = data['channel']
		}
		else {
			msg = null;
			channel = null;
		}
	}

We also get the text message and the channel to post the message into.

Next, we check for an empty message. If there is a message, we check whether it starts with @susi: (my bot was named susi; the bot ID came from the RTM API itself, and I hardcoded it). We should only query the Susi API when the message starts with @susi. Once that check is done, we query the Susi API, and the response is data.answers[0].actions[0].expression (except when it’s a table, in which case we use data.answers[0].data). Once we have what we need to send, we use SlackBot’s postMessage method and post the message onto the channel using the RTM API. That’s what the rest of the code does.


if(msg != null && channel !=null){
		var botid = ':' 
		if (msg.split(" ")[0] != botid){
			//do nothing
		} else{
			var apiurl = 'http://loklak.org/api/susi.json?q=' + msg;
			request(apiurl, function (error, response, body) {
				if (!error && response.statusCode === 200) {
					var data = JSON.parse(body);
					if(data.answers[0].actions.length == 1){
						var susiresponse = data.answers[0].actions[0].expression;
						slack_bot.postMessage(channel, susiresponse);
					} else if(data.answers[0].actions.length == 2 && data.answers[0].actions[1].type == "table"){
						slack_bot.postMessage(channel, data.answers[0].actions[0].expression + " (" + data.answers[0].data.length + " results)");
						for(var i = 0; i < data.answers[0].data.length; ++i){
							var response = data.answers[0].data[i];
							var ansstring = "";
							for(var resp in response){
								ansstring += (resp + ": " + response[resp] + ", ");
							}
							slack_bot.postMessage(channel, ansstring);
						}
					}
				}
			});
		}
	}
});

This completes the bot. When you start it from your terminal using node index.js, or deploy it, it will work perfectly.

This can now be used by a wide range of people, and everyone can see all that Susi can do. 🙂

We are still in the process of making bots. FB, Telegram and Slack bots have been made till now, and we will be making more. Feedback, as usual, is welcome. 🙂


Welcoming Wiki GeoData to Loklak !

Loklak has grown vast in due course of time, and its capabilities have extended manifold, especially with the inclusion of sundry website scraper services and data provider services.


The recent addition includes a special service which would provide the user with a list of Wikipedia articles tagged with the location when supplied with a specific name of the place.

Thanks to the MediaWiki GeoData API, this service was smoothly integrated into the Loklak server and SUSI (our very own cute and quirky personal digital assistant).

When the name of the place is sent in the query, the home-grown API loklak.org/api/geocode.json is first utilized to get the location co-ordinates, i.e. latitude and longitude.


URL getCoordURL = null;

String path = "data={\"places\":[\"" + place + "\"]}";

try {
    getCoordURL = new URL("http://loklak.org/api/geocode.json?" + path);
} catch (MalformedURLException e) {
    e.printStackTrace();
}

JSONTokener tokener = null;
try {
    tokener = new JSONTokener(getCoordURL.openStream());
} catch (Exception e1) {
    e1.printStackTrace();
}

JSONObject obj = new JSONObject(tokener);

String longitude = obj.getJSONObject("locations").getJSONObject(place).getJSONArray("location").get(0)
				.toString();
String latitude = obj.getJSONObject("locations").getJSONObject(place).getJSONArray("location").get(1)
				.toString();

The resultant geographical co-ordinates were then passed on to the MediaWiki GeoData API, along with other parameters like the radius of the geographical bound to be considered and the format of the resultant data, to obtain a list of page IDs of the corresponding Wikipedia articles, besides the title and distance.


URL getWikiURL = null;

try {
    getWikiURL = new URL(
                      "https://en.wikipedia.org/w/api.php?action=query&list=geosearch&gsradius=10000&gscoord=" + latitude
			+ "|" + longitude + "&format=json");
} catch (MalformedURLException e) {
    e.printStackTrace();
}

JSONTokener wikiTokener = null;

try {
    wikiTokener = new JSONTokener(getWikiURL.openStream());
} catch (Exception e1) {
    e1.printStackTrace();
}

JSONObject wikiGeoResult = new JSONObject(wikiTokener);

When implemented as a console service for Loklak, the servlet was registered as /api/wikigeodata.json?place={place-name}, and the API endpoint goes, for example, like http://localhost:9000/api/console.json?q=SELECT * FROM wikigeodata WHERE place='Singapore';

Presto !!

We get a JSON object as the result, containing the list of Wikipedia articles.


The page IDs thus obtained can now be utilized very easily to display the articles by using the placeholder https://en.wikipedia.org/?curid={pageid} for the purpose.
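For instance, building the article links from the page IDs is a one-liner; a minimal sketch (the page IDs below are placeholders, not actual results from the service):

```java
public class WikiArticleUrl {
    // build a Wikipedia article URL from a MediaWiki page ID
    static String articleUrl(long pageId) {
        return "https://en.wikipedia.org/?curid=" + pageId;
    }

    public static void main(String[] args) {
        long[] pageIds = {534366L, 17867L}; // placeholder page IDs
        for (long id : pageIds) {
            System.out.println(articleUrl(id));
        }
    }
}
```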

And this way, another facility was included in our diversifying Loklak server.

Questions, feedback, suggestions appreciated 🙂


Social Media Analysis using Loklak (Part 3)

In my last two blog posts, I spoke about the TwitterAnalysis Servlet, and how the data actually comes into it through scraping methods. For TwitterAnalysis though, there was one thing that was missing: Susi integration, which I’ll cover in this blog post.

Given that the TwitterAnalysis servlet is basically a Social Media profile analyser, we could definitely get a lot of useful statistics from it. As covered earlier, we are getting likes, retweets, hashtag statistics, sentiment analysis, frequency charts etc. Now, to get this working on Susi, we need to build queries which can use these statistics and give the user valuable information.

First off, the serviceImpl method needs to be changed to return a SusiThought object. SusiThought is a JSONObject which processes the query (does keyword extraction etc.), uses the APIs to get an answer to the query, and returns the answer along with the count of answers (in case of a table). SusiThought is what triggers the entire Susi mechanism, so the first thing for Susi integration is to convert TwitterAnalysis to return a SusiThought object:


@Override
	public JSONObject serviceImpl(Query call, HttpServletResponse response, Authorization rights,
			JSONObjectWithDefault permissions) throws APIException {
		String username = call.get("screen_name", "");
		String count = call.get("count", "");
		TwitterAnalysisService.request = call.getRequest();
		return showAnalysis(username, count);
	}
public static SusiThought showAnalysis(String username, String count) {

//rest of the code as explained in last blog post
//SusiThought is a JSONObject so we simply copy-paste the serviceImpl code here

}

Once this is done, we write up the queries in the susi_cognition.

As you may have read in my last blog post, TwitterAnalysis gives the Twitter profile analysis of a user; it’s basically statistics, so we could have a lot of queries regarding it. These are the rules I implemented; they are self-explanatory on reading the example fields:


{
			"keys"   :["tweet frequency", "tweets", "month"],
			"score"  :2000,
			"example": "How many tweets did melaniatrump post in May 2016",
			"phrases":[ {"type":"pattern", "expression":"* tweet frequency of * in *"},
				{"type":"pattern", "expression":"* tweets did * post in *"},
				{"type":"pattern", "expression":"* tweets did * post in the month of *"}
			],
			"process":[ {"type": "console", "expression": "SELECT yearwise[$3$] AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ tweeted $count$ times in $3$"
			]}]
		},
		{
			"keys"   :["tweet frequency", "tweets", "post", "at"],
			"score"  :2000,
			"example": "How many tweets did melaniatrump post at 6 PM",
			"phrases":[ {"type":"pattern", "expression":"* tweet frequency of * at *"},
				{"type":"pattern", "expression":"* tweets did * post at *"}
			],
			"process":[ {"type": "console", "expression": "SELECT hourwise[$3$] AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ tweeted $count$ times at $3$"
			]}]
		},
		{
			"keys"   :["tweet frequency", "tweets", "post", "on"],
			"score"  :2000,
			"example": "How many tweets did melaniatrump post on Saturdays",
			"phrases":[ {"type":"pattern", "expression":"* tweet frequency of * on *s"},
				{"type":"pattern", "expression":"* tweets did * post on *s"},
				{"type":"pattern", "expression":"* tweet frequency of * on *"},
				{"type":"pattern", "expression":"* tweets did * post on *"}
			],
			"process":[ {"type": "console", "expression": "SELECT daywise[$3$] AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ tweeted $count$ times on $3$"
			]}]
		},
		{
			"keys"   :["tweet frequency", "chart"],
			"score"  :2000,
			"example": "Show me the yearwise tweet frequency chart of melaniatrump",
			"phrases":[ {"type":"pattern", "expression":"* the * tweet frequency chart of *"}],
			"process":[ {"type": "console", "expression": "SELECT $2$ FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"This is the $2$ frequency chart of $3$"
			]}, {"type":"table"}]
		},
		{
			"keys"   :["tweet type", "post", "a"],
			"score"  :2000,
			"example": "How many times did melaniatrump post a video",
			"phrases":[ {"type":"pattern", "expression":"* did * post a *"}],
			"process":[ {"type": "console", "expression": "SELECT $3$ AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ posted a $3$ $count$ times"
			]}]
		},
		{
			"keys"   :["tweet activity", "likes", "count"],
			"example": "How many likes does melaniatrump have in all",
			"score"  :2000,
			"phrases":[ {"type":"pattern", "expression":"* likes does * have *"}
			],
			"process":[ {"type": "console", "expression": "SELECT likes_count AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ has $count$ likes till now"
			]}]
		},
		{
			"keys"   :["tweet activity", "likes", "maximum"],
			"example": "What is the maximum number of likes that melaniatrump got",
			"score"  :2000,
			"phrases":[ {"type":"pattern", "expression":"* maximum * likes that * got"}
			],
			"process":[ {"type": "console", "expression": "SELECT max_likes FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here you go"
			]}, {"type": "table"}]
		},
		{
			"keys"   :["tweet activity", "likes", "average"],
			"example": "What is the average number of likes that melaniatrump gets",
			"score"  :2000,
			"phrases":[ {"type":"pattern", "expression":"* average * likes that * gets"}
			],
			"process":[ {"type": "console", "expression": "SELECT average_number_of_likes AS count FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$3$ gets $count$ likes on an average"
			]}]
		},
		{
			"keys"   :["tweet activity", "likes", "frequency"],
			"score"  :2000,
			"example": "How many times did melaniatrump get 0 likes",
			"phrases":[ {"type":"pattern", "expression":"* * have * likes"},
				{"type":"pattern", "expression":"* * get * likes"}
			],
			"process":[ {"type": "console", "expression": "SELECT likes_chart[$3$] AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ got $3$ likes, $count$ times"
			]}]
		},
		{
			"keys"   :["tweet activity", "likes", "frequency", "chart"],
			"score"  :2000,
			"example": "Show me the likes frequency chart of melaniatrump",
			"phrases":[ {"type":"pattern", "expression":"* likes frequency chart * *"}
			],
			"process":[ {"type": "console", "expression": "SELECT likes_chart FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here is the likes frequency chart"
			]}, {"type": "table"}]
		},
		{
			"keys"   :["tweet activity", "retweets", "count"],
			"score"  :2000,
			"example": "How many retweets does melaniatrump have in all",
			"phrases":[ {"type":"pattern", "expression":"* retweets does * have *"}
			],
			"process":[ {"type": "console", "expression": "SELECT retweets_count AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ has $count$ retweets till now"
			]}]
		},
		{
			"keys"   :["tweet activity", "retweets", "maximum"],
			"score"  :2000,
			"example": "What is the maximum number of retweets that melaniatrump got",
			"phrases":[ {"type":"pattern", "expression":"* maximum * retweets that * got"}
			],
			"process":[ {"type": "console", "expression": "SELECT max_retweets FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here you go"
			]}, {"type": "table"}]
		},
		{
			"keys"   :["tweet activity", "retweets", "average"],
			"score"  :2000,
			"example": "What is the average number of retweets that melaniatrump gets",
			"phrases":[ {"type":"pattern", "expression":"* average * retweets that * gets"}
			],
			"process":[ {"type": "console", "expression": "SELECT average_number_of_retweets AS count FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$3$ gets $count$ retweets on an average"
			]}]
		},
		{
			"keys"   :["tweet activity", "retweets", "frequency"],
			"score"  :2000,
			"example": "How many times did melaniatrump get 0 retweets",
			"phrases":[ {"type":"pattern", "expression":"* * have * retweets"},
				{"type":"pattern", "expression":"* * get * retweets"}
			],
			"process":[ {"type": "console", "expression": "SELECT retweets_chart[$3$] AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ got $3$ retweets, $count$ times"
			]}]
		},
		{
			"keys"   :["tweet activity", "retweets", "frequency", "chart"],
			"score"  :2000,
			"example": "Show me the retweet frequency chart of melaniatrump",
			"phrases":[ {"type":"pattern", "expression":"* retweet frequency chart * *"}
			],
			"process":[ {"type": "console", "expression": "SELECT retweets_chart FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here is the retweets frequency chart"
			]}, {"type": "table"}]
		},
		{
			"keys"   :["tweet activity", "hashtags", "count"],
			"score"  :2000,
			"example": "How many hashtags has melaniatrump used in all",
			"phrases":[ {"type":"pattern", "expression":"* hashtags has * used *"}
			],
			"process":[ {"type": "console", "expression": "SELECT hashtags_used_count AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ has used $count$ hashtags till now"
			]}]
		},
		{
			"keys"   :["tweet activity", "hashtags", "maximum"],
			"score"  :2000,
			"example": "What is the maximum number of hashtags that melaniatrump used",
			"phrases":[ {"type":"pattern", "expression":"* maximum * hashtags that * used"}
			],
			"process":[ {"type": "console", "expression": "SELECT max_hashtags FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here you go"
			]}, {"type": "table"}]
		},
		{
			"keys"   :["tweet activity", "hashtags", "average"],
			"score"  :2000,
			"example": "What is the average number of hashtags that melaniatrump uses",
			"phrases":[ {"type":"pattern", "expression":"* average * hashtags that * uses"}
			],
			"process":[ {"type": "console", "expression": "SELECT average_number_of_hashtags_used AS count FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$3$ uses $count$ hashtags on an average"
			]}]
		},
		{
			"keys"   :["tweet activity", "hashtags", "frequency"],
			"score"  :2000,
			"example": "How many times did melaniatrump use 20 hashtags",
			"phrases":[ {"type":"pattern", "expression":"* * use * hashtags"}
			],
			"process":[ {"type": "console", "expression": "SELECT hashtags_chart[$3$] AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ used $3$ hashtags, $count$ times"
			]}]
		},
		{
			"keys"   :["tweet activity", "hashtags", "frequency", "chart"],
			"score"  :2000,
			"example": "Show me the hashtag frequency chart of melaniatrump",
			"phrases":[ {"type":"pattern", "expression":"* hashtag frequency chart * *"}
			],
			"process":[ {"type": "console", "expression": "SELECT hashtags_chart FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here is the hashtags frequency chart"
			]}, {"type": "table"}]
		},
		{
			"keys"   :["tweet content", "language", "frequency"],
			"score"  :2000,
			"example": "How many tweets did melaniatrump write in English?",
			"phrases":[ {"type":"pattern", "expression":"* * write in *"},
				{"type":"pattern", "expression":"* * post in *"},
				{"type":"pattern", "expression":"* of * were written in *"}
			],
			"process":[ {"type": "console", "expression": "SELECT languages[$3$] AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ posted $count$ tweets in $3$"
			]}]
		},
		{
			"keys"   :["tweet content", "language", "analysis", "chart"],
			"score"  :2000,
			"example": "Show me the language analysis chart of melaniatrump",
			"phrases":[ {"type":"pattern", "expression":"* language analysis chart * *"}
			],
			"process":[ {"type": "console", "expression": "SELECT languages FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here is the language analysis chart"
			]}, {"type": "table"}]
		},
		{
			"keys"   :["tweet content", "sentiment", "analysis", "chart"],
			"score"  :2000,
			"example": "Show me the sentiment analysis chart of melaniatrump",
			"phrases":[ {"type":"pattern", "expression":"* sentiment analysis chart * *"}
			],
			"process":[ {"type": "console", "expression": "SELECT sentiments FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here is the sentiment analysis chart"
			]}, {"type": "table"}]
		}

(PS: no points for guessing why melaniatrump is there in the examples 😉 )

As has been explained before, I simply write up an expression consisting of parameters and some hardcoded core words, and I then fetch the parameters using $x$ (x = parameter number). These queries can give a whole lot of statistics regarding a user’s activity and the activity on their profile, so they are definitely very useful for a chatbot.
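To illustrate how a phrase pattern maps onto parameters (this is a toy approximation, not Susi’s actual pattern matcher): each * can be treated as a capture group, and $2$, $3$ then read off the matched groups:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhrasePatternDemo {
    public static void main(String[] args) {
        // each * in "* tweets did * post in *" becomes a capture group
        Pattern p = Pattern.compile("(.*) tweets did (.*) post in (.*)");
        Matcher m = p.matcher("How many tweets did melaniatrump post in May 2016");
        if (m.matches()) {
            System.out.println(m.group(2)); // $2$ = melaniatrump
            System.out.println(m.group(3)); // $3$ = May 2016
        }
    }
}
```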

Now, to end this, we need a way to process these queries. Enter ConsoleService. Notice that all the process['expression'] SQL queries are of the type:

SELECT <something> FROM twitanalysis where screen_name = '<parameter_of_username>' AND count = '1000';

I have taken count as 1000 because, as mentioned in my last blog post, the scraper scrapes and displays a maximum of 1000 results at a time, so I wish to maximise the range.

Converting the above SQL generalised query to regex, we get this form:

SELECT\\h+?(.*?)\\h+?FROM\\h+?twitanalysis\\h+?WHERE\\h+?screen_name\\h??=\\h??'(.*?)'\\h+?AND\\h+?count\\h??=\\h??'(.*?)'\\h??;

The \h matches horizontal whitespace that may occur, and the variable parts are expressed using (.*?) (a non-greedy wildcard match in regex). Since we have specific parameters (as described above) in our SQL queries, we encapsulate these wildcards into capture groups.
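As a quick sanity check of the regex (a standalone sketch; the query string is just the generalised form above with sample values filled in):

```java
public class ConsoleRegexDemo {
    public static void main(String[] args) {
        // the same regex pattern, compiled standalone
        java.util.regex.Pattern p = java.util.regex.Pattern.compile(
            "SELECT\\h+?(.*?)\\h+?FROM\\h+?twitanalysis\\h+?WHERE\\h+?screen_name\\h??=\\h??'(.*?)'\\h+?AND\\h+?count\\h??=\\h??'(.*?)'\\h??;");
        java.util.regex.Matcher m = p.matcher(
            "SELECT likes_count AS count FROM twitanalysis WHERE screen_name='melaniatrump' AND count='1000';");
        if (m.find()) {
            System.out.println(m.group(1)); // the selected columns
            System.out.println(m.group(2)); // screen_name -> melaniatrump
            System.out.println(m.group(3)); // count -> 1000
        }
    }
}
```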

Now, we need to compile this regex, and point to what needs to be done. This is done in ConsoleService.


dbAccess.put(Pattern.compile("SELECT\\h+?(.*?)\\h+?FROM\\h+?twitanalysis\\h+?WHERE\\h+?screen_name\\h??=\\h??'(.*?)'\\h+?AND\\h+?count\\h??=\\h??'(.*?)'\\h??;"), (flow, matcher) -> {
            SusiThought json = TwitterAnalysisService.showAnalysis(matcher.group(2), matcher.group(3));
            SusiTransfer transfer = new SusiTransfer(matcher.group(1));
            json.setData(transfer.conclude(json.getData()));
            return json;
        });

We basically compile the regex and feed it to a BiFunction (lambda lingo for a function that takes in two params). We take in the groups using matcher.group; as you saw above, showAnalysis in TwitterAnalysis takes in screen_name and count, so we read them from the matcher and feed them to the static function showAnalysis inside the TwitterAnalysis servlet. We then get back the JSON. This completes the procedure. TwitterAnalysis is now integrated with the Susi API. 🙂

In my next blog posts, I’ll talk about Bot integrations for Susi, and a Slack Bot for Susi I made, and then I’ll move to Susi monetisation using Amazon API. Feedback is welcome 🙂


Improving Code Coverage for Loklak PHP API

Tests


This week, I added tests for suggest, map, markdown, push and susi APIs and fixed tests for user topology related queries. PHPUnit was used as the testing framework since it provides code-coverage support.

Loklak PHP API now has good test-suite support and associated examples to help you use our services.

Below are the tests that were added.

testPush – to test the Push API.

testSusi – to test the newly added Susi API.

Refer to this and this for more info about Susi.

testMap – to test the map API.

testMarkdown – to test the markdown API.

For more detailed information regarding the entire Loklak PHP API test-suite, refer to this.

In the above process, code coverage increased from 33% to 61%. The test suite is continuously updated as new APIs are added to Loklak.

Source support for Search API


Apart from that, since Loklak is scaling beyond Twitter, a source argument has been added to the Search API to define the source of the search results. As far as WordPress plugins are concerned, since they only require Twitter results for now, source has been added as a default argument.


Push & Pull : Scraped Data into Index and back

With many scrapers being integrated into the Loklak server, it is only natural that the load on the server increases when a multitude of requests must be served every millisecond.

Initially, when Loklak only harvested tweets from Twitter, Elasticsearch was implemented along with a Data Access Object (DAO) to take care of indexing.

The JSON object(s) pushed into the index were of the form statuses and had to be in a specific format so that they could be stored in and retrieved from the index easily.

Sample:


{
  "statuses": [
    {
      "id_str": "yourmessageid_1234",
      "screen_name": "testuser",
      "created_at": "2016-07-22T07:53:24.000Z",
      "text": "The rain is spain stays always in the plain",
      "source_type": "GENERIC",
      "place_name": "Georgia, USA",
      "location_point": [
        3.058579854228782,
        50.63296878274201
      ],
      "location_radius": 0,
      "user": {
        "user_id": "youruserid_5678",
        "name": "Mr. Bob"
      }
    }
  ]
}

But with the inclusion of many other scrapers, such as those for GitHub, WordPress, Eventbrite etc., and RSS readers, it was a bit cumbersome to use the exact same format as Twitter's because not all fields matched.

For example:


{
  "data": [
    {
      "location": "Canada - Ontario - London",
      "time": "Sun 9:33 PM"
    },
    {
      "location": "South Africa - East London",
      "time": "Mon 3:33 AM"
    }
  ]
}

Hence, Scott suggested implementing a DAO wrapper which would enable us to use the same schema as the Twitter index to push and pull data.

The DAO wrapper was implemented as a GenericJSONBuilder, which can add the remaining data fields, other than the text, into the same schema using a Markdown-style format.

Peeking into the code:


package org.loklak.data;

import javafx.util.Pair;
import org.loklak.objects.MessageEntry;
import org.loklak.objects.QueryEntry;
import org.loklak.objects.SourceType;
import org.loklak.objects.UserEntry;

import java.net.MalformedURLException;
import java.util.*;

/**
 * The json below is the minimum json
 * {
 "statuses": [
 {
 "id_str": "yourmessageid_1234",
 "screen_name": "testuser",
 "created_at": "2016-07-22T07:53:24.000Z",
 "text": "The rain is spain stays always in the plain",
 "source_type": "GENERIC",
 "place_name": "Georgia, USA",
 "location_point": [3.058579854228782,50.63296878274201],
 "location_radius": 0,
 "user": {
 "user_id": "youruserid_5678",
 "name": "Mr. Bob"
 }
 }
 ]
 }
 */
public class DAOWrapper {
    public static final class GenericJSONBuilder{
        private String id_str = null;
        private String screen_name = "unknown";
        private Date created_at = null;
        private String text = "";
        private String place_name = "unknown";
        private String user_name = "[email protected]";
        private String user_id = "unknown";
        private String image = null;
        private double lng = 0.0;
        private double lat = 0.0;
        private int loc_radius = 0;
        private ArrayList<String> extras = new ArrayList<>();


        /**
         * Not required
         * @param author
         * @param domain
         * @return
         */
        public GenericJSONBuilder setAuthor(String author, String domain){
            user_name = author + "@" + domain;
            screen_name = author;
            return this;
        }

        /**
         * Not required
         * @param user_id_
         * @return
         */
        public GenericJSONBuilder setUserid(String user_id_){
            user_id = user_id_;
            return this;
        }

        /**
         * Not required
         * @param id_str_
         * @return
         */
        public GenericJSONBuilder setIDstr(String id_str_){
            id_str = id_str_;
            return this;
        }

        /**
         * Not required
         * @param createdTime
         * @return
         */
        public GenericJSONBuilder setCreatedTime(Date createdTime){
            created_at = createdTime;
            return this;
        }

        /**
         * Required
         * This is the text field. You can use JSON style in this field
         * @param text_
         * @return
         */
        public GenericJSONBuilder addText(String text_){
            text = text + text_;
            return this;
        }

        /**
         * Not required
         * @param name
         * @return
         */
        public GenericJSONBuilder setPlaceName(String name){
            place_name = name;
            return this;
        }

        /**
         * Not required
         * @param longitude
         * @param latitude
         * @return
         */
        public GenericJSONBuilder setCoordinate(double longitude, double latitude){
            lng = longitude;
            lat = latitude;
            return this;
        }

        /**
         * Not required
         * @param radius
         * @return
         */
        public GenericJSONBuilder setCoordinateRadius(int radius){
            loc_radius = radius;
            return this;
        }


        /**
         * Not required
         * @param key
         * @param value
         * @return
         */
        public GenericJSONBuilder addField(String key, String value){
            String pair_string = "\"" + key + "\": \"" + value + "\"";
            extras.add(pair_string);
            return this;
        }

        private String buildFieldJSON(){
            String extra_json = "";
            for(String e:extras){
                extra_json =  extra_json + e + ",";
            }
            if(extra_json.length() > 2) extra_json = "{" + extra_json.substring(0, extra_json.length() -1) + "}";
            return extra_json;
        }

        /**
         * Not required
         * @param link_
         * @return
         */
        public GenericJSONBuilder setImage(String link_){
            image = link_;
            return this;
        }

        public void persist(){
            try{
                //building message entry
                MessageEntry message = new MessageEntry();

                /**
                 * Use hash of text if id of message is not set
                 */
                if(id_str == null)
                    id_str = String.valueOf(text.hashCode());

                message.setIdStr(id_str);

                /**
                 * Get current time if not set
                 */
                if(created_at == null)
                    created_at = new Date();
                message.setCreatedAt(created_at);


                /**
                 * Append the field as JSON text
                 */
                message.setText(text + buildFieldJSON());

                double[] locPoint = new double[2];
                locPoint[0] = lng;
                locPoint[1] = lat;

                message.setLocationPoint(locPoint);

                message.setLocationRadius(loc_radius);

                message.setPlaceName(place_name, QueryEntry.PlaceContext.ABOUT);
                message.setSourceType(SourceType.GENERIC);

                /**
                 * Insert if there is a image field
                 */
                if(image != null) message.setImages(image);

                //building user
                UserEntry user = new UserEntry(user_id, screen_name, "", user_name);

                //build message and user wrapper
                DAO.MessageWrapper wrapper = new DAO.MessageWrapper(message,user, true);

                DAO.writeMessage(wrapper);
            } catch (MalformedURLException e){
                // ignore messages whose image link is not a valid URL
            }
        }
    }





    public static GenericJSONBuilder builder(){
        return new GenericJSONBuilder();
    }





    public static void insert(Insertable msg){

        GenericJSONBuilder bd = builder()
        .setAuthor(msg.getUsername(), msg.getDomain())
        .addText(msg.getText())
        .setUserid(msg.getUserID());


        /**
         * Insert the fields
         */
        List<Pair<String, String>> fields = msg.getExtraField();
        for(Pair<String, String> field:fields){
            bd.addField(field.getKey(), field.getValue());
        }

        // finally push the assembled message into the index
        bd.persist();
    }
}
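The addField/buildFieldJSON pair above can be exercised in isolation. Below is a tiny standalone re-creation of that string-building logic (the class name FieldJsonSketch is made up for illustration; the Loklak-specific classes are left out):

```java
import java.util.ArrayList;
import java.util.List;

public class FieldJsonSketch {
    private final List<String> extras = new ArrayList<>();

    // Same idea as GenericJSONBuilder#addField: store key/value as a JSON pair
    public FieldJsonSketch addField(String key, String value) {
        extras.add("\"" + key + "\": \"" + value + "\"");
        return this;
    }

    // Mirrors GenericJSONBuilder#buildFieldJSON: join pairs, strip the
    // trailing comma, and wrap the result in braces
    public String buildFieldJSON() {
        StringBuilder sb = new StringBuilder();
        for (String e : extras) sb.append(e).append(",");
        if (sb.length() > 2) return "{" + sb.substring(0, sb.length() - 1) + "}";
        return sb.toString();
    }

    public static void main(String[] args) {
        String json = new FieldJsonSketch()
            .addField("location", "Canada - Ontario - London")
            .addField("time", "Sun 9:33 PM")
            .buildFieldJSON();
        System.out.println(json);
        // {"location": "Canada - Ontario - London","time": "Sun 9:33 PM"}
    }
}
```

This is how the non-Twitter fields (like the location and time pairs shown earlier) end up appended to the text field of the message.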

The DAOWrapper was then used by other scrapers to push data into the index as:


...
DAOWrapper.GenericJSONBuilder builder = DAOWrapper.builder();
builder.addText(json.toString());
builder.setUserid("profile_" + profile);
builder.persist();
...

Here, addText(...) can be called several times to append text to the object, but each set...(...) method should be used only once, and persist() should also be called only once, since it is the method that finally pushes the message into the index. Note that builder() returns a fresh GenericJSONBuilder on every call, so all the calls must be made on a single builder instance.

Now, when a scraper receives a request to scrape a given HTML page, a check is first made whether the data already exists in the index, with the help of a unique user ID string. This saves the time and effort of scraping the page all over again; instead, the saved instance is simply returned.

The check is done something like this:


if(DAO.existUser("profile_"+profile)){
    /*
     *  Return existing JSON Data
    */
}else{
    /*
     *  Scrape the HTML Page addressed by the given URL
    */
}

This pushing into and pulling from the index should considerably reduce the load on the Loklak server.

Feel free to ask questions regarding the above.

Feedback and Suggestions welcome 🙂


All About Peer Deploy App – Part One

We all know that Loklak is a distributed peer-to-peer sharing system in which you can host your own Loklak peer. The advantage of having your own server is that you have the privilege of sharing your search data, which goes into the indexing, eventually resulting in faster search results. So how can this “peer deploy” app help Loklak get more servers up? Before going into the details, let’s discuss a bit about the peers API provided by Loklak.

Loklak provides a transparent view of its peers and server deploys through its peers API.

http://loklak.org/api/peers.json

This API gives you the details of the Loklak peers and a count of the active servers.

{
      "class": "SuggestServlet",
      "host": "169.55.12.244",
      "port.http": 9000,
      "port.https": 9443,
      "lastSeen": 1470365717753,
      "lastPath": "/api/suggest.json",
      "peername": "anonymous"
},
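As a rough sketch of consuming this endpoint, a client could count the active peers in the response. The snippet below works on a hard-coded sample instead of a live HTTP call, and simply counts occurrences of the "peername" key (a real client would use a proper JSON parser; the class name and the sample data are made up for illustration):

```java
public class PeerCount {
    // Counts occurrences of "key" (quoted) in a raw JSON string
    static int countKey(String json, String key) {
        int count = 0, idx = 0;
        String needle = "\"" + key + "\"";
        while ((idx = json.indexOf(needle, idx)) != -1) {
            count++;
            idx += needle.length();
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "[{\"host\": \"169.55.12.244\", \"peername\": \"anonymous\"},"
            + "{\"host\": \"127.0.0.1\", \"peername\": \"local\"}]";
        System.out.println("active peers: " + countKey(sample, "peername"));
        // active peers: 2
    }
}
```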

So you can be a part of this group and host your own search engine and share your indexed data.

There are many ways in which you can deploy a Loklak peer, but this app provides simple one-click deploy buttons.


These one-click deploy buttons determine what code you are trying to deploy. If you’re not logged in or don’t have an account, you’ll go through the login flow first. For instance, Heroku uses an app.json manifest in the code repo to figure out what add-ons, config and other deployment steps are required to make the code run. This is used to configure and deploy the app. A similar flow applies to the other buttons, which deploy to Docker containers, Bluemix and Scalingo.
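For reference, a minimal app.json manifest might look like the following (the field values here are illustrative, not Loklak’s actual manifest):

```json
{
  "name": "loklak",
  "description": "Distributed peer-to-peer search server",
  "repository": "https://github.com/loklak/loklak_server",
  "env": {
    "PORT": {
      "description": "HTTP port the server listens on",
      "value": "9000"
    }
  }
}
```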

This app aims to have three modules:

  1. Providing all the one click deploy buttons at one place.
  2. Display of the peers network using D3.js charts.
  3. Have a leaderboard page counting the number of deploys per user.

The app is complete up to the first two modules; the upcoming enhancement can be done using the loklak_depot module.

Technology stack

  1. The buildpack was already available, and the buttons are embedded using the HTML tags provided by each service provider. Here is the code to the app.
  2. The app is written in AngularJS, and the force-directed graph is built using the D3.js library.
  3. The app consumes peers.json to get the data for displaying the graph.

Here is a screenshot of the app.

Selection_272

The upcoming enhancement is a leaderboard based on the number of peers deployed per user. If you are interested, you can try deploying a peer from here itself. Check out how simple it can be to deploy.

 


Deploy


Deploy on Scalingo


Deploy to Bluemix


Deploy to Docker Cloud

All About Peer Deploy App – Part One

Codecov and Gemnasium support for Loklak Webclient

To follow good development practices, Codecov and Gemnasium support were recently added to loklak_webclient.

Codecov gathers coverage reports from all languages and aggregates the results into a single cohesive report. It helps find how much code is covered by your tests and sends that information back to developers, to ensure good code quality.

To add Codecov support to loklak_webclient, I used the Codecov multilingual bash command. Since the coverage report is to be sent to Codecov after Travis CI executes the test suite on the repository, the following code was added to .travis.yml:

after_success:
  - bash <(curl -s https://codecov.io/bash)

Gemnasium keeps track of a project’s dependencies. Since it has extended its support to Node.js projects on GitHub, it was an ideal choice for loklak_webclient. I will be adding Gemnasium support for several other Loklak repositories in the future.

Badges for both integrations were added to Readme.md.
