Social Media Analysis using Loklak (Part 1)

In my past seven posts, I have covered how loklak_depot started off, its beginnings, and how we put loklak_depot on the World Wide Web so that people could learn more about it. Now, I shall go a bit deeper into what we at loklak are trying to do with all the information we are getting.

Twitter carries a lot of information about how people live their lives, on a variety of topics: sports, business, finance, food etc. It is human information, and as such it can be treated as a (not completely, since there are a lot of users) reliable source for a variety of data analyses. We could do a host of things with the data we get from tweets, and a lot of things have in fact been done already, like stock market prediction, sentiment analysis etc.

One of these uses is analysing social media profiles. Given a Twitter (or any online social media) profile, the program gives you a complete analysis of the posts you make and the people you connect with. Another use is Susi, our NLP-based search engine, about which a couple of blog posts have already been written.

So while I was spotting issues with the Search API and checking out the others, I had the idea of making a small application to which we supply a Twitter username and which gives us a profile analysis of that username. The app has been built and works well. I have only built the servlet because I planned on integrating it into Susi, and using it with Susi would actually give a lot of interesting answers.

The app / servlet has the following features I decided to implement:

1. User’s Activity Frequency: yearly, hourly and daily statistics etc.
2. User’s Activity Type: how many times the user uploaded a photo / video / status etc.
3. Activity on User’s Activity: the number of likes / retweets the user’s tweets received, hashtag analysis etc.
4. Analysis of User’s Content: language and sentiment analysis.

So here is the entire code for the program. After this, I will explain how I implemented the parts step by step:


// imports omitted for brevity: java.io.IOException, java.text.SimpleDateFormat,
// java.util.*, javax.servlet.http.*, org.json.*, and the loklak server classes
public class TwitterAnalysisService extends AbstractAPIHandler implements APIHandler {

	private static final long serialVersionUID = -3753965521858525803L;

	private static HttpServletRequest request;

	@Override
	public String getAPIPath() {
		return "/api/twitanalysis.json";
	}

	@Override
	public BaseUserRole getMinimalBaseUserRole() {
		return BaseUserRole.ANONYMOUS;
	}

	@Override
	public JSONObject getDefaultPermissions(BaseUserRole baseUserRole) {
		return null;
	}

	@Override
	public JSONObject serviceImpl(Query call, HttpServletResponse response, Authorization rights,
			JSONObjectWithDefault permissions) throws APIException {
		String username = call.get("screen_name", "");
		String count = call.get("count", "");
		TwitterAnalysisService.request = call.getRequest();
		return showAnalysis(username, count);
	}

	public static SusiThought showAnalysis(String username, String count) {

		SusiThought json = new SusiThought();
		JSONArray finalresultarray = new JSONArray();
		JSONObject finalresult = new JSONObject(true);
		String siteurl = request.getRequestURL().toString();
		String baseurl = siteurl.substring(0, siteurl.length() - request.getRequestURI().length())
				+ request.getContextPath();

		String searchurl = baseurl + "/api/search.json?q=from%3A" + username
				+ (!count.isEmpty() ? ("&count=" + count) : ""); // string equality, not reference comparison
		byte[] searchbyte;
		try {
			searchbyte = ClientConnection.download(searchurl);
		} catch (IOException e) {
			return json.setData(new JSONArray().put(new JSONObject().put("Error", "Can't contact server")));
		}
		String searchstr = UTF8.String(searchbyte);
		JSONObject searchresult = new JSONObject(searchstr);

		JSONArray tweets = searchresult.getJSONArray("statuses");
		if (tweets.length() == 0) {
			finalresult.put("error", "Invalid username " + username + " or no tweets");
			finalresultarray.put(finalresult);
			json.setData(finalresultarray);
			return json;
		}
		finalresult.put("username", username);
		finalresult.put("items_per_page", searchresult.getJSONObject("search_metadata").getString("itemsPerPage"));
		finalresult.put("tweets_analysed", searchresult.getJSONObject("search_metadata").getString("count"));

		// main loop
		JSONObject activityFreq = new JSONObject(true);
		JSONObject activityType = new JSONObject(true);
		int imgCount = 0, audioCount = 0, videoCount = 0, linksCount = 0, likesCount = 0, retweetCount = 0,
				hashtagCount = 0;
		int maxLikes = 0, maxRetweets = 0, maxHashtags = 0;
		String maxLikeslink, maxRetweetslink, maxHashtagslink;
		maxLikeslink = maxRetweetslink = maxHashtagslink = tweets.getJSONObject(0).getString("link");
		List<String> tweetDate = new ArrayList<>();
		List<String> tweetHour = new ArrayList<>();
		List<String> tweetDay = new ArrayList<>();
		List<Integer> likesList = new ArrayList<>();
		List<Integer> retweetsList = new ArrayList<>();
		List<Integer> hashtagsList = new ArrayList<>();
		List<String> languageList = new ArrayList<>();
		List<String> sentimentList = new ArrayList<>();
		Calendar calendar = Calendar.getInstance();

		for (int i = 0; i < tweets.length(); i++) {
			JSONObject status = tweets.getJSONObject(i);
			String[] datearr = status.getString("created_at").split("T")[0].split("-");
			calendar.set(Integer.parseInt(datearr[0]), Integer.parseInt(datearr[1]) - 1, Integer.parseInt(datearr[2]));
			Date date = new Date(calendar.getTimeInMillis());
			tweetDate.add(new SimpleDateFormat("MMMM yyyy").format(date));
			tweetDay.add(new SimpleDateFormat("EEEE", Locale.ENGLISH).format(date)); // day
			String times = status.getString("created_at").split("T")[1];
			String hour = times.substring(0, times.length() - 5).split(":")[0];
			tweetHour.add(hour); // hour
			imgCount += status.getInt("images_count");
			audioCount += status.getInt("audio_count");
			videoCount += status.getInt("videos_count");
			linksCount += status.getInt("links_count");
			likesList.add(status.getInt("favourites_count"));
			retweetsList.add(status.getInt("retweet_count"));
			hashtagsList.add(status.getInt("hashtags_count"));
			if (status.has("classifier_emotion")) {
				sentimentList.add(status.getString("classifier_emotion"));
			} else {
				sentimentList.add("neutral");
			}
			if (status.has("classifier_language")) {
				languageList.add(status.getString("classifier_language"));
			} else {
				languageList.add("no_text");
			}
			if (maxLikes < status.getInt("favourites_count")) {
				maxLikes = status.getInt("favourites_count");
				maxLikeslink = status.getString("link");
			}
			if (maxRetweets < status.getInt("retweet_count")) {
				maxRetweets = status.getInt("retweet_count");
				maxRetweetslink = status.getString("link");
			}
			if (maxHashtags < status.getInt("hashtags_count")) {
				maxHashtags = status.getInt("hashtags_count");
				maxHashtagslink = status.getString("link");
			}
			likesCount += status.getInt("favourites_count");
			retweetCount += status.getInt("retweet_count");
			hashtagCount += status.getInt("hashtags_count");
		}
		activityType.put("posted_image", imgCount);
		activityType.put("posted_audio", audioCount);
		activityType.put("posted_video", videoCount);
		activityType.put("posted_link", linksCount);
		activityType.put("posted_story",
				Integer.parseInt(searchresult.getJSONObject("search_metadata").getString("count"))
						- (imgCount + audioCount + videoCount + linksCount));

		JSONObject yearlyact = new JSONObject(true);
		JSONObject hourlyact = new JSONObject(true);
		JSONObject dailyact = new JSONObject(true);
		Set<String> yearset = new HashSet<>(tweetDate);
		Set<String> hourset = new HashSet<>(tweetHour);
		Set<String> dayset = new HashSet<>(tweetDay);

		for (String s : yearset) {
			yearlyact.put(s, Collections.frequency(tweetDate, s));
		}

		for (String s : hourset) {
			hourlyact.put(s, Collections.frequency(tweetHour, s));
		}

		for (String s : dayset) {
			dailyact.put(s, Collections.frequency(tweetDay, s));
		}

		activityFreq.put("yearwise", yearlyact);
		activityFreq.put("hourwise", hourlyact);
		activityFreq.put("daywise", dailyact);
		finalresult.put("tweet_frequency", activityFreq);
		finalresult.put("tweet_type", activityType);

		// activity on my tweets

		JSONObject activityOnTweets = new JSONObject(true);
		JSONObject activityCharts = new JSONObject(true);
		JSONObject likesChart = new JSONObject(true);
		JSONObject retweetChart = new JSONObject(true);
		JSONObject hashtagsChart = new JSONObject(true);

		Set<Integer> likesSet = new HashSet<>(likesList);
		Set<Integer> retweetSet = new HashSet<>(retweetsList);
		Set<Integer> hashtagSet = new HashSet<>(hashtagsList);

		for (Integer i : likesSet) {
			likesChart.put(i.toString(), Collections.frequency(likesList, i));
		}

		for (Integer i : retweetSet) {
			retweetChart.put(i.toString(), Collections.frequency(retweetsList, i));
		}

		for (Integer i : hashtagSet) {
			hashtagsChart.put(i.toString(), Collections.frequency(hashtagsList, i));
		}

		activityOnTweets.put("likes_count", likesCount);
		activityOnTweets.put("max_likes",
				new JSONObject(true).put("number", maxLikes).put("link_to_tweet", maxLikeslink));
		activityOnTweets.put("average_number_of_likes", // double division, so the average is not truncated
				((double) likesCount / Integer.parseInt(searchresult.getJSONObject("search_metadata").getString("count"))));

		activityOnTweets.put("retweets_count", retweetCount);
		activityOnTweets.put("max_retweets",
				new JSONObject(true).put("number", maxRetweets).put("link_to_tweet", maxRetweetslink));
		activityOnTweets.put("average_number_of_retweets", // double division, so the average is not truncated
				((double) retweetCount / Integer.parseInt(searchresult.getJSONObject("search_metadata").getString("count"))));

		activityOnTweets.put("hashtags_used_count", hashtagCount);
		activityOnTweets.put("max_hashtags",
				new JSONObject(true).put("number", maxHashtags).put("link_to_tweet", maxHashtagslink));
		activityOnTweets.put("average_number_of_hashtags_used", // double division, so the average is not truncated
				((double) hashtagCount / Integer.parseInt(searchresult.getJSONObject("search_metadata").getString("count"))));

		activityCharts.put("likes", likesChart);
		activityCharts.put("retweets", retweetChart);
		activityCharts.put("hashtags", hashtagsChart);
		activityOnTweets.put("frequency_charts", activityCharts);
		finalresult.put("activity_on_my_tweets", activityOnTweets);

		// content analysis
		JSONObject contentAnalysis = new JSONObject(true);
		JSONObject languageAnalysis = new JSONObject(true);
		JSONObject sentimentAnalysis = new JSONObject(true);
		Set<String> languageSet = new HashSet<>(languageList), sentimentSet = new HashSet<>(sentimentList);

		for (String s : languageSet) {
			languageAnalysis.put(s, Collections.frequency(languageList, s));
		}

		for (String s : sentimentSet) {
			sentimentAnalysis.put(s, Collections.frequency(sentimentList, s));
		}
		contentAnalysis.put("language_analysis", languageAnalysis);
		contentAnalysis.put("sentiment_analysis", sentimentAnalysis);
		finalresult.put("content_analysis", contentAnalysis);
		finalresultarray.put(finalresult);
		json.setData(finalresultarray);
		return json;
	}
}

The code first fetches the Search API data for the given username (stores it in a byte array and then converts it into a JSONObject). It returns a SusiThought object so that it can be integrated into Susi. Once that is done, we are all set to analyse the data.
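To try the servlet out, you query it like any other loklak API endpoint; the screen_name below is just a placeholder:

http://loklak.org/api/twitanalysis.json?screen_name=some_user&count=100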

Let us go through the code feature by feature, and I’ll explain which parts of the code implement those features:

Activity Frequency

For this, I initialised three ArrayLists: tweetDate, tweetHour and tweetDay, which hold the date, hour and day extracted from each tweet’s timestamp. The extraction is done using a Calendar instance (the date part of the timestamp is set on the calendar, converted to a Date, and formatted to get the month/year and the day of the week; the hour is cut straight out of the time part). Once the values are stored in the lists, I use Collections.frequency from java.util to calculate how often each value occurs, and to iterate over unique values only, I use a HashSet. So on running, the Activity Frequency of my username looks like this:

[Screenshot: Activity Frequency output]
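In isolation, the counting idiom used above looks like this — a minimal, self-contained sketch with made-up sample values:

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class FrequencyDemo {
	public static void main(String[] args) {
		// sample day-of-week values, standing in for the tweetDay list
		List<String> tweetDay = Arrays.asList("Monday", "Friday", "Monday", "Sunday", "Monday");
		Map<String, Integer> dailyact = new HashMap<>();
		for (String day : new HashSet<>(tweetDay)) { // the HashSet gives us each value only once
			dailyact.put(day, Collections.frequency(tweetDay, day)); // occurrences in the full list
		}
		System.out.println(dailyact); // e.g. {Monday=3, Friday=1, Sunday=1} (order not guaranteed)
	}
}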

Activity Type

For this, I extracted the activity counts from the Search API and summed up every category (imgCount += status.getInt("images_count"); etc.), then added the totals directly into the JSONObject. Easy.

[Screenshot: Activity Type output]

Activity on User’s Activity

For this, we analyse the stats on likes, retweets and hashtags. I again create three ArrayLists for them (they store the per-tweet counts), and in the main loop I figure out the maximum number of likes, retweets and hashtags along with the link to the tweet holding each maximum, and sum up the total numbers of likes, retweets and hashtags. Then I again use Collections to build a frequency chart, calculate the averages, and add them all into a JSONObject.

[Screenshot: Activity on User's Activity output]

Content Analysis

This was again short: we extract the emotion and language classifications from the Search API, count their occurrences, and put them into the JSONObject like we have done with everything else.

[Screenshot: Content Analysis output]

This completes the TwitterAnalysis servlet: it gives you the entire profile analysis of a Twitter Profile, all powered by loklak.

This was relatively simple to do. In my next blog post, I will discuss a bit about how loklak scrapes this data and makes it available, and how I integrate this application into Susi, so that we can get interesting profile data from a chatbot effortlessly. As always, feedback is welcome 🙂


Spin-Off: Loklak fuels Open Event

Continuing with the Loklak & Open Event partnership (check out Loklak fuels Open Event), we can now, in a few clicks, create our very own web app for an event, with details imported from eventbrite.com, powered by Loklak.

The scraping of data, done with JSoup (a Java HTML parser), was explained in the previous post of this series.

Next, a console service was implemented as the single point for information retrieval from various social networks and websites (a post on it is coming soon 😉 ), especially for SUSI (our very own personal digital assistant, a cute one indeed!).

The JSONArray result of the EventBriteCrawler is set in a SusiThought, which is nothing but a piece of data that can be remembered. The structure of the thought can be modelled as a table, which may be created from information retrieved elsewhere in the current argument.


/** Defining SusiThought as a class 
 * which extends JSONObject
 */

public class SusiThought extends JSONObject {

/* details coming soon.... */

}

/** Modifications in EventBriteCrawler
 *  Returning SusiThought instead of 
 * a simple JSONObject/JSONArray.
 */
public static SusiThought crawlEventBrite(String url) {
    ...
    ...    
    SusiThought json = new SusiThought();
    json.setData(jsonArray);
    return json;
}

 

The API endpoint was thus created.
A sample query looks like this: http://loklak.org/api/console.json?q=SELECT * FROM eventbrite WHERE url='https://www.eventbrite.fr/e/billets-europeade-2016-concert-de-musique-vocale-25592599153';

[Screenshot: console.json output for the eventbrite query]

 

The files generated were next imported into the Open Event Web App generator, using simple steps.

[Screenshot: Open Event Web App generator]

[Screenshots: generator import steps]

It’s amazing to see how a great visual platform is provided to edit the details parsed from the plain JSONObject and deploy the personalized web app!

[Screenshot: web app customization]

[Screenshots: the generated event web app]
Tadaa!
We have our very own event web app with all the information imported from eventbrite.com in a single (well, very few 😛) click(s)!

With this, we conclude the Loklak – Open Event – EventBrite series.

Stay tuned for detailed post on SUSI and Console Services 🙂


Teach Susi some rules

“I’m a young, naive being, sometimes unsure about my own statements, sometimes quirky and maybe rough, provocative in cases where the person who talks to me is provocative as well. In other cases I’m a friendly and a bit cute being”, says Susi.

She needs some help to train her personality and give the best assistance to her users, and at the same time she offers her help. So this blog post shows how you can get her help according to your needs.

Here is the data stub, susi_cognition_000.json, where you can teach Susi all her manners, i.e. write some rules. There are a few interesting things one must know before teaching her the rules.

{
  "keys"    : ["belief"],
  "score"   : 1,
  "phrases" : [ {"type": "pattern", "expression": "*I feel *"} ],
  "process" : [],
  "actions" : [ {"type": "answer", "select": "random", "phrases": [
    "Why do you feel $2$?"
  ]}]
},

“keys” – This is the list of words that helps a rule get categorized. For example, a particular rule can be categorized as happy when the keys are happy, joyous, festive. The above is an example of a rule (a JSON object), which has the following keys:

  • “score” – The score determines the priority of the rule. When there are multiple rules under the same category, they get prioritized based on the score. So when there are similar rules out there, the score tags play their required role.
  • “phrases” – This sub-object has two more fields, namely type and expression. The type, pattern or regex, defines the kind of expression you are giving: if the expression is a pattern like the above example, *I feel *, then the type is pattern, and similarly for regular expressions.
  • “process” – I will explain the process in the next example rule.
  • “actions” – This has fields like type, select and phrases. The type here defines the type of the answer to be given, such as text (answer), pie charts and tables (the currently supported features). The select tag ensures a random selection of the answer, and phrases contains the list of possible answer styles.

On using the Susi service, the above example gives you this answer:

http://loklak.org/api/susi.json?q=i feel super happy

{
  "session": {"identity": {
    "type": "host",
    "name": "127.0.0.1",
    "anonymous": true
  }},
  "count": 1,
  "answers": [{
    "metadata": {
      "hits": 1,
      "offset": 0,
      "count": 1
    },
    "data": [{
      "0": "i feel super happy",
      "1": "",
      "2": "super happy"
    }],
    "actions": [{
      "expression": "Why do you feel super happy?",
      "type": "answer"
    }]
  }]
}

The above JSON is Susi’s response. Woah, this is how Susi replies back. So the answer is here:

"actions": [{
      "expression": "Why do you feel super happy?",
      "type": "answer"
}]

In the above answer format, the expression carries the final answer, while the data object (shown earlier in the response) holds the matched variables: $0$ is the whole sentence, and $1$, $2$ are the wildcard captures.

That was a simple example. Let’s dive into some more rules with different formats.

{
  "keys"    : ["how"],
  "score"   : 100,
  "example" : "How many mentions are on reddit about loklak",
  "phrases" : [ {"type": "pattern", "expression": "How many mentions are on reddit about *"} ],
  "process" : [ {"type": "console", "expression": "SELECT COUNT(*) AS count FROM rss WHERE url='https://www.reddit.com/search.rss?q=$1$';"} ],
  "actions" : [ {"type": "answer", "select": "random", "phrases": [
    "Here you go, $count$ times!"
  ]}]
},

The above example is a rule with score 100, a higher priority than the previous example. This rule tries to answer queries about Reddit mentions of a particular tag or name. Here are a few key differences to be noted.

  • “example” – The example key gives a sample query, which is useful for understanding the rule’s input.
  • “process” – This key is used when you have to get the data from an external service. The above example queries the available tables using SQL syntax. It is a console type of querying, where the count is calculated from the rss table, whose data is retrieved from the given URL (the standard RSS format supported by loklak: https://www.reddit.com/search.rss?q=term). The sketch below shows how the matched wildcard is substituted into the console query.
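To make the substitution step concrete, here is a minimal sketch of how the $1$ capture from the matched phrase could be spliced into the console expression. This is illustrative only, not loklak’s actual matcher:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConsoleSubstitutionDemo {
	public static void main(String[] args) {
		String query = "How many mentions are on reddit about loklak";
		String consoleExpr = "SELECT COUNT(*) AS count FROM rss WHERE url='https://www.reddit.com/search.rss?q=$1$';";

		// the pattern "How many mentions are on reddit about *" with * as a capture group
		Matcher m = Pattern.compile("How many mentions are on reddit about (.*)", Pattern.CASE_INSENSITIVE)
				.matcher(query);
		if (m.matches()) {
			// $1$ is the first wildcard capture, here "loklak"
			String filled = consoleExpr.replace("$1$", m.group(1));
			System.out.println(filled);
			// prints: SELECT COUNT(*) AS count FROM rss WHERE url='https://www.reddit.com/search.rss?q=loklak';
		}
	}
}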

For the above rule, Susi’s response is this:

http://loklak.org/api/susi.json?q=How many mentions are on reddit about loklak

{
  "session": {"identity": {
    "type": "host",
    "name": "127.0.0.1",
    "anonymous": true
  }},
  "count": 1,
  "answers": [{
    "data": [{"count": 5}],
    "metadata": {"count": 1},
    "actions": [{
      "expression": "Here you go, 5 times!",
      "type": "answer"
    }]
  }]
}

And here’s the required answer:

"actions": [{
      "expression": "Here you go, 5 times!",
      "type": "answer"
 }]

So far we have seen examples that retrieve text answers. Here comes an example which can return a table, i.e. a list of answers.

{
  "keys"    : ["reddit"],
  "score"   : 100,
  "example" : "What are the reddit articles about loklak",
  "phrases" : [ {"type": "pattern", "expression": "* reddit * about *"} ],
  "process" : [ {"type": "console", "expression": "SELECT title FROM rss WHERE url='https://www.reddit.com/search.rss?q=$3$';"} ],
  "actions" : [ {"type": "answer", "select": "random", "phrases": [
    "Here is the list of articles about $3$"
  ]}, {"type": "table"}]
},

In the above rule we also specify an action with type table, for getting the list of data. So here is Susi’s response:

http://loklak.org/api/susi.json?q=What are the reddit articles about loklak

{
  "session": {"identity": {
    "type": "host",
    "name": "127.0.0.1",
    "anonymous": true
  }},
  "count": 1,
  "answers": [{
    "data": [
      {"title": "programming"},
      {"title": "Datasets Archive"},
      {"title": "We collected 1.3 billion tweets with a distributed, peer-to-peer based free, open source twitter scraper that has a nice API for your self-made apps to evaluate the data: loklak.org"},
      {"title": "A look at Loklak.org and the Webclient progress"},
      {"title": "#Loklak Web client Test A Twitter search engine loklak.org"}
    ],
    "metadata": {"count": 5},
    "actions": [
      {
        "expression": "Here is the list of articles about loklak",
        "type": "answer"
      },
      {"type": "table"}
    ]
  }]
}

So the data object under answers gives you the list of titles tagged under loklak (PS: this is from Reddit).

Another example of getting data, from which one can form pie charts:

{
  "keys"    : ["president", "election", "america"],
  "score"   : 100,
  "comment" : "a statistical app which tries to predict the american presidential election",
  "phrases" : [ {"type": "regex", "expression": "Who will win the 2016 presidential election"},
                {"type": "regex", "expression": "Who will (?:be|become) the next us president"} ],
  "process" : [ {"type": "console", "expression": "SELECT PERCENT(count) AS percent, hashtag AS president FROM (SELECT COUNT(*) AS count, hashtags AS hashtag FROM messages WHERE query='election2016' GROUP BY hashtags) WHERE hashtag IN ('hillaryclinton','berniesanders','donaldtrump');"} ],
  "actions" : [ {"type": "answer", "select": "random", "phrases": [
    "I believe the next president will be $president$ with a likelihood of $percent$ percent but I a not sure.",
    "I can calculate a likelihood, here is my guess:"
  ]},
  {"type": "piechart", "total": 100, "key": "country", "value": "percent", "unit": "%"}]
},

So the above example gives you predictions for the US elections. Here you go, Susi’s response:

http://loklak.org/api/susi.json?q=Who will win the 2016 presidential election

{
  "session": {"identity": {
    "type": "host",
    "name": "10.67.93.57",
    "anonymous": true
  }},
  "count": 1,
  "answers": [{
    "metadata": {
      "hits": 21,
      "offset": 0,
      "count": 2
    },
    "data": [
      {
        "percent": 50,
        "president": "hillary2016"
      },
      {
        "percent": 50,
        "president": "trump2016"
      }
    ],
    "actions": [
      {
        "expression": "I believe the next president will be hillary2016 with a likelihood of 50.0 percent but I a not sure.",
        "type": "answer"
      },
      {
        "total": 100,
        "unit": "%",
        "type": "piechart",
        "value": "percent",
        "key": "country"
      }
    ]
  }]
}

Important Nomenclature for defining the variables

Suppose the expression is of the form

* reddit * about *

Using the above expression, sentences like “What are the Reddit articles about Loklak” can be matched. The set of words before reddit is bound to the $1$ variable, the set of words between reddit and about to the $2$ variable, the words after about to the $3$ variable, and so on. This is how the variable system works. As a whole, the complete sentence is available as the $0$ variable.
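A minimal sketch of this binding, assuming each * in a pattern translates to a regex capture group (this mirrors the data object in Susi’s responses above, but is illustrative, not loklak’s actual matcher):

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternBindingDemo {
	public static void main(String[] args) {
		String sentence = "What are the reddit articles about loklak";
		// the pattern "* reddit * about *" with each * becoming a capture group
		Matcher m = Pattern.compile("(.*) reddit (.*) about (.*)", Pattern.CASE_INSENSITIVE).matcher(sentence);
		if (m.matches()) {
			Map<String, String> data = new LinkedHashMap<>();
			data.put("0", sentence); // $0$: the whole sentence
			for (int i = 1; i <= m.groupCount(); i++) {
				data.put(String.valueOf(i), m.group(i).trim()); // $1$, $2$, $3$
			}
			System.out.println(data);
			// {0=What are the reddit articles about loklak, 1=What are the, 2=articles, 3=loklak}
		}
	}
}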

Some important data sources for console querying

 * http://loklak.org/api/console.json?q=SELECT%20text,%20screen_name,%20user.name%20AS%20user%20FROM%20messages%20WHERE%20query=%271%27;
 * http://loklak.org/api/console.json?q=SELECT%20*%20FROM%20messages%20WHERE%20id=%27742384468560912386%27;
 * http://loklak.org/api/console.json?q=SELECT%20link,screen_name%20FROM%20messages%20WHERE%20id=%27742384468560912386%27;
 * http://loklak.org/api/console.json?q=SELECT%20COUNT(*)%20AS%20count,%20screen_name%20AS%20twitterer%20FROM%20messages%20WHERE%20query=%27loklak%27%20GROUP%20BY%20screen_name;
 * http://loklak.org/api/console.json?q=SELECT%20PERCENT(count)%20AS%20percent,%20screen_name%20FROM%20(SELECT%20COUNT(*)%20AS%20count,%20screen_name%20FROM%20messages%20WHERE%20query=%27loklak%27%20GROUP%20BY%20screen_name)%20WHERE%20screen_name%20IN%20(%27leonmakk%27,%27Daminisatya%27,%27sudheesh001%27,%27shiven_mian%27);
 * http://loklak.org/api/console.json?q=SELECT%20query,%20query_count%20AS%20count%20FROM%20queries%20WHERE%20query=%27auto%27;
 * http://loklak.org/api/console.json?q=SELECT%20*%20FROM%20users%20WHERE%20screen_name=%270rb1t3r%27;
 * http://loklak.org/api/console.json?q=SELECT%20place[0]%20AS%20place,%20population,%20location[0]%20AS%20lon,%20location[1]%20AS%20lat%20FROM%20locations%20WHERE%20location=%27Berlin%27;
 * http://loklak.org/api/console.json?q=SELECT%20*%20FROM%20locations%20WHERE%20location=%2753.1,13.1%27;
 * http://loklak.org/api/console.json?q=SELECT%20description%20FROM%20wikidata%20WHERE%20query=%27football%27;
 * http://loklak.org/api/console.json?q=SELECT%20*%20FROM%20meetup%20WHERE%20url=%27http://www.meetup.com/?q=Women-Who-Code-Delhi%27;
 * http://loklak.org/api/console.json?q=SELECT%20*%20FROM%20rss%20WHERE%20url=%27https://www.reddit.com/search.rss?q=loklak%27;
 * http://loklak.org/api/console.json?q=SELECT%20*%20FROM%20eventbrite%20WHERE%20url=%27url=https://www.eventbrite.com/e/?q=global-health-security-focus-africa-tickets-25740798421%27;

The above are the supported data stores and queries. You can have a look at ConsoleService.java

That’s it! This is how you can train Susi and teach her rules.

The following would be useful for knowing more about Susi.

What is Susi?

Susi is an artificial intelligence combining pattern matching, internet data, data flow principles and inference engine principles. It will have some reflection abilities and it will be able to remember the user’s input to produce deductions and personalized feedback.

Why is it built?

To create a personal assistant software like Siri and Cortana, made by the people for the people. It’s a free-software open mind which expands with the help of its users.

What is its purpose?

To explore the abilities of an artificial companion and to answer the remaining unanswered questions.

 


Under the hood: Accounting example

The login-api is now the first service in loklak that utilizes the accounting feature. It does that to protect user accounts against brute-force login attempts.

How does it work?

Quite simple. First, we define some permissions:

public JSONObject getDefaultPermissions(BaseUserRole baseUserRole) {
   JSONObject result = new JSONObject();
   result.put("maxInvalidLogins", 10);
   result.put("blockTimeSeconds", 120);
   result.put("periodSeconds", 60);
   result.put("blockedUntil", 0);
   return result;
}

Each user is only allowed to make 10 invalid login attempts over a period of 60 seconds and will otherwise get blocked for 120 seconds. Why do we save that in the permissions? Because we can change them on a per-user basis. If one user gets blocked for the 3rd time, we could set his block time up to 24h, for example. That’s not implemented yet though; a sketch of how it could look follows.
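Purely as an illustration of that unimplemented idea, an escalating block could reuse the setPermission call that checkInvalidLogins (below) already uses. The blockCount permission here is hypothetical:

// Hypothetical escalation sketch (not in loklak): block repeat offenders longer.
// Assumes an extra "blockCount" permission; setPermission and getInt are the
// calls used in checkInvalidLogins below.
private void blockClient(Authorization authorization, JSONObjectWithDefault permissions) {
   int blockCount = permissions.getInt("blockCount", 0) + 1;
   long blockSeconds = permissions.getInt("blockTimeSeconds", 120);
   if (blockCount >= 3) blockSeconds = 24 * 60 * 60; // third block: 24 hours
   authorization.setPermission(this, "blockCount", blockCount);
   authorization.setPermission(this, "blockedUntil", Instant.now().getEpochSecond() + blockSeconds);
}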

Now, whenever we have a bad login attempt we save it in the accounting system:

authorization.getAccounting().addRequest(this.getClass().getCanonicalName(), "invalid login");

throw new APIException(422, "Invalid credentials");

Note that we have to specify some path or name in the accounting object. We use the full name of the login service, so that no other service will mess with these entries.

Now this is how we check:

private void checkInvalidLogins(Query post, Authorization authorization, JSONObjectWithDefault permissions) throws APIException {

   // is already blocked?
   long blockedUntil = permissions.getLong("blockedUntil");
   if(blockedUntil != 0) {
      if (blockedUntil > Instant.now().getEpochSecond()) {
         Log.getLog().info("Blocked ip " + post.getClientHost() + " because of too many invalid login attempts.");
         throw new APIException(403, "Too many invalid login attempts. Try again in "
               + (blockedUntil - Instant.now().getEpochSecond()) + " seconds");
      }
      else{
         authorization.setPermission(this, "blockedUntil", 0);
      }
   }

   // check if too many invalid login attempts were made already
   JSONObject invalidLogins = authorization.getAccounting().getRequests(this.getClass().getCanonicalName());
   long period = permissions.getLong("periodSeconds", 600) * 1000; // get time period in which wrong logins are counted (e.g. the last 10 minutes)
   int counter = 0;
   for(String key : invalidLogins.keySet()){
      if(Long.parseLong(key, 10) > System.currentTimeMillis() - period) counter++;
   }
   if(counter > permissions.getInt("maxInvalidLogins", 10)){
      authorization.setPermission(this, "blockedUntil", Instant.now().getEpochSecond() + permissions.getInt("blockTimeSeconds", 120));
      throw new APIException(403, "Too many invalid login attempts. Try again in "
            + permissions.getInt("blockTimeSeconds", 120) + " seconds");
   }
}

First we check if there’s a client-specific override of our permissions: blockedUntil.

Normally that is 0, but if it’s set to some second in the future, we respond with an error message saying how long the client has to wait.

Otherwise, we check how many entries are in the accounting object for this service. As the login service only saves bad requests, we only need to know their number and when they were made.

Each request in the accounting object is stored with the current timestamp (in milliseconds) as its key. So we check which keys were created in the last 60 seconds (as defined in the permissions). If there are more than 10, we set the permission blockedUntil to the current second plus 120.

This is a short example of a mighty tool to achieve user-specific reactions in our services, and it could be adapted for many different scenarios. Feel free to try 🙂


Custom colours for loklak walls

You can now customize the background and card colors on loklak walls!

Here’s how we did it:

First, we had to add extra fields to the wall schema:

var UserSchema = new Schema({
  apps: {
    wall: [{
      // other options
      cardBgColour: String,
      cardForeColour: String,
      wallBgColour: String,
      // ...
    }]
  }
});
Next, we had to add these extra options in the angular controller (wall.js) for the creation modal:

var initWallOptions = function() {
  $scope.newWallOptions.wallBgColour = '#ecf0f5';
  $scope.newWallOptions.cardBgColour = '#ffffff';
}

$scope.$watch('newWallOptions.cardBgColour', function() {
  if ($scope.newWallOptions.cardBgColour) {
    $scope.newWallOptions.cardForeColour = colourCalculator(hexToRgb($scope.newWallOptions.cardBgColour));
  }
});

The $watch function watches for any changes in the card background color and sets the cardForeColour / text color to black or white depending on the background color.
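The helpers hexToRgb and colourCalculator are not shown in this post; the usual approach behind such a helper is a perceived-luminance test. Here is a minimal sketch of that idea, written in Java for illustration (the actual JS helpers may differ):

// Illustration of the usual black-or-white foreground choice
// (the actual colourCalculator helper in the wall code may differ).
public class ForeColour {
	// returns "black" for light backgrounds, "white" for dark ones
	static String colourCalculator(int r, int g, int b) {
		double luminance = 0.299 * r + 0.587 * g + 0.114 * b; // perceived brightness
		return luminance > 186 ? "black" : "white";
	}

	public static void main(String[] args) {
		System.out.println(colourCalculator(0xff, 0xff, 0xff)); // black text on a white card
		System.out.println(colourCalculator(0x2c, 0x3e, 0x50)); // white text on a dark card
	}
}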

Now, we have to use the saved data in the wall display pages (display.html):


   <div ng-style="{'background-color': wall.wallOptions.wallBgColour}" class="wall-container container-fluid">
        <div class="container content-container wall-body">
            <div ng-switch on="wall.wallOptions.layoutStyle" ng-show="wall.statuses.length>0" ng-class="wall.wallOptions.showStatistics || wall.currentAnnoucement?'col-md-8':'col-md-12'" masonry>
                <!-- 1. Linear -->
                <div ng-switch-when="1" linear ng-repeat="status in wall.statuses" open="wall.open" data="status" 
                cardbgcolor="wall.wallOptions.cardBgColour" cardtxtcolor="wall.wallOptions.cardForeColour"></div>
                <!-- 2. Masonry -->
                <div ng-switch-when="2" card ng-repeat="status in wall.statuses" open="wall.open" data="status" 
                cardbgcolor="wall.wallOptions.cardBgColour" cardtxtcolor="wall.wallOptions.cardForeColour"
                leaderboardEnabled="{{wall.wallOptions.showStatistics}}"></div>
                <!-- 3. Single -->
                <div ng-switch-when="3" coa ng-repeat="status in wall.statuses" open="wall.open" data="status"
                cardbgcolor="wall.wallOptions.cardBgColour" cardtxtcolor="wall.wallOptions.cardForeColour"
                ></div>
            </div>
         </div>
     </div>

We pass the saved wall options into each directive using the attributes cardbgcolor, cardtxtcolor, and we use ng-style to evaluate the expression with wallBgColour.

In the linear layout directive file, we use the '=' sign to signal two-way binding.

function linearLayoutDirective() {
  return {
    scope: {
      data: '=',
      cardbgcolor:'=',
      cardtxtcolor:'=',
    },
    templateUrl: 'wall/templates/linear.html',
  };
}

Then we can use it in our template (linear.html):

<div class="linear linear-simple" style="background-color: {{cardbgcolor}};">
  <!-- Main content -->
  <p class="linear-content-text" style="color: {{cardtxtcolor}};"></p>
</div>

I have passed the cardbgcolor into the filter

|tweetTextLink:cardbgcolor

so we can also change the colours of the links:

filtersModule.filter('tweetTextLink', function() {
  return function(input, cardBgColour) {
    // pick a link colour class that stays readable on the card background
    var textClassName = cardBgColour ? colourCalculator(hexToRgb(cardBgColour)) : '';
    // ... (the rest of the filter builds and returns the linkified text using textClassName)
  };
});

Have fun customizing your walls at: loklak-wall.herokuapp.com
[Screenshot: customized loklak wall]


Migrating FOSSASIA blog from Drupal to WordPress

Last week I migrated FOSSASIA’s blog from Drupal to WordPress and it was an amazing learning experience.

The steps one can use for migration are as follows:

Create a WordPress website:

In order to convert your Drupal website to WordPress, you need a WordPress site into which the data will be imported. By WordPress site, I mean a local installation where you can test whether the migration worked or not.

Truncate default posts/pages/comments:

Once you have your WP installation ready, truncate the default pages, comments etc. from your WordPress database:

TRUNCATE TABLE wordpress.wp_comments;
TRUNCATE TABLE wordpress.wp_links;
TRUNCATE TABLE wordpress.wp_postmeta;
TRUNCATE TABLE wordpress.wp_posts;
TRUNCATE TABLE wordpress.wp_term_relationships;
TRUNCATE TABLE wordpress.wp_term_taxonomy;
TRUNCATE TABLE wordpress.wp_terms;
[Screenshot: WordPress database]

Get hold of the Drupal mysql DB:

Import your Drupal DB into your local MySQL installation where you have your WP database. Why? Because you need to do a lot of “data transfer”!

[Screenshot: Drupal database]

Execute a lot of scripts (Just kidding!):

There are some pretty useful online references which provide the required MySQL scripts to migrate the data from the Drupal DB to the WordPress DB with proper formatting. Look here and here.

Depending on the kind of data you have, you might need to make some modifications. E.g. depending on whether you have tags or categories/sub-categories in your data, you might have to modify the following command to suit your needs.

INSERT INTO wordpress.wp_terms (term_id, name, slug, 
term_group)
SELECT
	d.tid,
	d.name,
	REPLACE(LOWER(d.name), ' ', '-'),
	0
FROM drupal.taxonomy_term_data d
INNER JOIN drupal.taxonomy_term_hierarchy h USING(tid);

Recheck if entire data has been imported correctly:

Once you execute the scripts error-free, check if you imported the DB data (users/taxonomies/posts) correctly. Since WP and Drupal store passwords differently, you will have to ask your users/authors/admins to change their passwords on the migrated blog. We are almost there!! (not quite).

Transfer media files to WP and map them to Media:

You would have to transfer your media (pics, videos, attachments etc.) from the Drupal site to your WordPress installation.

Put them under wp-content/uploads/old or any other suitable directory name under wp-content/uploads/.

Once you are done with that, in order to add the files to “Media” under the Admin Panel, you can use plugins like Add From Server, which map your files to the folder “Media” expects your files to be in.

Change the permalinks (optional):

Depending on default permalinks of your Drupal blog, you might have to change the permalink format.

To do that, go to <Your_WP_Site>/wp-admin/options-permalink.php

You can change the permalink structure to one of the many options you are provided. [Screenshot: permalink settings]

Add themes as you may. Upload your WordPress site online. And we are done!!

The new face of blog.fossasia.org looks like this! [Screenshot: the new blog]


User Information via Loklak API

While working on adding loklak API support to WordPress plugins, I found that a lot of plugins require Twitter user information, e.g. profile_pic, followers_count, tweets by the user etc.

One can get exhaustive user information using the Loklak search and user APIs. In this blog post I will show how one can combine the results from Loklak’s Search and User APIs to provide all kinds of data required by a WordPress plugin.

Starting with the Search API!

Typing

http://loklak.org/api/search.json?q=from:KhoslaSopan

in the URL would give you results like this:

[Screenshot: search.json results]

As you can see, it provides all the tweet-related information (tweets by the <user>) and a small user profile like this:

[Screenshot: embedded user profile]

But what if you require the user’s location, followers_count, friends_count, the number of tweets he has posted till now etc.? Then you would have to take help from the Loklak User API. Sample results for

http://loklak.org/api/user.json?screen_name=KhoslaSopan

are given below:

[Screenshot: user.json results]

But suppose you also require the information of his followers and the people he is following. How do you achieve that? Well, the Loklak API has provision for that as well. All you need to do is add followers=<count> or following=<count> as need be.

http://loklak.org/api/user.json?screen_name=KhoslaSopan&following=100&followers=100

The following data would be added to your user.json results:

[Screenshot: followers / following data]

It would give you the user information of the Twitter users followed by that user. Recursively you can get their tweet data, their followers, and on and on!

In order to get a complete user profile, you can merge the results from the Search and User APIs and create wonderful applications.

To do all this using Loklak PHP API, refer to the code snippet below.

$connection = new Loklak();
$tweets = $connection->search('', null, null, $username, $no_of_tweets);

$tweets = json_decode($tweets, true);
$tweets = json_decode($tweets['body'], true);

$user = $connection->user($username);
$user = json_decode($user, true);
$user = json_decode($user['body'], true);

$tweet_content = $tweets['statuses'];
for ($i=0; $i < sizeof($tweet_content); $i++) {
     $tweet_content[$i] = array_merge($tweet_content[$i], $user);
}

TopMenu and SiteMaps – Making loklak crawlable

So now we have seen how loklak_depot actually started off: with an accounts system and a lot of security fixes (the AAA system etc). We have made the foundation of loklak_depot as simple and branched-out as possible. But before we go on to working on Q&A apps and Susi (our intelligent query system for loklak_depot), I figured out one problem.

How do the users on the WWW get to know about this?

loklak had not been made crawlable until recently. This prevented search engines from crawling loklak.org and displaying its results. To improve our reach, enabling crawling thus became necessary.

To enable crawling, what we needed was a sitemap.xml file and a robots.txt. The sitemap specifies the URLs branching out from the main page (including the main page itself), and the robots.txt mainly specifies the parts of the site which should NOT be crawled. Thus, both had to be made to enable crawling.

Talking about the main loklak.org website: if you visit the site, you will see a menu on the top which leads to various links (let’s refer to it as the TopMenu). Once a crawler knows that these links are there, it will automatically crawl them. So it could be as simple as creating a normal XML file which has those links. But here’s the catch.

We knew loklak.org was something all of us are working on (and updating) regularly, so the TopMenu is also bound to change. We also did not want to keep updating the HTML files to accommodate changes in the TopMenu. So we decided to do two things:

1. Make the TopMenu dynamic so that only a little change can update it.
2. Generate and update the sitemap.xml dynamically from TopMenu changes, without manually editing the XML.

For Part 1, we decided to implement a servlet which returns a JSON containing the TopMenu items and their links. We then implement an Angular function which parses this JSON and changes the TopMenu dynamically.

Here is the servlet TopMenuService.java. It’s pretty easy to understand:


public class TopMenuService extends AbstractAPIHandler implements APIHandler {
    
    private static final long serialVersionUID = 1839868262296635665L;

    @Override
    public BaseUserRole getMinimalBaseUserRole() { return BaseUserRole.ANONYMOUS; }

    @Override
    public JSONObject getDefaultPermissions(BaseUserRole baseUserRole) {
        return null;
    }

    @Override
    public String getAPIPath() {
        return "/cms/topmenu.json";
    }
    
    @Override
    public JSONObject serviceImpl(Query call, Authorization rights, final JSONObjectWithDefault permissions) {
        
        int limited_count = (int) DAO.getConfig("download.limited.count", (long) Integer.MAX_VALUE);
    
        JSONObject json = new JSONObject(true);
        JSONArray topmenu = new JSONArray()
            .put(new JSONObject().put("Home", "index.html"))
            .put(new JSONObject().put("About", "about.html"))
            .put(new JSONObject().put("Showcase", "showcase.html"))
            .put(new JSONObject().put("Architecture", "architecture.html"))
            .put(new JSONObject().put("Download", "download.html"))
            .put(new JSONObject().put("Tutorials", "tutorials.html"))
            .put(new JSONObject().put("API", "api.html"));
        if (limited_count > 0) topmenu.put(new JSONObject().put("Dumps", "dump.html"));
        topmenu.put(new JSONObject().put("Apps", "apps/applist/index.html"));
        json.put("items", topmenu);
        return json;
    }
}

As seen, in serviceImpl we build a JSONObject containing all the links of the loklak.org TopMenu with their URLs, and this object is returned.
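So the JSON served at /cms/topmenu.json would look like this (derived from the entries above; Dumps appears only when downloads are enabled):

{
  "items": [
    {"Home": "index.html"},
    {"About": "about.html"},
    {"Showcase": "showcase.html"},
    {"Architecture": "architecture.html"},
    {"Download": "download.html"},
    {"Tutorials": "tutorials.html"},
    {"API": "api.html"},
    {"Dumps": "dump.html"},
    {"Apps": "apps/applist/index.html"}
  ]
}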

Now what we want is to make the changes to index.html and the JavaScript, and here they are:

JS:



angular.element(document).ready(function () {
  var navString = "";
  var winLocation = window.location.href;
  $.getJSON("/cms/topmenu.json", function(data) {
    navItems = data.items;
    navItems = navItems.reverse();
    var count = 0;
    $.each( navItems, function(index, itemData) {
      name = Object.keys(itemData);
      link = itemData[name];
      // Now construct the li items
      liItem = "<li>";
      if (winLocation.indexOf(link) != -1 && count != 1) {
        liItem = "<li class='active'>";
        count = count + 1;
      }
      liItem += "<a href='\/"+link+"'>"+name+"</a></li>";
      liItem = $(liItem);
      $('#navbar > ul').prepend(liItem);
    });
  });
});

HTML:



<nav class="navbar navbar-inverse navbar-fixed-top">
      <div class="container-fluid">
        <div class="navbar-header">
          <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
            <span class="sr-only">Toggle navigation</span>
            <span class="icon-bar"></span>
            <span class="icon-bar"></span>
            <span class="icon-bar"></span>
          </button>
          <a class="navbar-brand" href="#"></a>
        </div>
        <div id="navbar" class="navbar-collapse collapse">
          <ul class="nav navbar-nav navbar-right">
            <!-- This will get populated -->
          </ul>
        </div>
      </div>
    </nav>

So in the Angular function, we parse the JSON and insert the items into the TopMenu markup in the HTML. So basically all we need to do is change the entries in TopMenuService.java and the TopMenu will get updated.

So this is Part 1 done. Now comes the crawling part. We need to use TopMenuService.java in a servlet so that only changing the entries in TopMenuService.java will change the sitemap. So basically TopMenuService is the central servlet: changing it should update both the sitemap and the TopMenu URLs as shown above.

So I coded another servlet which parses the JSON from TopMenu and makes up a SiteMap:


public class Sitemap extends HttpServlet {

	private static final long serialVersionUID = -8475570405765656976L;
	private final String sitemaphead = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
			+ "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n";

	@Override
	protected void doPost(HttpServletRequest request, HttpServletResponse response)
			throws ServletException, IOException {
		doGet(request, response);
	}

	@Override
	protected void doGet(HttpServletRequest request, HttpServletResponse response)
			throws ServletException, IOException {
		Query post = RemoteAccess.evaluate(request);
		// String siteurl = request.getRequestURL().toString();
		// String baseurl = siteurl.substring(0, siteurl.length() -
		// request.getRequestURI().length()) + request.getContextPath() + "/";
		String baseurl = "http://loklak.org/";
		JSONObject TopMenuJsonObject = new TopMenuService().serviceImpl(post, null, null);
		JSONArray sitesarr = TopMenuJsonObject.getJSONArray("items");
		response.setCharacterEncoding("UTF-8");
		PrintWriter sos = response.getWriter();
		sos.print(sitemaphead + "\n");
		for (int i = 0; i < sitesarr.length(); i++) {
			JSONObject sitesobj = sitesarr.getJSONObject(i);
			Iterator<String> sites = sitesobj.keys();
			sos.print("<url>\n<loc>" + baseurl + sitesobj.getString(sites.next()) + "/</loc>\n"
					+ "<changefreq>weekly</changefreq>\n</url>\n");
		}
		sos.print("</urlset>");
		sos.println();
		post.finalize();
	}
}

The XML adheres to the sitemap standard as prescribed here. Basically, I just took the JSON from TopMenu, used an Iterator to get the keys (if you look at the JSON, you will notice I only need the values from all the objects in the JSONArray), and then printed it out using a PrintWriter.
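For the TopMenu entries above, the servlet’s output would look roughly like this (truncated; note the trailing slash the loop appends after each path):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://loklak.org/index.html/</loc>
<changefreq>weekly</changefreq>
</url>
<url>
<loc>http://loklak.org/about.html/</loc>
<changefreq>weekly</changefreq>
</url>
...
</urlset>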

Since we wanted all the URLs in the sitemap to be crawled, the robots.txt looks something like:


User-agent: *
Sitemap: http://loklak.org/api/sitemap.xml

So now we have achieved a dynamically updating SiteMap and TopMenu, all controlled using only a JSONObject in TopMenuService.java. Easy, no?

That’s all for now. In my next post, I will be talking about the Q&A Apps I’m working on, as well as a bit about Susi. Till then, ciao! Feedback as always is appreciated 🙂
