Social Media Analysis using Loklak (Part 3)

In my last two blog posts, I spoke about the TwitterAnalysis servlet and how data actually comes into it through the scraping methods. One thing was still missing from TwitterAnalysis, though: Susi integration, which I’ll cover in this blog post.

Given that the TwitterAnalysis servlet is essentially a social media profile analyser, we can extract a lot of useful statistics from it. As covered earlier, we get likes, retweets, hashtag statistics, sentiment analysis, frequency charts and so on. To get this working with Susi, we need to build queries which use these statistics to give the user valuable information.

First off, the serviceImpl method needs to be changed to return a SusiThought object. SusiThought is a JSONObject which processes the query (keyword extraction etc.), uses the APIs to get an answer to the query, and returns the answer along with the count of answers (in case of a table). SusiThought is what triggers the entire Susi mechanism, so the first step of the integration is to make TwitterAnalysis return a SusiThought (which, being a JSONObject, still satisfies the serviceImpl signature):


@Override
public JSONObject serviceImpl(Query call, HttpServletResponse response, Authorization rights,
		JSONObjectWithDefault permissions) throws APIException {
	String username = call.get("screen_name", "");
	String count = call.get("count", "");
	TwitterAnalysisService.request = call.getRequest();
	return showAnalysis(username, count);
}

public static SusiThought showAnalysis(String username, String count) {

	// rest of the code as explained in the last blog post;
	// SusiThought is a JSONObject, so the serviceImpl body moves here unchanged

}
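
Since SusiThought behaves like a JSONObject carrying a data table, a minimal sketch of the idea looks like this (illustration only; it uses nothing beyond the constructor and the setData call that appear in the servlet code in this post):

// A SusiThought is a JSONObject whose answer payload is a JSON data table;
// the rows go in via setData, exactly as showAnalysis does below.
SusiThought thought = new SusiThought();
thought.setData(new JSONArray()
		.put(new JSONObject().put("likes_count", 42)));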

Once this is done, we write up the queries in the susi_cognition file.

As you may have read in my last blog post, TwitterAnalysis produces a profile analysis of a Twitter user; it is basically statistics, so we can support a lot of queries about it. These are the rules I implemented; they are fairly self-explanatory once you read the example fields:


{
			"keys"   :["tweet frequency", "tweets", "month"],
			"score"  :2000,
			"example": "How many tweets did melaniatrump post in May 2016",
			"phrases":[ {"type":"pattern", "expression":"* tweet frequency of * in *"},
				{"type":"pattern", "expression":"* tweets did * post in *"},
				{"type":"pattern", "expression":"* tweets did * post in the month of *"}
			],
			"process":[ {"type": "console", "expression": "SELECT yearwise[$3$] AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ tweeted $count$ times in $3$"
			]}]
		},
		{
			"keys"   :["tweet frequency", "tweets", "post", "at"],
			"score"  :2000,
			"example": "How many tweets did melaniatrump post at 6 PM",
			"phrases":[ {"type":"pattern", "expression":"* tweet frequency of * at *"},
				{"type":"pattern", "expression":"* tweets did * post at *"}
			],
			"process":[ {"type": "console", "expression": "SELECT hourwise[$3$] AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ tweeted $count$ times at $3$"
			]}]
		},
		{
			"keys"   :["tweet frequency", "tweets", "post", "on"],
			"score"  :2000,
			"example": "How many tweets did melaniatrump post on Saturdays",
			"phrases":[ {"type":"pattern", "expression":"* tweet frequency of * on *s"},
				{"type":"pattern", "expression":"* tweets did * post on *s"},
				{"type":"pattern", "expression":"* tweet frequency of * on *"},
				{"type":"pattern", "expression":"* tweets did * post on *"}
			],
			"process":[ {"type": "console", "expression": "SELECT daywise[$3$] AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ tweeted $count$ times on $3$"
			]}]
		},
		{
			"keys"   :["tweet frequency", "chart"],
			"score"  :2000,
			"example": "Show me the yearwise tweet frequency chart of melaniatrump",
			"phrases":[ {"type":"pattern", "expression":"* the * tweet frequency chart of *"}],
			"process":[ {"type": "console", "expression": "SELECT $2$ FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"This is the $2$ frequency chart of $3$"
			]}, {"type":"table"}]
		},
		{
			"keys"   :["tweet type", "post", "a"],
			"score"  :2000,
			"example": "How many times did melaniatrump post a video",
			"phrases":[ {"type":"pattern", "expression":"* did * post a *"}],
			"process":[ {"type": "console", "expression": "SELECT $3$ AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ posted a $3$ $count$ times"
			]}]
		},
		{
			"keys"   :["tweet activity", "likes", "count"],
			"example": "How many likes does melaniatrump have in all",
			"score"  :2000,
			"phrases":[ {"type":"pattern", "expression":"* likes does * have *"}
			],
			"process":[ {"type": "console", "expression": "SELECT likes_count AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ has $count$ likes till now"
			]}]
		},
		{
			"keys"   :["tweet activity", "likes", "maximum"],
			"example": "What is the maximum number of likes that melaniatrump got",
			"score"  :2000,
			"phrases":[ {"type":"pattern", "expression":"* maximum * likes that * got"}
			],
			"process":[ {"type": "console", "expression": "SELECT max_likes FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here you go"
			]}, {"type": "table"}]
		},
		{
			"keys"   :["tweet activity", "likes", "average"],
			"example": "What is the average number of likes that melaniatrump gets",
			"score"  :2000,
			"phrases":[ {"type":"pattern", "expression":"* average * likes that * gets"}
			],
			"process":[ {"type": "console", "expression": "SELECT average_number_of_likes AS count FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$3$ gets $count$ likes on an average"
			]}]
		},
		{
			"keys"   :["tweet activity", "likes", "frequency"],
			"score"  :2000,
			"example": "How many times did melaniatrump get 0 likes",
			"phrases":[ {"type":"pattern", "expression":"* * have * likes"},
				{"type":"pattern", "expression":"* * get * likes"}
			],
			"process":[ {"type": "console", "expression": "SELECT likes_chart[$3$] AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ got $3$ likes, $count$ times"
			]}]
		},
		{
			"keys"   :["tweet activity", "likes", "frequency", "chart"],
			"score"  :2000,
			"example": "Show me the likes frequency chart of melaniatrump",
			"phrases":[ {"type":"pattern", "expression":"* likes frequency chart * *"}
			],
			"process":[ {"type": "console", "expression": "SELECT likes_chart FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here is the likes frequency chart"
			]}, {"type": "table"}]
		},
		{
			"keys"   :["tweet activity", "retweets", "count"],
			"score"  :2000,
			"example": "How many retweets does melaniatrump have in all",
			"phrases":[ {"type":"pattern", "expression":"* retweets does * have *"}
			],
			"process":[ {"type": "console", "expression": "SELECT retweets_count AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ has $count$ retweets till now"
			]}]
		},
		{
			"keys"   :["tweet activity", "retweets", "maximum"],
			"score"  :2000,
			"example": "What is the maximum number of retweets that melaniatrump got",
			"phrases":[ {"type":"pattern", "expression":"* maximum * retweets that * got"}
			],
			"process":[ {"type": "console", "expression": "SELECT max_retweets FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here you go"
			]}, {"type": "table"}]
		},
		{
			"keys"   :["tweet activity", "retweets", "average"],
			"score"  :2000,
			"example": "What is the average number of retweets that melaniatrump gets",
			"phrases":[ {"type":"pattern", "expression":"* average * retweets that * gets"}
			],
			"process":[ {"type": "console", "expression": "SELECT average_number_of_retweets AS count FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$3$ gets $count$ retweets on an average"
			]}]
		},
		{
			"keys"   :["tweet activity", "retweets", "frequency"],
			"score"  :2000,
			"example": "How many times did melaniatrump get 0 retweets",
			"phrases":[ {"type":"pattern", "expression":"* * have * retweets"},
				{"type":"pattern", "expression":"* * get * retweets"}
			],
			"process":[ {"type": "console", "expression": "SELECT retweets_chart[$3$] AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ got $3$ retweets, $count$ times"
			]}]
		},
		{
			"keys"   :["tweet activity", "retweets", "frequency", "chart"],
			"score"  :2000,
			"example": "Show me the retweet frequency chart of melaniatrump",
			"phrases":[ {"type":"pattern", "expression":"* retweet frequency chart * *"}
			],
			"process":[ {"type": "console", "expression": "SELECT retweets_chart FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here is the retweets frequency chart"
			]}, {"type": "table"}]
		},
		{
			"keys"   :["tweet activity", "hashtags", "count"],
			"score"  :2000,
			"example": "How many hashtags has melaniatrump used in all",
			"phrases":[ {"type":"pattern", "expression":"* hashtags has * used *"}
			],
			"process":[ {"type": "console", "expression": "SELECT hashtags_used_count AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ has used $count$ hashtags till now"
			]}]
		},
		{
			"keys"   :["tweet activity", "hashtags", "maximum"],
			"score"  :2000,
			"example": "What is the maximum number of hastags that melaniatrump used",
			"phrases":[ {"type":"pattern", "expression":"* maximum * hashtags that * used"}
			],
			"process":[ {"type": "console", "expression": "SELECT max_hashtags FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here you go"
			]}, {"type": "table"}]
		},
		{
			"keys"   :["tweet activity", "hashtags", "average"],
			"score"  :2000,
			"example": "What is the average number of hashtags that melaniatrump uses",
			"phrases":[ {"type":"pattern", "expression":"* average * hashtags that * uses"}
			],
			"process":[ {"type": "console", "expression": "SELECT average_number_of_hashtags_used AS count FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$3$ uses $count$ hashtags on an average"
			]}]
		},
		{
			"keys"   :["tweet activity", "hashtags", "frequency"],
			"score"  :2000,
			"example": "How many times did melaniatrump use 20 hashtags",
			"phrases":[ {"type":"pattern", "expression":"* * use * hashtags"}
			],
			"process":[ {"type": "console", "expression": "SELECT hashtags_chart[$3$] AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ used $3$ hashtags, $count$ times"
			]}]
		},
		{
			"keys"   :["tweet activity", "hashtags", "frequency", "chart"],
			"score"  :2000,
			"example": "Show me the hashtag frequency chart of melaniatrump",
			"phrases":[ {"type":"pattern", "expression":"* hashtag frequency chart * *"}
			],
			"process":[ {"type": "console", "expression": "SELECT hashtags_chart FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here is the hashtags frequency chart"
			]}, {"type": "table"}]
		},
		{
			"keys"   :["tweet content", "language", "frequency"],
			"score"  :2000,
			"example": "How many tweets did melaniatrump write in English?",
			"phrases":[ {"type":"pattern", "expression":"* * write in *"},
				{"type":"pattern", "expression":"* * post in *"},
				{"type":"pattern", "expression":"* of * were written in *"}
			],
			"process":[ {"type": "console", "expression": "SELECT languages[$3$] AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ posted $count$ tweets in $3$"
			]}]
		},
		{
			"keys"   :["tweet content", "language", "analysis", "chart"],
			"score"  :2000,
			"example": "Show me the language analysis chart of melaniatrump",
			"phrases":[ {"type":"pattern", "expression":"* language analysis chart * *"}
			],
			"process":[ {"type": "console", "expression": "SELECT languages FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here is the language analysis chart"
			]}, {"type": "table"}]
		},
		{
			"keys"   :["tweet content", "sentiment", "analysis", "chart"],
			"score"  :2000,
			"example": "Show me the sentiment analysis chart of melaniatrump",
			"phrases":[ {"type":"pattern", "expression":"* sentiment analysis chart * *"}
			],
			"process":[ {"type": "console", "expression": "SELECT sentiments FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here is the sentiment analysis chart"
			]}, {"type": "table"}]
		}

(PS: no points for guessing why melaniatrump is there in the examples 😉)

As explained before, each rule is an expression made up of parameters and some hardcoded core words, and the parameters are then fetched with $x$ (x = parameter number). For instance, the query "How many tweets did melaniatrump post in May 2016" matches the pattern "* tweets did * post in *", binding $2$ = "melaniatrump" and $3$ = "May 2016", so the process expression becomes SELECT yearwise[May 2016] AS count FROM twitanalysis WHERE screen_name='melaniatrump' AND count='1000';. These queries can give a whole range of statistics about a user's activity and the activity on their profile, which makes them very useful for a chatbot.

Now, to wrap this up, we need a way to process these queries. Enter ConsoleService. Notice that all the process['expression'] SQL queries are of the form:

SELECT <something> FROM twitanalysis WHERE screen_name = '<parameter_of_username>' AND count = '1000';

I have taken count as 1000 because, as mentioned in my last blog post, the scraper scrapes and displays a maximum of 1000 results at a time, so I want to cover the widest possible range.

Converting the above generalised SQL query into a regex, we get this form:

SELECT\\h+?(.*?)\\h+?FROM\\h+?twitanalysis\\h+?WHERE\\h+?screen_name\\h??=\\h??'(.*?)'\\h+?AND\\h+?count\\h??=\\h??'(.*?)'\\h??;

Here \h matches a horizontal whitespace character (\h+? requires at least one, \h?? makes the whitespace optional), and the variable parts are expressed using the lazy wildcard (.*?). Since our SQL queries carry specific parameters (as described above), we wrap these wildcards in capturing groups so the parameters can be extracted later.
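
To make the grouping concrete, here is a small standalone sketch showing how the three capturing groups come out of one of the queries from the rules above:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwitAnalysisRegexDemo {
	public static void main(String[] args) {
		// The same regex that ConsoleService compiles below.
		Pattern p = Pattern.compile(
				"SELECT\\h+?(.*?)\\h+?FROM\\h+?twitanalysis\\h+?WHERE\\h+?screen_name\\h??=\\h??'(.*?)'\\h+?AND\\h+?count\\h??=\\h??'(.*?)'\\h??;");
		Matcher m = p.matcher(
				"SELECT likes_count AS count FROM twitanalysis WHERE screen_name='melaniatrump' AND count='1000';");
		if (m.find()) {
			System.out.println(m.group(1)); // likes_count AS count
			System.out.println(m.group(2)); // melaniatrump
			System.out.println(m.group(3)); // 1000
		}
	}
}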

Now we need to compile this regex and register what should happen when it matches. This is done in ConsoleService:


dbAccess.put(Pattern.compile("SELECT\\h+?(.*?)\\h+?FROM\\h+?twitanalysis\\h+?WHERE\\h+?screen_name\\h??=\\h??'(.*?)'\\h+?AND\\h+?count\\h??=\\h??'(.*?)'\\h??;"), (flow, matcher) -> {
            SusiThought json = TwitterAnalysisService.showAnalysis(matcher.group(2), matcher.group(3));
            SusiTransfer transfer = new SusiTransfer(matcher.group(1));
            json.setData(transfer.conclude(json.getData()));
            return json;
        });

We compile the regex and feed it to a BiFunction (a function that takes two parameters). We extract the captured groups with matcher.group; as you saw above, showAnalysis in TwitterAnalysis takes a screen_name and a count, so we pull these out of the matcher, feed them to the static showAnalysis function inside the TwitterAnalysis servlet, and get the JSON back. This completes the procedure: TwitterAnalysis is now integrated with the Susi API. 🙂

In my next blog posts, I’ll talk about bot integrations for Susi and a Slack bot I made for it, and then I’ll move on to Susi monetisation using the Amazon API. Feedback is welcome 🙂


Social Media Analysis using Loklak (Part 1)

Over the past seven posts, I have covered how loklak_depot started off and how we got it onto the World Wide Web so that people could get to know more about it. Now I shall go a bit deeper into what we at loklak are trying to do with all the information we are getting.

Twitter carries a lot of information about how people live their lives, across a variety of topics: sports, business, finance, food and so on. Being human-generated, this information can serve as a (not completely, since there are so many users) reliable source for a variety of data analyses. A host of things can be done with the data we get from tweets, and many have in fact been done already, such as stock market prediction and sentiment analysis.

One of these uses is analysing social media profiles: feed in a Twitter (or any online social media) profile, and the program gives you a complete analysis of the posts you make and the people you connect with. Another use is Susi, our NLP-based search engine, on which a couple of blog posts have already been written.

So while I was spotting issues with the Search API and checking out the others, I had an idea for a small application: supply a Twitter username to it, and it returns a profile analysis for that username. The app has been built and works well. I have only built the servlet, because I planned on integrating it into Susi, and using it with Susi can actually give a lot of interesting answers.

These are the features I decided to implement in the app / servlet:

1. User’s Activity Frequency: yearly, hourly and daily statistics etc.
2. User’s Activity Type: how many times the user posted a photo / video / status etc.
3. Activity on User’s Activity: number of likes / retweets the user’s tweets received, hashtag analysis etc.
4. Analysis of User’s Content: language and sentiment analysis

So here is the entire code for the program. After this, I will explain how I implemented the parts step by step:


public class TwitterAnalysisService extends AbstractAPIHandler implements APIHandler {

	private static final long serialVersionUID = -3753965521858525803L;

	private static HttpServletRequest request;

	@Override
	public String getAPIPath() {
		return "/api/twitanalysis.json";
	}

	@Override
	public BaseUserRole getMinimalBaseUserRole() {
		return BaseUserRole.ANONYMOUS;
	}

	@Override
	public JSONObject getDefaultPermissions(BaseUserRole baseUserRole) {
		return null;
	}

	@Override
	public JSONObject serviceImpl(Query call, HttpServletResponse response, Authorization rights,
			JSONObjectWithDefault permissions) throws APIException {
		String username = call.get("screen_name", "");
		String count = call.get("count", "");
		TwitterAnalysisService.request = call.getRequest();
		return showAnalysis(username, count);
	}

	public static SusiThought showAnalysis(String username, String count) {

		SusiThought json = new SusiThought();
		JSONArray finalresultarray = new JSONArray();
		JSONObject finalresult = new JSONObject(true);
		String siteurl = request.getRequestURL().toString();
		String baseurl = siteurl.substring(0, siteurl.length() - request.getRequestURI().length())
				+ request.getContextPath();

		String searchurl = baseurl + "/api/search.json?q=from%3A" + username + (!count.isEmpty() ? ("&count=" + count) : ""); // compare string content, not references
		byte[] searchbyte;
		try {
			searchbyte = ClientConnection.download(searchurl);
		} catch (IOException e) {
			return json.setData(new JSONArray().put(new JSONObject().put("Error", "Can't contact server")));
		}
		String searchstr = UTF8.String(searchbyte);
		JSONObject searchresult = new JSONObject(searchstr);

		JSONArray tweets = searchresult.getJSONArray("statuses");
		if (tweets.length() == 0) {
			finalresult.put("error", "Invalid username " + username + " or no tweets");
			finalresultarray.put(finalresult);
			json.setData(finalresultarray);
			return json;
		}
		finalresult.put("username", username);
		finalresult.put("items_per_page", searchresult.getJSONObject("search_metadata").getString("itemsPerPage"));
		finalresult.put("tweets_analysed", searchresult.getJSONObject("search_metadata").getString("count"));

		// main loop
		JSONObject activityFreq = new JSONObject(true);
		JSONObject activityType = new JSONObject(true);
		int imgCount = 0, audioCount = 0, videoCount = 0, linksCount = 0, likesCount = 0, retweetCount = 0,
				hashtagCount = 0;
		int maxLikes = 0, maxRetweets = 0, maxHashtags = 0;
		String maxLikeslink, maxRetweetslink, maxHashtagslink;
		maxLikeslink = maxRetweetslink = maxHashtagslink = tweets.getJSONObject(0).getString("link");
		List<String> tweetDate = new ArrayList<>();
		List<String> tweetHour = new ArrayList<>();
		List<String> tweetDay = new ArrayList<>();
		List<Integer> likesList = new ArrayList<>();
		List<Integer> retweetsList = new ArrayList<>();
		List<Integer> hashtagsList = new ArrayList<>();
		List<String> languageList = new ArrayList<>();
		List<String> sentimentList = new ArrayList<>();
		Calendar calendar = Calendar.getInstance();

		for (int i = 0; i < tweets.length(); i++) {
			JSONObject status = tweets.getJSONObject(i);
			String[] datearr = status.getString("created_at").split("T")[0].split("-");
			calendar.set(Integer.parseInt(datearr[0]), Integer.parseInt(datearr[1]) - 1, Integer.parseInt(datearr[2]));
			Date date = new Date(calendar.getTimeInMillis());
			tweetDate.add(new SimpleDateFormat("MMMM yyyy").format(date));
			tweetDay.add(new SimpleDateFormat("EEEE", Locale.ENGLISH).format(date)); // day
			String times = status.getString("created_at").split("T")[1];
			String hour = times.substring(0, times.length() - 5).split(":")[0];
			tweetHour.add(hour); // hour
			imgCount += status.getInt("images_count");
			audioCount += status.getInt("audio_count");
			videoCount += status.getInt("videos_count");
			linksCount += status.getInt("links_count");
			likesList.add(status.getInt("favourites_count"));
			retweetsList.add(status.getInt("retweet_count"));
			hashtagsList.add(status.getInt("hashtags_count"));
			if (status.has("classifier_emotion")) {
				sentimentList.add(status.getString("classifier_emotion"));
			} else {
				sentimentList.add("neutral");
			}
			if (status.has("classifier_language")) {
				languageList.add(status.getString("classifier_language"));
			} else {
				languageList.add("no_text");
			}
			if (maxLikes < status.getInt("favourites_count")) {
				maxLikes = status.getInt("favourites_count");
				maxLikeslink = status.getString("link");
			}
			if (maxRetweets < status.getInt("retweet_count")) {
				maxRetweets = status.getInt("retweet_count");
				maxRetweetslink = status.getString("link");
			}
			if (maxHashtags < status.getInt("hashtags_count")) {
				maxHashtags = status.getInt("hashtags_count");
				maxHashtagslink = status.getString("link");
			}
			likesCount += status.getInt("favourites_count");
			retweetCount += status.getInt("retweet_count");
			hashtagCount += status.getInt("hashtags_count");
		}
		activityType.put("posted_image", imgCount);
		activityType.put("posted_audio", audioCount);
		activityType.put("posted_video", videoCount);
		activityType.put("posted_link", linksCount);
		activityType.put("posted_story",
				Integer.parseInt(searchresult.getJSONObject("search_metadata").getString("count"))
						- (imgCount + audioCount + videoCount + linksCount));

		JSONObject yearlyact = new JSONObject(true);
		JSONObject hourlyact = new JSONObject(true);
		JSONObject dailyact = new JSONObject(true);
		Set<String> yearset = new HashSet<>(tweetDate);
		Set<String> hourset = new HashSet<>(tweetHour);
		Set<String> dayset = new HashSet<>(tweetDay);

		for (String s : yearset) {
			yearlyact.put(s, Collections.frequency(tweetDate, s));
		}

		for (String s : hourset) {
			hourlyact.put(s, Collections.frequency(tweetHour, s));
		}

		for (String s : dayset) {
			dailyact.put(s, Collections.frequency(tweetDay, s));
		}

		activityFreq.put("yearwise", yearlyact);
		activityFreq.put("hourwise", hourlyact);
		activityFreq.put("daywise", dailyact);
		finalresult.put("tweet_frequency", activityFreq);
		finalresult.put("tweet_type", activityType);

		// activity on my tweets

		JSONObject activityOnTweets = new JSONObject(true);
		JSONObject activityCharts = new JSONObject(true);
		JSONObject likesChart = new JSONObject(true);
		JSONObject retweetChart = new JSONObject(true);
		JSONObject hashtagsChart = new JSONObject(true);

		Set<Integer> likesSet = new HashSet<>(likesList);
		Set<Integer> retweetSet = new HashSet<>(retweetsList);
		Set<Integer> hashtagSet = new HashSet<>(hashtagsList);

		for (Integer i : likesSet) {
			likesChart.put(i.toString(), Collections.frequency(likesList, i));
		}

		for (Integer i : retweetSet) {
			retweetChart.put(i.toString(), Collections.frequency(retweetsList, i));
		}

		for (Integer i : hashtagSet) {
			hashtagsChart.put(i.toString(), Collections.frequency(hashtagsList, i));
		}

		activityOnTweets.put("likes_count", likesCount);
		activityOnTweets.put("max_likes",
				new JSONObject(true).put("number", maxLikes).put("link_to_tweet", maxLikeslink));
		activityOnTweets.put("average_number_of_likes",
				(likesCount / (Integer.parseInt(searchresult.getJSONObject("search_metadata").getString("count")))));

		activityOnTweets.put("retweets_count", retweetCount);
		activityOnTweets.put("max_retweets",
				new JSONObject(true).put("number", maxRetweets).put("link_to_tweet", maxRetweetslink));
		activityOnTweets.put("average_number_of_retweets",
				(retweetCount / (Integer.parseInt(searchresult.getJSONObject("search_metadata").getString("count")))));

		activityOnTweets.put("hashtags_used_count", hashtagCount);
		activityOnTweets.put("max_hashtags",
				new JSONObject(true).put("number", maxHashtags).put("link_to_tweet", maxHashtagslink));
		activityOnTweets.put("average_number_of_hashtags_used",
				(hashtagCount / (Integer.parseInt(searchresult.getJSONObject("search_metadata").getString("count")))));

		activityCharts.put("likes", likesChart);
		activityCharts.put("retweets", retweetChart);
		activityCharts.put("hashtags", hashtagsChart);
		activityOnTweets.put("frequency_charts", activityCharts);
		finalresult.put("activity_on_my_tweets", activityOnTweets);

		// content analysis
		JSONObject contentAnalysis = new JSONObject(true);
		JSONObject languageAnalysis = new JSONObject(true);
		JSONObject sentimentAnalysis = new JSONObject(true);
		Set<String> languageSet = new HashSet<>(languageList), sentimentSet = new HashSet<>(sentimentList);

		for (String s : languageSet) {
			languageAnalysis.put(s, Collections.frequency(languageList, s));
		}

		for (String s : sentimentSet) {
			sentimentAnalysis.put(s, Collections.frequency(sentimentList, s));
		}
		contentAnalysis.put("language_analysis", languageAnalysis);
		contentAnalysis.put("sentiment_analysis", sentimentAnalysis);
		finalresult.put("content_analysis", contentAnalysis);
		finalresultarray.put(finalresult);
		json.setData(finalresultarray);
		return json;
	}
}

The code first gets the Search API data for the username (stores it in a byte array and then converts it to a JSONObject). It returns a SusiThought object so that it can be integrated into Susi. Once that is done, we are all set to analyse the data.

Let us go through the code feature-by-feature and I’ll explain what parts of the code implement those features:

Activity Frequency

For this, I initialised three ArrayLists, tweetDate, tweetHour and tweetDay, which hold the month/year, hour and weekday extracted from each tweet’s timestamp. The extraction is done via a Calendar instance: the timestamp string is split into its date parts, set on the Calendar, and formatted with SimpleDateFormat to get the month/year and the day of the week (the hour is taken straight from the time part). Once the values are stored in the lists, I use Collections.frequency from java.util to calculate the frequencies, iterating over a HashSet built from each list so that every key is counted exactly once. So on running, the Activity Frequency for my username looks like this:

[Screenshot: Activity Frequency output]
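
As a standalone illustration of this counting technique (with made-up sample data standing in for the scraped timestamps), consider:

import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import org.json.JSONObject;

public class FrequencyDemo {
	public static void main(String[] args) {
		// Sample days standing in for the values extracted from tweet timestamps.
		List<String> tweetDay = Arrays.asList("Monday", "Tuesday", "Monday", "Friday", "Monday");
		JSONObject dailyact = new JSONObject();
		// The HashSet keeps each key unique; Collections.frequency counts occurrences.
		for (String day : new HashSet<>(tweetDay)) {
			dailyact.put(day, Collections.frequency(tweetDay, day));
		}
		System.out.println(dailyact); // e.g. {"Friday":1,"Tuesday":1,"Monday":3}
	}
}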

Activity Type

For this, I extracted the activity types from the Search API and summed each category up (imgCount += status.getInt("images_count"); etc.), then added the totals directly into the JSONObject. Easy.

[Screenshot: Activity Type output]

Activity on User’s Activity

For this, we analyse the statistics on likes, retweets and hashtags. I again create three ArrayLists (holding the per-tweet counts), and in the main loop I track the maximum number of likes, retweets and hashtags along with a link to the corresponding tweet, and sum up the totals. Then I again use Collections.frequency to build a frequency chart, calculate the averages, and put everything into a JSONObject.

[Screenshot: Activity on User's Activity output]

Content Analysis

This was again short: we extract the emotion and language from the Search API, count the occurrences, and put them in the JSONObject like we have done with everything else.

[Screenshot: Content Analysis output]

This completes the TwitterAnalysis servlet: it gives you a full profile analysis of any Twitter profile, all powered by loklak.
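
If you want to call the servlet directly, here is a minimal sketch using the same ClientConnection and UTF8 helpers the servlet itself uses; the localhost URL and port 9000 are assumptions for a locally running loklak instance, so adjust them to your setup:

// Hypothetical quick test of the endpoint (host and port are assumptions).
try {
	byte[] raw = ClientConnection.download(
			"http://localhost:9000/api/twitanalysis.json?screen_name=melaniatrump&count=1000");
	JSONObject analysis = new JSONObject(UTF8.String(raw));
	System.out.println(analysis.toString(2)); // pretty-print the profile analysis
} catch (IOException e) {
	e.printStackTrace();
}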

This was relatively simple to do. In my next blog post, I will discuss a bit about how loklak scrapes this data and makes it available, and how I integrate this application into Susi, so that we can get interesting profile data from a chatbot effortlessly. As always, feedback is welcome 🙂


Big Data Collection with Desktop version of loklak wok

A desktop version of the loklak wok is now available. The goal of the wok is to collect and parse data from social services like Twitter, enabling users, citizen scientists and companies to analyze big data.

The origin of the project is a tweet by @Frank_gamefreak. Thank you!

Please join the development on GitHub: https://github.com/loklak/loklak_wok_desktop

[Screenshot: desktop_wok_heap_up_display]

How to compile and run

  • Import the required lib by running setup.sh
  • Compile with mvn clean install -Pexecutable-jar
  • Run the artifact in the target directory: java -jar wok-desktop-0.0.1-SNAPSHOT-jar-with-all-dependencies.jar
  • Stop the program with the ESC key

To be done

  • The code has been hacked and butchered and is some kind of Frankenstein. It needs cleanup.
  • Font size is hardcoded. How ugly is that?
  • It would be cool to have a project for code shared between the Android and Desktop versions.
  • The only dependency which cannot be resolved via Maven is loklakj. Wouldn’t it be cool to change that?
  • The font used does not seem to support Asian characters.

Tweet analytics with loklak and Kibana as a search front-end

You can use Kibana to analyze large amounts of Tweet data as a source for statistical data. Please find more info at http://loklak.org/download.html#kibana

Kibana is a tool to “explore and visualize your data”. It is not actually a search front-end, but you can use it as one. Because Kibana is made for Elasticsearch, it will instantly fit loklak without any modification or configuration. Here is what you need to do:

[Screenshot: Kibana]

Kibana is pre-configured with default values to attach to an Elasticsearch index containing logstash data. We will use a different index name than logstash: the loklak index names are ‘messages’ and ‘users’. When the Kibana Settings page is visible in your browser, do:

  • On the ‘Configure an index pattern’ Settings page of the Kibana interface, enter “messages” (without the quotes) in the field “Index name or pattern”.
  • As soon as you have typed this in, another field, “Time-field name”, appears, empty and with a red border. Use the select-box arrows on the right side of the empty field to select the one entry which is there: “created_at”.
  • Push the ‘Create’ button.

A page with the name “messages” appears and shows all index fields of the loklak messages index. If you want to search the index from Kibana, do:

  • Click on “Discover” in the upper menu bar.
  • You will probably see a page with the headline “No results found”. If your loklak index is not empty, this may be caused by a too narrow time range; the next step should solve that:
  • Click on the time picker in the top right corner of the window and select e.g. “This month”.
  • A ‘searching’ message appears, followed by a search result page and a histogram at the top.
  • Replace the wild-card symbol ‘*’ in the query input line with a word which you want to search for, e.g. ‘fossasia’.
  • You can also select a time period by click-dragging over the histogram to narrow the search result.
  • You can click on the field names on the left border to show a field facet. Click on the ‘+’ sign at a facet item to activate the facet.

The remote search to Twitter with the Twitter scraper is not done using the Elasticsearch ‘river’ method, to prevent a user front-end like Kibana from constantly triggering remote searches. Therefore searching from Kibana will not enrich your search index with remote search results. This also means that you won’t see any results in Kibana until you have searched with the /api/search.json API (for example http://localhost:9000/api/search.json?q=fossasia on a local instance; the port 9000 here assumes the default configuration).
