Social Media Analysis using Loklak (Part 3)

In my last two blog posts, I spoke about the TwitterAnalysis Servlet, and how the data actually comes into it through scraping methods. For TwitterAnalysis though, there was one thing that was missing: Susi integration, which I’ll cover in this blog post.

Given that the TwitterAnalysis servlet is basically a Social Media profile analyser, we could definitely get a lot of useful statistics from it. As covered earlier, we are getting likes, retweets, hashtag statistics, sentiment analysis, frequency charts etc. Now, to get this working on Susi, we need to build queries which can use these statistics and give the user valuable information.

First off, the serviceImpl method needs to be changed to return a SusiThought object. SusiThought is a JSONObject which processes the query (does keyword extraction etc), uses the APIs to get an answer to the query, and returns the answer along with the count of answers (incase of a table). SusiThought is what triggers the entire Susi mechanism, so the first thing for Susi integration is to convert TwitterAnalysis to return a SusiThought object:


@Override
	public JSONObject serviceImpl(Query call, HttpServletResponse response, Authorization rights,
			JSONObjectWithDefault permissions) throws APIException {
		String username = call.get("screen_name", "");
		String count = call.get("count", "");
		TwitterAnalysisService.request = call.getRequest();
		return showAnalysis(username, count);
	}
public static SusiThought showAnalysis(String username, String count) {

//rest of the code as explained in last blog post
//SusiThought is a JSONObject so we simply copy-paste the serviceImpl code here

}

Once this is done, we write up the queries in the susi_cognition.

As you may have read in my last blog post, TwitterAnalysis gives the Twitter Profile analysis of a user, it’s basically statistics, so we could have a lot of queries regarding this. So these are the rules I implemented, they are self-explanatory on reading the example fields:


{
			"keys"   :["tweet frequency", "tweets", "month"],
			"score"  :2000,
			"example": "How many tweets did melaniatrump post in May 2016",
			"phrases":[ {"type":"pattern", "expression":"* tweet frequency of * in *"},
				{"type":"pattern", "expression":"* tweets did * post in *"},
				{"type":"pattern", "expression":"* tweets did * post in the month of *"}
			],
			"process":[ {"type": "console", "expression": "SELECT yearwise[$3$] AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ tweeted $count$ times in $3$"
			]}]
		},
		{
			"keys"   :["tweet frequency", "tweets", "post", "at"],
			"score"  :2000,
			"example": "How many tweets did melaniatrump post at 6 PM",
			"phrases":[ {"type":"pattern", "expression":"* tweet frequency of * at *"},
				{"type":"pattern", "expression":"* tweets did * post at *"}
			],
			"process":[ {"type": "console", "expression": "SELECT hourwise[$3$] AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ tweeted $count$ times at $3$"
			]}]
		},
		{
			"keys"   :["tweet frequency", "tweets", "post", "on"],
			"score"  :2000,
			"example": "How many tweets did melaniatrump post on Saturdays",
			"phrases":[ {"type":"pattern", "expression":"* tweet frequency of * on *s"},
				{"type":"pattern", "expression":"* tweets did * post on *s"},
				{"type":"pattern", "expression":"* tweet frequency of * on *"},
				{"type":"pattern", "expression":"* tweets did * post on *"}
			],
			"process":[ {"type": "console", "expression": "SELECT daywise[$3$] AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ tweeted $count$ times on $3$"
			]}]
		},
		{
			"keys"   :["tweet frequency", "chart"],
			"score"  :2000,
			"example": "Show me the yearwise tweet frequency chart of melaniatrump",
			"phrases":[ {"type":"pattern", "expression":"* the * tweet frequency chart of *"}],
			"process":[ {"type": "console", "expression": "SELECT $2$ FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"This is the $2$ frequency chart of $3$"
			]}, {"type":"table"}]
		},
		{
			"keys"   :["tweet type", "post", "a"],
			"score"  :2000,
			"example": "How many times did melaniatrump post a video",
			"phrases":[ {"type":"pattern", "expression":"* did * post a *"}],
			"process":[ {"type": "console", "expression": "SELECT $3$ AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ posted a $3$ $count$ times"
			]}]
		},
		{
			"keys"   :["tweet activity", "likes", "count"],
			"example": "How many likes does melaniatrump have in all",
			"score"  :2000,
			"phrases":[ {"type":"pattern", "expression":"* likes does * have *"}
			],
			"process":[ {"type": "console", "expression": "SELECT likes_count AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ has $count$ likes till now"
			]}]
		},
		{
			"keys"   :["tweet activity", "likes", "maximum"],
			"example": "What is the maximum number of likes that melaniatrump got",
			"score"  :2000,
			"phrases":[ {"type":"pattern", "expression":"* maximum * likes that * got"}
			],
			"process":[ {"type": "console", "expression": "SELECT max_likes FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here you go"
			]}, {"type": "table"}]
		},
		{
			"keys"   :["tweet activity", "likes", "average"],
			"example": "What is the average number of likes that melaniatrump gets",
			"score"  :2000,
			"phrases":[ {"type":"pattern", "expression":"* average * likes that * gets"}
			],
			"process":[ {"type": "console", "expression": "SELECT average_number_of_likes AS count FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$3$ gets $count$ likes on an average"
			]}]
		},
		{
			"keys"   :["tweet activity", "likes", "frequency"],
			"score"  :2000,
			"example": "How many times did melaniatrump get 0 likes",
			"phrases":[ {"type":"pattern", "expression":"* * have * likes"},
				{"type":"pattern", "expression":"* * get * likes"}
			],
			"process":[ {"type": "console", "expression": "SELECT likes_chart[$3$] AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ got $3$ likes, $count$ times"
			]}]
		},
		{
			"keys"   :["tweet activity", "likes", "frequency", "chart"],
			"score"  :2000,
			"example": "Show me the likes frequency chart of melaniatrump",
			"phrases":[ {"type":"pattern", "expression":"* likes frequency chart * *"}
			],
			"process":[ {"type": "console", "expression": "SELECT likes_chart FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here is the likes frequency chart"
			]}, {"type": "table"}]
		},
		{
			"keys"   :["tweet activity", "retweets", "count"],
			"score"  :2000,
			"example": "How many retweets does melaniatrump have in all",
			"phrases":[ {"type":"pattern", "expression":"* retweets does * have *"}
			],
			"process":[ {"type": "console", "expression": "SELECT retweets_count AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ has $count$ retweets till now"
			]}]
		},
		{
			"keys"   :["tweet activity", "retweets", "maximum"],
			"score"  :2000,
			"example": "What is the maximum number of retweets that melaniatrump got",
			"phrases":[ {"type":"pattern", "expression":"* maximum * retweets that * got"}
			],
			"process":[ {"type": "console", "expression": "SELECT max_retweets FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here you go"
			]}, {"type": "table"}]
		},
		{
			"keys"   :["tweet activity", "retweets", "average"],
			"score"  :2000,
			"example": "What is the average number of retweets that melaniatrump gets",
			"phrases":[ {"type":"pattern", "expression":"* average * retweets that * gets"}
			],
			"process":[ {"type": "console", "expression": "SELECT average_number_of_retweets AS count FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$3$ gets $count$ retweets on an average"
			]}]
		},
		{
			"keys"   :["tweet activity", "retweets", "frequency"],
			"score"  :2000,
			"example": "How many times did melaniatrump get 0 retweets",
			"phrases":[ {"type":"pattern", "expression":"* * have * retweets"},
				{"type":"pattern", "expression":"* * get * retweets"}
			],
			"process":[ {"type": "console", "expression": "SELECT retweets_chart[$3$] AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ got $3$ retweets, $count$ times"
			]}]
		},
		{
			"keys"   :["tweet activity", "retweets", "frequency", "chart"],
			"score"  :2000,
			"example": "Show me the retweet frequency chart of melaniatrump",
			"phrases":[ {"type":"pattern", "expression":"* retweet frequency chart * *"}
			],
			"process":[ {"type": "console", "expression": "SELECT retweets_chart FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here is the retweets frequency chart"
			]}, {"type": "table"}]
		},
		{
			"keys"   :["tweet activity", "hashtags", "count"],
			"score"  :2000,
			"example": "How many hashtags has melaniatrump used in all",
			"phrases":[ {"type":"pattern", "expression":"* hashtags has * used *"}
			],
			"process":[ {"type": "console", "expression": "SELECT hashtags_used_count AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ has used $count$ hashtags till now"
			]}]
		},
		{
			"keys"   :["tweet activity", "hashtags", "maximum"],
			"score"  :2000,
			"example": "What is the maximum number of hastags that melaniatrump used",
			"phrases":[ {"type":"pattern", "expression":"* maximum * hashtags that * used"}
			],
			"process":[ {"type": "console", "expression": "SELECT max_hashtags FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here you go"
			]}, {"type": "table"}]
		},
		{
			"keys"   :["tweet activity", "hashtags", "average"],
			"score"  :2000,
			"example": "What is the average number of hashtags that melaniatrump uses",
			"phrases":[ {"type":"pattern", "expression":"* average * hashtags that * uses"}
			],
			"process":[ {"type": "console", "expression": "SELECT average_number_of_hashtags_used AS count FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$3$ uses $count$ hashtags on an average"
			]}]
		},
		{
			"keys"   :["tweet activity", "hashtags", "frequency"],
			"score"  :2000,
			"example": "How many times did melaniatrump use 20 hashtags",
			"phrases":[ {"type":"pattern", "expression":"* * use * hashtags"}
			],
			"process":[ {"type": "console", "expression": "SELECT hashtags_chart[$3$] AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ used $3$ hashtags, $count$ times"
			]}]
		},
		{
			"keys"   :["tweet activity", "hashtags", "frequency", "chart"],
			"score"  :2000,
			"example": "Show me the hashtag frequency chart of melaniatrump",
			"phrases":[ {"type":"pattern", "expression":"* hashtag frequency chart * *"}
			],
			"process":[ {"type": "console", "expression": "SELECT hashtags_chart FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here is the hashtags frequency chart"
			]}, {"type": "table"}]
		},
		{
			"keys"   :["tweet content", "language", "frequency"],
			"score"  :2000,
			"example": "How many tweets did melaniatrump write in English?",
			"phrases":[ {"type":"pattern", "expression":"* * write in *"},
				{"type":"pattern", "expression":"* * post in *"},
				{"type":"pattern", "expression":"* of * were written in *"}
			],
			"process":[ {"type": "console", "expression": "SELECT languages[$3$] AS count FROM twitanalysis WHERE screen_name='$2$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"$2$ posted $count$ tweets in $3$"
			]}]
		},
		{
			"keys"   :["tweet content", "language", "analysis", "chart"],
			"score"  :2000,
			"example": "Show me the language analysis chart of melaniatrump",
			"phrases":[ {"type":"pattern", "expression":"* language analysis chart * *"}
			],
			"process":[ {"type": "console", "expression": "SELECT languages FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here is the language analysis chart"
			]}, {"type": "table"}]
		},
		{
			"keys"   :["tweet content", "sentiment", "analysis", "chart"],
			"score"  :2000,
			"example": "Show me the sentiment analysis chart of melaniatrump",
			"phrases":[ {"type":"pattern", "expression":"* sentiment analysis chart * *"}
			],
			"process":[ {"type": "console", "expression": "SELECT sentiments FROM twitanalysis WHERE screen_name='$3$' AND count='1000';"}],
			"actions":[ {"type": "answer", "select": "random", "phrases":[
				"Here is the sentiment analysis chart"
			]}, {"type": "table"}]
		}

(PS: no points for guessing why melaniatrump is there in the examples πŸ˜‰ )

As has been explained before, I simply write up an expression consisting of parameters and some core words which are hardcoded, and I then fetch up the parameters using $x$ (x = parameter number). These queries can actually give a whole lot of statistics regarding a user’s activity and activity on his profile, so it is definitely a whole lot useful for a chatbot.

Now, to end this, we need a way to process these queries. Enter ConsoleService. Notice that all the process['expression'] SQL queries are of the type:

SELECT <something> FROM twitanalysis where screen_name = '<parameter_of_username>' AND count = '1000';

I have taken count as 1000 because as mentioned in my last blog post, the scraper scrapes and displays a maximum of 1000 results at a time, so I wish to maximise the range.

Converting the above SQL generalised query to regex, we get this form:

SELECT\\h+?(.*?)\\h+?FROM\\h+?twitanalysis\\h+?WHERE\\h+?screen_name\\h??=\\h??'(.*?)'\\h+?AND\\h+?count\\h??=\\h??'(.*?)'\\h??;

The \h is for a whitespace that may occur, and the random queries are just expressed using (.*) (random character selection in regex). Since we have specific parameters (as described above) that we use in our SQL queries, we encapsulate these random character selections into groups.

Now, we need to compile this regex, and point to what needs to be done. This is done in ConsoleService.


dbAccess.put(Pattern.compile("SELECT\\h+?(.*?)\\h+?FROM\\h+?twitanalysis\\h+?WHERE\\h+?screen_name\\h??=\\h??'(.*?)'\\h+?AND\\h+?count\\h??=\\h??'(.*?)'\\h??;"), (flow, matcher) -> {
            SusiThought json = TwitterAnalysisService.showAnalysis(matcher.group(2), matcher.group(3));
            SusiTransfer transfer = new SusiTransfer(matcher.group(1));
            json.setData(transfer.conclude(json.getData()));
            return json;
        });

We basically compile the regex, and feed it to a bifunction (lambda lingo for a function that takes in two params). We take in the groups using matcher.group and since you saw above, the SusiThought object in TwitterAnalysis takes in screen_name and count, so we take them in using the matcher and feed them to the static function showAnalysisinside the TwitterAnalysis servlet. We then get back the JSON. This completes the procedure. TwitterAnalysis is now integrated with the Susi API. πŸ™‚

In my next blog posts, I’ll talk about Bot integrations for Susi, and a Slack Bot for Susi I made, and then I’ll move to Susi monetisation using Amazon API. Feedback is welcome πŸ™‚

Social Media Analysis using Loklak (Part 3)