Social Media Analysis using Loklak (Part 1)

In the past seven posts, I have covered how loklak_depot started off, the early stages, and how we got loklak_depot onto the World Wide Web so that people could get to know more about it. Now I shall go a bit deeper into what we at loklak are trying to do with all the information we are collecting.

Twitter gives us a lot of information about how people live their lives, covering a variety of topics: sports, business, finance, food etc. Because it is human-generated information, it can be treated as a (not completely, since there are a lot of users) reliable source for a variety of data analyses. We could do a host of things with the data we get from tweets, and a lot of things have in fact been done already, like stock market prediction and sentiment analysis.

One of these uses is analysing social media profiles: given a Twitter (or any other online social media) profile, the program gives you a complete analysis of the posts that you make and the people you connect with. Another use is Susi, our NLP-based search engine, about which a couple of blog posts have already been written.

So while I was spotting issues with the Search API and checking out the others, I had the idea of making a small application: supply it a Twitter username and it returns a profile analysis for that username. The app has been built and works well. I have only built the servlet because I plan on integrating it into Susi, and using it with Susi should actually produce a lot of interesting answers.

The app / servlet implements the following features:

1. User’s Activity Frequency: yearly, hourly and daily statistics
2. User’s Activity Type: how many times the user posted a photo / video / status etc.
3. Activity on User’s Activity: the number of likes / retweets the user’s tweets received, hashtag analysis etc.
4. Analysis of User’s Content: language and sentiment analysis
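
Once deployed, the servlet is queried with a plain HTTP request like this (a hypothetical example, assuming a loklak instance running locally on its default port 9000 and a placeholder username):

http://localhost:9000/api/twitanalysis.json?screen_name=exampleuser&count=100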

So here is the entire code for the program. After this, I will explain how I implemented the parts step by step:


import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Collections;
import java.util.Date;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.json.JSONArray;
import org.json.JSONObject;

// the remaining types (AbstractAPIHandler, APIHandler, APIException, Authorization,
// BaseUserRole, Query, JSONObjectWithDefault, SusiThought, ClientConnection, UTF8)
// come from the loklak_server source tree

public class TwitterAnalysisService extends AbstractAPIHandler implements APIHandler {

	private static final long serialVersionUID = -3753965521858525803L;

	private static HttpServletRequest request;

	@Override
	public String getAPIPath() {
		return "/api/twitanalysis.json";
	}

	@Override
	public BaseUserRole getMinimalBaseUserRole() {
		return BaseUserRole.ANONYMOUS;
	}

	@Override
	public JSONObject getDefaultPermissions(BaseUserRole baseUserRole) {
		return null;
	}

	@Override
	public JSONObject serviceImpl(Query call, HttpServletResponse response, Authorization rights,
			JSONObjectWithDefault permissions) throws APIException {
		String username = call.get("screen_name", "");
		String count = call.get("count", "");
		TwitterAnalysisService.request = call.getRequest();
		return showAnalysis(username, count);
	}

	public static SusiThought showAnalysis(String username, String count) {

		SusiThought json = new SusiThought();
		JSONArray finalresultarray = new JSONArray();
		JSONObject finalresult = new JSONObject(true);
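		// reconstruct this loklak instance's base URL from the incoming request,
		// so that the servlet can call its own Search API below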
		String siteurl = request.getRequestURL().toString();
		String baseurl = siteurl.substring(0, siteurl.length() - request.getRequestURI().length())
				+ request.getContextPath();

		String searchurl = baseurl + "/api/search.json?q=from%3A" + username + (!count.isEmpty() ? ("&count=" + count) : "");
		byte[] searchbyte;
		try {
			searchbyte = ClientConnection.download(searchurl);
		} catch (IOException e) {
			return json.setData(new JSONArray().put(new JSONObject().put("Error", "Can't contact server")));
		}
		String searchstr = UTF8.String(searchbyte);
		JSONObject searchresult = new JSONObject(searchstr);

		JSONArray tweets = searchresult.getJSONArray("statuses");
		if (tweets.length() == 0) {
			finalresult.put("error", "Invalid username " + username + " or no tweets");
			finalresultarray.put(finalresult);
			json.setData(finalresultarray);
			return json;
		}
		finalresult.put("username", username);
		finalresult.put("items_per_page", searchresult.getJSONObject("search_metadata").getString("itemsPerPage"));
		finalresult.put("tweets_analysed", searchresult.getJSONObject("search_metadata").getString("count"));

		// main loop
		JSONObject activityFreq = new JSONObject(true);
		JSONObject activityType = new JSONObject(true);
		int imgCount = 0, audioCount = 0, videoCount = 0, linksCount = 0, likesCount = 0, retweetCount = 0,
				hashtagCount = 0;
		int maxLikes = 0, maxRetweets = 0, maxHashtags = 0;
		String maxLikeslink, maxRetweetslink, maxHashtagslink;
		maxLikeslink = maxRetweetslink = maxHashtagslink = tweets.getJSONObject(0).getString("link");
		List<String> tweetDate = new ArrayList<>();
		List<String> tweetHour = new ArrayList<>();
		List<String> tweetDay = new ArrayList<>();
		List<Integer> likesList = new ArrayList<>();
		List<Integer> retweetsList = new ArrayList<>();
		List<Integer> hashtagsList = new ArrayList<>();
		List<String> languageList = new ArrayList<>();
		List<String> sentimentList = new ArrayList<>();
		Calendar calendar = Calendar.getInstance();

		for (int i = 0; i < tweets.length(); i++) {
			JSONObject status = tweets.getJSONObject(i);
			String[] datearr = status.getString("created_at").split("T")[0].split("-");
			calendar.set(Integer.parseInt(datearr[0]), Integer.parseInt(datearr[1]) - 1, Integer.parseInt(datearr[2]));
			Date date = new Date(calendar.getTimeInMillis());
			tweetDate.add(new SimpleDateFormat("MMMM yyyy", Locale.ENGLISH).format(date)); // month and year
			tweetDay.add(new SimpleDateFormat("EEEE", Locale.ENGLISH).format(date)); // day
			String times = status.getString("created_at").split("T")[1];
			String hour = times.substring(0, times.length() - 5).split(":")[0];
			tweetHour.add(hour); // hour
			imgCount += status.getInt("images_count");
			audioCount += status.getInt("audio_count");
			videoCount += status.getInt("videos_count");
			linksCount += status.getInt("links_count");
			likesList.add(status.getInt("favourites_count"));
			retweetsList.add(status.getInt("retweet_count"));
			hashtagsList.add(status.getInt("hashtags_count"));
			if (status.has("classifier_emotion")) {
				sentimentList.add(status.getString("classifier_emotion"));
			} else {
				sentimentList.add("neutral");
			}
			if (status.has("classifier_language")) {
				languageList.add(status.getString("classifier_language"));
			} else {
				languageList.add("no_text");
			}
			if (maxLikes < status.getInt("favourites_count")) {
				maxLikes = status.getInt("favourites_count");
				maxLikeslink = status.getString("link");
			}
			if (maxRetweets < status.getInt("retweet_count")) {
				maxRetweets = status.getInt("retweet_count");
				maxRetweetslink = status.getString("link");
			}
			if (maxHashtags < status.getInt("hashtags_count")) {
				maxHashtags = status.getInt("hashtags_count");
				maxHashtagslink = status.getString("link");
			}
			likesCount += status.getInt("favourites_count");
			retweetCount += status.getInt("retweet_count");
			hashtagCount += status.getInt("hashtags_count");
		}
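		// activity type totals: media posts summed in the loop; plain status posts are the remainder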
		activityType.put("posted_image", imgCount);
		activityType.put("posted_audio", audioCount);
		activityType.put("posted_video", videoCount);
		activityType.put("posted_link", linksCount);
		activityType.put("posted_story",
				Integer.parseInt(searchresult.getJSONObject("search_metadata").getString("count"))
						- (imgCount + audioCount + videoCount + linksCount));

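		// frequency charts: count how often each distinct month, hour and weekday occurs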
		JSONObject yearlyact = new JSONObject(true);
		JSONObject hourlyact = new JSONObject(true);
		JSONObject dailyact = new JSONObject(true);
		Set<String> yearset = new HashSet<>(tweetDate);
		Set<String> hourset = new HashSet<>(tweetHour);
		Set<String> dayset = new HashSet<>(tweetDay);

		for (String s : yearset) {
			yearlyact.put(s, Collections.frequency(tweetDate, s));
		}

		for (String s : hourset) {
			hourlyact.put(s, Collections.frequency(tweetHour, s));
		}

		for (String s : dayset) {
			dailyact.put(s, Collections.frequency(tweetDay, s));
		}

		activityFreq.put("yearwise", yearlyact);
		activityFreq.put("hourwise", hourlyact);
		activityFreq.put("daywise", dailyact);
		finalresult.put("tweet_frequency", activityFreq);
		finalresult.put("tweet_type", activityType);

		// activity on my tweets

		JSONObject activityOnTweets = new JSONObject(true);
		JSONObject activityCharts = new JSONObject(true);
		JSONObject likesChart = new JSONObject(true);
		JSONObject retweetChart = new JSONObject(true);
		JSONObject hashtagsChart = new JSONObject(true);

		Set<Integer> likesSet = new HashSet<>(likesList);
		Set<Integer> retweetSet = new HashSet<>(retweetsList);
		Set<Integer> hashtagSet = new HashSet<>(hashtagsList);

		for (Integer i : likesSet) {
			likesChart.put(i.toString(), Collections.frequency(likesList, i));
		}

		for (Integer i : retweetSet) {
			retweetChart.put(i.toString(), Collections.frequency(retweetsList, i));
		}

		for (Integer i : hashtagSet) {
			hashtagsChart.put(i.toString(), Collections.frequency(hashtagsList, i));
		}

		activityOnTweets.put("likes_count", likesCount);
		activityOnTweets.put("max_likes",
				new JSONObject(true).put("number", maxLikes).put("link_to_tweet", maxLikeslink));
		activityOnTweets.put("average_number_of_likes",
				(likesCount / (Integer.parseInt(searchresult.getJSONObject("search_metadata").getString("count")))));

		activityOnTweets.put("retweets_count", retweetCount);
		activityOnTweets.put("max_retweets",
				new JSONObject(true).put("number", maxRetweets).put("link_to_tweet", maxRetweetslink));
		activityOnTweets.put("average_number_of_retweets",
				(retweetCount / (Integer.parseInt(searchresult.getJSONObject("search_metadata").getString("count")))));

		activityOnTweets.put("hashtags_used_count", hashtagCount);
		activityOnTweets.put("max_hashtags",
				new JSONObject(true).put("number", maxHashtags).put("link_to_tweet", maxHashtagslink));
		activityOnTweets.put("average_number_of_hashtags_used",
				(hashtagCount / (Integer.parseInt(searchresult.getJSONObject("search_metadata").getString("count")))));

		activityCharts.put("likes", likesChart);
		activityCharts.put("retweets", retweetChart);
		activityCharts.put("hashtags", hashtagsChart);
		activityOnTweets.put("frequency_charts", activityCharts);
		finalresult.put("activity_on_my_tweets", activityOnTweets);

		// content analysis
		JSONObject contentAnalysis = new JSONObject(true);
		JSONObject languageAnalysis = new JSONObject(true);
		JSONObject sentimentAnalysis = new JSONObject(true);
		Set<String> languageSet = new HashSet<>(languageList), sentimentSet = new HashSet<>(sentimentList);

		for (String s : languageSet) {
			languageAnalysis.put(s, Collections.frequency(languageList, s));
		}

		for (String s : sentimentSet) {
			sentimentAnalysis.put(s, Collections.frequency(sentimentList, s));
		}
		contentAnalysis.put("language_analysis", languageAnalysis);
		contentAnalysis.put("sentiment_analysis", sentimentAnalysis);
		finalresult.put("content_analysis", contentAnalysis);
		finalresultarray.put(finalresult);
		json.setData(finalresultarray);
		return json;
	}
}

The code first fetches the Search API results for the given username (downloading them into a byte array and then converting that into a JSONObject). The method returns a SusiThought object so that it can be integrated into Susi. Once that is done, we are all set to analyse the data.
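
For orientation, the finalresult object that ends up in the SusiThought data array has roughly this shape (the keys come straight from the code above; the values are placeholders, not real output):

{
	"username": "<screen_name>",
	"items_per_page": "...",
	"tweets_analysed": "...",
	"tweet_frequency": {"yearwise": {...}, "hourwise": {...}, "daywise": {...}},
	"tweet_type": {"posted_image": ..., "posted_audio": ..., "posted_video": ..., "posted_link": ..., "posted_story": ...},
	"activity_on_my_tweets": {"likes_count": ..., "max_likes": {...}, "average_number_of_likes": ..., "retweets_count": ..., "max_retweets": ..., "average_number_of_retweets": ..., "hashtags_used_count": ..., "max_hashtags": ..., "average_number_of_hashtags_used": ..., "frequency_charts": {...}},
	"content_analysis": {"language_analysis": {...}, "sentiment_analysis": {...}}
}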

Let us go through the code feature by feature, and I’ll explain which parts of the code implement each one:

Activity Frequency

For this, I initialised three ArrayLists: tweetDate, tweetHour and tweetDay, which hold the month, hour and weekday extracted from each tweet’s timestamp. The extraction is done with a Calendar instance: the date part of the created_at timestamp is parsed and set on the calendar, and SimpleDateFormat then formats it into the month ("MMMM yyyy") and the weekday ("EEEE"), while the hour is taken straight from the time part of the timestamp. Once the values are stored in the lists, I use Collections.frequency from java.util to calculate how often each value occurs, and to enumerate each distinct value only once I put the list into a HashSet. So on running, the Activity Frequency of my username looks like this:

[Screenshot: Activity Frequency output]
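
The counting pattern itself is tiny; here is a standalone sketch of it (not the servlet code, just an illustration with made-up weekday values):

import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FrequencySketch {
	public static void main(String[] args) {
		// hypothetical weekday values, as extracted from tweet timestamps
		List<String> tweetDay = Arrays.asList("Monday", "Tuesday", "Monday", "Friday", "Monday");
		// the HashSet enumerates each distinct value once;
		// Collections.frequency counts its occurrences in the full list
		Map<String, Integer> dailyact = new LinkedHashMap<>();
		for (String day : new HashSet<>(tweetDay)) {
			dailyact.put(day, Collections.frequency(tweetDay, day));
		}
		System.out.println(dailyact); // e.g. {Monday=3, Tuesday=1, Friday=1} (order may vary)
	}
}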

Activity Type

For this, I extracted the activity types from the Search API and summed up every category (imgCount += status.getInt("images_count"); etc.), then added the totals directly into the JSONObject; the number of plain statuses ("posted_story") is the total tweet count minus the image, audio, video and link posts. Easy.

[Screenshot: Activity Type output]

Activity on User’s Activity

For this, we analyse the stats on likes, retweets and hashtags. I again create three ArrayLists that store the per-tweet counts, and in the main loop I work out the maximum number of likes, retweets and hashtags along with a link to the tweet holding each maximum, and sum up the totals. Then I again use Collections.frequency to build a frequency chart, calculate the averages, and add them all into a JSONObject. Note that the averages use integer division, so they are truncated to whole numbers; the sketch after the screenshot shows the difference.

[Screenshot: Activity on User’s Activity output]
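
A minimal sketch of that truncation (the counts here are made up):

public class AverageSketch {
	public static void main(String[] args) {
		int likesCount = 7;  // hypothetical total likes across all analysed tweets
		int tweetCount = 2;  // hypothetical number of tweets analysed
		System.out.println(likesCount / tweetCount);          // 3   (integer division, as in the servlet)
		System.out.println((double) likesCount / tweetCount); // 3.5 (fractional average)
	}
}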

Content Analysis

This was again short: we extract the emotion and language classifications from the Search API (falling back to "neutral" and "no_text" when a tweet carries no classification), count the occurrences, and put them in the JSONObject like we have done with everything else.

[Screenshot: Content Analysis output]

This completes the TwitterAnalysis servlet: it gives you a full analysis of a Twitter profile, all powered by loklak.

This was relatively simple to do. In my next blog post, I will discuss a bit about how loklak scrapes this data and makes it available, and how I integrate this application into Susi, so that we can get interesting profile data from a chatbot effortlessly. As always, feedback is welcome 🙂
