Automated Deployments and Developmental pipeline for loklak server

The Loklak Server project has been growing: the number of users consuming the APIs and the number of contributors have increased, and the project has been extending its reach into other territories such as IoT and artificial-intelligence chat. As the project grew, it became important to keep the server easily deployable, which was previously done by integrating one-click deployment buttons into Loklak Server so that anybody can spin up their own server.

As we grew we made quite a few mistakes in development: overriding each other’s work, conflicting patches, and a system that kept breaking as we migrated from Java 7 to Java 8. With more contributions and many members working together, we needed a stronger engineering workflow to ensure that development remains unhampered and that pulling and reviewing changes takes less time.


We’ve strongly adopted a build-and-release-per-commit model: instead of periodically pulling upstream changes and deploying them onto the server, we now leverage the continuous integration tools that already run our builds to also perform the deployments onto the staging/dev servers. This was done using Heroku and Travis, where every successful Travis build triggers Heroku to deploy and run the server on the staging instance. This has dramatically reduced the errors that we encountered before and also serves as the testing ground for new features before they move to the production server at loklak.org.

Implementation


deploy:
  provider: heroku
  email: [email protected]
  strategy: git
  buildpack: https://github.com/loklak/heroku_buildpack_ant_loklak
  api_key:
    secure: D2o+G28w42F9rDbde......PL/Q=
  app: loklak-server-dev
  on:
    branch: development


Susi Rule-Score Hierarchy Brainstorming

For Susi’s score system, we need a hierarchy for assigning good score values to the rules, i.e. an easy system that can be used to assign scores to new rules.

Please add your suggestions below; as you add your ideas, we will update the score hierarchy suggestion below.

Preliminary Consideration: Patterns

We have two kinds of rules: those with patterns and those without. The meanings of these rules are:

with pattern(s):

  • (P+LR) variables in pattern should be used for retrieval in internal Susi’s log (reflection memory)
  • (P+IR) variables in pattern should be used for retrieval in internal databases
  • (P+ER) variables in pattern should be used for retrieval in external databases
  • (P+LS) variables in pattern should be stored in Susi’s memory to be used for reflection later
  • (P+IS) variables in pattern should be stored in internal databases to be used for retrieval later
  • (P+ES) variables in pattern should be stored in external databases to be used for retrieval later

without any pattern:

  • (P-D) default answers if no other rule applies
  • (P-O) overruling of rules which would apply, but should not

Secondary Consideration: Purpose

We have three kinds of purposes for Susi answers:

  • (/A) to answer the user’s question
  • (/Q) to ask the user a question in the context of an objective within Susi’s planning of a conversation
  • (/R) to answer on an answer of the user within Susi’s planning of a conversation. It appears clear that answers in the context of a Susi conversation strategy should have higher priority.

Combinations of Pattern and Purpose Considerations:

To combine the various Pattern and Purpose types, we write the abbreviations of these items together. For example, we may want to answer the user’s question “Are you happy” with “Yes! Are you happy as well?”, which would be a rule of type P-O/Q. The combination of both consideration types gives 8×3=24 possibilities.

Score Hierarchy

I believe there should be:
– score(R) > score(Q) > score(A):
to allow steering of conversations within a conversation plan.
– score(P-O) > score(P+?) > score(P-D):
overruling takes precedence over pattern-directed answers, and default answers apply if no pattern matches.
– score(P+?S) > score(P+?R):
storing information (= learning) is more important than answering.
– score(P+L?) > score(P+I?) > score(P+E?):
using local information takes priority over external information; reflection is most important.

This produces the following order (with decreasing score; the first line has the highest score):

– Overruling of patterns:
– R/P-O
– Q/P-O
– A/P-O

– Answer on an Answer of the user using patterns, possibly learning, otherwise retrieving data
– R/P+LS
– R/P+IS
– R/P+ES
– R/P+LR
– R/P+IR
– R/P+ER

– Asking the user a question with the purpose of learning from the user’s answer
– Q/P+LS
– Q/P+IS
– Q/P+ES
– Q/P+LR
– Q/P+IR
– Q/P+ER

– Just giving an answer to the question of the user
– A/P+LS
– A/P+IS
– A/P+ES
– A/P+LR
– A/P+IR
– A/P+ER

– Fail-over if no other rule applies: just answer anything, but try to start a new conversation
– R/P-D
– Q/P-D
– A/P-D


Susi chat interface with visualizations

Susi has got a few capabilities to visualize her responses. She can respond by sharing links, showing analytics as pie charts, and giving you a list of bulleted data. This post shows how these components are integrated into Susi.

The rules which are defined can return data in various compatible forms: links, analytics in the form of percentages, and lists of data. For example, in the previous blog post on adding Susi rules, we added a sample rule showing how to add types of responses to Susi. If you want more context on it, you can click here.

  • Susi taking responses from data: This type of response is in the form of a table. Susi can take the extra data under data.answers[0].data, where the type is table. Below is a sample JSON format from which the tabular data can be parsed (a small consumption sketch follows the screenshots below).

[Screenshot: sample JSON response containing the tabular data]

In the above JSON, the data under the answers object is traced for the tabulated answers. The expression below will get you the titles of the Reddit articles.

[Screenshot: the expression used to extract the Reddit article titles]

The above response is for the following query

What are the reddit articles about loklak

This is Susi’s response on asksusi.

[Screenshot: Susi’s tabular response on asksusi]
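Since the original snippets are only available as screenshots, here is a rough, illustrative sketch of how such a table-type answer could be consumed on the client side. The row field names (title, link) and the jQuery call are assumptions for illustration, not the exact asksusi code:

// Illustrative sketch only: fetch a Susi answer and read the table rows
// The row field names ("title", "link") are assumed for this example.
$.getJSON('http://loklak.org/api/susi.json', { q: 'What are the reddit articles about loklak' }, function (data) {
    var answer = data.answers[0];
    var rows = answer.data;               // the tabular data described above
    rows.forEach(function (row) {
        console.log(row.title, row.link); // e.g. print each reddit article title and link
    });
});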

  • Susi answering using pie charts: Susi rules can also be defined in such a way that the response gives out a well-formed pie chart. The data required for the pie chart is defined in the rule and can easily be interpreted using Highcharts to give a clear pie chart response. Here is a sample JSON response for the following query.
Who will win the 2016 presidential election

[Screenshot: sample JSON response for the pie chart query]

The above JSON defines the data for the pie chart, giving a percentage and a relevant name for each object. This makes it easy to interpret the JSON and define the pie chart using highchart.js. Below is the sample code which was used to define the pie chart.

[Screenshot: the code that defines the pie chart]
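Since that code is only available as a screenshot, the following is a minimal sketch of how such a pie chart could be built from the answer data with Highcharts. The container id, field names and answer layout are assumptions for illustration only:

// Illustrative sketch only: turn a Susi answer into a Highcharts pie chart
function drawPieChart(answer) {
    var slices = answer.data.map(function (entry) {
        // the field names "name" and "percent" are assumed for this example
        return { name: entry.name, y: parseFloat(entry.percent) };
    });
    new Highcharts.Chart({
        chart: { renderTo: 'chartContainer', type: 'pie' },
        title: { text: 'Who will win the 2016 presidential election' },
        series: [{ name: 'Share', data: slices }]
    });
}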

This is how the interface answers with piecharts.

[Screenshots: pie chart responses in the chat interface]

  • Susi interpreting links: Susi can also interpret links from the response and linkify them accordingly.

Here is the sample code on how Susi interprets the links from the response.

[Screenshot: the code that linkifies URLs in the response]
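As the original snippet is only a screenshot, a helper doing this kind of linkification might look roughly like the sketch below (not the original code):

// Illustrative sketch only: wrap plain URLs in the answer text with anchor tags
function linkify(text) {
    var urlPattern = /(https?:\/\/[^\s]+)/g;
    return text.replace(urlPattern, '<a href="$1" target="_blank">$1</a>');
}
// Example: linkify('Read more at http://loklak.org')
// -> 'Read more at <a href="http://loklak.org" target="_blank">http://loklak.org</a>'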

The links are linkified, and this is how Susi responds.

[Screenshot: Susi’s response with clickable links]

Stay tuned for more updates on Susi.

 

 


Social Media Analysis using Loklak (Part 2)

In the last post, I spoke about the TwitterAnalysis servlet I developed and how we can analyse the entire Twitter profile of a user and get useful data from that servlet. This was pretty simple to do, because all I really did was parse the Search API and some other simple commands, which resulted in a concise and detailed profile analysis.

But there’s something that I’ve not spoken about yet, which I will in this blog post: how is the social media data collected in that form? Where does it come from, and how?

Loklak, as is known, is a social media search server which scrapes social media sites and profiles. Scraping basically means checking out the HTML source code of the website and extracting the information from the relevant tags. The p2p nature of loklak enables a lot of peers to scrape synchronously, feed tweets to a backend, and store them in their own backend as well.

Scraping is a very well-known practice, and in Java we already have tools like Jsoup, which are easy-to-use scraping tools. You just need to connect to the website, mention the tags between which the information is present, and voila. Here is an example from the EventBrite scraper we have made:


public static SusiThought crawlEventBrite(String url) {
		Document htmlPage = null;

		try {
			htmlPage = Jsoup.connect(url).get();
		} catch (Exception e) {
			e.printStackTrace();
		}

		String eventID = null;
		String eventName = null;
		String eventDescription = null;

		// TODO Fetch Event Color
		String eventColor = null;

		String imageLink = null;

		String eventLocation = null;

		String startingTime = null;
		String endingTime = null;

		String ticketURL = null;

		Elements tagSection = null;
		Elements tagSpan = null;
		String[][] tags = new String[5][2];
		String topic = null; // By default

		String closingDateTime = null;
		String schedulePublishedOn = null;
		JSONObject creator = new JSONObject();
		String email = null;

		Float latitude = null;
		Float longitude = null;

		String privacy = "public"; // By Default
		String state = "completed"; // By Default
		String eventType = "";

		String temp;
		Elements t;

		eventID = htmlPage.getElementsByTag("body").attr("data-event-id");
		eventName = htmlPage.getElementsByClass("listing-hero-body").text();
		eventDescription = htmlPage.select("div.js-xd-read-more-toggle-view.read-more__toggle-view").text();

		eventColor = null;

		imageLink = htmlPage.getElementsByTag("picture").attr("content");

		eventLocation = htmlPage.select("p.listing-map-card-street-address.text-default").text();

		temp = htmlPage.getElementsByAttributeValue("property", "event:start_time").attr("content");
		if(temp.length() >= 20){
			startingTime = htmlPage.getElementsByAttributeValue("property", "event:start_time").attr("content").substring(0,19);
		}else{
			startingTime = htmlPage.getElementsByAttributeValue("property", "event:start_time").attr("content");
		}

		temp = htmlPage.getElementsByAttributeValue("property", "event:end_time").attr("content");
		if(temp.length() >= 20){
			endingTime = htmlPage.getElementsByAttributeValue("property", "event:end_time").attr("content").substring(0,19);
		}else{
			endingTime = htmlPage.getElementsByAttributeValue("property", "event:end_time").attr("content");
		}

		ticketURL = url + "#tickets";

		// TODO Tags to be modified to fit in the format of Open Event "topic"
		tagSection = htmlPage.getElementsByAttributeValue("data-automation", "ListingsBreadcrumbs");
		tagSpan = tagSection.select("span");
		topic = "";

		int iterator = 0, k = 0;
		for (Element e : tagSpan) {
			if (iterator % 2 == 0) {
				tags[k][1] = "www.eventbrite.com"
						+ e.select("a.js-d-track-link.badge.badge--tag.l-mar-top-2").attr("href");
			} else {
				tags[k][0] = e.text();
				k++;
			}
			iterator++;
		}

		creator.put("email", "");
		creator.put("id", "1"); // By Default

		temp = htmlPage.getElementsByAttributeValue("property", "event:location:latitude").attr("content");
		if(temp.length() > 0){
			latitude = Float
				.valueOf(htmlPage.getElementsByAttributeValue("property", "event:location:latitude").attr("content"));
		}

		temp = htmlPage.getElementsByAttributeValue("property", "event:location:longitude").attr("content");
		if(temp.length() > 0){
			longitude = Float
				.valueOf(htmlPage.getElementsByAttributeValue("property", "event:location:longitude").attr("content"));
		}

		// TODO This returns: "events.event" which is not supported by Open
		// Event Generator
		// eventType = htmlPage.getElementsByAttributeValue("property",
		// "og:type").attr("content");

		String organizerName = null;
		String organizerLink = null;
		String organizerProfileLink = null;
		String organizerWebsite = null;
		String organizerContactInfo = null;
		String organizerDescription = null;
		String organizerFacebookFeedLink = null;
		String organizerTwitterFeedLink = null;
		String organizerFacebookAccountLink = null;
		String organizerTwitterAccountLink = null;

		temp = htmlPage.select("a.js-d-scroll-to.listing-organizer-name.text-default").text();
		if(temp.length() >= 5){
			organizerName = htmlPage.select("a.js-d-scroll-to.listing-organizer-name.text-default").text().substring(4);
		}else{
			organizerName = "";
		}
		organizerLink = url + "#listing-organizer";
		organizerProfileLink = htmlPage
				.getElementsByAttributeValue("class", "js-follow js-follow-target follow-me fx--fade-in is-hidden")
				.attr("href");
		organizerContactInfo = url + "#lightbox_contact";

		Document orgProfilePage = null;

		try {
			orgProfilePage = Jsoup.connect(organizerProfileLink).get();
		} catch (Exception e) {
			e.printStackTrace();
		}

		if(orgProfilePage != null){

			t = orgProfilePage.getElementsByAttributeValue("class", "l-pad-vert-1 organizer-website");
			if(t != null){
				organizerWebsite = orgProfilePage.getElementsByAttributeValue("class", "l-pad-vert-1 organizer-website").text();
			}else{
				organizerWebsite = "";
			}

			t = orgProfilePage.select("div.js-long-text.organizer-description");
			if(t != null){
				organizerDescription = orgProfilePage.select("div.js-long-text.organizer-description").text();
			}else{
				organizerDescription = "";
			}

			organizerFacebookFeedLink = organizerProfileLink + "#facebook_feed";
			organizerTwitterFeedLink = organizerProfileLink + "#twitter_feed";

			t = orgProfilePage.getElementsByAttributeValue("class", "fb-page");
			if(t != null){
				organizerFacebookAccountLink = orgProfilePage.getElementsByAttributeValue("class", "fb-page").attr("data-href");
			}else{
				organizerFacebookAccountLink = "";
			}

			t = orgProfilePage.getElementsByAttributeValue("class", "twitter-timeline");
			if(t != null){
				organizerTwitterAccountLink = orgProfilePage.getElementsByAttributeValue("class", "twitter-timeline").attr("href");
			}else{
				organizerTwitterAccountLink = "";
			}

		}

		

		JSONArray socialLinks = new JSONArray();

		JSONObject fb = new JSONObject();
		fb.put("id", "1");
		fb.put("name", "Facebook");
		fb.put("link", organizerFacebookAccountLink);
		socialLinks.put(fb);

		JSONObject tw = new JSONObject();
		tw.put("id", "2");
		tw.put("name", "Twitter");
		tw.put("link", organizerTwitterAccountLink);
		socialLinks.put(tw);

		JSONArray jsonArray = new JSONArray();

		JSONObject event = new JSONObject();
		event.put("event_url", url);
		event.put("id", eventID);
		event.put("name", eventName);
		event.put("description", eventDescription);
		event.put("color", eventColor);
		event.put("background_url", imageLink);
		event.put("closing_datetime", closingDateTime);
		event.put("creator", creator);
		event.put("email", email);
		event.put("location_name", eventLocation);
		event.put("latitude", latitude);
		event.put("longitude", longitude);
		event.put("start_time", startingTime);
		event.put("end_time", endingTime);
		event.put("logo", imageLink);
		event.put("organizer_description", organizerDescription);
		event.put("organizer_name", organizerName);
		event.put("privacy", privacy);
		event.put("schedule_published_on", schedulePublishedOn);
		event.put("state", state);
		event.put("type", eventType);
		event.put("ticket_url", ticketURL);
		event.put("social_links", socialLinks);
		event.put("topic", topic);
		jsonArray.put(event);

		JSONObject org = new JSONObject();
		org.put("organizer_name", organizerName);
		org.put("organizer_link", organizerLink);
		org.put("organizer_profile_link", organizerProfileLink);
		org.put("organizer_website", organizerWebsite);
		org.put("organizer_contact_info", organizerContactInfo);
		org.put("organizer_description", organizerDescription);
		org.put("organizer_facebook_feed_link", organizerFacebookFeedLink);
		org.put("organizer_twitter_feed_link", organizerTwitterFeedLink);
		org.put("organizer_facebook_account_link", organizerFacebookAccountLink);
		org.put("organizer_twitter_account_link", organizerTwitterAccountLink);
		jsonArray.put(org);

		JSONArray microlocations = new JSONArray();
		jsonArray.put(new JSONObject().put("microlocations", microlocations));

		JSONArray customForms = new JSONArray();
		jsonArray.put(new JSONObject().put("customForms", customForms));

		JSONArray sessionTypes = new JSONArray();
		jsonArray.put(new JSONObject().put("sessionTypes", sessionTypes));

		JSONArray sessions = new JSONArray();
		jsonArray.put(new JSONObject().put("sessions", sessions));

		JSONArray sponsors = new JSONArray();
		jsonArray.put(new JSONObject().put("sponsors", sponsors));

		JSONArray speakers = new JSONArray();
		jsonArray.put(new JSONObject().put("speakers", speakers));

		JSONArray tracks = new JSONArray();
		jsonArray.put(new JSONObject().put("tracks", tracks));
		SusiThought json = new SusiThought();
		json.setData(jsonArray);
		return json;

	}

As can be seen, we first connect to the URL using Jsoup.connect(url).get() and then use methods like getElementsByAttributeValue and getElementsByTag to extract the information.

This is one way of scraping: by using tools like Jsoup. You could also do it manually: just connect to the website, use classes like BufferedReader or InputStreamReader to read the HTML, and then iterate through it and extract the information. This method was adopted for the TwitterScraper we have.

In the TwitterScraper, we first connect to the URL using ClientConnection() and then use BufferedReader to get the HTML code, as shown here.


private static String prepareSearchURL(final String query) {
        // check
        // https://twitter.com/search-advanced for a better syntax
        // https://support.twitter.com/articles/71577-how-to-use-advanced-twitter-search#
        String https_url = "";
        try {
            StringBuilder t = new StringBuilder(query.length());
            for (String s: query.replace('+', ' ').split(" ")) {
                t.append(' ');
                if (s.startsWith("since:") || s.startsWith("until:")) {
                    int u = s.indexOf('_');
                    t.append(u < 0 ? s : s.substring(0, u));
                } else {
                    t.append(s);
                }
            }
            String q = t.length() == 0 ? "*" : URLEncoder.encode(t.substring(1), "UTF-8");
            //https://twitter.com/search?f=tweets&vertical=default&q=kaffee&src=typd
            https_url = "https://twitter.com/search?f=tweets&vertical=default&q=" + q + "&src=typd";
        } catch (UnsupportedEncodingException e) {}
        return https_url;
    }
    
    private static Timeline[] search(
            final String query,
            final Timeline.Order order,
            final boolean writeToIndex,
            final boolean writeToBackend) {
        // check
        // https://twitter.com/search-advanced for a better syntax
        // https://support.twitter.com/articles/71577-how-to-use-advanced-twitter-search#
        String https_url = prepareSearchURL(query);
        Timeline[] timelines = null;
        try {
            ClientConnection connection = new ClientConnection(https_url);
            if (connection.inputStream == null) return null;
            try {
                BufferedReader br = new BufferedReader(new InputStreamReader(connection.inputStream, StandardCharsets.UTF_8));
                timelines = search(br, order, writeToIndex, writeToBackend);
            } catch (IOException e) {
            	Log.getLog().warn(e);
            } finally {
                connection.close();
            }
        } catch (IOException e) {
            // this could mean that twitter rejected the connection (DoS protection?) or we are offline (we should be silent then)
            // Log.getLog().warn(e);
            if (timelines == null) timelines = new Timeline[]{new Timeline(order), new Timeline(order)};
        };

        // wait until all messages in the timeline are ready
        if (timelines == null) {
            // timeout occurred
            timelines = new Timeline[]{new Timeline(order), new Timeline(order)};
        }
        if (timelines != null) {
            if (timelines[0] != null) timelines[0].setScraperInfo("local");
            if (timelines[1] != null) timelines[1].setScraperInfo("local");
        }
        return timelines;
    }

If you check out the Search servlet at /api/search.json, you will see that it accepts either plain query terms, or you can also use from:username or @username to see messages from that particular user. prepareSearchURL parses this search query and converts it into a term Twitter’s search can understand (because Twitter does not have this feature), and we then use Twitter’s Advanced Search to search. In the Timeline[] search method, we use a BufferedReader to get the HTML of the search result and store it in a Timeline object for further use.

Now this HTML has to be processed. We need to check out the tags and work with them. This is achieved here:


private static Timeline[] search(
            final BufferedReader br,
            final Timeline.Order order,
            final boolean writeToIndex,
            final boolean writeToBackend) throws IOException {
        Timeline timelineReady = new Timeline(order);
        Timeline timelineWorking = new Timeline(order);
        String input;
        Map<String, prop> props = new HashMap<>();
        Set<String> images = new LinkedHashSet<>();
        Set<String> videos = new LinkedHashSet<>();
        String place_id = "", place_name = "";
        boolean parsing_favourite = false, parsing_retweet = false;
        int line = 0; // first line is 1, according to emacs which numbers the first line also as 1
        boolean debuglog = false;
        while ((input = br.readLine()) != null){
            line++;
            input = input.trim();
            if (input.length() == 0) continue;
            
            // debug
            //if (debuglog) System.out.println(line + ": " + input);            
            //if (input.indexOf("ProfileTweet-actionCount") > 0) System.out.println(input);

            // parse
            int p;
            if ((p = input.indexOf("=\"account-group")) > 0) {
                props.put("userid", new prop(input, p, "data-user-id"));
                continue;
            }
            if ((p = input.indexOf("class=\"avatar")) > 0) {
                props.put("useravatarurl", new prop(input, p, "src"));
                continue;
            }
            if ((p = input.indexOf("class=\"fullname")) > 0) {
                props.put("userfullname", new prop(input, p, null));
                continue;
            }
            if ((p = input.indexOf("class=\"username")) > 0) {
                props.put("usernickname", new prop(input, p, null));
                continue;
            }
            if ((p = input.indexOf("class=\"tweet-timestamp")) > 0) {
                props.put("tweetstatusurl", new prop(input, 0, "href"));
                props.put("tweettimename", new prop(input, p, "title"));
                // don't continue here because "class=\"_timestamp" is in the same line 
            }
            if ((p = input.indexOf("class=\"_timestamp")) > 0) {
                props.put("tweettimems", new prop(input, p, "data-time-ms"));
                continue;
            }
            if ((p = input.indexOf("class=\"ProfileTweet-action--retweet")) > 0) {
                parsing_retweet = true;
                continue;
            }
            if ((p = input.indexOf("class=\"ProfileTweet-action--favorite")) > 0) {
                parsing_favourite = true;
                continue;
            }
            if ((p = input.indexOf("class=\"TweetTextSize")) > 0) {
                // read until closing p tag to account for new lines in tweets
                while (input.lastIndexOf("</p>") == -1) {
                    input = input + ' ' + br.readLine();
                }
                prop tweettext = new prop(input, p, null);
                props.put("tweettext", tweettext);
                continue;
            }
            if ((p = input.indexOf("class=\"ProfileTweet-actionCount")) > 0) {
                if (parsing_retweet) {
                    prop tweetretweetcount = new prop(input, p, "data-tweet-stat-count");
                    props.put("tweetretweetcount", tweetretweetcount);
                    parsing_retweet = false;
                }
                if (parsing_favourite) {
                    props.put("tweetfavouritecount", new prop(input, p, "data-tweet-stat-count"));
                    parsing_favourite = false;
                }
                continue;
            }
            // get images
            if ((p = input.indexOf("class=\"media media-thumbnail twitter-timeline-link media-forward is-preview")) > 0 ||
                (p = input.indexOf("class=\"multi-photo")) > 0) {
                images.add(new prop(input, p, "data-resolved-url-large").value);
                continue;
            }
            // we have two opportunities to get video thumbnails == more images; images in the presence of video content should be treated as thumbnail for the video
            if ((p = input.indexOf("class=\"animated-gif-thumbnail\"")) > 0) {
                images.add(new prop(input, 0, "src").value);
                continue;
            }
            if ((p = input.indexOf("class=\"animated-gif\"")) > 0) {
                images.add(new prop(input, p, "poster").value);
                continue;
            }
            if ((p = input.indexOf("<source video-src")) >= 0 && input.indexOf("type=\"video/") > p) {
                videos.add(new prop(input, p, "video-src").value);
                continue;
            }
            if ((p = input.indexOf("class=\"Tweet-geo")) > 0) {
                prop place_name_prop = new prop(input, p, "title");
                place_name = place_name_prop.value;
                continue;
            }
            if ((p = input.indexOf("class=\"ProfileTweet-actionButton u-linkClean js-nav js-geo-pivot-link")) > 0) {
                prop place_id_prop = new prop(input, p, "data-place-id");
                place_id = place_id_prop.value;
                continue;
            }
            if (props.size() == 10 || (debuglog && props.size() > 4 && input.indexOf("stream-item") > 0 /* li class="js-stream-item" starts a new tweet */)) {
                // the tweet is complete, evaluate the result
                if (debuglog) System.out.println("*** line " + line + " props.size() = " + props.size());
                prop userid = props.get("userid");
                if (userid == null) {if (debuglog) System.out.println("*** line " + line + " MISSING value userid"); continue;}
                prop usernickname = props.get("usernickname");
                if (usernickname == null) {if (debuglog) System.out.println("*** line " + line + " MISSING value usernickname"); continue;}
                prop useravatarurl = props.get("useravatarurl");
                if (useravatarurl == null) {if (debuglog) System.out.println("*** line " + line + " MISSING value useravatarurl"); continue;}
                prop userfullname = props.get("userfullname");
                if (userfullname == null) {if (debuglog) System.out.println("*** line " + line + " MISSING value userfullname"); continue;}
                UserEntry user = new UserEntry(
                        userid.value,
                        usernickname.value,
                        useravatarurl.value,
                        MessageEntry.html2utf8(userfullname.value)
                );
                ArrayList<String> imgs = new ArrayList<>(images.size()); imgs.addAll(images);
                ArrayList<String> vids = new ArrayList<>(videos.size()); vids.addAll(videos);
                prop tweettimems = props.get("tweettimems");
                if (tweettimems == null) {if (debuglog) System.out.println("*** line " + line + " MISSING value tweettimems"); continue;}
                prop tweetretweetcount = props.get("tweetretweetcount");
                if (tweetretweetcount == null) {if (debuglog) System.out.println("*** line " + line + " MISSING value tweetretweetcount"); continue;}
                prop tweetfavouritecount = props.get("tweetfavouritecount");
                if (tweetfavouritecount == null) {if (debuglog) System.out.println("*** line " + line + " MISSING value tweetfavouritecount"); continue;}
                TwitterTweet tweet = new TwitterTweet(
                        user.getScreenName(),
                        Long.parseLong(tweettimems.value),
                        props.get("tweettimename").value,
                        props.get("tweetstatusurl").value,
                        props.get("tweettext").value,
                        Long.parseLong(tweetretweetcount.value),
                        Long.parseLong(tweetfavouritecount.value),
                        imgs, vids, place_name, place_id, user, writeToIndex, writeToBackend
                );
                if (!DAO.messages.existsCache(tweet.getIdStr())) {
                    // checking against the exist cache is incomplete. A false negative would just cause that a tweet is
                    // indexed again.
                    if (tweet.willBeTimeConsuming()) {
                        executor.execute(tweet);
                        //new Thread(tweet).start();
                        // because the executor may run the thread in the current thread it could be possible that the result is here already
                        if (tweet.isReady()) {
                            timelineReady.add(tweet, user);
                            //DAO.log("SCRAPERTEST: messageINIT is ready");
                        } else {
                            timelineWorking.add(tweet, user);
                            //DAO.log("SCRAPERTEST: messageINIT unshortening");
                        }
                    } else {
                        // no additional thread needed, run the postprocessing in the current thread
                        tweet.run();
                        timelineReady.add(tweet, user);
                    }
                }
                images.clear();
                props.clear();
                continue;
            }
        }
        //for (prop p: props.values()) System.out.println(p);
        br.close();
        return new Timeline[]{timelineReady, timelineWorking};
    }

I suggest you go to Twitter’s Advanced Search page and search for some terms, and once you have the page loaded, check out its HTML, because we need to work with the tags.

This code is self-explanatory. Once you have the HTML result, it is fairly easy to check by inspection between which tags the data we need lives. We iterate through the code with the while loop and then check the tags: for example, images in the search result were stored inside a <div class="media media-thumbnail twitter-timeline-link media-forward is-preview"></div> tag, so we use indexOf to centre on those tags and get the images. This is done for all the data we need: username, timestamp, likes count, retweets count, mentions count etc., every single thing that the Search servlet of loklak shows.

So this is how the social media data is scraped; we have covered scraping using tools as well as manually, which are the most used methods anyway. In my next posts, I will talk about the rules for the TwitterAnalysis servlet, then social media chat bots, and how Susi is integrated into them (especially in FB Messenger and Slack). Feedback is welcome 🙂


First Sprint: Susi’s chat interface

This blog post aims at sharing the details of how Susi got her custom interface. Susi is well trained with proper rules and sufficient data, but she was lacking a makeover which could attract people to chat with her. So we gave her that custom makeover: a chat interface with starter functionalities. We used a technology stack which we believed would make the chat process much simpler and more flexible.

  • Handlebars – Why Handlebars? This templating framework helps you reproduce the chat bubbles without much hassle: two script blocks with embedded Handlebars expressions do it all. This keeps the front-end code small without any break in the bubble template.

[Screenshot: Handlebars template for the user’s message bubble]

The above template is for displaying the user’s message from the send DOM.
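Because the templates themselves are only shown as screenshots, here is a rough sketch of what such a bubble template could look like when compiled from JavaScript. The markup and class names are assumptions for illustration, not the actual asksusi template:

// Illustrative sketch only: a Handlebars template for the user's chat bubble
var userMessageTemplate = Handlebars.compile(
    '<div class="chat-bubble user">' +
    '  <p class="message">{{message}}</p>' +
    '</div>'
);
// Rendered whenever the user sends a message, e.g.:
// $('#chat').append(userMessageTemplate({ message: 'Hi Susi' }));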

[Screenshot: Handlebars template for Susi’s response bubble]

The above template is used for binding Susi’s response into the chat bubble. The request is triggered every time the user sends out a query, that is, a chat message. For example, the user types Hi Susi and Susi responds back saying Hello!. So the response is triggered when the user types in the message and Susi is queried using the following URL.

http://loklak.org/api/susi.json?q=Hi Susi

That is how the template is embedded into the interface. Along with that, Susi is given a separate avatar or artwork; here is one.

[Screenshot: Susi’s avatar artwork]

  • jQuery – We used jQuery for handling the requests and constantly querying the Susi API for the answers. It handles the calls very swiftly and the response time is very quick.

[Screenshot: jQuery code that queries the Susi API]

In the above code snippet the request is handled and the response is parsed for the answer. For example, this is a sample JSON.

[Screenshot: sample JSON response for the query “Hello”]

The above JSON is the response when we type “Hello” into Susi’s chat interface. We trace down to the expression for the actual answer to be provided:

data.answers[0].actions[0].expression
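Putting the pieces together, a minimal, illustrative version of that request flow could look like the sketch below. The element ids and the way the bubble is appended are assumptions; only the API URL and the expression path are taken from the post:

// Illustrative sketch only: query the Susi API and append the answer to the chat
function askSusi(query) {
    $.getJSON('http://loklak.org/api/susi.json', { q: query }, function (data) {
        var answer = data.answers[0].actions[0].expression; // path described above
        $('#chat').append($('<div class="chat-bubble susi">').text(answer));
    });
}
// askSusi('Hello');  // Susi answers with "Hello!"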

The chat interface also provides other requirements like keyboard events, scroll events, etc. The chat interface code can be viewed here at asksusi. This is still under development and a further enhancement would be to port the Telegram UI into its functionality.

 

 


The Making of the Console Service

SUSI, our very own personal digital assistant, has been up and running, giving quirky answers.

But behind all of this are rules which train our cute bot and help her decide what answers to provide after parsing the questions asked by users.

The questions could range from formal or informal greetings and general queries about name, weather, date and time, to specific ones like details about some random GitHub profile, Tweets and Replies from Twitter or Weibo, election or football score predictions, or simply asking her to read an RSS feed or a WordPress blog for you.

The rules for her training are written after the specific service is implemented which helps her fetch the particular website/social network in question, scrape data out of it and present it to her operator.

And to help us expand the scope and abilities of this naive being, it would be helpful if users could extend her rule set. For this, it is required to make a console service for sites which do not provide access to information without OAuth.

To begin with, let us see how a console service can be made.

To start with, a SampleService class, which shall basically include the rudimentary scraper or the code fetching the data, is defined in the package org.loklak.api.search.
It is made by extending the AbstractAPIHandler class, which itself extends the javax.servlet.http.HttpServlet class.
The SampleService class further implements the APIHandler interface.

A placeholder for SampleService class can be as:


package org.loklak.api.search;

/**
* import statements
**/

public class SampleService extends AbstractAPIHandler 
    implements APIHandler{

    private static final long serialVersionUID = 2142441326498450416L;
    /**
     * serialVersionUID could be 
     * auto-generated by the IDE used
    **/

    @Override
    public String getAPIPath() {
        return "/api/service.json";
        /**
         *Choose API path for the service in question
        **/
    }

    @Override
    public BaseUserRole getMinimalBaseUserRole() {
        return BaseUserRole.ANONYMOUS;
    }

    @Override
    public JSONObject getDefaultPermissions(BaseUserRole baseUserRole) {
        return null;
    }

    @Override
    public JSONObject serviceImpl(Query call, HttpServletResponse response, 
        Authorization rights, JSONObjectWithDefault permissions) 
        throws APIException {

        String url = call.get("url", "");
        /**
         *This would extract the argument that will be supplied
         * to the "url" parameter in the "call"
        **/
        return crawlerForService(url);

    }

    public static SusiThought crawlerForService(String url) {
        JSONArray arr = new JSONArray();
        
        /**
         * Crawler code or any other function which
         * returns a JSON Array **arr** goes in here 
        **/

        SusiThought json = new SusiThought();
        json.setData(arr);
        return json;
    }

}

 

The JSONArray in the key function crawlerForService is wrapped up in a SusiThought, which is nothing but a piece of data that can be remembered. The structure of the thought can be modeled as a table which may be created using the retrieval of information from elsewhere within the current argument.

Now, to implement it as a console service, we include it in the ConsoleService class, which is defined in the same package org.loklak.api.search and similarly extends the AbstractAPIHandler class and implements the APIHandler interface.

Here, dbAccess is a static variable of type SusiSkills, where a skill is defined as the ability to inspire, i.e. to create thoughts from perception. The data structure of a skill set is a mapping from perception patterns to lambda expressions which induce thoughts.


package org.loklak.api.search;

/**
 * import statements go here
**/

public class ConsoleService extends AbstractAPIHandler 
    implements APIHandler {

    private static final long serialVersionUID = 8578478303032749879L;
    /**
     * serialVersionUID could be 
     * auto-generated by the IDE used
    **/

    @Override
    public BaseUserRole getMinimalBaseUserRole() { 
        return BaseUserRole.ANONYMOUS; 
    }

    @Override
    public JSONObject getDefaultPermissions(BaseUserRole baseUserRole) {
        return null;
    }

    public String getAPIPath() {
        return "/api/console.json";
    }

    public final static SusiSkills dbAccess = new SusiSkills();
    static {

        /**
         * Other "skills" are defined here
         * by "putting" them in "dbAccess"
        **/

        dbAccess.put(Pattern.compile(
                "SELECT\\h+?(.*?)\\h+?FROM\\h+?sampleservice\\h+?WHERE\\h+?url\\h??=\\h??'(.*?)'\\h??;"),
                (flow, matcher) -> {
                    /**
                     * SusiThought-s are fetched from the Services
                     * implemented as above
                    **/
                    SusiThought json = SampleService.crawlerForService(matcher.group(2));
                    SusiTransfer transfer = new SusiTransfer(matcher.group(1));
                    json.setData(transfer.conclude(json.getData()));
                    return json;
                });
    }

    @Override
    public JSONObject serviceImpl(Query post, HttpServletResponse response, 
        Authorization rights, final JSONObjectWithDefault permissions) 
        throws APIException {

            String q = post.get("q", "");
            /**
             *This would extract the argument that will be supplied
             * to the "q" parameter in the "post" query
            **/
            

            return dbAccess.inspire(q);
        }

}

 

Now that the console service is made, an API endpoint for the same can correspond to: http://localhost:9000/api/console.json?q=SELECT * FROM sampleservice WHERE url = ‘ … ‘;
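As a rough usage sketch, such an endpoint can be queried like any other loklak API. The column name and URL below are invented for illustration only; the response is the SusiThought produced by the skill:

// Illustrative sketch only: calling the console endpoint from a browser
var query = "SELECT title FROM sampleservice WHERE url = 'http://example.com';";
$.getJSON('http://localhost:9000/api/console.json', { q: query }, function (result) {
    console.log(result); // the SusiThought JSON; its data array holds the selected rows
});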

The classes above can serve as a placeholder for creating a Console Service, which shall enable SUSI to widen her horizons and become more intelligent.

So, go ahead and make Susi rules using it and you are done!

If any aid is required in making SUSI rules, stay tuned for the next post.

Come, contribute to Loklak and SUSI!


Under the hood: HTTPS in Loklak

For some time now, loklak natively offers HTTPS support. On most one-click deployments, that is not really necessary, as there is usually an HTTP proxy in front which forwards all the traffic. These HTTP proxies can then use their own HTTPS implementation. Also, Loklak is usually run with normal user privileges, which means it can’t open a socket on the standard HTTP and HTTPS ports, but only on ports greater than 1024.

Still, in some setups it might be desirable not to have an extra HTTP proxy installed but still benefit from a secure connection, especially for user login and similar things.

These are the current options that can be set in conf/config.properties or data/settings/customized_config.properties:

https.mode=(off|on|redirect|only)
https.keysource=(keystore|key-cert)

keystore.name=keystore.jks
keystore.password=123456

https.key=/etc/ssl/private/loklak_key.pem
https.cert=/etc/ssl/certs/loklak_cert_full_chain.pem

The first setting has four options:

  1. off: the default. Only HTTP
  2. on: HTTP and HTTPS
  3. redirect: redirect all HTTP requests to HTTPS
  4. only: only HTTPS

The second lets us choose where to get our keys from. In Java, the usual way is to use a keystore. That’s a file, protected by a password (the next two options). But if the keysource is set to “key-cert”, we can also use PEM-formatted keys and certs, which is generally more common for non-Java applications (like apache, nginx etc.).

If chosen, we specify .pem files (the last two options). If a whole certificate chain is required, all the certificates have to be in one file, just copied together.

Loklak will create a keystore from the .pem files using the bouncycastle library. It will not write it to disk. Here’s the code for that:

//generate random password
char[] chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789".toCharArray();
StringBuilder sb = new StringBuilder();
Random random = new Random();
for (int i = 0; i < 20; i++) {
    char c = chars[random.nextInt(chars.length)];
    sb.append(c);
}
String password = keystoreManagerPass = sb.toString();

//get key and cert
File keyFile = new File(DAO.getConfig("https.key", ""));
if(!keyFile.exists() || !keyFile.isFile() || !keyFile.canRead()){
   throw new Exception("Could not find key file");
}
File certFile = new File(DAO.getConfig("https.cert", ""));
if(!certFile.exists() || !certFile.isFile() || !certFile.canRead()){
   throw new Exception("Could not find cert file");
}

Security.addProvider(new org.bouncycastle.jce.provider.BouncyCastleProvider());

byte[] keyBytes = Files.readAllBytes(keyFile.toPath());
byte[] certBytes = Files.readAllBytes(certFile.toPath());

PEMParser parser = new PEMParser(new InputStreamReader(new ByteArrayInputStream(certBytes)));
X509Certificate cert = new JcaX509CertificateConverter().setProvider("BC").getCertificate((X509CertificateHolder) parser.readObject());

parser = new PEMParser(new InputStreamReader(new ByteArrayInputStream(keyBytes)));
PrivateKey key = new JcaPEMKeyConverter().setProvider("BC").getPrivateKey((PrivateKeyInfo) parser.readObject());

keyStore = KeyStore.getInstance(KeyStore.getDefaultType());
keyStore.load(null, null);

keyStore.setCertificateEntry(cert.getSubjectX500Principal().getName(), cert);
keyStore.setKeyEntry("defaultKey", key, password.toCharArray(), new Certificate[] {cert});

 

A last interesting option is:

httpsclient.trustselfsignedcerts=(none|peer|all)

Loklak is by default configured to trust any HTTPS-connection, even if the certificate is wrong. That was done so people behind HTTPS-proxies can still use Loklak.

But it is also possible to make Loklak honor certificates. If “none” is selected, it will behave like most applications: if the certificate is wrong, close the connection. But even then, it’s possible to import certificates system-wide. Loklak will then accept those connections.

It’s also possible to make Loklak work with peers that have broken/self-signed certificates (so the connection is at least not plain text) but still require good certificates from other sources (for example twitter). That’s the “peer” option.

Creating an HttpConnection in Java that does not check the certificates is actually much trickier than creating a safe one. Here’s the code to create a connection manager cm that ignores certificates, if you need one at some point:

boolean trustAllCerts = ...;

Registry<ConnectionSocketFactory> socketFactoryRegistry = null;
if (trustAllCerts) {
    try {
        SSLConnectionSocketFactory trustSelfSignedSocketFactory = new SSLConnectionSocketFactory(
                new SSLContextBuilder().loadTrustMaterial(null, new TrustSelfSignedStrategy()).build(),
                new TrustAllHostNameVerifier());
        socketFactoryRegistry = RegistryBuilder
                .<ConnectionSocketFactory> create()
                .register("http", new PlainConnectionSocketFactory())
                .register("https", trustSelfSignedSocketFactory)
                .build();
    } catch (KeyManagementException | NoSuchAlgorithmException | KeyStoreException e) {
        Log.getLog().warn(e);
    }
}

PoolingHttpClientConnectionManager cm = (trustAllCerts && socketFactoryRegistry != null) ?
        new PoolingHttpClientConnectionManager(socketFactoryRegistry) :
        new PoolingHttpClientConnectionManager();

Visualizing NMEA Datasets from GPS Tracking devices with Loklak

Loklak now supports the NMEA format and gives developers access to the data in a much friendlier JSON format, which can be readily plugged into the required map visualizer and used on the web to create dashboards.

The stream URL is the data URL to which the GPS devices are streaming the data, which needs to be read by Loklak, converted and then reused. For a given stream, the response is as follows:

{
  "1": { "lat": 0, "long": 0, "time": 0, "Q": 0, "dir": 0, "alt": 0, "vel": 0 },
  "2": { "lat": 0, "long": 0, "time": 0, "Q": 0, "dir": 0, "alt": 0, "vel": 0 },
  "3": { "lat": 0, "long": 0, "time": 0, "Q": 0, "dir": 0, "alt": 0, "vel": 0 },
  "4": { "lat": 0, "long": 0, "time": 0, "Q": 0, "dir": 0, "alt": 0, "vel": 0 },
  "5": { "lat": 0, "long": 0, "time": 0, "Q": 0, "dir": 0, "alt": 0, "vel": 0 },
  "6": { "lat": 0, "long": 0, "time": 0, "Q": 0, "dir": 0, "alt": 0, "vel": 0 }
}
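For context, the raw input behind such a response is a stream of NMEA sentences. The sketch below uses the textbook GPGGA example sentence (not output from any specific device) to show where fields such as latitude, longitude, time, fix quality and altitude in the JSON above come from:

// Illustrative sketch only: the standard GPGGA example sentence and its fields
var sentence = '$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47';
var fields = sentence.split(',');
// fields[1]    -> time of fix, 12:35:19 UTC          ("time" in the JSON above)
// fields[2..5] -> 48°07.038' N, 11°31.000' E          ("lat" / "long")
// fields[6]    -> fix quality, 1 = GPS fix            ("Q")
// fields[9]    -> altitude above sea level, 545.4 m   ("alt")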

We now need to visualize this information. Loklak has a built-in tool which can make this happen. You’ll see a screen which asks for the stream URL; provide the URL to which the GPS device is sending the information.

[Screenshot: NMEA App on Loklak]

This visualizes the information:

[Screenshot: visualized points on the map]

This is possible because of the data format the nmea.txt servlet produces and the client-side packaging of the objects while loading them onto a map.

function getTracking() {
    var url = document.getElementById('url').value;
    var cUrl = window.location.href;
    var pName = window.location.pathname;
    var baseUrl = cUrl.split(pName)[0];
    var urlComplete = baseUrl + '/api/nmea.txt?stream=' + url;
    var centerlat = 52;
    var centerlon = 0;
    // set default zoom level
    var zoomLevel = 2;
    $.getJSON(urlComplete, function (data) {
        var GPSObjects = [];
        for (var key in data) {
            var obj = data[key];
            var latitudeObject, longitudeObject;
            for (var prop in obj) {
                if (prop == 'lat') {
                    latitudeObject = obj[prop];
                }
                if (prop == 'long') {
                    longitudeObject = obj[prop];
                }
            }
            var marker = L.marker([latitudeObject, longitudeObject]).addTo(map);
            spiralCoords = connectTheDots(marker);
            var spiralLine = L.polyline(spiralCoords).addTo(map);
        }
    });
}

A request is made to the required NMEA data stream, the latitude and longitude are extracted for each point, and each point is then pushed onto the map individually.

Loklak is now slowly moving towards supporting multiple devices and visualizing the data it obtains from their data streams. NMEA is a global standard for GPS and tracking devices, so loklak already supports more than 10 devices: the Garmin series (Garmin 15, 15H, etc.), the Xexun 10X series and the Navbie series.
