Time across seven seas…

It has been rightly said:

Time is of your own making
Its clock ticks in your head.
The moment you stop thought
Time too stops dead.


Hence to keep up with evolving times, Loklak has now introduced a new service for “time”.

The recently developed API provides the current time and day at the location queried by the user.

The /api/locationwisetime.json API scrapes the results from timeanddate.com using our favourite JSoup, which provides a very convenient API for extracting and manipulating data, and for scraping and parsing HTML from a given URL.

In case of multiple locations with the same name, the corresponding countries are also provided along with the respective day and time, all wrapped up as a JSONObject.

A sample query could then be something like: http://loklak.org/api/locationwisetime.json?query=london

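For a quick test outside the browser, here is a minimal Java sketch of calling the service and reading the result; it assumes (per the service code below) that the scraped location/time pairs are wrapped in a data array:

import java.net.URL;

import org.json.JSONArray;
import org.json.JSONObject;
import org.json.JSONTokener;

public class LocationTimeClient {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://loklak.org/api/locationwisetime.json?query=london");
        JSONObject response = new JSONObject(new JSONTokener(url.openStream()));

        // each entry carries a "location" and its current "time"
        JSONArray data = response.getJSONArray("data");
        for (int i = 0; i < data.length(); i++) {
            JSONObject entry = data.getJSONObject(i);
            System.out.println(entry.getString("location") + " -> " + entry.getString("time"));
        }
    }
}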

When implemented as a console service, this API can be used along with our dear SUSI by utilising API endpoints like: http://loklak.org/api/console.json?q=SELECT * FROM locationwisetime WHERE query='berlin';

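Note that the SQL-like query needs URL-encoding when the console is called programmatically; a small sketch:

import java.net.URL;
import java.net.URLEncoder;

import org.json.JSONObject;
import org.json.JSONTokener;

public class ConsoleQueryExample {
    public static void main(String[] args) throws Exception {
        // encode the SQL-like query before placing it in the q parameter
        String q = URLEncoder.encode("SELECT * FROM locationwisetime WHERE query='berlin';", "UTF-8");
        URL url = new URL("http://loklak.org/api/console.json?q=" + q);
        JSONObject result = new JSONObject(new JSONTokener(url.openStream()));
        System.out.println(result.toString(2));
    }
}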

LocationWiseTimeService.java for reference:


/**
 *  Location Wise Time
 *  timeanddate.com scraper
 *  Copyright 27.07.2016 by Jigyasa Grover, @jig08
 *
 *  This library is free software; you can redistribute it and/or
 *  modify it under the terms of the GNU Lesser General Public
 *  License as published by the Free Software Foundation; either
 *  version 2.1 of the License, or (at your option) any later version.
 *  
 *  This library is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 *  Lesser General Public License for more details.
 *  
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program in the file lgpl21.txt
 *  If not, see <http://www.gnu.org/licenses/>.
 */

package org.loklak.api.search;

import java.io.IOException;

import javax.servlet.http.HttpServletResponse;

import org.json.JSONArray;
import org.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.loklak.server.APIException;
import org.loklak.server.APIHandler;
import org.loklak.server.AbstractAPIHandler;
import org.loklak.server.Authorization;
import org.loklak.server.BaseUserRole;
import org.loklak.server.Query;
import org.loklak.susi.SusiThought;
import org.loklak.tools.storage.JSONObjectWithDefault;

public class LocationWiseTimeService extends AbstractAPIHandler implements APIHandler {

	private static final long serialVersionUID = -1495493690406247295L;

	@Override
	public String getAPIPath() {
		return "/api/locationwisetime.json";
	}

	@Override
	public BaseUserRole getMinimalBaseUserRole() {
		return BaseUserRole.ANONYMOUS;

	}

	@Override
	public JSONObject getDefaultPermissions(BaseUserRole baseUserRole) {
		return null;
	}

	@Override
	public JSONObject serviceImpl(Query call, HttpServletResponse response, Authorization rights,
			JSONObjectWithDefault permissions) throws APIException {
		String query = call.get("query", "");
		return locationWiseTime(query);
	}

	public static SusiThought locationWiseTime(String query) {
		
		Document html = null;

		JSONArray arr = new JSONArray();

		try {
			html = Jsoup.connect("http://www.timeanddate.com/worldclock/results.html?query=" + query).get();
		} catch (IOException e) {
			e.printStackTrace();
		}

		// if the page could not be fetched, return an empty result
		// instead of running into a NullPointerException below
		if (html == null) {
			SusiThought empty = new SusiThought();
			empty.setData(arr);
			return empty;
		}

		Elements locations = html.select("td");
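		// timeanddate.com renders the results as table cells: each location cell
		// (an even index) is followed by a sibling cell holding its current time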
		int i = 0;
		for (Element e : locations) {
			if (i % 2 == 0) {
				JSONObject obj = new JSONObject();
				String l = e.getElementsByTag("a").text();
				obj.put("location", l);
				String t = e.nextElementSibling().text();
				obj.put("time", t);
				arr.put(obj);
			}
			i++;
		}
		
		SusiThought json = new SusiThought();
		json.setData(arr);
		return json;
	}

}

 

Hope this helps, and worth the “time” 😛

Feel free to ask questions regarding the above code snippet; I shall be happy to assist.

Feedback and Suggestions welcome 🙂


Welcoming Wiki GeoData to Loklak !

Loklak has grown vast in due course of time, and its capabilities have extended manifold, especially with the inclusion of sundry website scraper services and data provider services.


The recent addition includes a special service which provides the user with a list of Wikipedia articles tagged with a location, when supplied with the name of that place.

Thanks to the MediaWiki GeoData API, this service was smoothly integrated into the Loklak server and SUSI (our very own cute and quirky personal digital assistant).

When the name of the place is sent in the query, the home-grown API loklak.org/api/geocode.json is first utilised to get the location co-ordinates, i.e. latitude and longitude.


URL getCoordURL = null;

String path = "data={\"places\":[\"" + place + "\"]}";

try {
    getCoordURL = new URL("http://loklak.org/api/geocode.json?" + path);
} catch (MalformedURLException e) {
    e.printStackTrace();
}

JSONTokener tokener = null;
try {
    tokener = new JSONTokener(getCoordURL.openStream());
} catch (Exception e1) {
    e1.printStackTrace();
}

JSONObject obj = new JSONObject(tokener);

// the geocode.json "location" array is ordered [longitude, latitude]
String longitude = obj.getJSONObject("locations").getJSONObject(place).getJSONArray("location").get(0)
				.toString();
String latitude = obj.getJSONObject("locations").getJSONObject(place).getJSONArray("location").get(1)
				.toString();

The resultant geographical co-ordinates are then passed on to the MediaWiki GeoData API, along with other parameters such as the radius of the geographical bound to be considered and the format of the resultant data, to obtain a list of page IDs of the corresponding Wikipedia articles, besides the title and distance of each.


URL getWikiURL = null;

try {
    getWikiURL = new URL(
                      "https://en.wikipedia.org/w/api.php?action=query&list=geosearch&gsradius=10000&gscoord=" + latitude
			+ "|" + longitude + "&format=json");
} catch (MalformedURLException e) {
    e.printStackTrace();
}

JSONTokener wikiTokener = null;

try {
    wikiTokener = new JSONTokener(getWikiURL.openStream());
} catch (Exception e1) {
    e1.printStackTrace();
}

JSONObject wikiGeoResult = new JSONObject(wikiTokener);

When implemented as a console service for Loklak, the servlet was registered as /api/wikigeodata.json?place={place-name}, and the API endpoint, for example, goes like http://localhost:9000/api/console.json?q=SELECT * FROM wikigeodata WHERE place='Singapore';

Presto !!

We have a JSON object as the result, carrying the list of Wikipedia articles.

The page IDs thus obtained can now be utilized very easily to display the articles by using the URL pattern https://en.wikipedia.org/?curid={pageid}, as sketched below.
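Here is a hedged sketch, continuing the wikiGeoResult snippet above and assuming the standard shape of the MediaWiki geosearch response (the results sit in a query.geosearch array with pageid, title and dist fields):

// continues the snippet above
JSONArray geosearch = wikiGeoResult.getJSONObject("query").getJSONArray("geosearch");

for (int i = 0; i < geosearch.length(); i++) {
    JSONObject article = geosearch.getJSONObject(i);
    String link = "https://en.wikipedia.org/?curid=" + article.getInt("pageid");
    System.out.println(article.getString("title") + " (" + article.getDouble("dist") + " m): " + link);
}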

And this way, another facility was included in our diversifying Loklak server.

Questions, feedback, suggestions appreciated 🙂


User Information via Loklak API

While working on “adding Loklak API support to WordPress plugins”, I found that a lot of plugins require Twitter user information, e.g. profile_pic, followers_count, tweets by the user, etc.

One can get exhaustive user information using the Loklak search and user APIs. In this blog post I will show how one can combine the results from Loklak's Search and User APIs to provide all kinds of data required by a WordPress plugin.

Starting with Search API!

Typing

http://loklak.org/api/search.json?q=from:KhoslaSopan

in the URL bar returns the results for that query.

As you can see, it provides all the tweet-related information (for tweets by the user), along with a small user profile.

But what if you require the user's location, followers_count, friends_count, the number of tweets posted till now, etc.? You would have to take help from the Loklak User API. Sample results for

http://loklak.org/api/user.json?screen_name=KhoslaSopan

carry exactly this kind of profile data.

But what if you also require information about the user's followers and the people they are following? How do you achieve that? Well, the Loklak API has provision for that as well. All you need to do is add followers=<count> or following=<count> as need be.

http://loklak.org/api/user.json?screen_name=KhoslaSopan&following=100&followers=100

The corresponding data is then added to your user.json results.

This gives you the user information of the Twitter users followed by that user. Recursively, you can get their tweet data, their followers, and on and on!

In order to get a complete user profile, you can merge the results from the Search and User APIs and create wonderful applications.

To do all this using Loklak PHP API, refer to the code snippet below.

// assumes the Loklak PHP API library has been loaded (e.g. via Composer's autoloader)
$connection = new Loklak();

$username = 'KhoslaSopan'; // illustrative values
$no_of_tweets = 10;

// fetch the user's tweets via the Search API
$tweets = $connection->search('', null, null, $username, $no_of_tweets);
$tweets = json_decode($tweets, true);
$tweets = json_decode($tweets['body'], true);

// fetch the user's profile via the User API
$user = $connection->user($username);
$user = json_decode($user, true);
$user = json_decode($user['body'], true);

// merge the user profile into each tweet record
$tweet_content = $tweets['statuses'];
for ($i = 0; $i < sizeof($tweet_content); $i++) {
    $tweet_content[$i] = array_merge($tweet_content[$i], $user);
}

Loklak fuels Open Event

First, a general background…

The FOSSASIA Open Event project aims to make it easier for events, conferences and tech summits to easily create web and mobile (only Android currently) micro apps. The project comprises a data schema for easily storing event details; a server and web front-end that are used by event organizers to view, modify and update this data easily; a mobile-friendly web-app client to show the event data to attendees; and an Android app template which will be used to generate specific apps for each event.

Eventbrite is the world's largest self-service ticketing platform. It allows anyone to create, share and find events, comprising music festivals, marathons, conferences, hackathons, air guitar contests, political rallies, fundraisers, gaming competitions, etc.

Kaboom !

Loklak now has a dedicated Eventbrite scraper API which takes in the URL of the event listing on eventbrite.com and outputs the JSON files required by the Open Event Generator, viz. events.json, organizer.json, user.json, microlocations.json, sessions.json, session_types.json, tracks.json, sponsors.json, speakers.json, social_links.json and custom_forms.json (details: Open Event Server: API Documentation).

What do we do differently from using the Eventbrite API? No authentication tokens are required. This gels in perfectly with the Loklak mission.

To achieve this, I have simply parsed the HTML pages using my favourite JSoup, the Java HTML parser library, because it provides a very convenient API for extracting and manipulating data, and for scraping and parsing all varieties of HTML from a URL.

The API call format is as: http://loklak.org/api/eventbritecrawler.json?url=https://www.eventbrite.com/[event-name-and-id]

And in return we get all the details on the Eventbrite page as a JSONObject, which is also stored in correspondingly named files in a zipped folder [userHome + "/Downloads/EventBriteInfo"].

Example:

Event URL: https://www.eventbrite.de/e/global-health-security-focus-africa-tickets-25740798421

API Call: 
http://loklak.org/api/eventbritecrawler.json?url=https://www.eventbrite.de/e/global-health-security-focus-africa-tickets-25740798421

Output: JSON object printed on screen, and events.json, organizer.json, user.json, microlocations.json, sessions.json, session_types.json, tracks.json, sponsors.json, speakers.json, social_links.json and custom_forms.json files written out in a zipped folder locally.



For reference, the code is as follows:

/**
 *  Eventbrite.com Crawler v2.0
 *  Copyright 19.06.2016 by Jigyasa Grover, @jig08
 *
 *  This library is free software; you can redistribute it and/or
 *  modify it under the terms of the GNU Lesser General Public
 *  License as published by the Free Software Foundation; either
 *  version 2.1 of the License, or (at your option) any later version.
 *  
 *  This library is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 *  Lesser General Public License for more details.
 *  
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program in the file lgpl21.txt
 *  If not, see http://www.gnu.org/licenses/.
 */

package org.loklak.api.search;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.json.JSONArray;
import org.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.loklak.http.RemoteAccess;
import org.loklak.server.Query;

public class EventbriteCrawler extends HttpServlet {

	private static final long serialVersionUID = 5216519528576842483L;

	@Override
	protected void doPost(HttpServletRequest request, HttpServletResponse response)
			throws ServletException, IOException {
		doGet(request, response);
	}

	@Override
	protected void doGet(HttpServletRequest request, HttpServletResponse response)
			throws ServletException, IOException {
		Query post = RemoteAccess.evaluate(request);

		// manage DoS
		if (post.isDoS_blackout()) {
			response.sendError(503, "your request frequency is too high");
			return;
		}

		String url = post.get("url", "");

		Document htmlPage = null;

		try {
			htmlPage = Jsoup.connect(url).get();
		} catch (Exception e) {
			e.printStackTrace();
		}

		// bail out early if the event page could not be fetched,
		// instead of running into a NullPointerException below
		if (htmlPage == null) {
			response.sendError(400, "could not fetch the event page at: " + url);
			return;
		}

		String eventID = null;
		String eventName = null;
		String eventDescription = null;

		// TODO Fetch Event Color
		String eventColor = null;

		String imageLink = null;

		String eventLocation = null;

		String startingTime = null;
		String endingTime = null;

		String ticketURL = null;

		Elements tagSection = null;
		Elements tagSpan = null;
		String[][] tags = new String[5][2];
		String topic = null; // By default

		String closingDateTime = null;
		String schedulePublishedOn = null;
		JSONObject creator = new JSONObject();
		String email = null;

		Float latitude = null;
		Float longitude = null;

		String privacy = "public"; // By Default
		String state = "completed"; // By Default
		String eventType = "";

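		// scrape the basic event fields from the listing page's markup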
		eventID = htmlPage.getElementsByTag("body").attr("data-event-id");
		eventName = htmlPage.getElementsByClass("listing-hero-body").text();
		eventDescription = htmlPage.select("div.js-xd-read-more-toggle-view.read-more__toggle-view").text();

		eventColor = null;

		imageLink = htmlPage.getElementsByTag("picture").attr("content");

		eventLocation = htmlPage.select("p.listing-map-card-street-address.text-default").text();
		startingTime = htmlPage.getElementsByAttributeValue("property", "event:start_time").attr("content").substring(0,
				19);
		endingTime = htmlPage.getElementsByAttributeValue("property", "event:end_time").attr("content").substring(0,
				19);

		ticketURL = url + "#tickets";

		// TODO Tags to be modified to fit in the format of Open Event "topic"
		tagSection = htmlPage.getElementsByAttributeValue("data-automation", "ListingsBreadcrumbs");
		tagSpan = tagSection.select("span");
		topic = "";

		int iterator = 0, k = 0;
		for (Element e : tagSpan) {
			if (iterator % 2 == 0) {
				tags[k][1] = "www.eventbrite.com"
						+ e.select("a.js-d-track-link.badge.badge--tag.l-mar-top-2").attr("href");
			} else {
				tags[k][0] = e.text();
				k++;
			}
			iterator++;
		}

		creator.put("email", "");
		creator.put("id", "1"); // By Default

		latitude = Float
				.valueOf(htmlPage.getElementsByAttributeValue("property", "event:location:latitude").attr("content"));
		longitude = Float
				.valueOf(htmlPage.getElementsByAttributeValue("property", "event:location:longitude").attr("content"));

		// TODO This returns: "events.event" which is not supported by Open
		// Event Generator
		// eventType = htmlPage.getElementsByAttributeValue("property",
		// "og:type").attr("content");

		String organizerName = null;
		String organizerLink = null;
		String organizerProfileLink = null;
		String organizerWebsite = null;
		String organizerContactInfo = null;
		String organizerDescription = null;
		String organizerFacebookFeedLink = null;
		String organizerTwitterFeedLink = null;
		String organizerFacebookAccountLink = null;
		String organizerTwitterAccountLink = null;

		organizerName = htmlPage.select("a.js-d-scroll-to.listing-organizer-name.text-default").text().substring(4);
		organizerLink = url + "#listing-organizer";
		organizerProfileLink = htmlPage
				.getElementsByAttributeValue("class", "js-follow js-follow-target follow-me fx--fade-in is-hidden")
				.attr("href");
		organizerContactInfo = url + "#lightbox_contact";

		Document orgProfilePage = null;

		try {
			orgProfilePage = Jsoup.connect(organizerProfileLink).get();
		} catch (Exception e) {
			e.printStackTrace();
		}

		// scrape the organizer profile only if the page could actually be fetched
		if (orgProfilePage != null) {
			organizerWebsite = orgProfilePage.getElementsByAttributeValue("class", "l-pad-vert-1 organizer-website").text();
			organizerDescription = orgProfilePage.select("div.js-long-text.organizer-description").text();
			organizerFacebookAccountLink = orgProfilePage.getElementsByAttributeValue("class", "fb-page").attr("data-href");
			organizerTwitterAccountLink = orgProfilePage.getElementsByAttributeValue("class", "twitter-timeline")
					.attr("href");
		}
		organizerFacebookFeedLink = organizerProfileLink + "#facebook_feed";
		organizerTwitterFeedLink = organizerProfileLink + "#twitter_feed";

		JSONArray socialLinks = new JSONArray();

		JSONObject fb = new JSONObject();
		fb.put("id", "1");
		fb.put("name", "Facebook");
		fb.put("link", organizerFacebookAccountLink);
		socialLinks.put(fb);

		JSONObject tw = new JSONObject();
		tw.put("id", "2");
		tw.put("name", "Twitter");
		tw.put("link", organizerTwitterAccountLink);
		socialLinks.put(tw);

		JSONArray jsonArray = new JSONArray();

		JSONObject event = new JSONObject();
		event.put("event_url", url);
		event.put("id", eventID);
		event.put("name", eventName);
		event.put("description", eventDescription);
		event.put("color", eventColor);
		event.put("background_url", imageLink);
		event.put("closing_datetime", closingDateTime);
		event.put("creator", creator);
		event.put("email", email);
		event.put("location_name", eventLocation);
		event.put("latitude", latitude);
		event.put("longitude", longitude);
		event.put("start_time", startingTime);
		event.put("end_time", endingTime);
		event.put("logo", imageLink);
		event.put("organizer_description", organizerDescription);
		event.put("organizer_name", organizerName);
		event.put("privacy", privacy);
		event.put("schedule_published_on", schedulePublishedOn);
		event.put("state", state);
		event.put("type", eventType);
		event.put("ticket_url", ticketURL);
		event.put("social_links", socialLinks);
		event.put("topic", topic);
		jsonArray.put(event);

		JSONObject org = new JSONObject();
		org.put("organizer_name", organizerName);
		org.put("organizer_link", organizerLink);
		org.put("organizer_profile_link", organizerProfileLink);
		org.put("organizer_website", organizerWebsite);
		org.put("organizer_contact_info", organizerContactInfo);
		org.put("organizer_description", organizerDescription);
		org.put("organizer_facebook_feed_link", organizerFacebookFeedLink);
		org.put("organizer_twitter_feed_link", organizerTwitterFeedLink);
		org.put("organizer_facebook_account_link", organizerFacebookAccountLink);
		org.put("organizer_twitter_account_link", organizerTwitterAccountLink);
		jsonArray.put(org);

		JSONArray microlocations = new JSONArray();
		jsonArray.put(microlocations);

		JSONArray customForms = new JSONArray();
		jsonArray.put(customForms);

		JSONArray sessionTypes = new JSONArray();
		jsonArray.put(sessionTypes);

		JSONArray sessions = new JSONArray();
		jsonArray.put(sessions);

		JSONArray sponsors = new JSONArray();
		jsonArray.put(sponsors);

		JSONArray speakers = new JSONArray();
		jsonArray.put(speakers);

		JSONArray tracks = new JSONArray();
		jsonArray.put(tracks);

		JSONObject eventBriteResult = new JSONObject();
		eventBriteResult.put("Event Brite Event Details", jsonArray);

		// print JSON
		response.setCharacterEncoding("UTF-8");
		PrintWriter sos = response.getWriter();
		sos.print(eventBriteResult.toString(2));
		sos.println();

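		// write each Open Event JSON artefact to its own file under ~/Downloads/EventBriteInfo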
		String userHome = System.getProperty("user.home");
		String path = userHome + "/Downloads/EventBriteInfo";

		new File(path).mkdir();

		try (FileWriter file = new FileWriter(path + "/event.json")) {
			file.write(event.toString());
		} catch (IOException e1) {
			e1.printStackTrace();
		}

		try (FileWriter file = new FileWriter(path + "/org.json")) {
			file.write(org.toString());
		} catch (IOException e1) {
			e1.printStackTrace();
		}

		try (FileWriter file = new FileWriter(path + "/social_links.json")) {
			file.write(socialLinks.toString());
		} catch (IOException e1) {
			e1.printStackTrace();
		}

		try (FileWriter file = new FileWriter(path + "/microlocations.json")) {
			file.write(microlocations.toString());
		} catch (IOException e1) {
			e1.printStackTrace();
		}

		try (FileWriter file = new FileWriter(path + "/custom_forms.json")) {
			file.write(customForms.toString());
		} catch (IOException e1) {
			e1.printStackTrace();
		}

		try (FileWriter file = new FileWriter(path + "/session_types.json")) {
			file.write(sessionTypes.toString());
		} catch (IOException e1) {
			e1.printStackTrace();
		}

		try (FileWriter file = new FileWriter(path + "/sessions.json")) {
			file.write(sessions.toString());
		} catch (IOException e1) {
			e1.printStackTrace();
		}

		try (FileWriter file = new FileWriter(path + "/sponsors.json")) {
			file.write(sponsors.toString());
		} catch (IOException e1) {
			e1.printStackTrace();
		}

		try (FileWriter file = new FileWriter(path + "/speakers.json")) {
			file.write(speakers.toString());
		} catch (IOException e1) {
			e1.printStackTrace();
		}

		try (FileWriter file = new FileWriter(path + "/tracks.json")) {
			file.write(tracks.toString());
		} catch (IOException e1) {
			e1.printStackTrace();
		}

		try {
			// the destination must be a zip file path, not a directory
			zipFolder(path, userHome + "/Downloads/EventBriteInfo.zip");
		} catch (Exception e1) {
			e1.printStackTrace();
		}

	}

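	// compresses the contents of srcFolder into a zip archive at destZipFile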
	static public void zipFolder(String srcFolder, String destZipFile) throws Exception {
		ZipOutputStream zip = null;
		FileOutputStream fileWriter = null;
		fileWriter = new FileOutputStream(destZipFile);
		zip = new ZipOutputStream(fileWriter);
		addFolderToZip("", srcFolder, zip);
		zip.flush();
		zip.close();
	}

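	// adds a single file to the zip, or recurses if the given path is a directory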
	static private void addFileToZip(String path, String srcFile, ZipOutputStream zip) throws Exception {
		File folder = new File(srcFile);
		if (folder.isDirectory()) {
			addFolderToZip(path, srcFile, zip);
		} else {
			byte[] buf = new byte[1024];
			int len;
			FileInputStream in = new FileInputStream(srcFile);
			zip.putNextEntry(new ZipEntry(path + "/" + folder.getName()));
			while ((len = in.read(buf)) > 0) {
				zip.write(buf, 0, len);
			}
			in.close();
		}
	}

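	// walks a folder and adds each of its entries to the zip, preserving relative paths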
	static private void addFolderToZip(String path, String srcFolder, ZipOutputStream zip) throws Exception {
		File folder = new File(srcFolder);

		for (String fileName : folder.list()) {
			if (path.equals("")) {
				addFileToZip(folder.getName(), srcFolder + "/" + fileName, zip);
			} else {
				addFileToZip(path + "/" + folder.getName(), srcFolder + "/" + fileName, zip);
			}
		}
	}

}

Check out https://github.com/loklak/loklak_server for more…


 

Feel free to ask questions regarding the above code snippet.

Also, stay tuned for the next part of this post, which shall cover using the scraped information for Open Event.

Feedback and Suggestions welcome 🙂


Loklak ShuoShuo: Another feather in the cap !

Work is still going on for Loklak Weibo to extract the information as desired, and there shall be another post up soon explaining the intricacies of the implementation.

Currently, an attempt was made to parse the HTML page of QQShuoShuo.com (another Chinese Twitter-like service).

Just like last time, the major challenge is to understand the Chinese annotations, especially coming from a non-Chinese background. Google Translate aids in testing the retrieved data by helping me match each phrase and/or line.

I have made use of JSoup, the Java HTML parser library, which assists in extracting and manipulating data, and in scraping and parsing HTML from a URL. JSoup is designed to deal with all varieties of HTML, hence as of now it is being considered a suitable choice.

/**
 *  Shuoshuo Crawler
 *  By Jigyasa Grover, @jig08
 **/

package org.loklak.harvester;

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.*;
import org.jsoup.select.Elements;

public class ShuoshuoCrawler {
    public static void main(String args[]){

        Document shuoshuoHTML = null;
        Element recommendedTalkBox = null;
        Elements recommendedTalksList = null;
        String recommendedTalksResult[] = new String[100];
        Integer numberOfrecommendedTalks = 0;
        Integer i = 0;

        try {
            shuoshuoHTML = Jsoup.connect("http://www.qqshuoshuo.com/").get();

            recommendedTalkBox = shuoshuoHTML.getElementById("list2");
            recommendedTalksList = recommendedTalkBox.getElementsByTag("li");

            i=0;
            for (Element recommendedTalks : recommendedTalksList)
            {
                //System.out.println("\nLine: " + recommendedTalks.text());
                recommendedTalksResult[i] = recommendedTalks.text();
                i++;
            }           
            numberOfrecommendedTalks = i;
            System.out.println("Total Recommended Talks: " + numberOfrecommendedTalks);
            for(int k=0; k<numberOfrecommendedTalks; k++){
                System.out.println("Recommended Talk " + k + ": " + recommendedTalksResult[k]);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

    }
}

QQ Recommended Talks from qqshuoshuo.com are now stored as an array of Strings.

Total Recommended Talks: 10
Recommended Talk 0: 不会在意无视我的人,不会忘记帮助过我的人,不会去恨真心爱过我的人。
Recommended Talk 1: 喜欢一个人是一种感觉,不喜欢一个人却是事实。事实容易解释,感觉却难以言喻。
Recommended Talk 2: 一个人容易从别人的世界走出来却走不出自己的沙漠
Recommended Talk 3: 有什么了不起,不就是幸福在左边,我站在了右边?
Recommended Talk 4: 希望我跟你的爱,就像新闻联播一样没有大结局
Recommended Talk 5: 你会遇到别的女子和她举案齐眉,而我自会有别的男子与我白首相携。
Recommended Talk 6: 既然爱,为什么不说出口,有些东西失去了,就再也回不来了!
Recommended Talk 7: 凡事都有可能,永远别说永远。
Recommended Talk 8: 都是因为爱,而喜欢上了怀旧;都是因为你,而喜欢上了怀念。
Recommended Talk 9: 爱是老去,爱是新生,爱是一切,爱是你。

A similar approach can now be used to do the same for the Latest QQ Talks and the QQ Talk Leaderboard.


Check out this space for upcoming details on implementing this technique to parse the entire page and get the desired results…

Feedback and Suggestions welcome.


Search vs Aggregations

    The search aggregations, also known as facets (field aggregations), are often ignored while using the Search API provided by Loklak. This post aims at clearing up the difference and bringing out the clear usage. It extends the previous blog post, titled “Know who tweeted the most!”, where we discussed the Search API; using the same example, the results are viewed here through the search aggregations.

    Such results are achieved by reading the aggregation field called mentions, which contains the list of user names who mentioned the searched hashtag in their tweets. The result is drawn from over one billion tweets (~1,377,367,341) which Loklak has scraped and indexed.

    The search metadata carries some important fields which alter the result. The count is set to zero and the remote search is switched off (source=cache) so that the aggregations do not consider remote search results. The query field shows the constraints on which the query must be performed, i.e. the hashtag for which it must query. Since the search aggregations query the whole Loklak index (over one billion tweets), we can limit the result payload by giving time constraints such as the “since” and “until” fields. We can even limit the result display by giving a value to the limit field. This helps you form a search query which gives you a much more accurate result. The following is a sample query where you can check out the JSON result; edit the “q” variable to your search requirement, and the time fields too.

http://loklak.org/api/search.json?q=loklak since:2016-01-01 until:2016-03-01&source=cache&count=0&fields=mentions,hashtags
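A hedged Java sketch of reading those counts programmatically (assuming the term counts are nested under an aggregations object holding the mentions and hashtags maps, as this post describes):

import java.net.URL;
import java.net.URLEncoder;

import org.json.JSONObject;
import org.json.JSONTokener;

public class AggregationExample {
    public static void main(String[] args) throws Exception {
        String q = URLEncoder.encode("loklak since:2016-01-01 until:2016-03-01", "UTF-8");
        URL url = new URL("http://loklak.org/api/search.json?q=" + q
                + "&source=cache&count=0&fields=mentions,hashtags");
        JSONObject response = new JSONObject(new JSONTokener(url.openStream()));

        // print each mentioned user name with its aggregated count
        JSONObject mentions = response.getJSONObject("aggregations").getJSONObject("mentions");
        for (String name : mentions.keySet()) {
            System.out.println(name + ": " + mentions.getInt(name));
        }
    }
}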

The aggregations field, and the benefits you can get out of it, are described below.

    The aggregations field consists of the hashtags and the mentions. The mentions list holds all user names denoted by a ‘@’ prefix in the messages, listed without the leading ‘@’. The hashtags list holds all hashtags appearing in the messages, without the leading ‘#’. There are more fields which you can mention in the query according to your need, such as limit, source, timezoneOffset, minified, etc. For more info on the search aggregations, visit the docs page.

You can even visit the app “knowTheDiff”, which guides you through the difference with a step-by-step intro.

The app is still under construction and needs to bring more fields into the picture, but you can already try out the basic difference with some sample queries.


Loklak Weibo: Now going beyond Twitter !

As of now, Loklak has done a wonderful job in collecting billions of tweets, especially from Twitter. The highlight of this service has been anonymous search of Twitter without the use of any authentication key.

The next step is to go beyond Twitter and collect data from Chinese Twitter-like services, for instance Weibo.com.


The major challenge, however, is to understand the Chinese annotations, especially coming from a non-Chinese background; but now that I have the support of a Chinese friend from the Loklak community, we hope it shall be an easier task to achieve 🙂

The trick shall be simple: parse the HTML page of the search results. It has been suggested to use JSoup, the Java HTML parser library, in the loklak_server. It provides a very convenient API for extracting and manipulating data, and for scraping and parsing HTML from a URL. JSoup is designed to deal with all varieties of HTML, hence as of now it is being considered a suitable choice.

In our approach, the HTML page generated by the search query http://s.weibo.com/weibo/<search-string> shall be parsed, instead of going the traditional way of calling the API with an authentication key.

Sample code snippet to extract the title of the page:

String q = "Berlin";
//Get the Search Query in "q" here
Document doc = null;
String title = null;
try {
doc = Jsoup.connect("http://s.weibo.com/weibo/"+q).get();
title = doc.title();
} catch (IOException e) {
e.printStackTrace();
}

Check out this space for upcoming details on implementing this technique to parse the entire page and get the desired results…

Feedback and Suggestions welcome.


Know who tweeted the most!

It is obvious that all of us are curious to find out who is leading! My app, which is called ‘Twitter Leaderboard’, uses the search API provided by Loklak. Using my app you can collect the top five tweeters whose tweets contain the mentioned hashtag.

So here is a simple explanation of how ‘Twitter Leaderboard’ works. I built this app using AngularJS and the search API provided by Loklak. First, I collected the tweets’ data and parsed it for the required fields.

The leaderboard needed to show each tweeter’s name along with their tweet count. ‘Twitter Leaderboard’ also needed to allow its users to search for tweets based on the hashtag supplied to it in a query.

I wrote a small controller to make an HTTP request to the Loklak search API, and parse the retrieved object accordingly. I parsed the required data in the following manner:

  • data.statuses // the statuses object containing the tweets that match the requested hashtag
  • user = statuses[i].screen_name // get all the screen names
  • counts = {} // a counts object to store the number of tweets for each name

After parsing the required data from the search results, we have to sort the names according to their tweet counts. And there we go: the results are pushed to the front end and displayed, as sketched below.
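Sketched in Java for clarity (the app itself is written in AngularJS, and the hashtag value here is illustrative), the counting and sorting logic looks roughly like this:

import java.net.URL;
import java.net.URLEncoder;
import java.util.HashMap;
import java.util.Map;

import org.json.JSONArray;
import org.json.JSONObject;
import org.json.JSONTokener;

public class TwitterLeaderboard {
    public static void main(String[] args) throws Exception {
        String q = URLEncoder.encode("#fossasia", "UTF-8"); // illustrative hashtag
        URL url = new URL("http://loklak.org/api/search.json?q=" + q);
        JSONObject response = new JSONObject(new JSONTokener(url.openStream()));
        JSONArray statuses = response.getJSONArray("statuses");

        // tally the tweet count per screen name
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < statuses.length(); i++) {
            String user = statuses.getJSONObject(i).getString("screen_name");
            counts.merge(user, 1, Integer::sum);
        }

        // print the five users with the most tweets
        counts.entrySet().stream()
                .sorted((a, b) -> b.getValue() - a.getValue())
                .limit(5)
                .forEach(e -> System.out.println(e.getKey() + ": " + e.getValue()));
    }
}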

Another way to improve the search result is by using the search aggregations. This feature helps you get the counts in a better way, by scaling the search up to everything Loklak has indexed (more than a billion tweets). It does not consider remote search results; therefore the remote search is switched off with “source=cache” and the search results are switched off with “count=0”. The request for aggregation is simply hidden in “fields=…”, which names the fields where you want to have an aggregation. You can even reduce the result size by restricting your search to a certain period of time.

An upgrade to the above app using the aggregations will be provided soon.
