Welcoming Wiki GeoData to Loklak !

Loklak has grown vast over time, and its capabilities have expanded manifold, especially with the inclusion of various website scraper services and data provider services.


The latest addition is a service that, given the name of a place, returns a list of Wikipedia articles geotagged near that location.

Thanks to the MediaWiki GeoData API, this service was smoothly integrated into the Loklak server and SUSI (our very own cute and quirky personal digital assistant).

When the name of the place is sent in the query, the home-grown API loklak.org/api/geocode.json is first used to get the place's coordinates, i.e. its latitude and longitude.

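import java.io.UnsupportedEncodingException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLEncoder;
import org.json.JSONArray;
import org.json.JSONObject;
import org.json.JSONTokener;

URL getCoordURL = null;

try {
    // URL-encode the JSON payload: quotes, braces and spaces (e.g. "New Delhi") are not legal in a query string
    String path = "data=" + URLEncoder.encode("{\"places\":[\"" + place + "\"]}", "UTF-8");
    getCoordURL = new URL("http://loklak.org/api/geocode.json?" + path);
} catch (UnsupportedEncodingException | MalformedURLException e) {
    e.printStackTrace();
}

JSONTokener tokener = null;
try {
    tokener = new JSONTokener(getCoordURL.openStream());
} catch (Exception e1) {
    e1.printStackTrace();
}

JSONObject obj = new JSONObject(tokener);

// the "location" array of the geocoded place holds [longitude, latitude]
JSONArray location = obj.getJSONObject("locations").getJSONObject(place).getJSONArray("location");
String longitude = String.valueOf(location.get(0));
String latitude = String.valueOf(location.get(1));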

The resulting coordinates were then passed to the MediaWiki GeoData API, together with other parameters such as the radius of the area to search (gsradius, in metres) and the format of the response, to obtain a list of nearby Wikipedia articles with their page IDs, titles and distances.

URL getWikiURL = null;

try {
    // gsradius is in metres (10000 is the largest radius the API accepts);
    // "%7C" is the percent-encoded "|" that separates latitude and longitude
    getWikiURL = new URL(
            "https://en.wikipedia.org/w/api.php?action=query&list=geosearch&gsradius=10000&gscoord=" + latitude
                    + "%7C" + longitude + "&format=json");
} catch (MalformedURLException e) {
    e.printStackTrace();
}

JSONTokener wikiTokener = null;

try {
    wikiTokener = new JSONTokener(getWikiURL.openStream());
} catch (Exception e1) {
    e1.printStackTrace();
}

JSONObject wikiGeoResult = new JSONObject(wikiTokener);
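The request above hard-codes a 10 km radius. If the radius should be configurable, a small helper along these lines could build the same URL. It is only a sketch and not part of the Loklak code: the class name GeoSearchUrl is hypothetical, and it relies on the GeoData documentation's range for gsradius (10 to 10,000 metres) to reject bad values before any request is sent.

```java
import java.math.BigDecimal;

// Hypothetical helper (not part of loklak_server): builds a geosearch request URL.
final class GeoSearchUrl {
    private static final String ENDPOINT = "https://en.wikipedia.org/w/api.php";

    static String build(double latitude, double longitude, int radiusMetres) {
        if (latitude < -90 || latitude > 90 || longitude < -180 || longitude > 180) {
            throw new IllegalArgumentException("coordinates out of range");
        }
        // the GeoData API accepts gsradius values from 10 to 10000 metres
        if (radiusMetres < 10 || radiusMetres > 10000) {
            throw new IllegalArgumentException("gsradius must be between 10 and 10000 metres");
        }
        // toPlainString avoids scientific notation such as 1.0E-4 for tiny coordinates;
        // "%7C" is the percent-encoded "|" separating latitude and longitude in gscoord
        return ENDPOINT + "?action=query&list=geosearch"
                + "&gsradius=" + radiusMetres
                + "&gscoord=" + BigDecimal.valueOf(latitude).toPlainString()
                + "%7C" + BigDecimal.valueOf(longitude).toPlainString()
                + "&format=json";
    }

    public static void main(String[] args) {
        // roughly the coordinates of Singapore
        System.out.println(build(1.29, 103.85, 10000));
    }
}
```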

When implemented as a console service for Loklak, the servlet was registered as /api/wikigeodata.json?place={place-name}, and it can be queried through the console API, for example: http://localhost:9000/api/console.json?q=SELECT * FROM wikigeodata WHERE place='Singapore';
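A browser will percent-encode the console URL above for you, but when the request is built in code the SQL statement has to be encoded explicitly. A minimal sketch using only the JDK; the class name is hypothetical, and the host and port are simply those of the example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of loklak_server): builds a console query URL.
final class ConsoleQueryUrl {
    // local server address taken from the example in the text
    private static final String CONSOLE = "http://localhost:9000/api/console.json?q=";

    static String build(String sql) {
        // spaces become '+', and quotes, '=' and ';' become %XX escapes
        return CONSOLE + URLEncoder.encode(sql, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("SELECT * FROM wikigeodata WHERE place='Singapore';"));
    }
}
```

Running main prints http://localhost:9000/api/console.json?q=SELECT+*+FROM+wikigeodata+WHERE+place%3D%27Singapore%27%3B.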

Presto!!

The result is a JSON object with a list of Wikipedia articles, each carrying its page ID, title and distance from the given coordinates.


The page IDs obtained this way can then easily be used to display the articles through the URL template https://en.wikipedia.org/?curid={pageid}.
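Turning the page IDs into links takes one line per ID. The sketch below just applies the curid template from the text; the class and method names are hypothetical, and the IDs passed in main are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of loklak_server): maps geosearch page IDs to article URLs.
final class WikiArticleLinks {
    // template from the text: https://en.wikipedia.org/?curid={pageid}
    static String articleUrl(long pageId) {
        return "https://en.wikipedia.org/?curid=" + pageId;
    }

    static List<String> articleUrls(List<Long> pageIds) {
        List<String> urls = new ArrayList<>();
        for (long id : pageIds) {
            urls.add(articleUrl(id));
        }
        return urls;
    }

    public static void main(String[] args) {
        // made-up page IDs, for illustration only
        System.out.println(articleUrls(List.of(12345L, 67890L)));
    }
}
```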

And so another feature joins our ever-diversifying Loklak server.

Questions, feedback, suggestions appreciated 🙂


Loklak fuels Open Event

First, some general background…

The FOSSASIA Open Event Project aims to make it easier for events, conferences and tech summits to create Web and Mobile (currently Android only) micro apps. The project comprises a data schema for storing event details; a server and web front-end that event organizers use to view, modify and update this data; a mobile-friendly web-app client that shows the event data to attendees; and an Android app template used to generate a specific app for each event.

Eventbrite, in turn, is the world’s largest self-service ticketing platform. It allows anyone to create, share and find events such as music festivals, marathons, conferences, hackathons, air guitar contests, political rallies, fundraisers and gaming competitions.

Kaboom!

Loklak now has a dedicated Eventbrite scraper API, which takes the URL of an event listing on eventbrite.com and outputs the JSON files required by the Open Event Generator, viz. events.json, organizer.json, user.json, microlocations.json, sessions.json, session_types.json, tracks.json, sponsors.json, speakers.json, social_links.json and custom_forms.json (details: Open Event Server: API Documentation).

What do we do differently from using the Eventbrite API? No authentication tokens are required. This fits in perfectly with the Loklak mission.

To achieve this, I simply parsed the HTML pages using my favorite JSoup: The Java HTML parser library, because it provides a very convenient API for extracting and manipulating data, and for scraping and parsing all varieties of HTML from a URL.

The API call format is: http://loklak.org/api/eventbritecrawler.json?url=https://www.eventbrite.com/[event-name-and-id]
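When the call is built in code, it is safer to percent-encode the event URL, so that any '?', '&' or '#' inside it is not read as part of the crawler call itself. A sketch with a hypothetical helper class, using the event from this post as input:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of loklak_server): builds an eventbritecrawler call.
final class EventbriteCrawlerCall {
    private static final String API = "http://loklak.org/api/eventbritecrawler.json?url=";

    static String build(String eventUrl) {
        // encode the event URL so that its own '?', '&' or '#' cannot be mistaken
        // for parameters of the crawler call
        return API + URLEncoder.encode(eventUrl, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build(
                "https://www.eventbrite.de/e/global-health-security-focus-africa-tickets-25740798421"));
    }
}
```

Since the servlet container decodes query parameters, post.get("url", "") in the crawler sees the original, unencoded event URL.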

In return, we get all the details on the Eventbrite page as a JSONObject, and they are also stored in separately named files in a zipped folder [userHome + “/Downloads/EventBriteInfo”].


Event URL: https://www.eventbrite.de/e/global-health-security-focus-africa-tickets-25740798421

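API Call: http://loklak.org/api/eventbritecrawler.json?url=https://www.eventbrite.de/e/global-health-security-focus-africa-tickets-25740798421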

Output: the JSON object on screen, plus events.json, organizer.json, user.json, microlocations.json, sessions.json, session_types.json, tracks.json, sponsors.json, speakers.json, social_links.json and custom_forms.json written out locally in a zipped folder.


For reference, here is the code:

/**
 *  Eventbrite.com Crawler v2.0
 *  Copyright 19.06.2016 by Jigyasa Grover, @jig08
 *  This library is free software; you can redistribute it and/or
 *  modify it under the terms of the GNU Lesser General Public
 *  License as published by the Free Software Foundation; either
 *  version 2.1 of the License, or (at your option) any later version.
 *  This library is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 *  Lesser General Public License for more details.
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program in the file lgpl21.txt
 *  If not, see http://www.gnu.org/licenses/.
 */

package org.loklak.api.search;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.json.JSONArray;
import org.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.loklak.http.RemoteAccess;
import org.loklak.server.Query;

public class EventbriteCrawler extends HttpServlet {

	private static final long serialVersionUID = 5216519528576842483L;

	protected void doPost(HttpServletRequest request, HttpServletResponse response)
			throws ServletException, IOException {
		doGet(request, response);
	}

	protected void doGet(HttpServletRequest request, HttpServletResponse response)
			throws ServletException, IOException {
		Query post = RemoteAccess.evaluate(request);

		// manage DoS
		if (post.isDoS_blackout()) {
			response.sendError(503, "your request frequency is too high");
			return;
		}

		String url = post.get("url", "");

		Document htmlPage = null;

		try {
			htmlPage = Jsoup.connect(url).get();
		} catch (Exception e) {
			// without the event page there is nothing to scrape
			response.sendError(400, "could not fetch the event page: " + url);
			return;
		}

		String eventID = null;
		String eventName = null;
		String eventDescription = null;

		// TODO Fetch Event Color
		String eventColor = null;

		String imageLink = null;

		String eventLocation = null;

		String startingTime = null;
		String endingTime = null;

		String ticketURL = null;

		Elements tagSection = null;
		Elements tagSpan = null;
		String[][] tags = new String[5][2];
		String topic = null; // By default

		String closingDateTime = null;
		String schedulePublishedOn = null;
		JSONObject creator = new JSONObject();
		String email = null;

		Float latitude = null;
		Float longitude = null;

		String privacy = "public"; // By Default
		String state = "completed"; // By Default
		String eventType = "";

		eventID = htmlPage.getElementsByTag("body").attr("data-event-id");
		eventName = htmlPage.getElementsByClass("listing-hero-body").text();
		eventDescription = htmlPage.select("div.js-xd-read-more-toggle-view.read-more__toggle-view").text();

		eventColor = null;

		imageLink = htmlPage.getElementsByTag("picture").attr("content");

		eventLocation = htmlPage.select("p.listing-map-card-street-address.text-default").text();
		startingTime = htmlPage.getElementsByAttributeValue("property", "event:start_time").attr("content").substring(0,
		endingTime = htmlPage.getElementsByAttributeValue("property", "event:end_time").attr("content").substring(0,

		ticketURL = url + "#tickets";

		// TODO Tags to be modified to fit in the format of Open Event "topic"
		tagSection = htmlPage.getElementsByAttributeValue("data-automation", "ListingsBreadcrumbs");
		tagSpan = tagSection.select("span");
		topic = "";

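		int iterator = 0, k = 0;
		for (Element e : tagSpan) {
			if (k >= tags.length) {
				break; // tags only has room for 5 (name, link) pairs
			}
			if (iterator % 2 == 0) {
				tags[k][1] = "www.eventbrite.com"
						+ e.select("a.js-d-track-link.badge.badge--tag.l-mar-top-2").attr("href");
			} else {
				tags[k][0] = e.text();
				k++; // a pair is complete once its name follows its link
			}
			iterator++;
		}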

		creator.put("email", "");
		creator.put("id", "1"); // By Default

		latitude = Float
				.valueOf(htmlPage.getElementsByAttributeValue("property", "event:location:latitude").attr("content"));
		longitude = Float
				.valueOf(htmlPage.getElementsByAttributeValue("property", "event:location:longitude").attr("content"));

		// TODO This returns: "events.event" which is not supported by Open
		// Event Generator
		// eventType = htmlPage.getElementsByAttributeValue("property",
		// "og:type").attr("content");

		String organizerName = null;
		String organizerLink = null;
		String organizerProfileLink = null;
		String organizerWebsite = null;
		String organizerContactInfo = null;
		String organizerDescription = null;
		String organizerFacebookFeedLink = null;
		String organizerTwitterFeedLink = null;
		String organizerFacebookAccountLink = null;
		String organizerTwitterAccountLink = null;

		organizerName = htmlPage.select("a.js-d-scroll-to.listing-organizer-name.text-default").text().substring(4);
		organizerLink = url + "#listing-organizer";
		organizerProfileLink = htmlPage
				.getElementsByAttributeValue("class", "js-follow js-follow-target follow-me fx--fade-in is-hidden")
		organizerContactInfo = url + "#lightbox_contact";

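		Document orgProfilePage = null;

		try {
			orgProfilePage = Jsoup.connect(organizerProfileLink).get();
		} catch (Exception e) {
			e.printStackTrace();
			// fall back to an empty document so the selectors below yield empty values instead of throwing
			orgProfilePage = new Document("");
		}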

		organizerWebsite = orgProfilePage.getElementsByAttributeValue("class", "l-pad-vert-1 organizer-website").text();
		organizerDescription = orgProfilePage.select("div.js-long-text.organizer-description").text();
		organizerFacebookFeedLink = organizerProfileLink + "#facebook_feed";
		organizerTwitterFeedLink = organizerProfileLink + "#twitter_feed";
		organizerFacebookAccountLink = orgProfilePage.getElementsByAttributeValue("class", "fb-page").attr("data-href");
		organizerTwitterAccountLink = orgProfilePage.getElementsByAttributeValue("class", "twitter-timeline")

		JSONArray socialLinks = new JSONArray();

		JSONObject fb = new JSONObject();
		fb.put("id", "1");
		fb.put("name", "Facebook");
		fb.put("link", organizerFacebookAccountLink);

		JSONObject tw = new JSONObject();
		tw.put("id", "2");
		tw.put("name", "Twitter");
		tw.put("link", organizerTwitterAccountLink);

		socialLinks.put(fb);
		socialLinks.put(tw);

		JSONArray jsonArray = new JSONArray();

		JSONObject event = new JSONObject();
		event.put("event_url", url);
		event.put("id", eventID);
		event.put("name", eventName);
		event.put("description", eventDescription);
		event.put("color", eventColor);
		event.put("background_url", imageLink);
		event.put("closing_datetime", closingDateTime);
		event.put("creator", creator);
		event.put("email", email);
		event.put("location_name", eventLocation);
		event.put("latitude", latitude);
		event.put("longitude", longitude);
		event.put("start_time", startingTime);
		event.put("end_time", endingTime);
		event.put("logo", imageLink);
		event.put("organizer_description", organizerDescription);
		event.put("organizer_name", organizerName);
		event.put("privacy", privacy);
		event.put("schedule_published_on", schedulePublishedOn);
		event.put("state", state);
		event.put("type", eventType);
		event.put("ticket_url", ticketURL);
		event.put("social_links", socialLinks);
		event.put("topic", topic);

		JSONObject org = new JSONObject();
		org.put("organizer_name", organizerName);
		org.put("organizer_link", organizerLink);
		org.put("organizer_profile_link", organizerProfileLink);
		org.put("organizer_website", organizerWebsite);
		org.put("organizer_contact_info", organizerContactInfo);
		org.put("organizer_description", organizerDescription);
		org.put("organizer_facebook_feed_link", organizerFacebookFeedLink);
		org.put("organizer_twitter_feed_link", organizerTwitterFeedLink);
		org.put("organizer_facebook_account_link", organizerFacebookAccountLink);
		org.put("organizer_twitter_account_link", organizerTwitterAccountLink);

		JSONArray microlocations = new JSONArray();

		JSONArray customForms = new JSONArray();

		JSONArray sessionTypes = new JSONArray();

		JSONArray sessions = new JSONArray();

		JSONArray sponsors = new JSONArray();

		JSONArray speakers = new JSONArray();

		JSONArray tracks = new JSONArray();

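		// collect the scraped event and organizer details for the response
		jsonArray.put(event);
		jsonArray.put(org);

		JSONObject eventBriteResult = new JSONObject();
		eventBriteResult.put("Event Brite Event Details", jsonArray);

		// print JSON
		response.setContentType("application/json;charset=UTF-8");
		PrintWriter sos = response.getWriter();
		sos.print(eventBriteResult.toString(2));
		sos.println();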

		String userHome = System.getProperty("user.home");
		String path = userHome + "/Downloads/EventBriteInfo";

		new File(path).mkdir();

		writeJson(path + "/event.json", event);
		writeJson(path + "/org.json", org);
		writeJson(path + "/social_links.json", socialLinks);
		writeJson(path + "/microlocations.json", microlocations);
		writeJson(path + "/custom_forms.json", customForms);
		writeJson(path + "/session_types.json", sessionTypes);
		writeJson(path + "/sessions.json", sessions);
		writeJson(path + "/sponsors.json", sponsors);
		writeJson(path + "/speakers.json", speakers);
		writeJson(path + "/tracks.json", tracks);

		try {
			// the archive must be a file (~/Downloads/EventBriteInfo.zip), not the Downloads directory itself
			zipFolder(path, path + ".zip");
		} catch (Exception e1) {
			e1.printStackTrace();
		}
	}

	// writes one JSON object or array to a file, logging write errors instead of failing the request
	private static void writeJson(String fileName, Object json) {
		try (FileWriter file = new FileWriter(fileName)) {
			file.write(json.toString());
		} catch (IOException e1) {
			e1.printStackTrace();
		}
	}


	static public void zipFolder(String srcFolder, String destZipFile) throws Exception {
		// try-with-resources closes the archive, which also writes its central directory
		try (ZipOutputStream zip = new ZipOutputStream(new FileOutputStream(destZipFile))) {
			addFolderToZip("", srcFolder, zip);
		}
	}

	static private void addFileToZip(String path, String srcFile, ZipOutputStream zip) throws Exception {
		File folder = new File(srcFile);
		if (folder.isDirectory()) {
			addFolderToZip(path, srcFile, zip);
		} else {
			byte[] buf = new byte[1024];
			int len;
			try (FileInputStream in = new FileInputStream(srcFile)) {
				zip.putNextEntry(new ZipEntry(path + "/" + folder.getName()));
				while ((len = in.read(buf)) > 0) {
					zip.write(buf, 0, len);
				}
				zip.closeEntry();
			}
		}
	}

	static private void addFolderToZip(String path, String srcFolder, ZipOutputStream zip) throws Exception {
		File folder = new File(srcFolder);

		for (String fileName : folder.list()) {
			if (path.equals("")) {
				addFileToZip(folder.getName(), srcFolder + "/" + fileName, zip);
			} else {
				addFileToZip(path + "/" + folder.getName(), srcFolder + "/" + fileName, zip);
			}
		}
	}
}


Check out https://github.com/loklak/loklak_server for more…


Feel free to ask questions regarding the above code snippet.

Also, stay tuned for the next part of this post, which will cover using the scraped information for Open Event.

Feedback and Suggestions welcome 🙂


The apps page gets a makeover

The apps section has undergone a huge transformation. It’s amazing to see the number of Loklak apps shoot up: there are now more than 20 apps built using the various APIs Loklak provides. In order to manage the apps dynamically, apps.json was introduced.


Initially, there was a simple card layout showing just the name and description of each app.


The new tile design shows a screenshot of each app, and the app’s details appear when the mouse hovers over the screenshot. The apps are categorized by the type of API they use, and the left navigation bar lists all the categories accordingly.

This page is dynamic and takes its data from a new JSON object served by the apps API. The object contains all the apps and their details, along with fields such as the list of categories and the apps under each category.


The “categories” array is used to build the list of categories in the left navigation bar.


The category object is used to get the list of apps under each category, which in turn is used to display the apps’ details when a category is clicked.

This JSON structure made it easy to sort the apps into sections and gave the page a new look. The tiles were designed entirely with standard CSS classes, and the page is responsive and dynamic.
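The categories list and the per-category app lists described above can both be derived from a flat list of apps. The page itself does this in the browser; the Java sketch below only illustrates the data structure, and the App type, its fields and the sample data are assumptions rather than the actual apps.json schema.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of an apps.json entry; the real schema may use other field names.
final class AppCatalog {
    static final class App {
        final String name;
        final String category;

        App(String name, String category) {
            this.name = name;
            this.category = category;
        }
    }

    // category -> names of the apps in it; the key set doubles as the "categories" list
    // for the navigation bar. LinkedHashMap keeps categories in first-seen order.
    static Map<String, List<String>> byCategory(List<App> apps) {
        Map<String, List<String>> index = new LinkedHashMap<>();
        for (App app : apps) {
            index.computeIfAbsent(app.category, c -> new ArrayList<>()).add(app.name);
        }
        return index;
    }

    public static void main(String[] args) {
        // made-up sample data
        List<App> apps = List.of(new App("app-one", "search"), new App("app-two", "scraper"),
                new App("app-three", "search"));
        System.out.println(byCategory(apps)); // {search=[app-one, app-three], scraper=[app-two]}
    }
}
```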


Let’s ‘Meetup’ with Loklak

Loklak has already started to expand beyond the realm of Twitter and is working its way towards building an extensive social network.

Now, Loklak aims to bring in data crawled from meetup.com to create a close-knit community.

Chiming in with Meetup’s mission to revitalize local communities and help people around the world self-organize, Loklak strives to revolutionize today’s social networking scene.


To extract information such as the group name, location, description, group topics/tags and recent meetups (day, date, time, RSVPs, reviews, introductory lines etc.) about a specific group on meetup.com, I used the URL http://www.meetup.com/<group-name>/ and scraped the information from the HTML page itself.

As in previous experiments with other web pages, I used JSoup: The Java HTML parser library in loklak_server. It provides a very convenient API for extracting and manipulating data, and for scraping and parsing HTML from a URL. JSoup is designed to deal with all varieties of HTML, so for now it is considered a suitable choice.

The scraped information is stored in the multi-dimensional array recentMeetupsResult[][], and the data inside can then be used as needed.
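The row layout of recentMeetupsResult can be illustrated on its own. This minimal sketch assumes, as the scraper's comments suggest, that each meetup is rendered as three consecutive <p> elements (date and time, attendance and review, information); the class name is hypothetical, and unlike a fixed 100-row array it sizes the result to the input.

```java
import java.util.List;

// Hypothetical helper (not part of loklak_server): lays out scraped <p> texts
// as rows in the style of recentMeetupsResult.
final class MeetupRows {
    // assumed page layout: each meetup is three consecutive <p> elements
    // [0] date & time, [1] attendance & review, [2] information
    static final int FIELDS_PER_MEETUP = 3;

    static String[][] toRows(List<String> paragraphs) {
        // round up so a trailing, incomplete meetup still gets a row (missing fields stay null)
        int rows = (paragraphs.size() + FIELDS_PER_MEETUP - 1) / FIELDS_PER_MEETUP;
        String[][] result = new String[rows][FIELDS_PER_MEETUP];
        for (int n = 0; n < paragraphs.size(); n++) {
            result[n / FIELDS_PER_MEETUP][n % FIELDS_PER_MEETUP] = paragraphs.get(n);
        }
        return result;
    }
}
```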

A sample code snippet for reference:

/**
 *  Meetups Scraper
 *  By Jigyasa Grover, @jig08
 */

package org.loklak.harvester;

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.*;
import org.jsoup.select.Elements;

public class MeetupsScraper {
	public static void main(String args[]){
		Document meetupHTML = null;
		String meetupGroupName = "Women-Who-Code-Delhi";
		// fetch group name here
		Element groupDescription = null;
		String groupDescriptionString = null;
		Element topicList = null;
		Elements topicListStrings = null;
		String[] topicListArray = new String[100];
		Integer numberOfTopics = 0;
		Element recentMeetupsSection = null;
		Elements recentMeetupsList = null;
		Integer numberOfRecentMeetupsShown = 0;
		Integer i = 0, j = 0;
		String recentMeetupsResult[][] = new String[100][3];
		// recentMeetupsResult[i][0] == date && time
		// recentMeetupsResult[i][1] == Attendance && Review
		// recentMeetupsResult[i][2] == Information
		try {
			meetupHTML = Jsoup.connect("http://www.meetup.com/" + meetupGroupName).userAgent("Mozilla").get();
			groupDescription = meetupHTML.getElementById("groupDesc");
			groupDescriptionString = groupDescription.text();
			System.out.println(meetupGroupName + "\n\tGroup Description: \n\t\t" + groupDescriptionString);
			topicList = meetupHTML.getElementById("topic-box-2012");
			topicListStrings = topicList.getElementsByTag("a");
			int p = 0;
			for(Element topicListStringsIterator : topicListStrings){
				topicListArray[p] = topicListStringsIterator.text();
				p++;
			}
			numberOfTopics = p;
			System.out.println("\nGroup Topics:");
			for(int l = 0; l<numberOfTopics; l++){
				System.out.println("\n\tTopic Number "+ l + " : " + topicListArray[l]);
			}
			recentMeetupsSection = meetupHTML.getElementById("recentMeetups");
			recentMeetupsList = recentMeetupsSection.getElementsByTag("p");
			i = 0;
			j = 0;
			// every three consecutive <p> elements describe one meetup
			for(Element recentMeetups : recentMeetupsList ){
				if(j == 3){
					i++;
					j = 0;
				}
				recentMeetupsResult[i][j] = recentMeetups.text();
				j++;
			}
			numberOfRecentMeetupsShown = recentMeetupsList.isEmpty() ? 0 : i + 1;
			for(int k = 0; k < numberOfRecentMeetupsShown; k++){
				System.out.println("\n\nRecent Meetup Number" + (k + 1) + " : \n" + 
						"\n\t Date & Time: " + recentMeetupsResult[k][0] + 
						"\n\t Attendance: " + recentMeetupsResult[k][1] + 
						"\n\t Information: " + recentMeetupsResult[k][2]);
			}
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
}

Here, an HTTP connection is simply established, and the text is extracted with element.text() from inside specific tags, located through identifiers such as classes or ids. The tags holding the required information were identified by exploring the web page’s HTML source code.

The above yields:

	Group Description: 
		Mission: Women Who Code is a global nonprofit organization dedicated to inspiring women to excel in technology careers by creating a global, connected community of women in technology. The organization tripled in 2013 and has grown to be one of the largest communities of women engineers in the world. Empowerment: Women Who code is a professional community for women in tech. We provide an avenue for women to pursue a career in technology, help them gain new skills and hone existing skills for professional advancement, and foster environments where networking and mentorship are valued. Key Initiatives: - Free technical study groups - Events featuring influential tech industry experts and investors - Hack events - Career and leadership development Current and aspiring coders are welcome.  Bring your laptop and a friend!  Support Women Who Code: Donating to Women Who Code, Inc. (#46-4218859) directly impacts our ability to efficiently run this growing organization, helps us produce new programs that will increase our reach, and enables us to expand into new cities around the world ensuring that women and girls everywhere have the opportunity to pursue a career in technology. Women Who Code (WWCode) is dedicated to providing an empowering experience for everyone who participates in or supports our community, regardless of gender, gender identity and expression, sexual orientation, ability, physical appearance, body size, race, ethnicity, age, religion, or socioeconomic status. Because we value the safety and security of our members and strive to have an inclusive community, we do not tolerate harassment of members or event participants in any form. Our Code of Conduct applies to all events run by Women Who Code, Inc. If you would like to report an incident or contact our leadership team, please submit an incident report form. WomenWhoCode.com

Group Topics:

	Topic Number 0 : Django

	Topic Number 1 : Web Design

	Topic Number 2 : Ruby

	Topic Number 3 : HTML5

	Topic Number 4 : Women Programmers

	Topic Number 5 : JavaScript

	Topic Number 6 : Python

	Topic Number 7 : Women in Technology

	Topic Number 8 : Android Development

	Topic Number 9 : Mobile Technology

	Topic Number 10 : iOS Development

	Topic Number 11 : Women Who Code

	Topic Number 12 : Ruby On Rails

	Topic Number 13 : Computer programming

	Topic Number 14 : WWC

Recent Meetup Number1 : 

	 Date & Time: April 2 · 10:30 AM
	 Attendance: 13 Women Who Code-rs | 5.001
	 Information: Brought to you in collaboration with Women Techmakers Delhi.According to a survey, only 11% of open source participants are women. People find it intimidating to get... Learn more

Recent Meetup Number2 : 

	 Date & Time: March 3 · 3:00 PM
	 Attendance: 21 Women Who Code-rs | 5.001
	 Information: “Behold, the number five is at hand. Grab it and shake and harness the power of networking.” Women Who Code Delhi is proud to present Social Hack Eve, a networking... Learn more

Recent Meetup Number3 : 

	 Date & Time: Oct 18, 2015 · 9:00 AM
	 Attendance: 20 Women Who Code-rs | 4.502
	 Information: Hello Ladies :) Google Women Techmakers is looking for women techies to present a talk in one of the segments of Google DevFest Delhi 2015 planned for October 18, 2015... Learn more

Recent Meetup Number4 : 

	 Date & Time: Jul 5, 2015 · 12:00 PM
	 Attendance: 24 Women Who Code-rs | 4.001 | 1 Photo
	 Information: Agenda: Learning how to use and develop open source software, and contribute to huge existing open source projects.A series of talks by some of this year’s GSoC... Learn more

This sample retrieves only text. More advanced versions could also extract the hyperlinks and multimedia embedded in the web page and integrate the extracted information in a suitable format.

Watch this space for more implementation details of the crawlers.
Feedback and Suggestions welcome.


Loklak ShuoShuo: Another feather in the cap !

Work on Loklak Weibo is still ongoing to extract the desired information, and another post explaining the intricacies of the implementation will be up soon.

Meanwhile, an attempt was made to parse the HTML page of QQShuoShuo.com (another Chinese Twitter-like service).

Just like last time, the major challenge is understanding the Chinese annotations, especially coming from a non-Chinese background. Google Translate helps me test the retrieved data by letting me match each phrase and/or line.

I used JSoup: The Java HTML parser library, which helps with extracting and manipulating data, and with scraping and parsing HTML from a URL. JSoup is designed to deal with all varieties of HTML, so for now it is considered a suitable choice.


/**
 *  Shuoshuo Crawler
 *  By Jigyasa Grover, @jig08
 */

package org.loklak.harvester;

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.*;
import org.jsoup.select.Elements;

public class ShuoshuoCrawler {
    public static void main(String args[]){

        Document shuoshuoHTML = null;
        Element recommendedTalkBox = null;
        Elements recommendedTalksList = null;
        String recommendedTalksResult[] = new String[100];
        Integer numberOfrecommendedTalks = 0;
        Integer i = 0;

        try {
            shuoshuoHTML = Jsoup.connect("http://www.qqshuoshuo.com/").get();

            // the recommended talks live in the element with id "list2", one <li> per talk
            recommendedTalkBox = shuoshuoHTML.getElementById("list2");
            recommendedTalksList = recommendedTalkBox.getElementsByTag("li");

            for (Element recommendedTalks : recommendedTalksList) {
                //System.out.println("\nLine: " + recommendedTalks.text());
                recommendedTalksResult[i] = recommendedTalks.text();
                i++;
            }
            numberOfrecommendedTalks = i;
            System.out.println("Total Recommended Talks: " + numberOfrecommendedTalks);
            for(int k=0; k<numberOfrecommendedTalks; k++){
                System.out.println("Recommended Talk " + k + ": " + recommendedTalksResult[k]);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}


QQ Recommended Talks from qqshuoshuo.com are now stored as an array of Strings.

Total Recommended Talks: 10
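Recommended Talk 0: 不会在意无视我的人,不会忘记帮助过我的人,不会去恨真心爱过我的人。
    (I won't care about those who ignore me, won't forget those who have helped me, and won't hate those who truly loved me.)
Recommended Talk 1: 喜欢一个人是一种感觉,不喜欢一个人却是事实。事实容易解释,感觉却难以言喻。
    (Liking someone is a feeling, but not liking someone is a fact. Facts are easy to explain, but feelings are hard to put into words.)
Recommended Talk 2: 一个人容易从别人的世界走出来却走不出自己的沙漠
    (It's easy to walk out of someone else's world, yet impossible to walk out of your own desert.)
Recommended Talk 3: 有什么了不起,不就是幸福在左边,我站在了右边?
    (What's the big deal? Isn't it just that happiness was on the left and I stood on the right?)
Recommended Talk 4: 希望我跟你的爱,就像新闻联播一样没有大结局
    (I hope our love is like Xinwen Lianbo, the nightly news: never reaching a grand finale.)
Recommended Talk 5: 你会遇到别的女子和她举案齐眉,而我自会有别的男子与我白首相携。
    (You will meet another woman and live with her in mutual respect, and I will find another man to grow old with, hand in hand.)
Recommended Talk 6: 既然爱,为什么不说出口,有些东西失去了,就再也回不来了!
    (If you love, why not say it out loud? Some things, once lost, never come back!)
Recommended Talk 7: 凡事都有可能,永远别说永远。
    (Anything is possible; never say never.)
Recommended Talk 8: 都是因为爱,而喜欢上了怀旧;都是因为你,而喜欢上了怀念。
    (It's all because of love that I grew fond of nostalgia; it's all because of you that I grew fond of remembering.)
Recommended Talk 9: 爱是老去,爱是新生,爱是一切,爱是你。
    (Love is growing old, love is rebirth, love is everything, love is you.)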

A similar approach can now be used for the Latest QQ Talks and the QQ Talk Leaderboard.

Watch this space for upcoming details on applying this technique to parse the entire page and get the desired results…

Feedback and Suggestions welcome.


Know who tweeted the most!

We are all curious to find out who is leading! My app, called ‘Twitter Leaderboard’, uses the search API provided by Loklak. With it, you can find the top five tweeters whose tweets contain a given hashtag.

So here is a simple explanation of how ‘Twitter Leaderboard’ works. I built this app using AngularJS and the search API provided by Loklak. First, I collected the tweet data and parsed it for the required fields.

The leaderboard needed to show each tweeter’s name along with their tweet count. ‘Twitter Leaderboard’ also needed to let its users search for tweets based on the hashtag supplied in a query.

I wrote a small controller that makes an HTTP request to the Loklak search API and parses the retrieved object. I parsed the required data as follows:

  • data.statuses // The status objects containing the requested hashtag.
  • user = statuses[i].screen_name // Get all the screen names.
  • counts = {} // Object storing the number of tweets for each name.

After extracting the required data from the search results, the names are sorted by tweet count. And there we go: the results are pushed to the front end and displayed.
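The counting and sorting steps can be sketched as follows. The app itself is written in AngularJS; this Java version is only an illustration, assuming the screen names have already been pulled out of data.statuses, and the sample names in main are made up. Ties are broken alphabetically so that the ranking is deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Java rendering of the leaderboard logic (the real app uses AngularJS).
final class Leaderboard {
    // Counts tweets per screen name and returns the n most active tweeters.
    static List<Map.Entry<String, Integer>> top(List<String> screenNames, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String name : screenNames) {
            counts.merge(name, 1, Integer::sum); // same role as the counts = {} object
        }
        List<Map.Entry<String, Integer>> ranking = new ArrayList<>(counts.entrySet());
        // highest count first; equal counts are ordered alphabetically
        ranking.sort((a, b) -> a.getValue().equals(b.getValue())
                ? a.getKey().compareTo(b.getKey())
                : Integer.compare(b.getValue(), a.getValue()));
        return new ArrayList<>(ranking.subList(0, Math.min(n, ranking.size())));
    }

    public static void main(String[] args) {
        List<String> names = List.of("alice", "bob", "alice", "carol", "bob", "alice");
        System.out.println(top(names, 5)); // [alice=3, bob=2, carol=1]
    }
}
```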


Another way to improve the search results is to use Search Aggregations. This feature gives better counts by scaling up the search: it counts over all indexed tweets (more than a billion). Aggregations do not consider remote search results, so remote search is switched off with “source=cache”, and the individual search results are switched off with “count=0”. The aggregation request itself is hidden in “fields=…”, which names the fields you want aggregated. You can also reduce the result size by restricting your search to a certain period of time.
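Putting those parameters together, an aggregation request could be built as below. This is a sketch with a hypothetical helper class: the field name "mentions" and the Twitter-style since:/until: operators used to restrict the time range are assumptions here, so check the Loklak API documentation for the fields and operators it actually supports.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of loklak_server) for an aggregation request.
final class AggregationQuery {
    static String build(String query, String fields) {
        return "http://loklak.org/api/search.json"
                + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&source=cache" // local index only, no remote search
                + "&count=0"      // no individual tweets, only the aggregation
                + "&fields=" + URLEncoder.encode(fields, StandardCharsets.UTF_8); // fields to aggregate on
    }

    public static void main(String[] args) {
        // assumed since:/until: syntax restricts the time range
        System.out.println(build("#fossasia since:2016-06-01 until:2016-06-30", "mentions"));
    }
}
```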

An upgrade of the app using aggregations will follow soon.
