Loklak Weibo: Now going beyond Twitter!

So far, Loklak has done a wonderful job of collecting billions of tweets from Twitter. The highlight of this service has been anonymous search of Twitter without the use of any authentication key.

The next step is to go beyond Twitter and collect data from Chinese Twitter-like services, for instance Weibo.com.

The major challenge, however, is understanding the Chinese annotations, especially for someone from a non-Chinese background. Now that I have the support of a Chinese friend from the Loklak community, we hope the task will be easier 🙂

The trick is simple: parse the HTML page of the search results. It has been suggested to use JSoup, the Java HTML parser library, in loklak_server. JSoup provides a very convenient API for fetching a URL, parsing its HTML, and extracting and manipulating the data. Because it is designed to deal with all varieties of HTML, including malformed markup, it is currently considered a suitable choice.

In our approach, the HTML page returned for the search URL http://s.weibo.com/weibo/<search-string> will be parsed, instead of taking the traditional route of calling the API with an authentication key.
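Since the search string goes directly into the URL path, a Chinese query has to be percent-encoded before it can be requested. The following is a minimal sketch using only the JDK (Java 10+ for the `Charset` overload of `URLEncoder.encode`). The class and method names `WeiboSearchUrl.build` are hypothetical, and encoding the query as UTF-8 in a path segment is an assumption about what Weibo accepts, not a documented requirement.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class WeiboSearchUrl {
    // Base search URL from the post; the exact scheme Weibo expects is an assumption.
    static final String BASE = "http://s.weibo.com/weibo/";

    // Hypothetical helper: percent-encodes the query as UTF-8 so that
    // Chinese characters and spaces are safe inside the URL path.
    public static String build(String query) {
        // URLEncoder applies form encoding: a literal '+' becomes "%2B" and a space
        // becomes '+'. Replacing '+' with "%20" therefore only affects spaces,
        // which is the correct encoding for a path segment.
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8)
                .replace("+", "%20");
        return BASE + encoded;
    }

    public static void main(String[] args) {
        System.out.println(build("Berlin"));
        System.out.println(build("柏林")); // "Berlin" written in Chinese
    }
}
```

The resulting string could then be passed to `Jsoup.connect(...)` instead of concatenating the raw query, so the request does not depend on the HTTP client handling non-ASCII characters itself.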

Sample code snippet to extract the title of the page:

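```java
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class WeiboTitle {
    public static void main(String[] args) {
        String q = "Berlin"; // the search query
        String title = null;
        try {
            // Fetch and parse the search results page for the query
            Document doc = Jsoup.connect("http://s.weibo.com/weibo/" + q).get();
            title = doc.title();
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println(title);
    }
}
```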

Watch this space for upcoming details on using this technique to parse the entire page and extract the desired results…

Feedback and suggestions are welcome.
