Loklak gives some Hackernews now!

As Kurt Loder, the famous American journalist, aptly put it: “Well, news is anything that’s interesting, that relates to what’s happening in the world, what’s happening in areas of the culture that would be of interest to your audience.”

And what better source than Hackernews (news.ycombinator.com) for the tech community? It helps the community by showing the latest and most important buzz, along with the links, sorted by popularity.

Screenshot from 2016-06-17 08:01:42

Loklak's next step was to bring this important piece of information into its server by collecting data from this source. Instead of scraping HTML pages, as we had been doing for other sources, we read the RSS feed.

Simply put, RSS (Really Simple Syndication) is a family of standard web feed formats for publishing frequently updated information such as blog entries, news headlines, audio, and video. The standard XML file format ensures compatibility with many different machines and programs. RSS feeds also benefit users who want timely updates from their favorite websites, or who want to aggregate data from many sites without signing in to each one.
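To make the XML structure concrete, here is a minimal sketch that reads the title and link of each item from an RSS 2.0 document using only the JDK's built-in DOM parser. The class name RssTitleSketch, the method readItems, and the sample XML are hypothetical and written just for this illustration; they are not part of Loklak and the sample is not copied from the live feed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class RssTitleSketch {

    // Hand-written sample in RSS 2.0 shape (assumption: not taken from the real feed).
    static final String SAMPLE =
        "<rss version=\"2.0\"><channel>"
        + "<title>Example Feed</title>"
        + "<item><title>First headline</title><link>http://example.com/1</link></item>"
        + "<item><title>Second headline</title><link>http://example.com/2</link></item>"
        + "</channel></rss>";

    // Returns "title -> link" for every <item> element in the document.
    static List<String> readItems(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList items = doc.getElementsByTagName("item");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            // Look up <title> and <link> inside this <item> only.
            String title = item.getElementsByTagName("title").item(0).getTextContent();
            String link = item.getElementsByTagName("link").item(0).getTextContent();
            out.add(title + " -> " + link);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        for (String line : readItems(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

Each item in the feed is a self-describing XML element, which is why reading a feed tends to be less fragile than scraping HTML whose layout can change at any time.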

The Hackernews RSS feed can be fetched from https://news.ycombinator.com/rss and looks something like this:

Screenshot from 2016-06-17 09:33:32

To keep things simple, I decided to use the ROME framework to build an RSS reader for Hackernews in Loklak.

As a quick introduction, ROME is a Java framework for RSS and Atom feeds. It’s open source and licensed under the Apache 2.0 license. ROME includes a set of parsers and generators for the various flavors of syndication feeds, as well as converters from one format to another. The parsers can give you back Java objects that are either specific to the format you want to work with, or a generic, normalized SyndFeed class that lets you work with the data without worrying about the incoming or outgoing feed type.

So I wrote a function, hackernewsRSSReader, which returns a JSONObject containing a JSONArray named “Hackernews RSS Feed”. Each JSONObject in that array represents one ‘news headline’ from the source.

The structure of the JSONObject result obtained is something like:

{
   "Hackernews RSS Feed":[
      {
         "Description":"SyndContentImpl.value=....",
         "Updated-Date":"null",
         "Link":"http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.116.241103",
         "RSS Feed":"https://news.ycombinator.com/rss",
         "Published-Date":"Wed Jun 15 13:30:33 EDT 2016",
         "Hash-Code":"1365366114",
         "Title":"Second Gravitational Wave Detected at LIGO",
         "URI":"http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.116.241103"
      },
     ......
      {
         "Description":"SyndContentImpl.value=....",
         "Updated-Date":"null",
         "Link":"http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-410-principles-of-autonomy-and-decision-making-fall-2010/lecture-notes/MIT16_410F10_lec20.pdf",
         "RSS Feed":"https://news.ycombinator.com/rss",
         "Published-Date":"Wed Jun 15 08:37:36 EDT 2016",
         "Hash-Code":"1649214835",
         "Title":"Intro to Hidden Markov Models (2010) [pdf]",
         "URI":"http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-410-principles-of-autonomy-and-decision-making-fall-2010/lecture-notes/MIT16_410F10_lec20.pdf"
      }
   ]
}

It includes information like Title, Link, HashCode, Published Date, Updated Date, URI and the Description of each “news headline”.

The next step after extracting the information is to write it to the back-end, retrieve it whenever required, parse it, and display it in a format suitable for the Loklak Web Client.

The JDOM and ROME jars must be added to the build path before implementing the RSS reader.

Here is the code for HackernewsRSSReader.java:
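As one small illustration of the parsing step, the sketch below turns a "Published-Date" string back into a date and prints it in ISO 8601 UTC form. It assumes the stored value keeps the java.util.Date.toString() layout seen in the sample JSON above. The class and method names (PublishedDateSketch, parsePublishedDate, toIsoUtc) are hypothetical and not part of Loklak.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class PublishedDateSketch {

    // Pattern that mirrors the layout produced by java.util.Date.toString().
    static final String DATE_TO_STRING_PATTERN = "EEE MMM dd HH:mm:ss zzz yyyy";

    // Parses a value such as "Wed Jun 15 13:30:33 EDT 2016".
    // Locale.US is fixed so English day and month names are recognized everywhere.
    static Date parsePublishedDate(String value) throws ParseException {
        SimpleDateFormat in = new SimpleDateFormat(DATE_TO_STRING_PATTERN, Locale.US);
        return in.parse(value);
    }

    // Formats the date as an ISO 8601 timestamp in UTC for display or storage.
    static String toIsoUtc(Date date) {
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.US);
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        return out.format(date);
    }

    public static void main(String[] args) throws ParseException {
        Date published = parsePublishedDate("Wed Jun 15 13:30:33 EDT 2016");
        System.out.println(toIsoUtc(published));
    }
}
```

Storing a time-zone-independent form like this could make it easier to sort headlines by publication time, though the original reader keeps the plain string.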

/**
 *  Hacker News RSS Reader
 *  By Jigyasa Grover, @jig08
 **/

package org.loklak.harvester;

import java.net.MalformedURLException;
import java.net.URL;
import java.util.List;
import org.json.JSONArray;
import org.json.JSONObject;
import com.sun.syndication.feed.synd.SyndEntry;
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.io.SyndFeedInput;
import com.sun.syndication.io.XmlReader;

public class HackernewsRSSReader {

	/*
	 * For HackernewsRSS, simply pass URL: https://news.ycombinator.com/rss
	 * in the function to obtain a corresponding JSON
	 */
	@SuppressWarnings("unchecked")
	public static JSONObject hackernewsRSSReader(String url) {

		JSONArray jsonArray = new JSONArray();

		try {
			URL feedUrl = new URL(url);

			// Fetch the feed and let ROME normalize it into a SyndFeed
			SyndFeedInput input = new SyndFeedInput();
			SyndFeed feed = input.build(new XmlReader(feedUrl));

			// getEntries() returns a raw List in this ROME version, hence the cast
			for (SyndEntry entry : (List<SyndEntry>) feed.getEntries()) {

				JSONObject jsonObject = new JSONObject();

				// String.valueOf() yields "null" for missing fields instead of
				// throwing a NullPointerException
				jsonObject.put("RSS Feed", url);
				jsonObject.put("Title", String.valueOf(entry.getTitle()));
				jsonObject.put("Link", String.valueOf(entry.getLink()));
				jsonObject.put("URI", String.valueOf(entry.getUri()));
				jsonObject.put("Hash-Code", Integer.toString(entry.hashCode()));
				jsonObject.put("Published-Date", String.valueOf(entry.getPublishedDate()));
				jsonObject.put("Updated-Date", String.valueOf(entry.getUpdatedDate()));
				// The string form of the SyndContent object starts with
				// "SyndContentImpl.value="; getValue() would return only the text
				jsonObject.put("Description", String.valueOf(entry.getDescription()));

				jsonArray.put(jsonObject);
			}
		} catch (MalformedURLException e) {
			// The URL string could not be parsed; the result stays empty
			e.printStackTrace();
		} catch (Exception e) {
			// Network or feed-parsing failure; return whatever was collected
			e.printStackTrace();
		}

		JSONObject rssFeed = new JSONObject();
		rssFeed.put("Hackernews RSS Feed", jsonArray);
		System.out.println(rssFeed);
		return rssFeed;
	}

}

 

Feel free to ask questions regarding the above code snippet.

Also, stay tuned for more posts on data crawling and parsing for Loklak.

Feedback and Suggestions welcome 🙂
