Scraping and Feeding IOT data sets into Loklak – Part 2

As the integrations of IOT services began, challenges came up, especially with scraping multiple pages at once, as was the case with the NOAA alerts and weather information of the US Government. To scrape this information for the live updates that happen every 5 minutes, the process had to be simplified so that a complex web scraper does not have to run over the pages every time, taking up precious cycles. What is really interesting about the website is the way the data for any given page can be modeled as XML. Using this, and leveraging the XML and other data conversion logic implemented previously for such tasks, I dug deeper into the workings of the website and realized that appending &y=0 to the alerts URL results in XML output. Here is an example of how this works:
https://alerts.weather.gov/cap/wwaatmget.php?x=AKC013&y=0
and
https://alerts.weather.gov/cap/wwaatmget.php?x=AKC013
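
As a quick sanity check, the XML variant can be fetched with a few lines of plain Java (a minimal sketch; the county code AKC013 is just the example from above and the class name is hypothetical):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class NoaaAlertFetch {
    public static void main(String[] args) throws Exception {
        // Appending &y=0 makes the endpoint return the CAP/XML feed
        // instead of the HTML page for the same county code.
        URL feed = new URL("https://alerts.weather.gov/cap/wwaatmget.php?x=AKC013&y=0");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(feed.openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // raw XML, ready for the XML-to-JSON conversion logic
            }
        }
    }
}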

[Screenshot: the NOAA alerts page]

The equivalent XML:

[Screenshot: the XML from the NOAA source]

Extracting this poses two different challenges: one is how to efficiently retrieve the information about the counties, and the other is how to construct the alert URLs. Perl to the rescue here!

sub process_statelist {
    my $html = `wget -O- -q https://alerts.weather.gov/`;
    $html =~ [email protected]*summary="Table [email protected]@s;
    $html =~ [email protected]*\s*@@s;
    $html =~ [email protected]\s*.*@@s;
    $html =~ [email protected]\s*@@s;
    %seen = ();

    while ( $html =~ [email protected]/(\w+?)\.php\?x=1">([^<]+)@sg ) {
        my $code = $1;
        my $name = $2;
        $name =~ s/'/\\'/g;
        $name =~ [email protected]\[email protected] @g;
        if (!exists($seen{$code})) {
            push @states_entries, $name;
            push @states_entryValues, $code;
        }
        $seen{$code} = 1;
    }
    open STATE, ">", "states.xml";
    print STATE <<EOF1;




    
EOF1
    foreach my $entry (@states_entries) {
        my $temp = $entry;
        $temp =~ s/'/\\'/g;
        $temp = escapeHTML($temp);
        print STATE "        $temp\n";
    }
    print STATE <<EOF2;
    
    
EOF2
    foreach my $entryValue (@states_entryValues) {
        my $temp = $entryValue;
        print STATE "        $temp\n";
    }
    print STATE <<EOF3;
    


EOF3
    close STATE;
    print "Wrote states.xml.\n";
}

This makes a request to the website and constructs the list of all the states present in the USA. Now it is time to construct their counties.

sub process_state {
    my $state = shift @_;
    if ( $state !~ /^[a-z]+$/ ) {
        print "Invalid state code: $state (skipped)\n";
        return;
    }

    my $html = `wget -O- -q https://alerts.weather.gov/cap/${state}.php?x=3`;

    my @entries     = ();
    my @entryValues = ();

    $html =~ [email protected].*@@s;
    while ( $html =~
[email protected]\s*?]+>\s*?]+>\s*?]+>\s*?\s*?]+>\s*?]+>([^<]+)\s*?\s*?]+>([^<]+)\s*?\s*@mg
      )
    {
        push @entries,     $2;
        push @entryValues, $1;
    }
    my $unittype = "Entire State";
    if ($state =~ /^mz/) {
        $unittype = "Entire Marine Zone";
    }
    if ($state eq "dc") {
        $unittype = "Entire District";
    }
    if (grep { $_ eq $state } qw(as gu mp um vi) ) {
        $unittype = "Entire Territory";
    }
    if ($state eq "us") {
        $unittype = "Entire Country";
    }
    if ($state eq "mzus") {
        $unittype = "All Marine Zones";
    }
    print COUNTIES <<EOF1;
    
        $unittype
EOF1
    foreach my $entry (@entries) {
        my $temp = $entry;
        $temp =~ s/'/\\'/g;
        $temp = escapeHTML($temp);
        print COUNTIES "        $temp\n";
    }
    print COUNTIES <<EOF2;
    
    
        https://alerts.weather.gov/cap/$state.php?x=0
EOF2
    foreach my $entryValue (@entryValues) {
        my $temp = $entryValue;
        $temp =~ s/'/\\'/g;
        $temp = escapeHTML($temp);
        print COUNTIES "        https://alerts.weather.gov/cap/wwaatmget.php?x=$temp&y=0\n";
    }
    print COUNTIES <<EOF3;
    
EOF3
    print "Processed counties from $state.\n";

}

And voilà, we now have a complete mapping between every single county and the alert URL required for that particular county. The NOAA scraper and parser has been quite a challenge, but it now provides the data in real time from the loklak server. The information can be passed through the XML parser written as a service at /api/xml2json.json, so developers can receive the information in their required format.


Time across seven seas…

It has been rightly said:

Time is of your own making
Its clock ticks in your head.
The moment you stop thought
Time too stops dead.


Hence to keep up with evolving times, Loklak has now introduced a new service for “time”.

The recently developed API provides the current time and day at the location queried by the user.

The /api/locationwisetime.json API scrapes the results from timeanddate.com using our favourite JSoup, as it provides a very convenient API for extracting and manipulating data, and for scraping and parsing HTML from a given URL.

In case of multiple locations with the same name, the countries are also provided, along with the corresponding day and time, wrapped up as a JSONObject.

A sample query could then be something like: http://loklak.org/api/locationwisetime.json?query=london

[Screenshot: JSON response for the sample query]

 

When implemented as a console service, this API can be used along with our dear SUSI by utilising API endpoints like: http://loklak.org/api/console.json?q=SELECT * FROM locationwisetime WHERE query='berlin';

[Screenshot: console response for the SUSI query]

LocationWiseTimeService.java for reference:


/**
 *  Location Wise Time
 *  timeanddate.com scraper
 *  Copyright 27.07.2016 by Jigyasa Grover, @jig08
 *
 *  This library is free software; you can redistribute it and/or
 *  modify it under the terms of the GNU Lesser General Public
 *  License as published by the Free Software Foundation; either
 *  version 2.1 of the License, or (at your option) any later version.
 *  
 *  This library is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 *  Lesser General Public License for more details.
 *  
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program in the file lgpl21.txt
 *  If not, see <http://www.gnu.org/licenses/>.
 */

package org.loklak.api.search;

import java.io.IOException;

import javax.servlet.http.HttpServletResponse;

import org.json.JSONArray;
import org.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.loklak.server.APIException;
import org.loklak.server.APIHandler;
import org.loklak.server.AbstractAPIHandler;
import org.loklak.server.Authorization;
import org.loklak.server.BaseUserRole;
import org.loklak.server.Query;
import org.loklak.susi.SusiThought;
import org.loklak.tools.storage.JSONObjectWithDefault;

public class LocationWiseTimeService extends AbstractAPIHandler implements APIHandler {

	private static final long serialVersionUID = -1495493690406247295L;

	@Override
	public String getAPIPath() {
		return "/api/locationwisetime.json";
	}

	@Override
	public BaseUserRole getMinimalBaseUserRole() {
		return BaseUserRole.ANONYMOUS;

	}

	@Override
	public JSONObject getDefaultPermissions(BaseUserRole baseUserRole) {
		return null;
	}

	@Override
	public JSONObject serviceImpl(Query call, HttpServletResponse response, Authorization rights,
			JSONObjectWithDefault permissions) throws APIException {
		String query = call.get("query", "");
		return locationWiseTime(query);
	}

	public static SusiThought locationWiseTime(String query) {
		
		Document html = null;

		JSONArray arr = new JSONArray();

		try {
			html = Jsoup.connect("http://www.timeanddate.com/worldclock/results.html?query=" + query).get();
		} catch (IOException e) {
			e.printStackTrace();
		}

		Elements locations = html.select("td");
		int i = 0;
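		// timeanddate.com lists the results in a table: even-indexed <td> cells hold the
		// location link, and the immediately following sibling <td> holds its local time.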
		for (Element e : locations) {
			if (i % 2 == 0) {
				JSONObject obj = new JSONObject();
				String l = e.getElementsByTag("a").text();
				obj.put("location", l);
				String t = e.nextElementSibling().text();
				obj.put("time", t);
				arr.put(obj);
			}
			i++;
		}
		
		SusiThought json = new SusiThought();
		json.setData(arr);
		return json;
	}

}

 

Hope this helps, and worth the “time” 😛

Feel free to ask questions regarding the above code snippet, shall be happy to assist.

Feedback and Suggestions welcome 🙂


Generic Scraper – The whole new design

Previously, the generic API converted the webpage source code into a JSON response with predefined tags. But this API needs a design model which can do much more than simply give the obvious result, so it has been redesigned completely. The architecture is shared in this blog post. Before reading further, you may want to go here and read about the existing API.

Input – The input to the generic scraper is the URL, from which the code internally scrapes the source code. This happens here (using JSoup):

Document page = Jsoup.connect(url).get();

page holds the source code extracted from the URL (the URL itself being a string). The source code is expected to be in this format:

[Screenshot: a typical page source in plain HTML]

But this is not the case with a few websites, where the source code is embedded inside script tags like this:

[Screenshot: page source embedded inside script tags]

The API must be able to process such source code and convert it into the required format; such corner cases must be handled.
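
One possible way to handle that corner case is sketched below, assuming the embedded data is a JSON blob assigned to a variable inside a script tag (the pattern and method names here are illustrative, not the actual loklak implementation):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ScriptEmbeddedSource {

    // Illustrative pattern: many pages assign their data to a JS variable, e.g. var data = {...};
    private static final Pattern EMBEDDED_JSON = Pattern.compile("=\\s*(\\{.*\\})\\s*;", Pattern.DOTALL);

    public static String extractEmbeddedJson(String url) throws Exception {
        Document page = Jsoup.connect(url).get();
        for (Element script : page.select("script")) {
            Matcher m = EMBEDDED_JSON.matcher(script.data()); // script.data() returns the raw script body
            if (m.find()) {
                return m.group(1); // candidate JSON blob, to be parsed further
            }
        }
        return null; // nothing embedded found; fall back to normal DOM scraping
    }
}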

Scraping – The formats of websites differ: blog posts are completely different from discussion posts. Such variations must be handled properly, and keeping that in mind the code is divided internally into five main sub-APIs.

Article API – The Article API is used to extract clean article text and other data from news articles, blog posts and other text-heavy pages. Retrieve the full-text, cleaned, related images and videos, author, date, tags—automatically, from any article on any site.

Product API – The Product API automatically extracts complete data from any shopping or e-commerce product page. Retrieve full pricing information, product IDs (SKU, UPC, MPN), images, product specifications, brand and more.

Image API – The Image API identifies the primary image(s) of a submitted web page and returns comprehensive information and metadata for each image.

Discussion API – The Discussion API automatically structures and extracts entire threads or lists of reviews/comments from most discussion pages, forums, and similarly structured web pages.

Advertisements API – The Advertisement API automatically structures the ad details on the website.

Along with the above APIs, which cover most of the requirements, the API also supports a general response: when none of the results matches the formats, it gives a generic JSON response. The sub-API to use must be specified by the user along with the input, in the following pattern.

/api/genericscraper.json?url=http://blog.loklak.net/convert-web-pages-into-structured-data/&type=article

The type covers the sub APIs like article, discussion, product and so on.
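
To make the routing concrete, here is a rough sketch of how the type parameter could dispatch to the matching sub-API (all class and method names below are assumptions for illustration, not the actual loklak code):

import org.json.JSONObject;

public class GenericScraperDispatcher {

    public static JSONObject scrape(String url, String type) {
        switch (type == null ? "" : type) {
            case "article":    return scrapeArticle(url);
            case "product":    return scrapeProduct(url);
            case "image":      return scrapeImage(url);
            case "discussion": return scrapeDiscussion(url);
            case "ads":        return scrapeAdvertisements(url);
            default:           return scrapeGeneric(url); // no match: generic JSON response
        }
    }

    // Stubs standing in for the five sub-APIs described above.
    private static JSONObject scrapeArticle(String url)        { return new JSONObject().put("type", "article").put("url", url); }
    private static JSONObject scrapeProduct(String url)        { return new JSONObject().put("type", "product").put("url", url); }
    private static JSONObject scrapeImage(String url)          { return new JSONObject().put("type", "image").put("url", url); }
    private static JSONObject scrapeDiscussion(String url)     { return new JSONObject().put("type", "discussion").put("url", url); }
    private static JSONObject scrapeAdvertisements(String url) { return new JSONObject().put("type", "ads").put("url", url); }
    private static JSONObject scrapeGeneric(String url)        { return new JSONObject().put("type", "generic").put("url", url); }
}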

Reusability – The API is designed to reuse the existing scraper responses which are specific to certain websites like WordPress, meetups.com, Amazon etc. Internally the API detects such URLs and calls the page-specific scrapers for the response. This helps in reusing the existing scrapers in loklak. You can go through this doc page for more details.

Technology Stack – The scraping is done using JSoup, which is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods. The API endpoint is registered and can be accessed in the given format:

http://loklak.org/api/genericscraper.json?url=""&type=""

Coming up (Part Two) is a deeper version of this blog post which explains the API implementation and the targeted tags in depth. Stay tuned for further updates.

 


The Making of the Console Service

SUSI, our very own personal digital assistant, has been up and running, giving quirky answers.

But behind all this are rules which train our cute bot and help her decide what answers to provide after parsing the question asked by the user.

The questions could range from formal or informal greetings and general queries about name, weather, date and time, to specific ones like details about some random GitHub profile, tweets and replies from Twitter or Weibo, election or football score predictions, or simply asking her to read an RSS feed or a WordPress blog for you.

The rules for her training are written after the specific service is implemented which helps her fetch the particular website/social network in question and scrape data out of it to present to her operator.

And to help us expand the scope and ability of this naive being, it would be helpful if users could extend her rule set. For this, it is required to make a console service for sites which do not provide access to information without OAuth.

To begin with, let us see how a console service can be made.

We start with a SampleService class, which basically contains the rudimentary scraper or the code fetching the data, defined in the package org.loklak.api.search.
It is made by extending the AbstractAPIHandler class, which itself extends the javax.servlet.http.HttpServlet class.
The SampleService class further implements the APIHandler interface.

A placeholder for SampleService class can be as:


package org.loklak.api.search;

/**
* import statements
**/

public class SampleService extends AbstractAPIHandler 
    implements APIHandler{

    private static final long serialVersionUID = 2142441326498450416L;
    /**
     * serialVersionUID could be 
     * auto-generated by the IDE used
    **/

    @Override
    public String getAPIPath() {
        return "/api/service.json";
        /**
         *Choose API path for the service in question
        **/
    }

    @Override
    public BaseUserRole getMinimalBaseUserRole() {
        return BaseUserRole.ANONYMOUS;
    }

    @Override
    public JSONObject getDefaultPermissions(BaseUserRole baseUserRole) {
        return null;
    }

    @Override
    public JSONObject serviceImpl(Query call, HttpServletResponse response, 
        Authorization rights, JSONObjectWithDefault permissions) 
        throws APIException {

        String url = call.get("url", "");
        /**
         *This would extract the argument that will be supplied
         * to the "url" parameter in the "call"
        **/
        return crawlerForService(url);

    }

    public static SusiThought crawlerForService(String url) {
        JSONArray arr = new JSONArray();
        
        /**
         * Crawler code or any other function which
         * returns a JSON Array **arr** goes in here 
        **/

        SusiThought json = new SusiThought();
        json.setData(arr);
        return json;
    }

}

 

The JSONArray built in the key function crawlerForService is wrapped up in a SusiThought, which is nothing but a piece of data that can be remembered. The structure, or the thought, can be modeled as a table which may be created using the retrieval of information from elsewhere with the current argument.

Now, to implement it as a console service, we include it in the ConsoleService class, which is defined in the same package org.loklak.api.search and similarly extends the AbstractAPIHandler class and implements the APIHandler interface.

Here, dbAccess is a static variable of the type SusiSkills where a skill is defined as the ability to inspire, to create thoughts from perception. The data structure of a skill set is a mapping from perception patterns to lambda expressions which induce thoughts.


package org.loklak.api.search;

/**
 * import statements go here
**/

public class ConsoleService extends AbstractAPIHandler 
    implements APIHandler {

    private static final long serialVersionUID = 8578478303032749879L;
    /**
     * serialVersionUID could be 
     * auto-generated by the IDE used
    **/

    @Override
    public BaseUserRole getMinimalBaseUserRole() { 
        return BaseUserRole.ANONYMOUS; 
    }

    @Override
    public JSONObject getDefaultPermissions(BaseUserRole baseUserRole) {
        return null;
    }

    public String getAPIPath() {
        return "/api/console.json";
    }

    public final static SusiSkills dbAccess = new SusiSkills();
    static {

        /**
         * Other "skills" are defined here
         * by "putting" them in "dbAccess"
        **/

        dbAccess.put(
            Pattern.compile("SELECT\\h+?(.*?)\\h+?FROM\\h+?sampleservice\\h+?WHERE\\h+?url\\h??=\\h??'(.*?)'\\h??;"),
            (flow, matcher) -> {
                /**
                 * SusiThought-s are fetched from the Services
                 * implemented as above
                **/
                SusiThought json = SampleService.crawlerForService(matcher.group(2));
                SusiTransfer transfer = new SusiTransfer(matcher.group(1));
                json.setData(transfer.conclude(json.getData()));
                return json;
            });
    }

    @Override
    public JSONObject serviceImpl(Query post, HttpServletResponse response, 
        Authorization rights, final JSONObjectWithDefault permissions) 
        throws APIException {

            String q = post.get("q", "");
            /**
             *This would extract the argument that will be supplied
             * to the "q" parameter in the "post" query
            **/
            

            return dbAccess.inspire(q);
        }

}

 

Now that the console service is made, an API endpoint for the same can correspond to: http://localhost:9000/api/console.json?q=SELECT * FROM sampleservice WHERE url = ' … ';

The above can serve as a placeholder for creating a console service, which shall enable SUSI to widen her horizons and become more intelligent.

So go ahead, make SUSI rules using it, and you are done!

If any aid is required in making SUSI Rules, stay tuned for the next post.

Come, contribute to Loklak and SUSI !


Loklak fuels Open Event

A bit of general background, to begin with…

The FOSSASIA Open Event Project aims to make it easier for events, conferences and tech summits to easily create web and mobile (only Android currently) micro apps. The project comprises a data schema for easily storing event details, a server and web front-end that are used by the event organizers to view, modify and update this data easily, a mobile-friendly web-app client to show the event data to attendees, and an Android app template which will be used to generate specific apps for each event.

And Eventbrite is the world’s largest self-service ticketing platform. It allows anyone to create, share and find events comprising music festivals, marathons, conferences, hackathons, air guitar contests, political rallies, fundraisers, gaming competitions etc.

Kaboom !

Loklak now has a dedicated Eventbrite scraper API which takes in the URL of the event listing on eventbrite.com and outputs the JSON files required by the Open Event Generator viz: events.json, organizer.json, user.json, microlocations.json, sessions.json, session_types.json, tracks.json, sponsors.json, speakers.json, social_links.json and custom_forms.json (details: Open Event Server : API Documentation).

What do we do differently from the Eventbrite API? No authentication tokens are required. This gels in perfectly with the Loklak missive.

To achieve this, I have simply parsed the HTML pages using my favorite JSoup, the Java HTML parser library, because it provides a very convenient API for extracting and manipulating data, and for scraping and parsing all varieties of HTML from a URL.

The API call format is as: http://loklak.org/api/eventbritecrawler.json?url=https://www.eventbrite.com/[event-name-and-id]

In return we get all the details from the Eventbrite page as a JSONObject, and the data also gets stored in separately named files in a zipped folder [userHome + "/Downloads/EventBriteInfo"].

Example:

Event URL: https://www.eventbrite.de/e/global-health-security-focus-africa-tickets-25740798421
[Screenshot: the Eventbrite event page]

API Call: 
http://loklak.org/api/eventbritecrawler.json?url=https://www.eventbrite.de/e/global-health-security-focus-africa-tickets-25740798421

Output: JSON object on screen, and events.json, organizer.json, user.json, microlocations.json, sessions.json, session_types.json, tracks.json, sponsors.json, speakers.json, social_links.json and custom_forms.json files written out in a zipped folder locally.

[Screenshots: the JSON output on screen and the generated files in the zipped folder]


For reference, the code is as follows:

/**
 *  Eventbrite.com Crawler v2.0
 *  Copyright 19.06.2016 by Jigyasa Grover, @jig08
 *
 *  This library is free software; you can redistribute it and/or
 *  modify it under the terms of the GNU Lesser General Public
 *  License as published by the Free Software Foundation; either
 *  version 2.1 of the License, or (at your option) any later version.
 *  
 *  This library is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 *  Lesser General Public License for more details.
 *  
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program in the file lgpl21.txt
 *  If not, see http://www.gnu.org/licenses/.
 */

package org.loklak.api.search;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.json.JSONArray;
import org.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.loklak.http.RemoteAccess;
import org.loklak.server.Query;

public class EventbriteCrawler extends HttpServlet {

	private static final long serialVersionUID = 5216519528576842483L;

	@Override
	protected void doPost(HttpServletRequest request, HttpServletResponse response)
			throws ServletException, IOException {
		doGet(request, response);
	}

	@Override
	protected void doGet(HttpServletRequest request, HttpServletResponse response)
			throws ServletException, IOException {
		Query post = RemoteAccess.evaluate(request);

		// manage DoS
		if (post.isDoS_blackout()) {
			response.sendError(503, "your request frequency is too high");
			return;
		}

		String url = post.get("url", "");

		Document htmlPage = null;

		try {
			htmlPage = Jsoup.connect(url).get();
		} catch (Exception e) {
			e.printStackTrace();
		}

		String eventID = null;
		String eventName = null;
		String eventDescription = null;

		// TODO Fetch Event Color
		String eventColor = null;

		String imageLink = null;

		String eventLocation = null;

		String startingTime = null;
		String endingTime = null;

		String ticketURL = null;

		Elements tagSection = null;
		Elements tagSpan = null;
		String[][] tags = new String[5][2];
		String topic = null; // By default

		String closingDateTime = null;
		String schedulePublishedOn = null;
		JSONObject creator = new JSONObject();
		String email = null;

		Float latitude = null;
		Float longitude = null;

		String privacy = "public"; // By Default
		String state = "completed"; // By Default
		String eventType = "";

		eventID = htmlPage.getElementsByTag("body").attr("data-event-id");
		eventName = htmlPage.getElementsByClass("listing-hero-body").text();
		eventDescription = htmlPage.select("div.js-xd-read-more-toggle-view.read-more__toggle-view").text();

		eventColor = null;

		imageLink = htmlPage.getElementsByTag("picture").attr("content");

		eventLocation = htmlPage.select("p.listing-map-card-street-address.text-default").text();
		startingTime = htmlPage.getElementsByAttributeValue("property", "event:start_time").attr("content").substring(0,
				19);
		endingTime = htmlPage.getElementsByAttributeValue("property", "event:end_time").attr("content").substring(0,
				19);

		ticketURL = url + "#tickets";

		// TODO Tags to be modified to fit in the format of Open Event "topic"
		tagSection = htmlPage.getElementsByAttributeValue("data-automation", "ListingsBreadcrumbs");
		tagSpan = tagSection.select("span");
		topic = "";

		int iterator = 0, k = 0;
		for (Element e : tagSpan) {
			if (iterator % 2 == 0) {
				tags[k][1] = "www.eventbrite.com"
						+ e.select("a.js-d-track-link.badge.badge--tag.l-mar-top-2").attr("href");
			} else {
				tags[k][0] = e.text();
				k++;
			}
			iterator++;
		}

		creator.put("email", "");
		creator.put("id", "1"); // By Default

		latitude = Float
				.valueOf(htmlPage.getElementsByAttributeValue("property", "event:location:latitude").attr("content"));
		longitude = Float
				.valueOf(htmlPage.getElementsByAttributeValue("property", "event:location:longitude").attr("content"));

		// TODO This returns: "events.event" which is not supported by Open
		// Event Generator
		// eventType = htmlPage.getElementsByAttributeValue("property",
		// "og:type").attr("content");

		String organizerName = null;
		String organizerLink = null;
		String organizerProfileLink = null;
		String organizerWebsite = null;
		String organizerContactInfo = null;
		String organizerDescription = null;
		String organizerFacebookFeedLink = null;
		String organizerTwitterFeedLink = null;
		String organizerFacebookAccountLink = null;
		String organizerTwitterAccountLink = null;

		organizerName = htmlPage.select("a.js-d-scroll-to.listing-organizer-name.text-default").text().substring(4);
		organizerLink = url + "#listing-organizer";
		organizerProfileLink = htmlPage
				.getElementsByAttributeValue("class", "js-follow js-follow-target follow-me fx--fade-in is-hidden")
				.attr("href");
		organizerContactInfo = url + "#lightbox_contact";

		Document orgProfilePage = null;

		try {
			orgProfilePage = Jsoup.connect(organizerProfileLink).get();
		} catch (Exception e) {
			e.printStackTrace();
		}

		organizerWebsite = orgProfilePage.getElementsByAttributeValue("class", "l-pad-vert-1 organizer-website").text();
		organizerDescription = orgProfilePage.select("div.js-long-text.organizer-description").text();
		organizerFacebookFeedLink = organizerProfileLink + "#facebook_feed";
		organizerTwitterFeedLink = organizerProfileLink + "#twitter_feed";
		organizerFacebookAccountLink = orgProfilePage.getElementsByAttributeValue("class", "fb-page").attr("data-href");
		organizerTwitterAccountLink = orgProfilePage.getElementsByAttributeValue("class", "twitter-timeline")
				.attr("href");

		JSONArray socialLinks = new JSONArray();

		JSONObject fb = new JSONObject();
		fb.put("id", "1");
		fb.put("name", "Facebook");
		fb.put("link", organizerFacebookAccountLink);
		socialLinks.put(fb);

		JSONObject tw = new JSONObject();
		tw.put("id", "2");
		tw.put("name", "Twitter");
		tw.put("link", organizerTwitterAccountLink);
		socialLinks.put(tw);

		JSONArray jsonArray = new JSONArray();

		JSONObject event = new JSONObject();
		event.put("event_url", url);
		event.put("id", eventID);
		event.put("name", eventName);
		event.put("description", eventDescription);
		event.put("color", eventColor);
		event.put("background_url", imageLink);
		event.put("closing_datetime", closingDateTime);
		event.put("creator", creator);
		event.put("email", email);
		event.put("location_name", eventLocation);
		event.put("latitude", latitude);
		event.put("longitude", longitude);
		event.put("start_time", startingTime);
		event.put("end_time", endingTime);
		event.put("logo", imageLink);
		event.put("organizer_description", organizerDescription);
		event.put("organizer_name", organizerName);
		event.put("privacy", privacy);
		event.put("schedule_published_on", schedulePublishedOn);
		event.put("state", state);
		event.put("type", eventType);
		event.put("ticket_url", ticketURL);
		event.put("social_links", socialLinks);
		event.put("topic", topic);
		jsonArray.put(event);

		JSONObject org = new JSONObject();
		org.put("organizer_name", organizerName);
		org.put("organizer_link", organizerLink);
		org.put("organizer_profile_link", organizerProfileLink);
		org.put("organizer_website", organizerWebsite);
		org.put("organizer_contact_info", organizerContactInfo);
		org.put("organizer_description", organizerDescription);
		org.put("organizer_facebook_feed_link", organizerFacebookFeedLink);
		org.put("organizer_twitter_feed_link", organizerTwitterFeedLink);
		org.put("organizer_facebook_account_link", organizerFacebookAccountLink);
		org.put("organizer_twitter_account_link", organizerTwitterAccountLink);
		jsonArray.put(org);

		JSONArray microlocations = new JSONArray();
		jsonArray.put(microlocations);

		JSONArray customForms = new JSONArray();
		jsonArray.put(customForms);

		JSONArray sessionTypes = new JSONArray();
		jsonArray.put(sessionTypes);

		JSONArray sessions = new JSONArray();
		jsonArray.put(sessions);

		JSONArray sponsors = new JSONArray();
		jsonArray.put(sponsors);

		JSONArray speakers = new JSONArray();
		jsonArray.put(speakers);

		JSONArray tracks = new JSONArray();
		jsonArray.put(tracks);

		JSONObject eventBriteResult = new JSONObject();
		eventBriteResult.put("Event Brite Event Details", jsonArray);

		// print JSON
		response.setCharacterEncoding("UTF-8");
		PrintWriter sos = response.getWriter();
		sos.print(eventBriteResult.toString(2));
		sos.println();

		String userHome = System.getProperty("user.home");
		String path = userHome + "/Downloads/EventBriteInfo";

		new File(path).mkdir();

		try (FileWriter file = new FileWriter(path + "/event.json")) {
			file.write(event.toString());
		} catch (IOException e1) {
			e1.printStackTrace();
		}

		try (FileWriter file = new FileWriter(path + "/org.json")) {
			file.write(org.toString());
		} catch (IOException e1) {
			e1.printStackTrace();
		}

		try (FileWriter file = new FileWriter(path + "/social_links.json")) {
			file.write(socialLinks.toString());
		} catch (IOException e1) {
			e1.printStackTrace();
		}

		try (FileWriter file = new FileWriter(path + "/microlocations.json")) {
			file.write(microlocations.toString());
		} catch (IOException e1) {
			e1.printStackTrace();
		}

		try (FileWriter file = new FileWriter(path + "/custom_forms.json")) {
			file.write(customForms.toString());
		} catch (IOException e1) {
			e1.printStackTrace();
		}

		try (FileWriter file = new FileWriter(path + "/session_types.json")) {
			file.write(sessionTypes.toString());
		} catch (IOException e1) {
			e1.printStackTrace();
		}

		try (FileWriter file = new FileWriter(path + "/sessions.json")) {
			file.write(sessions.toString());
		} catch (IOException e1) {
			e1.printStackTrace();
		}

		try (FileWriter file = new FileWriter(path + "/sponsors.json")) {
			file.write(sponsors.toString());
		} catch (IOException e1) {
			e1.printStackTrace();
		}

		try (FileWriter file = new FileWriter(path + "/speakers.json")) {
			file.write(speakers.toString());
		} catch (IOException e1) {
			e1.printStackTrace();
		}

		try (FileWriter file = new FileWriter(path + "/tracks.json")) {
			file.write(tracks.toString());
		} catch (IOException e1) {
			e1.printStackTrace();
		}

		try {
			zipFolder(path, userHome + "/Downloads");
		} catch (Exception e1) {
			e1.printStackTrace();
		}

	}

	static public void zipFolder(String srcFolder, String destZipFile) throws Exception {
		ZipOutputStream zip = null;
		FileOutputStream fileWriter = null;
		fileWriter = new FileOutputStream(destZipFile);
		zip = new ZipOutputStream(fileWriter);
		addFolderToZip("", srcFolder, zip);
		zip.flush();
		zip.close();
	}
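
	// Helpers below walk the source folder recursively: directories are descended
	// into, regular files are streamed into the archive entry by entry.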

	static private void addFileToZip(String path, String srcFile, ZipOutputStream zip) throws Exception {
		File folder = new File(srcFile);
		if (folder.isDirectory()) {
			addFolderToZip(path, srcFile, zip);
		} else {
			byte[] buf = new byte[1024];
			int len;
			FileInputStream in = new FileInputStream(srcFile);
			zip.putNextEntry(new ZipEntry(path + "/" + folder.getName()));
			while ((len = in.read(buf)) > 0) {
				zip.write(buf, 0, len);
			}
			in.close();
		}
	}

	static private void addFolderToZip(String path, String srcFolder, ZipOutputStream zip) throws Exception {
		File folder = new File(srcFolder);

		for (String fileName : folder.list()) {
			if (path.equals("")) {
				addFileToZip(folder.getName(), srcFolder + "/" + fileName, zip);
			} else {
				addFileToZip(path + "/" + folder.getName(), srcFolder + "/" + fileName, zip);
			}
		}
	}

}

Check out https://github.com/loklak/loklak_server for more…


 

Feel free to ask questions regarding the above code snippet.

Also, Stay tuned for the next part of this post which shall include using the scraped information for Open Event.

Feedback and Suggestions welcome 🙂


Now get wordpress blog updates with Loklak !

Loklak shall soon be spoiling its users !

Next, it will be bringing in tiny tweet-like cards showing the blog-posts (title, publishing date, author and content) from the given WordPress Blog URL.

This feature is certain to expand the realm of Loklak's missive of building a comprehensive and extensive social network dispensing useful information.


In order to implement this feature, I have again made use of JSoup, the Java HTML parser library, as it provides a very convenient API for extracting and manipulating data, and for scraping and parsing HTML from a URL.

The information is scraped using JSoup after the corresponding URL, in the format "https://[username].wordpress.com/", is passed as an argument to the function scrapeWordpress(String blogURL){..}, which returns a JSONObject as the result.

A look at the code snippet :

/**
 *  WordPress Blog Scraper
 *  By Jigyasa Grover, @jig08
 **/

package org.loklak.harvester;

import java.io.IOException;

import org.json.JSONArray;
import org.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class WordPressBlogScraper {
	public static void main(String args[]){
		
		String blogURL = "https://loklaknet.wordpress.com/";
		scrapeWordpress(blogURL);		
	}
	
	public static JSONObject scrapeWordpress(String blogURL) {
		
                Document blogHTML = null;
		
		Elements articles = null;
		Elements articleList_title = null;
		Elements articleList_content = null;
		Elements articleList_dateTime = null;
		Elements articleList_author = null;

		String[][] blogPosts = new String[100][4];
		
		//blogPosts[][0] = Blog Title
		//blogPosts[][1] = Posted On
		//blogPosts[][2] = Author
		//blogPosts[][3] = Blog Content
		
		Integer numberOfBlogs = 0;
		Integer iterator = 0;
		
		try{			
			blogHTML = Jsoup.connect(blogURL).get();
		}catch (IOException e) {
            e.printStackTrace();
        }
			
			articles = blogHTML.getElementsByTag("article");
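			// Each <article> element is one blog post; the theme's CSS classes
			// (entry-title, posted-on, byline, entry-content) identify its fields.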
			
			iterator = 0;
			for(Element article : articles){
				
				articleList_title = article.getElementsByClass("entry-title");				
				for(Element blogs : articleList_title){
					blogPosts[iterator][0] = blogs.text().toString();
				}
				
				articleList_dateTime = article.getElementsByClass("posted-on");				
				for(Element blogs : articleList_dateTime){
					blogPosts[iterator][1] = blogs.text().toString();
				}
				
				articleList_author = article.getElementsByClass("byline");				
				for(Element blogs : articleList_author){
					blogPosts[iterator][2] = blogs.text().toString();
				}
				
				articleList_content = article.getElementsByClass("entry-content");				
				for(Element blogs : articleList_content){
					blogPosts[iterator][3] = blogs.text().toString();
				}
				
				iterator++;
				
			}
			
			numberOfBlogs = iterator;
			
			JSONArray blog = new JSONArray();
			
			for(int k = 0; k<numberOfBlogs; k++){
				JSONObject blogpost = new JSONObject();
				blogpost.put("blog_url", blogURL);
				blogpost.put("title", blogPosts[k][0]);
				blogpost.put("posted_on", blogPosts[k][1]);
				blogpost.put("author", blogPosts[k][2]);
				blogpost.put("content", blogPosts[k][3]);
				blog.put(blogpost);
			}			
			
			JSONObject final_blog_info = new JSONObject();
			
			final_blog_info.put("Wordpress blog: " + blogURL, blog);			

			System.out.println(final_blog_info);
			
			return final_blog_info;
		
	}
}

 

In this, simply an HTTP connection is established and text is extracted using "element_name".text() from inside the specific tags, using identifiers like classes or ids. The tags from which the information was to be extracted were identified after exploring the web page's HTML source code.

The result thus obtained is in the form of a JSON object:

{
  "Wordpress blog: https://loklaknet.wordpress.com/": [
    {
      "posted_on": "June 19, 2016",
      "blog_url": "https://loklaknet.wordpress.com/",
      "author": "shivenmian",
      "title": "loklak_depot u2013 The Beginning: Accounts (Part 3)",
      "content": "So this is my third post in this five part series on loklak_depo... As always, feedback is duly welcome."
    },
    {
      "posted_on": "June 19, 2016",
      "blog_url": "https://loklaknet.wordpress.com/",
      "author": "sopankhosla",
      "title": "Creating a Loklak App!",
      "content": "Hello everyone! Today I will be shifting from course a...ore info refer to the full documentation here. Happy Coding!!!"
    },
    {
      "posted_on": "June 17, 2016",
      "blog_url": "https://loklaknet.wordpress.com/",
      "author": "leonmakk",
      "title": "Loklak Walls Manual Moderation u2013 tweet storage",
      "content": "Loklak walls are going to....Stay tuned for more updates on this new feature of loklak walls!"
    },
    {
      "posted_on": "June 17, 2016",
      "blog_url": "https://loklaknet.wordpress.com/",
      "author": "Robert",
      "title": "Under the hood: Authentication (login)",
      "content": "In the second post of .....key login is ready."
    },
    {
      "posted_on": "June 17, 2016",
      "blog_url": "https://loklaknet.wordpress.com/",
      "author": "jigyasa",
      "title": "Loklak gives some hackernews now !",
      "content": "It's been befittingly said  u... Also, Stay tuned for more posts on data crawling and parsing for Loklak. Feedback and Suggestions welcome"
    },
    {
      "posted_on": "June 16, 2016",
      "blog_url": "https://loklaknet.wordpress.com/",
      "author": "Damini",
      "title": "Does tweets have emotions?",
      "content": "Tweets do intend some kind o...t of features: classify(feat1,u2026,featN) = argmax(P(cat)*PROD(P(featI|cat)"
    },
    {
      "posted_on": "June 15, 2016",
      "blog_url": "https://loklaknet.wordpress.com/",
      "author": "sudheesh001",
      "title": "Dockerize the loklak server and publish docker images to IBM Containers on Bluemix Cloud",
      "content": "Docker is an open source...nd to create and deploy instantly as well as scale on demand."
    }
  ]
}

 

The next step now would include "writeToBackend"-ing and then parsing the JSONObject as desired.

Feel free to ask questions regarding the above code snippet, shall be happy to assist.

Feedback and Suggestions welcome 🙂


Loklak gives some hackernews now !

It has been befittingly said by Kurt Loder, the famous American journalist: "Well, news is anything that's interesting, that relates to what's happening in the world, what's happening in areas of the culture that would be of interest to your audience."

And what better than Hackernews (news.ycombinator.com) for the tech community? It helps the community by surfacing the important and latest buzz, sorted by popularity, along with the links.


Loklak next tried to include this important piece of information in its server by collecting data from this source. Instead of the usual scraping of HTML pages we had been doing for other sources, this time we read the RSS stream.

Simply put, RSS (Really Simple Syndication) uses a family of standard web feed formats to publish frequently updated information: blog entries, news headlines, audio, video. A standard XML file format ensures compatibility with many different machines and programs. RSS feeds also benefit users who want to receive timely updates from favourite websites or to aggregate data from many sites without signing in everywhere.

Hackernews RSS Feed can be fetched via the URL https://news.ycombinator.com/rss and looks something like…

[Screenshot: the Hackernews RSS feed XML]

In order to keep things simple, I decided to use the ROME Framework to make an RSS reader for Hackernews for Loklak.

Just for a quick introduction, ROME is a Java framework for RSS and Atom feeds. It’s open source and licensed under the Apache 2.0 license. ROME includes a set of parsers and generators for the various flavors of syndication feeds, as well as converters to convert from one format to another. The parsers can give you back Java objects that are either specific for the format you want to work with, or a generic normalized SyndFeed class that lets you work on with the data without bothering about the incoming or outgoing feed type.

So, I made a function hackernewsRSSReader which returns a JSONObject containing a JSONArray "Hackernews RSS Feed", whose JSONObjects each represent a news headline from the source.

The structure of the JSONObject result obtained is something like:

{
   "Hackernews RSS Feed":[
      {
         "Description":"SyndContentImpl.value=....",
         "Updated-Date":"null",
         "Link":"http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.116.241103",
         "RSS Feed":"https://news.ycombinator.com/rss",
         "Published-Date":"Wed Jun 15 13:30:33 EDT 2016",
         "Hash-Code":"1365366114",
         "Title":"Second Gravitational Wave Detected at LIGO",
         "URI":"http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.116.241103"
      },
     ......
      {
         "Description":"SyndContentImpl.value=....",
         "Updated-Date":"null",
         "Link":"http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-410-principles-of-autonomy-and-decision-making-fall-2010/lecture-notes/MIT16_410F10_lec20.pdf",
         "RSS Feed":"https://news.ycombinator.com/rss",
         "Published-Date":"Wed Jun 15 08:37:36 EDT 2016",
         "Hash-Code":"1649214835",
         "Title":"Intro to Hidden Markov Models (2010) [pdf]",
         "URI":"http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-410-principles-of-autonomy-and-decision-making-fall-2010/lecture-notes/MIT16_410F10_lec20.pdf"
      }
   ]
}

It includes information like Title, Link, HashCode, Published Date, Updated Date, URI and the Description of each “news headline”.

The next step after extracting information is to write it to the back-end and then retrieve it whenever required and display it in the desired format as suitable to the Loklak Web Client after parsing it.

It requires the JDOM and ROME jars to be configured in the build path before proceeding with the implementation of the RSS reader.

A look through the code for the HackernewsRSSReader.java :

/**
 *  Hacker News RSS Reader
 *  By Jigyasa Grover, @jig08
 **/

package org.loklak.harvester;

import java.net.MalformedURLException;
import java.net.URL;
import java.util.List;
import org.json.JSONArray;
import org.json.JSONObject;
import com.sun.syndication.feed.synd.SyndEntry;
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.io.SyndFeedInput;
import com.sun.syndication.io.XmlReader;

public class HackernewsRSSReader {	
	
	/*
	 * For HackernewsRSS, simply pass URL: https://news.ycombinator.com/rss 
	 * in the function to obtain a corresponding JSON
	 */
	@SuppressWarnings({ "unchecked", "static-access" })
	public static JSONObject hackernewsRSSReader(String url){
		 
	        URL feedUrl = null;
			try {
				feedUrl = new URL(url);
			} catch (MalformedURLException e) {
				e.printStackTrace();
			}
	        
	        SyndFeedInput input = new SyndFeedInput();
	        
	        SyndFeed feed = null;
			try {
				feed = input.build(new XmlReader(feedUrl));
			} catch (Exception e) {
				e.printStackTrace();
			}
	        
	        String[][] result = new String[100][7];
	        //result[][0] = Title
	        //result[][1] = Link
	        //result[][2] = URI
	        //result[][3] = Hash Code
	        //result[][4] = PublishedDate
	        //result[][5] = Updated Date
	        //result[][6] = Description
	        
	        @SuppressWarnings("unused")
			int totalEntries = 0;
	        int i = 0;
	        
	        JSONArray jsonArray = new JSONArray();
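	        // Each SyndEntry in the feed is one headline; its fields are copied
	        // into a JSONObject and collected in the JSON array.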
	        
	        for (SyndEntry entry : (List)feed.getEntries()) {
	        	
	        	result[i][0] = entry.getTitle().toString();
	        	result[i][1] = entry.getLink().toString();
	        	result[i][2] = entry.getUri().toString();
	        	result[i][3] = Integer.toString(entry.hashCode()); 
	        	result[i][4] = entry.getPublishedDate().toString();
	        	result[i][5] = ( (entry.getUpdatedDate() == null) ? ("null") : (entry.getUpdatedDate().toString()) );
	        	result[i][6] = entry.getDescription().toString();
	        	
		        JSONObject jsonObject = new JSONObject();

	        	jsonObject.put("RSS Feed", url);
	        	jsonObject.put("Title", result[i][0]);
	        	jsonObject.put("Link", result[i][1]);
	        	jsonObject.put("URI", result[i][2]);
	        	jsonObject.put("Hash-Code", result[i][3]);
	        	jsonObject.put("Published-Date", result[i][4]);
	        	jsonObject.put("Updated-Date", result[i][5]);
	        	jsonObject.put("Description", result[i][6]);
	        	
	        	jsonArray.put(i, jsonObject);
	        	
	        	i++;
	        }
	        
	        totalEntries = i;
	        
	    JSONObject rssFeed = new JSONObject();
	    rssFeed.put("Hackernews RSS Feed", jsonArray);
	    System.out.println(rssFeed);
		return rssFeed;
		
	}

}

 

Feel free to ask questions regarding the above code snippet.

Also, Stay tuned for more posts on data crawling and parsing for Loklak.

Feedback and Suggestions welcome 🙂


Let’s ‘Meetup’ with Loklak

Loklak has already started to expand beyond the realms of Twitter and working its way to build an extensive social network.

Now, Loklak aims to bring in data crawled from meetups.com to create a close-knit community.

Chiming together with Meetup’s mission to revitalize local community and help people around the world self-organize, Loklak strives to revolutionize the social networking scenario of the present world.


In order to extract information viz. group name, location, description, group topics/tags, and recent meetups (day, date, time, RSVPs, reviews, introductory lines etc.) about a specific group on meetups.com, I have used the URL http://www.meetup.com/<group-name>/ and then scraped information from the HTML page itself.

Just like previous experiments with other webpages, I have made use of JSoup, the Java HTML parser library, in the loklak_server. It provides a very convenient API for extracting and manipulating data, and for scraping and parsing HTML from a URL. JSoup is designed to deal with all varieties of HTML, hence as of now it is being considered a suitable choice.

The information scraped is stored in a multi-dimensional array of recentMeetupsResult[][] and then the data inside can be used accordingly.

A sample code-snippet for reference is as:

/**
 *  Meetups Scraper
 *  By Jigyasa Grover, @jig08
 **/

package org.loklak.harvester;

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.*;
import org.jsoup.select.Elements;

public class MeetupsScraper {
	public static void main(String args[]){
		
		Document meetupHTML = null;
		String meetupGroupName = "Women-Who-Code-Delhi";
		// fetch group name here
		Element groupDescription = null;
		String groupDescriptionString = null;
		Element topicList = null;
		Elements topicListStrings = null;
		String[] topicListArray = new String[100];
		Integer numberOfTopics = 0;
		Element recentMeetupsSection = null;
		Elements recentMeetupsList = null;
		Integer numberOfRecentMeetupsShown = 0;
		Integer i = 0, j = 0;
		String recentMeetupsResult[][] = new String[100][3];
		
		// recentMeetupsResult[i][0] == date && time
		// recentMeetupsResult[i][1] == Attendance && Review
		// recentMeetupsResult[i][2] == Information
				
		try{
			meetupHTML = Jsoup.connect("http://www.meetup.com/" + meetupGroupName).userAgent("Mozilla").get();
			
			groupDescription = meetupHTML.getElementById("groupDesc");
			groupDescriptionString = groupDescription.text();
			System.out.println(meetupGroupName + "\n\tGroup Description: \n\t\t" + groupDescriptionString);
			
			topicList = meetupHTML.getElementById("topic-box-2012");
			topicListStrings = topicList.getElementsByTag("a");
			
			int p = 0;
			for(Element topicListStringsIterator : topicListStrings){
				topicListArray[p] = topicListStringsIterator.text().toString();
				p++;
			}
			numberOfTopics = p;
			
			System.out.println("nGroup Topics:");
			for(int l = 0; l<numberOfTopics; l++){
				System.out.println("ntTopic Number "+ l + " : " + topicListArray[l]);
			}
			
			recentMeetupsSection = meetupHTML.getElementById("recentMeetups");
			recentMeetupsList = recentMeetupsSection.getElementsByTag("p");
			
			i = 0;
			j = 0;
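			// The "recentMeetups" section lists three <p> tags per meetup:
			// date & time, attendance/review, and the introductory text.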
			
			for(Element recentMeetups : recentMeetupsList ){				
				if(j%3==0){
					j = 0;
					i++;
				}
				
				recentMeetupsResult[i][j] = recentMeetups.text().toString();
				j++;
				
			}
			
			numberOfRecentMeetupsShown = i;
			
			for(int k = 1; k < numberOfRecentMeetupsShown; k++){
				System.out.println("nnRecent Meetup Number" + k + " : n" + 
						"nt Date & Time: " + recentMeetupsResult[k][0] + 
						"nt Attendance: " + recentMeetupsResult[k][1] + 
						"nt Information: " + recentMeetupsResult[k][2]);
			}

		}catch (IOException e) {
            e.printStackTrace();
        }
		
	}
}

In this, simply an HTTP connection is established and text is extracted using "element_name".text() from inside the specific tags, using identifiers like classes or ids. The tags from which the information was to be extracted were identified after exploring the web page's HTML source code.

The above yields results as:

Women-Who-Code-Delhi
	Group Description: 
		Mission: Women Who Code is a global nonprofit organization dedicated to inspiring women to excel in technology careers by creating a global, connected community of women in technology. The organization tripled in 2013 and has grown to be one of the largest communities of women engineers in the world. Empowerment: Women Who code is a professional community for women in tech. We provide an avenue for women to pursue a career in technology, help them gain new skills and hone existing skills for professional advancement, and foster environments where networking and mentorship are valued. Key Initiatives: - Free technical study groups - Events featuring influential tech industry experts and investors - Hack events - Career and leadership development Current and aspiring coders are welcome.  Bring your laptop and a friend!  Support Women Who Code: Donating to Women Who Code, Inc. (#46-4218859) directly impacts our ability to efficiently run this growing organization, helps us produce new programs that will increase our reach, and enables us to expand into new cities around the world ensuring that women and girls everywhere have the opportunity to pursue a career in technology. Women Who Code (WWCode) is dedicated to providing an empowering experience for everyone who participates in or supports our community, regardless of gender, gender identity and expression, sexual orientation, ability, physical appearance, body size, race, ethnicity, age, religion, or socioeconomic status. Because we value the safety and security of our members and strive to have an inclusive community, we do not tolerate harassment of members or event participants in any form. Our Code of Conduct applies to all events run by Women Who Code, Inc. If you would like to report an incident or contact our leadership team, please submit an incident report form. WomenWhoCode.com

Group Topics:

	Topic Number 0 : Django

	Topic Number 1 : Web Design

	Topic Number 2 : Ruby

	Topic Number 3 : HTML5

	Topic Number 4 : Women Programmers

	Topic Number 5 : JavaScript

	Topic Number 6 : Python

	Topic Number 7 : Women in Technology

	Topic Number 8 : Android Development

	Topic Number 9 : Mobile Technology

	Topic Number 10 : iOS Development

	Topic Number 11 : Women Who Code

	Topic Number 12 : Ruby On Rails

	Topic Number 13 : Computer programming

	Topic Number 14 : WWC


Recent Meetup Number1 : 

	 Date & Time: April 2 · 10:30 AM
	 Attendance: 13 Women Who Code-rs | 5.001
	 Information: Brought to you in collaboration with Women Techmakers Delhi.According to a survey, only 11% of open source participants are women. People find it intimidating to get... Learn more


Recent Meetup Number2 : 

	 Date & Time: March 3 · 3:00 PM
	 Attendance: 21 Women Who Code-rs | 5.001
	 Information: “Behold, the number five is at hand. Grab it and shake and harness the power of networking.” Women Who Code Delhi is proud to present Social Hack Eve, a networking... Learn more


Recent Meetup Number3 : 

	 Date & Time: Oct 18, 2015 · 9:00 AM
	 Attendance: 20 Women Who Code-rs | 4.502
	 Information: Hello Ladies :) Google Women Techmakers is looking for women techies to present a talk in one of the segments of Google DevFest Delhi 2015 planned for October 18, 2015... Learn more


Recent Meetup Number4 : 

	 Date & Time: Jul 5, 2015 · 12:00 PM
	 Attendance: 24 Women Who Code-rs | 4.001 | 1 Photo
	 Information: Agenda: Learning how to use and develop open source software, and contribute to huge existing open source projects.A series of talks by some of this year’s GSoC... Learn more

In this sample, only text has been retrieved. Advanced versions could include the hyperlinks and multimedia embedded in the web page and integrate the extracted information in a suitable format.


Check out this space for more details on the implementation of the crawlers.
Feedback and suggestions welcome.
