Hadoop Tutorial


Before beginning this Hadoop tutorial, let's explain some terms.

What is Big Data?

Big Data is a reality of doing business for most organizations. Big data is a collection of data sets so large that they cannot be processed using routine data processing techniques. Big Data is no longer just data; it has become a complete subject involving various tools, techniques and frameworks. Big data includes data produced by many different applications and devices. Some of the areas that fall under the Big Data umbrella are listed below.

Some examples of Big Data

Social media data, such as from Facebook and Twitter, captures information and views posted by millions of people worldwide.
Currency exchange data contains information about the "buy" and "sell" decisions made by clients of different companies.
Power grid data contains information about the power consumed by a particular node with respect to a base station.
Transport data includes the model, capacity, distance and availability of a vehicle.
Search engine data: search engines retrieve large amounts of data from different databases.

Big Data types

Big data comes in three types:
- Structured data: relational data.
- Semi-structured data: XML data.
- Unstructured data: documents such as Word, PDF and text files, and media files.

Technologies used in Big Data

There are various technologies on the market from different vendors, including Amazon, IBM and Microsoft, for managing large volumes of data. In this article we will examine the following two classes of technology:

Operational Big Data

This class includes systems such as MongoDB that provide operational capabilities for real-time, interactive workloads where data is primarily captured and stored. NoSQL big data systems are designed to take advantage of new cloud computing architectures, which makes massive operational workloads much more manageable, cheaper and faster to implement.

Big Data Analytics

This class includes systems such as Massively Parallel Processing (MPP) database systems and MapReduce, which provide analytical capabilities for complex analysis that may touch most or all of the data. MapReduce provides a new method of analyzing data that complements the capabilities provided by SQL, and a MapReduce-based system can be scaled up from a single server to thousands of high-end and low-end machines.

Challenges of Big Data

The main challenges associated with large volumes of data are:

• Data Capture
• Curation
• Storage
• Search
• Sharing
• Transfer
• Analysis
• Presentation

Hadoop Big Data Solution

Traditional approach

In the traditional approach, an enterprise has a single computer to store and process big data. The data is stored in an RDBMS such as Oracle Database, MS SQL Server or DB2, and sophisticated software is written to interact with the database, process the required data and present it to users for analysis.

Limitation of the traditional approach

This approach works well when the volume of data is small enough to be handled by standard database servers, or stays within the limits of the processor that is processing it. But when it comes to dealing with huge amounts of data, processing it through a single traditional database server becomes a tedious task.

The Google solution

Google solved this problem with an algorithm called MapReduce. It divides a task into small parts, assigns those parts to many computers on the network, and collects their results to form the final result dataset.
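To make the split, assign and collect idea concrete, here is a minimal sketch in plain Java. It is not Google's or Hadoop's code: worker threads from a standard ExecutorService stand in for the computers on the network, and the task (summing numbers) is an arbitrary stand-in chosen for illustration. The class name SplitAndCombine is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SplitAndCombine {

    // Splits the input into roughly equal parts, hands each part to a worker
    // thread (standing in for a computer on the network), then collects the
    // partial results and combines them into the final result.
    static long parallelSum(long[] data, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            int partSize = (data.length + workers - 1) / workers; // ceiling division
            for (int start = 0; start < data.length; start += partSize) {
                final int from = start;
                final int to = Math.min(start + partSize, data.length);
                // Assign one part of the task to a worker
                partials.add(pool.submit(() -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += data[i];
                    }
                    return sum;
                }));
            }
            // Collect the partial results to form the final result
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("a worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[1_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data, 4)); // prints 500500
    }
}
```

In a real cluster the parts would run on other machines, usually where the data already lives; the thread pool here only models the divide-and-combine structure.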

Where does Apache Hadoop fit in?

Let's first explain in this Hadoop tutorial what Apache Hadoop actually is. Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It is designed to scale up from a single server to thousands of machines, each offering local computation and storage. Hadoop provides massive storage for any kind of data, enormous processing power and the ability to handle virtually unlimited concurrent tasks or jobs.

Hadoop Big Data Solution and history

Doug Cutting, Mike Cafarella and their team took the solution provided by Google and started an open-source project called Hadoop in 2005. Hadoop is now a trademark of the Apache Software Foundation. Apache Hadoop is an open-source framework written in Java that allows distributed processing of large data sets across clusters of computers using simple programming models. Hadoop runs applications using the MapReduce algorithm, where the data is processed in parallel on different nodes. In short, the Hadoop framework makes it possible to develop applications that run on clusters of computers and perform complete statistical analysis on huge amounts of data.


Hadoop Architecture

The Hadoop framework consists of four modules:

1) Hadoop Common
2) Hadoop YARN
3) Hadoop Distributed File System
4) Hadoop MapReduce



The four Hadoop modules in detail

1) Hadoop Common: the Java libraries and utilities required by the other Hadoop modules. These libraries provide filesystem and OS-level abstractions and contain the Java files and scripts needed to start Hadoop.

2) Hadoop YARN: a framework for job scheduling and cluster resource management.

3) Hadoop Distributed File System (HDFS): a distributed file system that provides high-throughput access to application data.

4) Hadoop MapReduce: a YARN-based system for parallel processing of large data sets.

How Hadoop Works

Hadoop works in three steps, which we discuss in detail in this Hadoop tutorial:

Step 1: The user submits a job (application) to Hadoop for the required processing by specifying the following:

1. The locations of the input and output files in the distributed file system.

2. The Java classes, packaged as a jar file, containing the implementation of the map and reduce functions.

3. The job configuration, set through different parameters specific to the job.

Step 2: The Hadoop job client then submits the job (jar/executable etc.) and its configuration to the JobTracker, which takes responsibility for distributing the software/configuration to the slaves, scheduling and monitoring the tasks, and providing status and diagnostic information to the job client.

Step 3: The TaskTrackers on the different nodes execute the tasks according to the MapReduce implementation, and the output of the reduce function is stored in output files on the file system. (JobTracker and TaskTracker are the Hadoop 1 daemons; in YARN-based Hadoop 2 and later, the ResourceManager, the NodeManagers and a per-job ApplicationMaster take over these roles.)


Benefits of Hadoop

The main reason to choose Hadoop is its ability to store and process huge amounts of data quickly. Other benefits of Hadoop are:

Computing power - Hadoop's distributed computing model processes big data quickly. The more computing nodes you use, the more processing power you have.
Flexibility - Unlike with traditional relational databases, you don't have to preprocess data before storing it. You can store as much data as you want and decide how to use it later.
Fault tolerance - Data and application processing are protected against hardware failure. If a node goes down, jobs are automatically redirected to other nodes so that the distributed computation does not fail. Hadoop automatically stores multiple copies of all data.
Low cost - The open-source framework is free and uses commodity hardware to store large amounts of data.
Scalability - You can easily grow your system to handle more data simply by adding nodes, with little administration.
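The fault-tolerance idea can be sketched in a few lines of plain Java. This is a simplified, hypothetical model rather than HDFS code: each block records the nodes holding a copy, and work is sent to the first copy whose node is still alive. HDFS keeps three replicas of each block by default (the dfs.replication setting); the class, method and node names below are illustrative assumptions.

```java
import java.util.List;
import java.util.Set;

public class ReplicaFailover {

    // Hypothetical model of a data block and the nodes that hold a copy of it.
    static final class Block {
        final String id;
        final List<String> replicaNodes;

        Block(String id, List<String> replicaNodes) {
            this.id = id;
            this.replicaNodes = replicaNodes;
        }
    }

    // Picks the first live node that holds a replica, so work is redirected
    // away from failed nodes instead of failing outright.
    static String chooseNode(Block block, Set<String> failedNodes) {
        for (String node : block.replicaNodes) {
            if (!failedNodes.contains(node)) {
                return node;
            }
        }
        throw new IllegalStateException("All replicas of " + block.id + " are unavailable");
    }

    public static void main(String[] args) {
        // Three copies, mirroring the HDFS default replication factor
        Block block = new Block("blk_1", List.of("node-a", "node-b", "node-c"));
        System.out.println(chooseNode(block, Set.of()));                    // node-a
        System.out.println(chooseNode(block, Set.of("node-a")));            // node-b
        System.out.println(chooseNode(block, Set.of("node-a", "node-b")));  // node-c
    }
}
```

Real HDFS placement and scheduling also weigh factors such as rack locality, which this sketch ignores; it only shows why keeping several copies lets work continue when a node goes down.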

Hadoop MapReduce

Hadoop MapReduce is a software framework for easily writing applications that process large amounts of data in parallel on large clusters of thousands of commodity hardware nodes, in a reliable, fault-tolerant manner.

The term MapReduce refers to the following two different tasks that Hadoop programs perform:

1) Map task: This is the first task. It takes the input data and converts it into a set of data in which individual elements are broken down into tuples (key/value pairs).

2) Reduce task: The reduce task always runs after the map task. It takes the output of a map task as input and combines those data tuples into a smaller set of tuples.
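The map and reduce tasks can be illustrated with the classic word-count example. The sketch below is plain, single-process Java and assumes Java 9 or later for List.of and Map.entry. It does not use Hadoop's Mapper/Reducer API, and the grouping step in the middle (usually called the shuffle) is done in memory here, whereas Hadoop performs it across the cluster between the two tasks. The class name WordCountModel is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountModel {

    // Map task: turn one input line into (word, 1) key/value pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle: group all values emitted for the same key (sorted by key).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce task: combine all values for one key into a single, smaller result.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs map over every line, then shuffle, then reduce for every key.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(intermediate).forEach((key, values) -> result.put(key, reduce(key, values)));
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("Deer Bear River", "Car Car River", "Deer Car Bear");
        System.out.println(run(input)); // {bear=2, car=3, deer=2, river=2}
    }
}
```

The map task emits one (word, 1) pair per word, and each reduce call collapses a word's list of ones into a single count, which is the "smaller set of tuples" described above.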


Author: Written by Mubeen Khalid for ®


jQuery Drop Down List


The best way to get a good jQuery drop-down list is to use one of the plugins available at

There are many tutorials on the internet about how to add a drop-down list to your website and use it in your applications.


Most of them take the most low-level approach, which means writing a lot of boilerplate code to get a good drop-down menu working with jQuery.


Most of these jQuery plugins follow the same structure:

1. Include the jQuery script in the HTML header

<script src="//"></script>

2. Include the drop-down plugin script in the HTML header

<script src="/jquery.{plugin_name}.js"></script>

3. Include the CSS that styles the menu

<link rel="stylesheet" href="/css/{plugin_name}.css">

4. The actual HTML of the drop-down list

<ul id="menu">
<li><a href="#">Item 1</a></li>
<li><a href="#">Item 2</a>
<ul>
<li><a href="#">Sub Item 1</a></li>
<li><a href="#">Sub Item 2</a></li>
</ul>
</li>
<li><a href="#">Item 3</a></li>
<li><a href="#">Item 4</a></li>
</ul>


Row was updated or deleted by another transaction


While working with Hibernate (a popular Java object-relational mapper), you might have stumbled upon this exception:

Row was updated or deleted by another transaction


What does this exception really mean? Most probably, you are trying to update an object via Hibernate whose version ID is lower than the version ID of the same object in the database.

Hibernate uses versioning to detect that the modified object you hold is older than the one currently persisted, and it rejects the update instead of silently overwriting the newer data.
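The version check can be modelled without a database. In the sketch below, a hypothetical in-memory VersionedTable accepts an update only when the caller's version matches the stored one, which mirrors an UPDATE ... WHERE id = ? AND version = ? statement that matches zero rows once the row has moved on. In Hibernate itself, such a failed versioned update surfaces as a StaleObjectStateException carrying this message; the class, method and field names below are illustrative only, not Hibernate's API.

```java
import java.util.HashMap;
import java.util.Map;

public class VersionedTable {

    // One stored row: its data plus the version number used for optimistic locking.
    static final class Row {
        final String name;
        final int version;

        Row(String name, int version) {
            this.name = name;
            this.version = version;
        }
    }

    private final Map<Long, Row> rows = new HashMap<>();

    void insert(long id, String name) {
        rows.put(id, new Row(name, 0));
    }

    Row find(long id) {
        return rows.get(id);
    }

    // Mimics: UPDATE t SET name = ?, version = version + 1 WHERE id = ? AND version = ?
    // If the row was deleted or its version moved on, nothing matches and the update fails.
    void update(long id, String newName, int expectedVersion) {
        Row current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            throw new IllegalStateException(
                "Row was updated or deleted by another transaction (id=" + id + ")");
        }
        rows.put(id, new Row(newName, current.version + 1));
    }

    public static void main(String[] args) {
        VersionedTable table = new VersionedTable();
        table.insert(1L, "original");

        Row a = table.find(1L); // transaction A loads version 0
        Row b = table.find(1L); // transaction B also loads version 0

        table.update(1L, "changed by A", a.version); // succeeds, version becomes 1
        try {
            table.update(1L, "changed by B", b.version); // B still holds stale version 0
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        // The fix: reload the latest version before changing it
        Row fresh = table.find(1L);
        table.update(1L, "changed by B", fresh.version);
        System.out.println(table.find(1L).name + " v" + table.find(1L).version); // changed by B v2
    }
}
```

Transaction B's first attempt fails because it still carries version 0; after reloading the row it succeeds, which is the same fix the solution below recommends.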

Hope this helps Java newcomers who use Hibernate for persistence.

What is the solution to "row was updated or deleted by another transaction" then?



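Before you make any changes, get the latest version of the object to avoid the "row was updated ..." exception.

This can usually be accomplished by reloading it first:

MyEntity latest = persistenceManager.findObjectById(myEntity.getId());

and then applying your changes to the reloaded object. Here, findObjectById is a method on your Hibernate persistence manager. With a plain Hibernate Session, session.refresh(myEntity) re-reads the row's current state, including its version, from the database.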


NoClassDefFoundError Java

If you are a Java developer, you have most likely run into this error: java.lang.NoClassDefFoundError. But what is behind it?

We'll explain in the next few lines:

The JVM tries to load a class that was available when your code was compiled, but the class cannot be found on the runtime classpath. That is when you get a NoClassDefFoundError.
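One way to see what the JVM is actually working with is a small diagnostic class like the hypothetical ClasspathCheck below. It prints the java.class.path system property and tries to load each class name given on the command line. It uses Class.forName, which reports a missing class as ClassNotFoundException rather than NoClassDefFoundError, but both point at the same root cause: the class is not on the runtime classpath.

```java
public class ClasspathCheck {

    // Returns true if the named class can be found by this program's class loader.
    // A LinkageError (the family NoClassDefFoundError belongs to) is also treated
    // as "not loadable", e.g. when the class exists but one of its dependencies is missing.
    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only locate and load the class, don't run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The classpath the JVM was actually started with
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        for (String className : args) {
            System.out.println(className + " -> " + (isOnClasspath(className) ? "found" : "NOT FOUND"));
        }
    }
}
```

For example, java -cp .;c:\{yourdirectorywithjarfiles}\* ClasspathCheck com.mycompany.MyClass (where com.mycompany.MyClass is a placeholder for the class from your error message) reports whether that class is visible. On Linux or macOS, use : instead of ; as the classpath separator.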


If you are running a Java command from the command line:

1. Check the .jar files included in your classpath

In a Windows environment, type the command: "set" (or "echo %CLASSPATH%" to print only the classpath)

Result should be something like:

CommonProgramFiles=C:\Program Files\Common Files
CommonProgramFiles(x86)=C:\Program Files (x86)\Common Files
CommonProgramW6432=C:\Program Files\Common Files
Path=C:\ProgramData\Oracle\Java\javapath;C:\Program Files (x86)\NVIDIA Corporati
;D:\Prog\TortoiseGit\bin;C:\Program Files (x86)\Skype\Phone\
PROCESSOR_IDENTIFIER=AMD64 Family 16 Model 10 Stepping 0, AuthenticAMD
ProgramFiles=C:\Program Files
ProgramFiles(x86)=C:\Program Files (x86)
ProgramW6432=C:\Program Files
TVT=C:\Program Files (x86)\Lenovo

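Another command: set CLASSPATH=%CLASSPATH%;c:\{yourdirectorywithjarfiles}\*

This adds the jar files in that directory to the classpath for the current command prompt session.

Alternatively, you can specify the classpath when running your program:

java -cp mypackage.jar;c:\{yourdirectorywithjarfiles}\* {yourmainclass}

Note that -cp is ignored when you start a program with java -jar; in that case the classpath comes from the Class-Path entry in the jar's manifest.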

Guest Posting / Blogging

We're looking for guest bloggers to write articles about web technologies, latest trends, e-commerce and online marketing.

If you are interested, please send us an email at info at codegravity dot com. Thank you!

Magento Multistore and Multi Website

Author Bio :

Jason Roiz is Magento developer at OSSMedia Magento development services company and is also engaged in writing informative articles on best tools and tricks for Magento development. His write-ups have proved beneficial for a wider group of Magento developers across the globe.

It is common knowledge that Magento keeps pace with the requirements of merchants planning to take their business online. When merchants look for the most suitable content management system to build their eShop on, Magento comes to mind almost instantly.


The performance and scalability of this CMS have already given other well-established CMS platforms a serious run for their money. One feature that further elevates Magento's significance is the ability to set up and host multiple websites and multiple stores.

This feature is built with great attention to detail. The system that lets you build multiple websites and stores is called GWS (Global, Website, Store):

  • Global signifies the installation in its entirety.
  • Website signifies the source that hosts multiple stores.
  • Store (or store view group): a single website can host several stores, and frontend-specific elements such as categories and product displays are managed at the store level. Each store is assigned a root category, which is how you can set up more than one store on a single website.
  • Store View refers to the interface between the website and the end-user. It dictates how data is showcased on the website.

Let's Take an Example

Take, for instance, a scenario in which you want to open an online store that sells shoes, electronics and grooming products. We divide our catalog into 2 stores and 2 websites: electronic goods are sold from their own dedicated website, while shoes and grooming products are sold together on the other website. So, we buy domains and Now, to make sure that customer information and order-specific details on the two websites don't get mixed up, we turn off data sharing between the two websites.

On, we can further build two stores: shoes and grooming products. Both shoes and grooming products cover a wide range in terms of variety and price band. So we will segregate each into subcategories of its own, so that we do not have to deal with one enormous category. Once the stores are created, corresponding root categories are assigned to them.

Besides, it would not be a bad idea to offer the catalog for both websites in 2 different languages, say English and French. To implement this, create an English and a French store view for every store.


The Magento setup uses the GWS system as a kind of tree when building the store. The Global settings are the defaults that give the stores their basic structure. To state explicitly that a particular setting applies only to a specific website or store view, uncheck the checkbox next to that setting. For example, we can offer PayPal as the payment method on both websites but want Nelnet only on For this, go to the configuration settings and select from the store view drop-down menu. PayPal can be enabled or disabled for this website.

Settings for The Store View

When you have completed all the steps involved in creating the store view, you can modify the layout configuration and tweak the visual interface of the store view. Customers can use a drop-down to switch from one view to another, and the current page is then shown in the alternate view.

This also makes it possible to run A/B tests on different designs so that you can figure out which design leads to the most conversions for your business.

So, with a clear understanding of how multi-store and multi-website setups work in Magento, it is a great idea to put it into practice.

Binding Dropdown using jQuery

Author Bio :

Amy is a WordPress developer by profession. She works for WordPrax, a WordPress website development company, and has a strong inclination toward a range of creative endeavors. Blogging is a newfound hobby for Amy.


Dropdown binding is one of the handy things you can do with jQuery. Even though you can bind a dropdown list with C# on the server, using jQuery for the job keeps the page responsive and is convenient. This post walks you through the steps of binding a dropdown list with jQuery.

The process of binding a dropdown list using jQuery

Step 1 - Add a reference to the jQuery library

Before you can use any jQuery function, the aspx page must reference the jQuery library. With jQuery 2.0 as the current version, you can use the following line to reference the library within the aspx page:

<script src="/jquery-2.0.js" type="text/javascript"></script>

Step 2 - Use a function to fetch data from the server

Use the following function to fetch the data and bind it to the dropdown list:

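<script type="text/javascript">

$(document).ready(function () {
    $.ajax({
        type: "POST",
        contentType: "application/json; charset=utf-8",
        url: "test.aspx/LoadCountry",
        data: "{}",
        dataType: "json",
        success: function (Result) {
            // ASP.NET wraps the returned list in a "d" property
            $.each(Result.d, function (key, value) {
                // One <option> per country: CountryID as value, CountryName as text
                $("#ddlcountry").append($("<option></option>").val(value.CountryID).html(value.CountryName));
            });
        },
        error: function (Result) {
            // Result is the failed request object; report its HTTP status
            alert("Error loading countries: " + Result.status);
        }
    });
});

</script>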

Here's an explanation of the above code snippet:

1. $(document).ready(function () {

This function is executed once the document has been fully loaded in the browser. A page can't be manipulated safely until the document is ready.

2. $.ajax({

jQuery's $.ajax lets you send data to the server and get data back without reloading the page. Here, $.ajax posts a request to the server and receives the data used to fill the dropdown list.

3. type: "POST",

This sets the HTTP method of the request. ASP.NET page methods (static methods marked with [WebMethod]) accept only POST requests by default, so POST is used here.

4. contentType: "application/json; charset=utf-8",

This specifies how the request body is encoded: as JSON, in UTF-8. ASP.NET page methods expect a JSON content type.

5. url: "test.aspx/LoadCountry",

The url holds the address the request is sent to: test.aspx is the name of the page, and LoadCountry is the name of the method that reads the countries from the database and returns them to the jQuery function.

6. data: "{}",

data holds the parameters passed to the server-side method (here, LoadCountry) as a JSON string. LoadCountry takes no parameters, so an empty object, "{}", is sent.

7. dataType: "json",

This tells jQuery to expect a JSON response and to parse it into a JavaScript object before passing it to the success function. JSON values can be strings, numbers, booleans, null, objects or arrays.

8. success: function (Result) {

$.each(Result.d, function (key, value) {
$("#ddlcountry").append($("<option></option>").val(value.CountryID).html(value.CountryName));
});
}

success is a callback option of $.ajax: jQuery calls it when the request succeeds and passes in the parsed response as Result. ASP.NET wraps the returned list in a property named d, and $.each loops over every item in Result.d. Within these lines of code:


  • #ddlcountry is the id of the 'country' drop-down

  • append($("<option></option>").val(value.CountryID).html(value.CountryName));

    adds a new option to the dropdown list for each country, using value.CountryID as the option's value and value.CountryName as its displayed text.

9. error: function (Result) {


Here, error is also a callback option: jQuery calls it if the request fails, for example when the server returns an error status.

Now, here is the code-behind for the page method called above:

// Country POCO class
public class Country
{
    public int CountryID { get; set; }
    public string CountryName { get; set; }
}

[System.Web.Services.WebMethod]
public static List<Country> LoadCountry()
{
    return LoadCountries();
}

LoadCountry must be a public static method marked with [WebMethod] so that it can be called from the client as a page method.

The LoadCountries helper that it calls is shown below:

/// <summary>
/// This method returns a list of Countries
/// </summary>
/// <returns>List<Country></returns>
public static List<Country> LoadCountries()
{
    // Create a list to hold the results
    List<Country> CountryInformation = new List<Country>();

    // Create the database context and use LINQ to fetch the list of countries
    using (var Context = new DatabaseContext())
    {
        var list = Context.Country.ToList();
        if (list != null && list.Count > 0)
        {
            foreach (var item in list)
            {
                CountryInformation.Add(new Country()
                {
                    CountryID = item.CountryId,
                    CountryName = item.CountryName
                });
            }
        }
        return CountryInformation;
    }
}

We're done!




Binding a dropdown list on a page has long been a challenge for developers. I hope the code in this post helps you do the job cleanly.

has Google Pagerank 8 (PR 8) again!

I'm pleased to announce that has Pagerank (PR) 8 again! :)

This shows the significance of this website and its presence in links all over the web.

Pagerank prediction

I finally found a pagerank prediction tool that works.

I've been using , but it is completely broken. It only returns: "Unable to retrieve prediction at this time".

This website: uses its own algorithm to estimate your future pagerank based on the quality of your backlinks.

It's hard to say how accurate it is, but at least it does something :) The website also has several other useful SEO tools.

According to TLD or attribute directive in tag file

I wanted to use JSTL in my project, but I spent a whole day fighting with this message: According to TLD or attribute directive in tag file, attribute value does not accept any expressions.

My code was the following:


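Setting the value to "Hello World!" made no difference, and the problem persisted. Then I found the solution on a forum. The cause is a taglib declaration that points to the old JSTL 1.0 core library, whose attributes do not accept runtime expressions; even after I copied the newest standard.jar and jstl.jar into WEB-INF/lib, the error did not disappear. The trick is to declare the taglib like this: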

<%@ taglib prefix="c" uri="" %>

Now everything works fine.
