Hadoop

What is Hadoop?

Before answering that question, we first need to understand what a DFS (Distributed File System) is and why one is needed.

DFS (Distributed File System)

A distributed file system is a client/server-based application that allows clients to access and process data stored on the server as if it were on their own computer. When a user accesses a file on the server, the server sends the user a copy of the file, which is cached on the user’s computer while the data is being processed and is then returned to the server.

In the figure above, the physical machines sit in different locations, but together they form one logical machine with a common file system shared by all of the physical machines.

  • A system that permanently stores data
  • Data is divided into logical units (files, shards, chunks, blocks, etc.)
  • A file path joins file and directory names into a relative or absolute address that identifies a file
  • Supports access to files on remote servers
  • Supports concurrency
  • Supports distribution
  • Supports replication
  • Examples: NFS, GPFS, Hadoop DFS, GlusterFS, MogileFS…
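
The ideas of dividing data into blocks, distributing the blocks across machines, and replicating them can be illustrated with a small sketch. This is a toy model under stated assumptions, not how any particular DFS places data: the class `BlockPlacementSketch`, its methods, the node names, and the round-robin placement rule are all hypothetical, and the tiny block size is chosen only to keep the numbers readable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: real distributed file systems use far more sophisticated
// placement policies. All names in this class are hypothetical.
class BlockPlacementSketch {

    /** Number of blocks needed to hold fileSize bytes (ceiling division). */
    static int blockCount(long fileSize, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        return (int) ((fileSize + blockSize - 1) / blockSize);
    }

    /**
     * Assigns each block to `replication` distinct nodes in round-robin order.
     * Returns one list of node names per block.
     */
    static List<List<String>> placeBlocks(long fileSize, long blockSize,
                                          List<String> nodes, int replication) {
        if (replication < 1 || replication > nodes.size()) {
            throw new IllegalArgumentException(
                    "replication must be between 1 and the number of nodes");
        }
        List<List<String>> placement = new ArrayList<>();
        int blocks = blockCount(fileSize, blockSize);
        for (int b = 0; b < blocks; b++) {
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                // Consecutive offsets modulo the node count never repeat a node
                // as long as replication <= nodes.size().
                replicas.add(nodes.get((b + r) % nodes.size()));
            }
            placement.add(replicas);
        }
        return placement;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node-a", "node-b", "node-c", "node-d");
        // A 300-byte "file" with 128-byte blocks needs 3 blocks.
        List<List<String>> placement = placeBlocks(300, 128, nodes, 3);
        for (int b = 0; b < placement.size(); b++) {
            System.out.println("block " + b + " -> " + placement.get(b));
        }
    }
}
```

Because every block lives on several machines, losing one machine does not lose the block, and several clients can read different replicas at the same time.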

WHY DFS?

What is Hadoop?

Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model.

It is designed to scale up from a single server to thousands of machines, each offering local computation and storage.

Apache Hadoop is a framework, that is, a library written in Java, whose objective is to provide the capability to manage huge amounts of data.

As a Java framework provided by Apache, Hadoop manages huge amounts of data through a set of components that understand the data, provide the right storage for it, and provide the right algorithms to analyze it.
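
The "simple programming model" most often associated with Hadoop is MapReduce. The sketch below shows the map, shuffle, and reduce steps for a word count in plain Java on a single machine. It deliberately does not use the Hadoop API: the class `WordCountSketch` and its method names are hypothetical and only illustrate the flow of data between the phases.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map -> shuffle -> reduce flow.
// It does not use the Hadoop API; class and method names are hypothetical.
class WordCountSketch {

    /** Map phase: emit a (word, 1) pair for every word in one input line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    /** Shuffle phase: group the emitted values by key (sorted by word). */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce phase: sum the values collected for one key. */
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    /** Runs all three phases over the input lines. */
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {big=2, cluster=1, data=2, node=1}
        System.out.println(wordCount(Arrays.asList("big data big cluster", "data node")));
    }
}
```

In a real cluster the map calls would run in parallel on the machines that hold the input data, and the grouped pairs would be sent over the network to the machines running the reduce step. The per-record logic stays this small, which is what makes the model simple.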

Open Source Software + Commodity Hardware = IT Cost Reduction

What is Hadoop used for?

  • Searching
  • Log Processing
  • Recommendation systems
  • Analytics
  • Video and Image Analysis
  • Data Retention

Companies Using Hadoop:

  • Yahoo
  • Google
  • Facebook
  • Amazon
  • AOL
  • IBM
  • and many more

http://wiki.apache.org/hadoop/PoweredBy

Dinesh Rajput


