DFS (Distributed File System)
A DFS ties physical machines in different locations into one logical machine: every physical machine shares a common file system.
- A system that permanently stores data
- Data is divided into logical units (files, shards, chunks, blocks, etc.)
- A file path joins file and directory names into a relative or absolute address that identifies a file
- Supports access to files on remote servers
- Supports concurrency
- Supports distribution
- Supports replication (see the client sketch after this list)
- NFS, GPFS, Hadoop DFS, GlusterFS, MogileFS…
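To make the list concrete, here is a minimal sketch of a client talking to one such DFS (Hadoop's HDFS) through its Java FileSystem API. The NameNode address hdfs://namenode:9000, the path /user/demo/hello.txt, and the class name HdfsPathDemo are placeholder assumptions; the API calls themselves are the standard Hadoop client interface:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPathDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS points at the NameNode; this address is a placeholder.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        // An absolute path in the DFS namespace, independent of any physical machine.
        Path file = new Path("/user/demo/hello.txt");

        // Write: the client sees one file; HDFS splits it into blocks and replicates them.
        try (FSDataOutputStream out = fs.create(file)) {
            out.writeUTF("hello from one logical file system");
        }

        // Read it back from whichever DataNodes hold the replicas.
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }

        // Replication is per-file metadata handled by the DFS, not the client.
        System.out.println("replication = " + fs.getFileStatus(file).getReplication());
    }
}
```

The point of the sketch: the client names files by path in a single namespace; where the blocks physically live, and how many copies exist, is the file system's concern.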
Why DFS?
Because data sets outgrow a single machine: a DFS provides scalable storage, fault tolerance through replication, and concurrent access from many clients.
What is Hadoop?
Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
It is designed to scale up from a single server to thousands of machines, each offering local computation and storage.
At its core, Apache Hadoop is a library built in Java with the objective of managing huge amounts of data. It does this through components that understand the data, provide the right storage, and supply the right algorithms for analysing it.
Open Source Software + Commodity Hardware = IT cost reduction
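The "simple programming model" is MapReduce: you write a map function and a reduce function, and the framework handles distribution, scheduling, and fault tolerance across the cluster. Below is a sketch of the classic word-count job against Hadoop's Java MapReduce API; input and output paths are assumed to arrive as command-line arguments:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map: emit (word, 1) for every word in this node's slice of the input.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce: sum the counts gathered for each word across all mappers.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each node
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note how little the code says about distribution: the framework splits the input across the cluster, runs the map tasks where the data lives (local computation and storage, as above), and shuffles intermediate pairs to the reducers.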
What is Hadoop used for?
- Searching
- Log Processing
- Recommendation systems
- Analytics
- Video and Image Analysis
- Data Retention
Companies Using Hadoop:
- Yahoo
- Amazon
- AOL
- IBM
- and many more
http://wiki.apache.org/hadoop/PoweredBy