• Large in quantity
  • Captured at a rapid rate
  • Structured and unstructured
  • Whose scale, diversity, and complexity require new architectures, techniques, and algorithms

Characteristics

  • Volume
  • Variety
  • Velocity

Apache Hadoop - The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

The project includes these core modules:

Hadoop Common: The common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data (see the first sketch after this list).
Hadoop YARN: A framework for job scheduling and cluster resource management.
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets (see the second sketch after this list).

Other Hadoop-related Apache projects include:

Cassandra™: A scalable multi-master database with no single points of failure.
Chukwa™: A data collection system for managing large distributed systems.
HBase™: A scalable, distributed database that supports structured data storage for large tables (see the third sketch after this list).
Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
Mahout™: A scalable machine learning and data mining library.
Pig™: A high-level data-flow language and execution framework for parallel computation.
ZooKeeper™: A high-performance coordination service for distributed applications.
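
To make "high-throughput access to application data" concrete, here is a minimal sketch of writing and then reading a file through the HDFS FileSystem Java API. The path and file contents are hypothetical, and the sketch assumes a core-site.xml on the classpath whose fs.defaultFS points at a running cluster.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS (e.g. hdfs://namenode:8020) from core-site.xml.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path path = new Path("/tmp/example.txt"); // hypothetical path

    // Write: the client streams data to DataNodes, which replicate blocks.
    try (FSDataOutputStream out = fs.create(path, true)) {
      out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
    }

    // Read the file back through the same API.
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
      System.out.println(in.readLine());
    }

    fs.close();
  }
}
```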
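
The "simple programming models" mentioned above chiefly means MapReduce: you write a map function and a reduce function, and the framework handles splitting the input, scheduling tasks across the cluster, and recovering from failures. The sketch below is the classic word-count job, essentially the example from the Hadoop MapReduce tutorial; input and output paths are supplied on the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every word in its input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a jar, the job would be launched with something like hadoop jar wordcount.jar WordCount /input /output, where the paths refer to HDFS directories.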
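
As a sketch of HBase's structured storage model, the following uses the HBase Java client to write and read a single cell, addressed by row key, column family, and column qualifier. The "users" table, its "info" column family, and the row contents are hypothetical and assumed to already exist.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
  public static void main(String[] args) throws Exception {
    // Reads cluster settings (e.g. ZooKeeper quorum) from hbase-site.xml.
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("users"))) {

      // Put: write one cell at (row key, column family, qualifier).
      Put put = new Put(Bytes.toBytes("user-42"));
      put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),
                    Bytes.toBytes("Ada"));
      table.put(put);

      // Get: random-access read of the same row.
      Result result = table.get(new Get(Bytes.toBytes("user-42")));
      byte[] name = result.getValue(Bytes.toBytes("info"),
                                    Bytes.toBytes("name"));
      System.out.println(Bytes.toString(name));
    }
  }
}
```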

Other Services

IT Consulting

We provide customized IT solutions for management.

Business Intelligence

Business Intelligence has become an essential part of the success of any enterprise.

Software Testing

Testing is the process of evaluating or exercising a system or system component by manual or automated ...

Data Warehousing

A data warehouse is a special database that contains institutional and departmental data.
