So what is IBM launching with Hortonworks in 2Q 2017? We are certifying IBM Spectrum Scale with the Hortonworks Data Platform (HDP). We certified IBM Power Systems for the compute part of HDP in April 2017. What we are now certifying is that HDP can run on Spectrum Scale as the storage layer instead of the default Hadoop Distributed File System (HDFS), on both Power Systems and x86, using the Spectrum Scale Transparent HDFS Connector. Since this certification is for the Spectrum Scale software, it applies both to the software-only version of Spectrum Scale and to our integrated appliance, the Elastic Storage Server, which runs on two Power servers with the Spectrum Scale software and storage hardware in a single node.

The main client benefit of running Hortonworks HDP with IBM Spectrum Scale instead of HDFS is the large cost saving that comes from a smaller data footprint at the customer site and from the ability to run analytics in place.

In a traditional application environment, data is stored on multiple NAS boxes. You have to move the data from these NAS filers into HDFS before you can run your Hadoop analytics, and when the jobs complete you have to move the results back to the NAS filers. As the amount of data to be analyzed grows into the multi-terabyte and petabyte range, moving data from the NAS filers to HDFS becomes not only cumbersome but very time-consuming. The copy can take many hours or even days, so results end up being generated from stale data.

Because IBM Spectrum Scale supports multiple storage protocols, including POSIX, NFS, SMB/CIFS, and iSCSI, plus Swift and S3 for object storage, we are able to build a huge data lake and run in-place analytics without copying data as in a typical Hadoop HDFS workflow.
Applications store their data in the Spectrum Scale file system, which is the same place the Hadoop analytics jobs run, because the data can now be accessed through the Spectrum Scale Transparent HDFS Connector.

The second major client benefit concerns replication. By default, HDFS keeps three replicas of the data for protection and performance, so 5 PB of data requires 15 PB of storage. With the IBM Elastic Storage Server, which combines IBM Power servers, Spectrum Scale software, and GPFS Native Software RAID, you eliminate the need for 3-way replication: 5 PB of data needs only 6.5 PB of storage. That is a saving in storage capacity of more than 40% (by these figures, 8.5 PB out of 15 PB, or about 57%).

In summary, eliminating the need to move data from NAS filers to HDFS and reducing the amount of storage needed to run Hortonworks HDP give clients compelling reasons to move to an analytics solution based on IBM Spectrum Scale or the Elastic Storage Server. For more information on this offering, please visit the IBM Spectrum Scale website.
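
The storage comparison above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper name `raw_capacity_pb` is made up for this example, and the 1.3x overhead factor is simply what the 6.5 PB figure in this post implies for 5 PB of data. It is not a published specification of the Elastic Storage Server or of any particular RAID code.

```python
def raw_capacity_pb(usable_pb: float, overhead_factor: float) -> float:
    """Raw storage needed to hold `usable_pb` of data at a given overhead factor."""
    return usable_pb * overhead_factor


usable_pb = 5.0  # data set size used in the post

# HDFS default: three full copies of every block.
hdfs_factor = 3.0

# Assumption: factor implied by the post's own numbers (6.5 PB for 5 PB of data).
ess_factor = 6.5 / 5.0

hdfs_raw = raw_capacity_pb(usable_pb, hdfs_factor)
ess_raw = raw_capacity_pb(usable_pb, ess_factor)

# Fraction of raw capacity saved relative to the 3-way replicated layout.
saving_fraction = (hdfs_raw - ess_raw) / hdfs_raw

print(f"HDFS with 3-way replication: {hdfs_raw:.1f} PB raw")
print(f"Elastic Storage Server:      {ess_raw:.1f} PB raw")
print(f"Capacity saved:              {saving_fraction:.1%}")
```

Changing `usable_pb` or either factor lets you rerun the comparison for your own data set, for example with a different HDFS replication setting.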