The benefits of certifying IBM Spectrum Scale with the Hortonworks Data Platform by Par Hettinga


So what is IBM launching with Hortonworks in 2Q 2017? We are certifying IBM Spectrum Scale with the Hortonworks Data Platform (HDP). We certified IBM Power Systems for the compute part of HDP in April 2017. What we are now certifying is that HDP can run Spectrum Scale as the storage layer instead of the default Hadoop Distributed File System (HDFS), on both Power Systems and x86, using the Spectrum Scale Transparent HDFS Connector. Since this certification is for the Spectrum Scale software, it applies both to the software-only version of Spectrum Scale and to our integrated appliance, the Elastic Storage Server, which combines two Power servers, the Spectrum Scale software, and storage hardware in a single solution.

The main client benefit of running Hortonworks HDP with IBM Spectrum Scale instead of HDFS is the large cost saving that comes from reducing the data footprint at the customer site and from the ability to do in-place analytics. With HDFS in a traditional application environment, data is stored in multiple NAS boxes; you have to move the data from these NAS filers to HDFS before you can run your Hadoop analytics, and when the job is complete you need to move the results back to your NAS filers. As the amount of data to be analyzed grows into the multi-terabyte and petabyte range, moving data from the NAS filers to HDFS becomes not only cumbersome but very time-consuming, potentially taking many hours or even days, so that stale data ends up being used to generate results. Because IBM Spectrum Scale supports multiple storage protocols, such as POSIX, NFS, SMB/CIFS, and iSCSI, plus Swift and S3 for object storage, we can build a huge data lake and run in-place analytics without the data copying of a typical Hadoop HDFS workflow: applications store their data in the Spectrum Scale file system, which is the same place the Hadoop analytics jobs run, because the data can now be accessed through the Spectrum Scale Transparent HDFS Connector.

The second major client benefit concerns replication. HDFS by default uses three-way replication for data protection and performance, so if you have 5 PB of data, you need 15 PB of storage. Using the IBM Elastic Storage Server, which runs IBM Power servers and Spectrum Scale software plus GPFS Native RAID software, you eliminate the need for three-way replication: for 5 PB of data you only need 6.5 PB of storage, a saving in storage capacity of more than 40% (see the calculation sketched below).

In summary, eliminating the need to move data from NAS filers to HDFS, and reducing the amount of storage needed to run Hortonworks HDP, provide compelling reasons for clients to move to an analytics solution based on IBM Spectrum Scale or the Elastic Storage Server. For more information on this offering, please visit the IBM Spectrum Scale website.
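To make the capacity arithmetic concrete, here is a quick back-of-the-envelope calculation in Python. The 5 PB and 6.5 PB figures come from the article itself; the roughly 1.3x erasure-coding overhead is simply what those figures imply, not an official specification.

```python
# Back-of-the-envelope capacity comparison: HDFS three-way replication
# vs. erasure coding as used by GPFS Native RAID on the Elastic
# Storage Server. Figures are taken from the article; the ~1.3x
# overhead factor is what the article's 6.5 PB figure implies.

usable_data_pb = 5.0

# HDFS default: every block is stored three times.
hdfs_replicas = 3
hdfs_raw_pb = usable_data_pb * hdfs_replicas  # 15.0 PB

# ESS/GPFS Native RAID: erasure coding instead of replication.
# 6.5 PB raw for 5 PB of data implies a ~1.3x capacity overhead.
ess_overhead = 6.5 / 5.0
ess_raw_pb = usable_data_pb * ess_overhead    # 6.5 PB

saving = 1 - ess_raw_pb / hdfs_raw_pb
print(f"HDFS raw capacity: {hdfs_raw_pb:.1f} PB")
print(f"ESS raw capacity:  {ess_raw_pb:.1f} PB")
print(f"Capacity saving:   {saving:.0%}")     # ~57%
```

The computed reduction comes out to roughly 57 percent, comfortably above the "more than 40%" quoted above.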

MSP Databalance Case Study: Business decisions in real time based on fast, data-driven analytics


In our modern data centers, we prefer IBM infrastructure. Our environment is a mixed platform of Intel/Linux, IBM Power i, and AIX systems, for which we use various IBM Power p, Power i, and Intel servers. All these systems are linked using the capabilities of the IBM SAN Volume Controller, FlashSystem V840, and V7000. As central storage solutions we use the V3700, V7000, and V840 storage systems because of their excellent speed, reliability, and low operating costs. The SAN Volume Controller is used for Easy Tier, Real-time Compression, and mirroring. With these standard techniques in place, our systems are redundant and thus highly available. To ensure continuity, we use Tivoli Storage Manager software on most platforms, fully integrated with our SVC solutions. Together with IBM, we can offer our customers all possible cloud solutions: IaaS, PaaS, and SaaS.

The SAP platform used by Beeztees is hosted by Databalance Services. Databalance advised Beeztees to put the database servers on IBM flash storage: the IBM V840 flash system delivers more than 400,000 IOPS. The other servers were placed on Easy Tier storage, resulting in an optimal mix of speed and capacity. In practice, report generation, lookup jobs, and batch runs are processed much faster. Databalance is a key partner of Beeztees in the field of automation; throughout the whole migration, Databalance was involved, advising and supporting Beeztees. The result of the last months is a very modern, state-of-the-art ERP platform based on SAP software and IBM hardware, which enables Beeztees to stay a few steps ahead of the competition.

The IBM Spectrum family is based on software-defined storage, and it enables users to obtain increased business benefits from their current storage products, whether from IBM or another vendor. IBM has pioneered this field since 2003 and supports more than 265 storage systems from several brands, which gives you more value from earlier storage investments. Databalance makes use of the IBM Spectrum family in serving its clients. IBM Spectrum Virtualize provides maximum flexibility and reliability by virtualizing the storage, and you can gain further benefits by using features such as Real-time Compression and Easy Tier; of course, you can also create a disaster recovery environment by implementing remote mirroring. IBM Spectrum Protect enables reliable, efficient data protection and resiliency for software-defined, virtual, physical, and cloud environments.

Ubiquity to enable IBM Spectrum Storage in Containers (Docker & Kubernetes) by Robert Haas


As the CTO for Storage Europe, I mentioned in my previous update that we intended to deliver a way to integrate our storage in container environments such as Docker Swarm and Kubernetes. Well, this is now a reality, and it is called Ubiquity, thanks to the hard work of a team spanning our research and development labs across the world. Ubiquity is available as open source, in experimental status at this time. Let me briefly explain where we see the adoption of containers and what this Ubiquity technology enables in a bit more detail.

Many surveys show that the adoption of containers, and more specifically Docker, is accelerating, including in enterprise environments. You may have noticed the announcements by many large companies intending to adopt containers for most of their infrastructure. This covers many use cases, such as traditional applications, HPC, cloud, and DevOps. In HPC, the portability of containers ensures that a workload can go from the testing laptop of a scientist to a big supercomputer without changes, that is, from quality assurance to staging to production with the same code. In a cloud environment, whether on-premises or not, containers are attractive because they deliver the best resource utilization and scalability, with the smallest footprint and the highest agility. Finally, for DevOps, containers simplify and accelerate application deployment through the reuse of components specified as dependencies, encouraging a micro-service architecture strategy. In summary, containers are a standard way to package an application and all its dependencies; they are portable between environments without changes; and they isolate unique elements to enable a standardized infrastructure, all of that in a fast and lightweight fashion.

Now, with the adoption of containers expanding beyond stateless components such as load balancers or web application servers, there is a need to support persistent storage, that is, storage that remains after containers stop, so that data sets can be shared, the output of an analysis can be retrieved by other processes, and so on. Many adopters of container technology cite persistent storage and data management as their top pain points, so storage vendors have started to enable their products in the Docker and Kubernetes container environments using what are called plug-ins. With the technology we call Ubiquity, so named because it is targeted at supporting all of IBM Storage in all types of container environments, we have now released this ability as well. As I said, it is available at the moment in experimental status, so we welcome feedback, and you can download it as open source from the public GitHub repository.

In a nutshell, Ubiquity is the universal plug-in for all of IBM Storage. With this plug-in and the underlying framework, storage can be provisioned and mounted directly by the containerized applications, without manual intervention. This is key to enabling agility in an end-to-end fashion. It allows you to take advantage of our Data Ocean technology, such as Spectrum Scale, in container environments, and with it the unique capabilities of Scale in terms of performance, scalability, and information lifecycle management. You can also seamlessly integrate our block storage, such as Storwize. We are convinced that containers are going to play a role as important as VMs, if not more so.
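To illustrate what a volume plug-in enables, here is a minimal sketch using the Docker SDK for Python. The driver name "ubiquity" and the option key and value below are assumptions made for illustration, not the documented Ubiquity interface; consult the project's GitHub repository for the actual plug-in name and options.

```python
# Minimal sketch: provisioning persistent storage through a Docker
# volume plug-in, using the Docker SDK for Python (docker-py).
# NOTE: the driver name "ubiquity" and the driver options are
# illustrative assumptions, not the documented Ubiquity interface.
import docker

client = docker.from_env()

# Ask the volume plug-in to provision storage on the backend
# (e.g., a Spectrum Scale fileset) instead of on the local disk.
volume = client.volumes.create(
    name="analytics-data",
    driver="ubiquity",                          # hypothetical driver name
    driver_opts={"backend": "spectrum-scale"},  # hypothetical option
)

# The containerized application mounts the volume like any other;
# the data persists after the container stops and can be shared.
client.containers.run(
    "alpine",
    "sh -c 'echo results > /data/out.txt'",
    volumes={volume.name: {"bind": "/data", "mode": "rw"}},
    remove=True,
)
```

The point of the plug-in model is exactly this: the application code and the `docker run` workflow stay unchanged, while the plug-in handles provisioning and mounting against the storage backend.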
Containers are already the norm in the IBM Bluemix offerings and have been adopted by our Power and Z products; with Ubiquity we are now able to close the loop with storage. We are already collaborating with a number of clients testing Ubiquity, so that we can develop this technology to match our clients' needs. Among many other things, we intend to adapt Ubiquity to the rapid changes occurring in container frameworks, such as the Container Storage Interface (CSI) currently being worked on by the Cloud Native Computing Foundation's (CNCF) storage working group. To conclude, with this you get the best of new-generation applications together with the performance and enterprise support of IBM Storage.
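On the Kubernetes side, the natural integration point for a plug-in like Ubiquity is dynamic provisioning through a storage class. The sketch below uses the official Kubernetes Python client; the "ubiquity" storage class name is an assumption for illustration, not the documented interface.

```python
# Minimal sketch: requesting a dynamically provisioned volume in
# Kubernetes via a storage class backed by a storage plug-in.
# NOTE: the storage class name "ubiquity" is an illustrative
# assumption, not the documented Ubiquity interface.
from kubernetes import client, config

config.load_kube_config()  # use the local kubeconfig credentials
core = client.CoreV1Api()

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="analytics-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        # ReadWriteMany suits a shared file system such as Spectrum Scale.
        access_modes=["ReadWriteMany"],
        storage_class_name="ubiquity",  # hypothetical storage class
        resources=client.V1ResourceRequirements(
            requests={"storage": "100Gi"}
        ),
    ),
)

# The claim triggers the provisioner; pods then reference it by name
# in their volume spec, and the data outlives any single container.
core.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
```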