Showing posts with label big compute.

Monday, December 02, 2013

Massively Parallel Compute as a Service

Back in the spring of 2012, I asked several panelists at VMworld to weigh in on vector processing with GPUs as a big data/big compute solution. The response was a resounding "not yet": the infrastructure had not reached commodity level, and GPU processing was greatly constrained by memory paging. It now seems both obstacles are being removed.

Amazon Web Services is now offering EC2 "G2" instances backed by virtualized NVIDIA Kepler GPUs. These instances support H.264 encoding, OpenCL, CUDA and OpenGL, giving developers relatively mature toolsets for building apps targeted at vector processing. That combination of commodity toolchains and commodity infrastructure makes massively parallel processing available on demand.

Memory paging should soon be addressed by NVIDIA with CUDA 6, and by AMD with its upcoming Kaveri architecture. Once memory addressing is unified, swapping memory regions between host and device becomes unnecessary, and data can be addressed in place without paging. This simplifies application development, virtualization and hardware architectures considerably.
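As a rough sketch of what CUDA 6's unified memory enables, a single managed allocation is visible to both CPU and GPU, replacing the explicit host/device staging copies (the kernel here is a hypothetical example, not from any particular application):

```cuda
#include <cuda_runtime.h>

// Hypothetical data-parallel kernel: scale every element.
__global__ void scale(float *d, int n, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= k;
}

int main(void) {
    const int n = 1 << 20;
    float *data;
    // One allocation addressable from both CPU and GPU code --
    // no separate host buffer, no explicit cudaMemcpy staging.
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; i++) data[i] = (float)i;   // CPU writes
    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);   // GPU transforms in place
    cudaDeviceSynchronize();   // ensure the GPU is done before the CPU reads
    cudaFree(data);
    return 0;
}
```

The development-simplification claim is visible right in the sketch: the program manages one pointer instead of mirrored host and device buffers plus the copies that keep them in sync.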

I believe that very soon we will see vector processing at scale garner as much attention as map/reduce clusters currently do. Massive data parsing has been commoditized, and now we have an opportunity to commoditize massive algorithmic crunching.

Tuesday, March 06, 2012

Filling The Pipeline

My past few work engagements have centered around cloud computing and big data: everything from managing large data centers to machine learning to map/reduce clusters. When I was at VMworld 2011 last year, I took the opportunity to ask the "Big Compute and Big Data Panel" about leveraging vector processing hardware such as NVIDIA's Tesla for data processing. The five panelists (Amr Awadallah of Cloudera, Clint Green of Data Tactics, Luke Lonergan of EMC, Richard McDougall of VMware and Paul Kent of SAS) largely agreed on a few main sticking points in vector processing for massively parallel systems:
  • The toolset is still relatively immature (maybe three years behind general CISC architectures)
  • The infrastructure has not yet reached commodity level
  • Big Compute works well with vector processing clusters, but not big data, since the latter is all about locality rather than in-memory processing
  • Commodity GPU processing is greatly constrained by memory paging - there's too much latency in transferring large in-memory datasets to GPU memory.
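That last bottleneck is the classic copy-in/compute/copy-out pattern of discrete GPU programming. A minimal CUDA sketch of the staging the panelists flagged (the kernel and sizes are hypothetical):

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

// Hypothetical data-parallel kernel: scale every element.
__global__ void scale(float *d, int n, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= k;
}

int main(void) {
    const int n = 1 << 24;                       // ~64 MB of floats
    size_t bytes = n * sizeof(float);
    float *host = (float *)calloc(n, sizeof(float));
    float *dev;
    cudaMalloc(&dev, bytes);

    // These two transfers are the latency problem: every large
    // dataset crosses the PCI-E bus twice, once in each direction,
    // while the stream processors sit idle.
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(dev, n, 2.0f);
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev);
    free(host);
    return 0;
}
```

For a compute-light kernel like this one, the two `cudaMemcpy` calls can easily dominate the runtime, which is exactly why in-memory big data workloads fit poorly on discrete GPUs.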

AMD had a few interesting announcements over the past few weeks that may pave the way for making cloud and big data/compute clusters more efficient and more "commodity." The first is their acquisition of SeaMicro, whose emphasis is around massively parallel, low-power computing cores with high-speed interconnects. This addresses one big issue brought up during the panel - that interconnects on big data clusters are going to become a prevailing issue as data needs to be transferred across nodes more rapidly to keep otherwise idle compute resources busy. CPUs can't crunch data sets if the data takes forever to arrive over the wire.

The next big announcement, which may be a huge sleeper hit, is AMD's unified memory architecture that's supposed to arrive in 2012. The slide on AnandTech shows that in AMD's 2012 product line the "GPU can access CPU memory," which is a HUGE development in vector processing. Imagine a data set loaded into 64 GB of main memory, 8 CPU cores cleaning the data with branch-intensive algorithms, and then that same in-memory dataset being transformed by 512 stream processors. That kind of compute power without the need to stream data across a PCI-E bus could be a really, really big deal.
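If such a unified address space arrives, that two-stage pipeline might look something like the sketch below. This is a hypothetical illustration written against CUDA's managed-memory style as a stand-in for AMD's design; on 2012 hardware the GPU stage would still require an explicit copy:

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Stage 2: data-parallel transform, suited to stream processors.
__global__ void transform(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = sqrtf(d[i]);
}

int main(void) {
    const int n = 1 << 22;
    float *data;
    // A single allocation shared by CPU and GPU stages.
    cudaMallocManaged(&data, n * sizeof(float));

    // Stage 1: branch-intensive cleanup, suited to CPU cores.
    for (int i = 0; i < n; i++) {
        if (isnan(data[i]) || data[i] < 0.0f) data[i] = 0.0f;
    }

    // Stage 2: the same in-memory dataset handed straight to the
    // stream processors, with no trip across the PCI-E bus.
    transform<<<(n + 255) / 256, 256>>>(data, n);
    cudaDeviceSynchronize();
    cudaFree(data);
    return 0;
}
```

The design point is that each stage runs on the hardware it suits — branchy scalar code on the CPU, wide data-parallel code on the GPU — over one allocation, without a copy in between.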

Still, the issue that remains is the tooling available to make this happen. Very likely a developer would need to write generic C code to do the branching and then launch a separate OpenCL kernel to transform the data, sharing memory pointers so that nothing has to be swapped or paged out. In a world full of enterprise software developers, that kind of software engineering agility isn't exactly easy to find. If Cloudera were able to unleash this kind of power, AMD would have a big hit on its hands. Maybe AMD needs to start looking toward Cloudera as the final stage in the pipeline: an open-source framework that unlocks the potential of their infrastructure.