Cloud HPC w/ NVIDIA Tesla V100 GPUs + AWS P3, Google TPU2, & IBM Power9 Processors
Cognitive Computing includes elements from Artificial Intelligence (AI), Deep Learning (DL), and Machine Learning (ML), primarily through a combination of training neural networks on data sets and processing information through established algorithms. Decades of international research in computer science have led to major advances in applying this technology in practical ways, for example: robotics, data mining, speech recognition, autonomous vehicle navigation, pharmaceutical discovery, image/video processing, risk management, complex system modeling, meme generation, customer behavior pattern discernment, and product recommendations in ecommerce. Neural networks date back as far as 1943 to the work of Warren McCulloch & Walter Pitts (University of Chicago/MIT), forming key aspects of the Cybernetics (1940-60s), Connectionism (1980-90s), and Deep Learning (2006-present) movements. Recent developments in cloud computing and hardware design have led IT majors and Fortune 500 corporations to implement an "AI-First" strategy, and companies like eBay, Uber, SAP, Dropbox, Airbnb, Snapchat, Twitter, Qualcomm, ARM, and many others have already invested heavily in launching live production applications. In 2017, NVIDIA, Google, AWS, IBM, Facebook, AMD, and Microsoft all made major announcements about new AI/ML/DL platform technology, including, for the first time, making this advanced hardware available to businesses, researchers, and developers on the cloud computing model to integrate with existing web hosting and data center tools. These platforms also scale to perform as supercomputers in High Performance Computing (HPC) applications, processing "big data" in real time and solving complex math, science, and research problems through the power of new GPU/TPU chip designs optimized for ML requirements.
The End of Moore's Law: Increased Competition & Innovation in GPU/TPU Chip Design
Most consumers are already familiar with GPUs (Graphics Processing Units) and FPUs (Floating-Point Units) from PCI-based video graphics cards like those traditionally produced by NVIDIA, AMD, & Intel for high-end PC and Mac workstations or gaming machines. GPUs and FPUs handle mathematical processing with better hardware performance than the CPU provides, due to a combination of chip design, motherboard features, and software integration. Most of the CPU chips in current Intel/AMD desktops and servers are based on CISC (Complex Instruction Set Computing) architecture (Sandy Bridge, Ivy Bridge, Haswell, Broadwell, Skylake, etc.) with integrated GPUs, while mobile phones and tablet computers primarily use RISC (Reduced Instruction Set Computing) architecture licensed by ARM. In contrast, new machine learning accelerators such as Google's TPU are ASICs (Application Specific Integrated Circuits), a design approach that provides significant performance gains in training neural networks, although these chips are much larger than the CPUs found in home and business computers. The recent spike in demand for GPU cards required for Bitcoin mining, autonomous vehicle network construction, and AI applications in industry has driven NVIDIA's stock price from around $29 at the beginning of 2016 to over $220 at the end of 2017. Building on the success of the Tesla P100 GPU accelerators for data centers and HPC based on the Pascal GPU architecture, NVIDIA released the Tesla V100 GPU in 2017 - "the world’s first GPU to break the 100 teraflops (TFLOPS) barrier of deep learning performance."
Microsoft and Amazon.com have both invested heavily in Volta hardware for their cloud data centers, where enterprise corporations, academic institutions, developers, and programmers can now lease GPU-enabled hardware for AI/ML/DL applications on the AWS and Azure platforms. However, considering the cost and margins involved in purchasing NVIDIA hardware at the highest levels of industrial scale, Google found it more cost-effective to invest in designing its own TPU/TPU2 chips for internal use, which it has also made available to the public through the TensorFlow platform. IBM's recent POWER9 chip launch (which grew out of the hardware ecosystem used to develop and run "Watson") is primarily targeted at high-end corporate enterprise, academic research, and government/military AI applications through the IBM Power Systems AC922 server, which can run "up to six NVIDIA Volta-based Tesla V100 GPU accelerators in an air or water-cooled chassis."
"With 640 Tensor Cores, Tesla V100 is the world’s first GPU to break the 100 teraflops (TFLOPS) barrier of deep learning performance. The next generation of NVIDIA NVLink™ connects multiple V100 GPUs at up to 300 GB/s to create the world’s most powerful computing servers. AI models that would consume weeks of computing resources on previous systems can now be trained in a few days... By pairing NVIDIA CUDA® cores and Tensor Cores within a unified architecture, a single server with Tesla V100 GPUs can replace hundreds of commodity CPU-only servers for both traditional HPC and AI workloads." Learn More About NVIDIA Tesla V100 GPUs.
AWS, Google, and Microsoft Azure all currently offer Machine Learning-as-a-Service (MLaaS) products on their cloud platforms that allow programmers to get started immediately building applications based on the leading open source and proprietary AI development tool kits. While both the Amazon and Microsoft platforms are built on NVIDIA Volta GPU hardware, Google currently offers the option to choose between TPU2 or Volta-based cloud servers. AWS servers are built around P2/P3 instances (with G3/EG1 instances available for graphics intensive applications).
Amazon's P3 servers, available with up to 8 dedicated Tesla V100 GPUs, perform up to 14x better for machine learning applications than the P2 series based on Tesla K80 Accelerators with NVIDIA GK210 GPUs. Amazon SageMaker is an optimized runtime environment with support for multiple deep learning platforms and includes hosted Jupyter notebooks, model training based on pre-installed hardware-optimized algorithms, and the ability to integrate with AWS S3 storage for data lake processing. Azure Machine Learning Studio also supports Jupyter notebooks and is designed to support Python and the R programming language. The Google Cloud ML Engine runs on TensorFlow and is designed to process "big data" for image, speech, and video applications or to speed up content/product recommendations served to customers in web apps. All of these services can scale to supercomputer levels for research and development institutions that need to contract HPC equipment on a time-limited basis, following the pay-as-you-go approach popular with cloud hosting accounts.
"Machine learning as a service (MLaaS) is an umbrella definition to automated and semi-automated cloud platforms that cover most infrastructure issues such as data pre-processing, model training, and model evaluation, with further prediction. Prediction results can be bridged with your internal IT infrastructure through REST APIs." Learn More About Amazon ML, Azure ML, & Google Cloud AI.
Programmers building new deep learning applications for business or research purposes can currently draw upon the resources of the IT majors by using cloud services for speech recognition, image recognition, auto-translation, video processing, content recommendation, etc. that can be connected to web/mobile apps through RESTful APIs. IBM has made "Watson" an AI-as-a-Service application, AWS allows developers to integrate Alexa voice recognition into next-generation speech-driven apps & IoT devices, and Microsoft has a Cortana "bot" service on Azure that can be used, for example, to build and host chatbots for customer service requirements. All of these companies offer advanced cognitive computing apps pre-trained on huge data sets, which small businesses can leverage for much more powerful software capabilities than they could affordably develop on their own. Amazon has Lex for Automatic Speech Recognition (ASR), Polly for text-to-speech processing, as well as highly trained image recognition, video transcription, language translation, and data mining tools. Microsoft Azure has cognitive services available via cloud APIs that include natural language processing (NLP), while GCP has Dialogflow for chatbots as well as industry-leading translation, speech recognition, image analysis, and video intelligence tools, all developed from mining Google's ecosystem of web properties. The ability to access these DL/ML tools on an affordable basis and easily build the functionality into existing websites or mobile apps using legacy customer data is one of the main ways that AI is filtering down into popular consumer platforms today.
"Volta, which has been on Nvidia's public roadmap since 2013, is based on a dramatically different architecture to Pascal, rather than a simple die shrink. The V100 chip is made on TSMC's 12nm Fin-FET manufacturing process and packs a whopping 21.1 billion transistors on a 815mm² die. By contrast, the P100 manages just 15.3 billion transistors on a 610mm² die, and the latest Titan Xp sports a mere 12 billion transistors on 471 mm²... The combination of die size and process shrink has enabled Nvidia to push the number of streaming multiprocessors (SMs) to 84. Each SM features 64 CUDA cores for a total of 5,376—much more than any of its predecessors. That said, V100 isn't a fully enabled part, with only 80 SMs enabled (most likely for yield reasons) resulting in 5,120 CUDA cores. In addition, V100 also features 672 tensor cores (TCs), a new type of core explicitly designed for machine learning operations." Learn More About the NVIDIA Tesla V100 GPU.
To build new applications for the NVIDIA Volta GPU architecture, programmers need to use the CUDA 9 software development toolkit. CUDA 9 supports the programming languages C, C++, Fortran, and Python, integrating with Microsoft Visual Studio 2017, clang 3.9, PGI 17.1 & GCC 6.x. With CUDA 9, software applications can take advantage of the superior throughput speeds of NVIDIA's new NVLink architecture, use NVIDIA Performance Primitives for image & signal processing, and draw on the advanced features available with cuFFT, cuSOLVER, cuBLAS, & nvGRAPH. CUDA 9 also includes new algorithms for neural machine translation and sequence modeling operations using Volta Tensor Cores. Some of the other popular programming and development platforms for building deep learning and machine learning applications currently supported by AWS, GCP, & Azure are:
In September of 2017, Facebook and Microsoft launched the Open Neural Network Exchange (ONNX) as an attempt to build open standards between all of these various developer frameworks for deep learning applications. Learn More About ONNX.
"Google’s CEO, Sundar Pichar, has made it clear that the company’s strategy has transitioned from 'Mobile First' to 'AI First'. Google’s Cloud TPU is far more strategic than just having access to a cheaper alternative to GPUs. The TPU and the Google TensorFlow Framework give the company’s engineers and data scientists a comprehensive and optimized platform to support their research and product development. Google teams can potentially gain time to market, performance and feature advantages since they control both the hardware and software for their Machine Learning enhanced products and services. The TPU could even provide a future platform to support the company’s autonomous vehicle aspirations. Beyond the internal drivers, Google Cloud could benefit in its competition with Amazon Web Services and Microsoft Azure Cloud by offering hardware with superior price / performance for TensorFlow development projects." Learn More About the Strategic Implications of the Google TPU2.
"A supervised deep learning algorithm will generally achieve acceptable performance with around 5,000 labeled examples per category and will match or exceed human performance when trained with a dataset containing at least 10 million labeled examples." Learn More About Neural Networks at TensorFlow.
"Built from the ground up for enterprise AI, the IBM Power Systems AC922 features two multi-core P9 CPUs and up to six NVIDIA Volta-based Tesla V100 GPU accelerators in an air or water-cooled chassis. To achieve the highest performance available from these state-of-the-art GPUs, the system features next-generation NVIDIA NVLink interconnect for CPU-to-GPU, which improves data movement between the P9 CPUs and NVIDIA Tesla V100 GPUs up to 5.6x compared to the PCIe Gen3 buses used within x86 systems. In addition to being the only server with next generation CPU-to-GPU NVLink, this is also the first server in the industry with PCIe 4.0, which doubles the bandwidth of PCIe Gen3, to which x86 is currently committed." Learn More About the IBM POWER9(P9) Processor.