Amazon SageMaker AI Service

Amazon revealed its SageMaker AI service today, which lets customers train machine learning models at massive scale while keeping costs down. Amazon uses novel techniques to limit the compute power required while still delivering comparable performance.

When SageMaker takes in data to train a model, it uses a streaming algorithm that makes only one pass over the data it is fed. While other algorithms can see exponential increases in the time and processing power they need as datasets grow, Amazon’s algorithms don’t. As data is streamed into the system, the algorithm adjusts its state, a persistent representation of the statistical patterns present in the information fed into SageMaker for training a particular system.
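
To make the idea of a single-pass algorithm with a persistent state concrete, here is a minimal Python sketch. It is not SageMaker's algorithm, which the announcement does not detail; it uses Welford's well-known online method for mean and variance as a stand-in, and the names RunningStats and consume are invented for this illustration. Each value is folded into a fixed-size state and then discarded, so memory use stays constant however long the stream runs.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    """Fixed-size state summarizing every value seen so far (hypothetical name)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        # Welford's update: fold one value into the state, then forget the value.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; reported as 0.0 for an empty stream.
        return self.m2 / self.count if self.count else 0.0


def consume(stream: Iterable[float]) -> RunningStats:
    state = RunningStats()
    for x in stream:  # exactly one pass; nothing is stored besides the state
        state.update(x)
    return state
```

The state here holds only three numbers, but the same pattern applies to richer summaries: the cost of each update is independent of how much data has already been seen.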

That state isn’t a trained machine learning model, though: It’s an abstraction of the data fed to SageMaker that can then be used to train a model. That provides a number of useful advantages, like making it easier for Amazon to distribute training of a model. SageMaker can compare the states of the same algorithms working on different data across multiple machines over the course of the training process, to make sure that all the systems are correctly sharing a representation of the data they’re being fed.
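
One reason a compact state helps with distributed training is that states built on different machines can often be combined without revisiting the raw data. The Python sketch below illustrates this under stated assumptions: it is not Amazon's implementation, the state is a simple (count, mean, sum of squared deviations) tuple, and summarize and merge are hypothetical names. The merge step uses the standard pairwise formula for combining running means and variances.

```python
from typing import Iterable, Tuple

State = Tuple[int, float, float]  # (count, mean, sum of squared deviations)


def summarize(values: Iterable[float]) -> State:
    """Build a state from one worker's shard in a single pass."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return count, mean, m2


def merge(a: State, b: State) -> State:
    """Combine two workers' states without touching either worker's data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2
```

Merging the states of two shards gives the same result, up to floating-point rounding, as summarizing all of the data on one machine, which is the property that lets workers be compared or combined during training.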

That same representation makes it easier to optimize the hyperparameters of a resulting machine learning model. Those hyperparameters, which govern certain functions of the model, are key to creating the best machine learning system. Traditionally, data scientists would optimize them by repeatedly training the same model with different hyperparameter values each time and picking the model that produces the most accurate final result.

However, that can be a time-consuming process, especially for models built using large amounts of data. With SageMaker, Amazon doesn’t have to do all the heavy lifting of retraining, since it can just use the streaming algorithm’s state.
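
As a rough illustration of tuning from a stored state rather than from the raw data, consider one-feature ridge regression. This is a toy stand-in chosen because it has a closed-form solution, not a description of how SageMaker works. The sums of x*x, x*y and y*y are collected in one pass, and any regularization strength can then be tried using those sums alone. The function names and the use of a separate held-out state are assumptions made for this sketch.

```python
from typing import Iterable, Sequence, Tuple

Pair = Tuple[float, float]
State = Tuple[float, float, float]  # (sum of x*x, sum of x*y, sum of y*y)


def accumulate(pairs: Iterable[Pair]) -> State:
    """One pass over (x, y) pairs; only three running sums are kept."""
    sxx = sxy = syy = 0.0
    for x, y in pairs:
        sxx += x * x
        sxy += x * y
        syy += y * y
    return sxx, sxy, syy


def ridge_weight(train: State, lam: float) -> float:
    # Closed-form ridge solution for y ~ w * x (no intercept).
    # Assumes sxx + lam is non-zero.
    sxx, sxy, _ = train
    return sxy / (sxx + lam)


def squared_error(state: State, w: float) -> float:
    # sum((y - w*x)**2), expanded so that it needs only the stored sums.
    sxx, sxy, syy = state
    return syy - 2.0 * w * sxy + w * w * sxx


def best_lambda(train: State, held_out: State, candidates: Sequence[float]) -> float:
    """Pick the candidate with the lowest held-out error, using states only."""
    return min(
        candidates,
        key=lambda lam: squared_error(held_out, ridge_weight(train, lam)),
    )
```

Each candidate costs a handful of arithmetic operations regardless of dataset size, whereas the traditional approach would reread the training data once per candidate.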

All of this is in the service of creating a system that can handle incredibly large datasets running at global scale, something that’s important both for Amazon’s own AI projects and for its customers’ needs.
