SINGA is a general distributed deep learning platform for training big deep learning models over large datasets. It is designed with an intuitive programming model based on the layer abstraction. A variety of popular deep learning models are supported, namely feed-forward models including convolutional neural networks (CNN), energy models like restricted Boltzmann machine (RBM), and recurrent neural networks (RNN).
SINGA architecture is sufficiently flexible to run synchronous, asynchronous and hybrid training frameworks. SINGA also supports different neural net partitioning schemes to parallelize the training of large models, namely partitioning on batch dimension, feature dimension or hybrid partitioning.
Training a deep learning model is to find the optimal parameters involved in the transformation functions that generate good features for specific tasks. The goodness of a set of parameters is measured by a loss function, e.g., Cross-Entropy Loss. Since the loss functions are usually non-linear and non-convex, it is difficult to get a closed form solution. Typically, people use the stochastic gradient descent (SGD) algorithm, which randomly initializes the parameters and then iteratively updates them to reduce the loss as shown above.
To submit a job in SINGA (i.e., training a deep learning model), users pass the job configuration to SINGA driver in the main function. The job configuration specifies the four major components in Figure 2,
- NeuralNet describing the neural net structure with the detailed layer setting and their connections;
- TrainOneBatch algorithm which is tailored for different model categories;
- Updater defining the protocol for updating parameters at the server side;
- Cluster Topology specifying the distributed architecture of workers and servers.
This process is like the job submission in Hadoop, where users configure their jobs in the main function to set the mapper, reducer, etc. In Hadoop, users can configure their jobs with their own (or built-in) mapper and reducer; in SINGA, users can configure their jobs with their own (or built-in) layer, updater, etc.
For Quick Start & Programming guide refer link below