Abstract:
In many modern computing applications, such as Market Analysis, Critical Care, Speech Recognition, Physical Plant Monitoring, Sleep Stage Classification, and Biological Population Tracking, data is captured over the course of time, constituting a Time-Series. Time-Series data often contain temporal dependencies that cause two otherwise identical points in time to belong to different classes or to predict different behavior, an inherent characteristic that increases the difficulty of processing such data. Deep Machine Learning (DML) techniques are well suited to analyzing and making predictions about such data. By its nature, however, DML demands extensive resources, key among which is model computation time. Several optimization algorithms have been proposed in the recent past, and they differ in their resource requirements. The most popular class of optimization algorithms is based on the classical stochastic gradient descent (SGD) algorithm, owing to its ability to converge within reasonable time bounds. This paper is part of a larger project investigating SGD-based optimization procedures for deep learning tasks. Specifically, we report on the comparative performance of the most popular SGD-based algorithms for the task of Time-Series prediction. From our analysis of six of these algorithms, we found that ADAMAX is most appropriate for online learning, while RMSPROP is the least affected by over-fitting over long training cycles.
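To make the kind of comparison described above concrete, the following is a minimal sketch, in Python with Keras, of a benchmarking loop that trains a small LSTM forecaster on a toy Time-Series under several SGD-based optimizers and records validation loss. The abstract does not name the six algorithms compared; the set below (SGD, Adagrad, Adadelta, RMSprop, Adam, Adamax) is an assumption, with only ADAMAX and RMSPROP mentioned explicitly, and the model, data, and hyperparameters are illustrative only.

```python
# Hypothetical sketch of an optimizer comparison on a toy time-series
# prediction task. The six-optimizer set is assumed, not taken from the
# paper; only ADAMAX and RMSPROP are named in the abstract.
import numpy as np
import tensorflow as tf

def make_series(n=1000, window=20):
    """Build (window -> next value) samples from a noisy sine wave."""
    t = np.linspace(0, 50, n)
    y = np.sin(t) + 0.1 * np.random.randn(n)
    X = np.stack([y[i:i + window] for i in range(n - window)])
    return X[..., None].astype("float32"), y[window:].astype("float32")

X, y = make_series()

# Assumed comparison set of SGD-based optimizers (illustrative choice).
optimizers = {
    "SGD": tf.keras.optimizers.SGD(learning_rate=0.01),
    "Adagrad": tf.keras.optimizers.Adagrad(),
    "Adadelta": tf.keras.optimizers.Adadelta(),
    "RMSprop": tf.keras.optimizers.RMSprop(),
    "Adam": tf.keras.optimizers.Adam(),
    "Adamax": tf.keras.optimizers.Adamax(),
}

for name, opt in optimizers.items():
    # Fresh model per optimizer so each run starts from scratch.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(X.shape[1], 1)),
        tf.keras.layers.LSTM(32),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=opt, loss="mse")
    hist = model.fit(X, y, epochs=5, batch_size=32,
                     validation_split=0.2, verbose=0)
    print(f"{name:8s} final val MSE: {hist.history['val_loss'][-1]:.4f}")
```

In a study of over-fitting across long training cycles, as the abstract describes, the epoch count would be much larger and the validation-loss trajectory, not just its final value, would be the quantity of interest.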