Deep Learning Definition | Training Dataset

Deep learning is an area of machine learning research in which multi-layer models are trained on a dataset. Read on to learn more about the deep learning process.

Deep Learning Concepts

Deep learning is useful for learning multiple levels of representation and abstraction that help make sense of data such as images, sound, and text.

This tutorial helps you understand the important deep learning algorithms and how to run them using "Theano".

Theano is a Python library that makes it easier to write deep learning models and gives you the option of training them on a GPU.

A GPU (Graphics Processing Unit) is a processor like the CPU (Central Processing Unit), but it executes highly parallel numerical work much faster than a CPU. So the GPU is especially helpful for loading and processing large amounts of data in deep learning programs.

You can learn more in the Theano basic tutorial. This is a technical topic, so some background knowledge will help.


MNIST Dataset

The MNIST dataset consists of handwritten digit images. It is divided into 60,000 examples for training and 10,000 examples for testing. The official training set of 60,000 is commonly split further into an actual training set of 50,000 examples and a validation set of 10,000 examples.
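The split of the 60,000 official training examples can be sketched in a few lines. This is only an illustration: the helper name `split_train_valid` is hypothetical, and plain integers stand in for the real images.

```python
def split_train_valid(examples, n_valid):
    """Hold out the last n_valid examples as a validation set."""
    if not 0 < n_valid < len(examples):
        raise ValueError("n_valid must be between 1 and len(examples) - 1")
    cut = len(examples) - n_valid
    return examples[:cut], examples[cut:]

# Integers stand in for the 60,000 official training examples
official_train = list(range(60000))
train_part, valid_part = split_train_valid(official_train, 10000)
```

Holding out the last examples is just one possible choice; any fixed split works as long as the validation examples are never used for training.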

All digit images have been size-normalized and centered in a fixed-size image of 28 x 28 pixels. In the original dataset, each pixel of an image is represented by a value between 0 and 255, where 0 is black, 255 is white, and anything in between is a shade of grey.
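Later in this post each image appears as a flat array of floats between 0 and 1 instead of a 28 x 28 grid of values between 0 and 255. A minimal sketch of that conversion, assuming the image is given as a list of rows (the function name `flatten_and_scale` is hypothetical), could look like this:

```python
def flatten_and_scale(image_rows):
    """Turn a grid of 0-255 pixel values into a flat list of floats in [0, 1]."""
    return [pixel / 255.0 for row in image_rows for pixel in row]

# A made-up 28 x 28 image: all black except one white pixel in the middle
image = [[0] * 28 for _ in range(28)]
image[14][14] = 255

vector = flatten_and_scale(image)
```

The result has 28 * 28 = 784 entries, and the white pixel ends up at position 14 * 28 + 14 with the value 1.0.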

MNIST Dataset Program Using Theano Python Library

The pickled MNIST dataset is a tuple of three lists: the training set, the validation set, and the testing set. Each of the three is a pair formed from a list of images and a list of class labels, one label per image.

Each image is represented as a 1-dimensional numpy array of 784 (28 x 28) float values between 0 and 1, where 0 stands for black and 1 for white. The labels are numbers between 0 and 9 that say which digit the image shows. The following Python code loads the dataset.

import cPickle, gzip, numpy

# Load the pickled dataset (cPickle is the Python 2 module name)
f = gzip.open('mnist.pkl.gz', 'rb')
train_set, valid_set, test_set = cPickle.load(f)
f.close()
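To make the nesting of the loaded object easier to picture, here is a tiny stand-in with the same shape: a tuple of three (images, labels) pairs. The pixel and label values are invented for illustration and are not taken from the real file.

```python
# Toy stand-in with the same nesting as the pickled file:
# (training set, validation set, testing set), each a pair (images, labels)
toy_dataset = (
    ([[0.0] * 784, [1.0] * 784], [5, 0]),   # two made-up training images
    ([[0.5] * 784], [4]),                   # one made-up validation image
    ([[0.25] * 784], [1]),                  # one made-up testing image
)

train_set, valid_set, test_set = toy_dataset
train_x, train_y = train_set        # list of images, list of labels
first_image, first_label = train_x[0], train_y[0]
```

Each image is a flat sequence of 784 values, and the label at the same position tells you which digit that image shows.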

When using the dataset, we usually divide it into minibatches. The reason for keeping it in shared variables is related to using the GPU: there is a large overhead when copying data into GPU memory.

If the entire dataset is stored in Theano shared variables, Theano can copy it to the GPU in a single call. The following code shows how to store the data and how to access a minibatch.

import theano
import theano.tensor as T

def shared_dataset(data_xy):
    """Load the dataset into shared variables.

    Storing the dataset in shared variables allows Theano to copy it
    into GPU memory in one go, because copying data into the GPU in
    small pieces is slow.
    """
    data_x, data_y = data_xy
    shared_x = theano.shared(numpy.asarray(data_x, dtype=theano.config.floatX))
    shared_y = theano.shared(numpy.asarray(data_y, dtype=theano.config.floatX))
    # Data stored on the GPU has to be floating point, so the labels are
    # stored as floatX as well. The computations need integer labels,
    # so shared_y is cast to int32 before it is returned.
    return shared_x, T.cast(shared_y, 'int32')

test_set_x, test_set_y = shared_dataset(test_set)
valid_set_x, valid_set_y = shared_dataset(valid_set)
train_set_x, train_set_y = shared_dataset(train_set)

batch_size = 500  # size of the minibatch

# Access the third minibatch of the training set
data = train_set_x[2 * batch_size: 3 * batch_size]
label = train_set_y[2 * batch_size: 3 * batch_size]
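The slice above picks out one particular minibatch, but the same arithmetic works for any minibatch index. As a rough sketch, with a plain Python list standing in for the shared variable and with hypothetical helper names:

```python
def minibatch_slice(index, batch_size):
    """Return the slice that selects minibatch number `index` (0-based)."""
    start = index * batch_size
    return slice(start, start + batch_size)

def count_minibatches(n_examples, batch_size):
    """Number of complete minibatches; a trailing partial batch is dropped."""
    return n_examples // batch_size

batch_size = 500
examples = list(range(50000))   # stand-in for 50,000 training rows
third = examples[minibatch_slice(2, batch_size)]
n_batches = count_minibatches(len(examples), batch_size)
```

With 50,000 training examples and a batch size of 500 this gives 100 minibatches, and index 2 covers rows 1000 to 1499.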

The data has to be stored as floats on the GPU (the right dtype for storing data on the GPU is given by theano.config.floatX). To get around this for the labels, we store them as floats and then cast them to int.
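One reason the labels are cast back to integers is that a label is often used as an index, for example to pick the score of the correct class. The following plain-Python sketch shows the idea without Theano; the scores and the helper name `pick_by_label` are made up for illustration.

```python
def pick_by_label(per_class_values, label):
    """Use a class label as a list index; the label is converted to int first."""
    return per_class_values[int(label)]

class_scores = [0.1, 0.7, 0.2]   # hypothetical scores for classes 0, 1, 2
float_label = 1.0                # a label as it would be stored in floating point

try:
    class_scores[float_label]    # indexing with a float raises TypeError
    float_index_works = True
except TypeError:
    float_index_works = False

picked = pick_by_label(class_scores, float_label)
```

Indexing with the float fails, while the integer version returns the score of class 1.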

Deep Learning Python Theano Library

import theano
import theano.tensor as T
import numpy

Theano/Python Tips

Loading And Saving Models
When you are running experiments, it can take a long time to find good parameters. You will want to save them once you find them, and you may also want to save your current best estimates as the search progresses.

The best way to save your model's parameters is to pickle or deep copy the ndarray objects. For example, if your parameters are in the shared variables w, v, and u, then the save and load commands look like this.

import cPickle

# Opening with 'wb' overwrites the current contents of the file
save_file = open('path', 'wb')

# -1 selects the highest pickle protocol, which stores arrays much
# more efficiently than the default
cPickle.dump(w.get_value(borrow=True), save_file, -1)
cPickle.dump(v.get_value(borrow=True), save_file, -1)
cPickle.dump(u.get_value(borrow=True), save_file, -1)
save_file.close()

# Later, load the values back in the same order they were saved
save_file = open('path', 'rb')
w.set_value(cPickle.load(save_file), borrow=True)
v.set_value(cPickle.load(save_file), borrow=True)
u.set_value(cPickle.load(save_file), borrow=True)
save_file.close()

With this approach you are able to save and load the model parameters.
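The important detail in the save and load steps is the order: values come back out of the file in the same order they went in. Here is a small round trip you can run without Theano. It is a sketch under a few assumptions: it uses Python 3's `pickle` (the successor of `cPickle`), plain lists stand in for the arrays returned by `get_value()`, and an in-memory buffer stands in for the file on disk.

```python
import io
import pickle

def save_params(values, file_obj):
    """Pickle each parameter value in turn with the highest protocol."""
    for value in values:
        pickle.dump(value, file_obj, pickle.HIGHEST_PROTOCOL)

def load_params(n_values, file_obj):
    """Read back n_values objects in the order they were written."""
    return [pickle.load(file_obj) for _ in range(n_values)]

# Made-up parameter values standing in for w, v and u
w, v, u = [0.1, 0.2], [0.3], [0.4, 0.5, 0.6]

buffer = io.BytesIO()            # in-memory stand-in for the save file
save_params([w, v, u], buffer)
buffer.seek(0)                   # rewind before reading
w_loaded, v_loaded, u_loaded = load_params(3, buffer)
```

If you save in the order w, v, u, you must also load in the order w, v, u, otherwise the values end up in the wrong variables.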

Do Not Pickle Theano Functions For Long-Term Storage

Theano functions are compatible with Python's deepcopy and pickle mechanisms, but that does not mean you should pickle a Theano function. If you update your Theano folder and one of the internals changes, you may not be able to un-pickle your model.

Theano is still in active development and the internal APIs are subject to change. So, to be on the safe side, don't pickle your training or testing functions for long-term storage.

The pickle mechanism is aimed at short-term storage, such as a temp file or a copy to another machine in a distributed job.
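Since the advice is to keep the parameter values and not the compiled functions, one way to hold on to the best estimate while the search progresses is to deep copy the parameters whenever the validation error improves. This is only a sketch: the function name `track_best`, the error values, and the parameter dictionaries are all invented for illustration.

```python
import copy

def track_best(history):
    """Return (best_error, best_params) from (validation_error, params) pairs."""
    best_error = float("inf")
    best_params = None
    for error, params in history:
        if error < best_error:
            best_error = error
            # Deep copy so later changes to params do not alter the saved best
            best_params = copy.deepcopy(params)
    return best_error, best_params

# Made-up search history: validation error and parameter values per step
history = [
    (0.30, {"w": [1.0]}),
    (0.25, {"w": [2.0]}),
    (0.28, {"w": [3.0]}),
]
best_error, best_params = track_best(history)
```

The copy in `best_params` can then be written to disk with pickle in the same way as the parameters above.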

Deep Learning Versus Machine Learning

Deep Learning – Deep learning is a part of machine learning in which multi-layer models are trained on a dataset. It is one of several approaches within ML, and it works particularly well on data such as images, text, and sound.

Training is commonly categorized into two kinds: supervised learning and unsupervised learning. We already covered both in a previous post on supervised learning and unsupervised learning.

Training teaches the machine how to behave on future inputs. For example, when a person asks a question, a trained model is able to give an answer based on the data it was trained on.

Machine Learning – Machine learning is the broader field of building programs that learn from data such as images, text, and sound. Deep learning is one way of doing this.

Machine learning refers to teaching the machine about a particular task. A machine learning model is built from the data it is given.


Deep learning is an important technology within machine learning: once a model has been trained on data, it can keep answering questions for a long time afterwards.

Theano is a strong Python library for training models on datasets. It works closely with NumPy, an important library for efficient computation on large arrays of data.

Python has many resources and APIs (Application Programming Interfaces) for deep learning and for other technologies such as Artificial Intelligence and Big Data.
