
.. _lib_scan:

================================
:mod:`scan` -- Looping in Theano
================================


Guide
=====

The scan function provides the basic functionality needed to do loops
in Theano. Scan comes with many bells and whistles, which are easiest
to introduce through a few examples:

Basic functionality: Computing :math:`A^k`
--------------------------------------------

Assume that, given *k*, you want to compute ``A**k`` using a loop.
More precisely, if *A* is a tensor, you want to compute
``A**k`` elementwise. The Python/NumPy code would look like this:

.. code-block:: python

  result = 1
  for i in range(k):
    result = result * A

The equivalent Theano code would be:

.. code-block:: python

  import theano
  import theano.tensor as T

  k = T.iscalar("k")  # number of steps
  A = T.matrix("A")   # tensor to raise to the power k, elementwise

  # Symbolic description of the result
  result, updates = theano.scan(fn=lambda x_tm1, A: x_tm1 * A,
                                outputs_info=T.ones_like(A),
                                non_sequences=A,
                                n_steps=k)

  # compiled function that returns A**k
  f = theano.function([A, k], result[-1], updates=updates)

Let us go through the example line by line. First, we construct a function
(using a lambda expression) that, given ``x_tm1`` and ``A``, returns
``x_tm1*A``. Scan calls this function with the previous value of the output
first, followed by the non-sequences, so ``x_tm1`` is the value of our output
at time step ``t-1``. Therefore ``x_t`` (the value of the output at time
``t``) is ``A`` times the value of the output at ``t-1``.
Next, we initialize the output as a tensor with the same shape as ``A``,
filled with ones. We give ``A`` to scan as a non-sequence parameter and
specify the number of steps ``k`` over which to iterate our lambda expression.

Scan returns a tuple containing our result (``result``) and a
dictionary of updates (empty in this case). Note that the result
is not a matrix but a 3D tensor containing ``A**i`` for every step ``i``
from 1 to ``k``. We want the last value (after ``k`` steps), so we compile
a function that returns just that. There is an optimization that, at
compile time, detects that you are using only the last value of the
result and ensures that scan does not store all the intermediate values.
So storing the intermediate values is not a concern even when ``A`` and
``k`` are large.
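
To make the semantics concrete without Theano, the following plain-Python
sketch mimics what the graph above computes for a small vector ``A``: the
step function is applied ``k`` times, the previous output is passed in first
and the non-sequence second, and every intermediate value is collected. The
helper ``scan_like`` is hypothetical; it only models the behaviour described
here and is not how scan is implemented.

.. code-block:: python

  def scan_like(fn, initial, non_sequence, n_steps):
      """Hypothetical stand-in for scan with one output and one non-sequence.

      Mirrors the argument order described above: previous output first,
      then the non-sequence. Returns the output of every step.
      """
      outputs = []
      prev = initial
      for _ in range(n_steps):
          prev = fn(prev, non_sequence)
          outputs.append(prev)
      return outputs


  def step(x_tm1, A):
      # elementwise product of two equal-length lists, like x_tm1 * A
      return [x * a for x, a in zip(x_tm1, A)]


  A = [1.0, 2.0, 3.0]
  k = 4
  result = scan_like(step, [1.0] * len(A), A, k)  # initial value: ones_like(A)
  print(result[-1])   # [1.0, 16.0, 81.0]
  print(len(result))  # 4

The last entry is the elementwise ``A**k``, while the whole list corresponds
to the stack of per-step values that scan returns as ``result``.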

Multiple outputs, several tap values - Recurrent Neural Network with Scan
--------------------------------------------------------------------------

A more practical task would be to implement an RNN using scan. Assume
that our RNN is defined as follows:

.. math::
  x(n) = \tanh( W x(n-1) + W^{in}_1 u(n) + W^{in}_2 u(n-4) +
  W^{feedback} y(n-1) )

  y(n) = W^{out} x(n- 3)

Note that this network is far from a classical recurrent neural
network and might be useless. We define it this way only to better
illustrate the features of scan.

In this case we have a sequence ``u`` over which we need to iterate,
and two outputs ``x`` and ``y``. To implement this with scan, we first
construct a function that computes one iteration step:

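.. code-block:: python

  import theano
  import theano.tensor as T

  def oneStep(u_tm4, u_t, x_tm3, x_tm1, y_tm1, W, W_in_1, W_in_2, W_feedback, W_out):
    # new state x[t] from the previous state, two input taps and the feedback from y
    x_t = T.tanh(theano.dot(x_tm1, W) +
                 theano.dot(u_t, W_in_1) +
                 theano.dot(u_tm4, W_in_2) +
                 theano.dot(y_tm1, W_feedback))
    # output y[t] reads the state three steps back
    y_t = theano.dot(x_tm3, W_out)

    return [x_t, y_t]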

As a naming convention for the variables, we use ``a_tmb`` to mean ``a`` at
``t-b`` and ``a_tpb`` to mean ``a`` at ``t+b``.
Note the order in which the parameters are given and in which the
results are returned (the same order as in ``outputs_info``). Try to respect
chronological order among the taps (time slices of sequences or outputs) used.
What is crucial for scan is that the arguments representing the different time
taps of a variable appear in the same order as the one in which these taps are
given. The variables themselves must also respect an order, since this is how
scan figures out which argument represents what: first the taps of the
sequences, then the taps of the outputs, and finally the non-sequences. Given
that we have all the Theano variables needed, we construct our RNN as follows:

.. code-block:: python

   u  = T.matrix()  # it is a sequence of vectors
   x0 = T.matrix()  # initial state of x has to be a matrix, since
                    # it has to cover x[-3]
   y0 = T.vector()  # y0 is just a vector since scan only has to provide
                    # y[-1]

   # the weight matrices are assumed to exist; declared symbolically here
   W, W_in_1, W_in_2, W_feedback, W_out = [T.matrix() for _ in range(5)]

   ([x_vals, y_vals], updates) = theano.scan(fn=oneStep,
                                             sequences=dict(input=u, taps=[-4, 0]),
                                             outputs_info=[dict(initial=x0, taps=[-3, -1]), y0],
                                             non_sequences=[W, W_in_1, W_in_2, W_feedback, W_out])
   # for the second output, y, scan uses the default tap -1



Now ``x_vals`` and ``y_vals`` are symbolic variables pointing to the
sequences of x and y values generated by iterating over ``u``. The ``taps``
entries given for the sequence and for the outputs tell scan exactly which
slices are needed. Note that if we want to use ``x[t-k]``, we do
not need to also request ``x[t-(k-1)]``, ``x[t-(k-2)]``, ..., but when calling
the compiled function, the NumPy array given to represent this sequence (or
initial state) must be large enough to cover these values. Assume that we
compile the above function and give as ``u`` the array
``uvals = [0,1,2,3,4,5,6,7,8]``. By abuse of notation, scan will consider
``uvals[0]`` to be ``u[-4]`` and will start scanning from ``uvals[4]``
towards the end, which gives ``9 - 4 = 5`` steps.
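
The alignment of taps with the input array can be checked with a small
plain-Python sketch. The helper ``tap_slices`` is hypothetical: it assumes
scan runs ``len(seq) - (max(taps) - min(taps))`` steps and that ``seq[0]``
plays the role of ``u[min(taps)]``, which matches the example above. For each
step it returns the values scan would pass for ``u[t-4]`` and ``u[t]``.

.. code-block:: python

  def tap_slices(seq, taps):
      """Hypothetical helper: the values passed for each tap at every step.

      Assumes len(seq) - (max(taps) - min(taps)) steps, with seq[0]
      standing for u[min(taps)] (in the text, uvals[0] is u[-4]).
      """
      lo, hi = min(taps), max(taps)
      n_steps = len(seq) - (hi - lo)
      return [tuple(seq[t + tap - lo] for tap in taps) for t in range(n_steps)]


  uvals = [0, 1, 2, 3, 4, 5, 6, 7, 8]
  for u_tm4, u_t in tap_slices(uvals, [-4, 0]):
      print(u_tm4, u_t)
  # 0 4
  # 1 5
  # 2 6
  # 3 7
  # 4 8

The same reasoning applies to the outputs: with taps ``[-3, -1]``, the three
rows of ``x0`` play the roles of ``x[-3]``, ``x[-2]`` and ``x[-1]``, which is
why ``x0`` has to be a matrix.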


Using shared variables - Gibbs sampling
---------------------------------------

Another useful feature of scan is that it can handle shared variables.
For example, if we want to implement a Gibbs chain of length 10, we would do
the following:

.. code-block:: python

  import theano
  import theano.tensor as T
  from theano.tensor.shared_randomstreams import RandomStreams

  # we assume that W_values, bvis_values and bhid_values are NumPy arrays
  # holding the initial values of the weight matrix and the biases
  W = theano.shared(W_values)
  bvis = theano.shared(bvis_values)
  bhid = theano.shared(bhid_values)

  trng = RandomStreams(1234)

  def OneStep(vsample):
    hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)
    hsample = trng.binomial(size=hmean.shape, n=1, p=hmean,
                            dtype=theano.config.floatX)
    vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)
    # sample the visible units from vmean (not from the previous sample)
    return trng.binomial(size=vsample.shape, n=1, p=vmean,
                         dtype=theano.config.floatX)

  sample = T.vector()

  values, updates = theano.scan(OneStep, outputs_info=sample, n_steps=10)

  gibbs10 = theano.function([sample], values[-1], updates=updates)


Note that if we use shared variables (``W``, ``bvis``, ``bhid``) but
do not iterate over them (so scan does not really need to know
anything in particular about them, only that they are used inside the
function applied at each step), you do not need to pass them as
arguments. Scan will find them on its own and add them to the graph. Of
course, if you wish to (and it is good practice), you can pass them
when you call scan, in the list of non-sequence inputs.
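
The sampling step itself can be sketched in plain Python, which may help when
checking what ``OneStep`` computes. This is a minimal sketch assuming a binary
RBM with tiny, made-up parameters (the ``W``, ``bvis`` and ``bhid`` values
below are hypothetical), with Python's ``random.Random`` standing in for
``trng``; it is not Theano code.

.. code-block:: python

  import math
  import random


  def sigmoid(z):
      return 1.0 / (1.0 + math.exp(-z))


  def gibbs_step(v, W, bvis, bhid, rng):
      """One v -> h -> v step for a binary RBM, following OneStep above.

      W is a list of rows (n_visible x n_hidden); plain lists stand in for tensors.
      """
      n_vis, n_hid = len(W), len(W[0])
      # hmean = sigmoid(v . W + bhid)
      hmean = [sigmoid(sum(v[i] * W[i][j] for i in range(n_vis)) + bhid[j])
               for j in range(n_hid)]
      # binomial draw with n=1: 1 with probability p, else 0
      hsample = [1 if rng.random() < p else 0 for p in hmean]
      # vmean = sigmoid(h . W^T + bvis)
      vmean = [sigmoid(sum(hsample[j] * W[i][j] for j in range(n_hid)) + bvis[i])
               for i in range(n_vis)]
      return [1 if rng.random() < p else 0 for p in vmean]


  def gibbs_chain(v0, W, bvis, bhid, n_steps, rng):
      # apply gibbs_step n_steps times, feeding each sample into the next step
      v = v0
      for _ in range(n_steps):
          v = gibbs_step(v, W, bvis, bhid, rng)
      return v


  # hypothetical parameters: 3 visible units, 2 hidden units
  W = [[0.5, -0.3], [0.2, 0.8], [-0.6, 0.1]]
  bvis = [0.0, 0.1, -0.1]
  bhid = [0.05, -0.05]

  rng = random.Random(1234)  # plays the role of trng
  v10 = gibbs_chain([1, 0, 1], W, bvis, bhid, 10, rng)
  print(v10)

Because ``rng`` is stateful, each further call with it continues the random
stream; building a new generator from the same seed replays exactly the same
draws.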

The second, and probably most crucial, observation is that the updates
dictionary becomes important in this case. It links each shared variable
with its updated value after ``k`` steps; here it tells how the
random streams get updated after 10 iterations. If you do not pass this
dictionary to ``theano.function``, you will always get the same 10
sets of random numbers. You can even use the ``updates`` dictionary
afterwards in other expressions. Look at this example:

.. code-block:: python

  import theano

  a = theano.shared(1)
  values, updates = theano.scan(lambda: {a: a + 1}, n_steps=10)

In this case the lambda expression does not require any input parameters
and returns an update dictionary that tells how ``a`` should be updated
after each step of scan. If we write:

.. code-block:: python

  b = a + 1
  c = updates[a] + 1
  f = theano.function([], [b, c], updates=updates)

  b_val, c_val = f()  # outputs are computed before the updates are applied
  print(b_val)
  print(c_val)
  print(a.get_value())

We will see that, because ``b`` does not use the updated version of
``a``, it will be 2, while ``c``, which is built from ``updates[a]`` (the
value of ``a`` after the 10 steps), will be 12; after the call, ``a`` holds 11.
If we call the function again, ``b`` will become 12, ``c`` will be 22,
and ``a`` will hold 21.

If we do not pass the ``updates`` dictionary to the function, then
``a`` will always hold 1, ``b`` will always be 2 and ``c``
will always be 12.
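
The bookkeeping behind these numbers can be modelled in plain Python. In this
sketch, which is an illustration rather than Theano's implementation, a
dictionary stands in for the shared variable, the hypothetical
``scan_updates`` helper plays the role of the ``updates`` returned by scan,
and the hypothetical ``make_function`` computes the outputs before writing the
updates back.

.. code-block:: python

  def scan_updates(step, state, n_steps):
      """Hypothetical stand-in for the updates scan returns: apply step
      n_steps times to a copy of the shared state and return the result."""
      local = dict(state)
      for _ in range(n_steps):
          local.update(step(local))
      return local


  def make_function(state, step, n_steps, apply_updates):
      """Hypothetical model of theano.function([], [b, c], updates=updates)."""
      def f():
          updated = scan_updates(step, state, n_steps)
          b = state["a"] + 1    # b = a + 1 sees the value before the update
          c = updated["a"] + 1  # c = updates[a] + 1 sees the value after 10 steps
          if apply_updates:
              state.update(updated)  # written back after the outputs are computed
          return b, c
      return f


  def increment(s):
      # mirrors lambda: {a: a + 1}
      return {"a": s["a"] + 1}


  shared = {"a": 1}
  f = make_function(shared, increment, 10, apply_updates=True)
  print(f(), shared["a"])  # (2, 12) 11
  print(f(), shared["a"])  # (12, 22) 21

  frozen = {"a": 1}
  g = make_function(frozen, increment, 10, apply_updates=False)
  print(g(), g(), frozen["a"])  # (2, 12) (2, 12) 1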



Reference
=========

.. automodule:: theano.scan

.. autofunction:: theano.map
.. autofunction:: theano.reduce
.. autofunction:: theano.foldl
.. autofunction:: theano.foldr
.. autofunction:: theano.scan
    :noindex:

