Ecosystem of a Tensor: N-dimensional arrays, their descriptions and meta information¶

Last modification (08.06.2018).¶

Note: this tutorial assumes that you are familiar with the notion of N-dimensional arrays and their efficient representations. The related material can be found in our previous tutorials: tutorial_1 and tutorial_2.

Requirements: hottbox==0.1.3

Authors: Ilya Kisil (ilyakisil@gmail.com);

In [1]:

import numpy as np
from hottbox.core import Tensor

In [2]:

def show_meta_information(tensor, data=True, shapes=True, modes=True, state=True):
    """ Quick util for showing relevant information for this tutorial
    
    Parameters
    ----------
    tensor : Tensor
    data : bool
        If True, show data array
    shapes : bool
        If True, show current shape and normal shape
    modes : bool
        If True, show mode information
    state : bool    
        If True, show state information
    """
    print(tensor)
    
    if data:
        print("\n\tThe underlying data array is:")
        print(tensor.data)
    
    if shapes:
        print("\n\tIs this tensor in normal state: {}".format(tensor.in_normal_state))
        print("Current shape of the data array: {}".format(tensor.shape))
        print("Normal shape of the data array: {}".format(tensor.ft_shape))
    
    if modes:
        print("\n\tInformation about its modes:")
        for i, tensor_mode in enumerate(tensor.modes):
            print("#{}: {}".format(i, tensor_mode))

    if state:
        print("\n\tInformation about its current state:")    
        tensor.show_state()
        
def print_sep_line():
    print("\n==========================="
          "============================="
          "===========================\n")

Recall that collected raw data in the form of an N-dimensional array represents different characteristics. Here are a couple of examples:

different_tensors

N-dimensional arrays of data can be represented in various forms. By applying numerical methods (algorithms for tensor decompositions) to the raw data we can obtain, for example, a Kruskal or Tucker representation. At the same time, simple data rearrangement procedures (e.g. folding, unfolding) of the raw data also yield different representations.

different_representations

Each dimension of an N-dimensional array is associated with a certain property, or mode, of the raw data. At the same time, this characteristic is described by certain features. The relation between these properties defines the state of the N-dimensional array. In other words, modes and state can be seen as the meta information about the tensor.

A mode of the tensor is defined by the name of the property it represents and the features that describe this property.

The state of the tensor is defined by the transformations applied to the data array.

The normal state of the tensor is the state in which the underlying raw data array is in its original form. This means that it has not been folded, unfolded or rotated.

Thus, the tensor is described by two different shapes:

Shape of the data array in the current state of the tensor.
Normal shape (full shape): the shape of the data array in the normal state.

Each transformation can be characterised by the mode order and the type of reshaping. This information is sufficient to revert the transformation applied to the data array.
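To make this concrete, here is a minimal NumPy-only sketch of recording the mode order when unfolding and using it to get back to the normal shape. The helper names `unfold_with_state` and `fold_from_state` and the plain `dict` used as a state are hypothetical and chosen for illustration; they are not part of hottbox, and only the row-major convention is covered.

```python
import numpy as np


def unfold_with_state(array, mode):
    """Unfold `array` along `mode` (row-major) and record how to undo it.

    Hypothetical helper for illustration, not the hottbox implementation.
    """
    state = {
        "normal_shape": array.shape,
        "mode_order": ([mode], [m for m in range(array.ndim) if m != mode]),
    }
    # Bring the chosen mode to the front, then flatten all remaining modes.
    unfolded = np.moveaxis(array, mode, 0).reshape(array.shape[mode], -1)
    return unfolded, state


def fold_from_state(unfolded, state):
    """Revert `unfold_with_state` using only the recorded state."""
    mode = state["mode_order"][0][0]
    shape = list(state["normal_shape"])
    # Shape of the array right after the chosen mode was moved to the front.
    moved_shape = [shape[mode]] + shape[:mode] + shape[mode + 1:]
    return np.moveaxis(unfolded.reshape(moved_shape), 0, mode)


original = np.arange(24).reshape(2, 3, 4)
unfolded, state = unfold_with_state(original, mode=1)
restored = fold_from_state(unfolded, state)

print(unfolded.shape)                       # (3, 8)
print(state["mode_order"])                  # ([1], [0, 2])
print(np.array_equal(restored, original))   # True
```

The unfolded matrix alone does not say which mode was moved to the front or what the original sizes were, which is why the state has to be stored next to the data.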

Transformations such as folding or unfolding do not change the original properties of the underlying data array, but they change the relationship between these properties.

data_modes_state

By default, an object of the Tensor class is created in the normal state, with generic mode names that describe the properties of the dimensions of the data array.

In [3]:

data_array = np.arange(24).reshape(2, 3, 4)

tensor = Tensor(data_array)

show_meta_information(tensor)

This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

	The underlying data array is:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

	Is this tensor in normal state: True
Current shape of the data array: (2, 3, 4)
Normal shape of the data array: (2, 3, 4)

	Information about its modes:
#0: Mode(name='mode-0', index=None)
#1: Mode(name='mode-1', index=None)
#2: Mode(name='mode-2', index=None)

	Information about its current state:
State(normal_shape=(2, 3, 4), rtype='Init', mode_order=([0], [1], [2]))

Meta information after applying data transformations¶

Next, we will show changes in the meta information of the tensor when different transformations are applied to it.

Note: at the moment, only one data transformation can be applied at a time. This will be generalised in a future release of hottbox and will be outlined in the CHANGELOG.

Unfolding of the data¶

In [4]:

tensor.unfold(mode=1)

show_meta_information(tensor)

This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (3, 8) and ['mode-1', 'mode-0_mode-2'] respectively.

	The underlying data array is:
[[ 0  1  2  3 12 13 14 15]
 [ 4  5  6  7 16 17 18 19]
 [ 8  9 10 11 20 21 22 23]]

	Is this tensor in normal state: False
Current shape of the data array: (3, 8)
Normal shape of the data array: (2, 3, 4)

	Information about its modes:
#0: Mode(name='mode-0', index=None)
#1: Mode(name='mode-1', index=None)
#2: Mode(name='mode-2', index=None)

	Information about its current state:
State(normal_shape=(2, 3, 4), rtype='T', mode_order=([1], [0, 2]))

Folding of the data¶

In [5]:

tensor.fold()

show_meta_information(tensor)

This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

	The underlying data array is:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

	Is this tensor in normal state: True
Current shape of the data array: (2, 3, 4)
Normal shape of the data array: (2, 3, 4)

	Information about its modes:
#0: Mode(name='mode-0', index=None)
#1: Mode(name='mode-1', index=None)
#2: Mode(name='mode-2', index=None)

	Information about its current state:
State(normal_shape=(2, 3, 4), rtype='Init', mode_order=([0], [1], [2]))

Vectorisation of the data¶

In [6]:

tensor.vectorise()

show_meta_information(tensor)

This tensor is of order 1 and consists of 24 elements.
Sizes and names of its modes are (24,) and ['mode-0_mode-1_mode-2'] respectively.

	The underlying data array is:
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]

	Is this tensor in normal state: False
Current shape of the data array: (24,)
Normal shape of the data array: (2, 3, 4)

	Information about its modes:
#0: Mode(name='mode-0', index=None)
#1: Mode(name='mode-1', index=None)
#2: Mode(name='mode-2', index=None)

	Information about its current state:
State(normal_shape=(2, 3, 4), rtype='T', mode_order=([0, 1, 2],))

As we can see, the applied transformations rearrange the values of the underlying data array. They also change the relations between the mode names and modify the state of the tensor. However, the normal shape and the information about the original modes remain the same.
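As a quick sanity check of this observation, the following NumPy-only sketch (it does not use hottbox) confirms that unfolding and vectorisation only permute the values of the array: the multiset of values is unchanged, and reshaping back restores the original layout. The variable names are illustrative.

```python
import numpy as np

original = np.arange(24).reshape(2, 3, 4)

# Row-major unfolding along mode 1 and vectorisation, written with plain NumPy.
unfolded = np.moveaxis(original, 1, 0).reshape(3, -1)
vectorised = original.reshape(-1)

# Both transformations keep exactly the same values, only in a different layout.
same_values_unfolded = np.array_equal(np.sort(unfolded, axis=None),
                                      np.sort(original, axis=None))
same_values_vectorised = np.array_equal(np.sort(vectorised),
                                        np.sort(original, axis=None))
print(same_values_unfolded, same_values_vectorised)  # True True

# The vector can be brought back to the normal shape without loss.
print(np.array_equal(vectorised.reshape(2, 3, 4), original))  # True
```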

Different reshaping conventions¶

In computing, row-major order and column-major order are methods for storing multidimensional arrays in linear storage such as random access memory. For example, for the array $$ \mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23} \end{bmatrix} $$ the two possible ways are:

data_ordering

Therefore, there are two conventions for reshaping (unfolding/folding/vectorising) data, and both are available in hottbox. They produce arrays of the same shape, but with the values permuted. The state of the tensor records which convention has been applied and uses it when reverting the transformation.
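The two storage orders can be seen directly in NumPy, where `order="C"` stands for row-major and `order="F"` for column-major. In this small sketch the entries of the matrix encode their own indices (11 stands for $a_{11}$, 23 for $a_{23}$, and so on), which is an encoding chosen here purely for readability.

```python
import numpy as np

# Entry "rc" stands for a_{rc}, e.g. 12 is a_{12}.
A = np.array([[11, 12, 13],
              [21, 22, 23]])

row_major = A.ravel(order="C")      # rows are laid out one after another
column_major = A.ravel(order="F")   # columns are laid out one after another

print(row_major)     # [11 12 13 21 22 23]
print(column_major)  # [11 21 12 22 13 23]
```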

Row and column major unfolding¶

In [7]:

data_array = np.arange(24).reshape(2, 3, 4)

tensor_1 = Tensor(data_array)
tensor_2 = Tensor(data_array)

tensor_1.unfold(mode=1, rtype="T")
tensor_2.unfold(mode=1, rtype="K")

print("\tRow-major unfolding")
show_meta_information(tensor_1, shapes=False, modes=False)

print_sep_line()

print("\tColumn-major unfolding")
show_meta_information(tensor_2, shapes=False, modes=False)

	Row-major unfolding
This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (3, 8) and ['mode-1', 'mode-0_mode-2'] respectively.

	The underlying data array is:
[[ 0  1  2  3 12 13 14 15]
 [ 4  5  6  7 16 17 18 19]
 [ 8  9 10 11 20 21 22 23]]

	Information about its current state:
State(normal_shape=(2, 3, 4), rtype='T', mode_order=([1], [0, 2]))

===================================================================================

	Column-major unfolding
This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (3, 8) and ['mode-1', 'mode-0_mode-2'] respectively.

	The underlying data array is:
[[ 0 12  1 13  2 14  3 15]
 [ 4 16  5 17  6 18  7 19]
 [ 8 20  9 21 10 22 11 23]]

	Information about its current state:
State(normal_shape=(2, 3, 4), rtype='K', mode_order=([1], [0, 2]))

Row and column major folding¶

In [8]:

tensor_1.fold()
tensor_2.fold()
print("\tReverting Row-major unfolding")
show_meta_information(tensor_1, shapes=False, modes=False)

print_sep_line()

print("\tReverting Column-major unfolding")
show_meta_information(tensor_2, shapes=False, modes=False)

	Reverting Row-major unfolding
This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

	The underlying data array is:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

	Information about its current state:
State(normal_shape=(2, 3, 4), rtype='Init', mode_order=([0], [1], [2]))

===================================================================================

	Reverting Column-major unfolding
This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

	The underlying data array is:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

	Information about its current state:
State(normal_shape=(2, 3, 4), rtype='Init', mode_order=([0], [1], [2]))

As we can see, the different approaches to reshaping the underlying data affect only the data array itself, whereas other properties remain the same. Similarly to unfolding along different modes, the state of the tensor keeps track of this transformation as well.

Note: the same type of unfolding and folding should be applied to the data array, so that values describing different properties of the tensor do not get mixed up. But don't worry about it, since this is handled automatically under the hood.
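The effect of mixing conventions can be illustrated with plain NumPy. The sketch below assumes that the two reshaping types correspond to `order="C"` (row-major) and `order="F"` (column-major) reshapes after moving the chosen mode to the front; the helpers `unfold` and `fold` are hypothetical and are not the hottbox functions.

```python
import numpy as np


def unfold(array, mode, order):
    """Unfold along `mode` using NumPy memory order 'C' or 'F' (illustrative)."""
    return np.moveaxis(array, mode, 0).reshape((array.shape[mode], -1), order=order)


def fold(matrix, mode, shape, order):
    """Inverse of `unfold`, provided the same `order` is used."""
    moved_shape = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape(moved_shape, order=order), 0, mode)


data = np.arange(24).reshape(2, 3, 4)

row_major = unfold(data, mode=1, order="C")
column_major = unfold(data, mode=1, order="F")
print(row_major[0])     # [ 0  1  2  3 12 13 14 15]
print(column_major[0])  # [ 0 12  1 13  2 14  3 15]

# Matching conventions recover the original array.
matched = fold(column_major, mode=1, shape=data.shape, order="F")
print(np.array_equal(matched, data))     # True

# Mismatched conventions give the right shape but permuted values.
mismatched = fold(column_major, mode=1, shape=data.shape, order="C")
print(mismatched.shape)                  # (2, 3, 4)
print(np.array_equal(mismatched, data))  # False
```

The mismatched result has the correct shape, so nothing fails loudly. This is the reason the reshaping type is stored in the state rather than left for the user to remember.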

Creating Tensor with custom meta information¶

The state and the list of modes are created at the initialisation of the Tensor object:

The state of the tensor is created. By default, this step assumes that the data is passed in its normal shape (it was not folded or unfolded before).
The list of modes is created based on the state. By default, the number of modes to be created is extracted from the state and a default name is assigned to each of them.

hottbox provides flexibility for this procedure. The Tensor can be created with custom names for the modes and in a state that is not inferred from the provided data.

If both customisations are passed to the Tensor constructor, then the list of mode names depends on the provided state. If only mode names are provided, then the length of the list should be consistent with the number of dimensions of the data array.

Defining a custom state is a little trickier, but there is nothing to be scared of. Because state and modes are crucial parts of the Tensor ecosystem, a custom state should be specified with caution, even though there is quite a bit of input validation involved that will point you in the right direction if something was not specified correctly.

Note: The usefulness of custom mode names is not fully exploited in hottbox at the moment, but we are working on that.
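The dependence of the displayed mode names on the state can be sketched in plain Python. The function `current_mode_names` below is a hypothetical helper, not part of hottbox. It assumes that the names of modes grouped together by `mode_order` are joined with an underscore, which is the pattern visible in the printed outputs of this tutorial.

```python
def current_mode_names(mode_names, mode_order):
    """Derive the names of the current dimensions from the original mode names.

    Illustrative only: one name per group in `mode_order`, joined with '_'.
    """
    n_modes = sum(len(group) for group in mode_order)
    if len(mode_names) != n_modes:
        raise ValueError("Expected {} mode names, got {}".format(n_modes, len(mode_names)))
    return ["_".join(mode_names[i] for i in group) for group in mode_order]


default_names = ["mode-{}".format(i) for i in range(3)]
custom_names = ["Frequency", "Time", "Subject"]

print(current_mode_names(default_names, ([0], [1], [2])))
# ['mode-0', 'mode-1', 'mode-2']
print(current_mode_names(default_names, ([0, 1, 2],)))
# ['mode-0_mode-1_mode-2']
print(current_mode_names(custom_names, ([1], [0, 2])))
# ['Time', 'Frequency_Subject']
```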

In [9]:

I, J, K = 2, 3, 4

# Provided with a 3D array
data_3d = np.arange(I*J*K).reshape(I, J, K)

# Provided with a 3D array that has been unfolded
data_2d = np.arange(I*J*K).reshape(I, (J*K))

Custom mode names¶

In [10]:

tensor_1 = Tensor(data_3d, mode_names=["Frequency", "Time", "Subject"])

show_meta_information(tensor_1, data=False, shapes=False, state=False)

This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['Frequency', 'Time', 'Subject'] respectively.

	Information about its modes:
#0: Mode(name='Frequency', index=None)
#1: Mode(name='Time', index=None)
#2: Mode(name='Subject', index=None)

Custom state: different mode order¶

In [11]:

custom_state_1 = dict(mode_order=([0], [1, 2]),
                      normal_shape=(2, 3, 4),
                      rtype="T"
                     )
custom_state_2 = dict(mode_order=([1], [0, 2]),
                      normal_shape=(2, 3, 4),
                      rtype="T"
                     )

tensor_1 = Tensor(data_2d, custom_state=custom_state_1)
tensor_2 = Tensor(data_2d, custom_state=custom_state_2)

show_meta_information(tensor_1, modes=False, shapes=False, state=False)

print_sep_line()

show_meta_information(tensor_2, modes=False, shapes=False, state=False)

This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (2, 12) and ['mode-0', 'mode-1_mode-2'] respectively.

	The underlying data array is:
[[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]]

===================================================================================

This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (2, 12) and ['mode-1', 'mode-0_mode-2'] respectively.

	The underlying data array is:
[[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]]

In [12]:

tensor_1.fold()
tensor_2.fold()

show_meta_information(tensor_1, modes=False, shapes=False, state=False)

print_sep_line()

show_meta_information(tensor_2, modes=False, shapes=False, state=False)

This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

	The underlying data array is:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

===================================================================================

This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

	The underlying data array is:
[[[ 0  1  2  3]
  [ 8  9 10 11]
  [16 17 18 19]]

 [[ 4  5  6  7]
  [12 13 14 15]
  [20 21 22 23]]]

Note: this example is for illustration purposes only, since it does not follow the true unfolding/folding expressions. That is, for the second custom state:

unfolded_along = mode_order[0][0]
data_2d.shape[0] != normal_shape[unfolded_along]
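The condition above can be turned into a small check. The sketch below uses only NumPy. The helper `is_true_unfolding` is hypothetical, and the last step assumes that row-major folding amounts to a reshape followed by moving the leading axis back into place, which reproduces the values printed above for the second tensor.

```python
import numpy as np


def is_true_unfolding(data, state):
    """Check that the first dimension of `data` matches the unfolded mode."""
    unfolded_along = state["mode_order"][0][0]
    return data.shape[0] == state["normal_shape"][unfolded_along]


data_2d = np.arange(24).reshape(2, 12)
state_1 = dict(mode_order=([0], [1, 2]), normal_shape=(2, 3, 4), rtype="T")
state_2 = dict(mode_order=([1], [0, 2]), normal_shape=(2, 3, 4), rtype="T")

print(is_true_unfolding(data_2d, state_1))  # True
print(is_true_unfolding(data_2d, state_2))  # False

# Folding with state_2 anyway: treat the data as if mode 1 (size 3) were leading.
folded = np.moveaxis(data_2d.reshape(3, 2, 4), 0, 1)
print(folded[0])
# [[ 0  1  2  3]
#  [ 8  9 10 11]
#  [16 17 18 19]]
```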

Custom state: different reshaping type¶

In [13]:

custom_state_1 = dict(mode_order=([0], [1, 2]),
                      normal_shape=(2, 3, 4),
                      rtype="T"
                     )
custom_state_2 = dict(mode_order=([0], [1, 2]),
                      normal_shape=(2, 3, 4),
                      rtype="K"
                     )

tensor_1 = Tensor(data_2d, custom_state=custom_state_1)
tensor_2 = Tensor(data_2d, custom_state=custom_state_2)

show_meta_information(tensor_1, modes=False, shapes=False, state=False)

print_sep_line()

show_meta_information(tensor_2, modes=False, shapes=False, state=False)

This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (2, 12) and ['mode-0', 'mode-1_mode-2'] respectively.

	The underlying data array is:
[[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]]

===================================================================================

This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (2, 12) and ['mode-0', 'mode-1_mode-2'] respectively.

	The underlying data array is:
[[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]]

In [14]:

tensor_1.fold()
tensor_2.fold()

show_meta_information(tensor_1, modes=False, shapes=False, state=False)

print_sep_line()

show_meta_information(tensor_2, modes=False, shapes=False, state=False)

This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

	The underlying data array is:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

===================================================================================

This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

	The underlying data array is:
[[[ 0  3  6  9]
  [ 1  4  7 10]
  [ 2  5  8 11]]

 [[12 15 18 21]
  [13 16 19 22]
  [14 17 20 23]]]
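The two results above can be reproduced with plain NumPy, under the assumption that the two reshaping types correspond to row-major (`order="C"`) and column-major (`order="F"`) reshapes. Since the data is declared as unfolded along mode 0, no axes need to be moved and folding reduces to a single reshape.

```python
import numpy as np

data_2d = np.arange(24).reshape(2, 12)
normal_shape = (2, 3, 4)

# Same data, same mode order, different reshaping convention.
folded_row_major = data_2d.reshape(normal_shape, order="C")
folded_column_major = data_2d.reshape(normal_shape, order="F")

print(folded_row_major[0])
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
print(folded_column_major[0])
# [[ 0  3  6  9]
#  [ 1  4  7 10]
#  [ 2  5  8 11]]
```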

Custom state: different normal shape¶

In [15]:

custom_state_1 = dict(mode_order=([0], [1, 2]),
                      normal_shape=(2, 3, 4),
                      rtype="T"
                     )
custom_state_2 = dict(mode_order=([0], [1, 2]),
                      normal_shape=(2, 4, 3),
                      rtype="T"
                     )

tensor_1 = Tensor(data_2d, custom_state=custom_state_1)
tensor_2 = Tensor(data_2d, custom_state=custom_state_2)

show_meta_information(tensor_1, modes=False, shapes=False, state=False)

print_sep_line()

show_meta_information(tensor_2, modes=False, shapes=False, state=False)

This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (2, 12) and ['mode-0', 'mode-1_mode-2'] respectively.

	The underlying data array is:
[[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]]

===================================================================================

This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (2, 12) and ['mode-0', 'mode-1_mode-2'] respectively.

	The underlying data array is:
[[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]]

In [16]:

tensor_1.fold()
tensor_2.fold()

show_meta_information(tensor_1, modes=False, shapes=False, state=False)

print_sep_line()

show_meta_information(tensor_2, modes=False, shapes=False, state=False)

This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

	The underlying data array is:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

===================================================================================

This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 4, 3) and ['mode-0', 'mode-1', 'mode-2'] respectively.

	The underlying data array is:
[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]
  [ 9 10 11]]

 [[12 13 14]
  [15 16 17]
  [18 19 20]
  [21 22 23]]]

Custom state and mode names¶

In [17]:

I, J, K = 2, 3, 4
data_2d = np.arange(I*J*K).reshape(J, (I*K))

custom_state = dict(mode_order=([1], [0, 2]),
                    normal_shape=(3, 2, 4),
                    rtype="T"
                   )
tensor_1 = Tensor(data_2d, custom_state, mode_names=["Frequency", "Time", "Subject"])
show_meta_information(tensor_1, shapes=False)

print_sep_line()

tensor_1.fold()
show_meta_information(tensor_1, shapes=False)

This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (3, 8) and ['Time', 'Frequency_Subject'] respectively.

	The underlying data array is:
[[ 0  1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14 15]
 [16 17 18 19 20 21 22 23]]

	Information about its modes:
#0: Mode(name='Frequency', index=None)
#1: Mode(name='Time', index=None)
#2: Mode(name='Subject', index=None)

	Information about its current state:
State(normal_shape=(3, 2, 4), rtype='T', mode_order=([1], [0, 2]))

===================================================================================

This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (3, 2, 4) and ['Frequency', 'Time', 'Subject'] respectively.

	The underlying data array is:
[[[ 0  1  2  3]
  [12 13 14 15]]

 [[ 4  5  6  7]
  [16 17 18 19]]

 [[ 8  9 10 11]
  [20 21 22 23]]]

	Information about its modes:
#0: Mode(name='Frequency', index=None)
#1: Mode(name='Time', index=None)
#2: Mode(name='Subject', index=None)

	Information about its current state:
State(normal_shape=(3, 2, 4), rtype='Init', mode_order=([0], [1], [2]))