Note: this tutorial assumes that you are familiar with the notion of N-dimensional arrays and their efficient representations. The related material can be found in our previous tutorials: tutorial_1 and tutorial_2.
Requirements: hottbox==0.1.3
Authors: Ilya Kisil (ilyakisil@gmail.com);
import numpy as np
from hottbox.core import Tensor
def show_meta_information(tensor, data=True, shapes=True, modes=True, state=True):
    """ Quick util for showing relevant information for this tutorial

    Parameters
    ----------
    tensor : Tensor
    data : bool
        If True, show data array
    shapes : bool
        If True, show current shape and normal shape
    modes : bool
        If True, show mode information
    state : bool
        If True, show state information
    """
    print(tensor)
    if data:
        print("\n\tThe underlying data array is:")
        print(tensor.data)
    if shapes:
        print("\n\tIs this tensor in normal state: {}".format(tensor.in_normal_state))
        print("Current shape of the data array: {}".format(tensor.shape))
        print("Normal shape of the data array: {}".format(tensor.ft_shape))
    if modes:
        print("\n\tInformation about its modes:")
        for i, tensor_mode in enumerate(tensor.modes):
            print("#{}: {}".format(i, tensor_mode))
    if state:
        print("\n\tInformation about its current state:")
        tensor.show_state()


def print_sep_line():
    print("\n==========================="
          "============================="
          "===========================\n")
Recall that the collected raw data, in the form of an N-dimensional array, represents different characteristics. Here are a couple of examples:
N-dimensional arrays of data can be represented in various forms. By applying numerical methods (algorithms for tensor decompositions) to the raw data we can obtain, for example, its Kruskal or Tucker representation. At the same time, simple rearrangement procedures applied to the raw data (e.g. folding, unfolding) also yield different representations.
Each dimension of an N-dimensional array is associated with a certain property of the raw data, called a mode. In turn, this property is described by certain features. The relation between these properties defines the state of the N-dimensional array. In other words, modes and state can be seen as the meta information about the tensor.
A mode of the tensor is defined by the name of the property it represents and the features that describe this property.
The state of the tensor is defined by the transformations applied to its data array.
The normal state of the tensor is the state in which the underlying data array is in its original form, i.e. it has not been folded, unfolded or rotated.
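To make the notion of meta information concrete, here is a minimal sketch of the two records involved. The namedtuples `Mode` and `State` below are hypothetical stand-ins that only mirror the printed representations seen later in this tutorial (e.g. `Mode(name='mode-0', index=None)`); they are not hottbox's own classes. A tensor in its normal state simply keeps every dimension in its own group of the mode order.

```python
from collections import namedtuple

# Hypothetical containers mirroring the printed reprs of hottbox's Mode and State.
# They are NOT hottbox's classes, only an illustration of what meta information is kept.
Mode = namedtuple("Mode", ["name", "index"])
State = namedtuple("State", ["normal_shape", "rtype", "mode_order"])


def normal_state(shape):
    """State of a freshly created tensor: every dimension forms its own group."""
    return State(normal_shape=tuple(shape),
                 rtype="Init",
                 mode_order=tuple([i] for i in range(len(shape))))


# Generic mode names, one per dimension of a (2, 3, 4) array
modes = [Mode(name="mode-{}".format(i), index=None) for i in range(3)]
state = normal_state((2, 3, 4))
print(modes[0])  # Mode(name='mode-0', index=None)
print(state)     # State(normal_shape=(2, 3, 4), rtype='Init', mode_order=([0], [1], [2]))
```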
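Thus, the tensor is described by two different shapes: the normal shape, i.e. the shape of the data array in the normal state, and the current shape of the data array after the applied transformations.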
Each transformation can be characterised by the mode order and the type of reshaping. This information is sufficient to revert the transformation applied to the data array.
Transformations such as folding or unfolding do not change the original properties of the underlying data array, but they change the relationships between these properties.
By default, a Tensor object is created in the normal state, with generic mode names that describe the dimensions of the data array.
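The claim that the mode order and the type of reshaping are enough to revert a transformation can be checked with plain numpy. This is a minimal sketch, not hottbox's implementation: the helpers `unfold`, `fold` and `_np_order` are hypothetical, and it assumes that rtype "T" corresponds to row-major (C) and "K" to column-major (Fortran) reshaping, which matches the outputs shown later in this tutorial.

```python
import numpy as np


def _np_order(rtype):
    # Assumption: "T" -> row-major (C order), "K" -> column-major (Fortran order)
    return "C" if rtype == "T" else "F"


def unfold(data, mode, rtype="T"):
    """Hypothetical helper: mode-`mode` unfolding plus the state needed to revert it."""
    matrix = np.reshape(np.moveaxis(data, mode, 0), (data.shape[mode], -1),
                        order=_np_order(rtype))
    others = [i for i in range(data.ndim) if i != mode]
    state = dict(mode_order=([mode], others), normal_shape=data.shape, rtype=rtype)
    return matrix, state


def fold(matrix, state):
    """Hypothetical helper: revert an unfolding using only the stored state."""
    mode = state["mode_order"][0][0]
    shape = state["normal_shape"]
    # Shape of the array right after `mode` was moved to the front
    moved_shape = [shape[mode]] + [shape[i] for i in state["mode_order"][1]]
    tensor = np.reshape(matrix, moved_shape, order=_np_order(state["rtype"]))
    return np.moveaxis(tensor, 0, mode)


data = np.arange(24).reshape(2, 3, 4)
matrix, state = unfold(data, mode=1)
print(matrix.shape)                               # (3, 8)
print(np.array_equal(fold(matrix, state), data))  # True
```

Note that `fold` never looks at the original array: the normal shape, the mode order and the reshaping convention alone determine where every value goes back.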
data_array = np.arange(24).reshape(2, 3, 4)
tensor = Tensor(data_array)
show_meta_information(tensor)
This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

The underlying data array is:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

Is this tensor in normal state: True
Current shape of the data array: (2, 3, 4)
Normal shape of the data array: (2, 3, 4)

Information about its modes:
#0: Mode(name='mode-0', index=None)
#1: Mode(name='mode-1', index=None)
#2: Mode(name='mode-2', index=None)

Information about its current state:
State(normal_shape=(2, 3, 4), rtype='Init', mode_order=([0], [1], [2]))
Next, we will show how the meta information of the tensor changes when different transformations are applied to it.
Note: at the moment, only one data transformation can be applied at a time. This will be generalised in future releases of hottbox and will be outlined in the CHANGELOG.
tensor.unfold(mode=1)
show_meta_information(tensor)
This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (3, 8) and ['mode-1', 'mode-0_mode-2'] respectively.

The underlying data array is:
[[ 0  1  2  3 12 13 14 15]
 [ 4  5  6  7 16 17 18 19]
 [ 8  9 10 11 20 21 22 23]]

Is this tensor in normal state: False
Current shape of the data array: (3, 8)
Normal shape of the data array: (2, 3, 4)

Information about its modes:
#0: Mode(name='mode-0', index=None)
#1: Mode(name='mode-1', index=None)
#2: Mode(name='mode-2', index=None)

Information about its current state:
State(normal_shape=(2, 3, 4), rtype='T', mode_order=([1], [0, 2]))
tensor.fold()
show_meta_information(tensor)
This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

The underlying data array is:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

Is this tensor in normal state: True
Current shape of the data array: (2, 3, 4)
Normal shape of the data array: (2, 3, 4)

Information about its modes:
#0: Mode(name='mode-0', index=None)
#1: Mode(name='mode-1', index=None)
#2: Mode(name='mode-2', index=None)

Information about its current state:
State(normal_shape=(2, 3, 4), rtype='Init', mode_order=([0], [1], [2]))
tensor.vectorise()
show_meta_information(tensor)
This tensor is of order 1 and consists of 24 elements.
Sizes and names of its modes are (24,) and ['mode-0_mode-1_mode-2'] respectively.

The underlying data array is:
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]

Is this tensor in normal state: False
Current shape of the data array: (24,)
Normal shape of the data array: (2, 3, 4)

Information about its modes:
#0: Mode(name='mode-0', index=None)
#1: Mode(name='mode-1', index=None)
#2: Mode(name='mode-2', index=None)

Information about its current state:
State(normal_shape=(2, 3, 4), rtype='T', mode_order=([0, 1, 2],))
As we can see, the applied transformations rearrange the values of the underlying data array. They also change the relations between mode names and modify the state of the tensor. However, the normal shape and the information about the original modes remain the same.
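For the transformations above, where the state is consistent with the data, the current shape and the joined mode names (such as 'mode-0_mode-2') can be derived from the normal shape and the mode order alone, while the original modes stay untouched. The helper `describe` below is a hypothetical sketch of that bookkeeping, not a hottbox function; it assumes names of merged modes are joined with an underscore, as in the printed outputs.

```python
from math import prod  # Python 3.8+


def describe(normal_shape, mode_names, mode_order):
    """Hypothetical helper: current shape and mode names implied by a state."""
    # Each group of modes becomes one dimension whose size is the product of its sizes
    current_shape = tuple(prod(normal_shape[i] for i in group) for group in mode_order)
    # Names of merged modes are joined with "_"
    current_names = ["_".join(mode_names[i] for i in group) for group in mode_order]
    return current_shape, current_names


names = ["mode-0", "mode-1", "mode-2"]
print(describe((2, 3, 4), names, ([1], [0, 2])))  # ((3, 8), ['mode-1', 'mode-0_mode-2'])
print(describe((2, 3, 4), names, ([0, 1, 2],)))   # ((24,), ['mode-0_mode-1_mode-2'])
```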
In computing, row-major order and column-major order are methods for storing multidimensional arrays in linear storage such as random access memory. For example, for the array $$ \mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23} \end{bmatrix} $$ the two possible ways are:
row-major order: $a_{11}, a_{12}, a_{13}, a_{21}, a_{22}, a_{23}$
column-major order: $a_{11}, a_{21}, a_{12}, a_{22}, a_{13}, a_{23}$
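The two storage orders can be observed directly with numpy, where `order="C"` means row-major and `order="F"` means column-major. In this sketch each entry of A is encoded by its indices (e.g. 12 stands for $a_{12}$), which is purely an illustrative choice.

```python
import numpy as np

# Entries of A encoded by their indices, e.g. 12 stands for a_12
A = np.array([[11, 12, 13],
              [21, 22, 23]])
print(A.ravel(order="C"))  # [11 12 13 21 22 23]  row-major
print(A.ravel(order="F"))  # [11 21 12 22 13 23]  column-major
```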
Therefore, there are two conventions for reshaping (unfolding/folding/vectorising) data, and both of them are available in hottbox. They produce arrays of the same shape, but with permuted values. The state of the tensor memorises which convention has been applied and uses it to revert the applied transformation.
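As a plain-numpy sketch of the two conventions, the mode-1 unfolding can be computed by moving mode-1 to the front and then reshaping in either order. The correspondence rtype="T" to `order="C"` and rtype="K" to `order="F"` is an assumption on our part, but it reproduces the row-major and column-major unfoldings printed by the cell below.

```python
import numpy as np

data = np.arange(24).reshape(2, 3, 4)
moved = np.moveaxis(data, 1, 0)            # bring mode-1 to the front: shape (3, 2, 4)
row_major = moved.reshape(3, 8, order="C")  # same values as rtype="T" below
col_major = moved.reshape(3, 8, order="F")  # same values as rtype="K" below
print(row_major[0])  # [ 0  1  2  3 12 13 14 15]
print(col_major[0])  # [ 0 12  1 13  2 14  3 15]
```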
data_array = np.arange(24).reshape(2, 3, 4)
tensor_1 = Tensor(data_array)
tensor_2 = Tensor(data_array)
tensor_1.unfold(mode=1, rtype="T")
tensor_2.unfold(mode=1, rtype="K")
print("\tRow-major unfolding")
show_meta_information(tensor_1, shapes=False, modes=False)
print_sep_line()
print("\tColumn-major unfolding")
show_meta_information(tensor_2, shapes=False, modes=False)
Row-major unfolding
This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (3, 8) and ['mode-1', 'mode-0_mode-2'] respectively.

The underlying data array is:
[[ 0  1  2  3 12 13 14 15]
 [ 4  5  6  7 16 17 18 19]
 [ 8  9 10 11 20 21 22 23]]

Information about its current state:
State(normal_shape=(2, 3, 4), rtype='T', mode_order=([1], [0, 2]))

===================================================================================

Column-major unfolding
This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (3, 8) and ['mode-1', 'mode-0_mode-2'] respectively.

The underlying data array is:
[[ 0 12  1 13  2 14  3 15]
 [ 4 16  5 17  6 18  7 19]
 [ 8 20  9 21 10 22 11 23]]

Information about its current state:
State(normal_shape=(2, 3, 4), rtype='K', mode_order=([1], [0, 2]))
tensor_1.fold()
tensor_2.fold()
print("\tReverting Row-major unfolding")
show_meta_information(tensor_1, shapes=False, modes=False)
print_sep_line()
print("\tReverting Column-major unfolding")
show_meta_information(tensor_2, shapes=False, modes=False)
Reverting Row-major unfolding
This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

The underlying data array is:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

Information about its current state:
State(normal_shape=(2, 3, 4), rtype='Init', mode_order=([0], [1], [2]))

===================================================================================

Reverting Column-major unfolding
This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

The underlying data array is:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

Information about its current state:
State(normal_shape=(2, 3, 4), rtype='Init', mode_order=([0], [1], [2]))
As we can see, the different approaches to reshaping the underlying data affect only the data array itself, whereas other properties remain the same. Similarly to unfolding along different modes, the state of the tensor keeps track of this transformation as well.
Note: the same reshaping convention should be used for both unfolding and folding the data array, in order not to mix up the values that describe different properties of the tensor. But don't worry about it, since this is handled automatically under the hood.
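A quick way to convince yourself that both conventions are lossless is a round-trip check over every mode. This is a minimal numpy sketch with hypothetical helpers `unfold` and `fold`; it only assumes that each convention maps to one numpy `order` ("C" or "F") used consistently in both directions.

```python
import numpy as np


def unfold(data, mode, order):
    """Hypothetical helper: move `mode` to the front, then flatten the rest."""
    return np.moveaxis(data, mode, 0).reshape(data.shape[mode], -1, order=order)


def fold(matrix, mode, normal_shape, order):
    """Hypothetical helper: undo `unfold` given the normal shape and the same order."""
    moved_shape = (normal_shape[mode],) + tuple(
        s for i, s in enumerate(normal_shape) if i != mode)
    return np.moveaxis(matrix.reshape(moved_shape, order=order), 0, mode)


data = np.arange(24).reshape(2, 3, 4)
for order in ("C", "F"):            # row-major and column-major conventions
    for mode in range(data.ndim):   # unfold along every mode
        restored = fold(unfold(data, mode, order), mode, data.shape, order)
        assert np.array_equal(restored, data)
print("all round trips restore the original array")
```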
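To see why the conventions must match, this sketch unfolds along mode-1 in row-major order and then deliberately folds back in column-major order, using plain numpy rather than hottbox. The result has the right shape, but the values end up in the wrong positions.

```python
import numpy as np

data = np.arange(24).reshape(2, 3, 4)
moved = np.moveaxis(data, 1, 0)            # shape (3, 2, 4)
matrix = moved.reshape(3, 8, order="C")    # row-major unfolding along mode-1

# Fold back with the *other* convention: same shape, scrambled values
mixed = np.moveaxis(matrix.reshape(3, 2, 4, order="F"), 0, 1)
print(mixed.shape)                  # (2, 3, 4)
print(np.array_equal(mixed, data))  # False
print(mixed[0, 0])                  # [ 0  2 12 14]
```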
The state and the list of modes are created at the initialisation of the Tensor object. By default, the modes get generic names ('mode-0', 'mode-1', ...) and the state is inferred from the provided data: the shape of the data array becomes the normal shape and rtype is set to 'Init'.
hottbox provides flexibility for this procedure: a Tensor can be created with custom names for its modes and in a state that is not inferred from the provided data.
If both customisations are passed to the Tensor constructor, then the list of mode names depends on the provided state: it should contain one name per dimension of the normal shape. If only mode names are provided, then their number should be consistent with the number of dimensions of the data array.
Defining a custom state is a little trickier, but there is nothing to be scared of. However, since state and modes are crucial parts of the Tensor ecosystem, a custom state should be specified with caution. There is quite a bit of input validation involved, which will point you in the right direction if something has not been specified correctly.
Note: the usefulness of custom mode names is not fully exploited in hottbox at the moment, but we are working on that.
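The rule for the number of mode names can be written down as a small check. The function `check_mode_names` is a hypothetical sketch of the rule stated above, not hottbox's actual validation code, and it assumes the custom state is a dict with a `normal_shape` key as in the examples of this tutorial.

```python
def check_mode_names(data_ndim, mode_names=None, custom_state=None):
    """Hypothetical check: one mode name per dimension of the normal shape."""
    if mode_names is None:
        return  # generic names will be generated
    if custom_state is not None:
        expected = len(custom_state["normal_shape"])  # names follow the custom state
    else:
        expected = data_ndim                           # names follow the data array
    if len(mode_names) != expected:
        raise ValueError("Expected {} mode names, got {}".format(expected, len(mode_names)))


names = ["Frequency", "Time", "Subject"]
check_mode_names(3, names)                                            # 3D data: OK
check_mode_names(2, names, custom_state=dict(normal_shape=(3, 2, 4)))  # 2D data, 3D state: OK
try:
    check_mode_names(2, names)  # 3 names for a 2D array without a custom state
except ValueError as err:
    print(err)  # Expected 2 mode names, got 3
```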
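The kind of consistency a custom state needs can be sketched as follows. `check_custom_state` is a hypothetical illustration of plausible checks, not hottbox's validation logic. It deliberately stops short of requiring each data dimension to equal the product of its group of normal sizes, because the tutorial's own example with custom_state_2 below is accepted even though its sizes disagree.

```python
from math import prod  # Python 3.8+


def check_custom_state(data_shape, custom_state):
    """Hypothetical validation of a custom state (an assumption, not hottbox's code)."""
    normal_shape = custom_state["normal_shape"]
    mode_order = custom_state["mode_order"]
    # Every mode of the normal shape must appear exactly once in the mode order
    used = sorted(m for group in mode_order for m in group)
    if used != list(range(len(normal_shape))):
        raise ValueError("mode_order should use every mode of normal_shape exactly once")
    # One group of modes per dimension of the (possibly reshaped) data array
    if len(mode_order) != len(data_shape):
        raise ValueError("mode_order should have one group per dimension of the data")
    # Reshaping never changes the number of elements
    if prod(data_shape) != prod(normal_shape):
        raise ValueError("data and normal_shape have different numbers of elements")


state = dict(mode_order=([1], [0, 2]), normal_shape=(2, 3, 4), rtype="T")
check_custom_state((2, 12), state)  # passes: 24 elements, modes 0..2 used once
```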
I, J, K = 2, 3, 4
# Provided with 3D array
data_3d = np.arange(I*J*K).reshape(I, J, K)
# Provided with 3D array that had been unfolded
data_2d = np.arange(I*J*K).reshape(I, (J*K))
tensor_1 = Tensor(data_3d, mode_names=["Frequency", "Time", "Subject"])
show_meta_information(tensor_1, data=False, shapes=False, state=False)
This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['Frequency', 'Time', 'Subject'] respectively.

Information about its modes:
#0: Mode(name='Frequency', index=None)
#1: Mode(name='Time', index=None)
#2: Mode(name='Subject', index=None)
custom_state_1 = dict(mode_order=([0], [1, 2]),
normal_shape=(2, 3, 4),
rtype="T"
)
custom_state_2 = dict(mode_order=([1], [0, 2]),
normal_shape=(2, 3, 4),
rtype="T"
)
tensor_1 = Tensor(data_2d, custom_state=custom_state_1)
tensor_2 = Tensor(data_2d, custom_state=custom_state_2)
show_meta_information(tensor_1, modes=False, shapes=False, state=False)
print_sep_line()
show_meta_information(tensor_2, modes=False, shapes=False, state=False)
This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (2, 12) and ['mode-0', 'mode-1_mode-2'] respectively.

The underlying data array is:
[[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]]

===================================================================================

This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (2, 12) and ['mode-1', 'mode-0_mode-2'] respectively.

The underlying data array is:
[[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]]
tensor_1.fold()
tensor_2.fold()
show_meta_information(tensor_1, modes=False, shapes=False, state=False)
print_sep_line()
show_meta_information(tensor_2, modes=False, shapes=False, state=False)
This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

The underlying data array is:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

===================================================================================

This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

The underlying data array is:
[[[ 0  1  2  3]
  [ 8  9 10 11]
  [16 17 18 19]]

 [[ 4  5  6  7]
  [12 13 14 15]
  [20 21 22 23]]]
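The second folded array can be reproduced in plain numpy. This sketch assumes that folding a row-major ("T") unfolding means reshaping to the normal shape with the unfolded mode moved to the front, then moving that axis back into place; under this assumption it matches the output above for custom_state_2.

```python
import numpy as np

data_2d = np.arange(24).reshape(2, 12)
# custom_state_2: mode_order=([1], [0, 2]), normal_shape=(2, 3, 4), rtype="T"
normal_shape, mode = (2, 3, 4), 1
moved_shape = (normal_shape[1], normal_shape[0], normal_shape[2])  # (3, 2, 4)
folded = np.moveaxis(data_2d.reshape(moved_shape), 0, mode)        # back to (2, 3, 4)
print(folded)
```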
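Note: this example is for illustration purposes only, since tensor_2 does not follow the true unfolding/folding relation: the number of rows of an unfolded array should be equal to the size of the mode it was unfolded along, which is not the case for custom_state_2:
unfolded_along = custom_state_2["mode_order"][0][0]  # 1
data_2d.shape[0] != custom_state_2["normal_shape"][unfolded_along]  # True, since 2 != 3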
custom_state_1 = dict(mode_order=([0], [1, 2]),
normal_shape=(2, 3, 4),
rtype="T"
)
custom_state_2 = dict(mode_order=([0], [1, 2]),
normal_shape=(2, 3, 4),
rtype="K"
)
tensor_1 = Tensor(data_2d, custom_state=custom_state_1)
tensor_2 = Tensor(data_2d, custom_state=custom_state_2)
show_meta_information(tensor_1, modes=False, shapes=False, state=False)
print_sep_line()
show_meta_information(tensor_2, modes=False, shapes=False, state=False)
This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (2, 12) and ['mode-0', 'mode-1_mode-2'] respectively.

The underlying data array is:
[[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]]

===================================================================================

This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (2, 12) and ['mode-0', 'mode-1_mode-2'] respectively.

The underlying data array is:
[[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]]
tensor_1.fold()
tensor_2.fold()
show_meta_information(tensor_1, modes=False, shapes=False, state=False)
print_sep_line()
show_meta_information(tensor_2, modes=False, shapes=False, state=False)
This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

The underlying data array is:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

===================================================================================

This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

The underlying data array is:
[[[ 0  3  6  9]
  [ 1  4  7 10]
  [ 2  5  8 11]]

 [[12 15 18 21]
  [13 16 19 22]
  [14 17 20 23]]]
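Since both states unfold along mode-0, which is already the leading axis, folding reduces to a single reshape, and only the reshaping convention differs. This numpy sketch assumes rtype="T" maps to `order="C"` and rtype="K" to `order="F"`; under that assumption it reproduces both folded arrays above.

```python
import numpy as np

data_2d = np.arange(24).reshape(2, 12)
# mode_order=([0], [1, 2]): no axis needs to be moved back
folded_T = data_2d.reshape(2, 3, 4, order="C")  # as for rtype="T"
folded_K = data_2d.reshape(2, 3, 4, order="F")  # as for rtype="K"
print(folded_K[0])
# [[ 0  3  6  9]
#  [ 1  4  7 10]
#  [ 2  5  8 11]]
```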
custom_state_1 = dict(mode_order=([0], [1, 2]),
normal_shape=(2, 3, 4),
rtype="T"
)
custom_state_2 = dict(mode_order=([0], [1, 2]),
normal_shape=(2, 4, 3),
rtype="T"
)
tensor_1 = Tensor(data_2d, custom_state=custom_state_1)
tensor_2 = Tensor(data_2d, custom_state=custom_state_2)
show_meta_information(tensor_1, modes=False, shapes=False, state=False)
print_sep_line()
show_meta_information(tensor_2, modes=False, shapes=False, state=False)
This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (2, 12) and ['mode-0', 'mode-1_mode-2'] respectively.

The underlying data array is:
[[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]]

===================================================================================

This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (2, 12) and ['mode-0', 'mode-1_mode-2'] respectively.

The underlying data array is:
[[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]]
tensor_1.fold()
tensor_2.fold()
show_meta_information(tensor_1, modes=False, shapes=False, state=False)
print_sep_line()
show_meta_information(tensor_2, modes=False, shapes=False, state=False)
This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

The underlying data array is:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

===================================================================================

This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 4, 3) and ['mode-0', 'mode-1', 'mode-2'] respectively.

The underlying data array is:
[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]
  [ 9 10 11]]

 [[12 13 14]
  [15 16 17]
  [18 19 20]
  [21 22 23]]]
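Here only the normal shape differs between the two states, so the same row-major sequence of values is simply regrouped into different dimensions. A plain-numpy sketch of the two foldings above:

```python
import numpy as np

data_2d = np.arange(24).reshape(2, 12)
# Same values, same row-major order, different normal shapes
folded_234 = data_2d.reshape(2, 3, 4)  # normal_shape=(2, 3, 4)
folded_243 = data_2d.reshape(2, 4, 3)  # normal_shape=(2, 4, 3)
print(folded_234.shape, folded_243.shape)  # (2, 3, 4) (2, 4, 3)
print(np.array_equal(folded_234.ravel(), folded_243.ravel()))  # True
```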
I, J, K = 2, 3, 4
data_2d = np.arange(I*J*K).reshape(J, (I*K))
custom_state = dict(mode_order=([1], [0, 2]),
normal_shape=(3, 2, 4),
rtype="T"
)
tensor_1 = Tensor(data_2d, custom_state=custom_state, mode_names=["Frequency", "Time", "Subject"])
show_meta_information(tensor_1, shapes=False)
print_sep_line()
tensor_1.fold()
show_meta_information(tensor_1, shapes=False)
This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (3, 8) and ['Time', 'Frequency_Subject'] respectively.

The underlying data array is:
[[ 0  1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14 15]
 [16 17 18 19 20 21 22 23]]

Information about its modes:
#0: Mode(name='Frequency', index=None)
#1: Mode(name='Time', index=None)
#2: Mode(name='Subject', index=None)

Information about its current state:
State(normal_shape=(3, 2, 4), rtype='T', mode_order=([1], [0, 2]))

===================================================================================

This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (3, 2, 4) and ['Frequency', 'Time', 'Subject'] respectively.

The underlying data array is:
[[[ 0  1  2  3]
  [12 13 14 15]]

 [[ 4  5  6  7]
  [16 17 18 19]]

 [[ 8  9 10 11]
  [20 21 22 23]]]

Information about its modes:
#0: Mode(name='Frequency', index=None)
#1: Mode(name='Time', index=None)
#2: Mode(name='Subject', index=None)

Information about its current state:
State(normal_shape=(3, 2, 4), rtype='Init', mode_order=([0], [1], [2]))
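Both parts of this last output can be traced back to the custom state with plain numpy. The sketch below assumes, as before, that merged mode names are joined with an underscore and that a row-major ("T") fold reshapes to the normal shape with the unfolded mode in front before moving that axis back; it is an illustration, not hottbox's code.

```python
import numpy as np

I, J, K = 2, 3, 4
data_2d = np.arange(I * J * K).reshape(J, I * K)  # shape (3, 8)
mode_names = ["Frequency", "Time", "Subject"]
mode_order, normal_shape = ([1], [0, 2]), (3, 2, 4)

# Names of the current (unfolded) modes: one joined name per group
current_names = ["_".join(mode_names[i] for i in group) for group in mode_order]
print(current_names)  # ['Time', 'Frequency_Subject']

# Fold: reshape to the normal sizes in mode order, then move the unfolded mode back
moved_shape = [normal_shape[i] for group in mode_order for i in group]  # [2, 3, 4]
folded = np.moveaxis(data_2d.reshape(moved_shape), 0, mode_order[0][0])
print(folded.shape)  # (3, 2, 4)
```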