Lesson 1: Transformers

Transformers are just a composable way to serialize data: unprocess serializes, process parses.

In [1]:
testJsonDict = {"abolish": ["patent", "copyright"]}
In [2]:
from transformerz.serialization.json import jsonFancySerializer

print(jsonFancySerializer.unprocess(testJsonDict))
{
	"abolish": [
		"patent",
		"copyright"
	]
}
In [3]:
print(jsonFancySerializer.process("[null]"))
[None]

They can be composed using the + operator.

In [4]:
from transformerz.serialization.pon import ponSerializer  # JSON <-> JavaScript === PON <-> Python

print((ponSerializer + jsonFancySerializer).unprocess(testJsonDict))  # Returns a "PON" `str` in which a JSON string is serialized
'{\n\t"abolish": [\n\t\t"patent",\n\t\t"copyright"\n\t]\n}'
In [5]:
print((jsonFancySerializer + ponSerializer).unprocess(testJsonDict))  # Returns a JSON `str` in which a "PON" string is serialized
"{'abolish': ['patent', 'copyright']}"

But you cannot save strings into files; you need to save bytes into files ...

In [6]:
from transformerz.text import utf8Transformer

ourTransformer = utf8Transformer + jsonFancySerializer
print(ourTransformer.unprocess(testJsonDict))  # Returns the raw UTF-8 bytes of the serialized JSON string
b'{\n\t"abolish": [\n\t\t"patent",\n\t\t"copyright"\n\t]\n}'
In [7]:
del testJsonDict

Data can also be compressed. For compression we use my fork of the kaitai.compress library (I hope it gets merged someday). BinaryProcessor is an adapter that allows using the compressors from that library.

In [8]:
from transformerz.compression import BinaryProcessor
from transformerz.kaitai.compress import Zlib

zlibProcessor = BinaryProcessor("zlib", Zlib())  # You must name your processor!
print((zlibProcessor + ourTransformer).unprocess(["fuck"] * 8000))  # Returns ZLib-compressed UTF-8 encoded JSON string
b'x\x9c\xed\xc6\xa1\r\x00 \x0c\x000\xcd\xce\x98\xdeG\x04E\x82A\xef\x7f\x14_\xb4\xaa3F\x9e\xde7KDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD~b=s\x83\xe3\x84'
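
Reading the data back is the mirror image. Assuming process inverts unprocess for the whole composed pipeline (cell [3] above shows it does for the JSON serializer alone), a round trip would look roughly like this:

pipeline = zlibProcessor + ourTransformer  # zlib <- UTF-8 <- JSON, same pipeline as above
compressedBytes = pipeline.unprocess(["fuck"] * 8000)  # list -> JSON -> UTF-8 bytes -> compressed bytes
restored = pipeline.process(compressedBytes)           # compressed bytes -> UTF-8 bytes -> JSON -> list
assert restored == ["fuck"] * 8000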

Lesson 2: Concepts of cold and hot storage and objects representing them

There exists a pair of storage concepts in computer science: cold storage and hot storage.

  • Cold storage is permanent and costly to access. It is used for long-term storage of data and for distribution on physical media. Examples are an HDD, magnetic tape, flash memory, a piece of paper with handwritten data, a DVD, a hologram, anything kept within a safe, or even pieces of glass with dots burned into them with a laser, stored in an abandoned mine in permafrost.
  • Hot storage may not be permanent, but it must be efficient to access. It is usually RAM.

We use the term store for cold storage and the term cache for hot storage. In our case hot storage is usually RAM and requires no explicit serialization (the data are stored the way the runtime and compiler define), while cold storage is usually an HDD/SSD, requiring data to be serialized before being written and deserialized after being read.
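
To make the store/cache split concrete, here is the same plumbing written by hand in plain Python (purely illustrative, not using the library; the framework described below automates exactly this):

from pathlib import Path
import json

storeFile = Path("./value.json")  # cold storage: survives restarts, needs (de)serialization
cache = {}                        # hot storage: plain RAM, no serialization needed

def writeValue(v):
    cache["value"] = v                   # update the hot storage
    storeFile.write_text(json.dumps(v))  # serialize and persist to the cold storage

def readValue():
    if "value" not in cache:  # cache miss: fall back to cold storage and deserialize
        cache["value"] = json.loads(storeFile.read_text())
    return cache["value"]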

Defining cold storage

To define the way we are going to STORE data we need to answer the following "orthogonal" questions:

  • WHERE are we going to store it? A Saver object is the answer. Mapper.saver answers this question.
In [9]:
# Here is an example of a saver that saves data into files. The first argument is the root directory the files will live in, the second argument is the file extension.
from pathlib import Path
from urm.storers.cold import FileSaver

savedDataRootDir = Path("./tests/testSavedDataRootDir")  # a directory where a file will reside
ourSaver = FileSaver(savedDataRootDir, "json")  # json is the extension of the files used. `ourSaver` will later be populated into the `saver` property
  • WHAT is the mapping between our internal keys and the keys under which this piece of data is stored? A key mapper is the answer. Mapper.key answers this question.
In [10]:
from urm.mappers.key import fieldNameKeyMapper

keyMapper = fieldNameKeyMapper  # We don't need a key mapper currently, since we deal with scalars in this example

For cold storage we need to answer an additional question:

  • HOW are we going to serialize the data? A transformerz.Transformer object is the answer. ColdMapper.serializer answers this question. It is a function that returns the transformer we will use to serialize the data, which allows you to change the transformer depending on conditions, some of which may be encoded in other serialized data.
In [11]:
from transformerz.core import TransformerBase
from urm.ProtoBundle import ProtoBundle

def constantParamsSerializerMapper(parent: ProtoBundle) -> TransformerBase:  # it will be populated into `serializer` property in future
    return ourTransformer
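
Because the serializer mapper receives the parent object, it can return a different transformer depending on that object's state. A hypothetical sketch (the compress attribute below is made up for illustration and is not part of the library):

def conditionalSerializerMapper(parent: ProtoBundle) -> TransformerBase:
    # pick the compressed pipeline from Lesson 1 when the (hypothetical) `compress` flag is set
    if getattr(parent, "compress", False):
        return zlibProcessor + ourTransformer
    return ourTransformer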

So a ColdMapper object answers the questions related to storing values in cold storage. Let's construct one!

In [12]:
from urm.mappers import ColdMapper

ourStorer = ColdMapper(keyMapper, ourSaver, constantParamsSerializerMapper)

Defining hot storage

To work with the data we also need hot storage.

In [13]:
from urm.mappers.key import PrefixKeyMapper, fieldNameKeyMapper
from urm.mappers import HotMapper
from urm.storers.hot import PrefixCacher

ourCacher = HotMapper(fieldNameKeyMapper, PrefixCacher())

Lesson 3: Fields, strategies and bundles

A key is a tuple of strings and numbers. It is a unique identifier of a piece of data, decoupled from the actual storage implementation. We address data by keys.

A field strategy is an object of the FieldStrategy class that describes our pattern of loading/storing data. There are two:

  • "cold" one (ColdStrategy class): always load data from medium on getting the field value and always store data to medium when the field is assigned with a value.
  • "cached" one (CachedStrategy class): on accesses only alter the data in the hot storage (aka cache). Load data to cache from cold storage the first time it is read. Store the data to cold storage when explicitly asked.

As you see, the most basic strategy is the cold one. A strategy using only hot storage makes no sense by itself: to use it you wouldn't need this framework at all.

So, let's get familiar with the cold strategy first.

Defining a cold strategy

In [14]:
from urm.fields import ColdStrategy

coldStrategy = ColdStrategy(ourStorer)  # Don't do this in real code, a strategy object must never be reused! We have a better way to set it.

Defining our class with a property backed by cold storage

Now we define a class whose properties are backed by cold storage.

In [15]:
from urm.fields import Field, Field0D, FieldND
from urm.mappers.serializer import JustReturnSerializerMapper

class A:
    __slots__ = ()
    scalarField = Field0D(None)
    scalarField.strategy = coldStrategy  # Don't define the field like this, we have a better option. This way is only to show how strategies work

... and test it ...

In [16]:
import json

a = A()
dataToSave = {"a": ["b", "c"]}
a.scalarField = dataToSave
json.loads((savedDataRootDir / (A.scalarField.strategy.name + ".json")).read_text()) == dataToSave  # the data read from the file by other means must match the value we saved!
Out[16]:
True
In [17]:
(savedDataRootDir / (A.scalarField.strategy.name + ".json")).write_text("100500")  # we replace the value in the file ...
a.scalarField  # ... and the returned value changes
Out[17]:
100500

Defining a cached strategy

A cached strategy requires both cold and hot mappers.

In [18]:
from urm.fields import CachedStrategy

cachedStrategy = CachedStrategy(ourStorer, ourCacher)

Defining our class with a property backed by cached storage

Now we define a class whose properties are backed by cached storage.

  • Such classes must inherit from ProtoBundle!
  • Hot storage is placed into an attribute of the class prefixed with an underscore, so it has to be added to __slots__.
In [19]:
class B(ProtoBundle):
    __slots__ = ("_scalarField",)
    scalarField = Field0D(None)
    scalarField.strategy = cachedStrategy  # Don't define the field like this in production, we have a better option. This way is only to explain how strategies work

... and test it ...

In [20]:
b = B()  # since we use the same storer, the data is loaded from the same storage
b.scalarField == a.scalarField
Out[20]:
True
In [21]:
dataToSave2 = {"d": ["e", "f"]}
b.scalarField = dataToSave2
b.scalarField == dataToSave2
Out[21]:
True
In [22]:
# but the data in cold storage is not automatically updated ...
b.scalarField == dataToSave
Out[22]:
False
In [23]:
# We can save the data ....
b.save()
a.scalarField == dataToSave
Out[23]:
False
In [24]:
a.scalarField == dataToSave2
Out[24]:
True

The approach above is useful just for understanding how things work. In real code you create storage-backed fields using the following syntax:

In [25]:
class A:
    scalarField = Field0D(ourStorer)  # uncached

class B(ProtoBundle):
    __slots__ = ("_scalarField",)
    scalarField = Field0D(ourStorer, ourCacher)  # cached


a = A()
b = B()
b.scalarField == a.scalarField
Out[25]:
True

Lesson 4: Bird's-eye picture

To create a bidirectional mapping between class properties and storage, we need to answer the following questions:

  • How does data flow between hot and cold storage? FieldStrategy subclasses are the answers.
  • How do we STORE the data? A ColdMapper object is the answer. `FieldStrategy.cold` answers this question.
  • How do we CACHE the data? A HotMapper object is the answer. `FieldStrategy.hot` answers this question.
  • How do we create our internal keys? Field subclasses contain the answers.
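
Putting it together with the objects we have already built (just a recap sketch, equivalent to class B from Lesson 3; the names coldSide, hotSide and Recap are made up here):

coldSide = ColdMapper(fieldNameKeyMapper, ourSaver, constantParamsSerializerMapper)  # WHERE / WHAT / HOW for cold storage
hotSide = HotMapper(fieldNameKeyMapper, PrefixCacher())                              # the cache side

class Recap(ProtoBundle):
    __slots__ = ("_value",)
    value = Field0D(coldSide, hotSide)  # both mappers -> cached field; a single ColdMapper -> uncached field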

Lesson 5: File-backed collections

For fields containing collections of objects mapped to non-relational entities you need key mappers that actually map keys. For scalars the key was always empty; for collections the keys are provided by the user when indexing.

In [26]:
from urm.storers.hot import CollectionCacher

vectorKeyMapper = PrefixKeyMapper()
ourVectorStorer = ColdMapper(vectorKeyMapper, ourSaver, constantParamsSerializerMapper)
ourVectorCacher = HotMapper(vectorKeyMapper, CollectionCacher(dict))  # our hot storage is a dict, but we can plug there any collection

class C(ProtoBundle):
    vectorField = FieldND(ourVectorStorer, ourVectorCacher)  # cached


c = C()
c.vectorField["aaaa"] = 10
c.vectorField["bbbb"] = {25: 36}
c.vectorField["cccc"] = {"25": 36}
c.save()
print((savedDataRootDir / "aaaa.json").read_text() == str(c.vectorField["aaaa"]))
print(json.loads((savedDataRootDir / "bbbb.json").read_text()) == c.vectorField["bbbb"])  # False, because it is JSON!
print(json.loads((savedDataRootDir / "cccc.json").read_text()) == c.vectorField["cccc"])
True
False
True

Lesson 6: Controlling paths with dynamic attributes and cache invalidation

To resolve paths dynamically we have a wrapper object, Dynamic. It represents a path in the object hierarchy. To invalidate the cache, set the field value to None.

In [27]:
from urm.core import Dynamic
from urm.fields import Field0D, FieldND
from urm.mappers.serializer import JustReturnSerializerMapper

controlledPathKeyMapper = PrefixKeyMapper(Dynamic("name"))
ourNameControlledStorer = ColdMapper(controlledPathKeyMapper, ourSaver, constantParamsSerializerMapper)

class Pocket(ProtoBundle):
    __slots__ = ("name", "_shit")
    shit = Field0D(ourNameControlledStorer, ourCacher)
    def __init__(self, name: str):
        self.name = name


ptchkPocket = Pocket("ptchk")
ptchkPocket.shit = 2
ptchkPocket.save()
print("Wn: hv y brgt??")
(savedDataRootDir / "ptchk.json").write_text(str(json.loads((savedDataRootDir / "ptchk.json").read_text()) - 1))
ptchkPocket.shit = None  # invalidates cache
print("ptchk: Y nw hv", ptchkPocket.shit)
ptchkPocket.shit -= 1
print(json.loads((savedDataRootDir / "ptchk.json").read_text()))
ptchkPocket.save()
print(json.loads((savedDataRootDir / "ptchk.json").read_text()))
Wn: hv y brgt??
ptchk: Y nw hv 1
1
0