Transformers are just a composable way to serialize data: `unprocess` serializes, `process` parses.
```python
testJsonDict = {"abolish": ["patent", "copyright"]}
from transformerz.serialization.json import jsonFancySerializer
print(jsonFancySerializer.unprocess(testJsonDict))
# { "abolish": [ "patent", "copyright" ] }
print(jsonFancySerializer.process("[null]"))
# [None]
```
Transformers can be composed using the `+` operator.
```python
from transformerz.serialization.pon import ponSerializer  # JSON <-> JavaScript === PON <-> Python
print((ponSerializer + jsonFancySerializer).unprocess(testJsonDict))  # Returns a "PON" `str` in which a JSON string is serialized
# '{\n\t"abolish": [\n\t\t"patent",\n\t\t"copyright"\n\t]\n}'
print((jsonFancySerializer + ponSerializer).unprocess(testJsonDict))  # Returns a JSON `str` in which a "PON" string is serialized
# "{'abolish': ['patent', 'copyright']}"
```
But you cannot save strings into files; you need to save bytes instead ...
```python
from transformerz.text import utf8Transformer
ourTransformer = utf8Transformer + jsonFancySerializer
print(ourTransformer.unprocess(testJsonDict))  # Returns the UTF-8-encoded bytes of the serialized JSON string
# b'{\n\t"abolish": [\n\t\t"patent",\n\t\t"copyright"\n\t]\n}'
del testJsonDict
```
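The composition semantics above can be illustrated with a minimal toy sketch using only the stdlib. This is not the real transformerz API; the class and the `jsonToy`/`utf8Toy` names are made up for illustration. The idea: `(a + b).unprocess(x)` runs `b.unprocess` first and then `a.unprocess`, and `process` inverts the pipeline.

```python
import json

class ToyTransformer:
    """A toy transformer: `unprocess` serializes, `process` parses."""
    def __init__(self, process, unprocess):
        self.process = process
        self.unprocess = unprocess

    def __add__(self, other):
        # (a + b).unprocess(x) == a.unprocess(b.unprocess(x)):
        # the right-hand transformer runs first when serializing
        # and last when parsing.
        return ToyTransformer(
            process=lambda x: other.process(self.process(x)),
            unprocess=lambda x: self.unprocess(other.unprocess(x)),
        )

jsonToy = ToyTransformer(process=json.loads, unprocess=json.dumps)
utf8Toy = ToyTransformer(process=lambda b: b.decode("utf-8"),
                         unprocess=lambda s: s.encode("utf-8"))

pipeline = utf8Toy + jsonToy           # dict -> JSON str -> UTF-8 bytes
blob = pipeline.unprocess({"a": [1]})
print(blob)                            # b'{"a": [1]}'
print(pipeline.process(blob))          # {'a': [1]}
```

This mirrors why `utf8Transformer + jsonFancySerializer` produces bytes: the rightmost transformer touches the Python object, the leftmost one touches the medium format.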
The data can be compressed. For compression we use the codecs available in my fork of the kaitai.compress library (I hope it gets merged someday). `BinaryProcessor` is an adapter that lets you use the codecs from that library as transformers.
```python
from transformerz.compression import BinaryProcessor
from transformerz.kaitai.compress import Zlib
zlibProcessor = BinaryProcessor("zlib", Zlib())  # You must name your processor!
print((zlibProcessor + ourTransformer).unprocess(["fuck"] * 8000))  # Returns the zlib-compressed UTF-8-encoded JSON string
# b'x\x9c\xed\xc6\xa1\r\x00 \x0c\x000\xcd\xce\x98\xdeG\x04E\x82A\xef\x7f\x14_\xb4\xaa3F\x9e\xde7KDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD~b=s\x83\xe3\x84'
```
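The same idea can be sketched with the stdlib `zlib` module alone (a toy, not the real `BinaryProcessor` adapter; the `ZlibToy` name is made up): a bytes-to-bytes processor whose `unprocess` compresses and whose `process` decompresses.

```python
import zlib

class ZlibToy:
    """Toy bytes <-> bytes processor: `unprocess` compresses, `process` decompresses."""
    def unprocess(self, data: bytes) -> bytes:
        return zlib.compress(data)

    def process(self, data: bytes) -> bytes:
        return zlib.decompress(data)

z = ZlibToy()
payload = b'["fuck"]' * 1000           # highly repetitive, compresses well
packed = z.unprocess(payload)
assert z.process(packed) == payload    # lossless round trip
print(len(payload), "->", len(packed))
```

Because compression is lossless, such a processor composes transparently into a pipeline: the layers around it never see the compressed form.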
Computer science has a pair of storage concepts: cold storage and hot storage. We use the term *store* for cold storage and the term *cache* for hot storage. In our case hot storage is usually RAM and requires no explicit serialization (the data are stored the way the runtime and compiler define), while cold storage is usually an HDD/SSD, requiring data to be serialized before being written and deserialized after being read.
To define the way we are going to STORE data we need to answer the following "orthogonal" questions:

* How and where is a serialized piece of data physically stored? A `Saver` object is an answer; `Mapper.saver` answers the question.

Here is an example of a saver saving the data into a file. The first argument is a path to the file (without extension!), the second argument is the extension.
```python
from pathlib import Path
from urm.storers.cold import FileSaver
savedDataRootDir = Path("./tests/testSavedDataRootDir")  # the directory where the file will reside
ourSaver = FileSaver(savedDataRootDir, "json")  # "json" is the extension of the files used. `ourSaver` will later be populated into the `saver` property
```
* How do we map field names and indices into `key`s in the storage for this piece of data? A key mapper is an answer; `Mapper.key` answers this question.

```python
from urm.mappers.key import fieldNameKeyMapper
keyMapper = fieldNameKeyMapper  # We don't really need a key mapper here, since we deal with scalars in this example
```
For cold storage we need to answer an additional question:

* How is the data serialized? A `transformerz.Transformer` object is an answer; `ColdMapper.serializer` answers this question. It is a function that returns the transformer we will use to serialize the data. This lets you change the transformer depending on some conditions, some of which can be encoded in other serialized data.

```python
from transformerz.core import TransformerBase
from urm.ProtoBundle import ProtoBundle

def constantParamsSerializerMapper(parent: ProtoBundle) -> TransformerBase:  # will be populated into the `serializer` property later
    return ourTransformer
```
So a `ColdMapper` object answers the questions related to storing values in cold storage. Let's construct it!

```python
from urm.mappers import ColdMapper
ourStorer = ColdMapper(keyMapper, ourSaver, constantParamsSerializerMapper)
```
To work with the data we also need a hot storage.

```python
from urm.mappers.key import PrefixKeyMapper, fieldNameKeyMapper
from urm.mappers import HotMapper
from urm.storers.hot import PrefixCacher
ourCacher = HotMapper(fieldNameKeyMapper, PrefixCacher())
```
A `key` is a `tuple` of strings and numbers. It is a unique identifier of a piece of data, decoupled from the actual storage implementation. We address data by keys.
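To illustrate that decoupling, here is a hypothetical sketch (not part of urm; `key_to_path` is a made-up helper) showing how one backend might map a key tuple onto a file path, while another backend could map the same tuple onto a dict entry:

```python
from pathlib import Path

def key_to_path(root: Path, key: tuple, ext: str) -> Path:
    # ("users", 42, "profile") -> root/users/42/profile.json
    *dirs, last = (str(part) for part in key)
    return root.joinpath(*dirs) / (last + "." + ext)

print(key_to_path(Path("data"), ("users", 42, "profile"), "json"))
# data/users/42/profile.json  (POSIX separators)
```

The code addressing the data only ever sees the tuple; swapping the backend never changes the keys.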
A field strategy is an object of a `FieldStrategy` subclass that describes our pattern of loading/storing data. There are 2:

* uncached (`ColdStrategy` class): always load data from the medium when the field value is read, and always store data to the medium when the field is assigned a value.
* cached (`CachedStrategy` class): accesses only alter the data in the hot storage (aka the *cache*). Data is loaded into the cache from cold storage the first time it is read, and stored back to cold storage only when explicitly asked.

As you see, the most basic strategy is the cold one. A strategy using only hot storage makes no sense by itself; you wouldn't need this framework for that.
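The contrast between the two strategies can be sketched with plain stdlib Python (a toy, not urm's classes; `ColdField` and `CachedField` are made-up names): the cold one touches the file on every access, the cached one only on first read and on an explicit `save()`.

```python
import json
import tempfile
from pathlib import Path

class ColdField:
    """Every read/write goes straight to the file."""
    def __init__(self, path: Path):
        self.path = path

    def get(self):
        return json.loads(self.path.read_text())

    def set(self, v):
        self.path.write_text(json.dumps(v))

class CachedField(ColdField):
    """Reads/writes touch RAM; the file changes only on save()."""
    def __init__(self, path: Path):
        super().__init__(path)
        self._hot = None

    def get(self):
        if self._hot is None:          # cache miss -> read cold storage once
            self._hot = super().get()
        return self._hot

    def set(self, v):
        self._hot = v                  # only the cache changes

    def save(self):
        super().set(self._hot)         # explicit flush to cold storage

p = Path(tempfile.mkdtemp()) / "f.json"
cold = ColdField(p)
cold.set([1, 2])                       # hits the disk immediately
cached = CachedField(p)
cached.set([3, 4])                     # only the cache changes ...
print(cold.get())                      # [1, 2] -- file untouched
cached.save()
print(cold.get())                      # [3, 4] -- now flushed
```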
So, let's get familiar with the cold strategy first.

```python
from urm.fields import ColdStrategy
coldStrategy = ColdStrategy(ourStorer)  # Don't do this in real code, a strategy object must never be reused! We have a better way to set it.
```
Now we define a class whose properties are backed by cold storage.

```python
from urm.fields import Field, Field0D, FieldND
from urm.mappers.serializer import JustReturnSerializerMapper

class A:
    __slots__ = ()
    scalarField = Field0D(None)
    scalarField.strategy = coldStrategy  # Don't define the field like this, we have a better option. This way only shows how strategies work
```
... and test it ...

```python
import json
a = A()
dataToSave = {"a": ["b", "c"]}
a.scalarField = dataToSave
json.loads((savedDataRootDir / (A.scalarField.strategy.name + ".json")).read_text()) == dataToSave  # the data read from the file another way must match the value we have saved!
# True
(savedDataRootDir / (A.scalarField.strategy.name + ".json")).write_text("100500")  # we replace the value in the file ...
a.scalarField  # ... and the returned value changes
# 100500
```
A cached strategy requires both cold and hot mappers.

```python
from urm.fields import CachedStrategy
cachedStrategy = CachedStrategy(ourStorer, ourCacher)
```
Now we define a class whose properties are backed by cached storage. This time the class must inherit `ProtoBundle`!

```python
class B(ProtoBundle):
    __slots__ = ("_scalarField",)
    scalarField = Field0D(None)
    scalarField.strategy = cachedStrategy  # Don't define the field like this in production, we have a better option. This way is only to explain how strategies work
```
... and test it ...

```python
b = B()  # since we use the same storer, the data is loaded from the same storage
b.scalarField == a.scalarField
# True
dataToSave2 = {"d": ["e", "f"]}
b.scalarField = dataToSave2
b.scalarField == dataToSave2
# True
# but the data in cold storage is not automatically updated ...
a.scalarField == dataToSave
# True
# We can save the data ...
b.save()
a.scalarField == dataToSave
# False
a.scalarField == dataToSave2
# True
```
The way above is useful just for understanding how the stuff works. In real code you should create storage-backed fields using the following syntax:

```python
class A:
    scalarField = Field0D(ourStorer)  # uncached

class B(ProtoBundle):
    __slots__ = ("_scalarField",)
    scalarField = Field0D(ourStorer, ourCacher)  # cached

a = A()
b = B()
b.scalarField == a.scalarField
# True
```
To create a bidirectional mapping between class properties and storage, we need to answer the following questions:

* Which loading/storing pattern do we use? `FieldStrategy` subclasses are the answers.
* How do we store data in cold storage? A `ColdMapper` object is an answer; `FieldStrategy.cold` answers this question.
* How do we store data in hot storage? A `HotMapper` object is an answer; `FieldStrategy.hot` answers this question.
* How do we map field accesses into `key`s? `Field` subclasses contain the answers.

For fields containing collections of objects mapped to non-relational entities you need key mappers, which map indices into keys. For scalars the keys were always empty; for collections the keys are provided by the user when indexing.
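The idea of a prefix key mapper can be sketched in a few lines (a hypothetical illustration, not the real `PrefixKeyMapper`): the mapper holds a fixed prefix, and the user-supplied indices become the trailing components of the full key.

```python
def make_prefix_key_mapper(*prefix):
    """Returns a mapper turning user-supplied indices into full key tuples."""
    def mapper(*indices):
        return tuple(prefix) + tuple(indices)
    return mapper

vector_keys = make_prefix_key_mapper("vectors")
print(vector_keys("aaaa"))     # ('vectors', 'aaaa')
print(vector_keys("bbbb", 7))  # ('vectors', 'bbbb', 7)
```

For a scalar field the indices part is simply empty, so the key degenerates to the prefix alone.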
```python
from urm.storers.hot import CollectionCacher
vectorKeyMapper = PrefixKeyMapper()
ourVectorStorer = ColdMapper(vectorKeyMapper, ourSaver, constantParamsSerializerMapper)
ourVectorCacher = HotMapper(vectorKeyMapper, CollectionCacher(dict))  # our hot storage is a dict, but we can plug in any collection

class C(ProtoBundle):
    vectorField = FieldND(ourVectorStorer, ourVectorCacher)  # cached

c = C()
c.vectorField["aaaa"] = 10
c.vectorField["bbbb"] = {25: 36}
c.vectorField["cccc"] = {"25": 36}
c.save()
print((savedDataRootDir / "aaaa.json").read_text() == str(c.vectorField["aaaa"]))  # True
print(json.loads((savedDataRootDir / "bbbb.json").read_text()) == c.vectorField["bbbb"])  # False, because JSON keys are strings!
print(json.loads((savedDataRootDir / "cccc.json").read_text()) == c.vectorField["cccc"])  # True
```
To resolve paths dynamically we have a wrapper object `Dynamic`. It represents a path in the object hierarchy.

To invalidate the cache, set the value to `None`.
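The idea behind `Dynamic` can be sketched with the stdlib (a toy, not urm's implementation; `DynamicPathToy` and `Holder` are made-up names): store an attribute path now, resolve it against a concrete object later.

```python
from functools import reduce

class DynamicPathToy:
    """A late-bound attribute path, resolved against an object on demand."""
    def __init__(self, *path):
        self.path = path

    def resolve(self, obj):
        # walk the attribute chain: obj.a.b.c for path ("a", "b", "c")
        return reduce(getattr, self.path, obj)

class Holder:
    def __init__(self, name):
        self.name = name

namePath = DynamicPathToy("name")
print(namePath.resolve(Holder("ptchk")))  # ptchk
```

This is why, in the example below, the storage file name follows each instance's `name` attribute.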
```python
from urm.core import Dynamic
from urm.fields import Field0D, FieldND
from urm.mappers.serializer import JustReturnSerializerMapper

controlledPathKeyMapper = PrefixKeyMapper(Dynamic("name"))
ourNameControlledStorer = ColdMapper(controlledPathKeyMapper, ourSaver, constantParamsSerializerMapper)

class Pocket(ProtoBundle):
    __slots__ = ("name", "_shit")
    shit = Field0D(ourNameControlledStorer, ourCacher)

    def __init__(self, name: str):
        self.name = name

ptchkPocket = Pocket("ptchk")
ptchkPocket.shit = 2
ptchkPocket.save()
print("Wn: hv y brgt??")
(savedDataRootDir / "ptchk.json").write_text(str(json.loads((savedDataRootDir / "ptchk.json").read_text()) - 1))
ptchkPocket.shit = None  # invalidates the cache
print("ptchk: Y nw hv", ptchkPocket.shit)
ptchkPocket.shit -= 1
print(json.loads((savedDataRootDir / "ptchk.json").read_text()))
ptchkPocket.save()
print(json.loads((savedDataRootDir / "ptchk.json").read_text()))
```

Output:

```
Wn: hv y brgt??
ptchk: Y nw hv 1
1
0
```