Google Datastore Entity Review

I’m working on TTB Tamer. One of my tasks requires that every Entry made by a Brewer (e.g. moving beer out of the cellar) has a reference to the Brewery to which it applies.

I’m forced to reconsider my options for modeling this relationship. Reviewing the Google Datastore gave me a fresh perspective…

OK, the basic structure of an Entity that can be stored with Google Datastore:

  1. Key
    1. Kind
    2. (Key Name | Numeric ID)
    3. Ancestor Path?
  2. Parent Entity?
  3. Properties+

Now some elaboration:

Key

  • Every entity has a key, … a unique identifier for that entity….
  • includes optional key name, a string unique across entities [NATE: Same entity group?] of the given kind [NATE: Oh..]

A key consists of these components:

  1. The kind of the entity, which categorizes it for the purpose of Datastore queries
  2. An identifier for the individual entity, which can be either
    1. key name string [NATE: called optional, above]
    2. an integer numeric ID [NATE: automatically assigned]
  3. An optional ancestor path locating the entity within the Datastore hierarchy
    1. [NATE: Described as the list of kind:identifier pairs* for each ancestor of this entity (starting with the parent)
    2. The list starts from the “root” of the tree (i.e. [Grandparent,… ) and progresses towards the “branches” (i.e. … Parent, Child] ).
    3. If an entity does not have a parent, it is a “root entity”. Each root entity is considered part of a separate entity group!]

*This is an even-length list covering all ancestors (even-length because each ancestor is represented by two items, as if it were a two-item tuple of kind and identifier)
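To make the kind:identifier pairing concrete, here is a minimal pure-Python sketch. It does not use the Datastore API at all; `pair_up`, `root_of`, and the example kinds and identifiers are hypothetical names I made up for illustration.

```python
def pair_up(flat_path):
    """Turn [kind, id, kind, id, ...] into [(kind, id), ...], root first."""
    if len(flat_path) % 2 != 0:
        raise ValueError("a path must hold an even number of items")
    return list(zip(flat_path[0::2], flat_path[1::2]))


def root_of(flat_path):
    """The first pair is the root entity, which identifies the entity group."""
    return pair_up(flat_path)[0]


# Hypothetical kinds and identifiers, for illustration only.
entry_path = ["Brewery", "abc-brewing", "Update", 42, "Entry", 7]
brewery_path = ["Brewery", "abc-brewing"]

print(pair_up(entry_path))
print(root_of(entry_path) == root_of(brewery_path))
```

Two paths that share the same first pair belong to the same entity group; a path with a single pair is a root entity.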

I clarified the usage of db.Key.from_path, which always confused me (in the context of its documentation and docstring)

  1. EITHER the full ancestor path (an even-length list of kind, identifier values) can be passed as the args to db.Key.from_path
  2. …OR just the parent can be explicitly passed (as parent=<…>); its value must be
    1. a Key object (which knows its own ancestor path, and can thus substitute an ancestor path)
    2. OR an Entity object (which has a method to access its .key(), which knows its own…)
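The two calling styles can be sketched with a small stand-in. This is not the real db.Key.from_path; `FakeKey` and this `from_path` are hypothetical, written only to show that a full path and a short path plus parent describe the same key. Only the Key-as-parent case is modeled here.

```python
class FakeKey(object):
    """Hypothetical stand-in for a key: just a flat (kind, id, ...) path."""

    def __init__(self, flat_path):
        self.flat_path = tuple(flat_path)


def from_path(*path, parent=None):
    """Build a FakeKey from kind/id values, optionally under a parent key."""
    if not path or len(path) % 2 != 0:
        raise ValueError("path must be a non-empty, even-length list")
    # A parent key already knows its own ancestor path, so it can
    # substitute for spelling that path out by hand.
    prefix = parent.flat_path if parent is not None else ()
    return FakeKey(prefix + tuple(path))


# Style 1: pass the whole ancestor path as positional args.
full = from_path("Brewery", "abc-brewing", "Entry", 7)

# Style 2: pass only the last pair, plus the parent key.
brewery_key = from_path("Brewery", "abc-brewing")
child = from_path("Entry", 7, parent=brewery_key)

print(full.flat_path == child.flat_path)
```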

Once the key is assigned, it is permanent!

It seems odd that this key_name, which is a (unique among entity groups) identifier, is not declared the way a DDL would declare it, i.e. alongside the properties of the Model subclass.

Instead it is specified as an argument to the Model’s constructor, or to creation helpers such as get_or_insert.

Be careful with your choice of key; validate that it is in the correct format. Otherwise you may run into a situation where bad data points at the root instance, but the root instance has the wrong key! This means a new root must be created (you can’t change a key; it is permanent), and a new object must be created for every descendant (because you can’t change a parent either; it is just as permanent, as noted under Parent).
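A minimal sketch of that validation step, assuming key names are meant to be lowercase slugs. The pattern and the `validated_key_name` helper are my own hypothetical choices, not anything the Datastore requires.

```python
import re

# Hypothetical format: lowercase letters/digits, separated by single hyphens.
KEY_NAME_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def validated_key_name(raw):
    """Normalize a candidate key name, or raise before anything is stored."""
    name = raw.strip().lower()
    if not KEY_NAME_PATTERN.match(name):
        raise ValueError("bad key name: %r" % raw)
    return name


print(validated_key_name("  ABC-Brewing "))
```

The point is to reject a malformed name before the root entity is ever written, since the key can’t be repaired afterwards.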

Parent

An entity can also have an optional parent entity. Parent-child relationships form entity groups, which are used to control transactionality and data locality in the Datastore.

Transactions: a set of operations on one or more entities. Transactions guarantee:

  1. Atomicity. Either all of the changes succeed together or they all fail together; all or none. e.g. creating/updating multiple entities when users make a General Ledger entry (double-entry / double-entity accounting.)
  2. Consistency. Until all of a single transaction’s operations have taken effect on an entity, no other (non-transaction) operation can start. e.g. incrementing an entity’s ‘Popularity’ value blocks any other Popularity updates in the meantime (so those updates won’t be overwritten/reverted by the increment.)

In other words,

“This works because entity groups are a unit of consistency as well as transactionality.”
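The all-or-none idea can be illustrated in memory. This is only an illustration, not the Datastore’s transaction mechanism (in the db API the real thing is run through db.run_in_transaction); `apply_atomically` and the account names are hypothetical.

```python
import copy


def apply_atomically(ledger, changes):
    """Apply every (account, delta) change to the ledger, or none of them."""
    working = copy.deepcopy(ledger)  # work on a copy until everything succeeds
    for account, delta in changes:
        if account not in working:
            raise KeyError("unknown account: %r" % account)
        working[account] += delta
    # Only reached if no change failed: commit all changes at once.
    ledger.clear()
    ledger.update(working)


# Hypothetical double-entry example: beer moves from the cellar to the taproom.
ledger = {"cellar": 100, "taproom": 0}
apply_atomically(ledger, [("cellar", -10), ("taproom", 10)])
print(ledger)
```

If the second half of a double entry fails, the first half never becomes visible, which is the property I want for General Ledger entries.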

Keep in mind that Transactionality can be achieved without all affected entities using the same entity group; “cross-group (XG)” transactions are possible, but they have differences, and limitations.

Once a parent is assigned, it is permanent!

Don’t nest needlessly deep. Use parents as meaningful folders that define mutually exclusive domains, i.e. you would never build a set that takes some entities from one parent and some from another (e.g. showing a user Entries from their own Brewery plus a totally unrelated Brewery).

Don’t make a parent if the relationship can be derived by filtering on another property you need anyway. An “update” was just a series of entries that happened to be made at the same time. There is no need to nest them under an ‘update’ object; just group entries by time. GQL has no grouping, so use Python’s.
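Grouping by time in Python can be sketched with itertools.groupby. The entries here are plain dicts with a hypothetical `created` field standing in for fetched entities; in practice that value would be whatever timestamp the entries share.

```python
from itertools import groupby
from operator import itemgetter


def group_by_created(entries):
    """Return [(created, [entries...]), ...], ordered by creation time."""
    # groupby only merges adjacent items, so sort on the same key first.
    ordered = sorted(entries, key=itemgetter("created"))
    return [(created, list(group))
            for created, group in groupby(ordered, key=itemgetter("created"))]


# Hypothetical entries; two of them were made in the same request.
entries = [
    {"created": "2013-05-01T10:00", "barrels": 3},
    {"created": "2013-05-02T09:30", "barrels": 1},
    {"created": "2013-05-01T10:00", "barrels": 2},
]

for created, group in group_by_created(entries):
    print(created, [e["barrels"] for e in group])
```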

Properties

… Weren’t the focus of my review. These are simply the “fields”, aka “properties”, aka “attributes” of each object. They have types.

It’s worth repeating that a db.ReferenceProperty type in Google Datastore is used in place of the foreign-key JOINs of relational databases (which is why Google’s Datastore is considered more “hierarchical” than “relational”). It’s convenient because:

  1. it’s accessible.
    1. In a relational database, a User object has a brewery_key property; it must be JOINed before accessing the Brewery object represented by that key.
    2. In Python’s datastore language / abstraction layer, a User object has the Brewery object as a direct property, only one dot (‘.’) away, as in User.Brewery.
  2. it’s bi-directional
    1. Through Python code, if many User Entities have a db.ReferenceProperty pointing to a single “ABC” Brewery Entity (the employees of “ABC Brewery”),
    2. then the “ABC” Brewery object in Python will reflexively have a collection of objects representing those User Entities.
    3. However, this bi-directionality is not available through the UI, aka the App Console. Frustrating.***
      1. …And not all relationships are symmetric.
    4. …And you can only query “Users” by the values of the User’s own properties, not by the values of their “Brewery” object’s properties,
      1. whereas in SQL statements, once you JOIN, you can continue your query against the other table’s properties.
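Since the query can’t reach through the reference, the filtering has to happen in Python after the fetch. A minimal sketch with plain classes standing in for the models; `Brewery`, `User`, and `users_at` are hypothetical stand-ins here, not db.Model subclasses.

```python
class Brewery(object):
    def __init__(self, name):
        self.name = name


class User(object):
    def __init__(self, name, brewery):
        self.name = name
        self.brewery = brewery  # plays the role of the reference property


def users_at(users, brewery_name):
    """Filter already-fetched users by a property of the referenced Brewery."""
    return [u for u in users if u.brewery.name == brewery_name]


abc = Brewery("ABC")
other = Brewery("Other")
users = [User("nate", abc), User("sam", other), User("alex", abc)]

print([u.name for u in users_at(users, "ABC")])
```

The tradeoff is that every candidate has to be fetched first, so this only makes sense when the unfiltered set is small.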

OK, Back to the task at hand!

I simply need to express the relationship between an individual Entry and a Brewery.

Now that you understand more, you will appreciate the current design;

Currently, individual Entry (don’t confuse with the notion of Google Datastore Entity)

What’s the point of all this? I want to understand how to use Google Datastore optimally. What is “optimally”? The best combination of these things:

  1. No data corruption
  2. Fast
  3. Fewer reads
  4. Fewer writes

I’ve been struck by these things, or made these inferences:

  1. Hierarchical is like a DOM; a tree
    1. Can easily access all children.
    2. Can’t have more than one parent.
    3. Unlike DOM, can’t change a parent.
    4. Just one parent, but a list of ancestors.

And I’ve established my own basic best practices:

  1. If an Entity feels like nothing “supersedes” it, or “precedes” it, … or it has “precedent” in many things (instead of one thing), it shouldn’t have a parent; it should be a root entity.
  2. If an Entity feels like more than one type of thing could supersede it, make at least one of these things its parent. The parent relationship is useful, especially for the type of thing which will be transacted more often. The other type can always be referenced with a ReferenceProperty.
  3. If a relationship points from one object (a Subtotal) to another object (a Total), and the to-object (the Total) might change, then the to-object should probably not be a parent: parents can’t be changed without creating new objects.
  4. An Entity (or type of Entity) is a candidate to become a parent if it would have many different types of children, since fetching these children is very performant using a kindless ancestor query, which is not possible in a ReferenceProperty relationship (just like jQuery’s children() may consist of many different elements, like a <div> and a <p> and an <a>).
  5. Consider that an Entity needn’t be a direct parent; it can also be useful as an ancestor. The <body> isn’t the direct parent of every <a>, but it may be a useful delegate for all their events. I think this may be the epiphany in my case.
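The kindless ancestor idea can be mimicked in plain Python by matching on a key-path prefix and ignoring the kind of the child. This is an in-memory sketch, not a Datastore query; `descendants` and the paths are hypothetical, and as a simplification it leaves out the ancestor itself.

```python
def descendants(paths, ancestor_path):
    """Return every path that sits below ancestor_path, whatever its kind."""
    n = len(ancestor_path)
    return [p for p in paths if len(p) > n and p[:n] == ancestor_path]


# Hypothetical flat key paths: (kind, id, kind, id, ...), root first.
paths = [
    ("Brewery", "abc"),
    ("Brewery", "abc", "Update", 1),
    ("Brewery", "abc", "Update", 1, "Entry", 7),
    ("Brewery", "abc", "Account", "cellar"),
    ("Brewery", "xyz", "Update", 2),
]

for path in descendants(paths, ("Brewery", "abc")):
    print(path)
```

Note that the Entry comes back even though the Brewery is its grandparent, not its parent; that is the ancestor-as-delegate point from item 5.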

So, for my question:

What should be an Entry’s parent?

  1. The Updates which group them.
    1. I’ve always had an instinct to group Entries made in the same POST request;
    2. i.e. made by the same user, to the Brewery, at the same time
    3. Use-case: it is indeed one “Transaction”; the Entries they’re making may be related, but at least have those three properties in common;
    4. Pro: DRY. If a group of Entries have these significant properties in common, store those on a single (“Update”) object, instead of repeating the properties on all the Entry objects.
    5. Con: Distance; GQL does not allow you to filter on properties of entities related through a reference property. So I use a list comprehension on top of GQL.
    6. Tradeoff: cost/cons of repeating the value of properties on two objects, vs the cost/cons of filtering outside of the query mechanism.
    7. Compromise: I’ll repeat the date properties. I won’t repeat the user property. And the Brewery property will be inherited via the ancestor chain, if the Update has the Brewery as its parent.
    8. EDIT: Another potential advantage of keeping one degree of separation between Entries and Breweries (i.e. Brewery -> Update -> Entry); “…but [writing to entity groups] also limits changes to the [parent entity] to no more than 1 write per second (the supported limit for entity groups).”
  2. The Brewery whose inventory they affected
    1. A Brewery must be an ancestor, without question.
    2. Everything is predicated on a) the current user and b) their Brewery.
    3. Indeed, the application constantly isolates the user to a single Brewery’s Entries, Accounts, etc.
  3. The inventory Account, or the From/To Accounts, they’ve affected
    1. Use-case: summing all the Entries in a single account.
    2. Cons: currently, accounts are generic (i.e. tax form line items); Breweries may be sharing the same account.
    3. And they only exist as a JSON map, they’re not Database objects… yet.
    4. I just use a string comparison.
  4. The EntryTotal; representing the sum of Entries made over a span of time.
    1. Too volatile; if the span of time changes, then entries may leave or enter this category; a parent can’t change.
    2. Also, they’re not Database objects…yet