3 Tips to Solve Writing Contention Problems in Google Datastore
Google Cloud Platform provides many managed technologies that save developers time when building scalable, highly available services that handle large amounts of traffic.
When it comes to databases, you have two main options for serving high traffic: MySQL (SQL) or Datastore (NoSQL).
MySQL vs Datastore
MySQL
- SQL
- Relational DBMS
- Easy to implement complex logic by joining tables
- Scalability: you can add nodes to increase read capacity
- Availability: you need to set up a master-and-multiple-slaves architecture
Google Datastore
- NoSQL
- Document store
- Cannot join tables
- Can easily achieve scalability and availability
If you are familiar with MySQL's architecture, you know it is quite challenging to scale out to meet growing user traffic.
MySQL usually locks the database in blocking mode when writing data, so while you can easily add nodes to increase read capacity, it is hard to increase write capacity as traffic grows.
With AWS Aurora, for example, you can only choose a master-multiple-slaves or master-master architecture, and neither can scale out write capacity on demand.
Google Datastore solves this issue: because it is designed for high availability and scalability, you can smoothly scale your write capacity as traffic grows.
However, you should pay attention to how you implement parallel writes in Google Datastore.
Otherwise, your service will run into a writing contention problem in Datastore once traffic rises above 5 requests per second.
What is worse, Google's documentation does not give you an easy-to-understand sample that implements a Sharding Counter to resolve this issue.
In this post, we will not only provide an example that implements a Sharding Counter but also cover the following:
- Limitations of Google Datastore
- Why is writing data in Datastore so slow?
- 3 tips to resolve contention problems in Datastore
Limitations of Google Datastore
- Any entity group can only be written at a rate of 1 request per second.
- If you use @ndb.transactional or @ndb.transactional(xg=True) to write the data, your API can only serve traffic at about 5 requests per second.
If you don't pay attention to these limitations, your service can hit a writing contention problem in Datastore when traffic rises above 5 requests per second, because ndb.transactional locks the entity group in order to achieve transactional writes.
This makes sense when you are writing transactional data such as payment records, but performance of 5 requests per second is unacceptable for most services.
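For illustration, here is a minimal sketch of the kind of transactional write that is subject to this limit. The Payment model, its fields, and the record_payment function are assumptions made up for this example, not part of Datastore's API.

from google.appengine.ext import ndb

# Hypothetical model, made up for this example.
class Payment(ndb.Model):
    amount = ndb.IntegerProperty()

@ndb.transactional
def record_payment(account_id, amount):
    # Writes inside the transaction are serialized per entity group,
    # so this tops out at roughly 1 write per second per account.
    Payment(parent=ndb.Key('Account', account_id), amount=amount).put()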
Why Is Writing Data in Datastore So Slow?
Because Datastore replicates your data globally to achieve high availability and scalability, every write must propagate to multiple replicas before it is acknowledged.
3 Tips to Solve Writing Contention Problems
- Sharding Counter
- Use Memcache to batch write requests, do the operations in memory, and return to your clients (see the sketch after this list)
- Defer a task queue to write the data into Datastore
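To make the Memcache tip concrete, here is a minimal sketch of batching with a hypothetical page-hit counter. The key format, FLUSH_THRESHOLD, and the flush_to_datastore helper are all made up for this example.

from google.appengine.api import memcache

FLUSH_THRESHOLD = 100  # assumption: flush to Datastore every 100 increments

def count_hit(page_id):
    # memcache.incr is atomic, so concurrent requests never contend on Datastore.
    count = memcache.incr('hits:%s' % page_id, initial_value=0)
    if count is not None and count >= FLUSH_THRESHOLD:
        # flush_to_datastore is a hypothetical helper that hands the
        # accumulated count to a task queue for the actual Datastore write.
        flush_to_datastore(page_id, count)
        memcache.delete('hits:%s' % page_id)

The trade-off is durability: counts held only in Memcache can be evicted before they are flushed.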
A Sharding Counter is the best solution for resolving writing contention problems and increasing write capacity, but you need to design your data model carefully in the first place.
The idea of a Sharding Counter is to shard the entity group with a unique id, as shown below.
The following code shows how to spread FriendShip writes across a thousand shards so they can run in parallel. If you need to improve performance further, just increase NUM_SHARDS.
import random
from google.appengine.ext import ndb

# user_key and friend_key property types are assumed to be strings.
class FriendShip(ndb.Model):
    user_key = ndb.StringProperty()
    friend_key = ndb.StringProperty()

NUM_SHARDS = 1000

# Pick a random shard id so concurrent writes land on different entities.
shard_string_index = str(random.randint(0, NUM_SHARDS - 1))
FriendShip(id=shard_string_index,
           user_key='user Id',
           friend_key='friend Id').put()
Note:
- When you update the data model with a sharding index, Datastore dispatches the writing requests by the unique index to different nodes globally.
- This is why Datastore can achieve parallel writes for big data at a consistent speed.
- But remember to keep each data entity as small as possible.
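Because each friendship is stored under a random shard id, you read the data back with a query instead of a key lookup. A minimal sketch, assuming the FriendShip model above:

# Gather a user's friendships from every shard with a single query.
friendships = FriendShip.query(FriendShip.user_key == 'user Id').fetch()
friend_ids = [f.friend_key for f in friendships]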
Summary
- A Sharding Counter achieves parallel writing for big data at consistent performance, but it requires you to design your data model properly in the beginning.
- If you have many entities to update in one request, use a task queue to do the updates and return only a small amount of information to your clients.
- If you need to write transactional data using @ndb.transactional or @ndb.transactional(xg=True), defer a task queue to get it done and return a small amount of information to the clients, as sketched below.
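As a rough sketch of that last point, a request handler can defer the transactional write (the record_payment function from the earlier example) and respond right away. The handler name and response shape are assumptions for the example.

from google.appengine.ext import deferred

def handle_payment(account_id, amount):
    # Queue the slow transactional write instead of blocking the request.
    deferred.defer(record_payment, account_id, amount)
    # Return only a small acknowledgement to the client.
    return {'status': 'accepted'}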
Depending on your data model's design, you can choose a Sharding Counter, Memcache, a task queue, or even a hybrid approach to achieve the best performance in Google Datastore.