Next Evolution of Db2u

Db2 Universal Container (Db2u) is a containerized solution that delivers fast, consistent deployments of Db2 on any Kubernetes platform.

Containerized Db2 has already gone through two major evolutions, and Db2u is now ripe for a third, more powerful deployment model. Before discussing the exciting new features to come, let’s take a look at what has been done and why it is an essential milestone for Db2u.

The first evolution was a turning point for Db2 in containers: Db2u. Db2u was first envisioned as a way to support Kubernetes. Unlike the first generation of containerized Db2, it was not designed around a specialised hardware configuration. This change in mindset is the foundation of what makes Db2u what it is: Db2u stays close to Kubernetes, and while there are some limitations, each evolution of Db2u surpasses those limits using new Kubernetes functionality. In its first generation, Db2u used Helm to enable consistent product deployments on Kubernetes, setting the foundation for fast deployment; for a small OLTP instance, Db2u could spin up a database within minutes.
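To make that concrete, here is a minimal sketch of what a Helm-driven Db2u deployment could look like. The chart name and every value below are hypothetical, chosen only to illustrate the declarative model, and are not the actual Db2u chart schema:

```yaml
# Hypothetical values.yaml for a small Db2u OLTP instance.
# Every key below is illustrative, not the real Db2u chart schema.
# A deployment would then be a one-liner along the lines of:
#   helm install my-db2u ./db2u-chart -f values.yaml
dbType: db2oltp
database:
  name: BLUDB
resources:
  requests:
    cpu: "2"
    memory: 8Gi
storage:
  storageClassName: managed-premium
  size: 100Gi
```

Because the entire topology lives in one values file, every deployment is reproducible, which is what makes minutes-long spin-up practical.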

The second evolution upgraded the way Db2u manages its lifecycle. Around the same time, many other products were facing the same issue, so the Kubernetes community introduced “Operators”, whose goal is to inject domain knowledge about your product into Kubernetes. After the release of the Operator SDK framework, Db2u started to adopt this new paradigm. The Operator SDK offers three paths (Helm-based, Ansible-based, and Go-based operators), with only one being the right choice for Db2u; the other two would have made the third evolution of Db2u close to impossible to achieve.

Core needs have always driven the evolution of Db2u. The third evolution, which in my opinion is even more critical, sets the foundation for faster and bigger Db2 instances. So far, each step has brought us closer, and now we are focusing on solving the weaknesses of cloud deployments.

In the early days of Db2u’s Operator support, we focused on staying as close to our Helm deployment as possible. With that in mind, our options for Db2u were limited. During my time as Db2u Architect, I decided it was time to change our deployment architecture to best align with the Kubernetes model and to enable some Db2 functionality that improves performance for most cloud deployments. I created the blueprint, and while we won’t be able to implement 100% of it this year, the most critical component is complete.

The critical component of this upcoming evolution of Db2u is moving away from the Kubernetes resource named “StatefulSet” to our new custom resource (CR) called Db2uEngine. To best describe the benefit of this CR, we need to understand how Db2uEngine controls storage compared to other resources. To keep it simple, let’s review a Db2wh deployment with no dedicated storage for active logs or temporary tablespaces.

Db2uEngine Db2wh deployment

From this design, some might notice that each Db2 data partition, or multiple logical node (MLN), now has its own volume. We can also place each data partition’s active logs in a dedicated volume. This is a direct outcome of controlling the pods ourselves. Before talking about the advantages, let’s discuss the reason for this change.
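To see why direct pod control matters for storage, recall that a StatefulSet stamps out identical PersistentVolumeClaims from a single volumeClaimTemplates entry, so every replica gets the same storage shape. A controller that owns its pods can instead create differentiated claims per data partition. The sketch below is hypothetical; the names, labels, and sizes are illustrative, not the actual claims Db2uEngine creates:

```yaml
# Hypothetical per-partition claims a custom controller could create.
# Each MLN gets its own data volume plus a dedicated active-log volume,
# something a single StatefulSet claim template cannot express.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db2wh-mln0-data
  labels:
    app: db2wh
    partition: "0"
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 500Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db2wh-mln0-activelogs
  labels:
    app: db2wh
    partition: "0"
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 100Gi
```

Because each claim is independent, storage can be sized and tuned per partition rather than uniformly across the whole warehouse.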

Throughout the lifetime of any software product, one key factor is almost always the most important: cost per performance. In other words, “how much money do I need to spend to reach X performance goal?” On the surface this looks like a simple question, but it is complicated to calculate yet highly essential for a database product. The key factor for Db2 performance is, in most cases, storage; the IOPS and latency of the disk can reduce performance by over 10x. This is why the main focus this time is on storage, and why we moved each MLN into its own volume. While storage performance is the main benefit, there is one more advantage worth mentioning: with this new CR, Db2u warehouse deployments now support scaling out/in (with some limitations, e.g. the same number of MLNs per pod).

This new design is called v2 because of how Kubernetes versions resources; our initial design’s API version was v1. All the charts below are referenced by those names. To simplify the charts, all hardware comparisons use only the head node, as the other nodes behave very similarly to it. Also, each v1 chart shows only a snapshot of the first 5 hours, since that is the time v2 takes to complete the workload.

Big Data Insight (BDI) is an IBM workload modelled after a day in the life of a business intelligence application, based on a retail database with in-store and online catalogues of merchandise for sale. The workload is loosely based on the TPC-DS benchmark spec, containing seven fact tables and 17 dimension tables. The data is randomly generated each time a platform is tested. The workload has three types of “users” running simple, intermediate, and complex queries, and contains 100 different Cognos-generated queries. In our tests, we generated a 10 TB database.

From the graph above, we can see that throughput increased by 218% from v1 to v2 on the 16 Heavy Users run. On the 32 Heavy Users run, we see a 440% increase in throughput. On v2, going from 16 to 32 Heavy Users yields a 35% increase in throughput.

CPU usage can reveal whether there are bottlenecks in the deployment. As bottlenecks are removed from other parts of the system, usage of the CPU, typically the fastest part of the system, moves closer to 100%. In this case, our new design is better able to saturate the CPU, providing better performance.

To reach the same performance as v2 (multiple disks) using v1 (one disk), you would spend around $1,500 more for this workload. While we could bring this down to a $100 difference per node by reducing the disk IOPS to fit within the AWS EBS gp3 spec, this simple hack only works for small databases. Once we consider huge database sizes, the cost difference runs into thousands of dollars a month. For a database that needs around 64K IOPS per node and has 6 data partitions per node, the price difference is about $4,500 per node per month.
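The rough arithmetic behind that figure, assuming AWS us-east-1 list prices at the time of writing (check current pricing before relying on these numbers): a gp3 volume tops out at 16K IOPS, so a single 64K-IOPS volume forces you onto io2, where provisioned IOPS cost about $0.065 each per month for the first 32K and $0.046 each for the next 32K, i.e. 32,000 × $0.065 + 32,000 × $0.046 ≈ $3,550 per month in IOPS charges alone. Split the same 64K IOPS across six gp3 volumes (one per MLN) at roughly 10.7K IOPS each, and each volume only pays $0.005 per IOPS above its free 3,000 baseline: 6 × 7,700 × $0.005 ≈ $230 per month. Add io2’s higher per-GB storage price and the gap lands in the ballpark of the figure above.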

A small note: IOPS is only one part of the equation. Depending on the workload, latency can play a huge factor as well, mostly in write-heavy workloads.

Less is better; as the disk becomes busy, processes have to queue and wait to access the disk.
For this environment, a single disk read is capped at 1 Gb/s; with multiple disks in v2, the capped read speed is 6 Gb/s.
The total IOPS v2 can use is over 18K, while v1 is capped at 3K.
The peak throughput for v1 is around 2.3 Gb/s, and v2 is around 5 Gb/s.

This blog is an opinion piece; it does not reflect the views, plans, or opinions of Db2.

Your mileage may vary — these benchmarks are not representative of all workloads, and depending on a variety of factors, you may see different performance results in your own environments.
