> you can run an initial VDiff, and then resume that one as you get closer to the cutover point.
VDiff (v2) only compares the source and destination at a specific point in time with resume only comparing rows with PK higher than the last one compared before it was paused. I assume this means:
1. VDiff doesn't catch updates to rows with PK lower than the point it was paused which could have become corrupt, and
2. VDiff doesn't continuously validate cdc changes meaning (unless you enforce extra downtime to run / resume a vdiff) you can never be 100% sure if your data is valid before SwitchTraffic
I'm curious if this is something customers even care about, or is point in time data validation sufficient enough to catch any issues that could occur during migrations?
You are correct about resuming. If you do an initial VDiff and then resume that same VDiff say 1 month later it would only diff rows with a higher PK value.
But there's also nothing stopping you from doing a new VDiff to cover all data at that later point in time.
That 400TB in the image is a large database! I'm guessing that's not the largest in the PlanetScale fleet either. Very impressive and a reminder that you're strongly differentiated against some of the recent database upstarts in terms of battle tested mission critical scale. Out of curiosity how many of these large clusters are using your true managed 'as a service' offering or are they mostly in the bring your own cloud mode? Do you offer zero downtime migrations from bring your own cloud to true as a service?
That particular cluster has grown significantly since the post was written, and yes there are now quite a few others that are challenging it for the "largest" claim. :-)
These larger ones are fully using the PlanetScale SaaS, but they are using Managed -- meaning that there are resources dedicated to and owned by them. You can read more about that here: https://planetscale.com/docs/vitess/managed
> you can run an initial VDiff, and then resume that one as you get closer to the cutover point.
VDiff (v2) only compares the source and destination at a specific point in time with resume only comparing rows with PK higher than the last one compared before it was paused. I assume this means:
1. VDiff doesn't catch updates to rows with PK lower than the point it was paused which could have become corrupt, and
2. VDiff doesn't continuously validate cdc changes meaning (unless you enforce extra downtime to run / resume a vdiff) you can never be 100% sure if your data is valid before SwitchTraffic
I'm curious if this is something customers even care about, or is point in time data validation sufficient enough to catch any issues that could occur during migrations?
You are correct about resuming. If you do an initial VDiff and then resume that same VDiff say 1 month later it would only diff rows with a higher PK value.
But there's also nothing stopping you from doing a new VDiff to cover all data at that later point in time.
What does it cost to host a 400TB database?
That 400TB in the image is a large database! I'm guessing that's not the largest in the PlanetScale fleet either. Very impressive and a reminder that you're strongly differentiated against some of the recent database upstarts in terms of battle tested mission critical scale. Out of curiosity how many of these large clusters are using your true managed 'as a service' offering or are they mostly in the bring your own cloud mode? Do you offer zero downtime migrations from bring your own cloud to true as a service?
That particular cluster has grown significantly since the post was written, and yes there are now quite a few others that are challenging it for the "largest" claim. :-)
These larger ones are fully using the PlanetScale SaaS, but they are using Managed -- meaning that there are resources dedicated to and owned by them. You can read more about that here: https://planetscale.com/docs/vitess/managed
All of the PlanetScale features, including imports and online schema migrations or deployment requests (https://planetscale.com/docs/vitess/schema-changes/deploy-re...) are fully supported with PlaneScale Managed.
Understood: that's great for your customers' EDP negotiations with their cloud providers!