HN Gopher Feed (2017-12-06) - page 1 of 10 ___________________________________________________________________
PostgreSQL HA cluster failure: a post-mortem
198 points by stevedomin
https://gocardless.com/blog/incident-review-api-and-dashboard-outage-on-10th-october/
___________________________________________________________________
qaq - 4 hours ago
There are specialized tools like Patroni
(https://github.com/zalando/patroni)
emXdem - 1 hours ago
I like Stolon: https://github.com/sorintlab/stolon
dboreham - 5 hours ago
This is why you should be extremely wary of anything that is only
run once in a blue moon. And very wary of such things that, when
run, are being run to save your bacon.
rosser - 4 hours ago
"Your backups are only as good as their last successful restore."
CodeWriter23 - 1 hours ago
https://feedbin.com/blog/2017/05/05/testing-huge-postgresql-...
quicklyfrozen - 5 hours ago
And wary of using options that might be outside typical usage
(such as -INF for the backup VIP).
Klathmon - 4 hours ago
I always advocate for monthly "pull the plug on the box" tests. If
you don't need "high availability", then it will test your backups
and restore process, and if you do need "high availability", it
will ensure your failover processes are running smoothly. Not to
mention it trains everyone involved in what to do in an emergency,
since it should be second nature by the time it really happens. If
you can't go "full Netflix" and unleash a chaos monkey on your
servers, at least set up a maintenance period where downtime is
somewhat expected, and do it then.
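A rough sketch of what such a scheduled drill can look like against
a Docker-based staging cluster (container names, DSN and the
recovery check here are made up, not anyone's production tooling):

  #!/usr/bin/env python3
  # Toy "pull the plug" drill: hard-kill one node in a *staging*
  # cluster and measure how long until the service answers again.
  import random
  import subprocess
  import time

  import psycopg2  # pip install psycopg2-binary

  CANDIDATES = ["pg-primary", "pg-replica-1", "pg-replica-2"]  # hypothetical
  DSN = "host=db.staging.internal dbname=app user=app connect_timeout=2"

  victim = random.choice(CANDIDATES)
  print("killing", victim, "(SIGKILL, no clean shutdown)")
  subprocess.run(["docker", "kill", victim], check=True)

  start = time.monotonic()
  while True:
      try:
          conn = psycopg2.connect(DSN)
          conn.cursor().execute("SELECT 1")
          conn.close()
          break
      except psycopg2.OperationalError:
          time.sleep(1)
  print("service recovered after %.0fs" % (time.monotonic() - start))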
seanwilson - 3 hours ago
> This is why you should be extremely wary of anything that is
only run once in a a blue moon.I find this similar to when you
launch a project that hasn't been used in production yet. Bugs
should be expected because it hasn't been battle tested.
devit - 6 minutes ago
I think the root issue is that PostgreSQL does not offer an HA
solution that works out of the box with minimal configuration,
resulting in people using broken third-party ones and/or
configuring them incorrectly.They should either provide one or
"bless" an external solution as the official one (after making sure
it works correctly).The other problem is that GoCardless setup an
asynchronous and a synchronous replica instead of 2 synchronous
replicas (or preferably 4+), resulting in only two points of
failure, which is not enough.
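For reference, whether a standby is attached synchronously is
visible from the primary in pg_stat_replication; a minimal check
might look like this (hostname and the expected count of two sync
standbys are assumptions):

  import psycopg2

  conn = psycopg2.connect("host=pg-primary dbname=postgres user=monitor")
  cur = conn.cursor()
  cur.execute("SELECT application_name, state, sync_state"
              " FROM pg_stat_replication")
  standbys = cur.fetchall()
  # sync_state is 'sync' for synchronous standbys, 'async' otherwise
  sync = [s for s in standbys if s[2] == "sync"]
  print(standbys)
  assert len(sync) >= 2, "fewer than two synchronous standbys attached"
  conn.close()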
testplzignore - 5 hours ago
Good write-up. I'm curious about two more things:

1. What caused the crash on the synchronous replica? Was it just a
coincidence and completely unrelated to the primary failure?

2. Given the three conditions necessary for the cluster to break,
was the behavior of the Pacemaker software expected? I.e., was this
a gotcha that should be in the Pacemaker documentation, or a bug?
Sinjo - 4 hours ago
1. Unfortunately the logs don't give any detail there. Most likely
something arrived down the replication connection that the process
couldn't handle, and it crashed.

2. Our understanding now is that INF is the strongest preference,
whereas -INF is a veto. It would be very cool to have this
confirmed 100% by someone who works on Pacemaker!
notyourday - 5 hours ago
Stop pretending that there's a magic bullet called "multi-master"
and "transparent promotion". Your apps are super simple. Their DB
interactions are super simple. Learn how to do federations and all
these problems will go away.
rosser - 4 hours ago
It's curious that you decry some kinds of "magic bullet" in favor
of another. How about "there are no magic bullets, full stop"?
vemv - 5 hours ago
Interesting. Recommended reads?
Erwin - 3 hours ago
If you are running HA in AWS RDS, how would you compare your
experience with the above? What are the types of RDS failure modes
that you have experienced?

So far I've discovered that TCP keepalives are quite important,
otherwise your queries may hang forever after failover (or at least
for the default timeout, which is like 30 minutes). The connection
does not get broken otherwise by the failover.
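For anyone hitting the hanging-query problem: libpq exposes TCP
keepalive settings as ordinary connection parameters, and psycopg2
passes them straight through, so something along these lines (the
endpoint and timings are illustrative) makes a client notice a dead
peer in about a minute instead of the OS default:

  import psycopg2

  conn = psycopg2.connect(
      host="mydb.xxxxxx.us-east-1.rds.amazonaws.com",  # hypothetical endpoint
      dbname="app", user="app",
      connect_timeout=5,
      keepalives=1,            # enable TCP keepalives
      keepalives_idle=30,      # seconds of idle before the first probe
      keepalives_interval=10,  # seconds between probes
      keepalives_count=3,      # failed probes before the socket is dead
  )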
pjungwir - 1 hours ago
Here are a couple of gotchas I've seen on RDS:

- If you are running a MultiAZ instance, it is supposed to fail
  over automatically, but if the problem is in the networking, then
  you can still effectively lose service. One way around that is to
  run a read replica in another AZ, and use a Route53 entry with a
  health check to send traffic to the read replica if the primary
  isn't reachable (roughly the shape sketched below). You'll still
  need to promote the read replica to a master though.

- If you restore from a snapshot, the new EBS volume only pulls
  blocks of data from S3 as they are requested. So these reads are
  a lot slower than normal. If you have a large database you could
  have degraded performance for days. Here is some more info about
  this: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-resto...
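The Route53 part of that setup is roughly the following shape (a
boto3 sketch; zone ID, endpoints and health-check ID are
placeholders, not a vetted configuration):

  import boto3

  r53 = boto3.client("route53")
  changes = []
  for role, target, health_check in [
      ("PRIMARY", "mydb.xxxxxx.rds.amazonaws.com", "11111111-example"),
      ("SECONDARY", "mydb-replica.xxxxxx.rds.amazonaws.com", None),
  ]:
      rrset = {
          "Name": "db.example.internal.",   # hypothetical internal name
          "Type": "CNAME",
          "TTL": 30,
          "SetIdentifier": role.lower(),
          "Failover": role,                 # PRIMARY / SECONDARY routing
          "ResourceRecords": [{"Value": target}],
      }
      if health_check:
          rrset["HealthCheckId"] = health_check  # only the primary is checked
      changes.append({"Action": "UPSERT", "ResourceRecordSet": rrset})

  r53.change_resource_record_sets(
      HostedZoneId="Z123EXAMPLE",
      ChangeBatch={"Changes": changes},
  )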
emXdem - 1 hours ago
Stolon with etcd or consul is a far superior solution for HA
postgres.
tomc1985 - 4 hours ago
Makes me sad that running your own instances is now an "elephant in
the room." No pride in old-school do-it-yourself nerditry these
days :/
derefr - 4 hours ago
It's still "respectable" to run your own n=1 Postgres instance,
maybe with WAL-E backup. It's sensible, as well, to create your own
read-replicas to scale OLAP; and even to do your own shared-nothing
sharding across regions. These are all "set and forget" enough that
they can be the responsibility of your regular devops.

But, when you get to the point where you need multi-master
replication, you're making a mistake if you aren't dedicating an
ops person (i.e. a DBA) to managing your growing cluster. If you
can't afford that ops person, much better to just pay a DBaaS
provider to handle the ops for you, than to get hosed when your
cluster falls apart and your week gets shot putting it back
together.
Sinjo - 3 hours ago
Right! A thing that scares me is anyone saying they're running
their own HA cluster (not single instance) for cost reasons. Infra
people are not cheaper than the hosted solutions (Amazon RDS,
Google Cloud SQL, Heroku Postgres).
foobarbazetc - 3 hours ago
That's a blanket statement that has very little basis in reality.
Hosted Postgres is never going to give you the performance you need
for low latency deployments.
bmm6o - 3 hours ago
Then you are running it yourself for latency reasons, not
(just) the cost reasons in the GP's scenario.
Sinjo - 3 hours ago
Oh for sure! My claim is that you need to hire some expensive
people if you want that performance, not that there aren't reasons
to run your own database instances!
qaq - 2 hours ago
Cause AWS infra is managed by magic monkeys.
rosser - 4 hours ago
Getting HA right is hard. DIY-ing it incurs risk, possibly
deliberately, out of Not-Invented-Here-ism.

Source: PostgreSQL DBA for over a decade; have built multiple HA
environments; have seen many ways of "doing it wrong", and how
those can end up biting their creators.
nh2 - 16 minutes ago
On the other hand: with hosted Postgres, when a failure does
happen, isn't it much harder to get at the log files? They seem
extremely useful to diagnose the problem and make sure it doesn't
happen again, as the article shows. What's your experience here,
can you get at logs easily with hosted Postgres offerings?

And it seems the only way to get reliable Postgres HA for everyone,
and to weed out the bugs, is if more people run Postgres HA
themselves. For example, I find Stolon and Patroni great, but I
would be more relaxed about them if they had 100x more users.
rosser - 3 minutes ago
We aren't using hosted postgres. We're provisioning EC2 instances
on which we run self-managed postgres. Failover is scripted, and
manually invoked as needed.

None of us trust any of these automated failover solutions enough
to use them. We want human judgement in that loop, even if it means
being woken at 3AM to push the button. It's that hard to get right,
and the costs of getting it wrong far outweigh the additional
latency of paging a person and waiting for them to respond.
illumin8 - 2 hours ago
I love nerding out over this personally, but if you're a startup,
given the plethora of well managed offerings, you're frankly
foolish to invest resources in this. Even if you eventually reach
the point where it makes financial sense to hire a full time ops
person or DBA, the opportunity cost of having a smart engineer (and
it does take a smart engineer to manage a multi-master database)
work on infrastructure instead of your actual product is just
stupid.

How many startups have failed because they spent too much money
building "cool, nerdy infrastructure" instead of just building a
product?
sandGorgon - 6 hours ago
> Fortunately, as part of some unrelated work we'd done recently,
we had a version of the cluster that we could run inside Docker
containers. We used it to help us build a script that mimicked the
failures we saw in production. Being able to rapidly turn clusters
up and down let us iterate on that script quickly, until we found a
combination of events that broke the cluster in just the right way.

This is the coolest part of this story. Any chance these scripts
are open source?
Sinjo - 6 hours ago
We plan to open source it as soon as we can. Tiny bit more work
to do, then review from a couple of people in the team, then we
make it public. :)
sandGorgon - 6 hours ago
Thanks for this !
foobarbazetc - 3 hours ago
I've been involved in various Postgres roles (developer on it,
consultant for it for BigCo, using it at my startup now) for around
18 years. I've never in that time seen a pacemaker/corosync/etc/etc
configuration go well. Ever. I have seen corrupted DBs, failovers
for no reason, etc. The worst things always happen when the
failover doesn't go according to plan and someone accidentally
nukes the DB at 2am.

The lesson I've taken from this is that it's better to have 15-20
minutes of downtime in the unlikely event a primary goes down, and
run a manual failover/takeover script, than it is to rely on
automation. pgbouncer makes this easy enough.

That said, there was a lot of bad luck involved in this incident.
merb - 2 hours ago
why not just use https://github.com/zalando/patroni
cromantin - 2 hours ago
Patroni is great. Running dockerised Postgres with a Consul backend
for years without a hitch, HAProxy as the load balancer. What's
that? A replica needs a reboot? Just reboot it. The primary? Just
fail over to a replica and reboot. Unplanned reboots recover in
under 10 seconds, during which only the primary is unavailable but
the replicas still are.
takeda - 2 hours ago
My understanding is that the problem is not really with
pacemaker/corosync. Those tools are just as consistent as
ZK/etcd/Consul. There is also STONITH to make sure the node that
goes down can't cause damage once it is back.

The problem is not these tools, but implementing what the right
thing to do during an outage is, or even properly detecting one
(what happened with GitHub). Your solution might work 99 cases out
of 100, but that remaining 1 case might cause data loss. When a
human is required to do the switch, he/she can typically
investigate what happened and make the right decision.

It's theoretically possible to have a foolproof solution that
always works right, but that's extremely hard to implement, because
you need to know in advance what kinds of issues you will have, and
if you miss something, that's one case where your tool might make a
wrong decision.
merb - 1 hours ago
Well, corosync/pacemaker is definitely not the same as
zk/etcd/consul. STONITH is mostly a bad idea. Two-node clusters are
actually always a bad idea. Using a VIP is a bad idea, too. This is
what I learned at small scale, and at big scale it's even worse.

The problem in this case was that they didn't understand
corosync/pacemaker correctly. The syntax is awkward and it's hard
to configure. With consul + patroni they would have a far better
architecture that could be far better understood. They would not
need a VIP (it would work over DNS). They used archive_command to
get a WAL file from the primary on a sync replica. This should
NEVER be done if archive_command does not return a sane status code
(which in fact it probably did not). They did not read
https://www.postgresql.org/docs/10/static/continuous-archivi... at
all. Last but not least, you should never use restore_command on a
sync node when it doesn't need to (always check whether the master
is alive/healthy before doing it; maybe even check how far behind
you are).

Patroni would've worked in their case. Patroni would've made it
easy to restart the failed primary. Patroni would be in control of
the postgresql process, which is far better than using
pacemaker/corosync (especially combined with a watchdog/softdog).
What would've also helped would have been two sync nodes, failing
over to either of them (this will be harder, since sync nodes need
to be detached if unhealthy). And best of all, with etcd/consul/zk
you can run the etcd/consul/zk cluster on three different nodes
from your 3 database servers (this helps a lot).
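As a rough illustration of that last point about restore_command: a
wrapper script can refuse to pull WAL from the archive while the
primary is still reachable. This is a sketch of the idea, not a
vetted recovery script; paths, DSN and the archive layout are made
up.

  #!/usr/bin/env python3
  # Intended to be wired up as:
  #   restore_command = '/usr/local/bin/guarded_restore.py %f %p'
  import shutil
  import sys

  import psycopg2

  ARCHIVE_DIR = "/var/lib/postgresql/wal_archive"   # hypothetical archive
  PRIMARY_DSN = "host=pg-primary dbname=postgres user=replicator connect_timeout=2"

  wal_file, dest_path = sys.argv[1], sys.argv[2]    # %f and %p from Postgres

  try:
      psycopg2.connect(PRIMARY_DSN).close()
      # Primary is reachable: let streaming replication catch up instead.
      # A non-zero exit tells Postgres the file isn't available here.
      sys.exit(1)
  except psycopg2.OperationalError:
      pass  # primary unreachable, fall back to the archive

  shutil.copy("%s/%s" % (ARCHIVE_DIR, wal_file), dest_path)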
Sinjo - 53 minutes ago
It's a little lost in another comment thread
(https://news.ycombinator.com/item?id=15862584), but I'm
definitely excited about solutions like Patroni and Stolon
that have come along more recently.
merb - 42 minutes ago
Well, you should definitely look into them. In the past we used
corosync/pacemaker a lot (even for different things than just
database HA) but trust me... it was never a sane system. If it
ain't broke, it worked; if something broke, it was horrible to
actually get back to any sane state at all.

We migrated to Patroni (yeah, Stolon is cool as well, but since
it's a little bit bigger than we need, we used Patroni). The
hardest part of Patroni is actually creating a script which creates
service files for consul (consul is a little bit weird when it
comes to services) or somehow changes dns/haproxy or whatever to
point to the new master (this is not a problem on Stolon).

But since then we have tried all sorts of failures and never had a
problem. We pulled plugs (hard drive, network, power cord) and
nothing bad happened no matter what we did. The watchdog worked
better than expected in some cases where we tried to fire bad stuff
at Patroni/overload it. And since it's in Python, the memory/cpu
usage characteristics are well understood (the code is also easy to
reason about, at least better than corosync/pacemaker).
etcd/zk/consul is battle tested and did work even though we have
way more network partitions than your typical network (this was bad
for galera.. :(:( ). We never autostart a failed node after a
restart/clean start; we always look into the node and manually
start Patroni. And we also use the role_change/etc hooks to
create/delete service files in consul and to ping us if anything on
the cluster happens.
Sinjo - 39 minutes ago
Thanks for the extra info, and the insight into how
you're using Patroni. Always helpful to hear about
someone using it for real, especially someone who's come
from Pacemaker. :)
morrbo - 3 hours ago
How does pgbouncer make this process easy? Just because there are
fewer connections to go to the final DB? I've also got a random
question you might be able to answer (having a hard time
googling)... when using pgsql in streaming replication mode, are
created functions replicated/updated to everything else as well?
(Just learning about postgres and saw an opportunity to ask someone
in the know.) Cheers!
jeltz - 3 hours ago
One reason is that some applications do not properly support
reconnecting to the database if the connection is lost. With
pgbouncer that is not an issue. Another is to avoid having
floating IPs or updating the DNS.
pjungwir - 2 hours ago
> are created functions replicated/updated to everything else as
well?

Yes. Note if those functions have an implementation compiled from
C, you do need to install the .so on the standbys though.
morrbo - 24 minutes ago
Thanks!
snuxoll - 3 hours ago
PgBouncer makes it easy to do a controlled failover without having
to deal with floating IPs on your postgresql cluster.
fake-name - 3 hours ago
I'd assume it's because you can just reconfigure pgbouncer to
redirect everything to the secondary, rather than having to update
all the applications using pgbouncer. It centralizes the
configuration of which database is active.
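A minimal sketch of that kind of controlled switch through
pgbouncer's admin console (hosts, the pool name 'app' and the
promotion step are illustrative assumptions):

  import psycopg2

  NEW_PRIMARY = "10.0.1.12"  # address of the replica being promoted

  # 1. Point the 'app' entry in pgbouncer.ini's [databases] section
  #    at NEW_PRIMARY (config templating/management left out here).

  # 2. Drive the admin console (the special 'pgbouncer' database).
  admin = psycopg2.connect(
      "host=127.0.0.1 port=6432 dbname=pgbouncer user=pgbouncer")
  admin.autocommit = True    # admin commands can't run in a transaction
  cur = admin.cursor()
  cur.execute("PAUSE app")   # let in-flight queries finish, queue new ones
  # ... promote the replica here (pg_ctl promote, repmgr, your script) ...
  cur.execute("RELOAD")      # re-read pgbouncer.ini with the new host
  cur.execute("RESUME app")  # release queued clients onto the new primary
  admin.close()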
liveoneggs - 2 hours ago
your findings are supported by Baron Schwartz as well
(https://www.xaprb.com/blog/2012/09/17/is-automated-failover-...
and others)
linker3000 - 3 hours ago
Since I am investigating HA with PostgreSQL right now and have
bitter experience of Pacemaker 'HA' instances that have been
anything but, I am looking at Amazon Aurora and Microsoft's (in
preview) Azure Database for PostgreSQL offerings. I would really
appreciate any insight from others who are already using them (we
intend to do some PoC work shortly).

Our dev team also came up with some pertinent questions, which we
have put to both companies, but if anyone else can comment from
experience that would be fantastic:

* Is the product a fork of PostgreSQL or a wrapper around the
  current version?

* Will the DB engine keep in lock-step with new PostgreSQL releases
  or might they diverge?

* If the DB engine keeps in lock-step, what's the period between a
  new version of PostgreSQL being released and its incorporation in
  the live product?

* When new versions of Amazon Aurora/Azure Database for PostgreSQL
  are released, will our live instance get automatically updated or
  will we be able to choose a version?
kodablah - 6 hours ago
I too am coming up on a need for no-downtime HA failover for
Postgres. I too am not allowed to use a hosted PaaS-ish solution
like RDS. I was considering Citus's multi master impl (I don't need
to spread the load, just need HA). I had not considered Pacemaker.
Has GoCardless investigated this option and have any insight to
give? HA has traditionally been a real pain point for traditional
RDBMS's in my experience.
Sinjo - 6 hours ago
To be honest we've not looked into Citus in any depth. My early
impression of it (can't speak for the rest of the team) was that it
was mostly aimed at sharding analytics workloads, but parts of the
docs (e.g.
https://docs.citusdata.com/en/v7.1/admin_guide/cluster_manag...)
make it sound like it handles OLTP workloads too. Maybe I've been
ignoring it for bad reasons!

EDIT: Managing Postgres clusters is something that a lot of people
are working on. Thought I'd mention two projects that have me
excited right now:

- Patroni: https://github.com/zalando/patroni
- Stolon: https://github.com/sorintlab/stolon

Stolon's client proxy approach in particular looks interesting, and
reminds me of how people are using Envoy
(https://github.com/envoyproxy/envoy), albeit as a TCP proxy rather
than one that understands and can do fun stuff with the database's
protocol. I wonder if we'll start to see more Envoy filters for
different databases!
craigkerstiens - 5 hours ago
Craig from Citus here. Since we grew transactional support a couple
of years ago, along with a number of features we've added since
then, much of our traction has come from those outgrowing
single-node Postgres and needing more performance. So in short,
we're very much focused on handling and supporting OLTP workloads.

We do also support some analytics workloads, less so data
warehousing, when there is a need for end-user-facing analytics
where higher concurrency and real-time responsiveness is key.
craigkerstiens - 5 hours ago
Craig from Citus here. Unfortunately, at this time Citus isn't
really focused on solving the HA problem for single-node Postgres.
Rather, we're focused on when you need to scale for performance
reasons. Our multi-master setup is targeting use cases that need
higher throughput: 500,000+ single-row inserts per second, or say
higher than 5 million writes per second when using ingestion with
Postgres \copy.
mdekkers - 5 hours ago
Look at HAProxy, Patroni and Zookeeper
clarkdave - 4 hours ago
We've been using Patroni in production and it has been great. We
use it with consul & pgbouncer and it can fail over in under a
minute with a small number of dropped requests (mostly bound by how
many clients your pgbouncer can hold at once while the new master
gets going). Controlled failover for upgrades or maintenance can be
as quick as 10 seconds.
mbrynard - 2 hours ago
When deciding to go with Patroni, did you have a look at CrunchyDB?
We're deciding between the two, and Kubernetes support and
documentation on CrunchyDB seem to be more comprehensive.
ahoka - 2 hours ago
"The RAID controller logged the simultaneous loss of 3 disks from
the array. All subsequent read and write operations against it
failed."People seem to forget that adding a RAID controller creates
a single point of failure instead of removing one. :-)
luke0016 - 1 hours ago
> People seem to forget that adding a RAID controller creates a
single point of failure instead of removing one. :-)

At worst it does both. In most cases it really does just remove a
single point of failure (a disk). Other non-RAID configurations
likely use a shared controller too. Moving that single point of
failure to a different controller doesn't make it any worse.
jskrablin - 4 hours ago
Pacemaker is known to wreak havoc if it gets angry. The usual path
to quick recovery when the cluster goes crazy like this is to make
really sure which replica is the most up to date, shut down
Pacemaker completely, assign the VIP manually to a healthy replica
and promote it manually. Then, once you're up and back in business,
figure out how to rebuild the cluster.
hinkley - 2 hours ago
Unfortunately appropriate project name? (A malfunctioning pacemaker
can kill you.)
barkingcat - 3 hours ago
If this is indeed true, doesn't this negate the purpose of
pacemaker to begin with? It's like anti-software. When you run with
it in your environment, to recover from a failure (which seems to
me what HA software should be about) you have to turn it off first
or else it will destroy your recovery attempts.

It's like a perverse version of chaos-monkey, except you want it to
destroy you when you are most vulnerable.
jskrablin - 2 hours ago
It's great when it works as expected. When it doesn't... then the
fun begins. I've found it quite fragile: sensitive to component
versions, sensitive to configuration, etc. Most of the time I've
seen Pacemaker go crazy, Pg itself was happy to cooperate once
Pacemaker was out of the way. The unknown/weird Pacemaker failure
modes were a real (and scary) problem.

I guess the lesson here is not to rely entirely on some HA black
magic and always have procedures in place for the 'HA black magic
failed us' moments. And a team trained to deal with situations like
this. It's only software, so it will break sooner or later.
echelon - 5 hours ago
I'm told by my company's data team that MySQL replication blows
Postgres out of the water, but they could just be biased since that
is their area of expertise. I work on server code and don't really
have much familiarity with the operations of running replica
chains.

Postgres seems like a better choice for personal projects since it
has a lot of nifty features. I'm also wary of Oracle, but that's my
own attitude talking. For a startup eventually wanting to scale,
would the better choice be to use MySQL out of the gate? Am I being
misled about Postgres clusters and availability?

Serious (naive) question; not wanting to start a flame war.
YorickPeterse - 5 hours ago
PostgreSQL's replication itself is perfectly fine, albeit a bit
bare-bones (e.g. it's just replication plus the ability to
trigger a failover). For something like cluster management and
automated failovers you'll need e.g. https://repmgr.org/.
dijit - 5 hours ago
Having run MySQL in prod for a decade and PostgreSQL in prod for
half a decade, I can say without doubt that your data team is
telling fibs.

First, consider that there are multiple replication possibilities
for both technologies. However, I'm going to assume the defaults,
because that's pretty much what everyone uses unless there's an
actual case for using something else; it's the exception.

By default MySQL uses statement-based replication (in a weird
binary format with log positions and stuff) and postgresql does
logical replication (as in, you transmit the binary differences of
what you'll be doing to the replica's database files directly, and
the replica just follows along). Both of these approaches have
trade-offs depending on what you want.

Statement-based replication is great if you want to have
_different_ datasets on each side: you can transform the data or
remove huge chunks of it on a slave and use it for a dedicated
purpose. However, that applies the other way too: you can never
really be 100% sure that your replica looks exactly like your
master.

This bit me a few times with MySQL when I assumed that because the
replica was 'up to date' with the master and it was set to read
only, the data had integrity. It absolutely did not.
antoncohen - 4 hours ago
I don't think the claim of MySQL replication being better is
related to statement vs. row vs. binary diff. I think it is about
the tooling and community knowledge about replication, and about
running MySQL in large-scale production environments in general.

MySQL is run more often at extremely large scale (Facebook,
YouTube, Twitter, Dropbox, etc.) than Postgres. That results in
very battle tested and/or featureful tooling like orchestrator
(https://github.com/github/orchestrator), MHA
(https://github.com/yoshinorim/mha4mysql-manager), ProxySQL
(http://www.proxysql.com/), and gh-ost
(https://github.com/github/gh-ost), along with knowledge and best
practices shared by those organizations.
[deleted]
chousuke - 5 hours ago
Did you mean "physical replication"? Logical replication
corresponds to MySQL's model, I think, whereas WAL streaming is
just copying over the changed bytes as they get written to the
WAL on the master database.I like Postgres's physical
replication for its straightforwardness. It's pretty easy to
tell if your replica is up to date unless something really
weird is going on. (undetected data corruption?).That said,
PostgreSQL doesn't really make replication appear easy, so I
can understand people thinking that even a basic master-slave
setup is difficult (In my experience its behaviour is much
easier to understand than with MySQL). However, MySQL is ahead
in multi-master user friendliness, and setting up eg. a simple
galera cluster is pretty easy.Whether an "easy" multi-master
galera set up is actually production-quality is another matter
entirely, but it is not difficult to get up and running.
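For what it's worth, checking that a physical replica is keeping up
is a couple of queries. The function names below are the PostgreSQL
10 spellings (9.x uses pg_current_xlog_location and friends), and
the hosts are placeholders:

  import psycopg2

  primary = psycopg2.connect("host=pg-primary dbname=postgres user=monitor")
  replica = psycopg2.connect("host=pg-replica dbname=postgres user=monitor")
  pcur, rcur = primary.cursor(), replica.cursor()

  pcur.execute("SELECT pg_current_wal_lsn()")        # where the primary is
  primary_lsn = pcur.fetchone()[0]
  rcur.execute("SELECT pg_last_wal_replay_lsn()")    # what the replica replayed
  replayed_lsn = rcur.fetchone()[0]

  pcur.execute("SELECT pg_wal_lsn_diff(%s::pg_lsn, %s::pg_lsn)",
               (primary_lsn, replayed_lsn))
  print("replica is", pcur.fetchone()[0], "bytes behind")
  primary.close()
  replica.close()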
merb - 2 hours ago
> Did you mean "physical replication"? Logical replication
corresponds to MySQL's modelnope postgresql supports logical
replication since 10.
https://www.postgresql.org/docs/10/static/logical-
replicatio...> Whether an "easy" multi-master galera set up
is actually production-quality is another matter entirely,
but it is not difficult to get up and running.if you have
regular network partitons (actually having network paritions
is always the case, especially inside clouds) than a galera
cluster can actually broke in several cases that are even
worse than any failure even the most broken replication on
postgresql/mysql non galera can do.
deberon - 4 hours ago
Having also supported hundreds of production MySQL databases, I can
say statement-based replication is absolutely inferior. But it
should also be noted that row-based replication (similar to
streaming replication in that the actual data changes are synced)
has been supported in MySQL since v5.1.5 (current is v5.7), and
row-based replication is the default since v5.7.7:
https://dev.mysql.com/doc/refman/5.7/en/replication-formats....
dijit - 3 hours ago
That's something I didn't know actually, so +1 to you. I've been
migrating away from MySQL for a number of years now.
kakwa_ - 3 hours ago
I found configuring an HA setup easier to do with MySQL. The
ability to configure a master <-> master setup simply is really
helpful. Of course, having writes on both sides is generally a bad
idea, but it's quite simple to have a master - hot standby setup
and not go through the slave promotion step.

An HA setup can be a master <-> master setup with a VIP using VRRP
and a check script (keepalived). Of course, you have to remain
cautious about network partitions. Another thing that might be
interesting in some use cases is the ability to skip some
replication errors. This is especially interesting in cases where
consistency is not critical.

I actually did a setup like that with a replication ring (4 full
masters), and an additional daemon re-configuring the ring
dynamically when a node was down. It also monitored the transaction
logs, trimming them if they were about to fill up the disk and
setting the GTIDs to the new values. I added some skip-error rules
to not block replication. However, it was for a very simple DB (a
session DB containing just one table, but an SQL db was required).
Basically I switched from a CA DB to an AP DB, and it's nice to be
able to do these kinds of things.

I know those are too-simplistic setups that don't take into account
all the failure modes. But it also makes them easier to understand
and to debug.
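The check-script half of that keepalived setup can be as small as
something like this sketch (exit 0 keeps the VIP, non-zero gives it
up; the healthcheck credentials and the "writable means healthy"
rule are assumptions):

  #!/usr/bin/env python3
  # Track script for keepalived's vrrp_script: healthy only if the
  # local MySQL node is reachable and not read-only.
  import sys

  import pymysql  # pip install pymysql

  try:
      conn = pymysql.connect(host="127.0.0.1", user="healthcheck",
                             password="secret", connect_timeout=2)
      with conn.cursor() as cur:
          cur.execute("SELECT @@global.read_only")
          read_only = cur.fetchone()[0]
      conn.close()
      sys.exit(1 if read_only else 0)
  except pymysql.MySQLError:
      sys.exit(1)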
benmmurphy - 1 hours ago
I think for a while MySQL had a much better replication story, but
that has changed with logical replication now in postgresql. What
most concerns me about MySQL is how the configuration can shoot you
in the foot. At my previous company the slave replica for a
database was writable, and I'm not even sure how this is a valid
configuration :/ and of course someone ended up running a query on
the slave and corrupting its data.
sitharus - 4 hours ago
MySQL has had replication for _longer_, but I would say the MySQL
team has a history of releasing half-baked functionality with a
long list of gotchas. PostgreSQL has only had replication in the
core product for the past couple of versions; previously you had to
rely on third party tools which had some interesting behaviours, or
that were entirely commercial.

Since PostgreSQL 10, WAL and logical replication strategies can
support just about any sort of replication you desire, except
multi-master.