HN Gopher Feed (2017-10-09) - page 1 of 10 ___________________________________________________________________
Readings in Database Systems, 5th Edition (2015)
66 points by muramira
http://redbook.io
___________________________________________________________________
muramira - 2 hours ago
Yesterday, https://news.ycombinator.com/item?id=15428526 hit number
1 on HN. Having read the two books, I strongly believe that they
not only complement each other, but also must be required reading
for any data engineer.
StavrosK - 2 hours ago
Is there an epub of this?
pagnol - 2 hours ago
Yes, there is.
0xFFC - 3 hours ago
Is this a new edition? What has changed?
jonsen - 2 hours ago
I think the preface explains that.
tjr - 2 hours ago
New in 2015. See also:
https://news.ycombinator.com/item?id=10694538
alexnewman - 19 minutes ago
Everyone I have met who has worked a long time in the database
industry considers stonebreaker to be:
- Overrated
- Overly self-promoting
- Mostly not credible
That being said, I love his work and his history. The red book is
super famous. What gives?
alexnewman - 19 minutes ago
I should say I have been writing production databases for almost
10 years. The people I talked to have more than 20 years of experience.
alexnewman - 17 minutes ago
Also, sorry for the last-name pun.
zzzcpan - 1 hour ago
The Redbook is too biased: there is just too much perspective from
RDBMS people, which is either not relevant in modern distributed
environments or outright incorrect.
The Redbook-inspired list by Christopher Meiklejohn [1] is a better
alternative, as is Aphyr's course outline [2].
[1] https://github.com/cmeiklejohn/cmeiklejohn.github.io/blob/ma...
[2] https://github.com/aphyr/distsys-class
elvinyung - 1 hour ago
(Disclaimer: I work at Databricks.)
RDBMS techniques are absolutely relevant in modern distributed
environments. It has become increasingly clear that MapReduce is too
low-level a programming model for query processing, so modern
distributed dataflow systems are increasingly hybridizing with
RDBMS-like interfaces and optimizations (e.g. Spark DataFrames).
makmanalp - 53 minutes ago
For OP's benefit, here are some excerpts from the red book that
agree with that premise:
> Google MapReduce set back by a decade the conversation about
> adaptivity of data in motion, by baking blocking operators into
> the execution model as a fault-tolerance mechanism. It was nearly
> impossible to have a reasoned conversation about optimizing
> dataflow pipelines in the mid-to-late 2000's because it was
> inconsistent with the Google/Hadoop fault tolerance model. In the
> last few years the discussion about execution frameworks for big
> data has suddenly opened up wide, with a quickly-growing variety
> of dataflow and query systems being deployed that have more
> similarities than differences.
http://www.redbook.io/ch7-queryoptimization.html
Also see Stonebraker's comment at the bottom here:
http://www.redbook.io/ch5-dataflow.html
alexnewman - 17 minutes ago
This is an example of Stonebraker being crazy.
frankmcsherry - 15 minutes ago
For further reading about the high quality of database query
optimization, and how far back MR et al must have set things,
recent SIGMOD work managed to get to within 1000x of a
single-threaded implementation (and so, not quite within that
of data-parallel systems):
https://github.com/frankmcsherry/blog/blob/master/posts/2017...
I don't use databases because they are really quite bad at
computation.
In my opinion, the main recent novelty in query planning has been
the work on worst-case optimal joins, stuff like EmptyHeaded [1]
and the recent FAQ work [2].
[1]: https://arxiv.org/abs/1503.02368
[2]: https://arxiv.org/abs/1504.04044
zzzcpan - 1 hour ago
I agree, but you have to look at RDBMS from a distributed systems
perspective.
jchanimal - 1 hour ago
The new consensus algorithm families as exemplified by Google's
Spanner and FaunaDB, my employer, very much make the relational
model relevant to distributed systems. The important achievement
is support for global ACID transactions with performance
acceptable for interactive applications. A comparison of the
algorithms can be found here:
https://fauna.com/blog/distributed-consistency-at-scale-span...
fizixer - 2 hours ago
Does it have something along the lines of building your own RDBMS
from scratch? (If not, any recommendations?)
Edit: a Google search has potentially promising results:
https://www.google.com/search?q=build+your+own+rdbms
hackermailman - 2 hours ago
This course does, and it has lectures on YouTube:
http://15721.courses.cs.cmu.edu/spring2017/
makmanalp - 2 hours ago
This book is more about the different types of tradeoffs you can make
in your system design. I'd recommend looking at graduate database
courses instead, e.g.:
- http://db.cs.cmu.edu/courses/
- http://daslab.seas.harvard.edu/classes/cs165/
- http://daslab.seas.harvard.edu/classes/cs265/
fizwhiz - 1 hour ago
I really wish this class (or the Harvard class referenced
below) were offered as a MOOC w/ some certification. I rarely
find classes around OS/databases offered as MOOCs, which is a
pity because those are the things I'd love to spend time on.
makmanalp - 57 minutes ago
Part of the problem is that doing a decent MOOC takes a TON
of preparation and effort, much more so than a regular class.
The professor who runs the class (Stratos Idreos) has a
billion things that he's working on, so turning it into a
MOOC would probably require some outside support. That said,
releasing the videos might be a possibility; I'll ask and
see. I believe Andy Pavlo's class has videos online
already.
The other part is that in the Harvard classes
specifically, the class discussion is a huge part of the
class.
mamcx - 1 hour ago
I am also looking for info on this. I would be very happy to learn
how to build a SQLite-like DB engine. All the answers so far are
"read the SQLite code". As if everyone knows low-level C or DB design!
jasonwatkinspdx - 2 hours ago
I have a copy of https://www.amazon.com/gp/product/0130402648 and
while I don't think it'll win any "best textbook ever" awards it
presents all the basic concepts in a straightforward if somewhat
simple way.
makmanalp - 2 hours ago
While this is a seminal work, I definitely wouldn't approach this
like I would approach a textbook: it's definitely not meant to be
friendly introductory material. That said, once you have a bit of
background, it's a goldmine of a survey. You'll notice that each
section is a short, few-page introduction, but the bulk of the
material is in the papers themselves, which can be significantly
tougher to read. Though it's great that the summaries are friendly
and help you contextualize the papers. My tip is to read papers
starting with the introduction, and then the conclusion, and then
decide if you want to dive into the rest of the paper to track down
the evidence for specific claims.