HN Gopher Feed (2017-09-05) - page 1 of 10 ___________________________________________________________________
Solaris to Linux Migration 2017
224 points by hs86
http://www.brendangregg.com/blog/2017-09-05/solaris-to-linux-201...
___________________________________________________________________
unethical_ban - 4 hours ago
This is a great set of information comparing features and tools
between the two ecosystems. I like it, and wish more were available
for Linux -> BSD, and even lower-level command tool comparisons like
apt-get vs. yum/dnf. In fact, this works as a general-purpose intro
to several important OS concepts from an ops and kernel hacker
perspective. My only surprise is that this is written as a specific
response to Oracle Solaris' demise. From that specific perspective,
how many target viewers are there? 10? Illumos isn't losing
contributors, and there are still several active Illumos distros.
Nevertheless, interesting.
brendangregg - 4 hours ago
Yes, I hope to write one for Solaris -> BSD. This post was written
for the illumos community as well.
dijit - 4 hours ago
Please do. I often advocate for BSD when it fits the need, but I
get pushback due to its lack of popularity. If all Solaris users
migrate solely to Linux, my position becomes weaker. :( And BSD is
very good at quite a number of things.
cbhl - 38 minutes ago
> "Xen is a type 1 hypervisor that runs on bare metal, and KVM is
type 2 that runs as processes in a host OS."Is it not the other way
around, that KVM runs on bare metal (and needs processor support)
while Xen runs as processes (and needs special kernel binaries)?
detaro - 22 minutes ago
No, it's not. It's true that KVM needs processor support: it kind
of adds a special process type that the kernel runs in a
virtualized environment, through the hardware virtualization
features. The Linux kernel of the host schedules the execution of
the VMs. Xen has a small hypervisor running on bare metal. It can
run both unmodified guests using hardware support and modified
guests where hardware access is replaced with direct calls into
the hypervisor (paravirtualization). The small hypervisor
schedules the execution of the VMs. For access to devices it
cooperates with a special virtual machine (dom0), which has full
access to the hardware, runs the drivers, and multiplexes access
for the other VMs - the hypervisor is really primarily scheduling
and passing data between domains, very micro-kernel like. Dom0
needs kernel features to fulfill that role.
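(A minimal C sketch of that distinction, for the curious: KVM is
driven from an ordinary host process through ioctls on /dev/kvm -
the same interface QEMU uses - and the host kernel schedules the
resulting vCPUs like any other threads. Error handling trimmed.)

    /* kvm_check.c - talk to KVM from a normal userspace process */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>
    #include <unistd.h>

    int main(void)
    {
        int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
        if (kvm < 0) { perror("open /dev/kvm"); return 1; }

        /* The stable KVM API version has been 12 since kernel 2.6.22 */
        printf("KVM API version: %d\n",
               (int)ioctl(kvm, KVM_GET_API_VERSION, 0));

        /* A VM is just another file descriptor owned by this process;
           its vCPUs are scheduled by the host kernel. */
        int vmfd = ioctl(kvm, KVM_CREATE_VM, 0UL);
        printf("created VM fd: %d\n", vmfd);

        close(vmfd);
        close(kvm);
        return 0;
    }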
liuw - 15 minutes ago
No. See the Classification section:
https://en.m.wikipedia.org/wiki/Hypervisor
agentile - 2 hours ago
OmniOS anyone? https://omnios.omniti.com/
SrslyJosh - 1 hours ago
> If you absolutely can't stand systemd or SMF, there is BSD, which
doesn't use them. You should probably talk to someone who knows
systemd very well first, because they can explain in detail why you
should like it.
I can't imagine trying to sell anything with the phrase "why you
should like it". SMF certainly doesn't need that kind of
condescending pitch--it just fucking works and doesn't get in your
way.
Veratyr - 1 hours ago
> Linux has also been developing its own ZFS-like filesystem,
btrfs. Since it's been developed in the open (unlike early ZFS),
people tried earlier ("IS EXPERIMENTAL") versions that had serious
issues, which gave it something of a bad reputation. It's much
better nowadays, and has been integrated in the Linux kernel tree
(fs/btrfs), where it is maintained and improved along with the
kernel code. Since ZFS is an add-on developed out-of-tree, it will
always be harder to get the same level of attention.
https://btrfs.wiki.kernel.org/index.php/Status
So long as there exists code in BTRFS marked "Unstable" (RAID56), I
refuse to treat BTRFS as production ready. If it's not ready, fix
it or remove it. I consistently run into issues even when using
BTRFS in the "mostly OK" RAID1 mode.
I don't buy the implication that "it will always be harder to get
the same level of attention" will lead to BTRFS being better
maintained either. ZFS has most of the same features plus a few
extra, and unlike BTRFS, they're actually stable and don't break.
I'm no ZFS fanboy (my hopes are pinned solidly on bcachefs) but
BTRFS just doesn't seem ready for any real use from my experience
with it so far, and it confuses me. Are BTRFS proponents living in
a different reality to me where it doesn't constantly break?
EDIT: I realize on writing this that I might sound more critical of
the actual article than I really am. I think his points are mostly
fair, but I feel this particular line paints BTRFS as having a
brighter, more production-ready future than I believe is likely
given my experiences with it. BTRFS proponents also rarely point
out the issues I have with it, so I worry they're not aware of
them.
the8472 - 1 hours ago
> ZFS has most of the same features plus a few extra
The ZFS feature set is not a strict superset of what btrfs offers.
The ability to online-restripe between almost any layout
combination is quite useful for example. So is on-demand
deduplication, which is also far less resource-intensive than ZFS
dedup.
Veratyr - 52 minutes ago
> The ability to online-restripe between almost any layout
combination is quite useful for example
This is true, but since the only stable replication options on
BTRFS are RAID1 and single, this online restripe is of very limited
usefulness.
brendangregg - 22 minutes ago
We're using both btrfs and zfsonlinux right now, in production,
and fortunately we're not consistently running into issues (I'd
be hearing about it if we were!). I should note that we do have a
higher risk tolerance than many other companies, due to the way
the cloud is architected to be fault tolerant. Chaos monkey can
just kill instances anytime, and it's designed to handle that.
Anyway, getting into specific differences is something that we
should blog about at some point (the Titus team).
holydude - 4 hours ago
It is actually sad to see that you had to write this. Funny how the
most used/popular technology and mismanagement by a single company
can crush other competing tech. It is frightening how much of what
was invested in Solaris is now lost because of it.
Ologn - 4 hours ago
> Crash Dump Analysis...In an environment like ours (patched LTS
kernels running in VMs), panics are rare.
As the order of magnitude of systems administered increases, rare
changes to occasional changes to frequent. Especially when it is
not running in a VM.
Also, from time to time you just get a really bad version of a
distro kernel, or some odd piece of hardware that is ubiquitous in
your setup, and these crashes become more frequent and serious.
(Recent example of a distro kernel bug -
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1674838 . I
foolishly upgraded to Ubuntu 17.04 on its release instead of
letting it get banged around for a few weeks. For the next five
weeks it crashed my desktop about once a day, until a fix was
rolled out in Ubuntu proposed.)
Most companies I've worked at want to have some official support
channels, so usually we'd be running RHEL, and if I was seeing the
same crash more than once I'd probably send the crash to Red Hat,
and if the crash pointed to the system, then the server maker (HP,
Dell...) or hardware driver maker (QLogic, Avago/Broadcom).
Solaris crash dumps worked really well though - they worked
smoothly for years before kdump was merged into the Linux kernel.
It is one of those cases where you benefited from the hardware and
software both being made by the same company.
tytso - 3 hours ago
Crash dumps don't matter as much if your distributed architecture
has to account for hardware failures. (Or VM failures, or network
hiccups, etc.)
Kernel developers still have to use crash dumps to root-cause an
individual crash, but crash dumps are most useful for extremely
hard-to-reproduce crashes that are rare (but if you are using the
"Pet" model as opposed to the "Cattle" model, even a single failure
of a critical DB instance can't be tolerated). For crashes that are
easy to trigger, crash dumps are useful, but they are much less
critical for figuring out what's going on. If your distributed
architecture can tolerate rare crashes, then you might not even
consider it worth the support contract cost to root cause and fix
every last kernel crash.
Yes, it's ugly. But if you are administrating a very large number
of systems, this can be a very useful way of looking at the world.
rodgerd - 2 hours ago
> As the order of magnitude of systems administered increases,
rare changes to occasional changes to frequent. Especially when
it is not running in a VM.
I think the perf engineer for Netflix is quite aware of this.
jsiepkes - 2 hours ago
While I really respect Brendan's opinion (I've got most of his
books and he is one of my IT heroes) I do think he is very
Netflix-IT-scale minded. When you're Netflix you can maintain
your own kernel with ZFS, DTrace, etc. and have a good QA setup
for your own kernel / userland - basically maintain your own
distro. However, when you're in a more "enterprisy" environment
you don't have the luxury of making Ubuntu with ZoL stable
yourself. I know from first-hand experience that ZoL is
definitely not as stable as FreeBSD ZFS or Solaris ZFS.
espadrine - 4 hours ago
> It's a bit early for me to say which is better nowadays on Linux,
ZFS or btrfs, but my company is certainly learning the answer by
running the same production workload on both. I suspect we'll share
findings in a later blog post.
I am eager to read this piece! Even though I am afraid to see it
confirm that btrfs still struggles to catch up. The 2016 bcachefs
benchmarks[0] are a mixed bag.
[0]: https://evilpiepirate.org/~kent/benchmark-full-results-2016-...
jsiepkes - 4 hours ago
Is BTRFS really an option now that Red Hat has decided to pull the
plug on their BTRFS development? That basically leaves Oracle and
SUSE, I think? As far as I can tell the future of BTRFS doesn't
look good.
Facebook using it doesn't mean anything since they are probably
using it for distributed applications, meaning the entire box
(including BTRFS) can just die and the cluster won't be impacted. I
really can't imagine they are using BTRFS on every node in their
cluster.
geofft - 4 hours ago
> Facebook using it doesn't mean anything since they are probably
using it for distributed applications. Meaning the entire box
(including BTRFS) can just die and the cluster won't be impacted.
I don't think that follows, for lots of reasons:
- If enough of your boxes die that you lose quorum (whether from
filesystem instability or from unrelated causes like hardware
glitches), your cluster is impacted. So, at the least, if you
expect your boxes to die at an abnormally high rate, you have to
have an abnormally high number of them to maintain service.
- Filesystem instability is (I think) much less random than
hardware glitches. If a workload causes your filesystem to crash on
one machine, recovering and retrying it on the next machine will
probably also make it crash. So you may not even be able to save
your service by throwing more nodes at the problem. A bad
filesystem will probably actually break your service.
- Crashes cause a performance impact, because you have to replay
the request and you have fewer machines in the cluster until your
crashed node reboots. It would take an extraordinarily fast
filesystem to be a net performance win if it's even somewhat
crashy.
- Most importantly, distributed systems generally only help you if
you get clean crashes, as in power failure, network disconnects,
etc. If you have silent data corruption, or some amount of data
corruption leading up to a crash later, or a filesystem that can't
fsck properly, your average distributed system is going to deal
very poorly. See Ganesan et al., "Redundancy Does Not Imply Fault
Tolerance: Analysis of Distributed Storage Reactions to Single
Errors and Corruptions",
https://www.usenix.org/system/files/conference/fast17/fast17...
So it's very doubtful that Facebook has decided that it's okay that
btrfs is crashy because they're running it in distributed systems
only.
jsiepkes - 3 hours ago
This article
https://www.linux.com/news/learn/intro-to-linux/how-facebook...
explains somewhat what Facebook does with BTRFS.
"Mason: The easiest way to describe the infrastructure at Facebook
is that it's pretty much all Linux. The places we're targeting for
Btrfs are really management tasks around distributing the operating
system, distributing updates quickly using the snapshotting
features of Btrfs, using the checksumming features of Btrfs and so
on.
We also have a number of machines running Gluster, using both XFS
and Btrfs. The target there is primary data storage. One of the
reasons why they like Btrfs for the Gluster use case is because the
data CRCs (cyclic redundancy checks) and the metadata CRCs give us
the ability to detect problems in the hardware such as silent data
corruption in the hardware. We have actually found a few major
hardware bugs with Btrfs so it's been very beneficial to Btrfs."
The sentence "We also have a number of machines running Gluster,
using both XFS and Btrfs." seems to imply Facebook is not using it
heavily for actual data storage. What I distill from this (which is
obviously my personal interpretation) is that Facebook mostly uses
it for the OS and not for actual precious data.
geofft - 2 hours ago
I'm reading that as quite the opposite: they're saying that
Gluster, a networked file storage system, is being backed with
btrfs as the local filesystem, so all data stored in Gluster is
ultimately stored on btrfs volumes. (They're also using it for OS
snapshotting, yes, but insofar as the data stored in Gluster is
important, they're storing important data on btrfs.)
See also
https://code.facebook.com/posts/938078729581886/improving-th...
"We have been working toward deploying Btrfs slowly throughout the
fleet, and we have been using large gluster storage clusters to
help stabilize Btrfs. The gluster workloads are extremely
demanding, and this half we gained a lot more confidence running
Btrfs in production. More than 50 changes went into the
stabilization effort, and Btrfs was able to protect production data
from hardware bugs other filesystems would have missed."
CrystalGamma - 3 hours ago
Given how many times btrfs has failed to read data or to mount
(with an error), I would imagine this is why btrfs is used by
Facebook: because it isn't afraid to 'just let it crash' (cleanly),
to use Erlang rhetoric.
geofft - 2 hours ago
Yeah, it's definitely true that you want a filesystem with data and
metadata checksums if you want high reliability. (I think btrfs and
ZFS are the only Linux-or-other-UNIX filesystems with data
checksums?)
But I think the inference to make is that Facebook trusts btrfs to
increase reliability, not that Facebook trusts their distributed
systems to cover for btrfs decreasing reliability to gain
performance (or features).
vbernat - 3 hours ago
Red Hat was never a big contributor to BTRFS. This still means
fewer users, but not fewer devs.
acdha - 3 hours ago
For a filesystem I'd worry about what that implies for
testing, especially for enterprise hardware and workloads.
RHEL has significantly more users and it seems likely that
their users would have more diversity than just Oracle shops.
kuschku - 2 hours ago
Are you completely forgetting SUSE? They're a thing, too, after
all.
jsiepkes - 4 hours ago
Nice article! Though I do think the article could have more clearly
noted that Linux containers are not meant as security boundaries.
It doesn't explicitly say it, but it is a very important
distinction. Unlike with FreeBSD jails and Solaris Zones, you can't
run multiple Docker tenants safely on the same hardware. Docker is
basically the equivalent of a sign which says "don't walk on the
grass", as opposed to the actual wall which FreeBSD jails and
Solaris zones have. Now if you have a very homogeneous environment
(say you are deploying hundreds of instances of the exact same app)
then this is probably fine. Docker is primarily a deployment tool.
If you're an organization which runs all kinds of applications
(with varying levels of security quality), that's an entirely
different story.
vishvananda - 1 hours ago
> Docker is basically the equivalent of a sign which says: "don't
walk on the grass" as opposed to an actual wall which FreeBSD
jails and Solaris zones have.
I think this is dramatically overstating the risks. It is possible
to run containers securely; it is just much more difficult to
secure containers on Linux than on BSD or Solaris. It is
significantly difficult to break out of a properly configured
container (using user namespaces, seccomp, and selinux/apparmor),
and I know of no cases where it has been done successfully.
I still separate tenants onto VMs because I don't want to be the
first example of a breakout, but I don't think people who isolate
with containers are crazy, just a little less risk-averse.
ryanlol - 2 hours ago
... Precisely none of these are technologies that you should use
to "safely" run multiple things on the same hardware.
zlynx - 3 hours ago
There are dangerous things that you can allow Docker to do. But if
you don't do those things, it is pretty difficult to break out of a
container.
Red Hat has been especially good here, with not allowing anyone but
host-root to connect to Docker and using SELinux and seccomp
filtering. With those working, it doesn't matter if your container
mounts a host filesystem, since it won't have the correct SELinux
roles and types anyway. Many people claim that ruins Docker, since
now you can't use Docker from within Docker. But that's the price
you pay for security.
I believe that with the correct precautions, a Linux container is
just as safe as a jail or zone. Perhaps the problem is just how
easy it is for a sysadmin to put holes into the containers that
ruin the security.
s_kilk - 3 hours ago
> since now you can't use Docker from within Docker.
Docker-in-docker is a trashfire that barely works anyway, it's no
real loss.
bboreham - 38 minutes ago
Think you might be talking about running the Docker daemon
inside Docker, which is a different thing from just calling
Docker from a container.
Spooky23 - 1 hours ago
Linux containers are very similar to Zones. I doubt you would
choose between the two technologies based on security.
rodgerd - 2 hours ago
The fact this is the top-rated comment really does say a lot
about the technical literacy of the newsy audience.
unethical_ban - 2 hours ago
Multiple explanations, not all of which need to be the case:
* Not everyone needs or is an expert on containers, just as not
everyone is knowledgeable about the TCP stack, dynamic routing,
assembly optimization, or name your topic.
* It's a true and well-stated comment in itself and deserves to be
recognized, even if many already know it.
rjzzleep - 2 hours ago
It's just sad to watch this. The fact that the whole root zone etc.
is just built right into every part of the OS is amazing. And yet
you have a bunch of companies just stroking their egos. I looked at
the illumos-gate and SmartOS contributors as Brendan suggested and
there isn't much.
I wonder if adding a proper wifi stack and commodity hardware
support would have helped. Maybe it's just wishful thinking, but I
thought it would have been nice for cheap routers and home NAS.
The fact that there is so little documentation also probably didn't
help it.
raesene9 - 1 hours ago
I'd be very interested to hear details of exactly how you would
suggest, if I understand you correctly, that any Linux container
can be broken out of.
With user namespacing in use (available in Docker since 1.12) I'm
not currently aware of any trivial container --> host breakouts or
container --> container breakouts. There are information leaks from
/proc, but they don't generally allow for breakout, and in general
Docker's defaults aren't too bad from a security standpoint.
The only exception for the general case is, I'd say, the decision
to allow CAP_NET_RAW by default, which is a bit risky.
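(To make the "user namespacing" point concrete, a minimal C sketch
- not Docker code, just the underlying kernel facility it builds
on: an unprivileged process creates a user namespace and maps uid 0
inside it to its own unprivileged uid, so "root" in the container
carries no privilege on the host.)

    /* userns.c - root inside the namespace, unprivileged outside */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        uid_t outer = geteuid();               /* e.g. 1000 */

        if (unshare(CLONE_NEWUSER) != 0) {     /* new user namespace */
            perror("unshare");
            return 1;
        }

        /* Map uid 0 in the namespace to our unprivileged outer uid */
        char map[64];
        int len = snprintf(map, sizeof(map), "0 %u 1", (unsigned)outer);
        int fd = open("/proc/self/uid_map", O_WRONLY);
        if (fd < 0 || write(fd, map, len) != len) {
            perror("uid_map");
            return 1;
        }
        close(fd);

        /* Prints 0, but this "root" has no special standing on the host */
        printf("euid in namespace: %u\n", (unsigned)geteuid());
        return 0;
    }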
Hello71 - 1 hours ago
https://google.com/search?q=user+namespace+vulnerability
https://lwn.net/Articles/543273/
https://utcc.utoronto.ca/~cks/space/blog/linux/UserNamespace...
raesene9 - 1 hours ago
Right, so each of those is a specific vuln, like all code gets, and
not a systemic "Linux containers aren't a security boundary" as
suggested by the OP, which was the point I was asking for more
info on.
All code has bugs, some of those are security bugs. There's a big
difference between "if you haven't patched or I have a 0-day I can
compromise you" and "no matter how well patched you are, this isn't
a security boundary so it can be bypassed".
My reading of the top comment was that it was suggesting the latter
with regard to Linux containers, and I'm not sure that's true.
DCKing - 2 hours ago
I'm not sure what you mean here. Do you mean that:
1) There are known ways to perform Docker escapes on any or some
common Docker setup. You could write a Docker escape binary or
script today and it would not be a zero day. That's just the way
Docker is.
-or-
2) You simply have less faith in the ways Docker performs
isolation. One could write a Docker escape exploit and it would be
a zero day, but you expect there to be more of such zero days in
Docker than in Jails/Zones.
If 1) I'd be really interested in seeing it, and if 2) I'd like to
know more about what additional levels of isolation jails and zones
(and LXC?) provide.
TheDong - 3 hours ago
Given that's the case, I'm sure you can go capture the flag at
https://contained.af/ .. no one has yet (docker containers, heavy
seccomp filtering).
Or maybe you can break out of the Google App Engine linux container
and let me know how it looks (linux containers, quite well-worn at
this point)? Or perhaps you can check for me whether AWS Lambda
actually colocates tenants or not (unknown linux containers)? Or
you can launch a heroku dyno and break into another dyno and steal
some keys (lxc linux containers)?
In reality, many services do colocate multiple different users
together in linux containers. If you use seccomp and ensure the
user is unprivileged in the container, it's fairly safe. Heroku has
been doing it for years upon years now. The other services I named
above likely do too.
Linux containers absolutely are intended as security boundaries.
Kernel bugs which allow escaping a properly set up mount namespace,
or peeking out of a pid namespace, or going from root in a userns
to root on the host are all treated as vulnerabilities and patched.
That clearly expresses the intent.
Yes, I agree that in reality they're likely not yet as mature /
secure as jails or zones, but I think it's disingenuous to say that
Linux containers aren't meant to be security boundaries.
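(For a feel of the seccomp side mentioned above, a minimal C sketch
using libseccomp - build with "cc seccomp_min.c -lseccomp". Real
container profiles are far larger; this only shows the mechanism:
syscalls not on the allowlist are refused.)

    /* seccomp_min.c - tiny syscall allowlist via libseccomp */
    #include <errno.h>
    #include <fcntl.h>
    #include <seccomp.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Default action: refuse unlisted syscalls with EPERM */
        scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ERRNO(EPERM));
        if (!ctx) return 1;

        /* Allow just enough to print a message and exit cleanly */
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0);
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);
        if (seccomp_load(ctx) != 0) return 1;

        write(STDOUT_FILENO, "write ok\n", 9);

        /* open()/openat() is not on the allowlist, so this fails with EPERM */
        if (open("/etc/passwd", O_RDONLY) < 0)
            perror("open");

        return 0;
    }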
okket - 4 hours ago
Maybe relevant: "Setting the Record Straight: containers vs. Zones
vs. Jails vs. VMs"
https://blog.jessfraz.com/post/containers-zones-jails-vms/
Some discussion about this article here on HN:
https://news.ycombinator.com/item?id=13982620 (160 days ago, 235
comments)
jsiepkes - 4 hours ago
True, and that very long post basically says with many words:
"Yes, Linux namespaces (Docker) aren't as secure as FreeBSD jails
or Solaris Zones, but security is not the problem Docker solves.
Docker solves a deployment problem, not a security problem."
KGIII - 2 hours ago
Can't you just run docker in something like firejail?
gvb - 3 hours ago
Comparing containers (a concept) with VM/jails/zones is a
non-sequitur. To quote:
A "container" is just a term people use to describe a combination
of Linux namespaces and cgroups. Linux namespaces and cgroups ARE
first class objects. NOT containers.
[...]
VMs, Jails, and Zones are as if you bought the legos already put
together AND glued. So it's basically the Death Star and you don't
have to do any work; you get it pre-assembled out of the box. You
can't even take it apart.
Containers come with just the pieces, so while the box says to
build the Death Star, you are not tied to that. You can build two
boats connected by a flipping ocean and no one is going to stop
you.
Docker is a bunch of boxes floating on a flipping ocean[1]. They
could have made a deathstar, but they chose not to.
[1] https://www.google.com/search?q=docker+logo
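(The "pile of legos" point shows up directly in the API: each
namespace is an independent flag you can combine or leave out in a
single unshare()/clone() call. A rough C sketch - needs root, or a
prior user namespace, for the mount/IPC pieces.)

    /* pieces.c - pick only the namespaces you want */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        /* Private hostname, mount table and IPC for this process only;
           keep sharing the host's network, PID and user namespaces. */
        if (unshare(CLONE_NEWUTS | CLONE_NEWNS | CLONE_NEWIPC) != 0) {
            perror("unshare");
            return 1;
        }
        printf("private uts/mount/ipc view; everything else still shared\n");
        return 0;
    }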
cmurf - 2 hours ago
SmartOS might be easier for those familiar with Solaris who are
looking to deploy Linux containers rather than go fully Linux.
severino - 3 hours ago
Solaris support ends in November, 2034. Yeah, 17 years from now. No
need to hurry ;-)
tannhaeuser - 2 hours ago
Hm, when was the next Unix timestamp range overflow again?
severino - 2 hours ago
I think it was 2038... so Solaris won't need a patch :-D
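(For the record, a quick C sketch of where that boundary falls: a
signed 32-bit time_t runs out 2^31 - 1 seconds after the 1970
epoch, i.e. in January 2038.)

    /* y2038.c - last second representable in a signed 32-bit time_t */
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        time_t last = (time_t)INT32_MAX;   /* 2147483647 seconds */
        char buf[64];
        strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S UTC", gmtime(&last));
        /* Prints 2038-01-19 03:14:07 UTC */
        printf("32-bit time_t overflows after %s\n", buf);
        return 0;
    }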
pjmlp - 3 hours ago
Given that they fired everyone, who do you think is going to give
that support and fix bugs?
severino - 3 hours ago
My comment was a joke, obviously. However, it's not that
difficult to provide that kind of support and bugfixing. It's
not like developing something new.
mrpippy - 3 hours ago
They didn't quite fire everyone: Alan Coopersmith
(https://twitter.com/alanc/status/904366563976896512) is still
present. I'm sure lots of support and bug fixes will be neglected
and probably moved offshore, though.
Also, it certainly seems like Oracle Solaris 11.3 (released in fall
2015) will be the last publicly available version. Between-release
updates (SRUs) have always been for paying customers only, but now
it seems like there will never be another release.
yourapostasy - 2 hours ago
It will be interesting to see if the Osborne Effect [1] repeats
itself with Solaris.
[1] https://en.wikipedia.org/wiki/Osborne_effect
mrbill - 45 minutes ago
People started moving away from Solaris/SPARC in droves as soon as
Oracle acquired Sun.
An example: the "old" Sun gave me a loaded T1000 system to run
SUNHELP.ORG on. The "new" Sun wouldn't even give me Solaris patches
/ security updates (which used to be free) without a support
contract. I had to eventually move the site to being hosted on a
Debian box because I couldn't afford the hundreds of dollars they
wanted every year for patch access.
It really chapped my hide. I'd even been part of the external
OpenSolaris release team.
dwheeler - 4 hours ago
If you can't access it directly, here's a cached version:
https://web.archive.org/web/20170905181357/http://www.brenda...