HN Gopher Feed (2017-11-04) - page 1 of 10
___________________________________________________________________
Evio - Fast event-loop networking for Go
215 points by Acconut
https://github.com/tidwall/evio
___________________________________________________________________
bsaul - 10 hours ago
Not sure I understand what the use case is. As soon as you start
doing something on the event loop, you need some way to perform the
operation in another "thread" (or goroutine or whatever). And then
you start to need some kind of concurrency mechanism, and pay the
price. Stripping those mechanisms away to pretend the event handling
is faster only works if you never intend to perform any real
computation. That's never true in practice... Or am I missing
something?
dboreham - 8 hours ago
Not the OP, but typically you resort to these tactics when you
want to shave the last ms off the server's response time, and/or
get that last 1000 requests/s/core of performance. You have a
"fast path" that is simple and event-driven, and you hand off
operation processing to regular threads for the (less frequent)
more complex operations. So you're not missing anything.
indescions_2017 - 9 hours ago
I'd be interested in the layer-7 (L7) reverse proxy application. As
well as Unix domain socket message queues. There are probably many
other places in the networking pipeline where evio could provide a
boost. It's a testament to what is possible through the "syscall"
and "golang.org/x/sys" facilities. As well as your confidence in
playing with Linux internals ;)
tidwall - 8 hours ago
Thanks! The L7 proxy should be pretty sweet. :)
adrianratnapala - 9 hours ago
Worst of both worlds, only faster?
olivierva - 11 hours ago
The Go network stack already makes use of epoll and kqueue:
https://golang.org/src/runtime/netpoll_epoll.go
So I'm not quite sure why this would be faster, since almost all
I/O in Go is event-driven, including the networking stack.
cafxx - 11 hours ago
Well, I guess because the runtime has to do a bunch of work to
dispatch the events to the appropriate goroutine that is blocked
waiting for that event. Switching and synchronization between
goroutines is cheap, not free.
amelius - 11 hours ago
And this solution does somehow not switch between goroutines?
willvarfar - 11 hours ago
No, the event loop is just one big goroutine that calls all
the handlers directly.
allengeorge - 7 hours ago
Which... is the same as "not switch between goroutines", no?
willvarfar - 7 hours ago
(I think there was an edit)
willvarfar - 11 hours ago
The benchmarks at the bottom of the readme show quite an
improvement (with a single thread, it seems). I would speculate the
performance win is because there is no stack switching and fewer
channels.

I've done lots of event loops in the past (e.g. hellepoll in C++)
and think that the cost of that is on the programmer - keeping
track of things, callbacks, state machines and things, and avoiding
using the stack for state, etc., is all hard work and easy to mess
up.

I am reminded of this post I saw on HN a while ago:
https://www.mappingthejourney.com/single-post/2017/08/31/epi...
Ryan Dahl, creator of node.js, would just use Go today ;)
striking - 9 hours ago
Yeah, but that's with a maximum of one thread. The whole point of
Go is to use green threading to your advantage.
tidwall - 8 hours ago
While goroutines are cool, they're not the whole point of Go
for me.
Acconut - 8 hours ago
> I've done lots of event loops in the past (e.g. hellepoll in
C++) and think that the cost of that is on the programmer -
keeping track of things, callbacks, state machines and things,
and avoiding using the stack for state, etc., is all hard work
and easy to mess up.

I very much agree. In the past, I have had quite some fun
developing a few streaming parsers using Node.js, which also uses
an event loop. And while these parsers worked relatively well and
efficiently, debugging them was not an easy task. In addition,
understanding the code is a tough challenge, especially for
people other than the original authors.

When I started using Go more and more, I really enjoyed the
different I/O model using goroutines and blocking function calls.
It also has a few drawbacks, but the mental model is a lot easier
to reason about.
nly - 8 hours ago
> I've done lots of event loops in the past (e.g. hellepoll in
C++) and think that the cost of that is on the programmer -
keeping track of things, callbacks, state machines and things,
and avoiding using the stack for state, etc., is all hard work
and easy to mess up.

This is improving, even in C++. This is what the core loop of a
line-based echo server could look like in C++17 (and something
very similar compiles today on my machine):

    void echo_loop (tcp::socket socket) {
        io::streambuf buffer;
        std::string line;
        std::error_code ec;
        do {
            ec = co_await async_read_line (socket, buffer, line);
            if (ec) break;
            ec = co_await async_write (socket, line);
        } while (!ec);
    }
tidwall - 8 hours ago
Wow. That looks really simple.
nly - 8 hours ago
Unfortunately it's just exposition, but here[0] is a version
that works with Clang 5 + Boost. Echo-specific code starts on
line 167. Everything above will hopefully be provided by the
standard library once both the Networking TS and the Coroutines
TS merge into C++20.

One nice thing about lines 1-165, though, is that it
demonstrates how easy it is to extend the native coroutine
capabilities in C++ to support arbitrary async libraries, even
if the authors of those libraries didn't know anything about
coroutines. All this happens without breaking the ability to
call these coroutines from C. You can even use async C libraries
that only provide a void* argument to your callback.

[0] https://gist.github.com/anonymous/d9a258136431a352516122d1c9...
brian-armstrong - 3 hours ago
At this point, why not just use C++? I feel like people are trying
to stretch Go way past what it's good for. It's not going to
replace C++ where C++ is effective, and it shouldn't :)
pcwalton - 2 hours ago
Because memory safety is much easier to achieve in Go than in
C++, even "modern C++".
tidwall - 8 hours ago
This project is not intended to be a general-purpose replacement
for the standard Go net package or goroutines. It's for building
specialized services such as key-value stores, L7 proxies, static
websites, etc.

You would not want to use this framework if you need to handle
long-running requests (milliseconds or more). For example, for a
web API that needs to connect to a Mongo database, authenticate,
and respond, just use the Go net/http package instead.

There are many popular event-loop-based applications in the wild,
such as Nginx, HAProxy, Redis, and Memcached. All of these are
single-threaded, very fast, and written in C.

The reason I wrote this framework is so I can build certain
network services that perform like the C apps above, while
continuing to work in Go.
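For scale, a complete evio echo server is only a few lines. This is
a sketch in the spirit of the README example; evio's exact callback
signatures have changed between versions, so treat the details as
approximate:

    package main

    import (
        "log"

        "github.com/tidwall/evio"
    )

    func main() {
        var events evio.Events
        // Data fires on the loop goroutine whenever a connection has
        // input; returning out queues the response with no goroutine
        // hand-off at all.
        events.Data = func(c evio.Conn, in []byte) (out []byte, action evio.Action) {
            out = in // echo the input back
            return
        }
        log.Fatal(evio.Serve(events, "tcp://localhost:5000"))
    }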
Acconut - 7 hours ago
> It's for building specialized services such as key-value
stores, L7 proxies, static websites, etc.

First of all, thank you for publishing this project. It's very
interesting in my opinion, since I never thought about the
benefits of an event loop. Would you mind explaining briefly why
an event loop is a better fit for these applications? Is it due
to performance and efficiency?
jerf - 7 hours ago
I'd suggest that's not the right way to look at it. To a first
approximation, "everything" is using an event loop nowadays, in
that everything is using the same fundamental primitives to
handle and dispatch events. In particular, this includes the Go
runtime; run "strace" on a Go network program and you'll see
these same calls pop up in the trace.

What this does instead is give a Go program direct access to the
event loop. The benefit is that it bypasses all of the stuff that
Go wraps around the internal event loop call: the machinery that
offers you a thread-like interface, integrates with the channel
and concurrency primitives, maintains your position in the call
stack between events, and so on. The penalty is... the exact same
thing: you lose all the nice stuff the Go runtime offers to
implement that thread-like interface, and are back to a
lower-level interface that offers fewer services.

The performance of the Go runtime is "pretty good", especially by
scripting-language standards, but if you have sufficiently high
performance requirements, you will not want to pay the overhead.
The pathological case for all of these nice high-level
abstractions is a server that handles a ton of network traffic of
some sort and needs to do a little something to every request,
maybe just a couple dozen cycles' worth of something, at which
point paying what could be a few hundred cycles for runtime
niceties you're not using becomes a significant performance
drain. Most people are not doing things where they can service a
network request in a few dozen cycles, and the longer it takes to
service a single request, the more sense it makes to have a nice
runtime layer providing you useful services, since its overhead
drops as a percentage of the CPU time consumed by your program.
For the most part, if you are so much as hitting a database over
a network connection, even a local one, in your request, you've
already greatly exceeded the amount of time you're paying to the
runtime.

It does seem to me that a lot of people are a bit bedazzled by
the top-level stuff that various languages offer, and forget that
under the hood, everyone's using the event-based interfaces. What
differs between Node and Twisted and all of the dozens or
hundreds of other viable wrappers over these calls is the
services automatically provided, not whether or not they are
"event loops". Go is an event loop at the kernel level. Node is
an event loop at the kernel level. Erlang is an event loop at the
kernel level. They aren't all the same, but "event-based" vs.
"not event-based" is not the distinction; it's a question of what
they lay on top of the underlying event loop, not whether they
use it. Even pure OS threads are, ultimately, event loops under
the hood, just in the kernel rather than in user space.
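To see this for yourself, strace any trivial Go server (a sketch;
the exact syscall names vary by Go version and platform, e.g.
epoll_wait vs. epoll_pwait on Linux):

    package main

    import (
        "io"
        "net"
    )

    // A blocking-style echo server. Run it under, e.g.,
    //   strace -f -e trace=epoll_create1,epoll_ctl,epoll_pwait ./server
    // and the runtime's netpoller shows up making the same epoll
    // calls an explicit event loop would make.
    func main() {
        ln, err := net.Listen("tcp", ":5000")
        if err != nil {
            panic(err)
        }
        for {
            conn, err := ln.Accept()
            if err != nil {
                continue
            }
            go io.Copy(conn, conn) // each connection reads as blocking code
        }
    }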
[deleted]
pcwalton - 2 hours ago
> It does seem to me that a lot of people are a bit bedazzled
by the top-level stuff that various languages offer, and forget
that under the hood, everyone's using the event-based
interfaces.

Yup. It's all very similar under the hood.

The most important difference between I/O models is whether the
paradigm involves explicit vs. implicit management of the event
loop. Callback models like Node's, async/await-style models
like C#'s, and low-level primitives like IOCP, epoll, and
kqueue fall into the former category. Go/Erlang, plain old
threads, and even Unix processes fall into the latter category.
There are advantages and disadvantages to each model.

Within each of these broad categories, the distinctions are,
IMHO, much less interesting, and they're often made out to be
more significant than they actually are. In particular, the
distinction between runtimes like Go's and regular OS pthreads
is often made out to be more important than it really is, when
the difference ultimately boils down to the CPU privilege level
that thread management runs at.
shabbyrobe - 46 minutes ago
This is an extremely helpful explanation. Would you consider
adding a "Rationale" subheading to the readme and pasting this in
wholesale? Great project, thanks for sharing!
tidwall - 25 minutes ago
I just added it. Thanks for the suggestion.
fooyc - 11 hours ago
I love Go because I never had to write asynchronous,
callback-driven programs in this language. I hope it won't become
the norm in Go, too.
tidwall - 8 hours ago
It won't become the norm. I promise you, cross my heart and hope
to die, stick a needle in my eye.
crawshaw - 10 hours ago
One of my favorite things about Go is that it cuts through the
"threads vs. events" debate by offering thread-style programming
with event-style scaling, using what you might call green threads
(compiler-assisted cooperative multitasking that has the
programming semantics of preemptive multitasking). That is, I can
write simple blocking code, and my server still scales.

Using event-loop programming in Go would take away one of my
favorite things about the language, so I won't be using this.
However, I do appreciate the work, as it makes an excellent bug
report against the Go runtime. It gives us a standard to hold the
standard library's net package to.
pcwalton - 2 hours ago
It doesn't really "cut through" the debate any more than any
other implementation of threads does. The only difference between
Go and plain old one-thread-per-connection is that regular
threads run in the kernel, while Go threads run in userspace.
That's not a semantic difference, only an implementation detail
(a large detail, to be clear, but still an implementation
detail).

There were historical implementations of pthreads, such as NGPT,
that used precisely the same model as Go, and they were abandoned
because the advantages over 1:1 were not sufficient to justify
the complexity.
kjksf - 2 hours ago
What you call a "Go thread" has a precise name (goroutine), and
running in userspace is hardly the only difference between a
goroutine and a kernel thread.

Creating and destroying kernel threads is significantly more
expensive.

A kernel thread has a fixed stack, and if you go beyond it, you
crash. Which means that you have to create kernel threads with
worst-case-scenario stack sizes (and pray that you got it right).
A goroutine has an expandable stack and starts with a very small
one (which is partly why it's faster; setting up kernel page
mappings to create a contiguous space for a large stack is not
free).

Finally, goroutine scheduling is different from kernel thread
scheduling: a blocked goroutine consumes no CPU cycles. On a
4-core CPU there is no point in running more than 4 busy kernel
threads, but the kernel scheduler has to give each thread a
chance to run. The more threads you have, the more time the
kernel spends on the pointless work of ping-ponging between
threads. That hurts throughput, especially when we're talking
about high-load servers (serving thousands or even millions of
concurrent connections). The Go runtime only creates as many
threads as there are CPUs and avoids this waste.

That's why high-perf servers (like nginx) don't just use a kernel
thread per connection and instead go through the considerable
complexity of writing event-driven code. Go gives you the
straightforward programming model of thread-per-connection with
scalability and performance much closer to the event-driven
model.

You work on Rust and are well informed about this topic, so I'm
sure you know all of that. Which is why it amazes me the lengths
to which you go to denigrate Go in that respect and minimize what
is a great and unique programming model among mainstream
languages.
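The cheapness of goroutines is easy to demonstrate (a sketch; the
exact memory numbers depend on Go version and workload):

    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    // Park a million goroutines. Each starts with a small growable
    // stack (a few KB), so this runs in modest memory, where a
    // million kernel threads with fixed worst-case stacks would not.
    // While parked, the goroutines consume no CPU.
    func main() {
        var wg sync.WaitGroup
        release := make(chan struct{})
        for i := 0; i < 1000000; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                <-release // blocked: costs memory, but no CPU cycles
            }()
        }
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        fmt.Printf("1e6 goroutines parked, ~%d MB from the OS\n", m.Sys>>20)
        close(release)
        wg.Wait()
    }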
pcwalton - 1 hours ago
> What you call a "Go thread" has a precise name (goroutine)

I call goroutines threads because they are user-level threads.
As an analogy, NVIDIA calls local threadgroups "warps", but
that doesn't make them not local threadgroups.

> Creating and destroying kernel threads is significantly more
expensive.

Because kernel threads usually have larger stacks. But they
don't always have large stacks: that is configurable. Other
than the stack size, the primary difference is simply that
kernel threads are created in kernel space and user threads are
created in userspace.

> A kernel thread has a fixed stack, and if you go beyond it,
you crash. Which means that you have to create kernel threads
with worst-case-scenario stack sizes (and pray that you got it
right).

You can do stack switching in 1:1 too. After all, if you
couldn't, then Go couldn't do stack switching at all, since
goroutines are built on top of kernel threads. Go's small
stacks are really a property of the moving GC, not a property
of the threading model.

> On a 4-core CPU there is no point in running more than 4 busy
kernel threads, but the kernel scheduler has to give each
thread a chance to run.

> The Go runtime only creates as many threads as there are CPUs
and avoids this waste.

Not if they're blocked doing I/O! If they're not blocked doing
I/O, then Go tries to do preemption just as the kernel does. (I
say "tries to" because Go currently cannot preempt outside
function boundaries; this is a significant downside of M:N
threading compared to 1:1 kernel threading.)

> That's why high-perf servers (like nginx) don't just use a
kernel thread per connection and instead go through the
considerable complexity of writing event-driven code.

High-performance servers like nginx use an event loop because
it's the only way to get the absolute fastest performance, with
no overhead of stacks at all. The fact that the project
described in the article gets better performance than Go's
threads is proof of that. It would be possible, and
interesting, to do Go-like 1:1 threading with small stacks.

> Go gives you the straightforward programming model of
thread-per-connection with scalability and performance much
closer to the event-driven model.

Sure. But that's mostly because of the GC, not because of the
M:N threading model.

> Which is why it amazes me the lengths to which you go to
denigrate Go in that respect and minimize what is a great and
unique programming model among mainstream languages.

It's not unique. As I said, NGPT used to do M:N for pthreads.
Solaris used to do M:N for pthreads. The JVM used to do M:N.
crawshaw - 2 hours ago
The goroutine implementation scales, while other thread
implementations (by default) do not. That's a semantic
difference. A Go server can have millions of active goroutines
with moderate resource use.

You can achieve the same on Linux or Solaris using kernel
threads, but you have to work at it. With Go you don't have to
work at it, and it works on macOS and Windows and a few other
OSs too. This is all a comparison between O(1) things, but the
constant factor matters.
pcwalton - 2 hours ago
> You can achieve the same on Linux or Solaris using kernel
threads, but you have to work at it.

By setting the thread stack size to a reasonable value. That's
it. And, in fact, on 64-bit you often don't even need to do
that. The difference you're describing is a difference in
default thread stack sizes, which is hardly a paradigm shift.
We're talking about one call to pthread_attr_setstacksize().
crawshaw - 2 hours ago
It's not nearly as simple as you claim.

First: if you have an epoll loop, there is also the cost of
the thread context switch, which has definitely hurt us in RPC
systems using kernel threads. By contrast, the goroutine gets
scheduled onto the kernel thread that answered the poll,
saving the switch.

Second: as I alluded to earlier, Linux and Solaris can scale
their kernel thread implementations; not all OSs can. My
experiences with large numbers of threads on the BSDs and
Windows (in years past, admittedly) suggest other kernels
don't have thread implementations designed to scale to such
high numbers. Solving the problem in userspace means Go
programs written in this style are portable across operating
systems.

Third: you can only adjust stack sizes down if you know your
program always keeps its stacks small. If you depend on
libraries you don't own in C/C++, that's a difficult
assumption. Go grows the stacks, so if you hit some corner
case where a small number of goroutines need some significant
amount of stack, your program uses more memory, but typically
keeps working. No need for careful (manual!) stack accounting.

If all this were as easy as you say, we would still write
nearly all our C/C++ servers using threads. We don't, because
it's not.
pcwalton - 1 hours ago
> First: if you have an epoll loop, there is also the cost of
the thread context switch, which has definitely hurt us in
RPC systems using kernel threads. By contrast, the goroutine
gets scheduled onto the kernel thread that answered the poll,
saving the switch.

I'm not comparing M:N to a 1:1 system where all I/O is
proxied out to another thread sitting in an epoll loop. I'm
comparing M:N to 1:1 with blocking I/O. In this scenario, the
kernel switches directly onto the appropriate thread.

> Second: as I alluded to earlier, Linux and Solaris can
scale their kernel thread implementations; not all OSs can.

The vast majority of Go users are running Linux. And on
Windows, UMS is 1:1 and is the preferred way to do
high-performance servers; it avoids a lot of the problems
that Go has (for instance, playing nicely with third-party
code).

> Third: you can only adjust stack sizes down if you know
your program always keeps its stacks small.

You could do 1:1 with stack growth just as Go does. As I've
said before, small stacks are a property of the relocatable
GC, not a property of the thread implementation.

> If all this were as easy as you say, we would still write
nearly all our C/C++ servers using threads.

We don't write C/C++ servers using threads because (1)
stackless use of epoll is faster than both 1:1 threading and
M:N threading, as this project shows; (2) C/C++ can't do
relocatable stacks, as the language is hostile to precise
moving GC.
hannofcart - 7 hours ago
> That is, I can write simple blocking code, and my server
still scales. Using event-loop programming in Go would take
away one of my favorite things about the language, so I won't
be using this.

If Go has or can emulate 'generators' a la Python/Node.js, then
you can write synchronous-looking, blocking-style code with
event loops as well.
giovannibajo1 - 6 hours ago
That is exactly what Go does by default. Any time a blocking
operation is performed, Go either leaves the OS-level thread
blocked there and switches away, or hands the blocking
operation to an internal thread which is running epoll for the
whole process.

The end result is much easier than Python/Node.js because there
is no explicit "async/await" or deferred-style programming. You
simply write linear code, and any blocking operation (at the
syscall level) is transparently handled.
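Concretely, a per-connection handler like the sketch below reads
as linear, blocking code; under the hood the runtime parks the
goroutine on each Read and resumes it from its epoll/kqueue
netpoller:

    package main

    import (
        "bufio"
        "net"
    )

    // Linear, blocking-style code: no callbacks, no async/await.
    func handle(conn net.Conn) {
        defer conn.Close()
        r := bufio.NewReader(conn)
        for {
            // Looks blocking; actually parks this goroutine until
            // the netpoller reports the socket readable.
            line, err := r.ReadString('\n')
            if err != nil {
                return
            }
            conn.Write([]byte(line)) // echo the line back
        }
    }

    func main() {
        ln, err := net.Listen("tcp", ":5001")
        if err != nil {
            panic(err)
        }
        for {
            conn, err := ln.Accept()
            if err != nil {
                continue
            }
            go handle(conn)
        }
    }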
hannofcart - 4 hours ago
Thanks for taking the time to write this response. The more I
hear about Go's features, the more I seem to like it.
wyager - 2 hours ago
FYI, this is not unique to Go. A number of languages
implement lightweight threads in (roughly) the same way, like
Erlang or Haskell. It's all epoll or other efficient polling
primitives under the hood.
pcwalton - 2 hours ago
In other words, Go uses threads. There's nothing semantically
unique to Go about this model.
giovannibajo1 - 43 minutes ago
It depends on whether you're describing a semantic model or
you're concerned about implementation details. Semantically,
a goroutine is a thread within a shared-memory model. But
what makes Go unique (or let's say more unique) is that it
offers programmers a thread-like programming approach
(linear, blocking code) but internally turns it into an
event-driven approach (epoll/kqueue) for networking.

Moreover, the fact that goroutines are much cheaper than
OS-level threads enables a more pervasive approach to
concurrency.
cdoxsey - 2 hours ago
Go uses an M:N threading model. Goroutines are multiplexed
onto a smaller number of OS-level threads. They're sort of
like threads, but they have a simplified programming model
(there is no thread-local storage, for example).
pcwalton - 2 hours ago
Goroutines are threads. They're just not kernel-level
threads. There have been implementations of pthreads that
used an M:N model, for instance.
cdoxsey - 2 hours ago
This is single-threaded? What are you going to do with the other 31
or 63 cores?

The single-threaded nature of applications like Redis and HAProxy
is a significant impediment to their vertical scalability. CPUs
aren't getting faster, we're just going to get more cores, so
anything that assumes there's only a single core seems like a dead
end. HAProxy literally just added multithreading support in 1.8.
meritt - 2 hours ago
The CPU is rarely the bottleneck, and for both Redis and HAProxy
the vertical scalability solution has been to launch multiple
processes or forks with different core affinities. There are
downsides of course (no IPC), but I still argue that CPU is not
the bottleneck for 99% of usage scenarios.

HAProxy added threading support in 1.8, as you pointed out, and
Redis has started doing the same (for a certain subset of
processing) in 4.0 as well. They're getting there, but concurrency
is tough.

To suggest that his product is a "dead end" due to not supporting
threading seems a bit premature, as Redis and HAProxy are
extremely well regarded in their niche, they made it there without
threading, and we've been at maximal clock speed for nearly a
decade.
cdoxsey - 1 hours ago
> There are downsides of course (no IPC), but I still argue that
CPU is not the bottleneck for 99% of usage scenarios.

I suppose my experience might be unusual, but I frequently log in
to c3.8xlarge Redis machines that have a single core pegged at
100% and the rest doing nothing. Yes, multiple processes help,
but that requires updating clients and makes it harder to share
memory.

> To suggest that his product is a "dead end" due to not
supporting threading seems a bit premature, as Redis and HAProxy
are extremely well regarded in their niche, they made it there
without threading.

Well, yeah: CPUs hitting their GHz limit and the dramatic
increase in the number of cores per machine is a relatively
recent phenomenon. I just think it's weird to start a brand-new
project making those same assumptions, especially when the
underlying programming language was explicitly designed with
concurrency in mind. It'd be like building a new networking
library in Rust which ditches memory safety.
tidwall - 2 hours ago
> This is single-threaded? What are you going to do with the
other 31 or 63 cores?

Yes, the event loop is single-threaded. The other cores can be
used for other stuff, but not for the event loop. It's completely
possible with this library to process operations in a background
thread and wake up the loop when it's time to write a response,
if that's what the developer desires.

> anything that assumes there's only a single core seems like a
dead end.

If my documentation somehow implies that systems running this
library do not have multiple cores, then I'm sorry for the
confusion. This library makes no assumption about the host
server, and it does not limit the application to a single core.
It just runs the event loop in one thread.
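A sketch of that background-work-plus-wake pattern (onData, onWake,
and loopWake are hypothetical names; evio does provide a wake
mechanism, but its signature has varied between versions):

    package main

    import "sync"

    // Hypothetical sketch: hand slow work to a goroutine, then wake
    // the loop so it can write the response on its own single thread.
    type pending struct {
        mu  sync.Mutex
        out map[int][]byte // finished responses, keyed by connection id
    }

    func slowWork(in []byte) []byte { return in } // placeholder

    // onData runs on the loop; it must not block, so it dispatches.
    func onData(p *pending, id int, in []byte, loopWake func(id int)) {
        go func() {
            resp := slowWork(in) // milliseconds of work, off the loop
            p.mu.Lock()
            p.out[id] = resp
            p.mu.Unlock()
            loopWake(id) // schedule a wake event for this connection
        }()
    }

    // onWake runs back on the loop when the wake event fires.
    func onWake(p *pending, id int) []byte {
        p.mu.Lock()
        defer p.mu.Unlock()
        resp := p.out[id]
        delete(p.out, id)
        return resp
    }

    func main() {
        p := &pending{out: map[int][]byte{}}
        done := make(chan struct{})
        onData(p, 1, []byte("request"), func(id int) {
            println(string(onWake(p, id))) // pretend the loop woke up
            close(done)
        })
        <-done
    }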