HN Gopher Feed (2017-08-14) - page 1 of 10
Let's Remove the Global Interpreter Lock
471 points by MikusR
https://morepypy.blogspot.com/2017/08/lets-remove-global-interpr...
wyldfire - 8 hours ago
> We have some money left in the donation pot for STM which we are
not using; according to the rules, we could declare the STM attempt
failed and channel that money towards the present GIL removal
proposal.
I didn't donate to that pot, but that does seem like a judicious and
reasonable step to take given the assessment of STM.
ericfrederich - 3 hours ago
Perhaps a little unrelated, I used the rpyc package to get Jython
and CPython working together. In the end I was able to use Java
libraries from CPython pretty much seamlessly.
vram22 - 3 hours ago
You mean you used RPyC at both ends, on the CPython side and on
the Jython side. Cool idea. I knew about RPyC but had not thought
of using it in this way. And I can see that getting access to Java
libraries this way could be very useful.
[deleted]
sandGorgon - 7 hours ago
This is a PERFECT use case for Kickstarter. It makes me sad that
this is a blog post that made it to number 1 on HN, with a vast
readership with open purse strings... yet there is no campaign
fundraising link.
Use Kickstarter or Plasso to sell a PyPy pro license - it's so much
easier for companies to pay invoices than to donate.
If nothing else, I would pay for an official conda PyPy package
which works seamlessly with pandas and BLAS.
dmix - 7 hours ago
> yet there is not a campaign fundraising link
Did you read the article? They said in the article they aren't
asking for individual donations at the moment:
>> we would like to judge the interest of the community and the
commercial partners to make it happen (we are not looking for
individual donations at this point)
Plus I'm sure they will consider using Kickstarter when the time
comes.
dweekly - 7 hours ago
Cash is one objective way to discern interest.
brian_herman - 7 hours ago
They can raise funds on their own, for example: $67126 of $105000
(63.9%) for py3k in PyPy. And for STM in PyPy, 2nd call: $59080 of
$80000 (73.9%).
fijal - 7 hours ago
Maybe selling packaging is a good idea... that said, Kickstarter
does not work in any jurisdiction we can use.
notzorbo3 - 6 hours ago
I think they're looking more for corporate backers. They're
probably very aware that their GILless PyPy will not run many of
the programs and libraries out there that are not written to be
thread-safe. And when the GIL is in place, there's really not
much reason to write thread-safe code. At the very least you
won't notice much when you're writing unsafe code.
So I assume they're not doing a Kickstarter to prevent the
following from happening:
1. The internet at large assumes they're going to get a GILless
PyPy that can actually run their code.
2. A separate PyPy is released that doesn't run their code.
3. People are angry that they didn't get what they thought they
were going to get, like what often happens with Kickstarter-backed
projects.
4. With no corporate support and waning public interest due to the
uselessness of a GILless PyPy, the separately released project
becomes unmaintained.
issaria - 3 hours ago
It's not going to happen: you not only have to fix all the legacy
code, but also fix the developers.
macrael - 8 hours ago
Do people here use pypy in production? What are the benefits?
sanxiyn - 8 hours ago
We do. 2x speedup.
dr_zoidberg - 5 hours ago
I tried it in digital forensics. It depends on the project. You may
get up to a 5x speedup in the software that runs, after a lot (a
loooooooooot) of complaining by it. Many projects didn't manage
to run, though. In the end, there was no truly significant speedup
(the bottleneck tends to lie somewhere else) for the effort that is
required to get everything to work.
PS: I do realize "digital forensics" is probably not the kind of
"production environment" you were thinking of. Just a small
datapoint about a particular branch of software that, while getting
good speedups, may not benefit as much as the "X times faster" line
would suggest.
DonbunEf7 - 2 hours ago
Switched from CPython+Numpy to PyPy years and years ago, got a
60x speedup on a core numerical kernel and a 20x speedup on real-
world benchmarks. The codebase was a multiplayer game server.
Less memory usage overall, leading to a big improvement in the
number of players that could be connected.
You have to avoid having problematic libraries in your system, but
honestly they're all either shitty on CPython too (literally every
GUI toolkit that is not Tkinter!) or they're stuff like lxml, where
the author/maintainer just has an anti-PyPy bias that they won't
drop.
sillysaurus3 - 8 hours ago
Sure. Free 2-5x speedup. pypy + pypy's pip generally works as a
transparent drop-in replacement for python + python's pip, so it's
free speed.
It doesn't (or didn't) work when you need to rely on an extension
that uses Python's C API. I haven't followed the scene in a while,
so maybe that's changed. pypy's pip has so many libraries that I
hardly notice, so maybe they solved that.
Unfortunately python is fundamentally slower than lua or JS,
possibly due to the object model. Python traps all method calls,
and even integer addition, comparisons, and so on are treated as
metamethods. That's the case for Lua too, but e.g. it's absurdly
easy to make a Python object have a custom length, whereas Lua
didn't have a __len__ metamethod until after 5.1. I'm not sure it
even works on LuaJIT either. Probably in the newer versions.
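To make that concrete, a toy Python 3 sketch (the class name is
invented for illustration) of how even len() and + dispatch through
hooks that any class can override:

  class Box:
      # len() and + both route through methods a class may redefine,
      # so the VM can't assume much about these operations statically.
      def __init__(self, items):
          self.items = items
      def __len__(self):
          return len(self.items)
      def __add__(self, other):
          return Box(self.items + other.items)

  print(len(Box([1, 2]) + Box([3])))  # -> 3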
JulianWasTaken - 8 hours ago
I can't tell what you mean by the last paragraph there, but
oftentimes PyPy's speedups come exactly from inlining stuff
like what you refer to there -- Python's not fundamentally
slower, it's those kinds of things that you can speed up.
(And yeah, the CPython API is still a pain point if you've got a
library that uses it, although some stuff will still work using
PyPy's emulation layer. It'd be great if people stopped using
it though.)
sillysaurus3 - 8 hours ago
For example, Python makes it fairly easy to trap a call to a
missing method, both via __getattr__ and __missing__. In JS
the only way you can do that is via Proxy objects, and even
those have limits.
You can't always inline the arithmetic ops effectively. You can
recompile the method each time it's called with different types,
but that's why the warmup time is an issue. This wouldn't be a
problem if Python didn't make it so trivial to overload
arithmetic. JS doesn't.
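For reference, a toy sketch of that missing-method trap (class name
invented; __getattr__ only fires when normal lookup fails):

  class Fallback:
      def __getattr__(self, name):
          # Called only for *missing* attributes; a JIT has to allow
          # for this on every call it can't prove resolves normally.
          return lambda *args: (name, args)

  f = Fallback()
  print(f.whatever(1, 2))  # -> ('whatever', (1, 2))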
JulianWasTaken - 7 hours ago
Ah! Yes, agreed, Python does certainly make it too easy to
do things that cannot reasonably be sped up.
sillysaurus3 - 7 hours ago
Twist: Lua makes it trivial to overload arithmetic using
metatables, but LuaJIT seems to have solved that. If there is any
warmup time, it's hard to tell. Mike Pall is a JIT god, and I wish
we had more insight into everything that went into producing one of
the best JITs of all time.
I'd love a comment/post that highlights the differences between JS
and Lua as the reason why LuaJIT was able to be so effective. There
must be differences that make Lua possible to speed up so much.
There are easy ones to think of, but the details matter a lot.
EDIT: I found some discussion at
https://news.ycombinator.com/item?id=1188246 but it left me wanting
more.
Related:
https://stackoverflow.com/questions/4911762/why-is-luajit-so...
http://article.gmane.org/gmane.comp.lang.lua.general/58908
http://lua-users.org/lists/lua-l/2010-03/msg00305.html
sillysaurus3 - 4 hours ago
More: https://www.reddit.com/r/programming/comments/1r2s82/lua_fun...
myusernameisok - 8 hours ago
I tried. My company has a python API that we run on our machines;
we sell the machines to businesses and don't manage them
ourselves. We wanted to see if we could get some easy performance
increases without too much investment.
At the time (a year ago) there wasn't a way to precompile using
pypy, which meant shipping pypy along with gcc and a bunch of
development headers for JIT-ing. Additionally, one of the
extensions we used for request validation wasn't supported, so we'd
be forced to rewrite it. I also found that the warmup time was too
much for my liking; it was several times longer than CPython's and
it became a nuisance for development. I guess I could've pre-warmed
it up automatically, but at that point I had better things to worry
about and abandoned trying to switch.
I'm sure, given enough resources, it would be a lot better. But
it's not quite as simple as switching over and realizing the
performance increases without some initial investment.
JulianWasTaken - 8 hours ago
We've been running a very large production PyPy deployment across
pretty much all our Python apps for about... 4 years now. Saves
us a ton of money for essentially no real downside.
makmanalp - 7 hours ago
Just out of curiosity, would you be willing to answer a few
more questions? What has the memory tradeoff been like? What is
the workload you're using it for?
JulianWasTaken - 7 hours ago
Certainly! It's a bit hard to answer some of those questions
because it's been so long since we've run CPython, and also
because we've now got ~10 apps or so that run on PyPy.
Initially the memory tradeoff was definitely significant,
somewhere around 40% or so -- it's going to vary across
applications though certainly, and in a lot of cases I'm a
bit happy our memory usage went up because it forces us more
towards "nicer" architectures where data and logic are
cleanly separated.
Not that I mean to apologize too much for it, it's something
certainly to watch, but for us on our most widely deployed
low-latency, high-throughput app, we traded about a 40% speedup
for 40% more RAM on an app that does very little truly CPU-bound
work (it's an s2s webapp where per-request we essentially are
doing some JSON parsing, pulling some fields out, building some
data structures, maybe calling a database or two, and assembling
a response to serialize, ~500 times/sec/CPU core).
On more CPU-bound workflows, like one we have that essentially
just computes set memberships at 100% resource usage all day
long, we saw multiplicative increases, and I can't even mention
how much exactly, because the speedup was so great that we
couldn't run it in our data center because it started using up
all our bandwidth, so I only have numbers for once it was moved
into AWS and onto different machines :).
Happy to elaborate more; as you can tell, I think companies with
performance-sensitive workloads need to be looking at PyPy, so
I'm always happy to talk about our experiences.
sandGorgon - 7 hours ago
Are you using any math packages like Numpy/Pandas or OpenCV?
JulianWasTaken - 3 hours ago
Not in any production workloads.
They do work these days in PyPy though, so I'd feel comfortable
doing so if we did, although I'd probably feel just as
comfortable writing whatever numerics in pure Python too unless
it was stuff that already existed easily elsewhere.
On a personal note, I've played with OpenCV as well (and done
so with PyPy to do some real-time facial analysis on a video
stream), but yeah, also not for $PRODUCTION_WORK.
[deleted]
Pxtl - 1 hours ago
Ick, I'd forgotten how monkeypatchable the core of Python is.
If not for that, I'd focus on supporting some kind of pseudo-process
where multiple instances of the Python interpreter could be loaded
but they would only share pure-functional libs which, I assume,
could be used in a threadsafe fashion... but then you run into the
mutability of those libs. Well, the mutability of everything in
Python. Plus, what happens if those libs expose anything that you
could hold a reference to - what happens to refcounting in a
multithreaded Python?
Honestly, I feel like the world has passed Python by. At this point
the cost of its performance limitations doesn't seem to be worth
its payoff. Not that it's a bad language - I like Python. I just
don't really feel the need to use it for anything anymore.
dec0dedab0de - 8 hours ago
Could someone who really wants to get rid of the GIL explain the
appeal? As far as I understand, the only time it would be useful
is when you have an application that is:
1. Big enough to need concurrency.
2. Not big enough to require multiple boxes.
3. Running in a situation that cannot spare the resources for
multiprocessing.
4. One where you want to share memory instead of designing your
workflow to handle messages or working off a queue.
#4 does sound appealing, but is it really worth the effort?
sametmax - 5 hours ago
It's just low-hanging performance fruit from the dev point of
view. It's nice and useful, just nowhere near as needed as most
people asking for it pretend it is.
[deleted]
omarforgotpwd - 5 hours ago
Motivation for removing the GIL is basically that when people
hear about it they go "hmmm that doesn't sound good". Obviously
many applications have been written in GIL languages and there
aren't really many practical problems that can't be overcome
easily.
neolefty - 4 hours ago
I think it may be some Stockholm Syndrome -- people have worked
very hard to get around the GIL, and they've come to expect its
limitations and respect those solutions.
But I've never heard of someone asking for a GIL to be added to
the JVM.
make3 - 8 hours ago
There are many cases where the objects are too big to be passed
around. Python is used a huge amount in machine learning and
data science, where being able to do parallel work on stuff
already in memory would be great.
fulafel - 6 hours ago
Something like the web worker primitives might work there
(transferables & sharing read only data).
leereeves - 8 hours ago
Are those applications often bottlenecked by the CPU, as
opposed to GPU or data transfer?
rspeer - 7 hours ago
The world of algorithms that run well on a CPU is still much,
much bigger than the world of algorithms that run well on a
GPU, even in machine learning.
And even if you're fortunate enough that Nvidia designs their
GPUs to solve your problem, why should the CPU cores sit idle?
dec0dedab0de - 8 hours ago
So, work on data that cannot be broken down into smaller
chunks? That makes sense, and is something I've never come
across.
chrisseaton - 7 hours ago
I'm sure they can be broken down into smaller chunks, but is
it more efficient if they aren't broken down and instead
shared memory is used? If you want parallelism you're
obviously already worried about performance.
smaddox - 6 hours ago
Can't this already be handled by calling out to a C/C++ or
FORTRAN procedure that processes the data in multiple threads?
For number crunching, Python is almost exclusively used as
glue.
foobarchu - 6 hours ago
You CAN handle it, but why should you have to? If it's
possible to remove that barrier, then it absolutely should be
removed. If the only answer to a problem is "use another
language", then the language in question has a limitation
that needs to be addressed.
Drdrdrq - 4 hours ago
It is not a limitation at all in this case. Python is just
a front to Tensorflow and similar libraries/frameworks, so
the GIL doesn't matter there.
[deleted]
ams6110 - 6 hours ago
Today's machine learning and data science students don't know
how to code in those languages. They know python, and maybe
java.
orangejewce - 4 hours ago
Don't forget R... shudders
sin7 - 4 hours ago
What's wrong with R? I know it's not a programmer's
language, but it's great for getting things done.
ant6n - 8 hours ago
Multiprocessing is a pain to use, and it's slow.
bobwaycott - 7 hours ago
If you're looking for simple threaded multiprocessing, it's not
that hard/painful:

  from multiprocessing.dummy import Pool

  pool = Pool(num_threads)
  result = pool.map(your_func, your_objects)
  pool.close()
  pool.join()

Improve and/or complicate things from there.
zeptomu - 3 hours ago
This is a nice pattern and there are surprisingly many
problems that can be solved that way. AFAIK you do not have
to join() here, as the processes die after the map call.
Often the challenge is a big amount of (hopefully read-only) data
that you want to access in every 'your_func'. The naive
solution is to copy the data, but this might blow your
memory.
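One common workaround, sketched below assuming a Unix fork-based
Pool (names are stand-ins): load the data into a module-level global
before the pool starts, so children inherit it copy-on-write instead
of receiving pickled copies. CPython's refcounting will still dirty
some of those pages over time.

  import multiprocessing as mp

  # Stand-in for the big read-only data; real code loads it from disk.
  BIG_DATA = {i: i * i for i in range(1_000_000)}

  def your_func(key):
      # Children read the inherited copy; nothing is pickled per task.
      return BIG_DATA[key]

  if __name__ == '__main__':
      with mp.Pool() as pool:
          print(pool.map(your_func, [3, 7, 11]))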
dec0dedab0de - 8 hours ago
In what ways is it a pain that threading is not?
cyphar - 8 hours ago
You can't just use functions defined in your tool, you need
to create a faux-CLI interface in order to run each parallel
worker. Also, copying large datasets between processes is not
efficient. And there are cases where the fan-out approach is
not the best way of parallelizing a task, and passing
information back up to a parent task is more complicated than
necessary.
dec0dedab0de - 7 hours ago
"You can't just use functions defined in your tool, you
need to create a faux-cli interface in order to run each
parallel worker."the multiprocessing library allows you to
launch multiple processes using your function definitions.
It's almost the same as the multithreading library but does
not share data.It seems the real problem, as you pointed
out, is the additional memory. I didn't consider
situations where each process would need an identical large
data set, instead of just a small chunk to work on.
ant6n - 7 hours ago
It gets more interesting when you have a large data set
that's required for the computation, but as you compute,
you may discover partial solutions that can be cached and
used by other workers.
So not only a large read-only data set, but also a read-write
cache used by all workers. This sort of thing is relatively
easy with threads, but basically impossible with
multiprocessing.
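A sketch of the threaded shape being described (data and the
computation are stand-ins; under today's GIL only the non-bytecode
parts would actually run in parallel):

  from concurrent.futures import ThreadPoolExecutor
  from threading import Lock

  DATA = list(range(100_000))  # large read-only data set (stand-in)
  cache = {}                   # read-write cache shared by all workers
  cache_lock = Lock()

  def solve(n):
      with cache_lock:
          if n in cache:        # reuse a partial solution any worker found
              return cache[n]
      result = sum(DATA[:n])    # stand-in for the real computation
      with cache_lock:
          cache[n] = result
      return result

  with ThreadPoolExecutor(max_workers=8) as ex:
      print(list(ex.map(solve, [100, 200, 100])))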
dom0 - 6 hours ago
Depending on where you want to go and the application,
such things may be a good idea for a low number of workers
but can become a major bottleneck.
emidln - 8 hours ago
Serializing data for IPC is often undesirable (copies kill)
which leads to multi process shared memory. Sharing memory
across process boundaries safely is a problem you avoid
entirely with threading. You still need to lock your data (or
use immutable data), but the machinery is built into your
implementation (and hopefully trustworthy).
sevensor - 8 hours ago
With threading, all of your threads can refer to the same
objects. Multiprocessing means you have multiple
interpreters running. That means no shared memory, and
communication over pretty slow queues. I've definitely
wanted to have multithreaded Python programs where all
threads referred to the same large read-only data structure.
But I can't do this because of the GIL. I mean, I can, but
it's pointless. I can't do this with multiprocessing because
of the limitations on shared memory with multiprocessing.
Edit: I realize I'm contradicting myself here. No shared
memory is a first approximation. You can have shared memory
with multiprocessing, but most objects can't be shared.
dguaraglia - 4 hours ago
Yeah, sharing memory between processes is a very delicate
ballet to perform. That said, sharing a read-only piece of
data is way simpler than you'd expect, depending on size
and your forking chain. The documentation could do a better
job of explaining the nuances and provide more examples.
sevensor - 4 hours ago
Care to elaborate? All I've seen in the docs is how to
share arrays or C structures between processes. It would
take a substantial rewrite to use either. Is there some
kind of CoW mechanism I'm missing?
btilly - 5 hours ago
And yet, if you could have what you want, would it actually
be faster?
The costs of synchronizing mutable data between cores are
surprisingly high. Any time your CPU thinks that the data that
it has in its cache might not be what some other CPU has in its
cache, the two have to coordinate what they are doing. And
thanks to the fact that Python uses reference counting, data is
constantly being changed even though you don't think that
you're changing it.
Furthermore, if you throw out the GIL for fine-grained locking,
you open up a world of potential problems such as deadlocks.
Which look like "my program mysteriously froze". Life just
got a lot more complicated.
It is easy to look at all of those cores and say, "I just want
my program to use all of them!" But doing that and actually
GETTING better performance is a lot trickier than it might seem.
sevensor - 5 hours ago
Right, but like I said, I'd be fine with a read-only
shared data structure. I have a problem that has a hefty
data model. The problem can be decomposed and attacked
in parallel, but the decomposition doesn't cut across the
data. Right now I run n instances on n cores, but that
means making n copies of a large data structure. This
requires a lot of system memory, ruins any chance I have
of not wrecking the cache (not that I have high hopes
there, but still), and forces me into certain patterns,
like using long-lived processes because it's expensive to
set up the model, that I'd prefer to avoid.
btilly - 56 minutes ago
You might want to look at
https://stackoverflow.com/questions/17785275/share-large-rea...
for inspiration.
If you need to share a large read-only structure, the best way
IMO is that approach. Implement the structure in a low-level
language that supports mmap (be very sure to make the whole
structure be in the mmap'd block - it is easy to wind up with
pointers to random other memory and you don't want that!)
and have high performance accessors to use in your code.
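A minimal Python-only sketch of that mmap pattern (the file name and
flat layout are invented for illustration; a real structure would
need an index and low-level accessors):

  import mmap, struct
  from array import array

  # Writer: pack the read-only structure into a flat file, once.
  with open('model.bin', 'wb') as f:
      array('d', range(1_000_000)).tofile(f)

  # Each worker process maps the same file; the OS shares the pages.
  with open('model.bin', 'rb') as f:
      buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

  def get(i):  # the "high performance accessor"
      return struct.unpack_from('d', buf, i * 8)[0]

  print(get(42))  # -> 42.0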
hermitdev - 2 hours ago
It's been a while and my memory is fuzzy, but I recall
either pyodbc or pysybase reacting very poorly with the
multiprocessing module. With multiprocessing, Python would
segfault after fork. With threading, it would "work", albeit
slowly. Also, IIRC, it did not matter if the module was
imported before or after the fork; it still segfaulted. I never
had the time to try and track down the issue that was causing
it, though; deadlines and all that.
kllrnohj - 6 hours ago
#1 & #2: Consumer CPUs are now pushing 16 cores & 32 threads.
Python is limited to ~1/20th of what a single box is capable of.
That's a pretty big bottleneck.#4: Even if you're just talking
message passing sending a message between threads is in the 10s
of nanoseconds while between processes is 10s of microseconds.
That's a ~1000x slowdown on core communication. Given that CPU
cores are not getting any faster, that's a pretty big hit to
efficiency to take. Similarly simply moving data between
processes is expensive, while moving data between threads is
free.
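The gap is easy to see with a rough single-process microbenchmark
(numbers vary by machine; mp.Queue pays for pickling plus a pipe
even with no second process attached):

  import time, queue, multiprocessing as mp

  def round_trip(q, n=10_000):
      t = time.perf_counter()
      for i in range(n):
          q.put(i)
          q.get()
      return (time.perf_counter() - t) / n

  print('thread queue :', round_trip(queue.Queue()))
  print('process queue:', round_trip(mp.Queue()))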
sqeaky - 6 hours ago
You are right about the majority of what you said, but I am
pedantically picking on one point. CPU cores are getting
faster, but they aren't doing it with clock speed, they are
dispatching more instructions per cycle or otherwise making the
work faster.
kllrnohj - 5 hours ago
IPC gains per generation are vanishingly tiny, if they exist
at all. Skylake -> Kaby Lake, for example, had no IPC
improvements at all. A very small clock bump to the various
tiers was it.
Even if you look over a large generation gap, there's only a ~20%
IPC improvement going from an i7-2600K to an i7-7700K
(https://www.hardocp.com/article/2017/01/13/kaby_lake_7700k_v...).
6 years & a shrink from 32nm to 14nm and all it can muster is +20%.
Cores are just not getting faster by any meaningful amount.
breatheoften - 4 hours ago
Moving data between threads is only free to the extent that
synchronization is free. Maybe you could say that moving
immutable data between threads is free, but I don't think you
can say it's free in general... Doing so significantly
undersells the complexity that comes with shared-memory
concurrency.
kllrnohj - 2 hours ago
You seem to be conflating moving with sharing. Moving between
threads is always free[1] regardless of whether it's mutable or
immutable, and there are no concurrency issues at all since
it's a move.
Move means the sender no longer has a reference. As in std::move,
Rust's ownership transfer, web workers' transferables, etc.
1: Yes, there's a single synchronization point where the handoff
happens, but this is part of sending a message at all. It's also
independent of the size & complexity of the payload itself when
we're talking multi-threaded instead of multi-process. You have
that exact same sync point that costs the exact same regardless of
whether your message consists of a single byte or a multi-gigabyte
structure.
jononor - 6 hours ago
The efficiency hit is very dependent on how large your
computation chunks are. If the computation per message batch is
on the order of 100 ms, it would be a <10% loss.
kllrnohj - 5 hours ago
Assuming very small messages that are rarely sent then yes,
the hit of multi process is not going to be your biggest
issue.
sandGorgon - 7 hours ago
+1 on this - what is more important for me is some kind of Numba
LLVM JIT to automatically optimize hotspots, kind of like the
JVM HotSpot compiler. Numba already does some of this.
Additionally, I cannot help but wonder if the answer to
these problems has been the JVM all along. Especially with JVM 9
and the Truffle framework -
https://github.com/securesystemslab/zippy
rbjorklin - 6 hours ago
I was just about to mention Graal & Truffle when I saw your
post! I wasn't aware of ZipPy but it looks promising! Java 9
will provide a proper interface for Graal through JVMCI and is
only 37 days away from GA [1]. With Graal supposedly only
months away from GA [2], ZipPy may very well prove to be the
future of high-performance Python.
[1] http://www.java9countdown.xyz/
[2] https://www.infoq.com/presentations/polyglot-jvm-graal (see
roughly 42:00 - 47:00)
EDIT: Wording.
dkersten - 7 hours ago
To add to what everyone else said: if you need transactional
semantics, it's much simpler with multiple threads. With multiple
processes (local or remote), you can't simply share an atomic
data structure or a lock; you have to use a distributed lock or
consensus algorithm, which is more complex and usually quite
"chatty". If memory or network bandwidth are constrained, it may
be especially desirable to eliminate this, but even if not, fast
locking/transactions may be desirable regardless.
If you're using multiple processes for CPU-bound performance, why
not squeeze as much as you can out of each CPU?
dom0 - 6 hours ago
Just like you can share memory between processes, you can also
share OS-level locks and semaphores between them. A distributed
lock manager is not required for the single-node case.
btown - 8 hours ago
Say you're running CPU-bound workers that need to load
significant data into RAM - say, a machine learning or NLP model.
The most cost-effective theoretical approach would be to have
that in shared memory, so you're not paying for that RAM multiple
times in order to fully utilize all cores. Even if you need
multiple boxes, the cost savings per core would be substantial.
My understanding is that multiprocessing makes you jump through
hoops to set up that shared memory; this would make it largely
transparent to the user while remaining performant. I haven't
used multiprocessing in production, though, so I could be wildly
off base there.
[deleted]
dom0 - 6 hours ago
Unless your model actually consists of a large number of Python
objects (and not a handful of PyObjects referencing something
like a np array), there isn't really anything blocking you from
doing so. You can have a master process map the blob of static
data into a block of shared memory that's mapped by the
secondary processes; ctypeslib lets you access it as a numpy
array again.
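A sketch of that pattern with multiprocessing.RawArray plus numpy
(using np.frombuffer here rather than ctypeslib; the Pool
initializer hands the shared block to each worker, since shared
ctypes objects must be inherited rather than pickled per task):

  import multiprocessing as mp
  import numpy as np

  _shared = None  # set in each worker by the initializer

  def init(shared):
      global _shared
      _shared = shared

  def worker(shape):
      # Reinterpret the shared block as an array -- no copy is made.
      arr = np.frombuffer(_shared, dtype=np.float64).reshape(shape)
      return float(arr.sum())

  if __name__ == '__main__':
      shape = (1000, 1000)
      shared = mp.RawArray('d', shape[0] * shape[1])  # one shared block
      np.frombuffer(shared, dtype=np.float64).reshape(shape)[:] = 1.0
      with mp.Pool(4, initializer=init, initargs=(shared,)) as pool:
          print(pool.map(worker, [shape] * 4))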
[deleted]
colesbury - 7 hours ago
Your criteria 2, 3, and 4 don't make much sense to me. We often
have workloads that require multiple boxes, but we still want to
make effective use of each box. Common server hardware has dozens
of cores, which requires a lot of parallelism to fully utilize.
The GIL hinders that, even when most of the work doesn't hold the
GIL (see Amdahl's law).
Python multiprocessing doesn't work well with a lot of external
libraries. For example, CUDA doesn't work across forks, and many
system resources can be shared across threads but not processes.
Python objects must be pickled to be sent to another process, but
not all objects can be pickled (including some built-in objects
like tracebacks).
A lot of different parallel programming models can be built on top
of threads (shared memory, fork-join, message passing), and to a
certain extent they can be mixed. That's not true of Python
multiprocessing, which only allows a narrow form of message
passing. (It's also buggy, has internal race conditions, and
easily leaks resources.)
The problem for CPython is that it may not be possible to remove
the GIL without breaking the C API, and a lot of the benefit of
Python is the huge number of high-quality packages, many of which
use the C API.
PrimHelios - 7 hours ago
CPython doesn't have any reservations about breaking the Python
API between minor versions, so why care about the C API? I get
where you're coming from, but they've already shown they don't
care much for compatibility, so I don't see why that's a big
obstacle.
dom0 - 7 hours ago
Removing the GIL (in a non-braindead way) likely entails
breaking all existing code using the C API. PyPy could do so
without breaking cpyext, by maintaining the illusion of a GIL
whenever control passes to cpyext.
std_throwaway - 3 hours ago
Does it lock the GIL so numpy can release it again
immediately afterwards?
dkersten - 2 hours ago
Perhaps it makes the unlock call a no-op before numpy
tries to unlock it.
dom0 - 7 hours ago
> (see Amdahl's law)
Amdahl's law bears little relevance to throughput computing (i.e.
most servers).
> (It's also buggy, has internal race conditions, and easily leaks
resources.)
There is also at least one memory corruption bug in multiprocessing
(linked a few months back by a fellow HN reader).
m_mueller - 8 hours ago
In my five years of Python I've run up against this boundary at
least once. In your list I would:
* Take out #2. If something can make use of multiple nodes, it can
usually make even better use of multi-core parallelization (which
affects both computational and memory bandwidth performance).
Multi-node comes with a much higher communications overhead, so
there's a relatively wide range of applications that scale well on
multi-core but not multi-node.
* Add that #3 comes up as soon as you have complex data structures
to share. Serializing and deserializing (by default with pickle)
is a huge overhead for anything a bit more involved. If you design
for this from the start you can be fine, but often these things
grow and eat up bigger and bigger use cases until you run against
the GIL. This basically happens with anything that has enough data
and users and need - hey, I heard your scheduler tool works well
for the cafeteria, I'm sure it can handle our global operations,
right?
* About #4 - see the previous point.
njharman - 5 hours ago
Once (or a few times) in 5 years puts this problem into the
"not worth (ROI) solving" bucket for me.
Those few times, put down the hammer and use some other tool for
those not-naillike jobs.
neolefty - 4 hours ago
Certainly not worth it for one person to tackle the GIL, but
with a million people running into it a few times in 5 years,
I think it's economical.
gaius - 6 hours ago
I haven't written shared memory code in literally years, I just
use Redis now.
smegel - 4 hours ago
Removing it is the easy part.
devwastaken - 8 hours ago
This would be great if it means we can run the C portions of Python
in threads without performance hits. I recently started a little
project that is a cross-platform GUI for batch bzip2 compression,
and Python did it quite well with its built-in bzip2 module. But
once I tried to do it in parallel, the performance impacts of the
GIL were obvious. Yes, you can work around that with multi-process,
but I'd rather not be spamming the running processes list and have
to actually handle separate processes that should be threads.
In the end I settled for C++ and Qt with the native bzip2 library
with a few modifications.
[deleted]
cjhanks - 5 hours ago
That doesn't seem quite right. C extensions can release the GIL
and still continue running. So long as they are not operating on
Python objects directly it is safe.
dkersten - 7 hours ago
If you're calling into C, you can release the GIL from C for the
duration of the call. You have to re-acquire it before the call
returns, and you have to be careful not to call any of the Python
C API functions that rely on the GIL (reference counting and such,
for sure). Of course, if you want to do this, you can't simply
call into a random C library directly but have to write a C stub.
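As a Python-side aside (example assumes a POSIX libc): ctypes
already follows this discipline, releasing the GIL around foreign
calls made through CDLL (PyDLL keeps it held for code that calls
back into the C API):

  import ctypes, ctypes.util

  libc = ctypes.CDLL(ctypes.util.find_library('c'))
  # The GIL is released for the duration of the foreign call, so
  # other Python threads can run while this sleeps.
  libc.usleep(100_000)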
monkmartinez - 7 hours ago
> I'd rather not be spamming the running processes list and have
to actually handle separate processes that should be threads.
I may be a bit naive asking this... but why would you care that
much?
Looking at Activity Monitor on my Mac, I count 14 Google Chrome
Helper Process instances, each spawning upwards of 13 threads.
Adobe does something similar, as do several other
programs/applications on my machine. Yet my machine is mostly idle.
I can only speak for myself here. If I want something done on my
computer... I don't care if it spams my process list if that is
what it takes to complete the task. Don't crash my machine, but do
what you have to do to get it done quickly.
robotresearcher - 7 hours ago
A process swap completely wipes the cache. Once swapped in,
your process is not up to top speed for a while, until the
working set has been copied into cache. You'd like to keep it
that way for as long as possible. Best case scenario: one
process per core.
devwastaken - 7 hours ago
This is a parallel compression application that uses all cores
of a system by default. On some systems it may use 100% HDD, on
others near 100% CPU. It's meant to take up as many resources as
it can unless its core usage is lowered. But with any program
that has a high workload, the potential exists that the
program's UI will not respond, or perhaps your desktop won't
even allow you to get to the UI to stop the process. This is
where task manager saves the day.
Along with that, I like it to be a single process so it's easily
wrappable in whatever monitoring or process-throttling
application you want. I will admit I'm completely assuming that
multiple processes are harder than a single process to do that
with.
Also, when you get up to the 16-thread count, seeing that many
processes pop up at the top of your process list is both annoying
and doesn't easily let you know how much the application is using
overall. It could also be scary to some users who have never seen
that before and think it's trying to run a whole bunch of
programs.
Yes, some of those are clearly nitpicks and not good technical
reasons, but this is a problem that is fixed with a good framework
anyway.
darpa_escapee - 7 hours ago
There's more overhead communicating between processes than
there would be if threads were just modifying shared state.
chocolatebunny - 8 hours ago
Were you running on Windows or Linux? It's my understanding that
multiple processes don't have a big performance penalty on
Linux compared to multiple threads.
devwastaken - 8 hours ago
I was running under Windows, but the application is meant for
Windows, Linux and Mac.
orf - 8 hours ago
Did you investigate the multiprocessing library?
Also, this kind of thing should be relatively light on the GIL if
done correctly. The bzip2 module releases the GIL (I assume?), as
does file IO, which is most of the workload in your use case?
apenwarr - 8 hours ago
In normal CPython, you can design your C extension (such as
bzip2) to release the GIL while it runs. This is one of the few
times when threads are useful in Python. It's also why scipy etc.
are as fast as they are.
I don't know if the bzip2 module does this, but it probably should.
akx - 7 hours ago
Yeah, the stdlib bz2 module does release the GIL for the duration
of the compression (though it locks the compressor object
simultaneously):
https://github.com/python/cpython/blob/d4b93e21c2664d6a78e06...
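Which means plain threads do scale for this particular workload; a
quick sketch, assuming Python 3:

  import bz2
  from concurrent.futures import ThreadPoolExecutor

  # 16 chunks of dummy data; bz2.compress drops the GIL while it
  # works, so the pool's threads genuinely compress in parallel.
  chunks = [bytes(range(256)) * 4096 for _ in range(16)]

  with ThreadPoolExecutor(max_workers=4) as ex:
      compressed = list(ex.map(bz2.compress, chunks))
  print(sum(len(c) for c in compressed))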
cjbillington - 2 hours ago
This. Any part of my numerical code that is a bottleneck is
either already coming from scipy or numpy, or I'm going to
write it in Cython if possible. Rewriting in Cython is already the
optimisation you would do before going multithreaded, because
it can get you factors of 10, 100, etc., whereas multithreading
gets me a factor of 4 to 8, depending on how many cores I have
and how independent the workload is.
So by the time it comes to consider multiple threads, the
bottlenecks that I want to parallelise are already
non-GIL-holding.
I wrote a tool to measure what proportion of the time the GIL is
held in a program:
https://github.com/chrisjbillington/gil_load
I encourage people to measure what fraction of the time the GIL
is actually held in their multithreaded programs. Unless it's
approaching 100%, go ahead and use more threads! You will get a
speedup. It's my experience that this is true more often than
not. The biggest exception is poorly written C extensions that
do not release the GIL even though they have no need for it.
But if you're writing your own in Cython it's a matter of just
typing `with nogil:`.
crb002 - 6 hours ago
Having ported Ruby to IBM's Blue Gene/L my advice is to forget
about the GIL. Run one Python process per core. Use something like
MPI2 for message passing communication. Ruthlessly eliminate bloat
code from production binaries and statically link all the things.
kevingadd - 6 hours ago
The post addresses this strategy and describes why they consider
it insufficient. You can already do this in Python, anyway.
sametmax - 5 hours ago
Yes, but they discard multiple interpreters as having no real
advantage. This is dishonest, since it should be possible to share
objects with much, much less overhead than multiprocessing, while
still allowing the use of multiple CPUs. It's not perfect, but
honestly it seems like a very good deal for Python.
metalliqaz - 6 hours ago
I agree wholeheartedly. Almost every time I hear from someone
who is upset about the GIL, I find that they would be much better
suited to using multiprocessing instead of multithreading.
With 80% of the developers out there, they are basically assured
of producing better, more stable code this way.
optimusclimb - 5 hours ago
Except when your use case requires a massive shared data cache
that needs to be atomically updated.
zepolen - 5 hours ago
Of which there are plenty of well-defined ones to do the job
already, and as a plus they can communicate with any language,
not just Python.
omarforgotpwd - 5 hours ago
Redis could help. Obviously not perfect for every use case
but covers many of them.
optimusclimb - 4 hours ago
It doesn't if you need to manage atomic data across the
processes, as there's no way to lock and block the other
cache consumers (think of the data you need to handle cache
evictions, etc.).
Also, you're describing multiple Python processes + an extra
server (Redis) process as a "simpler" solution for the
limitation that Python doesn't do multi-threads well.
Of course there are a ton of use cases out there where you can
scale in other ways, but threads and shared memory exist for a
reason - there's no reason not to call a spade a spade and say
the GIL is still a limitation.
abecedarius - 5 hours ago
I think it's ok to not write everything in Python, and this
is a long way from the top of my problems with it.
jhayward - 3 hours ago
This is self-fulfilling. As long as Python is useless for
a set of tasks that are intrinsic and important to some
domains, they won't use it.
optimusclimb - 4 hours ago
Of course it is - and that's what people do. The reason for
the parent article is that there ARE people that would like
to continue to use Python the language, and their existing
source code/libraries, but would like not to deal with the
GIL. Just because it is not a priority for you doesn't mean
it isn't for others.
jhayward - 3 hours ago
> Except when your use case requires a massive shared data
cache that needs to be atomically updated
You can delete the last six words. Anything where multiple
processes would have to read in/acquire a massive dataset to do
some independent work qualifies. For instance, running some
number (e.g., hundreds to hundreds of thousands) of analytical
or statistical tests over a set to pick parameters, etc.
pslam - 6 hours ago
Absolutely agree. Almost all tasks will perform very well when
using multiprocessing. It also has a nice side effect of
steering you towards explicitly coding data flows without fine-
grained sharing.
If you need to close the gap between the performance of
multiprocessing and multithreading, then you probably shouldn't
be using Python, or any language of the same shape, in the first
place.
There is one other option I'd like to see: multiprocessing style,
but with multiple Python interpreter instances in the same
process -- one per thread. There would still be the hard
delineation of data boundaries between instances, but less
overhead for pushing data between them.
metalliqaz - 6 hours ago
What would that get you?
weberc2 - 6 hours ago
> If you need to close that gap between the performance of
multiprocessing and multithreading, then you probably shouldn't
be using Python, or any language of the same shape, in the first
place.
Unfortunately, these performance concerns often manifest well
after the "rewrite it in a different language" date has expired.
There are a lot of people in that boat, and they need better
options.
> There is one other option I'd like to see: multiprocessing
style, but with multiple Python interpreter instances in the
same process -- one per thread. There would still be the hard
delineation of data boundaries between instances, but less
overhead for pushing data between them.
If I understand correctly, the article discusses this
("subinterpreters"), but claims that there is no advantage to
this approach vs multiprocessing. Presumably any overhead savings
are eaten by GIL contention or some such?
dom0 - 5 hours ago
> There are a lot of people in that boat, and they need better
options.
Land isn't coming to you, folks; you must start rowing if you
want to get there. Rewrite bit for bit. Module for module.
Package for package.
weberc2 - 5 hours ago
Sounds like you're saying this is infeasible; care to
explain why?
funkymike - 5 hours ago
I took it to mean that it is feasible. Instead of saying
"well, we used the wrong language, I guess we're screwed,"
you rewrite one component at a time, piece by piece,
until the whole has been replaced.
This is the approach I try to use myself. It's nearly impossible
to replace an entire system all at once. But replacing one part
at a time is doable and you can see the improvements much sooner.
weberc2 - 4 hours ago
By "it is infeasible", I meant, "removing the GIL is
infeasible"; not "rewriting is infeasible".
ericfrederich - 2 hours ago
The GIL is a legit pain when dealing with GUIs.
When you're jumping between C/C++ code and Python code you don't
care much about the GIL... until you have a GUI which needs to be
kept responsive and needs the GIL to do so.
simonh - 1 hours ago
I've done a fair bit of GUI development in Python, mainly
using Qt, and have not hit any significant responsiveness issues.
The multi-threading support in Python is perfectly fine for
providing responsive switching between activities and event
loops, as long as you don't have anything that locks hard for
too long. But in that case you can always split that off into
a separate process, e.g. the way browsers nowadays run a
process per tab.
kerkeslager - 2 hours ago
Your parent comment gives good advice, because the GIL is
probably here to stay and so there's no use complaining about
it. But the idea that multiprocessing gives better results than
multithreading is ridiculous.
In languages which don't have a GIL, threads are almost as
capable as processes, but lighter weight. Threads are almost
always preferable to processes in most languages.
I understand why the GIL is still around, and don't necessarily
support removing it, but it's definitely not there because it
produces "better, more stable [Python] code".
gnaritas - 45 minutes ago
> In languages which don't have a GIL, threads are almost as
capable as processes, but lighter weight.
But also plagued with shared-state concurrency bugs, something
multiprocessing completely avoids, so...
> Threads are almost always preferable to processes in most
languages.
No, they aren't. It's too easy to write buggy code with threads;
it's a flawed model. Now, it's certainly true that more people
choose threads than processes, but that's because they vastly
overestimate their ability to write bug-free lock-based code.
Processes are better.
slantedview - 2 hours ago
> multiprocessing instead of multithreading
There's a reason threads exist.
gnaritas - 42 minutes ago
Those reasons aren't what they used to be; resources aren't
nearly as limited these days, and we now have the hindsight to
see that threads lead to very buggy code due to shared state.
Processes are better.
glic3rinu - 3 hours ago
Lately I've found out that multiprocessing will not help you if
your program is multithreaded. There is no sane way of forking
a multithreaded program. For one, the child process will
inherit a copy of all locks in the state they were in at forking
time, possibly causing random crashes and deadlocks.
hyperbovine - 1 hours ago
If the child program is multithreaded then it's almost
certainly not pure Python in the first place. So, wrap it up
in `with nogil:` Cython statements and use the threading
module (or concurrent.futures.ThreadPoolExecutor).
d0mine - 2 hours ago
That is why the 'forkserver' start method exists:
https://docs.python.org/3/library/multiprocessing.html#conte...
lstyls - 2 hours ago
> There is no sane way of forking a multithreaded program
The sane way of forking a multithreaded process is to exec
immediately after.
bb101 - 6 hours ago
I can see how using multiprocessing trumps threads for smaller
programs. However, it can become memory-inefficient to have
larger programs running in multiple processes, especially on
servers with fewer resources.
njharman - 5 hours ago
If I run N versions of a program that occupies 8 MB of memory,
the memory footprint of the code is much less than N*8 MB, due
to shared libraries/memory pages.
It's a factor, sure. But one you should weigh with other factors
to determine what is best.
agumonkey - 4 hours ago
Hettinger said that multiprocessing uses pickle for every
communication and that it must be accounted for when optimizing.
Animats - 4 hours ago
Python's "multiprocessing" means launching another Python
interpreter in a subprocess. Each process has a full copy of
the Python environment. They may share the base interpreter,
but there's a separate copy of every package loaded and all
data. Memory consumption is bloated and the CPU caches thrash.
Launching a subprocess is expensive; it means a full
interpreter launch and a recompile/reload."Multiprocessing" is
useful when you have a lot of work to do concurrently and not
too much data to pass between processes. I've used Python
subprocesses that way. Parallelizing your number crunching is
probably not going to work very well.
hyperbovine - 1 hours ago
Actually, things are not as bad as they used to be. Since 3.4
you can alter the way multiprocessing starts processes:
https://docs.python.org/3/library/multiprocessing.html#conte...
The ``forkserver`` method eliminates most of the problems you
mention: child processes are only started once, and they
fork() from a totally separate process, so they don't inherit
all of the resources of the main process (in particular, they
don't copy the whole heap). I've found this eliminates 90% of
the performance-related issues I used to experience with
multiprocessing.
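Opting in is one line at startup; a minimal sketch (Python 3.4+,
POSIX-only, since forkserver relies on fork()):

  import multiprocessing as mp

  def work(x):
      return x * x

  if __name__ == '__main__':
      # Children fork from a clean helper process, not from main.
      mp.set_start_method('forkserver')
      with mp.Pool(4) as pool:
          print(pool.map(work, range(8)))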
dekhn - 3 hours ago
The other issue with multiprocessing is that it requires the
enclosing code to be pickleable, and many Python objects are
not pickleable. For example, if I have a thread-safe RPC
client and want to send thousands of RPCs using the client, I
can't do that with multiprocessing (subprocess pools; threading
pools work). RPC clients manage a TCP connection; if you use
multiprocessing you end up having to make many TCP connections.
asperous - 3 hours ago
If you're CPU-bound (the only reason to care about the GIL
anyway), then you want one process per core, so at least the
L1 cache isn't shared. The separate memory consumption is
minimal (3-5 MB * N cores).
You don't need to do setup/teardown more than once.
valarauca1 - 3 hours ago
>Memory consumption is bloated and the CPU caches thrash.
Launching a subprocess is expensive
Statically and dynamically loaded binaries are resident in the
kernel's page cache. While each process will have different
locations within its address space for each binary (b/c ASLR),
they _should_ be de-duplicated in RAM; ultimately all these
separate in-process images will be pointing at the same
physical RAM page(s).
So from a hardware cache standpoint you're mostly okay.
Animats - 1 hours ago
That's just the interpreter's executable. All the stuff
that's generated from the Python code you load, and any
data it generates, is unique to the process.
zeptomu - 3 hours ago
See my other reply in this thread.
> Parallelizing your number crunching is probably not going to
work very well. [...]
The question is, what exactly does "number crunching" mean? We do
aerial imagery analysis, so image processing in essence, which I
would classify as a "number crunching" problem. A common thing
e.g. is to do a time-series analysis, and you can simply start
multiple (2, 4, ..., N with clusters, etc.) processes for each
problem. Obviously this works because most methods are
computation- and/or memory-heavy - the additional memory
requirements and "overhead" of Python itself (IMHO people
overestimate the weight of starting new processes instead of
threads) are completely dwarfed by the requirements (memory and
CPU) of the method itself.
kerkeslager - 2 hours ago
...which is true, but doesn't mean you can just ignore it.
Interpreter state is among the most frequently accessed memory in
many applications, meaning it's ideal to have it in cache. The
difference between two interpreter states and one might not be
big compared to the data being processed, but it's big enough to
bump a lot of interpreter state out of cache, which for many
programs can have drastic performance implications.
If you don't think cache locality is important, look at radix
sort versus quicksort. Radix sort has a much lower O, but
performs worse in most cases because of its poor cache locality.
Look, I get that there are fairly easy ways to work around these
problems, but let's not just blithely pretend they aren't
problems.
zeptomu - 1 hours ago
Agreed, but there is a lot of misinformation about the topic.
I've met developers who thought the GIL prevents you from running
your program in multiple instances at the same time on one
machine - which is obviously not the case.
Sure, it's a problem for specific workloads, and Python will get
there eventually - I just don't think it is a deal breaker.
peterkelly - 4 hours ago
If CPU load is an issue, why would you be using an
interpreter in the first place?
dsfyu404ed - 4 hours ago
Just because those resources exist does not mean you get to
park your 1997 Chevy Cavalier diagonally across three parking
spaces.
I'm not going to run your code on my server if your code uses
resources so poorly that I can't run other things I want to run
on my server.
DigitalJack - 3 hours ago
You are getting downvotes, I suspect, because your comment makes
no sense in the context of the post you replied to.
Perhaps you meant to reply to the GP?
zerkten - 4 hours ago
It seems like people never make this assessment, or they use the
GIL argument to put interpreted languages down. I personally run
into I/O-bound problems way more often than CPU-bound ones. That
said, I'm mainly doing things in the realm of a Python web
developer. Scientists probably hit CPU-bound problems more often
with Python, but they seem to drop down to C/C++ extensions
without needing to complain about the problems.
khedoros1 - 4 hours ago
A lot of the heavy lifting is done through calls to C
libraries anyhow, with Python just being a convenient way
to pass the data around.
falcolas - 2 hours ago
Indeed, and in that case the GIL is effectively a non-
issue (there's no requirement for the GIL to be held by
non-python code).
Animats - 1 hours ago
No, no, if you're manipulating Python objects from C
code, you have to hold the lock. You can release it only
when not doing anything with objects in Python's memory
space. Otherwise you get race conditions and intermittent
crashes.
carapace - 5 hours ago
I can't tell you how happy I was to see your comment at the top
of this discussion.
Relevant: "Python is Only Slow If You Use it Wrong"
http://apenwarr.ca/diary/2011-10-pycodeconf-apenwarr.pdf
omarforgotpwd - 5 hours ago
Yes, multi-processing is much easier anyway. Not to mention how
complicated, backwards-incompatible, and thorny trying to get
rid of the GIL is...
zeptomu - 3 hours ago
I am a heavy user of Python and its scientific libraries (numpy,
etc.), and although I know about the GIL, I have to add that for
us (we do a lot of scientific code-prototyping to evaluate remote
sensing processing methods) the GIL hasn't been a problem so far.
E.g. in the remote sensing and earth observation domain you can
simply divide your problem (e.g. semantic segmentation) into
(maybe over-lapping) subproblems (via e.g. tiling) and start
separate processes for each image processing tool-chain.
Granted, you may not utilize your resources to the full extent by
only applying multiprocessing (and ignoring threading), but in my
experience you can solve a lot of problems by simply applying
map-reduce-like programs and optimizing for throughput.
sbeckeriv - 5 hours ago
I would like to know more.
dguaraglia - 5 hours ago
Whoa, thanks for bringing up MPI2, you might have just saved me a
lot of painstaking development with the mmap and multiprocessing
libraries.
astrodust - 2 hours ago
Counterpoint: Threads in JRuby work as threads should work. No
GIL. No grinding of gears.
Multi-process is just one form of concurrency, and it's not always
the best one.
twoodfin - 6 hours ago
"It mostly works for simple programs, but probably segfaults on
anything complicated" is not a promising beginning. Starting with
race condition chaos and trying to patch your way out of it with
"strategic" lockinga) Inspires much less confidence than starting
with a known-correct locking model (the degenerate case being a
GIL) and preserving it while improving available concurrency.andb)
Seems at least 50/50 to end up without much in the way of tangible
scalability gains once enough locking has been added to reduce the
rate of crashes and data corruption to an acceptable (?!) degree.
At least that was my takeaway from all the challenges Larry
Hastings has documented while working on the gilectomy. Sure, they
don't have to worry about locking around reference counting, but
it's not like writing a (concurrent?) GC operating against
concurrently executing threads isn't a significant design challenge
itself with many tradeoffs to make.
mcherm - 6 hours ago
> "It mostly works for simple programs, but probably segfaults on
anything complicated" is not a promising beginning.Perhaps they
would have done better to say "it works correctly for all
programs that do not assume the built-in data structures are
threadsafe". That is an accurate description, what you quoted is
a reasonable approximation.
vladf - 5 hours ago
There seem to be a lot of naysayers in the comments about removing
the GIL. Multiprocess parallelism isn't always appropriate, so I
find this to be a very promising change that will definitely make
me want to switch to PyPy. Here are the use cases where I've found
multiprocessing to be inappropriate:

* High-contention parallel operations. Doing synchronization
through a Manager (a separate IPC-based synchronizing broker
process) is of course less preferable than, say, a futex.

* Embarrassingly parallel small tasks. This is a big one. If the
operation being parallelized is short, then message-passing
overhead takes up more runtime than the operation itself, like a
bad Amdahl's Law scenario. Shared address space multithreading
solves this problem.

* Related: parallelization without the pickling headaches! Many
objects can be synchronized but not easily pickled or copied. True
multithreading would really enable a large number of use cases
(map a lambda instead of a named function, anyone?), since the
same Python interpreter can just pass a pointer to a single shared
object (see the sketch after this list).

* Related: lots of libraries (Keras, TensorFlow, for instance)
make heavy use of module-level globals, and aren't meant to be run
on multiple cores on the same machine (TF, for instance, hogs all
GPU memory). Multithreading in these deep learning environments
(assuming PyPy support from those packages) is useful for
parallelizing the input ingestion pipeline. But this point isn't
TF/Keras dependent; I can't recall other modules but don't doubt
the heavy use of module globals that's unfriendly with fork()-ing,
especially if kernel-related state is involved.
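(To make the pickling point concrete, a small sketch: the lambda
works with threads but is rejected by the process pool, since
processes must serialize the callable:)

    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

    data = list(range(8))

    if __name__ == "__main__":
        # Threads share one interpreter, so an anonymous function is fine:
        with ThreadPoolExecutor() as ex:
            print(list(ex.map(lambda x: x * x, data)))

        # Processes must pickle the callable, and lambdas don't pickle:
        with ProcessPoolExecutor() as ex:
            try:
                print(list(ex.map(lambda x: x * x, data)))
            except Exception as exc:
                print("process pool rejected the lambda:", type(exc).__name__)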
falcolas - 2 hours ago
> There seem to be a lot of naysayers in the comments about
removing the GIL.

That's because it's been attempted over and over and over again.
And each time it ends up failing due to the decrease in
single-threaded performance (the bevy of necessary memory mutexes
aren't free) and the extensive amount of work required to make all
of the standard libraries threadsafe.

I don't buy the $50,000 cost for a second. Sure, you might be able
to safely change the interpreter for that little money, but you
couldn't fix up performance and the standard library for that.
vladf - 1 hours ago
Simplicity of implementation and single threaded speed seem to
be, well, implementation issues. Nonetheless, they are
reasonable doubts about the project. However, my comment was
mostly aimed at the other commenters who were saying
multiprocessing suffices for parallel workloads - that came off
as dismissive for the reasons I mentioned above.
njharman - 5 hours ago
> Multiprocess parallelism isn't always appropriate

Using Python isn't always appropriate.
vladf - 4 hours ago
Are you saying that because a language is missing something,
when considering a fix for that thing, the existence of other
languages/solutions is an argument against that fix?
zzzeek - 8 hours ago
Super glad they're going to try using mutexes and not that STM
approach, which was looking to be immensely complicated. Was not
looking forward to the kinds of interpreter bugs that approach was
going to produce.
Beltiras - 8 hours ago
Think about a recursive function whose implementation is changed
while it is running. The replacement might have an entirely
different algorithm. Which version finishes the call stack?
chrisseaton - 7 hours ago
The version that was originally activated. I think that's the
case in every single parallel implementation of a programming
language ever. I can't imagine it working any other way. When you
redefine a method in any language I'm aware of, you just change
which method the name points to. You don't modify the original
method.
Beltiras - 7 hours ago
So a function:

    def fun(*args):
        if not args:
            return 0
        return fun(*(args[1:]))

would be call-by-address after the first invocation? It could be
lookup-by-name by way of code.
chrisseaton - 7 hours ago
The naive implementation, and the semantic model, is always
lookup-by-name on every invocation. In practice we apply
speculative optimisations including inline caching and guard
removal with remote dynamic deoptimisation via safe points to
make it a direct call instead.
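(A minimal sketch of that lookup-by-name semantics in CPython:
rebinding the name mid-recursion redirects the remaining recursive
calls:)

    def fun(n):
        if n == 0:
            return "original"
        return fun(n - 1)  # 'fun' is looked up in module globals on each call

    keep = fun             # keep a direct reference to the original function

    def fun(n):            # rebind the name 'fun' to a replacement
        return "replacement"

    print(keep(3))         # -> "replacement": the recursive call saw the rebind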
jtchang - 6 hours ago
Just reading this post makes me think that it could do with a bit
more "marketing" speak. I love Python. I use it day to day and
realize there is a GIL. But give me some business reasons as to why
removing the GIL is critical. Will it save me a ton of money? Will
my stack magically just run faster? I wonder if Google has already
done so, since they would benefit quite a bit from a GIL-less
Python.
cool-RR - 4 hours ago
Python's GIL issue is like the Israeli-Palestinian conflict:

1. People like to talk about it a lot, complain about it and give
their opinion of what should be done with it.

2. It's not likely to be resolved for years to come.

3. In the end, the problem has very little effect on people's
lives, much much less than the amount of hype around the issue.
seunosewa - 7 hours ago
The ideal solution is for someone to design a new programming
language that is as similar to Python as possible without requiring
a global lock. Rarely used features that make it hard to
parallelize Python would be dropped. STM might be built into the
language instead of being hacked into one implementation, etc.
mFixman - 7 hours ago
What are these features?
rburhum - 7 hours ago
So basically recreate an entire language and library ecosystem
because there is one feature that is less than ideal? I hope you
realize why a better approach may be to reengineer that one
component...
dTal - 7 hours ago
Python has many less-than-ideal features. Do you think we
finally got it right, that we will use Python forever, and that
the library work of the past decade or so is irreplaceable?

"Is it possible that software is not like anything else, that it
is meant to be discarded: that the whole point is to see it as a
soap bubble?" -- Alan Perlis
ehsankia - 6 hours ago
Just going from py2 to py3, which was a MUCH smaller change
than a whole new language, has taken a decade, and it's still
far from over. I don't see how a whole new language would be
any better. And it's not like there's a lack of new languages
popping up left and right. There's a reason most of them just
die out. It's insanely hard to gain critical mass unless you
have a huge backer, like a whole organization or company using
the language.
rburhum - 4 hours ago
At the end of the day, the reason we write software as
software engineers is to solve real-world problems, not to
have a perfect, beautiful language. What you are describing is
equivalent to doing an amputation when all you need is
antibiotics.

There are several things I personally _hate_ about Python, but
there is a cost-benefit that comes from engineering new
things. What new problems are we going to be able to solve by
using a new language? If the answer is clear (e.g. imperative
programming vs declarative/functional programming lets you
solve different kinds of problems), then it makes sense to do.
If certain constructs enable you to completely avoid a
recurring mistake (e.g. garbage collection), then it may make
sense. But this?!?!? No man, you don't need a new language to
fix this.
digitalzombie - 3 hours ago
> I hope you realize why a better approach may be to reengineer
that one component...

The top comment is proposing basically Erlang or an actor
model. As for immutability... well, they have to either have it
or manage mutable state. That task of engineering is not
something to scoff at, and I think building a new language, or
using an existing language with those abilities, would help.
Erlang is not a number-crunching language. But there are
others, such as Pony.
mrsteveman1 - 7 hours ago
> language that is as similar to Python as possible without
requiring a global lock

Something like Pony[0] or Nim[1]? I'm not very familiar with
either one, but Nim says it is inspired by Python, and on the
surface Pony appears to be as well.

[0] https://bluishcoder.co.nz/2015/11/04/a-quick-look-at-pony.ht...
[1] https://nim-lang.org/features.html
NewEntryHN - 6 hours ago
What makes it so hard for CPython to drop the GIL is keeping
backward compatibility for the CPython C API. If you're willing
to break the API there's no need for a new language.
nine_k - 7 hours ago
If you want other multiplatform, open-source, highly parallel
languages with nice syntax and quick turnaround, we already have
a few, like Elixir, Racket, or, well, even ES6.

Much of Python's appeal is in its huge, colossal, powerful
ecosystem, with modules for everything, and things like numpy or
tensorflow using it as the high-level interface language. Not
breaking this is probably more important for success than
efficient in-process data sharing. (Yes, process pools, queues,
and a shared DB cover most of my cases.)
rlander - 6 hours ago
I don't mean to be pedantic, but please explain how ES6 is a
"highly parallel" language.
nine_k - 5 hours ago
You have web workers, generators, all the async stuff,
futures and promises: plenty enough from the language
perspective. Maybe node.js does not happen to be multi-
threaded, but it's not about the language.
jerf - 4 hours ago
That's all true about Python, too, and it's been true since
before Node existed. If Node were adequate, Python would be
adequate too. In fact, other than "run lots of Javascript",
I'm not sure I can name a single thing Node did before
Python.
heavenlyblue - 4 hours ago
By the same logic Python is multithreading-ready as well,
since it's only a matter of its major implementation that
it doesn't support multithreading.
nine_k - 3 hours ago
> By the same logic Python is multithreading-ready as
well, since it's only a matter of its major implementation
that it doesn't support multithreading.

How are web workers not threads? Browsers are more widely
deployed than Node, even with the same V8 engine.
ryanjl - 7 hours ago
Jython and IronPython lack the GIL. It's just an implementation
detail of the underlying VM. There's nothing in the language
itself which requires a GIL.
fijal - 7 hours ago
The C API... You can argue it's not a part of the language, but
PyPy was forced to support it in the end.
xenadu02 - 5 hours ago
IronPython interoperates with a whole host of C and C++ code.
I'm not sure why this would matter? The initial implementation
may need to assume single-threaded C interface support and
take a global lock, but it wouldn't be a stretch to have these
things declare they are multithread aware and relax that
restriction.

Forgive me, but most of these objections seem like post-hoc
rationalizations. The first step is deciding to support a
GIL-less multithreaded mode. After that, solve the problems
one step at a time. It is amazing how many times accomplishing
"magic" boils down to:

1. Decide we're going to solve this problem.
2. Iterate toward the solution in manageable steps.

#1 is by far the most difficult aspect :)
ryanjl - 3 hours ago
Jython uses the JNI and IronPython does it through C++/CLI;
neither of them supports the CPython extension interface,
meaning the C modules aren't compatible. Because of this,
Jython and IronPython inherit the interface properties of
their respective VMs and can remain thread safe without the
GIL.
dom0 - 7 hours ago
> It's just an implementation detail of the underlying VM.

Python itself is just an implementation detail of the
underlying VM.
jrs95 - 7 hours ago
Perl 6 doesn't have a GIL, and already has a sane concurrency
model, but the lack of libraries and community interest seems to
make that pretty much a non-starter.
DougN7 - 7 hours ago
Well, that and Perl has to be one of the most unreadable
languages out there ;)
vgy7ujm - 2 hours ago
Only if you don't know Perl... To me Python is more
unreadable ;)
bsder - 59 minutes ago
Only if you know ALL of Perl. I used to carry around a Perl
program of my own on a printout to take to VLSI interviews.
That way, when I got the "Do you know Perl?" question, I
could bring it out and force the interviewer into MY stupid
subset of Perl rather than being stuck in his stupid subset
of Perl. That's not a compliment to the language.
[deleted]
smitherfield - 7 hours ago
It's also still dog-slow for the (Perl 5 / scripting-language)
common case, which makes whatever theoretical performance
improvements to its semantics a bit academic at this point:
https://news.ycombinator.com/item?id=15004977
adambyrtek - 7 hours ago
A fork with no backward compatibility is hardly "the ideal
solution". A healthy ecosystem is crucial for the sustainability
of any programming language.
[deleted]
weberc2 - 5 hours ago
If we're talking about creating an entirely new language, I don't
think the "ideal" is Python with a few tweaks; it's going to be
radically different. The point is, you have to draw the line
somewhere, and if you're going to build an entirely new language,
you should probably address as many problems as you can; few will
switch to an incrementally better Python (unless you can give
strong compatibility guarantees, in which case it's arguably not
a new language).
gshulegaard - 5 hours ago
I feel like the GIL is, at this point, Python's most infamous
attribute. For a long time I thought it was also the biggest flaw
with Python... but over time I care less and less about it.

I think the first thing to realize is that single-threaded
performance is often significantly better with the GIL than
without it. I think Larry Hastings' first Gilectomy talk was
extremely insightful (about the GIL in general and about
performance when removing the GIL):
https://youtu.be/P3AyI_u66Bw?t=23m52s

I am not sure I would, personally, trade single-threaded
performance for enabling multi-threaded applications. I view
Python as a high-level rapid prototyping language that is well
suited for business logic and glue code. And for that type of
workload I would value single-threaded performance over support
for multi-threading. Even now, a year later, the Gilectomy
project is still slightly off performance-wise (although it looks
really really close :) ): https://youtu.be/pLqv11ScGsQ?t=27m32s

As noted elsewhere, multi-processing offers adequate
parallelization for this type of logic. Also, coroutines and
async libraries such as gevent and asyncio offer easily
approachable event loops for maximizing single-threaded resource
utilization (a minimal sketch follows below).

It's true that multi-processing is not a replacement for
multi-threading. There definitely are tasks and workloads where
multi-processing and its inherent overhead make it unsuitable as
a solution. But for those tasks, I question whether Python itself
(as an interpreted, dynamically typed language) is suitable.

But that's just my $0.02. If there is a way to remove the GIL
without negatively impacting single-threaded performance or
sacrificing reference counting for a more robust (and heavy) GC,
then I am all for it. But if there is not... I would just as soon
keep the GIL.
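(The sketch: one thread overlapping many I/O waits with asyncio.
The fetch coroutine is a stand-in for real I/O:)

    import asyncio

    async def fetch(i):
        # Stand-in for an I/O-bound operation; while one coroutine
        # awaits, the single thread runs the others.
        await asyncio.sleep(0.1)
        return i

    async def main():
        return await asyncio.gather(*(fetch(i) for i in range(10)))

    loop = asyncio.get_event_loop()
    print(loop.run_until_complete(main()))  # ~0.1s total, not 1.0s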
sametmax - 3 hours ago
Which is why multi-interpreters are a good solution. You keep the
GIL and its benefits, but you lose the cost of serialization and
can share memory.
xenadu02 - 5 hours ago
Many projects have solved this problem with dual compilation
modes, providing two binaries the user can select from at
runtime. Eliminating the GIL doesn't have to mean actually
eliminating it. You could certainly have #defines and/or
alternate implementations that make the fine-grained locks no-ops
when compiling in GIL mode. Conversely, make the GIL a no-op in
multithreaded mode.
VectorLock - 3 hours ago
The GIL has been a much bigger problem for perception than it
ever has been for performance. Python has lost more mindshare
over it than anything else. The few machine cycles that were
ever saved by moving away from it were far outweighed by the
waste of human cycles.
bsder - 1 hours ago
The few machine cycles that were ever saved by NOT moving away
from it (which is the ONLY justification for keeping it) were
far outweighed by the waste of human cycles. If Python would
simply suck it up and eat the 20% performance hit, we could
stop talking about the GIL and start optimizing code to get the
20% back.
est - 7 hours ago
Sub-interpreters look like an interesting idea. I don't mind
being limited to a few primitive immutable object types shared
between threads, as long as something is actually shared.
eslaught - 5 hours ago
The comments here are missing a massive use case: shared memory.
Shared memory isn't just about programmer convenience. It's about
using a machine's memory resources more effectively. Yes, shared
memory is available in multiprocessing, but it doesn't necessarily
interact well with existing codes.

I've been working on adding Python support to Legion [1], a
task-based runtime system for HPC. Legion wants to manage shared
memory so that multiple cores don't necessarily need multiple
copies of the data, when the executing tasks don't conflict (all
are read-only, or access disjoint data). Legion is C++, so this
mostly "just works". Some additional work is required to support
GPUs, but it's still not so difficult. But with Python, if we go
with multiprocessing, we have to switch to a different mechanism.
Worse, Python is an optional dependency for Legion, so we can't
depend on Python's multiprocessing support either. If you have a
large existing project, and a use case that can take advantage of
shared memory, being forced into Python's multiprocessing scheme
for parallelism is a pain.

We've been investigating a dlmopen approach as well, based on this
proof of concept [2]. It turns out that dlmopen in every available
version of libc has a critical bug that prevents it from being
practically useful if you have any desire to make use of native
modules. You can build a custom libc with this patch [3], but
rolling a custom libc is also a massive pain. In all likelihood
we'll end up rolling our own multiprocessing to make this work. If
the GIL were truly gone, though, we could potentially avoid many
of these issues.

[1]: http://legion.stanford.edu/
[2]: https://news.ycombinator.com/item?id=11844268
[3]: https://patchwork.ozlabs.org/patch/496559/
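(For reference, the flavor of shared memory the stdlib does offer:
a sketch using multiprocessing's shared ctypes arrays, which works
but, per the above, can't easily be bolted onto an existing
non-Python runtime:)

    import ctypes
    import multiprocessing as mp

    def partial_sum(buf, lo, hi, out, slot):
        # Every worker sees the same buffer; the data is never
        # copied or pickled.
        out[slot] = sum(buf[lo:hi])

    if __name__ == "__main__":
        n = 1000000
        buf = mp.RawArray(ctypes.c_double, n)  # one shared allocation
        for i in range(n):
            buf[i] = 1.0
        out = mp.RawArray(ctypes.c_double, 4)
        step = n // 4
        ps = [mp.Process(target=partial_sum,
                         args=(buf, i * step, (i + 1) * step, out, i))
              for i in range(4)]
        for p in ps:
            p.start()
        for p in ps:
            p.join()
        print(sum(out))  # -> 1000000.0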
sametmax - 3 hours ago
But then multi-interpreters would allow that, and the article
discards it as a valid solution. I find that harsh. It seems much
easier to implement, doesn't have the same serialization problem
that multiprocessing has, and allows utilizing all the CPUs. Yes,
it's not as good as proper threads because you do have more
overhead, but it's an order of magnitude better than what we
currently have, while being way easier to do than getting rid of
the GIL. Too bad the current project is on hold.
lstyls - 1 hours ago
This is a good point and one of the few convincing arguments I've
heard against the GIL. Thanks for providing so much detail! Did
you consider just mounting a ramdisk and storing the data as
files? At first glance it seems like a decent fit for sharing
read-only data in memory.
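(If numpy is in the mix, a sketch of that idea, assuming a Linux
ramdisk at /dev/shm: the writer saves once, readers map the same
pages read-only:)

    import numpy as np

    # Writer: persist the array once to a ramdisk-backed file.
    data = np.arange(1000000, dtype=np.float64)
    np.save("/dev/shm/shared_block.npy", data)

    # Reader (any number of processes): memory-map instead of loading;
    # the kernel backs every mapping with the same physical pages.
    view = np.load("/dev/shm/shared_block.npy", mmap_mode="r")
    print(view[:10].sum())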
riri-au - 8 minutes ago
Agreed, shared-memory programming is an immensely useful feature
for numerical programming, including data science and machine
learning. Lots of people will say that those should be written in
C++, but I think the rise of machine learning & data science in
high-level languages argues against their point.
detroitcoder - 3 hours ago
^This. It is a very common use case for applications I work with
to create a very large in-memory read-only pd dataframe, put a
Flask interface to operations on that dataframe using gunicorn,
and expose it as an API. If I use async workers, the dataframe
operations are bound by GIL constraints. If I use sync workers,
each process needs a copy of the pd dataframe, which the server
cannot handle (I have never seen pre-fork shared memory work for
this problem). I don't want to introduce another technology to
solve this problem.
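(The usual pre-fork attempt, for the record, is a setup along
these lines, with hypothetical file and column names; it tends to
disappoint on CPython because reference counting writes to every
object a worker touches, so the copy-on-write pages get duplicated
anyway:)

    # gunicorn.conf.py (sketch)
    preload_app = True    # import the app (and its dataframe) in the
    workers = 4           # master, then fork, so workers inherit the
    worker_class = "sync" # pages copy-on-write

    # app.py (sketch); run with: gunicorn -c gunicorn.conf.py app:app
    import pandas as pd
    from flask import Flask, jsonify

    df = pd.read_csv("big_table.csv")  # loaded once, before the fork
    app = Flask(__name__)

    @app.route("/mean/<column>")
    def mean(column):
        return jsonify(value=float(df[column].mean()))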
sambe - 2 hours ago
I rarely hear people complain about genuine use-cases but this
would seem to be one. However, aren't most/all of the dataframe
operations done in C extensions in these cases?
sobkas - 4 hours ago
> fully working PyPy interpreter with no GIL as a release,
possibly separate from the default PyPy release

I have concerns that if such functionality is not in the main
release, enabled by default (and consequently doesn't get as much
testing), it will just bitrot and in the end be removed.
_wmd - 4 hours ago
They're asking for funding to spend on a risk-free attempt at GIL
removal (risk-free since it won't bone PyPy mainline); if the
attempt meaningfully succeeded, I'd imagine their next step would
be making it the default. A fully functional PyPy that could do
heavy math in multiple threads would be an amazing tool in the
box, but there are plenty of risks to that (penalizing single-
threaded performance, for example). So this strategy makes plenty
of sense to me.

They can't just do it on mainline from the outset because there
are huge obstacles to overcome... for example, that ancient foe,
CPython extension interface compatibility, which assumes a single
global lock covering all mutable extension data. I don't think
there will ever be a way around maintaining the GIL for that,
even if pure Python code can freewheel otherwise.
vasilakisfil - 8 hours ago
Just curious: if they solve it in Python, would it be possible to
solve it in Ruby too?
pjmlp - 8 hours ago
It is already solved for Ruby when you use JRuby.
dfox - 1 hours ago
For this environment (i.e. you already have some kind of
concurrent GC) it is probably a significantly easier problem for
Ruby than for Python. Ruby does not have that much dynamism
compared to "everything is a dict of pointers to dicts" Python.
fny - 8 hours ago
This is in the cards for Ruby 3. The plan is to migrate away from
a global interpreter lock to "guilds", which are akin to execution
contexts. These guilds also have a locking mechanism that allows
for parallelism, which they call the "global guild lock". You can
learn more about concurrency in Ruby 3 in this wonderful blog
post: http://olivierlacan.com/posts/concurrency-in-ruby-3-with-
gui...
chrisseaton - 7 hours ago
JRuby, Rubinius and TruffleRuby are all existing implementations
of Ruby without a GIL.
taf2 - 8 hours ago
If I understand correctly the issue in Ruby is the existing C
extensions that have been written to assume the lock exists...
myusernameisok - 8 hours ago
It's the same issue with Python. AFAIK there are a number of
Python libraries that are not thread-safe, and the GIL prevents
them from being an issue.
shwouchk - 7 hours ago
The GIL does not help thread safety in application code (or
external libraries), just in the VM.
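(A classic demonstration: even under the GIL, a compound
read-modify-write can interleave between threads; how often
depends on the CPython version:)

    import threading

    counter = 0

    def bump(n):
        global counter
        for _ in range(n):
            counter += 1  # load, add, store: a thread switch can land between steps

    threads = [threading.Thread(target=bump, args=(100000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)  # often less than 400000: updates lost despite the GIL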
myusernameisok - 5 hours ago
I'm fairly certain you're incorrect. With the GIL you don't
have to lock shared memory, because the assumption is that
only one thread will be running at a time. For example,
shared data structures won't be changed while being
read/written to by multiple threads, because only one
thread is actually running.
aidenn0 - 7 hours ago
I thought the GIL was not held during execution of foreign
code in python (at least that was one point given for why the
GIL wasn't a big deal in practice).
dom0 - 6 hours ago
No, it must be explicitly released. The GIL must be held to
invoke almost all of the Python runtime (main exceptions:
acquiring the GIL, the low-level allocator).
wiremine - 7 hours ago
> We estimate a total cost of $50k...

Just looking at it from a financial perspective, having a great
Python interpreter that doesn't have a GIL seems like a no-brainer
for $50,000, and it creates another reason why people should take
a look at PyPy.

Side note: if you haven't looked at PyPy, check it out, along with
RPython: https://rpython.readthedocs.io/en/latest/
andruby - 7 hours ago
How can they estimate this? What about all the libraries that
might not be compatible with the solution PyPy comes up with?
This feels like a number that might in the end blow up to 10x the
original estimate.
joaodlf - 7 hours ago
This reminds me how the sales/marketing teams in my company
typically sell new features: "Not having this feature costs us
50k a month!" It might well not be the case here, I just found
it funny; 50k is our little magic number.
rguillebert - 7 hours ago
It's not the PyPy developers' job to make every Python library
threadsafe, people writing libraries will have to make their
code threadsafe, like in every other language.
taeric - 7 hours ago
There is a clear difference here, though. Making a change
that could lead to poorly written libraries now being broken
is clearly the fault of the change. Userspace for these
libraries is defined by how it is, not how it was intended.
(And really, was it intended to be dangerous in this way?)
rguillebert - 6 hours ago
Then just use the version of PyPy with a GIL?
masklinn - 6 hours ago
> There is a clear difference here, though. Making a change
that could lead to poorly written libraries now being broken
is clearly the fault of the change.

No, these libraries are already semantically broken, in the
same way that e.g. libraries which didn't properly close their
files and assumed the CPython refcounting GC would wipe their
asses were broken. They're already broken under two non-GIL'd
implementations.
koolba - 6 hours ago
No, the fault in that situation is a user blindly upgrading
PyPy without testing the totality of their software package
and its dependencies. Expecting bad code to magically work
forever is unrealistic and hinders progress.
cjhanks - 5 hours ago
I agree. Even developers who are well aware of how to
write thread-safe code probably don't even bother with
mutex locking in Python. That code isn't poorly written...
it's just code targeting the implementation.
anonacct37 - 6 hours ago
That's not the concern. Python already has threads and race
conditions (although the GIL means that the interpreter
itself probably won't get corrupted while executing a piece
of bytecode). What Python doesn't have is a C API for
extensions that makes sense without a GIL. So ideally a
correct threadsafe C extension will continue to be correct,
which probably implies that a function called
"PyEval_AcquireLock" will continue to provide similar
guarantees. Which means that the process for utilizing more
cores with pure Python code in one process will probably be a
gradual upgrade process.
rguillebert - 6 hours ago
C extensions will still run under the GIL
lemoncucumber - 5 hours ago
Would you say the Python ecosystem is stuffed to the GILs with
incompatible libraries?
heavenlyblue - 4 hours ago
Andrew Godwin had raised £17K out of the £2.5K expected in order
to implement (I believe excellent) Django migrations that are now
part of the official repository:
https://www.kickstarter.com/projects/andrewgodwin/schema-
mig...

Neither do I think that raising $50K for a Python interpreter
would be an issue.

PS: I don't find Django an excellent ORM per se. On the other
hand, it's highly pragmatic, and its automatically-generated
migrations have saved a good chunk of my time.
chubot - 5 hours ago
Who uses PyPy? I have been hearing about it for so long now,
maybe 10 years. And I have been programming in Python almost
full-time for 14 years. But still I don't know anybody who uses
it. It seems like the C extension API is still an issue, or am I
mistaken?
benhoyt - 8 hours ago
Excellent! Where's the Donate button or call to action for
businesses who want to support this? There's a small link in the
sidebar to "Donation page", but that doesn't seem to have a place
to donate for the remove-the-GIL effort.
Herald_MJ - 8 hours ago
> Since such work would complicate the PyPy code base and our
day-to-day work, we would like to judge the interest of the
community and the commercial partners to make it happen (we are
not looking for individual donations at this point).
fijal - 8 hours ago
As mentioned in the blog post, the individual donation buttons are
not a resounding success. I'm happy to sign contracts with
corporate donors (or even individuals) that we'll deliver. My
mail should be public; if not, #pypy on freenode or fijal at
baroquesoftware.com.
btown - 8 hours ago
Is the issue that individual donations are unpredictable (and
therefore difficult to use as justification for such a large
scope increase)? Would you consider setting up something akin
to a Patreon to allow individuals to commit to recurring
monthly support for the project?
fijal - 8 hours ago
The main issue is that the effort it takes to set up and
maintain it greatly outweighs the amount of money we get
(typically). There is also complexity with taxation,
jurisdictions and all kinds of mess that is usually very much
not worth a couple of dollars (e.g. $7/week on gratipay for
example).
unkown-unknowns - 4 hours ago
> If we can get a $100k contract, we will deliver a fully working
PyPy interpreter with no GIL as a release, possibly separate from
the default PyPy release.

If done as a separate release, will that version be maintained in
the future?
andreasgonewild - 6 hours ago
I just can't stop thinking that somewhere along the line one of the
Guidos should have reacted to handing out global locks left and
right. I mean, that's fine as long as it's only you and your
friends using it. But once it starts spreading, these are the kind
of issues that need to be kicked out of the way asap. Lock
granularity affects the entire design of client code; reducing it
basically means rewriting everything. Ah well, at least it serves
as a warning sign for budding language composers like myself.
Snabel did full threading before it walked or
talked: https://github.com/andreas-gone-
wild/snackis/blob/master/sna...

And to any Pythoneers with sore toes out there: pick a better
language or learn to live with it; down-voting me will do nothing
to solve your problems. It's a tool, and we're supposed to pick the
best one for the job, not decide on one for life and defend it to
death. Imagine what could happen if language communities started
working together rather than competing. There is no prize to be
won; we're all being taken for a ride.
[deleted]
tyingq - 8 hours ago
Would this work with cpython extensions that were ported to PyPy?
fijal - 8 hours ago
They would run under the GIL (I can't see the CPython C API being
thread-friendly unless gilectomy succeeds).
tyingq - 8 hours ago
Ah, ok. So this approach doesn't completely remove the GIL,
but removes it as a barrier for pure Python code running in
PyPy? Or does it break the current support for porting cpython
extensions?
fijal - 7 hours ago
It removes it for pure Python code. The C extensions run
under the lock (which it is unfair to call an interpreter
lock any more).