HN Gopher Feed (2017-08-14) - page 1 of 10 ___________________________________________________________________
U.S. judge says LinkedIn cannot block startup from public profile
data
242 points by techrush
http://www.reuters.com/article/us-microsoft-linkedin-ruling-idUS...___________________________________________________________________
danschumann - 1 hours ago
Being a programmer not a lawyer, I like the idea of more rights for
scrapers. I don't want to see the internet partitioned away and
owned by a few companies, especially when that information is often
called a "public profile".
djsumdog - 58 minutes ago
It gets into the incredibly murky water of how the web works.
You're just issuing a request and getting things back. Sometimes
in a web browser, sometimes not. But the content itself may still
be copyrighted. You can't just take it, even though for now, the
publisher/server is allowing you to view it for free. But what if
you only choose to view some of the content (e.g. block ads)? What
if you apply your own styles to change the way that information
is displayed? You're just changing the way the browser represents
that data. You're not redistributing it as your own at this
point. What if you store that data, but don't republish it; just
use it in some search algorithms? There are a whole lot of
interesting grey areas here, but many that already have
precedents that side more with the copyright holders.
moojah - 15 minutes ago
The interesting part here is that linkedin doesn't hold any
copyright on much of the data. You cannot copyright someone's
name and title.
sliverstorm - 7 minutes ago
I think most of those "grey areas" already have plenty of
precedent in the world of art & music. Have you heard of "fair
use", and what it allows and does not?
swiley - 41 minutes ago
Maybe the whole idea of copyright is flawed and harmful?
bitJericho - 17 minutes ago
There's nothing wrong with copyrights; 14 year copyrights.
revelation - 40 minutes ago
There is no grey area. You can not copyright facts. If you
download ("scrape") a webpage and then extract the facts,
whatever you downloaded only exists in volatile memory. So
there is no claim there. The only claim you can make is on the
download itself, hence what LinkedIn chose.
tyingq - 14 minutes ago
It's possible that HiQ is downloading and processing things
that aren't facts. Long passages of text in LinkedIn posts,
recommendations, and comments aren't "facts". Though that
doesn't appear to be the path LinkedIn is using to fight it.
revelation - 8 minutes ago
Copyright isn't thoughtcrime, if they aren't redistributing
it to anyone with standing to sue (very likely not
LinkedIn, no matter their ToS) they can process things all
day.
tyingq - 2 minutes ago
I did not say or imply "thoughtcrime". Just noting it
can be more complex. Sentiment analysis of copyrighted
text passages might be claimed as being a derivative work
for example. Fair use does have limits. The standing to
sue is an issue for user generated content, yes.
onion2k - 35 minutes ago
I've seen plenty of LinkedIn profiles that could be
classified as fiction.
smegel - 58 minutes ago
If a website puts something on the public internet, it should not
even be aware if it is being accessed by a scraper or a
human. Maybe we should just ban User Agent strings and be done
with it.
laumars - 21 minutes ago
It really doesn't take much effort to detect the majority of
scrapers. Usually you do so by monitoring patterns of any given
IP. Is each request a profile page incremented (/users/1,
/users/2, etc) or dozens of requests a minute (faster than a
typical user would read)? Is static content (particularly
images and CSS) being downloaded too or just the HTML
content? Sometimes the referrer HTTP header can give clues too -
though you have to be careful there as that's as unreliable as
the user agent header. However, if you're really paranoid about
scrapers you can also throw in some honeypots, e.g. a fake user
(/users/13) which is a user account that doesn't exist, so that
page wouldn't have any links from within your site, i.e. you only
reach it if you're incrementing through the user IDs. Or
perhaps a link within your HTML which doesn't render so it's
only reachable via automated scripts that don't check what
links are rendered inside the display view. Anyone that gets
ensnared in your honeypot could then be put on a temporary IP
blacklist. Though the danger of doing this is you accidentally
blacklist good crawlers if you're not careful about setting
appropriate robots rules.
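
A rough sketch of the heuristics described above (incrementing profile IDs,
request rate per IP, a honeypot page, and a temporary IP blacklist). This is
only an illustration: the class name, the thresholds, and the blacklist
duration are made-up assumptions, /users/13 is the hypothetical honeypot from
the comment, and the static-content check (images/CSS) is left out for brevity.

```python
from collections import defaultdict, deque
import re
import time

USER_PATH = re.compile(r"^/users/(\d+)$")
HONEYPOT_PATHS = {"/users/13"}   # hypothetical page that nothing links to
MAX_REQUESTS_PER_MINUTE = 30     # assumed threshold, tune for real traffic
MAX_SEQUENTIAL_IDS = 5           # assumed: this many +1 steps in a row is suspicious
BLACKLIST_SECONDS = 600          # assumed temporary blacklist duration


class ScraperDetector:
    """Tracks per-IP request patterns and flags likely scrapers."""

    def __init__(self):
        self.recent = defaultdict(deque)    # ip -> request timestamps in the last 60 s
        self.last_id = {}                   # ip -> last /users/<id> requested
        self.run_length = defaultdict(int)  # ip -> consecutive incrementing ids seen
        self.blocked_until = {}             # ip -> time when the blacklist expires

    def is_blocked(self, ip, now):
        return self.blocked_until.get(ip, 0) > now

    def observe(self, ip, path, now=None):
        """Record one request; return a reason string if flagged, else None."""
        now = time.time() if now is None else now
        if self.is_blocked(ip, now):
            return "blacklisted"
        reason = self._check(ip, path, now)
        if reason:
            self.blocked_until[ip] = now + BLACKLIST_SECONDS
        return reason

    def _check(self, ip, path, now):
        # Honeypot: only reachable by guessing or walking IDs.
        if path in HONEYPOT_PATHS:
            return "honeypot"

        # Rate: count requests from this IP in a sliding 60-second window.
        window = self.recent[ip]
        window.append(now)
        while window and now - window[0] > 60:
            window.popleft()
        if len(window) > MAX_REQUESTS_PER_MINUTE:
            return "rate"

        # Sequential IDs: /users/1, /users/2, /users/3, ...
        match = USER_PATH.match(path)
        if match:
            uid = int(match.group(1))
            if self.last_id.get(ip) is not None and uid == self.last_id[ip] + 1:
                self.run_length[ip] += 1
            else:
                self.run_length[ip] = 0
            self.last_id[ip] = uid
            if self.run_length[ip] >= MAX_SEQUENTIAL_IDS:
                return "sequential"
        return None
```

In use, a server would call observe(ip, path) for every request and refuse
flagged IPs until the blacklist expires. As the comment warns, well-behaved
crawlers should be kept away from the honeypot with robots rules, otherwise
they get blacklisted too.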
rev_bird - 54 minutes ago
I'd be willing to bet that the user-agent field isn't the
problem; it's patterns that everything looks for now, right?
People have been lying in the request headers for decades.
CaptSpify - 11 minutes ago
I've kind of always thought that we shouldn't be using UA
strings. Just give the requester the data that they requested
according to the current open standards. If they choose not to
render it correctly, then that's their problem. Yes, I realize
that it's not that simple, but I think browsers would have
tried much harder to adhere to standards if we had done it that
way.
sbarre - 32 minutes ago
You call it the "public internet" but it's most definitely not
a public space or anything like it. Private entities own and
operate all(most of) the servers, services and conduits, and
that does need to be paid for and maintained. I'm not saying I
agree with LinkedIn in this particular scenario, but this is
about two commercial for-profit entities arguing over money, so
let's not make it about something it's not.
smegel - 26 minutes ago
> Private entities own and operate all(most of) the servers,
services and conduits, and that does need to be paid for and
maintained
And are MORE than happy to send the content of
their servers to unsolicited, uninvited, anonymous guests on
mere request. No-one is forcing them to do so!
bogomipz - 16 minutes ago
>"You call it the "public internet" but it's most definitely
not a public space or anything like it."
How is it not a "public space"? They publish publicly visible A records for
their site as well as route their public IP space to transit
providers in order for the public to be able to reach their
site.
FLUX-YOU - 1 hours ago
Can they claim a tax credit for supporting that bandwidth usage
and handling abuse?
tomc1985 - 1 hours ago
Not needed. Cost of doing business.
FLUX-YOU - 18 minutes ago
I don't trust the US government to write good rights for
scrapers. They can't even do computer crime sentences well. At
best, it's a burden for no solid gain for society. At worst,
there will be loopholes used to DoS businesses because they
can't shut down individuals due to law-given rights, and that
will lead to court fights. These rights would do nothing but
save scraper authors from learning to obfuscate their
actions.
iamleppert - 16 minutes ago
I fully support this decision. If you're offering a service that is
public, with the intent to your users that such information will be
available publicly, you cannot then police what users of that data
you consider to be "public" because it serves your business
interest. LinkedIn, of course, wants to get all the benefit of the
public Internet while providing as little as they can. This, coming
from someone who used to work at LinkedIn. These companies have
built their fortunes on the public Internet and now that they are
successful they seek to not pay homage to the platform that gave
them their success. It's very clearly anti-competitive, and bad for
users. LinkedIn should be forced to compete based upon the veracity
and differentiation of their service, not because they have their
users' public data held hostage from competitors.
PaulHoule - 15 minutes ago
On one hand, LinkedIn is like Twitter, Craigslist and Delicious in
that it has sat on a treasure trove of data without helping users
mobilize it. (All of the premium services they offer are outright
lame; if there was a market for premium services we might see
some good ones.) On the other hand, privacy is an issue too.
LinkedIn lets you download a spreadsheet with the email addresses
of all your connections, and if you have a lot of connections you
will regularly get e-mail messages from life coaches, "managing
directors", software development outsourcers, "SEO experts", and
all kinds of BS artists.
fav_collector - 51 minutes ago
Does this ruling include regular anti-scraping defenses that might
stop HiQ, but don't specifically target them?
jlgaddis - 1 hours ago
RTFA before commenting, folks. Questions and misunderstandings in
this thread that are easily fixed if you just RTFA.
DanBlake - 1 hours ago
This seems very at-odds with previous rulings (specifically,
relating to Craigslist's many past dealings). Strikes me as being
very unlikely to stand up to appeal. Also, LinkedIn will likely
modify their website's behavior (make you click to agree before you
view a profile) which would create a binding 'click wrap' stopping
companies from scraping them.
devrandomguy - 1 hours ago
That click wrap contract is kind of an interesting thing on its
own, for those of us who only enable JS when absolutely
necessary. If I never see the agreement, and I am not
specifically avoiding it, does it still apply to me?
jlgaddis - 1 hours ago
Did you click a button saying you agreed to it?
CosmicShadow - 52 minutes ago
There was a big ruling in Canada about this specifically around
MLS, the big real estate monopoly we have, so that if you go to
their sites to search for homes, like you'd see at Realtor.ca,
you have to click through a clickwrapper to access any data,
and even if you automate past that, the fact that a human would
have to click it means that it's illegal to scrape since you
are forced as a human to agree to a TOS before you view.
devrandomguy - 33 minutes ago
Ah yes, that. MLS compliance was a source of many tickets for
me, in a previous job, in Canada. The employer didn't even
want me to waste time trying to learn it all, just follow the
compliance officer. IIRC, this stuff varies quite a bit from
region to region, even within a single metropolitan area.
Attempting to simultaneously comply with multiple
independently developed rulebooks was ... fun. I can't wait
for shipyard startups to disrupt the housing market. /s
aneutron - 1 hours ago
Make the profile loadable via XHR and problem solved. For
example. (Which I bet is already the case.)
jlgaddis - 1 hours ago
Appeal? The case hasn't even been heard yet. This was a
preliminary injunction; it's far from over!
dragonwriter - 26 minutes ago
A preliminary injunction can be the subject of an interlocutory
appeal.
bkanber - 1 hours ago
It's just a preliminary injunction used to maintain the status
quo (ie, allowing scrapers) while the case is heard. A
preliminary injunction is basically "ok everyone stop what you're
doing, maintain business as usual until the court rules."
[deleted]
isalmon - 36 minutes ago
The biggest reason why they have not done this so far is SEO. If
you introduce the 'click wrap' - other crawlers like Google won't
be able to crawl it, so their traffic will decrease overnight.
[deleted]
dawnerd - 29 minutes ago
They'd most certainly whitelist the google ips.
fooey - 16 minutes ago
that's against TOS for Google SERPs. Showing different results
to google than you do to users is called cloaking and it's
not allowed
walterbell - 1 hours ago
Is the reasoning in this case different from the Craigslist/3taps
dispute?
opaque - 1 hours ago
Does anyone know how they do this scraping from a technical
standpoint? The articles allude to it being the same data as
Google/Bing spiders, which can clearly access more data than the
average internet IP for making their result summaries. I had
assumed big sites whitelisted specific crawler IP ranges or User-
Agents for the search giants. Do they somehow spoof this?
revelation - 32 minutes ago
I don't think they do any such thing, if anything they are
rotating IPs/user agents to avoid being limited or blocked. Google
requires sites to send the crawler the same content as someone
clicking a link on a Google results page would see, so even if
some sites get creative covering it up with blurred boxes and
similar dark patterns, the data is there in the markup.
SomeStupidPoint - 1 hours ago
Does anyone know where to view this ruling? I'm curious how it
passes free-association muster: you're not allowed to discriminate
on particular tasks, but there's no reason you can't discriminate
based on, e.g., behavior or user-agent or IP address. It seems very
strange to me that the judge would order MS to associate against
their will prior to hearing the arguments.
marksomnian - 1 hours ago
It's a preliminary injunction, not a ruling. For all we know,
the ruling could be completely different.
dwynings - 1 hours ago
http://online.wsj.com/public/resources/documents/2017_0814_h...
EpicDavi - 59 minutes ago
After skimming the document for a bit, hiQ's argument looks
really flaky. Especially grasping at straws like "Free Speech".
They argue that LinkedIn is like a public mall and denying them
access to the mall is denying them "Free Speech"? I don't see
how this can be the case if they had no intent to "speak" at
all in this place. Their data collection via scraping seems
more like people-watching in the mall, if you go along with
their analogy.
SomeStupidPoint - 1 hours ago
Wow, that sign analogy is really faulty.
bigtones - 1 hours ago
This is just a preliminary injunction and the court has not even
heard or ruled on this case. They just allowed HiQ to access the
data while they wait for the scheduled court hearing to begin. The
court may eventually rule very differently once they have heard all
the evidence presented and weighed up existing applicable case
law. The judge who issued this injunction, Edward Chen, is also the
judge presiding over the Uber drivers as independent contractors
class action case.
DannyBee - 26 minutes ago
"This is just a preliminary injunction and the court has not even
heard or ruled on this case"
This is not quite right. One of the
requirements to get a PI is a likelihood of success on the merits
;)
cookiecaper - 8 minutes ago
I agree that we need to be careful not to read too much into
this, but in most scraping cases I know about, preliminary
injunctions are granted as a matter of routine. The fact that this
judge refrained from doing so may signal that the judiciary is
finally willing to bring some nuance and rationality to their
interpretation of extremely broad statutes like the CFAA. It's a
positive signal, even if ultimate victory remains unlikely. /me is
not a lawyer
djsumdog - 1 hours ago
Wow, those are two very big issues, that affect two very large
industries, not to mention the implications and precedent for
both free speech and workers' rights.
schoen - 57 minutes ago
There are not that many judges on the Northern District of
California https://www.cand.uscourts.gov/judges and cases
involving Silicon Valley companies are very often filed here,
so quite a lot of the high-profile industry matters end up
getting heard by the same judges!
charlesdm - 1 hours ago
"U.S. District Judge Edward Chen in San Francisco granted a
preliminary injunction request brought by hiQ Labs, and ordered
LinkedIn to remove within 24 hours any technology preventing hiQ
from accessing public profiles."
Interesting ruling
danschumann - 1 hours ago
I would hope there is a special consideration for any anti-ddos
technology they have. It would be hard to differentiate between
a ddos'er and a scraper. Rate limiting for ddos attacks might
affect a scraper; then the question (that LinkedIn is asking)
is how low can we limit them without looking like we're blocking
them. I have a feeling this isn't over!
0xCMP - 1 hours ago
I wonder if this isn't such a big deal since it's not like
they're gonna verify beyond "can they scrape now?" As long as
that is true then they will likely not run into issues. Other
issues are not for blocking them and a case can be made that it's
a separate issue. Defending against common internet attacks is
an easy case to make to a judge. He can't expect LinkedIn,
in this case, to kill their service so someone can scrape.
tryingagainbro - 1 hours ago
Fixed the headline: U.S. judge says LinkedIn cannot block startup
from public profile data; the judge will personally pay for the
gazillion servers and man hours needed now that scrapers cannot be
blocked.
PatrickAuld - 1 hours ago
> "We will continue to fight to protect our members' ability to
control the information they make available on LinkedIn."
LinkedIn
has full control over this, it's their site. What they are fighting
for is the ability to choose who gets public access to various
pieces of information, which its members do not get control over.
polote - 28 minutes ago
So they want to forbid a startup to scrape the personal data of
their users as if LinkedIn was the only company allowed to have
access to this data. I mean it is completely crazy, it is not
LinkedIn's data, it is OUR data