
User preferences for search engines by John Langford@Y! research

August 13, 2013

You can access the original here.

=======

As a not-distant outsider, let me mention the sources of bias I may have. I work at Yahoo!, which has started using Bing. This might predispose me towards Bing, but on the other hand I’m still at Yahoo!, and I have used Linux exclusively as my OS for many years, even contributing a couple of minor kernel patches. And, on the gripping hand, I’ve spent quite a bit of time thinking about the basic principles of incorporating user feedback in machine learning. Also note that this post does not reflect official Yahoo! policy; it’s just my personal view.

The issue
Google engineers inserted synthetic responses to synthetic queries on google.com, then executed those synthetic searches on google.com using Internet Explorer with the Bing toolbar installed, and later noticed some of the synthetic responses appearing in Bing’s results for the same synthetic queries.

There are two kinds of disagreement which people might have with this.

One is the privacy disagreement: “Big Brother Microsoft is looking at what I search and using it”. I’m sympathetic on this count, but also sympathetic to the counterargument that the data collected has value and can improve the results for all users. In the end, I think companies should simply do their best to respect a user’s wishes, so that those who want privacy can have it, and those who want to contribute their data towards improving a search engine can do so. The precise manner of achieving this, whether by opt-in, opt-out, differential privacy, anonymization, or other techniques, is not entirely clear to me.

Let’s assume the privacy issue is dealt with. This is at least partly and possibly grossly untrue, but I want to focus on the other issue, and this assumption simplifies its discussion: once the privacy issue is dealt with, a user and their internet browser are synonymous, since the browser’s actions are a true reflection of the user’s preferences.

The other issue is an originality disagreement, which is where much of the discussion focuses. What I believe happened was a user feedback process: users queried Google, clicked on a result, informed Microsoft/Bing (via the toolbar) of the query and the clicked result, and their preference was used to promote that result within Bing. Now, there is a slippery slope of questions. Should a user be allowed to:

  1. Reveal to their chosen search engine their preferred result?
  2. Reveal to a competitor’s search engine their preferred result?

If you answer ‘no’ to the first, you are deeply against user freedom in a manner I can’t sympathize with. If you answer ‘yes’ to the first and ‘no’ to the second, then you are still somewhat against user freedom. This isn’t too crazy a stance, as various people sell information and require their users not to retransmit it. One of the more famous examples of this is the Bloomberg Terminal. However, in all instances I’m aware of, users knowingly agree to a contract providing access to the information with limitations. Google never entered into such a contract with its users, and I don’t know of a sound basis for even an implicit contract. So my answers are “yes, and yes” here.

But this doesn’t entirely deal with the issue of originality. You could argue that it’s ok for Microsoft to take advantage of revealed user interaction, but that it’s still a matter of following rather than leading. This argument is simplistic and wrong, as I expect all informed parties already understand. A basic truth, seen in many ways, is that the proper incorporation of new sources of information always improves results. This is true in machine learning, where sample complexity results and cotraining formalize the mechanisms and value of incorporating additional information, and it was heavily used by all competitive teams in the Netflix Prize competition.
More generally, it’s true in basic knowledge engineering, where people
fuse sources of information to create a better system, and I’m
virtually certain it’s true of the ranking algorithms behind Google and
Bing, which are surely complex beasts taking into account many sources
of information. I know no details about the algorithm which Microsoft
is using, but it’s quite plausible that they incorporated this
information well enough to improve the quality of their results, perhaps
in some instances so they are better than Google’s or the earlier
version of Bing’s. If that’s the case, Google will either follow Microsoft’s lead by taking user feedback into account as Microsoft does, or risk becoming obsolete.

We can also think about things in terms of the future. A basic truth is that building a successful search engine is extraordinarily difficult. This is revealed by search market share, but also by simply thinking about the logistics involved. You need to crawl the web, have server farms all over the world (because the speed of light just isn’t fast enough), and incorporate many sources of information in just the right way in order to succeed, all while adversaries try to corrupt your results. If we prefer a future where there is a
healthy competition amongst search engines, then it’s important to lower
these barriers to entry so new people with new ideas can more easily
test them out. One way to lower the barrier to entry is to accept that
users can share their interaction, even with a competitor’s search
engine.

Perhaps it’s inevitable that Amit Singhal has a viewpoint driving towards a monopoly on internet search.
However, Google has generally been relatively good about supporting a
rich ecosystem of innovation for information technology development, so I
am still somewhat surprised. I would be more sympathetic to a position
for allowing users of Internet Explorer a built-in means to choose to
share their search behavior with Google or other search engines on an
equal footing.
