[Paper notes] Budget-optimal crowdsourcing using low-rank matrix approximations (Allerton, 2011)


Posted November 26, 2012

Time: 2.5 hours
Timespan: Feb 15 – Mar 24, 2012
Karger, D., Oh, S., and Shah, D. Budget-optimal Crowdsourcing using Low-rank Matrix Approximations. Proc. of the Allerton Conf. on Communication, Control, and Computing, (2011).

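Work keeps me busy and leaves little time for reading papers. Even so, I want to read at least one or two a month so the habit does not die out. With March almost over, I made time for one over the past few days.

The paper's first author, David R. Karger, is a professor in EECS at MIT. He received his PhD in computer science from Stanford in '95, and his main research interests are "information retrieval (particularly our haystack project) and analysis of algorithms".

My notes on the paper follow.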

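Background
Crowdsourcing systems ("CS systems" below), which have emerged in recent years, are an effective approach to "human-powered solving of large scale problems".
CS systems rely on a large pool of low-cost labor, so how to ensure the quality of the submitted results is a critical question.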

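Problem addressed
How can cost be minimized while keeping the overall results reliable?
The paper makes two observations about workers in a CS system: (1) it is hard to build a trust relationship with any particular worker; (2) it is hard to pay according to the quality of the answers.
Under these conditions, CS systems usually rely on redundancy to make results reliable (collecting several answers to the same question), and every submission is paid the same fee regardless of quality.
The problem can therefore be stated as: "What is the minimum number of workers each task must be given to so that the overall results are reliable?" Two sub-problems follow: (1) "choice of task assignment": how should tasks be scheduled and assigned? (2) How can the correct answer be inferred from the redundant answers?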
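To make the redundancy idea concrete, here is a minimal sketch of the simplest possible aggregation rule, majority voting over the answers collected for each task. This is only a baseline for illustration and not the paper's inference algorithm; the task names and answers are made-up data, and the tie-breaking rule is an arbitrary assumption.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer among redundant worker answers.

    `answers` is a list of booleans, one per worker.
    Ties go to True, an arbitrary choice made for this sketch.
    """
    counts = Counter(answers)
    return counts[True] >= counts[False]

# Hypothetical data: three tasks, each answered by five workers.
task_answers = {
    "task_a": [True, True, False, True, True],
    "task_b": [False, False, True, False, False],
    "task_c": [True, False, True, False, True],
}
estimates = {task: majority_vote(a) for task, a in task_answers.items()}
```

Majority voting treats every worker as equally reliable, which is exactly the limitation that motivates a smarter inference step.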

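Notation and modeling
See the paper (S1). The notation closely resembles what is used when modeling the Web service composition problem as an integer program.
The paper models the problem with a bipartite graph (S2.A).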

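Assumptions the paper makes about the CS system:
(1) One-shot model: when the same question is assigned to several workers, their answers are submitted "simultaneously". (S1)
(2) The CS system considered is fairly simple: each task only requires a T/F answer (for example, given a picture, decide whether it is suitable for minors; submit T if it is, F otherwise). (S1.Setup)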

Approach
(1) Map "assigning tasks to workers" to "designing a bipartite graph", and use the configuration model to generate an (l, r)-regular bipartite graph. (S2.A)
One passage here is fairly mathematical: "a sparse regular random graph is known to have a large spectral gap", and "[using] a graph with a large spectral gap makes it easier to find the meaningful signal from the noisy data". The inference algorithm described later in the paper relies on this.
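The graph construction in (1) can be sketched as follows. This is a minimal illustration of the configuration model under my own assumptions: each of m tasks gets l "half-edges", each of n = m·l/r workers gets r half-edges, and the two sides are paired by a random permutation. The function name and parameters are hypothetical, and repeated (task, worker) pairs are left in place rather than resampled.

```python
import random

def regular_bipartite_graph(m, l, r, seed=None):
    """Sample an (l, r)-regular bipartite multigraph via the configuration model.

    Every task appears in exactly l edges and every worker in exactly r edges.
    Returns the number of workers and a list of (task, worker) pairs.
    """
    if (m * l) % r != 0:
        raise ValueError("m * l must be divisible by r")
    n = m * l // r
    rng = random.Random(seed)
    # One entry per half-edge on each side of the graph.
    task_stubs = [i for i in range(m) for _ in range(l)]
    worker_stubs = [j for j in range(n) for _ in range(r)]
    # A uniformly random matching between task and worker half-edges.
    rng.shuffle(worker_stubs)
    edges = list(zip(task_stubs, worker_stubs))
    return n, edges

n_workers, edges = regular_bipartite_graph(m=6, l=3, r=2, seed=0)
```

With m = 6, l = 3 and r = 2 this gives 9 workers and 18 assignments in total, so the budget is simply the number of edges.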

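(2) Propose an inference algorithm (S2.B), which is a low-rank approximation algorithm.
Input: an m × n matrix, where m is the number of tasks and n is the number of workers; the matrix records the answers the workers provided.
Output: a length-m vector of inferred answers to the tasks (the "unobserved solution vector").
(3) Combine the two parts above into the Budget-optimal Crowdsourcing algorithm (S2.C), used to "compute the total budget sufficient to achieve a target error rate".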
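The low-rank idea behind the inference step in (2) can be sketched with a rank-1 SVD. This is my own minimal illustration and not the paper's exact procedure: I assume T/F answers are encoded as +1/−1 with 0 for "not assigned", and I resolve the sign ambiguity of the singular vectors by assuming most workers are right more often than wrong. The function name and the example matrix are hypothetical.

```python
import numpy as np

def infer_answers(A):
    """Estimate task answers from the top singular vectors of A.

    A is an m x n array: A[i, j] is +1 or -1 for worker j's answer on task i,
    and 0 if task i was not assigned to worker j (encoding assumed here).
    """
    U, _, Vt = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
    u, v = U[:, 0], Vt[0, :]
    # The SVD fixes (u, v) only up to a joint sign flip. Assumed rule for this
    # sketch: choose the sign under which most of the workers' weight is
    # positive, i.e. trust that workers are right more often than wrong.
    if np.sum(v[v >= 0] ** 2) < 0.5:
        u = -u
    return np.where(u >= 0, 1, -1)

# Hypothetical answers for 5 tasks (rows) from 4 workers (columns).
# Workers 1 and 2 are always right, worker 0 errs on task 1,
# and worker 3 always gives the wrong answer.
A = np.array([
    [ 1,  1,  1, -1],
    [ 1, -1, -1,  1],
    [ 1,  1,  1, -1],
    [-1, -1, -1,  1],
    [ 1,  1,  1, -1],
])
estimate = infer_answers(A)
```

The left singular vector plays the role of the task answers and the right singular vector that of worker reliability, so a consistently wrong worker ends up with a negative weight instead of corrupting the estimate.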

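Other notes
1. S2.D discusses more general models: varying task difficulty, differences in worker experience and reliability, worker bias, and so on.

2. Following the technical details of the paper requires brushing up on some math. I did not read the mathematical details closely this time.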


Author: vaguely

Posted by vaguely in the General category; last updated November 26, 2012.
