Data Streams: Constructing a Moving Hyperplane (HyperPlane)

October 21, 2013

Reposted from Koala++'s blog, with thanks to the original author.

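The moving hyperplane is a very good way to simulate a data stream (I feel it is probably the best). That is not because it has no drawbacks, but because the other simulation methods are rather poor.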

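My own program was lost once again when a hard disk died. Fortunately I found a more powerful collection of data stream generators (although for the moving hyperplane I need it is fairly weak: the program I wrote could produce many kinds of variation and was very convenient to use). Here are the links: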

http://www.cs.waikato.ac.nz/~abifet/MOA/software.html (the MOA page)

http://sourceforge.net/projects/moa-datastream/ (download)

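It implements Hoeffding trees. I have also read about VFDT, cVFDT, and an improved version, VFDTc; in any case, I don't like that approach. What I cover here is one of its stream generators, HyperplaneGenerator (in the wDriftGenerators.zip package). The method was proposed by the authors of VFDT, and Wang's classic paper uses it as well.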

I added a couple of lines of code myself, just to get it running:

// Added inside HyperplaneGenerator.
// Requires: import java.io.File; import java.io.FileWriter; import java.util.HashMap;
public void write2File(String path, int N) {
    HashMap<Integer, Instance> map = new HashMap<Integer, Instance>();

    generateHeader();
    restart();

    // Generate N instances and keep them in memory.
    for (int i = 0; i < N; i++) {
        map.put(i, nextInstance());
    }

    try {
        File file = new File(path);
        FileWriter filewriter = new FileWriter(file, false);

        // Write the ARFF header first, then one instance per line.
        filewriter.write(streamHeader.toString());
        for (int i = 0; i < N; i++) {
            filewriter.write(map.get(i) + "\n");
        }
        filewriter.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

public static void main(String[] args) {
    HyperplaneGenerator g = new HyperplaneGenerator();
    g.write2File("./data.arff", 10000);
}

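You might ask: if I generate 100 million samples, do I have to read the whole 100-million-sample dataset into memory? Of course not; just split the 100 million samples into several datasets yourself. Also, use a math package to look at how the data drifts. If you don't understand the data, generating 10,000 samples or 1,000,000,000 makes no difference.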
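The splitting idea can be sketched as follows. This is only a minimal sketch, not part of MOA: the class ChunkedStreamWriter, the method writeChunks, the data-N.arff file names, and the toy header are all hypothetical, and the sample source is modeled as a plain Supplier<String> rather than the generator itself. Each row is written as soon as it is produced, so nothing larger than one row is held in memory.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;
import java.util.function.Supplier;

/** Hypothetical helper (not part of MOA): streams rows into several files. */
final class ChunkedStreamWriter {

    /**
     * Pulls `total` rows from `nextLine` and writes them to files holding at
     * most `chunkSize` rows each; every file starts with `header`.
     * Returns the number of files written.
     */
    static int writeChunks(Supplier<String> nextLine, String header,
                           long total, int chunkSize, Path dir) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        Files.createDirectories(dir);
        int fileCount = 0;
        long written = 0;
        while (written < total) {
            Path file = dir.resolve("data-" + fileCount + ".arff");
            try (BufferedWriter out = Files.newBufferedWriter(file)) {
                out.write(header);
                // Write each row immediately instead of collecting rows first.
                for (int i = 0; i < chunkSize && written < total; i++, written++) {
                    out.write(nextLine.get());
                    out.newLine();
                }
            }
            fileCount++;
        }
        return fileCount;
    }

    public static void main(String[] args) throws IOException {
        Random random = new Random(1);
        // Toy header and rows, standing in for streamHeader and nextInstance().
        String header = "@relation toy\n@attribute x numeric\n@data\n";
        int files = writeChunks(() -> Double.toString(random.nextDouble()),
                header, 1_000_000, 250_000, Paths.get("chunks"));
        System.out.println(files + " files written");
    }
}
```

In the generator itself, the supplier would presumably wrap nextInstance() and the header would come from streamHeader, as in write2File above.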

Now let's walk through the code:

public void restart() {
    // Re-seed the random source so the stream starts over from the same state.
    this.instanceRandom = new Random(this.instanceRandomSeedOption.getValue());
    this.weights = new double[this.numAttsOption.getValue()];
    this.sigma = new int[this.numAttsOption.getValue()];
    for (int i = 0; i < this.numAttsOption.getValue(); i++) {
        this.weights[i] = this.instanceRandom.nextDouble();
        // Only the first numDriftAtts attributes drift.
        this.sigma[i] = (i < this.numDriftAttsOption.getValue() ? 1 : 0);
    }
}

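restart initializes the weights (weights) and the direction in which each weight moves during concept drift, up or down (sigma, which corresponds to the original's "specify the total number of dimensions whose weights are changing"). You may wonder why this function is not called init or start. My guess is that the generator may also be meant to produce abrupt drift on top of gradual drift, or something like recurring drift.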
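To make the initialization concrete, here is a small self-contained sketch. HyperplaneState is a hypothetical stand-in, not MOA's class, and the option objects are replaced by plain constructor parameters. It also shows one property that follows from re-seeding the Random: calling the initialization again with the same seed reproduces the same weights.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical stand-in for the generator's state; not MOA's class. */
final class HyperplaneState {
    final double[] weights;
    final int[] sigma;

    HyperplaneState(long seed, int numAtts, int numDriftAtts) {
        Random random = new Random(seed);           // same seed, same sequence
        weights = new double[numAtts];
        sigma = new int[numAtts];
        for (int i = 0; i < numAtts; i++) {
            weights[i] = random.nextDouble();       // uniform in [0, 1)
            sigma[i] = (i < numDriftAtts) ? 1 : 0;  // only the first numDriftAtts drift
        }
    }

    public static void main(String[] args) {
        HyperplaneState a = new HyperplaneState(1L, 10, 2);
        HyperplaneState b = new HyperplaneState(1L, 10, 2);
        System.out.println(Arrays.equals(a.weights, b.weights)); // prints true
        System.out.println(Arrays.toString(a.sigma));
        // prints [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    }
}
```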

public Instance nextInstance() {
    int numAtts = this.numAttsOption.getValue();
    double[] attVals = new double[numAtts + 1];  // last slot holds the class
    double sum = 0.0;
    double sumWeights = 0.0;
    for (int i = 0; i < numAtts; i++) {
        attVals[i] = this.instanceRandom.nextDouble();
        sum += this.weights[i] * attVals[i];
        sumWeights += this.weights[i];
    }

    // Positive if the weighted sum reaches half of the total weight.
    int classLabel;
    if (sum >= sumWeights * 0.5) {
        classLabel = 1;
    } else {
        classLabel = 0;
    }

    // Add noise: flip the label with probability noisePercentage / 100.
    if ((1 + (this.instanceRandom.nextInt(100))) <= this.noisePercentageOption.getValue()) {
        classLabel = (classLabel == 0 ? 1 : 0);
    }

    Instance inst = new Instance(1.0, attVals);
    inst.setDataset(getHeader());
    inst.setClassValue(classLabel);
    addDrift();
    return inst;
}

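This code generates one sample. It is very simple (in fact, all of this is simple): sum is the dot product of the weight vector and the sample vector, and sumWeights is the sum of the weights. As described in the paper, a sample whose sum is at least half of sumWeights is a positive example; otherwise it is a negative example. Noise is produced by flipping the sample's class. Finally, addDrift is called to apply the drift.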
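The labeling rule and the noise step can be tried in isolation. The sketch below is a hypothetical standalone version (HyperplaneLabeler, label, and addNoise are names made up for illustration) that mirrors the arithmetic in the listing without depending on MOA's Instance class.

```java
import java.util.Random;

/** Hypothetical standalone version of the labeling rule; not MOA's code. */
final class HyperplaneLabeler {

    /** Returns 1 if the weighted sum reaches half of the total weight, else 0. */
    static int label(double[] weights, double[] x) {
        double sum = 0.0;
        double sumWeights = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * x[i];
            sumWeights += weights[i];
        }
        return sum >= 0.5 * sumWeights ? 1 : 0;
    }

    /** Flips the label with probability noisePercentage / 100. */
    static int addNoise(int label, int noisePercentage, Random random) {
        // 1 + nextInt(100) is uniform over 1..100, as in the listing.
        boolean flip = 1 + random.nextInt(100) <= noisePercentage;
        return flip ? 1 - label : label;
    }

    public static void main(String[] args) {
        double[] weights = {1.0, 1.0};
        System.out.println(label(weights, new double[] {0.9, 0.9})); // prints 1
        System.out.println(label(weights, new double[] {0.1, 0.1})); // prints 0
    }
}
```

Since the attribute values are uniform in [0, 1), the threshold of half the total weight sits at the expected value of the weighted sum, so when the weights are non-negative the two classes come out roughly balanced.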

private void addDrift() {
    for (int i = 0; i < this.numDriftAttsOption.getValue(); i++) {
        // Move the weight by magChange in the direction given by sigma.
        this.weights[i] += (double) sigma[i] * (double) this.magChangeOption.getValue();
        // Reverse the direction with probability sigmaPercentage / 100.
        if (//this.weights[i] >= 1.0 || this.weights[i] <= 0.0 ||
                (1 + (this.instanceRandom.nextInt(100))) <= this.sigmaPercentageOption.getValue()) {
            this.sigma[i] *= -1;
        }
    }
}

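There are two issues here. First, the original paper does not say whether a fixed set of dimensions drifts, or whether some dimensions drift at one time and others the next. This is easy enough to understand: a fixed set of drifting dimensions models real-world concept drift fairly realistically, and I cannot think of any real-world drift in which the drifting dimensions change arbitrarily. Second, the code contains a commented-out condition, and it matters: enabling it or not makes a big difference, and the resulting pattern is not obvious. Plot it with a math package and you will see.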
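One way to see the effect of the commented-out condition without a math package is to simulate a single drifting weight. The sketch below is hypothetical (DriftSketch and simulate are illustrative names, and it uses its own Random rather than the generator's shared one); the bounded flag switches the range check on or off. With the check on and a start value strictly between 0 and 1, the weight overshoots that range by at most one step of magChange before turning around. With the check off it follows a random walk that can wander far outside the range.

```java
import java.util.Random;

/** Hypothetical one-weight simulation of the drift update; not MOA's code. */
final class DriftSketch {

    /**
     * Applies `steps` drift updates to one weight and returns {min, max}
     * of the values reached. `bounded` enables the range check that is
     * commented out in the listing.
     */
    static double[] simulate(double start, double magChange, int sigmaPercentage,
                             boolean bounded, int steps, long seed) {
        Random random = new Random(seed);
        double weight = start;
        int sigma = 1;
        double min = weight;
        double max = weight;
        for (int t = 0; t < steps; t++) {
            weight += sigma * magChange;
            // Same order as the listing: range check first, then the random flip.
            if ((bounded && (weight >= 1.0 || weight <= 0.0))
                    || 1 + random.nextInt(100) <= sigmaPercentage) {
                sigma *= -1;
            }
            min = Math.min(min, weight);
            max = Math.max(max, weight);
        }
        return new double[] {min, max};
    }

    public static void main(String[] args) {
        double[] on = simulate(0.5, 0.01, 10, true, 100_000, 42L);
        double[] off = simulate(0.5, 0.01, 10, false, 100_000, 42L);
        System.out.println("check on : min=" + on[0] + " max=" + on[1]);
        System.out.println("check off: min=" + off[0] + " max=" + off[1]);
    }
}
```

Note that once a weight goes negative, the half-of-total-weight threshold no longer splits the classes evenly, which may be part of why the two variants behave so differently.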

Posted by batu. Last updated October 21, 2013.