Download Resources for Data Mining Datasets

May 16, 2019 ⁄ General

1. Climate monitoring dataset: http://cdiac.ornl.gov/ftp/ndp026b

2. Several useful sites for downloading test datasets

http://www.fs.fed.us/fire/fuelman/

http://www.cs.toronto.edu/~roweis/data.html
http://kdd.ics.uci.edu/summary.task.type.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.phys.uni.torun.pl/~duch/software.html

The Reuters dataset can be found at the following address: http://www.research.att.com/~lewis/reuters21578.html

This site hosts a variety of datasets: http://kdd.ics.uci.edu/summary.data.type.html

For text classification, another usable dataset is the one that accompanies rainbow:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

3. Machine learning datasets collected by UCI
ftp://pami.sjtu.edu.cn/
http://www.ics.uci.edu/~mlearn//MLRepository.htm

4. StatLib
http://liama.ia.ac.cn/SCILAB/scilabindexgb.htm
http://lib.stat.cmu.edu/

5. Sites on data mining for investment funds
http://www.gotofund.com/index.asp

http://lans.ece.utexas.edu/~strehl/

6. Text classification and the Web
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

http://www.w3.org/TR/WD-logfile-960221.html
http://www.w3.org/Daemon/User/Config/Logging.html#AccessLog
http://www.w3.org/1998/11/05/WC-workshop/Papers/bala2.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.web-caching.com/traces-logs.html
http://www-2.cs.cmu.edu/webkb
http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf
http://www.cs.cornell.edu/projects/kddcup/index.html
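
Several of the links above concern web server access logs and traces. As a rough illustration only, the sketch below parses one log line, assuming the data follows the Common Log Format (host, ident, user, timestamp, request, status, size). The pattern and the function name `parse_clf_line` are assumptions made for this example and are not taken from any of the linked pages.

```python
import re

# Assumed layout: host ident user [time] "request" status size
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_clf_line(line):
    """Parse one Common Log Format line into a dict, or return None."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no bytes were sent in the response body.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record
```

Lines that do not match the assumed layout return None, so a trace in a different format (for example one with extra referrer or user-agent fields) would need an adjusted pattern.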

7. Sites for time series data
http://www.stat.wisc.edu/~reinsel/bjr-data/

8. Test data for the Apriori algorithm
http://www.almaden.ibm.com/cs/quest/syndata.html

9. Links to data generators
http://www.cse.cuhk.edu.hk/~kdd/data_collection.html
http://www.almaden.ibm.com/cs/quest/syndata.html

10. Association rules:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#assocSynData

11. WEKA:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar

1. A jar file containing 37 classification problems, originally obtained from the UCI repository
http://prdownloads.sourceforge.net/weka/datasets-UCI.jar

2. A jar file containing 37 regression problems, obtained from various sources
http://prdownloads.sourceforge.net/weka/datasets-numeric.jar

3. A jar file containing 30 regression datasets collected by Luis Torgo
http://prdownloads.sourceforge.net/weka/regression-datasets.jar
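
A jar file is an ordinary ZIP archive, so these downloads can be unpacked without Java. The sketch below is one possible way to do it, assuming the datasets inside are stored as `.arff` files (WEKA's usual format; the internal layout of these particular jars is not stated here). The helper name `extract_arff` is hypothetical.

```python
import zipfile


def extract_arff(jar_source, target_dir):
    """Extract only the .arff entries of a jar (ZIP) archive.

    jar_source may be a file path or a binary file-like object.
    Returns the sorted list of extracted entry names.
    """
    with zipfile.ZipFile(jar_source) as archive:
        # Assumption: dataset files end in ".arff"; other entries
        # (such as the jar manifest) are skipped.
        names = sorted(
            name for name in archive.namelist()
            if name.lower().endswith(".arff")
        )
        archive.extractall(target_dir, members=names)
    return names
```

For example, after downloading one of the archives, `extract_arff("datasets-UCI.jar", "data")` would place any `.arff` files it finds under a `data` directory.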

12. Cancer gene data:
http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi

13. Financial data:
http://lisp.vse.cz/pkdd99/Challenge/chall.htm

14. Another very good resource is http://kdd.ics.uci.edu/, which contains the following datasets (grouped by application area):

Direct Marketing
  KDD CUP 1998 Data

GIS
  Forest CoverType

Indexing
  Corel Image Features
  Pseudo Periodic Synthetic Time Series

Intrusion Detection
  KDD CUP 1999 Data

Process Control
  Synthetic Control Chart Time Series

Recommendation Systems
  Entree Chicago Recommendation Data

Robots
  Pioneer-1 Mobile Robot Data
  Robot Execution Failures

Sign Language Recognition
  Australian Sign Language Data
  High-quality Australian Sign Language Data

Text Categorization
  20 Newsgroups Data
  Reuters-21578 Text Categorization Collection
  NSF Research Awards Abstracts 1990-2003

World Wide Web
  Microsoft Anonymous Web Data
  MSNBC Anonymous Web Data
  Syskill Webert Web Data
