Links
- Competitions in Data Mining and Knowledge Discovery
- Data Library Sources: Downloadable Data by Subject
- Data Mining - Case Study
- Datasets for Data Mining and Knowledge Discovery
- Datasets for Machine Learning, Knowledge Discovery, Data Mining
- Machine Learning network Online Information Service
- David Dowe's data links
- Delve Datasets: collections of data for developing, evaluating, and comparing learning methods
- Digital Chart of the World
- Directory of /pub/machine-learning-databases
- FIMI (Frequent Itemset Mining Implementations): software and datasets
- Kent Ridge Biomedical Data Set Repository
- NCDC: Online Document Library, Dataset Documentation
- Reuters-21578 Text Categorization Collection
- State and County Demographic and Economic Profiles
- Surveillance, Epidemiology, and End Results
- The Royal Statistical Society Dataset Website
- The UCR Time Series Data Mining Archive
- TheDataWeb: a network of online data libraries
- UCI KDD Archive
- UCI Machine Learning Repository
- UCR Time Series Data Mining Archive
- WHO Statistical Information System
(2)
Links
- Climate Data Archives
- Data Centre (www.marine.csiro.au)
- Data Library Sources: Downloadable Data by Subject
- Data Sets
- Data Sources
- Datafiles by Subject
- Datasets for Data Mining and Knowledge Discovery
- Datasets from the Book: Statistical Consulting
- Delve Datasets: collections of data for developing, evaluating, and comparing learning methods
- Digital Chart of the World
- Donnees SMEL
- Electronic Dataset Service
- Kent Ridge Biomedical Data Set Repository
- Martin Bland's Medical Data-sets
- MIMAS dataset services
- NCDC: Online Document Library, Dataset Documentation
- SNZ-Datalab
- STA114/MTH136: Andrews and Herzberg Data Sets
- Stat Labs Data Page
- State and County Demographic and Economic Profiles
- Statistical Reference Datasets (StRD)
- StatLib---Datasets Archive
- STATWEB of the Swiss Federal Statistical Office
- Surveillance, Epidemiology, and End Results
- The Data and Story Library
- The DataLab at UC Irvine
- The Royal Statistical Society Dataset Website
- TheDataWeb: a network of online data libraries
- U.S. Census Data Database Search
- UCI KDD Archive
- UCI ML Repository Content Summary
- WHO Statistical Information System
(3)
- Gunnar Raetsch's Benchmark Datasets: various benchmark datasets prepared for Matlab (V6 and V7). Includes BreastCancer, Cards, chess, Circle, credit, Heart1, hepatitis, HouseVotes84, Ionosphere, liver, monks3, musk, PimaIndiansDiabetes, promotergene, ringnorm, Sonar, Spirals, threenorm, tictactoe, titanic and twonorm. These are the benchmark data sets used in [RaeOnoMue01] and [MikRaeWesSchMue99], and they are well suited to classification tasks. [RaeOnoMue01 Mirror] [MikRaeWesSchMue99 Mirror]
- Data from "Benchmarking Support Vector Machines" [MeyerLeischHornik02]. Well suited to comparing a classification or regression algorithm against other methods (SVM, KNN, neural nets, bagging, boosting, random forests and others). Includes many data sets such as liver, hepatitis, credit, monks3, HouseVotes84, Sonar, tictactoe, ringnorm, musk, Spirals, threenorm, Ionosphere, BreastCancer, Circle, titanic, Heart1, chess, PimaIndiansDiabetes, promotergene, twonorm and Cards. The data is stored as R workspace images (.RData files). To export the training sets as text files, you can use the following R command:
  for (i in 1:100) { load(sprintf("%i.RData", i)); write.table(train, file = sprintf("%itrain.txt", i)) }
- UCI Machine Learning Repository: many useful datasets
- DMOZ: Data sets for machine learning
- A dataset for path-finding in images (Field Robotics)
- LETOR: package of benchmark data sets for LEarning TO Rank
- KIN40K regression data set
- Clustering Data Sets (Mammals, Birth/Death Rates, New Haven Schools, Nutrients)
- UCI and UCI KDD data sets for classification and regression in Weka ARFF format. More ARFF datasets, such as protein and biomedical data, drug design, Reuters-21578 (ModApte split), and various agricultural data sets, can be found here.
- Clustering data sets
- Fundamental Clustering Problem Suite (FCPS): includes clustering problems such as Hepta, Lsun, Tetra, Chainlink, Atom, EngyTime, Target, TwoDiamonds, Wingnut and Golfball.
- RCV1 Text Categorization Test Collection
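Several of the collections listed above (the UCI and UCI KDD conversions and the additional protein, drug-design and Reuters-21578 sets) are distributed in Weka's ARFF format. The following is a minimal sketch of how such a file could be read with only the Python standard library. It assumes a dense file with full-line `%` comments, single-quoted values, and no sparse `{...}` rows, date attributes, or nominal values containing commas. The helper names `read_arff` and `_parse_attribute` are hypothetical and not part of Weka or any library; for real work, a maintained reader such as `scipy.io.arff.loadarff` is the safer choice.

```python
import csv

# Hypothetical minimal ARFF reader (a sketch, not Weka's or SciPy's API).
# Assumes a dense ARFF file: no sparse {...} rows, no date/relational
# attributes, and no commas inside nominal values.
NUMERIC_TYPES = {"numeric", "real", "integer"}


def _parse_attribute(rest):
    """Split the text after '@attribute' into (name, type)."""
    rest = rest.strip()
    if rest[0] in "'\"":
        # Quoted attribute name, e.g. @attribute 'leaf colour' numeric
        end = rest.index(rest[0], 1)
        name, kind = rest[1:end], rest[end + 1:].strip()
    else:
        name, kind = rest.split(None, 1)
    if kind.startswith("{"):
        # Nominal attribute: keep the allowed values as a tuple
        values = kind.strip("{} ").split(",")
        return name, tuple(v.strip().strip("'\"") for v in values)
    return name, kind.lower()


def read_arff(text):
    """Return (relation, attributes, rows) for a dense ARFF document."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and full-line comments
        if in_data:
            # csv handles single-quoted values such as 'dark red'
            values = next(csv.reader([line], quotechar="'", skipinitialspace=True))
            row = []
            for value, (_, kind) in zip(values, attributes):
                if value == "?":
                    row.append(None)  # '?' marks a missing value
                elif isinstance(kind, str) and kind in NUMERIC_TYPES:
                    row.append(float(value))  # numeric/real/integer -> float
                else:
                    row.append(value)  # nominal and string values stay str
            rows.append(row)
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        rest = parts[1] if len(parts) > 1 else ""
        if keyword == "@relation":
            relation = rest.strip("'\"")
        elif keyword == "@attribute":
            attributes.append(_parse_attribute(rest))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


sample = """% toy example
@relation toy
@attribute width numeric
@attribute 'leaf colour' {green, 'dark red'}
@data
1.5, green
?, 'dark red'
"""
relation, attributes, rows = read_arff(sample)
print(relation, attributes, rows)
```

Data rows go through `csv.reader` rather than a plain `split(",")` so that quoted values containing spaces survive intact; supporting the sparse and date syntax would need additional parsing on top of this sketch.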