proc rank


October 21, 2018 ⁄ General

A Chinese-language blog post on the topic: http://blog.sina.com.cn/s/blog_6849f0730100we95.html

Ranking methods in detail (from the documentation):

PROC RANK computes the ranks of observations for one or more numeric variables.

Syntax:

Proc rank <options>;

By <descending> variable-1 <... <descending> variable-n> <notsorted>; *grouping variables;

Var data-set-variable(s); *variables to be ranked;

Ranks new-variable(s); *new variables that hold the ranks;

Ranking methods available as options:

1.1 FRACTION

computes fractional ranks by dividing each rank by the number of observations having nonmissing values of the ranking variable.

TIES=HIGH is the default with the FRACTION option. With TIES=HIGH, fractional ranks are considered values of a right-continuous empirical cumulative distribution function.

1.2 NPLUS1

computes fractional ranks by dividing each rank by the denominator n+1, where n is the number of observations having nonmissing values of the ranking variable.
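The two denominators are easy to compare side by side. The following is a minimal Python sketch, not SAS code: the function names are hypothetical, the data are assumed to contain no missing values, and ties are given the highest rank for both options (the text above states that default for FRACTION; using it for NPLUS1 as well is an assumption here).

```python
def high_ranks(values):
    """Rank each value by counting the values <= it, so ties share the highest rank."""
    return [sum(1 for other in values if other <= v) for v in values]


def fraction_ranks(values):
    """FRACTION-style ranks: rank / n. With high ties this is the empirical CDF."""
    n = len(values)
    return [r / n for r in high_ranks(values)]


def nplus1_ranks(values):
    """NPLUS1-style ranks: rank / (n + 1), so the largest value stays below 1."""
    n = len(values)
    return [r / (n + 1) for r in high_ranks(values)]


data = [10, 20, 20, 40]
fraction = fraction_ranks(data)   # [0.25, 0.75, 0.75, 1.0]
nplus1 = nplus1_ranks(data)       # [0.2, 0.6, 0.6, 0.8]
```

With the n+1 denominator no observation receives a fractional rank of exactly 1, which matters when the result is later passed to an inverse distribution function.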

2. GROUPS=number-of-groups

assigns group values ranging from 0 to number-of-groups minus 1. Common specifications are GROUPS=100 for percentiles, GROUPS=10 for deciles, and GROUPS=4 for quartiles. For example, GROUPS=4 partitions the original values into four groups, with the smallest values receiving, by default, a quartile value of 0 and the largest values receiving a quartile value of 3.

The formula for calculating group values is

$$\mathrm{FLOOR}\left(\frac{\mathit{rank} \times k}{n + 1}\right)$$

where FLOOR is the FLOOR function, rank is the value's order rank, k is the value of GROUPS=, and n is the number of observations having nonmissing values of the ranking variable.

If the number of observations is evenly divisible by the number of groups, each group has the same number of observations, provided there are no tied values at the boundaries of the groups. Grouping observations by a variable that has many tied values can result in unbalanced groups because PROC RANK always assigns observations with the same value to the same group.
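The effect of ties on group balance can be seen in a small Python sketch (not SAS code). It assumes the group value is FLOOR(rank × k / (n + 1)), that tied values receive their mean rank, and that there are no missing values; `mean_ranks` and `group_values` are hypothetical helper names.

```python
import math


def mean_ranks(values):
    """1-based ranks where tied values share the mean of their ranks."""
    ranks = []
    for v in values:
        below = sum(1 for other in values if other < v)
        equal = sum(1 for other in values if other == v)
        ranks.append(below + (equal + 1) / 2)
    return ranks


def group_values(values, k):
    """Group value FLOOR(rank * k / (n + 1)) for each observation."""
    n = len(values)
    return [math.floor(r * k / (n + 1)) for r in mean_ranks(values)]


# Eight distinct values and k=4: two observations per quartile.
balanced = group_values([1, 2, 3, 4, 5, 6, 7, 8], 4)   # [0, 0, 1, 1, 2, 2, 3, 3]

# Four tied values share rank 2.5 and all land in group 1; group 0 is empty.
tied = group_values([1, 1, 1, 1, 5, 6, 7, 8], 4)       # [1, 1, 1, 1, 2, 2, 3, 3]
```

In the second call the tied block cannot be split across a boundary, so one group absorbs all four observations and another is left empty.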

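3. NORMAL=BLOM | TUKEY | VW

computes normal scores from the ranks. The resulting variables appear normally distributed. The formulas are

$$\text{BLOM:} \quad y_i = \Phi^{-1}\left(\frac{r_i - 3/8}{n + 1/4}\right)$$

$$\text{TUKEY:} \quad y_i = \Phi^{-1}\left(\frac{r_i - 1/3}{n + 1/3}\right)$$

$$\text{VW:} \quad y_i = \Phi^{-1}\left(\frac{r_i}{n + 1}\right)$$

where $\Phi^{-1}$ is the inverse cumulative normal (PROBIT) function, $r_i$ is the rank of the $i$th observation, and $n$ is the number of nonmissing observations for the ranking variable.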

VW stands for van der Waerden. With NORMAL=VW, you can use the scores for a nonparametric location test. All three normal scores are approximations to the exact expected order statistics for the normal distribution, also called normal scores. The BLOM version appears to fit slightly better than the others (Blom 1958; Tukey 1962).
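The three variants differ only in the offsets applied to the rank and to n before the inverse normal CDF is taken. A minimal Python sketch, assuming the standard forms (r − 3/8)/(n + 1/4) for BLOM, (r − 1/3)/(n + 1/3) for TUKEY, and r/(n + 1) for VW; `normal_scores` and `OFFSETS` are hypothetical names, and the ranks are supplied directly rather than computed from data.

```python
from statistics import NormalDist

# (a, b) offsets so that the score is inv_cdf((rank - a) / (n + b)).
OFFSETS = {
    "BLOM": (3 / 8, 1 / 4),
    "TUKEY": (1 / 3, 1 / 3),
    "VW": (0.0, 1.0),
}


def normal_scores(ranks, method="BLOM"):
    """Approximate normal scores for the given 1-based ranks."""
    a, b = OFFSETS[method]
    n = len(ranks)
    inverse_cdf = NormalDist().inv_cdf  # standard normal quantile function
    return [inverse_cdf((r - a) / (n + b)) for r in ranks]


ranks = [1, 2, 3, 4, 5]
blom = normal_scores(ranks, "BLOM")
tukey = normal_scores(ranks, "TUKEY")
vw = normal_scores(ranks, "VW")
```

For an odd number of untied observations the middle rank maps to a score of about 0 under all three methods, and the scores are symmetric around it; VW gives the least extreme tail scores because its argument stays furthest from 0 and 1.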

4. PERCENT

divides each rank by the number of observations that have nonmissing values of the variable and multiplies the result by 100 to get a percentage.

5. SAVAGE

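computes Savage (or exponential) scores from the ranks by the following formula (Lehman 1998):

$$y_i = \left[\sum_{j = n - r_i + 1}^{n} \frac{1}{j}\right] - 1$$

where $r_i$ is the rank of the $i$th observation and $n$ is the number of nonmissing observations for the ranking variable.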
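As a rough illustration, the Savage score can be computed directly from integer ranks. This Python sketch assumes the usual form of the score, the sum of 1/j for j from n − r + 1 to n, minus 1, and it assumes untied integer ranks; `savage_scores` is a hypothetical name.

```python
def savage_scores(ranks):
    """Savage (exponential) scores: sum of 1/j for j = n-r+1 .. n, minus 1."""
    n = len(ranks)
    return [sum(1 / j for j in range(n - r + 1, n + 1)) - 1 for r in ranks]


scores = savage_scores([1, 2, 3])
# rank 1: 1/3 - 1
# rank 2: 1/2 + 1/3 - 1
# rank 3: 1 + 1/2 + 1/3 - 1
```

Under this form the scores of a complete set of untied ranks sum to zero, and the largest rank receives a much larger score than the others, which is why these scores suit right-skewed, exponential-like data.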

TIES=HIGH | LOW | MEAN

specifies how to compute normal scores or ranks for tied data values.

HIGH

assigns the largest of the corresponding ranks (or largest of the normal scores when NORMAL= is specified).

LOW

assigns the smallest of the corresponding ranks (or smallest of the normal scores when NORMAL= is specified).

MEAN

assigns the mean of the corresponding ranks (or mean of the normal scores when NORMAL= is specified).
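The three tie rules can be compared on a small data set. This is a minimal Python sketch of the rank case only (it does not cover normal scores), assuming ascending order and no missing values; `rank_with_ties` is a hypothetical name.

```python
def rank_with_ties(values, ties="MEAN"):
    """1-based ascending ranks with HIGH, LOW, or MEAN handling of tied values."""
    ranks = []
    for v in values:
        below = sum(1 for other in values if other < v)
        equal = sum(1 for other in values if other == v)
        low, high = below + 1, below + equal  # range of ranks the tied block occupies
        if ties == "HIGH":
            ranks.append(high)
        elif ties == "LOW":
            ranks.append(low)
        elif ties == "MEAN":
            ranks.append((low + high) / 2)
        else:
            raise ValueError(f"unknown ties rule: {ties}")
    return ranks


data = [10, 20, 20, 40]
high = rank_with_ties(data, "HIGH")   # [1, 3, 3, 4]
low = rank_with_ties(data, "LOW")     # [1, 2, 2, 4]
mean = rank_with_ties(data, "MEAN")   # [1.0, 2.5, 2.5, 4.0]
```

The two tied values occupy ranks 2 and 3, so HIGH gives both 3, LOW gives both 2, and MEAN gives both 2.5; untied values get the same rank under every rule.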


Author: yokogawa

Posted by yokogawa in the General category six years ago; last updated October 21, 2018.