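Many articles use discriminant analysis and logistic models to analyze credit risk, and these were probably among the earliest credit-risk modeling methods introduced in China. Journal articles, however, tend to present only the final results and rarely make clear how those results were obtained. The paper "Application of Proc Discrim and Proc Logistic in Credit Risk Modeling" walks through the concrete SAS steps: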
1. Discriminant analysis
I previously reposted a discriminant analysis example: http://blog.csdn.net/yugao1986/article/details/6359080
The procedure here is broadly the same:
/* Predict the credit rating of accounts that have been open for two years */
/* Sample data set build: the target has 3 rating classes (A, C, O);
   the 3 predictors are credit_limit, number_of_trades and utilization */
DATA build;
  INPUT credit_limit number_of_trades utilization Target $ @@;
  DATALINES;
1300 22 0.41 O  400 9 0.88 C  2500 29 0.57 O  4400 49 0.29 A
4900 43 0.86 A  5200 49 0.36 A  2500 35 1.02 A  600 9 0.83 C
1100 28 0.88 O  600 11 0.50 C  500 33 0.12 C  1600 26 0.90 O
4000 49 0.43 A  400 11 1.09 C  2400 36 0.22 O  2500 29 0.76 O
5400 53 0.30 A  2700 32 0.68 O  500 8 1.09 C  2000 36 0.54 O
1600 35 0.54 O  500 8 1.10 C  650 13 1.00 C  5000 46 0.71 O
5100 52 0.17 A  2200 37 0.60 O  1400 25 0.50 O  2200 31 0.37 O
4700 49 0.75 A  1500 33 0.25 O  1600 29 0.63 O  2200 33 0.25 O
1500 30 0.27 O  1600 38 0.58 O  1800 39 0.45 O  2100 37 0.37 O
1700 36 0.68 O  2100 28 1.00 O  1600 29 0.65 O  600 10 0.50 C
1300 25 0.22 O  1900 24 0.18 O  1900 33 0.49 O  2600 30 0.53 O
2300 24 0.12 O  1200 34 0.30 O  5400 52 0.47 A  2600 35 0.33 O
1700 24 0.65 O  500 8 0.73 C  600 7 0.40 C  4400 47 0.10 A
2200 31 0.50 O  1400 34 0.64 O  5100 47 0.71 A  2000 31 0.17 O
2300 30 0.30 O  1700 32 0.59 O  1000 30 0.11 O  1400 33 0.23 A
2400 32 0.90 C  3300 30 0.87 A  2300 35 0.47 O  2800 35 0.58 O
500 12 0.48 C  2700 37 0.69 O  2200 28 0.38 O  4500 54 0.29 A
4900 50 0.29 A  550 11 0.41 C  1900 25 0.20 O
;
RUN;

/* Variable selection: STEPDISC keeps two variables, credit_limit and number_of_trades */
PROC STEPDISC DATA=build;
  CLASS target;
  VAR credit_limit number_of_trades utilization;
RUN;

/* Discriminant analysis on the selected variables.
   POOL=test tests the within-class covariance matrices for homogeneity:
   if the test is not significant a linear (pooled) rule is used, otherwise a quadratic one.
   PRIORS prop sets the prior probabilities proportional to the class sizes.
   OUT=disc saves the posterior probabilities and the assigned class (_INTO_).
   Because TESTDATA=build reclassifies the training data, the test error rate
   equals the (optimistic) resubstitution error rate. */
PROC DISCRIM DATA=build TESTDATA=build POOL=test OUT=disc;
  PRIORS prop;
  CLASS target;
  VAR credit_limit number_of_trades;
RUN;
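To make the classification rule behind PROC DISCRIM less of a black box, here is a minimal pure-Python sketch of linear discriminant analysis on the same two variables. It assumes the homogeneity test does not reject, so the pooled covariance matrix S (divisor n minus the number of classes) is used, and each class k gets the score d_k(x) = x'S⁻¹m_k − ½m_k'S⁻¹m_k + ln p_k with proportional priors p_k (the PRIORS prop idea); the observation goes to the class with the largest score. If POOL=test selects the quadratic rule instead, SAS's classifications will differ from this sketch. The names ROWS, fit_lda and classify are hypothetical, and the output is not claimed to reproduce the SAS listing.

```python
from collections import Counter
from math import log

# Rows copied from the build data set: (credit_limit, number_of_trades, target).
# utilization is left out because STEPDISC did not keep it.
ROWS = [
    (1300, 22, "O"), (400, 9, "C"), (2500, 29, "O"), (4400, 49, "A"),
    (4900, 43, "A"), (5200, 49, "A"), (2500, 35, "A"), (600, 9, "C"),
    (1100, 28, "O"), (600, 11, "C"), (500, 33, "C"), (1600, 26, "O"),
    (4000, 49, "A"), (400, 11, "C"), (2400, 36, "O"), (2500, 29, "O"),
    (5400, 53, "A"), (2700, 32, "O"), (500, 8, "C"), (2000, 36, "O"),
    (1600, 35, "O"), (500, 8, "C"), (650, 13, "C"), (5000, 46, "O"),
    (5100, 52, "A"), (2200, 37, "O"), (1400, 25, "O"), (2200, 31, "O"),
    (4700, 49, "A"), (1500, 33, "O"), (1600, 29, "O"), (2200, 33, "O"),
    (1500, 30, "O"), (1600, 38, "O"), (1800, 39, "O"), (2100, 37, "O"),
    (1700, 36, "O"), (2100, 28, "O"), (1600, 29, "O"), (600, 10, "C"),
    (1300, 25, "O"), (1900, 24, "O"), (1900, 33, "O"), (2600, 30, "O"),
    (2300, 24, "O"), (1200, 34, "O"), (5400, 52, "A"), (2600, 35, "O"),
    (1700, 24, "O"), (500, 8, "C"), (600, 7, "C"), (4400, 47, "A"),
    (2200, 31, "O"), (1400, 34, "O"), (5100, 47, "A"), (2000, 31, "O"),
    (2300, 30, "O"), (1700, 32, "O"), (1000, 30, "O"), (1400, 33, "A"),
    (2400, 32, "C"), (3300, 30, "A"), (2300, 35, "O"), (2800, 35, "O"),
    (500, 12, "C"), (2700, 37, "O"), (2200, 28, "O"), (4500, 54, "A"),
    (4900, 50, "A"), (550, 11, "C"), (1900, 25, "O"),
]


def fit_lda(rows):
    """Fit class means, the inverse pooled covariance (2x2) and proportional priors."""
    counts = Counter(t for _, _, t in rows)
    n = len(rows)
    means = {}
    for k in counts:
        xs = [(a, b) for a, b, t in rows if t == k]
        means[k] = (sum(a for a, _ in xs) / len(xs), sum(b for _, b in xs) / len(xs))
    # Pooled within-class covariance with divisor n - number_of_classes.
    s11 = s12 = s22 = 0.0
    for a, b, t in rows:
        da, db = a - means[t][0], b - means[t][1]
        s11 += da * da
        s12 += da * db
        s22 += db * db
    dof = n - len(counts)
    s11, s12, s22 = s11 / dof, s12 / dof, s22 / dof
    det = s11 * s22 - s12 * s12
    inv = (s22 / det, -s12 / det, s11 / det)  # entries (i11, i12, i22) of S^-1
    priors = {k: c / n for k, c in counts.items()}
    return means, inv, priors


def classify(model, x1, x2):
    """Return the class with the largest linear discriminant score."""
    means, (i11, i12, i22), priors = model

    def score(k):
        m1, m2 = means[k]
        w1 = i11 * m1 + i12 * m2  # w = S^-1 m_k
        w2 = i12 * m1 + i22 * m2
        return w1 * x1 + w2 * x2 - 0.5 * (w1 * m1 + w2 * m2) + log(priors[k])

    return max(means, key=score)


if __name__ == "__main__":
    model = fit_lda(ROWS)
    # Resubstitution: reclassify the training rows, like TESTDATA=build.
    confusion = Counter((t, classify(model, a, b)) for a, b, t in ROWS)
    for (actual, predicted), count in sorted(confusion.items()):
        print(actual, predicted, count)
```

Running the script prints actual class, predicted class and count, which is the same list layout that PROC FREQ's LIST option produces further down.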
2. Logistic model
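/* Generalized logit model with O as the reference class, stepwise selection */
PROC LOGISTIC DATA=build;
  MODEL target(ref='O') = credit_limit number_of_trades utilization
        / SELECTION=stepwise LINK=glogit;
RUN;

/* Build the scoring model from the PROC LOGISTIC estimates (-9.1618 and so on).
   Only credit_limit and number_of_trades appear, i.e. the variables kept by the stepwise selection.
   phat_A = log(P(A)/P(O)) and phat_C = log(P(C)/P(O)) */
DATA logit;
  SET build;
  phat_A = -9.1618 + credit_limit*0.00116 + number_of_trades*0.1263;
  phat_C = 6.3220 + credit_limit*(-0.00272) + number_of_trades*(-0.1629);
  prob_O = 1/(1 + exp(phat_A) + exp(phat_C));
  prob_A = prob_O*exp(phat_A);
  prob_C = prob_O*exp(phat_C);
  /* Assign the class with the highest probability (ties go to O, then A) */
  max_p = max(prob_A, prob_C, prob_O);
  IF prob_O = max_p THEN pred = 'O';
  ELSE IF prob_A = max_p THEN pred = 'A';
  ELSE pred = 'C';
RUN;

/* Results: actual vs. predicted class, listed by combination */
PROC FREQ DATA=logit;
  TABLES target * pred / LIST;
RUN;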
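The scoring DATA step is just the generalized logit formula: with O as the reference, log(P(A)/P(O)) and log(P(C)/P(O)) are linear in the predictors, so P(O) = 1/(1 + e^phat_A + e^phat_C) and P(k) = P(O)·e^phat_k. A minimal Python sketch of the same step follows, reusing the coefficients hard-coded in the DATA step above; the names COEF, predict_proba and predict are hypothetical, and this is an illustration rather than a replacement for PROC LOGISTIC's own scoring.

```python
from math import exp

# Coefficients copied from the DATA step above: (intercept, credit_limit, number_of_trades).
# Each row is a generalized logit against the reference class O.
COEF = {
    "A": (-9.1618, 0.00116, 0.1263),
    "C": (6.3220, -0.00272, -0.1629),
}


def predict_proba(credit_limit, number_of_trades):
    """Return {class: probability} under the generalized logit model."""
    expo = {k: exp(b0 + b1 * credit_limit + b2 * number_of_trades)
            for k, (b0, b1, b2) in COEF.items()}
    prob_o = 1.0 / (1.0 + sum(expo.values()))
    probs = {"O": prob_o}  # O first, so ties resolve to O, then A, then C
    for k, e in expo.items():
        probs[k] = prob_o * e
    return probs


def predict(credit_limit, number_of_trades):
    """Return the class with the highest predicted probability."""
    probs = predict_proba(credit_limit, number_of_trades)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    # A few rows from the build data set: (credit_limit, number_of_trades, actual target).
    for cl, nt, actual in [(1300, 22, "O"), (400, 9, "C"), (4400, 49, "A")]:
        probs = predict_proba(cl, nt)
        print(actual, predict(cl, nt), {k: round(v, 3) for k, v in probs.items()})
```

Because Python's max returns the first maximal key in dictionary order, inserting O before A and C mirrors the tie-breaking order of the IF/ELSE chain in the SAS step.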