求字符串中不重复的子串数

现在的位置: 首页 > 综合 > 正文

RSS

求字符串中不重复的子串数

2018年02月21日 ⁄ 综合 ⁄ 共 2540字 ⁄ 字号小中大 ⁄ 评论关闭

Given a string, we need to find the total number of its distinct substrings.

Input

T- number of test cases. T<=20;
Each test case consists of one string, whose length is <= 1000

Output

For each test case output one number saying the number of distinct substrings.

Example

Sample Input:
2
CCCCC
ABABA

Sample Output:
5
9

Explanation for the testcase with string ABABA:
len=1 : A,B
len=2 : AB,BA
len=3 : ABA,BAB
len=4 : ABAB,BABA
len=5 : ABABA
Thus, total number of distinct substrings is 9.

[cpp]view

 plaincopy

#include<iostream>  

#include<cstdio>  

#include<cstring>  

using namespace std;  

const int maxn=1010;  

char str[maxn];  

int wa[maxn],wb[maxn],wv[maxn],wn[maxn],a[maxn],sa[maxn];  

int cmp(int* r,int a,int b,int l)  

{  

    return r[a]==r[b]&&r[a+l]==r[b+l];  

}  

//n为字符串长度，m为字符的取值范围，r为字符串。后面的j为每次排序时子串的长度  

void DA(int* r,int* sa,int n,int m)  

{  

    int i,j,p,*x=wa,*y=wb,*t;  

    ///对R中长度为1的子串进行基数排序  

    for(i=0; i<m; i++)wn[i]=0;  

    for(i=0; i<n; i++)wn[x[i]=r[i]]++;  

    for(i=1; i<m; i++)wn[i]+=wn[i-1];  

    for(i=n-1; i>=0; i--)sa[--wn[x[i]]]=i;  

    for(j=1,p=1; p<n; j*=2,m=p)  

    {  

        //利用了上一次基数排序的结果，对待排序的子串的第二关键字进行了一次高效地基数排序  

        for(p=0,i=n-j; i<n; i++)y[p++]=i;  

        for(i=0; i<n; i++)if(sa[i]>=j)y[p++]=sa[i]-j;  

        ///基数排序  

        for(i=0; i<n; i++)wv[i]=x[y[i]];  

        for(i=0; i<m; i++)wn[i]=0;  

        for(i=0; i<n; i++)wn[wv[i]]++;  

        for(i=1; i<m; i++)wn[i]+=wn[i-1];  

        for(i=n-1; i>=0; i--)sa[--wn[wv[i]]]=y[i];  

        ///当p=n的时候，说明所有串都已经排好序了  

        ///在第一次排序以后，rank数组中的最大值小于p，所以让m=p  

        for(t=x,x=y,y=t,p=1,x[sa[0]]=0,i=1; i<n; i++)  

            x[sa[i]]=cmp(y,sa[i-1],sa[i],j)?p-1:p++;  

    }  

    return;  

}  

///后缀数组  计算height数组  

/** 

height数组的值应该是从height[1]开始的，而且height[1]应该是等于0的。 

原因是，+因为我们在字符串后面添加了一个0号字符，所以它必然是最小的 

一个后缀。而字符串中的其他字符都应该是大于0的（前面有提到，使用倍 

增算法前需要确保这点），所以排名第二的字符串和0号字符的公共前缀 

（即height[1]）应当为0.在调用calheight函数时，要注意height数组的范 

围应该是[1..n]。所以调用时应该是calheight(r,sa,n) 

而不是calheight(r,sa,n+1)。*/  

int rank[maxn],height[maxn];  

void calheight(int* r,int* sa,int n)  

{  

    int i,j,k=0;  

    for(i=1; i<=n; i++)rank[sa[i]]=i;  

    for(i=0; i<n; height[rank[i++]]=k)  

        for(k?k--:0,j=sa[rank[i]-1]; r[i+k]==r[j+k]; k++);  

    return;  

}  

int main()  

{  

    int t;  

    scanf("%d",&t);  

    while(t--)  

    {  

        char str[1010];  

        scanf("%s",str);  

        int n=strlen(str);  

        for(int i=0;i<n;i++) a[i]=(int)str[i];  

        a[n]=0;  

        DA(a,sa,n+1,260);  

        calheight(a,sa,n);  

        int sum=0;  

        for(int i=1;i<=n;i++)  

        sum+=n-sa[i]-height[i];  

        printf("%d/n",sum);  

    }  

    return 0;  

}

【上篇】编程之美1.3 一摞烙饼的排序
【下篇】图的相关算法

作者: courtship

该日志由 courtship 于6年前发表在综合分类下，最后更新于 2018年02月21日.
转载请注明: 求字符串中不重复的子串数 | 学步园 +复制链接

抱歉!评论已关闭.

学步园