Codeforces Round #246 (Div. 2) D. Prefixes and Suffixes (suffix array or KMP)


March 18, 2018

You have a string s = s_1 s_2 ... s_|s|, where |s| is the length of string s, and s_i is its i-th character.

Let's introduce several definitions:

A substring s[i..j] (1 ≤ i ≤ j ≤ |s|) of string s is the string s_i s_{i+1} ... s_j.
The prefix of string s of length l (1 ≤ l ≤ |s|) is the string s[1..l].
The suffix of string s of length l (1 ≤ l ≤ |s|) is the string s[|s| - l + 1..|s|].

Your task is, for any prefix of string s which matches a suffix of string s, print the number of times it occurs in string s as a substring.

Input

The single line contains a sequence of characters s₁s₂...s_|s| (1 ≤ |s| ≤ 10⁵) —
string s. The string only consists of uppercase English letters.

Output

In the first line, print integer k (0 ≤ k ≤ |s|): the number of prefixes that match a suffix of string s. Next print k lines, each containing two integers l_i c_i. Numbers l_i c_i mean that the prefix of length l_i matches the suffix of length l_i and occurs in string s as a substring c_i times. Print pairs l_i c_i in the order of increasing l_i.

Sample test(s)

input
ABACABA

output
3
1 4
3 2
7 1

input
AAA

output
3
1 3
2 2
3 1

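Problem summary:

Given a string of length at most 10^5, output, in increasing order of length, the length of every prefix that exactly matches a suffix, together with the number of times that prefix occurs in the whole string (occurrences may overlap).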
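
For cross-checking a solution on small inputs, the task can be restated directly as a brute-force routine. This is only a sketch and is not from the original post: the name bruteForce is hypothetical, and the routine is roughly cubic in the worst case, so it is far too slow for |s| = 10^5.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Brute-force reference (hypothetical helper, not the author's code):
// for every length l where prefix == suffix, count the possibly
// overlapping occurrences of that prefix in s.
std::vector<std::pair<int, int>> bruteForce(const std::string& s) {
    assert(!s.empty());
    const int n = static_cast<int>(s.size());
    std::vector<std::pair<int, int>> result;
    for (int l = 1; l <= n; ++l) {
        // Skip lengths where the prefix and the suffix differ.
        if (s.compare(0, l, s, n - l, l) != 0) continue;
        int occurrences = 0;
        for (int start = 0; start + l <= n; ++start)
            if (s.compare(start, l, s, 0, l) == 0) ++occurrences;
        result.emplace_back(l, occurrences);
    }
    return result;  // already ordered by increasing length
}
```

On the first sample, ABACABA, this yields the pairs (1, 4), (3, 2), (7, 1), matching the expected output.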

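Approach:

As soon as I saw "prefix" and "suffix" during the contest I was secretly delighted: I had just learned suffix arrays, and here was a place to use them. After some thought the algorithm took shape. Suffix 0 is the whole string. A suffix whose longest common prefix with suffix 0 covers the entire suffix is exactly a suffix that matches some prefix of the string. Next we have to count how many times it occurs. Once we know that a suffix is a target suffix, we know its rank. Every suffix that contains it as a prefix must be ranked after it, by the ordering rule: if suffix b is a prefix of suffix a, a cannot come before b, because the shorter one sorts first. So the remaining work is to find how far down from that rank the block extends.
This can be determined from the height array, using binary search plus RMQ: the binary search fixes the position and the RMQ checks whether the condition holds. The idea was correct, but my solution kept failing until the contest ended, and only after debugging it later did I realize that I had not understood suffix arrays deeply enough. The trouble was in why the doubling algorithm requires txt[n-1]=0, and in how to handle j=sa[rank[i]-1] when rank[i]=0. Appending a 0 to the end of the original string solves both. See the code for details: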

#include<bits/stdc++.h>
using namespace std;
const int INF=0x3f3f3f3f;
const double eps=1e-8;
const double PI=acos(-1.0);
const int maxn=150010;
char txt[maxn];
int sa[maxn],T1[maxn],T2[maxn],ct[maxn],he[maxn],rk[maxn],ans,n,m;// sa[i] is the start position of the suffix with rank i
int rmq[25][maxn],lg[maxn],ansn[maxn],ansp[maxn],ptr;
void getsa(char *st)// note: m is the size of the character (ASCII) range
{
    int i,k,p,*x=T1,*y=T2;
    for(i=0; i<m; i++) ct[i]=0;
    for(i=0; i<n; i++) ct[x[i]=st[i]]++;
    for(i=1; i<m; i++) ct[i]+=ct[i-1];
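    for(i=n-1; i>=0; i--)// iterate backwards to keep the relative order (stable sort)
        sa[--ct[x[i]]]=i;
    for(k=1,p=1; p<n; k<<=1,m=p)// enumerate the length k, doubling each round
    {
        for(p=0,i=n-k; i<n; i++) y[p++]=i;
        for(i=0; i<n; i++) if(sa[i]>=k) y[p++]=sa[i]-k;// sort by second key: y[i] is the start of the suffix ranked i by second key
        for(i=0; i<m; i++) ct[i]=0;
        for(i=0; i<n; i++) ct[x[y[i]]]++;// x[i] is the first-key rank of the suffix starting at i
        for(i=1; i<m; i++) ct[i]+=ct[i-1];
        for(i=n-1; i>=0; i--) sa[--ct[x[y[i]]]]=y[i];// then sort by first key
        for(swap(x,y),p=1,x[sa[0]]=0,i=1; i<n; i++)
            x[sa[i]]=y[sa[i-1]]==y[sa[i]]&&y[sa[i-1]+k]==y[sa[i]+k]?p-1:p++;// x[sa[i]] becomes the new rank of the suffix starting at sa[i]
        // This relies on txt[n-1]=0. If y[sa[i-1]]==y[sa[i]], the length-k strings starting at sa[i-1] and sa[i]
        // cannot include the character txt[n-1], because including it would make them differ. So reading
        // y[sa[i]+k] and y[sa[i-1]+k] never goes out of bounds, and no special check is needed.
    }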
    }
}
void gethe(char *st)// compute the height array
{
    int i,j,k=0;
    for(i=0;i<n;i++) rk[sa[i]]=i;
    for(i=0;i<n-1;i++)
    {
        if(k) k--;
        j=sa[rk[i]-1];
        while(st[i+k]==st[j+k]) k++;
        he[rk[i]]=k;
    }
}
void rmq_init()
{
    int i,j;
    for(i=0;i<n;i++)
        rmq[0][i]=he[i];// single elements
    for(i=1;i<=lg[n];i++)// enumerate block sizes 2^i
        for(j=0;j+(1<<i)-1<n;j++)// enumerate start points; mind the boundary
            rmq[i][j]=min(rmq[i-1][j],rmq[i-1][j+(1<<(i-1))]);
}
int rmq_min(int l,int r)
{
    int tmp=lg[r-l+1];
    return min(rmq[tmp][l],rmq[tmp][r-(1<<tmp)+1]);
}
void prermq()
{
    int  i;
    lg[0]=-1;
    for(i=1;i<maxn;i++)
        lg[i]=lg[i>>1]+1;
}
void solve()
{
    int low,hi,mid,p,pos,a,b,ans,tp,i;
    getsa(txt),gethe(txt),rmq_init();
    ptr=0,pos=rk[0];
    for(i=n-2;i>0;i--)
    {
        if(rk[i]<pos)
            a=rk[i]+1,b=pos;
        else
            a=pos+1,b=rk[i];
        p=rmq_min(a,b);

        if(p>=n-i-1)
        {
            ansp[ptr]=p,tp=rk[i]+1;
            low=rk[i]+1,hi=n-1,ans=-1;
            while(low<=hi)
            {
                mid=(low+hi)>>1;
                if(rmq_min(tp,mid)>=p)
                    ans=mid,low=mid+1;
                else
                    hi=mid-1;
            }
            ansn[ptr++]=ans-rk[i]+1;
        }
    }
}
int main()
{
    int i;

    prermq();
    while(~scanf("%s",txt))
    {
        m=150,n=strlen(txt);
        n++;
        solve();
        ansp[ptr]=n-1;
        ansn[ptr++]=1;
        printf("%d\n",ptr);
        for(i=0;i<ptr;i++)
            printf("%d %d\n",ansp[i],ansn[i]);
    }
    return 0;
}
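
The ordering argument behind the suffix-array solution (every suffix that has a given suffix as a prefix sits in one contiguous run starting at that suffix's rank) can be checked on small inputs with a deliberately naive sketch. This is not the doubling algorithm from the code above: it sorts suffixes with std::sort in O(n^2 log n), and the name countViaSortedSuffixes is hypothetical. No sentinel is needed here because std::string comparison already ranks a string before any longer string it is a prefix of.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Count occurrences of the suffix s[start..] by walking the run of
// sorted suffixes that begins at its own rank (naive illustration only).
int countViaSortedSuffixes(const std::string& s, int start) {
    assert(start >= 0 && start < static_cast<int>(s.size()));
    const int n = static_cast<int>(s.size());
    std::vector<int> sa(n);
    for (int i = 0; i < n; ++i) sa[i] = i;
    // Naive suffix sort: compare whole suffixes lexicographically.
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    const int len = n - start;
    const int rank = static_cast<int>(std::find(sa.begin(), sa.end(), start) - sa.begin());
    int count = 0;
    // Walk down from the suffix's own rank while it is still a prefix.
    for (int r = rank; r < n && s.compare(sa[r], len, s, start, len) == 0; ++r)
        ++count;
    return count;
}
```

For ABACABA the sorted suffixes are A, ABA, ABACABA, ACABA, BA, BACABA, CABA, so the suffix A (start 6) heads a run of four and ABA (start 4) a run of two. The author's code finds the end of the same run with binary search over an RMQ on the height array instead of scanning.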

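Now the second approach. After the contest I could not get the first approach to work, so I asked in the group chat. 叉姐 looked down on me, tossed out "kmp" and left. Thinking it over calmly: of course. I felt rather foolish. KMP can easily count how many times each prefix occurs in the string. Compute the failure array of the string, then match the string against itself. If position i matches position j, then prefix j occurs once ending at position i. We use cnt[i] to record the number of occurrences of prefix i. At the end we accumulate cnt[next[i]]+=cnt[i], going from the longest prefix to the shortest. This is easy to see: if prefix j occurs ending at position i, then prefix next[j] also occurs ending at position i. With the occurrence count of every prefix known, what remains is to find the prefixes that match a suffix. This is simple: matching the string against itself is just matching its front part against the rest of it, so we only need to take the match state after the last character and follow the failure links from there to list every prefix that matches a suffix. A neat O(n) solution, and it got accepted.

See the code for details: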

#include<bits/stdc++.h>
using namespace std;
const int INF=0x3f3f3f3f;
const double eps=1e-8;
const double PI=acos(-1.0);
const int maxn=150010;
char txt[maxn];
int f[maxn],cnt[maxn],ansp[maxn],ansn[maxn],ct,n;
void getf(char *p)
{
    int i,j;
    f[0]=f[1]=0;
    for(i=1;i<n;i++)
    {
        j=f[i];
        while(j&&p[j]!=p[i])
            j=f[j];
        f[i+1]=p[j]==p[i]?j+1:0;
    }
}
void KMP()
{
    int i,j,t;
    for(i=0,j=0;i<n;i++)
    {
        while(j&&txt[j]!=txt[i])
            j=f[j];
        if(txt[j]==txt[i])
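           cnt[j]++,j++;// cnt[j] counts occurrences of the prefix ending at index j; each i is a different end position
    }
    t=j,ct=0;
    for(j=n;j>0;j--)// why this works: occurrences with different end positions are distinct, and KMP guarantees distinct ends
        if(f[j])// f[j] is the next position to compare, so the first f[j] characters (indices 0..f[j]-1) match
            cnt[f[j]-1]+=cnt[j-1];
    while(t)// prefixes that match a suffix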
    {
        ansp[ct]=t;
        ansn[ct++]=cnt[t-1];
        t=f[t];
    }
    printf("%d\n",ct);
    for(i=ct-1;i>=0;i--)
        printf("%d %d\n",ansp[i],ansn[i]);
}
int main()
{
    while(~scanf("%s",txt))
    {
        n=strlen(txt);
        memset(cnt,0,sizeof cnt);
        getf(txt);
        KMP();
    }
}
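
The same counting idea can be restated compactly with std::vector and lengths instead of end indices. This is a sketch rather than the author's code; the names bordersWithCounts and fail are hypothetical. One simplification is assumed here: because the string is matched against itself from the same starting position, that pass credits every prefix exactly once, so the sketch initializes each count to 1 directly.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Returns (length, occurrences) for every prefix that equals a suffix,
// in increasing order of length. Runs in O(n).
std::vector<std::pair<int, int>> bordersWithCounts(const std::string& s) {
    assert(!s.empty());
    const int n = static_cast<int>(s.size());
    // fail[len] = length of the longest proper border of the prefix of length len.
    std::vector<int> fail(n + 1, 0);
    for (int i = 1; i < n; ++i) {
        int j = fail[i];
        while (j > 0 && s[j] != s[i]) j = fail[j];
        fail[i + 1] = (s[j] == s[i]) ? j + 1 : 0;
    }
    // cnt[len] = occurrences of the prefix of length len; each prefix
    // occurs once at the start, and cnt[0] is an unused sink.
    std::vector<int> cnt(n + 1, 1);
    cnt[0] = 0;
    // Longest first: every occurrence of a prefix also ends an
    // occurrence of its longest border.
    for (int len = n; len >= 1; --len) cnt[fail[len]] += cnt[len];
    // Follow failure links from the full string to list all borders.
    std::vector<std::pair<int, int>> result;
    for (int len = n; len > 0; len = fail[len]) result.emplace_back(len, cnt[len]);
    std::reverse(result.begin(), result.end());
    return result;
}
```

For ABACABA the failure values for lengths 1 to 7 are 0, 0, 1, 0, 1, 2, 3, and the accumulation gives counts 4, 2 and 1 for lengths 1, 3 and 7.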

Author: cajon (last updated March 18, 2018)