现在的位置: 首页 > 综合 > 正文

poj 2774 Long Long Message 求两个字符串的最长公共子串 后缀数组

2013年01月27日 ⁄ 综合 ⁄ 共 4974字 ⁄ 字号 评论关闭
Long Long Message
Time Limit: 4000MS   Memory Limit: 131072K
Total Submissions: 10949   Accepted: 4374
Case Time Limit: 1000MS

Description

The little cat is majoring in physics in the capital of Byterland. A piece of sad news comes to him these days: his mother is getting ill. Being worried about spending so much on railway tickets (Byterland is such a big country, and he has to spend 16 shours on train to his hometown), he decided only to send SMS with his mother.

The little cat lives in an unrich family, so he frequently comes to the mobile service center, to check how much money he has spent on SMS. Yesterday, the computer of service center was broken, and printed two very long messages. The brilliant little cat soon found out:

1. All characters in messages are lowercase Latin letters, without punctuations and spaces.
2. All SMS has been appended to each other – (i+1)-th SMS comes directly after the i-th one – that is why those two messages are quite long.
3. His own SMS has been appended together, but possibly a great many redundancy characters appear leftwards and rightwards due to the broken computer.
E.g: if his SMS is “motheriloveyou”, either long message printed by that machine, would possibly be one of “hahamotheriloveyou”, “motheriloveyoureally”, “motheriloveyouornot”, “bbbmotheriloveyouaaa”, etc.
4. For these broken issues, the little cat has printed his original text twice (so there appears two very long messages). Even though the original text remains the same in two printed messages, the redundancy characters on both sides would be possibly different.

You are given those two very long messages, and you have to output the length of the longest possible original text written by the little cat.

Background:
The SMS in Byterland mobile service are charging in dollars-per-byte. That is why the little cat is worrying about how long could the longest original text be.

Why ask you to write a program? There are four resions:
1. The little cat is so busy these days with physics lessons;
2. The little cat wants to keep what he said to his mother seceret;
3. POJ is such a great Online Judge;
4. The little cat wants to earn some money from POJ, and try to persuade his mother to see the doctor :(

Input

Two strings with lowercase letters on two of the input lines individually. Number of characters in each one will never exceed 100000.

Output

A single line with a single integer number – what is the maximum length of the original text written by the little cat.

Sample Input

yeshowmuchiloveyoumydearmotherreallyicannotbelieveit
yeaphowmuchiloveyoumydearmother

Sample Output

27

 

 

 

 

 

//

 

 

 

#include<iostream>
#include<cstdio>
#include<cstring>
using namespace std;
///后缀数组  倍增算法
const int maxn=500000;
char str[maxn];
int wa[maxn],wb[maxn],wv[maxn],wn[maxn],a[maxn],sa[maxn];
int cmp(int* r,int a,int b,int l)
{return r[a]==r[b]&&r[a+l]==r[b+l];}
/**n为字符串长度,m为字符的取值范围,r为字符串。后面的j为每次排
序时子串的长度*/
void DA(int* r,int* sa,int n,int m)
{
    int i,j,p,*x=wa,*y=wb,*t;
    ///对R中长度为1的子串进行基数排序
    for(i=0;i<m;i++)wn[i]=0;
    for(i=0;i<n;i++)wn[x[i]=r[i]]++;
    for(i=1;i<m;i++)wn[i]+=wn[i-1];
    for(i=n-1;i>=0;i--)sa[--wn[x[i]]]=i;
    for(j=1,p=1;p<n;j*=2,m=p)
    {
        /**利用了上一次基数排序的结果,对待排序的子串的第二关键字进行
        了一次高效地基数排序*/
        for(p=0,i=n-j;i<n;i++)y[p++]=i;
        for(i=0;i<n;i++)if(sa[i]>=j)y[p++]=sa[i]-j;
        ///基数排序
        for(i=0;i<n;i++)wv[i]=x[y[i]];
        for(i=0;i<m;i++)wn[i]=0;
        for(i=0;i<n;i++)wn[wv[i]]++;
        for(i=1;i<m;i++)wn[i]+=wn[i-1];
        for(i=n-1;i>=0;i--)sa[--wn[wv[i]]]=y[i];
        ///当p=n的时候,说明所有串都已经排好序了
        ///在第一次排序以后,rank数组中的最大值小于p,所以让m=p
        for(t=x,x=y,y=t,p=1,x[sa[0]]=0,i=1;i<n;i++)
            x[sa[i]]=cmp(y,sa[i-1],sa[i],j)?p-1:p++;
    }
    return;
}
///后缀数组  计算height数组
/**
height数组的值应该是从height[1]开始的,而且height[1]应该是等于0的。
原因是,+因为我们在字符串后面添加了一个0号字符,所以它必然是最小的
一个后缀。而字符串中的其他字符都应该是大于0的(前面有提到,使用倍
增算法前需要确保这点),所以排名第二的字符串和0号字符的公共前缀
(即height[1])应当为0.在调用calheight函数时,要注意height数组的范
围应该是[1..n]。所以调用时应该是calheight(r,sa,n)
而不是calheight(r,sa,n+1)。*/
int rank[maxn],height[maxn];
void calheight(int* r,int* sa,int n)
{
    int i,j,k=0;
    for(i=1;i<=n;i++)rank[sa[i]]=i;
    for(i=0;i<n;height[rank[i++]]=k)
    for(k?k--:0,j=sa[rank[i]-1];r[i+k]==r[j+k];k++);
    return;
}
//最长公共子串
bool check(int a,int b,int n1,int n2)
{
    if(a>b) return check(b,a,n1,n2);
    if(a<n1&&b>n1) return 1;
    return 0;
}
int main()
{
        //后缀数组 倍增算法 使用方法
        /**
        在使用倍增算法前,需要保证r数组的值均大于0。然后要在原字
        符串后添加一个0号字符,具体原因参见罗穗骞的论文。这时候,
        若原串的长度为n,则实际要进行后缀数组构建的r数组的长度应
        该为n+1.所以调用DA函数时,对应的n应为n+1.*/
       /* int n=strlen(str);//str 待处理字符串
        for(int i=0;i<n;i++) a[i]=(int)str[i];
        a[n]=0;
        DA(a,sa,n+1,256);
        calheight(a,sa,n);*/
        //....................................
    char str1[maxn],str2[maxn];
    scanf("%s%s",str1,str2);
    int n1=strlen(str1),n2=strlen(str2);
    for(int i=0;i<n1;i++) a[i]=(int)str1[i];a[n1]=1;//notice 1
    for(int i=0;i<n2;i++) a[n1+1+i]=(int)str2[i];a[n1+1+n2+1]=0;
    int n=n1+n2+1;
    DA(a,sa,n+1,256);
    calheight(a,sa,n);
        /**1. 求两个字符串的最长公共子串
        先将第二个字符串写在第一个字符串后面,中间用一个没有出现过
        的字符隔开,再求这个新的字符串的后缀数组。观察一下,看看能
        不能从这个新的字符串的后缀数组中找到一些规律。以A=“aaaba”,
        B=“abaa”为例,那么是不是所有的height值中的最大值就是答案呢?
        不一定!有可能这两个后缀是在同一个字符串中的,所以实际上只
        有当suffix(sa[i-1])和suffix(sa[i])不是同一个字符串中的两个后
        缀时,height[i]才是满足条件的。而这其中的最大值就是答案
        */
    int _max=-1;
    for(int i=1;i<=n;i++)
    {
        if(check(sa[i-1],sa[i],n1,n2)&&height[i]>_max)
        {
            _max=height[i];
        }
    }
    printf("%d/n",_max);
    return 0;
}

 

抱歉!评论已关闭.