
Perl Frequently Asked Questions – FAQ (4) – Data: Strings

March 27, 2014 ⁄ General

How do I validate input?

The answer to this question is usually a regular expression, perhaps with auxiliary logic. See the more specific questions (numbers, email addresses, etc.) for details.

 

 


How do I unescape a string?

It depends just what you mean by ``escape''. URL escapes are dealt with in the perlfaq9 manpage. Shell escapes with the backslash (\) character are removed with:

 

    s/\\(.)/$1/g;

Note that this won't expand \n or \t or any other special escapes.
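
As a quick illustration of that substitution, here is a minimal sketch. The variable names and sample strings are made up for the example; the second string shows that a backslash followed by n simply loses its backslash rather than becoming a newline.

```perl
use strict;
use warnings;

# Hypothetical inputs. Inside single quotes, "\ " and "\$" keep their backslash.
my $escaped  = 'two\ words\ and\ \$HOME';
my $with_seq = 'line\nbreak';

# Copy each string, then drop every backslash that precedes a character.
(my $plain   = $escaped)  =~ s/\\(.)/$1/g;
(my $no_seqs = $with_seq) =~ s/\\(.)/$1/g;

print "$plain\n";      # prints "two words and $HOME"
print "$no_seqs\n";    # prints "linenbreak", not two lines
```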

 

 


How do I remove consecutive pairs of characters?

To turn ``abbcccd'' into ``abccd'':

 

    s/(.)\1/$1/g;
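
A small sketch of the substitution at work, with made-up variable names. For contrast it also shows tr/// with the /s modifier, which is a different operation: it squeezes a whole run of a repeated character down to one, instead of halving pairs.

```perl
use strict;
use warnings;

my $text = 'abbcccd';

# Each adjacent pair of identical characters collapses to one character.
(my $pairs_removed = $text) =~ s/(.)\1/$1/g;

# Not the same thing: tr with /s squeezes entire runs of letters.
(my $runs_squeezed = $text) =~ tr/a-zA-Z//s;

print "$pairs_removed\n";    # prints "abccd"
print "$runs_squeezed\n";    # prints "abcd"
```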

 

 


How do I expand function calls in a string?

This is documented in the perlref manpage. In general, this is fraught with quoting and readability problems, but it is possible. To interpolate a subroutine call (in a list context) into a string:

 

    print "My sub returned @{[mysub(1,2,3)]} that time.\n";

If you prefer scalar context, similar chicanery is also useful for arbitrary expressions:

 

    print "That yields ${\($n + 5)} widgets\n";

See also ``How can I expand variables in text strings?'' in this section of the FAQ.
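
Both forms can be tried with the sketch below. The body of mysub is an assumption invented for the example (it doubles its arguments); only the interpolation syntax comes from the text above.

```perl
use strict;
use warnings;

# Hypothetical subroutine, defined only so the example runs.
sub mysub { return map { $_ * 2 } @_ }

my $n = 3;

# @{[ ... ]} builds an anonymous array and interpolates it (list context).
my $list_msg = "My sub returned @{[ mysub(1, 2, 3) ]} that time.";

# ${\( ... )} takes a reference to the expression's value and dereferences it.
my $scalar_msg = "That yields ${\($n + 5)} widgets";

print "$list_msg\n";      # prints "My sub returned 2 4 6 that time."
print "$scalar_msg\n";    # prints "That yields 8 widgets"
```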

 

 


How do I find matching/nesting anything?

This isn't something that can be tackled in one regular expression, no matter how complicated. To find something between two single characters, a pattern like /x([^x]*)x/ will get the intervening bits in $1. For multiple ones, then something more like /alpha(.*?)omega/ would be needed. But none of these deals with nested patterns, nor can they. For that you'll have to write a parser.
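
What follows is a minimal sketch of such a parser, not a general solution. It assumes the only delimiters are ( and ), and that they never appear quoted or escaped; the subroutine name balanced_groups is hypothetical.

```perl
use strict;
use warnings;

# Return each outermost parenthesized group in $text, delimiters included.
sub balanced_groups {
    my ($text) = @_;
    my @groups;
    my $depth = 0;
    my $start;
    for my $pos (0 .. length($text) - 1) {
        my $ch = substr($text, $pos, 1);
        if ($ch eq '(') {
            $start = $pos if $depth == 0;    # an outermost group opens here
            $depth++;
        }
        elsif ($ch eq ')' && $depth > 0) {
            $depth--;
            # Back at depth 0: the outermost group just closed.
            push @groups, substr($text, $start, $pos - $start + 1)
                if $depth == 0;
        }
    }
    return @groups;
}

my @found = balanced_groups('f(a, g(b, c)) + h(d)');
print "$_\n" for @found;    # prints "(a, g(b, c))" then "(d)"
```

Keeping a depth counter is what the plain patterns above cannot do, which is why the nested g(b, c) stays inside its enclosing group.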

 

 


How do I reverse a string?

Use reverse() in a scalar context, as documented under reverse in the perlfunc manpage.

 

    $reversed = reverse $string;
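
The context matters more than it looks. The sketch below, with made-up variable names, shows the same call in scalar and in list context.

```perl
use strict;
use warnings;

my $string = 'hello';

# Scalar context: the characters are reversed.
my $reversed = reverse $string;

# List context: a one-element list is reversed, so nothing changes.
my ($unchanged) = reverse $string;

# Scalar context with several arguments: they are joined, then reversed.
my $joined = reverse 'ab', 'cd';

print "$reversed $unchanged $joined\n";    # prints "olleh hello dcba"

# print supplies list context, so force scalar context explicitly.
print scalar(reverse $string), "\n";       # prints "olleh"
```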

 

 


How do I expand tabs in a string?

You can do it the old-fashioned way:

 

    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

Or you can just use the Text::Tabs module (part of the standard perl distribution).

 

    use Text::Tabs;
    @expanded_lines = expand(@lines_with_tabs);
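
To see the two approaches side by side, here is a small sketch with invented sample lines. It assumes the default tab stop of 8 columns that Text::Tabs uses.

```perl
use strict;
use warnings;
use Text::Tabs;    # exports expand(); $tabstop defaults to 8

my @lines_with_tabs = ("a\tb", "\tindented", "abcdefgh\tz");
my @expanded_lines  = expand(@lines_with_tabs);

# The same job with the substitution loop, applied to copies of the lines.
my @by_regex = @lines_with_tabs;
for my $string (@by_regex) {
    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
}

for my $i (0 .. $#expanded_lines) {
    my $verdict = $expanded_lines[$i] eq $by_regex[$i] ? 'same' : 'different';
    print "[$expanded_lines[$i]] $verdict\n";
}
```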

 

 


How do I reformat a paragraph?

Use Text::Wrap (part of the standard perl distribution):

 

    use Text::Wrap;
    print wrap("\t", '  ', @paragraphs);

The paragraphs you give to Text::Wrap may not contain embedded newlines. Text::Wrap doesn't justify the lines (flush-right).
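
A short sketch of a complete call, for illustration only. The sample sentence and the width of 40 are assumptions; $columns is the package variable Text::Wrap uses for the line width.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap $columns);

$columns = 40;    # wrap() keeps each output line shorter than this

# One paragraph, with no embedded newlines.
my @paragraphs = (
    'The quick brown fox jumps over the lazy dog, and then it does so '
  . 'again and again until the line is long enough to wrap.',
);

# First line is indented with a tab, continuation lines with two spaces.
my $wrapped = wrap("\t", '  ', @paragraphs);
print $wrapped, "\n";
```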

 

 


How can I access/change the first N letters of a string?

There are many ways. If you just want to grab a copy, use substr:

 

    $first_byte = substr($a, 0, 1);

If you want to modify part of a string, the simplest way is often to use substr() as an lvalue:

 

    substr($a, 0, 3) = "Tom";

Although those with a regexp kind of thought process will likely prefer

 

    $a =~ s/^.../Tom/;
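
The three techniques can be compared in one place. This sketch uses $name rather than $a, since $a and $b are special to sort; the sample strings are made up. It also shows the four-argument form of substr, which replaces text of a different length just as easily.

```perl
use strict;
use warnings;

my $name = 'Bob Smith';

# Grab a copy of the first three characters.
my $first_three = substr($name, 0, 3);

# Modify in place by using substr() as an lvalue.
my $by_lvalue = $name;
substr($by_lvalue, 0, 3) = 'Tom';

# The same change with a substitution.
(my $by_regex = $name) =~ s/^.../Tom/;

# Four-argument substr: the fourth argument is the replacement text.
my $by_four_arg = $name;
substr($by_four_arg, 0, 3, 'Thomas');

print "$first_three | $by_lvalue | $by_regex | $by_four_arg\n";
# prints "Bob | Tom Smith | Tom Smith | Thomas Smith"
```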

 

 


How do I change the Nth occurrence of something?

You have to keep track. For example, let's say you want to change the fifth occurrence of ``whoever'' or ``whomever'' into ``whosoever'' or ``whomsoever'', case insensitively.

 

    $count = 0;
    s{((whom?)ever)}{
        ++$count == 5           # is it the 5th?
            ? "${2}soever"      # yes, swap
            : $1                # renege and leave it there
    }igex;
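
Here is the same counting substitution applied to a sample string, as a sketch. The input is invented so that the fifth occurrence is a capitalized ``Whomever'', which shows that the original capitalization survives.

```perl
use strict;
use warnings;

# Hypothetical input: six occurrences, the fifth being "Whomever".
my $text = join ' ', ('whoever') x 4, 'Whomever', 'whoever';

my $count = 0;
(my $changed = $text) =~ s{((whom?)ever)}{
    ++$count == 5           # is it the 5th?
        ? "${2}soever"      # yes, swap
        : $1                # no, put back what was matched
}igex;

print "$changed\n";
# prints "whoever whoever whoever whoever Whomsoever whoever"
```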

 

 


How can I count the number of occurrences of a substring within a string?

There are a number of ways, with varying efficiency. If you want a count of a certain single character (X) within a string, you can use the tr/// function like so:

 

    $string = "ThisXlineXhasXsomeXx'sXinXit";
    $count = ($string =~ tr/X//);
    print "There are $count X characters in the string";

This is fine if you are just looking for a single character. However, if you are trying to count multiple character substrings within a larger string, tr/// won't work. What you can do is wrap a while() loop around a global pattern match. For example, let's count negative integers:

 

    $string = "-9 55 48 -2 23 -76 4 14 -44";
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";
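
Both counts can be checked with the sketch below, which uses separate counters so the two examples do not interfere. The last line is an additional idiom, not taken from the text above: assigning a global match to an empty list and then to a scalar yields the number of matches.

```perl
use strict;
use warnings;

# Single character: tr/// returns how many characters it matched.
my $string  = "ThisXlineXhasXsomeXx'sXinXit";
my $x_count = ($string =~ tr/X//);

# Multi-character pattern: loop over a global match.
my $numbers   = "-9 55 48 -2 23 -76 4 14 -44";
my $neg_count = 0;
$neg_count++ while $numbers =~ /-\d+/g;

# Alternative idiom: list assignment evaluated in scalar context.
my $neg_again = () = $numbers =~ /-\d+/g;

print "$x_count X characters, $neg_count negative numbers\n";
# prints "6 X characters, 4 negative numbers"
```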

 

 


How do I capitalize all the words on one line?

To make the first letter of each word upper case:

 

        $line =~ s/\b(\w)/\U$1/g;

This has the strange effect of turning ``don't do it'' into ``Don'T Do It''. Sometimes you might want this, instead (Suggested by Brian Foy <comdog@computerdog.com>):

 

    $string =~ s/ (
                 (^\w)    #at the beginning of the line
                   |      # or
                 (\s\w)   #preceded by whitespace
                   )
                /\U$1/xg;
    $string =~ s/([\w']+)/\u\L$1/g;

To make the whole line upper case:

 

        $line = uc($line);

To force each word to be lower case, with the first letter upper case:

 

        $line =~ s/(\w+)/\u\L$1/g;
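
The difference between these substitutions is easiest to see on a sample line. The following sketch uses invented inputs and a compact one-line form of the whitespace-aware pattern.

```perl
use strict;
use warnings;

my $line = "don't do it";

# Word-boundary version: the apostrophe counts as a boundary.
(my $naive = $line) =~ s/\b(\w)/\U$1/g;

# Start of line or after whitespace only.
(my $careful = $line) =~ s/(^\w|\s\w)/\U$1/g;

# Lower-case each word, then upper-case its first letter.
(my $normalized = 'hELLO wORLD') =~ s/(\w+)/\u\L$1/g;

print "$naive\n";         # prints "Don'T Do It"
print "$careful\n";       # prints "Don't Do It"
print "$normalized\n";    # prints "Hello World"
```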

 

 


How can I split a [character] delimited string except when inside [character]? (Comma-separated files)

Take the example case of trying to split a string that is comma-separated into its different fields. (We'll pretend you said comma-separated, not comma-delimited, which is different and almost never what you mean.) You can't use split(/,/) because you shouldn't split if the comma is inside quotes. For example, take a data line like this:

 

    SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"

Due to the restriction of the quotes, this is a fairly complex problem. Thankfully, we have Jeffrey Friedl, author of a highly recommended book on regular expressions, to handle these for us. He suggests (assuming your string is contained in $text):

 

     @new = ();
     push(@new, $+) while $text =~ m{
         "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
       | ([^,]+),?
       | ,
     }gx;
     push(@new, undef) if substr($text,-1,1) eq ',';

If you want to represent quotation marks inside a quotation-mark-delimited field, escape them with backslashes (e.g., "like \"this\""). Unescaping them is a task addressed earlier in this section.

Alternatively, the Text::ParseWords module (part of the standard perl distribution) lets you say:

 

    use Text::ParseWords;
    @new = quotewords(",", 0, $text);
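
As a sketch, both approaches can be run against the sample data line from above. Note that quotewords also treats single quotes as quoting characters, so this comparison assumes the data contains no apostrophes.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# Regex approach: $+ holds the capture group that took part in the match.
my @new;
push(@new, $+) while $text =~ m{
    "([^\"\\]*(?:\\.[^\"\\]*)*)",?   # quoted field, backslash escapes allowed
  | ([^,]+),?                        # unquoted field
  | ,                                # empty field
}gx;
push(@new, undef) if substr($text, -1, 1) eq ',';

# Module approach: a second argument of 0 drops the quotes from the results.
my @words = quotewords(',', 0, $text);

printf "%2d: %s\n", $_, $words[$_] for 0 .. $#words;
```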

 

 


How do I strip blank space from the beginning/end of a string?

The simplest approach, albeit not the fastest, is probably like this:

 

    $string =~ s/^\s*(.*?)\s*$/$1/;

It would be faster to do this in two steps:

 

    $string =~ s/^\s+//;
    $string =~ s/\s+$//;

Or more nicely written as:

 

    for ($string) {
        s/^\s+//;
        s/\s+$//;
    }
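
A quick sketch confirming that the one-step and two-step forms agree on a sample string. The input is made up and includes a tab and a trailing newline.

```perl
use strict;
use warnings;

my $padded = "  \t padded text \n";

# One substitution: capture everything between the surrounding whitespace.
(my $one_step = $padded) =~ s/^\s*(.*?)\s*$/$1/;

# Two substitutions, using for() to alias the string to $_.
my $two_step = $padded;
for ($two_step) {
    s/^\s+//;
    s/\s+$//;
}

print "[$one_step] [$two_step]\n";    # prints "[padded text] [padded text]"
```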

 

 


How do I extract selected columns from a string?

Use substr() or unpack(), both documented in the perlfunc manpage.
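
For example, here is a minimal sketch using a made-up fixed-width record of two 10-column fields followed by a 4-column field. The layout and the variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical fixed-width record: 10 columns, 10 columns, then 4 columns.
my $record = sprintf('%-10s%-10s%4s', 'SMITH', 'BOB', '0042');

# substr() takes an offset and a length, and keeps any padding.
my $raw_first = substr($record, 10, 10);

# unpack() with "A" fields splits the record and strips trailing spaces.
my ($last, $first, $id) = unpack('A10 A10 A4', $record);

print "$first $last ($id)\n";    # prints "BOB SMITH (0042)"
```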

 

 


How do I find the soundex value of a string?

Use the standard Text::Soundex module distributed with perl.

 

 


How can I expand variables in text strings?

Let's assume that you have a string like:

 

    $text = 'this has a $foo in it and a $bar';
    $text =~ s/\$(\w+)/${$1}/g;

Before version 5 of perl, this had to be done with a double-eval substitution:

 

    $text =~ s/(\$\w+)/$1/eeg;

Which is bizarre enough that you'll probably actually need an EEG afterwards. :-)

See also ``How do I expand function calls in a string?'' in this section of the FAQ.
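
The substitution relies on symbolic references, so it only sees package variables and needs strict refs switched off. The sketch below shows that, with made-up values for $foo and $bar. The second half is a different technique, a hash lookup, which works under strict and with lexical data.

```perl
use strict;
use warnings;

# Symbolic references only see package variables, hence "our", not "my".
our ($foo, $bar) = ('Fred', 'Barney');

my $text = 'this has a $foo in it and a $bar';
{
    no strict 'refs';    # ${$1} is a symbolic reference
    $text =~ s/\$(\w+)/${$1}/g;
}
print "$text\n";      # prints "this has a Fred in it and a Barney"

# Alternative, not the approach above: look the names up in a hash.
my %value_for = (foo => 'Fred', bar => 'Barney');
my $template  = 'this has a $foo in it and a $bar';
(my $filled = $template) =~ s/\$(\w+)/$value_for{$1}/g;
print "$filled\n";    # prints "this has a Fred in it and a Barney"
```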

 

 


What's wrong with always quoting "$vars"?

The problem is that those double-quotes force stringification, coercing numbers and references into strings, even when you don't want them to be.

If you get used to writing odd things like these:

 

    print "$var";       # BAD
    $new = "$old";      # BAD
    somefunc("$var");   # BAD

You'll be in trouble. Those should (in 99.8% of the cases) be the simpler and more direct:

 

    print $var;
    $new = $old;
    somefunc($var);

Otherwise, besides slowing you down, you're going to break code when the thing in the scalar is actually neither a string nor a number, but a reference:

 

    func(\@array);
    sub func {
        my $aref = shift;
        my $oref = "$aref";  # WRONG
    }

You can also get into subtle problems on those few operations in Perl that actually do care about the difference between a string and a number, such as the magical ++ autoincrement operator or the syscall() function.
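
Both effects are easy to observe. In this sketch, with invented variable names, the quoted copy of a reference turns out to be an ordinary string, and ++ behaves differently on a string than on a number.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $aref  = \@array;
my $oref  = "$aref";    # stringified: text like "ARRAY(0x...)", not a reference

print "original: ", (ref($aref) || 'plain string'), "\n";    # prints "original: ARRAY"
print "quoted:   ", (ref($oref) || 'plain string'), "\n";    # prints "quoted:   plain string"

# The magical ++ increments numbers numerically and suitable strings as strings.
my $as_number = 9;
my $as_string = 'a9';
$as_number++;
$as_string++;
print "$as_number $as_string\n";    # prints "10 b0"
```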

 

 


Why don't my <<HERE documents work?

Check for these three things:

 

  1. There must be no space after the << part.
  2. There (probably) should be a semicolon at the end.
  3. You can't (easily) have any space in front of the tag.
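
A minimal sketch of a here document that avoids these problems. The tag name END_MESSAGE and the text are made up: the tag follows << with no space, the statement ends with a semicolon, and the closing tag starts in the first column.

```perl
use strict;
use warnings;

# Everything up to the line containing only END_MESSAGE becomes the string.
my $message = <<"END_MESSAGE";
First line of the here document.
Second line of the here document.
END_MESSAGE

print $message;
```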

 

