Regex to remove non printable characters

Discussion:

Ramprasad A Padmanabhan

2005-03-21 13:14:52 UTC

Hello All
I want to remove all characters with ascii values > 127 from a string

Can someone show me an efficient way of doing this?
Currently I am reading the string char-by-char and checking
each character's ASCII value. I think there must be a better way.

Thanks
Ram

----------------------------------------------------------
Netcore Solutions Pvt. Ltd.
Website: http://www.netcore.co.in
Spamtraps: http://cleanmail.netcore.co.in/directory.html
----------------------------------------------------------

--
To unsubscribe, e-mail: beginners-***@perl.org
For additional commands, e-mail: beginners-***@perl.org
<http://learn.perl.org/> <http://learn.perl.org/first-response>

Offer Kaye

2005-03-21 14:48:39 UTC

Post by Ramprasad A Padmanabhan
Hello All
I want to remove all characters with ascii values > 127 from a string
Can someone show me an efficient way of doing this?
Currently I am reading the string char-by-char and checking
each character's ASCII value. I think there must be a better way.
Thanks
Ram

$string =~ s/(.)/(ord($1) > 127) ? "" : $1/egs;

Overview:
* For every ("g") char(".") (saved in $1), if its numeric code is >
127, replace it with an empty string, otherwise leave it alone.

Explanation:
* If the string to be modified is saved in $string, then:
$string =~ s/(.)/EXPR/g;
will replace every character in $string with EXPR, due to the "g" modifier.
* Since I'm using the "e" modifier as well, EXPR is taken to be Perl
code, which is evaluated (think of "e" as short for "eval"), and the
value it returns is used for the substitution. It basically means
writing an inline subroutine that returns the value you want,
depending on the current char (which was saved in $1).
* See "perldoc -f ord" for an explanation of the ord() function, and
"perldoc perlop" for an explanation of the ?: "Conditional Operator".
* The "s" modifier is just an extra precaution on my side - if you
have embedded newlines in your string, "." will not match them unless
you use the "s" modifier. In this case it is not really needed, since
a newline (code 10) is kept either way.
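
To see the pieces above working together, here is a minimal sketch that wraps the substitution in a small subroutine and runs it on a sample string. The name strip_high_chars and the sample data are made up for illustration; only the s///egs line comes from this post.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution from this post.
sub strip_high_chars {
    my ($string) = @_;
    # /e evaluates the replacement as Perl code, /g repeats it for every
    # match, and /s lets "." match newlines as well.
    $string =~ s/(.)/(ord($1) > 127) ? "" : $1/egs;
    return $string;
}

my $sample = "caf\xE9 na\xEFve\n";   # contains the characters 0xE9 and 0xEF
print strip_high_chars($sample);      # prints "caf nave" followed by a newline
```

Working on a copy inside the subroutine leaves the caller's string untouched, which is convenient when the original is still needed.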

Hope this helps,

--
Offer Kaye

John W. Krahn

2005-03-21 20:53:36 UTC

Post by Ramprasad A Padmanabhan
Hello All

Hello,

Post by Ramprasad A Padmanabhan
I want to remove all characters with ascii values > 127 from a string

By definition ASCII only includes the characters in the range 0 to 127 so
those are non-ASCII characters.

Post by Ramprasad A Padmanabhan
Can someone show me an efficient way of doing this?
Currently I am reading the string char-by-char and checking
each character's ASCII value. I think there must be a better way.

$string =~ tr/\x80-\xFF//d;
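
A small sketch of how that line behaves, assuming a made-up sample string: with the /d modifier and an empty replacement list, tr deletes every character in the listed range, and it returns the number of characters it matched, which can be handy for logging.

```perl
use strict;
use warnings;

my $string = "r\xE9sum\xE9";   # hypothetical sample with two 0xE9 characters

# tr///d deletes every character in the range 0x80-0xFF and returns
# how many characters it matched.
my $removed = ($string =~ tr/\x80-\xFF//d);

print "$string ($removed removed)\n";   # prints "rsum (2 removed)"
```

Note that the range stops at 0xFF, so this only covers single-byte values; a decoded string holding code points above 255 would keep them.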

John

--
use Perl;
program
fulfillment

Offer Kaye

2005-03-22 12:13:54 UTC

Post by John W. Krahn
$string =~ tr/\x80-\xFF//d;

<joke>
No no, he can't use that - that solution is much too elegant! It will
also quite probably run faster than my suggested solution!
</joke>

Very nice solution! Here's a variation, using the "s///" operator:
$string =~ s/[\x80-\xFF]//g;

--
Offer Kaye

Chris Devers

2005-03-22 12:18:16 UTC

Post by Offer Kaye

Post by John W. Krahn
$string =~ tr/\x80-\xFF//d;

$string =~ s/[\x80-\xFF]//g;

If you benchmark it, I suspect the tr/// version will be much faster.

It's a simpler operation than s///, so if you can get away with using a
translation instead of a substitution, you should get a speed boost.
In a lot of cases, the tr/// is *too* simple, and you're stuck. But in
this example, it works, and should do well.

As always though, the only way to be positive is to measure it :-)
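
One way to do that measurement is with the core Benchmark module. The following is only a sketch: the test string and the labels tr_delete, s_class and s_eval_ord are invented for illustration, and the numbers will vary with the Perl version and the input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: mostly ASCII with a few high characters mixed in.
my $input = "plain text \xE9\xFF more text " x 100;

# Each candidate works on a fresh copy so every run sees the same input.
my %candidates = (
    tr_delete  => sub { my $s = $input; $s =~ tr/\x80-\xFF//d; $s },
    s_class    => sub { my $s = $input; $s =~ s/[\x80-\xFF]//g; $s },
    s_eval_ord => sub { my $s = $input; $s =~ s/(.)/(ord($1) > 127) ? "" : $1/egs; $s },
);

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%candidates);
```

The table that cmpthese prints shows iterations per second for each candidate and the relative difference between them, so no result needs to be taken on faith.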

--
Chris Devers

John W. Krahn

2005-03-22 20:13:45 UTC

Post by Offer Kaye

Post by John W. Krahn
$string =~ tr/\x80-\xFF//d;

<joke>
No no, he can't use that - that solution is much too elegant! It will
also quite probably run faster than my suggested solution!
</joke>
$string =~ s/[\x80-\xFF]//g;

Well, s/he did say s/he wanted a regex so I originally thought
s/[^[:ascii:]]+//g but then s/he asked for "a efficient way" so I had to go
with transliteration. :-)
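
For comparison, a short sketch of the POSIX-class regex next to the transliteration, using a made-up sample string. The two agree on the range 0x80-0xFF, but they can differ when the string holds code points above 0xFF, because the tr range stops at 0xFF while the negated [:ascii:] class matches everything beyond 127.

```perl
use strict;
use warnings;

# Hypothetical sample: one Latin-1 character (0xEF) and one code point
# above 0xFF (0x263A). The string is 12 characters long.
my $string = "na\x{EF}ve \x{263A} text";

(my $by_class = $string) =~ s/[^[:ascii:]]+//g;   # negated POSIX class
(my $by_tr    = $string) =~ tr/\x80-\xFF//d;      # fixed range only

print length($by_class), "\n";   # prints 10: both characters are gone
print length($by_tr), "\n";      # prints 11: \x{263A} is still there
```

Only the lengths are printed here, since printing a wide character without setting an output encoding would trigger a warning.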

John