#1

    Remove special signs


    Hi

    I'm looking for a way to remove accents from a string, so 'éàö' must become 'eao'.
    I've tried Unidecode and Unaccent, but neither seems to work.

    Thanks.
  2. #2
    Define "not work". Were you not able to install the modules, did the modules not produce what you were expecting, etc.

    For what it is worth, have you tried Text::Unaccent::PurePerl? It is like Text::Unaccent, but written in pure Perl, so it doesn't require you to install other modules.
  4. #3
    If worst comes to worst, you could always do something like this on each line of your file:

    Perl Code:
    $line =~ tr//aeeuaeioucei/;
  6. #4
    Originally Posted by Scorpions4ever
    Define "not work". Were you not able to install the modules, did the modules not produce what you were expecting, etc.
    Code:
    use strict;
    use Text::Unaccent::PurePerl;
    
    my $toConvert = "éàö";
    my $converted = unac_string($toConvert);
    print $converted;
    Output: A�A A�

    Code:
    use strict;
    use Text::Unidecode;
    
    my $toConvert = "éàö";
    my $converted = unidecode($toConvert);
    print $converted;
    Output: A(c)A AP
  8. #5
    Originally Posted by Laurent_R
    If worst comes to worst, you could always do something like this on each line of your file:

    Perl Code:
    $line =~ tr//aeeuaeioucei/;
    Code:
    use strict;
    
    my $toConvert = "éàö";
    $toConvert =~ tr//aeeuaeiooucei/;
    print $toConvert;
    Output: auaeai

    'print length($toConvert)' gives 6.
  10. #6
    Code:
    F:\temp>type accents.pl
    my $text = "éàö";
    print length($text);
    
    F:\temp>perl accents.pl
    6
    
    F:\temp>type accents2.pl
    use utf8;
    my $text = "éàö";
    print length($text);
    
    F:\temp>perl accents2.pl
    3
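    Perl assumes your source code uses an 8-bit (single-byte) character encoding unless you tell it otherwise. With "use utf8;" the literal is read as 3 characters instead of 6 bytes.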
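
The same effect can be shown without depending on how the script file is saved. This is only a minimal sketch using the core Encode module. The byte string is an assumption: it spells out the UTF-8 encoding of three accented letters (é, à, ö) as hex escapes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes for e-acute, a-grave, o-umlaut, written as escapes
# so the result does not depend on the encoding of this source file.
my $bytes = "\xC3\xA9\xC3\xA0\xC3\xB6";

# Turn the bytes into a string of Unicode characters.
my $chars = decode('UTF-8', $bytes);

print length($bytes), "\n";   # 6 (bytes)
print length($chars), "\n";   # 3 (characters)
```

Any module that works on characters, such as unidecode, presumably needs the second form rather than the first.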

  12. #7
    Consider this session under the Perl debugger (on a Unix box):

    Perl Code:
      DB<1> $string = "";
     
      DB<2> $string =~ tr//aeeuaeiooucei/;
     
      DB<3> print $string
    ucaiee


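    So this works when each accented letter is stored as a single byte (as in Latin-1). In your case the text is probably encoded as UTF-8, where each of these letters takes two bytes, so tr sees the bytes separately. I can't help you much more since I do not have a similar environment here.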
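
For what it's worth, tr can cope with accented letters once the string holds characters rather than bytes. A rough sketch, assuming the input arrives as UTF-8 bytes and that only three letters (é, à, ö) need mapping. The code points are written as escapes so the encoding of the script file does not matter, and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed input: the UTF-8 bytes for e-acute, a-grave, o-umlaut.
my $text = decode('UTF-8', "\xC3\xA9\xC3\xA0\xC3\xB6");

# Copy the string, then map U+00E9, U+00E0, U+00F6 to plain e, a, o.
(my $plain = $text) =~ tr/\x{E9}\x{E0}\x{F6}/eao/;

print "$plain\n";   # eao
```

A real mapping would need one entry per accented letter, which is why a module is usually the more practical choice.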

  14. #8
    Thanks for your answers.
    The use of utf8 solved the problem.

    I was converting the tags of FLAC files to a more robust form (so they can be read by music players that cannot deal with special characters).
    The trick is to decode the tag value from UTF-8 with utf8::decode and then run it through unidecode to replace all special characters.
    Code:
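    use strict;
    use warnings;
    use utf8;
    use Text::Unidecode;
    use Audio::FLAC::Header;
    
    # Path to the FLAC file, taken from the command line.
    my $file = shift @ARGV or die "Usage: $0 file.flac\n";
    
    my $flac = Audio::FLAC::Header->new($file);
    my $tags = $flac->tags();
    my $title = $tags->{"TITLE"};
    # The tag value is treated as raw UTF-8 bytes; decode it in place into characters.
    utf8::decode($title);
    # Transliterate the decoded characters to plain ASCII.
    my $converted = unidecode($title);
    print $converted;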
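
A different route, not the one used above, is to rely only on core modules: decompose each letter with Unicode::Normalize and drop the combining marks. This is a sketch under the assumption that the input is UTF-8 bytes, and strip_accents is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);

# Hypothetical helper: takes UTF-8 bytes, returns the text with
# combining accents removed.
sub strip_accents {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);

    # NFD splits e-acute into "e" plus a combining acute accent, and so on.
    my $decomposed = NFD($text);

    # Remove the combining marks (Unicode category Mn), keeping base letters.
    $decomposed =~ s/\p{Mn}//g;
    return $decomposed;
}

# UTF-8 bytes for e-acute, a-grave, o-umlaut.
print strip_accents("\xC3\xA9\xC3\xA0\xC3\xB6"), "\n";   # eao
```

This only helps for letters that decompose into a base letter plus an accent. Characters such as ø or ß are left as they are, which is where Text::Unidecode goes further.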
    Thread closed.
