#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2017
    Posts
    25
    Rep Power
    0

    Filter out data that contains certain marker words


    Hello dear folks, good day dear PHP experts,

    I have some lines of data, approximately 500 lines. Within this set of data there are some lines I want to filter out: each line that contains the marker word "spam":


    Code:
    'url': 'http:www.url1.com', 'the_firstname': 'Pope', 'lastname': 'of Rome', 'the email-adress': 'popeofrome@hotmail.com'
    'url': 'http:www.url2.org', 'the_firstname': 'Ben', 'lastname': 'hur', 'the email-adress': 'ben@hotmail.com'
    'url': 'http:www.url3.net', 'the_firstname': 'Ali', 'lastname': 'the champ', 'the email-adress': 'ali-thechamp0hotmail.de'
    'url': 'http:www.url4.at', 'the_firstname': 'meetoo', 'lastname': 'youtoo', 'the email-adress': 'meetoo@hotmail.com'
    Here is a line with the marker word:
    Code:
    'url': 'http:www.url5.at', 'the_firstname': 'spam', 'lastname': 'spam', 'the email-adress': 'spamadress@hotmail.com'
    Note: I have hundreds of lines of data.

    How can I filter out each line that contains the word "spam"?

    Is there an appropriate way to do that?
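A minimal sketch in Python (the language of the parsing script later in this thread), assuming the lines are already held in a list of strings. It drops every line in which "spam" appears as a whole word, ignoring case; the names `SPAM_WORD` and `drop_marked_lines` are illustrative only.

```python
import re

# Match "spam" as a whole word, ignoring case. A plain substring test
# ("spam" in line) would also hit names such as "Spamford".
SPAM_WORD = re.compile(r"\bspam\b", re.IGNORECASE)


def drop_marked_lines(lines):
    """Return only the lines that do not contain the marker word."""
    return [line for line in lines if not SPAM_WORD.search(line)]


lines = [
    "'url': 'http:www.url1.com', 'the_firstname': 'Pope', 'lastname': 'of Rome', 'the email-adress': 'popeofrome@hotmail.com'",
    "'url': 'http:www.url4.at', 'the_firstname': 'meetoo', 'lastname': 'youtoo', 'the email-adress': 'meetoo@hotmail.com'",
    "'url': 'http:www.url5.at', 'the_firstname': 'spam', 'lastname': 'spam', 'the email-adress': 'spamadress@hotmail.com'",
]

kept = drop_marked_lines(lines)
print(len(kept))  # 2
```

The whole-word match is a design choice: a line whose only trace of the marker is inside another word (say, just the address `spamadress@...`) would be kept. If such lines should go too, the plain substring test is the simpler and stricter option.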
#2
    Banned (not really)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Dec 1999
    Location
    Caro, Michigan
    Posts
    14,961
    Rep Power
    4575
    Where's the data? In a file? A database? That'll kinda determine how to filter it.

    Answer is probably going to be to read it line by line and only save the data that doesn't have "spam" in it.
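If the data sits in a text file, the line-by-line approach could look roughly like this in Python. The function name `filter_stream` and the file names in the usage comment are hypothetical. Because it reads one line at a time, the whole file never has to be held in memory, and it uses a case-insensitive substring test, so a line with `spamadress@...` is dropped as well.

```python
import io


def filter_stream(src, dst, marker="spam"):
    """Copy lines from src to dst, skipping every line that contains marker.

    src and dst can be any text file objects (open files, io.StringIO).
    The match is a case-insensitive substring test. Returns the number of
    lines kept.
    """
    marker = marker.lower()
    kept = 0
    for line in src:  # iterating a file object yields one line at a time
        if marker in line.lower():
            continue  # drop the line
        dst.write(line)
        kept += 1
    return kept


# With real files (hypothetical names):
# with open("data.txt") as src, open("clean.txt", "w") as dst:
#     filter_stream(src, dst)

src = io.StringIO("'the_firstname': 'Ben'\n'the_firstname': 'spam'\n'the_firstname': 'Ali'\n")
dst = io.StringIO()
print(filter_stream(src, dst))  # 2
```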
    -- Cigars, whiskey and wild, wild women. --
#4
    Registered User



    Hello dear Sepodati,

    First of all, many thanks for the quick reply. Great to hear from you. The data is the result of a parsing process, and if I were able to store it in a MySQL db, that would be even more fantastic.


    Originally Posted by Sepodati
    Where's the data? In a file? database? That'll kinda determine how to filter it.
    Answer is probably going to be to read it line by line and only save the data that doesn't have "spam" in it.

    If we were able to do this and store it in a MySQL db, I would be more than glad.


#5
    Banned (not really)
    So the data is in a PHP variable? In an array or just all the lines in one variable?
#6
    Registered User
    Hello dear Sepodati,

    First of all, many thanks for the quick reply. Great to hear from you.
    The data was produced by this parsing job:



    Code:
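    # Python 3: Python 2's urllib/urlparse now live in urllib.request/urllib.parse
    import re
    from urllib.parse import urljoin
    from urllib.request import urlopen
    
    url = "http://search.cpan.org/author/?W"
    html = urlopen(url).read().decode("utf-8", errors="replace")
    # Each match is (relative author link, CPAN ID, real name).
    for lk, capname, name in re.findall(r'<a href="(/~.*?/)"><b>(.*?)</b></a><br/><small>(.*?)</small>', html):
        alk = urljoin(url, lk)
        data = {'url': alk, 'name': name, 'cname': capname}
        # Fetch the author's own page and look for a mailto: link.
        phtml = urlopen(alk).read().decode("utf-8", errors="replace")
        memail = re.search(r'<a href="mailto:(.*?)">', phtml)
        if memail:
            data['email'] = memail.group(1)
    
        print(data)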

    The output looks like this:

    Code:
    {'url': 'http://search.cpan.org/~aayars/', 'cname': 'AAYARS', 'name': 'Alex Ayars', 'email': 'pause%40nodekit.org'}
    {'url': 'http://search.cpan.org/~abablabab/', 'cname': 'ABABLABAB', 'name': 'ross', 'email': 'cpan%40abablabab.co.uk'}
    {'url': 'http://search.cpan.org/~aayars/', 'cname': 'AAYARS', 'name': 'Alex Ayars', 'email': 'pause%40nodekit.org'}
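Two things stand out in this output: the addresses are URL-encoded (`%40` is the encoding of `@`), and the `aayars` record appears twice. A small Python 3 sketch, assuming the records are collected into a list of dicts like the ones printed above, could clean up both before anything is stored. `clean_records` is a hypothetical helper name, and treating `url` as the identity of a record is an assumption.

```python
from urllib.parse import unquote


def clean_records(records):
    """Decode URL-encoded e-mail addresses and drop duplicate records.

    Duplicates are detected by their 'url' value; the first occurrence wins.
    """
    seen = set()
    cleaned = []
    for rec in records:
        if rec["url"] in seen:
            continue
        seen.add(rec["url"])
        rec = dict(rec)  # copy so the caller's dicts are left untouched
        if "email" in rec:
            rec["email"] = unquote(rec["email"])  # 'pause%40nodekit.org' -> 'pause@nodekit.org'
        cleaned.append(rec)
    return cleaned


records = [
    {'url': 'http://search.cpan.org/~aayars/', 'cname': 'AAYARS', 'name': 'Alex Ayars', 'email': 'pause%40nodekit.org'},
    {'url': 'http://search.cpan.org/~abablabab/', 'cname': 'ABABLABAB', 'name': 'ross', 'email': 'cpan%40abablabab.co.uk'},
    {'url': 'http://search.cpan.org/~aayars/', 'cname': 'AAYARS', 'name': 'Alex Ayars', 'email': 'pause%40nodekit.org'},
]
for rec in clean_records(records):
    print(rec["cname"], rec["email"])
```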
    It would be great if we could store this in a db automatically.

    I guess this is possible as a

    - Python list
    - Python dictionary

    or do you have any other idea?
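A list of dicts like the output above maps well onto a parameterized `executemany` call. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in so it runs without a database server; with MySQL you would use a MySQL driver instead, whose placeholder syntax typically differs (for example `%(url)s`). The table name `authors` and its columns are assumptions.

```python
import sqlite3

# Records shaped like the scraper output above.
records = [
    {'url': 'http://search.cpan.org/~aayars/', 'cname': 'AAYARS', 'name': 'Alex Ayars', 'email': 'pause%40nodekit.org'},
    {'url': 'http://search.cpan.org/~abablabab/', 'cname': 'ABABLABAB', 'name': 'ross', 'email': 'cpan%40abablabab.co.uk'},
    {'url': 'http://search.cpan.org/~aayars/', 'cname': 'AAYARS', 'name': 'Alex Ayars', 'email': 'pause%40nodekit.org'},
]

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
conn.execute(
    "CREATE TABLE authors (url TEXT PRIMARY KEY, cname TEXT, name TEXT, email TEXT)"
)

# The scraper only sets 'email' when it finds one, so default it to None (NULL).
rows = [{"email": None, **rec} for rec in records]

with conn:  # commits the transaction if no exception is raised
    # Named placeholders are filled from each dict. OR IGNORE (SQLite syntax;
    # MySQL uses INSERT IGNORE) skips rows whose url is already stored.
    conn.executemany(
        "INSERT OR IGNORE INTO authors (url, cname, name, email) "
        "VALUES (:url, :cname, :name, :email)",
        rows,
    )

for row in conn.execute("SELECT cname, email FROM authors ORDER BY cname"):
    print(row)
```

Using placeholders rather than pasting values into the SQL string keeps quotes in names or addresses from breaking the query.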

    greetings


    Update:
    For the database connection we can do something like this:


    Code:
    <?php
    /**
     * Storing some data in the MySQL db: how to do that?
     *
     * Uses mysqli; the old mysql_* functions were deprecated in PHP 5.5
     * and removed in PHP 7.
     */
    
    $dbhost = 'localhost';
    $dbuser = 'root';
    $dbpass = '';
    $dbname = 'map';
    
    // Connect and select the database in one call, then check the result
    // before using the connection.
    $conn = mysqli_connect($dbhost, $dbuser, $dbpass, $dbname);
    
    if (!$conn) {
        die('Could not connect: ' . mysqli_connect_error());
    }
    
    echo 'Connected successfully';
    // ... and so forth ...
#7
    Banned (not really)
    Fix your Python code so it doesn't create the entry if "spam" is in it.

    Wouldn't it be better to filter at the source rather than having PHP do it?
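Filtering at the source could look like this: the check runs before a record is kept, so spam entries never reach the output or the database. This is a Python 3 sketch; the input tuples are hypothetical stand-ins for the `(lk, capname, name)` triples that the scraper's `re.findall` call returns, and `is_spam` and `collect` are illustrative names.

```python
from urllib.parse import urljoin

BASE_URL = "http://search.cpan.org/author/?W"


def is_spam(record, marker="spam"):
    """True if any field of the record contains the marker (case-insensitive)."""
    return any(marker in str(value).lower() for value in record.values())


def collect(matches):
    """Build records from (link, capname, name) triples, skipping spam ones.

    In the real scraper this check would sit just before the record is
    printed or saved, after the e-mail has been added.
    """
    records = []
    for lk, capname, name in matches:
        data = {"url": urljoin(BASE_URL, lk), "name": name, "cname": capname}
        if is_spam(data):
            continue  # the entry is never created
        records.append(data)
    return records


# Hypothetical triples shaped like the output of the scraper's re.findall call.
matches = [
    ("/~aayars/", "AAYARS", "Alex Ayars"),
    ("/~spammer/", "SPAMMER", "spam"),
]
print(collect(matches))
```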
