#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2012
    Posts
    4
    Rep Power
    0

    Can PCRE search for a string in .docx and .xlsx files?


    Hi,

    I was asked to search for a string in Microsoft Word 2007 (.docx) and Excel 2007 (.xlsx) files. I have used regex before, and I thought it would be a good tool for this.

    I can find the string "ABC" (using [A][B][C]) in a Word 2003 file, but the same search fails on Word 2007 and Excel 2007 files. I even tried a Unicode search using \x{0041}\x{0042}\x{0043}, which also failed.

    Could someone please advise what I did wrong, or whether this is possible at all?

    Thank you in advance

    Tom
  2. #2
  3. Come play with me!
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    13,749
    Rep Power
    9397
    .docx and .xlsx files are zip-compressed. The odds of finding the original text in plain form are low.
    Extract the archives and search the relevant file(s) for your text.
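    The extract-then-search step can be sketched in Python rather than PCRE alone. This is only a rough illustration: `search_ooxml` and `make_demo_docx` are made-up names, the demo archive is a stand-in and not a complete Word document, and scanning every `.xml` part is an assumption made to avoid guessing which part holds the text (in practice Word usually keeps body text in `word/document.xml`, and Excel usually keeps cell strings in `xl/sharedStrings.xml`).

```python
import io
import re
import zipfile
from xml.etree import ElementTree


def search_ooxml(source, pattern):
    """Return names of XML parts inside a .docx/.xlsx whose text matches pattern.

    `source` may be a file path or a binary file-like object.
    """
    regex = re.compile(pattern)
    hits = []
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if not name.endswith(".xml"):
                continue
            try:
                root = ElementTree.fromstring(archive.read(name))
            except ElementTree.ParseError:
                continue  # skip parts that are not well-formed XML
            # Join all text nodes so a word split across formatting runs still matches.
            text = "".join(root.itertext())
            if regex.search(text):
                hits.append(name)
    return hits


def make_demo_docx():
    """Build a tiny stand-in archive in memory (not a complete Word document)."""
    body = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        "<w:body><w:p><w:r><w:t>AB</w:t></w:r><w:r><w:t>C</w:t></w:r></w:p></w:body>"
        "</w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(search_ooxml(make_demo_docx(), r"ABC"))
```

    Passing a real path such as `search_ooxml("test.docx", r"ABC")` should work the same way. Because all text nodes are joined without separators, a pattern could in principle match across a paragraph or cell boundary, so treat hits as candidates to inspect.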
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2012
    Posts
    4
    Rep Power
    0

    Thanks, requinix.

    Please forgive me for not making it clear ... when I created and saved a new 2007 Word/Excel document, it was automatically stored with the file type .docx or .xlsx, e.g. test.docx (Word) or test.xlsx (Excel).

    Hope this clarifies

    Thanks again

    Tom
  6. #4
  7. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2012
    Posts
    4
    Rep Power
    0

    The strange thing is that my regex works for Word 2003 files (file type .doc).
  8. #5
  9. Come play with me!
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    13,749
    Rep Power
    9397
    Originally Posted by Tom9999
    Please forgive me for not making it clear ... when I created and saved a new 2007 Word/Excel document, it automatically stored file types as docx or xlsx; i.e. test.docx (Word) or test.xlsx (Excel).
    Okay... That doesn't change anything.

    Originally Posted by Tom9999
    The strange thing is my regex works for 2003 Word (file type is doc )
    They're different file types: .doc files are binary, but the text content might not be compressed, so small and simple searches would probably work fine.
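    As a rough illustration of why a plain search can succeed on a .doc file, the sketch below scans raw bytes for a literal string in two encodings. It assumes the text sits uncompressed in the file as either 8-bit (cp1252) or 16-bit (UTF-16LE) characters; that is an assumption about the typical case, not a guarantee, and `find_in_doc_bytes` is a made-up helper name.

```python
import re


def find_in_doc_bytes(data, text):
    """Return byte offsets where `text` occurs as cp1252 or UTF-16LE in raw bytes."""
    # Assumption: the file stores the text uncompressed in one of these encodings.
    candidates = [text.encode("utf-16-le"), text.encode("cp1252")]
    pattern = re.compile(b"|".join(re.escape(c) for c in candidates))
    return [match.start() for match in pattern.finditer(data)]


if __name__ == "__main__":
    # Synthetic bytes standing in for file contents, not a real .doc file.
    sample = b"\x00\x01" + "ABC".encode("utf-16-le") + b"\xff" + b"ABC"
    print(find_in_doc_bytes(sample, "ABC"))
```

    For a real file, `data` would come from `open("test.doc", "rb").read()`. Text that has been split into several pieces inside the file would not be found this way.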
  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2012
    Posts
    4
    Rep Power
    0

    Thank you for your help, requinix! Now I know I need to find an alternative solution.
