Regex Programming
 

Dev Shed Forums > Programming Languages - More > Regex Programming

#1
March 30th, 2012, 01:58 PM
Tom9999

Perl - Can PCRE search string in docx and xlsx files?

Hi,

I was asked to search for a string in Microsoft Word 2007 (.docx) and Excel 2007 (.xlsx) files. I have used regex before, and I thought it would be a good tool for this.

I can find the string "ABC" (using [A][B][C]) in a Word 2003 file, but the same search fails on Word 2007 and Excel 2007 files. I even tried a Unicode search using \x{0041}\x{0042}\x{0043}, which also failed.

Could someone please advise what I did wrong, or whether this is possible at all?

Thank you in advance

Tom

#2
March 30th, 2012, 02:22 PM
requinix

.docx and .xlsx files are zip-compressed. The odds of finding the original text in plain form are low.
Extract the archives and search the relevant file(s) for your text.
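
As a rough illustration of that suggestion, here is a minimal Python sketch (not the Perl/PCRE setup from the question). It opens the file as a zip archive, strips the XML tags from each part, and runs the regex on what is left. The function name `search_ooxml` is hypothetical, and the tag stripping is a deliberately crude assumption: it joins text that Word may have split across several runs, but it also glues neighbouring paragraphs or cells together without a separator.

```python
import html
import re
import zipfile

# Crude tag stripper: removes anything that looks like an XML tag.
TAG = re.compile(r"<[^>]+>")


def search_ooxml(path, pattern):
    """Return the names of XML parts inside a .docx/.xlsx whose text matches pattern.

    path may be a filename or a file-like object; zipfile accepts both.
    """
    regex = re.compile(pattern)
    hits = []
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if not name.endswith(".xml"):
                continue
            xml = archive.read(name).decode("utf-8", errors="replace")
            # Drop the markup, then decode entities such as &amp;.
            text = html.unescape(TAG.sub("", xml))
            if regex.search(text):
                hits.append(name)
    return hits
```

For a Word file the match would normally be reported in `word/document.xml`; for Excel, cell strings usually live in `xl/sharedStrings.xml`. A call might look like `search_ooxml("test.docx", r"ABC")`.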

#3
March 30th, 2012, 03:19 PM
Tom9999

Thanks requinix.

Please forgive me for not making this clear: when I create and save a new document in Word or Excel 2007, it is automatically stored as a .docx or .xlsx file, e.g. test.docx (Word) or test.xlsx (Excel).

I hope this clarifies things.

Thanks again

Tom

#4
March 30th, 2012, 03:22 PM
Tom9999

The strange thing is that my regex works for Word 2003 files (file type .doc).

#5
March 30th, 2012, 03:46 PM
requinix

Quote:
Originally Posted by Tom9999
Please forgive me for not making this clear: when I create and save a new document in Word or Excel 2007, it is automatically stored as a .docx or .xlsx file, e.g. test.docx (Word) or test.xlsx (Excel).

Okay... That doesn't change anything.

Quote:
Originally Posted by Tom9999
The strange thing is that my regex works for Word 2003 files (file type .doc).

They're different file formats. A .doc file is binary, but its text content might not be compressed, which is why small, simple searches would probably work fine on it.
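
To make the .doc case concrete, here is a small Python sketch of a raw byte search. It rests on an assumption that is not stated in this thread: that an uncompressed .doc stores its text either one byte per character or as UTF-16LE (two bytes per character). The function name `search_raw_bytes` is hypothetical, and the approach can miss text that the format splits into separate pieces.

```python
import re


def search_raw_bytes(data, text):
    """Return True if text occurs in data as single-byte or UTF-16LE bytes."""
    candidates = (
        text.encode("cp1252"),     # one byte per character
        text.encode("utf-16-le"),  # two bytes per character, e.g. A\x00B\x00C\x00
    )
    # re.escape keeps any regex metacharacters in the needle literal.
    return any(re.search(re.escape(needle), data) for needle in candidates)
```

It could be called as `search_raw_bytes(open("test.doc", "rb").read(), "ABC")`. The same call on a .docx would be expected to fail for the reason given earlier: the bytes on disk are zip-compressed.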

#6
March 31st, 2012, 07:50 AM
Tom9999

Thank you for your help, requinix! Now I know I need to find an alternative solution.
