Thread: Solr Search

    #1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    6
    Rep Power
    0

    Solr Search


    Great site, this is my first post.

    I am running CF9.
    I have a folder of .doc and .docx files and I want to search the file contents. I don't need the metadata, just the viewable content.
    I have these same files listed in a MS SQL 2005 table.

    What I'd like is to be able to search the contents of the .doc and .docx and return the primary key and extension that are listed in the table.

    Am I making this more complicated than it needs to be? I thought this would be easier, but it's making me crazy.

    Thanks for the help
  2. #2
    Moderator

    Join Date
    Jun 2002
    Location
    Raleigh, NC
    Posts
    5,269
    Rep Power
    968
    If you think about what you're asking, this isn't easy at all. You have two separate sets of data (a directory of files, and a database table), and you want to search the content of the files but map the search results to a database row.

    That said, I believe Example #5 (toward the bottom) of the cfindex page in the docs shows one way to blend database results and file path indexing.
    Last edited by kiteless; December 18th, 2012 at 05:39 PM.
  4. #3
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    6
    Rep Power
    0
    Thank you very much for your reply.

    I had seen that reference, but I didn't see where to tell it where the files are.

    OK, maybe I can make it easier. When I upload the files, I name them with the primary key from the same database table I want to later reference. So in the folder it might look like:
    1.doc
    2.docx
    7.docx
    9.doc
    ...


    Is there a way to index the folder and use the filename as the key? Then I can use what the search returns to query the database based on the key which will match the primary key from the table.

    Thanks again
  6. #4
    Moderator

    Join Date
    Jun 2002
    Location
    Raleigh, NC
    Posts
    5,269
    Rep Power
    968
    I've actually only used Solr to index database data OR documents, but not both of them at once. What you're talking about is probably possible, but it's an unusual case and is not the normal way indexing is done. Other than going over the docs and doing some trial and error, I'm afraid I can't offer much on this specific scenario.
  8. #5
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    6
    Rep Power
    0
    OK, thanks, but in my second post I think I was only asking how to index the docs, all of the docs in a single folder.
    I'll do some more searching.
  10. #6
    Moderator

    Join Date
    Jun 2002
    Location
    Raleigh, NC
    Posts
    5,269
    Rep Power
    968
    Ah, Example #2 on the cfindex docs page shows indexing a file path.
  12. #7
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    6
    Rep Power
    0
    It worked!!!

    In case anyone else tries something similar, and I understand this is pretty basic, but here is what I did.

    Here is the cfindex call I'll run nightly to keep the search index updated.

    <cfindex
    collection="docs"
    action="refresh"
    type="path"
    key="my path"
    extensions=".doc, .docx"
    URLpath="my URL">

    Then from the Key I can get the file name:
    <cfset FileName=GetFileFromPath(Key)>

    I'll use that variable in a query to get a list of documents from the database.

    The one hiccup I had is getting the number from the key. I used something similar in another search, but all of the documents were .pdf so I could simply strip off the 4 characters on the right. I don't have anything like that this time. The file might be 1.doc or 1342.docx.
    So I used this to get the number portion of the Key:
    <cfset SearchFile = ReReplaceNoCase(FileName,"[^0-9,]","","ALL")>
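    Stripping every non-digit works here only because neither extension contains a digit; a file like 12.mp3 would come out as 123. A minimal sketch of two alternatives that take the part of the name before the extension instead. The sample paths are hypothetical and only there for illustration.

    ```cfm
    <!--- Hypothetical sample paths, for illustration only --->
    <cfset samples = "/docs/1.doc,/docs/1342.docx">

    <cfloop list="#samples#" index="p">
        <cfset FileName = GetFileFromPath(p)>

        <!--- Option 1: treat the name as a dot-delimited list and take the first item --->
        <cfset stemByList = ListFirst(FileName, ".")>

        <!--- Option 2: remove only the final ".ext" so dots earlier in the name survive --->
        <cfset stemByRegex = REReplace(FileName, "\.[^.]+$", "")>

        <cfoutput>#FileName#: #stemByList# / #stemByRegex#<br></cfoutput>
    </cfloop>
    ```

    Either way, it is probably worth checking that the result is all digits before using it as a primary key.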

    Then I'll query the database for the rows whose primary key matches those stripped keys.
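    For anyone following along, the search-then-lookup step could be sketched roughly like this. It is not the exact code from my page: form.criteria, the "myDSN" datasource, and the "documents" table with "id" and "extension" columns are all assumed names, so swap in your own.

    ```cfm
    <!--- Sketch only. Assumed names: form.criteria, datasource "myDSN",
          table "documents" with columns "id" and "extension". --->
    <cfsearch collection="docs" name="hits" criteria="#form.criteria#">

    <!--- Build a list of numeric IDs from the file name of each hit.
          For a type="path" index the key column holds the file path. --->
    <cfset ids = "">
    <cfloop query="hits">
        <cfset FileName = GetFileFromPath(hits.key)>
        <!--- Same digit-stripping idea as above, without the comma in the class --->
        <cfset SearchFile = REReplace(FileName, "[^0-9]", "", "ALL")>
        <cfif Len(SearchFile)>
            <cfset ids = ListAppend(ids, SearchFile)>
        </cfif>
    </cfloop>

    <!--- Skip the query when nothing matched, since IN () is not valid SQL --->
    <cfif ListLen(ids)>
        <cfquery name="matches" datasource="myDSN">
            SELECT id, extension
            FROM documents
            WHERE id IN (<cfqueryparam value="#ids#" cfsqltype="cf_sql_integer" list="true">)
        </cfquery>
    </cfif>
    ```

    Using cfqueryparam with list="true" keeps the IDs parameterized instead of pasting them straight into the SQL.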

    Thank you for your patience and guidance.

    Cliff
