#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2011
    Posts
    2
    Rep Power
    0

    Question Solr errors when indexing custom file extensions.


    Greetings!

    I am working on my company's website and need to be able to index the web pages with Solr. The site is configured to treat files with the .ak extension as .cfm files, but Solr throws errors when it tries to index them.

    While testing, I found that if I remove the <head> tags from the documents, there are no errors. I've looked through the Solr config files for a place to tell Solr that .ak files should be parsed as .cfm files, but I have been unable to find such a setting. Does one exist? If not, is there another way to resolve this issue?



    Thanks for your help,
    Dave
  2. #2
  3. No Profile Picture
    Moderator

    Join Date
    Jun 2002
    Location
    Raleigh, NC
    Posts
    5,278
    Rep Power
    968
    Well, if I follow you correctly, I'm not sure this would do what you think it will. When you point Solr at a directory, it indexes the file content. Solr has no idea what "ColdFusion" means; it just parses the raw text of the files. That probably isn't going to do much good if your CF templates are actually showing dynamic data at runtime.

    Consider a CF template named product.cfm. At runtime you might pass a URL variable like product.cfm?id=20, which would show the information for the product with the ID of 20. But when Solr indexes product.cfm, it has no idea about product IDs or anything else. It's just going to index the actual text in the product.cfm file.

    Make sense?
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2011
    Posts
    2
    Rep Power
    0
    Kiteless,
    Thanks for your response. I understand what you are saying. The pages I am trying to index contain some static content that was placed there for indexing.

    I am indexing a directory that has duplicate files with both the .cfm and .ak extensions (the only difference between the files is the filename extension). If I index just the .cfm files, I have no problems. But when I index the .ak versions, the indexing errors out and finds 0 files. This happens when indexing through the Administrator window as well as with cfindex. If I remove the <head> tags, the indexing returns no errors and indexes the files properly.
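
    Since the .cfm copies index without problems, one possible workaround is to stage renamed copies of the .ak files and point the indexer at the staging directory instead. The following is only a sketch in Python under that assumption; the function name and both directory arguments are hypothetical, and it does nothing to Solr or ColdFusion settings.

```python
import shutil
from pathlib import Path


def stage_with_extension(src_dir, dest_dir, old_ext=".ak", new_ext=".cfm"):
    """Copy every old_ext file under src_dir into dest_dir with new_ext.

    Sub-directories are preserved. Returns the staged paths in sorted order.
    """
    src_root = Path(src_dir)
    dest_root = Path(dest_dir)
    staged = []
    for src in sorted(src_root.rglob("*" + old_ext)):
        if not src.is_file():
            continue
        # Keep the relative location, swap only the final extension.
        target = dest_root / src.relative_to(src_root).with_suffix(new_ext)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)
        staged.append(target)
    return staged
```

    The staging directory would then be what gets indexed. Any URLs stored in the collection would refer to the .cfm names, so they would need to be mapped back to .ak when displaying search results.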
  6. #4
  7. No Profile Picture
    Moderator

    Join Date
    Jun 2002
    Location
    Raleigh, NC
    Posts
    5,278
    Rep Power
    968
    Hmm, I'm not sure then, as I haven't needed to dig into the guts of Solr myself. My guess is that since the Solr instance doesn't know how to process that extension, it's treating it in some default way. Maybe it treats the file as XML, or maybe it grabs the content and tries to force it into an XML CDATA block, which means anything in the file that would be interpreted as invalid XML could make it blow up.

    That said, a quick look at the Solr docs doesn't help much. Once again, the CF engineers have done an amazing job of taking something really complicated and making it easy to use. So my guess is you'll need to pore over the Solr docs or grab one of the Solr books to figure out what Solr config or setting will make it handle that extension the way you want it to. :-/
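
    To illustrate the XML guess above (and it is only a guess, nothing here confirms how Solr or ColdFusion actually handles an unknown extension), here is a small Python sketch that checks whether a page is well-formed XML. The sample markup is hypothetical and not taken from the actual site. A typical <head> section often contains tags such as <meta> that are never closed, which is fine in HTML but fatal to a strict XML parser.

```python
import xml.etree.ElementTree as ET


def xml_parse_error(markup):
    """Return None if markup is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Hypothetical page content, used only for illustration.
with_head = (
    "<html><head><meta name='description' content='static text'>"
    "<title>Product</title></head><body><p>Static content</p></body></html>"
)
without_head = "<html><body><p>Static content</p></body></html>"

print(xml_parse_error(with_head))     # an error message: <meta> is never closed
print(xml_parse_error(without_head))  # None
```

    If the real .ak files fail a check like this with the <head> section in place and pass without it, that would at least be consistent with the behavior described in this thread.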
