#1
  1. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    8
    Rep Power
    0

    How to extract font-family from HTML code using JavaScript/regex?


    Hi guys,
    I am currently working on an HTML utility:
    chethan.co.in/htmlhelper

    I wanted to extract the font-family used in the HTML code using JavaScript.
    I found this regular expression, but it only matches simple font lists.

    Regular expression: font-family\s*?:.*?(;|(?=""|'|;))

    e.g. font-family:Arial, Helvetica, sans-serif

    It does not match font-family declarations like

    font-family: 'Lucida Sans', 'Lucida Sans Unicode', 'Lucida Grande', sans-serif;

    Are there any methods available to extract the font-family values?
  2. #2
  3. Come play with me!
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    13,749
    Rep Power
    9397
    Where is this CSS contained? A stylesheet? Or inline? If it's inline then you need to know the quotes used...
  4. #3
  5. Registered User
    Originally Posted by requinix
    Where is this CSS contained? A stylesheet? Or inline? If it's inline then you need to know the quotes used...
    Sorry, forgot to mention: they are all inline styles.
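    With inline styles, a regex can work if you commit to one quoting convention. The original pattern fails because the lazy `.*?` stops as soon as its lookahead sees a `'`, so the match ends at the first single quote. The sketch below (the name `extractFontFamilies` is hypothetical) assumes, as in the sample HTML later in this thread, that style attributes are wrapped in double quotes and font names inside them use single quotes. It lets whole single-quoted names through and stops at `;` or at the attribute's closing `"`.

```javascript
// Hypothetical helper, regex-based sketch.
// Assumes inline style attributes are wrapped in double quotes and that any
// quoted font names inside them use single quotes.
function extractFontFamilies(html) {
  // After "font-family:", accept either a whole 'single-quoted name' or any
  // character that is not ; " or ', repeated. This stops at the end of the
  // declaration (;) or at the closing double quote of the style attribute.
  const re = /font-family\s*:\s*((?:'[^']*'|[^;"'])+)/gi;
  const results = [];
  let m;
  while ((m = re.exec(html)) !== null) {
    results.push(m[1].trim());
  }
  return results;
}
```

    If a page mixes quoting styles (for example `style='font-family: "Lucida Sans"'`), this pattern no longer applies, which is exactly the quoting problem raised above.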
  6. #4
  7. Come play with me!
    Wait a second...
    Originally Posted by chethan.bsc
    I wanted to extract the font-family used in the HTML code using JavaScript.
    Using Javascript? Going through the DOM would be a lot easier and let you bypass the quoting problem.
    Load the text into a DOM node (without adding it to the document), then scan it for whatever you need. It makes grabbing images much easier too.
    Simple version:
    Code:
    (function(input) {
    	var div = document.createElement("div");
    	try {
    		div.innerHTML = input;
    	} catch (e) {
    		alert(e.message);
    		return;
    	}
    
    	// images
    	var images = div.getElementsByTagName("IMG");
    	console.log(images);
    
    	// font-family
    	var fontfamily = [];
    	var ffstack = [div];
    	while (ffstack.length) {
    		var ff = ffstack.pop();
    		if (ff.style && ff.style.fontFamily) {
    			fontfamily.push(ff);
    		}
    		if (ff.childNodes) {
    			for (var i = 0; i < ff.childNodes.length; i++) {
    				ffstack.push(ff.childNodes[i]);
    			}
    		}
    	}
    	console.log(fontfamily);
    })("<html><div style='font-family: bar;'>Bar with an <img src='#foo' /></div></html>");
    My Chrome 24 says
    Code:
    [img, item: function]
    	0: img
    		...
    		outerHTML: "<img src="#foo">"
    		..
    		src: "http://forums.devshed.com/newreply.php#foo"
    		...
    	length: 1
    	__proto__: NodeList
    Code:
    [div]
    	0: div
    		...
    		childElementCount: 1
    		childNodes: NodeList[2]
    		children: HTMLCollection[1]
    		...
    		innerHTML: "Bar with an <img src="#foo">"
    		innerText: "Bar with an "
    		...
    		outerHTML: "<div style="font-family: bar;">Bar with an <img src="#foo"></div>"
    		outerText: "Bar with an "
    		...
    		style: CSSStyleDeclaration
    			0: "font-family"
    			...
    			cssText: "font-family: bar;"
    			...
    			fontFamily: "bar"
    			...
    		...
    	length: 1
    	__proto__: Array[0]
    It does try to load the images, though, and I'm not a fan of that.
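    One possible way around the image loading, assuming a reasonably modern browser: parse the markup with `DOMParser`, whose documents are inert (they have no browsing context), so images should not be fetched and scripts do not run. The traversal below is the same explicit-stack walk as the code above, but it follows `children` (elements only) and records the `style.fontFamily` strings instead of the nodes. Both function names are hypothetical.

```javascript
// Sketch: collect the inline font-family value of every element under `root`.
// Same explicit-stack, depth-first walk as above, but it visits element
// children only and records strings instead of nodes.
function collectFontFamilies(root) {
  const found = [];
  const stack = [root];
  while (stack.length) {
    const node = stack.pop();
    if (node.style && node.style.fontFamily) {
      found.push(node.style.fontFamily);
    }
    // `children` holds element nodes only, so text nodes are skipped.
    const kids = node.children || [];
    // Push in reverse so elements are popped in document order.
    for (let i = kids.length - 1; i >= 0; i--) {
      stack.push(kids[i]);
    }
  }
  return found;
}

// Browser-only (hypothetical helper): DOMParser builds an inert document,
// so <img> elements are not fetched while the markup is inspected.
function parseInert(html) {
  return new DOMParser().parseFromString(html, "text/html").documentElement;
}
```

    In a browser, `collectFontFamilies(parseInert(input))` would return the inline font-family strings. The value is serialized by the browser, so quoting may be normalized (newer browsers may print double quotes where the source had single quotes).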
  8. #5
  9. Registered User
    Thanks, pal.
    I am new to JavaScript, and I don't know how to use
    Code:
    function(input)
    so I changed your code to something like this:
    Code:
    function demo(input,output) {
    	var div = document.createElement("div");
    	try {
    		div.innerHTML = input;
    	} catch (e) {
    		alert(e.message);
    		return;
    	}
    
    	// images
    	var images = div.getElementsByTagName("IMG");
    	console.log(images);
    
    	// font-family
    	var fontfamily = [];
    	var ffstack = [div];
    	while (ffstack.length) {
    		var ff = ffstack.pop();
    		if (ff.style && ff.style.fontFamily) {
    			fontfamily.push(ff);
    		}
    		if (ff.childNodes) {
    			for (var i = 0; i < ff.childNodes.length; i++) {
    				ffstack.push(ff.childNodes[i]);
    			}
    		}
    	}
    	console.log(fontfamily);
    	output.value = fontfamily;
    }
    I have gone through the console log: it contains all the images and the td elements with a font-family.
    But I still don't know how to extract the font-family values.
    I am getting output like this:
    Code:
    [object HTMLImageElement],[object HTMLImageElement],[object HTMLImageElement],[object HTMLImageElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLImageElement],[object HTMLImageElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLImageElement],[object HTMLImageElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement],[object HTMLTableCellElement]
  10. #6
  11. Come play with me!
    What do you want to do with the images and the font-familys? What should the output look like?

    For the images you can loop over the images "array". For the font-family, where the "fontfamily.push(ff);" is you can put in your own code (then get rid of the fontfamily variable).
    You can remove the console.log()s whenever you're ready, but it might help to keep them in a bit longer so you can watch for any problems with your test inputs.
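    About the images "array": `getElementsByTagName` returns an `HTMLCollection`, which is array-like (it has `length` and numeric indexes) but is not a real `Array`, so methods like `map` are not available on it directly. A small sketch, using the hypothetical helper name `imageSources`, that copies it into a plain array of `src` values:

```javascript
// Hypothetical helper: list the src attribute of each image.
// `images` can be any array-like collection, such as the HTMLCollection
// returned by div.getElementsByTagName("IMG").
function imageSources(images) {
  // Array.from copies an array-like into a real Array, applying the
  // mapping function to each element as it goes.
  return Array.from(images, function (img) {
    // getAttribute returns the value as written in the markup (e.g. "#foo"),
    // unlike img.src, which the browser resolves to an absolute URL.
    return img.getAttribute("src");
  });
}
```

    The absolute-URL behavior of `img.src` is visible in the Chrome output earlier in the thread, where `#foo` became a full forum URL.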
  12. #7
  13. Registered User
    Originally Posted by requinix
    What do you want to do with the images and the font-familys? What should the output look like?

    For the images you can loop over the images "array". For the font-family, where the "fontfamily.push(ff);" is you can put in your own code (then get rid of the fontfamily variable).
    You can remove the console.log()s whenever you're ready, but it might help to keep them in a bit longer so you can watch for any problems with your test inputs.
    Basically, this is the HTML I will be using:
    Code:
    <html>
    <head>
    <title>Sample HTML Code</title>
    </head>
    <body>
    <table width="100%">
    <tr>
    <td style="font-family:Verdana, Geneva, sans-serif;">some text</td>
    <td style="font-family:Arial, Helvetica, sans-serif">some text</td>
    <td style="font-family:'Lucida Sans', 'Lucida Sans Unicode', Arial, Helvetica, sans-serif;">some text</td>
    <td style="font-family:'Lucida Sans Unicode', 'Lucida Grande', Arial, Helvetica, sans-serif">some text</td>
    </tr>
    </table>
    </body>
    </html>
    and I made some changes as per your instructions:
    Code:
    function demo(input,output) {
    	var div = document.createElement("div");
    	try {
    		div.innerHTML = input;
    	} catch (e) {
    		alert(e.message);
    		return;
    	}
    
    	// images
    	var images = div.getElementsByTagName("IMG");
    
    	// font-family
    	var fontfamily = [];
    	var ffstack = [div];
    	while (ffstack.length) {
    		var ff = ffstack.pop();
    		if (ff.style && ff.style.fontFamily) {
    				fontfamily.push(ff.style.fontFamily+"\n\n");
    		}
    		if (ff.childNodes) {
    			for (var i = 0; i < ff.childNodes.length; i++) {
    				ffstack.push(ff.childNodes[i]);
    			}
    		}
    	}
    	output.value = Duplicates(fontfamily);
    }
    
    function Duplicates(arr) {
        var i,
        lenn = arr.length,
            out = [],
            obj = {};
    
        for (i = 0; i < lenn; i++) {
            obj[arr[i]] = 0;
        }
        for (i in obj) {
            out.push(i);
        }
        return out;
    }
    I am getting my output, and it's working nicely:
    Code:
    Verdana, Geneva, sans-serif
    
    ,Arial, Helvetica, sans-serif
    
    ,'Lucida Sans', 'Lucida Sans Unicode', Arial, Helvetica, sans-serif
    
    ,'Lucida Sans', 'Lucida Sans Unicode', 'Lucida Grande', sans-serif
    
    ,'Lucida Sans Unicode', 'Lucida Grande', Arial, Helvetica, sans-serif
    You can look at my application at chethan.co.in/htmlhelper.
    I have tried it with regex as well, and you can see the difference; I found this method more effective than the regex.
    Is there any way to remove the "," that appears only at the beginning of each font-family?
  14. #8
  15. Come play with me!
    That comma is showing up because there's an empty element in obj or out (in Duplicates).
    Also, using "for (X in Y)" can be dangerous because you might get more than you expect.
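    To illustrate the `for (X in Y)` warning: `for...in` also visits enumerable properties inherited through the prototype chain, so anything added to an object's prototype shows up alongside the real keys. A `Set` avoids object keys altogether and keeps first-seen order. The names below are hypothetical; `uniqueValues` is one possible drop-in alternative to `Duplicates`.

```javascript
// Collects keys the way Duplicates() does, with for...in.
function keysViaForIn(obj) {
  const out = [];
  for (const key in obj) {
    out.push(key); // includes inherited enumerable keys, not just own ones
  }
  return out;
}

// Hypothetical replacement for Duplicates(): removes duplicates while keeping
// the order in which values first appear. A Set compares strings by value
// and has no inherited entries that could leak into the result.
function uniqueValues(arr) {
  return Array.from(new Set(arr));
}

// Demonstration object: its prototype carries an extra enumerable key.
const proto = { inherited: 0 };
const counts = Object.create(proto);
counts["Arial, Helvetica, sans-serif"] = 0;
// keysViaForIn(counts) includes "inherited"; Object.keys(counts) does not.
```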
  16. #9
  17. Registered User
    Originally Posted by requinix
    That comma is showing up because there's an empty element in obj or out (in Duplicates).
    Also, using "for (X in Y)" can be dangerous because you might get more than you expect.
    I have fixed that issue; here is my final code:
    Code:
    function demo(input,output) {
    	var div = document.createElement("div");
    	try {
    		div.innerHTML = input;
    	} catch (e) {
    		alert(e.message);
    		return;
    	}
    	// images
    	var images = div.getElementsByTagName("IMG");
    	//console.log(images);
    
    	// font-family
    	var fontfamily = [];
    	var ffstack = [div];
    	while (ffstack.length) {
    		var ff = ffstack.pop();
    		if (ff.style && ff.style.fontFamily) {
    				fontfamily.push(ff.style.fontFamily);
    		}
    		if (ff.childNodes) {
    			for (var i = 0; i < ff.childNodes.length; i++) {
    				ffstack.push(ff.childNodes[i]);
    			}
    		}
    	}
    	//console.log(fontfamily);
        document.getElementById("resultBlock").style.display = "block";
    	var vArray = Duplicates(fontfamily);
        var fontss = vArray.join("\n\n");
    	output.value =fontss;
    }
    
    function Duplicates(arr) {
        var i,
        lenn = arr.length,
            out = [],
            obj = {};
    
        for (i = 0; i < lenn; i++) {
            obj[arr[i]] = 0;
        }
        for (i in obj) {
            out.push(i);
        }
        return out;
    }
    Now I am getting my desired output:
    Code:
    'Lucida Sans', 'Lucida Sans Unicode', Arial, Helvet, sans-serifa
    
    'Lucida Sans', 'Lucida Sans Unicode', Arial, Helvetica, sans-serifa
    
    'Lucida Sans', 'Lucida Sans Unicode', Arial, Helvetica, sans-serif
    
    'Lucida Sans', 'Lucida Sans Unicode', 'Lucida Grande', sans-serif
    
    'Lucida Sans Unicode', 'Lucida Grande', Arial, Helvetica, sans-serif
    Thanks, dude!
  18. #10
  19. Come play with me!
    Ah, I see what happened. The \n\ns were too early in the process so it looked like there were extra commas when actually there weren't. What you've done is what I would have recommended.
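    The effect described here can be checked directly: assigning an array to `output.value` converts it to a string with `Array.prototype.toString`, which joins the elements with commas. When every element already ends in `"\n\n"`, each comma lands at the start of the next line. The font values below are illustrative.

```javascript
// First version: "\n\n" appended to each entry before the array is stringified.
const early = ["Verdana, Geneva, sans-serif\n\n", "Arial, Helvetica, sans-serif\n\n"];
// String(array) is equivalent to array.join(","); the same conversion happens
// when an array is assigned to an input or textarea's value.
const earlyText = String(early);
// earlyText === "Verdana, Geneva, sans-serif\n\n,Arial, Helvetica, sans-serif\n\n"

// Final version: store bare values and join once, at the end.
const late = ["Verdana, Geneva, sans-serif", "Arial, Helvetica, sans-serif"];
const lateText = late.join("\n\n");
// lateText === "Verdana, Geneva, sans-serif\n\nArial, Helvetica, sans-serif"
```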
  20. #11
  21. Registered User
    Originally Posted by requinix
    Ah, I see what happened. The \n\ns were too early in the process so it looked like there were extra commas when actually there weren't. What you've done is what I would have recommended.
    I was wondering: do you know how to get the line number? I have a JavaScript function that uses a regular expression to list the images without an alt/title attribute, and I would like output like this:
    Alt missing in line no:___ <img src="images/Demo.jpg" border="0">
    My HTML will look like this:
    Code:
    <html>
    <head>
    <title>Untitled Document</title>
    </head>
    <body>
    <table width="600" cellpadding="0" cellspacing="0" border="0">
    <tr>
    	<td width=""><img src="images/Demo.jpg" border="0"></td>
    	<td width=""><img src="images/Demo.jpg" border="0"></td>
    </tr>
    <tr>
    	<td width=""><img src="images/Demo.jpg" border="0"></td>
    	<td width=""><img src="images/Demo.jpg" border="0"></td>
    </tr>
    </table>
    </body>
    </html>
    Below is the function:
    Code:
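    function AltTitle(aSourceHTML, aResultField) {
        // Matches <img ...> tags whose attribute text contains no "alt" substring.
        // Note: (?!alt) also rejects tags where "alt" appears elsewhere, e.g. in a src path.
        var regexp = /<img((?:(?!alt)[^<>])*)>/gim;
        var vArray = aSourceHTML.match(regexp);
        if (!vArray) {
            // match() returns null when there are no matches
            alert("No Images Found");
            return;
        }
        aResultField.value = vArray.join("\n\n");
    }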
    It returns output like this:
    Code:
    <img src="images/Demo.jpg" border="0">
    
    <img src="images/Demo.jpg" border="0">
    
    <img src="images/Demo.jpg" border="0">
    
    <img src="images/Demo.jpg" border="0">
    How can I get the line number where the alt/title attribute is missing?
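    Once markup is parsed into the DOM, source line numbers are no longer available, so a line-number report has to scan the raw text. A sketch with the hypothetical name `findImagesMissingAltTitle`: for each `<img>` tag it checks separately for `alt=` and `title=` attribute names (so a path like `images/alternate.jpg` is not mistaken for an alt attribute), and it computes the line by counting the newlines before the match. It assumes no `>` appears inside attribute values.

```javascript
// Hypothetical helper, regex-based sketch: report each <img> tag that lacks an
// alt and/or title attribute, with the 1-based line number it starts on.
// Assumes no ">" inside attribute values; a bare attribute written without
// "=" (e.g. <img alt>) is treated as missing.
function findImagesMissingAltTitle(html) {
  const imgRe = /<img\b[^>]*>/gi;
  const results = [];
  let match;
  while ((match = imgRe.exec(html)) !== null) {
    const tag = match[0];
    const missing = [];
    // Require whitespace before the name so "images/alternate.jpg" does not
    // count as an alt attribute.
    if (!/\salt\s*=/i.test(tag)) missing.push("alt");
    if (!/\stitle\s*=/i.test(tag)) missing.push("title");
    if (missing.length > 0) {
      // Line number = newlines before the tag + 1.
      const line = html.slice(0, match.index).split("\n").length;
      results.push({ line: line, missing: missing, tag: tag });
    }
  }
  return results;
}
```

    Each result could then be printed in the format from the question, e.g. `"Alt missing in line no: " + r.line + " " + r.tag`.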
