October 9th, 2018, 07:39 AM
-
remove <span></span>
I would like to remove all span HTML tags: the opening tag (with whatever attributes it contains) and the closing tag too, i.e. <span xxx...></span>
I also need to strip everything inside the opening <p xxxx....> tags, leaving just <p>.
I would prefer two separate operations so that I can learn to do this on my own.
What is the best way to do this?
----------
Update: I found the answer on Stack Overflow
'/<span[^>]+\>/i' and '/<\/span\>/i'
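For anyone who wants to try this outside PHP, here is a minimal sketch of the same idea in Python, kept as two separate operations. The function names are made up for illustration, and the patterns are a slight variation on the ones quoted above: `[^>]*` instead of `[^>]+` so that a bare `<span>` is matched too, and `\b` so that a tag such as `<pre>` is not mistaken for `<p>`. It assumes no attribute value contains a literal ">".

```python
import re

# re.IGNORECASE plays the role of the /i flag in the PHP-style patterns.
SPAN_OPEN = re.compile(r"<span\b[^>]*>", re.IGNORECASE)
SPAN_CLOSE = re.compile(r"</span\s*>", re.IGNORECASE)
P_OPEN = re.compile(r"<p\b[^>]*>", re.IGNORECASE)


def strip_span_tags(html: str) -> str:
    """Operation 1: drop opening and closing span tags, keep the text between them."""
    html = SPAN_OPEN.sub("", html)
    return SPAN_CLOSE.sub("", html)


def strip_p_attributes(html: str) -> str:
    """Operation 2: reduce every opening <p ...> tag to a plain <p>."""
    return P_OPEN.sub("<p>", html)


sample = '<p class="intro" style="x"><span class="a">Hello</span> world</p>'
print(strip_span_tags(sample))
print(strip_p_attributes(sample))
```

Running the two functions one after the other gives `<p>Hello world</p>` for the sample string.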
October 9th, 2018, 08:17 AM
-
Well, regex is going to be the wrong tool for the job.
You'll find it hard to distinguish the actual spans you want to remove from all the other places where "span" can appear (such as the body of your text).
Use a tool which knows how to navigate HTML/XML - https://en.wikipedia.org/wiki/XPath
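As a rough illustration of the parser-based approach, here is a sketch that uses Python's standard-library html.parser rather than XPath (a swapped-in technique, chosen only because it needs no extra packages). The class name SpanStripper and the helper clean are hypothetical. The parser re-emits the document piece by piece, skipping span tags and dropping attributes on <p>, so a "<span" that appears as escaped text in the body is left alone.

```python
from html.parser import HTMLParser


class SpanStripper(HTMLParser):
    """Re-emit HTML, dropping span tags and the attributes on <p> tags."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &lt; as written.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            return                      # drop the opening span tag
        if tag == "p":
            self.out.append("<p>")      # keep the tag, drop its attributes
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are passed through unchanged.
        if tag != "span":
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "span":
            self.out.append(f"</{tag}>")  # note: tag name is lowercased

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def clean(html: str) -> str:
    parser = SpanStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


post = '<p class="x">Use &lt;span class="a"&gt; like <span class="a">this</span></p>'
print(clean(post))
```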
October 11th, 2018, 06:04 AM
-
I don't share your skepticism. All that is needed is to find every "<span", then erase it and everything after it up to and including the next ">". I don't know regex well enough to do it, but it would be something like "<span[^>]*>".
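The scan described here does not strictly need regex. A minimal sketch in Python, with a made-up function name, assuming a case-sensitive match and that closing tags are always written exactly as "</span>":

```python
def remove_span_tags(html: str) -> str:
    """Erase every "<span" through the next ">", then drop "</span>"."""
    out = []
    pos = 0
    while True:
        start = html.find("<span", pos)   # case-sensitive search
        if start == -1:
            out.append(html[pos:])        # no more opening tags
            break
        end = html.find(">", start)
        if end == -1:
            out.append(html[pos:])        # unterminated tag: leave the rest alone
            break
        out.append(html[pos:start])       # keep the text before the tag
        pos = end + 1                     # skip past the ">"
    return "".join(out).replace("</span>", "")


print(remove_span_tags('a<span class="x">b</span>c'))
```

Because it looks only for the characters "<span", this sketch would also erase a tag whose name merely starts with "span", and it stops at the first ">" even if that character sits inside an attribute value.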
October 11th, 2018, 10:42 AM
-
Go ahead and try it on the HTML of your original post.