Hello everybody.
The title of this post is probably not very clear, sorry for that ;).
I have a bunch of text (HTML code) and need to find <p> tags together with their classes, ids, styles (if any), etc. I'm doing this with one of the following regexes:
<p(.*?)> or (<p([^>]+))>
Here is a sample of my text:
<p class="navi_buttons">Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
<p class="reg">Aliquam mi sapien, rutrum eget sem vel, semper efficitur.<a href="xyz.html" class="topiclink">vitae velit</a></p>
<p class="THIS_SHOULD_BE_AVOIDED">Donec fringilla sapien vitae interdum volutpat.</p>
<p class="nav">Cras nec orci non dolor ultrices luctus sit amet vitae velit.</p>
The problem is that I need to find every occurrence of the <p> tag except those with one particular class (i.e. I want to skip paragraph tags of that class). I don't know how to write a regex exclusion that is treated as a whole string rather than as a set of individual characters. I tried using back-references, with no success. I want to use a regex because the classes to be avoided differ from page to page (though they follow a certain pattern), and the job should be as automatic as possible (the code should be as versatile as possible).
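One construct that might fit the exclusion described above is a negative lookahead, (?!...), which rejects a match when a whole substring (not a character set) appears ahead. Below is a minimal sketch in Python; the language is an assumption, since the post does not name one, and the names find_p_tags and avoid_class_pattern are hypothetical. It also assumes the class attribute is written with double quotes and holds a single class name, as in the sample.

```python
import re

def find_p_tags(html, avoid_class_pattern):
    """Return the attribute text of every <p> tag whose class does not match.

    avoid_class_pattern is a regex fragment describing the class to skip
    (hypothetical parameter; assumes class="..." holds a single class name).
    """
    pattern = re.compile(
        r'<p\b'                      # "<p" as a whole tag name, so <pre> is not matched
        r'(?![^>]*\bclass="(?:' + avoid_class_pattern + r')")'  # reject tags with the avoided class
        r'([^>]*)>'                  # capture the attributes up to the closing ">"
    )
    return [m.group(1).strip() for m in pattern.finditer(html)]

sample = '''<p class="navi_buttons">Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
<p class="reg">Aliquam mi sapien, rutrum eget sem vel, semper efficitur.<a href="xyz.html" class="topiclink">vitae velit</a></p>
<p class="THIS_SHOULD_BE_AVOIDED">Donec fringilla sapien vitae interdum volutpat.</p>
<p class="nav">Cras nec orci non dolor ultrices luctus sit amet vitae velit.</p>'''

# Skip the one class by its exact name.
print(find_p_tags(sample, r'THIS_SHOULD_BE_AVOIDED'))
# → ['class="navi_buttons"', 'class="reg"', 'class="nav"']
```

Because the avoided class is passed in as a regex fragment, a per-page pattern such as r'THIS_\w+' could be used instead of a fixed name. For HTML that is less regular than the sample, a real HTML parser would likely be more robust than a regex.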
I would appreciate any help. Kind regards,
gordom