LO Writer and regex - finding "everything" but one thing

gordom · May 21, 2015, 7:38am

Hello everybody.
Probably the title of this post is not very clear, sorry for that ;).

I have a bunch of text (HTML code) and need to find paragraph tags together with their classes, ids, styles (if any), etc. I'm doing this using the following regexes:
<p(.*?)> or (<p([^>]+))>
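
For readers comparing the two patterns, here is a minimal sketch of how they differ. It uses Python's `re` module as a stand-in for Writer's regex engine, and the sample string and variable names are made up for illustration. The lazy version also matches a bare `<p>` (with an empty capture), while the `[^>]+` version needs at least one character after `<p`.

```python
import re

# The two patterns from the post.
lazy = re.compile(r'<p(.*?)>')
negated = re.compile(r'(<p([^>]+))>')

# Hypothetical sample: one paragraph with attributes, one bare paragraph.
sample = '<p class="intro" id="a1">text</p><p>bare</p>'

# Group 1 of the lazy pattern holds the attributes (may be empty).
lazy_attrs = [m.group(1) for m in lazy.finditer(sample)]

# Group 2 of the second pattern holds the attributes; a bare <p> is skipped
# because [^>]+ requires at least one character before the closing '>'.
negated_attrs = [m.group(2) for m in negated.finditer(sample)]

print(lazy_attrs)
print(negated_attrs)
```

Note that both patterns would also match other tags starting with `<p`, such as `<pre>`.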

The pattern of my text is here:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Aliquam mi sapien, rutrum eget sem vel, semper efficitur.<a href="xyz.html" class="topiclink">vitae velit</a>

Donec fringilla sapien vitae interdum volutpat.

Cras nec orci non dolor ultrices luctus sit amet vitae velit.

The problem is that I need to find every occurrence of the tag except those with one certain class (i.e. I want to skip paragraph tags of that class). I don't know how to write a regex exclusion that is treated as a whole string rather than as a set of individual characters. I tried to use back-references, with no success. I want to use regex because the tag classes to be avoided are different on each page (but they follow a certain pattern), and the job should be as automatic as possible (the code should be as versatile as possible).
I would appreciate any help. Kind regards,

gordom

mariosv · May 21, 2015, 10:17pm

Hi @gordom,

I hope this is what you are looking for.

<p(?!.*topic.*).*<\/p>

finds paragraphs with a 'p' tag that do not have 'topic' inside.

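(?!.*topic.*).* does the exclusion: the negative lookahead (?!.*topic.*) makes the match fail if 'topic' appears anywhere after '<p', and the following .* then consumes the rest up to the closing tag.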
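
A minimal sketch of this exclusion, using Python's `re` module as a stand-in for Writer's regex engine (the lookahead syntax is the same). The sample lines, class names and the helper `matching` are hypothetical. The second pattern, `tag_only`, is not from this thread: it is an assumed variant that limits the lookahead to the opening tag, so a paragraph is not rejected merely because 'topic' shows up later in its content (for example in a link class such as `topiclink`).

```python
import re

# Hypothetical sample lines; each string stands for one line of HTML source.
lines = [
    '<p class="body">Lorem ipsum dolor sit amet.</p>',
    '<p class="topicintro">Aliquam mi sapien.</p>',
    '<p id="x1" style="color:red">Donec fringilla.</p>',
    '<p>Cras nec orci.</p>',
    '<p class="body">See <a href="xyz.html" class="topiclink">velit</a></p>',
]

# Pattern from the reply: fails if "topic" occurs anywhere after "<p" on the line.
whole_line = re.compile(r'<p(?!.*topic.*).*</p>')

# Assumed variant (not from the thread): only inspect the opening tag itself.
tag_only = re.compile(r'<p(?![^>]*topic)[^>]*>')

def matching(pattern, rows):
    """Return the rows in which the pattern finds a match."""
    return [row for row in rows if pattern.search(row)]

kept = matching(whole_line, lines)
kept_tag_only = matching(tag_only, lines)

print(kept)
print(kept_tag_only)
```

With these samples, the pattern from the reply skips both the `topicintro` paragraph and the last line (because of the `topiclink` anchor inside it), while the assumed variant skips only the `topicintro` paragraph. Which behaviour is wanted depends on the actual pages.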

A good place to test regexes: https://regex101.com/

Miguel Ángel.

<http://nabble.documentfoundation.org/file/n4149169/Captura1.png>

gordom · May 22, 2015, 3:17pm

Thank you very much Miguel for your answer. You helped me a lot :).
Regards,
gordom
