Does anyone know how to find/replace non-printing characters?

I have a problem with an old book text file. There are 3 or more blank lines between paragraphs, and I want to reduce them to only one.

When I turn on the option to view non-printing characters, I see the "PI-ish" looking symbol (¶) at the end of every paragraph. Each extra line has that symbol as its first character. The character listed in the last line of this text looks like the non-printing character, but it must be different from it, since searching for it gives a "no match found" error.

So, is there a way to use that end of paragraph symbol/character to look for three in a row and replace it with only two? Since some of the paragraphs have more than 3 extra lines between paragraphs, I could run that find/replace several times. All I want to show is the symbol at the end of the paragraph and the one between paragraphs. Since there are at least 300 pages to the book - in text format - it would not be practicable to do this manually.
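A minimal sketch of that repeat-until-done idea, assuming the book is a plain-text file in which every paragraph end (each ¶) is a single newline character and each blank paragraph is just one more newline. Replacing three newlines with two, and repeating until nothing changes, leaves exactly one blank line between paragraphs. The function name is hypothetical; this runs outside Writer and is not a Writer feature.

```python
def collapse_by_repeating(text: str) -> tuple[str, int]:
    """Replace three consecutive paragraph ends with two, over and over,
    until a pass finds nothing left to replace.

    Returns the cleaned text and how many passes actually changed something.
    """
    passes = 0
    while "\n\n\n" in text:
        # str.replace works on non-overlapping matches from left to right,
        # so a long run shrinks over several passes, not all at once.
        text = text.replace("\n\n\n", "\n\n")
        passes += 1
    return text, passes


# A paragraph followed by four blank paragraphs (five newlines in a row).
cleaned, passes = collapse_by_repeating("Para one.\n\n\n\n\nPara two.\n")
print(repr(cleaned), passes)  # 'Para one.\n\nPara two.\n' 3
```

The pass count shows why the find/replace has to be run several times when some gaps are longer than others.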

Actually, I am trying to fix a bad .epub book. I found that if I convert it to a text file and then convert it back to an e-book format, most of the formatting issues go away when viewing it with my tablet's .epub readers [Nook and Pocketbook].

So has anyone tried to do something like this, removing blank paragraphs, i.e. blank lines between paragraphs, as an automatic find/replace option?

Hi,

if this is about removing all empty paragraphs, there is a special feature for it besides Search and Replace.

Go to Tools > AutoCorrect > AutoCorrect Options. On the 'Options' tab, remove all checks in column [M], but check 'Remove blank paragraphs'. Go back to the text and select the whole area where you want to remove empty paragraphs. Go to Tools > AutoCorrect > Apply. That's all :-)

Kind regards
Regina

As you mention in your last paragraph, this is about blank paragraphs, not blank lines between paragraphs. They come from the bad practice of inserting so-called "blank lines" as paragraph separators, but as you correctly observe, they are additional blank paragraphs. The proper way to get spacing between paragraphs is to set additional space after the paragraph with Format > Paragraph.

To get rid of the blank paragraphs, use Edit > Find and Replace, click Other options, and check Regular expressions. In the search field, enter ^$

Now you can find/replace the empty paragraphs.
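For comparison, here is a rough plain-text analogue of the ^$ search, assuming the book has been exported to a text file with one paragraph per line. It uses Python's re module rather than Writer's own regex engine, so the pattern is not identical: with re.MULTILINE, ^ matches at the start of every line, and ^\n matches a line that ends immediately, i.e. an empty paragraph. Like ^$, it removes every empty paragraph, and a line holding only a stray space is kept. The function name is hypothetical.

```python
import re


def remove_empty_paragraphs(text: str) -> str:
    """Delete every empty line (a paragraph with no content at all)."""
    # re.MULTILINE makes '^' match right after each newline, so '^\n'
    # matches the newline of a line that has nothing before it.
    return re.sub(r"^\n", "", text, flags=re.MULTILINE)


print(repr(remove_empty_paragraphs("First.\n\n\n\nSecond.\n")))  # 'First.\nSecond.\n'
```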

Tim---Kracked_P_P---webmaster wrote:

> I have a problem with an old book text file. There are 3 or more lines
> between paragraphs. I want to reduce them down to only one.
>
> When I turn on the option to view non-printing characters, I get the
> "PI-ish" looking symbol - ¶ - that is at the end of every paragraph.
> Each extra line has that symbol as its first character. The character
> listed in the last line of this text looks like the non-printing
> character, but must be different from it, since it gives a "no match
> found" error when using it.

It's just represented like that when showing non-printing characters; it's not actually inserted as that character (which is itself a printable character).
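A toy illustration of that point, not a description of how Writer works internally: the stored text contains ordinary paragraph breaks (newline characters in a plain-text file), and the ¶ glyph (U+00B6, the pilcrow) is only drawn on screen when formatting marks are shown. That is why searching for a typed ¶ reports no match. The function name is hypothetical.

```python
PILCROW = "\u00b6"  # the printable ¶ character you can type yourself


def show_formatting_marks(text: str) -> str:
    """Return a display copy with a ¶ drawn before each paragraph end.

    The original string is left untouched; only the copy gets the glyphs.
    """
    return text.replace("\n", PILCROW + "\n")


stored = "Para.\n\nNext.\n"
print(show_formatting_marks(stored))  # what the screen shows
print(PILCROW in stored)              # False: the stored text has no ¶ in it
```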

> So, is there a way to use that end of paragraph symbol/character to look
> for three in a row and replace it with only two? Since some of the
> paragraphs have more than 3 extra lines between paragraphs, I could run
> that find/replace several times. All I want to show is the symbol at
> the end of the paragraph and the one between paragraphs. Since there
> are at least 300 pages to the book - in text format - it would not be
> practicable to do this manually.

I haven't been able to easily find a way to search for consecutive paragraph breaks. However, you can find empty paragraphs by searching for "^$" (without the quotes), and ticking "Regular expressions" under "Other options". Leave "Replace With" blank and click "Replace All", and all the empty paragraphs will be removed.

I don't know if that helps, since it will remove all empty paragraphs, not just those where there are 3 or more together. Unfortunately searching for "$^$^" doesn't seem to work to find 2 consecutive empty paragraphs...
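Outside Writer, on the exported text file, consecutive paragraph breaks are just consecutive newline characters, so a single pattern can match a whole run at once. A minimal sketch in Python (the names are hypothetical, and Python's regex syntax is not the same as Writer's): it reduces two or more blank lines to exactly one in a single pass, treats lines holding only spaces or tabs as blank, and first converts Windows-style \r\n line endings to \n.

```python
import re

# One paragraph end followed by two or more blank (or whitespace-only) lines.
BLANK_RUN = re.compile(r"\n(?:[ \t]*\n){2,}")


def collapse_blank_paragraphs(text: str) -> str:
    """Leave exactly one blank line wherever two or more appear in a row."""
    text = text.replace("\r\n", "\n")  # normalize Windows line endings
    return BLANK_RUN.sub("\n\n", text)


print(repr(collapse_blank_paragraphs("A.\n\n\n\nB.\n \t\n\nC.\n\nD.\n")))
# 'A.\n\nB.\n\nC.\n\nD.\n'
```

Unlike ^$, this keeps one blank line between paragraphs instead of removing them all.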

Why do you want to leave 2 consecutive paragraph breaks anyway? If it's to get the spacing, you should remove the extra blank paragraphs and add the spacing by setting the spacing above/below paragraph in the paragraph formatting (or even better, in a paragraph style which is applied to the appropriate paragraphs).

> Actually, I am trying to fix a bad .epub book. I found that if I
> convert it to a text file and then convert it back to an e-book format,
> most of the formatting issues go away when viewing it with my tablet's
> .epub readers [Nook and Pocketbook].

Before going too far with the spacing above/below paragraph, you might want to check that it actually gets applied when converted into .epub format. I'd have thought it should, but you never know.

> So has anyone tried to do something like this, removing blank
> paragraphs, i.e. blank lines between paragraphs, as an automatic
> find/replace option?

If you need to search for and replace paragraph marks and other special characters, you can use AltSearch (http://extensions.libreoffice.org/extension-center/alternative-dialog-find-replace-for-writer). I have it installed on LO 5.1 and it works fine. Look for two end-of-paragraph characters (select them from the drop-down or type \p\p) and replace them with just one; repeat until there are no more replacements.
Cheers!
Rémy Gauthier.

Here is a link to part of a screen clip showing a typical problem:
http://libreoffice-na.us/holding/paragraph-break.jpg

This shows the problem with the text file. The .epub had even more problems: it seemed that every sentence was followed by a carriage return and/or line feed, and the text appeared to be double-spaced.

That ¶ symbol is shown in blue, and in some of the text one splits a sentence across two paragraphs, even though it is not needed for a page break. Sometimes there are 3 of these symbols between paragraphs, and other times there are 4.

I can read the book as a text file, but it would be better as an .epub file. The conversion is done by Calibre, first to the text file and then back from there. To be honest, I originally started using that package to convert PDF books to e-books, so I could read them in a larger font than the original PDF offers on my 10-inch tablet. Then I started to use it for my free book collection and other conversion needs.

This worked for a lot of the issues when I took the TXT file and converted it to an EPUB file.

I still have some issues, but it is better than it originally was.

What about simply using search/replace to replace all end-of-paragraph marks ('\n') with a mark of one's own, like '#' (as long as it doesn't occur elsewhere in the text), and then replacing every run of '##' with '\r' or '\r\r'?
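A minimal sketch of that placeholder idea, applied to the exported plain-text file rather than inside Writer (whether Writer's Find & Replace interprets \n and \r exactly this way is not shown here). Assumptions: every paragraph end is a newline character, '#' never occurs in the book, and a run of markers becomes one paragraph end plus one blank line, i.e. the '\r\r' variant. The names are hypothetical.

```python
import re

MARKER = "#"  # assumption: this character never appears in the book's text


def collapse_with_marker(text: str) -> str:
    """Squeeze runs of blank paragraphs using a temporary marker character."""
    if MARKER in text:
        raise ValueError("marker already occurs in the text; pick another one")
    # 1. Turn every paragraph end into the marker, so the text is one long line.
    flat = text.replace("\n", MARKER)
    # 2. Two or more markers in a row = a paragraph end plus blank paragraphs:
    #    replace the whole run with one paragraph end and one blank line.
    flat = re.sub("(?:" + re.escape(MARKER) + "){2,}", "\n\n", flat)
    # 3. Any marker still left was an ordinary single paragraph end.
    return flat.replace(MARKER, "\n")


print(repr(collapse_with_marker("One.\n\n\n\nTwo.\nThree.\n")))  # 'One.\n\nTwo.\nThree.\n'
```

The up-front check matters: if '#' were already in the text, step 3 would silently turn it into a paragraph break.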

f.

OpenDocument format is a form of XML, in which paragraphs are stored as
<text:p>[Content]</text:p> elements.
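To illustrate, a sketch that removes empty paragraphs directly from the XML, assuming you have already extracted content.xml from the .odt file (an .odt is a ZIP archive; unpacking and repacking it is not shown). It uses Python's xml.etree.ElementTree and the OpenDocument text namespace. Treating a <text:p> with no child elements and no non-whitespace text as "empty" is this sketch's own rule, and the names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace OpenDocument uses for text elements such as <text:p>.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
P_TAG = "{%s}p" % TEXT_NS


def drop_empty_paragraphs(root: ET.Element) -> int:
    """Remove empty <text:p> elements in place; return how many were removed."""
    removed = 0
    # Snapshot the tree first so removing children doesn't disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            is_empty = len(child) == 0 and not (child.text or "").strip()
            if child.tag == P_TAG and is_empty:
                parent.remove(child)
                removed += 1
    return removed


sample = (
    '<office:text xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
    'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    "<text:p>First</text:p><text:p/><text:p></text:p><text:p>Second</text:p>"
    "</office:text>"
)
root = ET.fromstring(sample)
print(drop_empty_paragraphs(root))         # 2
print([p.text for p in root.iter(P_TAG)])  # ['First', 'Second']
```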