Given a list of documents, is there a convenient way of doing a Find on
the whole set of documents in the list?
Hi
Which platform?
If you are on GNU/Linux then some sort of grep command on the command line
might be the most efficient route.
I am assuming you're searching the contents rather than just searching for a
file-name, as a file-name search would be fairly trivial in any OS, wouldn't
it?
Regards from
Tom
Somebody gave me a script. I find it very useful for finding a word or phrase within my odt documents. I hope this helps.
Ubuntu 14.04
Don Pobanz
don@PC-002:~/Documents/Sermons$ cat search_odt_files
#!/bin/bash
# run this script by typing the following
# sh search_odt_files "this string"
if [ $# -ne 1 ]; then
    echo "Usage: searchodt searchterm"
    exit 1
fi
# added */ in front of *.odt to search subdirectories of a directory
# (globbing directly, rather than parsing ls, copes with spaces in file names)
for file in */*.odt; do
    unzip -ca "$file" content.xml | grep -ql "$1"
    if [ $? -eq 0 ]; then
        echo "$file"
    fi
done
don@PC-002:~/Documents/Sermons$
Thank you, Don, but that only shows which files contain the
search string. (It's likely that all files in the list will contain at
least one occurrence of the string.)
That would be a start, but what I am looking for is a means of seeing
the string as if Writer was showing the file contents, so that I can
see the surrounding text.
(Equivalent to joining all the doc's into one big file, then doing a
Find. Perhaps I shall have to do the joining manually...)
Try changing the line:
unzip -ca "$file" content.xml | grep -ql "$1"
to:
unzip -ca "$file" content.xml | grep -C 10 "$1"
The "-q" and "-l" options stop grep from printing the matching text
(the script only uses its exit status), so they have to go. The "-C #"
gives # lines of context around the match. Or you could use "-B #" and
"-A #" to print # lines of leading and trailing context, respectively.
You could also make a script to pull the contents of all the files and
concatenate them in such a way that you can use Writer to do find
inside one big document, but that would be considerably harder. Try
this first.
Paul
Disclaimer: I haven't actually tested this, just done a "man grep", but
I think the syntax is right...
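One caveat, hedged because it depends on how the file was saved: content.xml often holds most of the document on very few physical lines, so grep's line-based context may come back as one large chunk of XML. Below is a minimal Python sketch of the same "context around a match" idea, counted in paragraphs rather than raw lines. The function name and the sample text are made up for illustration, and unlike plain grep it ignores case.

```python
def matches_with_context(paragraphs, term, context=1):
    """Yield (index, window) for each paragraph that contains term.

    window is the list of paragraphs from `context` before the match to
    `context` after it, similar to grep -C but counted in paragraphs.
    The comparison ignores case (an assumption; grep is case-sensitive
    unless -i is given).
    """
    needle = term.lower()
    for i, para in enumerate(paragraphs):
        if needle in para.lower():
            start = max(0, i - context)
            yield i, paragraphs[start:i + context + 1]


if __name__ == "__main__":
    # Made-up sample text standing in for the paragraphs of a document.
    sample = [
        "First paragraph.",
        "Grace is mentioned here.",
        "Third paragraph.",
        "Fourth paragraph.",
    ]
    for index, window in matches_with_context(sample, "grace"):
        print(f"match in paragraph {index}:")
        for line in window:
            print("   ", line)
```

Overlapping windows are printed separately here; merging them, as grep does, is left out to keep the sketch short.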
Hi
I could easily be wrong but I think the first "echo" could use updating?
It doesn't affect how it runs or any processing because the "echo" is only
regurgitated onto the display for a human being to read. It's where you'd
put "hello world" if you wanted the display to just show that. I'm not
sure if I've explained that well or just made it as clear as mud, because
I'm guessing everyone else already knew that, right?
Even though it's not processed by the machine and is just for human-readers
to see, it's still (imo) handy for it to give the correct information. So,
probably change
echo "Usage: searchodt searchterm"
to
echo "Usage: search_odt_files searchterm"
or else change the script's file-name back to searchodt, as it appears to
have been at some previous point in the script's 'life'.
Regards from
Tom
That's what I shall perhaps finish up doing...
Any particular reason? Did the arguments to grep not work, or do you
just not find that style of output particularly useful?
Well, Maurice quoted from my mail, so I'm pretty sure he did receive it.
Btw: Tom, your mail was addressed to me directly, and CCd to the group,
causing my default reply-to to go to just you (luckily I noticed in
time). Not sure why this happens for some messages, did you do anything
differently for your message?
Hi
Nope, it's the standard way these mailing lists have behaved for a long
time now.
It used to be that people could just click on "Reply to" and their message
would go straight to the mailing list. Now most email-clients require
people to click on "Reply to all ..." and the mailing list's address is
only in the "CC" rather than in the "To" field. Numerous people have
grumbled about it in here but few bother to post a complaint to the
postmaster address and those that do just seem to get aggro for it.
One person here did try to show how he re-configured his own email-client
to get around the problem and a few of the other longer-term people here
might well have followed his lead but I am not sure what effect that sort
of thing has on non-LO emails. Also I kinda believe in the "Eat your own
dog food" principle so that I stay in touch with the problems normal users
have when they first approach this mailing list.
Regards from
Tom
Hi
I suspect that Paul's post below has not yet arrived in Maurice's
time-line.
Email threads sometimes get a bit disjointed, especially if an
over-enthusiastic junk/spam-filter tends to carefully reject anything with
any hint of code in it! However it could easily be that someone starts
from their older messages and works forwards to newer and newer ones instead
of the more sensible approach (imo) of working from the newest posts
backwards to the oldest. By starting with the newest ones first I often
find that older posts have already been dealt with and can thus be safely
ignored even if they stir-up side-issues (which also might have already
been largely dealt with).
On the other hand it might be good if someone could test Paul's script.
Perhaps it's possible to combine the 2 ideas so that both the file-name AND
the few lines of surrounding text could be output? Would that help? Also
it might be good to have the output directed into a file rather than just
onto the command-line?
I really like Don Pobanz's answer and the way Paul was able to help tweak
it. It felt like a return to what this mailing list is largely about =
collaborating to build-up a better answer faster than the individuals had
time to do on their own. Good work!! :)))
Regards from
Tom
Well, it does seem like all your mails do this, but not all mails from
this list exhibit this behaviour. Most mails from the list, even
replies, are addressed to the list. Yours are different in that they're
not addressed to the list, only CCd to the list. Some other people's
replies are the same, but I'd say not most.
When the mail is addressed to the list, or addressed to someone else and CCd
to the list, I can just click reply, but when the mail is addressed to
me personally and only CCd to the list clicking Reply replies to the
sender only.
I can only think that it's a difference in email clients and how they
handle list messages. The messages contain list headers, so most
clients, like mine, must pick that up and automatically reply to the
list, but some, like yours, must be ignoring those and replying to the
sender instead. I think.
So if I'm understanding the process right, it's not so much a problem
with how the list is set up (other than that it doesn't rewrite the
sender header), but rather with some clients not honouring the list
headers.
Paul
Try this; even if it isn't exactly an 'out of the box' solution, it
can be useful.
In a few words, the script parses the XML file inside the .odt (which is in
fact an archive file) and searches for a keyword after having extracted the
text part.
A short excerpt, from page 3 of "Extract and Parse ODF Files with Python":
"In this particular program, I collect all the text as a list of
paragraphs, and then I search for the keywords passed in from the
command line. If the searched word matches, the paragraph is printed
out.
The text found in each <text:p> is Unicode text. You have to convert
this to normal text in order to print correctly and/or use in a
widget. The encode() command translates the Unicode to a printable
string. "
If the Python code were modified to also add the file name with its path, injected at the end of each paragraph as a URL, it might be possible to redirect the output of the Python command to a .txt file that could be opened by Writer.
I am not sure whether or not Writer could be set to recognize the URL and "Open File URL" automatically, so that the original document could be modified.
Hhhhhmmmmm
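For what it's worth, here is a rough Python 3 sketch of that idea. It is not the code from the article, only an illustration. It reads content.xml out of each .odt, collects the text of the paragraph and heading elements, and prints every paragraph containing the search term with the file's URL appended. The function names and the script name search_odt.py are made up. In Python 3 strings are already Unicode, so the encode() step from the excerpt should not be needed.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Standard ODF namespace for text elements such as <text:p> and <text:h>.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
WANTED_TAGS = (f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h")


def odt_paragraphs(path):
    """Return the text of every paragraph and heading in an .odt file."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    paragraphs = []
    for element in root.iter():
        if element.tag in WANTED_TAGS:
            # itertext() also picks up text inside nested spans.
            # Simplification: runs of spaces and tabs stored as separate
            # ODF elements are not reproduced.
            text = "".join(element.itertext()).strip()
            if text:
                paragraphs.append(text)
    return paragraphs


def search_odt(path, term):
    """Return 'paragraph  <file URL>' strings for paragraphs containing term."""
    url = Path(path).resolve().as_uri()
    needle = term.lower()
    return [f"{p}  <{url}>" for p in odt_paragraphs(path) if needle in p.lower()]


# Hypothetical command-line use:
#   python3 search_odt.py "this string" */*.odt > hits.txt
if __name__ == "__main__" and len(sys.argv) > 2:
    for name in sys.argv[2:]:
        for hit in search_odt(name, sys.argv[1]):
            print(hit)
```

Redirecting the output to a .txt file, as in the comment above, gives one file that Writer can open and search. Whether Writer will treat the appended file URLs as clickable links is not something this sketch settles.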
Paul wrote:
Well, it does seem like all your mails do this, but not all mails from
this list exhibit this behaviour. Most mails from the list, even
replies, are addressed to the list. Yours are different in that they're
not addressed to the list, only CCd to the list. Some other people's
replies are the same, but I'd say not most.
I have to manually remove the OP's address and put the list address in the To: field.
It is tiresome. No other list I know (or manage) works this way.
I use 'alpine' when posting to the list.
I would be happy to write the postmaster; I forget who that is.
F.
Quick question, Felmon, do you have to do this for all mails from this
list, both Tom's and mine, or just for mails like the ones from Tom?
From a *brief* google, it seems that the List-* headers may legally
be other than email addresses, so some mail clients, I gather Alpine is
one, don't use them at all for replies, instead replying to the From
field. Other mail clients, like Thunderbird and apparently Claws Mail,
do use them, at least when they are email addresses or mailto URLs, to
reply to the list.
So, as I understand it:
Alpine is technically correct, in that it cannot *rely* on the List-*
headers for replies. Though arguably it should do this when these
headers do give a valid email address.
The list is technically correct, in that it shouldn't enforce Reply-To
header rewriting to *force* all mails to go back to the list, and
instead provides the List-* headers as per the RFC. Though arguably it
would be fine to overwrite the Reply-To header, because it changes the
default to be a reply to the list, and users can override that if they
want, rather than the current situation with non-smart mail clients,
which is the reverse.
Thunderbird et al. are practically correct, in that the List-* headers
often do contain a valid email address, and can be used for replying to
the list, which is what people want most of the time.
The best solution, obviously, would be to extend the relevant RFC to
include a List-Reply header that is mandated to be the valid email
address of the list for replies, such that mail clients can use it to
provide smart replies. Of course, this means extending an RFC, and then
waiting for all email clients to update to include the new behaviour.
Maybe someone who knows this stuff could comment further, I'm just
starting to understand it myself. I've heard the debate on the list
before about what the list should and shouldn't do, but I never really
understood it, because some mails worked correctly and some didn't. I'm
only now getting the hang of this, and I can't remember what all has
been said before (and I can't really be bothered to go look it up and
raise it all again).
Basically, for me things work, unless someone sends mails that are
addressed to me personally and only CCd to the list, which I feel is
the wrong behaviour from a mail client, although the issue of why
some mail clients do that seems to be technical.
But technically, neither the list nor the mail client are *wrong*. In
practice it seems either one could be changed, although it is probably
more a mail client fault than a list fault, or at least so it seems to
me.
Paul
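A rough sketch of the two client behaviours described above, in Python. It assumes the copy received through the list carries a List-Post header (the RFC 2369 header that names the posting address) in the usual <mailto:...> form. The addresses and the function name reply_target are made up for illustration, and real mail clients handle more cases than this.

```python
from email import message_from_string
from email.utils import parseaddr


def reply_target(raw_message, prefer_list=True):
    """Pick the address a plain Reply would go to.

    prefer_list=True mimics a client that honours the list headers;
    prefer_list=False mimics one that ignores them.
    """
    msg = message_from_string(raw_message)
    list_post = msg.get("List-Post", "").strip()
    # List-Post usually looks like <mailto:list@example.org>, but it may
    # hold something else (for example "NO"), so only a mailto: URL is used.
    if prefer_list and list_post.lower().startswith("<mailto:"):
        return list_post[len("<mailto:"):].rstrip(">").split("?")[0]
    # Without a usable list header: Reply-To, when present, overrides From.
    for header in ("Reply-To", "From"):
        address = parseaddr(msg.get(header, ""))[1]
        if address:
            return address
    return None


if __name__ == "__main__":
    via_list = (
        "From: Tom <tom@example.org>\n"
        "List-Post: <mailto:users@list.example.org>\n"
        "Subject: test\n\nbody\n"
    )
    print(reply_target(via_list))                     # list-aware client
    print(reply_target(via_list, prefer_list=False))  # client ignoring List-*
```

A copy sent directly to the recipient has no List-Post header at all, so even a list-aware client falls through to Reply-To or From for that copy.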
Hi
The mailing list used to do a certain thing regardless of which mailing
client was being used, and that was to make it easy for messages and
replies to go straight to the mailing list. It was possible to p.m. people
but it took a bit of faffing around. Answers were often built-up as
happened in this thread. People were not shy about posting answers or bits
of answers or suggestions. The community here was quite strong and grew
fast back then.
Nowadays new people to the list struggle to keep messages on-list and
replies from us apparently go off-list quite often too.
Of course the technical reasons are quite complicated. Should new users
have to deal with that and change email clients in order to use this
mailing list? Should this mailing list be restricted to only users who
perform technical shenanigans with their email client before being allowed
to seek answers from us?
Regards from
Tom
Paul wrote:
The list is technically correct, in that it shouldn't enforce Reply-To
header rewriting to *force* all mails to go back to the list, and
instead provides the List-* headers as per the RFC. Though arguably it
would be fine to overwrite the Reply-To header,
That would still cause problems, as it makes it difficult to reply to an individual if the need arises. Especially as this list accepts posts from non-subscribed users, who sometimes request a direct copy of the mail.
For a start, overwriting the Reply-To header would make Reply and Reply All both go to only the list, and not the original sender (Reply-To overrides From). It's difficult to persuade many clients to reply (or even copy) the From address when a Reply-To address is set.
Additionally, if a user had set their own Reply-To address, expecting replies to go to that address instead of their From address, that would be completely lost.
because it changes the
default to be a reply to the list, and users can override that if they
want,
No, they can't; that's the problem. If I send an email with my own Reply-To header, to a list which overwrites Reply-To, that list removes my Reply-To header and sets its own.
In this list's setup, I can set the Reply-To address to the list address, indicating that I don't want a direct copy and replies should go to the list. (I don't usually bother to do that, but I have the choice).
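To make the point concrete, here is a small Python sketch of what a list that overwrites Reply-To does to a message. It only illustrates the effect and is not how any particular list software is implemented; the function name munge_reply_to and the addresses are invented.

```python
from email.message import EmailMessage


def munge_reply_to(msg, list_address):
    """Overwrite Reply-To with the list address, as a munging list would.

    Returns the poster's original Reply-To (or None) to show what is lost.
    """
    original = msg.get("Reply-To")
    del msg["Reply-To"]              # no error if the header is absent
    msg["Reply-To"] = list_address
    return original


if __name__ == "__main__":
    msg = EmailMessage()
    msg["From"] = "poster@example.org"
    msg["Reply-To"] = "other@example.org"   # the poster's own choice
    lost = munge_reply_to(msg, "users@list.example.org")
    print("Reply-To now:", msg["Reply-To"])
    print("poster's original Reply-To, no longer in the message:", lost)
```

After the rewrite the message itself carries no trace of the address the poster asked for, which is why a recipient cannot simply override the list's choice.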
Paul wrote:
Well, it does seem like all your mails do this, but not all mails from
this list exhibit this behaviour. Most mails from the list, even
replies, are addressed to the list. Yours are different in that they're
not addressed to the list, only CCd to the list. Some other people's
replies are the same, but I'd say not most.
When the mail is addressed to the list, or addressed to someone else and CCd
to the list, I can just click reply, but when the mail is addressed to
me personally and only CCd to the list clicking Reply replies to the
sender only.
I can only think that it's a difference in email clients and how they
handle list messages. The messages contain list headers, so most
clients, like mine, must pick that up and automatically reply to the
list, but some, like yours, must be ignoring those and replying to the
sender instead. I think.
Indeed. I think Tom tends to Reply All, so you get two copies - one direct and one through the list. If you reply to the direct one, that doesn't have the List-* headers, so will go only to Tom by default. If you reply to the one received through the list, that does have the List-* headers and if your mail client uses them it will reply to the list.
In my client, I have to select Reply to List to use the List-Post header, but it sounds like yours uses it by default if available.
So if I'm understanding the process right, it's not so much a problem
with how the list is set up (other than that it doesn't rewrite the
sender header), but rather with some clients not honouring the list
headers.
Yep. This list is set up differently from many others, but it is more correct. It shouldn't rewrite the From or Reply-To headers as that causes other problems, particularly as this list accepts emails from non-subscribed users who sometimes request a direct copy of replies. If the From or Reply-To address has been overwritten, it is difficult or even impossible to email or copy someone individually. The Sender header I think would be safe to overwrite, but doing so wouldn't be particularly useful as mail clients don't use it for replies.
Mark.
Hi
I think it's easier to just edit the bash script, isn't it?
Surely to get its output into a file all that is needed is something like
> filename.txt
to be added to the end of the relevant lines? or better would be if it
could keep adding to the end of a file after first creating the file with
the first bit of output. I think Python is a bit of an over-kill for this
although it might be really nice to have as a permanent Extension written
in a decent language like Python.
Regards from
Tom