[Archivesspace_Users_Group] Advice on what to look for in AS logs after printing error?

Andrew Morrison andrew.morrison at bodleian.ox.ac.uk
Fri Jan 7 05:18:50 EST 2022


My fix should prevent all "Failed to clean XML: The reference to 
entity..." errors triggered by EAD-compliant encoding. But, depending on 
what version of ArchivesSpace you are running, it may only make a 
difference in niche cases. As I understand it (although Blake may wish 
to correct me if I am wrong) the timeline is this:

    Up to 2.7.0, PDFs generated by the PUI did not fail in this precise
    way, at least not if your records used EAD-compliant encoding of
    characters such as ampersands, greater-than, less-than, etc.

    In 2.7.1, a change was made to allow people to include an HTML entity
    reference, specifically the one for non-breaking spaces (&nbsp;), in
    their records. That is not strictly EAD-compliant encoding, but some
    people use them for formatting purposes, or because their records
    are converted from old web pages. But that broke generation of PDFs
    for records containing EAD-compliant encoding of ampersands which
    happened to be immediately followed by an uppercase letter (e.g.
    "B&W").

    In 2.8.1, the case of ampersands immediately followed by an
    uppercase letter was fixed, but PDFs will still fail if a record
    contains an ampersand immediately followed by a character which
    isn't an ASCII upper or lowercase alphabetic character or space. The
    specific case I've encountered is numbers in citations of printed
    resources (e.g. "Vols. 1&2") but it could also happen with UTF-8
    characters outside the ASCII range.

    Now, my proposed fix would, I believe, prevent PDFs from breaking,
    whatever immediately follows an ampersand. It may also prevent other
    problems, such as records containing < in certain contexts.
    Admittedly these are rare, but if you've got enough records they
    will occur somewhere, and they are fiendishly difficult to track down.
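
To make the failure cases in the timeline above concrete, here is a minimal Python sketch. It is not the ArchivesSpace code (which is Ruby) and not the change in the pull request; the function names and the choice to map &nbsp; to a numeric reference are assumptions made purely for illustration. It shows the general idea of escaping every ampersand that does not start a well-formed XML reference, whatever character follows it.

```python
import re
import xml.etree.ElementTree as ET

# An ampersand that does NOT start one of the five predefined XML entities
# or a numeric character reference. (Illustrative pattern, not taken from
# ArchivesSpace.)
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos);|#\d+;|#[xX][0-9A-Fa-f]+;)")


def escape_bare_ampersands(text: str) -> str:
    """Return text with stray ampersands escaped so an XML parser accepts it."""
    # Assumption: &nbsp; is the only HTML-only entity we want to keep, so it
    # is rewritten as the equivalent numeric reference before escaping.
    text = text.replace("&nbsp;", "&#160;")
    return BARE_AMP.sub("&amp;", text)


def is_well_formed(fragment: str) -> bool:
    """Check whether the fragment parses as XML when wrapped in an element."""
    try:
        ET.fromstring(f"<p>{fragment}</p>")
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for sample in ["B&W", "Vols. 1&2", "B&amp;W", "a&nbsp;b"]:
        cleaned = escape_bare_ampersands(sample)
        print(sample, "->", cleaned, is_well_formed(cleaned))
```

The point of the negative lookahead is that it does not care what follows the ampersand (uppercase letter, digit, or non-ASCII character), which is the property the version-by-version fixes described above lacked.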

So, if you are running 2.7.1 or 2.8.0, and you are sure that your 
records only contain things like "B&W", and never things like "Vols. 
1&2", then upgrading to 2.8.1 or higher would probably fix your problem.
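
If you are not sure which kind of ampersand your records contain, one way to check is to export a resource as EAD and scan the text. The Python sketch below assumes the export encodes ampersands as &amp; and simply lists places where that is followed by something other than an ASCII letter or a space. The function name and the 15-character context window are illustrative choices, and an ampersand at the very end of a line is not reported.

```python
import re
from typing import List, Tuple

# "&amp;" followed by any character that is not an ASCII letter or a space,
# i.e. the case described above as still failing in 2.8.1.
RISKY_AMP = re.compile(r"&amp;(?=[^A-Za-z ])")


def find_risky_ampersands(ead_text: str) -> List[Tuple[int, str]]:
    """Return (line number, surrounding snippet) for each risky ampersand."""
    hits = []
    for line_no, line in enumerate(ead_text.splitlines(), start=1):
        for match in RISKY_AMP.finditer(line):
            # Keep a little context on either side to help locate the record.
            start = max(match.start() - 15, 0)
            hits.append((line_no, line[start:match.end() + 15]))
    return hits


if __name__ == "__main__":
    sample = "<p>B&amp;W photos</p>\n<p>Vols. 1&amp;2</p>"
    for line_no, snippet in find_risky_ampersands(sample):
        print(line_no, snippet)
```

An empty result suggests the records only contain the "B&W" style of ampersand; any hits are candidates for the "Vols. 1&2" style of failure.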

If you're already running 2.8.1 or higher, my fix is currently untested 
by anyone but me, but if you want to give it a try, let me know.

Andrew.


On 06/01/2022 14:28, Kyle Breneman wrote:
>
> Andrew, thank you for taking the time to point me to your Github fix.  
> I /do/ see the “Failed to clean XML” error in my logs, but in each 
> case it is seemingly upset about missing semicolons: “Failed to clean 
> XML: The reference to entity "W" must end with the ';' delimiter.”
>
> If I understand your Github repo code, it is narrowly targeted at 
> dealing with situations where &amp is immediately followed by a digit, 
> and so would not help in my situation. Have I got that right?
>
> *Kyle Breneman*
>
> Integrated Digital Services Librarian
>
> The University of Baltimore
>
> kbreneman at ubalt.edu
>
> /I believe in freedom of thought and /
>
> /freedom of speech. Do you?/
>
> *From:* archivesspace_users_group-bounces at lyralists.lyrasis.org
> *On Behalf Of* Andrew Morrison
> *Sent:* Thursday, January 6, 2022 5:14 AM
> *To:* archivesspace_users_group at lyralists.lyrasis.org
> *Subject:* Re: [Archivesspace_Users_Group] Advice on what to look for 
> in AS logs after printing error?
>
> If you do see that "Failed to clean XML" message in the logs, then you 
> might be interested in this pull request I submitted recently:
>
> https://github.com/archivesspace/archivesspace/pull/2553
>
> I could put the same fix into the form of a plug-in, if that is what 
> you are seeing, you have the ability to install plug-ins, and you are 
> running 2.7.1 or newer.
>
> It might be a different markup issue, but in my experience the logs 
> never tell you which archival object the problem is in. It cannot, 
> because by that point it has converted the collection into a temporary 
> HTML file, which is the intermediate step before converting to PDF. 
> You could try exporting as EAD from the staff interface, then 
> validating in an XML editor, but if the issue is something which is 
> valid in EAD, then it can be very difficult to trace. If you have a 
> local development instance of ArchivesSpace, you can modify the code 
> so it doesn't delete the temporary HTML files, then validate those.
>
> Andrew.
>
> On 05/01/2022 18:14, Blake Carver wrote:
>
>     It's going to be a bit of looking for a bunch of needles in a very
>     short hay stack kinda thing.
>
>     The errors should have either FATAL or ERROR and something about
>     pdf around there somewhere. Sometimes there will be a lot of other
>     FATAL and ERROR lines around, so you'll need to narrow it down based
>     on what each one says.
>
>     You could also look for "92", "126" and "21"; I think the resource
>     number should show up around the error as well.
>
>     Also wouldn't surprise me to see this error in particular, but not
>     always:
>
>     |RuntimeError (Failed to clean XML: The entity name must
>     immediately follow the '&' in the entity reference.):|
>
>     ------------------------------------------------------------------------
>
>     *From:* archivesspace_users_group-bounces at lyralists.lyrasis.org
>     on behalf of Kyle Breneman <kbreneman at ubalt.edu>
>     *Sent:* Wednesday, January 5, 2022 12:36 PM
>     *To:* Archivesspace Users Group
>     <archivesspace_users_group at lyralists.lyrasis.org>
>     *Subject:* Re: [Archivesspace_Users_Group] Advice on what to look
>     for in AS logs after printing error?
>
>     Thank you for that reminder, Blake!  Another question: the print
>     action was being run from the following pages.  Wouldn’t clicking
>     the AS print button itself register in the logs?  If so, how could
>     I efficiently find those lines?
>
>     https://archivesspace.ubalt.edu/repositories/2/resources/92
>
>     https://archivesspace.ubalt.edu/repositories/2/resources/126
>
>     https://archivesspace.ubalt.edu/repositories/2/resources/21
>
>     *From:* archivesspace_users_group-bounces at lyralists.lyrasis.org
>     *On Behalf Of* Blake Carver
>     *Sent:* Wednesday, January 5, 2022 12:31 PM
>     *To:* Archivesspace Users Group
>     <archivesspace_users_group at lyralists.lyrasis.org>
>     *Subject:* Re: [Archivesspace_Users_Group] Advice on what to look
>     for in AS logs after printing error?
>
>     grep the logs for  ERROR or FATAL
>
>     ------------------------------------------------------------------------
>
>     *From:* archivesspace_users_group-bounces at lyralists.lyrasis.org
>     on behalf of Kyle Breneman <kbreneman at ubalt.edu>
>     *Sent:* Wednesday, January 5, 2022 12:28 PM
>     *To:* archivesspace_users_group at lyralists.lyrasis.org
>     *Subject:* [Archivesspace_Users_Group] Advice on what to look for
>     in AS logs after printing error?
>
>     Our archives staff have noticed that AS tends to get hung up when
>     users click the Print button on some of our largest collections. 
>     Campus IT tested this today.  The server did not hang for them,
>     but the print action also /did not complete/.  They got a very,
>     very generic error message (attached).
>
>     I have access to the ArchivesSpace files on the server, including
>     the /logs directory, but I’m not sure how to parse the logs for
>     clues. Does anyone have advice for how I can sift through the logs?
>
>     _______________________________________________
>
>     Archivesspace_Users_Group mailing list
>
>     Archivesspace_Users_Group at lyralists.lyrasis.org
>
>     http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group