[Archivesspace_Users_Group] diacritics in Title and Filing Title fields

Tang, Lydia ltang5 at lib.msu.edu
Wed Dec 12 10:35:09 EST 2018

The ticket is currently in “Ready for Implementation,” meaning that it has passed through Development Prioritization (Dev. Pri.) and is awaiting a developer.  😊
-on behalf of Development Prioritization subteam

From: <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Benn Joseph <benn.joseph at northwestern.edu>
Reply-To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Date: Wednesday, December 12, 2018 at 9:44 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] diacritics in Title and Filing Title fields

Thanks Alexander—I figured there must be an open ticket but couldn’t find one…so I just voted for yours!


Benn Joseph
Head of Archival Processing
Northwestern University Libraries
Northwestern University
benn.joseph at northwestern.edu

From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> On Behalf Of Alexander Duryee
Sent: Wednesday, December 12, 2018 8:39 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] diacritics in Title and Filing Title fields

Thanks for posting this!  I created a ticket for this issue some time ago: https://archivesspace.atlassian.net/projects/ANW/issues/ANW-758.  The issue appears to be that the base PDF font set is limited in its character support and does not handle diacritics/non-Latin characters well: it either "flattens" them to ASCII or replaces them with "#".

I'm unaware of any workarounds in the meantime, but it's entirely a PDF rendering issue - your data should be fine as-is.
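
For illustration, the two symptoms described above can be imitated in a few lines of Python. This is only a sketch of the visible behavior, not the code path that ArchivesSpace or its PDF renderer actually uses. The function names are hypothetical, and the idea that the title is decomposed into a base letter plus a combining mark before rendering is an assumption inferred from the "Sae#ns" output.

```python
import unicodedata


def strip_combining_marks(text: str) -> str:
    """Imitate "flattening": decompose, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def replace_combining_marks(text: str, placeholder: str = "#") -> str:
    """Imitate a font with no glyph for combining marks (assumed behavior)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        placeholder if unicodedata.combining(ch) else ch for ch in decomposed
    )


title = "Camille Saint-Sa\u00ebns correspondence"  # \u00eb is "ë"
flattened = strip_combining_marks(title)
hashed = replace_combining_marks(title)
print(flattened)
print(hashed)
```

In both cases the stored title is untouched; only the rendered copy loses the diacritic, which matches the point that the data itself should be fine as-is.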


On Tue, Dec 11, 2018 at 12:57 PM Zalduendo, Ines <izalduendo at gsd.harvard.edu> wrote:
Thanks Benn for sending this along.
The same is going on with Japanese characters. They display correctly in ArchivesSpace but the PDF doesn’t display them.
Here’s an example: https://hollisarchives.lib.harvard.edu/repositories/7/resources/201 (top right button for PDF)
I never reported this to the users group, but am glad others are interested in this being looked into. I was told core developers already know about this.

Special Collections Archivist / Frances Loeb Library / Harvard University Graduate School of Design / 48 Quincy Street, Cambridge, MA 02138 / T. 617.496.1300

From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> On Behalf Of Benn Joseph
Sent: Tuesday, December 11, 2018 11:19 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: [Archivesspace_Users_Group] diacritics in Title and Filing Title fields

Not sure if there’s a ticket for this, but we’re seeing some tricky behavior with diacritics in both the Title and Filing Title fields when trying to print a PDF as a background job.

Here’s an example: the collection name is “Camille Saint-Saëns correspondence”, and the umlaut displays correctly in the public interface.

If this text is input into the Title field without any character encoding, i.e. if the “ë” is just pasted in there, then when I print a PDF as a background job in the staff interface it shows up like this:

“Camille Saint-Sae#ns correspondence”

If I encode the character, whether as HTML or UTF-8 (for example, &#235;), the title ends up looking like this in the PDF output:

“Camille Saint-Sa&#235;ns correspondence”

…because the ampersand gets converted to “&amp;” in the XML and ends up as “&amp;#235;”. I’m not seeing this behavior in any other fields, though. Does this mean that no diacritics are allowed in the Title fields? Or am I just inputting this wrong? When generating a PDF from the public interface, it seems to remove the encoding entirely, so the title fields end up as “Saint-Saens” in each case, although I understand that PDF creation process to be different from the one run as a background job.
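
The double escaping described above can be reproduced with the Python standard library. This is a minimal sketch, assuming the title is XML-escaped once on export and parsed once by the PDF step. The `unittitle` element name is used only for illustration, and the real export pipeline may differ.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# What was typed into the Title field: a character reference, not the letter.
typed_title = "Camille Saint-Sa&#235;ns correspondence"

# Export step (assumed): special characters are escaped, so "&" becomes "&amp;".
escaped_title = escape(typed_title)
xml_doc = f"<unittitle>{escaped_title}</unittitle>"

# PDF step (assumed): the XML is parsed once, which undoes only one level of
# escaping and leaves the reference as literal text.
rendered_title = ET.fromstring(xml_doc).text

# For comparison, a pasted literal character survives the same round trip.
pasted_title = "Camille Saint-Sa\u00ebns correspondence"  # \u00eb is "ë"
pasted_rendered = ET.fromstring(
    f"<unittitle>{escape(pasted_title)}</unittitle>"
).text

print(escaped_title)
print(rendered_title)
print(pasted_rendered == pasted_title)
```

Under these assumptions, typing a character reference into the field cannot work, because the reference is treated as ordinary text. Pasting the character itself is the correct input, and the remaining "#" problem comes from the font rather than from the data.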


Benn Joseph
Head of Archival Processing
Northwestern University Libraries
Northwestern University
benn.joseph at northwestern.edu

Alexander Duryee
Metadata Archivist
New York Public Library
alexanderduryee at nypl.org