[Archivesspace_Users_Group] Thai names in Finding Aid PDF

Custer, Mark mark.custer at yale.edu
Thu Jan 24 14:45:41 EST 2019

Exactly what Adam said!

To add to that, there's no single font (or even font family) that has glyphs for every single Unicode character.  The Noto font family aims to do that "in the future," however, and it already includes a lot of fonts (including Noto Sans Thai and Noto Serif Thai) that one would have to install.  See https://www.google.com/get/noto/
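
As a rough way to see which scripts a finding aid actually uses (and so which Noto fonts might be worth installing), something like the sketch below could help. It is only a heuristic: it assumes the first word of each character's Unicode name is a good-enough stand-in for its script, and the function name and sample string are made up for illustration.

```python
import unicodedata
from collections import Counter


def scripts_in(text):
    """Count letters and marks per script, using the first word of the
    Unicode character name (e.g. 'THAI', 'LATIN') as a rough label."""
    counts = Counter()
    for ch in text:
        # Keep only letters (L*) and marks (M*); skip spaces, digits, punctuation.
        if unicodedata.category(ch)[0] not in ("L", "M"):
            continue
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical mixed Latin/Thai sample text.
    sample = "Finding aid: \u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35"
    for script, n in scripts_in(sample).most_common():
        print(script, n)
```

Standalone combining diacritics show up under the label COMBINING, which is a useful hint that the chosen font also needs those marks.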

In any event, ASpace should certainly be updated so that the staff-side PDFs have more coverage by default, though the out-of-the-box approach is never going to cover everything.  (I also think there needs to be a decision about whether the platform supports both EAD to PDF transformations and HTML/CSS to PDF transformations.)  Perhaps a good next step would be to update Apache FOP (since the version used by ASpace is pretty out of date right now), package ASpace with a few of the Noto fonts so that those could be used in place of the base-14 fonts (e.g. Times is used by FOP for its "any" font), and update the transformation process.  Even then, though, I believe that you would actually need to embed the fonts into the PDF file, since if you don't, there's no guarantee that whoever opens the PDF file has that font on their computer, so you might still wind up with character replacements.  Fortunately, the PDF standard allows you to do just that.
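
To make the "package a few fonts" idea a little more concrete, here is a minimal sketch that writes out a FOP configuration pointing the PDF renderer at a directory of font files. The directory path and function name are assumptions, not anything ASpace ships today. My understanding is that FOP embeds fonts registered this way unless told otherwise, but that should be checked against the font documentation for whichever FOP version is in use.

```python
import xml.etree.ElementTree as ET


def build_fop_conf(font_dir):
    """Return a minimal fop.xconf (as a string) that registers every
    font found under font_dir for the PDF renderer."""
    fop = ET.Element("fop", version="1.0")
    renderers = ET.SubElement(fop, "renderers")
    renderer = ET.SubElement(renderers, "renderer", mime="application/pdf")
    fonts = ET.SubElement(renderer, "fonts")
    # <directory recursive="true"> asks FOP to scan the folder and subfolders.
    directory = ET.SubElement(fonts, "directory", recursive="true")
    directory.text = font_dir
    return ET.tostring(fop, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical location of the packaged Noto fonts.
    print(build_fop_conf("/opt/archivesspace/fonts/noto"))
```

The stylesheet would still have to name the matching font family in its font-family properties for any of this to take effect.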

Last, EAD3 added language and script data attributes for precisely this sort of reason (e.g. if you have one paragraph in English, and another in Arabic, you'd need some reliable method to determine when to switch fonts and the direction of the text).  ASpace doesn't have that ability yet (although I'm pretty sure that AtoM does), but it would be a great addition, as well as a necessary one.  Here's a note from EAD3's tag library:

"Support for multilingual description was addressed by adding @lang and @script attributes to all non-empty elements in EAD3, making it possible to explicitly state what language or script is used therein. Additionally, some elements were modified to allow them to repeat where previously they did not, thus enabling the inclusion of the same data in multiple languages."

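As an illustration of how a transformation might use those attributes to switch fonts, here is a small sketch. The script-to-font mapping, the un-namespaced sample fragment, and the idea that a @script value carries down to child elements are all assumptions made for the example, not anything defined by EAD3 or implemented in ASpace.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from ISO 15924 script codes to font families.
FONT_FOR_SCRIPT = {
    "Latn": "Noto Serif",
    "Thai": "Noto Serif Thai",
    "Arab": "Noto Naskh Arabic",
}
DEFAULT_FONT = "Noto Serif"


def fonts_for_paragraphs(xml_text):
    """Return (paragraph text, font family) pairs, picking the font from
    the nearest @script found on the <p> or one of its ancestors."""
    result = []

    def walk(elem, script):
        # Assumption: a child without @script uses its parent's value.
        script = elem.get("script", script)
        if elem.tag == "p":
            text = "".join(elem.itertext()).strip()
            result.append((text, FONT_FOR_SCRIPT.get(script, DEFAULT_FONT)))
        for child in elem:
            walk(child, script)

    walk(ET.fromstring(xml_text), None)
    return result


if __name__ == "__main__":
    sample = (
        '<scopecontent script="Latn">'
        "<p>Correspondence.</p>"
        '<p lang="tha" script="Thai">\u0e08\u0e14\u0e2b\u0e21\u0e32\u0e22</p>'
        "</scopecontent>"
    )
    for text, font in fonts_for_paragraphs(sample):
        print(font, "|", text)
```

Text direction would need the same kind of lookup, which this sketch leaves out.
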
So, lots to do, but all worth doing.

From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Adam Jazairi <jazairi at bc.edu>
Sent: Thursday, January 24, 2019 1:45:38 PM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] Thai names in Finding Aid PDF

Hi Ed,

This is a problem with the fonts included in the version of Apache FOP that ASpace uses. There's an open ticket here: https://archivesspace.atlassian.net/browse/ANW-473

We've encountered the same issue when we attempt to generate a PDF finding aid containing Irish or Japanese diacritics. An interim solution we've been using is to export the EAD, then run Saxon <http://saxon.sourceforge.net/#F9.9HE> on it to generate an FO file, then run FOP 1.0 <https://xmlgraphics.apache.org/fop/1.0/> with the appropriate font on the FO file to generate the PDF. It's a bit cumbersome, but it's worked for us so far.
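
The two steps after the export can be sketched roughly as below. This is an illustration rather than our exact setup: the file names, the stylesheet name, and the Saxon jar name are placeholders, and it assumes the `java` and `fop` commands are on the PATH.

```python
import subprocess


def build_commands(ead, xslt, fo, pdf, conf, saxon_jar="saxon9he.jar"):
    """Build the two command lines: Saxon (EAD -> FO), then FOP (FO -> PDF)."""
    saxon = ["java", "-jar", saxon_jar, f"-s:{ead}", f"-xsl:{xslt}", f"-o:{fo}"]
    # -c points FOP at the configuration file that registers the font.
    fop = ["fop", "-c", conf, "-fo", fo, "-pdf", pdf]
    return [saxon, fop]


def run(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names; print the commands rather than running them.
    cmds = build_commands(
        "finding_aid.xml", "ead_to_fo.xsl", "finding_aid.fo",
        "finding_aid.pdf", "fop.xconf",
    )
    for cmd in cmds:
        print(" ".join(cmd))
```

Calling `run(cmds)` would execute both steps once the real paths are filled in.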

Here's the FOP conf file that we use: https://github.com/BCDigLib/bc-aspace/blob/master/fop/fop.xconf

The only catch is that you'll need a font that supports the Unicode characters you need. In your case, it looks like Arial v2.95 would work: https://en.wikipedia.org/wiki/Arial#TrueType/OpenType_version_history

Hope this helps.


On Thu, Jan 24, 2019 at 1:14 PM Tang, Lydia <ltang5 at lib.msu.edu> wrote:
Hi Ed,
The related ticket that I see is here: https://archivesspace.atlassian.net/browse/ANW-294?jql=text%20~%20%22pdf%20diacritics%22  It is “closed – completed.”  It doesn’t look like Marcella’s ticket was ever created.  Ed, please go ahead and create a new ticket!  Thanks for pointing this out!
Lydia – on behalf of Dev. Pri.

From: <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of "Busch, Edward" <buschedw at msu.edu>
Reply-To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Date: Thursday, January 24, 2019 at 12:53 PM
To: "archivesspace_users_group at lyralists.lyrasis.org" <archivesspace_users_group at lyralists.lyrasis.org>
Subject: [Archivesspace_Users_Group] Thai names in Finding Aid PDF

I’m not sure if there is an open ticket on this or not; a quick search didn’t reveal anything directly.

Agents with Thai names and diacritics look correct in ASpace but when generated into a PDF finding aid, do not. They end up like:
Saph# K#ns#ks# h#ng Ch#t

I can create a ticket if needed.

Ed Busch, MLIS
Electronic Records Archivist
Michigan State University Archives
Conrad Hall
943 Conrad Road, Room 101
East Lansing, MI 48824
buschedw at msu.edu


Adam Jazairi
Digital Repository Services
Boston College Libraries
(617) 552-1404
adam.jazairi at bc.edu