[Archivesspace_Users_Group] API output - extra unicode

Peter Heiner ph448 at cam.ac.uk
Sat Sep 4 08:35:35 EDT 2021


Hi Tom,

The AS API is UTF-8 by default, and AS also tries to make sure your database is set up correctly by checking the database and table encodings. As a data point: across dozens of migrations, making millions of calls to the AS API and sending data in both directions, I have yet to come across a single instance of AS inserting spurious characters into API responses. I have, however, had plenty of encoding issues in those same migrations at the data/database level. I'm fairly confident you'll find the source of those characters if you look at the raw data.
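
One rough way to look at the raw data is to test whether a suspicious value was UTF-8 that got read as Latin-1 and stored again. The sketch below is only an illustration, not part of ArchivesSpace: the helper name undo_double_encoding and the sample strings are made up, and it assumes the wrong encoding was Latin-1 (ISO-8859-1), which the thread does not confirm.

```python
from typing import Optional


def undo_double_encoding(value: str) -> Optional[str]:
    """Return the repaired text if `value` looks like UTF-8 misread as Latin-1.

    Hypothetical helper for illustration. Returns None when the value does
    not fit that pattern (plain ASCII, genuine accented text, or characters
    outside Latin-1).
    """
    try:
        # Turn each character back into a single byte, then read those
        # bytes as UTF-8, which is what they were assumed to be originally.
        repaired = value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None
    return repaired if repaired != value else None


# Made-up sample values: one damaged, one genuinely accented, one ASCII.
samples = ["Caf\u00c3\u00a9", "Caf\u00e9", "plain ascii"]
for sample in samples:
    print(repr(sample), "->", repr(undo_double_encoding(sample)))
```

A value that comes back repaired is a hint, not proof: a string could legitimately contain that character pair, so anything flagged this way still needs a look by eye.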

p

________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Tom Hanstra <hanstra at nd.edu>
Sent: 03 September 2021 18:09
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] API output - extra unicode

Brian (and others),

The data in the database should be UTF-8 as far as I can tell. So, I think this has to be happening at the API export level. Is there anything specific that needs to be done to have the API know that this is UTF-8 data?

Tom

On Fri, Sep 3, 2021 at 11:42 AM Brian Harrington <brian.harrington at lyrasis.org> wrote:

Hi Tom,



In my experience \u00c3 appearing in anything is almost always a sign of encoding issues.  I would make sure that everything is UTF-8 all the way through.
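
To show why \u00c3 is such a reliable tell, here is a minimal sketch of one way the pair \u00c3 and \u00a2 can arise. It assumes the original character was "â" (U+00E2) and that the UTF-8 bytes were read as Latin-1 somewhere along the way; the thread does not say which character or which encoding was actually involved, and the function name is made up.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Simulate the bug: encode as UTF-8, then decode the bytes as Latin-1."""
    # Each byte of a multi-byte UTF-8 sequence becomes its own character.
    return text.encode("utf-8").decode("latin-1")


# Assumption: "â" (U+00E2) stands in for whatever character was affected.
original = "\u00e2"
garbled = misread_utf8_as_latin1(original)

print(original.encode("utf-8"))                # b'\xc3\xa2'
print([f"U+{ord(c):04X}" for c in garbled])    # ['U+00C3', 'U+00A2']
```

Every code point from U+00C0 to U+00FF encodes to a UTF-8 sequence starting with byte 0xC3, which is why \u00c3 shows up in front of so many different damaged characters.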



Brian



From: <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Tom Hanstra <hanstra at nd.edu>
Reply-To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Date: Friday, September 3, 2021 at 11:06 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: [Archivesspace_Users_Group] API output - extra unicode



On our local version of ArchivesSpace, we are testing API output and finding extra Unicode characters on export. The data looks right in the database but doesn't come out quite right from the API extract: an extra Unicode character appears to be added (in some of the code we reviewed, this was either \u00c3 or \u00a2).



Where might we have something set incorrectly?  Where might the extra data be coming from or have been introduced along the way?



Thanks,

Tom



--

Tom Hanstra

Sr. Systems Administrator

hanstra at nd.edu




_______________________________________________
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group at lyralists.lyrasis.org
http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group


