[Archivesspace_Users_Group] AT > ASpace migration questions

Chris Fitzpatrick Chris.Fitzpatrick at lyrasis.org
Wed Jul 30 05:03:44 EDT 2014


Hi Mark,


Well, you can always tweak the EAD exporter and importer to work the way you need them to. These changes could be packaged as a plugin, so you wouldn't need to do a rebuild.



But what I'm hearing is that you would like blocks of text separated by line breaks (two line breaks?) to be wrapped in <p> tags when they are exported? I think this is a common request, and it shouldn't be difficult for us to implement as the default.
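
As a rough illustration of the behaviour being discussed (not the ArchivesSpace exporter itself, which is a separate codebase), the following Python sketch splits note text on blank lines and wraps each block in a <p> element. The function name wrap_paragraphs is hypothetical, and the sample text is the Pivotal Tracker example quoted later in this thread.

```python
import re


def wrap_paragraphs(text: str) -> str:
    """Wrap each block of text separated by blank lines in <p> tags."""
    # A paragraph break is a line break, optional whitespace, then another line break.
    blocks = re.split(r"\n\s*\n", text.strip())
    # Skip empty blocks so leading or trailing blank lines do not yield <p></p>.
    return "\n".join(f"<p>{block.strip()}</p>" for block in blocks if block.strip())


if __name__ == "__main__":
    note = "This is a story.\n\nIt has a break.\n\nThen it ends."
    print(wrap_paragraphs(note))
```

A single line break inside a block is left alone here; only a blank line is treated as a paragraph boundary, which is the assumption the question above ("two line breaks?") is trying to confirm.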


best, chris.




Chris Fitzpatrick | Developer, ArchivesSpace
Skype: chrisfitzpat  | Phone: 918.236.6048
http://archivesspace.org/
________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Custer, Mark <mark.custer at yale.edu>
Sent: Tuesday, July 29, 2014 6:49 PM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] AT > ASpace migration questions

Chris,

I’m wondering how we should proceed with paragraphs since Archivists’ Toolkit handles paragraphs quite differently, and this is likely going to affect how we conduct our migration from AT to ArchivesSpace since we depend upon EAD exports (side note:  how are paragraphs handled in the Archon database?).

In the AT, the EAD paragraph element was never stored in the database (the EAD import process always stripped those out, but would nevertheless serialize them upon export where need be).  Here’s an example note from the AT, which includes two paragraphs, neither of which are encoded in the database:

Thomas Turner, 1729-1793, was a shop-keeper, and for a time the school-master, in the village of East Hoathly in Sussex, England. He also became church-warden in 1757. From 1754 to 1765 Turner wrote the diary which forms the major part of the papers.

Further biographical information may be found in <title render="italic">The Diary of Thomas Turner of East Hoathly (1754-1765)</title>, an abridged edition edited by Florence Maris Turner (Mrs. Charles Lamb), great-great-granddaughter of the diarist, with an introduction by J. B. Priestly, published in 1925 by John Lane, the Bodly Head Ltd., London.


That same note in ArchivesSpace looks like the following after using the database migration tool (and after converting it back to JSON):

{
    "jsonmodel_type": "note_multipart",
    "label": "THOMAS TURNER 1729-1793",
    "subnotes": [
        {
            "content": "Thomas Turner, 1729-1793, was a shop-keeper, and for a time the school-master, in the village of East Hoathly in Sussex, England. He also became church-warden in 1757. From 1754 to 1765 Turner wrote the diary which forms the major part of the papers.\n\nFurther biographical information may be found in <title render=\"italic\">The Diary of Thomas Turner of East Hoathly (1754-1765)</title>, an abridged edition edited by Florence Maris Turner (Mrs. Charles Lamb), great-great-granddaughter of the diarist, with an introduction by J. B. Priestly, published in 1925 by John Lane, the Bodly Head Ltd., London.\n\n",
            "jsonmodel_type": "note_text",
            "publish": true,
            "subnote_guid": "0fa3207256b91d8d9832d99b470a3aea"
        }
    ],
    "type": "bioghist",
    "persistent_id": "bb5cd3f9e05a0b21cce68e4fbf0e7951"
}

In both cases, there are no paragraph elements stored in the database.

If I export this finding aid from Archivists’ Toolkit, here’s the EAD that I get (which is the EAD that I need):

      <bioghist id="ref11">
         <head>THOMAS TURNER 1729-1793</head>
         <p>Thomas Turner, 1729-1793, was a shop-keeper, and for a time the school-master, in the village of East Hoathly in Sussex, England. He also became church-warden in 1757. From 1754 to 1765 Turner wrote the diary which forms the major part of the papers.</p>
         <p>Further biographical information may be found in
                <title ns2:type="simple" render="italic">The Diary of Thomas Turner of East Hoathly (1754-1765)</title>, an abridged edition edited by Florence Maris Turner (Mrs. Charles Lamb), great-great-granddaughter of the diarist, with an introduction by J. B. Priestly, published in 1925 by John Lane, the Bodly Head Ltd., London.</p>
      </bioghist>

Note the 2 paragraph elements.

When I export this from ArchivesSpace (v1.0.9) after using the migration tool, here’s the result that I get:

    <bioghist id="bb5cd3f9e05a0b21cce68e4fbf0e7951">
      <head>THOMAS TURNER 1729-1793</head>
      <p>Thomas Turner, 1729-1793, was a shop-keeper, and for a time the school-master, in the
        village of East Hoathly in Sussex, England. He also became church-warden in 1757. From 1754
        to 1765 Turner wrote the diary which forms the major part of the papers. Further
        biographical information may be found in <title render="italic">The Diary of Thomas Turner
          of East Hoathly (1754-1765)</title>, an abridged edition edited by Florence Maris Turner
        (Mrs. Charles Lamb), great-great-granddaughter of the diarist, with an introduction by J. B.
        Priestly, published in 1925 by John Lane, the Bodly Head Ltd., London. </p>
    </bioghist>

Now there’s just a single paragraph element, where two paragraph elements should be, and this won’t work for us.

But here’s where things get tricky/problematic with ArchivesSpace.   If I were to upload this EAD file into ArchivesSpace (i.e. not using the migration tool), then two paragraphs would be explicitly encoded in the database.  Here’s the JSON that you’ll get after an EAD import:

{
    "jsonmodel_type": "note_multipart",
    "subnotes": [
        {
            "jsonmodel_type": "note_text",
            "content": "<p>Thomas Turner, 1729-1793, was a shop-keeper, and for a time the school-master, in the village of East Hoathly in Sussex, England. He also became church-warden in 1757. From 1754 to 1765 Turner wrote the diary which forms the major part of the papers.</p>\n<p>Further biographical information may be found in \n                <title ns2:type=\"simple\" render=\"italic\">The Diary of Thomas Turner of East Hoathly (1754-1765)</title>, an abridged edition edited by Florence Maris Turner (Mrs. Charles Lamb), great-great-granddaughter of the diarist, with an introduction by J. B. Priestly, published in 1925 by John Lane, the Bodly Head Ltd., London.</p>",
            "subnote_guid": "1ab692d612a83b056388d77fcba36872"
        }
    ],
    "type": "bioghist",
    "persistent_id": "ref11",
    "label": "THOMAS TURNER 1729-1793"
}

This is a considerable difference from what happens when using the migration tool.  Also, as I noted in my previous message, the ASpace user interface encourages encoding paragraphs explicitly since the p tag is available when a user types in the less-than sign (‘<’) within a notes field.

So, given these discrepancies with how paragraphs are handled in ArchivesSpace (as well as the AT-to-ASpace migration tool), what’s the best way for us to proceed?

It seems to me that we should proceed by updating all of our notes in the AT to have explicitly encoded paragraph elements so that our ASpace JSON values will look like the second example; doing this would ensure that our EAD files will export correctly right now.  However, I do not know what changes are happening in ArchivesSpace, so I’d like advice on this before we conduct our migration.  Is it intended that ArchivesSpace will manage both implicit paragraphs as well as explicitly encoded paragraphs?  Right now, ASpace will only manage the latter.  If that’s going to continue to be the case, then I think that a lot of users (AT users at least) will need assistance either pre- or post-migration to address this issue.  Does that make more sense?
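
To make the proposed clean-up concrete, here is a minimal Python sketch that takes a note shaped like the JSON examples above and makes the paragraphs explicit in each text subnote. It is only an illustration under stated assumptions: the function name encode_paragraphs is hypothetical, it works on a plain dict rather than on the AT database or the ArchivesSpace API, and it treats any content that already contains a <p> element as already encoded.

```python
import re


def encode_paragraphs(note: dict) -> dict:
    """Return a copy of a multipart note with explicit <p> tags in text subnotes."""
    updated = dict(note)
    subnotes = []
    for subnote in note.get("subnotes", []):
        subnote = dict(subnote)  # copy so the input note is not modified
        content = subnote.get("content", "")
        is_text = subnote.get("jsonmodel_type") == "note_text"
        # Assumption: content that already has a <p> element is left untouched.
        if is_text and not re.search(r"<p[\s>]", content):
            blocks = [b.strip() for b in re.split(r"\n\s*\n", content) if b.strip()]
            subnote["content"] = "\n".join(f"<p>{b}</p>" for b in blocks)
        subnotes.append(subnote)
    updated["subnotes"] = subnotes
    return updated


if __name__ == "__main__":
    migrated = {
        "jsonmodel_type": "note_multipart",
        "type": "bioghist",
        "subnotes": [
            {
                "jsonmodel_type": "note_text",
                "content": "First paragraph.\n\nSecond paragraph.\n\n",
            }
        ],
    }
    print(encode_paragraphs(migrated)["subnotes"][0]["content"])
```

Because already-encoded content is skipped, running the function twice gives the same result as running it once, which matters if some notes are fixed before migration and others afterwards.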

Mark




From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Chris Fitzpatrick
Sent: Tuesday, July 29, 2014 4:26 AM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] AT > ASpace migration questions




Hi Mark,



Sorry this got lost in the thread... I had another user ask about this and I wanted to follow up.



Yes, the ticket I have now is just for display.



In regards to actually adding text into the note field, you're asking if line breaks could be converted into wrapping the blocks of text in <p> tags and added to the note's text? Or, are you just wanting the EAD exporter to do this to blocks of text?



b,chris.


________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Custer, Mark <mark.custer at yale.edu>
Sent: Wednesday, July 23, 2014 4:24 PM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] AT > ASpace migration questions

To follow up on one point of this thread specifically, I wonder if there is any more clarification regarding Ed’s question about “Lengthy text in Scope & Contents Notes,” its corresponding user story, https://www.pivotaltracker.com/s/projects/386247/stories/72606134, and how this is going to be addressed in future releases of ArchivesSpace?

The above user story only applies to how this note displays in ArchivesSpace, but what I need to address currently is how this note exports in the EAD.  In the example at Pivotal Tracker, the note looks like this:

This is a story.

It has a break.

Then it ends.

In version 1.0.9 (and earlier), that same note will export like this in the EAD:

<p>This is a story. It has a break. Then it ends.</p>

Is this going to be changed in the next release of ArchivesSpace, or should we update our AT database prior to the migration to ensure that the following is migrated into ArchivesSpace instead:

<p>This is a story.</p>
<p>It has a break.</p>
<p>Then it ends.</p>

I ask this second question since it seems to be aligned with how ArchivesSpace handles mixing in EAD elements in general, since the paragraph element pops up as an option in the user interface after a user types in the less-than sign, < (see the attached screen shot).  In other words, if the user interface is expecting paragraphs to be encoded explicitly, I assume that ArchivesSpace does not have plans to interpret double-line breaks as paragraph separations for the EAD export process, as Archivists’ Toolkit does.  Is that correct?

Furthermore, if the paragraphs are explicitly encoded in ArchivesSpace (which will ensure that the correct EAD is exported in the current version), will having them encoded explicitly break the EAD export for future versions of ArchivesSpace?

There are also at least two other hard-coded EAD issues in the AT that we’ll likely need to update prior to our migration (since if we don’t do these things the EAD will be invalid upon export from ArchivesSpace), which I’ll list here again just in case no one else has been considering these yet:


•         @target attributes in <ref> elements.  The last time that I checked and reported, the migration tool updates @id attributes, but not the @target attributes that link to those ids.

•         Hard-coded namespace prefixes for XLink will need to change from  ns2: (in the AT) to  xlink: (in ASpace)
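
Both of these fixes amount to text substitutions on the note markup, so they can be sketched in a few lines of Python. This is a hedged illustration only: the function names are hypothetical, the old-to-new id mapping would have to be built by whoever runs the migration, and the regular expressions assume double-quoted attributes as in the examples in this thread.

```python
import re


def fix_xlink_prefix(xml_text: str) -> str:
    """Rename ns2:-prefixed attributes (as written by the AT) to xlink:."""
    # Match only attribute-like uses: whitespace, "ns2:", a name, then "=".
    return re.sub(r"(\s)ns2:([A-Za-z]+\s*=)", r"\1xlink:\2", xml_text)


def remap_targets(xml_text: str, id_map: dict) -> str:
    """Point target="old-id" at the new id wherever a mapping is known."""

    def replace(match: re.Match) -> str:
        old_id = match.group(2)
        # Unknown ids are left unchanged rather than guessed.
        return f"{match.group(1)}{id_map.get(old_id, old_id)}{match.group(3)}"

    return re.sub(r'(\btarget=")([^"]*)(")', replace, xml_text)


if __name__ == "__main__":
    title = '<title ns2:type="simple" render="italic">The Diary</title>'
    print(fix_xlink_prefix(title))

    # Hypothetical mapping from an AT id to the id assigned after migration.
    ids = {"ref11": "bb5cd3f9e05a0b21cce68e4fbf0e7951"}
    print(remap_targets('<ref target="ref11">biography</ref>', ids))
```

A real clean-up would more safely use an XML parser than regular expressions, but note content stored as mixed text fragments is not always well-formed on its own, which is why a string-level pass is shown here.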

Right now, though, I’m primarily concerned with this question of paragraphs, specifically with how these should be encoded in the ArchivesSpace database in light of the AT migration tool and import/export mechanisms of ArchivesSpace.

Mark



From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Galloway, Ed
Sent: Wednesday, July 23, 2014 9:17 AM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] AT > ASpace migration questions

Chris,

Thanks for your responses…let me try to answer back in red below.

Ed

From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Chris Fitzpatrick
Sent: Wednesday, July 23, 2014 6:31 AM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] AT > ASpace migration questions


Hi Ed,



Glad to hear about your migration and thanks for letting us know about these issues...

Some responses:


Agents do not display in the Public interface, meaning no Creators or Personal/Corporate names appear in the search results or as an option to browse; nothing displays when the “Names” category is selected. It says “No records,” but they’re there on the Staff side.

Hm... For the agents to display, they need to be associated to a published resource. Do you have an agent that is associated to a published resource? Can you see that agent in the published resource on the public side?
If you mean is an agent (ie corporate name) linked to a resource (ie finding aid) from the Staff side, then yes most definitely. All of our finding aids (almost 1000) have Creators or Personal/Corporate names that we created in AT and appear in our current online guides. But when we look at them in the Public ASpace view, none of them appear anywhere (but they are there in Staff).


HTML markup does not display; instead it shows the mark-up itself (I believe this is a known ASpace bug and being addressed).



Which context is this in? Notes or titles? Also, be aware adding certain HTML to your data can make your EAD export invalid...



Using the wrap-in tag feature in AT, we created italics fairly regularly for unittitles such as <title render="italic">Ed’s Book</title>. This markup was migrated exactly as you see it in my example. In other words, it doesn’t actually display in italics but we used the AT tools provided. It happens anywhere we used <emph> tags whether it was unittitle or scope notes or anywhere else. We did not get any EAD export errors.





   Similarly though, if a <unittitle> begins with an <emph> tag, the title does not display at all and neither does any of its children. If for example, the name of a subseries started with an <emph> tag, none of the folders in that subseries display.



This is a known issue...https://www.pivotaltracker.com/story/show/75502994



OK




•         Lengthy text in Scope & Contents Notes (eg) do not show paragraph breaks, but instead display as one long paragraph. When viewed on the Staff side, it does format properly when viewed as Formatted Text (not Raw).

A feature request has been added for this.. https://www.pivotaltracker.com/story/show/72606134

Ok



    External links (URLs) we’ve inserted in a S&C note do not actually link out.



How are you adding these urls to the note?



Using the AT wrap-in tag feature <extref> so the markup looks like this: “Digital reproductions of the collection are available <extref href="http://digital.library.pitt.edu/images/pittsburgh/cityphotographer.html">online</extref>.” The links work just fine in our current online guides.
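
For readers wondering what rendering such markup involves, the following Python sketch shows one way a display layer could turn an <extref> element into an HTML link. It is an assumption-laden illustration, not how the ArchivesSpace public interface is implemented; the function name extref_to_anchor is hypothetical and the pattern assumes a double-quoted href attribute (prefixed or not).

```python
import re

# Matches <extref ... href="URL" ...>link text</extref>, including prefixed
# attributes such as ns2:href or xlink:href.
EXTREF = re.compile(
    r'<extref\s+[^>]*?href="([^"]*)"[^>]*>(.*?)</extref>',
    flags=re.DOTALL,
)


def extref_to_anchor(note_text: str) -> str:
    """Replace each <extref> element with an equivalent HTML <a> element."""
    return EXTREF.sub(r'<a href="\1">\2</a>', note_text)


if __name__ == "__main__":
    note = (
        "Digital reproductions of the collection are available "
        '<extref href="http://digital.library.pitt.edu/images/pittsburgh/'
        'cityphotographer.html">online</extref>.'
    )
    print(extref_to_anchor(note))
```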





     Perform an Advanced Search and view first set of results; then advance to next page of results. This second set of results is not based on the original search but appears to be the results of searching everything!



Yikes. I just replicated that. Definitely a bug... hard to believe nobody has caught that one yet. Made a ticket for that here:

https://www.pivotaltracker.com/story/show/75503592



Ok…glad we could help!



 Search does not highlight your search terms. Can it?



Short answer is yes. This is actually a Solr configuration that would need to be made to return highlight text in the search results.

However, there is a feature request to have this native in the application here: https://www.pivotaltracker.com/story/show/66505844
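
For context, Solr's standard highlighter is switched on per request with the hl and hl.fl parameters. The Python sketch below only builds such a request URL; the host, port, core name, and field names are placeholders chosen for illustration and are not taken from the ArchivesSpace configuration.

```python
from urllib.parse import urlencode


def highlight_query_url(base_url: str, terms: str, fields: str) -> str:
    """Build a Solr select URL that asks for highlighted snippets."""
    params = {
        "q": terms,
        "hl": "true",     # turn highlighting on
        "hl.fl": fields,  # comma-separated fields to highlight
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder Solr location and field names; adjust to the local index.
    print(highlight_query_url("http://localhost:8983/solr/collection1",
                              "Turner diary", "title,notes"))
```

With highlighting enabled, Solr returns the matching snippets in a separate highlighting section of the response, so the application would still need to merge them into the displayed results.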



Ok





There is no Go to Next Folder (eg Component) feature; in other words when viewing any folder, it would be nice to simply click a button to see the next (or preceding) folder (or item).



Ok, so is this in the public interface only? So, if you're looking at a resource in a folder, you'd like a button just to take you to the next folder rather than having to go back to the top level view and select the next folder?



Yes in the Public interface. Yes most definitely! :)



Thanks!



best, chris.




________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Galloway, Ed <edwardg at pitt.edu>
Sent: Tuesday, July 22, 2014 3:32 PM
To: Archivesspace Users Group
Subject: [Archivesspace_Users_Group] AT > ASpace migration questions

All,

The University of Pittsburgh has successfully migrated its data from AT to ASpace, meaning we have worked out any transfer problems so 100% of our records migrate. Now we are carefully reviewing the data in the Public ASpace interface and have observed the following which we’d like comments on:


•         Agents do not display in the Public interface, meaning no Creators or Personal/Corporate names appear in the search results or as an option to browse; nothing displays when the “Names” category is selected. It says “No records,” but they’re there on the Staff side.

•         HTML markup does not display; instead it shows the mark-up itself (I believe this is a known ASpace bug and being addressed).

•         Similarly though, if a <unittitle> begins with an <emph> tag, the title does not display at all and neither does any of its children. If for example, the name of a subseries started with an <emph> tag, none of the folders in that subseries display.

•         Lengthy text in Scope & Contents Notes (eg) do not show paragraph breaks, but instead display as one long paragraph. When viewed on the Staff side, it does format properly when viewed as Formatted Text (not Raw).

•         External links (URLs) we’ve inserted in a S&C note do not actually link out.

Bug?

•         Perform an Advanced Search and view first set of results; then advance to next page of results. This second set of results is not based on the original search but appears to be the results of searching everything!

Interface questions:

•         Search does not highlight your search terms. Can it?

•         There is no Go to Next Folder (eg Component) feature; in other words when viewing any folder, it would be nice to simply click a button to see the next (or preceding) folder (or item).

I would appreciate any feedback you can give me. We’re at the stage of determining what we need to address versus what is a bug or other issue that the ASpace developers need to address.

Thanks, Ed

Edward A. Galloway
Head, Archives Service Center
University Library System
University of Pittsburgh
412-648-5901
facebook.com/pittarchives
pittarchives.tumblr.com

