[Archivesspace_Users_Group] unpublished resource showing up as PDF download in a Google search

Amanda Focke afocke at rice.edu
Tue Jul 5 17:07:54 EDT 2016


So, it sounds like if a resource was *ever* set to "Publish" at the 
resource level, Google / other search engines will have crawled it and 
the EAD-PDF link will be there in search engines and live on, even if 
that resource subsequently gets unpublished.

If in our conversion from AT, *all* resources were set to "publish" 
(instead of just publishing ones with the status "completed"), then that 
most likely explains this situation. I do recall that in our 
post-conversion QC work a year ago, we did "unpublish" whatever 
Resources had a status other than "completed".  So these are resources 
that could not have been "published" for long (more than a couple of 
months, though - long enough to be crawled) but have been "unpublished" 
at the resource level for a year now, and yet are discoverable via their 
EAD-PDF link.

Mang mentions that if a visibility check could be introduced when 
generating the EAD-PDF (and similar exports), the problem could be solved.
That sounds like it would be helpful to have built into ArchivesSpace, no?
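
To make the idea concrete, here is a rough sketch of where such a check 
might sit. It is Python for illustration only and is not ArchivesSpace 
code: the Resource class, render_pdf, and pdf_for_public_request are all 
hypothetical names, and the only assumption is that the resource-level 
Publish? flag is available when the PDF is requested.

```python
from dataclasses import dataclass


class NotPublishedError(Exception):
    """Raised when a PDF is requested for a resource that is not public."""


@dataclass
class Resource:
    # Hypothetical stand-in for a resource record, not the ArchivesSpace model.
    resource_id: int
    title: str
    publish: bool  # the resource-level Publish? flag


def render_pdf(resource: Resource) -> bytes:
    # Placeholder for the real EAD-to-PDF conversion.
    return f"PDF for {resource.title}".encode("utf-8")


def pdf_for_public_request(resource: Resource) -> bytes:
    """Return the PDF only if the resource-level Publish? flag is set."""
    if not resource.publish:
        # A public-facing handler would turn this into a "not found" response,
        # so an old crawled link stops returning the document.
        raise NotPublishedError(
            f"resource {resource.resource_id} is not published"
        )
    return render_pdf(resource)
```

With a gate like this, a link that was crawled while a resource was 
published would stop working once the resource is unpublished, and a 
handcrafted link to an unpublished resource would fail the same way.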

For now, though, we will have to make a plan for dealing with our 
unpublished resources which are discoverable in search engines.
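
One possible first step for that plan is to list the PDF links of every 
unpublished resource so they can be checked and, if needed, submitted to 
the search engines for removal. The sketch below is Python and makes two 
assumptions: that we can export (resource number, Publish? flag) pairs, 
and that the shortcut link follows a fixed pattern. The URL template 
shown is a made-up placeholder, not the real ArchivesSpace pattern.

```python
from typing import Iterable, List, Tuple

# Hypothetical pattern; substitute the real shortcut-link pattern of the
# local installation. Only the resource number is assumed to vary.
PDF_URL_TEMPLATE = "https://archives.example.org/pdf/resource/{resource_id}"


def unpublished_pdf_urls(
    records: Iterable[Tuple[int, bool]],
    template: str = PDF_URL_TEMPLATE,
) -> List[str]:
    """Build the PDF link for every record whose Publish? flag is False."""
    return [
        template.format(resource_id=resource_id)
        for resource_id, published in records
        if not published
    ]


if __name__ == "__main__":
    # Example export: resource 1 is published, 2 and 3 are not.
    for url in unpublished_pdf_urls([(1, True), (2, False), (3, False)]):
        print(url)
```

The resulting list covers every unpublished resource, including ones that 
were never crawled, so it is a superset of what actually needs attention.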

Thanks for helping to think this through,
Amanda



On 7/5/2016 3:17 PM, Mang Sun wrote:
>
>
> Amanda,
>
>
> When did you change the Publish? flag of those resources to "No"? I 
> vaguely recall that, at the very beginning, all resources were set to 
> Publish? YES. If that is true, Google has already crawled and 
> indexed the shortcut link to the EAD-PDF for each resource, even if its 
> Publish? flag was set to NO later on. Because in our current version, 
> 1.4.2, the code underlying the link to the EAD-PDF seemingly doesn't 
> check the Publish? flag of resources, the shortcut link in question 
> (which can even be assembled manually by following a pattern) will 
> remain valid, and Google will keep crawling and indexing it 
> even for unpublished resources that were ever published. If a 
> visibility check could be introduced when generating the EAD-PDF etc., 
> the problem could be solved.  To prevent Google from remembering a 
> shortcut link with our current version, a new resource should be set 
> to Publish? NO from the very beginning, but this still can't 
> prevent power users from handcrafting the link to get the EAD-PDF output 
> of an invisible resource if they know the generated or assigned 
> resource number.
>
>
> Mang
>
>
>
>
> On 7/5/2016 2:52 PM, Custer, Mark wrote:
>>
>> Amanda,
>>
>> So, it sounds like the PUI is working as expected in that case, but 
>> that the ASpace PDF conversion process is including everything from 
>> each finding aid, whether it’s listed as published or not.  Is that 
>> right?  If so, it should just be a simple update to the ASpace PDF 
>> stylesheet, and that type of change should definitely be in the core 
>> code.
>>
>> I’ll look to see if there’s an open issue for this, but if there’s 
>> not, I can create one in JIRA.  I’ve made a couple of updates to the 
>> core ASpace PDF stylesheet, and I hope to make a few more before the 
>> next PUI is released.
>>
>> Mark
>>
>> *From:*archivesspace_users_group-bounces at lyralists.lyrasis.org 
>> [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] *On 
>> Behalf Of *Amanda Focke
>> *Sent:* Tuesday, 05 July, 2016 2:45 PM
>> *To:* archivesspace_users_group at lyralists.lyrasis.org
>> *Subject:* Re: [Archivesspace_Users_Group] unpublished resource 
>> showing up as PDF download in a Google search
>>
>> Hello Mang and all -
>>
>> What I did was search Google for something I know is from one of our 
>> unpublished finding aids,
>>
>> such as this text string:
>> "10002.  Genetic regulatory proteins -1 (laci with altered ligand 
>> responsivity.  Kathleen Matthews"
>>
>> and the result was that the entire ArchivesSpace-generated PDF 
>> version of the (unfinished / unpublished) finding aid is available as 
>> the 2nd hit from Google's results list.
>>
>>
>> I was hoping to attend the ArchivesSpace webinar which is going on 
>> right now to see if this issue has been resolved,
>> but the webinar is full. I'll just wait for the recording and if my 
>> questions aren't answered there, will follow up with ArchivesSpace folks.
>>
>> Amanda
>>
>>
>>
>>
>>
>> On 7/5/2016 9:48 AM, Mang Sun wrote:
>>
>>     Amanda,
>>
>>     I am just back. I can't seem to reproduce the Google hit by
>>     searching Google for "Randall Hulet", and I don't see a problem with
>>     our Public interface when searching for "Randall Hulet".  Can you
>>     send me a screenshot of your Google results for the title
>>     of this archival object?
>>
>>     Mang
>>
>>     On 6/15/2016 1:59 PM, Amanda Focke wrote:
>>
>>         I think this may be AR-583 or AR-278, which both seem to say
>>         they are resolved, so maybe if we upgrade this summer to the
>>         new version this will be fixed....
>>
>>         Amanda
>>
>>         On 6/14/2016 4:29 PM, Amanda Focke wrote:
>>
>>             Hello --
>>
>>             We have an *unpublished* Resource in our ArchivesSpace
>>             instance which is showing up
>>             when I search a text string from it in Google.
>>
>>             I search a text string from that resource and I get a hit
>>             (in Google) coming from our ArchivesSpace offering a
>>             "printer friendly download" of the full PDF for the
>>             Resource.
>>
>>             I double checked the Resource, it is definitely
>>             "unpublished" at the top level, although it has
>>             components which are marked as published (I'm not sure
>>             why those are published but it shouldn't matter if the
>>             parent is unpublished).
>>
>>             Has anyone noticed this behavior?
>>             Thanks,
>>             Amanda
>>
>>             -- 
>>             *Amanda Focke, CA, DAS*
>>             Asst. Head of Special Collections
>>             Woodson Research Center
>>             Fondren Library MS-44
>>             Rice University
>>             6100 Main St.
>>             Houston, TX 77005
>>             713-348-2124 | afocke at rice.edu <mailto:afocke at rice.edu>
>>             Website: http://library.rice.edu/woodson
>>             Blog: http://woodsononline.wordpress.com/
>>
>>
>>
>>
>>             _______________________________________________
>>
>>             Archivesspace_Users_Group mailing list
>>
>>             Archivesspace_Users_Group at lyralists.lyrasis.org
>>             <mailto:Archivesspace_Users_Group at lyralists.lyrasis.org>
>>
>>             http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group
>>
>>
>>
>


-- 
*Amanda Focke, CA, DAS*
Asst. Head of Special Collections
Woodson Research Center
Fondren Library MS-44
Rice University
6100 Main St.
Houston, TX 77005
713-348-2124 | afocke at rice.edu
Website: http://library.rice.edu/woodson
Blog: http://woodsononline.wordpress.com/