[Archivesspace_Users_Group] unpublished resource showing up as PDF download in a Google search
Amanda Focke
afocke at rice.edu
Tue Jul 5 17:07:54 EDT 2016
So, it sounds like if a resource was *ever* set to "Publish" at the
resource level, Google and other search engines will have crawled it,
and the EAD-PDF link will remain in their indexes and live on, even if
that resource is subsequently unpublished.
If, in our conversion from AT, *all* resources were set to "publish"
(instead of only those with the status "completed"), then that most
likely explains this situation. I do recall that in our
post-conversion QC work a year ago, we did "unpublish" whatever
Resources had a status other than "completed". So these resources
could not have been "published" for long (more than a couple of
months, though, which was long enough to be crawled), yet they have
been "unpublished" at the resource level for a year now and are still
discoverable via their EAD-PDF link.
Mang mentions that if a visibility check could be introduced when
generating the EAD-PDF, etc., the problem could be solved.
That sounds like something that would be helpful to build into ArchivesSpace, no?
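The visibility check Mang describes, together with the earlier observation in this thread that published components under an unpublished parent should stay hidden, comes down to two rules: refuse to render a PDF for a resource that is not published, and leave unpublished components out of the output. The following is only a minimal sketch of that logic in Python, under stated assumptions: `Node`, `is_visible`, `visible_titles`, and `export_pdf` are hypothetical names invented for illustration, not ArchivesSpace code or API (ArchivesSpace itself is written in Ruby, and its real export path differs).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a resource or archival object (not the ArchivesSpace model)."""
    title: str
    publish: bool
    # Excluded from repr/compare so the parent <-> child cycle cannot recurse forever.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def is_visible(node: Node) -> bool:
    """A node counts as public only if it AND every ancestor are published."""
    current: Optional[Node] = node
    while current is not None:
        if not current.publish:
            return False
        current = current.parent
    return True


def visible_titles(node: Node) -> List[str]:
    """Collect titles for the PDF, skipping any unpublished subtree entirely."""
    if not node.publish:
        return []
    titles = [node.title]
    for child in node.children:
        titles.extend(visible_titles(child))
    return titles


def export_pdf(resource: Node) -> List[str]:
    """Hypothetical export handler: check visibility at request time, then filter."""
    if not is_visible(resource):
        raise PermissionError(f"{resource.title!r} is not published")
    return visible_titles(resource)


if __name__ == "__main__":
    collection = Node("Published collection", publish=True)
    collection.add_child(Node("Draft notes", publish=False))
    collection.add_child(Node("Series 1", publish=True))
    print(export_pdf(collection))  # ['Published collection', 'Series 1']
```

Because a check like this would run when the PDF is requested rather than when the link is rendered, it would also cover the handcrafted URLs Mang mentions: a link that was indexed while the resource was published would simply stop returning the finding aid.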
For now, though, we will have to make a plan for dealing with our
unpublished resources that are discoverable in search engines.
Thanks for helping to think this through,
Amanda
On 7/5/2016 3:17 PM, Mang Sun wrote:
>
>
> Amanda,
>
>
> When did you change the Publish? flag of those resources to "No"? I
> vaguely recall that, at the very beginning, all resources were set to
> Publish? YES. If that is true, Google has already crawled and indexed
> the shortcut link to the EAD-PDF for each resource, even if its
> Publish? flag was later set to NO. In our current version (1.4.2), the
> code underlying the EAD-PDF link apparently doesn't check the
> resource's Publish? flag, so the shortcut link in question (which can
> even be assembled manually by following a pattern) will remain valid
> and in effect, and will keep being crawled and indexed by Google, even
> for unpublished resources that were once published. If a visibility
> check could be introduced when generating the EAD-PDF, etc., the
> problem could be solved. To prevent Google from remembering a
> shortcut link with our current version, a new resource should be set
> to Publish? NO from the very beginning, but even this can't prevent
> power users from handcrafting the link to get the EAD-PDF output of an
> invisible resource if they know the generated or assigned resource
> number.
>
>
> Mang
>
>
>
>
> On 7/5/2016 2:52 PM, Custer, Mark wrote:
>>
>> Amanda,
>>
>> So, it sounds like the PUI is working as expected in that case, but
>> that the ASpace PDF conversion process is including everything from
>> each finding aid, whether it’s listed as published or not. Is that
>> right? If so, it should just be a simple update to the ASpace PDF
>> stylesheet, and that type of change should definitely be in the core
>> code.
>>
>> I’ll look to see if there’s an open issue for this, but if there’s
>> not, I can create one in JIRA. I’ve made a couple updates to the
>> core ASpace PDF stylesheet, and I hope to make a few more before the
>> next PUI is released.
>>
>> Mark
>>
>> *From:* archivesspace_users_group-bounces at lyralists.lyrasis.org
>> [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] *On
>> Behalf Of* Amanda Focke
>> *Sent:* Tuesday, 05 July, 2016 2:45 PM
>> *To:* archivesspace_users_group at lyralists.lyrasis.org
>> *Subject:* Re: [Archivesspace_Users_Group] unpublished resource
>> showing up as PDF download in a Google search
>>
>> Hello Mang and all -
>>
>> What I did was search Google for something I know is from one of our
>> unpublished finding aids,
>>
>> such as this text string:
>> "10002. Genetic regulatory proteins -1 (laci with altered ligand
>> responsivity. Kathleen Matthews"
>>
>> and the result was that the entire ArchivesSpace-generated PDF
>> version of the (unfinished / unpublished) finding aid is available as
>> the 2nd hit from Google's results list.
>>
>>
>> I was hoping to attend the ArchivesSpace webinar which is going on
>> right now to see if this issue has been resolved,
>> but the webinar is full. I'll just wait for the recording and if my
>> questions aren't answered there, will follow up with ArchivesSpace folks.
>>
>> Amanda
>>
>>
>>
>>
>>
>> On 7/5/2016 9:48 AM, Mang Sun wrote:
>>
>> Amanda,
>>
>> I'm just back. I can't seem to reproduce the Google hit by
>> searching Google for "Randall Hulet", and I don't see a problem with
>> our public interface when searching for "Randall Hulet". Can you
>> send me a screenshot of your Google results for the title
>> of this archival object?
>>
>> Mang
>>
>> On 6/15/2016 1:59 PM, Amanda Focke wrote:
>>
>> I think this may be AR-583 or AR-278, which both seem to say
>> they are resolved, so maybe if we upgrade to the new version
>> this summer, this will be fixed....
>>
>> Amanda
>>
>> On 6/14/2016 4:29 PM, Amanda Focke wrote:
>>
>> Hello --
>>
>> We have an *unpublished* Resource in our ArchivesSpace
>> instance which is showing up
>> when I search a text string from it in Google.
>>
>> I search a text string from that resource and I get a hit
>> (in Google) coming from our ArchivesSpace offering a
>> "printer friendly download" of the full PDF for the
>> Resource.
>>
>> I double-checked the Resource; it is definitely
>> "unpublished" at the top level, although it has
>> components which are marked as published (I'm not sure
>> why those are published, but it shouldn't matter if the
>> parent is unpublished).
>>
>> Has anyone noticed this behavior?
>> Thanks,
>> Amanda
>>
>> --
>> *Amanda Focke, CA, DAS*
>> Asst. Head of Special Collections
>> Woodson Research Center
>> Fondren Library MS-44
>> Rice University
>> 6100 Main St.
>> Houston, TX 77005
>> 713-348-2124 | afocke at rice.edu <mailto:afocke at rice.edu>
>> Website: http://library.rice.edu/woodson
>> Blog: http://woodsononline.wordpress.com/
>>
>>
>>
>>
>> _______________________________________________
>> Archivesspace_Users_Group mailing list
>> Archivesspace_Users_Group at lyralists.lyrasis.org
>> http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group