[Archivesspace_Users_Group] More ArchivesSpace performance analysis -- this time, the database

Chris Fitzpatrick Chris.Fitzpatrick at lyrasis.org
Thu Aug 27 17:05:05 EDT 2015



Hi,


@Jason: thanks! Any thoughts on MySQL performance would be great. We could add them to the document (you can add them in git, or just send them to the list and I'll add them).


@Chris: Yeah, you have to be careful there. If you cache the EAD each time the resource gets updated, you'll have to export an EAD every time someone saves any record that is involved. Tweak a note? EAD gets exported. Add a few components? EAD gets exported a few times. I think the cache feature would have to be user-driven; that is, someone would have to click an "Update EAD" button.


The other big problem is that the pages people usually complain about loading are edit pages, which cannot be cached. We probably could just have the View pages pull from Solr (I think that was a suggestion in the HM report?). If you want caching, you can already implement that right now with Squid or Varnish.


What I think would be great is if AT and Archon people could measure how many queries exporting an EAD of similar size takes. My gut tells me that the number of queries has a lot to do with the nature of EAD, but I could be wrong.


Also, for the page load times, I'd really suggest using something like Google Analytics or New Relic, which will give you a much better sense of what's happening.


b,chris.


Chris Fitzpatrick | Developer, ArchivesSpace
Skype: chrisfitzpat  | Phone: 918.236.6048
http://archivesspace.org/


________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Prom, Christopher John <prom at illinois.edu>
Sent: Thursday, August 27, 2015 6:08 PM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] More ArchivesSpace performance analysis -- this time, the database

While this should not absolve the project from addressing the fundamental issues that the report identifies (which are quite concerning, especially the extraordinary number of SQL queries being generated), I second the motion for caching the EAD and HTML output, with the cache being updated whenever an underlying component of the cached output is updated.

It can be a bit tricky to implement, but many enterprise-style web apps have this kind of feature, and there may even be an external library that can be incorporated to do it. (As a side note, we added caching to Archon at one point, which dramatically cut page load time for long finding aids and EADs whose content had not been modified. We implemented it so that the cache was updated whenever the public page for a modified resource was requested, which was not ideal, but at least confined the problem to the initial page load of a modified page.)
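To make the Archon-style approach concrete, here is a minimal sketch in Python. It is not ArchivesSpace or Archon code; the class name, the `render` callable, and the version counter are all assumptions made for illustration. A save only bumps a cheap counter, and the expensive export runs at most once per modification, on the next request for that resource.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class LazyExportCache:
    """Hypothetical cache: re-render output only when a stale entry is requested."""

    # The expensive export step (e.g. building an EAD); supplied by the caller.
    render: Callable[[int], str]
    # resource_id -> (version the output was rendered at, rendered output)
    _entries: Dict[int, Tuple[int, str]] = field(default_factory=dict)
    # resource_id -> number of saves seen so far
    _versions: Dict[int, int] = field(default_factory=dict)

    def touch(self, resource_id: int) -> None:
        """Record a save to the resource or any of its components. No export happens here."""
        self._versions[resource_id] = self._versions.get(resource_id, 0) + 1

    def get(self, resource_id: int) -> str:
        """Return cached output if it is still current, otherwise re-render and store it."""
        current = self._versions.get(resource_id, 0)
        cached = self._entries.get(resource_id)
        if cached is not None and cached[0] == current:
            return cached[1]
        output = self.render(resource_id)
        self._entries[resource_id] = (current, output)
        return output
```

With this shape, tweaking a note and adding a few components would call `touch` several times but trigger only one export, paid by whoever next requests the page.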

Another suggestion I have is to add some performance data to the bottom of each page in the UI (enabled with a configuration setting), along the lines of "Page generated in x seconds using x queries". That would make it possible to assess performance on a running basis under multiple environments. It could function as an early warning system when testing new features or modifications, or as a quick way for the team to congratulate themselves when solving a persistent problem.
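A rough sketch of such a footer, written in Python against SQLite purely as a stand-in (ArchivesSpace itself runs on JRuby and MySQL, so the real hook would differ). The `PageStats` class and `measure` helper are hypothetical names; the only library feature relied on is `sqlite3.Connection.set_trace_callback`, which is invoked for each statement the backend executes.

```python
import sqlite3
import time
from contextlib import contextmanager


class PageStats:
    """Hypothetical holder for per-request timing and query counts."""

    def __init__(self) -> None:
        self.queries = 0
        self.seconds = 0.0

    def footer(self) -> str:
        return (f"Page generated in {self.seconds:.3f} seconds "
                f"using {self.queries} queries")


@contextmanager
def measure(conn: sqlite3.Connection):
    """Count statements executed on `conn` and time the enclosed block."""
    stats = PageStats()

    def count(_statement: str) -> None:
        stats.queries += 1

    conn.set_trace_callback(count)
    start = time.perf_counter()
    try:
        yield stats
    finally:
        stats.seconds = time.perf_counter() - start
        conn.set_trace_callback(None)  # stop counting once the page is built
```

A page handler would wrap its work in `with measure(conn) as stats:` and append `stats.footer()` to the output when the configuration setting is on.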

Chris Prom
University of Illinois



On Aug 27, 2015, at 10:36 AM, Jason Loeffler <j at minorscience.com> wrote:

Kudos to Yale for engaging Percona. For small and very small organizations with limited or no in-house IT support, this kind of contribution is essential. Having experienced substantial latency on resource trees with only 3,000 records, I

Chris, regarding benchmarking tools, I've used tuning-primer <https://launchpad.net/mysql-tuning-primer> and sysbench <https://github.com/akopytov/sysbench> alongside MySQLTuner for years with good success. Additionally, I've deployed ASpace (and other projects) against MariaDB <https://mariadb.org/> and Percona's own InnoDB drop-in replacement with the XtraDB storage engine <https://www.percona.com/software/mysql-database/percona-server/xtradb>. No issues so far.

For organizations with dedicated IT support, I highly recommend the Percona Toolkit <https://www.percona.com/software/mysql-tools/percona-toolkit> as a standard part of any deployment, as a means to collect and analyze information about database-related problems.

I'd be happy to describe how I've used these database distributions and tools in the ASpace context and outline the potential benefits.

That said, having read the Percona report, I'm not convinced that database tuning will yield much of a positive impact, and the report bears this out. The section on "Alternatives" in the JRuby GitHub page seems most promising. I'm not a Java dev, but this description seems very similar to using in-memory or file-based cache stores, like Memcached or Redis, in unthreaded applications (e.g. PHP).

Jason Loeffler
Technical Consultant | American Academy in Rome
Principal | Minor Science | Application Development & Metadata Strategy
Brooklyn, New York


On Wed, Aug 26, 2015 at 10:47 AM, Chris Fitzpatrick <Chris.Fitzpatrick at lyrasis.org> wrote:



Hi Maureen,


This is excellent. Percona is one of the premier MySQL experts, so this is great feedback.


Also, I have a page describing some MySQL and application tuning suggestions: <http://archivesspace.github.io/archivesspace/user/tuning-archivesspace/>

One thing that I suggest is to run a profiler on the MySQL DB server, which can give you some ideas on where to look for problem areas. I've been using MySQLTuner for years (http://mysqltuner.com/), but maybe there are some other things people are using? Any suggestions would help, and I can add them to the page (or better yet, send me a PR).

Also curious about which MySQL distributions (Oracle, MariaDB, Percona, etc.) people are using. Are people using dedicated DB servers, or clusters, or just running the DB server on the same box as the application?


Lastly, it would be great if we could start work on supporting other DBs. I think there's strong interest (especially from smaller orgs) in MS SQL Server, but are others wanting Oracle or Postgres?


best, chris.




Chris Fitzpatrick | Developer, ArchivesSpace
Skype: chrisfitzpat  | Phone: 918.236.6048
http://archivesspace.org/


________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Callahan, Maureen <maureen.callahan at yale.edu>
Sent: Wednesday, August 26, 2015 3:36 PM
To: 'archivesspace_users_group at lyralists.lyrasis.org'
Subject: [Archivesspace_Users_Group] More ArchivesSpace performance analysis -- this time, the database

Hi everyone,



As I’ve reported before, we at Yale are in the middle of an aggressive period of analysis to diagnose some of the slow performance we’ve been seeing with ArchivesSpace. As part of this initiative, we’ve contracted with Percona, a firm that specializes in MySQL database analysis. Their report (attached) includes a number of action items that we believe the ArchivesSpace community may find helpful. We hope that this will result in further improvements to the application.



Best wishes,
Maureen



Maureen Callahan
Archivist, Metadata Specialist
Manuscripts & Archives
Yale University Library
maureen.callahan at yale.edu
203.432.3627



Webpage: web.library.yale.edu/mssa
Collections: drs.library.yale.edu



_______________________________________________
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group at lyralists.lyrasis.org
http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group


