<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><br><div><div>On Feb 19, 2014, at 3:25 AM, Chris Fitzpatrick <<a href="mailto:Chris.Fitzpatrick@lyrasis.org">Chris.Fitzpatrick@lyrasis.org</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; font-size: 12pt; background-color: rgb(255, 255, 255); font-family: Calibri, Arial, Helvetica, sans-serif; position: static; z-index: auto;"><div style="margin-top: 0px; margin-bottom: 0px;"><br class="Apple-interchange-newline">Hey Noah,<br></div><div style="margin-top: 0px; margin-bottom: 0px;"><br></div><div style="margin-top: 0px; margin-bottom: 0px;">Yes, the EAD import errors are rather uninformative. A big problem is that schema valid EAD is sometimes not compliant to the AS model. There's a big range of variance that is allowed in the EAD schema, so we're still working on getting all the import mappings right.</div><div style="margin-top: 0px; margin-bottom: 0px;"><br></div></div></blockquote></div><br><div><br></div><div>Are these other (non-schema) restrictions documented somewhere?</div><div><br></div><div>What I’ve figured out so far (besides those character count limits) is:</div><div><br></div><div><physdesc> must contain an <extent> element, and its contents must be a number and a unit</div><div>(although the unit seems to accept almost any phrase, as long as it’s preceded by a number).</div><div>Are there other places where <extent> is required?
This seems to be the largest class of import errors we’re seeing,</div><div>even after fixing some of the instances with a stylesheet.</div><div><br></div><div><br></div><div>There are restrictions on dates, but we haven’t quite figured out those rules:</div><div>It clearly doesn't like “n.d.” (no date) or “c.a.” (circa) as a prefix. (Will it parse as a suffix, after a date?)</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><div>We’re also seeing truncated unittitles on some of the records that were imported successfully.</div><div><br></div><div>In some instances, these occur when a <unittitle> contains mixed content, for example:</div><div><br></div><div><unittitle>Newspaper Clippings, <title xmlns:xlink="<a href="http://www.w3.org/1999/xlink">http://www.w3.org/1999/xlink</a>" xlink:type="simple" render="italic" xlink:href="">Richmond Times Dispatch,</title><unitdate type="inclusive" era="ce" calendar="gregorian">1967-1968</unitdate></unittitle></div><div><br></div><div><br></div><div>shows up as “Newspaper Clippings,”.</div><div><br></div><div><br></div><div>I don’t know if it’s specifically the mixed content and embedded tags that are the problem, or if it’s the empty attributes</div><div> (xlink:href=""; those are added by the LOC dtd2schema.xsl conversion), but I have seen empty attributes in</div><div>other elements cause parse errors on import.</div><div><br></div><div><br></div><div>(There appear to be other truncations that may not fit that specific pattern, which I haven’t traced yet.)</div><div><br></div></div><div><br></div><div>-- Steve Majewski</div><div><br></div><div><br></div></body></html>