Transformations – WTF’s going on?
andy.hunt@alfresco.com

Basics… What’s a Transformation?
• Indexing
• Doclib Thumbnails
• Previews
• Rules
• ….

What’s the problem?
• Lots of transformers
• Lots of mimetypes
• Lots of permutations of the above
• Inconsistent results / Non-deterministic
• Transformations not working
• Lack of visibility

How does Alfresco choose?
• Active Transformers
• “Explicit” takes precedence
• Any Limits
• Speed

Make it transparent
• log4j.logger.org.alfresco.repo.content.transform.TransformerDebug=DEBUG
• debugTransformers.txt
• Exactly 18 bytes

Example 1 – txt to html
] 193 text/plain text/html
] 193 txt html 24.txt 5 bytes ContentService.transform(...)
] 193 **a) transformer.complex.OpenOffice.PdfBox<<Complex>> < 5 MB 204 ms
] 193 b) transformer.OpenOffice<<Proxy>> 1,918 ms
] 193 c) transformer.TikaAuto 6,724 ms
] 193.1 text/plain text/html
] 193.1 txt html 24.txt 5 bytes transformer.complex.OpenOffice.PdfBox<<Complex>>
] 193.1.1 text/plain application/pdf
] 193.1.1 txt pdf 24.txt 5 bytes transformer.OpenOffice<<Proxy>>
] 193.1.1 Finished in 43 ms
] 193.1.2 store:///installs/3411e/tomcat/temp/Alfresco/ComplextTransformer_intermediate_txt_5927671274426616985.pdf
] 193.1.2 application/pdf text/html
] 193.1.2 pdf html <<TemporaryFile>> 6.2 KB transformer.PdfBox
] 193.1.2 Finished in 7 ms
] 193.1 Finished in 50 ms
] 193 Finished in 56 ms

Example 2 – large txt to html
] 204 txt html alfresco.biggerlog.txt 16.5 MB ContentService.transform(...)
] 204 **a) transformer.TikaAuto 526 ms
] 204 b) transformer.OpenOffice<<Proxy>> 853 ms
] 204 --c) transformer.complex.OpenOffice.PdfBox<<Complex>> > 5 MB

Example lists
] 13.2 transformer.StringExtracter 0 ms
] 13.2 1) txt txt unlimited
] 13.2 2) csv txt unlimited
] 13.2 3) html txt unlimited disabled not explicit
] 14.1243 txt jp2 a) transformer.complex.OpenOffice.Image<<Complex>> 1,171 ms 5 MB
] 14.1249 txt txt a) transformer.StringExtracter 0 ms unlimited
] 14.1249 b) transformer.TikaAuto 0 ms unlimited
] 14.1249 c) transformer.complex.OpenOffice.PdfBox<<Complex>> 0 ms 0 bytes disabled

What can we do?
• Available transformers
• content-services-context.xml

  <!-- This one does excel only -->
  <bean id="transformer.Poi"
        class="org.alfresco.repo.content.transform.PoiHssfContentTransformer"
        parent="baseContentTransformer" />

What can we do?
• Explicit transformers

html txt
a) transformer.StringExtracter 0 ms unlimited disabled not explicit
b) transformer.OpenOffice<<Proxy>> 831 ms 0 bytes disabled not explicit
c) transformer.TikaAuto 0 ms unlimited disabled not explicit
d) transformer.HtmlParser 0 ms unlimited EXPLICIT
e) transformer.complex.OpenOffice.PdfBox<<Complex>> 0 ms unlimited disabled not explicit

  <property name="explicitTransformations">
    <list>
      <bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails" >
        <property name="sourceMimetype"><value>text/html</value></property>
        <property name="targetMimetype"><value>text/plain</value></property>
      </bean>
    </list>
  </property>

What can we do?
• Explicit transformers - 2

html txt
a) transformer.StringExtracter 0 ms unlimited disabled not explicit
b) transformer.OpenOffice<<Proxy>> 831 ms 0 bytes disabled not explicit
c) transformer.TikaAuto 0 ms unlimited disabled not explicit
d) transformer.HtmlParser 0 ms unlimited EXPLICIT
e) transformer.complex.OpenOffice.PdfBox<<Complex>> 0 ms unlimited disabled not explicit

  <property name="supportedTransformations">
    <list>
      <bean class="org.alfresco.repo.content.transform.SupportedTransformation" >
        <property name="sourceMimetype"><value>text/html</value></property>
        <property name="targetMimetype"><value>text/csv</value></property>
      </bean>
    </list>
  </property>

What can we do?
• Any Limits
• maxSourceSizeKBytes
• content.transformer.PdfBox.TextToPdf.maxSourceSizeKBytes
• Listed in repository.properties
• content.transformer.default.maxSourceSizeKBytes=-1

What can we do?
• Speed - Startup Averages
• transformer.OpenOffice.time=123456
• transformer.PdfBox.TextToPdf.time=50000
• transformer.complex.Text.Image.time=10000
• transformer.complex.Text.Image.count=10000

Thank you for listening
andy.hunt@alfresco.com
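
Backup: the selection order as a sketch
The four criteria from "How does Alfresco choose?" (active, explicit, limits, speed) can be read as a filter-then-sort pipeline. The Python below is only an illustration of that reading, not Alfresco's code: the `Candidate` class, the `choose` function, and the order in which the filters are applied are assumptions made for this sketch. Transformer names and timings are taken from the debug output in Examples 1 and 2 and from the html to txt list. The average time given to the complex transformer in Example 2 is a placeholder, since the debug output does not show one.

```python
from dataclasses import dataclass
from typing import List

MB = 1024 * 1024
UNLIMITED = -1  # mirrors maxSourceSizeKBytes=-1 meaning "no limit"


@dataclass
class Candidate:
    """Hypothetical stand-in for one transformer entry in the debug list."""
    name: str
    avg_ms: int                        # average time, as shown in the debug list
    max_source_bytes: int = UNLIMITED  # size limit, UNLIMITED if none
    active: bool = True
    explicit: bool = False


def choose(candidates: List[Candidate], source_bytes: int) -> List[Candidate]:
    """Return usable candidates, fastest first (assumed order of checks)."""
    # 1. Active transformers only.
    pool = [c for c in candidates if c.active]
    # 2. "Explicit" takes precedence: if any explicit entry exists, drop the rest.
    explicit = [c for c in pool if c.explicit]
    if explicit:
        pool = explicit
    # 3. Any limits: drop candidates whose size limit the source exceeds.
    pool = [
        c for c in pool
        if c.max_source_bytes == UNLIMITED or source_bytes < c.max_source_bytes
    ]
    # 4. Speed: lowest average time first.
    return sorted(pool, key=lambda c: c.avg_ms)


# Example 1: 5-byte txt to html
EXAMPLE_1 = [
    Candidate("transformer.OpenOffice", 1918),
    Candidate("transformer.TikaAuto", 6724),
    Candidate("transformer.complex.OpenOffice.PdfBox", 204, max_source_bytes=5 * MB),
]

# Example 2: 16.5 MB txt to html (204 ms for the complex transformer is a placeholder)
EXAMPLE_2 = [
    Candidate("transformer.TikaAuto", 526),
    Candidate("transformer.OpenOffice", 853),
    Candidate("transformer.complex.OpenOffice.PdfBox", 204, max_source_bytes=5 * MB),
]

# html to txt, where transformer.HtmlParser is marked EXPLICIT
HTML_TO_TXT = [
    Candidate("transformer.StringExtracter", 0),
    Candidate("transformer.OpenOffice", 831),
    Candidate("transformer.TikaAuto", 0),
    Candidate("transformer.HtmlParser", 0, explicit=True),
    Candidate("transformer.complex.OpenOffice.PdfBox", 0),
]

if __name__ == "__main__":
    for label, data, size in [
        ("Example 1", EXAMPLE_1, 5),
        ("Example 2", EXAMPLE_2, int(16.5 * MB)),
        ("html to txt", HTML_TO_TXT, 100),
    ]:
        print(label, [c.name for c in choose(data, size)])
```

With this reading, the 5 MB limit is what removes the complex transformer from the list in Example 2, and the explicit entry is what leaves transformer.HtmlParser as the only candidate for html to txt.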