DevCon 2012_TransformationsWTF

Transformations – WTF’s going on?
[email protected]
Basics…
What’s a Transformation?
• Indexing
• Doclib Thumbnails
• Previews
• Rules
• …
What’s the problem?
• Lots of transformers
• Lots of mimetypes
• Lots of permutations of the above
• Inconsistent results / Non-deterministic
• Transformations not working
• Lack of visibility
How does Alfresco choose?
• Active Transformers
• “Explicit” takes precedence
• Any Limits
• Speed
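The four criteria above can be read as a filter-then-sort pipeline. The following is a minimal Java sketch of that reading, not Alfresco's actual implementation. The `Candidate` class, its fields, and the `choose` method are hypothetical names invented for illustration. The step order simply follows the slide, and the rule that one explicit transformer excludes all non-explicit ones is an assumption inferred from the "disabled not explicit" entries in the debug listings. The sample figures are taken from Example 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of one transformer; not an Alfresco class. */
final class Candidate {
    final String name;
    final boolean active;      // is the transformer available at all?
    final boolean explicit;    // marked explicit for this source/target pair?
    final long maxSourceBytes; // size limit; a negative value means unlimited (assumption)
    final long averageMs;      // average transformation time

    Candidate(String name, boolean active, boolean explicit, long maxSourceBytes, long averageMs) {
        this.name = name;
        this.active = active;
        this.explicit = explicit;
        this.maxSourceBytes = maxSourceBytes;
        this.averageMs = averageMs;
    }

    @Override
    public String toString() {
        return name;
    }
}

final class TransformerChoiceSketch {
    /** Returns the usable transformers, fastest first. */
    static List<Candidate> choose(List<Candidate> all, long sourceBytes) {
        // 1. Active transformers only.
        List<Candidate> result = new ArrayList<>();
        for (Candidate candidate : all) {
            if (candidate.active) {
                result.add(candidate);
            }
        }
        // 2. "Explicit" takes precedence: if any exist, drop the rest.
        boolean anyExplicit = result.stream().anyMatch(c -> c.explicit);
        if (anyExplicit) {
            result.removeIf(c -> !c.explicit);
        }
        // 3. Any limits: drop transformers whose size limit is exceeded.
        result.removeIf(c -> c.maxSourceBytes >= 0 && sourceBytes > c.maxSourceBytes);
        // 4. Speed: lowest average time first.
        result.sort(Comparator.comparingLong((Candidate c) -> c.averageMs));
        return result;
    }

    public static void main(String[] args) {
        List<Candidate> all = Arrays.asList(
            new Candidate("transformer.TikaAuto", true, false, -1, 6724),
            new Candidate("transformer.OpenOffice", true, false, -1, 1918),
            new Candidate("transformer.complex.OpenOffice.PdfBox", true, false, 5L * 1024 * 1024, 204));

        // Small source: all three qualify, ordered by average time.
        System.out.println(choose(all, 5));
        // 16.5 MB source: the 5 MB limit removes the complex transformer.
        System.out.println(choose(all, 17_301_504L));
    }
}
```

With these figures the complex transformer wins for the 5-byte file and drops out once the source exceeds its 5 MB limit.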
Make it transparent
• log4j.logger.org.alfresco.repo.content.transform.TransformerDebug=DEBUG
• debugTransformers.txt
• Exactly 18 bytes
Example 1 – txt to html
] 193 text/plain text/html
] 193 txt html 24.txt 5 bytes ContentService.transform(...)
] 193 **a) transformer.complex.OpenOffice.PdfBox<<Complex>> < 5 MB 204 ms
] 193 b) transformer.OpenOffice<<Proxy>> 1,918 ms
] 193 c) transformer.TikaAuto 6,724 ms
] 193.1 text/plain text/html
] 193.1 txt html 24.txt 5 bytes transformer.complex.OpenOffice.PdfBox<<Complex>>
] 193.1.1 text/plain application/pdf
] 193.1.1 txt pdf 24.txt 5 bytes transformer.OpenOffice<<Proxy>>
] 193.1.1 Finished in 43 ms
] 193.1.2 store:///installs/3411e/tomcat/temp/Alfresco/ComplextTransformer_intermediate_txt_5927671274426616985.pdf
] 193.1.2 application/pdf text/html
] 193.1.2 pdf html <<TemporaryFile>> 6.2 KB transformer.PdfBox
] 193.1.2 Finished in 7 ms
] 193.1 Finished in 50 ms
] 193 Finished in 56 ms
Example 2 – large txt to html
] 204 txt html alfresco.biggerlog.txt 16.5 MB ContentService.transform(...)
] 204 **a) transformer.TikaAuto 526 ms
] 204 b) transformer.OpenOffice<<Proxy>> 853 ms
] 204 --c) transformer.complex.OpenOffice.PdfBox<<Complex>> > 5 MB
Example lists
] 13.2 transformer.StringExtracter 0 ms
] 13.2 1) txt txt unlimited
] 13.2 2) csv txt unlimited
] 13.2 3) html txt unlimited disabled not explicit
] 14.1243 txt jp2
a) transformer.complex.OpenOffice.Image<<Complex>> 1,171 ms 5 MB
] 14.1249 txt txt
a) transformer.StringExtracter 0 ms unlimited
] 14.1249 b) transformer.TikaAuto 0 ms unlimited
] 14.1249 c) transformer.complex.OpenOffice.PdfBox<<Complex>> 0 ms 0 bytes disabled
What can we do?
• Available transformers
• content-services-context.xml
<!-- This one does excel only -->
<bean id="transformer.Poi"
class="org.alfresco.repo.content.transform.PoiHssfContentTransformer"
parent="baseContentTransformer" />
What can we do?
• Explicit transformers
html txt
a) transformer.StringExtracter 0 ms unlimited disabled not explicit
b) transformer.OpenOffice<<Proxy>> 831 ms 0 bytes disabled not explicit
c) transformer.TikaAuto 0 ms unlimited disabled not explicit
d) transformer.HtmlParser 0 ms unlimited EXPLICIT
e) transformer.complex.OpenOffice.PdfBox<<Complex>> 0 ms unlimited disabled not explicit
<property name="explicitTransformations">
  <list>
    <bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails" >
      <property name="sourceMimetype"><value>text/html</value></property>
      <property name="targetMimetype"><value>text/plain</value></property>
    </bean>
  </list>
</property>
What can we do?
• Explicit transformers - 2
html txt
a) transformer.StringExtracter 0 ms unlimited disabled not explicit
b) transformer.OpenOffice<<Proxy>> 831 ms 0 bytes disabled not explicit
c) transformer.TikaAuto 0 ms unlimited disabled not explicit
d) transformer.HtmlParser 0 ms unlimited EXPLICIT
e) transformer.complex.OpenOffice.PdfBox<<Complex>> 0 ms unlimited disabled not explicit
<property name="supportedTransformations">
  <list>
    <bean class="org.alfresco.repo.content.transform.SupportedTransformation" >
      <property name="sourceMimetype"><value>text/html</value></property>
      <property name="targetMimetype"><value>text/csv</value></property>
    </bean>
  </list>
</property>
What can we do?
• Any Limits
• maxSourceSizeKBytes
• content.transformer.PdfBox.TextToPdf.maxSourceSizeKBytes
• Listed in repository.properties
• content.transformer.default.maxSourceSizeKBytes=-1
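One plausible reading of these properties is that a transformer-specific `maxSourceSizeKBytes` overrides the default, and that a negative value such as the default `-1` means "unlimited". The Java sketch below illustrates that reading with `java.util.Properties`. The helper names `limitKBytes` and `withinLimit`, the fallback behaviour, and the 5120 KB value are assumptions made for illustration, not Alfresco code or shipped settings.

```java
import java.util.Properties;

final class SourceSizeLimitSketch {
    static final String DEFAULT_KEY = "content.transformer.default.maxSourceSizeKBytes";

    /** Limit in KB for one transformer, falling back to the default key (hypothetical helper). */
    static long limitKBytes(Properties props, String transformerName) {
        String key = "content.transformer." + transformerName + ".maxSourceSizeKBytes";
        String value = props.getProperty(key, props.getProperty(DEFAULT_KEY, "-1"));
        return Long.parseLong(value.trim());
    }

    /** True when the source fits; a negative limit is treated as unlimited (assumption). */
    static boolean withinLimit(Properties props, String transformerName, long sourceBytes) {
        long limit = limitKBytes(props, transformerName);
        return limit < 0 || sourceBytes <= limit * 1024;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(DEFAULT_KEY, "-1");
        // Illustrative value only: 5120 KB = 5 MB.
        props.setProperty("content.transformer.PdfBox.TextToPdf.maxSourceSizeKBytes", "5120");

        System.out.println(withinLimit(props, "PdfBox.TextToPdf", 5));           // small file fits
        System.out.println(withinLimit(props, "PdfBox.TextToPdf", 17_301_504L)); // 16.5 MB exceeds 5 MB
        System.out.println(withinLimit(props, "TikaAuto", 17_301_504L));         // falls back to the default
    }
}
```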
What can we do?
• Speed - Startup Averages
• transformer.OpenOffice.time=123456
• transformer.PdfBox.TextToPdf.time=50000
• transformer.complex.Text.Image.time=10000
• transformer.complex.Text.Image.count=10000
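A way to picture these settings, assuming `.time` seeds a transformer's average time in milliseconds and `.count` says how many earlier transformations that average should be treated as representing: a large count makes the seeded value hard to shift, a small one lets real measurements take over quickly. The Java class below is a hypothetical sketch of such a running average, not Alfresco's implementation.

```java
/** Hypothetical running average seeded from startup properties. */
final class StartupAverageSketch {
    private double averageMs;
    private long count;

    StartupAverageSketch(double initialAverageMs, long initialCount) {
        this.averageMs = initialAverageMs;
        this.count = initialCount;
    }

    /** Folds one measured transformation time into the average. */
    void record(long elapsedMs) {
        averageMs = (averageMs * count + elapsedMs) / (count + 1);
        count++;
    }

    double averageMs() {
        return averageMs;
    }

    public static void main(String[] args) {
        // Seeded as in the slide: time=10000, count=10000.
        StartupAverageSketch sticky = new StartupAverageSketch(10000, 10000);
        sticky.record(0);
        System.out.println(sticky.averageMs()); // barely moves from 10000

        // Same time, but a count of 1 (illustrative value).
        StartupAverageSketch loose = new StartupAverageSketch(10000, 1);
        loose.record(0);
        System.out.println(loose.averageMs()); // halves after one fast run
    }
}
```

Under this reading, pairing a high time with a high count effectively pushes a transformer to the back of the queue.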
Thank you for listening
[email protected]