Corrupt PDF or PPTX files can block the indexing queue in WebCenter Content 184.108.40.206 when DATABASE.FULLTEXT is used as the search engine.
How can you identify it?
Before starting the Collection Rebuild, set some traces to debug level as follows:
- Enable the following IDC trace sections: indexer, indexermonitor, indexerprocess, systemdatabase, taskmanager. In addition, check Full Verbose Tracing.
- The new traces will contain the following information:
  - indexer, indexermonitor, indexerprocess: show the indexing task traces.
  - systemdatabase: shows all the SQL queries involved in the indexing process.
  - taskmanager: shows information about the processes and tasks involved in the indexing process. For example, the TextExport task is responsible for converting content into the files that get indexed.
- Before starting a full Collection Rebuild, it is advisable to stop the automatic indexer.
- Configure the Collection Rebuild to generate traces.
- When the indexer stalls on a corrupt file, a trace like the following appears:
(internal)/6 06.25 22:17:22.961 TextExport_0 Process 'TextExport' timed out.
This means that the conversion of the content item into an index file timed out. The timeout can sometimes be resolved by adding the following variables to config.cfg:
IndexerTextExtractionTimeout: the default is 15 seconds (raise it to 60 seconds).
TextExtractorTimeoutSec: the default is 15 seconds (raise it to 60 seconds).
- However, after increasing the timeouts and enabling the taskmanager section in the IDC traces, the following log appears:
taskmanager/6 06.25 23:56:10.636 TextExport_0 Task failed with output: 1.
(internal)/7 06.25 23:56:10.636 TextExport_0 Unexpected abort by process 'TextExport'.
taskmanager/6 06.25 23:56:10.636 TextExport_0 Removing launcher for task: TextExport that has been marked as terminated
indexer/6 06.25 23:56:10.636 TextExport_0 Extracted file contains zero bytes.
taskmanager/6 06.25 23:56:10.652 TextExport_0 task Monitor <intradoc.taskmanager.TaskMonitor$1@130a6d30> exiting
taskmanager/7 06.25 23:56:10.652 TaskLauncher_TextExport_stderr__0 Finish reading.
taskmanager/7 06.25 23:56:10.652 TaskLauncher_TextExport_stderr__0 Finish reading.
The TextExport process was aborted while processing a file.
This error is caused by a bug in Oracle WebCenter Content 220.127.116.11. Applying the latest patch for WebCenter Content 18.104.22.168 solves it: the indexer no longer gets stuck when it hits a problem of this kind, so the indexing process can run to completion.
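
When the trace output is large, the TextExport failure lines quoted above can be hard to spot by eye. The following is a minimal Python sketch, not part of WebCenter Content, that filters trace lines for those messages. The function name find_textexport_failures and the assumed line layout (section/level, date, time, thread, message) are illustrative choices inferred from the sample traces in this note; real trace files may contain other layouts, which this sketch simply skips.

```python
import re

# Assumed layout, inferred from the sample traces in this note:
#   <section>/<level> <MM.DD> <HH:MM:SS.mmm> <thread> <message>
TRACE_LINE = re.compile(
    r"^(?P<section>\S+)/(?P<level>\d+)\s+"
    r"(?P<date>\d{2}\.\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>\S+)\s+(?P<message>.*)$"
)

# Messages taken verbatim from the traces quoted in this note.
FAILURE_MARKERS = (
    "Process 'TextExport' timed out",
    "Task failed with output",
    "Unexpected abort by process 'TextExport'",
    "Extracted file contains zero bytes",
)


def find_textexport_failures(lines):
    """Return the parsed fields of every trace line that reports a TextExport failure."""
    failures = []
    for line in lines:
        match = TRACE_LINE.match(line.strip())
        if match is None:
            continue  # not a trace line in the assumed layout
        if any(marker in match.group("message") for marker in FAILURE_MARKERS):
            failures.append(match.groupdict())
    return failures


if __name__ == "__main__":
    sample = [
        "(internal)/6 06.25 22:17:22.961 TextExport_0 Process 'TextExport' timed out.",
        "taskmanager/6 06.25 23:56:10.636 TextExport_0 Task failed with output: 1.",
        "indexer/6 06.25 23:56:10.636 TextExport_0 Extracted file contains zero bytes.",
        "taskmanager/7 06.25 23:56:10.652 TaskLauncher_TextExport_stderr__0 Finish reading.",
    ]
    for failure in find_textexport_failures(sample):
        print(failure["time"], failure["section"], failure["message"])
```

To scan a real trace file, pass an open file object instead of the sample list, since the function accepts any iterable of lines. The systemdatabase traces that precede a failure are then the place to look for the content item being processed at that moment.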