Indexing Baggage Files and External URLs
With WebWorks Reverb 2.0, baggage files and external URLs can be indexed so that they appear as search results alongside the user's help set. An indexable baggage file in this context is any PDF or HTML file that is linked from a source document and included in the generated output. For more detailed information on baggage files, see Targets.
Note: To determine which baggage files are indexed, ePublisher examines the file extension. If it matches one of the following, the file is indexed.
.pdf
.html
.htm
.shtml
.shtm
.xhtml
.xhtm
Baggage files are indexed in the same way that source documents are. Indexable baggage files are indexed as long as the Index baggage files Target Setting is Enabled. External URLs are downloaded and indexed as long as the Index external links Target Setting is Enabled.
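The extension check described in the note can be illustrated with a short Python sketch. This is not ePublisher's own code: the function name is hypothetical, and the case-insensitive comparison is an assumption, since the source does not say how case is handled.

```python
from pathlib import PurePath

# Extensions taken from the list in the note above.
INDEXABLE_EXTENSIONS = {".pdf", ".html", ".htm", ".shtml", ".shtm", ".xhtml", ".xhtm"}


def is_indexable_baggage_file(path: str) -> bool:
    """Return True when the file extension is one of the indexable ones.

    Assumption: the comparison ignores case; the source does not specify this.
    """
    return PurePath(path).suffix.lower() in INDEXABLE_EXTENSIONS


print(is_indexable_baggage_file("guides/install.pdf"))  # True
print(is_indexable_baggage_file("images/logo.png"))     # False
```

A check like this can help you predict which linked files are candidates for indexing before you generate output.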
Using Tidy for Indexing HTML Pages
In order to index an HTML baggage file, Reverb creates an XHTML copy of the file using Tidy (a tool for cleaning up HTML files) to get a valid XML file that ePublisher can read. As useful as Tidy is, there may be times when it does not recognize a tag or generates something improperly. Tidy is configurable and can be adjusted to convert the HTML properly.
When Tidy does not recognize a tag in an HTML file, an error like the following is produced:
line 33 column 3 - Error: <not_recognized_tag> is not recognized!
This error means that Tidy wasn't able to generate an XHTML copy of the HTML file, and therefore ePublisher won't be able to index it as a baggage file. With the right adjustments, this can be fixed.
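If a generation log contains many such messages, a small script can collect the tag names that need attention. The following Python sketch assumes every message follows the exact format of the sample above; the function name is hypothetical and other Tidy message formats are not handled.

```python
import re

# Pattern based only on the single sample message shown above.
UNRECOGNIZED_TAG = re.compile(
    r"line (\d+) column (\d+) - Error: <([^>]+)> is not recognized!"
)


def unrecognized_tags(tidy_output: str) -> list[str]:
    """Return the distinct tag names reported as unrecognized, in first-seen order."""
    seen: list[str] = []
    for match in UNRECOGNIZED_TAG.finditer(tidy_output):
        name = match.group(3)
        if name not in seen:
            seen.append(name)
    return seen


sample = "line 33 column 3 - Error: <not_recognized_tag> is not recognized!"
print(unrecognized_tags(sample))  # ['not_recognized_tag']
```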
Configuring Tidy To Recognize New Tags
- Go to the Tidy directory under the installation directory on your local computer:
\WebWorks\ePublisher\<VERSION>\Helpers\tidy
- Create a Format override of this helper. To do this, in the Formats sub-folder of your project, where the Format overrides live, create a new folder called Helpers and copy the entire tidy folder (from the first step) to this new folder.
- In the newly created tidy folder, open your config.txt file.
- Depending on the kind of tag you want to add, uncomment line 8 or line 10, or both, in the config.txt file.
- Substitute the placeholder after the colon with your new tag name (for example: not_recognized_tag).
- Save and close the file.
To learn more about customizing Tidy, go to https://www.w3.org/People/Raggett/tidy/.
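The Format override step above (copying the tidy helper folder into the project's Formats\Helpers folder) can also be scripted. This is a minimal Python sketch, not a WebWorks tool: the function name is hypothetical, both paths are supplied by the caller, and `dirs_exist_ok` requires Python 3.8 or later.

```python
import shutil
from pathlib import Path


def create_tidy_override(installed_tidy_dir: Path, project_dir: Path) -> Path:
    """Copy the installed tidy helper folder to <project>/Formats/Helpers/tidy.

    Both arguments are hypothetical paths supplied by the caller.
    Returns the path of config.txt inside the new override folder.
    """
    override_dir = project_dir / "Formats" / "Helpers" / "tidy"
    # Create Formats/Helpers if missing; copytree creates the final "tidy" folder.
    override_dir.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(installed_tidy_dir, override_dir, dirs_exist_ok=True)
    return override_dir / "config.txt"
```

You would call it with the tidy folder from your installation directory and your project folder, then edit the returned config.txt by hand as described in the steps.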
Assigning Relevance Weight to Your Source Documents Styles
Search results are displayed in the Search tab when a user types a word to search for. The search results are sorted by a relevancy ranking, which, in the case of source documents, is calculated from the Search relevance weight option defined in your Paragraph and Marker Styles. By default, WebWorks Reverb 2.0 assigns a relevance weight of 1 to all styles.
To Modify the Relevancy Ranking in Source Documents for Search Results
- Open your project with ePublisher Designer.
- Scan the document to pull all styles into the Style Designer.
- Open the Style Designer (F10 or View > Style Designer).
- Select the style you want to assign a weight to (either in Paragraph Styles or Marker Styles).
- Open the Options window.
- Change the Value of the Search relevance weight option to a decimal number of your choosing, or you can ignore it (which is equivalent to 0), meaning that the style will not be shown in your results.
Assigning Relevance Weight to Your HTML and PDF Baggage Files
The search results are sorted by relevancy ranking, which, in the case of HTML baggage files, is calculated based on the scoring preference defined for the HTML tags in the search_settings.xml file. By default, WebWorks Reverb 2.0 assigns relevancy rankings based on where in a topic a particular item is found.
To Modify the Relevancy Ranking in Baggage Files for Search Results
- Open your project with ePublisher Designer.
- If you want to override the relevancy ranking for all WebWorks Reverb 2.0 targets, create the Formats\WebWorks Reverb 2.0\Transforms folder in your <ProjectName> folder, where ProjectName is the name of your ePublisher project.
- If you want to override the relevancy ranking for one WebWorks Reverb 2.0 target, create the Targets\<TargetName>\Transforms folder in your <ProjectName> folder, where ProjectName is the name of your ePublisher project.
- Create a customization of your search_settings.xml file.
- You'll see the following block of code:
<Settings version="1.0" xmlns="urn:WebWorks-Settings-Schema">
  <ScoringPrefs default-weight="0.05" pdf-weight="0.05">
    <meta name="keywords" weight="1.0"/>
    <meta name="description" weight="1.0"/>
    <meta name="summary" weight="1.0"/>
    <title weight="1.0"/>
    <div class="myclass" weight="0.05"/>
    <div weight="0.05"/>
    <h1 weight="0.1"/>
    <h2 weight="0.1"/>
    <caption weight="0.1"/>
    <h3 weight="0.1"/>
    <th weight="0.1"/>
    <h4 weight="0.1"/>
    <h5 weight="0.1"/>
    <h6 weight="0.1"/>
    <h7 weight="0.1"/>
    <p weight="0.05"/>
  </ScoringPrefs>
</Settings>
- Modify the weight attributes for any tags you want to change, such as h1 and h2. You can also specify additional tags, with or without class attributes, to further refine weights for your HTML baggage files. You may use decimal values for the weight attribute.
Note: To set a default weight for tags that are not defined in this file, update the default-weight attribute value.
Note: You can change the default weight for all of the text in a PDF file by changing the pdf-weight attribute value.
- Save and close the search_settings.xml file.
- Regenerate your project to review the changes.
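If you prefer to script the weight change rather than edit the file by hand, the following Python sketch shows one way to do it with the standard library. It is an illustration only: the function name is hypothetical, the sample is a shortened copy of the block shown in the steps, and the function updates every entry with the given tag name, including entries that carry a class attribute.

```python
import xml.etree.ElementTree as ET

NS = "urn:WebWorks-Settings-Schema"


def set_tag_weight(xml_text: str, tag: str, weight: float) -> str:
    """Return the settings XML with the weight of every <tag> entry updated."""
    ET.register_namespace("", NS)  # keep the default namespace on output
    root = ET.fromstring(xml_text)
    prefs = root.find(f"{{{NS}}}ScoringPrefs")
    if prefs is None:
        raise ValueError("ScoringPrefs element not found")
    for element in prefs.findall(f"{{{NS}}}{tag}"):
        element.set("weight", str(weight))
    return ET.tostring(root, encoding="unicode")


# Shortened sample of the settings block, for demonstration only.
SAMPLE = """<Settings version="1.0" xmlns="urn:WebWorks-Settings-Schema">
  <ScoringPrefs default-weight="0.05" pdf-weight="0.05">
    <title weight="1.0"/>
    <h1 weight="0.1"/>
    <h2 weight="0.1"/>
  </ScoringPrefs>
</Settings>"""

updated = set_tag_weight(SAMPLE, "h1", 0.5)
print(updated)
```

After writing the result back to your customized search_settings.xml, regenerate the project as described in the final step.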
Search Highlighting in Baggage Files
When you click a result in your Search Results, the associated source document or baggage file opens. If the result is a baggage file and you want search highlighting in it, you must copy the reverb-search.js script next to your HTML file and then reference it in the <head> tag of that file. The reverb-search.js file lives in the installation directory at \WebWorks\ePublisher\<VERSION>\Formats\WebWorks Reverb 2.0\API\reverb-search.js.
To Reference the reverb-search.js File From Within Your HTML File (After Copying the Script Next to Your HTML File)
- Open your HTML document.
- Locate the <head> tag.
- Create the following line inside the <head> tag, pointing to the script you just copied:
<script type="text/javascript" src=".../reverb-search.js"></script>
- Save your HTML file.
- Either enable the Copy baggage file dependents option in your Target Settings under Baggage Files (this copies the script to the Output folder; see Copy baggage file dependents), or manually copy the file to the Output directory, next to your baggage file.
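For a large number of HTML baggage files, adding the script reference by hand can be tedious. The Python sketch below inserts the reference right after the opening <head> tag. It is an assumption-laden illustration: the function name is hypothetical, and the default src value assumes the script sits next to the HTML file, so adjust it if your layout differs.

```python
import re

SCRIPT_NAME = "reverb-search.js"


def add_search_script(html: str, script_src: str = SCRIPT_NAME) -> str:
    """Insert a <script> reference right after the opening <head> tag.

    Returns the HTML unchanged if it already mentions reverb-search.js.
    Raises ValueError when no <head> tag is present.
    Assumption: script_src defaults to a file located next to the HTML file.
    """
    if SCRIPT_NAME in html:
        return html
    head = re.search(r"<head\b[^>]*>", html, flags=re.IGNORECASE)
    if head is None:
        raise ValueError("no <head> tag found")
    tag = f'\n<script type="text/javascript" src="{script_src}"></script>'
    return html[: head.end()] + tag + html[head.end():]
```

Calling the function a second time on its own output leaves the HTML untouched, so it is safe to run repeatedly over the same files.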
Last modified date: 01/21/2026