Save Sources
The Save Sources LOP converts rows from an input table DAT into individual Markdown files on disk. It is designed as a bridge between content acquisition operators (like web scrapers or document processors) and the RAG Index LOP, which ingests folders of Markdown files for retrieval-augmented generation.
Input/Output
Inputs
This operator has no wired inputs. It reads from an internal input_table DAT that receives data from an upstream operator connection. The table must contain at least two columns:
- doc_id: unique identifier per row, used as the final fallback filename
- content: the text to write into each Markdown file
Optional columns:
- source_path: URL used for filename generation when “Use URL for Filename” is enabled
- A custom column specified in “Filename Column (Optional)” for alternative filename sourcing
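As a rough illustration of the expected table shape, the sketch below models the table as a plain list of rows whose first row is the header, and checks it for the required columns. The function name check_input_table is hypothetical, and the error strings are borrowed from the Troubleshooting section; the operator's actual validation code is not shown in this document.

```python
REQUIRED_COLUMNS = ("doc_id", "content")

def check_input_table(rows):
    """Return an error string for a malformed table, or None if it looks usable."""
    # A table needs a header row plus at least one data row.
    if len(rows) < 2:
        return "Error: Input table empty/no header"
    header = rows[0]
    for name in REQUIRED_COLUMNS:
        if name not in header:
            return f"Error: Missing '{name}' column"
    return None

# Example table: two required columns plus the optional source_path column.
table = [
    ["doc_id", "content", "source_path"],
    ["a1", "# Title\n\nBody text.", "https://example.com/articles/machine-learning"],
]
print(check_input_table(table))
```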
Outputs
- One output: passes through the input table for downstream chaining
The primary output of this operator is the set of .md files written to disk in the configured output folder.
Basic File Export
- Connect an upstream operator (such as a web scraper) so that its output populates the internal input_table DAT with doc_id and content columns.
- On the Save Config page, set “Output Folder” to the directory where files should be saved.
- Pulse “Save Markdown Files” to begin the export.
- Monitor “Current Status”, “Progress (%)”, and “Files Saved” to track the operation.
- When complete, the status will show the number of files saved and the time elapsed.
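The same steps can also be driven from a script. The sketch below is a minimal example that assumes the parameter names listed in the Parameters section (Outputfolder, Savemarkdown) and an operator named save_sources; the helper name start_export and the folder path in the usage comment are hypothetical.

```python
def start_export(save_op, output_folder):
    """Set the output folder on a Save Sources operator and pulse the export."""
    # Assigning to a parameter sets its value.
    save_op.par.Outputfolder = output_folder
    # Pulse parameters are triggered with .pulse().
    save_op.par.Savemarkdown.pulse()

# Inside TouchDesigner, with a hypothetical folder path:
# start_export(op('save_sources'), 'C:/knowledge_base')
```

Passing the operator in as an argument keeps the helper independent of where the operator lives in the network.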
URL-Based Filenames
When saving content scraped from the web, enable “Use URL for Filename” to generate meaningful filenames from the source_path column:
- https://example.com/articles/machine-learning becomes articles_machine-learning.md
- https://site.com/docs/tutorial.html becomes docs_tutorial.md
- https://blog.com/index.php?id=123 becomes index_php_id_123.md
The operator strips common extensions (.html, .php), sanitizes special characters, and truncates filenames to 100 characters.
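The operator's own implementation is not shown in this document, so the following is only a sketch of one way to get the same results for the three example URLs. The function name url_to_filename, the exact set of allowed characters, and applying the 100-character limit before the .md suffix are all assumptions.

```python
import re
from urllib.parse import urlparse

def url_to_filename(url, max_len=100):
    """Turn a URL into a safe Markdown filename, or return None if unusable."""
    parsed = urlparse(url)
    if not parsed.scheme or not parsed.netloc:
        return None
    # Use the path plus any query string; the host name is dropped.
    raw = parsed.path.strip("/")
    if parsed.query:
        raw += "?" + parsed.query
    # Strip a common extension only when it ends the string.
    raw = re.sub(r"\.(html?|php)$", "", raw, flags=re.IGNORECASE)
    # Collapse every run of other characters into a single underscore.
    safe = re.sub(r"[^A-Za-z0-9_-]+", "_", raw).strip("_")
    if not safe:
        return None
    return safe[:max_len] + ".md"

for url in (
    "https://example.com/articles/machine-learning",
    "https://site.com/docs/tutorial.html",
    "https://blog.com/index.php?id=123",
):
    print(url_to_filename(url))
```

Note that in the third URL the .php extension is followed by a query string, so under this sketch it is sanitized to _php rather than stripped, which matches the documented result.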
Custom Filenames
To use a specific column for filenames instead of URLs or document IDs:
- Add a column to your input table with the desired filenames (e.g., a column named filename).
- On the Save Config page, enter the column name in “Filename Column (Optional)”.
- If “Use URL for Filename” is also enabled, the URL method is tried first and this column serves as a fallback.
Adding a Prefix
Set “Filename Prefix (Optional)” to prepend a string to every saved filename. For example, a prefix of project_ produces files like project_articles_machine-learning.md.
Preparing a RAG Knowledge Base
- Use a source operator to scrape or import content into a table.
- Wire the output into the Save Sources operator.
- Set “Output Folder” to your knowledge base directory.
- Enable “Use URL for Filename” for web content, or configure a filename column for other sources.
- Pulse “Save Markdown Files” to export.
- Point a RAG Index operator at the same output folder to ingest the saved files.
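Before pointing a RAG Index operator at the folder, it can help to confirm that the expected files were actually written. The sketch below is an optional sanity check and not part of either operator; the helper name list_markdown_files and the folder path in the usage comment are hypothetical.

```python
from pathlib import Path

def list_markdown_files(folder):
    """Return the sorted names of .md files directly inside folder."""
    return sorted(p.name for p in Path(folder).glob("*.md") if p.is_file())

# Example, with a hypothetical knowledge base folder:
# print(list_markdown_files('C:/knowledge_base'))
```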
Filename Resolution Order
The operator resolves filenames using a three-tier fallback strategy:
- URL-based: if “Use URL for Filename” is enabled and a valid source_path exists, the URL is parsed and sanitized into a filename.
- Fallback column: if URL generation fails or is disabled, the column specified in “Filename Column (Optional)” is used.
- Document ID: if both above methods fail, the doc_id column value is used as the filename.
Every row is guaranteed to produce a filename through this chain.
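The fallback chain can be sketched as a single function. This is an illustration under stated assumptions, not the operator's code: rows are modeled as dicts, the helper names (_sanitize, _name_from_url, resolve_filename) are hypothetical, and whether the operator sanitizes custom column values and doc_id values in the same way as URLs is an assumption.

```python
import re
from urllib.parse import urlparse

def _sanitize(text, max_len=100):
    """Replace unsafe character runs with underscores and cap the length."""
    return re.sub(r"[^A-Za-z0-9_-]+", "_", text).strip("_")[:max_len]

def _name_from_url(url):
    """Return a sanitized name built from a URL, or '' if the URL is unusable."""
    parsed = urlparse(url or "")
    if not parsed.scheme or not parsed.netloc:
        return ""
    raw = parsed.path.strip("/")
    if parsed.query:
        raw += "?" + parsed.query
    raw = re.sub(r"\.(html?|php)$", "", raw, flags=re.IGNORECASE)
    return _sanitize(raw)

def resolve_filename(row, use_url=False, filename_column="", prefix=""):
    """Pick a filename for one row: URL first, then custom column, then doc_id."""
    name = ""
    if use_url:
        name = _name_from_url(row.get("source_path", ""))
    if not name and filename_column:
        name = _sanitize(row.get(filename_column, ""))
    if not name:
        name = _sanitize(row["doc_id"])
    return f"{prefix}{name}.md"

row = {
    "doc_id": "doc 7",
    "content": "Body text.",
    "source_path": "https://site.com/docs/tutorial.html",
    "filename": "intro-notes",
}
print(resolve_filename(row, use_url=True, filename_column="filename"))
print(resolve_filename(row, filename_column="filename"))
print(resolve_filename(row))
```

Each tier only runs when the previous one produced an empty name, so disabling a tier and a failed tier are handled by the same check.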
Overwrite Protection
By default, “Overwrite Existing Files” is off. When disabled, the operator skips any file that already exists at the target path. Enable it to replace existing files during re-exports or content updates.
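The skip-or-replace behavior amounts to a single existence check before each write. The sketch below illustrates that check with a hypothetical helper, write_markdown; the UTF-8 encoding is an assumption, and the operator's real write routine may differ.

```python
from pathlib import Path

def write_markdown(path, content, overwrite=False):
    """Write content to path; return True if written, False if skipped."""
    target = Path(path)
    # With overwrite off, an existing file is left untouched.
    if target.exists() and not overwrite:
        return False
    target.write_text(content, encoding="utf-8")
    return True
```

Returning a flag for each file is one way a caller could keep a count of files saved that excludes skipped ones.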
Troubleshooting
“Error: Missing ‘doc_id’ column” or “Missing ‘content’ column”
The input table must have both doc_id and content as column headers in the first row. Verify your upstream operator is producing the expected table format.
“Error: Output folder invalid”
The specified folder path could not be resolved. Ensure the path exists or that its parent directory exists (the operator will attempt to create the final folder). Use absolute paths to avoid ambiguity.
“Error: Input table empty/no header”
The input table has no data rows beyond the header. Confirm your source operator has finished populating the table before pulsing “Save Markdown Files”.
Files not appearing despite successful status
If “Overwrite Existing Files” is off and files with the same names already exist, they are silently skipped. Enable overwrite or use a different prefix to generate unique filenames.
Resetting after errors
Pulse “Clear Status” to reset the status, progress, and file count back to their initial state.
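The output folder rule (the path exists, or its parent exists so that the final folder can be created) can be checked ahead of time. The sketch below is a minimal pre-flight check under that reading of the rule; the helper name ensure_output_folder is hypothetical and the operator's own resolution logic may differ.

```python
from pathlib import Path

def ensure_output_folder(folder):
    """Return the folder as a Path, creating only its final component if needed."""
    if not folder:
        return None
    path = Path(folder)
    if path.is_dir():
        return path
    # Create the last component only when its parent directory already exists.
    if path.parent.is_dir() and not path.exists():
        path.mkdir()
        return path
    return None
```

A None result corresponds to the case the “Output folder invalid” error describes: neither the folder nor its parent could be found.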
Parameters
Save Config
op('save_sources').par.Outputfolder (Folder): The directory where Markdown files will be saved.
- Default: "" (Empty String)
op('save_sources').par.Filenameprefix (Str): Optional prefix to add to the beginning of each saved filename (before the doc_id).
- Default: "" (Empty String)
op('save_sources').par.Filenamecolumn (Str): Optional: Specify a column name (e.g., "filename") to use for filenames instead of "doc_id". If empty or column not found, "doc_id" is used.
- Default: "" (Empty String)
op('save_sources').par.Overwrite (Toggle): If enabled, existing Markdown files with the same name will be overwritten.
- Default: False
op('save_sources').par.Savemarkdown (Pulse): Starts the process of saving content from the input DAT to Markdown files.
- Default: False
op('save_sources').par.Clearstatus (Pulse): Resets the status, progress, and files saved counters.
- Default: False
op('save_sources').par.Status (Str)
- Default: "" (Empty String)
op('save_sources').par.Progress (Float)
- Default: 0.0
- Range: 0 to 1
- Slider Range: 0 to 1
op('save_sources').par.Filessaved (Int)
- Default: 0
- Range: 0 to 1
- Slider Range: 0 to 1
op('save_sources').par.Useurlasfilename (Toggle): If enabled, attempts to create a safe filename from the "source_path" column URL.
- Default: False
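The monitoring parameters can be read from a script in the same way. The sketch below assumes the parameter names listed above (Status, Progress, Filessaved) and an operator named save_sources; the helper name read_status is hypothetical.

```python
def read_status(save_op):
    """Collect the monitoring parameters of a Save Sources operator into a dict."""
    par = save_op.par
    return {
        "status": par.Status.eval(),
        "progress": par.Progress.eval(),
        "files_saved": par.Filessaved.eval(),
    }

# Inside TouchDesigner:
# print(read_status(op('save_sources')))
```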