Process Document using AI
The CloudFiles: Process Document Using AI flow action is a foundational step for working with documents in CloudFiles Document AI. A document must be processed before it can be queried. Just as you would upload a file to an AI system before asking questions about it, CloudFiles requires this initial processing step to make the document available for querying.
Processing is therefore a prerequisite for any workflow that uses other CloudFiles Document AI actions (for example, Query Document) to analyze document content or extract specific details, such as identifying the document type or pulling specific values into other fields.
The Process Document Using AI action converts documents, such as images or PDFs, into a digitized, queryable format that CloudFiles can interpret. The action runs asynchronously, so it does not return output directly. Instead, it publishes a Document Processed event once processing is complete. The event details include the ProcessedDocumentId, a unique CloudFiles identifier that the other CloudFiles Document AI actions require in order to query and interact with the document's contents in subsequent flows.
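To make the asynchronous behavior concrete, here is a minimal Python sketch that models it with an in-memory queue. The `EventBus`, `process_document`, and `DocumentProcessedEvent` names are hypothetical and are not part of CloudFiles or Salesforce, and the generated ID is a placeholder. The point is only that the action call itself returns nothing and the ProcessedDocumentId arrives later, inside an event.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class DocumentProcessedEvent:
    # ProcessedDocumentId and Context come from the article; the Python
    # field names are hypothetical.
    processed_document_id: str
    context: Optional[str] = None


class EventBus:
    """Hypothetical in-memory stand-in for platform event delivery."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[DocumentProcessedEvent], None]] = []
        self._queue: List[DocumentProcessedEvent] = []

    def subscribe(self, handler: Callable[[DocumentProcessedEvent], None]) -> None:
        self._handlers.append(handler)

    def publish(self, event: DocumentProcessedEvent) -> None:
        # Published events are queued, not delivered to handlers right away.
        self._queue.append(event)

    def deliver_pending(self) -> None:
        # Later, queued events reach every subscribed handler in order.
        while self._queue:
            event = self._queue.pop(0)
            for handler in self._handlers:
                handler(event)


def process_document(bus: EventBus, file_id: str, context: Optional[str] = None) -> None:
    """Hypothetical stand-in for the flow action: it returns nothing."""
    # Placeholder ID for illustration only; CloudFiles assigns the real one.
    processed_document_id = f"processed-{file_id}"
    bus.publish(DocumentProcessedEvent(processed_document_id, context))


bus = EventBus()
received: List[DocumentProcessedEvent] = []
bus.subscribe(received.append)

result = process_document(bus, "069000000000001AAA", context="003000000000001AAA")
print(result, len(received))  # None 0  (no output, no event delivered yet)
bus.deliver_pending()
print(received[0].processed_document_id)  # processed-069000000000001AAA
```

In a real org, queuing and delivery are handled by Salesforce platform events, and the subscriber is a platform event-triggered flow rather than a Python function.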
Consider a scenario where you need to process KYC documents attached to Contact records in Salesforce, such as passports or national identity certificates. You may want to automatically identify the document type (e.g., “Is this document a passport?”) and, if it is, query additional information such as the address or nationality to populate fields on the Contact record. To enable this process, you would:
- Create a flow that triggers each time a document is attached to a Contact record and uses the Process Document Using AI action to make the document ready for querying.
- Set up a second flow triggered by the Document Processed event. This event provides the ProcessedDocumentId (a unique CloudFiles identifier) along with information that identifies its origin, such as the specific Contact record and the processed file details. Use these values with other CloudFiles Document AI actions to query the file and gather further information from it.
This setup automates both document classification and content extraction.
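The logic of the second flow can be sketched in Python as follows, assuming it receives the ProcessedDocumentId and the Contact's record ID (for example, through the Context parameter described later in this article). The `query` callable is a hypothetical stand-in for the Query Document action, the question wording is illustrative, and `Passport_Address__c` and `Nationality__c` are placeholder field names, not fields that exist on Contact by default.

```python
from typing import Callable, Dict, Optional

# Hypothetical model of Query Document: (processed_document_id, question) -> answer.
QueryFn = Callable[[str, str], str]

IS_PASSPORT = "Is this document a passport? Answer yes or no."
ADDRESS = "What is the address on this document?"
NATIONALITY = "What is the holder's nationality?"


def handle_document_processed(processed_document_id: str, contact_id: str,
                              query: QueryFn) -> Optional[Dict[str, str]]:
    """Second flow: runs for each Document Processed event.

    Returns the Contact field updates to apply, or None when the document
    is not a passport. The field API names are hypothetical.
    """
    answer = query(processed_document_id, IS_PASSPORT).strip().lower()
    if not answer.startswith("yes"):
        return None  # Not a passport: leave the Contact unchanged.
    return {
        "Id": contact_id,
        "Passport_Address__c": query(processed_document_id, ADDRESS),
        "Nationality__c": query(processed_document_id, NATIONALITY),
    }


# Usage with canned answers standing in for Query Document:
canned = {IS_PASSPORT: "Yes", ADDRESS: "1 Example Street", NATIONALITY: "Canadian"}
print(handle_document_processed("pd-1", "003000000000001AAA", lambda _pid, q: canned[q]))
```

Asking the classification question first means the extraction queries only run for documents that are actually passports.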
In Flow Builder, search for the action named "CloudFiles: Process Document using AI". You can also find it in the CloudFiles category when you click the "Action" element in the "Add Element" box. Select the action to insert it into the flow, then configure its input parameters.
To process a Salesforce File, set the input parameters as follows:
- Library - salesforce
- FileID - The ContentDocumentID of the Salesforce File to be processed.
You can get the ContentDocumentID of the Salesforce File from standard Salesforce elements such as "Get Records", from the standard Screen Flow "Upload Files" component, or from the details of CloudFiles events such as the Salesforce File Attached event.
- It is mandatory to input both Library and FileID to specify a Salesforce File.
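The following Python sketch assembles the two required inputs and adds an illustrative guard against a common mix-up: Salesforce ContentDocument IDs start with the 069 key prefix, while ContentVersion IDs start with 068. The function name and the ID check are assumptions for illustration; CloudFiles itself may validate the inputs differently.

```python
from typing import Dict

CONTENT_DOCUMENT_PREFIX = "069"  # Salesforce key prefix for ContentDocument records


def salesforce_file_inputs(content_document_id: str) -> Dict[str, str]:
    """Build the Library/FileID inputs for processing a Salesforce File.

    Both values are mandatory. The ID checks are an illustrative guard,
    not validation performed by CloudFiles.
    """
    file_id = content_document_id.strip()
    if not file_id:
        raise ValueError("FileID is required: pass the file's ContentDocumentID")
    if len(file_id) not in (15, 18) or not file_id.startswith(CONTENT_DOCUMENT_PREFIX):
        # A 068 prefix usually means a ContentVersion ID was passed by mistake.
        raise ValueError(f"{file_id!r} does not look like a ContentDocumentID")
    return {"Library": "salesforce", "FileID": file_id}


print(salesforce_file_inputs("069000000000001AAA"))
```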
If you also use the CloudFiles: Document Management package, you can process files in connected external storage as well.
To process a file in external storage, set the input parameters as follows:
- Library - The external storage type you are using. Possible values are sharepoint, google (for Google Drive), onedrive, dropbox, box, and cloudfiles (for AWS S3).
- DriveID - The ID of the drive where the document resides. This parameter applies only to the Google Drive and SharePoint libraries. In SharePoint, the Drive ID identifies a document library within a site; in Google Drive, it identifies a user's drive or a shared drive.
- FileID - The unique identifier (ResourceId) of the file that is to be processed.
Depending on your use case, you can get these parameters from the details of other CloudFiles events, such as the File Uploaded event or the File Received event.
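The rules above can be sketched in Python as an input builder. The Library values come from the list above. Treating DriveID as required for sharepoint and google, and leaving it out for the other libraries, is an assumption based on the note that it applies only to those two. The function name and the sample IDs are hypothetical.

```python
from typing import Dict, Optional

# Library values listed in this article.
EXTERNAL_LIBRARIES = {"sharepoint", "google", "onedrive", "dropbox", "box", "cloudfiles"}
# Libraries for which DriveID applies.
DRIVE_ID_LIBRARIES = {"sharepoint", "google"}


def external_file_inputs(library: str, file_id: str,
                         drive_id: Optional[str] = None) -> Dict[str, str]:
    """Build inputs for processing a file in connected external storage."""
    if library not in EXTERNAL_LIBRARIES:
        raise ValueError(f"Unsupported Library value: {library!r}")
    if not file_id:
        raise ValueError("FileID (the file's ResourceId) is required")
    inputs = {"Library": library, "FileID": file_id}
    if library in DRIVE_ID_LIBRARIES:
        # Assumption: treat DriveID as required for SharePoint and Google Drive.
        if not drive_id:
            raise ValueError(f"DriveID is required for the {library!r} library")
        inputs["DriveID"] = drive_id
    # Assumption: for other libraries, DriveID is not sent even if provided.
    return inputs


print(external_file_inputs("dropbox", "id:abc123"))
print(external_file_inputs("sharepoint", "01ABCDEF", drive_id="b!xyz"))
```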
- Context - An optional identifier for tracking the source of the event, or any other details you need to pass along. This value is returned in the details of the corresponding Document Processed event.
The Context parameter is particularly helpful when this action is used in multiple flows. For example, if you are processing documents attached to Contact records, you can set the Contact's recordId as the Context. When the events are published, the Context value identifies the Contact record that each event originated from.
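When several flows use the action, one option is to encode both the source and the record ID in Context, so the event-triggered flow can tell the requests apart. The `source|recordId` format and the function names below are a hypothetical convention, not something CloudFiles defines; any format works as long as the receiving flow parses it the same way it was built.

```python
from typing import Tuple

SEPARATOR = "|"


def make_context(source: str, record_id: str) -> str:
    """Encode which flow started processing and for which record."""
    if SEPARATOR in source or not record_id:
        raise ValueError("source must not contain '|' and record_id must be set")
    return f"{source}{SEPARATOR}{record_id}"


def parse_context(context: str) -> Tuple[str, str]:
    """Recover (source, record_id) from a Document Processed event's Context."""
    source, sep, record_id = context.partition(SEPARATOR)
    if not sep or not source or not record_id:
        raise ValueError(f"Unrecognized Context value: {context!r}")
    return source, record_id


ctx = make_context("ContactKYC", "003000000000001AAA")
print(ctx)                 # ContactKYC|003000000000001AAA
print(parse_context(ctx))  # ('ContactKYC', '003000000000001AAA')
```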
The Apex action does not return any output to the flow that calls it. Instead, a Document Processed event is published for every processed file. This event signals that processing is complete and can trigger platform event flows that perform post-processing actions such as Query Document or Query Document (Batch).
If the action fails for any reason, an error event is triggered instead. You can use this event in a Decision element to diagnose and handle the error.
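The Decision logic can be sketched in Python as a function with one outcome per event type. The `ProcessingError` class and its fields are assumptions, since this article does not list the error event's details; check the actual event fields in your org before relying on them. The error message in the usage lines is made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class DocumentProcessed:
    processed_document_id: str
    context: Optional[str] = None


@dataclass(frozen=True)
class ProcessingError:
    # Hypothetical fields: check the real error event's details in your org.
    message: str
    context: Optional[str] = None


def decide(event: Union[DocumentProcessed, ProcessingError]) -> str:
    """Mirror of a Decision element with a success and a failure outcome."""
    if isinstance(event, ProcessingError):
        # Failure path: surface the message and where the request came from.
        return f"error for {event.context or 'unknown source'}: {event.message}"
    # Success path: continue with Query Document using the processed ID.
    return f"query {event.processed_document_id}"


print(decide(DocumentProcessed("pd-1", "003000000000001AAA")))
print(decide(ProcessingError("Unsupported file type", "003000000000001AAA")))
```

Carrying Context into the failure branch lets the error handler point back at the originating record, just as the success branch does.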