Dataset Configuration
Datasets in CloudFiles Document AI allow you to group multiple documents into a single AI-ready collection so they can be analyzed together. Once documents are added and processed, you can run natural language prompts across the entire dataset instead of querying each document individually.

Datasets are especially useful when you want to compare documents, summarize information across files, or extract insights from multiple related documents such as quotes, binders, contracts, or reports.

Below is a straightforward walkthrough of how to configure and use a dataset.

Create a New Dataset

A dataset represents a collection of documents that CloudFiles processes and indexes so they can be queried together using AI.

To create a new dataset, navigate to CloudFiles app → Document AI tab → Datasets subtab.

On the Datasets page, you will see a list of existing datasets along with details such as dataset ID, creator, creation date, and processing status.

Click Create New Dataset to open the dataset creation dialog.

Configure the Dataset Details

In the Create Dataset window, provide the following information:

Dataset Name: Enter a clear name for the dataset so it can be easily identified later.

Description (optional): Add a short description of what documents the dataset contains or what it will be used for.

Once the details are entered, click Save & Proceed. This creates the dataset and opens the dataset configuration page, where you can add documents.

Add Data to the Dataset

After you create the dataset, the dataset details page opens. This page contains the Data Sources section, where you can add both documents and Salesforce data to the dataset.

Files

Within the Files section, you will see an option to upload documents. Click Upload Files and select the files you want to include in the dataset. These can be PDFs or other supported document types that you want to analyze together.

Once uploaded, CloudFiles automatically processes the documents by indexing their content so they can be used for AI queries. After processing is complete, the status changes to Processed, indicating that the files are ready.

Salesforce Data (SOQL Queries)

In addition to files, you can also add Salesforce data using SOQL queries. Navigate to the SOQL Queries section and click Add Query. Provide a query name, enter your SOQL query, and choose a refresh interval such as daily or monthly.

After validation, CloudFiles fetches the data and includes it as part of the dataset. This data is processed and indexed just like documents, allowing it to be queried alongside your files.

The system supports large datasets, with up to 1 million rows per query, and automatically refreshes the data based on the selected interval.

When processing is complete, the status changes to Processed, indicating that the dataset is ready to be queried.

Query the Dataset in AI Playground

After the dataset has been processed, you can test and query it using the AI Playground.

Navigate to the AI Playground tab and click Create Playground. From the dropdown options, select Dataset Playground.

In the Dataset Playground interface, select the dataset you created from the dataset selector.

Once the dataset is selected, you can start entering prompts to query the documents contained in that dataset. The AI analyzes all documents in the dataset and returns answers based on the information found across those files.

For example, you can enter prompts that summarize the documents, extract key information, or compare data across multiple files.

Result

Once configured and processed, a dataset becomes an AI-ready collection of documents that can be queried collectively using natural language prompts. Using the AI Playground, users can quickly analyze multiple documents, extract insights, and validate prompts before using those queries in automation.

Datasets can also be managed directly within Salesforce Flows using CloudFiles Document AI flow actions. For example, one flow action can be used to programmatically create datasets during a flow, while other actions allow prompts to run against those datasets. The extracted results can then be used to update records or trigger actions as part of Salesforce automation.
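
As a rough illustration of the SOQL query settings described in the walkthrough (a query name, the SOQL text, a refresh interval, and the limit of 1 million rows per query), the following Python sketch models one query entry and runs a few sanity checks before it would be entered in the Add Query dialog. This is not a CloudFiles API: the class SoqlQuerySource, the helper within_row_limit, and the example query are all hypothetical and exist only to make the settings concrete. The interval check covers only the two intervals named above; the product may offer others.

```python
from dataclasses import dataclass
from typing import List

# Limit stated in the walkthrough: up to 1 million rows per query.
MAX_ROWS_PER_QUERY = 1_000_000

# Only the two intervals named in the walkthrough; others may exist.
KNOWN_INTERVALS = {"daily", "monthly"}


@dataclass(frozen=True)
class SoqlQuerySource:
    """Hypothetical record of one SOQL query added to a dataset."""

    name: str
    soql: str
    refresh_interval: str

    def problems(self) -> List[str]:
        """Return readable issues; an empty list means nothing was flagged."""
        issues = []
        if not self.name.strip():
            issues.append("query name is empty")
        if not self.soql.strip().lower().startswith("select "):
            issues.append("SOQL query should start with SELECT")
        if self.refresh_interval.lower() not in KNOWN_INTERVALS:
            issues.append(
                "unrecognized refresh interval: " + self.refresh_interval
            )
        return issues


def within_row_limit(row_count: int) -> bool:
    """Check an estimated row count against the per-query limit."""
    return 0 <= row_count <= MAX_ROWS_PER_QUERY


# Example entry; the query targets the standard Opportunity object.
source = SoqlQuerySource(
    name="Open opportunities",
    soql="SELECT Id, Name, Amount FROM Opportunity WHERE IsClosed = false",
    refresh_interval="daily",
)

print(source.problems())          # prints []
print(within_row_limit(250_000))  # prints True
```

Checks like these only catch obvious mistakes early. The validation that CloudFiles performs after you click Add Query remains the authority on whether a query is accepted.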