Create Dataset
Introduction

The Create Dataset action in CloudFiles Document AI lets you group multiple files into a single logical dataset and process them together using AI. A dataset is a collection of sources (currently files only) that can be versioned, queried, and compared as a whole. This makes it well suited to use cases where insights must be drawn across multiple documents rather than from a single file.

Just as individual documents must be processed before they can be queried, datasets must first be created and processed so that their contents become searchable and queryable through CloudFiles AI.

This action is a foundational step for workflows that query across multiple files, compare data between documents, or ask high-level questions that span an entire dataset using the CloudFiles Document AI query actions.

What This Action Does

The Create Dataset action processes multiple files together as a single dataset, preparing them for cross-document querying within CloudFiles Document AI. During processing, each file in the dataset is analyzed individually.

The action runs asynchronously and does not return output to the flow that calls it. Instead, it publishes a Dataset Created event once all dataset resources have been successfully processed. The event includes the datasetid (a unique CloudFiles identifier), which is required to query the dataset in subsequent CloudFiles Document AI actions.

Example Scenario

Consider a scenario where you need to analyze multiple KYC documents associated with an Account or Contact in Salesforce, such as passports, national identity cards, address proofs, and utility bills. Instead of processing each file individually, you may want to treat all these documents as a single logical unit and ask questions like:

- “Do all documents belong to the same person?”
- “Compare the address across all submitted documents.”
- “Is there any mismatch in nationality or date of birth?”

To enable this process:

1. Create a flow that collects multiple related files (for example, all KYC documents attached to a Contact or uploaded via a CloudFiles widget) and uses the Create Dataset action to group and process them together.
2. Set up another flow triggered by the Dataset Created event, which carries the datasetid and contextual information (such as the originating Salesforce record). This flow can then use the CloudFiles Document AI query actions to ask questions, compare values across files, or extract consolidated insights from the dataset.

Input Parameters

In Flow Builder, search for the CloudFiles Create Dataset action under the CloudFiles category and configure the following inputs.

Context

An optional identifier used to track the source of the event or to carry any other details you need downstream. The value is returned unchanged in the corresponding Dataset Created event.

The context parameter helps identify the origin of a dataset when the Create Dataset action is used across multiple flows. For example, when creating a dataset from documents attached to a Contact or Account, you can pass the record's ID as the context. Because this value is included in the Dataset Created event, downstream flows can associate the dataset with the correct Salesforce record or process.

Name (Required)

A human-readable name for the dataset. It helps identify the dataset in queries, events, and version history.

Resources (Required)

The resources input specifies the files to include in the dataset. This parameter must be an Apex-defined collection of cldfsresource objects; the Create Dataset action accepts only values provided in this collection.

Each cldfsresource represents a single file source. You can include multiple resources in the collection to create a dataset containing multiple files. Even when creating a dataset with a single file, the resource must still be passed as a collection.

The cldfsresource Apex type defines the metadata CloudFiles needs to locate and process a file, such as:

- Library
- File identifiers (such as resource ID, drive ID, etc.)

Description (Optional)

An optional description that provides context about the dataset's purpose or contents.

Output Parameters

The Apex action does not return any output to the flow in which it is used. Instead, a Dataset Created event is published for every dataset processed. This event signals that file processing is complete and can be used to trigger platform event flows that perform post-processing, such as running the CloudFiles Document AI query actions.

If the action fails, an error event is published instead. This event can be used in a Decision element to diagnose and handle the error.
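
The asynchronous pattern described above (the action returns nothing, and a later event carries the datasetid and context) can be modeled in a few lines. The following Python sketch is only an illustration of that pattern and is not CloudFiles or Salesforce code: the names Resource, CreateDatasetRequest, EventBus, and create_dataset, the event dictionary keys, and the use of a random UUID as the datasetid are all assumptions made for the example. In particular, the sketch assumes that invalid input results in an error event; the source only states that an error event is published when the action fails.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Resource:
    """Hypothetical stand-in for one file source (one cldfsresource)."""
    library: str
    resource_id: str
    drive_id: Optional[str] = None


@dataclass
class CreateDatasetRequest:
    """Hypothetical model of the action inputs described above."""
    name: str                          # required
    resources: List[Resource]          # required, always a collection
    context: Optional[str] = None      # optional, echoed back in the event
    description: Optional[str] = None  # optional

    def validate(self) -> None:
        if not self.name or not self.name.strip():
            raise ValueError("name is required")
        if not isinstance(self.resources, list) or not self.resources:
            raise ValueError("resources must be a non-empty collection")


class EventBus:
    """Minimal in-memory publish/subscribe, standing in for platform events."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[Dict], None]] = []

    def subscribe(self, handler: Callable[[Dict], None]) -> None:
        self._handlers.append(handler)

    def publish(self, event: Dict) -> None:
        for handler in self._handlers:
            handler(event)


def create_dataset(request: CreateDatasetRequest, bus: EventBus) -> None:
    """Always returns None; the outcome is delivered as an event on the bus."""
    try:
        request.validate()
    except ValueError as exc:
        # Assumption: a failed request surfaces as an error event.
        bus.publish({"type": "error", "context": request.context,
                     "message": str(exc)})
        return None
    # Assumption: a random UUID plays the role of the datasetid.
    bus.publish({"type": "dataset_created",
                 "datasetid": str(uuid.uuid4()),
                 "context": request.context})
    return None


if __name__ == "__main__":
    bus = EventBus()
    bus.subscribe(lambda event: print("received:", event["type"], event["context"]))
    # A single file is still passed as a one-element collection.
    request = CreateDatasetRequest(
        name="KYC documents",
        resources=[Resource(library="example-library", resource_id="file-001")],
        context="example-contact-id",
    )
    create_dataset(request, bus)
```

The subscriber plays the role of the second, event-triggered flow: it receives the context passed to the first flow, which is how a downstream flow can tie a datasetid back to the originating record.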