Dataset Playground

A dataset in CloudFiles Document AI is a collection of multiple documents grouped together so they can be analyzed by AI as a single unit. Instead of querying each document individually, datasets allow users to run natural language prompts across all documents in the collection at once. Datasets are useful when working with related documents such as contracts, quotes, insurance forms, or reports, where information may need to be extracted, compared, or summarized across multiple files.

Before using the Dataset Playground, a dataset must first be created and processed. This involves defining the dataset and uploading the documents that will be included in it. For detailed steps on creating and configuring datasets, refer to the Datasets article.

After the dataset has been processed, you can test and query it using the AI Playground: navigate to the AI Playground tab, click Create Playground, and select Dataset Playground from the dropdown. During setup, choose the dataset you want to query. Once the playground session is created, all documents within that dataset are loaded and available for querying.

Dataset Playground View

The Dataset Playground interface is designed to help users experiment with prompts and observe how the AI interprets information across an entire dataset. It provides a structured environment where users can review documents, test queries, and validate results before implementing them in automation workflows. The interface includes the following components.

Files Panel

The Files panel displays all documents that belong to the selected dataset. These files are the same documents that were uploaded and processed during dataset configuration. Users can:

- View the complete list of documents included in the dataset
- Select any document to preview its contents
- Switch between documents to better understand the context of the dataset

Although queries run across all documents in the dataset, selecting a specific file allows users to inspect the document content and verify how the AI extracts information from it. This is particularly helpful when refining prompts or troubleshooting unexpected query results.

Document Viewer

The Document Viewer allows users to preview, zoom in on, scroll through, and download the uploaded document for easy reference.

Query Panel

The Query panel allows users to enter natural language prompts to test queries against the selected dataset. Unlike the File Playground, where prompts run on a single document, the Dataset Playground analyzes all documents contained within the dataset and returns results based on the combined information across those files.

Using natural language processing (NLP), the system can interpret prompts and extract relevant information from both printed and handwritten text, including text present in images. Users can run different types of prompts, such as classification, extraction, validation, summarization, decision-making, or calculation queries, to analyze the dataset.

The results are returned as text-based outputs derived from the dataset content. If the system determines the answer with high confidence, it returns the extracted result. If the query is unclear, unrelated to the dataset, or the confidence level is low, the system returns NA instead of a low-confidence answer.

Query History

The Query History section stores and displays previously asked queries along with their extracted answers for quick reference.

Why Use the Dataset Playground

The Dataset Playground is primarily used to test and refine prompts before using them in automation. It allows users to verify how queries behave across multiple documents and ensure that the extracted information is accurate.

Upload multiple documents to the dataset and use the Dataset Playground to run queries across all files in that dataset. For example, you upload a dataset containing multiple KYC documents (such as passports, driver’s licenses, or ID cards) for identity verification. These documents may belong to the same individual or different individuals, and may include variations in name formats, missing fields, or inconsistent details.

Ask queries that analyze information across all documents in the dataset, such as verifying whether all documents belong to the same person based on key identity fields like name and date of birth. For example:

Query

    Instruction: Verify identity consistency across KYC documents.
    Task: Review all documents in the dataset and extract the full name and date of birth of the primary individual from each document. Normalize name formats (e.g., "Doe, John" → "John Doe") and compare across all documents. Determine whether all documents belong to the same person. If not, group documents by matching identities.

Output

    {
      "result": "different individuals",
      "groups": [
        { "name": "John Doe", "documents": ["passport", "driver license"] },
        { "name": "Andrew Sample", "documents": ["driver license"] }
      ]
    }

Once a query works as expected in the playground, the same prompt can be used in Salesforce automation using CloudFiles Document AI flow actions such as Query Document/Dataset and Query Document/Dataset (Batch). This enables automated workflows that analyze and extract information from entire document collections.
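
To make the KYC example query above more concrete, the normalization and grouping it asks for can be sketched locally in Python. This is only an illustration of the logic described in the prompt, useful for sanity-checking a playground result by hand. It is not how CloudFiles Document AI works internally and it does not call any CloudFiles API. The function names, the record keys (`document`, `name`, `dob`), the dates of birth, and the "same individual" label are all assumptions made for this sketch.

```python
import json
from collections import defaultdict


def normalize_name(raw: str) -> str:
    """Turn 'Doe, John' into 'John Doe' and tidy spacing and capitalization.

    Capitalization is naive (str.title), so names like 'McDonald' would need
    special handling in real use.
    """
    raw = " ".join(raw.split())  # collapse repeated whitespace
    if "," in raw:
        last, first = (part.strip() for part in raw.split(",", 1))
        raw = f"{first} {last}"
    return raw.title()


def group_by_identity(records):
    """Group documents whose normalized name and date of birth both match.

    `records` is a non-empty list of dicts with the hypothetical keys
    'document', 'name', and 'dob'.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (normalize_name(rec["name"]), rec["dob"])
        groups[key].append(rec["document"])

    result = "same individual" if len(groups) == 1 else "different individuals"
    return {
        "result": result,
        "groups": [
            {"name": name, "documents": docs}
            for (name, _dob), docs in groups.items()
        ],
    }


# Hypothetical values, as if extracted from three KYC documents.
records = [
    {"document": "passport", "name": "Doe, John", "dob": "1990-01-15"},
    {"document": "driver license", "name": "john doe", "dob": "1990-01-15"},
    {"document": "driver license", "name": "Andrew Sample", "dob": "1985-07-02"},
]

print(json.dumps(group_by_identity(records), indent=2))
```

With these sample records, the two John Doe documents fall into one group and the Andrew Sample document into another, which matches the shape of the example output. Comparing a playground answer against a small hand-built check like this is one way to confirm that a prompt behaves as intended before reusing it in a flow.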