Sitecore Search is a powerful tool for managing content, and I recently explored how it works with a Push Source. Along the way I ran into a challenge: deleting specific types of documents in bulk. Surprisingly, Sitecore Search does not provide a direct way to do this, so I set out to build a workaround.
Document Deletion in Sitecore Search
When it comes to deleting documents, Sitecore Search offers two primary methods:
Sitecore Search Customer Engagement Console (CEC): This is a user-friendly interface; however, it does not support deleting multiple documents at once.
Sitecore Ingestion API: This API provides more flexibility but is limited to deleting one document at a time.
To delete a document using the Ingestion API, send a DELETE request to the following endpoint:
{base-URL}/ingestion/v1/domains/{domainID}/sources/{sourceID}/entities/{entityID}/documents/{documentID}?locale={locale}
While this API is effective for single-document deletion, it lacks the ability to handle multiple document IDs in a single request. Consequently, if you need to delete multiple documents, you must call the API repeatedly for each document ID.
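As a minimal sketch of a single-document delete, the snippet below fills in the endpoint template shown above. The helper name Get-DeleteDocumentUri and all ID values are hypothetical placeholders, not part of any Sitecore tooling, and the actual DELETE call is left commented out because it permanently removes the document.

```powershell
# Hypothetical helper: fills in the endpoint template from this post.
function Get-DeleteDocumentUri {
    param(
        [Parameter(Mandatory = $true)][string]$BaseUrl,
        [Parameter(Mandatory = $true)][string]$DomainId,
        [Parameter(Mandatory = $true)][string]$SourceId,
        [Parameter(Mandatory = $true)][string]$EntityId,
        [Parameter(Mandatory = $true)][string]$DocumentId,
        [string]$Locale = 'en_us'
    )

    # Escape the values so unusual characters in an ID cannot break the URL.
    $doc = [uri]::EscapeDataString($DocumentId)
    $loc = [uri]::EscapeDataString($Locale)

    return "$BaseUrl/ingestion/v1/domains/$DomainId/sources/$SourceId/entities/$EntityId/documents/${doc}?locale=$loc"
}

# Placeholder values; replace them with your own domain, source, entity, and document IDs.
$uri = Get-DeleteDocumentUri -BaseUrl 'https://discover.sitecorecloud.io' `
    -DomainId '123456' -SourceId '654321' -EntityId 'product' -DocumentId 'doc-001'
Write-Host $uri

# The actual call (commented out on purpose; it deletes the document):
# Invoke-RestMethod -Uri $uri -Method Delete -Headers @{ Authorization = '<your API key>'; Accept = 'application/json' }
```

Keeping the URL construction in one small function makes it easy to check the request target before any delete is sent.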
Challenges in Bulk Deletion
The first hurdle is obtaining the document IDs. To accomplish this, you'll need to use the API to fetch all relevant document IDs and then pass them one-by-one to the delete endpoint. This repetitive process can be tedious and time-consuming, especially when working with large sets of documents.
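To get a feel for how much work this involves, here is a rough back-of-the-envelope sketch. It assumes the page size of 100 and the 500 ms pause between deletes that the scripts later in this post use; the function name Get-BulkDeletionEstimate is hypothetical, and the result ignores network latency, so treat it as a lower bound.

```powershell
# Hypothetical helper: estimates the number of API calls needed for a bulk deletion.
function Get-BulkDeletionEstimate {
    param(
        [Parameter(Mandatory = $true)][int]$TotalDocuments,
        [int]$PageSize = 100,           # documents returned per search request
        [int]$DelayMilliseconds = 500   # pause between delete requests
    )

    # One search request per page, rounded up to cover a partial last page.
    $fetchCalls = [int][math]::Ceiling([double]$TotalDocuments / $PageSize)

    [pscustomobject]@{
        FetchCalls          = $fetchCalls
        DeleteCalls         = $TotalDocuments
        MinimumDelaySeconds = ($TotalDocuments * $DelayMilliseconds) / 1000
    }
}

$estimate = Get-BulkDeletionEstimate -TotalDocuments 2500
$estimate | Format-List
```

For 2,500 documents this works out to 25 search requests, 2,500 delete requests, and at least 1,250 seconds (about 21 minutes) spent in pauses alone.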
A Practical Solution: Automating Bulk Deletion with PowerShell
To address this issue, I developed a PowerShell script to automate the bulk deletion process. This script simplifies the task by programmatically fetching document IDs and calling the delete API for each ID.
Here’s how the PowerShell script can help:
Fetch document IDs using the appropriate API calls.
Loop through the document IDs and invoke the delete API for each one automatically.
Provide flexibility for customization to meet specific requirements.
Fetch IDs from Search:
# Define the API URL and headers for the Sitecore Search API
# - $apiUrl: Specifies the API endpoint for retrieving document IDs.
# - $headers: Contains the necessary headers, including domain ID, content type,
#   and authorization token, for making authenticated requests to the API.
$apiUrl = 'https://discover.sitecorecloud.io/discover/v2/{domainid}' # Replace {domainid} with your actual domain ID
$headers = @{
    'rfk.domainId'  = 'rfkid_7' # Replace with your actual domain ID
    'Content-Type'  = 'application/json'
    'Accept'        = 'application/json'
    'Authorization' = '' # Insert your actual token
}

# Specify the output file where all document IDs will be saved in JSON format
$outputFile = 'delete_document.json'

# Initialize variables for paginated API requests
# - $offset: Tracks the starting point for each batch of results (pagination).
# - $limit: Specifies the maximum number of items to retrieve per API call.
# - $totalItems: Will store the total number of items available from the API.
# - $allContent: An array to accumulate all retrieved document details (e.g., IDs).
$offset = 0
$limit = 100
$totalItems = $null
$allContent = @() # Array to store all extracted content

do {
    # Prepare the JSON request payload for fetching document IDs
    # - $body: Constructs the API request body with search parameters, such as entity,
    #   fields to extract ('id'), batch size ($limit), and starting point ($offset).
    $body = @{
        widget = @{
            items = @(@{
                entity  = 'product'        # Specify the entity type to search for (e.g., 'product').
                rfk_id  = 'product_search' # Replace with your actual RFK ID
                search  = @{
                    content = @{
                        fields = @('id') # Specify which fields to retrieve (e.g., 'id').
                    }
                    limit   = $limit  # Number of items to retrieve in each batch.
                    offset  = $offset # Starting point for the current batch.
                }
                sources = @('') # Replace '' with the source ID of the data source to target.
            })
        }
    } | ConvertTo-Json -Depth 10 -Compress # Convert the request body to a JSON string.

    # Make the API call using the POST method and capture the response
    # - $response: Contains the API's response, including document IDs and pagination info.
    $response = Invoke-RestMethod -Uri $apiUrl -Headers $headers -Method Post -Body $body

    # Extract relevant data from the API response
    # - $content: Holds the list of document details (e.g., IDs) for the current batch.
    # - $totalItems: Indicates the total number of items available for retrieval.
    $content = $response.widgets[0].content
    $totalItems = $response.widgets[0].total_item

    # Append the retrieved document details to the $allContent array
    # This ensures all document details across multiple batches are stored.
    foreach ($item in $content) {
        $allContent += @{
            id        = $item.id        # Document ID.
            source_id = $item.source_id # Source ID of the document.
        }
    }

    # Log progress by displaying the current offset being processed
    Write-Host "Processed offset $offset"

    # Increment the offset to fetch the next batch of results in the subsequent API call
    $offset += $limit
} while ($offset -lt $totalItems) # Continue fetching data until all items are retrieved.

# Write the collected document details to the specified JSON file
# - Passing the array via -InputObject keeps the output a JSON array even when it holds a single item.
ConvertTo-Json -InputObject $allContent -Depth 10 | Out-File -FilePath $outputFile -Encoding utf8

# Log completion message indicating where the output file is stored
Write-Host "Processing complete. The data is saved in '$outputFile'."
The code above fetches all the IDs from the Search entity. If you want to filter by type or any other attribute, you can use search result filters.
Delete IDs from the output of the above script:
This script reads the IDs from the delete_document.json file, iterates through them, and deletes each document from Search.
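As a rough illustration of such a filter, the sketch below adds a filter object to the same request body. The filter shape (type, name, value), the attribute name 'type', and the value 'accessory' are assumptions for illustration only; check the Sitecore Search API documentation for the exact filter syntax your domain supports.

```powershell
# Same request body as in the fetch script, with an assumed equality filter added.
$limit = 100
$offset = 0

$body = @{
    widget = @{
        items = @(@{
            entity = 'product'        # Entity to search, as in the fetch script
            rfk_id = 'product_search' # Replace with your actual RFK ID
            search = @{
                content = @{ fields = @('id') }
                # Assumption: an "eq" filter on a hypothetical 'type' attribute.
                filter  = @{
                    type  = 'eq'
                    name  = 'type'
                    value = 'accessory'
                }
                limit   = $limit
                offset  = $offset
            }
        })
    }
} | ConvertTo-Json -Depth 10 -Compress

Write-Host $body
```

With a filter in place, only the matching documents end up in delete_document.json, so the deletion script never sees the rest.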
# Define script parameters to customize bulk document deletion
# - $JsonFilePath: Mandatory parameter specifying the path to the input JSON file with document IDs.
# - $LogFilePath: Optional parameter specifying the path for the log file (default: '.\bulk_deletion.log').
# - $ApiDomain: Optional parameter specifying the Sitecore Search domain ID (default: empty; supply your own).
# - $ApiSource: Optional parameter specifying the source ID (default: empty). It is not used below,
#   because each document's source ID is read from the JSON file.
# - $ApiToken: Optional parameter specifying the authentication token for API access (default: empty; supply your own).
# - $Locale: Optional parameter specifying the locale (default: "en_us").
param(
    [Parameter(Mandatory = $true)]
    [string]$JsonFilePath,

    [Parameter(Mandatory = $false)]
    [string]$LogFilePath = ".\bulk_deletion.log",

    [Parameter(Mandatory = $false)]
    [string]$ApiDomain = "",

    [Parameter(Mandatory = $false)]
    [string]$ApiSource = "",

    [Parameter(Mandatory = $false)]
    [string]$ApiToken = "",

    [Parameter(Mandatory = $false)]
    [string]$Locale = "en_us"
)
# Function to log messages with timestamps
# - $Message: The log message to write both to the console and the log file.
function Write-Log {
    param($Message)
    $timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
    "$timestamp - $Message" | Add-Content -Path $LogFilePath
    Write-Host $Message
}
# Function to delete a single Sitecore Search document using the API
# - $DocumentId: Mandatory parameter specifying the document ID to delete.
# - $SourceId: Mandatory parameter specifying the source ID of the document.
function Remove-SitecoreSearchDocument {
    param(
        [Parameter(Mandatory = $true)]
        [string]$DocumentId,

        [Parameter(Mandatory = $true)]
        [string]$SourceId
    )

    # Construct the API URL using dynamic parameters.
    # Note: 'map' is the entity in this example; replace it with the entity that holds
    # your documents (the fetch script above queries 'product').
    $apiUrl = "https://discover.sitecorecloud.io/ingestion/v1/domains/$ApiDomain/sources/$SourceId/entities/map/documents/$DocumentId`?locale=$Locale"
    $headers = @{
        'Accept'        = 'application/json'
        'Authorization' = $ApiToken
    }

    Write-Log "Making DELETE API call to: $apiUrl"

    try {
        # Invoke the REST API DELETE request
        $response = Invoke-RestMethod -Uri $apiUrl -Method Delete -Headers $headers
        Write-Log "Document successfully deleted: $DocumentId"
        return @{
            success    = $true
            documentId = $DocumentId
            sourceId   = $SourceId
        }
    }
    catch {
        # Keep a reference to the original error so the nested try/catch below cannot replace it
        $errorRecord = $_

        # Log error details for unsuccessful deletion attempts
        Write-Log "Error deleting document: $DocumentId"
        Write-Log "Error details:"
        Write-Log $errorRecord.Exception.Message

        if ($errorRecord.ErrorDetails) {
            Write-Log "Response body:"
            Write-Log $errorRecord.ErrorDetails.Message
        }
        elseif ($errorRecord.Exception.Response) {
            Write-Log "Response body:"
            try {
                $rawResponse = $errorRecord.Exception.Response.Content.ReadAsStringAsync().Result
                Write-Log $rawResponse
            }
            catch {
                Write-Log "Unable to read error response content: $_"
            }
        }

        return @{
            success    = $false
            documentId = $DocumentId
            sourceId   = $SourceId
            error      = $errorRecord.Exception.Message
        }
    }
}
# Ensure the input file exists; exit if not found
if (-not (Test-Path $JsonFilePath)) {
    Write-Log "Error: Input file not found at path: $JsonFilePath"
    exit 1
}

# Create or clear the log file
if (Test-Path $LogFilePath) {
    Clear-Content $LogFilePath
}
else {
    New-Item -Path $LogFilePath -ItemType File -Force | Out-Null
}

Write-Log "Starting bulk document deletion process..."
Write-Log "Reading JSON file: $JsonFilePath"

try {
    # Read and parse the JSON file containing document IDs
    $jsonContent = Get-Content $JsonFilePath -Raw | ConvertFrom-Json
}
catch {
    # Log and exit if JSON file parsing fails
    Write-Log "Error: Failed to parse JSON file: $_"
    exit 1
}
# Initialize counters for tracking the deletion process
# Wrapping in @() keeps Count reliable even when the file holds a single document
$totalDocuments = @($jsonContent).Count
$processedCount = 0
$successCount = 0
$failureCount = 0

Write-Log "Found $totalDocuments documents to delete"

foreach ($doc in $jsonContent) {
    $processedCount++
    $documentId = $doc.id
    $sourceId = $doc.source_id

    Write-Log "Processing document $processedCount of $totalDocuments (ID: $documentId, Source: $sourceId)"

    # Call the function to delete the document and capture the result
    $result = Remove-SitecoreSearchDocument -DocumentId $documentId -SourceId $sourceId

    if ($result.success) {
        $successCount++
        Write-Log "Success - Document ID: $documentId deleted"
    }
    else {
        $failureCount++
        Write-Log "Error - Document ID: $documentId - Error: $($result.error)"
    }

    # Add a small delay to prevent overwhelming the API
    Start-Sleep -Milliseconds 500
}
Write-Log "Deletion completed:"
Write-Log "Total documents processed: $totalDocuments"
Write-Log "Successfully deleted: $successCount"
Write-Log "Failed deletions: $failureCount"

# Output summary to console
Write-Host ""
Write-Host "Deletion Summary:" -ForegroundColor Cyan
Write-Host "Total documents processed: $totalDocuments" -ForegroundColor White
Write-Host "Successfully deleted: $successCount" -ForegroundColor Green
Write-Host "Failed deletions: $failureCount" -ForegroundColor Red
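Because deletion cannot be undone, it may be worth previewing the input file before running the script. The sketch below is one possible sanity check, not part of the scripts above: it counts the documents per source ID. The sample file, its contents, and the script file name in the final comment are all hypothetical.

```powershell
# Create a small sample file so this sketch runs on its own.
# In practice, point $jsonPath at the real delete_document.json instead.
$jsonPath = Join-Path ([System.IO.Path]::GetTempPath()) 'delete_document_sample.json'
'[{"id":"doc-001","source_id":"111"},{"id":"doc-002","source_id":"111"},{"id":"doc-003","source_id":"222"}]' |
    Set-Content -Path $jsonPath -Encoding utf8

# Read the file the same way the deletion script does.
$documents = Get-Content $jsonPath -Raw | ConvertFrom-Json
$documents = @($documents) # Always treat the result as an array

# Count the documents per source so nothing unexpected gets deleted.
$summary = $documents | Group-Object -Property source_id | Select-Object Name, Count
$summary | Format-Table -AutoSize

Write-Host "Documents queued for deletion: $($documents.Count)"

# If the numbers look right, the deletion script could be invoked like this
# (hypothetical file name; supply your own domain ID and token):
# .\Remove-SearchDocuments.ps1 -JsonFilePath .\delete_document.json -ApiDomain '<domain ID>' -ApiToken '<token>'
```

A quick look at the per-source counts is a cheap way to catch a wrong filter or a wrong source before any delete request is sent.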
This approach removes most of the manual effort from bulk deletion tasks.
It works around the current limitation in Sitecore Search, but it also highlights the need for a dedicated bulk deletion API, which would simplify workflows for developers working with Push Sources.
If you run into a similar issue, you can adapt these scripts to your own setup. Until Sitecore provides a native bulk deletion feature, they offer a practical workaround.