Exporting Salesforce Files (aka ContentDocument)

Last week a client asked me to help out, we had been creating a system that creates PDF files in Salesforce using Drawloop (today known as Nintex Document Generation which is a boring name).

Anyways, we had about 2000 PDF created in the system and after looking into it there doesn’t seem to be a way to download them in bulk. Sure you can use the Dataloader and download them but you’ll get the content in a CSV column and that doesn’t really fly with most customers.

I tried dataloader.io, Realfire and search through every link on Google or at least the first 2 pages and I didn’t find a good way of doing it.

There seems to be an old AppExchange listing for FileExporter by Salesforce Labs and I think this is the actual software FileExporter and it stopped working with the TLS 1.0 deprecation.

Enough of small talk, I had to solve the problem so I went ahead and created a very simple Python script that lets you specify the query to find your ContentVersion objects and also filter the ContentDocuments if you need to ignore some ids.

My very specific use case was that I was to export all PDF files with a certain pattern in the filename but only those that were related to a custom object that had a certain status. Given that you can’t do certain queries like this one:

SELECT ContentDocumentId, Title, VersionData, CreatedDate FROM ContentVersion WHERE ContentDocumentId IN (
SELECT ContentDocumentId FROM ContentDocumentLink where LinkedEntityId IN (SELECT Id FROM Custom_Object__c))

It gives you a:

Entity 'ContentDocumentLink' is not supported for semi join inner selects

I had to implement the option for the second query which gives a list of valid ContentDocumentIds to include in the download.

The code is at https://github.com/snorf/salesforce-files-download, feel free to try it out and let me know if it works or doesn’t work out for you.

One more thing, keep in mind that even if you’re an administration with View All you will not see ContentDocuments that doesn’t belong to you or are explicitly shared with you. You’ll need to either change the ownership of the affected files or share them with the user running the Python script.


Uploading CSV data to Einstein Analytics with AWS Lambda (Python)

I have been playing around with Einstein Analytics (the thing they used to call Wave) and I wanted to automate the upload of data since there’s no reason on having dashboards and lenses if the data is stale.

After using Lambda functions against the Bulk API I wanted to have something similar and I found another nice project over at Heroku’s GitHub account called pyAnalyticsCloud

I don’t have a Postgres Database so I ended up using only the uploader.py file and wrote this Lambda function to use it:

from __future__ import print_function

import json
from base64 import b64decode
import boto3
import uuid
import os
import logging
import unicodecsv
from uploader import AnalyticsCloudUploader

logger = logging.getLogger()

s3_client = boto3.client('s3')
username = os.environ['SF_USERNAME']
encrypted_password = os.environ['SF_PASSWORD']
encrypted_security_token = os.environ['SF_SECURITYTOKEN']
password = boto3.client('kms').decrypt(CiphertextBlob=b64decode(encrypted_password))['Plaintext'].decode('ascii')
security_token = boto3.client('kms').decrypt(CiphertextBlob=b64decode(encrypted_security_token))['Plaintext'].decode('ascii')
file_bucket = os.environ['FILE_BUCKET']
wsdl_file_key = os.environ['WSDL_FILE_KEY']
metadata_file_key = os.environ['METADATA_FILE_KEY']

def bulk_upload(csv_path, wsdl_file_path, metadata_file_path):
    with open(csv_path, mode='r') as csv_file:
        logger.info('Initiating Wave Data upload.')
        logger.debug('Loading metadata')
        metadata = json.loads(open(metadata_file_path, 'r').read())

        logger.debug('Loading CSV data')
        data = unicodecsv.reader(csv_file)
        edgemart = metadata['objects'][0]['name']

        logger.debug('Creating uploader')
        uploader = AnalyticsCloudUploader(metadata, data)
        logger.debug('Logging in to Wave')
        uploader.login(wsdl_file_path, username, password, security_token)
        logger.debug('Uploading data')
        logger.info('Wave Data uploaded.')
        return 'OK'

def handler(event, context):
    for record in event['Records']:
        # Incoming CSV file
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        csv_path = '/tmp/{}{}'.format(uuid.uuid4(), key)
        s3_client.download_file(bucket, key, csv_path)

        # WSDL file
        wsdl_file_path = '/tmp/{}{}'.format(uuid.uuid4(), wsdl_file_key)
        s3_client.download_file(file_bucket, wsdl_file_key, wsdl_file_path)

        # Metadata file
        metadata_file_path = '/tmp/{}{}'.format(uuid.uuid4(), metadata_file_key)
        s3_client.download_file(file_bucket, metadata_file_key, metadata_file_path)
        return bulk_upload(csv_path, wsdl_file_path, metadata_file_path)

Yes the logging is a bit on the extensive side and make sure to add these environment variables in AWS Lambda:

SF_USERNAME - your SF username
SF_PASSWORD - your SF password (encrypted)
SF_SECURITYTOKEN - your SF security token (encrypted)
FILE_BUCKET- the bucket in where to find the mapping file
METADATA_FILE_KEY- the path to the metadata file in that bucket (you get this from Einstein Analytics)
WSDL_FILE_KEY - the path to the wsdl partner file in the bucket

I added an S3 trigger that runs this function as soon as a new file is uploaded. It has some issues (crashing with parenthesis in the file name for example) so please don’t use this for a production workload before making it enterprise grade.

Note: The code above only works in Python 2.7