Archive folders based on the modification date of their files using Archiware P5

Recently I was asked if P5 could automatically archive an entire folder based not on the modification of the folder itself, but instead on the modification time of the contents of the folder (files and subdirectories). The goal was to enable a “cold folder” automation that can automatically clean entire projects from storage based on them going stale and not being used anymore. Using a little python and the P5 API, I’ve created a script that does just that.

The main magic here that is happening that can’t be accomplished out-of-box with P5 is to actually make a decision on the which directory to archive based on the modification date of each and every file in all of its subdirectories. In practice, using this script would be pointed at the top level of a “Projects” directory on your shared storage and archive the entire single subfolder if every file in that subfolder had not been touched in t days.

So in the image above, I have 3 projects:

  • 1234_Project_Customer_ABC
  • 1235_Project_Customer_DEF
  • 1236_Project_Customer_GHI

I’m going to look at the individual files inside each of these parent directories, and if every file (not directory/folder) has not been touched in the amount of days I’ve defined at run time, I’ll add the entire directory to the archive job. If I set t to 365 days, the script would archive the entire directories 1234_Project_Customer_ABC and 1235_Project_Customer_DEF as nothing in their entire tree has been modified in a year. However, we would skip 1236_Project_Customer_GHI as we can see there is an item (as of this writing) that is under 365 days old, the file iconik-isg-handler.zip.

The P5 filter toolkit can handle archiving individual files or folders/directories/paths based on their mod time, but what it does not do is check the contents of those folders/directories for updates. That is the goal of what this script accomplishes, and it does some rather simply. There are three main functions at play in the script.

def check_folder_is_archivable(folder):
    # set a counter to zero for our files, increment if a file doesn't meet 
    # aging rules
    n=0
    for this_file in get_all_files(os.path.join(source_directory,folder)):
        if check_mtime(this_file):
            pass
        else:
            n+=1
 
    # check if all files in a folder are good to go
    if n == 0:
        logging.info("Folder " + folder + " meet aging requirements")
        return True
    else:
        logging.info(str(n) + " files found that don't meet aging in " + folder +", skipping")
        return False

The above function checks the contents of each project folder to see if every single file meets the aging requirements. It simply sets a counter at 0 and if any single file in a directory is older than the defined time in days, it increments. Once all file are checked, it returns a simple True or False to the calling function. Next we have the actual P5 function to create an archive job.

def archive_folders(folders):
    archive_selection = p5_api_call(cmd,['ArchiveSelection','create','localhost',str(plan)]).rstrip()
    if archive_selection == "":
        logging.error("Could not create archive selection: " + p5_api_call(cmd,['geterror']).rstrip() + ". Exiting.")
        exit(1)
    else:
        logging.info("Successfully created archive selection " + archive_selection)
        for folder in folders:
            this_handle = p5_api_call(cmd,['ArchiveSelection',archive_selection,"adddirectory","{" + source_directory + "/" + folder + "}"]).rstrip()
            if this_handle == "":
                logging.error("Could not add directory " + folder + " to archive selection, skipping")
                logging.error(p5_api_call(cmd,['geterror']).rstrip())
            else:
                logging.info("Successfully added directory " + folder + " with handle " + this_handle)
        job_number = p5_api_call(cmd,['ArchiveSelection',archive_selection,'submit','now']).rstrip()
        if job_number == "":
            logging.error("Could not submit job: " + p5_api_call(cmd,['geterror']).rstrip() + ". Exiting.")
            exit(1)
        else:
            logging.info("Successfully submitted " + str(len(folders)) + " folders with P5 job number " + job_number)

This function just does three primary things:

  • Creates a new Archive Selection
  • Loops through all folders that meet requirements and submits them with a single adddirectory call.
  • Submits the job

It also checks for each call and logs any errors that occur during this process to the log.

The other functions used are just time savers and fairly self explanatory. The final bit of logic for the script happens in this chunk:

############################################
#here is where we actually start the script#
############################################
 
#log if we are using a dry run or not
if args.dry_run is False:
    logging.info("Starting Script.")
else:
    logging.info("Starting Script in dry run mode.")
 
# make sure nsdchat exists
if not os.path.isfile(nsdchat):
	logging.error("Could not find P5 CLI at " + str(aw_path) + "/bin/nsdchat, exiting")
	sys.exit(1)
 
# make sure source directory exists exists
if not os.path.isdir(source_directory):
    logging.error("Could not source directory at " + str(source_directory) + ", exiting")
    sys.exit(1) 
 
#get all our "project" folders
my_subs = get_all_subdirs(source_directory)
 
#check each one for aging and add it to a list if it meets aging requirements
folders_ready = []
for this_sub in my_subs:
    if check_folder_is_archivable(this_sub):
        logging.info("Folder " + this_sub + " meets requirements, adding to queue.")
        folders_ready.append(this_sub)
    else:
        logging.info("Folder " + this_sub + " does not meet requirements, skipping.")        
 
#create an archive job for all folders that meet requirements
#check if we are in dry run mode first
if args.dry_run is False:
    if len(folders_ready) > 0:
        archive_folders(folders_ready)
    else:
        logging.info("No folders meet requirements.")
 
#log that the script completed successully.
logging.info("Script complete.  Exiting.")

This block really only does a few things. It validates that the folder being asked for exists before doing anything at all. It then validates that P5’s CLI tool is accessible. It then gets all the project folders and checks each one for validity based on our timing rules. Lastly, it submits all valid directories to be archived.

I’ve made the entire source code and docs to the script available on GitHub. Feel free to use as you like or suggest any changes (or even update/fix).

https://github.com/szumlins/p5_project_archiver

There are a few restrictions with the script right now that I’m aware of. The main one being that it only supports being run on the P5 server itself and the data must also be visible to localhost. This can be worked around relatively easily in the script, but it also would create opportunities for users to successfully submit jobs that would fail.

Normally I’d suggest running the script as a cron or launchd job periodically. All of the CLI flags are visible in the README.md file on GitHub.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.