{"id":260,"date":"2019-02-19T12:48:24","date_gmt":"2019-02-19T17:48:24","guid":{"rendered":"http:\/\/provideotech.org\/?p=260"},"modified":"2019-02-19T12:48:30","modified_gmt":"2019-02-19T17:48:30","slug":"archive-folders-based-on-the-modification-date-of-their-files-using-archiware-p5","status":"publish","type":"post","link":"https:\/\/provideotech.org\/?p=260","title":{"rendered":"Archive folders based on the modification date of their files using Archiware P5"},"content":{"rendered":"<p>Recently I was asked if P5 could automatically archive an entire folder based not on the modification of the folder itself, but instead on the modification time of the contents of the folder (files and subdirectories). The goal was to enable a &#8220;cold folder&#8221; automation that can automatically clean entire projects from storage based on them going stale and not being used anymore. Using a little python and the P5 API, I&#8217;ve created a <a href=\"https:\/\/www.github.com\/szumlins\/p5_project_archiver\">script<\/a> that does just that.<br><!--more--><\/p>\n\n\n<p>The main magic here that is happening that can&#8217;t be accomplished out-of-box with P5 is to actually make a decision on the which <strong>directory<\/strong> to archive based on the modification date of each and every <strong>file<\/strong> in all of its subdirectories.  
In practice, this script would be pointed at the top level of a &#8220;Projects&#8221; directory on your shared storage and would archive an entire subfolder only if every file in that subfolder had not been touched in <em>t<\/em> days.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"2182\" height=\"562\" data-attachment-id=\"268\" data-permalink=\"https:\/\/provideotech.org\/?attachment_id=268\" data-orig-file=\"https:\/\/i0.wp.com\/provideotech.org\/wp-content\/uploads\/2019\/02\/Screen-Shot-2019-02-19-at-12.16.59-PM.png?fit=2182%2C562&amp;ssl=1\" data-orig-size=\"2182,562\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Screen Shot 2019-02-19 at 12.16.59 PM\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/provideotech.org\/wp-content\/uploads\/2019\/02\/Screen-Shot-2019-02-19-at-12.16.59-PM.png?fit=720%2C186&amp;ssl=1\" src=\"https:\/\/i2.wp.com\/provideotech.org\/wp-content\/uploads\/2019\/02\/Screen-Shot-2019-02-19-at-12.16.59-PM.png?fit=720%2C186\" alt=\"\" class=\"wp-image-268\" srcset=\"https:\/\/i0.wp.com\/provideotech.org\/wp-content\/uploads\/2019\/02\/Screen-Shot-2019-02-19-at-12.16.59-PM.png?w=2182&amp;ssl=1 2182w, https:\/\/i0.wp.com\/provideotech.org\/wp-content\/uploads\/2019\/02\/Screen-Shot-2019-02-19-at-12.16.59-PM.png?resize=300%2C77&amp;ssl=1 300w, https:\/\/i0.wp.com\/provideotech.org\/wp-content\/uploads\/2019\/02\/Screen-Shot-2019-02-19-at-12.16.59-PM.png?resize=768%2C198&amp;ssl=1 768w, 
https:\/\/i0.wp.com\/provideotech.org\/wp-content\/uploads\/2019\/02\/Screen-Shot-2019-02-19-at-12.16.59-PM.png?resize=1024%2C264&amp;ssl=1 1024w, https:\/\/i0.wp.com\/provideotech.org\/wp-content\/uploads\/2019\/02\/Screen-Shot-2019-02-19-at-12.16.59-PM.png?w=1440&amp;ssl=1 1440w\" sizes=\"auto, (max-width: 720px) 100vw, 720px\" \/><\/figure>\n\n\n\n<p>So in the image above, I have 3 projects:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>1234_Project_Customer_ABC<\/li><li>1235_Project_Customer_DEF<\/li><li>1236_Project_Customer_GHI<\/li><\/ul>\n\n\n\n<p>I&#8217;m going to look at the individual files inside each of these parent directories, and if every file (not directory\/folder) has not been touched in the number of days I&#8217;ve defined at run time, I&#8217;ll add the entire directory to the archive job.  If I set <em>t<\/em> to 365 days, the script would archive 1234_Project_Customer_ABC and 1235_Project_Customer_DEF in their entirety, as nothing in either tree has been modified in a year.  However, it would skip 1236_Project_Customer_GHI, as we can see there is an item (as of this writing) that is under 365 days old: the file <strong>iconik-isg-handler.zip<\/strong>.  <\/p>\n\n\n\n<p>The P5 filter toolkit can handle archiving individual files or folders\/directories\/paths based on their mod time, but what it does not do is check the contents of those folders\/directories for updates.  That is exactly what this script accomplishes, and it does so rather simply.  
There are three main functions at play in the script.<\/p>\n\n\n\n<pre lang=\"python\">def check_folder_is_archivable(folder):\n    # set a counter to zero for our files, increment if a file doesn't meet\n    # aging rules\n    n = 0\n    for this_file in get_all_files(os.path.join(source_directory, folder)):\n        if check_mtime(this_file):\n            pass\n        else:\n            n += 1\n\n    # check if all files in a folder are good to go\n    if n == 0:\n        logging.info(\"Folder \" + folder + \" meets aging requirements\")\n        return True\n    else:\n        logging.info(str(n) + \" files found that don't meet aging in \" + folder + \", skipping\")\n        return False\n<\/pre>\n\n\n\n<p>The above function checks the <strong>contents<\/strong> of each project folder to see if every single file meets the aging requirements.  It starts a counter at 0 and increments it for every file that has been modified more recently than the defined number of days.  Once all files are checked, it returns a simple <em>True<\/em> or <em>False<\/em> to the calling function.  Next we have the actual P5 function that creates an archive job.<\/p>\n\n\n\n<pre lang=\"python\">def archive_folders(folders):\n    archive_selection = p5_api_call(cmd,['ArchiveSelection','create','localhost',str(plan)]).rstrip()\n    if archive_selection == \"\":\n        logging.error(\"Could not create archive selection: \" + p5_api_call(cmd,['geterror']).rstrip() + \". 
Exiting.\")\n        exit(1)\n    else:\n        logging.info(\"Successfully created archive selection \" + archive_selection)\n        for folder in folders:\n            this_handle = p5_api_call(cmd,['ArchiveSelection',archive_selection,\"adddirectory\",\"{\" + source_directory + \"\/\" + folder + \"}\"]).rstrip()\n            if this_handle == \"\":\n                logging.error(\"Could not add directory \" + folder + \" to archive selection, skipping\")\n                logging.error(p5_api_call(cmd,['geterror']).rstrip())\n            else:\n                logging.info(\"Successfully added directory \" + folder + \" with handle \" + this_handle)\n        job_number = p5_api_call(cmd,['ArchiveSelection',archive_selection,'submit','now']).rstrip()\n        if job_number == \"\":\n            logging.error(\"Could not submit job: \" + p5_api_call(cmd,['geterror']).rstrip() + \". Exiting.\")\n            exit(1)\n        else:\n            logging.info(\"Successfully submitted \" + str(len(folders)) + \" folders with P5 job number \" + job_number)\n<\/pre>\n\n\n\n<p>This function just does three primary things:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Creates a new Archive Selection<\/li><li>Loops through all folders that meet requirements and submits them with a single <strong>adddirectory<\/strong> call.<\/li><li>Submits the job<\/li><\/ul>\n\n\n\n<p>It also checks for each call and logs any errors that occur during this process to the log.<\/p>\n\n\n\n<p>The other functions used are just time savers and fairly self explanatory.  
The final bit of logic for the script happens in this chunk:<\/p>\n\n\n\n<pre lang=\"python\">\n############################################\n#here is where we actually start the script#\n############################################\n\n# log if we are using a dry run or not\nif args.dry_run is False:\n    logging.info(\"Starting Script.\")\nelse:\n    logging.info(\"Starting Script in dry run mode.\")\n\n# make sure nsdchat exists\nif not os.path.isfile(nsdchat):\n    logging.error(\"Could not find P5 CLI at \" + str(aw_path) + \"\/bin\/nsdchat, exiting\")\n    sys.exit(1)\n\n# make sure source directory exists\nif not os.path.isdir(source_directory):\n    logging.error(\"Could not find source directory at \" + str(source_directory) + \", exiting\")\n    sys.exit(1)\n\n# get all our \"project\" folders\nmy_subs = get_all_subdirs(source_directory)\n\n# check each one for aging and add it to a list if it meets aging requirements\nfolders_ready = []\nfor this_sub in my_subs:\n    if check_folder_is_archivable(this_sub):\n        logging.info(\"Folder \" + this_sub + \" meets requirements, adding to queue.\")\n        folders_ready.append(this_sub)\n    else:\n        logging.info(\"Folder \" + this_sub + \" does not meet requirements, skipping.\")\n\n# create an archive job for all folders that meet requirements\n# check if we are in dry run mode first\nif args.dry_run is False:\n    if len(folders_ready) > 0:\n        archive_folders(folders_ready)\n    else:\n        logging.info(\"No folders meet requirements.\")\n\n# log that the script completed successfully\nlogging.info(\"Script complete.  Exiting.\")\n<\/pre>\n\n\n\n<p>This block really only does a few things.  It first validates that P5&#8217;s CLI tool is accessible, then that the source directory exists, before doing anything else.  It then gets all the project folders and checks each one for validity based on our timing rules.  
Lastly, it submits all valid directories to be archived.<\/p>\n\n\n\n<p>I&#8217;ve made the entire source code and docs for the script available on GitHub.  Feel free to use it as you like, suggest changes, or even submit updates and fixes.<\/p>\n\n\n\n<p><a href=\"https:\/\/github.com\/szumlins\/p5_project_archiver\">https:\/\/github.com\/szumlins\/p5_project_archiver<\/a><\/p>\n\n\n\n<p>There are a few restrictions with the script right now that I&#8217;m aware of.  The main one is that it only supports being run on the P5 server itself, and the data must also be visible to localhost.  This could be worked around relatively easily in the script, but it would also create opportunities for users to successfully submit jobs that would then fail.<\/p>\n\n\n\n<p>Normally I&#8217;d suggest running the script periodically as a cron or launchd job.  All of the CLI flags are documented in the <a href=\"https:\/\/github.com\/szumlins\/p5_project_archiver\/blob\/master\/README.md\">README.md<\/a> file on GitHub.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recently I was asked if P5 could automatically archive an entire folder based not on the modification time of the folder itself, but instead on the modification times of the contents of the folder (files and subdirectories). 
The goal was to enable a &#8220;cold folder&#8221; automation that can automatically clean entire projects from storage based on &hellip; <a href=\"https:\/\/provideotech.org\/?p=260\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Archive folders based on the modification date of their files using Archiware P5&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[8,3,5,6],"tags":[],"class_list":["post-260","post","type-post","status-publish","format-standard","hentry","category-archiware-p5","category-general-info","category-software","category-storage"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bwLw-4c","_links":{"self":[{"href":"https:\/\/provideotech.org\/index.php?rest_route=\/wp\/v2\/posts\/260","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/provideotech.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/provideotech.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/provideotech.org\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/provideotech.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=260"}],"version-history":[{"count":0,"href":"https:\/\/provideote
ch.org\/index.php?rest_route=\/wp\/v2\/posts\/260\/revisions"}],"wp:attachment":[{"href":"https:\/\/provideotech.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=260"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/provideotech.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=260"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/provideotech.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=260"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}