{"id":217,"date":"2017-08-07T14:56:44","date_gmt":"2017-08-07T18:56:44","guid":{"rendered":"http:\/\/provideotech.org\/?p=217"},"modified":"2017-08-07T14:56:44","modified_gmt":"2017-08-07T18:56:44","slug":"bulk-adding-choices-to-cantemo-portal-metadata-fields-from-generic-text-files","status":"publish","type":"post","link":"https:\/\/provideotech.org\/?p=217","title":{"rendered":"Bulk adding choices to Cantemo Portal metadata fields from generic text files"},"content":{"rendered":"<p>For a while now I&#8217;ve been asked if there is an easy way to mass populate metadata fields with a list of choices in Portal. In the back of my mind I thought it would probably be relatively simple to write a small parser to do this, and I&#8217;ve done just that. It can be downloaded <a href=\"https:\/\/github.com\/szumlins\/metadata_list_builder\">here<\/a>. I&#8217;ll use this article to walk through how I did it so you can understand a bit more about Portal metadata field structure.<\/p>\n<p><!--more--><\/p>\n<p>For the purposes of this script, Portal really only has four &#8220;types&#8221; of fields.<\/p>\n<ul>\n<li>Special Fields<\/li>\n<li>Non editable (system) fields<\/li>\n<li>Editable fields that do not contain lists of choices<\/li>\n<li>Editable fields that contain lists of choices<\/li>\n<\/ul>\n<p>So let&#8217;s look at these field classes.<\/p>\n<p><strong>Special Fields<\/strong><\/p>\n<ul>\n<li>Integer &#8211; can only contain whole numbers (ex: 2, 5, 1232)<\/li>\n<li>Date &#8211; can only contain a specifically formatted date string<\/li>\n<li>Timestamp &#8211; can only contain a specifically formatted date\/time string<\/li>\n<li>Float &#8211; can only contain fractional\/float numbers (ex: 2.3, 1.0, 543.683)<\/li>\n<li>Tags &#8211; a mix of a choice style field and a free text field, user extensible<\/li>\n<li>Hierarchy &#8211; contains a full relational tree with multiple linked values in a single field<\/li>\n<\/ul>\n<p>These fields all have very specific requirements and we won&#8217;t be using them in a list population 
script.<\/p>\n<p><strong>Non editable (system) fields<\/strong><\/p>\n<ul>\n<li>System field &#8211; These are fields specific to Portal that are searchable, but not editable<\/li>\n<li>XMP field &#8211; Embedded metadata values scraped from XMP\/EXIF\/DC\/IPTC values<\/li>\n<\/ul>\n<p>Since these fields aren&#8217;t editable, we won&#8217;t be putting any lists into them.<\/p>\n<p><strong>Editable fields that do not contain lists of choices<\/strong><\/p>\n<ul>\n<li>Text<\/li>\n<li>Textarea<\/li>\n<\/ul>\n<p>These fields contain strings without special formatting unless regex&#8217;d in settings.  Since we are building lists, we are going to ignore these field types.<\/p>\n<p><strong>Editable fields that contain lists of choices<\/strong><\/p>\n<ul>\n<li>Multi-choice &#8211; selection of elements from a dropdown, can have only one selection at a time<\/li>\n<li>Checkbox &#8211; selection of elements from a visual array of checkboxes, can have multiple values<\/li>\n<li>Radio &#8211; selection of elements from a visual array with a radio button, can have only one selection at a time<\/li>\n<li>Lookup &#8211; selection of elements from a dropdown, can have multiple values<\/li>\n<li>Worksteps &#8211; used for process flow but values are still list populated, can have only one selection at a time<\/li>\n<\/ul>\n<p>These are the fields that we are concerned with for the purpose of this script.  There is one important thing to note here &#8211; for some reason Lookup fields store their data in a different way and use a different API call to store lists.  
This isn&#8217;t the end of the world, but we do have to differentiate between two &#8220;types&#8221; of fields inside this class, and we also have to sniff out what type of field we are working with so we know which API call to make.<\/p>\n<p>So now that we understand which fields are which, let&#8217;s dive into the script a little.<\/p>\n<p>The first thing we are going to do is import the libraries we need to make this all work.<\/p>\n<pre lang=\"python\" line=\"1\">\r\nimport json\r\nimport requests\r\nimport argparse\r\nimport csv\r\nimport os\r\nimport unicodedata\r\nimport re\r\n<\/pre>\n<p>Next, we build our CLI options using the <a href=\"https:\/\/docs.python.org\/2.7\/library\/argparse.html\">argparse<\/a> library.  This gives us the ability to easily parse command line arguments in any order by defining flags before them.  It also provides a simple help menu and allows us to require some fields.<\/p>\n<pre lang=\"python\" line=\"9\">\r\n#command line argument parsing\r\nparser = argparse.ArgumentParser(description='Batch update values in Portal metadata fields')\r\nparser.add_argument('-u','--username',dest='username',metavar=\"USERNAME\",type=str,help=\"Portal username, uses \\\"admin\\\" if not set\",default=\"admin\")\r\nparser.add_argument('-p','--password',dest='password',metavar=\"PASSWORD\",type=str,help=\"Portal password\",required=True)\r\nparser.add_argument('-a','--address',dest='address',metavar=\"ADDRESS\",type=str,help=\"IP Address or DNS name of Portal server\",required=True)\r\nparser.add_argument('-f','--field',dest='field',metavar=\"FIELD\",help=\"Portal metadata field\",required=True)\r\nparser.add_argument('-i','--input-file',metavar=\"FILE_PATH\",dest='input_file',help=\"Key\/Value input file (line delimited values or csv key\/value pairs)\",required=True)\r\ncli_args = parser.parse_args()\r\n<\/pre>\n<p>Next up, we are just going to build a shortcut so we don&#8217;t have to type our Portal URL in every REST call for the rest of the 
script.  <\/p>\n<pre lang=\"python\" line=\"18\">\r\n#format our URL as a shortcut\r\nportal_url = 'http:\/\/' + cli_args.address + ':8080\/API\/'\r\n<\/pre>\n<p>After that, we are going to build a list of field types that this script works on.  From our research above, we&#8217;ve come up with dropdown, checkbox, radio, tags, workstep, and lookup field types.  You&#8217;ll notice these field types aren&#8217;t necessarily named the same thing as they are in the Metadata Manager UI.  To figure out their programmatic names, I had to do some API calls on all the metadata fields to get their programmatic keys (more on that later).<\/p>\n<pre lang=\"python\" line=\"21\">\r\n#define which field types this script works on\r\nsafe_field_types = ['dropdown','checkbox','radio','tags','workstep','lookup']\r\n<\/pre>\n<p>Next up, we define some functions to make our lives easier.  First up is the slugify function.  This simply makes strings safe for key entry in the system.  As of Portal 3.0, all keys are slugified when added via the UI, so I wanted to mimic that functionality.<\/p>\n<pre lang=\"python\" line=\"24\">\r\ndef slugify(s):\r\n\t#slugify function to remove special characters from value strings\r\n\ts = s.lower()\r\n\tfor c in [' ', '-', '.', '\/']:\r\n\t\ts = s.replace(c, '_')\r\n\ts = re.sub('\\\\W', '', s)\r\n\ts = s.replace('_', ' ')\r\n\ts = re.sub('\\\\s+', ' ', s)\r\n\ts = s.strip()\r\n\ts = s.replace(' ', '-')\r\n\treturn s\r\n<\/pre>\n<p>This is where things get interesting.  To figure out what kind of field I&#8217;m working with, I need to get all the information on that field.  The field itself is a Vidispine object, so we need to look at the <a href=\"http:\/\/apidoc.vidispine.com\/latest\/ref\/metadata\/field.html\">Vidispine API docs<\/a> to get all the field data back.  In this case, we use <code>:8080\/API\/metadata-field\/&lt;field_name&gt;\/?data=all<\/code> with <code>application\/json<\/code> headers. 
In my test environment, here is an example of what I&#8217;d get when running it on a field.<\/p>\n<pre lang=\"json\">\r\n{\r\n    \"data\": [\r\n        {\r\n            \"key\": \"extradata\",\r\n            \"value\": \"{\\\"sortable\\\": false, \\\"name\\\": \\\"Review and Approve\\\", \\\"default\\\": \\\"In Process\\\", \\\"pattern\\\": null, \\\"description\\\": null, \\\"cascading\\\": false, \\\"readonly\\\": null, \\\"values\\\": [{\\\"value\\\": \\\"In Process\\\", \\\"key\\\": \\\"In Process\\\"}, {\\\"value\\\": \\\"Review\\\", \\\"key\\\": \\\"Review\\\"}, {\\\"value\\\": \\\"Approved\\\", \\\"key\\\": \\\"Approved\\\"}, {\\\"value\\\": \\\"Distribution\\\", \\\"key\\\": \\\"Distribution\\\"}], \\\"autoset\\\": false, \\\"externalid\\\": null, \\\"reusable\\\": null, \\\"type\\\": \\\"workstep\\\", \\\"Film\\\": {\\\"required\\\": true, \\\"hideifnotset\\\": false, \\\"representative\\\": false}, \\\"vertical_mode\\\": false}\"\r\n        }\r\n    ],\r\n    \"defaultValue\": \"In Process\",\r\n    \"name\": \"portal_mf116390\",\r\n    \"origin\": \"VX\",\r\n    \"stringRestriction\": {},\r\n    \"type\": \"string-exact\"\r\n}\r\n<\/pre>\n<p>So there is something interesting to note above &#8211; we have a <code>data<\/code> block, but in there is a key\/value pair where the key is <code>extradata<\/code>.  Inside the value of that key is an entire JSON object that has been stringified.  So we have to get that data out and then objectify it so we can figure out what its values are and what type of field it is.  
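To make that double parse concrete, here is a tiny, self-contained sketch using a made-up, pared-down response modeled on the example above (not real API output):

```python
import json

# Hypothetical pared-down field response modeled on the example above
response = {
    "data": [
        {
            "key": "extradata",
            "value": "{\"type\": \"workstep\", \"values\": [{\"value\": \"Review\", \"key\": \"Review\"}]}",
        }
    ]
}

# The outer document is already parsed; the extradata value is a
# stringified JSON object, so it needs a second json.loads() pass
extradata = json.loads(response["data"][0]["value"])
print(extradata["type"])  # workstep
```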
Enter the <code>get_field_data<\/code> function.<\/p>\n<pre lang=\"python\" line=\"35\">\r\ndef get_field_data(fieldname):\r\n\t#get all information about a field\r\n\tr = requests.get(portal_url + 'metadata-field\/' + fieldname,headers={'accept':'application\/json'},params={'data':'all'},auth=(cli_args.username,cli_args.password))\r\n\tfor d in r.json().get('data',[]):\r\n\t\tif d['key'] == 'extradata':\r\n\t\t\treturn json.loads(str(d['value']))\r\n\t#no extradata key was found, so return an empty list\r\n\treturn []\r\n<\/pre>\n<p>Our for loop here looks at the response JSON we got and loops through all the keys looking for <code>extradata<\/code>.  If it isn&#8217;t found, it returns an empty list.  If it does find the key, it returns a JSON object of the value for our <code>extradata<\/code> key.<\/p>\n<p>Because I&#8217;m slightly lazy, had already written and implemented the function above, and needed parent information from the original JSON, I wrote a second function to get ALL the data back, as we need that as well.  The computational load of making two calls on the field is negligible in any use case I can see for this script, but the <em>right<\/em> way to do it would have been to parse the returned object from this function instead of using our <code>get_field_data<\/code> function.  This simply reads the JSON document from VS that we uncovered above into an object in the script.  More on why that matters later.<\/p>\n<pre lang=\"python\" line=\"43\">\r\ndef get_field_document(fieldname):\r\n\t#get all information about a field\r\n\tr = requests.get(portal_url + 'metadata-field\/' + fieldname,headers={'accept':'application\/json'},params={'data':'all'},auth=(cli_args.username,cli_args.password))\r\n\treturn r.json()\r\n<\/pre>\n<p>Since lookups store data in a different way than all other list type fields, we needed a way to get the list of key\/value pairs back for them as well in our processing.  
This function does just that.<\/p>\n<pre lang=\"python\" line=\"47\">\r\ndef get_lookup_values(fieldname):\r\n\t#return json object of all key\/value pairs of a field for lookups.  If the field values list is empty, just return an empty list\r\n\tr = requests.get(portal_url + 'metadata-field\/' + fieldname + '\/values',headers={'accept':'application\/json'},auth=(cli_args.username,cli_args.password))\r\n\ttry:\r\n\t\treturn r.json()['field']\r\n\texcept (ValueError, KeyError):\r\n\t\treturn []\r\n<\/pre>\n<p>Now that we have information about our fields and the values held within them, we need a way to parse our CSV or text files so we can easily update the fields.  This function checks that the file exists, reads it in as a CSV file, works out how each line is formatted, creates a key\/value pair from it, and adds it to a list of dicts.  It then returns a unicode list of key\/value pair dicts.<\/p>\n<pre lang=\"python\" line=\"54\">\r\ndef get_file_values(input_file):\r\n\t#parse the input file into a list of key\/value pair dicts\r\n\tif os.path.exists(input_file):\r\n\t\ttry:\r\n\t\t\twith open(input_file, 'r') as csvfile:\r\n\t\t\t\toptions_obj = csv.reader(csvfile, delimiter=',', quotechar='\"')\r\n\t\t\t\tvalues = []\r\n\t\t\t\t#this isn't a lookup, so we formulate JSON key\/value pairs\r\n\t\t\t\tfor row in options_obj:\r\n\t\t\t\t\tif len(row) == 2:\r\n\t\t\t\t\t\tvalues.append({\"key\":slugify(row[0]),\"value\":row[1].rstrip()})\r\n\t\t\t\t\telif len(row) == 1:\r\n\t\t\t\t\t\tvalues.append({\"key\":slugify(row[0]),\"value\":row[0].rstrip()})\r\n\t\t\t\t\telse:\r\n\t\t\t\t\t\tprint \"Wrong number of columns in CSV file, each line should have one or two values\"\r\n\t\t\t\t\t\texit()\r\n\t\texcept Exception as e:\r\n\t\t\tprint \"Error trying to parse input file: \" + str(e)\r\n\t\t\texit()\r\n\telse:\r\n\t\tprint \"File \" + input_file + \" doesn't exist\"\r\n\t\texit()\r\n\treturn unicode_list(values)\r\n<\/pre>\n<p>The last 
function we had to write was a way to convert existing lists to unicode objects so we have uniformity when we start mixing\/matching lists later.  This function just takes a list with string-encoded values and converts them all to unicode-encoded values.<\/p>\n<pre lang=\"python\" line=\"77\">\r\ndef unicode_list(pairs):\r\n\tnew_list = []\r\n\tfor pair in pairs:\r\n\t\tunidict = dict((k.decode('utf8'), v.decode('utf8')) for k, v in pair.items())\r\n\t\tnew_list.append(unidict)\r\n\treturn new_list\r\n<\/pre>\n<p>Now we get into the meat of the script &#8211; actually doing stuff.  It looks like a lot, but there are really only three things happening.  First, we run our functions to get the field data and full document objects for the field we are looking to update.  <\/p>\n<pre lang=\"python\" line=\"84\">\r\n#get all the data about a field\r\nfield_data = get_field_data(cli_args.field)\r\nfield_document = get_field_document(cli_args.field)\r\n<\/pre>\n<p>Two quick checks happen next: one makes sure the field type we got back is one we can process, and if that is true, we see whether or not it is a lookup field (since the values are stored differently in lookups vs. other fields).<\/p>\n<pre lang=\"python\" line=\"88\">\r\n#check if our initial call returned good data\r\nif field_data:\r\n\t#check if our field is an acceptable field\r\n\tif field_data['type'] in safe_field_types:\r\n\t\t#check what type of field and process\r\n<\/pre>\n<p>Now we only have two options &#8211; the first is what happens if it is not a <code>lookup<\/code> field. The first thing we do is concatenate the values currently in the field with the values from the file.  Then we sort that set and remove duplicates.  We then format the data so that it can be sent in a REST call.  <\/p>\n<p>Next, we have to figure out what index our <code>extradata<\/code> key is at.  99.9% of the time this will be 0, but it isn&#8217;t safe to assume, so we check.  
Then we replace the value for that key with our newly formatted data.  Lastly, we make a REST call to put our JSON object back with all the new values and make sure it worked.<\/p>\n<pre lang=\"python\" line=\"93\">\r\n\t\tif field_data['type'] != 'lookup':\r\n\t\t\t# get back options for non lookup fields\r\n\t\t\tnew_values = get_file_values(cli_args.input_file) + field_data['values']\r\n\t\t\t#add existing values to new text file values, sort, and remove duplicates\r\n\t\t\tsorted_set = [dict(t) for t in set([tuple(sorted(d.items())) for d in new_values])]\r\n\t\t\t#format data for posting back\r\n\t\t\tfield_data['values'] = sorted_set\r\n\t\t\t#find our extradata index in the data object\r\n\t\t\textradata_index = next(index for (index, d) in enumerate(field_document['data']) if d['key'] == 'extradata')\r\n\t\t\t#update that index's value with our new updated field data\r\n\t\t\tfield_document['data'][extradata_index]['value'] = json.dumps(field_data)\r\n\t\t\t#put back data to field\r\n\t\t\tr = requests.put(portal_url + 'metadata-field\/' + cli_args.field, headers={'accept':'application\/json','content-type':'application\/json'},data=json.dumps(field_document),auth=(cli_args.username,cli_args.password))\r\n\t\t\tr.raise_for_status()\r\n<\/pre>\n<p>Otherwise, we know we are working with a lookup field.  The process is similar here, but our list data has to be formatted as XML and posted back in a different way.  
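As an aside, the sorted-tuple deduplication one-liner used in both branches can be sketched in isolation (with made-up sample pairs):

```python
# Dicts aren't hashable, so each dict is converted to a sorted tuple of
# its items; duplicates collapse inside the set, and the surviving
# tuples are rebuilt into dicts afterwards
pairs = [
    {"key": "review", "value": "Review"},
    {"key": "review", "value": "Review"},  # exact duplicate
    {"key": "approved", "value": "Approved"},
]
sorted_set = [dict(t) for t in set([tuple(sorted(d.items())) for d in pairs])]
print(len(sorted_set))  # 2
```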
You can see the comments in this block follow the same path &#8211; the only difference is how we format the data and how\/where we send it.<\/p>\n<pre lang=\"python\" line=\"107\">\r\n\t\telse:\r\n\t\t\t# get back options for lookup fields\r\n\t\t\tnew_values = get_file_values(cli_args.input_file) + get_lookup_values(cli_args.field)\r\n\t\t\t#add existing values to new text file values, sort, and remove duplicates\r\n\t\t\tsorted_set = [dict(t) for t in set([tuple(sorted(d.items())) for d in new_values])]\r\n\t\t\t#format data for posting back\r\n\t\t\tlookup_data = {'field':sorted_set}\r\n\t\t\t#format as XML since VS is broken for posting JSON right now\r\n\t\t\tmetadata_doc = '<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><SimpleMetadataDocument xmlns=\"http:\/\/xml.vidispine.com\/schema\/vidispine\">'\r\n\t\t\tfor pair in lookup_data['field']:\r\n\t\t\t\tmetadata_doc = metadata_doc + '<field><key>' + pair['key'] + '<\/key><value>' + pair['value'] + '<\/value><\/field>'\r\n\t\t\tmetadata_doc = metadata_doc + '<\/SimpleMetadataDocument>'\r\n\t\t\t#put back data to field\r\n\t\t\tr = requests.put(portal_url + 'metadata-field\/' + cli_args.field + '\/values', headers={'accept':'application\/json','content-type':'application\/xml'},data=metadata_doc,auth=(cli_args.username,cli_args.password))\r\n\t\t\tr.raise_for_status()\r\n<\/pre>\n<p>The only things left are the closures of our if statements that check whether the field we are working with is valid.<\/p>\n<pre lang=\"python\" line=\"123\">\r\n\telse:\r\n\t\tprint \"Can't use this field type with this script. 
Exiting.\"\r\n\t\texit()\r\nelse:\r\n\tprint \"Error finding field \" + cli_args.field\r\n\texit()\r\n<\/pre>\n<p>Hopefully this gives you a little insight into how metadata values are stored in a Portal database and how you can manipulate them.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>For a while now I&#8217;ve been asked if there is an easy way to mass populate metadata fields with a list of choices in Portal. In the back of my mind I thought it would probably be relatively simple to write a simple parser to do this, and I&#8217;ve done just that. It can be &hellip; <a href=\"https:\/\/provideotech.org\/?p=217\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Bulk adding choices to Cantemo Portal metadata fields from generic text files&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[13,5],"tags":[],"class_list":["post-217","post","type-post","status-publish","format-standard","hentry","category-cantemo-portal","category-software"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bwLw-3v","_links":{"self":[{"href":"https:\/\/provideotech.org\/index.php?rest_route=\/wp\/v2\/posts\/217","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/provideotech.org\/index.php?rest_ro
ute=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/provideotech.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/provideotech.org\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/provideotech.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=217"}],"version-history":[{"count":0,"href":"https:\/\/provideotech.org\/index.php?rest_route=\/wp\/v2\/posts\/217\/revisions"}],"wp:attachment":[{"href":"https:\/\/provideotech.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=217"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/provideotech.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=217"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/provideotech.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=217"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}