Creating Cantemo Portal Hierarchy Metadata Field content from external sources

Not too long ago I wrote an article about how to create hierarchical metadata fields using Cantemo’s REST API. Given that this was really more of a “how to” than a real-world use case, I decided to tackle a commonly requested hierarchical field: worldwide location.

 

The goal here was to take data about all the continents, countries, and states in the world and put them into a single hierarchical field for easy manual location tagging. The first thing I had to do was find my data set. I searched all over and couldn’t find exactly what I wanted, but I was able to find a full list of countries and continents with ISO codes, and another list that tied states to countries by ISO code. I did a little data munging and built a relatively simple JSON file that has everything we need to move forward. That file can be downloaded here.
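A minimal sketch of that kind of merge, assuming two hypothetical input lists joined on an ISO country code. The field names `iso` and `country_iso` and the sample rows are made up for illustration; they are not taken from the real source files.

```python
import json

# Hypothetical inputs: the real source lists are not shown in this article.
countries = [
    {"iso": "AL", "country": "Albania", "continent": "Europe"},
    {"iso": "AD", "country": "Andorra", "continent": "Europe"},
]
states = [
    {"country_iso": "AL", "state": "Berat"},
    {"country_iso": "AL", "state": "Fier"},
    {"country_iso": "XX", "state": "Nowhere"},  # no matching country
]


def merge_locations(countries, states):
    """Join each state to its country and continent via the ISO code."""
    by_iso = {c["iso"]: c for c in countries}
    merged = []
    for s in states:
        parent = by_iso.get(s["country_iso"])
        if parent is None:
            continue  # skip states whose country code is unknown
        merged.append({
            "continent": parent["continent"],
            "country": parent["country"],
            "state": s["state"],
        })
    return {"locations": merged}


result = merge_locations(countries, states)
print(json.dumps(result, indent=2))
```

The output has the same shape as the JSON file the script consumes: a top-level "locations" list with one continent/country/state record per state.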

This data set is pretty simple. Here is a quick sample:

{ 
	"locations":[
		{
			"continent": "Europe",
			"country": "Albania",
			"state": "Berat"
		}
	]
}

Obviously my list is much larger than this, but you get the idea. The file holds a list of locations, and each location has a continent, a country, and a state, which is exactly how I want my field to work. So let’s get into the script and break it down.

 

This top section just sets up our arguments. I could have hard coded all this, but while I was testing I realized that people might want to work with different subsets of data, and everyone would have different fields, so I built it to share. All we’ve done here is import the Python libraries we need and create a CLI argument parser. Now you can run the script with “-h” and get back a list of all the inputs it expects.

import requests
import os
import sys
import json
import argparse
 
#CLI Options
parser = argparse.ArgumentParser(description='Create Location Tree Structure')
parser.add_argument('-a','--ip-address',dest='portal_ip',metavar="IPADDR",type=str,help="IP Address of Portal Server",required=True)
parser.add_argument('-u','--username',dest='portal_user',metavar="USERNAME",type=str,help="Username of Portal Admin",required=True)
parser.add_argument('-p','--password',dest='portal_pass',metavar="PASSWORD",type=str,help="Password of Portal Admin",required=True)
parser.add_argument('-f','--metadata-field',dest='target_field',metavar="FIELD",type=str,help="Metadata Field To Add Items To",required=True)
parser.add_argument('-i','--input-file',dest='infile',metavar="FILE",type=str,help="Path of JSON File to Parse",required=True)
args = parser.parse_args()
 
portal_user = args.portal_user
portal_pass = args.portal_pass
portal_ip = args.portal_ip
target_field = args.target_field
db = args.infile

All I am doing here is making sure our data file (given the handle “db”) exists and parses as valid JSON.

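# check inputs for validity
if not os.path.isfile(db):
	print(db + " is not a valid file path")
	sys.exit(1)
 
# try to parse input
try:
	# the with block closes the file even if parsing fails
	with open(db,'r') as fp:
		locations = json.load(fp)
except ValueError as err:
	# report the actual parse error rather than the exception class
	print("Could not parse " + db + ": " + str(err))
	sys.exit(3)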
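The rest of the script assumes every record carries all three keys, so a missing one would surface later as a KeyError in the middle of a run. A small pre-flight check can catch that early. This is only a sketch: `find_bad_records` and `REQUIRED_KEYS` are hypothetical names, not part of the original script.

```python
REQUIRED_KEYS = ("continent", "country", "state")


def find_bad_records(data):
    """Return (index, missing_keys) for every record lacking a required key."""
    problems = []
    for index, record in enumerate(data.get("locations", [])):
        # treat absent keys and empty values the same way
        missing = [k for k in REQUIRED_KEYS if not record.get(k)]
        if missing:
            problems.append((index, missing))
    return problems


sample = {"locations": [
    {"continent": "Europe", "country": "Albania", "state": "Berat"},
    {"continent": "Europe", "country": "Albania"},  # state is missing
]}
bad = find_bad_records(sample)
for index, missing in bad:
    print("Record " + str(index) + " is missing: " + ", ".join(missing))
```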

This section just sets up some globals that get reused in our functions and the procedural code below. Why type the same thing over and over?

# set up some shortcuts for later
headers = {'accept': 'application/json','content-type':'application/json'}
api_url = 'http://' + portal_ip + '/API/v2/'
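The script builds two URL shapes from these pieces: the hierarchy listing for a field, and the children of one node. If you prefer, that string concatenation could live in one helper. A sketch only: `hierarchy_url` is a hypothetical name, and the IP address in the example is a placeholder.

```python
def hierarchy_url(portal_ip, field, node_id=None):
    """Build the hierarchy URL for a field, or for one node's children."""
    url = 'http://' + portal_ip + '/API/v2/metadata-schema/fields/' + field + '/hierarchy/'
    if node_id is not None:
        # same path the script uses when it lists the children of a node
        url += str(node_id) + '/children'
    return url


print(hierarchy_url('10.0.0.5', 'portal_mf847484'))
print(hierarchy_url('10.0.0.5', 'portal_mf847484', node_id=12))
```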

This is our function to actually create a node. It takes 3 arguments – parent, name, label. Parent is which hierarchical parent you want to put this particular node into, name is what it should be called, and label is what class it is in. This is all the same as the metadata group editor UI information, so if you haven’t taken the time to build a hierarchical field by hand first and understand how it works, now would be the time.

 
def create_node(parent,name,label):
	# function to create new nodes quickly
	print("Creating new node for item " + name + " with parent ID " + str(parent))
	url = api_url + 'metadata-schema/fields/' + target_field + '/hierarchy/'
	# the body Portal expects: node name, its label, and the id of its parent
	this_node = {"name": name,"label":label,"parent_id":parent}
	w = requests.post(url,auth=(portal_user,portal_pass),headers=headers,data=json.dumps(this_node))
	# stop on any HTTP error instead of carrying on with a bad response
	w.raise_for_status()
	return w.json()
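To see exactly what gets sent, here is the request body on its own, with no network call involved. `build_node_payload` is a hypothetical helper for illustration, and the parent ids used are made up.

```python
import json


def build_node_payload(parent, name, label):
    """Return the JSON body that create_node posts for one node."""
    # sort_keys only makes the printed output stable; the key order is not significant
    return json.dumps({"name": name, "label": label, "parent_id": parent}, sort_keys=True)


body = build_node_payload(1, "Europe", "Country")
print(body)

# a leaf node (a state) is created with no label, which serializes as null
leaf = build_node_payload(5, "Berat", None)
print(leaf)
```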

This little block just checks to see what our actual parent root is. You would think this is always zero, but you’d be wrong. We need this so we can assign all our first children to it (in this example, continents).

 
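# get root id for our field
url = api_url + 'metadata-schema/fields/' + target_field + '/hierarchy/'
r = requests.get(url,auth=(portal_user,portal_pass),headers=headers)
r.raise_for_status()
root = None
for node in r.json()['objects']:
	# the root node has no parent and sits at level 0
	if node['parent_id'] is None and node['level'] == 0:
		root = node['id']
# bail out early rather than fail later with an undefined root
if root is None:
	print("Could not find a root node for field " + target_field)
	sys.exit(4)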
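The same lookup can be tried offline against a hand-written response. The sample below is an assumption: it only mirrors the keys the script reads ('id', 'parent_id', 'level'), it is not a captured Portal response, and `find_root_id` is a hypothetical name.

```python
def find_root_id(objects):
    """Return the id of the first node with no parent at level 0, or None."""
    for node in objects:
        if node.get('parent_id') is None and node.get('level') == 0:
            return node['id']
    return None


# made-up data shaped like the 'objects' list the script iterates over
sample_objects = [
    {'id': 41, 'parent_id': None, 'level': 0, 'name': 'root'},
    {'id': 42, 'parent_id': 41, 'level': 1, 'name': 'Europe'},
]
print(find_root_id(sample_objects))
```

Returning None when nothing matches lets the caller decide how to fail, instead of hitting an undefined variable further down.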

This is the meat and potatoes of the script. It walks through every location in the parsed JSON and indexes each node it creates in “roots”. Before creating a node it checks the “roots” dict to see whether that node already exists, and it looks up the parent’s id there too. Admittedly, this method WILL break if a child has exactly the same name as a parent, or as any other node in the tree: it compares against a flat local index keyed by name rather than against the actual data set, so the second node with a given name is never created. Real-world data does contain such collisions (Georgia is both a country and a US state, for example), so check your data for duplicate names before relying on it.

 
# build a dict for parents to look up their ids (so we don't have to keep making REST calls)
roots = {}
 
# start processing our records one at a time from our input file
total = len(locations['locations'])
for i, location in enumerate(locations['locations'], 1):
	print("Working on location " + str(i) + " of " + str(total))
 
	# only ask Portal about continents we haven't indexed yet
	if location['continent'] not in roots:
		url = api_url + 'metadata-schema/fields/' + target_field + '/hierarchy/' + str(root) + '/children'
		r = requests.get(url,auth=(portal_user,portal_pass),headers=headers)
		r.raise_for_status()
		# index any continents that already exist under the root
		for node in r.json()['objects']:
			roots.setdefault(node['name'], node['id'])
 
	# create our new child continent if it still doesn't exist
	if location['continent'] not in roots:
		result = create_node(root,location['continent'],"Country")
		roots[location['continent']] = result['id']
 
	# create our new child country under its continent
	if location['country'] not in roots:
		result = create_node(roots[location['continent']],location['country'],"State/Province")
		roots[location['country']] = result['id']
	# create our new child state under its country
	if location['state'] not in roots:
		result = create_node(roots[location['country']],location['state'],None)
		roots[location['state']] = result['id']
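One way around the name-collision limitation described above is to key the index by the full path of a node instead of its bare name. The sketch below shows the idea with a stand-in for create_node so it runs without a Portal server. Everything in it is hypothetical: `fake_create_node`, `build_tree`, the ids, and the sample records.

```python
import itertools

_ids = itertools.count(100)  # made-up ids, standing in for what Portal returns
created = []                 # records every node the stub "creates"


def fake_create_node(parent, name, label):
    """Stand-in for create_node: no REST call, just hands out ids."""
    node = {"id": next(_ids), "parent_id": parent, "name": name, "label": label}
    created.append(node)
    return node


def build_tree(records, root_id, create=fake_create_node):
    """Create each continent, country and state once, keyed by its full path."""
    index = {}  # maps a path tuple such as ("Europe", "Albania") to a node id
    for rec in records:
        levels = [
            ((rec["continent"],), "Country"),
            ((rec["continent"], rec["country"]), "State/Province"),
            ((rec["continent"], rec["country"], rec["state"]), None),
        ]
        parent = root_id
        for path, label in levels:
            if path not in index:
                index[path] = create(parent, path[-1], label)["id"]
            parent = index[path]
    return index


# sample data with a deliberate collision: Georgia the country, Georgia the state
records = [
    {"continent": "Asia", "country": "Georgia", "state": "Tbilisi"},
    {"continent": "North America", "country": "United States", "state": "Georgia"},
    {"continent": "North America", "country": "United States", "state": "Florida"},
]
index = build_tree(records, root_id=1)
print(str(len(created)) + " nodes created")
```

Because the key is the whole path, the two Georgias land in separate entries and both get created. Swapping `fake_create_node` for the real create_node should be all that is needed to use it against Portal, though I have not tested that here.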

So to run this script, first make sure you have the appropriate libraries installed. The only one that doesn’t ship with Python is requests; if you don’t have it, time to google pip, or better yet virtualenv. Then you can get the usage output simply:

$> python create_location_field.py -h
usage: create_location_field.py [-h] -a IPADDR -u USERNAME -p PASSWORD -f
                                FIELD -i FILE
 
Create Location Tree Structure
 
optional arguments:
  -h, --help            show this help message and exit
  -a IPADDR, --ip-address IPADDR
                        IP Address of Portal Server
  -u USERNAME, --username USERNAME
                        Username of Portal Admin
  -p PASSWORD, --password PASSWORD
                        Password of Portal Admin
  -f FIELD, --metadata-field FIELD
                        Metadata Field To Add Items To
  -i FILE, --input-file FILE
                        Path of JSON File to Parse

So a proper run would look like this:

$> python create_location_field.py -u admin -p admin -a cantemo.provideotech.lan -f portal_mf847484 -i ./continent_country_state.json

If all is working, you’ll get a fun rolling list walking through all the records. The full script can be downloaded here.