Creating Cantemo Portal Hierarchy Metadata Field content from external sources

Not too long ago I wrote an article about how to create hierarchical metadata fields using Cantemo’s REST API. Given that this was really more of a “how to” and not a real-world use case, I decided to tackle a commonly requested hierarchical field: worldwide location.

 

The goal here is to take data about all the continents, countries, and states in the world and put them into a single hierarchical field for easy manual location tagging.  The first thing I had to do was find my data set.  I searched all over and couldn’t find exactly what I wanted, but I was able to find a full list of countries and continents with ISO codes, and another list that had all of the states and countries tied to ISO codes.  I did a little data munging and built a relatively simple JSON file that has everything we need to move forward.  That file can be downloaded here.

This data set is pretty simple. Here is a quick sample:

{ 
	"locations":[
		{
			"continent": "Europe",
			"country": "Albania",
			"state": "Berat"
		}
	]
}

Obviously my list is much larger than this, but you get the idea.  I have a list of locations, and each location has a continent, country, and state.  That works just like I want my field to work.  So let’s get into the script and break it down.
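Before the script itself, here is a minimal sketch of how these flat records map onto the tree we want to end up with. The helper name `nest` and the second sample record are made up for illustration; only the "Berat" record comes from the sample above.

```python
import json

def nest(records):
    """Group flat continent/country/state records into a nested dict."""
    tree = {}
    for rec in records:
        # continent -> country -> list of states, created on first sight
        states = tree.setdefault(rec["continent"], {}).setdefault(rec["country"], [])
        if rec["state"] not in states:
            states.append(rec["state"])
    return tree

sample = {"locations": [
    {"continent": "Europe", "country": "Albania", "state": "Berat"},
    {"continent": "Europe", "country": "Albania", "state": "Durres"},  # illustrative extra record
]}

print(json.dumps(nest(sample["locations"]), indent=2))
```

Each level of nesting here corresponds to one level of the hierarchical field. With that picture in mind, on to the script, starting from the top.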

 

The top section of the script just sets up our arguments. I could have hard-coded all this, but while I was testing I realized that people might want to work with different subsets of data, and everyone would have different fields. So I built it to share. All this does is import the libraries we will need in Python and create a CLI argument parser. Now you can run the script with “-h” and get back all the inputs you are supposed to provide.

import requests
import os
import sys
import json
import argparse
 
#CLI Options
parser = argparse.ArgumentParser(description='Create Location Tree Structure')
parser.add_argument('-a','--ip-address',dest='portal_ip',metavar="IPADDR",type=str,help="IP Address of Portal Server",required=True)
parser.add_argument('-u','--username',dest='portal_user',metavar="USERNAME",type=str,help="Username of Portal Admin",required=True)
parser.add_argument('-p','--password',dest='portal_pass',metavar="PASSWORD",type=str,help="Password of Portal Admin",required=True)
parser.add_argument('-f','--metadata-field',dest='target_field',metavar="FIELD",type=str,help="Metadata Field To Add Items To",required=True)
parser.add_argument('-i','--input-file',dest='infile',metavar="FILE",type=str,help="Path of JSON File to Parse",required=True)
args = parser.parse_args()
 
portal_user = args.portal_user
portal_pass = args.portal_pass
portal_ip = args.portal_ip
target_field = args.target_field
db = args.infile

All I am doing here is making sure our data file (given the handle “db”) exists and is readable by our script.

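# check inputs for validity
if not os.path.isfile(db):
	print(db + " is not a valid file path")
	sys.exit(1)
 
# try to parse input; the with block closes the file for us
try:
	with open(db,'r') as fp:
		locations = json.load(fp)
except ValueError as e:
	# print the actual parse error rather than the exception class
	print("Could not parse " + db + ": " + str(e))
	sys.exit(3)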

This section just sets up some globals that get reused in our functions and the procedural code below. Why type the same thing over and over?

# set up some shortcuts for later
headers = {'accept': 'application/json','content-type':'application/json'}
api_url = 'http://' + portal_ip + '/API/v2/'

This is our function to actually create a node. It takes 3 arguments – parent, name, label. Parent is which hierarchical parent you want to put this particular node into, name is what it should be called, and label is what class it is in. This is all the same as the metadata group editor UI information, so if you haven’t taken the time to build a hierarchical field by hand first and understand how it works, now would be the time.

 
def create_node(parent,name,label):
	# function to create new nodes quickly; returns the parsed JSON response
	print("Creating new node for item " + name + " with parent ID " + str(parent))
	url = api_url + 'metadata-schema/fields/' + target_field + '/hierarchy/'
	this_node = {"name": name,"label":label,"parent_id":parent}
	w = requests.post(url,auth=(portal_user,portal_pass),headers=headers,data=json.dumps(this_node))
	# stop on any HTTP error instead of continuing with a missing node
	w.raise_for_status()
	return w.json()
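To make the three arguments concrete, here is a small sketch of the JSON bodies create_node would post for the Albania sample. The helper name `node_payload` and the parent IDs 1, 2 and 3 are invented for illustration; real IDs come back from Portal. Note that a label of None is serialized as JSON null.

```python
import json

def node_payload(parent, name, label):
    """Build the same JSON body that create_node sends for one node."""
    return json.dumps({"name": name, "label": label, "parent_id": parent})

# Made-up parent ids: 1 = root, 2 = Europe, 3 = Albania.
print(node_payload(1, "Europe", "Country"))
print(node_payload(2, "Albania", "State/Province"))
print(node_payload(3, "Berat", None))
```

Printing the payloads like this is a cheap way to check your labels before pointing the script at a live server.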

The next block checks what our actual parent root is. You would think this is always zero, but you’d be wrong. We need this so we can assign all our first-level children to it (in this example, continents).

 
# get root id for our field
url = api_url + 'metadata-schema/fields/' + target_field + '/hierarchy/'
r = requests.get(url,auth=(portal_user,portal_pass),headers=headers)
r.raise_for_status()
root = None
for node in r.json()['objects']:
	# the root is the only node with no parent at level 0
	if node['parent_id'] is None and node['level'] == 0:
		root = node['id']
if root is None:
	print("No root node found for field " + target_field)
	sys.exit(4)
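The root lookup can also be sketched as a small function that can be tried without a server. The function name `find_root_id` and the sample response are hypothetical; the response is only shaped after the keys the script reads ('objects', 'id', 'parent_id', 'level'), not copied from Portal.

```python
def find_root_id(objects):
    """Return the id of the top-level node, or raise if there is none."""
    for node in objects:
        if node["parent_id"] is None and node["level"] == 0:
            return node["id"]
    raise LookupError("no root node found for this field")

# Hypothetical response body; real ids and extra keys will differ.
sample_response = {"objects": [
    {"id": 42, "parent_id": None, "level": 0, "name": "root"},
    {"id": 43, "parent_id": 42, "level": 1, "name": "Europe"},
]}

print(find_root_id(sample_response["objects"]))
```

Raising when no root is found gives a clearer failure than an undefined variable further down the script.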

The next section is the meat and potatoes of the script. It walks every location in the parsed JSON file and records the ID of each node it creates in the “roots” dict. Before creating a node, it looks in “roots” to see whether that node already exists, so each continent and country is created only once and can serve as the parent for whatever sits below it. Admittedly, this method WILL break if a child has exactly the same name as a node elsewhere in the tree, because it compares against a local index keyed by name only rather than against the actual tree. Such collisions are uncommon but real (Georgia is both a country and a US state), so check your data set for them before you run this.

 
# build a dict for parents to look up their ids (so we don't have to keep making REST calls)
roots = {}
 
# fetch the continents that already exist under the root once, before the loop
url = api_url + 'metadata-schema/fields/' + target_field + '/hierarchy/' + str(root) + '/children'
r = requests.get(url,auth=(portal_user,portal_pass),headers=headers)
r.raise_for_status()
for child in r.json()['objects']:
	# assumption: child objects carry 'name' and 'id' like the other nodes
	roots[child['name']] = child['id']
 
# start processing our records one at a time from our input file
total = len(locations['locations'])
for i, location in enumerate(locations['locations'], 1):
	print("Working on location " + str(i) + " of " + str(total))
 
	# create our new child continent if we have not seen it yet
	if not roots.get(location['continent']):
		result = create_node(root,location['continent'],"Country")
		roots[location['continent']] = result['id']
 
	# create our new child country now that the continent exists
	if not roots.get(location['country']):
		result = create_node(roots[location['continent']],location['country'],"State/Province")
		roots[location['country']] = result['id']
	# create our new child state now that the country exists
	if not roots.get(location['state']):
		result = create_node(roots[location['country']],location['state'],None)
		roots[location['state']] = result['id']
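If your data does contain the same name at more than one level, one possible way around the limitation is to key the index by the full path instead of the bare name. This is a sketch under assumptions, not part of the downloadable script: `build_tree` and `fake_create` are hypothetical names, and `fake_create` stands in for create_node so the example runs without a Portal server.

```python
def build_tree(locations, root_id, create):
    """Create continent/country/state nodes, indexing ids by full path."""
    ids = {}  # maps a path tuple such as ("Europe", "Belgium") to a node id
    levels = (("continent", "Country"), ("country", "State/Province"), ("state", None))
    for location in locations:
        parent = root_id
        path = ()
        for key, label in levels:
            path = path + (location[key],)
            if path not in ids:
                ids[path] = create(parent, location[key], label)["id"]
            parent = ids[path]
    return ids

calls = []

def fake_create(parent, name, label):
    # stand-in for create_node: records the call and returns a made-up id
    calls.append((parent, name, label))
    return {"id": 100 + len(calls)}

# "Luxembourg" appears both as a Belgian province and as a country.
sample = [
    {"continent": "Europe", "country": "Belgium", "state": "Luxembourg"},
    {"continent": "Europe", "country": "Luxembourg", "state": "Diekirch"},
]

ids = build_tree(sample, 1, fake_create)
for path in sorted(ids):
    print(" > ".join(path) + " = " + str(ids[path]))
```

With a name-only index, the second record would reuse the province’s ID as if it were the country. With path keys, the two nodes named Luxembourg stay separate.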

So to run this script, first make sure you have the appropriate libraries installed; the only third-party one is requests (if it is missing, time to google pip or, better yet, virtualenv). Then you can get the usage output simply:

$> python create_location_field.py -h
usage: create_location_field.py [-h] -a IPADDR -u USERNAME -p PASSWORD -f
                                FIELD -i FILE
 
Create Location Tree Structure
 
optional arguments:
  -h, --help            show this help message and exit
  -a IPADDR, --ip-address IPADDR
                        IP Address of Portal Server
  -u USERNAME, --username USERNAME
                        Username of Portal Admin
  -p PASSWORD, --password PASSWORD
                        Password of Portal Admin
  -f FIELD, --metadata-field FIELD
                        Metadata Field To Add Items To
  -i FILE, --input-file FILE
                        Path of JSON File to Parse

So a proper run would look like this:

$> python create_location_field.py -u admin -p admin -a cantemo.provideotech.lan -f portal_mf847484 -i ./continent_country_state.json

If all is working, you’ll get a fun rolling list walking through all the records. The full script can be downloaded here.

5 thoughts on “Creating Cantemo Portal Hierarchy Metadata Field content from external sources”

  1. Hi, thanks for the write up.
    I recently wrote a little NodeJS script to import a tree of ~500 entries to Cantemo as hierarchical metadata.
    Data was basically a fixed set of hierarchical categories / tags. Getting the data into Cantemo was quite straightforward.

    But I am not very happy with implementation of the hierarchical field itself.

    Users who enter metadata have to “walk” along the whole tree to enter metadata.
    If I have a hierarchy like “Europe > Italy > Rome” I want to be able to type just “Rome” and have the whole tree as well. But users have to type “Europe”, then “Italy”, then “Rome”.

    I have to implement a location metadata field as well and I was sure I would do it with Cantemo’s hierarchical metadata feature.
    But with the learnings of the taxonomy hierarchy with ~500 entries, I guess I will have to implement a more user friendly way.

    I made something similar for WordPress, you just type “Munich” and metadata will be “Europe” > “Germany” > “Upper Bavaria” > “Munich”. This was backed by OpenGeoDB and worked quite well.

    Are you / your users satisfied by the current implementation of hierarchical metadata? I ask because I hope that I have overlooked something.

    1. Basti, there is a new feature coming in one of the 3.2 releases of Portal called “filtered metadata” that accomplishes what you want by establishing relationships between different fields and filtering the results in real time. From what I’m told, this will be bi-directional, so choosing the lowest member of a tree will auto-fill all of its parents.

      I see hierarchical metadata used for more general single field instances where a dive-down tree is the preferable way of getting data into the system as you may not actually know children until you pick a parent, so it still has real value in some cases. That said, I believe the filtered metadata will likely be applicable to more general use cases (and therefore more users).

      1. Thanks for the information. This will help a lot – but I’m on my way to develop a browser based form to enter the metadata. I learned from past projects that one can’t put too much effort into making these forms as user friendly as possible. The results will pay back as you will get a clean, consistent media archive.
        I’m used to web content management systems where – at least from what I saw until now – custom data fields are implemented better.
        Maybe it’s because I have to re-think some patterns, but I’m sure there’s lots of potential for improvement.
        Of course, newsroom or asset management software has to take care of a lot more than metadata, while a web CMS is barely more than a number of meta fields in a database…

        1. Thanks Mike for the script!
          Basti, have you started developing your browser based form yet?
          Just like you, we are not really happy with the current implementation of this feature: we need to be able to look up last-level children too.
          Mike’s solution using filtered metadata does not suit us as you need to create as many portal fields as you have hierarchy levels (correct me if I’m wrong).
          Also we would love to be able to dynamically add new children (as for the “tag” field type), and search for children of a given term (ex: if I search for “Europe” > “Germany” > “Upper Bavaria”, I should get all media with “Europe” > “Germany” > “Upper Bavaria” > “Munich”).
          We’re also investigating ways to better implement this feature. Would you be interested in sharing thoughts about that?

  2. Hi, yes we discussed this internally and it seems to be the best solution for us to create an external web app.

    It would be better to integrate this in Cantemo because that’s where it should be. But it’s difficult to integrate with barely any / wrong / misleading documentation and zero support from Cantemo.

    I created a back-end which handles all API calls and abstracts away some pain. Cantemo is one module in the back-end. There are other modules that push/pull data from our newsroom system, the playouts or other parts. It has no database of its own and I hope that it won’t need one. It acts as a middleware to pull, push, normalize and cache data so the frontend doesn’t talk to the services directly.

    The front-end is a Progressive Web App based on VueJS. Currently it is possible to create and edit Cantemo placeholders based on rundown entries in our newsroom system. We now have a great usability including keyboard shortcuts, working autoComplete fields, better validation and pre-processing of entered data.

    What I figured out so far:

    – Cantemo is basically a frontend for Vidispine. Vidispine API has a much better documentation, while it’s also a huge load of concepts to learn before things work.

    – Hierarchies with many entries will not be implemented in Cantemo / Vidispine. E.g. the “location” field uses a geolocation API as source and uploads text fields to Vidispine.

    – Cantemo uses ElasticSearch v1.7 – so many things like geo-search (“items max 50km around Berlin”) won’t work out of the box.

    Bottom line:
    I’m still pretty optimistic that things will work out and also that it was the right decision that we didn’t try to integrate this directly to Cantemo.
