Today I had to review code that parses EDL files. In the code review I suggested using parsimonious instead of raw regex, as this is my go-to method for parsing any structured text file. Since I wasn’t sure whether EDL files are easy or hard to parse, I decided to have a quick go at writing a parsimonious grammar, to make sure I’m not suggesting something in the review that is worse than the current, already working solution.

Anyway, this is a good opportunity to show my approach to building a grammar. Parsimonious errors are not always helpful and quite often seem totally meaningless. The library tries hard, but it is not easy to point out which part of the grammar failed and why. To avoid getting bogged down in errors I usually start very simple, with just one line of the file I need to parse, and then slowly add more lines and extend the grammar as I go, continuously checking that it still parses.
A simple EDL that I found online:
TITLE: TEST PAPEREDIT
FCM: NON-DROP FRAME
001 Card01Ky AA/V C 00:02:26:21 00:02:30:12 00:00:00:00 00:00:03:16
* FROM CLIP NAME: KYLE_INTERVIEW.MOV
* COMMENT:
FINAL CUT PRO REEL: Card01_Kyle_Interview REPLACED BY: Card01Ky
002 Card01Ky AA/V C 00:02:30:12 00:02:34:13 00:00:03:16 00:00:07:17
* FROM CLIP NAME: KYLE_INTERVIEW.MOV
* COMMENT:
FINAL CUT PRO REEL: Card01_Kyle_Interview REPLACED BY: Card01Ky
Let’s start with the header. We have two header lines (potentially there can be more) with a very clear key and value structure. A grammar for it could be something like:
from parsimonious import Grammar

grammar = Grammar(
    r"""
    edl = header_entry+
    header_entry = key ":" spaces value newline
    key = ~"[A-Za-z0-9_ ]+"i
    value = ~".*"i
    newline = ~"\n*"
    spaces = ~"\s+"
    """)

edl = """TITLE: TEST PAPEREDIT
FCM: NON-DROP FRAME
"""

p = grammar.parse(edl)
I think this doesn’t require more explanation. One important fact about a parsimonious grammar is that we have to declare everything, including newlines and spaces. It is similar to regex, but more readable. Since this parses, we can start to extend it and add the clip entries to it.
grammar = Grammar(
    r"""
    edl = header_entry+ (empty entry)+
    header_entry = key ":" spaces value newline
    key = ~"[A-Za-z0-9_ ]+"i
    entry = (~".+"i newline)+
    empty = spaces? newline
    value = ~".*"i
    newline = ~"\n*"
    spaces = ~"\s+"
    """)

edl = """TITLE: TEST PAPEREDIT
FCM: NON-DROP FRAME
001 Card01Ky AA/V C 00:02:26:21 00:02:30:12 00:00:00:00 00:00:03:16
* FROM CLIP NAME: KYLE_INTERVIEW.MOV
* COMMENT:
FINAL CUT PRO REEL: Card01_Kyle_Interview REPLACED BY: Card01Ky
"""

p = grammar.parse(edl)
At this stage we can declare the entries in a very rudimentary way, as any characters followed by newlines. We want to go slowly and keep it working, without adding too much in each step. What is left to do now is to extend the definition of “entry” so that it extracts all the data we need from it. The definition of “edl” is already complete: a few header lines followed by clip entries separated by empty lines.

Quite often I build the visitor structure at the same time as I extend the grammar, again to make sure it all works, so that I do not need to deal with complicated error messages and try to figure out which part of the code is not working. If it stops working, then whatever I did last is what broke it. It is good practice to write unit tests and iterate between the tests and the code, slowly adding complexity.
The complete code to parse EDL could be something like this:
import attr
from parsimonious import Grammar, NodeVisitor


@attr.s
class Clip:
    index = attr.ib()
    reel = attr.ib()
    tracks = attr.ib()
    transition = attr.ib()
    out_start = attr.ib()
    out_end = attr.ib()
    src_start = attr.ib()
    src_end = attr.ib()
    attrs = attr.ib()


grammar = Grammar(
    r"""
    edl = header_entry+ (empty entry)+
    header_entry = key ":" spaces value newline
    key = ~"[A-Za-z0-9_ ]+"i
    title = "TITLE:" spaces value newline
    fcm = "FCM:" spaces value newline
    entry = index spaces reel spaces tracks spaces transition spaces timings newline attrib+
    index = ~"[0-9]+"
    attrib = "*" spaces key ":" (newline/spaces) value newline
    tracks = string (slash string)?
    timings = timecode spaces timecode spaces timecode spaces timecode
    reel = ~"[A-Za-z0-9_]+"i
    transition = ~"[A-Za-z]+"i
    string = ~"[A-Za-z0-9_]+"i
    timecode = time ":" time ":" time ":" time
    time = ~"[0-9][0-9]"
    empty = spaces? newline
    slash = "/"
    value = ~".*"i
    newline = ~"\n*"
    spaces = ~"\s+"
    """)
class V(NodeVisitor):
    def generic_visit(self, node, visited_children):
        # Rules without their own visit_* method: return the visited
        # children if there are any, otherwise the raw node.
        return visited_children or node

    def visit_edl(self, node, visited_children):
        # visited_children[0]: one dict per header line, merged into one dict.
        # visited_children[1]: (empty, entry) pairs, of which we keep the entry.
        return {
            'header': {k: v for d in visited_children[0] for k, v in d.items()},
            'clips': [ch[1] for ch in visited_children[1]]
        }

    def visit_header_entry(self, node, visited_children):
        # Children: key ":" spaces value newline
        return {visited_children[0].text: visited_children[3].text}

    def visit_entry(self, node, visited_children):
        # Every second child is a "spaces" node, hence the even indices.
        return Clip(index=visited_children[0].text,
                    reel=visited_children[2].text,
                    tracks=visited_children[4],
                    transition=visited_children[6].text,
                    out_start=visited_children[8]['out_start'],
                    out_end=visited_children[8]['out_end'],
                    src_start=visited_children[8]['src_start'],
                    src_end=visited_children[8]['src_end'],
                    attrs=visited_children[-1])

    def visit_tracks(self, node, visited_children):
        return node.text

    def visit_attrib(self, node, visited_children):
        # Children: "*" spaces key ":" (newline/spaces) value newline
        return {visited_children[2].text: visited_children[5].text}

    def visit_timecode(self, node, visited_children):
        return node.text

    def visit_timings(self, node, visited_children):
        # Four timecodes separated by "spaces" nodes.
        return {'out_start': visited_children[0],
                'out_end': visited_children[2],
                'src_start': visited_children[4],
                'src_end': visited_children[6]}


from pprint import pprint

p = grammar.parse(edl)
v = V()
pprint(v.visit(p))
{
'clips': [
Clip(index='001', reel='Card01Ky', tracks='AA/V', transition='C', out_start='00:02:26:21', out_end='00:02:30:12', src_start='00:00:00:00', src_end='00:00:03:16', attrs=[{'FROM CLIP NAME': ' KYLE_INTERVIEW.MOV'}, {'COMMENT': 'FINAL CUT PRO REEL: Card01_Kyle_Interview REPLACED BY: Card01Ky'}]),
Clip(index='002', reel='Card01Ky', tracks='AA/V', transition='C', out_start='00:02:30:12', out_end='00:02:34:13', src_start='00:00:03:16', src_end='00:00:07:17', attrs=[{'FROM CLIP NAME': ' KYLE_INTERVIEW.MOV'}, {'COMMENT': 'FINAL CUT PRO REEL: Card01_Kyle_Interview REPLACED BY: Card01Ky'}]),
Clip(index='003', reel='Card02Je', tracks='AA/V', transition='C', out_start='00:00:26:12', out_end='00:00:27:00', src_start='00:00:07:17', src_end='00:00:08:05', attrs=[{'FROM CLIP NAME': ' JEFF_INTERVIEW.MOV'}, {'COMMENT': 'FINAL CUT PRO REEL: Card02_Jeff_Interview REPLACED BY: Card02Je'}]),
Clip(index='004', reel='Card02Je', tracks='AA/V', transition='C', out_start='00:00:28:22', out_end='00:00:32:00', src_start='00:00:08:05', src_end='00:00:11:08', attrs=[{'FROM CLIP NAME': ' JEFF_INTERVIEW.MOV'}, {'COMMENT': 'FINAL CUT PRO REEL: Card02_Jeff_Interview REPLACED BY: Card02Je'}]),
Clip(index='005', reel='Card01Ky', tracks='AA/V', transition='C', out_start='00:01:08:03', out_end='00:01:12:19', src_start='00:00:11:08', src_end='00:00:15:24', attrs=[{'FROM CLIP NAME': ' KYLE_INTERVIEW.MOV'}, {'COMMENT': 'FINAL CUT PRO REEL: Card01_Kyle_Interview REPLACED BY: Card01Ky'}])
],
'header':
{
'FCM': 'NON-DROP FRAME',
'TITLE': 'TEST PAPEREDIT'
}
}
I only based it on my one sample EDL so this is most likely not a complete solution.