Gus ...
VCF-centered test. Which Box-hosted VCF should we use to compare notes? Perhaps one public VCF and one Box-hosted VCF would do it.
https://samtools.github.io/hts-specs/VCFv4.3.pdf
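To make comparing notes on a VCF concrete, here is a minimal sketch of reading the eight fixed columns the spec defines (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). It uses only the Python standard library. The names VcfRecord, parse_info, and parse_vcf_lines are hypothetical, the sample line follows the example in the spec, and genotype columns and header metadata are ignored. It is a sketch, not a validating parser.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Optional, Union


class VcfRecord(NamedTuple):
    """The eight fixed VCF columns (hypothetical container for this sketch)."""
    chrom: str
    pos: int
    id: str
    ref: str
    alt: List[str]
    qual: Optional[float]
    filter: str
    info: Dict[str, Union[str, bool]]


def parse_info(field: str) -> Dict[str, Union[str, bool]]:
    """Split the INFO column into key/value pairs; bare keys become True flags."""
    if field == ".":
        return {}
    info: Dict[str, Union[str, bool]] = {}
    for item in field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def parse_vcf_lines(lines: Iterable[str]) -> Iterator[VcfRecord]:
    """Yield one record per data line, skipping blank lines and '#' header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        chrom, pos, vid, ref, alt, qual, flt, info = line.split("\t")[:8]
        yield VcfRecord(
            chrom=chrom,
            pos=int(pos),
            id=vid,
            ref=ref,
            alt=alt.split(","),  # multiple ALT alleles are comma-separated
            qual=None if qual == "." else float(qual),
            filter=flt,
            info=parse_info(info),
        )


# Sample input modeled on the example in the VCF spec.
sample = [
    "##fileformat=VCFv4.3\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2\n",
]
records = list(parse_vcf_lines(sample))
```

The same function could be pointed at a public VCF and a Box-hosted one by passing an open file handle for each, since it only needs an iterable of lines.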
Google Nucleus: library for reading and writing common genomics file formats
https://github.com/google/nucleus
Motivating example
What are protocol buffers?
https://developers.google.com/protocol-buffers
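As a partial answer: protocol buffers are a compact binary serialization format where each field is written as a tag (field number plus wire type) followed by its value. A minimal sketch of the base-128 varint encoding used for integer fields, in plain Python with no protobuf library. The function names are hypothetical, and field number 1 is only an illustration. Real projects would generate classes from a .proto file instead of encoding by hand.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode a varint starting at offset; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode one integer field: tag (field_number << 3 | wire type 0), then value."""
    tag = (field_number << 3) | 0  # wire type 0 means varint
    return encode_varint(tag) + encode_varint(value)


# Field 1 carrying the value 150, the example used in the protobuf encoding docs.
message = encode_varint_field(1, 150)
tag, pos = decode_varint(message)
value, _ = decode_varint(message, pos)
```

Here message is the three bytes 08 96 01: one tag byte and a two-byte value, which is why the format is attractive for large genomics records compared with text.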
Possible ETL workflow for Box to GCP? See figure below.
TFRecords (https://www.tensorflow.org/tutorials/load_data/tfrecord): suitable for large datasets, and they give access to the tf.data API (https://www.tensorflow.org/guide/data), whose input pipelines help prevent GPU starvation. Compatible with the Keras and Estimator APIs.
Create a BigQuery dataset programmatically (https://www.tensorflow.org/io/tutorials/bigquery), then import it as data frames in the language of choice. Example with pandas for Python (https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas).
Jeya:
https://episphere.github.io/vcf
(Jonas, Praful)
A Matrix laboratory is emerging with TF.
TF-backed protobuffer, pythonic js, Lee Mason ...
https://gitter.im/episphere/tensorFlowJS