#!/usr/bin/env python
# coding: utf-8

# # ISB-CGC Community Notebooks
# Check out more notebooks at our [Community Notebooks Repository](https://github.com/isb-cgc/Community-Notebooks)!
# 
# ```
# Title:   How to convert 10X bams to fastq files using dsub
# Author:  David L Gibbs
# Created: 2019-08-07
# Purpose: Demonstrate how to make fastq files from 10X bams
# Notes:   
# ```

# # How to use dsub to convert 10X bam files to fastqs

# In[ ]:


# In this example, we'll use DataBiosphere's dsub. dsub makes it easy to run a job without having to spin up and shut down a VM by hand; that is all done automatically.
#
# https://github.com/DataBiosphere/dsub
#
# Docs for the genomics pipeline run: https://cloud.google.com/sdk/gcloud/reference/alpha/genomics/pipelines/run
#
# For this to work, the Google Genomics API must be enabled. To do that, select 'APIs & Services' from the main menu in the cloud console. The API is called genomics.googleapis.com.


# In[45]:


# First, install dsub from PyPI.
# It can also be installed directly from GitHub.

get_ipython().system('pip install dsub')


# In[10]:


# Check that dsub installed correctly

get_ipython().system('pip show dsub')


# In[13]:


# pip installs the dsub executable into ~/.local/bin, which is not on PATH yet,
# so we call it by its full path

get_ipython().system('~/.local/bin/dsub')
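
# As an alternative to typing the full path every time, the directory can be added to PATH for the current notebook session. The sketch below is one way to do that; `add_to_path` is a hypothetical helper (not part of dsub), and it assumes that shell commands launched from the notebook inherit the kernel's environment.

```python
import os
from pathlib import Path


def add_to_path(directory, env=None):
    """Prepend `directory` to PATH in `env` (os.environ by default) if it is missing."""
    env = os.environ if env is None else env
    directory = str(Path(directory).expanduser())
    # Split the current PATH and drop empty entries
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        parts.insert(0, directory)
    env["PATH"] = os.pathsep.join(parts)
    return env["PATH"]


# Make executables installed by `pip install --user` visible for this session
add_to_path("~/.local/bin")
```

# The change only lasts for the running kernel; the rest of this notebook keeps using the full path so that each cell works on its own.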


# In[16]:


# Hello world test

# Running with the local provider (--provider local)
# is a faster way to develop and debug the task

get_ipython().system(' ~/.local/bin/dsub     --provider local     --logging /tmp/dsub-test/logging/     --output OUT=/tmp/dsub-test/output/out.txt     --command \'echo "Hello World" > "${OUT}"\'     --wait')


# In[17]:


# Check the output
get_ipython().system('cat /tmp/dsub-test/output/out.txt')


# In[43]:


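# dsub can take a shell script. This one downloads the bamtofastq
# binary and runs it on the input BAM. INPUT_FILE and OUTPUT_FOLDER
# are set by the --input and --output-recursive flags passed to dsub.

cmd = '''
apt-get update;
apt-get --yes install wget;
wget http://cf.10xgenomics.com/misc/bamtofastq;
chmod +x bamtofastq;
OUTPUT_DIR="${OUTPUT_FOLDER}/fastq";
./bamtofastq "${INPUT_FILE}" "${OUTPUT_DIR}";
'''

# Write the script to disk; the with-block closes the file for us
with open('job.sh', 'w') as fout:
    fout.write(cmd)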

get_ipython().system('cat job.sh')
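
# Because the script only works if its variable names match the names given to dsub, a quick check before submitting can save a failed cloud run. The sketch below is an illustration only: `missing_variables` is a hypothetical helper, and the inline `script_text` stands in for the contents of job.sh.

```python
import re


def missing_variables(script_text, names):
    """Return the names that are never referenced as $NAME or ${NAME} in the script."""
    missing = []
    for name in names:
        escaped = re.escape(name)
        # Match ${NAME}, or $NAME followed by a non-word character or end of text
        pattern = re.compile(r"\$(\{" + escaped + r"\}|" + escaped + r"\b)")
        if not pattern.search(script_text):
            missing.append(name)
    return missing


# Stand-in for the text of job.sh
script_text = 'OUTPUT_DIR="$OUTPUT_FOLDER/fastq"; ./bamtofastq ${INPUT_FILE} ${OUTPUT_DIR};'

# Names we intend to pass with --input and --output-recursive
print(missing_variables(script_text, ["INPUT_FILE", "OUTPUT_FOLDER"]))
```

# An empty list means every name is used somewhere in the script; any name that is printed would be passed to dsub but never read by the job.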


# In[18]:


# dsub runs the job in an Ubuntu image by default,
# which is convenient because the bamtofastq binary runs on it


# In[44]:


get_ipython().system('~/.local/bin/dsub      --provider google-v2      --project cgc-05-0180      --zones "us-west1-*"      --script job.sh      --input INPUT_FILE="gs://cgc_bam_bucket_007/pbmc_1k_protein_v3_possorted_genome_bam.bam"      --output-recursive OUTPUT_FOLDER="gs://cgc_output/testout/"      --disk-size 200      --logging "gs://cgc_temp_02/testout"      --wait')
        
        
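# Note: bamtofastq refuses to write into a directory that already exists and fails with:
#   error: error creating output directory: "/mnt/data/output/gs/cruk_data_02". Does it already exist?
# This is why job.sh points bamtofastq at a new fastq subdirectory inside OUTPUT_FOLDER
# rather than at OUTPUT_FOLDER itself.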
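
# The long command line above is easy to mistype. As a sketch, it can be assembled from named pieces in Python and printed for review before running. `build_dsub_command` is a hypothetical helper, not part of dsub; it uses only the flags that appear in the cell above, and `shlex.join` requires Python 3.8 or later.

```python
import os
import shlex


def build_dsub_command(project, zones, script, inputs, recursive_outputs,
                       logging, disk_size_gb=200,
                       dsub_path=os.path.expanduser("~/.local/bin/dsub")):
    """Assemble a dsub google-v2 command as an argument list."""
    args = [dsub_path,
            "--provider", "google-v2",
            "--project", project,
            "--zones", zones,
            "--script", script]
    # Each input becomes an environment variable holding a file path in the job
    for name, uri in inputs.items():
        args += ["--input", f"{name}={uri}"]
    # Each recursive output becomes a directory that is copied out when the job ends
    for name, uri in recursive_outputs.items():
        args += ["--output-recursive", f"{name}={uri}"]
    args += ["--disk-size", str(disk_size_gb), "--logging", logging, "--wait"]
    return args


cmd_args = build_dsub_command(
    project="cgc-05-0180",
    zones="us-west1-*",
    script="job.sh",
    inputs={"INPUT_FILE": "gs://cgc_bam_bucket_007/pbmc_1k_protein_v3_possorted_genome_bam.bam"},
    recursive_outputs={"OUTPUT_FOLDER": "gs://cgc_output/testout/"},
    logging="gs://cgc_temp_02/testout",
)

# Show the shell-quoted command so it can be checked before submitting
print(shlex.join(cmd_args))
```

# The list can be passed to `subprocess.run(cmd_args, check=True)` to submit the job without going through a shell, which avoids quoting problems with values such as the zone wildcard.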


# In[ ]:


# That's it! We can check the output by listing the bucket folder given to --output-recursive:

# In[ ]:


get_ipython().system('gsutil ls gs://cgc_output/testout/')