#!/usr/bin/env python
# coding: utf-8

# # ISB-CGC Community Notebooks
# Check out more notebooks at our [Community Notebooks Repository](https://github.com/isb-cgc/Community-Notebooks)!
#
# ```
# Title:   How to convert 10X BAMs to FASTQ files using dsub
# Author:  David L Gibbs
# Created: 2019-08-07
# Purpose: Demonstrate how to make FASTQ files from 10X BAMs
# Notes:
# ```

# # How to use dsub to convert 10X BAM files to FASTQs

# In[ ]:


# In this example, we'll be using DataBiosphere's dsub.
# dsub makes it easy to run a job without having to spin up
# and shut down a VM yourself; that is all done automatically.
#
# https://github.com/DataBiosphere/dsub
#
# Docs for the genomics pipeline run:
# https://cloud.google.com/sdk/gcloud/reference/alpha/genomics/pipelines/run
#
# For this to work, the Google Genomics API must be enabled.
# To do that, from the main menu in the Cloud Console,
# select 'APIs & Services'. The API is called: genomics.googleapis.com.


# In[45]:


# First, install dsub.
# It's also possible to install it directly from GitHub.
get_ipython().system('pip install dsub')


# In[10]:


# Let's see if it installed OK.
get_ipython().system('pip show dsub')


# In[13]:


# pip installs executables into ~/.local/bin, which is not on the PATH yet,
# so we call dsub by its full path.
get_ipython().system('~/.local/bin/dsub')


# In[16]:


# Hello World test.
# Using the local provider (--provider local)
# is a faster way to develop the task.
get_ipython().system(' ~/.local/bin/dsub --provider local --logging /tmp/dsub-test/logging/ --output OUT=/tmp/dsub-test/output/out.txt --command \'echo "Hello World" > "${OUT}"\' --wait')


# In[17]:


# And we can check the output.
get_ipython().system('cat /tmp/dsub-test/output/out.txt')


# In[43]:


# dsub can also take a shell script.
cmd = '''
apt-get update;
apt-get --yes install wget;
wget http://cf.10xgenomics.com/misc/bamtofastq;
chmod +x bamtofastq;
OUTPUT_DIR="$OUTPUT_FOLDER/fastq";
./bamtofastq ${INPUT_FILE} ${OUTPUT_DIR};
'''

# Write the script to disk so dsub can pick it up with --script.
with open('job.sh', 'w') as fout:
    fout.write(cmd)

get_ipython().system('cat job.sh')


# In[18]:


# By default, dsub uses an Ubuntu image,
# which is great, because bamtofastq is compatible with it.


# In[44]:


get_ipython().system('~/.local/bin/dsub --provider google-v2 --project cgc-05-0180 --zones "us-west1-*" --script job.sh --input INPUT_FILE="gs://cgc_bam_bucket_007/pbmc_1k_protein_v3_possorted_genome_bam.bam" --output-recursive OUTPUT_FOLDER="gs://cgc_output/testout/" --disk-size 200 --logging "gs://cgc_temp_02/testout" --wait')

# error: error creating output directory: "/mnt/data/output/gs/cruk_data_02". Does it already exist?


# That's it! We can check the output by listing the folder
# passed to --output-recursive:

# In[ ]:


get_ipython().system('gsutil ls gs://cgc_output/testout/')
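

# In[ ]:


# The long dsub command above is easy to mistype. As a minimal sketch, the
# same command line could be assembled from named parameters. Note that
# build_dsub_command is a hypothetical helper written for this notebook, not
# part of dsub; it only uses the flags and values already shown above.

```python
import shlex


def build_dsub_command(project, zones, script, input_file, output_folder,
                       logging, disk_size_gb=200,
                       dsub_path="~/.local/bin/dsub"):
    """Return the dsub invocation as a single shell command string.

    Hypothetical convenience wrapper: it only formats a string and does
    not run dsub or check that the buckets exist.
    """
    args = [
        "--provider", "google-v2",
        "--project", project,
        "--zones", zones,
        "--script", script,
        "--input", f"INPUT_FILE={input_file}",
        "--output-recursive", f"OUTPUT_FOLDER={output_folder}",
        "--disk-size", str(disk_size_gb),
        "--logging", logging,
        "--wait",
    ]
    # Quote each argument so characters such as "*" reach dsub unchanged.
    # dsub_path is left unquoted so the shell can still expand "~".
    return dsub_path + " " + " ".join(shlex.quote(a) for a in args)


command = build_dsub_command(
    project="cgc-05-0180",
    zones="us-west1-*",
    script="job.sh",
    input_file="gs://cgc_bam_bucket_007/pbmc_1k_protein_v3_possorted_genome_bam.bam",
    output_folder="gs://cgc_output/testout/",
    logging="gs://cgc_temp_02/testout",
)
print(command)
```

# In the notebook, the resulting string could then be run the same way as
# the other cells, for example with get_ipython().system(command).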