Spoken-Digit-Recognizer
GAN
Target: train a generator to draw spectrograms of spoken English digits, then convert them into audio files
Corresponding source code link
GAN.ipynb
Concept
Train a generator to draw spectrograms, then invert each spectrogram back to the time domain (frequency domain → time domain)
Save the time-domain signal as a .wav file
Use the Short-Time Fourier Transform (STFT) as the target spectrogram representation
(because librosa's inverse-CQT transform is buggy in the current version)
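The choice of STFT over CQT matters because the STFT round trip is near-lossless. A minimal sketch with SciPy, assuming a toy sine wave, sample rate, and FFT size that are illustrative and not the project's actual settings:

```python
import numpy as np
from scipy.signal import stft, istft

# Toy 1-second sine wave standing in for a spoken-digit clip
# (sample rate and frequency are illustrative assumptions)
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t)

# Forward STFT: time domain -> complex spectrogram
_, _, Z = stft(x, fs=sr, nperseg=256)

# Inverse STFT: complex spectrogram -> time domain
_, x_rec = istft(Z, fs=sr, nperseg=256)

# With the full complex spectrogram the round trip is near-lossless
err = np.max(np.abs(x - x_rec[:len(x)]))
```

Note that a generator trained on magnitude spectrograms produces no phase, so in practice the inversion would also need a phase-estimation step (e.g. Griffin-Lim) before the audio sounds right.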
Workflow
Load the .npy files (already built in the Spectrogram+CNN stage)
Use the English dataset: 2,400 samples (pannous data)
Normalize the data, one-hot encode the labels, etc.
Build the GAN (generator model and discriminator model)
Train the GAN
Have the trained generator draw spectrograms
Apply the inverse STFT to those spectrograms and save the results as audio files
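The data-preparation and export steps above can be sketched as follows; the array shapes, sample rate, and filename are illustrative assumptions (the real spectrograms come from the project's .npy files, and the real waveform from the inverse STFT):

```python
import numpy as np
from scipy.io import wavfile

# Steps 1-3: load spectrograms and labels (random stand-ins for the
# 2400-sample pannous data; shapes are illustrative assumptions)
specs = np.abs(np.random.randn(2400, 128, 32)).astype(np.float32)
labels = np.random.randint(0, 10, size=2400)

# Normalize to [-1, 1], matching a tanh-output generator
specs = specs / specs.max() * 2.0 - 1.0

# One-hot encode the ten digit classes
onehot = np.eye(10, dtype=np.float32)[labels]

# Step 7: save a generated time-domain signal as a 16-bit PCM .wav
# (x stands in for the inverse-STFT output of a generated spectrogram)
sr = 8000
x = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
pcm = np.int16(np.clip(x, -1.0, 1.0) * 32767)
wavfile.write('generate.wav', sr, pcm)
```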
Result
An example STFT spectrogram drawn by the generator is shown below.
It has picked up some characteristic spectrogram features (arc-shaped curves resembling speech harmonics), but it is still very noisy.
The audio obtained from the inverse STFT is played below.
In [1]:
import IPython.display as ipd
ipd.Audio('resources/generate.wav')
Out[1]:
The converted audio is clearly just noise. We set a stopping point for this goal here and record it as a failure.
Reference
https://github.com/Zackory/Keras-MNIST-GAN