ChatGPT Exercise
In this exercise you will be given a piece of code that is considerably more complex than what you have worked with so far. Because the code has no comments explaining what it does, we will use ChatGPT to help us understand and modify it. You will often receive code, or find code online that you want to use, without fully understanding it. This is where ChatGPT comes in handy.
This exercise has several levels of difficulty, so try a few of the tasks below.
If ChatGPT says something that you do not understand, prompt it for further explanation.
import numpy as np
from sklearn.cluster import KMeans


def one_hot_encode(sequence):
    encoding = {'A': [1, 0, 0, 0], 'C': [0, 1, 0, 0], 'G': [0, 0, 1, 0], 'T': [0, 0, 0, 1]}
    return np.array([encoding[nucleotide] for nucleotide in sequence]).flatten()


def generate_random_dna_sequence(length):
    nucleotides = ['A', 'C', 'G', 'T']
    return ''.join(np.random.choice(nucleotides, size=length))


def main():
    sequences = [generate_random_dna_sequence(20) for _ in range(50)]
    encoded_data = [one_hot_encode(sequence) for sequence in sequences]
    kmeans = KMeans(n_clusters=2, random_state=42)
    clusters = kmeans.fit_predict(encoded_data)
    for i, sequence in enumerate(sequences):
        cluster_label = clusters[i]
        print(f"Sequence: {sequence}, Cluster: {cluster_label}")


if __name__ == "__main__":
    main()
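
One way to check an explanation you get from ChatGPT is to run a single function on a very small input and compare the result with what you were told. The sketch below is only an illustration and not part of the original exercise: it repeats one_hot_encode from the code above and applies it to the short, made-up sequence "ACGT".

```python
import numpy as np


def one_hot_encode(sequence):
    # Same mapping as in the exercise code: one 4-element vector per nucleotide.
    encoding = {'A': [1, 0, 0, 0], 'C': [0, 1, 0, 0], 'G': [0, 0, 1, 0], 'T': [0, 0, 0, 1]}
    return np.array([encoding[nucleotide] for nucleotide in sequence]).flatten()


# "ACGT" is an arbitrary example sequence chosen for this check.
encoded = one_hot_encode("ACGT")
print(encoded.shape)     # 4 nucleotides x 4 values each, flattened into one axis
print(encoded.tolist())  # the flattened 0/1 values as a plain Python list
```

If the printed shape or values differ from what ChatGPT described, that is a good point to ask it a follow-up question.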