Showing posts with label CompGeek stuff. Show all posts

Tuesday, June 9, 2009

Calling all Pythonistas!

Okay so I have a question:


I have a pile of sequences (200+) in FASTA format in a text file, e.g.:

>geneA
asdfdfasfsdfasdfsdfsdfsfdsfsdfsdfsdfasdfsdfsdfasdfasfsdfafasdfsdsdfsfs
afasdfasdfasdfsdfasdfdfafdasdfsdfafasdfasfafdsfasfafsdfsdfasfsfasdfsdf

>geneB
asdfdfasfsdfasdfsdfsdfsfdsfsdfsdfsdfasdfsdfsdfasdfasfsdfafasdfsdsdfsfs
afasdfasdfasdfsdfasdfdfafdasdfsdfafasdfasfafdsfasfafsdfsdfasfsfasdfsdf

>geneC
asdfdfasfsdfasdfsdfsdfsfdsfsdfsdfsdfasdfsdfsdfasdfasfsdfafasdfsdsdfsfs
afasdfasdfasdfsdfasdfdfafdasdfsdfafasdfasfafdsfasfafsdfsdfasfsfasdfsdf

I need to make it into columns like this:

geneA asdfsdfasdfsdfsdfsdfasdfsfasdfsdfafsdfsdfasdfasdfsdfasdfs
geneB asdfsdfasdfsdfsdfsdfasdfsfasdfsdfafsdfsdfasdfasdfsdfasdfs
geneC asdfsdfasdfsdfsdfsdfasdfsfasdfsdfafsdfsdfasdfasdfsdfasdfs

all in one line. I want the text file set up like this because I want to use Python to stuff it into an SQL table, only I don't know how to do it. I can concatenate the sequence part in Excel for one of the sequences, but that doesn't work for the 200+ other sequences I have. Everything I have found on FASTA and/or concatenation involves simple exercises or pulling down individual FASTA sequences from GenBank, which didn't help me much.

So what I want to know is: how do I get the sequence name in one column and the full sequence in the other, so that I can make an output file that I can open with a Python script and then stuff into an SQL table? Any ideas?
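For what it's worth, here is a rough sketch of how the two steps might look in Python, not a definitive answer. It assumes every record starts with a ">" line and that the sequence may be wrapped over several lines. The function names (read_fasta, fasta_to_columns, load_columns), the tab separator, the table and column names, and the choice of SQLite (via the standard sqlite3 module) are all my own assumptions for illustration; another database would need its own connection code.

```python
import sqlite3


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip the blank lines between records
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)  # finish the previous record
            name, chunks = line[1:].strip(), []  # whole header line is the name
        elif name is not None:
            chunks.append(line)  # glue wrapped sequence lines back together
    if name is not None:
        yield name, "".join(chunks)  # last record in the file


def fasta_to_columns(fasta_path, out_path):
    """Write one 'name<TAB>sequence' line per record (tab is an assumption)."""
    with open(fasta_path) as src, open(out_path, "w") as dst:
        for name, seq in read_fasta(src):
            dst.write(f"{name}\t{seq}\n")


def load_columns(columns_path, db_path):
    """Load the two-column file into a hypothetical 'sequences' table in SQLite."""
    rows = []
    with open(columns_path) as src:
        for line in src:
            line = line.rstrip("\n")
            if line:
                name, seq = line.split("\t", 1)
                rows.append((name, seq))
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS sequences "
                "(name TEXT PRIMARY KEY, sequence TEXT NOT NULL)"
            )
            conn.executemany(
                "INSERT INTO sequences (name, sequence) VALUES (?, ?)", rows
            )
    finally:
        conn.close()
    return len(rows)
```

With made-up file names, calling fasta_to_columns("genes.fasta", "genes.tsv") and then load_columns("genes.tsv", "genes.db") would do the whole round trip. Because name is declared as the primary key here, a duplicate gene name would make the insert fail rather than silently overwrite a row.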

E.