Questions tagged [bioinformatics]

Use this tag only for programming questions related to bioinformatics. Other questions do not belong here, but might be on-topic at https://bioinformatics.stackexchange.com/. Please refer to the tag wiki for more information.

-1
votes
0 answers
23 views

Using a loop for making dotplots in R language

So I've seen something similar with protein sequences, but it's not quite the same. I have 10 FASTA files of DNA sequences. Using the seqinr library, I want to make all the possible dotplots for these 10 ...
3
votes
2 answers
33 views

How to transfer (sum up) the counts from a set of ranges to ranges that are enclosing those ranges?

I am working with sequencing data, but I think the problem applies to other range-value data types. I want to combine several experiments of read counts (values) from a set of DNA regions that have a ...
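A minimal Python sketch of one way to do this, assuming that "englobing" means "fully containing", that coordinates are closed intervals, and that ranges are plain `(chrom, start, end[, count])` tuples; the function name and tuple layout are hypothetical, not taken from the question. The small ranges are sorted per chromosome so each enclosing range only has to scan the candidates whose start falls inside it.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

def sum_into_enclosing(small, big):
    """Sum the counts of `small` ranges lying fully inside each `big` range.

    small: iterable of (chrom, start, end, count)
    big:   iterable of (chrom, start, end)
    Coordinates are treated as closed intervals (an assumption).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, count in small:
        by_chrom[chrom].append((start, end, count))
    starts = {}
    for chrom, ranges in by_chrom.items():
        ranges.sort()
        starts[chrom] = [r[0] for r in ranges]

    totals = []
    for chrom, bstart, bend in big:
        ranges = by_chrom.get(chrom, [])
        st = starts.get(chrom, [])
        # candidates: small ranges whose start lies in [bstart, bend]
        lo, hi = bisect_left(st, bstart), bisect_right(st, bend)
        # keep only those that also end inside the enclosing range
        totals.append(sum(c for s, e, c in ranges[lo:hi] if e <= bend))
    return totals
```

To combine several experiments, one option is to run it once per experiment and add the resulting lists element-wise.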
-1
votes
0 answers
32 views

Creating a topGO object

I have a gene list with their p-values in a .csv file. Sample data: ID P.Value 1555123_at 8.41E-06 212587_s_at 4.52E-05 205547_s_at 0.00010758 214156_at 0.0001204 206941_x_at 0.0001916 ...
0
votes
1 answer
25 views

Python for identifying minimal chromosomal regions among samples

I have multiple sample files (>20) that look like: chr startpos endpos 1 14930 818094 1 818161 31595422 2 35593931 35865807 2 35868158 104785784 And I would like to output ...
1
vote
1 answer
18 views

Removing labels from kinship2 pedigree plot

I'm plotting a pedigree for a wild population using kinship2, and trying to remove the labels for individuals. I've tried various arguments in par() and plot() but they either don't get rid of the ...
0
votes
0 answers
23 views

ValueError: X has 2 features per sample; expecting 10

Hello, I am a beginner in coding and have to do a gene analysis for my bioinformatics class. When I classify some genes, I get the following error: File "C:\Users\arthi\Anaconda3\lib\site-packages\...
2
votes
2 answers
58 views

How do I generate a recursive tree-like dictionary from a flat file (Gene Ontology OBO file)?

I'm trying to write code to parse a Gene Ontology (GO) OBO file and push the GO term IDs (e.g. GO:0003824) into a tree-like nested dictionary. The hierarchical GO structure in an OBO file is indicated ...
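A minimal stdlib sketch, assuming the hierarchy is taken only from `is_a:` lines inside `[Term]` stanzas (`part_of` and other relationships are ignored). It first collects each term's parents, then recursively nests children under parents. Two caveats: GO is a DAG, so a term with several parents appears under each of them, and obsolete terms (which carry no `is_a`) would show up as extra roots. The function names are hypothetical.

```python
from collections import defaultdict

def parse_is_a(lines):
    """Map each GO id in a [Term] stanza to the parent ids on its is_a lines."""
    parents = {}
    current, in_term = None, False
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # stanza header: only [Term] stanzas hold GO terms ([Typedef] is skipped)
            in_term, current = line == "[Term]", None
        elif in_term and line.startswith("id: "):
            current = line[4:]
            parents[current] = []
        elif current and line.startswith("is_a: "):
            # "is_a: GO:0003674 ! molecular_function" -> "GO:0003674"
            parents[current].append(line[6:].split()[0])
    return parents

def build_tree(parents):
    """Nest each term under its parents; roots are terms without is_a lines."""
    children = defaultdict(list)
    for term, ps in parents.items():
        for p in ps:
            children[p].append(term)

    def subtree(term):
        return {child: subtree(child) for child in sorted(children[term])}

    roots = sorted(t for t, ps in parents.items() if not ps)
    return {root: subtree(root) for root in roots}
```

Usage: `with open("go.obo") as fh: tree = build_tree(parse_is_a(fh))` (the filename is a placeholder).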
1
vote
1 answer
30 views

Get hgnc_symbol/gene_name from ensembl_gene_id

I have this code (it comes from here): library('biomaRt') mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl")) genes <- rownames(res) G_list <- getBM(filters="ensembl_gene_id", ...
1
vote
0 answers
25 views

Ode does not solve the equations properly

I'm completely new with R and equation solving and stuff like that, yet I have a university task which involves differential equations and R, and I cannot do it properly at all... ```{r} library(...
0
votes
0 answers
19 views

snakemake on slurm cluster - jobs not updating/submitting after checkpoints? (Error submitting jobscript (exit code 1):)

I have a pretty complicated pipeline I need to run on a slurm cluster, but am not able to get it to work. For some reason, the pipeline works for smaller jobs, but as soon as I add more input files ...
1
vote
1 answer
42 views

How to match unique elements in one column and list the corresponding values from second column

I have a file in the following format: ENSG00000087510 ENST00000201031 TFAP2C transcription_factor protein_coding Where each column is separated by a tab. As you can see, there are 5 columns....
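The excerpt is cut off, so this sketch assumes the goal is to group by the first column (gene ID) and list the distinct values of the second column (transcript ID) for each gene; the column indexes, function name, and filename are assumptions. A `dict` is used as an ordered set so each value is kept once, in first-seen order.

```python
import csv
from collections import defaultdict

def group_by_column(lines, key_col=0, value_col=1, delimiter="\t"):
    """Map each unique value in key_col to the distinct values seen in value_col."""
    groups = defaultdict(dict)  # inner dict keys act as an insertion-ordered set
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) > max(key_col, value_col):  # skips blank or short lines
            groups[row[key_col]][row[value_col]] = None
    return {key: list(values) for key, values in groups.items()}
```

Usage: `with open("genes.tsv", newline="") as fh:` then `for gene, ids in group_by_column(fh).items(): print(gene, ",".join(ids), sep="\t")`.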
0
votes
1 answer
35 views

Get genome from NCBI with biopython

Python newbie here. I want to download the genome sequence for genome (NC_007779.1) using the BioPython modules Entrez and SeqIO. So far, I have this code: from Bio import Entrez from Bio import SeqIO ...
1
vote
1 answer
24 views

Iterating through a series of GenBank genes and appending each gene's features to a list returns only the last gene

I'm having a problem with my code. I'm trying to iterate through the GenBank file's list of genes using BioPython. Here's what it looks like: class genBank: gbProtId = str() gbStart = int() ...
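The excerpt is truncated, so the exact bug is a guess, but a frequent cause of "the list only contains the last gene" is creating one object before the loop and appending that same object on every iteration: the list then holds N references to one object that ends up with the last gene's values. A minimal sketch of the fix, using a hypothetical `GenBankGene` dataclass (instance attributes, not class attributes) and assuming each feature arrives as a dict; with Biopython these values would typically come from `feature.qualifiers` and `feature.location`.

```python
from dataclasses import dataclass

@dataclass
class GenBankGene:
    """Hypothetical stand-in for the question's genBank class.

    Fields are instance attributes, so every object keeps its own values.
    """
    prot_id: str
    start: int
    end: int

def collect_genes(features):
    """features: iterable of dicts with 'protein_id', 'start', 'end' keys (assumed)."""
    genes = []
    for feat in features:
        # Create a NEW object on every iteration. Appending one reused object
        # N times stores N references to it, so every entry shows the last gene.
        genes.append(GenBankGene(feat["protein_id"], feat["start"], feat["end"]))
    return genes
```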
2
votes
2 answers
65 views

Python: how to find elements of a large (1 million elements) array in another larger array (600 million elements)

I have a very large file of dbSNP IDs with 1 million rows, each containing a single string, and another, larger file (.vcf) with 600 million rows, each containing 7-8 columns. I ...
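A common pattern for this is to load the smaller file into a Python `set` (average O(1) membership tests) and stream the large VCF line by line, so it is never held in memory. This sketch assumes the standard VCF column layout (ID is the third field) and that the ID column may hold several `;`-separated IDs; the function and file names are hypothetical.

```python
def load_ids(lines):
    """Read one dbSNP ID per line into a set for fast membership tests."""
    return {line.strip() for line in lines if line.strip()}

def filter_vcf_by_id(vcf_lines, wanted):
    """Yield VCF data lines whose ID column contains at least one wanted ID."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # meta-information and header lines
        fields = line.split("\t", 3)  # only the first three fields are needed
        if len(fields) > 2 and any(i in wanted for i in fields[2].strip().split(";")):
            yield line
```

Usage: `with open("ids.txt") as f: wanted = load_ids(f)`, then `with open("big.vcf") as vcf, open("hits.vcf", "w") as out: out.writelines(filter_vcf_by_id(vcf, wanted))`. For a bgzipped VCF, `gzip.open(path, "rt")` can replace `open`.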
0
votes
0 answers
18 views

Differentially methylated position analysis in a related sample? [closed]

I'm trying to figure out how to do a DMP analysis (using minfi dmpFinder) on a related sample (if it's even possible). Right now the code is: dmps = dmpFinder(mVals, pheno = targets1$X8yrfactor, type ...
0
votes
2 answers
24 views

Substring multifasta file using python

I am trying to extract positions 2 to 8 (the microRNA seeds) from each sequence in a multifasta file. To do this I have written a small Python script. The script works, but I couldn't write an output file....
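A minimal stdlib sketch, assuming positions 2 to 8 are 1-based and inclusive (the usual miRNA seed convention), which is the Python slice `seq[1:8]`. The output is written by opening a second file in `"w"` mode and writing formatted records to it; the filenames in the usage line are placeholders.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA lines; multi-line records are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the last record too

def seed_regions(records, start=2, end=8):
    """Cut 1-based, inclusive positions start..end from each sequence."""
    for header, seq in records:
        yield header, seq[start - 1:end]

def to_fasta(records):
    """Format (header, sequence) pairs as FASTA text, one record per string."""
    for header, seq in records:
        yield f">{header}\n{seq}\n"
```

Usage: `with open("mature.fa") as fin, open("seeds.fa", "w") as fout: fout.writelines(to_fasta(seed_regions(read_fasta(fin))))`.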
1
vote
3 answers
73 views

Convert sequence list to fasta for multiple files

I have thousands of files, which are a list of sequence names followed by their sequence, one individual per line, something like this: L.abdalai.LJAMM.14363.SanMartindeLosAndes ...
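A minimal sketch, assuming each line is a name, then whitespace, then the sequence (any internal whitespace in the sequence is dropped). The `*.txt` input pattern and the `.fasta` output suffix are hypothetical; `pathlib` handles looping over the thousands of files.

```python
from pathlib import Path

def list_to_fasta(lines):
    """Turn 'name<whitespace>sequence' lines into FASTA record strings."""
    for line in lines:
        parts = line.split(maxsplit=1)
        if len(parts) < 2:
            continue  # skip blank or name-only lines
        name, seq = parts
        yield f">{name}\n{''.join(seq.split())}\n"

def convert_all(directory, pattern="*.txt"):
    """Write <file>.fasta next to every file in directory matching pattern."""
    for path in Path(directory).glob(pattern):
        with path.open() as fin, path.with_suffix(".fasta").open("w") as fout:
            fout.writelines(list_to_fasta(fin))
```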
0
votes
0 answers
28 views

10-fold Cross Validation Lasso for Multiple Predictors in R

The data set that I am working with can be obtained online. It is a dataset that contains 569 observations and 32 variables. The first column is subject ID, the second is the label for instance class; ...
1
vote
1 answer
30 views

create and save fasta file from stringset [closed]

I have this DNA stringset, but I want to create a new file.fa containing this information. What is an efficient way to save these? I've tried to use write.fasta but it crashed. genes_seq <- A ...
1
vote
1 answer
48 views

How to print the first few records using SeqIO from Biopython

I have a fasta file that has several hundred records but I'm trying to return a table with just the first 20 records (record description, AA length, and name). My code is not working and I would ...
1
vote
0 answers
29 views

Snakemake refuses to unpack input function when rule A is a dependency of rule B, but accepts it when rule A is the final rule

I have a snakemake workflow for a metagenomics project. At a point in the workflow, I map DNA sequencing reads (either single or paired-end) to metagenome assemblies made by the same workflow. I made ...
1
vote
4 answers
57 views

Extract specific word

I have this list of objects: dput(head(annotations)) structure(list(X1 = c("KQ415659.1", "KQ415659.1", "KQ415659.1", "KQ415659.1", "KQ415659.1", "KQ415659.1"), X2 = c("Genbank", "...
0
votes
1 answer
26 views

Match column values from two different datasets (dbSNP IDs) and merge the datasets

I have two datasets, A and B. I wish to match the dbSNP IDs in dataset A with those in B. For each matched SNP ID, I want to fetch the other column values for that row and merge them with the columns present in dataset ...
0
votes
1 answer
18 views

vcf files modification before converting to BCF

I am adding new VCF files to a previously made BCF file in which the ID field of the VCF had been set to CHR:POS:POS:REF:ALT. How do I set the ID field in the VCF to CHR:POS:POS:REF:ALT? Thanks ...
3
votes
2 answers
79 views

In Julia, how does one convert a list of ASCII decimals to a string?

tldr: I want to convert [125, 119, 48, 126, 40] to output string, }w0~( To give a real life example, I am working with sequence data in fastq format (Here is a link to the library imported). cat ...
2
votes
1 answer
69 views

How to setup a snakemake rule whose target files are determined by file content?

I would like to split a SAM file into multiple SAM files according to the barcode info. The query barcodes are listed in another file. $ cat barcode.list ATGCATGC TTTTAAAA GGGGCCCC CGCGATGA ...
0
votes
1 answer
31 views

How do I delete element of XML in Python?

I am trying to remove some elements from an XML file with ElementTree. My code doesn't give any error, but it doesn't do what I want. I want to enter CHAIN_ID and RES_POSITION and when I look at the newly written ...
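A minimal sketch with the standard-library `xml.etree.ElementTree`, assuming a hypothetical layout where each removable element has `CHAIN_ID` and `RES_POSITION` child elements. Two things often make such code silently do nothing: `remove()` must be called on the element's direct parent (ElementTree elements keep no parent pointer), and changes only reach disk after `tree.write()`. Matches are collected first and removed afterwards, so the tree is not modified while it is being iterated.

```python
import xml.etree.ElementTree as ET

def remove_residues(root, chain_id, res_position):
    """Remove every element whose CHAIN_ID and RES_POSITION children match.

    Returns the number of elements removed.
    """
    matches = []
    # No parent pointers in ElementTree, so scan each parent's direct children
    for parent in root.iter():
        for child in parent:
            chain = (child.findtext("CHAIN_ID") or "").strip()
            pos = (child.findtext("RES_POSITION") or "").strip()
            if chain == chain_id and pos == str(res_position):
                matches.append((parent, child))
    # Delete only after the scan is complete
    for parent, child in matches:
        parent.remove(child)
    return len(matches)
```

Usage: `tree = ET.parse("input.xml")`, `remove_residues(tree.getroot(), "A", 12)`, then `tree.write("output.xml")` (filenames are placeholders).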
0
votes
1 answer
100 views

How to compute summary statistics for a 2WA post hoc test?

structure(list(AGI = c("ATCG01240", "ATCG01310", "ATMG00070"), aox2_0h__1 = c(15.79105291, 14.82652303, 14.70630068), aox2_0h__2 = c(16.06494674, 14.50610036, 14.52189807), aox2_0h__3 = c(14.64596287, ...
5
votes
0 answers
91 views

Regex to Match mRNA Sequences

In eukaryotes spliced mRNA has three key properties: mRNA starts with a start codon (ATG) The coding part of the mRNA ends with one of three stop codons (TAA/TAG/TGA) Immediately after the stop codon ...
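A minimal sketch covering only the first two properties listed (start codon, then the coding part ending at a stop codon); the third property is cut off in the excerpt and is not modelled. It assumes an uppercase DNA alphabet. Matching whole codons `(?:[ACGT]{3})` keeps the stop codon in frame, and the lazy `*?` makes the match end at the first in-frame stop.

```python
import re

# ATG, then as few whole codons as possible, then a stop codon.
# The lazy *? makes the match end at the FIRST in-frame stop codon.
CODING_RE = re.compile(r"ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)")

def coding_part(mrna):
    """Return the start-to-stop coding part if mrna starts with ATG, else None."""
    m = CODING_RE.match(mrna.upper())  # match() anchors at position 0
    return m.group(0) if m else None
```

Note that `TAA` inside `ATGCTAAGGTGA` is skipped because it is out of frame; the match runs to the in-frame `TGA`.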
1
vote
1 answer
55 views

Snakemake processing large workflow slow due to lengthy sequential checking of job completion? >100x speed reduction

I am working on a rather complex snakemake workflow that spawns several hundreds of thousands of jobs. Everything works... The workflow executes, DAG gets created (thanks to the new checkpoint ...
2
votes
1 answer
36 views

Create a matrix to show overlaps among multiple GRanges

I am trying to find a way to efficiently extract a matrix showing '0' or '1' when comparing different GRange objects. In my example: df <- data.frame(chr = c("chr1", "chr10"), start = c(1,4), end=...
0
votes
3 answers
34 views

Manipulating data in a tuple from dictionary keys

I have a list of tuples (variable name 'values') in the format (1, 'K', '-', 0.8878048780487805) (2, 'Y', '-', 0.32882882882882886) (3, 'E', '-', 0.7216494845360825) (4, 'Y', 'B', 0....
0
votes
1 answer
34 views

plot-Bamtools: where are the commands?

I would like to visualize my bam file statistics and was told (with no instruction) to use plot-bamstats. I downloaded bamstats with conda and tried to find a manual or help, but it keeps telling me ...
0
votes
0 answers
19 views

Clustermap pivot_table with coloured leaves (row) using seaborn

I have a dataframe with all pairs distances between (phage) virus-like genomes. distData = pd.read_csv("distances.tab",sep="\t") print(distData.head()) reference-ID query-ID ...
1
vote
1 answer
24 views

Automation for paired end reads with Cutadapt

I am trying to automate adapter trimming of my paired-end reads with cutadapt, but I keep encountering the same issue: the adapter is trimmed from the forward reads, but not from the reverse. Even after modifying the ...
0
votes
2 answers
49 views

Extracting lines of text depending on the len() of a particular column

I'm trying to write a simple script to extract particular data from a VCF file, which displays variants in genome sequences. The script needs to extract the header from the file, as well as SNVs ...
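Because the excerpt is truncated, this sketch assumes "SNV" means a one-base REF and every ALT allele being one base (multi-allelic SNVs such as `T,G` are kept; `.` and `*` are not counted as bases). It follows the standard VCF column layout, with REF as the 4th and ALT as the 5th field; the function name is hypothetical.

```python
def header_and_snvs(lines):
    """Yield VCF header lines plus data lines whose REF and ALT alleles are all one base."""
    for line in lines:
        if line.startswith("#"):
            yield line  # '##' meta lines and the '#CHROM' header
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # malformed line
        ref, alts = fields[3], fields[4].split(",")
        # len() == 1 on REF and on every ALT allele; '.' and '*' are not bases
        if len(ref) == 1 and all(len(a) == 1 and a not in {".", "*"} for a in alts):
            yield line
```

Usage: `with open("in.vcf") as fin, open("snvs.vcf", "w") as fout: fout.writelines(header_and_snvs(fin))`.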
0
votes
0 answers
32 views

loading a VCF file into memory, and then, reading it with pyvcf

I am new to Python and bioinformatics. I am trying to first load a VCF file into memory and then parse it with the pyvcf library, but I am getting this error: "IndexError: list index out of range...
3
votes
1 answer
61 views

Directly calling SeqIO.parse() in for loop works, but using it separately beforehand doesn't? Why?

In python this code, where I directly call the function SeqIO.parse() , runs fine: from Bio import SeqIO a = SeqIO.parse("a.fasta", "fasta") records = list(a) for asq in SeqIO.parse("a.fasta", "...
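`SeqIO.parse()` returns an iterator, which can be consumed only once. In the excerpt, `records = list(a)` already pulls every record out of `a`, so a later loop over `a` (presumably what fails in the truncated part) finds nothing. The behaviour can be reproduced with a plain generator; `parse_names` is a hypothetical stand-in for the parser, not Biopython code.

```python
def parse_names(lines):
    """Generator that, like SeqIO.parse, hands out records lazily, one at a time."""
    for line in lines:
        if line.startswith(">"):
            yield line[1:].strip()

fasta = [">seq1\n", "ACGT\n", ">seq2\n", "GGCC\n"]

a = parse_names(fasta)
records = list(a)             # list() consumes every record from the iterator
second_pass = [r for r in a]  # a is now exhausted, so this loop sees nothing

# Fix: loop over the saved list, or call the parser again for a fresh iterator
from_list = [r for r in records]
fresh = [r for r in parse_names(fasta)]
```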
0
votes
1 answer
29 views

Proportion across rows and specific columns in R

I want to get a proportion of pcr detection and create a new column using the following test dataset. I want a proportion of detection for each row, only in the columns pcr1 to pcr6. I want the other ...
0
votes
0 answers
18 views

How can I generate a log/log standard curve and interpolate the concentrations?

I carry out Enzyme Linked Immunosorbent Assay (ELISA) experiments, in which a log/log curve fit is required in order to calculate concentration from O.D. conc<-c(0, 15.6, 31.3, 62.5, 125, 250, ...
1
vote
4 answers
125 views

How can I count the frequency of letters?

I have a data like this >sp|Q96A73|P33MX_HUMAN Putative monooxygenase p33MONOX OS=Homo sapiens OX=9606 GN=KIAA1191 PE=1 SV=1 ...
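A minimal sketch with `collections.Counter`, assuming the goal is residue counts over all sequences in a FASTA file like the UniProt one shown: header lines (starting with `>`) are skipped and every other line's letters are counted, then converted to relative frequencies. Per-record counts would need the records separated first. Function names are hypothetical.

```python
from collections import Counter

def residue_counts(lines):
    """Count letters on sequence lines, skipping FASTA header lines."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            continue  # header, e.g. ">sp|Q96A73|P33MX_HUMAN ..."
        counts.update(line.strip())  # updating with a string counts each character
    return counts

def frequencies(counts):
    """Convert raw counts to fractions of the total (empty input gives {})."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {letter: n / total for letter, n in counts.items()}
```

Usage: `with open("proteins.fasta") as fh: print(frequencies(residue_counts(fh)))`.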
-3
votes
0 answers
47 views

IOError: [Errno 2] : no such file or directory

I'm totally new to programming. I try to practice as much as I can, but please be gentle, I'm a total noob. I need to run a program (Spladder) to solve a bioinformatics problem: find the alternative ...
0
votes
1 answer
27 views

R ggplot Create violin plot on ranked data by group

I'm a little bit stuck on ggplot trying to make a figure. I have a data frame with a length of 21685. Here is a small example of my data: x <- data.frame("Genes" = c("Gene_1","Gene_2","Gene_3"...
0
votes
0 answers
18 views

How does dMod R package handle volumes?

I'm writing an ODE pharmacokinetics model in R using the dMod package. I believe dMod has some sort of option for accounting for compartment volume, but I can't seem to find how to use it or associate ...
1
vote
0 answers
16 views

key 112 (char 'p') not in lookup table in Biostrings after calling getPromoterSeq (GenomicFeatures package)

Failed when trying to get promoter sequence from TxDb.Hsapiens.UCSC.hg19.knownGene by getPromoterSeq (GenomicFeatures package) library(TxDb.Hsapiens.UCSC.hg19.knownGene) library(BSgenome.Hsapiens....
0
votes
1 answer
34 views

Retrieve data from GenBank with Bio.Entrez module

I am trying to solve one of the Rosalind challenges and I can't seem to find a way to retrieve data, within a specific time frame. http://rosalind.info/problems/gbk/ Do/How Do I modify Entrez....
2
votes
0 answers
25 views

How to share objects between processes when using multiprocessing map and pool?

I have looked at other posts but I still don't know how to set an object attribute inside a multiprocessing pool. It works when I use threading but not with multiprocessing, and I don't know how to ...
0
votes
0 answers
36 views

pandas iterate rows: add rows based on column lists of items [duplicate]

Great people of the Internet! First post here, please be kind. I have a DF with a list of alleles separated by semicolons: Epitope MHC alleles 16 ...
2
votes
2 answers
48 views

Extracting gene sequences from FASTA File?

I have the following code that reads a FASTA file with 10 gene sequences and returns each sequence as a matrix. However, the code seems to miss the very last sequence, and I wonder why. file=...
0
votes
1 answer
70 views

problems with pairwise blast in biopython

I am trying to run a pairwise BLAST between two sequences within a Python script, using the Biopython BLAST tools. I have no problems running a BLAST against a local database by adding parameter db='...
