Questions tagged [bioinformatics]

Use this tag only for programming-related questions related to Bioinformatics. Other questions do not belong here, but might be on-topic at https://bioinformatics.stackexchange.com/. Please refer to the tag wiki for more information.

-1 votes, 0 answers, 15 views

What algorithm would be optimal for a mitochondrial dynamics computer vision project?

I'm doing a project on mitochondrial fission and fusion and I want to build a computer vision program that can tell how divided mitochondria are from microscope images. Here are example images with ...
1 vote, 4 answers, 42 views

Populating a dictionary with multiple lines as one string

I have a FASTA-format file with multi-line entries, which I want to break into pieces and use to populate a dictionary. >piece_1 Lorem ipsum dolor sit amet consectetur adipiscing elit. ...
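A minimal sketch of one way to build such a dictionary, assuming standard FASTA layout (a `>` header followed by one or more sequence lines). The helper name `read_fasta` is hypothetical. Wrapped lines are joined with no separator, which suits sequence data; use `" ".join` instead if the lines are prose like the example.

```python
def read_fasta(path):
    """Map each FASTA header (without '>') to its sequence, joining wrapped lines."""
    records = {}
    header = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                if header is not None:
                    records[header] = "".join(chunks)  # close the previous record
                header = line[1:].strip()
                chunks = []
            else:
                chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)  # the last record has no following header
    return records
```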
1 vote, 1 answer, 31 views

snakemake PICARD merge bam files

I am new to Snakemake and I have an issue using Picard MergeSamFiles to merge BAM files into one BAM file. I would like to merge 1_sorted.bam 2_sorted.bam ...10_sorted.bam into one bam file ...
1 vote, 1 answer, 15 views

Executing checkpoint intermediate Commands

I have been running into some issues with Snakemake running intermediate rules required by a checkpoint. After attempting to troubleshoot this issue, I believe the problem lies within the ...
0 votes, 2 answers, 28 views

How to subset data based on more than 1 condition

I am looking to subset a gene dataset, selecting genes based on their machine learning class (labels 0-2) and their loci/rssnp ID. My data looks like this: Chr Start End rssnp1 Type ...
0 votes, 1 answer, 20 views

Using snakemake for paired-end bwa alignment

I am new to Snakemake and have a simple problem with read mapping in Snakemake. I have pairs of _1.fastq.gz and _2.fastq.gz files and I would like to do paired-end mapping for around 10 pairs of ...
0 votes, 0 answers, 7 views

How to get majority population assignment by ADMIXTURE output in text file

I am running ADMIXTURE to figure out population structure for a couple of species that are pretty admixed. I can see how individuals assign using the structure-like plots, and I need to make a file ...
1 vote, 1 answer, 43 views

How to order data by duplicates in R?

I have a dataset that I am trying to order by the duplicate IDs in one column (the rssnp1 column), but online I can only find functions that remove duplicates. My data looks like this: Chr Start End ...
0 votes, 2 answers, 44 views

How to match two .csv files and write to a third file, replacing data in file 2 with data from file 1

I have two files (CSV and text). File 1 has two columns, one with the gene ID and one with the gene name; file 2 has many columns in which part of the string is a gene ID, e.g. gene id(genome) or pseudo gene id(...
1 vote, 1 answer, 32 views

How to extract data from one file based on the IDs for another

I am running code where I take one gene from a gene list, find its Sentinel gene in Data1, and then select all rows whose rssnp1 ID matches that of the gene at its Sentinel row. However, currently my ...
0 votes, 1 answer, 34 views

Snakemake Getting Checkpoint and Aggregate Function to Work

I'm having an issue getting my Snakemake aggregate command to work. My hope is to take a given GTF file, look for separate regions within the GTF and, if found, write these regions to a separate file. Thus,...
-5 votes, 0 answers, 39 views

extract matching rows from a matrix, average them, and produce a new matrix

I have two matrices which contain unique rows. Also, I have another matrix which doesn't have unique rows. I have to select unique rows from the second matrix with respect to the first matrix and if ...
0 votes, 0 answers, 38 views

Designing an efficient tool for query and comparison between #n-queries

I am setting up a tool that compares samples. Each sample has 55,000 'genes' (the biology isn't important). I perform an initial query for samples that match a 'tissue type' then I want to know which ...
0 votes, 1 answer, 30 views

Building stacked bar plot for specific data organisation

I would like to build a stacked bar plot with the x-axis representing the number of genomes (or just organisms) and the y-axis representing the number of gene clusters, which occur in exact number of ...
0 votes, 0 answers, 44 views

How to translate this Python code into R? [closed]

I have a problem with this code. Below is the code to extract the AA sequences that contain the matching conserved domain from the downloaded GenBank file. Normally I worked in Python and doesn'...
1 vote, 0 answers, 25 views

Error in if loop while fetching the data from file [duplicate]

I am doing an analysis where I need to add transcript IDs. This is not actual data, only for reference. query file = structure(list(geneSymbol = c("A", "AR", "A1", "A12", "A7", "A9A"), chr = c("chr1"...
0 votes, 2 answers, 56 views

Extract molecules in order from SDF file according to IDs given in another file

I have an SDFile containing thousands of molecules and I need to extract from it the molecules whose IDs are given in a simple one-column file. For example, the SDF file1.sdf looks like this: ...
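A sketch in plain Python, assuming each molecule's ID is on the record's first (title) line and that records end with the standard `$$$$` delimiter; `split_sdf` and `extract_in_order` are hypothetical helpers. Records are indexed by ID first, so the output follows the order of the ID file rather than the order in the SDF.

```python
def split_sdf(text):
    """Split SDF text into records, each ending with its '$$$$' terminator line."""
    records, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "$$$$":
            records.append("".join(current))
            current = []
    if "".join(current).strip():
        records.append("".join(current))  # keep a final record that lacks '$$$$'
    return records


def extract_in_order(sdf_text, ids):
    """Return records whose title line matches an ID, in the order of `ids`."""
    by_id = {}
    for record in split_sdf(sdf_text):
        title = record.splitlines()[0].strip()
        by_id.setdefault(title, record)  # keep the first record for a repeated ID
    return [by_id[i] for i in ids if i in by_id]  # IDs not in the SDF are skipped
```

The IDs can be read with `[line.strip() for line in open(id_file) if line.strip()]`, and the result written back with `"".join(records)`.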
0 votes, 2 answers, 29 views

How to create DESeqDataSetFromMatrix from 2 vectors of numbers?

I have two datasets, each having the form: Gene1Name, 234 Gene2Name, 445 Gene3Name, 23 ... GeneNName, 554 The gene names are identical for each of the 2 datasets. The numbers on the second column ...
0 votes, 1 answer, 44 views

Duplicate Strings with Ambiguity

I have a large (5-10 million) set of strings with the restricted alphabet of nucleotide symbols (A,T,C, and G) along with a wildcard symbol N. Each string has an integer associated with it. I want ...
6 votes, 2 answers, 104 views

How to combine similar strings showing most common characters

In a dataframe I have a list of strings that are similar to each other but differ by a certain percentage. I would like to combine these common strings into a single string that has the most ...
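If the strings are already aligned to the same length (an assumption; unaligned strings would need a multiple alignment first), a per-position majority vote gives the combined string. `consensus` below is a hypothetical helper built on `collections.Counter`.

```python
from collections import Counter


def consensus(strings):
    """Most common character at each position; ties go to the character seen first."""
    if not strings:
        return ""
    length = len(strings[0])
    if any(len(s) != length for s in strings):
        raise ValueError("all strings must have the same length")
    # zip(*strings) walks the strings column by column
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*strings))
```

For example, `consensus(["ACGT", "ACGA", "TCGT"])` returns `"ACGT"`.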
-1 votes, 0 answers, 81 views

Manhattan plot and linkage disequilibrium heatmap on one plot

I would like to plot something similar to the image below. I need to integrate Manhattan plot results and linkage disequilibrium heatmap results on one plot. Does anyone have any idea how to go ...
-1 votes, 1 answer, 62 views

how to share Anaconda packages with the user of an HTTP server

I am an Ubuntu user and I run many scripts written in Python 3, which was installed through Anaconda. All the modules I need, e.g. Biopython, were installed there previously. However, I can't ...
0 votes, 2 answers, 70 views

Is it possible to partially unzip a .vcf file?

I have a ~300 GB zipped vcf file (.vcf.gz) which contains the genomes of about 700 dogs. I am only interested in a few of these dogs and I do not have enough space to unzip the whole file at this time,...
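Because gzip data can be decompressed as a stream, the file never has to be unpacked to disk. A minimal sketch, assuming a standard VCF with the sample names in the `#CHROM` header line; `subset_vcf_samples` is a hypothetical helper. The output is plain gzip rather than BGZF, so it cannot be indexed with tabix, and INFO fields such as allele counts are not recomputed; `bcftools view -s` with the wanted sample names is the usual dedicated tool for this.

```python
import gzip


def subset_vcf_samples(in_path, out_path, keep):
    """Stream a gzipped VCF and write only the chosen sample columns, gzipped."""
    keep = set(keep)
    columns = None
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for line in src:
            if line.startswith("##"):
                dst.write(line)  # meta-information lines are copied unchanged
                continue
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                missing = keep - set(fields[9:])
                if missing:
                    raise ValueError(f"samples not in VCF: {sorted(missing)}")
                # the 9 fixed columns (CHROM ... FORMAT) plus the requested samples
                columns = list(range(9)) + [
                    i for i, name in enumerate(fields) if i >= 9 and name in keep
                ]
            if columns is None:
                raise ValueError("data line found before the #CHROM header")
            dst.write("\t".join(fields[i] for i in columns) + "\n")
```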
0 votes, 1 answer, 65 views

How can I stop a while loop and start another while loop where the previous one started

I am trying to count the letters in the list by skipping 1 letter and grouping them in threes until I find "t a c" in the data frame, and then I want to group the rest of them in threes by skipping 3 ...
0 votes, 0 answers, 33 views

How to deal with gaps during translation with biopython

I need to translate aligned DNA sequences with Biopython: from Bio.Seq import Seq from Bio.Alphabet import generic_dna seq = Seq("tt-aaaatg") seq.translate() Running this script raises an error: Bio....
1 vote, 1 answer, 40 views

Compare two columns of one file with the same columns in another file and fetch the matches (large dataset, 14 GB)

I have file1 of 650,000 rows with two columns, "Chr" and "Pos". I want to compare this file with the dbSNP data dump (file2) and match on the Chr and Pos columns present in the dbSNP dump. Once matched, respective ...
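A sketch of a common approach for a file this size: keep the 650,000 query positions in a set and stream the 14 GB dump line by line, so only the small file is held in memory. The column indices, the tab delimiter, and the `chr` prefix normalisation are assumptions about both files; the function names are hypothetical.

```python
import csv


def normalise_chrom(name):
    """Treat 'chr1' and '1' as the same chromosome (an assumption about both files)."""
    return name[3:] if name.lower().startswith("chr") else name


def load_positions(path, chr_col=0, pos_col=1, has_header=True):
    """Read the small query file into a set of (chromosome, position) pairs."""
    positions = set()
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        if has_header:
            next(reader, None)  # skip the "Chr  Pos" header row
        for row in reader:
            if row:
                positions.add((normalise_chrom(row[chr_col]), int(row[pos_col])))
    return positions


def stream_matches(dbsnp_path, positions, chr_col=0, pos_col=1):
    """Yield rows of the large file whose (chromosome, position) is in `positions`."""
    with open(dbsnp_path) as handle:
        for line in handle:  # one line at a time, so the big file is never loaded whole
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#") or len(fields) <= max(chr_col, pos_col):
                continue  # comment lines and short or blank lines
            if not fields[pos_col].isdigit():
                continue  # e.g. an uncommented header row
            if (normalise_chrom(fields[chr_col]), int(fields[pos_col])) in positions:
                yield fields
```

Set membership is a constant-time lookup on average, so the run time is dominated by reading the large file once.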
1 vote, 0 answers, 22 views

Error from getSRAfile function in SRAdb package

I am trying to download RNASeq data from NCBI SRA repository using SRAdb package. I consistently get the following error: getSRAfile( in_acc = c("SRR2033366", "SRR2033446"), sra_con = sra_con, ...
0 votes, 2 answers, 45 views

Single linkage clustering of edit distance matrix with distance threshold stopping criterion

I'm trying to assign flat, single-linkage clusters to sequence IDs separated by an edit distance < n, given a square distance matrix. I believe scipy.cluster.hierarchy.fclusterdata() with criterion=...
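Flat single-linkage clusters at a threshold are the connected components of the graph that links every pair whose distance is below the threshold, so they can also be computed directly with a union-find structure. A sketch, assuming `dist` is a square list-of-lists distance matrix in the same order as `ids`; the function name is hypothetical.

```python
def single_linkage_clusters(ids, dist, threshold):
    """Flat single-linkage clusters: link i and j when dist[i][j] < threshold.

    Returns a dict mapping each id to a cluster label 1, 2, ... (in input order).
    """
    parent = list(range(len(ids)))

    def find(i):
        # follow parent pointers to the root, halving the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    n = len(ids)
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] < threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i  # merge the two components

    labels, result = {}, {}
    for i, name in enumerate(ids):
        root = find(i)
        labels.setdefault(root, len(labels) + 1)
        result[name] = labels[root]
    return result
```

Chaining is what makes this single linkage: two IDs further apart than the threshold still share a cluster if a path of close pairs connects them.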
2 votes, 4 answers, 120 views

Remove character from the middle of a string

I have a SAM file with an RX: field containing 12 bases separated in the middle by a hyphen, e.g. RX:Z:CTGTGC-TCGTAA. I want to remove the hyphen from this field, but I can't simply remove all hyphens from ...
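One option is to rewrite only the `RX:Z:` tag with an anchored regular expression, so hyphens elsewhere in the line are left alone. A sketch, assuming the tag is tab-separated like other SAM optional fields; `strip_rx_hyphen` is a hypothetical name.

```python
import re

# A tab, the RX:Z: tag, the bases before the first hyphen, the hyphen, and the rest
# of the field up to the next tab.
RX_TAG = re.compile(r"(\tRX:Z:)([^\t-]*)-([^\t]*)")


def strip_rx_hyphen(sam_line):
    """Remove the hyphen inside the RX:Z: field only; the rest of the line is unchanged."""
    return RX_TAG.sub(r"\1\2\3", sam_line, count=1)
```

Applied line by line to a SAM file, lines without an `RX:Z:` field (such as typical header lines) pass through unchanged.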
0 votes, 1 answer, 46 views

traceback in global sequence alignment

I am having trouble tracing back the global sequence alignment. My first sequence is ATTGCGCGCAT and the second sequence is ATGCTTAACCA. The traceback result should be A T T G C _ _ _ G C G C A T A _ T ...
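For reference, a compact Needleman-Wunsch sketch with traceback. The scores (match +1, mismatch -1, linear gap -1) are assumptions, since the question's scoring scheme is not shown, so the alignment it returns may differ from the expected one above; gaps are written as `-` rather than `_`.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (aligned_a, aligned_b, score)."""

    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # diagonal
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Traceback from the bottom-right cell, preferring diagonal, then up, then left.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    # The path was collected from the end backwards, so reverse it.
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]
```

When several paths tie, the preference order in the traceback decides which optimal alignment is reported.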
0 votes, 1 answer, 41 views

Hamming distance matrix for multiple sequences

I have a FASTA file with IDs and corresponding DNA sequences, which I have parsed and stored in a dictionary. I now need to write a Python program to compute the pairwise Hamming distance matrix for ALL ...
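A minimal sketch, assuming all sequences have the same length (Hamming distance is undefined otherwise) and are stored in a dict of ID to sequence as described; the function names are hypothetical.

```python
def hamming(s1, s2):
    """Number of positions at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


def hamming_matrix(seqs):
    """Pairwise Hamming distances for an {id: sequence} dict, returned as (ids, matrix)."""
    ids = list(seqs)
    matrix = [[0] * len(ids) for _ in ids]
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            d = hamming(seqs[ids[i]], seqs[ids[j]])
            matrix[i][j] = matrix[j][i] = d  # symmetric, so compute each pair once
    return ids, matrix
```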
0 votes, 1 answer, 82 views

Speeding up a nested for loop using an apply function

I am not good at using the apply family of functions. I want to use them instead of the following nested loop, which iterates 9076 x 9076 times, to speed up the computation. Minimal ...
0 votes, 0 answers, 16 views

Why is my Shiny app not getting file content for analysis?

Hi, I coded a Shiny app that analyses the similarity of two protein or DNA sequences using the dotPlot function in the seqinr library. The app opens perfectly, but when I upload the fasta files for the ...
0 votes, 2 answers, 28 views

How can I integrate a Shiny R function with the UI?

I am trying to develop my first small Shiny R app, but its UI does not integrate properly with its function. When the files are browsed after running the app, the function is not executed. ...
3 votes, 1 answer, 52 views

How to avoid overlap of labels in a plot using geom_label_repel?

I am trying to make a volcano plot with huge data. Showing some data here. tab7 <- structure(list(logFC = c(-1.27422400347856, -0.972370320302353, -1.63545104297305, 0.921263558062452, -0....
0 votes, 0 answers, 58 views

How can I modify MakeFiles to allow for proper installation?

This is my first post here, so please forgive me for any errors :) I recently tried to work with two bioinformatics packages in the Terminal and tried to install them using the command: ...
-1 votes, 3 answers, 61 views

Extract substring in exact order

I have a string containing a sequence of three letter amino acid codes and RNA sequence. I want to extract the amino acid code in the exact order it appears in the string. raw_seq = '''...
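A sketch using a single regular expression alternation of the 20 standard three-letter codes; `findall` returns matches left to right, so the order is preserved. It assumes the codes are written in title case (`Ala`, `Gly`, ...) and the RNA in upper case, so RNA letters cannot form a code; the helper name and the example string below are hypothetical.

```python
import re

# The 20 standard three-letter codes in title case; matching is case-sensitive,
# so an upper-case RNA run such as "AUGGCU" never matches.
AA3 = ("Ala Arg Asn Asp Cys Gln Glu Gly His Ile "
       "Leu Lys Met Phe Pro Ser Thr Trp Tyr Val").split()
AA3_PATTERN = re.compile("|".join(AA3))


def amino_acids_in_order(raw_seq):
    """Return every three-letter amino acid code in the order it occurs."""
    return AA3_PATTERN.findall(raw_seq)
```

For example, `amino_acids_in_order("AUGGCUMetAlaGCUUAAGlyMet")` returns `['Met', 'Ala', 'Gly', 'Met']`.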
1 vote, 0 answers, 23 views

AWS Ubuntu java.lang.ClassNotFoundException: Jama.Matrix

I am trying to run a mutation clustering algorithm described in https://tinyheero.github.io/2015/08/26/TrAp.html Running java -jar TrApWithDependencies.jar --text figure1.txt works fine on my laptop ...
0 votes, 1 answer, 15 views

Generating a for loop that will make plots and give them the right title

The problem is in the for loop. It seems that it does not pick up the title for every plot that is generated with the AAstat function. prot_seq <- read.fasta("FGF2-ortholog-proteins", seqtype ="AA" ) ...
0 votes, 1 answer, 46 views

How to get the sequence counts (in fasta) with conditions using python?

I have a FASTA file (a file in which each header line starts with > and is followed by the sequence line corresponding to that header). I want to get the counts for sequences matching TRINITY and total ...
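A minimal sketch, assuming the goal is to count header lines that contain the string `TRINITY` alongside the total number of records; `count_fasta_headers` is a hypothetical helper that accepts any iterable of lines, such as an open file.

```python
def count_fasta_headers(lines, keyword="TRINITY"):
    """Return (records whose header contains `keyword`, total records)."""
    total = matching = 0
    for line in lines:
        if line.startswith(">"):  # each record begins with a header line
            total += 1
            if keyword in line:
                matching += 1
    return matching, total
```

Usage: `with open("sequences.fasta") as fh: matching, total = count_fasta_headers(fh)` (the file name is a placeholder).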
0 votes, 0 answers, 22 views

Proteins with one SS bond?

I would like to find proteins with exactly one SS bond. Is there a database where I can search for this? I've tried the advanced search on https://www.rcsb.org/, but there is no such option, at least I could not find ...
0 votes, 0 answers, 45 views

How can I solve the 'could not convert string to float:' error when I try to embed sequence data in a keras model

I want to replace the sequence data with a single character or string and put it into my Keras model. Replacing sequence data with a string is done by including padding as follows. My environment is ...
0 votes, 0 answers, 20 views

How to send an xml post to query a web service, PDB in this case?

I'm trying to get a list of entries by querying PDB using their Web Service documented here https://www.rcsb.org/pdb/software/wsreport.do. In Step 1, for a Text Search, it says to post this xml query ...
0 votes, 2 answers, 86 views

Plotting coverage depth in 1kb windows?

I would like to plot average coverage depth across my genome, with chromosomes lined in increasing order. I have calculated coverage depth per position for my genome using samtools. I would like to ...
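A sketch of the binning step, assuming the three-column `chrom`, `pos`, `depth` output of `samtools depth` for a single BAM; `window_means` is a hypothetical helper. Only listed positions are averaged, so zero-coverage sites count only if the file was produced with `samtools depth -a`.

```python
from collections import defaultdict


def window_means(depth_lines, window=1000):
    """Mean depth per (chromosome, window start) from samtools depth lines."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for line in depth_lines:
        chrom, pos, depth = line.split("\t")[:3]
        # positions are 1-based; window starts are 0-based (0, 1000, 2000, ...)
        start = (int(pos) - 1) // window * window
        sums[(chrom, start)] += int(depth)
        counts[(chrom, start)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

The resulting dict can be sorted by chromosome and window start before plotting.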
0 votes, 1 answer, 37 views

Why is this script giving a syntax error? [duplicate]

I wrote this script from a YouTube video tutorial to practice coding in Python for bioinformatics. When I try to run the .py file in Python 3.7, I get an error on this line: print "number of g's " + str(g). The ...
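The error comes from Python 2 syntax: in Python 3, `print` is a function, so its argument must be in parentheses. A minimal corrected fragment; `count_g` and the sequence are hypothetical stand-ins for the tutorial's code.

```python
def count_g(seq):
    """Count G bases (either case) in a DNA string."""
    return seq.lower().count("g")


g = count_g("AGGTCG")
# Python 3: print is a function call, unlike the Python 2 statement form
print("number of g's " + str(g))
```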
0 votes, 0 answers, 22 views

Mygene showing AttributeError even though code is identical to handbook

I am trying to get the Entrezgene names for my RNA seq EdgeR output. I found mygene and thought I'd try it out, but regardless of what I do (have tried on home system and server), I get this error: ...
0 votes, 1 answer, 59 views

How do I fix missing values in a for-loop in a Python list?

I want to encode the values of a Python list to refine it, but values go missing in the middle of my function. My environments are as follows. PC 1 - Windows 10 (64-bit) without GPU, Python 3.6.8 (...
0 votes, 1 answer, 27 views

Dividing by Columns & Rows Doesn't Work with 'ComplexHeatmap' - Even Using Their Own Example?

I'm currently trying to assemble a heatmap using ComplexHeatmap. However, when I try to divide my heatmap by columns and rows it doesn't work. When I go back to the documentation that accompanies ...
0 votes, 1 answer, 36 views

What is the p.value for each individual test in my prop.test function?

What is the p-value for each individual test in my prop.test function? (See code below). When doing multiple testing (k = 10 000 tests in this case), I want to find the alpha for each individual ...
-1 votes, 1 answer, 27 views

Partitioning a list on the basis of bins where binsize is defined

I have a dictionary in which every key has a list of values; the keys are chromosome names like chr1, chr2, and the values are positions of mutations. The values are integers, and I have to bin ...
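A sketch, assuming fixed-width bins starting at 0 and integer positions; `bin_positions` is a hypothetical helper. Each position goes to the bin that starts at `pos // binsize * binsize`.

```python
from collections import defaultdict


def bin_positions(mutations, binsize):
    """Group mutation positions into bins of `binsize` for each chromosome.

    Returns {chrom: {bin_start: [positions in that bin]}}, bins sorted by start.
    """
    binned = {}
    for chrom, positions in mutations.items():
        bins = defaultdict(list)
        for pos in positions:
            bins[pos // binsize * binsize].append(pos)  # integer division picks the bin
        binned[chrom] = dict(sorted(bins.items()))
    return binned
```

Only non-empty bins appear in the result; counts per bin are `len(...)` of each list.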