Questions tagged [genome]

A genome is the entirety of an organism's DNA sequence. It includes both genes and non-coding sequences, such as repeats, introns and regulatory sequences, whose functions may be known or unknown.

0
votes
2 answers
70 views

Is it possible to partially unzip a .vcf file?

I have a ~300 GB zipped vcf file (.vcf.gz) which contains the genomes of about 700 dogs. I am only interested in a few of these dogs and I do not have enough space to unzip the whole file at this time,...
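One possible approach, sketched here under assumptions: stream the compressed file line by line and keep only the nine fixed VCF columns plus the columns of the wanted samples, so the full file is never decompressed to disk. The function names, file paths and sample names are hypothetical, and INFO fields such as allele counts are not recomputed for the subset. A dedicated tool such as bcftools may well be faster on a file of this size.

```python
import gzip


def subset_vcf_lines(lines, wanted_samples):
    """Yield VCF lines with the 9 fixed columns plus the wanted sample columns."""
    wanted = set(wanted_samples)
    keep = None  # column indices to keep, known once the #CHROM line is seen
    for line in lines:
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Columns 0-8 are CHROM..FORMAT; sample columns start at index 9.
            keep = list(range(9)) + [
                i for i, name in enumerate(fields) if i >= 9 and name in wanted
            ]
        if keep is None:
            raise ValueError("data line seen before the #CHROM header line")
        yield "\t".join(fields[i] for i in keep) + "\n"


def subset_vcf_gz(in_path, out_path, wanted_samples):
    """Stream a .vcf.gz file and write a compressed copy with fewer samples."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        dst.writelines(subset_vcf_lines(src, wanted_samples))
```

A call might look like `subset_vcf_gz("dogs.vcf.gz", "few_dogs.vcf.gz", ["dogA", "dogC"])`, where both file names and sample names are placeholders.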
1
vote
1 answer
40 views

Compare two columns of one file with the same columns of another file and fetch the matches (large dataset, 14 GB)

I have file1 of 650,000 rows with two columns, "Chr" and "Pos". I want to compare this file with a dbSNP data dump (file2) and match on the Chr and Pos columns present in the dbSNP dump. Once matched, respective ...
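A minimal sketch of one way to do this, assuming the small file fits in memory as a set of (Chr, Pos) pairs and the large dump is tab-delimited and read once, line by line. The function names and the default column positions are assumptions, not part of the question.

```python
def load_keys(small_lines):
    """Collect (chromosome, position) pairs from the small file."""
    keys = set()
    for line in small_lines:
        parts = line.split()
        if len(parts) >= 2:
            keys.add((parts[0], parts[1]))
    return keys


def matching_rows(big_lines, keys, chr_col=0, pos_col=1):
    """Yield rows of the large file whose (chr, pos) pair is in keys."""
    last_needed = max(chr_col, pos_col)
    for line in big_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) > last_needed and (parts[chr_col], parts[pos_col]) in keys:
            yield line
```

Because set lookups take roughly constant time, the cost is dominated by a single pass over the 14 GB file; passing open file handles as `small_lines` and `big_lines` keeps memory use low.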
0
votes
0 answers
12 views

filter gff3 file for complete gene

I have a GFF3 file containing complete-length sequences, but a few of the complete sequences have multiple UTRs. I wish to filter them out. Is there any utility available for this? scaffold105size588288 ...
0
votes
2 answers
86 views

Plotting coverage depth in 1kb windows?

I would like to plot average coverage depth across my genome, with chromosomes lined up in increasing order. I have calculated coverage depth per position for my genome using samtools. I would like to ...
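The windowing step could be sketched as follows, assuming the per-position input has three whitespace-separated columns (chromosome, 1-based position, depth), as `samtools depth` produces. The function name is hypothetical. The mean is taken over the positions present in the input, so zero-depth positions count only if they were written out (for example with `samtools depth -a`).

```python
from collections import defaultdict


def window_means(depth_lines, window=1000):
    """Return {(chrom, window_index): mean depth} for fixed-size windows."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for line in depth_lines:
        chrom, pos, depth = line.split()[:3]
        # Positions are 1-based, so position 1000 falls in window 0.
        key = (chrom, (int(pos) - 1) // window)
        sums[key] += int(depth)
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

The resulting dictionary can be sorted by chromosome and window index and handed to any plotting library.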
0
votes
0 answers
16 views

random sampling 1/3 of genome .fasta

I have a genome of about 2 Gb composed of scaffolds, and I would like to randomly sample it. I used reformat.sh, but the output was only one scaffold. I need 1/3 of the total genome... >LGKD01000001.1 ...
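One way to read this request, offered as a sketch rather than a definitive answer, is to pick whole scaffolds at random until their combined length reaches one third of the genome. The function names are hypothetical, and sampling whole scaffolds (instead of random sub-regions) is an assumption.

```python
import random


def scaffold_lengths(fasta_lines):
    """Map each scaffold name (first word of its header) to its length."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def sample_scaffolds(lengths, fraction=1 / 3, seed=None):
    """Pick scaffolds in random order until `fraction` of the total length is reached."""
    rng = random.Random(seed)
    names = list(lengths)
    rng.shuffle(names)
    target = fraction * sum(lengths.values())
    chosen, total = [], 0
    for name in names:
        if total >= target:
            break
        chosen.append(name)
        total += lengths[name]
    return chosen
```

The sample usually overshoots the target slightly, by at most the length of the last scaffold picked; the chosen names can then be used to write out the matching records.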
-1
votes
1 answer
41 views

Use XPATH to obtain value from a large NCBI XML file

I am new to R. I have downloaded the XML with all Bioprojects from the NCBI. The file is 1GB in size. I started with this: setwd("C://Users/USER/Desktop/") xmlfile = xmlParse("bioproject.xml") root = ...
0
votes
1 answer
50 views

Get genome from NCBI with biopython

Python newbie here. I want to download the genome sequence for genome (NC_007779.1) using BioPython packages Entrez and SeqIO. So far, I have this code: from Bio import Entrez from Bio import SeqIO ...
0
votes
1 answer
17 views

Encoding a value in gray code with floating point with negatives

My objective here is to be able to convert any number between -4.0 and 4.0 into a 5 bit binary string using gray code. I also need to be able to convert back to decimal. Thanks for any help you can ...
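A minimal sketch of one interpretation: quantize the range [-4.0, 4.0] into 32 evenly spaced levels (5 bits), Gray-encode the level index, and decode by undoing both steps. The function names and the choice of uniform quantization are assumptions; decoding returns the nearest representable level, not the exact input.

```python
def to_gray_bits(x, lo=-4.0, hi=4.0, bits=5):
    """Quantize x in [lo, hi] to 2**bits levels and return its Gray code string."""
    if not lo <= x <= hi:
        raise ValueError("value outside the encodable range")
    levels = (1 << bits) - 1
    index = round((x - lo) / (hi - lo) * levels)
    gray = index ^ (index >> 1)  # standard binary-to-Gray conversion
    return format(gray, "0{}b".format(bits))


def from_gray_bits(bit_string, lo=-4.0, hi=4.0):
    """Decode a Gray code string back to the value of its quantization level."""
    gray = int(bit_string, 2)
    index = 0
    while gray:  # Gray-to-binary: XOR of all right shifts
        index ^= gray
        gray >>= 1
    levels = (1 << len(bit_string)) - 1
    return lo + index * (hi - lo) / levels
```

With 5 bits the step between levels is 8/31, so a round trip changes a value by at most about 0.13.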
0
votes
2 answers
49 views

Extracting lines of text depending on the len() of a particular column

I'm trying to write a simple script to extract particular data from a VCF file, which displays variants in genome sequences. The script needs to extract the header from the file, as well as SNVs ...
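Since the excerpt is cut off, the following is only a sketch of the part that is stated: keep the header lines and the single-nucleotide variants, judged by the length of the REF and ALT columns. The function name is hypothetical, and treating multi-allelic ALT values such as `A,T` as non-SNVs is an assumption.

```python
def header_and_snvs(vcf_lines):
    """Yield header lines plus records whose REF and ALT are single bases."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # VCF columns: CHROM, POS, ID, REF (index 3), ALT (index 4), ...
        if len(fields) >= 5:
            ref, alt = fields[3], fields[4]
            if len(ref) == 1 and len(alt) == 1 and alt != ".":
                yield line
```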
2
votes
2 answers
93 views

How to get elements from javascript file and put them into HTML elements

So I'm very new to HTML and I was trying to take a txt output file, convert it into usable data, and input that data into HTML, changing various attribute values, like title and innerHTML. As an ...
0
votes
0 answers
70 views

Ubuntu genome session is broken by a Linux application built with PyInstaller when it is run from a desktop shortcut and closed

I have built a Linux application with Python, PyQt5, Django and PyInstaller, and I have added a desktop shortcut to launch it from the start menu. When I run the app on the command line there is no problem. But when ...
2
votes
0 answers
48 views

How to fix “bowtie2 died with signal 9 (KILL) error”

I am working on assembling a transcriptome of the terminal ganglion of the cricket (Gryllus bimaculatus). We are working on a non-model species and thus did not remove ribosomal contamination during ...
0
votes
0 answers
20 views

Compare a genome's double combination percentages with reference genomes' percentages

I have 3 unknown genomes and 15 reference genomes (5 for bacteria, 5 for fungus, 5 for yeast). I found the double combinations of all these genomes. Double combinations are AA, AT, AG, AC, ... (16 combinations). ...
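A minimal sketch of computing the 16 percentages and comparing two profiles. The function names are hypothetical, and Euclidean distance is only one assumed way to compare profiles; the question does not say which measure is wanted. Pairs containing other letters (such as N) are ignored.

```python
from collections import Counter
from itertools import product

PAIRS = ["".join(p) for p in product("ACGT", repeat=2)]  # AA, AC, ..., TT


def dinucleotide_percentages(seq):
    """Return the percentage of each of the 16 overlapping base pairs."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[p] for p in PAIRS)
    return {p: (100.0 * counts[p] / total if total else 0.0) for p in PAIRS}


def profile_distance(a, b):
    """Euclidean distance between two percentage profiles."""
    return sum((a[p] - b[p]) ** 2 for p in PAIRS) ** 0.5
```

Each unknown genome could then be assigned to the reference group whose profiles lie closest on average.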
0
votes
0 answers
56 views

Gapcloser memory failure

I was running Gapcloser (a tool released with Soapdenovo), and it kept showing a message and failing in the middle of the process: *** Error in `GapCloser': free(): invalid pointer: ...
1
vote
0 answers
31 views

Identify genes from sCNA and point mutations

I need to find the genes that overlap deletions and point mutations that I found in my analysis. I have a file with a list of repair genes, I have 3 files: A list of repair genes RepairGenes.bed A ...
-1
votes
1 answer
30 views

Using the NEAT Algorithm, will a child of two genomes always have the same structure as the most fit parent?

I'm trying to implement the NEAT Algorithm using C#, based on Kenneth O. Stanley's paper. On page 109 (12 in the pdf) it states "Matching genes are inherited randomly, whereas disjoint genes (...
0
votes
0 answers
29 views

How to use bedtools coverage to assess genome assembly?

I have been attempting to use the "bedtools coverage" command to assess the coverage of my genomic assembly and the existence of any inversions etc. I must have a fundamental misunderstanding ...
0
votes
1 answer
30 views

split terminal into different tabs

I'm using this shell script to run multiple commands. My problem is that the result of each command appears in a different window; my goal is to have one window with different tabs. Here is my code: ...
1
vote
1 answer
30 views

Error when converting .gprobs files from Impute2 to PLINK format

I have a set of .gprobs files that I need to import into Plink. However, I keep getting the same error -- a problem in a specific line, even after I removed that line and the lines around it. The ...
0
votes
0 answers
47 views

Hap map genotype file to VCF genotype file

I don't have a lot of experience with converting genotype files but I would like to know if it is possible to convert a hap map file to a VCF genotyping file for import into R? Any R-code or ...
0
votes
0 answers
83 views

BWA fails to locate index genome

I know this question has been asked before. I have searched through the threads and nothing has made a ton of sense to me. Admittedly, I am a newbie at bioinformatics so maybe the answer is clear to ...
-1
votes
1 answer
39 views

percentage of missing data in R gives error

My data looks like this in R console: dim(df1) [1] 54003 994 df1[1:10, 1:10] marker X1 X73 X88 X9 X17 X25 X33 X41 X49 1 1228104|F|0-8:C>T-8:C>T 0 0 0 ...
-2
votes
1 answer
33 views

KEGG Annotation [closed]

I have a set of genes (amino acid sequences). I want to find the KEGG-based functional annotations or KO IDs. Is there any KEGG database available for download? I want to use blast with that database. ...
0
votes
1 answer
69 views

optimize parallelisation in SLURM cluster: the case of genome alignment

I would like to understand the best way of using bwa in parallel on a SLURM cluster. Obviously, this will depend on the computational limits that I have as a user. The bwa software has an argument "...
1
vote
0 answers
51 views

Merge two chromosome box data type which have one common edge

This is my data: > pos chrA x x_end chrB y y_end chr11 0 3000000 chr19 0 20000000 chr11 60000000 63500000 chr19 0 20000000 chr11 63500000 ...
0
votes
0 answers
35 views

What statistical test should I use to identify mutation hotspots on chromosomes?

I am trying to identify mutation hotspots in cancer genomes. One approach I took was to use a sliding window (2 kb window, moving along the chromosome in 200 bp steps, so window 1 and window 2 overlap ...
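The windowing described here (2 kb windows, 200 bp steps) could be sketched as below. This only counts mutations per window and does not choose a statistical test. The function name, the 0-based half-open coordinates and the need for a known chromosome length are assumptions.

```python
from bisect import bisect_left


def sliding_window_counts(positions, chrom_length, window=2000, step=200):
    """Return (start, end, count) for overlapping windows along one chromosome."""
    positions = sorted(positions)
    results = []
    last_start = max(chrom_length - window, 0)
    for start in range(0, last_start + 1, step):
        end = start + window
        # Number of positions p with start <= p < end, found by binary search.
        count = bisect_left(positions, end) - bisect_left(positions, start)
        results.append((start, end, count))
    return results
```

Because neighbouring windows share 90% of their bases, their counts are strongly correlated, which matters for any multiple-testing correction applied afterwards.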
0
votes
1 answer
34 views

Amino acid usage vs Amino acid Identity

I am a little confused about the terms "amino acid usage" and "amino acid identity". How can amino acid usage be calculated? I heard about "CodonW". Are there any other options? ...
-1
votes
1 answer
130 views

Convert .gprobs files from Impute2 to PLINK format

I have some imputed .gprobs files (one per chromosome), imputed by Impute2 downloaded from dbGaP, and I need to convert this file into .bed format of PLINK in order to do some analysis. My .gprobs ...
0
votes
1 answer
51 views

Where can I find a dataset that correlates personality traits with genetic information?

I am working on a project in which I need to predict a person's personality traits according to the Big Five inventory (O, C, E, A, N) based on genetic information. But I am stuck on finding the dataset or ...
0
votes
1 answer
37 views

Alternative to JBrowse on CentOS

Is there alternative software to JBrowse (on CentOS 6)? I need to integrate one into my webpage, but JBrowse gives a zlib error while installing PerlIO::gzip, although all related modules (...
0
votes
1 answer
50 views

NEAT: Speciating

I was trying to implement NEAT myself, using the original paper, but got stuck. Let's say that in the last generation I had the following species: Specie 1: members: 100 avg_score: 100 Specie ...
0
votes
0 answers
161 views

Issue using MaSuRCA 3.2.6 assembler

I'm currently using MaSuRCA-3.2.6 to assemble my genome and I ran the following script: #PBS -S /bin/bash #PBS -l nodes=1:ppn=8:bigmem,mem=100gb #PBS -e /pandata/ACG-0006_0027/LOGS/ACG-006_assembly....
2
votes
0 answers
66 views

Extract and organize automatically KEGG annotation results into Excel

I have launched a query with amino acid sequences on "KAAS - KEGG Automatic Annotation Server". I have then downloaded the results file called "myfile.keg". A small example file that shows how it ...
0
votes
0 answers
12 views

How to use ALLPATH-LG

I need your help because it is the first time I have to assemble a genome. I have in my possession 2 fasta files: reads1.fq and reads2.fq. Those files come from an Illumina HiSeq 3000 ...
0
votes
0 answers
43 views

Multiple pairwise gene expression matrix comparison

I work in R, and have a large gene expression matrix, which I have subdivided into clusters as per the following: Sample Cluster Gene1 Gene2 Gene3... 1 A 412 535 23 2 ...
0
votes
1 answer
65 views

Proteomics: Create a MSnSet class file with MSnbase [closed]

I want to create a MSset file (proteomics data, data corresponds to spectral counts) but I get error messages and I am stuck (after reading manuals, helps, forums, etc). You can get my files here: ...
0
votes
1 answer
115 views

R Proteomics: issues with input file “ExpressionSet” : Processing information: msmsTest package

Updated Question: I want to use the msmsTest package for statistics on my proteomics data (which is of the spectral counts type). However, I get an error message when importing the file with the commands: ...
1
vote
1 answer
113 views

SNP coordinates to gene names

I have SNP ids and coordinates in a bed file provided by UCSC. I want to map them to their gene names. chr1 9160974 9160975 rs1013578619 0 + chr1 164528869 164528870 ...
-3
votes
1 answer
81 views

Using python to make a random human genome [duplicate]

I need help making a program that creates a text file of a randomly sequenced genome that uses the letters 'A' 'C' 'T' and 'G'. The end goal is to produce about a million randomly sequenced genomes then ...
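A minimal sketch of the generation step, assuming each base is drawn independently and uniformly from A, C, G and T. A real genome is not uniform in this way, and the function names are hypothetical.

```python
import random


def random_sequence(length, seed=None):
    """Return a random string of `length` bases drawn uniformly from ACGT."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", k=length))


def write_sequences(path, count, length, seed=None):
    """Write `count` random sequences to a text file, one per line."""
    rng = random.Random(seed)
    with open(path, "w") as handle:
        for _ in range(count):
            handle.write("".join(rng.choices("ACGT", k=length)) + "\n")
```

Writing one sequence at a time keeps memory use flat, which matters if about a million sequences are needed.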
-2
votes
2 answers
593 views

Looping over list multiple times

Is it possible to iterate through a list multiple times? Basically, I have a list of strings and I am looking for the longest superstring. Each of the strings in the list has some overlap of at least ...
2
votes
0 answers
146 views

Invalid command name “tk_chooseDirectory” error

I am using Bioconductor for a WES pipeline, and I am using tk_choose.dir to select the directory where the user has stored the input files (and to store it for further use). Here are the command lines: library(tcltk)...
0
votes
2 answers
81 views

Error in NGSLIB PACKAGE INSTALLATION

I am trying to install the ngslib python package for oncotator 1.9.5.0, but every time the installation fails for some reason (mentioned below). I have tried all possible methods like pip install ngslib ...
2
votes
0 answers
243 views

No reads mapped in proper pairs, in paired-end sequencing bamfile using samtools

I am working with a bamfile of paired-end whole genome sequencing, and want to filter out reads from a specific genomic region that are not mapped in a proper pair (these sometimes indicate a ...
0
votes
0 answers
48 views

Identifying relevant SNPs from a list

I have a list of all SNPs in CDS of several thousand genes, and I'm looking for a convenient method to come up with a list of known SNPs that alter an amino acid in those genes. I'll be glad to hear ...
1
vote
1 answer
332 views

Unix. Loop through Plink files with same names but different chromosome numbers. Produce output

I have basic knowledge of Unix. I have a list of 22 files with this name pattern: chr1_ASI, chr2_ASI, chr3_ASI...chr22_ASI. I want to loop through them using this Plink command in the OS X Terminal: ...
1
vote
2 answers
63 views

Efficient code to map genotype matrix in R

Hi, I want to convert a matrix of genotypes encoded as triples to a matrix encoded as 0, 1, 2, i.e. c(1,0,0) <-> 0; c(0,1,0) <-> 1; c(0,0,1) <-> 2. First, here is some code to ...
1
vote
1 answer
41 views

How to create an interval file defined by values from another file - for circos imaging of WGS data

I am trying to depict the whole-genome sequence (WGS) data of my parasite using the circos software. One of the elements I would like to depict is the areas of the reference genome for which I do ...
0
votes
3 answers
227 views

Block bootstrap for genomic data

I am trying to implement a block bootstrap procedure, but I haven't figured out a way of doing this efficiently. My data.frame has the following structure: CHR POS var_A var_B 1 192 0,9 0,7 1 2000 ...
-1
votes
2 answers
29 views

Are there any open source tools available for chimeric sequence detection?

Are there any tools for detecting and removing chimeric sequences from 16S, WGS and WTS sequences other than USEARCH? The alternative should be open source so that it can be used for commercial purposes.
1
vote
0 answers
58 views

Filter overlapping entries in bed file (2)

This is a follow-up to my previous question. This time, in case two entries in a bed file overlap, I would like to keep one of them in the output (randomly selected). Preferably, I look for a ...
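A possible sketch, assuming entries are (chrom, start, end) tuples in BED-style 0-based half-open coordinates. Chains of overlapping entries are treated as one cluster and a single random member is kept per cluster; that clustering rule and the function name are assumptions, since the question only mentions two overlapping entries.

```python
import random


def pick_one_per_overlap(entries, seed=None):
    """Keep one randomly chosen entry from each cluster of overlapping intervals."""
    rng = random.Random(seed)
    kept = []
    cluster, cluster_end = [], None
    for entry in sorted(entries):  # sorts by chromosome, then start, then end
        chrom, start, end = entry[:3]
        if cluster and chrom == cluster[0][0] and start < cluster_end:
            cluster.append(entry)  # overlaps the current cluster
            cluster_end = max(cluster_end, end)
        else:
            if cluster:
                kept.append(rng.choice(cluster))
            cluster, cluster_end = [entry], end
    if cluster:
        kept.append(rng.choice(cluster))
    return kept
```

Entries that merely touch (one ends where the next starts) are not counted as overlapping under half-open coordinates.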