makeTranscriptDb {GenomicFeatures} | R Documentation
makeTranscriptDb is a low-level constructor for making a TranscriptDb object from user-supplied transcript annotations. See ?makeTranscriptDbFromUCSC and ?makeTranscriptDbFromBiomart for higher-level functions that feed data from the UCSC or BioMart sources to makeTranscriptDb.
makeTranscriptDb(transcripts, splicings, genes=NULL, chrominfo=NULL, metadata=NULL, reassign.ids=FALSE)
transcripts |
data frame containing the genomic locations of a set of transcripts |
splicings |
data frame containing the exon and cds locations of a set of transcripts |
genes |
data frame containing the genes associated with a set of transcripts |
chrominfo |
data frame containing information about the chromosomes hosting the set of transcripts |
metadata |
2-column data frame containing meta information about
this set of transcripts like species, organism, genome, UCSC table, etc...
The names of the columns must be |
reassign.ids |
controls how internal ids should be assigned for each
type of feature i.e. for transcripts, exons, and cds. For each type, if
|
The transcripts (required), splicings (required) and genes (optional) arguments must be data frames that describe a set of transcripts and the genomic features related to them (exons, cds and genes at the moment). The chrominfo (optional) argument must be a data frame containing chromosome information like the length of each chromosome.
transcripts must have 1 row per transcript and the following columns:
tx_id: Transcript ID. Integer vector. No NAs. No duplicates.
tx_name: [optional] Transcript name. Character vector (or factor). NAs and/or duplicates are ok.
tx_chrom: Transcript chromosome. Character vector (or factor) with no NAs.
tx_strand: Transcript strand. Character vector (or factor) with no NAs where each element is either "+" or "-".
tx_start, tx_end: Transcript start and end. Integer vectors with no NAs.
Other columns, if any, are ignored (with a warning).
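The column rules for transcripts can be checked before calling the constructor. The following is a minimal base-R sketch, not part of GenomicFeatures: check_transcripts is a hypothetical helper name, and it accepts any numeric storage for IDs and coordinates because how strictly the constructor distinguishes integer from double is not stated here.

```r
# Hypothetical helper (not part of GenomicFeatures): checks a 'transcripts'
# data frame against the column rules listed above.
check_transcripts <- function(transcripts) {
  required <- c("tx_id", "tx_chrom", "tx_strand", "tx_start", "tx_end")
  missing_cols <- setdiff(required, names(transcripts))
  if (length(missing_cols) > 0L)
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  stopifnot(
    # tx_id: no NAs, no duplicates
    is.numeric(transcripts$tx_id),
    !anyNA(transcripts$tx_id),
    anyDuplicated(transcripts$tx_id) == 0L,
    # tx_chrom: no NAs
    !anyNA(transcripts$tx_chrom),
    # tx_strand: every element is "+" or "-" (an NA fails this test too)
    all(as.character(transcripts$tx_strand) %in% c("+", "-")),
    # tx_start, tx_end: no NAs
    is.numeric(transcripts$tx_start), !anyNA(transcripts$tx_start),
    is.numeric(transcripts$tx_end), !anyNA(transcripts$tx_end)
  )
  invisible(TRUE)
}

# Illustrative data: three transcripts on one chromosome.
transcripts <- data.frame(
  tx_id = 1:3,
  tx_chrom = "chr1",
  tx_strand = c("-", "+", "+"),
  tx_start = c(1L, 2001L, 2001L),
  tx_end = c(999L, 2199L, 2199L)
)
ok <- check_transcripts(transcripts)
```

The helper stops with an error on the first violated rule, so a duplicated tx_id or a strand other than "+" or "-" is caught before any TranscriptDb is built.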
splicings must have N rows per transcript, where N is the number of exons in the transcript. Each row describes an exon plus, optionally, the cds contained in this exon. Its columns must be:
tx_id: Foreign key that links each row in the splicings data frame to a unique row in the transcripts data frame. Note that more than 1 row in splicings can be linked to the same row in transcripts (many-to-one relationship). Same type as transcripts$tx_id (integer vector). No NAs. All the values in this column must be present in transcripts$tx_id.
exon_rank: The rank of the exon in the transcript. Integer vector with no NAs. (tx_id, exon_rank) pairs must be unique.
exon_id: [optional] Exon ID. Integer vector with no NAs.
exon_name: [optional] Exon name. Character vector (or factor).
exon_chrom: [optional] Exon chromosome. Character vector (or factor) with no NAs. If missing then transcripts$tx_chrom is used. If present then exon_strand must also be present.
exon_strand: [optional] Exon strand. Character vector (or factor) with no NAs. If missing then transcripts$tx_strand is used and exon_chrom must also be missing.
exon_start, exon_end: Exon start and end. Integer vectors with no NAs.
cds_id: [optional] cds ID. Integer vector. If present then cds_start and cds_end must also be present. NAs are allowed and must match NAs in cds_start and cds_end.
cds_name: [optional] cds name. Character vector (or factor). If present then cds_start and cds_end must also be present. NAs are allowed and must match NAs in cds_start and cds_end.
cds_start, cds_end: [optional] cds start and end. Integer vectors. If one of the 2 columns is missing then all cds_* columns must be missing. NAs are allowed and must occur at the same positions in cds_start and cds_end.
Other columns, if any, are ignored (with a warning).
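The relational rules for splicings (foreign key into transcripts, unique (tx_id, exon_rank) pairs, matching NAs in the cds columns) can be sketched in base R as follows. check_splicings is a hypothetical helper, not a GenomicFeatures function, and it covers only a subset of the rules above.

```r
# Hypothetical helper (not part of GenomicFeatures): checks the main
# relational rules for a 'splicings' data frame.
check_splicings <- function(splicings, transcripts) {
  stopifnot(all(c("tx_id", "exon_rank", "exon_start", "exon_end")
                %in% names(splicings)))
  # Foreign key: no NAs, and every value must exist in transcripts$tx_id.
  stopifnot(!anyNA(splicings$tx_id),
            all(splicings$tx_id %in% transcripts$tx_id))
  # (tx_id, exon_rank) pairs must be unique.
  stopifnot(!anyNA(splicings$exon_rank),
            anyDuplicated(splicings[c("tx_id", "exon_rank")]) == 0L)
  # Exon coordinates must not contain NAs.
  stopifnot(!anyNA(splicings$exon_start), !anyNA(splicings$exon_end))
  has_cds <- c("cds_start", "cds_end") %in% names(splicings)
  if (any(has_cds)) {
    # Both columns present, with NAs at the same positions.
    stopifnot(all(has_cds),
              identical(is.na(splicings$cds_start),
                        is.na(splicings$cds_end)))
  } else {
    # Without cds_start/cds_end, no other cds_* column may be present.
    stopifnot(!any(c("cds_id", "cds_name") %in% names(splicings)))
  }
  invisible(TRUE)
}

# Illustrative data: transcript 1 has one exon, transcript 2 has three,
# transcript 3 has two exons and no cds (NA in both cds columns).
transcripts <- data.frame(tx_id = 1:3)
splicings <- data.frame(
  tx_id = c(1L, 2L, 2L, 2L, 3L, 3L),
  exon_rank = c(1L, 1L, 2L, 3L, 1L, 2L),
  exon_start = c(1L, 2001L, 2101L, 2131L, 2001L, 2131L),
  exon_end = c(999L, 2085L, 2144L, 2199L, 2085L, 2199L),
  cds_start = c(1L, 2022L, 2101L, 2131L, NA, NA),
  cds_end = c(999L, 2085L, 2144L, 2193L, NA, NA)
)
ok <- check_splicings(splicings, transcripts)
```

Rows 2 to 4 share tx_id 2, which illustrates the many-to-one relationship between splicings and transcripts.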
genes must have N rows per transcript, where N is the number of genes linked to the transcript (N will be 1 most of the time). Its columns must be:
tx_id: [optional] genes must have either a tx_id or a tx_name column but not both. Like splicings$tx_id, this is a foreign key that links each row in the genes data frame to a unique row in the transcripts data frame.
tx_name: [optional] Can be used as an alternative to the genes$tx_id foreign key.
gene_id: Gene ID. Character vector (or factor). No NAs.
Other columns, if any, are ignored (with a warning).
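As an illustration of the "either tx_id or tx_name, but not both" rule, here is a minimal base-R sketch. check_genes is a hypothetical helper, and the requirement that every key value be present in transcripts is an assumption inferred from the foreign-key description above.

```r
# Hypothetical helper (not part of GenomicFeatures): checks a 'genes'
# data frame against the rules listed above.
check_genes <- function(genes, transcripts) {
  key <- intersect(c("tx_id", "tx_name"), names(genes))
  stopifnot(
    # Exactly one of the two foreign-key columns may be present.
    length(key) == 1L,
    # gene_id is required and must not contain NAs.
    "gene_id" %in% names(genes),
    !anyNA(genes$gene_id),
    # Assumption: every key value must match a row in 'transcripts'.
    all(genes[[key]] %in% transcripts[[key]])
  )
  invisible(TRUE)
}

# Illustrative data: transcripts 2 and 3 belong to the same gene.
transcripts <- data.frame(tx_id = 1:3)
genes <- data.frame(
  tx_id = 1:3,
  gene_id = c("geneA", "geneB", "geneB")
)
ok <- check_genes(genes, transcripts)
```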
chrominfo must have 1 row per chromosome and the following columns:
chrom: Chromosome name. Character vector (or factor) with no NAs and no duplicates.
length: Chromosome length. Either all NAs or an integer vector with no NAs.
is_circular: [optional] Chromosome circularity flag. Either all NAs or a logical vector with no NAs.
Other columns, if any, are ignored (with a warning).
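The "either all NAs or no NAs" rule for length and is_circular can be expressed with a small predicate. This is a base-R sketch with hypothetical helper names (all_or_none_na, check_chrominfo); the chromosome lengths are made-up values for illustration only.

```r
# TRUE when a vector is entirely NA or contains no NA at all.
all_or_none_na <- function(x) all(is.na(x)) || !anyNA(x)

# Hypothetical helper (not part of GenomicFeatures): checks a 'chrominfo'
# data frame against the rules listed above.
check_chrominfo <- function(chrominfo) {
  stopifnot(
    all(c("chrom", "length") %in% names(chrominfo)),
    # chrom: no NAs, no duplicates
    !anyNA(chrominfo$chrom),
    anyDuplicated(chrominfo$chrom) == 0L,
    # length: all NAs or no NAs
    all_or_none_na(chrominfo$length)
  )
  # is_circular is optional; when present the same NA rule applies.
  if ("is_circular" %in% names(chrominfo))
    stopifnot(all_or_none_na(chrominfo$is_circular))
  invisible(TRUE)
}

# Illustrative data (lengths are invented, not real genome values).
chrominfo <- data.frame(
  chrom = c("chr1", "chrM"),
  length = c(3000L, 1500L),
  is_circular = c(FALSE, TRUE)
)
ok <- check_chrominfo(chrominfo)
```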
A TranscriptDb object.
H. Pages
makeTranscriptDbFromUCSC and makeTranscriptDbFromBiomart for convenient ways to make TranscriptDb objects from UCSC or BioMart online resources.
makeTranscriptDbFromGFF for making a TranscriptDb object from annotations available as a GFF3 or GTF file.
The TranscriptDb class.
library(GenomicFeatures)

## One row per transcript.
transcripts <- data.frame(
    tx_id=1:3,
    tx_chrom="chr1",
    tx_strand=c("-", "+", "+"),
    tx_start=c(1, 2001, 2001),
    tx_end=c(999, 2199, 2199))

## One row per exon; cds_start and cds_end are NA for exons with no cds.
splicings <- data.frame(
    tx_id=c(1L, 2L, 2L, 2L, 3L, 3L),
    exon_rank=c(1, 1, 2, 3, 1, 2),
    exon_start=c(1, 2001, 2101, 2131, 2001, 2131),
    exon_end=c(999, 2085, 2144, 2199, 2085, 2199),
    cds_start=c(1, 2022, 2101, 2131, NA, NA),
    cds_end=c(999, 2085, 2144, 2193, NA, NA))

txdb <- makeTranscriptDb(transcripts, splicings)