StatQuest: A gentle introduction to RNA-seq

  • 🎬 Video
  • ℹ️ Published 5 years ago

RNA-seq may sound mysterious, but it's not. Here we go over the main ideas behind how it's done and how the data is analyzed.

For a complete index of all the StatQuest videos, check out:

If you'd like to support StatQuest, please consider...

Buying The StatQuest Illustrated Guide to Machine Learning!!!


...a cool StatQuest t-shirt or sweatshirt:

...buying one or two of my songs (or go large and get a whole album!)

...or just donating to StatQuest!

Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:

#statquest #rnaseq

💬 Comments

Hey Josh, I just started doing RNA sequencing, and your video was very useful for getting the basic idea behind it. I spent hours and hours reading papers and didn't understand a thing. I really appreciate what you have done. I think your channel will play a big role in my research. Hats off to your efforts in coming up with useful, informative videos like this.

Author — Himalayan plane spotter


I've learned more from this 18-minute video than from my whole course in transcriptomics. You are so good at explaining these complex processes. Thank you!

Author — Alexander Palm


So the "+" symbol used to mark a section where people could add more info about their sequence. This was back when sequencers produced far less output. Since then, data has grown so much that we have developed tools to go through it more efficiently. In the end, the "+" is simply an artifact from the early days of these files.

Author — Paul Risteca
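For reference, a FASTQ record is four lines: an `@` header with the read name, the bases, a `+` separator line, and a quality string with one character per base. The `+` line may be bare or may repeat the read name. The reader below is a minimal sketch, assuming unwrapped four-line records; `FastqRecord` and `read_fastq` are illustrative names, not a particular library's API, and the two records are made up.

```python
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class FastqRecord:
    name: str      # header text after the leading '@'
    sequence: str  # the bases
    quality: str   # one quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream (no wrapped lines)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got {header!r}")
        if not separator.startswith("+"):
            raise ValueError(f"expected '+' separator, got {separator!r}")
        # The '+' line is either bare or repeats the header name exactly.
        repeated_name = separator[1:]
        if repeated_name and repeated_name != header[1:]:
            raise ValueError("'+' line does not match the header")
        if len(quality) != len(sequence):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], sequence, quality)


# Two made-up records: one with a bare '+', one that repeats the name.
example = "@read1\nGATTACA\n+\nIIIIIII\n@read2\nACGT\n+read2\n!!!!\n"
records = list(read_fastq(io.StringIO(example)))
```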


First you got me through my statistics course. Then you pulled me through my machine learning courses. Now, you will save me in my graduate genetics lab. Thank you so much Josh.

Author — Patrick


This technique finally makes sense! Thank you so much, this really helped a lot. Very clearly explained and easy to follow.

Author — Charlotte L


This is an amazing video! I have learnt a lot. We would like to see more videos on bioinformatics.

Author — sanjanatule


Call this a 'gentle introduction' if you want; it is really just a high-level overview of the process. Such presentations are important for building students' mental models of any subject, and honestly, in my experience, they are frequently overlooked.

Great stuff.

Author — Tanner Nelson


Thank you for the clear explanations. This was the first time I learned how the machine reads the bases by taking pictures from the top with single base annealing intervals. It was very informative. I am heading to watch your other videos. Many thanks!

Author — Maribel Yazdanifar
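The imaging step described above can be illustrated with a toy base caller: at each cycle, every cluster is photographed, and the brightest color is called as the next base. This sketch assumes a simplified chemistry with one dye per base and one intensity value per dye per cycle; `call_bases`, its quality heuristic, and the intensities are hypothetical, and real base callers also correct for effects such as dye cross-talk and phasing.

```python
import math

# ASSUMPTION: a simplified four-dye chemistry, one dye per base.
CHANNELS = ("A", "C", "G", "T")


def call_bases(cycles: list[tuple[float, float, float, float]]) -> tuple[str, list[int]]:
    """Call one base per imaging cycle from a cluster's four channel intensities.

    Returns the called sequence and a rough Phred-style quality per base, where
    the fraction of signal outside the brightest channel is (hypothetically)
    treated as the error probability.
    """
    bases = []
    qualities = []
    for intensities in cycles:
        brightest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        bases.append(CHANNELS[brightest])
        total = sum(intensities)
        if total > 0:
            p_error = max(1.0 - intensities[brightest] / total, 1e-4)  # caps quality at 40
        else:
            p_error = 0.75  # no signal: no better than guessing among four bases
        qualities.append(round(-10 * math.log10(p_error)))
    return "".join(bases), qualities


# Three made-up cycles for one cluster: bright A, then bright G, then bright T.
sequence, qualities = call_bases([
    (0.90, 0.05, 0.03, 0.02),
    (0.10, 0.10, 0.70, 0.10),
    (0.05, 0.05, 0.05, 0.85),
])
```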


8:48 Two semesters of Bioinformatics and all I wanted was the sequence "GATTACA" to appear on any one slide during a lecture. Thanks, StatQuest for making my wish finally come true!

Author — Grillpander


Thank you, Josh, for making such a great video; it really explained a lot! I just have some questions about the part where the sequence is broken up into fragments, and about how the reads are counted. If they break the sequence into small fragments, is there a chance that these fragments might match fragments of more than one "reference" sequence? Also, is there a chance that the fragments of a sequenced read actually come from a "reference" sequence but end up not matching it because the reference was broken up in a different way? For example, the "reference" (ATCGATCGATCGATCGATCG) is broken up into the fragments ATCGATCGAT and CGATCGATCG, while the fragment of the read comes from the same sequence but is "TCGATCGATC". I am also wondering how they count the number of reads. If they break reads into fragments, these fragments all originate from the same sequence, so how do they know which fragments came from the same sequence and count it only once?

Author — Wanqing Jiang
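The questions about fragments and counting can be illustrated with a toy seed-based mapper. This is a simplified sketch, not the method of any particular aligner, and it assumes: the reference index stores every overlapping k-mer (so reads cut at different boundaries still find matches); each seed of a read votes for where the whole read starts; reads whose best start is tied are treated as multi-mapped and left unassigned; and a read is counted once, for the gene containing its start. `K`, `build_index`, `best_starts`, `count_reads`, and the gene coordinates are hypothetical.

```python
from collections import Counter, defaultdict

K = 4  # toy seed length; real aligners use longer seeds


def build_index(reference: str, k: int = K) -> dict[str, list[int]]:
    """Record every k-mer of the reference at every offset (overlapping)."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def best_starts(read: str, index: dict[str, list[int]], ref_len: int, k: int = K) -> list[int]:
    """Each non-overlapping seed of the read votes for a start; return the top-voted starts."""
    votes: Counter = Counter()
    for offset in range(0, len(read) - k + 1, k):
        for hit in index.get(read[offset:offset + k], []):
            start = hit - offset
            if 0 <= start <= ref_len - len(read):  # the whole read must fit on the reference
                votes[start] += 1
    if not votes:
        return []
    top = max(votes.values())
    return sorted(start for start, n in votes.items() if n == top)


def count_reads(reads: list[str], reference: str, genes: dict[str, tuple[int, int]]) -> Counter:
    """Count each read once, for the gene whose [start, end) interval holds its unique best start."""
    index = build_index(reference)
    counts: Counter = Counter()
    for read in reads:
        starts = best_starts(read, index, len(reference))
        if len(starts) != 1:
            counts["unassigned"] += 1  # unmapped, or multi-mapped (tied best start)
            continue
        for gene, (gene_start, gene_end) in genes.items():
            if gene_start <= starts[0] < gene_end:
                counts[gene] += 1
                break
    return counts


# The repetitive reference from the question: the read is found even though the
# reference was "cut" differently, but it ties at several places (multi-mapping).
repeat_ref = "ATCGATCGATCGATCGATCG"
repeat_hits = best_starts("TCGATCGATC", build_index(repeat_ref), len(repeat_ref))  # [1, 5, 9]

# A made-up, non-repetitive reference with two hypothetical genes.
counts = count_reads(
    ["AACCCC", "GGGGTTTT", "CCCGGGGT"],
    "AAAACCCCGGGGTTTT",
    {"geneA": (0, 8), "geneB": (8, 16)},
)
```

Because each read gets one vote tally and at most one assignment, its seeds are never counted separately.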


Thank you for this. My current molecular bio professor recommended we watch it before our first class, and it was well worth the 18 minutes!

Author — Steven Silz-Carson


This is honestly amazing... the best explanation on YouTube for RNA-seq!!

Author — J E


Excellent course! Thank you for making this stuff so clear! It's much easier to understand!

Author — Leiping Zeng


Hi, this is a great video as usual. Could you please make a video about spatial gene expression? I think it is a more up-to-date technology than single-cell analysis, don't you think? Please help me understand spatial gene expression analysis; I do not really know what it is exactly. Thank you very much.

Author — emkahuda


Hello, congratulations on the video. Very clear and well explained. Thank you!
I'd like some clarifications:
1) At minute 9:14, when you say "the genome fragments that matched the reads fragments will determine a location (chromosome and position) in the genome", does that basically mean that for aligned/mapped reads we know their position with respect to the reference genome, or are you referring to something else?
2) At minute 9:29, I don't understand why the read sequences are broken into small fragments. In the example (9:45) you say that the first fragment will not match because of the A mismatch, but reads can still map to the reference genome with some mismatches. So why is making fragments a better choice? Moreover, fragments that are too small increase multimapping, which is not good for gene expression profiling. Could you clarify my misunderstanding?
3) Finally, I'd like to ask which program you used to create these images.


Author — gabriele rosso
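On question 2, a toy seed-and-extend search shows why cutting reads into seeds helps even though alignments tolerate mismatches: exact lookups of short seeds are cheap, any seed that avoids the mismatch still points to the right place, and only those candidate positions are then compared base by base against a mismatch budget. Shorter seeds yield more candidate positions to check, in line with the multi-mapping concern. A minimal sketch, assuming non-overlapping seeds, a brute-force scan in place of a real index, and a one-mismatch budget; the function names and sequences are illustrative.

```python
def candidate_starts(reference: str, read: str, k: int) -> set[int]:
    """Cut the read into non-overlapping k-length seeds and return every read
    start implied by an exact seed match (brute-force scan, no index)."""
    starts = set()
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        for pos in range(len(reference) - k + 1):
            if reference[pos:pos + k] == seed:
                start = pos - offset
                if 0 <= start <= len(reference) - len(read):
                    starts.add(start)
    return starts


def count_mismatches(reference: str, read: str, start: int) -> int:
    """Compare the whole read with the reference at one position."""
    window = reference[start:start + len(read)]
    return sum(1 for ref_base, read_base in zip(window, read) if ref_base != read_base)


def align(reference: str, read: str, k: int, max_mismatches: int = 1) -> list[tuple[int, int]]:
    """Seed, then extend: keep (start, mismatches) for candidates within the budget."""
    hits = []
    for start in sorted(candidate_starts(reference, read, k)):
        n = count_mismatches(reference, read, start)
        if n <= max_mismatches:
            hits.append((start, n))
    return hits


reference = "TTGCAGATTACAGGCTTACG"  # made-up reference containing GATTACA
read = "TGATTACA"                   # true origin is position 4; the first base is a mismatch

long_seed_candidates = sorted(candidate_starts(reference, read, 4))   # [4]: only "TACA" matches
short_seed_candidates = sorted(candidate_starts(reference, read, 2))  # [1, 4, 12]: more to check
best = align(reference, read, 4)                                      # [(4, 1)]
```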


These are great, I’ve been inundating myself with media about genomics and bioinformatics, but a lot of the specific techniques and terminology were still going over my head.

Thanks for the thorough, but no-nonsense explanations!

Author — Trannus Aran


The "+" sign in raw data indicates the forward strand. If paired-end sequencing was performed, there will be files with "+" or "-" signs that indicate forward or reverse strands, respectively.

Author — Muna Alhammadi


This video is great! Could you do a video someday going into the details of working with the raw data? I.e., accessing it (terminal and FASTQ files), converting it, which software to use, etc.? I think you might be the only one who'd be able to explain it, hahaha.

Author — LuvElaYay


This is the clearest explanation among sequencing videos. Explaining this kind of technology is usually hard, but they did a great job.

Author — Alex Yang


Regarding the "+" in FASTQ, I once heard that this line used to repeat the sequence name without the "@" at the beginning, to make sure the quality line belongs to the sequence above, but to reduce file size it was switched to just "+". That's what I have heard, but it would be worth verifying this information somehow; I have no sources for it.

Author — MrErPeeeG
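Whatever the history of the `+` line, the quality line lines up base for base with the sequence above it. Decoding it is short, assuming the Phred+33 encoding used by Sanger and Illumina 1.8+ FASTQ files, where the score is the character's ASCII code minus 33 and the error probability is 10^(-Q/10). The function names are illustrative and the record is made up.

```python
def phred33_scores(quality: str) -> list[int]:
    """Decode a Phred+33 quality string: score = ASCII code - 33."""
    return [ord(ch) - 33 for ch in quality]


def error_probability(score: int) -> float:
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10 ** (-Q / 10)."""
    return 10 ** (-score / 10)


# Made-up record: the quality string has exactly one character per base.
sequence = "GATTACA"
quality = "II?5+#!"
assert len(sequence) == len(quality)

scores = phred33_scores(quality)  # [40, 40, 30, 20, 10, 2, 0]
per_base = [(base, q, error_probability(q)) for base, q in zip(sequence, scores)]
```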