When we think of the word marker, the first thing that comes to mind is something used to indicate a place. For example, it can be your current location on Google Maps or the spot where you planted some seeds in your garden. Similarly, in genomics studies, we can find marker genes in bacterial genomes. In this article, I will introduce you to the marker genes used in metagenomics analysis, explain how they are used, and walk you through an example of a commonly used gene prediction tool.
The assembly algorithms developed so far each aim to produce better assemblies, but "better" is judged under different criteria. Hence, depending on the specific scenario, the assembly process might produce better results if we use the most appropriate assembler. Even when existing assembly methods cannot produce contiguous genomes, they can still recover segments of the reference genomes. Therefore, we need to evaluate the quality of assemblies. These evaluations help researchers pick the right assembler for each scenario.
We know that there are trillions of microbes in the environment surrounding us, even in our bodies. These microscopic communities have very diverse ecosystems and by studying their composition and behaviour we can learn a lot about them. If you have come across my previous article Metagenomics — Who is there and what are they doing? then you know that binning is an important step in metagenomics analysis.
Did you know that your body houses about 100 trillion bacteria? Estimates show that a human has approximately a pound or two of bacteria living in their gut (digestive tract). (Now don’t go and take every antibiotic you know of to kill those bacteria. In fact, these bacteria play an important role in our metabolism and immune system.) The same goes for your backyard. Many species of bacteria can live in the soil, and they help to enrich it (e.g., nitrifying bacteria produce nitrates, which are essential for plants). These microscopic communities have very diverse ecosystems, and studying their composition and behaviour can provide us with valuable insights. In this article, I will provide a basic introduction to metagenomics, which is the study of genetic material obtained from microbial communities.
Have you ever wondered how life formed from the primordial soup and evolved into the different life forms we see at present? How did different species evolve from their ancestors, and what relationships do they have with each other? These questions can be answered through the study of phylogenetics.
Have you wondered how scientists identify regions of similarity in three or more biological sequences? As described in my previous article, sequence alignment is a method of arranging sequences of DNA, RNA, or protein to identify regions of similarity. In my latest article on bioinformatics, I discussed pairwise sequence alignment. Make sure to check both out as well. Multiple sequence alignment is quite similar to pairwise sequence alignment, but it uses three or more sequences instead of only two.
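To make "regions of similarity" concrete, here is a minimal Python sketch. It assumes the sequences have already been aligned (gaps are written as `-`) and simply reports the columns where all sequences agree. The function name `conserved_columns` and the toy sequences are made up for illustration; this is not an alignment algorithm, only a way of reading an alignment once you have one.

```python
def conserved_columns(aligned):
    """Return the column indices where every sequence has the same non-gap residue.

    `aligned` is a list of equal-length strings that are assumed to be
    already aligned, with '-' marking a gap.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")

    conserved = []
    for i in range(length):
        column = {seq[i] for seq in aligned}  # distinct symbols in column i
        if len(column) == 1 and "-" not in column:
            conserved.append(i)
    return conserved


# Toy alignment of three hypothetical DNA sequences.
alignment = [
    "ATG-CA",
    "ATGGCA",
    "ATG-CT",
]
print(conserved_columns(alignment))  # [0, 1, 2, 4]
```

Columns 0, 1, 2 and 4 are identical across all three sequences, while column 3 contains gaps and column 5 contains a mismatch.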
With the development of various methods to obtain data from living beings, there has been an explosion of readily available biological data. However, such vast amounts of data are of little use unless there is a proper way to run the series of steps that turns the data into the results we want. This is where Workflow Management Systems come in handy.
Yesterday I was returning home from university via the expressway and the oil refinery at Sapugaskanda caught my eye. The refinery towers send huge flames and smoke into the sky as they operate. The sight of the oil refinery reminded me of pipelines, which many manufacturing and transportation industries use to transport materials and transform them into finished outputs. One common example is an oil pipeline, which carries oil over long distances while intermediate units refine it into various petroleum products.
Similarly, genomic data can be passed through special software pipelines that refine and analyze the data as required and produce the desired visualizations and interpretations.
In my first article, where I introduced bioinformatics, I mentioned that we will be learning a lot about DNA, RNA and protein sequences. Since I’m new to all this DNA/RNA jargon, I decided to learn about it first and then try out some coding problems. All the sequencing problems seem to have some words related to genetics. So first things first, let’s get started. 😊
The word Bioinformatics is making quite a stir in today’s world of science. The word is made up of two parts that relate to two different fields, biology and computer science. One or two decades ago, people saw biology and computer science as two entirely different fields. One would learn about living beings and their functions, whereas the other would learn about computers and the underlying theories.