Time trajectory inference on Bone Marrow data.

This case study demonstrates the use of pseudotime trajectory inference to analyze a bone marrow dataset.

You can import the study used in this tutorial into your study list, with view permission, using the following link: https://cytoanalyst.tinnguyen-lab.com/studies/import/ZDdj4h49ZbA5KHXWL

Dataset

We use a subset of the bone marrow dataset from a workshop for this analysis. Below are the details of the dataset.

Dataset download link: https://export.uppmax.uu.se/naiss2023-23-3/workshops/workshop-scrnaseq/trajectory/trajectory_seurat_filtered.h5ad
Backup download link: https://zenodo.org/records/15319627 (use this link if the primary link becomes unavailable)

Note: The dataset comprises 5,828 cells and has been filtered to retain only high-quality cells. Additionally, it includes precomputed embeddings and a clustering result.
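Before uploading, it can be useful to confirm what the file contains. The following is a minimal sketch, assuming the Python anndata package is installed when a real file is read; the helper names summarize and summarize_file are hypothetical, and the demo object only stands in for a loaded dataset.

```python
from types import SimpleNamespace


def summarize(adata):
    """Return the cell count, embedding names, and metadata columns of an AnnData-like object."""
    return {
        "n_cells": adata.n_obs,
        "embeddings": sorted(adata.obsm.keys()),
        "metadata_columns": list(adata.obs.columns),
    }


def summarize_file(path):
    """Load an .h5ad file with the anndata package and summarize it."""
    import anndata  # imported lazily so the rest of the sketch runs without it

    return summarize(anndata.read_h5ad(path))


# Stand-in object for demonstration; these values are made up, not read from the dataset.
demo = SimpleNamespace(
    n_obs=3,
    obsm={"emb_b": None, "emb_a": None},
    obs=SimpleNamespace(columns=["sample", "cluster"]),
)
print(summarize(demo))
```

Calling summarize_file on the downloaded file should report the 5,828 cells noted above, along with the precomputed embeddings and the metadata columns that hold the sample IDs and the clustering result.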

For more information about the dataset, refer to the workshop.

Workflow

Create a New Study

To create a new study, navigate to the Study Management page and complete the Create Study form with the following inputs:

Name: Time trajectory inference. A descriptive name for the study.
Description: Time trajectory inference with a bone marrow dataset. A brief description of the study.

Then, click the Create Study button.

A new study will be created and added to the study table. You can use this table to manage your studies or navigate to the other pages. For more details about managing studies, refer to the Study Management page.

To navigate to the Data Management page for further analyses, hover over the View action of the study you just created on the study table. A pop-up will appear with options to navigate the study. Click on the Data Management option to go to the data management page.

Upload data

Uploading and processing the data

On the data management page, we will use the downloaded subset of the bone marrow dataset.

Click the Click to upload button and select the dataset to upload.

File Type: AnnData (.h5ad). This specifies the format of the dataset file.
File: trajectory_seurat_filtered.h5ad. The name of the downloaded dataset file, as given in the primary download link.
Assay: Default. The type of assay used in the dataset. In this case, it is the default assay.
Feature ID Column: _index. The column in the dataset that contains the feature IDs. In this case, it is the gene IDs.
Keep Embeddings: True. Indicates whether to retain the precomputed embeddings in the dataset.
Embeddings: umap - umap3d. Specifies precomputed visualizations and embeddings to be kept. In this case, we select only one embedding that will be used in the further analysis.
Keep Metadata in h5ad File: True. Indicates whether to retain the metadata within the dataset file.
Extra Metadata File: Empty. An additional metadata file to be uploaded. In this case, no extra metadata file is provided.
Has Multiple Samples: True. Indicates whether the dataset contains multiple samples.
Sample ID Is In: Metadata. Indicates where the sample IDs are located. In this case, they are in the metadata.
Sample ID Column: dataset. The column in the dataset that contains the sample IDs.

Finally, click the Submit button to start the data preprocessing. A job will be created in the background to process the data. Once the job is completed, the right panel will display the options for data filtering. Visit Study Logs to learn more about monitoring analysis jobs and system status.

Data Filtering

In the data filtering panel, you can filter cells based on different criteria, including the number of UMI counts, the number of genes expressed, and the percentage of mitochondrial genes.

Check out Data Management for more details on data filtering.

In this case study, we will not apply any data filtering as the downloaded dataset is already preprocessed.
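For reference, the three filtering criteria mentioned above can be sketched in a few lines. This is an illustration only, not the platform's implementation: the function names, the thresholds, the toy count matrix, and the "mt-" prefix used to recognize mitochondrial genes are all assumptions.

```python
def qc_metrics(counts, gene_names, mito_prefix="mt-"):
    """Compute per-cell QC metrics from a cells x genes count matrix (list of lists)."""
    is_mito = [g.lower().startswith(mito_prefix) for g in gene_names]
    metrics = []
    for row in counts:
        total = sum(row)
        n_genes = sum(1 for c in row if c > 0)
        mito = sum(c for c, m in zip(row, is_mito) if m)
        pct_mito = 100.0 * mito / total if total else 0.0
        metrics.append({"n_umi": total, "n_genes": n_genes, "pct_mito": pct_mito})
    return metrics


def keep_cells(metrics, min_umi=0, min_genes=0, max_pct_mito=100.0):
    """Return the indices of cells that pass all three thresholds."""
    return [
        i
        for i, m in enumerate(metrics)
        if m["n_umi"] >= min_umi
        and m["n_genes"] >= min_genes
        and m["pct_mito"] <= max_pct_mito
    ]


# Toy data: three cells, three genes (the last gene is treated as mitochondrial).
genes = ["Cd34", "Ltf", "mt-Co1"]
counts = [
    [5, 3, 2],  # 10 UMIs, 3 genes expressed, 20% mitochondrial
    [0, 1, 9],  # 10 UMIs, 2 genes expressed, 90% mitochondrial
    [1, 0, 0],  # 1 UMI, 1 gene expressed, 0% mitochondrial
]
metrics = qc_metrics(counts, genes)
kept = keep_cells(metrics, min_umi=5, min_genes=2, max_pct_mito=50.0)
print(kept)
```

With these hypothetical thresholds only the first cell is kept: the second fails the mitochondrial cutoff and the third has too few UMI counts.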

Save data

Click the Save data button to open the dialog for saving data.

This dialog allows you to choose which samples and embeddings to save. In this case study, we will save all samples and the embedding that we chose to retain in the data uploading process.

Then, click the Save data button to save the data. The newly saved data will be added to the data table.

Navigate to the Analysis page

Once the data is saved, scroll to the top of the current page and click Analysis to navigate to the Analysis page.

The Analysis page provides a comprehensive view of the data and analysis tools. The basic layout of the Analysis page is shown below:

Top Toolbar: Contains dropdown menus for selecting embedding, data normalization, plot type, blending mode, and color map.
Left Sidebar: Contains the label selection panel for selecting labels to visualize.
Bottom Drawer: Contains all analysis tools.

For more details about navigation and understanding the layout of the Analysis page, refer to Data Analysis.

Create a new genes collection

In this case study, we perform trajectory inference for the genes Ms4a1, Ltf, Siglech, and Cd34, which are included in the author's list of cell-type markers.

Note: You can visit their workshop to obtain the complete list of cell-type markers collected and provided by the author.

Cell Type	Marker
B cell lineage	Ms4a1
Granulocyte lineage	Ltf
Dendritic cell lineage	Siglech
HSC progenitor	Cd34

Follow these steps to create a gene set collection:

Click the Genes Collection tab on the Bottom Drawer
Click the New Collection button to create a new genes collection
Select Input Type as Text Input
Enter a name (this tutorial uses Cell Types Marker) and the following gene sets in the text box:

B cell lineage	Ms4a1
Granulocyte lineage	Ltf
Dendritic cell lineage	Siglech
HSC progenitor	Cd34

Click the Save button to save the gene set collection.

For more details on the various methods to create a new Gene Set Collection and manage existing gene collections, refer to Gene Set Collection.
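The text input shown above has one gene set per line. As a rough sketch of how such input could be read, the following parser assumes each line is a set name followed by one or more tab-separated genes; that layout and the function name parse_gene_sets are assumptions for illustration, not a description of the platform's parser.

```python
def parse_gene_sets(text):
    """Parse lines of 'set name<TAB>gene[<TAB>gene...]' into a dict of gene lists."""
    gene_sets = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = [f.strip() for f in line.split("\t")]
        name, genes = fields[0], [g for g in fields[1:] if g]
        if not name or not genes:
            raise ValueError(f"line {line_no}: expected a set name and at least one gene")
        gene_sets.setdefault(name, []).extend(genes)
    return gene_sets


text = (
    "B cell lineage\tMs4a1\n"
    "Granulocyte lineage\tLtf\n"
    "Dendritic cell lineage\tSiglech\n"
    "HSC progenitor\tCd34\n"
)
collection = parse_gene_sets(text)
print(collection)
```

Checking the input this way before pasting it helps catch lines where the tab between the set name and the gene has been lost.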

Data visualization

Visualize the cell landscape and the clustering analysis result

In this section, we will explore our data by examining the cell landscape and the result of the clustering analysis to provide an overview of the data structure.

Follow these steps to visualize the cell landscape and the clustering side by side:

In the top toolbar, use the following settings:
- Visualization embedding: umap3d. Specifies which embedding will be used for visualization.
- Plot Type: Scatter. Specifies the type of chart that will be displayed.
- Plot blending mode: Separate. Specifies which blending mode will be used for visualization.
In the label selection panel on the left sidebar, select all samples, including GSE107727, Marrow, GSE132042, and GSE108097. Then, click the button next to the Samples label to display the cell landscape. Under Categorical Metadata, click the button next to the clusters_use label to display the clustering analysis result.

Update color mapping and other settings

Click the button on the top toolbar to update the color mapping and other settings.

Here you will be able to:

Update the color mapping
Change the plot title
Arrange the plot by simply dragging it up or down
Change how many plots are displayed in a row
Enable and disable tooltips
Synchronize the zoom and pan across all plots
And more

For more details about data visualization, refer to Data Visualization.

Trajectory inference analysis

In this section, we will employ the Slingshot method to perform trajectory inference on the genes Ms4a1, Ltf, Siglech, and Cd34. The analysis contains the following parts:

Visualize the expression of each gene.
Create and submit a new form to perform trajectory inference analyses.
Browse and visualize the results of the trajectory inference analysis.

Trajectory inference for the Ms4a1 gene

Visualize the Ms4a1 gene expression

The Slingshot analysis takes starting and ending groups as parameters. We therefore inspect the expression of the Ms4a1 gene across clusters to select these groups manually.

Follow these steps to visualize the expression of the Ms4a1 gene and the clustering analysis result.

In the top toolbar, ensure the settings are as follows:
- Visualization embedding: umap3d. Specifies which embedding will be used for visualization.
- Plot Type: Scatter. Specifies the type of chart that will be displayed.
- Plot blending mode: Separate. Specifies which blending mode will be used for visualization.

Switch to the Observation tab in the left sidebar, then:
- Ensure that the button next to the Samples label is unselected
- Ensure that the button next to the clusters_use label is selected

Retain only the visualization of the clustering

Switch to the Feature tab in the left sidebar, then:
- Under Gene set collections, you will see your created gene set collection named Cell Types Marker.
- Click the B cell lineage to expand the individual genes in the gene set.
- Click the button next to the Ms4a1 label to visualize the gene.

The Ms4a1 gene expression patterns (the right panel) reveal high Ms4a1 expression in cluster 20 and minimal expression in clusters 33 and 47. Based on these expression patterns, we designate cluster 20 as the starting point and clusters 33 and 47 as endpoints for the Slingshot trajectory inference parameters.
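The selection above is made by eye from the plots. A numeric counterpart, sketched here under the assumption that per-cell expression values and cluster labels are available as plain lists, is to rank clusters by mean expression. The cluster labels and values below are toy data, and the function names are hypothetical.

```python
from collections import defaultdict


def mean_expression_by_cluster(expression, clusters):
    """Average one gene's expression over the cells of each cluster."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, cluster in zip(expression, clusters):
        totals[cluster] += value
        counts[cluster] += 1
    return {c: totals[c] / counts[c] for c in totals}


def rank_clusters(means):
    """Order clusters from highest to lowest mean expression."""
    return sorted(means, key=lambda c: means[c], reverse=True)


# Toy data: six cells in three clusters.
expression = [4.0, 6.0, 0.0, 0.0, 1.0, 3.0]
clusters = ["A", "A", "B", "B", "C", "C"]
means = mean_expression_by_cluster(expression, clusters)
ranked = rank_clusters(means)
print(ranked[0], ranked[-1])
```

The first entry of the ranking corresponds to a candidate starting group and the last entries to candidate ending groups, following the rule used in this tutorial. Such a ranking only supports the visual inspection; it does not replace it.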

Perform trajectory inference analysis for the Ms4a1 gene

To perform trajectory inference analysis, click the Time Trajectory tab on the Bottom Drawer, and then click the New Time Trajectory button.

Follow these settings to perform the analysis:

Name: Trajectory inference for MS4A1 gene. Analysis identifier.
Method: Slingshot. Trajectory inference method used for the analysis.
Embeddings: umap3d. Embedding used for the analysis. In this case, we use the generated embedding from the dataset author.
Group cells by: Metadata. Specifies cell grouping method.
Metadata: clusters_use. Metadata field for cell grouping. In this case, we use the clustering analysis result from the dataset author.
Start Groups: 20. Initial cluster for the trajectory inference analysis.
End Groups: 33, 47. Terminal clusters for the trajectory inference analysis.
Distance method: Mutual nearest neighbors. Method for computing inter-cluster distances.
Convergence threshold: 0.100. Threshold for change in total distance between cells and curve projections.
Approximate number of points: 300. Number of points along trajectory curves.
Stretch: 0.1. Curve extrapolation factor beyond endpoints.
Allow breaks: False. Ensures continuous principal curves from origin.

Click the Submit button to create the trajectory inference analysis.
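To give an intuition for the Start Groups and End Groups settings: Slingshot connects cluster centers with a minimum spanning tree and reads lineages as paths from the starting cluster to the leaves, with ending clusters constrained to be leaves. The sketch below illustrates that idea on toy 3D coordinates. It is a simplification, not the Slingshot implementation: it uses plain Euclidean distance between centroids rather than the Mutual nearest neighbors method selected in the form, it enforces the end-cluster constraint in one simple way, and the intermediate cluster "X" and all coordinates are made up.

```python
import math


def centroids(points, labels):
    """Mean coordinate of the cells in each cluster."""
    sums, counts = {}, {}
    for p, lab in zip(points, labels):
        acc = sums.setdefault(lab, [0.0] * len(p))
        for i, x in enumerate(p):
            acc[i] += x
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: tuple(v / counts[lab] for v in acc) for lab, acc in sums.items()}


def build_tree(centers, end_groups=()):
    """Prim's MST over the non-end clusters, then attach each end cluster as a leaf."""
    inner = [c for c in centers if c not in end_groups]
    adjacency = {c: set() for c in centers}
    in_tree = {inner[0]}
    while len(in_tree) < len(inner):
        a, b = min(
            ((a, b) for a in in_tree for b in inner if b not in in_tree),
            key=lambda e: math.dist(centers[e[0]], centers[e[1]]),
        )
        adjacency[a].add(b)
        adjacency[b].add(a)
        in_tree.add(b)
    for end in end_groups:
        nearest = min(inner, key=lambda c: math.dist(centers[c], centers[end]))
        adjacency[end].add(nearest)
        adjacency[nearest].add(end)
    return adjacency


def lineages(adjacency, start):
    """All paths from the start cluster to a leaf of the tree."""
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        unvisited = [n for n in adjacency[path[-1]] if n not in path]
        if not unvisited:
            paths.append(path)
        for n in unvisited:
            stack.append(path + [n])
    return sorted(paths)


# Toy embedding: two cells per cluster.
points = [(0, 0, 0), (0.2, 0, 0), (1, 0, 0), (1.2, 0, 0),
          (2, 1, 0), (2.2, 1, 0), (2, -1, 0), (2.2, -1, 0)]
labels = ["20", "20", "X", "X", "33", "33", "47", "47"]
tree = build_tree(centroids(points, labels), end_groups=("33", "47"))
print(lineages(tree, "20"))
```

In this toy layout the tree yields two lineages, one ending in each of the two end groups, both passing through the intermediate cluster.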

Browse and Visualize Trajectory Inference Analysis Results for the Ms4a1 gene

Once the trajectory inference is completed, the results will be added to the Trajectory Inference Table in the Existing Time Trajectory tab. Follow these steps to view the table:

Click the Time Trajectory tab in the Bottom Drawer.
Click the Existing Time Trajectory tab to switch to the existing time trajectory panel.
Click the Expand Plus icon next to the analysis name Trajectory inference for MS4A1 gene to expand the analysis results for the Ms4a1 gene.

Trajectory Inference Analysis Results Table For The MS4A1 Gene

Here, we will display multiple plots to visualize the analysis results. In the top toolbar, ensure the settings are configured as follows:

Visualization embedding: umap3d. Specifies which embedding will be used for visualization.
Plot Type: Scatter. Specifies the type of chart that will be displayed.
Plot blending mode: Separate. Specifies which blending mode will be used for visualization.

Switch to the Observation panel in the left sidebar, then:

Click the name of the trajectory inference analysis for the Ms4a1 gene, Trajectory inference for MS4A1 gene, to expand all results of the analysis.
Click the button next to each of the All, Lineage 1, Lineage 2, and Lineage 5 labels, and ensure that the button next to the clusters_use label is selected.

Results Selection Of The Trajectory Inference Analysis For MS4A1 Gene In the Observation Panel

Switch to the Features panel in the left sidebar, then:

Under Cell Types Marker, click the B cell lineage to expand the individual genes in the gene set.
Ensure that the button next to the Ms4a1 label is selected.

Results Selection Of The Trajectory Inference Analysis For MS4A1 Gene In the Features Panel

In the top toolbar, click the button to open the visualization settings panel and follow these settings:

Number of rows: 2. This determines the number of rows in the grid layout.
Sync zoom: Enable. When enabled, zooming in on one scatter plot will zoom in on all scatter plots in the grid.
Show plot title: Enable. This will show the title of each plot in the grid. You can position the title to the left, center, or right.

In the visualization settings table, focus on the key customization settings as outlined below:

Name: The name/title of the plot.
- You can edit the name by clicking the Edit icon next to the current plot name.
Blend Mode: The blending mode used in the plot. Blend mode can be used to combine multiple plots into one visualization. See Blend Mode for details on how to use blending modes.
Color: The color mapping used in the plot. Depending on the chart type, the color mapping can be of two types:
- Value: The color mapping is based on the expression values, such as the minimum and maximum expression values.
- Group: The color mapping is based on the groupings, such as the metadata, clusters, or annotations.
For more details about color customization, refer to Visualization Settings.
Action:
- Click the Remove icon to remove the plot from the grid if needed.
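The two kinds of color mapping described above can be illustrated with a short sketch. It assumes a simple linear interpolation between two RGB colors for value-based mapping and a cycling palette for group-based mapping; the colors and function names are chosen for illustration and are not the platform's defaults.

```python
def value_to_color(value, vmin, vmax, low=(255, 255, 255), high=(255, 0, 0)):
    """Linearly interpolate an RGB color between low and high for a value in [vmin, vmax]."""
    if vmax == vmin:
        t = 0.0
    else:
        t = min(1.0, max(0.0, (value - vmin) / (vmax - vmin)))
    return tuple(round(lo + t * (hi - lo)) for lo, hi in zip(low, high))


def group_to_color(groups, palette):
    """Assign one palette color per distinct group, cycling if the palette is too short."""
    distinct = sorted(set(groups))
    return {g: palette[i % len(palette)] for i, g in enumerate(distinct)}


# Value mapping: expression between a minimum of 0 and a maximum of 5.
print(value_to_color(0, 0, 5), value_to_color(5, 0, 5))

# Group mapping: one color per cluster label.
print(group_to_color(["20", "33", "20"], ["#1f77b4", "#ff7f0e"]))
```

Value mapping depends on the chosen minimum and maximum, so changing those bounds changes how strongly low and high expression are contrasted. Group mapping depends only on which labels are present.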

Visualization Settings of The Trajectory Inference Analysis Results For The MS4A1 Gene

Click anywhere outside the visualization settings panel to close it. The details of all the plots are:

First plot: All lineages. Displays the complete trajectory lineage map of the analysis.
Second plot: Lineage 1 & Lineage 2. Shows the developmental pathways from the starting to the ending groups.
Third plot: Ms4a1 & Lineage 5. Illustrates the Ms4a1 gene expression overlay with the representative lineage Lineage 5.
Fourth plot: clusters_use. Shows the original clustering result that helped determine the starting and the ending groups.

The Complete Results of The Trajectory Inference Analysis For The MS4A1 Gene
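Along each lineage, Slingshot assigns a pseudotime to a cell by projecting the cell onto the lineage's fitted curve and measuring the distance travelled along the curve to that projection. The sketch below shows this projection for a piecewise-linear curve, which is a simplification of the smooth principal curves that Slingshot fits; the curve, the cells, and the function name are illustrative assumptions.

```python
import math


def project_onto_curve(point, curve):
    """Return (pseudotime, distance) for the closest location on a piecewise-linear curve."""
    best_time, best_dist = 0.0, math.inf
    travelled = 0.0  # arc length accumulated over the segments visited so far
    for a, b in zip(curve, curve[1:]):
        seg = [bj - aj for aj, bj in zip(a, b)]
        seg_len2 = sum(s * s for s in seg)
        if seg_len2 == 0:
            continue  # skip degenerate segments
        # Position of the orthogonal projection along the segment, clamped to [0, 1].
        t = sum((pj - aj) * sj for pj, aj, sj in zip(point, a, seg)) / seg_len2
        t = min(1.0, max(0.0, t))
        closest = [aj + t * sj for aj, sj in zip(a, seg)]
        dist = math.dist(point, closest)
        seg_len = math.sqrt(seg_len2)
        if dist < best_dist:
            best_time, best_dist = travelled + t * seg_len, dist
        travelled += seg_len
    return best_time, best_dist


# Toy curve with two unit-length segments, and two toy cells.
curve = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
early_cell = project_onto_curve((0.5, 0.2, 0), curve)
late_cell = project_onto_curve((1.3, 0.5, 0), curve)
print(early_cell, late_cell)
```

The first cell projects onto the first segment and the second cell onto the second segment, so the second cell receives the larger pseudotime.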

Trajectory inference for the Ltf gene

Visualize the Ltf gene expression

We will observe the expression of the Ltf gene to manually select the starting and ending points for the Slingshot method parameters.

The Ltf gene expression patterns (the right panel) reveal high Ltf expression in cluster 5 and minimal expression in clusters 9 and 17. Based on these expression patterns, we designate cluster 5 as the starting point and clusters 9 and 17 as endpoints for the Slingshot trajectory inference parameters.

Perform trajectory inference analysis for the Ltf gene

Follow these settings to perform the analysis:

Name: Trajectory inference for LTF gene. Analysis identifier.
Method: Slingshot. Trajectory inference method used for the analysis.
Embeddings: umap3d. Embedding used for the analysis. In this case, we use the generated embedding from the dataset author.
Group cells by: Metadata. Specifies cell grouping method.
Metadata: clusters_use. Metadata field for cell grouping. In this case, we use the clustering analysis result from the dataset author.
Start Groups: 5. Initial cluster for the trajectory inference analysis.
End Groups: 9, 17. Terminal clusters for the trajectory inference analysis.
Distance method: Mutual nearest neighbors. Method for computing inter-cluster distances.
Convergence threshold: 0.100. Threshold for change in total distance between cells and curve projections.
Approximate number of points: 300. Number of points along trajectory curves.
Stretch: 0.1. Curve extrapolation factor beyond endpoints.
Allow breaks: False. Ensures continuous principal curves from origin.

Click the Submit button to create the trajectory inference analysis.
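The Convergence threshold setting above controls when the iterative curve fitting stops. As a hedged sketch, assuming the criterion is the relative change in the total distance between cells and their projections from one iteration to the next, the stopping rule could look like the following. The sequence of totals is made up, and the function names are hypothetical.

```python
def has_converged(previous_total, current_total, threshold):
    """True when the relative change in total cell-to-curve distance is below the threshold."""
    if previous_total == 0:
        return current_total == 0
    return abs(previous_total - current_total) / previous_total < threshold


def iterations_until_converged(totals, threshold):
    """Return the 1-based iteration at which the totals first converge, or None."""
    for i in range(1, len(totals)):
        if has_converged(totals[i - 1], totals[i], threshold):
            return i + 1
    return None


# Hypothetical total distances after each fitting iteration.
totals = [100.0, 80.0, 76.0, 75.5]
print(iterations_until_converged(totals, 0.1))
print(iterations_until_converged(totals, 0.01))
print(iterations_until_converged(totals, 0.001))
```

With this toy sequence, a threshold of 0.1 stops at the third iteration, 0.01 stops at the fourth, and 0.001 is not reached within the four iterations shown. A smaller threshold therefore means more iterations and a closer fit.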

Browse and Visualize Trajectory Inference Analysis Results for the Ltf gene

Once the trajectory inference is completed, the results will be added to the Trajectory Inference Table in the Existing Time Trajectory tab.

Trajectory Inference Analysis Results Table For The LTF Gene

We will also display multiple plots to visualize the analysis results. The details of all the plots are:

First plot: All lineages. Displays the complete trajectory lineage map of the analysis.
Second plot: Lineage 8 & Lineage 9. Shows the developmental pathways from the starting to the ending groups.
Third plot: Ltf & Lineage 1. Illustrates the Ltf gene expression overlay with the representative lineage Lineage 1.
Fourth plot: clusters_use. Shows the original clustering result that helped determine the starting and the ending groups.

The Complete Results of The Trajectory Inference Analysis For The LTF Gene

Trajectory inference for the Siglech gene

Visualize the Siglech gene expression

We will observe the expression of the Siglech gene to manually select the starting and ending points for the Slingshot method parameters.

The Siglech gene expression patterns (the right panel) reveal high Siglech expression in cluster 35 and minimal expression in clusters 18 and 27. Based on these expression patterns, we designate cluster 35 as the starting point and clusters 18 and 27 as endpoints for the Slingshot trajectory inference parameters.

Perform trajectory inference analysis for the Siglech gene

Follow these settings to perform the analysis:

Name: Trajectory inference for SIGLECH gene. Analysis identifier.
Method: Slingshot. Trajectory inference method used for the analysis.
Embeddings: umap3d. Embedding used for the analysis. In this case, we use the generated embedding from the dataset author.
Group cells by: Metadata. Specifies cell grouping method.
Metadata: clusters_use. Metadata field for cell grouping. In this case, we use the clustering analysis result from the dataset author.
Start Groups: 35. Initial cluster for the trajectory inference analysis.
End Groups: 18, 27. Terminal clusters for the trajectory inference analysis.
Distance method: Mutual nearest neighbors. Method for computing inter-cluster distances.
Convergence threshold: 0.100. Threshold for change in total distance between cells and curve projections.
Approximate number of points: 300. Number of points along trajectory curves.
Stretch: 0.1. Curve extrapolation factor beyond endpoints.
Allow breaks: False. Ensures continuous principal curves from origin.

Click the Submit button to create the trajectory inference analysis.
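The Approximate number of points setting above fixes how many points represent each trajectory curve. One way to picture this, sketched under the assumption that a curve is stored as a list of coordinates, is to resample the curve at evenly spaced positions along its length. This illustrates the idea only; it is not how Slingshot computes its curves internally, and the function name and toy curve are assumptions.

```python
import math


def resample_curve(curve, n_points):
    """Return n_points evenly spaced by arc length along a piecewise-linear curve."""
    if len(curve) < 2 or n_points < 2:
        raise ValueError("need at least two curve points and two output points")
    seg_lengths = [math.dist(a, b) for a, b in zip(curve, curve[1:])]
    total = sum(seg_lengths)
    resampled = []
    seg, seg_start = 0, 0.0  # current segment and the arc length at its start
    for k in range(n_points):
        target = total * k / (n_points - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(seg_lengths) - 1 and target > seg_start + seg_lengths[seg]:
            seg_start += seg_lengths[seg]
            seg += 1
        length = seg_lengths[seg]
        t = min(1.0, (target - seg_start) / length) if length else 0.0
        a, b = curve[seg], curve[seg + 1]
        resampled.append(tuple(aj + t * (bj - aj) for aj, bj in zip(a, b)))
    return resampled


# Toy curve of total length 4, resampled to 5 points spaced 1 apart.
curve = [(0, 0), (2, 0), (2, 2)]
five = resample_curve(curve, 5)
print(five)
print(len(resample_curve(curve, 300)))
```

More points give a smoother representation of the curve at a higher computational cost, which is the trade-off behind choosing a value such as 300.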

Browse and Visualize Trajectory Inference Analysis Results for the Siglech gene

Once the trajectory inference is completed, the results will be added to the Trajectory Inference Table in the Existing Time Trajectory tab.

Trajectory Inference Analysis Results Table For The SIGLECH Gene

We will also display multiple plots to visualize the analysis results. The details of all the plots are:

First plot: All lineages. Displays the complete trajectory lineage map of the analysis.
Second plot: Lineage 5 & Lineage 8. Shows the developmental pathways from the starting to the ending groups.
Third plot: Siglech & Lineage 1. Illustrates the Siglech gene expression overlay with the representative lineage Lineage 1.
Fourth plot: clusters_use. Shows the original clustering result that helped determine the starting and the ending groups.

The Complete Results of The Trajectory Inference Analysis For The SIGLECH Gene

Trajectory inference for the Cd34 gene

Visualize the Cd34 gene expression

We will observe the expression of the Cd34 gene to manually select the starting and ending points for the Slingshot method parameters.

The Cd34 gene expression patterns (the right panel) reveal high Cd34 expression in cluster 34 and minimal expression in clusters 17 and 49. Based on these expression patterns, we designate cluster 34 as the starting point and clusters 17 and 49 as endpoints for the Slingshot trajectory inference parameters.

Additionally, the author's analysis included clusters 27, 25, 16, 26, and 53 as ending groups. We therefore also include these clusters in our ending groups so that our results can be compared with the author's.

Perform trajectory inference analysis for the Cd34 gene

Follow these settings to perform the analysis:

Name: Trajectory inference for CD34 gene. Analysis identifier.
Method: Slingshot. Trajectory inference method used for the analysis.
Embeddings: umap3d. Embedding used for the analysis. In this case, we use the generated embedding from the dataset author.
Group cells by: Metadata. Specifies cell grouping method.
Metadata: clusters_use. Metadata field for cell grouping. In this case, we use the clustering analysis result from the dataset author.
Start Groups: 34. Initial cluster for the trajectory inference analysis.
End Groups: 17, 27, 25, 16, 26, 53, and 49. Terminal clusters for the trajectory inference analysis.
Distance method: Mutual nearest neighbors. Method for computing inter-cluster distances.
Convergence threshold: 0.100. Threshold for change in total distance between cells and curve projections.
Approximate number of points: 300. Number of points along trajectory curves.
Stretch: 0.1. Curve extrapolation factor beyond endpoints.
Allow breaks: False. Ensures continuous principal curves from origin.

Click the Submit button to create the trajectory inference analysis.

Browse and Visualize Trajectory Inference Analysis Results for the Cd34 gene

Once the trajectory inference is completed, the results will be added to the Trajectory Inference Table in the Existing Time Trajectory tab.

Trajectory Inference Analysis Results Table For The CD34 Gene

The original study conducted trajectory inference analysis for the Cd34 gene and provided a lineages table for their results.

Comparing our lineages table with the author's shows perfect concordance between the two analyses.
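A comparison of two lineages tables can also be done programmatically. The sketch below assumes each table is available as a list of cluster paths and treats two tables as concordant when they contain the same paths, regardless of the order or numbering of the lineages. The paths shown use placeholder labels, not the actual results, and the function names are hypothetical.

```python
def normalize(lineages):
    """Represent lineages as a set of tuples so that lineage order does not matter."""
    return {tuple(str(c) for c in path) for path in lineages}


def compare_lineages(ours, reference):
    """Report which cluster paths are shared and which appear in only one table."""
    a, b = normalize(ours), normalize(reference)
    return {
        "shared": sorted(a & b),
        "only_ours": sorted(a - b),
        "only_reference": sorted(b - a),
    }


# Placeholder lineages: the same two paths listed in a different order.
ours = [["A", "B", "C"], ["A", "D"]]
reference = [["A", "D"], ["A", "B", "C"]]
report = compare_lineages(ours, reference)
print(report)
```

Two tables agree completely when both only_ours and only_reference are empty, as in this placeholder example.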

Finally, we will display multiple plots to visualize the analysis results. The details of all the plots are:

First plot: All lineages. Displays the complete trajectory lineage map of the analysis.
Second plot: Lineage 1 & Lineage 2. Shows the developmental pathways from the starting to the ending groups.
Third plot: Cd34 & Lineage 5. Illustrates the Cd34 gene expression overlay with the representative lineage Lineage 5.
Fourth plot: clusters_use. Shows the original clustering result that helped determine the starting and the ending groups.

The Complete Results of The Trajectory Inference Analysis For The CD34 Gene

Last modified: 24 September 2025