
Data Visualization

Vis Skunkworks

Effective visualization makes complex data clear.

Visual representation can be used in innovative ways to gain insight into biological data and to communicate science effectively. Doing so requires discovering novel visual encoding systems and applying existing ones in new ways. The challenge facing biology today is making sense of vast amounts of information. By bringing together an understanding of biology with principles from perception, cognition, and design, we help researchers meet this challenge by describing and developing meaningful visual representations of data.

Designing effective visual encodings of data requires a primary focus on the scientific questions and a thorough characterization of the visualization system. It is a process that translates the language of biology into the more abstract language of computer science and maps it to information visualization.

The Data Visualization Initiative aims to 1) establish processes for creating informative visualization models, 2) provide functional prototypes, and 3) build a community for people who apply visuals in their research. This effort is being led by Bang Wong, Noam Shoresh, and Liraz Greenfeld.

Pathline: A tool for comparative genomics
M. Meyer, B. Wong, M. Styczynski, T. Munzner, and H. Pfister.
Eurographics/IEEE-VGTC Symposium on Visualization, 29: 1-10 (2010)

Biologists pioneering the new field of comparative functional genomics attempt to infer the mechanisms of gene regulation by looking for similarities and differences of gene activity over time across multiple species. They use three kinds of data: functional data such as gene activity measurements, pathway data that represent a series of reactions within a cellular process, and phylogenetic relationship data that describe the relatedness of species. No existing visualization tool can visually encode the biologically interesting relationships between multiple pathways, multiple genes, and multiple species. We tackle the challenge of visualizing all aspects of this comparative functional genomics dataset with a new interactive tool called Pathline. In addition to the overall characterization of the problem and design of Pathline, our contributions include two new visual encoding techniques. One is a new method for linearizing metabolic pathways that provides appropriate topological information and supports the comparison of quantitative data along the pathway. The second is the curvemap view, a depiction of time series data for comparison of gene activity and metabolite levels across multiple species. Pathline was developed in close collaboration with a team of genomic scientists. We validate our approach with case studies of the biologists’ use of Pathline and report on how they use the tool to confirm existing findings and to discover new scientific insights.


Figure 2:
Linearizing a pathway. (a) The node-link representation of the directed graph includes both a branch and a cycle. (b) Loops are unrolled and branches are disconnected. (c) Branches are reinserted just above their reconnection points. (d) The pathway is represented as a grey segment, with genes encoded spatially with points and metabolites as lines. Short breaks in the pathway segment indicate branch points, along with stylized marks to the left of the blocks. Cycle start points are also shown to the left with another mark.
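
The steps in this caption can be approximated in a few lines of code. The sketch below is an illustrative simplification, not Pathline's implementation: the function name `linearize`, the adjacency-dictionary input, and the toy pathway are all assumptions made for this example, and branches nested inside other branches are not followed.

```python
def linearize(graph, start):
    """Flatten a small directed pathway graph into one top-to-bottom order.

    Hypothetical helper: `graph` maps each node to a list of successors.
    Returns (order, branch_points, cycle_starts).
    """
    # Walk the main path. An edge that returns to an already visited node
    # closes a loop, so it is dropped and its target is marked as a cycle start.
    main, seen, cycle_starts = [], set(), set()
    pending = []  # (branch point, first node of the disconnected branch)
    node = start
    while node is not None:
        main.append(node)
        seen.add(node)
        nxt = None
        for succ in graph.get(node, []):
            if succ in seen:
                cycle_starts.add(succ)        # loop unrolled
            elif nxt is None:
                nxt = succ                    # first unseen successor stays on the main path
            else:
                pending.append((node, succ))  # other successors become branches
        node = nxt

    # Reinsert each branch just above the node where it rejoins the main path.
    order = list(main)
    branch_points = set()
    for branch_point, first in pending:
        chain, node = [], first
        while node is not None and node not in seen:
            chain.append(node)
            seen.add(node)
            succs = graph.get(node, [])
            node = succs[0] if succs else None  # simplification: follow one successor only
        if not chain:
            continue  # the "branch" was only a shortcut edge to a main-path node
        branch_points.add(branch_point)
        rejoin = node
        if rejoin in order and order.index(rejoin) > order.index(branch_point):
            i = order.index(rejoin)
            order[i:i] = chain                # place the branch just above its reconnection point
        else:
            order.extend(chain)               # no downstream reconnection: append at the end
    return order, branch_points, cycle_starts


# Toy pathway with one branch (B -> X -> D) and one cycle (E -> B).
pathway = {"A": ["B"], "B": ["C", "X"], "C": ["D"], "X": ["D"], "D": ["E"], "E": ["B"]}
order, branch_points, cycle_starts = linearize(pathway, "A")
print(order, branch_points, cycle_starts)
```

In this toy case the branch node `X` lands directly above `D`, where it reconnects, and `B` is reported as both a branch point and a cycle start. A display could use those two sets to draw the break and cycle marks the caption describes.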

Figure 4:
Whole genome duplication event. (a) The known post-duplication shift in activity patterns in the first five rows between the g1 and g2 genes is immediately obvious in Pathline, where the curves clearly have mirror symmetry. (b) The mirror symmetry is much less apparent in a conventional heatmap view showing the same data.