Jens G Lohr, Viktor A Adalsteinsson, Kristian Cibulskis, Atish D Choudhury, Mara Rosenberg, Peter Cruz-Gordillo, Joshua M Francis, Cheng-Zhong Zhang, Alex K Shalek, Rahul Satija, John J Trombetta, Diana Lu, Naren Tallapragada, Narmin Tahirova, Sora Kim, Brendan Blumenstiel, Carrie Sougnez, Alarice Lowe, Bang Wong, Daniel Auclair, Eliezer M Van Allen, Mari Nakabayashi, Rosina T Lis, Gwo-Shu M Lee, Tiantian Li, Matthew S Chabot, Amy Ly, Mary-Ellen Taplin, Thomas E Clancy, Massimo Loda, Aviv Regev, Matthew Meyerson, William C Hahn, Philip W Kantoff, Todd R Golub, Gad Getz, Jesse S Boehm & J Christopher Love.
Nat. Biotechnol. 32, 479–484 (2014).
Comprehensive analyses of cancer genomes promise to inform prognoses and precise cancer treatments. A major barrier, however, is inaccessibility of metastatic tissue. A potential solution is to characterize circulating tumor cells (CTCs), but this requires overcoming the challenges of isolating rare cells and sequencing low-input material. Here we report an integrated process to isolate, qualify and sequence whole exomes of CTCs with high fidelity using a census-based sequencing strategy. Power calculations suggest that mapping of >99.995% of the standard exome is possible in CTCs. We validated our process in two patients with prostate cancer, including one for whom we sequenced CTCs, a lymph node metastasis and nine cores of the primary tumor. Fifty-one of 73 CTC mutations (70%) were present in matched tissue. Moreover, we identified 10 early trunk and 56 metastatic trunk mutations in the non-CTC tumor samples and found 90% and 73% of these mutations, respectively, in CTC exomes. This study establishes a foundation for CTC genomics in the clinic.
Sandro Santagata, Marc L. Mendillo, Yun-chi Tang, Aravind Subramanian, Casey C. Perley, Stephane P. Roche, Bang Wong, Rajiv Narayan, Hyoungtae Kwon, Martina Koeva, Angelika Amon, Todd R. Golub, John A. Porco Jr., Luke Whitesell, Susan Lindquist.
Science 19 July 2013
The ribosome is centrally situated to sense metabolic states, but whether its activity, in turn, coherently rewires transcriptional responses is unknown. Here, through integrated chemical-genetic analyses, we found that a dominant transcriptional effect of blocking protein translation in cancer cells was inactivation of heat shock factor 1 (HSF1), a multifaceted transcriptional regulator of the heat-shock response and many other cellular processes essential for anabolic metabolism, cellular proliferation, and tumorigenesis. These analyses linked translational flux to the regulation of HSF1 transcriptional activity and to the modulation of energy metabolism. Targeting this link with translation initiation inhibitors such as rocaglates deprived cancer cells of their energy and chaperone armamentarium and selectively impaired the proliferation of both malignant and premalignant cells with early-stage oncogenic lesions.
Martin Krzywinski & Bang Wong. Nature Methods 10, 451 (2013).
Choose distinct symbols that overlap without ambiguity and communicate relationships in data.
Scatter plots require us to visually assemble data point symbols into patterns so that we can understand the relationship between the variables. Symbols can therefore have a large impact on figure legibility and clarity. Well-chosen symbols mitigate the effects of data occlusion and maintain the visual independence of different data categories.
Bang Wong. Nature Methods 9, 1131 (2012).
Data visualization is increasingly important, but it requires clear objectives and improved implementation.
Researchers today have access to an unprecedented amount of data. The challenge is to benefit from this abundance without being overwhelmed. Data visualization for efficient exploration and effective communication is integral to scientific progress. For visualization to continue to be an important tool for discovery, its practitioners need to be present as members of research teams.
Bang Wong & Rikke Schmidt Kjærgaard. Nature Methods 9, 1037 (2012).
A unique set of tools facilitates thinking and hypothesis generation.
Creating pictures is integral to scientific thinking. In the visualization process, putting pencil to paper is an essential act of inward reflection and outward expression. It is a constructive activity that makes our thinking specific and explicit. Compared to other constructive approaches such as writing or verbal explanations, visual representation places distinct demands on our reasoning skills by forcing us to contextualize our understanding spatially.
Nils Gehlenborg & Bang Wong. Nature Methods 9, 935 (2012).
Two-dimensional visualizations of multivariate data are most effective when combined.
High-dimensional data pose a significant analytical and representational challenge. One instinctual response has been to represent data in three-dimensional (3D) space in order to capture additional information. Given the common medium utilized for science communication, great utility can be achieved by pushing the communicative power of the endless 2D planes that surround us in the form of pieces of paper, computer monitors and video projections.
M. Garber et al., Mol. Cell, 47 (5): 810-822 (2012).
Understanding the principles governing mammalian gene regulation has been hampered by the difficulty in measuring in vivo binding dynamics of large numbers of transcription factors (TFs) to DNA. Here, we develop a high-throughput Chromatin ImmunoPrecipitation (HT-ChIP) method to systematically map protein-DNA interactions. HT-ChIP was applied to define the dynamics of DNA binding by 25 TFs and 4 chromatin marks at 4 time-points following pathogen stimulus of dendritic cells. Analyzing over 180,000 TF-DNA interactions, we find that TFs vary substantially in their temporal binding landscapes. These data suggest a model for transcription regulation whereby TF networks are hierarchically organized into cell differentiation factors, factors that bind targets prior to stimulus to prime them for induction, and factors that regulate specific gene programs. Overlaying HT-ChIP data on gene-expression dynamics shows that many TF-DNA interactions are established prior to the stimuli, predominantly at immediate-early genes, and identifies specific TF ensembles that coordinately regulate gene-induction.
Nils Gehlenborg & Bang Wong. Nature Methods 9, 851 (2012).
Three-dimensional visualizations are effective for spatial data but rarely for other data types.
When working with high-dimensional data, it may be tempting to choose a three-dimensional (3D) spatial visualization over a two-dimensional (2D) ‘flat’ representation because it allows us an additional data dimension. However, because quantitative, categorical and relational data often do not represent spatial relationships, plotting them in 3D space adds a level of visual complexity that often makes the data more difficult to understand. It can therefore be more effective to plot these data on a 2D plane and rely on nonspatial graphical encodings to represent additional dimensions.
Nils Gehlenborg & Bang Wong. Nature Methods 9, 769 (2012).
Data structure informs choice of color maps.
Data can be classified in many ways. One useful method of classifying data for visualization is to distinguish between those with and without an inherent order. For example, a set of species (such as Escherichia coli, Drosophila melanogaster and Homo sapiens) has no intuitive ordering and is considered ‘categorical data’, whereas a list of gene expression values is ‘ordered data’ because we can sort them from lowest to highest. In a previous column, we described methods for color-coding categorical data (August 2010). Here we focus on creating color maps for quantitative data.
Cydney Nielsen & Bang Wong. Nature Methods 9, 631 (2012).
Techniques for displaying relations between distant genomic positions.
With a rapidly growing collection of genomes coming from such initiatives as the 1000 Genomes Project, the days of a single reference genome are numbered. Although the genomic sequence between any two human individuals differs only by about 0.1%, there are abundant structural and copy-number variations of different types and sizes. Effective visualization of these genomic variations is required to gain insight into the genetic basis of human health and disease. However, variation data pose new challenges to traditional genome visualization tools, which depend on linear layouts and have difficulty depicting large structural rearrangements.