All posts by Angela Zoss

Demystifying Data & GIS Services

Staff Expertise Chart

Confused about Data & GIS Services?  Not sure what questions you should be asking us or what kind of services we provide?  Here’s one handy chart we’ve come up with to explain what exactly we cover in our consultations and workshops.

When it comes to picking what day to stop by our walk-in hours or knowing how much of the data life cycle our consultants cover, this graphic might be your first stop.  Whether it’s finding data, processing or analyzing that data, or mapping and visualizing that data, we have staff with expertise to help!

Still not sure who to approach or what kind of help you might need?  Just email askdata@duke.edu to get in touch with all of us at once.  Some questions can be answered quickly over email, but we’re also happy to schedule an appointment to talk in person.

Announcing the 2014 Student Data Visualization Contest

Data & GIS Services will soon be accepting submissions to its 2nd annual student data visualization contest.  If you have a course project that involves visualization, start thinking about your submission now!

The purpose of the contest is to highlight outstanding student data visualization work at Duke University. Data & GIS Services wants to give you a chance to showcase the hard work that goes into your visualization projects.

Data visualization here is broadly defined, encompassing everything from charts and graphs to 3D models to maps to data art.  Data visualizations may be part of a larger research project or may be developed specifically to communicate a trend or phenomenon. Some are static images, while others may be animated simulations or interactive web experiences.  Browse through last year’s submissions to get an idea of the range of work that counts as visualization.

The Student Data Visualization Contest is sponsored by Data & GIS Services, Perkins Library, Scalable Computing Support Center, Office of Information Technology, and the Office of the Vice Provost for Research.

For more details, see the 2014 Student Data Visualization Contest page.   Please address all additional questions to Angela Zoss (angela.zoss@duke.edu), Data Visualization Coordinator, 226 Perkins Library.

Dr. Christopher Healey to Present at Visualization Friday Forum

On Friday, October 4, Dr. Christopher G. Healey will visit Duke University to speak at the Visualization Friday Forum.

Christopher G. Healey is an Associate Professor in the Department of Computer Science at North Carolina State University. He received a B.Math from the University of Waterloo in Waterloo, Canada, and an M.Sc. and Ph.D. from the University of British Columbia in Vancouver, Canada. He is an Associate Editor for ACM Transactions on Applied Perception. His research interests include visualization, graphics, visual perception, and areas of applied mathematics, databases, artificial intelligence, and aesthetics related to visual analysis and data management.

We hope you can join us at the Friday Forum!

Visualizing Tweet Sentiment
Friday, October 4, 2013
12:00 p.m. to 1:00 p.m. (lunch provided)

Levine Science Research Center, Room D106 (near the Research Drive entrance), in conjunction with the Visualization Friday Forum

During this talk I will discuss a new project that focuses on ways to visualize text, specifically short text snippets like those found in tweets, SMS text messages, or Facebook wall posts. Our visualizations present text collections using a combination of numerous approaches: sentiment analysis, topic clusters, tag clouds, affinity graphs, volume timelines, and sentiment heatmaps. A second aspect of this project involves web-based visualization. We are implementing our visualization tools in JavaScript, HTML, and CSS, allowing us to distribute our visualizations through any modern web browser, without the need for plug-ins. This also offers an opportunity to assess the strengths and limitations of current web-based visualization and user interface libraries.

We chose Twitter as a testbed for our techniques. Twitter’s publicly accessible APIs allow us to query collections of recent tweets for user-chosen keywords, or to tap into the real-time tweet stream (the “firehose”) to capture tweets by keyword as they are posted. To assess the practical usefulness and usability of our visualizations, we partnered with WRAL TV, the CBS/Fox network affiliate for the Raleigh, North Carolina broadcast region. WRAL ran our Twitter visualizations on their web site during each of the recent U.S. Presidential debates. This allowed viewers to watch the debate and, at the same time, monitor the volume, sentiment, and content of tweets about the debate as they were posted in real time. WRAL reporters used a modified version of the visualization tool to perform post-analysis of the captured tweet stream. Interesting findings were included in news stories they published following the debates.

Upcoming MATLAB Training at Duke

MATLAB is an integrated technical computing environment that combines numeric computation, advanced graphics and visualization, and a high-level programming language.  Duke’s license agreement offers MATLAB licenses to faculty and staff for work or personal computers, as well as to students for on-campus use.  The Duke Office of Information Technology (OIT) maintains instructions on installing MATLAB at Duke.  MATLAB is used by many communities at Duke, including Engineering, Econometrics, Medical Sciences, Computational Biology, and Business.

On Tuesday, June 18, OIT in partnership with Duke University Libraries will host a one-day course on MATLAB that focuses on using this software for Data Processing and Visualization.  The course will cover importing data, organizing data, and visualizing data in a hands-on format (detailed outline).  Seats are limited to 20; please register soon to reserve your spot.

MATLAB for Data Processing and Visualization
(outline)
Laura Proctor, Academic Training Engineer at MathWorks
Tuesday, June 18
8:30 a.m. to 4:30 p.m. (lunch break from 12:00 p.m. to 1:00 p.m., lunch not provided)
Library Computer Classroom, Bostock 023
Registration (seats limited to 20)

The course assumes some existing familiarity with MATLAB.  New potential MATLAB users may want to attend an overview seminar on the software that will be held on Thursday, May 30.  This overview will not be hands on, but it will include live demonstrations and examples of both MATLAB and Simulink, an environment for multi-domain simulation and model-based design.

Introduction to Data Analysis and Visualization with MATLAB & Simulink
(details and registration)
Mehernaz Savai, Applications Engineer at MathWorks
Thursday, May 30
1:00 p.m. to 4:00 p.m.
FCIEMAS Building, Schiciano Auditorium – side A

If you would like to begin learning to use MATLAB, MathWorks offers a self-directed MATLAB Fundamentals course, and the Duke library collection also includes several introductory MATLAB texts, such as MATLAB Primer and MATLAB: A Practical Approach.

Student Data Visualization Contest Winners

The finalists and winners of the 2013 Data Visualization Contest were announced at our recent Data & GIS Services open house. The judging panel selected the top five submissions as finalists, each of which was then converted into a poster for display in the Brandaleone Family Center for Data and GIS Services (Perkins 226). Of the five finalists, the panel also selected two grand prize winners, each of whom was awarded $250 in Amazon Gift Cards.

The grand prize winners were:

ACC Basketball Tournament Series Records, by Volodymyr Zavidovych

Limbique, by Pinar Yoldas and David Paulsen

The other three finalists were:

Mapping Chinatown, by Sabrina McCutchan

Duke Intellectual Climate Report 2012, by Amanda Peralta

spNavigate, by Benjamin Radford

Data and GIS Services would like to congratulate the finalists and winners and thank all of the student submitters for their impressive work! The full set of submissions to the contest is available on our growing Flickr gallery.

Free Tableau Licenses for Students

Tableau is a data visualization software application that allows you to easily create and share interactive charts, graphs, and maps. While the free version of this tool, Tableau Public, has offered wonderful opportunities for generating and publishing data visualizations, there are file size and format limits that make it difficult for some researchers to use the public tool.

For some time, the company has had a program to offer temporary licenses to teachers and students who use Tableau in the classroom (Tableau for Teaching). Now, the company is giving full-time students free access to Tableau Desktop for one year.

With the recent release of Tableau 8 and its many new features, this is a wonderful time to start visualizing your data!

Duke welcomes Dr. Christopher Collins, April 4-5

On Thursday, April 4 and Friday, April 5, Duke University will host a visit from Dr. Christopher Collins, Assistant Professor of Computer Science at the University of Ontario Institute of Technology (UOIT), where he directs The Visualization for Information Analysis lab (vialab). While at Duke, Dr. Collins will give two public presentations and will be available for meetings with groups and individuals. His visit is sponsored by Information Science + Information Studies (ISIS).

Dr. Collins engages in interdisciplinary research, combining information visualization and human-computer interaction with natural language processing to address the challenges of information management and the problems of information overload. His publications, including the DocuBurst document content visualization system, have helped to open a new and thriving area of research in “Linguistic Visualization”. Dr. Collins has been awarded a Discovery Grant from NSERC, providing 5 years of funding for research on “Text and Multimedia Document Visualization”. His research interests include: visualization of natural language data, interaction techniques for information visualization (including multi-touch interaction), scientific visual analytics, and social implications of computing / ethics & philosophy of computing.

Dr. Collins will give the following public presentations:

Humanizing Data:
Enabling Linguistic Insight with Information Visualization

Thursday, April 4, 2013
12:00 p.m. to 1:00 p.m. (lunch provided)
Smith Warehouse, Bay 4, in the FHI Garage

While linguistic skill is a hallmark of humanity, the increasing volume of linguistic data each of us faces is causing individual and societal problems – ‘information overload’ is a commonly discussed condition. Big data has enabled new tasks, such as finding the most appropriate information online, engaging in historical study using language data on the level of millions of documents, and tracking trends in sentiment and opinion in real time. These tasks need not cause stress and feelings of overload: the human intellectual capacity is not the problem. Rather, the current technological supports are inappropriate for these tasks. Linguistic information overload is not a new phenomenon: throughout history, the pace of information creation and storage has exceeded the pace of development of management strategies.

Drawing on a variety of qualitative and quantitative methods, my research aims to bring new, richly interactive interfaces to the forefront of information management, in order to keep up with the current challenges of ‘big data’ and the growing power of linguistic computing algorithms. In this talk I will present the results of several design studies spanning investigations of patterns in millions of real passwords to using visualization to analyze the written history of the court system. Each project aims to bridge what I call the ‘linguistic visualization divide’ – the practical disconnect between the sophistication of natural language processing and the power of interactive visualization. In conclusion, I will present some general challenges and opportunities for the future of text and language visualization.

Designing Multiple Relation Visualizations:
Case Studies from Text Analytics

Friday, April 5, 2013
12:00 p.m. to 1:00 p.m. (lunch provided)
Levine Science Research Center, Room D106 (near the Research Drive entrance), in conjunction with the Visualization Friday Forum

Datasets often have both explicit relations (e.g. citations between papers in a data set, links in a parse tree), and implicit relations (e.g. papers by the same author, words that start with the same letter). Drawing on grounding research into the real-world problems faced by computational linguists, in this talk I will explore several examples of visualizations designed to support simultaneous exploration of both explicit and implicit relations in data. I will suggest the concept of ‘spatial rights’ – the primacy of the spatial visual encoding, and present several methods for enhancing visualizations through adding implicit relation information without disrupting the spatialization of the explicit relation. The techniques have been generalized by others beyond the linguistic domain to be used in bioinformatics, finance, and general statistical charts.

There are also blocks of time in his schedule available for individual and group meetings. If you would like to meet with Dr. Collins, please contact Angela Zoss (angela.zoss@duke.edu) or Eric Monson (emonson@cs.duke.edu).

Data and GIS Services to Host Open House

On March 20, from 3:30pm to 5:30pm, Data and GIS Services will be hosting an Open House to celebrate recent upgrades to our computer lab and to announce the finalists and winners of the recent Data Visualization Contest.

Since the end of the summer, Data and GIS Services has enjoyed periodic upgrades to the computer lab, starting with a refresh of all of the machines and an expansion from 8 machines to 12. Each machine now boasts two 24″ monitors, a 4-core 3.5 GHz processor, 16GB RAM, 1TB of open storage for projects in progress, and an extensive list of GIS, statistical, and visualization software packages.

In addition to the new machines, the lab space has just been enhanced with a 50″ display and conference table to support small group instructional sessions. For advanced topics and sessions using software packages that can be installed on individuals’ laptops, the new display will allow Data and GIS Services staff to expand instructional opportunities and meet additional needs of the Duke community.

The finalists and winners of the recent data visualization contest are developing poster versions of their submissions.  These posters will hang in the Data and GIS Services lab and will be unveiled at the open house.

Details

  • Date: Wednesday, March 20
  • Location: Perkins 226
  • Schedule:
    • 3:30pm: Refreshments, mingling
    • 4:00pm: Welcome from Data & GIS Services and Research Computing; announcement of finalists and winners of the data visualization contest
    • 5:30pm: Event concludes

Please join us on March 20 to celebrate the expansion of Data and GIS Services! We look forward to having you stop by.

Submit to our 2013 Data Visualization Contest!

The Data & GIS Services department is hosting a data visualization contest this winter!

Are you a current Duke University undergraduate or graduate student? Have you used data visualization in a past or current research project to help solve a problem, tell a story, or highlight an interesting trend? Write up a short description and you’ll have a submission for the contest and a chance to win a $500 technology prize!

$500 Technology Prize!

The final deadline for submissions is January 20, 2013.  For more details, see the full contest site.

Start thinking now about how to turn your course projects into submissions!  Email askdata@duke.edu or angela.zoss@duke.edu with any questions.

Adding Colored Regions to Excel Charts

Time series data is easy to display as a line chart, but drawing an interesting story out of the data may be difficult without additional description or clever labeling. One option, however, is to add regions to your time series charts to indicate historical periods or to visualize binary data.

Here is an example where a chart of annual U.S. national economic indicators has been enhanced with regions that also indicate contractions in the U.S. business cycle – roughly speaking, economic recessions.

A time series with colored regions in the background, created in Excel.

To create this chart, all of the indicators were averaged by year and, where necessary, adjusted for inflation using a conversion factor. Download the time series data Excel file for the data and the chart to follow along.

First, to set up the basic line chart, hold Ctrl (PC) or Cmd (Mac) while you select the following columns:

  • D (stock price index over CF…)
  • E (avg. annual unemployment…)
  • G (GDP over CF…)
  • I (debt over CF…)
  • K (interest rate * 10)
  • L (years economy is in decline)

You’ll notice that the columns are color coded. Some colors apply to multiple columns; this is because the values that appear on the chart have been calculated by transforming the raw data in some way. Each line on the final chart thus corresponds to one or more columns of data used to produce the values. Transforming the values helps us by normalizing the values (e.g., adjusting for inflation) or scaling the data series itself (making it possible to see the relationships between many different indicators on a single graph, despite wide variations in the ranges of values).
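As a rough illustration of these transformations, here is a minimal Python sketch. The function names and all of the numbers are hypothetical, and dividing by the conversion factor is an assumption based on the “over CF” column names; the workbook itself does this with Excel formulas.

```python
from statistics import mean


def annual_average(observations):
    """Collapse several within-year observations into one annual value."""
    return mean(observations)


def adjust_for_inflation(value, conversion_factor):
    """Express a dollar amount in constant dollars (assumed: value / CF)."""
    return value / conversion_factor


def scale(value, factor):
    """Rescale a series so it sits in the same range as the others."""
    return value * factor


# Hypothetical inputs, not taken from the spreadsheet.
quarterly_rates = [0.5, 1.0, 1.5]
annual_rate = annual_average(quarterly_rates)
scaled_rate = scale(annual_rate, 10)              # like the "interest rate * 10" column
adjusted_gdp = adjust_for_inflation(500.0, 0.25)  # like the "GDP over CF" column
```

Scaling changes only where a line sits on the y axis, not its shape, which is why a small-valued series can share a chart with much larger ones.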

When we select the six columns above and insert a line chart, we get a rather ugly line chart.

The chart as it looks with the default Excel settings.

We’ll make several changes to improve this:

  • Change the “years… in decline” series to an area chart
  • Select and adjust the x axis labels and ticks
  • Adjust the y axis range
  • Customize the color, label, and order of the data series

The basic mechanism of the colored regions on the chart is to use Excel’s “area chart” to create rectangular areas. The area chart essentially takes a line chart and fills the area under the line with a color. If we have a continuous horizontal line as a data series, we will create a large colored rectangle on the chart. To have breaks in the rectangle, we simply need to leave some of the years blank (without values in the cells). To select the appropriate values for column L, we first found the maximum value for the other data series and determined that a value of 20 would create bars starting above the other data series.
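The layout of that column can be sketched in Python as follows. This is only an illustration: `region_column` is a hypothetical helper, and the years and contraction years are made up rather than taken from the spreadsheet.

```python
def region_column(years, contraction_years, height=20):
    """Return one cell per year: `height` in flagged years, None (blank) otherwise."""
    flagged = set(contraction_years)
    return [height if year in flagged else None for year in years]


# Hypothetical example: six years, two of them flagged as contractions.
years = list(range(2000, 2006))
column_l = region_column(years, contraction_years=[2001, 2002])
# column_l holds 20 for 2001 and 2002 and None (a blank cell) elsewhere.
```

The blanks are what break the rectangle into separate regions; the constant height is what makes each region reach the top of the plotted range.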

To produce the colored regions that indicate contractions in the business cycle, we take the series that was created from column L and turn it into an area chart.

  1. Right-click on any data point in the series or on the legend entry
  2. Select “Change Series Chart Type…”
    Changing the chart type for a particular data series.
  3. Select the standard Area chart from the ribbon
    Select Area from the ribbon.

The chart now fills in the area under that series with a default fill color.

After changing one series to an Area chart.

At this point, you can right-click on the series again, select “Format Data Series…”, and change the Fill color to a light gray.

Changing the fill color of the Area series.

Next, we tell the x axis what the correct labels are (the “Year” column) and have the labels show up every 4 years. (Our data series start on an election year, so the labels will always appear on election years.)

  1. Right-click inside the chart somewhere and select “Select Data…”
  2. Select any of the data series in the “Series” list, then go over to the “Category (X) axis labels” box and select the “Year” column. Click “OK”.
    Changing the X axis labels.
  3. Right-click on the x axis and select “Format Axis…”.
  4. Under “Scale”:
    1. Change the default interval between labels from 3 to 4
    2. Change the interval between tick marks to 4 as well
    3. Uncheck the box next to “Vertical axis crosses between categories”
      Changing the X axis scale.
  5. Under “Text Box”, select the text direction of “Rotate to 90 deg Counterclockwise”. Click “OK”.
    Changing the X axis text.
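To see why the starting year matters for the label interval, here is a small Python sketch. It assumes that labels are drawn for the first category and every fourth one after it; the year range is hypothetical and `visible_labels` is an illustrative helper, not an Excel feature.

```python
def visible_labels(categories, interval):
    """Keep the first category and every `interval`-th one after it."""
    return categories[::interval]


# Hypothetical series that starts on an election year.
years = list(range(1948, 1965))
shown = visible_labels(years, 4)
```

If the series started one year later, every label would shift by a year and none would land on an election year.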

The x axis should have appropriate year labels now. The y axis can similarly be adjusted to show just the range of values we’re most interested in.

  1. Right-click on the y axis and select “Format Axis…”.
  2. Under “Scale”, uncheck the box next to “Maximum:” and change the value to 20.
    Changing the Y axis scale.

The rest of the changes are simply formatting changes. Right-click on the individual data series to change the colors, line widths, etc. Use the formatting options or the Chart tools on the Excel ribbon to change the font of any text, adjust the grid lines, add labels and titles, etc. The data series names in the legend can be adjusted by using the “Select Data…” option and typing in custom text in the “Name” field.

The final product should have colored regions and look something like the chart below.

A time series with colored regions in the background, created in Excel.

In another post, we will show how to spice this chart up even more using Adobe Illustrator.