This post describes my first attempt at creating a graph for scientific publications using Scribus, a free and open-source (FOSS) desktop publishing (DTP) application. In addition to its WYSIWYG interface, Scribus allows programmatic editing of page contents using Python. A few lines of code are enough to automate repetitive tasks that require high precision (e.g. placing points inside graphs or markers on scales). I found that this combination makes Scribus an excellent option for creating graphs for scientific publications.
Scribus vs. the competition
Scribus is more precise than run-of-the-mill office software, cheaper and easier to use than commercial DTP software, and more flexible than the graphics libraries in statistical packages such as R. Available export formats include PDF and the most popular image formats; therefore, the output is appropriate for most academic publications.
Creating a simple graph
To test the above, I created the graph below, which depicts a comparison between two groups of patients. Creating this graph was one of the assignments in the Designing Figures and Tables class at UCSD. The design is generally based on recommendations from Tom Lang’s excellent book How to Report Statistics in Medicine.
Automating the nitty-gritty
Calculating the page coordinates of each point of the curve and of the scales in the axes by hand would be tedious and error-prone, but this task can be automated using a simple Python script.
First, a function to create the x and y axes…
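# createLine is part of the scribus module, which is available only when
# the script runs inside Scribus.
from scribus import createLine

def DrawAxes(base, x_length, y_length):
    # x axis: from the origin to the right
    createLine(base[0], base[1], base[0] + x_length, base[1])
    # y axis: from the origin upwards (page y coordinates grow downwards)
    createLine(base[0], base[1], base[0], base[1] - y_length)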
…then, a second function to draw the scale markers on the x and y scales…
def DrawScale(base, length, marker_spacing, marker_length, axis, offset):
    # range works in both Python 2 and Python 3 (the original used xrange)
    if axis == "x":
        # markers hang below the x axis
        for i in range(base[0], base[0] + length, marker_spacing):
            createLine(i + offset, base[1], i + offset, base[1] + marker_length)
    elif axis == "y":
        # markers stick out to the left of the y axis
        for i in range(base[1], base[1] - length, -marker_spacing):
            createLine(base[0] - marker_length, i - offset, base[0], i - offset)
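Because DrawScale draws directly on the page, it can be useful to check the marker positions before running it. The sketch below repeats the loop arithmetic of DrawScale but returns the coordinates instead of calling createLine, so it runs in any Python interpreter. The name tick_positions is a hypothetical helper for illustration, not part of the Scribus API, and unlike DrawScale it raises an error for an unknown axis.

```python
def tick_positions(base, length, marker_spacing, axis, offset):
    """Return the page coordinates where DrawScale would place its markers.

    Hypothetical helper: same range arithmetic as DrawScale, no drawing.
    """
    if axis == "x":
        # Markers run rightwards from the origin.
        return [i + offset
                for i in range(base[0], base[0] + length, marker_spacing)]
    if axis == "y":
        # Page y grows downwards, so markers further up the axis have smaller y.
        return [i - offset
                for i in range(base[1], base[1] - length, -marker_spacing)]
    raise ValueError("axis must be 'x' or 'y'")


x_ticks = tick_positions((250, 500), 200, 25, "x", 10)
y_ticks = tick_positions((250, 500), 150, 25, "y", 10)
print(x_ticks)
print(y_ticks)
```

With the values used in this post, this gives eight markers on the x scale and six on the y scale.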
…and finally, a function to draw the curves:
def DrawCurve(base, points, offset):
    # Join each point to the next one with a straight segment.
    for start, end in zip(points, points[1:]):
        createLine(base[0] + offset[0] + start[0], base[1] - offset[1] - start[1],
                   base[0] + offset[0] + end[0], base[1] - offset[1] - end[1])
Putting it all together:
# Data for the graph.
# 'healthy' lists the ages and heights of healthy children (one (age, height) pair per tuple).
# 'ckd' lists the ages and heights of children with CKD.
# 'plot_area_base' indicates the origin of the graph.
healthy = [(5, 75), (6, 73), (7, 79), (8, 80), (9, 90), (10, 110), (11, 140), (12, 150)]
ckd = [(5, 50), (6, 62), (7, 65), (8, 75), (9, 80), (10, 85), (11, 110), (12, 120)]
plot_area_base = (250, 500)

# Actual drawing of the graph.
# Each point is rescaled so that age 5 and height 50 fall on the first scale
# markers, with 25 page units per year of age.
DrawScale(plot_area_base, 200, 25, 5, "x", 10)
DrawScale(plot_area_base, 150, 25, 5, "y", 10)
DrawCurve(plot_area_base, [(25 * (i[0] - 5), i[1] - 50) for i in healthy], (10, 10))
DrawCurve(plot_area_base, [(25 * (i[0] - 5), i[1] - 50) for i in ckd], (10, 10))
DrawAxes(plot_area_base, 200, 150)
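The rescaling in the two DrawCurve calls is the step most likely to go wrong, so here is a minimal sketch that pulls it out into two small functions and checks the result without Scribus. The names to_plot_units and to_page, and their default parameters, are assumptions made for this illustration; they only restate the arithmetic already used above.

```python
def to_plot_units(age, height, age_min=5, height_min=50, x_scale=25):
    """Map an (age, height) pair to offsets from the first scale markers.

    Hypothetical helper: the defaults restate the constants in the post.
    """
    return (x_scale * (age - age_min), height - height_min)


def to_page(base, point, offset):
    """Convert plot offsets to page coordinates, as DrawCurve does."""
    # Page y grows downwards, hence the subtraction on the y component.
    return (base[0] + offset[0] + point[0], base[1] - offset[1] - point[1])


healthy = [(5, 75), (6, 73), (7, 79), (8, 80),
           (9, 90), (10, 110), (11, 140), (12, 150)]
page_points = [to_page((250, 500), to_plot_units(age, height), (10, 10))
               for age, height in healthy]
print(page_points[0], page_points[-1])
```

The first healthy data point lands at page position (260, 465) and the last at (435, 390), that is, inside the area covered by the scale markers.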
The functions above will draw a skeleton graph (shown below), which can be edited to prepare the final version.
After a few more minutes of work (adding a few text fields, inserting scale numbers, and tweaking line patterns), here is the final version:
Comparison with Excel and PowerPoint
There’s no comparison, really. When I first did this assignment for the Designing Figures and Tables class at UCSD, I used Excel and PowerPoint, which are not really designed for this kind of work. Both programs impose severe limitations on the formatting of the graph. In particular, I could not prevent the lines from touching the axes, choose specific values or intervals for the scales, measure the exact height of data points, or position captions exactly. After much struggling, I had spent more time and ended up with an inferior graph.
Next steps
The code above is just a quick sketch to demonstrate the practicality of drawing graphs with Scribus. There are many other possibilities, such as creating other kinds of graphs, reading data files, and customizing the output to suit requirements from publishers, printers, presenters, or other stakeholders in the communications process. I will polish the script and publish an improved version in an upcoming post.
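As a rough idea of what reading data files could look like, the sketch below parses comma-separated age and height values into the same list-of-tuples shape used for 'healthy' and 'ckd'. It relies only on Python's standard csv module; the function name read_points and the column names "age" and "height" are assumptions made for this example, not part of the script in this post.

```python
import csv
import io


def read_points(lines):
    """Parse rows with 'age' and 'height' columns into (age, height) tuples.

    Hypothetical helper: 'lines' is any iterable of text lines, such as an
    open file or an io.StringIO object.
    """
    reader = csv.DictReader(lines)
    return [(int(row["age"]), int(row["height"])) for row in reader]


# In-memory stand-in for a data file, holding the first three 'ckd' rows.
sample = io.StringIO("age,height\n5,50\n6,62\n7,65\n")
ckd = read_points(sample)
print(ckd)
```

For a real file, the same function could be given an open file object, for example one returned by open("ckd.csv", newline=""), where the file name is again only a placeholder.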
Conclusion
Scribus is an excellent option for creating graphs for scientific publications. It is easier to use than commercial DTP software, more powerful than run-of-the-mill office packages (I created this graph more quickly than when I completed the original assignment using Excel), and, perhaps most interesting, more flexible than the graphics libraries in statistical packages such as R. If you create graphs for scientific publications and know a little Python, you may want to try it out.