Unix tutorial for scientists

You have heard it before: "I can write a script in Unix that will do in 5 seconds what you do in Excel in an hour". Or, if you haven't, you have thought: "There has to be a better way than clicking mindlessly for an hour". You would love to save time, but you don't know where to start, and all the tutorials on the web look like gibberish when you read them. This tutorial hopes to give you an overview of what it means "to write a Unix script" or "use Unix", what it can do and how to do it. At the end of this document, you will find a list of simple commands that you can cut and paste on your computer to do tasks that could be of general interest.
Note: Often, people will use these terms interchangeably: programming in Unix, or using Unix, shell programming, script writing, bash, sh, csh, perl, python, programming. You will learn below what it means to use a shell or write a script, and why these appear to be synonyms.

Introduction


Command-line tools can save you a lot of time because they can process large numbers of files very rapidly. They are called command-line tools because they require you to type a text command on a line instead of pointing and clicking in a Graphical User Interface (or GUI). These tools are often called, or thought of as, Unix because the Unix operating system originally accepted only text commands and includes a multitude of such tools. By extension, Unix became a synonym (in people's minds) for command-line tools.

These tools are meant to operate on files or directories. Hence, they are usually accessed through a shell that places you in the file system, inside a given directory. Typically, you start a program called a Terminal, which starts a shell and places you in your main directory (your home directory). Here is an example from the Mac OS X Terminal.app application:

Shell


Here are examples of command-line tools (from now on, called simply tools or commands).

  1. The pwd command (print working directory) will tell you what directory you are in.
  2. The ls command (list), in its simplest form, will list all files in the current directory. In its most complex form, it can list files, order them by modification date and colour them depending on what type they are.
  3. The find command, in its simplest form, can find files with a certain name. In its most advanced form, it can find certain directories modified after a certain date and act on the group of directories or a group of files contained in them.
  4. The gzip command (gnu zip) will compress a file or multiple files.
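
To get a feel for these four commands, here is a small session you can paste into a terminal. It is only a sketch: the file names (data.csv, notes.txt) and the scratch directory are made up for the example, and everything happens in a temporary directory so none of your own files are touched.

```shell
# Create a throwaway directory and move into it (names here are illustrative).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

pwd                                # print the directory you are in

# Make two small files to play with.
printf 'x,y\n1,2\n' > data.csv
printf 'some notes\n' > notes.txt

ls                                 # list the files in the current directory
ls -lt                             # long listing, most recently modified first
find . -name '*.csv'               # find files whose name ends in .csv
gzip data.csv                      # compress data.csv; it becomes data.csv.gz
ls                                 # data.csv is gone, data.csv.gz has appeared
```

Note that gzip replaces the original file with the compressed one; gunzip data.csv.gz brings it back.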

Starting

More to come.



VNC Musings: Multiple sessions on OS X

There are software packages that provide simultaneous GUI sessions for multiple users on Mac OS X: AquaConnect and iRAPP. They are convenient but, in my experience, they are buggy (i.e. they can make the server crash) and very expensive. It is possible to duct-tape together a similar solution that does not require admin privileges, only the ability to log in remotely via VNC. This works on OS X client and server, starting from 10.4, which has Fast User Switching.

The strategy is the following: use the Fast User Switching capability of OS X and start a VNC server (as a regular user) on a port of your choice. By switching back to the login window, you hide your session from other users (i.e. you are not at the Console anymore) yet you can remotely access it with a simple VNC viewer.

  1. Turn on Fast User Switching in System Preferences on the machine you want to connect to.
  2. Download Vine Server.
  3. Open Vine Server and choose a port such as 5901 (because 5900 may be taken by the system's own VNC).
  4. Set a password and choose whatever settings you want for sharing (i.e. exclusive or not).
  5. Click Start Server.
  6. You can test right away with a VNC client (Chicken of the VNC for instance) by connecting to the port you chose (5901 in this example) on your server. You should get a screen identical to the main screen.
  7. In the OS X Fast User Switching menu, choose "Login window", which will bring the login window back to the front. Now other users can still use the console, but you can also access your own GUI session by VNC'ing to the port you chose (5901 in this example).

Advantages:

  1. Free
  2. No admin privileges needed (you can run Vine Server.app from your home directory)
  3. Secure

Disadvantages:

  1. You need to log in at least once to start your own Vine Server on the port of your choosing
  2. If you reboot the machine, you need to set it up all over again
  3. You need to remember which port you chose (although Vine Server advertises your connection through Bonjour, so you can partly skip that)
  4. Multiple users need to pick different ports
  5. May be a bit convoluted for non-expert users





VNC Musings: fixing AppleVNCServer

Unable to connect via VNC or Apple Screen sharing


Sometimes AppleVNCServer starts acting funny and prevents new sessions from connecting properly. A message like this appears in the logs:

Sep  1 22:35:49 cafeine AppleVNCServer[74229]: kCGErrorIllegalArgument: CGSGetDisplayBounds (display 603550)
Sep 1 22:35:49 cafeine com.apple.ScreenSharing.server[74229]: Wed Sep 1 22:35:49 cafeine.crulrg.ulaval.ca AppleVNCServer[74229] <Error>: kCGErrorIllegalArgument: CGSGetDisplayBounds (display 603550)

You are unable to connect because of some confusion in the VNC server provided by Apple. Many forums discuss this, and the fix is the following: you need to kill the loginwindow process that belongs to "root". The other loginwindow processes, which do not belong to root, are real users who are logged in. Killing the root-owned one will force a restart and will solve the problem. Killing AppleVNCServer does not solve anything.

So type this:

$ ps -axl | grep loginwindow
 1025   104     1 80004104   0  50  0  2527372   7792 -      Ss   29a11d20 ??         0:02.48 /System/Library/CoreServices/loginwindow.app/Contents/MacOS/loginwindow console
 1084 71197   187 80004104   0  50  0  2524484   7148 -      Ss   285e8000 ??         0:00.37 /System/Library/CoreServices/loginwindow.app/Contents/MacOS/loginwindow
    0 74219   187 80004004   0  50  0  2495952   5596 -      Ss   2fb6aa80 ??         0:00.17 /System/Library/CoreServices/loginwindow.app/Contents/MacOS/loginwindow
 1025 80214 70523     4006   0  31  0  2426848    336 -      R+   2b9e77e0 ttys000    0:00.00 grep loginwindow

The process that belongs to user 0 (root; first column) is the one to kill. Its process ID is in the second column. In the example here, you would kill it with:

sudo kill -9 74219
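
If you would rather not pick the line out by eye, the filtering can be sketched with awk. This assumes the column layout shown above (UID in the first column, PID in the second); the function name root_loginwindow_pids is my own, not a system command, and the syntax is for sh or bash.

```shell
# Print the PID of every loginwindow process owned by root (UID 0).
# Reads "ps -axl"-style lines on standard input:
# column 1 is the UID, column 2 is the PID.
root_loginwindow_pids() {
    awk '$1 == 0 && /loginwindow\.app/ { print $2 }'
}

# Usage (look at what it prints before killing anything):
#   ps -axl | root_loginwindow_pids
```

With the listing above as input, it prints only the PID of the root-owned line, which you can then pass to sudo kill -9.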






Job Control in Unix

Some simple assumptions: the syntax is for the C-shell (csh or tcsh), not bash or sh. sudo is a command that calls a program as root.


To send a job to the background in Unix, as most people know, you append '&' to the command:


% sudo /usr/libexec/locate.updatedb &


It is in the background, as you can see with the command:


% jobs -l

[1] 23035 Running sudo /usr/libexec/locate.updatedb


[1] is the job number ([1], [2], [3], etc...). Don't confuse the job number with the PID 23035 (Process ID number). You can bring the job back to the foreground with fg %1 (or just fg):

% fg


But what do you do when you have the job in the foreground and you want to send it to the background (i.e. you are "stuck" because you forgot the &, or you simply changed your mind and now you want the job in the background)? Control-C will kill the job, and you don't want that. If you type Control-Z, the job gets "suspended", which means it is not in the foreground anymore, but it is not running either. For instance, if you type:


% sudo /usr/libexec/locate.updatedb

then Control-Z, the shell will respond with:


^Z Suspended


Listing the jobs will give you the following:


% jobs -l

[1] + 23039 Suspended sudo /usr/libexec/locate.updatedb


To make it run again, you have two options. The first is the bg command (which means "make it RUN in the background" or "change its status from suspended to running"):


% bg

Also, and this I did not know until recently, you can use kill -CONT <pid>, where <pid> is the process ID number. In this context, kill is actually a pretty bad command name: it does not "kill" the program. It sends a signal to the program (which, if you don't specify one such as -CONT(inue), is -TERM(inate) by default). Similarly, you can send a -STOP signal to suspend a job that is already in the background.


For instance, if you are running a lengthy and CPU consuming job (or disk consuming job), like:


% sudo /usr/libexec/locate.updatedb &
[1] 21035

You can see it running in the background:


% jobs -l

[1] + 21035 Running sudo /usr/libexec/locate.updatedb

where [1] is the job number and 21035 is the pid. By issuing:


% kill -STOP 21035

you will suspend the job, until you use the bg command with the job number, or kill -CONT with the process ID number.
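
If you want to try the STOP and CONT signals without running anything heavy, here is a harmless experiment. It is a sketch only: it is written for sh or bash rather than the C-shell used above (the variable assignments differ in csh), and it uses sleep as a stand-in for a long job. The ps option -o stat= prints just the process state.

```shell
# Start a long-running job in the background and remember its PID.
sleep 30 &
pid=$!

kill -STOP "$pid"                         # suspend: the process still exists but does not run
sleep 1
state_stopped=$(ps -o stat= -p "$pid")    # state begins with "T" while stopped
echo "after STOP: $state_stopped"

kill -CONT "$pid"                         # resume where it left off
sleep 1
state_running=$(ps -o stat= -p "$pid")    # no longer "T" once it has resumed
echo "after CONT: $state_running"

kill "$pid"                               # default signal is TERM: this one does end the job
```

The first state printed should start with T (stopped), and the second should not.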




CARS Widget

This Mac OS X widget can calculate the pump, Stokes and anti-Stokes wavelengths and the corresponding vibrational frequency.
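
For reference, the textbook relations between these quantities are simple enough to evaluate on the command line. The sketch below is my own illustration, not the widget's source code: the function name cars_wavelengths is made up, wavelengths are taken in nanometres, and the vibrational frequency is expressed as a shift in wavenumbers (cm^-1). It uses shift = 1e7/pump - 1e7/Stokes and 1/anti-Stokes = 2/pump - 1/Stokes.

```shell
# cars_wavelengths PUMP_NM STOKES_NM
# Prints: vibrational shift (cm^-1) and anti-Stokes wavelength (nm).
cars_wavelengths() {
    awk -v pump="$1" -v stokes="$2" 'BEGIN {
        raman = 1e7 / pump - 1e7 / stokes     # shift in cm^-1 (1e7 converts nm to cm^-1)
        anti  = 1 / (2 / pump - 1 / stokes)   # anti-Stokes wavelength in nm
        printf "%.1f %.1f\n", raman, anti
    }'
}

# Example with round numbers: pump at 800 nm, Stokes at 1000 nm.
cars_wavelengths 800 1000
```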



Widget here: Download file "CARSCalculator.zip"
Source code here: Download file "CARScalculator.dcproj.zip"




Beam Properties Widget


Beam Properties Widget is a Mac OS X widget that calculates laser pulse parameters in various units. You can quickly calculate bandwidth in wavenumbers, length or frequency, whether pulses are transform-limited or not. You can use multiple width definitions (1/e field or intensity, FWHM, HWHM). All calculations assume Gaussian pulses.

You can also calculate an approximation of the peak and average power, irradiance and electric field (currently approximate only since square pulses and disk profiles are assumed).

Get it here: Download file "BeamProperties.zip"
It should really be called Pulse Properties Widget, but it is not.



The extremely complicated mathematics don't fit on the front page and are available on the back panel:

Version 1.5.3: Added Rayleigh range, reorganized interface
Version 1.5: Added power, irradiance and electric field calculation. Approximate: assumes square pulse and disk.
Version 1.3.2: First public release



Quartz Composer and animations

I have made several animations over the years for my presentations in microscopy. They are available here. If you use them please give credit to Daniel.Cote@crulrg.ulaval.ca with a link to this page if possible: http://cafeine.crulrg.ulaval.ca/users/dccote/

Laser Scanning microscopy

This animation shows a laser beam (on the left) scanned onto a sample of spinal cord and illustrates multimodal imaging. In blue, you see myelin; in red, you see reflectance. As you scan a "green" beam onto the sample, an image of myelin is obtained. If you change the laser color to "yellow", the reflectance image is produced. The green and yellow lasers are for illustration purposes: they show that one can change the physical interaction to obtain different contrast.





Coherent versus incoherent effects




CARS versus Raman

This animation shows the difference between Raman scattering and coherent Raman scattering.


Three different vibration modes in a molecule



Healthy myelin, degrading myelin








iPhoton Releases

The current release of iPhoton is 5.5.4beta2. What's new in this release:

5.5.4

  • Improved real-time averaging, now works for PPC
  • Better version of MovieStitcher

5.5.3

  • Improved real-time display for averaging and summing. Now much faster. You still need to stop/start if you change from average to sum.
  • Driver issue solved.

[This page will describe everything one needs to know about iPhoton: current releases, planned updates, planned features, requests, etc... If I ever get off my sorry bum, I may write documentation.]


Buzzz.tv

The Buzzz.tv experiment is very interesting. See here.

A lot of analysis can be done from the data available on the Buzzz.tv site. We have access to:
1) User statistics (sex, political leaning)
2) Votes over time (for each user)
3) Who was speaking at what moment

I made a graph according to the following rule (my small analysis programs will be at the end of this article some day, not today):

While a politician is speaking, he or she accumulates the total score of the voters. So:
1) If, on average, people find the discussion interesting, the politician "gains positive points".
2) On the contrary, if the majority of people find the discussion less good, the score will be negative and the politician "loses points".

It is like in hockey: you are on the ice when your team scores, yay. You are on the ice when the other team scores, boo-hoo.

Adding up all the accumulated votes gives the following, which seems to indicate that Pauline completely dominated her opponents:

score-brut.png
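
The scoring rule can be sketched in a few lines of awk. This is not my original analysis program, only an illustration under assumptions: the two file formats are hypothetical (a speakers file with "start_seconds end_seconds name" per line, and a votes file with "time_seconds user_id vote" per line, where vote is 1 or -1), and the function name score_by_speaker is made up.

```shell
# score_by_speaker SPEAKERS_FILE VOTES_FILE
# SPEAKERS_FILE lines: start_seconds end_seconds name   (hypothetical format)
# VOTES_FILE lines:    time_seconds user_id vote        (vote is 1 or -1)
score_by_speaker() {
    awk '
        # First file: remember who speaks during which interval.
        NR == FNR { start[NR] = $1; stop[NR] = $2; name[NR] = $3; n = NR; next }
        # Second file: credit each vote to whoever is speaking at that time.
        {
            for (i = 1; i <= n; i++)
                if ($1 >= start[i] && $1 < stop[i])
                    score[name[i]] += $3
        }
        END { for (s in score) print s, score[s] }
    ' "$1" "$2" | sort
}

# Usage: score_by_speaker speakers.txt votes.txt
```

Votes cast while nobody in the speakers file is talking are simply ignored.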

As any good scientist knows, a graph is nice, but a critical look at the data is better. From the data available on the Buzzz.tv site, one can make a histogram of voter activity. One can simply count the average number of votes per 10-second interval to see whether certain voters were very active:


histogramme-brut.png
We notice that 1) some voters hardly voted at all and 2) others seem to have voted a lot.
One quickly realises that some people probably cheated (surprise!) because Buzzz.tv limited (through its Web interface) the number of votes to 1 per 10 seconds (for each button). So a few cheaters are easily identified:

histogramme-tricheurs.png

To see what is going on, we redo the same graph as the first one (the Politicians' Score) but only for the "cheaters", that is, those who voted more than 750 times in 120 minutes (i.e. more than the limit).
We see that if it had been up to these people alone, the Score would have been as follows:

score-tricheurs.png
So the people who voted more often than everyone else seem to have voted constantly for one candidate and against the other two, with practically no exception. We therefore redo the score graph, but without these few cheaters. We get this instead:

score-sanstricheurs.png
This graph differs from the first one, with a less pronounced domination by a single debater.
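
Picking out the heavy voters is a matter of counting votes per user and comparing with a threshold. The sketch below assumes a hypothetical votes file with "time_seconds user_id vote" on each line; the function name heavy_voters is made up, and the default threshold of 750 is the one used above.

```shell
# heavy_voters VOTES_FILE [THRESHOLD]
# Prints "user_id count" for every user with more than THRESHOLD votes
# (THRESHOLD defaults to 750, the cut-off used for the graphs above).
heavy_voters() {
    awk -v limit="${2:-750}" '
        { count[$2]++ }                              # one more vote for this user
        END { for (u in count) if (count[u] > limit) print u, count[u] }
    ' "$1" | sort
}

# Usage: heavy_voters votes.txt        (or: heavy_voters votes.txt 500)
```

The resulting list of user IDs can then be used either to keep only those users or to exclude them before recomputing the score.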

Note: to tackle the disparity in voter activity more generally, and to get a better appreciation of the data, after a bit of analysis one can obtain the following graph by ranking the user-voters in order of activity. One can see that 870 users are responsible for 50% of the votes, and that the other 50% comes from only 120 users.


proportion
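
A ranking of this kind can be sketched as follows: count the votes of each user, sort from most to least active, and print the running percentage of all votes. As before, the votes file format ("time_seconds user_id vote" per line) and the function name cumulative_share are assumptions made for the example, not my original program.

```shell
# cumulative_share VOTES_FILE
# Prints "rank user_id cumulative_percent", most active user first.
cumulative_share() {
    awk '{ count[$2]++ } END { for (u in count) print count[u], u }' "$1" |
    sort -k1,1nr -k2,2 |
    awk '
        { c[NR] = $1; u[NR] = $2; total += $1 }      # keep the sorted counts
        END {
            for (i = 1; i <= NR; i++) {
                running += c[i]
                printf "%d %s %.1f\n", i, u[i], 100 * running / total
            }
        }
    '
}

# Usage: cumulative_share votes.txt
```

The rank at which the last column crosses 50 tells you how many of the most active users account for half of the votes.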


Second note: [this needs work, it is not clear and the graph should rotate] I am trying to find a way to characterise voter profiles. I used the following algorithm to at least identify the tendencies: I accumulate the votes of each voter, attributing each vote to the debater concerned. At the end, I obtain a vector (x,y,z) that represents the sum of the votes for each debater, where the three coordinates are (Liberal, ADQ, PQ), for reasons that will be obvious: the vector represents both the tendency and the colour in RGB on a graph.



CoreFoundation Lite on Linux

When performing calculations with complex input parameters, it is difficult to provide an input file that is both complete for the program and readable to the person performing the computation. Over the years, I have found that the most appropriate choice for a file format is XML: it is fully user-defined, hierarchical and a text format. It is very powerful and flexible, and this is why many companies now use this file format for their documents. Apple uses XML throughout its operating system in files called Property Lists. There is a property list editor on OS X that is easy to use, and the format can be read and written easily. In addition, OS X provides extremely convenient functions to read and write property lists through a library (Core Foundation), and they integrate perfectly with all the OS X libraries (including Cocoa).

As part of its open source effort, Apple has provided a lite version of Core Foundation called CF-Lite. There is a lot of interest in using Apple's CoreFoundation Lite (or CF-lite) on Linux. The CF-Lite library allows reading of property lists on computers other than OS X. Hence, computer clusters running Linux can take advantage of this format without having to use other XML libraries that are not ideal for Property Lists.

Apple has written an article on modifying the library (http://developer.apple.com/opensource/cflite.html) that explains how to take the CF-Lite 299.33 library and compile it on Linux. However, if you follow the instructions, it will most likely not work because most Linux distributions now use GCC version 4 and many things have changed (in particular, GCC 4 no longer accepts a cast as an lvalue). For instance, you will get many errors like this:

./Base.subproj/CFRuntime.c:423: error: invalid lvalue in assignment

because of statements like this:

(intptr_t)cf -= sizeof(CFAllocatorRef);

or like this:

while (numBytes-- > 0) *(((uint8_t *)bitmap)++) = nonFillValue;

If you try to use more recent versions of CF-Lite (476.14 on OS X 10.5.x, 550 on 10.6.x) it will not work either because there are deep modifications of the code that are required to make it work on Linux. So far, I have only had success for CF-299.33, so it is easier to simply fix CF-299.33. To make a long story short, all "lvalue error" statements need to be rewritten like this:

*(intptr_t*)&cf -= sizeof(CFAllocatorRef);

or

*(((uint8_t *)bitmap)++) becomes *((*(uint8_t **)&bitmap)++)

I have made a patch to modify the original version (and it includes the patch from http://developer.apple.com/opensource/cflite.html). You can download the modified code here:


  1. Download CF-299.33 lite for linux (Download file "CF-299.33-linuxgcc4.tar.gz")
  2. Untar with tar xzvf CF-299.33-linuxgcc4.tar.gz
  3. cd CF-299.33-linuxgcc4
  4. make DSTROOT=$HOME install

If you want to work from the original code, follow the instructions below:
  1. Download CF-299.33 lite, either from Apple or from here (Download file "CF-299.33.tar.gz")
  2. Download the patch for linux GCC4 (here: Download file "CF-299.33-linuxgcc4.diff")
  3. Untar with tar xzvf CF-299.33.tar.gz
  4. patch -p0 < CF-299.33-linuxgcc4.diff
  5. cd CF-299.33
  6. mkdir CoreFoundation
  7. cd CoreFoundation
  8. find .. -name "*.h" -exec ln -s {} \;
  9. make DSTROOT=$HOME install
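
Steps 6 to 8 may look cryptic: they gather all the headers, which are scattered across the subproject directories, into one CoreFoundation directory as symbolic links. The sketch below reproduces the idea on a throwaway directory tree. The two header files are empty stand-ins created for the demonstration, and the -type f option is my addition (a precaution so that find does not pick up the links it has just created); it is not part of the original instructions.

```shell
# Build a small fake source tree in a scratch directory (stand-in files only).
work=$(mktemp -d)
mkdir -p "$work/Base.subproj" "$work/String.subproj" "$work/CoreFoundation"
touch "$work/Base.subproj/CFBase.h" "$work/String.subproj/CFString.h"

# Same idea as steps 7 and 8: from inside CoreFoundation, link every header
# found in the parent tree. "ln -s TARGET" with a single argument creates a
# link named after the file, in the current directory.
cd "$work/CoreFoundation" || exit 1
find .. -type f -name "*.h" -exec ln -s {} \;

ls -l    # each header now appears here as a link to ../<subproject>/<name>.h
```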

When you are done, you will have two new directories in your home directory: include and lib. In include, you will find another directory called CoreFoundation with all the headers for using the CF-Lite library. In lib, you will find three libraries (optimized, debug and profile).

You can then use the library for reading these property lists (there is an example program called plistReader.c in the directory, you can also get it at: http://developer.apple.com/opensource/cflite.html).

Other resources related to CF-Lite and other open source efforts:




First Post

I am supposed to start writing stuff here for the world to read.
