Educational Revolutioners: January 2015

Tuesday, 20 January 2015

Looking for a Free Backup Solution? Try Areca

Areca Backup is an open source file backup utility that comes with a lot of features, while also being easy to use. It provides a large number of backup options, which make it stand out among the various other backup utilities. This article will help you learn about its features, installation and use on the Linux platform.
Areca Backup is personal file backup software written in Java by Olivier Petrucci and released under GNU GPL v2. It’s been extensively developed to run on major platforms like Windows and Linux, providing users a large number of configurable options with which to select their files and directories for backup, choose where and how to store them, set up post-backup actions and much more. This article deals with Areca on the Linux platform.

Features
To start with, it must be made clear that Areca is by no means a disk-ghosting application. That is, it will not be able to make an image of your disk partitions (as Norton Ghost does), mainly because of file permissions. Areca, along with a backup engine, includes a great GUI and CLI. It’s been designed to be as simple, versatile and interactive as possible. A few of the application’s features are:

Zip/Zip64 compression and AES 128/AES 256 archive encryption algorithms
Storage on local drive, network drive, USB key, FTP/FTPs (with implicit and explicit SSL/TLS encryption) or SFTP server
Incremental, differential and full backup support
Support for delta backup
Backup filters (by extension, sub-directory, regexp, size, date, status and usage)
Archive merges
As of date recovery
Backup reports
Tools to help you handle your archives easily and efficiently, such as Backup, Archive Recovery, Archive Merge, Archive Deletion, Archive Explorer, History Explorer

Installation
Areca is developed in Java, so you need to have the Java Virtual Machine v1.4 or higher already installed and running on your system. You can verify this by checking it in the command-line:

$ java -version

If the command is not found or reports an older version, you can download and install Java from http://java.sun.com/javase/downloads/index.jsp
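The same check can be scripted, which is handy if you plan to automate the set-up. The following is only a minimal sketch: the function name java_version_banner is made up for this illustration, and it assumes the Java launcher is called java and is looked up on your PATH.

```python
import shutil
import subprocess


def java_version_banner(executable="java"):
    """Return the text printed by `<executable> -version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # `java -version` traditionally writes its banner to stderr, so capture both streams.
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    return (result.stderr or result.stdout).strip() or None


banner = java_version_banner()
print(banner if banner else "No Java runtime found on PATH")
```

If the script prints the fallback message, install Java before continuing with the Areca set-up.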
To install Areca, download the latest release from http://sourceforge.net/project/showfiles.php?group_id=171505 and extract its contents to your disk. To make Areca executable from the console, go to the extracted Areca directory and run the commands given below:

$ chmod a+x areca.sh areca_check_version.sh

$ chmod a+x -v bin/*

Now you can easily launch Areca from your console with

./areca.sh for Graphical User Interface
./bin/run_tui.sh for Command Line Interface

Now that you’ve set up the entire thing, let’s understand the basics of Areca—what you’ll need to know before getting started with creating your first backup archive.

Basics
Storage modes: Areca follows three different storage modes.

Standard (by default), where a new archive is created on each backup.
Delta (for advanced users), where a new archive is created on each backup, consisting of modified parts of files since the last backup.
Image, where a single archive is created and then updated on each backup.
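To get a feel for what 'modified parts of files' means in the delta mode, here is a toy sketch that compares two versions of a file block by block and keeps only the blocks that changed. It is not Areca's actual algorithm; the function names, the block size of 4 bytes and the sample data are all assumptions made for the illustration.

```python
def changed_blocks(old, new, block_size=4):
    """Return {block_index: new_block} for every block of `new` that differs from `old`."""
    delta = {}
    n_blocks = (len(new) + block_size - 1) // block_size
    for i in range(n_blocks):
        start = i * block_size
        new_block = new[start:start + block_size]
        if old[start:start + block_size] != new_block:
            delta[i] = new_block
    return delta


def apply_delta(old, delta, new_length, block_size=4):
    """Rebuild the new version from the old version plus the stored delta."""
    n_blocks = (new_length + block_size - 1) // block_size
    blocks = []
    for i in range(n_blocks):
        start = i * block_size
        # Use the stored block if it changed, otherwise reuse the old one.
        blocks.append(delta.get(i, old[start:start + block_size]))
    return b"".join(blocks)[:new_length]


old = b"AAAABBBBCCCCDDDD"
new = b"AAAAXXXXCCCCDDDDEE"
delta = changed_blocks(old, new)
print(delta)  # only the second block and the newly appended block are stored
print(apply_delta(old, delta, len(new)) == new)
```

The delta holds two small blocks instead of the whole file, which is the saving that a delta-style archive aims for on large files that change only slightly.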

Target: A backup task is termed as ‘target’ in Areca’s terminology. A target defines the following things.

Sources: It defines the files and directories to be stored in the archive at backup.
Destination: It defines the place to store your archives such as file system (external hard drive, USB key, etc) or even your FTP server.
Compression and encryption: You may even define how to store your archives, i.e., compressing into a Zip file if data is large or encrypting the archival data to keep it safe, so that it can be decrypted only by using Areca with the correct decryption key.

Your first backup with Areca
After successfully passing through all the checkpoints, you can now move on to creating your first backup with Areca. First, execute the Areca GUI by running ./areca.sh from the console. You’ll see a window (as shown in Figure 1) open up on your screen. Let’s configure a few things.
Set your workspace: The section on the left of the window is your workspace area. The Select button here can be used to set your workspace location. This should be a safe location on your computer, as it is where Areca saves its configuration files. You can see the default workspace location here.

Figure 2

Figure 3 The main window shows your current targets

Set your target: Now you need to set up your target in order to run your first backup. Go to Edit > New Target. You’ll have something like what’s shown in Figure 2. Now set your Target name, Local Repository (this is where your backup archive is saved), Archive’s name and also Sources by switching the tab at the left, and then do any other configuration you’d like to. Next, click on Save. Your target has been created. Your main window now looks something like what’s shown in Figure 3.
Running your backup: After doing all that is necessary, you can run your first backup. Go to Run > Backup. Then select Use Default Working Directory to use a temporary sub-directory (created at the same location as the archives). Click on Start Backup. Great, so you have now created your first backup.
Recovery: You have a backup archive of your data now. This may be used at any time to recover your lost data. Just select your target from the workspace on the left and right click on the archive on the right section, which you wish to use to recover your data. Click Recover, choose the location, and click OK.
At this stage, you can easily create backups using the Areca GUI. However, you can further learn to configure your backups at http://areca-backup.org/tutorial.php.
Using the command line interface
You just used the Areca GUI to create a backup and recover your data again. Although the GUI is the preferred option, you may use the CLI, too, for the same purpose. The CLI suits those who are comfortable with the console, and it is also useful for scheduled backups.
To run it, just go to the Areca directory and follow up with the general syntax below:

$ ./bin/run_tui.sh <command> <options>

Here are a few basic commands you’ll need to create backups of your data and recover it using the console. The only prerequisite is the Areca XML configuration file, which you can generate from the GUI; otherwise, http://areca-backup.org/config.php explains how to write one.
1. You may get the textual description of a target group by using the describe command as shown below:

$ ./bin/run_tui.sh describe -config <your xml config file>

2. You may launch a backup on a target or a group of targets using the backup command as follows:

$ ./bin/run_tui.sh backup -config <your xml config file> [-target <target>] [-f] [-d] [-c] [-s] [-title <archive title>]

Here, -f requests a full backup, -d requests a differential backup, -c checks archive consistency after the backup, and -s is used for target groups.
3. If you have a backup, recover your data easily using recover as follows:

$ ./bin/run_tui.sh recover -config <config file> -target <target> -destination <destination folder> -date <recovery date: YYYY-MM-DD> [-c]

Here [-c] is to check and verify the recovered data.
You can learn more about command line usage at http://areca-backup.org/documentation.php.
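For scheduled backups, the backup command shown above can be wrapped in a small script and called from a scheduler such as cron. The sketch below only assembles the flags described in this article (-config, -target, -f, -d and -c). The function names, the configuration path and the target value are placeholders, and refusing -f together with -d is an assumption made here rather than documented Areca behaviour.

```python
import subprocess


def build_backup_command(config, target=None, full=False, differential=False,
                         check=False, launcher="./bin/run_tui.sh"):
    """Assemble the argument list for Areca's `backup` command."""
    if full and differential:
        # Assumption for this sketch: treat the two backup types as mutually exclusive.
        raise ValueError("choose either a full or a differential backup, not both")
    command = [launcher, "backup", "-config", config]
    if target is not None:
        command += ["-target", str(target)]
    if full:
        command.append("-f")
    if differential:
        command.append("-d")
    if check:
        command.append("-c")
    return command


def run_command(command):
    """Run the assembled command and return its exit status."""
    return subprocess.run(command).returncode


# Placeholder values: replace them with your own configuration file and target.
command = build_backup_command("/path/to/config.xml", target="my_target",
                               full=True, check=True)
print(" ".join(command))
# To start the backup for real, call run_command(command) from the Areca directory.
```

Passing the arguments as a list rather than a single string avoids quoting problems when the configuration path contains spaces.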
Final verdict
The Areca Backup tool is one of the best personal file backup tools when you look for options in open source. Despite having a few limitations such as no support for VSS (Volume Shadow Copy Service) and its inability to create backups for files locked by other programs, Areca serves users well due to its wide variety of features. Moreover, it has a separate database of plugins which may be used to overcome almost all of its limitations. If you are looking for a personal file backup utility, go for nothing but Areca.

Create Your First App with Android Studio

Android Studio is a new Android development environment developed by Google. It is based on IntelliJ IDEA, which is similar to Eclipse with the ADT plugin. Let’s get familiar with the installation of Android Studio and some of the precautions that must be taken during installation.

Android is gaining market share and opening up new horizons for those who want to develop Android apps. Android app development doesn’t require any investment because all the tools needed for it are free. It has been quite a while since Android app development began, and most of us are aware of how things work: install Java, then install Eclipse, download the ADT (Android Development Tools) bundle, do a bit of configuration, and you are all set to develop Android apps.

Google now provides us with a new IDE called Android Studio, which is based on IntelliJ IDEA. It is different from Eclipse in many ways. The most basic difference is that you don’t have to do any configuration like you would have to for Eclipse. Android Studio comes bundled with the Android ADT, and all you need to do is point it to where Java is installed on your system. In this article, I will cover a few major differences between Android Studio and the Eclipse+ADT plugin method. Android Studio is currently available as an ‘early access preview’ or developer preview, so several features will not be available and there are chances that you may encounter bugs.

First, let’s install Android Studio. I’m assuming that the developer machine runs Windows and has the JDK pre-installed. One thing to check is that the JDK version is later than version 6. Next, go to http://developer.android.com/sdk/installing/studio.html. Here, you’ll see the button for downloading Android Studio. The Web page automatically recognises your OS and provides you with the compatible version. If you need to download it for some other OS, just click on ‘Download for other Platforms’ (refer to Figure 1). Once it is downloaded, you can follow the set-up wizard.

There might be a few challenges. For example, on Windows systems, the launcher script is at times unable to find Java. In that case, you need to set an environment variable called JAVA_HOME and point it to your JDK folder.
If you are on Windows 8, you can follow these steps to set an environment variable: click on Computer -> Properties -> Advanced System Settings -> Advanced Tab (on the System Properties dialogue) -> Environment Variables. Then, under System Variables, click on New. Another problem might be the PATH variable. In the same manner as above, reach the Environment Variables dialogue box, and there, instead of creating a new variable, find the existing PATH variable and edit it. To the existing value, just add a semicolon at the end (if it’s not already there) and add the path to the bin folder of the JDK. Also, please note that if you are working with a 64-bit machine, the path to the JDK should be something like C:\Program Files\Java\jdk1.7.0_21 and not C:\Program Files (x86)\Java\jdk1.7.0. If you don’t have it in the former location, it means that a 64-bit version of Java isn’t installed on your system; so install that first.
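If the launcher still cannot find Java, a few lines of Python can confirm whether the two variables are set as described above. This is a rough sketch for Windows-style paths only; the function name check_java_env and the sample environment are invented for the illustration, and it assumes that the JDK's bin folder sits directly under JAVA_HOME.

```python
def check_java_env(env, pathsep=";"):
    """Return a list of problems found in a Windows-style environment mapping."""
    problems = []
    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
        return problems
    # The JDK tools are expected in the bin folder directly under JAVA_HOME.
    expected_bin = java_home.rstrip("\\/") + "\\bin"
    entries = [p.rstrip("\\/").lower() for p in env.get("PATH", "").split(pathsep) if p]
    if expected_bin.lower() not in entries:
        problems.append("PATH does not contain " + expected_bin)
    return problems


# Sample environment using the 64-bit JDK location mentioned in the text.
sample_env = {
    "JAVA_HOME": r"C:\Program Files\Java\jdk1.7.0_21",
    "PATH": r"C:\Windows;C:\Program Files\Java\jdk1.7.0_21\bin",
}
print(check_java_env(sample_env))                       # no problems reported
print(check_java_env({"PATH": r"C:\Windows"}))          # JAVA_HOME missing
```

To test your own machine, pass os.environ (after importing os) instead of the sample dictionary.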

Figure 1 – Download Android Studio

Figure 2 Welcome Screen

Now that the set-up is complete, we can go ahead and launch Android Studio directly. There is no need to download the ADT plugin and configure it. When you launch it, you see the Welcome screen (refer to Figure 2), which packs in a lot of functionality. You can check out projects from version control systems from the Welcome screen itself. The version control systems supported are GitHub, CVS, Git, Mercurial and Subversion. Then, from the Configure menu within the Welcome screen, you can configure the SDK manager, plugins, import/export settings, project default settings and the overall settings for the IDE, all without even launching the IDE. You can also access the Docs and the How-Tos from the Welcome screen. Next, the New Project screen is similar to what it looked like in Eclipse, but now there’s no need to select Android Application or anything else. You are directly at the spot from where you can start off a new Android project (refer to Figure 3). Another interesting thing about Android Studio is the ‘Tip of the day’ section (refer to Figure 4), which helps you become familiar with the IDE.

Figure 3 New Project

Figure 4 Tip of the day

Figure 5 Different Layout Preview

Now, let’s focus on some specific features that come with Android Studio (quoting directly from the Android Developers Web page):

Gradle-based build support.
Android-specific refactoring and quick fixes.
Lint tools to catch performance, usability, version compatibility and other problems.
ProGuard and app-signing capabilities.
Template-based wizards to create common Android designs and components.
A rich layout editor that allows you to drag-and-drop UI components, preview layouts on multiple screen configurations, and much more.
Built-in support for Google Cloud Platform, making it easy to integrate Google Cloud Messaging and App Engine as server-side components.

One of the major changes with respect to Eclipse is the use of Gradle. Previously, Android used Ant for builds, but with Android Studio, this task has been taken over by Gradle. At last year’s Google I/O, sources at Google talked about the new Android build system, Gradle. To quote from the Gradle website: “Google selected Gradle as the foundation of the Android SDK build system because it provides flexibility along with the ability to define common standards for Android builds. With Gradle, Android developers can use a simple, declarative DSL to configure Gradle builds supporting a wide variety of Android devices and App stores. With a simple, declarative DSL, Gradle developers have access to a single, authoritative build that powers both the Android Studio IDE and builds from the command-line.” Owing to Gradle, people will also notice a change in the project structure as compared to the project structure in Eclipse. Everything now resides inside the src folder. But from a developer’s perspective, it is essentially all still the same.

The other major, and rather useful, change is the ability to preview the layout on different screen sizes (refer to Figure 5, as shown during the Google I/O last year). While retaining the drag-and-drop designer function, the text mode has a preview pane to the right, which allows for previewing the layout on various screen sizes. There is also an option for making a landscape variation of the design for the same app without having to do much at the code level.

This is just the tip of the iceberg, and the features discussed above are amongst the major changes in terms of build and layout designing. I would encourage zealous developers who want to try out this IDE to visit the Android developers’ page and check out the Android Studio section. It is definitely a different way to approach Android app development, with the focus shifting towards development rather than configuration and management.

Analyse Your Data with Pandas

Here’s an introduction to Pandas, an open source software library that’s written in Python for data manipulation and analysis. Pandas facilitates the manipulation of numerical tables and time series.

In recent times, it has been proven again and again that data has become an increasingly important resource. Now, with the Internet boom, large volumes of data are being generated every second. To stay ahead of the competition, companies need efficient ways of analysing this data, which can be represented as a matrix, using Python’s mathematical package, NumPy.

The problem with NumPy is that it doesn’t have sufficient data analysis tools built into it. This is where Pandas comes in. It is a data analysis package, which is built to integrate with NumPy arrays. Pandas has a lot of functionality, but we will cover only a small portion of it in this article.

Getting started
Installing Pandas is a one-step process if you use Pip. Run the following command to install Pandas.

sudo pip install pandas

If you face any difficulties, visit http://pandas.pydata.org/pandas-docs/stable/install.html. You can now try importing Pandas into your Python environment by issuing the following command:

import pandas

In this tutorial, we will be using data from Weather Underground. The dataset for this article can be downloaded from http://www.synesthesiam.com/assets/weather_year.csv and can be imported into Pandas using:

data = pandas.read_csv('weather_year.csv')

The read_csv function creates a dataframe. A dataframe is a tabular representation of the data read. You can get a summary of the dataset by printing the object. The output of the print is as follows:

data

<class 'pandas.core.frame.DataFrame'>

Int64Index: 366 entries, 0 to 365

Data columns:

EDT 366 non-null values

Max TemperatureF 366 non-null values

Mean TemperatureF 366 non-null values

Min TemperatureF 366 non-null values

Max Dew PointF 366 non-null values

MeanDew PointF 366 non-null values

Min DewpointF 366 non-null values

Max Humidity 366 non-null values

Mean Humidity 366 non-null values

Min Humidity 366 non-null values

Max Sea Level PressureIn 366 non-null values

Mean Sea Level PressureIn 366 non-null values

Min Sea Level PressureIn 366 non-null values

Max VisibilityMiles 366 non-null values

Mean VisibilityMiles 366 non-null values

Min VisibilityMiles 366 non-null values

Max Wind SpeedMPH 366 non-null values

Mean Wind SpeedMPH 366 non-null values

Max Gust SpeedMPH 365 non-null values

PrecipitationIn 366 non-null values

CloudCover 366 non-null values

Events 162 non-null values

WindDirDegrees 366 non-null values

dtypes: float64(4), int64(16), object(3)

As you can see, there are 366 entries in the given dataframe. You can get the column names using data.columns.
The output of the command is given below:

data.columns

Index(['EDT', 'Max TemperatureF', 'Mean TemperatureF', 'Min TemperatureF', 'Max Dew PointF', 'MeanDew PointF', 'Min DewpointF', 'Max Humidity', 'Mean Humidity', 'Min Humidity', 'Max Sea Level PressureIn', 'Mean Sea Level PressureIn', 'Min Sea Level PressureIn', 'Max VisibilityMiles', 'Mean VisibilityMiles', 'Min VisibilityMiles', 'Max Wind SpeedMPH', 'Mean Wind SpeedMPH', 'Max Gust SpeedMPH', 'PrecipitationIn', 'CloudCover', 'Events', 'WindDirDegrees'], dtype=object)
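If you want to try the loading step before downloading the file, read_csv also accepts a file-like object. The sketch below feeds it three illustrative rows held in a string (a tiny stand-in laid out like the weather file, not the real dataset). Note that recent Pandas versions print the rows themselves when you display a dataframe; the column summary shown above now comes from the info method.

```python
import io

import pandas

# Illustrative rows only: a stand-in for weather_year.csv with three of its columns.
csv_text = """EDT,Max Humidity,Events
2012-3-10,74,
2012-3-11,78,Rain
2012-3-12,90,Rain
"""

data = pandas.read_csv(io.StringIO(csv_text))

print(data.shape)          # (number of rows, number of columns)
print(list(data.columns))  # the column names, in file order
data.info()                # per-column summary with non-null counts and dtypes
```

The empty Events field in the first row is read as NaN, which is how the missing values discussed later in this article end up in the dataframe.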

To print a particular column of the dataframe, you can simply index it as data['EDT'] for a single column or data[['EDT','Max Humidity']] for multiple columns. The output for data['EDT'] is:

data['EDT']

0     2012-3-10

1     2012-3-11

2     2012-3-12

3     2012-3-13

4     2012-3-14

5     2012-3-15

6     2012-3-16

...

...

...

361     2013-3-6

362     2013-3-7

363     2013-3-8

364     2013-3-9

365    2013-3-10

Name: EDT, Length: 366

And the output for data[['EDT','Max Humidity']] is:

data[['EDT','Max Humidity']]

<class 'pandas.core.frame.DataFrame'>

Int64Index: 366 entries, 0 to 365

Data columns:

EDT 366 non-null values

Max Humidity 366 non-null values

dtypes: int64(1), object(1)
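The two forms of indexing return different kinds of objects, which explains why the outputs above look so different. Here is a small sketch on a hand-made dataframe; the three rows are illustrative and only mimic the layout of the weather data.

```python
import pandas

# Illustrative rows that mimic the layout of the weather data.
data = pandas.DataFrame({
    "EDT": ["2012-3-10", "2012-3-11", "2012-3-12"],
    "Max Humidity": [74, 78, 90],
    "Events": [None, "Rain", "Rain"],
})

one_column = data["EDT"]                     # a single label gives a Series
two_columns = data[["EDT", "Max Humidity"]]  # a list of labels gives a DataFrame

print(type(one_column).__name__)
print(type(two_columns).__name__)
print(list(two_columns.columns))
```

A Series is one-dimensional, while the dataframe returned by the list form keeps its column structure even if the list holds a single name.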

Sometimes, it may be useful to only view a part of the data, just so that you can get a sense of what kind of data you are dealing with. Here you can use the head and tail functions to view the start and end of your dataframe:

data['Max Humidity'].head()

0    74

1    78

2    90

3    93

4    93

Name: Max Humidity

Note: The head and tail functions take a parameter that sets the number of rows to be displayed. They can be used as data['Max Humidity'].head(n), where ‘n’ is the number of rows. The default is 5.
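The effect of the row-count parameter is easy to see on a short series. The values below are the first six Max Humidity readings that appear in this article's outputs; the variable name humidity is chosen just for this sketch.

```python
import pandas

humidity = pandas.Series([74, 78, 90, 93, 93, 90], name="Max Humidity")

print(humidity.head())    # first five rows (the default)
print(humidity.head(2))   # first two rows
print(humidity.tail(2))   # last two rows
```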

Working with columns
Now that we have a basis on which to work with our dataframe, we can explore various useful functions provided by Pandas like std to compute the standard deviation, mean to compute the average value, sum to compute the sum of all elements in a column, etc. So if you want to compute the mean of the Max Humidity column, for instance, you can use the following commands:

data['Max Humidity'].mean()

90.027322404371589

data['Max Humidity'].sum()

32950

data['Max Humidity'].std()

9.10843757197798

Note: Most of the Pandas functions ignore NaNs by default. These regularly occur in data, and a convenient way of handling them must be established. This topic is covered in more detail later in this article.

As shown above, the std and sum functions are used in the same manner as mean. Also, rather than running these functions on individual columns, you can run them on the entire dataframe, as follows:

data.mean()

Max TemperatureF 66.803279

Mean TemperatureF 55.683060

Min TemperatureF 44.101093

Max Dew PointF 49.549180

MeanDew PointF 44.057377

Min DewpointF 37.980874

Max Humidity 90.027322

Mean Humidity 67.860656

Min Humidity 45.193989

Max Sea Level PressureIn 30.108907

Mean Sea Level PressureIn 30.022705

Min Sea Level PressureIn 29.936831

Max VisibilityMiles 9.994536

Mean VisibilityMiles 8.732240

Min VisibilityMiles 5.797814

Max Wind SpeedMPH 16.418033

Mean Wind SpeedMPH 6.057377

Max Gust SpeedMPH 22.764384

CloudCover 2.885246

WindDirDegrees 189.704918
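Here is a small sketch of the same calls on a hand-made dataframe, so that the results can be checked by eye. The rows are illustrative rather than taken from the real file. One caveat for current readers: recent Pandas releases may raise an error when mean is called on a dataframe that contains text columns such as EDT, so the sketch passes numeric_only=True to restrict the calculation to the numeric columns.

```python
import pandas

# Illustrative rows; the missing gust value shows how NaNs are skipped.
data = pandas.DataFrame({
    "EDT": ["2012-3-10", "2012-3-11", "2012-3-12"],
    "Max Humidity": [74, 78, 90],
    "Max Gust SpeedMPH": [17.0, None, 25.0],
})

print(data["Max Humidity"].mean())       # average of 74, 78 and 90
print(data["Max Humidity"].sum())        # 242
print(data["Max Gust SpeedMPH"].mean())  # the NaN is ignored, so this averages 17 and 25

# Column-wise means for the whole dataframe, numeric columns only.
column_means = data.mean(numeric_only=True)
print(column_means)
```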

Using apply for bulk operations
As we have already seen, functions like mean, std and sum work on entire columns, but sometimes it may be useful to apply our own functions to entire columns of the dataframe. For this purpose, Pandas provides the apply function, which takes a function (often an anonymous one) as a parameter and applies it to every element in the column. In this example, let us try to get the square of every element in a column. We can do this with the following code:

data['Max Humidity'].apply(lambda d: d**2)

0      5476

1      6084

2      8100

3      8649

4      8649

5      8100

...

...

...

361     8464

362     7225

363     7744

364     5625

365     2916

Name: Max Humidity, Length: 366

Note: In the lambda function, the parameter d is implicitly passed to it by Pandas, and contains each element of the column in turn.

Now you may wonder why you can’t just do this with a loop. Well, the answer is that this operation was written in one single line, which saves code writing time and is much easier to read.
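To make the comparison concrete, the sketch below squares the same values in three ways: with apply, with an explicit loop, and with plain arithmetic on the whole series. The last form is not covered in this article, but it is standard Pandas behaviour and is usually the shortest option for simple arithmetic. The values are the first five Max Humidity readings shown earlier.

```python
import pandas

humidity = pandas.Series([74, 78, 90, 93, 93], name="Max Humidity")


def square(d):
    """Square a single element; apply calls this once per value."""
    return d ** 2


squared_apply = humidity.apply(square)                  # a named function works like a lambda
squared_loop = pandas.Series([d ** 2 for d in humidity],
                             name="Max Humidity")       # the explicit loop
squared_vectorised = humidity ** 2                      # arithmetic on the whole column

print(list(squared_apply))
print(list(squared_loop) == list(squared_vectorised))
```

All three produce the same numbers; apply earns its place when the per-element logic is more involved than a single arithmetic expression.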

Dealing with NaN values
Pandas provides a function called isnull, which returns a ‘True’ or ‘False’ value depending on whether the value of an element in the column is NaN or None. These values are treated as missing values from the dataset, and so it is always convenient to deal with them separately. We can use the apply function to test every element in a column to see if any NaNs are present. You can use the following command:

e = data['Events'].apply(lambda d: pandas.isnull(d))

e

0      True

1     False

2     False

3      True

4      True

5     False

...

361    False

362     True

363     True

364     True

365     True

Name:  Events, Length: 366

As you can see, a series of Booleans was returned, with True marking the values that are NaN. Now there are two options for dealing with the NaN values. First, you can choose to drop all rows with NaN values using the dropna function, in the following manner:

data.dropna(subset=['Events'])

<class 'pandas.core.frame.DataFrame'>

Int64Index: 162 entries, 1 to 361

Data columns:

EDT                                   162  non-null values

Max TemperatureF                  162  non-null values

Mean TemperatureF                 162  non-null values

Min TemperatureF                 162  non-null values

Max Dew PointF                    162  non-null values

MeanDew PointF                    162  non-null values

Min DewpointF                     162  non-null values

Max Humidity                      162  non-null values

 Mean Humidity                    162  non-null values

 Min Humidity                     162  non-null values

 Max Sea Level PressureIn        162  non-null values

 Mean Sea Level PressureIn   162  non-null values

 Min Sea Level PressureIn     162  non-null values

 Max VisibilityMiles              162  non-null values

 Mean VisibilityMiles             162  non-null values

 Min VisibilityMiles              162  non-null values

 Max Wind SpeedMPH           162  non-null values

 Mean Wind SpeedMPH         162  non-null values

 Max Gust SpeedMPH            162  non-null values

PrecipitationIn                   162  non-null values

 CloudCover                       162  non-null values

 Events                               162  non-null values

 WindDirDegrees                   162  non-null values

dtypes: float64(4), int64(16), object(3)

As you can see, only the 162 rows that have a value in the Events column remain. The other option is to replace the NaN values with something easier to deal with, such as an empty string, using the fillna function. You can do this in the following manner:

data['Events'].fillna('')

0

1                  Rain

2                  Rain

3

4

5     Rain-Thunderstorm

6

7      Fog-Thunderstorm

8                  Rain

362

363

364

365

Name:  Events, Length: 366
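Both options can be tried side by side on a few rows. The sketch below uses six illustrative rows whose Events values follow the pattern of the outputs above. It also shows that neither dropna nor fillna changes the original dataframe; each returns a new object.

```python
import pandas

# Illustrative rows; None stands for a day with no recorded event.
data = pandas.DataFrame({
    "EDT": ["2012-3-10", "2012-3-11", "2012-3-12",
            "2012-3-13", "2012-3-14", "2012-3-15"],
    "Events": [None, "Rain", "Rain", None, None, "Rain-Thunderstorm"],
})

missing = data["Events"].isnull()            # True where the value is missing
with_events = data.dropna(subset=["Events"])  # option 1: drop the incomplete rows
filled = data["Events"].fillna("")            # option 2: replace NaN with an empty string

print(int(missing.sum()))            # number of missing values
print(list(with_events.index))       # rows that survive dropna keep their labels
print(list(filled))
print(int(data["Events"].isnull().sum()))  # the original column is unchanged
```

To keep the result, assign it back, for example data['Events'] = data['Events'].fillna('').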

Accessing individual rows
So far we have discussed methods dealing with indexing entire columns, but what if you want to access a specific row in your dataframe? Well, Pandas provides a function called irow, which lets you get the values of a specific row (newer Pandas releases drop irow in favour of data.iloc[0]). You can use it as follows:

data.irow(0)

EDT                           2012-3-10

Max TemperatureF                         56

Mean TemperatureF                        40

Min TemperatureF                         24

Max Dew PointF                           24

MeanDew PointF                           20

Min DewpointF                            16

Max Humidity                             74

 Mean Humidity                           50

 Min Humidity                            26

 Max Sea Level PressureIn             30.53

 Mean Sea Level PressureIn            30.45

 Min Sea Level PressureIn             30.34

 Max VisibilityMiles                     10

 Mean VisibilityMiles                    10

 Min VisibilityMiles                     10

 Max Wind SpeedMPH                       13

 Mean Wind SpeedMPH                       6

 Max Gust SpeedMPH                       17

PrecipitationIn                            0.00

 CloudCover                               0

 Events                                    NaN

 WindDirDegrees                         138

Name: 0

Note: Indices start from 0 for indexing the rows.
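On newer Pandas releases, where irow is no longer available, position-based access goes through the iloc indexer. A brief sketch on three illustrative rows:

```python
import pandas

# Illustrative rows that mimic the layout of the weather data.
data = pandas.DataFrame({
    "EDT": ["2012-3-10", "2012-3-11", "2012-3-12"],
    "Max Humidity": [74, 78, 90],
})

first_row = data.iloc[0]     # first row, returned as a Series indexed by column name
last_row = data.iloc[-1]     # negative positions count from the end
first_two = data.iloc[0:2]   # a slice of positions returns a DataFrame

print(first_row["EDT"], first_row["Max Humidity"])
print(last_row["EDT"])
print(len(first_two))
```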

Filtering
Sometimes you may need to find rows of special interest to you. Let’s suppose we want to find the data points in our dataframe that have a mean temperature greater than 40 and less than 50. You can filter out values from your dataframe using the following syntax:

data[(data['Mean TemperatureF']>40) & (data['Mean TemperatureF']<50)]

<class 'pandas.core.frame.DataFrame'>

Int64Index: 51 entries, 1 to 364

Data columns:

EDT                                51  non-null values

Max TemperatureF                  51  non-null values

Mean TemperatureF                 51  non-null values

Min TemperatureF                  51  non-null values

Max Dew PointF                    51  non-null values

MeanDew PointF                    51  non-null values

Min DewpointF                     51  non-null values

Max Humidity                      51  non-null values

 Mean Humidity                    51  non-null values

 Min Humidity                     51  non-null values

 Max Sea Level PressureIn    51  non-null values

 Mean Sea Level PressureIn  51  non-null values

 Min Sea Level PressureIn     51  non-null values

 Max VisibilityMiles              51  non-null values

 Mean VisibilityMiles             51  non-null values

 Min VisibilityMiles              51  non-null values

 Max Wind SpeedMPH           51  non-null values

 Mean Wind SpeedMPH         51  non-null values

 Max Gust SpeedMPH            51  non-null values

PrecipitationIn                   51  non-null values

 CloudCover                       51  non-null values

 Events                               23  non-null values

 WindDirDegrees                   51  non-null values

dtypes: float64(4), int64(16), object(3)

Note: Each of the conditions data['Mean TemperatureF']>40 and data['Mean TemperatureF']<50 returns a series of Booleans. You must wrap each condition in brackets before combining them with the & operator, because & binds more tightly than the comparison operators; without the brackets, you will get an error message saying that the expression is ambiguous.

Now you can easily get meaningful data from your dataframe by simply filtering out the data that you aren’t interested in. This provides you with a very powerful technique that you can use in conjunction with higher Pandas functions to understand your data.
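The filter can be broken into steps to see what each part produces. In the sketch below, only the first temperature (40) comes from the row shown earlier; the other values are invented for the illustration.

```python
import pandas

# Illustrative rows; apart from the first temperature, the values are invented.
data = pandas.DataFrame({
    "EDT": ["2012-3-10", "2012-3-11", "2012-3-12", "2012-3-13", "2012-3-14"],
    "Mean TemperatureF": [40, 45, 52, 48, 50],
})

above_40 = data["Mean TemperatureF"] > 40   # Boolean series, one value per row
below_50 = data["Mean TemperatureF"] < 50
mask = above_40 & below_50                  # element-wise AND of the two conditions

selected = data[mask]                       # keep only the rows where mask is True
print(list(mask))
print(selected)
```

Naming the intermediate conditions also sidesteps the bracket problem, since each comparison is evaluated on its own line.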

Getting data out
You can easily write your data out as a CSV file by using the to_csv function.

data.to_csv('weather-mod.csv')

Want a tab-separated file instead? No problem.

data.to_csv('data/weather-mod.tsv', sep='\t')
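To inspect what to_csv produces without creating a file, you can write into an in-memory buffer. The sketch below uses two illustrative rows and also passes index=False, because by default to_csv writes the row index as an extra first column.

```python
import io

import pandas

# Illustrative rows that mimic the layout of the weather data.
data = pandas.DataFrame({
    "EDT": ["2012-3-10", "2012-3-11"],
    "Max Humidity": [74, 78],
})

buffer = io.StringIO()
data.to_csv(buffer, sep="\t", index=False)  # tab-separated, without the row index
lines = buffer.getvalue().splitlines()

for line in lines:
    print(line)
```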

Note: Generally, the dataframe can be indexed by any Boolean NumPy array. In a sense, only the rows for which the value is true will be retained. For example, if we use the variable e (e = data['Events'].apply(lambda d: pandas.isnull(d))), which marks the rows that have NaN values for data['Events'], as data[e], we will get a dataframe containing only the rows that have NaN values for data['Events'].
