How much preprocessing does Vowpal Wabbit input need?




Question: 

I know that vw can handle very raw data (e.g. raw text), but, for instance, should one consider scaling numerical features before feeding the data to vw? Consider the following line:

1 |n age:80.0 height:180.0 |c male london |d the:1 cat:2 went:3 out:4

Assuming that typical age ranges from 1 to 100 and height (in centimeters) may range from 140 to 220, is it better to transform/scale age and height so they share a common range? I think many algorithms need this kind of preprocessing of their input data, for example linear regression.
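For context, here is a minimal sketch of what such pre-scaling could look like before emitting vw-format lines. The ranges, namespace letters, and the `to_vw_line` helper are assumptions for illustration, not something vw requires:

```python
# Minimal sketch: min-max scale numeric features to [0, 1] before writing
# vw-format lines. The ranges and helper names are illustrative assumptions.

AGE_RANGE = (1.0, 100.0)       # assumed typical age range
HEIGHT_RANGE = (140.0, 220.0)  # assumed height range in centimeters

def min_max(value, lo, hi):
    """Scale a value from [lo, hi] into [0, 1]."""
    return (value - lo) / (hi - lo)

def to_vw_line(label, age, height, categorical, text):
    """Build one vw input line with scaled numeric features."""
    age_s = min_max(age, *AGE_RANGE)
    height_s = min_max(height, *HEIGHT_RANGE)
    return (f"{label} |n age:{age_s:.4f} height:{height_s:.4f} "
            f"|c {' '.join(categorical)} |d {text}")

print(to_vw_line(1, 80.0, 180.0, ["male", "london"], "the:1 cat:2 went:3 out:4"))
```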


1 Answer: 

vw's SGD is heavily enhanced compared to plain, naive SGD, so pre-scaling isn't needed.

If you have very few examples (a small data set), pre-scaling may still help somewhat.

vw automatically normalizes for scale by remembering the range of each feature as it goes, so pre-scaling is rarely needed to achieve good results.
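To illustrate the idea only, here is a loose toy sketch of per-feature scale tracking during online learning. This is not vw's actual update rule (see the paper referenced at the end for the real algorithm); it just shows what "remembering the range of each feature as it goes" can mean:

```python
# Loose illustration of per-feature scale tracking in online SGD.
# NOT vw's actual algorithm; learning rate and loss are arbitrary choices.

from collections import defaultdict

scale = defaultdict(lambda: 1.0)   # running max |x_i| seen so far, per feature
weights = defaultdict(float)
lr = 0.5                           # learning rate (arbitrary for this sketch)

def update(features, label):
    """One online squared-loss update with on-the-fly feature scaling."""
    for name, value in features.items():
        scale[name] = max(scale[name], abs(value))
    # predict using each feature divided by its running scale
    pred = sum(weights[n] * v / scale[n] for n, v in features.items())
    err = label - pred
    for name, value in features.items():
        weights[name] += lr * err * value / scale[name]
    return pred

update({"age": 80.0, "height": 180.0}, 1)
print(dict(weights))
```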

Normalization for scale, rarity and importance is applied by default. The relevant vw options are:

--normalized
--adaptive
--invariant

If any of them appears explicitly on the command line, the others are not applied; by default all three are applied.
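For example, a run that turns all three on explicitly could look like the following (the file names train.vw and model.vw are assumptions, and the call is wrapped in Python purely to keep one snippet format):

```python
# Example invocation with all three normalization-related flags given
# explicitly; "train.vw" and "model.vw" are illustrative placeholders.
import subprocess

subprocess.run(
    ["vw", "--normalized", "--adaptive", "--invariant",
     "-d", "train.vw", "-f", "model.vw"],
    check=True,
)
```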

See also: this stackoverflow answer

The paper explaining the enhanced SGD algorithm in vw is:

Online Importance Weight Aware Updates - Nikos Karampatziakis & John Langford