ℝolliℵg M∀th Thr∑a∂

"data science" is this meaninglessly general term that is starting to be usefully divided up into "product data science" (e.g. machine learning in the product) and "analytics" (e.g. decision science/business intelligence).

R is virtually useless in the first, but much more useful in the second, which is more traditional stats and batch/static reporting.

𝔠𝔞𝔢𝔨 (caek), Thursday, 30 June 2016 12:31 (seven years ago) link

yes the work my wife is looking at is in "data analytics" particularly. the company she's looking at right now wants (in addition to a doctorate in math or stats, and English fluency) capacity with SQL, and R and/or Python and/or Excel. I lolled at Excel but I think that says well what they want.

droit au butt (Euler), Thursday, 30 June 2016 13:05 (seven years ago) link

"data science" is this meaninglessly general term that is starting to be usefully divided up into "product data science" (e.g. machine learning in the product) and "analytics" (e.g. decision science/business intelligence).

R is virtually useless in the first, but much more useful in the second, which is more traditional stats and batch/static reporting.

― 𝔠𝔞𝔢𝔨 (caek), Thursday, June 30, 2016 8:31 AM (1 hour ago)

i work in analytics but there's tonnes of ML in R

i used to lol at Excel when i was in school but it's the least pain in the ass way to just look at data quickly imo, which is extremely useful in the job

ty for R inferno, this is hilarious

de l'asshole (flopson), Thursday, 30 June 2016 13:50 (seven years ago) link

R has ML libraries, sure. so does javascript. they don't get used in product though.

𝔠𝔞𝔢𝔨 (caek), Thursday, 30 June 2016 13:54 (seven years ago) link

what does that mean?

de l'asshole (flopson), Thursday, 30 June 2016 14:09 (seven years ago) link

as far as i've experienced, r doesn't get used as the backend for web apps, for collaborative filtering at web scale, for CNNs, etc. these are the use cases i mean when i say "product".

𝔠𝔞𝔢𝔨 (caek), Thursday, 30 June 2016 14:19 (seven years ago) link

you can probably do all those things in r (write an api, collaborative filtering, train a neural network, etc.), but i don't know anybody who does in production.

𝔠𝔞𝔢𝔨 (caek), Thursday, 30 June 2016 14:23 (seven years ago) link

it doesn't seem that needs chez moi involve developing apps of any kind, that's for the developers afaict, not the analysts, but I dunno. from what I've read of R it seems silly to do development there.

droit au butt (Euler), Thursday, 30 June 2016 15:23 (seven years ago) link

i once got asked in an interview "what kind of data scientist are you" and it turned out he was getting at this product/production vs analyst distinction. i think it's real, and IME r definitely falls on one side of it in practice, and that's at least in part because of the design of the language (rather than mere social network effects). but to be clear there are tons of jobs where r is far and away the most useful language you can know.

𝔠𝔞𝔢𝔨 (caek), Thursday, 30 June 2016 15:27 (seven years ago) link

yeah I mean we're just reading ads but it seems to me if you want a doctorate in math/stats then you're not just looking for a developer. but I dunno.

droit au butt (Euler), Thursday, 30 June 2016 15:29 (seven years ago) link

this is extremely reductive and misses out on tons of factors/complications, but gives a very rough idea of what's most valuable to know. valuable != necessary of course.

https://duu86o6n09pv.cloudfront.net/reports/2015-data-science-salary-survey.pdf

𝔠𝔞𝔢𝔨 (caek), Thursday, 30 June 2016 15:37 (seven years ago) link

huh that's interesting and helpful

here's a very stupid question: is there some recommended "certification" for having learned these tools, or can you just pick them up on your own and then list it on your cv/resumé ? my own CS degree is like 20 years old & I don't remember anything about that (& my wife doesn't have any CS degrees, just math, though she used Matlab a lot for her dissertation, in applied math). like what do self-trained people in these tools have to do to convince employers that they can use them? or will this come out in some test in an interview?

droit au butt (Euler), Thursday, 30 June 2016 15:51 (seven years ago) link

for data science, it's less of a problem to be a self taught coder in "tech" businesses than in more traditional businesses. the discipline is mature enough that there's a fairly good chance you end up being interviewed by someone who themselves has a strong quant but non-CS phd.

so, given a maths phd, i don't think further credentials are strictly necessary.

that said, there's a cottage industry of boot camps/recruitment things that make the transition quite a lot easier (and perhaps more lucrative), either by formally teaching stuff and providing credentials, providing an environment in which your "job" is to learn for a few weeks, or helping with applications/interviews. http://insightdatascience.com/ is the best known of these.

if your wife knows matlab already, then i recommend andrew ng's coursera machine learning course. it's intellectually interesting but it's also excellent interview prep. the only thing i didn't like about it was that the exercises were in matlab, because i had to waste time learning that. i put that (and a couple of other coursera courses) on my resume my first time out, but i don't think anyone noticed or cared about how i'd acquired the knowledge.

𝔠𝔞𝔢𝔨 (caek), Thursday, 30 June 2016 16:01 (seven years ago) link

ok super, we'll have a look. she's got plenty of time for coursera courses; right now she's working through an O'Reilly book on R and it's going easily as expected.

droit au butt (Euler), Thursday, 30 June 2016 16:04 (seven years ago) link

(major caveat with any advice i give: my experience and network are all tech/startup, which is an unusual industry and is not where most of the jobs are. most of them are in healthcare, insurance, finance, etc.)

𝔠𝔞𝔢𝔨 (caek), Thursday, 30 June 2016 16:04 (seven years ago) link

right, she's looking at the tech/startup industry in Paris, which is quite weird as you can imagine.

droit au butt (Euler), Thursday, 30 June 2016 16:11 (seven years ago) link

(though one startup in Paris last year hired more mathematicians in France than all universities in France combined, and this is the current target)

droit au butt (Euler), Thursday, 30 June 2016 16:12 (seven years ago) link

caek does your ilxmail work? my wife has questions for you if you'd be willing.

droit au butt (Euler), Thursday, 30 June 2016 16:44 (seven years ago) link

i read this book

http://www-bcf.usc.edu/~gareth/ISL/

which does all the examples in R. the methods are outdated but perfect for getting the intuition, and the big themes, like the bias-variance tradeoff, are really well-developed. it's extremely easy and i got through it in a week. it's the baby version (created for an MBA class iirc) of Elements Of Statistical Learning, which i'm reading now

de l'asshole (flopson), Thursday, 30 June 2016 17:03 (seven years ago) link
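
For anyone who hasn't met the bias-variance tradeoff mentioned above, here is a minimal sketch (not taken from ISL or ESL). It assumes the simplest possible setup: estimating a mean `mu` from `n` draws with variance `sigma2`, using a shrunken estimator `c * sample_mean`. The function name and the numbers are made up for illustration. Shrinking (c < 1) adds bias but cuts variance, and the mean squared error is the sum of squared bias and variance.

```python
def shrunk_mean_error(c, mu, sigma2, n):
    """Bias, variance, and MSE of the estimator c * (sample mean of n draws).

    Assumes independent draws with true mean `mu` and variance `sigma2`.
    """
    bias = (c - 1.0) * mu              # E[c * mean] - mu
    variance = c * c * sigma2 / n      # Var(c * mean)
    mse = bias * bias + variance       # MSE = bias^2 + variance
    return bias, variance, mse


# Illustrative numbers only: mu = 1, sigma2 = 4, n = 4.
for c in (1.0, 0.75, 0.5, 0.25):
    bias, variance, mse = shrunk_mean_error(c, mu=1.0, sigma2=4.0, n=4)
    print(f"c={c:.2f}  bias={bias:+.3f}  variance={variance:.3f}  mse={mse:.3f}")
```

With these particular numbers the unbiased estimator (c = 1) has a higher MSE than the moderately shrunken one (c = 0.5), which is the tradeoff in miniature; shrink too far and the bias term takes over again.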

i hear v good things about ESL and ISL

euler i think so, and sure!

𝔠𝔞𝔢𝔨 (caek), Thursday, 30 June 2016 17:05 (seven years ago) link

you can probably do all those things in r (write an api, collaborative filtering, train a neural network, etc.), but i don't know anybody who does in production.

ha, having said that, i saw on twitter this talk is happening today

http://schedule.user2016.org/event/7Sq2/gradient-boosted-trees-model-deploying-r-models-into-production-environments

𝔠𝔞𝔢𝔨 (caek), Thursday, 30 June 2016 18:05 (seven years ago) link

Allen/etaeoe, what's your favorite plotting library (in any language) right now?

I use ggplot2 all day every day, and while I try to keep my eye on new developments, I haven't yet found anything else yet that lets me get what's in my mind's eye onto a realized plot as quickly and easily. Lately I've been using Plotly with it, and wrapping ggplots in ggplotly() for some quick and easy interactivity (zooming, tool tips, etc)

Dan I., Thursday, 30 June 2016 18:59 (seven years ago) link

-1 "yet"

Dan I., Thursday, 30 June 2016 19:00 (seven years ago) link

Another good applied intro-level book along the lines of ISL is Max Kuhn's Applied Predictive Modeling, which gets into some hairier stuff that other sources tend to skip like how to deal with extreme class imbalance. He also touches on response surface methodology and multiobjective optimization, which is potentially so useful but I never see anybody else talking about (then again I don't come from an engineering background). Again, though, the book is R-based, so don't read it if you hate R.

Dan I., Thursday, 30 June 2016 20:47 (seven years ago) link

Allen/etaeoe, what's your favorite plotting library (in any language) right now?

“It’s complicated.”

Typically, I use visualizations either as descriptive statistics or as figures.

When I need a descriptive statistic (e.g. histogram, Q-Q, or scatter), I’ll continue to use Seaborn from Python and ggplot2 from R. I find them too verbose. Especially when compared to R’s default plotting functions. But they work.

When I need a figure, I’ll use D3 to render an SVG suitable for publication. I’ve tried Cytoscape too. If a figure is computationally expensive to render (e.g. more than one hundred thousand observations), I’ll use SVG or WebGL directly.

I’ve used TikZ too. It works.

Everything I’ve mentioned feels inadequate. When I used ggplot2 (matplotlib too) in 2005, it was a major revelation. TikZ too. However, it’s been an insane decade for mathematics and statistics. 2005’s tools feel way too limiting for the ideas I want to express in 2016.

Conceptually, D3 is fantastic. And Mike Bostock has been a champion for articulating the transition we’re undergoing. Unfortunately, I don’t think D3 should become the default option. It feels antithetical to both standard and emerging web technologies. And it’s isolated from the larger web ecosystem (e.g. D3 uses custom selection and data-binding operations).

I think Plot.ly’s Plotly.js library is sensible as a curated collection of D3 visualizations. But venture-backed visualization software makes me nervous.

I also feel burdened by the lack of contemporary visualization tools for common problems (e.g. volumetric images).

Allen (etaeoe), Sunday, 3 July 2016 20:45 (seven years ago) link

Don't want to appear uncharitable, but feel like this software angle should perhaps have its own thread.

Tarzan v. BMI (James Redd and the Blecchs), Sunday, 3 July 2016 20:47 (seven years ago) link

Unless you are using to calculate Catalan numbers, of course:)

Tarzan v. BMI (James Redd and the Blecchs), Sunday, 3 July 2016 20:54 (seven years ago) link

Don't want to appear uncharitable, but feel like this software angle should perhaps have its own thread.

Yeah. Someone should start a “statistics” (or “data science” or whatever) thread.

Allen (etaeoe), Sunday, 3 July 2016 20:59 (seven years ago) link

Unless you are using to calculate Catalan numbers, of course:)

Or,

http://i.stack.imgur.com/ceazj.png

Allen (etaeoe), Sunday, 3 July 2016 21:00 (seven years ago) link

Don't want to appear uncharitable, but feel like this software angle should perhaps have its own thread.

― Tarzan v. BMI (James Redd and the Blecchs), Sunday, July 3, 2016 3:47 PM (Yesterday)

Unless you are using to calculate Catalan numbers, of course:)

― Tarzan v. BMI (James Redd and the Blecchs), Sunday, July 3, 2016 3:54 PM (Yesterday)

Maybe Catalan numbers should have their own thread, they rule

Guayaquil (eephus!), Monday, 4 July 2016 19:36 (seven years ago) link
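
Since Catalan numbers keep coming up, here is a small sketch of two standard ways to compute them, in case it's useful: the closed form C_n = C(2n, n) / (n + 1) and the recurrence C_0 = 1, C_{k+1} = sum of C_i * C_{k-i}. The function names are just illustrative, and `math.comb` assumes Python 3.8 or later.

```python
from math import comb


def catalan(n):
    """n-th Catalan number via the closed form C(2n, n) / (n + 1)."""
    # The division is always exact, so integer division is safe here.
    return comb(2 * n, n) // (n + 1)


def catalan_by_recurrence(count):
    """First `count` Catalan numbers via C_0 = 1, C_k = sum_i C_i * C_(k-1-i)."""
    values = []
    for k in range(count):
        if k == 0:
            values.append(1)
        else:
            values.append(sum(values[i] * values[k - 1 - i] for i in range(k)))
    return values


print([catalan(n) for n in range(10)])
print(catalan_by_recurrence(10))
```

Both lines print the same list, starting 1, 1, 2, 5, 14, 42. The closed form is the quick way to get a single value; the recurrence is the one that mirrors the usual counting arguments (balanced parentheses, binary trees, and so on).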

They definitely have their own book or two.

My City Slang Was Gone (James Redd and the Blecchs), Monday, 4 July 2016 19:56 (seven years ago) link

During the "grande affaire" of the earlier twentieth century debate on The Theory of Relativity between Albert Einstein and Henri Bergson, Paul Valéry, the French poet, diarist, and general man of ideas and letters, who corresponded with both on friendly terms, acted as a middleman on at least one occasion, accompanying Einstein on a visit in 1922 to Bergson's home.

My City Slang Was Gone (James Redd and the Blecchs), Monday, 4 July 2016 20:08 (seven years ago) link

Ha, wrong thread, mostly.

My City Slang Was Gone (James Redd and the Blecchs), Monday, 4 July 2016 20:12 (seven years ago) link

i'm against a separate 'data science' thread via apprehension of other ilxors posting their 'opinions' on it. everyone except us seems to ignore this one B-)

de l'asshole (flopson), Monday, 4 July 2016 20:54 (seven years ago) link

ive successfully avoided doing just that so far fwiw :/

( ^_^) (Lamp), Monday, 4 July 2016 21:30 (seven years ago) link

RIP Kalman. almost broke my brain trying to understand your filter in time-series stats class :-)

http://hungarytoday.hu/news/renowned-hungarian-scientis-rudolf-kalman-dies-aged-86-46732

de l'asshole (flopson), Friday, 8 July 2016 16:00 (seven years ago) link

RIP

Hare in the Gated Snare (James Redd and the Blecchs), Saturday, 9 July 2016 01:06 (seven years ago) link

My "aha" moment in getting the Kalman filter was when deriving a simple version of it myself as a special case of the Bayes theorem, iirc.

anatol_merklich, Monday, 18 July 2016 08:51 (seven years ago) link

can you show us?

de l'asshole (flopson), Monday, 18 July 2016 17:09 (seven years ago) link

Proof is left to the readers.

Death of a Disco Mystic (James Redd and the Blecchs), Monday, 18 July 2016 20:17 (seven years ago) link

The proof is obvious

Miami Jeeves And The Ties That Bind (James Redd and the Blecchs), Monday, 18 July 2016 20:22 (seven years ago) link

Or is it?

Miami Jeeves And The Ties That Bind (James Redd and the Blecchs), Monday, 18 July 2016 20:23 (seven years ago) link

*leaves thread*

Miami Jeeves And The Ties That Bind (James Redd and the Blecchs), Monday, 18 July 2016 20:23 (seven years ago) link

*time passes*

Miami Jeeves And The Ties That Bind (James Redd and the Blecchs), Monday, 18 July 2016 20:23 (seven years ago) link

Yes, it's obvious

Miami Jeeves And The Ties That Bind (James Redd and the Blecchs), Monday, 18 July 2016 20:24 (seven years ago) link

Been a long time, I'll see if I can reproduce the aha. :-)

anatol_merklich, Tuesday, 19 July 2016 06:06 (seven years ago) link
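
In the meantime, a minimal sketch of the kind of derivation being described, for the one-dimensional case only. This is not anatol_merklich's own working (it isn't given in the thread); it assumes a Gaussian prior on the state and a direct noisy measurement z = x + noise with Gaussian noise, and all function names and numbers are made up for illustration. Multiplying the prior by the likelihood (Bayes' theorem) gives a Gaussian posterior, and rearranging that posterior gives the familiar Kalman gain form of the update.

```python
def bayes_update(prior_mean, prior_var, z, obs_var):
    """Posterior of x given z = x + noise, by multiplying two Gaussians.

    Precisions (inverse variances) add, and the posterior mean is the
    precision-weighted average of the prior mean and the measurement.
    """
    post_precision = 1.0 / prior_var + 1.0 / obs_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + z / obs_var)
    return post_mean, post_var


def kalman_update(prior_mean, prior_var, z, obs_var):
    """The same posterior, written in Kalman-gain form."""
    gain = prior_var / (prior_var + obs_var)
    post_mean = prior_mean + gain * (z - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var


def kalman_predict(mean, var, a=1.0, process_var=0.0):
    """Push the belief through x_next = a * x + w, with w ~ N(0, process_var)."""
    return a * mean, a * a * var + process_var


# Illustrative numbers only: prior N(0, 4), measurement 2.0 with noise variance 1.
print(bayes_update(0.0, 4.0, 2.0, 1.0))
print(kalman_update(0.0, 4.0, 2.0, 1.0))
```

The two update functions agree up to floating-point rounding, which is the "special case of Bayes" point: the filter alternates `kalman_predict` (the belief spreads out) with an update that is just a Gaussian Bayes step. The full multivariate filter does the same thing with matrices in place of scalars.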

https://twitter.com/AnalysisFact

flopson, Wednesday, 20 July 2016 16:53 (seven years ago) link

http://www.johndcook.com/blog/twitter_page/

flopson, Wednesday, 20 July 2016 16:55 (seven years ago) link

Euler buy this for your wife for xmas ;-) http://r4ds.had.co.nz/introduction-1.html

flopson, Friday, 22 July 2016 21:12 (seven years ago) link

looks good!

droit au butt (Euler), Saturday, 23 July 2016 15:46 (seven years ago) link

