Using Boosting for SF Crime Classification Kaggle project

2/24/2016

In this blog post I will discuss my updated work on the San Francisco Crime Classification competition on Kaggle. The data and a description of the competition are located at: https://www.kaggle.com/c/sf-crime

Previously, I had used Linear Discriminant Analysis (LDA) and Random Forest. I have now been able to run Boosting and obtained a much better log-loss score.

First of all, I followed the approach described at https://brittlab.uwaterloo.ca/2015/11/01/KaggleSFcrime/ to generate the new features "Intersection" and "Night", to read the large csv files faster with the data.table package, and to save memory by using sparse matrices.
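As a rough illustration of what these two features might look like, here is a minimal sketch in Python (the analysis itself was done in R). It assumes the "Dates" column holds timestamps such as 2015-05-13 23:53:00 and that the "Address" column marks intersections with a "/". The night-time window (10 pm to 6 am), the function names, and the two example rows are assumptions made for illustration, not necessarily the definitions used in the linked post.

```python
from datetime import datetime

# Assumed night window: 22:00 up to (but not including) 06:00.
NIGHT_START_HOUR = 22
NIGHT_END_HOUR = 6


def is_intersection(address: str) -> int:
    """Return 1 if the address looks like a street intersection, else 0."""
    return int("/" in address)


def is_night(timestamp: str) -> int:
    """Return 1 if the timestamp falls inside the assumed night window."""
    hour = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").hour
    return int(hour >= NIGHT_START_HOUR or hour < NIGHT_END_HOUR)


def add_features(row: dict) -> dict:
    """Return a copy of the row with Intersection and Night columns added."""
    enriched = dict(row)
    enriched["Intersection"] = is_intersection(row["Address"])
    enriched["Night"] = is_night(row["Dates"])
    return enriched


# Two made-up rows in the assumed format, for illustration only.
rows = [
    {"Dates": "2015-05-13 23:53:00", "Address": "OAK ST / LAGUNA ST"},
    {"Dates": "2015-05-13 14:10:00", "Address": "1500 Block of LOMBARD ST"},
]
for row in rows:
    print(add_features(row))
```

Both features are 0/1 indicators, so they fit naturally into the sparse matrix representation mentioned above.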

I then implemented Gradient Boosting by using the "caret" and "xgboost" packages. I first tried eta=0.3 (eta is the learning rate, i.e. the shrinkage factor applied to each new tree, so the larger eta is, the weaker the regularization). With Cross-Validation using 3 folds, I found that the 16th iteration produced the smallest logloss.mean value of 2.56. However, my previous submission to Kaggle using LDA produced a log-loss of around 2.58. Because the cross-validation error tends to be an optimistic estimate of the test error, I expected the Kaggle score to be no better than 2.56, which would be hardly any improvement over LDA. I therefore considered this 2.56 value unacceptable.
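For reference, the log-loss being compared here is the average negative log of the probability that the model assigned to the true class. Below is a minimal Python sketch of that metric, together with a helper that picks the iteration with the smallest mean cross-validation score. The clipping bound and both function names are assumptions for illustration, and the numbers in the usage lines are toy values, not results from this analysis.

```python
import math

# Clipping bound so that log(0) never occurs (an assumed, commonly used value).
EPS = 1e-15


def multiclass_log_loss(true_labels, predicted_probs):
    """Mean negative log-probability assigned to the true class.

    true_labels: list of class indices, one per observation.
    predicted_probs: list of per-class probability lists, one per observation.
    """
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        p = min(max(probs[label], EPS), 1.0 - EPS)
        total -= math.log(p)
    return total / len(true_labels)


def best_iteration(cv_means):
    """Return the 1-based iteration with the smallest mean CV log-loss."""
    return min(range(len(cv_means)), key=lambda i: cv_means[i]) + 1


# Toy check: a uniform guess over 3 classes scores ln(3).
uniform = [[1 / 3, 1 / 3, 1 / 3]] * 3
print(multiclass_log_loss([0, 1, 2], uniform), math.log(3))

# Toy per-iteration CV means (made up): the second iteration is the best.
print(best_iteration([3.0, 2.5, 2.8]))
```

A useful sanity check is that a model guessing uniformly over K classes scores ln(K), so any useful model should score well below that.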

I then guessed that perhaps the previous model had overfit the training set, so I increased the regularization by decreasing eta to 0.1. According to the xgboost documentation page, if you decrease eta, you should increase the number of boosting iterations. I thus tried 50 iterations. I then submitted this to Kaggle, and my log-loss score was 2.43! That was much better than the 2.58 I got from LDA.
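To see why a smaller eta calls for more iterations, here is a toy Python sketch of boosting with shrinkage. It is not xgboost and not the multiclass objective used in the competition: it assumes a squared-error loss, regression stumps as weak learners, and a made-up step-shaped target, and it only tracks training error. All function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump: best single threshold split on xs."""
    best = None
    thresholds = sorted(set(xs))
    for lo, hi in zip(thresholds, thresholds[1:]):
        cut = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x < cut]
        right = [r for x, r in zip(xs, residuals) if x >= cut]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or sse < best[0]:
            best = (sse, cut, left_mean, right_mean)
    _, cut, left_mean, right_mean = best
    return lambda x: left_mean if x < cut else right_mean


def boosted_mse_path(xs, ys, eta, n_rounds):
    """Training MSE after each boosting round with shrinkage eta."""
    preds = [sum(ys) / len(ys)] * len(ys)  # start from the mean of the target
    path = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        # Shrinkage: only a fraction eta of each stump is added to the model.
        preds = [p + eta * stump(x) for p, x in zip(preds, xs)]
        path.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return path


def rounds_needed(xs, ys, eta, target_mse, max_rounds=500):
    """First round at which the training MSE drops below target_mse, else None."""
    for i, mse in enumerate(boosted_mse_path(xs, ys, eta, max_rounds), start=1):
        if mse < target_mse:
            return i
    return None


xs = list(range(10))
ys = [0.0 if x < 5 else 1.0 for x in xs]  # toy step-shaped target
for eta in (0.3, 0.1):
    print(eta, rounds_needed(xs, ys, eta, target_mse=1e-4))
```

On this toy problem each round removes only a fraction eta of the remaining residual, so the run with eta=0.1 needs noticeably more rounds than the run with eta=0.3 to reach the same training error. The benefit of the smaller steps shows up in validation and test error, which this sketch does not measure.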

I should also note that I tried to do the parameter tuning with "caret". However, it ran too slowly on my machine. Just a 2-fold CV with a maximum of 40 iterations on 3 different eta values ran for over 8 hours!

In addition to Boosting, I also tried Random Forests, the bigRF package, and Neural Networks. In my previous analysis using Random Forests, I kept running into errors because my computer did not have enough memory. I recently purchased a new laptop with more RAM, but I still get the same insufficient-memory errors when running Random Forests. I also could not get the bigRF package to work, I believe because it does not support my version of R. As for Neural Networks, I am working on that as I type this post.

You can find the code I used for this analysis at: https://github.com/jk34/Kaggle_SF_Crime_Classification/blob/master/run_improved.r

Author

Hello world, my name is Jerry Kim. I have a Master's Degree in Physics and years of work experience in Image Processing, Machine Learning, and Deep Learning. I have mostly used C++, Matlab, and Python. I created this website to showcase a small sample of the things that I have worked on.
