In this blog post I will discuss my updated work on the San Francisco Crime Classification competition on Kaggle. The data and a description of the competition are available at: https://www.kaggle.com/c/sf-crime
I had previously used Linear Discriminant Analysis (LDA) and Random Forest. I have now been able to run Boosting, which gave much better log-loss scores.
First of all, following this post: https://brittlab.uwaterloo.ca/2015/11/01/KaggleSFcrime/ I generated the new features "Intersection" and "Night", used the data.table package to read large CSV files faster, and used sparse matrices to save memory.
I then implemented Gradient Boosting using the "caret" and "xgboost" packages. I first tried eta=0.3. (eta is the learning rate, which shrinks the contribution of each new tree; the larger eta is, the weaker this regularizing effect.) With 3-fold cross-validation, I found that the 16th iteration produced the smallest logloss.mean value, 2.56. However, my previous submission to Kaggle using LDA had scored a log-loss of around 2.58. Because the validation error is typically smaller than the test error, I knew that 2.56 was not good enough: the test score would likely be no better than the 2.58 from LDA. I guessed that the model was overfitting the training set, so I strengthened the regularization by decreasing eta to 0.1. According to the xgboost documentation, if you decrease eta you must increase the number of boosting iterations, so I tried 50 iterations. I then submitted this to Kaggle, and my log-loss score was 2.43! That was much better than the 2.58 I got from LDA.
I should also note that I tried the parameter tuning in "caret". However, it ran too slowly on my machine: a 2-fold CV with at most 40 iterations over 3 different eta values ran for over 8 hours!
In addition to Boosting, I also tried Random Forests, the bigRF package, and Neural Networks. In my previous analysis using Random Forests, I kept running into errors because my computer did not have enough memory. I recently purchased a new laptop with more RAM, but I still get the same insufficient-memory errors when running Random Forests. I also could not get the bigRF package to work; I believe this is because it does not work with my version of R. As for Neural Networks, I am working on that as I type this post.
You can find the code I used for this analysis at: https://github.com/jk34/Kaggle_SF_Crime_Classification/blob/master/run_improved.r
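For readers unfamiliar with the metric, the log-loss scores quoted in this post are multi-class logarithmic losses: the average negative log of the probability the model assigned to the true category. The following is a minimal Python sketch of that calculation, not the R code used for the analysis. The function name, the clipping constant, the toy probabilities, and the figure of 39 crime categories are all assumptions made for illustration.

```python
import math


def multiclass_log_loss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    true_labels: list of class indices, one per sample.
    predicted_probs: list of per-sample probability lists.
    eps: clipping bound so log(0) is never evaluated (illustrative value).
    """
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # Clip the probability of the true class away from 0 and 1.
        p = min(max(probs[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(true_labels)


# Toy example with 3 classes (made-up numbers, not competition data).
labels = [0, 2, 1]
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]
toy_loss = multiclass_log_loss(labels, probs)

# Baseline: predicting a uniform 1/K for every class scores log(K).
n_classes = 39  # assumed number of crime categories
uniform_row = [1.0 / n_classes] * n_classes
baseline_loss = multiclass_log_loss([0], [uniform_row])

print(round(toy_loss, 4))
print(round(baseline_loss, 4))
```

Under the assumption of 39 categories, a uniform guess scores log(39), about 3.66, which gives a rough reference point for the 2.58 and 2.43 scores above.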
Author
Hello world, my name is Jerry Kim. I have a Master's Degree in Physics and years of work experience in Image Processing, Machine Learning, and Deep Learning. I have mostly used C++, Matlab, and Python. I created this website to showcase a small sample of the things that I have worked on.
Archives
March 2017