Spark in me - Internet, data science, math, deep learning, philosophy



@snakers4

All this - lost like tears in rain. Data science, ML, a bit of philosophy and math. No bs. Our website - http://spark-in.me Our chat - https://t.me/joinchat/Bv9tjkH9JHbxiV5hr91a0w DS courses review - http://goo.gl/5VGU5A - https://goo.gl/YzVUKf



Recent posts


Tensorboard logging in PyTorch

I looked at this module some time ago. Looks like it has matured now.
The coolest current feature is param logging.

Just compare these two docs:
- TensorboardX
- torch.utils

Looks like PyTorch just adopted the most popular library, copying its docs and APIs.
Nice!
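
A minimal sketch of the param logging mentioned above, assuming `torch` and `tensorboard` are installed. The hyperparameter names, the loss values and the temporary log directory are made up for illustration; only `SummaryWriter`, `add_scalar` and `add_hparams` come from `torch.utils.tensorboard`.

```python
import os
import tempfile

from torch.utils.tensorboard import SummaryWriter

# Hypothetical run settings and results, invented for this example.
hparams = {"lr": 0.001, "batch_size": 32, "optimizer": "adam"}
fake_losses = [0.9, 0.6, 0.4]

# Throwaway log directory so the example does not touch real runs.
log_dir = tempfile.mkdtemp(prefix="tb_demo_")
writer = SummaryWriter(log_dir=log_dir)

# Regular scalar logging, one point per step.
for step, loss in enumerate(fake_losses):
    writer.add_scalar("train/loss", loss, step)

# Param logging: ties a set of hyperparameters to final metrics,
# which TensorBoard shows in its HPARAMS tab.
writer.add_hparams(hparams, {"hparam/final_loss": fake_losses[-1]})
writer.close()

# Collect the event files that were written, including subdirectories.
event_files = [
    name
    for _, _, names in os.walk(log_dir)
    for name in names
    if "tfevents" in name
]
print(log_dir, len(event_files))
```

To look at the result, point `tensorboard --logdir` at the printed directory.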

#deep_learning


09:09 21.10.19
GANs ⬆️


17:05 20.10.19

Playing with name NER

Premise

So, I needed to pick out the street names that are actually a name + surname. Do not ask me why.
Yeah, I know that maybe 70% of streets are named after people, more or less.
So you need 99% precision and at least 30-40% recall.
Or you can imagine a creepy Soviet name like Трактор (Tractor).

So, today making a NER parser is easy: take your favourite framework of choice (plain PyTorch ofc).
Even use FastText or something even less true. Add data and boom, you have it.

The pain

But not so fast. Turns out there is a reason why cutting out proper names is a pain.
For Russian there is the natasha library, but since it is built on YARGY, it makes some assumptions about data structure.
I.e. names should be capitalized, come in pairs (name - surname), etc. I did not look at their rules under the hood, but this is how I would write them.

So this would probably be a name: Иван Иванов
But this probably would not: ванечка иванофф

Is it bad?
Ofc no, it just assumes some stuff that may not hold for your dataset.
And yeah it works for streets just fine.
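
The kind of assumption described above can be sketched with a single regex. This is a hypothetical rule written for illustration, not natasha's or YARGY's actual grammar: it treats any two consecutive capitalized Cyrillic words as a "name surname" pair. The `NAME_PAIR` pattern and the `find_name_pairs` helper are made-up names.

```python
import re
from typing import List, Tuple

# Hypothetical rule, NOT natasha's real grammar: two consecutive
# capitalized Cyrillic words are treated as a "name surname" pair.
NAME_PAIR = re.compile(r"\b([А-ЯЁ][а-яё]+)\s+([А-ЯЁ][а-яё]+)\b")


def find_name_pairs(text: str) -> List[Tuple[str, str]]:
    """Return all (first_word, second_word) pairs matched by the rule."""
    return NAME_PAIR.findall(text)


examples = [
    "улица Иван Иванов",      # capitalized pair: matched
    "улица ванечка иванофф",  # lowercase: missed
    "Улица Иван Иванов",      # capitalized street word: wrong pair
]
for example in examples:
    print(example, "->", find_name_pairs(example))
```

The third example shows the fragility: once the word for "street" is capitalized, the rule grabs `("Улица", "Иван")` instead of the actual name.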

Also recognizing a proper name without context does not really work. And good luck finding (or generating) corpora for that.

Why deep learning may not work

So I downloaded some free databases with names (VK.com respects your security lol - the 100M leaked database is available, but useless, too much noise) and surnames.
Got 700k surnames of different origins and around 100-200k male and female names. Used random words from CC + wiki + taiga as hard negatives.
Got 92% accuracy on 4 classes (just word, female name, male name, surname) with some naive models.

... and it works ... kind of. If you give it 10M unique word forms, it can distinguish name-like stuff in 90% of cases.
But for addresses it is more or less useless, and the heuristics from natasha work much better.

The moral

- A tool that works on one case may be 90% useless on another;
- Heuristics have very high precision, low recall and are fragile;
- Neural networks are superior, but you should match your artificially created dataset to the real data (it may take a month to pull off properly);
- Both heuristics and NNs are very fast to create, but properly cracking either approach may take time. Sometimes 3 plain rules give you 100% precision with 10% recall, and sometimes generating a fake dataset that matches your domain is a no-brainer. It depends.
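
To make the precision / recall trade-off from the list above concrete, here is a small sketch. The labels are toy data invented for illustration (10 real names, 10 non-names), and `precision_recall`, `strict_rule` and `loose_rule` are made-up names, not anything from the experiments in this post.

```python
from typing import Sequence, Tuple


def precision_recall(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Tuple[float, float]:
    """Compute (precision, recall) for binary predictions."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)      # correctly flagged
    fp = sum(1 for p, a in pairs if p and not a)  # flagged by mistake
    fn = sum(1 for p, a in pairs if not p and a)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels, invented for illustration: 10 real names, 10 non-names.
actual = [True] * 10 + [False] * 10

# A strict rule fires once and is right: high precision, low recall.
strict_rule = [True] + [False] * 19
# A loose rule fires on everything: full recall, poor precision.
loose_rule = [True] * 20

print(precision_recall(strict_rule, actual))  # (1.0, 0.1)
print(precision_recall(loose_rule, actual))   # (0.5, 1.0)
```

The strict rule reproduces the "100% precision with 10% recall" situation: it is never wrong, but it finds only one of the ten names.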

#datascience
#nlp
#deep_learning


07:07 19.10.19

Current state of TF vs PyTorch


This review is kind of nothing new, but if you are new to the market, here is my TLDR:

- In research PyTorch >> TF, except for obscure cases;
- For small teams PyTorch >> TF;
- For fast product delivery and iteration PyTorch >> TF;
- For corporations TF > PyTorch;
- For edge computing / mobile now TF > PyTorch;
- For production in general, soon PyTorch ~ TF;

- The research community is unlikely to switch from PyTorch to TF 2.0;
- The remaining question now is whether the large corporations / captive audiences will switch from TF 1.0 to TF 2.0 or to PyTorch;

#deep_learning


09:09 11.10.19

also people recommend backblaze for ultra cheap "fast" backup storage
it also has rsync


08:08 10.10.19